key: cord-303939-7knzjnyr authors: hu, fang; chen, si‐liang; dai, yu‐jun; wang, yun; qin, zhe‐yuan; li, huan; shu, ling‐ling; li, jin‐yuan; huang, han‐ying; liang, yang title: identification of a metabolic gene panel to predict the prognosis of myelodysplastic syndrome date: 2020-04-26 journal: j cell mol med doi: 10.1111/jcmm.15283 sha: doc_id: 303939 cord_uid: 7knzjnyr myelodysplastic syndrome (mds) is clonal disease featured by ineffective haematopoiesis and potential progression into acute myeloid leukaemia (aml). at present, the risk stratification and prognosis of mds need to be further optimized. a prognostic model was constructed by the least absolute shrinkage and selection operator (lasso) regression analysis for mds patients based on the identified metabolic gene panel in training cohort, followed by external validation in an independent cohort. the patients with lower risk had better prognosis than patients with higher risk. the constructed model was verified as an independent prognostic factor for mds patients with hazard ratios of 3.721 (1.814‐7.630) and 2.047 (1.013‐4.138) in the training cohort and validation cohort, respectively. the auc of 3‐year overall survival was 0.846 and 0.743 in the training cohort and validation cohort, respectively. the high‐risk score was significantly related to other clinical prognostic characteristics, including higher bone marrow blast cells and lower absolute neutrophil count. moreover, gene set enrichment analyses (gsea) showed several significantly enriched pathways, with potential indication of the pathogenesis. in this study, we identified a novel stable metabolic panel, which might not only reveal the dysregulated metabolic microenvironment, but can be used to predict the prognosis of mds. pro-inflammatory of bone marrow microenvironment and so on. 
3, 4 at present, the mds revised international prognostic scoring system (ipss-r) is one of the gold standards for risk stratification and prognostic assessment in mds patients, in which patients are categorized into five well-defined risk groups according to platelet count, haemoglobin levels, absolute neutrophil count (anc), marrow blast percentage and cytogenetics. 5, 6 although patients in the intermediate-risk group are reported to have intermediate survival, the disease course and outcome may vary in practice. 7 in the meantime, mds lacks a diversified prognostic classification system. therefore, identification of more diversified prognostic models would better guide therapeutic decisions and assist in the design of better clinical trials. furthermore, mds is a stem cell-derived disorder affecting multiple lineages. 8 mds stem cells with cd123 + have been reported to have higher levels of protein synthesis and altered cellular energy metabolism, 9 which is similar to aml. 10, 11 the anti-leukaemia mechanism of the b cell lymphoma 2 (bcl-2) inhibitor (venetoclax) combined with a demethylating agent (azacytidine) is the eradication of leukaemia stem cells (lscs) by disrupting the tricarboxylic acid (tca) cycle, leading to deeper and durable remissions for older aml patients. 12 moreover, an isocitrate dehydrogenase 2 (idh2) enzyme inhibitor, which targets tumour energy metabolism, was approved by the us food and drug administration (fda) in 2017 for refractory or relapsed aml patients. according to the 'seed soil' theory, the bm microenvironment, which consists of cellular components and non-cellular components (metabolites, cytokines, hormones and angiogenic factors), is vitally involved in the pathogenesis of mds. 13 leukaemia cells use oxidative phosphorylation for survival, while haematopoietic stem cells (hscs) depend on glycolysis for energy production. 12 leukaemia cells are likely to take up mitochondria from stromal cells by endocytosis. 
14 as a consequence, metabolism plays key roles for non-cellular components. accumulating studies have recently revealed a relationship between the pathogenesis, treatment and metabolism of mds. therefore, we established a prognostic panel of metabolic genes using data downloaded from gene expression omnibus (geo) datasets in the training cohort, which was further validated in an independent external cohort. in conclusion, we constructed a metabolic panel to predict the prognosis of mds and revealed that metabolism played significant roles in the prognosis of mds. the mrna expression profiles and relevant clinical information were downloaded from the gse58831 15 and gse114922 16 datasets in the geo database. the metabolic gene sets used as the candidate metabolic gene lists were retrieved from 'c2.cp.kegg.v7.0.symbols' in gene set enrichment analysis (gsea). in addition, perl scripts were used to retrieve metabolic genes for further analysis. transcripts per million normalization and log2 transformation were performed on the expression profiles. differential expression (de) analysis was conducted on 861 annotated metabolic-related protein-coding genes with the limma package. 17 the expression pattern of metabolic genes was examined in the training cohort. genes were subjected to prognostic analysis if their expression pattern was consistent between the training cohort and the independent external cohort. the gse58831 dataset was used as the training cohort to construct the metabolic risk panel. lasso regression penalizes the data-fitting criteria in a way that eliminates less informative predictor variables to yield simpler and more interpretable models. therefore, the metabolic panel was constructed according to the penalized maximum likelihood estimator with 1000-fold cross-validation. the minimum criteria of the penalized maximum likelihood estimator were used to determine the optimal value of the penalty parameter λ. 
in addition, the gse114922 dataset served as an independent external validation cohort. the unified formula determined in the training cohort was used to generate the metabolic risk score for every patient, and patients were further categorized into high- and low-risk groups according to the median metabolic risk score. univariate and multivariate forward stepwise cox regression analyses were conducted in both the training and validation cohorts. a p < .05 indicated statistical significance. gsea v4.0.2 software (http://software.broadinstitute.org/gsea/login.jsp) was used to identify the potential biological pathways that differed between the high- and low-risk groups, using the 'c2.cp.kegg.v7.0.symbols' gene sets. pathways with a nom p-value < .05 were considered statistically significant and are presented. time-dependent receiver operating characteristic (roc) curve analysis was performed to assess the predictive performance of the metabolic signature in the training and validation cohorts, followed by calculation of the area under the curve (auc) using the 'survivalROC' package. overall survival (os) was defined as the primary outcome and was calculated from the date of study entry until death from any cause. kaplan-meier curves were plotted with the 'survival' package and compared by the log-rank test. univariable and multivariable cox analyses were used to evaluate the prognostic performance of clinical and genetic features. categorical variables were compared by the chi-square test or fisher's exact test. spss ® version 24.0 (ibm) and r software (version 3.6.0) were used for statistical analysis. a two-sided p < .05 indicated statistical significance. 
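the risk-score step described above (a fixed linear formula from the training cohort, followed by a median split) can be sketched as follows. this is a minimal illustration only: the gene names, coefficients and expression values are hypothetical placeholders rather than the published panel, and two details the text does not specify are assumed here, namely that the median is computed within each cohort and that patients strictly above it are labelled high risk.

```python
from statistics import median

# Hypothetical coefficients standing in for the LASSO-derived weights.
# The real panel contains 15 genes; these names and values are made up.
COEFFICIENTS = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical log2-transformed expression values per patient.
EXPRESSION = {
    "p1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "p2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "p3": {"GENE_A": 0.0, "GENE_B": 4.0},
    "p4": {"GENE_A": 6.0, "GENE_B": 0.0},
}


def risk_scores(expression, coefficients):
    """Return {patient: sum over genes of coefficient * expression}."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores, cutoff=None):
    """Label each patient 'high' or 'low' relative to the cutoff.

    If no cutoff is given, the median of the supplied scores is used
    (assumption: the median is taken within the cohort being scored).
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


SCORES = risk_scores(EXPRESSION, COEFFICIENTS)
GROUPS = split_by_median(SCORES)
```

passing an explicit `cutoff` would instead reuse the training-cohort median in the validation cohort, which is the other plausible reading of the method.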
[table: the detailed patient characteristics of the two included cohorts, and the correlation between clinicopathological features and metabolic risk level in the training cohort and external validation cohort in mds] [figure: gene labels] among the 861 metabolic genes subjected to de analysis with limma, 140 genes were differentially expressed between healthy and mds samples (figure 1a,b). further, the prognostic values of these 140 genes were analysed via univariate cox regression analysis. ultimately, 22 genes that were both differentially expressed and survival-related (p < .05) were identified (figure 1c). afterwards, lasso-penalized cox regression analysis was used to select the most useful predictive genes from the 22 genes. a penalized maximum likelihood estimator was performed with 1000 bootstrap replicates. the regularization parameter lambda was used to identify the optimal weighting coefficients via the minimum criteria (figure 2a,b). afterwards, 15 metabolic genes were selected and their coefficients were estimated to construct the metabolic risk panel. in total, 201 patients with complete clinical data, including age, gender, who category, karyotype, ipss, transfusion dependence, haemoglobin, bone marrow blast cells, platelet count and absolute neutrophil count, were included in the training and validation cohorts. high risk was associated with male sex, higher numbers of bone marrow blast cells, higher ipss score and lower absolute neutrophil count (table 1). figure 6a,b. gsea identified 36 significantly enriched kegg pathways in the training or validation cohort. 
the majority of the metabolism-associated pathways were enriched in the low-risk group, and the metabolic pathways ranked by nes were cysteine and methionine metabolism, glycine serine and threonine metabolism, fatty acid metabolism and pyrimidine metabolism. on the contrary, the majority of the non-metabolism-associated pathways were enriched in the high-risk group. additionally, most enriched pathways were correlated with cancer (such as the cell cycle and phosphatidylinositol signalling system) or metabolism (such as glycine serine and threonine metabolism, and cysteine and methionine metabolism) (figure 7a,b). the mutation variants of the metabolic gene panel were explored in the ccle database through the cbioportal for cancer genomics. 18, 19 as was expected, the gene amplification, which can change fifteen metabolic genes were identified to construct the metabolic model, most of which have been reported to be involved in malignancy. pfkfb2, a vital regulator of glucose metabolism, has been defined as a candidate gene for glucocorticoid (gc)-triggered apoptosis according to comparative expression profiling in childhood acute lymphoblastic leukaemia (all). 23 interestingly, pfkfb2 was suppressed by mir-613 in gastric cancer, which could further inhibit cell proliferation and invasion. 24 its expression pattern is reported here for the first time as a potential marker in mds. plcb2, involved in inositol phosphate metabolism, has been closely linked to poor prognosis in patients with hepatocellular carcinoma, lung cancer and mammary carcinoma. 25 in our study, plcb2 was negatively correlated with study should also be acknowledged. firstly, we did not have access to more clinical information because the data were derived from the geo database. secondly, the significance of the metabolic panel should be further confirmed in real-world research, and further basic experiments are also necessary to explore the underlying pathogenesis. 
in summary, we constructed a novel prognostic prediction model for mds based on metabolic genes from the geo database, and further validated it in the validation cohort. the prognostic model was not only an independent prognostic predictor for mds but also reflected the disordered metabolism of mds. moreover, this panel could be used as an effective approach for prognostic prediction in mds patients in clinical practice. the authors sincerely thank geo for sharing the huge amount of data. the authors declare no conflict of interest. conceptualization, fh and yl; methodology, csl, and fh; validation, csl; formal analysis,; investigation, fh, csl, and dyj; resources, yl; data curation, cyy; writing-original draft preparation, yw, and fh; writing-review and editing, yw, fh and jyl; visualization, slc, and lls and djd; supervision, yl; project administration, yl; funding acquisition, yl. the gse58831 and gse114922 datasets were collected via the gene expression omnibus (geo) database and were used to retrieve clinicopathological data and rna expression patterns. all data or code generated or used during this study are available from the corresponding author upon reasonable request. 
si-liang chen https://orcid.org/0000-0002-4095-5974 the genetic basis of phenotypic heterogeneity in myelodysplastic syndromes myelodysplastic syndromes clinical implications of genetic mutations in myelodysplastic syndrome immunologic aspects of hypoplastic myelodysplastic syndrome revised international prognostic scoring system for myelodysplastic syndromes myelodysplastic syndromes, version 2.2017, nccn clinical practice guidelines in oncology use of newer prognostic indices for patients with myelodysplastic syndromes in the low and intermediate-1 risk categories: a population-based study transplantation of a myelodysplastic syndrome by a long-term repopulating hematopoietic cell characterization and targeting of malignant stem cells in patients with advanced myelodysplastic syndromes inhibition of amino acid metabolism selectively targets human leukemia stem cells venetoclax with azacitidine disrupts energy metabolism and targets leukemia stem cells in patients with acute myeloid leukemia bcl-2 inhibition targets oxidative phosphorylation and selectively eradicates quiescent human leukemia stem cells evaluation of bone marrow microenvironment could change how myelodysplastic syndromes are diagnosed and treated protective mitochondrial transfer from bone marrow stromal cells to acute myeloid leukemic cells during chemotherapy combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes impact of spliceosome mutations on rna splicing in myelodysplasia: dysregulated genes/ pathways and clinical associations microarray analysis after rna amplification can detect pronounced differences in gene expression using limma the cbio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data integrative analysis of complex cancer genomics and clinical profiles using the cbioportal hallmarks of cancer: the next generation enhanced renewal of erythroid progenitors in myelodysplastic anemia by 
peripheral serotonin higher fetuin-a, lower adiponectin and free leptin levels mediate effects of excess body weight on insulin resistance and risk for myelodysplastic syndrome identification of glucocorticoid-response genes in children with acute lymphoblastic leukemia anti-spike igg causes severe acute lung injury by skewing macrophage responses during acute sars-cov infection distinct prognostic values of phospholipase c beta family members for non-small cell lung carcinoma loss of dnmt3b function upregulates the tumor modifier ment and accelerates mouse lymphomagenesis loss of dnmt3b accelerates mll-af9 leukemia progression prognostic and biologic significance of dnmt3b expression in older patients with cytogenetically normal primary acute myeloid leukemia pathways of retinoid synthesis in mouse macrophages and bone marrow cells arginine addiction in aml arginine deprivation for ass1-deficient mesothelioma a phase ii study of arginine deiminase (adi-peg20) in relapsed/refractory or poor-risk acute myeloid leukemia patients arginine deprivation using pegylated arginine deiminase has activity against primary acute myeloid leukemia cells in vivo microarray analysis reveals a major direct role of dna copy number alteration in the transcriptional program of human breast tumors impact of dna amplification on gene expression patterns in breast cancer identification of a metabolic gene panel to predict the prognosis of myelodysplastic syndrome key: cord-306380-msk9p1yy authors: lee, c.-w.; jackwood, m. w. title: evidence of genetic diversity generated by recombination among avian coronavirus ibv date: 2000 journal: arch virol doi: 10.1007/s007050070044 sha: doc_id: 306380 cord_uid: msk9p1yy previously, we demonstrated that the de072 strain of ibv is a recombinant which has an ibv strain d1466-like sequence in the s gene. 
herein, we analyzed the remaining 3.8 kb 3′ end of the genome, which includes gene 3, gene 4, gene 5, gene 6, and the 3′ non-coding region of the de072 and d1466 strains. those two viruses had high nucleotide similarity in gene 4. however, the other individual genes had a much different level of sequence similarity with the same gene of the other ibv strains. the genome of five ibv strains, of which the complete sequence of the 3′ end of the genome has been determined, were divided at an intergenic (ig) consensus sequence (ctgaacaa or cttaacaa) and compared phylogenetically. phylogenetic trees of different topology indicated that the consensus ig sequences and the highly conserved sequence around this regions may serve as recombination ‘hot spots’. phylogenetic analysis of selected regions of the genome of the de072 serotype field isolates further support those results and indicate that isolates within the same serotype may have different amounts of nucleotide sequence similarity with each other in individual genes other than the s gene. presumably this occurs because the consensus ig sequence serves as the template switching site for the viral encoded polymerase. infectious bronchitis virus causes a highly contagious upper-respiratory disease in chickens. the disease is characterized by increased ocular and nasal secretions, excess mucus in the trachea, decreased weight gain and feed efficiency in broilers, and declines in egg production and egg quality in layers. although live attenuated vaccines are available, ibv continues to be a severe economic problem in commercial chickens because many different serotypes of the virus exist and do not cross protect [3] . [4] . members of the nidovirales order have a single stranded positive sense rna genome and produce a 3 nested set of subgenomic mrnas when they replicate [4] . coronaviruses are divided into three antigenic groups based primarily on their structural proteins. 
infectious bronchitis virus is the type strain of coronaviruses and is the only virus placed in antigenic group three. characteristics of this group are a cleaved spike (s) glycoprotein, an n-glycosylated membrane (m) protein, and no hemagglutinin/esterase protein [19]. the genome of ibv is approximately 27 kilobases in length [1]. it is organized into six regions, each containing one or more open reading frames (orfs), which are separated by intergenic sequences (ig) that contain the signal for transcription of subgenomic mrnas [1, 17]. the viral rna-dependent rna polymerase is encoded in the 5′ two thirds of the viral genome by two overlapping open reading frames (orf1a and orf1b) [1]. the structural protein genes are located 3′ to the viral polymerase gene and are, in order from 5′ to 3′, the s glycoprotein gene (gene 2), the small envelope (e) gene (gene 3), the m glycoprotein gene (gene 4), and the nucleocapsid (n) gene (gene 6) [19, 20]. evolution in ibv has been observed through the occurrence of variant viruses and analysis of known serotypes. more than twenty serotypes within ibv have been recognized worldwide and are thought to be generated by insertions, deletions, point mutations and rna recombination [2, 3, 6, 14]. evidence of natural recombination for several ibv strains has been reported [10, 15, 22]. however, because of the limited sequence information, recombination has only been described for a small part of the genome. so far, the complete sequence of the 3′ end of the genome (from the 3′ end of the polymerase gene to the poly a tail) has been determined for only three strains, beaudette, kb8523 and cu-t2 [1, 11, 20]. the de072 strain was first isolated in 1992 in the delmarva peninsula region of the usa, and initial characterization indicated that this virus was serologically distinct from any other ibv serotype in north america [7]. 
previously, we demonstrated that the de072 strain is a recombinant which has a d1466-like sequence in the s1 and s2 genes [18]. d1466 is an ibv vaccine strain of the d212 serotype from the netherlands [7, 13, 14]. herein, we describe the sequences of the remaining genes of the de072 and d1466 strains with the exception of gene 1 (the polymerase gene). we conducted phylogenetic analysis by dividing the genome at the ig sequences to elucidate the possible role of this sequence in homologous recombination in ibv. further, we conducted sequence analysis of six isolates of the de072 serotype in order to determine whether recombination occurs frequently in this region in field isolates of ibv. viruses used in this study are listed in table 1. the viruses were propagated in 9-day-old embryonated specific-pathogen-free (spf) chicken eggs (select laboratories, gainesville, ga, usa). the d1466 strain of ibv was obtained as phenol-inactivated allantoic fluid using usda import permit #42290. viral rna from ibv grown in embryonating eggs was extracted using the high pure pcr template preparation kit (boehringer mannheim, indianapolis, in, usa) according to the manufacturer's recommendations. rna from the phenol-inactivated allantoic fluid of d1466 was extracted with a modification of the first several steps of the high pure pcr template preparation kit protocol. briefly, 1.5 ml of the allantoic fluid was placed into a microcentrifuge tube and centrifuged at 13,000 × g for 5 min. the aqueous top layer, approximately 200 µl, was transferred to a new tube. binding buffer (200 µl) and 40 µl of proteinase k (18 mg/ml) were added and the mixture was incubated for 10 min at 70 °c. then 150 µl of chloroform/isoamyl alcohol (49:1) was added, and the mixture was vortexed gently for 5-10 sec and placed on ice for 15 min. the mixture was centrifuged at 13,000 × g for 10 min. the upper phase was transferred to a clean 1.5 ml tube and 100 µl of chloroform/isoamyl alcohol (49:1) was added. the mixture was vortexed gently for 5-10 sec. 
this was centrifuged for 2 min at 13,000 × g, and the upper phase was transferred to a clean 1.5 ml tube. the remaining steps were followed sequentially as described by the manufacturer. gene 3, gene 4, gene 5, gene 6, and a 421 bp hypervariable region (hvr) of the s1 gene were amplified separately using the titan one tube rt-pcr system (boehringer mannheim). primer sets used to amplify gene 3, gene 4, and the hvr in s1 are listed in table 2. the relative primer positions were calculated using the atg start site of gene 3 as 1 for the gene 3 and gene 4 primers, and the atg start site of the s1 gene as 1 for the hvr in s1 primers. the primers used for amplification of gene 5 and gene 6 have been reported [8, 23]. the reaction conditions for rt-pcr were previously described [16, 23]. pcr products were cut from 1% agarose gels and purified using the qiaquick gel extraction kit (qiagen, santa clarita, ca, usa). purified pcr products were either sequenced directly or cloned into the ta cloning vector (invitrogen, carlsbad, ca, usa), and automated sequencing with the prism dyedeoxy terminator cycle sequencing kit (perkin elmer, foster city, ca, usa) was conducted at the molecular genetics instrumentation facility, university of georgia. sequencing primers to various regions of the genes for de072 and d1466 were designed using oligo version 4.0 software (national bioscience, plymouth, mn, usa) and are available upon request. assembly of sequencing contigs, translation of nucleotide sequence into protein sequence, and initial multiple sequence alignments were performed with the clustal v method in megalign software version 1.03 (dnastar inc., madison, wi, usa). phylogenetic trees for each gene were generated using the maximum parsimony method with 100 bootstrap replicates in a heuristic search using the paup 3.1 software program [21]. the nucleotide sequences reported here have been deposited in genbank. 
the accession numbers are as follows: de072 (gene 3), af202998; de072 (gene 4), af202999; de072 (gene 5), af203000; de072 (gene 6), af203001; de072 (3 end non-coding region), af203002; d1466 (gene 3), af203003; d1466 (gene 4), af203004; d1466 (gene 5), af203005; d1466 (gene 6), af203006; d1466 (3 end non-coding region), af203007; 98-2831 (hvr in s1), af206254; 99-5831 (hvr in s1), af206255; 99-5425 (hvr in s1), af206256; 99-5658 (hvr in s1), af206257; 97-6370 (hvr in s1), af206258; 97-6386 (hvr in s1), af206259 the complete sequence of the 3 end of the genome of three strains, beaudette, kb8523 and cu-t2 and gene 6 of holl52 strain have been previously reported [1, 11, 20, 23] . a total of 3839 nucleotide and 3861 nucleotide were found, respectively, in a region beginning from the 5 end of gene 3 to the 3 end of de072 and d1466 genome. the intergenic sequence ctgaacaa or cttaacaa was found immediately upstream of the start site for each gene of both strains. the sequences were identical to those found in the corresponding genomic areas of the beaudette, kb8523, and cu-t2 strains (fig. 1). fig. 1 . the nucleotide sequence alignment of gene 3, gene 4, gene 5, and gene 6, and 3 noncoding region. dots indicates nucleotide identical to that of de072 strain. the conserved nucleotide sequences ctgaacaa or cttaacaa, which is located at the starting site of each gene, is in bold character. heavy underlines indicate the putative start codons, asterisks above the sequence indicate the stop codons gene 3 of both strains contained three orfs, 3a, 3b, and 3c. gene 4 consisted of the m protein gene with a single orf and a non-coding region between the 3 end of the m protein gene and gene 5. gene 5 contained two orfs (5a and 5b). gene 6 consisted of the n protein gene with a single orf and a 3 non-coding region. downstream from the stop codon of the n gene, a 15 base insertion was found in the d1466 genome which also occurs in the genome of the holl52 (fig. 1) . 
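the division of the 3′ end of the genome at the intergenic consensus (ctgaacaa or cttaacaa) can be illustrated with a short sketch. this is not the software used in the study (megalign and paup are named in the methods); it is a minimal python example on a made-up toy sequence, and the function names are hypothetical. it locates each consensus site, cuts the sequence so that every segment starts at an ig sequence, and computes a simple percent identity for two pre-aligned sequences, ignoring gap columns.

```python
import re

# Either form of the intergenic consensus: CTGAACAA or CTTAACAA.
IG_PATTERN = re.compile(r"CT[GT]AACAA")


def ig_positions(genome):
    """Return the 0-based start index of every IG consensus match."""
    return [m.start() for m in IG_PATTERN.finditer(genome.upper())]


def split_at_ig(genome):
    """Cut the sequence so that each segment begins at an IG consensus.

    Any sequence upstream of the first IG site is discarded.
    """
    starts = ig_positions(genome)
    bounds = starts + [len(genome)]
    return [genome[bounds[i]:bounds[i + 1]] for i in range(len(starts))]


def percent_identity(a, b):
    """Percent identical positions between two aligned, equal-length
    sequences; columns containing a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy sequence (not real IBV data): two IG sites, each followed by a short ORF.
TOY = "GGG" + "CTGAACAA" + "ATGAAA" + "CTTAACAA" + "ATGCCC"
SEGMENTS = split_at_ig(TOY)
```

segments produced this way could then be aligned gene by gene and compared separately, which is the idea behind building one phylogenetic tree per ig-delimited region.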
the 3′-terminal 3.8 kb of the genome of five strains and gene 6 of the holl52 strain were compared. the nucleotide sequence similarities among the coding regions of gene 3, m, gene 5, and the n protein gene of de072 and the other strains ranged from 83.3 to 97.6%. those of d1466 and the other strains ranged from 78.7 to 98.2%. d1466 showed only 1.8% nucleotide difference from holl52 in gene 6. gene 3c and gene 5b were relatively more conserved than the other genes (table 3). genes were divided at the ig sequences (ctgaacaa/cttaacaa) and phylogenetic analysis was conducted. the de072 strain clustered with cu-t2 (fig. 2). kb8523, which is the only nephropathogenic strain, was placed alone in the trees for all genes compared. in order to demonstrate the genetic heterogeneity of ibv isolates of the same serotype, we conducted phylogenetic analysis using six de072 serotype field isolates. phylogenetic analysis of the hypervariable region (hvr) in s1 clustered all the de072 serotype isolates in one group with the prototype strain of the de072 serotype of ibv. this group was far from other serotypes of ibv in tree length. however, the phylogenetic trees of gene 3 and gene 4 showed differences in topology among the six isolates. in gene 3, only one isolate, 98-2831, clustered with de072. in gene 4, no isolates clustered with de072; instead, they grouped randomly with other serotypes of ibv (fig. 3). de072 is a recent isolate made in 1992 [6]. in a previous study of the s gene, we demonstrated that this virus was closely related to d1466, which is an ibv vaccine strain of the d212 serotype from the netherlands [7, 13, 18]. analysis of gene 4 also reveals a high sequence relatedness between de072 and d1466 (table 3). however, in the other genes analyzed in this study, de072 shares high sequence similarity with the cu-t2 strain, which has also been reported to be a recombinant between arkansas and massachusetts strains [10]. 
considering the fact that both strains were isolated in the northeastern usa, it is possible that they had undergone similar selection pressure. on the other hand, d1466 shows high similarity with the beaudette and holl52 strains in genes other than the s gene. the percent similarity in the n gene and a 15 base insertion in the 3′ non-coding region suggest that d1466 and holl52 are closely related (table 3, fig. 1). the holl52 strain has been extensively used as a live vaccine in europe [5]. this finding provides more convincing evidence that vaccine strains are contributing to the emergence of variants in the field. based on these results, we suggest that de072 and d1466 had the same origin, but diverged a long time ago and evolved independently in different geographical locations. since recombination in coronaviruses is thought to occur by a template switching mechanism [8, 19], we speculate that ig sequences may serve as 'hot spots' for homologous recombination. so far, recombination events suggested in ibv have been based on a small part of the genome [10, 15, 22]. examining only a small part of the genome may result in misleading conclusions because of point mutations or conserved regions of the gene. we conducted phylogenetic analysis by dividing 3.8 kb of the 3′ end of the genome of five ibv strains at the ig sequences. phylogenetic trees of these sequence data had very different topologies (fig. 2), which indicates that recombination had occurred. it has been reported that rna recombination in ibv can occur randomly at non-localized sites in vitro [12]. however, considering the selection pressure in vivo, recombination at the ig sequences should be advantageous to the virus in two respects. first, since crossovers occur at the site of consensus ig sequences, there would be no shift in the codon reading frame. second, since whole genes are substituted, there would be no drastic change in the conformation of proteins encoded by individual genes. 
further, cross-overs at each of the five ig sequences would generate tremendous genetic diversity. this amount of diversity may contribute to persistence and to the continuing emergence of new variants of ibv despite vaccination efforts. finally, we conducted sequence analysis of 6 isolates of the de072 serotype to demonstrate how random recombination occurs within the same serotype. phylogenetic analysis of the hvr in s1 shows that these 6 isolates cluster together because they are the same serotype. however, these 6 isolates had a much different level of nucleotide sequence similarity with each other in gene 3 and gene 4, and clustered randomly with other serotypes of ibv (fig. 3) . based on this result, it is clear that isolates of the same serotype can differ substantially in individual genes. thus, every field isolate of ibv could be unique in each gene sequence because of recombination. completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus location of the amino acid differences in the s1 spike glycoprotein subunit of closely related serotypes of infectious bronchitis virus infectious bronchitis nidovirales: a new order comprising coronaviridae and arteriviridae occurrence and significance of infectious bronchitis virus variant strains in egg and broiler production in the netherlands variant serotypes of infectious bronchitis virus isolated from commercial layer and broiler chickens antigenic and s-1 genomic characterization of the delaware variant serotypes of infectious bronchitis virus infectious bronchitis virus detection in allantoic fluid using the polymerase chain reaction and a dna probe poliovirus rna recombination: mechanistic studies in the absence of selection a novel variant of avian infectious bronchitis virus resulting from recombination among three different strains sequence analysis of gene 3, gene 4 and gene 5 of avian infectious bronchitis virus strain cu-t2 experimental evidence of recombination in 
coronavirus infectious bronchitis virus molecular epidemiology of infectious bronchitis virus in the netherlands phylogeny of antigenic variants of avian coronavirus ibv sequence evidence for rna recombination in field isolates of avian coronavirus infectious bronchitis virus differentiation of infectious bronchitis virus serotypes using polymerase chain reaction and restriction-fragment-lengthpolymorphism analysis coronavirus: how a large rna viral genome is replicated and transcribed spike gene analysis of the de072 strain of infectious bronchitis virus: origin and evolution the coronaviridae cloning and sequencing of genes encoding structural proteins of avian infectious bronchitis virus paup: phylogenetic analysis using parsimony. version 3. illinois natural history survey evidence of natural recombination within the s1 gene of infectious bronchitis virus comparative analyses of the nucleocapsid genes of several strains of infectious bronchitis viruses and other coronaviruses we express appreciation to dr. yoram weisman for providing d1466 virus and deborah hilt for technical assistance. thanks are also extended to drs. bruce seal, maricarmen garcia, and holly sellers for the review of this manuscript. received december 1, 1999 key: cord-337492-o6sy4zi4 authors: baric, ralph s.; crosson, sean; damania, blossom; miller, samuel i.; rubin, eric j. title: next-generation high-throughput functional annotation of microbial genomes date: 2016-10-04 journal: mbio doi: 10.1128/mbio.01245-16 sha: doc_id: 337492 cord_uid: o6sy4zi4 host infection by microbial pathogens cues global changes in microbial and host cell biology that facilitate microbial replication and disease. the complete maps of thousands of bacterial and viral genomes have recently been defined; however, the rate at which physiological or biochemical functions have been assigned to genes has greatly lagged. 
the national institute of allergy and infectious diseases (niaid) addressed this gap by creating functional genomics centers dedicated to developing high-throughput approaches to assign gene function. these centers require broad-based and collaborative research programs to generate and integrate diverse data to achieve a comprehensive understanding of microbial pathogenesis. high-throughput functional genomics can lead to new therapeutics and better understanding of the next generation of emerging pathogens by rapidly defining new general mechanisms by which organisms cause disease and replicate in host tissues and by facilitating the rate at which functional data reach the scientific community. microbial genome sequencing efforts continue at an increasing rate, resulting in an expanding catalogue of genes of unknown function for important pathogens (fig. 1). in some cases, up to 40% of the annotated genes in fully sequenced microbial genomes have no known or predicted function. in many cases, these uncharacterized genes are highly conserved and are implicated in the pathogenic process through either their timing of expression or requirement for microbial replication. such data indicate that these genes execute important, general biological functions. the interpretation of existing and emerging microbial genomics data will require the scientific community to uncover new paradigms hidden within these sequences. this point is highlighted by a recent publication by hutchison et al. focused on building a minimal genome of mycoplasma mycoides (1). the 473 genes in this genome include many required for known essential functions, such as dna replication, transcription, and translation, but the genome also contains 149 genes (~31%) of unknown function. likewise, the known coding capacity of dna viruses has been doubled over the past few years, an achievement made possible by using novel high-throughput sequencing and proteomic methods.
the discovery of these genes of unknown function not only promises to affect our understanding of microbial pathogenesis but also provides an undiscovered wealth of new therapeutic targets for antibiotic, antiviral, and vaccine development for improved global and economic health. unlike large-scale genome-sequencing or structural-genomics efforts, the functional annotation of uncharacterized genes is not well developed technologically, and therefore, the scientific community cannot rely on a well-defined, mature set of experimental approaches. simple deletion or overexpression of an uncharacterized gene often fails to yield any discernible phenotype in standard laboratory contexts. in the same vein, a purely biochemical approach that relies on purification and in vitro biochemical characterization of uncharacterized genes often fails to yield fruitful functional data. furthermore, while sequence-based bioinformatic analyses and computational models may provide clues about functional genetic interactions, such predictions are often limited by our current biological knowledge and databases. as a consequence, functional annotation is incorrect in many cases because unvalidated information is propagated across species. in cases where gene function is experimentally assigned and validated, the information is often not broadly propagated or may not apply to another organism. therefore, a successful functional annotation endeavor requires a multipronged approach that involves open collaboration among scientists with expertise in genetics, bioinformatics, molecular biology, biochemistry, cell physiology, host-microbe interactions, and data management. such a program is outside the scope of traditional funding mechanisms and requires integrated teams of experts and new high-throughput experimental methodologies.
the niaid has implemented a program aimed at assigning functions to open reading frames (orfs) and small noncoding rnas (ncrnas) that have been discovered by large-scale sequencing efforts to begin addressing this gap in our understanding of bacterial and viral gene function. this timely initiative from the niaid is highly significant because it is the institute's first attempt to incorporate functional annotation into its genomics and advanced technologies program, which is focused on developing genomics, proteomics, and bioinformatics resources to advance our understanding of infectious and immune-mediated diseases. the niaid functional genomics (fg) program builds on previous functional annotation efforts, such as combrex (2) , and brings together multidisciplinary groups to work on shared goals. its mission is to probe new biology in pathogenic bacteria and viruses by assigning functions to new genes. the fg program is unique in its approach in that it enables principal investigators to follow up on "omics" and phenotypic screening data with targeted experiments to establish gene function. most pathogens are evolving constantly, and such an integrated approach will provide the database to allow us to work toward predicting new emerging infectious diseases and respond to new pandemics. projects across the niaid fg program are driven by the idea that the assignment of gene function must be rooted in experimentation in vivo and in vitro. the challenge to the experimentalist is to develop systematic approaches to gene function discovery that have a reasonable chance of success over a large number of genes. fg centers are taking a variety of approaches to uncover the functions of protein-coding genes and small ncrnas. some approaches, which perhaps will have the highest yield, require at least some prediction of function, while others presuppose no specific functional knowledge. 
current state-of-the-art technologies, such as genome synthesis, next-generation nucleic acid sequencing, ribosomal profiling, high-resolution mass spectrometry, metabolomics, and molecular and systems modeling, enable high-throughput approaches to characterize genes of unknown function. though these technologies are clearly valuable, they are most effectively leveraged when combined with functional assays, such as measurements of microbial permeability and antimicrobial susceptibility or assays of host response to infection, which have been scaled up through recent developments in robotics, whole-genome synthesis, and high-content cellular imaging. in this commentary, we provide an overview of our coordinated efforts that are aimed at filling the gaps in our understanding of microbial gene function. these efforts are restricted to a few organisms and technologies but could be attempted on a much larger scale. such functional analyses could provide the scientific basis for rational approaches to the development of new therapeutic products to combat future and current gaps in the treatment of bacterial and viral infection. improving the quality of genome maps. most protein-coding genes are annotated by either their similarity to known genes or generalized guidelines for identifying translational start sites. such predictions are often incorrect. for example, we have demonstrated that a protein highly homologous to a secreted bacterial amidase is actually an intracellular regulator of peptidoglycan synthesis (3) . moreover, predictive algorithms to define the locations of non-protein-coding genes (e.g., small ncrnas, out-of-frame orfs, etc.) are generally poor. defining the boundaries of genes of unknown function is an important component of experimental functional annotation. we are using combinations of directed biochemical, computational, and next-generation-sequencing approaches to produce improved maps of gene boundaries in bacterial and viral pathogens. 
these efforts enable improved functional genetic and biochemical experiments. leveraging existing protein structure data. molecular structure data can provide tremendous insight into the biochemical functions of proteins. a high-quality structural model enables one to map conserved residues on the molecule and to develop hypotheses regarding active-site chemistry, ligand binding sites, protein docking sites, etc. we are taking advantage of the large amount of structural data available in the pdb to build homology-based protein models. in addition, we are using hidden markov model (hmm)-based approaches to build predictive structural models in cases where homology in the pdb is low. experimental structures and high-confidence structural models generated by the fg program are available to the community and have been used to develop and test specific functional hypotheses in vitro and in vivo and to define protein function (4). high-throughput protein production for functional biochemical analysis. structural models of proteins of unknown function are informing functional biochemical hypotheses that are being tested in vitro. specifically, we are leveraging existing high-throughput structural genomics infrastructure to produce expression clones of proteins of unknown function. this approach is yielding milligram quantities of many targets of unknown function, which are being assayed for a range of biochemical activities and are being used to produce antibodies for cellular studies. one specific use for these purified proteins is in activity-based metabolomic profiling (5), an unbiased approach to finding substrates and products for putative enzymes. defining gene function by biochemical association. a well-established approach to define function is to test for physical associations of proteins and rnas with other proteins or transcripts of known function. we are employing such biochemical association strategies.
for protein-coding genes, we are measuring protein-protein associations using quantitative proteomic methods. for example, using a proteomics approach, we reported that kaposi's sarcoma-associated herpesvirus (kshv) viral interferon regulatory factor 1 (virf1) can bind the cellular interferon-stimulated gene 15 (isg15) e3 ligase, herc5. interaction of virf1 with herc5 inhibits the conjugation of isg15 to cellular proteins, thereby dampening the ifn response to the virus (6). using a functional genomics screen, coupled with synthetic genome design, we demonstrated that the s glycoprotein genes of several severe acute respiratory syndrome coronavirus (sars-cov)-like bat coronaviruses can bind human receptors for entry and replicate efficiently in primary human airway epithelial cells and that they are resistant to existing sars vaccines and immunotherapeutics (7, 8). our early results suggest that an efficient way to discover function using this approach is to target protein complexes of known function and define associated proteins of unknown function. these efforts have required us to develop new bioinformatics methods to understand the complex data produced by these experiments. prediction of ncrna targets. ncrnas represent a particular challenge in bacterial and virus research, as they can vary tremendously among different bacterial species and viruses and little is known about their functions. we have generated catalogues of the ncrnas for the organisms (for example, in mycobacteria) (9) that we study and are applying large-scale methods to identify their targets. these methods start with bioinformatics, though the predictive algorithms are not yet particularly robust. in addition, we are applying cross-linking-, ligation-, and sequencing-based approaches to systematically link small ncrnas of unknown function to their transcript targets.
thus, while many of our approaches to investigating ncrnas are still at the developmental stage, defining catalogues of ncrnas provides the community with a road map of targets for downstream studies and analyses. we are also using next-generation-sequencing technologies, such as transcriptome sequencing (rnaseq) and selective 2'-hydroxyl acylation analyzed by primer extension (shape) analysis of rna, to investigate rna structure-function correlations. paired with these analyses, we are investigating the rna transcriptome to correlate conserved and unique rna structure elements with pathogen virulence factors, identify previously uncharacterized and/or rare translation initiation sites, and associate protein structure elements with virus biology, pathogenesis, and host range. genetic approaches: identifying phenotypes. we are using multiple strategies to identify phenotypes that are associated with the deletion or overexpression of target genes of unknown function. for bacterial pathogens, we have initially focused on genes that, when disrupted, produce measurable growth phenotypes or alterations in cellular barrier function under specified conditions, including antibiotic treatment. these assays are facilitated by whole-genome defined mutant libraries that allow more efficient high-throughput screening for specific phenotypes (10). we are assaying the growth of mutant strains in axenic culture and in cell and animal infection models. the biolog screening platform provides a high-throughput approach to phenotype identification. we are growing individual deletion strains in parallel with wild-type control strains under ~2,000 defined conditions. differences between strains identify specific medium conditions under which a particular gene is required for wild-type growth and have provided clues for target gene function (11).
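the comparison of deletion strains against wild-type controls across many growth conditions can be illustrated with a minimal sketch. the condition names, growth values, and the 0.5 ratio cutoff below are hypothetical and are not taken from the study, and the function is not part of the biolog platform's own software; it only shows the general idea of flagging conditions under which a gene appears to be required for wild-type growth.

```python
def conditions_requiring_gene(wild_type, mutant, max_ratio=0.5):
    """Return conditions where mutant growth falls below max_ratio x wild type.

    wild_type, mutant: dicts mapping condition name -> growth signal
    (arbitrary units). max_ratio is an illustrative cutoff, not a value
    reported in the source text.
    """
    flagged = []
    for condition, wt_growth in wild_type.items():
        if condition not in mutant or wt_growth <= 0:
            continue  # no usable comparison for this condition
        ratio = mutant[condition] / wt_growth
        if ratio < max_ratio:
            flagged.append((condition, round(ratio, 2)))
    # strongest growth defects first
    return sorted(flagged, key=lambda item: item[1])


# hypothetical growth readings for one deletion strain
wild_type = {"glucose": 1.00, "succinate": 0.80, "low_ph": 0.60}
mutant = {"glucose": 0.95, "succinate": 0.20, "low_ph": 0.55}
print(conditions_requiring_gene(wild_type, mutant))
```

in practice a replicate-aware statistical test would likely replace the fixed ratio cutoff; the single threshold is used here only to keep the sketch short.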
in a particular case, a functional genetic study of mutants harboring deletions of genes of unknown function is informing the development of a new live attenuated brucella abortus vaccine strain (12). in addition, we are applying genome-wide screening approaches to find genes that interact genetically to modify phenotypes. in our studies of viral pathogens, we have developed a high-throughput, multiarmed screening program to investigate both rna and dna virus-encoded host evasion functions. to accomplish this, we are applying modular cloning technology to easily shuttle candidate genes of unknown function into replicon expression and/or lentiviral vectors for alternate applications, such as the expression of toxic or otherwise challenging proteins and antibody generation. using these screening assays, we are defining viral counterdefense mechanisms mediated by known, novel, and noncanonical viral orfs and ncrnas, whose products modulate the host innate immune response and cellular defense machinery to the virus's advantage. screening assays include assessments of beta interferon (ifn-β) antagonism, nf-κb, toll-like receptor, and apoptosis modulation, inflammasome signaling, cgas-sting and p53 pathway modulation, cellular localization, global protein synthesis, and mtor inactivation. for example, we have reported multiple herpesviral proteins that modulate the cgas-sting dna sensing pathway and middle east respiratory syndrome coronavirus (mers-cov) and closely related bat mers-like virus phosphodiesterase proteins that antagonize rnase l activation during infection (13-15). in parallel, we also seek to identify viral entry proteins that program efficient infections across multiple species (7, 8). our synthetic approach allows us to rapidly test hypothetical proteins-proteins whose expression has not yet been verified in the context of infection.
genes of interest are then deleted or mutated using reverse genetic platforms, and virus mutant growth is examined both in primary human targets, such as airway epithelial cells and various immune cells, and during infection. although the four centers funded by the fg program have different specific goals, we share approaches and are experiencing common challenges, which we are working to solve together. shared solutions drive research forward. for example, each center includes a small rna (srna) discovery project. there is no generally agreed upon approach to elucidating srna gene function. it has been useful to compare experiences and potential solutions across centers. srna discovery projects as a group are converging on approaches that may work generally in bacteria. in addition, the data management groups at the four centers share common priorities for public dissemination of data and were able as a group to begin the process of defining new priorities for updating capabilities for the patric and vipr brc public resources. an example of this collaboration is the specification of the mode and format for transfer of transposon-sequencing (tn-seq) experimental data and biolog phenotyping data to patric, along with the computational tools to analyze these data (16). this effort will standardize the process of data transfer from independent centers and will define the format for public display of these and other data sets generated by fg centers. another interaction is occurring with the seattle structural genomics center for infectious diseases and the center for structural genomics of infectious diseases. using validated overexpression platforms and novel uncharacterized and/or hypothetical orfs, the collaboration is designed to determine the structures of high-priority candidate proteins that antagonize host antimicrobial defense pathways.
it is likely in the future that the data we generate will be used by the systems biology centers to further create more complex models of host-pathogen interactions. as described above, we have used a multipronged investigation strategy to directly evaluate unknown and hypothetical genes from a diverse array of pathogens to characterize the biological functions encoded by these genes (table 1). a particular strength of this approach is that this systematic workflow, which can be adapted to all pathogens with sequenced genomes, can ensure rational, directed, and rapid response times for vaccine and therapeutic design in answer to emerging and reemerging epidemics. the work of this program is defining a future blueprint to perform functional analysis of new pathogens as they emerge and to more rapidly respond to the need for knowledge of emerging organisms.
1. design and synthesis of a minimal bacterial genome
2. the combrex project: design, methodology, and initial results
3. a cytoplasmic peptidoglycan amidase homologue controls mycobacterial cell wall synthesis
4. structural asymmetry in a conserved signaling system that regulates division, replication, and virulence of an intracellular pathogen
5. activity-based metabolomic profiling of enzymatic function: identification of rv1248c as a mycobacterial 2-hydroxy-3-oxoadipate synthase
6. kaposi's sarcoma-associated herpesvirus viral interferon regulatory factor 1 interacts with a member of the interferon-stimulated gene 15 pathway
7. a sars-like cluster of circulating bat coronaviruses shows potential for human emergence
8. sars-like wiv1-cov poised for human emergence
9. leaderless transcripts and small proteins are common features of the mycobacterial translational landscape
10. resources for genetic and genomic analysis of emerging pathogen acinetobacter baumannii
11. wrpa is an atypical flavodoxin family protein under regulatory control of the brucella abortus general stress response system
12. brucella abortus Δrpoe1 confers protective immunity against wild type challenge in a mouse model of brucellosis
13. modulation of the cgas-sting dna sensing pathway by gammaherpesviruses
14. evasion of innate cytosolic dna sensing by a gammaherpesvirus facilitates establishment of latent infection
15. middle east respiratory syndrome coronavirus ns4b protein inhibits host rnase l activation
16. transit-a software tool for himar1 tnseq analysis
r.s.b. and b.d. were supported by nih grant ai107810, s.c. was supported by nih grant ai107792, s.i.m. was supported by nih grant ai107775, and e.j.r. was supported by nih grant ai107774. this work, including the efforts of ralph s. baric and blossom damania, was funded by hhs | nih | national institute of allergy and infectious diseases (niaid) (ai107810). this work, including the efforts of sean crosson, was funded by hhs | nih | national institute of allergy and infectious diseases (niaid) (ai107792). this work, including the efforts of samuel miller, was funded by hhs | nih | national institute of allergy and infectious diseases (niaid) (ai107775). this work, including the efforts of eric j. rubin, was funded by hhs | nih | national institute of allergy and infectious diseases (niaid) (ai107774).
key: cord-308034-9b219k0v authors: murray, james l.; sheng, jinsong; rubin, donald h. title: a role for h/aca and c/d small nucleolar rnas in viral replication date: 2014-01-30 journal: mol biotechnol doi: 10.1007/s12033-013-9730-0 sha: doc_id: 308034 cord_uid: 9b219k0v we have employed gene-trap insertional mutagenesis to identify candidate genes whose disruption confers phenotypic resistance to lytic infection, in independent studies using 12 distinct viruses and several different cell lines. analysis of >2,000 virus-resistant clones revealed >1,000 candidate host genes, approximately 20 % of which were disrupted in clones surviving separate infections with 2–6 viruses.
interestingly, there were 83 instances in which the insertional mutagenesis vector disrupted transcripts encoding h/aca-class and c/d-class small nucleolar rnas (snoras and snords, respectively). of these, 79 snoras and snords reside within introns of 29 genes (predominantly protein-coding), while 4 appear to be independent transcription units. sirna studies targeting candidate snora/ds provided independent confirmation of their roles in infection when tested against cowpox virus, dengue fever virus, influenza a virus, human rhinovirus 16, herpes simplex virus 2, or respiratory syncytial virus. significantly, eight of the nine snora/ds targeted with sirnas enhanced cellular resistance to multiple viruses, suggesting widespread involvement of snora/ds in virus–host interactions and/or virus-induced cell death. the goal of this study was to discover cellular genes required for viral replication with the aim of developing anti-viral agents. we employed gene-trap insertional mutagenesis, an approach utilizing a promoterless vector that randomly integrates into the host genome, thereby disrupting (trapping) host genes. there is an absolute requirement for a cellular-based promoter to drive expression of a vector-derived selectable marker conferring neomycin resistance. libraries of gene-trap insertional mutants [1] were used to select for resistance to lytic infection with a variety of viruses. using this approach, we have identified over 1,000 candidate cellular genes whose disruption confers survival following exposure to otherwise lytic viral infections [2-8], or clostridium perfringens epsilon toxin [9, 10]. candidate genes mediating viral infection are identified in surviving clones by sequencing across genomic integration sites with primers annealing to the u3neosv1 vector used for insertional mutagenesis.
while protein-encoding genes accounted for over 90 % of the candidate genes identified, the gene-trap insertional mutagenesis vector also inserts into non-protein-encoding genes. these include small nucleolar rnas (snornas), which are involved in maturation processes of ribosomal rnas (rrnas), small nuclear rnas (snrnas), transfer rnas (trnas), and messenger rnas (mrnas) [11]. results presented herein suggest the involvement of two novel classes of snornas in viral replication: snoras containing conserved h/aca boxes that participate in nucleotide pseudouridylation, and snords containing c/d boxes that participate in 2'-o-methylation (fig. 1) [11]. snoras and snords function by assembling into respective small nucleolar ribonucleoprotein complexes (snornps), and serve as guide rnas to direct complementary target rna nucleotide modification and aid translation (reviewed in [12]). snora/ds also regulate other cellular processes including alternative splicing, mrna editing [13], and mirna-like silencing [14]. interestingly, a recent study utilizing a promoter trap to randomly inactivate cellular alleles revealed that snords 32a, 33, and 35a contribute to stress-induced apoptosis [15]. potential roles for this novel rna class in viral replication are considered.

fig. 1 a snoras are composed of two nearly complementary hairpin loops and two single stranded regions with conserved h and aca boxes. the h box sequence is ananna, where n can be g, u, a, or c. bulging within the hairpin loops allows complementary base pairing with target rna sequences. snoras assemble into small nucleolar ribonucleoprotein (snornp) complexes that catalyze isomerization of the first unpaired uridine in the bulge region to pseudouridine (ψ). pseudouridylation occurs 14-16 nucleotides upstream of either the h box, the aca box, or both. b c/d box-containing snords assemble into snornps in complex with complementary target rnas, and catalyze 2'-o-ribose methylation. imperfect copies of the c and d boxes, termed c' and d', may be located internally. snords interact with target rnas via either of the 10-21 nucleotide antisense regions to guide snornp-catalyzed 2'-o-ribose methylation 5 nucleotides upstream of the d or d' box

[2, 4, 5, 7, 8] . clonal library cells resisting lytic infection were selected in hep3b cells (dfv2), tzm-bl cells (hrv2 and hrv16), and vero e6 cells (hsv1, pv, and rsv), using a similar approach. briefly, gene-trap libraries, each harboring approximately 10^4 gene entrapment events, were expanded to 80-90 % confluency until approximately 10^3 daughter cells represented each clone. the indicated cell lines were infected with dfv (moi = 0.0002), pv (moi = 0.001), hrv2 or hsv1 (moi = 0.005), hrv16 (moi = 0.01), or rsv (moi = 0.05). infection proceeded until >90 % of cells were dead (3-7 days), and then the medium was changed every 2-3 days until surviving clones were visible (generally 2-3 weeks). surviving clones were detached with trypsin, expanded, and re-infected at a tenfold higher moi to confirm resistance. resistant clones showing >70 % survival following re-infection were selected for expansion to identify trapped genes. genomic dnas from clonal virus-resistant cell lines were extracted using a qiaamp dna blood mini kit (qiagen, inc., valencia, ca, usa). shuttle vectors and genomic dna flanking the u3neosv1 integration site were recovered by restriction enzyme digests of genomic dna, self-ligation, transformation into escherichia coli, and sequencing the resultant carbenicillin-resistant plasmids to identify trapped genes, as described [7].

rna interference and qrt-pcr studies snora/ds were screened by rnai for functional roles in viral replication against a panel of six viruses (cpv, dfv2, flu, hrv16, hsv2, and rsv).
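the h box consensus given in the fig. 1 legend (ananna, where n is any of g, u, a, or c) lends itself to a simple motif scan. the sketch below is illustrative only: the example sequence is made up, the function names are hypothetical, and measuring the 14-16 nucleotide window from the first base of the box is an assumption, since the legend does not state the reference point.

```python
import re

# H box consensus from the figure legend: A-N-A-N-N-A (N = G, U, A, or C).
# A zero-width lookahead is used so overlapping candidates are all reported.
H_BOX = re.compile(r"(?=(A[ACGU]A[ACGU]{2}A))")


def find_h_boxes(rna):
    """Return (0-based start, hexamer) for every H box candidate in rna."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in H_BOX.finditer(rna)]


def upstream_window(start, lo=14, hi=16):
    """Positions lo..hi nt upstream of a box start, clipped at the 5' end.

    Counting from the first base of the box is an assumption made here.
    """
    return [start - d for d in range(hi, lo - 1, -1) if start - d >= 0]


# hypothetical sequence, for illustration only
seq = "GGGGCCCCGGGGCCCCGGGGAGAGGACCCC"
for start, box in find_h_boxes(seq):
    print(start, box, upstream_window(start))
```

a motif match alone does not establish a functional h box; in a real analysis the hairpin context and the aca box near the 3' end would also need to be checked.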
inhibition of viral production was determined by measuring viral rna production in culture supernatants by quantitative real time pcr (qrt-pcr). the same cell lines used for gene-trap studies (table 1) were used for rnai screens, with the exception that rnai screens against flu were performed in human hepg2 cells instead of canine mdck cells. cells were seeded in 6-well plates (10^5 cells/well) and transfected in duplicate wells with 50 nm sirnas (dharmacon, inc.) targeting candidate snoras or snords, along with relevant negative and positive control sirnas (n = 2 independent experiments). allstars negative control sirna (qiagen, inc.) or a non-targeting sirna (dharmacon, inc.) served as negative control sirnas. negative controls were used to normalize qrt-pcr results to 100 %. sirnas targeting viral genes were used as positive controls, as follows: d5r and d7r (cpv), prm (dfv2), pa (flu), 2c and vp4 (hrv16), ul29 (hsv2), and the p gene (rsv). duplicate wells were seeded for uninfected and infected controls, lacking transfection reagents. hep3b, hepg2, and vero e6 cells were transfected with 50 nm sirnas using lipofectamine 2000 (invitrogen, carlsbad, ca, usa), whereas tzm-bl cells were transfected with sirnas using hiperfect (qiagen, inc.). cells were infected at 48 h post-transfection, using mois of 1 (cowpox and hsv2), 0.1 (flu), 0.01 (hrv16 and rsv), or 0.001 (dfv2). cells were infected for 2 h at either 37°c (cpv, dfv2, hsv2, flu, or rsv) or 33°c (hrv16). following the 2-h incubation period, cells were washed with pbs and fresh medium was added. at 3 days post-inoculation, culture supernatants were clarified by centrifugation and 200 µl was lysed in preparation for rna extraction, using an epmotion 5075 workstation (eppendorf) and the purelink 96 total rna purification kit (invitrogen).
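the normalization of qrt-pcr results to the negative control (set to 100 %) can be written out as a short calculation. this is a minimal sketch under stated assumptions: the copy numbers are invented, the function name is hypothetical, and averaging duplicate wells before taking the ratio is one plausible reading of the procedure rather than a detail given in the text.

```python
from statistics import mean


def percent_of_control(sample_copies, control_copies):
    """Express mean viral RNA copies as a percentage of the negative control.

    Both arguments are lists of copy numbers from replicate wells; the
    negative-control mean is defined as 100 %.
    """
    control_mean = mean(control_copies)
    if control_mean <= 0:
        raise ValueError("negative control must have a positive mean")
    return 100.0 * mean(sample_copies) / control_mean


# hypothetical duplicate-well copy numbers
negative_control = [2.0e6, 2.4e6]
sirna_wells = [4.0e5, 4.8e5]
print(percent_of_control(sirna_wells, negative_control))
```

values well below 100 % would indicate reduced viral rna in the supernatant relative to the negative control sirna.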
total rna was reverse transcribed using random hexamers (applied biosystems, foster city, ca, usa), and qrt-pcr was performed using an eppendorf mastercycler realplex 2 system, with taqman assays developed to detect viral dnas. to generate standard curves, amplicons produced during real time pcr detection of each viral cdna were cloned into the pcrii vector (invitrogen). qrt-pcr was performed using freshly prepared standards, serially diluted over eight logs of copy numbers. we observed that 83 candidate snora and snord genes were potentially disrupted in clonal cell lines surviving viral infection (table 1). most candidate snora/d genes (79/83) occur as intronic sequences encoded within 29 host genes, although 4 snora/ds (snora76, snord3b-2, snord93, and snord104) are monocistronic. as indicated in table 1, over half of the host genes encoding snora/ds were disrupted at multiple independent integration sites with the insertional mutagen vector u3neosv1, and/or were identified in cell lines surviving selection with multiple (2-6) independent viruses. several snora/ds implicated in our gene-trap studies reside within genes that have previously been shown to be important for viral infection. for example, bat1 [17], eif3a [18], hspa8 [19], rps11 [19], and rps8 [19] influence influenza a infection, and hiv infection may depend on rpl10a [20]. these data indirectly suggested that snora/ds might serve as a previously unknown class of cellular rnas important for viral replication. to test this hypothesis, we knocked down expression of a variety of snora/ds with sirna and examined the resulting cells for susceptibility to six different viruses (table 2).
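the standard-curve quantification described in the methods above (cloned amplicon standards serially diluted over eight logs of copy number) is commonly handled by fitting threshold cycle against log10 copy number and inverting the fit for unknown samples. the sketch below assumes that general approach; the ct values, the slope of -3.32, and the function names are hypothetical and are not taken from the study.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ct_values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


# hypothetical ten-fold standards covering 10^0 to 10^8 copies (eight logs)
standards = [10 ** k for k in range(0, 9)]
cts = [40.0 - 3.32 * k for k in range(0, 9)]  # idealised, noise-free Ct values
slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 2), round(intercept, 2))
print(round(copies_from_ct(23.4, slope, intercept)))
```

real standards carry measurement noise, so the quality of the fit would normally be checked before the curve is used to quantify samples.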
expression of nine snora/ds encoded within rcc1 (which also encodes the shorter non-coding snhg3 gene), or snhg1 was silenced with sirnas for 2 days prior to infection with either cowpox virus (cpv), dengue fever virus (dfv), influenza a (flu), human rhinovirus 16 (hrv16), herpes simplex virus 2 (hsv2), or respiratory syncytial virus (rsv). following a 2-h inoculation, cells were washed to remove inocula, and viral production in culture supernatants was measured by quantitative real time pcr using taqman assays annealing to viral genomic sequences. sirnas targeting the snora/ds tested were found to limit the capacity of viruses to replicate, with cells treated with eight of the nine sirnas failing to support replication of three or more viruses (table 2). while a previous study showed that rcc1 supports hsv1 replication [21], we did not observe that silencing rcc1 or the non-coding small nucleolar rna host gene 3 (snhg3) inhibits hsv2 (data not shown), whereas inhibiting expression of the rcc1-encoded snora73a did. rcc1 may be alternatively spliced such that some transcripts contain snhg3, and these data may be consistent with snhg3 serving minimal function in viral infection. as has been observed with snhg2 (gas5) [22], snhg1 is a non-coding gene whose intronic snords are stable and well-conserved between human and mouse, unlike their encoded exons [23]. thus, silencing of snords encoded within snhg1 may be expected to account for the observed viral inhibition, rather than the associated transcript. in gene-trap experiments with multiple viruses, rcc1-snhg3, snhg2, and taf1d were the most frequently interrupted genes harboring snora/ds. gene-trap insertion sites within the rcc1-snhg3 and snhg2 loci are shown in fig. 2. it should be noted that in non-biased selection of clones of murine embryonal stem cells infected with the same insertional vector (data not shown), rcc1-snhg3 was not found to be a hot spot for insertional events.
therefore, it is unlikely that a hot spot for vector insertion is the sole explanation for the 54 independently derived mutant rcc1-snhg3 clones selected in cells surviving infection with five different viruses (fig. 2; table 1). while sirna confirmation suggests a role for snora73a, it should also be considered that the rcc1 gene shares exons with snhg3, and that disrupting snhg3, perhaps in concert with disrupting snora73a, may confer the resistance phenotype. interestingly, the prominently represented genes snhg1, snhg2, and rcc1-snhg3 are members of the 5′-terminal oligopyrimidine (5′ top) gene family [22] [23] [24], which contain 4-15 oligopyrimidine tracts in their 5′ ends [25] that regulate transcription and translation under conditions of growth arrest [22, 26]. snhg2 has also recently been found to be associated with glucocorticoid receptor transcriptional regulation [27] and to inhibit cellular proliferation induced by the mammalian target of rapamycin (mtor) [28]. thus, there may be processes that work in concert with translational repression to affect viruses' capacity for replication. further work is needed to define whether a role in virus infection is unique to the snhg genes selected in clones resistant to multiple viruses. given that snords are encoded within abundantly expressed genes, such as ribosomal genes (table 1), high expression may be critical for their normal biological roles and for their role in viral infection.
to our knowledge, this study is the first to show that disrupting or silencing expression of cellular c/d or h/aca-class small nucleolar rnas inhibits viral replication. however, emerging data suggest their direct involvement in the process. for example, the epstein-barr virus genome encodes a snord (v-snorna-1), which is thought to target the viral polymerase [29].
in addition, a recent study showed that several small nucleolar rnas are differentially expressed following severe acute respiratory syndrome coronavirus and influenza virus infection [30]. indirect evidence suggests possible mechanisms whereby snoras and snords may support replication. several viruses including influenza, hiv-1, hsv2, and adeno-associated virus utilize the nucleolus (the cellular location of snoras and snords) during specific stages of their replication cycle [31] [32] [33] [34]. interestingly, snoras and snords are known to modify spliceosomal snrnas by pseudouridylation or methylation [35] [36] [37] [38] [39], which can be important for rna splicing [40] [41] [42] [43], and viral transcript splicing may depend upon host factors [44] [45] [46] [47] [48]. thus, it is possible that snoras and snords activate spliceosomal snrnas, which promotes the proper splicing and translation of essential viral proteins.
an alternative, though not mutually exclusive, hypothesis is that snords may function in 5′-cap viral rna maturation through 2′-o-ribose methylation to promote initiation of viral protein translation. in support of this hypothesis are observations that west nile virus, vesicular stomatitis virus, and dengue fever virus encode their own 2′-o-methyltransferases that modify 5′-cap structures accordingly, and that methyltransferase mutations are known to impair their replication [49] [50] [51]. influenza a lacks a 2′-o-methyltransferase; however, it utilizes a "cap snatching" mechanism to acquire 5′-caps from cellular mrnas [52], which are potentially methylated by snords. there are no reports that cellular methyltransferases effect 5′-cap methylation. interestingly, the 2′-o-methyl group in the cap of cellular mrnas also strongly influences its ability to act as primer for influenza virus rna transcription [53].
the discovery of these classes of non-coding genes prominently represented in the mutant clones selected in our virus-surviving cell lines suggests an importance of snoras and snords in facilitating viral replication. however, this study is not exhaustive in terms of the role each particular snora/d may have in viral infection. it is possible that the sirnas used in our experiments inhibit not only the non-protein-coding rnas but also the gene in which they reside. selective in situ mutations in the encoding sequences will need to be performed in order to confirm each as the gene conferring the phenotype.
it is anticipated that future studies will identify the precise role that snoras and snords play, providing a deeper understanding of the complex interplay at work during the viral-host standoff.
fig. 2 (a) gene disruptions within the rcc1, snhg3, snora73a, and snora73b loci observed in clonal cell lines resisting lytic viral infection. the snhg3 gene is a non-coding transcript that shares exons with the protein-coding rcc1 gene (highlighted in green). introns are designated with black text, whereas snhg3 exonic sequences not shared with rcc1 are shown in red. snora73a and snora73b are shown as the first and second intronic sequences highlighted in yellow, respectively. gene-trap insertions were observed within introns, as well as within each of the above-mentioned genes. insertion sites are shown with single letters highlighted in blue (human cell lines resisting dfv2, flu, hrv2, or hrv16 infection) or red (vero e6 monkey cells resisting hsv2 infection). (b) snhg2 and intervening snord sequences. gene-trap insertion sites conferring resistance to pathogens are highlighted in blue text. in some cases, identical clones were recovered from viral selection in independent studies with more than one virus (viruses shown in table 1). color coding in 5′ to 3′ orientation: maroon, snord44 and snord47; light orange, snords 74-81; light blue, snhg2 coding sequence; black, snhg2 introns; green, adjacent 5′ and 3′ genomic sequence.
three snords are within the coding sequence for gas5, namely snords 47, 76, and 80.
[1] identification of cellular promoters by using a retrovirus promoter trap
[2] identification of cellular proteins required for replication of human immunodeficiency virus type 1
[3] a functional role for adam10 in human immunodeficiency virus type-1 replication
[4] rab9 gtpase is required for replication of human immunodeficiency virus type 1, filoviruses, and measles virus
[5] inhibition of influenza a virus replication by antagonism of a pi3k-akt-mtor pathway member identified by gene-trap insertional mutagenesis
[6] effects of transforming growth factor-alpha (tgf-alpha) in vitro and in vivo on reovirus replication
[7] discovery of mammalian genes that participate in virus infection
[8] mutations in the igf-ii pathway that confer resistance to lytic reovirus infection
[9] oligomerization of clostridium perfringens epsilon toxin is dependent upon caveolins 1 and 2
[10] gene-trap mutagenesis identifies mammalian genes contributing to intoxication by clostridium perfringens epsilon-toxin
[11] the expanding snorna world
[12] non-coding rnas: lessons from the small nuclear and small nucleolar rnas
[13] converting nonsense codons into sense codons by targeted pseudouridylation
[14] snorna, a novel precursor of microrna in giardia lamblia
[15] small nucleolar rnas u32a, u33, and u35a are critical mediators of metabolic stress
[16] functional genomics in mice by tagged sequence mutagenesis
[17] cellular splicing factor raf-2p48/npi-5/bat1/uap56 interacts with the influenza virus nucleoprotein and enhances viral rna synthesis
[18] genome-wide rnai screen identifies human host factors crucial for influenza virus replication
[19] drosophila rnai screen identifies host genes important for influenza virus replication
[20] global analysis of host-pathogen interactions that regulate early-stage hiv-1 replication
[21] blocks to herpes simplex virus type 1 replication in a cell line, tsbn2, encoding a temperature-sensitive rcc1 protein
[22] classification of gas5 as a multi-small-nucleolar-rna (snorna) host gene and a member of the 5′-terminal oligopyrimidine gene family reveals common features of snorna host genes
[23] a mammalian gene with introns instead of exons generating stable rna products
[24] the host gene for intronic u17 small nucleolar rnas in mammals has no protein-coding potential and is a member of the 5′-terminal oligopyrimidine gene family
[25] top genes: a translationally controlled class of genes including those coding for ribosomal proteins
[26] comprehensive detection of human terminal oligo-pyrimidine (top) genes and analysis of their characteristics
[27] noncoding rna gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor
[28] inhibition of human t-cell proliferation by mammalian target of rapamycin (mtor) antagonists requires noncoding rna growth-arrest-specific
[29] expression and processing of a small nucleolar rna from the epstein-barr virus genome
[30] integrative deep sequencing of the mouse lung transcriptome reveals differential expression of diverse classes of small rnas in response to respiratory virus infection
[31] nucleolin is required for an efficient herpes simplex virus type 1 infection
[32] nuclear and nucleolar targeting of influenza a virus ns1 protein: striking differences between different virus subtypes
[33] ribozyme-mediated inhibition of hiv 1 suggests nucleolar trafficking of hiv-1 rna
[34] the hiv tat protein affects processing of ribosomal rna precursor
[35] nucleolar factors direct the 2′-o-ribose methylation and pseudouridylation of u6 spliceosomal rna
[36] rnomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger rnas in mouse
[37] a computational screen for mammalian pseudouridylation guide h/aca rnas
[38] modification of u6 spliceosomal rna is guided by other small rnas
[39] identification of 13 novel human modification guide rnas
[40] protein-free small nuclear rnas catalyze a two-step splicing reaction
[41] the use of simple model systems to study spliceosomal catalysis
[42] the spliceosome: a ribozyme at heart? biological chemistry
[43] characterization of the catalytic activity of u2 and u6 snrnas
[44] mutations at alternative 5′ splice sites of m1 mrna negatively affect influenza a virus viability and growth rate
[45] functional association between viral and cellular transcription during influenza virus infection
[46] modulating hiv-1 rna processing and utilization
[47] processing of adeno-associated virus rna
[48] experimental investigation of herpes simplex virus latency
[49] analysis of a structural homology model of the 2′-o-ribose methyltransferase domain within the vesicular stomatitis virus l protein
[50] mutagenesis of the dengue virus type 2 ns5 methyltransferase domain
[51] structure and function of flavivirus ns5 methyltransferase
[52] the cap-snatching endonuclease of influenza virus polymerase resides in the pa subunit
[53] both the 7-methyl and the 2′-o-methyl groups in the cap of mrna strongly influence its ability to act as primer for influenza virus rna transcription
acknowledgments: this work was supported by the red and bobby buisson foundation, and public health service small business innovation research (sbir) grant ai084705 from the division of aids, national institute of allergy and infectious diseases. dhr was supported by gifts from maggie chassman, the red and bobby buisson foundation, inc., zirus, inc., and the public health service. none of the funding sources influenced the study design, the collection, analysis, or interpretation of data, the preparation of this manuscript, or the decision to submit the article for publication. we also thank dr. h. earl ruley of vanderbilt university for critical review of the manuscript, and drs. natalie mcdonald and thomas hodge (zirus, inc.) for technical support. the authors declare no conflict of interest.
key: cord-331592-l44rupmi authors: wang, tzu-hao; chao, angel title: microarray analysis of gene expression of cancer to guide the use of chemotherapeutics date: 2007-09-30 journal: taiwanese journal of obstetrics and gynecology doi: 10.1016/s1028-4559(08)60024-8 sha: doc_id: 331592 cord_uid: l44rupmi summary the beauty of microarray analysis of gene expression (mage) is that it can be used to discover some genes that were previously thought to be unrelated to a physiologic or pathologic event. during the period from 1999 to 2007, applications of mage in cancer investigation have shifted from molecular profiling, identifying previously undiscovered cancer types, predicting outcomes of cancer patients, revealing metastasis signatures of solid tumors, to guiding the use of therapeutics. the roles of cancer genomic signatures have evolved through three phases. in the first phase, genomic signatures were described in stored cancer specimens and dubbed as molecular portraits of cancer. when gene expression profiles were carefully correlated with sufficient clinical information of cancer patients, new subgroups of cancers with distinct outcomes were revealed. in studies of the second phase, validation of cancer signatures was emphasized and commonly performed with independent groups of cancer specimens or independent data set. in the third phase, cancer genomic signatures have been further expanded beyond depicting the molecular portrait of cancer to predicting patient outcomes and guiding the use of cancer therapeutics. cancer genomic signatures have become an essential part of a new generation of cancer clinical trials. it is advocated that, in future clinical trials of cancer therapy, the cancer specimens of each participant should be tested for currently available predictor genomic signatures, so that the most effective treatment with the least adverse effects for each patient can be identified. then, participants can be triaged to an appropriate study group. 
a dna microarray is an orderly arrangement of dna on solid support, providing a medium for matching known and unknown dna samples [1] . the types of dna microarrays and relevant methodologies are reviewed by chao et al [2] . in this article, we briefly review the current advances in microarray analysis of gene expression (mage), focusing on recent reports of the microarray quality control (maqc) project and the shift of mage usage from molecular cancer profiling to clinical cancer therapeutics. gene expression profiling has generated and continues to generate extensive information on the molecular mechanisms of cellular function in particular tissues during physiologic or pathologic events. microarray analysis technology is a high throughput platform for gene expression profiling. the beauty of mage is that we can usually discover some genes that were previously not linked to certain physiologic or pathologic events. for instance, we have gained insight into the host response to sars infection [3] , tumor biology of various cervical cancer types [4] , molecular mechanisms in paclitaxel treatment of ovarian cancer [5] , and the intrinsic difference among the mesenchymal stem cells derived from distinct origins [6] . when dna microarrays are used to analyze similar tissues, gene expression profiles obtained from different studies used to be notoriously varied [7] , sometimes even conflicting. possible causes for the discrepancy include different assay platforms using different sequences to represent a particular gene, non-uniform coverage of gene sets, distinct data filtering strategies, various statistical stringency, as well as data complexity and variability [8] [9] [10] . the identification of differentially expressed genes in a studied condition with dna microarray analysis is often determined by the criteria set by the investigators. 
therefore, concerns have been raised regarding the reliability of microarray results due to the varied and often conflicting reports [11, 12]. to address this concern, a collaborative effort led by the united states food and drug administration that included 137 scientists from 51 organizations representing academia, industry, and the us government has completed the maqc project [13] [14] [15] [16] [17] [18]. in this project, identical specimens were aliquoted and assigned to participating laboratories for analysis on different microarray platforms, including those manufactured by applied biosystems, affymetrix, agilent, and ge healthcare. to validate the quantitative capability of microarrays, microarray results were compared with real-time quantitative polymerase chain reaction (pcr). the correlation between affymetrix gene expression results and taqman real-time quantitative pcr results has shown good linearity (r² = 0.95) [15]. a fold-change ranking method with a p-value cutoff < 0.05 has recently been shown to be reproducible in selecting the signature gene list from results using different microarray platforms [18]. these selection criteria have been shown to be more reproducible than t-test p value or significance analysis of microarrays [18]. we have applied this method in selecting the signature gene expression profiles with ease; after filtering using p < 0.05, we ranked genes by fold change and chose the top 25 genes that were upregulated in each group of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and adult bone marrow [6]. collectively, because of the remarkable improvement of microarray technology and the aforementioned critical evaluations, the majority of microarray researchers recognize the reliability and consistency of well-designed and carefully conducted microarray results.
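the selection rule described above (filter at p < 0.05, rank by fold change, keep the top upregulated genes) can be sketched in python as follows. this is a minimal sketch under stated assumptions: the gene names, fold changes, and p values are invented for illustration, fold change is assumed to be expressed as a ratio (so that values above 1.0 mean upregulation), and the function name is hypothetical.

```python
def select_signature(genes, p_cutoff=0.05, top_n=25):
    """Return the names of the top upregulated genes.

    genes: iterable of (name, fold_change, p_value) tuples, where
    fold_change is a ratio (> 1.0 means upregulated in the group of interest).
    """
    # Step 1: keep only genes that pass the p-value filter and are upregulated.
    passing = [g for g in genes if g[2] < p_cutoff and g[1] > 1.0]
    # Step 2: rank the survivors by fold change, largest first.
    ranked = sorted(passing, key=lambda g: g[1], reverse=True)
    # Step 3: keep the top_n names as the signature.
    return [name for name, _, _ in ranked[:top_n]]


# Hypothetical input, for illustration only.
example_genes = [
    ("gene_a", 8.0, 0.001),
    ("gene_b", 12.0, 0.20),   # large fold change, but fails the p-value filter
    ("gene_c", 3.5, 0.01),
    ("gene_d", 0.4, 0.002),   # passes the filter, but is downregulated
    ("gene_e", 5.0, 0.049),
]
signature = select_signature(example_genes, top_n=2)
```

the p value acts only as a gate here; the ordering of the signature comes entirely from the fold change, which is the property the maqc comparison found to be reproducible across platforms.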
even in the 2 years before the publication of the maqc project, the clinical and biologic findings derived from microarrays were regarded to be "remarkably robust, with a high level of quantitative precision" [19, 20] . the recent maqc results further demonstrate that microarray gene expression analysis itself is suitable as a stand-alone quantitative comparison [17] . nevertheless, we should not ignore potential flaws. all the encouraging results of the maqc project only establish that microarray technology is robust, but they do not imply that the technology is foolproof. quoted from a commentary in the november 17, 2006 issue of cell, "you can learn to do pcr well in a month. but with microarrays, it can take years." [13] . to evaluate how mage can help to make a diagnosis or choose a therapy, researchers use one set of patients to identify a gene-expression pattern called a genetic signature that can correspond to a clinical issue, such as a 5-year survival rate, the response to a treatment, or the induction of side effects by a drug. the power of microarray technology is its ability to use changes in multiple genes as the pattern of gene expression rather than to choose thresholds of individual markers [19] . this genetic signature is then validated on other groups of patients [13] . during this trial period, it is critical that investigators understand how to minimize expression noise and bias through effective design. expression noise can be defined as gene expression variation that does not correlate with the biology or behavior being studied and is introduced by both the technology itself and/or during tissue processing [21] . bias is not inherent to microarray analysis but is easily introduced by faulty experimental design [21] . a series of sophisticated analytical strategies to address these problems have been discussed [19, 22, 23] , as summarized in table 1 . 
an unsupervised analysis does not use any a priori class definition; it simply seeks to determine what structure is inherent in the data [19]. a commonly used example of unsupervised analysis is hierarchical clustering, i.e. letting the data define its own patterns by clustering genes that are most similar in expression profile [24]. a supervised analysis is more likely to reveal putative associations between genes and the cytogenetic class, but it may bias the outcome by forcing a model onto the data, i.e. the "overfitting" risk [19].
to extract robust profiles from multiple data sets, a meta-analysis has been done on 40 independent data sets derived from more than 3,700 array experiments, identifying 36 cancer signatures that were activated in cancer relative to the normal tissue from which the cancer arose [25]. a meta-analysis of these signatures further identified 67 genes that were activated in 12 or more signatures, suggesting a common transcriptional program pervading most types of cancer [22].
in functional enrichment analysis, external functional information is used to interpret and summarize large cancer signatures [22]. databases of external functional information include gene ontology (www.geneontology.org) [26], kyoto encyclopedia of genes and genomes (www.genome.jp/kegg/), biocarta (www.biocarta.com), and genmapp (www.genmapp.org). commercially maintained integrative databases and software include metacore by genego (www.genego.com/metacore.php) and ingenuity systems (www.ingenuity.com). we have recently used the metacore suite to analyze the signature profiles of mesenchymal stem cells of various origins and obtain insights into biologic processes of each group [6].
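the idea behind functional enrichment analysis can be illustrated with a small python sketch. it asks how surprising it is that a signature contains a given number of genes from one functional category. the hypergeometric test used here is a common choice and is an assumption of this sketch; the text does not state which statistic the cited databases or commercial tools apply, and the numbers below are invented.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, category_size, signature_size, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    universe_size:  number of genes measured on the array
    category_size:  number of those genes annotated to the functional category
    signature_size: number of genes in the signature
    overlap:        number of signature genes that fall in the category
    """
    max_overlap = min(category_size, signature_size)
    total = comb(universe_size, signature_size)
    tail = sum(
        comb(category_size, k)
        * comb(universe_size - category_size, signature_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 genes on the array, 5 annotated to a category,
# a 4-gene signature, 3 of which belong to the category.
p_enriched = hypergeom_enrichment_p(20, 5, 4, 3)
```

in practice one such test is run per category, so the resulting p values need a multiple-testing correction before any category is called enriched.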
the goal of transcriptional network analysis is to simplify a complex cancer signature to a small number of activated transcriptional programs that may shed light on neoplastic mechanisms and further point to potential targets of therapeutic intervention [27]. in addition to the aforementioned functional enrichment analysis, in which many of the downstream effectors are transcription factors, chromatin immunoprecipitation coupled with promoter microarrays (chip-chip assays) allows for genome-wide identification of transcription factor-binding sites [28, 29]. with hundreds of consensus binding sequences for transcription factors, which have been defined by sequence-based methods, it is feasible to perform a large-scale integrative analysis of binding-site profiles and cancer signature expression profiles [22].
analysis of expression modules, in which functional pathways are treated as gene modules, was proposed to extend the investigation of cancer gene expression from individual genes to biologic processes [23]. when this concept of higher-level modules was applied to examine the joint behavior of differentially expressed genes in diabetic muscle, a significant change in the whole set of genes was noted, even though the expression of individual genes was not significantly different [30]. segal et al used this module-level analysis to obtain a global view of the shared and unique molecular modules underlying human cancer [31]. they demonstrated that activation or repression of some modules (e.g. cell cycle) was shared across multiple tumor types and could be regarded as a general feature of tumorigenesis, whereas others (e.g. growth-regulatory modules) were more specific to tissue origin or progression of particular tumors [23].
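the observation that a whole gene set can change significantly while no single gene does can be reproduced with a toy python sketch. the scoring rule (mean difference of group means across the module) and the exact permutation test are simplifying assumptions chosen for illustration; they are not the statistics used in the cited studies, and the expression values are invented.

```python
from itertools import combinations


def module_score(expr, module, group_a, group_b):
    """Mean over module genes of (mean in group_a - mean in group_b)."""
    diffs = []
    for gene in module:
        values = expr[gene]
        mean_a = sum(values[i] for i in group_a) / len(group_a)
        mean_b = sum(values[i] for i in group_b) / len(group_b)
        diffs.append(mean_a - mean_b)
    return sum(diffs) / len(diffs)


def exact_permutation_p(expr, module, group_a, group_b):
    """Two-sided p value over every relabelling of samples into same-sized groups."""
    samples = sorted(group_a + group_b)
    observed = abs(module_score(expr, module, group_a, group_b))
    hits = total = 0
    for perm_a in combinations(samples, len(group_a)):
        perm_b = [s for s in samples if s not in perm_a]
        total += 1
        if abs(module_score(expr, module, list(perm_a), perm_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical module of four genes; samples 0-3 are one condition, 4-7 the other.
# Each gene is raised in three of the four samples of the first condition only.
expr = {
    "g1": [1.0, 1.0, 1.0, -0.5, 0.0, 0.0, 0.0, 0.0],
    "g2": [1.0, 1.0, -0.5, 1.0, 0.0, 0.0, 0.0, 0.0],
    "g3": [1.0, -0.5, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "g4": [-0.5, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
}
group_a, group_b = [0, 1, 2, 3], [4, 5, 6, 7]

single_gene_p = exact_permutation_p(expr, ["g1"], group_a, group_b)
module_p = exact_permutation_p(expr, ["g1", "g2", "g3", "g4"], group_a, group_b)
```

with these toy values each gene alone gives p = 10/70, whereas the four genes scored together give p = 2/70, because the noise in individual genes averages out at the module level.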
applications of mage in clinical cancer investigation have shifted from molecular profiling in the year 1999 [32, 33], identifying previously undiscovered subgroups of a particular type of cancer [34], predicting outcomes of cancer patients in 2002 [35, 36], and revealing a metastasis signature of solid tumors [37], to guiding the use of therapeutics in 2006 [38], as summarized in table 2. the use of mage as a guide for cancer therapeutics has also been compared in meta-analyses in large b-cell lymphoma [39].
in a leukemia data set of 38 bone marrow samples (27 acute lymphoblastic leukemia and 11 acute myeloblastic leukemia), golub et al tested whether gene expression monitoring by dna microarray could be used to assign tumors to known classes (class prediction) [32]. using a supervised learning classification algorithm, golub et al first constructed class predictors and then evaluated them using cross-validation with the same collection of specimens with known outcomes. their results suggested a general strategy for discovering and predicting cancer classes, which proved useful in predicting outcomes in patients with other tumor types [36].
as a proof of principle, perou et al used cdna microarrays to identify genes of differential expression between in vitro cultured human mammary epithelial cells and breast tumor specimens [40]. their results supported the feasibility and usefulness of this systematic approach for studying variations in gene expression patterns in human cancers as a means to dissect and classify solid tumors [40]. then, 64 surgical specimens of human breast tumors from 42 patients were analyzed for gene expression profiles [33]. the authors identified a set of co-expressed genes for which variation in mrna levels could be related to specific features of physiologic variation. molecular portraits of cancer with gene expression profiles were thus proposed [33].
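the "build a class predictor, then cross-validate it on the same specimens" strategy described above can be sketched in python. a nearest-centroid classifier is used here as a simple stand-in and is an assumption of this sketch; it is not the classification algorithm of the cited study. the two-gene expression values and all function names are hypothetical.

```python
def centroid(rows):
    """Per-gene mean expression across a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the class whose centroid is closest (squared distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        c = centroid(rows)
        dist = sum((a - b) ** 2 for a, b in zip(sample, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical expression of two marker genes in six samples.
xs = [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5], [1.0, 5.0], [1.5, 6.0], [0.5, 5.5]]
ys = ["all", "all", "all", "aml", "aml", "aml"]

loo_accuracy = leave_one_out_accuracy(xs, ys)
new_call = nearest_centroid_predict(xs, ys, [4.0, 2.0])
```

the held-out sample must be excluded before the predictor is built, as done inside the loop; selecting genes or fitting centroids on all samples first would inflate the cross-validated accuracy.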
diffuse large b-cell lymphoma (dlbcl) is one disease in which further subgrouping by histology is difficult because of inter-and intra-pathologist irreproducibility [41] . using hierarchical clustering for mage profiling, 40 dlbcl specimens could be divided into two distinct groups: 19 cases of germinal center b-like dlbcl, and 21 cases of activated b-like dlbcl [34] . statistically, patients with germinal center b-like dlbcl had a better overall survival than those with activated b-like dlbcl. the molecular classification of tumors based on gene expression profiles has thus proved its ability to identify previously undetected and clinically significant subtypes of cancer [34] . even in the same stage of disease, breast cancer is notorious for its unpredictable response to chemotherapy and variable overall outcome. chemotherapy [42] or hormonal therapy [43] reduces the risk of distant metastasis by about a third. however, because of the lack of an accurate patient triage strategy to determine who should or should not undergo adjuvant therapy, many patients who might not develop cancer metastasis at all have unnecessarily undergone adjuvant therapy. to develop patient-tailored therapy strategies for breast cancer, van't veer et al used supervised classification to analyze dna microarray data on primary breast tumors of 117 young patients [35] . van't veer et al identified a poor prognosis signature, by which they could predict a short time interval to distant metastasis. therefore, these results provide a strategy to select patients who would benefit from adjuvant therapy. in search of the molecular metastasis signature of cancer, ramaswamy et al compared gene expression patterns between primary tumors and metastases [37] . they identified a gene expression signature that distinguished primary from metastatic adenocarcinomas. 
however, the authors found that a subset of primary tumors resembled metastatic tumors, and they further confirmed this finding by applying the expression signature to data on 279 primary solid tumors of diverse types. notably, ramaswamy et al analyzed whole tumor tissues including surrounding stromal cells, instead of a pure cancer cell population that could be isolated using laser capture microdissection. in the 17-gene metastasis signature identified in that study, two collagen genes and one lamin gene were upregulated, suggesting that malignancy is the product of the tumor-host microenvironment [44]. two articles reported on the prognostic usefulness of gene expression profiles in acute myeloid leukemia (aml) [19, 20]. these studies also demonstrated the ability of microarray technology to use somewhat imprecise patterns of gene expression rather than exact thresholds of individual markers [19]. currently, many prestigious journals, such as science and the nature series, ask authors to deposit their microarray data in a minimal information about a microarray experiment (miame)-compliant form to one of two public repositories: gene expression omnibus at the national center for biotechnology information (http://www.ncbi.nlm.nih.gov/geo/) and arrayexpress at the european bioinformatics institute (http://www.ebi.ac.uk/arrayexpress) as a prerequisite of publication of microarray-derived research articles. therefore, many large-scale microarray data sets became available for re-analyses by other researchers worldwide. for instance, multiple important papers of breast cancer gene expression profiles [35, 48-50] have been derived from the same comprehensive microarray data set with sufficient clinical information [51]. using several microarray data sets [34, 36, 52], lossos et al did a meta-analysis of dlbcl and selected 36 genes for further analysis with real-time quantitative pcr on independent samples of lymphoma from 66 patients [39].
the six genes that showed the strongest predictive power were lmo2, bcl6, fn1, ccnd2, scya3, and bcl2. after testing these genes in two additional independent microarray data sets, lossos et al concluded that measurement of the expression of these six genes was sufficient to predict overall survival in dlbcl [39]. the use of microarray gene expression profiles in predicting outcomes of cancer patients has been validated in the aforementioned studies. however, even when the same group of samples was analyzed, distinct prognostic profiles have been derived for outcome prediction [35, 48-51]. to resolve this paradox, fan et al compared the prediction powers of the gene sets for the same group of specimens by applying five gene-expression-based models: intrinsic subtypes, 70-gene profile, wound response, recurrence score, and the two-gene ratio [53]. by performing kaplan-meier survival analyses of 295 patients with breast cancer, fan et al found that four of the five models tested showed significant agreement in the outcome prediction for individual patients. the only exception was the model using the two-gene ratio, which did not yield a reliable prediction [53]. the explanation for the surprising concordance among the four different models in predicting breast cancer outcomes is likely to be as follows: there was a large group of genes, which behaved differently and were related to biologic phenotypes of cancer, and ultimately, the patients' outcomes. from this large list of genes, each of the four models might have used only some to construct the signature profiles used for predicting outcomes. to dissect oncogenic pathway signatures in human cancer, bild et al used adenoviral vectors to express various oncogenic activities, such as myc, ras, e2f3, src, and β-catenin in otherwise quiescent cells.
applying this strategy, they were able to specifically isolate the subsequent events as defined by the activation/deregulation of that single oncogenic pathway, and analyze these events with affymetrix oligonucleotide microarrays [54] . in clinical samples of lung, breast and ovarian cancers, bild et al combined signature-based predictions across several pathways and identified coordinated patterns of pathway deregulation that distinguish between specific cancer and tumor subtypes [54] . using pathway-specific inhibitors such as ras pathway inhibitors, either farnesyltransferase inhibitor (l-744832) or farnesylthiosalicylic acid, and the src pathway inhibitor (su6656), bild et al could predict drug sensitivity of tested breast cancer cell lines according to their pathway deregulation. in summary, predictions of pathway deregulation in cancer cells can also predict the sensitivity to therapeutic agents that target components of the pathway. these results pave the path of using these oncogenic pathway signatures to guide the use of target therapeutics. because of the enormous complexity of cancer and a frequent inability to properly guide the use of available therapeutics, chemotherapy for solid tumors often results in marginal success. most people with advanced solid tumors will relapse and die of their disease [55] . furthermore, oncologists have always had to face the challenge of matching the right therapeutic regimen with the right individual, balancing relative benefit with risk to achieve the best outcome. 
with the goal of using genomic signatures to guide the use of chemotherapeutics, potti et al systematically extracted gene expression profiles from the following microarray data sets: nci-60 cancer cell lines [56] , additional 30 cancer cell lines [57] , the authors' previously reported breast, ovarian and lung cancer specimens [54] , and their newly analyzed 13 ovarian cancer cell lines and 119 advanced (figo stages iii or iv) serous epithelial ovarian cancers [38] . potti et al demonstrated patterns of predicted sensitivity of three human solid cancers (breast, lung, ovary) to seven common chemotherapeutic drugs (5-fluorouracil, paclitaxel, docetaxel, adriamycin, topotecan, cyclophosphamide, etoposide). to evaluate how individual signatures respond to a combination of drugs, potti et al also analyzed 51 breast cancer patients who were in a breast neoadjuvant treatment study that used a combination of paclitaxel, 5-fluorouracil, adriamycin and cyclophosphamide (tfac). the predicted response that was based on a combined probability of sensitivity built from the individual chemosensitivity predictions yielded a statistically significant distinction between responders and nonresponders [38] . as summarized in the figure, the roles of cancer genomic signatures have evolved through three phases. in the first phase, genomic signatures were described in stored cancer specimens and dubbed as molecular portraits of cancer [33] . when gene expression profiles were carefully correlated with sufficient clinical information of cancer patients, new subgroups of cancers with distinct outcomes were revealed [34] . in studies of the second phase, validation of cancer signatures was emphasized and commonly done with independent groups of cancer specimens or independent data sets [22, 25, 35, 39] . in the third phase, cancer genomic signatures have been expanded beyond depicting the molecular portrait of cancer to predicting patient outcomes [45, 46] , including metastasis [37] . 
it has become a rule, in the third phase, that all of the prognostic genomic signatures be validated in additional data sets. potti et al further demonstrated the role of cancer genomic signatures as a guide to the use of cancer therapeutics [38] . a new generation of cancer clinical trials was proposed, in which the cancer specimens of each participant should be tested for currently available predictor genomic signatures so that the most effective treatment and the least adverse effects for each patient could be identified. then, participants can be triaged to an appropriate study group (figure) . in fact, successful examples in treating patients with early-stage non-small cell lung cancer were reported by potti et al [59] and in treating those with advanced-stage ovarian cancer by dressman et al [60] . it is commonly argued against microarray results by the fact that a transcriptome does not necessarily reflect the corresponding proteome, the collection of proteins that execute the majority of cellular functions. indeed, mrna expression is only a coarse surrogate for protein activation levels. for many genes, however, mrna expression is a useful surrogate [23] . as documented in many studies, when one finds strong signals of differential expression, these are typically reflected later at the protein levels. the latter can be validated by protein assays such as enzyme-linked immunosorbent assays [5] or immunohistochemistry [4] . at the current developmental pace of genomic technology, the clinical trend towards personalized medicine is almost certain (figure) . many biomedical researchers and clinicians predict that microarray technology will be incorporated into clinical laboratories in hospitals in the near future. 
as extrapolated from the results of the studies discussed in this article, the use of a "focused array" to measure the expression of 50 to 100 genes in the signature profile for a selected disease would help clinicians predict the patient's response to a drug, triage patients into a chemo-responsive versus chemo-resistant group, and evaluate a panel of risk factors that may result in comorbidity of the patient. to achieve this practical feasibility, microarray technology would need to address a range of quality-control issues. every aspect of the process should be so robust that it can be considered to be foolproof. it is predicted that within the next 10 years, microarray developers will meet these challenges [13].
figure. three phases of genomic signatures in cancer therapy and a new generation of clinical trials. this schema is adapted from herbst and lippman [58] and potti et al [38].
table 2.
molecular classification of cancer using supervised machine learning (golub et al)
molecular profiling of breast cancer (perou et al)
identification of subgroups of diffuse large b-cell lymphoma with different outcomes (alizadeh)
prediction of clinical outcomes of breast cancer (van't veer)
identification of metastasis signature that reflected both contributions of the tumor and the host environment (ramaswamy et al [37])
identification of prognostic profiles of adult acute myeloid leukemia (bullinger et al, 2004)
using independent samples of lymphoma to test a meta-analysis derived signature (lossos)
establishment of cdna microarray analysis at the genomic medicine research core laboratory (gmrcl) of chang gung memorial hospital overview of microarray analysis of gene expression and its applications to cervical cancer investigation molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (sars-cov) molecular characterization of adenocarcinoma and squamous carcinoma of the uterine cervix using microarray analysis of gene expression paclitaxel
(taxol) upregulates expression of functional interleukin-6 in human ovarian cancer cells through multiple signaling pathways functional network analysis on the transcriptomes of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and bone marrow evaluation of gene expression measurements from commercial microarray platforms comprehensive comparison of six microarray technologies redefinition of affymetrix probe sets by sequence overlap with cdna microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements a study of interlab and inter-platform agreement of dna microarray data comment on "'stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature getting the noise out of gene arrays arrays of hope evaluation of external rna controls for the assessment of microarray performance evaluation of dna microarray results with quantitative gene expression platforms performance comparison of one-color and two-color platforms within the microarray quality control (maqc) project for maqc consortium. the microarray quality control (maqc) project shows inter-and intraplatform reproducibility of gene expression measurements rat toxicogenomic study reveals analytical consistency across microarray platforms microarrays and clinical investigations gene-expression profiling in acute myeloid leukemia noise and bias in microarray analysis of tumor specimens integrative analysis of the cancer transcriptome from signatures to models: understanding cancer using microarrays cluster analysis and display of genome-wide expression patterns large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression for gene ontology consortium. 
the gene ontology (go) database and informatics resource mining for regulatory programs in the cancer transcriptome isolating human transcription factor targets by coupling chromatin immunoprecipitation and cpg island microarray analysis microarray analysis of gene expression of cancer control of pancreas and liver gene expression by hnf transcription factors pgc-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes a module map showing conditional activity of expression modules in cancer molecular classification of cancer: class discovery and class prediction by gene expression monitoring molecular portraits of human breast tumours distinct types of diffuse large b-cell lymphoma identified by gene expression profiling gene expression profiling predicts clinical outcome of breast cancer diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning a molecular signature of metastasis in primary solid tumors genomic signatures to guide the use of chemotherapeutics prediction of survival in diffuse large-b-cell lymphoma based on the expression of six genes distinctive gene expression patterns in human mammary epithelial cells and breast cancers a clinical evaluation of the international lymphoma study group classification of non-hodgkin's lymphoma. the non-hodgkin's lymphoma classification project polychemotherapy for early breast cancer: an overview of the randomised trials. early breast cancer trialists' collaborative group tamoxifen for early breast cancer: an overview of the randomised trials. 
early breast cancer trialists' collaborative group cancer's deadly signature use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia prognostically useful gene-expression profiles in acute myeloid leukemia prediction of cancer outcome with microarrays: a multiple random validation strategy gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications a gene-expression signature as a predictor of survival in breast cancer scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds for the lymphoma/ leukemia molecular profiling project. the use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma concordance among geneexpression-based predictors for breast cancer oncogenic pathway signatures in human cancers as a guide to targeted therapies twenty-two years of phase iii trials for patients with advanced non-small-cell lung cancer: sobering results chemosensitivity prediction by transcriptional profiling gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations molecular signatures of lung cancer-toward personalized therapy a genomic strategy to refine prognosis in early-stage non-small-cell lung cancer an integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer the authors like to thank dr yun-shien lee key: cord-290861-5bxvenue authors: ashwell, m.; freire, m.; o’nan, a.t.; benito, j.; hash, j.; mcculloch, r.s.; lascelles, b.d.x. 
title: characterization of gene expression in naturally occurring feline degenerative joint disease-associated pain date: 2018-11-19 journal: vet j doi: 10.1016/j.tvjl.2018.11.008 sha: doc_id: 290861 cord_uid: 5bxvenue degenerative joint disease (djd) associated-pain is a clinically relevant and common condition affecting domesticated cats and other species including humans. identification of the neurobiological signature of pain is well developed in rodent pain models, however such information is lacking from animals or humans with naturally occurring painful conditions. in this study, identification of housekeeping genes (hkg) for neuronal tissue and expression levels of genes considered associated with chronic pain in rodent models were explored in cats with naturally occurring osteoarthritic pain. fourteen adult cats were evaluated — seven without clinical signs of osteoarthritic pain, and seven with hind limb radiographic djd and pain. expression of an investigator-selected set of pain signaling genes (including asic3, atf3, cox2, cx3cl1, nav1.7, nav1.8, nav1.9, ngf, nk1r, tnfα, trka) in lumbar spinal cord dorsal horn and lumbar dorsal root ganglia tissues from clinically healthy cats and cats with djd were studied using quantitative rt-pcr (qpcr). hkg identified as the most stable across all tissue samples were many of the ribosomal protein genes, such as rpl30 and rps19. qpcr results showed atf3 and cx3cl1 up-regulated in djd-affected dorsal root ganglia compared to clinically healthy controls. in spinal cord, cx3cl1 was up-regulated and ngf was down-regulated when djd-affected samples were compared to healthy samples. further work is needed to understand the neurobiology of pain in naturally occurring disease and what rodent models are predictive of these changes in more heterogeneous populations such as domestic cats. 
feline degenerative joint disease (djd) can be associated with mobility impairment in feline patients (gruen et al., 2014, 2015; lascelles, 2010). this impairment is believed to be due to pain associated with the djd, as mobility appears to be increased when pain relief is provided (gruen et al., 2014, 2015; lascelles, 2010). the approach to pain management can be divided into two basic approaches: to try analgesics known or thought to be effective in other conditions or other species, or to base the analgesic selection on a rational evaluation of the neurobiological changes present in the target disease state in the target species. this latter approach could be described as making rational analgesic choices based on the neurobiological signature of the pain. presently, nothing is known about the neurobiology of feline djd pain. one approach to understanding the neurobiology of pain is to look at differences in gene expression between normal and phenotypically well-defined diseased states. limited work has been performed along these lines in dogs. hegemann et al. (2005) measured gene expression levels of interleukin (il)-1α, il-1ß, il-2, il-4, il-6, il-8, il-10, il-12, il-18, interferon gamma (if-g), transforming growth factor beta (tgfß), and tumor necrosis factor alpha (tnfα) in synovial fluid collected from dogs with osteoarthritis and immune-mediated polyarthritis using semi-quantitative real-time pcr methods. they reported differences in gene expression levels for some of these genes when they compared affected and unaffected dogs, but did not relate the findings to the presence of pain, an important consideration given that common measures of disease, such as radiographic features, do not correlate well with pain levels.
other investigators have evaluated the changes in gene expression in canine osteoarthritic cartilage (clements et al., 2006, 2009), but again, the results were not evaluated against whether or not there was a pain state present. quantitative real-time pcr (qpcr) studies have evaluated expression levels of various cytokines in feline allergic skin disease (taglinger et al., 2008), feline chronic gingival stomatitis (harley et al., 1999) and feline coronavirus infection (gelain et al., 2006). however, no studies have been performed in cats with djd to determine gene expression changes in relation to pain. we were interested in understanding the gene expression changes at crucial points in the nociceptive signal (pain) generation and transmission pathway in the dorsal root ganglia (drg) and the dorsal horn (dh) of the spinal cord. in this study, we first evaluated a panel of potential housekeeping genes to identify the most stable reference genes in djd and healthy cats in drg and dorsal horn, as this had not been previously done. after the most stable reference genes were identified, a selection of genes previously associated with nociception in rodent models, and of interest to the authors, were examined using qpcr in the same samples to allow us to start characterizing the neurobiological signature of pain associated with djd in cats. identification of stable reference genes in pain-related tissues in felines is important because of the cat's use as a potential animal model for human conditions and the prevalence of aging cats in veterinary medicine. this study was approved by the ncsu institutional animal care and use committee (iacuc; approval no. 10-133-o; approval date: 27 october 2010). fourteen domestic cats euthanased at a local animal shelter, and of known pain status via pre-euthanasia examinations, were used for sample collection.
the investigators evaluated the cats prior to euthanasia to determine whether they had pain associated with hind limb joints. cats were euthanased with an overdose of barbiturates for reasons unrelated to this study (population control) and the investigators had no input into which cats were euthanased. immediately after euthanasia, orthogonal digital radiographs of the lumbar axial skeleton and appendicular joints of the hind limbs were performed and evaluated for the presence of radiographic signs indicative of naturally occurring djd as described previously (lascelles, 2010). the tarsus, stifle and hip joints were opened and visually inspected for evidence of macroscopic lesions indicative of djd as described previously (freire et al., 2011) and scored as described previously. the spinal canal from the last thoracic vertebra to the sacrum was opened and evaluated for macroscopic lesions indicative of intervertebral disc degeneration with disk prolapse and spinal cord compression. in summary, cats were considered 'pain positive' if there were signs of discomfort on manipulation of hind limb joints; there was gross evidence of djd in those joints; there was muscle atrophy in the hind limbs that was not seen elsewhere in the body; and there were no grossly visible indications of intervertebral disk prolapse or spinal cord compression. seven cats were considered free of musculoskeletal pain on pre-euthanasia evaluation, and free of naturally occurring appendicular joint djd and spinal djd, and were included in the normal group (no abnormalities seen on digital radiographs and on macroscopic inspection of appendicular joints and axial skeleton). seven cats were included in the hind limb djd group and showed radiographic signs indicative of moderate-severe djd in one or multiple appendicular joints of the hind limbs. no evidence of spinal compressive lesions was present.
the lumbar spine from mid body of the third lumbar vertebra to mid body of the fifth lumbar vertebra (lumbar intumescence) was harvested and drg of the fourth, fifth and sixth spinal segments from both sides were dissected and individually stored. right and left dh of the spinal cord segment harvested were dissected and individually stored (after initial exposure to rnalater for at least 48 h). dissection was performed using a dissecting microscope (olympus szx16). looking at the transverse cut surface of the spinal cord, the white and grey matters can be differentiated and the dorsal horn can easily be identified. the dorsal and ventral aspects of the spinal cord are identifiable by location of the ventral median fissure and the cranial and caudal aspects are determined by looking at the orientation of the nerve rootlets. the dura mater was opened and the spinal cord was divided into right and left segments following the ventral median fissure. another cut placed just dorsal to the central canal allowed the dorsal horn to be separated from the ventral. this cut was performed carefully to assure separation by identification of the grey matter corresponding to the dorsal horn on the transverse cut surface of the spinal cord (supplementary figs. s1-s4). the same drg and dh segments of the spinal cord were sampled in normal cats for comparison. all tissue samples were stored in rnalater (qiagen) at −4 °C for 24 h and at −20 °C thereafter until sample processing. total rna from drg and dh was extracted using the qiagen rneasy kit and an on-column dnase digestion to remove genomic dna. the manufacturer's protocol was followed through the chloroform extraction but instead of precipitating the rna in 100% ethanol, 70% ethanol was added and the mixture applied to the rneasy (qiagen) column, following the rneasy kit instructions through the end of that manufacturer's protocol.
quantity of the extracted rna was measured on a nanodrop spectrophotometer (thermo scientific, wilmington, de) and quality was evaluated by running the extracted rna on a 1.2% agarose gel to determine the integrity of the 28s and 18s ribosomal subunits. only samples showing a 28s:18s rrna band intensity ratio of approximately 2 (as determined visually) were used in this study. a high capacity cdna reverse transcription kit (applied biosystems inc., foster city, ca) was used to reverse transcribe 500 ng of total rna per the manufacturer's protocol. subsequently, reactions were diluted 1:20 to provide enough template for all genes to be evaluated. primer sequences for gusb, hmbs, rpl17, rps7, rps19 and ywhaz were obtained from penning et al. (2007). the remaining housekeeping and target gene primers were designed using beacon designer software (premier biosoft intl., palo alto, ca) to be compatible with sybr green i and to span an intron in order to detect genomic contamination. pcr primer sequences for the remaining housekeeping genes and pain-related target genes are given in table 1. quantitative pcr was performed in a volume of 20 µl, consisting of 1 µl of diluted cdna, 400 nmol/l of forward and reverse primers, 10 nmol/l fluorescein, and 1× power sybr green master mix (applied biosystems). a three-step amplification protocol was performed in an icycler iq (bio-rad). each reaction included one cycle at 95 °C for 7 min, followed by 40 cycles of 30 s at 95 °C, 30 s at 52 °C-62 °C for annealing, and extension for 30 s at 72 °C. specificity of each reaction was assessed by melt curve analysis (80 cycles starting at 55 °C with an increase of 0.5 °C every cycle, with a dwell time of 10 s) and one amplicon from each primer pair was dna sequenced to confirm identity. reactions were performed in duplicate, ct values were averaged for the replicates and negative controls were included to detect possible contamination.
any duplicate measurements more than 0.5 ct apart were repeated for that sample or removed from the analysis. standard curves were evaluated for each primer pair by combining equal amounts of cdna from each specimen into a pool. the pool was then diluted 1:3, 1:9, 1:27, 1:81 and 1:243. dilutions were evaluated in duplicate to calculate amplification efficiencies, which ranged from 90% to 110%, depending on the type of tissue and primer pair. bestkeeper (pfaffl et al., 2004), genorm (vandesompele et al., 2002) and normfinder (ohl et al., 2006) were used to evaluate the potential reference genes and select the most stable in these particular tissues. fold changes between healthy and djd-affected samples were computed via normalization to the geometric mean of the selected housekeeping genes. the fold change calculations incorporated corrections for reaction efficiency using the method of pfaffl et al. (2002). changes in the expression of healthy and djd samples were evaluated for statistical significance using a linear mixed model derived from the work of steibel et al. (2009). the bayesian information criterion (bic) was used to select the best parameters to include in the model (those contributing to the lowest bic score). fixed factors included health status (healthy vs. djd), reproductive status (sterilized vs. intact), and radiographic score for djd. random factors included variables for position of the sample (left vs. right), age, and sex of the cats. the model was established with duplicate samples from a given cat nested within the cat. all analysis was conducted with spss software (version 24.0, ibm) and an alpha (α) of 0.05 was used for all analyses. cats included in the djd-pain group were two castrated males, three intact males and two spayed females. mean age (± standard deviation, sd) was 12 (1.4) years old, and mean weight (± sd) was 3.6 (0.6) kg; median body condition score was 5/9.
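The efficiency-corrected normalization described in the methods above can be illustrated with a minimal Python sketch. It assumes that amplification efficiency is taken from the least-squares slope of ct against log10 relative concentration in the dilution series (E = 10^(-1/slope)), and that the fold change is a Pfaffl-style ratio in which the reference term is the geometric mean over the selected housekeeping genes. The function names and all numeric values are hypothetical and are not taken from the study.

```python
import math

def efficiency_from_standard_curve(log10_conc, ct_values):
    """Amplification factor E (2.0 = perfect doubling) from a dilution series.

    Fits ct = slope * log10(conc) + intercept by least squares and
    returns E = 10 ** (-1 / slope).
    """
    n = len(ct_values)
    mean_x = sum(log10_conc) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)

def percent_efficiency(e):
    """Express an amplification factor as a percentage (E = 2.0 gives 100%)."""
    return (e - 1.0) * 100.0

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def fold_change(e_target, ct_target_control, ct_target_case, refs):
    """Efficiency-corrected fold change of case relative to control.

    refs is a list of (efficiency, ct_control, ct_case) tuples, one per
    reference gene; their efficiency-corrected terms are combined by
    geometric mean (an assumption of this sketch).
    """
    target_term = e_target ** (ct_target_control - ct_target_case)
    ref_terms = [e ** (c0 - c1) for e, c0, c1 in refs]
    return target_term / geometric_mean(ref_terms)

# Hypothetical 1:3 serial dilution with perfect doubling per cycle.
log10_conc = [-k * math.log10(3) for k in range(1, 6)]
cts = [20 + k * math.log2(3) for k in range(1, 6)]
e = efficiency_from_standard_curve(log10_conc, cts)

# Hypothetical target gene one cycle earlier in cases; references unchanged.
fc = fold_change(2.0, 25.0, 24.0, [(2.0, 20.0, 20.0), (2.0, 18.0, 18.0)])
```

In this sketch a primer pair would pass the 90% to 110% acceptance range when `percent_efficiency(e)` falls between 90 and 110.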
the characteristics of the cats are detailed in table s1 (supplementary data), and all were designated 'pain positive'. bilateral hind limb appendicular joint djd in one or multiple joints was present in five of seven cats and samples from right and left dh of the spinal cord and right and left drg were analyzed. in the other two cats in the djd-pain group, only one hind limb had one or multiple joints affected with djd and samples from the nervous tissue ipsilateral to the affected side were collected. cats included in the healthy group were two castrated males, three intact males, one spayed female and one intact female. mean age (± sd) was 8.7 (3.4) years, and mean weight (± sd) was 4.2 (0.9) kg; median body condition score was 5/9. samples were collected from right and/or left nervous tissues in these cats to match with the samples collected in the cats with djd. ten potential housekeeping genes for drg and dh collected from djd-affected and unaffected cats were examined using the three reference gene evaluation programs. the ten genes evaluated in this study were selected based on previous work by penning et al. (2007) that examined the stability of these genes in feline liver, kidney, dental roots, heart and mammary gland tissues but not in any central nervous system tissues. the genes examined were beta actin (actb); beta-2-microglobulin (b2m); beta glucuronidase (gusb); hydroxymethylbilane synthase (hmbs); tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein-zeta polypeptide (ywhaz); and ribosomal proteins l17 (rpl17), l30 (rpl30), s7 (rps7), s9 (rps9) and s19 (rps19). due to dissection errors, two drg were missing from the sample set; however, drg representing all 14 cats were used to evaluate the reference gene panel using bestkeeper and normfinder, as those programs allow for missing data. only paired samples from 12 cats were able to be evaluated using genorm, as that program requires complete data.
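As an illustration of how a program such as genorm ranks candidate reference genes, the following Python sketch computes a pairwise stability measure: for each gene, the average over all other candidates of the standard deviation, across samples, of the log2 expression ratio. Lower values indicate more stable genes. This is a simplified sketch under stated assumptions (sample standard deviation, complete data for every gene, inputs already converted to log2 relative quantities); the gene labels and values are hypothetical and it is not the genorm software itself.

```python
import statistics

def pairwise_stability(log2_expr):
    """Return a dict mapping each gene to its stability value M.

    log2_expr maps gene name -> list of log2 relative quantities,
    one per sample, with samples in the same order for every gene.
    M for a gene is the mean, over all other genes, of the standard
    deviation of the per-sample log2 ratio between the two genes.
    """
    genes = list(log2_expr)
    m_values = {}
    for gene_j in genes:
        sds = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # A difference of log2 values is the log2 of the expression ratio.
            log_ratios = [a - b for a, b in zip(log2_expr[gene_j], log2_expr[gene_k])]
            sds.append(statistics.stdev(log_ratios))
        m_values[gene_j] = sum(sds) / len(sds)
    return m_values

# Hypothetical data: gene_a and gene_b track each other, gene_c does not.
example = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [1.5, 2.5, 3.5],
    "gene_c": [1.0, 3.0, 2.0],
}
m = pairwise_stability(example)
ranked = sorted(m, key=m.get)  # most stable first
```

Because every gene must have a value for every sample, a calculation of this kind cannot use samples with missing measurements, which is consistent with the restriction to complete paired data noted above.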
all genes amplified in the majority of the samples examined, but b2m had a sd > 1 and was removed as a potential reference gene based on the recommendation from the bestkeeper software. based on bestkeeper results, actb, rps19 and rps9 were the most stable housekeeping genes in the drg samples. using results from genorm, rps7, rps9 and actb were the most stable. normfinder produced results that differed from both genorm and bestkeeper, finding rpl30, hmbs and gusb to be the most stable genes in the drg samples (table 2). an additional feature of genorm is the determination of the optimum number of reference genes needed for accurate normalization (fig. s2a and s2b). the number of reference genes associated with an expression stability value of 0.15 or lower is considered the optimum number of housekeeping genes needed. for each of the tissues examined in the feline samples, the variation in normalization factor with two vs. three reference genes was <0.15, indicating that the optimum number of genes needed was two. however, since the three programs did not always agree on the two most stable genes, three were selected for each tissue type. based on these results, the geometric mean of actb, rps9 and rpl30 was used as the reference value in each of the drg samples. [table 2: reference gene rankings for each tissue type using bestkeeper, genorm and normfinder; genes ranked from most stable to least stable.] a total of 24 dh samples (12 from cats with djd and 12 from healthy cats) were used to evaluate potential housekeeping genes in this tissue. all ten reference genes amplified and had sd < 1, so all were included in the reference gene analyses. determination of the most stable reference genes in dh was very consistent across the three evaluation programs, with hmbs being the most stable gene (table 2). actb was the next most stable using bestkeeper and normfinder, and this gene was placed as the third most stable using genorm.
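the genorm stability measure and the pairwise variation used above to decide how many reference genes are needed can be sketched as follows. this is a simplified illustration based on the published definition of the genorm m value (the mean, over all other candidate genes, of the standard deviation of log2 expression ratios across samples); it is not the genorm software itself, it computes m in a single pass without genorm's stepwise exclusion of the least stable gene, and the function names are hypothetical. inputs are assumed to be relative quantities on a linear scale with no missing values.

```python
import math
from statistics import mean, stdev


def genorm_m_values(expression):
    """Single-pass geNorm-style stability value M for each candidate gene.

    `expression` maps gene name -> list of relative quantities, one per
    sample, in the same sample order for every gene. Lower M = more stable.
    """
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[gene], expression[other])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = mean(pairwise_sd)
    return m_values


def normalization_factor(expression, genes, sample_index):
    """Geometric mean of the chosen genes in one sample."""
    logs = [math.log(expression[g][sample_index]) for g in genes]
    return math.exp(sum(logs) / len(logs))


def pairwise_variation(expression, ranked_genes, n):
    """V(n/n+1): SD of log2(NF_n / NF_{n+1}) across samples.

    `ranked_genes` is ordered from most to least stable. A value below the
    0.15 threshold mentioned in the text suggests that adding gene n+1
    changes the normalization factor very little.
    """
    n_samples = len(expression[ranked_genes[0]])
    log_ratios = []
    for i in range(n_samples):
        nf_n = normalization_factor(expression, ranked_genes[:n], i)
        nf_next = normalization_factor(expression, ranked_genes[:n + 1], i)
        log_ratios.append(math.log2(nf_n / nf_next))
    return stdev(log_ratios)
```

calling pairwise_variation(expression, ranked, 2) corresponds to the comparison of two vs. three reference genes reported in the text.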
based on these results, the geometric mean of actb, hmbs and rpl17 was used as the reference value in each of the dh samples. after a set of stable reference genes was identified for each tissue type, 13 genes associated with pain in rodents were selected (based on current knowledge of genes involved in pain states; foulkes and wood, 2008) and their expression levels were compared in the drg from djd-affected and healthy samples. we hypothesized that these genes would show increased expression in djd cats. the selected genes were as follows: acid sensing ion channel 3 (asic3); activating transcription factor 3 (atf3); calcitonin gene related peptide (cgrp); cyclooxygenase 2 (cox2); chemokine (c-x3-x motif) ligand 1 (cx3cl1); sodium channel, voltage-gated, alpha subunit types ix (nav1.7), x (nav1.8) and xi (nav1.9); nerve growth factor (ngf); tachykinin receptor 1 (nk1r); substance p (tac1); tumor necrosis factor alpha (tnfα); and tropomyosin receptor kinase a (trka). although two primer pairs were designed for each target gene, no amplification was detected for cgrp and tac1, so they were excluded from this study. when djd-affected cats were compared to healthy control cats for the remaining 11 target genes, atf3 and cx3cl1 were significantly upregulated in the drg (1.639-fold and 1.069-fold, respectively; table 3). ten target genes were selected for examination in dh samples: asic3, cox2, cx3cl1, nav1.7, nav1.8, nav1.9, ngf, nk1r, tnfα, and trka. asic3, nav1.7 and trka were excluded because their pcr efficiencies were outside the acceptable range of 90-110%, and two genes (nav1.8 and nav1.9) did not amplify in any of the dh samples. of the five remaining genes, cx3cl1 was up-regulated (1.878-fold) and ngf was down-regulated (0.457-fold) in the djd-affected samples (table 3). with quantitative pcr, multiple target genes may be evaluated to measure changes in expression.
however, to accurately determine the relative expression levels, a reference is used to normalize the expression results for differences in cdna quantity between different samples, enabling comparisons between target genes across disease states. bestkeeper, genorm, and normfinder provide three different approaches for examining potential genes to select as the most stable genes for a given set of conditions. at least three other studies have identified suitable gene expression housekeeping genes in various feline tissues. penning et al. (2007) found that most of the ribosomal genes they tested appeared to be suitable reference genes in the different feline tissues. wensman et al. (2007) tested six potential housekeeping genes in different brain tissues and found that hprt, ywhaz and rps7 were the most stable. kessler et al. (2009) tested 10 potential reference genes in neoplastic, endocrine, blood, liver, intestine and lymphoid tissues in healthy cats and found that rps7, actb and abl were the most stably expressed housekeeping genes. while none of these other feline studies examined central nervous tissues, our results are similar, with several of the ribosomal genes and actb being stable reference genes. in rat dorsal root ganglia and dorsal horn samples piller et al. (2013) found that actb and hprt and also rpl29 and rpl13a were suitable housekeeping genes in drg and dh samples, respectively, and wan et al. (2010) found rpl29, rpl3 and actb most stable in drg samples isolated from a rat model of neuropathic pain. based on these other studies, our findings that many of the ribosomal genes and actb are the most stable in feline drg and dh samples are reasonable. the target genes selected in this study were chosen based on results from work on skeletal pain conducted in rodents (e.g. mantyh, 2014) and knowledge of genes involved in pain states (foulkes and wood, 2008) in an attempt to characterize the neurobiology of djd-associated pain in the cat. 
they were also chosen because of therapeutics that are available, or in development, or because of their involvement in neuropathic pain states. mrna levels of cx3cl1, also known as fractalkine, were higher in the drg of djd-affected cats, suggesting that these cats were experiencing a neuropathic pain state (clark & malcangio, 2014). in studies examining changes in gene expression in drgs, the expression of activating transcription factor 3 (atf3) is increased in the drg after peripheral nerve injury (shortland et al., 2006) and more recently has also been shown to be increased in rodent models of oa (thakur et al., 2012). atf3 is considered indicative of neuronal damage and neuropathic pain. similar results were identified in the drgs isolated from the djd-affected cats used in this study, where the expression of atf3 was increased 1.6-fold, further suggesting that cats with djd may be experiencing neuropathic pain-like states. very little other work has been performed in drg from non-rodent species, but in horses with hind limb laminitis, immunohistochemical analysis comparing lumbar drg with cervical drg showed increased expression of atf3, and the authors suggested this may be indicative of neuropathic pain (jones et al., 2007). in many experimental pain models the expression of cox2 has been found to be increased in the spinal cord in response to inflammation and injury (vardeh et al., 2009), where it contributes to pain, but we did not observe a significant difference in cox2 when we compared djd and healthy samples. the transcription factor atf3 has been shown to negatively regulate cox2 levels during acute inflammation in mice (hellmann et al., 2015), and our results may indicate that the same negative regulation is occurring in cats with djd-pain. however, it is important to remember that our results reflect a single cross-sectional chronic time point.
the mechanisms of pain likely change over time, and the time course of the rodent models tends to be very short and may not reflect mechanisms at later stages of the pain state. however, considering again our findings around cox2, investigators have struggled to show clinical benefit of nsaids in cats with djd-associated pain (gruen et al., 2015). in the djd-affected cats, we found significantly increased expression of cx3cl1 in dh, similar to what we detected in the drg samples. studies in rodents have shown that cx3cl1 and its receptor cx3cr1 form an important signaling pathway involved in microglial contributions to chronic pain. interestingly, the fractalkine protein is membrane-bound until it is cleaved by different proteases, including cathepsin s. the cleaved, soluble form of fractalkine is associated with neuropathic pain, not the membrane-bound form (clark & malcangio, 2014). we also found ngf to be significantly differentially expressed in dh, but not in the direction expected from other pain models. interestingly, in a study of a monoclonal antibody to nerve growth factor (anti-ngf mab) (gruen et al., 2016), the anti-ngf mab produced robust and long-lasting improvements in activity and mobility, presumably due to a decrease in pain. this may illustrate how simply evaluating the expression of genes may not give the whole picture: for a comprehensive understanding of the neurobiology of pain, gene expression, translation into protein and post-translational modifications all need to be evaluated. taken together, our results do point to increased expression of genes considered to be involved in neuropathic pain, and may be evidence of a neuropathic-like pain state in cats with djd-associated pain. we expected that several of the genes we selected (which have all been shown to play a role in pain in rodent models of arthritis) would show altered expression between the djd-pain and healthy groups.
there are several possible reasons for the relative lack of change across the genes we selected. rodent models are just that: induced models, and these models may not reflect the neurobiology of pain in naturally occurring pain states. indeed, this is one of the reasons that has been suggested for the lack of translation of basic pain research into effective new therapeutics (lascelles et al., 2018). another aspect of our study that should be factored into the interpretation of the results is that the tissues we collected contained varying types of cells and indeed tissue, and so we cannot directly infer that changes or lack of changes are due to any particular tissue type. we made every attempt to use only cats that definitely had djd-pain or were healthy; however, we did not have a detailed history on each cat, and our pain status was simply designated as 'yes' or 'no'. future studies should endeavor to collect tissues from more highly phenotyped animals (with respect to pain status). this is particularly important as the neurobiology of pain probably varies over time and with features of the phenotype (e.g. excessive sensitivity present or not). the lack of changes in our study may reflect an underpowered study and the heterogeneity of the samples assessed. despite the shortcomings, we believe our study points to the need to look more closely at naturally occurring pain states, and such investigations may lead to novel therapeutic targets. however, there is a danger of missing novel, relevant targets if the starting point is already biased toward what is known from rodent models, as our study was. selection of appropriate stable housekeeping genes is extremely important and has not previously been done for feline nervous tissues.
as rodent species continue to illustrate they are not appropriate models for different human conditions, other animal models with naturally occurring disease, such as the cat, will become more and more prevalent and appropriate reference genes will be needed for accurate gene evaluation studies. our approach of evaluating neurobiological changes (gene expression) in nervous system tissue from cats with naturally occurring pain appears feasible. the results of this small study point to increased expression of genes considered to be involved in neuropathic pain, and may be evidence of a neuropathic like pain state in cats with djd-associated pain. further work should be undertaken to confirm our results, and expand on these studies. funding for this work was provided by private donations to translational research in pain program from individuals concerned about the lack of effective therapeutics for cats with chronic pain. jb was employed on a morris animal foundation grant, d08fe-043, working to develop an assessment tool for chronic pain in cats. none of the authors has any other financial or personal relationships that could inappropriately influence or bias the content of the paper. fractalkine/cx3cr1 signaling during neuropathic pain analysis of normal and osteoarthritic canine cartilage mrna expression by quantitative polymerase chain reaction cartilage gene expression correlates with radiographic severity of canine elbow osteoarthritis pain genes radiographic evaluation of feline appendicular degenerative joint disease vs. 
macroscopic appearance of articular cartilage whole blood cytokine profiles in cats infected by feline coronavirus and healthy non-fcov infected specific pathogen-free cats detection of clinically relevant pain relief in cats with degenerative joint disease associated pain criterion validation testing of clinical metrology instruments for measuring degenerative joint disease associated mobility impairment in cats a feline-specific anti-nerve growth factor antibody improves mobility in cats with degenerative joint disease-associated pain: a pilot proof of concept study cytokine mrna expression in lesions in cats with chronic gingivostomatitis cytokine profile in canine immune-mediated polyarthritis and osteoarthritis atf3 negatively regulates ptgs2/cox2 expression during acute inflammation neuropathic changes in equine laminitis pain quantitative taqman real-time pcr assays for gene expression normalisation in feline tissues feline degenerative joint disease crosssectional study evaluating the prevalence of radiographic degenerative joint disease in domesticated cats spontaneous painful disease in companion animals can facilitate the development of chronic pain therapies for humans the neurobiology of skeletal pain identification and validation of suitable endogenous reference genes for gene expression studies of human bladder cancer a validation of 10 feline reference genes for gene expression measurements in snap-frozen tissues relative expression software tool (rest©) for group-wise comparison and statistical analysis of relative expression results in real-time pcr determination of stable housekeeping genes, differentially regulated target genes and sample integrity: bestkeeper -excel-based tool using pair-wise correlations reverse transcription quantitative real-time polymerase chain reaction reference genes in the spared nerve injury model of neuropathic pain: validation and literature search atf3 expression in l4 dorsal root ganglion neurons after l5 spinal 
nerve transection a powerful and flexible linear mixed model framework for the analysis of relative quantification rt-pcr data quantitative real-time rt-pcr measurement of cytokine mrna expression in the skin of normal cats and cats with allergic skin disease characterisation of a peripheral neuropathic component of the rat monoiodoacetate model of osteoarthritis accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes cox2 in cns neural cells mediates mechanical inflammatory pain hypersensitivity in mice identification and validation of reference genes for expression studies in a rat model of neuropathic pain development of a real-time rt-pcr assay for improved detection of borna disease virus supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.tvjl.2018.11.008. key: cord-351845-bli3qm8w authors: prasad, kartikay; khatoon, fatima; rashid, summya; ali, nemat; alasmari, abdullah f.; ahmed, mohammad z.; alqahtani, ali s.; alqahtani, mohammed s.; kumar, vijay title: targeting hub genes and pathways of innate immune response in covid-19: a network biology perspective date: 2020-06-26 journal: int j biol macromol doi: 10.1016/j.ijbiomac.2020.06.228 sha: doc_id: 351845 cord_uid: bli3qm8w the current pandemic of 2019 novel coronavirus disease (covid-19) caused by a novel virus strain, 2019-ncov/sars-cov-2 have posed a serious threat to global public health and economy. it is largely unknown how the human immune system responds to this infection. a better understanding of the immune response to sars-cov-2 will be important to develop therapeutics against covid-19. here, we have used transcriptomic profile of human alveolar adenocarcinoma cells (a549) infected with sars-cov-2 and employed a network biology approach to generate human-virus interactome. 
network topological analysis identified 15 sars-cov-2 targets, which belong to a subset of interferon (ifn) stimulated genes (isgs). these isgs (ifit1, ifitm1, irf7, isg15, mx1, and oas2) can be considered as potential drug targets for the treatment of covid-19. we have identified significant interactions between isgs and tlr3 agonists, such as poly i:c and imiquimod, which suggests that tlr3 agonists can be considered as candidates for drug repurposing in covid-19. our network-centric analysis suggests that moderating the innate immune response is a valuable approach to target covid-19. the current pandemic of coronavirus disease-2019 (covid-19) has led to over 5 million confirmed cases and almost 350,000 fatalities worldwide since its emergence in late 2019. covid-19 is caused by a novel coronavirus strain, sars-cov-2, an enveloped, positive-sense, single-stranded rna β-coronavirus of the family coronaviridae [1]. nearly 80% of covid-19 patients show mild symptoms, such as cough and fever, and do not require hospitalization, but the remaining 20% do [2]. of those 20%, almost half develop severe respiratory failure in the form of fatal acute respiratory distress syndrome (ards) [3]. alongside ards, a dysregulated immune response also mediates severe covid-19 pathogenesis. this immune dysregulation, called hypercytokinemia or "cytokine storm," is frequently associated with ards [4]. to date, it remains unclear how sars-cov-2 disrupts the host innate immune response. several recent studies have reported the dysregulated secretion of proinflammatory cytokines in covid-19 [5] [6] [7]. recently, melo et al. [5] reported a moderate interferon (ifn) response to sars-cov-2 infection in primary cells and showed that ifn can reduce sars-cov-2 replication in vitro. zhou et al. [7] and xiong et al.
[6] recently examined the innate immune response to sars-cov-2 infection in bronchoalveolar lavage fluid (balf). the differentially expressed genes were enriched in inflammatory pathways including chemokine signaling [7], whereas in the study by xiong et al. [6] the upregulated genes were largely related to viral infection, with the most enriched biological processes being protein targeting to membrane and er. these studies reported the significant upregulation of a subset of interferon-stimulated genes (isgs) that are directly related to antiviral activity, such as isg15, ifih1, mx1, oas1-3 and the ifitms. these studies, along with others, revealed a prominent role of the innate immune response in covid-19 infection [8] [9] [10] [11]. two important themes emerge. the first is that infection consistently results in the upregulation of chemokines and increased levels of proinflammatory cytokines such as il-1β, il-2, and il-6, leading to tissue damage. the second is that sars-cov-2 triggers a robust ifn response, marked by the upregulation of several isgs in the lungs. like other viruses, sars-cov-2 utilizes host machinery for its growth and survival during infection. systematic investigation of virus-host protein-protein interactions (ppis) offers an effective way toward elucidating the mechanisms of viral infection and toward drug repurposing [12]. it has also been reported that sars-cov-2 can establish a higher rate of infectivity if it acquires combinatorial mutations at the s-protein/ace2 interfacial residues [13]. several studies have shown that targeting cellular antiviral targets could be considered a novel strategy for the development of effective treatments for viral infections, including sars-cov [14], mers-cov [14], ebola virus [15], zika virus [16], and more recently sars-cov-2 [17].
there are no vaccines or drugs approved for covid-19 yet, but more than 80 clinical trials have been launched to test coronavirus treatments. the current covid-19 pandemic has led the scientific community to focus seriously on drug repurposing to tackle the infection [18] [19] [20] [21] [22]. towards this goal, in this study we have generated a human-sars-cov-2 interactome based on a recently published rna-seq analysis of human adenocarcinomic alveolar basal epithelial (a549) cells infected with sars-cov-2, and identified disease-related functional genes that provide insights into the pathomechanisms of covid-19. moreover, the drug-protein interactions of these hub proteins investigated in this study should lead to the identification of potential candidate drugs for repurposing. we downloaded the differential gene expression data of human alveolar adenocarcinoma cells infected with sars-cov-2 from the study of melo et al. [23]. we also downloaded all the available drug-gene interaction data, along with drug-disease interactions, from the drugbank database and the comparative toxicogenomics database (ctd) [24, 25]. melo et al. [26] previously described 120 differentially expressed genes (degs) based on a p-value cut-off (p-val < 0.05). identifying significantly dysregulated genes on the basis of the p-value alone is imprecise; the magnitude of the change in expression (log2 fold change) should also be considered. we therefore reanalysed the data with a q-value cut-off (fdr < 0.05) together with a log2 fold change cut-off (log2fold ≥ 2) to identify significantly dysregulated genes among the 120 genes. to comprehensively analyse the biological function of the degs, we used the database for annotation, visualization and integrated discovery (david) version 6.7 [27]. this database uses gene ontology (go) and kyoto encyclopedia of genes and genomes (kegg) analysis to characterize the degs.
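the combined fdr and fold-change filter described above can be sketched as follows. the source states only that a q-value cut-off (fdr < 0.05) and a log2 fold change cut-off of 2 were applied; the use of the benjamini-hochberg procedure to obtain q-values, and the use of the absolute log2 fold change so that down-regulated genes can also pass, are assumptions made here for illustration. function names and the record layout are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Output is in the same order as the input.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values


def select_degs(records, fdr_cutoff=0.05, log2fc_cutoff=2.0):
    """Keep genes with q < fdr_cutoff and |log2 fold change| >= log2fc_cutoff.

    `records` is a list of dicts with keys "gene", "p_value" and "log2fc"
    (a hypothetical layout chosen for this sketch).
    """
    q_values = benjamini_hochberg([r["p_value"] for r in records])
    return [
        r["gene"]
        for r, q in zip(records, q_values)
        if q < fdr_cutoff and abs(r["log2fc"]) >= log2fc_cutoff
    ]
```

applying both thresholds together is what reduces a list selected on p-value alone to a smaller set of strongly dysregulated genes.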
the gene ontology analysis included annotation at the biological, cellular and molecular levels. the pathways and functions with fdr < 0.05 were considered significant. using the deg list of melo et al. [26], a protein-protein interaction network was constructed using the string plugin of the cytoscape tool [28]. the string plugin uses co-expression, text-mining, gene fusion, neighbourhood, and experimental data to build the ppi network. the cluego tool was used to identify the biological roles of the degs, and the david tool was used for gene ontology (go) analysis. in this ppi network, each node indicates a protein and each edge indicates an interaction between proteins. we also calculated the network's topological parameters, such as degree centrality ($k$), betweenness centrality ($C_b$), closeness centrality and eccentricity, using the network analyzer plugin of the cytoscape tool. hub genes are highly connected genes that have a high correlation with other genes in the network. any change in the expression of hub genes has the potential to influence a major part of the network. genes with high connectivity can transfer information through the network more rapidly than other genes. the top genes with the highest degree of connectivity ($k$) and betweenness centrality values were considered hub genes. degree centrality ($k$) is a value assigned to each node purely on the basis of the number of links it holds. degree indicates the number of interactions a node has with other nodes in the network and helps in measuring the significance of the node in controlling the network:

$k(u) = \sum_{v \in N_u} w(u,v)$ (equation 1)

where $N_u$ is the node set containing all the neighbours of node u, and w(u,v) is the weight of the edge connecting node u with node v. betweenness centrality ($C_b$) measures how many times a node falls on the shortest paths between other pairs of nodes. it characterizes a node's ability to control signal processing and information flow in the network:

$C_b(u) = \sum_{k \neq u \neq f} \frac{p(k,u,f)}{p(k,f)}$ (equation 2)

where p(k,u,f) is the number of shortest paths from k to f that pass through u, and p(k,f) is the total number of shortest paths between nodes k and f. biological process enrichment analysis and go analysis of the hub genes were done using the cluego and david tools, respectively. drug-target interaction information for the hub genes was collected from the drugbank database [24] and the comparative toxicogenomics database (ctd) [25]. the drugs predicted for the hub genes through these gene-drug interaction databases were used to construct a drug-protein network using the stitch database [29]. stitch is a database that predicts functional and physical interactions between chemicals/drugs and genes. the interactions in the stitch database are derived from five main sources: automated text-mining, high-throughput lab experiments, co-expression interaction data, interaction prediction from genomic context, and prior knowledge from other databases. for each interaction between a drug and a gene, a combined score is calculated by the stitch database. the combined score is obtained by combining the probabilities of interaction from the different evidence channels and correcting for the probability of randomly observing an interaction. drug-gene interactions with a network score of more than 0.9 were considered significant hits. it is likely that the outcome of sars-cov-2 infection is largely determined by the interaction between host proteins and virus proteins. to build the human-sars-cov-2 interactome, we used the transcriptome data of human alveolar adenocarcinoma (a549) cells infected with sars-cov-2 from melo et al. [23]. melo et al. identified 120 dysregulated genes with p-adj < 0.05, the majority of which were upregulated.
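the degree and betweenness calculations used for hub selection can be sketched without cytoscape as follows. this is an illustrative reimplementation, not the network analyzer plugin: it assumes an unweighted, undirected network (so every edge weight w(u,v) is 1), uses brandes' algorithm for betweenness, and ranks candidate hubs by degree with betweenness as a tie-breaker, which is only one possible way to combine the two measures. function names are hypothetical.

```python
from collections import deque


def degree_centrality(adjacency):
    """Number of neighbours of each node (edge weights assumed to be 1)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalized betweenness via Brandes' algorithm.

    `adjacency` maps node -> set of neighbouring nodes (undirected graph).
    """
    betweenness = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from `source`.
        stack = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate path dependencies from the farthest nodes back.
        dependency = {node: 0.0 for node in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Each unordered pair of endpoints was visited from both ends.
    return {node: value / 2.0 for node, value in betweenness.items()}


def top_hubs(adjacency, n):
    """Rank nodes by degree, then betweenness (an assumed combination rule)."""
    degree = degree_centrality(adjacency)
    between = betweenness_centrality(adjacency)
    ranked = sorted(adjacency, key=lambda node: (degree[node], between[node]),
                    reverse=True)
    return ranked[:n]
```

in a star-shaped toy network the central node lies on every shortest path between the outer nodes, so it receives both the highest degree and the highest betweenness, which is the pattern the hub definition above is designed to capture.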
gene ontology (go) analysis of the dysregulated genes using david [30] revealed that they were enriched in regulation of defence response to virus, regulation of protein export from nucleus, response to interferons, and regulation of response to cytokine stimulus (fig. 1a). kegg pathway enrichment analysis revealed the role of the dysregulated genes in chemokine signalling pathways, complement and coagulation pathways and rig-i-like receptor signalling pathways (fig. 1b). we reanalysed the transcriptomic gene expression signature based on log2fold change > 2 and adj-p value < 0.05. we identified 16 significantly dysregulated genes, and of these genes, ifit1, ifitm1, irf7, isg15, mx1, and oas2 were highly upregulated. the genemania webserver was used to predict interactions between these degs in the network, using the go term "biological process" and source organism homo sapiens as additional parameters. the gene enrichment of these degs highlighted go terms including response to virus, ribonucleotide binding and ifn-related signalling pathway (fig. 1c). overall, the analysis demonstrated that the upregulated genes are mainly linked to the host response to sars-cov-2 infection, type i interferon signaling and the cytokine-mediated signaling pathway. we used ppi network analysis to construct the interactome of the degs for covid-19. for this, we used the string plugin of the cytoscape tool [28] and obtained a ppi network with 117 nodes and 905 edges from the 120 dysregulated genes (fig. 2a). the proteins were ranked based on their degree connectivity and betweenness centrality scores (supporting information table 1). among these, stat1, irf7, ifih1, mx1, isg15, ifit3, oas2, and ddx58 have high degree and betweenness centrality values. to obtain a more in-depth understanding of the interactome, go function and kegg pathway analyses were applied using david (fig. 2b and supporting information table 2).
go analysis results showed that, in the biological process category, the interacting genes were mainly enriched in the regulation of defence response to virus, innate immune response and inflammatory response, and also played an important role in complement activation and in response to nutrient and extracellular stimulus (fig. 2b). the cellular components were significantly located in the extracellular region and membrane fraction. molecular functions were mainly enriched in chemokine or cytokine activity, rna binding, transferase activity, and calcium ion binding. kegg pathway enrichment analysis revealed the role of the interacting genes in complement and coagulation pathways, rig-i-like receptor signalling pathways, and chemokine signalling pathways (supporting information table 2). overall, the network analysis indicates that sars-cov-2 targets the proteins in the ifn signaling pathway to evade the immune system. this highlights the key role of the ifn-mediated antiviral responses. predict viral targets [31] [32] [33] [34]. in this regard, two topological features, degree (number of connections) and betweenness (the fraction of all shortest paths that include a node within a network), were calculated to identify candidate hub nodes using cytoscape's network analyzer tool. collectively, we identified the top 15 nodes with high degree of connectivity and betweenness values, and these were subsequently considered as hub genes in the network (table 1). biological process enrichment analysis of the 15 hub genes using the genemania webserver and go analysis revealed their role in defence mechanism to viral response, cellular response to type i interferon, regulation of viral genome replication, double-stranded rna binding, type i interferon signaling pathway, and regulation of innate immune response (fig. 3). these identified hub genes belong to the family of interferon-stimulated genes (isgs).
isgs including ifit and ifitm, isg15, ifih1, mx1, irf7, oas1-3 and stat1 are known to potentiate ifn signaling and thus exert antiviral activity. these isgs fall into two categories: one comprises direct effectors of the innate immune response, including mx1, ifitm3, isg15, ifih1, and irf7, and the other includes viral rna sensors such as ddx58 and the oas1-3 genes. in addition to antiviral activity, isgs may exert diverse functions including rna and nucleotide binding, ifn regulation and inflammation regulation. mx1, oas1-3, ifih1, ifitm1-3, isg15 and others are highly expressed in respiratory airways [35, 36] and are associated with the viral entry-associated ace2 gene, as shown in single-cell rna-sequencing data from various human tissues [37, 38]. given the high expression of the viral entry-associated genes, it is reasonable that these nasal epithelial cells are primed to express these immune-associated genes to decrease viral susceptibility. further, we used enrichr [39], a web-based server for gene-set enrichment analysis that provides different summaries of the collective functions of gene lists. interestingly, disease enrichment analysis demonstrated that the signature was highly associated with other viral diseases such as west nile encephalitis, tick-borne encephalitis, dengue fever, chikungunya, and sars. these hub genes are also involved in mammalian phenotypes including increased susceptibility to viral infection-induced morbidity/mortality, decreased interferon secretion, increased igg level, abnormal t cell activation, and decreased interleukin-6 secretion. overall, these results clearly showed that the signatures are largely involved in the innate immune response following sars-cov-2 infection (supporting information table 4).
further, using the drugbank database [24] and the comparative toxicogenomics database (ctd) [25], we identified drugs that are known to interact with the hub genes (supporting information table 5). in total, we identified 185 drug molecules which are known to interact with the hub genes (supporting information fig. 1). apart from protein-drug interaction, we also looked for drug-disease interaction, i.e. which drug is used in which disease. based on the drug-gene interaction results, we used the stitch database [29] for final categorization of the drug-gene interaction network, keeping interactions with a score ≥ 0.9. interestingly, we found that five proteins, mx1, oas1, stat1, ddx58, and isg15, were highly correlated to rare disease phenotypes as well as to sars in the enrichr database. these five proteins were also the most common influential proteins according to go analysis. the stitch drug-protein interaction network analysis of these five proteins showed significant interaction with drugs/compounds (fig. 4). as shown in figure 4, the mx1 gene interacts with mitomycin c, an antitumour agent, and imiquimod, an immune response modifier. ifih1 interacts with polyinosinic:polycytidylic acid (poly i:c), which is an immunostimulant. oas1 showed significant interaction with s-carbamidomethylcysteine (cysteine-s-acetamide) and with mgatp. stat1 interacts with vanadium oxide and mgatp, whereas ddx58 and isg15 interact with mgatp. some of these identified drugs are used for induction of ifns and thus may play a key role in the body's defence against sars-cov-2 infection. here, we generated a comprehensive human-sars-cov-2 interactome from transcriptome studies of lung cells infected with sars-cov-2. the ppi network analysis indicates that the pathways are enriched in host response to virus infection, type i interferon signaling, and cytokine activation.
network topology analyses identified 15 high-value targets of sars-cov-2, which belong to a subset of canonical isgs. these isgs are largely involved in regulation of defence response to virus, innate immune response, inflammatory response, and rna binding [40, 41]. an interferon-inducible protein, mx dynamin like gtpase (mx1), is associated with influenza and viral encephalitis infection [42, 43]. also notable is the role of interferon induced protein with tetratricopeptide repeats 1 (ifit1) and dexd/h-box helicase 58 (ddx58) in antiviral activity. in hepatitis e virus infection, the viral polymerase binds to ifit1, protecting the viral rna from translation inhibition mediated by ifit1 that boosts the interferon response in murine macrophage-like cells [44]. recently, significant upregulation of ifit1-3 and ddx58 gene expression under sars-cov-2 infection has been reported [26]. in response to viral infections, several host genes such as oas1-3, irf7, irf9, stat1 and ifih1 are highly expressed and are highly correlated with the host response to viral infections [45] [46] [47]. studies show that covs are equipped with strategies to antagonize the ifn signaling pathway, which helps the virus to escape the host immune response. sars-cov escapes host ifn signalling as its orf6 protein blocks the expression of stat1-activated genes [48]. similarly, in mers-cov, orf4b inhibits irf3 and irf7 to antagonize the antiviral ifn-β response [49]. additionally, papain-like proteases (plps) are expressed in both sars-cov and mers-cov, enabling them to delay the host immune response. coronaviruses engage in interactions with ifn stimulated gene 15 (isg15), antagonizing the ifn-mediated antiviral response [50, 51]. the antiviral activity of isg15 has been shown against several viruses including human cytomegalovirus [52], hiv [53], west nile virus [54], swine fever virus [55], mers-cov [56], and sars-cov [57].
ifns play a key role in the body's defence against viral infections, and in this regard, we here showed some candidate drugs for repurposing. one of these drugs is poly (i:c) (polyinosinic:polycytidylic acid), a synthetic double-stranded rna immune-stimulant, which is used as an adjuvant in vaccine production [58]. it is an agonist of toll like receptor 3 (tlr3), which induces the expression of ifns. many studies demonstrated that the tlr3 agonists poly iclc and poly (i:c) increase the production of ifn-α, -β, and -γ, which inhibited cov replication and minimized the inhibitory effects of cov on ifn signaling pathways [59] [60] [61]. interestingly, chloroquine, a recently proposed drug for covid-19 [62], inhibits poly (i:c)-mediated ifn-β induction [63]. therefore, tlr3 agonists can be considered as potential drugs for repurposing in covid-19. in addition to tlr3 agonists, tlr7 agonists such as imiquimod [64] can induce ifn production in the human body as well. imiquimod is a strong inducer of ifn-α and several proinflammatory mediators, including tnf-α, il-12, and chemokines [65]. the other drug predicted to bind mx1 and oas, mitomycin c, is a cancer drug that is used in the treatment of bladder, colon, and breast cancers. it functions as an alkylating agent that causes cross-linking of dna and inhibits rna as well as protein synthesis [66, 67]. at high concentration, mitomycin c inhibits the replication of influenza virus by blocking rna synthesis [68]. mitomycin c has also been shown to inhibit b cell, t cell, and macrophage proliferation in vitro and to impair antigen presentation, as well as the secretion of ifn-γ, tnf-α, and il-2.
vanadium oxide, an activator of stat1, is involved in immune-regulating mechanisms, including immune suppression and inflammation downregulation by stimulating and activating b/t cells [69, 70]. in summary, our integrative interactome and network topology analyses showed that sars-cov-2 induced a strong ifn response, marked by the increased expression of several isgs, including mx1, oas1-3, ifih1, isg15, irfs, and ifitms etc. these isgs exert antiviral activity and could protect the host cells from the infection. this protective action of isgs might account for the lesser percentage of severe cases and the lower fatality rate in covid-19. however, the extent of protection or damage to the host cell depends on the stage of infection, types of cells, sars clade [71] [72] [73] and other factors like co-infection, age, and comorbidities. recently, zou et al. [74] reported high sars-cov-2 loads very early during infection, suggesting that the virus may have developed an arsenal that is able to delay the ifn response by inhibiting innate immune signaling. thus, ifn induction in the incubation period and at the very early stages of the infection could be the key to preventing covid-19 associated mortalities. administration of interferon-inducing agents, such as poly (i:c) and imiquimod, could reduce the mortality of sars at the very early stages of the disease (fig. 5). on the other hand, at the later stages of the disease, the balance of the immune system becomes impaired, leading to inflammatory over-reactions, cytokine storm, and possible autoimmune induce the production of ifns and isgs, which will reinstate the impaired immune responses. in the late stage of infection, there is a proinflammatory cytokine storm, which can theoretically be targeted by immunosuppressors such as mitomycin c and vanadium oxide.
a new coronavirus associated with human respiratory disease in china clinical characteristics of coronavirus disease 2019 in china covid-19 cytokine storm: the interplay between inflammation and coagulation imbalanced host response to sars-cov-2 drives development of covid-19 transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid-19 patients heightened innate immune responses in the respiratory tract of covid-19 patients harnessing innate immunity to eliminate sars-cov-2 and ameliorate covid-19 disease covid-19 infection: the perspectives on immune responses human coronavirus: host-pathogen interaction the innate immune system: fighting on the front lines or fanning the flames of covid-19? understanding human-virus protein-protein interactions using a human protein complex-based analysis framework, msystems high throughput designing and mutational mapping of rbd-ace2 interface guide non-conventional therapeutic strategies for covid-19 repurposing of clinically developed drugs for treatment of middle east respiratory syndrome coronavirus infection fda-approved selective estrogen receptor modulators inhibit ebola virus infection a screen of fda-approved drugs for inhibitors of zika virus infection a sars-cov-2 protein interaction map reveals targets for drug repurposing network-based drug repurposing for novel coronavirus 2019-ncov/sars-cov-2 drug repurposing against covid-19: focus on anticancer agents fast identification of possible drug treatment of coronavirus disease-19 (covid-19) through computational drug repurposing study boosting the arsenal against covid-19 through computational drug repurposing optimizing hydroxychloroquine dosing for patients with covid-19: an integrative modeling approach for effective drug repurposing sars-cov-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems 0: a major update to the drugbank database for the comparative toxicogenomics 
database: update sars-cov-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists cytoscape: a software environment for integrated models of biomolecular interaction networks stitch: interaction networks of chemicals and proteins systematic and integrative analysis of large gene lists using david bioinformatics resources the sars-coronavirus-host interactome: identification of cyclophilins as target for pancoronavirus inhibitors a physical and regulatory map of host-influenza interactions reveals pathways in h1n1 infection network-guided discovery of influenza virus replication host factors common nodes of virus-host interaction revealed through an integrated network analysis a cellular census of human lungs identifies novel cell states in health and in asthma a single-cell atlas of the human healthy airways sars-cov-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov-2 receptor ace2 is an interferon enrichr: a comprehensive gene set enrichment analysis web server 2016 update interferon-stimulated genes: a complex web of host defenses emerging roles of interferon-stimulated genes in the innate immune response to hepatitis c virus infection interferon-inducible protein mx1 inhibits influenza virus by interfering with functional viral ribonucleoprotein complex assembly host genetics of severe influenza: from mouse mx1 to human irf7 hepatitis e virus polymerase binds to ifit1 to protect the viral rna from ifit1-mediated translation inhibition polymorphisms of interferon-inducible genes oas-1 and mxa associated with sars in the vietnamese population early endonuclease-mediated evasion of rna sensing ensures efficient coronavirus replication differential regulation of the oasl and oas1 genes in response to viral infections severe acute respiratory syndrome coronavirus orf6 
antagonizes stat1 function by sequestering nuclear import factors on the rough endoplasmic reticulum/golgi membrane middle east respiratory syndrome coronavirus orf4b protein inhibits type i interferon production through both cytoplasmic and nuclear targets structural insights into the interaction of coronavirus papain-like proteases and interferon-stimulated gene product 15 from different species isg15: it's complicated consecutive inhibition of isg15 expression and isgylation by cytomegalovirus regulators innate antiviral response targets hiv-1 release by the induction of ubiquitin-like protein isg15 isg15 facilitates cellular antiviral response to dengue and west nile virus infection in vitro antiviral activity of isg15 against classical swine fever virus replication in porcine alveolar macrophages via inhibition of autophagy by isgylating becn1 mers-cov papain-like protease has deisgylating and deubiquitinating activities structural basis for the ubiquitin-linkage specificity and deisgylating activity of sars-cov papain-like protease targeting poly(i:c) to the tlr3-independent pathway boosts effector cd8 t cell differentiation through ifn-alpha/beta intranasal treatment with poly(i*c) protects aged mice from lethal respiratory virus infections prophylactic and therapeutic intranasal administration with an immunomodulator, hiltonol((r)) (poly ic:lc), in a lethal sars-cov-infected balb/c mouse model evaluation of immunomodulators, interferons and known in vitro sars-cov inhibitors for inhibition of sars-cov replication in balb/c mice breakthrough: chloroquine phosphate has shown apparent efficacy in treatment of covid-19 associated pneumonia in clinical studies toll-like receptor 3 agonist poly(i:c)-induced antiviral response in human corneal epithelial cells the systemic response to topical aldara treatment is mediated through direct tlr7 stimulation as imiquimod enters the circulation cellular requirements for cytokine production in response to the 
immunomodulators imiquimod and s-27609 mitomycin c: mechanism of action, usefulness and limitations mitomycin c inhibits ribosomal rna: a novel cytotoxic mechanism for bioreductive drugs influence of mitomycin c on the replication of influenza viruses role of vanadium in cellular and molecular immunology: association with immune-related inflammation and pharmacotoxicology mechanisms vanadium carcinogenic, immunotoxic and neurotoxic effects: a review of in vitro studies dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice pathogenic human coronavirus infections: causes and consequences of cytokine storm and immunopathology ifn-i response timing relative to virus replication determines mers coronavirus infection outcomes sars-cov-2 viral load in upper respiratory specimens of infected patients
the authors sincerely thank the department of science and technology, government of india. the authors extend their appreciation to the deanship of scientific research at king saud university for funding this work through research group no. rg-1441-461. no potential conflict of interest was reported by the authors.
key: cord-345516-fgn7rps3 authors: miller, laura c; fleming, damarius; arbogast, andrew; bayles, darrell o; guo, baoqing; lager, kelly m; henningson, jamie n; schlink, sarah n; yang, han-chun; faaberg, kay s; kehrli, marcus e title: analysis of the swine tracheobronchial lymph node transcriptomic response to infection with a chinese highly pathogenic strain of porcine reproductive and respiratory syndrome virus date: 2012-10-30 journal: bmc vet res doi: 10.1186/1746-6148-8-208 sha: doc_id: 345516 cord_uid: fgn7rps3 background: porcine reproductive and respiratory syndrome virus (prrsv) is a major pathogen of swine worldwide.
emergence in 2006 of a novel highly pathogenic prrsv (hp-prrsv) isolate in china necessitated a comparative investigation into the host transcriptome response in tracheobronchial lymph nodes (tbln) 13 days post-infection with hp-prrsv rjxwn06, prrsv strain vr-2332 or sham inocula. rna from each was prepared for next-generation sequencing. amplified library constructs were directly sequenced and a list of sequence transcripts and counts was generated using an rnaseq analysis pipeline to determine differential gene expression. transcripts were annotated and relative abundance was calculated based upon the number of times a given transcript was represented in the library. results: major changes in transcript abundance occurred in response to infection with either prrsv strain, each with over 630 differentially expressed transcripts. the largest increase in transcript level for either virus versus sham-inoculated controls were three serum amyloid a2 acute-phase isoforms. however, the degree of up or down-regulation of transcripts following infection with hp-prrsv rjxwn06 was greater than transcript changes observed with us prrsv vr-2332. also, of 632 significantly altered transcripts within the hp-prrsv rjxwn06 library 55 were up-regulated and 69 were down-regulated more than 3-fold, whilst in the us prrsv vr-2332 library only 4 transcripts were up-regulated and 116 were down-regulated more than 3-fold. conclusions: the magnitude of differentially expressed gene profiles detected in hp-prrsv rjxwn06 infected pigs as compared to vr-2332 infected pigs was consistent with the increased pathogenicity of the hp-prrsv in vivo. porcine reproductive and respiratory syndrome virus (prrsv), the causative agent of prrs in swine, is a member of the arteriviridae family in the order nidovirales. 
prrsv causes highly significant economic losses to the swine industry worldwide [1] as a result of both reproductive failure (late-term abortions and stillbirths) in pregnant sows and respiratory disease (pneumonia) in nursery and grower/finishing pigs [2] . infection with prrsv also predisposes pigs to infection by bacterial pathogens as well as other viral pathogens [3] [4] [5] [6] [7] , as such, prrsv is a key etiological agent of the porcine respiratory disease complex (prdc). clinical disease caused by prrsv is highly variable, ranging from mild, subclinical infection to acute death of adult animals [8] . differences in virulence have been attributed to numerous factors including host genetics, management practices, and virus strain heterogeneity [9] [10] [11] [12] [13] [14] [15] [16] . relatively little is known about the interactions of prrsv and host cells. the lymph node is an anatomic site where the innate immune response and adaptive immune system interface. tracheobronchial lymph nodes (tbln) in swine drain the lung field and provide the focal structure that can reproducibly be identified. although the tbln contains a number of cell types, sampling this tissue allows study of direct and indirect effects of an infectious agent on the lung and cells within the lymph node. in 2006 a unique syndrome with high morbidity and mortality was recognized in growing pigs in china that was originally known as porcine high fever disease (phfd) due to its uncertain etiology [17] . experimental infection of pigs in china with these novel viral isolates reproduced the clinical disease providing strong evidence for the role of prrsv as the causal agent of phfd. however, there was still a question as to whether there was some unknown agent in the prrsv preparations that increased the severity of the clinical disease over what was expected for a "routine" prrsv infection. 
this question was resolved when phfd was reproduced in china with virus derived from an infectious clone of the jx143 prrsv isolate [18] demonstrating that prrsv isolates with a common genetic motif had a causal role in phfd leading to this lineage of virus being called highly pathogenic prrsv (hp-prrsv). we imported a plasmid containing a full-length clone of the 2006 jxwn06 hp-prrsv isolate [19] from which infectious virus (rjxwn06) was rescued. an animal study was conducted comparing the pathogenicity of hp-prrsv isolate rjxwn06 with the north american prototype strain vr-2332 prrsv [20] . the objective of this report was to investigate gene expression profiles in porcine tracheobronchial lymph node (tbln) during viral infection with hp-prrsv rjxwn06 strain alongside of us prrsv strain vr-2332 at a snapshot of 13 days post-infection using bioinformatics. mapping short rna-seq reads and estimating transcript expression levels genomic short-read nucleotide alignment program (gsnap) was used for alignment and genome construction, and cufflinks to determine if differential expression and changes in transcript abundance were statistically significant [21, 22] . the rnaseq yielded 55,527,464 reads for the control, 43,263,207 reads for the hp-prrsv, and 34,555,783 for vr-2332 libraries after quality trimming and excluding any reads less than 25 bp. cufflinks was used to measure transcript abundances in fragments per kilobase of exon per million fragments mapped (fpkm). the cuffdiff output contained normalized fpkm for comparison between libraries (additional file 1). these values were used to calculate the fold change (log2 transformed) in expression between the experimental unit and the control. examination of the rnaseq data indicated that there were major changes in transcript abundance occurring in the prrsv-infected tbln-based unique transcripts [cuffdiff output (additional file 1)]. 
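the fpkm unit and the log2 fold change described above can be written out as a minimal python sketch. this shows only the definitional arithmetic; cufflinks itself estimates abundances with a statistical model that assigns fragments to isoforms, so the function names and the example numbers here are illustrative assumptions and not values from this study.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


def log2_fold_change(fpkm_infected, fpkm_control):
    """log2(infected / control); both FPKM values must be positive."""
    if fpkm_infected <= 0 or fpkm_control <= 0:
        raise ValueError("log2 fold change needs FPKM > 0 in both libraries")
    return math.log2(fpkm_infected / fpkm_control)
```

as a usage example with made-up numbers, `fpkm(500, 2000, 50_000_000)` gives 5.0, and a transcript with fpkm 40.0 in an infected library and 5.0 in the control has a log2 fold change of 3.0, i.e. an 8-fold increase. requiring positive fpkm in both libraries mirrors the restriction of the downstream analysis to hits with fpkm values in both the control and the infected samples.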
of these total transcripts, 632 were found to be significant hits in the hp-prrsv rjxwn06 library and 633 were significant in the us prrsv vr-2332 library (table 1). of the 632 significant hits within the hp-prrsv rjxwn06 library, 55 were up-regulated and 69 were down-regulated more than 3-fold, whilst in the us prrsv vr-2332 library 4 hits were up-regulated and 116 were down-regulated more than 3-fold. this derived catalog of expressed genes represents the first comparative analysis of the hp-prrsv rjxwn06 and vr-2332-infected tbln transcript abundance profiles and provides a database that informs us of genes involved in normal tbln physiology, as well as genes whose abundance is altered by prrsv infection. gene annotation of all significant hits (additional files 1 and 2) was then carried out using a mysql database matching the ensembl (sscrofa9.56) chromosome location of aligned transcripts to gene names. gene ids and log2 fold-change expression values for significant hits that had fpkm values in both the control and the infected differential expression testing for transcripts (cuffdiff output files) were then analyzed using the ingenuity pathway analysis software. when comparing the tbln transcriptome from sham-inoculated controls vs. the hp-prrsv rjxwn06-infected pigs, 568 of the 632 gene ids mapped to the ingenuity knowledge base and 165 were up-regulated while 148 were down-regulated. in the tbln of control vs. vr-2332-infected pigs, 528 of the 633 gene ids mapped to the ingenuity knowledge base and only 8 were up-regulated while 235 were down-regulated. table 2 lists the top ten genes (named by the hugo gene nomenclature committee (hgnc) [23]) that had a significant value in both hp-prrsv rjxwn06 and vr-2332-infected tbln rnaseq cuffdiff output and a fold-change increase or decrease of greater than 3. transcripts up-regulated in both hp-prrsv rjxwn06 and vr-2332-infected tbln by > 9-fold and > 4-fold vs.
control tbln, respectively, were three serum amyloid a2 (saa4) acute-phase isoforms, as well as gene enssscg00000013369 f1s9c0_pig serum amyloid protein (no hgnc annotation), which are expressed in response to inflammatory stimuli (table 2). other annotated genes (table 2) that were up-regulated in hp-prrsv rjxwn06 tbln vs. control were resistin (retn), which is secreted by immune and epithelial cells and participates in the immune response by increasing transcriptional events that increase expression of several proinflammatory cytokines [24]; three members of the s100 family (s100a9, s100a8, s100a12) of calcium-binding proteins, which are localized in the cytoplasm and/or nucleus of a wide range of cells, are involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation, and are mediators of inflammatory and protective anti-infection responses [25]; xanthine dehydrogenase (xdh), a generator of reactive oxygen species and possible cause of hypoxia-mediated lung injury [26]; and peptidylarginine deiminase, type iv (padi4), which may play a role in granulocyte and macrophage development leading to inflammation and immune responses [27]. also in the top ten up-regulated transcripts were two genes without an hgnc symbol, trem1_pig (enssscg00000001617) triggering receptor, which is expressed on myeloid cells, and the interleukin-1 receptor, type ii gene (il1r2), which is associated with host responses to subdue inflammation as a consequence of disease. down-regulated in hp-prrsv tbln vs.
control tbln were diacylglycerol o-acyltransferase 2 (dgat2), which catalyzes triglyceride synthesis and is critical for formation of adipose tissue [28]; perilipin-1 (plin1), an important regulator of lipid storage; a member of the cytochrome p450 monooxygenases (cyp4b1) of unknown specific function; soluble galactose-binding lectin 12 (lgals12); cell death-inducing dffa-like effector c (cidec); tumor suppressor candidate 5 (tusc5); protein phosphatase 1, regulatory (inhibitor) subunit 1a (ppp1r1a); and c-type lectin domain family 4, member g (clec4g), which encodes a glycan-binding receptor of the c-type lectin family that plays a role in t-cell immune responses. also in the top ten down-regulated transcripts were the following genes without projected hgnc symbols: ces1 liver carboxylesterase (enssscg00000002825) and f1sty2_pig thyroid hormone-responsive protein (enssscg00000014888). in vr-2332-infected pig tbln vs. control tbln, transcript abundance was down-regulated to a lesser extent and featured genes linked to metabolism in adipose tissue and to regulation of neuronal activity, including dermatopontin (dpt), an extracellular matrix protein with possible functions in cell-matrix interactions and matrix assembly which enhances transforming growth factor beta (tgfb1) activity; beta-1 adrenergic receptor (adrb1); solute carrier family 2, facilitated glucose transporter member 4 (slc2a4); uncharacterized mlx interacting protein-like protein (mlxipl); basic helix-loop-helix transcription factor 15 (tcf15); forkhead box transcription factor protein c2 (foxc2); protein phosphatase 1 regulatory subunit 1b, also known as dopamine- and camp-regulated neuronal phosphoprotein (ppp1r1b); potassium voltage-gated channel, kqt-like subfamily, member 4 (kcnq4), which is thought to play a critical role in the regulation of neuronal excitability; plexin domain containing 1 (plxdc1); and adenosine a1 receptor (adora1).
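the "more than 3-fold" cut-off applied to the significant hits above can be sketched as a simple filter on log2 fold-change values. this is an illustrative python sketch, not the authors' pipeline; the function name and the gene labels in the usage note are hypothetical, and the input is assumed to be log2-transformed fold changes such as those reported by cuffdiff.

```python
import math


def classify_by_fold(log2_fc_by_gene, fold_cutoff=3.0):
    """Split genes into those changed by more than `fold_cutoff` in either direction.

    `log2_fc_by_gene` maps a gene identifier to log2(infected / control).
    Returns two alphabetically sorted lists: (up_regulated, down_regulated).
    """
    threshold = math.log2(fold_cutoff)  # a 3-fold change is about 1.585 in log2 units
    up = sorted(g for g, v in log2_fc_by_gene.items() if v > threshold)
    down = sorted(g for g, v in log2_fc_by_gene.items() if v < -threshold)
    return up, down
```

for example, with the made-up values `{"gene_a": 3.2, "gene_b": 1.0, "gene_c": -2.0, "gene_d": -1.5}`, only `gene_a` counts as up-regulated and only `gene_c` as down-regulated, because a log2 fold change of -1.5 corresponds to a decrease of about 2.8-fold, which does not exceed the 3-fold cut-off.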
analysis of the genomic data in the context of gene ontology, by ingenuity pathway analysis (ipa), allowed us to ascribe biological functional networks to the differentiated transcript abundance dataset. the top functions identified with the ingenuity canonical pathway list, filtered to apoptosis, cellular immune response, cytokine signalling, humoral immune responses and pathogen-influenced signalling, based on differentially expressed genes were: granzyme a signalling, crosstalk between dendritic cells and natural killer cells, il-10 signalling, role of pattern recognition receptors in recognition of bacteria and viruses, il-12 signalling and production in macrophages, complement system, interferon signalling, communication between innate and adaptive immune cells, il-17a signalling in fibroblasts, granzyme b signalling, production of nitric oxide and reactive oxygen species in macrophages, and differential regulation of cytokine production in macrophages and t helper cells by il-17a and il-17f. all of these passed the threshold of p < 0.05 as calculated by fisher's exact test; the accompanying ratio is the number of genes from the dataset that map to the pathway divided by the number of all known genes ascribed to the pathway. the genes up-regulated in the hp-prrsv rjxwn06 infected pigs' tbln were associated in 18 networks. these ranged from biological networks with functions associated with cell death, antimicrobial responses and cancer, with the highest network score of 37 (i.e. the genes in this network would have approximately a 10^-37 chance of occurring together randomly) and 21 focus molecules (i.e. the starting points for generating biological networks), to networks with functions associated with nervous system development and function, organ morphology and reproductive system disease, with a score of 2 and 1 focus molecule.
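the pathway statistics named above can be illustrated with a minimal python sketch. ipa is proprietary software, so the code below is an assumption-laden stand-in: it uses the textbook right-tailed fisher's exact (hypergeometric) test, the ratio as described in the text, and a score defined as the negative base-10 logarithm of a p-value, which is the reading implied by a score of 37 corresponding to a chance of about 10^-37. the function names and example counts are hypothetical.

```python
from math import comb, log10


def fisher_right_tail(k, n, pathway_size, background_size):
    """P(X >= k) for k pathway genes among n dataset genes.

    Assumes the n dataset genes are drawn without replacement from
    `background_size` genes, of which `pathway_size` belong to the pathway.
    """
    total = comb(background_size, n)
    upper = min(n, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, n - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def pathway_ratio(k, pathway_size):
    """Dataset genes mapping to the pathway over all genes ascribed to it."""
    return k / pathway_size


def score_from_p(p_value):
    """Negative log10 of a p-value, so p = 1e-37 gives a score of 37."""
    return -log10(p_value)
```

with made-up counts, a background of 10 genes, a pathway of 4 genes and a dataset of 3 genes of which 2 fall in the pathway gives p = 40/120, about 0.33, and a ratio of 0.5, so such a pathway would not pass the p < 0.05 threshold.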
many of the upregulated networks related to cell death and inflammatory response functions fit with the results previously reported [17, 29], where hp-prrsv strain rjxwn06 caused severe disease, resulted in up to 100x higher abundance of virus and produced an exacerbated release of cytokines, including pro-inflammatory cytokines, when compared to type 2 prototype strain vr-2332. widespread tissue damage [30] and cell death were observed, as predicted by up-regulation of cell death-associated genes (circled in orange in figure 1) in the network representation of the most highly rated network for hp-prrsv rjxwn06 by ipa. the down-regulated network functions in the hp-prrsv rjxwn06 infected tbln included activities associated with cellular function and maintenance, tissue morphology, metabolic disease, organismal development, carbohydrate metabolism, lipid metabolism, small molecule biochemistry, post-translational modification, protein folding and developmental disorder, which may be associated with cell death and reflect a severe disease state. similarly, the down-regulated network functions in the vr-2332 infected tbln were associated with cellular function, maintenance, development and organization.
figure 1 ingenuity pathways analysis summary. to investigate possible interactions of differently regulated genes, datasets representing 568 genes with altered expression profile obtained from the rnaseq data for hp-prrsv rjxwn06 were imported into the ingenuity pathway analysis tool and the following data is illustrated: the network representation of the most highly rated network (gene expression, cell death, lipid metabolism). the genes that are shaded were determined to be significant from the statistical analysis. the genes shaded red are up-regulated and those that are green are down-regulated. the intensity of the shading shows to what degree each gene was up or down-regulated. a solid line represents a direct interaction between the two gene products and a dotted line means there is an indirect interaction. genes associated with cell death are circled in orange.
this study produced transcriptional profiles of tblns from non-infected, hp-prrsv rjxwn06 and us prrsv vr-2332-infected pigs that provide insight into the immune dysregulation elicited by the virus on host transcript abundance levels necessary for an effective immune response. this rna-seq compendium extends the analyses of previous gene expression atlases performed using affymetrix genechip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing [29, 31, 32]. it is well established that many pathogens cause changes in expression of specific genes that act to protect the host and clear the infection. prrsv strains differ in their dysregulation of the immune response to infection and in the delay in development of a protective immune response in vaccinated pigs [33, 34]. a higher number of significantly differentially expressed gene instances were detected in hp-prrsv rjxwn06 than vr-2332 when normalized to control samples at a snapshot of 13 days post inoculation (dpi). as anticipated, some of the genes (e.g., resistin) and pathways identified would be expected to be involved in the host response to a severe disease. in the case of resistin, it would be expected that adipose tissue stores are being mobilized as part of the host response to infection, which includes a high fever typical of infection with hp-prrsv. there are specific cellular proteins that regulate a protective immune response, for example the pro-inflammatory genes that were upregulated to a greater extent in hp-prrsv rjxwn06 than vr-2332 when normalized to control samples, as observed when comparing the pathogenicity of hp-prrsv isolate rjxwn06 with the north american prototype strain vr-2332 prrsv.
at 13 dpi hp-prrsv rjxwn06 inoculated pigs had an interstitial pneumonia that was significantly more severe than the vr-2332 inoculated group, which appeared to be convalescing [30]. future studies of these differentially expressed genes, their transcript abundance, protein level, and protein function will enhance our understanding of the interaction of prrsv with the host. identification of new virulence mechanisms of prrsv may improve the prospects for rational design of more effective vaccines to limit viral replication and shedding. marc-145 cells were cultured in minimum essential medium (emem, safc 56416c) with 10% fetal bovine serum at 37°c, 5% co2. wild-type (wt) type 2 prrsv strain vr-2332 (genbank u87392), passage 6 on marc-145 cells, was titrated and used for the swine study. virus (rescued jxwn06; rjxwn06) was rescued from a cloned cdna of chinese highly pathogenic type 2 prrsv strain jxwn06 [pwsk-jxwn; genbank ef641008, [19]] and passaged 3 times on marc-145 cells for use in the swine study. the animal use protocol was reviewed and approved by the institutional animal care and use committee (iacuc) of the national animal disease center-usda-agricultural research service. thirty-two 10-week-old cross-bred pigs were obtained from a u.s. high-health herd and were found to be free of prrsv and influenza virus antibodies using commercially available enzyme-linked immunosorbent assay (elisa) kits (herdchek prrs 2xr; idexx laboratories, westbrook, maine) and np elisa (multis elisa, idexx, westbrook, maine), respectively. pigs were also confirmed negative for porcine circovirus type 2 by quantitative real-time pcr [35]. one day prior to starting the experiment, pigs were bled, weighed and randomly assigned to one of four groups. group 1 (n = 8) consisted of negative control pigs, which received an intranasal 2 ml sham inoculum of minimum essential media (mem) on 0 dpi.
group 2 pigs (n = 12) were challenged intranasally with 2 ml of 1 × 10^6 50% tissue culture infective dose (tcid50)/ml of chinese prrsv strain rjxwn06 in animal biosafety level-3-agriculture (absl-3-ag) housing, where they remained for the duration of the experiment. group 3 consisted of naïve pigs (n = 4) that were placed in contact with group 2 swine on 2 dpi. group 4 pigs (n = 8) were challenged intranasally with 2 ml of 1 × 10^6 tcid50/ml of type 2 prototype strain vr-2332. groups 1 and 4 were housed in separate isolation rooms in an absl2 facility. animal care and euthanasia were conducted in accordance with the report of the avma panel on euthanasia and under the supervision of iacuc of nadc. serum and bronchoalveolar lung lavage fluid (balf) were tested for infectious virus as described previously [36] . lungs were scored for gross lesions [11] and sections fixed for histopathology. swabs were collected from balf and various sites for bacterial isolation [37] . following humane euthanasia, tracheobronchial lymph nodes (tbln) from in vivo hp-prrsv rjxwn06 (n = 8), us prrsv vr-2332 (n = 7), or sham-infected pigs (n = 8) were harvested at 13 days post-infection and total cellular rna was prepared as follows. one gram of tbln from each pig was collected immediately upon necropsy, minced and stored in rnalater (life technologies, grand island, ny) at -80°c until homogenized for extraction of total rna with magmax™-96 for microarrays total rna isolation kit (applied biosystems, carlsbad, ca) using the manufacturer's protocol. the integrity of the rna was confirmed with a 2100 bioanalyzer and rna 6000 nano-chip (agilent, santa clara, ca). the samples used had an average rna integrity number (rin) value of 7.8 and 28s:18s rrna ratio of 1.9.
cdna library construction
cdna libraries were constructed from pooled total cellular rna from the tbln in each treatment group using truseq sample prep kits (illumina inc., san diego, ca) and sequenced by 2 × 100 paired-end sequencing on an illumina hiseq 2000 instrument. in order to analyze the illumina reads, a series of bioinformatics methods were used to investigate gene expression profiles in tbln during prrsv infection with hp-prrsv rjxwn06 and us prrsv vr-2332 at a snapshot of 13 dpi. this was carried out with the construction of an rnaseq analysis pipeline ( figure 2 ) comprised of gsnap for alignment and genome construction, and cufflinks to determine if differential expression and changes in transcript abundance were statistically significant. three files of transcriptome data from the sham, hp-prrsv rjxwn06 and us prrsv vr-2332 inoculated groups were aligned to the ucsc pig genome build using the gsnap alignment program in preparation for differential expression analysis. the next step in the pipeline was to put the gsnap output into the cufflinks program and run it through three separate utilities or tools within the software package: cufflinks, cuffmerge, and cuffdiff. first the three files were run through cufflinks in order to assemble the aligned rna sequence reads into transcripts and estimate the abundances in fpkm of the paired-end reads. the cufflinks q-value was the false discovery rate (fdr)-adjusted p-value of the uncorrected test statistic. the q-value threshold used in this study was 0.05. the significance status was "yes" when the q-value, that is, the p-value after benjamini-hochberg correction for multiple-testing, fell below this threshold (additional file 1). cuffmerge was then used to create a single transcript dataset from the multiple reconstructions. two runs were then conducted using the hp-prrsv rjxwn06 vs. control and the us prrsv vr-2332 vs. control datasets using the cuffdiff program to test for differential expression and regulation amongst the two disease states.
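the benjamini-hochberg adjustment referred to above can be illustrated with a short python sketch. this is not the cuffdiff implementation; the function names, the example p-values and the use of "less than or equal to" at the 0.05 cut-off are assumptions made only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significance_status(p_values, fdr=0.05):
    """Label each test 'yes' when its q-value is at or below the FDR cut-off."""
    return ["yes" if q <= fdr else "no" for q in benjamini_hochberg(p_values)]


# Hypothetical p-values, not taken from additional file 1.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg(example_p))
print(significance_status(example_p))
```

note that two of the hypothetical p-values are below 0.05 before correction but are not called significant afterwards, which is the reason the q-value rather than the raw p-value is compared with the threshold.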
gene annotation of all significant hits was then carried out using a mysql database matching to the ensembl sscrofa 9.56 reference genome currently supported by the integrative genomics viewer (broad institute). datasets representing genes with altered expression profile derived from rnaseq analyses were imported into the ingenuity pathway analysis tool (ipa tool; ingenuity® systems, redwood city, ca, usa; http://www.ingenuity.com). in ipa, differentially expressed genes were mapped to genetic networks available in the ingenuity database and then ranked by score. the basis of the ipa program consists of the ingenuity pathway knowledge base (ipkb) that is derived from known functions and interactions of genes published in the literature. thus, the ipa tool allows the identification of biological networks, global functions within the host and functional pathways of a particular dataset. the program also gives the significance value of the differentially expressed genes, the other genes with which it interacts, and how the products of the genes directly or indirectly act on each other, including those not involved in the microarray analysis. the networks created are ranked depending on the number of significantly expressed genes they contain and also list diseases that were most significant ( figure 1 ).
figure 2 computational pipeline. rnaseq analysis pipeline comprised of gsnap for alignment and cufflinks to determine if differential expression and changes in transcript abundance were statistically significant (adapted from [38] ).
assessment of the economic impact of porcine reproductive and respiratory syndrome on swine production in the united states
general overview of prrsv: a perspective from the united states
porcine reproductive and respiratory syndrome: clinical disease, pathology and immunosuppression
in utero infection by porcine reproductive and respiratory syndrome virus is sufficient to increase susceptibility of piglets to challenge by streptococcus suis type ii
pathogenesis of porcine reproductive and respiratory syndrome virus infection in gnotobiotic pigs
synergism between porcine reproductive and respiratory syndrome virus (prrsv) and salmonella choleraesuis in swine
laboratory investigation of prrs virus infection in three swine herds
a brief review of procedures and potential problems associated with the diagnosis of porcine reproductive and respiratory syndrome
genetic, geographical and temporal variation of porcine reproductive and respiratory syndrome virus in illinois
associations between genetics, farm characteristics and clinical disease in field outbreaks of porcine reproductive and respiratory syndrome virus
comparison of the pathogenicity of two us porcine reproductive and respiratory syndrome virus isolates with that of the lelystad virus
comparative pathogenicity of nine us porcine reproductive and respiratory syndrome virus (prrsv) isolates in a five-week-old cesarean-derived, colostrum-deprived pig model
reproductive failure of unknown etiology
genetic perspectives on host responses to porcine reproductive and respiratory syndrome (prrs)
heterogeneity of porcine reproductive and respiratory syndrome virus: implications for current vaccine efficacy and future vaccine development
lelystad virus and the porcine epidemic abortion and respiratory syndrome
emergence of fatal prrsv variants: unparalleled outbreaks of atypical prrs in china and molecular dissection of the unique hallmark
an infectious cdna clone of a highly pathogenic porcine reproductive and respiratory syndrome virus variant associated with porcine high fever syndrome
the 30-amino-acid deletion in the nsp2 of highly pathogenic porcine reproductive and respiratory syndrome virus emerging in china is not related to its virulence
experimental infection of united states swine with a chinese highly pathogenic strain of porcine reproductive and respiratory syndrome virus
de novo assembly and analysis of rnaseq data
transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation
genenames.org: the hgnc resources in 2011
cloning and characterization of porcine resistin gene
porcine s100a8 and s100a9: molecular characterizations and crucial functions in response to haemophilus parasuis infection
effect of hypoxia and reoxygenation on the formation and release of reactive oxygen species by porcine pulmonary artery endothelial cells
citrullination by peptidylarginine deiminase in rheumatoid arthritis
identification of a gene encoding an acyl coa: diacylglycerol acyltransferase, a key enzyme in triacylglycerol synthesis
aberrant host immune response induced by highly virulent prrsv identified by digital gene expression tag profiling
hp-prrsv challenge of 4 and 10-week-old pigs
in-depth global analysis of transcript abundance levels in porcine alveolar macrophages following infection with porcine reproductive and respiratory syndrome virus
molecular characterization of transcriptome-wide interactions between highly pathogenic porcine reproductive and respiratory syndrome virus and porcine alveolar macrophages in vivo
lymphoid hyperplasia resulting in immune dysregulation is caused by porcine reproductive and respiratory syndrome virus infection in neonatal pigs
infection with porcine reproductive and respiratory syndrome virus stimulates an early gamma interferon response in the serum of pigs
effect of vaccination with selective bacterins on conventional pigs infected with type 2 porcine circovirus
in vivo growth of porcine reproductive and respiratory syndrome virus engineered nsp2 deletion mutants
coinfection of pigs with porcine respiratory coronavirus and bordetella bronchiseptica
uncovering the complexity of transcriptomes with rna-seq
we thank the following members of the virus and prion research unit at the national animal disease center: j. huegel, j. crabtree, a. burow, d. adolphson, s. anderson, m. kappes and a. vorwald for technical assistance. we also gratefully acknowledge d. alt of the genomics unit at the national animal disease center, and a. severin of iowa state university ngs bioinformatics, for assistance in data analysis. mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the u.s. department of agriculture.
additional file 1: cuffdiff output of all significant hits for comparison, using the hp-prrsv rjxwn06 vs. control and the us prrsv vr-2332 vs. control datasets, to test for differential expression and regulation amongst the two disease states. gene annotation (symbol) was carried out using a mysql database matching to the ensembl sscrofa9.56 reference genome currently supported by the integrative genomics viewer (broad institute).
location (row.names); gene annotation (symbol); entrez gene name; treatment (sample_1, sample_2); status; abundance in fpkm (value_1, value_2); differential expression (log2_fold_change); test statistic (test_stat); p-value (p_value); false discovery rate (fdr)-adjusted p-value of the uncorrected test statistic (q_value); significance status after benjamini-hochberg correction for multiple-testing (significant).
additional file 2: transcript sequences of all significant hits in the hp-prrsv rjxwn06 vs. control and the us prrsv vr-2332 vs. control datasets.
the authors declare that they have no competing interests.
authors' contributions lcm: study conception, data collection and analysis, research design, manuscript writing. df: research design, data analysis. aa: data analysis. dob: rnaseq interpretation discussions and data analysis. bg: infectious clone and production of virus stocks. kml: study conception, animal study execution, virus stocks, data collection, and manuscript preparation and writing. jnh: data collection. sns: data collection. h-cy: infectious clone of hp-prrsv strain jxwn06. ksf: study conception, rescue of hp-prrsv infectious clone, and production of virus stocks. mek: study conception and manuscript preparation and writing. all authors read and approved the final manuscript.
key: cord-356197-js7l86fh authors: zhou, ping; zhai, shanli; zhou, xiang; lin, ping; jiang, tengfei; hu, xueying; jiang, yunbo; wu, bin; zhang, qingde; xu, xuewen; li, jin-ping; liu, bang title: molecular characterization of transcriptome-wide interactions between highly pathogenic porcine reproductive and respiratory syndrome virus and porcine alveolar macrophages in vivo date: 2011-08-07 journal: int j biol sci doi: nan sha: doc_id: 356197 cord_uid: js7l86fh porcine reproductive and respiratory syndrome virus (prrsv) infects mainly the porcine alveolar macrophages (pams) and causes porcine reproductive and respiratory syndrome (prrs).
previous studies have analyzed the global gene expression profiles of lung tissue in vivo and pams in vitro following infection with prrsv; however, transcriptome-wide understanding of the interaction between highly pathogenic prrsv (hp-prrsv) and pams in vivo has not yet been established. in this study, we employed affymetrix microarrays to investigate the gene expression patterns of pams isolated from tongcheng piglets (a chinese indigenous breed) after infection with hp-prrsv. during the infection, tongcheng piglets exhibited typical clinical signs, e.g. fever, asthma, coughing, anorexia, lethargy and convulsion, but displayed mild regional lung damage at 5 and 7 dpi. microarray analysis revealed that hp-prrsv infection affected the expression in pams of important genes involved in cytoskeleton and exocytosis organization, protein degradation and folding, and intracellular calcium and zinc homeostasis. several potential antiviral strategies might be employed in pams, including upregulating ifn-induced genes and increasing intracellular zinc ion concentration. in addition, inhibition of the complement system likely attenuated the lung damage during hp-prrsv infection. transcriptomic analysis of pams in vivo could lead to a better understanding of the hp-prrsv-host interaction, and to the identification of novel antiviral therapies and genetic components of swine tolerance/susceptibility to hp-prrs. porcine reproductive and respiratory syndrome (prrs), caused by prrs virus (prrsv) which belongs to the genus arterivirus of the family arteriviridae, is the most economically significant disease affecting commercially bred pigs world-wide [1] . this disease is characterized by anorexia, increased late-term abortions, increased number of stillborn pigs, mummified fetuses, weak live-born piglets, increased pre-weaning mortality, and delayed return to estrus [2] .
in vivo, prrsv productive infection occurs predominately in alveolar macrophages of the lung [3] , followed by viremia and subsequent interstitial pneumonia within 3 days [4] . it was hypothesized that respiratory pathology, especially lung damage during prrsv infection, results from an overproduction of pro-inflammatory cytokines in the lungs [5] . genome-wide transcriptional responses of lungs of landrace×yorkshire crossbred piglets to infection with a classical north american type prrsv strain were analyzed by solexa/illumina's digital gene expression (dge) system, which is a tag-based high-throughput transcriptome sequencing method [6] . this systematic analysis of the pulmonary gene expression profiles suggested that upregulated expression of pro-inflammatory cytokines, chemokines, adhesion molecules and inflammatory enzymes, together with inflammatory cells, antibodies and complement activation, was likely to result in the development of inflammatory responses during prrsv infection processes [6] . another high-throughput deep sequencing study was performed focusing on the pulmonary gene expression profiles after a highly pathogenic-prrsv (hp-prrsv) strain infection [7] . the system analysis of the pulmonary gene expression provides a comprehensive basis for better understanding the pathogenesis of hp-prrsv [7] . because prrsv infection occurs predominately in porcine alveolar macrophages (pams) [3] , the interaction between prrsv and pams has been studied systematically by high-throughput research methods in vitro. pams, lavaged from six piglets, were challenged with the lelystad prrsv strain in vitro, and the gene expression of the pams was investigated using affymetrix microarrays [8] . the result suggested that the expression of beta interferon 1 (ifn-β), but not of ifn-α, was strongly upregulated in the early stage of prrsv infection [8] .
besides microarray, serial analysis of gene expression (sage) was also employed to examine the global expression of genes in prrsv-infected pams in vitro [9] . these studies have provided global gene expression profiles of lung tissue in vivo and pams in vitro following infection with prrsv; however, transcriptome-wide understanding of the interaction between prrsv and pams in vivo has not yet been established. in 2006, an unparalleled large-scale outbreak of highly pathogenic prrs (hp-prrs) occurred in many areas of china. this outbreak affected more than 2 million pigs and produced approximately 0.4 million fatal cases [10] . in this study, laboratory infection was performed in tongcheng piglets (a chinese indigenous breed living in tongcheng county of hubei province) using prrsv strain wuh3 [11] , a highly pathogenic prrsv isolated in china during the pandemic period of hp-prrs in 2006. we also employed affymetrix microarrays to investigate the gene expression patterns of pams isolated from the piglets after infection. the current study aims at better understanding the interaction between hp-prrsv and the host pams, which may lead to the identification of key host factors for tolerance/susceptibility to the virus and the finding of novel targets for antiviral therapies. all animal procedures were performed according to protocols approved by the biological studies animal care and use committee of hubei province, china. piglets used in this study were free from prrsv, pseudorabies virus (prv) and porcine circovirus type 2 (pcv2), as determined by elisa test for serum antibodies. twelve 5-week-old tongcheng boars (a chinese indigenous breed) were obtained from three litters (four piglets per litter), and raised in pathogen-free facilities. to perform a paired experiment, individuals within a full-sib litter were separated into two groups: one infected group and one control group with 6 piglets in each group.
the infected groups were challenged with prrsv-wuh3 (3 ml/15 kg, 10 -5 tcid50/ml) by intramuscular inoculation. slaughters were carried out at 0 day post-infection (dpi) for uninfected (control) groups, and at 5 or 7 dpi for infected groups. rectal temperature and clinical signs were recorded daily during the experiment. the serum samples for viremia detection were collected daily from all animals (one ml blood per sampling point). the pams for microarray analysis were collected by bronchoalveolar lavage from three uninfected pigs and three infected pigs at 5 dpi. post-mortem examinations were performed on all pigs. macroscopic lung lesions were given a subjective score to estimate the percentage of the lung affected by pneumonia, following a scoring system described previously [12, 13] . for histopathology analysis, samples of the apical segment of the lower lung lobes were collected and fixed in 4% paraformaldehyde for 24 h. fixed samples were dehydrated, embedded in paraffin, sectioned at 4 μm and stained with hematoxylin and eosin. sections were examined by light microscopy. for viremia detection, serum samples were collected daily from all pigs. total viral rna was extracted from 200 μl serum using trizol reagent (invitrogen, carlsbad, ca). cdna was synthesized using oligo(dt)15 primer and m-mlv reverse transcriptase (promega, madison, wi) in a 50 μl reaction mixture according to the manufacturer's instructions. absolute quantitative-pcr (q-pcr) was performed using primers specific to the orf7 of prrsv (sense: 5'-tca gct gtg cca aat gct gg-3'; antisense: 5'-aaa tgg ggc ttc tcc ggg ttt t-3'). for absolute quantification, a pet-18m plasmid of known copy number containing the orf7 fragment was used to generate the standard curve. viral copies per ml of the unknown samples were determined by linear extrapolation of the ct value plotted against the standard curve [14] . trizol (invitrogen) was used for rna extractions following the manufacturer's instructions.
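the standard-curve step of the absolute quantification described above can be sketched in python as follows. the dilution series, ct values and function names are hypothetical and are not taken from the study; the sketch only shows the usual least-squares fit of ct against log10 copy number and its inversion for an unknown sample. converting copies per reaction to copies per ml would additionally require the extraction and reaction volumes, which are not modelled here.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    standards: list of (copy_number, ct) pairs from a plasmid dilution series.
    """
    xs = [math.log10(copies) for copies, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold plasmid dilution series (copies per reaction, Ct).
standards = [(1e7, 16.76), (1e6, 20.08), (1e5, 23.40), (1e4, 26.72), (1e3, 30.04)]
slope, intercept = fit_standard_curve(standards)
efficiency = 10 ** (-1 / slope) - 1  # amplification efficiency implied by the slope
print(slope, intercept, efficiency)
print(copies_from_ct(25.0, slope, intercept))
```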
rna integrity and concentration were evaluated by denaturing formaldehyde gel electrophoresis and agilent 2100 bioanalyzer. the rna samples were sent to genetech biotechnology limited company (shanghai, china) for hybridization to the porcine affymetrix genechip (affymetrix, santa clara, ca). a total of 6 microarray analyses were conducted using the procedure described previously [15] . the raw data (affymetrix genechip scanner 3000) were converted to gene signal files by mas 5.0 (microarray suite version 5.0, affymetrix). the data points were normalized between slides using the quantile normalization method described by bolstad et al. [16] . the differentially expressed genes were selected using the sam (significance analysis of microarrays) package (http://www-stat.stanford.edu/~tibs/sam/), and the false discovery rate (fdr) values were generated using permutations of the repeated measurements to estimate the percentage of genes identified by chance. in the experiment, sam settings were adjusted for a two class paired analysis, using one hundred permutations to calculate the differentially expressed gene list. a fold-change of 1.5 and a false discovery rate of approximately 5% were set as the threshold. all data are miame compliant and have been deposited in ncbi's gene expression omnibus and are accessible through geo series accession number gse22782 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse22782). the differentially expressed genes were subjected to hierarchical clustering using cluster (ver.3.0) and treeview (ver.1.60) [17] . the functional annotation of differentially expressed genes was performed by the david (the database for annotation, visualization and integrated discovery) gene annotation tool (http://david.abcc.ncifcrf.gov/) [18] , as well as by referring to a previous work [19] . the rna samples prepared for microarray analysis were also used for q-pcr verification.
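the quantile normalization mentioned above can be illustrated with a minimal python sketch. it follows the general idea of the method (each array is forced to share the same distribution, namely the mean of the sorted values across arrays) but it is not the software used in the study; the function name and the toy intensities are assumptions, and ties are not given any special treatment.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equally long intensity lists (one per array)."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(values) for values in arrays]
    # Reference distribution: mean of the values sharing a rank across arrays.
    reference = [
        sum(values[rank] for values in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for values in arrays:
        # Probe indices ordered by intensity within this array.
        order = sorted(range(n_probes), key=lambda i: values[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy intensities for three hypothetical arrays and four probes.
toy = [[5, 2, 3, 4], [4, 1, 3, 2], [3, 4, 6, 8]]
for column in quantile_normalize(toy):
    print(column)
```

after normalization every array contains exactly the same set of values, and only the assignment of those values to probes differs between arrays.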
reverse transcriptions were performed using m-mlv reverse transcriptase (promega) according to the manufacturer's instructions. the primers were designed with the primer premier 5.0 program. the rpl32 gene was used as the internal control [20] . the primer sequences, melting temperatures and product sizes are shown in table 1 . q-pcr was performed on the lightcycler 480ⅱ (roche, basel, switzerland) using sybr green realtime pcr master mix (toyobo co., ltd, japan) as the readout. data were analyzed by the 2^(-δδct) method [21] . the data analysis procedure was performed as described previously [15] . after infection with prrsv-wuh3, the piglets presented typical clinical signs, e.g. fever, asthma, coughing, anorexia, lethargy and convulsion. the average rectal temperature rose to above 40.5 °c at 2 dpi and seemed to peak at 5 dpi. the two piglets surviving at 5 dpi showed a slight decrease of rectal temperature in the following two days ( figure 1a ). to assess the replication and spread of hp-prrsv, the viral copy number/ml in serum was determined by absolute real-time quantitative-pcr ( figure 1b ). the level of viremia increased rapidly during the first two days post-infection, then increased slowly from 3 to 5 dpi, and approached the plateau phase at 6 or 7 dpi. pathologic examination was carried out on the animals. macroscopic examination detected a mild lung lesion at the apical segment of the lower lobes at 5 and 7 dpi ( figure 1c ). for estimating the severity of the pneumonia, gross lung lesion scores were made based on the method described previously [12, 13] . the low scores indicated a mild regional lung damage at 5 and 7 days after hp-prrsv infection ( figure 1d ). as compared with the uninfected group ( figure 1e) , microscopic examination detected a certain extent of congestion as well as interstitial infiltration of leukocytes in the lungs of infected piglets ( figure 1f ).
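the relative quantification used for the q-pcr verification in the methods above can be sketched in python as follows. the ct values are hypothetical, the function name is an assumption, and the calculation presumes close to 100% amplification efficiency for both the target gene and the internal control (rpl32 in this study).

```python
def fold_change_ddct(ct_target_infected, ct_ref_infected,
                     ct_target_control, ct_ref_control):
    """Relative expression (infected vs. control) by the 2^(-ddCt) method."""
    # Normalize the target gene to the internal control within each sample.
    delta_ct_infected = ct_target_infected - ct_ref_infected
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between infected and control samples.
    delta_delta_ct = delta_ct_infected - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier after
# infection while the internal control is unchanged.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

a result above 1 indicates higher expression in the infected sample and a result below 1 indicates lower expression.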
pams samples collected from three infected piglets at 5 dpi and three uninfected piglets were analyzed. a total of 12,775 transcripts (53% of all probesets) were expressed in infected and non-infected pams (supplementary table 1 ). after quantile normalization, 321 genes were identified as differentially expressed (de) genes, with 219 being upregulated and 102 being downregulated, under the threshold of fold change (fc) of 1.5 or greater and a false discovery rate (fdr) of approximately 5% (figure 2a and supplementary table 2 ). based on the database for annotation, visualization and integrated discovery (david), 166 of the de genes were classified into 47 categories, many of which shared the same genes, according to their functional correlation (figure 2b and supplementary table 3 ). the majority of the genes related to the virus-host cell interaction could be assigned into the categories including cell death and apoptosis related, response to wounding, response to unfolded protein, response to oxidative stress, response to virus, innate immune response, response to cytokine stimulus, and endoplasmic reticulum (er) overload response. other de genes that were not classified by david were taken into account for further analysis below. in macrophages, prrsv entry into the host cell is mediated by heparan sulphate proteoglycans and the receptor sialoadhesin. upon a ph drop, prrsv is uncoated and its genome is released from the endosomes into the cytoplasm, which allows virus replication [22] . after hp-prrsv infection the atp6v1b2 gene, which encodes a component of vacuolar atpase (v-atpase) that mediates acidification of endosomal organelles [23] , was upregulated ( figure 3 ). sarm1, a negative regulator of trif-dependent toll-like receptor (tlr) signaling [24] and mapk phosphorylation [25] , was significantly downregulated (figure 3 ). 
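the selection rule described above (fold change of 1.5 or greater and fdr of approximately 5%) can be written as a small python sketch. it is a simplified stand-in for the sam procedure, not a reimplementation of it; the function name, the treatment of fold change as an infected/control ratio and the placeholder gene labels and values are assumptions for illustration only.

```python
def classify_gene(fold_change, fdr, fc_threshold=1.5, fdr_threshold=0.05):
    """Classify a gene from its infected/control expression ratio and FDR."""
    if fdr > fdr_threshold:
        return "unchanged"
    if fold_change >= fc_threshold:
        return "up"
    if fold_change <= 1 / fc_threshold:
        return "down"
    return "unchanged"


# Placeholder genes with made-up ratios and FDR values.
toy_results = {
    "gene_a": (2.0, 0.01),
    "gene_b": (0.5, 0.03),
    "gene_c": (1.2, 0.01),
    "gene_d": (3.0, 0.20),
}
calls = {gene: classify_gene(fc, fdr) for gene, (fc, fdr) in toy_results.items()}
print(calls)
print(sum(1 for c in calls.values() if c == "up"),
      sum(1 for c in calls.values() if c == "down"))
```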
sbno2, a potent inhibitor of nf-κb [26] , and socs1 which limits nf-κb signaling by decreasing p65 stability within the cell nucleus [27] , were upregulated ( figure 3 ). upon hp-prrsv infection, irf7 was found to be upregulated in pams at 5 dpi ( figure 3) , however, no type-i ifn or ifn-γ induction was observed. a number of ifn-induced genes (ifi6, ifi16, ifih1, ifit2, ifit3, ifitm3, gbp1, gbp2, mx1, gzmb, gzmh, isg15, usp18, rsad2, nmi) were upregulated (table 2) . jak-stat pathway seemed to be positively (stat1 and nmi) as well as negatively (socs1) regulated during hp-prrsv infection (figure 3 ). nine genes (s100a6, marcks, cacybp, cct6a, arhe, cct3, ptpn4, cct7, and twf1) related to actin and tubulin cytoskeleton organization were upregulated and three (rassf8, elmo1, and kif11) were downregulated (table 3 ). in addition, several exocytosis related genes (rsad2, gsk3b, lman2l, exoc2, sels, copz2, sec31l2, and sec8l1) were differentially expressed in pams after hp-prrsv infection (table 3) . rsad2, encoding an ifn-induced protein which inhibits influenza a virus release from the plasma membrane of infected cells by affecting the formation of lipid rafts [28] , was upregulated significantly. vesicle trafficking between the golgi apparatus and er seemed to be restricted, because copz2, a member of the copi coat which helps vesicles transport proteins from the cis end of the golgi complex back to the rough er [29] , and sec31l2, a component of the copii vesicle coat that mediates vesicular traffic from the rough er to the golgi apparatus [30] , were both downregulated. during hp-prrsv infection, homeostasis of isgylation, an ubiquitin-like modification, seemed to be re-established in pams by enhancing the expression of isg15, an ubiquitin-like protein, and usp18, which is an isg15 deconjugating protease (table 3) . three e3 ubiquitin ligase genes (cacybp, herc6, cul1) were upregulated, and two (herc3, g2e3) were downregulated ( table 3) . 
as expected, a large set of chaperone genes were upregulated, including heat shock 40 kda protein (hsp40) (dnaja1, dnaja4, dnajb1, dnajb2, and dnajb4), hsp60 (hspd1), hsp70 (hspa1b, hspa4, hspa4, and hspa6), hsp105/110 (hsph1), and subunits of chaperonin containing t-complex polypeptide 1 (cct3, cct7, and cct6a), as well as npm3, a molecular chaperone in the cell nucleus (table 3) . upon hp-prrsv infection, several de genes were involved in the intracellular calcium homeostasis in pams (table 3) . after hp-prrsv infection, zinc ion concentration in pams seemed to be increased, through upregulating the expression of slc39a14 which encodes a zinc influx transporter [31] (table 3) . several zinc finger protein encoding genes (zdhhc9, zfand2a, zcchc6, zcwpw1, zfp2, znf258) were also identified as de genes, and all of them were upregulated, except znf258 (table 3) . during hp-prrsv infection, a set of de genes involved in the dynamic regulation of the extracellular matrix and vascular permeability was identified ( table 4 ). infiltration of leukocytes into pulmonary alveoli, as a sign of inflammation, was modulated by upregulating a small number of genes (ccl2, ccl4l, ccr5 and csf1) ( table 4 ). three genes, mpp1, pf4, and ppbp, involved in neutrophil infiltration or activation [32] [33] [34] , were all downregulated (table 4) . during hp-prrsv infection, complement activation seemed to be inhibited, as expression of c3 and pfc, a positive regulator of complement activation, was downregulated, and clu, encoding a complement inhibitor, was upregulated ( table 4 ). seven genes (ccl2, slc39a14, atp6v1b2, c3, ddit3, glrx2 and tnf) were selected for q-pcr assay to validate the changes in gene expression observed by microarray analysis. ccl2 was the main upregulated chemokine gene in this study (table 4) . two upregulated genes, slc39a14 and atp6v1b2, were involved in intracellular zinc homeostasis and endosome acidification, respectively.
the downregulated c3 gene is the core member of the complement system which seemed to be inhibited, according to our study ( table 4 ). the q-pcr gene list also contained two de genes (ddit3, glrx2) which were not referred to in the discussion, and tnf, an important cytokine gene, which was not differentially expressed in tongcheng pams in response to hp-prrsv infection. the changes in expression of these genes detected by microarray analysis were in agreement with the q-pcr validation (figure 4) . the results of this study showed that tongcheng piglets exhibited typical clinical signs following infection with hp-prrsv wuh3 strain. the lung damage caused by the infection was regional and mild at 5 and 7 dpi ( figure 1c and d), but further observation for a longer period of time was not performed in this study. the slow reproduction rate of the virus (viremia) at 3 to 7 dpi ( figure 1b ) suggested a near balance between the viral replication and the defense mechanisms in the pams. transcriptomic analysis of the pams at 5 dpi identified 321 de genes under the filter of 1.5-fold change, and the number of upregulated genes (219) was greater than that of downregulated genes (102). in comparison, an in vitro transcriptomic analysis of pams revealed that only small numbers (no more than 100) of de genes (threshold of 1.5-fold change) were identified at 1 to 12 hours post prrsv infection, and the overall effect of prrsv on the host transcription machinery was downregulation [8] . it is unclear whether the overall effect of prrsv on the host transcription machinery shifts from downregulation to upregulation as time goes on, or whether the change only reflects the difference between in vitro and in vivo assays. as compared with the number of de genes in this study, some thousands of de genes (threshold of almost 1.5-fold change) were identified in lung tissues at 4 and 7 dpi following both prrsv and hp-prrsv infection, by high-throughput deep sequencing assays [6, 7] .
this great number of de genes might result from the huge amount of data obtained by the deep-sequencing method and from the many cell types in lung tissues. prrsv is considered to inhibit the synthesis of type-i ifns and its signaling by blocking stat1/stat2 nuclear translocation [35] . however, it is also reported that prrsv can phosphorylate ifn-regulatory factor 3 (irf-3) and weakly activate the ifn-β promoter in marc-145 cells in early infection, but the activations of irf-3 and ifn-β promoter are rapidly inhibited in the following infection [36] . the induction of ifn-β mrna, but not ifn-α mrna, is observed in monocyte-derived dendritic cells and primary alveolar macrophages infected by prrsv at 12 dpi [8, 37] . in some cases, even the expression of ifn-α can be detected in the lung [38] or serum [39] of pigs infected with prrsv during the early days. interestingly, in this study, no induction of type-i ifn was detected in pams at 5 dpi (figure 3) , whereas a series of ifn induced genes that are critical for the cell to defend itself against viral infection, were upregulated (table 2) . similar results were shown in lung tissues at 4 and 7 dpi following both prrsv and hp-prrsv infection [6, 7] , and it was speculated that the ifn induced genes were predominantly expressed by the uninfected cells [7] . here, another possibility is suggested that a certain amount of type-i ifn might be induced at the early stage of the infection before 5 dpi. during hp-prrsv infection, several aspects of the pams' function were under regulation, such as actin and tubulin cytoskeleton organization, exocytosis, protein degradation, protein folding, intracellular calcium and zinc homeostasis ( table 3) . increasing of intracellular zinc concentration impairs the replication of a variety of rna viruses, including poliovirus, influenza virus, coronavirus, arterivirus, rhinovirus, and respiratory syncytial virus [40] [41] [42] . 
recently, zinc ion has been reported to efficiently inhibit the rna-synthesizing activity of the multiprotein replication and transcription complex of both sars-coronavirus and equine arteritis virus [40] . upregulation of slc39a14 (also known as zip14) (table 3) , a member of the slc39 (zip) family which transports zinc from the extracellular space or organellar lumen into the cytoplasm [43] , might be a defense mechanism in pams during hp-prrsv infection. nevertheless, none of the slc39 family genes was identified as a de gene in a microarray assay of pams infected with prrsv in vitro [8] . furthermore, the expression of slc39a7, another member of the slc39 family, was downregulated in the lungs of landrace×yorkshire crossbred piglets at 7 dpi following hp-prrsv infection [7] . it has been shown in this study, that modulated inflammatory reaction, with a few proinflammatory cytokines upregulated (ccl2, ccl4l and its receptor ccr5, and csf1) ( table 4) , might contribute to the mild regional lung lesion observed at 5 and 7 dpi ( figure 1c and d) . besides, the complement system is one of the key players in the defense against infections. however, excessive activation of the complement can also exaggerate the disease induced by viral or bacterial infection. in 2009, a new h1n1 influenza a virus caused severe disease in naive middle-aged human individuals with preexisting immunity against seasonal strains, and this disease is reported to be induced through high titers of low-avidity nonprotective antibody and immune complex-mediated complement activation in the respiratory tract [44] . excessive complement activation can contribute to organ damage in combination with the cytokine storm in the later stages of sepsis caused by bacterial infection [45] . it is reported that blocking complement activation can ameliorate hepatic inflammation mediated by the hepatitis c virus core protein [46] . 
likewise, inhibition of complement with a potent c3 inhibitor (compstatin) in a baboon model of late-stage sepsis markedly improves organ preservation and other clinical parameters [47] . as it has been shown here, inhibition of the complement system might also be a contributor to the mild regional lung damage during hp-prrsv infection. interestingly, infection of hp-prrsv in six-week-old crossbred weaned pigs (landrace × yorkshire) induces complement activation accompanied by severe lung damage [7] . in summary, the data presented in this study suggested that during infection with hp-prrsv tongcheng piglets exhibited typical clinical signs, but displayed mild regional lung damage at 5 and 7 dpi. microarray analysis revealed that hp-prrsv infection has affected pams in vivo in expression of the important genes involved in cytoskeleton and exocytosis organization, protein degradation and folding, intracellular calcium and zinc homeostasis. several potential antiviral strategies might be employed in pams, including upregulating ifn-induced genes and increasing intracellular zinc ion concentration. furthermore, inhibition of the complement system likely attenuated the lung damage during hp-prrsv infection. this system analysis could lead to a better understanding of the hp-prrsv-host interaction, and to the identification of novel antiviral therapies and identifying genetic components for swine tolerance/susceptibility to hp-prrs. 
genetic parameters for performance traits in commercial sows estimated before and after an outbreak of porcine reproductive and respiratory syndrome immunohistochemical detection of porcine reproductive and respiratory syndrome virus antigen in neurovascular lesions immunological responses of swine to porcine reproductive and respiratory syndrome virus infection distribution of a korean strain of porcine reproductive and respiratory syndrome virus in experimentally infected pigs, as demonstrated immunohistochemically and by in-situ hybridization the combination of prrs virus and bacterial endotoxin as a model for multifactorial respiratory disease in pigs understanding prrsv infection in porcine lung based on genome-wide transcriptome response identified by deep sequencing aberrant host immune response induced by highly virulent prrsv identified by digital gene expression tag profiling genome-wide transcriptional response of primary alveolar macrophages following infection with porcine reproductive and respiratory syndrome virus effect of porcine reproductive and respiratory syndrome virus on porcine alveolar macrophage function as determined using serial analysis of gene expression (sage) emergence of fatal prrsv variants: unparalleled outbreaks of atypical prrs in china and molecular dissection of the unique hallmark immunogenicity of the highly pathogenic porcine reproductive and respiratory syndrome virus gp5 protein encoded by a synthetic orf5 gene comparative pathogenicity of nine us porcine reproductive and respiratory syndrome virus (prrsv) isolates in a five-week-old cesarean-derived, colostrum-deprived pig model comparison of the pathogenicity of two us porcine reproductive and respiratory syndrome virus isolates with that of the lelystad virus assessment of the efficacy of commercial porcine reproductive and respiratory syndrome virus (prrsv) vaccines based on measurement of serologic response, frequency of gamma-ifn-producing cells and virological 
parameters of protection upon challenge understanding haemophilus parasuis infection in porcine spleen through a transcriptomics approach a comparison of normalization methods for high density oligonucleotide array data based on variance and bias cluster analysis and display of genome-wide expression patterns database for annotation, visualization, and integrated discovery annotation of the affymetrix porcine genome microarray transcriptomic analysis of the dialogue between pseudorabies virus and porcine epithelial cells during infection analysis of relative gene expression data using real-time quantitative pcr and the 2(-delta delta c(t)) method involvement of proteases in porcine reproductive and respiratory syndrome virus uncoating upon internalization in primary macrophages v-atpase functions in normal and disease processes the human adaptor sarm negatively regulates adaptor protein trif-dependent toll-like receptor signaling sarm inhibits both trif-and myd88-mediated ap-1 activation cutting edge: a transcriptional repressor and corepressor induced by the stat3-regulated anti-inflammatory signaling pathway suppressor of cytokine signaling 1 (socs1) limits nf{kappa}b signaling by decreasing p65 stability within the cell nucleus the interferon-inducible protein viperin inhibits influenza virus release by perturbing lipid rafts identification and characterization of novel isoforms of cop i subunits human sec31b: a family of new mammalian orthologues of yeast sec31p that associate with the copii coat structure-function analysis of a novel member of the liv-1 subfamily of zinc transporters erythrocyte scaffolding protein p55/mpp1 functions as an essential regulator of neutrophil polarity role of the platelet chemokine platelet factor 4 (pf4) in hemostasis and thrombosis canine cxcl7 and its functional expression in dendritic cells undergoing maturation porcine reproductive and respiratory syndrome virus inhibits type i interferon signaling by blocking stat1/stat2 
nuclear translocation porcine reproductive and respiratory syndrome virus (prrsv) could be sensed by professional beta interferon-producing system and had mechanisms to inhibit this action in marc-145 cells differential type i interferon activation and susceptibility of dendritic cell populations to porcine arterivirus expression of interferon-alpha and mx1 protein in pigs acutely infected with porcine reproductive and respiratory syndrome virus (prrsv) interferon-alpha response to swine arterivirus (poav), the porcine reproductive and respiratory syndrome virus zn inhibits coronavirus and arterivirus rna polymerase activity in vitro and zinc ionophores block the replication of these viruses in cell culture zinc ions inhibit replication of rhinoviruses effect of zinc salts on respiratory syncytial virus replication role of zn2+ ions in host-virus interactions severe pandemic 2009 h1n1 influenza disease due to pathogenic immune complexes complement: a key system for immune surveillance and homeostasis hepatic inflammation mediated by hepatitis c virus core protein is ameliorated by blocking complement activation complement inhibition decreases the procoagulant response and confers organ protection in a baboon model of escherichia coli sepsis the porcine reproductive and respiratory syndrome virus requires trafficking through cd163-positive early endosomes, but not late endosomes, for productive infection we would like to thank tinghua huang and lijie su in our lab for their help with microarray data analysis. we are grateful to dr. xiao zhang (uppsala university, sweden) for valuable discussions and improving the manuscript. we thank our lab members for sample collections. the authors have declared that no conflict of interest exists. supplementary key: cord-303132-m3j1dekj authors: smith, s. e.; gibson, m. s.; wash, r. s.; ferrara, f.; wright, e.; temperton, n.; kellam, p.; fife, m. 
title: chicken interferon-inducible transmembrane protein 3 restricts influenza viruses and lyssaviruses in vitro date: 2013-12-17 journal: j virol doi: 10.1128/jvi.01443-13 sha: doc_id: 303132 cord_uid: m3j1dekj interferon-inducible transmembrane protein 3 (ifitm3) is an effector protein of the innate immune system. it confers potent, cell-intrinsic resistance to infection by diverse enveloped viruses both in vitro and in vivo, including influenza viruses, west nile virus, and dengue virus. ifitm3 prevents cytosolic entry of these viruses by blocking complete virus envelope fusion with cell endosome membranes. although the ifitm locus, which includes ifitm1, -2, -3, and -5, is present in mammalian species, this locus has not been unambiguously identified or functionally characterized in avian species. here, we show that the ifitm locus exists in chickens and is syntenic with the ifitm locus in mammals. the chicken ifitm3 protein restricts cell infection by influenza a viruses and lyssaviruses to a similar level as its human orthologue. furthermore, we show that chicken ifitm3 is functional in chicken cells and that knockdown of constitutive expression in chicken fibroblasts results in enhanced infection by influenza a virus. chicken ifitm2 and -3 are constitutively expressed in all tissues examined, whereas ifitm1 is only expressed in the bursa of fabricius, gastrointestinal tract, cecal tonsil, and trachea. despite being highly divergent at the amino acid level, ifitm3 proteins of birds and mammals can restrict replication of viruses that are able to infect different host species, suggesting ifitm proteins may provide a crucial barrier for zoonotic infections. type i and ii interferons (ifns) are critical for the development of the cell intrinsic antiviral state and achieve this by inducing the expression of genes collectively named ifn-stimulated genes (isgs). 
expression of the interferon-inducible transmembrane (ifitm) genes (new members of the isg family) restricts the replication of several highly pathogenic human viruses, including severe acute respiratory syndrome (sars) coronavirus, filoviruses (marburg virus and ebola virus), influenza a viruses (iavs), and flaviviruses (dengue virus) (1, 2) . although restriction of hiv-1 infection has also been reported in some studies (3, 4) , others have failed to demonstrate such activity (1) . ifitm proteins are small, with an average size of 130 amino acids, and share a topology (5) defined by a conserved cd225 domain (6) . this domain consists of two intramembrane (im) regions and a conserved intracellular loop (cil). as their names suggest, ifitm proteins are upregulated by type i and ii ifns; however, some cell and tissue types express constitutive levels of one or more of these proteins (7) . in humans, ifitm1, -2, and -3 are expressed in a wide range of tissues, while ifitm5 expression is limited to osteoblasts. mice have orthologues for ifitm1, -2, -3, and -5 and additional ifitm genes, ifitm6 and ifitm7 (8) . genome analysis of chickens has predicted the existence of two ifitm genes, orthologous to human ifitm10 (huifitm10) and huifitm5 (9) . however, such in silico analysis is often confounded by inappropriate identification of pseudogenes and incorrect assignment of orthologues, due to an incomplete knowledge of ifitm gene duplication and evolutionary history of this locus during speciation. under such circumstances, careful genome analysis of syntenic regions and functional characterization of genes are required to attempt to unambiguously define orthologous genes. ifitm proteins are the only mediators of innate immunity known to inhibit viral infection by blocking cytoplasmic entry and replication of diverse enveloped viruses (10) . 
ifitm-mediated viral restriction occurs at entry sites of susceptible viruses, in the late endosomal and lysosomal compartments, where the proteins are predicted to adopt an intramembrane structure. the n and c termini of the proteins are predicted to be cytoplasmic, the two intramembrane domains are buried within the cytoplasmic facing lipid bilayer, and the cd225 domain is thought to be facing the cytoplasm (11). ifitm proteins inhibit formation of a fusion pore between the virus and endosomal membranes following acidic activation of virus envelope fusion proteins. recently, the ability of ifitm proteins to alter cellular membrane fluidity was demonstrated, leading to the arrest of fusion pore formation at the stage of hemimembrane fusion (12). furthermore, it was recently found that ifitm3 interacts with vesicle-membrane-protein-associated protein a (vapa) and prevents its association with oxysterol-binding protein (osbp) (13), which disturbs intracellular cholesterol homeostasis and thus causes inhibition of viral fusion in the late endosome, by an unknown mechanism. the high constitutive levels of ifitm proteins observed in many tissues potentially provide a first line of defense against virus infection. the induction of type i ifns further promotes ifitm expression, increasing their protective effect on surrounding uninfected cells. depletion of ifitm3 in mouse cells results in a loss of 40% to 70% of ifn's protective effect against endosomal entering viruses (1). a similar attenuation is also observed from cells derived from ifitmdel mice, lacking ifitm1, -2, -3, -5, and -6, suggesting ifitm3 accounts for most of the antiviral activity of this locus for the viruses investigated (14). importantly, mice homozygous for ifitm3 deletion suffer fulminant viral pneumonia when challenged with low-pathogenicity iav (14, 15). 
direct clinical relevance of ifitm3's involvement in restricting human iav infection has recently been shown in individuals hospitalized with seasonal or pandemic influenza h1n1/09 viruses (15) , where a statistically significant number of hospitalized patients show overrepresentation of a minor c allele in ifitm3 (rs12252-c) that correlates with a decrease in the ability of ifitm3 to restrict influenza virus infection in vitro. importantly, the significance of the association of the rs12252-c allele with severe influenza infection was recently replicated in a chinese cohort of patients (16) . together, these data reveal that the action of ifitm3 profoundly alters the course of influenza virus infection in mammals and that allelic variation in ifitm3 alters host susceptibility to severe influenza virus infection. although ifitms have been well characterized in humans and mice, little compelling functional data exists for this isg family in other species. avian iavs represent a continuing threat to human populations both as a source for direct human infection and as a reservoir for iav genetic variation. these reservoirs provide the conditions for the generation of reassorted iavs with altered host ranges and pandemic potential (17) . furthermore, endemic and emerging avian viral pathogens create major challenges to the poultry industry through loss of productivity and mortality. similarly, lyssaviruses, particularly rabies virus (rabv), pose a substantial public health threat, with half of the world's population living in areas of endemicity (18) , although reports of avian lyssavirus infections are rare. the clinical presentations of an infection are identical for all lyssavirus species; however, while current vaccines and postexposure prophylaxis provide sterilizing immunity against rabv and genetically similar species, no such protection is conferred against the more genetically diverse lyssaviruses (19) . 
intrinsic innate immunity plays an important role in controlling the spread of numerous enveloped viruses; however, the influence of the ifitm isgs on members of the lyssavirus genus has not previously been evaluated. although putative ifitm genes have been identified by database searching in many species (6, 9), no formal genome analysis or functional assessment of avian ifitm genes has been undertaken. here we report the analysis of the ifitm locus, reaffirming the existence of chicken ifitm1 (chifitm1) and providing the first functional characterization of chifitm2 and chifitm3, as well as demonstrating restriction of endosome-entering viruses by chifitms in vitro. cell culture and generation of ifitm-expressing cell lines. human-derived a549 cells (ccl-185; atcc) were grown in f-12 medium (life technologies) and human hek293t (crl-1573; atcc) and chicken df-1 (crl-12203; atcc) cells were grown in dulbecco's modified eagle's medium (dmem) (life technologies); all media were supplemented with 10% (vol/vol) fetal bovine serum (fbs) (biosera). chicken and human ifitm gene sequences were synthesized (geneart; life technologies) as codon-optimized genes for expression in human cells, and chicken ifitms were also synthesized for optimal expression in chicken cells. all ifitm genes were cloned into the bamhi and noti sites of the lentivirus vector, psin-bnha (20), and sequences confirmed by capillary sequencing (gatc biotech). the gene cassette was cloned into psin-bnha, to ensure that a c-terminal hemagglutinin (ha) tag followed the ifitm protein. lentivirus vector stocks were made by a three-plasmid transfection of hek293t cells, grown to confluence in a 10-cm dish. briefly, 200 μl of opti-mem (gibco) was mixed with 10 μl of fugene-6 (roche) before addition of 1 μg of a gag-pol-expressing vector (p8.91), 1 μg of a vesicular stomatitis virus glycoprotein (vsv-g)-expressing vector (pmdg), and 1.5 μg of vector expressing the transgene (psin-bnha) and incubated for 15 min. 
the medium was removed from the cells and replaced with 8 ml of dmem plus 10% fetal bovine serum (fbs), and the dna mixture was added dropwise to the cells. after 24 h at 37°c and 5% co2, the medium was removed and replaced with 8 ml dmem plus 10% fbs and incubated for a further 24 h. packaged lentivirus vector was harvested 48 and 72 h after transfection by collecting the supernatant and filtering it through a 0.45-μm-pore filter (millex). the lentiviruses were used to transduce human a549 lung epithelial cells and produce a mixed population from which single cell clones were derived by limiting dilution. expression of ifitms was detected by ha flow cytometric analysis. confocal microscopy. cells were seeded at 1 × 10^5/well on coverslips in a 12-well plate 1 day prior to transfection with an ifitm-encoding plasmid (1 μg dna with 3 μl of fugene [promega]). cells were fixed with 100% methanol for 10 min and then blocked in 1% bovine serum albumin (bsa) for 30 min. the ha epitope was targeted by an anti-ha antibody conjugated to alexa fluor 550 (ab117513), and endosomes were visualized by a lamp1 antibody with human (ab25630; abcam) or chicken (lep100 igg; developmental studies hybridoma bank) specificity, followed by incubation with a secondary antibody conjugated to alexa fluor 488 (ab96871; abcam). flow cytometric analysis. transfected cells were harvested using 300 μl 0.25% trypsin-edta (life technologies), neutralized with 300 μl of cell culture medium plus 10% fbs, and pooled with the supernatant. the cells were spun at 2,000 × g for 5 min, and the pellet was resuspended in 100 μl pbs and transferred to a 96-well v-bottomed plate (nunc). the plate was centrifuged, and the cells were fixed and permeabilized in 100 μl of cytofix/cytoperm buffer (becton, dickinson) and washed according to the manufacturer's guidelines. 
the cells were resuspended with the anti-ha antibody conjugated to fluorescein isothiocyanate (fitc) (a190-108f; cambridge bioscience) and incubated for 1 h at 4°c, followed by two rounds of washing. iav replication was detected by anti-nucleoprotein (anti-np) antibody (ab128193; abcam) followed by incubation with anti-mouse alexa fluor 650 (ab96882; abcam). cells were resuspended in 300 μl of pbs before analysis by flow cytometry (facscalibur ii; becton, dickinson). chicken or human ifitm-expressing a549 cell lines were seeded at 3 × using the target activation bioapplication. briefly, this method counts every cell on the plate by drawing a perimeter around each of the nuclei (detected by hoechst) and calculates the percentage of these cells also expressing gfp. luciferase activity, as a measure of lentivirus infection, was determined at 48 h postexposure using 50 μl bright-glo reagent (promega). the cells were allowed to lyse for 2 min before the level of luciferase activity was measured using the fluostar omega (bmg labtech). gfp and luciferase levels are reported relative to infection of a549 cells in the absence of ifitm protein overexpression. sirna knockdown studies. df-1 chicken cells were seeded at 5 × 10^4 cells/well in a 24-well plate and transfected with a small interfering rna (sirna) against chifitm3 (gcgaagtacctgaacatcacg) or a nonspecific sirna (uucuccgaacgugucacgugu), using lipofectamine rnaimax (life technologies) 48 h prior to ifn stimulation. plaque assays. material to be assayed was serially diluted in serum-free dmem and used to infect mdck cells in 12-well plates. after 1 h of incubation, the inoculum was removed, and the cells were overlaid with dmem containing 0.2% bsa (sigma-aldrich), 1.25% avicel (fmc biopolymer), and 1 μg trypsin ml^-1 (21). 
after 2 days, the overlay was removed, and the cells were fixed with 4% formal saline-pbs solution for 20 min before being stained with 0.1% toluidine blue solution (sigma-aldrich) so that the number of pfu could be calculated. expression of ifitm proteins in different chicken tissues. tissues were removed from 3-week-old specific pathogen-free (spf) rhode island red (rir) chickens, specifically thymus, spleen, bursa of fabricius, cecal tonsil, gastrointestinal tract, trachea, bone marrow, brain, muscle, heart, liver, kidney, lung, and skin. rna was dnase treated, and reverse transcription was carried out (superscript iii reverse transcriptase; life technologies). the cdna from each tissue was amplified by pcr using the following primer sets: chifitm1 (f'-agcacaccagcatcaacatgc, r'-ctacgaagtccttggcgatga), chifitm2 (f'-aggtgagcatcccgctgcac, r'-accgccgagcaccttccagg), chifitm3 (f'-ggagtcccaccgtatgaac, r'-ggcgtctccaccgtcacca), and chicken_gapdh (glyceraldehyde-3-phosphate dehydrogenase) (f'-actgtcaaggctgagaacgg, r'-gctgagggagctgagatga). the chicken genome (ensembl browser, version 2.1) contains two putative ifitm genes on chromosome 5, the so-called ifitm5 (ensgalg00000004239; chromosome 5:1620304 to 1621805:1) and ifitm10 (ensgalg00000020497; chromosome 5:15244061 to 15249351:1). the putative ifitm5 gene is located next to an uncharacterized gene (ensgalg00000006478), with which it shares 30% amino acid identity. immediately adjacent to this are three sequence gaps whose estimated sizes are 1 kb, 1 kb, and 400 bp in the ensembl chicken genome build. importantly, the putative ifitm genes in chicken are flanked by the telomeric β-1,4-n-acetyl-galactosaminyl transferase 4 (b4galnt4) gene and the centromeric acid trehalase-like 1 (athl1) gene. the b4galnt4 and athl1 genes flank the antiviral ifitm1, -2, -3, and -5 gene block in mammalian genomes. 
sequence similarity searches of the most recent build of the chicken genome (v4.0, ncbi) using tblastn analysis and the putative ifitm5 amino acid sequence, revealed several transcripts with high amino acid identity to ifitm5. additionally, blast hits were also identified to putative genes loc770612 and loc422993, within the locus flanked by b4galnt4 and athl1. between these putative genes, two blast hits span the exons of genes designated chicken ifitm3like (ncbi, loc422993; genbank accession no. xm_420925.4) and ifitm1-like (ncbi loc770612; genbank accession no. xm_001233949.3). a third blast hit matches an uncurated gene, "gene 376074," which is positioned between ifitm3-like and ifitm5. further analysis of gene 376074 showed it shared amino acid sequence identity with both ifitm3-like and ifitm1-like genes. sequence similarity searches of the ncbi chicken expressed sequence tag (est) database suggests gene 376074 is expressed. all of the ch-ifitm paralogues, like mammalian ifitms, are comprised of two exons, and the location of the intron-exon boundary is conserved across all of the chicken ifitm genes. therefore, the chicken genome contains an intact ifitm locus with four putative ifitm genes flanked by the genes b4galnt4 and athl1 (fig. 1a) . annotation of the chicken ifitm genes. using genome synteny, we ascribe chifitm5 as orthologous to mammalian ifitm5, gene 376074 as orthologous to ifitm2, loc422993 (ifitm3like) as orthologous to ifitm1, and loc770612 (ifitm1-like) as orthologous to ifitm3. multiple amino acid sequence alignments between the three predicted antiviral chifitm genes and direct orthologues in primate species suggest this assignment is plausible. a number of conserved ifitm family motifs are present in some of the chicken sequences (fig. 1b) , and although the chicken sequences differ significantly from the human and chimpanzee orthologues, many amino acids in the cil domain are conserved. 
multiple sequence alignments also reveal important amino acids in the chicken ifitms that help to categorize each sequence as either ifitm1 or ifitm2/3. tyr20 is conserved in all primate ifitm2 or -3 sequences and is also present in chicken "ifitm1like" (ncbi) but none of the other ifitm1 orthologues. this, as well as the longer n terminus, further supports our assessment of this gene as an ifitm2 or -3 gene, and by synteny, the gene is ifitm3. the alignment also reveals that other functionally significant amino acids are conserved in some of the chicken ifitm sequences, including the two cysteines (cys75 and -76) in im1 that are palmitoylation sites in other species (11) and are important for membrane positioning. phe79, also in im1, is conserved in chicken "ifitm1-like" (chifitm3), which is believed to be important for mediation of a physical association between ifitm proteins (22) . in light of syntenic analysis (and functional support shown later), we suggest the following revisions of the ncbi nomenclature: loc770612 as chifitm3 (previously "ifitm1like"), loc422993 as chifitm1 (previously chicken "ifitm3like"), and gene 376074 as chifitm2 (fig. 1) . subcellular localization of ifitm proteins. human ifitm1, -2, and -3 localize distinctly in the cell, with ifitm1 being predominantly cell surface expressed and ifitm2 and -3 being predominantly intracellular, localizing with endosomes ( fig. 2d and e). the cellular localization of ifitm proteins can further delineate their orthologous relationships. we therefore synthesized codon-optimized, c-terminal ha-tagged chifitm1, -2, and -3 and transiently transfected chicken df-1 fibroblasts, comparing their cellular localization to the orthologous human ifitm (hu-ifitm) proteins expressed in human a549 cells. 
confocal microscopy using an anti-ha antibody and an anti-lamp1 antibody showed chifitm1 was diffusely expressed throughout the cytoplasm, and chifitm2 was present in the cytoplasm and the cell membrane, whereas chifitm3 localized perinuclearly (fig. 2a to c), consistent with huifitm3. chifitm3 therefore shares synteny, amino acid similarity, and subcellular localization with hu-ifitm3. in the case of the other two chifitms, their localization is less clearly paired with the human ifitms; thus, our nomenclature is founded on the gene order. chicken ifitm proteins restrict diverse virus infection. we investigated if, despite considerable amino acid sequence divergence, chicken ifitms could function as restriction factors. by expressing chifitm3 at different levels in the cell, we show that there is a strong correlation between the level of chifitm3 expression and the percentage of cells infected by a lentivirus vector pseudotyped with the lyssavirus envelope of lbv (fig. 3). we then compared the level of antiviral restriction of chicken ifitm2 and -3 to that of their orthologous human proteins in a549 cells. overexpression of chifitm3 resulted in 79.4% and 85% reductions in infection of a549 cells by lentivirus vectors pseudotyped with the lyssavirus envelopes of rabv and lbv, similar to the level of restriction by huifitm3 of the same viruses (fig. 4a), even though chickens are rarely infected by lyssaviruses (23). chifitm2 also restricts lyssavirus lbv and rabv infection to a similar level as chifitm3. a similar pattern of restriction is seen for lentiviruses pseudotyped with iav h1, h5, h7, and h10 (fig. 4b). huifitm3 restricts infection by pseudotypes bearing each of the influenza virus hemagglutinins, reducing infection by greater than 90%, and chifitm3 restricts h1 and h10 pseudotypes as effectively, but restricts h5 and h7 less well. chifitm2 and huifitm2 restrict more moderately, as shown by others. 
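the restriction figures above are expressed relative to infection of control a549 cells that do not overexpress an ifitm protein. as a minimal sketch of that normalization, assuming the raw reporter signal (gfp-positive percentage or luciferase units) scales linearly with infection, the calculation could look like the following. the function names and the raw readings are hypothetical and chosen only for illustration; they are not data from the study.

```python
def percent_of_control(signal: float, control: float) -> float:
    """Express a reporter signal as a percentage of the control signal."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * signal / control


def percent_reduction(signal: float, control: float) -> float:
    """Percentage reduction in infection relative to the control cells."""
    return 100.0 - percent_of_control(signal, control)


# Hypothetical luciferase readings in arbitrary units (not from the paper).
control_rlu = 50_000.0   # A549 cells without IFITM overexpression
ifitm_rlu = 10_300.0     # A549 cells overexpressing an IFITM protein

print(f"infection relative to control: {percent_of_control(ifitm_rlu, control_rlu):.1f}%")
print(f"reduction in infection: {percent_reduction(ifitm_rlu, control_rlu):.1f}%")
```

with these illustrative readings the reduction comes out close to the 79.4% reported for chifitm3 against the rabv pseudotype, which is only meant to show how such a figure is derived.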
consistent with previous studies on huifitm3 protein (1, 2), chifitm3 failed to restrict mlv-a (fig. 4d). overall, although chifitm3 and huifitm3 only share 42% amino acid identity, the level of viral restriction of chifitm3 is similar to that in huifitm3. data are not shown for chifitm1, as a stably expressing cell line could not be made; this lack of stability at high expression levels is supported by hach et al. (24), who show that overexpression of unpalmitoylated murine ifitm1 is difficult to achieve.
note that the orientation change of chifitm2 and chifitm1 makes the assignment of orthology difficult; therefore, the chicken genes are named by gene order and conservation of specific functionally defined amino acid residues. the predicted mass is shown above the gene block. the colored columns in the sequence alignment (b) show residues that are shared between all nine ifitm sequences from humans, chimpanzees, and chickens. significant residues have been highlighted with a symbol below the sequence: o, tyrosine; oe, double cysteine; star, phenylalanine important for multimerization; 1, conserved ubiquitinated lysine. im1, intramembrane 1; cil, conserved intracellular loop; im2, intramembrane 2.
we assessed the constitutive level of expression of chifitm3 in df-1 cells, by quantitative rt-pcr with primers for chifitm3. df-1 cells abundantly express chifitm3 (threshold cycles [cts] of 20 for ifitm3 and 22 for gapdh). despite being ifn inducible, addition of ifn-γ resulted in only a moderate induction, whereas addition of ifn-α caused a 2.67-log2 (6.4-fold) increase in chifitm3 expression (fig. 5a). we assessed our ability to knock down chifitm3 expression in df-1 cells using an sirna designed to the chifitm3 transcript. treatment with this sirna on unstimulated df-1 cells resulted in a 1.23-log2 (2.4-fold) reduction in the transcript level, with no change in chifitm3 transcript abundance with a nonspecific sirna.
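the expression changes above are quoted both as log2 values and as linear fold changes, and the two are related by fold = 2^(log2 change). the following is a minimal sketch of that arithmetic, not the authors' analysis pipeline. the function names are hypothetical, and the threshold-cycle comparison assumes that product doubles every cycle and that both primer pairs amplify with equal efficiency, which the text does not state.

```python
def log2_to_fold(log2_change: float) -> float:
    """Convert a log2 expression change into a linear fold change."""
    return 2.0 ** log2_change


def ct_difference_to_log2(ct_target: float, ct_reference: float) -> float:
    """Log2 abundance of a target transcript relative to a reference transcript.

    Assumption (not from the source): perfect doubling per cycle and equal
    amplification efficiency for both primer pairs.
    """
    return ct_reference - ct_target


# ifn-alpha induction reported in the text as a 2.67-log2 increase
print(round(log2_to_fold(2.67), 1))  # 6.4

# threshold cycles quoted in the text: 20 for ifitm3 and 22 for gapdh
print(log2_to_fold(ct_difference_to_log2(20, 22)))  # 4.0
```

under those assumptions, a threshold cycle two cycles earlier than gapdh would correspond to roughly fourfold more chifitm3 template, which is in line with the description of chifitm3 as abundantly expressed in df-1 cells.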
knockdown of endogenous chifitm3 resulted in a greater than 2-fold increase in infection of df-1 cells by replication-competent influenza a virus (a/wsn/1933) (fig. 5b), assayed by flow cytometric analysis of np expression. furthermore, overexpression of chifitm3 in df-1 cells reduced viral replication by an average of 55% (fig. 5d), and plaque assays show that the viral load was reduced from 1.3 × 10^6 pfu ml^-1 to 3.1 × 10^5 pfu ml^-1 after chifitm3 overexpression (fig. 5e). together, these results show chifitm3 is able to restrict iav entry into df-1 cells.
differential expression of chifitms in chicken tissues. we assessed the tissue-specific gene expression pattern in chickens using a panel of rna extracted from tissues from 3-week-old rhode island red (rir) chickens focusing on thymus, spleen, bursa of fabricius, cecal tonsil, trachea, gastrointestinal tract, bone marrow, brain, muscle, heart, liver, kidney, lung, and skin. using primers specific to chifitm1, -2, or -3 (fig. 6), expression of chifitm2 and -3 was detected in all tissues, although with lower levels of expression in the muscle and brain and higher levels in the cecal tonsils (fig. 6). in contrast, expression of chifitm1 was more restricted and confined to the bursa of fabricius, the gastrointestinal tract, and the cecal tonsil.
a549s transfected with huifitm proteins 1 to 3 (d, e, and f) in the absence of infection. panels show nuclei stained with dapi (4′,6-diamidino-2-phenylindole) (blue), endosomes marked with an antibody against lamp1 (green), ifitm protein marked by an antibody against the ha tag (red), and a merged image.
to date, the antiviral activities of the ifitm2 and ifitm3 proteins have only been demonstrated in mammals, with a single report characterizing the function of chicken ifitm1 and ifitm5 (2). computational analysis of vertebrate genomes suggests the ifitm gene family is present throughout vertebrates.
however, this analysis and any phylogenetic reconstruction of gene history are complicated by the paralogous nature of the ifitm gene family, the presence of copy number variations, and the presence of numerous processed pseudogenes (6) . indeed, the identification of avian ifitms as part of the dispanin protein family failed to identify chicken ifitms in the antiviral ifitm1-to -3 subfamily defined as dsp2a to -c (25) . similarly, a more thorough analysis of vertebrate ifitms, while identifying distantly related ifitms in reptiles and birds, focused on eutherian clade 1 sequences for a detailed phylogenetic analysis (9). hickford et al. (26) have undertaken a comprehensive analysis of ifitm genes across a broad range of chordates. the authors have shown that all of the species analyzed, including "lower" vertebrates, such as lampreys, possess at least one ifitm-like gene. phylogenetic analysis of all of the ifitm paralogues they identified revealed that ifitm5 emerged in bony fish, while ifitm10 appears restricted to tetrapods. here we have resolved the entire antiviral ifitm locus on chromosome 5 of the chicken genome, expanding the number of ifitm genes to 4 in this locus, and confirmed that the locus is flanked by the genes athl1 and b4galnt4 (9) . crucially, we have shown that antiviral activity is conserved in chicken ifitms. the low-level sequence identity and orientation change of ch-ifitm2 and chifitm1 make the phylogenetic assignment of orthology problematic. our revised nomenclature of the chicken ifitm locus is based on the syntenic gene order, as previously discussed, and functional data where possible. however, given chifitm2 is localized to the plasma membrane, and the lack of an n-terminal extension (characteristic of huifitm2/3), it is possible that it is analogous to huifitm1. 
it is likely that similar extensive genetic and functional analyses will be essential to characterize the ifitm loci in other vertebrate species and define unambiguously ifitm1, -2, and -3 orthologues. using the chifitm3 amino acid sequence, we also searched the duck genome (v1.0) and identified a scaffold (2943) containing two duck ifitm (duifitm) orthologues. sequence identity and conserved synteny with the chifitm locus indicate they are ifitm5 and ifitm1. the two ifitm flanking genes, b4galnt4 and athl1, are also located on the scaffold in conserved positions. although annotated gene structures are absent in the browser, ifitm cdnas in other avian species align with the regions adjacent to both ends of the ifitm1 structure. this suggests the duck retains four ifitm genes at a conserved locus. we would expect the positions of duifitm2 and duifitm3 to be conserved with their orthologues in the chicken and other species. following infection with two h5n1 strains of avian influenza virus (a/duck/ hubei/49/05 and a/goose/hubei/65/05), levels of expression of duck ifitm3, -5, and -10 (measured by rnaseq) were increased to various degrees, reflecting a response befitting their expected function (27) . control of animal pathogens, especially those with zoonotic potential, is a key component of ensuring human health and food security. rabv is responsible for approximately 70,000 human deaths each year (28) , while other lyssaviruses have only been conclusively shown to cause a handful of fatalities (29) , although this could be due to poor surveillance. our results are the first to show diverse members of this genus of virus are sensitive to the inhibitory action of human ifitm proteins. furthermore, although most warm-blooded animals are susceptible to rabv, domestic birds are rarely infected by lyssaviruses (23) . despite this, chifitm2 and -3 were able to significantly reduce cell lyssavirus infection. 
avian iav infections, however, pose significant threats to human health, to the international poultry industry, and to small-scale poultry farmers (30). our identification and functional characterization of the avian ifitm locus, together with knowledge that this gene family exists with copy number and allelic variants in other species (9, 15, 16), should provide a focus for identifying ifitm variants with enhanced antiviral activity for use in farm animal breeding strategies to improve animal infectious disease resistance. specifically, we hypothesize that certain wild or outbred chicken ifitm allelic variants will confer enhanced levels of protection to pathogenic avian viruses that enter through acidic endosomes and that breeding for enhanced activity in ifitm variants will improve disease resistance in chickens. similarly, should chicken ifitm proteins restrict iav infection in chick embryos, the ablation of ifitm protein expression could improve vaccine production and boost yield. we have shown that chifitm proteins expressed in a549 cells are capable of restricting diverse viruses that enter cells through the acidic endosome pathway. furthermore, we show that df-1 chicken cells constitutively express chifitm3, and this is able to restrict influenza virus infection in vitro. despite sharing less than 50% amino acid identity, both chifitm3 and huifitm3 effectively restrict the entry of all lyssavirus and iav envelope pseudotypes tested. nevertheless, certain key amino acids in the n terminus, im1, and cil domain are conserved in chicken and human ifitm3, suggesting functional importance. this work describes our elucidation of the ifitm locus in the chicken genome and provides the first functional characterization of chifitm2 and chifitm3.
despite this, many key questions remain; it is unclear how genes such as ifitm3 in humans and chickens, separated by 310 million years of evolution (31) and sharing less than 50% amino acid identity, maintain a conserved cellular location and a strong antiviral activity against a diverse range of viruses. it is of equal importance to determine, given the level of antiviral activity and the proposed indirect mechanism of ifitm protein restriction (12, 13), how viruses overcome the restriction either within or between species. investigation of appropriately defined ifitm loci from different host species where cross-species transfer of virus infection occurs may help explain barriers and vulnerabilities to infection by diverse viruses.
this work was supported by the wellcome trust (grant 098051) and funding from the biotechnology and biological sciences research council institute strategic program grant at the pirbright institute (bb/j004448/1) and the medical research council (grant g1000413). we declare we have no conflicts of interest and no competing financial interests.
fig 6 differential expression of chifitm transcripts in chicken tissues. expression levels of ifitm1, -2, and -3 were determined by rt-pcr across a range of chicken tissues (a) and compared to the expression level of gapdh (b). gapdh pcr was performed without reverse transcriptase (−rt) to control for genomic dna contamination.
1. the ifitm proteins mediate cellular resistance to influenza a h1n1 virus, west nile virus, and dengue virus
2. distinct patterns of ifitm-mediated restriction of filoviruses, sars coronavirus, and influenza a virus
3. the ifitm proteins inhibit hiv-1 infection
4. characteristics of ifitm, the newly identified ifn-inducible anti-hiv-1 family proteins
5. molecular analysis of a human interferon-inducible gene family
6. phylogenetic analysis of interferon inducible transmembrane gene family and functional aspects of ifitm3
7. ifitm proteins restrict antibody-dependent enhancement of dengue virus infection
8. normal germ line establishment in mice carrying a deletion of the ifitm/fragilis gene family cluster
9. evolutionary dynamics of the interferon-induced transmembrane gene family in vertebrates
10. new developments in the induction and antiviral effectors of type i interferon
11. palmitoylome profiling reveals s-palmitoylation-dependent antiviral activity of ifitm3
12. ifitm proteins restrict viral membrane hemifusion
13. the antiviral effector ifitm3 disrupts intracellular cholesterol homeostasis to block viral entry
14. ifitm3 limits the severity of acute influenza in mice
15. ifitm3 restricts the morbidity and mortality associated with influenza
16. interferon-induced transmembrane protein-3 genetic variant rs12252-c is associated with severe influenza in chinese individuals
17. influenza virus evolution, host adaptation, and pandemic formation
18. re-evaluating the burden of rabies in africa and asia
19. rabies virus vaccines: is there a need for a pan-lyssavirus vaccine?
20. multiply attenuated lentiviral vector achieves efficient gene delivery in vivo
21. new low-viscosity overlay medium for viral plaque assays
22. the cd225 domain of ifitm3 is required for both ifitm protein association and inhibition of influenza a virus and dengue virus replication
23. rabies antibodies in sera of wild birds
24. palmitoylation on conserved and nonconserved cysteines of murine ifitm1 regulates its stability and anti-influenza a virus activity
25. the dispanins: a novel gene family of ancient origin that contains 14 human members
26. evolution of vertebrate interferon inducible transmembrane proteins
27. the duck genome and transcriptome provide insight into an avian influenza virus reservoir species
28. wildlife, environment and (re)-emerging zoonoses, with special reference to sylvatic tick-borne zoonoses in north-western italy
29. human rabies due to lyssavirus infection of bat origin
30. avian influenza virus (h5n1): a threat to human health
31. continental breakup and the ordinal diversification of birds and mammals
key: cord-305177-i71z2sf4 authors: neshat, sarah y; tzeng, stephany y; green, jordan j title: gene delivery for immunoengineering date: 2020-06-15 journal: curr opin biotechnol doi: 10.1016/j.copbio.2020.05.008 sha: doc_id: 305177 cord_uid: i71z2sf4 a growing number of gene delivery strategies are being employed for immunoengineering in applications ranging from infectious disease prevention to cancer therapy. viral vectors tend to have high gene transfer capability but may be hampered by complications related to their intrinsic immunogenicity. non-viral methods of gene delivery, including polymeric, lipid-based, and inorganic nanoparticles as well as physical delivery techniques, have also been widely investigated. by using either ex vivo engineering of immune cells that are subsequently adoptively transferred or in vivo transfection of cells for in situ genetic programming, researchers have developed different approaches to precisely modulate immune responses. in addition to expressing a gene of interest through intracellular delivery of plasmid dna and mrna, researchers are also delivering oligonucleotides to knock down gene expression and immunostimulatory nucleic acids to tune immune activity. many of these biotechnologies are now in clinical trials and have high potential to impact medicine.
gene delivery is increasingly being used to engineer the immune system in the laboratory and the clinic. various biotechnologies have been developed for the delivery of nucleic acids to cells, both in vivo and ex vivo. to overcome challenges of in vivo manipulation, several ex vivo cell engineering technologies have advanced to the clinic [1]; however, delivery vehicles that can be administered directly to patients with delivery of the cargo efficiently to target cells in vivo also show great promise. various types of nucleic acid have been delivered for immune applications. here, we will discuss the delivery of plasmid dna (pdna) and mrna to overexpress a gene of interest; oligonucleotides such as small interfering rna (sirna) and micro rna (mirna), which can knock down gene expression; and immunostimulatory nucleic acids that elicit a specific immune response. differences in their physical, chemical, and immunological properties have been discussed in detail elsewhere [2]. briefly, pdna, the largest of these and generally >2 kb in size, is double-stranded and relatively stable to chemical degradation, but it must enter the nucleus. mrna, while less chemically stable than pdna, is smaller and acts in the cytoplasm, therefore not requiring nuclear entry.
sirna and mirnas are generally only 20 bp and are also active in the cytoplasm, while immunostimulatory nucleic acids, which include cpg sequences in pdna and double-stranded rna structures, may be active in various intracellular locations, including the cytoplasm or the endosomal compartment. nucleic acids can activate the innate immune system, such as through sensing by toll-like receptors (tlrs), with single-stranded (ss) rna recognized by tlr-7 and -8, double-stranded (ds) rna recognized by tlr-3, and unmethylated cpg sequences in dna recognized by tlr-9 [3, 4]. other mechanisms of immune sensing include the cyclic gmp-amp synthase (cgas)-stimulator of interferon genes (sting) pathway and the absent in melanoma 2 (aim2) pathway, which both detect cytosolic dna [5], and dsrna sensors like rig-i-like receptors, which are reviewed in detail in [6]. examples below will show that nucleic acids may also be engineered to enhance or minimize this effect. as much of the recent progress in the field of gene delivery for immunotherapy has been for cancer applications, this review will focus on cancer immunotherapy but will also cover certain non-oncology applications. major strategies that have been explored for cancer immunotherapy [7] center on increasing the immunogenicity of the tumor microenvironment, enhancing the ability of antigen-presenting cells (apcs) to be activated, improving the activation of t cells and other lymphocytes in the context of the tumor while lessening the effect of suppressive immune cells, and vaccinating the patient with a tumor-specific antigen in order to generate a tumor-targeted immune response (figure 1).
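the innate sensing pathways listed earlier in this section pair each class of nucleic acid with the sensors that detect it, and that pairing can be written down as a small lookup table. the sketch below only restates the pairings given in the text; the dictionary layout, the key names, and the function name are assumptions made for illustration and do not come from the review.

```python
from typing import Dict, List

# Sensors named in the text, keyed by the class of nucleic acid they detect.
# The keys and the structure are illustrative choices, not from the source.
SENSORS: Dict[str, List[str]] = {
    "ssRNA": ["TLR-7", "TLR-8"],
    "dsRNA": ["TLR-3", "RIG-I-like receptors"],
    "unmethylated CpG DNA": ["TLR-9"],
    "cytosolic DNA": ["cGAS-STING", "AIM2"],
}


def sensors_for(ligand: str) -> List[str]:
    """Return the sensors the text associates with a ligand class.

    An empty list means the text names no sensor for that ligand; it does
    not mean that no sensor exists.
    """
    return SENSORS.get(ligand, [])


print(sensors_for("dsRNA"))          # ['TLR-3', 'RIG-I-like receptors']
print(sensors_for("cytosolic DNA"))  # ['cGAS-STING', 'AIM2']
```

a table of this kind is only a reading aid for cargo design: it indicates which pathways a given cargo might engage, not how strongly it would do so.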
broadly, an overarching goal of immunoengineering is often to shift the balance of the immune response between activation and suppression, or, more specifically, between the th1-type, including cell-mediated cytotoxic behavior that can be used to combat cancer; th2-type, which leads to a humoral response against extracellular pathogens and parasites; and th17-type, which is important for inducing inflammation.
broad gene delivery strategies for cancer immunotherapy. gene delivery can be accomplished using viral, lipid-based, or polymeric vectors, or a combination of various materials. these can be used to genetically engineer immune cells ex vivo for adoptive transfer, or they can modify tumor cells or immune cells directly in vivo to promote immune activation against the tumor. some examples of gene delivery methods that can be used for tumor immunotherapy are shown here.
viral vectors have historically been used for delivery of nucleic acids due to their effectiveness at transferring the nucleic acid payload to host cells. the most widely used vectors in recent years are retroviruses, including lentiviruses, which insert a gene of interest into the host genome; adenoviruses, which deliver an episomal, nonintegrating dna plasmid to cells; and adeno-associated viruses (aav), which can only replicate in coordination with a second virus [8]. for instance, zhu et al. described an aav vector for the delivery of the cytokine il-27, which inhibits th17 and th2 responses [9]. in the b16-f10 murine melanoma model, they found that this strategy depleted suppressive regulatory t cells (tregs), including at the tumor site, and led to enhanced efficacy of a cancer vaccine and anti-pd-1 checkpoint blockade therapy. importantly, however, viral vectors often elicit a strong immune response.
not only can this have deleterious effects on the patient, but it may also reduce the efficacy of viruses, particularly upon repeated administration due to the formation of neutralizing antibodies. to address this, engineers have developed methods of coating viruses with polymers [8] to prevent exposure of antigenic viral epitopes (figure 2). jung et al. used a hydrogel to encapsulate oncolytic adenoviruses, which are engineered to replicate only in tumor cells, and showed that encapsulation not only sustained the local release of virus in a hamster pancreatic cancer model but also reduced the animals' antiviral immune response, resulting in a >60% lower tumor burden compared to treatment with the adenovirus alone [10]. the use of polymer coatings can also improve viral transduction efficacy and allow codelivery of therapeutics: for instance, an adenovirus encoding the pro-th1 cytokine il-12 was coated with a copolymer of β-cyclodextrin and the cationic polyethylenimine (pei) to enhance delivery up to 600-fold in vitro and co-deliver a small molecule inhibitor of the suppressive tgf-β in the b16 model [11].
because of concerns about the safety of viral vectors, a wide range of non-viral delivery vehicles have been developed. state-of-the-art synthetic materials for gene delivery are reviewed in detail by lostalé-seijo et al. [12].
lipid-based nanoparticles or lipoplexes have been extensively researched for gene delivery, particularly in the case of mrna and oligonucleotides. amphiphilic lipids can form liposomal structures that encapsulate nucleic acids within the core; cationic lipids can associate with the negatively charged nucleic acid to form nanostructures. in one case, three immune-stimulatory genes encoded in mrna were delivered directly to b16-f10 melanoma or mc38 colorectal tumors in mice using lipid nanoparticles, which caused tumor regression and long-term immunity to the tumor [13].
engineering the chemical structure of lipids can improve delivery efficiency and specificity, allowing sirna and small single-guide rna (sgrna) to be delivered to cells that are normally hard to transfect, including t cells [14]. specifically, lipids with conformationally constrained tails were fivefold to 10-fold more effective at gene delivery than similar lipids with unconstrained tails. several researchers have taken advantage of the immune-stimulatory properties of nucleic acids delivered by lipid nanoparticles. among these are cyclic dinucleotides (cdns), such as cgamp agonists of stimulator of interferon genes (sting), which can increase the activity of apcs at the tumor site but, due to their high negative charge, require encapsulation to improve cellular internalization [15]. some lipid structures, particularly those with amine-containing cyclic head groups, have also been found to intrinsically activate sting, and the use of these immune-stimulatory lipids to deliver tumor antigen-encoding mrna resulted in effective vaccination against b16-f10 and ovalbumin-expressing b16 tumors, with lipids with cyclic head groups inducing nearly twofold higher antigen-specific cytotoxicity after injection [16]. on the other hand, excessive activation of interferon type i (ifn i) via pathways like sting can be damaging if uncontrolled; therefore, some groups have engineered mrna constructs to prevent ifn i induction while using lipid nanoparticles to co-deliver an adjuvant like monophosphoryl lipid a (mpla) that triggers ifn i in a more controlled manner [17].
given the intrinsic immunogenicity of some lipid carriers, polymers can serve as an alternative for nucleic acid delivery, though some polymers may also have adjuvant activity [18]. in addition to mrna and small nucleic acids, cationic polymers are also used often for the delivery of pdna, usually forming nanoparticles via self-assembly with nucleic acids.
the versatility of polymer chemical and physical structures provides a wide range of properties that can be optimized. for instance, ex vivo transfection of t cells with mrna and dna with up to 25% and 18% efficiency, respectively, has been accomplished by tuning the branching architecture of the delivery vehicle, a co-polymer of poly(2-hydroxyethyl methacrylate) (phema) and poly(2-dimethylaminoethyl methacrylate) (pdmaema) [19]. in many cases, in vivo transfection is preferred in order to avoid ex vivo processing and cell culture steps. cationic pei has been studied for decades for gene delivery purposes, though modifications are often necessary to improve nanoparticle stability and biodistribution in vivo. for instance, a poly(ethylene glycol) (peg)-modified pei nanoparticle was complexed with pdna encoding small hairpin rna (shrna) to knock down expression of pd-l1 in tumor cells. local injection of hyaluronidase into the tumor improved nanoparticle accumulation after systemic administration by approximately twofold, leading to pd-l1 knockdown in the tumor [20]. polyrotaxanes, consisting of four-arm peg threaded with the cationic polysaccharide rings α-cyclodextrin, can deliver il-12 pdna to mc38 tumor cells after systemic administration by taking advantage of the stealth properties of peg to achieve good pharmacokinetics [21]. another peg-modified material, cationic trimethyl chitosan, delivered sirna against vegf and pigf to macrophages in breast cancer models, utilizing a mannose ligand to target macrophages and repolarize them to an immune-stimulatory phenotype [22]. this led to >90% tumor inhibition due to combination knockdown of both vegf and pigf in vivo. polymersomes, described as amphiphilic block co-polymers that self-assemble into liposome-like structures with an aqueous core, were also used to deliver cdn as a sting agonist and improved survival in a b16-f10 model after intratumoral injection [23].
the toxicity of high-molecular-weight pei is often limiting. less toxic low-molecular-weight pei can be crosslinked with degradable linkages to form a larger polymer with greater transfection efficacy while preserving the biocompatibility of the low-molecular-weight form. when functionalized with galactose to target the liver, a cross-linked pei co-polymer was able to deliver il-15 pdna to tumor cells with significantly lower polymer-mediated cytotoxicity and twofold to threefold higher gene delivery in vitro, leading to improved survival in an orthotopic murine ml-1 hepatocellular carcinoma model in vivo [24]. in a similar vein, biodegradable polymers are commonly used to reduce toxicity. poly(beta-amino ester)s (pbaes) are cationic, hydrolytically degradable polymers that have been used to deliver cpg odns, agonists of toll-like receptor 9 (tlr-9), which upregulates production of proinflammatory cytokines [25], or cdn as a sting agonist [26] by local intratumoral delivery in mouse melanoma models. another pbae was also used to deliver pdna encoding immune-stimulatory genes locally to tumors, resulting in slowed tumor progression and long-term survival in b16-f10 and mc38 models [27]. smith et al. further modified pbae/pdna nanoparticles by coating them with a targeting ligand, allowing transfection of t cells after intravenous injection. this led to in situ generation of leukemia-specific chimeric antigen receptor (car) t cells, with nearly 6% of the circulating t cells car+ within days [28], and the same group showed that a similar strategy could be used to deliver mrna to t cells, inducing a memory phenotype in tumor-specific t cells as well as demonstrating proof-of-concept in situ gene editing [29].
other biodegradable polymers, like charge-altering releasable transporters, deliver mrna encoding the ovalbumin antigen to a mouse along with cpg as an adjuvant, allowing for successful vaccination of the mouse against ovalbumin-expressing a20 lymphoma after subcutaneous or intravenous delivery [30], with up to 40% of mice considered cured of established tumors. the chemical tunability of synthetic polymers provides many opportunities to optimize their properties for gene delivery and immune engineering.
lipid-based and polymer-based delivery systems can be combined to take advantage of properties from both materials. folate-modified methoxy poly(ethylene glycol)-poly(lactide) (mpeg-pla) and dioleoyl-3-trimethylammonium propane (dotap) nanoparticles encapsulating pdna encoding the cc-motif chemokine ligand 19 (ccl19) were designed to transfect folate receptor-expressing tumor cells in ct26 colon cancer models and induce expression of ccl19 to modulate dendritic cell (dc) and lymphocyte interactions [31]. the authors describe advantages over certain car t therapies, such as augmenting the favorable anti-tumor immune response while avoiding the detrimental cytokine release syndrome (crs). a similar hybrid concept has been used in combination with chemotherapy drugs. in one particular design, oxaliplatin (oxp) chemotherapy is delivered systemically to cause immunogenic cell death (icd) in parallel with locally delivered dna encoding a pd-l1 trap fusion protein via lipid-protamine-dna (lpd) nanoparticles that function selectively within the tumor, thus resulting in minimized immune-related adverse effects, for ct26, b16-f10, and 4t1 tumor models (figure 3) [32]. the combination treatment with pd-l1 trap dna and oxp was significantly more effective than either treatment alone, with approximately 20% increase in median survival in ct26-bearing mice.
these lpd nanoparticles have also been implemented to simultaneously silence expression of hmga1 (high mobility group protein a1) to increase t lymphocyte infiltration twofold to fourfold and induce pd-l1 trap expression to improve checkpoint inhibitor therapy in ct26 and 4t1 models [33]. alternatively, mckinlay et al. describe the generation of a combinatorial library to screen for hybrid-lipid charge-altering releasable transporters [34]. these specific materials are primarily effective for mrna transfection in t-cell lines, primary t cells in vitro, and splenic in situ. the incorporation of oleyl and nonenyl lipid elements increased the transfection efficacy of t cells to 1-1.5% and that of b cells to 11%, while minimizing toxicity-mediated cell death compared to the previously published polymeric charge-altering releasable transporters, which have been reported to transfect <1% and 1-7% of t cells and b cells, respectively [34].
inorganic or physical modifications can be applied to nanoparticles for increased stabilization or improved uptake by certain targeted cell populations. calcium phosphate (cap) has long been used for gene delivery due to its ability to encapsulate nucleic acid efficiently and dissolve intracellularly under the acidic conditions of the endolysosomal compartment, but it is limited by poor stability and lack of control in manufacturing, necessitating use of other materials to improve the properties of cap [35]. lipid-coated calcium phosphate (lcp) nanoparticles functionalized with mannose have been developed to target delivery of mrna to dendritic cells (dcs) draining to the lymph node in triple negative breast cancer (tnbc) in vivo models as a nanovaccine [36]. the mrna encodes the tumor-associated antigen mammary type mucin (muc1).
transfected dcs express muc1 and present it to 4t1 tnbc-specific cytotoxic t cells, which expand and, in combination with anti-ctla-4 antibodies, lead to tumor infiltration, tumor growth inhibition, and memory [36]. lcp nps have also been used to deliver mrna encoding tyrosine related protein 2 (trp2) and pd-l1 sirna to b16-f10 melanoma models [37], resulting in transfection of dcs to induce them to present a tumor antigen while also downregulating expression of a checkpoint molecule. alternatively, macrophages can be targeted using modified nanoparticles. cpg oligodeoxynucleotides (odn) can be targeted to macrophages using mannosylated carboxymethyl chitosan/protamine sulfate/caco3/odn (mcmc/ps/caco3/odn), a polymer/inorganic nanoparticle hybrid whose features are designed to improve odn encapsulation, macrophage uptake, and ph-mediated intracellular release [38]. cpg delivery to macrophages promotes expression of cd80, an activating co-stimulatory signal to lymphocytes, therefore inducing the anti-tumor m1 phenotype in vitro with raw264.7 cells [38], measured by approximately twofold higher secretion of il-12 and other inflammatory cytokines compared with macrophages treated with the common commercial transfection agent lipofectamine 2000. another hybrid, peptide/hyaluronic acid/protamine/caco3/dna nanoparticles (phnp), was developed to target pdna to j774a.1 macrophages and hela tumor cells in vitro [39]. the fusion protein promotes recognition by fc receptors on macrophages and internalization by tumor cells, while the hyaluronic acid (ha) interacts with cd44 found on both cell types. the pdna encodes il-12 to repolarize macrophages from anti-inflammatory m2 to anti-tumor m1, in addition to downregulating the cd47 'don't-eat-me' signal and upregulating co-stimulatory cd80 on tumor cells in vitro to reverse cancer-induced immunosuppression [39].
arginine-coated gold nanoparticles (argnps) have also been used to target macrophages and deliver crispr-cas9 in order to edit out signal-regulatory protein α (sirp-α), the inhibitory receptor for cd47, thus allowing fourfold greater phagocytosis of human bone osteosarcoma cells in vitro [40]. the cationic arginine allowed binding to the single guide rna (sgrna) for sirp-α and the cas9 protein, which was engineered to include glutamic acid tags at the c-terminus and a nuclear localization signal (nls) at the n-terminus. although many recent advances related to gene therapy have been focused on cancer, delivery technologies have been developed for other applications as well, including infection, autoimmune disorders, and allergy. genetic vaccines are particularly advantageous when compared to traditional peptide-based vaccines given the ability to stimulate at lower quantities, maintain antigen expression, bypass hla restriction, and extend to both humoral and cellular immune responses [41]. recent vaccine developments have incorporated gene delivery using both polymer and lipid nanoparticles. a zika virus (zikv) vaccine formulated with the full natural dna sequence of zikv premembrane and envelope protein (prm-e) within a tetrafunctional peo/ppo/ethylene diamine amphiphilic block copolymer np has elicited antigen-specific serum igg, neutralizing antibodies, and protection upon intramuscular challenge [42]. similarly, an intradermal lnp vaccine contained n(1)-methylpseudouridine mrna encoding viral surface antigens, such as zikv prm-e, influenza virus hemagglutinin (ha), and hiv-1 envelope (env) [43]. the vaccine was shown to establish an antigen-specific cd4+ t-cell response in addition to an increase in b cells and plasma cells to generate humoral memory and high affinity neutralizing antibodies when compared to unmodified mrna vaccines. interestingly, yan et al.
describe a subcutaneously injected scaffold loaded with ovalbumin (ova) mrna-lipoplexes constructed from chitosan-alginate gel [44]. while a more traditional vaccine formulation consisting of the ovalbumin protein elicited a stronger short-term humoral immune response, the mrna-lipoplex-based scaffold vaccine elicited greater t-cell proliferation within secondary lymphoid organs. ifn-γ secretion as a result of the mrna-lipoplex scaffold was also two- to threefold greater than that due to protein, naked mrna, or mrna-lipoplex immunizations [44]. additionally, anti-viral vaccines have been engineered with self-amplifying mrna (sam) encapsulated in lipid nanoparticles and show an induced type 1 ifn response locally when compared to tlr7 agonists [45]. these studies have led to interest in exploring the use of sam mrna for nanoparticle-delivered immunotherapies [46]. in addition to infectious diseases, gene therapies have been used to address autoimmune disorders. an aav vector (aav5) has been used for single dose intraarticular delivery of the human interferon-β (hifn-β) gene in patients with inflammatory arthritis, including rheumatoid arthritis (ra) [47]. this therapy is unique in that there is local transcriptional control of the hifn-β transgene by an nf-κb promoter, therefore causing transgene expression only during states of flare-up inflammation. recently, allergy immunotherapy has expanded with goals of provoking biased antigen-specific th1 responses. microneedles coated with model antigen ova co-formulated with a sting agonist as an adjuvant induced the generation of ova-specific serum igg2a, signifying an enhanced th1 response [48]. when challenged with ova, splenocytes produced higher levels of the th1 cytokines il-2 and ifn-γ when compared to subcutaneous injections or microneedle delivery of alum formulations.
although the studies described above have been designed for viral vaccines and allergies, their concepts can be applied to improve future developments of cancer immunotherapies. exciting translational methods and immune stimulation techniques have been investigated for clinical applications within the last few years, and some examples are shown in table 1. for instance, researchers at the national cancer institute have developed a potential treatment for heterogeneous metastatic cancer by retrovirally transducing autologous peripheral blood mononuclear cells (pbmcs) to express neoantigen-reactive tcrs isolated from the patient to target shared oncogenes and multiple neoantigens with aims to address tumor escape [49, 50]. additionally, a phase i trial led by modernatx, inc. outlines the safety and tolerability of lipid nanoparticles encapsulating genetic material encoding tumor neoantigens while inducing neoantigen-specific t cells in 33 patients, whether given as a monotherapy or in combination with pembrolizumab, and it subsequently advanced to phase ii [51, 52]. on the other hand, adenoviral-mediated delivery has been assessed in a completed phase i clinical trial with patients with malignant pleural mesothelioma to induce production of ifn-α in the pleural fluid and serum [53]. although the results of this heterogeneous pilot study were variable, disease stability or regression via scans and serum measurements of soluble mesothelin-related peptides (smrp) was noted in five of the nine subjects, described as younger patients with lower tumor burdens. for applications outside of cancer immunotherapy, a phase i clinical trial is currently ongoing to test aav5 as a vector to induce nf-κb-regulated hifn-β expression in patients with ra [54]. in response to the recent covid-19 pandemic, modernatx, inc.
in collaboration with the national institute of allergy and infectious diseases (niaid) is developing a lipid nanoparticle/mrna-based vaccine for the novel sars-cov-2 coronavirus infection, currently in phase i trials in healthy volunteers [55]. immune responses can be engineered using gene delivery techniques to modulate the genetic composition of cells, enabling innovative cancer immunotherapy methods. nucleic acid cargos of varying size can be optimized for nanoparticle encapsulation to upregulate or downregulate gene expression, leading to productive antitumor immune responses. genetic material can be delivered via viral or non-viral approaches to immune cells, such as macrophages and dendritic cells, to activate or suppress activity. nanoparticles are diverse in formulation, with each type of delivery material conveying advantages and disadvantages, which may be combined in hybrid formulations consisting of multiple material types. in addition, new cargos, such as self-amplifying mrna, hold promise for efficient, next-generation vaccines, and nanostructures like polymersomes can allow for the delivery of combination immunotherapies. recent research has shown the expansive applications of gene therapy to various immune-related disorders and diseases, including tumors, infectious diseases, autoimmune disorders, and allergy. many of the current trials shown here are in early stages, but there are exciting developments that potentiate future clinical impact. it is important to note that gene-based therapies for immune engineering face a set of challenges, such as reproducibility, instability of genetic materials, production scale-up, transient off-target genotoxicity, and vector immunogenicity, that require further vetting of these platforms before they enter later clinical stages [56].
however, with well-funded companies increasingly sponsoring nucleic acid therapeutics and immune engineering approaches, there is great hope for the necessary advances that can achieve clinical efficacy while minimizing toxicity. nothing declared. polymeric nucleic acid delivery for immunoengineering dna-based biomaterials for immunoengineering nucleic acid sensing in mammals and plants: facts and caveats dna sensing by the cgas-sting pathway in health and disease double-stranded rna sensors and modulators in innate immunity an update on immunotherapy for solid tumors: a review while viral gene delivery is of high interest in the laboratory as well as in the clinic, it still faces several hurdles, some of which can be solved by combining viruses with other materials. this review covers many of the challenges to viral gene therapy as well as methods of overcoming them il-27 gene therapy induces depletion of tregs and enhances the efficacy of cancer immunotherapy a hydrogel matrix prolongs persistence and promotes specific localization of an oncolytic adenovirus in a tumor by restricting nonspecific shedding and an antiviral immune response combined delivery of a tgf-beta inhibitor and an adenoviral vector expressing interleukin-12 potentiates cancer immunotherapy synthetic materials at the forefront of gene delivery durable anticancer immunity from intratumoral administration of il-23, il-36gamma, and ox40l mrnas using lipid nanoparticles, the authors describe an antigen-agnostic combination immunotherapy via direct intratumoral injection of mrna encoding immunostimulatory genes, leading to anti-tumor effect even at sites distant from the local treatment site. 
this work also includes indepth analysis of the tumor microenvironment over the course of treatment constrained nanoparticles deliver sirna and sgrna to t cells in vivo without targeting ligands liposomal delivery enhances immune activation by sting agonists for cancer immunotherapy delivery of mrna vaccines with heterocyclic lipids increases anti-tumor efficacy by stingmediated immune cell activation this work highlights the importance of the immunological properties of the delivery vehicle itself, which is co-delivery of nucleoside-modified mrna and tlr agonists for cancer immunotherapy: restoring the immunogenicity of immunosilent mrna polymer nanoparticles as adjuvants in cancer immunotherapy cationic polymers for nonviral gene delivery to human t cells efficient pd-l1 gene silence promoted by hyaluronidase for cancer immunotherapy development of self-assembled multi-arm polyrotaxanes nanocarriers for systemic plasmid delivery in vivo combination antitumor immunotherapy with vegf and pigf sirna via systemic delivery of multifunctionalized nanoparticles to tumor-associated macrophages and breast cancer cells endosomolytic polymersomes increase the activity of cyclic dinucleotide sting agonists to enhance cancer immunotherapy a novel galactose-peg-conjugated biodegradable copolymer is an efficient gene delivery vector for immunotherapy of hepatocellular carcinoma polyplex interaction strength as a driver of potency during cancer immunotherapy biodegradable sting agonist nanoparticles for enhanced cancer immunotherapy in situ genetic engineering of tumors for long-lasting and systemic immunotherapy this article demonstrates the potential of biodegradable polymeric nanoparticles and combination dna delivery to program tumors and their microenvironment to activate a systemic cellular anti-tumor response without requiring a priori knowledge about the tumors or their antigens in situ programming of leukaemia-specific t cells using synthetic dna nanocarriers this 
article shows the advantages of using easily tailorable and functionalizable polymeric materials for gene delivery, and the in situ reprogramming of tumor-specific t cells has great potential for clinical application hit-and-run programming of therapeutic cytoreagents using mrna nanocarriers mrna vaccination with charge-altering releasable transporters elicits human t cell responses and cures established tumors in mice powerful anticolon tumor effect of targeted gene immunotherapy using folate-modified nanoparticle delivery of ccl19 to activate the immune system synergistic and low adverse effect cancer immunotherapy by immunogenic chemotherapy and locally expressed pd-l1 trap nanoparticle-mediated hmga1 silencing promotes lymphocyte infiltration and boosts checkpoint blockade immunotherapy for cancer enhanced mrna delivery into lymphocytes enabled by lipid-varied libraries of charge-altering releasable transporters lipid-coated calcium phosphate nanoparticle and beyond: a versatile platform for drug delivery combination immunotherapy of muc1 mrna nano-vaccine and ctla-4 blockade effectively inhibits growth of triple negative breast cancer mrna vaccine with antigen-specific checkpoint blockade induces an enhanced immune response against established melanoma functional polymer/inorganic hybrid nanoparticles for macrophage targeting delivery of oligodeoxynucleotides in cancer immunotherapy a multi-functional macrophage and tumor targeting gene delivery system for the regulation of macrophage polarity and reversal of cancer immunoresistance crispred macrophages for cell-based cancer immunotherapy a comparison of plasmid dna and mrna as vaccine technologies amphiphilic block copolymer delivery of a dna vaccine against zika virus nucleoside-modified mrna vaccines induce potent t follicular helper and germinal center b cell responses injectable biodegradable chitosan-alginate 3d porous gel scaffold for mrna vaccine delivery induction of an ifn-mediated antiviral response 
by a selfamplifying rna vaccine: implications for vaccine design weissman d: mrna vaccines -a new era in vaccinology preclinical potency and biodistribution studies of an aav 5 vector expressing human interferon-beta (art-i02) for local treatment of patients with rheumatoid arthritis assessment of th1/th2 bias of sting agonists coated on microneedles for possible use in skin allergen immunotherapy administration of autologous t-cells genetically engineered to express t-cell receptors reactive against mutated neoantigens in people with metastatic cancer enhanced detection of neoantigen-reactive t cells targeting unique and shared oncogenes for personalized cancer immunotherapy and immunogenicity of mrna-4157 alone in subjects with resected solid tumors and in combination with pembrolizumab in subjects with unresectable solid tumors (keynote-603) an efficacy study of adjuvant treatment with the personalized cancer vaccine mrna-4157 and pembrolizumab in patients with high-risk melanoma (keynote-942) a trial of intrapleural adenoviral-mediated interferon-alpha2b gene transfer for malignant pleural mesothelioma a single dose clinical trial to study the safety of art-i02 in patients with arthritis safety and immunogenicity study of 2019-ncov vaccine (mrna-1273) to prevent sars-cov-2 infection advances in mrna vaccines for infectious diseases dose escalation and efficacy study of mrna 2416 for intratumoral injection alone and in combination with durvalumab for patients with advanced malignancies. u.s. 
national library of medicine dose escalation study of mrna-2752 for intratumoral injection to patients with advanced malignancies phase 1 trial of interleukin 12 gene therapy for locally recurrent prostate cancer redirected high affinity gag-specific autologous t cells for hiv gene therapy risk prostate cancer trial using gene & androgen deprivation therapies gene-modified t cells in treating patients with locally advanced or stage iv solid tumors expressing ny-es0-1 therapy improving the outcome of liver transplantation for advanced hcc
the authors thank the n.i.h. for support (p41eb028239, r01ca228133, and r01ey031097). syn (dge-1746891) thanks the nsf graduate research fellowship program for support. syt thanks the american autoimmune-related diseases association (aarda) for support. the authors are also thankful for support from the bloomberg~kimmel institute for cancer immunotherapy.
key: cord-352200-i05h8csb authors: xu, yi; zhou, wenwu; zhou, yijun; wu, jianxiang; zhou, xueping title: transcriptome and comparative gene expression analysis of sogatella furcifera (horváth) in response to southern rice black-streaked dwarf virus date: 2012-04-27 journal: plos one doi: 10.1371/journal.pone.0036238 sha: doc_id: 352200 cord_uid: i05h8csb background: the white backed planthopper (wbph), sogatella furcifera (horváth), causes great damage to many crops by direct feeding or transmitting plant viruses. southern rice black-streaked dwarf virus (srbsdv), transmitted by wbph, has become a great threat to rice production in east asia. methodology/principal findings: by de novo transcriptome assembly and massively parallel pyrosequencing, we constructed two transcriptomes of wbph and profiled the alteration of gene expression in response to srbsdv infection at the transcriptional level. over 25 million reads of high-quality dna sequences and 81388 different unigenes were generated using illumina technology from both viruliferous and non-viruliferous wbph.
wbph has a gene ontological distribution very similar to that of two other closely related rice planthoppers, nilaparvata lugens and laodelphax striatellus. 7291 microsatellite loci were also predicted, which could be useful for further evolutionary analysis. furthermore, comparative analysis of the two transcriptomes generated from viruliferous and non-viruliferous wbph provided a list of candidate transcripts that were potentially elicited as a response to viral infection. pathway analyses of a subset of these transcripts indicated that srbsdv infection may perturb primary metabolism and the ubiquitin-proteasome pathways. in addition, 5.5% (181 out of 3315) of the genes in the cell cytoskeleton organization pathway showed obvious changes. our data also demonstrated that srbsdv infection activated the immunity regulatory systems of wbph, such as rna interference, autophagy and antimicrobial peptide production. conclusions/significance: we employed massively parallel pyrosequencing to collect ests from viruliferous and non-viruliferous samples of wbph. 81388 different unigenes have been obtained. for the first time, we describe the direct effects of a reoviridae family plant virus on global gene expression profiles of its insect vector using high-throughput sequencing. our study will provide a road map for future investigations of the fascinating interactions between reoviridae viruses and their insect vectors, and provide new strategies for crop protection. rice viral diseases are major threats to rice production and occur worldwide wherever rice is cultivated [1]. the most prevalent rice viruses are plant-infecting reoviruses in the genera phytoreovirus, fijivirus and oryzavirus of the family reoviridae. southern rice black-streaked dwarf virus (srbsdv), also known as rice black-streaked dwarf virus 2 (rbsdv 2), is a novel member of the genus fijivirus [2].
since its first observation in 2001 in guangdong, china, srbsdv has spread rapidly and now causes large yield losses throughout southern and central china and northern vietnam, and it has recently been identified in northern china [3, 4] and japan [5]. when infected with srbsdv, rice often develops stunted stems, dark green, twisted leaves, and white waxy swellings along veins on the abaxial surface of the leaves. when infected at the seedling stage, rice shows more severe stunting and occasionally dried necrotic leaves. with infections after tillering, stunting is usually not so obvious, but disrupted head development and shrunken grains are apparent in many fields. srbsdv is non-enveloped and has an icosahedral capsid (t = 13) composed of outer and inner protein shells, similar to other known fijiviruses. virions contain ten linear genomic segments (named s1, s2, s3, s4, s5, s6, s7, s8, s9, s10) of double-stranded rna (dsrna), which range from approximately 4.5 to 1.4 kb in size [2, 6]. sequence comparisons and phylogenetic analyses show that srbsdv has a high sequence similarity with rbsdv, and that segments s1 and s10 have the highest relatedness [7]. although the p7-1 protein of srbsdv was recently reported to induce the formation of tubular structures in insect cells and may be a virus movement protein [8], the molecular functions of the other translated proteins have not been documented. srbsdv is transmitted by the delphacid white-backed planthopper (wbph), sogatella furcifera (hemiptera: delphacidae), in a persistent-propagative manner [2]. wbph is one of the most economically important insect pests in asian countries [9]; as an oligophagous plant-feeder, it can cause great harm to crops such as rice, wheat and maize by direct feeding or by vectoring srbsdv [2, 10].
wbph is known for its long-distance migratory habits. the areas affected by its migration are regarded to be the central and southeastern parts of china and vietnam, and these areas are consistent with the spread and emergence of srbsdv during the past years [11, 12]. the ecology and physiology of wbph and other hemipteran insect pests have been extensively studied [13, 14], but the molecular mechanisms whereby the insect causes crop damage and yield losses are poorly understood. recently, transcriptomes of the planthoppers nilaparvata lugens and laodelphax striatellus were reported using next-generation dna sequencing techniques [15, 16]. the transcriptional response of whitefly to a geminivirus was also reported [17]. these papers provide considerable information relevant to the genomics of planthoppers and whiteflies, and also generate some insight into the molecular mechanisms of insect defense against virus infection. however, the genomic resources available for wbph are still scarce, and searches in genbank identified only about 156 wbph ests. this limited amount of data provides little information for transcriptional, proteomic, and gene functional analysis of wbph, and almost no information is available to dissect the complicated interactions between the newly emerged srbsdv and its vector wbph. next-generation high-throughput dna sequencing, however, has provided unprecedented opportunities for gene discovery in wbph and for detection of its global transcriptional responses to srbsdv infection. here, we constructed two transcriptomes of wbph and profiled the alteration of gene expression in response to srbsdv infection at the transcriptional level.
as a whole, 81388 distinct unigenes have been identified and the results indicated that srbsdv infection can potentially perturb primary metabolism and the ubiquitin-proteasome pathway of wbph and activate immune regulatory systems, such as rna interference, autophagy and antimicrobial peptide production. to our knowledge, this is the first report to define the wbph transcriptome.
illumina sequencing and reads assembly
as described in the methods, viruliferous and non-viruliferous wbph whole body cdna libraries were sequenced on the illumina platform, resulting in 40,665,360 and 39,135,954 reads, respectively. after cleaning and quality checks, short sequences were assembled, resulting in 136031 and 144604 contigs, respectively; the data are archived at the ncbi sequence read archive (sra) under accession number srp009194 (http://www.ncbi.nlm.nih.gov/sra). using paired end-joining and gap-filling methods, these contigs were further assembled into 241,380 scaffolds with a mean length of 295 base pairs (bp). after clustering the scaffolds with the nucleotide sequences available at ncbi, sequence data from the two libraries were combined, and 81388 unigenes were finally obtained with a mean length of 555 bp (table 1). the length distribution of total unigenes had similar patterns between viruliferous and non-viruliferous wbph samples, suggesting there was no bias in the construction of the cdna libraries (figure 1). however, some unigenes were obtained only from viruliferous or non-viruliferous samples (data not shown), and we believe these differences may be caused by distinctions that arise from long-term ecological adaptation to virus infection. all files of the assembled contigs and scaffolds from viruliferous, non-viruliferous wbph, and the combined est libraries are available upon request. to annotate the unigenes, we searched reference sequences using blastx against the non-redundant (nr) ncbi protein database with a cut-off e-value of 10^-5.
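the annotation step described above (keeping blastx hits that pass an e-value cut-off and taking the best match per unigene) can be sketched as follows. this is only an illustrative sketch, not the authors' pipeline: it assumes blast tabular output in the default 12-column layout (outfmt 6), reads the cut-off as 1e-5, and uses invented demo rows and hypothetical function names (best_hits, annotated_fraction).

```python
from typing import Dict, Iterable, Tuple

EVALUE_CUTOFF = 1e-5  # assumed reading of the cut-off quoted in the text


def best_hits(lines: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Keep, for each query, the highest-bitscore hit whose e-value passes the cut-off.

    Rows are expected in the default BLAST tabular layout (outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # hit is not significant enough
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def annotated_fraction(n_annotated: int, n_total: int) -> float:
    """Percentage of unigenes that received at least one accepted hit."""
    return 100.0 * n_annotated / n_total


# Invented rows for illustration only (not data from the study).
demo_rows = [
    "unigene1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-30\t210.0",
    "unigene1\tprotB\t60.0\t100\t40\t0\t1\t300\t9\t108\t1e-08\t95.0",
    "unigene2\tprotC\t40.0\t50\t30\t0\t1\t150\t2\t51\t0.5\t28.0",
]
hits = best_hits(demo_rows)
print(hits)  # only unigene1 is retained, matched to protA
# Counts quoted in the text: 28909 annotated unigenes out of 81388.
print(round(annotated_fraction(28909, 81388), 2))
```

with the counts quoted in the text, the second print statement reproduces the reported annotation rate of 35.52%.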
a total of 28909 (35.52% of all distinct sequences) unigenes provided a blast result (table s1). the species distributions of the best match result for each sequence are shown in figure 2 and table s2. of the wbph sequences, 16.17% had their best match with red flour beetle (tribolium castaneum) sequences, followed by 14.04% with honey bee (apis mellifera) and 12.49% with pea aphid (acyrthosiphon pisum) sequences. it was surprising that wbph shared the highest similarity with the red flour beetle in the blast annotation. a similar pattern has also been reported in the brown planthopper (nilaparvata lugens), whose transcriptome had 18.89% of matches with red flour beetle sequences, 14.80% with the body louse (pediculus humanus corporis) and 13.19% with the pea aphid [15]. we used the go assignment to classify the functions of predicted wbph unigenes. go is a gene functional classification system which offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe properties of genes and their products in any organism [18]. go has three ontologies: molecular function, cellular component and biological process. among 30,987 annotated unigenes, approximately 18045 (58.23%) could be annotated in go based on sequence homology (table s3). when compared with the other two rice planthoppers (n. lugens and l. striatellus), wbph had a very similar go distribution (figure 3). this distribution is quite reasonable given that the three sap-feeding planthopper species not only belong to the same family, delphacidae, but also have similar ecological niches in the rice ecosystem. in addition, go analysis also showed a similar distribution of gene functions for non-viruliferous and viruliferous wbph (figure 4), indicating that the number of genes expressed in each go category was not significantly affected by srbsdv infection.
molecular markers are widely applied in the study of insect evolution and differentiation. as mentioned before, wbph is well known for its yearly migration across asian countries, resulting in the spread of srbsdv in east asia. a critical problem in the study of wbph migration is the lack of an effective molecular marker, which would be helpful for defining migration pathways. by using de novo assembly of transcriptome sequences, we have identified 7291 simple sequence repeats (ssrs) or microsatellites in wbph. according to predictions, about 7.26% of protein-coding sequences possessed such repeats, of which 62.43% were trinucleotide repeats, with (aag)n being the most frequent motif (39.8%), followed by 16.62% dinucleotide and 0.7% tetranucleotide repeats (figure 5, table s4). the results are consistent with recent reports indicating that trinucleotide repeats are the most abundant microsatellites in coding ests [19, 20]. previous work revealed that (aac)n is the most frequent motif in l. striatellus [16]. the different predominant trinucleotide repeats predicted in the transcriptomes of these two rice planthoppers may reflect distinct adaptive behaviors of these related insects. the large numbers of potential molecular markers found in our study will be particularly useful for future genetic mapping, parentage analysis, genotyping and gene flow studies of these species. to identify differentially expressed genes in response to srbsdv infection, the numbers of clean tags for each gene were calculated, and then individual sets of reads were mapped back to the previously assembled transcripts and counted as a proxy for gene expression. the differentially expressed transcripts between the two samples were identified using an algorithm developed by audic et al. [21].
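for orientation, the test attributed to audic et al. above is commonly written as the probability of observing y tags for a gene in a library of n2 total tags, given x tags for the same gene in a library of n1 total tags: p(y|x) = (n2/n1)^y (x+y)! / [x! y! (1 + n2/n1)^(x+y+1)]. the sketch below evaluates this expression in log space. it is a minimal illustration under stated assumptions, not the software used in the study: the function names (ac_prob, ac_pvalue), the two-sided p-value convention (twice the smaller tail, capped at 1) and the demo counts are all choices made here for illustration.

```python
import math


def ac_prob(y: int, x: int, n1: float, n2: float) -> float:
    """P(y | x): chance of y tags in library 2 given x tags in library 1.

    n1 and n2 are the total tag counts of the two libraries.
    Computed with log-gamma to avoid overflow in the factorials.
    """
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1)
             - math.lgamma(x + 1)
             - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def ac_pvalue(x: int, y: int, n1: float, n2: float) -> float:
    """Two-sided p-value (assumed convention: twice the smaller tail, capped at 1)."""
    lower = sum(ac_prob(k, x, n1, n2) for k in range(y + 1))  # P(Y <= y | x)
    upper = max(0.0, 1.0 - lower + ac_prob(y, x, n1, n2))     # P(Y >= y | x)
    return min(1.0, 2.0 * min(lower, upper))


# Illustrative counts only (not data from the study); equal library sizes.
N = 1_000_000
print(ac_pvalue(10, 10, N, N))  # identical counts: no evidence of a change
print(ac_pvalue(5, 60, N, N))   # twelvefold more tags: very small p-value
```

in a real analysis the resulting p-values would normally be corrected for multiple testing across all genes; that step is omitted here.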
the expressed transcripts were assigned into groups according to their functions, such as biological process (3315 sequences, figure 6a, table s5), cellular component (2965 sequences, figure 6b, table s5) and molecular function (3721 sequences, figure 6c, table s5), and the distribution of every ontology is shown in figure 6. in total, 58% of the differentially expressed genes were up-regulated in the viruliferous wbph (figure 7 and figure 8a). the detected fold changes (log2 ratio) of gene expression ranged from -15 to +16, and more than 80% of the genes were up- or down-regulated between 1.0- and 5.0-fold (figure 8b). among the molecular function assignments, a high percentage of genes were assigned to cellular and metabolism process genes (35.6%), and 11.8% were related to biotic stimulus genes (figure 6). to validate the digital gene expression (dge) result, we compared the expression profiles of the non-viruliferous and viruliferous wbph using qrt-pcr. we selected 42 unigenes randomly, 33 of which demonstrated a concordant direction of change for both dge and qrt-pcr (table s6). the remaining nine unigenes were inconsistent between dge and qrt-pcr. this difference might be caused by a lower sensitivity of qrt-pcr than dge, or by read coverage that is uneven across the transcript length owing to sequencing biases [22, 23]. nevertheless, qrt-pcr analysis confirmed the direction of change detected by dge analysis for 33 of the 42 tested unigenes, indicating that our results are reliable. as mentioned above, viral infections cause dramatic changes in cellular and metabolic processes. for the primary metabolism analyses, 225 out of 3315 genes (6.8%) were down-regulated in viruliferous wbph, most of which were involved in translation, amino acid metabolism, and biosynthesis of ribosomes, spliceosomes and aminoacyl-trnas (table 2 and figure 9).
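the log2 ratios and the validation rate quoted above can be illustrated with a short calculation. the sketch assumes that tag counts are first scaled to tags per million and that zero counts are replaced by a small floor value before taking the ratio; both the normalization and the floor of 0.01 are assumptions made here, and the function names and demo counts are hypothetical.

```python
import math


def tags_per_million(count: int, library_total: int) -> float:
    """Scale a raw tag count by library size."""
    return count * 1_000_000 / library_total


def log2_ratio(count_ctrl: int, total_ctrl: int,
               count_inf: int, total_inf: int,
               floor: float = 0.01) -> float:
    """log2(infected / control) on normalized counts.

    The floor (an assumed value) keeps the ratio defined when a gene
    has zero tags in one of the two libraries.
    """
    ctrl = max(tags_per_million(count_ctrl, total_ctrl), floor)
    inf = max(tags_per_million(count_inf, total_inf), floor)
    return math.log2(inf / ctrl)


def percent(part: int, whole: int) -> float:
    """Simple percentage helper."""
    return 100.0 * part / whole


# Illustrative counts only: 100 tags vs 400 tags in equally sized libraries.
print(log2_ratio(100, 1_000_000, 400, 1_000_000))  # a fourfold increase gives 2.0
# Figures quoted in the text.
print(round(percent(33, 42), 1))     # share of qrt-pcr results concordant with dge
print(round(percent(225, 3315), 1))  # share of down-regulated primary metabolism genes
```

the last line reproduces the 6.8% quoted in the text, and 33 of 42 corresponds to a concordance of roughly 79%.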
these results suggest that protein synthesis and amino acid metabolism of viruliferous wbph were inhibited by srbsdv infection, which is consistent with previous reports in wasps (campoletis sonorensis) and whiteflies [17, 24]. in wasps infected by a polydnavirus, translation of specific growth-associated host proteins was inhibited [24]; and in whiteflies, the primary metabolism genes were dramatically down-regulated by tomato yellow leaf curl china virus (tylccnv) infection [17]. in addition, our results indicated that a large percentage of the genes involved in lipid metabolism and lipogenic compound metabolism were up-regulated (table 2 and figure 9). in contrast, tylccnv infection in whiteflies causes the down-regulation of lipid metabolism. a possible explanation for this phenomenon is the difference in replication styles of the two viruses. srbsdv, a typical dsrna virus, replicates its genome strictly in the cytoplasm, and lipid biosynthesis and related pathways are necessary for the membrane proliferation that occurs in infected cells. tylccnv is a dna virus that enters the nucleus directly for its genome replication, so membrane proliferation is less extensive. effects on lipid metabolism have also been reported in other virus infections; for instance, human cytomegalovirus (cmv) infection resulted in increases in the flux of the fatty acid biosynthesis pathway that were essential for optimal viral growth in fibroblasts [25]. furthermore, hepatitis c virus has been shown to co-opt the prenylation pathway to promote the efficient replication of its genome and possibly encapsidation [26, 27]. all of these observations, together with our data, suggest that perturbation of lipid metabolism in a range of virally infected cells is a hallmark of the cellular changes associated with viral infection.
viruses depend on host machinery to replicate and express their genomes, and replicating cells have large pools of deoxynucleotides and high levels of key enzyme activities that viruses exploit during replication [28] [29] [30]. viruses have therefore developed strategies to regulate the host cell cycle to facilitate their replication. our results indicated that 231 out of 3315 genes (7.0%) related to the cell cycle were altered (table 3). compared with the mitotic (m) phase, there were more up-regulated genes during interphase according to our dge results (data not shown). meanwhile, pathway enrichment analyses also showed that gene expression patterns in the ubiquitin-proteasome pathway were significantly altered. the proteasome is the major non-lysosomal proteolytic machine in eukaryotes [31]. in animals and plants, many viruses have already been shown to perturb the ubiquitin-proteasome pathway [32, 33]. for example, adenovirus e1a protein can directly interact with the 19s proteasome regulatory components (s4 and s8) that are involved in regulation of the activities of the 26s proteasome [34]. inhibition of the proteasome by different chemical compounds not only impairs entry but also affects rna synthesis and subsequent protein expression in coronavirus infections [35]. ubiquitination and deubiquitination of the nucleoprotein (np) were also reported to regulate the replication of influenza a virus rna [36]. in plants, the geminivirus bsctv c2 protein was reported to attenuate the degradation of samdc1 and suppress dna methylation-mediated gene silencing by inhibiting the 26s proteasome pathway in arabidopsis [37]. also in arabidopsis, perturbation of the ubiquitin-proteasome system affects accumulation of turnip yellow mosaic virus (tymv) rna-dependent rna polymerase during viral infection [38].
in our work, 72 out of 3315 differentially expressed genes were related to the ubiquitin-proteasome pathway, and a majority of these were down-regulated (table 3 and figure 9). the ubiquitin-proteasome pathway is involved in the regulation of metabolic adaptation and immune responses and may function by degrading numerous short-lived proteins, such as regulatory proteins [39] [40] [41]. indeed, all of these pathways were significantly affected in viruliferous wbph. among the differentially expressed genes in the viruliferous wbph, 181 out of 3315 (5.5%) were related to cytoskeleton organization, including 70 microtubule cytoskeleton organization genes and 74 actin cytoskeleton organization genes. cytoskeleton-dependent intracellular transport is a common strategy for virus transport to intracellular destinations [42] [43] [44] [45] [46] [47]. for example, association with microtubules is necessary for the release of rice gall dwarf virus (rgdv) from cultured insect vector cells [48]; the pns10 of rice dwarf virus (rdv) induces tubular structures that facilitate virus spread in the vector nephotettix cincticeps [49]. rgdv and rdv, together with the srbsdv studied in this work, all belong to the reoviridae family, so a similar cytoskeleton regulatory pathway may exist among these viruses. meanwhile, according to a recent study using a recombinant baculovirus expression system, the srbsdv p7-1 protein formed tubular structures in insect cells [8]. integrating these data with our transcriptome results, it is reasonable to propose that cytoskeleton dynamics play a major role in virus infection.
cellular and humoral responses are major response systems used by insects against microbial infection [50] [51] [52] [53]. among the differentially expressed genes in viruliferous wbph, many genes related to cellular and humoral immune responses were up-regulated (table 4 and figure 9).
for example, 65% (52 out of 80) of the unigenes in the lysosome pathway were up-regulated. b-cell lymphoma 6 protein (bcl-6), an important regulatory factor in humoral immune responses, is believed to be involved in germinal center b-cell functions and the immune response [54, 55]. this protein acts as a sequence-specific repressor of transcription and has been shown to modulate the transcription of stat-dependent il-4 responses of b cells; altered expression of bcl-6 leads to modulated il-4 levels and increases in the immune response [56]. among the differentially expressed genes, antimicrobial peptide and melanization-related product synthesizing genes were also up-regulated in the wbph. phospholipase a2 products are considered to interfere with viral infection, and several phospholipases a2 have been found to efficiently protect host cells from hiv-1 replication [57, 58]. down-regulation of the phospholipase a2 inhibitor subunit b (table 3) may promote the release of active phospholipases a2 and thus improve host defenses against virus attack.
rna interference (rnai) silences gene expression through small interfering rnas (sirnas) and micrornas (mirnas) [59, 60]. the rnai system was initially described in plants, and in recent years, some work has revealed that rnai is a major mechanism to target viruses in some insects [60, 61]. in d. melanogaster, dicer-2 produces sirnas, whereas dicer-1 recognizes precursors of mirnas. in these cases, the small rnas are assembled with an argonaute (ago) protein into related effector complexes, such as the rna-induced silencing complex (risc), to guide specific rna silencing [59, 61, 62]. in our transcriptome data, we found 31 records of genes (10 agos, 16 dicers, 4 dsrna binding proteins) involved in the rna interference pathway.
in the dge results (table 4), some of these genes, such as dicer 1 and dicer 2, were up-regulated in viruliferous wbph, but no substantial difference in the expression levels of putative rnai-related genes was found in l. striatellus after infection with rice stripe virus (rsv), a tenuivirus [16]. these differences may be caused by differences between the replication cycles of the two viruses or between the host defense systems. the latter hypothesis is supported by recent findings that rsv and rdv (two rice viruses) caused different effects in the rice rna interference pathway [63].
table 3. expression profiles of genes in the ubiquitin-proteasome and cell cycle pathways.
we employed massively parallel pyrosequencing to collect ests from viruliferous and non-viruliferous samples of wbph. in total, we obtained 81388 different unigenes, and have provided a major genomic resource that will be useful for subsequent investigations of the biology of wbph and generation of new insecticide targets. for the first time, we described the direct effects of a reoviridae family plant virus on global gene expression profiles of its insect vector using high-throughput sequencing. the genes involved in primary metabolism, ubiquitin-proteasome, cytoskeleton dynamics and immune responses were up-regulated in viruliferous wbph. in addition, we found that an rnai pathway exists in wbph, and this pathway may play an important role in virus defenses. our study will provide a road map for future investigations of the fascinating interactions between reoviridae viruses and their insect vectors, and provide new strategies for crop protection.
the wbph culture used in this study originated from jiangsu province, china. srbsdv-infected and un-infected rice (oryza sativa cv. nipponbare, k818) plants were used to feed wbph in all experiments. for viruliferous wbph, srbsdv-infected rice seedlings were collected from fields and then moved to a beaker after discarding the old leaves.
then 200 wbph second instar nymphs were swept into the beaker, and the beaker was enclosed with nylon mesh. after two days of feeding, the rice seedlings were gently turned upside-down, and the juvenile planthoppers were swept onto healthy rice seedlings. all plants were grown in soil at 25 ± 1 °c and 80% relative humidity in a growth incubator, under a long-day photoperiod of 14 h light and 10 h darkness.
sample preparation and rna isolation
after two days of feeding on srbsdv-infected rice, wbph were maintained on uninfected rice seedlings for more than 12 days (viral circulative period). pcr was then used to ensure that viral rnas were present in the individual wbph. during this time, non-viruliferous wbph, as the control group, were treated identically. approximately 100 non-viruliferous and viruliferous wbph nymphs and adults were collected for rna extraction. total rna was isolated using the trizol reagent (invitrogen, carlsbad, ca, usa) according to the manufacturer's protocol. the concentration and quality of total rna were determined with a nanodrop spectrophotometer (thermo fisher scientific, wilmington, de, usa).
cdna library preparation and illumina sequencing for transcriptome analysis
poly-(a) rna was purified from 20 µg of pooled total rna using oligo (dt) magnetic beads and fragmented into sequences of 200-700 nucleotides in the fragmentation buffer. the cleaved poly-(a) rna was reverse transcribed using oligo (dt) (takara, japan), and then second-strand cdna synthesis was performed. after end repair and ligation of adaptors, the products were amplified by pcr and purified using the qiaquick pcr purification kit to create a cdna library. the cdna library was sequenced on the illumina sequencing platform (hiseq2000). the raw reads from the images were generated by solexa ga pipeline 1.6.
after removal of low-quality reads, processed reads were assembled using the short oligonucleotide analysis package (soap) de novo software and clustered with tigr gene indices (tgi) clustering tools [64]. all raw transcriptome data have been deposited in the sra database (ncbi). the generated unigenes larger than 350 bp were analyzed by searching the genbank and swissprot databases with the blastx algorithm (http://www.ncbi.nlm.nih.gov/). gene ontology annotations and cog classification of the unigenes were determined with the blast2go (http://www.blast2go.org/) and interproscan (http://www.ebi.ac.uk/tools/pfa/iprscan/) software. ssrs were identified with the microsatellite identification tool (misa) (http://pgrc.ipk-gatersleben.de/misa/). misa is a perl script designed to allow the identification and characterization of microsatellites in a comparative genomic context. detection criteria were constrained to perfect repeat motifs of 1-6 bp and a minimum repeat number of 12, 8, 5, 5, 5 and 5 for mono-, di-, tri-, tetra-, penta- and hexa-nucleotide microsatellites, respectively. a rigorous algorithm was developed to identify differentially expressed genes (degs) between the non-viruliferous and viruliferous wbph as described [21]. short-read sequence data from non-viruliferous and viruliferous wbph were mapped separately against the reference set of assembled transcripts with burrows-wheeler aligner (bwa) 0.5.8a [65]. relative transcript levels were output as rpkm (reads per kilobase of exon model per million mapped reads) values, which normalize read counts for transcript length and the total number of mapped reads. the false discovery rate (fdr) was used to determine the threshold p-value in multiple tests. an fdr < 0.001 and an absolute value of the log2 ratio > 1 were used as thresholds to determine significant differences in gene expression. the differentially expressed genes were used for go enrichment analyses as described [18].
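the expression thresholds described above can be illustrated with a short python sketch. it computes rpkm from a read count, adjusts p-values for multiple testing, and keeps genes with fdr < 0.001 and |log2 ratio| > 1. the p-values are taken as given (the test of [21] is not reimplemented here). the use of the benjamini-hochberg procedure, the `floor` value that guards against division by zero, and all function and field names are assumptions made for illustration, not details reported in the study.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the original order.

    Assumption: the source does not name its FDR procedure; BH is used
    here only as a common choice.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(records, fdr_cutoff=0.001, min_abs_log2=1.0, floor=0.01):
    """Keep genes with FDR < cutoff and |log2(treated/control)| > threshold.

    `records` is a list of dicts with hypothetical keys:
    gene, rpkm_control, rpkm_viruliferous, p.
    `floor` replaces zero RPKM values so the ratio stays defined.
    """
    fdrs = benjamini_hochberg([r["p"] for r in records])
    degs = []
    for r, fdr in zip(records, fdrs):
        ratio = max(r["rpkm_viruliferous"], floor) / max(r["rpkm_control"], floor)
        log2_ratio = math.log2(ratio)
        if fdr < fdr_cutoff and abs(log2_ratio) > min_abs_log2:
            degs.append((r["gene"], log2_ratio, fdr))
    return degs


if __name__ == "__main__":
    demo = [
        {"gene": "geneA", "rpkm_control": 10.0, "rpkm_viruliferous": 40.0, "p": 1e-6},
        {"gene": "geneB", "rpkm_control": 10.0, "rpkm_viruliferous": 15.0, "p": 1e-6},
        {"gene": "geneC", "rpkm_control": 10.0, "rpkm_viruliferous": 100.0, "p": 0.5},
    ]
    for gene, log2_ratio, fdr in call_degs(demo):
        print(gene, round(log2_ratio, 2), fdr)
```

in this toy input only `geneA` passes both filters: `geneB` changes too little and `geneC` is not significant after adjustment.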
the correlation between the two libraries was statistically assessed by calculation of the pearson correlation coefficient (data not shown). to confirm the results of the pyrosequencing analysis, the expression levels of 42 randomly selected genes were measured in non-viruliferous and viruliferous insects by qrt-pcr. total rnas from each sample were extracted using trizol reagent (invitrogen) and treated with dnase i (invitrogen) according to the manufacturer's protocol. the concentration of each rna sample was adjusted to 1 µg/µl with nuclease-free water, and total rna was reverse transcribed in a 20 µl reaction system using the amv rna pcr kit (takara). qrt-pcr was carried out on the lightcycler 480 ii with lightcycler 480 sybr i master (roche applied science, basel, switzerland) under the following conditions: 95 °c for 5 min; and 40 cycles of 95 °c for 10 s, 60 °c for 15 s, and 72 °c for 20 s, followed by melting curve generation (68 °c to 95 °c). primers used in qrt-pcr for validation of differentially expressed genes are shown in table s7. each gene was analyzed in triplicate, after which the average threshold cycle (ct) was calculated per sample, and an endogenous 18s rrna gene of wbph was used for normalization. a no-template control sample (nuclease-free water) was included in the experiment to detect contamination and determine the degree of dimer formation (data not shown).
biology and epidemiology of rice virus southern rice blackstreaked dwarf virus: a new proposed fijivirus species in the family reoviridae molecular characterization of segments s7 to s10 of a southern rice black-streaked dwarf virus isolate from maize in northern china a black-streaked dwarf disease on rice in china is caused by a novel fijivirus virus taxonomy: classification and nomenclature of viruses.
eighth report of the international committee on the taxonomy of viruses the complete genome sequence of two isolates of southern rice black-streaked dwarf virus, a new member of the genus fijivirus the p7-1 protein of southern rice black-streaked dwarf virus, a fijivirus, induces the formation of tubular structures in insect cells weather associated with autumn and winter migrations of rice pests and other insects in south-eastern and eastern asia planthoppers, their ecology and management the search for domestic migration of the white-backed planthopper, sogatella furcifera (horvath) (homoptera: delphacidae), in japan estimating the immigration source of rice planthoppers, nilaparvata lugens (stal) and sogatella furcifera (horvath) (homoptera: delphacidae), in taiwan a migration analysis for rice planthoppers, sogatella furcifera (horvath) and nilaparvata lugens (stal) (homoptera: delphacidae), emigrating from northern vietnam from april to may weather associated with spring and summer migrations of rice pests and other insects in south-eastern and eastern asia transcriptome analysis of the brown planthopper nilaparvata lugens massively parallel pyrosequencing-based transcriptome analyses of small brown planthopper (laodelphax striatellus), a vector insect transmitting rice stripe virus (rsv) global analysis of the transcriptional response of whitefly to tomato yellow leaf curl china virus reveals the relationship of coevolved adaptations wego: a web tool for plotting go annotations genome-wide survey and analysis of microsatellites in nematodes, with a focus on the plant-parasitic species meloidogyne incognita a microsatellite marker for studying the ecology and diversity of fungal endophytes (epichloe spp.) 
in grasses the significance of digital gene expression profiles deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms fulllength transcriptome assembly from rna-seq data without a reference genome polydnavirus infection inhibits translation of specific growth-associated host proteins human cytomegalovirus induces the activity and expression of acetyl-coenzyme a carboxylase, a fatty acid biosynthetic enzyme whose inhibition attenuates viral replication hepatitis c virus rna replication is regulated by host geranylgeranylation and fatty acids strategies for hepatitis c therapeutic intervention: now and next the cell cycle and virus infection hepatitis c virus infection causes cell cycle arrest at the level of initiation of mitosis cell-cycle perturbation in sf9 cells infected with autographa californica nucleopolyhedrovirus the proteasome viruses and the 26 s proteasome: hacking into destruction interactions of sars coronavirus nucleocapsid protein with the host cell proteasome subunit p42 regulation of the 26 s proteasome by adenovirus e1a the ubiquitin-proteasome system plays an important role during various stages of the coronavirus infection cycle ubiquitination and deubiquitination of np protein regulates influenza a virus rna replication bsctv c2 attenuates the degradation of samdc1 to suppress dna methylation-mediated gene silencing in arabidopsis the ubiquitin-proteasome system regulates the accumulation of turnip yellow mosaic virus rna-dependent rna polymerase during viral infection ubiquitin ligases and the immune response scf-mediated protein degradation and cell cycle control cell cycle regulation by the ubiquitin pathway association of ebola virus matrix protein vp40 with microtubules intracellular transport of viruses and their components: utilizing the cytoskeleton and membrane highways movements of vaccinia virus intracellular enveloped virions with gfp tagged 
to the f13l envelope protein involvement of the secretory pathway and the cytoskeleton in intracellular targeting and tubule assembly of grapevine fanleaf virus movement protein in tobacco by-2 cells intracellular tansport of plant viruses: finding the door out of the cell the early secretory pathway and an actin-myosin viii motility system are required for plasmodesmatal localization of the nsvc4 protein of rice stripe virus association of rice gall dwarf virus with microtubules is necessary for viral release from cultured insect vector cells the spread of rice dwarf virus among cells of its insect vector exploits virus-induced tubular structures evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes immunity-related genes and gene families in anopheles gambiae cell-mediated immunity in arthropods: hematopoiesis, coagulation, melanization and opsonization deep sequencing-based transcriptome analysis of plutella xylostella larvae parasitized by diadegma semiclausum alterations of a zinc finger-encoding gene, bcl-6, in diffuse large-cell lymphoma protein expression of b-cell lymphoma gene 6 (bcl-6) in invasive breast cancer is associated with cyclin d-1 and hypoxia-inducible factor-1 alpha (hif-1 alpha) transcriptional repression of stat6-dependent interleukin-4-induced genes by bcl-6: specific regulation of i epsilon transcription and immunoglobulin e switching intrinsic phospholipase a2 activity of adeno-associated virus is involved in endosomal escape of incoming particles secreted phospholipases a2, a new class of hiv inhibitors that block virus entry into host cells rna interference directs innate immunity against viruses in adult drosophila rna-based antiviral immunity rnai is an antiviral immune response against a dsrna virus in drosophila melanogaster the rna silencing endonuclease argonaute 2 mediates specific antiviral immunity in drosophila melanogaster viral infection induces expression of novel phased micrornas from 
conserved cellular microrna precursors de novo assembly of human genomes with massively parallel short read sequencing fast and accurate short read alignment with burrows-wheeler transform we thank tong zhou, fenfen wang, yanyuan bao for assistance in feeding and collecting insects, jian xue for advice on data analysis, and xiaoying chen and chuanxi zhang for supplying the gene ontology data of l. striatellus and n. lugens. conceived and designed the experiments: xz jw. performed the experiments: yx. analyzed the data: yx wz. contributed reagents/ materials/analysis tools: yx yz jw. wrote the paper: yx xz jw. key: cord-322566-ye27nqj2 authors: huang, yuxiang; tan, hexin; yu, jian; chen, yue; guo, zhiying; wang, guoquan; zhang, qinglei; chen, junfeng; zhang, lei; diao, yong title: stable internal reference genes for normalizing real-time quantitative pcr in baphicacanthus cusia under hormonal stimuli and uv irradiation, and in different plant organs date: 2017-05-03 journal: front plant sci doi: 10.3389/fpls.2017.00668 sha: doc_id: 322566 cord_uid: ye27nqj2 baphicacanthus cusia (nees) bremek, the plant source for many kinds of drugs in traditional chinese medicine, is widely distributed in south china, especially in fujian. recent studies about b. cusia mainly focus on its chemical composition and pharmacological effects, but further analysis of the plant's gene functions and expression is required to better understand the synthesis of its effective compounds. real-time quantitative polymerase chain reaction (rt-qpcr) is a powerful method for gene expression analysis. it is necessary to select a suitable reference gene for expression normalization to ensure the accuracy of rt-qpcr results. ten candidate reference genes were selected from the transcriptome datasets of b. 
cusia in this study, and the expression stability was assessed across 60 samples representing different tissues and organs under various conditions, including ultraviolet (uv) irradiation, hormonal stimuli (jasmonic acid methyl ester and abscisic acid), and in different plant organs. by employing different algorithms, such as genorm, normfinder, and bestkeeper, which are complementary approaches based on different statistical procedures, 18s rrna was found to be the most stable gene under uv irradiation and hormonal stimuli, whereas ubiquitin-conjugating enzyme e2 was the most suitable gene for different plant organs. this novel study aimed to screen for suitable reference genes and corresponding primer pairs specifically designed for gene expression studies in b. cusia, in particular for rt-qpcr analyses.
baphicacanthus cusia (nees) bremek (figure s1), widely distributed in southern china, is the only plant belonging to the family acanthaceae. its roots are used as a traditional chinese medicine, named "nan-ban-lan-gen" (national pharmacopoeia committee, 2015), for its antibacterial, antiviral, and immunoregulatory effects in treating colds, fever, and influenza, and especially severe acute respiratory syndrome (sun et al., 2008). its leaves and stems are used to extract indigo naturalis (qingdai). previous studies have shown that qingdai is used to treat leukemia (hu et al., 2014), ulcerative colitis (suzuki et al., 2013; fan et al., 2014), oral cancer (lo and chang, 2013), and psoriasis (lin et al., 2012). numerous active compounds have been identified from b. cusia to date, such as indole alkaloids, quinazolinone alkaloids, monoterpenes, triterpenes, flavonoids, sterols, anthraquinones, benzoxazinones, and lignans. of these ingredients, indole alkaloids are the most frequently reported for their leukocyte-inhibitory, anti-inflammatory, and antiviral activities (li et al., 1993; wu et al., 2005; huang et al., 2014).
recently, indirubin, tryptanthrin, and isorhamnetin were successfully characterized, and their anti-leukemia effects were validated. these compounds are secondary metabolites synthesized during normal plant growth or in response to environmental stresses (borowski et al., 2014). the stress response mechanism of b. cusia to harsh environmental conditions needs to be explored to better understand its role in producing active ingredients. at present, the genetic information of b. cusia for molecular biology research is limited in public databases, which makes further in-depth studies more difficult. the mrna has recently been sequenced, and the partial contigs have already been deposited into the national center for biotechnology information (ncbi) database (srr4428209) to accelerate genetic studies in b. cusia. plants have established exceedingly complex molecular mechanisms to survive adverse environmental conditions, and the primary mechanism is induced at a transcriptional level when stresses are present (nakashima et al., 2009; tran et al., 2010). moreover, studies show that most hormones can regulate plant physiological activities, and exogenous plant hormones can improve plant resistance to variable environments (bari and jones, 2009). for example, abscisic acid (aba) is called the "stress hormone" in plants because it plays an important role in resisting drought, salinity, and heat (tuteja, 2007). jasmonic acid methyl ester (meja) plays a key role in plant defense and growth (zhang m. et al., 2015). moreover, ultraviolet (uv)-b and meja were applied in combination for their synergistic effects on the expression levels of key genes in the biosynthetic pathway. rna sequencing (rna-seq), a transcriptome-based next-generation sequencing technique, has revolutionized genome-wide gene expression analysis in various species (wang et al., 2009; stone and storchova, 2015). the main outcome of rna-seq data is the identification of differentially expressed genes.
it is also used to search for reference genes. real-time quantitative polymerase chain reaction (rt-qpcr) is a sensitive, specific, and reproducible technique widely used to analyze the expression of genes in different organisms and tissues under different conditions (bustin, 2002; andersen et al., 2004; caldana et al., 2007) . consequently, it has become the method of choice for validating candidate genes with a large sample of individuals and replicates. however, it is necessary to select suitable reference genes as internal controls under different experimental conditions for accurate rt-qpcr evaluation because of the variability in initial material, rna integrity, rt-pcr efficiency, and rt-qpcr efficiency (derveaux et al., 2010) . further, gene expression can be highly tissue specific and differentiated based on the physiological status of the organism or experimental treatments. many recent studies (exposito-rodriguez et al., 2008; liu et al., 2012) have proved that it is necessary to verify the expression stability of a candidate reference gene in each species prior to its use for normalization. in this regard, several free excel-based statistical algorithms such as genorm (vandesompele et al., 2002) , normfinder (andersen et al., 2004) , and bestkeeper (pfaffl, 2001) permit the identification of the best internal controls from a set of candidate normalization genes in a given series of biological samples. they have been successfully employed to identify the most stable reference genes in animals, microorganisms, human diseases, and various plant species, such as parsley (li m. y. et al., 2016) , chrysanthemum morifolium (qi et al., 2016) , camellia sinensis , oxytropis ochrocephala , and gentiana macrophylla (he et al., 2016) . besides these software programs, the best suitable reference gene should be evaluated with target genes associated with experimental conditions to obtain reliable results. b. 
cusia is still a less-studied species at the molecular level. terpenoid indole alkaloids (tias), derivatives of the shikimate and terpenoid pathways, are important medicinal ingredients in b. cusia, and they are activated in response to hormonal stresses (schluttenhofer et al., 2014). shikimate kinase and 1-deoxy-d-xylulose 5-phosphate reductoisomerase (dxr) are the main enzymes involved in the synthesis of tias (veau et al., 2000; kasai et al., 2005). therefore, the genes coding for them are interesting target genes that can be used to test the reliability of the reference genes under different experimental conditions. in this study, the stability of 10 commonly used reference genes, selected from the transcriptome datasets of b. cusia obtained by rna-seq (unpublished data), was evaluated to identify potential reference genes suitable for transcript normalization in experiments under uv irradiation and hormonal stimuli (meja and aba), and also in different plant organs. moreover, the expression of two target genes, bcsk and bcdxr, was investigated to demonstrate the suitability of the selected reference genes.
field-grown samples of b. cusia were collected from perennial dominant shufeng farm in fujian, china (25°25′ n, 118°39′ e). the organ-specific series of samples (root, stem, leaf, and flower) were collected from flowering plants. stress treatments were applied to 6-month-old plants before flowering. for uv irradiation, plants with soil were transferred into flowerpots and placed under a uv-b transilluminator (0.2 mw cm⁻²) for 3 h, and then the viable leaves were selected. for hormone treatment, the overground parts were sprayed with a solution containing either 100 µm meja or 100 µm aba. then, tissue samples, mainly comprising viable leaves, were collected at 0, 2, 4, 6, 8, 12, and 24 h after treatment. all samples were separately collected in three biological repeats.
so, a total of 60 samples were analyzed, consisting of 12 organ-specific samples (root, stem, leaf, and flower) and 48 stress-treated samples (meja-, aba-, and uv-treated leaves). after collection, the samples were immediately frozen in liquid n2 and stored at −80 °c until further use.
the frozen samples were ground to a fine powder in liquid n2 using a pestle and mortar. the total rna was extracted from all plant tissue samples using a column plant total rna kit (transgen biotech, china) following the manufacturer's recommendations. the concentration of rna samples was determined using a nanodrop 2000 spectrophotometer (thermo, america) at 260 nm, whereas its purity was assessed based on absorbance ratios at 260/280 nm. samples with an optical density absorption ratio at od260/280 between 1.9 and 2.2 and od260/230 <2.0 were used for cdna synthesis. the integrity of purified rna was confirmed using agarose gel electrophoresis and ethidium bromide staining. genomic dna was isolated from young leaves (100 mg) using the cetyltrimethyl ammonium bromide method and checked by agarose electrophoresis. first-strand cdnas were synthesized using the transscript one-step gdna removal and cdna synthesis supermix (transgen biotech, china), by adding oligo (dt) primer, gremover, e-mix, and r-mix to 1 µg of total rna. rnase-free water was added to the prior mixture, and the total volume (20 µl) was incubated at 42 °c for 15 min according to the manufacturer's protocol. reverse transcriptase was inactivated by incubating the mixture at 85 °c for 5 min, and the cdna solution was stored at −20 °c.
transcriptome sequencing of different b. cusia organs (root, stem, and leaf) was performed using the illumina hi-seq 2500 platform (illumina, america).
after assembly and annotation, rsem (li and dewey, 2011) was used to analyze the read counts, which were then converted into fragments per kilobase of exon per million reads mapped (fpkm) values, a commonly accepted estimate for the expression level of unigenes (trapnell et al., 2010). on the basis of previous studies, 10 candidate reference genes (table 1) belonging to different functional classes were selected to avoid possible co-regulation of the genes. this group of genes comprised several classical housekeeping genes commonly used as internal controls for expression studies. the sequences of candidate reference genes were obtained from the transcriptome database. the open reading frame sequences of these genes were cloned (figure s2). the full-length cdna sequences of the candidate reference genes are provided in data s1. moreover, a "tblastx" (ncbi) search was run with the candidate gene sequences against the non-redundant database using the default settings of the online program to identify b. cusia homologs. the candidate reference genes tested are listed in table 1 with their respective reference(s) where they were first described. the amplification primers for real-time pcr were designed using the primer3 software (http://www.simgene.com/primer3) (rozen and skaletsky, 2000), with the criteria of amplifying products from 158 to 249 bp with a temperature of 40-60 °c (primer sequences are shown in table 2). amplification primers were targeted to different exons to control for genomic dna contamination. the performance of the designed primers (table 2) was tested by pcr using either b. cusia cdna or genomic dna templates (figure s3). the presence of spurious amplification products caused by genomic dna was also continuously checked by verifying the rt-qpcr dissociation profile. a template-free control reaction was run to ensure the absence of contamination or primer-dimer formation for each primer pair.
the pcr amplification efficiency was determined for each primer combination using the slope of the standard curve obtained by plotting the fluorescence versus a given concentration of a mixture of all sample cdnas (ranging from 1:1 to 1:10,000 dilution of the cdna mixture sample) using the equation $E = 10^{-1/\text{slope}} - 1$ (ruijter et al., 2009), and the values were used in all subsequent analyses (table 2). real-time amplification reactions were performed using sybr green detection chemistry and run in 96-well plates with the thermal cycler dice tp800 (takara, japan). reactions were performed in a total volume of 20 µl containing 2.0 µl of template, 0.5 µl of each amplification primer, 10.0 µl of 2× top green qpcr supermix (transgen biotech, china), and 7.0 µl of ddh2o. all reaction components without template were used as a negative control. the amplification program was set as follows: an initial denaturation step of 95 °c for 30 s to activate the dna polymerase, followed by 40 cycles of denaturation at 95 °c for 5 s and annealing at 60 °c for 30 s. the amplification process was followed by a melting curve analysis, ranging from 60 to 95 °c. amplicon-based fluorescence thresholds were used to obtain the ct-values. all rt-qpcr reactions were carried out in triplicate, both technically and biologically. the final quantification cycle (cq) values were the mean of nine values (biological triplicate, each in technical triplicate). melting curve analysis of the amplification products and gel electrophoresis analysis confirmed that the primers amplified only a single product (data not shown). the expression of the target genes bcsk and bcdxr, which are important in the synthesis of compounds in b. cusia, was analyzed using the selected reference genes to validate the reference gene selection. the changes in relative expression levels were calculated using the $2^{-\Delta\Delta C_t}$ method (pfaffl, 2001).
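the two calculations described above, amplification efficiency from a standard curve and relative expression against a reference gene, can be sketched in python as follows. the sketch assumes that the standard curve is a least-squares fit of ct against log10 of the relative template amount, and that relative expression is computed with the uncorrected 2^(−ΔΔct) form, which presumes roughly 100% efficiency for both primer pairs. the function names and the demonstration numbers are invented for illustration and are not data from the study.

```python
def amplification_efficiency(log10_amounts, ct_values):
    """Fit Ct = slope * log10(amount) + intercept by least squares.

    Returns (slope, efficiency), where efficiency = 10**(-1/slope) - 1.
    A slope near -3.32 corresponds to an efficiency near 1.0 (100%).
    """
    n = len(ct_values)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, ct_values))
    slope = sxy / sxx
    return slope, 10 ** (-1 / slope) - 1


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by 2**(-ddCt), normalised to one reference gene.

    Assumption: both primer pairs amplify with ~100% efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series from 1:1 to 1:10,000.
    amounts = [0, -1, -2, -3, -4]
    cts = [15.00, 18.32, 21.64, 24.96, 28.28]
    slope, eff = amplification_efficiency(amounts, cts)
    print(f"slope={slope:.2f} efficiency={eff:.1%}")
    print(relative_expression(22.0, 15.0, 24.0, 15.0))
```

when measured efficiencies depart noticeably from 100%, an efficiency-corrected ratio would be the more careful choice; the simple form is kept here because it matches the formula named in the text.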
furthermore, the expression levels of both were compared with the fpkm values in rna-seq data in some samples (table s1). the suitability of candidate control genes across all the experimental sets was evaluated using three statistical algorithms, the genorm (version 3.5) (vandesompele et al., 2002), normfinder (andersen et al., 2004), and bestkeeper (pfaffl, 2001) programs, according to their respective protocols. for genorm and normfinder, the raw ct-values of each gene were converted into relative quantities using the formula $2^{-\Delta Ct}$ ($\Delta Ct$ = each corresponding ct-value - lowest ct-value). for bestkeeper, the ct-values were used directly in the program. genes with standard deviation (sd) >1 were considered to be unacceptable as reference genes. the pairwise variation ($V_n/V_{n+1}$) was additionally calculated using the genorm software to determine the optimal number of reference genes needed for normalization; an additional control gene was not required for normalization when v was <0.15 (vandesompele et al., 2002). the 60 samples were divided into 6 experimental sets and analyzed individually to achieve a more accurate expression analysis. set 1 comprised 21 samples from the meja-induced b. cusia leaves, set 2 comprised 21 samples from the aba-induced b. cusia leaves, set 3 comprised 6 samples from the uv-irradiated b. cusia leaves, set 4 comprised 12 samples from different organs (root, stem, leaf, and flower) of b. cusia, and set 5, which was defined as the stress group, comprised samples from sets 1, 2, and 3. also, the overall stability among the five sets and every variety was analyzed. a total of 10 genes were selected as candidate genes with their fpkm values, as shown in table 1.
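the conversion of raw ct-values into relative quantities for genorm and normfinder, as described above, amounts to subtracting the lowest ct-value of a gene from each of its ct-values and raising 2 to the negative of that difference. a minimal sketch, with a hypothetical function name and made-up ct-values:

```python
def relative_quantities(ct_values):
    """Convert the Ct-values of one gene to relative quantities.

    Delta Ct = Ct of the sample - lowest Ct of that gene, and
    quantity = 2 ** (-Delta Ct), so the sample with the lowest Ct gets 1.0.
    Assumes 100% amplification efficiency (base 2).
    """
    lowest_ct = min(ct_values)
    return [2 ** -(ct - lowest_ct) for ct in ct_values]


# Hypothetical Ct-values of one candidate gene in three samples.
print(relative_quantities([20.0, 21.0, 23.0]))
```

the sample with the highest expression (lowest ct) is scaled to 1, and every additional cycle halves the relative quantity.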
they were glyceraldehyde-3-phosphate dehydrogenase (gapdh), malate dehydrogenase (mdh), ubiquitin 10 (ubq), ubiquitin-conjugating enzyme e2 (ubc), actin (act), 18s rrna (18s), elongation factor 1-alpha (efa), cyclophilin (cyp), alpha-tubulin (tuba), and beta-tubulin (tubb) housekeeping genes, all of which have been reported as stable genes in other species. the sequences of the 10 reference genes were obtained from the transcriptome database (data s1). gene characteristics are shown in table 1. the full-length unigene sequences from the transcriptome database, which were used to design the specific primers for rt-qpcr, are available from ncbi (accession number srr4428209). the specificity and efficiency of each primer pair were assessed by amplification and dissociation curve analysis. first, the performance of the amplification primers was tested by real-time pcr using cdna and gdna as templates, respectively. a single band with the expected size (table 2) was obtained in each case without signs of primer-dimer formation (figure s3), and four primer pairs yielded amplicons from gdna that were longer than those obtained with a cdna template (figure s3), whereas primers for other genes were unable to amplify genomic sequences. second, the melting curve analysis of the amplification products was performed by qpcr after 40 cycles of amplification. the presence of a single peak indicated that the expected amplicons were amplified (figure s4). the amplicons were also examined by 1% agarose gel electrophoresis using ethidium bromide staining, and a single band of the expected size for each primer pair was observed. the correlation coefficients ($R^2$) ranged between 0.9826 and 0.9996, and pcr amplification efficiencies between 87.5 and 98.1% (table 2); both were results of three technical replicates. no signals were detected in no-template controls. real-time rt-pcr was conducted on the 60 cdna samples with 10 primer pairs.
the 10 candidate control genes displayed a relatively wide range of expression levels, with mean ct-values between 14.3 (cyp) and 36.5 (tubb) (figure 1), suggesting that these reference genes were expressed at different levels in b. cusia. genes with a higher sd of ct-values showed more variable expression than those with a lower sd, as shown in table 4. the genorm program classifies the stability of gene expression by calculating the average expression stability (m). stably expressed genes have m-values below 1.5, and an m-value above 1.5 indicates lower expression stability (vandesompele et al., 2002). the ranking order according to the m-value is depicted in figure 2. as determined by genorm, 18s and efa were the most stable reference genes in total samples. in contrast, gapdh and ubc were the least stable reference genes. within each subset, the two best reference genes under meja stress were cyp and ubq, which had the lowest m-values. the most preferred genes for normalization under aba stress were mdh and tuba. the two most stable reference genes under uv stress were cyp and mdh. as for the various organs, 18s and tubb were the most stable reference genes. furthermore, the most stable control genes in the stress group (set 5) were 18s and efa. the optimal number of reference genes required for accurate normalization was determined by the pairwise variation (v). the $V_n/V_{n+1}$ values were below 0.15 under all experimental conditions (figure 3), indicating that one stable reference gene was enough to obtain accurate results. the normfinder program is another visual basic application tool for microsoft excel used to determine expression stabilities of reference genes. it ranks genes based on the stability value for each reference gene; a lower stability value indicates more stable expression. moreover, this algorithm suggests the best pair of genes among the candidate reference genes analyzed using the intra- and intergroup variance.
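the genorm stability measure m discussed above can be sketched in a few lines. the sketch follows the published definition as we understand it (for each gene, the mean over all other genes of the standard deviation of the log2 expression ratios across samples); the use of the sample standard deviation, the function name, and the toy data are assumptions, and the code is not the genorm software itself.

```python
import math
from statistics import mean, stdev


def genorm_m_values(quantities):
    """Average expression stability M for each gene.

    quantities: dict mapping gene name -> list of relative quantities,
    with samples in the same order for every gene.
    """
    m_values = {}
    for gene_j, values_j in quantities.items():
        pairwise_sd = []
        for gene_k, values_k in quantities.items():
            if gene_k == gene_j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three genes in three samples.
toy = {
    "gene_a": [1.0, 2.0, 4.0],
    "gene_b": [2.0, 4.0, 8.0],  # constant ratio to gene_a
    "gene_c": [1.0, 1.0, 8.0],  # varies independently
}
for gene, m in sorted(genorm_m_values(toy).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

genes whose ratio to the other candidates stays constant across samples receive a low m, which is why the two genes with a fixed ratio rank ahead of the third in this toy example.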
the stability values of reference genes were calculated by normfinder, as shown in table 3. in the subset of meja stress, 18s and efa were the most stable. in aba stress, mdh and gapdh were the most stable. in uv stress, the top two stably expressed genes were cyp and mdh. in different normal organs (non-stress group), ubc and efa were the top two stably expressed genes. in set 5, efa and ubq were the most highly ranked. when evaluating the total experimental samples, ubq and 18s were the top ranked genes (table 3). in contrast, gapdh was the least stable reference gene under meja stress, tubb was the least stable under aba stress, and tuba was the least stable under uv irradiation and normal organs. tuba was also the least stably expressed gene in set 5 and the second least stably expressed in total samples. the rank in normfinder was slightly different from that in genorm. genes considered as the most stable by genorm (18s and efa) ranked second and third by normfinder, respectively.
figure 3 | pairwise variation (v) of 10 candidate reference genes calculated by genorm to determine the optimal number of reference genes for accurate normalization. the threshold is 0.15.
the bestkeeper algorithm evaluates the stabilities of candidate reference genes based on the cv ± sd values. in the meja stress set, 18s (2.65 ± 0.36) and ubq (1.85 ± 0.46) with lowest cv ± sd values were somewhat similar to the results of genorm and normfinder. in the aba stress set, mdh (0.78 ± 0.21) and ubq (0.98 ± 0.22) were considered as the most stable genes. in the uv stress treatment, 18s (0.11 ± 0.02) and efa (0.22 ± 0.04) were identified as the best reference genes for normalization. as for the subsequent three subsets, 18s was the best stable gene, which was consistent with the result of genorm (table 4). the comprehensive rankings of the candidate reference genes were determined using the reffinder program online (http://fulxie.0fees.us/?type=reference).
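the cv ± sd summary used in the bestkeeper-style comparison above can be illustrated with a short sketch. it assumes that sd is the sample standard deviation of the raw ct-values and that cv is sd expressed as a percentage of the mean ct; the bestkeeper spreadsheet may compute its dispersion measure differently, so this is only an approximation. the function names and ct-values are hypothetical, and the sd cutoff of 1 is the threshold stated in the methods.

```python
from statistics import mean, stdev


def ct_variability(ct_values):
    """Return (CV in percent, SD) of the raw Ct-values of one gene."""
    average = mean(ct_values)
    sd = stdev(ct_values)  # assumption: sample standard deviation
    return 100.0 * sd / average, sd


def is_acceptable_reference(ct_values, sd_cutoff=1.0):
    """Genes with SD > cutoff are treated as unacceptable reference genes."""
    _, sd = ct_variability(ct_values)
    return sd <= sd_cutoff


# Hypothetical Ct-values for two candidate genes across samples.
stable_gene = [19.0, 20.0, 21.0]
variable_gene = [10.0, 20.0, 30.0]
print(ct_variability(stable_gene), is_acceptable_reference(stable_gene))
print(ct_variability(variable_gene), is_acceptable_reference(variable_gene))
```

because this measure works on raw ct-values rather than on ratios between genes, it can rank candidates differently from genorm and normfinder, which is consistent with the partly divergent rankings reported here.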
data generated by the three programs across different experimental sets were further compared, as shown in table 5 . the top-and low-ranked candidate reference genes were selected for normalizing two target genes bcsk and bcdxr under different experimental conditions. the result (figure 4) showed that the expression level of bcsk and bcdxr in b. cusia under meja treatment increased with the increase in induction time when 18s was used as a control but attained a different expression pattern with the least stable gene gapdh. in aba stress, the expression level of bcsk and bcdxr reached the highest at 6 h when using the most stable reference genes (18s) as the internal control, while the expression level was overestimated when the least stable genes were used (gapdh). similarly, the relative expression level of bcsk increased when normalized using the most stable genes (18s) in normal organs, while the expression level was overestimated when normalized using the least stable combination (gapdh). the relative expression patterns of bcdxr were opposite when the foregoing two genes were used as controls. meanwhile, ubc and tuba, which showed the most and least stable genes, respectively, in different organs subset were used as the reference genes, and the expression levels of bcdxr and bcsk were generally identified with the expression profile in rna-seq (figure 4) . the result showed that ubc was more suitable for organs in b. cusia. rt-qpcr has become a broadly accepted method of choice for accurate expression profiling of target genes in the gene expression analysis (nolan et al., 2006; vanguilder et al., 2008) . the accurate expression level of a selected gene requires appropriate internal controls, which are commonly called housekeeping or reference genes (derveaux et al., 2010) . 
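the effect of reference gene choice on the calculated expression of a target gene can be made concrete with a small sketch of the comparative ct calculation ($2^{-\Delta\Delta Ct}$). the sketch assumes 100% amplification efficiency for both genes; the function name and all ct-values are hypothetical and do not reproduce the bcsk or bcdxr data.

```python
def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) calculation.

    dCt = Ct(target) - Ct(reference) within each condition;
    ddCt = dCt(treated) - dCt(control).
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Hypothetical target gene: Ct 24 in control leaves, 22 after treatment.
# A stable reference keeps the same Ct (15) in both conditions.
with_stable_reference = fold_change(22.0, 15.0, 24.0, 15.0)

# An unstable reference whose own Ct shifts from 15 to 17 under treatment.
with_unstable_reference = fold_change(22.0, 17.0, 24.0, 15.0)

print(with_stable_reference, with_unstable_reference)
```

in this toy case the same target measurements give a 4-fold induction with the stable reference and a 16-fold induction with the unstable one, which illustrates how an unsuitable internal control can lead to overestimated expression levels.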
the ideal control genes should be stably expressed under each experimental condition, independent of organs, tissues, developmental stages, and different treatments (bustin et al., 2009) . however, no universally suitable control genes are available, since numerous studies have reported that the expression of reference genes can also vary considerably with experimental conditions and different species (reid et al., 2006; wan et al., 2011; galli et al., 2013; li j. et al., 2016) . therefore, a set of potential housekeeping genes must be previously validated in each particular experimental subset. to date, few studies have compared and selected housekeeping genes in b. cusia. this has hindered the characterization of genes involved in different organisms and stress responses in b. cusia. this novel study performed the first large-scale transcriptome data analysis for b. cusia (unpublished), consisting of 293,666 unigenes, which were used for reference gene selection. the rna-seq data were useful sources for screening candidate housekeeping genes and represented an important strategy for large-scale reference gene selection for non-model plants (zhuang et al., 2015) . traditional genes, which are involved in cytoskeleton structure (act, tuba, and tubb), protein synthesis (efα and 18s), biological metabolic processes (gapdh and ubq), and multifunctional proteins (cyp and mdh), are usually used as reference genes. in this study, the foregoing 10 common internal control genes of b. cusia were cloned ( figure s2 ) for expression normalization in 60 different samples, including uv irradiation, hormonal stimuli (meja and aba), and different organs. this novel study reported a systematic analysis of reference genes that could be used in different treatment samples of b. cusia. the result demonstrated that each species might have its own suitable reference genes, which should be determined for each subset of experimental conditions. 
three different, yet complementary, statistical programs were used to identify the most suitable internal controls for the normalization of gene expression studies in b. cusia to minimize bias entrapped by the statistical approach. the results obtained from genorm, normfinder, and bestkeeper were not completely identical due to the different calculation programs (jacob et al., 2013) , especially under specific individual conditions. similar findings were reported in other studies on rice (kim et al., 2003) , coffee (cruz et al., 2009) , tobacco (delaney, 2010) , and parsley (li m. y. et al., 2016) . furthermore, the reffinder program (xie et al., 2012) was used to generate a comprehensive ranking of candidate reference genes. as a result, 18s rrna was the most stable gene in all subsets except aba stimuli and uv irradiation, while gapdh, tuba, and tubb yielded poor values in this study. reference genes varied in different tissues, organs, and experimental conditions in most previous studies. in rice, ubq5 and efa were the most stable genes in all the tissue samples, while 18s and 25s rrnas were the most stable genes under various treatment conditions (jain et al., 2006) . however, another study showed that 18s and 25s rrnas had the least stable expression at different developmental stages and in different varieties of rice (li et al., 2010) . moreover, the selections of reference genes were not consistent in various species. act and tubb in carrot (tian et al., 2015) were the most suitable reference genes under "abiotic stress" and "hormonal stimuli, " but they were not suitable choices in the present study. moreover, gapdh showed a good performance in "hormonal stimuli" in parsley (li m. y. et al., 2016) but was the least stable in carrot. also, gapdh was the least stable gene in eucalyptus spp. (boava et al., 2010) and petunia hybrida (mallona et al., 2010) . 
the results demonstrated that each species might choose its own stably expressed genes for each subset of experimental conditions. 18s rrna is frequently used as the control gene for normalizing because it is independent of developmental stages and external stimuli (gantasala et al., 2013) . moreover, the 26 reports (real-time rt-pcr carried out on barley; published in the period of january 1996 to march 2008) examined indicated that 18s rrna was the most frequently used gene (8 times) among the 16 different reference genes used (kozera and rapacz, 2013) . the expression of bcsk and bcdxr genes was quantified in this study. bcsk and bcdxr were found to play important roles in the indole and carotenoid biosynthesis pathways, respectively. genorm is used in most studies because of its capacity to determine the number of genes necessary for normalization (kozera and rapacz, 2013) . the pairwise variation (v) was analyzed in this study to determine the optimal number of genes required for normalization; the v n /v n+1 values were below 0.15 under all subset conditions (figure 3) , indicating that one stable reference gene was enough to obtain accurate results. therefore, 18s rrna and gapdh were used to testify the expression levels of bcsk and bcdxr under stress conditions and different plant organs. the expression levels of bcsk and bcdxr increased with induction time in the meja treatment subset, verifying that tias are also induced by meja in b. cusia. moreover, the more the expression of bcsk, which participates in the shikimate pathway, the higher the production of indigo ( figure s5) . however, it decreases later because of branching to other aromatic compounds. both the target genes had a peak expression at 6 h in the aba stimuli subset as a result of the plant resistance to stress, conforming to the phenomenon that aba not only promoted withering of an organism but could also improve disease resistance in plants. 
the expression pattern of both genes was generally consistent with the expression profile in rna-seq in different plant organs. in general, 18s rrna is the best gene among all samples, whereas ubc is the most suitable gene for plant organs, thereby facilitating future studies on gene expression in b. cusia. however, normalization with multiple reference genes has become the commonly used method to avoid erroneous data that may be triggered by using a single reference gene. therefore, based on transcriptome datasets, more novel and stable reference genes can be identified from other b. cusia samples in further studies.
yh, ht, lz, and yd conceived and designed the study. yh, jy, gw, and yc collected the tissue material and performed the experiments. yh, jc, zg, and qz performed data analysis. yh wrote the manuscript. all authors read and approved the final manuscript.
the supplementary material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.00668/full#supplementary-material
data s1 | the full-length cdna sequences of candidate genes. the orf regions were highlighted in blue color.
normalization of real-time quantitative reverse transcription-pcr data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets role of plant hormones in plant defence responses selection of endogenous genes for gene expression studies in eucalyptus under biotic (puccinia psidii) and abiotic (acibenzolar-s-methyl) stresses using rt-qpcr selection of candidate reference genes for realtime pcr studies in lettuce under abiotic stresses quantification of mrna using real-time reverse transcription pcr (rt-pcr): trends and problems the miqe guidelines: minimum information for publication of quantitative real-time pcr experiments a quantitative rt-pcr platform for high-throughput expression profiling of 2500 rice transcription factors evaluation of coffee reference genes for relative expression studies by quantitative real-time rt-pcr stable internal reference genes for normalization of real-time rt-pcr in tobacco (nicotiana tabacum) during development and abiotic stress how to do successful gene expression analysis using real-time pcr selection of internal control genes for quantitative real-time rt-pcr studies during tomato development process intervention effects of qrzslxf, a chinese medicinal herb recipe, on the dorβ-arrestin1-bcl2 signal transduction pathway in a rat model of ulcerative colitis selection of reliable reference genes for quantitative real-time polymerase chain reaction studies in maize grains selection and validation of reference genes for quantitative gene expression studies by real-time pcr in eggplant (solanum melongena l) selection and validation of reference genes for quantitative real-time pcr in gentiana macrophylla arsenic disulfide induced apoptosis and concurrently promoted erythroid differentiation in cytokine-dependent myelodysplastic syndrome-progressed leukemia cell line f-36p with complex karyotype including monosomy 7 evaluation of meisoindigo, an indirubin 
derivative: in vitro antileukemic activity and in vivo pharmacokinetics careful selection of reference genes is required for reliable performance of rt-qpcr in human normal and cancer cell lines validation of housekeeping genes as internal control for studying gene expression in rice by quantitative real-time pcr identification of three shikimate kinase genes in rice: characterization of their differential expression during panicle development and of the enzymatic activities of the encoded proteins normalization of reverse transcription quantitative-pcr with housekeeping genes in rice reference genes in real-time pcr rsem: accurate transcript quantification from rna-seq data with or without a reference genome selection of reliable reference genes for gene expression analysis under abiotic stresses in the desert biomass willow, salix psammophila chemical studies of strobilanthes cusia suitable reference genes for accurate gene expression analysis in parsley (petroselinum crispum) for abiotic stresses and hormone stimuli validation of candidate reference genes for the accurate normalization of real-time quantitative rt-pcr data in rice during seed development natura-alpha targets forkhead box m1 and inhibits androgen-dependent and -independent prostate cancer growth and invasion comparison of refined and crude indigo naturalis ointment in treating psoriasis: randomized, observer-blind, controlled, intrapatient trial validation of reference genes for gene expression studies in virus-infected nicotiana benthamiana using quantitative real-time an indirubin derivative, indirubin-3 ′ -monoxime suppresses oral cancer tumorigenesis through the downregulation of survivin validation of reference genes for quantitative real-time pcr during leaf and flower development in petunia hybrida transcriptional regulatory networks in response to abiotic stresses in arabidopsis and grasses pharmacopoeia of the people's republic of china vol 1 quantification of mrna using real-time rt-pcr 
a new mathematical model for relative quantification in real-time rt-pcr reference gene selection for rt-qpcr analysis of flower development in chrysanthemum morifolium and chrysanthemum lavandulifolium an optimized grapevine rna isolation procedure and statistical determination of reference genes for real-time rt-pcr during berry development primer3 on the www for general users and for biologist programmers amplification efficiency: linking baseline and bias in the analysis of quantitative pcr data analyses of catharanthus roseus and arabidopsis thaliana wrky transcription factors reveal involvement in jasmonate signaling the application of rna-seq to the comprehensive analysis of plant mitochondrial transcriptomes research progress of chemical constituents and pharmacological activities for baphicacanthus cusia (nees)bremek therapeutic efficacy of the qing dai in patients with intractable ulcerative colitis selection of suitable reference genes for qpcr normalization under abiotic stresses and hormone stimuli in carrot leaves potential utilization of nac transcription factors to enhance abiotic stress tolerance in plants by biotechnological approach transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation abscisic acid and abiotic stress signaling accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes twenty-five years of quantitative pcr for gene expression analysis cloning and expression of cdnas encoding two enzymes of the mep pathway in catharanthus roseus identification of reference genes for reverse transcription quantitative real-time pcr normalization in pepper (capsicum annuum l.)
synergistic effects of ultraviolet-b and methyl jasmonate on tanshinone biosynthesis in salvia miltiorrhiza hairy roots evaluation and selection of appropriate reference genes for real-time quantitative pcr analysis of gene expression in nile tilapia (oreochromis niloticus) during vaccination and infection rna-seq: a revolutionary tool for transcriptomics characterization of anti-leukemia components from indigo naturalis using comprehensive two-dimensional k562/cell membrane chromatography and in silico target identification chemical studies of baphicacanthus cusia (nees)bremek. chinese traditional. herb. drugs selection of suitable reference genes for qrt-pcr normalization during leaf development and hormonal stimuli in tea plant (camellia sinensis) mirdeepfinder: a mirna analysis tool for deep sequencing of plant small rnas methyl jasmonate and its potential in cancer therapy effects of adding vindoline and meja on production of vincristine and vinblastine, and transcription of their biosynthetic genes in the cultured cmcs of catharanthus roseus selection of appropriate reference genes for quantitative real-time pcr in oxytropis ochrocephala bunge using transcriptome datasets under abiotic stress treatments the authors thank professor wansheng chen, dr. qing li, and dr. ying xiao for their kind suggestions and help in the study. key: cord-314642-oobbdgzh authors: campbell, allan title: the future of bacteriophage biology date: 2003 journal: nat rev genet doi: 10.1038/nrg1089 sha: doc_id: 314642 cord_uid: oobbdgzh after an illustrious history as one of the primary tools that established the foundations of molecular biology, bacteriophage research is now undergoing a renaissance in which the primary focus is on the phages themselves rather than the molecular mechanisms that they explain. studies of the evolution of phages and their role in natural ecosystems are flourishing. 
practical questions, such as how to use phages to combat human diseases that are caused by bacteria, how to eradicate phage pests in the food industry and what role they have in the causation of human diseases, are receiving increased attention. phages are also useful in the deeper exploration of basic molecular and biophysical questions. bacteriophages (also known as phages) are viruses that infect bacteria. as with all viruses, phages are infectious particles that have at least two components: nucleic acid and protein. if a phage invades a susceptible bacterial cell, the phage (or at least its nucleic acid) enters the cell and triggers a cycle of phage production (fig. 1) . during this cycle, the cell is reprogrammed to become a phage factory in which the components of the biosynthetic apparatus (such as ribosomes and atp generators) are diverted from their normal tasks in bacterial growth. the various pathways for reprogramming are initiated by phagespecified proteins, which are translated from phage mrna that is made after infection. the temporal programme is ordered and regulated. generally, nucleic-acid replication occurs first, followed by the synthesis of structural proteins of the phage particle. new phage particles are then assembled, and are later released from the cell. for most of the classically studied coliphages, release is effected by the disruption of the cell envelope and sudden lysis of the cell, the contents of which (including the new phage particles) spill out into the medium. the whole cycle can last ~40 minutes and produce ~100 new phages (fig. 1) . this type of lytic cycle is the only mode of reproduction for many phages. other one reason that phages have been useful in research is the ease of synchronizing the lytic cycle in a population of cells, so that the progression of molecular events can be measured over the whole culture. 
synchronization might be achieved either by simultaneous infection or by the induction of phage development in lysogenic cells. a second reason is the ease with which mutations that affect specific stages of the cycle can be isolated and analysed. phages are as varied in size and form as the viruses that infect eukaryotes. several formal classification schemes have been proposed, but none is widely useful or known to reflect phylogeny. table 1 summarizes basic information on common model phages and illustrates some of the diversity in phage morphology and lifestyle. the phylogeny of viruses has eluded generations of virologists. modern methods have redefined the questions rather than achieved definitive answers. in closely related groups, isolates can be arranged by the degree of similarity, but whether all viruses (or even all phages) constitute a monophyletic group remains uncertain. the development of ideas, starting in the pre-genome era, is illustrated by work on phages and their relatives. in the 1950s, numerous temperate phages (lambdoids) were discovered that resembled the phage-λ morphologically and could recombine with it. electron microscopy of heteroduplexes, coupled with the results from genetic crosses, verified that the lambdoid phages have a common genetic map but show diverse
approximately 25 years after their discovery in 1915 (ref. 1), the use of phages to answer fundamental questions about the mechanisms of inheritance began, taking advantage of the ease with which they could be manipulated 2 . between the years 1940 and 1970, phage experiments laid much of the groundwork for the science of molecular biology. that science, once dubbed 'modern' as distinct from 'classical' biology, has long since been assimilated into the textbooks. however, recent research indicates that the historical contributions that phage studies have made to molecular biology might be overshadowed by their future influence on biology in general, and on genetics in particular. phages have important roles in global ecology and bacterial pathogenicity. these emerging roles, together with the increasing number of recent publications on phage genomics, highlight the need for a natural selection has preserved some benign combinations. this raises the question of how the parental phages that give rise to a new junction came to be so different from one another. however, before addressing this issue, we leave the myopic consideration of lambdoid phages to look at what genome analysis implies about phages that are essentially unrelated, even those that come from highly diverse hosts. on that level, the message is clear. natural phage genomes are frequently chimeric in origin and contain genes that have been transferred over great evolutionary distances 5 . as with the genesis of new junctions among the lambdoid phages, the potential number of such transfers must greatly exceed what has passed through the sieve of natural selection. among the more marked examples of molecular chimerism is phage n15.
whereas most of its phage genes are related to, and in the same order as, the genes of a lambdoid phage, its prophage form persists as a linear plasmid that includes sequences that are related to the f plasmid of escherichia coli as well as to coliphages p1 and p4 (ref. 6 ). this widespread chimerism forces us to believe that phage genomes collectively constitute a pool that is subject to continual mixing. this pool includes groups, such as the lambdoid phages, in which mixing is even more frequent. however, some gene blocks that are found in lambdoid phages might have resided elsewhere for much of their history, and have diverged over long periods during which they were not subject to frequent recombination with other lambdoid phages. it should be noted that molecular clock estimates indicate that some allelic variants of λ-genes diverged long before the existence of e. coli, which is the host of λ, perhaps even as long ago as the branching of bacteria from archaea and eukaryotes 4,7 . there are many reasons to question the accuracy of dating by molecular clocks 8 . however, the antiquity of viral genes is also indicated by an independent line of evidence 9 . there are marked examples (such as adenoviruses and reoviruses) in which the morphology of the viral capsid is similar in a group that includes animal viruses, plant viruses and bacteriophages. these similarities extend from the shapes of individual proteins to the overall morphology of the virus particle (composed of several protein species) and seem unlikely to represent evolutionary convergence. this indicates a common ancestry for reoviruses and phage φ6, for example, although the rna and protein sequences show no significant similarities and, therefore, must have diverged anciently, perhaps before the separation of the main domains of the living world. hendrix 4 offered the most comprehensive interpretation of these results. 
if different closely related phages, such as lambdoids, encounter one another in nature, they can recombine. the frequency of recombination depends on the degree of similarity of the two partners at the recombination points. so, between any two phages, shared segments of homology can serve as recombinational linkers, and even identities of oligonucleotide length can create preferred exchange points. none of this is new. however, breakage and joining sometimes occur even if there is no homology -which represents new ground -and over evolutionary time this happens often enough to generate the sharp boundaries between homology and non-homology that are observed in phages. most such new junctions should be lethal or deleterious, but specificities for properties such as integration, repression and host range. these studies also showed that any two lambdoid phages are similar in some segments of their genomes and dissimilar in others, with abrupt transitions from similarity to dissimilarity 3 . extensive dna sequencing has confirmed this general picture. it has also shown that segments with altered specificity (such as repressor genes), although differing greatly in sequence, are nonetheless orthologous. comparing all the known lambdoid phages, there is little doubt that they have undergone extensive recombination in nature, mixing and matching heterologous specificity determinants. sequencing also verifies the sharpness of most boundaries between segments of close similarity and those of extreme dissimilarity. at the base of the natural food chain are the photosynthetic microbes: green algae and cyanobacteria. for both of these groups, a substantial fraction of their biomass is liberated as dissolved organic material by viral infection. heterotrophic consumers of dissolved organic matter are also frequently killed by viral infection. no account of the natural food web that omits bacteriophages is even close to being complete 15, 16 . molecular machines. 
the focus of biochemistry has advanced from chemical catalysis by individual proteins to mechanical processes that are carried out by molecular assemblages. phage systems provide many instructive examples. the forces that drive dna injection during infection by the large-tailed dna phages have long eluded direct identification. for many common phages, such as λ and t4, this remains the case. more progress has been made with phages, such as t7, which inject their dna slowly. the initial step is the injection of t7 proteins from the phage particle to form a channel across the cell envelope that conducts dna into the cell by a process that is apparently enzymatic 17 . next, the dna is pulled into the cell by transcription with an effectively stationary rna polymerase (first host then phage polymerase) that reels dna through itself 18, 19 . the type i restriction enzyme eco ki can replace rna polymerase to reel in dna 20 . in assembling large double-stranded dna (dsdna) phages, viral dna can be pulled into preformed heads by the portal protein that is located at one vertex of the icosahedron. detailed studies of the physical mechanism were made on bacillus phage φ29 (ref. 21 ). the dodecameric portal protein rotates in the protein shell of the phage, in steps of 12 °, which reel in phage dna and are powered by atp hydrolysis (fig. 2) . the forces that act on individual dna molecules that are being packaged into the heads of φ29 have been accurately measured, and show that the φ29 portal protein generates one of the strongest known molecular motors 22 . as well as such motors, phages use numerous molecular machines. the insertion of phage dna into the chromosome during lysogenization is a good example. several molecules of λ-integrase form an assemblage (the intasome) that binds to a specific segment of phage dna, recruits the bacterial insertion sites into the complex, then makes concerted single-strand cuts in the two dnas. 
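the 12° stepping described for the φ29 portal can be checked with a little geometry. the sketch below is a toy model and not the published mechanism. it assumes twelve connector monomers spaced 30° apart, five atpases spaced 72° apart (a count read from the 'four blue spheres and one red sphere' in the figure legend), and atpases that fire in sequence. under those assumptions, a 12° rotation is exactly the mismatch between two monomer spacings (60°) and one atpase spacing (72°), so each step should bring every second monomer opposite the next atpase.

```python
# toy geometry of the phi29 portal stepping; all constants are illustrative assumptions
N_MONOMERS = 12   # dodecameric connector: one monomer every 30 degrees
N_ATPASES = 5     # assumed ring of five atpases: one every 72 degrees
STEP_DEG = 12     # connector rotation per step, as stated in the text


def aligned_pair(step):
    """Return (atpase_index, monomer_index) that face each other after `step` rotations."""
    offset = STEP_DEG * step                      # total connector rotation so far
    atpase = step % N_ATPASES                     # atpases are assumed to fire in order
    atpase_angle = atpase * (360 // N_ATPASES)
    for monomer in range(N_MONOMERS):
        monomer_angle = (monomer * (360 // N_MONOMERS) + offset) % 360
        if monomer_angle == atpase_angle:
            return atpase, monomer
    raise ValueError("no monomer faces the firing atpase")


# number of 12-degree steps needed for one full turn of the connector
steps_per_turn = 360 // STEP_DEG

if __name__ == "__main__":
    labels = "ABCDEFGHIJKL"
    for step in range(6):
        atpase, monomer = aligned_pair(step)
        print(f"step {step}: atpase {atpase + 1} faces monomer {labels[monomer]}")
    print("steps per full turn:", steps_per_turn)
```

in this toy model the first step places the third monomer opposite the second atpase, which matches the figure legend's description of monomer c moving opposite atpase ii.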
the strand cutting proceeds through the covalent joining of a 3′ phosphate to a tyrosine of the integrase protein. the terminal three bases of

one characteristic of a phage family, such as the lambdoids, is a common genetic map. the main determinants for structural and regulatory elements in the phage genome are present in the same order throughout the family. as discussed previously, these elements have sometimes been replaced by homologues or analogues from other phages. however, at some time in the past these genomes must have come into being, probably by the merging of smaller genomes with more limited functions. so, the original lambdoid phage might have derived its replication genes from one source, its lysis genes from a second source and its structural genes from a third. genes might have been added one at a time or in larger blocks 3, 4. the exciting point is that such gene addition is not just a long-ago fait accompli but is an active continuing process. the hendrix group 4 have identified segments of the genomes of lambdoid phages that seem to have been spliced into the phage genome recently. these segments (dubbed 'morons') typically include a single gene, which is often of unknown function, and generally differ in base composition from the rest of the phage genome. each moron is present in some lambdoid phages and absent from others. the incorporation by phages of toxin genes and longer pathogenicity islands might be an example of the same kind of process (see later). a complete understanding of the natural dynamics that underlie the recombinational origin of new phage types will require more specific information on the role of natural selection: not only the elimination of maladapted types (as mentioned earlier) but also active selection for the few superior variants.
further efforts in this direction are desirable, not just for their intrinsic interest but also as models for viruses that infect eukaryotic hosts; for example, it is well documented that members of the coronavirus group, which has attracted attention recently through the emergence of a deadly new human pathogen that causes severe acute respiratory syndrome (sars), undergo frequent recombination in nature 10, 11.

the long-held suspicion that phages are ubiquitous throughout the prokaryotic world has been confirmed as our knowledge has expanded to include more phages and phage-related sequences in prokaryotic genomes. the numerical abundance of phage-like particles in aquatic ecosystems generally exceeds that of prokaryotic cells and might represent >10% of the biomass of those cells. furthermore, as documented in recent reviews 12, 13, 14, there is increasing evidence to indicate that a large fraction of these visible particles comprises viable infectious phages.

www.nature.com/reviews/genetics perspectives

predicted by the finding that the repressor assembles to form octamers, as required for cooperative binding 33. our initial understanding of in vivo switching was incomplete in another respect. the earlier work showed that the or switch, by itself, is bistable. the ci gene, which encodes the repressor, is transcribed leftward from or, whereas the secondary repressor cro and other gene products come from the rightward message. both repressor and cro can bind to three sites in or and shut off transcription in both directions. however, one or the other will eventually get the upper hand. if repressor synthesis is established, cro synthesis is shut off, and vice versa. the system, therefore, provides an elegant model for how switching events might be controlled in other contexts, such as in cell differentiation in multicellular eukaryotes 28, 34.
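the bistability that arises when repressor and cro each shut off the other's synthesis can be illustrated with a minimal mutual-repression model. this is a generic toy sketch, not the published model of the λ switch: the synthesis rate `alpha`, the hill exponent `hill` and the unit decay rate are arbitrary illustrative values, and the real circuit (three operator sites, cii, looping to ol) is far richer.

```python
def simulate_switch(repressor0, cro0, alpha=4.0, hill=2.0, dt=0.01, steps=5000):
    """Integrate a two-component mutual-repression model with forward Euler.

    Each protein is made at a rate that falls as the other accumulates and
    is lost at unit rate. All parameter values are illustrative assumptions.
    """
    r, c = repressor0, cro0
    for _ in range(steps):
        dr = alpha / (1.0 + c ** hill) - r   # repressor synthesis is inhibited by cro
        dc = alpha / (1.0 + r ** hill) - c   # cro synthesis is inhibited by repressor
        r += dt * dr
        c += dt * dc
    return r, c


if __name__ == "__main__":
    # two starting points that differ only in which protein has a small head start
    print("repressor ahead:", simulate_switch(1.5, 1.0))
    print("cro ahead:      ", simulate_switch(1.0, 1.5))
```

with these values the two runs settle into opposite states, one with high repressor and low cro and one with the reverse, which is the sense in which 'one or the other will eventually get the upper hand'.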
as for the phage system, it might seem gratuitous to question whether the crucial events have been identified, as the simple model seems to provide a sufficient explanation. the shut-off of repressor synthesis by cro depends on cro binding to the or3 operator site, which could be the crucial step in activating the lytic cycle. in this view, the establishment of repression depends on a race between cro and the repressor. however, it has been known from early on that another protein, cii, which acts at the promoter site pre, is crucial in repressor establishment 35 and that cro that is bound to the other or sites (or2 and or1) decreases cii production. this makes it equally possible that the decisive race is between cro and cii, with or3 having, at most, an ancillary role. the definitive experiments are yet to be done. a comprehensive model of the switch is available 36, which agrees with the published data on lysogenization frequencies as a function of average phage input 37. the complex regulation of the switch is shown in fig. 3. however, in this pioneering work most of the kinetic parameters are estimates rather than direct measurements. this work underscores the potential for modelling whole-cell physiology with the present high-throughput data on transcription rates and protein abundance. still lacking is a complete model that predicts the strong overshoot in repressor concentration that is observed during lysogenization 34. because the rate of repressor synthesis from cii-stimulated pre is much higher than that observed in a lysogen, it seems unlikely that cro that is bound to or3 can prevent repressor establishment if cii synthesis achieves its maximum rate.

it was previously thought that holliday junctions were formed by re-ligation at the cutting site, followed by branch migration to the resolution site (7 bp away). since then, important progress has been made on the nature of the initial dna-protein interactions 25

gene regulation.
in a series of incisive experiments, ptashne and colleagues defined the modus operandi of the lysis/lysogeny decision in phage λ as a model genetic switch that operates in the right operator sites (or) 29 (fig. 3). this work has recently been extended by the discovery that the natural switch is strongly influenced by the simultaneous interaction of the repressor with the leftward operator (ol), which is more than 3 kb away 30-32; this was

the free 5′ ends then rotate to exchange partners, followed by ligation with removal of the integrase protein from the 3′ ends to form a holliday junction. the assemblage then undergoes a rearrangement (isomerization) and resolves the holliday structure by cutting the two previously uncut strands on the other side of the junction, which allows the exchange of partners by the freed 5′ ends, followed by resealing 23. the reverse reaction (excision from the chromosome) requires a second phage-specified protein (excisionase) and proceeds by an analogous mechanism, which is, however, not a simple chemical reversal of the integration reaction 24. probably the most important recent breakthrough was the discovery that both initiation and resolution involve strand swapping of the free 5′ ends, as described earlier 22.

shown by a set of four blue spheres and one red sphere. the dna base that is aligned with the active connector monomer is also shown in red. a | the active prna-atpase i interacts with the adjacent connector monomer (a), which in turn contacts the aligned dna base. b | the narrow end of the connector has moved anti-clockwise by 12°, to place the narrow end of monomer c opposite atpase ii, the next atpase to be fired; this causes the connector to expand lengthwise by slightly altering the angle of the helices in the central domain (white arrow with asterisk).
c | the wide end of the connector has followed the narrow end, as the connector relaxes and contracts (white arrow with two asterisks), thereby causing the dna to be translated into the phage head. for the next cycle, atpase ii is activated, which causes the connector to be rotated by another 12°, and so on.

at present, phage therapy can be described as promising, but despite its long history its ability to deliver on the promise remains to be fully shown. however, there have been clear successes with experimentally infected animals 40, 41, and mathematical modelling corroborates the expectations of feasibility 42. much of the earlier work was relatively crude and less controlled, and the whole area was dismissed without a thorough evaluation (largely because of the availability and economy of antibiotic therapy). now, the numerous potential problems, such as the deleterious effects of the phages themselves, the inactivation of phages by neutralizing antibodies and the genesis of phage-resistant mutants, remain to be rigorously addressed. in other areas that require bacterial survival under conditions of incomplete sterility, such as industrial fermentation, phages are the enemy. phage lysis of lactic-acid bacteria has long been a nuisance in cheese factories. extensive studies on phages that infect lactic-acid bacteria are continuing in many countries 43-45. a long-standing 'holy grail' of research that is directed towards the elimination of the problem has been the isolation or engineering of stable bacterial mutants that are resistant to all of the phages that might attack a given culture 46, 47.

there are many occasions on which molecular biologists would like to identify, from a large collection of proteins or peptides, those few that have some special property (such as binding to a low molecular-weight ligand or to a specific site on a protein or dna molecule).
it is easy to fractionate the collection on the basis of its binding properties, but if a binding species is rare, it might be impractical to isolate enough material to determine its structure. in such cases, it is preferable to isolate not the protein or peptide, but rather the dna that encodes it and can easily be amplified. the simplest way to do this is in a system in which the fractionation process brings the dna along with the protein it encodes. surface antigens on bacterial cells can be treated in this way. a more powerful method is to display the protein or peptide on the surface of a phage. phage particles are much smaller than bacteria, so larger libraries can be screened. if the library is composed of translational fusions to a protein that is abundant on the surface of a phage particle, the sensitivity is high; under conditions of single infection, the binding properties of a phage particle are determined by the gene fusion in its dna.

phage therapy.

following their discovery, phages were regarded as possible agents to combat pathogenic bacteria (as discussed by merril et al. 38). most early attempts at phage therapy were unsuccessful or inconclusive, apparently for trivial technical reasons such as the absence of placebo controls or the rapid clearance of injected phage from the bloodstream before it had time to reach its intended target 38, 39. with the advent of antibiotics, the development and use of phage therapy almost disappeared from the western world, although it continued in the soviet union and eastern europe 39. as the incidence of antibiotic resistance continues to rise, the field has been reactivated 38. phage therapy has some inherent advantages over antibiotics, in both the specificity and the ability of the agent to propagate in the desired location (advantages that were seldom realized in earlier work with impure phage preparations) 39.
as with antibiotics, the efficiency is limited by bacterial mutation, which will doubtless be a great challenge to future development.

(fig. 1), the phage can either enter the lytic cycle, in which it reproduces and then lyses its host, or the lysogenic cycle, in which the phage dna is incorporated into the host dna and is replicated indefinitely. dashed boxes enclose operator sites that comprise a promoter-control complex. the three operator sites or1-3 of the 'λ-switch' control the promoters prm and pr. cro and ci dimers bind to the three sites with different affinities and in the opposite order to control the activation of the prm and pr promoters 29. if protein n is available, transcribing rna polymerases (rnaps) can be anti-terminated at the nutr and nutl sites; the termination sites tr1 and tl1 are inoperative for anti-terminated rnaps. the ci dimer acts as either a repressor or activator of the promoter prm, depending on its concentration. p1 and p2 are proteases that degrade the cii phage protein. ciii is a phage gene product that is also a substrate for p1. by binding to p1, ciii inhibits the degradation of cii. b | λ-decision-circuit dna organization. phage-encoded genetic elements of the decision circuit are located in a 5,000-nucleotide region of the phage dna. genes are separated into leftward and rightward transcribed strands as indicated by the arrows. rightward extensions of the anti-terminated pr transcript transcribe the o and p genes that are essential for phage genome replication, and the q gene that controls the transcription of later genes on the lytic pathway. leftward extension of the anti-terminated pl transcript transcribes the xis and int genes, which are essential for integration of the phage into, and excision out of, the host chromosome. the locations of four termination sites are indicated by tr1-2 and tl1-2. figure modified with permission from ref. 33.
direct insertion of toxin genes into phage dna, which might be transposon mediated. the latter possibility is more likely if the toxin gene is not at a terminus of the prophage, but rather in the middle, as in the λ-related coliphage 933w, which encodes a shiga-like toxin 55 . the evolution of bacterial pathogenicity can only be understood by including the role of phages and pathogenicity islands. prophages and pathogenicity islands seem to be closely related; however, assumptions about the nature of the relationship might be premature. pathogenicity islands belong to a broader category of dna known as specialization islands. specialization islands are blocks of contiguous genes that are distinguished by several criteria. first, they are dedicated to a common function, such as pathogenicity, which is not directly needed for simple survival. second, they differ from the surrounding dna in molecular statistics such as the g+c content, codon usage and neighbour relations. third, they are present in certain strains of a bacterial species and absent from others. their widespread occurrence has only come to light through whole genome sequencing, and their molecular statistics indicate that such elements have been derived in the relatively recent past from other bacterial species, to which they might be indigenous 56 . their intercalation in only certain strains of the species in which they are now observed indicates the operation of an insertion mechanism, such as phage integration or transposition; also, some islands are flanked by phage-insertion sites, which technically qualifies the whole unit, whatever its historical origin, as a defective prophage 57, 58 . this raises a host of questions for future research.
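the second criterion above, a g+c content that differs from the surrounding dna, lends itself to a simple scan. the following is a minimal sketch under stated assumptions: the window size, step and deviation threshold are arbitrary illustrative choices, the sequence is synthetic, and real island detection would also weigh codon usage and neighbour relations, which are not modelled here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(seq, window=100, step=100, threshold=0.3):
    """Return (start, gc) for windows whose G+C content deviates from the
    whole-sequence mean by more than `threshold` (illustrative default)."""
    genome_gc = gc_fraction(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - genome_gc) > threshold:
            flagged.append((start, gc))
    return flagged


if __name__ == "__main__":
    # synthetic 'chromosome': an AT-rich background with a GC-rich insert
    chromosome = "AT" * 500 + "GC" * 100 + "AT" * 500
    for start, gc in atypical_windows(chromosome):
        print(f"window at {start}: G+C = {gc:.2f}")
```

on this synthetic input only the two windows that cover the inserted block are flagged; on real genomes the threshold would need tuning, and a flagged window is a candidate for closer inspection rather than proof of foreign origin.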
one scenario (perhaps overly facile) is that the whole unit arose in the species of origin by rearrangements of host and phage dna, that it was introduced into the species in which we now find it by packaging into a phage coat, followed by injection and lysogenization, and that, in its present host, it moves from one strain to another by infection and lysogenization. there is, at present, an extreme paucity of evidence to show that the transfer among strains happens by this method rather than by genetic recombination mechanisms that affect all chromosomal genes. at the other extreme, we might imagine that pathogenicity islands, similar to other genes that are scored as alien 59, could move between species by some other mechanism (perhaps transformation or conjugation) and become inserted into the recipient chromosome by transposition or some other

pathogenicity islands that are potentially phage-related. so, a framework for incisive studies on phage-mediated diseases has developed 52, 53. it is easier to guess the future of the mechanistic studies than that of the evolutionary studies, in which many lines of inquiry converge. even from the vantage point of whole-genome sequencing, the prevalence of lysogeny might be underestimated because the identification of prophages requires homology to known phages. it has also long been known that complete prophages, which are able to generate plaque-forming particles, frequently deteriorate into incomplete (defective) prophages by changes that range from point mutations to extensive deletions and/or insertions. a particular type of defective phage, the specialized transducing phage, can result from faulty excision from the chromosome, which replaces some phage dna by host dna adjacent to the insertion site 54. this is one mechanism whereby phages might have acquired toxin genes.
another is

recent examples include fusions of domains of surface antigens of the human pathogen neisseria meningitidis onto the surface protein of phage t4, for possible use in vaccine construction, and fusions of random five amino-acid sequences to identify possible binding partners for a nuclease that cuts t4 dna for packaging 48, 49.

it was discovered early on that in some important bacterial diseases, such as diphtheria 50 and botulism 51, pathogenicity depends on the presence of certain prophages, which generally encode toxins. knowledge has accumulated gradually, but the area has recently blossomed for at least two reasons. first, advances in eukaryotic cell biology and bacterial gene regulation allow a more profound understanding of the interaction between pathogenic bacteria and the human cell. second, whole-genome sequencing of pathogenic bacteria has disclosed the prevalence not only of prophages, but also of

glossary

analogues: genes (or their products) that are not of common ancestry, but which have equivalent functions.
the protein-mediated prevention of the termination of rna synthesis.
bistable: having two steady states that are stable to small fluctuations.
coliphage: a bacteriophage that infects escherichia coli bacteria.
conjugation: the intercellular transfer of dna that is mediated by pili, which are surface appendages that are encoded by certain bacterial plasmids and transposons.
heteroduplex: a dna molecule that is formed by base pairing between strands that are derived from two dna molecules that are not identical in sequence.
a point at which the strands of two double-stranded dna molecules exchange partners, which occurs as an intermediate in genetic recombination.
homologues: genes (or their products) that are descended from a common ancestral gene.
intasome: an assemblage of integrase molecules that are bound to their dna substrate.
lambdoid: belonging to a group of phages that are related to λ.
ligand: a molecule that binds non-covalently to another type of molecule.
a bacterium that harbours phages in a latent form, from which it can be activated to produce infectious phage particles.
a group that contains all the organisms that are descended from a common evolutionary ancestor.
orthologue: the form of a gene in one species that corresponds most directly to a similar gene in another species.
prophage: the latent form of phage dna that is present in lysogenic bacteria.
a phage that is able to form lysogenic bacteria.
transformation: the uptake of exogenous dna that becomes permanently incorporated into the genome of a cell.
an artificial construct in which the coding regions of two different proteins are juxtaposed so as to generate a single chimeric protein product if translated.
transposition: the transposon-mediated movement of a segment of dna.
a bacterial enzyme that moves along dna to cleave it far from its specific site of entry.

mechanism. in this picture, such foreign dna is frequently found inside prophages or defective prophages simply because, from a bacterial perspective, prophages constitute junk dna the disruption of which does not affect bacterial survival. insertion of island dna into a pre-existing prophage might seem to add an unnecessary step to the mechanism of origin, but that is not the case. the insertion of island dna into the surrounding phage-related dna must have happened somewhere sometime, and it frequently now bears no traces of known insertion processes such as transposition. further work should illuminate two outstanding questions. first, if this and other 'alien' dna came from some distantly related donor species of bacteria, what was the donor? the bacterial genomes that have been sequenced so far provide only a handful of examples of lateral transfer in which both donor and recipient might be inferred, perhaps simply because not enough bacteria have yet been sequenced.
second, is the potential mobility of the element in its present host a significant factor in the epidemiology of the diseases? a scientific field derives its interest less from what we know than from the unanswered questions that seem approachable by the available methods. i have tried to indicate some such questions here, many of which have been incubating for years and have been explored through more lines of evidence than could be discussed. however, the time is ripe to seek definitive answers with improved methods. future research should include the

stochastic kinetics analysis of developmental pathway bifurcation in phage λ-infected escherichia coli cells
lysogenization by bacteriophage-λ. i. multiple infection and the lysogenic response
long-circulating bacteriophage as antibacterial agents
bacteriophage therapy
successful treatment of experimental e. coli infections in mice using phage is generally superior over antibiotics
bacteriophage therapy rescues mice bacteremic from a vancomycin-resistant enterococcus faecium
phage therapy revisited: the population biology of a bacterial infection and its treatment with bacteriophages and antibiotics
comparative genomics reveals close genetic relationships between phages from dairy bacteria and pathogenic streptococci: evolutionary implications for prophage-host interactions
phages of dairy bacteria
characterization of six leuconostoc fallax bacteriophages isolated from an industrial sauerkraut fermentation
bacteriophage-triggered defense systems: phage adaptation and design improvements
broad-range bacteriophage resistance in staphylococcus thermophilus by insertional mutagenesis
display of a pora peptide from neisseria meningitidis on the bacteriophage t4 capsid surface
a bipartite bacteriophage t4 soc and hoc randomized peptide display library: detection and analysis of phage t4 terminase (gp17) and late σ-factor (gp55) interaction
studies of the virulence of bacteriophage-infected strains of corynebacterium diphtheriae
conversion of toxigenicity in clostridium botulinum type c
a satellite phage-encoded antirepressor induces repressor aggregation and cholera toxin gene transfer
bacteriophage control of bacterial virulence
transduction and segregation in escherichia coli k-12
sequence of shiga toxin 2 phage 933w from escherichia coli o157:h7: shiga toxin as a phage late-gene product
detecting alien genes in bacterial genomes
pathogenicity islands of virulent bacteria: structure, function and impact on bacterial evolution
comparative dna analysis across diverse genomes
an investigation on the nature of ultramicroscopic viruses
the growth of bacteriophage
bacteriophages: evolution of the majority
evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage
the plasmid prophage n15: a linear dna with covalently closed ends
ancient phylogenetic relationships
nucleotide sequence of coliphage hk620 and the evolution of lambdoid phages
evolution of viral structure
evidence of natural recombination within the s1 gene of infectious bronchitis virus
fundamental virology
viroplankton: viruses in aquatic ecosystems
viruses and nutrient cycles in the sea
dynamics and distribution of cyanophages and their effect on marine synechococcus spp
the significance of viruses to mortality in aquatic microbial communities
viruses in marine planktonic systems
no syringes please, ejection of dna from the virion is enzyme driven
rna polymerase-dependent mechanism for the stepwise t7 phage transport from the virion into e. coli
rate of translocation of bacteriophage t7 dna across the membranes of escherichia coli
translocation and specific cleavage of bacteriophage t7 dna in vivo by ecoki
structure of the bacteriophage φ29 dna packaging motor
the bacteriophage φ29 portal motor can package dna against a large internal force
swapping dna strands and sensing homology without branch migration in λ site-specific recombination
site-specific recombination intermediates trapped with suicide substrates
architectural flexibility in λ site-specific recombination: three alternate conformations channel the attl site into three distinct pathways
analysis of higher order intermediates and synapsis in the bent-l pathway of bacteriophage λ site-specific recombination
site-specific photo-cross-linking between λ-integrase and its dna recombination target
arm-site binding for λ-integrase: solution structure and functional characterization of its amino-terminal domain
phage and higher organisms
four dimers of λ-repressor bound to two suitably spaced pairs of λ-operators form octamers and dna loops over long distances
octamerization of λ-ci repressor is needed for effective repression of p(rm) and switching from lysogeny
the primary self-assembly reaction of bacteriophage ci repressor dimers is to octamer
control of λ-repressor synthesis

the author thanks d. bamford, b. weisberg and three anonymous reviewers for helpful suggestions.

key: cord-309556-xv3413k1 authors: chow, ryan d.; chen, sidi title: the aging transcriptome and cellular landscape of the human lung in relation to sars-cov-2 date: 2020-04-15 journal: biorxiv doi: 10.1101/2020.04.07.030684 sha: doc_id: 309556 cord_uid: xv3413k1

since the emergence of sars-cov-2 in december 2019, coronavirus disease-2019 (covid-19) has rapidly spread across the globe. epidemiologic studies have demonstrated that age is one of the strongest risk factors influencing the morbidity and mortality of covid-19.
here, we interrogate the transcriptional features and cellular landscapes of the aging human lung through integrative analysis of bulk and single-cell transcriptomics. by intersecting these age-associated changes with experimental data on host interactions between sars-cov-2 or its relative sars-cov, we identify several age-associated factors that may contribute to the heightened severity of covid-19 in older populations. we observed that age-associated gene expression and cell populations are significantly linked to the heightened severity of covid-19 in older populations. the aging lung is characterized by increased vascular smooth muscle contraction, reduced mitochondrial activity, and decreased lipid metabolism. lung epithelial cells, macrophages, and th1 cells decrease in abundance with age, whereas fibroblasts, pericytes and cd4+ tcm cells increase in abundance with age. several age-associated genes have functional effects on sars-cov replication, and directly interact with the sars-cov-2 proteome. interestingly, age-associated genes are heavily enriched among those induced or suppressed by sars-cov-2 infection. these analyses illuminate potential avenues for further studies on the relationship between the aging lung and covid-19 pathogenesis, which may inform strategies to more effectively treat this disease. these data indicate that differential expression of sars-cov-2 host entry factors alone is unlikely to explain the relationship between age and severity of covid-19 illness. to discern the host cell types involved in covid-19 entry, we turned to a single cell rna-seq (scrna-seq) dataset of 57,020 human lung cells from the tissue stability cell atlas 19 . in agreement with prior reports, analysis of the single cell lung transcriptomes revealed that alveolar type 2 (at2) cells were comparatively enriched in ace2 and tmprss2-expressing cells 20, 21 ( supplementary figure 3a-b) . 
however, ace2-expressing cells represented only 1.69% of all at2 cells, while 47.52% of at2 cells expressed tmprss2. alveolar type 1 (at1) cells also showed detectable expression of ace2 and tmprss2, but at lower frequencies (0.39% and 26.70%). ctsl expression could be broadly detected in many different cell types including at2 cells, but its expression was particularly pronounced in macrophages (supplementary figure 3c).

since the expression of host entry factors ace2, tmprss2 and ctsl did not increase with age, we next sought to identify all age-associated genes expressed in the human lung (methods). using a likelihood-ratio test 22, we pinpointed the genes for which age significantly impacts their expression. with a stringent cutoff of adjusted p < 0.0001, we identified two clusters of genes in which their expression progressively changes with age (figure 1d). cluster 1 is composed of 643 genes that increase in expression with age, while cluster 2 contains 642 genes that decrease in expression with age. gene ontology and pathway analysis of cluster 1 genes (increasing with age) revealed significant enrichment for cell adhesion, vascular smooth muscle contraction, oxytocin signaling, and platelet activation, in addition to several other pathways (figure 1e). these findings are consistent with known physiologic changes of aging, including decreased pulmonary compliance 23, and heightened risk for thrombotic diseases 24. of note, deregulation of the renin-angiotensin system has been implicated in the pathogenesis of acute lung injury induced by sars

china and italy have found that patients with hypertension were more likely to develop ards 17, require icu admission 31, and die from the disease 32, though we note that correlative epidemiologic studies do not necessarily demonstrate causality.
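the idea behind the likelihood-ratio test used above to pinpoint age-associated genes can be sketched for a single gene. the example is a deliberate simplification and not the method cited in the text: it compares a gaussian model with an age term against one without, whereas count-based rna-seq tools fit different models, and it omits the multiple-testing adjustment behind the 'adjusted p' cutoff. the ages and expression values are invented for illustration, and the chi-square approximation is rough for so few samples.

```python
import math


def lrt_age_effect(ages, expression):
    """Likelihood-ratio test of 'expression ~ age' against 'expression ~ 1'.

    Assumes Gaussian noise, so the statistic is n * ln(RSS_null / RSS_full),
    compared with a chi-square distribution with one degree of freedom.
    Returns (slope, statistic, p_value).
    """
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, expression))
    rss_null = sum((y - mean_y) ** 2 for y in expression)   # intercept-only model
    slope = sxy / sxx
    rss_full = rss_null - slope * sxy                       # model with an age term
    statistic = n * math.log(rss_null / rss_full)
    # survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return slope, statistic, p_value


if __name__ == "__main__":
    ages = [20, 30, 40, 50, 60, 70]                 # hypothetical donor ages
    rising = [1.0, 1.4, 2.1, 2.4, 3.1, 3.4]         # hypothetical log-expression
    flat = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9]
    for name, values in (("rising", rising), ("flat", flat)):
        slope, stat, p = lrt_age_effect(ages, values)
        print(f"{name}: slope={slope:.4f} statistic={stat:.2f} p={p:.3g}")
```

a gene would be called age-associated when its p-value, after adjustment across all genes tested, falls below the chosen cutoff; the sign of the slope then assigns it to the increasing or the decreasing cluster.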
cluster 2 genes (decreasing with age) were significantly enriched for mitochondrion, mitochondrial translation, metabolic pathways, and mitosis, among other pathways (figure 1f) , which is consistent with prior observations of progressive mitochondrial dysfunction with aging [33] [34] [35] . of note, cluster 2 was also enriched for genes involved in lipid metabolism, fatty acid metabolism, peroxisome, and lysosomal membranes. age-associated alterations in lipid metabolism could impact sars-cov-2 infection, as sars-cov can enter cells through cholesterol-rich lipid rafts [36] [37] [38] [39] . similarly, age-associated alterations in lysosomes could influence late endocytic viral entry, as the protease cathepsin l cleaves sars-cov spike proteins from within lysosomes 40, 41 . having compiled a high-confidence set of age-associated genes, we sought to identify the lung cell types that normally express these genes, using the human lung single cell transcriptomics dataset from the tissue stability cell atlas 19 . by examining the scaled percentage of expressing cells within each cell subset, we identified age-associated genes predominantly enriched in different cell types. cell types with highly enriched expression for certain cluster 1 genes (increasing with age) included fibroblasts, muscle cells, and lymph vessels (figure 2a) . in contrast, cell types with highly enriched expression for certain cluster 2 genes (decreasing with age) included macrophages, dividing dendritic cells (dcs)/monocytes, and at2 cells (figure 2b) . similar results were found using an independent human lung scrna-seq dataset from the human lung cell atlas (supplementary figure 4a-b) 42 . examining the muscle-enriched genes that increased in expression with age, gene ontology analysis revealed enrichment for vascular smooth muscle contraction, cgmp-pkg signaling, z-disc, and actin cytoskeleton, among other pathways ( figure 2c ). 
as for the at2-enriched genes that decreased in expression with age, gene ontology analysis revealed enrichment for metabolic pathways, biosynthesis of antibiotics, lipid metabolism, extracellular exosome, and mitochondrial matrix (figure 2d). a subset of these enriched gene ontologies had also been identified by the bulk rna-seq analysis (figure 1e-f). thus, integrative analysis of bulk and single-cell transcriptomes revealed that many of the age-associated transcriptional changes in human lung can be mapped to specific cell subpopulations, suggesting that the overall abundance of these cell types, their transcriptional status, or both, may be altered with aging. as the pathophysiology of viral-induced ards involves an intricate interplay of diverse cell types, most notably the immune system 43, 44, aging-associated shifts in the lung cellular milieu 23 could contribute an important dimension to the relationship between age and risk of ards in patients with covid-19 31. to investigate the cellular landscape of the aging lung, we applied a gene signature-based approach 45 to infer the enrichment of different cell types from the bulk rna-seq profiles. since bulk rna-seq measures the average expression of genes within a cell population, such datasets will reflect the relative proportions of the cell types that comprised the input population, though with the caveat that cell types can have overlapping expression profiles and such profiles may be altered in response to stimuli. using this approach, we identified age-associated alterations in the enrichment scores of several cell types (figure 3a). whereas epithelial cells decreased with age, fibroblasts increased with age (figure 3b). this finding is consistent with the progressive loss of lung parenchyma due to reduced regenerative capacity of the aging lung 46, as well as the increased risk for diseases such as chronic obstructive pulmonary disease and pulmonary fibrosis 47.
in addition, these results are concordant with the findings from analysis of human lung single-cell transcriptomes (figure 2a-b). among the innate immune cell populations, the enrichment scores of total macrophages were inversely associated with age (figure 3c). macrophages are major drivers of innate immune responses in the lung, acting as first-responders against diverse respiratory infections 48. thus, the age-associated decrease in macrophage abundance may be a factor related to the greater severity of lung pathology in patients with covid-19. although macrophage accumulation is often associated with the pathologic inflammation of viral ards 49, 50, pulmonary macrophages can act to limit the duration and severity of infection by efficiently phagocytosing dead infected cells and released virions [51] [52] [53]. notably, macrophages infected with sars-cov have been found to abort the replication cycle of the virus 54, 55, further supporting their role in antiviral responses. however, macrophages may suppress antiviral adaptive immune responses 48, 56, inhibiting viral clearance in mouse models of sars-cov infection 57. in aggregate, these prior reports suggest that the precise role of lung macrophages in sars-cov-2 pathophysiology is likely context-dependent. it is also plausible that the primary distinction between young and old patients is not the number of macrophages, but rather their functional status. in line with this, we observed that the age-associated changes in macrophages were specifically attributed to the pro-inflammatory m1 macrophage subset but not the immunoregulatory m2 subset 58 (figure 3c), though this binary classification scheme represents an oversimplification of macrophage function. nevertheless, elucidating the consequences of age-associated changes in lung macrophages may reveal insights into the differential outcomes of older patients with covid-19.
further studies are needed to investigate whether macrophages or other innate immune cells respond to sars-cov-2 infection, and how their numbers or function may change with aging. among the adaptive immune cell populations, we observed that th1 cells and cd4+ tcm cells trended in opposite directions with aging (figure 3d). while the lungs of younger donors were enriched for th1 cells, they were comparatively depleted for cd4+ tcm cells; the inverse was true in the lungs of older donors. of note, mouse models of sars-cov infection have indicated important roles for cd4+ t cell responses in viral clearance 59, 60. additionally, th1 cells are responsive to sars-cov vaccines 61 and promote macrophage activation against viruses 62. it is therefore possible that age-associated shifts in cd4+ t cell subtypes within the lung may influence the subsequent host immune response to coronavirus infection. however, future studies will be needed to determine the role of th1 cells and other adaptive immune cells in the response to sars-cov-2, and how these dynamics may change with aging. we next explored the roles of lung age-associated genes in host responses to viral infection. since functional screening data with sars-cov-2 has not yet been described (as of march 30, 2020), we instead searched for data on sars-cov. while these two viruses belong to the same genus (betacoronavirus) and are conserved to some extent 8, they are nevertheless two distinct viruses with different epidemiological features, indicating unique virology and host biology. therefore, data from experiments performed with sars-cov must be interpreted with caution. we reassessed the results from a prior in vitro sirna screen of host factors involved in sars-cov infection 63. in this kinase-focused screen, 130 factors were determined to have a significant effect on sars-cov replication.
notably, 11 of the 130 factors exhibited age-associated gene expression patterns (figure 4a), with 4 genes in cluster 1 (increasing with age) and 7 genes in cluster 2 (decreasing with age). the 4 genes in cluster 1 were all associated with increased sars infectivity upon sirna knockdown; these genes were clk1, akap6, alpk2, and itk. paradoxically, while knockdown of clk1 was associated with increased sars infectivity, cell viability was also found to be increased (figure 4b). of the 7 genes in cluster 2, 6 were associated with increased sars infectivity and reduced cell viability upon sirna knockdown (aurkb, cdkl2, pdik1l, cdkn3, mst1r, and adk). age-related downregulation of these 6 factors could be related to the increased severity of illness in older patients. however, we emphasize that until rigorous follow-up experiments are performed with sars-cov-2, the therapeutic potential of targeting these factors in patients with covid-19 is unknown. using the human lung scrna-seq data, we then determined which cell types predominantly express these host factors. of the 4 genes in cluster 1 that had a significant impact on sars-cov replication, clk1 was universally expressed, while alpk2 expression was rarely detected (figure 4c, supplementary figure 5a). itk was preferentially expressed in lymphocytic populations, and akap6 was most frequently expressed in ciliated cells and muscle cells. of the 7 overlapping genes in cluster 2, aurkb and cdkn3 were predominantly expressed in proliferating immune cell populations, such as macrophages, dcs/monocytes, t cells, and nk cells (figure 4d, supplementary figure 5b). mst1r, pdik1l, and pskh1 were infrequently expressed, though their expression was detected in a portion of at2 cells (5.54%, 5.30%, and 5.47%, respectively). finally, adk and cdkl2 exhibited preferential enrichment in at2 cells (51.40% and 34.43%).
in aggregate, these analyses showed that the age-associated genes with functional roles in sars-cov infection are expressed in specific cell types of the human lung. we then investigated whether age-associated genes in the human lung interact with proteins encoded by sars-cov-2. a recent study interrogated the human host factors that interact with 27 different sars-cov-2 proteins 64, revealing the sars-cov-2 : human protein interactome in cell lines expressing recombinant sars-cov-2 proteins. by cross-referencing the interacting host factors with the set of age-associated genes, we identified 20 factors at the intersection (figure 5a). four of these genes showed an increase in expression with age (i.e. cluster 1 genes), while 16 decreased in expression with age (cluster 2 genes). mapping these factors to their interacting sars-cov-2 proteins, we noted that the age-associated host factors which interact with m, nsp13, nsp1, nsp7, nsp8, orf3a, orf8, orf9c, and orf10 proteins generally decrease in expression with aging (figure 5b). however, a notable exception was nsp12, as the age-associated host factors that interact with nsp12 both showed increased expression with aging (crtc3 and mycbp2) (figure 5c). nsp12 encodes the primary rna-dependent rna polymerase (rdrp) of sars-cov-2, and is a prime target for developing therapies against covid-19. the observation that crtc3 and mycbp2 increase in expression with aging is intriguing, as these genes may be related to the activity of nsp12/rdrp in host cells. of note, mycbp2 is a known repressor of camp signaling 65, 66, and camp signaling potently inhibits contraction of airway smooth muscle cells 67. thus, age-associated increases in mycbp2 could promote smooth muscle contraction, which is concordant with our analyses on age-associated gene signatures in the lung (figure 1e).
mycbp2 might contribute to covid-19 pathology not only by interacting with sars-cov-2 rdrp, but also through its normal physiologic role in promoting smooth muscle contraction. to assess the cell type-specific expression patterns of these various factors, we further analyzed the lung scrna-seq data. of the sars-cov-2 interacting genes that increase in expression with age, mycbp2 was frequently expressed across several populations, particularly proliferating immune populations (dc/monocyte, t cells, macrophages), muscle cells, fibroblasts, and lymph vessels (figure 5d). mycbp2 was also expressed in 21.86% of at2 cells. cep68 was preferentially expressed in lymph vessels, while akap8l and crtc3 showed relatively uniform expression frequencies across cell types, including a fraction of at2 cells (8.53% and 9.07% expressing cells, respectively). of the sars-cov-2 interacting genes that decrease in expression with age, npc2 and ndufb9 were broadly expressed in many cell types, including at2 cells (99.96% together, these analyses highlight specific age-associated factors that interact with the sars-cov-2 proteome, in the context of the lung cell types in which these factors are normally expressed. finally, we assessed whether sars-cov-2 infection directly alters the expression of lung age-associated genes. a recent study profiled the in vitro transcriptional changes associated with sars-cov-2 infection in different human lung cell lines 68. we specifically focused on the data from a549 lung cancer cells, a549 cells transduced with an ace2 expression vector (a549-ace2), and calu-3 lung cancer cells. several age-associated genes were found to be differentially expressed upon sars-cov-2 infection (figure 6a-c).
of note, the overlap between lung age-associated genes and sars-cov-2 regulated genes was statistically significant across all 3 cell lines (figure 6d-f), suggesting a degree of similarity between the transcriptional changes associated with aging and with sars-cov-2 infection. among the age-associated genes that were induced by sars-cov-2 infection, the majority increase in expression with age (cluster 1) (figure 6g-i). conversely, among the age-associated genes that were repressed by sars-cov-2 infection, most decrease in expression with age (cluster 2). of note, the directionality of sars-cov-2 regulation (induced or repressed) and the directionality of age-association (increase or decrease with age) were significantly associated across all 3 cell lines (figure 6g-i). to identify a consensus set of age-associated genes that are regulated by sars-cov-2 infection, we integrated the analyses from all 3 cell lines. a total of 603 genes were consistently induced by sars-cov-2 infection (figure 7a). of these, 20 genes are in cluster 1 (increase with age) and 2 genes are in cluster 2 (decrease with age). the 20 induced genes in cluster 1 include several factors involved in ras signaling (rab8b, rasa2, and rasgrp1) as well as clk1, which was shown to be involved in host responses to sars-cov infection (figure 4a-b). on the other hand, 641 genes were concordantly repressed by sars-cov-2 infection (figure 7b here we systematically analyzed the transcriptome of the aging human lung and its relationship to sars-cov-2. we found that the aging lung is characterized by a wide array of changes that could contribute to the worse outcomes of older patients with covid-19. on the transcriptional level, we first identified 1,285 genes that exhibit age-associated expression patterns.
we subsequently demonstrated that the aging lung is characterized by several gene signatures, including increased vascular smooth muscle contraction, reduced mitochondrial activity, and decreased lipid metabolism. by integrating these data with single cell transcriptomes of human lung tissue, we further pinpointed the specific cell types that normally express the age-associated genes. we showed that lung epithelial cells, macrophages, and th1 cells decrease in abundance with age, whereas fibroblasts and pericytes increase in abundance with age. these systematic changes in tissue composition and cell interactions can potentially propagate positive feedback loops that predispose the airways to pathological contraction 69 . we find that some of the age-associated genes have been previously identified as host factors with a functional role in sars-cov replication 63 , and a fraction of the age-associated factors have been shown to directly interact with the sars-cov-2 proteome 64 . furthermore, age-associated genes are significantly enriched among genes directly regulated by sars-cov-2 infection 68 , suggesting transcriptional parallels between the aging lung and sars-cov-2 infection. moreover, it is intriguing that the genes induced by sars-cov-2 infection tend to increase in expression with aging, and vice versa. whether any of these age-associated changes causally contribute to the heightened susceptibility of covid-19 in older populations remains to be experimentally tested. it is also important to note that the datasets analyzed here were not from patients with covid-19. given the limited data that is currently publicly available, we emphasize that the analyses presented here at this stage should not be used to guide clinical practice. 
these analyses resulted in a number of previously unnoted observations and phenomena that illuminate new directions for subsequent research efforts on sars-cov-2, generating genetically-tractable hypotheses for why advanced age is one of the strongest risk factors for covid-19 morbidity and mortality. ultimately, we hope such knowledge can help the field to sooner develop rational therapies for covid-19 that are rooted in concrete biological mechanisms. we thank akiko iwasaki, craig wilen, hongyu zhao, wei liu, wenxuan deng, andre levchenko, katie zhu, ruth montgomery, bram gerriten, steven kleinstein and a number of other colleagues for their critical comments and suggestions, which were incorporated into the analyses and manuscript. we thank antonio giraldez, andre levchenko, chris incarvito, mike crair, and scott strobel for their support on covid-19 research. we thank our colleagues in the chen lab, the genetics department, the systems biology institute and various yale entities. we also want to thank all of the healthcare workers who are risking their health on the frontlines to treat patients with this disease. rc and sc conceived and designed the study. rc developed the analysis approach, performed all data analyses, and created the figures. rc and sc prepared the manuscript. sc supervised the work. no competing interests related to this study. the authors have no competing interests as defined by nature research, or other interests that might be perceived to influence the interpretation of the article. the authors are committed to freely share all covid-19 related data, knowledge and resources to the community to facilitate the development of new treatment or prevention approaches against sars-cov-2 / covid-19 as soon as possible. as a note for full disclosure, sc is a co-founder, funding recipient and scientific advisor of evolveimmune therapeutics, which is not related to this study. c. 
david gene ontology and pathway analysis of cluster 1 age-associated genes that exhibit enriched expression in muscle cells. d. david gene ontology and pathway analysis of cluster 2 age-associated genes that exhibit enriched expression in alveolar type 2 (at2) cells. a. venn diagram of the intersection between age-associated genes in human lung and the sars-cov-2 : human protein interactome (gordon et al., 2020). of the 20 age-associated genes that were found to also interact with sars-cov-2, 4 of them increased in expression with age, while 16 decreased with age. b. age-associated genes in human lung and their interaction with sars-cov-2 proteins, where each block contains a sars-cov-2 protein (underlined) and its interacting age-associated factors. blocks are colored by the dominant directionality of the age association (orange, decreasing with age; blue, increasing with age). gene targets with already approved drugs, investigational new drugs, or preclinical molecules are additionally denoted with an asterisk. the genotype-tissue expression (gtex) project was supported by the common fund of the office of the director of the national institutes of health, and by nci, nhgri, nhlbi, nida, nimh, and ninds 10, 11. rna-seq raw counts and normalized tpm matrices were downloaded from the gtex portal (https://gtexportal.org/home/index.html) on march 18, 2020, release v8. all accessed data used in this study are publicly available on the web portal and have been de-identified, except for patient age range and gender. case-fatality rates in china and italy were from the chinese cdc and italian iss. for visualization of rna-seq expression data, the tpm values were log2 transformed and plotted in r (v3.6.1). all boxplots are tukey boxplots, with interquartile range (iqr) boxes and 1.5 × iqr whiskers.
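as a minimal sketch of the plotting convention described above (not the authors' r code), the python snippet below log2-transforms tpm values and computes the statistics that a tukey boxplot draws. the pseudocount of 1, the `inclusive` quartile method, and the function name `tukey_box_stats` are assumptions made for illustration; the text states only that tpm values were log2 transformed and that whiskers span 1.5 × iqr.

```python
from math import log2
from statistics import quantiles


def tukey_box_stats(tpm_values, pseudocount=1.0):
    """Return Tukey boxplot statistics for log2(TPM + pseudocount).

    The pseudocount is an assumption: it keeps log2 defined when TPM is 0.
    """
    logged = sorted(log2(v + pseudocount) for v in tpm_values)

    # Quartiles by linear interpolation between order statistics (assumed method).
    q1, median, q3 = quantiles(logged, n=4, method="inclusive")
    iqr = q3 - q1

    # Tukey fences sit 1.5 x IQR beyond the box edges.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in logged if lower_fence <= v <= upper_fence]
    outliers = [v for v in logged if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical TPM values for one gene in one age group.
example_stats = tukey_box_stats([0, 1, 3, 7, 15, 1023])
print(example_stats)
```

plotting libraries differ in how they compute the box edges (hinges versus interpolated quartiles), so the values can differ slightly for small samples.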
pairwise statistical comparisons in the plots were assessed by two-tailed mann-whitney test, while statistical comparisons across all age groups were performed by kruskal-wallis test. to identify age-associated genes, the raw count values were analyzed by deseq2 (v1.24.0) 22, using the likelihood ratio test (lrt). age-associated genes were determined at a significance threshold of adjusted p < 0.0001. genes passing the significance threshold were then scaled to z-scores and clustered using the degpatterns function from the r package degreport (v1.20.0). gene clusters with progressive and consistent trends with age were retained for downstream analysis. gene ontology and pathway enrichment analysis was performed using david (v6.8) 70 (https://david.ncifcrf.gov/), separating the age-associated genes into the two clusters (increasing or decreasing with age), as described above. scrna-seq data were analyzed in r (v3.6.1) using seurat 71, 72 and custom scripts. of the 1285 age-associated genes identified from gtex bulk transcriptomes, 1049 genes were matched in the tissue stability cell atlas dataset and 1021 genes were matched in the human lung cell atlas dataset. to determine the percentage of cells expressing a given gene, the expression matrices were converted to binary matrices by setting a threshold of expression > 0. cell type-specific expression frequencies for each gene were then calculated using the provided cell type annotations. to identify genes preferentially expressed in a specific cell type, we further scaled the expression frequencies in r to obtain z-scores. data were visualized in r using the nmf package 73. where applicable, gene ontology analysis was performed with david (v6.8) 70, using genes with z-score > 2 in the cell type of interest for analysis. to infer the cellular composition of each lung sample, we analyzed the tpm expression matrices using the xcell algorithm 45. the resultant cell type enrichment tables were analyzed in r.
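the expression-frequency step described above (binarize at expression > 0, compute the percentage of expressing cells within each annotated cell type, then scale to z-scores) can be sketched as follows. this is an illustrative python sketch rather than the authors' r/seurat code; the function names, the toy input layout, and the choice to scale each gene across cell types with the sample standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def expression_frequencies(counts, cell_types):
    """Percent of cells with expression > 0, per gene and per cell type.

    counts: dict mapping gene -> list of per-cell expression values.
    cell_types: list of cell type labels, one per cell, in the same order.
    """
    cells_by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        cells_by_type[label].append(idx)

    freqs = {}
    for gene, values in counts.items():
        freqs[gene] = {
            # Binarize at expression > 0, then average within the cell type.
            label: 100.0 * sum(values[i] > 0 for i in idxs) / len(idxs)
            for label, idxs in cells_by_type.items()
        }
    return freqs


def zscore_across_cell_types(freq_row):
    """Scale one gene's frequencies across cell types to z-scores."""
    values = list(freq_row.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1), an assumption
    if sd == 0:
        return {label: 0.0 for label in freq_row}
    return {label: (value - mu) / sd for label, value in freq_row.items()}


# Hypothetical toy data: two genes measured in six cells.
toy_counts = {"GENE_A": [0, 3, 1, 0, 0, 0], "GENE_B": [2, 0, 0, 0, 5, 1]}
toy_labels = ["AT2", "AT2", "AT2", "macrophage", "macrophage", "fibroblast"]

toy_freqs = expression_frequencies(toy_counts, toy_labels)
for gene, row in toy_freqs.items():
    print(gene, row, zscore_across_cell_types(row))
```

under this reading, a gene would count as preferentially expressed in a cell type when its z-score there exceeds 2, matching the threshold used for the gene ontology analysis.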
for data visualization, cell type enrichment scores were scaled to z-scores, and the median z-score for each age group was expressed as a heatmap, using the superheat package 74. age-association was assessed across all age groups by kruskal-wallis test. to assess whether any age-associated genes affect host responses to sars-cov (a coronavirus related to sars-cov-2), we analyzed the data from a published sirna screen of host factors influencing sars-cov 63 (data set s1 in the publication; accessed on march 20, 2020). for data visualization, each point corresponding to a target gene was size-scaled and color-coded according to the age-association statistical analyses described above. to assess whether any lung age-associated genes encode proteins that interact with the sars-cov-2 proteome, we compiled the data from a preprint manuscript detailing the human host factors that interact with 27 different proteins in the sars-cov-2 proteome 64 (accessed on march 23, 2020). to assess whether the expression of lung age-associated genes is influenced by sars-cov-2 infection, we utilized the data from a preprint manuscript detailing the transcriptional response to sars-cov-2 infection 68, from the gene expression omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse147507) (accessed on april 13, 2020). differentially expressed genes were determined using the wald test in deseq2 (v1.24.0) 22, comparing sars-cov-2 infected cells to batch-matched mock controls, with a significance threshold of adjusted p < 0.05. of the 1285 age-associated genes, 988 genes were matched to the rna-seq dataset. statistical significance of overlaps between the gene sets was assessed by hypergeometric test, assuming 21,797 total genes as annotated in the rna-seq dataset and 988 age-associated genes. statistical significance of the association between the directionality of sars-cov-2 regulation and the directionality of age-association was assessed by two-tailed fisher's exact test.
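a minimal sketch of the overlap test described above, assuming a one-sided (upper-tail) hypergeometric test computed with exact integer arithmetic. the background of 21,797 genes and the 988 matched age-associated genes come from the text; the function name `hypergeom_upper_tail` and the example counts of differentially expressed and overlapping genes are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, successes, draws):
    """P(X >= overlap) for X ~ Hypergeometric(population, successes, draws).

    population: total number of annotated genes (background).
    successes:  number of age-associated genes in the background.
    draws:      number of differentially expressed genes.
    overlap:    number of genes in both sets.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")

    # Smallest and largest overlap sizes that are possible at all.
    lowest = max(overlap, 0, draws - (population - successes))
    highest = min(successes, draws)

    # Count the favourable subsets exactly, then divide once at the end.
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lowest, highest + 1)
    )
    return favourable / comb(population, draws)


# Background sizes from the text; 600 regulated genes and an overlap of 60
# are hypothetical numbers used only to show the call.
p_value = hypergeom_upper_tail(overlap=60, population=21797, successes=988, draws=600)
print(f"overlap p-value: {p_value:.3g}")
```

exact integer arithmetic avoids the rounding problems of multiplying many small probabilities, at the cost of working with very large integers.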
gene ontology and pathway enrichment analysis was performed using david (v6.8) 70 (https://david.ncifcrf.gov/). comprehensive information on the statistical analyses used is included in various places, including the figures, figure legends and results, where the methods, significance, p-values and/or tails are described. all error bars have been defined in the figure legends or methods. code used for data analysis or generation of the figures related to this study is available upon request to the corresponding author and will be deposited to github upon publication for free public access. all relevant processed data generated during this study are included in this article and its supplementary information files. raw data are from various sources as described above. all data and resources related to this study are freely available upon request to the corresponding author. tables. table s1: demographics of donors for gtex lung samples. (gordon et al., 2020) with age-association statistics from this study. table s22: differential expression analysis in a549 cells, infected with sars-cov-2 vs mock control, with age-association annotations. table s23: differential expression analysis in a549-ace2 cells, infected with sars-cov-2 vs mock control, with age-association annotations. table s24: differential expression analysis in calu-3 cells, infected with sars-cov-2 vs mock control, with age-association annotations.
[figure panels: heatmap gene labels; color scale, -log10 (adj. p-value)]
estimating clinical severity of covid-19 from the transmission dynamics in wuhan, china adjusted age-specific case fatality ratio during the covid-19 epidemic in hubei, china case-fatality rate and characteristics of patients dying in relation to covid-19 in italy coronavirus disease 2019 (covid-19) in italy severe outcomes among patients with coronavirus disease 2019 (covid-19) -united states a novel coronavirus from patients with pneumonia in china genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding a pneumonia outbreak associated with a new coronavirus of probable bat origin epidemiological characteristics of 2143 pediatric patients with coronavirus disease in china the genotype-tissue expression (gtex) project a novel approach to high-quality postmortem tissue procurement: the gtex project cryo-em structure of the 2019-ncov
spike in the prefusion conformation receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov-2 spike glycoprotein covid-19 and italy: what next? the lancet 0 risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease the epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (covid-19) -china scrna-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation single-cell rna expression profiling of ace2, the putative receptor of wuhan single cell rna sequencing of 13 human tissues identify cell types and receptors of human coronaviruses moderated estimation of fold change and dispersion for rna-seq data with deseq2 effect of aging on respiratory system physiology and immunology alterations in platelet functions during aging: clinical correlations with thrombo-inflammatory disease in older adults a crucial role of angiotensin converting enzyme 2 (ace2) in sars coronavirus-induced lung injury angiotensin-converting enzyme 2 protects from severe acute lung failure identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel angiotensin-converting enzyme-related carboxypeptidase (ace2) converts angiotensin i to angiotensin 1-9 a human homolog of angiotensin-converting enzyme cloning and functional expression as a captopril-insensitive angiotensin-converting enzyme 2 is an essential regulator of heart function clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in wuhan, china baseline characteristics and outcomes of 1591 patients infected with sars-cov-2 admitted to icus of the lombardy region the mitochondrial basis of aging and age-related 
disorders the mitochondrial basis of aging mitochondrial dysfunction in the elderly: possible role in insulin resistance lipid rafts are involved in sars-cov entry into vero e6 cells sars coronavirus entry into host cells through a novel clathrin-and caveolaeindependent endocytic pathway lipid rafts play an important role in the early stage of severe acute respiratory syndrome-coronavirus life cycle lipid rafts: heterogeneity on the high seas sars coronavirus, but not human coronavirus nl63, utilizes cathepsin l to infect ace2-expressing cells inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry a molecular cell atlas of the human lung from single cell rna sequencing molecular pathology of emerging coronavirus infections clinical progression and viral load in a community outbreak of coronavirus-associated sars pneumonia: a prospective study xcell: digitally portraying the tissue cellular heterogeneity landscape regeneration of the aging lung: a mini-review the aging lung pulmonary macrophages: key players in the innate defence of the airways the clinical pathology of severe acute respiratory syndrome (sars): a report from evolution of pulmonary pathology in severe acute respiratory syndrome regulating the adaptive immune response to respiratory virus infection regulation of immunological homeostasis in the respiratory tract alveolar macrophages in the resolution of inflammation, tissue repair, and tolerance to infection antibody-dependent infection of human macrophages by severe acute respiratory syndrome coronavirus cytokine responses in severe acute respiratory syndrome coronavirus-infected macrophages in vitro: possible relevance to pathogenesis alveolar macrophage elimination in vivo is associated with an increase in pulmonary immune response in mice evasion by stealth: inefficient immune activation underlies poor t cell response and severe disease in sars-cov macrophage plasticity, polarization, and function in health and 
disease t cell responses are required for protection from clinical disease and for virus clearance in severe acute respiratory syndrome coronavirus-infected mice cellular immune responses to severe acute respiratory syndrome coronavirus (sars-cov) infection in senescent balb/c mice: cd4+ t cells are important in control of sars-cov infection induction of th1 type response by dna vaccinations with n, m, and e genes against sars-cov in mice expanding roles for cd4+ t cells in immunity to viruses a kinome-wide small interfering rna screen identifies proviral and antiviral host factors in severe acute respiratory syndrome coronavirus replication, including double-stranded rna-activated protein kinase and early secretory pathway proteins a sars-cov-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing protein associated with myc (pam) is a potent inhibitor of adenylyl cyclases pam mediates sustained inhibition of camp signaling by sphingosine-1-phosphate camp regulation of airway smooth muscle function sars-cov-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems a microphysiological model of the bronchial airways reveals the interplay of mechanical and biochemical signals in bronchospasm systematic and integrative analysis of large gene lists using david bioinformatics resources integrating single-cell transcriptomic data across different conditions, technologies, and species comprehensive integration of single-cell data a flexible r package for nonnegative matrix factorization superheat: an r package for creating beautiful and extendable heatmaps for visualizing complex data key: cord-304913-qb9zeazk authors: thibivilliers, sandra; joshi, trupti; campbell, kimberly b; scheffler, brian; xu, dong; cooper, bret; nguyen, henry t; stacey, gary title: generation of phaseolus vulgaris ests and investigation of their regulation upon uromyces appendiculatus infection date: 2009-04-27 journal: bmc plant 
biol doi: 10.1186/1471-2229-9-46 sha: doc_id: 304913 cord_uid: qb9zeazk background: phaseolus vulgaris (common bean) is the second most important legume crop in the world after soybean. consequently, yield losses due to fungal infections, such as bean rust (uromyces appendiculatus), have strong consequences. several resistance genes were identified that confer resistance to bean rust infection. however, the downstream genes and mechanisms involved in bean resistance to infection are poorly characterized. results: a subtractive bean cdna library composed of 10,581 unisequences was constructed and enriched in sequences regulated by either bean rust race 41, a virulent strain, or race 49, an avirulent strain on cultivar early gallatin carrying the resistance gene ur-4. the construction of this library allowed the identification of 6,202 new bean ests, significantly adding to the available sequences for this plant. regulation of selected bean genes in response to bean rust infection was confirmed by qrt-pcr. plant gene expression was similar for both race 41 and 49 during the first 48 hours of the infection process but varied significantly at the later time points (72–96 hours after inoculation), mainly due to the presence of the avr4 gene in race 49, leading to a hypersensitive response in the bean plants. a biphasic pattern of gene expression was observed for several genes regulated in response to fungal infection. conclusion: the enrichment of the public database with over 6,000 bean ests significantly adds to the genomic resources available for this important crop plant. the analysis of these genes in response to bean rust infection provides a foundation for further studies of the mechanism of fungal disease resistance. the expression pattern of 90 bean genes upon rust infection shares several features with other legumes infected by biotrophic fungi. this finding suggests that the p. vulgaris-u.
appendiculatus pathosystem could serve as a model to explore legume-rust interaction. common bean, phaseolus vulgaris, represents a great source of nutrition for millions of people and is the second most important legume crop, after soybean. it is the target of multiple pests and diseases causing substantial losses. for example, on susceptible bean cultivars, bean rust, caused by uromyces appendiculatus, may cause yield reduction from 18 to 100% under favorable environmental conditions, such as high moisture and temperatures between 17 and 27°c [1]. among the 5 different stages of the bean rust life cycle (basidia, pycnia, aecia, uredinia, and telia), the most devastating on bean is the uredinial stage. the latent period between the germination of a urediniospore and the formation of a sporulating pustule can be as short as 7 days. signs of infection by uromyces appendiculatus include the presence of uredinia, or spore-producing pustules, on the surface of the leaf. the identification of fungal proteins from quiescent and germinating uredospores enhanced the understanding of the infection process of this fungus [2, 3]. based upon mapping and quantitative trait loci (qtl) analysis, several genes involved in colletotrichum lindemuthianum (co; anthracnose) resistance and other resistance genes for bean common mosaic virus (bcmv), bean golden yellow mosaic virus (bgymv), common bacterial blight, and bean rust are clustered [2, 3]. the large number of resistance (r) genes for bean rust may correlate with the high pathogen population diversity, with 90 different races identified [4]. the locus ur-3 confers resistance to 44 out of the 89 u. appendiculatus races present in the usa [5, 6]. besides the ur-3 locus, a number of other r genes were identified in bean, such as locus ur-4 for race 49, locus ur-11 (epistatic to ur-4) for race 67, or locus ur-13, mapped to linkage group b8 [7, 8].
to date, no large-scale transcriptomic analysis of bean rust infection has been performed to better understand the mechanism of resistance. all of these ur genes are effective against one specific rust strain, following the gene-for-gene resistance theory. consequently, gene pyramiding was used to produce cultivars carrying multiple resistance genes [9]. unfortunately, such resistance may prove to be effective in the field for only a short time due to the adaptation of the fungus to overcome plant defenses [10]. consequently, unraveling and understanding the mechanisms downstream of these r genes is a key research goal to circumvent the adaptation of the fungus to plant resistance. we investigated the phaseolus vulgaris-uromyces appendiculatus pathosystem at a transcriptional level for a better understanding of the plant response to fungal infection. in this study, we developed a subtractive suppressive hybridization (ssh) library made from the common bean cultivar early gallatin, which exhibits susceptibility to u. appendiculatus race 41 (virulent strain) but resistance to u. appendiculatus race 49 (avirulent strain). the resistance to u. appendiculatus is conferred by the presence of the ur-4 gene in this cultivar, which leads to a hypersensitive response (hr) in the presence of the pathogen race 49 [11]. this cdna bean library was enriched in expressed sequence tags (ests) that are potentially up-regulated by the compatible and incompatible interactions. more than 20,000 clones from the ssh library were sequenced and assembled into contigs. a total of 10,221 p. vulgaris sequences and 360 u. appendiculatus sequences were added to the ncbi database, significantly increasing the number of ests available for common bean. the regulation of 90 genes was confirmed by quantitative real time polymerase chain reaction (qrt-pcr), revealing 3 main expression patterns and highlighting gene regulation that occurs downstream of r protein activation.
common bean is a diploid (n = 11) with a small genome size estimated at 450 to 650 mb [12]. so far, the total number of common bean ests available is 83,448 (verified in march 2009). this number was significantly lower before the publication of [13], which added ests from root nodules, phosphorus deficient roots, developing pods, and leaves, and from leaves and shoots with and without c. lindemuthianum inoculation [14]. (the current number also includes the 10,221 ests added in this study.) the lack of sufficient p. vulgaris sequences precludes the construction of a useful dna microarray for this plant. consequently, in order to study the response of bean to u. appendiculatus infection, we created an ssh library and sequenced 20,736 clones from the 3' and 5' ends. of the 41,472 sequences, 8.5% were discarded due to the absence of a cloned sequence or low sequence quality. during cdna generation, sequence tags were incorporated prior to pooling cdnas from different conditions (see material and methods for details). the tags identify the treatment and time points used in generating the original mrna. the distribution of these tags in the library is presented in figure 1. approximately 17% of the sequences lacked a tag after sequencing, while 31% of the sequences had a "race 49" tag and more than 51% had a "race 41" tag. it is important to note that the majority of the ests coming from the fungus were tagged "late41", consistent with an effective colonization of the leaf by the virulent fungus (race 41). the lack of tag identification may come from inefficient incorporation of the tag during library construction or from the presence of unidentified nucleotides in the tag sequence, making it indiscernible. the various cdnas in the library could be resolved back to their source tissue by the presence of unique sequence tags. for example, 51% of the est sequences were derived from bean tissue infected with race 41 since they had the "race 41" tag.
this likely reflects the compatible interaction between race 41 and its host, allowing greater fungal penetration. biotrophic fungi are known to reprogram the host plant cell to support their growth [15], and plant ests tagged with race 41 could be involved in this process. contig assembly and removal of redundant sequences were performed on 38,592 sequences using tigr gene indices clustering tools (tgicl) and cap3 software. a total of 2,721 sequences showed no similarity with other sequences and were categorized as singletons. these sequences had an average length of 670 bp. a total of 7,860 contigs were assembled from the remaining 35,163 sequences. the average contig length was estimated to be 1 kb. an average contig contains 4.5 sequences (min: 2, max: 496). ultimately, 10,581 unisequences were identified and represent genes that are potentially differentially up-regulated during bean infection by virulent or avirulent pathogen isolates. among these 10,581 unisequences, 10,221 were annotated as bean genes and 360 were annotated as fungal genes based on best blast hits to the database. these 360 fungal unisequences included 62 singletons and 298 contigs (table 1). sequence analysis revealed that 8,806 ests had significant similarity with sequences in public databases such as dfci or ncbi (using blastn with an e-value ≤ e-20). forty-three percent of annotations were based on similarities to sequences in soybean databases and 32.8% were derived from comparisons with common bean (table 1, see additional file 1: excel file of the 10,581 unisequences). these unisequences were grouped into 18 different functional categories (figure 2). the most abundant category contained the unknown (31.9%), non-classified (4.7%), and low or no hit (13.7%) groups and represents 50.3% of the entire library.
the remaining 49.7% of the sequences were grouped into 14 categories, such as protein metabolism and catabolism (7.9%), nucleotide and nucleic acid metabolism (8.8%), or stress defense response (2.9%). taken together, signal transduction regulation and nucleotide and nucleic acid metabolism represent 15.3% of the library. tian et al. (2007) also found that 14% of their est library, made from phosphorus starved bean plants, fell into these two categories [16]. similar observations were made on soybean in response to stresses such as drought, phosphorus starvation or nematode infection [17]. it would seem that under various biotic and abiotic stresses, plants activate several common pathways that alter the expression profile of genes, which allows the plant to respond to the specific environmental condition. the ssh library was normalized to reduce the redundancy of the most highly expressed genes. however, some genes are very highly expressed and thus remain overrepresented in the normalized library. as expected, the largest contigs (i.e., those composed of the most sequences) are involved in basic metabolic processes. the primary metabolism category comprises the largest component (5.9%) of the library based upon the number of unisequences and the proportion of contigs composed of a high number of sequences. for example, cl1contig48 is composed of 496 aligned sequences and, not surprisingly, represents ribulose-1,5-bisphosphate carboxylase/oxygenase activase (rubisco activase). three of the other 15 largest contigs are also found in the primary metabolism category (table 2, see additional file 1: excel file of the 10,581 unisequences). this library was constructed to reveal the plant and fungal genes up-regulated during the rust infection process.
contigs correlated with stress response pathways also have a high number of sequences, such as contig cl1contig105, with 72 sequences, encoding a 1-aminocyclopropane-1-carboxylic acid oxidase (acc oxidase); contig cl10contig1, with 55 sequences, encoding a glucan endo-1,3-beta-glucosidase; and contig cl22contig1, with 37 sequences, encoding an endochitinase.
figure 1 distribution of the sequences according to their tag. the est sequences are represented by grey or black columns depending on whether they came from tissues harvested early (6 to 24 hai) or late (48 to 120 hai). the x-axis represents the fungal race with which the leaves were infected prior to cdna isolation. the y-axis represents the percentage of sequences in each category versus the total number of sequences of the library.
to determine the proportion of new p. vulgaris unigenes among the library, the sequences were compared with the p. vulgaris ests present in the ncbi database. by this analysis, 6,202 sequences, out of the 10,221 bean ests, can be considered as new p. vulgaris unigenes, with the remaining 4,019 sequences matching known sequences with more than 98% identity over more than 100 bp. the ests present in the ncbi database originate from common bean cultivars such as bat93, negro jamapa 81 or g19883, facilitating the identification of putative single nucleotide polymorphisms (snps) between these public sequences and the ests derived from this study using cultivar early gallatin. of the 4,019 matching sequences, 791 sequences present a perfect match, 762 sequences have 1 mismatch or indel, 658 have 2 mismatches and/or indels, and 1,807 have more than 3 mismatches and/or indels. an average of 1 snp/indel is putatively identified every 335 bp. however, we were not able to further confirm these snp/indels due to the lack of the sequence trace files for the bean ests present in the ncbi database.
this snp frequency is very similar to that reported previously by ramirez et al. (2005), who found 1 snp every 387 bp. our estimate is based on the comparison of cv early gallatin with 3 other cultivars (bat93, negro jamapa 81, and g19883). when this comparison is made between only 2 cultivars (early gallatin and g19883), the snp frequency in the coding sequences decreases to 1 snp every 570 bp. for comparison, the snp frequency in the soybean coding sequence was estimated at 1 snp/490 bp in exons and 1 snp/375 bp in introns [18]. the genes identified by est sequencing represent candidates involved in the plant host's ability to withstand rust infection. therefore, genetic mapping of these gene candidates is a means to correlate their position with known qtl involved in disease resistance. the 360 fungal sequences represent 3.4% of the library. two studies in rice showed that the harvesting time (i.e., fungal biomass in the infected leaf is low at the earliest time points) and the stringency of selection (i.e., choice of the appropriate e-value for the blast) are very important to accurately sample the abundance of fungal ests in infected leaf tissue [19, 20]. in this study, the selected e-value was e-20, greatly reducing the risk of false positive clones. tissue was sampled after 5 days of infection, allowing the multiplication of the fungus in the leaf tissue. at 5 dai, the haustoria are already mature and are probably redirecting the nutrients taken up from the plant, based on their gene expression pattern [21]. these genes were annotated predominantly by comparison to ests from germinating uredospores of u. appendiculatus [22, 23] (table 3, see additional file 1: excel file of the 10,581 unisequences). two hundred seventy two fungal sequences, representing 74.4% of the total, were considered identical to ests already present in the ncbi database, while 88 sequences are new and unique as identified by less than 98% identity over at least 100 bp.
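the novelty criterion used above (a sequence counts as already known when it matches a public est with about 98% identity over about 100 bp) can be sketched as a simple filter. this is a minimal illustration, not the authors' pipeline: the `Hit` record, the function name `is_known`, and the inclusive treatment of the 98% and 100 bp boundaries are assumptions made here, and the example hits in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best database hit for one EST (hypothetical record, for illustration)."""
    percent_identity: float  # e.g. 99.2 means 99.2% identical positions
    alignment_length: int    # aligned length in bp


def is_known(hit: Optional[Hit],
             min_identity: float = 98.0,
             min_length: int = 100) -> bool:
    """Return True when the EST matches an existing public sequence.

    Assumption: both thresholds are treated as inclusive; the text does not
    say how exact boundary values were handled.
    """
    if hit is None:  # no hit at all, so the EST is treated as new
        return False
    return (hit.percent_identity >= min_identity
            and hit.alignment_length >= min_length)
```

for instance, a hit with 99.2% identity over 250 bp would be classed as known, while a hit with 97% identity over 400 bp, or a perfect match over only 60 bp, would leave the est in the new set.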
interestingly, among the 88 fungal ests that showed no similarity with ests from u. appendiculatus germinating uredospores, 19 showed similarity with uromyces viciae haustorium-specific cdnas and may be specific to successful infections. these remaining sequences represent candidates for fungal genes more directly involved in the infection mechanism. the library was made from tissues infected with a virulent and an avirulent rust strain to allow for the identification of genes involved in both pathogen-host compatibility and resistance. besides, the high similarity of these 19 sequences with haustoria-specific ests makes them likely candidates to encode potential effectors or avirulence proteins. the largest contig has sequence similarity to a putative beta-galactosidase (an enzyme involved in the degradation of the cell wall) based on a match to a cdna from germinating p. pachyrhizi uredospores.
"unisequence", "singleton", "contig = 2", and "contig>2" columns represent the number of total unisequences, those found once in the ssh library, the contigs made up of only 2 sequences and the contigs composed of more than 2 sequences, respectively, having a hit with an organism listed in column one.
among the 10,221 unisequences, we sought to confirm the expression of 90 ests using qrt-pcr. to normalize gene expression based on qrt-pcr, the identification of constitutively expressed bean reference genes is required. the use of housekeeping genes as reference genes for gene expression normalization can introduce error in the analysis of the data without confirmation of their constitutive expression, especially when using qrt-pcr [24, 25]. consequently, three bean genes, tc197, tc127, and tc185 (encoding a guanine nucleotide-binding protein beta subunit-like protein, ubiquitin, and tubulin beta chain, respectively), were selected based on their housekeeping function and/or their presence in different bean cdna libraries [13, 14].
additionally, homologs of soybean genes cons6, cons7, and cons15 (encoding an f-box protein family member, a metalloprotease, and a peptidase s16, respectively) were chosen since they were recently shown to be expressed constitutively in soybean [26]. preliminary analysis of these putative constitutive genes by qrt-pcr performed on leaf, stem, and pod cdna led to the elimination of tc197, cons15 and tc185 due to the variability of their expression levels (data not shown). the stability of the expression level of the 3 remaining genes, tc127, cons6 and cons7, was evaluated by qrt-pcr on cdnas from bean uninfected or infected with bean rust race 41 or 49 at 6, 12, 24, 48, 72, and 96 hours after inoculation (hai). after analysis of their expression stability using genorm software [27], cons7 was the most stably expressed in our experimental conditions (figure 3). for this reason, cons7 was selected for normalization of the expression data. it is interesting to note that cons7 was also among the most stably expressed constitutive genes in soybean [26] and, therefore, could be a candidate to use for expression normalization in other legumes. in order to compare expression of genes responding to u. appendiculatus race 41 to those responding to race 49 during bean infection and colonization, the expression level of six selected fungal genes was analyzed using qrt-pcr (figure 4). during the first 24 hours of the infection, the six genes were expressed at comparable levels. however, by 48 hai, the expression of all six genes was significantly higher in tissues infected with the virulent race 41 isolate. this result likely reflects the nature of the compatible, virulent interaction as compared to inhibition of race 49 infection by the host defenses. consistent with this, all six genes used in this analysis came from the ests possessing the tag of the "late41" library.
the est cl3018contig1, encoding for a plant-induced rust protein 1, exhibits significant similarity with nmt1 (no messenger in thiamine), which is involved in the biosynthesis of the pyrimidine moiety of thiamine (vitamin b1). this gene was strongly expressed only in tissue infected with the virulent fungus race 41. similar observations were made previously using bean plants infected with uromyces fabae [28] . these data also suggest that the haustoria may not only be the site of nutrient uptake from the plant [29] but also the site of metabolite biosynthesis with specific haustorial genes involved in vitamin biosynthesis [e.g., nmt1] [28] . ninety bean unisequences were selected (based on their putative function and tag) and their regulation was confirmed by qrt-pcr using rna obtained from three independent biological replicates. unisequences coming from the ests in the "race 49" tagged libraries were desirable due to their potential involvement in a resistance pathway. the regulation of these genes was evaluated by qrt-pcr using rna from uninoculated leaf tissues or those inoculated with either u. appendiculatus uredospores of race 41 or race 49 isolates at the time points 0, 6, 12, 24, 48, 72, or 96 hai. the data obtained were used to compare the ratio of gene expression in tissues infected with race 41 or race 49 to that in uninoculated bean leaves. the data also allowed a direct comparison of gene expression induced by either race 41 or race 49. the first two comparisons highlight regulation in the infected plants by the rust fungi, while the third comparison highlights gene expression differences between the two types of infection, resistant and susceptible. the 90 genes showed significant expression differences in at least one of the 3 comparisons (p-value < 0.05, cut-off < -1 or > 1 or p-value < 0.1, cut-off < -0.58 or > 0.58 in log base 2). the transcriptional response was profiled in relation to the time after inoculation ( figure 5 ). 
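the two-tier significance rule quoted above can be written out explicitly. the sketch below is one possible reading, not the authors' script: the function name and the tier labels "strict" and "relaxed" are hypothetical, and it assumes the cut-offs apply to the absolute value of the log2 expression ratio. a log2 ratio of 1 corresponds to a 2-fold change and a log2 ratio of 0.58 to roughly a 1.5-fold change.

```python
from typing import Optional


def classify_regulation(log2_ratio: float, p_value: float) -> Optional[str]:
    """Classify one gene in one comparison under the two stated criteria.

    'strict'  : p-value < 0.05 and |log2 ratio| > 1
    'relaxed' : p-value < 0.1  and |log2 ratio| > 0.58
    None      : neither criterion is met
    """
    magnitude = abs(log2_ratio)  # up- and down-regulation are treated alike
    if p_value < 0.05 and magnitude > 1.0:
        return "strict"
    if p_value < 0.1 and magnitude > 0.58:
        return "relaxed"
    return None
```

under this reading, any gene that meets the strict criterion also meets the relaxed one, so the relaxed tier sets the effective inclusion boundary and the strict tier only marks the stronger responses.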
for example, 39 and 41 genes showed differential regulation within 6 and 12 hai, respectively, in tissue inoculated with race 41. at these same time points 40 and 24 genes, respectively, were differentially regulated in tissues infected with race 49. at the latest time points, 72 and 96 hai, 16 and 19 genes, respectively, for race 41 and 6 and 14 genes, respectively, for race 49 were differentially regulated. it is interesting to note that the regulation occurring at the early time points appeared to be independent of the fungal race used for inoculation. at the early time points (i.e., first 48 hours), only 16 genes (36% of those tested) showed a difference in expression in tissues inoculated with the two fungal races. however, at the later time points, this number increased to 34 genes with 18 (36%) and 16 (32%) at 72 and 96 hai, respectively. these results suggested that during the beginning of the infection most of the bean gene regulation is independent of the fungal race, but differences due to fungal race occur as the infection progresses. it is possible that fungal-pathogen associated molecular pattern (pamp) elicitors (e.g., chitin) induce the same response from the plant at the beginning of the infection. subsequently, the avr4 protein in the race 49 is recognized after a couple of days leading to the induction of defense-related genes. however, in bean infected by race 41, no plant defense is activated and gene expression may reflect the reprogramming of the plant host by the fungus especially at the haustorial site. 
contig | best database hit | number of sequences
cl229contig1 | uromyces viciae haustorium-specific cdna similar to mitochondrial substrate carrier | 14
cl201contig1 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to translation elongation factor | 14
cl1contig310 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to von willebrand factor | 12
cl1contig460 | phakopsora pachyrhizi cdna from germinating urediniospores ssh-library similar to von willebrand factor | 10
cl124contig1 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to unknown | 10
cl116contig1 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to unknown | 10
cl492contig1 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to unknown | 8
cl633contig1 | uromyces viciae haustorium-specific cdna similar to nucleotide excision repair protein yeast rad23 | 8
cl662contig1 | uromyces viciae haustorium-specific cdna similar to 6-phosphogluconate dehydrogenase | 8
cl116contig2 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to unknown | 8
cl766contig1 | puccinia graminis f. sp. tritici ssh-library of puccinia graminis infected wheat leaves similar to 60s ribosomal protein l5 gene | 8
cl772contig1 | uromyces viciae haustorium-specific cdna similar to voltage-dependent ion-selective channel | 8
cl582contig1 | puccinia graminis f. sp. tritici ssh-library of puccinia graminis infected wheat leaves similar to glutathione s-transferase | 8
cl787contig1 | uromyces appendiculatus cdna from hyphae from germinating uredospore similar to cysteine-rich secretory protein (crisp/scp/tpx1) | [...]
a key finding of the van [...] expressed at 72-96 hai (figure 5). therefore, the dip pattern of gene expression upon rust infection appears to occur in both bean and soybean. furthermore, this biphasic regulation seems to be shared not only by rust fungi but by other biotrophic fungi.
for example, barley infected by blumeria graminis (the causal agent of powdery mildew) also showed a biphasic gene response: the first set of genes responded in the first 24 hours of the infection in the epidermis, whereas the second set responded after 72-96 hours of infection in the mesophyll cells [30]. in contrast, soybean plants infected with phytophthora sojae, a hemibiotrophic oomycete, did not show a biphasic pattern of gene response [31]. based on these examples, this biphasic pattern might be specific to biotrophic fungi. further comparisons need to be made to establish the specificity of this "dip" pattern of gene expression in response to biotrophic fungal infection. a more detailed analysis was performed on the expression ratio of transcripts in bean leaves inoculated with the fungus race 41 or race 49 versus uninoculated bean leaves. these analyses are presented in a hierarchical cluster based on euclidean distance (figure 6, see additional file 3: excel file of the ratio of the expression level of the 90 regulated genes for all conditions). this cluster can be divided into five main groups. the first, group a (a1 and a2), is composed of 17 genes up-regulated by both fungal races in the first 24 hours of the infection, but enhanced expression is subsequently maintained only in the plants infected by race 49 at the later time points (up to 96 hai). genes in this group include those annotated as plant defense (35% of this group; e.g., pr1 and wound-induced protein 2 (win2) genes), cell-wall related (i.e., a cell-wall invertase gene), and signal transduction regulation (e.g., a g-box binding protein pg2 or sensory transduction histidine kinase genes). these genes are likely involved in the defense pathways induced by a fungal pamp since they have the same expression pattern during the first 24 hours of infection independent of the fungal race used.
for example, the wound-induced protein win2 has anti-fungal activity [32] and possesses a domain that can bind a well known pamp, chitin [33]. the formation of haustoria by the fungus in the plant can occur within hours of infection [34]. after successful colonization of the bean cell, rust race 41 likely secretes effector proteins that can suppress the plant defense pathway induced by pamps. the initial induction of genes such as win2 by race 41 and their subsequent reduction in expression may be associated with this suppression of defense by the virulent pathogen only. the second group, group b, is composed of 16 genes that were induced at the beginning of the infection but were slightly down-regulated at the later time points independent of the fungal race. this group is rich in genes categorized as plant defense, which represent 56% of this group. the third group, group c (c1 and c2), is composed of 5 genes that appeared to be repressed by inoculation. in group c1, the genes were repressed during the first 12 hours by both races but this repression was only maintained at later time points (i.e., 72 and 96 hai) in tissues infected with race 49. in contrast, in group c2, the apparent repression of genes occurred only after 72 hai with both races. the fourth group, group d, consists of 35 genes that were repressed in the first 12 hours by both races and subsequently expressed at levels comparable to the uninfected tissue. the genes repressed specifically at the early time points could also be involved in the basal defense response. this pool is composed of ests known to be involved in plant defense pathways [e.g., chitinase class 1 [35], an auxin response factor 4 and an auxin conjugate hydrolase [36], and a mlo-like protein 8 [37]]. finally, the fifth group represents 18 genes that gave no discernible pattern of expression.
figure 3 ranking of bean genes based on their expression stability measured by qrt-pcr. the expression levels of three putative constitutive genes (tc127, cons6, and cons7) were measured during infection by both fungal races 41 and 49 in order to identify the best reference gene for qrt-pcr normalization. genes with the most stable expression during the conditions tested are on the right of the diagram, the less stably expressed being on the left. the plotted value is the average expression stability m. figure generated by genorm software.
these 90 representative genes mainly identified genes involved in the early responses of the bean under rust infection (i.e., first 96 hai). these genes share different expression patterns but are likely involved in the basal defense response, which is induced by pamps. these genes were induced by both races at the early time points but their regulation was often maintained only in plants infected by the fungus race 49. this may be due to the inability of this avirulent pathogen to suppress the plant defense system. the same observation was also made by lee et al. (2008) at the protein level. lee et al. (2008) proposed a new model for plant disease resistance where r gene-mediated resistance is integrated into the basal immunity system of the plant and functions primarily to restore the innate immunity response that is actively suppressed by virulent pathogens [38]. similar patterns of expression, independent of the pathogen virulence, were observed in arabidopsis [39] and barley [40]. another category of genes (e.g., cell-wall invertase or amino acid transporter-like protein 1) involved in the plant defense system is likely involved in the hr; these genes were regulated only at the later time points in plants infected by the race 49 fungus.
the expression of these genes may be the result of recognition of avrur-4 by the ur-4 resistance protein and lead to the presence of hr ten days after infection with this isolate. in summary, we identified 10,581 p. vulgaris unisequences and confirmed the regulation of 90 plant genes by rust infection in common bean. these data have added significantly to the genomic resources available for common bean, while also providing insight into how this plant responds to fungal infection. as part of this study, we identified constitutively expressed bean genes that can be used for normalization in gene expression studies. the data also suggest that a biphasic gene expression pattern may be a common feature in plants infected by biotrophic fungi.
figure 4 transcriptional expression of selected fungal genes during the infection process. expression ratio of selected u. appendiculatus genes during the first 96 hours of the infection with bean rust race 49 or 41. qrt-pcr was performed on three independent biological replicates using cons7 as a reference for normalization. six ests, cl2800contig1 (heat shock protein 90), cl1917contig1 (proteasome subunit alpha), cl2317contig1 (glutamine synthetase), cl1contig289 (asparaginyl-trna synthetase), cl3018contig1 (planta-induced rust protein 1), and cl4935contig1 (unknown), were found strongly up-regulated in tissues infected with the fungal race 41 in comparison to tissues infected with the fungal race 49. the tag identification for these ests is "late race 41", indicating that they came originally from tissue infected with race 41. *: data significant with 0.05 < p-value ≤ 0.1. **: data significant with 0.01 < p-value ≤ 0.05. ***: data significant with p-value ≤ 0.01. nd: not determined.
bean tissues were produced at the usda-ars facility (beltsville, md). p. vulgaris cv. early gallatin plants were inoculated with either u. appendiculatus race 41 (virulent strain) or race 49 (avirulent strain) uredospores. the primary leaves of 10 day old plants were inoculated on the top and bottom. spores (2 × 10^5 spores/ml) were mixed in water and then sprayed on leaves with an aerosol canister. the plants were placed in a dew chamber in the dark at 20°c for 12 hours and then moved to a growth room (24°c, 90% relative humidity) with supplemental fluorescent lighting (12 hours light/12 hours dark). leaves were harvested 0, 6, 12, 24, 48, 72, 96, and 120 hai in 3 independent experiments. the presence of pustules or hr lesions when inoculated with u. appendiculatus race 41 or 49 isolates was observed 10 days after inoculation. bean leaf, stem, and pod tissues used for the identification of the putative constitutive genes were harvested from 3 month old plants grown in a greenhouse. the normalized ssh library was generated at the roy j. carver biotechnology center (urbana, il). the library is composed of more than 20,000 ests and was prepared as described by bonaldo et al. (1996) following the 4th method [41]. the cdna from bean infected with u. appendiculatus race 41 or 49 was pooled and tagged as follows: early41/49 and late41/49 for cdna from bean tissues infected for 6-12-15-24 or 48-72-96-120 hours, respectively, by either race 41 or race 49. the enrichment in cdna regulated by the infection was possible by subtraction of cdna from the 4 described pools against cdna derived from uninoculated leaves and germinated spores. the library was subsequently sub-divided into 4 parts based on sequence tags added during library construction. 20,736 cdnas were cloned in pgem-t (promega) for sequencing.
the 20,736 cdna clones were sequenced using an abi 3730xl dna sequencer (ame bioscience) at the catfish genetic research facility (usda-ars, stoneville, ms). the conversion of the electropherograms into base and quality files was performed using phred [42]. the est sequences were first cleaned of polya, polyt, and low-complexity sequence using seqclean from tigr (http://compbio.dfci.harvard.edu/tgi/software/). contig assembly was done using the tigr gene indices clustering tools (tgicl; http://compbio.dfci.harvard.edu/tgi/software/) after removing vector and tag sequences. tgicl uses a slightly modified version of ncbi's megablast, and the resulting clusters are then assembled using the cap3 assembly program (huang and madan, 1999). annotations for the sequences were obtained by blast against the tigr plant and fungal sequence databases and the uniprot database. the ests were submitted to ncbi genbank dbest under the accession numbers fe674093 to fe712011.
rna extraction and cdna synthesis from leaf tissues of common bean cv. early gallatin infected with either u. appendiculatus race 41 (virulent strain) or race 49 (avirulent strain), and from soybean tissues infected by p. pachyrhizi, were performed as described by libault et al. (2008). briefly, rnas were extracted from ground frozen tissues using trizol reagent (invitrogen, carlsbad, calif.) and purified by two phenol/chloroform extractions. the rnas were treated with turbo dna-free enzyme (ambion) to remove all dna contaminants. cdna was synthesized from 5 μg of rna using mmlv reverse transcriptase (promega, madison, wi). the qrt-pcr primers were designed with primer3 software (http://frodo.wi.mit.edu/primer3/input.htm) using the following criteria: tm of 60°c, pcr amplicon length from 80 to 125 bp, primer sequence length from 19 to 23 nucleotides, and guanine-cytosine content from 40% to 60% (see additional file 4: excel file of the qrt-pcr primers).
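as an illustration only, the length, gc-content and amplicon-size criteria listed above can be expressed as a simple filter. this is a minimal sketch and not the primer3 implementation: the function names are hypothetical, and the melting temperature criterion (tm of 60°c) is deliberately left out because primer3 computes it with its own thermodynamic model.

```python
def gc_percent(seq: str) -> float:
    """Return the guanine-cytosine content of a primer as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_design_criteria(primer: str, amplicon_len: int) -> bool:
    """Check a primer against the stated design criteria.

    Hypothetical helper: covers primer length (19-23 nt), GC content
    (40-60%) and amplicon length (80-125 bp). Tm is NOT checked here.
    """
    return (
        19 <= len(primer) <= 23
        and 40.0 <= gc_percent(primer) <= 60.0
        and 80 <= amplicon_len <= 125
    )


if __name__ == "__main__":
    # Made-up sequence used purely to exercise the filter.
    print(passes_design_criteria("ATGCATGCATGCATGCATGC", 100))
```

a filter of this kind would only pre-screen candidates; the primers actually used are those listed in additional file 4.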
the qrt-pcr on bean leaf tissues was performed in a 384-well plate format (7900 ht sequence detection system; applied biosystems, foster city, ca). the qrt-pcr of soybean leaf, pod, and stem tissues was performed with a 96-well plate qrt-pcr machine (7500 real-time pcr system; applied biosystems, foster city, ca). data analysis was performed as described by libault et al. (2008) with modifications [43]. data were collected over 40 cycles for bean and 45 cycles for soybean, with an rn threshold set at 0.2 for ct value determination. the expression ratios were log2-transformed for clustering in gene traffic software using a hierarchical clustering algorithm. a t-test was used to assess the statistical differences of the mean of the ratio for each sample at each time point.
temporal expression pattern of the 90 regulated transcripts during the infection process
the identification of a reference gene for qrt-pcr normalization was made using genorm software [27]. this software calculates the mean pairwise variation (based on the geometric mean) for each gene and compares these values among the genes. a high mean pairwise variation indicates a gene with low expression stability.
the use of host resistance in disease management of rust in common bean giraldez r: a genetic linkage map of phaseolus vulgaris l. and localization of genes for specific resistance to six races of anthracnose (colletotrichum lindemuthianum). tag theoretical and applied genetics anthracnose resistance and linked molecular markers in common bean line a193 phenotypic and genotypic characterization of uromyces appendiculatus from phaseolus vulgaris in the americas monogenic and epistatic resistance to bean rust infection in common bean development of comprehensive rust resistant bean germplasm (abstr.) 
crg, a gene required for ur-3-mediated rust resistance in common bean, maps to a resistance gene analog cluster scar markers linked to the common bean rust resistance gene ur-13. tag theoretical and applied genetics sources, genes for resistance and pedigrees of 52 rust and mosaic resistant dry bean germplasm lines released by the usda beltsville bean project in collaboration with michigan, nebraska and north dakota agricultural experiment stations a critical analysis of durable resistance using specific races of the common bean rust pathogen to detect resistance genes in phaseolus vulgaris nuclear dna amounts in angiosperms sequencing and analysis of common bean ests. building a foundation for functional genomics comparative bioinformatic analysis of genes expressed in common bean (phaseolus vulgaris l.) seedlings. genome/national research council canada = genome/conseil national de recherches canada infection of arabidopsis thaliana leaves with albugo candida (white blister rust) causes a reprogramming of host metabolism molecular cloning and characterization of phosphorus starvation responsive genes in common bean (phaseolus vulgaris l.) 
sequencing and analysis of approximately 40,000 soybean cdna clones from a full-length-enriched cdna library a soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis large-scale identification of expressed sequence tags involved in rice and rice blast fungus interaction analysis of genes expressed during rice-magnaporthe grisea interactions analysis of expressed sequence tags from uromyces appendiculatus hyphae and haustoria and their comparison to sequences from other rust fungi shotgun identification of proteins from uredospores of the bean rust uromyces appendiculatus protein accumulation in the germinating uromyces appendiculatus uredospore genome-wide identification and testing of superior reference genes for transcript normalization in arabidopsis reference gene selection for quantitative real-time pcr analysis in virus infected cells: sars corona virus, yellow fever virus, human herpesvirus-6, camelpox virus and cytomegalovirus infections identification of four soybean reference genes for gene expression normalization. the plant genome accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes high level activation of vitamin b1 biosynthesis genes in haustoria of the rust fungus uromyces fabae nutrients of a rust fungus: the role of haustoria differential gene transcript accumulation in barley leaf epidermis and mesophyll in response to attack by blumeria graminis f.sp. hordei (syn. erysiphe graminis f.sp. hordei). 
physiological and molecular plant pathology patterns of gene expression upon infection of soybean plants by phytophthora sojae differential expression within a family of novel wound-induced genes in potato a novel pathogen-and woundinducible tobacco (nicotiana tabacum) protein with antifungal activity volatiles modulate the development of plant pathogenic rust fungi purification and characterization of an acidic beta-1,3-glucanase from cucumber and its relationship to systemic disease resistance induced by colletotrichum lagenarium and tobacco necrosis virus a plant mirna contributes to antibacterial resistance by repressing auxin signaling mlo, a novel modulator of plant defenses and cell death, binds calmodulin quantitative proteomic analysis of bean plants infected by a virulent and avirulent obligate rust fungus arabidopsis senescence-associ-ated gene101 stabilizes and signals within an enhanced disease susceptibility1 complex in plant innate immunity. the plant cell nuclear activity of mla immune receptors links isolate-specific and basal diseaseresistance responses normalization and subtraction: two approaches to facilitate gene discovery base-calling of automated sequencer traces using phred. i. accuracy assessment identification of 118 arabidopsis transcription factor and 30 ubiquitinligase genes responding to chitin, a plant-defense elicitor st contributed to the sequences production and analysis, performed the expression analysis and drafted the manuscript; tj performed the bioinformatic analysis. kbc produced the plant material and contributed to the sequences production; bs sequenced the clones; dx supervised the bioinformatic part of the project and worked over the draft version of the manuscript; bc and htn supervised the project and worked over the draft version of the manuscript; gs conceived and supervised the project and worked over the draft version of the manuscript. all authors read and approved the final manuscript. 
key: cord-294725-wyrg0nq8 authors: bourdon, julie a.; williams, andrew; kuo, byron; moffat, ivy; white, paul a.; halappanavar, sabina; vogel, ulla; wallin, håkan; yauk, carole l. title: gene expression profiling to identify potentially relevant disease outcomes and support human health risk assessment for carbon black nanoparticle exposure date: 2013-01-07 journal: toxicology doi: 10.1016/j.tox.2012.10.014 sha: doc_id: 294725 cord_uid: wyrg0nq8 new approaches are urgently needed to evaluate potential hazards posed by exposure to nanomaterials. gene expression profiling provides information on potential modes of action and human relevance, and tools have recently become available for pathway-based quantitative risk assessment. the objective of this study was to use toxicogenomics in the context of human health risk assessment. we explore the utility of toxicogenomics in risk assessment, using published gene expression data from c57bl/6 mice exposed to 18, 54 and 162 μg printex 90 carbon black nanoparticles (cbnp). analysis of cbnp-perturbed pathways, networks and transcription factors revealed concomitant changes in predicted phenotypes (e.g., pulmonary inflammation and genotoxicity), that correlated with dose and time. benchmark doses (bmds) for apical endpoints were comparable to minimum bmds for relevant pathway-specific expression changes. comparison to inflammatory lung disease models (i.e., allergic airway inflammation, bacterial infection and tissue injury and fibrosis) and human disease profiles revealed that induced gene expression changes in printex 90 exposed mice were similar to those typical for pulmonary injury and fibrosis. very similar fibrotic pathways were perturbed in cbnp-exposed mice and human fibrosis disease models. 
our synthesis demonstrates how toxicogenomic profiles may be used in human health risk assessment of nanoparticles and constitutes an important step forward in the ultimate recognition of toxicogenomic endpoints in human health risk. as our knowledge of molecular pathways, dose–response characteristics and relevance to human disease continues to grow, we anticipate that toxicogenomics will become increasingly useful in assessing chemical toxicities and in human health risk assessment. chronic inhalation of fine and ultrafine particulate matter has been associated with adverse pulmonary effects including fibrosis and cancer, as well as exacerbation of existing conditions such as asthma, bronchitis and chronic obstructive pulmonary disorder (bonner, 2007; knaapen et al., 2004) , in addition to cardiovascular disease (dockery et al., 1993; pope et al., 2004) . human exposure to manufactured nanomaterials (nms), which have at least one size dimension that is less than 100 nm, may constitute an increased risk of adverse effects especially following inhalation exposure, and their potential to induce toxic effects is poorly understood (handy and shaw, 2007) . moreover, the human health risks associated with inhalation exposure have not been adequately investigated. methods that can be effective in screening for nm toxicities are paramount, due to the countless variations in physical and chemical properties of nms in terms of size, shape, agglomeration and surface coatings. traditional assays used in human health risk assessment (hhra) generally involve chronic and subchronic rodent exposures with concomitant analyses of tumour induction (e.g., two-year rodent cancer bioassay), in addition to various non-cancer endpoints, the most sensitive of which is used for regulatory decision-making (meek et al., 1994) . these approaches form the foundation of the chemical regulatory system and have been invaluable for hhra. 
however, some of these assays, such as those based on chronic animal exposures at the maximum tolerated dose, are time and resource intensive, thus limiting broad application (suter et al., 2004). recent discussions have identified gene expression profiling as a potentially rapid and cost-effective approach for identifying and assessing prospective hazard, characterizing chemical (or particle) mode of action, and assessing human relevance in support of hhra (national academy of sciences, 2007). in order for gene expression data to become accepted for routine use in hhra, it is necessary to demonstrate that mrna/protein expression profiles can effectively predict the modes of action and biological outcomes of exposure at relevant doses, and to confirm that these data can be used to strengthen the foundation for hhra and regulatory decisions. in this regard, it has been hypothesized that gene expression profiling will be extremely useful in identifying effects at low doses and, moreover, in distinguishing between doses that elicit an adaptive response and those that yield adverse effects (boverhof and zacharewski, 2006). to date, the application of gene expression profiling in regulatory toxicology has largely focused on qualitative identification of chemical modes of action and transcription biomarkers that can predict specific toxicities. however, the utility of gene expression profiling in quantitative determination of threshold values (e.g., benchmark doses) has not yet been rigorously explored. in the present study we investigate the utility of gene expression profiles derived from mice exposed to printex 90 carbon black nanoparticles (cbnps) by intratracheal instillation to identify potential hazards, modes of action, and doses above which adverse effects may be expected for specific toxicological outcomes. in addition, we quantitatively compare benchmark doses for pathways to those of apical endpoints derived from the same experimental animals. 
we employ printex 90 as a model nm due to the rich database of traditional toxicity information on which our findings can be anchored. briefly, printex 90 consists almost entirely of carbon, with very low levels of impurities in terms of polycyclic aromatic hydrocarbons and endotoxins (bourdon et al., 2012b; jacobsen et al., 2008; saber et al., 2011). these particles generate reactive oxygen species (jacobsen et al., 2008), induce dna strand breaks in vitro and in vivo (jacobsen et al., 2009; saber et al., 2005) and mutations in vitro (jacobsen et al., 2007) that are associated with oxidative stress. the data in this study are from previously published experiments investigating printex 90 cbnp exposure in c57bl/6 mice at various doses (i.e., vehicle, 18, 54 and 162 μg) collected at several time-points (1, 3 and 28 days) following a single acute instillation (bourdon et al., 2012a). we previously characterized widespread changes in gene expression involving acute phase response and inflammation, supported by concomitant influxes of pulmonary bronchoalveolar lavage (bal) cells and increases in tissue-specific dna strand breaks (bourdon et al., 2012a,b). in addition to the examination of bmds and bmdls, we compare cbnp-modified gene expression profiles to various models of lung disease in mice and humans reported in the literature, in order to explore the utility of our data in predicting the potential risk of adverse health outcomes and the human relevance of expression changes. the work demonstrates one approach by which gene expression profiling may be integrated into hhra to support or predict apical toxicological endpoints, dose-response, and relevance to human diseases. details of the mouse exposures, particle characterization and pulmonary phenotype were previously published in bourdon et al. (2012a,b). 
briefly, female c57bl/6 mice were exposed to a single instillation of vehicle or printex 90 (18, 54 or 162 μg) and euthanized 1, 3 and 28 days post-exposure (n = 6/group). the intratracheal instillation route of exposure allows for deposition of known doses directly in the lungs of the mice, and controls for potential dermal- and ingestion-related cbnp exposure that can occur during whole body inhalation exposures. the doses were selected to represent 1, 3 and 9 working days of exposure at the occupational inhalation exposure limit of 3.5 mg/m3 of cb (as established by the us occupational safety and health administration (osha) and the us national institute for occupational safety and health (niosh)) for a mouse (assuming 1.8 l/h inhalation rate and 33.8% particle deposition in mouse, for an 8 h working day) (dybing et al., 1997; jacobsen et al., 2009). very limited filtration of cbnps from the nose is expected during human exposure. printex 90 cbnps were characterized and displayed the following properties: 14 nm primary particle size, 295-338 m2/g brunauer emmett and teller (bet) surface area, 74.2 g/g pahs, 142 eu/g endotoxin, polydispersity index of 1, −10.7 mv zeta potential, 2.6 m peak hydrodynamic number and 3.1 m peak volume-size-distribution (bourdon et al., 2012b). analysis of pulmonary inflammatory cellular influx in bronchoalveolar lavage (bal) revealed neutrophilic inflammation that was sustained to day 28 at all doses. tissue-specific genotoxicity, as observed by dna strand breaks, persisted up to day 28 at the two highest doses, and fpg-sensitive sites were observed at all doses on day 1 and at the highest dose on day 3 (bourdon et al., 2012b). whole mouse genome dna microarrays revealed 487 and 81 differentially expressed genes (fdr-adjusted p-value ≤ 0.1 and fold changes ≥ 1.5) overall in lung and liver, respectively (bourdon et al., 2012a). 
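the dose rationale above can be checked with a short back-of-the-envelope calculation. the sketch below is only an illustration using the stated assumptions (3.5 mg/m3 exposure limit, 1.8 l/h inhalation rate, 8 h working day, 33.8% deposition); the function name is hypothetical, and any rounding or adjustment the authors applied to arrive at the nominal doses is not stated here.

```python
OEL_MG_PER_M3 = 3.5          # occupational exposure limit for carbon black
INHALATION_L_PER_H = 1.8     # assumed mouse inhalation rate
HOURS_PER_WORKING_DAY = 8    # assumed working day length
DEPOSITION_FRACTION = 0.338  # assumed particle deposition in mouse


def deposited_dose_ug(working_days: float) -> float:
    """Estimate the deposited mass (in micrograms) for a number of working days."""
    conc_ug_per_l = OEL_MG_PER_M3  # 1 mg/m3 equals 1 microgram per litre
    inhaled_ug = (conc_ug_per_l * INHALATION_L_PER_H
                  * HOURS_PER_WORKING_DAY * working_days)
    return inhaled_ug * DEPOSITION_FRACTION


if __name__ == "__main__":
    for days in (1, 3, 9):
        print(days, round(deposited_dose_ug(days), 1))
```

under these assumptions the estimate is roughly 17, 51 and 153 μg for 1, 3 and 9 working days, which is close to, though slightly below, the nominal doses of 18, 54 and 162 μg.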
the complete microarray dataset is available through the gene expression omnibus at ncbi (http://www.ncbi.nlm.nih.gov/geo/, superseries gse35284, subseries gse35193). this dataset was previously used to examine molecular interactions between lung and liver upon cbnp exposure (bourdon et al., 2012a). to determine the processes most affected by cbnp exposure, pathway analysis of gene expression data was conducted using a rank-based test in r (r development core team, 2011) as described in alvo et al. (2010). the relative expression for the genes in a pathway was first aligned by subtracting the median expression value for the combined treatment and control groups. these values were then ranked within each subject and the vector of average ranks was calculated for each treatment group. the distance between the two treatments was calculated and a permutation analysis was used to obtain a p-value for each pathway. pathways with p < 0.05 were considered significant.
2.3. benchmark dose (bmd)/lower confidence limit benchmark dose (bmdl) calculation for apical endpoints and rt-pcr data
bmd10 (bmd representing an excess risk of 10% in exposed animals vs. controls) and bmdls (95% confidence limit) were calculated for apical endpoint data (inflammation and genotoxicity) and for rt-pcr using epa bmds 2.2 (davis et al., 2011). only data that were statistically above control levels (p < 0.05) for at least two of the doses were included. prior to running the analysis, the data were screened for homogeneity of variance, and then fit against five continuous dose-response models (i.e., hill, polynomial, linear, power and exponential). a goodness-of-fit p-value > 0.05 and scaled residuals within ±2.0 were applied as cut-offs for selection of the appropriate model, and curves were also inspected visually. when more than one model was suitable, the one with the lowest akaike's information criterion (aic) was selected. 
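the model selection rule described above (goodness-of-fit and scaled-residual cut-offs, then lowest aic) can be sketched as follows. this is an illustrative sketch rather than epa bmds code: the `Fit` record, its field names and the example values are hypothetical, and the visual inspection of curves mentioned in the text is not captured.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Fit:
    """Hypothetical summary of one fitted dose-response model."""
    model: str
    gof_p: float                    # goodness-of-fit p-value
    max_abs_scaled_residual: float  # largest |scaled residual| across doses
    aic: float                      # Akaike's information criterion


def select_model(fits: Sequence[Fit]) -> Optional[Fit]:
    """Keep fits passing both cut-offs, then return the one with the lowest AIC."""
    suitable = [
        f for f in fits
        if f.gof_p > 0.05 and f.max_abs_scaled_residual <= 2.0
    ]
    if not suitable:
        return None  # endpoint not suitable for modelling
    return min(suitable, key=lambda f: f.aic)


if __name__ == "__main__":
    # Made-up fit summaries, for illustration only.
    fits = [
        Fit("hill", 0.40, 1.1, 52.3),
        Fit("linear", 0.02, 0.8, 48.0),      # fails goodness of fit
        Fit("power", 0.30, 2.5, 47.0),       # fails scaled-residual cut-off
        Fit("exponential", 0.20, 1.5, 50.1),
    ]
    best = select_model(fits)
    print(best.model if best else "no suitable model")
```

returning `None` when no fit passes mirrors the situation, reported in the results, where some endpoints were not suitable for modelling.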
in order to determine bmds and bmdls for gene expression data, bmdexpress was employed. briefly, microarray probes with more than one representation on each array were averaged. analyses were performed on genes that were identified as statistically significant by one-way anova (p < 0.05) using the four following models: hill, power, linear (1°) and polynomial (2°). the power model had a power restriction of ≥1. selection between the linear and polynomial (2°) models was based on choosing the model that describes the data with the least complexity. a nested chi-square test, with a cut-off of 0.05, first selects among linear and polynomial models, followed by comparison of aic, which measures the relative goodness of fit. a hill model was excluded if the "k" parameter of the model was less than 1/3 of the lowest positive dose (18 μg) (black et al., 2012). other settings included maximum iterations of 250, confidence level of 0.95, and benchmark response (bmr) of 1.349 (number of standard deviations defining the bmd). for functional classifications and analyses, the resulting bmd datasets were mapped to kegg pathways with promiscuous probes removed (probes that mapped to multiple annotated genes). bmds that exceeded the highest exposure dose (162 μg) and that exceeded a goodness-of-fit p-value of 0.1 were removed from the analysis.
to determine the correlation between gene expression profiles of mice exposed to cbnps with those of mouse pulmonary disease models, a prediction analysis for microarrays (pam) (tibshirani et al., 2002) was conducted in r (r development core team, 2011) using the pamr library (hastie et al., 2011). data for this analysis encompassed 13 mouse lung disease models, and were obtained from the national centre for biotechnology information gene expression omnibus (accession #gse4231 and #gse11037). the samples were labelled as belonging to one of three models of lung inflammation: bacterial infection, lung injury and fibrosis, or th2 response (allergic airway inflammation). 
probes with common genbank accessions were collapsed to a single measurement for each sample using the mean. using the common accession numbers, a prediction model using shrunken centroids was estimated. cross-validation of the nearest shrunken centroid classifier was conducted to identify an appropriate threshold. pamr implements 10-fold cross-validation. this involves dividing the samples into ten approximately equal-size parts ensuring that the classes are distributed proportionally. ten-fold cross-validation works by fitting a model on 90% of the samples and then predicting the class labels of the remaining 10%. this procedure is repeated ten times, with each part playing the role of the test samples and the errors on all ten parts added together to compute the overall error. a threshold of 2 was selected, yielding a classifier with 753 genbank accessions. the means of the nine cbnp treatment conditions were then classified using the estimated prediction model. functional analysis was conducted to establish molecular perturbations that were in common or discrepant between cbnp exposed mice and inflammatory lung disease models. the analysis was conducted on genes that were common between cbnp and each lung disease model, then again for genes that were unique to cbnp, using a cut-off of fdr-adjusted p < 0.1 and a fold-change > 1.5 for all datasets. the less stringent cut-off was employed for disease models because of the low power in several of the datasets. david bioinformatics resources 6.7 was used to identify enriched biological functions from terms with similar genes and biological meaning (huang et al., 2009a,b) . david biological functions with enrichment scores > 1.3 were considered significant, in accordance with david recommendations (huang et al., 2009a) . clusters with enrichment scores > 1.3 in our analysis contained at least one gene ontology term or pathway for which the benjamini-corrected p-value was ≤0.05. 
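the ten-fold cross-validation described above for pam can be sketched in a few lines. this is a generic illustration, not the pamr implementation: the function names are hypothetical, and the classifier is passed in as a callable so that any model (for example a nearest shrunken centroid classifier) could be plugged in.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with classes spread proportionally."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal samples round-robin, continuing where the previous class
        # stopped, so fold sizes stay approximately equal.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def cv_error(labels, fit_predict, k=10, seed=0):
    """Total misclassifications over k folds.

    fit_predict(train_idx, test_idx) is a hypothetical callable that fits a
    model on the training samples and returns predicted labels for test_idx.
    """
    errors = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(train_idx, test_idx)
        errors += sum(p != labels[i] for p, i in zip(predictions, test_idx))
    return errors
```

in practice this error would be computed for each candidate shrinkage threshold, and a threshold with low cross-validated error would be retained, as was done when the threshold of 2 was selected.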
in order to predict potential disease outcomes of relevance to humans, gene expression profiles were mined against genomic data repositories. disease prediction analysis was done in nextbio (http://nextbio.com) using the high dose exposure profiles as differentially expressed genes were identified at all time-points for this dose. data from cbnp exposed mice were compared to curated datasets to identify disease studies with similar gene profiles, gene ranking and consistency. pairwise gene signature correlations and rank-based enrichment statistics were employed in the calculation of nextbio scores for each disease. the disease that ranked highest in comparison with cbnp exposure was given a score of 100, and the rest of the results were normalized accordingly (kupershmidt et al., 2010) . meta-analysis was performed using select disease models for mice, as well as for human studies representative of disease state. the analysis identified, ranked and scored all genes and biogroups that were common between the studies according to the scoring method described above for disease prediction (kupershmidt et al., 2010) . biogroups were filtered for canonical pathways. the rank-based pathway analysis revealed a total of 151, 150 and 106 differentially expressed kegg pathways on days 1, 3 and 28, respectively. the most affected pathways according to statistical significance were primarily related to inflammation on day 1, to steroid biosynthesis and dna repair on day 3 and to apoptosis and inflammation on day 28. significant pathways (p < 0.05) pertaining to genotoxicity (dna damage and repair) and inflammatory and immune responses are summarized in table 1 , along with previously established phenotypes. all significant pathways are presented in supplemental table 1 . analysis of the number of common pathways between doses for each time-point revealed that most pathways occurring at lower doses also occur at higher doses. 
however, the number of significant pathways increased with dose (fig. 1). epa bmds 2.2 bmds and bmdls were generated for apical endpoints and rt-pcr data (bmd values for each endpoint and gene are presented in supplementary table 2; curves are presented in supplemental fig. s1). although many of the apical endpoints and rt-pcr data were not suitable for modelling, bmd and bmdl values generally increased over post-exposure time as expected. the mean bmds for inflammatory apical endpoints were 0.9, 1.2 and 9.6 μg, and bmdls were 0.6, 0.9 and 6.5 μg on days 1, 3 and 28, respectively. bmd values for rt-pcr data of genes involved in inflammation tended to be higher than for apical endpoints. mean bmds of inflammatory genes were 14.5, 16.7 and 29.0 μg, and mean bmdls were 10.4, 9.1 and 20.1 μg, on days 1, 3 and 28, respectively. bmds and bmdls were also generated for microarray gene expression profiles using bmdexpress. minimum bmds for kegg pathways relevant to inflammation, for kegg pathways relevant to genotoxicity, and for the most sensitive kegg pathways, as well as bmds for apical endpoint data, are presented in table 2. minimum bmds were calculated according to the median of all significant genes for each pathway and the 5th percentile of significant genes of all pathways, in order to increase sensitivity. even the 5th percentile bmds tended to be higher than bmds generated for apical endpoints (table 2). however, minimum bmds, representing the most sensitive gene for each relevant pathway, were much more comparable to bmds of apical endpoints (table 2). pam was used to compare the printex 90 gene expression dataset to 13 pulmonary gene expression profiles that represent a range of murine pulmonary disease models (e.g., transgene overexpression, treatments with infectious agents, toxic chemicals and allergens) as described in lewis et al. (2008) and thomson et al. (2012). 
the models were classified according to the three following subgroups: (1) bacterial infection, (2) lung injury and fibrosis, and (3) th2 response (allergic airway inflammation). clustering of the models using pam is shown in fig. 2a . two cbnp exposure conditions (day 28 low and medium doses) did not cluster with other cbnp exposure condition or other disease models, likely due to lack of response. models of bacterial infection did not cluster with other disease models or cbnp exposure. pam analysis revealed an association between cbnp exposure, th2 responses and lung injury/fibrotic responses. although th2 response and lung injury/fibrotic responses were more closely associated with one another than with cbnp exposure, pam analysis revealed that cbnp exposure was more closely related to lung injury/fibrotic responses than to th2 responses, which is also supported by probability statistics comparing cbnp exposure with each disease sub-group (fig. 2b) . in order to examine commonalities and discrepancies between disease models and cbnp exposure in more detail, functional analysis was conducted on (1) genes that were in common between cbnp and each disease model and (2) genes that were unique to cbnp. the number of significant genes used for each analysis is presented in supplemental table 3 . the david biological functions are summarized in table 3 . this analysis demonstrates that inflammation was common between most models at all time-points (excluding aspergillus extract). on day 1, commonalities for cbnp exposure were observed with bacterial infection models (i.e., due to the acute phase response) and with injury and fibrosis models (i.e., due to changes in tissue morphogenesis related genes). day 3 revealed inflammation and cell cycle disturbances in most of the models. however, cbnp responses were more similar to bleomycin-induced lung injury as shown by the high degree of overlapping biological functions on day 3 (table 3) . 
cbnps triggered an adaptive immune response on day 28 that was also only apparent in lung injury and fibrosis models.
table 1 summary of significant kegg pathways (p ≤ 0.05) relating to phenotypes established in bourdon et al. (2012a,b).
fig. 1. venn diagrams illustrating overlap of significant pathways (p < 0.05) according to dose, for each post-exposure day (1, 3 and 28 days) in c57bl/6 mice exposed to cbnps.
gene expression profiles from the high dose cbnp-exposed mice vs. control were analysed in nextbio to identify closely related respiratory disease profiles in humans. on all post-exposure days, severe acute respiratory syndrome (sars), congenital cystic adenomatoid malformation, and injury of lung were identified as the top three respiratory diseases associated with cbnp exposure. interestingly, fibrosis was identified as a predicted disease outcome of cbnp exposure, with a score that increased considerably with time (e.g., score of 14 on day 1, 35 on day 3 and 45 on day 28). in order to examine the molecular mechanisms that may be involved in fibrosis in more detail, a meta-analysis was completed using curated studies within nextbio that identified fibrosis as a phenotype. meta-analysis in mouse employed 36 models that included fibrosis induced by injury with naphthalene, bleomycin and ganciclovir, doxycycline-induced over-expression, and tgf-β over-expression in a variety of mouse models (wild type, inflammation resistant/susceptible). meta-analysis using cbnp gene expression profiles in mouse ranked 473 canonical pathways and 21,277 genes present in at least one of the studies on select models of pulmonary fibrosis and lung injury (identified in nextbio disease correlation profiles). in order to establish human relevance, the analysis was repeated using human studies curated in nextbio. 
meta-analysis encompassed 4 studies from lung biopsies of patients affected with fibrosis, with intermediate to severe pulmonary hypertension, pneumonia and exacerbation of idiopathic pulmonary fibrosis. overall, 472 canonical pathways and 15,795 genes were ranked as present in at least one of the studies. the top ranked pathways and genes for the mouse and human meta-analyses are presented in table 4 . interestingly, comparison of fold-ranks between the mouse and human analysis revealed that the most affected pathways were the same in both species. however, the genes that were most perturbed during fibrotic responses were considerably different in cbnp-exposed mice compared to human diseases, with the exception of glycerol-3-phosphate dehydrogenase (gdp1), kruppel-like factor 4 (klf4), secreted phosphoprotein 1 (spp1) and ceruloplasmin (cp). it is now widely accepted that toxicity is preceded by, and accompanied by, transcriptional changes, thus providing molecular signatures of direct and indirect toxic effects (auerbach et al., 2010; fielden et al., 2011; gatzidou et al., 2007) . it is hypothesized that toxicogenomic profiling can be used as a screening tool to prioritize the specific assays that should be conducted from the standard battery of tests, thus minimizing animal use, cost and time (dix et al., 2007) . moreover, global analyses of transcriptional changes provide a wealth of information that can be used to identify putative modes of action and to query relevance to human adverse health outcomes (currie, 2012) . this type of approach is the general premise of the widely supported paradigm outlined in 'toxicity testing in the 21st century' (national academy of sciences, 2007) . 
however, substantive work demonstrating the ability of gene expression profiles to identify hazards, to assess risk of exposure via quantitative dose-response analysis, and to identify adverse outcomes associated with specific modes of action is required before these endpoints can be used in hhra. the present study applies pathway- and network-based approaches, bmd modelling, and disease prediction tools to gene expression data to explore the relationship between apical endpoints and transcriptional profiles. the work investigates the potential utility of gene expression profiling in determining hazard and mode of action of nps, in characterizing dose-response relationships and in predicting the relevance of these findings to potential disease-outcomes and human health effects for hhra. the utility of gene expression profiling in hazard identification has been examined for a limited number of chemicals, including dibutyl phthalate and acetaminophen (euling et al., 2011; kienhuis et al., 2011; makris et al., 2010). toxicogenomic profiles of alachlor exposure in rat olfactory mucosa (genter et al., 2002) and dimethylarsinic acid (dma) exposure in human cultured bladder cells and rat bladder epithelium (sen et al., 2005; us epa, 2005) have also provided useful information for two final assessments of acetochlor and arsenicals (us epa, 2004). our data demonstrate that gene expression profiles can also be viewed as effective predictors of the biological effects of cbnp exposure.
table 3. comparison of cbnp profiles with lung disease models using functional analysis for genes in common (grey) and genes unique to cbnp (black).
for example, inflammatory responses detected at the gene expression level using dna microarrays, classified in this work using kegg pathway analyses and previously in the same mice using ingenuity pathway analysis (bourdon et al., 2012a), are entirely consistent with the observed pulmonary influx of inflammatory markers (e.g., neutrophils, eosinophils and lymphocytes). the number of genes perturbed and the magnitude of expression changes in these pathways correlate with dose and time. in addition, observed transcriptomic changes associated with perturbations of cell cycle networks, alterations of non-homologous end-joining, and p53 signalling support the sustained genotoxicity observed in the mice, although dose and time correlations were not as apparent (e.g., levels of dna strand breaks remained relatively constant at the two highest exposure doses (bourdon et al., 2012b) whereas induction of dna repair genes decreased with dose and time). the transcriptomic changes associated with alterations in glutathione metabolism and free radical scavenging correlate with induction of formamidopyrimidine dna glycosylase (fpg)-sensitive sites (an indicator of oxidative dna damage) early after the exposure. the persistence of this response is an indication of an adaptive response to oxidative stress in the lungs of the mice. interestingly, cbnp-induced alterations in gene expression profiles also revealed a pulmonary acute phase response and unexpected changes in lipid homeostasis, which were subsequently supported by measured decreases in plasma high density lipoprotein (hdl) (bourdon et al., 2012a). the strong association between cbnp-induced gene expression profiles and apical endpoints supports the use of toxicogenomics for hazard identification of nms, and perhaps more importantly, for highlighting unexpected adverse outcomes.
table 4. meta-analyses in nextbio using mouse and human profiles in which fibrosis was a phenotype. values in parentheses represent rank in the opposite species (mouse or human). columns: rank 1 (rank 2), pathway; rank 1 (rank 2), gene (symbol).
moreover, ongoing work within the organization for economic co-operation and development (oecd) is actively developing adverse outcome pathways (aop) approaches that are expected to provide tangible methods by which systems biology endpoints can be used in human health risk assessment. toxicogenomics data that examine responses over dose and time in a variety of tissues can be very useful for such applications, as illustrated for cbnp exposure in fig. 3. overall, our data suggest that gene expression profiles can be effectively used to identify putative mode(s) of action and hazards of np exposure, in the absence of phenotypic data. in addition to identification of hazard, it has been suggested that gene expression profiles may be useful for quantitative assessment (e.g., establishment of reference doses) of responses related to both cancer and non-cancer endpoints. benchmark doses are generally considered more informative than the no observable adverse effect level (noael) in deriving reference doses as they are based on the entire dose-response relationship (crump et al., 1995). because alterations in gene expression can be initiated in the absence of biological effects (e.g., adaptive or stress response pathways effective in mitigating toxic effects), it is expected that reference doses for genomics endpoints may be too sensitive for use in hhra. however, previous analyses of 5 chemicals (i.e., 1,4-dichlorobenzene, propylene glycol mono-t-butyl ether, 1,2,3-trichloropropane, methylene chloride and naphthalene) showed that median bmds and bmdls for the most sensitive pathways and go categories were highly correlated with bmds and bmdls of cancer and non-cancer endpoints (thomas et al., 2011).
in the current study, rather than choosing the most sensitive (i.e., lowest) bmds, we focussed on the analysis of pathways that were specific to biological outcomes observed in the mice (i.e., phenotypically anchored), and calculated bmds for these relevant genes and pathways. the pathway-based bmds and bmdls calculated here for relevant pathways were actually less sensitive (i.e., higher bmds) than those of the observed apical endpoints. however, the mean of the minimum bmds and bmdls across all the pathways that we assigned as relevant to the apical endpoints (i.e., corresponding to the most sensitive genes within the relevant pathways) was similar to those of relevant apical endpoints. median bmds and bmdls for the most sensitive pathways also correlate more closely with apical endpoints even though the pathways were not necessarily relevant to these endpoints. this finding supports previous examples demonstrating a 1:1 correlation between bmds for gene expression and apical endpoints (thomas et al., 2011). these data indicate the potential utility of using gene expression profiles in determining acceptable exposure limits for nps. in order to determine the specific utility of pathway derived bmds in hhra, it will be necessary to establish a comprehensive catalogue of pathways that are actually perturbed in the event of specific adverse effects. perhaps the principal motivation for including gene expression profiling in hhra is the wealth of information that can be used to identify key events that are correlated with adverse outcomes that are relevant to human disease, and moreover can be used to predict the likelihood of a human disease. identification of key events at the transcriptional level can facilitate the identification of processes that are critical for disease initiation and progression, thus allowing information from animal experiments to be queried and used for extrapolation to human scenarios (edwards and preston, 2008).
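the two pathway-level bmd summaries contrasted above (the median bmd of a pathway, and the mean of the minimum bmds across relevant pathways) can be sketched as follows. this is only an illustration under stated assumptions: the pathway names, gene names and bmd values are invented for the example and are not taken from the study, and the same functions would apply unchanged to bmdls.

```python
from statistics import mean, median

# Hypothetical per-gene BMDs (arbitrary dose units), grouped by pathway.
# All names and numbers are illustrative assumptions, not study data.
pathway_bmds = {
    "pathway_a": {"gene1": 12.0, "gene2": 30.0, "gene3": 48.0},
    "pathway_b": {"gene4": 8.0, "gene5": 20.0},
    "pathway_c": {"gene6": 16.0, "gene7": 40.0, "gene8": 64.0},
}


def pathway_median_bmd(bmds_by_gene):
    """Median BMD over the genes of a single pathway."""
    return median(bmds_by_gene.values())


def mean_of_minimum_bmds(pathways):
    """Take the most sensitive (lowest-BMD) gene in each pathway,
    then average those minima across pathways."""
    return mean(min(genes.values()) for genes in pathways.values())


median_by_pathway = {
    name: pathway_median_bmd(genes) for name, genes in pathway_bmds.items()
}
mean_min = mean_of_minimum_bmds(pathway_bmds)
```

with these toy values the mean of the minima is lower than every pathway median, which mirrors the direction of the observation in the text: summaries built from the most sensitive genes sit lower than whole-pathway summaries.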
comparison of our data with specific models of lung disease, including bacterial infection, airway hypersensitivity and lung injury revealed that cbnps induced responses that were more closely related to lung injury and fibrosis than to other models. this finding was further supported by comparison of the expression profiles of cbnp exposed mice to those of curated studies of animals and humans exhibiting a myriad of pulmonary disease phenotypes. this analysis demonstrates that cbnp exposure perturbs genes that are known to be involved in tissue injury and fibrosis in mice. although it is unclear if cbnp exposure would result in the same gene expression profile in humans, similar pathways including many involved in fibrotic responses were found in both mice and humans (52% of the top 50 pathways found were common between mouse and human). despite concordance of pathways, the top ranked genes differed considerably between both species. however, many of the genes found in mice and humans had similar functions, including inflammatory and acute phase responses (e.g., saa3, socs3 and mt2 in mice and cp, vnn2 and cxcl10 in humans), cell cycle progression (cdkn1a in mice and klf4 in humans) and bone and tissue modelling (mmp14, timp1, eln and ogn in mice and spp1 in humans). thus, despite discordance in the gene expression profiles between species, the similar functions of top ranked genes and concordance between pathways supports the likelihood of similar responses in the event of cbnp exposure in humans. in addition, fibrosis has been identified as an outcome of exposure to various particles and nps in animals (bermudez et al., 2004; shvedova et al., 2008) , including printex 90 (e.g., 28-day nose only inhalation in wistar wu rats) (bellmann et al., 2009) , as well as in humans (lkhasuren et al., 2007; wang and christiani, 2003) . the process of pulmonary fibrosis is closely related to progression of carcinogenic outcome (hubbard et al., 2000) . 
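the concordance figure quoted above (52% of the top 50 pathways shared between mouse and human) is an overlap between two ranked lists. a minimal sketch of that calculation is given below; the function name and the toy pathway labels are assumptions made for illustration, and the actual ranking in the study was produced by nextbio.

```python
def top_n_overlap(ranked_a, ranked_b, n):
    """Fraction of the top-n entries shared by two ranked lists.

    Assumes both lists hold at least n entries, ordered from the
    highest-ranked item to the lowest.
    """
    shared = set(ranked_a[:n]) & set(ranked_b[:n])
    return len(shared) / n, shared


# Toy ranked lists (hypothetical labels, not the study's pathways).
mouse_ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
human_ranked = ["p3", "p1", "p9", "p5", "p7", "p2"]

fraction, shared = top_n_overlap(mouse_ranked, human_ranked, n=5)
```

applied to the study's numbers, a 52% overlap among the top 50 corresponds to 26 shared pathways.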
these data demonstrating very similar fibrotic pathways in mice and humans and a significant overlap with cbnp-induced gene expression changes thus support the use of pathway-based approaches in identifying molecular mechanisms of disease onset and progression, and using gene expression profiles to support hhra. this study confirms several key elements that are necessary for the application of gene expression profiling for hhra of toxicant exposures in general. first, transcriptional profiles can effectively predict the biological effects of chemical exposures. specifically, in the absence of data for any apical endpoints, our data would have suggested that mice exposed to cbnps exhibit an inflammatory response, oxidative stress, dna damage and perturbations in cholesterol metabolism. second, a comparison of bmds and bmdls of relevant pathways and apical endpoints confirms that minimum pathway bmds and bmdls are in the same range as those of apical endpoints. third, that expression profiles can be fairly easily mined to identify potential adverse outcomes (i.e., diseases) that are relevant to humans, and might reasonably be expected to occur in humans exposed to substances that elicit specific gene expression patterns in experimental animals. we believe that our work constitutes a significant step towards the ultimate recognition of toxicogenomic endpoints for routine assessment of human health risk. gene expression profiling offers a promising approach to decipher the largely unknown hazards of np exposure. due to the unique properties of nps, powerful technologies that can assess a multitude of adverse outcome possibilities will be required to elucidate their modes of action and potential impacts on human health within a time-frame that is suitable for prompt regulatory decision making. this same premise should hold true for any new chemical products, for which toxicity is largely or completely unknown. 
in order to establish a strong foundation for the integration of gene expression profiling into hhra, it will be necessary for the approach employed here to be applied to a variety of additional chemicals/particles that span a wide range of toxicological potencies and modes of action, and using a variety of experimental designs (e.g., multiple doses and time-points). as our knowledge of molecular pathways, and of the diverse tools used to decipher their biological significance, dose-response characteristics and relevance to human disease continues to grow, we anticipate that toxicogenomics will become increasingly useful in assessing the toxicological hazards of a wide range of test articles, and by extension, for hhra. marchetti, lynn berndt-weis and miriam hill of health canada are thanked for reviewing and commenting on the original manuscript. this work was supported by the health canada genomics research and development initiative, and the chemical management plan. financial support for j. bourdon was through the natural sciences and engineering research council of canada. 
testing for mean and correlation changes in microarray experiments: an application for pathway analysis predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning rat inhalation test with particles from biomass combustion and biomass co-firing exhaust pulmonary responses of mice, rats, and hamsters to subchronic inhalation of ultrafine titanium dioxide particles cross-species comparisons of transcriptomic alterations in human and rat primary hepatocytes exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin lung fibrotic responses to particle exposure hepatic and pulmonary toxicogenomic profiles in mice intratracheally instilled with carbon black nanoparticles reveal pulmonary inflammation, acute phase response and alterations in lipid homeostasis carbon black nanoparticle instillation induces sustained inflammation and genotoxicity in mouse lung and liver toxicogenomics in risk assessment: applications and needs toxicogenomics: the challenges and opportunities to identify biomarkers, signatures and thresholds to support mode-of-action introduction to benchmark dose methods and u. s. epa's benchmark dose software (bmds) version 2.1.1 the toxcast program for prioritizing toxicity testing of environmental chemicals an association between air pollution and mortality in six u.s. cities t25: a simplified carcinogenic potency index: description of the system and study of correlations between carcinogenic potency and species/site specificity and mutagenicity systems biology and mode of action based risk assessment use of genomic data in risk assessment case study: ii. 
evaluation of the dibutyl phthalate toxicogenomic data set development and evaluation of a genomic signature for the prediction and mechanistic assessment of nongenotoxic hepatocarcinogens in the rat toxicogenomics: a pivotal piece in the puzzle of toxicological research genomic analysis of alachlor-induced oncogenesis in rat olfactory mucosa toxic effects of nanoparticles and nanomaterials: implications for public health, risk assessment and the public perception of nanotechnology pamr: pam: prediction analysis for microarrays bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists systematic and integrative analysis of large lists using david bioinformatics resources lung cancer and cryptogenic fibrosing alveolitis: a population-based cohort study mutation spectrum in fe1-mutatmmouse lung epithelial cells increased mutant frequency by carbon black, but not quartz, in the lacz and cii transgenes of mutatmmouse lung epithelial cells genotoxicity, cytotoxicity, and reactive oxygen species induced by single-walled carbon nanotubes and c(60) fullrenes in the fe1-mutatmmouse lung epithelial cells lung inflammation and genotoxicity following pulmonary exposure to nanoparticles in apoe−/− mice application of toxicogenomics in hepatic systems toxicology for risk assessment: acetaminophen as a case study inhaled particles and lung cancer. part a: mechanisms ontology-based meta-analysis of global collections of high-throughput public data disease-specific gene expression profiling in multiple models of lung disease occupational lung diseases and the mining industry in mongolia use of genomic data in risk assessment case study: i. 
evaluation of the dibutyl phthalate male reproductive development toxicity data set approach to assessment of risk to human health for priority substances under the canadian environmental protection act cardiovascular mortality and long-term exposure to particulate air pollution: epidemiological evidence of general pathophysiological pathways of disease r: a language and environment for statistical computing. r foundation for statistical computing tumor necrosis factor is not required for particle induced genotoxicity and pulmonary inflammation inflammatory and genotoxic effects of nanoparticles designed for inclusion in paints and laquers gene expression profiling of responses to dimethylarsinic acid in female f344 rat urothelium inhalation vs. aspiration of single-walled carbon nanotubes in c57bl/6 inflammation, fibrosis, oxidative stress, and mutagenesis toxicogenomics in predictive toxicology in drug development a method to integrate benchmark dose estimates with genomic data to assess the functional effects of chemical exposure application of transcriptional benchmark dose values in quantitative cancer and noncancer risk assessment integrating pathway-based transcriptomic data into quantitative chemical risk assessment: a five chemical case study overexpression of tumor necrosis factor-alpha in the lungs alters immune response, matrix remodeling, and repair and maintenance pathways diagnosis of multiple cancer types by shrunken centroids of gene expression science issue paper: mode of carcinogenic action for cadodylic acid and recommendations for dose response extrapolation revised reregistration eligibility decision document (red) for msma, dsma, and cama and cacodylic acid. 
environmental protection agency, washington occupational lung disease in china bmdexpress: a software tool for the benchmark dose analyses of genomic data the authors would like to acknowledge rusty thomas for early access to his bmdexpress software modified from the agilent platform and longlong yang for his technical support. we also thank mike walker for his helpful advice on bmd modelling. francesco none. supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.tox.2012.10.014. key: cord-311430-o32d3kaw authors: shahabi, vafa; berman, david; chasalow, scott d; wang, lisu; tsuchihashi, zenta; hu, beihong; panting, lisa; jure-kunkel, maria; ji, rui-ru title: gene expression profiling of whole blood in ipilimumab-treated patients for identification of potential biomarkers of immune-related gastrointestinal adverse events date: 2013-03-22 journal: j transl med doi: 10.1186/1479-5876-11-75 sha: doc_id: 311430 cord_uid: o32d3kaw background: treatment with ipilimumab, a fully human anti-ctla-4 antibody approved for the treatment of advanced melanoma, is associated with some immune-related adverse events (iraes) such as colitis (gastrointestinal irae, or gi irae) and skin rash, which are managed by treatment guidelines. nevertheless, predictive biomarkers that can help identify patients more likely to develop these iraes could enhance the management of these toxicities. methods: to identify candidate predictive biomarkers associated with gi iraes, gene expression profiling was performed on whole blood samples from 162 advanced melanoma patients at baseline, 3 and 11 weeks after the start of ipilimumab treatment in two phase ii clinical trials (ca184004 and ca184007). overall, 49 patients developed grade 2 or higher (grade 2+) gi iraes during the course of treatment. 
a repeated measures analysis of variance (anova) was used to evaluate the differences in mean expression levels between the gi irae and no-gi irae groups of patients at the three time points. results: in baseline samples, 27 probe sets showed differential mean expression (≥ 1.5 fold, p ≤ 0.05) between the gi irae and no-gi irae groups. most of these probe sets belonged to three functional categories: immune system, cell cycle, and intracellular trafficking. changes in gene expression over time were also characterized. in the gi irae group, 58 and 247 probe sets had a ≥ 1.5 fold change in expression from baseline to 3 and 11 weeks after first ipilimumab dose, respectively. in particular, on-treatment expression increases of cd177 and ceacam1, two neutrophil-activation markers, were closely associated with gi iraes, suggesting a possible role of neutrophils in ipilimumab-associated gi iraes. in addition, the expression of several immunoglobulin genes increased over time, with greater increases in patients with grade 2+ gi iraes. conclusions: gene expression profiling of peripheral blood, sampled before or early in the course of treatment with ipilimumab, resulted in the identification of a set of potential biomarkers that were associated with occurrence of gi iraes. however, because of the low sensitivity of these biomarkers, they cannot be used alone to predict which patients will develop gi iraes. further investigation of these biomarkers in a larger patient cohort is warranted. ipilimumab, a fully human monoclonal antibody that blocks cytotoxic t-lymphocyte antigen-4 (ctla-4) [1] , has been approved by the u.s. food and drug administration (fda) and several other regulatory agencies for the treatment of advanced metastatic melanoma (mm). the efficacy of ipilimumab has been demonstrated in a number of phase ii and two phase iii clinical trials in mm patients, where a significant prolongation of overall survival has been reported [2, 3] . 
treatment with ipilimumab is associated with a spectrum of immune-mediated aes, termed immune-related adverse events (iraes). gastrointestinal (gi) iraes such as diarrhea and colitis are among the most common ipilimumab-associated iraes [4]. in most cases the onset of gi iraes occurs after the second or third dose of ipilimumab [5] and these iraes are managed according to established treatment guidelines. in a previous report, examination of colon biopsies in a safety-focused clinical trial (ca184007) revealed abundant focal neutrophilic cryptitis and neutrophilic infiltration in the lamina propria of affected tissues from patients experiencing gi iraes. although administration of high doses of steroids often leads to successful and safe management of the majority of these iraes [5] [6] [7], identification of biomarkers that may predict these iraes before or soon after the start of treatment might improve patient care. in this context, peripheral blood biomarkers would be preferred, since collection of peripheral blood is less invasive than a colonic biopsy. to understand the underlying causes of ipilimumab-associated gi iraes and identify potential predictive biomarkers, gene expression profiling was performed on whole blood samples collected from metastatic melanoma patients before and during ipilimumab treatment in two phase ii clinical trials, ca184004 and ca184007 [6, 8]. a number of cell cycle- and immune-related genes were found to have higher expression at baseline and post-baseline in those patients who experienced gi iraes. in particular, increases in expression of neutrophil activation markers, cd177 and ceacam1, were found to be associated with the occurrence of gi iraes. in addition, greater increases in the expression of immunoglobulin-related genes were detected at weeks 3 and 11 in patients with gi iraes than in those without.
these results are consistent with our understanding of the mechanisms underlying ipilimumab-associated gi iraes and provide a list of potential peripheral blood biomarkers for early prediction of these iraes. the multicenter, phase ii clinical trial ca184004 enrolled 82 previously treated or untreated patients with stage iii (unresectable) or iv melanoma, randomized 1:1 into 2 arms to receive up to 4 intravenous infusions of either 3 or 10 mg/kg ipilimumab every 3 weeks in an induction phase. in trial ca184007, treatment-naïve or previously treated patients with stage iii (unresectable) or iv melanoma (n = 115) received open-label ipilimumab (10 mg/kg every 3 weeks for four doses) and were randomized to receive concomitant blinded prophylactic oral budesonide (9 mg/d with gradual taper through week 16) or placebo (4). exclusion criteria included the use of any immuno-suppressing treatments including corticosteroids (patients on stable doses of hormone replacement therapy were exempt), cyclosporine, mycophenolate mofetil (cellcept®), as well as chemotherapy and radiation, within 4 weeks prior to the first ipilimumab dose. complete study design, patient characteristics and endpoint reports of these trials were described elsewhere [6, 8]. gene expression profiles from twenty patients in ca184078 [9] who were treated with ipilimumab monotherapy at 10 mg/kg were used as a confirmation data set for the present analysis. these studies were conducted in accordance with the ethical principles originating from the current declaration of helsinki and consistent with international conference on harmonization good clinical practice and the ethical principles underlying european union directive 2001/20/ec and the united states code of federal regulations, title 21, part 50 (21cfr50). the protocols and patient informed consent forms received appropriate approval by all institutional review boards or independent ethics committees prior to study initiation.
all participating patients (or their legally acceptable representatives) gave written informed consent for these biomarker-focused studies. safety was evaluated using the national cancer institute common terminology criteria for adverse events, based on adverse events (aes), physical examinations, and clinical laboratory assessments. drug-related gastrointestinal aes consistent with immune-mediated events and the intrinsic biological activity of ipilimumab were examined and reported. adverse events were recorded based on meddra v10.0 system organ class and preferred terms. clinical activity (ca) was defined as confirmed complete response, confirmed partial response, or stable disease ending not earlier than 24 weeks from date of first ipilimumab dose. a complete description of irae and ca evaluations for these trials has been reported elsewhere [6, 8] . neutrophils were quantified as a component of the standard hematology panel. absolute neutrophil counts were available at various time points for most patients. whole blood samples for gene expression profiling were collected just prior to first dose of ipilimumab (baseline), and 3 and 11 weeks after first ipilimumab dose. total rna was extracted from whole blood samples using the paxgene blood rna mdx kit with a biorobot universal system (qiagen, valencia, ca), and purified by rneasy minelute cleanup kit using qiacube (qiagen, valencia, ca). rna concentration was measured by nanodrop spectrophotometer and rna integrity was evaluated on the agilent 2100 bioanalyzer (agilent technologies, santa clara, ca). complementary dna preparation and hybridization on ht-hg-u133a 96-array plates followed manufacturer' s protocols (affymetrix, santa clara, ca). affymetrix raw data (.cel files) were normalized with the robust multi-array analysis (rma) algorithm, obtained from www.bioconductor.org, version 1.20.2. 
appropriate affymetrix control probe sets were examined to ensure quality control for the cdna synthesis and hybridization steps. principal component analysis (pca) was subsequently performed to detect outlier samples (single samples that account for a high degree of variation in the data). no sample was removed as an outlier. anti-log rma values were used in subsequent statistical analyses. for the combined data from studies ca184004 and ca184007, 12518 of the 22215 non-control probe sets had a maximum expression level (rma normalized) of less than 32. these probe sets, with low expression levels across all samples, were excluded from further analysis, leaving 9697 probe sets.
gene expression data analyses
gi irae status was available for all 197 treated patients. gene expression data were available for 188 of these patients. for 162 of the patients, expression data were available for at least 2 of the 3 time points. only these patients were included in subsequent analyses. of these patients, 113 were classified in a no-gi irae group, which included patients with a worst-grade gi irae of 0 or 1. a total of 49 patients with grade 2 or greater (grade 2+) gi iraes were classified in a gi irae group. for the no-gi irae and gi irae groups, respectively, baseline expression data were available for 108 and 49 patients, week 3 data were available for 108 and 47 patients, and week 11 data were available for 78 and 30 patients. a repeated measures analysis of variance (anova) model was fit in partek genomics suite 6.6 (www.partek.com), with anti-log normalized expression level as dependent variable. explanatory variables included patient, time point within patient as a 3-level factor, and binary gi irae status, with no time-by-status interaction. because gi iraes were observed in all treatment arms, and because of the relatively small sample sizes in individual trials, data from patients in the two trials were combined to increase statistical power to detect associations.
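the low-expression filter described above (dropping probe sets whose maximum anti-log rma value across all samples is below 32) can be sketched as follows. this is an illustrative assumption of how such a filter might be written, not the authors' code: the function name, the dictionary layout and the probe set identifiers are hypothetical.

```python
def filter_low_expression(expression, threshold=32.0):
    """Keep probe sets whose maximum expression reaches the threshold.

    expression: dict mapping a probe set id to a list of anti-log
    (linear scale) RMA values, one value per sample.
    """
    return {
        probe_id: values
        for probe_id, values in expression.items()
        if max(values) >= threshold
    }


# Toy data with hypothetical probe set ids.
expression = {
    "probe_a": [10.0, 25.0, 31.9],    # max below 32: excluded
    "probe_b": [5.0, 40.0, 12.0],     # one sample above 32: kept
    "probe_c": [300.0, 280.0, 410.0],  # kept
}
kept = filter_low_expression(expression)

# Probe set counts reported in the text.
total_probe_sets = 22215
excluded_probe_sets = 12518
remaining_probe_sets = total_probe_sets - excluded_probe_sets
```

on the log2 scale that rma normally reports, a linear threshold of 32 corresponds to a value of 5.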
statistical inference based on this model focused on two hypothesis tests: a test of the null hypothesis that mean gene expression (averaged over time) was the same in the two gi irae status groups, and a test of the null hypothesis that mean gene expression (averaged over gi irae status) was the same for the three time points. an uncorrected p value of 0.05 was used as a cutoff to select probe sets with mean expression differences between comparison groups. the qvalue package (v1.20.0) in the r statistical computing environment (v2.15.0, http://www.r-project.org) was used to estimate false discovery rates (fdrs). expression of selected genes was confirmed by quantitative polymerase chain reaction (qpcr) as described previously [10] using pre-designed probes. functional interpretation of differentially expressed genes was performed using ingenuity pathway analysis (ipa) software (ingenuity systems), as described previously [10]. expression of each of 9697 non-control probe sets was analyzed individually. genes associated with gi irae status (grade 2+ vs. not) were selected by assessing the difference in mean pre-treatment expression between the gi and no-gi irae groups. two selection criteria were applied: a p value ≤ 0.05 for the hypothesis test comparing the gi and no-gi irae groups, and a minimum mean pre-treatment expression ratio of 1.5. for these tests, the p value threshold of 0.05 corresponded to an estimated fdr of 0.50. a set of 27 probe sets representing 24 unique genes met these criteria (additional file 1: table 1a). this list included a number of immune-related genes, such as cd3e, il2rg [11], cd37 [12], cd4, il32 [13], and rac2 [14]; cell cycle- and proliferation-associated genes such as sptan1 [15], banf1 [16], bat1 [17], pcgf1 [18], fp36l2 [19], and wdr1 [20]; and genes involved in intracellular vesicle trafficking such as picalm [21], snap23 [21], and vamp3 [22].
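the two selection criteria above (p value ≤ 0.05 and an expression ratio of at least 1.5) amount to a simple filter over per-probe-set statistics, sketched below. two assumptions are worth flagging. first, the ratio is treated symmetrically here (a ratio of 0.5 counts as a 2-fold difference), which the text does not state explicitly. second, the study estimated fdrs with the r qvalue package; the benjamini-hochberg adjustment shown here is a different, simpler procedure, included only to illustrate what an fdr-style correction does. all probe set ids and numbers are hypothetical.

```python
def select_probe_sets(stats, p_cutoff=0.05, min_fold=1.5):
    """Return ids of probe sets passing both criteria.

    stats: dict mapping a probe set id to (p_value, ratio), where ratio
    is the mean expression in one group divided by the other (> 0).
    """
    selected = []
    for probe_id, (p_value, ratio) in stats.items():
        fold = max(ratio, 1.0 / ratio)  # assumption: direction-agnostic
        if p_value <= p_cutoff and fold >= min_fold:
            selected.append(probe_id)
    return selected


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, in the input order.

    Not the qvalue method used in the study; shown as a stand-in.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy statistics (hypothetical ids and values).
stats = {
    "probe_a": (0.01, 1.8),  # passes both criteria
    "probe_b": (0.20, 2.5),  # fails the p value cutoff
    "probe_c": (0.03, 1.2),  # fails the fold cutoff
    "probe_d": (0.04, 0.5),  # 2-fold lower: passes under the assumption
}
selected = select_probe_sets(stats)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

an estimated fdr of 0.50 at the chosen cutoff suggests that roughly half of the selected probe sets could be false positives, which is consistent with the authors describing them as candidate biomarkers that need further investigation.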
some of these molecules, such as il32, snap23 and rac2, have been reported either to be present in neutrophils or to regulate their function [13, 14, 21]. to identify early on-treatment predictors of ipilimumab-associated gi iraes, post-baseline expression levels of the 9697 non-control probe sets were compared between the gi irae and no-gi irae groups. thirty-five and 47 probe sets were identified as having a mean expression ratio of at least 1.5 for the week 3 and 11 samples, respectively, and a p value ≤ 0.05 (fdr = 0.50) for the hypothesis test comparing the gi and no-gi irae groups. since most ipilimumab-associated gi iraes occur after the second or third dose of ipilimumab, the 35 probe sets differentially expressed at week 3 are of particular interest, as they might serve as early predictors to help identify patients who experience gi iraes after the second dose, given at week 3 (additional file 1: table 1b). the probe set that exhibited the largest differential expression corresponded to the neutrophil-specific marker, cd177, a glycosyl-phosphatidylinositol (gpi)-linked cell surface molecule [23]. there was no difference in baseline expression of cd177 between patients in the gi and no-gi irae groups. however, significantly higher cd177 expression was found after only one dose of ipilimumab in the gi irae group [12.2-fold higher in the gi irae group than in the no-gi irae group at week 3 (p = 7.6e-03)]. in addition, the increase in mean expression of cd177 from baseline to week 3 was much greater in the gi irae group than in the no-gi irae group (additional file 1: table 2). the mean increase in cd177 expression was not associated with clinical activity (ca) of ipilimumab (p = 0.23) (figure 1a). expression levels of cd177 in these samples were confirmed by quantitative pcr, showing statistically significant differences between the gi irae and no-gi irae groups (unadjusted p < 0.005) and changes over time (unadjusted p < 0.0001). 
these data suggest that cd177 was not only a potential early predictor of gi iraes but that increase in cd177 gene expression might also be a consequence of treatment with ipilimumab independent of its clinical activity. since cd177 is a neutrophil surface marker, we examined the relationship between pbanc and expression levels of cd177. mean pbanc increased gradually with ipilimumab treatment, with the largest apparent increases occurring between weeks 9 and 11 in patients with higher-grade gi irae ( figure 1b predictor of gi iraes associated with ipilimumab treatment than pbanc. expression of cd177 at week 3 had large inter-individual variability, with considerably higher values in 7 patients than in the rest ( figure 1c ). all seven patients with high levels of cd177 expression (rma-normalized expression level ≥ 8) at week 3 had grade 2+ gi iraes during the course of treatment, suggesting high specificity of this biomarker above this threshold. however, many patients with grade 2+ gi iraes had a normalized expression level < 8 for cd177 at week 3, suggesting low sensitivity of the biomarker in predicting gi iraes (additional file 2: figure s1 ). in week 11 post-baseline samples, high expression levels of cd177 were found in both the gi irae and no-gi irae groups. many patients with grade 2+ gi iraes had already discontinued treatment before this time point, so the data from week 11 might be biased by informative drop-out. we also explored the timing of the onset of gi iraes in individual patients to establish the value of cd177 expression as an early predictor. out of 44 patients with grade 2+ gi iraes who had matching gene expression data at week 3, 6 patients reported the first gi irae before or on day 21 ± 3 days (the nominal date of blood sample collection), whereas the other 38 patients had gi iraes reported after this date ( figure 1d ). 
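the distinction drawn above between high specificity and low sensitivity at a fixed expression threshold can be made concrete with a small python sketch. the function name `sensitivity_specificity` and the toy data are hypothetical; the values are invented for illustration and are not patient data from the study. the sketch assumes that both outcome classes are present in the input.

```python
def sensitivity_specificity(levels, had_event, threshold):
    """Score the rule 'expression level >= threshold predicts an event'.

    levels: expression values, one per patient.
    had_event: booleans, True if the patient had a grade 2+ event.
    Assumes at least one patient with and one without an event.
    """
    tp = fn = fp = tn = 0
    for level, event in zip(levels, had_event):
        predicted = level >= threshold
        if event and predicted:
            tp += 1      # event correctly flagged
        elif event:
            fn += 1      # event missed by the threshold
        elif predicted:
            fp += 1      # flagged, but no event occurred
        else:
            tn += 1      # correctly not flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Invented values: four patients with an event, four without.
toy_levels = [9.1, 8.4, 6.0, 5.5, 7.2, 5.0, 6.1, 4.8]
toy_events = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(toy_levels, toy_events, 8.0))
```

in this toy example no patient without an event exceeds the threshold (specificity 1.0), while half of the patients with an event fall below it (sensitivity 0.5), which mirrors the qualitative pattern reported for cd177 at week 3.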
the four highest values of cd177 expression were detected in patients who reported the first grade 2+ gi irae between days 26 (2-5 days after blood sample collection) and 43 after receiving the first dose of ipilimumab. of note, one of the patients who had high cd177 expression at week 3 reported a grade 2 gi irae on day 72 (figure 1d, marked with a gray shaded circle), which progressed to a grade 4 event that ultimately led to fatal gi perforation. these data suggest that, although a considerable increase in cd177 gene expression was closely associated with the onset of gi iraes, early increases might also predict gi iraes that could develop much later. however, since cd177 had low sensitivity, this biomarker could not identify most future gi iraes. association between expression of cd177 and other neutrophil-associated genes cd177 is a glycoprotein expressed by neutrophils, neutrophilic metamyelocytes, and myelocytes, but not by any other blood cells [24, 25]. therefore we specifically searched for other neutrophil-associated genes, to better understand the implication of this granulocyte subtype as an early predictor of gi iraes. these included genes encoding granule-associated proteins such as olfactomedin 4 (olfm4) [26], azurocidin 1 (azu1), lactoferrin (ltf) [27], cathelicidin (camp), myeloperoxidase (mpo) [28], bactericidal/permeability increasing protein (bpi), defensin (defa4) [29], neutrophil elastase (elane) [30], cathepsin g (ctsg) [30], ceacam6, ceacam8 [31], and ceacam1 (which mediates neutrophil adhesion to endothelial cells and facilitates their transmigration into tissues) [32]. the genechip ht-hg-u133a includes probe sets for many of these genes (additional file 1: table s3). 
although apparently greater expression of each of these genes was found in those samples with high levels of cd177 expression (figure 2a), only the expression of ceacam1 was significantly linearly correlated with that of cd177 at week 3 (r = 0.75, p = 7.2e-30 between 219669_at, probe set for cd177, and 206576_s_at, probe set for ceacam1). consequently, the pattern of mean gene expression of ceacam1 over time was similar to that for cd177 (figures 1a and 2b). fifty-eight and 247 probe sets were identified as having at least a 1.5-fold change from baseline in week 3 and 11 samples, respectively, and p ≤ 0.05 (fdr = 0.055) for the test of a time effect on expression (additional file 1: tables 2 and s1). we performed a pathway analysis using the ipa software on the 247 differentially expressed probe sets since the size of this gene set was amenable to such analysis. the top biological processes that exhibited changes during ipilimumab treatment included pathways of cell proliferation and metabolism, and immune-related pathways such as il-10 signaling, il-8 signaling, and b cell development (figure 3a). while cd177 exhibited the largest change from baseline at both time points, immunoglobulin-related genes dominated both lists, including igha1, igha2, ighg1, and ighv4-31, all of which showed significant increases in expression 11 weeks after baseline. slight increases in expression of some of these genes were already apparent in week 3 samples, suggesting an early onset of antibody response with much larger expansion at later time points (additional file 1: table 2 and figure 3b). notably, increases in expression of these genes over time were more prominent in the gi irae group than in the no-gi irae group. there was no corresponding increase in b-cell marker genes (such as cd20), suggesting that the cell types responsible for the increased expression of immunoglobulin genes may not have been b-cells, but later-stage b-lineage cells such as plasma cells [33]. 
whole blood samples from the clinical study ca184078 were independently analyzed using the same statistical model. in this study, 20 patients were treated with ipilimumab monotherapy at 10 mg/kg every 3 weeks for 4 doses. the mean cd177 expression ratio comparing gi irae and no-gi irae groups at week 3 and 11 was 4.3 and 12.0, respectively, with no significant difference at baseline. in the gi irae group, the mean fold change from baseline to week 3 and 11 was 4.8 and 15.3, respectively. by contrast, in the no-gi irae group, these changes were negligible (1.1 and 1.2, respectively). expression changes similar to those seen in the other two studies were observed for ceacam1 and most of the granule-associated genes, with significant changes from baseline to week 11 in the gi irae group but not in the no-gi irae group (additional file 1: table s2 ). treatment with ipilimumab has been shown to prolong overall survival in advanced melanoma patients in 2 phase 3 trials [2, 34] . gastrointestinal iraes such as diarrhea and colitis are among the most common and sometimes severe forms of adverse events associated with ipilimumab. these adverse events are currently managed following treatment guidelines. however, identification of predictive biomarkers is important as it may enhance the management of these toxicities and improve patient care. the present study was undertaken to identify such biomarkers in peripheral blood. in a previous report of the ca184007 trial in which colonic biopsies were collected, changes in the colonic mucosa following onset of diarrhea appeared more severe than those observed in the pre-specified biopsies following the first dose of ipilimumab. histopathologic examination of the biopsies revealed active colitis with marked neutrophilic infiltration into lamina propria to be the most striking characteristic of the affected tissue. 
in these biopsies, foci of neutrophilic cryptitis, crypt abscesses, glandular destruction, and erosions of the mucosal surface were also apparent early after the start of treatment in those patients who presented with gi irae symptoms [5] . however, collection of colonic biopsies is considered an invasive procedure and therefore, peripheral-blood surrogate markers would be preferred. gi irae onset is most commonly observed after the second or third doses of ipilimumab. gene expression profiling using pre-treatment blood samples identified a few immune-related genes with higher baseline expression in patients who developed gi iraes than in those who did not, including cd3e, il2rg, cd4, cd37 and il-32. interleukin 32 (natural killer protein-4) might be the most interesting gene from this list, as this cytokine has been reported to play a central role in acute flares of inflammatory bowel disease [35] , as well as other autoimmune diseases such as rheumatoid arthritis [36] . il32 is selectively expressed by activated cytotoxic t and natural killer (nk) cells [37] , potentiates the effect of il-2 and il-18 and activates the innate immune system (monocytes and macrophages) to secrete chemotactic factors such as tumor necrosis factor-alpha (tnf-α), cxcl2 (11) in a number of studies [38, 39] . in the current study, although treatment-induced changes in the expression of il-8 were not significant, pathway analysis found il-8 signaling to be one of the top 10 pathways changed during treatment ( figure 3a ). an increase in expression of neutrophil-associated genes was noticeable shortly after the start of treatment. in particular, significantly greater mean expression of the neutrophil surface marker cd177 was detected in a subset of patients who experienced gi iraes of grade 2 or greater. neutrophil granulocytes are the most abundant type of white blood cells in mammals and form an essential part of the innate immune system. 
during the acute phase of inflammation, particularly as a result of bacterial infection, environmental exposure [40], and some cancers [41, 42], neutrophils are among the first responders among inflammatory cells to migrate towards the site of inflammation. they transmigrate through the blood vessels, then through interstitial tissue, following chemical signals such as il-8, c5a, and leukotriene b4 in a process called chemotaxis [43]. cd177 is a unique marker for neutrophils [24, 25] and is up-regulated upon neutrophil activation during acute inflammatory responses toward stimuli such as bacterial infections [44]. a significant increase in the mean expression of this marker was detected in patients who experienced moderate to severe gi iraes, independently of their clinical response to ipilimumab. indeed, the seven patients with the highest expression levels of cd177 (normalized expression level > 8) at week 3 were those who already had or would experience gi iraes within a few days to three weeks after that time point. currently, gi iraes during ipilimumab treatment are managed, according to treatment guidelines, by cessation of ipilimumab in combination with treatment with corticosteroids or tnf-α blockade. ca184004 was one of the earlier ipilimumab monotherapy trials, where these guidelines were not yet fully in place. in that trial, two patients died of severe gi iraes and intestinal perforation. one of these patients provided both baseline and week 3 blood samples for gene expression analysis. a 42.5-fold increase in cd177 expression from baseline to week 3 was apparent almost 50 days before the onset of the first gi irae episode, suggesting the early predictive value of this biomarker for severe gi iraes. 
cd177 has been shown to recognize an endothelial cell junction molecule, platelet endothelial cell adhesion molecule-1 (pecam-1), which contributes to interactions between neutrophils and endothelial cells, mediating trans-endothelial migration in the context of inflammatory cell recruitment [45] . another marker expressed by activated neutrophils, ceacam1, also showed a consistent increase in mean expression from baseline to 3 or 11 weeks after first ipilimumab dose, with a greater increase in the gi irae group. whereas cd177 is mostly an activation marker for neutrophils, ceacam1 mediates adherence of activated neutrophils and other hematopoietic cells (nk and t cells) to cytokine-activated endothelium, [46, 47] and has been suggested to play a role in immune-mediated diseases of the intestine. elevated ceacam1 expression has been reported in t cells of the lamina propria of small intestine in patients with celiac disease and in the large intestine of those with inflammatory bowel disease (ibd) [48] . the apparent association between neutrophil count, cd177 gene expression, and ipilimumab-associated gi iraes led us to search for other neutrophil-associated genes in the microarray data. degranulation is the process by which neutrophils release an assortment of proteins [49] such as mpo, defa4 [29] and elane [30] into the extracellular space. most of these genes were included in a neutrophil module reported by chaussabel et al. [33] . although a trend for greater expression of these genes was found in those samples with the highest levels of cd177 expression (figure 2a) , there was no statistically significant linear correlation between their expression and that of cd177 (data not shown). 
in fact, the mean expression of most of these genes in week 3 samples was lower than baseline (additional file 1: table 2 , left panel), when the greatest increase in cd177 was detected in some patients, suggesting that their expression might be repressed during the early neutrophil activation event. conversely, a significant increase in the mean expression of these genes was observed at week 11 (additional file 1: table s1 ), supporting the notion that the degranulation process follows that of the neutrophil activation event. these observations were also confirmed in another data set from an independent ipilimumab clinical trial, ca184078, in which higher mean expression of cd177 and ceacam1 were found in the gi irae group. in addition, our list of potential early predictors of gi iraes shared a number of common elements with sets of genes reported to confer resistance to intravenous corticosteroid therapy in children with ulcerative colitis [50] . genes shared between the two studies included cd177, ceacam1, olfm4, mmp8, bpi, clc, hp, and lcn2. in that study, post-baseline samples were collected only three days after the start of the corticosteroid treatment with significant differential expression of these genes between the steroid-resistant and sensitive patients. in our study, the earliest post-baseline blood samples were collected 3 weeks after the first dose of ipilimumab, and the major changes in gene expression occurred within this time period, suggesting that it might be possible to detect this predictive profile at an even earlier time within this period. in any event, these changes tended to precede significant increases in the number of peripheral neutrophils, suggesting that the proliferation of neutrophils occurs after the activation event and that changes in gene expression may serve as more sensitive biomarkers than increase in peripheral blood absolute neutrophil count (pbanc). 
another interesting finding in our analysis was the considerable increase in the number and expression of immunoglobulin-related genes at 3 and 11 weeks after first dose of ipilimumab in patients who had gi iraes. this is in accordance with previous reports on the inhibitory effects of ctla-4 on immunoglobulin and cytokine production by plasma cells [51] or its inhibitory effect on cd4 + t cells that mediate t cell help to b cells during antibody production. ctla-4 blockade by ipilimumab is likely to reduce this inhibition. in healthy people, the humoral response to enteric flora is maintained in homeostasis. dysregulation of this homeostasis, manifested as increasing antibody levels to select enteric microorganisms, is characteristic of gastrointestinal disorders such as ibd but not acute gi inflammation (i.e., diverticulitis/infection) (7) (8) (9) . in a previous report from the ca184007 trial, ipilimumab was found to induce antibody responses to selected enteric flora such as pseudomonas anti-i2, saccharomyces cerevisiae, and cbir flagellin. however, no strong association between a positive level of antibody responses toward these specific bacteria and gi iraes was observed. although gene expression profiling could not provide information regarding the specificity of the induced antibodies, it still indicates that in patients experiencing gi iraes, the immunoglobulin production machinery had been turned on. in the absence of infections by external pathogens, this response may well be due to the generation of antibodies to self antigens or those expressed by the intestinal flora. we have identified early changes in gene expression in patients treated with ipilimumab that in some patients might predict the incidence of later gi iraes. these gene expression changes, together with prior histopathologic examination of the affected tissue, point to an important role of neutrophils in the onset of gi iraes in these patients. 
high expression of cd177 at week 3 was a very specific biomarker for grade 2+ gi iraes, as all patients who had no such event displayed expression levels below a certain threshold (normalized expression level = 8). however, because of its low sensitivity as a biomarker, cd177 expression alone cannot be used to predict which patients will develop gi iraes, which may occur in patients with low cd177 expression. the earliest on-treatment sample collection was 3 weeks after the first ipilimumab dose. therefore it is not clear how long before the onset of gi iraes the increases in cd177 expression occurred. this study identified potential biomarkers of ipilimumab toxicity that have biological plausibility. however, validation in a larger controlled trial is needed to assess potential clinical utility. additional file 1: table 1. lists of potential predictive or early-predictive biomarkers. (a) probe sets with ≥ 1.5-fold greater mean baseline expression in blood samples from patients with grade 2+ gi iraes than in those from patients with no grade 2+ gi irae (highlighted column). mean expression ratio in the post-baseline samples as well as p values for the test of a difference at baseline in mean expression between the two gi irae groups also are shown. (b) probe sets with a mean expression ratio of at least 1.5 for the comparison of the gi irae and no-gi irae groups at week 3 (highlighted column). mean expression ratio at other time points as well as p values for the test of a difference in week 3 mean expression between the two gi irae groups also are shown. mean expression ratio: positive values give (mean expression in the gi irae group)/(mean expression in the no-gi irae group); negative values give negative of (mean expression in the no-gi irae group)/(mean expression in the gi irae group). table 2. lists of potential pharmacodynamic biomarkers. 
probe sets with ≥ 1.5-fold change in mean gene expression from baseline to week 3 (left panel) or week 11 (right panel; only the top 58 probe sets shown) in the gi irae group (highlighted columns). fold changes in the no-gi irae group as well as p values for the test of a difference between baseline and post-baseline expression also are shown. mean fold change: positive values give mean of (post-baseline expression)/(baseline expression); negative values give negative mean of (baseline expression)/(post-baseline expression). table 3. granule-associated gene expression profiles. (a) mean expression ratio comparing the gi irae and no-gi irae groups for granule-associated genes at each time point. p value for the test of a difference in mean expression between the two gi irae groups (averaged over the three time points) also is shown. (b) mean fold change from baseline (bl) in the gi irae and no-gi irae groups for granule-associated genes. p value for the test of a difference in mean expression among the three time points (averaged over the two groups) is shown. for definitions of "mean expression ratio" and "mean fold change", see legends for tables 1 and 2, respectively. table s1. complete list of potential pharmacodynamic biomarkers. probe sets with ≥ 1.5-fold mean change in gene expression from baseline to week 11 in the gi irae group (highlighted column). fold changes in the no-gi irae group as well as p values for the test of a difference between baseline and post-baseline expression also are shown. mean fold change: positive values give mean of (post-baseline expression)/(baseline expression); negative values give negative mean of (baseline expression)/(post-baseline expression). table s2. granule-associated gene expression profiles in ca184078. (a) mean expression ratio comparing the gi irae and no-gi irae groups for granule-associated genes at each time point. 
p value for the test of a difference in mean expression between the two gi irae groups (averaged over the three time points) also is shown. (b) mean fold change from baseline (bl) in the gi irae and no-gi irae groups for granule-associated genes. p value for the test of a difference in mean expression among the three time points (averaged over the two groups) is shown. mean expression ratio: positive values give (mean expression in the gi irae group)/(mean expression in the no-gi irae group); negative values give negative of (mean expression in the no-gi irae group)/(mean expression in the gi irae group). mean fold change: positive values give mean of (post-baseline expression)/(baseline expression); negative values give negative mean of (baseline expression)/(post-baseline expression). additional file 2: figure s1 . roc curve of cd177 expression at week 3 as a predictor of gi irae. the plot included 155 patients with known gi irae status and cd177 expression values. authors' contributions db, zt, sdc, mjk designed the two biomarker focused clinical trials. lw, bh, lp performed the pcr assays and prepared rna for affymetrix microarray analysis. mjk assisted in data analysis and interpretation. vf, sdc, zt, rj performed the data analysis and interpretation, and prepared the manuscript. all authors read and approved the final manuscript. 
checkpoint blockade in cancer immunotherapy improved survival with ipilimumab in patients with metastatic melanoma overcoming immunologic tolerance to melanoma: targeting ctla-4 with ipilimumab (mdx-010) enterocolitis in patients with cancer after antibody blockade of cytotoxic t-lymphocyte -associated antigen 4 blockade of cytotoxic t-lymphocyte antigen-4 by ipilimumab results in dysregulation of gastrointestinal immunity in patients with advanced melanoma a randomized, double-blind, placebo-controlled, phase ii study comparing the tolerability and efficacy of ipilimumab administered with or without prophylactic budesonide in patients with unresectable stage iii or iv melanoma infliximab in the treatment of anti-ctla4 antibody (ipilimumab) induced immune-related colitis a prospective phase ii trial exploring the association between tumor microenvironment biomarkers and clinical activity of ipilimumab in advanced melanoma assessment of pharmacokinetic interaction between ipilimumab and chemotherapy in a randomized study. 
35th esmo congress abstract an immune-active tumor microenvironment favors clinical response to ipilimumab structure of the quaternary complex of interleukin-2 with its alpha, beta, and gammac receptors targeting cd37-positive lymphoid malignancies with a novel engineered small modular immunopharmaceutical interleukin-32: a cytokine and inducer of tnfalpha human neutrophils coordinate chemotaxis by differential activation of rac1 and rac2 human alpha spectrin ii and the fanconi anemia proteins fanca and fancc interact to form a nuclear complex baf: roles in chromatin, nuclear structure and retrovirus integration the bat1 gene in the mhc encodes an evolutionarily conserved putative nuclear rna helicase of the dead family polycomb group and scf ubiquitin ligases are found in a novel bcor complex that is recruited to bcl6 targets molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (sars-cov) aip1/wdr1 supports mitotic cell rounding clathrin assembly lymphoid myeloid leukemia (calm) protein: localization in endocytic-coated pits, interactions with clathrin, and the impact of overexpression on clathrinmediated traffic identification of a cellubrevin/vesicle associated membrane protein 3 homologue in human platelets nb1, a new neutrophil-specific antigen involved in the pathogenesis of neonatal neutropenia analysis of the expression of nb1 antigen using two monoclonal antibodies biochemical characterization of the neutrophil-specific antigen nb1 pdp4, a novel glycoprotein secreted by mature granulocytes, is regulated by transcription factor pu immunocytochemical localization of lactoferrin in human neutrophils. 
an ultrastructural and morphometrical study neutrophil granules in health and disease antibiotic proteins of human polymorphonuclear leukocytes neutrophil elastase, proteinase 3, and cathepsin g as therapeutic targets in human diseases subcellular localization and mobilization of carcinoembryonic antigen-related cell adhesion molecule 8 in human neutrophils interdependency of ceacam-1, -3, -6, and −8 induced human neutrophil adhesion to endothelial cells a modular analysis framework for blood genomics studies: application to systemic lupus erythematosus ipilimumab plus dacarbazine for previously untreated metastatic melanoma epithelial overexpression of interleukin-32alpha in inflammatory bowel disease interleukin-32, ccl2, pf4f1 and gfd10 are the only cytokine/chemokine genes differentially expressed by in vitro cultured rheumatoid and osteoarthritis fibroblast-like synoviocytes quiescent phenotype of tumor-specific cd8+ t cells following immunization mechanisms of inflammatory liver injury: adhesion molecules and cytotoxicity of neutrophils interleukin-8 and markers of neutrophil degranulation in pleural effusions subclinical responses in healthy cyclists briefly exposed to traffic-related air pollution: an intervention study the potential role of neutrophils in promoting the metastatic phenotype of tumors releasing interleukin-8 the interleukin-8 pathway in cancer human neutrophil migration into skin chambers is associated with production of nap-1/il8 and c5a neutrophil cd177 (nb1 gp, hna-2a) expression is increased in severe bacterial infections and polycythaemia vera the neutrophil-specific antigen cd177 is a counter-receptor for platelet endothelial cell adhesion molecule-1 (cd31) ceacam1: contact-dependent control of immunity two new synthetic peptides from the n-domain of ceacam1 (cd66a) stimulate neutrophil adhesion to endothelial cells dissection of the inflammatory bowel disease transcriptome using genome-wide cdna microarrays neutrophils: molecules, 
functions and pathophysiological aspects gene expression changes associated with resistance to intravenous corticosteroid therapy in children with severe ulcerative colitis inhibitory receptors cd85j, lair-1, and cd152 down-regulate immunoglobulin and cytokine production by human b lymphocytes gene expression profiling of whole blood in ipilimumab-treated patients for identification of potential biomarkers of immune-related gastrointestinal adverse events key: cord-318576-dc5n6ni4 authors: jitobaom, kunlakanya; phakaratsakul, supinya; sirihongthong, thanyaporn; chotewutmontri, sasithorn; suriyaphol, prapat; suptawiwat, ornpreya; auewarakul, prasert title: codon usage similarity between viral and some host genes suggests a codon-specific translational regulation date: 2020-05-08 journal: heliyon doi: 10.1016/j.heliyon.2020.e03915 sha: doc_id: 318576 cord_uid: dc5n6ni4 the codon usage pattern is a specific characteristic of each species; however, the codon usage of all of the genes in a genome is not uniform. intriguingly, most viruses have codon usage patterns that are vastly different from the optimal codon usage of their hosts. how viral genes with different codon usage patterns are efficiently expressed during a viral infection is unclear. an analysis of the similarity between viral codon usage and the codon usage of the individual genes of a host genome has never been performed. in this study, we demonstrated that the codon usage of human rna viruses is similar to that of some human genes, especially those involved in the cell cycle. this finding was substantiated by its concordance with previous reports of an upregulation at the protein level of some of these biological processes. it therefore suggests that some suboptimal viral codon usage patterns may actually be compatible with cellular translational machineries in infected conditions. the genetic code is degenerate. 
there are 61 triplet codons coding for 20 amino acids and 3 stop codons [1] . therefore, each amino acid is encoded by several codons, with the exception of two amino acids (methionine and tryptophan). this codon redundancy results in synonymous codon usage, whereby one amino acid is encoded by 2, 4, or 6 codons [2] . several previous studies have revealed that synonymous codons are utilized with different frequencies and are not randomly used by different genomes or genes. this non-randomness is referred to as codon usage bias [3, 4] . each species preferentially uses different synonymous codons [5] . this results in a species-specific codon usage bias. similarly, there are several trna species that carry the same amino acid. these trna species are called isoacceptors [6] . codons and anti-codons in trna do not interact in a one-to-one fashion [7] . base pairing at the third codon position is wobble; for example, g can pair with both cytosine (c) and uracil (u) [8] . it has been demonstrated that trna modification directly affects trna and mrna wobble base pairing [9, 10] . both the available trna isoacceptors and trna modification change with the cell cycle, and they can be altered by cellular stresses [11, 12] . for efficient protein translation, the codon usage pattern should correlate with the population of available trna isoacceptors [13] . in the cellular stress-response, the alteration could enhance the expression of stress-response genes, with the codon usage patterns compatible with the changed trna modifications. those genes shown to be regulated by this codon-specific manner are called modification tunable transcripts (motts) [12, 14] . two major models have been proposed to explain the causes of codon usage bias: mutation pressure, and translational selection [15] . as to mutation pressure, it is believed that gc content is the major factor driving codon usage bias [16, 17] . 
the high mutation rates of some nucleotides or codons result in nucleotide substitution that might contribute to lower frequencies of some nucleotides and codons [15]. mutation pressure has been suggested to be the most important factor determining the codon usage bias in human rna viruses [18, 19, 20]. however, there are correlations between the codon usage bias and other factors related to translation efficiency (such as available trnas, mrna secondary structure, translation elongation rate, and the intragenic and intergenic codon bias) that cannot be explained by mutation pressure. this suggests that translational selection also influences codon usage bias [15]. the translational selection acts on codon usage bias to achieve efficient and accurate translation. the use of codons correlates with abundant trnas, resulting in a higher translation rate [21, 22, 23]. a correlation between codon usage bias and abundant trnas has been found in prokaryotes (such as e. coli [24]) and in some eukaryotes (such as s. cerevisiae [25], c. elegans [22], drosophila [23], and human [26]). however, rare codons are preferred to encode some specific sets of genes or regions of genes, for instance, to enable protein oscillation in different phases of the cell cycle, slow down protein translation across the membrane, and reduce ribosome jamming and mrna secondary structure at the 5′ end of coding sequences [11, 27, 28, 29]. therefore, the translational selection acting on the optimization of frequent and rare codon utilization is important in appropriate gene translation. viral replication is dependent on the cellular machineries of the host cells. thus, one would intuitively think that the codon usage of a viral genome should match that of its host in order to be efficiently expressed. surprisingly, however, most viruses have codon usage patterns that are different from the codon usage preference of their hosts [30, 31, 32, 33]. 
a previous study indicated an alteration in the cellular trna level after the infection of human immunodeficiency virus type 1 (hiv-1) [34] . in contrast, the cellular trna level was found to be unchanged following vaccinia and influenza a virus (iav) infection, whereas an alteration in the polysome-associated trna population was observed, particularly the population of polysome trna isoacceptors correlated with viral codon usage [35] . these findings suggest that the codon usage pattern and the regulation of translational machineries may influence gene expression in some viruses. in this study, we investigated the relationship between the codon usage bias of human genes and human rna viruses. it is generally believed that the codon usage bias of viruses differs from that of human genes; however, various human genes possess various codon usage patterns [19, 26] . in addition, intragenic codon biases had previously been reported in humans and mice [36] . a comparison of the codon usage at the genome level is therefore too generalized; a more precise comparison at the single-gene level may provide a better insight into the viral codon usage bias. a total of 20,190 major transcript variants of human protein coding sequences were recruited from the gencode database (version 26). the protein coding sequences of 77 human rna viruses were downloaded from the ncbi database. the open reading frames (orfs) of the protein coding sequences were rechecked by orffinder (ncbi) before performing the rscu calculation. the rscu is a simple parameter that represents the codon usage bias of synonymous codons in a coding sequence. in our analysis, the rscu was calculated from the protein coding sequences of the human and rna viruses. the rscu of each gene consists of 59 values corresponding to 59 synonymous codons; thus, the pca was performed to simplify the data to a smaller number of principal factors as a summary feature of the codon usage pattern of each gene. 
the pca successfully reduced the 59 values of each rscu into two significant components. the rscus of the human genes and rna viruses were represented by the coordinates of principal component 1 (pc1, x-axis) and principal component 2 (pc2, y-axis) plotted on the pca of an rscu graph (figure 1). the rscu and pca of the rscus of the human genes and human rna viruses are shown in supplementary file 1. in figure 1, the pcas of the rscus of the human genes were represented with a transparent black dot; genes with a similar rscu were located in the same area of the graph. the number of human genes located in each quadrant was counted: upper left, 6,020 genes; lower left, 4,227 genes; upper right, 4,664 genes; and lower right, 5,281 genes. interestingly, many human genes were located densely in the right quadrants, specifically, between x = 0.95 to 1.7 and y = -0.7 to 0.6. most rna viruses were also located in the right quadrants. additionally, negative-sense single-strand rna viruses (-ssrna), ambisense rna viruses (ambi), and hiv-1 viruses were located in the area of the right quadrants where the human genes were densely located. however, a great variation was observed in some groups of rna viruses, especially positive-sense single-strand rna viruses (+ssrna), double-strand rna viruses (dsrna), and retroviruses (retro).
figure 1. the pca of the rscu. the rscu of human genes and rna viruses were subjected to pca. then, the simplified rscu values of human genes and rna viruses were plotted on the graph as the coordinates of component 1 (x) and 2 (y). the color keys indicate the groups of rna viruses. human genes are represented using transparent black dots.
the relationship between the pca of the rscu analysis and the codon adaptation index (cai) was investigated. cai is a common parameter used to assess the codon usage bias of a gene. 
it is calculated from the frequency of the overall codons in a given protein coding sequence with respect to a reference set of genes [37]. in our analysis, the human codon usage table, which is the average codon frequency of a human genome, was used as the reference set. from the results (figure 1), the pca of the rscu of the human codon usage table was plotted near the x and y intercepts, showing the average codon usage pattern of all human genes. a number of human genes in the pca of the rscu graph were selected and subjected to the cai calculation. graphs of cai against pc1 and of cai against pc2 were plotted; the linear regression and pearson correlation coefficient (pcc) were subsequently analyzed. from figure 2a, it was found that the cai of genes gradually decreased with an increase in pc1 (r² = 0.7958, pcc = -0.8921, p-value < 0.0001), while a positive correlation was observed between cai and pc2 (r² = 0.5824, pcc = 0.7631, p-value < 0.0001). the percentages of the gc content at the third position of the codon (%g+c(3)) of the human genes were also determined. in a similar way to cai (figure 2b), %g+c(3) gradually decreased with an increase in pc1 (r² = 0.9459, pcc = -0.9726, p-value < 0.0001). a weak correlation between pc2 and %g+c(3) was observed (r² = 0.4373, pcc = 0.6613, p-value < 0.0001). thus, the pca of the rscu analysis could be used to characterize the heterogeneity of the codon usage bias in the human genome, in which genes in the left-upper quadrant contain more optimal codon usage for high expression, whereas those in the right quadrants near or below the x-axis have less optimal codon usage. the pcas of rscus of the human genes that coded for highly expressed proteins were plotted on a graph. the highly expressed proteins of humans had been previously identified using the proteomic approach (figure 1; see the gene list in supplementary table 1) [38]. 
from figure 1 , most of the highly expressed proteins were located in the left quadrants. this is in agreement with our analysis showing the relationship between pca and cai, and it supports the validity of using pca to predict cai. from the pca of the rscu graph (figure 1) , it was demonstrated that most of the human rna viruses were located in the right quadrants. the degree of difference in the codon usage pattern varied among the groups of viruses. this feature was also observed intragroup. the highest pc1 (x) belonged to rotavirus, indicating a high degree of difference in codon usage pattern compared to human genes and other rna viruses. in particular, there were a number of human genes with rscus similar to rna viruses (figure 1 ), especially -ssrna, ambi, and hiv viruses, which were located in the right quadrants, where human genes were also located densely. to investigate the kinds of human genes with codon usage patterns similar to rna viruses and the contributions of those genes in important biological processes, the human genes plotted in the same area with each subgroup of rna viruses were retrieved. the criteria for selection of the human genes with rscus similar to rna viruses are described in the methods section. figure 3 represents the selected human genes with rscus similar to rna viruses. these human genes were subjected to gene ontology (go) enrichment analysis using go-termfinder to identify the overrepresented go terms in biological processes [39] . revigo was then used to categorize the redundant go terms [40] . the number of human genes with rscus similar to rna viruses are listed in table 1 . the results ( figure 4) show the significant go terms in the biological processes of human genes with rscus similar to rna viruses. 
only human genes retrieved from six subgroups of rna viruses resulted in significant enrichment, namely, +ssrna (subgroups 6, 7), -ssrna (subgroups 3, 4), retro (hiv-1), and ambi (subgroup 4), where the human genes recruited from +ssrna (subgroups 6, 7), -ssrna (subgroup 4), retro (hiv-1), and ambi (subgroup 4) were in the same or adjacent area. the human genes with rscus similar to +ssrna (subgroup 6), -ssrna (subgroup 4), retro (hiv-1), and ambi (subgroup 4) shared similar go terms in biological processes, including the cell cycle, the regulation of the cell cycle process, cell division, microtubule cytoskeletal organization, chromosome segregation, dna repair, macromolecule catabolism, and cellular localization (figure 4a, b, d, e, and f), while human genes with rscus similar to +ssrna (subgroup 3) were related to rna processing (figure 4c). the list of enriched go terms in biological processes is shown in supplementary file 2. from the previous section, we demonstrated that human genes in the go terms of the cell cycle and the regulation of the cell cycle process adopt codon usage patterns similar to those of +ssrna (subgroups 6, 7), -ssrna (subgroup 4), retrovirus (hiv-1), and ambisense (subgroup 4) viruses. to confirm that the codon usage patterns of these rna viruses are similar to human genes in the cell cycle and the regulation of the cell cycle process, the cell cycle codon score (cccs) was used to evaluate the similarity of the codon usage pattern between that of viral genes and a set of cell cycle-regulated human genes (top-600 set) [11]. the cccs of human rna viruses had been calculated and is detailed in table 2. a positive cccs indicates that a gene has a codon usage pattern similar to the top-600 set. the results revealed that most of the +ssrna viruses had a negative cccs score. 
only some +ssrna viruses had a positive cccs score (such as dengue viruses [denvs], mers-coronavirus, sars-coronavirus, human coronaviruses, human enterovirus 68, and the hepatitis a virus), whereas most of the -ssrna, dsrna, hiv-1, hiv-2, and ambisense viruses had a positive cccs score. however, htlv-1 and htlv-2 in the retrovirus group gave a negative cccs. a previous study by frenkel-morgenstern et al. also demonstrated that the codon usage pattern of the cell cycle regulated genes (ccrs) influences the cell cycle-dependent protein expression [11]. thus, the similarity of the codon usage pattern of the ccr genes and rna viruses was investigated. ccr genes cycling at the protein level and non-cycle regulated genes (nccrs) that were found not to cycle at the protein level were selected from previous studies (table 3). the rscus of the ccr and nccr genes were plotted on a graph and compared with the rscus of rna viruses (figure 5). the results showed that the ccr genes were located with -ssrna, dsrna, and some +ssrna viruses, indicating similar codon usage patterns, while the rscus of the nccr genes were distributed all over the graph with no resemblance to rna viruses. this result corresponds with the cccs of rna viruses (table 2). from the previous section, we demonstrated that human genes with codon usage patterns similar to rna viruses contributed to some important biological processes, such as the cell cycle, the regulation of the cell cycle process, cell division, microtubule cytoskeletal organization, chromosome segregation, dna repair, macromolecule catabolism, cellular localization, and rna processing (figure 4). when coupled with the fact that some viral infections can manipulate host cellular pathways (especially the translation machineries), this finding suggests that human genes with codon usage patterns similar to rna viruses may be upregulated during viral infection [41]. 
to substantiate this hypothesis, sets of proteomics data of rna virus infection were reanalyzed. the lists of upregulated protein profiles upon hiv-1 [42] , iav [43] , zika virus (zikv) [44] , and dengue virus serotype 2 (denv-2) [45] infections were obtained from previous studies. the lists of upregulated proteins upon viral infection were submitted to go-termfinder (supplementary file 3) , and the enriched go terms were compared to the enriched go terms of human genes with codon usage patterns similar to rna viruses from every subgroup. several enriched go terms of upregulated protein profiles during viral infections were found to be identical to the go terms of human genes with codon usage patterns similar to rna viruses ( figure 4 ). in the case of hiv-1 and zikv, the identical go terms included the cell cycle, the regulation of the cell cycle process, the mitotic cell cycle, organelle organization, cell division, microtubule-based process, and cellular localization. the identical go terms of iav and denv-2 included macromolecule metabolic processes, nucleic acid metabolic process, chromosome organization, cellular stress response and rna processing. synonymous codons are distributed unequally and in a non-random fashion, which is referred to as codon usage bias [46] . moreover, there are significant variations of codon usage bias among different species, and even among genes in the same organism [5, 21] . theoretically, two major factors shape the codon usage bias: mutation pressure and translational selection [18] . mutation pressure can result in uneven frequencies of nucleotide content, which can in turn influence codon usage bias [15, 47] . as to translational optimization, the frequent codons are usually found correlated with the population of trna isoacceptors [13, 15] . thus, the frequent or optimal codons would result in more rapid protein translation due to the greater availability of trnas corresponding to the frequently used codons [48] . 
although the replication of viruses relies on the host cell machinery, several viruses possess a codon usage pattern that differs from the codon usage preferences of their host [31]. for instance, the hiv-1 genome has been found to be a-rich [49]. the g-to-a hypermutation in the hiv-1 genome has been attributed to viral reverse transcriptase (rt), which lacks 3′ to 5′ exonuclease proofreading activity, leading to the misincorporation of nucleotides [50, 51]. in addition, the function of host enzymes of the apobec3 (a3) family has been found to partially contribute to a g-to-a mutation [52, 53]. furthermore, a difference in codon usage has been observed among individual genes of hiv-1 [51]. the hiv-1 gag gene, which encodes a structural protein, differs greatly in codon usage pattern from human host cells. in contrast, the hiv-1 genes involved in the regulation of the replication cycle, tat and rev, have been demonstrated to be more similar to human codon usage bias [54]. in this analysis, the pca of rscu represented codon usage patterns of human genes and rna viruses as a coordinate of pc1 (x) and pc2 (y) on a graph. we demonstrated that the pca of rscu analysis is compatible with a well-established index, cai. this suggested that the pca of rscu could be used in assessing codon usage bias and comparing the difference in codon usage pattern. in particular, the pca of rscu allowed a comparison to be made of individual genes in the whole genome scale. human genes possess various, different codon usage patterns, as observed in each quadrant of a graph. as mentioned earlier, there are a number of human genes located densely in the right quadrants; these genes adopt a non-optimal codon usage pattern similar to ccr genes. although most of the rna viruses have a non-optimal codon usage pattern, a great variation in codon usage patterns was observed among the groups of +ssrna, dsrna, and retroviruses. 
the greatest difference in the codon usage patterns belonged to rotavirus, as seen in the graph. by comparison, the rubella virus exhibited a more similar codon usage pattern to humans. these results correspond with those of another study which demonstrated that the codon usage patterns of +ssrna viruses are closer to human than other rna viruses, and that the lowest cai belongs to dsrna viruses. in more detail, rubella virus (+ssrna) had the highest cai at 0.773, and rotavirus had the lowest cai at 0.683 [55]. a number of human genes with codon usage patterns similar to that of rna viruses were found by the present study (figure 3). the human genes with rscus similar to rna viruses were retrieved and subjected to gene ontology enrichment analysis. interestingly, it was found that only human genes similar to groups of viruses in the right quadrant resulted in significant enrichment, namely, the cell cycle, the cell cycle regulation process, cell division, microtubule cytoskeletal organization, chromosome segregation, dna repair, macromolecule catabolism, and cellular localization. the number of human genes with rscus similar to rna viruses retrieved from each subgroup varied from only seven to a thousand genes (table 1). however, the number of retrieved genes did not affect the significance level or the number of enriched go terms in figure 4; for instance, 898 genes for the -ssrna subgroup 3 resulted in only 1 enriched go term, whereas 188 genes for the +ssrna subgroup 7 resulted in 5 go terms. this suggests that the enriched go terms did not result from a bias from the different numbers of retrieved genes among the virus groups. it is also possible that some virus groups with a limited number of retrieved genes might not have had sufficient statistical power to enable the detection of enriched go terms. 
among the retrieved human genes, only the genes that adopted a non-optimal codon usage pattern, retrieved from the area of the right quadrants where human genes were densely located, gave significantly overrepresented go terms. this suggested that the difference in codon usage bias in human genes might have specific functions. the contribution of non-optimal codon usage bias in human genes on the regulation of protein expression has been investigated in previous research [11, 56, 57]. one study on ccr genes revealed that the non-optimal codon usage pattern generates the oscillation in protein expression during cell cycle progression [11]. we demonstrated that the rscus of rna viruses were similar to the rscus of ccr genes, using both the cccs calculation and the pca of rscu. the results showed that -ssrna, dsrna, hiv-1, hiv-2, ambisense viruses, and a few viruses in the +ssrna group exhibited non-optimal codon usage similar to that of the ccr genes. this suggests that despite having a non-optimal codon usage bias, viral genes might be efficiently expressed during the specific phase of the cell cycle correlated with the available trna population in that period. the trna population is tissue specific and varies with cellular conditions [58]. alteration of the trna population depends on the level of aminoacyl trna synthetase and cellular atp concentration [59]. in yeast cells, oscillation of the aminoacyl trna synthetase and atp during the cell cycle has been found to result in an increase in trna levels in the g2/m phase, but a low trna level was observed toward the end of the g1 phase [11]. therefore, with a low concentration of charged trna, the genes expressed during g1 prefer optimal codon usage bias, whereas the genes with non-optimal codon usage bias are highly expressed in the other phases of the cell cycle with a high concentration of charged trna [11]. 
the availability of charged trnas during the cell cycle may regulate protein translation in a codon usage-specific manner. several studies have revealed that viruses subvert the cell cycle, arresting it via various mechanisms to generate the resources and favorable environment for viral replication and viral protein production [60]. cell cycle arrest has been observed in both dna and rna virus infections. in some rna viruses, cell cycle arrest at a specific phase may lead to an increase in viral protein translation [61]. during the g2/m phase, the expression of many proteins has been found to fluctuate; by arresting the cell cycle at g2/m, viruses may use this mechanism to regulate protein expression [62]. another study found that hiv-1 was more transcriptionally active during the g2 phase, and that arresting of the cell cycle may help limit the host immune response [63]. moreover, an hiv-1 infection also causes an alteration at the cellular trna level [34]. as to avian coronavirus infections, g2/m arrest has been found with an increased viral protein expression [64]. furthermore, rotavirus infection arrests the cell cycle in the s/g2 phase, favoring viral protein expression [61], while influenza a virus infection arrests the cell cycle in the g0/g1 phase, resulting in increased viral protein expression [65, 66]. these findings suggest that viruses may manipulate the cell cycle and cellular translation machinery to create an available trna population favoring the viral codon usage pattern. several go terms of human genes with codon usage patterns similar to rna viruses have been found by previous studies to be identical to the go terms of upregulated protein profiles in viral infections. 
global proteomic and phosphoproteomic changes in hiv-1 infected cd4+ t cells revealed that hiv-1 affected transcriptional and translational regulation, and targeted rna or protein degradation in order to modulate biological processes (including signal transduction, cell cycle, metabolic processes, and the immune system) [42]. in addition, a study of zikv infected human neurospheres also found an upregulation profile of proteins involving cell cycle arrest. this resulted in an alteration of the cell cycle in order to regulate the transcription and translation of the host cells [44]. iav infection targeted several cellular pathways to favor its replication, including aminoacyl-trna biosynthesis, glycolysis, fatty acid biosynthesis, and spliceosome [43]. moreover, regulated proteins and phosphoproteins in denv-infected cells were related to cellular macromolecule biosynthesis, rna splicing, chromatin modification, and cell stress response, and these regulations help facilitate viral protein expression [45]. this suggests that human genes with codon usage patterns similar to that of viruses may be upregulated at the translational level in a viral infection. it is unclear whether viruses evolved to regulate the translational machinery in order to accommodate the codon usage pattern that had already been shaped by mutational pressure, or whether they adapted their codon usage pattern to match the cellular translational machinery condition in infected cells. these two possibilities are not mutually exclusive. further studies are required to gain more insight into this new aspect of the virus-host interaction. the protein coding sequences of human genomes were downloaded from the gencode database (version 26) in fasta format [67]. the data set provided the nucleotide sequences of the coding transcripts on the reference chromosomes, including multiple transcript variants for each gene. thus, only the major transcript variants were selected for analysis. 
the major transcript variant is the longest transcript variant of the gene with a complete orf. for each gene, one representative as a major transcript variant with the longest sequence length was selected using a custom python script (supplementary file 4). the orfs of the protein coding sequences were rechecked by the orffinder tool at the website https://www.ncbi.nlm.nih.gov/orffinder (ncbi) before performing further analysis. the transcript variants with an incomplete orf were excluded and substituted by alternative variants. a list of the selected major transcript variants is given in supplementary file 1. the data set for the sequences of the human rna virus genome utilized by a previous study was used [55]. a total of 77 rna viruses that can cause diseases in humans were selected. the protein coding sequences of those rna viruses were downloaded in fasta format from the nucleotide database (refseq, ncbi). for each virus species, the protein coding sequences from different isolations of the same virus species were downloaded. the sequences were selected based on their availability in the database (some viral protein coding sequences were only available in small numbers). any coding sequence with unidentified nucleotides that could not be translated or with an incomplete orf was excluded. the selected human rna viruses were categorized by their family, genus, and genome polarity. they comprised 39 positive-sense single strand rna (+ssrna) viruses; 18 negative-sense single strand rna (-ssrna) viruses; 4 double strand rna (dsrna) viruses; 4 retroviruses (retro); and 12 ambisense (+/-, ambi) viruses. supplementary file 5 presents a list of the human rna viruses used in this study, the accession numbers of the sequences, and the number of coding sequences of each virus. the rscu is the ratio of the observed frequency of the codon in a gene to the expected frequency of the codon under the condition that all the synonymous codons are equally used. 
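the ratio just defined can be illustrated with a minimal python sketch. it is not the authors' script (supplementary file 6), which relied on biopython; here the standard genetic code is written out by hand so that the example is self-contained. the names `CODON_TABLE`, `FAMILIES` and `rscu` are hypothetical, stop codons, atg (met) and tgg (trp) are left out as in the methods, and reporting 0.0 for an amino acid that never occurs in the sequence is an arbitrary choice of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, excluding stop codons, Met (ATG) and Trp (TGG),
# which leaves the 59 codons used for the RSCU vector.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)


def rscu(cds):
    """Return a dict mapping each of the 59 synonymous codons to its RSCU.

    RSCU = observed count of a codon divided by the mean count of all
    codons in its synonymous family (the count expected under equal usage).
    """
    seq = cds.upper().replace("U", "T")
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Assumption of this sketch: absent amino acids give 0.0.
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values
```

for a toy sequence such as `rscu("GCTGCTGCCAAA")`, the alanine codon gct is used twice out of three alanine codons across a four-codon family, so its value is above 1.0 (an abundant codon in the sense used here), while the unused gca and gcg score 0.0.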
the three stop codons (taa, tag, tga), met (atg), and trp (tgg) were excluded from the analysis. the observed frequency of the codons in the genes was counted. the fasta sequences were parsed, and the codons of the coding sequences for each transcript variant were counted by a python script and the biopython library (python version 3.5.2, with biopython version 1.66; for the scripts, see supplementary file 6). then, the rscu was calculated as follows:
$$\mathrm{rscu}_i = \frac{x_i}{\frac{1}{n}\sum_{j=1}^{n} x_j}$$
where n is the number of synonymous codons (1 ≤ n ≤ 6) for the amino acid, and x_i is the number of occurrences of codon i. the synonymous codons with rscu values greater than 1.0 had a positive codon usage bias and were defined as abundant codons, while those with rscu values less than 1.0 had a negative codon usage bias and were defined as less-abundant codons. rscu values of exactly 1.0 meant that there was no codon usage bias, and the codons were chosen equally [68]. the rscus of human rna viruses were calculated using the caical server, which is available at http://genomes.urv.es/caical [37]. the multiple protein coding sequences from different isolations of the virus in the same species were submitted to the caical server for the rscu calculation. after that, the average rscu was calculated to represent the rscu of each rna virus species. the rscus of the human genes and rna viruses are provided in supplementary file 1. the cai of a specific gene was calculated using the cai calculator on the caical server [37]. the reference human codon usage table was obtained from the codon usage database (http://www.kazusa.or.jp/codon/) [69]. the rscus of the protein coding sequences of 20,190 human genes and 77 human rna viruses were input to the pca. the pca was performed using pasw statistics for windows, version 18 (spss inc., chicago, ill., usa). the kaiser-meyer-olkin measure of sampling adequacy test (kmo-msa) was also analyzed. 
the overall kmo-msa was 0.799, which was greater than a cut off of 0.5, indicating that the sample size was adequate. the principal components were successfully extracted using covariance matrix and quartimax rotation, which reduced the high dimensions of the dataset to a smaller number of dimensions. the selection of the significant components was based on a scree plot and the proportions of variance. the scree plot was a plot of the component numbers and eigenvalues; only the first two components had eigenvalues greater than 1, and they accounted for 45.5% of the total variance. the rscu of the gene was represented by the coordinate of pc1 and pc2 (x, y) on the graph. the pcas of the rscu graphs were plotted and analyzed using graphpad prism 7 (graphpad software, inc., ca). the cccs of a gene evaluates the similarity of the codon usage between that of a specific gene and a set of cell cycle regulated human genes. the calculation has been previously described [11]. briefly, the cccs is the sum of the codon preference (cp) values of the cell cycle regulated human genes (top-600 set) over all codons in the coding sequence of a gene, normalized by the length of the cdna. the cccs of a specific gene was calculated as follows:
$$\mathrm{cccs}(g) = \frac{\sum_{\mathrm{codon} \in g} \mathrm{cp}_{\text{top-600}}(\mathrm{codon})}{\mathrm{length}(g)}$$
where the sum runs over every codon of a gene g, and cp_top-600(codon) is the cp in the top-600 gene set (see cited reference, table 1). on the pca of the rscu graph, the human genes located in the same area of rna viruses were taken as human genes with rscus similar to rna viruses. to select these human genes, the average rscu of each group of rna viruses was initially calculated. however, a great variation of rscu was observed in each group of rna virus categorized by the nucleic acid types of their genomes. hence, the viruses in each group were divided into subgroups by their rscu before calculation of the average pc1 (x) and pc2 (y); these rna-virus subgroups are listed in table 1. 
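as a hedged illustration of the cccs described above (the sum of cp values over the codons of a gene, divided by its length), the following python sketch may help. the function name `cccs` and the table `TOY_CP` are hypothetical; the real cp values come from table 1 of the cited reference [11] and are not reproduced here. two further assumptions of this sketch are that the length is counted in codons and that codons missing from the cp table contribute zero.

```python
def cccs(cds, cp_table):
    """Cell cycle codon score of one coding sequence.

    cds      -- coding sequence (DNA or RNA letters, any case)
    cp_table -- dict mapping codon -> codon preference (CP) value
    """
    seq = cds.upper().replace("U", "T")
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        raise ValueError("coding sequence has no complete codon")
    # Assumption: codons absent from the table contribute 0.0, and the
    # normalising length is the number of codons.
    return sum(cp_table.get(codon, 0.0) for codon in codons) / len(codons)


# Invented CP values, for illustration only (not taken from reference [11]).
TOY_CP = {"AAA": 0.5, "GGG": -1.0}
```

with these invented values, a sequence made only of aaa codons scores positive, which in the terms used here would indicate codon usage resembling the top-600 set, whereas a sequence rich in ggg scores negative.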
subsequently, the average rscu of the rna viruses in each subgroup (mean vrscu, coordinate [a, b]) was set as the center of a circle with a radius of 0.3 units (figure 3). the radius was calculated to be minimal, to not exceed the standard deviation of the distance between the circle center and the human genes, and to cover most of the viruses in each subgroup. the human genes located within the circle were taken as the human genes with rscus similar to rna viruses. the distance (r) from the circle center (a, b) was measured as follows:
$$(a - x)^2 + (b - y)^2 = r^2$$
where a and b are the coordinates of the mean vrscu of each subgroup of rna viruses, while x and y are the coordinates of the rscus of the human genes. the human genes with an rscu similar to each group of rna viruses were analyzed for go enrichment using go-termfinder, and revigo was used to categorize the redundant go terms [39, 40]. the whole genome of homo sapiens was used as a background list. the overrepresented go terms in the biological process were investigated in both analyses, with p ≤ 0.01 taken as a significant enrichment. the simple linear regression analysis and pearson correlation coefficient (pcc) were determined using graphpad prism 7, with p < 0.05 taken as significant. kunlakanya jitobaom: conceived and designed the experiments; performed the experiments; analyzed and interpreted the data; contributed reagents, materials, analysis tools or data; wrote the paper. supinya phakaratsakul: conceived and designed the experiments; performed the experiments; analyzed and interpreted the data; contributed reagents, materials, analysis tools or data. thanyaporn sirihongthong: conceived and designed the experiments; performed the experiments; analyzed and interpreted the data. sasithorn chotewutmontri, prapat suriyaphol: performed the experiments; analyzed and interpreted the data; contributed reagents, materials, analysis tools or data. 
ornpreya suptawiwat: analyzed and interpreted the data; contributed reagents, materials, analysis tools or data. prasert auewarakul: conceived and designed the experiments; analyzed and interpreted the data. general nature of the genetic code for proteins rna codewords and protein synthesis vi. on the nucleotide sequences of degenerate codeword sets for isoleucine, tyrosine, asparagine and lysine pav e, codon catalog usage and the genome hypothesis codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type codon usage tabulated from international dna sequence databases: status for the year 2000 isoacceptor trna's codon-anticodon pairing: the wobble hypothesis the g x u wobble base pair. a fundamental building block of rna structure crucial to rna function in diverse biological systems c5-substituents of uridines and 2-thiouridines present at the wobble position of trna determine the formation of their keto-enol or zwitterionic formsa factor important for accuracy of reading of guanosine at the 3΄-end of the mrna codons accurate translation of the genetic code depends on trna modified nucleosides genes adopt non-optimal codon usage to generate cell cycle-dependent oscillations in protein levels trna modifications regulate translation during cellular stress codon usage and trna content in unicellular and multicellular organisms codon-biased translation can be regulated by wobble-base trna modification systems during cellular stress responses codon bias as a means to fine-tune gene expression codon usage between genomes is constrained by genome-wide mutational processes a simple model based on mutation and selection explains trends in codon and amino-acid usage and gc composition within and across genomes causes and implications of codon usage bias in rna viruses the extent of codon usage bias in human rna viruses and its evolutionary origin mutational pressure in zika virus: local adar-editing areas associated 
with pauses in translation and replication codon bias and gene expression synonymous codon usage, accuracy of translation, and gene length in caenorhabditis elegans effects of codon usage on gene expression: empirical studies on drosophila codon usage determines translation rate in escherichia coli correlation between the abundance of yeast transfer rnas and the occurrence of the respective codons in protein genes: differences in synonymous codon choice patterns of yeast and escherichia coli with reference to the abundance of isoaccepting transfer rnas the action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids causes and effects of n-terminal codon bias in bacterial genes translation efficiency is determined by both codon bias and folding energy local slowdown of translation by nonoptimal codons promotes nascent-chain recognition by srp in vivo analysis of synonymous codon usage in h5n1 virus and other influenza a viruses impact of the biased nucleotide composition of viral rna genomes on rna structure and codon usage hiv-1 gag expression is quantitatively dependent on the ratio of native and optimized codons unusual codon usage of hiv hiv-1 modulates the trna pool to improve translation efficiency vaccinia and influenza a viruses select rather than adjust trnas to optimize translation intragenic codon bias in a set of mouse and human genes caical: a combined set of tools to assess codon usage adaptation comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins go:: termfinder-open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes revigo summarizes and visualizes long lists of gene ontology terms exploiting trnas to boost virulence proteo-transcriptomic dynamics of cellular response to hiv-1 infection targeting metabolic reprogramming by influenza 
infection for therapeutic intervention zika virus disrupts molecular fingerprinting of human neurospheres proteomics profiling of host cell response via protein expression and phosphorylation upon dengue virus infection correlation between the abundance of escherichia coli transfer rnas and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the e. coli translational system a general model of codon bias due to gc mutational bias rates of aminoacyl-trna selection at 29 sense codons in vivo the a-nucleotide preference of hiv-1 in the context of its structured rna genome fidelity of hiv-1 reverse transcriptase the biased nucleotide composition of the hiv genome: a constant factor in a highly variable virus likely role of apobec3g-mediated g-to-a mutations in hiv-1 evolution and drug resistance the cytidine deaminase cem15 induces hypermutation in newly synthesized hiv-1 dna codon usage of hiv regulatory genes is not determined by nucleotide composition genome polarity of rna viruses reflects the different evolutionary pressures shaping codon usage non-optimal codon usage affects expression, structure and function of clock protein frq codon usage regulates protein structure and function by affecting translation elongation speed in drosophila cells tissue-specific differences in human transfer rna expression cell cycle variations of dinucleoside polyphosphates in synchronized cultures of mammalian cells cell cycle regulation during viral infection rotavirus replication is correlated with s/g2 interphase arrest of the host cell cycle g2/m cell cycle arrest in the life cycle of viruses human immunodeficiency virus type 1 vpr induces dna replication stress in vitro and in vivo cell cycle perturbations induced by infection with the coronavirus infectious bronchitis virus and their effect on virus replication influenza a virus replication induces cell cycle arrest in g0/g1 phase influenza a virus ns1 
induces g0/g1 cell cycle arrest by inhibiting the expression and activity of rhoa protein gencode: the reference human genome annotation for the encode project an evolutionary perspective on synonymous codon usage in unicellular organisms a new and updated resource for codon usage tables, bmc bioinf dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins variability and memory of protein levels in human cells cyclebase.org: version 2.0, an updated comprehensive, multi-species repository of cell cycle experiments and derived analysis results cyclebase.org-a comprehensive multi-organism online database of cell-cycle experiments identification of genes periodically expressed in the human cell cycle and their expression in tumors the authors declare no conflict of interest. supplementary content related to this article has been published online at https://doi.org/10.1016/j.heliyon.2020.e03915. key: cord-322286-2de6r1h6 authors: vandewege, michael w; sotero-caio, cibele g; phillips, caleb d title: positive selection and gene expression analyses from salivary glands reveal discrete adaptations within the ecologically diverse bat family phyllostomidae date: 2020-07-22 journal: genome biol evol doi: 10.1093/gbe/evaa151 sha: doc_id: 322286 cord_uid: 2de6r1h6 the leaf-nosed bats (phyllostomidae) are outliers among chiropterans with respect to the unusually high diversity of dietary strategies within the family. salivary glands, owing to their functions and high ultrastructural variability among lineages, are proposed to have played an important role during the phyllostomid radiation. to identify genes underlying salivary gland functional diversification, we sequenced submandibular gland transcriptomes from phyllostomid species representative of divergent dietary strategies. from the assembled transcriptomes, we performed an array of selection tests and gene expression analyses to identify signatures of adaptation. 
overall, we identified an enrichment of immunity-related gene ontology terms among 53 genes evolving under positive selection. lineage-specific selection tests revealed several endomembrane system genes under selection in the vampire bat. many genes that respond to insulin were under selection and differentially expressed genes pointed to modifications of amino acid synthesis pathways in plant-visitors. results indicate salivary glands have diversified in various ways across a functionally diverse clade of mammals in response to niche specializations. the bat family phyllostomidae is one of the most ecologically diverse families of mammals. salivary glands are hypothesized to facilitate adaptation to novel diets because their secreted products are the first to come in contact with food and pathogens. we sequenced expressed transcripts from phyllostomid salivary glands and found strong signals of selection among immune-related genes. selection and gene expression signals among specific lineages were less clear but pointed to modifications of the endomembrane transport system and metabolic pathways. although we could not strongly link gene evolution to dietary adaptations, results indicated diversification in response to niche specializations. bats (order chiroptera) make up 20% of mammalian diversity and several groups have been alluded to as examples of extreme phenotypic and genomic changes leading to species diversity (ray et al. 2006; pritham and feschotte 2007; dumont et al. 2012; hayden et al. 2014; platt et al. 2014; phillips and baker 2015; sotero-caio et al. 2015) . in particular, the leaf-nosed bats (phyllostomidae) include >200 extant species, representing the most ecologically diverse mammal family with their wide range of dietary strategies practiced (cirranello et al. 2016) . morphometric and 
evolutionary models suggest the ecological and phenotypic diversity among lineages and species are associated with strong selection on dietary specialization features (monteiro and nogueira 2011; rojas et al. 2011; dumont et al. 2012; rossoni et al. 2017; hedrick et al. 2019) . bats exhibit several characteristics rendering them interesting to examine, most obviously their ability to fly and echolocate. furthermore, bats act as vectors for zoonotic diseases including sars coronavirus, ebola, nipah virus, and rabies (calisher et al. 2006; smith and wang 2013; olival et al. 2017) . a lesser examined characteristic thought to be directly linked to their dietary multiplicity and adaptation is the anatomical diversity of their salivary glands tandler et al. 2001) . in mammals, there are typically three major pairs of salivary glands: the submandibular, parotid, and sublingual glands, and all use intracellular processes that involve the synthesis, modification, packaging, and secretion of proteins in membrane-bound granules. these glands are considered part of the digestive system as they secrete digestive enzymes, however, their products perform additional functions including antimicrobial resistance and biochemical communication (dobrosielski-vergona 1993; talley et al. 2001; bloss et al. 2002; safi and kerth 2003; vandewege et al. 2013) . tandler and phillips (1998) have shown that among 55 species of bats, secretory products were correlated with diet, especially in insectivorous species. further, phillips et al. (2014) found evidence that salivary glands play a role in lipid metabolism in myotis lucifugus. because of this wide variation and their direct link to immunity, diet, and reproduction, submandibular glands (smgs) and their products are hypothesized to play an important role in the adaptive radiation of mammals (phillips and tandler 1996) . linking genetic variation and gene products to selection and adaptation is still a challenge in evolutionary biology. 
more recently, the implementation of next-generation sequencing has facilitated the search for signatures of adaptation through dna and/or rna comparative analyses. indeed, sequencing transcriptomes offers an in-depth, data-rich method to identify selective pressures that have influenced the evolution of tissue-specific genes. of interest are genes that have undergone positive selection to adapt physiological, immunological, and ecological processes to new environments (daugherty and malik 2012; hawkins et al. 2019) . selection analyses among bat genes have generally been limited to few species for specific purposes (shen et al. 2010; zhang et al. 2013) . hawkins et al. (2019) analyzed orthologs from 18 different bat genomes and transcriptomes and found most genes under selection were related to immunity and collagen production. here, phyllostomidae was examined specifically because of their extensive and relatively rapid radiation to new feeding strategies. we sequenced the smg transcriptomes of nine phyllostomid bats representing different subfamilies and different diets, and through analysis of orthologs characterized how selection on coding sequence and expression differences have shaped smgs. nine species from seven out of the 11 recognized subfamilies were chosen to maximize representation of the phylogenetic and dietary diversity of phyllostomidae ( fig. 1 ). we also included two insectivorous bats, m. lucifugus from vespertilionidae and pteronotus parnellii from mormoopidae, as outgroups. tissues were extracted and frozen in liquid nitrogen within 5 min after euthanasia. additional details from tissue loans provided by the natural science research laboratory (nsrl) of the museum of texas tech university can be found in supplementary table s1, supplementary material online. rna isolation, sequencing, and assembly rna was extracted from smgs of each bat using trizol (invitrogen, carlsbad, ca, usa) following manufacturer protocols. 
oligo-dt magnetic beads were used to enrich for mrna strands with poly-a tails and a strand-specific paired-end cdna library was prepared using a scriptseq kit (epicentre, madison, wi, usa). libraries were sequenced on an illumina platform (see supplementary table s1, supplementary material online). for each species, paired-end reads were filtered for quality using trimmomatic 0.36 (bolger et al. 2014). putative open reading frames and translated peptide sequences were identified using transdecoder (haas et al. 2013 ) and the resulting peptide sequences were processed through the trinotate pipeline to identify functional properties and gene ontology (go) annotations associated with biological processes, molecular functions, and cellular components. to summarize, peptide sequences were queried against the swissprot database (dimmer et al. 2012) using blastp (altschul et al. 1997 ) and the pfam (finn et al. 2010 ) database using hmmer 3.2.1 (eddy 2011) . peptides were also scanned for a signal peptide using signalp (petersen et al. 2011) , and transmembrane domains using tmhmm 2.0 (krogh et al. 2001 ). orthology assignment is still a major challenge in bioinformatics and evolutionary biology. here, we developed a process to filter out similar transcripts produced by trinity. the first step was to assume similar sequences would have similar trinotate annotations. we parsed the trinotate output to identify the best sequence representative for each unique swissprot annotation. to choose the best gene representative among multiple coding sequences (cdss) with the same swissprot annotation, we multiplied the length of the cds by the percent identity of the swissprot hit. this metric correlated with e value, but effectively acted as a bit score when e values were identical. the cds with the highest metric was chosen to represent the annotation. if a cds did not have a swissprot annotation, it was removed. 
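the representative-selection rule described above (cds length multiplied by percent identity, highest value wins, unannotated cdss dropped) can be sketched as follows. this is an illustrative sketch only: the record layout and field names (`swissprot`, `cds_length`, `percent_identity`) are hypothetical and do not correspond to actual trinotate column names, and ties are assumed to keep the first record seen.

```python
def best_representatives(records):
    """Keep one CDS per SwissProt annotation: the one maximizing length * percent identity.

    Each record is a dict with hypothetical keys 'id', 'swissprot',
    'cds_length' and 'percent_identity'. Records without an annotation
    are discarded, as described in the text.
    """
    best = {}
    for rec in records:
        annotation = rec.get("swissprot")
        if not annotation:
            continue  # no SwissProt hit: the CDS is removed
        score = rec["cds_length"] * rec["percent_identity"]
        if annotation not in best or score > best[annotation][0]:
            best[annotation] = (score, rec)
    return {annotation: rec for annotation, (score, rec) in best.items()}


# Made-up records, for illustration only.
example = [
    {"id": "t1", "swissprot": "AMY1_HUMAN", "cds_length": 900, "percent_identity": 80.0},
    {"id": "t2", "swissprot": "AMY1_HUMAN", "cds_length": 1500, "percent_identity": 75.0},
    {"id": "t3", "swissprot": None, "cds_length": 2000, "percent_identity": 0.0},
]
print({k: v["id"] for k, v in best_representatives(example).items()})
```

the product rewards both long and well-matching sequences, so a short fragment with a perfect hit does not automatically displace a near-full-length transcript with a slightly lower identity.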
we then combined the best sequences from all species and ran this data set through the orthomcl (li et al. 2003) pipeline to identify orthologous groups. only single gene ortholog groups were used in downstream selection tests. poor ortholog assignment can influence sequence relationships in a phylogeny, and because the relationships among phyllostomids are robust, a reasonable test of ortholog assignment would be to reconstruct a phylogenetic tree and determine if the resulting tree reflects previously described relationships. therefore, we reconstructed a phylogenetic tree from 500 randomly sampled single gene orthologs shared among all 11 individuals. each orthologous group was translated, aligned using linsi parameters in mafft (katoh and standley 2013) and reverse translated to construct a codon alignment. resulting alignments were concatenated and we used raxml (stamatakis 2014) to find the best tree from the unpartitioned data set using the ml and rapid bootstrapping algorithm, a gtrgamma model of nucleotide substitution, and 100 bootstrap replicates. single gene orthologous groups that were found in seven or more phyllostomids were tested for evidence of selection using the maximum likelihood approach described by goldman and yang (1994) . codon alignments were constructed as above and the best tree for each gene was estimated using raxml. we used codeml in paml (yang 2007) to estimate the role of selection on gene evolution by comparing the rate of nonsynonymous substitution per nonsynonymous site (d n ) to the rate of synonymous substitution per synonymous site (d s ). the d n /d s ratio can be used as a sensitive measure of selective pressure; however, in most cases, the overall d n /d s is <1 and only a few amino acid sites are evolving quickly. therefore, to determine whether a gene was evolving adaptively, we calculated the likelihood of models that allow d n /d s to vary among codon sites (m1a, m2a, m7, and m8). 
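the translate, align, and reverse-translate step used to build codon alignments can be illustrated with a small sketch. it assumes the protein alignment has already been produced (for example by mafft) and that each aligned protein corresponds residue by residue to its unaligned cds; the function name `back_translate` is hypothetical and this is not the authors' script.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned CDS onto its aligned protein to obtain a codon alignment row.

    Every gap column in the protein becomes three gap characters; every
    residue consumes the next codon of the CDS. A trailing stop codon in
    the CDS, if present, is simply left unused.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("CDS is too short for the aligned protein")
    codon_columns = []
    position = 0
    for aa in aligned_protein:
        if aa == gap:
            codon_columns.append(gap * 3)
        else:
            codon_columns.append(cds[position:position + 3])
            position += 3
    return "".join(codon_columns)


print(back_translate("M-K", "ATGAAATAA"))
```

aligning at the protein level and threading codons afterwards keeps gaps in multiples of three, so the reading frame required by codon models such as those in codeml is preserved.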
for all genes, we used likelihood ratio tests (lrts) to compare nested models that allow and disallow codon site d n /d s to be >1 (m2a v m1a, m8 v m7), and to test for significant differences between nested models (yang 1998) . if both models that allow d n /d s rates to be >1 (m2a and m8) were significantly better fits to the data (lrt, p < 0.05), we inferred these genes were evolving under positive selection. we performed a false discovery rate (fdr) correction on the p values resulting from the m2a v m1a and m8 v m7 lrts. fdrs were estimated using the qvalue function in the "qvalue" r package (storey and tibshirani 2003; storey 2020) . we also conducted branch-site selection tests in codeml (yang and nielsen 2002; zhang et al. 2005) for ortholog groups that were present among all 11 species. alignments were constructed as previously described, but we used the species tree generated above ( fig. 1 ). the branch-site test for positive selection divides branches of a phylogenetic tree into foreground and background branches. the null model (model a) restricts positive selection among codons in both foreground and background branches. the alternative (model b) allows positive selection to occur among codons in foreground branches. likelihoods between model a and model b were compared using an lrt. the fdr was also estimated from the distribution of p values. we conducted independent branch-site tests where the d. rotundus branch was the foreground, a second test with trachops cirrhosus as foreground, and a third test that included the entire plant-visitors clade as foreground ( fig. 1 ). we mapped quality filtered paired-end rnaseq reads back to reference sequences using bowtie 1.2.2 (langmead et al. 2009 ) and default parameters of rsem (li and dewey 2011) . only ortholog groups that were present in four or more species were included in the reference sequences. ortholog groups with <10 mapped reads across all 11 samples were removed prior to analyses. 
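the site-model lrts and the "significant in both tests" rule described above can be sketched as follows. several points are assumptions rather than details from the text: the test statistic is compared to a chi-square distribution with 2 degrees of freedom (the conventional choice for m2a v m1a and m8 v m7), and the benjamini-hochberg procedure is swapped in for the storey q-value method the authors actually used, purely to keep the example dependency-free. the log-likelihood values are made up.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """P value of a likelihood ratio test against chi-square with 2 df.

    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.exp(-statistic / 2.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (a stand-in for Storey q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def positively_selected(genes, alpha=0.05):
    """Return genes whose M2a-vs-M1a and M8-vs-M7 tests are both significant after correction."""
    names = list(genes)
    p_m2a = [lrt_pvalue_df2(genes[g]["M1a"], genes[g]["M2a"]) for g in names]
    p_m8 = [lrt_pvalue_df2(genes[g]["M7"], genes[g]["M8"]) for g in names]
    q_m2a = benjamini_hochberg(p_m2a)
    q_m8 = benjamini_hochberg(p_m8)
    return [g for g, a, b in zip(names, q_m2a, q_m8) if a < alpha and b < alpha]


# Made-up log-likelihoods, for illustration only.
example = {
    "geneA": {"M1a": -1000.0, "M2a": -990.0, "M7": -1001.0, "M8": -990.0},
    "geneB": {"M1a": -500.0, "M2a": -499.5, "M7": -500.2, "M8": -499.9},
}
print(positively_selected(example))
```

the branch-site comparison of model a and model b follows the same pattern but is usually evaluated with 1 degree of freedom, which this sketch does not cover.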
normalization of raw read counts was performed using the estimatesizefactors and estimatedispersions functions in deseq2 v1.26 (love et al. 2014) . patterns of species variation in expression levels were assessed by performing a phylogenetically corrected pca using the phytools r package (revell 2012 ) based on the blind variance stabilizing transformed data. we also conducted a differential expression analysis from the normalized count data between plant-visitors versus others (see fig. 1 ) and considered genes statistically significant when the adjusted p value was < 0.1. we repeated this analysis examining only nectarivores. for all single gene orthologs tested for selection, we counted the number of reoccurring go terms. we then used the go term proportion to summarize the general function of genes tested in the smgs of phyllostomids using revigo (supek et al. 2011 ), which nests redundant and similar terms from long go term lists by semantic clustering. we repeated the same analysis using go terms described from orthologs under positive selection. to test for go term enrichment for the genes under selection, panther (mi et al. 2016 ) was applied to identify statistically overrepresented go terms from the list of genes under positive selection using the genes tested as the reference list. we used fisher's exact test with fdr correction, and statistical significance was set at an adjusted p value <0.05. we estimated whether a protein was membrane bound, a receptor, immune related, or secreted by searching for specific go terms. if "secretion," "extracellular space," or "extracellular region" was present among go terms, we annotated the gene as secreted. if "immune," "defense," or "antimicrobial" was present, the gene was annotated as immune related. if "membrane" was present we called the gene membrane bound and if "receptor" was present, we called the gene a receptor (supplementary table s2, supplementary material online). 
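the keyword rules above for labelling a gene as secreted, immune related, membrane bound, or a receptor can be sketched as a small lookup. the keywords are taken from the text; the function name, the category labels, and the choice of case-insensitive substring matching are assumptions, since the source does not say how the search was implemented.

```python
CATEGORY_KEYWORDS = {
    "secreted": ("secretion", "extracellular space", "extracellular region"),
    "immune": ("immune", "defense", "antimicrobial"),
    "membrane bound": ("membrane",),
    "receptor": ("receptor",),
}


def categorize(go_terms):
    """Return every category whose keywords occur in at least one GO term.

    Matching is a case-insensitive substring search (an assumption);
    a gene can receive several categories at once.
    """
    lowered = [term.lower() for term in go_terms]
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in term for term in lowered for keyword in keywords)
    }


print(sorted(categorize(["innate immune response", "extracellular space"])))
```

because the categories are not mutually exclusive, a gene with both "plasma membrane" and "regulation of secretion" terms would be counted as membrane bound and secreted, which matches the overlapping counts reported in the results.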
we also used david 6.8 (huang et al. 2009 ) to map kegg pathways to differentially expressed genes. to investigate putative proteins underlying the evolution of smgs, we performed rnaseq on 11 chiroptera species: nine phyllostomids with varying dietary strategies, and two outgroups representing the bat ancestral state of strict insectivory ( fig. 1 ). among the 11 samples, between 53,757 and 105,666 transcripts were assembled per species. over 87% of the paired-reads mapped back to each assembly indicating most reads were used. among the transcripts, we predicted between 17,820 and 54,578 orfs, and among the orfs, we found between 5,392 and 9,580 unique hits to the swissprot database (supplementary table s1, supplementary material online). the most similar isoform to the reference sequence was chosen to represent the annotation. swissprot annotations were not used a priori for ortholog clustering. however, if after clustering there were multiple swissprot annotations in a single gene ortholog group, the most common annotation was used to describe the ortholog group. in all, orthomcl grouped 79,817 annotations into 10,267 orthologous groups that included between 2 and 45 members (supplementary fig. s1a , supplementary material online). among the ortholog groups, 6,694 were single gene families represented in four or more species and 2,247 were found in all 11 species (supplementary fig. s1b , supplementary material online). among these 6,694 groups, 6,322 (94%) had the same swissprot annotation among orthologs, suggesting overall consistency between orthomcl clustering and trinotate annotations. in the remaining 372 groups, there was one dominant gene annotation and the outliers were likely a result of improper classification by trinotate, given blast hits are often closely related paralogs and not true orthologs, in which case annotations were manually corrected. 
to additionally test if the ortholog predictions were generally accurate and dna sequences largely reflected species relationships, we generated a phylogenetic tree from the concatenated alignments of 500 randomly sampled orthologs that were present in all species. although individual gene trees may not reflect a species trees, we expected that a consensus would emerge from a large volume of dna sequences if orthologs were accurately predicted and aligned. we found that the resulting concatenated tree accurately reflected previous species trees generated by dumont et al. (2012) , baker et al. (2016) , and rojas et al. (2016) , with high bootstrap support at each node ( fig. 1 ). among the 6,694 orthologs, 4,007 were single gene orthologs present in seven or more phyllostomids, and 2,920 had alignments with enough shared sites to produce output from codeml. after correcting for fdr, we found 53 genes where models of evolution allowing positive selection were significantly better fit to the data than neutrality in both m2a and m8 tests (supplementary table s2, supplementary material online). to contrast background salivary gene functions to those under selection, we summarized the go terms of the 2,920 tested genes using the revigo server ( fig. 2a) . overall, most of these gene products were localized to the cytosol, but many were also positioned in the plasma membrane and extracellular region. common biological processes identified were related to transcription regulation, cell death, and protein synthesis and transport. by contrast, the biological process and cellular component terms of genes under selection were fairly different from those obtained in our global assessment of protein function ( fig. 2b) . these proteins had more terms related to the extracellular exosome or plasma membrane and were involved in the immune response ( fig. 2b) . 
interestingly, no terms related to diet or metabolism (i.e., carbohydrate or lipid metabolism) were associated with genes under positive selection. we used panther's overrepresentation test to determine whether any go terms were significantly overrepresented among genes under selection. there were no molecular function terms enriched among the genes under selection (supplementary fig. s2 , supplementary material online). the most enriched biological process term was innate immune response, but all enriched biological process terms were related to the defense against other organisms ( fig. 2c ). consistent with receptor-like proteins under selection, terms associated with membrane surfaces were enriched in the cellular component category ( fig. 2c ). upon survey of the multiple go terms assigned to each gene, we found that eight of the 53 genes were involved in the immune response (supplementary table s2, supplementary material online). moreover, 50% of immunity-related loci had secretory annotations although a binomial test did not suggest secretion among immune proteins was enriched (p ~ 0.46). of the remaining 45 genes, 18 genes were membrane bound, two were secreted, and two had terms for both membranes and secretion. site tests can inform about positive selection on a protein among all species, but they cannot inform about episodic selection on specific lineages. therefore, we conducted branch-site tests using the species tree as a reference on the most unique lineages, namely the plant-visitors, d. rotundus, and t. cirrhosus. we identified 2,247 single gene orthologs that were present among all 11 species and 1,590 had enough shared sites to be successfully tested in codeml. after correcting for fdr, we found 24, 20, and 13 genes under positive selection in the plant-visitors, d. rotundus, and t. cirrhosus, respectively (supplementary table s3, supplementary material online) making up between 0.8% and 1.5% of genes tested. 
bpia2 (bpi fold-containing family a member 2), an immune response gene, was the only gene found to be under selection in both the plant-visitors and d. rotundus and approaching statistical significance in t. cirrhosus (supplementary table s3). we measured the expression of 6,692 orthologs among all 11 species. we performed a phylogenetically corrected pca on the variance stabilizing transformed expression data and found that all species except d. rotundus, t. cirrhosus, and hsunycteris thomasi had relatively correlated expression profiles where most of the insectivores and plant-visitors clustered together ( fig. 3a ). as there were not any biological replicates and few dietary replicates among our samples, the most robust approach to identifying relevant differentially expressed genes was to compare the plant-visitors to the remaining species to identify expression differences that could explain how plant-visitors have adapted to a diet with a different macromolecular profile. sixteen genes were differentially expressed in plant-visitors: eight were upregulated and eight were downregulated ( fig. 3b) , and no go terms or kegg pathways were significantly enriched among these 16 genes. however, the most common pathways associated with these differentially expressed genes were metabolic pathways (supplementary table s4, supplementary material online). we performed another differential expression analysis by examining just the nectarivores. interestingly, 36 out of 40 differentially expressed genes were downregulated in nectarivores ( fig. 3c ); many of these genes were also involved in metabolic pathways (supplementary table s5, supplementary material online). salivary glands secrete products that have important biological roles in diet/digestion, oral health, and communication, as well as display the most varied cellular ultrastructure in distinct taxa. 
the great morphological variation observed in the ultrastructure of phyllostomid bat salivary gland granules led phillips and tandler (1996) to speculate that adaptation to novel niches drives salivary gland evolution. smg acinar cells are responsive to a variety of extracellular signals which can affect gene expression, protein synthesis, and protein modifications, and this responsiveness can be driven by the density and distribution of cell surface receptors. salivary proteins are among the first to directly encounter food and introduced pathogens, which lends biological importance in the context of adaptive evolution. however, their role in the adaptation of mammals to novel niches has been largely underinvestigated. here, we sequenced the smg transcriptomes of nine phyllostomid bats with varying feeding strategies to illuminate the role of salivary glands in this adaptive radiation. interestingly, the percentage of genes under selection expressed in phyllostomid salivary glands was not necessarily greater than that among bats as a whole (hawkins et al. 2019) . further, links to dietary changes were not strongly apparent, but signatures of selection did reveal modification of smg functions. among the sequenced transcriptomes, we were able to annotate between 5,392 and 9,580 expressed proteins, which were successfully clustered as 6,694 single gene orthologs. of these, 4,007 were present in seven or more phyllostomids and tested for selection among codon sites. using codon selection models, we found only a small proportion (1.8%) of genes demonstrated signatures of positive selection. consistently, hawkins et al. (2019) found the same frequency of genes under selection from the transcriptomes and genomes of 18 bat species using multiple tissue sources (e.g., spleen, trigeminal ganglia, inner ear, and embryo transcriptomes). 
this suggests that overall, the genes expressed in the smgs of phyllostomids are not necessarily subjected to stronger selective pressures in coding regions than genes expressed in other tissues among bats. salivary glands are known to function in immunity through the localized function of immune cells (fábián et al. 2012) and immunity was the only enriched term among genes under selection ( fig. 2c ). eight of the 53 proteins inferred to be affected by positive selection were linked to immunity and defense, and five of these had a secretory component (supplementary table s2 , supplementary material online). these eight genes had common functions associated with the innate and humoral immune response and stimulate the recruitment of immune cells to virus-infected sites. for example, bpia1 (bpi fold-containing family a member 1) and bpia2 are secreted and known to localize to upper airways and prevent biofilm formation by gram-negative bacteria (liu et al. 2013; prokopovic et al. 2014) . a high exposure to pathogens unique to bats could explain this result, but positive selection among immune genes is not uncommon in animals (roux et al. 2014; xiao et al. 2015; van der lee et al. 2017; hawkins et al. 2019) . defense and immunity genes typically have a high evolutionary rate attributed to a continuous arms race between pathogens and host immune response (jiggens and kim 2007; kosiol et al. 2008; viljakainen et al. 2009 ), and our data provide insight about how positive selection has shaped bat oral tract defense against introduced pathogens. an unexpected facet of our results was a general paucity of genes under selection with roles related to metabolism among site tests. therefore, to identify lineage-specific trends, we used branch-site models and tested for episodic selection in groups that exhibit the most divergence from the ancestral insectivorous trait: the common vampire bat d. rotundus, the frog-eating bat t. cirrhosus, and all plant-visitors. 
there was a similar frequency of genes under positive selection as site tests, between 0.5% and 1.5%, and no go terms were enriched among these genes. most go terms were not redundant among the three tests, except for associations with the golgi body, which is expected given the importance of proper secretory function of salivary glands. the golgi body is the central organizer of the membrane trafficking system. it is where proteins are sorted into vesicles and trafficked out of the cell. however, it is noteworthy that the vampire bat displayed an overrepresentation of golgi-related ontology terms and genes under selection, when compared with the other two tested groups. apparent adaptation or specialization of the golgi body and endomembrane system was present in the vampire bat d. rotundus. the common vampire bat belongs to the subfamily desmodontinae, the only obligate sanguivores among amniotes, which may be the most ecologically divergent among the phyllostomids. francischetti et al. (2013) previously performed a transcriptomic analysis of vampire bat salivary glands and found a diversity of anticoagulants, anti-inflammatory proteins, and neural disruptors hypothesized to enhance the efficiency of parasitizing animals for blood meals. thus, unlike other phyllostomids, secreted proteins may not only be helpful to digestion but effective adaptations to blood-feeding with minimal death risk for the bat and its prey. branch-site tests in d. rotundus identified 20 genes under selection (supplementary table s3, supplementary material online). notable go terms among these 20 genes pointed to the regulation of protein trafficking, that is, golgi organization, protein n-linked glycosylation, trans-golgi network, and positive regulation of secretion (supplementary fig. s3, supplementary material online). the genes linked to these terms were golgi phosphoprotein 3-like (glp3l), vesicle-associated membrane protein-associated a (vapa), and nsfl1 cofactor p47 (nsf1c).
vapa can mediate vesicle transport from the endoplasmic reticulum (er) to the golgi body (lehto et al. 2005). glp3l is generally expressed in secretory tissues, localizes to the golgi, and is required for efficient anterograde trafficking; knocking out glp3l causes golgi dispersal and impairs secretion (ng et al. 2013). nsf1c seems to be relevant for golgi reassembly after mitosis (pécheur et al. 2002). lastly, a gene under selection in the n-glycan biosynthesis pathway was pmm2 (phosphomannomutase 2). pmm2 specifically catalyzes the isomerization of mannose 6-phosphate to mannose 1-phosphate. deleterious mutations in pmm2 tend to cause defects in protein glycosylation and subsequent congenital disorders (matthijs et al. 1997). vampire bats diverged from an insectivorous ancestor and subsequently underwent a rapid transition from insectivory to sanguivory (datzmann et al. 2010; dumont et al. 2012). given our results, it is plausible the endomembrane system was modified during this transition. moreover, given that some golgi body-related genes appeared under selection in all branch-site tests, this organelle appears to have played some role in the adaptive radiation of phyllostomids. another interesting result from branch-site tests was a response to insulin stimulus in the plant-visitors. two genes under selection, ubiquitin-conjugating enzyme e2b (ube2b) and casp8 and fadd-like apoptosis regulator (cflar), were specifically associated with this term. cflar regulates downstream genes involved in lipid metabolism, glucose uptake, and oxidative stress. in mice, cflar was shown to reverse nonalcoholic steatohepatitis liver disease (wang et al. 2017), which can be caused by insulin resistance and high blood sugar among other metabolic factors (liu et al. 2019). the exact function of ube2b is a little less clear but appears to be linked to muscle atrophy.
ube2b becomes more expressed in fasting rats and atrophying muscle cells, but becomes suppressed in response to insulin (wing and banville 1994; polge et al. 2015). the plant-visitors make up a collection of frugivorous and nectarivorous species that ingest high amounts of sugars at once (laska 1990). interestingly, the phyllostomid great fruit-eating bat (artibeus lituratus) exhibits no difference in serum insulin levels between fasted and fed states (protzek et al. 2010). moreover, serum insulin levels observed in a. lituratus were higher than those observed in mice (fujiwara et al. 2007) and obese humans (yassine et al. 2009). we speculate that physiological changes, that is, a constantly high concentration of insulin, may have caused a selective response in insulin-associated pathways. selection pressures may not only occur at the sequence level but also at the level of expression regulation. therefore, we paired selection tests with gene expression analyses. the distinct natural histories represented by t. cirrhosus and d. rotundus were reflected in the pca. in fact, the only way t. cirrhosus stood out in this study was in the expression pca. trachops cirrhosus specializes on frogs (tuttle and ryan 1981), which is a unique feeding strategy in phyllostomids, but data from the vespertilionid nyctalus lasiopterus suggest the transition from insects to carnivory may not require any major adaptations (ibáñez et al. 2001). consistently, selection pressures, given branch-site tests, were weakest in t. cirrhosus (supplementary table s3, supplementary material online). the most remarkable observation from overall gene expression profiles was a general correlation among insectivores and plant-visitors. even the distantly related m. lucifugus and p. parnellii were included in this cluster (fig. 3a). hsunycteris thomasi slightly deviated from the other plant-visitors, albeit this deviation was not as extreme as t. cirrhosus and d. rotundus.
morphological and genetic data suggest nectar-feeding derived independently in glossophaginae and lonchophyllinae (datzmann et al. 2010). this independence may also have been captured by our pca. given overall similarity of expression profiles between plant-visitors and insectivores, few genes were differentially expressed (fig. 3b), but more genes were differentially expressed when we examined just the nectarivores (fig. 3c), and many of these proteins were in amino acid metabolic pathways. the strength of differential expression among genes in amino acid pathways was not enough to be significantly enriched, but given that carbohydrates are more abundant in plant-visitor diets and that amino acid synthesis is linked to the products of glycolysis and the citric acid cycle, it is plausible this group has modified these pathways as adaptive responses. we examined the evolutionary history of genes expressed in smgs to test for links between genetic variation and signatures of selection. given the ecological variation of these species, we expected to find strong selection signals and greater expression diversity. indeed, we identified a strong selection signal among immune-related genes in phyllostomids, but lineage-specific adaptations were less clear, that is, fewer genes under selection among a wide array of pathways and few differentially expressed genes. from our data, we inferred modifications of the endomembrane system with a focus on the golgi body, most apparent in the vampire bat. further, lineage-specific adaptations appear to have occurred in response to insulin changes and modifications of metabolic pathways in the plant-visitors, signaling that unique, lineage-specific adaptations have occurred in phyllostomids with diverse feeding strategies. supplementary data are available at genome biology and evolution online. we would like to thank robert baker and carleton phillips for previous work that inspired this research.
this work was not possible without the previous efforts of ttu students, faculty, and staff toward sample collection and curation. therefore, we are thankful for nicté ordóñez-garza, maria sagot, heath garner, robert bradley, and the texas tech university natural science research laboratory (ttu nsrl) for tissue loans. we also acknowledge the high performance computing center (hpcc) at texas tech university at lubbock for providing hpc resources. url: http://cmsdev.ttu.edu/hpcc. c.g.s.c. was supported with a fellowship (pde) by conselho nacional de desenvolvimento científico e tecnológico (cnpq), and by a postdoctoral fellowship (pnpd) from coordenação de aperfeiçoamento de pessoal de nível superior (capes), brazil during the development of this study. these funding agencies had no role in any experimental aspect of this study.
gapped blast and psi-blast: a new generation of protein database search programs higher level classification of phyllostomid bats with a summary of dna synapomorphies potential use of chemical cues for colony-mate recognition in the big brown bat, eptesicus fuscus trimmomatic: a flexible trimmer for illumina sequence data bats: important reservoir hosts of emerging viruses morphological diagnoses of higher-level phyllostomid taxa (chiroptera: phyllostomidae) evolution of nectarivory in phyllostomid bats (phyllostomidae gray, 1825, chiroptera: mammalia) rules of engagement: molecular insights from host-virus arms races the uniprot-go annotation database in 2011 biology of the salivary glands morphological innovation, diversification and invasion of a new adaptive zone accelerated profile hmm searches salivary defense proteins: their network and role in innate and acquired oral immunity the pfam protein families database the "vampirome": transcriptome and proteome analysis of the principal and accessory submaxillary glands of the vampire bat desmodus rotundus, a vector of human rabies insulin hypersensitivity in mice lacking the 
v1b vasopressin receptor a codon-based model of nucleotide substitution for protein-coding dna sequences full-length transcriptome assembly from rna-seq data without a reference genome de novo transcript sequence reconstruction from rna-seq using the trinity platform for reference generation and analysis a metaanalysis of bat phylogenetics and positive selection based on genomes and transcriptomes from 18 species a cluster of olfactory receptor genes linked to frugivory in bats morphological diversification under high integration in a hyper diverse mammal clade systematic and integrative analysis of large gene lists using david bioinformatics resources bat predation on nocturnally migrating birds a screen for immunity genes evolving under positive selection in drosophila mafft multiple sequence alignment software version 7: improvements in performance and usability patterns of positive selection in six mammalian genomes predicting transmembrane protein topology with a hidden markov model: application to complete genomes ultrafast and memoryefficient alignment of short dna sequences to the human genome food transit times and carbohydrate use in three phyllostomid bat species targeting of osbp-related protein 3 (orp3) to endoplasmic reticulum and plasma membrane is controlled by multiple determinants rsem: accurate transcript quantification from rna-seq data with or without a reference genome orthomcl: identification of ortholog groups for eukaryotic genomes splunc1/bpifa1 contributes to pulmonary host defense against klebsiella pneumoniae respiratory infection silibinin ameliorates hepatic lipid accumulation and oxidative stress in mice with non-alcoholic steatohepatitis by regulating cflar-jnk pathway moderated estimation of fold change and dispersion for rna-seq data with deseq2 mutations in pmm2, a phosphomannomutase gene on chromosome 16p13, in carbohydrate-deficient glycoprotein type 1 syndrome (jaeken syndrome) panther version 10: expanded protein families and 
functions, and analysis tools evolutionary patterns and processes in the radiation of phyllostomid bats golph3l antagonizes golph3 to determine golgi morphology host and viral traits predict zoonotic spillover from mammals phospholipid species act as modulators in p97/p47-mediated fusion of golgi membranes signalp 4.0: discriminating signal peptides from transmembrane regions secretory gene recruitments in vampire bat salivary adaptation and potential convergences with sanguivorous leeches dietary and flight energetic adaptations in a salivary gland transcriptome of an insectivorous bat salivary glands, cellular evolution, and adaptive radiation in mammals plasticity and patterns of evolution in mammalian salivary glands: comparative immunohistochemistry of lysozyme in bats large numbers of novel mirnas originate from dna transposons and are coincident with a large species radiation in bats role of e2-ub-conjugating enzymes during skeletal muscle atrophy massive amplification of rolling-circle transposons in the lineage of the bat myotis lucifugus isolation, biochemical characterization and anti-bacterial activity of bpifa2 protein insulin and glucose sensitivity, insulin secretion and b-cell distribution in endocrine pancreas of the fruit bat artibeus lituratus bats with hats: evidence for recent dna transposon activity in genus myotis phytools: an r package for phylogenetic comparative biology (and other things) when did plants become important to leaf-nosed bats? 
diversification of feeding habits in the family phyllostomidae bats (chiroptera: noctilionoidea) challenge a recent origin of extant neotropical diversity intense natural selection preceded the invasion of new adaptive zones during the radiation of new world leaf-nosed bats patterns of positive selection in seven ant genomes secretions of the interaural gland contain information about individuality and colony membership in the bechstein's bat adaptive evolution of energy metabolism genes and the origin of flight in bats bats and their virome: an important source of emerging viruses capable of infecting humans integration of molecular cytogenetics, dated molecular phylogeny, and model-based predictions to understand the extreme chromosome reorganization in the neotropical genus tonatia (chiroptera: phyllostomidae) raxml version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies qvalue: q-value estimating for false discovery rate control statistical significance for genomewide studies revigo summarizes and visualizes long lists of gene ontology terms female preference for male saliva: implications for sexual isolation of mus musculus subspecies secretion by striated ducts of mammalian major salivary glands: review from an ultrastructural, functional, and evolutionary perspective microstructure of mammalian salivary glands and its relationship to diet genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts evolution of the abpa subunit of androgen-binding protein expressed in the submaxillary glands in new and old world rodent taxa rapid evolution of immune proteins in social insects targeting casp8 and fadd-like apoptosis regulator ameliorates nonalcoholic steatohepatitis in mice and nonhuman primates 14-kda ubiquitin-conjugating enzyme: structure of the rat gene and regulation upon fasting and by insulin transcriptome analysis revealed positive selection of immune-related genes in tilapia 
likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution paml 4: phylogenetic analysis by maximum likelihood codon-substitution models for detecting molecular adaptation at individual sites along specific lineages effects of exercise and caloric restriction on insulin resistance and cardiometabolic risk factors in older obese adults -a randomized clinical trial comparative analysis of bat genomes provides insight into the evolution of flight and immunity evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level associate editor: naruya saitou key: cord-332006-if46jycd authors: whitehead, kathryn a.; langer, robert; anderson, daniel g. title: knocking down barriers: advances in sirna delivery date: 2009 journal: nat rev drug discov doi: 10.1038/nrd2742 sha: doc_id: 332006 cord_uid: if46jycd in the 10 years that have passed since the nobel prize-winning discovery of rna interference (rnai), billions of dollars have been invested in the therapeutic application of gene silencing in humans. today, there are promising data from ongoing clinical trials for the treatment of age-related macular degeneration and respiratory syncytial virus. despite these early successes, however, the widespread use of rnai therapeutics for disease prevention and treatment requires the development of clinically suitable, safe and effective drug delivery vehicles. here, we provide an update on the progress of rnai therapeutics and highlight novel synthetic materials for the encapsulation and intracellular delivery of nucleic acids. rna interference (rnai) gained international attention in 1998 when fire, mello and colleagues discovered the ability of double-stranded rna to silence gene expression in the nematode worm caenorhabditis elegans 1 . 
three years later, tuschl and co-workers published their celebrated proof-of-principle experiment demonstrating that synthetic small interfering rna (sirna) could achieve sequence-specific gene knockdown in a mammalian cell line 2. the first successful use of sirna for gene silencing in mice was achieved for a hepatitis c target shortly thereafter 3. since that time, the biotechnology sector has made considerable efforts in the advancement of sirna therapeutics for the treatment of various disease targets, including viral infections 4,5 and cancer 6-8. rnai is a fundamental pathway in eukaryotic cells by which sequence-specific sirna is able to target and cleave complementary mrna 2. rnai is triggered by the presence of long pieces of double-stranded rna, which are cleaved into the fragments known as sirna (21-23 nucleotides long) by the enzyme dicer 9. in practice, sirna can be synthetically produced and then directly introduced into the cell, thus circumventing dicer mechanics (fig. 1). this shortcut reduces the potential for an innate immune interferon response and the shutdown of cellular protein expression that can occur following the interaction of long pieces (>30 nucleotides) of double-stranded rna with intracellular rna receptors 10. once sirna is present in the cytoplasm of the cell, it is incorporated into a protein complex called the rna-induced silencing complex (risc) 11. argonaute 2, a multifunctional protein contained within risc, unwinds the sirna, after which the sense strand (or passenger strand) of the sirna is cleaved 12. the activated risc, which contains the antisense strand (or guide strand) of the sirna, selectively seeks out and degrades mrna that is complementary to the antisense strand 13 (fig. 1). the cleavage of mrna occurs at a position between nucleotides 10 and 11 on the complementary antisense strand, relative to the 5′-end 14.
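the base-pairing geometry described above can be made concrete with a short sketch. it assumes a target site on the mrna that is fully complementary to the antisense (guide) strand, with no overhangs and only the bases a, c, g and u; the function names and the example sequence are hypothetical. the seed region is taken as nucleotides 2-8 of the antisense strand, the definition used later in the discussion of modified sirna.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def guide_strand(target_site: str) -> str:
    """Antisense (guide) strand, written 5'->3', for an mRNA target site."""
    site = target_site.upper()
    if not site or set(site) - set("ACGU"):
        raise ValueError("target site must be a non-empty RNA sequence (A, C, G, U)")
    # Reverse complement: guide nucleotide i pairs with target nucleotide L - i + 1.
    return site.translate(COMPLEMENT)[::-1]


def seed_region(guide: str) -> str:
    """Nucleotides 2-8 of the guide strand, counted from its 5' end."""
    return guide[1:8]


def cleavage_products(target_site: str) -> tuple[str, str]:
    """Split the target between the bases paired with guide nucleotides 10 and 11.

    Guide nucleotide 11 pairs with target position L - 10 and guide nucleotide 10
    with target position L - 9 (1-based), so the cut follows position L - 10.
    """
    cut = len(target_site) - 10
    if cut < 1:
        raise ValueError("target site must be at least 11 nucleotides long")
    return target_site[:cut], target_site[cut:]


if __name__ == "__main__":
    site = "AAAAAAAAAAACCCCCCCCCC"  # made-up 21-nt target site
    guide = guide_strand(site)
    print("guide:", guide)
    print("seed :", seed_region(guide))
    print("cut  :", cleavage_products(site))
```

real sirna duplexes usually carry short 3′ overhangs and are chosen with additional design rules, none of which are modelled here.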
the activated risc complex can then move on to destroy additional mrna targets, which further propagates gene silencing 15. this extra potency ensures a therapeutic effect for 3-7 days in rapidly dividing cells, and for several weeks in non-dividing cells 16. eventually, sirnas are diluted below a certain therapeutic threshold or degraded within the cell, and so repeated administration is necessary to achieve a persistent effect. theoretically, when using appropriately designed sirna, the rnai machinery can be exploited to silence nearly any gene in the body, giving it a broader therapeutic potential than typical small-molecule drugs. indeed, it has already been reported that synthetic sirnas are capable of knocking down targets in various diseases in vivo, including hypercholesterolaemia 17, liver cirrhosis 18, hepatitis b virus (hbv) 4, 19, human papillomavirus 20, ovarian cancer 21 and bone cancer 22. in order for these advances to be implemented in a clinical setting, safe and effective delivery systems must be developed. while 'naked', chemically modified sirna has shown efficacy in certain physiological settings such as the brain 23 and the lung 24, there are many tissues in the body that require an additional delivery system to facilitate transfection. this is because naked sirna is subject to degradation by endogenous enzymes, and is too large (~13 kda) and too negatively charged to cross cellular membranes.
antisense (guide) strand: the strand of the sirna molecule that is complementary to the target mrna, which activates risc and has an important role in target mrna identification and destruction.
transfection: the process of delivering nucleic acid material into the cell.
the issue of effective and non-toxic delivery is a key challenge and serves as the most significant barrier between sirna technology and its therapeutic application. the ease of sirna delivery is partly dependent on the accessibility of the target organ or tissue within the body.
localized sirna delivery - that is, application of sirna therapy directly to the target tissue - offers several benefits, including the potential for both higher bioavailability given the proximity to the target tissue, and reduced adverse effects typically associated with systemic administration. by contrast, systemic delivery, meaning the intravenous injection of delivery particles that then travel throughout the body to the target organ or tissue, requires that particles have the ability to avoid uptake and clearance by non-target tissues (fig. 2). there are several tissues that are amenable to topical or localized therapy, including the eye, skin, mucous membranes, and local tumours 25-28 (table 1). local sirna delivery is particularly well-suited for the treatment of lung diseases and infections. the direct instillation of sirna into the lung through intranasal or intratracheal routes enables direct contact with lung epithelial cells. these cells play a part in a myriad of lung conditions and infections, including cystic fibrosis, asthma, influenza and the common cold 24. it has been reported that respiratory syncytial virus (rsv) replication can be inhibited by nasally administered sirna formulated with or without transfection agents in mice 29, 30. progress in the treatment of rsv continues with phase ii clinical trials using an aerosolized sirna delivery system 31. intratracheal administration of sirna has also been reported to offer prophylactic and therapeutic effects in the treatment of severe acute respiratory syndrome 32. another example of local delivery is direct intratumoral injection of sirna delivery complexes into various mouse xenograft models. sirna complexed with the delivery agent polyethyleneimine (pei) was shown to inhibit tumour growth upon intratumoral injection in mice bearing glioblastoma xenografts 28.
niu and co-workers have also reported naked sirna efficacy upon direct injection into a subcutaneous cervical cancer model in mice 20. in contrast to the direct accessibility of localized targets, many tissues can only be reached through the systemic administration of delivery agents in the bloodstream. sirna formulations for systemic application face a series of hurdles in vivo before reaching the cytoplasm of the target cell (fig. 2). post-injection, the sirna complex must navigate the circulatory system of the body while avoiding kidney filtration, uptake by phagocytes, aggregation with serum proteins, and enzymatic degradation by endogenous nucleases 33. phagocytosis serves as a significant immunological barrier, not only in the bloodstream but also in the extracellular matrix of tissues. phagocytic cells such as macrophages and monocytes remove foreign material from the body to protect against infection by viruses, bacteria and fungi. unfortunately, phagocytes are also highly efficient at removing certain therapeutic nanocomplexes and macromolecules from the body, and steps must be taken to avoid opsonization when designing drug delivery vehicles 33. egress from the bloodstream and across the vascular endothelial barrier poses a significant challenge for delivery of sirna to many tissues within the body. in general, molecules larger than 5 nm in diameter do not readily cross the capillary endothelium, and therefore will remain in the circulation until they are cleared from the body. there are certain tissues, however, that allow the entry of larger molecules, including the liver, spleen, and some tumours. these organs allow the passage of molecules up to 200 nm in diameter, which can accommodate a typical drug delivery nanocarrier 34.
figure 1. long double-stranded rna is introduced into the cytoplasm, where it is cleaved into small interfering rna (sirna) by the enzyme dicer. alternatively, sirna can be introduced directly into the cell. the sirna is then incorporated into the rna-induced silencing complex (risc), resulting in the cleavage of the sense strand of rna by argonaute 2 (ago2). the activated risc-sirna complex seeks out, binds to and degrades complementary mrna, which leads to the silencing of the target gene. the activated risc-sirna complex can then be recycled for the destruction of identical mrna targets.
after an sirna complex leaves the bloodstream, it must diffuse through the extracellular matrix, which is a dense network of polysaccharides and fibrous proteins that can create resistance to the transport of macromolecules and nanoparticles 35. this can slow or even halt the drug delivery process and create an additional opportunity for nanoparticles to be taken up by resident macrophages. having been taken up by the target cell, particles must then escape the endosome to reach the cytoplasm 36. if the sirna nanocomplex is unable to exit the endosome, it will be trafficked through endomembrane compartments of decreasing ph and be subject to degradative conditions in the lysosome 37. finally, if formulated with delivery agents, sirna must be released from the carrier to the cellular machinery.
modified sirna for improved delivery
humans have evolved a number of host-defence mechanisms against sirna, as it is a feature of certain viral infections. however, chemical modifications can be introduced into the sirna molecule to evade immune defences in vivo. for example, many non-modified sirnas can induce nonspecific activation of the immune system through the toll-like receptor 7 (tlr7) pathway 38, 39. this effect can be reduced by the incorporation of 2′-o-methyl modifications into the sugar structure of selected nucleotides within both the sense and antisense strands 38, 40 (fig. 3a).
2′-o-methyl modifications have also been shown to confer resistance to endonuclease activity 41 and to abrogate off-target effects when incorporated into the seed region, which corresponds to nucleotides 2-8 on the antisense strand 42. other common modification approaches to mitigate enzymatic degradation include the introduction of phosphorothioate backbone linkages at the 3′-end of the rna strands to reduce susceptibility to exonucleases. it is also possible to incorporate alternative 2′ sugar modifications (for example, a fluorine substitution) to increase resistance to endonucleases 43. another strategy to improve the therapeutic efficacy of sirna involves the conjugation of small molecules or peptides to the sense strand of the sirna. several small molecules have been reported to increase target-gene knockdown in vitro, including membrane-permeant peptides 44 and polyethylene glycol (peg) 45. of particular note are cholesterol-modified sirnas, which have demonstrated increased binding to serum albumin, resulting in improved biodistribution to certain targets including the liver (fig. 3b). cholesterol-modified sirnas were capable of silencing apolipoprotein b (apob) targets in mouse liver and jejunum, and of ultimately reducing total cholesterol levels 46. another study by difiglia and co-workers details the ability of a cholesterol-modified sirna to knock down a gene associated with huntington's disease. a single intrastriatal injection was able to delay the abnormal behavioural phenotype observed in a rapid-onset mouse model of this disease 23. given the success of cholesterol-modified sirna in vivo, wolfrum and co-workers attempted to identify alternative lipid-like molecules to serve as rna conjugates for improved delivery of sirna 47.
specifically, fatty acids and bile-salt derivatives were conjugated to sirna and injected into mice and hamsters in order to elucidate how modified sirna conjugates interact with the high-density lipoprotein (hdl) and low-density lipoprotein (ldl) receptors that enable delivery to the liver. it was found that shorter fatty-acid chain lengths (97%. our aim is to study the global gene expression pattern induced by ige sensitization and fcεri aggregation on human mast cells. gene expression analysis using cdna or oligo-dna microarrays has proven to be a sensitive method to develop and refine the molecular determinants of several human disorders, including cancer and autoimmune diseases. we analyzed the expression pattern of 8,793 transcripts from the stimulated mast cells, and compared the expression patterns with control/unstimulated samples. the complete gene expression data of our experiments, representing 8,793 probe ids, are available at the ncbi gene expression omnibus [25] and are accessible through geo series accession number gse1933. the microarray analysis revealed that 760 genes (~8.6%) were differentially expressed between resting and stimulated mast cells with statistical significance (p ≤ 0.05), and these were hierarchically clustered (figure 2). because of the relatively large number of genes that were differentially regulated, we focused on genes that were upregulated by at least 2-fold at any time point of mast cell stimulation. of the 760 genes, 58 genes were initially upregulated (at least 2-fold) by ige sensitization alone (table 1), and a total of 115 genes were overexpressed (by 2-fold or more in at least one time point) during the time course of mast cell activation by ige alone or after crosslinking fcεri (table 1). in order to examine the global characteristics of these genes, we used the gene ontology consortium database for biological processes [26].
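the selection rule described above (keep a gene if it is upregulated at least 2-fold over control at any time point) can be written down in a few lines. the sketch below is an illustration under stated assumptions rather than the analysis pipeline used in the study: the gene labels, time-point keys and fold-change values are invented, and the helper names are hypothetical. the last line simply re-derives the reported proportion of differentially expressed transcripts (760 of 8,793).

```python
def upregulated(fold_changes: dict[str, dict[str, float]], threshold: float = 2.0) -> list[str]:
    """Genes whose fold change over control reaches the threshold at any time point."""
    return sorted(
        gene
        for gene, by_time in fold_changes.items()
        if any(value >= threshold for value in by_time.values())
    )


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


if __name__ == "__main__":
    # Invented fold changes (stimulated / control) for three placeholder genes.
    example = {
        "GENE_A": {"2h": 3.1, "6h": 1.4, "12h": 1.1},
        "GENE_B": {"2h": 1.2, "6h": 1.9, "12h": 1.7},
        "GENE_C": {"2h": 0.8, "6h": 1.0, "12h": 2.0},
    }
    print(upregulated(example))
    print(f"{percent(760, 8793):.1f}% of transcripts differentially expressed")
```

applying the threshold per time point rather than to an average keeps transient responses, such as a gene that peaks early and then returns to baseline.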
using this database we analyzed the genes that were upregulated by at least 2-fold, allowing us to separate the 115 genes into the following functional families: (a) cytokines and cytokine receptors; (b) chemokines and chemokine receptors; (c) other immunoregulatory genes; (d) cell proliferation and anti-apoptosis; (e) adhesion and cytoskeleton remodeling; (f) transcription factors and regulators of transcription; (g) signal transduction; (h) genes involved in other cellular functions, "others" (table 1 and figure 3). our study revealed substantial changes in gene expression in response to monomeric-ige sensitization alone. in order to ensure that the human ige (ige, cat: 30-ai05, lot number a01071004, fitzgerald, concord, ma) used in this study was indeed monomeric ige, prior to each experiment we ran a sample of the ige through nondenaturing polyacrylamide gel electrophoresis (nonreducing page). no aggregates were observed at any time (data not shown). we focused the analysis of our data on genes that were upregulated by at least 2-fold over basal levels, and identified 58 genes that were increased by ige sensitization alone (table 1). we then separated them into different categories, based on their biological function (determined by public databases). among the most prominent findings was the upregulation of genes coding for the cytokines il-1β (3.3 fold), il-6 (2.7 fold), and csf1 (1.6 fold); genes coding for the chemokines il-8 (cxcl8) (2.1 fold), mip1β (ccl4) (3.5 fold), mcp3 (ccl7) (2.1 fold), groα (cxcl1) (2.3 fold) and groγ (cxcl3) (1.6 fold) were also upregulated. in addition, several genes coding for other receptors involved in immune responses, immunoregulatory genes, and genes involved in adhesion and/or cytoskeleton remodeling, regulation of apoptosis, signal transduction, and transcription were also upregulated by monomeric ige (table 1).
thus, these results suggest that "passive" sensitization of mast cells, with monomeric ige, may not only prime mast cells to be ready for the challenge to come, but that mast cells may also have the potential to

purity of mast cells

an interesting finding in our study was the upregulation of genes coding for the pyrogenic and proinflammatory cytokines il-1β and il-6 (table 1a). one of the key roles of il-1β is to trigger the upregulation of key proinflammatory proteins [27]. we also observed the upregulation of ptx3 (pentaxin-related gene) (table 1c): ptx3 is a protein involved in inflammation that is rapidly upregulated by il-1β. other cytokine-related genes that were upregulated are csf1 (a growth-promoting cytokine), and the cytokine receptors il1r1, il27ra, tnfrsf9, tnfrsf12 and il1rn (table 1a). we also found that the level of expression of genes coding for the chemokines il-8 (cxcl8), ccl7 (mcp3), ccl4 (mip1β) and cxcl1 (groα) was increased (table 1b). il-8 plays a major role in inflammatory responses, mainly due to its ability to recruit and activate neutrophils. ccl7, ccl4 and cxcl1 are known to recruit monocytes, nk cells, basophils, dendritic cells and th2 cells. moreover, our data also show that several other chemokine genes were upregulated during mast cell stimulation: these include ccl5 (rantes), cxcl3 (groγ), and the chemokine receptor ccrl2 (table 1b). these data suggest a key role for fcεri in triggering a wide range of inflammatory responses, as the over-expression of cytokines and chemokines is a prerequisite to triggering inflammation, including vascular permeability, and leukocyte and lymphocyte recruitment, differentiation and activation. several genes involved in innate and adaptive immune responses were upregulated by at least 2-fold during mast cell stimulation: these include the toll-like receptor 2 (tlr2), as well as several genes of the tnfα signaling pathways, including tnfaip6 (table 1c).
these findings support previous reports of the role of mast cells in innate immunity and antibacterial activity [21]. transcripts of the genes lif, cd69, and cd83 were also upregulated, as were the major histocompatibility complex genes (hla-dqb1 and hke2) and the inhibitory igg receptor fcgr2b (table 1c). the generation of transcripts for such genes suggests that mast cells can acquire characteristics typical of cells involved in innate and adaptive immune responses. several genes involved in cell proliferation and survival, such as pdgfa (platelet-derived growth factor alpha polypeptide), pdgfb (platelet-derived growth factor beta polypeptide), pbef1 (pre-b-cell colony-enhancing factor), tieg (tgfb inducible early growth response), and insig1 (insulin induced gene 1), were upregulated, as well as several anti-apoptosis genes including tnfaip3 (table 1d). this supports the notion that, in the initial stages of mast cell activation, several of the mediators produced serve mainly cell proliferation and survival [28]. thus, fcεri aggregation may enhance mast cell proliferation and survival, perhaps owing to the autocrine effects of the cytokines, growth factors, and antiapoptotic proteins triggered by fcεri aggregation.

figure 2 changes in gene expression in human mast cells following ige sensitization and fcεri aggregation. changes in expression over control of human cord blood derived mast cells that were activated by ige sensitization and fcεri crosslinking for different time points (2 hr, 6 hr and 12 hr). labelled crna from cells of each time point was hybridized to the human genome focus array and signals were scanned after fluidics. the data were analyzed as described in materials and methods, and the analysis revealed differential expression of 760 genes between resting and stimulated mast cells with statistical significance (p ≤ 0.05). agglomerative average-linkage hierarchical clustering of the five different experimental conditions was obtained for selected groups of genes using genespring 7.0. each colored box represents the normalized expression level of a given gene in a particular experimental condition and is colored according to the color bar. the data represent the average of four independent experiments. (column labels: sensitised, 2h, 6h, 12h; color bar: fold change.)

table 1 (fragment). columns: accession; gene; description; sensitised; 2h; 6h; 12h
nm_002984.1; ccl4; chemokine (c-c motif) ligand 4; 3.5; 4.7; 1.6; -3.0
nm_001511.1; cxcl1; chemokine (c-x-c motif) ligand 1; 2.3; 2.5; 1.1; -2.9
nm_006273.2; ccl7; chemokine (c-c motif) ligand 7; 2.1; 3.6; 5.3; 2.2
nm_002090.1; cxcl3; chemokine (c-x-c motif) ligand 3; 1.6; 2.5; 1.6; -1.8
nm_002985.1; ccl5; chemokine (c-c motif) ligand 5; 1.6; 1.8; 2.1; 1.8
af015524; ccrl2; chemokine (c-c motif) receptor

another functional characteristic of immune-cell activation is the coordinated expression of genes involved in cell adhesion and cytoskeleton remodeling (table 1e). of particular importance are the genes coding for proteins involved in cell motility, cytokinesis, endocytosis and exocytosis. we found at least 2-fold upregulation of various genes coding for proteins involved in cell adhesion, such as flrt2 (fibronectin leucine rich transmembrane protein 2), kal1 (kallmann syndrome 1 sequence), cd151 (cd151 antigen), and alcam (activated leukocyte cell adhesion molecule), as well as of several gene transcripts involved in cytoskeleton remodeling, including rasal1 (ras protein activator like 1), arhe (ras homolog gene family, member e), arf6 (adp-ribosylation factor 6), and flnb (filamin b, β-actin binding protein 278) (table 1e). the expression of genes involved in cell adhesion and cytoskeleton remodeling is an essential step in immune-cell activation. resting immune cells have cytoskeletal structures that sequester antigen, chemokine, and adhesion receptors in accessible regions of the plasma membrane.
upon activation, reorganization of the actin cytoskeleton leads to the formation of supramolecular activation clusters, bringing receptors and costimulatory molecules together, as well as important adaptor proteins that promote the sustained activation of the cell. stimulation of immune-effector cells through their antigen receptors initiates cell cycle entry and changes the gene expression pattern, a response generally referred to as "activation". we found that the genes for several transcription factors were upregulated during ige sensitization and fcεri aggregation, including the transcription factors most active during an immune response, such as nfκb and nfat (table 1f). we observed an increase in the transcripts for the nuclear factor of kappa light polypeptide genes 1, alpha, and epsilon (nfκb 1, 1a, and 1e), and the nuclear factor of activated t cells, nfatc1 (table 1f). other transcription factors upregulated were the oncogenes myc (v-myc myelocytomatosis viral oncogene homolog) and maff (v-maf musculoaponeurotic fibrosarcoma oncogene homolog f) (table 1f). interestingly, the activities of nfκb and nfat together are responsible for the transcription of many proinflammatory genes, including several genes coding for cytokines and chemokines [29, 30].

figure 3 pie chart showing the percentage distribution of the upregulated genes. a. percentage distribution of the total amount of genes upregulated. all the genes observed to be upregulated at least 2-fold at any given time point were distributed according to their biological function described in b

during mast cell activation, many signaling molecules are engaged in diverse responses, ranging from calcium release from internal stores and degranulation to the generation of lipid-derived proinflammatory mediators and the production of cytokines and chemokines.
in our study we observed that a substantial number of genes coding for intracellular signaling proteins were upregulated by at least 2-fold (table 1g). moreover, we show here the upregulation of genes that code for the oxidized low density lipoprotein receptor (olr1) and the low density lipoprotein receptor (ldlr) (table 1g), indicating a potential role for mast cells in cholesterol homeostasis. of particular interest is the upregulation of the gene coding for the lipid kinase sphingosine kinase 1 (sphk1) (table 1g). we and others have previously reported that sphk1 plays a critical role in the intracellular signaling pathways triggered by fcεri in mast cells [28, 31], and coordinates several physiological responses triggered by activated mast cells. we confirmed our microarray findings by real-time pcr on selected genes (il-1β, il-6, il-8, mcp3, rantes and sphk1), utilizing an aliquot of the same rna sample that was used for the microarray experiments (figure 4). the results showed that the messenger rna for the selected genes followed a pattern of expression similar to that observed in the oligo-dna microarray experiment, thus confirming the results and the quality of the data obtained with the high-density microarrays. mast cell activation also results in the sustained de novo production of pro-inflammatory cytokines and chemokines, both of which may contribute to the inflammation and pathology underlying allergic disease as well as to innate and acquired immunity. the amounts of these cytokines were measured by elisa and are depicted in figure 5a. fcεri aggregation triggered the generation of il-1β, il-6, il-8, ccl7 (mcp3) and ccl5 (rantes), whereas ige sensitization alone triggered smaller amounts of il-1β and mcp3, a high amount of il-8, and very small amounts of il-6 and rantes (figure 5a). we verified the differential expression of sphk1 by western blot analysis (figure 5b).
its levels of expression were found to be consistent with those of the microarray as well as the real-time pcr results. thus, these data, together with the real-time pcr data, validate the microarray results. mast cell activation via fcεri triggers exocytosis of granules containing pre-formed inflammatory mediators in a tyrosine kinase- and calcium-dependent manner. here we studied whether monomeric ige alone may activate fcεri intracellular signaling pathways, leading to physiological responses of mast cells, by analyzing the overall tyrosine phosphorylation, fluctuations in cytosolic ca2+ concentration, and degranulation (measured as β-hexosaminidase release) (fig. 6a, b & c). we show here that in our experimental setting monomeric ige alone was not able to trigger any changes in the overall protein-tyrosine phosphorylation pattern compared with resting cells, nor was it able to trigger calcium release from internal stores or degranulation. on the other hand, fcεri aggregation did indeed trigger all these responses (fig. 6a, b & c). binding of ige to fcεri enhances the cell surface expression of fcεri as a result of its ability to promote the stabilization/accumulation of fcεri on the mast cell surface in the presence of continued basal levels of protein synthesis [32, 33]. it is possible that most of the enhanced ige-dependent functions that are observed after antigen- or anti-ige-induced fcεri aggregation, in cells that have been sensitized with ige, are a consequence of the higher level of fcεri expression. however, a controversial question remains as to whether monomeric ige can also have more direct effects on mast cell functions. many studies over the years have shown no evidence that the binding of monomeric ige can induce detectable signaling or production of mediators by mast cells. however, some groups have reported that monomeric ige can enhance mast cell survival and trigger cytokine production [24, 28, 33].
in contrast, a recent study by matsuda et al [24] failed to find any ability of ige to enhance mast cell survival on withdrawal of scf. interestingly, the study by matsuda et al also showed that ige sensitization alone can induce the upregulation of cytokines and chemokines at the protein level, namely il-8 and mcp1 [24]. in agreement with this, we show that il-8 is induced by ige alone at the mrna as well as at the protein level (table 1, and figures 4 and 5). in contrast, we could not detect any significant increase in mcp-1 levels; we speculate that this difference could be due to the different amounts of ige used (1 µg/ml vs 2.5 µg/ml). however, we also show the upregulation of various chemokines, including the mcp-1-related protein mcp-3, which was also upregulated by ige alone (table 1, and figures 4 and 5). in our present study, we found that several genes related to proliferation were upregulated by ige alone (table 1); however, whether these genes, if fully transcribed, may be able to trigger mast cell proliferation in the absence of scf is not known. the observation that ige alone can induce the upregulation of a substantial number of genes encoding cytokines and chemokines has profound implications for our understanding of the role of mast cells in inflammation. cytokines and chemokines share many activities, including the ability to induce fever and shock syndrome in animal models [27]. cytokine and chemokine production are universal components of a wide range of disease states, including immune-complex-mediated conditions such as nephritis [34], arthritis [35], and acute graft rejection [36]. these data suggest a potential role for mast cells in triggering, or at least contributing to, strong inflammatory responses. it is also interesting to mention that several genes encoding transcription factors were upregulated by monomeric ige and fcεri aggregation. perhaps the most prominent of these transcription factors is nfκb.
nfκb represents a family of related proteins which dimerize to form transactivating complexes [35]. nfκb dimers are sequestered in the cytoplasm by interaction with inhibitory proteins (the iκbs). various stimuli activate kinase signaling cascades that result in the phosphorylation and degradation of iκb, thereby releasing nfκb to translocate to the nucleus, where it activates transcription of target genes. many studies have emphasized the role of this transcription factor in regulating genes at critical points in immune-cell development and activation [37]. many nfκb targets are antiapoptotic [38], which may explain the importance of the nfκb pathway in oncogenesis and resistance to chemotherapy [39, 40]. during an immune response several genes are triggered by nfκb; these include genes coding for various proinflammatory molecules, such as mips, il-1β, il-6, il-8, tnfα, groα and other cytokines and chemokines, as well as the cell adhesion molecules icam, vcam and selectins [41]. as immune cells progress through development and respond to antigenic challenge, they trigger signal transduction pathways that alter their cellular functions and the activity of transcription factors, changing their effector functions and their gene expression profiles. during mast cell activation, many signaling molecules are engaged in diverse responses, ranging from calcium release from internal stores and degranulation to the generation of lipid-derived proinflammatory mediators and the production of cytokines and chemokines. in our study we observed that a substantial number of genes coding for intracellular signaling proteins were upregulated by at least 2-fold during mast cell stimulation (table 1g). interestingly, we observed a substantial upregulation of the mrna for sphingosine kinase 1 (sphk1), even by ige alone; this upregulation was also confirmed at the protein level.
sphingosine kinases are novel enzymes that phosphorylate sphingosine (a membrane lipid) to generate the bioactive molecule sphingosine-1-phosphate (spp), which is implicated in several inflammatory responses. we and others have previously reported that sphk1 plays a critical role in the intracellular signaling pathways triggered by fcεri in mast cells [28, 31], and coordinates several physiological responses triggered by activated mast cells. we showed that sphk1 is involved in the calcium signals triggered by fcεri aggregation in human mast cells, and plays a critical role in mast cell degranulation [31].

figure 6 mast cell activation: tyrosine phosphorylation, calcium signals, degranulation. a. analysis of overall protein phosphorylation on tyrosine residues. upper panel: the overall tyrosine-phosphorylation pattern was analysed in cell extracts from control unstimulated cells (basal), cells treated with ige alone (ige) for 5 min, and cells after fcεri crosslinking for 5 min (xlfcεri). lower panel: the blots were probed for α-tubulin (control for equal loading). results shown are representative of four separate experiments. b. levels of intracellular free calcium. intracellular calcium measurements of mast cells following the addition of ige alone (ige), and following the addition of the anti-human ige to ige-sensitized cells (xlfcεri); the intracellular calcium levels were analyzed in a continuous reading for the times stated in the graph. results shown are the mean plus the standard deviation of triplicate measurements and are representative of four separate experiments. c. degranulation. β-hexosaminidase release was determined from control unstimulated mast cells (basal), following monomeric-ige sensitization for 30 minutes (ige), and following fcεri aggregation by addition of the anti-human ige to sensitized cells for 30 minutes (xlfcεri). results shown are the mean plus the standard deviation of triplicate measurements and are representative of four separate experiments.

previously, we reported a pivotal role for sphk [42, 43], showing that it is key in triggering calcium release from internal stores and the activation of the phagocyte nadph oxidase. moreover, very recently we demonstrated the role of sphk1 in inflammatory responses triggered by the anaphylatoxin c5a in human neutrophils and macrophages. these responses include calcium signals, degranulation, cytokine production and chemotaxis triggered by c5a [44, 45]. the observation that the gene encoding sphk1 is activated during ige sensitization of mast cells, coupled with the findings above, may indicate a key role for sphk1 in mast cell-triggered responses. the significance of this research lies in its support for the notion that activation of mast cells appears to be linked to a wide range of pathologies, not only allergies (as is widely recognized), but potentially other inflammatory conditions. the method of global gene expression analysis using cdna or oligo-dna microarrays has proven to be a sensitive method to identify and define/redefine the molecular determinants of several human disorders, including cancer and autoimmune diseases, and has provided us with signatures of the immune response [41]. using this technology, complemented with powerful analytical methods, we compared the gene expression profiles of human mast cells stimulated by ige sensitization, and from a series of time points of fcεri aggregation, with those of unstimulated/control human mast cells. whether changes in gene expression, under these conditions, are representative of a pathological state is not currently known.
it is also not known whether ige/antigen and fcεri aggregation will trigger the same set of genes in an organism, where a number of events may be activating mast cells at the same time. however, taken together, our data provide better insight into the molecular basis of mast cell activation, and provide meaningful information regarding the mechanisms by which mast cell activation may contribute to the overall activation of the immune response, with clinical implications for improving not only allergic conditions but potentially other inflammatory diseases in which mast cells may play a role. this study is an attempt to elucidate, from a global perspective, the molecular mechanisms that mast cells undergo during "priming" by ige sensitization and full activation by fcεri aggregation. in conclusion, our present study provides evidence that mast cells, by generating a broad range of cytokines and chemokines, may be potent contributors to the immune response by recruiting and/or activating other immune-effector cells, including lymphocytes, which may in turn continue the spread of the inflammatory response. moreover, changes occur in the expression of genes for transcription factors, intracellular signaling molecules, and cytoskeletal remodeling and anti-apoptosis pathways, which would also contribute to the amplification of the inflammatory response. mast cells are well-established innate immune-effector cells, and there is mounting evidence to suggest that mast cells may contribute to the development of acquired immunity [46, 47], whether in host defense or in allergic or autoimmune diseases. it will be pivotal to define in more detail whether and under which circumstances mast cells may influence the development and/or magnitude of acquired immune responses.

unless specifically stated, all materials and reagents were purchased from sigma-aldrich (singapore).
human umbilical cord blood (cb) samples were collected from normal full-term deliveries with formal informed consent, meeting the university institutional review board guidelines for research using human samples. cd34+ haematopoietic progenitor cells were harvested using the macs cell isolation kit (miltenyi biotec), following the manufacturer's instructions. the isolated cd34+ haematopoietic progenitor cells were cultured for 5-6 weeks in the presence of 100 ng/ml of stem cell factor (scf cat: 300-07, peprotech, rocky hill, nj), and for the first week this was supplemented with 10 ng/ml of interleukin-3 (il-3 cat: 200-03, peprotech, rocky hill, nj). cells were shown to be differentiated by staining them for specific mast cell markers as follows: for mast cell chymase, with an anti-human chymase mab (igg1-mab1254, chemicon, temecula, ca) and a fitc-conjugated secondary antibody (anti-mouse igg-fitc, sigma-aldrich, singapore); for c-kit, with a pe-conjugated anti-human c-kit mouse monoclonal antibody (cat. no. 555714; clone yb5.b8, bd biosciences - pharmingen, singapore), with isotype control anti-mouse ige-pe (cat. no. 555749; clone mopc-21, bd biosciences - pharmingen, singapore); and for fcεri cell-surface expression, with an anti-human fcεri polyclonal antibody (rabbit-igg-ab31494; abcam, cambridge, uk) and a fitc-conjugated secondary antibody (anti-rabbit igg-fitc, sigma-aldrich, singapore), which was also used as isotype control. cells were analyzed immediately using a coulter epics-elite esp flow cytometer (beckman, germany). purity was estimated at >97%. after differentiation, mast cells were plated in 6-well plates and allowed to rest for 24 hr. cells in all wells, except the control well, were sensitized with human monomeric ige (1 µg/ml, ige, cat: 30-ai05, lot number a01071004, fitzgerald, concord, ma) overnight.
fcεri aggregation was carried out by incubating the cells with monoclonal mouse anti-human ige (1 µg/ml, anti-human-ige, cat: mca2115, clone 4c3, serotec, oxford, uk) at 37°c for 2 hr, 6 hr and 12 hr. rna was extracted from all the samples using the qiagen rneasy mini kit (qiagen, valencia, ca). integrity of rna was checked by formamide gel electrophoresis; quantification of rna was carried out by measuring the absorbance at 260 nm. labeling and hybridization were carried out as previously described [46]. briefly, 8 µg of total rna from each sample was used to synthesize double-stranded cdna using a t7-(dt24) oligonucleotide primer and superscript reverse transcriptase (invitrogen). the resultant cdna was purified, and 1 µg of purified cdna was labeled with biotin by transcription in vitro. the labeled crnas were fragmented in the presence of metal ions and then hybridized to the hg-focus array (affymetrix). following hybridization, the gene chips were washed and stained, after which they were scanned with a gene array scanner (agilent technologies). data collection and analysis were carried out using microarray suite 5.0 (mas) (affymetrix). the absolute data (signal intensity, detection call and detection p-value) were exported into genespring v7.0 (silicon genetics, redwood city, ca, usa) software for analysis by a parametric test based on the cross gene error model (pcgem). the anova approach was used to find differentially expressed genes (p < 0.05). the benjamini and hochberg false discovery rate multiple testing correction was applied. agglomerative average-linkage hierarchical clustering of the five different experimental conditions was obtained for selected groups of genes with genespring 7.0 software (silicon genetics, redwood city, ca, usa) using standard correlation as the similarity measure. real-time pcr was performed, as previously described [46], using 1 µg of total rna from the same samples used for microarray experiments.
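the multiple-testing correction and the 2-fold filter mentioned for the microarray analysis can be illustrated with a minimal python sketch. this is not the genespring procedure used in the study; it only shows the standard benjamini-hochberg adjustment followed by a fold-change filter. the function names, the default threshold of 0.05 and the use of numpy are assumptions made for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def select_upregulated(fold_changes, adjusted_p, min_fold=2.0, alpha=0.05):
    """Flag genes that pass the FDR threshold (hypothetical alpha) and reach
    at least min_fold in any condition (one row per gene, one column per
    condition)."""
    fc = np.asarray(fold_changes, dtype=float)
    return (np.asarray(adjusted_p) <= alpha) & (fc.max(axis=1) >= min_fold)
```

with one row of fold changes per gene (for example the sensitised, 2 hr, 6 hr and 12 hr columns of table 1), `select_upregulated` returns a boolean mask of genes that are both significant after correction and upregulated at least 2-fold in at least one condition.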
pcr was performed for transcripts of il-1β (primers: for amplicon detection, the lightcycler rna master sybr green kit (roche) was used as described by the manufacturer. pcrs were performed in a lightcycler ® instrument (roche) as follows: reverse transcription at 61°c for 20 min; initial denaturation at 95°c for 2 min; amplification for 45-65 cycles of denaturation (95°c, 5 s, ramp rate 20°c/s), annealing (optimal temperature, 5 s, ramp rate 20°c/s) and extension (72°c, product length [bp]/25 s, ramp rate 2°c/s). a single online fluorescence reading for each sample was taken at the end of the extension step. quantitative results were expressed by identification of the second derivative maximum points, which mark the cycles at which the second derivatives of the fluorescence signal curves are at their maximum. these points were expressed as fractional cycle numbers. these cycle numbers were then plotted against the logarithm of the concentrations of serially 2-fold diluted standard samples to obtain a standard curve. the concentrations of unknown samples were calculated by extrapolation from this standard curve. positive sample specificity was confirmed by determining the melting curve (95°c, 5 s, ramp rate 20°c/s; 68°c, 15 s, ramp rate 20°c/s; 95°c, 0 s, ramp rate 0.1°c/s, continuous measurement). supernatants from control cells, sensitized cells, and cells following fcεri aggregation were collected and stored at -20°c until use. il-1β, il-6, il-8, mcp-3 and rantes levels in the supernatants were evaluated using elisa (r&d systems inc., mn, usa) following the manufacturer's instructions. western blots were carried out as previously described [31]. briefly, 40 µg of lysate for each sample was resolved on 10% polyacrylamide gels (sds-page) under denaturing conditions and then transferred to 0.45 µm nitrocellulose membranes.
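the standard-curve quantification described for the real-time pcr (fractional cycle numbers plotted against the logarithm of the concentrations of 2-fold diluted standards) can be sketched as follows. this is an illustrative python example under stated assumptions, not the lightcycler software's own algorithm: the function names are hypothetical, a simple least-squares line is assumed for the curve, and the numbers in the usage lines are made-up standards in arbitrary units.

```python
import numpy as np

def fit_standard_curve(concentrations, crossing_points):
    """Fit crossing_point = slope * log10(concentration) + intercept."""
    log_conc = np.log10(np.asarray(concentrations, dtype=float))
    cp = np.asarray(crossing_points, dtype=float)
    slope, intercept = np.polyfit(log_conc, cp, 1)  # least-squares line
    return slope, intercept

def estimate_concentration(crossing_point, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return 10 ** ((crossing_point - intercept) / slope)

# Hypothetical 2-fold dilution series (arbitrary units) with idealised
# crossing points that increase by one cycle per 2-fold dilution.
standards = [8.0, 4.0, 2.0, 1.0]
cycles = [20.0, 21.0, 22.0, 23.0]
slope, intercept = fit_standard_curve(standards, cycles)
unknown = estimate_concentration(21.5, slope, intercept)
```

in this idealised series the fitted slope is close to -3.32 cycles per log10 unit, the value expected when the product doubles every cycle; real standards would deviate from it, and the fit quality should be checked before reading unknowns off the curve.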
for overall tyrosine phosphorylation, the blots were probed using a specific monoclonal anti-phosphotyrosine primary antibody (p-tyr, sc-7020, santa cruz, ca, usa) and an anti-mouse hrp-conjugated secondary antibody (anti-mouse igg-hrp, a-4416, sigma). bands were visualized using the ecl western blotting detection system (amersham, singapore). for sphk1 expression, the blots were probed using a rabbit polyclonal anti-sphk1 primary antibody (anti-sphk1, x1627p, exalpha, ma, usa) and an hrp-conjugated secondary antibody (anti-rabbit igg-hrp, sc-2004, santa cruz, ca, usa). for the loading control, the blots were probed with a monoclonal anti-α-tubulin antibody (anti-α-tubulin, sc-5286, santa cruz, ca, usa) and an anti-mouse hrp-conjugated secondary antibody (anti-mouse igg-hrp, a-4416, sigma). bands were visualized using the ecl western blotting detection system (amersham, singapore) and quantified by densitometry analysis. cytosolic calcium was measured as described previously [31]. briefly, cells were loaded with 1 µg/ml fura2-am (molecular probes, leiden, the netherlands) in pbs, 1.5 mm ca2+ and 1% bsa. after removal of excess reagents by dilution and centrifugation (in pbs), the cells were resuspended in pbs containing 1.5 mm ca2+ and 1% bsa for 30 min, or in pbs containing 1.5 mm ca2+, 1% bsa, and human monomeric ige (1 µg/ml) for sensitization, for 30 min. after removal of excess ige by dilution and centrifugation (in pbs), the cells were resuspended in 1.5 mm ca2+ supplemented pbs and warmed to 37°c in the cuvette; unsensitized cells were placed in the cuvette and cytosolic calcium was measured before and after the addition of monomeric ige. ige-sensitized cells were placed in the cuvette and fcεri was crosslinked by addition of mouse anti-human ige (1 µg/ml). fluorescence was measured at 340 and 380 nm. degranulation was measured as previously described [31].
briefly, an aliquot of cells was resuspended in pbs containing 1.5 mm ca2+ and 1% bsa, and incubated with monomeric ige for 30 min at 37°c. another aliquot of cells was resuspended in pbs containing human monomeric ige (1 µg/ml) for sensitization, 1.5 mm ca2+ and 1% bsa, for 30 min. after removal of excess ige by dilution and centrifugation (in pbs), the cells were resuspended in 1.5 mm ca2+ supplemented pbs, and fcεri was crosslinked by addition of mouse anti-human ige (1 µg/ml) to cells for 30 min at 37°c. following the incubation, 50 µl of supernatant was incubated with 200 µl of 1 mm p-nitrophenyl n-acetyl-β-d-glucosaminide for 1 hr at 37°c. the total β-hexosaminidase concentration was determined by a 1:1 extraction of the remaining buffer and cells with 1% triton x-100; a 50 µl aliquot was removed and analyzed as described. reactions were quenched by addition of 500 µl of 0.1 m sodium carbonate buffer. the enzyme concentration was determined by measuring the od at 400 nm. β-hexosaminidase release was represented as a percent of total enzyme. to analyze the expression of the intracellular mast cell chymase, 1 × 10^6 cells were washed with ice-cold pbs, then fixed and permeabilised using the fix and perm reagents from caltag (caltag laboratories, burlingame, ca) as follows: after washing, samples were resuspended in 100 µl of reagent a (fixation medium) and incubated for 15 min at rt. the cells were then washed twice with ice-cold pbs, resuspended in 100 µl of reagent b (permeabilization medium) and incubated for 15 min at rt. cells were washed twice and resuspended in 100 µl of pbs/1% fbs, 5 µl of the anti-human chymase mab (igg1-mab1254, chemicon, temecula, ca) was added, and samples were incubated for 20 min at rt. samples were washed twice in ice-cold pbs and resuspended in pbs/1% fbs, then 5 µl of fitc-conjugated secondary antibody (anti-mouse igg-fitc, sigma-aldrich, singapore) was added and samples were incubated in the dark for 30 min at rt.
samples were washed twice with ice-cold pbs and resuspended in 100 µl of pbs/1% fbs for immediate analysis. to analyze the cell surface expression of c-kit and fcεri, the samples were initially processed as above except that the permeabilisation step was omitted. for c-kit, the primary antibody was an anti-human c-kit mouse monoclonal (mca1841, clone 104d2, serotec, oxford, uk), and the secondary antibody was fitc-conjugated (anti-mouse igg-fitc, sigma-aldrich, singapore). for fcεri cell-surface expression, the cells were labeled with the primary anti-human fcεri polyclonal (rabbit-igg-ab31494; abcam, cambridge, uk), and the secondary antibody was fitc-conjugated (anti-rabbit igg-fitc, sigma-aldrich, singapore). all the samples were analyzed by flow cytometry using a facscalibur machine (bd biosciences), and the data were analysed using the cell quest™ pro software.

mast cells in innate immunity mast cells to the defense mast cells in autoimmune disease mast cells the diverse potential effector and immunoregulatory roles of mast cells in allergic disease transcriptional response of human mast cells stimulated via the fcεri and identification of mast cells as a source of il-11 the receptor with high affinity for ige signalling through the high-affinity ige receptor fcεri roles of mast cells and basophils in innate and acquired immunity complexity and redundancy in the pathogenesis of asthma: reassessing the roles of mast cells and t cells mast cells can amplify airway reactivity and features of chronic inflammation in an asthma model in mice an essential role of mast cells in the development of airway hyperresponsiveness in a murine asthma model mast cells phagocytosis of fimh expressing enterobacteria histamine-induced activation of human lung macrophages mast cell-derived tumor necrosis factor induces hypertrophy of draining lymph nodes during infection human mast cell transcriptome project identification of novel mast cell genes by serial analysis of gene 
expression in cord blood-derived mast cells gene expression screening of human mast cells and eosinophils using high-density oligonucleotide probe arrays: abundant expression of major basic protein in mast cells comparative cytokine profile of human skin mast cells from two compartments strong resemblance with monocytes at baseline but induction of il-5 by il-4 priming gene expression profiling of ca 2+ -atapase inhibitor dtbhq and antigen-stimulated rbl-2h3 mast cells identification of specific gene expression profiles in human mast cells mediated by toll-like receptor 4 and fcεri gene expression profiles for fcεri, cytokines and chemokines upon fcεri activation in human cultured mast cells derived from peripheral blood marked increase in cc chemokine gene expression in both human and mouse mast cell transcriptomes following fcε-receptor i cross-linking: an interspecies comparison monomeric ige enhances human mast cell chemokine production: il-4 augments and dexamethasone suppresses the response interleukin-1 induces a shock-like state in rabbits: synergism with tumor necrosis factor and the effects of cyclooxygenase inhibition monomeric ige stimulates signaling pathways in mast cells that lead to cytokine production and cell survival nf-kappa b and rel proteins: evolutionary conserved mediators of immune responses generic signals and specific outcomes: signaling through ca 2+ , calcineurin, and nf-at dichotomy of ca 2+ signals triggered by different phospholipid pathways in antigen stimulation of human mast cells minimal requirements for ige mediated regulation of surface fcεri karasuyama h: drastic up-regulation of fcεri on mast cells is induced by ige binding through stabilization and accumulation of fcεri on the cell surface increased il-12 release by monocytes in nephritic patients the physiological and pathophysiological role of chemokines during inflammatory and immunological responses differential roles of il-1 and tnf-α on graft versushost disease and 
graft versus leukemia rel/nf-κb transcription factors: key mediators of b-cell activation constitutive nfκb maintains high expression of a characteristic gene network, including cd40, cd86, and a set of antiapoptotic genes in hodgkin/reed-stemberg cells control of oncogenesis and cancer therapy resistance by the transcription factor nf-κb calcium mobilization via sphingosine kinase in signalling by the fcεri antigen receptor signatures of the immune response fcγri coupling to phospholipase d initiates sphingosine kinasemediated calcium mobilization and vesicular trafficking a molecular switch changes the signalling pathway used by the fcγri antibody receptor to mobilize calcium anaphylatoxin signaling in human neutrophils: a key role for sphingosine kinase antisense knockdown of sphingosine kinase 1 in human macrophages inhibits c5a receptordependent signal transduction, ca 2+ signals, enzyme release, cytokine production and chemotaxis mast cells in the development of adaptive immune responses mast cells in allergy and autoimmunity: implications for adaptive immunity melendez aj: expression profile of immune response genes in patients with severe acute respiratory syndrome this work was supported by a bmrc-young investigator award (r-185-000-044-305). we thank a-k fraser-andrews for proofreading the manuscript. mj carried out data analysis and prepared the microarraydata table and figures. hkt isolated the progenitor cells, differentiated the mast cells and carried out the receptor crosslinking. rr carried out the rna extraction, labeling and microarray hybridization. lz carried out rt-pcr. kkc and mr provided the cord blood. ajm designed the study and drafted the manuscript. all authors read and approved the final manuscript. 
key: cord-292004-9rpoll7y authors: mitchell, hugh d.; eisfeld, amie j.; stratton, kelly g.; heller, natalie c.; bramer, lisa m.; wen, ji; mcdermott, jason e.; gralinski, lisa e.; sims, amy c.; le, mai q.; baric, ralph s.; kawaoka, yoshihiro; waters, katrina m. title: the role of egfr in influenza pathogenicity: multiple network-based approaches to identify a key regulator of non-lethal infections date: 2019-09-20 journal: front cell dev biol doi: 10.3389/fcell.2019.00200 sha: doc_id: 292004 cord_uid: 9rpoll7y despite high sequence similarity between pandemic and seasonal influenza viruses, there is extreme variation in host pathogenicity from one viral strain to the next. identifying the underlying mechanisms of variability in pathogenicity is a critical task for understanding influenza virus infection and effective management of highly pathogenic influenza virus disease. we applied a network-based modeling approach to identify critical functions related to influenza virus pathogenicity using large transcriptomic and proteomic datasets from mice infected with six influenza virus strains or mutants. our analysis revealed two pathogenicity-related gene expression clusters; these results were corroborated by matching proteomics data. we also identified parallel downstream processes that were altered during influenza pathogenesis. we found that network bottlenecks (nodes that bridge different network regions) were highly enriched in pathogenicity-related genes, while network hubs (highly connected network nodes) were significantly depleted in these genes. we confirmed that this trend persisted in a distinct virus: severe acute respiratory syndrome coronavirus (sars). the role of epidermal growth factor receptor (egfr) in influenza pathogenesis, one of the bottleneck regulators with corroborating signals across transcript and protein expression data, was tested and validated in additional mouse infection experiments. 
we demonstrate that egfr is important during influenza infection, but the role it plays changes for lethal versus non-lethal infections. our results show that by using association networks, bottleneck genes that lack hub characteristics can be used to predict a gene’s involvement in influenza virus pathogenicity. we also demonstrate the utility of employing multiple network approaches for analyzing host response data from viral infections.
keywords: systems biology, network topology, influenza, sars-cov, data integration
introduction
viruses that are newly introduced to the human population have the potential to be highly pathogenic. while the pathogenicity of these new strains tends to wane as adaptation progresses, emerging viruses, such as highly pathogenic avian influenza strains, are an ever-present threat to human health and the global economy because it is difficult to predict when a new pathogenic strain will appear. the 1918 influenza a virus pandemic claimed 20-100 million lives worldwide (peiris et al., 2004). multiple influenza pandemics have emerged since. most recently, human infections of h7n9 influenza, which first emerged in the spring of 2013, have resulted in 1568 infections including 616 deaths (food and agriculture organization [fao], 2019). since 2003, h5n1 avian influenza has caused 860 human infections with a mortality rate of 53% (world health organization [who], 2018). the 2009 h1n1 pandemic caused less severe disease in humans but spread to nearly 200 countries (michaelis et al., 2009) and may have contributed to the deaths of an estimated 284,000 people (dawood et al., 2012). the fact that influenza strains vary greatly in pathogenicity underscores the need to understand the underlying host mechanisms that contribute to the severity of infection so that we are better prepared to alleviate the effects of highly pathogenic strains.
despite the potential for pandemic infection with a highly virulent, highly transmissible new strain of influenza, the current understanding of these mechanisms remains limited. a major advantage of a systems biology approach to pathobiology is the ability to identify novel, key elements of a biological process, such as which regulators are involved in critical processes. high-throughput profiling methods (e.g., transcriptomics) provide powerful tools for examining how entire systems respond to different perturbations such as acute disease. network reconstruction provides the opportunity to utilize all available data and is a critically important tool for representing complex sets of interactions. for biological systems, network analysis has proven useful for analyzing genetic interactions among genes, as well as protein-protein, protein-dna, and kinase-substrate interactions (ideker and krogan, 2012). in addition, network approaches have attempted to identify regulatory associations between genes and proteins by comparing expression patterns across multiple conditions (faith et al., 2007; mcdermott et al., 2009, 2012). these approaches may capture physical interactions but can also identify more subtle, though equally important, regulatory relationships between gene pairs or within gene clusters. previous work has shown that prioritization of key regulators based on network topology is superior to simple ranking of differentially expressed genes (mcdermott et al., 2011). our group and others have demonstrated that genes occupying certain topological positions in association networks play important regulatory roles in the biological process being studied (yu et al., 2007; mcdermott et al., 2009, 2016; mitchell et al., 2013). network hubs are identified by the degree centrality metric, which is the number of edges associated with any given node.
network bottlenecks are identified by the betweenness centrality metric, which is the number of shortest paths between all pairs of nodes that pass through a given node. these are two of the most studied topological features, yet it is unclear from the literature which of these is the most effective predictor of regulatory function for any given network construction approach or biological context. it is also unclear what distinct regulatory roles each has; such information is important to discern as it may be used to identify targets for therapeutic intervention. typically, studies that attempt to uncover the underlying mechanisms of pathogenicity simply compare a single high- and low-pathogenicity strain or dose (hatta et al., 2001; kobasa et al., 2007; cilloniz et al., 2009; safronetz et al., 2011; tisoncik-go et al., 2016). while this approach may allow pathogenicity-related host responses to be identified, it can be difficult to distinguish between responses that are truly tied to pathogenicity and those that are strain-specific. for this study, we use network clustering and topology to compare six influenza strains and mutants of varying pathogenicity (referred to herein as the pathogenicity gradient) at multiple doses and four time points in the context of a murine infection model. this allows us to identify pathogenicity-related traits with greater certainty than in previous studies. we utilize global transcriptomic and proteomic data from these experiments, thus providing a more complete view of the layered interaction between host and virus. we demonstrate a network-based approach for identifying critical factors in influenza pathogenesis and test our findings with a pharmacological inhibitor during lethal and non-lethal infections.
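as an illustration of the two topology metrics defined in the introduction, the following python sketch computes degree (the hub score) and betweenness (the bottleneck score) on a small toy graph. it is not the authors' pipeline: the graph, the node names, and the function names are hypothetical, and betweenness is computed here with brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def build_toy_graph():
    """Hypothetical graph: two 4-node cliques joined only through node 'x'."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for group in (["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]):
        for i, u in enumerate(group):
            for v in group[i + 1:]:
                add_edge(u, v)
    add_edge("a1", "x")
    add_edge("x", "b1")
    return adj


def degree_centrality(adj):
    """Hub score: number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Bottleneck score: shortest paths passing through each node
    (Brandes' algorithm, unweighted and undirected, not normalised)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                      # nodes in order of increasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}   # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # each unordered pair was counted from both endpoints
    return {v: s / 2 for v, s in score.items()}


if __name__ == "__main__":
    graph = build_toy_graph()
    print(degree_centrality(graph))
    print(betweenness_centrality(graph))
```

in this toy graph the bridging node `x` has the lowest degree (2) but the highest betweenness (16), which is the "bottleneck that lacks hub characteristics" situation the study is concerned with.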
microarray data were deposited previously in the gene expression omnibus (geo) under the following accession numbers: gse33263: influenza a/vn/1203/04 infection in mice with three viral doses at 1, 2, 4, and 7 days (im001); gse37572: ha avirulent mutation in a/vietnam/1203/2004(h5n1) infection in mice at 10 4 pfu at 1, 2, 4, and 7 days (im004); gse43301: influenza a/vn/1203/04 pb2-627e mutant infection in mice with 10 4 pfu at 1, 2, 4, and 7 days (im005); gse43302: influenza a/vn/1203/04 pb2-627e mutant infection in mice with 10 3 pfu at 1, 2, 4, and 7 days (im005); gse44441: influenza a/vn/1203/04 pb1-f2 mutant infection in mice with 10 4 pfu at 1, 2, 4, and 7 days (im006); gse44445: influenza a/vn/1203/04 ns1trunc124 mutant infection in mice with two doses at 1, 2, 4, and 7 days (im007); gse37569: influenza a/ca04/2009 infection in mice with four doses at 1, 2, 4, and 7 days (ca04m001); gse33266: sars-cov ma15 infection in mice with four viral doses at 1, 2, 4, and 7 days (sm001); gse50000: sars-cov ma15, icsars-cov, or sars batsrbd infection in mice with two viral doses at 1, 2, 4, and 7 days (sm003); gse49262: sars-cov ma15 or sars deltaorf6 infection in mice with 10 5 pfu at 1, 2, 4, and 7 days (sm012); gse49263: sars-cov ma15 or sars nsp16 infection in mice with 10 5 pfu at 1, 2, 4, and 7 days (sm014). proteomics data for im001, im004, im005, im006, and im007 (described above) can be found at https://omics.pnl.gov/project-data/systems-virology-contract-data. we set out to build a synthetic pathogenicity profile that represents the severity of the infection for each experimental condition.
the viruses used to construct the pathogenicity gradient include the h1n1 strain, a/california/04/2009 (ca04), from the 2009 pandemic; the highly pathogenic h5n1 avian strain, a/vietnam/1203/2004 (vn1203); and four mutants of vn1203: vn1203-haavir (which lacks the multi-basic cleavage site in the viral hemagglutinin protein that is critical for extra-pulmonary viral spread), vn1203-pb2-627e (which lacks a mammalian-adapting mutation that substantially increases the replicative ability of the viral polymerase complex in mammalian cells), vn1203-ns1trunc (which encodes a c-terminal truncation in the effector domain of the ns1 host response antagonist protein), and vn1203-pb1f2del (which lacks expression of the pb1-f2 protein). genes that mirror this profile are hypothesized to be related to pathogenicity in some way, a proposition similar to that given by taylor et al. (2016). to construct the profile, we scaled all six strains/mutants proportional to their median lethal dose (mld 50 ) value, or the amount of viral particles at which 50% of infected mice succumb to infection (figure 1a). we therefore assigned a score for each strain corresponding to the log of the mld 50 and then adjusted the scores to account for differences in administered doses across studies. since infection conditions for each strain included a dose at 10 4 pfu, the corresponding strain's log mld 50 was assigned to all infection conditions at this dose. to make the score more intuitive (high score = high pathogenicity), each log mld 50 was subtracted from the maximum observed log mld 50 . the intent was to quantitatively relate the experimental conditions to each other, with the expectation that genes related to pathogenicity would manifest expression patterns similar to the pathogenicity profile. to avoid negative values, an additional unit was added to each score.
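the scoring steps just described can be sketched in python as follows. this is a hedged reconstruction, not the authors' formula: the explicit equation is not reproduced in this text, so the form of the dose adjustment (a shift of log10(dose) minus log10(common dose)) is an assumption chosen to be consistent with the description, and the strain names and mld 50 values in the usage example are hypothetical placeholders rather than data from the study.

```python
import math


def pathogenicity_scores(mld50, conditions, common_dose=1e4):
    """Score each (strain, dose) infection condition.

    mld50:       dict mapping strain name -> MLD50 in pfu
    conditions:  iterable of (strain, dose_in_pfu) pairs
    common_dose: dose shared by all strains (10^4 pfu in the text)
    """
    max_log_mld50 = max(math.log10(m) for m in mld50.values())
    scores = {}
    for strain, dose in conditions:
        # flip the scale so that a high score means high pathogenicity,
        # then add one unit to avoid negative values
        base = max_log_mld50 - math.log10(mld50[strain]) + 1
        # ASSUMPTION: the dose adjustment is a log-scale shift relative to
        # the common dose; at the common dose the shift is zero
        dose_shift = math.log10(dose) - math.log10(common_dose)
        scores[(strain, dose)] = base + dose_shift
    return scores


if __name__ == "__main__":
    # hypothetical MLD50 values, for illustration only
    example_mld50 = {"strain_a": 1e1, "strain_b": 1e5}
    example_conditions = [("strain_a", 1e4), ("strain_b", 1e4), ("strain_b", 1e3)]
    for condition, score in pathogenicity_scores(example_mld50, example_conditions).items():
        print(condition, round(score, 3))
```

under this assumed form, the least pathogenic strain scores 1 at the common dose and 0 at a tenfold lower dose, which matches the stated purpose of the added unit.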
therefore, the pathogenicity level for a given infection condition i, with dose d i and the particular viral strain's mld 50 m i , is given by: where m max is the maximum observed mld 50 , and d com is the dose common to all strains/mutants, i.e., the experimental conditions for all infections included at least this dose, if not others. applying this calculation across all conditions yielded the profile (figure 1b). since the array of conditions differed somewhat between the transcript and protein data, profiles unique to each of these datasets were generated. sample collection and microarray processing are described for this dataset in tchitchek et al. (2013). for our analysis, we selected the probes that were (1) present on all arrays after quality control filtering, (2) previously identified as significantly changed from mock expression (q-value < 0.05), and (3) had a simultaneous log2 fold change of at least 1.5 in at least one experimental condition (li et al., 2011). this resulted in the selection of 7471 probes for analysis. the results for proteomic data, sample processing, capillary lc-ms/ms analysis, spectral matching, and peptide-to-protein rollup are described in tchitchek et al. (2013). missing value imputation was performed using a regularized expectation maximization algorithm (webb-robertson et al., 2015). selected proteins were only those that were (1) detected in every experiment before imputation (experiment = set of infections with one given strain), and (2) significantly changed from mock (p < 0.05) in at least one experimental condition; this resulted in 1476 proteins that were used in our analysis. for the correlation calculations (below), expression data from selected time points were extracted and then mean-summarized for each strain and dose.
some viruses received heavier experimental coverage than others; we therefore expanded the data compendium to equalize the influence each virus strain in the pathogenicity gradient has on the correlation calculations by duplicating the data from under-represented conditions so that every strain was equally represented in the compendium. we calculated pearson's correlation using the expanded expression profile for each gene/protein and the appropriately expanded pathogenicity profile. day 1, day 2, day 4, and day 7 designations were used to refer to the top 5% of pathogenicitycorrelated genes (both positive and negative correlation) using individual time point data for correlation calculations. network inference was performed using the context likelihood of relatedness software tool as described (mitchell et al., 2013) . the network centrality measures of betweenness and degree were determined using the r igraph package. to identify regulators of interest, we used the "interactions by protein function" and "significant interactions within set(s)" tools in the metacore software package (clarivate analytics, philadelphia pa) to identify genes/proteins whose interactors were enriched among pathogenicity-related proteins. a similar approach was used for pathogenicity-related genes. we used the r weighted gene correlation network analysis (wgcna) package to identify clusters of genes or proteins with behavioral similarity (langfelder and horvath, 2008) . cluster identification was performed using the blockwisemodules function with the following parameter values for transcript cluster analysis: power = 12, minmodulesize = 30, maxblocksize = 8000, reassignthreshold = 0, mergecutheight = 0.25, and pamrespectsdendro = f. all animal experiments and procedures were approved by the university of wisconsin (uw)-madison school of veterinary medicine animal care and use committee under relevant institutional and american veterinary association guidelines. 
all experiments using replication competent h1n1 viruses were performed in biosafety level 2 (bsl-2) or animal enhanced biosafety level 2 (absl-2) containment laboratories at uw-madison. experiments using replication competent h5n1 virus were performed in an absl-3+ containment laboratory at uw-madison. uw-madison bsl-2, absl-2, and absl-3+ laboratories are approved for use by the united states (us) centers for disease control and prevention (cdc) and the us department of agriculture. experiments using replication competent sars-cov were performed in an absl-3+ containment laboratory at the university of north carolina at chapel hill (unc). unc bsl-2, absl-2, and absl-3+ laboratories are approved for use by the us cdc. madin-darby canine kidney (mdck) cells were propagated in a minimum essential medium (mem) containing 5% newborn calf serum and were maintained at 37°c in an atmosphere of 5% co 2 . cell stocks are periodically restarted from early passage aliquots and routinely monitored for mycoplasma contamination. the a/california/04/09 h1n1 virus (ca04) was provided by the us cdc. the a/chicken/vietnam/ty167/2011 (h5n1) virus (ty167) was obtained through surveillance activities in vietnam. stock viruses were generated by passaging an aliquot of the original virus once in mdck cells containing 0.6% bovine serum albumin (bsa) fraction v (sigma-aldrich) and 1 µg/ml tosyl phenylalanyl chloromethyl ketone (tpck)-treated trypsin (ca04) or in embryonated chicken eggs (ty167), as previously described (eisfeld et al., 2014). stock virus titers were quantified by plaque assay in mdck cells using standard methods. sars-cov was propagated and assayed for titer levels, as published previously. nine- to ten-week-old female c57bl/6j mice (the jackson laboratory) were administered 0 or 100 mg/kg of gefitinib (tocris bioscience) in 1% tween-80 in phosphate-buffered saline (pbs) by oral gavage 1 day prior to infection and each subsequent day until the end of the experiment.
four mice were used for each drug and viral dose combination. for infection, mice were anesthetized by intraperitoneal (i.p.) injection of ketamine and dexmedetomidine (45-75 mg/kg ketamine + 0.25-1 mg/kg dexmedetomidine) and were intranasally inoculated with 50 µl of pbs-containing viruses, as indicated in figure 6 and the corresponding text in the "effect of egfr inhibition on influenza pathogenesis in mice" section. following inoculation, dexmedetomidine was reversed by i.p. injection of atipamezole (0.1-1 mg/kg). subsequent to infection, individual body weights and survival were monitored for up to 17 days, and mice were humanely euthanized when exhibiting severe clinical symptoms or at the end of the observation period. sars-cov infections in mice were performed as previously published . to model the trend in mice weight over time, we used linear mixed effects models with a normal conditional distribution and identity link on each time course. fixed effects were day, gefitinib level, and the interaction between day and gefitinib level, while random effects for each mouse were included to account for variability in the mice and for the non-independent nature of the data over time. a second model that did not include the gefitinib level and day interaction terms was also fit; a likelihood ratio test was then conducted to determine if the group slopes were significantly different. due to the patterns of mouse weight over time, a single linear model was not always sufficient to model the data; this was determined by using linear splines to estimate change points where separate linear models should be used to represent the trend for different time ranges. specifically, knot points were identified by determining if adding an additional time point to the linear model of an existing segment changed the model. if the new point changed the model significantly, then a new knot point was identified; otherwise, the time point was added to the segment. 
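the knot-finding procedure described above can be sketched as a greedy, left-to-right segmentation. the python sketch below is a simplification under stated assumptions: it uses ordinary least squares on a single weight series rather than mixed effects models, and it replaces the (unspecified) significance criterion with a fixed residual tolerance `tol`. the function names, the tolerance, and the minimum segment length are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_knots(days, weights, tol, min_points=3):
    """Greedy knot detection on one time course.

    A candidate time point joins the current segment when the segment's
    fitted line predicts it to within `tol`; otherwise the last point of
    the segment becomes a knot and a new segment starts there.
    ASSUMPTION: `tol` stands in for the significance test in the text.
    """
    knots = []
    start = 0
    i = min_points  # index of the next candidate point
    while i < len(days):
        slope, intercept = fit_line(days[start:i], weights[start:i])
        predicted = slope * days[i] + intercept
        if abs(weights[i] - predicted) > tol:
            knot_index = i - 1
            knots.append(days[knot_index])
            start = knot_index
            i = start + min_points
        else:
            i += 1
    return knots


if __name__ == "__main__":
    # hypothetical weight course: stable for 4 days, then steady loss
    days = list(range(10))
    weights = [100, 100, 100, 100, 100, 98, 96, 94, 92, 90]
    print(find_knots(days, weights, tol=1.0))
```

separate linear models would then be fit to the segments between consecutive knots, as the text describes.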
after all knot points were found, a single random mixed effect model was fit to each segment of data. ty167 at 10 3 pfu was modeled as a single segment, ca04 at 10 2 pfu and 10 3 pfu in three segments, and ty167 at 10 2 pfu in four segments. separate p-values were determined for each segment (as identified from the knot points as boundaries), which are provided in table 2 . our overall strategy is depicted in figure 2 . we used transcriptomics and proteomics data in conjunction with pathogenicity data from the different virus strains/mutants (figures 2a,b) to identify pathways and individual genes/proteins that were important for influenza pathogenicity. correlated gene modules in the transcriptomics were first detected ( figure 2c ) and compared with the pathogenicity profile (figure 2a ) to identify gene modules whose behavior linked them to pathogenicity. individual proteins whose behavior correlated with pathogenicity were incorporated into interaction enrichment analysis, which identified genes whose interaction neighbors from curated networks were enriched among pathogenicity-correlated proteins ( figure 2d) . these results could be connected to host responses evident in the gene clusters depicted in figure 2c . an association network built from mutual information of perturbed gene pairs ( figure 2e ) was used for topology analysis, which yielded network hubs and bottlenecks. lists of pathogenicity-correlated genes from early and late time points ( figure 2f ) were combined and compared to network nodes with high hub and bottleneck scores. comparison of genes correlated with pathogenicity to network hubs and bottlenecks showed significant overlap with network bottlenecks (figure 2g ), but hubs were strikingly excluded ( figure 2h) . examination of results from interaction enrichment ( figure 2d ) and network topology (figures 2e-g) revealed epidermal growth factor receptor (egfr) as a candidate for follow-up experiments ( figure 2i ). 
we utilized transcriptomic and proteomic datasets (represented by figure 2b ) of mouse infection with six strains/mutants of influenza at varying doses and times, as described in tchitchek et al. (2013) . these viruses display varying degrees of virulence, as assessed by the minimal dose that is lethal to 50% of animals to which it is administered (mld 50 , figure 1a , see section "materials and methods" for details). samples were collected at 1, 2, 4, and 7 days post-infection for global transcriptomics and proteomics analysis of lung tissues relative to time-matched mock-infected controls. to identify transcripts correlated with pathogenicity, we used the wgcna network clustering approach (langfelder and horvath, 2008) to cluster gene expression profiles across all experiments into expression modules that represent groups of genes with similar expression behaviors ( figure 2c ). this approach further affirms that the overall gene expression pattern of each module has true biological meaning because each identified pattern is manifested by many genes. the representative expression profile of each module, or eigengene, can be correlated with clinical measures or other metadata to identify modules of interest (langfelder and horvath, 2008; saris et al., 2009; levine et al., 2013) . we thus applied wgcna to our transcript dataset to identify network modules related to influenza pathogenicity. figure 3a shows the correlations of all eigengene profiles to pathogenicity. two of the modules, pink and black, showed much higher correlations than the others and were selected for further analysis. the pink module is strongly positively correlated with pathogenicity ( figure 3b) . we found statistical enrichment in plasminogen activation among the genes in this module, particularly klkb1 and coagulation factor x1; this suggests that pathogenic influenza infection involves a perturbed coagulation cascade. 
the black module (figure 3c) is strongly negatively correlated with pathogenicity and was strikingly enriched for b-cell activation, implying that a diminished presence of b-cell activity is related to pathogenesis in influenza. interestingly, a previous report showed that influenza caused apoptotic loss of bone marrow b-cells in mice despite the complete lack of viral particles detected in the bones (sedger et al., 2002). since b-cells are known to both reside in and travel through the lungs (polverino et al., 2016), high-path flu may trigger death of lung b-cells, resulting in previously unappreciated effects on host response during these infections.
figure 2 | overview of analysis strategy. omics data (transcriptomics and proteomics) were used in conjunction with pathogenicity data from the different virus strains/mutants (a,b). correlated modules in the transcriptomics were detected (c) and compared with the pathogenicity profile (a) to identify gene modules whose behavior linked them to pathogenicity. individual proteins whose behavior correlated with pathogenicity were submitted to interaction enrichment analysis, which looked for genes whose interaction neighbors from curated networks were enriched among pathogenicity-correlated proteins (d). an association network built from mutual information of perturbed gene pairs (e) was used for topology analysis, which yielded network hubs and bottlenecks. lists of pathogenicity-correlated genes from early and late time points (f) were combined and compared to network nodes with high hub and bottleneck scores. overlap was seen with network bottlenecks (g) but not hubs (h). egfr was identified as a candidate for follow-up experiments based on overlaps between interaction enrichment and pathogenicity-related bottlenecks (i).
to determine the extent to which proteome data validated our transcriptomics findings, we used the matching protein expression data to evaluate downstream pathway regulation related to infection severity at different time points (figure 2d) . we reasoned that correlation with pathogenicity could have different meanings depending on which time points are used for the correlation calculation. correlated genes identified early in the infection could be filling regulatory roles, while those from later points are expected to be the downstream effects of earlier events. accordingly, we identified genes and proteins whose expression profiles correlated with pathogenicity at individual time points and designated the resulting lists as day 1, day 2, day 4, and day 7. we then integrated the gene and protein data using target enrichment to identify the regulatory targets of protein pathway expression (supplementary tables s1, s2). we found that syk, prkcb, and ebf1 were day 1 genes and that each of these is a regulator of b-cell activation/maturation. proteins known to be regulated by and/or bind to these regulators were significantly enriched among day 1 (syk) proteins, while transcripts of genes regulated by ebf1 and prkcb were enriched among day 1 and day 7 (ebf1) or day 4 (prkcb) genes. like the b-cell-related module from the cluster analysis, the expression profiles of all three of these b-cell regulators showed strong negative correlation to pathogenicity, thus reinforcing the concept of decreased b-cell presence in the mouse lung during severe influenza infection. thus, our observation that the presence of b-cell-related functions is tied to pathogenicity is borne out by comparing results across time points and data types. similarly, we observed enrichment in proteins regulated by coagulation factor xiii a1 (f13a1) in day 4 proteins, with the f13a1 transcript also among day 4 genes. 
transcripts for coagulation regulators plat and serpine1 as well as their downstream targets were found among day 7 genes. these latter results validated our findings from transcript expression that coagulation-related pathways are linked to pathogenicity. in addition to findings related to regulation of b-cell activity and coagulation, we also observed that direct protein targets of egfr activity are significantly enriched among day 1 and day 4 proteins. in this way, integration of transcriptomic and proteomic data enhances our analysis and identifies the pathways most likely to be important for infection severity. from this analysis we constructed three lists of genes that were identified as being correlated to pathogenicity in early infection (supplementary table s3), late infection (supplementary table s4), or both (figure 2f and supplementary table s5). we focused on genes that are correlated with pathogenicity both early and late in the infection, for two reasons: (1) high correlations from two separate groups of data points for the same gene mean it is likely that these genes truly correlate with pathogenicity, and (2) the overlap of the two groups helps yield the identity of genes important both early and throughout the infection process. because influenza viral titers reach maximal levels by day 2, regulatory responses are likely to occur in the first 24 h of infection. we thus designated the day 1 results as "early" and other time points as "late." to identify genes with both early and late infection correlation to pathogenicity, we selected the top 5% of pathogenicity-correlated genes at early time points. the same procedure was used for late time points, and the intersection of these two resulted in a list of early/late correlated genes. the overlap between day 1 genes and the set of combined day 2, day 4, and day 7 genes resulted in 54 genes (we refer to these as early/late correlated genes).
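the early/late selection described above can be sketched in a few lines. this is a minimal illustration, not the authors' pipeline: the function names are hypothetical, pearson correlation and ranking by absolute correlation are assumptions (the text does not name the correlation measure), and the late list is assumed to be the union of the per-day top fractions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile cannot correlate with anything
    return sxy / sqrt(sxx * syy)


def top_correlated(expression, pathogenicity, fraction=0.05):
    """Genes in the top `fraction` by |correlation| with pathogenicity.

    expression: dict gene -> values, one per infection condition
    pathogenicity: scores for the same conditions, in the same order
    """
    # Assumption: negative and positive correlations are ranked together.
    scores = {g: abs(pearson(v, pathogenicity)) for g, v in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(fraction * len(ranked)))
    return set(ranked[:keep])


def early_late_genes(early_expr, late_expr_by_day, pathogenicity, fraction=0.05):
    """Intersect the early list with the combined late-day lists."""
    early = top_correlated(early_expr, pathogenicity, fraction)
    late = set()
    for expr in late_expr_by_day.values():
        late |= top_correlated(expr, pathogenicity, fraction)
    return early & late
```

with real data, `early_expr` would hold the day 1 profiles and `late_expr_by_day` the day 2, day 4, and day 7 profiles, each measured across the same set of infection conditions as the pathogenicity scores.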
the overlap between early and late was highly significant (p = 6.4e-13, fisher test). the clustering analysis provided a way to determine what kinds of genes manifested expression behaviors connected to pathogenicity. however, we were interested in identifying regulatory mechanisms of influenza infection in the context of pathogenicity. as a means of identifying key regulators, we turned to an approach based on network topology. a growing body of work has shown that network topology, or the placement of nodes in the network structure, can be used to identify entities with key regulatory roles (yu et al., 2007; zhou and liu, 2014; narang et al., 2015; mcdermott et al., 2016). of particular interest are network bottlenecks and hubs, both of which have been shown to be enriched for regulators under various circumstances (yu et al., 2007; mcdermott et al., 2009, 2012, 2016; zhou and liu, 2014; narang et al., 2015). we built a mutual information-based association network with transcript data (mcdermott et al., 2009, 2016; mitchell et al., 2013) using 7471 genes deemed significantly changed in at least one experimental condition (strain, dose, or time; see section "materials and methods") as input; we also identified network hubs and bottlenecks, which were defined as the top 5% of degree scores and betweenness scores, respectively (figure 2e). since network hub nodes are known to have critical systemic functions, we hypothesized that pathogenicity-related genes may be enriched in bottleneck or hub genes. to test this hypothesis, we examined the statistical enrichment of the identified pathogenesis sets with bottlenecks and hubs identified from the network analysis. interestingly, no overlap was discovered between network hubs and early/late correlated genes (p = 0.11, two-sided fisher test).
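the hub and bottleneck definitions above (top fraction of nodes by degree and by betweenness) can be illustrated with a small self-contained sketch. it assumes an unweighted, undirected edge list that has already been thresholded from the mutual information scores; the function names are hypothetical, and betweenness is computed with brandes' algorithm rather than with whatever software the authors used.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_scores(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_scores(adj):
    """Shortest-path betweenness (Brandes) for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


def top_fraction(scores, fraction=0.05):
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:max(1, round(fraction * len(ranked)))])


# Toy graph: two 4-node cliques joined only through node "x".
clique_a = [("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a2", "a3"), ("a2", "a4"), ("a3", "a4")]
clique_b = [(u.replace("a", "b"), v.replace("a", "b")) for u, v in clique_a]
adj = build_adjacency(clique_a + clique_b + [("a1", "x"), ("x", "b1")])
hubs = top_fraction(degree_scores(adj), fraction=0.1)
bottlenecks = top_fraction(betweenness_scores(adj), fraction=0.1)
```

in the toy graph the bridging node `x` has the lowest degree but the highest betweenness, which is the kind of node the text calls a bottleneck, while the best-connected clique members are the hubs.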
in contrast, we found these genes to be significantly enriched in network bottlenecks (10 of the 54, p = 2.9e-4, two-sided fisher test), suggesting bottlenecks are important early regulators of pathogenesis (figures 2g,h; names and descriptions of the genes are found in table 1). notably, three of these ten, cd22, fcrl1, and ikzf3, are closely related to b-cell activation and overlapped with members of the black wgcna module. another of these 10 genes is egfr (figure 2i), which we found to have protein targets enriched among correlated proteins at day 1 and day 4. to determine if these results were biased by the selection of arbitrary thresholds, we generated rankings for the degree, betweenness, and correlation to pathogenicity of all genes, then produced matrices of upper percentile thresholds by applying a fisher enrichment test for each threshold pairing. remarkably, we observed a dramatic exclusion (blue cells in figure 4, upper right) of hubs from correlated genes across a wide range of thresholds. in contrast, bottlenecks showed a strong enrichment trend (red cells in figure 4, upper left). these results show that for influenza infection, network hubs and bottlenecks have strikingly opposite roles regarding pathogenicity of the virus. given that network bottlenecks have a unique relationship with genes related to pathogenicity, we hypothesized that network hubs might be enriched in genes involved in more general aspects of infection. to test this, we identified the most highly perturbed genes across the transcriptome by ranking genes by their maximum fold change value across all data sets and then built matrices that compared the maximum expression to betweenness and degree, as before. as shown in the lower panels of figure 4, genes with high maximum expression overlapped dramatically with network hubs but showed minimal enrichment for bottlenecks.
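the threshold scan described above, with enrichment shown as a signed log10 p-value, can be sketched as follows. this is an illustration under stated assumptions: the function names are hypothetical, the two-sided p-value sums all table probabilities no larger than the observed one (one common convention), and the sign is taken from whether the observed overlap falls below the overlap expected by chance.

```python
from math import comb, log10


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k overlaps with fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def signed_log10_p(set_a, set_b, universe_size):
    """|log10 p| of the overlap, negated when the overlap is below expectation."""
    a = len(set_a & set_b)
    b = len(set_a) - a
    c = len(set_b) - a
    d = universe_size - a - b - c
    p = fisher_two_sided(a, b, c, d)
    magnitude = -log10(p) if p > 0 else float("inf")
    expected = len(set_a) * len(set_b) / universe_size
    return magnitude if a >= expected else -magnitude


def enrichment_matrix(rank_x, rank_y, fractions):
    """Signed scores for every pairing of top-fraction thresholds.

    rank_x, rank_y: the same genes, each list sorted best-first by one measure
    """
    n = len(rank_x)
    matrix = []
    for fx in fractions:
        top_x = set(rank_x[:max(1, round(fx * n))])
        row = []
        for fy in fractions:
            top_y = set(rank_y[:max(1, round(fy * n))])
            row.append(signed_log10_p(top_x, top_y, n))
        matrix.append(row)
    return matrix
```

positive cells then correspond to enrichment and negative cells to exclusion, so a single matrix can show both patterns across the whole range of thresholds.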
thus, highly connected genes (hubs) are strongly related to high expression and are strongly segregated from pathogenicity-related genes, while network bottlenecks show a strikingly different strong relationship to pathogenicity. to determine if similar relationships exist in a distinct infection system, we applied a similar analysis to a compendium of four datasets obtained from mice infected with the sars coronavirus (sars-cov), one of which was previously published by gralinski et al. (2013). mice were infected with wt sars-cov and three attenuated mutants at varying doses and analyzed for lung gene expression at one, two, four, and seven days postinfection. since lethality is not readily observed in attenuated sars-cov mutants, we used animal weight loss at each time point to represent pathogenicity at each infection condition. also, since viral replication kinetics are slower for sars-cov infection compared to that of influenza virus, we used days 1 and 2 to represent early infection and days 4 and 7 for late infection when ranking genes for pathogenicity. when applying the approach outlined above, we observed very similar results with sars-cov to those seen with influenza virus (figure 5). the same patterns of exclusion and enrichment of hubs and bottlenecks in regard to pathogenesis-correlated genes and high-expression genes were shown to be even more dramatic in sars-cov. thus, the enrichment of pathogenicity-related genes in network bottlenecks and their exclusion from network hubs appears to be a widespread phenomenon characteristic of respiratory viral infections in mice. this finding is significant because the network betweenness measurement we applied was in no way informed by our pathogenicity results, yet it is able to significantly enrich for pathogenicity-related genes. thus, network bottlenecks but not hubs facilitate the identification of critical regulators as intervention targets.
further studies will determine whether this approach is applicable in other infection systems as well. interestingly, no overlap was found between pathogenicity-related genes in influenza and sars-cov, but significant overlap in bottlenecks (39 genes, p-value < 10^-6) and hubs (203 genes, p-value << 10^-6) was found between the two viruses. epidermal growth factor receptor has previously been shown to play a role in influenza infection (eierhoff et al., 2010; ueki et al., 2013) but has not been tied to pathogenicity. because we identified egfr as a pathogenicity-correlated bottleneck gene with apparent signaling effects evident in proteomics and transcriptomics, we investigated the role of egfr in pathogenesis further with a mouse model. we treated mice for 14 days with the egfr inhibitor gefitinib and monitored infection-related weight loss of treated and untreated animals to determine if egfr inhibition affected the course of infection. after one day of treatment, mice were infected with one of two strains of influenza virus at various doses: ca04 (10^2, 10^3, and 10^4 pfu) or the highly pathogenic h5n1 avian strain, a/chicken/vietnam/ty167/2011 (ty167) (10^1, 10^2, 10^3, and 10^4 pfu). the alternative h5n1 strain was used since the mld50 for vn-1203 is very low and makes identification of drug effects difficult.
figure 4 | overlap in biological measures and graph topology for influenza infection. genes were ranked according to their correlation with the pathogenicity profile (top panels), maximum fold change across all infection conditions (bottom panels), network betweenness (left panels), and network degree (right panels). the top fraction from one ranking was compared to the top fraction in the other ranking using a two-tailed fisher's exact test as indicated. numerical scale represents the absolute value of the log10 p-value. for negative enrichment, these values were multiplied by -1.
linear mixed effects models were used to model the weight loss trajectories from different infection conditions and to determine if the weight loss slope differed between treatments (table 2 and figure 6 ). red lines represent segments of the data that could be modeled with a single linear model; segments are separated by knot points. for all doses of cal/04 and the two lower doses of ty167, drug treatment significantly increased infection severity in some segments. however, higher doses of ty167 erased this trend and possibly partially reversed it. while all animals died by day 13, weight loss was less rapid by a small but significant margin in drug-treated animals at the highest viral dose. thus, egfr appears to play a significant role in the severity of non-lethal infections such that when it is inhibited, the infection is more severe. however, when the threshold is crossed to the highly lethal pathogenesis of h5n1, other mechanisms potentially take over and supersede or override the role of egfr. we used a multi-faceted approach to uncover critical components of pathogenicity in an attempt to take full advantage of the pathogenicity gradient in our study's influenza viruses and mutants. we compared the expression behavior of genes and proteins to the pathogenicity measurements of viruses in our study; this allowed us to identify which pathways and features are most closely associated with pathogenicity. the results provide clues to the underlying causes of the severity of highly pathogenic strains. our group previously used this dataset to determine that host responses to various infection conditions involve similar pathways but are characterized by distinct kinetic expression profiles (tchitchek et al., 2013) . in the current study, we use a complementary approach to identify the genes and pathways that are most closely associated with more pathogenic viruses instead of identifying elements common to all. 
we show that the network topology of association networks can be used to predict genes' involvement in pathogenicity-related processes. we used this knowledge in conjunction with other network methods to identify genes and pathways associated with disease severity. our results show that signaling downstream of egfr, coagulation pathways, and b-cell down-regulation in the lung are tied to infection severity in highly pathogenic influenza. a follow-up validation study in mice confirms the role of egfr in influenza pathogenicity. we first asked what broad trends in pathogenicity could be identified using a network clustering approach. one of the detected network clusters was strongly enriched for functions related to b-cells and was negatively correlated with the pathogenicity profile. this could be caused by either a general down-regulation of gene expression in lung b-cells or a general loss of b-cells from the lung. although influenza may infect b-cells expressing flu-specific b-cell receptors (dougan et al., 2013), initially naïve mice from our experiment are not likely to have expanded virus-specific b-cells during the time frame of our omics experiments. thus, the effect is not likely a result of gene regulation within infected b-cells and is more likely due to a diminished b-cell lung population in highly pathogenic infections. a previous report demonstrated apoptotic death of bone marrow b-cells in flu-infected mice despite failing to show that the virus was present in the bone marrow (sedger et al., 2002). therefore, a systemic signal appears to target remote b-cells and may target lung b-cells as well.
… figure 4; however, weight loss values at each infection condition were used as a pathogenicity measurement. weight loss correlations for early (days 1 and 2) and late (days 4 and 7) were combined to obtain the pathogenicity ranking.
while the adaptive immune response is not likely to play a direct role during the time frame of these experiments, a lower b-cell population, for whatever reason, may signal important immune response dynamics not previously understood. histological or other studies would be necessary to confirm the relationship between severe infection and diminished b-cell numbers. the second cluster was related to coagulation/fibrinolysis, which has shown a precedent in previous work for an involvement in influenza infection (berri et al., 2013). plasminogen (which opposes clot formation) appears to promote destructive inflammation during influenza infection. while we observed both pro- and anti-coagulation factors that were positively correlated with pathogenicity, these responses may represent a mixture of virus-induced responses and host responses to a pathogenic state. we then corroborated these results by identifying links between pathogenicity-correlated genes and pathogenicity-correlated targets of these genes. since a dataset of this kind deals strictly with the expression of genes and proteins, other events such as protein-protein interactions, protein-mrna interactions, and phosphorylation/dephosphorylation events are not directly monitored. thus, a portion of the very early events that determine the severity of an infection is not observable with this dataset. by integrating transcript and protein data, however, we were able to reveal links between upstream and downstream effectors for egfr signaling, coagulation regulation, and b-cell down-regulation that would not be possible without the availability of both data types. since correlations between transcript and protein expression profiles are consistently observed to be low across biological systems (vogel and marcotte, 2012), validation of transcript expression using direct correlation of protein abundances is generally not successful.
however, we believe that functional rather than direct correspondence of transcripts and proteins represents an effective integration of both data types. hence, our study provides hypotheses for the involvement of a number of genes/pathways in the pathogenicity of influenza virus. to learn more about regulatory mechanisms of influenza infection, we determined whether topological positions in association networks were related to pathogenicity. we found that genes correlated with pathogenesis overlapped significantly with bottlenecks but were dramatically excluded from hubs. this result may be explained by the fact that network hubs are highly connected to many other network nodes so that rather than being involved only in highly pathogenic conditions, they tend to be involved in all infection conditions. this is affirmed by the observation that genes with the largest changes in gene expression (therefore likely to exert the strongest influence on other genes) were very strongly enriched for hub genes. on the other hand, network bottlenecks represent nodes linking different areas of the network and may identify genes that have an influence on only a subset of the processes being monitored by the data. interestingly, in a network built with data from infections of varying pathogenicity, the genes exerting these influences appear to be involved in pathogenicity-related processes. the same relationships between network topology, viral pathogenicity, and gene expression that were observed for influenza virus were also noted when we used a similar dataset of sars-cov infections, thus further validating our analysis and demonstrating that these relationships appear to apply to respiratory viruses in general. we observed remarkably high (77% of possible) overlap between hub genes in sars-cov and influenza virus networks; this is consistent with the tendency of these genes to have a universal influence during infectious disease. 
in contrast, bottlenecks and pathogenicity genes showed much lower or non-existent overlap between the two infection systems, suggesting that each virus maintains unique mechanisms of host interaction. this finding is important because it demonstrates that the identification of non-hub bottlenecks may represent a way to naively identify virus-specific pathogenicity-related genes when pathogenicity data is not available. previous work has shown that network bottlenecks have important regulatory roles (yu et al., 2007; mcdermott et al., 2009, 2011, 2012, 2016; mitchell et al., 2013), but this is the first time that an association has been seen between bottlenecks and pathogenesis, with network hubs being conspicuously excluded. to validate our findings, we treated mice with the egfr inhibitor gefitinib during infection with high- and low-path influenza. weight loss was significantly worsened when egfr was inhibited during low-path infection as well as during low-dose infection with a highly pathogenic strain, all of which were non-lethal infections. these results suggest that care should be taken when administering gefitinib to patients at risk of or currently infected with influenza. interestingly, however, high-dose, high-pathogenicity conditions displayed a possible reversal of this trend, with gefitinib showing a significant slowing of the weight loss trend at the highest dose. thus, the role of egfr is dependent on the severity of the current infection, indicating a role in pathogenicity as predicted by our omics studies. egfr stimulation has previously been shown to play a role in promoting influenza particle uptake, and egfr inhibition diminished viral titer in infected mice (eierhoff et al., 2010; ueki et al., 2013). however, the effect of egfr inhibition on pathogenicity was not determined in previous studies.
viral titer measurements made during this experiment would have allowed us to determine the effect of the drug on viral replication simultaneously with pathogenicity, allowing a clearer picture of the mechanisms at play during egfr inhibition. while the specific mechanisms are unknown, our results point to a scenario where egfr inhibition mainly exacerbates pathogenicity at low severity, likely because of the resulting blockage of host benefits such as wound-healing in the lungs (puddicombe et al., 2000). interestingly, sars-cov infection in the context of overactive egfr results in pulmonary fibrosis (venkataraman et al., 2017), supporting the idea that egfr signaling supports tissue regrowth during respiratory infection. the beneficial effect that comes from preventing viral particle uptake is only apparent under severe conditions when the host is largely unable to repair damaged tissue, as is likely the case in our high-dose, high-pathogenicity infection when mice are moribund. thus, egfr activation is a double-edged sword in influenza infection, promoting viral replication through increased virion uptake or suppression of cytokine production (kalinowski et al., 2014) while simultaneously driving tissue maintenance. this shift in the effect of egfr inhibition across pathogenicity provides new clues to the role of egfr regulation during lethal and non-lethal influenza virus disease.
figure 6 | egfr inhibition during influenza infection. mice were exposed to the indicated dosages of gefitinib and influenza virus strains, then monitored for body weight over the indicated days post-infection (dpi). red vertical lines indicate knot points for linear modeling (see sections "materials and methods" and "statistical analysis"). green star: significance below 0.05; see table 2 for segments with near-significant changes (segments with no significance indication had p-values above 0.1).
in summary, we have used a unique combination of network-based analyses of transcript and protein expression from our pathogenicity gradient dataset to (1) identify b-cell downregulation and coagulation pathway up-regulation as being likely associated with pathogenicity in influenza; (2) show that identification of non-hub bottlenecks represents a way to use association networks to enrich prediction of pathogenicity-related genes and pathways; (3) validate the involvement of one of these pathways, egfr signaling; and (4) show that egfr inhibition appears to override a key host response mechanism involved in non-lethal viral infections.
the transcriptomics datasets generated for this study can be found in gene expression omnibus, gse50000, gse49262, gse33263, gse37572, gse43301, gse43302, gse44441, gse44445, gse37569, gse33266, and gse49263. proteomics datasets can be found at https://omics.pnl.gov/project-data/systems-virology-contract-data.
all animal experiments and procedures were approved by the university of wisconsin (uw)-madison school of veterinary medicine animal care and use committee under relevant institutional and american veterinary association guidelines.
ml isolated a virus strain. ae, lg, rb, yk, and kw designed the experiments. ae, lg, and as performed the experiments. hm, ks, jm, and km conceived and designed the analysis. hm, ks, nh, and jw performed the analysis. hm and kw wrote the manuscript. all authors reviewed the manuscript.
plasminogen controls inflammation and pathogenesis of influenza virus infections via fibrinolysis
lethal influenza virus infection in macaques is associated with early dysregulation of inflammatory related genes
estimated global mortality associated with the first 12 months of 2009 pandemic influenza a h1n1 virus circulation: a modelling study
antigen-specific b-cell receptor sensitizes b cells to infection by influenza virus
the epidermal growth factor receptor (egfr) promotes uptake of influenza a viruses (iav) into host cells
influenza a virus isolation, culture and identification
large-scale mapping and validation of escherichia coli transcriptional regulation from a compendium of expression profiles
h7n9 situation update
mechanisms of severe acute respiratory syndrome coronavirus-induced acute lung injury
molecular basis for high virulence of hong kong h5n1 influenza a viruses
differential network biology
egfr activation suppresses respiratory virus-induced irf1-dependent cxcl10 production
aberrant innate immune response in lethal infection of macaques with the 1918 influenza virus
wgcna: an r package for weighted correlation network analysis
transcriptome analysis of hiv-infected peripheral blood monocytes: gene transcripts and networks associated with neurocognitive functioning
host regulatory network response to infection with highly pathogenic h5n1 avian influenza virus
defining the players in higher-order networks: predictive modeling for reverse engineering functional influence networks
topological analysis of protein co-abundance networks identifies novel host targets important for hcv infection and pathogenesis
the effect of inhibition of pp1 and tnfalpha signaling on pathogenesis of sars coronavirus
bottlenecks and hubs in inferred networks are important for virulence in salmonella typhimurium
novel swine-origin influenza a virus in humans: another pandemic knocking at the door
a network integration approach to predict conserved regulators related to pathogenicity of influenza and sars-cov respiratory viruses
automated identification of core regulatory genes in human gene regulatory networks
re-emergence of fatal human influenza a subtype h5n1 disease
b cells in chronic obstructive pulmonary disease: moving to center stage
involvement of the epidermal growth factor receptor in epithelial repair in asthma
pandemic swine-origin h1n1 influenza a virus isolates show heterogeneous virulence in macaques
weighted gene co-expression network analysis of the peripheral blood from amyotrophic lateral sclerosis patients
bone marrow b cell apoptosis during in vivo influenza virus infection requires tnf-alpha and lymphotoxin-alpha
identification of pathogenicity-related genes in fusarium oxysporum f. sp. cepae
specific mutations in h5n1 mainly impact the magnitude and velocity of the host response in mice
integrated omics analysis of pathogenic host responses during pandemic h1n1 influenza virus infection: the crucial role of lipid metabolism
respiratory virus-induced egfr activation suppresses irf1-dependent interferon lambda and antiviral defense in airway epithelium
overactive epidermal growth factor receptor signaling leads to increased fibrosis after severe acute respiratory syndrome coronavirus infection
insights into the regulation of protein abundance from proteomic and transcriptomic analyses
review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics
cumulative number of confirmed human cases for avian influenza a(h5n1) reported to who
the importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics
a computational model to predict bone metastasis in breast cancer by integrating the dysregulated pathways
this study was funded by grant u19ai106772, provided by the national institute of allergy and infectious diseases, national institutes of health.
pacific northwest national laboratory is a multi-program laboratory operated by battelle for the united states department of energy (doe) under contract de-ac05-76rl01830. the authors would like to thank daniel beechler for technical assistance, catherine himes for editorial services, and nathan johnson for graphic arts contributions. the supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2019.00200/full#supplementary-material
key: cord-355927-nzoiv9pj authors: lemmon, alan r.; brown, jeremy m.; stanger-hall, kathrin; lemmon, emily moriarty title: the effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and bayesian inference date: 2009-05-21 journal: syst biol doi: 10.1093/sysbio/syp017 sha: doc_id: 355927 cord_uid: nzoiv9pj although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. we use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ml) and bayesian frameworks. by introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of 1 mechanism by which ambiguous data can mislead phylogenetic analyses. we find that in both ml and bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. furthermore, within a bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. the magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified.
the results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis. phylogenetic analysis has become well established as an important research tool in the biological sciences (harvey et al. 1996; avise 2006), with applications spanning broad fields of research, including evolution (murphy et al. 2001; bowers et al. 2003; mckenna and farrell 2005), ecology (armbruster 1992; webb 2000), and medicine (bush et al. 1999; hillis 2000; eickmann et al. 2003). numerous studies have demonstrated that model misspecification can affect the accuracy of phylogenetic estimates (kuhner and felsenstein 1994; yang et al. 1994; sullivan et al. 1995; lockhart et al. 1996; lemmon and moriarty 2004). an important, but unresolved, question is whether ambiguous data affect the accuracy of phylogenetic estimates (kearney 2002; de queiroz and gatesy 2007; wiens 2003a, and references therein). the answer to this question is becoming increasingly relevant as more studies combine data sets. with partial sequences of more than 165 000 taxa now available in large sequence databases, such as genbank, embl, or ddbj, an increasing number of studies will use large-scale combinations of sequences from these databases in meta-analyses. incomplete sequences and sampling biases in these databases have led researchers to build phylogenetic data sets that have large numbers of ambiguous characters and gaps (driskell et al. 2004). the effects of ambiguous data are unclear, at least in part, because the terminology used to describe the problem has neither been defined carefully nor used consistently across studies. consequently, we begin by clarifying our terminology.
we define the data as a matrix of cells with rows and columns corresponding to sequences and homologous sites, respectively. the value in each cell represents the character state for the corresponding sequence and site. the state of each character is unambiguous (taking the state "a," "c," "g," or "t"), partially ambiguous (taking the state "b," "d," "h," "v," "s," "w," "r," "y," "k," or "m"), or ambiguous (taking the state "?" or "n"). ambiguous character is used to refer to a character with an ambiguous state. note that unless explicitly modeled, a gap (represented by the state "-" and resulting from an insertion or a deletion) will have the same effect as an ambiguous character. also note that partially ambiguous character states are not considered here for simplicity. we use the term ambiguous site to refer to a site containing 1 or more ambiguous characters and the term ambiguous sequence to refer to a sequence containing 1 or more ambiguous characters. last, we use invariable site to refer to a site in which all unambiguous characters have the same state. to ensure clarity, we henceforth avoid using the term "missing data," although the reader may think of ambiguous or gap characters as missing data. because of the complexity of the problem and the fact that conclusions from simulation studies are conflicting, the potential impact of ambiguous characters is still unclear. early evidence suggested that ambiguous characters can reduce phylogenetic accuracy, especially when taxa have large numbers of ambiguous characters (platnick et al. 1991). subsequent studies of the effects of ambiguous characters disagree in their conclusions. for example, wiens (1998, 2003a, 2003b) argued that adding ambiguous sequences or sites to a phylogenetic analysis has no detrimental effect and that the addition of more of these sequences increases accuracy, even if it means adding ambiguous characters.
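the terminology defined above (character states, ambiguous sites, ambiguous sequences, and invariable sites) maps directly onto simple operations on an alignment matrix. the sketch below is only an illustration: the function names are hypothetical, the alignment is assumed to be a dict of equal-length strings, and gaps are treated as ambiguous characters, which follows the text only for analyses that do not model gaps explicitly.

```python
UNAMBIGUOUS = set("ACGT")
PARTIALLY_AMBIGUOUS = set("BDHVSWRYKM")
AMBIGUOUS = set("?N-")  # assumption: gaps are not modeled, so "-" counts as ambiguous


def classify_state(state):
    """Name the class of a single character state."""
    s = state.upper()
    if s in UNAMBIGUOUS:
        return "unambiguous"
    if s in PARTIALLY_AMBIGUOUS:
        return "partially ambiguous"
    if s in AMBIGUOUS:
        return "ambiguous"
    raise ValueError(f"unrecognized character state: {state!r}")


def is_ambiguous(state):
    return classify_state(state) == "ambiguous"


def ambiguous_sequences(alignment):
    """Names of sequences containing 1 or more ambiguous characters."""
    return {name for name, seq in alignment.items() if any(map(is_ambiguous, seq))}


def ambiguous_sites(alignment):
    """Indices of sites containing 1 or more ambiguous characters."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0])) if any(is_ambiguous(s[i]) for s in seqs)]


def invariable_sites(alignment):
    """Indices of sites in which all unambiguous characters share one state.

    A site with no unambiguous characters at all satisfies this vacuously.
    """
    seqs = list(alignment.values())
    sites = []
    for i in range(len(seqs[0])):
        states = {s[i].upper() for s in seqs if s[i].upper() in UNAMBIGUOUS}
        if len(states) <= 1:
            sites.append(i)
    return sites


# Toy 4-taxon alignment with one "?", one "n", and one gap.
alignment = {"t1": "acgt", "t2": "a?ga", "t3": "acnt", "t4": "ac-t"}
```

in the toy alignment, three of the four sequences are ambiguous sequences, sites 1 and 2 are ambiguous sites, and the first three sites are invariable because the ambiguous cells are ignored when comparing states.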
this argument seems counterintuitive because adding in-group taxa will decrease the average length of internal branches and thus make the estimation of phylogenetic relationships more difficult (jermiin et al. 2004). his analyses also suggest that even highly ambiguous sequences have little impact on the phylogenetic relationships of the unambiguous sequences (wiens 2003a, 2003b). in contrast, other studies (huelsenbeck 1991; hillis et al. 1992; bull et al. 1993; wiens and reeder 1995; dragoo and honeycutt 1997) found that ambiguous characters resulted in reduced phylogenetic accuracy, although the severity of the effect was variable. this variation was attributed to the number of ambiguous characters (huelsenbeck 1991; hillis et al. 1992; bull et al. 1993; wiens and reeder 1995), the type of data (e.g., dna vs. restriction sites; wiens and reeder 1995), the taxonomic distribution of the ambiguous characters (dragoo and honeycutt 1997), or the topological information in the data set (dragoo and honeycutt 1997; wiens 2003b; philippe et al. 2004). most of the past research into the effects of ambiguous characters focused on maximum parsimony analysis (huelsenbeck 1991; platnick et al. 1991; wiens 1998; kearney and clark 2003; wiens 2003b). more recently, studies also have considered the effects of ambiguous characters on neighbor joining (wiens 2003a), maximum likelihood (ml) (dunn et al. 2003; wiens 2003a, 2006; gouveia-oliveira et al. 2007), or bayesian (wiens 2006; wiens and moen 2008) analyses. although wiens (2006) concluded that adding ambiguous sequences or sites to a data set increased phylogenetic accuracy for maximum parsimony, neighbor joining, ml, as well as bayesian analyses, dunn et al. (2003) found a 50% reduced accuracy for maximum parsimony and no reduction in accuracy for ml when ambiguous characters were added. there are 2 conflicting views on how ambiguous characters affect accuracy.
some authors argue that reduced accuracy is due to a lack of information (i.e., not enough unambiguous characters) rather than due to the ambiguous characters per se (kearney and clark 2003; wiens 2003a) . they view ambiguous character states simply as representing the unknown, with little impact on the outcome of a phylogenetic analysis (kearney and clark 2003) . the practical implication of this view is that all available data, including ambiguous sequences and sites, should be used in a phylogenetic analysis to maximize the available information. in contrast, other studies suggest that ambiguous characters bias the resulting phylogeny to an extent that goes beyond the lack of information (huelsenbeck 1991) . the recommendation in this case would be to reduce the data set in an effort to eliminate as many ambiguous characters as possible. even though rarely stated explicitly, this latter strategy is widely used, either by setting arbitrary limits for including ambiguous sequences into an analysis (kearney and clark 2003) or by excluding the leading and trailing ends of sequences in a data set. though some researchers are careful to remove ambiguous characters, others disregard their potential effects and use largely ambiguous matrices in an effort to maximize the number of taxa and genes sampled. because simulation studies have drawn conflicting conclusions regarding the effects of ambiguous characters, and the exact mechanism by which ambiguous characters may affect phylogenetic accuracy remains unknown (wiens 2006) , further investigation is needed. previous studies conflict because the approaches taken by authors have confounded the effects of ambiguous characters with the effects of phylogenetic information (due to nucleotide substitutions). more specifically, authors have manipulated data sets by either adding sites containing more than 2 unambiguous characters or by changing the state of characters from unambiguous to ambiguous. 
in either case, authors have inadvertently manipulated the amount of phylogenetic information along with the number of ambiguous characters. furthermore, different simulation studies varied widely in their assumptions. for example, wiens (1998, 2003a, 2003b) assumed that all characters evolved at the same rate and that branch lengths were equal. in contrast, dunn et al. (2003) assumed that characters and lineages evolved at different rates and allowed for different branch lengths. to determine the consequences of including ambiguous characters in phylogenetic analyses, it is necessary to separate these confounding variables and explore the effects of ambiguous characters across a wide range of parameter space. the goals of this study are 1) to determine whether ambiguous characters bias estimates of phylogeny, and if they do, 2) to understand the mechanism by which this bias is introduced, and 3) to identify the factors that contribute to the direction and magnitude of the bias. our approach differs from previous studies in that we only introduce ambiguous sites that should be topologically uninformative if ambiguous characters have no effect. in this way, we are able to remove the confounding factors described above and arrive at a clear understanding of the effects of ambiguous characters. we show that at least 5 factors determine the direction and magnitude of bias resulting from ambiguous characters: the number and taxonomic distribution of ambiguous characters, the strength of topological support from unambiguous characters, the degree of among-site rate variation, and the method and assumptions of the analysis (including the priors assumed in a bayesian analysis). although we focus on ambiguous characters, we expect gaps due to insertions and deletions to have the same effect, unless they are explicitly modeled. we conclude by discussing the implications of this work and introduce several possible solutions to the problem.
systematic biology vol. 58
in the following section, we outline the simulation conditions and the general conditions under which the analyses were conducted. in the results, we present the specific conditions under which the analyses were conducted along with the result to which they pertain. the factors found to contribute to the effects of ambiguous characters are presented with increasing complexity, beginning with individual factors and ending with combinations of factors. in this way, the reader can more easily understand the effect of each factor as well as the interactions among the factors. in order to gain a clear understanding of the effects of ambiguous characters on estimates of phylogeny, our analyses incorporated the following 4 properties: first, we primarily used simulated data (instead of empirical data) in order to gain control over the factors affecting our phylogenetic estimates and to vary those factors independently. second, we simplified the simulations as much as possible, focusing on variables that were of immediate interest. consequently, our analyses were based on 4-taxon simulations under simple models of evolution. third, our simulated data sets contained 2 regions (sets of nucleotide sites). the first region, which was of fixed length, contained only unambiguous characters and provided a baseline amount of support for the true topology. the second region was of variable length and may have contained ambiguous characters. fourth, we were careful to include ambiguous characters in such a way that we could eliminate other confounding factors. more specifically, ambiguous sites provided no topological information because only 2 of the 4 characters had unambiguous states. in this way, we were able to remove the effects of substitutions in the ambiguous sites that could affect support for the true topology. in order to determine the effects of bayesian priors, we also compared results from ml and bayesian analyses (hillis et al.
1996; felsenstein 2004) , when possible. we first generated 6 alignments, each comprising 500 nucleotides, using a 4-taxon tree (in which each branch length was equal to 1.0 my) and the jukes-cantor (jc) model of evolution (jukes and cantor 1969) . the 6 types of data differed in that they were simulated under 6 different rates of evolution: 0.000015, 0.0015, 0.015, 0.15, 1.3, and 15.0 substitutions per site per my (refer to fig. 1 ; note that 1.3 is not a typographical error). these rates were chosen, based on preliminary simulations, to produce data sets containing a range of phylogenetic information, resulting in posterior probabilities (given 500 unambiguous sites) for the true topology of 1/3, 2/3, 1, 1, 2/3, and 1/3, respectively. the lowest rate of evolution produced data sets that were invariable at all sites and the highest rate produced data sets that were saturated. all data sets were simulated with seq-gen 1.2.5 (rambaut and grassly 1997) . to introduce among-site rate variation, we produced 36 types of combined data sets by concatenating pairwise combinations of the 6 rate types outlined above. each of the 36 combined data sets thus contained 1000 sites. we refer to the first 500 sites as gene a and the remaining sites as gene b. to vary the taxonomic distribution of ambiguous characters across data sets, we replaced all 500 characters in gene b with ambiguous characters (taking the state "?") for 2 of the taxa (either sister or nonsister on the 4-taxon tree) or none of the taxa (gene b unambiguous). last, we varied the length of gene b by removing between 0 and 500 of the sites at the end of gene b (in increments of 50). one hundred replicates of each of the 36 types were created for a total of 118 800 data sets (36 rate combinations ×100 replicates× 3 ambiguous character distributions × 11 lengths). two types of ml analyses were performed. the first type was performed to identify the effect of ambiguous characters on estimates of topology. 
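the simulation design described above can be sketched as a small grid enumeration. this is only an illustrative sketch, not the authors' pipeline: the function names `design_grid` and `mask_gene_b` and the dictionary representation of an alignment are assumptions introduced here, while the rates, gene lengths, replicate count, and ambiguity patterns are taken from the text.

```python
from itertools import product

# Values taken from the text; names below are hypothetical.
RATES = [0.000015, 0.0015, 0.015, 0.15, 1.3, 15.0]   # substitutions/site/my
AMBIGUITY = ["none", "sister", "nonsister"]           # taxa masked in gene B
GENE_B_LENGTHS = [500 - removed for removed in range(0, 501, 50)]  # 500 ... 0
REPLICATES = 100
GENE_A_LENGTH = 500


def design_grid():
    """Enumerate every (rate A, rate B, ambiguity, gene B length, replicate)."""
    return list(product(RATES, RATES, AMBIGUITY, GENE_B_LENGTHS,
                        range(REPLICATES)))


def mask_gene_b(alignment, masked_taxa, gene_b_length):
    """Truncate gene B to gene_b_length sites and replace it with '?'
    for the taxa in masked_taxa. alignment maps taxon -> sequence."""
    end = GENE_A_LENGTH + gene_b_length
    masked = {}
    for taxon, seq in alignment.items():
        gene_a, gene_b = seq[:GENE_A_LENGTH], seq[GENE_A_LENGTH:end]
        if taxon in masked_taxa:
            gene_b = "?" * len(gene_b)
        masked[taxon] = gene_a + gene_b
    return masked


if __name__ == "__main__":
    print(len(design_grid()))  # 36 * 3 * 11 * 100 data sets
```

enumerating the grid this way makes the stated total easy to check: 36 rate combinations, 3 ambiguity patterns, 11 gene b lengths, and 100 replicates.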
to identify the ml topology for a given 4-taxon data set, each of the 3 possible topologies was scored using paup* v.4.0b10 (swofford 2003) under the jc model of evolution. branch lengths were optimized using default settings, which include collapsing short branches (<10⁻⁸ substitutions per site) to polytomies. the topology with the highest likelihood after optimization was chosen as the ml topology. we also computed the likelihood of the data, given topologies with fixed branch lengths (see description below). in order to accommodate rate heterogeneity in data sets with rate variation across genes, we conducted additional ml analyses using treefinder (jobb 2008). the second type of ml analysis was performed to identify the effect of ambiguous characters on estimates of branch lengths. because we simulated the data using an ultrametric tree, we expect the tips of the estimated phylogeny to be equidistant from the root if ambiguous characters have no effect. therefore, a molecular clock test can be used to determine whether relative branch lengths are significantly affected. we first computed the likelihood of the data given the true topology and branch lengths optimized under the jc model. we then computed the likelihood of the data given the true topology and branch lengths optimized with a molecular clock assumption enforced (also under the jc model; the root is assumed to be between the 2 internal nodes). the molecular clock assumption forces the tips to be equidistant from the root. the ratio of these likelihoods was then computed to assess whether any departure from a clock-like evolutionary process was significant (χ² test with 2 df; felsenstein 1981, 1988). the type i error rate was computed as the proportion of replicates in which the clock model was rejected. bayesian analyses were performed to assess the effect of ambiguous characters on estimates of topological support in the form of bipartition posterior probabilities.
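the molecular clock test described above can be sketched as a likelihood ratio test. this is a minimal sketch under stated assumptions: the log-likelihoods would come from an external program (they are plain inputs here), the function names are hypothetical, and the significance level of 0.05 is an assumption. for 2 degrees of freedom the χ² tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def clock_lrt(lnl_unconstrained, lnl_clock):
    """Likelihood ratio test of the clock model against the unconstrained
    model on the same topology, using a chi-square with 2 df.

    Returns (test statistic, p-value). For 2 df the chi-square survival
    function is exactly exp(-x / 2).
    """
    stat = 2.0 * (lnl_unconstrained - lnl_clock)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimization
    return stat, math.exp(-stat / 2.0)


def type_i_error_rate(p_values, alpha=0.05):
    """Proportion of replicates in which the (true) clock model is rejected.
    alpha = 0.05 is an assumed significance level."""
    return sum(p < alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    stat, p = clock_lrt(-1000.0, -1003.0)
    print(stat, p)
```

the 2 degrees of freedom follow from the 4-taxon case: the unconstrained tree has 5 free branch lengths, whereas the clock tree has 3 free node heights.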
posterior distributions were estimated using mrbayes v.3.1.1 (ronquist and huelsenbeck 2003) with 4 incrementally heated chains (temperature = 0.2).
figure 1. simulation design. among-site rate variation was simulated using 6 rates of evolution (chosen to produce the desired pp for the true tree with 500 sites) combined across 2 genes to form 36 rate combinations. gene a contained unambiguous sites, whereas gene b contained ambiguous sites. ambiguous characters were present for either sister or nonsister taxa. although gene a always contained 500 sites, the length of gene b varied from 0 to 500 sites. note that gene b contained no topological information, regardless of the rate of evolution. pp = posterior probabilities.
unless specified otherwise, we assumed the following priors. for the topology, a uniform prior (across all possible resolved topologies) was assumed (the default in mrbayes v.3.1.1). note that this prior places a zero prior probability on polytomies. for the branch lengths, the exponential prior with mean equal to 0.1 was assumed (the default in mrbayes v.3.1.1). note that this prior penalizes long branch lengths and requires branches to take lengths greater than 0 (i.e., does not allow polytomies). markov chains were sampled every 10 generations. mrconverge v1.2b (written by a.r.l.; http://www.evotutor.org/mrconverge) was used to assess burn-in and convergence of 4 independent runs (see brown and lemmon [2007] for details). each data set was analyzed under the jc model of evolution. in addition, 3 different models of among-site rate variation were considered: gamma-distributed rates with 4 discrete categories (Γ4) (steel et al. 1993; yang 1993, 1994), invariable sites (i) (gu et al. 1995; waddell and penny 1996), and unlinked rates across partitions (p) (ronquist and huelsenbeck 2003). the priors assumed for these 3 models of rate variation were uniform(0,50), uniform(0,1), and dirichlet(1,1), respectively.
the latter model was used by partitioning the sites according to the gene to which they belong (note that in this case, the partition boundaries are known). in addition, we also enforced a strong prior at the true values for the jc + i and jc + p models to assess the effect of the flat priors on the posterior distribution. to confirm that the biases caused by ambiguous characters in our simulated data sets could also affect estimates of topology derived from empirical data sets, we manipulated an empirical data set that originally contained very few ambiguous characters. an 8-taxon, single-gene (16s) subset of data was taken from mueller et al. (2004). to the original data set, we appended up to 1000 additional sites in 2 different schemes. in the first scheme (referred to as sister variable), we randomly chose sites in which the character states of 2 sister species (hydromantes italicus and hydromantes brunus) differed, appended copies of them to the original matrix, and changed the character states of the other 6 species at all appended sites to ambiguous ("?"). in the second scheme (referred to as distant invariable), we randomly chose sites in which the character states of 2 nonsister species (desmognathus fucus and ensatina eschscholtzii) did not differ, appended copies of them to the original matrix, and changed the character states of the other 6 species at all appended sites to ambiguous ("?"). phylogenetic trees were then inferred from each of these new data sets using ml and bayesian methods (settings were the same as described above, except that the unpartitioned gtr + i + Γ model was assumed). because the appended sites in both types of manipulated data sets contain unambiguous characters for only 2 taxa, they should carry no topological information (i.e., their addition should not affect topological support values).
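the two manipulation schemes for the empirical data set can be sketched as follows. this is an illustrative sketch only: the function name `append_ambiguous_sites`, the dictionary representation of the alignment, the fixed random seed, and sampling candidate sites with replacement are all assumptions made here and are not stated in the text.

```python
import random

UNAMBIGUOUS = set("acgtACGT")


def append_ambiguous_sites(alignment, pair, n_sites, variable, seed=0):
    """Append n_sites copies of existing sites to every sequence.

    alignment: dict mapping taxon -> aligned sequence (equal lengths).
    pair:      the 2 taxa that keep their states at the appended sites.
    variable:  True  -> copy sites at which the pair differs (sister variable);
               False -> copy sites at which the pair agrees (distant invariable).
    All other taxa receive '?' at the appended sites.
    Sites are drawn with replacement (an assumption of this sketch).
    """
    a, b = pair
    candidates = [
        i for i in range(len(alignment[a]))
        if alignment[a][i] in UNAMBIGUOUS
        and alignment[b][i] in UNAMBIGUOUS
        and (alignment[a][i] != alignment[b][i]) == variable
    ]
    if not candidates:
        raise ValueError("no sites match the requested scheme")
    rng = random.Random(seed)
    chosen = [rng.choice(candidates) for _ in range(n_sites)]
    result = {}
    for taxon, seq in alignment.items():
        if taxon in pair:
            extra = "".join(seq[i] for i in chosen)
        else:
            extra = "?" * n_sites
        result[taxon] = seq + extra
    return result


if __name__ == "__main__":
    toy = {"hb": "acgta", "hi": "acgtc", "df": "acgta"}
    print(append_ambiguous_sites(toy, ("hb", "hi"), 3, variable=True))
```

in the toy alignment only the last site differs between the pair, so every appended site is a copy of that site for the pair and "?" for the remaining taxon.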
effectively invariable data.-we begin by describing the results from analyses of effectively invariable data sets (i.e., rate = 0.000015 substitutions per site per my). we use the term "effectively invariable" because the rate of evolution was so low that all simulated data sets were completely invariant at all sites. here, the jc model is assumed (no rate heterogeneity). in this simple case, we expect the support for each of the 3 possible topologies to be equal, regardless of the length of gene b or the distribution of ambiguous characters. this expectation is met in the ml framework (fig. 2a). in contrast, the expectation of equal support is not met in the bayesian framework (fig. 2b). in this case, support for the true topology increases if gene b is ambiguous for sister taxa, but decreases if gene b is ambiguous for nonsister taxa. the magnitude of the bias (deviation from the expectation) increases as the length of gene b increases. when gene b is ambiguous for none of the taxa (the control), support for the true tree remains at the expected value. one interesting pattern is that the bias caused by the ambiguous data is asymmetric, with the positive bias (gene b ambiguous for sister taxa) being approximately twice that of the negative bias (gene b ambiguous for nonsister taxa). this asymmetry is due to the fact that the posterior probability estimate is positively biased for 1 tree and negatively biased for each of the other 2 trees. because the posterior probabilities of all 3 trees must sum to 1, the magnitude of the bias observed for each of the 2 negatively biased trees is less than the magnitude of the bias observed for the single positively biased tree (supplementary table s1, http://www.sysbio.oxfordjournals.org). one possible factor that could lead to the difference between the ml and the bayesian results (fig. 2a,b) is the fact that branch lengths may be collapsed to polytomies in the ml analyses but not in the bayesian analyses.
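the asymmetry argument above can be written out explicitly. the following is a sketch under one added assumption, namely that the 2 negatively biased trees are affected equally (the symbols p_1, p_2, p_3, δ, and δ' are introduced here for illustration only).

```latex
% Posterior probabilities of the 3 topologies; unbiased value is 1/3 each.
\begin{align*}
  p_1 + p_2 + p_3 &= 1, \\
  p_1 &= \tfrac{1}{3} + \delta
  \quad\Longrightarrow\quad
  p_2 = p_3 = \tfrac{1}{3} - \tfrac{\delta}{2}.
\end{align*}
% Gene B ambiguous for sister taxa: the favoured tree is the true tree,
% so the bias in support for the true tree is +\delta.
% Gene B ambiguous for nonsister taxa: the favoured tree is a false tree
% (shifted by +\delta'), so the bias for the true tree is -\delta'/2.
% If \delta \approx \delta', the positive bias is about twice the negative bias:
\[
  \frac{\lvert +\delta \rvert}{\lvert -\delta'/2 \rvert} \approx 2 .
\]
```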
branch lengths are not collapsed in the bayesian analyses because the prior on topologies gives a 1/3 probability to each of the 3 possible (resolved) topologies. this prior places a zero probability on a branch length of 0 (i.e., a polytomy), which can lead to the star tree problem (suzuki et al. 2002; cummings et al. 2003; lewis et al. 2005; yang and rannala 2005; kolaczkowski and thornton 2006; steel and matsen 2007; yang 2007). thus, branch lengths are forced to take small but nonzero values. to investigate whether the ability to collapse polytomies could be responsible for the difference between the ml and the bayesian results, we conducted ml analyses with branch lengths constrained to small but nonzero values. our results show that this factor indeed drives the misleading posterior probabilities in the bayesian framework (fig. 2c). as in the bayesian case, support for the true topology changes with the length of gene b, and the direction of the change depends on the distribution of ambiguous characters. note that neither the ambiguous characters nor the prior alone produces substantial bias in topological support. instead, it is the interaction between ambiguous characters and the prior that produces the bias. in the supplemental material (http://www.oxfordjournals.org/our journals/sysbio/), we present a mathematical argument suggesting that when polytomies are given a zero prior probability (only nonzero branch lengths are allowed), topological
figure 2. the effect of ambiguous characters on topological support when both genes are effectively invariable (rate = 0.000015 substitutions per site per my). on each graph, we plot the support for the true tree as a function of the length of gene b. each point represents the mean across 100 replicate data sets. in an ml framework (a), ambiguous characters do not affect topological support (pr: calculated as the proportion of 100 replicates in which the true tree was chosen, with a value of 1/3 given to replicates with equal support across topologies) when branch lengths can be collapsed to polytomies. in a bayesian framework (b), topological support (pp) changes as a function of the length of gene b and whether gene b is ambiguous for sister (black) or nonsister (dark gray) taxa. when gene b is unambiguous (light gray), topological support is unaffected by the length of gene b. when branch lengths are forced to take a small but nonzero value (10⁻⁶ substitutions/site) in an ml framework (c), ambiguous characters bias topological support (measured as the ratio of likelihood scores for the true to one of the false trees) in the manner seen in the bayesian framework. note that in a bayesian framework, the flat prior on bifurcating topologies requires branch lengths to take a nonzero value. pp = posterior probability; pr = probability.
variable data.-here, we describe the results of analyses of simulated data in which the rate of evolution was sufficient to produce variable sites but remained the same for ambiguous and unambiguous sites. again, the jc model is assumed (no rate heterogeneity). recall that the rate of gene a determines the baseline level of support for the true topology. gene b should not provide topological information when present in just 2 taxa. if ambiguous characters have no effect on topological support, then we expect support for the true topology to vary systematically with the rate of evolution but not with the length of gene b. this expectation is met in the ml framework (fig. 3a). although some stochastic error is present when the sequences have evolved under very high rates, this error would disappear if the number of replicates was increased. in contrast, this expectation is not met in the bayesian framework (fig.
3b) , where topological support changes as a function of the length of gene b, the distribution of ambiguous characters, and the rate of evolution. the observed bias is highest when the rate of evolution produces either effectively invariable or saturated data and lowest when the rate of evolution is intermediate. no bias is observed when gene a provides strong support for the true topology (fig. 3, center 2 columns) . interestingly, the direction of bias is opposite for low and high rates of evolution. when the rate is low, support for the true topology is positively biased when sister taxa have ambiguous characters. conversely, when the rate is high, support for the true topology is negatively biased when sister taxa have ambiguous characters. one factor that could lead to bias in the bayesian framework when the rate of evolution is high is the branch-length prior. recall that an exponential branchlength prior (with mean equal to 0.1) was assumed in the bayesian analyses presented in figure 3b . under this prior, the density decreases as the length of the branch increases, thus favoring short branches over long branches. if the exponential branch-length prior does not contribute to the bias in topological support, we expect the posterior probability of the true tree to be the same when a different prior is assumed. to determine whether this expectation is met, we estimated the posterior distributions assuming a uniform (flat) prior (0,100) on branch lengths. as expected, the bias disappears under a flat branch-length prior (fig. 3c) , suggesting that the combination of the exponential branch-length prior and the ambiguous characters produces the bias. note that changing the prior has no effect on the bias observed for data sets that evolved under low rates of evolution because zero length branches are currently not allowed under any branch-length prior (as described above). 
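the contrast between the 2 branch-length priors can be made concrete by comparing their log densities. this is a small sketch, not part of the original analyses: the function names are hypothetical, and the example branch lengths (0.1 and 15.0 substitutions per site) are chosen here to match the prior mean and the highest simulated rate over a 1.0 my branch.

```python
import math


def log_exponential_prior(branch_length, mean=0.1):
    """Log density of an exponential prior with the given mean."""
    if branch_length < 0:
        return float("-inf")
    rate = 1.0 / mean
    return math.log(rate) - rate * branch_length


def log_uniform_prior(branch_length, lower=0.0, upper=100.0):
    """Log density of a uniform(lower, upper) prior."""
    if lower <= branch_length <= upper:
        return -math.log(upper - lower)
    return float("-inf")


if __name__ == "__main__":
    for length in (0.1, 1.0, 15.0):
        print(length,
              log_exponential_prior(length),
              log_uniform_prior(length))
```

under the exponential prior, the log density falls by 10 units for every additional substitution per site, so a branch of length 15.0 is penalized by about 149 log units relative to a branch of length 0.1; under the uniform(0,100) prior the two lengths receive the same density.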
we also computed the likelihood of the saturated data assuming each of the 3 possible topologies with branch lengths fixed at an arbitrary value that is large (1.0 substitutions/site) but still smaller than the true value (15.0 substitutions/site). this has the effect of mimicking the bayesian exponential prior, which negatively biases the branch-length estimates. as in the bayesian analyses, support for the true topology decreases with the length of gene b when sister taxa have ambiguous characters but increases with the length of gene b when nonsister taxa have ambiguous characters (fig. 3d) . as expected, no bias is present when gene b is ambiguous for none of the taxa. this result demonstrates that constraining branch lengths to values lower than their optimum in an ml setting has the same effect as assuming a prior that favors short branches in a bayesian setting. misspecification here, we describe results from the analyses in which the rate of evolution is different for ambiguous and unambiguous sites. in this case, the correct model of evolution is the jc model with separate rates for the 2 genes. we used this model by partitioning the data set by gene (i.e., with known boundary) in both ml and bayesian analyses. in the bayesian analysis, the rate prior was set to dirichlet(1,1). the magnitude and direction of bias in topological support are a function of the relative rates of the ambiguous and unambiguous sites in bayesian (fig. 4) but not in ml ( supplementary fig. s1 ) analyses. for the bayesian analyses, substantial bias is observed when the rate of evolution of gene a is low (fig. 4 , left columns) or when the rates of evolution at both genes are high (fig. 4 , lower right corner). this suggests that weakly supported bipartitions are more sensitive to the effects of ambiguous characters. the rate of evolution of gene b can affect support for the true topology when the baseline support (from gene a) is weak. 
support for the true topology is strongly biased when gene b is evolving faster than gene a. when gene a is evolving faster, a much smaller bias is typically observed. the magnitude of the bias caused by ambiguous characters also differs depending on the assumed model of among-site rate variation. this is shown in figure 5, which presents results from bayesian analyses of data sets in which gene a is variable and gene b is effectively invariable. if rate priors do not interact with ambiguous characters to produce biased topological estimates, then we expect the posterior probability estimates not to vary with the assumed model of rate variation, as long as that model matches the simulation conditions. to test this expectation, we compared results for 3 models of rate variation (fig. 5): discrete gamma (Γ), invariable sites (i), and partitioned with variable rate prior (p). in principle, the latter 2 models should match the simulation conditions. the direction and magnitude of bias are similar for the discrete gamma and invariable sites models.
figure 3. the effect of ambiguous characters on topological support when both genes are evolving at the same rate. axes and shades of gray are the same as in figure 2. note that the graphs in the left column of (a), (b), and (d) are identical to those presented in figure 2. in an ml framework (a), ambiguous characters do not lead to a systematic bias in topological support, regardless of the rate of evolution (increasing from left to right columns). in a bayesian framework (b), however, the magnitude and direction of the bias are a function of the rate of evolution. this bias is strongest when the rate of evolution is low or high and weakest when the rate of evolution is intermediate (e.g., when gene a provides strong support for the true tree). when the rate of evolution is high, the bias exists when an exponential branch-length prior is assumed (b) but is absent when a uniform branch-length prior is assumed (c). the type of bias seen in the bayesian framework can be demonstrated in the ml framework (d) if branch lengths are fixed at an arbitrarily low value (results for 10⁻⁶ substitutions per site per my are shown in the lower left graph) or a very high value (results for 1.0 substitutions per site per my shown in the lower right graph). note that in the bayesian framework, the flat topological prior prohibits zero-length branches and the exponential branch-length prior penalizes long branches.
surprisingly, the bias is much more substantial for the partitioned model when the rate of gene a is low but much less substantial when the rate of gene a is high. results from the ml analyses largely support these conclusions (supplementary fig. s2), although systematic bias was only observed when gene a was evolving at a high rate. in order to study the effect of the priors on the rate variation parameters, we performed additional analyses in which we used strong priors to effectively fix parameters at their true values for the invariable sites and partitioned models. the prior on the proportion of invariable sites (uniform from 0 to 1) has a small effect on the bias (compare second and third rows of fig. 5), despite the fact that the proportion of invariable sites is only accurately estimated when gene b is ambiguous for none of the taxa (supplementary table s2). in contrast, the prior on the relative rates in the partitioned model appears to have a substantial effect (compare the fourth and fifth rows of the left column in fig. 5). when the prior is set such that strong weight is placed on the true values (i.e., dirichlet(10 000, 10 000)), the bias for effectively invariable data sets (left column) approximates the bias seen when the jc model was assumed (fig. 2).
because the ratio of rates of evolution is infinity when gene a is variable and gene b is effectively invariable, the rate prior could not be fixed at the true values for some of the rate conditions.
figure 4. the effect of ambiguous characters on bayesian posterior probabilities when rates differ between unambiguous (a) and ambiguous (b) genes. in each graph, the average posterior probability of the true tree (y-axis) is plotted as a function of the length of gene b (x-axis), pattern of ambiguous characters (shade of gray), rate of gene a (column), and rate of gene b (row). graphs show results from analyses in which rate variation was modeled in a partitioned analysis (partitioned by gene) with a dirichlet(1,1) rate prior. therefore, the model of evolution is overparameterized along the diagonal (equal rates; analogous to figure 3b) and correctly parameterized off the diagonal. note that the magnitude and direction of bias are a function of the relative rates of the ambiguous and unambiguous genes. also note that in some cases, the bias is strongest when the number of ambiguous sites is low.
if ambiguous characters have no effect on branch-length estimates, then we would expect estimated trees to be ultrametric and the type i error rate for a molecular clock test to be independent of the number of ambiguous sites. our analyses demonstrate that this is not the case: ambiguous characters can substantially inflate the type i error rate for the molecular clock test (fig. 6). in some cases, the type i error rate can increase rapidly (from 5% to 100%) with the addition of very few ambiguous sites (e.g., 50). inflation of the type i error rate is greatest when genes a and b are evolving at very different rates. note, however, that when the 2 genes are evolving under the same rate (graphs along the diagonal in fig. 6) or when gene b is unambiguous for all taxa (light gray lines in fig. 6), the type i error rate is independent of the length of gene b. these results suggest that the interaction between ambiguous characters and rate variation among sites can lead to estimates of trees that are significantly nonultrametric. also note that we are assuming the jc model of evolution (no rate heterogeneity), so the results displayed in the off-diagonal cells of figure 6 are, in fact, underparameterized. thus, we cannot say whether the bias would disappear if rate heterogeneity were properly modeled (though we expect the bias would disappear).
figure 5. the effect of ambiguous characters on bayesian posterior probabilities under different models of among-site rate variation. axes and shades of gray have the same meaning as in figure 2b. each row corresponds to the top row of figure 4 (gene b is effectively invariable). results for models of rate variation are shown: discrete gamma with 4 rate categories (Γ), invariable sites (i), and partitioned with variable rate prior (p). the subscript f indicates that the rate variation parameter was fixed at the true value, removing the effect of the prior on that parameter. note that in each case, the light gray lines show the results from analyses of data sets in which both genes a and b were completely unambiguous (i.e., the control). under the (incorrect) gamma model of rate heterogeneity, the posterior probabilities were slightly biased even for the unambiguous data sets.
figure 6. the effect of ambiguous characters on the probability of incorrectly rejecting a molecular clock model of evolution in an ml framework. the proportion of 100 replicates in which the clock model was rejected in a χ² test (df = 2; y-axis) is plotted against the length of gene b (x-axis), distribution of ambiguous characters (shade of gray), the rate of gene a (columns), and the rate of gene b (rows). because rate heterogeneity was not accommodated in these analyses (see text), the model of evolution was underparameterized in analyses presented off the diagonal. note that substantial inflation of type i error requires both rate variation (off diagonal) and ambiguous characters (black or dark gray points).
recall that in both schemes (sister variable and distant invariable), the appended sites contained unambiguous characters for only 2 of the 8 taxa, so topological support should remain the same as sites are appended if ambiguous characters have no effect. our results clearly demonstrate, however, that ambiguous characters in empirical data sets can strongly bias estimates of topological support and branch lengths (fig. 7). in particular, when variable sites are added (fig. 7a), sister taxa are pushed apart on the phylogeny. for example, when the data set contains no ambiguous sites, h. brunus and h. italicus are sister taxa supported by a posterior probability of 1.0. when the data set contains 1000 ambiguous sites, however, these 2 taxa are on opposite sides of the phylogeny (the branches separating them are supported by posterior probabilities of 0.67, 0.84, and 0.94). conversely, when invariable sites are added (fig. 7b), distant taxa are pulled together. for example, when the data set contains no ambiguous sites, d. fucus and e. eschscholtzii are separated by 3 branches with posterior probabilities equal to 1.0, 0.86, and 0.49. when the data set contains 1000 ambiguous sites, however, these 2 taxa are only separated by 1 internal branch. in both cases, the result is strong support for bipartitions that do not appear in the topology estimated without the ambiguous sites. trees inferred using the ml criterion produced the same pattern of bias.
figure 7. the effect of ambiguous characters on estimates of an empirical phylogeny estimated in a bayesian framework. in (a), we present results based on an empirical data set with up to 1000 variable sites appended. the character state at each appended site was unambiguous but different for the sister taxa hydromantes brunus (hb) and hydromantes italicus (hi) and was ambiguous ("?") for the other 6 taxa: aneides flavipunctatus (af), aneides hardii (ah), desmognathus fucus (df), desmognathus wrighti (dw), ensatina eschscholtzii (ee), and phaeognathus hubrichti (ph). the number of appended sites is given above each phylogeny, and the bipartition posterior probability estimate is given at each internal branch. in (b), we present results based on the same empirical data set but with up to 1000 invariable sites appended. here, the character state at each appended site was identical for the distant taxa df and ee and was ambiguous for the other 6 taxa: af, ah, dw, hb, hi, and ph. note that when variable sites are added, taxa with unambiguous characters are pushed apart on the phylogeny, whereas when invariable sites are added, taxa with unambiguous characters are pulled together. topologies estimated in an ml framework matched those estimated in a bayesian framework.
ambiguous characters can strongly bias estimates of topology and branch lengths in ml and bayesian phylogenetic inference. gaps due to insertions or deletions will have the same effect unless explicitly modeled (note that most software, including mrbayes, treat gaps as ambiguous characters because explicit models of indels are rarely implemented). we have shown that the magnitude and direction of the bias vary as a function of the number of ambiguous characters, the topological position of ambiguous sequences, the relative rates of evolution for ambiguous and unambiguous sites, and the model of sequence evolution assumed. furthermore, topological bias is likely to be most pronounced in a bayesian framework due to the additional interaction between the ambiguous characters and the priors. even so, estimates of branch length and topology can be biased in an ml framework when rate variation across sites is not properly modeled.
these results are in sharp contrast to recent opinions that the effects of ambiguous characters are overstated in the literature (e.g., de queiroz and gatesy 2007). bipartitions that are strongly supported by unambiguous sites are likely to remain strongly supported with the inclusion of ambiguous sites (e.g., fig. 4, columns 3 and 4). false bipartitions that would otherwise be weakly supported, however, may become strongly supported with the inclusion of even a few ambiguous sites. in practice, therefore, it may be difficult to distinguish between true bipartitions that are strongly supported by real signal and false bipartitions that are strongly supported because of the effects of ambiguous characters. note that although we focused our analyses on the 4-taxon case, we expect our conclusions to hold for weakly supported bipartitions in larger phylogenies, although the effects are expected to be more complex due to the interactions with additional bipartitions. in contrast to several previous simulation studies that attributed a reduced phylogenetic accuracy to a lack of information in the ambiguous sites (leading to low resolution; wiens 1998, 2003a), our study clearly shows that ambiguous characters actively produce misleading estimates of phylogeny through interaction with 2 other factors: bayesian priors and model misspecification. interaction with bayesian priors can be understood by considering a bayesian analysis of an invariable 4-taxon data set. priors on topology (uniform over strictly bifurcating topologies) and branch lengths (typically uniform or exponential) result in sampled branch lengths that are small but nonzero. for a particular site in the data set, the conditional likelihood score is equal to 1.0 for any subtree containing only taxa with ambiguous character states (i.e., "?"). in effect, these portions of the tree are pruned out (ignored).
thus, the site likelihood is calculated only along branches connecting the sequences that are unambiguous for that site. for sites in which 2 of the sequences have unambiguous character states, this score is not identical across the 3 topologies. one of the topologies groups these 2 unambiguous taxa as sister, whereas the other 2 topologies position them in a nonsister arrangement. two branches separate sister taxa, whereas 3 branches separate nonsister taxa. given that branch lengths are nonzero and the site is invariable (i.e., both taxa with unambiguous characters have the same state), the likelihood under the topology placing the 2 unambiguous taxa as sister is greater than that under the topologies placing them as nonsister (the likelihood is greater when fewer branches separate taxa with the same character state). the priors ensure that only nonzero branch lengths are sampled and thus that the posterior probability of placing the 2 unambiguous taxa as sister is greater than 1/3. this posterior probability increases with an increasing number of such sites. a similar line of reasoning will lead to the opposite conclusion for saturated data sets. this explanation predicts the pattern of topological error seen in our analyses and is confirmed by the mathematical argument shown in the supplemental material. ambiguous characters can also interact with model misspecification to produce misleading estimates of phylogeny. in order to understand this interaction, consider a 4-taxon data set in which gene a is evolving at a slower rate than gene b (refer to fig. 8 ). suppose that a pair of sister sequences contain ambiguous characters for all sites in gene b. under this scenario, the lengths of the branches connecting this sister pair will be estimated based only on the sites in gene a, whereas the lengths of the branches connecting the other sister pair will be estimated based on all the sites (both genes). 
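the prior-interaction argument above can be checked numerically. the sketch below is a simplification, not the authors' analysis: it assumes the jc model, fixes every branch at the same hypothetical length t instead of integrating over a branch-length prior, and places a uniform prior on the 3 topologies. for an invariable site at which only 2 taxa are unambiguous, the site likelihood is 1/4 times the jc probability that 2 sequences separated by path length d share a state, 1/4 + 3/4 exp(-4d/3), with d = 2t for the sister placement and d = 3t for either nonsister placement. all function names are illustrative.

```python
import math


def jc_same_state_prob(path_length):
    """JC probability that two sequences separated by `path_length`
    (expected substitutions per site) show the same state at a site."""
    return 0.25 + 0.75 * math.exp(-4.0 * path_length / 3.0)


def sister_posterior(n_sites, branch_length):
    """Posterior probability of the topology that places the 2 unambiguous
    taxa as sisters, given `n_sites` invariable sites that are ambiguous
    for the other 2 taxa.

    Assumptions (ours): every branch has the same fixed length, and the
    3 unrooted 4-taxon topologies have equal prior probability.
    """
    # Sister taxa are separated by 2 branches, nonsister taxa by 3.
    # The stationary frequency 1/4 cancels in the likelihood ratio.
    ratio = (jc_same_state_prob(3 * branch_length)
             / jc_same_state_prob(2 * branch_length))
    # Posterior = L_sister^n / (L_sister^n + 2 * L_nonsister^n),
    # rewritten in terms of the ratio to avoid floating-point underflow.
    return 1.0 / (1.0 + 2.0 * ratio ** n_sites)


if __name__ == "__main__":
    for n in (0, 10, 100, 1000):
        print(n, round(sister_posterior(n, branch_length=0.05), 4))
```

with t = 0.05 the posterior probability of the sister placement equals 1/3 when no such sites are present and approaches 1 as sites are appended; with t = 0 it stays at 1/3, which is why the nonzero branch lengths sampled under the prior matter.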
if among-site rate variation is not properly modeled, the branches connecting the sister pair with ambiguous characters in gene b will be shorter than those connecting the other sister pair because gene a is evolving at a slower rate than the average rate across all sites (both genes). as a result of this interaction, rate variation across sites is manifested as rate variation across branches, resulting in biased branch-length estimates (in fact, variation across sites in any model parameter could be manifested as parameter variation across branches through the same process). bayesian priors introduce additional factors that may then interact to bias topological support.
figure 8. ambiguous characters interact with model misspecification to produce misleading branch-length estimates. a data set is simulated using an ultrametric tree. some sites evolve under a slow rate, whereas others evolve under a fast rate. ambiguous characters are introduced nonrandomly with respect to rate and taxon. if rate variation is correctly modeled, the estimated tree is ultrametric. if rate variation is not correctly modeled, the estimated tree is nonultrametric. the interaction between ambiguous characters and model misspecification causes among-site rate variation to be manifested as among-branch rate variation. note that the pattern of branch lengths inferred depends on the taxonomic distribution of the ambiguous characters, even though the ambiguous sites contain no topological information.
the accuracy of many analyses may be jeopardized by the effects of ambiguous characters on branch-length estimates. for example, we have shown that ambiguous characters may increase the propensity to incorrectly reject a molecular clock (fig. 6).
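the way unmodeled among-site rate variation turns into apparent among-branch rate variation can be illustrated with pairwise jc distances. this is a deliberately reduced sketch under our own assumptions, not the simulation design of the study: 2 pairs of taxa have the same true divergence, one pair is scored for the slow gene only (the fast gene is all "?"), the other pair is scored for both genes, and a single-rate jc correction is applied to the pooled sites. the rates, gene lengths, and function names are hypothetical.

```python
import math


def jc_expected_difference(distance):
    """Expected proportion of differing sites under JC at `distance`."""
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))


def jc_distance(p_diff):
    """Single-rate JC distance estimate from a proportion of differences."""
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)


def estimated_distance(true_divergence, rates, lengths):
    """Distance a single-rate JC model recovers when the observed sites
    come from genes evolving at different relative `rates`.

    `lengths` gives the number of unambiguous sites contributed by each
    gene for this pair of taxa.
    """
    total = sum(lengths)
    pooled = sum(n * jc_expected_difference(r * true_divergence)
                 for r, n in zip(rates, lengths)) / total
    return jc_distance(pooled)


if __name__ == "__main__":
    divergence = 0.2          # same true divergence for both pairs
    slow, fast = 0.5, 2.0     # hypothetical relative rates of genes a and b
    only_a = estimated_distance(divergence, [slow], [1000])
    both = estimated_distance(divergence, [slow, fast], [1000, 1000])
    print("pair with gene b ambiguous:", round(only_a, 4))
    print("pair with both genes:      ", round(both, 4))
```

although both pairs diverged at the same time, the pair that is ambiguous for the fast gene receives a much shorter estimated distance, so an ultrametric history looks nonultrametric. when the 2 genes share one rate, the estimates coincide.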
other branch length-dependent analyses that may be affected include divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis. future studies are needed to determine the indirect effects of ambiguous characters on the accuracy of each type of analysis. many interesting problems stem from the potential for topological bias due to ambiguous characters; we outline several of them here. the rogue taxon problem, wherein 1 taxon is particularly labile in what is otherwise a well-supported tree, could result from the inclusion of ambiguous characters. related to the rogue taxon problem is conflict among gene trees. although unambiguous sites might support the same, true topology, genes with a large proportion of ambiguous characters may support alternative topologies due solely to misinterpretation of the ambiguous characters. a researcher could be deceived into believing that multiple phylogenetic signals exist across genes (interpreted as hybridization, horizontal gene transfer, or incomplete lineage sorting), when in fact all support for alternative topologies is due to the presence of ambiguous characters. one of us (k.s.-h.) has come across such an example of discordance among gene trees in empirical data from north american fireflies. once ambiguous sites were excluded from the analysis, gene tree congruence increased substantially (stanger-hall et al. 2007). last, statistical approaches to phylogenetic hypothesis testing (e.g., bayesian posterior probabilities and ml bootstrap proportions) may also be rendered inaccurate by this bias. hypothesis testing is of particular concern because changes in posterior probabilities or bootstrap proportions of only a few percent can alter conclusions of significance, even when the bias is not strong enough to alter the preferred topology.
the results of this study carry serious implications for the practice of combining data when inferring phylogenies, particularly when rates of evolution vary across data sets. for instance, consider the situation in which data are gathered from a large number of species for 2 genes: 1 slower-evolving nuclear gene is included to resolve deep relationships and 1 faster-evolving mitochondrial gene is included to resolve shallow relationships (note that this approach is increasingly common). due to monetary or time constraints, not all species are sequenced for both genes. our 4-taxon simulations suggest that the ambiguous characters will cause the taxa sequenced for only the fast gene to be pushed apart on the phylogeny, whereas the taxa sequenced for only the slow gene will be pulled together. analyses of simulated 8-taxon data sets ( supplementary fig. s3) , as well as a manipulated empirical data set (fig. 7) , confirm these predictions. supermatrix-style approaches that do not have nearly complete overlap in taxon sampling across data sets will be particularly prone to this type of error. although we expect no systematic error if the effects of priors are weak and rate variation across sites is correctly modeled, ensuring these 2 properties may be difficult in practice. for example, the branch-length prior is expected to have strong effects on any branch for which no substitutions have been observed, regardless of the dimensions of the data set. correct modeling of rate variation across sites may be even more difficult. ambiguous characters may appear in an alignment for a variety of reasons, such as monetary constraints, desire to publish quickly, poor alignments, or technical difficulties with sequencing. given these various causes for the inclusion of ambiguous characters, rates of evolution are unlikely to be discretely different between ambiguous and unambiguous sites. 
determining a proper method for modeling rate variation is likely to be extremely difficult, especially as the proportion of ambiguous characters at each site increases. heterotachy (changes in rates of evolution within a site across the tree), which has already proven problematic with complete data sets (kolaczkawski and thornton 2004; philippe et al. 2005; spencer et al. 2005; steel 2005; lockhart et al. 2006; matsen and steel 2007) , may also interact with ambiguous characters to produce effects that may be difficult to avoid. one possible effect, for example, is for heterotachy to be manifested as among-site rate variation, thereby biasing estimates of among-site rate heterogeneity parameters. we have not investigated the effectiveness of particular methods for correcting for the ambiguous character bias, although we suggest several here. the first (and most obvious) solution is to use only completely unambiguous data matrices when inferring phylogenies. to do so, either ambiguous characters should be filled in (through additional sequencing) or ambiguous sites should be removed from the alignment. a second potential solution is to use a technique known as statistical imputation (kalton and kish 1981; ford 1983; david et al. 1986; little and rubin 2002; marker et al. 2002) . to impute data, each ambiguous site is filled in using characters from a randomly selected unambiguous site that has the same site pattern as the ambiguous site when cells containing ambiguous characters are ignored. drawbacks of this approach include the need to account for the uncertainty associated with the filled data and the fact that imputing some sites may be impossible (due to lack of a matching unambiguous site), especially when the matrix contains a large number of sequences or a small number of sites. the third potential solution is to evaluate the effects of ambiguous characters on a data set-specific basis to see if a correction is needed. 
one approach is to analyze the data set with and without ambiguous sites and look for variation in the results. note that in many cases, this approach may yield an unclear conclusion because the ambiguous sites could also contain true phylogenetic signal; this is the reason the ambiguous character problem is difficult to study using empirical data. the fourth and final solution we offer is to estimate the ambiguous character states as free parameters. in an ml framework, this would entail identifying the state for each ambiguous character that maximizes the likelihood of observing the unambiguous characters. in a bayesian framework, a prior would be placed on the distribution of character states and the posterior distribution of character states for each ambiguous character would be estimated. the difficulty with this approach is that the number of parameters that would need to be estimated would be quite large for data sets containing a large number of ambiguous characters. this list of solutions is certainly not exhaustive; we look to future studies to identify the relative merits of various solutions. we have demonstrated the potential for ambiguous characters to positively mislead ml and bayesian phylogenetic inference. however, we have not investigated all possible variables that affect the magnitude of this bias (e.g., tree shape), and we leave such analyses for future studies. much additional work is also needed to identify powerful and robust diagnostics for assessing when ambiguous characters may affect a particular data set as well to determine priors and models that minimize their effect. until the costs of including ambiguous characters in empirical data sets can be more fully elucidated and methods for eliminating their effects can be developed, extreme caution should be taken when including ambiguous characters or indels in ml or bayesian phylogenetic analyses. 
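the statistical imputation procedure described among the proposed solutions can be sketched as follows. this is a minimal illustration under our own assumptions, not an implementation evaluated in the study: the alignment is a list of equal-length strings (one per taxon), "?" marks an ambiguous character, and the function name is hypothetical. each site containing "?" is filled from a randomly chosen fully unambiguous site that matches it at every cell where the ambiguous site is observed; sites without a matching donor are left unchanged, which is one of the drawbacks noted above.

```python
import random


def impute_sites(alignment, missing="?", rng=None):
    """Hot-deck imputation of ambiguous sites in an alignment.

    `alignment` is a list of equal-length strings, one per taxon.
    Returns a new list of strings; the input is not modified.
    """
    rng = rng or random.Random()
    # Work column by column (one column = one site).
    columns = [list(col) for col in zip(*alignment)]
    donors = [col for col in columns if missing not in col]
    imputed = []
    for col in columns:
        if missing in col:
            # Donor sites must agree with the observed cells of this site.
            matches = [d for d in donors
                       if all(c == missing or c == dc
                              for c, dc in zip(col, d))]
            if matches:
                col = list(rng.choice(matches))
            # With no matching donor the site keeps its ambiguous cells.
        imputed.append(col)
    return ["".join(row) for row in zip(*imputed)]


if __name__ == "__main__":
    aln = ["AC?", "AC?", "AGG", "AGG"]
    print(impute_sites(aln, rng=random.Random(1)))
```

because the donor is drawn at random, repeating the imputation with different seeds and comparing the resulting trees is one way to expose the uncertainty that the filled-in characters carry.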
phylogeny and the evolution of plant-animal interactions evolutionary pathways in nature: a phylogenetic approach unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events the importance of data partitioning and the utility of bayes factors in bayesian phylogenetics experimental molecular evolution of bacteriophage t7. evolution predicting the evolution of human influenza a comparing bootstrap and posterior probability values in the four-taxon case alternative methods for cps income imputation the supermatrix approach to systematics systematics of mustelid-like carnivores prospects for building the tree of life from large sequence databases molecular phylogenetics of myliobatiform fishes (chondrichthyes: myliobatiformes), with comments on the effects of missing data on parsimony and likelihood evolutionary trees from dna sequences: a maximum likelihood approach phylogenies from molecular sequences: inference and reliability inferring phylogenies an overview of hot deck procedures maxalign: maximizing usable data in an alignment maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites new uses for new phylogenies experimental phylogenetics: generation of a known phylogeny molecular systematics when are fossils better than extant taxa in phylogenetic analysis? the biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated available from: url www editor. mammalian protein metabolism two efficient random imputation procedures fragmentary taxa, missing data, and ambiguity: mistaken assumptions and conclusions problems due to missing data in phylogenetic analyses including fossils: a critical review performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous is there a star tree paradox? 
a simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates the importance of proper model assumption in bayesian phylogenetics polytomies and bayesian phylogenetic inference statistical analysis with missing data evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis heterotachy and tree building: a case study with plastids and eubacteria large-scale imputation for complex surveys phylogenetic mixtures on a single tree can mimic a tree of another topology molecular phylogenetics and evolution of host plant use in the tropical rolled leaf "hispine" beetle genus cephaloleia (chevrolat) (chrysomelidae: cassidinae) morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes molecular phylogenetics and the origins of placental mammals phylogenomics of eukaryotes: impact of missing data on large alignments heterotachy and long-branch attraction in phylogenetics on missing entries in cladistic analysis seq-gen: an application for the monte carlo simulation of dna sequence evolution along phylogenetic trees mrbayes 3: bayesian phylogenetic inference under mixed models likelihood, parsimony, and heterogeneous evolution phylogeny of north american fireflies (coleoptera: lampyridae): implications for the evolution of light signals should phylogenetic models be trying to "fit an elephant"? 
the bayesian "star paradox" persists for long finite sequences a complete family of phylogenetic invariants for any number of taxa under kimura's 3st model among-site rate variation and phylogenetic analysis of 12s rrna in sigmodontine rodents overcredibility of molecular phylogenies obtained by bayesian phylogenetics paup*: phylogenetic analysis using parsimony (*and other methods), version 4.0b10 evolutionary trees of apes and humans from dna sequences exploring the phylogenetic structure of ecological communities: an example for rain forest trees does adding characters with missing data increase or decrease phylogenetic accuracy? incomplete taxa, incomplete characters and phylogenetic accuracy: is there a missing data problem? missing data, incomplete taxa, and phylogenetic accuracy missing data and the design of phylogenetic analyses missing data and the accuracy of bayesian phylogenetics combining data sets with different numbers of taxa for phylogenetic analysis maximum-likelihood estimation of phylogeny from dna sequences when substitution rates differ over sites maximum likelihood phylogenetic estimation from dna sequences with variable rates over sites: approximate methods fair-balance paradox, star-tree paradox and bayesian phylogenetics comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation branch-length prior influences bayesian posterior probability of phylogeny associate editor: lars jermiin supplementary material can be found at http://www. sysbio.oxfordjournals.org. the authors thank david bryant, gavin naylor, and the computational phylogenetics group at the university of texas at austin for useful discussions. we are also grateful to matt morgan, tracy heath, lars jermiin, and 2 anonymous reviewers for comments on a previous version of this manuscript. 
key: cord-295019-8tf8ah6g authors: weber, wilfried; fussenegger, martin title: emerging biomedical applications of synthetic biology date: 2011-11-29 journal: nat rev genet doi: 10.1038/nrg3094 sha: doc_id: 295019 cord_uid: 8tf8ah6g synthetic biology aims to create functional devices, systems and organisms with novel and useful functions on the basis of catalogued and standardized biological building blocks. although they were initially constructed to elucidate the dynamics of simple processes, designed devices now contribute to the understanding of disease mechanisms, provide novel diagnostic tools, enable economic production of therapeutics and allow the design of novel strategies for the treatment of cancer, immune diseases and metabolic disorders, such as diabetes and gout, as well as a range of infectious diseases. in this review, we cover the impact and potential of synthetic biology for biomedical applications. | mammalian gene expression control strategies. a | repression-based expression control. a repressor protein binds to its operator and thus prevents activation of the promoter and expression of the gene of interest. in response to an inducer, the repressor dissociates from the operator, the promoter is derepressed, and the gene of interest is expressed. b | activation-based expression control. a minimal promoter (p min ) is activated when a chimeric transcription factor that is constructed by fusing a repressor protein to a transcription activation domain binds to its operator. in the presence of an inducer, the repressor protein-transcription-activation domain complex dissociates from its operator, p min is no longer activated, and transcription of the gene of interest is prevented. c | mrna transcript-based expression control. a self-cleaving ribozyme is fused to a small-molecule-binding aptamer and introduced into the 3′ untranslated region (utr) of a gene of interest. 
in the absence of the inducer, the ribozyme undergoes self-cleavage, thereby eliminating the poly(a) tail (pa) from the open reading frame and preventing translation. however, in the presence of the inducer, the aptamer undergoes a conformational change, which inactivates the ribozyme and allows translation to occur. genetic circuits that can be switched between two stable expression states (for example, an 'on' and an 'off' state) by a transient stimulus. in the absence of a switching stimulus, the expression state is locked and inherited across cell generations. genetic circuits whose response dynamics depend on a combination of past and present states. the capacity of a species to sense and score its surrounding ecosystem, for example, to identify the type and population density of neighbouring species. a small-molecule-based chemical language by which bacteria communicate within and across populations (the 'quorum'). production and response to quorum-sensing molecules is correlated with population density. these are devices that are selectively induced within a specific concentration range of the input signal. at lower or higher trigger levels, the band-pass filter shuts down output signals. are starting to emerge [63] [64] [65] . this review focuses on the recent progress in synthetic biology towards improving human health, including diagnostic applications, design of novel preventive care strategies, progress in drug discovery, design and delivery, and development of novel treatment strategies, such as prosthetic networks. synthetic biology holds the promise of providing unique opportunities for major advances in the improvement of human health in the twenty-first century 4, 66, 67 . understanding disease mechanisms pathogen mechanisms. 
the availability of affordable technology for synthesizing and assembling 10 sequences for proteins or for viral 68 and bacterial 69 genomes with increased speed has dramatically improved our understanding of host-pathogen interactions and of disease mechanisms. the synthetic biology principle of 'analysis by synthesis' provides mechanistic insight by combining rapid synthesis, assembly, shuffling and mutation of individual genetic components with straightforward functional analysis. for example, the genome of the h1n1 virus that was responsible for the 1918 spanish influenza pandemic was synthesized using sequence information from genomic pieces that were extracted from permafrost-conserved tissue samples. functional analysis of the reconstructed virus provided new insight into the key virulence factors of the pathogen: namely, a haemagglutinin variant that induces membrane fusion without trypsin activation and a modified polymerase that enhances viral replication 68 . the same study also revealed that a combination of eight genes was responsible for the exceptional virulence of the spanish influenza strain 68, 70 . this finding may help to identify the pandemic potential of future virus variants [71] [72] [73] . synthesis and analysis of chimeric viruses have also made a substantial contribution to the understanding of coronavirus zoonoses that were responsible for the severe acute respiratory syndrome (sars) pandemic of 2002 and 2003.
liposomes that are decorated with antigenic peptides or proteins that elicit an immune response.
these are molecular devices that promote the spreading of a specific gene throughout a target population by taking advantage of a mechanism that multiplies the specific gene in the host genome. gene drive systems produce non-mendelian patterns of inheritance.
the characterization of the history of the sars coronavirus, especially its switch in tropism, was particularly challenging, as its direct ancestors could not be propagated in laboratory models. however, after a 30 kb sars-like bat coronavirus was designed to contain the receptor-binding spike protein of its human homologue, the synthesized chimeric virus was able to replicate in culture and infect mice 74 . these in vivo studies revealed infection-enhancing mutations in the spike protein and established this surface protein as a key factor that is responsible for tropism switches in coronavirus zoonoses 74 . reconstruction of pathogens by dna synthesis can also be used for the production of diagnostic high-density antigen arrays 75, 76 , such as those used to profile post-lyme-disease syndrome 76 or the humoural immune responses to hepatitis c and the human immunodeficiency virus (hiv) 77 . immune systems. synthetic biology has recently provided new insight into disorders that are related to deficiencies of the immune system, which is known for its particularly complex control circuits and cellular interaction networks. for example, dysfunction of b-lymphocyte activation underlies several physiological disorders 78 . functional reconstitution and analysis of the human b cell antigen receptor (bcr) signalling cascade in insect cells revealed that bcrs are not activated by antigen-specific crosslinking, as presented in textbooks, but instead have an autoinhibitory oligomeric conformation on resting b lymphocytes that shifts to an active dissociated form when antigens bind 79 . this triggers the signalling cascade, which results in antibody production and the onset of a humoural immune response. also, construction of a representation of the complete human peptidome engineered for display on the surface of t7 phages enabled church and colleagues 80 to discover new autoantigens.
they used patient-derived autoantibodies to enrich autoantigenic peptides displayed on the phages; they could then identify the antigens by high-throughput sequence analysis 80 . knowledge of the antigens that are involved in autoimmune processes is important for understanding disease aetiology, developing accurate diagnostic tests and designing drugs that neutralize autoreactive immune cells 80 . vaccines. high-throughput and high-precision assembly and engineering of entire genomes from well-defined genetic components using synthetic biology principles has provided new opportunities for the design of attenuated pathogens for use as vaccines. for example, primates immunized with virus-like particles that were produced by selective expression of particular chikungunya virus (chikv) structural proteins were protected against viraemia after a high-dose challenge; even immunodeficient mice that were treated with monkey-derived antibodies survived subsequent lethal doses of chikv 81 . dna synthesis and assembly has also played an essential part in pioneering a safe live vaccine against the poliovirus 82 . the poliovirus was attenuated by systematic genome-scale changes of adjacent pairs of codons from over- to under-represented codon sets in viral capsid genes (for example, gcc|gaa is strongly under-represented compared with gca|gag, although both encode ala-glu). these changes reduced translation and impaired the replication competence and infectivity of the virus. this attenuated poliovirus provides protective immunization in mice and offers a high safety standard given the low probability that all 631 individual changes will revert and thus reconstitute infectious wild-type viruses. the genome-engineering approach used here could represent a general strategy for designing live vaccines against infectious diseases.
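the codon-pair example and the reversion argument above can be made concrete with a short sketch. it rests on assumptions that are ours, not the review's: only the 4 codons named in the text are included in the lookup table (their amino acid assignments follow the standard genetic code), reversions are treated as independent events, and the per-change reversion probability is a purely hypothetical number chosen for illustration.

```python
import math

# Only the codons mentioned in the text; standard genetic code assignments.
CODON_TO_AA = {"GCC": "Ala", "GCA": "Ala", "GAA": "Glu", "GAG": "Glu"}


def translate_pair(pair):
    """Translate a codon pair written as 'GCC|GAA' into amino acids."""
    first, second = pair.split("|")
    return CODON_TO_AA[first], CODON_TO_AA[second]


def nucleotide_changes(pair_a, pair_b):
    """Number of nucleotide positions at which two codon pairs differ."""
    seq_a = pair_a.replace("|", "")
    seq_b = pair_b.replace("|", "")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def log10_full_reversion_probability(n_changes, per_change_probability):
    """log10 of the probability that every change reverts, assuming
    independent reversions (computed in log space to avoid underflow)."""
    return n_changes * math.log10(per_change_probability)


if __name__ == "__main__":
    print(translate_pair("GCC|GAA"), translate_pair("GCA|GAG"))
    print(nucleotide_changes("GCC|GAA", "GCA|GAG"))
    # 0.01 is a hypothetical per-change reversion probability.
    print(log10_full_reversion_probability(631, 0.01))
```

both pairs translate to ala-glu while differing at 2 nucleotide positions, so the protein is unchanged. under the independence assumption, the probability of complete reversion shrinks geometrically with the number of changes, which is the intuition behind the safety claim.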
other promising vaccination concepts include using antigen-producing immunostimulatory liposomes as genetically programmable synthetic vaccines 83 and the production of heat-stable oral algae-based vaccines to protect against staphylococcus aureus infections 84, 85 . vector control. suppression of insect vector populations using transgenic viral strains that harbour conditional dominant-lethal synthetic circuitry may control the transmission of malaria parasites and dengue viruses and could eventually control the spread of untreatable diseases [86] [87] [88] . mosquitoes that are transgenic for a tetracycline-dependent transactivator (tta) that is exclusively expressed in the female's indirect flight muscle can only be propagated in the presence of tetracycline, which represses the transcription of this gene. however, the absence of tetracycline leads to the development of a female-specific flightless phenotype 87, 88 . putting the eggs of this transgenic mosquito into the ecosystem results in male-only releases; female mosquitoes remain grounded and cannot feed, mate or take blood meals, which effectively represents a lethal phenotype. males do not transmit the disease, but they disseminate the synthetic circuit across the resident wild-type mosquito population 87, 88 (fig. 2a) .
these constitute a group of secondary metabolites produced through linear decarboxylative condensation of acetyl-coa with several malonyl-coa-derived extender units to a polyketide chain. many pharmacologically active compounds, such as antibiotics and anti-cancer drugs, belong to the polyketide class.
figure 2 | synthetic biology for understanding and preventing disease. a | a female-specific dominant-lethal gene network for mosquito control. mosquitoes were engineered to express an intron-containing variant of the tetracycline (tet) transactivator (tta) under the control of a flight-muscle-specific promoter (p fm ). in male mosquitoes, the intron is not spliced out, which prevents correct tta translation. in female progeny, however, functional tta translation is restored by sex-specific mrna splicing. this results in the activation of the tta-responsive promoter p tet and the expression of a toxic gene that triggers a flightless phenotype. if mosquitoes are raised in the presence of tetracycline (tet), tta is prevented from activating p tet , which results in a normal phenotype. however, following their release into the tet-free environment, engineered males mate with wild-type females. this transmits the female-specific dominant flightless phenotype and should eventually result in the reduction or extinction of the wild-type population. b | propagation of a selfish gene converting a heterozygous into a homozygous host. the homing endonuclease i-scei is expressed and cleaves its cognate restriction site (rs) on the homologous chromosome. following end resection and repair, the i-scei expression cassette is inserted into the second chromosome. pa, poly(a) tail.
similarly, a synthetic homing endonuclease-based gene drive system could be used to spread genetic modification, such as malaria resistance, from engineered mosquitoes to the field population. homing endonucleases typically produce a single sequence-specific double-strand break in the host genome that is repaired by homologous recombination using the homing endonuclease gene (heg) as a template. consequently, the selfish heg is copied to the broken chromosome in a gene conversion process referred to as 'homing'. expressing the heg i-scei under the control of a male germline promoter enabled efficient homing in transheterozygous males and rapid genetic drive, which led to heg invasion in caged mosquito populations 89 (fig. 2b) .
by engineering the sequence specificity of other hegs (for example, i-anii or i-crei), the gene drive concept could, in principle, be used to knock in or knock out gene functions that target the mosquito's ability to serve as a disease vector 89 . field tests of release of insects carrying dominant lethals (ridl) technology using first-generation ttatransgenic mosquitoes have already been conducted in grand cayman. first, a small-scale release confirmed that transgenic males could survive, mate with wild females and produce transgenic larvae, and then the full field trial showed an 80% reduction in the numbers of wild mosquitoes about 11 weeks after release. as the study site was not isolated and the surrounding areas contained high densities of wild-type mosquitoes, scoring the actual suppression efficiency remains challenging 90 . drug development drug discovery. synthetic mammalian transcription circuits consisting of a chimeric small-molecule-responsive transcription factor and a cognate synthetic promoter were originally designed for future gene-based therapies, and the aim was to adjust therapeutic transgene expression in mammalian cells in response to a pharmacologically active substance 34, 47, 49, 91 . as most chimeric transcription factors are derived from repressors that manage drug resistance in bacteria (for example, resistance to antibiotics 92 ) and are promiscuous for structurally related compounds, mammalian cells containing such circuitry could also be used in 'reverse mode' , as integrated screening devices for the class-specific discovery of new drug candidates 33,93 (for example, new antibiotics 92 ) (fig. 3a) . when mammalian cells that are transgenic for the screening circuit are exposed to a compound library, they detect and modulate reporter gene expression in the presence of a non-toxic, cellpermeable and bioavailable molecule that has a classspecific core structure and corresponding drug activity (for example, antibiotic activity) (fig. 
3b) . using the same screening setup, compounds have been detected that lock the transcription factor onto the dna, which may block induction of antibiotic resistance in pathogens and render them drug-sensitive 94 (for example, see bioversys). using such compounds alongside the specific antibiotic may offer novel anti-infective treatment opportunities and a new life cycle for established antibiotics (fig. 3c) . other trigger-inducible transcription control systems can be used in this manner as well, such as those that are responsive to streptogramin 47 , tetracycline or macrolide antibiotics 91 , anti-diabetes drugs 95 or immunosuppressive lactones 96, 97 . one example of the efficacy of a transcription circuit system involves the bacterial transcriptional repressor ethr. ethr represses transcription of etha and so prevents etha-mediated conversion of the last-line-defence antibiotic ethionamide into a pathogen-killing metabolite 94 . the chemical 2-phenylethylbutyrate, best known for its strawberry flavour, was the first compound found that specifically inactivated ethr and so triggered etha expression and re-established the sensitivity of mycobacterium tuberculosis to ethionamide (fig. 3d) . further work revealed other ethr-inactivating ethionamide booster compounds; these have also been successfully tested in a mouse model of human tuberculosis 98 . restoring drug sensitivity by pharmacological inhibition of master resistance regulators may be widely applicable 94 . a further example of the use of synthetic circuitry for drug discovery is provided by mammalian cells that are conditionally arrested in the g1 phase of the cell cycle by circuitry controlling the expression of the cyclin-dependent kinase inhibitor p27. these cells reproducibly formed a mixture of isogenic subpopulations of proliferation-inhibited cells and proliferating cells that had spontaneously escaped the synthetic cell cycle block 99 .
these cells could be used as a cell-based cancer model and could be used to screen for anticancer compounds that selectively eliminate proliferating cells while leaving arrested ones intact 100, 101 . drug production and drug delivery. the synthetic pathways that are created by assembling enzymatic cascades or networks in bacteria, yeast and plants have been instrumental for the large-scale economic production of high-value drug and drug precursor compounds, as well as for the biosynthesis of new secondary metabolites with novel therapeutic activities. examples include complex polyketides 102,103 , halogenated alkaloids [104] [105] [106] and the precursors of the anti-malaria drug artemisinin (which is produced by the company amyris, for example) 107 and of the anti-cancer compound taxol 108 . for production of these compounds, it was necessary to overcome several challenges, including the functional expression of complex biosynthetic enzymes (such as cytochrome p450 monooxygenases 109 ) and the overall orchestration of the multistep pathway to avoid accumulation of (toxic) intermediate products and to ensure metabolic channelling 60 . small-molecule-responsive protein-protein and protein-dna interactions that are used to pioneer gene switches in mammalian cells 36, 49, 110 have also been successfully re-engineered in the design of triggerinducible biohybrid materials for drug delivery [111] [112] [113] [114] [115] . using synthetic protein-polyacrylamide and dnapolyacrylamide monomers, hydrogels can be produced that dissolve when specific ligands are supplied (fig. 4) . biopharmaceuticals (for example, vascular endothelial growth factor (vegf)) supplied during gel formation are loaded into the hydrogel and can be released in a dose-dependent manner after subcutaneous implantation into mice and oral administration of the trigger compound 115 . 
it is thought that any trigger-inducible protein-protein and protein-dna interactions could be used to produce drug-sensing and drug-releasing hydrogels 114, 116, 117 . in chinese hamster ovary (cho-k1) cells, the streptogramin-responsive repressor (pip) was expressed by a constitutive promoter (p const ). pip binds to its multimeric operator (pir3) and represses expression of the reporter gene secreted alkaline phosphatase (seap). exposing this screening cell line to a small molecule library only resulted in seap production for compounds that were streptogramin-like, cell-permeable and non-toxic (indicated by the brown star) . b | discovery of small molecules that are able to overcome antibiotic resistance. the mycobacterium tuberculosis antibiotic resistance regulator (ethr) was fused to the herpes-simplex-derived transcriptional activator (vp16) and expressed in human embryonic kidney cells (hek293-t) under the control of a constitutive promoter (p const ). when ethr-vp16 binds to its cognate operator (o ethr ), the minimal promoter (p min ) is activated, which results in expression of the reporter gene seap. a screen is performed to identify a cell-permeable, non-toxic molecule (indicated by the yellow star) that prevents ethr binding to o ethr , stopping seap expression. c | overcoming resistance to ethionamide in m. tuberculosis. in m. tuberculosis, ethr represses transcription of both the baeyer-villiger monooxygenase (etha) and itself in a negative feedback loop. when 2-phenylethylbutyrate (indicated by the pink star) is added, it prevents ethr binding its target promoter (labelled 'p' in the figure). this derepresses etha production, thereby turning ethionamide into a cytotoxic compound that kills the mycobacterium. pa, poly(a) tail. interactive biohybrid material based on the interaction of a repressor protein with its cognate dna operator motif. 
homodimeric tetracycline repressor (tetr) is converted into a single-chain repressor (sctetr) by connecting two tetr subunits through a flexible peptide linker, and it is tagged with six histidines (sctetr-his 6 ). this molecule is coupled to a polymer and is mixed with a polyacrylamide that has copies of a tetracycline operator (teto) attached to it. sctetr binds to teto so crosslinks are formed, making a hydrogel. when tetracycline is added, sctetr releases teto, and the gel is dissolved. this can be used to release another molecule that was attached to the polymer -in this case, the cytokine interleukin 4 (il-4). genetically encoded repair program protecting against dna damage. in prokaryotes, the repair program is coordinated by lexa and reca. dormant individual cells within a bacterial population that show a high tolerance to antimicrobials. biofilms are surface-associated bacterial communities that are encased in a hydrated extracellular polymeric substance (eps) matrix that is composed of polysaccharides, proteins, nucleic acids and lipids. they are crucial to the pathogenesis of many clinically important bacteria and exhibit resistance both to the immune system and to antimicrobial treatments, making them difficult to eradicate 118, 119 . collins and colleagues 120 successfully engineered bacteriophage t7 to constitutively express dspb: an enzyme that hydrolyses β-1,6-n-acetyl-d-glucosamine, which is an adhesin that is required for biofilm formation and integrity in staphylococcus spp. and escherichia coli clinical isolates. the initial infection of a bacterial biofilm with this bacteriophage (known as t7 dspb ) results in rapid multiplication of the phage and expression of dspb. following lysis, t7 dspb and dspb are released into the biofilm, which leads to re-infection and degradation of β-1,6-n-acetyl-d-glucosamine. 
during the process of t7 dspb infection, bacterial biofilm cell counts are reduced by 99.997% -over two orders of magnitude greater than when a non-enzymatic phage is used 120 . in a follow-up study, bacteriophage m13 was engineered to express lexa3, which suppresses the sos dna repair system that bacteria require to counteract antibiotic-induced oxidative stress [121] [122] [123] . infection by this designer phage sensitizes e. coli to quinolone antibiotics. use of this phage increases the survival of mice that are infected with e. coli, decreases the survival of antibiotic-resistant bacteria, persister cells and biofilm cells and reduces the number of antibiotic-resistant bacteria that arise from an antibiotic-treated population. it also acts as a strong adjuvant for other bactericidal antibiotics 124 . the designer phage platform can be used to produce other antibiotic adjuvants 124 . although it was once abandoned after the introduction of antibiotics, phage therapy is currently being revisited in several clinical trials around the world as the prevalence of multidrug-resistant pathogens is dramatically increasing. although phage therapy may face clinical challenges associated with development of bacterial phage resistance, phage neutralization by the immune system and pharmacokinetics, the field will certainly receive an impetus from designer phages 125 . bacteria can communicate with each other using a chemical language known as quorum sensing. individual bacteria produce and secrete signalling molecules (called autoinducers) that are common to multiple species or are species-specific. these molecules accumulate as the population grows and can bind to receptors that coordinate colony-wide gene expression or manipulate the behaviour of other bacterial populations. for example, vibrio cholerae produces cholera autoinducer 1 (cai-1) and autoinducer 2 (ai-2), which trigger repression of key virulence factors. feeding infant mice with a probiotic e. 
coli that naturally produces ai-2 and has been engineered to constitutively synthesize cai-1 significantly increased the animals' survival rate after ingesting v. cholerae 126 . this suggests that such an approach could be an economic strategy to prevent infectious diseases. unlike antibiotics, quorumsensing-based interventions do not kill pathogens but reprogram their behaviour; this strategy may be free of selection pressure and therefore may be less prone to develop resistance. in another study, commensal bacteria were equipped with synthetic circuitry to stimulate glucose-dependent insulin production in intestinal epithelial cells 127 . after intravenous injection, escherichia coli accumulates in cancer tissue, where it reaches high population densities. e. coli is engineered to link the quorum-sensing receptor luxr to an autoinducer 1 (ai-1)-inducible promoter (p lux ). p lux is also used to drive luxi and the invasin gene inv. luxi produces ai-1, generating a positive feedback loop that coordinates invasion throughout the population. b | acetylsalicylic acid (aspirin)-triggered killing of cancer cells after invasion of salmonella spp. salmonella spp. naturally invade cancer cells after intravenous injection. salmonella spp. were engineered with a pseudomonas putida-derived signal-amplifying two-level cascade in which nahr controls salicylate promoter (p sal )-driven xyls2 expression and xyls2 then triggers a xyls2-dependent promoter (p m )-driven expression of the cytosine deaminase (labelled cd in the figure) . salicylate induces both nahr-based p sal and xyls2-mediated p m activation. mammalian cells are resistant to 5-fluorocytosine because they lack cytosine deaminase, which converts 5-fluorocytosine into the toxic cancer therapeutic 5-fluorouracil. c | invasive bacteria suppress oncogene expression. e. 
coli is engineered to constitutively co-express a catenin β-1-specific short hairpin rna (shrna), listeria monocytogenes listeriolysin (llo*) and inv under control of the bacteriophage t7 promoter (p t7 ). they invade cancer cells (using the inv protein), escape from the phagosome (using llo*) and knock down the catenin β-1 oncogene (using shrna). d | therapeutic protein transduction. lentiviral particles are produced using an integrase-negative helper vector (designated 'helper' in the figure) and a constitutive expression vector encoding the protein of interest (designated 'protein' in the figure) fused to viral protein r (vpr) and a protease cleavage site (pc). this can be delivered to any target cell in the absence of viral nucleic acids and proteins. an example application is described in the main text. pa, poly(a) tail. p ef1a , elongation factor 1 alpha (ef1a) promoter. the action of drugs in the body over a period of time. it covers absorption of the drug as well as its distribution, tissue localization, biotransformation and excretion. commensal bacteria live in close contact with the host. in this special type of symbiosis, one partner is benefited, whereas the other is neither benefited nor harmed. despite decades of progress in cancer therapy, a major challenge remains: how to specifically target and selectively kill neoplastic cells that develop within native and implanted tissue and relocate within the organism to form metastasis. therefore, therapeutic strategies that are designed to eliminate cancer cells must be extremely precise to exclusively target diseased tissue while leaving normal tissue intact. although native cytotoxicity or the constitutive expression of anticancer compounds have demonstrated some potential in animal studies and human clinical trials 128 , trigger-inducible drug expression circuits delivered by tumour-invasive bacteria or tumour-transducing viral particles may improve cancer therapy. 
synthetic biologists have recently designed a few anti-cancer devices that provide precise timing, location and dosing of drug production by external cues and could provide greater intra-tumoural effects while minimizing systemic toxicity. after intravenous injection or oral administration, many bacterial species (for example, e. coli and salmonella spp.) naturally sense and selfpropel towards tumours. these bacteria have also been engineered to selectively invade and proliferate in tumour tissues and to produce cytotoxic compounds as well as reporter proteins for non-invasive follow-up monitoring of tumour regression 128 . these bacteria express flagella to penetrate tissue and chemotactic receptors to promote migration towards aspartate produced by viable cancer cells, ribose released by necrotic tissue or hypoxic regions generated by the hyper-metabolic activities of neoplastic cells. after they have reached the tumour site, the bacteria then either proliferate in the extracellular space or invade the tumour cells. in either situation, selective cytotoxicity was engineered by expressing toxins, cytokines, tumour antigens, pro-apoptotic factors or prodrugconverting enzymes 128 . non-invasive e. coli has successfully been programmed to invade cultured tumour cells in a hypoxia-responsive or population-density-dependent manner. the corresponding circuitries consist of the anaerobically induced formate dehydrogenase promoter driving the yersinia pseudotuberculosis invasin gene (inv), which mediates invasion using specific integrin receptors that are typically expressed on tumour cells. populationdensity-dependent invasion requires an engineered quorum-sensing circuit that triggers inv expression after the bacterial population has reached a threshold size at the tumour site. this circuitry consists of quorum-sensing receptor luxr that co-induces luxi (which encodes the enzyme producing the quorum-sensing messenger autoinducer 1 (ai-1)), and inv. 
ai-1-triggered, luxr-mediated expression of luxi represents a positive feedback loop that amplifies inv expression and ai-1 production; this coordinates and broadcasts the invasion order across the entire population 129 (fig. 5a) . tumour-invading bacteria have also been engineered for trigger-inducible drug expression after entering tumour cells. in addition to l-arabinose- 130 and γ-irradiation-induced 131 drug expression, a synthetic salicylate-triggered expression device has been used in mice to control expression of drug components in tumour cells that have been invaded by salmonella spp., following systemic administration of the trigger molecule 132 . the device is based on a circuit that is derived from pseudomonas putida, which controls expression of cytosine deaminase in a salicylate-inducible manner 132 . mammalian cells normally lack cytosine deaminase, which means that they are resistant to 5-fluorocytosine because this enzyme is needed to convert 5-fluorocytosine into the cytotoxic molecule 5-fluorouracil. tumour-bearing mice were injected with attenuated salmonella enterica engineered with the p. putida-derived circuit and then treated with 5-fluorocytosine. the mice showed significant tumour regression when fed with acetylsalicylic acid (aspirin) 132 , which is rapidly converted to salicylate after intake by the animal (fig. 5b) . rnai is a potent and highly conserved mechanism for the targeted knockdown of mrna translation by small rnas. non-pathogenic e. coli was engineered to express a short rna hairpin that triggers rnai against catenin β-1, which is a colon-cancer-specific oncogene 133 . these bacteria, which were also engineered to express proteins to mediate cellular invasion and escape from the phagosome, were administered orally or intravenously and significantly reduced catenin β-1 levels in the intestinal epithelium and in human colon cancer xenografts in mice 133 (fig. 5c) .
combining various bacterial anti-cancer treatment strategies may increase safety, specificity and efficiency in future clinical trials. viral synthetic devices. viruses have also been successfully engineered to transduce specific cells by expressing epitopes that are recognized by particular cell-surface receptors and to express prodrug convertases and cytokines for use in cancer therapy 134 . most of these oncolytic viruses carry coding viral nucleic acids, which may cause side effects owing to recombination with the host chromosome or proviral elements that are already in the host cell. recently, synthetic viral particles have been designed that lack coding nucleic acids and that exclusively package therapeutic proteins, which can be released in a dose-dependent manner 135 . for example, viral particles carrying linamarase from manihot esculenta were injected into human breast cancer xenografts in mice that had been treated with the non-toxic natural product linamarin; these viruses triggered efficient tumour regression owing to the cyanide produced by linamarase-mediated conversion of linamarin 135 (fig. 5d) . similarly, protein-carrying viral nanoparticles have been used to deliver site-specific dna recombinases, such as flp, to precisely integrate or excise genetic components on the host chromosome 136 . they might also be used to deliver native or chimeric transcription factors that could transiently control the expression of target genes that are involved in therapeutic interventions, lineage control or induction of pluripotency 137 . a transformation sensor for cancer therapy. gene therapy advances for cancer include virus-mediated delivery of cytotoxic effector genes controlled by cancer-specific promoters 138, 139 or delivery of chimeric adaptor proteins to link tyrosine kinase signalling to the apoptosis-inducing caspase machinery 140 . 
most promoters and control circuits that coordinate simple reactions such as these are inherently noisy and only allow linear responses, which means limited control of specificity and efficacy. however, using two internal input signals can improve fidelity, mediate sharp response profiles and ensure robust biochemical processes 141 . using decision-making circuits as blueprints, nissim and bar-ziv 142 designed a tunable dual promoter integrator (dpi) to target cancer cells precisely. the dpi consists of two native promoters that are concurrently activated by two independent transcription factors. each cancer-sensing promoter produces a different fusion protein in proportion to its activity, and these two proteins assemble together as a chimeric transcription factor. this transcription factor then activates a synthetic promoter that controls expression of the herpes simplex virus type 1 thymidine kinase (tk1), which is cytotoxic in the presence of nucleotide analogues, such as ganciclovir (fig. 6a) .
figure 6 | synthetic genetic cancer classifiers. a | a transformation-sensing cancer kill switch can consist of a two-input, transformation-sensing device with 'and' logic. the device constantly monitors the transformation state of a cell and produces a kill signal when two malignancy markers occur. two independent malignancy-sensitive promoters drive expression of two chimeric proteins (docs-vp16 and gal4 bd -coh2). when they are simultaneously expressed, both proteins dimerize to form a synthetic transcription factor that binds gal4 operator sites (o gal4 ), induces downstream minimal promoters (p min ) and triggers expression of the herpes simplex virus type 1 thymidine kinase (tk1). in the presence of ganciclovir, the system is cytotoxic. b | a microrna (mirna)-based cancer classifier that discriminates cancer cells from non-transformed cells by scoring high and low expression profiles of a set of cancer-specific mirnas. the classifier consists of high and low mirna sensors that exclusively promote output gene expression if the specific input mirnas are expressed at high or low levels, respectively. in the high mirna sensor, high-target mirna concentrations prevent translation of mrnas encoding the reverse tetracycline-dependent transactivator (rtta) and the repressor of the lactose operon (laci). this results in derepression of transcription of the output gene (labelled 'output' in the figure). in the low mirna sensor, the output-gene-encoding mrna is only translated when low-target mirna concentrations are present. c | by combining different high and low mirna sensors, the classifier can be customized to sense predetermined profiles of high and low mirna levels, such as the ones that are typically produced by cancer cells, and respond with expression of the apoptosis-inducing human bcl2-associated x protein (bax). pa, poly(a) tail.
melanopsin: a vitamin-a-dependent, g-protein-coupled receptor that is expressed in intrinsically photosensitive retinal ganglion cells.
prosthetic networks: networks that replace existing cellular functionality that is ill-driven or out of order. they represent molecular prostheses for non-functional cellular activity; they differ from other synthetic networks that add useful functionality but do not replace non-functional cellular networks.
the dpi could be optimized for a specific cancer cell type by using different combinations of input promoters and effector genes, as well as by modulating the assembly efficiency and half-life of the chimeric transactivator components. so far, a set of three promoters has been characterized in detail, but the dpi design may accommodate other suitable promoters. the recently developed 'cell-type classifier' is conceptually similar to the dpi, as it can also be programmed to destroy cells that express a specific set of neoplastic markers 31 .
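the two-input 'and' behaviour of the dpi can be sketched numerically. the following is a toy model under stated assumptions, not the published design: promoter activities are normalised to the range 0 to 1, the amount of assembled chimeric transcription factor is taken to scale with the product of the two activities, and the synthetic promoter is given a hill-type response. the threshold `k`, the steepness `n` and all function names are hypothetical.

```python
def hill(x: float, k: float, n: float) -> float:
    """Sigmoidal activation: 0 at x = 0, 0.5 at x = k, approaching 1 for x >> k."""
    return x**n / (k**n + x**n)


def dpi_output(promoter_a: float, promoter_b: float,
               k: float = 0.25, n: float = 4.0) -> float:
    """Relative effector expression for two normalised promoter activities (0..1).

    Assumption: each promoter makes one half of the split transcription factor
    in proportion to its activity, so the assembled dimer scales with the product.
    """
    dimer = promoter_a * promoter_b
    return hill(dimer, k, n)


if __name__ == "__main__":
    # Hypothetical cell states: (activity of promoter A, activity of promoter B).
    cells = {
        "both markers low": (0.1, 0.1),
        "only marker A high": (1.0, 0.1),
        "only marker B high": (0.1, 1.0),
        "both markers high": (1.0, 1.0),
    }
    for label, (a, b) in cells.items():
        print(f"{label:20s} effector output = {dpi_output(a, b):.3f}")
```

because the output depends on the product of the two inputs, a single highly active promoter is not enough to pass the threshold, which is the property that lets a dual-input device discriminate more sharply than a single cancer-specific promoter.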
the cell-type classifier combines transcription and translation control components in a single scalable synthetic circuit that senses expression levels of a set of (currently up to six) endogenous micrornas (mirnas); it triggers an apoptosis-inducing response only if those levels match a preset profile. the cell-type classifier combines sensor modules for the detection of highly and lowly expressed mirnas (fig. 6b) . for clinical implementation, both the dpi and the cell-type classifier must either be delivered to the cancer tissue, or they must provide a fail-safe mechanism that constantly eliminates transforming cells from engineered tissue implants. other emerging tools for biomedicine novel treatment strategies will require new technologies to sense and control disease. synthetic biologists have designed new devices that could sense key physiological activities and have found new ways to dose therapeutic interventions precisely in response to external physical cues. such synthetic devices could have wide-ranging biomedical applications. thus far, synthetic control devices that are designed to interface with host metabolism and to reprogram cellular behaviour have largely been limited to heterologous transcription factors. rna controllers may represent an alternative. they are straightforward to design and can be integrated into a single expression unit containing sensors (aptamers), gene-regulatory components (ribozymes) and effector transgenes 39, 143, 144 . the inherent modularity and compatibility of rna-based control components enables them to be independently optimized or exchanged. for example, an rna control device consisting of a drug-responsive aptamer linked to a ribozyme in the 3′ untranslated region (utr) of a cytokine expression unit enabled trigger-inducible inactivation of ribozymemediated transcript cleavage and full transgene expression in the presence of the input signal 145 . 
this synthetic rna control device was applied to control proliferation of engineered primary human t cells and enabled external control of the expansion of transgenic t cells that are implanted into mice. synthetic rna control devices could provide the advance that is necessary to enable t cell therapy 145 ; by contrast, state-of-the-art, trigger-inducible expansion of engineered t cells using chimeric antigen receptors has only led to moderate proliferation and poor survival of t cells in clinical trials 146, 147 . another use for synthetic rna is the design of programmable sensor-actuator devices that convert levels of an intracellular protein into a discrete high or low transgene-expression state 148 . the rna devices consist of a three-exon, two-intron minigene followed by the transgene. the introns contain protein-sensing aptamers, and the central exon includes a stop codon. binding of the protein to the aptamers controls splicing of the minigene; when the central exon is spliced out, the transgene is expressed at high levels, and when it remains unspliced, the transgene is expressed at low levels. such a device was configured to sense subunits of nuclear factor kappa b (nfκb) or β-catenin (which are neoplastic markers) and to express the herpes simplex virus thymidine kinase. thymidine kinase renders cells susceptible to ganciclovir, so this device only operated as a cancer kill switch in the presence of both the cancer markers and ganciclovir 148 . the modular configuration of the rna sensor-actuator device allows it to be tailored to different intracellular proteins and even to multi-protein input using specific intronic aptamers. also, responsiveness and performance can be tuned by placing the aptamers at different locations within the introns.
the availability of compact rna sensor-actuators that are easy to design and to alter and that control transgene expression in response to intracellular levels of key proteins may also improve the ability to link metabolic disease states with gene-based therapeutic interventions. light is becoming increasingly popular as a traceless, moleculefree input signal for triggering transgene expression in living systems. bacteria have been engineered to record projected images with gigapixel resolution [149] [150] [151] and to adjust transgene expression in response to multichromatic input 150 , and now genetic light switches have also been designed to control gene expression 152 and shape of mammalian cells 153 . devices that convert light pulses into transcription may foster novel therapeutic opportunities in future gene-and cell-based therapies and may improve the manufacturing of difficult-to-produce protein pharmaceuticals, such as cancer therapeutics. an illustrative example is light-controlled expression of the glucagon-like peptide 1 (glp1), which is a promising drug candidate for the treatment of type 2 diabetes 65 (fig. 7a) . an optogenetic device that enables light-triggered gene expression in human cells was designed. this involves ectopic expression of melanopsin in human embryonic kidney cells and functional rewiring of signalling downstream of melanopsin; the cascade integrates blue-light-pulsetriggered photoreception and produces a reversible and sustained intensity-dependent transcription response. when placed in hollow fibre containers and implanted into mice, transgene expression in the engineered lightsensitive cells could be controlled remotely by an optical fibre 65 . illuminating mice that carried subcutaneous implants of microencapsulated photo-responsive cells also enabled transdermal control of transgene expression and of corresponding protein levels in the blood of treated animals. 
this system was able to attenuate glycaemic excursions and to control glucose homeostasis in a mouse model of human type 2 diabetes 65 .

prosthetic networks. prosthetic networks are synthetic sensor-effector devices that act as molecular prostheses. when engineered into cells and functionally connected to host metabolism, they sense, monitor and score disease-relevant metabolites, process off-level concentrations and coordinate adjusted diagnostic, preventive or therapeutic responses in a seamless, automatic and self-sufficient manner. an example of the use of a prosthetic network is the sensing of metabolites to improve control of urate homeostasis (fig. 7b). moderate levels of uric acid, which scavenges radicals, are deemed to be beneficial. however, a transient surge in uric acid that is released by dying cells during cancer therapy leads to tumour lysis syndrome, and chronic hyperuricaemia can result in gout. humans are particularly sensitive to imbalances of urate homeostasis because they lack uricolytic activity. a prosthetic network that constantly monitors blood urate concentrations and restores urate homeostasis by controlled expression of a urate oxidase, which reduces excessive urate concentration while preserving levels that are suitable for radical scavenging, could represent a treatment strategy for hyperuricaemic disorders 64 . in brief, human cells that contain such a prosthetic network have recently been designed by combining: the uric acid sensor hucr 154 , which manages oxidative stress protection in deinococcus radiodurans; the human uric acid transporter urat1 (also known as slc22a12), which increases the intracellular uric acid levels and thus the sensitivity of the prosthetic circuit; and a secretion-engineered urate oxidase (smuox) that is clinically licensed for the treatment of the tumour lysis syndrome 155 .

figure 7 | advanced therapeutic and prosthetic networks. a | light-triggered transcription control of blood glucose homeostasis.
the synthetic phototransduction cascade consists of rewired melanopsin and nuclear factor of activated t cells (nfat) control circuits. photo-isomerization of the 11-cis-retinal chromophore (r) by blue light (~480 nm) activates melanopsin. this sequentially turns on gaq-type g protein (gaq), phospholipase c (plc) and phosphokinase c (pkc) and triggers ca2+ ion influx via transient receptor potential channels (trpcs) and possibly also from the endoplasmic reticulum. this ca2+ ion surge activates calmodulin (cam) and calcineurin (can), which dephosphorylates nfat. nfat then translocates into the nucleus, where it binds to specific promoters (p nfat) and coordinates transgene transcription. when linked to the glucagon-like peptide (glp1), this mechanism allowed light-controlled blood glucose homeostasis to be achieved in a mouse model of type 2 diabetes. b | prosthetic network for the treatment of tumour lysis syndrome and gout. implanted sensor-effector cells are used to monitor serum urate levels constantly: they import urate via a transgenic human uric acid transporter (urat1). urate prevents binding of the uric acid-sensitive transsilencer (krab-hucr, which is the uricase regulator linked to a krab domain) to its operator (huco8). this operator controls expression of secretion-engineered urate oxidase (smuox), so smuox is expressed when urate concentration reaches pathological levels. smuox mediates conversion of urate into allantoin. expression of smuox stops when urate concentration reaches oxidative-stress-protective urate levels. pa, poly(a) tail. part a is modified, with permission, from ref. 65 © (2011) american association for the advancement of science.

the prosthetic uric-acid-responsive expression network (urex) was able to sense uric acid concentrations precisely and to activate secretion of smuox when the uric acid concentration was at pathologic levels.
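the closed-loop behaviour of the urate-responsive network described above can be illustrated with a minimal discrete-time sketch. everything numerical here is an assumption made for illustration: the function name `simulate_urex`, the threshold, the first-order degradation rate `uox_rate`, the arbitrary units and the on/off switching rule are hypothetical and are not parameters reported for urex.

```python
def simulate_urex(urate0, threshold, production=0.0, uox_rate=0.2, steps=50, dt=1.0):
    """Euler-style sketch: urate is degraded only while it exceeds the sensor threshold.

    All quantities are in arbitrary units (hypothetical values, for illustration).
    """
    urate = urate0
    trace = [urate]
    for _ in range(steps):
        # Assumption: above the threshold, urate releases KRAB-HucR from its
        # operator and smUOX is secreted; below it, expression is silenced.
        smuox_on = urate > threshold
        degradation = uox_rate * urate if smuox_on else 0.0
        urate = max(urate + dt * (production - degradation), 0.0)
        trace.append(urate)
    return trace


trace = simulate_urex(urate0=10.0, threshold=5.0)
print([round(u, 3) for u in trace[:6]])
```

in this sketch the urate level falls while it is above the threshold and then stops changing once it drops below it, so a residual, non-zero urate level is preserved rather than being driven to zero, which is the qualitative property the text attributes to the network.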
secretion of smuox is stopped as soon as the uric acid concentration has returned to the homeostasis level. this was impressively demonstrated when urex-transgenic cells were implanted into urate-oxidase-deficient mice, which develop acute hyperuricaemia with symptoms that are similar to human gout. urex was able to degrade urate and restore urate homeostasis in the blood, resulting in the dissolution of uric acid crystal deposits in the kidney of treated animals 64 . its straightforward design may allow urex to serve as a blueprint for the assembly of other prosthetic networks that sense metabolic disturbances and circulating pathologic metabolites. an artificial insemination device. artificial insemination is standard practice to facilitate both human reproduction and livestock breeding. because of broad variations in oestrus expression and ovulation timing, coordinating sperm delivery with female oestrus is still a major challenge. ovulation is triggered at a specific time when the pituitary gland releases luteinizing hormone, which binds to the luteinizing hormone receptor (lhr) and coordinates the release of the oocyte. by integrating synthetic signalling cascades with advanced biomaterials, kemmer and colleagues 63 designed an artificial insemination device that coordinates sperm delivery with oestrus control. the artificial insemination device consists of cellulose sulphate capsules containing bull sperm and sensor cells 63, 156, 157 . the sensor cells are engineered to express lhr constitutively so that when it is activated, it triggers expression of cellulase that can be secreted. after implantation into the cow's uterus, the sensor cell line constantly monitors the animal's luteinizing hormone levels, and the oestrus-triggered surge in luteinizing hormone levels leads to the production of secreted cellulase, which degrades the implanted capsule and results in the timely delivery of the sperm and successful conception. 
fine tuning of the designer cascade could enable its use in other species, including in humans.

in recent years, synthetic biology has substantially advanced strategies for classical biomedical applications, such as pathogen characterization 68, 70, 74 , disease analysis 78-80 , diagnostics 75-77 , screening assays 33, 92, 94, 100, 101 , drug production 105-108, 158 and vaccination 81, 82, 85 . this progress may imminently develop into shorter drug discovery 94, 98 and drug development timelines, increased precision of drug delivery 112, 114 and production of new and more affordable medicines 104-109 . ultimately, sophisticated therapeutic sensor-effector devices that can sense disturbances, seek out pathological conditions and restore function are on the roadmap. such therapeutic networks that connect diagnostic input with therapeutic output may provide all-in-one diagnostic, preventive and therapeutic solutions in future gene- and cell-based therapies. matching diagnostic outcome with high-end therapies has recently become a focus of the pharmaceutical industry, which has declared personalized medicine as the treatment strategy of the future. tools that will have a tremendous impact in future biomedical applications include: using light-activated triggers to bring about a precise therapeutic response in cells 65 , programming bacteria to seek and destroy cancer cells 128, 132, 133 and using synthetic circuitry to keep crucial metabolites at homeostatic levels 64, 65 , to manage disease-controlled expansion 145 or to eliminate specific cell populations 31, 148, 159 . recent work has shown that this is, in principle, possible and that some devices are working as expected and are producing a therapeutic impact in animal models of human diseases 64, 65 .
implants consisting of engineered microencapsulated cells represent a way of introducing prosthetic networks with a predefined function instead of directly targeting the host cells with the genetic material. although implants containing cells with engineered prosthetic networks are certainly the most promising way forward, they will limit biomedical applications to extracellular disease metabolites that can be therapeutically addressed through the vascular system. however, there is still a long way to go until synthetic-biology-based biomedical devices will be a clinical reality. placing therapeutic circuits in specific cells of a patient and making sure that there will be no interference with human metabolism are the most important challenges. therefore, clinical use of synthetic-biology-based devices and therapeutic scenarios will face the same scientific, ethical and legal issues as any gene- and cell-based therapy, but they may offer more complex control dynamics and are therefore expected to have a higher therapeutic impact. although none of the synthetic devices, prosthetic networks and related products that are pioneered by synthetic biologists have yet been used in the clinic or in clinical trials, the stage is set, with novel treatment strategies available and the commitment of the pharmaceutical industry in place, for synthetic biology to deliver the biomedicines of the twenty-first century.

network news: innovations in 21st century systems biology evolvability and hierarchy in rewired bacterial gene networks identification of functional elements and regulatory circuits by drosophila modencode informing biological design by integration of systems and synthetic biology genome-sequencing anniversary.
a celebration of the genome, part i genotype and snp calling from next-generation sequencing data scalable gene synthesis by selective amplification of dna pools from high-fidelity microchips high-fidelity gene synthesis by retrieval of sequence-verified dna identified using high-throughput pyrosequencing parallel on-chip gene synthesis and application to optimization of protein expression creating bacterial strains from genomes that have been cloned and engineered in yeast evidence for large diversity in the human transcriptome created by alu rna editing hematopoietic stem cell gene therapy with a lentiviral vector in x-linked adrenoleukodystrophy creation of a bacterial cell controlled by a chemically synthesized genome life after the synthetic cell making cellular memories a synchronized quorum of genetic clocks diversity-based, model-guided construction of synthetic gene networks with predicted functions a synthetic oscillatory network of transcriptional regulators a synthetic gene-metabolic oscillator construction of a genetic toggle switch in escherichia coli an engineered mammalian band-pass network hysteresis in a synthetic mammalian gene network an engineered epigenetic transgene switch in mammalian cells rationally designed logic integration of regulatory signals in mammalian cells a universal rnai-based logic evaluator that operates in mammalian cells a fast, robust and tunable synthetic gene oscillator intron length increases oscillatory periods of gene expression in animal cells a tunable synthetic mammalian oscillator a synthetic-natural hybrid oscillator in human cells a synthetic time-delay circuit in mammalian cells and mice multi-input rnai-based logic circuit for identification of specific cancer cells this paper describes a synthetic network that classifies cells according to their specific microrna expression profiles. 
this strategy might be generally applicable for identification andengineering of specific cell types synthetic biology: integrated gene circuits mammalian synthetic biology-from tools to therapies recent advances in mammalian synthetic biology -design of synthetic transgene control networks synthetic gene networks in mammalian cells molecular diversity-the toolbox for synthetic gene switches and networks rna-based computation in live cells ligand-dependent regulatory rna parts for synthetic biology in eukaryotes frameworks for programming biological function through rna parts and devices refinement and standardization of synthetic biological parts and devices mammalian synthetic biology: engineering of sophisticated gene networks ultrasensitivity and noise propagation in a synthetic transcriptional cascade semi-synthetic mammalian gene regulatory networks intronically encoded sirnas improve dynamic range of mammalian gene regulation systems and toggle switch origin of bistability underlying mammalian cell cycle entry synthetic ecosystems based on airborne inter-and intrakingdom communication streptogramin-based gene regulation systems for mammalian cells controlling transgene expression in subcutaneous implants using a skin lotion containing the apple metabolite phloretin de novo design and construction of an inducible gene expression system in mammalian cells development of genetic circuitry exhibiting toggle switch or oscillatory behavior in escherichia coli multistability in the lactose utilization network of escherichia coli a genetic time-delay circuitry in mammalian cells engineered bidirectional communication mediates a consensus in a microbial biofilm consortium programmed population control by cell-cell communication and regulated killing a synthetic multicellular system for programmed pattern formation watch the clockengineering biological systems to be on time a synthetic low-frequency mammalian oscillator synthetic biology: exploring and exploiting 
genetic modularity through the design of novel biological networks eukaryotic systems broaden the scope of synthetic biology synthetic biology: applications come of age next-generation synthetic gene networks the second wave of synthetic biology: from modules to systems a designer network coordinating bovine artificial insemination by ovulation-triggered release of implanted sperms this describes the first synthetic closed-loop control gene network that manages homeostasis of a crucial disease metabolite in an animal model the first optogenetic device that controls the production of a therapeutic protein in an animal disease model is the potential of synthetic biology: a view from the european academies science advisory council synthetic biology moving into the clinic characterization of the reconstructed 1918 spanish influenza pandemic virus complete chemical synthesis, assembly, and cloning of a mycoplasma genitalium genome a two-amino acid change in the hemagglutinin of the 1918 influenza virus abolishes transmission infectious diseases. 
as swine flu circles globe, scientists grapple with basic questions origins and evolutionary genomics of the 2009 swine-origin h1n1 influenza a epidemic synthetic viruses: a new opportunity to understand and prevent viral disease synthetic recombinant bat sars-like coronavirus is infectious in cultured cells and in mice antibody-profiling technologies for studying humoral responses to infectious agents anti-borrelia burgdorferi antibody profile in post-lyme disease syndrome lips arrays for simultaneous detection of antibodies against partial and whole proteomes of hcv, hiv and ebv mutations of the igbeta gene cause agammaglobulinemia in man oligomeric organization of the b-cell antigen receptor on resting cells autoantigen discovery with a synthetic human peptidome this study is a demonstration of how the synthesis of the human peptidome enables the direct identification of new autoantigens a virus-like particle vaccine for epidemic chikungunya virus protects nonhuman primates against infection this provides a description of a standard strategy for the production of codon-pair deoptimized viral genomes for use as attenuated life vaccines optimization and quantification of protein synthesis inside liposomes from pond scum to pharmacy shelf heat-stable oral alga-based vaccine protects mice from staphylococcus aureus infection female-specific insect lethality engineered using alternative splicing female-specific flightless phenotype for mosquito control this paper presents a strategy in which transgenic insects containing a synthetic gene network could provide pest control by disseminating a conditional flightless female phenotype among natural insect populations genetic elimination of dengue vector mosquitoes a synthetic homing endonuclease-based gene drive system in the human malaria mosquito science snipes at oxitec transgenicmosquito trial macrolide-based transgene control in mammalian cells and mice design of a novel mammalian screening system for the detection 
of bioavailable, non-cytotoxic streptogramin antibiotics the impact of synthetic biology on drug discovery in this study, the reconstruction of a synthetic antibiotic resistance network in mammalian cells enabled the discovery of novel anti stringent rosiglitazone-dependent gene switch in muscle cells without effect on myogenic differentiation a system for small-molecule control of conditionally replication-competent adenoviral vectors streptomyces-derived quorumsensing systems engineered for adjustable transgene expression in mammalian cells and mice synthetic ethr inhibitors boost antituberculous activity of ethionamide controlled proliferation by multigene metabolic engineering enhances the productivity of chinese hamster ovary cells in vitro assays for anticancer drug discovery-a novel approach based on engineered mammalian cell lines a novel mammalian cell-based approach for the discovery of anticancer drugs with reduced cytotoxicity on non-dividing cells drug discovery and natural products: end of an era or an endless frontier? 
exploiting plug-and-play synthetic biology for drug discovery and production in microorganisms chemical biology: synthetic metabolism goes green silencing of tryptamine biosynthesis for production of nonnatural alkaloids in plant culture this work described the biosynthesis of halogenated molecules in plants using synthetic pathways production of the antimalarial drug precursor artemisinic acid in engineered yeast isoprenoid pathway optimization for taxol precursor overproduction in escherichia coli engineering escherichia coli for production of functionalized terpenoids using plant p450s the impact of mammalian gene regulation concepts on functional genomic research, metabolic engineering, and advanced gene therapies conditional dna-protein interactions confer stimulus-sensing properties to biohybrid materials drug-sensing hydrogels for the inducible release of biopharmaceuticals genetically engineered protein in hydrogels tailors stimuli-responsive characteristics synthetic mammalian gene networks as a blueprint for the design of interactive biohybrid materials a gene therapy technology-based hydrogel for the trigger inducible release of bipharmaceuticals in mice a coumermycin/novobiocin-regulated gene expression system regulation of protein secretion through controlled aggregation in the endoplasmic reticulum this paper describes a novel metabolite-based strategy for the eradication of antibiotic-tolerant bacterial 'persister cells bacterial charity work leads to population-wide resistance dispersing biofilms with engineered enzymatic bacteriophage sublethal antibiotic treatment leads to multidrug resistance via radical-induced mutagenesis how antibiotics kill bacteria: from targets to networks a common mechanism of cellular death induced by bactericidal antibiotics engineered bacteriophage targeting gene networks as adjuvants for antibiotic therapy fighting bacterial infections-future treatment options engineered bacterial communication prevents vibrio cholerae 
virulence in an infant mouse model secretion of insulinotropic proteins by commensal bacteria: rewiring the gut to treat diabetes engineering the perfect (bacterial) cancer therapy environmentally controlled invasion of cancer cells by engineered bacteria genetically engineered salmonella typhimurium as an imageable therapeutic probe for cancer tumour-targeted delivery of trail using salmonella typhimurium enhances breast cancer survival in mice in vivo gene regulation in salmonella spp. by a salicylate-dependent control circuit short hairpin rna-expressing bacteria elicit rna interference in mammals reprogrammed viruses as cancer therapeutics: targeted, armed and shielded therapeutic protein transduction of mammalian cells and mice by nucleic acid-free lentiviral nanoparticles protein transduction from retroviral gag precursors efficient construction of sequencespecific tal effectors for modulating mammalian transcription purification and characterization of 1-aminocyclopropane-1-carboxylate oxidase from apple fruit targeting cancer by transcriptional control in cancer gene therapy and viral oncolysis redirecting tyrosine kinase signaling to an apoptotic caspase pathway through chimeric adaptor proteins how erk1/2 activation controls cell proliferation and cell death: is subcellular localization the answer? a tunable dual-promoter integrator for targeting of cancer cells engineering ligandresponsive gene-control elements: lessons learned from natural riboswitches artificial ribozyme switches containing natural riboswitch aptamer domains genetic control of mammalian t-cell proliferation with synthetic rna regulatory systems control of large, established tumor xenografts with genetically retargeted human t cells containing cd28 and cd137 domains the promise and potential pitfalls of chimeric antigen receptors in this paper, the abundance of specific cellular proteins was linked to transcription of target genes via splicing-modulating rna controllers. 
this enables proteome-based classification and engineering of individual cells synthetic biology: engineering escherichia coli to see light multichromatic control of gene expression in escherichia coli a synthetic genetic edge detection program induction of protein-protein interactions in live cells using light spatiotemporal control of cell signalling using a light-switchable protein interaction hucr, a novel uric acid-responsive member of the marr family of transcriptional regulators from deinococcus radiodurans cloning and expression in escherichia coli of the gene encoding aspergillus flavus urate oxidase a novel system for trigger-controlled drug release from polymer capsules design of high-throughput-compatible protocols for microencapsulation, cryopreservation and release of bovine spermatozoa short-term bmp-2 expression is sufficient for in vivo osteochondral differentiation of mesenchymal stem cells an autonomous system for identifying and governing a cell's state in yeast

the authors declare no competing financial interests.

key: cord-314915-b6aqwubh authors: futas, jan; oppelt, jan; jelinek, april; elbers, jean p.; wijacki, jan; knoll, ales; burger, pamela a.; horin, petr title: natural killer cell receptor genes in camels: another mammalian model date: 2019-07-02 journal: front genet doi: 10.3389/fgene.2019.00620 sha: doc_id: 314915 cord_uid: b6aqwubh

due to production of special homodimeric heavy chain antibodies, somatic hypermutation of their t-cell receptor genes and unusually low diversity of their major histocompatibility complex genes, camels represent an important model for immunogenetic studies. here, we analyzed genes encoding selected natural killer cell receptors with a special focus on genes encoding receptors for major histocompatibility complex (mhc) class i ligands in the two domestic camel species, camelus dromedarius and camelus bactrianus.
based on the dromedary genome assembly camdro2, we characterized the genetic contents, organization, and variability of two complex genomic regions, the leukocyte receptor complex and the natural killer complex, along with the natural cytotoxicity receptor genes ncr1, ncr2, and ncr3. the genomic organization of the natural killer complex region of camels differs from cattle, the phylogenetically most closely related species. with its minimal set of klr genes, it resembles this complex in the domestic pig. similarly, the leukocyte receptor complex of camels is strikingly different from its cattle counterpart. with kir pseudogenes and few lilr genes, it seems to be simpler than in the pig. the syntenies and protein sequences of the ncr1, ncr2, and ncr3 genes in the dromedary suggest that they could be human orthologues. however, only ncr1 and ncr2 have a structure of functional genes, while ncr3 appears to be a pseudogene. high sequence similarities between the two camel species as well as with the alpaca vicugna pacos were observed. the polymorphism in all genes analyzed seems to be generally low, similar to the rest of the camel genomes. this first report on natural killer cell receptor genes in camelids adds new data to our understanding of specificities of the camel immune system and its functions, extends our genetic knowledge of the innate immune variation in dromedaries and bactrian camels, and contributes to studies of natural killer cell receptors evolution in mammals.

keywords: camelid, leukocyte receptor complex, natural killer complex, snp, microsatellites

introduction

camels (camelus spp.) represent an important genus for a number of reasons.
due to their adaptation to desert or semidesert regions, old world camels tolerate harsh conditions, which are inhospitable for many livestock species, including extreme temperatures and prolonged periods without access to food and water (reviewed in jirimutu et al., 2012) . as a result, they are of socioeconomic importance across the middle east, northern africa, and much of asia, where they are used for meat, milk, hides, transportation, and sport. the significance of camels as a sustainable livestock species is likely to continue, as many regions face increased temperatures and desertification as a result of climate change (megersa et al., 2014; watson et al., 2016) . concurrently, recent trends towards intensive production and the movement of camel production to peri-urban settings are altering the pathogen pressures to which these animals are exposed (abdallah and faye, 2013) . camels are also of importance with respect to a number of specific infectious diseases. for example, dromedaries (camelus dromedarius) are a natural host of middle east respiratory syndrome coronavirus, and transmission of the virus from camels to humans has been confirmed (gossner et al., 2014; hemida et al., 2017) . interestingly, significant differences exist between dromedaries and bactrian camels (camelus bactrianus) with regard to their susceptibility to foot and mouth disease (fmd), one of the most costly diseases of production animals worldwide; i.e., dromedaries are not susceptible to fmd and do not transmit infection (wernery and kinne, 2012) . furthermore, the immune system of camels has several unusual features. notable among these is the presence of homodimeric heavy chain antibodies (hamers-casterman et al., 1993) , not known to occur in any other mammalian family, which have both potential and realized applications in a variety of research, diagnostic, and therapeutic settings (muyldermans et al., 2009; de meyer et al., 2014; steeland et al., 2016) . 
the persistence of uniquely organized ileal peyer's patches into adulthood of the dromedary (zidan and pabst, 2008) is another example. additionally, productively rearranged dromedary t-cell receptor delta variable (trdv) (antonacci et al., 2011) and t-cell receptor gamma variable (trgv) (vaccarelli et al., 2012) genes undergo somatic hypermutation to generate a diversified repertoire of these genes. this mechanism has not been documented for t-cell receptor genes in other mammalian species and appears to compensate for the more limited repertoire of trdv and trgv genes found in camels relative to other artiodactyls (ciccarese et al., 2014). a further atypical aspect of the camelid immunogenome is the unusually low genetic diversity of the major histocompatibility complex (mhc) of the three species of old world camels in both class i (plasil et al., 2019) and class ii genes (plasil et al., 2016). the immunological characterization of cellular components of the camel immune system is scarce, mainly because few cross-reacting monoclonal antibodies raised against leukocyte antigens of humans (hussen et al., 2017), bovines, and/or other related species (mossad et al., 2006) are available. this is also one of the reasons why natural killer cells and their functions in camelids have not been studied so far. natural killer (nk) cells constitute a heterogeneous lymphocyte population (allan et al., 2015) involved primarily in innate immune responses against intracellular pathogens and tumor cells. they also influence adaptive immune responses via the production of cytokines (vivier et al., 2011) and crosstalk with dendritic cells (hamerman et al., 2005), play a role in placentation (parham and moffett, 2013), and contribute to the recognition of allogeneic cells. the diversity of the nk cell receptor repertoire is essential to the performance of these multiple functions.
the integration of activating and inhibitory signals originating from various surface receptors determines the activation status of an individual nk cell, providing the capacity to discriminate between self and non-self or altered self (lanier, 1998). characterization of genes underlying receptors on the nk cell surface can significantly contribute to our understanding of the functional heterogeneity of nk cells. among them, nk cell receptors (nkr) of several gene families bind polymorphic mhc class i or mhc class i-like molecules to mediate nk cell function. due to functional relationships between mhc and nkr molecules, the underlying genes and genomic regions represent an important biological model in terms of their co-evolution in the context of pathogen pressures, disease, and survival (guethlein et al., 2015; carrillo-bustamante et al., 2016). however, the current knowledge of mammalian nkr genes, in comparison with that of mhc regions, is rather fragmentary. two major genomic complexes encoding nkr, the natural killer complex (nkc) and the leukocyte receptor complex (lrc), have been identified in mammalian genomes. genes in the nkc encode receptors with c-type lectin-like extracellular domains; genes in the lrc encode receptors whose extracellular ligand-binding domains belong to the immunoglobulin superfamily (trowsdale et al., 2001). despite these structural differences, some receptor families of both complexes are able to fulfil the same functions in terms of mhc class i recognition, downstream signaling, and mediation of nk cell activation/inhibition. to accomplish these ends, different nkr gene families expanded and diversified in different mammalian species, representing an example of convergent evolution in mammals (kelley et al., 2005; guethlein et al., 2015).
two immunologically well-defined species, humans and mice, have expanded structurally unrelated receptor families: humans use the killer-cell immunoglobulin-like receptors (kir) and leukocyte immunoglobulin-like receptors (lilr), both encoded within the lrc, whereas in mice the killer-cell lectin-like receptor (klr) genes of one family (klra, formerly ly49) are expanded in the nkc. a common sign of these expanded gene families, along with allelic variation of members, is haplotypic variation in the number of genes and pseudogenes in populations/strains of the same species (marsch et al., 2003; schenkel et al., 2013). the gene content of the lrc (human vs. primates) and nkc (mouse vs. rat) is known to vary even between closely related mammalian species and families; as a result, knowledge of nkr genes in a number of important species remains fragmentary or missing. significant differences have been shown to exist within artiodactyls, as for example between cattle and pigs (sanderson et al., 2014; schwartz et al., 2017; schwartz and hammond, 2018), but knowledge of the genes underlying camel nkr is lacking. in the context of our work on the camelid immunogenome, the objective of this study was to characterize the genomic content of nkc and lrc with special focus on genes encoding natural killer cell receptors for mhc class i ligands in the two domestic camel species, c. dromedarius and c. bactrianus. based on our new assembly camdro2 of the c. dromedarius genome, we characterized the nkc and lrc genomic regions and three natural cytotoxicity receptor genes (ncr1, ncr2, and ncr3). their gene contents and organization were compared to the national center for biotechnology information (ncbi) reference genomes for c. dromedarius (ncbi genome accession code gca_000767585.1) assembly prjna234474_ca_dromedarius_v1.0, c. bactrianus (gca_000767855.1) assembly ca_bactrianus_mbc_1.0, and vicugna pacos (gca_000164845.3) assembly vicugna_pacos-2.0.2.
the selected orthologous genes were searched in two genomes of domesticated artiodactyl species, cattle bos taurus (gca_000003055.5) assembly bos_taurus_umd_3.1.1 and pig sus scrofa (gca_000003025.6) assembly sscrofa11.1. various individual genomes for c. dromedarius (ncbi bioproject accessions: prjna269274, prjna269961, and prjna310822) and c. bactrianus (prjna183605 and prjeb407) were searched in publicly available whole-genome shotgun contigs for candidate microsatellite markers. likewise, individual whole genome sequencing reads for v. pacos (prjna233565 and prjna340289) were used to estimate single-nucleotide polymorphism (snp) variability in selected genes of alpacas. alongside automatic computational annotation of genes in camdro2 (see elbers et al., 2019) , selected unrecognized genes for nk receptors were manually annotated in the nkc and lrc genomic regions. first, we searched the c. dromedarius ncbi reference genome by tblastn algorithm of ncbi's blast ®1 for orthologous protein sequences to killer-cell lectin-like receptors recently identified in cattle as klr genes (schwartz et al., 2017) . second, the ab initio messenger rna (mrna) models complementary dna (cdna) for corresponding genes of all klr gene lineages in the dromedary reference genome were inspected for completeness using ncbi's conserved domain database cdd search 2 and tmhmm server v.2.0 3 for prediction of transmembrane helices in predicted proteins. the cdna for klre was incorrect; thus, the cattle sequence (schwartz et al., 2017) was used instead. these cdna models were aligned against the camdro2 chromosome 34 sequence using ncbi's splign algorithm 4 and also blast ®1 searched in our full genomic assembly camdro2. identified genes were annotated accordingly. the splign algorithm 4 was also used on scaffolds for the dromedary, bactrian camel, and alpaca ncbi reference genomes. 
the killer-cell immunoglobulin-like receptor kir genes and leukocyte immunoglobulin-like receptor lilr genes were searched on camdro2 chromosome 9 by the tblastn algorithm 1 . individual immunoglobulin-like (ig-like) domains and the cytoplasmic tail of bactrian 3-domain receptor lilr (xp_010960360) served as query sequences. orthologous and paralogous sequences were found. the corresponding genomic and predicted cdna sequences were retrieved from the ncbi reference genomes for both camel species. the full length kir and lilr cdnas were cross-aligned with adequate scaffolds of reference genomes using splign 4 . the predicted protein sequences were screened with cdd 2 and tmhmm server v. 2.0 search 3 . the gene sequences were blast ®1 searched in camdro2. identified full-length genes were annotated, and gene fragments were recorded. the natural cytotoxicity receptor genes ncr1, ncr2, and ncr3 were blast ® searched in the full assembly camdro2 based on annotation of the dromedary ncbi reference genome. although there exist numbers of alternatively spliced variants for each gene/protein, we focused on cdna models of the longest variant ( table 1) . the predicted dromedary, bactrian camel, and cattle cdna for ncr3 were incomplete; therefore, we used the sequence predicted from the white-tailed deer odocoileus virginianus texanus instead. during the process of manual annotation, we also characterized the nkc and lrc regions, and comparisons were made between c. dromedarius, c. bactrianus, and v. pacos. to allow comparisons with other studies and identification of orthologues, a standardized systematic nomenclature of nkr genes (schwartz et al., 2017) was used. however, when referring to original reports on human or mouse genes, both the original and standard gene names and symbols were used. the gene-specific primers encompassing full-length genes with flanking sequences were designed on ncbi c. 
bactrianus gene sequences using primer-blast 5 and checked for specificity against reference genomes of both camel species. the list of primer pairs used is available in supplementary table 1. various compositions of pcrs and adequate thermal profiles used are summarized in supplementary table 2 . pcr products were checked by 0.5% agarose gel electrophoresis and quantified by invitrogen ™ qubit ™ fluorometer using qubit ™ dsdna br assay kit (thermo fisher scientific, waltham, ma, usa). they were kept frozen at −20°c until massive parallel sequencing. each individual's long-range amplicon of genes under study was indexed separately during library preparation using the nexteraxt dna library preparation kit (illumina, san diego, ca, usa) and sequenced on a miseq ™ system (illumina, san diego, ca, usa) platform using the miseq ™ reagent kit v2 (500 cycles) according to manufacturer's protocol in different runs. the quality of the raw sequencing reads was checked using fastqc 6 . low quality read ends were removed by trimmomatic (bolger et al., 2014) (slidingwindow:4:15) . only reads longer than 150 bp were used for the mapping by bwa-mem (li, 2013) . the alignment was post-processed using samtools (li et al., 2009 ) (sorting and conversions), gatk (depristo et al., 2011) (indel realignment), and picard 7 (pcr duplicates removal). further, only mappings with the minimal mapped length of 70 bp, a maximum of 5% soft-clipping, and a maximum of 10% mismatches were kept using ngsutils (breese and liu, 2013) and bbmap 8 . the repetitive sequences of di-, tri-, and tetra-nucleotides were searched in the nkc and lrc of the bactrian camel ncbi reference genome by repeatmasker 9 . candidate microsatellites (msats) were identified by blast ® search of repetitions flanked with 100 bp sequences in whole-genome shotgun contigs from three dromedaries and two bactrian camels. 
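the read-level mapping filters described above for the camel amplicons (minimal mapped length of 70 bp, at most 5% soft-clipping, at most 10% mismatches) can be illustrated with a minimal python sketch. it is not the ngsutils or bbmap implementation used in the study: the function names are hypothetical, the cigar string and the nm value are assumed to come from a sam record, and the nm edit distance is used here only as a proxy for the number of mismatches.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(cigar, nm):
    """Return (mapped_length, softclip_fraction, mismatch_fraction).

    cigar: CIGAR string of one alignment, e.g. "5S145M".
    nm:    edit distance of the alignment (SAM NM tag); used as a proxy
           for mismatches, which is an assumption of this sketch.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    mapped = sum(n for n, op in ops if op in "M=X")      # aligned read bases
    soft = sum(n for n, op in ops if op == "S")          # soft-clipped bases
    read_len = sum(n for n, op in ops if op in "MIS=X")  # bases in the read
    softclip_fraction = soft / read_len if read_len else 0.0
    mismatch_fraction = nm / mapped if mapped else 1.0
    return mapped, softclip_fraction, mismatch_fraction


def passes_filters(cigar, nm, min_mapped=70, max_softclip=0.05, max_mismatch=0.10):
    """Apply the three thresholds quoted in the text to one alignment."""
    mapped, softclip, mismatch = alignment_stats(cigar, nm)
    return mapped >= min_mapped and softclip <= max_softclip and mismatch <= max_mismatch


if __name__ == "__main__":
    for cigar, nm in [("150M", 3), ("10S140M", 0), ("60M", 0), ("100M", 11)]:
        print(cigar, nm, passes_filters(cigar, nm))
```

with these defaults, an alignment such as `10S140M` is rejected because 10 of 150 read bases (6.7%) are soft-clipped, while `5S145M` (3.3%) is kept.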
the most diverse sequences with unique occurrence in the genome were chosen, and primers were designed in oligo primer analysis software v.4.0 (molecular biology insights, colorado springs, co, usa). primer specificity was verified against the ncbi reference genomes of both camel species using ncbi's primer-blast 5 . the pcr conditions were optimized for six msats, finalizing in one 5-plex (czm025-czm029) and one single (czm030) pcr protocol. reaction compositions were as follows: 1.0 μl 10 × taq buffer (top-bio, prague, czech republic), 0.5 u combitaq dna polymerase (top-bio, prague, czech republic), 200 µm each dntp (thermo fisher scientific, waltham, ma, usa), 0.1 μl of each primer of concentration 10 μm ( table 2) , and 50 ng of genomic dna. 6 http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 7 http://broadinstitute.github.io/picard/ 8 https://sourceforge.net/projects/bbmap/ 9 http://repeatmasker.org/cgi-bin/webrepeatmasker pcr reaction mix was supplemented with pcr grade h 2 o (top-bio, prague, czech republic) to a final volume of 10.0 μl. the thermal cycler abi verity 96 well (applied biosystems, foster city, ca, usa) was used for amplification. the thermocycling conditions consisted of initial denaturation at 95°c for 3 min; 30 cycles of denaturation at 95°c for 30 s, annealing at 56°c (64°c for czm030) for 30 s, and elongation at 72°c for 30 s; and final elongation at 72°c for 60 min and holding at 7°c. all markers were then tested by fluorescent fragment analysis using applied biosystems ® abi prism 3500 and sized with genescan ™ 500 liz ® size standard (thermo fisher scientific, waltham, ma, usa). the data obtained from the fragment analyzer were evaluated using the genemapper ® software v.4.1 (thermo fisher scientific, waltham, ma, usa). the dromedary dna samples in this study were transferred from previous projects [austrian science fund (fwf)p1084-b17 and p24706-b25; pi: p.
burger] and originated either from plucked hair, ethylenediaminetetraacetic acid (edta) blood collected commensally on whatman fta ® cards (sigma-aldrich, vienna, austria) during routine veterinary controls, or from dna extracts sent by collaborators under bilateral agreements. samples were imported with permits from the austrian ministry of labour, social affairs, health and consumer protection. all the bactrian camel samples were collected commensally during veterinary procedures for previous research projects (gacr 523/09/1972; pi: p. horin). details about the samples are provided in the supplementary table 3 . selected genes of the nkc and lrc regions were genotyped by targeted resequencing of long-range pcr amplicons in both camel species. for comparison, individual genotypes in the same batch of genes except lilr genes were acquired for alpacas by data mining. two panels of 10 animals were created from collections of samples originating from various populations. the c. dromedarius panel encompassed individuals coming from jordan (irbid), iran, saudi arabia (magaheem and wadda), canary islands, uae (dubai), kenya, sudan, nigeria, and kazakhstan. the genomic dna was previously isolated by phenol-chloroform extraction and kept frozen at −20°c. the c. bactrianus panel consisted of individuals from three mongolian regions (bayan ovoo, galshar, and norovlin). the genomic dna was isolated from frozen archived whole blood samples by nucleospinblood © kit (macherey-nagel, düren, germany) according to the manufacturer's protocol. the genes of interest were isolated by pcrs on genomic dna, obtained amplicons were indexed to track individual samples, and then were sequenced in multiple illumina next-generation sequencing (ngs) runs and mapped to adequate reference sequence for amplicon (see above). a panel of four alpacas was created from publicly available whole-genome sequencing projects. 
the raw data of illumina ngs runs were downloaded from the european nucleotide archive database 10 (ena accession numbers srr1552593-1552609, srr4095109, srr4095110, and srr4095135). the quality was checked using fastqc 6 and kraken package (davis et al., 2013). adapter and quality (phred < 15) trimming was performed by cutadapt (martin, 2011). bwa-mem (li, 2013) was used for the alignment, and the alignments were post-processed by samtools (li et al., 2009) (sorting and conversions), gatk (depristo et al., 2011) (indel realignment), and picard 7 (pcr duplicates removal). alignments were further filtered using ngsutils (breese and liu, 2013) and bbmap 8 for maximum soft-clipping (5%), maximum number of mismatches (10), minimum mapped length (35 bp), and minimum mapping quality (mapq 40). the reference sequences for mapping were retrieved from the v. pacos ncbi reference genome. most v. pacos sequences were framed by the primer sequences used in camels. 10 https://www.ebi.ac.uk/ena the alignments of reads to the reference sequence were inspected using igv software 11 . the variable positions (variant in homozygous state) and confirmed sequence variants (variant detected in heterozygous state) were treated as snps. they were written to consensus sequences for each animal using iupac nucleotide ambiguity codes in bioedit, version 7.2.6. (hall, 1999) along with insertions/deletions, and sequences from the same animal species were manually aligned. the number of snps was counted using dnasp version 5.10 program (librado and rozas, 2009), and frequency was calculated as percentage. the cdna sequences were in silico extracted in bioedit v.7.2.6, based on mrna models for each gene ( table 1 ) and checked by splign 4 for completeness. haplotypes of each diploid individual were reconstructed for every panel and gene (cdna) separately using phase (stephens and donnelly, 2003) algorithm in dnasp v.5.10.
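the way heterozygous positions are written into a single consensus sequence per animal, and the way variable sites are then counted, can be sketched as follows. this is only an illustration under stated assumptions: the two input strings stand for the two alleles of one diploid animal, only bi-allelic substitutions are handled (no insertions/deletions), and snp frequency is taken as the percentage of variable columns over the aligned length. the function names are hypothetical and do not correspond to bioedit or dnasp.

```python
# IUPAC codes for single bases and for the six two-base combinations.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def diploid_consensus(allele1, allele2):
    """Merge two equally long allele sequences into one IUPAC consensus."""
    if len(allele1) != len(allele2):
        raise ValueError("alleles must be aligned to the same length")
    pairs = zip(allele1.upper(), allele2.upper())
    return "".join(IUPAC[frozenset(pair)] for pair in pairs)


def snp_frequency(consensus_seqs):
    """Count variable columns across animals; return (count, percentage).

    A column is variable if animals differ from each other or if any
    animal carries an ambiguity code (heterozygous position).
    """
    length = len(consensus_seqs[0])
    variable = 0
    for column in zip(*consensus_seqs):
        if len(set(column)) > 1 or column[0] not in "ACGT":
            variable += 1
    return variable, 100.0 * variable / length


if __name__ == "__main__":
    animal1 = diploid_consensus("ACGT", "ACAT")  # heterozygous G/A at position 3
    animal2 = diploid_consensus("ACGT", "ACGT")
    print(animal1, animal2, snp_frequency([animal1, animal2]))
```

a g/a heterozygote is written as `R`, so a column containing `R` in one animal and `G` in another is counted once as a variable site.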
the coding sequences were extracted in bioedit v.7.2.6. the number of snps was counted in dnasp v.5.10, and the percentage of coding sequence length was calculated. amino acid sequences were deduced from coding sequences in bioedit v.7.2.6. the manually aligned predicted protein sequences were compared. sequences differing in at least one amino acid position were numbered and designated as different alleles of a particular gene. a phylogenetic analysis of sequences obtained by long-range pcr or data mining was done separately for nkc (c-type lectin-like) and lrc (immunoglobulin-like) genes. the nucleotide coding region sequences were aligned by clustalw multiple alignment algorithm in bioedit v.7.2.6. one haplotype per gene was chosen for each species to represent the respective loci of the dromedary, bactrian camel, and alpaca. corresponding cattle and pig sequences retrieved from ncbi's genbank 12 were used for a comparison. the maximum likelihood phylogenetic trees were constructed in mega5 (tamura et al., 2011) based on the tamura 3-parameter model and the partial deletion method (95% cutoff) with 100 bootstrap repetitions (tamura, 1992). table 2 | characteristics of microsatellite markers in natural killer complex (nkc) and leukocyte receptor complex (lrc) regions (columns: type of repetition, c. dromedarius amplicon size, forward primer, reverse primer). the general organization of the two genomic regions, the natural killer complex (nkc) and the leukocyte receptor complex (lrc), containing genes and gene families encoding the nk cell receptors annotated based on the dromedary genome assembly camdro2, was established and is represented in figure 1. the phylogenetic trees of the genes analyzed are shown in figures 2 and 3 for nkc and lrc genes, respectively. a summary of the allelic variants of the predicted proteins for the genes genotyped in dromedaries and bactrian camels is given in table 1. the nkc region encompassing approximately 0.9 mbp was localized on chromosome 34 of camdro2.
twenty-six genes encoding receptors with the c-type lectin-like domain (ctld) of different lineages were identified in this region. no expansion of any klr gene family was observed in the dromedary genome. most klr genes clustered at one end of the region. this cluster contains five functional genes (klra, klrd, klre, klri, and klrk), two functional members of the klrc family, and two pseudogenes (klrh and klrj). klrc is the only gene family with two members sharing the same ctld but signaling oppositely: klrc1 codes for an inhibitory receptor, while klrc2 encodes an activating receptor. members of three families (klrb, klrf, and klrg) are located at the opposite end of the nkc region, separated from each other by a group of c-type lectin (clec) genes. two members of each of these families were found in the camdro2 dromedary genome. the genes klrb1 and klrb1b have the standard structure of inhibitory receptors with a cytoplasmic tail containing an immunoreceptor tyrosine-based inhibitory motif (itim). genes encoding their putative ligands (clec2d and clec2f, respectively) were found in the vicinity. similarly, the genes klrf1 (nkp80) and klrf2 (nkp65) encoding activating receptors are located in close proximity to genes coding for their predicted ligands, aicl and kacl (clec2b and clec2a, respectively). klrg1 encoding an inhibitory receptor marks the boundary of the nkc. klrg2 was found outside of nkc, on chromosome 7. while in the c. dromedarius ncbi reference genome the nkc is split into two scaffolds (nw_011591409, nw_011591669), it is contained within a single scaffold (nw_011511552) in the c. bactrianus ncbi reference genome. the gene content and gene orientations are the same in both genomes. the only exception is the presence of a premature stop codon in the bactrian klrc2 sequence, which thus seems to be a pseudogene.
since no expansion of klr gene families was observed, we focused on the allelic variation of inhibitory receptors supposed to recognize mhc class i ligands. due to their poor quality, some samples were not successfully amplified by long-range pcrs. because of an apparent mixed ancestry (c. dromedarius x c. bactrianus) of one c. dromedarius, heterozygous sequences of mixed origin were removed. consequently, different numbers of genotypes were retrieved for different genes as indicated in supplementary table 4 . despite polymorphisms existing in the genomic and predicted mrna sequences (supplementary table 4) , none of the tested genes were found to have more than three protein variants. five of the eight tested genes in c. bactrianus were monomorphic on the protein level. klra encodes an inhibitory receptor with one itim signaling motif in its cytoplasmic tail and a relatively long extracellular stalk region (over 70 amino acids). two variants of this receptor molecule were predicted in c. dromedarius, sharing the same ctld but differing by one amino acid residue in the cytoplasmic tail. a klra variant with the same ctld occurs frequently in c. bactrianus. a second variant of bactrian klra differs by nine amino acids (one in the cytoplasmic tail, four in the stalk, and four in the ctld). klrc1 codes for an inhibitory receptor with two itims. three variants of the klrc1 protein were identified in each camel species. one klrc1 variant was shared by both camel species, and two additional variants in each species differed by only one amino acid. only two different ctld variants were present in each species. klrc2 codes for an activating receptor with a charged amino acid residue (lysine) in the transmembrane domain. in c. dromedarius, klrc2 appears to encode two variants of a functional receptor. in c. bactrianus, this gene seems to be monomorphic with a premature stop codon shortly after the origin of translation. 
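calling a sequence a putative pseudogene because of a premature stop codon, as done above for the bactrian klrc2, amounts to translating the coding sequence in frame and checking whether a stop codon occurs before the last codon. a minimal sketch of that check is given below; it assumes the standard genetic code and an in-frame coding sequence starting at the first codon, uses hypothetical function names and toy sequences, and is not the annotation procedure of the study.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def first_stop(cds):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if CODON_TABLE.get(cds[i:i + 3]) == "*":
            return i // 3 + 1
    return None


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the final codon."""
    stop = first_stop(cds)
    return stop is not None and stop < len(cds) // 3


if __name__ == "__main__":
    print(has_premature_stop("ATGAAATAA"))     # stop only at the end
    print(has_premature_stop("ATGTAAAAATAA"))  # stop at codon 2 of 4
```

the toy sequences in the example are hypothetical and serve only to show the two outcomes.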
the klrd gene product consists of a ctld with a short stalk and a transmembrane domain with no signaling motif. for klrd, one protein sequence common to both camel species was observed, with two additional variants found in c. dromedarius, differing by one amino acid each. all three camel genes, klrc1, klrc2, and klrd, have codons for cysteine residues in the stalk region of the protein, allowing formation of disulfide links and of heterodimers klrd/klrc1 and klrd/klrc2. another pair of presumably interacting receptors forming noncovalent heterodimers is klre/klri. the striking features of old world camelid klre are the presence of sequence for an itim in the cytoplasmic tail of the protein and the existence of a second variant in c. dromedarius with a duplication of six amino acid residues in the ctld. in c. bactrianus, klre encodes only one variant of the protein sequence. the klri gene showed very limited polymorphism in both camel species at the genomic level, encoding only one protein variant with only one functional itim (and one mutated motif) in the cytoplasmic tail of the molecule. the predicted protein product of the klrj gene was identical in both camel species. another sequence variant in the dromedary camel differed by only one amino acid in the stalk region. according to the adopted mrna model, all these sequences contain a premature stop codon in the ctld of the protein. figure 1 | organization of genomic regions encoding nk cell receptors in dromedary camel. the nkc was delineated by styk1 and klrg1 genes on chromosome 34 (cdr34) of camdro2 between 11.61 and 12.50 mb. klr genes are represented as solid color triangles, klr pseudogenes as empty color triangles, and lectin-like clec or non-lectin genes as solid grey triangles. the lrc was found between the oscar and il11 genes on chromosome 9 (cdr9) in the region 63.00-72.01 mb. 
lilr genes are represented as solid orange triangles, lilr pseudogenes as empty orange triangles, immunoglobulin-like domains or cytoplasmic domain gene fragments as orange rectangles, kir pseudogenes as empty blue triangles, other types of ig-like genes as solid color triangles, and different types of flanking genes as solid grey triangles. green rectangles mark positions of newly developed microsatellite markers czm025-czm030. july 2019 | volume 10 | article 620 frontiers in genetics | www.frontiersin.org figure 2 | phylogeny of nucleotide coding sequences for nkc c-type lectin-like genes analyzed by long-range pcr/data mining. the percentage of trees (out of 100 bootstrap replicates) in which the associated sequences clustered together is given at branch nodes. branch lengths are expressed as the number of substitutions per site. clusters of genes are highlighted according to the color scheme used in figure 1. haplotypes generated in this study were chosen (one per gene) to represent loci of camelus dromedarius (dromedary), camelus bactrianus (bactrian), and vicugna pacos (alpaca). comparisons were made to bos taurus (cattle) sequences retrieved from genbank (accession numbers: kx611576, kx611577, kx611578, kx698607, nm_174376.2, nm_001075139.1, nm_001098163.1, and nm_001168587.1) and sus scrofa (pig) sequences (accession numbers: nm_213813.2, nm_214338.1, xm_005655677.3, xm_005655679.3, xm_013988381.2, xm_013988416.2, and xm_021092357.1). figure 3 | phylogeny of nucleotide coding sequences for lrc immunoglobulin-like genes analyzed by long-range pcr/data mining. the percentage of trees (out of 100 bootstrap replicates) in which the associated sequences clustered together is given at branch nodes. branch lengths are expressed as the number of substitutions per site.
haplotypes generated in the study were chosen (one per gene) to represent loci (colored) of camelus dromedarius (dromedary), camelus bactrianus (bactrian), and vicugna pacos (alpaca). the nomenclature for the camelid lilr gene family is provisional and will change when a complete assembly of this region is available. some alpaca lilr genes were not included due to an incomplete resolution of this family in the reference genome. the alpaca's lilra 2-ig sequence was retrieved from genbank (accession number xm_015252448.1). a selection of bos taurus (cattle) sequences (accession numbers: nm_174740.2, nm_181451.1, nm_183365.1, nm_001008415.1, nm_001097567.1, nm_001098089.1, xm_005201067.4, xm_024978801.1, xm_024978806.1, xm_024978818.1, xm_024978824.1, and xm_024978827.1) and all functional sus scrofa (pig) sequences (accession numbers: nm_001113218.1, nm_001123143.1, nm_001128451.1, xm_003134173.4, xm_013998762.2, xm_021094960.1, and xm_021094977.1) were used for comparison. klrk in both camel species codes for a functional activating receptor with a charged amino acid (arginine) in the transmembrane region. two proteins with variant ctld were recognized in c. dromedarius, while two proteins in c. bactrianus share the same ctld as one of the dromedary klrk. in the phylogenetic trees obtained, all nkc gene sequences of both camel species clustered with their putative orthologues in alpaca, cattle, and pig (figure 2). the lrc region of approximate length 0.7 mbp was localized to chromosome 9 of camdro2. fifteen full-length genes encoding receptors containing immunoglobulin-like (ig-like) domains of various lineages were identified in the lrc region. besides two kir pseudogenes containing itim domains and an expanded lilr gene family, a singular ig-like receptor was found in the vicinity of fcar and ncr1, located between nlrp7 and nlrp2.
it comprises two ig-like domains different from those of the lilr and the kir genes and has a long cytoplasmic tail with two itims. it is a novel type of lrc gene, similar to a gene recently identified in pigs (schwartz and hammond, 2018) . based on its structure, this inhibitory type of receptor gene may be functional, similarly to pigs. the expanded family of lilr genes is organized in two distinct clusters. the first region spanning approximately 141 kb is located between the genes rps9 and cdc42ep5. this region contains three putatively functional genes, lilrb1, lilrb2, and a lilrb2-like sequence. lilrb1 codes for an inhibitory receptor with three ig-like domains. lilrb2 and lilrb2-like each encode four ig-like domains and a cytoplasmic domain with itims. several fragmented sequences containing ig-like domains were identified within this region as well. the second lilr region spanning approximately 127 kb is located between lair1 and the two kir pseudogenes. three full-length lilr genes and a pseudogene were identified in this region. lilrb3 codes for an inhibitory receptor; it comprises four ig-like domains, a transmembrane region, and a cytoplasmic tail with two intact itims. lilrb4 also codes for an inhibitory receptor, but with only two ig-like domains. in addition, there are a potentially functional activating lilra gene and a lilrb pseudogene (containing an itim sequence) located in the same region. the predicted lilra contains four ig-like domains in its extracellular region but has no signal peptide sequence. the cytoplasmic domain is short and contains no itims. several fragments with complete or partial ig-like domains were also found in this region. in the current c. dromedarius ncbi reference genome, the lrc is split amongst at least four scaffolds (nw_011593473, nw_011591120, nw_011595380, and nw_011591711). they matched our camdro2 assembly in terms of the number and orientation of orthologous non-ig-like genes recognized by automatic annotation. 
single ig-like genes and two kir pseudogenes, but not the expanded lilr genes, were resolved. the bactrian lrc of ncbi reference genome is contained within a single scaffold (nw_011515311), but assembly of the expanded lilr genes is not resolved and lilrb2-like is missing. the overall lrc organization in the c. bactrianus reference genome is the same as that of the lrc of dromedaries (figure 1). based on pcr and resequencing of representative panels of c. dromedarius and c. bactrianus, individual genotypes could be successfully identified for most of the genes analyzed. however, the amplification of kir3dl sequences in bactrian camels provided only limited numbers of sequences (supplementary table 4). similar to nkc genes, some sequences from one c. dromedarius individual were removed due to their mixed origin. the kir3dl gene contains a 2-bp deletion in the exon for the third ig-like domain, causing a frameshift and a premature stop codon. this deletion is identical in both camel species. the locus kirdp contained sequences with premature stop codons and frameshift mutations in both camel species. the same was found in ncbi reference genomes. therefore, we assigned kirdp and kir3dl sequences provisionally as pseudogenes in both species. in contrast to the low polymorphism of the nkc receptors, higher numbers of variable amino acid sites were found within kir3dl. an in silico 2-bp insertion resulted in three and seven predicted full-length protein variants in bactrian camels and dromedaries, respectively ( table 1). lilrb1 encodes a protein of similar structure to kir3dl, with three constant-type ig-like domains and two functional itims in its cytoplasmic tail. unlike other members of the lilr family coding for receptors with four ig-like domains, lilrb1 has no variable-type ig-like domain between its first and second domains. nevertheless, in c. dromedarius, only three variants with minor changes in their amino acid sequences were found, and in c.
bactrianus, this gene appears to be monomorphic. the gene lilrb2 of c. dromedarius encodes a functional inhibitory receptor with four ig-like domains and two itims. six variants recognized in the panel of dromedaries share the same ig-like domains and differ only by two amino acids in the transmembrane region and one in the cytoplasmic tail. all lilrb2-like sequences obtained by pcr were identical to lilrb2 sequences retrieved from the same dromedaries' dnas. in c. bactrianus, lilrb2 encodes two mrnas with a premature stop codon in the first ig-like domain and differs by 37-38 amino acid positions from its dromedary counterparts. no pcr products were obtained for lilrb2-like from the bactrian camel panel. lilrb3 encodes a receptor with 82% identity (86% similarity) to lilrb2 in c. dromedarius. all four protein variants had only two different ig-like domains (the second and third) with one amino acid change each. five variants of the bactrian lilrb3 had 16 inter-species specific positions with another 11 amino acids differing within species. sequences of two activating lilra genes were retrieved by pcr. one of them, containing two ig-like domains, arginine in the transmembrane region, and a long cytoplasmic tail, but with the first itim deleted and the second mutated, was provisionally named as lilra 2-ig. both camel species shared one variant of the lilra 2-ig protein, and two additional variants were present in the dromedary. the second activating gene was designated lilra 4-ig as it contained four ig-like domains, arginine in its transmembrane region, and a short cytoplasmic tail with no signaling motif. three variants of the dromedary and four variants of the bactrian camel lilra 4-ig protein are very similar, differing in only six amino acid positions. one specific variant with an in-frame deletion of 25 amino acids in the third ig-like domain was shared between species. 
the phylogenetic tree constructed for the coding regions of the camel lrc genes (figure 3) and their homologs in alpaca, cattle, and pig revealed three main clusters of genes characterized by the overall structure of the encoded receptor. the first cluster grouped genes with four ig-like domains receptors (lilrs). the second group was a cluster of genes coding for receptors with three or two ig-like domains (kirs and lilrb1). the third cluster was formed by genes encoding receptors with two ig-like domains (ncr1 and lilra2-ig). within the first cluster, three distinct camel genes, lilrb2, lilrb3, and lilra4-ig, clustered with various cattle and pig lilr genes. likewise, various cattle and pig kir genes formed a cluster with camelid kir3dl genes. this cluster was related with the cluster of camelid lilrb1 genes, while the cluster containing ncr1 gene sequences was related to the camelid lilra2-ig genes. as no cattle and/or pig homologs of camelid lilrb1 and lilra2-ig genes could be identified in the reference genomes of these two species, they did not appear in the trees constructed. the predicted proteins of the ncr1, ncr2, and ncr3 genes were studied as potentially activating immunoglobulin-like receptors for various ligands different from mhc class i. the ncr1 gene is located within the lrc, and the structure of camelid ncr1 is similar to the structure of this gene's products in other species, with two extracellular ig-like domains and a charged residue (arginine) in the transmembrane domain, which allows its interaction with activating adaptor proteins. two allelic variants were identified in each camel species that differed from each other by only one or two amino acids. c. bactrianus possessed one additional variant containing a premature stop codon in the first ig-like domain. the ncr2 gene is located on chromosome 20 of camdro2. it encodes a functional receptor with one extracellular ig-like domain and a charged residue (lysine) in the transmembrane domain. 
one allelic variant of the receptor is shared by both camel species. another variant, found only in c. dromedarius, has a premature stop codon in the stem region of the putative molecule. two additional variants were identified in c. bactrianus, differing by one amino acid each. the ncr3 gene is also located on chromosome 20, within the mhc region. all sequences in both camel species contained the same two premature stop codons; this gene thus seems to be nonfunctional in camels. the alpaca is evolutionarily the most closely related species to the old world camels. the v. pacos ncbi reference genome contains two scaffolds (nw_005882720 and nw_005883060) with clec and klr genes. the gene content and organization of the alpaca nkc region was found to be very similar to that of the camel nkc, with similarities of amino acid sequences ranging from 88% to 98%. based on publicly available alpaca genomes, the extent of polymorphism of genomic as well as of protein sequences was higher than in either of the camel species. five protein variants were predicted for klra, klrc1, klrc2, and klre. the amino acid changes were concentrated in the ctld and the stem of klra and in the ctld in klrc2. in contrast, they were evenly distributed throughout klrc1. klre coded for four functional protein variants with only three different ctlds, and amino acid changes were concentrated mostly in the cytoplasmic tail. one allele had a 1-bp insertion leading to frameshift and a premature stop codon in the ctld part of the receptor. the same itim motif as in camels was recognized in all five variants. the three variants of klrd identified differed by one amino acid each in the stem region of the receptor and thus shared the same ctld. klri also coded for three variants with the same ctld, differing only in the cytoplasmic tail. due to a non-synonymous substitution, a second itim was recreated in one of the variants. 
the five protein variants predicted for klrj differed in their ctlds, although all of them contained the same premature stop codon according to the mrna model. klrk haplotypes coded for a potentially functional activating receptor (arginine in the transmembrane region). only one amino acid difference was observed in two ctld types shared by the four klrk protein variants. the alpaca lrc region is fragmented amongst numerous scaffolds of the ncbi reference genome. one of them (nw_005882947) contains a four-domain lilrb gene, kirdp, kir3dl, fcar, ncr1, gp6, and a novel ig-like gene, while another scaffold (nw_005883177) contains lilrb1, comprising three ig domains, and lilrb, with four ig domains. only fragments of lilr genes could be found in the rest of the relevant scaffolds. as in camels, kirdp contains various frameshifts. in contrast, kir3dl has retained an intact genomic sequence and is thus likely to encode a functional inhibitory receptor, but with only one functional itim; the second itim is mutated in all protein variants. seven variants of kir3dl were identified, differing in 11 amino acid positions in total. these sites are evenly distributed throughout the molecule. lilrb1 also codes for a potentially functional inhibitory receptor with three ig-like domains and seven identified variants. they differed in 17 positions located in two of the ig-like domains, the stem and the cytoplasmic tail. the protein homology in comparison with dromedary camel kir3dl and lilrb1 counterparts reached 93%. the ncr1 gene of the alpaca encodes a functional activating receptor with a charged amino acid residue (arginine) in the transmembrane region. the three detected allelic variants differed only in the stem and transmembrane regions. ncr2 codes for a functional activating receptor (lysine in the transmembrane region) as well. three protein variants with minor changes in the signal peptide and the cytoplasmic tail not affecting their potential function were identified. 
all identified ncr3 sequences have premature stop codons in the ig-like domain. the amino acid sequence similarity to camel sequences was 96%. in contrast to the rather conservative organization of the mammalian major histocompatibility complexes, natural killer cell receptor genes and their complex genomic regions are evolutionarily flexible. several different types of genomic organization of the nkr regions have been recognized in mammals (martin et al., 2002; hao et al., 2006; guethlein et al., 2015), and sometimes striking differences have been observed between related taxa (kelley et al., 2005; sanderson et al., 2014; schwartz et al., 2017; schwartz and hammond, 2018). therefore, studies of genes encoding nk cell receptors may contribute to our understanding of the heterogeneity of nk cell functions in particular mammalian species. at the same time, these genes and especially complex genomic regions such as nkc and lrc represent a relevant model for evolutionary biology. characterization of nkr genes in so far poorly studied species and/or families can bring new information on evolutionary mechanisms governing this part of the mammalian immunogenome. despite the importance of camels as a model for immunogenetic studies (ciccarese et al., 2014), virtually nothing is known about nk cells in camels in terms of their morphology, functions, their surface receptors, and/or underlying genes. in this context, this study represents the first report on the nkc and lrc genomic regions and on ncr genes in old world camels and their comparison with a new world camelid, the alpaca. whole genome sequences of old world camels, c. dromedarius, c. bactrianus, and camelus ferus (jirimutu et al., 2012; wu et al., 2014; fitak et al., 2016) and of the alpaca v. pacos (ncbi genome, https://www.ncbi.nlm.nih.gov/genome, accession gca_000164845.3) are currently available. 
however, their annotation is rather fragmentary and largely composed of predicted sequences generated in silico, based on homologies and sequence similarities with other mammalian species. taking into consideration the quality of resources including the availability of biological material, we focused on the two domestic camel species, c. dromedarius and c. bactrianus. even for these species, however, the current status of whole genome assemblies proved to be insufficient for a correct annotation of nkr genes, especially of the lrc, containing multiple copies of sometimes highly similar sequences and exhibiting copy number variations. therefore, the major resource for our analyses was a new genome assembly of the c. dromedarius genome obtained by a combination of several long-read sequencing techniques and bioinformatic approaches. in all genes selected for sequence analyses, the genomic sequences of nkr genes were highly similar in both camel species studied as well as to available alpaca sequences. such a high sequence similarity was observed for a number of other genes and was also characteristic for mhc class ii sequences (plasil et al., 2016). therefore, it seems that pcr failures observed in some cases probably do not indicate polymorphisms in the primer binding site. taking also into consideration the generally low polymorphism of the camel genomes (fitak et al., 2016), the occurrence of a putative polymorphic variant on both chromosomes is unlikely. moreover, the pcr failures concerned mostly the loci klrc2 and kir3dl in the bactrians, which both seem to be pseudogenes, so we have not explored them further for the purposes of this study. nevertheless, both loci merit further investigation in the future. 
two observations need to be explored further. first, the monomorphic status of the bactrian klrc2 could be explained by allelic drop-out or by copy number variation, i.e., partial or total deletion of klrc2 from some bactrian nkc haplotypes. second, polymorphic amino acid positions within the kir3dl sequences were concentrated in the second immunoglobulin-like domain, known to interact with mhc class i ligands in mammals, and in the stalk region of the molecule. another potential technical problem is the use of long-range pcrs for amplifying related members of a gene family, which may produce chimeric products. the nkc genes analyzed here were mostly single (not duplicated) genes with characteristic sequences. the lrc genes analyzed, with only lilr genes as members of expanded gene family/families, differed to such an extent that we could distinguish them. in addition, both types of phylogenetic trees clearly supported the individuality of each gene. moreover, sequences successfully amplified as a whole matched the reference sequences. the remaining ones were amplified in two pieces, and again, they matched the reference sequences. although we have checked the overlapping sequences of two-piece pcrs and they did not indicate more polymorphisms, we cannot strictly exclude the possibility that such a sequence could be composed of pieces of two different yet highly homologous loci, taking also into consideration numerous fragments of lilr genes observed in camdro2. the genomic organization of the nkc region of camelids differs from that of cattle, the phylogenetically most closely related species whose nkr genes have been studied so far. while in cattle an expansion of klrc and klrh genes was reported (schwartz et al., 2017), the minimal set of klr genes observed in camelids resembles the genomic organization of the nkc of the domestic pig. 
similarly, the leukocyte receptor complex of camels is strikingly different from the cattle lrc containing expanded kir (sanderson et al., 2014) and lilr genes (hogan et al., 2012). in camels, the lrc with non-expanded kir genes and several pseudogenes seems to be even less complex than in the pig (schwartz and hammond, 2018). within the natural killer complex, all types of klr genes identified in mammals (hao et al., 2006) were found. none of them apparently expanded into a large family; the maximum number of members within a family was two. similar to other mammalian species, the klra gene codes for a homodimeric type ii inhibitory receptor (ly49) (dimasi and biassoni, 2005), klrc1 encodes an inhibitory receptor with two itim motifs (nkg2a) (vance et al., 1998), klrc2 encodes an activating receptor (nkg2c), and klrg1 codes for an inhibitory receptor for cadherin molecules (ito et al., 2006). these data suggest that the function of these molecules could be very similar to human and other mammalian nk receptors, especially in terms of their capacity to form heterodimers cd94/nkg2a (klrd/klrc1), cd94/nkg2c (klrd/klrc2) (braud et al., 1998), and/or klre/klri (saether et al., 2008). in humans, the heterodimers cd94/nkg2a (klrd/klrc1) and cd94/nkg2c (klrd/klrc2) recognize a relatively low polymorphic non-classical mhc class i ligand hla-e, and their polymorphism is also rather low (braud et al., 1998). contrary to rats, in which klrh recognizes an mhc class i ligand (daws et al., 2012), camelid klrh sequences represent only remnants of a full gene sequence. although mrna for bovine klrj was described (storset et al., 2003), the precise splicing of camelid klrj and possible expression as a functional receptor remain to be verified. the low polymorphism of camelid klrk is comparable to the limited polymorphism of this gene in humans and mice encoding an activating receptor nkg2d for diverse ligands (reviewed in lanier, 2015). 
the genomic organization of the nkc is similar in both domestic camel species and in v. pacos. the functionally important polymorphism of nkc genes is limited, with one monomorphic gene and six genes with two to three allelic protein variants in c. dromedarius. an even higher number of monomorphic klr genes (three functional and two potential pseudogenes) was observed in c. bactrianus, and its klrc2 seems to be a pseudogene. it is not clear how this low nkc variation can be related to the fact that no "hla-e-like" molecule has been found to date outside of simians and rodents and to our recent finding that the mhc gene cluster containing hla-e in humans has been lost in camels, similarly to cattle and pigs (plasil et al., 2019). interestingly, v. pacos seems to be more polymorphic in nkc, at both the genomic and protein levels, despite the limited number of individual genomes analyzed. within the lrc region, no kir genes have expanded, while lilr genes have expanded, with both activating and inhibitory family members. as for the nkc, the same overall organization of the lrc with fcar, ncr1, kir, and lilrb1 genes, three lilrb genes encoding a 4-ig-like domain receptor, weakly variable kir3dl and lilrb1, and unresolved lilr gene fragments was observed in c. bactrianus. the polymorphism of kir3dl was similar in the dromedary (seven possible protein variants) and the alpaca (seven functional protein variants), while the alpaca seems to be more variable in the lilrb1 gene (seven vs. three protein variants, respectively). unfortunately, we were unable to analyze further alpaca lilr genes, mainly due to a lack of correct full-length gene reference sequences and/or to a low coverage of ngs reads available in public databases. ncr1, ncr2, and ncr3 are the major activating receptors on human nk cells (reviewed in koch et al., 2013). the ncr1 gene is located within the lrc; ncr2 and ncr3 are located on human chromosome 6, with ncr3 mapping within the mhc (lanier, 2005). 
the chromosome location of ncr1, ncr2, and ncr3 genes in the dromedary corresponds to the human homologues, suggesting an orthologous nature of the ncr sequences retrieved. however, only ncr1 and ncr2 have a structure of functional genes, while ncr3 appears to be a pseudogene. the ncr1, ncr2, and ncr3 genes of c. bactrianus and v. pacos are very similar in terms of their genomic locations, sequence homologies, and genomic variation. we are aware of the limitations due to the quality of the current assembly; however, the clusters of the lilr sequences identified in the phylogenetic trees indicated, similarly to nkc genes, the individuality of each of the genes. although the purpose of this study was to outline the general organization of the two nkr complexes in terms of major gene families represented, and their location within nkc and lrc, further work is needed to definitively resolve the complex structure of the lrc region, including a detailed characterization of individual lilr genes and pseudogenes. taken together, this first report on nkr genes in camelids revealed features characteristic for nkc and lrc of tylopoda. despite close phylogenetic relationships to cattle, important differences in the nkc and lrc genomic organization and their polymorphism were observed. on the other hand, many similarities with pigs were found. the data presented here increase our genetic knowledge of the innate immune variation in dromedaries and bactrian camels and contribute to studies of nkr evolution in mammals. the results of this project add to our understanding of specificities of the camel immune system and its functions and represent a prerequisite for future investigations on mhc/nkr interactions in health and disease. the camel datasets generated for this study can be found in the ncbi's genbank®. the sequences obtained for nkc genes have accession numbers mk644262-mk644413. the lrc gene sequences for both camel species have accession numbers mk644414-mk644532. 
the ncr sequences have accession numbers mk473784-mk473840. the dromedary dna samples in this study were transferred from previous projects [austrian science fund (fwf) p1084-b17 and p24706-b25; pi: pb] and originated either from plucked hair, edta blood collected commensally on whatman fta cards (sigma-aldrich, vienna, austria) during routine veterinary controls or from dna extracts sent by collaborators under bilateral agreements. samples were imported with permits from the austrian ministry of labour, social affairs, health and consumer protection. all bactrian camel samples were collected commensally during veterinary procedures for previous research projects (gacr 523/09/1972; pi: ph). all samples were collected by a licensed veterinarian in compliance with all ethical and professional standards. jf made nkc annotation, designed primers, carried out pcr for nkr genes, and analyzed data. jo made all ngs mappings. aj made lrc annotation and carried out pcr for lilrb1. je provided camdro2 whole genome sequence and annotation. jw made microsatellite definition and analysis. ak designed the microsatellite project. pb designed the project. ph designed the project and the nkr study. jf and ph drafted the manuscript. jo, aj, and jw wrote paragraphs. je and pb edited the manuscript. all authors read, commented on, and approved the final version of the manuscript.
typology of camel farming system in saudi arabia
cattle nk cell heterogeneity and the influence of mhc class i
expression and genomic analyses of camelus dromedarius t cell receptor delta (trd) genes reveal a variable domain repertoire enlargement due to cdr3 diversification and somatic mutation
trimmomatic: a flexible trimmer for illumina sequence data
hla-e binds to natural killer cell receptors cd94/nkg2a, b and c
ngsutils: a software suite for analyzing and manipulating next-generation sequencing datasets
the evolution of natural killer cell receptors
characteristics of the somatic hypermutation in the camelus dromedarius t cell receptor gamma (trg) and delta (trd) variable domains
kraken: a set of tools for quality control and analysis of high-throughput sequence data
identification of an mhc class i ligand for the single member of a killer cell lectin-like receptor family, klrh1
nanobody-based products as research and diagnostic tools
a framework for variation discovery and genotyping using next-generation dna sequencing data
structural and functional aspects of the ly49 natural killer cell receptors
improving illumina assemblies with hi-c and long reads: an example with the north african dromedary
the de novo genome assembly and annotation of a female domestic dromedary of north african origin
human-dromedary camel interactions and the risk of acquiring zoonotic middle east respiratory syndrome coronavirus infection
co-evolution of mhc class i and variable nk cell receptors in placental mammals
bioedit: a user-friendly biological sequence alignment editor and analysis program for windows 95/98/nt
nk cells in innate immunity
naturally occurring antibodies devoid of light chains
heterogenous but conserved natural killer receptor gene complexes in four major orders of mammals
dromedary camels and the transmission of middle east respiratory syndrome coronavirus (mers-cov)
characterisation of bovine leukocyte ig-like receptors
reactivity of commercially available monoclonal antibodies to human cd antigens with peripheral blood leucocytes of dromedary camels (camelus dromedarius)
killer cell lectin-like receptor g1 binds three members of the classical cadherin family to inhibit nk cell cytotoxicity
genome sequences of wild and domestic bactrian camels
comparative genomics of natural killer cell receptor gene clusters
activating natural cytotoxicity receptors of natural killer cells in cancer and infection
nk cell receptors
nk cell recognition
nkg2d receptor and its ligands in host defense
association of dap12 with activating cd94/nkg2c nk cell receptors
aligning sequence reads, clone sequences and assembly contigs with bwa-mem. arxiv
the sequence alignment/map format and samtools
dnasp v5: a software for comprehensive analysis of dna polymorphism data
killer-cell immunoglobulin-like receptor (kir) nomenclature report
leukocyte ig-like receptor complex (lrc) in mice and men
cutadapt removes adapter sequences from high-throughput sequencing reads
livestock diversification: an adaptive strategy to climate change and rangeland ecosystem changes in southern ethiopia
identification of monoclonal antibody reagents for use in the study of immune response in camel and water buffalo
camelid immunoglobulins and nanobody technology
variable nk cell receptors and their mhc class i ligands in immunity, reproduction and human evolution
the major histocompatibility complex in old world camelids and low polymorphism of its class ii genes
the major histocompatibility complex of old world camelids: class i and class i-related genes
klre/i1 and klre/i2: a novel pair of heterodimeric receptors that inversely regulate nk cell cytotoxicity
definition of the cattle killer cell ig-like receptor gene family: comparison with aurochs and human counterparts
the ly49 gene family. a brief guide to the nomenclature, genetics, and role in intracellular infection
the evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation
the unique evolution of the pig lrc, a single kir but expansion of lilr and a novel ig receptor family
nanobodies as therapeutics: big opportunities for small antibodies
a comparison of bayesian methods for haplotype reconstruction from population genotype data
natural killer cell receptors in cattle: a bovine killer cell immunoglobulin-like receptor multigene family contains members with divergent signaling motifs
estimation of the number of nucleotide substitutions when there are strong transition-transversion and g + c-content biases
mega5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods
the genomic context of natural killer receptor extended gene families
generation of diversity by somatic mutation in the camelus dromedarius t-cell receptor gamma variable domains
mouse cd94/nkg2a is a natural killer cell receptor for the nonclassical major histocompatibility complex (mhc) class i molecule qa-1(b)
innate or adaptive immunity? the example of natural killer cells
camels and climate resilience: adaptation in northern kenya
foot and mouth disease and similar virus infections in camelids: a review
camelid genomes reveal evolution and adaptation to desert environments
unique microanatomy of ileal peyer's patches of the one humped camel (camelus dromedarius) is not age-dependent
key: cord-295307-zrtixzgu authors: delgado-chaves, fernando m.; gómez-vela, francisco; divina, federico; garcía-torres, miguel; rodriguez-baena, domingo s. 
title: computational analysis of the global effects of ly6e in the immune response to coronavirus infection using gene networks date: 2020-07-21 journal: genes (basel) doi: 10.3390/genes11070831 sha: doc_id: 295307 cord_uid: zrtixzgu gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. for this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. in this work, gene co-expression networks were reconstructed from rna-seq expression data with the aim of analyzing the time-resolved effects of gene ly6e in the immune response against the coronavirus responsible for murine hepatitis (mhv). through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in ly6e ∆hsc compared to wild type animals. results show that ly6e ablation at hematopoietic stem cells (hscs) leads to a progressively impaired immune response in both liver and spleen. specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of ly6e ∆hsc mice. on the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ecm remodeling in ly6e ∆hsc mice. these findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches. 
the recent sars-cov-2 pandemic has exerted an unprecedented pressure on the scientific community in the quest for novel antiviral approaches. a major concern regarding sars-cov-2 is the capability of the coronaviridae family to cross the species barrier and infect humans [1]. this, along with the tendency of coronaviruses to mutate and recombine, represents a significant threat to global health, which ultimately has driven interdisciplinary research towards the development of a vaccine or antiviral treatments. given the similarities found amongst the members of the coronaviridae family [2, 3], analyzing the global immune response to coronaviruses may shed some light on the natural control of viral infection, and inspire prospective treatments. this may well be achieved from the perspective of systems biology, in which the interactions between the biological entities involved in a certain process are represented by means of a mathematical system [4]. within this framework, gene networks (gn) have become an important tool in the modeling and analysis of biological processes from gene expression data [5]. gns constitute an abstraction of a given biological reality by means of a graph composed of nodes and edges. in such a graph, nodes represent the biological elements involved (i.e., genes, proteins or rnas) and edges represent the relationships between the nodes. in addition, gns are also useful to identify genes of interest in biological processes, as well as to discover relationships among these. thus, they provide a comprehensive picture of the studied processes [6, 7]. among the different types of gns, gene co-expression networks (gcns) are widely used in the literature due to their computational simplicity and good performance in the study of biological processes or diseases [8] [9] [10]. gcns usually compute pairwise co-expression indices for all genes. 
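the pairwise scoring just described, together with the thresholding step that usually follows it, can be sketched as below. this is a minimal illustration under stated assumptions, not the method used in this work: it uses the pearson coefficient only, the cut-off of 0.9 is an arbitrary ad hoc value, and the helper names `pearson` and `coexpression_edges` as well as the toy expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Score every gene pair and keep those with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is an assumption chosen for illustration only.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical toy data: three genes measured in four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
edges = coexpression_edges(toy, threshold=0.9)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b} (r = {r:.3f})")
```

in this toy input only the pair geneA and geneB passes the cut-off; a rank-based coefficient such as spearman or kendall could be swapped in for `pearson` without changing the rest of the sketch.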
then, the level of interaction between two genes is considered significant if its score is higher than a certain threshold, which is set ad hoc. traditionally, statistical-based co-expression indices have been used to calculate the dependencies between genes [5, 7]. some of the most popular correlation coefficients are pearson, kendall or spearman [11] [12] [13]. despite their popularity, statistical-based measures present some limitations [14]. for instance, they cannot identify non-linear interactions and, in the case of parametric correlation coefficients, they depend on the data distribution. in order to overcome some of these limitations, new approaches, e.g., the use of information theory-based measures or ensemble approaches, are receiving much attention [15] [16] [17]. gene co-expression networks (gcns) have already been applied to the study of high-impact diseases, such as cancer [18], diabetes [19] or viral infections (e.g., hiv) in order to study the role of immune response to these illnesses [20, 21]. genetic approaches are expected to be the best strategy to understand viral infection and the immune response to it, potentially identifying the mechanisms of infection and assisting the design of strategies to combat infection [22, 23]. the current gene expression profiling platforms, in combination with high-throughput sequencing, can provide time-resolved transcriptomic data, which can be related to the infection process. the main objective of this approach is to generate knowledge on the immune functioning upon viral entry into the organism, which represents a perturbation to the system. in the context of viral infection, a first defense line is the innate response mediated by interferons, a type of cytokine that eventually leads to the activation of several genes of antiviral function [24]. 
globally, these genes are termed interferon-stimulated genes (isgs), and regulate processes like inflammation, chemotaxis or macrophage activation among others. furthermore, isgs are also involved in the subsequent acquired immune response, specific for the viral pathogen detected [25]. gene ly6e (lymphocyte antigen 6 family member e), which has been related to t cell maturation and tumorigenesis, is amongst the isgs [26]. this gene is transcriptionally active in a variety of tissues, including liver, spleen, lung, brain, uterus and ovary. its role in viral infection has been elusive due to contradictory findings [27]. for example, in liu et al. [28], ly6e was associated with the resistance to marek's disease virus (mdv) in chickens. moreover, differences in the immune response to mouse adenovirus type 1 (mav-1) have been attributed to ly6e variants [29]. conversely, ly6e has also been related to an enhancement of human immunodeficiency virus 1 (hiv-1) pathogenesis, by promoting hiv-1 entry through virus-cell fusion processes [30]. also in the work by mar et al. [31], the loss of function of ly6e due to gene knockout reduced the infectivity of influenza a virus (iav) and yellow fever virus (yfv). this enhancing effect of ly6e on viral infection has also been observed in other enveloped rna viruses such as west nile virus (wnv), dengue virus (den), zika virus (zikv), o'nyong nyong virus (onnv) and chikungunya virus (chikv) among others [32]. nevertheless, the exact mechanisms through which ly6e modulates viral infection virus-wise, and sometimes even cell type-dependently, require further characterization. in this work we present a time-resolved study of the immune response of mice to a coronavirus, the murine hepatitis virus (mhv), in order to analyze the implications of gene ly6e. 
to do so, we have applied a gcn reconstruction method called engnet [33], which uses an ensemble strategy to combine three different co-expression measures, followed by a topology optimization of the final network. engnet has outscored other methods in terms of network precision and reduced network size, and has been proven useful in the modeling of disease, as in the case of human post-traumatic stress disorder. the rest of the paper is organized as follows. in the next section, we describe related work. in section 3, we first describe the dataset used in this paper, and then we introduce the engnet algorithm and the different methods used to infer and analyze the generated networks. the results obtained are detailed in section 4, while, in section 5, we discuss the results presented in the previous section. finally, in section 6, we draw the main conclusions of our work. as already mentioned, gene co-expression networks have been extensively applied in the literature for the understanding of the mechanisms underlying complex diseases like cancer, diabetes or alzheimer's disease [34] [35] [36]. globally, gcns serve as an in silico genetic model of these pathologies, highlighting the main genes involved in these at the same time [37]. besides, the identification of modules in the inferred gcns may lead to the discovery of novel biomarkers for the disease under study, following the 'guilt by association' principle. along these lines, gcns are also considered suitable for the study of infectious diseases, such as those caused by viruses, the matter at hand [38]. to do so, multiple studies have analyzed the effects of viral infection over the organism, focusing on immune response or tissue damage [39, 40]. for instance, the analysis of gene expression using co-expression networks is shown in the work by pedragosa et al. 
[41], where the infection caused by lymphocytic choriomeningitis virus (lcmv) is studied over time in mouse spleen using gcns. in ray et al. [42], gcns are reconstructed from different microarray expression data in order to study hiv-1 progression, revealing important changes across the different infection stages. similarly, in the work presented by mcdermott et al. [43], the over- and under-stimulation of the innate immune response to severe acute respiratory syndrome coronavirus (sars-cov) infection is studied. using several network-based approaches on multiple knockout mouse strains, the authors found that ranking genes based on their network topology made accurate predictions of the pathogenic state, thus solving a classification problem. in [39], co-expression networks were generated by microarray analysis of pediatric influenza-infected samples. thanks to this study, genes involved in the innate immune system and defense to virus were revealed. finally, in the work by pan et al. [44], a co-expression network is constructed based on differentially-expressed micrornas and genes identified in liver tissues from patients with hepatitis b virus (hbv). this study provides new insights on how micrornas take part in the molecular mechanism underlying hbv-associated acute liver failure. the alarm posed by the covid-19 pandemic has fueled the development of effective prevention and treatment protocols for the 2019-ncov/sars-cov-2 outbreak [45]. due to the novelty of sars-cov-2, recent research takes similar viruses, such as sars-cov and middle east respiratory syndrome coronavirus (mers-cov), as a starting point. other coronaviruses, like mouse hepatitis virus (mhv), are also considered appropriate for comparative studies in animal models, as demonstrated in the work by de albuquerque et al. [46] and ding et al. [47]. mhv is a murine coronavirus (m-cov) that causes an epidemic illness with high mortality, and has been widely used for experimentation purposes. 
works like the ones by case et al. [48] and gorman et al. [49] study the innate immune response against mhv mediated by interferons, and those interferon-stimulated genes with potential antiviral function. this is the case of gene ly6e, which has been shown to play an important role in viral infection, as have various orthologs of the same gene [50, 51]. mechanistic approaches often involve the ablation of the gene under study, as in the work by mar et al. [31], where gene knockout was used to characterize the implications of ly6e in influenza a infection. as in giotis et al. [52], these studies often involve global transcriptome analyses, via rna-seq or microarrays, together with computational efforts intended to screen the key elements of the immune system that are required for the appropriate response. this approach ultimately guides experimental research through predictive analyses, as in the case of gene co-expression networks [53].
in the following subsections, the main methods and gcn reconstruction steps are addressed. first, in section 3.1, the original dataset used in the present work is described, together with the experimental design. then, in section 3.2, the data preprocessing steps are described. subsequently, in section 3.3, key genes controlling the infection progression are extracted through differential expression analyses. finally, the inference of gcns and their analysis are detailed in sections 3.4 and 3.5, respectively.
the original experimental design can be described as follows. the progression of the mhv infection at the genetic level was evaluated in two genetic backgrounds: wild type (wt, ly6e fl/fl) and ly6e knockout mutants (ko, ly6e ∆hsc). the ablation of gene ly6e in all cell types is lethal, hence the ly6e ∆hsc strain contains a disrupted version of gene ly6e only in hematopoietic stem cells (hsc), which give rise to myeloid and lymphoid progenitors of all blood cells.
wild type and ly6e ∆hsc mice were injected intraperitoneally with 5000 pfu mhv-a59. at 3 and 5 days post-injection (d p.i.), mice were euthanized and biological samples for rna-seq were extracted. the overall effects of mhv infection in both wt and ko strains were assessed in liver and spleen. in total, 36 samples were analyzed, half of them corresponding to liver and half to spleen. of the 18 organ-specific samples, 6 correspond to mock infection (negative control), 6 to mhv-infected samples at 3 d p.i. and 6 to mhv-infected samples at 5 d p.i. for each sample, two technical replicates were obtained. libraries of cdna generated from the samples were sequenced using illumina novaseq 6000. further details on sample preparation can be found in the original article by pfaender et al. [54]. for the sake of simplicity, mhv-infected samples at 3 and 5 d p.i. will be termed 'cases', whereas mock-infection samples will be termed 'controls'.
the original dataset consists of 72 files, one per sample replicate, obtained upon the mapping of the transcript reads to the reference genome. reads were recorded in three different ways, depending on whether they mapped to introns, exons or total genes. then, a count table was retrieved from these files by selecting only the total gene counts of each sample replicate file. pre-processing was performed using the edger [55] r package. the original dataset by pfaender et al. [54] was retrieved from geo (accession id: gse146074) using the geoquery [56] package. additional files on sample information and treatment were also used to assist the modeling process. by convention, a sequencing depth per gene below 10 is considered negligible [57, 58]. genes meeting this criterion are known as low expression genes, and are often removed since they add noise and computational burden to the following analyses [59].
in order to remove genes showing fewer than 10 reads across all conditions, counts per million (cpm) normalization was performed, so that possible differences between library sizes for both replicates would not affect the result. afterwards, principal component analyses (pca) were performed over the data in order to detect the main sources of variability across samples. pca were accompanied by unsupervised k-medoid clustering analyses, in order to identify different groups of samples. in addition, multidimensional scaling (mds) plots were applied to further separate samples according to their features. last, between-sample similarities were assessed through hierarchical clustering.
the analyses of differential expression served a twofold purpose: (i) the exploration of the directionality in the gene expression changes upon viral infection, and (ii) the identification of key regulatory elements for the subsequent network reconstruction. in the present application, differentially-expressed genes (deg) were filtered from the original dataset and carried forward to the reconstruction process. this approach enabled the modeling of the genetic relationships that are considered of relevance in the presented comparison [60] [61] [62]. in the present work, mice samples were compared organ-wise depending on whether these corresponded to control, 3 d p.i. or 5 d p.i. the identification of deg was performed using the limma [63] r package, which provides non-parametric robust estimation of the gene expression variance. this package includes voom, a method that incorporates rna-seq count data into the limma workbench, originally designed for microarrays [64]. in this case, a minimum log2-fold-change (log2fc) of 2 was chosen, which corresponds to a four-fold change in the gene expression level. p-values were adjusted by the benjamini-hochberg procedure [65] and the selected adjusted p-value cutoff was 0.05.
in order to generate gene networks the engnet algorithm was used.
this technique, presented in gómez-vela et al. [33], is able to compute gene co-expression networks with a competitive performance compared to other approaches from the literature. engnet performs a two-step process to infer gene networks: (a) an ensemble strategy for a reliable generation of co-expression networks, and (b) a greedy algorithm that optimizes both the size and the topological features of the network. these two features of engnet offer a reliable solution for generating gene networks. in fact, engnet relies on three statistical measures in order to obtain networks. in particular, the measures used are the spearman and kendall correlation coefficients and normalized mutual information (nmi), which are widely used in the literature for inferring gene networks. engnet uses these measures simultaneously by applying an ensemble strategy based on majority voting, i.e., a relationship will be considered correct if at least 2 of the 3 measures evaluate the relationship as correct. the evaluation is based on different independent thresholds. in this work, the different thresholds were set to the values originally used in [33]: 0.9, 0.8 and 0.7 for spearman, kendall and nmi, respectively. in addition, as mentioned above, engnet performs an optimization of the topological structure of the networks obtained. this reduction is based on two steps: (i) the pruning of the relations considered of least interest in the initial network, and (ii) the analysis of the hubs present in the network. for this second step of the final network reconstruction, we have selected the same threshold that was used in [33], i.e., 0.7. through this optimization, the final network produced by engnet is easier to analyze computationally, due to its reduced size.
networks were imported to r for the estimation of topology parameters and the addition of network features that are of interest for the later network analysis and interpretation.
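the majority-voting rule of the ensemble step described earlier in this subsection can be illustrated with a short python sketch. it assumes that the three association scores of each gene pair have already been computed; the function names, the use of absolute values and the `>=` comparison are illustrative assumptions and are not taken from the engnet implementation. only the thresholds (0.9, 0.8, 0.7) and the "at least 2 of 3" rule come from the text.

```python
# Thresholds reported in the text for each measure.
THRESHOLDS = {"spearman": 0.9, "kendall": 0.8, "nmi": 0.7}


def majority_vote(scores, thresholds=THRESHOLDS, min_votes=2):
    """Return True if at least `min_votes` measures accept the gene pair.

    `scores` maps measure name -> precomputed score for one gene pair.
    Assumption: a measure accepts a pair when |score| >= its threshold.
    """
    votes = sum(abs(scores[name]) >= cutoff for name, cutoff in thresholds.items())
    return votes >= min_votes


def ensemble_edges(pair_scores):
    """Keep only the gene pairs accepted by the majority vote."""
    return [pair for pair, scores in pair_scores.items() if majority_vote(scores)]


# Hypothetical scores for three gene pairs (illustrative values only).
pair_scores = {
    ("geneA", "geneB"): {"spearman": 0.95, "kendall": 0.85, "nmi": 0.40},   # 2 votes
    ("geneA", "geneC"): {"spearman": 0.92, "kendall": 0.60, "nmi": 0.30},   # 1 vote
    ("geneB", "geneC"): {"spearman": -0.93, "kendall": -0.81, "nmi": 0.75}, # 3 votes
}
edges = ensemble_edges(pair_scores)
```

in this toy input, the pair geneA-geneC is rejected because only spearman exceeds its threshold, whereas the other two pairs are retained. the subsequent pruning and hub analysis performed by engnet are not covered by this sketch.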
the estimated topology parameters and additional network features were added to the reconstructed networks as attributes, to enrich the modeling, using the igraph [66] r package. the networks were then imported into cytoscape [67] through rcy3 [68] for examination and analysis purposes. in this case, two kinds of analyses were performed: (i) a topological analysis and (ii) an enrichment analysis. regarding the topological analysis, clustering evaluation was performed in order to identify densely connected nodes, which, according to the literature, are often involved in the same biological process [69]. the chosen clustering method was community clustering (glay) [70], implemented via cytoscape's clustermaker app [71], which has yielded significant results in the identification of densely connected modules [72, 73]. among the topology parameters, degree and edge betweenness were estimated. the degree of a node refers to the number of nodes linked to it. on the other hand, the betweenness of an edge refers to the number of shortest paths which go through that edge. both parameters are considered a measure of the implication of nodes and edges, respectively, in a certain network. particularly, nodes whose degree exceeds the average network node degree, the so-called hubs, are considered key elements of the biological processes modeled by the network. in this particular case, the degree distribution of each network was analyzed so that those nodes whose degree exceeded a threshold were selected as hubs. this threshold is defined as q3 + 1.5 × iqr, where q3 is the third quartile and iqr the interquartile range of the degree distribution. this method has been widely used for the detection of upper outliers in non-parametric distributions [74, 75], as is the case here. however, nodes above this threshold are not treated as outliers in the usual sense: rather, those nodes whose degree is far above the median degree are considered hubs.
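the hub criterion above (degree > q3 + 1.5 × iqr) can be sketched in a few lines of python. this is a minimal illustration on a hypothetical edge list, not the igraph-based code used in the study; the function names are made up for the example, and the quartiles are computed with linear interpolation (`method="inclusive"`), which is an assumption about the quantile definition. the edge list is assumed to be simple, i.e., without loops or repeated edges.

```python
from collections import Counter
from statistics import quantiles


def node_degrees(edges):
    """Count, for every node, the number of edges it takes part in."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def hub_cutoff(degrees):
    """Upper fence Q3 + 1.5 * IQR of the degree distribution."""
    q1, _, q3 = quantiles(degrees, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def find_hubs(edges):
    """Return the set of hub nodes together with the cutoff that was applied."""
    degree = node_degrees(edges)
    cutoff = hub_cutoff(list(degree.values()))
    return {node for node, d in degree.items() if d > cutoff}, cutoff


# Hypothetical network: node "h" is linked to eight other nodes,
# plus two extra edges among the remaining nodes.
edges = [("h", f"l{i}") for i in range(1, 9)] + [("l1", "l2"), ("l3", "l4")]
hubs, cutoff = find_hubs(edges)
```

in this toy network the degrees are 8 for node h, 2 for four nodes and 1 for the remaining four, so q1 = 1, q3 = 2 and the cutoff is 3.5; only h is selected as a hub.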
on the other hand, gene ontology (go) enrichment analysis provides valuable insights on the biological reality modeled by the reconstructed networks. the gene ontology consortium [76] maintains a database that seeks a unified nomenclature for biological entities. go has developed three different ontologies, which describe gene products in terms of the biological processes, cell components or molecular functions in which these are involved. ontologies are built out of go terms or annotations, which provide biological information of gene products. in this case, the clusterprofiler [77] r package allowed the identification of the statistically over-represented go terms in the gene sets of interest. additional enrichment analyses were performed using david [78]. for both analyses, the complete genome of mus musculus was selected as background. finally, further details on the interplay of the genes under study were examined using the string database [79].
the reconstruction of gene networks that adequately model viral infection involves multiple steps, which ultimately shape the final outcome. first, in section 4.1, exploratory analyses and data preprocessing are detailed, which prompted the modeling rationale. then, in section 4.2, differential expression is evaluated for the samples of interest. finally, network reconstruction and analysis are addressed in section 4.3. in the end, four networks were generated, in both an organ- and genotype-wise manner. a schematic representation of the gcn reconstruction approach is shown in figure 1.
figure 1. general scheme for the reconstruction method. the preprocessed data was subjected to exploratory and differential expression analyses, which imposed the reconstruction rationale. four groups of samples were used to generate four independent networks, respectively modeling the immune response in the liver, both in the wt and the ko situations; and in the spleen, also in the wt and the ko scenarios.
in order to remove low expression genes, a sequencing depth of 10 was found to correspond to an average cpm of 0.5, which was selected as threshold. hence, genes whose expression was found over 0.5 cpm in at least two samples of the dataset were maintained, ensuring that only genes which are truly being expressed in the tissue are studied. the dataset was log2-normalized prior to the following analyses, in accordance with the recommendations posed in law et al. [64]. the results of both pca and k-medoid clustering are shown in figure 2a. clustering of the log2-normalized samples revealed clear differences between liver and spleen samples. also, for each organ, three subgroups of analogous samples that cluster together are identified. these groups correspond to mock infection, mhv-infected mice at 3 d p.i. and mhv-infected mice at 5 d p.i. (dashed lines in figure 2a). finally, subtle differences were observed in homologous samples of different genotypes (figure a1). organ-specific pca revealed major differences between mhv-infected samples for ly6e ∆hsc and wt genotypes, at both 3 and 5 d p.i. these differences were not observed in the mock infection (control situation). organ-wise pca are shown in figure 2b,c. the distances between same-genotype samples illustrate the infection-prompted genetic perturbation from the uninfected status (control) to 5 d p.i., where clear signs of hepatitis were observed according to the original physiopathology studies [54]. on the other hand, the differences observed between both genotypes are indicative of the role of gene ly6e in the appropriate response to viral infection. these differences are subtle in control samples, but in case samples, some composition bias is observed depending on whether these are ko or wt, especially in spleen samples. the comparative analysis of the top 500 most variable genes confirmed the differences observed in the pca, as shown in figure a2.
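returning to the low-expression filter that opens this subsection, the following python sketch shows one way such a filter could be written. it is a simplified stand-in for the edger-based preprocessing, not the code used in the study: library sizes are taken as plain column sums (no additional normalization factors), and the function names and toy counts are hypothetical. the threshold of 0.5 cpm in at least two samples is the one stated in the text.

```python
def cpm(counts):
    """Convert a {gene: [count per sample]} table to counts per million.

    Assumption: the library size of a sample is the sum of its raw counts.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib for c, lib in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=0.5, min_samples=2):
    """Keep genes whose CPM exceeds `min_cpm` in at least `min_samples` samples."""
    cpm_table = cpm(counts)
    return {
        gene: counts[gene]
        for gene, row in cpm_table.items()
        if sum(value > min_cpm for value in row) >= min_samples
    }


# Toy table with three samples of 20 million reads each, so that
# 10 reads correspond to 0.5 CPM. "bulk" stands for all remaining reads.
counts = {
    "geneA": [15, 12, 3],
    "geneB": [2, 0, 1],
    "geneC": [11, 4, 5],
    "bulk": [19_999_972, 19_999_984, 19_999_991],
}
kept = filter_low_expression(counts)
```

here geneA passes the filter (0.75 and 0.6 cpm in two samples), whereas geneB never exceeds 0.5 cpm and geneC does so in a single sample, so both are discarded.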
among the four different features of the samples under study (organ, genotype, sample type (case or control) and days post injection), the dissimilarities in terms of genotype were the subtlest. in the light of these exploratory findings, the network reconstruction approach was performed as follows. networks were reconstructed organ-wise, as the organs exhibit notable differences in gene expression. additionally, a main objective of the present work is to evaluate the differences in the genetic response in the wt situation compared to the ly6e ∆hsc ko background, upon the viral infection onset in the two mentioned tissues. for each organ, log2-normalized samples were arranged to generate time-series-like data, i.e., for each genotype, 9 samples are considered as a set, namely 3 control samples, 3 case samples at 3 d p.i. and 3 case samples at 5 d p.i. both technical replicates were included. this design seeks a gene expression span representative of the infection progress. thereby, control samples may well be considered as a time zero for the viral infection, followed by the corresponding samples at 3 and 5 d p.i. the proposed rationale is supported by the exploratory findings, which position 3 d p.i. samples between control and 5 d p.i. samples. at the same time, the reconstruction of gene expression becomes more robust with an increasing number of samples. in this particular case, 18 measuring points are attained for the reconstruction of each one of the four intended networks, since two technical replicates were obtained per sample [80].
the differential expression analyses were performed over the four groups of 9 samples explained above, with the aim of examining the differences in the immune response between ly6e ∆hsc and wt samples. limma-voom differential expression analyses were performed over the log2-normalized counts, in order to evaluate the different genotypes whilst contrasting the three infection stages: control vs.
cases at 3 d p.i., control vs. cases at 5 d p.i. and cases at 3 vs. 5 d p.i. the choice of a minimum absolute log2fc ≥ 2 enabled considering only those genes that truly reflect changes between wt and ly6e ∆hsc samples, whilst maintaining a computationally manageable number of deg for network reconstruction. the latter is essential for obtaining accurate network sparseness values, as this is a main feature of gene networks [5]. for both genotypes and organs, the results of the differential expression analyses reveal that mhv injection triggers a progressive genetic program from the control situation to the mhv-infected scenario at 5 d p.i., as shown in figure 3a. the absolute number of deg between control vs. cases at 5 d p.i. was considerably larger than in the comparison between control vs. cases at 3 d p.i. furthermore, in all cases, most of the deg in control vs. cases at 3 d p.i. are also differentially-expressed in the control vs. cases at 5 d p.i. comparison, as shown in figure 4. regarding gene fold change, an overall genetic up-regulation is observed upon infection. around 70% of deg are upregulated for all the comparisons performed for wt samples, as shown in figure 3b. by contrast, a dramatic reduction in this genetic up-regulation is observed in knockout samples, even limiting upregulated genes to nearly 50% in the control vs. cases at 3 d p.i. comparison of liver ly6e ∆hsc samples. the largest differences are observed in the comparison of controls vs. cases at 5 d p.i. (figures a3 and a4). these deg are of great interest for the understanding of the immune response of both wt and ko mice to viral infection. these genes were selected to filter the original dataset for later network reconstruction. the commonalities between wt and ko control samples for both organs were also verified through differential expression analysis following the same criteria (log2fc > 2, p value < 0.05).
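the selection criteria applied in these comparisons (an absolute log2fc of at least 2 and a benjamini-hochberg adjusted p-value below 0.05) can be sketched in python as follows. the sketch only covers the thresholding step and assumes that log2 fold changes and raw p-values have already been produced by a differential expression tool such as limma-voom; the function names and the toy values are hypothetical, and `>=` is used for the fold-change cutoff although the text also writes the criterion as `>`.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_deg(results, min_abs_log2fc=2.0, alpha=0.05):
    """Select DEG from `results`, a map gene -> (log2 fold change, raw p-value).

    Returns a map gene -> "up" or "down" for the genes passing both cutoffs.
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][1] for g in genes])
    return {
        g: ("up" if results[g][0] > 0 else "down")
        for g, q in zip(genes, adjusted)
        if abs(results[g][0]) >= min_abs_log2fc and q < alpha
    }


# Hypothetical test results for five genes (illustrative values only).
results = {
    "g1": (3.1, 0.001),
    "g2": (-2.5, 0.004),
    "g3": (1.2, 0.0005),   # significant, but fold change too small
    "g4": (2.8, 0.20),     # large fold change, but not significant
    "g5": (0.1, 0.9),
}
deg = select_deg(results)
```

with these toy values, g1 is reported as upregulated and g2 as downregulated, while g3, g4 and g5 are discarded. counting the "up" and "down" labels per comparison gives the kind of proportion summarized in figure 3b.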
the numbers of deg between wt and ko liver control samples (2) and between wt and ko spleen control samples (20) were not considered significant, so samples were taken as analogous starting points for infection. as stated above, the samples were arranged both organ- and genotype-wise in order to generate networks which would model the progress of the disease in each scenario. gcns were inferred from log2-normalized expression datasets. a count of 1 was added before log2 normalization so that the problem with remaining zero values was avoided. each network was generated exclusively taking into consideration its corresponding deg at control vs. cases at 5 d p.i., where larger differences were observed. four networks were then reconstructed from these previously-identified deg for liver wt samples (1133 genes), liver ko samples (1153 genes), spleen wt samples (506 genes) and spleen ko samples (426 genes). this approach results in the modeling of only those relationships that are related to the viral infection. each sample set was then fed to engnet for the reconstruction of the subsequent network. genes that remained unconnected due to weak relationships, which do not overcome the set threshold, were removed from the networks. furthermore, engnet-generated models outperformed other well-known inference approaches, as detailed in appendix b. topological parameters were estimated and added as node attributes using igraph, together with log2fc, prior to cytoscape import. specifically, networks were simplified by removing potential loops and multiple edges. the topological clustering analysis of the reconstructed networks revealed neat modules in all cases, as shown in figure a5. the number of clusters identified in each network, as well as the number of genes harbored in the clusters, are shown in table a1. as already mentioned, according to gene network theory, nodes contained within the same cluster are often involved in the same biological process [5, 81].
in this context, the go-based enrichment analyses over the identified clusters may well provide an idea of the affected functions. only clusters containing more than 10 genes were considered, since this is the minimum number of elements required by the enrichment tool clusterprofiler. the results of the enrichment analyses revealed that most go terms were not shared between wt and ko homologous samples, as shown in figure 5. in order to further explore the reconstructed networks, the intersection of ko and wt networks of the same organ was computed. this refers to the genes and relationships that are shared between both genotypes for a specific organ. additionally, the genes and relationships that were exclusively present in the wt and ko samples were also estimated, as shown in figure a6. the enrichment analyses over the nodes, separated using this criterion, would reveal the biological processes that make the difference between ly6e ∆hsc mice and wt ones. the results of such analyses are shown in figure a7. finally, the exploration of nodes' degree distribution would reveal those genes that can be considered hubs. those nodes comprised within the top genes with highest degree (degree > q3 + 1.5 × iqr), also known as upper outliers in the degree distribution, were considered hubs. a representation of nodes' degree distribution throughout the four reconstructed networks is shown in figure 6. these distributions are detailed in figure a8. this method provided four cutoff values for the degree, 24, 39, 21 and 21, respectively for liver wt and ko, spleen wt and ko networks. above these thresholds, nodes would be considered as hubs in each network. these hubs are shown in tables a2-a5.
figure 5. enrichment analyses performed over the main clusters identified in wt and ko networks of (a) liver and (b) spleen networks.
gene ratio is defined as the number of genes used as input for the enrichment analyses that are associated with a particular go term, divided by the total number of input genes.
figure 6. boxplots representative of the degree distributions for each one of the four reconstructed networks. identified hubs, according to the q3 + 1.5 × iqr criterion, are highlighted in red. the degree cutoffs, above which nodes would be considered as hubs, were 24, 39, 21 and 21, respectively for liver wt, liver ko, spleen wt and spleen ko networks. note that degree is represented in a log scale given that the reconstructed networks present a scale-free topology.
in this work four gene networks were reconstructed to model the genetic response to mhv infection in two tissues, liver and spleen, and in two different genetic backgrounds, wild type and ly6e ∆hsc. samples were initially explored in order to design an inference rationale. not only did the designed approach reveal major differences between the genetic programs in each organ, but also between different subgroups of samples, in a time-series-like manner. noticeably, disparities between wt and ly6e ∆hsc samples were observed in both tissues, and differential expression analyses revealed relevant differences in terms of the immune response generated. thus, our results predict the impact of ly6e ko on hsc, which resulted in an impaired immune response compared to the wt situation. overall, results indicate that the reconstruction rationale, elucidated from exploratory findings, is suitable for the modeling of the viral progression. regarding the variance in gene expression in response to virus, pca and k-medoid clustering revealed strong differences between samples corresponding to liver and spleen, respectively (figure 2a). these differences set the starting point for the modeling approach, in which samples corresponding to each organ were analyzed independently.
this modus operandi is strongly supported by the tropism that viruses exhibit for certain tissues, which ultimately results in a differential viral incidence and load depending on the organ [82]. in particular, the liver is the target organ of mhv, identified as the main disease site [83]. on the other hand, the role of the spleen in innate and adaptive immunity against mhv has been widely addressed [84, 85]. the organization of this organ allows blood filtration for the presentation of antigens to cognate lymphocytes by the antigen presenting cells (apcs), which mediate the immune response exerted by t and b cells [86]. as stated before, pca revealed differences between the three sample groups on each organ: control and mhv-infected at 3 and 5 d p.i. interestingly, between-group differences are especially clear for liver samples (figure 2b), whereas spleen samples are displayed in a continuum-like way. this becomes more evident in organ-wise pca (figure 2), and was later confirmed by the exploration of the top 500 most variable genes and differential expression analyses (figure a2). furthermore, no clear differences between wt and ly6e ∆hsc samples are observed in any of these analyses, although the examination of the differential expression and network reconstruction did expose divergent immune responses for both genotypes. the differential expression analyses revealed the progressive genetic response to virus for both organs and genotypes (figures 3a and 4). in a wt genetic background, mhv infection causes an overall rise in the expression level of certain genes, as most deg in cases vs. control samples are upregulated. however, in a ly6e ∆hsc genetic background, this upregulation is not as prominent as in a wt background, significantly reducing the number of upregulated genes (figure 3b). besides, the number of deg in each comparison varies from wt to ly6e ∆hsc samples.
looking at the deg in the performed comparisons, for both the wt and ko genotypes, liver cases at 3 d p.i. are more similar to liver cases at 5 d p.i. than to liver controls, since the number of deg between these two infection stages is significantly lower than the number of deg between control and case samples at 3 d p.i. (figure 4a,b). a different situation occurs in the spleen, where wt cases at 3 d p.i. are closer to control samples (figure 4c), whereas ko cases at 3 d p.i. seem to be more related to cases at 5 d p.i. (figure 4d). this was already suggested by hierarchical clustering in the analysis of the top 500 most variable genes, and could be indicative of a different progression of the infection impact on both organs, which could be modulated by gene ly6e, at least for the spleen samples. moreover, the results of the deg analyses indicate that the sole knockout of gene ly6e in hsc considerably affects the upregulating genetic program normally triggered by viral infection in wild type individuals (in both liver and spleen). interestingly, there are some genes in each organ and genotype that are differentially expressed in every comparison between the three possible sample types: controls, cases at 3 d p.i. and cases at 5 d p.i. these genes, which we termed highly deg, could be linked to the progression of the infection, as changes in their expression level occur with days post injection, according to the data. the rest of the deg show a rise or fall between two sample types that does not change significantly in the third sample type. by contrast, highly deg, shown in table a6, exhibited three different expression patterns: (i) their expression level, initially low, rises from control to cases at 3 d p.i. and then rises again in cases at 5 d p.i. (ii) their expression level, initially high in control samples, falls at 3 d p.i. and falls even more in cases at 5 d p.i.
(iii) their expression level, initially low, rises from control to cases at 3 d p.i. but then falls in cases at 5 d p.i., although it remains higher than the initial expression level. these expression patterns, which are shown in figure a9, might be used to keep track of the disease progression, differentiating early from late infection stages. in some cases, these genes exhibited inconsistent expression levels, especially in cases at 5 d p.i., which indicates the need for further experimental designs targeting these genes. highly deg could be correlated with the progression of the disease, as in regulation types (i) and (ii) or, by contrast, be required exclusively at initial stages, as in regulation type (iii). notably, genes gm10800 and gm4756 are predicted genes which, to date, have been poorly described. according to the string database [79], gm10800 is associated with gene lst1 (leukocyte-specific transcript 1 protein), which has a possible role in modulating immune responses. in fact, gm10800 is homologous to human gene piro (progranulin-induced-receptor-like gene during osteoclastogenesis), related to bone homeostasis [87, 88]. thus, we hypothesize that bone marrow-derived cell lines, including erythrocytes and leukocytes (immunity effectors), could also be regulated by gm10800. on the other hand, gm4756 is not associated with any other gene according to string. protein gm4756 is homologous to human protein dhrs7 (dehydrogenase/reductase sdr family member 7) isoform 1 precursor. nonetheless, to the best of our knowledge, these genes have not been previously related to ly6e, and could play a role in the immune processes mediated by this gene. finally, highly deg were not found exclusively present in either wt or ko networks; instead, these were common nodes of these networks for each organ. this suggests that highly deg might be of core relevance upon mhv infection, with a role in those processes independent of ly6e ∆hsc.
besides, genes hykk, ifit3 and ifit3b, identified as highly deg throughout liver ly6e ∆hsc samples, were also identified as hubs in the liver ko network. also, gene saa3, highly deg across spleen ly6e ∆hsc samples, was considered a hub in the spleen ko network. nevertheless, these highly deg require further experimental validation. the enrichment analyses of the identified clusters in each network revealed that most go terms are not shared between the two genotypes (figure 5), despite the considerable number of genes shared between the two genotypes for the same organ. the network reconstructed from liver wt samples reflects a strong response to viral infection, involving leukocyte migration or cytokine and interferon signaling, among others. these processes, closely related to immune processes, are not observed in its ko counterpart. the liver wt network presented four clusters (figure a5a). its cluster 1 regulates processes related to leukocyte migration, showing the implication of receptor ligand activity and cytokine signaling, which possibly mediates the migration of the involved cells. cluster 2 is related to interferon-gamma for the response to mhv, whereas cluster 3 is probably involved in the inflammatory response mediated by pro-inflammatory cytokines. last, cluster 4 is related to cell extravasation, or the exit of blood cells from blood vessels, with the participation of gene nipal1. the positive regulation observed across all clusters suggests the activation of these processes. overall, hub genes in this network have been related to the immune response to viral infection, as the innate immune response to the virus is mediated by interferons. meanwhile, the liver ko network showed three main clusters (figure a5b). its cluster 1 would also be involved in defense response to virus, but other processes observed in the liver wt network, like leukocyte migration or cytokine activity, are not observed in this cluster or in the others.
cluster 2 is then related to the catabolism of small molecules and cluster 3 is involved in acid biosynthesis. these processes are certainly ambiguous and do not correspond to the immune response observed in the wt situation, which suggests a decrease in the immune response to mhv as a result of ly6e ablation in hsc. on the other hand, spleen wt samples revealed high nuclear activity potentially involving nucleosome remodeling complexes and changes in dna accessibility. histone modification is a type of epigenetic modulation which regulates gene expression. taking into account the central role of the spleen in the development of immune responses, the manifested relevance of chromatin organization could be accompanied by changes in the accessibility of certain dna regions with implications in the spleen-dependent immune response. this is supported by the reduced reaction capacity in the first days post-infection of ly6e ∆hsc samples compared to wt, as indicated by the number of deg between control and cases at 3 d p.i. for these genotypes. the spleen wt network displayed three clusters (figure a5c). cluster 1, whose genes were all upregulated in ly6e ∆hsc samples at 5 d p.i. compared to mock infection, is mostly involved in nucleosome organization and chromatin remodelling, together with cluster 3. cluster 2 would also be related to dna packaging complexes, possibly in response to interferon, similarly to liver networks. instead, in spleen ko most genes take part in processes related to the extracellular matrix. in the spleen ko network, four clusters were identified (figure a5d). cluster 1 is related to the activation of an immune response, but also, along with clusters 2 and 4, to the extracellular matrix, possibly in relation with collagen, highlighting its role in the response to mhv. cluster 3 is involved in protease binding.
the dramatic shutdown in the ko network of the nuclear activity observed in the spleen wt network leads to the hypothesis that the chromatin remodeling activity observed could be related to the activation of certain immunoenhancer genes, modulated by gene ly6e. in any case, further experimental validation of these results would provide meaningful insights in the face of potential therapeutic approaches (see appendix a for more details). the exploration of node membership, depending on whether nodes belonged exclusively to the wt or ko networks or, by contrast, were present in both networks, helped to understand the impairment caused by ly6e ∆hsc. in this sense, go enrichment analyses over these three defined categories of nodes in the liver networks revealed that genes at their intersection are mainly related to cytokine production, leukocyte migration and inflammatory response regulation, in accordance with the phenotype described for mhv infection [89]. however, a differential response to the virus is observed in wt mice compared to ly6e-ablated mice. the nodes exclusively present in the wt liver network are related to processes like regulation of immune effector process, leukocyte mediated immunity or adaptive immune response. these processes, which are found at a relatively high gene ratio, are not represented by nodes exclusively present in the liver ko network. additionally, genes exclusively present in the wt network and in the intersection network are upregulated in case samples with respect to controls (figure a6a), which suggests the activation of the previously mentioned biological processes. on the other hand, genes exclusively present in the liver ko network, mostly down-regulated, were found to be associated with catabolism. as for the spleen networks, genotype-wise go enrichment results revealed that the previously mentioned intense nuclear activity involving protein-dna complexes and nucleosome assembly is mostly due to wt-exclusive genes.
actually, these biological processes could be pinpointing cell replication events. analogously to the liver case, genes that were found exclusively in the wt network and in the intersection network are mostly upregulated, whereas in the case of ko-exclusive genes the upregulation is not that extensive. interestingly, the latter are mostly related to extracellular matrix (ecm) organization, which suggests the relevance of ly6e to these processes. other lymphocyte antigen-6 (ly-6) superfamily members have been related to ecm remodelling processes, such as the urokinase receptor (upar), which participates in the proteolysis of ecm proteins [90]. however, to the best of our knowledge, the implications of ly6e in ecm have not been reported. the results presented are largely consistent with those by pfaender et al. [54], who observed a loss of genes associated with the type i ifn response, inflammation, antigen presentation, and b cells in infected ly6e ∆hsc mice. genes stat1 and ifit3, selected in their work for their high variation in the absence of ly6e, were identified as hub genes in the networks reconstructed from liver wild type and knockout samples, respectively. it should be noted that our approach differs significantly from the one carried out in the original study. in this particular case, we consider that the reconstruction of gcn enables a more comprehensive analysis of the data, potentially finding the key genes involved in the immune response onset and their relationships with other genes. for instance, the transcriptomic differences between liver and spleen upon ly6e ablation become more evident using gcn. altogether, the presented results show the relevance of gene ly6e in the immune response against the infection caused by mhv. the disruption of ly6e significantly reduced the immunogenic response, affecting signaling and cell effectors.
these results, combining in vivo and in silico approaches, deepen our understanding of the immune response to viruses at the gene level, which could ultimately assist the development of new therapeutics. for example, these results could inspire prospective studies on ly6e agonist therapies, with the purpose of enhancing the gene expression level via gene delivery. given the relevance of ly6e in sars-cov-2 according to previous studies [54, 91], the overall effects of ly6e ablation in hscs upon sars-cov-2 infection, with special interest in lung tissue, might show similarities with the deficient immune response observed in the present work. in this work we have presented an application of gene co-expression networks to analyze the global effects of ly6e ablation on the immune response to mhv coronavirus infection. to do so, the progression of the mhv infection at the genetic level was evaluated in two genetic backgrounds: wild type mice (wt, ly6efl/fl) and ly6e knockout mutant mice (ko, ly6e ∆hsc). for these, viral progression was assessed in two different organs, liver and spleen. the proposed reconstruction rationale revealed significant differences between mhv-infected wt and ly6e ∆hsc mice for both organs. we also observed that mhv infection triggers a progressive genetic response of an upregulating nature in both liver and spleen. in addition, the results suggest that the ablation of gene ly6e in hsc caused an impaired genetic response in both organs compared to wt mice. the impact of such ablation is more evident in the liver, consistent with the disease site. at the same time, the immune response in the spleen, which seemed to be mediated by intense chromatin activity in the normal situation, is replaced by ecm remodeling in ly6e ∆hsc mice. we infer that the presence of ly6e limits the damage in the above-mentioned target sites.
we believe that the characterization of these processes could motivate efforts towards novel antiviral approaches. finally, in the light of previous works, we hypothesize that ly6e ablation might show analogous detrimental effects on immunity upon the infection caused by other viruses including sars-cov, mers and sars-cov-2. in future works, we plan to investigate whether the over-expression of ly6e in wt mice has an enhancing effect on immunity. in this direction, ly6e gene mimicking (agonist) therapies could represent a promising approach in the development of new antivirals. the authors declare no conflict of interest.
[figure: top 500 most variable genes across liver samples and top 500 most variable genes across spleen samples; annotations: organ, genotype, sample type; scale: row z-score]
table a1 . number of deg used as input to engnet for network reconstruction and their subsequent distribution in inferred networks. genes that were not assigned to a cluster (or were comprised in minor clusters) were not taken into consideration for enrichment analyses.
input genes 1133 1153 506 426
network genes 1118 1300 485 403
cluster 1 262 284 180 109
cluster 2 218 379 255 190
cluster 3 579 624 36 77
cluster 4 59
figure a7 . enrichment analyses based on node exclusiveness of (a) liver and (b) spleen networks.
wt refers to nodes exclusively present in those networks reconstructed from wt samples; ko refers to nodes exclusively present in networks reconstructed from ly6e ∆hsc samples; both refers to nodes shared between wt and ko networks. gene ratio is defined as the number of genes used as input for the enrichment analyses that are associated with a particular go term, divided by the total number of input genes.
figure a9 . cpm-normalized expression values of highly deg identified across (a) liver wt samples, (b) liver ko samples, (c) spleen wt samples and (d) spleen ko samples. dashed lines separate samples from the three groups under study: controls, cases at 3 d p.i. and cases at 5 d p.i. note that sample order within the same group is exchangeable.
the reconstruction method employed in this case study was validated against three other well-known inference methods: aracne [93], wgcna [94] and wto [95]. the output of each reconstruction method, using default values (including engnet), was compared to a gold standard (gs) retrieved from the string database. four different gss were taken into consideration, since these were reconstructed from the deg that were identified in the comparison of control vs. case samples at 5 d p.i., as shown in section 4.2. these deg were mapped to the string database gene identifiers, selecting mus musculus as model organism (taxid: 10090). a variable percentage of deg (6-20%) could not be assigned to a string identifier, and were thus removed from the analysis. the interactions exclusively concerning the resulting deg in each case were retrieved from the string database. these interaction networks would serve as gss. the mentioned deg (without unmapped identifiers) would also serve as input for the four reconstruction methods to be compared. the aracne networks were inferred using the spearman correlation coefficient following the implementations in the minet [96] r package.
in this case, mutual information values were normalized and scaled to the range 0-1. on the other hand, the wgcna networks were reconstructed following the original tutorial provided by the authors [97]. the power was set to 5. additionally, the wto networks were built using pearson correlation in accordance with the documentation. absolute values were taken as relationship weights. finally, engnet networks were inferred using the default parameters described in the original article by gómez-vela et al. [33]. for the comparison, the receiver operating characteristic (roc) curve was estimated using the proc [98] r package. roc curves are shown in figure a10. the area under the roc curve (auc) was also computed in each case for the quantitative comparison of the methods, as shown in figure a11a. the auc compares the reconstruction quality of each method against random prediction. an auc ≈ 1 corresponds to a perfect classifier, whereas an auc ≈ 0.5 approximates a random classifier. thus, the higher the auc, the better the predictions. on average, engnet provided the best auc results, whilst maintaining a good discovery rate. in addition, engnet provided relatively sparse networks compared to wgcna, as shown in figure a11b. this is considered of relevance given that sparseness is a main feature of gene networks [7].
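the auc comparison described above can be illustrated with a minimal python sketch. it is not the pipeline used in the study (which relied on the proc r package); the gene names, edge weights and gold standard below are hypothetical toy values, and the auc is computed with the rank-based (mann-whitney) formulation, treating every gene pair absent from the inferred network as having weight 0.

```python
from itertools import combinations


def edge_key(a, b):
    """Order-independent identifier for an undirected edge."""
    return tuple(sorted((a, b)))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as 0.5), i.e. the Mann-Whitney formulation."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def network_auc(genes, predicted, gold_standard):
    """Score every possible gene pair; pairs missing from the inferred
    network get weight 0 (an assumption of this sketch)."""
    scores, labels = [], []
    for a, b in combinations(sorted(genes), 2):
        key = edge_key(a, b)
        scores.append(predicted.get(key, 0.0))
        labels.append(key in gold_standard)
    return roc_auc(scores, labels)


# Hypothetical toy data: four genes, two gold-standard interactions.
genes = ["g1", "g2", "g3", "g4"]
gold = {edge_key("g1", "g2"), edge_key("g2", "g3")}
pred = {
    edge_key("g1", "g2"): 0.9,
    edge_key("g2", "g3"): 0.6,
    edge_key("g1", "g4"): 0.7,
    edge_key("g3", "g4"): 0.1,
}
auc = network_auc(genes, pred, gold)
print(auc)
```

in this toy case one false edge (g1, g4) is weighted above a true edge (g2, g3), so the auc falls below 1; a method whose weights ranked all gold-standard edges first would reach an auc of 1.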
hosts and sources of endemic human coronaviruses identification and characterization of severe acute respiratory syndrome coronavirus replicase proteins an orally bioavailable broad-spectrum antiviral inhibits sars-cov-2 in human airway epithelial cell cultures and multiple coronaviruses in mice a first course in systems biology computational methods for gene regulatory networks reconstruction and analysis: a review gene network coherence based on prior knowledge using direct and indirect relationships gene regulatory network inference: data integration in dynamic models-a review structure optimization for large gene networks based on greedy strategy comprehensive analysis of the long noncoding rna expression profile and construction of the lncrna-mrna co-expression network in colorectal cancer a new cytoscape app to rate gene networks biological coherence using gene-gene indirect relationships evaluation of gene association methods for coexpression network construction and biological knowledge discovery a comparative study of statistical methods used to identify dependencies between gene expression signals ranking genome-wide correlation measurements improves microarray and rna-seq based global and targeted co-expression networks wisdom of crowds for robust gene network inference comparison of co-expression measures: mutual information, correlation, and model based indices mider: network inference with mutual information distance and entropy reduction bioinformatics analysis and identification of potential genes related to pathogenesis of cervical intraepithelial neoplasia lsd1 activates a lethal prostate cancer gene network independently of its demethylase function diverse type 2 diabetes genetic risk factors functionally converge in a phenotype-focused gene network survivin (birc5) cell cycle computational network in human no-tumor hepatitis/cirrhosis and hepatocellular carcinoma transformation coexpression network analysis in chronic hepatitis b and c hepatic 
lesions reveals distinct patterns of disease progression to hepatocellular carcinoma reverse genetics approaches for the development of influenza vaccines how viral genetic variants and genotypes influence disease and treatment outcome of chronic hepatitis b. time for an individualised approach? accessory proteins 8b and 8ab of severe acute respiratory syndrome coronavirus suppress the interferon signaling pathway by mediating ubiquitindependent rapid degradation of interferon regulatory factor 3 interferon-stimulated genes: a complex web of host defenses distinct lymphocyte antigens 6 (ly6) family members ly6d, ly6e, ly6k and ly6h drive tumorigenesis and clinical outcome emerging role of ly6e in virus-host interactions identification of chicken lymphocyte antigen 6 complex, locus e (ly6e, alias sca2) as a putative marek's disease resistance gene via a virus-host protein interaction screen polymorphisms in ly6 genes in msq1 encoding susceptibility to mouse adenovirus type 1 interferon-inducible ly6e protein promotes hiv-1 infection ly6e mediates an evolutionarily conserved enhancement of virus infection by targeting a late entry step flavivirus internalization is regulated by a size-dependent endocytic pathway ensemble and greedy approach for the reconstruction of large gene co-expression networks identification of candidate mirna biomarkers for pancreatic ductal adenocarcinoma by weighted gene co-expression network analysis a comprehensive analysis on preservation patterns of gene co-expression networks during alzheimer's disease progression gene co-expression network analysis for identifying modules and functionally enriched pathways in type 1 diabetes gene co-expression analysis for functional classification and gene-disease predictions systems analysis reveals complex biological processes during virus infection fate decisions identifying novel biomarkers of the pediatric influenza infection by weighted co-expression network analysis comprehensive innate immune 
profiling of chikungunya virus infection in pediatric cases linking cell dynamics with gene coexpression networks to characterize key events in chronic virus infections discovering preservation pattern from co-expression modules in progression of hiv-1 disease: an eigengene based approach the effect of inhibition of pp1 and tnfα signaling on pathogenesis of sars coronavirus the regulatory role of microrna-mrna co-expression in hepatitis b virus-associated acute liver failure sars-cov-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes murine hepatitis virus strain 1 produces a clinically relevant model of severe acute respiratory syndrome in a/j mice the nucleocapsid proteins of mouse hepatitis virus and severe acute respiratory syndrome coronavirus share the same ifn-β antagonizing mechanism: attenuation of pact-mediated rig-i/mda5 activation murine hepatitis virus nsp14 exoribonuclease activity is required for resistance to innate immunity the interferon-stimulated gene ifitm3 restricts west nile virus infection and pathogenesis organization, evolution and functions of the human and mouse ly6/upar family genes interferon-stimulated gene ly6e enhances entry of diverse rna viruses chicken interferome: avian interferon-stimulated genes identified by microarray and rna-seq of primary chick embryo fibroblasts treated with a chicken type i interferon (ifn-α) integrative network biology framework elucidates molecular mechanisms of sars-cov-2 pathogenesis edger: a bioconductor package for differential expression analysis of digital gene expression data geoquery: a bridge between the gene expression omnibus (geo) and bioconductor evaluation of statistical methods for normalization and differential expression in mrna-seq experiments orchestrating high-throughput genomic analysis with bioconductor heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences systems approach 
identifies tga 1 and tga 4 transcription factors as important regulatory components of the nitrate response of a rabidopsis thaliana roots computational inference of gene co-expression networks for the identification of lung carcinoma biomarkers: an ensemble approach step-by-step construction of gene co-expression networks from high-throughput arabidopsis rna sequencing data limma powers differential expression analyses for rna-sequencing and microarray studies precision weights unlock linear model analysis tools for rna-seq read counts false discovery control with p-value weighting the igraph software package for complex network research cytoscape 2.8: new features for data integration and network visualization network biology using cytoscape from within r gene co-opening network deciphers gene functional relationships community structure analysis of biological networks a multi-algorithm clustering plugin for cytoscape topological analysis and interactive visualization of biological networks and protein structures selectivity determinants of gpcr-g-protein binding boxplot-based outlier detection for the location-scale family outlier detection: how to threshold outlier scores? gene ontology consortium: going forward clusterprofiler: an r package for comparing biological themes among gene clusters systematic and integrative analysis of large gene lists using david bioinformatics resources the string database in 2017: quality-controlled protein-protein association networks, made broadly accessible massive-scale gene co-expression network construction and robustness testing using random matrix theory uncovering biological network function via graphlet degree signatures. 
cancer inform viral pathogenesis structure-guided mutagenesis alters deubiquitinating activity and attenuates pathogenesis of a murine coronavirus crosstalk of liver immune cells and cell death mechanisms in different murine models of liver injury and its clinical relevance a disparate subset of double-negative t cells contributes to the outcome of murine fulminant viral hepatitis via effector molecule fibrinogen-like protein 2 structure and function of the immune system in the spleen progranulin and a five transmembrane domain-containing receptor-like gene are the key components in receptor activator of nuclear factor κb (rank)-dependent formation of multinucleated osteoclasts rank is essential for osteoclast and lymph node development autologous intramuscular transplantation of engineered satellite cells induces exosome-mediated systemic expression of fukutin-related protein and rescues disease phenotype in a murine model of limb-girdle muscular dystrophy type 2i the intriguing role of soluble urokinase receptor in inflammatory diseases ly6e restricts the entry of human coronaviruses, including the currently pandemic sars-cov-2 aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context wgcna: an r package for weighted correlation network analysis wto: an r package for computing weighted topological overlap and a consensus network with integrated visualization tool ar/bioconductor package for inferring large transcriptional networks using mutual information a general framework for weighted gene co-expression network analysis proc: an open-source package for r and s+ to analyze and compare roc curves this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license key: cord-328287-3qgzulgj authors: moni, mohammad ali; liò, pietro title: network-based analysis of comorbidities risk during an infection: sars and hiv case studies date: 2014-10-24 journal: 
bmc bioinformatics doi: 10.1186/1471-2105-15-333 sha: doc_id: 328287 cord_uid: 3qgzulgj background: infections are often associated with comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. sars is a threat which is similar to mers virus, but the comorbidity is the key aspect to underline their different impacts. one uk doctor says "i’d rather have hiv than diabetes" as life expectancy among diabetes patients is lower than that of hiv patients. however, hiv has a comorbidity impact on diabetes. results: we present a quantitative framework to compare and explore comorbidity between diseases. by using neighbourhood based benchmark and topological methods, we have built a comorbidity relationship network based on the omim and our identified significant genes. then, based on the gene expression, ppi and signalling pathway data, we investigate the comorbidity association of these 2 infective pathologies with 7 other diseases (heart failure, kidney disorder, breast cancer, neurodegenerative disorders, bone diseases, type 1 and type 2 diabetes). phenotypic association is measured by calculating both the relative risk, as a quantified measure of the comorbidity tendency of two diseases, and the ϕ-correlation, to measure the robustness of the comorbidity associations. the differential gene expression profiling strongly suggests that the response of sars-affected patients is mainly an innate inflammatory response and that it significantly dysregulates a large number of genes, pathways and ppi subnetworks in different pathologies such as chronic heart failure (21 genes), breast cancer (16 genes) and bone diseases (11 genes). hiv-1 induces comorbidity relationships with many other diseases, with particularly strong correlation with the neurological, cancer, metabolic and immunological diseases. similar comorbidity risk is observed from the clinical information.
moreover, sars and hiv infections dysregulate 4 genes (anxa3, gns, hist1h1c, rasa3) and 3 genes (hba1, tfrc, ghitm) respectively that affect the ageing process. it is notable that hiv and sars similarly dysregulated 11 genes and 3 pathways. only 4 significantly dysregulated genes are common between sars-cov and mers-cov, including nfkbia that is a key regulator of immune responsiveness implicated in susceptibility to infectious and inflammatory diseases. conclusions: our method presents a ripe opportunity to use data-driven approaches for advancing our current knowledge on disease mechanism and predicting disease comorbidities in a quantitative way. electronic supplementary material: the online version of this article (doi:10.1186/1471-2105-15-333) contains supplementary material, which is available to authorized users. the term "comorbidity" refers to the coexistence of multiple diseases or disorders in relation to a primary disease or disorder in an individual [1] . a comorbidity relationship between two diseases exists whenever they appear simultaneously in a patient more than chance alone [2] . it represents the co-occurrence of diseases or presence http://www.biomedcentral.com/1471-2105/ 15/333 to chronic obstructive pulmonary disease (copd) [6, 7] , obesity [8] , mental disorders [9] , immune-related diseases [10] , cancer [11] etc. comorbidity can be attributed to the disease connections on the molecular level, such as dysregulated genes, ppis (protein-protein interactions), and metabolic pathways as potential causes of comorbidity [1, 3, 12, 13] . from a genetic perspective, a pair of diseases is connected because they have both been associated with the same dysregulated genes [14, 15] , whereas from a proteomics perspective phenotypically similar diseases are related via biological modules such as ppis or molecular pathways [16, 17] . 
population-based disease association is important in conjunction with molecular and genetic data to uncover the molecular origins of diseases and disease comorbidities. patient medical records contain important information regarding the co-occurrence of diseases affecting the same patient [2]. during the last few years, several researchers have conducted disease comorbidity analyses to understand the origins of many diseases [1, 12, 18]. goh, cusick, valle, childs, vidal, barabasi et al. and feldman, rzhetsky, vitkup et al. built networks of gene-disease associations by connecting diseases that have been associated with the same genes [14, 15], whereas lee, park, kay, christakis, oltvai and barabási et al. constructed a network in which two diseases are linked if metabolic reactions are associated between them [13]. disease associations from a proteomic point of view have been studied by rual, venkatesan, hao, hirozane-kishikawa, dricot, li, berriz, gibbons, dreze, ayivi-guedehoussou et al. and stelzl, worm, lalowski, haenig, brembeck, goehler, stroedicke, zenkner, schoenherr, koeppen et al. [19, 20]. rzhetsky, wajngurt, park and zheng et al. inferred the comorbidity links between 161 disorders from the disease history of 1.5 million patients [12]. however, all of these efforts have focused on the role of a single molecular or phenotypic measure to capture disease-disease relationships. in our work we have used disease-gene associations, ppis, molecular pathways and clinical information to obtain statistically significant associations and comorbidity risks among diseases. inflammation is a hallmark of many serious human infectious diseases associated with a wide variety of infections, such as hiv-1 [21]. uk doctor max pemberton says "i'd rather have hiv than diabetes" as life expectancy among diabetes patients is lower than that of hiv patients [22]. however, hiv has a comorbidity impact on diabetes.
the flu can also cause complications, including bacterial pneumonia, or the worsening of chronic health problems. asthma is the most common comorbidity in patients hospitalized for swine influenza (h1n1) infection [23]. dengue can cause myocardial impairment, arrhythmias and, occasionally, fulminant myocarditis [24]. chronic medical conditions, such as heart disease, lung disease, diabetes, renal disease, rheumatologic disease, dementia, and stroke, are risk factors for influenza complications [25]. common chronic infections such as periodontitis or infection with helicobacter pylori may also increase stroke risk [26]. moreover, the severity of pneumonia in patients coinfected with influenza virus and bacteria is significantly higher than in those infected with bacteria alone. the incidence of flu is higher in children and younger adults than in older individuals, but influenza-associated morbidity and mortality increase with age, especially for individuals with underlying medical conditions such as chronic cardiovascular diseases [27]. during the ageing process the immune system becomes compromised, which causes increasing inflammation [28]. in particular, chronic inflammation (inflammageing) and metabolic function are strongly affected by the ageing process [29]. the ageing of populations is leading to an unprecedented increase in diseases such as cancer and in fatalities. it is reported that 80% of the elderly population has three or more chronic conditions [30]. on the other hand, respiratory viruses are an emerging threat to global health security and have led to worldwide epidemics with substantial morbidity and mortality [31]. coronaviruses (covs) cause respiratory and enteric diseases in humans and other animals that induce fatal respiratory, gastrointestinal and neurological disease. severe acute respiratory syndrome (sars) is an epidemic human disease caused by a coronavirus (cov) called sars-associated coronavirus (sars-cov) [32].
sars patients may present with a spectrum of disease severity ranging from flu-like symptoms and viral pneumonia to acute respiratory distress syndrome and death [33]. most of the deaths were attributed to complications related to sepsis, ards and multiorgan failure, which occurred commonly in the elderly with comorbidities [34]. age and comorbidity (e.g. diabetes mellitus, heart disease) were consistently found to be significant independent predictors of various adverse outcomes in sars [35]. children with sars have a better prognosis than adults [34]. advanced age and comorbidities were significantly associated with increased risk of sars-cov related death, due to acute respiratory distress syndrome [35]. a mild degree of anaemia is common in sars-infected patients, and patients who have recovered from sars show symptoms of psychological trauma [34]. another novel coronavirus, mers-cov, which is a new threat for public health, has similar clinical characteristics to sars-cov, but the comorbidity is the key aspect to underline their different impacts [36, 37]. mers-cov causes respiratory infections of varying severity and sometimes fatal infections in humans, including kidney failure and severe acute pneumonia [38]. despite sharing some clinical similarities with sars (e.g. fever, cough, incubation period), there are also some important differences, such as the rapid progression to respiratory failure, which we have studied from a comorbidity point of view. infection with the human immunodeficiency virus-1 (hiv) and the resulting acquired immune deficiency syndrome (aids) affects cellular immune regulation [39]. hiv infection severely impacts the immune system, causing phenotypic changes in peripheral cells, and dysregulates the innate immune system [40].
a significant number of hiv-1 infected patients exhibit osteopenia and osteoporosis, leading to a higher incidence of weak and fragile bones during the course of the disease [41]. hiv has also been associated with an increased risk of developing both diabetes and cardiovascular disease [42]. infection with hiv weakens the immune system and reduces the body's ability to fight infections that may lead to cancer [43, 44]. people infected with human immunodeficiency virus (hiv) have a higher risk of some types of cancer (kaposi sarcoma, non-hodgkin lymphoma, cervical cancer, anal, liver and lung cancer, and hodgkin lymphoma) than uninfected people [45]. many people infected with hiv are also infected with other viruses that cause certain cancers [46, 47]. hiv infection, even when controlled by highly active antiretroviral therapy (haart), is being linked to chronic inflammation [48]. people with hiv-1 infection appear to have a markedly higher rate of chronic kidney disease than the general public [49]. this is because some of the risk factors associated with hiv-1 acquisition are the same as those that lead to kidney disease, and because of the virus itself and some therapies (e.g. haart). antiretroviral therapy for hiv may increase the risk of developing metabolic syndrome (abdominal obesity, hyperglycaemia, dyslipidaemia and hypertension) and thus predispose to type 2 diabetes and cardiovascular disease. many of the biologic factors thought to be causally associated with inflammation in hiv disease are also thought to be causally associated with the inflammation of ageing [50]. infections (acute and chronic conditions) are often associated with comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. comorbidities related to flu have been recently investigated [51]. comorbidities for tuberculosis have also been studied recently [52, 53].
to understand the overall mechanism, we have studied the comorbidity associations of sars and hiv infections. both hiv and sars are emerging infectious diseases in the modern world; each has caused global societal and economic impact related to unexpected illnesses and deaths [54]. sars is a significant public health threat and hiv is a long-term chronic infection. we selected these two infections for our study because both are associated with high mortality rates and there are no clinically approved antiviral treatments or vaccines available for either of them. centred on the sars and hiv-1 infections, we have investigated highly heterogeneous disease comorbidity networks using disease-gene associations, ppi subnetworks, molecular pathways and clinical information. we present a systematic and quantitative approach to discovering human disease comorbidities using different sources of available data: mrna expression, protein-protein interactions, signalling pathways, disease-gene associations, disease-disease associations and disease-drug associations. it has been shown that sars coronavirus infects and replicates in a wide variety of host cells in susceptible animals and human beings [55, 56]. to understand the host response to this pathogen, we analysed the gene expression patterns of sars infected patients compared to normal subjects, using oligonucleotide microarray data from the ncbi geo (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse1739) [55]. we analysed the microarray gene expression data of over 8,700 genes from the pbmcs of 10 sars patients and compared them with healthy control samples. we found that 274 genes (p < 0.01, > 1.5 fold change) were differentially expressed compared to healthy controls, of which 120 genes were significantly upregulated and 154 genes were significantly downregulated (see additional file 1: table s1).
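the selection rule quoted above (p < 0.01 and more than 1.5 fold change) can be illustrated with a minimal sketch. the text does not say which statistical test produced the p-values or how the fold change was encoded, so the sketch assumes that each gene already carries a log2 fold change and a p-value, and that the 1.5-fold cut-off is applied symmetrically to up- and downregulation. the function name and the toy gene symbols are hypothetical.

```python
import math


def select_dysregulated(genes, p_cutoff=0.01, fold_cutoff=1.5):
    """Split genes into up- and down-regulated sets.

    `genes` maps a gene symbol to (log2 fold change, p-value).
    Assumption: the fold-change cut-off is symmetric on the log2 scale.
    """
    log_cutoff = math.log2(fold_cutoff)
    up, down = set(), set()
    for symbol, (log2_fc, p_value) in genes.items():
        if p_value >= p_cutoff:
            continue  # not significant, regardless of effect size
        if log2_fc > log_cutoff:
            up.add(symbol)
        elif log2_fc < -log_cutoff:
            down.add(symbol)
    return up, down


# toy values for illustration only, not taken from the study
toy = {
    "gene_a": (1.2, 0.001),   # significant, upregulated
    "gene_b": (-0.9, 0.004),  # significant, downregulated
    "gene_c": (0.3, 0.0005),  # significant, but change too small
    "gene_d": (2.0, 0.2),     # large change, but not significant
}
up, down = select_dysregulated(toy)
```

with the toy input, only `gene_a` and `gene_b` pass both filters; the other two fail one criterion each.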
on the other hand, monocytes are the key immune responsive cells whose function is adversely impacted by hiv-1. hiv-1 infection radically alters the monocyte phenotype, which is reflected in hiv-1 induced gene expression. monocyte gene expression microarray data were collected for control and hiv patients from the ncbi geo (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse18464) [57]. to find the significantly dysregulated genes during hiv-1 infection, we performed a global gene expression analysis. we found that 186 genes (p < 0.01, > 1.5 fold change) were differentially expressed compared to healthy controls, of which 71 genes were upregulated and 115 genes were downregulated (see additional file 2: table s2). using the significantly dysregulated genes of sars (274 genes) and hiv-1 (186 genes) infections together with gene-disease association information, we constructed two gene-disease association networks (gdn), which are used to explore shared genetic associations and disease comorbidity. starting from the bipartite graph, we generated biologically relevant network projections and constructed a multi-relational gene-disease network in which nodes are diseases or genes and edges indicate an association between a gene and a disease. this bipartite graph consists of two disjoint sets of nodes, where one set corresponds to all known genetic disorders and the other set corresponds to all of our identified significant genes for sars and hiv-1 infections. the list of disorders, disease genes, and associations between them was obtained from the online mendelian inheritance in man (omim) [58], a compendium of human disease genes and phenotypes (see details in the methods section). we classified each disorder into one of 21 disorder categories based on the physiological system affected, as introduced in goh, cusick, valle, childs, vidal, barabasi et al. [14].
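the projection of the bipartite gene-disease graph onto disease categories can be sketched as follows. this is only an illustration of the general idea (two categories are linked when they share at least one gene), not the authors' implementation. the association of atm with sars infection, cancer and immunological diseases is taken from the text; `gene_x`, `gene_y` and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_diseases(gene_to_diseases):
    """Project a bipartite gene-disease graph onto the disease side.

    Returns {(disease_a, disease_b): set of shared genes} for every
    pair of diseases that shares at least one gene.
    """
    shared = defaultdict(set)
    for gene, diseases in gene_to_diseases.items():
        # every pair of diseases touching this gene gets an edge
        for a, b in combinations(sorted(diseases), 2):
            shared[(a, b)].add(gene)
    return dict(shared)


toy_associations = {
    "atm": {"sars infection", "cancer", "immunological"},
    "gene_x": {"sars infection", "immunological"},  # hypothetical
    "gene_y": {"metabolic"},                        # hypothetical, no partner
}
links = project_onto_diseases(toy_associations)
```

the size of each shared-gene set can then serve as an edge weight, which matches the way the number of shared genes is used to rank disease categories in the text.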
in the gdn, nodes represent disease classes or genes, and two disorders are connected to each other if they share at least one gene in which mutations are associated with both disease groups (figures 1 and 2). the number of interlinked genes between sars infection and other diseases indicates that the immunological, hematological, neurological, metabolic and dermatological disease categories are strongly associated with the sars infection (see figure 1 and additional file 3: table s3). a few genes are also shared among more than 2 categories of diseases, i.e. those disease groups are also associated with each other through at least that gene. for instance, the gene atm is shared among sars infection, cancer and immunological diseases; therefore, cancer and immunological diseases are also interrelated through atm. among all these disease classes, the immunological class is the most tightly correlated with the sars infection because it shares the highest number of genes (12 genes) with it. on the other hand, the number of associated genes between hiv infection and other diseases indicates that the neurological, metabolic, cancer and hematological disease categories are strongly correlated with the hiv infection (see figure 2 and additional file 4: table s4). a few hiv dysregulated genes are also shared among more than 2 categories of diseases; for example, the gene tgfb1 is shared among hiv infection, cancer and skeletal diseases. it is notable that 11 significant genes (4 upregulated and 7 downregulated) are similarly dysregulated in both sars and hiv infections.

figure 1 the gene-disease association network centred on the sars infection is constructed based on the different categories of diseases that are connected and showed comorbidities with the sars infection through the different genes. red colour represents different categories of disorders and green colour represents different genes that are common with the other categories of disorders. the size of a disease node is proportional to the number of dysregulated genes shared between the infections/disorder groups. a link is placed between a disorder and a disease gene if mutations in that gene lead to the specific disorder.

figure 2 the gene-disease association network centred on the hiv infection is constructed based on the different categories of diseases that are connected and showed comorbidities with the hiv-1 infection through the different genes. red colour represents different categories of disorders and green colour represents different genes that are common with the other categories of disorders. the size of a disease node is proportional to the number of dysregulated genes shared between the infections/disorder groups. a link is placed between a disorder and a disease gene if mutations in that gene lead to the specific disorder.

to observe the association of sars and hiv infections with 7 other important diseases (chronic heart failure, kidney disorders, breast cancer, parkinson, osteoporosis, type 1 and type 2 diabetes), we collected raw mrna microarray data associated with each disease from the gene expression omnibus (http://www.ncbi.nlm.nih.gov/geo/); the accession numbers are gse9006, gse9128, gse15072, gse7158, gse8977 and gse7621 [59]. after several steps of statistical analysis we selected the most significantly over- and under-expressed genes for each infection and disease. we also performed a cross-comparison analysis to find the common significant genes between each disease and the sars/hiv-1 infection. we observed that sars infection shares 21, 12, 16, 5, 11, 11, 11 and 13 genes with chronic heart failure, kidney disorders, breast cancer, parkinson, osteoporosis, hiv-1 infection, type 1 diabetes and type 2 diabetes, respectively.
on the other hand, hiv-1 infection shares 11, 10, 17, 9, 7, 11, 9 and 7 genes with chronic heart failure, kidney disorders, breast cancer, parkinson, osteoporosis, sars infection, type 1 diabetes and type 2 diabetes, respectively. we then built disease-disease relationship networks for sars and hiv-1 infection with the other diseases (see figures 3 (a) and (b), additional file 5: table s5 and additional file 6: table s6). genes do not function alone; they coordinate their activities in the form of complexes or molecular pathways. two diseases are therefore potentially inter-correlated if they share at least one commonly associated pathway. for this reason we used the reactome pathway database [60] and selected the pathways related to these 7 diseases as well as to sars and hiv-1 infections. we observed that the diseases and infections share pathways, as shown in figures 3 (a) and (b), additional file 5: table s5 and additional file 6: table s6. dysregulation in a protein subnetwork may lead to the dysfunction of multiple protein subnetworks, so multiple diseases may be caused by the malfunction of a single protein complex. two diseases are thus potentially related to each other if they share one or more commonly associated protein subnetworks. to identify the associations between diseases based on ppi subnetworks, we used significantly associated disease protein pair data from the hprd database [61]. to find statistically significant associations among diseases, we built disease networks centred on the sars and hiv infections in which two diseases are comorbid if one or more protein subnetworks are associated with both diseases.
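the same sharing criterion is applied three times above: to significant genes, to reactome pathways and to ppi subnetworks. a minimal sketch of that cross-comparison, assuming each disease has already been reduced to a set of identifiers, is given here. the function name and all identifiers in the toy data are hypothetical placeholders, not results from the study.

```python
def shared_with(centre, profiles):
    """Return {disease: shared items} for every disease that shares
    at least one item (gene, pathway or subnetwork) with `centre`."""
    reference = profiles[centre]
    overlaps = {}
    for disease, items in profiles.items():
        if disease == centre:
            continue
        common = reference & items  # set intersection
        if common:
            overlaps[disease] = common
    return overlaps


# placeholder identifiers, for illustration only
toy_profiles = {
    "sars infection": {"item_1", "item_2", "item_3"},
    "hiv-1 infection": {"item_1", "item_4"},
    "kidney disorders": {"item_2", "item_3", "item_5"},
    "osteoporosis": {"item_6"},
}
overlaps = shared_with("sars infection", toy_profiles)
counts = {disease: len(items) for disease, items in overlaps.items()}
```

the `counts` dictionary corresponds to the per-disease shared-gene numbers reported in the text, and a disease with an empty overlap (here `osteoporosis`) simply receives no edge.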
the disease similarity network and the protein-protein interaction network are integrated systematically in a simple and compact manner to formulate the disease comorbidity for the sars and hiv-1 infections, as shown in figures 4 and 5. we showed that sars and hiv infections share ppi subnetworks with the other 7 diseases or infections, similar to the gene-disease and pathway-disease associations (figures 4 and 5). based on the gene expression, protein-protein interaction and molecular pathway data, we found that both sars and hiv-1 infections have a strong association with 8 other diseases or infections (chronic heart failure, kidney disorders, breast cancer, parkinson, osteoporosis, hiv/sars infection, type 1 and type 2 diabetes). these diseases and infections are also strongly correlated among themselves. we present the correlation strength and distance between each pair of these diseases and infections in figure 6. some diseases (such as kidney disorders, breast cancer, osteoporosis and heart failure) are more strongly associated with the sars infection (see figure 6). kidney disorder is also tightly connected with the hiv-1 infection. comorbidity is more probable between tightly connected diseases than between loosely connected ones. patient medical records contain important evidence about the co-occurrence of diseases in the same patient. we therefore constructed a phenotypic disease comorbidity network using 32 million medical records of 13,039,018 patients from medpar and analysed its structural properties to better understand the connections among diseases and infections. nodes are unique diseases and edges indicate comorbidity between diseases. we included edges between disease pairs for which the co-occurrence is significantly greater than the random expectation based on the population prevalence of the diseases.
as pointed out in [2], the relative risk (RR_ij) overestimates relations involving rare infections and diseases, and underestimates relationships between very common disorders or infections. on the other hand, the φ-correlation (φ_ij) underestimates comorbidity between rare and frequent diseases, but discriminates well between associations of disorders with similar prevalence. thus, we built a network by selecting only the statistically significant edges having RR_ij ≥ 20 and φ_ij ≥ 0.06. figure 7 summarises the set of all comorbidity associations among all diseases expressed in the study population as a phenotypic disease network (pdn). in the pdn, nodes are disease phenotypes identified by unique icd-9-cm (international classification of diseases) codes, and links connect phenotypes that show significant comorbidity according to the relative risk RR_ij ≥ 20 and the correlation φ_ij ≥ 0.06. our phenotypic disease network consists of 336 unique disease nodes and 1018 comorbidity relationships. the icd-9-cm diagnosis code for sars-associated coronavirus is 079.82, which falls under icd-9-cm code 079, "viral and chlamydial infection in conditions classified elsewhere and of unspecified site". moreover, icd-9-cm code 480.3 denotes pneumonia due to sars-associated coronavirus. we therefore considered both icd-9-cm codes 079.82 and 480.3 for our phenotypic sars comorbidity study. in our 3 digit code data we considered 079, and in the 5 digit code data we considered 480.3. considering a relative risk RR_ij ≥ 10 between the disease group 079 and other disorder categories, we constructed the pdn shown in figure 8 (a), and considering a relative risk RR_ij ≥ 20 between the disease group 480.3 and other disorder categories, we constructed the pdn shown in figure 8 (b). we present only the most significant relative risk associations (see additional file 7: table s7 and additional file 8: table s8).
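the two comorbidity measures and the edge-selection thresholds can be made concrete with a short sketch. the section does not restate the formulas, so the code assumes the usual definitions for co-occurrence data, with n patients in total, p_i and p_j patients diagnosed with disease i and j, and c_ij patients diagnosed with both: RR_ij = c_ij n / (p_i p_j) and φ_ij = (c_ij n − p_i p_j) / sqrt(p_i p_j (n − p_i)(n − p_j)). the function names and the toy counts are hypothetical, and no significance test is included.

```python
import math


def relative_risk(c_ij, p_i, p_j, n):
    """Observed co-occurrence divided by the co-occurrence expected
    if the two diseases were independent (p_i * p_j / n)."""
    return c_ij * n / (p_i * p_j)


def phi_correlation(c_ij, p_i, p_j, n):
    """Pearson correlation for two binary variables (phi coefficient)."""
    return (c_ij * n - p_i * p_j) / math.sqrt(
        p_i * p_j * (n - p_i) * (n - p_j)
    )


def is_comorbid(c_ij, p_i, p_j, n, rr_min=20.0, phi_min=0.06):
    """Keep an edge only if both measures pass their thresholds."""
    return (
        relative_risk(c_ij, p_i, p_j, n) >= rr_min
        and phi_correlation(c_ij, p_i, p_j, n) >= phi_min
    )


# toy counts for illustration only
n, p_i, p_j, c_ij = 10_000, 100, 50, 20
rr = relative_risk(c_ij, p_i, p_j, n)     # 20 observed vs 0.5 expected
phi = phi_correlation(c_ij, p_i, p_j, n)
keep = is_comorbid(c_ij, p_i, p_j, n)
```

requiring both thresholds reflects the complementary biases described above: the relative risk is inflated for rare diseases, while φ is deflated when the two prevalences differ strongly.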
the icd-9-cm diagnosis codes for human immunodeficiency virus (hiv) infection are 042 to 044, which fall under the group "infectious and parasitic diseases" (icd-9-cm codes 001-139). we considered both 3 digit and 5 digit icd-9-cm codes for our phenotypic comorbidity studies related to hiv infection. considering a relative risk RR_ij ≥ 20 between the disease group 042 and other disorder categories, we constructed the pdn shown in figure 9 (a), and considering a relative risk RR_ij ≥ 100 and a φ-correlation φ_ij ≥ 0.06 between the disease groups under the sub-categories of 042 and other disorder categories, we constructed the pdn shown in figure 9 (b). only the most significant relative risk associations are represented (see additional file 9: table s9 and additional file 10: table s10). to observe how the phenotypic relative risk varies with the number of genes shared between 2 diseases, we computed the number of shared genes between pairs of diseases and the corresponding phenotypic relative risk of comorbidity, as shown in figure 10. we observed that the phenotypic relative risk increases with the number of shared biomarker genes between 2 diseases. we may therefore predict the existing diseases of a patient and the prospective disease comorbidities through the identification of highly up- and down-regulated genes. based on the available data, we could predict the disease comorbidities and their level using the regression model shown in figure 10. it is notable that ageing can also be regarded as a "disease" rather than a natural process, with age-related diseases increasing exponentially with chronological time. so, to understand the impact of ageing on the disease comorbidities for sars and hiv infections, we considered the ageing data from the genage database (http://genomics.senescence.info/genes/human.html) [62, 63].
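the relationship between the number of shared genes and the phenotypic relative risk is described above only as "the regression model" of figure 10, without stating its functional form. the sketch here assumes, purely for illustration, an ordinary least-squares straight line; the function names and the toy data points are hypothetical and do not reproduce the study's values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(shared_genes, slope, intercept):
    """Predicted relative risk for a given number of shared genes."""
    return slope * shared_genes + intercept


# toy data: number of shared genes vs. phenotypic relative risk
shared = [1, 2, 3, 4]
risk = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(shared, risk)
estimate = predict(5, slope, intercept)
```

a positive slope corresponds to the reported trend that the relative risk rises with the number of shared biomarker genes; whether a linear form is adequate would have to be checked against the actual data.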
after cross comparing

human lung epithelial cells are likely among the first targets to encounter invading severe acute respiratory syndrome-associated coronavirus (sars-cov) [32]. thus, a comprehensive evaluation of the complex epithelial signalling in response to sars-cov is crucial to better understand sars pathogenesis. since both sars-cov and mers-cov infections cause severe lung pathology, we compared and contrasted the gene expression levels of the two infections. to compare sars-cov and mers-cov infections and their effect on disease comorbidities, we performed a time series microarray data analysis for both types of infection in lung cells compared to controls. we used gene expression microarray data from the ncbi geo (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse45042) [64]. from the analysis of sars-cov vs mock-infected controls (treated the same way except without the virus) we found 215 highly significant genes, and from the analysis of mers-cov vs mock we found 234 highly significant genes (see details in additional file 11: table s11 and additional file 12: table s12). interestingly, only 4 genes (nfkbia, egr1, ddit31 and ifit2) are common between these two infections (see figure 13). however, only 2 of these genes (nfkbia and egr1) play an important role, being differentially expressed in both lung infections and also in sars infected pbmcs. from the hierarchical cluster analysis of the differentially expressed genes in lung infection by sars-cov and mers-cov, we observed distinct groups of genes that changed significantly over time (see additional file 13: figure s1, additional file 14: figure s2, additional file 11: table s11 and additional file 12: table s12).
the log fold changes in the expression level of the 4 common genes (nfkbia, egr1, ddit31 and ifit2) for mers-cov and sars-cov infection are presented in figures 14 and 15. we observed that nfkbia expression is sharply upregulated over the time points in both types of infection, so nfkbia is an important biomarker for both mers-cov and sars-cov infections. the inflammatory gene nfkbia is also highly over-expressed in both pbmcs and lung cells in sars infection, and in lung cells in mers infection (see figure 16). indeed, the immune system plays a pivotal role in the outbreak of the inflammatory state. so in the case of sars infection, the nfkbia gene plays an important role in the disease comorbidities. on the other hand, similar diseases share common genes and could be treated by the same drugs [17], which may allow us to predict new uses for existing drugs. for instance, the anti-diabetic drug metformin has a major protective effect against cancer development and significantly increases the survival rate of cancer patients [65]. the finding is that the earlier the metformin regimen was initiated, the greater the preventive benefit for the cancer patient. there is evidence that the antiviral medication ribavirin does not work in the case of sars infection [66]. to this end, we used the connectivity map (cmap), a database of more than 1,400 drug transcriptional signatures in several cell lines [67]. this database makes it possible to identify molecules that induce similar or opposite transcriptional changes relative to a query signature, based on their connectivity enrichment scores. as a query signature we used our 274 highly dysregulated genes for the sars infection.
we generated connectivity scores, which range between +1 and -1: a highly positive score indicates that the drug induces changes similar to those induced by the viral infection, while a highly negative score indicates that the drug reverses the expression of the sars signature. based on the connectivity score we selected the most promising positive and negative regulators of the sars viral response (see details in additional file 15: table s13). potential negative regulators are drugs that reverse the sars signature gene expression. among the potential negative regulators, the drug molecules tetracycline, zalcitabine, gibberellic acid, prestwick-642 and sulfaquinoxaline score most strongly for the mcf7 cell line, and vorinostat for the hl60 cell line. these data suggest that the efficacy of different drugs against the sars virus can be predicted in this way, which may help to identify effective drug treatments for emergent viruses. furthermore, immunomodulatory drugs that reduce the excessive host inflammatory response to respiratory viruses have a therapeutic benefit in reducing the sars infection as well as the disease comorbidities. we presented and analysed multi-relational disease comorbidity relationships of sars and hiv-1 infections with other diseases or infections, based on associations in genetics, proteomics, molecular signalling pathways and phenotypic disorders. the combination of molecular biology, genetics and clinical medicine has greatly facilitated the understanding of how different diseases relate to each other. based on the combined genetic, ppi, pathway and clinical data, our disease networks can disclose potentially novel disease relationships that have not been captured by previous individual studies.
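the way connectivity scores are read at the start of this passage (positive: the drug mimics the infection signature; negative: the drug reverses it) can be illustrated by a small ranking sketch. it assumes the scores have already been obtained from the connectivity map and only shows the selection step; it does not call any cmap interface. the function name, the drug labels and the scores are hypothetical.

```python
def split_regulators(scores, top=3):
    """Rank drugs by connectivity score in [-1, 1].

    Returns (positive, negative): the `top` drugs with the highest
    positive scores and the `top` drugs with the most negative scores,
    each as a list of (drug, score) pairs.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = [(d, s) for d, s in ranked if s < 0][:top]
    positive = [(d, s) for d, s in reversed(ranked) if s > 0][:top]
    return positive, negative


# placeholder drugs and scores, for illustration only
toy_scores = {
    "drug_a": 0.90,
    "drug_b": -0.80,
    "drug_c": 0.10,
    "drug_d": -0.95,
}
positive, negative = split_regulators(toy_scores, top=2)
```

in this reading, the entries in `negative` are the candidates of therapeutic interest, since they are the ones whose signature opposes that of the infection.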
the underlying hypothesis behind this line of research is that once we have catalogued all disease-related genes, ppi complexes and signalling pathways, and if we do not consider environmental changes, we will be able to predict the susceptibility of each individual to future diseases using various molecular biomarkers, which will help us to enter an era of predictive medicine. our results indicate that such a combination of molecular and population-level data could help to build novel hypotheses about disease mechanisms. furthermore, if two or more diseases have associated comorbidity, the occurrence of one of them in a patient may increase the likelihood of developing the others. we have also studied the differences between mers-cov and sars-cov in the host response. this enables rapid assessment of viral properties and makes it possible to anticipate differences in human clinical responses to mers-cov and sars-cov and their impact on comorbidities relative to general comorbidity conditions. we used this information to predict potentially effective drugs against sars-cov, a method that could be used more generally to identify candidate therapeutics in future disease outbreaks. this investigative approach may also help to generate hypotheses and make rapid advances in characterising new viruses. we also found that the patient response to sars appears to be mainly an innate inflammatory response involving nfkbia, rather than a specific immune response against a viral infection such as hiv. however, hiv infection and highly active antiretroviral therapy (haart) also increase the immune reconstitution inflammatory syndrome (iris) and inflammation through the nf-κb pathways [68]. moreover, we have previously studied the impact of hiv infection on bone diseases and infections (e.g. osteoporosis and osteomyelitis). we observed that genes (e.g. rankl) and pathways (e.g. nf-κb) that are dysregulated by hiv infection also affect bone remodelling and bone-related diseases.
it is also recognised that inflammation plays a role in cancer aetiology, and various studies have found that inflammation may cause iris, obesity and tumour-promoting effects [69]. moreover, inflammation is an important concomitant cause of many major age-associated pathologies such as cancer, neurodegeneration and diabetes [70]. our study provides important evidence associating diseases with the ageing process at the system level and helps to understand more about the comorbidities of complex diseases. the ageing process itself is accompanied by a chronic low-grade inflammation, which is termed "inflammageing". the combination of metabolic-driven and age-driven inflammatory pathways plays a pivotal role in disease progression. this observation suggests that inflammageing and meta-inflammation can share stimuli and pathogenic mechanisms for comorbidities. we suppose that what happens in the comorbidities we investigate is similar to what has been found for prions [71, 72]. like most infectious agents, prions cause inflammatory responses by activating innate immunity through glial cells in the brain. the complete transcriptome of the prion-infected brain was observed at 10 different time points over a 22-week period [71, 72]. at the beginning of the disease, the normal and diseased mouse networks were the same. although the disease started in a single distinctive network, that of prion accumulation and replication, it then progressed to the other networks. based on this approach we may propose a pathway model for comorbidities in which hub genes dysregulate several other pathways and thereby influence comorbidities. the number of dysregulated pathways could be proportional to the amount of dysregulation of the hub genes. our pathway model suggests that hubs that are over-activated may direct the signal to different pathways, creating comorbidities, as shown in figure 17.
upon infection, one of the pathways related to inflammation becomes dysregulated. with increasing time, both the confidence level of inflammation and the number of dysregulated pathways increase. moreover, as inflammation increases, the number of comorbid diseases may increase. so the infection initially dysregulates one signalling pathway of a cell, which may in turn disrupt other pathways. in this way, the disrupted pathways give rise to more diseases in the same patient and produce multimorbidity. disease genes play a central role in the human interactome. overlapping component genes serve as bridges across relatively independent functional modules or pathways, so a perturbation in one pathway, such as the nf-κb signalling pathway, could be propagated throughout the other relevant pathways. we found that sars and hiv-1 infections share 11 significantly dysregulated genes as well as molecular pathways. both sars and hiv-1 viruses may encounter an already existing comorbidity in the infected host or generate a new comorbidity through the perturbation of the infected pathways. furthermore, this may provide an opportunity to investigate the role of other genes from the same pathway in the disease space. therefore, pathways could be used to represent the underlying biology of diseases and to predict disease comorbidities. in most cases, the relationships among genes, pathways and diseases are many-to-many: a disease is associated with many different genes and pathways, and a pathway is associated with many different diseases. this study suggests that a single pathway can be involved in many diseases, whereas a disease may show dysregulation in many biological processes. hence, if a drug is already available to treat a disease by modulating the activity of a pathway, it could potentially be used to treat other diseases that are strongly linked with the same pathway.
on the other hand, when a disease shows dysregulation in multiple pathways, a pathway-guided drug combination may be employed in the treatment. moreover, the protein subnetwork-based approach to diseases may aid in drug discovery, since a drug can potentially be used to treat other diseases that are linked to the same protein complex. thus, our findings not only help us to understand how different diseases are related through their underlying molecular mechanisms but also provide insights into the design of novel, protein complex-guided therapeutic interventions. extending the concept of subclassifying patient cohorts to the single patient level is referred to as personalised medicine. during the last few years, acceptance of personalised medicine has increased sharply, as it has become apparent that standard treatment approaches are rarely efficient across the entire patient population. advances in high-throughput molecular assay technologies in genomics, proteomics and other "omics" fields are expanding the diagnostic and therapeutic strategies available for personalised treatment. in addition, the declining per-sample cost has given rise to numerous public repositories of biomolecular data. in particular, the availability of these data sets for many different diseases presents a ripe opportunity to use data-driven approaches to advance our current knowledge of disease relationships in a systematic way. the identified disease patterns can then be further investigated with regard to their diagnostic utility or their help in predicting novel therapeutic targets. medicine will focus on each individual patient. it will become intrinsically proactive and will increasingly focus on wellness rather than disease. proactive and personalised medicine will bring fundamental changes to healthcare, taking carefully targeted preventative or therapeutic action at the earliest indications of risk or disease comorbidities.
we are entering the genomic era of medicine, where a patient's genetic/genomic data are becoming important for clinical decision making, including disease risk assessment, disease diagnosis and subtyping, drug therapy and dose selection, risk assessment for adverse drug reactions, and family planning [73]. today, multi-scale and complex biomedical data are gathered and analysed to uncover combinations of predictive disease profiles. our genome, as well as multiple proteomes, transcriptomes, metabolomes and other personalised data sets obtained at different points in our lives, will be readily available at affordable prices for each individual. in the near future, clinicians will have to consider genetic/genomic implications for patient care throughout their clinical workflow, including the electronic prescribing of medications. therefore, to implement a personalised medicine system, a model could be developed that takes individual genetic data as input. dysregulated biomarker genes would be identified from these genetic data, and diseases would be identified from the gene-disease association database. based on the information about the existing diseases, the model would predict disease comorbidities using the disease-disease association database. this would allow many diseases to be detected at the earliest detectable phase, weeks, months or maybe years before symptoms appear, and would afford crucial insights into optimising our wellness. thus, personalised medicine will give fundamental new insights into disease mechanisms and will open new opportunities for the diagnosis, therapy and prevention of disease comorbidities. in this study, we have considered all available categories of omics and phenotypic data to quantify the comorbidity associations centred on the sars and hiv-1 infections.
we have shown that the phenotype disease network (pdn) has a heterogeneous structure in which some diseases are highly connected while others are hardly connected at all. our findings show that disease progression can be represented and studied using network methods, offering the potential to enhance our understanding of the origin and evolution of human diseases. detecting comorbidity in a large population is of clinical interest because it may reveal new information about the causes of diseases as well as new treatment strategies. this study demonstrates the value of an integrated approach in revealing disease relationships and new opportunities for therapeutic applications. this kind of approach should therefore be helpful for making evidence-based recommendations about disease comorbidities. moreover, environmental factors (such as physiological stress and diet), ethnic group and gender differences are important factors in comorbidity analysis. our network approach could be extended into a comorbidity map by integrating diet, exercise and other factors, as in [74].

figure 17 progressive temporal activation of pathways. a schematic view of how disease comorbidities increase as pathway dysregulation advances with time. the red circles indicate increased levels of dysregulated gene expression relative to control, and the red links indicate dysregulated pathways that have increased with infection compared with the normal control. green indicates transcripts that are the same in the control and infection conditions. the four panels represent the network at successive time intervals of the infection progression. with time the inflammation confidence level increases, which is indicated by the confidence interval.

the gene-disease association data used in this study were collected from the online mendelian inheritance in man (omim) database (http://www.ncbi.nlm.nih.gov/omim/). the omim database is the best-curated repository of all known disease genes and their associated disorders [75, 76]. genotype-phenotype relationships, as summarised in the omim database, comprised more than 5000 human disease-gene associations involving 1500 diseases and 3000 disease-associated genes. each omim entry is composed of four fields: the name of the disorder, the associated gene symbols, the corresponding omim id, and the chromosomal location. we selected the entries with the "(3)" tag, for which there is strong evidence that at least one mutation is a cause of the disorder. omim initially focused on monogenic disorders but in recent years has expanded to include complex traits and the associated genetic mutations that confer susceptibility to these common disorders [58]. subsequently we classified each disorder into one of 21 primary disorder classes based on the physiological system affected, as introduced in goh, cusick, valle, childs, vidal and barabasi et al. [14]. disorders having multiple distinct clinical features are assigned to the "multiple" class. this classification scheme reflects the phenotypic similarities among diseases in the same class and has been used successfully in recent studies of systematic disease analysis [77]. the gene expression data used in this study were obtained from the ncbi gene expression omnibus (geo) (http://www.ncbi.nlm.nih.gov/geo/) [59]. we considered 10 different data sets for our analysis (accession numbers gse1739, gse45042, gse17400, gse9006, gse9128, gse15072, gse7158, gse8977, gse18464 and gse7621) [32, 55, 57, 64, [78] [79] [80] [81] [82]. these data sets contain data from patients of different ages and sexes.
after several rounds of filtering, normalization and statistical analysis, we had microarrays representing sars, mers and hiv-1 infections and 7 other human diseases (heart failure, kidney disorders, breast cancer, parkinson's disease, osteoporosis, type 1 and type 2 diabetes). the protein-protein interaction (ppi) data for human were obtained from the human protein reference database (hprd) [61] . hprd contains the largest number of ppis among all publicly available literature-derived databases for human ppi [83] . we used the reactome knowledge base of human biological pathways for our pathway association analysis [60] . for the cross-comparison between sars and hiv infections and the ageing process, we downloaded ageing data from the human ageing genomic resources (http://genomics.senescence.info/download.html) [62, 63] , which collects human ageing genes after an extensive review of the literature. these genes are commonly dysregulated during the ageing process. to test the validity of the proposed disease associations, we examined disease co-occurrence information at the population level. we obtained statistically significant pairwise comorbidity associations reconstructed from over 32 million medical records in the us medicare claims database, recorded in the icd-9-cm format (http://www.icd9data.com), which is frequently used for epidemiological and demographic studies; the data were collected from [2] . we used medpar records from 1990 to 1993, in which the dates and reasons for all hospitalizations were reported in icd-9-cm format and which contain the diagnoses of 13,039,018 elderly patients. each record consists of the date of visit, a primary diagnosis and up to 9 secondary diagnoses. all diagnoses are specified by icd-9 codes of up to 5 digits. the first three digits specify the main disease category while the last two give additional information about the disease. 
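as a minimal sketch of how such records could be reduced to the counts used in a comorbidity analysis, the python below collapses each icd-9 code to its 3-digit category and tallies, per patient, how many patients carry each disease and each disease pair. the record layout (a mapping from patient id to a list of codes), the function names and the toy records are all hypothetical and chosen only for illustration; only numeric icd-9 codes are handled.

```python
from collections import Counter
from itertools import combinations


def category(code: str) -> str:
    """Collapse a numeric ICD-9 code such as '250.01' or '25001' to its 3-digit category."""
    return code.replace(".", "")[:3]


def count_diseases(records):
    """Return (N, prevalence, cooccurrence) computed at the 3-digit level.

    `records` maps a patient id to every ICD-9 code recorded for that patient
    (primary and secondary diagnoses pooled). This layout is an assumption.
    """
    prevalence = Counter()
    cooccurrence = Counter()
    for codes in records.values():
        # Each disease is counted once per patient, however often it was coded.
        diseases = sorted({category(c) for c in codes})
        prevalence.update(diseases)
        cooccurrence.update(combinations(diseases, 2))
    return len(records), prevalence, cooccurrence


# Toy records, invented for illustration only.
records = {
    "p1": ["250.01", "428.0"],
    "p2": ["25000", "4280", "585"],
    "p3": ["585"],
}
n, p, c = count_diseases(records)
```

keeping the pair keys sorted means each unordered disease pair is stored under a single key, so the co-occurrence table can be looked up without worrying about order.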
in total, the icd-9-cm classification consists of 657 different categories at the 3-digit level and 16,459 categories at the 5-digit level [2] . to determine whether some existing drug compounds can reverse the sars infection signature, we used the publicly available connectivity map (cmap) database [67] . cmap provides associations among genes, chemicals and disease or infection conditions. it is a collection of genome-wide transcriptional data from cultured human cells treated with 1,400 different compounds. global gene expression analysis using oligonucleotide microarrays has proven to be a sensitive method to develop and refine the molecular determinants of human disorders [55] . using this technology, we compared the gene expression profiles of sars, hiv and other diseases. to avoid the problems of comparing microarray data from different platforms and experimental systems, we normalized the gene expression data in each microarray sample (disease state or control) using the z-score transformation ($z_{ij}$) for each disease gene expression matrix, $z_{ij} = \frac{g_{ij} - \mathrm{mean}(g_i)}{\mathrm{sd}(g_i)}$, where sd is the standard deviation and $g_{ij}$ represents the expression value of gene i in sample j. this transformation allows for the direct comparison of gene expression values across various microarray samples and diseases. to combine more than one data series or experiment for a given disease, we employed a linear regression approach to obtain a combined t-test statistic between two conditions. data were log2-transformed and we calculated the expression level for each gene using the linear regression model $y_i = \beta_0 + \beta_1 x_i$, where $y_i$ is the gene expression value and $x_i$ is the disease state (disease or control). the coefficients $\beta_0$ and $\beta_1$ are the parameters of this model and were estimated by least squares. the t-test statistic obtained when estimating $\beta_1$ is the same as the standard t-test statistic between disease and control states. 
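the two statements above (the z-score transformation, and the equivalence between the regression t statistic for β1 and the two-sample t statistic) can be checked numerically. the sketch below uses only the python standard library. it assumes the z-score is taken per gene across samples with the sample standard deviation, that the disease state is coded 0 for control and 1 for disease, and that the two-sample test uses a pooled variance; the expression values are invented for illustration.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Z-score transform one gene's values: (g - mean) / SD (sample SD assumed)."""
    m, sd = mean(values), stdev(values)
    return [(g - m) / sd for g in values]


def slope_t(x, y):
    """t statistic for beta_1 in the least-squares fit y = beta_0 + beta_1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ssa = sum((v - mean(a)) ** 2 for v in a)
    ssb = sum((v - mean(b)) ** 2 for v in b)
    sp2 = (ssa + ssb) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Invented intensities for one gene, log2-transformed as in the text.
control = [math.log2(v) for v in (110.0, 95.0, 102.0, 99.0)]
disease = [math.log2(v) for v in (180.0, 210.0, 160.0, 205.0)]

x = [0] * len(control) + [1] * len(disease)  # disease state indicator
y = control + disease

t_reg = slope_t(x, y)
t_two = pooled_t(control, disease)
z = zscore(y)
```

with a binary indicator the fitted slope is simply the difference of the two group means, which is why the two statistics agree up to floating-point rounding.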
time series microarray gene expression data analysis was divided into two steps: pre-processing, and identification of statistically significant time points by t-test, anova and regression analysis to find differentially expressed gene profiles at different time points. in the first step, we pre-processed the experimental data using different statistical methods, followed finally by loess normalization, as recommended by the golden spike project [84] . in the second step, we used "masigpro" (microarray significant profiles) to identify differentially expressed genes in time-course microarray experiments; it is a two-step regression method that has been successfully applied to more than one group of time series [85, 86] . this two-step regression strategy is used to find genes with significant temporal expression changes and significant differences between experimental groups. the procedure first fits a global regression model by least squares to identify differentially expressed genes and selects significant genes by applying false discovery rate control procedures. stepwise regression is then applied as a variable selection strategy to study differences between experimental groups and to find statistically significant profiles. after finding differentially expressed gene profiles among the groups of experiments, the next step is to cluster them according to their profile similarities. the hierarchical clustering and the median gene expression profiles of clusters were computed with the "masigpro" package in r [85] . student's unpaired t-test was performed to identify genes that were differentially expressed in patients relative to normal samples, and significant genes were selected. a threshold of at least 1.5-fold change and a t-test p value of less than 0.01 were chosen. in addition, a two-way anova with bonferroni's post-hoc test was used to establish statistical significance between groups (p < 0.01). 
pathways and functional categories were considered over-represented when the fisher's exact test p value was < 0.01. to present the signalling and interaction pathways of the significant genes, we used cytoscape for data integration and network visualization [87, 88] and the reactome functional interaction (fi) cytoscape plugin for the knowledge base of human biological pathways and network processes [60] . for the gene-disease associations, we considered neighbourhood-based benchmark and topological methods, which are better suited to our networks [89] . here, topological refers to methods that rely only on the structure of the network to draw conclusions. we construct a gdn from gene-disease associations, where a node in the network can be either a disease or a gene. this network can also be regarded as a bipartite graph. two diseases are connected when they share at least one significantly dysregulated gene. let $D$ be a particular set of human diseases and $G$ a set of human genes; gene-disease associations attempt to find whether gene $g \in G$ is associated with disease $d \in D$. if $G_i$ and $G_j$ are the sets of significantly up- and down-regulated genes associated with diseases i and j respectively, then the number of shared dysregulated genes ($n^g_{ij}$) associated with both diseases i and j is $n^g_{ij} = |G_i \cap G_j|$ (1). the co-occurrence refers to the number of shared genes in the gdn. the common neighbours score is based on the jaccard coefficient method, where the edge prediction score for the node pair (i, j) is $E(i, j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|}$ (2), where e is the set of all edges. the number of shared pathways and protein subnetworks linking diseases i and j is calculated using equation 1 and the link prediction score is measured using equation 2. to estimate the correlation starting from disease co-occurrence, we need to quantify the strength of disease association for comorbidities by defining a distance between two diseases. 
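the gene-sharing step described above can be sketched in a few lines of python. the example counts the genes shared by two diseases and scores the pair with the jaccard coefficient (shared genes divided by the union of the two gene sets), which is the usual reading of that name and is assumed here. the disease labels and gene identifiers are placeholders, not data from the study.

```python
from itertools import combinations


def shared_genes(g_i: set, g_j: set) -> int:
    """Number of dysregulated genes common to diseases i and j."""
    return len(g_i & g_j)


def jaccard(g_i: set, g_j: set) -> float:
    """Jaccard coefficient of two gene sets; 0.0 when both are empty."""
    union = g_i | g_j
    return len(g_i & g_j) / len(union) if union else 0.0


# Placeholder gene sets, invented for illustration only.
genes = {
    "disease_a": {"g1", "g2", "g3", "g4"},
    "disease_b": {"g2", "g3", "g5"},
    "disease_c": {"g6"},
}

# Link two diseases only when they share at least one dysregulated gene.
edges = {}
for a, b in combinations(sorted(genes), 2):
    if shared_genes(genes[a], genes[b]) >= 1:
        edges[(a, b)] = jaccard(genes[a], genes[b])
```

the same two functions apply unchanged if the sets hold pathway identifiers or protein subnetwork members instead of genes.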
for the analysis of the phenotypic data, we used the relative risk ($RR_{ij}$) as the quantitative measure of the comorbidity tendency of a disease pair and the φ-correlation ($\phi_{ij}$) to measure the robustness of the comorbidity associations. the $RR_{ij}$ compares how often diseases i and j are observed affecting the same patient with the frequency expected by chance. when two diseases co-occur more frequently than expected by chance, we get $RR_{ij} > 1$ and $\phi_{ij} > 0$. however, $RR_{ij}$ and $\phi_{ij}$ are not independent of each other and each carries unique biases that are complementary [1, 2] . so, we used both measures of comorbidity to ensure the robustness of our investigations. the $RR_{ij}$ allows us to quantify the co-occurrence of disease pairs compared with the random expectation and is calculated as $RR_{ij} = \frac{C_{ij} N}{P_i P_j}$ (3), where $N$ is the total number of patients in the population, $P_i$ and $P_j$ are the incidences/prevalences of diseases i and j (as patient counts), and $C_{ij}$ is the number of patients that have been diagnosed with both diseases i and j. for $RR_{ij} > 1$ comorbidity is larger than expected by chance and for $RR_{ij} < 1$ comorbidity is smaller than expected by chance. to calculate the significance of the relative risk $RR_{ij}$, we used the method of katz, baptista, azen and pike to estimate confidence intervals [90] . according to their estimation, the 99% confidence interval for the $RR_{ij}$ between two diseases i and j is given by the lower bound $LB = RR_{ij} \exp(-2.56\,\sigma_{ij})$ and the upper bound $UB = RR_{ij} \exp(2.56\,\sigma_{ij})$, where $\sigma_{ij}$ is estimated according to [90] . a disease pair is considered significant at the 99% confidence level only if the lb value is larger than 1 when $RR_{ij}$ is larger than 1, or if the ub value is smaller than 1 when $RR_{ij}$ is smaller than 1. the relative risk measure is intrinsically biased: it overestimates relationships between rare diseases and underestimates the comorbidity of more frequent diseases [2] . 
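a short python sketch of the relative risk and its confidence bounds follows. it assumes that the relative risk is the observed co-occurrence divided by the expectation under independence, with the disease prevalences given as patient counts, so that rr = c·n / (p_i·p_j). the value of σ ij is passed in as a precomputed number rather than derived here, and all counts are invented.

```python
import math


def relative_risk(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Observed co-occurrence over the count expected by chance (p_i * p_j / n)."""
    return c_ij * n / (p_i * p_j)


def rr_interval(rr: float, sigma: float, z: float = 2.56):
    """Bounds rr * exp(-z * sigma) and rr * exp(z * sigma); z = 2.56 as in the text."""
    return rr * math.exp(-z * sigma), rr * math.exp(z * sigma)


def is_significant(rr: float, lb: float, ub: float) -> bool:
    """Keep a pair only if the whole interval lies on the same side of 1 as rr."""
    return (rr > 1 and lb > 1) or (rr < 1 and ub < 1)


# Invented counts: 10,000 patients, 200 with disease i, 500 with j, 50 with both.
rr = relative_risk(c_ij=50, p_i=200, p_j=500, n=10_000)
lb, ub = rr_interval(rr, sigma=0.15)  # sigma value chosen arbitrarily
keep = is_significant(rr, lb, ub)
```

because the bounds are symmetric on the log scale, the product of the lower and upper bound always equals the square of the relative risk, which is a convenient sanity check.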
this bias can be reduced by introducing a φ-correlation measure. we can quantify the strength of comorbidities by calculating the correlation coefficient associated with a pair of diseases i and j as $\phi_{ij} = \frac{C_{ij} N - P_i P_j}{\sqrt{P_i P_j (N - P_i)(N - P_j)}}$, where $C_{ij}$ is the number of patients affected by both diseases, $N$ is the total number of patients in the studied population, and $P_i$ and $P_j$ are the morbidity or incidence of the i th and j th diseases respectively. the φ-correlation is pearson's correlation for variables that take only the values 0 or 1 [91] . for $\phi_{ij} > 0$ comorbidity is larger than expected by chance and for $\phi_{ij} < 0$ comorbidity is smaller than expected by chance. we can test the null hypothesis φ = 0 by performing a t-test. this consists of calculating t according to the formula $t = \frac{\phi \sqrt{n - 2}}{\sqrt{1 - \phi^2}}$, where $n$ is the number of observations used to calculate φ. to predict comorbidities given the primary or index disease, we calculated the conditional relative risk (conditional $RR_{ij}$) for all possible disease pairs i and j, for the cases in which one index disease (i) is present (k = true) or absent (k = false) (p = 0.01). we weighted the edges using a mutual information metric which quantifies how much greater the edge relationship is with respect to co-occurrence. the mutual information weight between two diseases i and j is defined as $w_{ij} = \frac{C_{ij}}{P_i + P_j - C_{ij}}$ (6), where $C_{ij}$ is the observed co-occurrence and $P_i$ and $P_j$ are the morbidity or prevalence of the i th and j th diseases respectively. to compare sars-cov and mers-cov, a gene set enrichment analysis was undertaken using gsea [92] . to find the correlations (similarities) and distances (dissimilarities) among the diseases from the integrated analysis of multidimensional data (gene expression and protein-protein interaction), we applied euclidean distance measurement and metric multi-dimensional scaling (mds) using majorization [93] . 
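the φ-correlation, its t statistic and the edge weight defined earlier in this section can be sketched as follows. the φ formula used is the standard pearson correlation of two binary variables written in terms of counts, and the edge weight is read as the co-occurrence divided by (p_i + p_j − c_ij); both readings are assumptions about the notation, and the counts are invented.

```python
import math


def phi(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Pearson correlation of two 0/1 disease indicators, from patient counts."""
    return (c_ij * n - p_i * p_j) / math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))


def phi_t(phi_ij: float, n_obs: int) -> float:
    """t statistic for testing phi = 0: phi * sqrt(n - 2) / sqrt(1 - phi^2)."""
    return phi_ij * math.sqrt(n_obs - 2) / math.sqrt(1 - phi_ij ** 2)


def edge_weight(c_ij: int, p_i: int, p_j: int) -> float:
    """Co-occurrence relative to the number of patients with either disease."""
    return c_ij / (p_i + p_j - c_ij)


# Invented counts: 10,000 patients, 200 with disease i, 500 with j, 50 with both.
phi_ij = phi(c_ij=50, p_i=200, p_j=500, n=10_000)
t_ij = phi_t(phi_ij, n_obs=10_000)
w_ij = edge_weight(c_ij=50, p_i=200, p_j=500)

# When co-occurrence equals its chance expectation (200 * 500 / 10,000 = 10), phi is zero.
phi_null = phi(c_ij=10, p_i=200, p_j=500, n=10_000)
```

unlike the relative risk, φ is bounded between −1 and 1, which is what makes it less sensitive to very rare diseases.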
mds is a set of methods for discovering hidden structures in multidimensional data. taking as input a proximity matrix derived from variables measured on objects, these distances are mapped onto a lower-dimensional spatial representation. an optimization problem is solved to find a mapping of the data into the target dimension, based on the given pairwise proximity information, that minimizes an objective function. the particular objective function (or loss function) we used in this work is a sum of squares, commonly called stress. we used majorization to minimize stress; this mds solving strategy is known as smacof (scaling by majorizing a complicated function). stress majorization is an optimization strategy used in multidimensional scaling (mds) where, for a set of n m-dimensional data items, a configuration $X$ of n points in r(<< m)-dimensional space is sought that minimizes the stress function $\sigma(X)$. here r is 2, which means that the (n × r) matrix $X$ lists points in 2-dimensional euclidean space. we applied the cost function σ to measure the squared differences between ideal (m-dimensional) distances and actual distances in r-dimensional space as follows: $\sigma(X) = \sum_{i<j \le n} w_{ij} \left(d_{ij}(X) - \delta_{ij}\right)^2$ (7). we denote $X_1$ of dimension $n_1 \times p$ as the individual's or judge's configuration, and $X_2$ of dimension $n_2 \times p$ as the object's configuration matrix. the least squares metric multidimensional scaling (mds) problem is the minimization of σ over all configurations $X$. here $w_{ij}$ are given non-negative weights and $\delta_{ij}$ are given non-negative dissimilarities. the $d_{ij}(X)$ are the euclidean distances between rows i and j of $X$, thus $d_{ij}(X_1, X_2) = \sqrt{\sum_{s=1}^{p} (x_{1is} - x_{2js})^2}$ (8), where $w_{ij} \ge 0$ is a weight for the measurement between a pair of points (i, j), $d_{ij}(X)$ is the euclidean distance between i and j, and $\delta_{ij}$ is the ideal distance between the points (their separation) in the m-dimensional data space. note that $w_{ij}$ is used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair). a configuration $X$ which minimizes $\sigma(X)$ gives a plot in which points that are close together correspond to points that are also close together in the original m-dimensional data space. programming scripts are freely available at www.cl.cam.ac.uk/~mam211/comor/. information regarding each of the clusters and genes is described in additional file 12: table s12 . additional file 15: table s13 . connectivity map results of predicted drugs per instance (for each drug and cell line) to reverse sars-cov for early and sustained signature (drugs with negative enrichment scores). 
the impact of cellular networks on disease comorbidity a dynamic network approach for the study of human phenotypes comor: a software for disease comorbidity risk assessment comorbidity of cardiovascular disease, diabetes and chronic kidney disease in australia. canberra: australian institute of health and welfare mortality after incident cancer in people with and without type 2 diabetes impact of metformin on survival comorbidities in chronic obstructive pulmonary disease comorbidities of chronic obstructive pulmonary disease the incidence of co-morbidities related to obesity and overweight: a systematic review and meta-analysis comorbidity: a network perspective detecting shared pathogenesis from the shared genetics of immune-related diseases comorbidity and survival after early breast cancer. 
a review probing genetic overlap among complex human phenotypes the implications of human metabolic network topology for disease comorbidity the human disease network network properties of genes harboring inherited disease mutations a human phenome-interactome network of protein complexes implicated in genetic disorders network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets modelling osteomyelitis towards a proteome-scale map of the human protein-protein interaction network a human protein-protein interaction network: a resource for annotating the proteome origin and physiological roles of inflammation as a doctor, i'd rather have hiv than diabetes. the spectator magazine pandemic influenza a (h1n1) virus hospitalizations investigation team: hospitalized patients with 2009 h1n1 influenza in the united states cardiovascular manifestations of the emerging dengue pandemic complications of viral influenza association of symptoms of chronic bronchitis and frequent flu-like illnesses with stroke centers for disease control and prevention (cdc): recommendations of the advisory committee on immunization practices (acip) how ageing processes influence cancer the role of low-grade inflammation and metabolic flexibility in aging and nutritional modulation thereof: a systems biology approach prevalence of comorbidity of chronic diseases in australia early gene expression events in ferrets in response to sars coronavirus infection versus direct interferon-alpha2b stimulation dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection canadian sars research network, kelvin dj. 
interferon-mediated immunopathological events are associated with atypical innate and adaptive immune responses in patients with severe acute respiratory syndrome sars: prognosis, outcome and sequelae severe acute respiratory syndrome-coronavirus infection in aged nonhuman primates is associated with modulated pulmonary and systemic immune responses epidemiological, demographic, and clinical characteristics of 47 cases of middle east respiratory syndrome coronavirus disease from saudi arabia: a descriptive study severe respiratory illness caused by a novel coronavirus isolation of a novel coronavirus from a man with pneumonia in saudi arabia osteoimmunopathology in hiv/aids: a translational evidence-based perspective differential impacts of r5 vs. x4 hiv-1 on the transcriptome of primary cd4+ t cells hiv-1 triggers apoptosis in primary osteoblasts and hobit cells through tnfα activation a review of co-morbidity between infectious and chronic disease in sub saharan africa: tb and diabetes mellitus, hiv and metabolic syndrome, and the impact of globalization cancer risk in people infected with human immunodeficiency virus in the united states hepatocellular cancer in hiv-infected individuals: tomorrow's problem? 
vajdic cm: incidence of cancers in people with hiv/aids compared with immunosuppressed transplant recipients: a meta-analysis risk of human papillomavirus-associated cancers among persons with aids hiv infection and lymphoma control of antiviral immunity by pattern recognition and the microbiome safety and success of kidney transplantation and concomitant immunosuppression in hiv-positive patients hiv infection, inflammation, immunosenescence, and aging weakened immunity in aged hosts with comorbidities as a risk factor for the emergence of influenza a h7n9 mutants tuberculosis comorbidity with communicable and non-communicable diseases: integrating health services and control efforts heightened plasma levels of heme oxygenase-1 and tissue inhibitor of metalloproteinase-4 as well as elevated peripheral neutrophil counts are associated with tuberculosis-diabetes comorbidity emerging infectious diseases: threats to human health and global stability expression profile of immune response genes in patients with severe acute respiratory syndrome chemokine up-regulation in sars-coronavirus-infected, monocyte-derived human dendritic cells interferon-α drives monocyte gene expression in chronic unsuppressed hiv-1 infection online mendelian inheritance in man (omim), a knowledgebase of human genes and genetic disorders ncbi geo: mining tens of millions of expression profiles database and tools update reactome: a knowledgebase of biological pathways human protein reference database-2009 update human ageing genomic resources: integrated databases and tools for the biology and genetics of ageing meta-analysis of age-related gene expression profiles identifies common signatures of aging cell host response to infection with novel human coronavirus emc predicts potential antivirals and important differences with sars coronavirus new users of metformin are at low risk of incident cancer a cohort study among people with type 2 diabetes severe acute respiratory syndrome (sars) 
applications of connectivity map in drug discovery and development developments in hiv neuropathogenesis infection, immunoregulation, and cancer charting the nf-κb pathway interactome map a systems approach to prion disease systems biology and p4 medicine: past, present, and future emerging landscape of genomics in the electronic health record for personalized medicine clinical assessment incorporating a personal genome mckusick's online mendelian inheritance in man (omim) a new face and new challenges for online mendelian inheritance in man (omim) evolutionary history of human disease genes reveals phenotypic connections and comorbidity among genetic diseases gene expression in peripheral blood mononuclear cells from children with diabetes gene expression profiles in peripheral blood mononuclear cells of chronic heart failure patients mitochondrial dysregulation and oxidative stress in patients with chronic kidney disease mesenchymal stem cells within tumour stroma promote breast cancer metastasis a genomic pathway approach to a complex disease: axon guidance and parkinson disease an evaluation of human protein-protein interaction data in the public domain preferred analysis methods for affymetrix genechips revealed by a wholly defined control dataset masigpro: a method to identify significantly differential expression profiles in time-course microarray experiments serial expression analysis: a web tool for the analysis of serial gene expression data cytoscape: a software environment for integrated models of biomolecular interaction networks cytoscape 2.8: new features for data integration and network visualization exploring and exploiting disease interactions from multi-relational gene and phenotype networks obtaining confidence intervals for the risk ratio in cohort studies applied multiple regression/correlation analysis for the behavioral sciences. 
routledge gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles a majorization algorithm for solving mds 
we thank fp7-health-f5-2012 for providing financial support, under grant agreement n 305280 (mimomics). additional file 1: table s1 . highly statistically significant differentially expressed genes between sars and control group in pbmcs. table s2 . highly statistically significant differentially expressed genes between hiv and control group. table s7 . phenotypic disease association for sars infection based on the icd9 codes at the 3-digit category level. only statistically significant links with high relative risk rr ij are considered. additional file 8: table s8 . phenotypic disease association for sars infection based on the icd9 codes at the 5-digit category level. only statistically significant links with high relative risk rr ij are considered. additional file 9: table s9 . phenotypic disease association for hiv infection based on the icd9 codes at the 3-digit category level. only statistically significant links with high relative risk rr ij are considered. additional file 10: table s10 . phenotypic disease association for hiv infection based on the icd9 codes at the 5-digit category level. only statistically significant links with high relative risk rr ij are considered. additional file 11: table s11 . highly statistically significant differentially expressed genes between sars-cov and reference group (mock) in lung epithelial cells. additional file 12: table s12 . 
highly statistically significant differentially expressed genes between mers-cov and reference group (mock) in lung epithelial cells. additional file 13: figure s1 . median expression profile of sars-cov vs mock using hierarchical clustering (ward method, pearson correlation) of 215 statistically significant differentially expressed genes (p < 0.001). the information regarding each of the clusters and genes is described in additional file 11: table s11 . additional file 14: figure s2 . median expression profile of mers-cov vs mock using hierarchical clustering (ward method, pearson correlation) of 234 statistically significant differentially expressed genes (p < 0.001). 
the authors declare that they have no competing interests. authors' contributions mam and pl designed and mam implemented the analysis of the paper. mam and pl wrote the manuscript. both authors contributed to and approved the manuscript. 
key: cord-336573-bpg1dg24 authors: greenaway, hui yee; kurniawan, monica; price, david a; douek, daniel c; davenport, miles p; venturi, vanessa title: extraction and characterization of the rhesus macaque t cell receptor β-chain genes date: 2009-06-09 journal: immunol cell biol doi: 10.1038/icb.2009.38 sha: doc_id: 336573 cord_uid: bpg1dg24 rhesus macaque models have been instrumental for the development and testing of vaccines prior to human studies and have provided fundamental insights into the determinants of immune efficacy in a variety of infectious diseases. however, the characterization of antigen-specific t cell receptor (tcr) repertoires during adaptive immune responses in these models has previously relied on human tcr gene assignments. here, we extracted and characterized tcr β-chain (trb) genes from the recently sequenced rhesus macaque genome that are homologous to the human trb genes. comparison of the rhesus macaque trb genes with the human trb genes revealed an average best-match similarity of 92.9%. 
furthermore, we confirmed the usage of most rhesus macaque trb genes by expressed tcrβ sequences within epitope-specific tcr repertoires. this primary description of the rhesus macaque trb genes will provide a standardized nomenclature and enable better characterization of tcr usage in studies that utilize this species. the rhesus macaque is widely used as a non-human primate model to study infection and immunity due to the close genetic relationship with humans (∼93% average human-macaque sequence identity1) and the homology between human and rhesus pathogen genomes2, 3. indeed, rhesus macaques have been used to study fundamental aspects of immunology, including the development and maintenance of t cell memory4, immunodominance5 and the aging immune system6. there have also been many studies of immune responses in rhesus macaque models of human infections such as human immunodeficiency virus (hiv)7, influenza virus8, 9, tuberculosis10, epstein-barr virus (ebv)11, 12, cytomegalovirus (cmv)4, 13-15, smallpox16, measles17 and severe acute respiratory syndrome (sars)18. furthermore, rhesus macaques have been instrumental in the design and testing of vaccines against infections such as hiv19 and smallpox16. the various roles of t lymphocytes in adaptive immune responses to infection, which include the provision of helper functions to other immune cells and cytolytic control of infected cells, require that t cell populations recognize a large variety of foreign peptides bound to major histocompatibility complex (mhc) molecules. this recognition is facilitated by a diverse repertoire of t cell receptors (tcrs). the tcr repertoires that respond to different peptide-mhc epitopes can vary greatly. 
indeed, diversity estimates range from ∼10 to >1000 different tcrs responding to a specific epitope20-23. moreover, some epitope-specific tcr repertoires can feature biased usage of tcr vβ (trbv) or jβ (trbj) genes, or distinct patterns of amino acid usage within the third complementarity-determining region (cdr3)24. studies of the tcr repertoire can provide valuable information about the molecular evolution of an immune response and the factors that shape clonotype selection in vivo25. furthermore, it is becoming increasingly apparent that the clonotypic structure of an epitope-specific t cell response can have important implications for the immune control of some viral infections. for example, one issue of current debate that has important consequences for the rational design of immunotherapeutic and vaccination strategies24, 26 is whether a restricted tcr repertoire responding to a highly variable pathogen could be associated with the emergence of viral mutants that escape t cell recognition at this epitope27-31. many studies of t cell immunity in rhesus macaque models of infection have utilized tcr repertoire data to gain additional insights5, 14, 30, 32-43. in particular, a large number of studies have characterized the tcr repertoires of target cd4 + t cell populations or cd8 + t cell populations involved in the control of simian immunodeficiency virus (siv) in rhesus macaques5, 30, 32-39, 41-43. most of these studies have relied on human tcr gene homology to identify v and j gene usage. although the rhesus macaque tcr dβ (trbd) and trbj genes have previously been sequenced44, the trbv genes were not previously available. here, we present the trbv, trbd and trbj genes extracted from the rhesus macaque genome1 on the basis of their homology with the human trb genes. 
in addition, we demonstrate extracted trb gene usage in expressed tcrβ sequences by using an existing database of 7218 tcrβ sequences involved in cd8 + t cell responses specific for the immunodominant mamu-a*01-restricted sl8/tl8 (s/ttpesanl; tat, residues 28-35) and cm9 (ctpydinqm; gag, residues 181-189) epitopes derived from siv30, 45. the trb genes extracted from the rhesus macaque genome will enable more accurate characterization of rhesus macaque tcrβ repertoires. the human trbv gene corresponding most precisely to each rhesus macaque trbv gene was identified on the basis of the highest percentage match between the nucleotide sequences for the trbv genes (i.e. v-gene in the imgt standardized labels). the percent similarity between the nucleotide sequences for the rhesus macaque and the best-match human trbv genes ranged between 78.3% and 96.5%, with an average similarity of 92.2%. we could not identify a one-to-one correspondence between all rhesus macaque and human trbv genes (figure 1). in many cases, one human trbv gene was found to be the best match to more than one of the trbv genes extracted from the rhesus macaque genome. for example, the human trbv6-5 gene had the highest percent similarity of all human trbv genes to five of the rhesus macaque trbv genes; in contrast, the human trbv6-6 gene was not the best match to any of the rhesus macaque trbv genes. for five of the 72 trbv genes, only partial sequences were available from the rhesus macaque genome (table s1 in supporting information) and only two of these partial trbv genes were incomplete at the 3' end, which would influence their use in analysis of the cdr3. the human trbv17 subgroup, consisting of just one gene, was the only one for which no corresponding trb gene was found in the rhesus macaque genome (using a cutoff of 75% similarity). we also compared the trbv exons (i.e. l-part1+v-exon in the imgt standardized labels) between the rhesus macaque and best-match human trbv genes (table 1). 
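the best-match assignment described above can be illustrated with a small python sketch. it assumes that the sequences have already been aligned to equal length (with "-" marking gaps) and defines percent identity as matching columns over aligned columns; how the percentages in the text were actually computed is not stated here, so this is only one plausible definition. the sequences and names are short invented strings, not real trbv genes.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical bases; gap-gap columns are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def best_match(query: str, references: dict) -> str:
    """Name of the reference sequence with the highest percent identity to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Invented, pre-aligned toy sequences for illustration only.
human = {"ref_a": "ACGTACGTTC", "ref_b": "ACGAACCTAC"}
macaque = {"query_1": "ACGTACGTAC", "query_2": "ACGTACGTTT"}

assignment = {name: best_match(seq, human) for name, seq in macaque.items()}
```

in this toy case both queries are assigned to the same reference and the other reference is never chosen, which mirrors how a best-match rule need not produce a one-to-one correspondence.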
the percent identities between the nucleotide sequences for the rhesus macaque and human trbv exons ranged between 72.7% and 96.5%, with an average of 92.9%. the similarities between the rhesus macaque and human trbv exons at the amino acid sequence level ranged between 19.5% and 94.7%, with an average of 85.3%. the two trbd genes extracted from the rhesus macaque genome were found to have 95.0% and 92.8% agreement at the nucleotide level with the corresponding human trbd genes (table 2 and rhesus_macaque_trbd.fsa in supporting information). the percent similarities between the rhesus macaque and human trbd exon (i.e. d-region in the imgt standardized labels) nucleotide sequences were 84.6% and 75.0%. the rhesus macaque trbd genes have been sequenced in a previous study44. the trbd1 gene extracted from the rhesus macaque genome does not differ from that reported in this previous study. a 1.2% difference was found between the trbd2 gene reported here and that reported previously, with a single nucleotide difference occurring in the 5' spacer. thus, there are no differences in the trbd2 d-region extracted from the rhesus macaque genome compared with that reported previously44. for each of the 14 human trbj genes, there was one corresponding trbj gene found on chromosome 3 of the rhesus macaque genome (table 3 and rhesus_macaque_trbj.fsa in supporting information). the percent similarities between the rhesus macaque trbj genes and the corresponding human trbj genes are shown in table 3 (range: 92.1% to 98.7%; average: 96.1%). a comparison of the rhesus macaque and human trbj exons (i.e. j-region in the imgt standardized labels) revealed percent similarities of nucleotide sequences ranging between 90.2% and 100%, with an average similarity of 95.4% (table 3). the similarities between the translated trbj exons of the rhesus macaque and human genes ranged between 81.3% and 100%, with an average similarity of 92.3% (table 3).
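the percent similarities reported above are derived from pairwise alignments of macaque and human sequences. the python sketch below is illustrative only: the function name percent_identity and the example sequences are hypothetical, and it assumes that identity is counted over every alignment column, with gap columns treated as mismatches. the exact convention used to produce the reported values is not stated in this section, so the numbers in the tables should not be expected to follow from this function without checking that assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all columns of a pairwise alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Assumption: gap columns count as mismatches and the denominator is
    the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments, not taken from the paper.
macaque = "ACGT-ACGTA"
human = "ACGTTACGTT"
identity = percent_identity(macaque, human)  # 8 matching columns out of 10
```

counting gaps in the denominator penalizes insertions and deletions; excluding gap columns would give higher values for the same alignment, which is one reason the convention needs to be stated alongside any reported percentage.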
we compared the trbj genes extracted from the rhesus macaque genome with those reported in a previous study44. the only differences found were in the trbj1-6 and trbj2-1 genes, which differed by 1.9% and 2%, respectively. a single nucleotide difference in the 20th nucleotide position of the trbj1-6 exon resulted in a difference of a single amino acid (i.e. the trbj1-6 exon from the rhesus macaque genome contained h in the 7th amino acid position instead of y). in the trbj2-1 gene, a single nucleotide difference in the 31st nucleotide position of the exon did not result in any amino acid differences between the trbj2-1 exon extracted from the rhesus macaque genome and that reported by cheynier et al.44. to demonstrate the use of the trb genes extracted from the rhesus macaque genome by expressed tcrβ sequences, we used an existing database of 7218 tcrβ sequences involved in cd8 + t cell responses specific for the immunodominant mamu-a*01-restricted siv-sl8/tl8 and siv-cm9 epitopes in 20 rhesus macaques30, 45. each of these tcrβ sequences was aligned with the trb gene exons to determine the most likely trbv, trbj and trbd gene usage. in table 4 and table 5 we show the rhesus macaque trb genes that were found to be most likely used by at least one of the tcrβ sequences. the genes used by the tcrβ sequences included 54 of the 72 trbv genes, both trbd genes, and 13 of the 14 trbj genes. the highest percent homology and longest match between each trb gene and a tcrβ sequence is also shown. of the 18 rhesus macaque trbv genes not used by the tcrβ sequences, 12 either did not begin with a start codon or contained stop codons when translated (table 1). the rhesus macaque trbj2-2p gene, which is homologous to the human trbj2-2p gene (qualified by imgt as having an "open reading frame" functionality), was the only trbj gene not used by the tcrβ sequences.
deviations between the rhesus macaque trb genes and tcrβ sequences were mostly attributed to the full-length genes not being used by the tcrβ sequences, owing to nucleotides being cleaved during tcr gene recombination. however, allelic differences could also exist between the single rhesus macaque sequenced in the genome project and the 20 siv-infected macaques from which the tcrβ sequences were obtained. possible allelic variants of the trb genes used by the tcrβ sequences were not identified due to the level of uncertainty associated with distinguishing allelic variants from sequencing errors, in either the rhesus macaque genome or tcrβ sequences, when there were often small numbers of tcrβ sequences per rhesus macaque using a particular trb gene. however, we investigated whether the nucleotide sequence variants of the trbj1-6 and trbj2-1 genes reported by cheynier et al.44 were used in our collection of epitope-specific tcrβ sequences. the previously reported variant of the trbj1-6 gene was found to be used by some tcrβ sequences, suggesting that this is an allelic variant of the trbj1-6 gene extracted from the rhesus macaque genome. the trbj2-1 gene variant was not used by any of the tcrβ sequences. this trbj2-1 gene variant may be an allelic variant that was not present in any of the 20 rhesus macaques in which the mamu-a*01-restricted siv-sl8/tl8- and siv-cm9-specific tcrβ repertoires were studied, but it is also possible that the single nucleotide difference in the trbj2-1 gene reported by cheynier et al.44 is due to sequencing error. the assembly of reference tcr gene data sets for many species has often relied on the ad hoc sourcing of different tcr genes from various studies over time.
here, we report a reference set of trb genes extracted from the rhesus macaque genome, most of which were expressed by tcrβ sequences in our extensive database of tcrβ repertoires involved in cd8 + t cell responses to the immunodominant mamu-a*01-restricted sl8/tl8 and cm9 epitopes derived from siv. although there is a high degree of similarity (93.0%) between the exons of the rhesus macaque and human trb genes, important interspecies differences exist. these interspecies differences are emphasized by the lack of a one-to-one correspondence between the rhesus macaque and human trbv genes, and could potentially limit the accuracy of studies that rely on human tcr genes to characterize rhesus macaque tcr repertoires. the rhesus macaque trb genes described herein will not only aid in the identification of the trbv and trbj genes used by tcrβ sequences, they will also improve the accuracy of studies that aim to characterize the v(d)j recombination mechanisms that produce tcrβ repertoires. indeed, several of the extracted rhesus macaque trb genes have already been used in a study of tcrβ sequence sharing between macaques in the siv-sl8/tl8-specific and siv-cm9-specific cd8 + t cell responses39. this study required predictions of the potential v(d)j recombination mechanisms involved in producing the observed epitope-specific tcrβ repertoires, which were more reliable using the rhesus macaque trb genes instead of the human trb genes. rhesus macaques are frequently used to study fundamental aspects of immunology and investigate vaccine efficacy in a variety of infectious diseases. increasing evidence, much of which has come from studies conducted with this non-human primate model, indicates that the clonotypic architecture of antigen-specific t cell populations is a fundamental determinant of immune control and disease outcome26, 45.
thus, the rhesus macaque trb genes presented here provide a valuable tool for dissecting the molecular features of tcrβ repertoires that underlie such associations in this model. the published rhesus macaque (macaca mulatta) genome1 is available from the national center for biotechnology information (ncbi) rhesus macaque genome resources website (http://www.ncbi.nlm.nih.gov/projects/genome/guide/rhesus_macaque/). the trb gene locus is located on chromosome 3 (accession number: nc_007860.1). the rhesus macaque chromosome 3 sequence was queried against all human trb reference genes (obtained from the ncbi human resources website http://www.ncbi.nlm.nih.gov/projects/genome/guide/human/) using blast (basic local alignment search tool)46 to identify regions in the rhesus macaque sequence that resembled human trb genes. results were filtered to those with e-value ≤ 0.001, total alignment length ≥ 35% of the human reference gene, and total percent identity ≥ 75% with the human reference gene. these parameters were chosen to minimize false positive search results. overlapping regions were merged and all regions were extended in both the 5' and 3' directions to account for regions missed in blast's local alignment search. sequence alignments using clustalw47 were then performed to compare each region of the rhesus macaque genome with each human trb gene from the ncbi human reference set. the best human match to each macaque region was identified and then used as a guide to determine the exact length and terminal ends of the rhesus macaque trb gene sequences, as well as intron and exon positions. we assessed the similarity between the rhesus macaque and the ncbi human trb reference gene sequences (or the imgt human trb reference gene if the ncbi reference gene sequence was partial) by identifying the human trb gene that had the highest overall percentage identity with each rhesus macaque trb gene using a clustalw alignment.
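the filtering, merging and extension of blast hits described above can be sketched in python as follows. this is a minimal illustration under stated assumptions rather than the pipeline actually used: the Hit record, the function names and the flank parameter are hypothetical, the size of the 5' and 3' extension is not given in the text, and the three thresholds (e-value ≤ 0.001, alignment length ≥ 35% of the human gene, identity ≥ 75%) are applied to each hit individually, whereas the text refers to "total" alignment length and identity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a human TRB reference gene on macaque chromosome 3."""
    human_gene: str          # name of the human reference gene
    human_gene_length: int   # length of that reference gene (nt)
    start: int               # 1-based start on the macaque chromosome
    end: int                 # inclusive end on the macaque chromosome
    align_length: int        # alignment length (nt)
    percent_identity: float  # identity with the human reference gene
    evalue: float


def passes_filters(hit: Hit) -> bool:
    """Apply the three thresholds quoted in the text to a single hit."""
    return (
        hit.evalue <= 1e-3
        and hit.align_length >= 0.35 * hit.human_gene_length
        and hit.percent_identity >= 75.0
    )


def merge_and_extend(
    hits: List[Hit], flank: int, chrom_length: int
) -> List[Tuple[int, int]]:
    """Merge overlapping filtered hits, then pad each region on both sides.

    `flank` is a hypothetical extension size; regions are clamped to the
    chromosome boundaries.
    """
    intervals = sorted((h.start, h.end) for h in hits if passes_filters(h))
    merged: List[List[int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: grow it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [
        (max(1, start - flank), min(chrom_length, end + flank))
        for start, end in merged
    ]


# Hypothetical hits, not taken from the paper.
example_hits = [
    Hit("TRBV-a", 400, 100, 300, 200, 90.0, 1e-20),
    Hit("TRBV-b", 400, 250, 500, 250, 80.0, 1e-10),
    Hit("TRBV-c", 400, 1000, 1100, 100, 95.0, 1e-5),   # too short: dropped
    Hit("TRBV-d", 400, 2000, 2300, 300, 70.0, 1e-30),  # identity too low
]
regions = merge_and_extend(example_hits, flank=50, chrom_length=3000)
```

each extended region would then be passed to a global aligner against the human reference genes to fix the exact gene boundaries, as the text describes for clustalw.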
we encountered the following scenarios: (i) a clearly identifiable one-to-one correspondence between a rhesus macaque and a human trb gene; (ii) a rhesus macaque trb gene with reasonable similarity to a group of human trb genes; and (iii) a human trb gene with no reasonable correspondence to a rhesus macaque trb gene. we therefore adopted the following approach to labelling the rhesus macaque trb genes. for each rhesus macaque trb gene, we first identified the group of human trb genes to which it was most similar (e.g. trbv1). we then numbered all rhesus macaque trb genes which were most similar to this same group of human trb genes according to the order in which the trb sequences were found in the rhesus macaque genome (e.g. trbv1-1, trbv1-2, etc.). the immunogenetics (imgt)48 nomenclature for tcr genes was used throughout. for all mamu-a*01-restricted siv-sl8/tl8-specific and siv-cm9-specific tcrβ sequences, we performed a complete alignment analysis using the identified rhesus macaque trb genes. this analysis determined for each epitope-specific tcrβ sequence the best-percentage-match trbv, trbd and trbj genes over the longest alignment length by initially aligning the trbv gene at the 5' end of the tcrβ sequence and then aligning the trbj gene at the 3' end of the tcrβ sequence. a minimum percentage match of 77% over an alignment length of at least 50 nucleotides was required for alignment of the trbv genes. for alignment of the trbj genes, a minimum percentage match of 70% was required over the length of the trbj exon. the trbd genes were then aligned to the sequence interval between the identified trbv and trbj regions. a match to a string of two or more nucleotides was considered to originate from the trbd gene. exons, introns and recombination signal sequences have been included and gene families consisting of multiple genes are highlighted.
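the labelling approach described in the methods above (assign each macaque gene to its most similar human group, then number the genes within a group by their order in the genome) can be sketched as follows. this python example is an illustration only: the function name label_genes, the input layout and the coordinates are hypothetical, and it assumes that "order in the genome" means ascending start coordinate on chromosome 3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def label_genes(regions: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Label macaque TRB genes by human group and genomic order.

    `regions` holds (genomic_start, best_matching_human_group) pairs,
    e.g. (1000, "TRBV1"). Genes are visited in ascending start position
    and numbered within their group: TRBV1-1, TRBV1-2, and so on.
    """
    counters: Dict[str, int] = defaultdict(int)
    labels: List[Tuple[int, str]] = []
    for start, group in sorted(regions):
        counters[group] += 1
        labels.append((start, f"{group}-{counters[group]}"))
    return labels


# Hypothetical coordinates and group assignments, not taken from the paper.
example_regions = [
    (5000, "TRBV6"),
    (1000, "TRBV1"),
    (3000, "TRBV6"),
    (2000, "TRBV1"),
]
example_labels = label_genes(example_regions)
```

numbering by position rather than by similarity to individual human genes avoids implying a one-to-one correspondence that, as reported in the results, does not hold for many trbv genes.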
all trbv gene sequences were aligned using clustalw and the tree was constructed in clustalw using the neighbour-joining method49 and bootstrapped 1000 times. branches with bootstrap values >80% are indicated with a black dot and branch lengths are those assigned by clustalw. the tree was visualized using the interactive tree of life50 (available at http://itol.embl.de/). note that the tree has been rotated about the mid-point of the most distant nodes to assist visualization. greenaway et al. comparison of the rhesus macaque trbv genes and their best human homologues. the best human homologue had the highest percent identity with the rhesus macaque gene nucleotide sequence. the alignment length is the total length across both the aligned rhesus macaque and human gene/exon sequences. the exon amino acid sequence was translated in the frame that yielded a start codon at the 5' end of the exon. comparisons of the exon amino acid sequences were omitted for trbv genes in which no start codon was found. for partial rhesus macaque exons missing a portion of sequence at the 5' end, the sequences were translated in the frame in which the start codon was found in the human homologues. the rhesus macaque gene is a partial sequence, with a missing portion of sequence at the 5' end of the gene. the percent identities between rhesus macaque and human genes and exons are calculated with the missing portion of rhesus macaque gene excluded. 5 the rhesus macaque gene is a partial sequence, with a missing portion of sequence at the 3' end of the gene. the percent identities between rhesus macaque and human genes and exons are calculated with the missing portion of rhesus macaque gene excluded. table 3 comparison of the rhesus macaque trbj genes and their human homologues. 1 the alignment length is the total length across both the aligned rhesus macaque and human gene/exon sequences. the trbj exons were translated in the frame that yielded the characteristic fgxg or lgxg motif. 
1 the alignments were performed over the total length of the trbd or trbj exon. evolutionary and biomedical insights from the rhesus macaque genome complete nucleotide sequence of the rhesus lymphocryptovirus: genetic validation for an epstein-barr virus animal model complete sequence and genomic analysis of rhesus cytomegalovirus development and homeostasis of t cell memory in rhesus macaque analysis of tcralphabeta combinations used by simian immunodeficiency virus-specific cd8+ t cells in rhesus monkeys: implications for ctl immunodominance dramatic increase in naive t cell turnover is linked to loss of naive t cells from old primates current concepts in aids pathogenesis: insights from the siv/macaque model preclinical study of influenza virus a m2 peptide conjugate vaccines in mice, ferrets, and rhesus monkeys aberrant innate immune response in lethal infection of macaques with the 1918 influenza virus high resolution radiographic and fine immunologic definition of tb disease progression in the rhesus macaque experimental rhesus lymphocryptovirus infection in immunosuppressed macaques: an animal model for epstein-barr virus pathogenesis in the immunosuppressed host the bzlf1 homolog of an epstein-barr-related gamma-herpesvirus is a frequent target of the ctl response in persistently infected rhesus macaques experimental coinfection of rhesus macaques with rhesus cytomegalovirus and simian immunodeficiency virus: pathogenesis induction and evolution of cytomegalovirus-specific cd4+ t cell clonotypes in rhesus macaques rhesus cmv: an emerging animal model for human cmv prolonged dominance of clonally restricted cd4(+) t cells in macaques infected with simian immunodeficiency viruses the repertoire of cytotoxic t lymphocytes in the recognition of mutant simian immunodeficiency virus variants clonal focusing of epitope-specific cd8+ t lymphocytes in rhesus monkeys following vaccination and simian-human immunodeficiency virus challenge the role of production 
frequency in the sharing of simian immunodeficiency virus-specific cd8+ tcrs between macaques contributions of cd4+, cd8+, and cd4+cd8+ t cells to skewing within the peripheral t cell receptor beta chain repertoire of healthy macaques the disruption of macaque cd4+ t-cell repertoires during the early simian immunodeficiency virus infection maintenance of cd4+ t cell tcr vbeta repertoire heterogeneity is characteristic of apathogenic siv infection in non-human primate model of aids contribution of t-cell receptor repertoire breadth to the dominance of epitope-specific cd8+ t-lymphocyte responses sequence of the rhesus monkey t-cell receptor beta chain diversity and joining loci public clonotype usage identifies protective gag-specific cd8+ t cell responses in siv infection basic local alignment search tool clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice imgt, the international immunogenetics database the neighbor-joining method: a new method for reconstructing phylogenetic trees interactive tree of life (itol): an online tool for phylogenetic tree display and annotation we thank dr mark tanaka for assistance with the phylogenetic analysis and associate professor andrew collins for helpful discussions. refer to web version on pubmed central for supplementary material. key: cord-313138-y485ev30 authors: magor, katharine e.; miranzo navarro, domingo; barber, megan r.w.; petkau, kristina; fleming-canepa, ximena; blyth, graham a.d.; blaine, alysson h. title: defense genes missing from the flight division date: 2013-04-24 journal: dev comp immunol doi: 10.1016/j.dci.2013.04.010 sha: doc_id: 313138 cord_uid: y485ev30 birds have a smaller repertoire of immune genes than mammals. in our efforts to study antiviral responses to influenza in avian hosts, we have noted key genes that appear to be missing. 
as a result, we speculate that birds have impaired detection of viruses and intracellular pathogens. birds are missing tlr8, a detector for single-stranded rna. chickens also lack rig-i, the intracellular detector for single-stranded viral rna. riplet, an activator for rig-i, is also missing in chickens. irf3, the nuclear activator of interferon-beta in the rig-i pathway is missing in birds. downstream of interferon (ifn) signaling, some of the antiviral effectors are missing, including isg15, and isg54 and isg56 (ifits). birds have only three antibody isotypes and igd is missing. ducks, but not chickens, make an unusual truncated igy antibody that is missing the fc fragment. chickens have an expanded family of lilr leukocyte receptor genes, called chir genes, with hundreds of members, including several that encode igy fc receptors. intriguingly, lilr homologues appear to be missing in ducks, including these igy fc receptors. the truncated igy in ducks, and the duplicated igy receptor genes in chickens may both have resulted from selective pressure by a pathogen on igy fcr interactions. birds have a minimal mhc, and the tap transport and presentation of peptides on mhc class i is constrained, limiting function. perhaps removing some constraint, ducks appear to lack tapasin, a chaperone involved in loading peptides on mhc class i. finally, the absence of lymphotoxin-alpha and beta may account for the observed lack of lymph nodes in birds. as illustrated by these examples, the picture that emerges is some impairment of immune response to viruses in birds, either a cause or consequence of the host-pathogen arms race and long evolutionary relationship of birds and rna viruses. lymph node duck chicken major histocompatibility complex
© 2013 elsevier ltd. all rights reserved. a survey of genomic resources demonstrates that the avian immune gene complement is reduced compared to mammals.
an initial investigation of the immune genes in the chicken genome, a red jungle fowl, suggested that birds have a reduced immune gene repertoire (consortium, 2004). as this sequence assembly and annotation has been improved, some of these missing genes have been identified; however, others are clearly not present. as genomes are sequenced for other birds, including turkey (dalloul et al., 2010), zebrafinch and duck (http://pre.ensembl.org/anas_platyrhynchos/info/index), synteny along the chromosome allowed identification of genes. thus, immune genes could be identified even if significantly diverged. a comparison of immune genes between three species of birds confirmed that immune genes show greater divergence between species than other genes, with higher dn/ds ratio than other parts of the genome and evidence of positive selection on specific codons within genes. the sequencing of cdna libraries as expressed sequence tags (ests) (carre et al., 2006), and blast homology searches helped to identify the genes. nonetheless, some genes are still unaccounted for. this appears true for all birds, although species differences exist. for some of these genes missing from the avian defense arsenal, the evidence is overwhelming, while others are less certain. in all cases, the completion and quality of the genome sequence and annotation determines whether a gene can be identified or not. gaps exist in the genome sequences, and immune genes are often present in gene families, which are particularly prone to problems with assembly. est libraries are incomplete, and immune gene expression may be restricted to certain tissues or cell types, and most importantly, only following immune activation. thus, until genomes are complete and error-free it may be premature to say that a gene is not there.
nonetheless, claiming that a gene is missing certainly inspires research aimed at confirming or disproving this, or demonstrating that another gene plays an analogous or compensatory role. thus, it is worth highlighting the genes that appear to be missing. the contracted immune gene repertoire of birds was discussed in a recent review of the progress in avian immunology since the availability of the chicken genome (kaiser, 2010, 2012). in comparison with mammals, birds have partial repertoires of pattern recognition receptors including tlr receptors (boyd et al., 2007; brownlie and allan, 2011; cormican et al., 2009) and rig-like receptors (barber et al., 2010; karpala et al., 2012). others have extensively examined the repertoire of avian cytokines (kaiser et al., 2005), chemokines (hughes et al., 2007; kaiser et al., 2005), interferons (schultz et al., 2004; schultz and magor, 2008) and defensins (lynn et al., 2007), noting the genes missing from these repertoires. the immunoglobulin locus has been characterized in ducks (lundqvist et al., 2001), and encodes just three antibody isotypes (magor, 2011). finally, the chicken major histocompatibility complex (kaufman, 2013) is a minimal mhc, where only the most essential genes have been retained. these reviews of each system, although excellent, do not dwell on the genes not found. over the course of our analysis of immune systems of ducks, we have often invested significant effort to identify homologues of the chicken or mammalian immune system. despite our best efforts, some genes have eluded our search. here we will focus on components of three parts of the immune system that we are investigating in ducks (pattern recognition, antibodies and mhc) and identify the genes that are not there in the duck or the chicken or both. we will assess the strength of the data suggesting the absence of the gene, and consider the effect of the gene loss on the immune system of the animal.
finally, we will speculate on the selective forces that may have led to the loss of the gene. innate immunity provides the first line of defense against pathogens. recognition of the pathogen through the molecular patterns of conserved pathogen components, or pattern recognition, activates a signaling cascade to turn on genes for the effectors of the immune response. toll-like receptors (tlrs) detect foreign invaders by sensing pathogen-associated molecular patterns. binding of agonists to tlrs on the cell surface, or within the endosomal compartment, activates signal transduction pathways to turn on antimicrobial peptides, cytokines, interferons and cellular killing mechanisms. birds possess genes for ten tlrs. these include two tlr1 genes, two tlr2 genes, tlr3, tlr4, tlr5, tlr7, tlr15 and tlr21. several excellent reviews have been written recently on avian tlr genes (brownlie and allan, 2011; cormican et al., 2009). two genes are missing in comparison to fish and mammals, tlr8 and tlr9. tlr9, which detects cpg, has been functionally compensated by tlr21 (brownlie et al., 2009; keestra et al., 2010). tlr7 and tlr8 are phylogenetically related as the product of an ancient gene duplication and both can recognize single-stranded rna, oligoribonucleotides, and nucleic acid analogues in the endosomal compartment (reviewed by cervantes et al., 2012). tlr8, which is present in fish and mammals, is absent in birds. sequencing downstream of chicken tlr7 showed only fragments of tlr8 (philbin et al., 2005). further, pcr evidence suggested that tlr8 was disrupted in all the galliform birds, but not anseriform birds, and there was speculation that this could account for the increased susceptibility of chickens to influenza relative to ducks (philbin et al., 2005). to follow up on this observation, we cloned duck tlr7 cdna, and isolated a genomic clone for duck tlr7, sequenced it, and examined the region downstream.
as seen for chickens, we could identify only small fragments of tlr8, and a cr1 element disrupted the gene in both ducks and chickens (macdonald et al., 2007). tlr8 is also absent from the zebra finch (cormican et al., 2009) and turkey genome (ramasamy et al., 2012). given the evolutionary distance of galliform birds and zebra finch, tlr8 is likely missing from the entire avian lineage. for several years, mouse tlr8 had been presumed non-functional, based on the lack of response to tlr7/8 agonists in the tlr7-/- mouse (hemmi et al., 2002). however, when peripheral blood monocytes from mice are treated with selective tlr8 agonists, imidazoquinoline 3m002 and poly t oligonucleotides, mouse tlr8 activation is demonstrated while tlr7 is suppressed (gorden et al., 2006). tlr8 is expressed in monocytes/macrophages and myeloid dcs, while tlr7 is expressed in pdcs and b cells (hornung et al., 2002). tlr8 also plays a role in detecting bacterial rna, including rna from borrelia burgdorferi, the agent of lyme disease, inducing production of ifn-beta through irf7 (cervantes et al., 2011). tlr8 is upregulated by the phagocytosis of mycobacterium, including the attenuated bcg vaccine strain, mycobacterium bovis, and mycobacterium tuberculosis (davila et al., 2008). human tlr8 allelic variants are associated with increased susceptibility to pulmonary tuberculosis (davila et al., 2008). the protective allele is associated with decreased translation of tlr8, presumably resulting in a decrease in sensing and activation, and less inflammation (davila et al., 2008). effectively, loss of tlr8 expression is protective against tuberculosis. it is not clear why the loss of tlr8 was selected for in birds. the simplest explanation is the similarity of function between tlr7 and tlr8 rendered the second gene non-functional. in this scenario, however, there is no selection for the deletion of the gene.
alternatively, tlr8 became detrimental, perhaps by recognizing self-antigens and initiating autoimmunity. negative selection would then likely lead to the loss of this receptor. tlr7 has been implicated in induction of autoimmunity (mills, 2011). early experiments used chickens to demonstrate thyroid autoimmunity (sundick et al., 1992; wick et al., 1974) but it is not known to what extent avian species suffer autoimmunity in nature. ironically, knockout of tlr8 in mice leads to autoimmunity through overexpression and dysregulation of tlr7 (demaria et al., 2010). intriguingly, nucleic acid sensing tlrs are implicated in preventing reactivation of host retroviral elements and consequent tumor production (yu et al., 2012). tlr7 has been directly implicated in this immunosurveillance, as lack of antibodies against endogenous retroviral elements correlates with absence of tlr7 in knockout mouse strains. this crucial role of tlr7 in immunosurveillance of endogenous retroviruses would provide the selective pressure to retain tlr7 in the genome, regardless of how tlr8 was lost. since tlr8 appears to have been inactivated by a cr1 repetitive element, it is tempting to speculate that tlr8 was lost in a hypothetical reactivation of endogenous retroviral elements that disrupted the genome in a distant avian ancestor. jim kaufman has alluded to such a catastrophic 'avian big bang' in describing the loss of several avian mhc genes (kaufman and wallny, 1996). another theoretical possibility is that tlr8 became the target of a pathogen that subverted it for its own benefit (discussed in barber, 2011). viral subversion of tlr3 is such that its absence increases host survival from many pathogens. tlr3-induced host proinflammatory cytokines allow west nile virus to cross the blood brain barrier (wang et al., 2004). tlr3 activity has also been implicated in influenza-induced pneumonia (le goffic et al., 2006) and morbidity from vaccinia infection (hutchens et al., 2008).
along these lines, we can envision a pathogen that subverted avian tlr8 for increased susceptibility. this could include viral targeting of tlr8 receptor for increased inflammation and pathology, or subversion of an endosomal tlr8 for entry of a mycobacterial pathogen into the cell. mycobacteria are initially engulfed by macrophages, but survive and multiply intracellularly. thus, bacterial or viral subversion of a prr may drive selection to disable the gene. whether cause or effect, the lack of tlr8 in avian monocytes/macrophages likely does contribute to the susceptibility of birds to rna viruses (west nile virus, newcastle disease virus, influenza virus and others) and intracellular bacterial infections, including mycobacteria. mycobacterium avium is a significant pathogen of birds, particularly those raised in small flocks, while modern flock hygiene has reduced the incidence in commercial poultry. susceptibility to mycobacteriosis in birds varies, with chickens, pheasants and partridges being most susceptible, ducks and geese moderately resistant, and pigeons being very resistant (reviewed in tell et al., 2001). rig-i is a cytoplasmic pattern recognition receptor for single-stranded 5'-triphosphate rna with short double-stranded conformation, such as panhandle structures of viral genomes (hornung et al., 2006; pichlmair et al., 2006; schlee et al., 2009; yoneyama et al., 2004). both rig-i, and the related pattern recognition receptor for intracellular rna, mda5, share the same pathway signaling through mavs on the mitochondrion (fig. 1). after detection of viral rna by rig-i, a conformational change releases the card domains (kolakofsky et al., 2012; kowalinski et al., 2011; luo et al., 2011; takahasi et al., 2008). trim25, an e3 ubiquitin ligase, interacts with the card domains of rig-i to activate it through attached (gack et al., 2007) or unanchored k63-polyubiquitin chains (jiang et al., 2012; zeng et al., 2010).
the relative importance of these two mechanisms in the activation of rig-i is still controversial, but activation leads to oligomerization and rig-i translocation to the mitochondria. translocation of rig-i and trim25 to the mitochondrial membrane involves the mitochondrial chaperone 14-3-3ε, allowing interaction with mavs at the mitochondrion. this interaction induces prion-like aggregates of mavs (hou et al., 2011) that initiate signaling leading to irf3/7 translocation and the production of type i interferons and proinflammatory cytokines. the gene encoding rig-i, ddx58, is not annotated in the chicken genome sequence, and is missing in some fish species, but mda5 homologues are present in all vertebrate families (zou et al., 2009). we demonstrated that ducks have a functional rig-i (barber et al., 2010). in contrast, the ddx58 gene appears absent in chickens by analysis of the syntenic region of the z chromosome, although we can identify the flanking gene. we also cannot find rig-i in a search of the expressed sequence tag database for chickens. thus rig-i is missing from the genome of the ancestral chicken, represented by the red jungle fowl, and from the sequences of modern commercial chicken breeds. our southern blots show that a duck rig-i probe cross-hybridizes with pigeon dna, but not with chicken dna (barber et al., 2010). we also cannot detect the gene in dna of turkey or partridge, suggesting that the gene is missing in galliformes (barber, 2011). furthermore, we showed that chicken df-1 cells cannot detect rig-i ligand, but if we transfect the cells with duck rig-i we can reconstitute the pathway (barber et al., 2010). the loss of rig-i likely contributes to the greater susceptibility of chickens, compared to ducks, to a variety of single-strand rna viruses, including influenza a virus and newcastle disease virus, both of which cause more harm in chickens than in ducks.
the related rna detector, mda5, can partially compensate and detect avian influenza in chicken cells to generate an interferon response (karpala et al., 2011; liniger et al., 2012) . it is difficult to speculate on the selective forces resulting in loss of rig-i in some birds. some have suggested the possible existence of a compensatory yet-to-be identified alternate receptor (karpala et al., 2011) , which could certainly facilitate the loss from the genome. rig-i was initially identified in a leukemia cell line upregulated by retinoic acid (liu et al., 2000) , and indeed it is upregulated by a variety of stress inducers. rig-i is implicated in a number of other biological events including cell proliferation, apoptosis, senescence, and acute and chronic inflammatory diseases (liu and gu, 2011) . it is possible that selection to eliminate rig-i from aberrant activation in one of these alternate roles had resulted in the loss of rig-i in galliform birds. finally, there remains the intriguing possibility that the rig-i receptor was the prey of one of the many single-strand rna viruses that infect birds, including influenza virus, newcastle disease virus, west nile virus, and coronaviruses, and was usurped for the virus' advantage. the tlr and rlr signaling pathways are targets for viral subversion (reviewed by es-saad et al., 2012; ramos and gale, 2011) . influenza virus interferes in the mammalian rig-i pathway in several places, through the action of ns1 protein (gack et al., 2009) . in a similar manner, paramyxoviruses make the v protein that interferes with signaling by the chicken mda5 receptor (childs et al., 2007) involving direct protein-protein interaction and preventing interaction with the rna ligand (motz et al., 2013) . while this interference renders the receptor non-functional during an infection, it would not necessarily lead to selection to eliminate the receptor. 
however, we can envision interactions with the rig-i receptor where regulation is aberrant, or excessive activation leads to death. in this scenario, loss of rig-i could provide a selective advantage to survive a lethal infection with an unknown pathogen. rig-i knockout mice are embryonic lethal due to liver damage (kato et al., 2005), suggesting that rig-i is also involved in development in some capacity. aberrant expression or loss of expression of rig-i during development due to pathogen subversion, and associated embryonic lethality, could result in selective loss of this receptor. while this raises the question of how the birds without rig-i survived, we note that rig-i is not absolutely essential for development, since rig-i knockout mice that are fertile and viable have since been made on a different genetic background (wang et al., 2007). given that rig-i expression is impaired in lethal infections (kobasa et al., 2007), it is possible to envision a scenario by which aberrant expression of rig-i leads to death, and the gene is selectively lost in a common ancestor of chickens and turkeys. human rig-i is regulated through polyubiquitination, isgylation, sumoylation, phosphorylation and alternate splicing (eisenacher and krug, 2012; loo and gale, 2011; maelfait and beyaert, 2012; oshiumi et al., 2012; wang et al., 2011). a major on-off switch for human rig-i upon viral infection is polyubiquitination by the host e3 ubiquitin ligase, tripartite motif protein 25 (trim25) (gack et al., 2007). sequences at the t55 residue implicated in interaction with trim25 and the site of attachment of polyubiquitin chains, k172, are not conserved in duck or zebra finch rig-i (barber et al., 2010) or goose rig-i. thus activation of avian rig-i may involve ubiquitination at alternate residues, or unanchored polyubiquitin chains, not attached to any protein, may activate rig-i (zeng et al., 2010).
given the lack of rig-i in chickens, the recent observation that knockdown of chicken trim25 impairs the interferon response of chicken cells is intriguing (rajsbaum et al., 2012). perhaps trim25 is involved in the activation of chmda5. the binding of unanchored k63 polyubiquitin chains can activate human mda5 in vitro (jiang et al., 2012). a role for trim25 in generating or attaching ubiquitin chains to mda5 could explain the importance of chtrim25 in the interferon response of chicken cells. riplet/rnf135 is a cytoplasmic e3 ligase identified by yeast two-hybrid as one of the proteins binding rig-i, and is essential for rig-i activation in human cell lines upon infection with an rna virus (oshiumi et al., 2009; oshiumi et al., 2010). riplet shares 60.8% identity with trim25 in humans (oshiumi et al., 2009), and also has an n-terminal ring domain and c-terminal pry/spry domain. the ring domain confers ubiquitin e3 ligase activity (nisole et al., 2005) and also contributes to other protein-protein interactions (borden, 2000). riplet also mediates k63-polyubiquitination of rig-i (gao et al., 2009; oshiumi et al., 2010). however, there is debate as to whether riplet interacts with the card domains of rig-i (gao et al., 2009), the c-terminal repressor domain of rig-i, or both (oshiumi et al., 2010). riplet is crucial for rig-i activation in cells regardless of expression of trim25 (oshiumi et al., 2010). knockout of riplet (oshiumi et al., 2010) or trim25 (gack et al., 2007) impaired the rig-i dependent innate immune response, suggesting that both are required. knockout of riplet resulted in animals that were deficient in the production of interferon in response to rna viruses, but not dna viruses (oshiumi et al., 2010). riplet is present in zebra finch (taeniopygia guttata), but we were unable to find the ortholog in the chicken (gallus gallus) genome. in the duck, we have located a putative riplet coding region, but it lacks exon 1.
in repeated 5′ race experiments, all clones recovered contain sequences that correspond to an intact open reading frame, but lack the expected ring domain. in mice, deletion of the ring domain prevents rig-i activation (oshiumi et al., 2010); therefore we hypothesize that absence of the ring domain in duck riplet may render it functionally inactive. nonetheless, we saw upregulation of riplet and trim25 in duck lung at 1 dpi with highly pathogenic avian influenza virus (fleming-canepa x. et al., unpublished data). because riplet may not be functional in ducks, but is still highly upregulated during influenza infection, we speculate that riplet is acting as a decoy for the viral ns1. influenza a ns1 protein interacts with trim25 (gack et al., 2009) and riplet (rajsbaum et al., 2012), causing inhibition of innate immune signaling. alternatively, riplet may dimerize with other e3 ligases to function, as recently shown for trim16 (bell et al., 2012). comparison of embryonic fibroblast cells from rig-i knockout and wild-type mice upon influenza infection reveals genes that are downstream of rig-i signaling. these genes, which are induced by influenza infection in a rig-i dependent manner, have been referred to as the rig-i bioset. in mouse fibroblast cells, the genes include ifnb, irf3, irf7, stat1, stat2, pkr, oas, mx1, ifit2 (isg54), ifit1 (isg56) and rsad2 (viperin) (loo et al., 2008). while the overlap between rig-i and mda5 inducible genes downstream of mavs signaling in chicken cells is unknown, we used a microarray approach to examine the genes turned on by rig-i in chicken cells. in chicken df-1 cells transfected with duck rig-i, expression of the rig-i gene bioset is augmented (barber et al., 2013). we noted that some essential genes of the mouse rig-i bioset are missing in avian species, including irf3, isg15, ifit2 (isg54) and ifit1 (isg56).
interferon regulatory factor-3 is a critical player in the induction of type i ifns following virus infection (au et al., 1995). irf3 and irf7 have different and crucial roles in the induction of ifna/b. irf3 is constitutively expressed, and is activated by c-terminal phosphorylation that allows dimerization and nuclear localization (lin et al., 1998). this led to the suggestion that irf3 was responsible for the initial upregulation of the ifnb gene, followed by interferon dependent induction of irf7. however, irf7−/− knockout mice are severely impaired in interferon production upon infection with ssrna viruses (honda et al., 2005), suggesting the contribution of irf3 is minor. although a gene has been named irf3 in chickens (grant et al., 1995), it is interferon inducible and more similar to irf7. others have noted the absence of irf3 in chickens (huang et al., 2010) and in avian species (cormican et al., 2009). it is not known which irf is translocating to the nucleus to activate interferon in the rig-i/mda5 pathway in avian species. we speculate that irf7 fulfills the nuclear translocation and activation of type i ifns in both tlr and rlr signaling, but this has not been experimentally examined. interferon stimulated gene 15 (isg15) is highly upregulated by interferon treatment and was the first ubiquitin-like modifier identified. the amino acid sequence of isg15 is similar to a linear ubiquitin dimer (reviewed in zhang and zhang, 2011). isg15 is conjugated to proteins like ubiquitin, through a process called isgylation. among the identified isgylated substrates are interferon-induced proteins like pkr, rig-i and mxa (zhao et al., 2005). irf3 isgylation by herc5 (the main isg15 e3 ligase in human) increases stability of irf3, exerting a positive regulation in the rig-i pathway. negative feedback on rig-i expression and signaling is mediated by isg15 conjugation to rig-i (kim et al., 2008).
isg15 is also involved in a direct antiviral mechanism where isgylation of influenza a virus ns1 protein impairs viral replication. no genes homologous to human isg15 have been annotated in any of the available avian genomes. in the chicken, homologs of the genes located adjacent to human isg15 were predicted, including hes1 (homologous to human hes4) and agrn. within this syntenic region of the chicken genome no ubiquitin-like gene was present. similarly, homologs of the hes1 and agrn genes were found in the duck scaffolds (scaffold 1197 and 2665, respectively), but synteny analysis cannot be performed because the scaffolds do not overlap. enzymes involved in the isgylation system (including ube1l, ubch8, herc5 and usp18) are present in the chicken and duck genomes, but there is not yet any functional evidence of isgylation in these species. expression of usp18, which is responsible for cleavage of isg15 from isgylated substrates, correlates with survival of influenza-infected chickens, indirectly suggesting some functionality of the isgylation system (uchida et al., 2012). isg15 conjugation plays many roles in mammalian antiviral immunity, including isgylation of mx, pkr, rig-i, irf3 and influenza ns1 protein (reviewed in skaug and chen, 2010). however, given the absence of several isg15 targets, including rig-i in chickens and irf3 in birds, and evidence that mx is non-functional in chickens (schusser et al., 2011) and ducks (bazzigher et al., 1993), the absence of this ubiquitin modifier in birds would be less significant. it is not known whether ns1 is modified by isg15 in avian hosts. we cannot rule out the possibility that we have failed to identify the avian isg15 homolog because of low sequence conservation with human isg15. it is also possible that another unknown ubiquitin-like modifier in birds plays the role of isg15 within the isgylation system.
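the synteny argument used for isg15 (look between the conserved flanking genes and ask whether the target gene is annotated there) can be illustrated with a minimal python sketch. the gene orders below are simplified, hypothetical lists written only for illustration, not real annotation exports, and the function names are our own. note that the check returns no answer when a flank is missing from the list, which mirrors the duck case where the two flanking genes lie on non-overlapping scaffolds.

```python
def genes_between_flanks(gene_order, left_flank, right_flank):
    """Return the genes annotated between two flanking genes.

    Returns None when either flank is absent from the list, i.e. when the
    region cannot be assessed (for example, flanks on different scaffolds).
    """
    if left_flank not in gene_order or right_flank not in gene_order:
        return None
    i, j = sorted((gene_order.index(left_flank), gene_order.index(right_flank)))
    return gene_order[i + 1:j]


def target_in_syntenic_block(gene_order, left_flank, right_flank, target):
    """True/False if the block is assessable, None if a flank is missing."""
    between = genes_between_flanks(gene_order, left_flank, right_flank)
    if between is None:
        return None
    return target in between


# Hypothetical, simplified gene orders (illustration only).
human_region = ["hes4", "isg15", "agrn"]
chicken_region = ["hes1", "agrn"]      # no gene annotated between the flanks
duck_scaffold_a = ["hes1"]             # flanks on separate scaffolds
```

with these toy inputs, the human region reports the target as present, the chicken region reports it as absent, and the single duck scaffold yields `None`, meaning the question cannot be answered from that scaffold alone.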
intriguingly, the most similar sequence to isg15 in the duck and chicken genome lies within the c-terminal end of the 2′,5′-oligoadenylate synthetase-like (oasl) gene. oasl has two tandem ubiquitin-like domains that share 37% amino acid identity to human isg15. the antiviral activity of the human p59 protein, encoded by human oasl, is dependent on the c-terminal ubiquitin-like domain (marques et al., 2008). however, the biological function of the oasl ubiquitin-like domains is not yet clear and its role as ubiquitin-like modifier has not been described. the interferon-induced proteins with tetratricopeptide repeats (ifit) genes are highly upregulated by type i ifns or by viral infection (bluyssen et al., 1994; levy et al., 1986; wathelet et al., 1986). the human ifit gene family consists of ifit1 (isg56), ifit2 (isg54), ifit3 (isg60), and ifit5 (isg58), while the mouse ifit family lacks ifit5 and contains ifit1, ifit2, and ifit3 (bluyssen et al., 1994). the ifit family appears to be limited to a single gene in marsupials, birds, frogs and fish (reviewed by zhou et al., 2013). while these proteins have served as markers of viral infection, only recently have their functions in the innate antiviral response been elucidated (daffis et al., 2010; fensterl et al., 2012; mcdermott et al., 2012; pichlmair et al., 2011; schmeisser et al., 2010). ifit proteins reside within the cytoplasm of cells, and all contain multiple tetratricopeptide repeats (tprs) (lamb et al., 1995). the tprs within these proteins consist of a helix-turn-helix motif and facilitate protein-protein interactions (blatch and lassle, 1999). ifit1 and ifit2 mediate their antiviral activity by a disruption of translation via an interaction with eukaryotic initiation factor 3 (eif3). ifit1 and ifit2 also inhibit the translation of viral mrnas lacking a 2′-o methylation cap structure (daffis et al., 2010).
ifit1 and ifit5 have the ability to bind to, and sequester, viral 5′-triphosphate rna (pichlmair et al., 2011). the crystal structure of ifit2 has revealed an rna binding domain that also may function in an antiviral context (yang et al., 2012). the multi-functional ifit1 protein can also restrict the replication of human papilloma virus (hpv) by binding the viral helicase e1, and restricting its function in viral replication (saikia et al., 2010). interestingly, ifit1 has also been associated with negative feedback regulation of genes upregulated during viral infection, further demonstrating the diverse function of these genes. the ifit gene family represents a significant contributor to the broad-ranged antiviral activity of interferons, and plays an important role in the cellular, innate antiviral response. in avian species, the only identifiable ifit gene encodes a protein that aligns with other ifit5 proteins in a phylogenetic tree (fig. 2). the upregulation of ifit5 following viral infection of chicken cells expressing duck rig-i (barber et al., 2013) or infection of ducks (vanderven et al., 2012) suggests ifit5 is an important antiviral effector in avian species. the apparent absence of an expanded ifit gene family in avian species suggests that several of the functions attributed to ifit proteins will be missing. indeed, the specific role of avian ifit5 during a viral infection is unknown. birds have only three antibody isotypes, igm, iga and igy. igy, the avian serum ig most similar to mammalian igg, is a precursor to igg and ige that has composite function of both isotypes (warr et al., 1995). ducks make a truncated version of igy (magor et al., 1992). in addition, birds use a single light chain gene of the k type (magor et al., 1994a; reynaud et al., 1983).
ducks have three immunoglobulin heavy chain genes arranged in the gene order ighm, igha and ighy encoding the mu, alpha and upsilon chains for igm, iga and igy, respectively (lundqvist et al., 2001; magor et al., 1999). the igha gene, encoding alpha, is inverted in the locus, and ighd (delta) is absent. despite availability of chicken, zebra finch and turkey genomes, no other avian immunoglobulin heavy chain locus has yet been assembled. from the limited analysis that has been published for chicken igh, it shares the same organization. the transposition of igha from the 3′-most position in the locus to an inverted position downstream of ighm may have also resulted in the loss of ighd. lack of ighd is evident from genomic sequencing for ducks. early studies reported a d chain in chickens (chen et al., 1982), but it is generally accepted that there is no avian homologue of igd. since igd has been identified in teleosts (bengten et al., 2002; wilson et al., 1997), frogs (zhao et al., 2006) and reptiles (cheng et al., 2013; wei et al., 2009), the ighd gene was lost in birds. igd is an enigmatic antibody that exists in a wide variety of forms in different species, except birds. the function of igd is beginning to emerge from observations first made for fish, and subsequently for human igd. igd functions at the interface of innate and adaptive immune responses. in fish, igd specific b cells have been identified, and secreted igd lacks the variable region, suggesting it functions more like a pattern recognition receptor (edholm et al., 2010). igd is found on the surface of granulocytes in fish, which do not make the igd transcript, and involves a specific receptor (edholm et al., 2010). in humans, circulating igd binds to basophils and activates antimicrobial and inflammatory factors. igd from igd+ igm− b cells binds to basophils, and can also bind to certain bacteria in the respiratory tract.
the basophil binds igd through a specific receptor, and cross-linking of igd leads to the production of b cell activating factors (baff) and pro-inflammatory cytokines. serum igd is elevated in patients with chronic infections, and specific igd antibodies could be demonstrated in a number of these infections (reviewed by chen and cerutti, 2010). this ancient surveillance system serves to instruct the b cells of the type of pathogens in the respiratory tract. as the specific functions of igd are elucidated, the consequences of the lack of igd antibody in birds will become evident. birds have basophils, but it is unclear whether a different ig isotype can bind to the basophil igd receptor to compensate, or whether the receptor exists in birds. indeed, it remains to be demonstrated that a homologous receptor is involved in the igd binding by basophils of humans and fish. duck igy is made in two secreted forms, a full-length form and a truncated form. the truncated form, called igy(Δfc), lacks the fc region entirely. it arises from alternate splicing that adds an exon encoding just two amino acids after the ch1 and ch2 domains, and uses an alternate polyadenylation site (magor et al., 1992; magor et al., 1994b). what controls the alternate splicing is unknown, but the truncated form predominates later in the immune response. the igy(Δfc) antibodies would be expected to be defective in several processes such as antigen internalization, which is required for appropriate presentation of antigens needed to generate t cell help. the truncated igy also does not participate in complement fixation, opsonization or precipitation reactions, and reportedly also cannot participate in hemagglutination inhibition (hi) (higgins et al., 1987). of benefit to ducks, perhaps the truncated igy helps prevent viral internalization through receptor-mediated endocytosis and subsequent infection of macrophages and other leukocytes (magor, 2011).
fig. 2. a phylogenetic tree showing similarity of avian and mammalian ifit sequences. sequences were aligned and a phylogenetic tree was generated by maximum likelihood estimation with the program phyml, using www.phylogeny.fr (dereeper et al., 2008). accession numbers for the ifit sequences were: chicken ifit5 (xm_421662.3), turkey ifit5 (xm_003208028.1), zebra finch ifit5 (xm_002188552.1), human ifit1 (nm_001270927.1), mouse ifit1 (nm_008331.3), human ifit2 (nm_001547.4), mouse ifit2 (nm_008332.3), human ifit3 (nm_001031683.2), mouse ifit3 (nm_010501.2), human ifit5 (nm_012420.2). note the duck ifit sequence is a partial sequence.

the chicken ig-like receptor (chir) genes (dennis et al., 2000) are counterparts of the leukocyte immunoglobulin-like receptor family (lilr). the chir genes constitute a large and diverse family of genes in the chicken, with more than 100 members located in a region syntenic to the mammalian leukocyte receptor complex (lrc) (laun et al., 2006; nikolaidis et al., 2005; viertlboeck and gobel, 2011; viertlboeck et al., 2005) and vast diversity within an individual (viertlboeck et al., 2010). chir are expressed in a variety of myeloid and lymphoid cells, with individual receptors expressed in a cell-type restricted manner (viertlboeck et al., 2005). receptor diversity includes variation within a hypervariable region, the putative binding region, alternate transcript splicing, and presence or absence of a functional activation or inhibitory motif. the extensive expansion and diversification of this family in chickens, and leukocyte expression, suggests their evolution is in response to the pressure of pathogens, as suggested for human lilr and mouse pir genes (barclay and hatherley, 2008). human lilr receptors are involved in self/non-self recognition and some engage mhc class i targets, as well as pathogen mimics of mhc proteins (anderson and allen, 2009; brown et al., 2004).
staphylococcus aureus targets the mouse inhibitory receptor pir-b for increased virulence (nakayama et al., 2012) . in turn, activating pir-a receptors may have evolved in response to the selective pressure from pathogens, as indicated by the relict itims in the pir-a gene suggesting it is derived from a pir-b ancestor (nakayama et al., 2012) . thus, counterbalance through inhibitory and activating chir proteins may have evolved in response to pathogen manipulation of immune signaling through these receptors. we have searched unsuccessfully for immunoglobulin superfamily members homologous to the chicken chir receptors in ducks. in high and low stringency southern blots, genomic dna from chickens shows an extensive pattern of hybridization, while dna from ducks shows no significant hybridization (macdonald et al., 2007) . all efforts to amplify these genes by polymerase chain reaction using several sets of degenerate primers are completely unsuccessful. our searches of the draft assembly of the duck genome, and 70,000 expressed tag sequences generated by 454 sequencing, also find no evidence of chir homologues. we cannot rule out the possibility that we have simply missed the chir genes due to weak homology, if they have evolved to be quite different in ducks. this in itself is quite intriguing. the rapid species-specific divergence of primate lilr genes, with only some genes showing clear orthologous relationships between species (canavez et al., 2001) , while others have evolved to be unique in each species is thought to reflect their species-specific interactions with pathogens. alternatively, these genes are truly absent from ducks, despite their presence in chickens. there are several examples where different vertebrates have employed different families of leukocyte receptors (parham and moffett, 2013) . for example, cattle use kir as nk cell receptors, which have undergone expansion (mcqueen et al., 2002) while horses use ly49 (takahashi et al., 2004) . 
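the degenerate primers mentioned above for the chir search are oligonucleotide pools written with iupac ambiguity codes, so that one primer string stands for many concrete sequences. as a minimal python sketch (the example primer strings are invented for illustration and are not the primers used in the work cited), the degeneracy of a primer and the set of sequences it represents can be computed as follows.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]
```

for a hypothetical primer such as `GGNTAY`, `degeneracy` returns 8 (four options at `N`, two at `Y`). higher degeneracy tolerates more sequence divergence between species, at the cost of diluting each individual sequence in the pool, which is one reason a failed amplification cannot by itself establish that a gene is absent.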
although there are hundreds of chir genes, the only chir with a known ligand is chir-ab1, which functions as the chicken igy fc receptor (viertlboeck et al., 2007). chickens have a large number of chir-ab1 genes that have varying specificities for igy. remarkably, we cannot find identifiable homologues of the chir-ab1 in ducks. duck full-length igy does not bind the chicken fc receptor (viertlboeck et al., 2007). as noted above, ducks also make a truncated igy(Δfc) that would be expected not to bind fc receptor. göbel speculated that the loss of the igy fc fragment, and the duplication and divergence of the chicken chir-ab1 (fc receptor) family, were both strategies to evade a pathogen interfering with the igy-fc receptor interaction in birds (purzel et al., 2009). ducks evade this pathogen by production of an antibody lacking the fc region, retaining the specificity for the antigen, as this truncated form predominates in the later immune response. in chickens, selection favored the duplication of the chir receptor family to make a large number of potential 'decoy receptors' for igy. while no known pathogen targets the fc-igy interaction in birds, this is a very interesting hypothesis. chickens may elude this pathogen through the binding of decoy receptors, while ducks may avoid the internalization of an intracellular pathogen through the production of the igy(Δfc). the vertebrate mhc is the most dynamic part of the genome, showing repeated cycles of 'birth-and-death' evolution (kelley et al., 2005). polygeny and polymorphism are hallmarks of the region, with varying numbers of genes between species (and sometimes between individuals). usually it is not possible to identify orthologous genes between species. the mhc class i and class ii genes are the most polymorphic genes in the vertebrate genome.
the mhc of the chicken has been referred to as the 'minimal mhc' (kaufman et al., 1995) , fulfilling all the requirements of an mhc region, with a limited set of genes. the b locus, or genomic mhc region, contains just 19 genes within 92 kb (kaufman et al., 1999) . mhc class i genes flank either side of the transporters for antigen processing (tap) genes. they are referred to as the major (bf2) and minor (bf1) mhc class i loci. similarly, the mhc class ii genes (blb1 and blb2) are located on either side of tapasin (tap-bp), and in close proximity to the chaperones involved in mhc class ii loading (dma and dmb). several genes are notably absent, including the proteasome genes lmp2 and lmp7, as well as genes encoding tnf alpha, and lymphotoxin alpha and beta. kaufman argues the mhc organization critically affects function because proximity of the genes involved in antigen transport and presentation, allows their encoded proteins to evolve to work together. indeed, tap1 and tap2 genes are also polymorphic, and using a peptide translocation assay, kaufman recently showed that tap determines specificity for the linked dominant mhc class i gene (walker et al., 2011) . the limitation to one mhc class i gene in chickens, impairs defense against viral pathogens, as ability to defend against a particular pathogen is completely dependent on whether or not it can load peptides from that pathogen. the best illustration of the consequences of limited mhc class i presentation is the ability of chickens of one genotype to defend against rous sarcoma virus, while other strains cannot (wallny et al., 2006) . the duck has a functionally similar mhc class i region, with 5 mhc class i genes encoded adjacent to the tap genes (moon et al., 2005) . ducks predominantly express one gene, which is adjacent to the tap2 gene, which is also polymorphic (mesa et al., 2004) . in ducks, as in chickens, this is expected to have functional consequences for the defense against viruses. 
viruses can easily change the one or two epitopes that can be presented by alleles encoded by one mhc class i gene, and thus escape the cytotoxic t cells focused on these epitopes. we have been unable to identify the tapasin (tapbp) gene in the duck mhc. tapasin bridges the gap between the tap transporter and empty mhc class i molecules, bringing them into close proximity to the translocation core where peptides are loaded (sadasivan et al., 1996). in the absence of tapasin, empty mhc class i molecules weakly associate with tap, leading to binding and cell surface expression of less than optimal peptides (grandea et al., 1995). tapasin has been identified adjacent to mhc class ii in other birds, including chicken (frangoulis et al., 1999), quail (shiina et al., 1999), turkey (chaves et al., 2009), pheasant, zebra finch and black grouse. also, the tapasin gene is polymorphic in chicken, turkey and pheasant (sironi et al., 2006). through analysis of the unannotated duck genome (pre-ensembl) we can identify the location of the duck mhc class ii genes, but a search for tapasin within proximity is unsuccessful. using primers based on the chicken sequence, or conserved regions identified in aligned avian tapasin sequences, our attempts to amplify tapasin from mallards or a domestic duck by rt-pcr or from genomic dna fail to yield a tapasin product. it is possible that failure to amplify tapasin from ducks is due to sequence divergence of tapasin between avian species. the galliform tapasin proteins are about 90% identical, but the human and chicken tapasin share only 36% amino acid identity (frangoulis et al., 1999). in low stringency southern blot analysis, distinctive bands could be detected in chicken; however, the probe does not hybridize to duck genomic dna (petkau, 2012).
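the identity figures quoted above (about 90% among galliform tapasins, 36% between human and chicken) are percent-identity values computed over aligned sequences. a minimal python sketch of that calculation is given below. it assumes the two sequences have already been aligned with gap characters by an external tool, and it uses one common convention, counting only columns where neither sequence has a gap; other conventions (for example dividing by the full alignment length) give different numbers, and the source does not state which was used. the toy sequences in the usage note are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns containing a gap in either sequence are ignored (an assumed
    convention); identity is matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

for the invented pair `MKV-LA` and `MRVQLA`, five columns are compared and four match, giving 80% identity. at identities as low as 36%, primers or probes designed from one species often fail to anneal in another, which is the caveat raised in the text for the negative pcr and southern blot results.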
although tapasin seems to play an important role in antigen presentation, certain human mhc class i alleles can function in a tapasin-independent manner (park et al., 2003; lewis et al., 1998). a single amino acid substitution in hla-b from asp116 to tyr116 allows the latter to function in a completely tapasin-independent manner (sieker et al., 2007). it also appears that tapasin influences the peptide repertoire presented to mhc class i, favoring certain peptides over others. in the absence of tapasin, antigen presentation is altered, rather than deficient, and is still sufficient to induce an immune response (boulanger et al., 2010). similarly, the absence of immunoproteasomes in mice results in a change in 50% of the loaded peptide repertoire (kincaid et al., 2012). we could speculate that loss of tapasin was advantageous in ducks. we presume that ducks, like chickens, already had constraints on presentation of antigens by mhc class i due to potential co-evolution of the tap transporters and adjacent mhc class i genes. evidence for this is that both tap1 and tap2 are polymorphic in ducks, and they express one dominant mhc class i gene (mesa et al., 2004). perhaps the loss of tapasin permits closer interaction of the tap transporter and the specific mhc class i molecule intended for loading. proteins encoded by genes co-evolving along a haplotype reach a best fit. tapasin, as the bridge between tap and mhc class i molecules, could serve to bring the incorrect mhc class i into proximity of the tap transporters from the other haplotype, unless it was also evolving to keep step. the genes encoding tnf-alpha (tnf-a), and lymphotoxin-alpha and beta (lta and ltb) are missing from the avian mhc. extensive efforts by pcr, est mining, and hybridization to identify tnf-a in chickens and ducks have failed. similarly, the two genes encoding lta and ltb are missing in chickens (kaiser, 2012), and a scan of the genome sequence shows they are also absent in ducks.
lymphotoxin-alpha knockout mice lack lymph nodes (de togni et al., 1994) , and similarly chickens have no lymph nodes. primitive lymph nodes were previously described in ducks (berens von rautenfeld and burdras, 1983) and immunoglobulin transcripts were analyzed in lymphatic tissues isolated from ducks (bando and higgins, 1996; magor et al., 1994a) . however, we question whether these tissues contain recognizable lymph nodes containing secondary lymphoid tissue, as we are unable to identify anything in the lymphatic tissue that resembles a lymph node, even tracking with injected india ink. we examined isolated lymphatic tissues in ducks for mrna expression of ccl19 and ccl21, the two chemokines which are responsible for recruiting naïve t cells and dendritic cells to lymph nodes. we showed expression of these chemokines was negligible in lymphatic tissues, and abundant in spleen and influenza-infected lung tissues (fleming-canepa et al., 2011) . clearly, the lymphatic tissues of ducks are not sites of recruitment of lymphocytes and dendritic cells as expected for secondary lymphoid tissues. thus, ducks and chickens, like all other non-mammalian vertebrates (hofmann et al., 2010) , lack true lymph nodes. through comparison of the immune arsenal of ducks and chickens we highlight several immune genes that are 'mia or missing in action' from the flight divisions. we refer to these genes as mia, because we cannot say with certainty that these genes are not present, at least until the sequencing and annotation of avian genomes is more complete. if these genes are truly absent, what emerges is a picture in which chickens, missing tlr8 and rig-i, have less ability to detect rna viruses and intracellular bacteria than ducks, which lack only tlr8. in addition, birds are missing components in the rig-i pathway, and interferon-responsive antiviral effectors. 
chickens have a large expanded family of leukocyte receptors, which are apparently missing in ducks, which include the igy fc receptors. notably, ducks make a truncated igy that is lacking the fc region as their most abundant serum antibody. by similarity to their human homologues, the lilr receptors, other members of the chir family are presumed to be involved in self/non-self recognition, and their expansion may somehow compensate for the deficit due to the minimal avian mhc. in contrast, ducks may have lost tapasin, and with it lost some of the constraint on antigen presentation due to co-evolving linked mhc class i and tap transporter genes. whether the host-pathogen arms race is cause or consequence of these gene losses, birds have had a long evolutionary relationship with rna viruses, including those causing zoonoses. regulation of t-cell immunity by leucocyte immunoglobulin-like receptors: innate immune receptors for self on antigenpresenting cells identification of a member of the interferon regulatory factor family that binds to the interferonstimulated response element and activates expression of interferon-induced genes gene duplication and fragmentation in the zebra finch major histocompatibility complex duck lymphoid organs: their contribution to the ontogeny of igm and igy identification of avian rig-i responsive genes during influenza infection antiviral pattern recognition receptors in the natural host of influenza, ducks (anas platyrhynchos) association of rig-i with innate immunity of ducks to influenza the counterbalance theory for evolution and function of paired receptors no enhanced influenza virus resistance of murine and avian cells expressing cloned duck mx protein trim16 acts as an e3 ubiquitin ligase and can heterodimerize with other trim family members the igh locus of the channel catfish, ictalurus punctatus, contains multiple constant region gene sequences: different genes encode heavy chains of membrane and secreted igd topography, 
ultrastructure and phagocytic capacity of avian lymph nodes the tetratricopeptide repeat: a structural motif mediating protein-protein interactions structure, chromosome localization, and regulation of expression of the interferon-regulated mouse ifi54/ifi56 gene family ring domains: master builders of molecular scaffolds? absence of tapasin alters immunodominance against a lymphocytic choriomeningitis virus polytope viral evasion and subversion of patternrecognition receptor signalling conserved and distinct aspects of the avian toll-like receptor (tlr) system: implications for transmission and control of bird-borne zoonoses the lilr family: modulators of innate and adaptive immune pathways in health and disease avian toll-like receptors chicken tlr21 acts as a functional homologue to mammalian tlr9 in the recognition of cpg oligodeoxynucleotides comparison of chimpanzee and human leukocyte ig-like receptor genes reveals framework and rapidly evolving genes chicken genomics resource: sequencing and annotation of 35,407 ests from single and multiple tissue cdna libraries and cap3 assembly of a chicken gene index phagosomal signaling by borrelia burgdorferi in human monocytes involves toll-like receptor (tlr) 2 and tlr8 cooperativity and tlr8-mediated induction of ifn-beta tlr8: the forgotten relative revindicated defining the turkey mhc: sequence and genes of the b locus evidence for an igd homologue on chicken lymphocytes new insights into the enigma of immunoglobulin d immunoglobulin d enhances immune surveillance by activating antimicrobial, proinflammatory and b cell-stimulating programs in basophils extensive diversification of igh subclass-encoding genes and igm subclass switching in crocodilians mda-5, but not rig-i, is a common target for paramyxovirus v proteins sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution the avian toll-like receptor pathway-subtle differences amidst general conformity 0 -o 
methylation of the viral mrna cap evades host restriction by ifit family members genetic association and expression studies indicate a role of toll-like receptor 8 in pulmonary tuberculosis abnormal development of peripheral lymphoid organs in mice deficient in lymphotoxin tlr8 deficiency leads to autoimmunity in mice paired ig-like receptor homologs in birds and mammals share a common ancestor with mammalian fc receptors phylogeny.fr: robust phylogenetic analysis for the non-specialist identification of two igd+ b cell populations in channel catfish, ictalurus punctatus regulation of rlr-mediated innate immune signaling -it is all about keeping the balance evolutionary analysis and expression profiling of zebra finch immune genes regulators of innate immunity as novel targets for panviral therapeutics interferon-induced ifit2/isg54 protects mice from lethal vsv neuropathogenesis expression of duck ccl19 and ccl21 and ccr7 receptor in lymphoid and influenza-infected tissues identification of the tapasin gene in the chicken major histocompatibility complex influenza a virus ns1 targets the ubiquitin ligase trim25 to evade recognition by the host viral rna sensor rig-i trim25 ring-finger e3 ubiquitin ligase is essential for rig-i-mediated antiviral activity reul is a novel e3 ubiquitin ligase and stimulator of retinoicacid-inducible gene-i cutting edge: activation of murine tlr8 by a combination of imidazoquinoline immune response modifiers and polyt oligodeoxynucleotides dependence of peptide binding by mhc class i molecules on their interaction with tap cirf-3, a new member of the interferon regulatory factor (irf) family that is rapidly and transiently induced by dsrna small anti-viral compounds activate immune cells via the tlr7 myd88-dependent signaling pathway bile immunoglobulin of the duck (anas platyrhynchos). ii. 
antibody response in influenza a virus infections b-cells need a proper house, whereas t-cells are happy in a cave: the dependence of lymphocytes on secondary lymphoid tissues during evolution type i interferon [corrected] gene induction by the interferon regulatory factor family of transcription factors irfs: master regulators of signalling by toll-like receptors and cytosolic pattern-recognition receptors irf-7 is the master regulator of type-i interferon-dependent immune responses 0 -triphosphate rna is the ligand for rig-i quantitative expression of toll-like receptor 1-10 mrna in cellular subsets of human peripheral blood mononuclear cells and sensitivity to cpg oligodeoxynucleotides mavs forms functional prion-like aggregates to activate and propagate antiviral innate immune response global characterization of interferon regulatory factor (irf) genes in vertebrates: glimpse of the diversification in evolution re-evaluation of the chicken mip family of chemokines and their receptors suggests that ccl5 is the prototypic mip family chemokine, and that different species have developed different repertoires of both the cc chemokines and their receptors tlr3 increases disease morbidity and mortality from vaccinia infection ubiquitin-induced oligomerization of the rna sensors rig-i and mda5 activates antiviral immune response advances in avian immunology-prospects for disease control: a review the long view: a bright past, a brighter future? 
forty years of chicken immunology pre-and post-genome a genomic analysis of chicken cytokines and chemokines identifying innate immune pathways of the chicken may lead to new antiviral therapies characterization of chicken mda5 activity: regulation of ifn-beta in the absence of rig-i functionality cell type-specific involvement of rig-i in antiviral response antigen processing and presentation: evolution from a bird's eye view the chicken b locus is a minimal essential major histocompatibility complex a ''minimal essential mhc'' and an ''unrecognized mhc'': two extremes in selection for polymorphism chicken mhc molecules, disease resistance and the evolutionary origin of birds chicken tlr21 is an innate cpg dna receptor distinct from mammalian tlr9 comparative genomics of major histocompatibility complexes negative feedback regulation of rig-i-mediated antiviral signaling by interferon-induced isg15 conjugation mice completely lacking immunoproteasomes show major changes in antigen presentation aberrant innate immune response in lethal infection of macaques with the 1918 influenza virus a structure-based model of rig-i activation structural basis for the activation of innate immune pattern-recognition receptor rig-i by viral rna tetratrico peptide repeat interactions: to tpr or not to tpr? 
the leukocyte receptor complex in chicken is characterized by massive expansion and diversification of immunoglobulin-like loci detrimental contribution of the toll-like receptor (tlr)3 to influenza a virus-induced acute pneumonia interferonstimulated transcription: isolation of an inducible gene and identification of its regulatory region hla-a⁄0201 presents tapdependent peptide epitopes to cytotoxic t lymphocytes in the absence of tapasin the interaction between interferon-induced protein with tetratricopeptide repeats-1 and eukaryotic elongation factor-1a isg56 is a negative-feedback regulator of virus-triggered signaling and cellular antiviral response virus-dependent phosphorylation of the irf-3 transcription factor regulates nuclear translocation, transactivation potential, and proteasome-mediated degradation chicken cells sense influenza a virus infection through mda5 and cardif signaling involving lgp2 retinoic acid inducible gene-i, more than a virus sensor the mitochondrial targeting chaperone 14-3-3epsilon regulates a rig-i translocon that mediates membrane association and innate antiviral immunity gene expression networks underlying retinoic acid-induced differentiation of acute promyelocytic leukemia cells distinct rig-i and mda5 signaling by rna viruses in innate immunity immune signaling by rig-i-like receptors the immunoglobulin heavy chain locus of the duck. genomic organization and expression of d, j, and c region genes structural insights into rna recognition by rig-i avian betadefensin nomenclature: a community proposed update genomics of antiviral defenses in the duck, a natural host of influenza and hepatitis b viruses emerging role of ubiquitination in antiviral rig-i signaling. 
microbiol immunoglobulin genetics and antibody responses to influenza in ducks cdna sequence and organization of the immunoglobulin light chain gene of the duck, anas platyrhynchos one gene encodes the heavy chains for three different forms of igy in the duck opposite orientation of the alpha-and upsilon-chain constant region genes in the immunoglobulin heavy chain locus of the duck structural relationship between the two igy of the duck, anas platyrhynchos: molecular genetic evidence the p59 oligoadenylate synthetase-like protein possesses antiviral activity that requires the c-terminal ubiquitin-like domain identification and validation of ifit1 as an important innate immune bottleneck evolution of nk receptors: a single ly49 and multiple kir genes in the cow the dominant mhc class i gene is adjacent to the polymorphic tap2 gene in the duck, anas platyrhynchos tlr-dependent t cell activation in autoimmunity the mhc of the duck (anas platyrhynchos) contains five differentially expressed class i genes paramyxovirus v proteins disrupt the fold of the rna sensor mda5 to inhibit antiviral signaling inhibitory receptor paired ig-like receptor b is exploited by staphylococcus aureus for virulence origin and evolution of the chicken leukocyte receptor complex trim family proteins: retroviral restriction and antiviral defence riplet/rnf135, a ring finger protein, ubiquitinates rig-i to promote interferon-beta induction during the early phase of viral infection ubiquitin-mediated modulation of the cytoplasmic viral rna sensor rig-i the ubiquitin ligase riplet is essential for rig-i-dependent innate immune responses to rna virus infection variable nk cell receptors and their mhc class i ligands in immunity, reproduction and human evolution a single polymorphic residue within the peptide-binding cleft of mhc class i molecules determines spectrum of tapasin dependence allelic diversity of tap in wild mallards identification and characterization of a functional, alternatively 
spliced toll-like receptor 7 (tlr7) and genomic disruption of tlr8 in chickens reis e sousa, c, 2006. rig-i-mediated antiviral responses to single-stranded rna bearing 5 0 -phosphates chicken igy binds its receptor at the ch3/ch4 interface similarly as the human iga: fc alpha ri interaction species-specific inhibition of rig-i ubiquitination and ifn induction by the influenza a virus ns1 protein expression analysis of turkey (meleagris gallopavo) toll-like receptors and molecular characterization of avian specific tlr15 rig-i like receptors and their signaling crosstalk in the regulation of antiviral immunity complete sequence of a chicken lambda light chain immunoglobulin derived from the nucleotide sequence of its mrna roles for calreticulin and a novel glycoprotein, tapasin, in the interaction of mhc class i molecules with tap the inhibitory action of p56 on select functions of e1 mediates interferon's effect on human papillomavirus dna replication recognition of 5 0 triphosphate by rig-i helicase requires short blunt double-stranded rna as contained in panhandle of negative-strand virus identification of alpha interferon-induced genes associated with antiviral activity in daudi cells and characterization of ifit3 as a novel antiviral gene the interferon system of non-mammalian vertebrates comparative immunology of agricultural birds mx is dispensable for interferon-mediated resistance of chicken cells against influenza a virus positive regulation of interferon regulatory factor 3 activation by herc5 via isg15 modification gene organization of the quail major histocompatibility complex (mhccoja) class i gene region comparative molecular dynamics analysis of tapasin-dependent and -independent mhc class i alleles single nucleotide polymorphism discovery in the avian tapasin gene emerging role of isg15 in antiviral immunity goose rig-i functions in innate immunity against newcastle disease virus infections the role of iodine in thyroid autoimmunity: from chickens 
to humans: a review natural killer cell receptors in the horse: evidence for the existence of multiple transcribed ly49 genes nonself rna-sensing mechanism of rig-i helicase and activation of antiviral immune responses mycobacteriosis in birds identification of host genes linked with the survivability of chickens infected with recombinant viruses possessing h5n1 surface antigens from a highly pathogenic avian influenza virus avian influenza rapidly induces antiviral genes in duck lung and intestine complexity of expressed chir genes the chicken leukocyte receptor cluster the chicken leukocyte receptor complex: a highly diverse multigene family encoding at least six structurally distinct receptor types the chicken leukocyte receptor complex encodes a primordial, activating, high-affinity igy fc receptor the chicken leukocyte receptor complex encodes a family of different affinity fcy receptors the dominantly expressed class i molecule of the chicken mhc is explained by coevolution with the polymorphic peptide transporter (tap) genes peptide motifs of the single dominantly expressed class i molecule explain the striking mhc-determined response to rous sarcoma virus in chickens sequencing of the core mhc region of black grouse (tetrao tetrix) and comparative genomics of the galliform mhc mitochondrion: an emerging platform for host antiviral signalling tolllike receptor 3 mediates west nile virus entry into the brain causing lethal encephalitis rig-i à/à mice develop colitis associated with downregulation of g alpha i2 igy: clues to the origins of modern antibodies the genome of a songbird molecular cloning, full-length sequence and preliminary characterization of a 56-kda protein induced by human interferons expression of igm, igd, and igy in a reptile, anolis carolinensis a review: the obese strain (os) of chickens: an animal model with spontaneous autoimmune thyroiditis a novel chimeric ig heavy chain from a teleost fish shares similarities to igd crystal structure 
of isg54 reveals a novel rna binding structure and potential functional mechanisms isolation of a 97-kb minimal essential mhc b locus from a new reverse-4d bac library of the golden pheasant the rna helicase rig-i has an essential function in double-stranded rna-induced innate antiviral responses nucleic acid-sensing toll-like receptors are essential for the control of endogenous retrovirus viremia and erv-induced tumors reconstitution of the rig-i pathway reveals a signaling role of unanchored polyubiquitin chains in innate immunity interferon-stimulated gene 15 and the protein isgylation system human isg15 conjugation targets both ifn-induced and constitutively expressed proteins functioning in diverse cellular pathways isg15 conjugation system targets the viral ns1 protein in influenza a virus-infected cells identification of igf, a hinge-region-containing ig class, and igd in xenopus tropicalis mapping of the chicken immunoglobulin heavy-chain constant region gene locus reveals an inverted alpha gene upstream of a condensed upsilon gene interferon induced ifit family genes in host antiviral defense origin and evolution of the rig-i like rna helicase gene family the idea for this review came from a conversation with martin flajnik in the beer tent at the comparative immunology workshop, held in waterloo, ontario in 2010. key: cord-301218-zsp5sh9o authors: weeraratna, ashani t.; nagel, james e.; de mello-coelho, valeria; taub, dennis d. title: gene expression profiling: from microarrays to medicine date: 2004 journal: j clin immunol doi: 10.1023/b:joci.0000025443.44833.1d sha: doc_id: 301218 cord_uid: zsp5sh9o with the mapping of the human genome comes the ability to identify genes of interest in specific diseases and the pathways involved therein. laboratory technology has evolved in parallel, providing us with the ability to assay thousands of these genes at once, a technique known as microarray analysis. 
the main question that this type of technology raises is how we can apply this powerful technology to clinical medicine. recently, advances in data analysis, as well as standardization of the technology, have allowed us to examine this question, and indeed a few clinical trials currently being performed include microarrays as part of their protocol. in this review we outline the microarray technique and describe these types of studies in further detail. one of the promises of the human genome project is that through knowledge of genomic organization and chromosomal location, it will be possible to identify and link specific genes to susceptibility to various human diseases. in the past, gene expression information has been obtained on a one-by-one, single-gene basis typically through the use of northern blot analysis; however, the introduction of hybridization to nucleotide arrays now permits the rapid, simultaneous screening of the expression of several thousand individual genes at a given time. the two most common forms of gene expression profiling used today are the serial analysis of gene expression (sage) and microarray analysis. the sage technique is based on the principle that a 10- to 14-bp sequence referred to as a "tag" can uniquely identify a transcript, provided that the tag is obtained from a unique position within a transcript (1). this method of profiling allows researchers to examine the changes in the absolute levels of transcripts in a cell and, because it does not require an a priori knowledge of the transcriptome, can uncover novel genes expressed therein. however, this technique is quite labor-intensive and technically challenging, and the costs involved with the generation and sequencing of sage libraries are beyond the scope of many laboratories. microarray technology, the older of the two techniques, is intrinsically more "user-friendly."
the first recorded instance of this technology is often overlooked, but was published in a study by augenlicht et al. where, in 1987, investigators used a nylon membrane containing 4000 complementary dna (cdna) sequences to examine changes in gene expression in colon cancer (2). since these early studies, microarray profiling has been significantly refined and modified to optimize the sensitivity of the assay as well as the number of genes examined in a given experiment. gene expression profiling may provide valuable insights into the molecular mechanisms underlying disease. to perform a successful experiment, there is a need to identify clones of interest for arraying, isolate high-quality rna from tissues of interest, and analyze the data in the most informative manner possible (fig. 1). each of these steps will be examined in detail below.
fig. 1 (caption, partially recovered): (2) rna is extracted from cells or tissues of interest and labeled either with cy3 or cy5 (glass slides) or with 33p (nylon filters) and hybridized. (3) images are analyzed using programs such as arraypro or iplab. (4) data is clustered and information extracted using bioinformatics.
in a microarray experiment, gene expression is often compared in two samples of rna. this typically means comparing "normal" to "diseased" tissues or "treated" and "untreated" cells or samples derived from various experimental conditions. what has become quite clear through its development and application is that microarray analysis is an exquisitely sensitive technique and prone to a myriad of unavoidable variability, which leads to difficulty when designing an experiment. the first hurdle one must overcome is to select an appropriate disease state, or experimental condition as a reference against which all samples in a given test set can be compared. one major problem begins in simply defining "what is normal?" or obtaining a specimen to which a diseased tissue can be legitimately compared.
skin from different areas of the body can significantly vary, just as cells derived from different tissues or regions of a given organ may significantly differ. tumors are frequently a heterogeneous mixed population displaying varying degrees of anaplasia, necrosis, and vascular proliferation. thus, even comparing a single tumor cell type derived from different patients can yield quite varied gene expression profiles. overall, sample selection is a conundrum. peripheral blood, the prototypical clinical specimen, is seldom a useful source of informative specimens simply because localized gene expression changes in tissues are not represented in rna made from peripheral blood leukocytes. moreover, the percentage of white blood cells within a given patient can vary from sample to sample, affecting the rna recovered. because of the exquisite sensitivity inherent in microarray analysis, the use of mixed cell populations is a tenuous proposition. this is especially true for more complex organ structures such as the brain, where, for example, dopaminergic cells that display pathogenesis in several neurodegenerative and addiction disorders are very sparsely represented in the total organ cell population. most commonly, inclusion of a pure population of the suspected infiltrating cell types in the experiment can assist in the identification of genes associated with infiltration or contamination, and then statistical analysis can be used to exclude these genes from the analysis. in addition, research techniques such as facs or laser capture microdissection can further enrich these heterogeneous cell populations, resulting in more isolated and defined cell subpopulations for profiling. however, such enrichments must also consider the fact that inflammatory infiltrates or cells present in an adjacent tissue may themselves be part of the disease process and therefore are an appropriate component of the specimen.
the development of many diseases may occur over an extended period of time and some may even include an orderly progression of stages. the originating event(s) leading to clinical symptoms or findings may have been initiated many years prior to diagnosis so that specimens obtained when symptoms develop may have limited informational value in predicting pathoformic disease. despite these caveats, over the past few years, a steady stream of reports has appeared describing the use of microarray technology in a variety of research areas including cancer, autoimmune or infectious disease, and a variety of inherited disorders, all with the intent of identifying and understanding their molecular origins and mechanisms. some have indeed identified potentially valuable markers for diagnosis. more recently, as the genomes of various bacteria, viruses, parasites, and other pathogens are sequenced, studies have been directed toward elucidating specific genes involved in microbial pathogenicity and virulence with the obvious expectation that such genes may serve as potential therapeutic targets in disease treatment. the first step in the construction of a microarray is to identify and collect clones (cdnas) or short oligonucleotides that encode genes important for research purposes. cdna arrays can be designed and constructed with a number of different goals in mind. such arrays may be focused on a particular tissue, chromosome, developmental stage, gene family, disease, or functional characteristic (e.g., signaling molecules, cytokines, apoptotic-mediators), or may be unfocused. oligonucleotide microarrays are manufactured by in situ synthesis on glass using a combination of photolithography and oligonucleotide chemistry. the result is a panel of short oligonucleotides that, depending on the particular array, identify up to about 33,000 discrete human genes. recently, other manufacturers have begun to produce what are being called "spotted" oligonucleotide arrays. 
rather than the oligonucleotide being directly synthesized on the array substrate, these arrays are constructed using a robotic pin-based microarrayer to spot conventionally synthesized 40- to 80-bp oligonucleotides onto glass slides or nylon filters (3). genes of interest can be identified using the public unigene database (http://www.ncbi.nlm.nih.gov/unigene/). unigene is an experimental system for automatically partitioning nucleotide sequence data deposited in genbank into a nonredundant set of gene-oriented clusters (4). each unigene cluster contains sequences that represent a unique gene as well as related information such as the tissue types in which the gene has been expressed and its chromosomal location. unigene numbers may also correlate to hypothetical proteins (i.e., proteins identified by in silico analysis of genetic sequences) or as yet uncharacterized transcripts obtained from random-primed cdna libraries, referred to as expressed sequence tags (ests). each unigene cluster typically includes a number of clones that may be potentially used for cdna array construction. a useful public source of cdna clones is the collection made available by the integrated molecular analysis of genomes and their expression (image) consortium, which also places sequence, map, and expression data about these clones into the public domain (4). at the present time, there are approximately 4.5 million ests in the ncbi databases (http://www.ncbi.nlm.nih.gov/) and most are available from commercial distributors. additional cdna clones are available from commercial enterprises and from research laboratories that have constructed and sequenced unique cdna libraries. after choosing the appropriate genes and ests for a given array, these genes can be cloned into plasmid vectors suitable for transforming bacteria. bacterial clones containing cdnas of interest are propagated and the dna extracted and purified. the gene-specific inserts are amplified in microtiter plates by pcr.
the best arrays contain only clones that are sequence-verified, to ensure accuracy and quality. such verification is crucial to the reliability of the data obtained from such arrays. using a multiwell format, these cdna inserts or oligonucleotides can be spotted by a robotic microarrayer onto glass slides, or nylon or nitrocellulose membranes that have been pretreated to augment their surface charge and increase the adherence of the dna (5). currently, depending on whether the format is nylon-based or glass-chip-based, an array may contain anywhere from 500 (nylon) to over 30,000 (glass chip) genes. glass- and nylon-based arrays are often regarded as alternative technologies. however, they have both strong and weak points that are often complementary. filter-based arrays used with radioactivity generally require less total rna, although with current protocols such as dendrimer or amino-allyl labeling, small amounts of rna can be used for fluorescence arrays as well (6). filter-based arrays also have a lower per-filter cost, making them an attractive choice for smaller laboratories (7). on the other hand, fluorescent labeling allows control and experimental rna to be hybridized together, allowing for the significant advantage of a direct comparison. the process of dna hybridization involves the reassociation of single-stranded dna to form double-stranded dna with one strand originating from a cell or tissue under study and the other strand with the target sequence that has been printed or synthesized on the microarray. a crucial factor for successful hybridization is the purity and quality of the rna extracted from the cells or tissue of interest. contamination of this rna with genomic dna, proteins or detergent residues, or its degradation by ubiquitous ribonucleases may cause serious problems during the rt-pcr steps of the procedure. the method of labeling probe rna depends on the particular type of microarray being used for the study.
with microarrays printed on glass slides, it is customary to label during reverse transcription one sample with the dye cyanine-3 (cy3) that, when excited by light, yields green fluorescence and the other sample with cyanine-5 (cy5) that yields red fluorescence (8). synthesized oligonucleotide arrays typically use biotinylated probes and are stained posthybridization with streptavidin conjugated to phycoerythrin. for microarrays using nylon membranes, the target rna is typically radioactively labeled by incorporation of [33p]dctp or [32p]dctp nucleotides during reverse transcription (9). while not commonly performed, arrays on glass slides may also be queried with radiolabeled probes. irrespective of labeling method, the probes are purified and incubated in a suitable buffer for 16-24 h with the microarray. posthybridization, the arrays are washed and the quantity of signal incorporated in each spot is measured using either a specialized slide reader or an imaging system. analysis of microarray data continues to evolve, with a number of different research groups analyzing their data in a variety of ways using combinations of various microarray-specific, spreadsheet, data display, and statistical software programs (10-13). to date, there is no universally accepted method to analyze microarray data and thus the analytic method selected is frequently directed toward the specific research question being asked. often, microarray data is examined using several techniques with the method providing the most robust interpretation being utilized for publication and further pursuit. one of the challenges in array data analysis is to distinguish specific physiologic changes in gene expression from the noise and variability inherent within the microarray technique.
although there is a paucity of data specifically addressing such variability in human tissue, current available information suggests that the normal variance of expression of tightly regulated genes in a given tissue may range up to 20-30%. the miniaturization of the assay and the ability to conduct thousands of experiments at a given time (for analysis purposes, hybridization to each array spot can be considered a small experiment) in parallel inherently produces considerable variability in a microarray experiment (14) . the sources of fluctuation accumulate at each step of the microarray procedure from the initial processing of the tissue sample, through target and array preparation, hybridization, and image processing (15, 16) . whereas fluorescent-labeling of spotted cdnas allows both the experimental and control rna to be hybridized on the same microarray, radioactively labeled samples require that each specimen be hybridized on a separate array. the arrays are queried using specific software that recognizes and assigns a numeric density value to each spot on the microarray. irrespective of whether fluorescent-or radioactive-labeled microarrays are used, most research groups then apply a normalization procedure to the family of arrays included in an experiment to bring their signal range into an acceptable confidence interval and adjust the signals on each filter to approximate a normal distribution with a mean of 0 and a standard deviation of 1. many different normalization techniques have been described but there is yet no agreement as to a "best" way to normalize microarray data (17) . normalization is important to eliminate artifacts and allow comparison between filters. however, normalization has limits that, when ignored, can result in the creation of false signals and misinterpretation of the data (18) . following normalization, microarray data is examined to identify differences in gene expression. 
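the normalization described above, rescaling each filter so that its signals have a mean of 0 and a standard deviation of 1, can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the procedure of any cited study: the function name `zscore_normalize` and the intensity values are hypothetical, and real pipelines often log-transform the data or apply other normalization schemes first.

```python
from statistics import mean, pstdev


def zscore_normalize(signals):
    """Rescale one array's spot intensities to mean 0 and standard deviation 1."""
    mu = mean(signals)
    sigma = pstdev(signals)  # population standard deviation of this array
    if sigma == 0:
        raise ValueError("all spots have the same intensity; cannot normalize")
    return [(s - mu) / sigma for s in signals]


# Hypothetical raw intensities for the same five spots on two filters.
filter_a = [120.0, 340.0, 560.0, 90.0, 800.0]
filter_b = [1200.0, 3400.0, 5600.0, 900.0, 8000.0]  # same pattern, 10x brighter

norm_a = zscore_normalize(filter_a)
norm_b = zscore_normalize(filter_b)
```

after normalization the two filters carry the same values even though one was ten times brighter, which is what makes comparison between filters possible. the same rescaling would also hide a genuine global shift in expression, one example of the limits of normalization noted in the text.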
the simplest technique is the ratio of experimental to control or fold change. many published studies have used the "twofold change" criterion as a measure of significance and it has been shown that this method can be reproducible even taking into account interlaboratory variability. for example, one study compared gene expression changes in yeast, in three different laboratories, and showed a greater than 95% concordance in genes increased over twofold (19) . while this method is straightforward, it rapidly becomes apparent that this calculation may not be useful in all cases, most importantly because it eliminates all information about absolute gene expression levels. more significantly, fold change does not embrace any knowledge of biology. succinctly, genes that are members of a defined pathway or that respond to a common challenge are likely to be coregulated and therefore could be expected to display similar patterns of expression. a statistical technique generically termed "exploratory multivariate data" or "cluster analysis" has come into the forefront to identify groups of genes that display similar changes in expression. in general, classical clustering techniques start by creating a set of bidirectional distance vectors that represent the similarity between genes and between clusters of genes. an iterative process is then undertaken where each gene profile is compared to all the other gene profiles and clusters until eventually all genes are in one cluster (10) . there are numerous hierarchical clustering algorithms that differ in their starting point and the manner in which they calculate and compare the distances between the existing clusters and the remainder of the data set (17) . bottom-up (agglomerative) hierarchical clustering was first applied to microarray analysis by eisen et al. (20) . 
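the fold-change criterion discussed above can be made concrete with a short python sketch. the gene names and intensities are hypothetical, and the helper names `log2_fold_change` and `passes_twofold` are introduced only for illustration.

```python
import math


def log2_fold_change(experimental, control):
    """Log2 ratio of experimental to control signal."""
    return math.log2(experimental / control)


def passes_twofold(experimental, control):
    """True if expression changed at least twofold in either direction."""
    return abs(log2_fold_change(experimental, control)) >= 1.0


# Hypothetical (control, experimental) intensities.
genes = {
    "gene_a": (50.0, 100.0),      # low expression, doubled
    "gene_b": (5000.0, 10000.0),  # high expression, doubled
    "gene_c": (400.0, 500.0),     # 1.25-fold increase
    "gene_d": (800.0, 200.0),     # fourfold decrease
}

changed = sorted(
    name for name, (ctrl, exp) in genes.items() if passes_twofold(exp, ctrl)
)
```

in this sketch `gene_a` and `gene_b` receive identical fold-change values although their absolute signals differ by a factor of 100, which illustrates the loss of information about absolute expression levels mentioned in the text.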
because this technique produces readily visualized patterns of coordinately regulated genes and is supported by software programs such as cluster and treeview created by eisen (http://rana.lbl.gov/), it has become extremely popular for microarray analysis. other types of cluster analysis include multidimensional cluster analysis, which uses the similarities between two samples to generate a pairwise pearson correlation coefficient. this gives an idea of the magnitude of difference between two samples and, when applied to three or more samples, also provides a direction of the difference between them. once these samples have been mapped into a three-dimensional plot, the similarity between two samples can be assessed by the distance between them. the more tightly two samples cluster together, the more similar they are. once these classes of genes have been identified, statistical analyses can be used to best determine which genes cause the samples to segregate as they do (21). however, irrespective of the particular clustering method chosen, it quickly becomes apparent that although microarrays can differentiate tens of thousands of genes, only a small subset, in the range of 5-10%, undergoes a significant change in expression and is therefore worthy of additional study. this point led to the testing of another group of statistical techniques that included self-organizing maps (22) and k-means clustering (23) that organize the expression data before actual clustering (24, 25). more recently, mathematical procedures such as probabilistic principal component analysis (26) and support vector machines (svms) (27-29), as well as models based on neural network designs (12, 30) or bayesian inference (31, 32), have begun to be explored. in these techniques, an analysis algorithm is "trained" with a portion of the data set and these results are used to heuristically select among various data-fitting models, one of which is then used to examine the entire data set. 
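the bottom-up clustering idea, combined with a pearson correlation as the similarity measure, can be sketched as follows. this is a toy illustration and not the algorithm implemented in the cluster software: it assumes average linkage and the distance 1 - r, and the four expression profiles and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(x, y):
    """Correlation distance: 0 for identical patterns, 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def agglomerate(profiles):
    """Average-linkage bottom-up clustering; returns the merges in order."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(profiles[a], profiles[b]) for a, b in pairs)
                d /= len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical profiles measured across four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],  # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],  # falls as g1 rises
    "g4": [8.0, 6.5, 4.0, 2.0],  # falls with g3
}
merges = agglomerate(profiles)
```

the procedure ends, as the text describes, when all genes are in one cluster. the two co-rising genes and the two co-falling genes are joined first, and the final merge between the two groups occurs at a much larger distance.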
if an analysis technique can be developed and validated that can identify the genes that undergo a significant change in expression and remove those that do not, it could alter microarray design and construction in favor of smaller focused arrays that query only biologically relevant genes. clearly, gene expression analysis remains a work in progress. the goal is to develop tools that can identify meaningful expression changes, evaluate the significance of these changes to determine whether they are different than what might occur by chance alone, and ultimately group genes to reveal and examine the combinatorial nature of transcriptional control. two important points to take into consideration when running a microarray experiment are the necessity to run replicate experiments (33) (34) (35) and to validate the gene expression changes using other techniques such as real-time pcr. it may also be useful to analyze the expression levels of proteins encoded by the altered genes. this can be done by techniques such as immunocytochemistry or western blot, using specific antibodies for the proteins of interest. recently, members of the microarray gene expression data (www.mged.org) society have advocated the adoption of the minimal information about a microarray experiment (miame) guidelines (36, 37) (www.mged.org/workgroups/miame/miame checklist. html). one effect of these standards seems certain-there will be a move to the use of a single microarray product for all future clinical studies. bioinformatics applies principles of information science and technology to make life science data more understandable and useful. in practice, when dedicated computer software is used to search for hidden patterns in groups of data and to link this information to other data, this is referred to as "data mining." the usual first endpoint of a microarray experiment is a list of the genes or their genbank accession numbers that have undergone a meaningful change in expression. 
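the earlier point about replicate experiments and about judging whether a change differs from what might occur by chance alone can be illustrated with an exact permutation test. this is only one of many possible approaches and is not taken from the cited studies; the function name `exact_permutation_p` and the replicate values are hypothetical.

```python
from itertools import combinations


def exact_permutation_p(control, treated):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = control + treated
    n_c = len(control)
    n_t = len(pooled) - n_c
    observed = abs(sum(treated) / n_t - sum(control) / n_c)
    total = sum(pooled)
    extreme = 0
    n_perm = 0
    # Enumerate every way of relabeling n_c of the measurements as "control".
    for idx in combinations(range(len(pooled)), n_c):
        c_sum = sum(pooled[i] for i in idx)
        diff = abs((total - c_sum) / n_t - c_sum / n_c)
        n_perm += 1
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    return extreme / n_perm


# Hypothetical log-scale signals for one gene, three replicates per condition.
control = [1.0, 1.1, 0.9]
treated = [2.0, 2.2, 2.1]
p_value = exact_permutation_p(control, treated)
```

with three replicates per group there are only 20 possible relabelings, so even a perfectly separated gene cannot reach a p-value below 2/20 = 0.1 with this test. this is one simple way to see why replicate experiments are necessary.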
more often than not, a few of the gene names are recognizable, but the majority are not. however, what we really want to know is the function(s) of the gene, how it is related to other biologic pathways and processes or defined clinical syndromes or diseases, how this gene was affected by how the microarray experiment was conducted, and to link this information with clinical data, treatment outcomes, and drug responses. ultimately, we want to develop gene expression data into a prognostic or diagnostic tool. as previously noted, the trend in microarray analysis is toward various unsupervised clustering techniques. more recently, supervised techniques such as svm or neural networks that allow nonexpression data to be incorporated into the clustering model have shown added promise. however, it is accepted that there is really no single technique that is appropriate for all data sets, leaving the interpretation of microarray data an inexact science (17) . depending on how the data is processed, different relationships may be revealed which in and of themselves may be informative. data-mining software continues to evolve with several dozen commercial and academic products available. so far, no "one program does it all," and one is typically left with moving the expression data between various database, statistical, graphics, and annotation packages during analysis. numerous new high-quality databases have been constructed and other previously existing databases such as genbank significantly augmented by data produced by the human genome project (http://www.ncbi.nlm.nih.gov/ sitemap/index.html). together, these databases contain a truly amazing amount of information that is being updated and expanded on a regular basis. the effect of this continuous updating is that there is a degree of impermanence associated with the data. 
this has produced the secondary effect of turning the retrieval of information about individual genes from the various online genomic databases such as locuslink (www.ncbi.nlm.nih.gov/locuslink), omim (www.ncbi.nlm.nih.gov/omim/), aceview (www.ncbi.nlm.nih.gov/ieb/research/acembly/), unigene (www.ncbi.nlm.nih.gov/unigene), kegg (www.genome.ad.jp/kegg/kegg2.html), and genecards (http://nciarray.nci.nih.gov/cards/) into complex hot-linked spreadsheets and databases that are nonetheless user-friendly. this situation will persist into the foreseeable future as new gene function and pathway data becomes available over the internet. another concept receiving a great deal of recent attention, particularly by the pharmaceutical industry, is the observation that there are genetically based differences in drug and immune responses between individuals that may be utilized to optimize a person's therapy and that are responsible for adverse or suboptimal drug responses. further, a logical corollary to this idea is that through sequence analysis, it will be possible to identify disease-susceptibility genes that might represent potential targets for future drug development or other interventional therapy (38). this has created a new field, pharmacogenomics, that examines inherited gene variations that dictate drug response and studies their effect on clinical drug responses (39). presently, the identification and cataloguing of snps is the most popular method to investigate these complex genetic associations. snps are the most common form of genetic variation, occurring approximately once every 1000 bp throughout the 3 billion base pair human genome. although their incidence varies substantially across the genome, the total number of human snps is estimated to be over 10 million (40). 
as of september 2002, 4.3 million snps have been deposited in the public dbsnp database (http://www.ncbi.nlm.nih.gov/snp/) and approximately 1.25 million snps have been mapped in silico to the human genome by the snp consortium (tsc; a private, not-for-profit alliance of 13 major multinational companies and the wellcome trust; http://snp.cshl.org) (41, 42) . since dna possesses only four nucleotides, the number of potential snp variations is quite restricted, making snps well-suited for high-throughput automated or parallel analysis (43) . however, the bottleneck in snp genotyping is the sample preparation; i.e., purifying hundreds of thousands of different loci so that snp genotyping can be done at each site. many different snp detection technologies are under commercial development (44) . presently, most large-scale projects examining genome-wide snp expression are based on differential hybridization affinity using either spotted or in situ synthesized oligonucleotide arrays (45, 46) or utilize mass spectroscopy for genotype analysis (7, 47) . however, a new spotted thiol-modified oligonucleotide gold thin film array appears capable of substantially improving detection speed and sensitivity (48) . snp data analysis is much simpler than microarray data analysis because the readout can be designed to be binary (i.e., fluorescent or nonfluorescent) rather than scalable. at present, snps have provided the most clinically relevant data linked to the hgp but continue to be a work in progress. a number of clinical studies are already under way using snp-based genetic testing to identify patients who are at increased risk for diabetes, cardiovascular disease, adverse drug reactions, cancer, or deep vein thrombosis. snp genotyping holds great promise as a breakthrough technology that will introduce the era of personalized medicine. 
however, the clinical snp genotyping studies that are now in progress are largely, if not exclusively, based on gene-disease associations and gene polymorphisms that were discovered 5-10 years previously demonstrating that considerable important work is still needed to identify meaningful disease-causing and modifying genes. linking these discoveries with public data produced by large genome-wide snp discovery and validation initiatives can be expected to promote a gradual introduction of snp genotyping into diagnostic medicine and gene-based pharmacotherapeutics. although there are several potential pitfalls associated with microarray technology, it is a powerful technique, and as evidenced by the surge in the medical literature over the past few years, has become increasingly popular (table i) . it is nevertheless accepted that the widespread inclusion of clinical data into microarray analysis algorithms will not be simple. several decades of research and development of clinical record systems has shown that a data model needed to capture the broad array of clinical parameters is extremely complex and difficult to standardize. for example, molecular profiling of cutaneous melanoma allowed for the identification of a more motile group of tumors histopathologically indistinguishable from their less aggressive counterparts (21) . in addition, potential usefulness of microarray-derived gene expression data has been shown in several recent studies of lymphoma, leukemia (25, 50) , and multiple myeloma (51) where modeling techniques that incorporate outcome and drug response during treatment were used to define tumor types or patient groups and to suggest rational targets for drug therapy or development. it is becoming increasingly evident that these techniques may have a significant impact on current diagnostic methods (52) . 
although many studies have demonstrated that current clinical parameters are reliable predictors of outcome, a few studies are beginning to reveal that certain prognostic indicators can more efficiently be derived from profiling studies. analysis of breast cancer samples by microarray revealed that, whereas standard clinical and histological criteria can be useful in predicting disease outcome in patients with early onset breast cancer, these patients had a very distinct gene expression signature that acted as an even more powerful predictor (53). this sort of robust prediction can also be made on the basis of microarray analysis of children presenting with medulloblastomas, in whom, again, outcome can be determined on the basis of their molecular signature (54). in addition to classifying disease states and predicting outcomes, microarray analysis can also be used to analyze the effects of treatments and patient response to therapy. a recent study demonstrated that the effects of diverse regulators of breast cancer growth on breast cancer cells in culture linked the behavior of these cells to important clinical properties seen in in vivo specimens (55). array analysis has been used to examine the probability of the rejection of renal allografts, by studying the gene expression profiles of patients who rejected their grafts as compared to those who did not. again, accurate predictions as to whether or not a patient would reject a graft could be made on the basis of their molecular profiles. another example is an oligonucleotide array encoding several variants of the gene for debrisoquine hydroxylase (cyp2d6), an enzyme that metabolizes various psychotropic medications. with this array, which works much like an snp chip, researchers were able to determine which patients might need dosage adjustments because of their ability to metabolize these drugs, based on the alleles expressed (56). 
in another study examining the changes in skeletal muscle tissues of patients with and without insulin treatment, several genes were differentially expressed. these genes were associated with muscle insulin resistance and complications associated with insulin metabolism, allowing again for the reassessment of treatment of patients with differing profiles (57). when genes are identified that are useful as prognostic indicators, or markers of response, as indicated by the aforementioned studies, smaller arrays can be custom made to reflect these discoveries, and to aid with patient assessment, diagnostically or therapeutically. in at least two cases, a chip has been made to aid in diagnosis. one of these, the ovachip, contains genes involved in ovarian cancer as identified by sage analysis, and the utility of this chip is under investigation by several different groups (58). another, the lymphochip, uses a custom array to help diagnose tumors of lymphoid origin (59, 60). in cancers where tumor type and origin can be difficult to diagnose, this type of chip could have great utility. in addition, microarray platforms have proven effective for the analysis of viral dna. an hpv dna chip screens patients for possible human papilloma virus (hpv) infection by spotting several different hpv types on a microarray, allowing clinicians and researchers to assess the likelihood of complications (such as cervical cancer) depending on the type of hpv present in the patient (61, 62). furthermore, the most recent array application of clinical significance was the use of microarray technology to identify the virus responsible for sars. to do this, researchers created a chip containing over 12,000 different viral gene signatures (the virochip), and only a few spots on this chip, all of which corresponded to coronavirus, showed a positive signal (63-65). 
the time saved using this method of analysis may significantly advance the discovery of a treatment for this epidemic and others of its kind. to date, microarray analysis has existed almost exclusively as a research tool that requires considerable effort and time by skilled individuals to prepare high-quality rna, label and hybridize the arrays, and read and analyze the data. although microarray technology has begun to enter clinical medicine, several significant hurdles need to be overcome. for routine clinical lab use, significant improvements are needed in microarray fabrication, hybridization methodology, and analysis that will allow much or all of the processes to be fully automated and thus increase reproducibility within and across experiments. microfluidics and nanofabrication technologies, which range from the use of dna as a construction material for mechanical devices to the use of carbon nanotubules to produce microarray-like devices, may have greater potential for full automation as well as for increasing throughput speed and accuracy in the study of gene expression. in addition, the field of proteomics is rapidly burgeoning, and the identification of proteins and antigens for therapeutic use will be of high significance in the future. several commercial software vendors have already announced plans to modify their data-mining software to link nucleotide and protein databases and tools, which in the future may allow both individual gene transcription and translation to be readily evaluated. as these technologies and data-mining tools accumulate, it is likely that the promise of microarrays as a tool for the clinician may one day be realized. 
serial analysis of gene expression expression of cloned sequences in biopsies of human colonic tissue and in colonic carcinoma cells induced to differentiate in vitro recent advances in dna microarrays database resources of the national center for biotechnology navigating gene expression using microarrays-a technology review jr: comparison of different labeling methods for twochannel high-density microarray experiments sensitivity issues in dna arraybased expression measurements and performance of nylon microarrays for small samples quantitative monitoring of gene expression patterns with a complementary dna microarray hybridization analyses of arrayed cdna libraries gene-expression profiling in human cutaneous melanoma clusfavor 5.0: hierarchical cluster and principalcomponent analysis of microarray-based transcriptional profiles microarray-based cancer diagnosis with artificial neural networks analyzing array data using supervised methods novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cdna array normalization strategies for cdna microarrays extracting information from cdna arrays computational analysis of microarray data processing and quality control of dna array hybridization data reproducibility of oligonucleotide microarray transcriptome analyses. 
an interlaboratory comparison using chemostat cultures of saccharomyces cerevisiae cluster analysis and display of genome-wide expression patterns molecular classification of cutaneous malignant melanoma by gene expression profiling interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation systematic determination of genetic network architecture analysis of large-scale gene expression data molecular classification of cancer: class discovery and class prediction by gene expression monitoring the main biological determinants of tumor line taxonomy elucidated by a principal component analysis of microarray data knowledge-based analysis of microarray gene expression data by using support vector machines support vector machine classification and validation of cancer tissue samples using microarray expression data diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks using bayesian networks to analyze expression data bayesian hierarchical model for identifying changes in gene expression from microarray experiments development of a prostate cdna microarray and statistical gene expression analysis package identifying and quantifying sources of variation in microarray data using high-density cdna membrane arrays importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cdna hybridizations standards for microarray data minimum information about a microarray experiment (miame) toward standards for microarray data roses ad: pharmacogenetics and the practice of medicine the use of single-nucleotide polymorphism maps in pharmacogenomics haplotype variation and linkage disequilibrium in 313 human genes a map of human genome sequence variation containing 1.42 million single nucleotide 
polymorphisms the snp consortium: summary of a private consortium effort to develop an applied map of the human genome characterization of single-nucleotide polymorphisms in coding regions of human genes snp market view: opportunities, technologies, and products sbe-tags: an array-based method for efficient single-nucleotide polymorphism genotyping parallel genotyping of human snps using generic high-density oligonucleotide tag arrays highthroughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry a surface invasive cleavage assay for highly parallel snp analysis bioinformatics and clinical informatics: the imperative to collaborate characterization of stage progression in chronic myeloid leukemia by dna microarray with purified hematopoietic stem cells global gene expression profiling of multiple myeloma, monoclonal gammopathy of undetermined significance, and normal bone marrow plasma cells prediction of treatment response using gene expression profiles expression profiling predicts outcome in breast cancer prediction of central nervous system embryonal tumour outcome based on gene expression the gene expression response of breast cancer to growth regulators: patterns and correlation with tumor expression profiles iii: cyp2d6 genotyping with oligonucleotide microarrays and nortriptyline concentrations in geriatric depression gene expression profile in skeletal muscle of type 2 diabetes and the effect of insulin treatment development of a highly specialized cdna array for the study and diagnosis of epithelial ovarian cancer the lymphochip: a specialized cdna microarray for the genomic-scale analysis of gene expression in normal and malignant lymphocytes distinct types of diffuse large b-cell lymphoma identified by gene expression profiling correlation of cervical carcinoma and precancerous lesions with human 
papillomavirus (hpv) genotypes detected with the hpv dna chip microarray method hpv oligonucleotide microarray-based detection of hpv genotypes in cervical neoplastic lesions a novel coronavirus associated with severe acute respiratory syndrome microarray-based detection and genotyping of viral pathogens characterization of a novel coronavirus associated with severe acute respiratory syndrome gene expression in human alcoholism: microarray analysis of frontal cortex patterns of gene expression are altered in the frontal and motor cortices of human alcoholics global gene expression profiling of end-stage dilated cardiomyopathy using a human cardiovascular-based cdna microarray microarray gene expression profiles in dilated and hypertrophic cardiomyopathic endstage heart failure oligonucleotide microarray analysis of intact human pancreatic islets: identification of glucoseresponsive genes and a highly regulated tgfbeta signaling pathway gene expression profile in skeletal muscle of type 2 diabetes and the effect of insulin treatment in vivo regulation of human skeletal muscle gene expression by thyroid hormone analysis of gene expression profile during 3t3-l1 preadipocyte differentiation array-based gene expression profiling to study aging mitotic misregulation and human aging stereotyped and specific gene expression programs in human innate immune responses to bacteria expression of cytokine-and chemokine-related genes in peripheral blood mononuclear cells from lupus patients by cdna array identification of hypoxia-responsive genes in a dopaminergic cell line by subtractive cdna libraries and microarray analysis gene-microarray analysis of multiple sclerosis lesions yields new targets validated in autoimmune encephalomyelitis expression profiling of medulloblastoma: pdgfra and the ras/mapk pathway as therapeutic targets for metastatic disease distinct types of diffuse large b-cell lymphoma identified by gene expression profiling characterization of stage 
progression in chronic myeloid leukemia by dna microarray with purified hematopoietic stem cells informatic selection of a neural crest-melanocyte cdna set for microarray analysis molecular classification of cutaneous malignant melanoma by gene expression profiling genomic analysis of metastasis reveals an essential role for rhoc activation of peroxisome proliferator-activated receptor gamma suppresses nuclear factor kappa b-mediated apoptosis induced by helicobacter pylori in gastric epithelial cells genome-wide screening of genes showing altered expression in liver metastases of human colorectal cancers by cdna microarray gene-expression profiles in hereditary breast cancer gene expression profiling predicts clinical outcome of breast cancer global gene expression analysis of gastric cancer by oligonucleotide microarrays genome-wide analysis of gene expression in human hepatocellular carcinomas using cdna microarray: identification of genes involved in viral carcinogenesis and tumor progression hormone therapy failure in human prostate cancer: analysis by complementary dna and tissue microarrays failure of hormone therapy in prostate cancer involves systematic restoration of androgen responsive genes and activation of rapamycin sensitive signaling differential gene expression in renal-cell cancer gene expression profiling of alveolar rhabdomyosarcoma with cdna microarrays prostasin, a potential serum marker for ovarian cancer: identification through microarray technology microarrays and toxicology: the advent of toxicogenomics the promise of toxicogenomics key: cord-291719-1ku6cmwj authors: hajjo, rima; tropsha, alexander title: a systems biology workflow for drug and vaccine repurposing: identifying small-molecule bcg mimics to reduce or prevent covid-19 mortality date: 2020-10-06 journal: pharm res doi: 10.1007/s11095-020-02930-9 sha: doc_id: 291719 cord_uid: 1ku6cmwj purpose: coronavirus disease 2019 (covid-19) is expected to continue to cause worldwide 
fatalities until the world population develops ‘herd immunity’, or until a vaccine is developed and used as a prevention. meanwhile, there is an urgent need to identify alternative means of antiviral defense. bacillus calmette–guérin (bcg) vaccine, which has been recognized for its off-target beneficial effects on the immune system, can be exploited to boost immunity and protect from emerging novel viruses. methods: we developed and employed a systems biology workflow capable of identifying small-molecule antiviral drugs and vaccines that can boost immunity and affect a wide variety of viral disease pathways to protect from the fatal consequences of emerging viruses. results: our analysis demonstrates that bcg vaccine affects the production and maturation of naïve t cells resulting in enhanced, long-lasting trained innate immune responses that can provide protection against novel viruses. we have identified small-molecule bcg mimics, including antiviral drugs such as raltegravir and lopinavir as high confidence hits. strikingly, our top hits emetine and lopinavir were independently validated by recent experimental findings that these compounds inhibit the growth of sars-cov-2 in vitro. conclusions: our results provide systems biology support for using bcg and small-molecule bcg mimics as putative vaccine and drug candidates against emergent viruses including sars-cov-2. electronic supplementary material: the online version of this article (10.1007/s11095-020-02930-9) contains supplementary material, which is available to authorized users. a few months after the declaration of the covid-19 pandemic by the world health organization (who), the disease-causing virus is still sweeping the globe, causing more fatalities, failing health care systems, and resulting in severe economic losses. 
currently there are no approved drugs to treat covid-19, and new vaccine development is expected to take at least 12-18 months (1,2), with growing fears of possible failure associated with changes in viral antigenic determinants (3) or short-lived immunity (4). additionally, the highly specific virus-neutralizing antibodies in recovered patients may be short-lived and ineffective in preventing the disease caused by the emerging variable strains of the virus (4). with these uncertainties regarding an imminent specific sars-cov-2 vaccine, there is a need to search for current alternatives, such as agents that can stimulate or emulate the unique capabilities of our innate immune system. recent immuno-oncology success stories indicate that the best cancer-fighting strategies result from unleashing the patients' immune power (5-8). there is an increased awareness that harnessing innate immune responses opens up new possibilities for long-term, multifaceted tumor control (9, 10) and infectious disease prevention (11-13). therefore, next generation antiviral vaccines should be capable of boosting innate immune responses to tackle a wide range of novel pathogens very early after exposure, as single treatments or adjuvants to traditional vaccines targeting the adaptive immune system. accumulating evidence from the biomedical literature indicates that sars-cov-mediated pathology, a very similar pathology to sars-cov-2, was mainly caused by ineffective innate immune responses, associated with a severe reduction in the number of t cells in the blood (14). recent evidence indicated that sars-cov-2 and mycobacterium tuberculosis (mtb) share unique similarities in terms of the host protein interaction partners, and both pathogens infect lung tissues (15). 
on the other hand, old 'polypharmacological' vaccines, such as the bcg vaccine for tuberculosis (tb), have shown promising therapeutic effects for a wide range of infectious and non-infectious diseases including bladder cancer (16) (17) (18). studies showed that bcg's polypharmacological effects were not limited to memory t cell immunity, but promoted strong, beneficial, and long-lasting effects on innate immunity. the who also recognized these beneficial 'off-target' effects of bcg, calling for further investigation to repurpose this vaccine for other orphan life-threatening diseases (19). indeed, there are multiple clinical trials testing bcg for 216 conditions other than tb including 19 studies for covid-19 as reported on clinicaltrials.gov (20). additionally, a few recent peer-reviewed reports have pointed to an epidemiological relationship between bcg and covid-19 without providing substantial evidence (21) (22) (23) (24). one should expect that the results of the randomized clinical trials (rcts) will help establish the value of the bcg vaccine as a treatment or prophylactic against the disease. herein, we describe a unique drug and vaccine repurposing workflow, and list high confidence proteins and pharmacological classes of compounds that work as bcg mimics at the system level by inducing a beneficial long-lasting trained immune response. we also propose that bcg mimics can be used as alternatives to bcg in protecting from covid-19 and other emergent infectious diseases. we have developed and applied a systems biology workflow to study the bcg network pharmacology and prioritize small-molecule bcg mimics and antivirals. this workflow is based on our original chemocentric informatics workflow described thoroughly in a previous report (25). our current workflow (fig.
1) incorporates three major components: (1) a module for mining and prioritizing gene signatures representative of a condition or a biological state; (2) a network-mining module to identify genetic perturbations that induce gene expression profiles that are highly enriched with the genes constituting the condition gene signature; and (3) a pathway enrichment module to understand the biological processes involved in the mechanism of action of bcg and highly correlated genetic perturbagens. a consensus gene signature for bcg vaccine was derived from gene expression profiles in peripheral blood mononuclear cells (pbmcs) in response to a bcg challenge test reported by matsumiya et al. (76), available as the gse58636 dataset on ncbi gene expression omnibus (geo) (27). we used the data collected from whole blood samples taken from healthy human subjects enrolled in a phase 1 trial (clinical trials registration: nct01194180). for the purposes of this study we used the gene expression profiles generated from two human subject groups included in the above trial: group 1 (bcg naive), and group 2 (bcg vaccinated; median time since vaccination, 10 years). to study network pharmacology and query the connectivity map, we developed a consensus gene signature using genes that showed significant differential gene expression in response to a bcg challenge test (stimulated) in comparison with controls (unstimulated) on days 0 and 14 in both groups 1 and 2. a systematic search for nearest neighbor (nn) genes/proteins of the upregulated and downregulated genes in bcg's gene signature was conducted in cytoscape (77) version 3.8.0 using the string (78) protein query application.
all protein-protein interactions (ppis), including both physical and functional interactions, were retrieved from widely used and reliable databases such as mint (79), hprd (80), bind (81), dip (82), biogrid (83), kegg (84), reactome (60), ecocyc (85), nci-nature pathway interaction database (86), and gene ontology (go) (87) protein complexes. network building tools in cytoscape version 3.7.2 were used to generate ppi networks for bcg-cgs. enrichment analysis was conducted in cytoscape (77) and metacore to identify pathways and biological processes associated with bcg-cgs and cmap genetic connections. the significance of the enrichment was determined by the hypergeometric test (88). all terms from the ontology were ranked based on their calculated p values. ontology terms with p values below the threshold of 0.05 were defined as statistically significant and therefore relevant to the studied list of genes. the cmap (27, 89) is a chemogenomics database that catalogs 1.3 million profiles of transcriptional responses of human cells to chemical and genetic perturbations. currently, there are 27,927 perturbagens (19,911 small molecules, and 7494 genetic perturbagens) producing 476,251 expression signatures in 9 human cell lines: pc3, vcap, a375, a549, ha1e, hcc515, ht29, mcf7, hepg2. this database of cellular signatures has been produced using the l1000 platform (27), a high-throughput gene expression assay that measures the mrna transcript abundance of 978 "landmark" genes from human cells. causal reasoning (90) analysis identifies genes and proteins of 'topological significance' in order to decide whether these genes/proteins are eligible for targeting in the studied phenotype.
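the hypergeometric enrichment test and the p value ranking described in the methods above can be illustrated with a minimal sketch. this is not the implementation used by cytoscape or metacore; the function names, the 20,000-gene universe and the two toy ontology terms are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, list_size, overlap):
    """Upper-tail hypergeometric p value: P(X >= overlap).

    X is the number of annotated genes found when list_size genes are drawn
    without replacement from a universe of universe_size genes, of which
    term_size carry the ontology term.
    """
    max_overlap = min(term_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_terms(term_counts, universe_size, list_size, alpha=0.05):
    """Rank terms by p value and flag those below the alpha threshold.

    term_counts maps a term name to (term_size, overlap_with_gene_list).
    """
    scored = [
        (term, hypergeom_enrichment_p(universe_size, size, list_size, k))
        for term, (size, k) in term_counts.items()
    ]
    scored.sort(key=lambda item: item[1])
    return [(term, p, p < alpha) for term, p in scored]


# Hypothetical toy input: a 22-gene signature tested against two made-up terms
# in an assumed universe of 20,000 genes.
toy_terms = {
    "hypothetical_term_a": (200, 5),
    "hypothetical_term_b": (500, 1),
}
for term, p, significant in rank_terms(toy_terms, 20000, 22):
    print(f"{term}\tp={p:.3g}\tsignificant={significant}")
```

the sketch uses exact integer binomial coefficients, so no statistics library is needed; in practice a multiple-testing correction would usually be applied on top of the raw p values.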
in this study, we applied causal reasoning to identify molecular regulators that most likely directly cause the observed expression changes in transcriptional profiles in response to bcg. in this approach, changes in gene expression in both directions as well as the effect of edges in the network are taken into account. for each node (i.e., gene) in the causal reasoning network, observed changes in expression are matched with the expected changes inferred from the network structure given the hypothesis that the observed gene expression is decreased or increased due to its activity. each node has outgoing activation or inhibition effects on other objects in the knowledge database; a key hub with a predicted increase in activity shows increased expression for those genes that the hub is known to activate, and decreased expression for genes it is known to inhibit. each predicted key hub has a prediction p value, produced by a binomial test that assesses the probability of obtaining a given number of correct predictions out of all defined differentially expressed genes (degs) in the examined data. it is noteworthy that causal reasoning examines both direct neighbors of differentially expressed genes and remote (several steps away) regulators. all causal reasoning predictions were performed in key pathway advisor from clarivate analytics, using the pollard method (91). gplots (92) v3.0.1.2 was used for plotting enhanced heatmaps for transcriptional data (e.g., heatmap representing bcg-cgs in fig. 2). heat maps were generated using the heatmap.2 function included in this package. to study the bcg polypharmacology and potential beneficial effects of this vaccine in preventing the fatal consequences of covid-19, we have devised and implemented a 'network biology' workflow (fig. 1) to interrogate the hypothesis that bcg vaccination may protect from covid-19 fatalities.
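the key hub scoring logic described in the methods above (matching expected against observed expression changes, then applying a binomial test) can be sketched as follows. this is only an illustration under stated assumptions and not the key pathway advisor implementation: the function names are hypothetical, the null probability of a correct prediction is assumed to be 0.5, and only direct (one-step) edges are considered.

```python
from math import comb


def count_correct_predictions(hub_activity, edges, observed):
    """Compare predicted and observed expression directions for one hub.

    hub_activity: +1 if the hub is hypothesised to be activated, -1 if inhibited.
    edges: gene -> +1 (hub activates the gene) or -1 (hub inhibits the gene).
    observed: gene -> +1 (upregulated) or -1 (downregulated) in the data.
    Returns (correct, total) over the hub targets that are present in the data.
    """
    correct = 0
    total = 0
    for gene, effect in edges.items():
        if gene not in observed:
            continue
        total += 1
        # An activated hub raises the genes it activates and lowers those it inhibits.
        if hub_activity * effect == observed[gene]:
            correct += 1
    return correct, total


def binomial_tail_p(correct, total, p_null=0.5):
    """One-sided binomial p value: P(X >= correct) for X ~ Binomial(total, p_null)."""
    return sum(
        comb(total, k) * p_null ** k * (1 - p_null) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical toy hub with three targets (gene names are placeholders).
toy_edges = {"gene_1": 1, "gene_2": -1, "gene_3": 1}
toy_observed = {"gene_1": 1, "gene_2": -1, "gene_3": -1}
correct, total = count_correct_predictions(1, toy_edges, toy_observed)
print(correct, total, binomial_tail_p(correct, total))
```

a hub would be scored under both hypotheses (activated and inhibited), and the hypothesis with the smaller p value retained; remote regulators would require propagating edge signs along multi-step paths, which this sketch omits.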
this workflow is based on our drug repurposing chemocentric informatics workflow which has been validated previously for small-molecule drug repurposing (25). the current workflow is adapted to vaccine repurposing by employing novel bioinformatic approaches to computationally model and connect molecular networks in an effort to understand the underlying 'network' biology of vaccines, and pinpoint the regulatory genes and proteins responsible for causing the observed beneficial multitherapeutic effects. although we are not the first group to use network biology approaches to study the transcriptional changes of vaccines, to our knowledge, this is the first study that uses these approaches both to support vaccine repurposing, specifically for covid-19, and to identify putative small-molecule drugs that can mimic the vaccine effects. our workflow starts with the prioritization of a gene signature to study the bcg network pharmacology. first, we derived a consensus gene signature (cgs) for bcg based on geo's dataset gse58636 (26). details on the bcg-cgs signature are found in table s1 (supporting information). twenty-two differentially-expressed genes across all 4 experiments (2 groups × 2 time points discussed in methods) formed bcg's consensus gene signature (bcg-cgs) shown in fig. 2a. all 22 genes in bcg-cgs were used as seed nodes to build a protein-protein interaction network for signature genes (fig. 2b). interactions were extracted from the string database, and high confidence interactions included physical interactions (e.g., binding), functional interactions (e.g., activation, inhibition, catalysis), or gene co-expression. two types of networks were generated: 1) a high-confidence 'core' network restricted to bcg signature genes as network nodes and high confidence (≥0.70) interactions as network edges, and 2) a medium-confidence interaction network obtained from expanding the core network by 20 additional nodes (fig. 3).
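the idea of a consensus gene signature, i.e., genes that are differentially expressed across all 4 experiments, can be sketched as a set intersection. the sketch below is an assumption-laden illustration rather than the authors' pipeline: the function name, the 0.05 cutoff, the requirement of a consistent direction of change, and all gene names and values are hypothetical.

```python
def consensus_signature(experiments, p_cutoff=0.05):
    """Return genes significant in every experiment with a consistent direction.

    experiments: list of dicts mapping gene -> (log2_fold_change, p_value).
    Returns (up_genes, down_genes) as sorted lists.
    """
    up = None
    down = None
    for exp in experiments:
        sig_up = {g for g, (lfc, p) in exp.items() if p < p_cutoff and lfc > 0}
        sig_down = {g for g, (lfc, p) in exp.items() if p < p_cutoff and lfc < 0}
        # Keep only genes that were also significant in all earlier experiments.
        up = sig_up if up is None else up & sig_up
        down = sig_down if down is None else down & sig_down
    return sorted(up or ()), sorted(down or ())


# Hypothetical toy data: 2 groups x 2 time points, placeholder gene names.
toy_experiments = [
    {"gene_a": (2.1, 0.001), "gene_b": (-1.4, 0.01), "gene_c": (0.9, 0.20)},
    {"gene_a": (1.8, 0.004), "gene_b": (-1.1, 0.03), "gene_c": (1.2, 0.01)},
    {"gene_a": (2.5, 0.0005), "gene_b": (-0.9, 0.04), "gene_c": (1.0, 0.02)},
    {"gene_a": (1.6, 0.01), "gene_b": (-1.3, 0.02), "gene_c": (-0.8, 0.03)},
]
up_genes, down_genes = consensus_signature(toy_experiments)
print("up:", up_genes, "down:", down_genes)
```

requiring agreement across all experiments trades sensitivity for stability, which matches the stated motivation for using a consensus signature.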
enrichment analysis performed in cytoscape, using string's protein-protein interactions, indicated that bcg-cgs is enriched in inflammatory cytokines and immune response modulators (fig. 2b). some signature genes are also involved in the negative control of important viral processes (e.g., fcn1, tnf and ccl3), and others are involved in the response to viral infections (e.g., ifng, rnase6, il6 and tnf). the complete lists of enriched pathways are included in tables s2 and s3 (supporting information). we identified 291 key hubs using the causal reasoning method, which seeks to identify molecular regulators that can directly cause the observed transcriptional changes in response to bcg vaccination. key regulators can be transcriptional factors and proteins with potentially altered activity that explains the transcriptional changes. the top five statistically significant inhibited key hubs included hey1, dsipi (gilz), jagged1, hand1 and mir-129-1-3p, whereas the top five statistically significant activated key hubs were phf20, tafii70, glutaredoxin, runx2 and notch1 (nicd). the top 30 causal key hubs are shown in table i and all 291 identified key hubs are included in table s4 (supporting information). in order to identify experimentally validated upstream regulators that cause transcriptional changes similar to those induced by bcg, we queried the connectivity map (cmap) (27) database of the broad institute with bcg-cgs and identified proteins and small-molecule drugs that have strong connectivity scores with bcg (fig. 1). the cmap approach enabled us to compare bcg-cgs with 'experimentally' predefined signatures of therapeutic compounds and genetic perturbations (i.e., overexpression or knockdown) included in the cmap and ranked according to connectivity scores (ranging from +100 to −100), representing relative similarity to bcg-cgs.
the connectivity score itself is derived using a nonparametric, rank-based, pattern-matching strategy based on the kolmogorov-smirnov statistics (28). all instances in the database are then ranked according to their connectivity scores with bcg-cgs; those at the top (+) are most strongly correlated to the query signature and are regarded as bcg mimics, and those at the bottom (−) are most strongly anticorrelated and can reverse bcg's gene signature. our analysis identified three highly enriched classes of genetic knockdown (kd) perturbagens and one pharmacological class of drugs that have positive connectivity scores in alveolar a549 cells (i.e., caused similar transcriptional changes to those induced by bcg in alveolar a549 cells). these hits can be considered as bcg mimics capable of inducing transcriptional changes similar to those caused by the bcg vaccine. therefore, we suggest that bcg mimics can be used as alternatives to bcg vaccination to promote long-lasting beneficial effects on immune cells. the three enriched protein classes are: protein phosphatases (with best positive connection for ppp4c kd), histone deacetylases (with best positive connection for hdac10 kd followed by hdac11 kd), and mediator complex proteins (with best positive connection for med6 kd followed by med7 kd). additionally, protein kinase c (pkc) activators were enriched as a drug class; the top three pkc activators with the highest cmap connectivity scores to bcg-cgs were prostratin, phorbol-12-myristate-13-acetate, and ingenol. all four of the above classes share one common feature: they participate in the transcriptional and metabolic regulation of immune cells in response to environmental cues including responses to pathogens (29) (30) (31) (32). all top-scoring pkc activators from the cmap are also known to have antiviral effects or affect t cell activation (33) (34) (35) (36) (37).
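the rank-based, kolmogorov-smirnov-style scoring mentioned above can be illustrated with a simplified sketch. it is not the cmap implementation: the unweighted running sum, the division by 2 that keeps the score in [-1, 1], and the function and gene names are assumptions, and the real cmap additionally normalizes and rescales scores to the −100 to +100 range.

```python
def ks_enrichment(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted KS-style running sum.

    ranked_genes: genes ordered from most up- to most down-regulated
    in a reference (perturbagen) profile.
    gene_set: query genes whose positions in the ranking are tested.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def connectivity(ranked_genes, up_set, down_set):
    """Combine up- and down-set enrichments into one signed score.

    Returns 0 when both sets are enriched at the same end of the ranking,
    a positive value for a mimic and a negative value for a reverser.
    """
    es_up = ks_enrichment(ranked_genes, up_set)
    es_down = ks_enrichment(ranked_genes, down_set)
    if es_up * es_down > 0:
        return 0.0
    return (es_up - es_down) / 2.0


# Hypothetical toy profile with placeholder gene names.
toy_profile = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(connectivity(toy_profile, {"g1", "g2"}, {"g5", "g6"}))
print(connectivity(list(reversed(toy_profile)), {"g1", "g2"}, {"g5", "g6"}))
```

ranking every reference profile by this score places mimics of the query signature at the top and profiles that reverse it at the bottom, which is the ordering the text relies on.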
remarkably, analyzing the top ten cmap positive connections with bcg-cgs obtained from nine cell lines indicated that two compounds are approved antiviral drugs: raltegravir (top 3rd positive connection, an hiv integrase inhibitor) and lopinavir (top 6th positive connection, an hiv protease inhibitor). more interestingly, emetine (top 4th positive connection) and lopinavir were recently validated to inhibit sars-cov-2 replication in vitro (38). we also found evidence in the biomedical literature indicating that mst-312 (39), narciclasine (40) and verrucarin-a (41) possess antiviral activities. all cmap hits are provided in tables s5 and s8 (supporting information). initial reports from clinical studies evaluating the use of lopinavir in covid-19 patients showed that the unbound lopinavir concentrations in the lungs were calculated to be subtherapeutic against sars-cov-2 (42, 43). another study found that the unbound drug concentrations of lopinavir are far from reaching the ec50 of sars-cov-2 (16.4 μg/ml), although they clearly suffice to inhibit hiv-1 (44). the authors mentioned that approximately 60- to 120-fold higher concentrations than those found in covid-19 patients treated with lopinavir-ritonavir are required to reach the assumed ec50 at trough levels, making effective treatment of covid-19 with lopinavir and ritonavir at the currently used doses unlikely (44).
fig. 3 high-confidence expanded network for bcg-cgs. nodes are color-coded using a split pie chart coloring scheme indicating pathway/gene set contribution to each node from the top 5 most enriched pathways/gene lists. the core network is composed of genes in the bcg-cgs that are not singletons. step 1 expansion added 10 additional nodes (i.e., genes) to the core network. step 2 expansion added another 10 nodes to the first expansion. step 3 expansion added another 10 nodes to the second expansion. expansions were performed to see which pathways remained most statistically significant, and therefore are considered high confidence pathways.
table i notes: † predicted activity of the key hub by causal reasoning is denoted by − if the hub is inhibited, and by + if the hub is activated. ‡ correct/total network predictions: correct is the number of genes in the dataset predicted correctly; total is the total number of genes in the causal reasoning network. § calculation distance: using causal reasoning, one-step key hubs are defined as statistically significant transcriptional factors that are associated with the regulation of experimentally differentially expressed genes; two-step and three-step key hubs are distant key hubs that regulate one-step transcriptional factors. * p-value calculated for the polynomial test.
in order to prioritize high confidence bcg genetic mimics, we integrated hypotheses derived independently from the cmap with those predicted by causal reasoning, and accepted common hits only (i.e., cmap positive connections with bcg-cgs that are also predicted as beneficial drug targets by causal reasoning). this analysis resulted in 30 high confidence common hits reported in table s9 (supporting information). we tested whether bcg-cgs, cmap positive connections, or predicted key hubs will have any impact on covid-19 by identifying overlaps with the sars-cov-2 interactome, i.e., human proteins that were experimentally validated to interact with sars-cov-2 and extracted from two recent reports (45, 46). this analysis (fig. 4a) validated 3 protein hits to have physical links to sars-cov-2. the three proteins are encoded by brd4, prkaca and sirt5; they all were positive connections from the cmap, predicted as statistically significant key hubs, and were also validated as sars-cov-2 interacting proteins (15).
additionally, 14 high-confidence cmap positive connections were validated to make physical interactions with sars-cov-2 proteins. these proteins are: psen2, pabpc1, hmox1, cit, plat, igf2r, ripk1, ndufs3, ndufa5, ggh, neu1, scarb1, csnk2b, f2rl1. in addition, two positive connections, mark2 and mark3, were reported to have interactions with corona viruses (45). the predicted causal key hubs sigmar1 and gnb1 were also validated to have physical links to sars-cov-2 (15), and a third key hub, ppia, was known as a human protein interacting with proteins from corona viruses (45). additionally, we mined the biomedical literature to identify evidence for linking bcg small-molecule mimics with sars-cov-2, corona viruses or viral infections in general. we found that two out of the ten top positive compound connections (emetine and lopinavir) were recently validated to inhibit sars-cov-2 replication in vitro (38). other compounds were found to inhibit the growth of corona viruses or to have general antiviral activities (table ii). previous peer-reviewed reports indicated that bcg's nonspecific effects on the immune system can reduce all-cause child mortality (47), protect individuals from numerous viral infections (48) (49) (50) (51) (52), and even enhance the efficiency of some viral vaccines (53) (54) (55). recently, several peer-reviewed studies have pointed to a striking correlation between universal bcg vaccination policies and reduced covid-19 mortality (23). however, most epidemiological studies identified this correlation without acknowledging other important study confounders like social, economic, and demographic differences between countries. lately, escobar et al. mitigated multiple confounding factors for the first time and still observed several significant associations between bcg vaccination and reduced covid-19 deaths (21).
the authors of this study highlighted the need for mechanistic studies behind the effect of bcg vaccination on covid-19, and for clinical evaluation of the effectiveness of bcg vaccination to protect from severe covid-19. earlier studies suggested that the documented beneficial off-target effects of bcg in protecting from non-tb infections, including perhaps covid-19, involve a potentiation of innate immune responses through epigenetic mechanisms (56) (57) (58). to our knowledge, we report here on the first study providing a mechanistic insight to explain the relationships between bcg and covid-19 at the molecular and systems biology levels, as well as extending this insight toward proposing several bcg mimetics among known drugs as candidates for repurposing against the disease. our results indicate that bcg-cgs, key regulatory hubs and bcg mimics identified from the cmap enrich common biological pathways important for key viral processes such as rna synthesis and processing, virus-host interactions, and positive regulation of viral genome replication, and that they are also important for the immune response mounted against the virus. supporting evidence from the biomedical literature confirms that bcg has many beneficial 'off-target' effects that can protect humans from emerging novel pathogens by boosting their innate immune responses (59). our studies suggest that bcg promotes a wide range of transcriptional and metabolic changes, including beneficial gene commensalism, that have been shown to reduce mortality and morbidity from non-tb infectious diseases (51, 52). we show that bcg can produce these protective 'off-target' effects mainly by increasing the production of thymus-generated short-lived undifferentiated cd4+ cells known as naive t cells (th 0), and triggering their differentiation into the long-lived mature naive t cells (mnts), such as cd4+ and cd8+ t cells (60).
interestingly, a very recent study published in science (61) showed that many unexposed patients (20-50%) carry selective and cross-reactive sars-cov-2 t cell epitopes protecting patients against severe infection. another recent study in cell (62) reported a similar observation of strong sars-cov-2 selective memory t cell immunity (reminiscent of the functional patterns observed after successful vaccine immunizations) in patients with asymptomatic or mild infections. although these studies make no connection to any previous bcg vaccination as a source of selective epitopes, we note that these observations are consistent with our mechanistic hypothesis concerning the protective effect of bcg against covid-19. these conclusions are supported by the enrichment results produced using the 'compare experiment' algorithm in metacore from clarivate analytics, which looks for significant coordinated gene expression effects across all experiments to test whether the pathway is being up- or down-regulated in a manner that is unlikely to be accounted for by random chance. the top enriched pathway map, with upregulated genes in response to bcg, is 'immune response t cell subsets: secreted signals' (fig. 4b). a recent study showing that sars-cov-2 reshapes central cellular pathways, such as translation, splicing, carbon metabolism and nucleic acid metabolism (53), provides further support for this observation.
naturally, bioinformatics techniques relying on gene expression, pathway over-representation and network biology have some limitations and biases: 1) results are impacted by the user-selected cut-off thresholds used to determine significant genes, which could make the results user-dependent (63); 2) all components in the pathway are given equal weights without paying attention to the nature of the interactions between the different components (64); 3) there are underlying assumptions that pathways are independent of each other, contrary to the fact that pathways cross-talk and overlap (65); 4) assuming independence between genes may result in false positive predictions of highly enriched genes or gene sets (66); 5) these methods are incapable of modeling an organism's biology as a dynamic system, and cannot predict changes in the system due to genetic mutations or environmental changes (67); 6) most pathway knowledge databases are built by curating experiments performed in different cell types at different time points under different conditions, so they are missing condition- and cell-specific information (64). in order to mitigate some of the aforementioned limitations, we used a consensus gene signature since it is more stable than other gene expression signatures, we paired overrepresentation pathway analysis with causal reasoning to predict protein activities based on the nature of interactions between upregulated or downregulated genes, and we also integrated results from several bioinformatics methods such as causal reasoning and cmap predictions to prioritize common hypotheses.
fig. 4 (a) a venn diagram showing overlaps between bcg genetic mimics and key hubs with sars-cov-2 and corona viruses interactomes. (b) top "pathway map" with the highest level of enrichment by genes in bcg-cgs. this map is generated using metacore from clarivate analytics. red thermometers indicate genes overexpressed in response to bcg treatment, and the height of the red bars is representative of the differential gene expression level (i.e., log 2 values of the fold change). the numbers under the thermometers 1-5 refer to the experiment number: 1) gene expression on day 0 in response to bcg vaccination to a bcg-naïve population on day 1; 2) gene expression on day 0 in response to bcg re-vaccination to a previously vaccinated population; 3) gene expression on day 14 in response to bcg vaccination to a bcg-naïve population; 4) gene expression on day 14 in response to bcg revaccination to a previously vaccinated population, and 5) positive connections from the connectivity map, and the red bar in the thermometer number 5 represents presence of the gene only.
a recent publication (68) in lancet has questioned whether bcg's effects can last for a long time. our top enriched pathway map (fig. 4b) indicates that bcg's effects can be long-lasting if the effects were exerted on thymus-generated th 0 cells, which can occur to a greater extent very early in life before reaching thymic involution by puberty (69). this pathway map indicates that bcg is capable of affecting both the numbers and the types of produced innate immune cells, as well as their maturation to long-lived memory t cells (i.e., what is known as trained immunity). this is very significant in the context of bcg's protective effects from sars-cov-2 and other emergent novel viruses, where the individual's ability to eradicate such viruses is dictated by the number and diversity of the naive t cell reservoir (70, 71). our analysis suggests that bcg may protect individuals from novel pathogens by priming their trained immunity to fight such pathogens, including sars-cov-2.
supporting evidence for this hypothesis is found in the literature (72) indicating that the protective effects of the bcg against tb can last from 15 to 60 years after vaccination (72, 73), with longer lasting effects when the vaccine is administered during the first year of life (74, 60). a recent study indicated that "school-aged bcg vaccination offered moderate protection against tuberculosis for at least 20 years, which is much longer than previously thought" (60, 72). another 60-year follow-up study showed that bcg vaccine efficacy persisted for 50 to 60 years after a single dose of bcg (60). of special interest is a recent study that showed that mucosal vaccination resulted in an increased frequency of antigen-specific lung tissue-resident cd4+ t cells that provide long-term immunity (75). these studies serve as additional evidence from the literature supporting our claim that a single dose of an 'effective' bcg vaccination to infants can have a very long duration of protection against pathogens including sars-cov-2.
table ii notes: † score refers to the cmap score. it represents the level of similarity between transcriptional effects induced by bcg and each of the compounds. ‡ validation refers to the presence of any supporting evidence from the biomedical literature that the predicted bcg mimics have any antiviral activities. antiviral means there is evidence that the compound is used as or has antiviral activity; sars-cov-2 means that the compound showed antiviral activity against sars-cov-2; corona viruses means that the compound showed antiviral activity against corona viruses other than sars-cov-2. § covid-19 ct: there is evidence that the compound is being tested in clinical trials for covid-19. there are 12 studies found for ruxolitinib in covid-19 on clinicaltrials.gov.
our findings provided systems biology support for using bcg to protect from the severe consequences of covid-19. bcg is currently on who's list of essential medicines; it is considered one of the safest and most effective medicines needed in a health system. there is also evidence indicating that bcg can improve the response to vaccines directed against viral infections (48, (53) (54) (55), which may prove useful when sars-cov-2-specific vaccines become available. therefore, we suggest that administering the bcg vaccine to all newborns may protect them from the infection by sars-cov-2 and other emerging pathogens. since this is an approved vaccine for tb, it can directly enter phase iii testing for the protection from covid-19-caused fatalities. however, we caution that running these experiments during an active covid-19 outbreak might expose participants to aggravated immune responses if they contract covid-19 during the study. we also advise that clinical study design takes into account several factors that are known to affect the performance of bcg vaccine, such as the age of the participants, geographies, ethnicities, route of administration and the mycobacterium strain used in the vaccine. it is equally important to run experimental validation studies to evaluate the effects of bcg mimics in preventing covid-19 or in treating urological cancers. our results provide systems biology support for using bcg and small-molecule bcg mimics as putative vaccine and drug candidates against emergent viruses including sars-cov-2. of course, any practical actions to repurpose this vaccine as a means of protection against sars-cov-2, or other novel viruses, should be preceded by successful in vitro and animal experimentation. we also caution that previous studies showed that the protective effects of bcg were found to be weaker when the vaccine was given after the first year of life and particularly after puberty (68).
research at al-zaytoonah university of jordan grant 2020-2019/17/03. at acknowledges partial support from nih grant ot2tr003441.
we thank clarivate analytics for providing access to metacore, a specialized pathway and functional genomics analysis product. reference to commercial products or services does not constitute their endorsement. r. hajjo generated the idea, designed the workflow, generated content, performed data analysis and wrote the manuscript. a. tropsha was engaged in the critical discussion of the study design, recommended some studies, and edited the manuscript. developing covid-19 vaccines at pandemic speed from covid-19 research to vaccine application: why might it take 17 months not 17 years and what are the wider lessons? health research policy and systems vaccination against coronaviruses in domestic animals [internet]. vaccine deployment of convalescent plasma for the prevention and treatment of covid-19 the history and advances in cancer immunotherapy: understanding the characteristics of tumor-infiltrating immune cells and their therapeutic implications springer nature novel cancer immunotherapy agents with survival benefit: recent successes and next steps advances in cancer immunotherapy 2019 -latest trends immuno-oncology combinations: raising the tail of the survival curve harnessing innate immunity in cancer therapy targeting innate immunity to enhance the efficacy of radiation therapy targeting innate immunity for antiviral therapy through small molecule agonists of the rlr pathway antiviral innate immunity pathways innate immunity to influenza virus: implications for future therapy. 
key: cord-329617-gzivtsho authors: lee, albert k.; kulcsar, kirsten a.; elliott, oliver; khiabanian, hossein; nagle, elyse r.; jones, megan e.b.; amman, brian r.; sanchez-lockhart, mariano; towner, jonathan s.; palacios, gustavo; rabadan, raul title: de novo transcriptome reconstruction and annotation of the egyptian rousette bat date: 2015-12-07 journal: bmc genomics doi: 10.1186/s12864-015-2124-x sha: doc_id: 329617 cord_uid: gzivtsho background: the egyptian rousette bat (rousettus aegyptiacus), a common fruit bat species found throughout africa and the middle east, was recently identified as a natural reservoir host of marburg virus. with ebola virus, marburg virus is a member of the family filoviridae that causes severe hemorrhagic fever disease in humans and nonhuman primates, but results in little to no pathological consequences in bats. understanding host-pathogen interactions within reservoir host species and how it differs from hosts that experience severe disease is an important aspect of evaluating viral pathogenesis and developing novel therapeutics and methods of prevention. results: progress in studying bat reservoir host responses to virus infection is hampered by the lack of host-specific reagents required for immunological studies. in order to establish a basis for the design of reagents, we sequenced, assembled, and annotated the r. aegyptiacus transcriptome. we performed de novo transcriptome assembly using deep rna sequencing data from 11 distinct tissues from one male and one female bat. we observed high similarity between this transcriptome and those available from other bat species. gene expression analysis demonstrated clustering of expression profiles by tissue, where we also identified enrichment of tissue-specific gene ontology terms. in addition, we identified and experimentally validated the expression of novel coding transcripts that may be specific to this species. 
conclusion: we comprehensively characterized the r. aegyptiacus transcriptome de novo. this transcriptome will be an important resource for understanding bat immunology, physiology, disease pathogenesis, and virus transmission. electronic supplementary material: the online version of this article (doi:10.1186/s12864-015-2124-x) contains supplementary material, which is available to authorized users. and pteropus whereas yangochiroptera includes the family myotidae and genus myotis [3] . unlike most mammals, bats can fly and this ability enabled their wide geographical range and increased metabolism [2] . interestingly, bats have recently come to the forefront of zoonotic disease research with vast number of pathogens identified in a wide variety of bat species [2] . upwards of 85 different viruses, primarily rna viruses, have been detected and/or isolated from bats [2, 4] . amongst these are emerging viruses that cause lethal disease in humans and nonhuman primates including nipah virus [5, 6] , hendra virus [7] , severe acute respiratory syndrome (sars)-like coronavirus [8] , middle east respiratory syndrome coronavirus (mers-cov) [9] , marburg virus (marv) [10] [11] [12] [13] , and ebola virus (ebov) [14] [15] [16] . despite the severe virulence of these viruses in humans, infected bats are often asymptomatic [13, [17] [18] [19] [20] [21] [22] . nipah virus and hendra virus interactions with their natural reservoir hosts, pteropus vampyrus and pteropus alecto, respectively, are well characterized. experimental infections of bats with high doses of henipaviruses have shown virus replication and shedding with little to no disease [20] [21] [22] . remarkably, the only viruses known to have induced any observable pathology in bats are rabies virus and australian bat lyssavirus [2, 23] . 
understanding mechanisms of disease and differential responses to infection in asymptomatic reservoir host species compared to species that exhibit severe pathology will help inform the development of novel therapeutics and disease prevention approaches. rousettus aegyptiacus, commonly known as the egyptian rousette bat, has been identified as a natural reservoir host for marv through ecological, epidemiological, and experimental studies [10, 12, 13, 18, 19, 24]. furthermore, it has been speculated that this bat could host ebola virus [12, [25] [26] [27], although recent experimental infection studies have shown that ebola virus does not replicate well in r. aegyptiacus [28]. the majority of human outbreaks due to marv have been associated with caves inhabited by r. aegyptiacus. furthermore, epidemiological surveillance of the r. aegyptiacus colony located in the python cave in uganda revealed a biannual spike in marburg virus prevalence. this pattern correlated strongly with spillover transmission events in humans [24]. initial studies in captive bats evaluated clinical signs, virus dissemination, and virus shedding patterns during experimental infection with a marv isolate derived from wild bats [13]. consistent with a natural reservoir host, the bats showed little to no evidence of disease even though the virus disseminated throughout their body and was actively shed [13]. these results were confirmed when bats were infected with marv angola, a strain isolated from a lethal human case [18]. in the absence of genetic and transcriptomic information for r. aegyptiacus and with limited available reagents, studying this reservoir host animal model has been challenging. the rapid expansion in genomic knowledge for different bat species has facilitated comparative studies that rely on the identification of genes and gene families, and has established a framework for developing necessary reagents. 
full genome annotations for pteropus vampyrus (2.63x, [29]), myotis lucifugus (6.6x, [29]), pteropus alecto (110x, [30]), myotis davidii (110x, [30]), and myotis brandtii (77.8x, [31]) are now available. additionally, transcriptomic annotations for pteropus alecto [32] and artibeus jamaicensis [33] have been published. in particular, the complementary genome and transcriptome annotations for p. alecto have aided studies on henipavirus infections in its reservoir host [30, 32]. the host transcriptional response to different viruses was also recently assessed in a kidney cell line derived from p. vampyrus utilizing the previously annotated genome [34]. in this manuscript, we report the transcriptomic annotation of r. aegyptiacus from a de novo assembly of rna sequencing data from 11 tissues isolated from a male and a female bat. we identified 24,118 canonical coding transcripts whose expression profiles were consistent with the corresponding tissues of origin. in addition, we identified and validated novel coding transcripts that do not have any homology with known sequences. furthermore, we evaluated the annotation for immune-related genes and assessed the presence and expression of genes associated with a variety of immune functions. we employed a de novo assembly approach to generate a comprehensive transcriptome without relying on a genome reference. first, we generated 20 rna-seq libraries consisting of 11 tissue types (table 1, fig. 1a) each collected from one male and one female r. aegyptiacus bat, which yielded approximately 2.1 billion reads. we then assembled the high quality reads using trinity [35] (fig. 1b). this process generated 14,796,219 contigs. the assembly had high continuity and coverage with a median number of 718,807 contigs and median n50 of 1,540. we annotated the contigs with the multiple species annotation (msa) pipeline [36], which leverages the homology of known sequences of related species. we assigned gene symbols to contigs when this information was available. 
this process clustered the contigs into isoform groups (fig. 1c) . we compared our assembly to the transcriptomes of three related bat species --m. davidii, p. alecto, and m. brandtii. using blast, we recovered 90. tissue-specific transcriptome assemblies contained different numbers of contigs, due to their different levels of expression and sequencing depth. without a common ground for comparison, it was difficult to perform downstream comparative analyses such as differential gene expression analysis; therefore, we combined contigs from all tissues into one unified, nonredundant reference transcriptome (fig. 1d) . to this end, we iteratively merged the assemblies two at a time, similar to the approach employed in [37] (fig. 1d) . we obtained 4,746,293 nonredundant contigs. among the nonredundant contigs, 974,765 (20.54 %) of the sequences were annotated by bat sequences, 860,578 (18.13 %) by primate sequences, and 104,796 (2.2 %) by sequences in nt database (fig. 2a) . the nonredundant contig set had slightly lower sensitivity, though it still remained high; 86.60 % of m. davidii, 85.95 % of m. brandtii, and 95.30 % of p. alecto transcripts were recovered. the resulting annotated contigs were assigned gene names and combined using the longest annotated orf as the transcript. this resulted in an annotation for r. aegyptiacus that contained a total of 24,118 genes. to determine the efficiency of using the msa pipeline, we determined that 84 % (20,207 genes) of the contigs were annotated using the bat database and 16 % (3,911 genes) were subsequently annotated using the primate database. these data show that the msa pipeline, which utilizes known transcripts from related species only, is a sensitive and efficient method for de novo transcriptome annotation. we evaluated biological validity of the reconstructed transcriptome by analyzing global expression patterns across the different tissues. 
if the transcriptome assembly and annotations were accurate, the expression profiles of a given tissue should cluster with those of the same tissue origin and segregate from those of different origins [36, 38] . a gene can result in more than one transcript isoform; therefore, to capture the highest amount of information, for each gene, we focused on the transcript with the longest open reading frame (orf) (fig. 2a) . after normalizing the expression values, we performed multidimensional scaling (mds) to determine the relationships between the gene expression patterns in different tissues. as expected, mds showed a clear separation of the samples according to the tissue of origin (fig. 3a) and explains 74 % of the variance in the data. to examine the evolutionary relationship among tissues, we performed hierarchical clustering of the gene expression profiles (fig. 3b ). the brain, which has a different developmental pathway compared to the other organs, was classified as an outgroup. the spleen, lymph node, and bone marrow are all organs of the immune system and, as expected, clustered near each other. the peripheral blood contains some of the same cell types as the immune organs, thus, clustered near these tissues. lastly, the gonads and kidney, which develop from the intermediate mesoderm, were grouped as neighbors in the tree. these results suggest that our transcriptome captured sufficient heterogeneity of gene expression to distinguish individual tissues while preserving their developmental relationships. we further assessed biological validity of our transcriptome assembly through gene ontology (go) analysis of tissue-specific expression profiles. we compared expression profile of each tissue with the average expression in the whole dataset, and identified the top 200 most differentially expressed genes based on a generalized linear modeling framework. using this list, we examined the enriched go biological process (bp) terms. 
figure 4 shows the top 10 go bp terms from the bone marrow, spleen, lymph nodes, and peripheral blood mononuclear cells (pbmcs). (for other tissues, see additional file 1). terms enriched for each tissue are consistent with their expected physiological functions. r. aegyptiacus is a natural reservoir host for marv, allowing for virus replication and dissemination with little to no pathological consequences [13, [17] [18] [19] [20] [21] [22] . one important aspect of reservoir host biology is how their immune response compares to that of animal species that experiences severe disease, such as humans. therefore, we examined the transcriptome for the presence of immune-related genes. we associated the r. aegyptiacus gene set with go terms based on the human-specific gene ontology annotation. this resulted in 14,781 genes that mapped to 14,817 go terms. we used categorizer [39] and applied the immune class goslim terms to identify immune-related genes from this set. similar to previous studies in p. alecto and a. jamaicensis, we found that out of 14,817 go terms, approximately 2.75 % were associated with immune response [32, 33] . amongst the most represented go terms were cytokine production, lymphocyte activation, t cell activation, regulation of apoptosis, and regulation of lymphocyte activation (fig. 5) . we next searched for specific genes related to various aspects of the immune response in other mammals, primarily mice and humans. we first evaluated the annotation of the transcriptome for the presence of anti-viral genes. a multitude of pattern recognition receptors were identified including toll-like receptors (tlrs) 1-9, rig-i, mda5, and lgp2 along with the important scaffold and signaling molecules myd88 and mavs. a variety of antiviral molecules were also found, including mx1 and mx2, pkr, sting, irf3, irf5, irf7, members of the ifit and ifitm families, and isg15. we also looked for the presence of type i, ii, and iii interferons (ifn). 
we were able to identify ifngamma, ifngamma2, and ifnalpha. transcripts corresponding to the ifn receptor subunits ifnar1 and ifnar2 were also identified. ifnalpha and ifnbeta have been previously characterized by cloning from stimulated cells [40]. we, however, did not find any contigs corresponding to ifnb. to eliminate the possibility of an impaired assembly, we aligned the processed rna-seq reads to the ifnb sequence from p. alecto [41] (additional file 2 and additional file 3). we detected only 2 reads from r. aegyptiacus, which did not provide sufficient coverage to construct the transcript. these data suggest that ifnb expression in healthy tissues of r. aegyptiacus is low, consistent with other mammals in which ifnb is primarily expressed after exposure to a stimulus. we also searched the transcriptome for genes associated with innate immune cells. we found the transcripts for the cd14 and cd11c genes, which are commonly used for phenotyping macrophages and dendritic cells, as well as transcripts for the cd80 and cd86 genes, which are useful for evaluating the activation status of these cells. genes associated with natural killer (nk) cells, however, were less evident. we were able to identify transcripts of the co-receptor gene cd56, but not cd16. transcripts of genes encoding molecules in the killer cell lectin-like receptor (klr) family, including nkg2a and nkg2d, were also not found. in other bat transcriptomes, such as p. alecto and a. jamaicensis, coverage of nk cell-related genes was more sparse than that of other mammals [32, 33]. a similar observation was made in the genome of m. davidii [30]. the absence of nk cell-related genes in the r. aegyptiacus transcriptome further strengthens the theory that bats might contain a different nk cell receptor repertoire than other species. next, we examined the repertoire of genes associated with the adaptive immune response. 
we identified a variety of transcripts associated with t cell identification, activation, inhibition, and differentiation including cd3, cd4, cd8a, cd25, cd69, ccr7, pd-1, ctla4, gata3, foxp3, and tbet. interestingly, we were able to identify transcripts for the tcrα and tcrβ chains, but were unable to find transcripts for the tcrδ and tcrγ chains. the transcriptome annotation for p. alecto included these genes, but they were present at low levels [32]. this supports the notion that αβ t cells are the predominant t cell subset in bats. we also looked at genes associated with b cells and were able to find transcripts for cd19, cd20, cd27, as well as transcripts that were similar to the immunoglobulin heavy chains a, e, g, and m and the immunoglobulin light chains κ and λ. future analysis of the r. aegyptiacus genome is required to fully evaluate the immunoglobulin gene repertoire.
figure caption: the frequency shown is the percent of immune class go slim terms associated with that particular pathway out of all the go terms that were identified.
finally, we studied the cytokine and chemokine repertoire, important for shaping both innate and adaptive immune responses. we found a variety of transcripts corresponding to a wide array of both pro-inflammatory and anti-inflammatory cytokines. these included il-2, il-4, il-5, il-6, il-12a, il-12b, il-17a, il-23, il-10, tgfβ, tnf, ifnγ, il-1β, ccl2, ccl5, and cxcl10. altogether, the reference transcriptome generated for r. aegyptiacus provides an excellent foundation for investigating reservoir host immunology in bats. there were 2,806,154 unannotated contigs from the nonredundant contig set (fig. 2b). of those, 71.6 % (2,008,503 contigs) did not have an orf, suggesting that the majority of these contigs may be noncoding transcripts. to determine if the unannotated contigs were real or artifacts from the assembly, we used blast to align this set of contigs to the p. alecto genome and found that 96 % (2,706,432 contigs) were aligned. 
to evaluate the possibility of an incomplete or impaired assembly, we grouped the aligned contigs into a total of 1,012,664 clusters based on the presence of overlapping sequences. this reduction suggests that multiple isoform expression patterns between different tissues may have affected our assembly or that our short read assembly may have been incomplete. nonetheless, the number of unannotated contigs that aligned to the p. alecto genome suggests that these contigs, either coding or noncoding, may be novel transcripts shared within the order pteropodinae. future studies evaluating the conservation and possible functions of these sequences are essential to determine the importance of these genetic elements. to validate novel contigs in r. aegyptiacus that appeared to be coding, we utilized pcr. primers were designed to produce amplicons for eight highly expressed, unannotated contigs that contained orfs longer than 400 bp. using rna isolated from the spleen, we were able to produce amplicons of the expected size from at least one bat (fig. 6 and additional file 4). the sequences of these amplicons were found to match the expected sequence from the assembled orf of the unannotated contig. these contigs also showed high sequence similarity to the p. alecto genome. in particular, six of the 8 validated transcripts showed sequence similarity higher than 75 % at a query coverage greater than 64 %.
fig. 6 unannotated, novel transcripts from r. aegyptiacus were validated by rt-pcr. rna from the spleen of both bats was reverse transcribed to make cdna. the cdna was amplified using primers specific for one of 8 novel transcripts that were unannotated in the assembly, but contained a complete orf larger than 400 nucleotides. the expected product sizes were: transcript 1, 457 bp; transcript 2, 450 bp; transcript 3, 419 bp; transcript 4, 548 bp; transcript 5, 469 bp; transcript 6, 277 bp; transcript 7, 507 bp; and transcript 8, 301 bp.
the other two validated transcripts had a query coverage of 23 % with 78.36 % identity (transcript 1 in fig. 6) and a query coverage of 7 % with 91.27 % identity (transcript 2 in fig. 6) (additional file 5); therefore, we hypothesize that these transcripts might be specific to r. aegyptiacus. further investigation is needed to fully understand the characteristics and biological functions associated with the proteins these contigs encode. in this paper, we presented the comprehensively annotated transcriptome of r. aegyptiacus and assessed its quality and biological validity. this transcriptome will be an important resource to study bat immunology. in particular, it will facilitate the process of investigating differences in host responses between asymptomatic reservoir host species and species that exhibit severe pathology. it will also pave the way for the development of novel therapeutics and prevention approaches against emerging zoonotic virus outbreaks. tissues and blood were collected from one male and one female adult r. aegyptiacus bat that were bred and housed at the colony established at the centers for disease control and prevention, atlanta, ga, usa (amman et al. 2015 [13]). approximately 100 mg of the following tissues were collected and homogenized in 1 ml of trizol ls (invitrogen, carlsbad, ca): liver (bat id:bat7, bat17), lung (bat05, bat15), heart (bat03, bat13), kidney (bat04, bat14), brain (bat02, bat12), axillary lymph nodes (bilateral, pooled) (bat06, bat16), spleen (bat10, bat19), bone marrow (bat01, bat11), and gonad (bat08, bat20). pbmcs (bat08, bat18) were isolated from the blood and stored in trizol ls as well. rna was extracted using the purelink rna mini kit (invitrogen, carlsbad, ca). cdna was synthesized using the truseq stranded total rna sample prep kit (illumina, san diego, ca) according to the manufacturer's protocol. the libraries were evaluated for quality using the agilent 2100 bioanalyzer (agilent, santa clara, ca). 
after quantification by real-time pcr with the kapa qpcr kit (kapa biosystems, woburn, ma), libraries were diluted to 10 nm. cluster amplification was performed on the illumina cbot and libraries were sequenced on the illumina hiseq 2500. eight of the female bat libraries were single-end, while the remaining tissues from the female bat and all tissues from the male bat were paired-end. all of the libraries sequenced were 125 bp in length. the average library depth was 66 m reads (minimum 16 m and maximum 98 m). all experimental procedures were conducted with approval from the centers for disease control and prevention (cdc, atlanta, ga, usa) institutional animal care and use committee, and in strict accordance with the guide for the care and use of laboratory animals (committee for the update of the guide for the care and use of laboratory animals 2011). the cdc is a research facility fully accredited by the association for assessment and accreditation of laboratory animal care international. no human patient-derived clinical materials were used in these studies. we first examined the quality of the reads using fastqc v0.11.3 [42]. we also preprocessed the reads to remove the adapter sequence using cutadapt v1.5 [43]. we removed "agatcggaagagcacacgtctgaactccagtcac" from the forward strand and "agatcggaagagcgtcgtgtagggaaagagtgtagatctcggtggtcgccgtatcatt" from the reverse strand. we performed strand-specific de novo transcriptome assembly using trinity r20140413p1 [35] with the parameters "--normalize_reads" and "--SS_lib_type FR", along with its default parameters, for all of our samples. for annotation of contigs and clustering them into a gene model, we used the multiple species annotation pipeline, a nucleotide-based annotation approach that is more efficient and faster than blastx [36]. to make a blast [44] database for bats, we started with the complete "nucleotide collection" (nt) database. 
we exported all accession numbers of the bat sequences at ncbi and made a subset database from nt using "blastdb_aliastool -db nt -dbtype nucl -gilist bats.sequence.gi.txt -title bats -out bats". using the same type of query, we also created a database for primates, including humans, because their extraordinarily well-annotated transcriptomes maximize the power of our annotation pipeline. we then used blast to iteratively align the contigs to the bat db, the primate db, and finally nt using a subtractive approach: what did not align to the bat db was aligned to the primate db, and what did not align to the primate db was aligned to nt. to assess the coverage of our transcriptome, we downloaded the m. davidii, p. alecto, and m. brandtii transcriptomes from ncbi eukaryotic genomes annotations [41]. we generated a blast index from the union of all contigs from our samples, and aligned the three bat transcriptomes to our blast databases. we retained alignments with at least 70 % sequence identity and a maximum e-value of 1e-4. to generate a nonredundant set of contigs, we iteratively merged individual assemblies using a method similar to the one employed in [37] to merge k-mers. using cd-hit-est v4.6 [45] with a sequence identity threshold of 0.99, we merged the contig sets pairwise (sample i with sample i + 1) up to the final sample n. after each iteration, we merged the resulting contig sets in the same pairwise manner until only one contig set remained. for the expression profiling, we generated a reference transcriptome consisting of transcripts each representing a gene model according to the following method: we first used transdecoder (r20140413p1) [46] to find the orf of all transcripts. then, based on the msa pipeline, we chose the transcript with a gene symbol and the longest orf in each gene cluster to capture the most information for downstream expression analysis. 
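the canonical-transcript selection described above (one transcript per gene cluster, chosen by longest orf) can be illustrated with a minimal python sketch. this is not the msa pipeline or transdecoder code; the record type `Transcript`, its field names, the function `canonical_transcripts`, and the example contig identifiers and orf lengths are all hypothetical and serve only to show the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    contig_id: str    # hypothetical contig identifier
    gene_symbol: str  # gene symbol assigned by annotation ("" if unannotated)
    orf_length: int   # length of the longest predicted ORF, in nucleotides


def canonical_transcripts(transcripts):
    """Keep, for each gene symbol, the transcript with the longest ORF.

    Unannotated transcripts (empty gene symbol) are skipped.
    On ties, the transcript seen first is kept.
    """
    best = {}
    for t in transcripts:
        if not t.gene_symbol:
            continue
        current = best.get(t.gene_symbol)
        if current is None or t.orf_length > current.orf_length:
            best[t.gene_symbol] = t
    return best


# Made-up example data: two isoforms of one gene, one single-isoform gene,
# and one unannotated contig.
example = [
    Transcript("contig_a", "MX1", 900),
    Transcript("contig_b", "MX1", 1500),
    Transcript("contig_c", "ISG15", 480),
    Transcript("contig_d", "", 2000),
]

selected = canonical_transcripts(example)
for symbol, t in sorted(selected.items()):
    print(symbol, t.contig_id, t.orf_length)
```

keeping a single representative per gene gives every tissue the same set of features, which is what makes the downstream gene-level count matrix comparable across samples.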
we did not consider the contigs mapped to the nt database in this manuscript because obtaining feature files for all sequences, as required by the msa pipeline, was computationally impractical, and a majority of the gene symbols (24,118) are captured in the bat and primate databases. after a canonical transcript set was obtained, we used this as a transcriptome reference for expression analysis. we mapped the preprocessed reads to this reference using rsem v1.2.19 [47] and obtained a gene-to-count matrix. we removed the transcripts with expression variance equal to zero or with low expression (count <= 10). for the multidimensional scaling (mds) plot, we used the spearman correlation as a distance measure and "cmdscale" from the "stats" package in r [48]. to explore the biological processes in each gene expression profile, we employed a one-to-all sample comparison using the edger generalized linear model framework [49, 50]. for each tissue, we compared individual gene expression within the tissue versus the average expression of all other tissues. because each tissue yielded a differently ranked gene list, we then selected the top 200 genes per tissue and ran gene ontology analysis using topgo [51] with human-specific gene ontology annotation [52]. we used blast [44] to align unannotated contigs to the genome of p. alecto with an e-value of 1e-4 and query coverage of 40 %. to cluster the aligned contigs into groups, we used bedtools [53], setting the distance threshold parameter to 0. for transcripts that did not align with any similarity to the bat, primate, or nt blast databases, we applied a series of filters to select the coding transcripts to be validated. we used the following criteria: an orf that was complete with both a start and stop codon, an orf that was at least 400 bp in size, and a transcript that was expressed (a read count > 0). we further selected the novel transcripts with usable primers using primer-blast [54]. 
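the three filtering criteria for candidate novel transcripts (complete orf, orf of at least 400 bp, read count > 0) can be written as a short python sketch. the `Transcript` class and its field names are hypothetical, and two details are assumptions rather than statements from the text: that "complete" means an atg start and one of the standard stop codons in frame, and that the 400 bp length includes the stop codon.

```python
from dataclasses import dataclass

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_BP = 400  # threshold stated in the text


@dataclass
class Transcript:
    name: str
    orf: str          # predicted ORF nucleotide sequence (hypothetical field)
    read_count: int   # reads assigned to the transcript (hypothetical field)


def has_complete_orf(orf: str) -> bool:
    """True if the ORF is in frame, starts with ATG and ends with a stop codon."""
    orf = orf.upper()
    return (
        len(orf) >= 6
        and len(orf) % 3 == 0
        and orf.startswith(START_CODON)
        and orf[-3:] in STOP_CODONS
    )


def passes_filters(t: Transcript) -> bool:
    """Apply the three criteria: complete ORF, ORF >= 400 bp, expressed."""
    return has_complete_orf(t.orf) and len(t.orf) >= MIN_ORF_BP and t.read_count > 0


# Toy examples (synthetic sequences, for illustration only).
long_orf = "ATG" + "GCT" * 133 + "TAA"    # 405 bp, complete
short_orf = "ATG" + "GCT" * 10 + "TAA"    # 36 bp, complete but too short
no_stop_orf = "ATG" + "GCT" * 134         # 405 bp, no stop codon

candidates = [
    Transcript("t1", long_orf, 5),
    Transcript("t2", long_orf, 0),
    Transcript("t3", short_orf, 12),
    Transcript("t4", no_stop_orf, 7),
]
kept = [t.name for t in candidates if passes_filters(t)]
```

only `t1` satisfies all three criteria in this toy set; primer design with primer-blast would be a separate, subsequent step.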
using these criteria, the number of novel transcripts was narrowed down to a total of 8. the primers and expected amplicon sizes are listed in additional file 4. for validation, rna was extracted from the spleen tissue of both the male and female bats using trizol ls (invitrogen, carlsbad, ca). cdna was synthesized from 2.5 μg of rna using the superscript iii first-strand synthesis supermix (invitrogen, carlsbad, ca). amplicons for each of the primer sets were generated using phusion hotstart flex dna polymerase (new england biolabs, ipswich, ma) and run on a 1.5 % agarose gel for visualization. the amplicon of the correct size was gel extracted, quantified, and sanger sequenced on the applied biosystems 3730xl dna analyzer.
bats and zoonotic viruses: can we confidently link bats with emerging deadly viruses? memórias do instituto oswaldo cruz microbat paraphyly and the convergent evolution of a key innovation in old world rhinolophoid microbats bats: important reservoir hosts of emerging viruses serologic evidence for the presence in pteropus bats of a paramyxovirus related to equine morbillivirus identifying hendra virus diversity in pteropid bats nipah virus: a recently emergent deadly paramyxovirus bats are natural reservoirs of sars-like coronaviruses middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group studies of reservoir hosts for marburg virus marburg virus infection detected in a common african bat isolation of genetically diverse marburg viruses from egyptian fruit bats oral shedding of marburg virus in experimentally infected egyptian fruit bats (rousettus aegyptiacus) fruit bats as reservoirs of ebola virus investigating the zoonotic origin of the west african ebola epidemic seroepidemiological prevalence of multiple species of filoviruses in fruit bats (eidolon helvum) migrating in africa experimental inoculation of plants and animals with ebola virus lack of marburg virus transmission from experimentally 
infected to susceptible in-contact egyptian fruit bats virological and serological findings in rousettus aegyptiacus experimentally inoculated with vero cells-adapted hogan strain of marburg virus transmission studies of hendra virus (equine morbilli-virus) in fruit bats, horses and cats experimental hendra virus infectionin pregnant guinea-pigs and fruit bats (pteropus poliocephalus) experimental nipah virus infection in pteropid bats (pteropus poliocephalus) australian bat lyssavirus infection in a captive juvenile black flying fox seasonal pulses of marburg virus circulation in juvenile rousettus aegyptiacus bats coincide with periods of increased risk of human infection ebola haemorrhagic fever. the lancet large serological survey showing cocirculation of ebola and marburg viruses in gabonese bat populations, and a high seroprevalence of both viruses in rousettus aegyptiacus ebola virus antibodies in fruit bats experimental inoculation of egyptian rousette bats (rousettus aegyptiacus) with viruses of the ebolavirus and marburgvirus genera comparative analysis of bat genomes provides insight into the evolution of flight and immunity genome analysis reveals insights into physiology and longevity of the brandt's bat myotis brandtii the immune gene repertoire of an important viral reservoir, the australian black flying fox transcriptome sequencing and annotation for the jamaican fruit bat (artibeus jamaicensis) transcriptome profiling of the virus-induced innate immune response in pteropus vampyrus and its attenuation by nipah virus interferon antagonist functions full-length transcriptome assembly from rna-seq data without a reference genome transcriptome reconstruction and annotation of cynomolgus and african green monkey de novo assembly and analysis of rna-seq data the evolution of gene expression levels in mammalian organs categorizer: a web-based program to batch analyze gene ontology classification categories induction and sequencing of rousette bat 
interferon α and β genes cutadapt removes adapter sequences from high-throughput sequencing reads gapped blast and psi-blast: a new generation of protein database search programs cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences rsem: accurate transcript quantification from rna-seq data with or without a reference genome r: a language and environment for statistical computing edger: a bioconductor package for differential expression analysis of digital gene expression data differential expression analysis of multifactor rna-seq experiments with respect to biological variation topgo: topgo: enrichment analysis for gene ontology gene ontology: tool for the unification of biology bedtools: a flexible suite of utilities for comparing genomic features we thank thomas kepler, stephanie d'souza, adam hume, elke muhlberger, jenna kelly for comments and discussion on the project. we also thank ahhyun kim for the illustration of a bat in fig 1a. this work was funded by the defense threat reduction agency (dtra) grant hdtra1-14-1-0016 and the training program in computational biology 5t32gm082797-07. the findings and conclusions in this report are those of the authors and do not necessarily represent the views of the centers for disease control and prevention or the u.s. army. the authors declare that they have no competing interests. rr and gp designed the study. akl assembled the sequence data and constructed the assembly. akl and oe annotated the assembly. akl and hk examined the assembled data and assessed the quality. kak and ern performed and interpreted the molecular studies. 
akl and kak analyzed and interpreted the expression studies. mebj, bra and jst provided biological material for analysis. akl and kak wrote the manuscript. all authors read, edited, and approved the final manuscript. key: cord-344297-qqohijqi authors: smith, jacqueline; sadeyen, jean-remy; cavanagh, david; kaiser, pete; burt, david w. title: the early immune response to infection of chickens with infectious bronchitis virus (ibv) in susceptible and resistant birds date: 2015-10-09 journal: bmc vet res doi: 10.1186/s12917-015-0575-6 sha: doc_id: 344297 cord_uid: qqohijqi background: infectious bronchitis is a highly contagious respiratory disease which causes tracheal lesions and also affects the reproductive tract and is responsible for large economic losses to the poultry industry every year. this is due to both mortality (either directly provoked by ibv itself or due to subsequent bacterial infection) and lost egg production. the virus is difficult to control by vaccination, so new methods to curb the impact of the disease need to be sought. here, we seek to identify genes conferring resistance to this coronavirus, which could help in selective breeding programs to rear chickens which do not succumb to the effects of this disease. methods: whole genome gene expression microarrays were used to analyse the gene expression differences that occur upon infection of birds with infectious bronchitis virus (ibv). tracheal tissue was examined from control and infected birds at 2, 3 and 4 days post-infection in birds known to be either susceptible or resistant to the virus. the host innate immune response was evaluated over these 3 days and differences between the susceptible and resistant lines examined. results: genes and biological pathways involved in the early host response to ibv infection were determined and gene expression differences between susceptible and resistant birds were identified. potential candidate genes for resistance to ibv are highlighted. 
conclusions: the early host response to ibv is analysed and potential candidate genes for disease resistance are identified. these putative resistance genes can be used as targets for future genetic and functional studies to prove a causative link with resistance to ibv. electronic supplementary material: the online version of this article (doi:10.1186/s12917-015-0575-6) contains supplementary material, which is available to authorized users. infectious bronchitis (ib) is a highly contagious respiratory disease of chickens first described in the usa in the 1930's [1] [2] [3] . clinical signs include: coughing, sneezing, rales and nasal discharge. the disease can also affect the reproductive organs, which leads to a decrease in egg quality and production, thus making it a major cause of economic losses within the poultry industry [4] . the causative virus, infectious bronchitis virus (ibv) is a coronavirus, which is an enveloped virus with a single positive-stranded rna genome, which replicates in the host cell cytoplasm [5] . proteins encoded by ibv include the viral rna polymerase, structural spike proteins, membrane and nucleocapsid and various other regulatory proteins. the spike glycoprotein mediates cell attachment and plays a significant role in host cell specificity [6] . the existence of many different ibv serotypes, which are not cross-protective means that control of ib, is very difficult. mortality is usually fairly low (~5 %), however some strains of the virus can also cause nephritis meaning that, depending on strain, mortality can be greater than 50 % [7, 8] or even up to 80 % with some australian isolates [9] . ibv infection leaves birds more susceptible to colibacillosis [10] and subsequent bacterial infections can also lead to a high level of mortality [11] . currently, attenuated live vaccines are used in broilers and pullets, and killed vaccines are used in layers and breeders [12] . 
however, virus control is very difficult, as there are only a few vaccine types and many different strains of ibv. the virus also continues to mutate rapidly, generating more virulent strains of the disease [13] [14] [15] . coronaviruses have now also been detected in other avian species such as turkey, duck, goose, pheasant, guinea fowl, teal, pigeon, peafowl and partridge [4] . the extent to which the virus affects the host is highly dependent on the chicken breed [4] and the mhc b locus is known to play a role in susceptibility to the virus [16] . in this study we attempt to identify non-mhc genes which may be involved in resistance to ibv. no genetic analyses have so far been undertaken to do this, and no quantitative trait loci or genes associated with resistance have been determined. based on differential gene expression in susceptible and resistant lines of chickens, we identify potential candidate genes for disease resistance towards ibv (virulent m41 strain). building on the previous work by dar et al. [17] and wang et al. [18] we used affymetrix whole-genome chicken microarrays to examine the tracheal gene expression profiles of a line of birds known to be susceptible to ibv infection (line 15i) and a line known to show resistance (line n). we determined the early host response to infection and propose possible candidate genes for involvement in disease resistance towards ibv. understanding how coronaviruses infect the host and identifying genes involved in resistance is important not only for the poultry industry but also for human health, as diseases such as sars are also caused by coronaviruses [19, 20] . all animal work was conducted according to uk home office guidelines and approved by the roslin institute animal welfare and ethical review body. 
the lines used in these experiments are an ibv-susceptible line, line 15i (inbred white leghorn strain) [21], and an ibv-resistant line, line n (non-inbred cornell strain). line 15i was developed at east lansing in the usa in the 1940s [22] and line n at cornell, usa in the 1960s [23]. the lines have since been maintained at the institute for animal health in compton, uk. two-week-old chicks from each line (15i and n) were separated into two experimental rooms, with ad libitum access to food and water. in one room, 54 birds (27 from each line) were infected with 4 log10 cid50 (10^4 cid50) of virulent ibv-m41 strain in a total of 100 μl of 0.2 % bsa in pbs, split equally between the intranasal and ocular routes. in the other room, 54 control birds (27 from each line) received 100 μl pbs via the same routes. trachea samples (upper half) were collected at 2, 3 and 4 days post-infection (9 individual birds from each line at each time point). the tracheas of infected and control birds from each line were analysed for viral load using taqman real-time quantitative rt-pcr assays. tissue samples (~30 mg) were stabilized in rnalater (ambion, life technologies, paisley, uk) and disrupted using a bead mill (retsch mm 300, retsch, haan, germany) at 20 hz for 4 min. total rna was prepared using an rneasy kit (qiagen, crawley, uk) as per the manufacturer's protocol. samples were resuspended in a final volume of 50 μl of rnase-free water. concentrations of the samples were calculated by measuring od260 and od280 on a spectrophotometer (nanodrop, thermo scientific, paisley, uk). quality of the rna was checked on a bioanalyser (agilent technologies, south queensferry, uk). an rna integrity number (rin) > 8 was taken to confirm the integrity of the rna. biotinylated fragmented crna was hybridized to the affymetrix chicken genome array. this array contains comprehensive coverage of 32,773 transcripts corresponding to over 28,000 chicken genes. 
the chicken genome array also contains 689 probe sets for detecting 684 transcripts from 17 avian viruses. for each experimental group (control and infected birds in each of the two lines at each of 2, 3 and 4 dpi), three biological replicates (3 rna pools from 3 birds) were hybridized. thus, 36 arrays were used in total. hybridization was performed at 45°c for 16 hours in a hybridization oven with constant rotation (60 rpm). the microarrays were then automatically washed and stained with streptavidin-phycoerythrin conjugate (sape; invitrogen, paisley, uk) in a genechip fluidics station (affymetrix, santa clara, ca). fluorescence intensities were scanned with a genearray scanner 3000 (affymetrix, santa clara, ca). the scanned images were inspected and analyzed using established quality control measures. array data have been submitted to array express (http://www.ebi.ac.uk/arrayexpress/) under the accession number e-tabm-1128. gene expression data generated from the genechip operating software (gcos) was normalised using the plier (probe logarithmic intensity error) method [24] within the affymetrix expression console software package. this normalised data was then analysed using the limma and farms [25] packages within r in bioconductor [26] . probes with a false discovery rate (fdr) value <0.05 and a fold change ≥1.5 were deemed to be biologically significant. in order to determine which biological pathways are involved in the responses to viral infection, we analysed our differentially-expressed (de) genes using pathway express [27, 28] which uses kegg pathways [29] to pictorially display up/down regulation of genes. (nb. these diagrams are based on the human pathways and so are not completely representative of the chicken pathways). genes differentially expressed during the host response (fdr <0.05) were analysed against a reference background consisting of all genes expressed in the experiment. 
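the significance rule stated above for probes (fdr < 0.05 and fold change ≥ 1.5) can be sketched in python as follows. this is an illustration, not the limma/farms analysis itself: the `ProbeResult` record and the probe ids are hypothetical, the fdr values are taken as already computed, and applying the fold-change cutoff symmetrically to up- and down-regulated probes (on the log2 scale) is an assumption.

```python
import math
from typing import List, NamedTuple

FDR_CUTOFF = 0.05          # threshold stated in the text
FOLD_CHANGE_CUTOFF = 1.5   # threshold stated in the text


class ProbeResult(NamedTuple):
    probe_id: str            # hypothetical identifier
    log2_fold_change: float  # infected vs control, log2 scale
    fdr: float               # multiple-testing-adjusted p-value, precomputed


def is_significant(p: ProbeResult) -> bool:
    """FDR below the cutoff and at least a 1.5-fold change in either direction."""
    min_abs_log2fc = math.log2(FOLD_CHANGE_CUTOFF)
    return p.fdr < FDR_CUTOFF and abs(p.log2_fold_change) >= min_abs_log2fc


def select_de_probes(results: List[ProbeResult]) -> List[str]:
    """Return the ids of probes passing both thresholds."""
    return [p.probe_id for p in results if is_significant(p)]


# Toy input (made-up numbers, for illustration only).
toy_results = [
    ProbeResult("probe_a", 1.0, 0.01),    # 2-fold up, significant
    ProbeResult("probe_b", 0.3, 0.01),    # ~1.23-fold, change too small
    ProbeResult("probe_c", -1.0, 0.01),   # 2-fold down, significant
    ProbeResult("probe_d", 1.0, 0.20),    # fdr too high
]
de_probes = select_de_probes(toy_results)
```

working on the log2 scale keeps the cutoff symmetric: a 1.5-fold decrease and a 1.5-fold increase are both at a distance of log2(1.5) from zero.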
factors considered by pathway express include the magnitude of a gene's expression change and its position and interactions in any given pathway, thus including an 'impact factor' when calculating statistically significant pathways. anything with a p-value <0.25 is deemed significant when using this software. use of the ingenuity pathway analysis (ipa) program [30] revealed which canonical pathways are being switched on by ibv infection in the host (with benjamini-hochberg multiple testing correction) and allowed us to analyze the gene interaction networks involved in the host response. genes were clustered by similar expression pattern and analysed for enriched go-terms and transcription factor binding sites (tfbs) using expander (v5.2) [31]. normalised expression data from control samples were compared with infected samples to examine the host response to ibv infection. enrichment analysis of particular go terms or tfbs within clusters was done using the tango and prima functions, respectively, within the expander package. taqman real-time quantitative rt-pcr (qrt-pcr) was used to quantify viral rna levels and for confirmation of the microarray results for the mrna levels of selected genes. this was performed on 3 replicate pools of 3 samples (9 birds). primers (sigma) and probe (pe applied biosystems, warrington, uk) ( table 1) were designed using primer express (pe applied biosystems). briefly, the assays were performed using 2 μl of total rna and the taqman fast universal pcr master mix and one-step rt-pcr mastermix reagents (pe applied biosystems) in a 10 μl reaction. amplification and detection of specific products were performed using the applied biosystems 7500 fast real-time pcr system with the following cycle profile: one cycle at 48°c for 30 min and 95°c for 20 sec, followed by 40 cycles at 95°c for 3 sec and 60°c for 30 sec. 
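the qrt-pcr readout in this study is the ct value normalised to the 28s rrna ct of the same sample and reported as 40-ct or as fold change from uninfected controls. the exact correction formula is given in the cited references, not in this text, so the python sketch below rests on labelled assumptions: the correction simply shifts each target ct by the difference between the mean 28s ct across samples and the sample's own 28s ct (equal amplification efficiencies assumed), and fold change assumes a doubling of product per cycle. all function names and numbers are hypothetical.

```python
from statistics import mean


def corrected_ct(target_ct: float, sample_ref_ct: float, mean_ref_ct: float) -> float:
    """Shift the target Ct by how far this sample's 28S Ct is from the mean 28S Ct.

    Assumption: target and reference assays amplify with equal efficiency.
    """
    return target_ct + (mean_ref_ct - sample_ref_ct)


def forty_minus_ct(ct: float, max_cycles: int = 40) -> float:
    """Express the result as 40 - Ct, so that higher values mean more template."""
    return max_cycles - ct


def fold_change(infected_ct: float, control_ct: float) -> float:
    """Fold change from control, assuming product doubles every cycle."""
    return 2.0 ** (control_ct - infected_ct)


# Toy Ct values (made up, for illustration only).
ref_cts = {"control": 9.0, "infected": 11.0}      # 28S rRNA Ct per sample
target_cts = {"control": 28.0, "infected": 27.0}  # target gene Ct per sample

mean_ref = mean(ref_cts.values())
normalised = {
    sample: corrected_ct(target_cts[sample], ref_cts[sample], mean_ref)
    for sample in ref_cts
}
reported = {sample: forty_minus_ct(ct) for sample, ct in normalised.items()}
fc = fold_change(normalised["infected"], normalised["control"])
```

in this toy case the infected sample has a lower normalised ct than the control, so its 40-ct value is higher and the fold change is greater than one.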
data are expressed in terms of the cycle threshold (ct) value, normalised for each sample using the ct value of the 28s rrna product for the same sample, as described previously [32] [33] [34] . final results are shown as 40-ct using the normalised value, or as fold-change from uninfected controls. taqman real-time quantitative rt-pcr analysis was used to measure viral load in trachea samples from both control and infected birds from both lines 15i and n. tracheal tissue was chosen for examination in this study as the target of ibv is the epithelial surface of the respiratory tract. viral rna was detected in infected birds, but no significant difference in viral load was detected between lines at 2, 3 or 4 days post infection (fig. 1) . this would indicate that the resistance to the virus seen in line n is due to how the birds respond to the virus once it has entered the body and does not reflect an ability to prevent initial infection by the virus itself. when resistance to ibv infection was originally determined in these lines, it was noted that they were equally susceptible to infection, but a variation in outcome was seen. in line n, 33 % of birds showed air sac lesions whereas 73 % of 15i birds presented lesions. mortality was 0 in line n, but 47 % within line 15i birds. it was hypothesized that the different lines were producing different immunological responses upon infection [21] . gene expression differences found in the susceptible 15i line between infected and control birds over days 2, 3 and 4 post infection were analysed, with a view to examining the innate host response to infection by ibv. genes seen to be induced during the host response to infection include c1s, irf1, stat1, mx1, tlr3 and ctss as previously recognised by guo et al. [35] . we also identified ifit5, oasl, sca2, lyg2, isg12-2, ddx60, ifih1 and irf7, among others (additional file 1: table s1). 
to elucidate which biological pathways are being perturbed during the host response to ibv infection, we analysed our data using pathway express [36] . the resulting pathway diagrams are extremely useful in establishing which gene networks are involved in a particular experimental response. as seen in fig. 2 , genes involved in antigen presentation and the toll-like receptor (tlr) pathway are up-regulated. tlrs identify pathogen associated molecular patterns (pamps) and are crucial to the innate immune system. in this study tlr3 is shown to be induced at 3 dpi. tlr3 recognizes double-stranded rna intermediates produced during viral replication and has previously been shown to be induced in the trachea at this time after ibv infection [37] . another pathway involved is the phosphatidylinositol signalling pathway (table 2) . phosphatidylinositol kinases are known to play an important role in the viral life cycle after infection of the host and pi4kb is known to be exploited by coronaviruses for viral entry. the product of pi4kb catalysis is phosphatidylinositol 4-phosphate (pi4p) and coronavirus entry into the host is mediated by the pi4p lipid microenvironment [38] . genes involved in the complement system are also highlighted as being up-regulated in response to ibv infection. complement-mediated lysis of viruses is an important facet of the host innate immune system and its role in defence against viral infection [39] as reflected in the induction of these genes in this study. use of ingenuity pathway analysis (ipa) software also allowed us to determine which biological systems are active during the host response. up-regulated genes are seen to be part of the canonical biological pathways shown in fig. 3a . biological processes involving pattern recognition receptors and interferon signalling feature heavily. the interferon response is a powerful antiviral mechanism, which has previously been shown to be involved in the host response after ibv infection. 
a very early induction of ifn-γ has been reported in splenocytes [40] , and in peripheral blood mononuclear cells (pbmcs) and lung leukocytes [41] . ifnb expression has also been reported in trachea between 1 and 2 dpi [42] . we do not see this increase in expression of interferon genes (due to the absence of data earlier than 2 dpi), but we do see the downstream effects, with increased expression of many interferon-induced genes. specific physiological processes activated upon ibv infection can also be seen in fig. 3b . the stimulation of various different immune cells is seen along with the indication of reproductive abnormality, which would reflect the problems seen with egg-laying upon ibv infection. in order to cluster genes seen to be involved in the host response to infectious bronchitis into groups with similar expression profiles and probably sharing similar functions or gene regulatory pathways, we utilised the click algorithm within the expander program [43] . figure 4a shows the expression profile of genes upregulated during the response to virus. the expander program was also used to analyse the gene ontology (go) functional annotations of the genes being differentially expressed. figure 4b shows the biological process terms, which are significantly enriched in the genes responding during the host response to infection. as would be expected, these include terms like 'innate immune response' and 'antigen processing and presentation'. 'nad + adp-ribosyltransferase activity' and 'phosphoinositide binding' are also highlighted. transcription factor binding sites present in de genes which are significantly over-represented were also predicted. figure 4c shows that genes up-regulated during the host response have a high proportion of irf7 and isre binding sites. irf7 is a transcriptional activator, which binds to the interferon-stimulated response element (isre) in ifn promoters and functions as a molecular switch for antiviral activity. 
analysis of the gene expression differences between infected and control birds across the two lines has provided us with information on how these lines differ in their response to infection. examination of the gene expression profiles in the control birds of the two different lines also allowed us to identify genes which are inherently different between the susceptible and resistant birds. numerous genes show large expression differences between the two lines, even before infection. dramatic differences are seen in the expression of certain genes, including ddt, sri, blb1, hscb, bf1, bf2, suclg2 and mx1, which are more highly expressed in the resistant n line (additional file 2: table s2 shows all 1930 de probes). these genes therefore have inherently different expression levels between susceptible and resistant birds, even before infection occurs. we therefore postulate that some of these genes may play an important role in disease resistance. the potential interactome of ibv has recently been investigated by stable isotope labelling with amino acids in cell culture (silac) coupled to a green fluorescent protein-nanotrap pull-down methodology [44] . host proteins which bind to the ibv n protein were identified, and the genes for some of these are inherently expressed at higher levels in susceptible birds in this study. these genes include myh9, caprin1, dhx57, hnrnph3, rpl27a, fmr1, c22orf28, hnrpdl, sfrs3, rpl31, npm1 and rpsa. this may therefore be one of the reasons why line 15i is more susceptible to ibv infection: there are more host proteins to which the virus binds, compared with the resistant line n. upon infection, differences in response are also seen between the two lines. interestingly, apart from cd38 and cd4 at 3 dpi and fkbp5 at 4 dpi, all other differential gene expression between the lines is seen at 2 dpi in this study (additional file 3: table s3). 
cd38 is a glycoprotein found on the surface of many immune cells including cd4+, cd8+, b lymphocytes and natural killer cells and is a marker of cell activation. it functions in cell adhesion, signal transduction and calcium signalling. cd4 is found on the surface of immune cells such as t helper cells, monocytes, macrophages and dendritic cells. it is a membrane glycoprotein which interacts with mhcii antigens. the protein functions to initiate or augment the early phase of t-cell activation. the protein encoded by fkbp5 is a member of the immunophilin protein family, which play a role in immuno-regulation and basic cellular processes involving protein folding and trafficking. early defence by the host is a key mechanism for combatting viral infection, and induction of ifnb and other innate genes in response to ibv infection has been shown to peak around 18-36 hr post infection [42] . in this study, genes more highly expressed (or less down-regulated) in the resistant n line at 2 dpi include a number of collagen genes (col3a1, col1a2, col9a1, col9a2, col6a1 and col4a1) and other genes such as acan, fstl1, comp, eif3a, stat3 and igfbp5. genes seen to be more highly expressed (or less down-regulated) in the susceptible 15i line include rbm39, mafb, nnk2, ccn1, mgat5 and thrap3. one consequence of ibv infection is the production of poor quality, misshapen eggs by infected birds [45] . some of the genes previously identified as being important for the creation of a healthy eggshell are seen to be more highly expressed by the resistant n line birds after infection in this study. these genes include col1a2, creld2, hsp90b1, p4hb and erp29 [46] . for a full list of genes differentially expressed between the two lines in trachea (409 de probes) see additional file 3: table s3 . 
ipa analysis of genes showing different inherent expression between lines 15i and n shows that the molecular functions of these genes are primarily concerned with cell death and cell adhesion (fig. 5), two processes previously shown to be significant in infected kidneys [47] . when the differential host responses to infection are examined, it is seen that genes involved in proliferation of t-lymphocytes and genes concerned with cell attachment and cytoplasmic organization are more highly expressed in the resistant line n. other processes significantly involved are apoptosis and necrosis (fig. 6a), which have been previously documented in ibv-infected vero cells by liu et al. [48] . one of the most perturbed biological networks noted in this analysis involves genes related to connective tissue disorders and includes many collagen genes. these genes are more highly expressed in susceptible line 15i birds compared to resistant line n birds (fig. 6b), suggesting that ibv infection might cause more disorder of eggshell formation in this line [49] . the production of poor quality eggs by ibv-infected birds may, in part, be a reflection of the expression of these kinds of gene networks compared to that seen in resistant birds. twenty-one genes were selected for qrt-pcr validation (table 3). these genes were chosen based on their involvement in the host response and whether they were differentially expressed between the susceptible and resistant lines (either inherently or during the course of infection). of the 21 genes tested, 19 showed differential expression comparable to that determined by the arrays. (table 3 footnotes: higher expression in response to infection in the resistant than in the susceptible line; c, inherently higher expression in the resistant line.) however, the results for ifnar2 and igfbp5 were not confirmed (additional file 4: figure s1). 
besides knowing that the mhc b locus has a bearing on disease resistance, the lack of any genetic information or identified qtl meant that we had to rely upon the gene expression differences we saw between susceptible and resistant lines to give us clues as to genes potentially involved in resistance to ibv infection. identifying genes which were expressed at different levels in the two lines of birds highlighted b-locus genes (blb1, bf1, bf2 , b-g) as well as bringing to our attention various other non-mhc genes which, due to their known biology, could be candidates for being involved in resistance to ibv infection (table 4) . mx1, c1s, irf7, tlr3, c1r, ccli7, isg12-2 and ifitm3 are all strongly induced during the host response to ibv infection. they are all innate immune genes which could potentially have a role in determining susceptibility to the virus. mx1 and ifitm3 are already established as anti-viral molecules [50] [51] [52] . cd38, cd4, fkbp5 and stat3 all show a higher level of expression during the host response in the resistant birds compared to that of the susceptible birds, indicating their involvement in the host defence mechanism. cd38 and cd4, with their role as receptors on immune cells, as described above, are obvious candidates, along with fkbp5 as an immune-regulator. stat3 is activated by various cytokines and growth factors and functions in cellular processes such as cell growth and apoptosis. even before infection, many genes are seen to be highly differentially expressed between lines 15i and n. oasl is an interferon-induced molecule known to have anti-viral activity against certain viruses such as hepatitis c virus. ddt is highly homologous to the macrophage migration inhibition factor, mif. we have also shown it to be highly differentially expressed in other chicken lines, which are susceptible or resistant to marek's disease virus [53] . 
ifnar2 is an obvious candidate prediction, as the interferon response is central to the host's defence against ibv infection. tpd52l1, bcl2l1, faim2 and ciapin1 are all known to be involved in regulation of apoptosis, a process seen to be important during ibv infection. hscb, sri, and suclg2, although not having an obvious potential biological role in disease resistance, are highly differentially expressed between susceptible and resistant lines and should thus be considered as potential candidates. resistance to ibv infection is brought about by the immune response after the virus has entered the host and is not due to prevention of initial viral infection. there is a small initial innate response at 2 dpi, with much more gene expression seen at 3 and 4 dpi. analysis of genes being activated or inhibited upon infection shows that the biological pathways primarily affected during ibv infection include mapk signalling, those involved in the interferon response and those involving pattern recognition receptors. susceptible and resistant lines show a differential host response mostly at 2 dpi. there are also genes which are inherently different between the two lines studied, including many genes, which control the apoptotic potential of the host. these differences seen in gene expression levels, allow us to postulate on many candidate genes for disease resistance. some potential candidates for involvement in disease resistance include genes already known to confer resistance to other viral infections (mhc-b locus genes, mx1, oasl and ifitm3), genes involved in apoptotic processes (tpd52l1, bcl2l1, faim2 and ciapin1) and others which could be potential candidates due to their known biology (e.g. ddt and cd4). array data have been submitted to array express (http://www.ebi.ac.uk/arrayexpress/) under the accession number e-tabm-1128. additional file 1: table s1 . gene expression seen during the host response to ibv infection in the trachea of susceptible birds. 
(xlsx 24 kb) additional file 2: table s2. gene expression differences found to be inherent between susceptible and resistant lines in the trachea. (xls 386 kb) additional file 3: table s3. genes found to be differentially expressed between susceptible and resistant lines in response to ibv infection in the trachea. (xls 107 kb) additional file 4: figure s1. the authors declare no conflicts of interest and no competing financial interests. an apparently new respiratory disease in baby chicks studies of infectious coryza of chickens with special reference to its etiology cultivation of the virus of infectious bronchitis coronavirus avian infectious bronchitis virus coronavirus replication and interaction with host recombinant avian infectious bronchitis virus expressing a heterologous spike gene demonstrates that the spike protein is a determinant of cell tropism interleukin-6 expression after infectious bronchitis virus infection in chickens review of infectious bronchitis virus around the world nephropathogenic avian infectious bronchitis viruses the role of phagocytic cells in enhanced susceptibility of broilers to colibacillosis after infectious bronchitis virus infection ability of massachusetts-type infectious bronchitis virus to increase colibacillosis susceptibility in commercial broilers: a comparison between vaccine and virulent field virus diseases of poultry a reverse transcriptase-polymerase chain reaction survey of infectious bronchitis virus genotypes in western europe from isolation of a variant infectious bronchitis virus in australia that further illustrates diversity among emerging strains a novel genotype of avian infectious bronchitis virus isolated in japan in 2009 association of the chicken mhc b haplotypes with resistance to avian coronavirus transcriptional analysis of avian embryonic tissues following infection with avian infectious bronchitis virus transcriptome of local innate and adaptive immunity during early phase of 
infectious bronchitis viral infection identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel coronavirus associated with severe acute respiratory syndrome investigations into resistance of chicken lines to infection with infectious bronchitis virus use of highly inbred chickens in research studies on genetic resistance to marek's disease guide to probe logarithmic intensity error (plier) estimation. santa clara: affymetrix i i/ni-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data r: a language and environment for statistical computing recent additions and improvements to the onto-tools kegg: kyoto encyclopedia of genes and genomes ingenuity pathway analysis infectious bursal disease virus: strains that differ in virulence differentially modulate the innate immune response to infection in the chicken bursa re-evaluation of chicken cxcr1 determines the true gene structure: cxcli1 (k60) and cxcli2 (caf/ interleukin-8) are ligands for this receptor chicken interleukin-21 is costimulatory for t cells and blocks maturation of dendritic cells molecular mechanisms of primary and secondary mucosal immunity using avian infectious bronchitis virus as a model system a systems biology approach for pathway level analysis induction of innate immune response following infectious bronchitis corona virus infection in the respiratory tract of chickens phosphatidylinositol 4-kinase iiiβ is required for severe acute respiratory syndrome coronavirus spike-mediated cell entry virus complement evasion strategies infectious bronchitis virus induces acute interferon-gamma production through polyclonal stimulation of chicken leukocytes rapid nkcell activation in chicken after infection with infectious bronchitis virus m41 activation of the chicken type i ifn response by infectious bronchitis coronavirus click and expander: a system for clustering and visualizing gene expression data the cellular 
interactome of the coronavirus infectious bronchitis virus nucleocapsid protein and functional implications for virus biology egg quality and production following infectious bronchitis virus exposure at one day old hen uterine gene expression profiling during eggshell formation reveals putative proteins involved in the supply of minerals or in the shell mineralization process transcriptome analysis of chicken kidney tissues following coronavirus avian infectious bronchitis virus infection induction of caspase-dependent apoptosis in cultured cells by the avian coronavirus infectious bronchitis virus effects of avian infectious bronchitis virus antigen on eggshell formation and immunoreaction in hen oviduct interferons: cell signalling, immune modulation, antiviral response and virus countermeasures mx proteins: mediators of innate resistance to rna viruses ifitms restrict the replication of multiple pathogenic viruses this work was supported by the biotechnology and biological science research council (bbsrc), as part of grant numbers bb/d013704/1, bb/d013704/2 and bb/d010705/1. the authors would like to thank alison downing (edinburgh genomics, edinburgh, uk) for excellent technical assistance with the affymetrix microarray experiments. authors' contributions js performed the arrays, analysed the results and wrote the manuscript; dc carried out challenge experiments, j-rs prepared rna, measured viral load and performed qrt-pcr; db and pk conceived and supervised the project and revised the manuscript. all authors read and approved the final manuscript. 
key: cord-307769-rjseio5s authors: sim, winnie h; wagner, josef; cameron, donald j; catto‐smith, anthony g; bishop, ruth f; kirkwood, carl d title: expression profile of genes involved in pathogenesis of pediatric crohn's disease date: 2012-05-24 journal: j gastroenterol hepatol doi: 10.1111/j.1440-1746.2011.06973.x sha: doc_id: 307769 cord_uid: rjseio5s background and aim: expression profiling of genes specific to pediatric crohn's disease (cd) patients was performed to elucidate the molecular mechanisms underlying disease cause and pathogenesis at disease onset. methods: we used suppressive subtractive hybridization (ssh) and differential screening analysis to profile the mrna expression patterns of children with cd and age‐ and sex‐matched controls without inflammatory bowel disease (ibd). results: sequence analysis of 1000 clones enriched by ssh identified 75 functionally annotated human genes, represented by 430 clones. the 75 genes have potential involvement in gene networks, such as antigen presentation, inflammation, infection mechanism, connective tissue development, cell cycle and cancer. twenty‐eight genes were previously described in association with cd, while 47 were new genes not previously reported in the context of ibd. additionally, 29 of the 75 genes have been previously implicated in bacterial and viral infections. quantitative real‐time reverse transcription polymerase chain reaction performed on ileal‐derived rna from 13 cd and nine non‐ibd patients confirmed the upregulation of extracellular matrix gene mmp2 (p = 0.001), and cell proliferation gene reg1a (p = 0.063) in our pediatric cd cohort. conclusion: the retrieval of 28 genes previously reported in association with adult cd emphasizes the importance of these genes in the pediatric setting. the observed upregulation of reg1a and mmp2, and their known impact on cell proliferation and extracellular matrix remodeling, agrees with the clinical behavior of the disease. 
moreover, the expressions of bacterial‐ and virus‐related genes in our cd‐patient tissues support the concept that microbial agents are important in the etiopathogenesis of cd. crohn's disease (cd) is a chronic inflammatory disorder of the bowel. the cause of cd is unclear and a complex interplay between genetic, environmental and immune components has been implicated. 1 the prevailing hypothesis for the pathogenesis of cd is that an aberrant immune response, generated against microbial agents in genetically susceptible hosts, results in chronic intestinal inflammation. thus far, 71 genes have been implicated in cd based on genome-wide association studies, and include genes involved in autophagy, maintenance of mucosal barrier integrity and immune regulation. 2, 3 the nod2/card15 on chromosome 16 was the first locus implicated, mutations of which are thought to affect bacterial recognition. 4 subsequently, four genes, il10ra, il10rb, psmg1 and tnfrsf6b, have been linked to pediatric cd. 5, 6 the polygenic nature of cd suggests that direct targeting of individual disease susceptibility genes is unlikely to be therapeutically effective. key molecules in pathophysiology, downstream of regulatory events induced by different causative factors are more likely targets for therapeutic interventions. insights into key gene-environmental interactions relevant to disease pathogenesis could help identify causative stimuli (e.g. infectious agents) based on molecular signatures of the host response. 7 to date, microarray studies carried out on intestinal tissue of cd patients have identified several molecular biomarkers relating to inflammation, abnormal immunoregulation and cell biology, metabolism, signaling, transcription, electrolyte transport and extracellular matrix structure. 
[8] [9] [10] [11] [12] [13] [14] the suppressive subtractive hybridization (ssh) technique provides a complementary, non-biased approach to the identification of new genes or pathogens associated with cd. in ssh, suppression pcr normalizes the representation of rare and abundant cdna within the target population, and the subtraction step removes common nucleic acid sequences between the target specimen and its matched control. this results in an enriched pool of sequences specific to the target population. 15 the advantage of this approach is that no assumed knowledge of gene identity is required, as it does not rely on a defined set of gene library or conserved sequence signatures as probes for gene identification. 7 hence ssh complements microarray studies by identifying potentially important genes that may not be represented on the array platforms utilized by inflammatory bowel disease (ibd) microarray studies. ssh has been successfully used in the discovery of novel viruses, and the transcriptome profiling of human hepatoma and bone regeneration. [16] [17] [18] in the present study, we used ssh to analyze the differential expression profile in ileal biopsies from children with cd compared with age-and sex-matched non-ibd control children. the purpose of this study was to examine the initial events occurring during cd pathogenesis. tissue selection. ileal biopsy specimens (3-6 mm 3 ) were obtained from patients (aged 4-16) with symptoms suggestive of ibd and undergoing initial diagnostic endoscopy at the royal children's hospital, melbourne, australia. all tissue specimens were stored in rnalater (ambion, melbourne, australia) at -70°c until nucleic acid extraction. the diagnosis of cd was established using standard clinical endoscopic and histopathological criteria according to the montreal classification. 19 patients with esophagitis, mild non-specific gastritis or no known pathological diagnosis were used as non-ibd controls. 
none of the patients had received antibiotics or immunosuppressive drugs prior to endoscopy. demographic and clinical details of patients assayed by suppressive subtractive hybridization and real-time reverse transcription polymerase chain reaction (rt-pcr) are presented in tables 1 and 2, respectively. each biopsy was mechanically homogenized, the supernatant harvested, and rna extracted using the allprep dna/rna mini kit (qiagen, melbourne, australia) according to the manufacturer's protocol. all extractions were conducted in a class ii biological safety cabinet. the cd-specific subtractive library was constructed using the pcr-select cdna subtraction kit according to the user manual provided (clontech, palo alto, ca, usa). an overview of the ssh technique is given in figure s1. ileal rna was obtained separately from four cd and four non-ibd patients, then pooled into cd and non-ibd groups for the ssh assay. the patient groups were matched based on sex, mean age and common genotypes associated with cd, to minimize heterogeneity.
table footnotes. endoscopic presentation of ileal region where biopsy is taken. where two biopsies taken from separate ileal locations of a patient differ in presentation, both are described here. § genotyping of patients based on single-nucleotide polymorphism was performed for an earlier study. 20 major alleles are del, c, g, g, c and a; for nod2 leu1007fsinc, arg702trp, gly908arg; il23, atg16l1 and tlr4 respectively. cd, crohn's disease; ssh, suppressive subtractive hybridization.
differential screening. the library of differentially expressed cdna specific to the cd population was constructed using the topo ta cloning kit (invitrogen, melbourne, australia). five thousand randomly selected clones from the cd-specific subtractive library were spotted onto hybond nylon membrane (amersham biosciences, sydney, australia) in 384 × 3 by 2 arrays by the australian genome research facility (agrf), melbourne. 
cd-specific sequences were detected by reverse hybridization with digoxigenin (dig)-labeled probes (roche, sydney, australia) synthesized directly from cdna of the cd and non-ibd subtractive library, according to manufacturer's protocol (dig applications manual for filter hybridization, roche). clones with greater than three times hybridization affinity to the cd-library-specific probes as compared to non-ibd-library-specific probes were selected for sequencing. 23 the primers used are detailed in table s1 . quantification of cdna by real-time pcr was performed using the sybr greener qpcr super mix for abi prism (invitrogen), in accordance with manufacturer's instructions. analysis of real-time rt-pcr reactions and quantification of rna was determined using the 7300 system sequence detection software version 1.4 (applied biosystems). each sample was analyzed in triplicate. gene expression levels for individual patient samples were normalized relative to the expression of ribosomal protein l32 (rpl32) housekeeping gene. calculations were based on the pfaffl method, a mathematical method based on the real-time pcr efficiencies. 24 the origene clone cdna (125 fg) of each gene was used as the calibrator in every assay to allow for direct comparison of gene expression for all samples analyzed across multiple assays. statistical analysis. the mann-whitney u-test was used to compare the difference in median values between gene expression in cd and non-ibd patient samples. a p-value of less than 0.05 was considered statistically significant. all statistical tests were performed using sigmastat, version 3.5 (systat software inc., san jose, ca, usa). this study received ethics approval from the human ethics committee of the royal children's hospital (ehrc no. 23003). written and informed consent was obtained from each individual, parent or guardian prior to enrolment in the study. 
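the pfaffl calculation referred to above can be illustrated with a minimal sketch. it assumes the standard form of the pfaffl ratio, in which the efficiency of each assay is raised to the crossing-point difference between calibrator and sample, with rpl32 as the reference gene. the function names, the efficiencies and all crossing-point values below are hypothetical and chosen only for illustration; they are not data from this study, and the authors' own analysis was run in the instrument software.

```python
from statistics import mean


def efficiency_from_slope(slope):
    """Amplification efficiency E = 10^(-1/slope) from a standard-curve slope.

    A slope near -3.32 corresponds to E close to 2 (perfect doubling per cycle).
    """
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(e_target, cp_target_calibrator, cp_target_sample,
                 e_ref, cp_ref_calibrator, cp_ref_sample):
    """Relative expression ratio of a target gene versus a reference gene.

    Each cp_* argument is a list of replicate crossing points (e.g. triplicates).
    ratio = E_target^(CP_calibrator - CP_sample, target)
            / E_ref^(CP_calibrator - CP_sample, reference)
    """
    delta_target = mean(cp_target_calibrator) - mean(cp_target_sample)
    delta_ref = mean(cp_ref_calibrator) - mean(cp_ref_sample)
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical triplicate crossing points for one biopsy (not study data).
ratio = pfaffl_ratio(
    e_target=2.0,
    cp_target_calibrator=[25.0, 25.5, 24.5],
    cp_target_sample=[22.0, 22.5, 21.5],
    e_ref=2.0,
    cp_ref_calibrator=[20.0, 20.0, 20.0],
    cp_ref_sample=[19.0, 19.0, 19.0],
)
print(ratio)  # 4.0
```

in this example the target amplifies three cycles earlier in the sample than in the calibrator while the reference shifts by one cycle, giving a fold change of 2^3 / 2^1 = 4. using the same calibrator in every run is what allows ratios from different assays to be compared.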
sequence analysis of 1000 differentially expressed clones from the cd subtraction library identified 863 clones with high homology to genbank sequences. these included 430 clones, which had matches to human mrna sequences representing 75 annotated genes. the remaining clones had sequence similarity to mitochondrial and ribosomal genes, hypothetical proteins, expressed sequence tag (est), human chromosomes, bacterial and animal genes. the 75 annotated genes were assigned to eight functional clusters based on information obtained from the ucsc genome browser and ncbi entrez gene database. the map location, gene function and frequency of ssh clone representation for each gene are listed in table s2. we noted an enrichment of immune function genes and inflammatory mediators (clusters i and ii); extracellular matrix, remodeling, and ion transport coding genes (cluster iii); metabolic enzymes and signal transducers (cluster iv); genes involved in cell-cycle regulation (cluster v); cancer-related genes (cluster vi); transcription factors and post-transcription modifiers (cluster vii) and genes with unknown function (cluster viii). to assess the quality of the ssh data, genes representing different clone abundance levels were selected for real-time rt-pcr quantification on ileal biopsies. three genes were selected based on their representation of the ssh detection frequency range (high: > 50; moderate: 10-50; low: < 10), and also on potential functional interest with respect to cd pathogenesis. reg1a (55 clones) was selected based on its cell proliferative function and earlier reports of upregulation in colonic tissue of adult cd patients. 11,12 mmp2 (12 clones) is involved in wound healing and has been proposed to have a protective role in colitis by regulating barrier function and vascularisation. 25 anpep (2 clones) has previously been reported to be a receptor for coronavirus. 
26 real-time rt-pcr analysis of the three genes was conducted on ileum-derived rna from 13 cd and nine non-ibd patients, in triplicate. for cd patients cd5, cd6 and cd11, biopsies taken from both endoscopically affected and unaffected ileal locations were used in the analysis. individual gene expression levels for each sample were represented as fold change ratios relative to the expression of positive controls (origene clones for mmp2, anpep and reg1a). the individual expression levels (fold change value) of each gene for the biopsy samples of the 13 cd and nine non-ibd patients are depicted in figure 1 . using the mann-whitney statistical test for non-parametric and unpaired populations, the transcript expression levels of mmp2 were found to be significantly higher in cd ileal biopsies as compared to non-ibd ileal biopsies (p = 0.001). the cd population had a trend towards a higher level of reg1a transcript expression, although the difference was not statistically significant (p = 0.063). there was no significant difference in anpep transcript expression between cd and non-ibd patient samples (p = 0.305). the real-time rt-pcr results validated that genes represented by > 10 clones enriched by subtractive hybridization were expressed in higher abundance in cd as compared with non-ibd ileal biopsies. reg1a, mmp2 and anpep expression. analysis of reg1a, mmp2 and anpep gene expression across the cd patient samples revealed interesting patterns of expression. using a fold change ratio of 1 as reference, four cd ileum samples (cd1, cd2, cd8, cd11un) with high levels of mmp2 expression, had low or negligible reg1a and anpep expression (fig. 1) . this inverse pattern of expression was also observed in the cd ileum samples where mmp2 gene expression was high. 
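the mann-whitney comparison used for these results can be sketched as follows. this is a minimal pure-python illustration that ranks the pooled values (ties receive the average rank), derives the u statistics, and estimates a two-sided p-value from the normal approximation without tie or continuity correction. it is not the sigmastat procedure used by the authors, the function names are hypothetical, and the fold-change values are invented for illustration; for small groups an exact test from a statistics package would be preferable.

```python
import math


def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (u_x, u_y, two-sided p) for two unpaired samples.

    p uses the normal approximation with no tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, p


# Hypothetical fold-change values for two groups (not study data).
cd_values = [5.0, 6.0, 7.0]
control_values = [1.0, 2.0, 3.0]
u_cd, u_control, p_value = mann_whitney_u(cd_values, control_values)
print(u_cd, u_control)  # 9.0 0.0
```

because the test works on ranks rather than raw values, a single extreme fold change in one biopsy has limited influence on the result, which suits expression ratios that vary widely between patients.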
to contextualize our ssh findings, we compared our results with the data tables from seven microarray studies published previously, that had reported differential expression of genes between inflamed biopsies of cd and non-inflamed biopsies of non-ibd controls. [8] [9] [10] [11] [12] [13] [14] of the 75 annotated genes, 28 genes have been previously analyzed by microarray ( table 3) . the genes were either reported to be upregulated (n = 16), downregulated (n = 10) or variable (n = 2) depending on biopsy site assayed. there were 47 genes identified in this study that have not been previously described in the context of ibd investigations. to identify biological and functional networks based on potential gene interactions among the 75 ssh enriched genes, we utilized the "core" program of the ingenuity pathway analysis software. the majority of the 75 genes were classified into six networks comprising the following functions: (i) antigen presentation, inflammatory response, cancer; (ii) cancer, cell cycle, cellular compromise; (iii) connective tissue development and function, tissue morphology, developmental disorder; (iv) infection mechanism, genetic disorder, nutritional disease; (v) cell signaling, cellular assembly and organization, cellular function and maintenance; and (vi) amino acid metabolism, molecular transport, small molecule biochemistry (table 4 ). network 1 contained the highest number of ssh genes. interestingly, 18/23 genes in this network have been previously reported in microarray studies. the five newly identified genes within this network are cathepsin (ctss), dopa decarboxylase (ddc), integrin beta 1 (itgb1), poly adp-ribose polymerase (parp9) and prothymosin alpha (ptma). figure 2 depicts a schematic representation of this gene network. ctss and itgb1 appear to be involved in multiple pathways, including several direct and indirect associations with the previously reported genes. 
to elucidate evidence for microbial pathogenesis, the 75 functionally annotated genes were individually searched against the ncbi entrez gene database for reported functional associations with viral or bacterial infections. a total of 29 genes associated with microbial pathogenesis were identified (table 5). the pathogenesis of cd is thought to involve a complex interplay between the microbiome, the environment and multiple genetic factors. to gain further insights into the gene regulation processes involved, several gene array analyses have been performed using surgical resections or endoscopic biopsies of the colon obtained during treatment of adults with known ibd. [8] [9] [10] [11] [12] 14 however, the chronicity of the disease process and variability of treatments used are likely to have influenced gene expression profiles in these patients. our study used tissue obtained at initial diagnosis in treatment-naive children with early onset disease. to date, there have been very few studies of events at the genetic level during early disease onset in children. a recent study examining the genome-wide expression profile of pediatric ibd patients was conducted using colonic tissue. 8 our study extends these initial gene expression profile studies by comparing ileal biopsies from a pediatric cohort of cd and non-ibd patients. ssh analysis led to the identification of 75 functionally annotated genes, specific to the cd cohort. comparison of our ssh data with existing microarray studies revealed that 47 of these genes are novel and 28 genes have been previously identified by microarray to be either upregulated or downregulated in the cd population.
figure 1 the relative expression levels of reg1a, mmp2 and anpep in ileal biopsies from 13 crohn's disease (cd) and nine non-inflammatory bowel disease (ibd) patients. the relative expression ratio of each gene was calculated based on real-time reverse transcription polymerase chain reaction (rt-pcr) efficiency and the crossing point deviation of the target patient sample versus the internal rpl32 control, according to pfaffl. 24
gene networks. the antigen presentation, inflammatory response and cancer gene network (network 1) comprises one-third of the genes identified by ssh, with a high proportion of genes previously identified to be differentially expressed in cd. this is partially attributable to acute inflammation of the biopsies of cd patients as compared with the non-inflamed biopsies of non-ibd controls. differences in gene expression profiles between inflamed and non-inflamed cd terminal ileum have been recently described. 13 relative to non-ibd controls, the gene expressions of il-8 and saa1 were reportedly much higher in inflamed cd terminal ileum as compared to non-inflamed cd terminal ileum. 13 new genes identified within this network include ctss, ddc, itgb1, parp9 and ptma. based on the molecular interactions depicted in this network, ctss and itgb1 appear to be involved in multiple pathways, and to interact with other genes previously reported as upregulated in the cd population. ctss is mainly expressed in antigen-presenting cells and is required for the degradation of mhc class ii-associated invariant chains, necessary for proper mhc class ii antigen presentation. 55, 56 integrins, which include itgb1, are membrane receptors involved in cell adhesion and several processes, including immune response. itgb1 is expressed during hypoxic conditions, and can serve as an indicator of intestinal wound repair, which occurs only in a hypoxic environment. 57 the reg1a gene is involved in regulation of cell proliferation, and has been proposed to function as a mitogenic and/or an anti-apoptotic factor in ulcerative colitis (uc)-colitic cancer progression. 58 its high expression levels have been correlated with the severity of intestinal inflammation in patients with uc, and microarray studies have reported its upregulation in the colon of adult ibd patients. 
9,10,12 similarly, we identified an upregulation of reg1a in the terminal ileum of pediatric cd patients. this was, however, contrary to a recent study comparing the expression of reg1a in the terminal ileum of adult cd and non-ibd controls, which reported a downregulation in reg1a expression. 13 the difference in reg1a expression could indicate a distinction between the pathogenesis of early onset cd and adult-onset cd. based on the knowledge that reg1a gene expression is associated with cancer development, 59 the high level of reg1a expression in the terminal ileum of some pediatric cd patients could indicate an increased risk for colorectal cancer development. individuals with early onset cd have been previously described to have an increased risk of developing colorectal cancer. 60 the increased levels of mmp2 observed in cd ileum are consistent with previous studies conducted on colonic tissue where mmp2 is highly expressed in the intestinal epithelia during ibd. 61, 62 other studies have suggested the involvement of mmp2 in the regulation of epithelial barrier function. 25 since epithelial barrier dysfunction plays a central role in the pathogenesis of intestinal inflammation, the increased expression of mmp2 may serve as a response to counteract tissue damage, hence protecting against colitis. 63 the fluctuation in reg1a and mmp2 gene expression between ileal biopsies of different patients, and also between biopsies taken at different ileal locations of the same patient, suggests a spatiotemporal nature of gene regulation during early cd pathogenesis. this finding is consistent with the clinical nature of cd, with its patchy distribution. glycoprotein processing (man1a1); packaging (tgoln2, eef1a1) and possibly release (canx). evidence of response to bacterial infection is reflected by the enrichment of receptors for adherent invasive escherichia coli and helicobacter pylori (ceacam6, cd74). 
28, 30 the enrichment of mmp2, serpina1, otud4, macf1, pls1, muc17 and clca1 transcripts suggests the presence of infectious agent(s) early in disease pathway as these genes have previously been reported to be upregulated during bacterial or viral infections. [40] [41] [42] [43] [44] [45] [46] the involvement of slc5a1 and sf3b1 gene products in the impairment of intestinal glucose absorption and apoptosis due to hiv-1-induced glucose channel mis-sorting and cell cycle arrest suggest the occurrence of viral activities in early cd pathogenesis. 48, 50 the psme2, ptma, hla-dra, lrrc25 and xrn1 genes or gene products have been previously reported to be associated with defense against viral and bacterial infections. [51] [52] [53] [54] it is possible that these genes are differentially expressed in cd patients in response to infectious triggers. our study recognizes the limitation of the ssh technique whereby the cd subtraction library contained clones that are not differentially expressed, as shown by the anpep expression data. this limitation was also observed in previous studies. 15 preliminary ssh data presented in this study were verified either by real-time pcr quantification or comparison to microarray data from studies performed on individuals with and without ibd. several of the genes anecdotally identified in the context of cd by our study have roles in microbial pathogenesis, promoting inflammation, epithelial remodeling, vesicular transport or cell differentiation and proliferation. these processes are relevant to cd pathogenesis, hence future investigations into the association between these novel gene candidates and cd could contribute to the understanding of the disease. suppressive subtractive hybridization method. restriction endonuclease-digested tester dna was split into two pools and ligated with adaptor 1 or adaptor 2r. two successive rounds of hybridization with excess restriction endonuclease-digested driver dna followed. 
thereafter, single-stranded components of the adaptors were filled in. exponential amplification of tester-specific sequences is used to enrich for potential differentially expressed genes. type a molecules are significantly enriched, differentially expressed sequences, while cdna that are not differentially expressed form type c molecules with the driver. the concentration of high- and low-abundance sequences is equalized, whereby highly abundant molecules re-anneal to form type b and d molecules. during the second hybridization, remaining equalized and subtracted single-stranded tester cdna reassociate to form type e hybrids, with different ends corresponding to sequences of adaptor 1 and adaptor 2r (adapted from clontech pcr-select cdna subtraction kit user manual [bd biosciences]). table s1 primers used for real-time reverse transcription polymerase chain reaction quantification of anpep, reg1a, mmp2 and rpl32. table s2 differentially expressed genes specific to crohn's disease (cd) ileum. genes within each functional category are listed in order of clone abundance. 
mechanisms of disease: pathogenesis of crohn's disease and ulcerative colitis genome-wide meta-analysis increases to 71 the number of confirmed crohn's disease susceptibility loci the genetics of crohn's disease mapping of a susceptibility locus for crohn's disease on chromosome 16 loci on 20q13 and 21q22 are associated with pediatric-onset inflammatory bowel disease inflammatory bowel disease and mutations affecting the interleukin-10 receptor new technologies, human-microbe interactions, and the search for previously unrecognized pathogens activation of an il-6:stat3-dependent transcriptome in pediatric-onset inflammatory bowel disease dissection of the inflammatory bowel disease transcriptome using genome-wide cdna microarrays analysis of mucosal gene expression in inflammatory bowel disease by parallel oligonucleotide arrays regulation of gene expression in inflammatory bowel disease and correlation with ibd drugs: screening by dna microarrays ulcerative colitis and crohn's disease: distinctive gene expression profiles and novel susceptibility candidate genes characterization of intestinal gene expression profiles in crohn's disease by genome-wide microarray analysis genome-wide gene expression differences in crohn's disease and ulcerative colitis from endoscopic pinch biopsies: insights into distinctive pathogenesis suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cdna probes and libraries identification of herpesvirus-like dna sequences in aids-associated kaposi's sarcoma transcriptional profiling of bone regeneration. 
insight into the molecular complexity of wound repair differentially profiling the low-expression transcriptomes of human hepatoma using a novel ssh/microarray approach toward an integrated clinical, molecular and serological classification of inflammatory bowel disease: report of a working party of the 2005 montreal world congress of gastroenterology interaction of crohn's disease susceptibility genes in an australian paediatric cohort source: a unified genomic resource of functional annotations, ontologies, and gene expression data the ucsc genome browser database: update 2010 primer3 on the www for general users and for biologist programmers a new mathematical model for relative quantification in real-time rt-pcr selective ablation of matrix metalloproteinase-2 exacerbates experimental colitis: contrasting role of gelatinases in the pathogenesis of colitis aminopeptidase n is a major receptor for the entero-pathogenic coronavirus tgev human coronavirus 229e: receptor binding domain and neutralization by soluble receptor at 37 degrees c ceacam6 acts as a receptor for adherent-invasive e. 
coli, supporting ileal mucosa colonization in crohn disease integrin alpha3beta1 (cd 49c/29) is a cellular receptor for kaposi's sarcoma-associated herpesvirus (kshv/hhv-8) entry into the target cells helicobacter pylori binds to cd74 on gastric epithelial cells and stimulates interleukin-8 production cellular protein ttrap interacts with hiv-1 integrase to facilitate viral integration gp340 promotes transcytosis of human immunodeficiency virus type 1 in genital tract-derived cell lines and primary endocervical tissue hiv-1-mediated apoptosis of neuronal cells: proximal molecular mechanisms of hiv-1-induced encephalopathy genetic and pharmacologic alteration of cathepsin expression influences reovirus pathogenesis htlv type i tax activation of the cxcr4 promoter by association with nuclear respiratory factor 1 activity of lysosomal exoglycosidases in saliva of patients with hiv infection the hepatitis delta virus rna genome interacts with eef1a1, p54(nrb), hnrnp-l, gapdh and asf/sf2 human herpesvirus-6 induces mvb formation, and virus egress occurs by an exosomal release pathway the measles virus (mv) glycoproteins interact with cellular chaperones in the endoplasmic reticulum and mv infection upregulates chaperone expression hiv-1 interaction with human mannose receptor (hmr) induces production of matrix metalloproteinase 2 (mmp-2) through hmr-mediated intracellular signaling in astrocytes helicobacter pylori infection and short-term intake of low-dose aspirin have different effects on alpha-1 antitrypsin/alpha-1 peptidase inhibitor (alpha1-pi) levels in antral mucosa and peripheral blood hiv-1 promotor insertion revealed by selective detection of chimeric provirus-host gene transcripts single-nucleotide polymorphisms associated with symptomatic infection and differential human gene expression in healthy seropositive persons each implicate the cytoskeleton, integrin signaling, and oncosuppression in the pathogenesis of human parvovirus b19 infection investigating 
the human immunodeficiency virus type 1-infected monocyte-derived macrophage secretome two atypical enteropathogenic escherichia coli strains induce the production of secreted and membrane-bound mucins to benefit their own growth at the apical surface of human mucin-secreting intestinal ht29-mtx cells lps-induced mucin expression in human sinus mucosa can be attenuated by hclca inhibitors adenovirus infection inactivates the translational inhibitors 4e-bp1 and 4e-bp2 inhibitory effect of hiv-1 tat protein on the sodium-d-glucose symporter of human intestinal epithelial cells cleavage of poly(a)-binding protein by poliovirus 3c proteinase inhibits viral internal ribosome entry site-mediated translation human immunodeficiency virus type 1 vpr induces g2 checkpoint activation by interacting with the splicing factor sap145 the role of the proteasome activator pa28 in mhc class i antigen processing novel function of prothymosin alpha as a potent inhibitor of human immunodeficiency virus type 1 gene expression in primary macrophages hla and hepatitis b infection 20s rna narnavirus defies the antiviral activity of ski1/xrn1 in saccharomyces cerevisiae human cathepsin s, but not cathepsin l, degrades efficiently mhc class ii-associated invariant chain in nonprofessional apcs essential role for cathepsin s in mhc class ii-associated invariant chain processing and peptide loading selective induction of integrin beta1 by hypoxia-inducible factor: implications for wound healing possible role of reg ialpha protein in ulcerative colitis and colitic cancer reg1a expression is a prognostic marker in colorectal cancer and associated with peritoneal carcinomatosis increased risk of large-bowel cancer in crohn's disease with colonic involvement matrix metalloproteinase levels are elevated in inflammatory bowel disease genes expressed in pediatric crohn comparable expression of matrix metalloproteinases 1 and 2 in pouchitis and ulcerative colitis matrix metalloproteinases in 
inflammatory bowel disease: boon or a bane? upregulation of reg 1alpha and gw112 in the epithelium of inflamed colonic mucosa stability of housekeeping genes in alveolar macrophages from copd patients we would like to thank the children and families for their participation in this study. this project was supported by research grants from the murdoch children's research institute, the cass foundation, the lynne quayle charitable trust, equity trustees ltd, glaxosmithkline australia, the victorian government's operational infrastructure support program, and by a national health and medical research council (nhmrc) research grant. dr kirkwood is supported by an nhmrc rd wright research fellowship (607347). key: cord-282968-kjvvoveq authors: qu, renjun; miao, yujing; cui, yingjing; cao, yiwen; zhou, ying; tang, xiaoqing; yang, jie; wang, fangquan title: selection of reference genes for the quantitative real-time pcr normalization of gene expression in isatis indigotica fortune date: 2019-03-25 journal: bmc mol biol doi: 10.1186/s12867-019-0126-y sha: doc_id: 282968 cord_uid: kjvvoveq background: isatis indigotica, a traditional chinese medicine, produces a variety of active ingredients. however, little is known about the key genes and corresponding expression profiling involved in the biosynthesis pathways of these ingredients. quantitative real-time polymerase chain reaction (qrt-pcr) is a powerful, commonly-used method for gene expression analysis, but the accuracy of the quantitative data produced depends on the appropriate selection of reference genes. results: in this study, the systematic analysis of the reference genes was performed for quantitative real-time pcr normalization in i. indigotica. we selected nine candidate reference genes, including six traditional housekeeping genes (act, α-tub, β-tub, ubc, cyp, and ef1-α), and three newly stable internal control genes (mub, tip41, and rpl) from a transcriptome dataset of i. 
indigotica, and evaluated their expression stabilities in different tissues (root, stem, leaf, and petiole) and in leaves exposed to three abiotic treatments (low-nitrogen, aba, and meja) using the genorm, normfinder, and bestkeeper algorithms and the comprehensive reffinder tool. the results demonstrated that mub and ef1-α were the two most stable reference genes for all samples. tip41 was the optimal reference gene for low-nitrogen stress and meja treatment, while act had the highest ranking for aba treatment and cyp was the most suitable for different tissues. conclusions: the results revealed that the selection and validation of appropriate reference genes for normalizing data is mandatory to acquire accurate quantification results. the necessity of specific internal control for specific conditions was also emphasized. furthermore, this work will provide valuable information to enhance further research in gene function and molecular biology on i. indigotica and other related species. electronic supplementary material: the online version of this article (10.1186/s12867-019-0126-y) contains supplementary material, which is available to authorized users. properties. however, most of them have a low abundance in plants; for example, indican, isatin, indirubin, and indigotin account for 1.16-43.6 μg/g dw (dry weight), 0.30-3.45 μg/g dw, 1.01-34.4 μg/g dw, and 1.45-18.7 μg/g dw, respectively [12]. these active compounds are secondary metabolites that accumulate during normal plant growth or exposure to environmental stresses [13]. it is necessary to elucidate the biosynthetic pathway of i. indigotica under various stresses to increase the content of active ingredients. with the development of high-throughput sequencing technology, many candidate genes involved in the biosynthesis of active ingredients have been obtained from the transcriptome database [14, 15]. 
therefore, an understanding of functional gene expression profiling will provide us with better insight into the metabolic pathway and the regulatory mechanism operating under stresses in this medicinal herb. with its high sensitivity, accuracy, and specificity as well as its high-throughput characteristic, the quantitative real-time polymerase chain reaction (qrt-pcr) has become the most powerful and reliable molecular technique for gene expression analysis in a wide range of biological research areas [16, 17]. however, the quantitative results are often affected by several error sources, such as the amount of starting material, the rna integrity, reverse transcription, and qrt-pcr amplification [18]. to obtain accurate qrt-pcr analysis results, it is crucial to normalize the raw gene expression data. the use of stable reference genes as normalization factors to minimize these errors has become the most common approach [18]. housekeeping genes, or genes involved in basic metabolism, such as actin (act), glyceraldehyde-3-phosphate dehydrogenase (gapdh), tubulin (tub), and elongation factor 1 alpha (ef-1α) have traditionally served as references in plant science, because they were believed to be consistently expressed across various tissues, developmental stages, and treatments [19, 20]. nevertheless, numerous studies have reported that the transcription level of commonly-used housekeeping genes shows unacceptable variability under different experimental conditions [21]. if inappropriate reference genes are selected for normalization, the noise of the expression assay will increase, and the results may be misinterpreted [22]. consequently, it is essential to systematically evaluate potential reference genes to ensure that they are appropriate for a specific experimental condition [23]. 
to date, icg (http://icg.big.ac.cn), a wiki-driven knowledgebase that collects internal reference genes for diverse species, has integrated a comprehensive collection covering more than 150 plants [24], such as arabidopsis [25], cucumber [26], wheat [27], rice [28], artemisia annua [29], and panax ginseng [30]. however, no systematic selection of reference genes for qrt-pcr analysis has been reported for i. indigotica under hormone treatment or low-nitrogen stress, a gap that impedes functional gene studies. high-throughput mrna sequencing (rna-seq), a transcriptome profiling approach based on deep sequencing, has paved the way for the use of transcriptome analysis in various species at an amazing scale and speed [31]. with recent advances, rna-seq can reveal novel genes, detect tissue-specific alternative splicing, and identify differentially expressed genes [32]. meanwhile, plant transcriptome data have commonly been used to search for appropriate reference genes [33]. excel-based tools, such as genorm [34], normfinder [35] and bestkeeper [36], have been developed to analyze expression stability and select the most suitable reference genes from a set of biological samples under investigation. besides this, the optimal gene should be evaluated with genes of interest to obtain reliable results. because i. indigotica is a non-model species, few studies have examined it at the molecular level. lignans, components of i. indigotica produced by the phenylpropanoid pathway, are important chemical ingredients that exhibit various biological activities. cinnamoyl coenzyme a reductase (ccr) is one of the key enzymes involved in the biosynthesis of lignin monomers, and the expression of the iiccr gene shows prominent diversity in response to hormonal stress [37]. therefore, iiccr may be used to demonstrate that the reference genes are reliable under various experimental conditions. 
in this study, nine common candidate reference genes were selected based on the transcriptome libraries of i. indigotica (srr1051997) to determine appropriate reference genes for qrt-pcr normalization in different plant tissues, and under low-nitrogen stress and exposure to hormonal stimuli (aba and meja). moreover, the expression level of one target gene, iiccr, was assayed to verify the reliability of the proposed reference genes. finally, the results will provide the basis for further research in exploring gene expression profiling under different experimental conditions. based on previous reports in arabidopsis [25], cucumber [26] and wheat [27], we selected nine genes as candidate genes by mining the isatis indigotica transcriptome data [14]. the details of gene symbol, gene id, gene name, arabidopsis ortholog no, and characteristics of pcr amplification in isatis indigotica are shown in table 1. subsequently, qrt-pcr primers were designed, and their specificity was determined using gel electrophoresis and a melting curve. the 2% agarose gel electrophoresis showed that only one amplicon corresponding to the expected fragment size was obtained after pcr amplification in all candidate reference genes (additional file 1: figure s1), and a single amplification peak was present on the melting curve for all primer sets (additional file 1: figure s2). qrt-pcr amplification efficiencies between 91% and 109%, with correlation coefficients (r²) ranging from 0.9859 to 0.9987, were calculated based on a standard curve assay generated from amplification with a series of cdna dilutions (table 1). the transcript abundances of the nine reference genes were determined from their mean cycle threshold (ct) values, which varied from 15 to 29, with lower ct values corresponding to higher expression abundance. among all candidates, ef1-α had the highest transcript level with the lowest mean ct value of 18.07 ± 1.23 (mean ± sd), followed by rpl (20.82 ± 1. table s1 ). 
genes with ct values with large sds had more variable expression compared to those with lower sds. mub showed the smallest variation in gene expression (22.40 ± 0.89), while β-tub showed the most variable level of expression. the expression stability of nine candidates across different sample sets was evaluated and ranked using four different computational algorithms including genorm, normfinder, bestkeeper, and reffinder. the expression stability values (m) calculated by genorm were used to evaluate the stability of the nine proposed reference genes by comparing the average variation of each gene to all others. an m value below 1.5 was used as the threshold for stable expression, and the gene with the lowest m value was recognized as the most stable reference gene and vice versa [34]. based on the above criteria, when all the samples of different tissues and abiotic stresses were combined, ef1-α and mub had the lowest m value, whereas ubc had the highest value, indicating that ef1-α and mub possessed the most stable expression, and ubc was the most variably expressed. for the aba stress set, the two most stable genes for normalization were act and mub with the minimum m value, and β-tub was the least stable gene. in the meja stress set, ef1-α and tip41 ranked as the two most stable genes and ubc was the least stable one. in the set of samples under low-nitrogen stress, mub and cyp were the two most stable genes and α-tub was the most unstable candidate. finally, in the various tissues set, ef1-α and cyp were the most stable genes, and tip41 was the least stable gene (fig. 2). to obtain the optimal number of genes needed for qrt-pcr normalization, the average pairwise variation ($v_n/v_{n+1}$) between two sequential normalization factors ($nf_n$ and $nf_{n+1}$) was calculated using the genorm programme. a $v_n/v_{n+1}$ value below the cutoff of 0.15 indicated that n stable reference genes are enough to obtain accurate results [34]. 
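the two genorm quantities described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the genorm program itself: the function names `genorm_m` and `pairwise_variation` and the toy quantities are hypothetical, the m value is taken as the mean standard deviation of the log2 expression ratios between one gene and every other gene, and the normalization factor is taken as the geometric mean of the top-ranked genes.

```python
import math
import statistics


def genorm_m(quantities):
    """Return an M value per gene.

    quantities: dict mapping gene name -> list of relative quantities,
    one value per sample (same sample order for every gene).
    M is the mean, over all other genes, of the standard deviation of
    the log2 expression ratio across samples.
    """
    m_values = {}
    for gene_j, values_j in quantities.items():
        pair_sds = []
        for gene_k, values_k in quantities.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            pair_sds.append(statistics.stdev(log_ratios))
        m_values[gene_j] = statistics.mean(pair_sds)
    return m_values


def pairwise_variation(quantities, ranked_genes, n):
    """Return V_n/n+1 for genes ranked from most to least stable."""

    def normalization_factor(count):
        top = ranked_genes[:count]
        n_samples = len(quantities[top[0]])
        return [
            statistics.geometric_mean([quantities[g][i] for g in top])
            for i in range(n_samples)
        ]

    nf_n = normalization_factor(n)
    nf_next = normalization_factor(n + 1)
    return statistics.stdev([math.log2(a / b) for a, b in zip(nf_n, nf_next)])


# toy quantities (hypothetical, not data from the study)
toy = {
    "gene_a": [1.0, 2.0, 4.0, 8.0],
    "gene_b": [2.0, 4.0, 8.0, 16.0],  # always twice gene_a, so the ratio is constant
    "gene_c": [1.0, 1.0, 8.0, 2.0],   # varies independently
}
m = genorm_m(toy)
ranking = sorted(m, key=m.get)
v2_3 = pairwise_variation(toy, ranking, 2)
```

in this toy set the two co-varying genes receive the lowest m values, and comparing `v2_3` with the 0.15 cutoff decides whether a third reference gene is worth adding.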
for the aba stress, meja stress, low-nitrogen stress, and different tissues sets, the v2/v3 value was already below 0.15, indicating that two reference genes were sufficient for accurate normalization. when all the samples were taken together, the pairwise variation (v2/v3) was 0.166, while v3/v4 was 0.147, indicating that the addition of a third reference gene had a significant effect on the results (fig. 3). the optimal normalization gene among these candidates was determined by normfinder according to their stability values (table 2). a lower stability value indicated more stable expression of a gene. in the subset of all samples, mub, ef1-α, and act were identified as the three most stable reference genes, while ubc showed higher variation. tip41 and ef1-α were identified as the two most stable reference genes under low-nitrogen stress, while tip41 was the least stable gene in different tissues. under meja stress, mub and α-tub were suggested to be the most stable genes, and ubc was the least stable one. consistent with the genorm analysis, act was shown to be the most stable gene under aba stress, and ef1-α and cyp were the two most stable genes in different tissues. the bestkeeper algorithm was used to rank the stability of the reference genes according to the standard deviations (sd) and coefficients of variation (cv) of their ct values, which are listed in table 3. higher stability was indicated by a lower cv ± sd value. ct values with sds of less than 1 were considered to have an acceptable range of variation. mub (1.31 ± 0.29) and tip41 (1.45 ± 0.34) were the most stable genes for expression normalization in the aba stress set; tip41 (1.05 ± 0.25) and mub (1.49 ± 0.35) in meja stress; cyp (0.29 ± 0.05) and tip41 (0.29 ± 0.06) in low-nitrogen stress; and act as for the all-samples set, mub showed the highest expression stability, which is consistent with the genorm and normfinder results. 
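the cv ± sd screening described for bestkeeper can be illustrated with a short python sketch. it is a simplification under stated assumptions: the function name `ct_stability` and the ct values are hypothetical, and the sample standard deviation is used here, whereas the bestkeeper spreadsheet may define its dispersion measure differently.

```python
import statistics


def ct_stability(ct_values, sd_limit=1.0):
    """Summarize raw Ct values for one gene.

    Returns the mean Ct, the sample standard deviation (SD), the
    coefficient of variation (CV, in percent of the mean) and whether
    the SD falls under the acceptance limit quoted in the text (SD < 1).
    """
    mean_ct = statistics.mean(ct_values)
    sd = statistics.stdev(ct_values)
    cv = sd / mean_ct * 100.0
    return {"mean": mean_ct, "sd": sd, "cv": cv, "acceptable": sd < sd_limit}


# hypothetical Ct values for two genes (not data from the study)
stable_gene = ct_stability([22.0, 22.5, 21.5, 22.0])
variable_gene = ct_stability([20.0, 24.0, 18.0, 26.0])
```

ranking the candidates by ascending sd (and then cv) reproduces the ordering logic described above, with genes whose sd is 1 or more treated as unsuitable.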
as shown in table 4, the reffinder program, which integrates genorm, normfinder and bestkeeper, was used to generate a comprehensive ranking of the nine candidate reference genes. in this process, mub and ef1-α were ranked as the top two genes for the total sample set. tip41 was suggested to be the most stable gene under meja stress and low-nitrogen stress, while it was unstably expressed in different tissues. cyp comprehensively ranked first in different tissues. under aba stress, act was the most stable. the transcript abundances of ubc were extremely unstable in all samples, different tissues, and under meja stress. α-tub was found to be unstable under low-nitrogen stress, while for aba stress, β-tub was the least stable gene. to validate the utility of the proposed reference genes in these four experiments, the relative expression level of iiccr was detected using the most stable reference genes (mub or tip41 alone, or mub combined with tip41, for low-nitrogen stress and meja stress; mub, ef1-α, or their combination for aba stress and different tissues) and the least stable reference genes (ubc for low-nitrogen stress, meja stress and different tissues; β-tub for aba stress) as calibrators. in low-nitrogen stress, the highest expression level of iiccr was detected in n1, followed by n3 and n2, and then in n0. by contrast, there was a distinct discrepancy when using ubc as the least stable reference gene (fig. 4a). in different tissues, iiccr showed a similar expression pattern when normalized by the optimal reference genes, whereas the pattern obtained with the most unstable reference gene visibly differed (fig. 4b). the iiccr expression level was significantly upregulated at 0 h and 8 h under aba treatment when mub and ef1-α were used as the internal control genes. however, it was upregulated at 0 h and 24 h when β-tub was used (fig. 4c). 
the normalization results of the iiccr expression level in meja treatment were consistent when using the two optimal genes (mub and tip41) as calibrators, while significant deviations appeared when normalized by the worst reference gene, ubc (fig. 4d). these outcomes proved that, to accurately normalize gene expression, it is important to validate reference genes with stable expression under diverse experimental conditions. the growth and development of plants are challenged by unsuitable environmental factors, such as salinity, drought, uv stress, and pathogen infection due to habitat restriction [13]. being sessile, plants need to evolve a series of defense and/or adaptation mechanisms. among these, plant secondary metabolites are known to play major roles in the adaptation of plants to their environments and in conferring protection against stress conditions [38]. furthermore, the metabolites are unique sources for active pharmaceuticals, cosmetics, and food additives [39]. unfortunately, the yield of plant secondary metabolites is so low that they cannot meet the increasing demand of the market. to increase their production, the biosynthesis pathways and key gene expression profiling related to the biosynthesis of the secondary metabolites need to first be elucidated. then, we will be able to establish a better understanding of gene functions [40]. qrt-pcr has emerged as a broadly accepted method for gene expression analysis due to its accuracy, high throughput, and sensitivity [18]. nevertheless, selecting reference genes from the literature without systematic validation could cause inaccurate qrt-pcr results [22]. hence, the selection and validation of appropriate reference genes for normalizing data is mandatory to acquire accurate quantification results under various experimental conditions for a given species. 
rapidly developing high-throughput sequencing technologies have provided a highly effective means of studying plant transcriptomics [41], plant epigenomics [42], and plant genomics [43]. moreover, the large data sets and gene expression data created by sequencing are regarded as an abundant source for reference gene selection, especially for non-model plants. therefore, i. indigotica large-scale transcriptome data (srr1051997) can serve as a gene pool to identify potential internal control genes. systematic and comprehensive evaluation of nine reference genes was performed in different tissues of i. indigotica and in leaves subjected to various treatments. the results showed that a single amplification peak was present on the melting curve images, so all the primer sets had good specificity (additional file 1: figure s2). additionally, the qrt-pcr performance of the tested reference genes suggested high amplification efficiency values (close to 100%) (table 1). these results indicate that the primers worked as expected and were reliable for further analyses of the stability of candidate reference genes. three statistical algorithms (genorm, normfinder, and bestkeeper) were used to analyze the stability of these candidates. however, the stability rankings obtained from the algorithms were not identical. the results of genorm and normfinder were similar for some conditions, but discrepancies occurred for the orders ranked by bestkeeper. for instance, in the aba treatment, genorm and normfinder calculated act to be the most stable gene, while it was ranked moderately by bestkeeper. this apparent variation was probably due to the different calculation principles in the three statistical algorithms [44]. to obtain consistent results, a comprehensive online tool, reffinder, which integrates the three algorithms, has been widely applied to generate a final comprehensive ranking of reference gene expression [45]. 
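one simple way to merge rankings that disagree, as described above, is to take the geometric mean of each gene's rank across the tools and sort by that score. the sketch below assumes this aggregation rule for illustration only; it is not claimed to reproduce the reffinder web tool exactly, and the function name `comprehensive_ranking` and the toy ranks are hypothetical.

```python
import statistics


def comprehensive_ranking(ranks_by_tool):
    """Combine per-tool ranks (1 = most stable) into one ordering.

    ranks_by_tool: dict mapping tool name -> dict of gene -> rank.
    Returns a list of (gene, score) pairs sorted from most to least
    stable, where score is the geometric mean of the gene's ranks.
    """
    genes = next(iter(ranks_by_tool.values())).keys()
    scores = {
        gene: statistics.geometric_mean(
            [ranks[gene] for ranks in ranks_by_tool.values()]
        )
        for gene in genes
    }
    return sorted(scores.items(), key=lambda item: item[1])


# toy ranks (hypothetical, not the rankings reported in the study)
toy_ranks = {
    "tool_1": {"gene_a": 1, "gene_b": 2, "gene_c": 3},
    "tool_2": {"gene_a": 2, "gene_b": 1, "gene_c": 3},
    "tool_3": {"gene_a": 1, "gene_b": 3, "gene_c": 2},
}
combined = comprehensive_ranking(toy_ranks)
```

a geometric mean keeps a gene that ranks first in most tools near the top even when one tool places it lower, which is the behaviour wanted when the individual algorithms rest on different calculation principles.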
in the current study, the reffinder analysis identified mub as the most stable gene in the all-samples set. meanwhile, ubc, α-tub and β-tub had relatively poor expression stability values, which is similar to previous results in artemisia annua l. [29] and brassica napus [46]. traditionally, classical housekeeping genes have been regarded as stably expressed at various development stages and under different treatments. however, an increasing number of studies show that the expression stabilities of most of these genes actually vary greatly [22, 45]. therefore, the use of housekeeping genes as references must be validated under specific conditions. in this study, six traditional housekeeping genes, which are involved in the cytoskeleton (act, α-tub and β-tub), post-translational modification (ubc and cyp), and ribosomal structure and biogenesis (ef1-α), were shown to display dramatic differences in expression patterns under conditions of low-nitrogen stress, hormonal stimuli, and in different tissues. cyp was the most stable reference gene in different tissues of i. indigotica but exhibited relatively low stability in lycoris aurea [19]. ef1-α ranked neither as the top, nor as the least suitable gene under four experimental conditions. α-tub and β-tub of the tubulin gene family are used as reference genes in many species, such as capsicum annuum l. [47], cynodon dactylon under cold stress [48], and eremosparton songoricum under various stress conditions [49], but in our study, they, along with ubc, always displayed the least stable expression pattern. 
fig. 4 relative expression of iiccr using the selected reference genes. the results were normalized using the selected stable reference genes (singly or in combination) and the unstable genes in sample sets across treatment with (a) nitrogen, (b) different tissues, (c) aba, and (d) meja. the bars indicate the standard error (± se) evaluated from three biological replicates. 
in addition, act was ranked first in aba treatment and was also among the most stable genes in the all-samples set. consistent with the result in lilium davidii var. unicolor, act also showed strong stability in i. indigotica [50]; however, act is not appropriate for gene normalization in different organs of salix matsudana [51]. compared with the traditional housekeeping genes, the newly reported reference genes performed better in gene normalization under specific conditions [19, 44]. in this research, three newly reported reference genes, mub, tip41, and rpl, were analyzed. mubs are membrane-anchored ubiquitin-fold proteins, which are thought to play a crucial role in diverse signaling cascades [52]. the corresponding genes are universally expressed in the tissues of many plants, animals, and fungi [53]. in salix matsudana, mub was shown to be highly stable under salt and drought stress conditions [51]. in the current study, we also found mub to be the most suitable gene in the all-samples subset. the rpl gene also served as a stable reference gene [54], but it showed less stable expression patterns under almost all tested conditions in i. indigotica. similar to lycoris aurea [19], tip41 was revealed to be the optimal reference gene for meja stress. as for i. indigotica, tip41 also ranked first under low-nitrogen stress and was selected as the best reference gene. previous studies have indicated that tip41 has fairly stable expression during salt stress in oilseed rape as well as at different developmental stages and in various tissues of arabidopsis plants [46, 55]. in addition, tip41 was not only validated as a reliable internal control gene under abiotic stress in chickpeas [56], it was also shown to be suitable for cucumber plants under various degrees of nitrogen nutrition [57]. the expression patterns of the target gene iiccr were examined using the two most stable and least stable reference genes to further confirm the stabilities of reference genes. 
the results showed that iiccr displayed a consistent expression pattern in response to low-nitrogen stress, hormonal stimuli and in different tissues when mub was used as an internal control, either singly or in combination with tip41 or ef1-α. however, severe variance appeared when the least stable genes, ubc or β-tub, were used for normalization. our results were consistent with previous studies, which reported that the use of unstable reference genes for qrt-pcr analysis caused significant variation in target gene amplification profiles and led to the misinterpretation of expression data [58]. consequently, it is extremely important to systematically select reference genes to accurately measure the target genes' expression levels. the selection of suitable reference genes is a prerequisite to quantifying gene expression by qrt-pcr. in this study, a series of candidate reference genes were systematically validated to normalize gene expression during i. indigotica's response to various conditions. three widely used algorithms (genorm, normfinder, and bestkeeper) were adopted to analyze the expression stability of the nine candidates. reffinder produced the final comprehensive ranking, showing that the optimal reference genes were mub and ef1-α across all samples; tip41 and cyp under low-nitrogen stress; ef1-α and tip41 under meja stress; cyp and ef1-α in different tissues; and act and mub under aba stress. the reference genes identified as the least stable, β-tub and ubc, are not recommended for the normalization of transcripts. the qrt-pcr of iiccr was used to validate the reliability of these results, and the selected reference genes were shown to significantly reduce the error rate in gene quantification. the results obtained from the present work will help to further increase the accuracy of normalization in qrt-pcr analysis and will facilitate gene expression studies in i. indigotica. 
seeds of isatis indigotica, collected from shanxi province in northern china, were used in this study. the seeds were soaked in tap water to wash away the empty seeds floating on the water. the plump seeds were sown in plastic pots filled with a mixture of perlite and vermiculite (ratio, 1:1; v/v) and maintained in the greenhouse of nanjing agricultural university (118°51′ e; 32°1′ n), nanjing, china. after germination, seedlings were irrigated with 1/4 strength hoagland's solution once a week before being subjected to different experimental treatments 6 weeks later. for the hormonal stimuli, the leaves of the seedlings were sprayed with 100 μm abscisic acid (aba treatment) or 100 μm methyl jasmonate (meja treatment) and then collected at 0, 4, 8, 12, and 24 h. low-nitrogen stress was produced by irrigating the seedlings for 1 week with solutions at five concentration levels of nitrogen, which all included 1 mm kh 2 [59] in addition to 0 mm kno3, 2.5 mm kno3, 5 mm kno3, and 10 mm kno3. root, stem, leaf, and petiole tissues were collected from untreated seedlings. three biological repeats were collected for all samples from each treatment, immediately frozen in liquid nitrogen, and stored at −80 °c for total rna extraction. total rna from each sample was extracted using the rnaprep pure plant kit (tiangen) and treated with dnase i to avoid genomic dna contamination, in accordance with the kit instructions. the rna concentration and purity were quantified using a colibri spectrophotometer (berthold detection). the integrity of the purified rna samples was examined by 1.5% (w/v) agarose gel electrophoresis. samples with absorption ratios of a260/a280 = 1.9-2.1 and a260/a230 ≥ 2.0 were used for cdna synthesis. first-strand cdna was synthesized from 2 μg total rna and 1 μg oligo-dt in a final volume of 20 μl using the primescript™ 1st strand cdna synthesis kit (takara), following the manufacturer's protocol. 
the final cdna samples were diluted five-fold with rnase-free water and then stored at −20 °c until further analysis. the candidate genes selected in the present study were reference genes previously reported as suitable for gene expression normalization in other model plants and similar species under different experimental conditions. their names were used to search the i. indigotica transcriptome library (srr1051997), and the genes that were extensively expressed across tissues were selected (table 1). moreover, to ensure the reliability and correctness of the proposed reference genes, we blast-searched the nucleotide sequences of the candidate genes against the arabidopsis genome database to identify their homologs in i. indigotica. based on the unigene sequences (additional file 1: file s1), specific primers were designed using primer 3 software (http://bioinfo.ut.ee/primer3-0.4.0/) with the following criteria: melting temperature (tm) 58-62 °c, gc content 40-65%, primer length 16-20 bp, and amplicon length 100-220 bp (table 1). self-complementarity and hairpin structures were avoided. primer specificity was judged by agarose gel electrophoresis of the pcr amplification products (additional file 1: figure s1) and by melting curve analysis (additional file 1: figure s2). pcr amplification was performed in a total volume of 20 μl, containing 3.6 μl of ddh2o, 2 μl of five-fold diluted cdna, 10 μl of 2× pcr buffer, 2 μl of dntps (2 mm), 1 μl of each primer (10 mm), and 0.4 μl of kod fx (1.0 u/μl). the pcr program was as follows: 5 min at 94 °c, 35 cycles of 10 s at 98 °c, 30 s at 60 °c, and 30 s at 68 °c, followed by a 5 min extension at 68 °c. the pcr products were run on a 2% agarose gel (additional file 1: figure s1). the qrt-pcr was conducted in 96-well plates with an abi 7500 real-time pcr system (applied biosystems) using sybr® green i (biouniquer). 
the reaction mixture contained 2 μl of five-fold diluted cdna, 2 μl of each primer (10 mm), 0.4 μl of 50× rox1 and 10 μl of real-time pcr master mix to give a final volume of 20 μl. the program for qrt-pcr was set as 10 min at 95 °c, followed by 40 cycles of 15 s at 95 °c and 30 s at 60 °c. the melting curves were recorded in each reaction by continuously raising the temperature from 65 to 90 °c (additional file 1: figure s2). each sample was run with three technical replicates, and every plate included one no template control (ntc) to monitor possible dna contamination. the threshold cycle (ct) was measured automatically. a standard curve was generated from a five-fold dilution series of the mixed cdna of all samples to calculate the pcr efficiency (e) and correlation coefficient (r^2). the pcr amplification efficiency (e) of each primer pair was calculated from the slope of the standard curve using e = [5^(−1/slope) − 1] × 100% [51]. the stability and suitability of the nine selected reference genes were evaluated by three algorithms, genorm [34], normfinder [35], and bestkeeper [36], across all experimental sets. finally, reffinder (http://150.216.56.64/referencegene.php) integrated the three algorithms to obtain an overall ranking. for genorm and normfinder, the mean ct value of three biological repeats for each gene was converted into a relative expression level using the formula 2^(−δct) (δct = ct value of each sample − the lowest ct value) [60]. for bestkeeper, the mean ct value was imported into the program directly. stability measures (m) of the candidate genes were calculated with the genorm algorithm. stepwise exclusion of the gene with the highest m value (the least stable gene) was used to rank the analyzed genes. subsequently, the pairwise variation (vn/vn+1) values calculated by genorm were used to determine the optimal number of candidate reference genes; a value below 0.15 indicated that no additional reference gene was required. 
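the efficiency formula and the 2^(−δct) conversion described above can be sketched in a few lines of python. this is an illustrative sketch under stated assumptions, not the authors' analysis script: the function names are hypothetical, the slope is assumed to come from a least-squares fit of ct against log5 of the relative template input (matching the five-fold dilution series), and the ct values are invented for the example.

```python
import math

def standard_curve_slope(relative_inputs, ct_values, dilution_factor=5.0):
    """Least-squares slope of Ct against log_{dilution_factor}(relative input)."""
    xs = [math.log(q, dilution_factor) for q in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def amplification_efficiency(slope, dilution_factor=5.0):
    """E = [d^(-1/slope) - 1] * 100 %, with d the dilution factor (5 in the text)."""
    return (dilution_factor ** (-1.0 / slope) - 1.0) * 100.0

def relative_quantities(ct_values):
    """Convert Ct values to 2^(-dCt), where dCt = Ct - lowest Ct in the set."""
    lowest = min(ct_values)
    return [2.0 ** -(ct - lowest) for ct in ct_values]

# Hypothetical standard curve: five-fold serial dilution with perfect doubling,
# so every dilution step delays Ct by log2(5) cycles.
inputs = [5.0 ** -k for k in range(5)]
cts = [20.0 + k * math.log2(5.0) for k in range(5)]

slope = standard_curve_slope(inputs, cts)
print(f"slope = {slope:.3f}, efficiency = {amplification_efficiency(slope):.1f} %")
print(relative_quantities([20.0, 21.0, 22.0]))
```

with this convention, the sample with the lowest ct in a set receives a relative quantity of 1 and every other sample a value between 0 and 1, which is the input form expected by genorm and normfinder.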
normfinder evaluated the genes' expression stability by assessing intra- and intergroup variation in a given sample set, offering a ranking in which the highest stability (s) value represented the least stable gene [61]. bestkeeper was used to calculate the standard deviation (sd) and coefficient of variation (cv) of the average ct values. genes with an sd > 1 were considered unacceptable as reference genes, and the gene with the lowest cv ± sd value was the most stable one. to obtain a more accurate expression analysis, the 54 samples were divided into four experimental sets and analyzed individually: 15 samples from the aba-induced i. indigotica leaves (set 1, aba treatment); 15 samples from the meja-induced i. indigotica leaves (set 2, meja treatment); 12 samples from the low-nitrogen stressed i. indigotica leaves (set 3, n treatment); and 12 samples from different tissues (roots, stems, leaves, and petioles) of i. indigotica (set 4, different tissues). in addition, the stability of the four sets together and that of each variety was analyzed. to validate the reference genes selected in this study, the expression level of iiccr, a gene involved in the lignin monomer biosynthesis pathway [37], was measured by qrt-pcr. the expression patterns of iiccr in samples of i. indigotica under low-nitrogen stress, meja treatment, aba treatment, and in different tissues were normalized using the two most stable and the one least stable reference genes, as recommended by reffinder. the 2^(−δδct) method, a commonly used method to analyze the relative change in gene expression, was used to calculate the relative expression of the target gene [62]. three technical replicates were performed for each biological sample. additional file 1: table s1. ct values of the 9 candidate reference genes. figure s1. specificity of primer pairs for qrt-pcr amplification. figure s2. 
melting curves of the 9 candidate reference genes showing single peaks. file s1. sequences of nine candidate reference genes.
act: actin; ubc: ubiquitin-conjugating enzyme; α-tub: alpha-tubulin; β-tub: beta-tubulin; ef1-α: elongation factor 1-α; mub: membrane-anchored ubiquitin-fold protein; cyp: cyclophilin; rpl: ribosomal protein l18; tip41: tip41-like family protein.
[1] indole alkaloids from the roots of isatis indigotica and their inhibitory effects on nitric oxide production
[2] the cytotoxicity to leukemia cells and antiviral effects of isatis indigotica extracts on pseudorabies virus
[3] anti-sars coronavirus 3c-like protease effects of isatis indigotica root and plant-derived phenolic compounds
[4] clinical efficacy and il-17 targeting mechanism of indigo naturalis as a topical agent in moderate psoriasis
[5] indigo naturalis ameliorates murine dextran sodium sulfate-induced colitis via aryl hydrocarbon receptor activation
[6] characterization of anti-leukemia components from indigo naturalis using comprehensive two-dimensional k562/cell membrane chromatography and in silico target identification
[7] indirubin, the active constituent of a chinese antileukaemia medicine, inhibits cyclin-dependent kinases
[8] antiviral activity of isatis indigotica root-derived clemastanin b against human and avian influenza a and b viruses in vitro
[9] thermochemical studies on the quantity-antibacterial effect relationship of four organic acids from radix isatidis on escherichia coli growth
[10] effect of extracts from indigowood root (isatis indigotica fort.) on immune responses in radiation-induced mucositis
[11] novel indolo[2,1-b]quinazoline analogues as cytostatic agents: synthesis, biological evaluation and structure-activity relationship
[12] determination of indican, isatin, indirubin and indigotin in isatis indigotica by liquid chromatography/electrospray ionization tandem mass spectrometry
[13] influence of abiotic stress signals on secondary metabolites in plants
[14] high-throughput sequencing and de novo assembly of the isatis indigotica transcriptome
[15] dynamic metabolic and transcriptomic profiling of methyl jasmonate-treated hairy roots reveals synthetic characters and regulators of lignan biosynthesis in isatis indigotica fort
[16] quantification of mrna using real-time rt-pcr
[17] validation and evaluation of reference genes for quantitative real-time pcr in macrobrachium nipponense
[18] quantification of mrna using real-time reverse transcription pcr (rt-pcr): trends and problems
[19] selection and validation of appropriate reference genes for quantitative real-time pcr analysis of gene expression in lycoris aurea
[20] evaluation of suitable reference genes for qrt-pcr normalization in strawberry (fragaria × ananassa) under different experimental conditions
[21] evaluation of candidate reference genes for expression studies in pisum sativum under different experimental conditions
[22] the implications of using an inappropriate reference gene for real-time reverse transcription pcr data normalization
[23] control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies
[24] icg: a wiki-driven knowledgebase of internal control genes for rt-qpcr normalization
[25] systematic validation of candidate reference genes for qrt-pcr normalization under iron deficiency in arabidopsis
[26] selection of appropriate reference genes for gene expression studies by quantitative real-time polymerase chain reaction in cucumber
[27] reference gene selection for qpcr gene expression analysis of rust-infected wheat
[28] defining reference genes for quantitative real-time pcr analysis of anther development in rice
[29] reference gene selection in artemisia annua l., a plant species producing anti-malarial artemisinin
[30] validation of suitable reference genes for quantitative gene expression analysis in panax ginseng
[31] rna-seq: a revolutionary tool for transcriptomics
[32] differential gene and transcript expression analysis of rna-seq experiments with tophat and cufflinks
[33] reference gene selection for qrt-pcr normalization analysis in kenaf (hibiscus cannabinus l.) under abiotic stress and hormonal stimuli
[34] accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes
[35] normalization of real-time quantitative reverse transcription-pcr data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets
[36] determination of stable housekeeping genes, differentially regulated target genes and sample integrity: bestkeeper-excel-based tool using pair-wise correlations
[37] isolation and characterization of a gene encoding cinnamoyl-coa reductase from isatis indigotica fort
[38] plant responses to simultaneous biotic and abiotic stress: molecular mechanisms
[39] plant cell cultures: chemical factories of secondary metabolites
[40] production of plant secondary metabolites: a historical perspective
[41] performance comparison of benchtop high-throughput sequencing platforms
[42] high-throughput sequencing technologies
[43] exploring plant transcriptomes using ultra high-throughput sequencing
[44] selection and validation of appropriate reference genes for quantitative real-time pcr normalization in staminate and perfect flowers of andromonoecious taihangia rupestris
[45] selection of housekeeping genes for gene expression studies in human reticulocytes using real-time pcr
[46] selection of reference genes for quantitative reverse-transcription polymerase chain reaction normalization in brassica napus under various stress conditions
[47] identification of reference genes for reverse transcription quantitative real-time pcr normalization in pepper
[48] selection and validation of reference genes for target gene analysis with quantitative rt-pcr in leaves and roots of bermudagrass under four different abiotic stresses
[49] reference gene selection in the desert plant eremosparton songoricum
[50] validation of reference genes for accurate normalization of gene expression in lilium davidii var. unicolor for real time quantitative pcr
[51] selection of suitable reference genes for quantitative real-time pcr gene expression analysis in salix matsudana under different abiotic stresses
[52] protein lipid modifications in signaling and subcellular targeting
[53] mubs, a family of ubiquitin-fold proteins that are plasma membrane-anchored by prenylation
[54] stable internal reference genes for the normalization of real-time pcr in different sweetpotato cultivars subjected to abiotic stress conditions
[55] genome-wide identification and testing of superior reference genes for transcript normalization in arabidopsis
[56] identification and validation of reference genes and their impact on normalized gene expression studies across cultivated and wild cicer species
[57] reliable reference genes for normalization of gene expression in cucumber grown under different nitrogen nutrition
[58] selection of reference genes for diurnal and developmental time-course real-time pcr expression analyses in lettuce
[59] comparison of the response of ion distribution in the tissues and cells of the succulent plants aloe vera and salicornia europaea to saline stress
[60] selection of suitable reference genes for qrt-pcr normalization during leaf development and hormonal stimuli in tea plant (camellia sinensis)
[61] identification of stable reference genes for quantitative pcr in koalas
[62] analysis of relative gene expression data using real-time quantitative pcr and the 2^(−δδct) method
this research was supported by grants of the national natural science foundation of china (grant no. 31171486).
authors' contributions: qrj, yj, and txq conceived the study and designed the experiments. qrj, myj, and cyj performed the experiments. cyw and zy analyzed the data with suggestions from wfq, yj, and txq. qrj wrote the manuscript. all authors read and approved the final manuscript.
1 college of horticulture, nanjing agricultural university, nanjing 210095, china. 2 institute of food crops, jiangsu academy agriculture sciences, nanjing 210014, china.
the authors declare that they have no competing interests. the datasets supporting the conclusions and description of a complete protocol are included within the article and its additional files.
key: cord-320005-i30t7cvr authors: pardo, a. title: the human genome and advances in medicine: limits and future prospects date: 2004-03-31 journal: archivos de bronconeumología ((english edition)) doi: 10.1016/s1579-2129(06)70078-7 sha: doc_id: 320005 cord_uid: i30t7cvr
on april 14, 2003, the international human genome sequencing consortium announced the successful completion of its task. the correct sequence of the bases cytosine (c), thymine (t), adenine (a), and guanine (g) in the gene-containing regions of dna had been elucidated with an accuracy of 99.99% for 99% of the euchromatin. this is considered to be the most that can be achieved with current technology, and all that now remains is to sequence the remaining regions, which are more difficult because they include almost 400 highly repetitive dna fragments in addition to the centromeres, the structures that divide chromosomes. the consortium of which the human genome project (hgp) formed a part included 20 centers in 6 countries (china, france, germany, great britain, japan, and the united states of america). this international group chose to announce the completion of the task in april 2003 in order to coincide with the 50th anniversary of the publication, in april 1953, of the paper by watson and crick 1 that first described dna's double helix structure. the hgp's initial objectives were fulfilled 2 years ahead of schedule, and, in addition to compiling a highly accurate sequence of the human genome which has been made freely available and accessible to everyone, the consortium has developed a set of new technologies and has constructed genetic maps of the genomes of various organisms. moreover, this program of scientific investigation is linked to a parallel bioethics program. 
it is also interesting to note that, thanks to advances in technology, this result was achieved at a cost lower than the initial budget, which estimated that 500 mb would be sequenced annually at a cost of 0.25 dollars per finished base. the final figure was 1400 mb per year at a cost of 0.09 dollars per base. the size and scope of the hgp have also provided valuable lessons about the organization and management of large projects involving international collaboration, and those lessons will no doubt prove useful in the administration of other large-scale projects. what was the genesis of this project? what general lessons has it taught us so far? how will it influence medicine? what future prospects, hopes, and fears has it given rise to? what ethical problems does it pose? these are just some of the general questions that this article will attempt to analyze. the genome is the total set of genes carried by an organism, and each gene is a segment of dna's double helix structure containing the recipe for making a polypeptide chain in a protein. a protein may contain a single polypeptide chain, as in the case of insulin, and therefore a single gene will code for this protein, or it may contain more than one chain, as in the case of hemoglobin, so that this protein is encoded by more than one gene. there are around 100 trillion cells in the human organism (100 billion on the long scale), and each one of these contains a complete genome. this genome is found on the 23 pairs of chromosomes in the cell nucleus. around 1.8 meters of dna containing approximately 3 billion (3000 million) base pairs is packed into the nucleus of each cell. the genetic code uses groups of 3 dna bases to specify the amino acids that make up the polypeptide chains of proteins, the principal actors in life's drama. one of the first genomes to be completely sequenced was that of simian virus 40 (sv40), which contains 5226 nucleotides. 
2 by the beginning of the 1980s, viral genomes containing over 100 000 bases had been sequenced, making it possible for scientists to envisage sequencing bacterial genomes containing over 1 000 000 bases. when the idea of sequencing the human genome was first proposed during the mid-1980s, the undertaking seemed hardly feasible using the technology available at that time. however, after various preparatory meetings, the national institutes of health and the department of energy of the usa officially announced on october 1, 1990 the launch of a program to sequence the human genome, and james watson (of watson and crick) was appointed director of the recently created national center for human genome research. around the same time, the public consortium known as the human genome project was formed, and this organization announced a 15-year plan (from 1990 to 2005) with the following objectives: a) to determine the complete nucleotide sequence of human dna and identify all the genes in human dna (estimated to number between 50 000 and 100 000); b) to build physical and genetic maps; c) to analyze the genomes of selected organisms used in research as model systems (eg, the mouse); d) to develop new technologies; and e) to analyze and debate the ethical and legal implications for individuals and for society as a whole. one of the difficulties that had to be overcome in the task of accurately sequencing the bases that make up the human genome was that approximately 50% of dna is highly repetitive. the strategy adopted by the hgp was to sequence the dna whose location on the chromosomes was already known. 3 however, this strategy was challenged in 1998 by j. craig venter and his team, who had just set up a private company called celera genomics. 
taking advantage of recent advances in technology, this team proposed an alternative strategy based on cutting the genome into small segments and using a computer to reassemble the sequences by matching the overlapping ends of each fragment. with these innovations, this private consortium announced that they would sequence the human genome in 3 years, in other words, that they would complete the task by 2001. this undoubtedly brought immense pressure to bear on the public group, the hgp, headed since 1992 by francis s. collins, and also gave rise to fears that a private company might control a large part of the human genome through patents. after several unsuccessful attempts to get the private and public sector groups to collaborate, an agreement was reached to simultaneously publish a first draft of the human genome in february, 2001. this draft did not, however, have the degree of precision of the current one. consequently, the hgp consortium published its results in nature 4 in february 2001, and celera did likewise in science. 5 the sequences were subsequently corroborated with a greater degree of reliability, and in april 2003, with the sequence practically complete, the hgp consortium declared the task to be completed. 6, 7 discoveries and surprises one of the surprising facts thrown up by the sequencing of the human genome was that it only contains approximately 30000 genes. owing to its size, it had been estimated that the genome would contain between 50 000 and 100 000 genes. in simple organisms, such as yeasts, the number of genes directly correlates with the size of the genome because most of the information in the genome clearly codes for proteins, and the individual genes have a well-defined beginning and a clear stop point and exit for the messenger rna. it had seemed logical, therefore, that the greater the complexity of the organism, the larger would be the number of genes. 
however, the sequencing of the genomes of other organisms has yielded unexpected results. for example, the common fruit fly, drosophila melanogaster, has approximately 13 500 genes, fewer than other simpler organisms, such as the roundworm caenorhabditis elegans, with 18 500 genes, and the mustard plant, arabidopsis thaliana, with around 28 000. 8-10 therefore, the human genome only has around 2000 more genes than arabidopsis despite its obviously greater biological complexity. so we have learned that the human genome has fewer genes than expected and also that the distance separating them is considerable. it has been calculated that the gene density in the human genome is around 12 per 1 000 000 bases, while in drosophila this figure is 117, and in arabidopsis, 221. it is important to understand that the genes in human dna, as in most eukaryotes, are highly fragmented; in other words, not all of the bases from the beginning to the end of the gene are read to make a protein. the dna in the genes has coding regions, called exons, interrupted by long noncoding sequences, called introns (intragenic regions). these noncoding regions are removed by the process of splicing in the formation of messenger rna, so that the resulting messenger rna is much shorter than the original dna from which it was produced. for example, it has been reported that around 54% to 59% of genes in human chromosomes 14 and 22 undergo alternative splicing, in which the exons combine in different ways and produce various different proteins. 4, 5 this means that the number and variety of proteins in an organism does not depend solely on the number of genes in the genome, but rather on the way these genes are used. another important question thrown up by the results of the hgp was the following: if only 1% to 2% of the bases in the human genome code for proteins, then what do the rest do? 
an equivalent part of the noncoding portion of the genome probably contains most of the sequences that regulate the expression of genes, such as the promoters, regions that occur before the beginning of the gene. there are many other elements in the genome that affect the behavior of other components, such as the centromeres and telomeres. finally, a large part of the genome is made up of highly repetitive dna sequences, the function of which is little understood. why are there so many repetitive sequences in the human genome not found in the genomes of invertebrates? many dna sequences seem to have originated as a result of the movement of genetic elements called transposons, segments of dna that can move from one site to another within the genome. it has been postulated that many of the changes that have occurred during the evolution of vertebrates may have been triggered by the action of transposons which jumped to regulating regions and modified the expression pattern of the genes. genome sequencing is a tool that allows us to reconstruct the history of hundreds of millions of years of evolution marked by mutation, that is, the process of exchange and rearrangement of the sequences that has contributed to the formation of new species or has given rise to new genes. the task of solving these puzzles and fitting each piece into its place still presents a huge challenge because clues to our history still lie undiscovered in the noncoding sequences found in each chromosome, the sequences previously considered to be "junk dna." for example, the complete sequencing of the sex-determining y chromosome has revealed some very intriguing facts that have aroused great interest among geneticists and biologists who study evolution. these will be described in general terms in the following section. 11 the 2 human sex chromosomes, x and y, both had their origin in the same ancestral autosome several hundred million years ago, but their sequences diverged through evolution. 
as a result, sequences identical to those of the x chromosome that permit recombination between the two chromosomes in those regions only exist today in the terminal regions of the y chromosome. however, over 95% of the modern y chromosome has specific regions with no equivalents on another chromosome that would enable recombination during sperm production, and this is a rare example of persistence in the absence of sexual recombination. these regions contain genes that specifically code for testicular proteins as well as highly repetitive sequences which, probably because they are not understood, were previously considered to be nonfunctional "junk" dna. with the complete sequencing of these regions, it has been found that some of these sequences are palindromic (as in the phrase anita lava la tina); that is, they read the same from left to right as from right to left, on both strands of the double helix. this fact has led to the hypothesis that x-y recombination has been replaced by recombination between the arms of the y chromosome in the regions where the palindromic sequences are located. 12 in this context, the y chromosome reveals great powers of self-preservation, using evolutionary strategies to survive in the absence of recombination with another homologous chromosome. probably one of the greatest expectations generated by the sequencing of the human genome has been the hope that this knowledge might benefit humans through its medical applications. the understanding of the role played by genetic factors in human health and disease will make it possible for us to discover better ways to approach the prevention, diagnosis, and treatment of pathological processes. it is thought that the science of genomics will soon explain the mysteries of the hereditary factors associated with heart disease, cancer, diabetes, schizophrenia, and many other chronic degenerative processes. 
it is also hoped that it will give us a better understanding of the genetic factors that influence our susceptibility and/or response to various infectious diseases. genomics holds the promise of individualized medicine that can be tailored to each patient's genetic profile. one of the challenging aspects of any analysis of the influence of an individual's genes on the development of certain diseases is ascertaining whether a particular disease is caused by a single gene or the interaction between several genes. it is also essential to understand how the environment influences the expression of such interactions. there are relatively few known diseases that are associated with mutations in a single gene. they include sickle cell anemia and cystic fibrosis. in the case of the gene that causes cystic fibrosis, over 900 different mutations have been identified that affect the function of the protein it encodes. in normal cells, the protein produced by this gene acts as a channel that allows cells to release chloride and other ions. in people with cystic fibrosis, however, this gene has a mutated sequence, and the protein produced is defective so that the cells do not release chloride. the result is an improper salt balance. this gives rise to the production of an abnormally thick mucus which, among other things, obstructs the airways and leads to infections. 13 however, the origin of most human diseases and of the variations in individual responses to drugs is more complex and involves the interrelation between multiple genetic factors, such as genes and the proteins they produce, and nongenetic factors, such as the influence of the environment. although all individuals share dna sequences that are 99.9% the same, each person has a unique genome. the remaining 0.1% is responsible for the genetic diversity between individuals. many differences are due to a variation in a single base pair in a gene. 
single nucleotide polymorphisms (snps) are variations of a gene that occur because of a change in a single letter (nucleotide) in the dna sequence, for example, the substitution of "cta" for "cca." snps contribute to the differences between individuals. while most of these polymorphisms have no effect, others cause slight differences in certain characteristics that do not affect health, such as physical appearance. others, however, may increase or decrease the individual's risk of developing certain diseases. this happens, for example, in the case of acquired immune deficiency syndrome (aids). we now know that not all individuals exposed to the type 1 human immunodeficiency virus (hiv) become infected, and that the progression period from infection to aids is highly variable among infected individuals. some patients may develop the disease in 3 years, while others remain asymptomatic for more than 15 years. although the reasons for these differences are not entirely understood, it has recently been discovered that genetic factors play a very important role in the transmission of the virus and progression to disease. there must be 2 receptors on the surface of a cell in order for the virus to attach itself effectively and later infect the host cell. the first of these is cd4, the key receptor on helper t lymphocytes, and the second is one of the members of the chemokine family of receptors. c-c chemokine receptor 5 (ccr5) is one of the main co-receptors used by the virus to penetrate macrophages and t lymphocytes, so that it plays a critical role in the pathogenic process of aids. several studies have demonstrated that the polymorphic allele ccr5-delta32 (which contains a 32 base pair deletion) has a powerful protective effect in the progression of the hiv infection. 
14 similar findings will probably emerge in relation to other diseases, so that in the future we will understand such enigmas as why not all smokers develop chronic obstructive pulmonary disease or lung cancer, or why not everyone who is exposed to avian antigens develops hypersensitivity pneumonitis. scientists have started to compile a catalogue of the common variations in the human population, which includes snps, small deletions and insertions in the coding dna, and other structural differences. part of this database is already available to the public. 15 another important point is that sets of nearby snps on the same chromosome are inherited in blocks. these patterns of snps on a block are known as haplotypes, and certain snps can be used as tags to identify the haplotypes in a block. the elucidation of the complete human genome has given rise to a new project the aim of which is to develop a haplotype map of the human genome called the hapmap. 16 the hapmap locates blocks of haplotypes, and the specific snps that identify them are called snp tags. the international hapmap project was started in 2002 and will be of fundamental importance in examining the genome in relation to phenotypes. it will also be a tool that will enable researchers to identify the genes and genetic variations that affect health and illness. in addition to its use in analyzing the relationship between genes and disease, the hapmap will be a powerful resource for studying the genetic factors that contribute to individual variations in our response to environmental factors, susceptibility to infection, adverse reactions, and response to drugs and vaccines. using only the snp tags, researchers will be able to identify regions on the chromosomes with different distributions of haplotypes in two groups of people, for example, those who suffer from a disease and those who do not. 
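the comparison of haplotype distributions between two groups described above can be illustrated with a small python sketch. it is only a toy under stated assumptions: each haplotype is written as a string of alleles at three hypothetical tag snps, the case and control data are invented, and the function names are not part of any hapmap tool. a real association study would also apply a statistical test rather than inspect raw frequency differences.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype (string of tag-SNP alleles) in a group."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: count / total for hap, count in counts.items()}

def frequency_differences(cases, controls):
    """Frequency in cases minus frequency in controls, for every observed haplotype."""
    f_cases = haplotype_frequencies(cases)
    f_controls = haplotype_frequencies(controls)
    observed = sorted(set(f_cases) | set(f_controls))
    return {hap: f_cases.get(hap, 0.0) - f_controls.get(hap, 0.0) for hap in observed}

# Invented data: alleles at three hypothetical tag SNPs on one haplotype block.
cases = ["ACG"] * 6 + ["ATG"] * 3 + ["GCA"] * 1
controls = ["ACG"] * 3 + ["ATG"] * 3 + ["GCA"] * 4

for hap, diff in frequency_differences(cases, controls).items():
    print(f"{hap}: {diff:+.2f}")
```

haplotypes with a large positive or negative difference mark blocks whose distribution differs between the two groups and would be candidates for closer study.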
this will also facilitate the development of tests that can predict which medicines and vaccines might be more effective in individuals with particular genotypes for the genes that affect the metabolism of these drugs. the complete sequencing of the genome of an organism is only the first step in the quest to understand its biology. it is still necessary to identify all the genes and ascertain the function of the products expressed by these genes, that is, functional rna and proteins. functional genomics is based on the key premise of the central dogma of molecular genetics, which states that dna sequences are used as templates for the synthesis of rna, and this rna is subsequently used as a template for the synthesis of proteins. 17 moreover, scientists still have to analyze and understand the noncoding regulatory regions and other functional elements of the human genome and of the genomes of other organisms. this has led to the creation of a project called the encyclopedia of dna elements-or encode. the goals of this new project are to identify and map the exact location of all the protein-encoding and non-protein-encoding genes, and to identify other functional elements encoded in the dna sequences, such as promoters and other transcriptional regulatory sequences, as well as determinants of chromosome structure and function, such as origins of replication. the aim is to provide a comprehensive encyclopedia of all these elements in order to help researchers better understand human biology and predict potential disease risks, and to stimulate the development of new therapies for the prevention and treatment of disease. it has been said that the basis for understanding the genome of a mammal is the characterization of the part that is transcribed (ie, the transcriptome) and the identification of the proteins it produces (ie, the proteome). 
many technologies have been developed to study functional genomics, and foremost among these are the cdna microarrays or dna chips, which have been widely used to explore the expression profiles of thousands of genes simultaneously. 18, 19 this technology has been used to gain a greater understanding of the molecular mechanisms of various diseases, such as, for example, pulmonary fibrosis. idiopathic pulmonary fibrosis belongs to the category of idiopathic interstitial pneumonias and is characterized by the relatively rapid destruction of the lung parenchyma. as a result, some 50% of patients die within 3 years. 20 in a recent study, lung biopsy samples from patients with idiopathic pulmonary fibrosis and other patients with normal lungs were analyzed using this technique of oligonucleotide microarrays. 21 the results showed that gene expression patterns clearly distinguished normal from fibrotic lungs, and that many of the genes that were significantly increased in fibrotic lungs encoded proteins associated with the extracellular matrix and enzymes responsible for its replacement. this study, and others that have investigated various pathological processes, 22 illustrates the analytical power of gene expression in the identification of the molecular pathways involved in disease. the identification of the different groups of genes involved in the pathogenic processes of human disease will also facilitate the discovery of new molecular targets that can eventually be used in the treatment of such diseases. for example, we have recently found in hypersensitivity pneumonitis, an inflammatory lung disease characterized by lymphocytic alveolitis, the exaggerated expression of a chemokine derived from dendritic cells known as ccl18. this chemokine is a powerful attractor of t lymphocytes and, at least theoretically, blocking it for therapeutic reasons could reduce the lymphocyte infiltration that characterizes this disease. 
23 other new genomic technologies include: a) toxicogenomics, which studies the genetic basis of an individual's response to environmental factors, such as drugs and contaminants; and b) pharmacogenomics, which deals with the development of drugs designed for specific pathogenic processes that will target specific metabolic pathways. in general terms, the genomic sciences have been defined as those which study genes, their products, and their interactions. one of the earliest objectives of the hgp was to set up a program, called elsi, to analyze the ethical, legal and social implications of genomic sciences. in this context, unesco created the international bioethics committee, and in 1997 published a declaration that states, "recognizing that research on the human genome and the resulting applications open up vast prospects for progress in improving the health of individuals and of humankind as a whole, but emphasizing that such research should fully respect human dignity, freedom and human rights, as well as the prohibition of all forms of discrimination based on genetic characteristics, proclaims the principles that follow and adopts the present declaration." the articles of this declaration 24 deal with the following topics: a) human dignity and the human genome; b) rights of individuals; c) research on the human genome; d) conditions for the exercise of scientific activity; e) solidarity and international cooperation, and f) the promotion of the principles set out in the declaration. article 1 of this universal declaration on the human genome and human rights states: "the human genome underlies the fundamental unity of all members of the human family, as well as the recognition of their inherent dignity and diversity. in a symbolic sense, it is the heritage of humanity." 
the medical application of the information generated by genetics must be consistent with the general principles of medical ethics: a) beneficence, or acting for the good of individuals and their families; b) doing no harm; c) respecting the autonomy of the individual, that is, allowing individuals to make independent decisions after providing them with information; and d) individual and social justice. genetic information is confidential, and it is the responsibility of institutions and authorities not to interfere without prior consent. however, there are certain circumstances that could justify the intervention of the state, such as those related to public health issues, or the well-founded request of an authority in connection with a judicial investigation. how can we define the limits between what is permitted and what is prohibited, or between privacy and responsibility towards third parties? these are the kinds of topics that must be discussed and analyzed by the ethics committees in each country, which should then inform their respective legislators on these issues. other aspects that need to be reported and considered include: privacy and justice in the use of genetic interpretation, nondiscrimination, and the need to distinguish between information that we individually prefer not to know and facts that must be revealed for family or social reasons. closely related to these ethical considerations is the problem of the privatization of knowledge and the granting of patents. for example, the last nucleotide in the genetic code of the coronavirus responsible for severe acute respiratory syndrome had hardly been read when the race had already begun to take control of the intellectual rights to the sequence. in private hands, a patent on a viral sequence could delay or increase the cost of developing a treatment or diagnostic tests for a particular disease.
this question has caused concern among biomedical researchers, who are afraid that broad patents on genetic sequences will affect research work in universities and public institutions and will have a detrimental effect on future public health strategies. an example of this is the case of the predictive test for breast cancer, which uses the genes brca1 and brca2. the curie institute in paris has been struggling for the right to continue analyzing these genes at a third of the price currently charged by the genome company myriad genetics (utah, usa), which was granted a european patent for these genes in 2001. molecular biology has implicitly promised to transform medicine by elucidating the smallest details of the mechanisms of life. to the extent that the molecular processes of diseases are revealed, we will, in many cases, be able to prevent them or to design effective cures or individualized treatments. genetic tests will be able to predict an individual's susceptibility to a disease, and the diagnosis of many pathological processes will be much more detailed and specific than it is today. new drugs will be designed based on an understanding of the molecular mechanisms of common diseases, such as diabetes and systemic arterial hypertension, and it will be possible to treat these diseases by focusing on specific molecular targets. in the case of diseases such as cancer, for example, drugs can be adapted to the specific response of the patient and, within a few decades, it will be possible to cure many potential diseases at a molecular level before they develop. most probably these changes will not all occur in the immediate future. 
it will take us a long time to understand the human genome, the book of our species, with its 23 chapters called chromosomes, each containing thousands of stories known as genes, composed of paragraphs called exons, interrupted by as yet indecipherable messages called introns, written in words called codons, made up of letters called bases. no doubt, access to the exact sequence of the genome will gradually modify, with increasingly greater impact, the practice of medicine in the coming decades, and in this context it is essential that this knowledge and these technologies be immediately incorporated into public and professional education; this is a priority and the task must begin today. prometheus stole fire from the gods for the benefit of mankind; it is up to us to ensure that our new promethean knowledge be used to throw light on many of the mysteries of biology.
molecular structure of nucleic acids
the genome of simian virus 40
the human genome project: past, present, and future
the international human genome sequencing consortium. initial sequencing and analysis of the human genome
the sequence of the human genome
a vision for the future of genomics research. a blueprint for the genomic era
human genome sequencing available at
the genome sequence of drosophila melanogaster
genome sequence of the nematode c. elegans: a platform for investigating biology. the c. elegans sequencing consortium
sequence and analysis of chromosome 5 of the plant arabidopsis thaliana
the male-specific region of the human y chromosome is a mosaic of discrete sequence classes
abundant gene conversion between arms of palindromes in human and ape y chromosomes
identification of the cystic fibrosis gene: cloning and characterization of complementary dna
international meta-analysis of hiv host genetics. effects of ccr5-delta32, ccr2-64i, and sdf-1 3'a alleles on hiv-1 disease progression: an international meta-analysis of individual-patient data
national human genome research institute.
crick f. central dogma of molecular biology
medical applications of microarray technologies: a regulatory science perspective
chip genético (adn array): el futuro ya está aquí
clasificación actual de las neumonías intersticiales idiopáticas
gene expression analysis reveals matrilysin as a key regulator of pulmonary fibrosis in mice and humans
uses of expression microarrays in studies of pulmonary fibrosis, asthma, acute lung injury, and emphysema
ccl18/dc-ck-1/parc up-regulation in hypersensitivity pneumonitis
universal declaration of the human genome and human rights
key: cord-348815-lthz75oc authors: kurreck, jens title: rna interference: from basic research to therapeutic applications date: 2009-01-19 journal: angew chem int ed engl doi: 10.1002/anie.200802092 sha: doc_id: 348815 cord_uid: lthz75oc an efficient mechanism for the sequence‐specific inhibition of gene expression is rna interference. in this process, double‐stranded rna molecules induce cleavage of a selected target rna (see picture). this technique has in recent years developed into a standard method of molecular biology. successful applications in animal models have already led to the initiation of rnai‐based clinical trials as a new therapeutic option. only ten years ago andrew fire and craig mello were able to show that double‐stranded rna molecules could inhibit the expression of homologous genes in eukaryotes. this process, termed rna interference, has developed into a standard method of molecular biology. this review provides an overview of the molecular processes involved, with a particular focus on the posttranscriptional inhibition of gene expression in mammalian cells, the possible applications in research, and the results of the first clinical studies. the term rna interference (rnai) refers to a cellular process by which a double-stranded rna (dsrna) sequence-specifically inhibits the expression of a gene.
this very efficient process of posttranscriptional gene silencing (ptgs), which involves numerous cellular proteins besides the rna, is strongly conserved in eukaryotes and presumably serves as a protection against viruses and genetic instability arising from mobile genetic elements such as transposons. it was originally observed in plants, [1] but correctly described for the first time in the late 1990s in the nematode caenorhabditis elegans. [2] for this achievement andrew fire and craig mello were honored with the 2006 nobel prize in physiology or medicine. [3, 4] as measured by the number of publications, rnai belongs, along with proteomics, to the most dynamic fields of biotechnology. [5]
1.1.
in their key publication, fire and mello introduced a long double-stranded rna into c. elegans and observed that, as a result, the expression of the homologous gene was blocked. [2] since then, the basic processes involved have been determined in detail. in a first step, the endonuclease dicer processes the long dsrna into small or short interfering rnas (sirnas) which are around 21 nucleotides long, of which 19 nucleotides form a helix and 2 nucleotides on each of the 3' ends are unpaired (figure 1 a). the actual effector of the rnai is the ribonucleoprotein complex risc (rna-induced silencing complex), which is guided by the sirna to the complementary target rna. as a result, the target rna is cleaved at a specific site in the center of the duplex, 10 nucleotides from the 5' end of the sirna strand. [6] the catalytic component that cleaves the target rna (slicer activity) has been identified as the protein designated argonaute 2 (ago2). [7] an analysis of its crystal structure showed that ago2 contains a domain which resembles rnase h, [8] a long-known protein that cleaves the rna component of a dna/rna duplex.
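the geometry just described (a 19-nucleotide helix, two unpaired nucleotides on each 3' end, and cleavage 10 nucleotides from the 5' end of the guiding strand) can be sketched in a few lines of python. this is a minimal sketch under stated assumptions: the mrna fragment is hypothetical, the function names are illustrative, and the `UU` overhang is an arbitrary choice made here for simplicity.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def design_sirna(mrna, start, overhang="UU"):
    """Build a 21-nt siRNA for the 19-nt site beginning at 0-based index `start`.

    Returns (sense, antisense, cut), where mrna[:cut] and mrna[cut:] are the
    two fragments expected after cleavage of the target.
    """
    site = mrna[start:start + 19]
    if len(site) != 19:
        raise ValueError("the target site must contain 19 nucleotides")
    sense = site + overhang                          # sense strand, 5'->3'
    antisense = reverse_complement(site) + overhang  # antisense strand, 5'->3'
    # Antisense position 1 pairs with the last nucleotide of the site, so its
    # first 10 nucleotides cover site indexes 9..18; the cut therefore falls
    # between site indexes 8 and 9.
    cut = start + 9
    return sense, antisense, cut

# Hypothetical mRNA fragment, for illustration only
mrna = "AUGGCUAGCUAGGAUCCGAUCGAUUACGGCUAA"
sense, antisense, cut = design_sirna(mrna, start=5)
print(sense, antisense, mrna[:cut], mrna[cut:], sep="\n")
```

the returned `cut` index splits the target into a 5' fragment and a 3' fragment, which makes the position of the cleavage site relative to the antisense strand explicit.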
after cleavage, the target rna lacks those elements which are typically responsible for stabilizing mrnas, namely the 5' end cap and the poly-a tail at the 3' end, so that the cleaved mrna is rapidly degraded by rnases and the coded protein can no longer be synthesized. it is assumed that the loading of the sirna into the risc is accomplished by the risc-loading complex (rlc), which consists in drosophila melanogaster of dicer-2 and r2d2, and in mammalian cells of dicer and the tar rna binding protein (trbp). furthermore, it has been shown that during the activation of the risc, the strand designated the passenger (or sense) strand is cleaved, while the other strand, the guide (or antisense) strand, remains in the risc. [10, 11] recent investigations with reconstituted human rlc demonstrated that ago2 dissociates from the rest of the complex after loading with the double-stranded rna. [12]
1.2. rna interference in mammalian cells
the technique of turning off the expression of specific genes by dsrnas could, initially, be applied to a large number of eukaryotes, such as plants, c. elegans or d. melanogaster, but could not be applied to mammals since long dsrnas trigger an unspecific interferon (inf) response in mammalian cells. the dsrna is interpreted by these cells as a pathogen, and protein kinase r is activated, which terminates protein synthesis in the affected cells. [13] furthermore, enzymes are induced which produce 2'-5'-linked oligoadenylates and thereby cause an rnase l-dependent unspecific degradation of single-stranded rna. [14] since the inf response is only triggered by dsrnas which are longer than 30 nucleotides, [15] the realization that rnai is induced by rnas of approximately 21 nucleotides [6, 16] provided a solution to the problem: with their groundbreaking demonstration that chemically synthesized 21-mer sirnas trigger rnai in mammalian cells, tuschl and co-workers opened the way to use rnai for experiments in mammalian cells.
[17] this created new opportunities, not only for research, but also for therapeutic treatments. the presynthesized sirna is phosphorylated on its 5' end by the kinase clp1 after entering the cells, [18] and then enters the rnai pathway described above (figure 1 b). rnai expanded the repertoire of the already well known oligonucleotide-based strategies of ptgs. antisense oligonucleotides have been employed for the last 30 years to inhibit the expression of genes at the mrna level. antisense and rnai strategies have many things in common, such as the necessity to identify suitable binding sequences on the target rna, the stabilization of the oligonucleotide by chemical modification, or the transport of the negatively charged polymer across the cell membrane. experience in the antisense field allowed for very rapid progress to be made with the new rnai strategy. [19] there are, however, important differences between the two technologies: antisense oligonucleotides are single-stranded (modified) dna molecules, which primarily induce the cleavage of the target rna in the cell nucleus by activation of rnase h. in contrast, rnai is triggered by double-stranded rna, which functions primarily in the cytoplasm. ago2, the most important component of the risc, is localized in the p bodies. [20] as a result, the central steps of rnai appear to take place in these discrete structures of the cytoplasm. in the case of rnai, an endogenous cellular pathway is followed, which could explain the high efficiency with which sirnas are able to inhibit the expression of their target genes. they can be up to 1000 times as efficient as traditional antisense oligonucleotides against the same target molecule.
[21, 22] while no particularly important region could be determined for the normally 15-20 nucleotide long antisense oligonucleotides, the seed region (positions 2-8 of the antisense strand, figure 1 a) is of great importance for sirnas, since it is presumably here that the interaction with the target rna begins. the effects of sirnas are transient. the degradation of the target rna usually begins immediately after the sirna enters the cell; however, the decrease in the amount of protein depends on the half-life of the target protein. normally a pronounced inhibitory effect can be observed in cell culture within 48 h of transfection of an sirna; however, there are proteins with a very slow rate of turnover, which can be stable for much longer. one must also keep in mind that in most cases the target gene is not completely shut off, which is why rnai is referred to as a knockdown technology, as opposed to knockout in the case of transgenic animals created by homologous recombination. the inhibition of the expression of the target gene usually lasts for five to seven days both in vitro [23] and in vivo. [24] interestingly, an sirna can work for different lengths of time in different species: an sirna against apolipoprotein b was active in mice for only a few days, with target expression back to 70 % of its initial level after nine days, whereas the knockdown in nonhuman primates was still effective after 11 days. [25] the duration of action of an sirna presumably depends on numerous factors, such as the target organ, the target gene, and the species. intracellularly expressed short hairpin rnas (shrnas) can be used instead of chemically synthesized sirna to extend gene silencing (see figure 1 b and section 3). rnai is primarily a process of ptgs, that is, gene expression is inhibited by a selective blockade of an mrna. it has also been reported that rnai can alter the chromatin structure in the nucleus and thereby influence transcription.
[26] this has been observed in particular for yeast, plants, and fruit flies. the importance of rnai for transcriptional gene silencing in mammals has, in contrast, not yet been clearly demonstrated.
figure 1. a) … deoxythymidine is often used as the overhangs in chemically synthesized sirnas. the position at which the complementary target rna is cleaved is indicated with an arrow, and the seed region, through which the interaction with the target rna begins, is indicated. b) simplified model of the rnai mechanism in mammalian cells. after uptake of the chemically synthesized sirnas into the cells, they are loaded onto the risc by the rlc, in the course of which the sense strand is removed. the antisense strand guides the risc to the complementary target rna, which is cleaved by the ago2 protein. a longer term inhibition of gene expression can be accomplished when an shrna is expressed intracellularly instead of by the exogenous application of an sirna. (figure adapted from ref. [9].)
besides the previously described sirnas, which can be employed as research tools and potential therapeutics for the artificial regulation of gene expression, the importance of endogenous short rnas which do not code for proteins is becoming increasingly clear. the role of the approximately 21 nucleotide long micrornas (mirnas) in the posttranscriptional regulation of genes has been investigated very intensively in the last few years. [27] in version 11.0 of the mirbase database (http://microrna.sanger.ac.uk) from april 2008 there are over 6000 mirnas listed from animals, plants, and viruses; for humans alone over 1000 mirnas are predicted. in the nucleus, mirna precursors (pri-mirnas) are formed from special mirna genes or as introns from protein-coding polymerase ii transcripts. they are processed by the rnase iii drosha to approximately 70 nucleotide long pre-mirnas, which are transported out of the nucleus by exportin-5 and cleaved in the cytoplasm by dicer to become the functional mirnas (figure 2).
similar to sirnas, they also form a ribonucleoprotein complex with argonaute proteins and bind to their target rnas. however, mirnas preferentially recognize target sequences in the 3'-untranslated region (3'-utr) of mrnas, and binding often takes place with an incomplete homology, although a perfect base pairing in the previously mentioned seed region (positions 2-8 of the mirna) forms the core of the interaction. depending on the degree of homology between the mirna and mrna, the result can be an irreversible cleavage of the target molecule or merely repression of translation. the precise mechanism of the mirna-dependent posttranscriptional repression of gene expression is currently the subject of intense research. [27] according to the two most important models, either translation is blocked or the mrna is destabilized. the inhibition of translation could take place at the level of initiation. in this process it is assumed that the ago2/mirna complex interacts with the cap structure at the 5' end after binding to the 3'-utr of an mrna and thereby prevents the binding of the initiation factor eif4e. as a result, the initiation complex cannot be formed. alternatively, translation could be slowed after initiation or the ribosomes could dissociate prematurely. according to the alternative model, the mrna is deadenylated by the mirna, which makes 3'→5' degradation possible, or the cap is removed, which would enable degradation in the 5'→3' direction by exonucleases. possibly there are other mechanisms through which mirnas could work. it is assumed that mirnas control the activity of about 30 % of all protein-coding genes in mammals. since every mirna regulates numerous mrnas, and in turn mrnas can be influenced by numerous mirnas, this results in an extremely complex regulatory network.
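the role of the seed region in target recognition can be made concrete with a short python sketch that scans a 3'-utr for sites that pair perfectly with positions 2-8 of an mirna. both sequences below are illustrative, the function name `seed_sites` is hypothetical, and the sketch deliberately ignores pairing outside the seed and any structural context, so it is a simplification rather than a target-prediction method.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_sites(mirna, utr):
    """Return 0-based start indexes in `utr` that pair perfectly with miRNA positions 2-8."""
    seed = mirna[1:8]  # positions 2-8 of the miRNA, 5'->3'
    # The matching site on the mRNA, read 5'->3', is the reverse complement of the seed
    site = "".join(COMPLEMENT[base] for base in reversed(seed))
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

# Illustrative sequences only
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAGCUACCUCAUUGGCUACCUCUU"
print(seed_sites(mirna, utr))
```

because only seven nucleotides have to match, even a short utr can contain several candidate sites, which is consistent with the observation that a single mirna can regulate many mrnas.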
so it is hardly surprising that mirnas are involved in all cellular processes that have been investigated and play an important role in numerous diseases, such as cancer, [28] viral infections, [28, 29] and genetic diseases. [30] for further information concerning the activity and function of mirnas the reader is referred to recent review articles. [27, 31] a further class of short regulatory rnas is associated with piwi proteins and is thus referred to as pirnas. [32] these rnas, at around 24-30 nucleotides in length, are slightly longer than typical sirnas or mirnas. they are presumably processed from single-stranded precursors and are found principally in germ cells. besides their importance in the control of mobile genetic elements, a function in spermatogenesis is also suspected. recently, endogenous sirnas (esirnas) were found by comprehensive sequencing of short rnas in mammalian cells (mouse oocytes). [33, 34] it was previously assumed that an rna-dependent rna polymerase activity was required for the production of esirnas, but this is not found in mammals. it has now been shown that other double-stranded rnas, such as long hairpin structures or complementary sequences, can serve as the starting point for the production of esirnas. the esirnas derive from retrotransposons and apparently function as their inhibitors. in addition, esirnas from pseudogenes have been found, which could be of significance for the regulation of protein-coding transcripts.
figure 2. mirna pathways in mammalian cells. rnas are transcribed in the nucleus in the form of a precursor (pri-mirna), which is processed by the rnase iii drosha to pre-mirna. in this process, drosha forms a complex with the dgcr8 protein. the pre-mirna is exported out of the nucleus and into the cytoplasm by exportin-5 and cleaved there by dicer (complexed with trbp) to form the functional mirna, which in turn combines with an argonaute protein (ago) to form an mirna-ribonucleoprotein (mirnp) complex. the mirna can either cause endonucleolytic cleavage of the target mrna through ago2 or block translation in the case of partial complementarity. (figure adapted from ref. [27].)
the first important step for the successful application of rnai is the design of efficient sirnas. the original assumption, that it is not necessary to search for suitable target sequences in the target rna, [35] proved to be too optimistic. in practice, the efficiency of different sirnas against the same target rna varies drastically. [36] apparently factors intrinsic to the sirna itself and the characteristics of the target rna both play a role in silencing. [37] the probability of identifying a very efficient sirna was significantly increased after it was discovered that both strands of an sirna or mirna are not equally likely to be incorporated into a risc. instead, the strand with a lower thermodynamic stability (namely, a higher a/u content) at its 5' end is preferred. [38, 39] thereafter, the molecular basis for this strand asymmetry could be determined. [40] in d. melanogaster, risc is loaded by a heterodimer of dicer-2 and the dsrna-binding protein r2d2. here, r2d2 binds to the more thermodynamically stable end of the double-stranded rna and thus determines which strand associates with the risc as the guide strand. in a detailed study with 180 sirnas against two different rna targets, besides the relative stability of the two ends, additional criteria (preference for special bases in certain positions) were identified which are common among the functional sirnas. [41] in these experiments, however, the significance of each parameter was determined independently of one another.
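the strand asymmetry rule can be approximated with a simple count, sketched below in python. the sketch compares the number of a/u bases in the terminal base pairs at the two ends of the duplex and reports the strand whose 5' end lies at the less stable end. the four-base-pair window, the function names, and the example sequences are assumptions made for illustration; a real design tool would use nearest-neighbor free energies, and the predictive value of this criterion is itself disputed in the literature discussed in this section.

```python
def au_count(seq):
    """Count A and U bases in an RNA sequence."""
    return sum(base in "AU" for base in seq)

def preferred_guide(sense_core, window=4):
    """Guess which strand of an siRNA duplex is favoured as the guide strand.

    `sense_core` is the base-paired region of the sense strand, written 5'->3'.
    An A-U pair counts the same whichever strand carries the A, so both duplex
    ends can be scored from the sense strand alone.
    """
    sense_end = au_count(sense_core[:window])       # duplex end holding the sense 5' end
    antisense_end = au_count(sense_core[-window:])  # duplex end holding the antisense 5' end
    if antisense_end > sense_end:
        return "antisense"
    if sense_end > antisense_end:
        return "sense"
    return "no preference"

# Hypothetical 19-nt duplex regions
print(preferred_guide("GCGCAUCGAUCGAUCAAUU"))
print(preferred_guide("AAUUGCAUCGAUCGAGCGC"))
```

for silencing, the first outcome is the desirable one, since the antisense strand is the one complementary to the target rna.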
to also take into account synergistic influences of multiple linked parameters, an artificial neural network was trained with a dataset of over 2000 sirnas against 34 different mrnas (biopredsi algorithm). [42] an extensive survey of the activity of published sirnas has shown, however, that there are a number of very active sirnas which do not correspond to the proposed criteria, while numerous other carefully designed sirnas are inactive. recently, even the hypothesis that the relative stability of the two ends has an influence on sirna efficiency has been called into question. [43] neither in an experimentally investigated set of different sirnas nor in a comprehensive analysis of published sirnas or sirnas posted to databanks could a correlation between the terminal stability of the sirna and its silencing activity be found. other characteristics of the sirna also possibly play a role. for example, it has been shown that sirnas whose antisense strands form stable helices at their ends only show a low level of activity. [44] the authors advise, therefore, designing sirnas such that the antisense strand is as unstructured as possible. besides the sirna itself, the target rna could also play an important role in silencing. this could help to explain why the expression of some targets is easily inhibited, while the knockdown of others is more difficult. in a study with several thousand sirnas, which were designed for different genes according to the biopredsi algorithm, 70 % of the investigated kinase genes were easily silenced (defined as two of two tested sirnas working), while 6 % of the genes could not be down-regulated by up to 10 different sirnas. [45] studies with antisense oligonucleotides have already shown that the accessibility of the oligonucleotide binding region on the target rna is of great importance for the efficiency of silencing. a correspondence between the accessibility for antisense oligonucleotides and sirnas has been demonstrated.
[46] in a more comprehensive analysis, the accessibility of target rnas was predicted by an iterative bioinformatic approach and by experimental rnase h mapping. [47] the results showed that sirnas against predicted highly accessible areas were more efficient than those whose target sequence was inaccessible. the relative thermodynamic stability of the two ends of the sirna proved, in contrast, not to be a suitable criterion for the prediction of the efficiency of an sirna. we analyzed the influence on silencing of the thermodynamic design of the sirna and the accessibility of the target rna more closely with the help of artificial target structures. [48] we were able to confirm in reporter assays the strand asymmetry, namely, that the target sequences in the natural orientation led to a stronger silencing than the other way around. on the other hand, there was a clear correlation between the local free energy of the sirna binding region and silencing. we therefore proposed a two-step model to describe the inhibitory efficiency of sirnas (figure 3): initially the thermodynamic characteristics of the sirna, that is, the relative stability of the two ends, determine the asymmetric incorporation of the two strands into the risc. in a second step the accessibility of the binding region of the sirna on the target rna affects the strength of the silencing.
figure 3. two-step model to explain the efficiency of sirna (s: sense strand, as: antisense strand): 1) depending on the relative stability of the two ends of an sirna, one of the two strands is preferentially assembled into the risc. the retention of the strand complementary to the target rna can be achieved through the selection of a suitable sequence. 2) an antisense strand assembled into the risc can, however, be unsuitable for silencing when the complementary sequence of the target rna is inaccessible. the local structure of the target region thus also influences silencing significantly. (figure adapted from ref. [48] with permission from elsevier.)
this model was confirmed in an analysis of around 200 sirnas and shrnas against over 100 different human genes. [49] according to this study, the accessibility of the target rna for the sirna is of greater importance than the duplex asymmetry for efficient knockdown. in a further report it was shown that the accessibility of the 3' end of the target rna is particularly important. [50] as already mentioned in section 1, the interaction between the sirna or mirna and the target rna begins in the seed region. some algorithms for the design of sirnas, such as the sfold web server, [51] take into account not only the thermodynamic characteristics of the duplex but also the predicted secondary structure of the target rna. it must be emphasized that none of the models proposed so far can guarantee a successful prediction of the activity of an sirna. there must, therefore, be other factors which still need to be identified, in particular synergistic effects, which influence the efficiency of rnai experiments. conventional sirnas consist of a 19-mer duplex with two-nucleotide overhangs on each of the 3' ends. it has, however, been reported that longer sirnas can be more efficient. in an experiment with sirnas of various lengths, 27 mers had an efficiency up to 100 times higher than the conventional 21 mers. [52] in a further study, 29-mer shrnas were proven to be especially potent. [53] the long duplexes were initially processed to 21 mers by dicer and were thus presumably more efficiently assembled into the risc by the rlc. the problem of the design of individual sirnas can be bypassed by the use of enzymatically synthesized sirna pools. [54] first, long dsrnas are generated which can be processed with bacterially synthesized rnase iii or recombinant dicer to endoribonuclease-prepared sirnas. 
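the conventional sirna architecture mentioned above (a 19-mer duplex with two-nucleotide overhangs at both 3' ends) can be sketched in a few lines of python. this is only a minimal illustration under stated assumptions, not a design tool: the function names, the example target site, and the choice of `UU` overhangs are hypothetical and are not taken from the cited studies, and no attempt is made to predict activity from end stability or target accessibility.

```python
# Minimal sketch: derive the two 21-nt strands of a conventional siRNA
# from a 19-nt target site on the mRNA. All names are illustrative.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_sirna(target_site: str, overhang: str = "UU"):
    """Build (sense, antisense) strands, both written 5'->3'.

    target_site: 19-nt stretch of the target mRNA (T is read as U).
    overhang:    2-nt 3' overhang added to each strand (assumed 'UU').
    """
    site = target_site.upper().replace("T", "U")
    if len(site) != 19 or set(site) - set("ACGU"):
        raise ValueError("expected a 19-nt sequence of A, C, G, U (or T)")
    sense = site + overhang                          # passenger strand
    antisense = reverse_complement(site) + overhang  # guide strand
    return sense, antisense


# Hypothetical 19-nt target site, chosen only for illustration.
sense, antisense = design_sirna("GGAUCCAAGUACGUUCAGA")
print("sense     5'-" + sense + "-3'")
print("antisense 5'-" + antisense + "-3'")
```

the first 19 nucleotides of the two strands are perfectly complementary, so the duplex region is 19 base pairs long and each strand carries its overhang at the 3' end.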
such enzymatically prepared sirna pools can harbor the risk of increased off-target effects (see section 4); on the other hand, each individual sirna is present at a very low concentration so that the undesirable side effects are apparently diluted out. with this method, inexpensive comprehensive libraries against the complete human and mouse genome have been manufactured. although unmodified sirnas can be used in cell cultures, it can be advantageous to build modified nucleotides into the sirna so as to specifically inhibit the expression of a gene. the primary reason for the chemical modification of sirnas is the increase in resistance to nucleolytic degradation. in fact, although sirnas have an unexpectedly long life, it is usually necessary to stabilize them further by the use of modified nucleotides for in vivo applications. modifications can often lengthen the half-life of the sirna in plasma and improve its pharmacokinetic characteristics. furthermore, new functionalities can be introduced, for example, fluorescent markers or lipophilic groups which improve cellular uptake. the rapid successful incorporation of chemically modified components can be attributed to the experience gained in the field of antisense technologies. a multitude of modified nucleotides have been employed in the past years for sirnas, of which several selected examples will be explained here. further details have been explained extensively in comprehensive review articles. [55] [56] [57] the incorporation of unnatural nucleotides into sirnas presents a particular challenge, since the modifications must not affect the silencing activity of the sirna. in this context it is important to remember that the two strands of the sirna have different functions: while the guide strand is assembled into the risc and leads the complex to the target rna, the passenger strand is discarded in loading the risc. 
the passenger strand is, therefore, more likely to tolerate modifications, but the guide strand can also have modified nucleotides built into suitable positions. of particular importance is the hydroxy group at the 5' end of the guide strand, which must be phosphorylated for entry of the sirna into the rnai pathway. correspondingly, an sirna whose 5' end is blocked (for example, by an amino linker) loses its inhibitory activity. [58] comparatively simple, in contrast, is the incorporation of functional groups on the ends of the passenger strand. in this way it is possible to follow the localization of an sirna with a fluorophore on the 5' end of the passenger strand without having a grave influence on its silencing activity. [59] furthermore, the cellular penetration of the sirna can be improved with a lipophilic component such as 12-hydroxylauric acid or cholesterol (see also section 5.1.1). [60] the most common modification for the stabilization of antisense oligonucleotides is phosphorothioate dna, in which a nonbridging oxygen atom of the phosphate group is substituted by a sulfur atom. phosphorothioates are very stable with respect to nucleases and are comparatively simple to manufacture. rna variants of the phosphorothioates (figure 4) have therefore also been built into sirnas. these modifications are fundamentally tolerated by the rnai machinery; however, toxic side effects have been observed when the phosphorothioate content is high. [61] nucleotides have also been used whose ribose was modified at the 2'-position, for example, 2'-o-methyl rna and 2'-fluoro-modified nucleotides (figure 4). the fluoro substituent is very small, and does not seriously influence the functionality of the sirna. [61] [62] [63] the significantly larger methyl group, in contrast, inhibits the rnai function when the entire sirna consists of 2'-o-methyl-substituted nucleotides. 
[63] therefore, modification types are sought which increase the stability of sirna without reducing their silencing activity. blunt-ended sirnas proved to be suitable when the rna units and 2'-o-methyl nucleotides alternate in both strands, so that a modified nucleotide is opposite an unmodified one. [64] such modified sirnas were injected into mice as components of lipoplexes (see section 5.1.1). [65] the sirnas were taken up by vascular endothelial cells and reduced the level of the target mrna and of the target protein. a further modification commonly used in past years is the locked nucleic acid (lna, figure 4 ). [66] [67] [68] lnas have numerous desirable characteristics such as high nuclease stability and high affinity for the target structure; their incorporation into an rna duplex, however, causes serious structural changes. a complete modification of an sirna with lnas is therefore impossible, but a few lna monomers can be built into the sirna without loss of its silencing ability. [62] in a systematic study, the positions of the antisense strand were identified which tolerate the substitution of the rna nucleotides by an lna component without loss of activity. [69] the incorporation of lnas into sirnas not only increases the nuclease stability, it can also reduce the off-target effects of an sirna (see section 4) by inactivating the sense strand and increase the efficiency of sirnas by improved loading of the risc. corresponding lna-modified sirnas showed favorable characteristics in systemic use in vivo compared to unmodified sirnas. [70] we used the method of inactivating a strand of an sirna by the incorporation of lnas to analyze in detail the mechanism of rnai-induced inhibition of the coxsackie virus. [71] these cardiotropic viruses, which belong to the family of the picornaviridae, possess a single positive-stranded genome, from which during replication a negative strand is copied as an intermediate. 
the selective inactivation of one of the two strands by lnas showed that only sirnas against the genomic positive strand possess an antiviral activity. in a further study, a triple-stranded sirna construct was employed, in which the antisense strand was hybridized with two shorter 10-12 nucleotide long complementary strands. [72] these so-called small internally segmented interfering rnas (sisirna) were modified at various positions with lnas and had a very high serum stability and silencing activity. the fact that the various modifications in different positions of the sirna could be built into the sirna without a drastic loss of activity suggested the possibility of combining various types of rna analogues. in this way all of the oh groups of an sirna could be substituted successfully: all the pyrimidines were replaced by 2'-fluoro-modified nucleotides, the purines of the sense strand by deoxyribonucleotides, while 2'-o-methyl-rnas were used for the purines of the antisense strand. [73] furthermore, the ends were protected by inverted abasic sugars and a phosphorothioate bond. these completely modified sirnas had a half-life in human serum of several days, as opposed to several minutes for their unmodified forms, and were significantly more efficient than the starting sirna in a vector-based in vivo model of hepatitis b virus (hbv) infection. a great disadvantage of chemically synthesized sirnas is that their activity is transient and only lasts several days, because the sirnas degrade over time and are diluted out by cell division. it was, therefore, a major advance when in 2002 several research groups simultaneously developed expression systems in which the sirna is continuously generated in cells. [74] in the most common system the sirna is converted into a dna sequence which codes for the sense strand, a loop, and the antisense strand. this dna template is transcribed from a vector under the control of polymerase iii promoters. 
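the layout of such an shrna template (sense strand, loop, antisense strand) can be made concrete with a short python sketch. it is a hedged illustration only: the function names, the example sequence, the nine-nucleotide loop `TTCAAGAGA`, and the run of five thymidines used as a terminator are assumptions chosen here for demonstration; they are commonly used choices but are not specified in the text above.

```python
# Minimal sketch: assemble the DNA insert of an shRNA expression cassette
# as sense strand + loop + antisense strand (+ terminator).
# Loop and terminator are assumed defaults, not taken from the text.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def shrna_template(sense: str,
                   loop: str = "TTCAAGAGA",
                   terminator: str = "TTTTT") -> str:
    """Return the coding-strand DNA that is placed behind the promoter.

    sense: DNA version of the siRNA sense strand (duplex region only).
    """
    sense = sense.upper()
    if set(sense) - set("ACGT"):
        raise ValueError("sense strand must contain only A, C, G, T")
    antisense = reverse_complement_dna(sense)
    return sense + loop + antisense + terminator


# Hypothetical 19-nt sense sequence, for illustration only.
template = shrna_template("GGATCCAAGTACGTTCAGA")
print(template)
```

because the antisense segment is the reverse complement of the sense segment, the transcript can fold back on itself into a hairpin whose stem corresponds to the sirna duplex.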
polymerase iii promoters are optimized for the generation of large amounts of precisely defined rnas. the most commonly used are the promoter of the u6 component of the spliceosome as well as the h1 promoter of the rna component of rnase p. during transcription, a self-complementary rna is created, which is referred to as an shrna. the shrna is processed intracellularly by dicer into sirna, which mediates silencing. the shrna expression systems led to the creation of new applications for rnai. usually a vector-expressed shrna works significantly longer than chemically synthesized sirna. plasmids equipped with a resistance gene can be used to select transfected cell lines in which the target gene can be down-regulated for several months. [75] in addition, transgenic animals can be generated in which the gene of interest is permanently inhibited by using shrna expression vectors. for example, the shrna expression cassette can be incorporated into embryonic mouse stem cells by electroporation [76] or lentiviral transfer [77] (see section 5.2.1). a problem of this method is that integration of the transgene is random, so the silencing efficiency can vary considerably depending on the integration site. furthermore, important cellular genes can be destroyed. for this reason a locus was sought that guaranteed a strong and predictable shrna expression. the rosa 26 locus fulfils these requirements and is used to integrate the transgene homologously by recombinase-mediated cassette exchange (rmce). with a single copy of the shrna expression cassette, the knockdown was 80-95 % in most analyzed organs. 
[78] an advantage of this procedure relative to conventional knockout techniques is the immense saving in time: the shrna-expressing animals are available for investigation in around three to four months, while with knockout animals back-crosses that can take up to several years are often necessary before the gene can be deleted from both chromosomes in a genetically defined background. surprisingly, we have recently observed phenotypic differences between knockout and shrna-expressing animals. [79] in this study, the function of the vanilloid receptor trpv1, which plays an important role in pain perception, was investigated in detail. while the reaction of the shrna-expressing animals was in accordance with published data from trpv1-knockout animals in most tests, such as capsaicin-induced hypothermia and colitis, and their reaction to a heat stimulus, they showed pronounced differences in the perception of neuropathic pain. while the knockout of trpv1 had no impact on the perception of neuropathic pain, the mechanical hypersensitivity and allodynia in the shrna-expressing animals was significantly reduced in comparison to wild-type animals. this finding agrees with results from small molecule receptor antagonists. [80] the cause for the differences in the behavior of knockout and shrna-expressing animals is not yet fully understood; however, a complete knockout and a partial knockdown appear to lead to differences in compensation mechanisms. one should keep in mind that small molecule pharmacological substances also only partially inhibit their targets, so that the partial knockdown in an rnai experiment may better reflect the outcome of a medicinal therapy with substances directed against that target. a further advantage of the rnai technology is its broad applicability. while classical knockouts by homologous recombination are only routinely done with mice, shrna vector technology allows genes to be turned off in other species, such as rats. 
[81] a further development of this idea is the creation of disease-resistant domestic animals with the help of rnai. in goat foetuses and bovine blastocysts, rnai shut off the prion protein (prp), which aggregates in transmissible spongiform encephalopathy (tse). [82] in this way it was possible to generate domestic animals which are resistant to bse and related diseases. cattle could be made resistant to foot-and-mouth disease by using a similar method. the creation of transgenic domestic animals, however, not only results in technological challenges, but also has ethical and social implications which must not be neglected. while the shrnas in the systems described so far are expressed under the control of polymerase iii promoters, modern systems can also employ polymerase ii promoters. this results in transcripts with a cap at the 5' end and a poly-a tail at the 3' end, which are not compatible with the rnai machinery. nevertheless, to use polymerase ii promoters, the expression of mirnas is simulated. these alternatives are usually components of longer pre-mrnas which are transcribed under the control of polymerase ii promoters. a naturally occurring mirna can be replaced with an artificial shrna in the sequence context of the mirna. [83] the rna polymerase ii first generates a long primary transcript, from which drosha cuts out the pre-mirnas. these are exported into the cytoplasm where dicer processes them into sirnas, which are assembled into risc. comparative studies with conventional and mirna-type shrnas against hiv have shown that the latter are up to 80 % more efficient. [84] besides their high efficiency, the mirna-type shrnas have other advantages relative to classical shrnas. for one thing, they allow the simultaneous expression of a protein-coding sequence upstream of the mirna segment. in this way, a reporter such as gfp or a relevant functional protein can be expressed with the shrna at the same time. 
secondly, polycistronic expression becomes possible, that is, more than one microrna-type shrna can be expressed at the same time from a single transcript. [85] in this manner, either several genes can be silenced at once or a target gene can be very efficiently inhibited by several shrnas. a third advantage is the option of using cell-type-specific promoters. while polymerase iii promoters mediate a strong and ubiquitous expression, there are a large number of different polymerase ii promoters which are only active under certain conditions or in specific cell types. for example, an mirna-type shrna against the transcription factor wilms tumor 1 was expressed under the control of the proximal promoter of the murine gene rhox5. this specifically inhibited the expression of the target gene in nurse cells of the testis. [86] the vector systems also provided the opportunity to regulate rnai by pharmacological substances. these systems may be differentiated into reversible and irreversible types. in reversible systems, expression of the shrna is "turned on" by the addition of an inducer. when the inducer is taken away, transcription of the shrna ceases and the target gene of the sirna is once again expressed. in irreversible systems, shrna expression can be induced, but cannot be turned off again. this form of regulation is widely employed when genes which are essential for embryonic development are to be investigated in adult organisms. the most common reversible shrna expression system is based on tetracycline (tet) controlled transcription. [87] for tet control, the promoter is usually modified by the addition of a tet operator, to which a repressor protein binds. the addition of an inducer, such as tetracycline or its more commonly used structural analogue doxycycline (dox), induces a structural change in the tet repressor such that it is released from the tet operator, which opens the way for transcription of the shrna. 
the tet system functions in vitro as well as in vivo. for example, an shrna against the polo-like kinase 1 (plk1) was dox-dependently expressed to study the importance of the target gene for the proliferation of cancer cells. [88] it was shown by inoculating these cells into immunodeficient nude mice that the rnai-mediated silencing could be modulated in a dox-dependent manner in vivo. in a further study, transgenic animals were generated according to the previously described rmce procedure, in which the shrna expression could be reversibly induced by dox. [89] in this way, the target gene, which codes for the insulin receptor, could be down-regulated for a chosen period of time. the tet system combines numerous advantages: it has a low background activity in the absence of an inducer, is strongly inducible, and quickly reversible after removal of the inducer, and the inducers tetracycline and doxycycline are nontoxic, well-characterized pharmacological substances. besides the described system, there are numerous other variants of tet control and other reversible regulation systems, which are explained in a recent review article. [90] the cre-lox system has been widely used for many years for conventional knockouts and has also been employed as an irreversible method for conditional rnai. in this system, transcription of the functional shrna is prevented by the insertion of an additional dna segment into the expression cassette. for example, a neomycin (neo) resistance gene flanked by two loxp sites can be integrated into the shrna-coding region. [91] cre recombinase removes the interrupting sequence when expressed in the same cell and induces the synthesis of the shrna. alternatively, the stuffer sequence can also be inserted in the promoter region. cre-lox systems allow temporal control of rnai, for example, induction after embryonic development, as well as tissue-specific silencing when cre recombinase is expressed in certain cell types. 
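the irreversible cre-lox switch described above can be pictured as a simple string operation: cre removes everything between two loxp sites in the same orientation and leaves a single loxp site behind. the python sketch below is a toy model under stated assumptions. the function name and the bracketed placeholders for the promoter, the stuffer, and the shrna halves are hypothetical; the 34-bp sequence is the wild-type loxp site written in one of its two orientations; real recombination depends on site orientation and on cre expression, which are not modeled here.

```python
# Toy model: excision of a loxP-flanked ("floxed") stuffer from an
# shRNA expression cassette. Placeholders in brackets are illustrative.

LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp wild-type loxP site


def cre_excise(cassette: str, loxp: str = LOXP) -> str:
    """Remove the segment between the first two loxP sites.

    One loxP copy remains in the product. If fewer than two sites are
    present, the cassette is returned unchanged.
    """
    first = cassette.find(loxp)
    if first == -1:
        return cassette
    second = cassette.find(loxp, first + len(loxp))
    if second == -1:
        return cassette
    # keep everything up to and including the first site,
    # then continue after the second site
    return cassette[: first + len(loxp)] + cassette[second + len(loxp):]


before = "[U6]" + "[sense]" + LOXP + "[neo]" + LOXP + "[antisense]"
after = cre_excise(before)
print(after)
```

in this picture, the stuffer interrupts the cassette until cre is expressed; afterwards the two halves are separated only by the single remaining loxp site, and a second call to `cre_excise` changes nothing, which mirrors the irreversibility of the switch.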
small molecular pharmacological substances which typically bind to proteins and inhibit their catalytic cores or block membrane-bound receptors usually bind to their target molecules through spatial interactions. this often results in undesirable side effects when the substance also binds to other structurally similar proteins. since rnai applications are based on watson-crick base pairing between an oligonucleotide and an rna, there was hope that undesired side effects played no role when a target sequence that only appears once in the genome was used. in practice, a single mismatch can lead to a complete loss of silencing. [75, 92] more extensive microarray analyses, with which global profiles of gene expression can be created, showed, however, that sirnas are not completely specific. while initial studies suggested that the so-called off-target effects of sirnas are dose-dependent and can be avoided by the use of lower concentrations of sirna, [93] other studies showed that the unspecific effects have a similar dose response to the intended knockdown of the target gene. [94] the identity of as few as eleven nucleotides between the antisense strand of the sirna and an mrna can result in the down-regulation of an mrna which is not the intended target. these off-target effects can have effects on the phenotype, for example, the viability of cells. [95] more recent investigations have shown that it is not the overall identity of an mrna with the sirna, but rather the perfect correspondence between parts of the 3'-utr and the seed region (positions 2-7 or 2-8) of the antisense strand of the sirna which determines whether gene expression is influenced ( figure 5 ). [96] in a systematic study the frequency of all 4096 possible hexamers in the 3'-utr of the transcriptome was investigated. [97] it was shown that some hexamers are rare while others are considerably more common. 
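the seed-based view of off-target effects lends itself to a small worked example in python. the sketch below enumerates the 4096 possible hexamers (4^6), extracts the seed (positions 2-7) of an antisense strand, derives the complementary hexamer expected in an mrna, and counts hexamer occurrences in a set of 3'-utr sequences. it is a minimal illustration only: the function names, the example antisense strand, and the two short "utr" strings are invented here and do not come from the cited study, and a real analysis would use annotated transcriptome data.

```python
# Minimal sketch: seed hexamers and their frequency in 3'-UTR sequences.
# All sequences below are made up for illustration.

from collections import Counter
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def all_hexamers() -> list:
    """Enumerate every possible RNA hexamer (4**6 = 4096 sequences)."""
    return ["".join(p) for p in product("ACGU", repeat=6)]


def seed(antisense: str) -> str:
    """Positions 2-7 of the antisense strand (1-based, from the 5' end)."""
    return antisense[1:7]


def seed_match_site(antisense: str) -> str:
    """Hexamer in an mRNA (5'->3') that pairs perfectly with the seed."""
    return reverse_complement(seed(antisense))


def count_hexamers(utrs) -> Counter:
    """Count every overlapping hexamer across a collection of sequences."""
    counts = Counter()
    for utr in utrs:
        for i in range(len(utr) - 5):
            counts[utr[i:i + 6]] += 1
    return counts


antisense = "UCUGAACGUACUUGGAUCCUU"          # hypothetical guide strand
toy_utrs = ["AAGUUCAGAA", "GUUCAGGUUCAG"]    # toy stand-ins for 3'-UTRs
site = seed_match_site(antisense)
print(site, count_hexamers(toy_utrs)[site])
```

ranking candidate sirnas by how often their seed-match hexamer occurs in the 3'-utrs of the transcriptome is one way to express the idea that rare seed matches should give fewer potential off-targets.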
it became clear in a microarray analysis that sirnas with common seed regions trigger stronger off-target effects than those for which there are only a few complementary sequences. this means that off-target effects can be reduced by clever design of the sirna. furthermore, the specificity of sirnas can be increased through the incorporation of modified nucleotides. it is comparatively easy to completely inactivate the sense strand by modifications so that the danger of off-target regulation can be reduced to a minimum. changes to the antisense strand are, on the other hand, more challenging since the inhibitory effects on the expression of the target gene must not be influenced. a single 2'-o-methyl substitution on the ribose of the second nucleotide was shown to be enough to significantly reduce off-target effects while maintaining silencing activity (figure 5). [98] besides the regulation of partially homologous mrnas, sirnas can surprisingly also trigger an interferon (inf) response, although it was originally assumed that these responses are only induced by double-stranded rna molecules greater than 30 nucleotides in length. a complete analysis of the inf-stimulated genes revealed, however, that sirnas can also activate the interferon system, presumably mediated by protein kinase r. [99] this effect is not specific for sirnas, but has also been observed for vector-expressed shrnas. [100] presumably the toll-like receptors (tlr) and the helicases rig-i and mda5, in addition to protein kinase r, also play an important role in the recognition of sirnas by the immune system. three members of the tlr family recognize rna and can trigger an immune response through a complex signaling pathway (figure 6). it could be shown for plasmacytoid dendritic cells that sirnas induce inf-a via tlr7. [101] the activating effect of the sirnas on endosomal tlrs is dependent on the sequence of the sirna. 
[102] as a result, motifs could be identified which led to a strong induction of the immune response. this means that immunostimulation can be circumvented by avoiding the use of these motifs in an sirna. for special applications, such as the treatment of viral infections or cancer, strongly immunomodulatory sirnas which have two functions, knockdown of the target gene and induction of interferons, could be used deliberately. [103] in a recent publication it was reported that unspecific effects of sirnas can also be mediated by tlr3. the investigation of the anti-angiogenic effects of sirnas, which are used, for example, in the treatment of age-related macular degeneration (see section 6.3.1), showed in an animal model that unspecific sirnas without homologous sequences in the mammalian genome were just as efficient as sirna against the vascular endothelial growth factor (vegf) or its receptor. [105] these effects were not dependent on a sequence-specific silencing of the target, nor was off-target rnai or inf-a/b activated. instead, choroidal neovascularization was blocked by tlr3 and its adaptor trif, which are localized on the cell surface of various cell types, as well as by the induction of inf-g and interleukin-12. further undesirable side effects can come about by cross-reactions with the endogenous mirna pathway. as explained in the introduction, sirnas and mirnas function by very similar mechanisms. for this reason it is hardly surprising that sirnas can act as mirnas. this means that sirnas can interact with the 3'-utr of mrnas by partial homology and can inhibit their translation without triggering their degradation. [106, 107] furthermore, expressed shrnas can block the endogenous mirna pathway. a pronounced liver toxicity was observed after a high dose of viral vectors carrying an shrna expression cassette was injected into mice. [108] of the 49 shrnas tested, 36 caused liver damage, which in 23 cases was fatal. 
presumably, the cellular mirna pathway was disturbed by, among other things, over-saturation of exportin-5, which is responsible for transporting mirna precursors out of the nucleus and into the cytoplasm. no side effects were observed at a lower concentration of the vector, in contrast, and protection from hbv was achieved in an animal model for up to a year. in response to this work, a recently published study investigated whether chemically synthesized sirnas have an influence on cellular mirnas. [109] liposomal delivery of the sirnas resulted in the expression of hepatocyte-specific genes being inhibited by around 80 %. the level and the function of several investigated mirnas were not influenced by the sirna treatment. in conclusion, it is clear that rnai applications will never be completely specific. by suitable design of the sirnas as well as the use of modified nucleotides, however, the unspecific effects can be minimized. in addition, the dose of the sirnas or shrnas should be as low as possible. the reliability of the results of functional analyses can be increased by verifying the phenotype with multiple independent sirnas. for therapeutic applications, it must be remembered that small molecular substances usually also have numerous (toxic) side effects. for this reason, the same safety standards should apply to the preclinical development of rnai applications as for other substances. oligonucleotides are multiply negatively charged macromolecules which cross the hydrophobic cell membrane with difficulty. the delivery of the sirnas into cells presents one of the greatest challenges to the development of rnai applications. for cell-culture applications, transfection reagents are commonly used, which often have toxic side effects in animals or humans. previous work from the antisense field established that a certain amount of oligonucleotides are spontaneously taken up by cells in vivo. 
thus, sirnas also work without a carrier when locally applied, such as through intranasal delivery [110] or intrathecal injection. [24] it should be remembered in this case that local application can create a high concentration of the sirna in a spatially restricted area. additional measures are required for efficient systemic delivery. basically, the approaches can be divided into nonviral delivery of chemically synthesized sirnas and viral delivery of shrna expression cassettes. the preferred method depends on the application: for temporary diseases such as acute infections, the short-acting sirnas can be sufficient, while for chronic diseases, such as hiv infection or metabolic diseases, the vector-based method is presumably more advantageous to avoid repeated dosing. in the first in vivo applications of rnai, free sirnas were applied by hydrodynamic injection into the tail vein. [111] this involves injection of a relatively large volume of the sirna solution in a short time at high pressure. in this way the sirnas are preferentially taken up by the liver, and proof of principle was demonstrated in practice by the knockdown of target genes in this target organ. this method is, however, very harsh and not viable for humans. for this reason, intensive work on the development of biocompatible procedures for the delivery of sirnas has been going on for many years. in vivo systemic application of sirnas requires that they overcome numerous barriers to unfold their activity: [112] free oligonucleotides are rapidly filtered from the blood by the kidneys and subsequently excreted. in addition, unmodified sirnas are rapidly degraded by nucleases (see section 2.2), and foreign macromolecules are phagocytized by the reticuloendothelial system and deposited in the liver and spleen. the half-life of the sirnas in the bloodstream can be extended by hydrophilic polymers such as polyethylene glycol (peg). 
the sirnas must then overcome the capillary endothelium and diffuse into the extracellular matrix of the target tissue. uptake into the cells normally occurs by endocytosis, during which an important step is the release of the sirna from the endosomes into the cytoplasm, where they manifest their activity. there are numerous methods to aid these processes, of which the most important will be discussed here. many substances are packed into liposomes to improve their pharmacokinetic characteristics. liposomes form a phospholipid bilayer surrounding an aqueous compartment, in which polar substances can be stored, and mediate uptake of the substances into the cells by some form of vesicular transport, for example, through endosomes. cationic lipids are particularly well suited for the delivery of negatively charged nucleic acids. most commercially available transfection reagents form lipoplexes with the oligonucleotides; in these lipoplexes the sirna is not contained in the inner compartment. however, numerous new, less-toxic formulations have been developed for in vivo applications. usually lipoplexes and liposomes are surrounded by peg (figure 7 a,b) to achieve longer circulation in the blood stream and to reduce the toxicity. in addition, fusogenic lipids can be added, which improve the release of the sirnas from the endosomes. while free sirnas are rapidly excreted by the kidneys after intravenous injection, an sirna labeled with the fluorescent dye cy3 that was administered as an sirna lipoplex could be detected in many organs. [65] the sirnas remained, unfortunately, primarily in the endothelial cells of the blood vessels and, therefore, barely penetrated into the tissues themselves. liposomal delivery of the completely modified sirna against hbv described in section 2.2 increased both the efficiency and duration of action in a mouse model. 
[113] the sirna was encapsulated in stable nucleic acid-lipid particles (snalps), which consist of a cationic and a fusogenic lipid and are also pegylated (figure 7 b). snalps were subsequently used to test an sirna against apolipoprotein b in primates. [25] the level of the mrna in the liver was reduced by more than 90 % after a single injection, and as a result the protein, the serum cholesterol, and the level of low-density lipoprotein (ldl) were reduced. this showed that liposome-mediated sirna delivery could be successfully tested in a clinically relevant context. the knockdown of apolipoprotein b demonstrates two further aspects: first, a partial reduction of the target protein is sufficient to reach a relevant therapeutic benefit, namely reduction of ldl to a normal level. for the use of rnai against tumors or viral infections, however, the greatest possible knockdown of the target gene must be reached to prevent a relapse of the disease. secondly, the advantage of rnai technology lies in the fact that any chosen gene can be inhibited, not just the so-called druggable targets, against which traditional small-molecule substances can be directed.
figure 7. nonviral delivery of sirnas. a) lipoplex: cationic lipids (gray) form complexes with the negatively charged sirnas (red). peg (yellow) is frequently attached to improve the pharmacokinetic characteristics. b) liposomes in which the cationic lipids enclose the sirna. c) sirna coupled to cholesterol to increase its lipophilicity. d) specific delivery by coupling of sirnas on the antigen-binding fragment of an antibody through positively charged protamine. e) direct coupling of an sirna to an aptamer for tumor-cell-specific delivery. f) neuronal delivery by a peptide of the rabies virus glycoprotein (rvg) with an arginine nonamer (9r) at its carboxy terminus to bind the sirna. g) receptor-mediated delivery by coupling of a ligand (f: folate) to a dna oligonucleotide (blue), that hybridizes with sirna (sense strand: green, antisense strand: red). further details are described in the text.
besides the lipid-based systems, various other polymers have been employed for the delivery of sirnas. one of the most intensively investigated polymers for the delivery of nucleic acids is polyethyleneimine (pei). the linear or branched pei polymers are strongly positively charged and can therefore form complexes with sirnas and electrostatically interact with the cell surface. the complexes are taken up by endocytosis, and pei improves the release of the sirna by destroying the endosomes. pei-sirna complexes can be employed successfully to limit influenza infections in mice, for example. [114] nanoparticles from completely different substances have also been developed. for example, medarova et al. used nanoparticles, which after systemic application allow the delivery and proof of sirna uptake at the same time. [115] the samples consisted of magnetic nanoparticles labeled with a dye which absorbs in the near-infrared region so that accumulation in tumors could be observed by magnetic resonance imaging (mri) and near-infrared in vivo optical imaging (nirf). the nanoparticles were equipped with a myristoyl-coupled polyarginine peptide for translocation across the membrane. in an alternative system, carbon fiber nanotubes were used, which facilitated entry of sirnas into t cells and primary cells, which are otherwise difficult to transfect with liposomal systems. [116] an alternative strategy is to couple lipophilic molecules directly to the sirna. of a series of tested groups, cholesterol and 12-hydroxylauric acid coupled to the 3' end of the sense strand proved to be the best suited to ensure efficient uptake by the cells and knockdown of the target gene. [60] as a result, a cholesterol-coupled sirna (figure 7 c) was injected into the tail veins of mice. 
[117] cellular uptake and silencing of the target protein (apolipoprotein b) could be shown in the liver and the jejunum (a section of the small intestine). in addition to lipophilic groups, cell-penetrating peptides (cpp) can improve the cellular uptake of oligonucleotides. [118] interestingly, phosphorothioate oligonucleotides which are not covalently attached to an sirna improve the uptake by a caveolin-mediated mechanism. [119] in this way, the expression of lamin in primary huvec endothelial cells was inhibited. the development of systems that allow specific delivery of sirnas to their target cells represents a great advance. this approach allows smaller doses to be applied and possible side effects in other tissues to be avoided. an elegant possibility consists of coupling the sirna to an antibody which recognizes a protein on the surface of specific cells. in a ground-breaking study, the antigen-binding fragment of an antibody against the hiv glycoprotein, which was fused to protamine, was used (figure 7 d). [120] this positively charged protein can assemble with approximately six (negatively charged) sirnas in a noncovalent manner. with this construct it was possible to inhibit an hiv infection of primary t cells, which are difficult to transfect with lipid-based strategies. the authors also succeeded in vivo with their antibody strategy in delivering the sirnas to tumor cells which presented the ligand proteins of the antibodies on their cell surface. to avoid the need to combine two classes of molecules (proteins and nucleic acids), sirnas have been coupled to aptamers (ligand-binding nucleic acids selected in vitro). in a first effort, a streptavidin bridge was used to bind an sirna against lamin to an aptamer against the prostate-specific membrane antigen (psma), [121] a membrane receptor which is expressed in prostate cancer cells and the vascular endothelia of tumors.
this conjugate made efficient silencing possible, but is relatively complex because of the biotin-streptavidin bridge. for this reason, the sirna was coupled directly to a different aptamer against psma in a further study (figure 7 e). [122] once in the cell, the sirna is removed from the aptamer by dicer. in an animal model it was possible to inhibit the growth of a tumor from human prostate carcinoma cells with an aptamer-coupled sirna against plk1. the treatment of neurological diseases is complicated by the need to pass through the blood-brain barrier, which often prevents the entry of drugs from the bloodstream into the brain. to overcome this barrier, a 29 amino acid long peptide from the rabies virus glycoprotein was coupled with an arginine nonamer to an sirna (figure 7 f). [123] the peptide binds to the acetylcholine receptor, which is expressed by neuronal cells, so that the conjugate specifically penetrates neurons. in vivo, an intravenously injected sirna with the peptide succeeded in getting into brain cells, and protected mice from an infection with japanese encephalitis virus. a further possibility for cell-type-specific delivery is the coupling of a receptor ligand (such as folate) to a dna oligonucleotide (figure 7 g). [124] this dna oligonucleotide hybridizes with the 3' extended end of the antisense strand of the sirna, and the ligand mediates the uptake into the cells of the construct, which consists of two rna molecules and a dna oligonucleotide. presumably, dicer or an rnase h then produces the mature sirna. further details concerning nonviral delivery of sirnas are described in recently published reviews. [112, 125-127] viruses are among the most dangerous pathogens for humans, and therapies against them are often inadequate or not available. for the last 20 years, however, a concept has been pursued to introduce therapeutically useful genes into patients with the help of viral vectors.
in the process, the viruses are usually changed such that essential components for replication are missing so that they cannot produce progeny viruses and therefore cannot harm the patient or others (figure 8). although worldwide over 220 genes have been transferred in almost 1500 clinical trials, [128] the real breakthrough in gene therapy has not yet been accomplished. high hopes were placed on combining the new and very efficient rnai technology with the experience of gene therapy. [90, 129] this involves getting the shrna expression cassettes into the cells by means of viral vectors. this form of gene transfer is usually significantly more efficient than the nonviral delivery of sirnas. three types of vectors are primarily used: retroviral vectors, adenoviral vectors, and vectors based on the adeno-associated virus (aav). the most important advantages and disadvantages of the three vector types are summarized in table 1. past experience has shown that it is impossible to find a vector which is optimal for all indications; instead, the choice of vector type depends on the intended specific therapeutic use. retroviruses have an rna genome, which is copied into a double-stranded dna, which in turn is integrated into the host genome as a proviral dna. this characteristic is maintained in therapeutically used retroviral vectors so as to permanently express the transgene. the immune response to these vectors is weak and the viruses are modified such that they can no longer leave the cells and cause damage. the family of the retroviridae is divided into the subgroups onco-retroviruses and lentiviruses. onco-retroviruses can only transduce proliferating cells. they are primarily used ex vivo, that is, the cells (for example, hematopoietic stem cells) are removed from the patient, transduced with the retrovirus vector in tissue-culture dishes, and later re-administered to the patient.
in this manner children have been treated who suffer from x-scid (severe combined immunodeficiency disorder), which is caused by a mutation in the γc interleukin receptor gene located on the x chromosome. [130] the presumed advantage of long-term expression by stable integration into the host genome proved to be a disadvantage, however, since several of the children developed leukemia as a consequence of the treatment. [131] the retroviral vector integrated in the proximity of the lmo2 proto-oncogene promoter and led to the anomalous transcription and expression of lmo2. this finding shows that the safety of the vectors must be improved; at the same time, however, one must remember that diseases such as scid are frequently untreatable by any other means and lead to the early death of the affected children. lentiviral vectors can, for example, be derived from the human immunodeficiency virus (hiv). they have the ability to transduce quiescent as well as proliferating cells, thus increasing their therapeutic range. furthermore, their oncogenic potential is presumably lower. the g glycoprotein of the vesicular stomatitis virus can be used as the coat protein for lentiviral vectors, which allows the transduction of almost any cell type. after the demonstration that retroviral vectors are in principle suited to sirna-mediated gene silencing by inhibiting the reporter gene egfp, [132] they were employed for medically relevant purposes. the specific knockdown of the oncogene k-ras v12 allele in human tumor cells caused them to lose their tumorigenicity. [133] in addition, lentiviral vectors were used to introduce shrna expression cassettes against viruses or their receptors into host cells. a lentivirus vector proved to be particularly efficient for the inhibition of hepatitis c virus (hcv).
this vector produced several shrnas against the virus genome and the host cell receptor cd81 at the same time and thereby blocked hcv replication, cd81 expression, and cell binding of the hcv surface protein e2. [134] the intensive efforts to use lentiviral vectors for the transfer of shrna expression cassettes for the treatment of hiv infection and an ongoing clinical trial for this purpose will be described in section 6.3.2. adenoviruses have a linear, double-stranded dna genome and most often cause respiratory problems in humans. the genomic dna of adenoviruses remains episomal in the infected cells, so that no risk of insertional mutagenesis exists. unfortunately, they do cause a powerful immune reaction, which led to a fatal reaction in a clinical study. [135] parts of the early genes were removed in the first generation of adenoviral vectors. the early genes e2 and/or e4 were deleted in the second generation adenovirus vectors to further reduce the immunogenicity and to create additional space for transgenes. in the newest vectors, which are referred to as gutless, all coding sequences are deleted, so that besides the transgene, only the inverted terminal repeats (itrs) and the packaging signal ψ remain. [136] this approach drastically reduces the toxicity and immunogenicity of the vectors, and enables the long-term expression of a transgene. adenoviral vectors have already been employed in different medical areas for the knockdown of damaging genes. for example, both guinea pigs and pigs were protected from foot-and-mouth disease by an shrna-expressing adenovirus vector. [137] the adenovirus-vector-mediated delivery of shrna expression cassettes has also been developed for the treatment of heart diseases. the disruption of the calcium balance is an important cause of heart failure.
with rnai-mediated inhibition of phospholamban, an inhibitor of serca2a (the sarcoplasmic reticulum ca2+ pump), it was possible to improve the calcium uptake into the sarcoplasmic reticulum in primary neonatal rat cardiomyocytes. [138] adeno-associated viruses (aavs) belong to the family of the parvoviridae and possess a comparatively small linear single-stranded dna genome. aavs are attractive vectors for gene transfer, since they efficiently transduce target cells and are nonpathogenic for humans. while natural aavs integrate into a specific region in chromosome 19, the genes required for this are usually deleted from the recombinant vectors, so that they remain primarily episomal. despite this, aav vectors are noteworthy for their long-term, stable expression of transgenes. for gene therapeutic uses, the aav serotype 2 was first developed as a vector. since it inefficiently transduced many cell types such as muscle cells, other serotypes have been used in the past few years to expand the tropism. the genome of the aav-2 vector can be packed in capsids of other serotypes. this leads to the creation of so-called pseudotype vectors, with which cells of practically any given tissue can be transduced. [139] a further disadvantage of the conventional single-stranded vectors (the delayed start of the expression of the transgene) could also be eliminated. maximal gene expression is achieved after only a few days with self-complementary double-stranded aav vectors. [140] aav vectors are intensively employed in rnai experiments because of their numerous advantages. for example, the dopamine-synthesizing enzyme tyrosine hydroxylase was down-regulated in the midbrain neurons of mice with shrna-expressing aav vectors in one of the first in vivo studies. [141] as a result, behavioral changes such as a motor-performance deficit and altered reaction to a psychostimulant were seen.
the faster acting self-complementary aav vectors proved useful for cell-culture experiments: the mrna of the target gene was reduced by up to 80 % after transduction of a culture of rat-lung fibroblasts for 72 h. [142] 6. applications of rna interference the sequencing of the human genome as well as those of many other eukaryotic model organisms rates as one of the most important developments of the last few decades in the life sciences. in many cases, however, only the sequence is known, while the function of the coded protein remains unknown. determination of gene function has become one of the most important tasks of present research. at about the same time as the completion of the major sequencing projects, rnai was established as a method to allow the creation of loss-of-function phenotypes in a comparatively rapid and simple manner. this led to the adoption in only a few years of rnai as a standard method of molecular biological research that is employed in a very large number of biochemical laboratories. since silencing is based on the pairing between the mrna of the gene of interest and the sirna guide strand, gene functions can be investigated significantly faster than they can be with small-molecule inhibitors, which must first be identified in laborious high-throughput screens. in addition, closely related isoforms of proteins can be selectively turned off by suitable selection of target sequences to investigate their specific functions, [143] which is almost never possible with pharmacological substances. even when the goal of a pharmaceutical project is the development of a traditional drug, rnai offers a rapid method to validate the target. [144] the unspecific effects of rnai applications discussed in section 4 must, however, be kept in mind; thus controls were already laid down with which the specificity of an rnai experiment should be proven in the early stages of the research. 
[145] these include, among other things, suitable controls of the knockdown at the mrna and protein levels, dose-response curves of the sirna, as well as the use of multiple sirnas against the same target. besides the analysis of the function of individual genes, many genes can also be investigated at the same time by using sirna libraries. in a first example, every member of the family of de-ubiquitinating enzymes was selectively turned off with shrnas. [146] this approach led to the discovery that the knockdown of familial cylindromatosis tumor suppressor (cyld) led to an increase in the activity of the transcription factor nf-κb after tnf-α stimulation (figure 9). interestingly, the activation could be prevented by an aspirin derivative. as a result, patients with cylindromatosis, mostly benign tumors of the skin appendages, were treated with salicylic acid, which in some cases led to a full remission. [147] this example illustrates how rnai has led to new indications for well-known drugs. a number of libraries of sirnas, endoribonuclease-prepared sirnas, and shrna-expressing retrovirus vectors have now been developed which cover the entire human genome. genome-wide screens are primarily used in virological or oncological studies. in this way over 250 cellular factors necessary for hiv-1 infection could be identified in a comprehensive screen with 4 sirnas against each of the approximately 21 000 human genes. [148] this not only provided additional information about the viral life cycle but also identified new potential therapeutic targets. in a screen of retroviral vectors with mirna-type shrnas against around 3000 genes, proteins were identified that are involved in the proliferation of cancer cells. [149] in a further genome-wide screen, potential tumor suppressors were found which were required to block the proliferation of fibroblasts and melanocytes that contained an activated mutant of the braf proto-oncogene. [150] 6.3.
therapeutic applications the decades-long experience with the clinical development of antisense oligonucleotides [151] and ribozymes [152] was utilized in the therapeutic application of sirnas. only this can explain how the first rnai treatments were started on humans just three and a half years after sirnas were first used in mammalian cells. antisense oligonucleotides and sirnas differ from conventional substances by their size, and large-scale synthesis of these oligomers causes considerable difficulties and high costs. furthermore, the two strands of the sirnas must be synthesized separately and subsequently hybridized. this process has to guarantee the formation of a uniform drug that must, in the end, satisfy the requirements of the regulatory authorities. local application was selected for the first proof-of-concept studies because of the previously discussed delivery problems. table 2 shows the most advanced rnai-based clinical trials. the eye is a spatially well-defined organ with low nuclease activity in which the active agent can be injected intravitreally (directly into the vitreous body) comparatively easily. the only two oligonucleotides which have been approved by the us food and drug administration (fda) are for the treatment of eye diseases. the antisense oligonucleotide fomivirsen is directed against cytomegalovirus, which causes retinitis in aids patients; macugen is an aptamer for the treatment of age-related macular degeneration (amd), one of the most common eye diseases among the elderly. the first rnai-based clinical studies were started at the end of 2004 with an sirna against vegf. inhibiting the expression of vegf should block neovascularization in patients with amd. the sirna has since been tested under the name bevasiranib in a phase iii trial by the company opko health. sirna therapeutics (since bought by merck & co. inc., usa) initiated the first clinical studies with a chemically modified sirna.
the sirna with the code sirna-027 was stabilized by unpaired deoxythymidine with a phosphorothioate bond and inverted abasic sugar residues on the ends of the antisense and sense strand, respectively, and is directed against the vegf receptor-1. this approach also enabled the treatment of patients with amd. an intravitreal injection of the sirna reduced the area of neovascularization by as much as 66 % in a mouse model for choroidal neovascularization. [154] the study of kleinmann et al. [105] already discussed in section 4 called the mechanism of action of the sirnas against vegf or its receptor into question. the authors came to the conclusion that the antiangiogenic effect was not due to the knockdown of the target genes, but was based on the extracellular activation of tlr-3. in a further clinical study, the sirna rtp801i-14 against the hypoxia-induced gene rtp801 was used for the treatment of amd. according to quark pharmaceuticals, this approach is possibly safer and more efficient than the anti-vegf substances.
figure 9. function of the tumor suppressor cyld, which was identified by means of an rnai screen. cyld works as an inhibitor in the nf-κb signaling pathway. the loss of the cyld function leads to uncontrolled growth. the pathway can also be inhibited by using sodium salicylate or prostaglandin a1 (pga1). traf: tnf-receptor-associated factor; ikk: iκb kinase complex. scheme adapted from ref. [146].
viral infections present an increasingly pressing medical problem. the number of chronic infections associated with hiv-1, as well as hbv and hcv, is continually increasing. furthermore, new variants of viruses, such as the influenza virus h5n1, or new viruses, such as sars coronavirus, emerge as additional threats. intensive global travel and the fact that humans and animals live closely together in some regions of the world mean that new dangers from viruses must be expected.
despite the enormous need for antiviral agents, there are only relatively few approved drugs for the treatment of viral diseases. this demonstrates the necessity for the development of new antiviral strategies. rnai is based on the complementary base pairing of a target rna and the guide strand of the sirna which, as a result, allows for the rapid adaptation of this technology to any given variant of a virus or to new types of virus. this is a great advantage of rnai relative to conventional approaches, which require time-consuming optimization of small-molecule substances. since the first reports about the antiviral effects of sirnas against respiratory syncytial virus (rsv), [155] other successful rnai applications against most classes of medically relevant viruses, including hiv-1, hbv, hcv, sars-coronavirus, influenza virus, polio virus, and coxsackie virus, have been published. [156] an important role in rnai approaches against viruses is played by the choice of suitable target sequences. viral rnas often contain significant secondary structure, which can seriously impede the efficiency of inhibition by an sirna (see section 2.1). for example, hiv-1 tar rna is inaccessible for the risc and could only be cleaved after the secondary structure was broken open by 2'-o-methyl-rna oligonucleotides directed against regions neighboring the sirna binding site. [157] one of the biggest problems for the long-term use of rnai against viruses is viral escape. for both the polio virus [158] and hiv, [159] cases have been described in which viral replication can at the beginning be blocked efficiently, but after a while the virus titer increases again, because of the selection of mutants which can overcome the inhibition. nonessential genes-for example, the nef gene of hiv-1-can be deleted. [160] usually, however, viruses overcome rnai-mediated silencing with point mutations in the target sequence. 
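the base-pairing principle and the effect of an escape mutation described above can be illustrated with a minimal python sketch. the function names and the 19-nucleotide target site are hypothetical and chosen only for illustration; the sketch models strict watson-crick pairing and ignores g-u wobble pairs, secondary structure, and all other design rules.

```python
# Strict Watson-Crick pairing for RNA (assumption: G-U wobble pairs are not modeled).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def guide_strand(target_site):
    """Return the antisense (guide) strand, written 5'->3', for an mRNA target site."""
    site = target_site.upper().replace("T", "U")
    # The guide is the reverse complement of the target site.
    return "".join(COMPLEMENT[base] for base in reversed(site))


def mismatch_positions(guide, target_site):
    """Return the 1-based guide positions that do not pair with the target site."""
    site = target_site.upper().replace("T", "U")
    if len(guide) != len(site):
        raise ValueError("guide and target site must have equal length")
    # Guide position i (counted from its 5' end) faces target position len - i + 1.
    return [
        i
        for i, (g, t) in enumerate(zip(guide, reversed(site)), start=1)
        if COMPLEMENT[g] != t
    ]


# Hypothetical 19-nt target site and a variant with one point mutation (C->U at position 10).
wild_type = "GCAUGCUAGCUAGGAUCCA"
escape_mutant = "GCAUGCUAGUUAGGAUCCA"

guide = guide_strand(wild_type)
print(guide)                                     # guide designed against the wild type
print(mismatch_positions(guide, wild_type))      # no mismatches expected
print(mismatch_positions(guide, escape_mutant))  # position disrupted by the mutation
```

adapting the sirna to a new variant amounts, in this simplified picture, to calling `guide_strand` on the mutated site, which is why a sequence-based approach can be redesigned faster than a small-molecule inhibitor.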
a comprehensive analysis of 500 hiv-1 mutants showed that certain positions are preferentially mutated. [161] to avoid viral escape, sirnas should be directed against strongly conserved regions of the virus. nonstructural proteins will be more severely affected by mutations than capsid proteins. it has, however, been reported that substitutions often result in silent mutations which do not affect protein function. [162] this escape route of the virus can be hindered by selecting conserved regions with a structural function which is destroyed by the mutations. in this way, the coxsackie virus could be inhibited over a long period by an sirna against the conserved cis-acting replication element (cre), while an sirna targeted against structurally unimportant regions led to viral escape. [163] even with the careful selection of target sequences, however, rnai approaches will require the development of combination therapy, similar to those already employed in the conventional treatment of viral infections. in analogy to highly active antiretroviral therapy (haart), in which several small-molecule active drugs are used against hiv-1, several sirnas or shrnas could be used against the virus. the combination of four shrna expression cassettes in a lentiviral vector led to the viral escape of hiv-1 observed for a single shrna being avoided. [164] the alternative to the use of sirnas against the virus is to down-regulate cellular factors which the virus requires to enter the cell and to replicate. the chance of viral escape by mutation is drastically reduced with cellular genes. the critical factor, however, is that the corresponding protein is not essential for the cell. this is, for example, the case for the hiv-1 co-receptor ccr5. mutations in the ccr5 gene have no consequences for the health of the individual but protect the person from infection with hiv-1. hematopoietic stem cells were protected against hiv-1 by the rnai-mediated knockdown of ccr5. 
[165] this approach is not, however, restricted to hiv-1: silencing of the coxsackie virus adenovirus receptor led to a significant reduction in the replication of cvb-3. [166] [167] [168] a recently begun clinical trial for the treatment of hiv-1-infected patients combined several targets and rna-based strategies to get the best protection against escape mutants: a single lentiviral vector expresses an shrna against the hiv-1 genes rev and tat, a hammerhead ribozyme against ccr5, and a decoy oligonucleotide against the transactivation response (tar) element. [169, 170] the gene transfer occurs in this case ex vivo, that is, hematopoietic stem cells are removed from the patient, transduced in tissue culture with the vector, and then re-infused. in a therapeutic program for the treatment of hbv infections, the company nucleonics inc. is developing a vector with four shrnas against different segments of the viral genome. this approach should prevent viral escape. a phase i clinical study with vectors designated nuc b1000 started in 2007. the lung is one of the organs in which sirnas are relatively easy to apply; rnai approaches are thus promising for the treatment of respiratory diseases. infections with rsv could be inhibited by intranasal application of sirnas in a mouse model. [110] as a result, a clinical study was initiated to test how well the sirna aln-rsv01 was tolerated in healthy volunteers. according to the recently published results, no serious side effects were observed, and the systemic bioavailability of the intranasally applied sirna was minimal, as expected. [171] the subsequent phase ii study investigated the safety and antiviral effects of aln-rsv01 in infected adults. the sirna was, according to alnylam pharmaceuticals, tolerated well and showed statistically significant antiviral activity. a further field in which great hope is placed on rnai is cancer research.
[172] it does not require a great deal of imagination to expect that the inhibition of factors such as oncogenes could block the uncontrolled proliferation of tumor cells. the expression of genes which lead to angiogenesis within the tumor to create new blood vessels to supply the tumor with oxygen and nutrients can also be blocked. in addition, targets may be chosen which are responsible for metastasis, since in most cases primary tumors can be surgically removed and the metastases represent the real problem. finally, rnai can be employed to resensitize resistant tumor cells to treatment with chemotherapeutic agents or radiotherapy. the most important way in which tumor cells become resistant to chemotherapeutic agents is through the expression of the multidrug resistance (mdr) gene. if mdr expression is suppressed by sirnas, the cells again become vulnerable to chemotherapeutics. [173] there are many published studies which show that tumor growth could be slowed in animal models by rnai. for example, sirnas against cd31 inhibit the growth of tumors in various xenograft mouse models. [174] the sirnas penetrate into the tumor endothelial cells as lipoplexes and block angiogenesis. a further interesting option involves increasing the antitumor activity of oncolytic viruses by rnai. while viral vectors are usually modified such that after their creation they can no longer replicate (see section 5.2), oncolytic adenoviruses replicate selectively in cancer cells and destroy the cells by cell lysis. when such a virus is augmented with an shrna expression cassette (for example, against the mutated k-ras v12 oncogene) the inhibitory effect on tumor growth is increased. [175] in a first clinical rnai cancer trial, patients with glioblastoma multiforme were treated. [176] these brain tumors are almost untreatable by currently available means and the prognosis for the affected patients is very poor. 
the rnai approach was directed against tenascin-c, which is strongly expressed in this tumor tissue. the rnai treatment was successful in preventing the re-emergence of operatively removed glioblastomas in many patients. this product is currently being developed further by senetek plc. calando pharma employed an unmodified sirna against the m2 subunit of the ribonucleotide reductase in a phase i study for the treatment of solid tumors, whereby the sirna was delivered by a special nanoparticle. the company silence therapeutics is planning a clinical study with an sirna lipoplex (atu027). the liposomal formulated 2'-o-methylmodified sirna (atuplex) is directed against the expression of protein kinase n3 (pkn3). other companies have announced clinical trials of rnai approaches against various forms of cancer for the near future. [177] in a further clinical study, rnai is being employed as a therapeutic strategy against acute kidney failure. it has been shown that the temporary inhibition of the tumor suppressor p53 can prevent cell damage. [178] this will be exploited since the sirna akli-5 will inhibit the expression of p53 for a limited period of time. the safety of akli-5 is to be tested in a phase i trial in patients for whom a high risk of kidney failure exists because of a major cardiovascular operation. in january 2008, transderm inc. began a clinical study for the treatment of the autosomal-dominant genetic disease pachyonychia congenita, a disruption of keratinization. the sirna was intradermally injected and specifically inhibited the expression of the keratin mutation k6a, which is responsible for the disease. [179] in addition, the blockade of an endogenous mirna is being tested as a therapeutic strategy. in experiments with nonhuman primates, the liver-specific mirna-122 could be inhibited by a complementary lna oligonucleotide. [180] the lna was systemically (intravenously) administered and did not trigger any apparent toxic side effects. 
the level of plasma cholesterol could be reduced by inhibiting mirna-122. this mirna is an interesting target molecule for a further indication, since it is also required by hcv for replication. according to a press release from the danish company santaris pharma, a clinical trial with the lna inhibitor of the mirna-122 began in may 2008. since rnai is a technology which is strongly applicationoriented, it is of great commercial significance. the practical application of rnai rests on several fundamental patents, the most important of which include patents known as tuschl i and ii as well as kreutzer-limmer i. while the tuschl ii patent, which refers to the typical 19-21 base pair long sirnas with 3' overhangs, has already been granted, the decision regarding the tuschl i patent remains open. the kreutzer-limmer i patent has been granted in europe (but not yet in the us), however its precise extent is not yet decided. [153] a strong patent position is held by the us biotech alnylam pharmaceuticals. besides the named core patents on rnai, they also hold patents on the chemical modification and delivery of the sirnas. as a result, alnylam has made several major deals, for example, an extensive research cooperation with novartis in 2005. in july 2007, roche ag received from alnylam a non-exclusive licence for $331 million for the therapeutic use of sirnas under alnylam ip and their european research unit. much attention was generated by the take over of sirna therapeutics by merck & co., usa, for $1.1 billion at the end of 2006. these transactions show that the major pharmaceutical companies have recognized the potential of rnai and are prepared to invest a great deal in this new technology. further details regarding the patent situation as well as related business in the rnai field have been brought together in a recent review. [153] 7. 
summary and outlook rna interference has developed into one of the most important technologies of biomedical research within just a few years. the simple and efficient possibility to inhibit the expression of a specific gene makes possible the elucidation of the functions of proteins which are so far unknown. however, rnai has not only become a standard method of molecular biology-it has already made its way into the clinic. around a dozen clinical studies based on rnai are already running, and the first results are promising. basically, knockdown technologies can be used against any disease in which a deleterious gene is over-expressed (for example, cancer, viral infections, inflammation). two major obstacles must, however, be overcome before it can become a broadly applicable standard therapy: the question of their specificity and efficient delivery to the target cells. as already explained, sirnas can cause unspecific side effects and activate the immune system. these undesired effects can be minimized by the clever selection of the sequence and the use of modified nucleotides. immense efforts have been undertaken to develop carrier systems with which sirnas can be delivered to their target cells. despite the advances of the last years, further developments are still required to get systemically applied sirnas to their required site of action. here, viral vector systems for shrna expression cassettes offer additional options for efficient and organ-specific delivery. this approach must, however, first overcome the reservations based on the negative experience with gene therapy. then the two strategies-the delivery of chemically synthesized sirnas and the vector-mediated expression of shrnas-can complement each other, and either of the approaches can be chosen depending on the requirements of a given application. 
with the anticipated advances in the next few years in solving these problems, the vision of many rnai researchers could become reality: the use of genome-wide screens with sirna libraries will allow targets for diseases such as cancer to be identified, which then can be functionally investigated and validated with the sirna employed in the screening. afterwards, the same molecule can be optimized with chemical modifications in a standard manner and tested in animal models with special delivery agents, before the sirna (or a corresponding shrna) can be employed directly for testing in humans. this approach will enable an unprecedented acceleration of the development of new therapy options to be achieved. with sirnas, the specific inhibition of a single target gene is usually attempted; however, experience in the antisense field has shown that this can, under some circumstances, be inadequate for complex diseases such as cancer. in contrast, mirnas affect many target rnas, so that more comprehensive regulation can be achieved with the inhibition of a mirna. the clinical studies on the inhibition of mirnas, which have already begun or are planned for the near future, will possibly show the greater therapeutic effect. the coming years will show whether rnai, after its success in the research laboratories, will also live up to the promise of the antisense strategies to offer a new medical option for a molecular-based therapy. my special thanks go to my co-workers, whose tireless efforts contributed to the advances of my research group. i would like to thank volker a. erdmann for the continuing support during my habilitation. i thank henry fechner, jörg kaufmann, harry kurreck, hans-peter vornlocher, and denise werk for their critical reading of the manuscript, and erik wade for translating the text with great care. 
financial support of my research efforts by the dfg (ku-1436/1, sfb/tr19 tp c1), the bmbf/rna netzwerk, and the fonds der chemische industrie is gratefully acknowledged. key: cord-347917-fmb5nyxu authors: liu, junli; wang, fangfang; du, liuyang; li, juan; yu, tianqi; jin, yulan; yan, yan; zhou, jiyong; gu, jinyan title: comprehensive genomic characterization analysis of lncrnas in cells with porcine delta coronavirus infection date: 2020-01-28 journal: front microbiol doi: 10.3389/fmicb.2019.03036 sha: doc_id: 347917 cord_uid: fmb5nyxu porcine delta coronavirus (pdcov) is a novel emerging enterocytetropic virus causing diarrhea, vomiting, dehydration, and mortality in suckling piglets. long non-coding rnas (lncrnas) are known to be important regulators during virus infection. here, we describe a comprehensive transcriptome profile of lncrna in pdcov-infected swine testicular (st) cells. in total, 1,308 annotated and 1,190 novel lncrna candidate sequences were identified. gene ontology (go) and kyoto encyclopedia of genes and genomes (kegg) analysis revealed that these lncrnas might be involved in numerous biological processes. clustering analysis of differentially expressed lncrnas showed that 454 annotated and 376 novel lncrnas were regulated after pdcov infection. furthermore, we constructed a lncrna-protein-coding gene co-expression interaction network. the kegg analysis of the co-expressed genes showed that these differentially expressed lncrnas were enriched in pathways related to metabolism and tnf signaling. 
our study provided comprehensive information about lncrnas that would be a useful resource for studying the pathogenesis of and designing antiviral therapy for pdcov infection. long non-coding rnas (lncrnas), which are transcripts longer than 200 nt that lack protein-coding ability, have previously been described in mammalian cells (kapranov et al., 2007; mattick and rinn, 2015). most of them have a structure similar to mrna; they have a 5′ methylguanosine cap and are usually spliced and polyadenylated at their 3′ termini. notably, lncrna expression shows significant cell and tissue specificity (mercer et al., 2008; derrien et al., 2012). emerging evidence shows that non-coding rnas have a regulatory role in multiple cellular processes, such as genomic imprinting, chromatin modification, and alternative splicing of rna (mercer et al., 2009). moreover, some diseases such as cancer and neurological disorders are also related to the dysregulated expression of lncrna (qureshi et al., 2010; tsai et al., 2011). numerous studies have been conducted to ascertain their functional role during viral infection. for example, nrav can promote influenza virus replication and virulence through negatively regulating the initial transcription of varieties of interferon-stimulated genes (isgs) (ouyang et al., 2014). lncrna-acod1, named by its neighboring coding gene aconitate decarboxylase 1, significantly reduces virus multiplication by directly interacting with the metabolic enzyme glutamic-oxaloacetic transaminase (wang et al., 2017). neat1, one of the lncrnas induced by hiv-1 infection, is retained in the nucleus and serves as a scaffold for the nuclear paraspeckle substructure. importantly, neat1 deficiency enhances hiv-1 replication (zhang et al., 2013). although large amounts of data have proved that several lncrnas are involved in different kinds of virus infection, the mechanisms by which they act are still largely unknown. 
porcine delta coronavirus (pdcov), a novel emerging pathogenic enterocytetropic virus, was first discovered in the feces of pigs in hong kong in 2012. it is an enveloped, single-stranded positive-sense rna virus. it belongs to the genus deltacoronavirus in the family coronaviridae of the order nidovirales (woo et al., 2012). the genome length of pdcov is approximately 25.4 kb. it is similar in structure to other coronaviruses, with shorter non-coding regions (5′-utr and 3′-utr) at both termini. approximately three-quarters of the genome from the 5′ end contains two overlapping open reading frames, orf1a and orf1b, encoding the polyproteins pp1a and pp1ab, respectively. the downstream region of the genome encodes the structural proteins spike (s), envelope (e), and membrane (m), the accessory protein ns6, the structural protein nucleocapsid (n), and the accessory proteins ns7 and ns7a. a total of 15 non-structural proteins are encoded by the 5′-terminal region of the genome (fang et al., 2017). pdcov mainly causes acute, watery diarrhea, vomiting, dehydration, and mortality in suckling piglets, including lesions in the stomach and lungs (ma et al., 2015). the first outbreak of pdcov infection was reported in the united states in early 2014 and, to date, it has been detected in canada, south korea, china, thailand, and vietnam, thus posing a huge threat to the swine industry and attracting a great deal of attention (lee and lee, 2014; dong et al., 2016; lorsirigool et al., 2017; ajayi et al., 2018; saeng-chuto et al., 2019). during infection, the accessory and non-structural proteins of pdcov usually perform multiple functions to promote replication in infected cells. a previous study showed that ns6 interaction with rig-i/mda5 attenuated the binding activity between rig-i/mda5 and double-stranded rna, resulting in a reduced level of ifn-β production (fang et al., 2018). also, the non-structural protein nsp5, a 3c-like protease, is an important molecule in suppressing type i ifn signaling (zhu et al., 2017b). 
in addition, nemo, an essential modulator of nf-κb, can also be cleaved by nsp5, causing inhibition of ifn-β production (zhu et al., 2017a). though there are many reports of immune evasion by pdcov, the precise pathogenic mechanism of pdcov is largely unclear. based on the increasing number of reports on host lncrnas associated with virus infection, we were interested in investigating whether host lncrnas are involved in pdcov infection. in this study, genome-wide profiling of lncrnas in swine testicular (st) cells infected with pdcov was performed using rna-seq. we identified 830 differentially expressed lncrnas from pdcov-infected cells. an integrative analysis of lncrna alterations suggested their putative role in regulating the expression of several key genes in metabolic and tnf signaling pathways during infection. in conclusion, this work supports the role of lncrnas as important regulators of pdcov infection. swine testicular cells and porcine jejunum intestinal epithelial cells (ipec-j2) were grown in dmem supplemented with 10% (vol/vol) fbs (gibco, carlsbad, ca, united states) at 37 °c in a humidified 5% co2 atmosphere. the pdcov-ch-ha3-2017 (mk040455) strain, stored in our laboratory, was propagated in st cells. for rna-seq, st cells were infected with pdcov at a multiplicity of infection (moi) of 10 for 11 h; the medium for pdcov infection was dmem containing 0.2 μg/ml tpck-treated trypsin (millipore sigma, st. louis, mo, united states). mock-infected cells were placed in the same volume of dmem, with the same concentration of tpck-treated trypsin. total rna was isolated from each group using superfectri™ total rna isolation reagent (pufei, shanghai, china) according to the manufacturer's instructions. the rna quality was checked by 1% agarose gel electrophoresis. 
the purity and concentration of rna were measured with a nanophotometer® spectrophotometer (implen, münchen, germany) and a qubit® rna assay kit in a qubit® 2.0 fluorometer (life technologies, camarillo, ca, united states). rna integrity was assessed using the rna nano6000 assay kit of the bioanalyzer 2100 system (agilent technologies, santa clara, ca, united states). for quantitative rt-pcr (rt-qpcr), st and ipec-j2 cells were infected or mock-infected with pdcov at an moi of 10 and harvested at the indicated time. all experiments were conducted in triplicate. sequencing libraries were generated using the rrna-depleted rna with a nebnext® ultra™ directional rna library prep kit (new england biolabs, ipswich, ma, united states). after determining the quality of the library, rna-seq was performed using an illumina hiseq™ 4000 (illumina, san diego, ca, united states) to generate raw reads. after removing poly-n sequences, adapters, and low-quality reads, clean reads were obtained and the paired-end reads were aligned to the ensembl pig genome (release 76) using tophat2 (v2.0.9), and the reads that mapped to the pig genome were assembled using cufflinks v2.1.1 (trapnell et al., 2010). cuffdiff (v2.1.1) was used to calculate the fpkms of both lncrnas and coding genes in each sample. gene fpkms were computed by summing the fpkms of transcripts in each gene group, and transcripts were designated differentially expressed (de) when the difference in expression was statistically significant (p < 0.05). rna-seq and data analysis were completed by novogene. gene ontology (go) enrichment analysis of differentially expressed genes or lncrna target genes was conducted with respect to biological process, molecular function, and cellular component with the goseq r package, in which gene length bias was corrected. kyoto encyclopedia of genes and genomes (kegg) was used to perform pathway enrichment analysis (footnote 1). 
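the gene-level fpkm summation described above can be sketched as follows. this is a minimal illustration, not the cuffdiff implementation; the function name `gene_fpkm`, the identifiers, and the numeric values are hypothetical and are not data from the study.

```python
from collections import defaultdict


def gene_fpkm(transcript_fpkm, transcript_to_gene):
    """Sum transcript-level FPKM values into one FPKM value per gene."""
    totals = defaultdict(float)
    for transcript_id, fpkm in transcript_fpkm.items():
        gene_id = transcript_to_gene[transcript_id]
        totals[gene_id] += fpkm
    return dict(totals)


# Hypothetical example values (assumption: not taken from the paper).
transcript_fpkm = {"t1": 2.5, "t2": 1.5, "t3": 4.0}
transcript_to_gene = {"t1": "geneA", "t2": "geneA", "t3": "geneB"}
gene_totals = gene_fpkm(transcript_fpkm, transcript_to_gene)
```

a transcript that is missing from the mapping raises a `KeyError` here, which is deliberate: silently dropping transcripts would understate the gene total.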
kobas software was used to test the level of statistical significance of enrichment of differentially expressed genes and/or lncrna target genes in kegg pathways (mao et al., 2005). to determine the reliability of the rna-seq data, 15 differentially expressed lncrnas were randomly selected to test the expression by rt-qpcr. total rna was extracted from st and ipec-j2 cells using superfectri™ total rna isolation reagent (pufei, shanghai, china). first-strand cdna was synthesized with a reverse transcriptase kit (vazyme, nanjing, china). rt-qpcr was performed with a sybr green master mix (vazyme, nanjing, china) on the lightcycler 96 (roche, basel, switzerland). the pdcov m gene was detected by rt-pcr. all the primers are presented in table 1. relative expression was calculated using the 2^(−ΔΔct) method with gapdh as the internal control. comparisons between groups were made using two-way anova. the data reported are the mean ± sem. differences were considered statistically significant when p < 0.05. [footnote 1: http://www.genome.jp/kegg/] [table 1 | primers used for rt-qpcr validation. columns: sequence (5′-3′); amplicon.] for each lncrna, the pearson correlation coefficient of its expression value with that of each protein-coding gene was calculated. under the conditions of an absolute value of the pearson correlation coefficient >0.998 and p < 0.00001, the interaction network of the differentially expressed lncrnas and protein-coding gene co-expression pairs was then constructed using cytoscape (v3.5.1) (shannon et al., 2003). to identify the lncrnas in pdcov-infected cells, we sequenced the transcriptomes of the st cells with or without pdcov infection using high-throughput rna sequencing. robust and reproducible data were obtained from all samples, and more than 1 × 10^8 clean reads per sample were retained after removing reads containing adapter or poly-n sequences and reads with low quality. 
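the relative-expression calculation used for the rt-qpcr validation above, with gapdh as the internal control, can be sketched as follows. this is a minimal sketch of the standard 2^(−ΔΔct) formula; the function name `relative_expression` and the ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both target and reference.

```python
def relative_expression(ct_target_infected, ct_ref_infected,
                        ct_target_mock, ct_ref_mock):
    """Fold change of a target in infected vs. mock cells by 2^(-ddCt)."""
    # Normalize the target Ct to the reference gene (e.g. GAPDH) per group.
    delta_infected = ct_target_infected - ct_ref_infected
    delta_mock = ct_target_mock - ct_ref_mock
    # Compare the normalized infected value with the normalized mock value.
    delta_delta = delta_infected - delta_mock
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values (assumption: not taken from the paper).
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

a value above 1 indicates higher expression in infected cells than in mock cells, and a value below 1 indicates lower expression.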
afterward, all clean reads were aligned onto the pig reference genome (release 76) using tophat2 and were compared and assembled with cuffcompare and cufflinks, respectively, and coverage analysis was performed on those clean reads on different annotated gene types. the distribution of each type of gene was counted according to the expression level. in total, eight categories of rna were identified, according to database annotation of those transcripts, in which the protein-coding genes were highly represented (66.54% in ps and 69.10% in st, respectively) (figure 1a). next, four software tools, cnci, cpc, phylocsf, and pfam, were used to calculate the protein-coding potential of assembled transcripts to screen lncrnas, and the intersection of the transcripts predicted by all four tools to have no coding potential was taken as the set of novel lncrnas (figure 1b). in total, 1,308 annotated and 1,190 novel lncrna candidates were identified (supplementary tables s1, s2). it has been reported that lncrnas, in comparison with protein-coding genes, usually share some common genomic features. they are generally shorter in length, have fewer but longer exons, and there is lower evolutionary sequence conservation, with only ∼15% of mouse lncrnas having homologs in humans. lncrnas also demonstrate low expression levels (the median is ∼10% of that of protein-coding genes) (heward and lindsay, 2014). to further determine the characteristics of the lncrnas identified in the present study, we compared the transcript length, exon number, and degree of conservation between protein-coding genes and lncrnas. conservation analysis of exons, introns, and promoters of lncrnas and protein-coding genes showed that the exons of protein-coding genes were the most conserved and the exons of lncrna were far less conserved (figure 1c). furthermore, fewer exons and shorter orfs were found in lncrnas, which was also consistent with the reported lncrnas (figures 1d,e). 
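the screening step above, in which a transcript is retained as a novel lncrna candidate only if all four tools call it non-coding, amounts to a set intersection. the sketch below assumes each tool's output has already been reduced to a list of transcript ids called non-coding; the function name and the ids are hypothetical.

```python
def novel_lncrna_candidates(noncoding_calls):
    """Return transcript IDs called non-coding by every tool."""
    id_sets = [set(ids) for ids in noncoding_calls.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Hypothetical per-tool calls (assumption: not taken from the paper).
calls = {
    "cnci": ["tx1", "tx2", "tx3"],
    "cpc": ["tx1", "tx3", "tx4"],
    "phylocsf": ["tx1", "tx2", "tx3", "tx4"],
    "pfam": ["tx3", "tx1"],
}
candidates = novel_lncrna_candidates(calls)
```

requiring agreement among all tools trades sensitivity for specificity: a transcript flagged as coding by any single tool is excluded.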
long non-coding rna sequences are poorly conserved and do not appear to form large homologous families, so it is difficult to infer their common ancestors by sequence similarity (ponting et al., 2009). therefore, it is challenging to predict the functions of a type of lncrna on the basis of its sequence or structure. there have been reports of using genome-wide association analysis between lncrnas and the co-expressed and/or co-regulated protein-coding genes to characterize the function of the lncrna (huarte et al., 2010). to investigate the putative role of lncrnas, we first analyzed the whole rna-seq profiles to identify target protein-coding genes whose location or expression was significantly correlated with the candidate lncrna. for co-located target gene prediction, we searched coding regions 100 kb upstream and downstream of each lncrna. in total, 8,812 pairs of lncrna-protein-coding genes, containing 2,088 lncrnas and 3,566 protein-coding genes, were identified (supplementary table s3). for co-expressed target gene prediction, the expression correlation between lncrnas and protein-coding genes was evaluated. when the required pearson correlation coefficient was set above 0.95, 1,048,575 pairs of lncrna-protein-coding genes, containing 1,730 lncrnas and 10,581 protein-coding genes, were obtained (supplementary table s4). we next performed go and kegg pathway analysis for the target genes of lncrnas. the top 20 go and kegg pathways with the highest representation of each term are reported (figure 2 and supplementary tables s5, s6). kegg enrichment analysis revealed that pathways related to the immune system and metabolism were preferentially targeted. the go-term analysis was divided into three main categories: cellular component, biological process, and molecular function. significantly, a large number of biological processes, like protein-dna complex assembly, dna packaging and transcription, and the cellular macromolecule metabolic process, were enriched. 
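the co-expression screen described above can be sketched as a pairwise pearson correlation filter. this is a simplified illustration rather than the pipeline used in the study: it thresholds the absolute correlation at 0.95 and omits any p-value filter, and the function names and expression vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed_pairs(lnc_expr, gene_expr, threshold=0.95):
    """List (lncRNA, gene, r) triples with |r| above the threshold."""
    pairs = []
    for lnc_id, lnc_values in lnc_expr.items():
        for gene_id, gene_values in gene_expr.items():
            r = pearson(lnc_values, gene_values)
            if abs(r) > threshold:
                pairs.append((lnc_id, gene_id, r))
    return pairs


# Hypothetical expression profiles across six samples.
lnc_expr = {"lnc1": [1, 2, 3, 4, 5, 6]}
gene_expr = {
    "geneA": [2, 4, 6, 8, 10, 12],   # perfectly positively correlated
    "geneB": [6, 5, 4, 3, 2, 1],     # perfectly negatively correlated
    "geneC": [1, 3, 2, 5, 4, 6],     # correlated, but below the threshold
}
pairs = coexpressed_pairs(lnc_expr, gene_expr)
```

with only a handful of samples, very high correlations arise by chance fairly often, which is one reason a stringent threshold and a significance filter are typically combined in practice.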
furthermore, protein binding and nucleic acid binding and the nucleosome and organelles, belonging to molecular function and cellular component, respectively, were also enriched. go and kegg pathway enrichment analysis of target genes revealed that lncrnas may act in cis or trans to participate in the regulation of expression of multiple important genes in different processes including protein binding, dna transcription, metabolism, and immune response. to identify the pdcov-associated lncrnas, cuffdiff software was used to investigate the differentially expressed (de) lncrnas in pdcov-infected cells. the hierarchical clustering heat map in figure 3a shows the de lncrna expression profiling data. out of the 1,308 annotated and 1,190 novel lncrnas, we obtained 454 annotated de lncrnas (225 up-regulated and 229 down-regulated) and 376 novel de lncrnas (252 up-regulated and 124 down-regulated) after pdcov infection (p < 0.05; supplementary table s7) . importantly, we observed 20 lncrnas whose expression levels were decreased to fpkm = 0 after pdcov infection, while the fpkms of another 12 lncrnas, all novel lncrnas, were 0 before pdcov infection (supplementary table s7 ). this suggests that these 32 lncrnas, though they have very low expression levels, might be strongly associated with the viral infection. furthermore, to evaluate the reliability of rna-seq data analysis, 15 lncrnas were selected for rt-qpcr analysis in pdcov-infected cells. as shown in figure 3b , the expression levels of the 15 selected lncrnas, though exhibiting no significant differences at 4 h post-infection (hpi), were all significantly changed at 11 hpi in st cells. also, different expression patterns of lncrnas were detected in ipec-j2 cells. as shown in figure 3c , 11 out of the 15 selected rna were significantly altered at 11 hpi, and all of them were differently expressed at 24 hpi. 
in both st and ipec-j2 cells, the expression patterns were strongly consistent with the rna-seq results (table 2). the lncrnas in the genome are not randomly distributed, so locus classification is an effective first step in analyzing their regulatory functions at the genome level (luo et al., 2016). in general, lncrnas function either in cis or in trans to affect the transcription of genes within or far from the same genomic locus (clark and blackshaw, 2014). to understand the potential functional association between lncrnas and cognate genes, we investigated their genomic distribution pattern relative to protein-coding loci and classified all de lncrnas to ascertain their potential biological roles. all de lncrnas were classified into six categories comprising sense-upstream lncrna, sense-downstream lncrna, sense-overlapping lncrna, antisense-upstream lncrna, antisense-downstream lncrna and antisense-overlapping lncrna. as shown in figure 4a, 26% of de lncrnas were located in the same strand but upstream of protein-coding genes and 24% were located downstream, while antisense-upstream and antisense-overlapping comprised 27 and 1%, respectively, and the remaining 22% were antisense-downstream lncrnas. next, in order to define the lncrna functions more precisely, go enrichment analyses of the co-located genes of up- and down-regulated lncrnas were performed independently. the results showed that protein-coding genes associated with de lncrnas were mainly enriched in terms of molecular function and cellular component, primarily under the category of nucleic acid binding and intracellular membrane-bounded organelle (figure 4b). notably, by analyzing the relative expression level, we found that antisense lncrna and protein-coding genes were specifically co-expressed, in which two pairs showed a positive and three pairs showed a negative correlation in their expression patterns (figure 4c and supplementary table s8). 
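the six-category locus classification above can be illustrated with a small function. the exact rules used in the study are not stated, so the definitions here are assumptions: "sense" means the lncrna and the gene lie on the same strand, "overlapping" means their coordinate intervals intersect, and "upstream" means the lncrna lies on the 5′ side of the gene relative to the gene's strand. the function name and coordinates are hypothetical.

```python
def classify_lncrna(lnc_start, lnc_end, lnc_strand,
                    gene_start, gene_end, gene_strand):
    """Classify an lncRNA relative to a protein-coding gene.

    Coordinates are inclusive genomic positions on the same chromosome;
    strands are "+" or "-". The rules are illustrative assumptions.
    """
    orientation = "sense" if lnc_strand == gene_strand else "antisense"
    if lnc_start <= gene_end and gene_start <= lnc_end:
        position = "overlapping"
    else:
        lnc_is_left = lnc_end < gene_start
        # For a "+" gene the left side is 5' (upstream); for a "-" gene
        # the right side is 5', so the relationship is reversed.
        is_upstream = lnc_is_left if gene_strand == "+" else not lnc_is_left
        position = "upstream" if is_upstream else "downstream"
    return f"{orientation}-{position}"


# Hypothetical coordinates (assumption: not taken from the paper).
label = classify_lncrna(100, 200, "+", 500, 900, "+")
```

defining upstream and downstream relative to the gene's strand, rather than by raw coordinate order, keeps the labels biologically meaningful for genes on the minus strand.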
we speculated that these antisense lncrnas act in cis to modulate the expression of their cognate genes. the functional association between regulatory lncrna and protein-coding gene transcripts can be determined by performing expression correlation analysis coupled with ascertaining their putative role in related physiological processes. to further investigate the potential mechanism of action of the pdcov-associated lncrnas, the de lncrnas and their predicted target de protein-coding genes were investigated by delineating lncrna-protein-coding gene functional interactions. here we identified 1,048,575 pairs of de lncrna-de protein-coding genes, containing 821 lncrnas and 8,799 protein-coding genes (p < 0.01). next, kegg pathway analysis was repeated (figure 5a), and we found that metabolic and tnf signaling pathways were significantly enriched. the interaction network involving the metabolic and tnf signaling pathways was then constructed. several key genes in metabolism were positively or negatively regulated by lncrnas (figure 5b). of the significantly enriched genes, atp5l and atp5f1, two of the mitochondrial membrane atp synthase subunits, were regulated by lnc-000625, lnc-001104, aldbssct0000008902, and aldbssct0000006348. in addition, three lncrnas, lnc-000459, lnc-000258, and aldbssct0000005568, might regulate acyl-coenzyme a thioesterase 4 expression. these results suggest that these lncrnas might be involved in the regulation of metabolic processes, particularly those involving energy and lipid metabolism. meanwhile, an inducible program of inflammatory gene expression is central to antiviral defense. many of the genes in the network, including ccl5, ccl20, cxcl2, cxcl10, map3k8, nf-κb1, and interleukin 6 (il-6), are protein-coding genes known to have roles in the inflammatory response. in the network (figure 5c), eight lncrnas have putative regulatory roles in il-6 expression. 
six of them, lnc-000173, lnc-000269, lnc-000242, lnc-000657, aldbssct0000009132, and aldbssct0000001339, might exert positive regulation, while lnc-001173 and aldbssct0000010894 showed the opposite effect. this suggests that these lncrnas might act as the regulatory module of the circuit that is involved in the inflammatory response. numerous studies have shown that lncrnas play a key role during viral infection. the lncrnas thril, nest, neat, and lincrna-cox2 can participate in immune responses against viral infection mainly through regulating the expression of tnf-α, ifn-γ, il8, and inflammatory response, respectively (carpenter et al., 2013; gomez et al., 2013; imamura et al., 2014; li et al., 2014) . pdcov is an important enteric virus mainly causing diarrhea in suckling pigs. infection with pdcov causes changes in the expression levels of several host cell proteins of host innate immune response, but little is known about the critical roles of lncrnas in these processes. here, we performed rna-seq to identify the lncrnas involved in pdcov infection. the results of comparing clean reads to the genome showed that more than 60% of reads are protein-coding genes, and no lncrna classifications were identified due to the limited lncrna database annotation in pig. in our results, 1,190 novel lncrnas were identified. further analysis showed that the basic characteristics of these novel lncrnas are consistent with the known ones. our rna-seq results further enrich the pig lncrna database. in total, we found 454 annotated and 376 novel lncrnas that were differentially expressed during pdcov infection. these lncrnas were classified as sense-upstream lncrna, sense-downstream lncrna, sense-overlapping lncrna, antisense-upstream lncrna, antisense-downstream lncrna, and antisense-overlapping lncrna. many antisense-overlapping lncrnas have inverse expression patterns with their sense transcript counterparts. 
this suggests that these antisense-overlapping lncrnas may have a negative regulatory effect on them. in contrast, many lncrnas that do not contain overlapping sequences display expression patterns correlated with their neighboring protein-coding gene transcripts. in the present study, two out of five antisense-overlapping lncrnas were found to have high consistency in their expression (figure 4). similarly, the lncrna evx1as, which initiates within the first exon of the gene evx1, has an overlap of eight nucleotides with the evx1 mrna and promotes transcription of its neighbor gene by increasing the binding affinity of histone h3 lysine 4 tri-methylation (h3k4me3) and histone h3 lysine 27 acetylation (h3k27ac) at the promoter region. considering that most lncrnas might function through their secondary structure rather than the primary one, this suggests that the regulation of antisense transcripts by antisense-overlapping lncrna may not be mediated through base-complementary pairing. correlation analysis of de lncrna and protein-coding genes identified a number of de lncrna-de protein-coding gene pairs. the main enriched kegg pathways of these protein-coding genes were in metabolism and oxidative phosphorylation. in a recent report, 5-day-old neonatal pigs were infected with pdcov, and transcriptome profile and kegg pathway enrichment analysis were performed at different stages of infection (wu et al., 2019). in our study, we found that the lncrna-targeted genes were enriched in the pathways that were perturbed during the late stage of infection. in addition, the expression levels of transglutaminase 3 (tgm3) and apolipoprotein a-2 (apoa2) were significantly changed in the study by wu et al. (2019). similarly, we also found that tgm1 was up-regulated, and apoa1, apoa4, and apoa5 were down-regulated during pdcov infection (data not shown). 
moreover, our data show that many cytokines and chemokines, which elicit an inflammatory response, were differentially expressed in the infected cells compared to mock cells. the inflammation causes injury to the intestinal tissues, resulting in diarrhea or even death. raised ccl and cxcl10 levels were associated with the severity of virus infection (betakova et al., 2017; masood et al., 2018). here, we identified a number of lncrnas that may regulate the expression of these inflammatory molecules. to the best of our knowledge, this is the first study focusing on the expression profile of cellular lncrnas after pdcov infection. our data show the expression landscape of lncrnas, with special emphasis on the lncrna-protein modules operating in response to pdcov infection. moreover, this study provides a comprehensive genome-wide resource for exploring the molecular and cellular regulatory functions of lncrnas. this study will also be useful for identifying lncrnas as potential biomarkers for the diagnosis of pdcov infection and designing better prophylactic and therapeutic tools against virus infection. in the present study, the expression profiles of lncrnas were determined in pdcov-infected st cells. in total, 1,190 novel lncrnas were identified. a total of 830 lncrnas were differentially expressed between pdcov-infected and mock-infected st cells. kegg pathway analysis of de lncrna co-expressed genes revealed that they might be primarily involved in regulating metabolism and tnf signaling pathways. our study systematically characterizes lncrna expression during pdcov infection and provides a useful resource for identifying and functionally characterizing the cognate gene products of those lncrnas. this study will also be useful for assigning lncrnas as potential biomarkers of pdcov infection and designing better preventive and therapeutic measures against the virus infection, which would be economically beneficial for the pig farming community. 
the raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.
jll, jg, and jz conceived and designed the experiments. fw, ld, and jl performed the experiments. jll, yy, yj, and ty analyzed the data. jll drafted the manuscript. all authors read and approved the final manuscript.
herd-level prevalence and incidence of porcine epidemic diarrhoea virus (pedv) and porcine deltacoronavirus (pdcov) in swine herds in ontario
cytokines induced during influenza virus infection
a long noncoding rna mediates both activation and repression of immune response genes
long non-coding rna-dependent transcriptional regulation in neuronal development and disease
the gencode v7 catalog of human long noncoding rnas: analysis of their gene structure, evolution, and expression
isolation, genomic characterization, and pathogenicity of a chinese porcine deltacoronavirus strain chn-hn-2014
discovery of a novel accessory protein ns7a encoded by porcine deltacoronavirus
porcine deltacoronavirus accessory protein ns6 antagonizes interferon beta production by interfering with the binding of rig-i/mda5 to double-stranded rna
the nest long ncrna controls microbial susceptibility and epigenetic activation of the interferon-gamma locus
long non-coding rnas in the regulation of the immune response
a large intergenic noncoding rna induced by p53 mediates global gene repression in the p53 response
long noncoding rna neat1-dependent sfpq relocation from promoter region to paraspeckle mediates il8 expression upon immune stimuli
rna maps reveal new rna classes and a possible function for pervasive transcription
complete genome characterization of korean porcine deltacoronavirus strain kor/knu14-04
the long noncoding rna thril regulates tnfalpha expression through its interaction with hnrnpl
the genetic diversity and complete genome analysis of two novel porcine deltacoronavirus isolates in thailand in 2015
divergent lncrnas regulate gene expression and lineage differentiation in pluripotent cells
origin, evolution, and virulence of porcine deltacoronaviruses in the united states
automated genome annotation and pathway identification using the kegg orthology (ko) as a controlled vocabulary
role of tnf alpha, il-6 and cxcl10 in dengue disease severity
discovery and annotation of long noncoding rnas
long non-coding rnas: insights into functions
specific expression of long noncoding rnas in the mouse brain
nrav, a long noncoding rna, modulates antiviral responses through suppression of interferon-stimulated gene transcription
evolution and functions of long noncoding rnas
long non-coding rnas in nervous system function and disease
retrospective study, full-length genome characterization and evaluation of viral infectivity and pathogenicity of chimeric porcine deltacoronavirus detected in vietnam
cytoscape: a software environment for integrated models of biomolecular interaction networks
transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation
long intergenic noncoding rnas: new links in cancer progression
an interferon-independent lncrna promotes viral replication by modulating cellular metabolism
discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
expression profile analysis of 5-day-old neonatal piglets infected with porcine deltacoronavirus
neat1 long noncoding rna and paraspeckle bodies modulate hiv-1 posttranscriptional expression
porcine deltacoronavirus nsp5 inhibits interferon-beta production through the cleavage of nemo
porcine deltacoronavirus nsp5 antagonizes type i interferon signaling by cleaving stat2
the supplementary material for this article can be found online at:
https://www.frontiersin.org/articles/10.3389/fmicb. the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. copyright © 2020 liu, wang, du, li, yu, jin, yan, zhou and gu. this is an open-access article distributed under the terms of the creative commons attribution license (cc by). the use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. no use, distribution or reproduction is permitted which does not comply with these terms. key: cord-326719-p1ma4akz authors: enjuanes, luis; almazán, fernando; ortego, javier title: virus-based vectors for gene expression in mammalian cells: coronavirus date: 2003-12-31 journal: new comprehensive biochemistry doi: 10.1016/s0167-7306(03)38010-x sha: doc_id: 326719 cord_uid: p1ma4akz publisher summary the coronavirus and the torovirus genera form the coronaviridae family, which is closely related to the arteriviridae family. both families are included in the nidovirales order. recently, a new group of invertebrate viruses, the roniviridae, with a genetic structure and replication strategy similar to those of coronaviruses, has been described. this new virus family has been included within the nidovirales.
coronaviruses have several advantages as vectors over other viral expression systems: (1) coronaviruses are single-stranded rna viruses that replicate within the cytoplasm without a dna intermediary, making integration of the virus genome into the host cell chromosome unlikely, (2) these viruses have the largest rna virus genome and, in principle, have room for the insertion of large foreign genes, (3) a pleiotropic secretory immune response is best induced by the stimulation of gut-associated lymphoid tissues, (4) the tropism of coronaviruses may be modified by manipulation of the spike (s) protein allowing engineering of the tropism of the vector, (5) non-pathogenic coronavirus strains infecting most species of interest (human, porcine, bovine, canine, feline, and avian) are available to develop expression systems, and (6) infectious coronavirus cdna clones are available to design expression systems. within the coronavirus two types of expression vectors have been developed: one requires two components (helper–dependent expression system) and the other a single genome that is modified either by targeted recombination or by engineering a cdna encoding an infectious rna. this chapter focuses on the advantages and limitations of these coronavirus expression systems, the attempts to increase their expression levels by studying the transcription-regulating sequences (trss), and the proven possibility of modifying their tissue and species-specificity. the coronavirus and the torovirus genera form the coronaviridae family, which is closely related to the arteriviridae family. both families are included in the nidovirales order [1, 2] . recently, a new group of invertebrate viruses, the roniviridae, with a genetic structure and replication strategy similar to those of coronaviruses, has been described [3] . this new virus family has been included within the nidovirales [4] . 
coronaviruses have several advantages as vectors over other viral expression systems: (i) coronaviruses are single-stranded rna viruses that replicate within the cytoplasm without a dna intermediary, making integration of the virus genome into the host cell chromosome unlikely [5]; (ii) these viruses have the largest rna virus genome and, in principle, have room for the insertion of large foreign genes [1, 6]; (iii) a pleiotropic secretory immune response is best induced by the stimulation of gut-associated lymphoid tissues. since coronaviruses in general infect the mucosal surfaces, both respiratory and enteric, they may be used to target the antigen to the enteric and respiratory areas to induce a strong secretory immune response; (iv) the tropism of coronaviruses may be modified by manipulation of the spike (s) protein, allowing engineering of the tropism of the vector [7, 8]; (v) non-pathogenic coronavirus strains infecting most species of interest (human, porcine, bovine, canine, feline, and avian) are available to develop expression systems; and (vi) infectious coronavirus cdna clones are available to design expression systems. within the coronaviruses, two types of expression vectors have been developed (fig. 1): one requires two components (helper-dependent expression system) (fig. 1a) and the other a single genome that is modified either by targeted recombination [6] (fig. 1b.1) or by engineering a cdna encoding an infectious rna. infectious cdna clones are available for porcine [9, 10] (fig. 1b.2 and b.3), human (fig. 1b.4) [11], murine [12] and avian (infectious bronchitis virus, ibv) coronaviruses [13], and also for the arteriviruses equine arteritis virus (eav) [14], porcine reproductive and respiratory syndrome virus (prrsv) [15], and simian hemorrhagic fever virus (shfv) [16].
the availability of these cdnas and the application of targeted recombination to coronaviruses [6] have been essential for the development of vectors based on coronaviruses and arteriviruses. this review will focus on the advantages and limitations of these coronavirus expression systems, the attempts to increase their expression levels by studying the transcription-regulating sequences (trss), and the proven possibility of modifying their tissue and species-specificity. coronaviruses comprise a large family of viruses infecting a broad range of vertebrates, from mammalian to avian species. coronaviruses are associated mainly with respiratory, enteric, hepatic and central nervous system diseases. in humans and fowl, coronaviruses primarily cause upper respiratory tract infections, while porcine and bovine coronaviruses (bcovs) establish enteric infections that result in severe economic losses. human coronaviruses (hcovs) are responsible for 10-20% of all common colds, and have been implicated in gastroenteritis, upper and lower respiratory tract infections and rare cases of encephalitis. hcovs have also been associated with infant necrotizing enterocolitis and have been tentatively linked to multiple sclerosis. in march 2003, a new group of hcovs emerged as the etiological agent of the severe acute respiratory syndrome (sars) affecting thousands of people, mostly in china, singapore, and toronto. in addition, human infections by coronaviruses seem to be ubiquitous, as coronaviruses have been identified wherever they have been looked for, including north and south america, europe, and asia, and no human disease other than respiratory and enteric infections has been clearly associated with them. virions contain a single molecule of linear, positive-sense, single-stranded rna (fig. 2). the coronavirus genome, with a size ranging from 27.6 to 31.3 kb, is the largest viral rna known.
coronavirus rna has a 5′ terminal cap followed by a leader sequence of 65-98 nucleotides and an untranslated region of 200-400 nucleotides. at the 3′ end of the genome there is an untranslated region of 200-500 nucleotides followed by a poly(a) tail. the virion rna, which functions as an mrna and is infectious, contains approximately 7-10 functional genes, four or five of which encode structural proteins. the genes are arranged in the order 5′-polymerase-(he)-s-e-m-n-3′, with a variable number of other genes that are believed to be non-structural and largely non-essential, at least in tissue culture [1]. about two-thirds of the entire rna comprises the rep1a and rep1b genes. at the overlap between the rep1a and 1b regions, there is a specific seven-nucleotide "slippery" sequence and a pseudoknot structure (ribosomal frameshifting signal), which are required for the translation of rep1b as a single polyprotein (rep1a/b). the 3′ third of the genome comprises the genes encoding the structural proteins and the other non-structural ones. coronavirus transcription occurs via an rna-dependent rna synthesis process in which mrnas are transcribed from negative-stranded templates. coronavirus mrnas consist of six to eight types of varying sizes, depending on the coronavirus strain and the host species. the largest mrna is the genomic rna, which also serves as the mrna for rep1a and 1b; the remainder are subgenomic mrnas (sgmrnas). the mrnas have a nested-set structure in relation to the genome structure (fig. 2b). coronaviruses are enveloped viruses containing a core that includes the ribonucleoprotein formed by the rna and nucleoprotein n (fig. 2a). the core is formed by the genomic rna, the n protein and the membrane (m) protein carboxy-terminus. most of the m protein is embedded within the membrane but its carboxy-terminus is integrated within the core and seems essential to maintain the core structure [17, 18].
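the nested-set arrangement of the mrnas described above can be made concrete with a small python sketch. it models each sgmrna as the leader fused to the stretch of genome running from that gene's trs to the 3′ end, so every mrna shares the same 3′ end and the mrnas shorten as the gene lies closer to that end. all coordinates below (genome length, leader length, trs positions) are hypothetical values chosen only to respect the size ranges and the s-e-m-n gene order given in the text; they are not taken from any real coronavirus sequence.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    trs_start: int  # hypothetical 0-based genome position of the gene's TRS


def sgmrna_lengths(genome_len: int, leader_len: int, genes: list[Gene]) -> dict[str, int]:
    """Length of each mRNA in a 3'-coterminal nested set.

    Each sgmRNA is modeled as the leader plus the genome segment from the
    gene's TRS to the 3' end (the poly(A) tail is ignored).
    """
    return {g.name: leader_len + (genome_len - g.trs_start) for g in genes}


# Hypothetical values: genome within 27.6-31.3 kb, leader within 65-98 nt.
GENOME_LEN = 28_500
LEADER_LEN = 90
GENES = [Gene("s", 20_300), Gene("e", 25_800), Gene("m", 26_100), Gene("n", 26_900)]

lengths = sgmrna_lengths(GENOME_LEN, LEADER_LEN, GENES)
for name, n in lengths.items():
    print(f"sgmRNA {name}: {n} nt")
```

in this toy layout the mrna for the most 5′-proximal structural gene (s) is the longest and contains the sequences of all downstream genes, which is what "nested set" means here.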
at least in the transmissible gastroenteritis virus (tgev) the m protein presents two topologies. in one (m′), both the amino and the carboxyl termini face the outside of the virion, while in the other (m) the carboxy-terminus is inside [18]. in addition, the virus envelope contains two or three other proteins: the spike (s) protein that is responsible for cell attachment, the small membrane protein (e) and, in some strains, the hemagglutinin-esterase (he) [1]. the replicase gene encodes a protein of approximately 740-800 kda which is co-translationally processed. several domains within the replicase have predicted functions based on regions of nucleotide homology [19]. the coronaviruses have been classified into three groups (1, 2 and 3) based on sequence analysis of a number of coronavirus genes [1]. helper-dependent expression systems have been developed using members of the three groups of coronaviruses (fig. 1). group 1 coronaviruses include porcine, canine, feline and hcovs. expression systems have been developed for the porcine and hcovs since minigenomes are only available for these two coronaviruses. one expression system has been developed using tgev-derived minigenomes [20]. the tgev-derived rna minigenomes were successfully expressed in vitro using t7 polymerase and amplified after in vivo transfection using a helper virus. tgev-derived minigenomes of 3.3, 3.9 and 5.4 kb were efficiently used for the expression of heterologous genes [21, 22]. the smallest minigenome replicated by the helper virus and efficiently packaged was 3.3 kb in length [20]. using the m39 minigenome, a two-step amplification system was developed based on the cloning of a cdna copy of the minigenome after the immediate-early cytomegalovirus promoter (cmv) [20]. minigenome rna is first amplified in the nucleus by the cellular rna pol ii, and the rnas are then translocated into the cytoplasm where they are amplified by the viral replicase of the helper virus.
β-glucuronidase (gus) and a surface glycoprotein (orf5) that is the major protective antigen of prrsv have been expressed using this vector [22]. tgev-derived helper expression systems have a limited stability, and minigenomes without the foreign gene replicate about 50-fold more efficiently than those with the heterologous gene [22]. expression of the gus gene and prrsv orf5 with these minigenomes has been demonstrated in the epithelial cells of alveoli and in scattered pneumocytes of swine lungs, which led to the induction of a strong immune response to these antigens [22]. hcov-229e has also been used to express new sgmrnas [23]. it was demonstrated that a synthetic rna including 646 nt from the 5′ end plus 1465 nt from the 3′ end was amplified by the helper virus. most of the work has been done with mouse hepatitis virus (mhv) defective rnas (minigenomes) [24, 25]. three heterologous genes have been expressed using the mhv system: chloramphenicol acetyltransferase (cat), he, and interferon-γ (ifn-γ). expression of cat or he was detected only in the first two passages because the minigenome used lacks the packaging signal [26]. when virus vectors expressing cat and he were inoculated intracerebrally into mice, he- or cat-specific sgmrnas were only detected in the brains at days 1 and 2, indicating that the genes in the minigenome were expressed only in the early stage of viral infection [27]. an mhv minigenome rna was also developed as a vector for expressing ifn-γ. murine ifn-γ was secreted into culture medium as early as 6 h post-transfection and reached a peak level at 12 h post-transfection. no inhibition of virus replication was detected when the cells were treated with ifn-γ produced by the minigenome rna, but infection of susceptible mice with a minigenome producing ifn-γ caused significantly milder disease, accompanied by less virus replication than that caused by virus containing a control vector [25, 28].
ibv is an avian coronavirus with a single-stranded, positive-sense rna genome of 27,608 nt. a defective rna (cd-61) derived from the beaudette strain of ibv was used as an rna vector for the expression of two reporter genes, luciferase and cat [29]. helper-dependent expression systems have a limited stability, probably due to the foreign gene, since tgev minigenomes of 9.7, 3.9 and 3.3 kb, in the absence of the heterologous gene, are amplified and efficiently packaged for at least 30 passages without generating new dominant subgenomic rnas [20]. the expression of gus, prrsv orf5, or cat using tgev- or ibv-derived minigenomes in general increases until passages three or four, is maintained for about four additional passages, and steadily decreases during successive passages [20-22, 29]. using ibv minigenomes, cat expression levels between 1 and 2 µg per 10^6 cells have been described. the highest expression levels (2-8 µg of gus per 10^6 cells) have been obtained using a two-step amplification system based on tgev-derived minigenomes with optimized trss [20, 21]. using minigenomes derived from tgev and ibv, expression was highly dependent on the nature of the heterologous gene used. luciferase expression with tgev and ibv minigenomes was reduced to almost background levels, while expression of gus or cat was at least 100- and 1000-fold higher than background levels, respectively. the construction of cdna clones encoding full-length coronavirus rnas has considerably improved the genetic manipulation of coronaviruses. the enormous length of the coronavirus genome and the instability of plasmids carrying coronavirus replicase sequences have hampered, until recently, the construction of a full-length cdna clone. infectious cdna clones have been described for coronaviruses [9, 10, 13] and for arteriviruses [14, 15].
the strategy used to clone tgev infectious cdna was based on three points [9]: (i) the construction was started from a minigenome that was stably and efficiently replicated by the helper virus [20]. during the filling in of minigenome deletions a cdna fragment that was toxic to the bacterial host was identified. this fragment was reintroduced into the cdna in the last cloning step; (ii) in order to express the long coronavirus genome and to add the 5′ cap required for tgev rna infectivity, a two-step amplification system that couples transcription in the nucleus from the cmv promoter with a second amplification in the cytoplasm driven by the viral polymerase was used; and (iii) to increase viral cdna stability within bacteria, the cdna was cloned as a bacterial artificial chromosome (bac), which produces a maximum of two plasmid copies per cell. a fully functional infectious tgev cdna clone, leading to a virulent virus infecting both the enteric and respiratory tract of swine, was engineered. the stable propagation of a tgev full-length cdna in bacteria as a bac has been considerably improved by the insertion of an intron to disrupt a toxic region identified in the viral genome (fig. 3) [30]. the viral rna was expressed in the cell nucleus under the control of the cmv promoter and the intron was efficiently removed during translocation of this rna to the cytoplasm. intron insertion in two different positions (nt 9466 and 9596) allowed stable plasmid amplification for at least 200 generations. infectious tgev was efficiently recovered from cells transfected with the modified cdnas. the great advantage of this system is that the performance of coronavirus reverse genetics only involves recombinant dna technologies carried out within the bacteria. using tgev cdna, the green fluorescent protein (gfp) gene of 0.72 kb was cloned in two positions of the rna genome: either by replacing the non-essential 3a and 3b genes or between genes n and 7.
the engineered genome was very stable (>30 passages in cultured cells) and led to the production of high expression levels (50 µg/10^6 cells) when the gfp replaced genes 3a/b, but was unstable when cloned between genes n and 7 [31]. in this case, the gfp gene was eliminated by homologous recombination between preexisting trs sequences and those introduced to express gfp. using the most stable vector, the acquisition of immunity by newborn piglets breast fed by immunized sows (lactogenic immunity) was demonstrated [31]. gus expression levels using coronavirus-based vectors are similar (fig. 4) to those described for vectors derived from other positive-strand rna viruses such as sindbis virus (50 µg per 10^6 cells) [32]. a second procedure to assemble a full-length infectious construct of tgev was based on the in vitro ligation of six adjoining cdna subclones that span the entire tgev genome [10]. each clone was engineered with unique flanking interconnecting junctions that determine a precise assembly with only the adjacent cdna subclones, resulting in a tgev cdna. in vitro transcripts derived from the full-length tgev construct were infectious. using this construct, a recombinant tgev was assembled that replaced orf 3a with the gfp gene, leading to the production of a recombinant tgev that grew with titers of 10^8 pfu/ml and expressed gfp in a high proportion of cells [33]. an infectious cdna clone has also been constructed for hcov-229e, another member of group 1 coronaviruses [11]. in this case, the system is based on the in vitro transcription of infectious rna from a cdna copy of the hcov-229e genome that has been cloned and propagated in vaccinia virus (fig. 5). briefly, the full-length genomic cdna clone of hcov-229e was assembled by in vitro ligation, and then cloned into the vaccinia virus dna under control of the t7 promoter.
recombinant vaccinia viruses containing the hcov-229e genome were recovered after transfection of the recombinant vaccinia virus dna into cells infected with fowlpox virus. in a second phase, the recombinant vaccinia virus dna was purified and used as a template for in vitro transcription of hcov-229e genomic rna that was transfected into susceptible cells for the recovery of infectious recombinant coronavirus (fig. 5). a coronavirus replicon has been derived from the hcov genome using the same procedure described for the full-length genome construction [34]. this replicon included the 5′ and 3′ ends of the hcov-229e genome, the replicase gene of this virus and a single reporter gene coding for gfp located downstream of a trs element for coronavirus mrna transcription. when rna transcribed from this cdna was transfected into bhk-21 cells, only 0.1% of the cells showed strong fluorescence. these data show that the coronavirus replicase gene products suffice for discontinuous sgmrna transcription, in agreement with the requirements for the arterivirus replicase [35]. the expression of a heterologous gene (gfp) by a tgev replicon was increased between 300- and 400-fold when tgev n protein was co-expressed in cis [36]. in addition, expression from a tgev replicon was also observed when n protein was co-expressed in trans using the venezuelan equine encephalitis virus vector [36]. furthermore, expression from hcov-based vectors was also significantly increased by co-expression of the n gene. therefore, it seems that n protein either stabilizes coronavirus replicons or increases their replication, transcription or translation. reverse genetics in this coronavirus group has been efficiently performed by targeted recombination between a helper virus and either non-replicative or replicative coronavirus-derived rnas (fig. 2b.1).
this approach, developed by masters' group [6, 37], was first applied to the engineering of a five-nucleotide insertion into the 3′ untranslated region (3′ utr) of mhv [37]. this approach was facilitated by the availability of an n gene mutant, designated alb4, that was both temperature-sensitive and thermolabile. alb4 forms tiny plaques at restrictive temperature that are easily distinguishable from wild-type plaques. in addition, incubation of alb4 virions at non-permissive temperature results in a 100-fold greater loss of titer than for wild-type virions. these phenotypic traits allowed the selection of recombinant viruses generated by a single crossover event following cotransfection into mouse cells of alb4 genomic rna together with a synthetic copy of the smallest subgenomic rna (rna7) tagged with a marker in the 3′ utr. an improvement of the recombination frequency was obtained between the helper virus and replicative defective rnas as the donor species. whereas between replication-competent mhv and non-replicative rnas a recombination frequency of the order of 10^-5 was estimated, the use of replicative donor rna yielded recombinants at a rate some three orders of magnitude higher [38]. this higher efficiency made it possible to screen for recombinants even in the absence of selection. in this manner, the transfer of a silent mutation in the rep1a gene of a minigenome to wild-type mhv at a frequency of about 1% was demonstrated. targeted recombination has been applied to the generation of mutants in most of the coronavirus genes. thus, two silent mutations have been created so far in gene 1a [38]. the s protein has also been modified by targeted recombination. changes were introduced by one crossover event at the 5′ end of the s gene that modified mhv pathogenicity [39].
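the practical meaning of the two recombination frequencies quoted above can be illustrated with a short python sketch. it takes the order-of-magnitude figures from the text (about 10^-5 with non-replicative donor rna, about three orders of magnitude higher with replicative donor rna) and asks how many plaques would have to be screened, without any selection, to have a given chance of picking at least one recombinant. the independence of plaques and the 95% target are assumptions made for this illustration, not parameters from the cited studies.

```python
import math


def prob_at_least_one(freq: float, n_screened: int) -> float:
    """Chance that n independently picked plaques include >= 1 recombinant."""
    return 1.0 - (1.0 - freq) ** n_screened


def plaques_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(freq, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


F_NON_REPLICATIVE = 1e-5              # order of magnitude quoted in the text
F_REPLICATIVE = F_NON_REPLICATIVE * 1e3  # "three orders of magnitude higher"

for label, f in [("non-replicative donor", F_NON_REPLICATIVE),
                 ("replicative donor", F_REPLICATIVE)]:
    print(f"{label}: frequency {f:.0e}, plaques for 95% chance: {plaques_needed(f)}")
```

under these assumptions a frequency near 1% brings the screening effort down to a few hundred plaques, which is consistent with the statement that recombinants could be found even in the absence of selection.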
targeted recombination mediated by two cross-overs allowed the replacement of the s gene of a respiratory strain of tgev by the s gene of enteric tgev strain pur-c11, leading to the isolation of viruses with a modified tropism and virulence [7]. in this case the recombinants were selected in vivo using their new tropism in piglets. a new strategy for the selection of recombinants within the s gene, after promoting targeted recombination, was based on elimination of the parental replicative tgev by the simultaneous neutralization with two mabs (i. sola and l. enjuanes, unpublished results). mutations have also been introduced by targeted mutagenesis within the e and m genes. these mutants provided corroboration for the pivotal role of e protein in coronavirus assembly and identified the carboxyl terminus of the m molecule as crucial to assembly [40]. targeted recombination was also used to express heterologous genes. for instance, the gene encoding gfp was inserted into mhv between genes s and e, resulting in the creation of the largest known rna viral genome [41]. an infectious mhv cdna clone has recently been assembled in vitro. a method similar to the one developed to assemble an infectious tgev cdna clone, based on the in vitro ligation of seven contiguous cdna subclones, has been applied to the construction of a cdna that spanned the 31.5 kb of the mhv a59 strain [12]. the ends of the cdnas were engineered with unique junctions, which directed assembly with only the adjacent cdna subclones, resulting in an intact mhv-a59 cdna construct. the interconnecting restriction site junctions that are located at the ends of each cdna are systematically removed during the assembly of the complete full-length cdna product, allowing reassembly without the introduction of nucleotide changes.
rna transcripts derived from the full-length mhv-a59 construct were infectious, although virus recovery was enhanced 10-15-fold in the presence of rna transcripts encoding the nucleocapsid protein, n. the infectious ibv cdna clone was assembled using the same strategy reported for hcov-229e with some modifications [13]. similarly to hcov-229e, the ibv genomic cdna was assembled downstream of the t7 promoter by in vitro ligation and cloned into the vaccinia virus dna. however, recovery of recombinant ibv was done after the in situ synthesis of infectious ibv rna by transfection of restricted recombinant vaccinia virus dna (containing the ibv genome) into primary chick kidney cells previously infected with a recombinant fowlpox expressing t7 rna polymerase. engineered cdnas are having an important impact on the study of mechanisms of coronavirus replication and transcription and provide an invaluable tool for the experimental investigation of virus-host interactions. replication-competent propagation-deficient virus vectors based on tgev genomes deficient in the essential gene e that are complemented in packaging cell lines have been developed [33, 42]. two types of cell lines expressing tgev e protein have been established, one with transient expression using the non-cytopathic sindbis virus replicon psinrep21 (fig. 6) and another stably expressing the e gene under the cmv promoter. the rescue of recombinant tgev deficient in the non-essential 3a and 3b genes and the essential e gene reached high titers (>6 × 10^6 pfu/ml) in cells transiently expressing the tgev e protein, while this titer was up to 5 × 10^5 pfu/ml in packaging cell lines stably expressing protein e. interestingly, virus titers were related to protein e expression levels [42]. recovered virions showed the same morphology and stability at different ph and temperatures as the wild-type virus.
a second strategy for the construction of replication-competent propagation-defective tgev genomes expressing heterologous genes involves the assembly of an infectious cdna from six cdna fragments that are ligated in vitro [33]. the defective virus with the essential e gene deleted was complemented by the expression of the e gene using the venezuelan equine encephalitis replicon expression vector. however, titers of recombinant tgev-Δe expressing the gfp were at least 10- to 100-fold lower (around 10^4 pfu/ml) than with the system using stably transformed cells or the sin vector to complement deletion of the e gene [42]. coronavirus minigenomes have a theoretical cloning capacity close to 27 kb, since their rna with a size of about 3 kb is efficiently amplified and packaged by the helper virus and the virus genome has about 30 kb. in contrast, the theoretical cloning capacity for an expression system based on a single coronavirus genome like tgev, according to currently available knowledge, is between 3 and 3.5 kb, taking into account that: (i) the non-essential genes 3a (0.2 kb), 3b (0.73 kb), and most of gene 7 [43] have been deleted leading to a viable virus; (ii) the standard s gene can be replaced by the s gene of a porcine respiratory coronavirus (prcv) mutant with a deletion of 0.67 kb; and (iii) both dna and rna viruses may accept genomes with sizes up to 105% of the wild-type genome. this cloning capacity will probably be enlarged by deleting non-essential domains of the replicase gene. these domains are being identified by comparing the arterivirus replicase gene (i.e., for eav 9.7 × 10^3 nt) and that of coronavirus (i.e., for tgev 20.3 × 10^3 nt) [19]. differences in size between these two replicase genes could correspond to non-essential domains in the coronavirus replicase that may be dispensable.
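the cloning-capacity estimates above follow from simple arithmetic, sketched here in python using only the sizes quoted in the text. the genome is taken as "about 30 kb" as stated; the size of the deletable part of gene 7 is not given in the text, so it is left out and the single-genome figure should be read as a lower bound within the quoted 3-3.5 kb range.

```python
GENOME_KB = 30.0        # "the virus genome has about 30 kb"
MINIGENOME_KB = 3.0     # minimal efficiently packaged minigenome, about 3 kb
OVERSIZE_FRACTION = 0.05  # genomes up to 105% of wild-type size are accepted

# Deletions quoted in the text (gene 7 omitted: its size is not stated).
DELETABLE_KB = {
    "gene 3a": 0.2,
    "gene 3b": 0.73,
    "s gene deletion (prcv mutant)": 0.67,
}


def minigenome_capacity(genome_kb: float, minigenome_kb: float) -> float:
    """Room left when a minigenome is packaged up to full genome size."""
    return genome_kb - minigenome_kb


def single_genome_capacity(genome_kb: float, deletable_kb: dict[str, float],
                           oversize_fraction: float) -> float:
    """Space freed by deletions plus the tolerated oversize of the genome."""
    return sum(deletable_kb.values()) + oversize_fraction * genome_kb


helper_dependent = minigenome_capacity(GENOME_KB, MINIGENOME_KB)
single_genome = single_genome_capacity(GENOME_KB, DELETABLE_KB, OVERSIZE_FRACTION)
print(f"helper-dependent system: ~{helper_dependent:.0f} kb")
print(f"single-genome system:    ~{single_genome:.1f} kb (gene 7 not counted)")
```

with these inputs the single-genome estimate is 1.6 kb of deletions plus 1.5 kb of tolerated oversize; adding the deletable portion of gene 7 would move the figure further into the 3-3.5 kb range given in the text.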
to optimize expression levels it is essential to improve virus vector replication levels without increasing virulence, to optimize the accumulation of total mrna levels, and to improve mrna translation. these results can only be achieved by determining the mechanism involved in these processes. a brief review of the mechanism of mrna transcription in coronaviruses and arteriviruses is presented to help achieve this goal. coronavirus rna synthesis occurs in the cytoplasm via a negative-strand rna intermediate that contains short stretches of oligo(u) at the 5′ end. both genome-size and subgenomic negative-strand rnas, which correspond in number of species and size to those of the virus-specific mrnas, have been detected [44, 45]. coronavirus mrnas have a leader sequence at their 5′ ends. at the start site of every transcription unit on the viral genomic rna, there is a trs that includes a highly conserved core sequence (cs) that is nearly homologous to the 3′ end of the leader rna. this sequence constitutes part of the signal for sgmrna transcription. the common 5′ leader sequence is only found at the very 5′ terminus of the genome, which implies that the synthesis of sgmrnas involves fusion of non-contiguous sequences. the mechanism involved in this process is under debate. nevertheless, the model of discontinuous transcription during negative-strand rna synthesis is compatible with most of the experimental evidence [45-47]. because the leader-mrna junction occurs during the synthesis of the negative strand within the sequence complementary to the cs (ccs), the nature of the cs is considered crucial for mrna synthesis. transcription levels may be influenced by many factors. the three that we [21] consider most relevant are: (i) potential base pairing between the leader 3′ end and sequences complementary to the trs located at the 5′ end of each nidovirus gene (ctrs), that guide the fusion between the nascent negative strand and the leader trs.
a minimum complementarity is needed between the leader-trs and the ctrss of each gene. extension of this complementarity increases mrna synthesis up to a certain extent, beyond which the addition of 5′ or 3′ cs flanking sequences does not help transcription [21, 48, 49]; (ii) proximity of a gene to the 3′ end. since the trss act as signals to slow down or stop the replicase complex, the smaller mrnas should be the most abundant. although this has been shown to be the case in the mononegavirales [50], and in coronaviruses shorter mrnas are in general more abundant, the relative abundance of coronavirus mrnas is not strictly related to their proximity to the 3′ end [21, 51]; and (iii) potential interaction of proteins with the trs rna, and protein-protein interactions that could regulate transcription levels. the reassociation of the nascent rna chain with the leader trs is probably mediated by approximation of the leader trs through rna-protein and protein-protein interactions. the three factors implicated in the control of mrna abundance assume a key role for the trs. hence, in order to engineer vectors with high expression levels, it seems relevant to define the characteristics of the trs, including the size of the 5′ and 3′ trs sequences flanking the cs. the cs of coronaviruses belonging to groups i (hexamer 5′-cuaaac-3′) and ii (heptamer 5′-ucuaaac-3′) share homology, whereas the cs of coronaviruses belonging to group iii, like that of ibv, have the most divergent sequence (5′-cuuaacaa-3′). also, arterivirus css have a sequence (5′-ucaacu-3′) that partially resembles that of ibv. thus, the css of different coronaviruses are quite similar though slightly different in length.
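the role of the cs and of leader-body complementarity described above can be illustrated with a minimal python sketch. it does two things: it locates occurrences of the core sequences quoted in the text within an rna string, and it counts watson-crick pairs between a leader cs and the ccs that would be copied from a body cs on the nascent negative strand. the toy rna, the mutant sequences and the function names are hypothetical, and g:u wobble pairing and flanking sequences are ignored for simplicity.

```python
# Core sequences as quoted in the text (written 5'->3').
CORE_SEQUENCES = {
    "group 1": "CUAAAC",
    "group 2": "UCUAAAC",
    "group 3": "CUUAACAA",
    "arterivirus": "UCAACU",
}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_cs(rna: str, cs: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) CS match."""
    rna = rna.upper().replace("T", "U")
    hits, i = [], rna.find(cs)
    while i != -1:
        hits.append(i)
        i = rna.find(cs, i + 1)
    return hits


def reverse_complement(rna: str) -> str:
    """Negative-strand copy of an RNA segment, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(rna))


def paired_positions(leader_cs: str, body_cs: str) -> int:
    """Watson-Crick pairs between the leader CS and the cCS of a body CS."""
    ccs = reverse_complement(body_cs)
    # Antiparallel duplex: leader_cs[i] faces ccs[-1 - i].
    return sum(PAIR[l] == c for l, c in zip(leader_cs, reversed(ccs)))


toy_rna = "GGAUCUAAACGAUUGCCUAAACUU"  # hypothetical sequence
print(find_cs(toy_rna, CORE_SEQUENCES["group 1"]))
print(find_cs(toy_rna, CORE_SEQUENCES["group 2"]))

wild_type = CORE_SEQUENCES["arterivirus"]
mutant = "UCGACU"  # hypothetical single-base change
print(paired_positions(wild_type, wild_type))  # full complementarity
print(paired_positions(mutant, wild_type))     # one mismatch
print(paired_positions(mutant, mutant))        # matching change restores pairing
```

the last three lines show why complementarity, rather than the exact sequence, is what this simple count tracks: a change in only one partner lowers the number of pairs, while the same change in both partners restores it.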
this cs is essential for mrna synthesis, and can be considered a defined domain in the trs because it is particularly conserved within a nidovirus family, while the flanking sequences, both at the 5′ (5′ trss) and at the 3′ (3′ trss) side, have a unique composition for each gene even within the same virus. the influence of the cs on transcription has been analyzed in detail in the arteriviruses [46, 47]. using an infectious cdna clone of eav it has been shown that sgmrna synthesis requires a base-pairing interaction between the leader trs and the ctrs in the viral negative strand. the construction of double mutants in which a mutant leader cs was combined with the corresponding mutant rna7 body cs, resulting in the specific restoration of mrna7 synthesis, suggested that the sequence of the cs per se is not crucial, as long as the possibility for cs base pairing is maintained. nevertheless, it has been shown that other factors, besides leader-body base pairing, also play a role in sgmrna synthesis and that the primary sequence (or secondary structure) of trss may dictate strong base preferences at certain positions [46]. in addition, detailed analysis of the trss used in the arteriviruses [47], mhv [52], bcov [53], and tgev [31] indicates that non-canonical cs sequences may also be used for the switch during the discontinuous synthesis of the negative strand during transcription in the nidovirales. the promotion of transcription from a given cs is also a function of the cs flanking sequences. data from different laboratories working with different nidoviruses have shown that cs flanking sequences can critically influence the strength of a given fusion site [21, 48, 49, 53-55]. although approximations to the definition of the trs have been made, the precise length of the trs requires further work to optimize accumulation of mrna levels. studies on coronavirus transcription were performed using more than one cs to express the same mrna.
the accumulated amounts of sgmrna remained nearly the same for constructs with one to three css, and transcription preferentially occurred at the 3′-most trs [29, 56-58]. this observation is consistent with the coronavirus model of discontinuous transcription during negative-strand synthesis [59]. driving vector expression to different tissues may be highly convenient in order to preferentially induce a specific type of immune response, e.g., mucosal immunity by targeting the expression to gut-associated lymph nodes. in addition, it seems useful to change the species specificity of the vector to expand its use. both tissue- and species-specificity have been modified using coronavirus genomes. group 1 coronaviruses attach to host cells through the s glycoprotein by interactions with aminopeptidase n (apn), which is the cellular receptor [60, 61]. group 2 coronaviruses use the carcinoembryonic antigen-related cell adhesion molecules (ceacam) as receptors. engineering the s gene can lead to changes in both tissue- and species-specificity [7, 8]. a change in tropism in general leads to a change in virulence. this is certainly the case for porcine coronavirus, whose virulence is directly related to its ability to grow in the enteric tract [7]. gene expression among the non-segmented negative-stranded rna viruses is controlled by the highly conserved order of genes relative to the single transcriptional promoter. rearrangement of the genes of vesicular stomatitis virus eliminates clinical disease in the natural host and is considered a new strategy for vaccine development [62]. in coronaviruses, genes closer to the 3′ end are in general expressed more abundantly than 3′ end-distal ones and, in principle, a change in gene order can also lead to virus attenuation (p. rottier, personal communication). the arteriviridae include four members: eav, prrsv, shfv and lactate dehydrogenase-elevating virus of mice (ldhv).
defective genomes of eav have been isolated and used to express a reporter gene (cat) in cell culture [35]. more interestingly, infectious cdna clones have been obtained for eav [14], prrsv [14, 63] and shfv [16], creating the possibility of specifically altering their genomes for vector development and vaccine production. to insert genes in different positions, a unique restriction endonuclease site has been introduced between consecutive eav genes [63]. the viruses recovered expressed epitopes of nine amino acids from mhv within the ectodomain of the membrane (m) protein for at least three passages [35]. foreign epitopes have also been expressed by using prrsv vectors [64]. both helper-dependent expression systems, based on two components, and single genomes constructed by targeted recombination, or by using infectious cdnas, have been developed for coronaviruses. the sequences that regulate transcription have been characterized mainly using helper-dependent expression systems. these expression systems have the advantage of a large cloning capacity, in principle higher than 27 kb; they produce reasonable amounts of heterologous antigens (2-8 µg/10⁶ cells), show a limited stability (synthesis of the heterologous gene is maintained for around 10 passages), and elicit strong immune responses. in contrast, coronavirus vectors based on single genomes have at present a limited cloning capacity (3-3.5 kb), but their expression levels of heterologous genes are 10-fold higher than those of helper-dependent systems (>50 µg/10⁶ cells) and they are very stable (>30 passages). furthermore, replication-competent propagation-deficient expression systems based on coronavirus genomes have been developed, increasing the safety of these vectors. the possibility of expressing different genes under the control of trss with programmable strength, and of engineering the tissue and species tropism, indicates that coronavirus vectors are very flexible.
thus, coronavirus-based vectors are emerging with a high potential for vaccine development and, possibly, for gene therapy. this work has been supported by grants from the comisión interministerial de ciencia y tecnología (cicyt), la consejería de educación y cultura de la comunidad de madrid, and fort dodge veterinaria from spain, and the european communities (life sciences program, key action 2: infectious diseases).
key: cord-339012-4juhmjaj authors: hou, wei; liu, fei; van der poel, wim h.m.; hulst, marcel m. title: rapid host response to an infection with coronavirus. study of transcriptional responses with porcine epidemic diarrhea virus date: 2020-07-28 journal: biorxiv doi: 10.1101/2020.07.28.224576 sha: doc_id: 339012 cord_uid: 4juhmjaj the transcriptional response in vero cells (atcc® ccl-81) infected with the coronavirus porcine epidemic diarrhea virus (pedv) was measured by rnaseq analysis 4 and 6 hours after infection. differentially expressed genes (degs) in pedv infected cells were compared to degs responding in vero cells infected with mammalian orthoreovirus (mrv). functional analysis of mrv and pedv degs showed that mrv increased the expression level of several cytokines and chemokines (e.g. il6, cxcl10, il1a, cxcl8 [alias il8]) and antiviral genes (e.g. ifi44, ifit1, mx1, oasl), whereas for pedv no enhanced expression was observed for these “hallmark” antiviral and immune effector genes. pathway and gene ontology “enrichment analysis” revealed that pedv infection did not stimulate expression of genes able to activate an acquired immune response, whereas mrv did so within 6h.
instead, pedv down-regulated the expression of a set of zinc finger proteins with putative antiviral activity and enhanced the expression of the transmembrane serine protease gene tmprss13 (alias mspl) to support its own infection by virus-cell membrane fusion (shi et al, 2017, viruses, 9(5):114). pedv also down-regulated expression of ectodysplasin a, a cytokine of the tnf-family able to activate the canonical nfkb-pathway responsible for transcription of inflammatory genes like il1b, tnf, cxcl8 and ptgs2. the only 2 cytokine genes found up-regulated by pedv were cardiotrophin-1, an il6-type cytokine with pleiotropic functions on different tissues and types of cells, and endothelin 2, a neuroactive peptide with vasoconstrictive properties. furthermore, by comprehensive datamining in biological and chemical databases and consulting related literature we identified sets of pedv-response genes with potential to influence i) the metabolism of biogenic amines (e.g. histamine), ii) the formation of cilia and “synaptic clefts” between cells, iii) epithelial mucus production, iv) platelet activation, and v) physiological processes in the body regulated by androgenic hormones (like blood pressure, salt/water balance and energy homeostasis). the information in this study describing a “very early” response of epithelial cells to an infection with a coronavirus may provide pharmacologists and immunological and medical specialists with additional insights into the mechanisms underlying the severe clinical symptoms associated with coronaviruses, including those induced by sars-cov-2. this may help them to fine-tune therapeutic treatments and apply specific approved drugs to treat covid-19 patients. the lack of knowledge for treating hospitalized sars-cov-2 infected patients is one of the pressing problems of the current covid-19 pandemic.
sars-cov-2 shows a close genetic similarity to the sars virus identified in april 2003 (sars-cov-1) and to other sars-related coronaviruses isolated from humans and bats. sars-cov-2 induces clinical respiratory symptoms similar to those of the 2003 virus, mostly in persons with underlying diseases like copd, heart failure, diabetes and obesity (1: wu et al. 2020). although sars-cov-1 has been studied extensively in the last two decades, there are no vaccines available yet, nor are there effective prophylactic and therapeutic treatment regimens with drugs that work equally well for each individual patient with sars-induced respiratory problems. such treatments might prevent development of severe disease patterns like "acute respiratory distress syndrome" (ards) and other, often fatal complications, and may decrease the case-fatality rate of sars-cov-2 infections. in our lab we study the alpha-coronavirus pedv. pedv was first detected in pig herds in 1977 in europe (2: pensaert and de bouck 1978). however, this virus reemerged in the spring of 2013 in north america, causing a massive outbreak among pig herds and resulting in the death of about 30% of the suckling piglets due to severe diarrhea and dehydration ( although several studies concluded that these clinical symptoms were caused by mrv itself, in concordance with the co-existence of mrv3 in pedv infected piglets, other mrv serotypes were also isolated from hospitalized patients with airway problems diagnosed positive for sars-cov-1 (11: cheng et al. 2009, 12: duan et al. 2003, 13: zuo et al. 2003). recently, a cross-family recombinant coronavirus was isolated in china from bat faeces in which an rna sequence originating from the s1 segment of mrv was inserted in the coronavirus genome between the n and ns7a genes, indicating that both viruses were replicating simultaneously in a single cell in bats (14: huang et al. 2016).
a prevalence study showed that this cross-family recombinant coronavirus circulated in an isolated bat colony in a cave in china (15: obameso et al. 2017). this co-occurrence of mrv with coronaviruses raised the questions whether a synergistic effect between both viruses exists and whether such coexistence plays a role in viral pathogenesis. therefore we studied the host response in cultured cells early (4 and 6 hours) after pedv and mrv infection using rnaseq. our original goal was to identify early factors and processes induced by pedv or mrv that could stimulate or influence the replication and pathogenesis of the other virus. the host, tissue and cell tropism of pedv differs from that of sars-cov-1 and -2. however, the genomic organization, the replication strategy and the function of part of the viral nonstructural proteins are shared among all coronaviruses (16: brian and baric 2005). this applies particularly to interactions in infected cells of nonstructural coronavirus proteins with specific host proteins, i.e. host proteins that are recruited or silenced to support virus replication, assembly and release. in our experiment we used vero cells (cercopithecus aethiops epithelial kidney cell line; atcc® ccl-81) because these cells support efficient infection and replication of both mrv and pedv. vero cells are susceptible to many coronaviruses, including sars-cov-1 and -2 (17: chu et al. 2020). they originate from epithelial tissue, in part resembling nasal and bronchial epithelium cells, the prime target cells infected by sars-cov-2 in the airways of humans. recent research showed that sars-cov-2 is also able to replicate in epithelial cells of human small intestinal organoids (18: lamers et al. 2020). a disadvantage of vero cells is a deletion in the type i interferon (ifn) gene cluster on chromosome 12 (19: osada et al. 2014). therefore, these cells lack expression of type i ifns important for activation of antiviral defense mechanisms.
however, research has shown that vero cells bypass this ifn-activation route and can mount an antiviral response mediated by interferon regulatory factor 3 (20: chew et al. 2009). single infections with pedv or mrv3 alone and simultaneous (double) infections of vero cells with both viruses were performed using a maximum multiplicity of infection (moi) to achieve a synchronized infection of all cells. expression levels of mrna transcripts/genes measured by rnaseq in infected cells were compared to rnaseq profiles from similarly treated mock-infected cells harvested at the same time point after infection. the detected sets of differentially expressed genes (degs) for pedv and mrv were analyzed by gene set enrichment analysis (gsea) using functional bioinformatic programs to retrieve biological processes (pathways and gene ontology terms [go-terms]) and associations with chemical compounds, including drugs. in addition, we searched the literature for functional information on the pedv-degs to find possible associations with sars-cov-2 pathogenesis. because of the covid-19 pandemic we gave priority to publishing the results of this functional bioinformatic analysis and datamining for the vero cells infected with pedv alone, separately from the results of the double infections with mrv3. in this report we focused on the "very early" host response of epithelial cells to an infection with the coronavirus pedv and paid less attention to the role of specific viral proteins in this host response. our results were partly in agreement with those of a previous rnaseq study comparing sars-cov-2 and influenza host responses (21: blanco-melo et al. 2020). but we also found associations with biological processes, and pivotal genes/proteins acting in these processes, that had not been recognized before.
this information may contribute to the search for novel or alternative preventive or therapeutic drugs and treatment protocols for this devastating covid-19 disease. a time-dependent infection experiment was performed with cultured vero cells. details are described in supplementary file 1 (material and methods) and visually displayed in this file. briefly, overnight cultured vero cells grown in 2 cm² wells were mock-infected, or infected with mrv3 strain wbvr (7: hulst et al. 2017) or pedv strain cv777 (2: pensaert and de bouck 1978, 22: rasmussen et al. 2018) with a multiplicity of infection of ≥1 for 30 min at 4°c. for pedv and the corresponding mock-infected cells, 10 µg/ml of trypsin in serum-free medium was used to facilitate infection of vero cells during the whole experiment. all virus and mock infections were performed in quadruplicate for each timepoint. after incubation for 30 min at 4°c, the virus was discarded and the cells were washed twice and supplied with fresh culture medium. cells were incubated for 0, 2, 4, 6, and 16h at 37°c and 5% co2. after incubation for the indicated times, cells were placed on ice before total rna was isolated from three of the quadruplicate wells. the replication of both viruses in vero cells was monitored using virus-specific rt-qpcr tests (fig. 1: methods and primers used for pcr are provided in supplementary file 1). in addition, cells in one of the quadruplicate wells incubated for 16h were fixed and stained with antibodies directed against the s2 spike protein of pedv and the s1 attachment protein (σ1) of mrv3. a decrease in ct-values for pedv was not observed before 6 h post inoculation (6 h.p.i.), indicating that replication in pedv infected cells started later than was observed for mrv (at 4 h.p.i.). staining of the cells after 16h indicated that nearly all vero cells were infected with mrv3 and more than 50% with pedv.
also more than 50% of the cells in the 16h wells appeared as fused cells (syncytia), confirming that more than 50% of the cells were infected with pedv. quality control of the total rna isolated from infected cells using an agilent bioanalyzer showed that rnas isolated from pedv infected wells at 16 h.p.i. were partially degraded (rin values below 9), making them unsuitable for rnaseq analysis. therefore, only the 0, 4 and 6h timepoints were analyzed using rnaseq. stained with a monoclonal antibody directed against the s2 spike protein. mrv and mock infected cells were stained with a polyclonal rabbit serum raised against a peptide sequence of the s1-attachment protein of mrv serotype 3. nuclei were stained blue with the hoechst, 4',6-diamidino-2-phenylindole dye. equal amounts of total rna isolated from triplicate wells were pooled and subjected to rnaseq analysis by genomescan b.v. (leiden, the netherlands) using next generation sequencing (ngs) (see supplementary file 2a for details). mapping of ngs reads to the cercopithecus aethiops reference genome and preparation of datafiles with the calculated fold change (fc) of expression levels of mapped mrnas were performed for each comparison at 0, 4, and 6h by genomescan (see supplementary file 2b). from these datafiles we extracted lists of degs with a fc>2 and a p-value of <0.05. after accessing the ncbi, panther or kegg databases for human orthologs, unannotated cercopithecus aethiops degs were annotated with an official hugo gene symbol (http://www.genenames.org). in supplementary file 3, sheet "pedv-mrv degs fc>2", lists of all annotated pedv and mrv degs are presented with their fc. in a separate sheet, "pedv-degs functional info", all 266 individual degs regulated by pedv at 4 and 6 h.p.i.
are presented with their fc, information about their function and the types of human cells in which expression of the gene is relatively high compared to other human cells (retrieved from the "primary cell atlas" dataset of biogps: http://biogps.org/). note that all tables in these excel sheets of supplementary file 3 are sortable using the headers. in all results paragraphs below, information about the biological function of degs was retrieved by consulting the "genecards" (weizmann institute of science: https://www.genecards.org/) and ncbi gene reports (entrez gene: https://www.ncbi.nlm.nih.gov/gene/), and literature linked to these reports (for references about these biological functions of genes/proteins we refer to publications cited in these reports; "genecards" weblinks are provided in supplementary file 3). sets of pedv and mrv degs were analyzed using the gsea program geneanalytics (lifemap sciences, inc.), and pathways (for mrv and pedv), go-terms (not for mrv), and associations with compounds/drugs (not for mrv) with a high or medium score (p-value <0.05) were retrieved and listed in 3 separate sheets in supplementary file 3 (sheets "mrv-pedv pathways", "pedv g0-terms" and "pedv compounds"). similar and related pathways retrieved for both pedv and mrv, and remarkable pedv pathways, go-terms and compound associations are summarized in table 1. for pedv all degs within these pathways are provided with their regulation, up (green) or down (red). for mrv only degs in common with pedv-degs are listed in table 1 (see sheet "mrv-pedv pathways" in supplementary file 3 for all mrv-degs acting in these pathways). subsets of pedv-degs were selected matching the terms "chemokines-cytokines", "antiviral", and terms related to the pathogenesis of covid-19 (explained below) using the genotyping program varelect (lifemap sciences, inc.) and displayed in supplementary file 3 in separate sheets: "chemokines-cytokines", "(anti)-viral", etc.
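the deg selection described above (fc>2 and p-value <0.05) amounts to a simple threshold filter. the following python sketch illustrates it under stated assumptions: the records, gene names and field names are hypothetical, fc is assumed to be a signed fold change (negative values for down-regulation), and the threshold is applied to its absolute value. it is not the genomescan pipeline.

```python
from typing import NamedTuple


class Record(NamedTuple):
    gene: str       # gene symbol (hypothetical examples below)
    fc: float       # signed fold change versus mock; negative = down-regulated (assumption)
    p_value: float  # p-value of the comparison


def select_degs(records, min_abs_fc=2.0, max_p=0.05):
    """keep records with |fc| > min_abs_fc and p-value < max_p."""
    return [r for r in records if abs(r.fc) > min_abs_fc and r.p_value < max_p]


def split_by_direction(degs):
    """separate up- and down-regulated degs by the sign of fc."""
    up = [r.gene for r in degs if r.fc > 0]
    down = [r.gene for r in degs if r.fc < 0]
    return up, down


if __name__ == "__main__":
    # made-up values for illustration only; not data from this study
    example = [
        Record("GENE_A", 12.0, 0.001),
        Record("GENE_B", -30.0, 0.0004),
        Record("GENE_C", 1.5, 0.01),   # fails the fold-change threshold
        Record("GENE_D", -4.0, 0.20),  # fails the p-value threshold
    ]
    up, down = split_by_direction(select_degs(example))
    print("up:", up)
    print("down:", down)
```

passing a stricter `min_abs_fc` to the same function would give a correspondingly smaller gene list.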
based on these selections we prepared a set of pedv key-degs consisting of genes regulated with a fc of >10 (up) or <-10 (down) or playing an important role in biological processes induced by pedv and related to covid-19 pathology. in the results sections below we tried to give as much meaningful information as possible about the function of key-degs for which we found an association with sars-cov-2 infections. we emphasize that further dedicated experimental and in-silico research is necessary to confirm the involvement of the proteins encoded by these genes in the pathogenesis of this viral disease.
table 1. enriched pathways, go-terms and compound associations of pedv-degs. *pedv enriched pathways (a), go-terms (b), and associations with compounds and drugs (c) with a high and medium score and with at least 2 matching genes were retrieved from geneanalytics. common pathways for mrv were included in table 1a. a full list of pathways with degs, retrieved for mrv at 4 and 6h, is provided in supplementary file 3 (sheet mrv-pedv pathways). a possible function or process related to specific degs, pathways, go-terms, or compounds/drugs is provided in blue text between brackets. $ official gene-symbols (hugo abbreviations) are listed for degs. down-regulated degs are colored red and up-regulated degs are colored green. in section a the number of degs regulated by mrv in a pathway and the common degs are provided between brackets. degs regulated by both pedv and mrv are underlined.
compared to mrv, only a few genes involved in "cytokine/chemokine signaling" were regulated at 4 and 6h by pedv. in fig. 2a the regulation of cytokines/chemokines in pedv and mrv infected vero cells is displayed. this indicated that mrv increased the transcription of a broad set of cytokines/chemokines, including interferon-mediated cytokines like cxcl10 and cxcl8 (alias il8), already at 4 h.p.i., whereas pedv did not, not even when replication of pedv rna was detected by rt-qpcr at 6 h.p.i.
for mrv, this cytokine/chemokine response at 4 h.p.i. was followed by high up-regulation of "hallmark" antiviral genes at 6 h.p.i. (see fig. 2b: e.g. interferon-induced genes [ifi] and oasl) and chemokines that attract t cells, monocytes and granulocytes, including basophils (e.g. cxcl8, cxcl11 and ccl2). pedv infection up-regulated only a few genes coding for proteins with cytokine activity (ctf1 and edn2), and also did not elevate expression of these "hallmark" antiviral genes. in contrast, pedv down-regulated expression of 6 genes (out of 9 in total) coding for zinc finger proteins, all 6 with an antiviral activity towards herpes simplex virus 1 ( fig 2c) binding of thrombin to f2rl2 reduces inflammation, activates platelets and increases vasodilation and permeability of the vascular wall (see also below in the section "platelets activation"). csf1r is a receptor for the cytokine colony stimulating factor 1, a cytokine that regulates differentiation and function of macrophages and, in the cns, the density and distribution of microglia cells. the blnk gene codes for a cytosolic protein that passes on b-cell receptor signals in the signaling cascade that activates b-cell development and function. expression of genes coding for essential components of this b-cell signaling, like "spleen associated tyrosine kinase" (syk) and "lyn proto-oncogene" (lyn), was not regulated by pedv, nor by mrv. genes involved in amino acid, protein translation and metabolism of immuno-active compounds. pedv degs coding for enzymes involved in the metabolism of the non-essential amino acids histidine, phenylalanine, tryptophan and proline were found enriched in the pedv dataset (see supplementary file 3, sheet pedv-compounds). remarkable were the degs coding for amine oxidases involved in the catabolism of the biogenic amines histamine, tryptamine and phenylethylamine, their derivatives and related substrates/products of these enzymes (fig.
3, aoc1, maoa, il4i1). none of these amine oxidase genes were regulated by mrv. using the genotyping program varelect, pedv-degs with an association with these biogenic amines were retrieved (supplementary file 3, sheet "biogenic amines"). three enzymes clustered in the "histidine metabolism" pathway (https://www.kegg.jp/kegg-bin/show_pathway?hsa00340+4128) with histamine and reaction products generated from this biogenic amine (fig. 3). varelect also found the most associations of pedv-degs for histamine. the gene coding for the amine oxidase "interleukin 4 induced 1" (il4i1) was strongly down-regulated (30-fold) 4h after infection with pedv. besides catalysis of l-phenylalanine into phenylpyruvate (fig. 3), il4i1 also fulfills an important role in signaling in "synaptic clefts" formed between antigen presenting cells (apc) and t cells (so-called "immune cleft": see also below) ( expression of the prostaglandin-endoperoxide synthase 2 (ptgs2) gene was upregulated by mrv at 6h p.i. ptgs2 synthesizes prostaglandin endoperoxide h2 (pgh2), a compound with a short half-life and the precursor of many biologically active prostaglandins: e.g. thromboxane-a2 (mediates activation of platelets), pgi2 and pge2. in contrast, pedv increased the expression of the gene coding for prostaglandin e synthase (ptges), which converts pgh2 into pge2. pge2 is a direct vasodilator, but does not inhibit platelet aggregation. pge2 also suppresses t cell receptor signaling. pedv decreased expression of the gamma-glutamyltransferase 1 gene (ggt1) after 4h (2-fold), but increased expression of this gene 2 hours later to a 4-fold level compared to mock infected cells. ggt1 synthesizes leukotriene d4 (ltd4) from ltc4. ige-activated mast cells may secrete ltd4 and ltc4, together with histamine and platelet-activating factor (paf).
this vesicle-mediated secretion by mast cells (degranulation) results in stimulation of mucus production and, similar to histamine, increases the permeability and smooth muscle contraction of the vascular wall. in persons suffering from asthma this degranulation leads to an immediate allergic response (bronchospasm, airflow obstruction and formation of edema). genes involved in "cilia and synaptic cleft" formation. gsea detected "axon guidance" as the go-term with the highest score for pedv (see table 1). in addition, the pedv-degs were enriched for genes coding for proteins involved in calcium ion-dependent exocytosis from vesicles into the "synaptic clefts" between two cells (e.g. between axons and dendrites), and for genes coding for proteins involved in the formation of cilia. cilia protruding from cells are found in many forms. they can have a static (structural) function, e.g. in the formation of clefts between two cells (see fig. 4), or a motile function. motile cilia on the surface of ciliated cells lining the epithelial layers in the nose, trachea and bronchi sweep out superfluous mucus containing dirt from the airways. pedv degs matching the terms "cilia" and "synaptic cleft" retrieved from the genotyping program varelect were further examined by consulting functional information in the ncbi gene and genecards reports in order to evaluate their association with these processes (see supplementary file 3, sheet "cilia and synaptic cleft").
based on this analysis we identified genes in the set of pedv degs which can i) negatively regulate cell adhesion (rnd1 and sema5a), ii) inhibit formation of cilia (kinases mak1 and cdk20, highly up-regulated at 6h), and iii) regulate cytoskeleton rearrangements that facilitate axon growth and growth and stabilization of dendritic spines (f2rl2, regulation of genes involved in histamine/biogenic amines (see above) and formation of cilia/clefts suggested that gene expression related to this intersynaptic signaling between immune cells could be affected in response to infection with pedv (fig. 4). in particular, the highly down-regulated gene il4i1 (30-fold at 4h p.i.) is of interest (see also above). il4i1 is believed to be secreted from apcs (e.g. dcs) into the immune cleft formed with t cells (31: molinier-frenkel et al. 2019). the mechanism by which il4i1 transmits its signal to t cells is not completely understood. it could bind to a receptor that concentrates this amine oxidase in the cleft, resulting in elevated h2o2 and ammonia production, phenylalanine depletion and phenylpyruvate production in the cleft space. these alterations in the concentration of these chemicals in the cleft are sensed by the t cell. the paralog of il4i1, the amine oxidase maoa (down-regulated 11-fold by pedv), could play a similar role in this signaling process. also remarkable was the strong down-regulation of the genes coding for olfactory receptor family 2 subfamily a member 14 (or2a14; 17-fold at 4h) and anoctamin 2 (ano2, alias cacc; 14-fold at 4h). ano2 is a calcium-activated chloride channel embedded in the basal membranes of neurons that harbor apical membrane receptors like or2a14 that sense odorants. by importing chloride ions into the cytosol ano2 contributes to the depolarization of these neurons (https://www.kegg.jp/kegg-bin/show_pathway?hsa04740+57101). loss of smell and taste is one of the first noticeable symptoms of covid-19.
genetic defects in the ano2 gene are associated with von willebrand disease, a bleeding disorder due to defective platelet aggregation ( . also disease incidence in adult males is significantly higher than in females of the same age. to assess whether pedv-degs relate to these pathological symptoms, the genotyping program varelect was used to identify genes matching the terms "ards", "cardiomyopathy", "obesity (diabetic)", and "platelets activation". detailed information about all matching degs is provided in supplementary file 3 in separate sheets for all 4 queried terms. degs matching more than one query term are displayed in fig. 5. remarkable associations of degs with these terms are mentioned in the sections below. . edn2 and agt2 are vasoactive peptides, and binding of edn2 and agt2 to their receptors on granular cells of the juxtaglomerular apparatus in the kidney raises free calcium levels in the cytosol, leading to inhibition of camp-mediated secretion of the aspartyl protease renin (ren), the key regulator of the renin-angiotensin-aldosterone system (raas) (https://www.kegg.jp/kegg-bin/show_pathway?hsa04924+1907). ren converts pre-angiotensinogen (agt) to the endocrine peptide-hormone agt1 (https://www.kegg.jp/kegg-bin/show_pathway?hsa04614+5972). agt1 is further cleaved to variants with specific endocrine activity by angiotensin i converting enzymes (e.g. ace and ace2: ). on the surface of bronchial epithelial cells ace2 was identified as entry receptor for sars-cov-1 and -2 (43: hoffmann et al. 2020). the octapeptide agt2 stimulates secretion of the mineralocorticoid hormone aldosterone by the adrenal glands. aldosterone and the agt and edn peptide hormones regulate an array of physiological processes in the body, e.g. vascular smooth muscle contraction, blood pressure, and fluid and electrolyte homeostasis (44: agapitov et al. 2002), all processes that are important for proper functioning of the vascular system, heart muscles and kidneys.
ctf1 is directly involved in the pathology of numerous cardiovascular diseases by promoting cardiac myocyte hypertrophy (41: wollert et al. 1996), which may lead to the onset of heart diseases such as "hypertrophic cardiomyopathy" or "dilated cardiomyopathy", and eventually to (lethal) heart failure. pedv induced a strong down- or up-regulation of several other genes directly involved in the function of cardiomyocytes. sodium voltage-gated channel beta subunit 4 (scn4b) was strongly up-regulated (15-fold), whereas mylk2 (see above), citrate synthase (cs; down-regulation 24-fold) and ankyrin repeat domain 1 (ankrd1) were strongly down-regulated. down-regulation of cs may reduce oxidative capacity in cardiomyocytes. gene expression of ankrd1 was down-regulated 12-fold in response to mrv infection at 4 h.p.i., but reverted to a 24-fold up-regulation 2h later. ankrd1 is a putative transcription factor involved in regulation of gene expression in hypertrophic myocytes (https://www.wikipathways.org/index.php/pathway:wp516). regulation of calcium levels in the cytosol of cardiomyocytes, e.g. by binding of col1a1 and col2a (up-regulated by pedv, see above) to integrin subunit alpha on the surface of cardiomyocytes or after import of calcium ions mediated by calcium voltage-gated channels (e.g. by cacna1h; down-regulated 5-fold at 4h by pedv), may also trigger myocyte hypertrophy. pedv strongly down-regulated gene expression of a potassium voltage-gated channel (kcnq4; 16-fold). this is in contrast to the strong up-regulation of the sodium symporter scn4b. for the potassium channel kcnq4 (alias kv7.4) it was reported that this channel regulates the membrane potential and ca2+ permeability of mitochondria located in the vicinity of the sarcoplasmic reticulum in rat cardiomyocytes (45: testai et al. 2016). all three above-mentioned ion channels are also involved in the process of excitation, contraction/relaxation and repolarization of cardiac myocytes.
mrv also down-regulated gene expression 3- to 4-fold for the potassium (kcnq4) and calcium (cacna1h) channels, but did not increase expression of the sodium symporter gene scn4b or orthologs of this gene. chronic hypertension and heart disease/failure are complications frequently observed in obese/diabetic patients. in accordance with this, 16 out of the 65 pedv degs matching the term "obesity" also matched with the term "cardiomyopathy" (see additional remarkable pedv-degs. highly up- or down-regulated pedv-degs not mentioned in the text, and in our opinion interesting with regard to coronavirus infection, are briefly described in table 2. among these degs are several genes coding for transcription factors and genes transcribed into antisense rnas that inhibit translation of their coding counterparts. for more functional information about these degs we refer to the weblinks provided in supplementary file 3, sheet "pedv key degs". table 2. remarkable pedv-degs not mentioned in the text. in this report we measured the transcriptional response of vero cells shortly after infection with the coronavirus pedv. the function of the responding host genes and the biological processes in which they act were studied in detail to find plausible relations to covid-19 pathology. because of the differences in genomic organization and expression of viral proteins between sars-cov-2 and pedv, we paid less attention to coupling the response of specific host genes to the function of specific coronavirus proteins. we were able to infect the majority of vero cells (>50%) with pedv and mrv synchronously. this resulted in a unique set of highly up- and down-regulated degs for pedv. not more than 14% of the 266 pedv-degs (n=37) were similar to mrv-degs (total of 727 mrv-degs). in contrast to mrv, we observed no typical response of antiviral genes and related cytokine/chemokine genes in vero cells within 6 h.p.i. for pedv. for mrv these processes had already started before 4 h.p.i.
we have to note that pedv replication started 2h later than mrv3 replication, which could in part be the reason for not detecting transcriptional regulation of specific cytokine, chemokine and antiviral genes for pedv. incubation times longer than 6h were not planned in the original design of our experiments and would have resulted in a set of pedv-degs dominated by genes involved in syncytium formation and apoptotic/necrotic cell death. nevertheless, at 6h replication of pedv rna was detected by rt-pcr, indicating that dsrna was present in the cells and could have been sensed by cytosolic pattern recognition receptors of the rig-i-like family to initiate an antiviral and related cytokine/chemokine response. as observed in another study with pedv and vero cells, and in analogy with sars-cov-1 and -2, we observed a high up-regulation of the transmembrane serine protease gene (tmprss13) that acts as a co-factor in the infection process of cells ( we observed a reduced expression of eef1a1, as part of a transcription factor-complex that binds and activates the promoter of ifnγ, and of the cytokine eda and its receptor (edaradd) involved in activation of canonical-nfkb transcription of antiviral cytokine-chemokine genes like cxcl8 (alias il8) and cxcl10. this reduced expression of eef1a1 and of eda and its receptor may play a crucial role in delaying or downgrading an ifn-mediated antiviral and cytokine/chemokine response in our vero cell system. the elevated transcription of many cytokine and chemokine genes in vero cells by mrv suggests that replication of this rna virus in epithelial cells induces secretion of these immune effectors (more information about the genes/processes that responded to mrv infection will be published elsewhere). pedv and the mrv3 strain we used both replicate in enterocytes lining the intestinal mucosal layer.
in the intestinal and bronchial epithelial layer, microfold (m) cells are embedded between these epithelial cells. m cells sense and engulf foreign pathogens/antigens from the lumen to present them to residing immune cells. according to pathway analysis, most t cell related immune genes regulated in response to mrv infection were part of the pathways "t-helper 17 (th17) differentiation/activation" and "il17 signaling". antigen presentation by m cells to th17 cells in the submucosal layers stimulates secretion of different types of il17 cytokines (il17a-d, il25 and il17f), resulting in activation of different types of innate immune cells and t cells, including th1/th2 cells. dysregulation of this pathogen-induced il17 response may disturb the balance between th1/th2-cell mediated immune responses, resulting in excessive inflammation, damage to the epithelial layer and, in the long term, autoimmune reactions. tf rorc (or specific isoforms of this tf, see above) plays a pivotal role in controlling il17 expression and secretion by th17 cells. pedv strongly up-regulated gene expression of rorc in vero cells, whereas mrv down-regulated expression of this tf. this difference in regulation of tf rorc suggests that virus-induced activation or suppression of il17 secretion by th17 cells in submucosal layers of airway epithelium can be an important mechanism to dysregulate the activation of t cell responses. therefore, tf rorc might be a potential target for drug treatment/development. drugs affecting expression of rorc, like the fluorinated steroid "dexamethasone" and the synthetic tetracycline derivative "doxycycline" (http://ctdbase.org/basicquery.go?bqcat=gene&bq=rar+related+orphan+receptor+c), are already under investigation in relation to sars-cov-2 pathogenesis. transcriptional regulation of a set of genes coding for enzymes involved in biogenic amine metabolism was unique for pedv, and not observed for mrv.
most associations of these pedv-degs were found with histamine, a compound produced by mast cells and basophils, and released by these cells in response to allergens and pathogen-induced inflammation. the 10-fold up-regulation of the enzyme aoc1 suggests that histamine is converted to imidazole-acetaldehyde (see fig. 3). however, without data on the intra- and extracellular concentrations of these chemicals this remains speculative. recent reports indicated that submucosal mast cells in the lungs were triggered by sars-cov-2 infection to release pro-inflammatory cytokines (il1, il6 and tnf-alpha) . mdscs infiltrate these tumors and inflamed tissues to suppress the local activity of specific immune cells. therefore, the role of infiltrating mdscs at inflammatory sites in the lungs of covid-19 patients, as part of a sars-cov-2 immune-evading strategy, and the role of il4i1 in this process, are worth investigating in more detail. expression of genes that promote or inhibit the formation and motility function of cilia was regulated by pedv in a time-dependent manner. the 20-fold up-regulation of fez1 gene expression at 4h declined within 2h to a moderate 6-fold up-regulation. this decline occurred simultaneously with elevation of gene expression of the kinases mak and cdk20, both involved in inhibition of cilia formation. because pedv efficiently replicates in enterocytes that carry ciliated membrane protrusions (microvilli) on their luminal surface, regulation of these genes could be related to structural changes in the cytoskeleton of cells imposed by virus replication (e.g. syncytium formation in pedv infected vero cultures). likewise, sars-cov-2 replication could also impose structural changes in cilia protruding from the surface of upper-airway epithelial cells (nose, trachea) and bronchi. based on our results we cannot pinpoint a specific process in which these cilia-regulating genes act.
possible processes include formation of an immune cleft, formation of a virological cleft to promote more effective infection of neighboring cells, or cytoskeleton rearrangements to support virus replication, morphogenesis and budding from cells. interestingly, two recent studies revealed a high level of expression of the sars-cov-2 entry receptor ace2 and its co-receptor tmprss2 in ciliated airway epithelial cells ( . these processes deserve more attention, and may also be considered as possible target-processes for interference with drugs. within our set of pedv-degs, and the biological processes deduced from this set, we found associations with diverse aspects of covid-19 pathogenesis, i.e. "ards", "cardiomyopathy", "obesity (diabetic)", and "platelets activation". however, it is unknown whether the proteins encoded by these pedv-degs indeed play a role in the biological processes underlying the symptoms and complications observed in hospitalized persons infected with sars-cov-2. nevertheless, a part of these genes/processes may be starting points for further dedicated research, for example research to fine-tune drug treatment protocols that are already applied to covid-19 patients, or research that provides new insights for treatments with alternative approved prophylactic and therapeutic drugs. in table 3 we summarize the pedv-degs that are, in our opinion, of interest for modulation of the biological processes underlying the pathogenesis of covid-19. table 3. possible target genes for covid-19 therapy. colophon. the overwhelming amount of data published recently made it impossible for us to keep track of all (novel) facts about the sars-cov-2 virus and the pathology of the covid-19 disease. some important genes and related processes embedded in our set of pedv-degs may have been overlooked by us. therefore, we encourage researchers, especially medical, immunological and pharmaceutical specialists, to study this set of degs in detail.
the users-friendly supplementary file 3 with functional information about degs and related biological processes can be down-loaded from the web. by publishing these pedv data ahead of our complete study, we hope that some of the gene targets and cognate processes we have identified for the coronavirus pedv will contribute to a better understanding how hospitalized covid-19 patients can be treated and cured. a more condensed version of this manuscript, focusing on the original goal of our study, will be submitted to a peer-reviewed virological journal soon. risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease a new coronavirus-like particle associated with diarrhea in swine origin, evolution, and genotyping of emergent porcine epidemic diarrhea virus strains in the united states porcine epidemic diarrhea virus infection: etiology, epidemiology, pathogenesis and immunoprophylaxis a novel pathogenic mammalian orthoreovirus from diarrheic pigs and swine blood meal in the united states identification of a novel reassortant of a mammalian orthoreovirus in faeces of diarrheic pigs in the netherlands: presentation at the 11th annual meeting of epizone in paris high similarity of novel orthoreovirus detected in a child hospitalized with acute gastroenteritis to mammalian orthoreoviruses found in bats in europe pathogenesis of reovirus infections of the central nervous system identification and characterization of a new orthoreovirus from patients with acute respiratory infections a novel reovirus isolated from a patient with acute respiratory disease reov isolated from sars patients cloning and identification of reovirus isolated from specimens of sars patients a bat-derived putative cross-family recombinant coronavirus with a reovirus gene the persistent prevalence and evolution of crossfamily recombinant coronavirus gccdc1 among a bat population: a two-year followup coronavirus genome structure and replication 
comparative tropism, replication kinetics, and cell damage profiling of sars-cov-2 and sars-cov with implications for clinical manifestations, transmissibility, and laboratory studies of covid-19: an observational study sars-cov-2 productively infects human gut enterocytes the genome landscape of the african green monkey kidney-derived vero cell line characterization of the interferon regulatory factor 3-mediated antiviral response in a cell line deficient for ifn production imbalanced host response to sars-cov-2 drives development of covid-19 full-length genome sequences of porcine epidemic diarrhoea virus strain cv777; use of ngs to analyse genomic and sub-genomic rnas tmprss2 and mspl facilitate trypsin-independent porcine epidemic diarrhea virus replication in vero cells desc1 and mspl activate influenza a viruses and emerging coronaviruses for host cell entry efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss2 a novel isoform of the orphan receptor rorγt suppresses il-17 production in human t cells structure, function and evolution of the hemagglutinin-esterase proteins of corona-and toroviruses functional screen reveals sars coronavirus nonstructural protein nsp14 as a novel cap n7 methyltransferase 2'-o methylation of the viral mrna cap evades host restriction by ifit family members human il4i1 is a secreted lphenylalanine oxidase expressed by mature dendritic cells that inhibits t-lymphocyte proliferation the il4i1 enzyme: a new player in the immunosuppressive tumor microenvironment avoiding the void: cell-to-cell spread of human viruses a common 253-kb deletion involving vwf and tmem16b in german and italian patients with severe von willebrand disease type 3 pathological findings of covid-19 associated with acute respiratory distress syndrome case-fatality rate and characteristics of patients dying in relation to covid-19 in italy the tn antigen-structural simplicity and biological 
complexity post-translational modifications of coronavirus proteins: roles and function emerging wuhan (covid-19) coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human cd26 therapies for bleomycin induced lung fibrosis through regulation of tgf-beta1 induced collagen gene expression covid-19-induced acute respiratory failure: an exacerbation of organspecific autoimmunity? medrxiv 2020.04 cardiotrophin-1 activates a distinct form of cardiac muscle cell hypertrophy. assembly of sarcomeric units in series via gp130/leukemia inhibitory factor receptor-dependent pathways a985g polymorphism of the endothelin-2 gene and atrial fibrillation in patients with hypertrophic cardiomyopathy sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor role of endothelin in cardiovascular disease expression and function of kv7.4 channels in rat cardiac mitochondria: possible targets for cardioprotection human coronary arteriolar dilation to adrenomedullin: role of nitric oxide and k(+) channels adrenomedullin improves cardiac function and prevents renal damage in streptozotocin-induced diabetic rats adrenomedullin: a possible autocrine or paracrine inhibitor of hypertrophy of cardiomyocytes histones and heart failure in diabetes histone h4 lysine 16 hypoacetylation is associated with defective dna repair and premature senescence in zmpste24-deficient mice studies of the sim1 gene in relation to human obesity and obesity-related traits neuropeptide b mediates female sexual receptivity in medaka fish, acting in a female-specific but reversible manner porcine epidemic diarrhea virus inhibits dsrna-induced interferon-β production in porcine intestinal epithelial cells by blockade of the rig-imediated pathway mast cells contribute to coronavirus-induced inflammation: new anti-inflammatory strategy mast cell stabilisers, leukotriene antagonists and antihistamines: a rapid review of 
effectiveness in covid-19 lnc-c/ebpβ modulates differentiation of mdscs through downregulating il4i1 with c/ebpβ lip and wdr5 identification of discrete tumor-induced myeloid-derived suppressor cell subpopulations with distinct t cellsuppressive activity sars-cov-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov-2 receptor ace2 and tmprss2 are primarily expressed in bronchial transient secretory cells mechanisms of innate immune evasion in re-emerging rna viruses supplementary file 3: interactive excel file with sortable tables in separate sheets. please, first read the sheet "read me" for an explanation and instructions for the use of the tables. excel sheets contain tables with i) pedv and mrv degs extracted from rnaseq data files, ii) functional information about the pedv-degs, iii) gsea extracted pathways (for mrv and pedv), go-terms (only for pedv) and compound associations (only for pedv), and iv) associations of pedv-degs with the terms "cytokines-chemokines", "(anti)-viral", "biogenic amines", "cilia and synaptic clefts", and the disorders "ards, "cardiomyopathy", "obesity", and "platelets activation" (a-c-o-p). the authors declare that they have no competing interests.authors' contributions. key: cord-312551-w4tps34p authors: razvi, mohammad h.; peng, dunfa; dar, altaf a.; powell, steven m.; frierson, henry f.; moskaluk, christopher a.; washington, kay; el‐rifai, wael title: transcriptional oncogenomic hot spots in barrett's adenocarcinomas: serial analysis of gene expression date: 2007-07-17 journal: genes chromosomes cancer doi: 10.1002/gcc.20479 sha: doc_id: 312551 cord_uid: w4tps34p serial analysis of gene expression (sage) provides quantitative and comprehensive expression profiling in a given cell population. 
in our efforts to define gene expression alterations in barrett's‐related adenocarcinomas (ba), we produced eight sage libraries and obtained a total of 457,894 expressed tags with 32,035 (6.9%) accounting for singleton tags. the tumor samples produced an average of 71,804 tags per library, whereas normal samples produced an average of 42,669 tags per library. our libraries contained 67,200 unique tags representing 16,040 known gene symbols. five hundred and sixty‐eight unique tags were differentially expressed between bas and normal tissue samples (at least twofold; p ≤ 0.05), 395 of these matched to known genes. interestingly, the distribution of altered genes was not uniform across the human genome. overexpressed genes tended to cluster in well‐defined hot spots located in certain chromosomes. for example, chromosome 19 had 26 overexpressed genes, of which 18 mapped to 19q13. using the gene ontology approach for functional classification of genes, we identified several groups that are relevant to carcinogenesis. we validated the sage results of five representative genes (anpep, ecgf1, pp1201, eif5a1, and gkn1) using quantitative real‐time reverse‐transcription pcr on 31 ba samples and 26 normal samples. in addition, we performed an immunohistochemistry analysis for anpep, which demonstrated overexpression of anpep in 6/7 (86%) barrett's dysplasias and 35/65 (54%) bas. anpep is a secreted protein that may have diagnostic and/or prognostic significance for barrett's progression. the use of genomic approaches in this study provided useful information about the molecular pathobiology of bas. © 2007 wiley‐liss, inc. gastroesophageal reflux disease (gerd) is a major health problem in the united states with a prevalence of 5-7% in the general population and an increasing incidence rate (serag, 2006) . 
approximately 10% of patients with chronic gerd develop a metaplastic condition known as barrett's esophagus (be) in which the normal squamous epithelium of the esophagus is replaced by a columnar epithelium with goblet cells. be is a serious premalignant lesion that can ultimately progress from metaplasia to dysplasia and subsequently to barrett's adenocarcinoma (ba) (ferraris et al., 1997; o'connor et al., 1999; rana and johnston, 2000). the incidence of ba has rapidly increased in the western world over the past three decades (hamilton et al., 1988; phillips et al., 1991; blot et al., 1993), and is comprised of aneuploid tumors characterized by complex molecular alterations (el-rifai et al., 2001; . several genetic abnormalities have been associated with barrett's tumorigenesis, including microsatellite instability (meltzer et al., 1994), loss of heterozygosity (dolan et al., 1999), gene-promoter hypermethylation (sato and meltzer, 2006), as well as up- and down-regulation of various genes (wu et al., 1993; swami et al., 1995; regalado et al., 1998; brabender et al., 2002). comprehensive molecular analyses of dna amplifications and gene expression have revealed complex genetic alterations in gastroesophageal and lower esophageal adenocarcinomas (el-rifai et al., 1998; varis et al., 2002; van dekken et al., 2004; kuwano et al., 2005). analyses of the human transcriptome map of normal tissues have shown clustering of highly expressed genes in chromosomal domains (caron et al., 2001). chromosomal arms and bands are known to occupy specific locations within the nucleus known as chromosome territories (cts). the positioning of a gene(s) can influence its access to the machinery responsible for specific nuclear functions such as transcription and splicing (cremer and cremer, 2001).
recently, a few reports have suggested the presence of transcriptional hot spots in the cancer genome (wu et al., 2006), where overexpressed genes tend to cluster in defined chromosomal domains; however, similar information remains lacking for most cancer types. serial analysis of gene expression (sage) provides unlimited, comprehensive, genome-wide analysis of gene expression in a given cell population (velculescu et al., 1995 . the major advantage in using sage is the quantitative ability to accurately evaluate transcript numbers without prior sequencing information. this method has proven invaluable in studies of several tumor types, including adenocarcinomas of the colon (parle-mcdermott et al., 2000; st croix et al., 2000), prostate (culp et al., 2001), pancreas (argani et al., 2001), ovary (hough et al., 2000), and breast (seth et al., 2002). in this study, we explored the ba transcriptome using sage and mapped gene-expression changes to chromosomal positions, thereby generating a map of transcriptional oncogenomic hot spots of this deadly cancer. high-quality total rna (500 μg) was extracted from four intestinal-type, moderately to poorly differentiated, ba cases (three gastroesophageal junctional [gej] and one lower esophageal) using an rneasy kit (qiagen, hilden, germany). in addition, four normal gastric mucosa pools were used as reference samples. each of these pools consisted of four normal gastric mucosal biopsy samples from four different individuals. the tumors selected for sage analysis were estimated to consist of more than 70% tumor cells. all normal samples had histologically normal mucosae confirmed on review of hematoxylin- and eosin-stained sections. importantly, histopathological examination confirmed that none of the normal samples had any areas of inflammation or necrosis. all samples were collected with consent in accordance with approved institutional review board protocols.
sage libraries were constructed using nlaiii as the anchoring enzyme and bsmfi as the tagging enzyme as described in sage protocol version 1.0e, june 23, 2000, which includes a few modifications of the standard protocol (velculescu et al., 1995). a detailed protocol and schematic of the method is available at http://www.sagenet.org/protocol/index.htm. we sequenced 20,000 clones with an average of 2,500 clones per library, using the cancer genome anatomy project (cgap). esage 1.2a software was used to extract sage tags, remove duplicate ditags, tabulate tag contents, and link sage tags in the database to unigene clusters using the recently reported ehm-tag-mapping method (margulies and innis, 2000; margulies et al., 2001). the resulting libraries' tags were compared with unigene clusters and the sage tag "reliable" mapping database (http://www.sagenet.org/resources/genemaps.htm). statistical analyses of these tags were then performed using esage software.
quantitative real-time reverse-transcription pcr
quantitative real-time reverse-transcription pcr (qrt-pcr) was performed on 31 adenocarcinomas of barrett's-related origin, 26 normal gastric epithelial tissues, and 6 barrett's metaplasia tissue samples. all tissues were dissected to obtain ≥70% cell purity. all of the adenocarcinoma samples were collected from the gej or lower esophagus and ranged from well differentiated (wd) to poorly differentiated (pd), stages i-iv, with a mix of intestinal- and diffuse-type tumors. rna was purified from all samples using an rneasy kit. single-stranded cdna was generated using an advantage™ rt-for-pcr kit (clontech, palo alto, ca). qrt-pcr was performed using an icycler (biorad, hercules, ca) with sybr green technology, and the threshold cycle numbers were calculated using icycler software v3.0. reactions were performed in triplicate and threshold cycle numbers were averaged.
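the tag-extraction step attributed to esage above (extract tags, remove duplicate ditags, tabulate tag contents) can be illustrated with a minimal python sketch. this is not the esage implementation. it assumes 10-bp tags directly adjacent to the nlaiii site (catg), ditags of 20 to 26 bp, and an input string holding one sequenced concatemer; the function and constant names are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10      # assumed tag length next to the anchor

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer: str, min_ditag: int = 20, max_ditag: int = 26) -> Counter:
    """Tabulate SAGE tags from one concatemer read (illustrative only)."""
    seen_ditags = set()
    tags = Counter()
    for ditag in concatemer.upper().split(ANCHOR):
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue  # skip flanking sequence and ditags of unexpected length
        if ditag in seen_ditags or revcomp(ditag) in seen_ditags:
            continue  # a duplicate ditag is counted only once
        seen_ditags.add(ditag)
        tags[ditag[:TAG_LEN]] += 1            # tag read from the left end
        tags[revcomp(ditag[-TAG_LEN:])] += 1  # tag read from the right end
    return tags


# hypothetical concatemer: two identical ditags followed by a third, distinct one
read = ("CATG" + "A" * 10 + "C" * 10 +
        "CATG" + "A" * 10 + "C" * 10 +
        "CATG" + "G" * 12 + "T" * 10 + "CATG")
tag_counts = extract_tags(read)
```

counts from all reads of a library would then be summed and each tag matched against a tag-to-gene mapping; that mapping step is not shown here.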
for validation of sage results, we designed gene-specific primers for human anpep, ecgf1, pp1201, eif5a1, gkn1, and hprt1. these primers were obtained from integrated dna technologies (idt, coralville, ia) and their sequences are available upon request. a single-melt curve peak was observed for each product, thus confirming the purity of all amplified cdna products. the qrt-pcr results were normalized to hprt1, which had minimal variation in all normal and neoplastic samples tested. fold overexpression was calculated according to the formula $2^{(r_t - e_t)} / 2^{(r_n - e_n)}$, as described earlier (buck, 2002), where $r_t$ is the threshold cycle number for the reference gene observed in the tumor, $e_t$ is the threshold cycle number for the experimental gene observed in the tumor, $r_n$ is the threshold cycle number for the reference gene observed in the normal sample, and $e_n$ is the threshold cycle number for the experimental gene observed in the normal sample. $r_n$ and $e_n$ values were averages of the corresponding normal analyzed samples. the relative fold expression with standard error of mean (±sem) is shown in figure 2.
table 1, footnote: at least two tumors showed more than fivefold change (p ≤ 0.01). tags with "0" value were replaced with arbitrary 0.5 values for relative calculation of fold expression. the ratio was calculated after normalization to total tag numbers.
figure 1. chromosomal localization of deregulated genes. chromosomal regions that contain up-regulated genes are shown in red, whereas those that contain down-regulated genes are displayed in green. regions which contain both up- and down-regulated genes are colored in yellow. the distribution of these genes did not follow a random distribution pattern and several genomic regions contain clusters of deregulated genes. some of the more significant "hot spots" can be seen here on chromosomes 1 (p = 0.01), 3 (p = 0.02), 12 (p = 0.01), 15 (p = 0.01), and 19 (p = 0.01).
table fragment. overexpressed genes, 1q21 (13 genes): s100a16, s100a2, s100a7, s100a9, s100a8, ecm1, s100a10, s100a6, lmna, sprr3, hdgf, hist2h2be, tagln2.
immunohistochemistry
immunohistochemical (ihc) analysis of anpep protein expression was performed on a tumor tissue microarray (tma) that contained 65 adenocarcinomas. samples from adjacent normal and dysplastic tissues were included when available. all tissue samples were histologically verified, and representative regions were selected for inclusion in the tma. all of the adenocarcinoma samples were collected from either the gej or lower esophagus and ranged from wd to pd, stages i-iv, with a mix of intestinal- and diffuse-type tumors. tissue cores with a diameter of 0.5 mm were retrieved from the selected regions of the donor blocks and punched to the recipient block using a manual tissue array instrument (beecher instruments, silver spring, md). each tissue sample was represented by four tissue cores on the tma. sections (5 μm) were transferred to polylysine-coated slides (super-frostplus, menzel-gläser, braunschweig, germany) and incubated at 37°c for 2 hr. the resulting tma was used for ihc analysis utilizing a 1:50 dilution of anpep antibody (cd13/aminopeptidase-n ab-3 mouse monoclonal antibody; lab vision corporation, fremont, ca). sections were deparaffinized and rehydrated. tma slides were treated in a microwave with citrate buffer for 20 min and incubated with the antibody at room temperature. detection was performed using an avidin-biotin immunoperoxidase assay. cores with no evidence of staining, or only rare scattered positive cells less than 3%, were recorded as negative. the overall intensity of staining was recorded as that for the core with the strongest intensity. ihc results were evaluated for intensity and frequency of staining. the intensity of staining was graded as 0 (negative), 1 (weak), 2 (moderate), and 3 (strong).
the frequency was graded from 0 to 4 by percentage of positive cells as follows: grade 0, <3%; grade 1, 3-25%; grade 2, 25-50%; grade 3, 50-75%; grade 4, >75%. the index score was the product of multiplication of the intensity and frequency grades, which was then classified into a 4-point scale: index score 0 = product of 0, index score 1 = products 1 and 2, index score 2 = products 3 and 4, index score 3 = products 6 through 12. sequence analyses of 20,000 clones from eight sage libraries produced 457,894 expressed tags, with 32,035 tags (6.9%) accounting for singleton tags. the four tumor sage libraries (gsm758, gsm757, hg7, and hs29) produced 287,219 tags with an average of 71,804 tags per library. the normal samples (gsm14780, gsm784, 13s, and 14s) produced 170,675 tags with an average of 42,669 tags per library. the comparison of expressed tags to the unigene cluster release of may 2005 identified 67,200 unique sage tags. these tags represented 16,040 known gene symbols according to unigene information. of these, 568 unique tags were differentially expressed between bas and normal tissue samples (at least twofold and p ≤ 0.05). these unique tags matched 395 known genes (242 up-regulated and 153 down-regulated) that regulate diverse cellular functions and signaling pathways, which may prove to be quite significant in the detection and prevention of cancer. ninety-three genes were significantly altered, showing a greater than fivefold expression change in at least two tumor libraries as compared to all four normal libraries (p ≤ 0.01) (table 1). forty-eight genes showed up-regulation, whereas 45 were down-regulated. the group of over-expressed genes contained several with known cancer-related functions, including members of s100a calcium-binding proteins, heat-shock protein 27 kda (hsb1), heat-shock 90 kda protein beta (hspcb), prothymosin (ptma), transmembrane bax inhibitor motif containing-1 (pp1201), peroxiredoxin-3 (prdx3), and endothelial growth factor-1 (ecgf1).
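the fold changes behind these comparisons rest on two rules stated in the note to table 1: tags with a count of 0 were replaced with 0.5, and the ratio was calculated after normalization to total tag numbers. a minimal python sketch of that calculation follows. the function name is hypothetical, and the use of pooled tumor and normal totals (287,219 and 170,675 tags) rather than per-library totals is an assumption; the example tag counts are invented for illustration.

```python
def normalized_tag_ratio(tumor_count, normal_count, tumor_total, normal_total,
                         zero_substitute=0.5):
    """Tumor/normal tag ratio after normalization to total tag numbers.

    A count of 0 is replaced by `zero_substitute` so that the ratio stays defined.
    """
    t = zero_substitute if tumor_count == 0 else tumor_count
    n = zero_substitute if normal_count == 0 else normal_count
    return (t / tumor_total) / (n / normal_total)


TUMOR_TOTAL = 287_219   # total tags in the four tumor libraries (from the text)
NORMAL_TOTAL = 170_675  # total tags in the four normal libraries (from the text)

# hypothetical tag: 60 copies in tumor libraries, absent from normal libraries
ratio = normalized_tag_ratio(60, 0, TUMOR_TOTAL, NORMAL_TOTAL)
print(f"normalized tumor/normal ratio: {ratio:.1f}")
```

because the tumor libraries are larger, a raw count ratio would overstate up-regulation; dividing each count by its library total removes that bias before the fold threshold is applied.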
down-regulated transcripts included genes such as gastrokine (gkn1), down-regulated in gastric cancer (gddr), gastric intrinsic factor (gif), methyl-cpg binding domain protein 3 (mbd3), and trefoil factor 2 (tff2). cgap maintains the public sage database for gene expression in human cancer (lal et al., 1999), and sequence data are publicly available at http://www.ncbi.nih.gov/geo and http://cgap.nci.nih.gov/sage/. onto-express online software (http://vortex.cs.wayne.edu/index.htm) (khatri et al., 2002; draghici et al., 2003) was used to identify potential transcriptional oncogenomic hot spots in the genome and obtain the functional classification of the deregulated genes. we mapped all sage unique transcripts (16,040 gene symbols) to their corresponding cytogenetic locations. the altered transcripts (395 known gene symbols) were analyzed against all transcripts to generate an expression ideogram and identify transcription hotspots (fig. 1). interestingly, the distribution of altered genes was not uniform along the human chromosomes. overexpressed genes tended to cluster in well-defined hot spots across the human genome (table 2). for example, 26 overexpressed genes mapped to chromosome 19, of which 18 mapped to the single chromosome band 19q13. similarly, 35 genes mapped to chromosome 1, of which 13 mapped to the chromosome band 1q21. table 3 and figure 1 summarize these data and map the genes to their corresponding cytogenetic locations. gene ontology (go) terms are organized in three general categories: biological process, cellular role, and molecular function; terms within each go category are linked in defined parent-child relationships that reflect current biological knowledge (ashburner et al., 2000).
among the 395 differentially expressed genes, the number corresponding to each category was tallied and compared with the number expected for each go category based on its representation on the reference gene list, which contained all of the unique 16,040 known gene symbols detected by analysis of the eight sage libraries. significant differences from the expected were calculated with a two-sided binomial distribution. false discovery rates (benjamini et al., 2001) and bonferroni adjustments were also calculated. the biological meaning of the p values obtained depends upon the list of genes that are submitted; as our gene list is from a comparison of ba samples, it can be inferred that this cancer stimulates the processes involved within the functional groups that were most highly represented in the results of the go classification. in our set of differentially expressed genes, the functional groups demonstrating the most significant representation appear under the biological process ontology and map to the cell-cycle regulation, dna binding and regulation, cell-environment interaction, and cell-signaling categories. table 4 summarizes several important go functional classes. (table footnote: the average ratio is shown. this ratio was calculated by comparing the total number of tags in tumor samples and normal samples.) to evaluate further the sage data, we selected five novel genes (anpep, ecgf1, pp1201, eif5a1, and gkn1, all of which have important cellular or biological features) for validation with qrt-pcr. we confirmed over-expression of anpep, ecgf1, pp1201, and eif5a1 and downregulation of gkn1 in primary gej and lower esophageal adenocarcinoma samples (table 5, fig. 2). interestingly, gkn1 was not expressed in normal esophageal mucosa samples but showed a transient expression in be samples where 4/6 of these samples demonstrated expression levels comparable to those observed in normal gastric mucosae. we did not have samples with barrett's dysplasia for qrt-pcr.
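the go over-representation test described above can be sketched in python as follows. this is an illustrative implementation, not the onto-express code: it assumes an exact two-sided binomial test that sums the probabilities of all outcomes no more likely than the observed count, and it uses the benjamini-hochberg step-up procedure as one common form of false discovery rate adjustment. all function names and the category counts in the example are hypothetical; only the totals of 16,040 reference genes and 395 altered genes come from the text.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided p value: sum of outcomes no more likely than k."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    log_observed = binom_log_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        log_prob = binom_log_pmf(i, n, p)
        if log_prob <= log_observed + 1e-9:  # small tolerance for ties
            total += math.exp(log_prob)
    return min(1.0, total)


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    reference_total = 16040   # known gene symbols on the reference list
    altered_total = 395       # differentially expressed genes
    in_reference = 400        # hypothetical size of one go category
    observed = 20             # hypothetical altered genes in that category
    expected_fraction = in_reference / reference_total
    print(binom_two_sided_p(observed, altered_total, expected_fraction))
```

in practice one p value would be computed per go category and the whole list passed to the two adjustment functions, so that the corrected values reflect the number of categories tested.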
the gkn1 expression was lost in almost all adenocarcinoma samples (fig. 2). the qrt-pcr products were run on 1.2% agarose gels for visual confirmation of these results (fig. 3). rt-pcr results for all five genes were also compared in each individual primary tissue sample to determine any correlations in combined gene expression levels; however, we were unable to find any correlations of statistical significance. the ihc analysis demonstrated a lack of immunostaining for anpep in normal esophageal and gastric epithelial tissues. on the other hand, bas showed overexpression of anpep (score +1 to +3) in 35/65 (54%) tumors. a weak to moderate expression of anpep (score +1 to +2) was observed in 6/7 (86%) high-grade barrett's dysplasia samples. the immunostaining pattern of anpep was cytoplasmic with strong extracellular and luminal expression (fig. 4). the immunostaining for anpep was observed in tumors with intestinal and diffuse histological subtypes and in all stages (table 6). however, the relatively small sample size did not provide sufficient statistical power to detect significant correlations between the ihc staining patterns and clinicopathological factors such as tumor histology, grade, or stage. in this study, we performed a comprehensive analysis of the transcriptome of bas using sage. the major advantage of sage is its ability to quantify transcript numbers accurately without prior sequence information. the sage analysis produced a great deal of information about transcripts and candidate cancer genes, and we have interpreted these data in terms of possible genomic and functional organization of candidate cancer genes. sage analysis requires laborious and extensive sequencing that often limits the number of samples that are subjected to analysis. we obtained a total of 457,894 expressed tags from eight sage libraries with minimal singleton tags (32,035; 6.9%).
the qrt-pcr analysis on a larger sample size confirmed the sage results and validated the overexpression of anpep, ecgf1, pp1201, and eif5a1 and downregulation of gkn1. ecgf1 (thymidine phosphorylase) expression has been shown to correlate with the angiogenic activity of some tumors (mazurek et al., 2006). ecgf1 expression may be a sign of tumor-stromal interaction promoting greater vascularization around the cancer lesion and has also been found to protect cells from dna-damaging agents and related apoptosis (jeung et al., 2006).
[table fragment as extracted: (54) 13/13 (100); n3-n4: 0/0 (0), 0/0 (0), 0/0 (0), 0/0 (0), 0/0 (0); na: 3/10 (30), 6/10 (60), 5/10 (50), 2/10 (20), 9/10 (90). table footnote: a, values in parentheses are percentages. na, information not available; gej, gastroesophageal junction; eso, esophageal; wd, well-differentiated; md, moderately-differentiated; pd, poorly differentiated. we did not observe statistical significance with any of the correlates due to small sample size.]
[figure legend: the horizontal axis shows sample numbers, whereas the fold expression in tumor samples compared with that in normal samples is shown on the vertical axis. the fold expression was calculated according to the formula $2^{(rt - et)} / 2^{(rn - en)}$, as detailed in the "materials and methods" section. each bar represents one sample. the displayed mean fold expression for each sample is calculated in comparison with the expression average of the 26 normal samples. the expression of each gene was normalized to the expression of hprt1, which showed minimal variation in all normal and neoplastic samples tested. gkn1 shows downregulation (≤0.4-fold expression) whereas anpep, pp1201, eif5a1, and ecgf1 demonstrate overexpression (≥2.5-fold expression) in primary tumors as compared to normal tissue samples.]
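the fold-expression formula quoted in the figure legend above can be written as a short python sketch. the symbol definitions are not given in this section, so the code assumes that r and e are the threshold cycles of the reference gene (hprt1) and the experimental gene, measured in tumor (t) and normal (n) tissue. the cut-offs of 2.5-fold and 0.4-fold are read from the legend, assuming its garbled comparison signs stand for "at least" and "at most". the function names are illustrative.

```python
def fold_expression(rt: float, et: float, rn: float, en: float) -> float:
    """Tumor-versus-normal fold expression: 2^(rt - et) / 2^(rn - en).

    Assumed meaning of the arguments (threshold cycles):
      rt, et: reference and experimental gene in the tumor sample
      rn, en: reference and experimental gene in the normal sample
    """
    return 2.0 ** (rt - et) / 2.0 ** (rn - en)


def classify(fold: float, up: float = 2.5, down: float = 0.4) -> str:
    """Label a fold value using the cut-offs quoted in the figure legend."""
    if fold >= up:
        return "overexpressed"
    if fold <= down:
        return "downregulated"
    return "unchanged"
```

as a worked case with made-up cycle values, rt = 20, et = 22, rn = 20 and en = 24 give $2^{-2} / 2^{-4} = 4$, which would be labeled as overexpressed.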
eif5a1 (eukaryotic translation initiation factor 5a) has been shown to be involved in cell proliferation through the action of polyamines (nishimura et al., 2002, 2005), and plays a role in the regulation of tp53-related apoptosis (li et al., 2004). pp1201, also known as transmembrane bax inhibitor motif-containing 1 (tmbim1), is a novel gene in cancer cells. although very little is known regarding gkn1, it has been previously reported as highly expressed in normal gastric epithelium (martin et al., 2003) and down-regulated in gastric carcinomas (oien et al., 2004). we have detected strong expression of gkn1 in be that was followed by loss of its expression in adenocarcinomas. this transient expression of gkn1 may be a protective response to acid-induced reflux-disease injury that is then lost with cellular progression to cancer. anpep, also known as cd13, is of particular clinical interest since it is a secreted protein that may be used as a potential biomarker. using ihc, analysis of anpep expression demonstrated protein expression at the outer cell membrane layers with significant secretion into the lumen of 6/7 barrett's high-grade dysplasia samples and generally greater expression in 35/65 adenocarcinomas, suggesting that anpep overexpression may be an early event in carcinogenesis. anpep expression plays a role in angiogenesis where a reduction in expression has been shown to cause reduced capillary formation (fukasawa et al., 2006), cell motility (chang et al., 2005), and adhesion (fukasawa et al., 2006). inhibition of anpep decreases the invasive potential of metastatic tumor cells in vitro (saiki et al., 1993). interestingly, anpep is also a cell-surface metalloproteinase that acts as a receptor for human coronavirus (yeager et al., 1992) and is considered to be a marker for epithelial-mesenchymal interaction (sorrell et al., 2003).
the combination of transcriptional analysis together with cytogenetic information provided a powerful tool to align altered transcripts across the human genome. interestingly, the distribution of deregulated genes did not follow a uniform pattern across the genome. instead, we found a remarkable pattern of distribution with the presence of transcriptional hot spots along chromosomal domains. from this pattern, we were able to identify novel, transcriptionally active, and oncogenomic hot spots. one of our surprising findings was the clustering of 26 overexpressed genes in one of the smallest human chromosomes, 19. we also identified a number of other hot spots, such as 1q21 (13 genes), 12p13 (9 genes), and 6p21.2 (6 genes) (table 2). a recent analysis of amplification-based clustering demonstrated that cancers with similar etiology, cell-of-origin, or topographical location have a tendency to obtain convergent amplification profiles (myllykangas et al., 2006). in line with this observation, vogel et al. (2005) reported that genes expressed in concert are organized in a linear arrangement for coordinated regulation. the present evidence suggests organization of a large proportion of the human transcriptome into gene clusters throughout the genome, which are partly regulated by the same transcription factors, share biological functions, and are characterized by nonhousekeeping genes (vogel et al., 2005). taken together, our results further highlight the complex organization of the cancer genome and suggest that integrated analysis of the transcriptome may reveal similar findings in other tumors as well. each cancer candidate gene was assigned to a functional group based on go information (table 4).
using this approach, several groups that are highly interesting and relevant to carcinogenesis were identified including transcriptional regulators (38 genes) and zinc finger transcription factors (23 genes). similarly, several candidate genes were found to be involved in the notable functional groups of cell-environment interaction and signal transduction. subsets of these groups were of interest and included metalloproteinases and g proteins and their regulators. among the interesting groups, we also observed deregulation of 31 genes that regulate cell calcium homeostasis. the role of calcium-binding proteins in carcinogenesis has drawn a complex picture showing downregulation or overexpression depending upon the tumor type and location (kao et al., 1990; mueller et al., 1999; heighway et al., 2002; heizmann et al., 2002; imazawa et al., 2005). the sage data also indicated up-regulation of several members of the protein phosphatases such as ppap2b, hif3a, and ppp2r1b that are known to regulate and activate several cellular kinases (parsons, 1998; nigg, 2001; bakkenist and kastan, 2004; ventura and nebreda, 2006). we have recently shown that over-expression of ppp1r1b in gastrointestinal cancers is associated with several oncogenic properties including the resistance of cancer cells to drug-induced apoptosis (belkhiri et al., 2005). taken together, our data suggest a genomic organization of cancer genes, which are involved in the deregulation of specific cellular processes important for the tumorigenesis cascade.
[figure legend: five matched tumor and normal samples that were analyzed using qrt-pcr were subjected to 1.2% agarose gel electrophoresis and ethidium bromide staining. the intensity of bands confirms the pcr results, indicating higher mrna expression levels of anpep, pp1201, eif5a1, and ecgf1, as well as lower expression of gkn1 in most of the tumor samples as compared with their matched normal control samples. hprt1 was used as a control to show similar levels in each matched normal and tumor sample.]
in conclusion, our findings indicate the presence of transcriptionally active oncogenomic hot spots in the cancer genome of bas. we have detected deregulation of several important cancer genes and identified novel targets for carcinogenesis. the biological functions and clinical significance of these genes will be elucidated in future studies. discovery of new markers of cancer through serial analysis of gene expression: prostate stem cell antigen is overexpressed in pancreatic adenocarcinoma gene ontology: tool for the unification of biology. the gene ontology consortium phosphatases join kinases in dnadamage response pathways darpp-32: a novel antiapoptotic gene in upper gastrointestinal carcinomas controlling the false discovery rate in behavior genetics research continuing climb in rates of esophageal adenocarcinoma: an update glutathione s-transferase-pi expression is downregulated in patients with barrett's esophagus and esophageal adenocarcinoma secreted and cell surface genes expressed in benign and malignant colorectal tumors the human transcriptome map: clustering of highly expressed genes in chromosomal domains cd13 (aminopeptidase n) can associate with tumor-associated antigen l6 and enhance the motility of human lung cancer cells chromosome territories, nuclear architecture and gene regulation in mammalian cells tracking prostate carcinoma micrometastasis to multiple organs using histochemical marker genes and novel cell systems loh at the sites of the dcc, apc, and tp53 tumor suppressor genes occurs in barrett's metaplasia and dysplasia adjacent to adenocarcinoma of the esophagus global functional profiling of gene expression molecular and biologic basis of upper gastrointestinal malignancy gastric carcinoma consistent genetic alterations in xenografts of proximal stomach and gastro-esophageal junction adenocarcinomas genetic differences between adenocarcinomas arising in barrett's esophagus and gastric mucosa gastric cancers overexpress darpp-32 and 
a novel isoform, t-darpp incidence of barrett's adenocarcinoma in an italian population: an endoscopic surveillance programme. gruppo operativo per lo studio delle precancerosi esofagee (gospe) aminopeptidase n (apn/cd13) is selectively expressed in vascular endothelial cells and plays multiple roles in angiogenesis prevalence and characteristics of barrett esophagus in patients with adenocarcinoma of the esophagus or esophagogastric junction expression profiling of primary non-small cell lung cancer for target identification s100 proteins: structure, functions and pathology large-scale serial analysis of gene expression reveals genes differentially expressed in ovarian cancer s100a2 overexpression is frequently observed in esophageal squamous cell carcinoma protection against dna damage-induced apoptosis by the angiogenic factor thymidine phosphorylase active involvement of ca2+ in mitotic progression of swiss 3t3 fibroblasts profiling gene expression using onto-express genetic alterations in esophageal cancer a public database for gene expression in human cancers a novel eif5a complex functions as a regulator of p53 and p53-dependent apoptosis esage: managing and analysing data generated with serial analysis of gene expression (sage) a comparative molecular analysis of developing mouse forelimbs and hindlimbs using serial analysis of gene expression (sage) a novel mitogenic protein that is highly expressed in cells of the gastric antrum mucosa evaluation of tumor angiogenesis and thymidine phosphorylase tissue expression in patients with endometrial cancer microsatellite instability occurs frequently and in both diploid and aneuploid cell populations of barrett's-associated esophageal adenocarcinomas subcellular distribution of s100 proteins in tumor cells and their relocation in response to calcium activation dna copy number amplification profiling of human neoplasms cell cycle regulation by protein kinases and phosphatases inhibition of cell growth through 
inactivation of eukaryotic translation initiation factor 5a (eif5a) by deoxyspergualin independent roles of eif5a and polyamines in cell proliferation the incidence of adenocarcinoma and dysplasia in barrett's esophagus: report on the cleveland clinic barrett's esophagus registry gastrokine 1 is abundantly and specifically expressed in superficial gastric epithelium, down-regulated in gastric carcinoma, and shows high evolutionary conservation serial analysis of gene expression identifies putative metastasis-associated transcripts in colon tumour cell lines phosphatases and tumorigenesis high-dose cytarabine and daunorubicin induction and postremission chemotherapy for the treatment of acute myelogenous leukemia in adults incidence of adenocarcinoma and mortality in patients with barrett's oesophagus diagnosed between 1976 and 1986: implications for endoscopic surveillance abundant expression of the intestinal protein villin in barrett's metaplasia and esophageal adenocarcinomas role of aminopeptidase n (cd13) in tumor-cell invasion and extracellular matrix degradation cpg island hypermethylation in progression of esophageal and gastric cancer time trends of gastroesophageal reflux disease: a systematic review novel estrogen and tamoxifen induced genes identified by sage (serial analysis of gene expression) production of a monoclonal antibody, df-5, that identifies cells at the epithelial-mesenchymal interface in normal human skin. 
apn/cd13 is an epithelial-mesenchymal marker in skin e-cadherin expression in gastroesophageal reflux disease, barrett's esophagus, and esophageal adenocarcinoma: an immunohistochemical and immunoblot study evaluation of genetic patterns in different tumor areas of intermediate-grade prostatic adenocarcinomas by high-resolution genomic array analysis targets of gene amplification and overexpression at 17q in gastric cancer serial analysis of gene expression analysing uncharted transcriptomes with sage protein kinases and phosphatases as therapeutic targets in cancer chromosomal clustering of a human transcriptome reveals regulatory background sucrase-isomaltase gene expression in barrett's esophagus and adenocarcinoma a novel method for gene expression mapping of metastatic competence in human bladder cancer human aminopeptidase n is a receptor for human coronavirus 229e the contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of the national cancer institute, university of virginia, or vanderbilt university. we thank mr. frank revetta for his technical assistance and mrs. sheryl mroz for editing this manuscript.
key: cord-355075-ieb35upi authors: papenfuss, anthony t; baker, michelle l; feng, zhi-ping; tachedjian, mary; crameri, gary; cowled, chris; ng, justin; janardhana, vijaya; field, hume e; wang, lin-fa title: the immune gene repertoire of an important viral reservoir, the australian black flying fox date: 2012-06-20 journal: bmc genomics doi: 10.1186/1471-2164-13-261 sha: doc_id: 355075 cord_uid: ieb35upi background: bats are the natural reservoir host for a range of emerging and re-emerging viruses, including sars-like coronaviruses, ebola viruses, henipaviruses and rabies viruses. however, the mechanisms responsible for the control of viral replication in bats are not understood and there is little information available on any aspect of antiviral immunity in bats. massively parallel sequencing of the bat transcriptome provides the opportunity for rapid gene discovery. although the genomes of one megabat and one microbat have now been sequenced to low coverage, no transcriptomic datasets have been reported from any bat species. in this study, we describe the immune transcriptome of the australian flying fox, pteropus alecto, providing an important resource for identification of genes involved in a range of activities including antiviral immunity. results: towards understanding the adaptations that have allowed bats to coexist with viruses, we have de novo assembled transcriptome sequence from immune tissues and stimulated cells from p. alecto. we identified about 18,600 genes involved in a broad range of activities with the most highly expressed genes involved in cell growth and maintenance, enzyme activity, cellular components and metabolism and energy pathways. 3.5% of the bat transcribed genes corresponded to immune genes and a total of about 500 immune genes were identified, providing an overview of both innate and adaptive immunity. 
a small proportion of transcripts found no match with annotated sequences in any of the public databases and may represent bat-specific transcripts. conclusions: this study represents the first reported bat transcriptome dataset and provides a survey of expressed bat genes that complement existing bat genomic data. in addition, these data provide insight into genes relevant to the antiviral responses of bats, and form a basis for examining the roles of these molecules in immune response to viral infection. bats make up approximately 20% of the extant mammalian diversity and are the second most species rich mammalian lineage after rodents [1] . the order chiroptera is divided into two suborders: the megachiroptera and microchiroptera. these two lineages are estimated to have diverged approximately 58 million years ago [2] . megachiroptera consists of a single family, the old world fruit bats, while microchiroptera includes 17 families of echolocating bats. bats have a wide geographic distribution and exploit a variety of environmental niches, being absent only from the polar regions. bats are also hosts to numerous viruses, many of which are highly pathogenic to humans and other mammals yet appear to cause no clinical consequences in bats [3] [4] [5] [6] [7] [8] . this group of mammals also shares a variety of unique characteristics that likely facilitate the persistence and spread of the viruses they carry. highly social species, bats live at much higher densities than other mammals. they are the only mammals capable of powered flight and have long lifespans relative to their body size [9] . despite their diversity, unique characteristics and role as natural reservoirs for viruses, bats are also the least studied of all mammalian taxa and there is little information available on antiviral immunity in any bat species. bats are the natural reservoir hosts of more than 80 viruses, with new viruses or viral sequences of bat origin being discovered each year [9, 10] . 
rna viruses account for the overwhelming majority of known bat viruses, many of which are among the most deadly known to man, including ebola, hendra, nipah and sars-like coronaviruses [9]. many of these viruses, which cause severe morbidity and mortality in humans and other mammals, appear to cause no clinical disease in bats under natural or experimental infection. the most studied example is the henipaviruses (hendra and nipah viruses), which are members of the family paramyxoviridae. nipah virus has a mortality rate of 40-90% in humans and close to 100% in experimental animal models (cats and hamsters). yet infection of pteropus vampyrus (the natural reservoir host of nipah virus in malaysia) and p. poliocephalus (a related bat species native to australia) with a high dose of nipah virus failed to result in clinical signs of disease [7, 8, 11]. other experimental infections of bats, including with ebola zaire, japanese encephalitis and st. louis encephalitis viruses, have not resulted in any symptoms of disease despite the presence of viral rna in tissues [3] [4] [5] [6]. experimental infections of p. poliocephalus with nipah virus have demonstrated the presence of serum antibody and viral shedding in the absence of clinical symptoms of disease [11]. the only viruses that have been demonstrated to cause clinical symptoms of disease in bats are rabies virus and the closely related australian bat lyssavirus [12, 13]. however, results of experimental infections are inconsistent, with only a small proportion of bats succumbing to infection, and rates of sero-conversion and virus recovery from tissues were reported to be very low [13]. the long co-evolutionary history of bats with viruses has likely resulted in the adaptation of the bats' immune system to cope with viral infection.
one hypothesis is that the innate immune system rapidly controls viral replication to very low levels that cause no clinical consequences to bats, but still results in viral shedding and subsequent spillover to other species. however, as little information currently exists on any aspect of bat immunology and few bat-specific reagents exist, this hypothesis remains untested. recent years have seen a surge in the availability of whole genome sequence data. bats were among the organisms sequenced as part of the us national institutes of health (nih)-funded mammalian genome project. these genomic resources are an important step forward in identifying the genes that are involved in antiviral immunity in bats and in providing insights into other unique life history characteristics. there are currently two publicly available bat genome sequences: one from the megabat p. vampyrus and a second from the microbat myotis lucifugus. both bat genomes were initially sequenced to low coverage (2.6x for p. vampyrus and 1.7x for m. lucifugus, though a draft quality assembly of the m. lucifugus genome based on 7x coverage sequencing is now available). additionally, the annotations were predominantly based upon comparative data. despite these shortcomings, these projects have an important role to play in revealing the mechanisms that have evolved to allow bats to remain asymptomatic to so many viral diseases. in order to understand bat-virus interactions, we are developing the australian black flying fox, p. alecto, as a model bat species. p. alecto belongs to the family pteropodidae and is closely related to p. vampyrus [14] . these two species are reservoirs for a variety of closely related viruses, the most important of which include the henipaviruses, hendra virus in p. alecto and nipah virus in p. vampyrus [10] . a number of important resources have now been developed for p. alecto, including cell lines from a variety of tissues [15] . 
we have also begun to identify some of the genes involved in immune responses in this species and carry out functional studies in bat cells [16] [17] [18] [19] [20] [21] . to begin to characterise the immune gene repertoire of p. alecto, we sequenced the transcriptome of bat immune tissues and mitogen-stimulated cells using the illumina platform. to our knowledge, this study represents the first analysis of the transcriptome of any species of bat. our analysis of the p. alecto transcriptome provides information on a variety of immune genes not previously identified in any bat species and represents an important starting point for examining the antiviral activity of these molecules. overview of the bat transcriptome two separate transcriptomic datasets were generated and raw sequences from each database were submitted to the sequence read archive [sra: srr350710.3 and srr351237.2]. the first was obtained using total rna extracted from a juvenile male flying fox thymus. due to its role in central tolerance, the thymus expresses a large proportion of the proteome and therefore allows for the identification of a broad range of genes, including those involved in the immune response. to enrich for sequences corresponding to cytokines and innate immune genes, the second dataset was derived from pooled total rna obtained from mitogen-stimulated spleen, white blood cells and lymph node and unstimulated thymus and bone marrow obtained from one pregnant female and one adult male flying fox. cells were stimulated with lipopolysaccharide (lps) and ionomycin, which stimulate the production of pro-inflammatory cytokines; polyic, a tlr3 ligand; pha, which triggers t cell activation and pma, which activates t and b cells. about 12.5 million 65 bp long reads were obtained from the thymus dataset, while 23.9 million 76 bp long reads were generated from the stimulated pooled sample. 
prior to assembly, the raw reads were trimmed of low-quality sequence and polya/t tails, and uninformative strings of 'n' and primer/adapter contaminants were removed. the filtered dataset consisted of 12,399,095 reads from the thymus (between 20-63 bp) and 22,577,294 reads from the stimulated pooled dataset (between 20-73 bp). the filtered reads were de novo assembled using the software packages velvet and oases. the resulting oases assemblies consisted of 247,909 contigs (n50 1244 bp) from the thymus and 313,641 contigs (n50 733 bp) from the pooled samples. the largest contigs in the thymus and pooled samples were 11.8 kb and 8.9 kb respectively, both of which correspond to the dna-dependent protein kinase catalytic subunit (dna-pkcs), which is represented by a 12.4 kb transcript in other species, including horse. for comparative purposes, an assembly using mira was also generated. summary statistics from the velvet, oases and mira assemblies are listed in additional file 1: table s1. all subsequent analyses were performed using the oases assemblies. to identify orthologues of known mammalian protein coding genes, the bat contigs were used to search the kegg and ncbi non-redundant (nr) protein databases with blastx (e-value < 0.001). of the 247,801 contigs longer than 50 bp in the thymus sequence assembly, about 46% matched annotated proteins in the nr database. for the pooled dataset, about 51% of the 313,528 loci matched proteins in nr. similar results were obtained for both assembled libraries against the kegg database. of the assembled thymus transcripts annotated using kegg, 36% of all transcripts were more similar to horse sequences than to any other species, followed by dog (16%) and cow (12%) (figure 1). similar results were obtained for the pooled tissue dataset (not shown).
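the n50 values quoted for the assemblies above can be reproduced from a list of contig lengths. the text does not define n50, so this python sketch assumes the usual meaning: the contig length at which contigs of that length or longer contain at least half of the assembled bases. the function name and the example lengths are illustrative and are not taken from the velvet or oases output.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total_bases = sum(contig_lengths)
    running_bases = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running_bases += length
        if 2 * running_bases >= total_bases:
            return length
```

because n50 is weighted by length, a few long contigs can raise it sharply, which is one reason it is usually read alongside the contig count and the largest contig size, as in the summary above.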
this result is consistent with the now generally accepted view that bats belong within laurasiatheria, which includes carnivora, cetartiodactyla (whales and even toed ungulates), eulipotyphla (moles and shrews), pholidota (scaly anteater) and perissodactyla (odd toed ungulates) [22] [23] [24] [25] [26] [27] . however, until recently, the phylogenetic relationships within laurasiatheria have been controversial. conflicting results have been reported using complete mitochondrial genome sequences to infer phylogenetic relationships with support for a sister relationship between chiroptera and fereungulata (carnivora, pholidota, perissodactyla and cetartiodactyla) or a relationship between chiroptera and eulipotyphla [28] [29] [30] . analysis of the nuclear gene, protamine p1, as well as large genomic datasets, has provided evidence that bats are sister to a clade containing perissodactyla, carnivora, and cetartiodactyla [31, 32] . the volume of sequence data generated by transcriptome sequencing provides the opportunity for larger scale sequence comparisons than previously possible using the few full length bat genes available or by comparison with the limited whole genome sequence data. our results support the comparative analysis of retroposon loci which has also demonstrated that bats share a sister relationship with horses, forming a clade with carnivora [27] . alignment of contigs from the thymus and pooled datasets to the kegg database identified 178,554 and 285,268 contigs respectively with homology to 16,863 and 16,927 unique human proteins. to explore gene function, gene ontology (go) terms were used. of contigs that matched proteins in the kegg database, 86% were assigned go terms and 78% could be mapped to go slim terms using go term mapper (additional file 2: figure s1 ). genes with go slim terms were further classified into twelve selected classes ( figure 2 ). 
the most abundant go terms found in the thymus dataset were involved in cell growth and maintenance (16.8%), enzyme activity (14.8%), cellular components (14.3%) and metabolism and energy pathways (14.5%). similar results were obtained for the pooled tissue dataset (data not shown). the go classification demonstrates that a diverse range of genes were identified in each of our two datasets providing a broad survey of bat genes. a goal of the present study was to identify immune transcripts, particularly those that may play a role in antiviral immunity. only 3.5% of the bat transcribed genes from each of the datasets showed homology to genes associated with immune function. this represents about 500 different immune-related genes ( figure 2 ). the bat immune transcripts were further categorised using go terms to annotate the transcripts into 40 immune categories. represented in the datasets were genes involved in a broad range of immune activities with lymphocyte activation, cytokine production and t cell activation making up the largest proportions of immune transcripts ( figure 3 ). using kegg codes to identify immune genes, our data revealed 70 genes involved in toll-like receptor (tlr) cascades, 50 genes involved in b cell activation, 79 involved in t cell activation, 72 involved in natural killer cell cytotoxicity and 41 involved in antigen presentation. additional immune genes not identified in the kegg database were obtained by searching sequences from the nr database. the sequences of all genes described in the text are provided in the additional file 3. one hypothesis for the ability of bats to resist the pathological effects of viral infection is that they are able to rapidly control viral replication early in the immune response through innate antiviral mechanisms. the bat transcriptome contained representatives of a variety of immune genes including pattern recognition receptors, interferons, interferon stimulated genes and natural killer cell receptors. 
pattern recognition receptors (prr) including tlrs, rig-i like helicases (rlhs) and nucleotide oligomerisation domain (nod) like receptors (nlrs) recognise conserved molecular patterns associated with a broad range of pathogens. both tlrs and rlhs initiate signalling pathways that result in the induction of similar immune and inflammatory responses but are expressed in different locations within the cell and differ in the pathogens they recognise. tlrs are transmembrane proteins expressed by the plasma membrane or endosome and recognise a broad range of pathogens including viruses, bacteria and fungi. of eleven previously identified p. alecto tlr genes [18] , only tlr5 was absent from the oases assemblies, however it was present in the mira assembly, which used a lower coverage cut-off and is useful for identifying genes with low expression levels. rlhs are expressed in the cytoplasm where they recognise viral rna and dna [33, 34] . three bat rlh genes, retinoic-acid-inducible protein i (rig-i), melanoma-differentiation-associated gene 5 (mda5) and laboratory of genetics and physiology 2 (lgp2) were identified in our transcriptome datasets and have recently been described in p. alecto [17] . these results provide further evidence that bats are able to recognise a broad range of pathogens, similar to other species. nlrs are a diverse family of cytoplasmic prrs involved in the activation of a variety of signalling pathways. nlrs are primarily involved in bacterial recognition, although more recently, evidence for recognition of viral rna and dna by some members of the nlr family has been reported [35] [36] [37] . the only nlrs identified in the bat transcriptome datasets were nod-like receptor family card domain containing 5 (nlrc5) and nlr family, pyrin domain containing 3 (nlrp3). nlrc5 is a recently identified nlr proposed to function as a positive and negative regulator of antiviral immune responses [36] . 
nlrp3 (also known as nalp3) is activated by a variety of danger signals including viral and bacterial infections and environmental irritants. activation of nlrp3 in turn activates caspase-1 in the inflammasome, which proteolytically cleaves the cytokines il-1β and il-18 into active mature peptides [37]. the identification of two nlrs with associations with antiviral immunity in the bat transcriptome is remarkable and provides a starting point for understanding the role of nlrs in antiviral immunity in bats. the interferon (ifn) response is a key component of the innate immune system, and ifns are the first cytokines induced against viral infection. since the ifn response is important in the control of viral replication in other mammals, we searched the bat transcriptome for ifns and ifn stimulated genes (isgs) that may be critical to the ability of bats to remain asymptomatic to viral infections. type i (including ifnα and β) and iii (ifnλ) ifns are induced directly in response to viral infection and play a role in the earliest stages of the innate immune response. type i (α) ifn and its receptor (ifnar1 and ifnar2) were identified in the bat transcriptome datasets (additional file 3). although the type iii ifns, ifnλ1 and ifnλ2, are upregulated in stimulated bat cells [21], neither gene was identified in our datasets, likely reflecting a low expression level in our samples. the il-10r2 chain of the type iii ifn receptor was present in the bat transcriptome, but its partner chain ifnlr1 was not found. both il-10r2 and ifnlr1 were recently described in p. alecto and ifnlr1 was demonstrated to act as a functional receptor for ifnλ [20]. the induction of type i and type iii ifns results in the transcription of hundreds of isgs including prrs that detect viral rna, transcription factors that result in the amplification of the ifn response and a small number of proteins that are directly responsible for inducing an antiviral state.
the isgs, myxovirus resistance (mx) gtpases, protein kinase r (pkr), 2'-5' oligoadenylate synthetases (oas), ribonuclease l (rnasel) and isg15 are among the proteins with confirmed antiviral activity in other mammals [38] . the bat transcriptome datasets contained genes orthologous to mammalian mx1, mx2, oas1, oas2, oas3, oas-like (oasl), pkr, rnasel and isg15 consistent with the presence of an isg repertoire in bats that is similar to that of other species. these results provide the first evidence that the pathways activated by the ifn response are likely similar in bats to those described in other mammals. the mx gene family is among the best characterised isgs, first identified as antiviral proteins following the observation that the sensitivity of many inbred mouse strains to orthomyxovirus was solely due to mutations within the mx locus [39] . the mx family of gtpases trap essential viral components, and in so doing prevent viral replication at early time points. although the full spectrum of mx antiviral activity is unknown, representatives of both rna and dna viruses have been shown to be sensitive to the effects of mx [40] . a full length transcript, encoding a 667 amino acid protein was identified in our bat transcriptome datasets and found to be orthologous to mx1 based on comparison with known mammalian mx1 and mx2 family members (figure 4a and data not shown). bat mx1 contained the highly conserved tripartite gtp-binding domain found in all mammalian mx proteins. in addition, a dynamin family signature and putative leucine zipper motif were found near the c terminal end, represented by a stretch of evenly spaced leucine residues. the bat protein was also conserved in the region identified as the stalk of human mxa including loop 2 which is associated with antiviral activity. consistent with other species, loop 4 of the mxa stalk is the least conserved region of the bat mx protein [41] . 
loop 4 has been reported to be proteinase k sensitive and may play a role in lipid binding [42, 43] (figure 4b ). bat mx1 does not contain the stretch of basic amino acids (k/r) near the c terminal end associated with nuclear localisation of mouse mx1, consistent with the bat protein remaining localised within the cytoplasm [44] . the conservation of key residues important in antiviral activity is consistent with the bat mx1 playing a role in antiviral immunity similar to other species. the identification of the sequences of important isgs will now allow us to determine whether functional differences in the initiation and regulation of these proteins account for the differences in susceptibility of bats to viral infections compared to other mammals. natural killer (nk) cells are an important component of the innate immune response, providing a first line of defence against viruses and tumours. to our knowledge, no investigations of nk cell receptors from any species of bat have been reported previously. nk cells express cell surface receptors that recognise major histocompatibility complex (mhc) class i or class i like molecules on the surface of cells and lyse infected or abnormal cells by cytotoxicity. two families of nk receptors that bind classical mhc class i ligands have been identified: the killer immunoglobulin like receptors (kirs), which are encoded by genes in the leukocyte receptor complex (lrc), and the killer cell lectin like receptors (klrs), which are encoded by genes in the natural killer complex (nkc). different lineages of mammals make use of genes from the two different superfamilies to carry out analogous functions. kirs are used preferentially by primates, cattle, domestic cats, dogs and pigs [45, 46] . similarly, the kir-like receptors, marsupial immunoglobulin-like receptors (mairs) and chicken immunoglobulin-like receptors (chirs), have expanded in marsupials and chickens respectively [47, 48] . 
although chir-ab binds igy, the ligand for the majority of chirs is unknown and the presence of a charged transmembrane residue and a cytoplasmic immunoreceptor tyrosine-based inhibition motif (itim), are consistent with the possibility that they play a role in nk activity [49] . rodents, horses and platypus are the only species so far described that have expanded the klrs, represented by the ly49 family [50] [51] [52] . in the bat transcriptome dataset, no transcripts with homology to kirs or ly49 receptors were identified. in bony fish, novel immune type receptors (nitrs) which contain an n terminal variable domain and a c terminal ig domain have been identified as the primary activating and inhibitory receptors expressed by nk cells [53] . nitrs were also used to search the bat transcriptome but failed to identify any orthologous transcripts. the failure to find kir or ly49 like receptors in the bat transcriptome may reflect low expression levels of these genes resulting in their absence from our datasets. however, blast searches of the publicly available whole genome sequence of the closely related pteropid bat, p. vampyrus revealed no evidence of kirs or ly49 receptors. as this is a low coverage genome (2.63x), further work is required to determine whether pteropid bats have kir and/or ly49 receptors. overall, the absence of these important nk receptors from our datasets warrants further investigation into the nature of nk cells in bats. nk cells in a wide range of mammalian species additionally express cd94/nkg2 (also called klrd1/ klrc) lectin-like receptor heterodimers. unlike the kir and ly49 receptors, which bind (classical) mhc class ia ligands, the cd94/nkg2 heterodimer binds the (non-classical) mhc class ib ligands hla-e and qa-1 in humans and mice respectively [54] . the cd94/ nkg2a heterodimer generates inhibitory signals whereas the cd94/nkg2c heterodimer generates activating signals within nk cells. 
both cd94 and nkg2a were identified in the bat transcriptome, however nkg2c transcripts were not identified, possibly reflecting the low abundance of transcripts of this gene in our datasets. two and 37 nkg2a transcripts were identified in the thymus and pooled datasets respectively and six transcripts corresponding to cd94 were identified in the pooled dataset. two of the longest nkg2a sequences were aligned with nkg2a and nkg2c sequences from human and mouse. as shown in figure 5a , the bat genes display highest conservation with other nkg2a genes including the presence of conserved itim motifs in their cytoplasmic domains, designated by i/v/l/sxyxxl/v indicating that they are likely functional inhibitory receptors [55] . the more divergent nkg2d, which binds mhc class i chain-related genes, mica/b, and the ul16 binding proteins (ulbps) in human [46] , was also detected. two distinct bat cd94 contigs were identified, one of which is missing two conserved cysteines in the stalk region, the first of which forms an interchain disulfide bond with nkg2 and the second which forms an intrachain disulphide bond. the second bat cd94 sequence is missing a conserved cysteine in the extracellular domain that forms an intrachain disulphide bond (figure 5b ). the absence of key cysteines in both of the bat cd94 sequences may have implications for the formation of heterodimers with nkg2 and for the unique folding of the cd94 chain. combined with our failure to detect kirs or ly49 receptors, our data may provide the first evidence for the presence of atypical nk cell responses in bats. however, confirmation of the nature of the nk response and the composition of receptors used by bat nk cells awaits further investigation. other nk receptors were also identified in our datasets including cd244 which acts as an activating or inhibitory receptor on human and mouse nk cells respectively [56] and the natural cytotoxicity receptors expressed by nk cells. 
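the itim consensus quoted above (i/v/l/s-x-y-x-x-l/v) translates directly into a regular expression, which is one simple way to screen deduced protein sequences for candidate inhibitory motifs. the following is a minimal sketch, not the procedure used in this study; the function name `find_itims` and the short test sequence are hypothetical and serve only to illustrate the pattern.

```python
import re

# ITIM consensus I/V/L/S-x-Y-x-x-L/V written as a regular expression.
# The lookahead wrapper lets overlapping candidates be reported as well.
ITIM_PATTERN = re.compile(r"(?=([IVLS].Y..[LV]))")


def find_itims(protein):
    """Return (0-based start, motif) pairs for every ITIM-like match."""
    return [(m.start(), m.group(1)) for m in ITIM_PATTERN.finditer(protein.upper())]


# Hypothetical cytoplasmic-tail fragment, made up for illustration only.
toy_tail = "MDNQGVIYSDLNLPPNPKQEITYAELNLQ"
for start, motif in find_itims(toy_tail):
    print(start, motif)
```

a match to the consensus only flags a candidate; whether the motif is functional depends on its location in the cytoplasmic domain and on experimental confirmation.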
co-receptors including cd16 and cd56 expressed by subsets of nk cells in other species were also identified in the bat transcriptome. identification of nk cell receptors and co-receptors provides information for the development of reagents to identify bat nk cells and paves the way for further studies of nk cell function during viral infection in bats. genes involved in the adaptive immune system, including mhc class i and ii genes and t and b cell receptors and co-receptors, were highly represented in both the thymus and pooled datasets, providing evidence that bats have all of the components necessary to mount an adaptive immune response. mhc class i molecules play an important role in the initiation of the adaptive immune response through recognition of endogenously-derived peptides from viruses and other pathogens. in the thymus dataset, 46 contigs had homology to mammalian mhc class i proteins, while 24 were homologous in the pooled data. other transcripts in the mhc class i antigen-loading and presentation pathway were also identified, including beta-2-microglobulin, transporter associated with antigen processing 1 (tap1), calnexin and tapasin. class i-related genes were also present in the bat transcriptome dataset including cd1a, cd1b, cd1d, mr1, hfe, fcrn and ulbps, which have a variety of immune and non-immune functions. the presence of ulbps is consistent with the expression of nkg2d, but orthologues of mica/b or mill were not observed. the presence of nkg2d suggests bats should have a mic homologue, which may have gone undetected because of low or tissue-specific expression. to our knowledge, these sequences provide the first class i and class i-associated transcripts from any species of bat. of the 46 contigs with homology to mhc class i genes in the thymus dataset, 29 contained in-frame stops. these may be expressed pseudogenes, may represent assembly or sequencing errors, or may result from reading frame shifts due to the presence of unprocessed transcript.
as the sequences were obtained from multiple individuals, it is not possible to confidently distinguish between alternative isoforms, alleles and in some cases, loci. however, clustering of the remaining contigs with open reading frames (orfs) indicates that at least 9 distinct mhc class i genes are expressed. the majority of class i contigs contained the α1 or α2 domain or partial sequence corresponding to both domains and were used for further sequence analysis. the deduced amino acid sequences of contigs with the most complete α1 or α2 domains were aligned with human hla-a (figure 6). all of the bat class i sequences contained a unique three amino acid insertion in the α1 domain that appears to be bat specific. as shown in figure 6, the bat transcripts display amino acid variation in their α1 and α2 domains, corresponding to the peptide binding region. however, they appear to be remarkably conserved from residues 131 to 175 of the α2 domain. these results may indicate that bats contain a very closely related class i gene repertoire that has coevolved with the specific viruses they carry. some of the class i transcripts represented in the thymus and pooled datasets contained an 84 bp insertion at the end of the α1 domain. the longest of these transcripts corresponded to the leader peptide through to 71 amino acids of the α2 domain and is shown in figure 6. the insertion at the end of the α1 domain is not present in class i sequences from other mammals and includes two in frame stop codons that would prevent translation beyond the α1 domain (figure 6). this sequence was confirmed by race pcr and transcripts were detected in a variety of tissues including lymph node, spleen, liver, lung, heart, kidney, small intestine, brain and salivary glands, thus providing evidence that they are not an artefact of the transcriptome assembly (data not shown). comparison with the closely related p. vampyrus whole genome sequence available in ensembl revealed that the 84 bp insertion is identical to the beginning of intron 2 of a p. vampyrus class i gene. mhc class i splice variants that retain intron sequence and result in the translation of a truncated protein have been identified in other mammals, including soluble splice variants of human hla-g that play a role in immunoregulation at the foetal-maternal interface [57]. further investigation will be required to determine whether the bat gene encodes a soluble protein corresponding only to the α1 domain or whether it represents a transcribed pseudogene. however, given the abundance of this transcript in our datasets it is possible that it plays a role in immune regulation in p. alecto.
figure 5 a. alignment of deduced amino acid sequences of bat nkg2 with human and mouse nkg2a and nkg2c. sequences are divided into cytoplasmic, transmembrane, stalk, and lectin domains. the predicted itim motifs in the cytoplasmic domain are shaded. the conserved cysteine residue in the stalk predicted to be involved in interchain disulphide bond formation with cd94 is shaded and indicated with an asterisk. dashes indicate similarity and dots indicate gaps. b. alignment of the deduced amino acid sequences of bat cd94 with the human and mouse orthologues. sequences are divided into cytoplasmic, transmembrane, stalk, and lectin domains. conserved cysteines predicted to be involved in disulphide bond formation are shaded. cysteine pairs are indicated by identical numbers below the cysteine. the cysteine predicted to form a disulphide bond with nkg2 is indicated with an asterisk.
unlike class i molecules, which are ubiquitously expressed, class ii molecules are expressed only by antigen presenting and b cells and present exogenously derived peptides to t cells.
the mhc class ii molecules are composed of an α and a β chain encoded by a and b genes respectively [58, 59]. eutherians have three main classical class ii gene clusters: dp, dr, and dq, as well as the nonclassical dm and dn/do gene clusters [60, 61]. sequences corresponding to exon 2 of mhc class ii drb genes have been described in four species of microbats [62] [63] [64]. however, prior to the present study no class ii genes had been reported from any species of megabat. sequences corresponding to genes involved in the class ii antigen processing and presentation pathway were also identified in our datasets including the class ii invariant (cd74) chain and cathepsin s (additional file 3). in the p. alecto thymus and pooled datasets we identified 78 and 238 contigs respectively that were homologous to class ii sequences. phylogenetic analysis revealed that the alpha chain sequences were homologous to dma, doa, dqa and dra from other mammals (figure 7a) and the beta chain sequences were homologous to dmb, dob, dqb and drb (figure 7b). these results are consistent with orthologous relationships between the bat class ii genes and those from other mammals. t cell receptor (tcr) genes corresponding to all four chains of the t cell receptor were present in our datasets, consistent with bats having both αβ and γδ t cells. sequences corresponding to the constant and variable domains of the tcr were identified including many tcrα related contigs, tcrβ related contigs, and a few tcrγ and tcrδ chain related contigs. in humans and mice approximately 95% of circulating t cells express the αβ t cell receptor. in contrast, γδ t cells account for up to 70% of circulating t cells in young ruminants, rabbits and chickens [65, 66]. the low abundance of tcrγ and tcrδ related transcripts in our datasets is consistent with the possibility that αβ t cells may be the predominant t cell population present in bats.
in addition, a variety of t cell co-receptors, including the accessory tcrζ chain, cd3, cd4, cd8 and cd28 were identified in our datasets. we previously described the immunoglobulin heavy chain diversity of p. alecto, revealing the presence of a highly diverse variable region gene repertoire [16]. sequences encoding the variable and constant domains of immunoglobulin heavy and light chains were represented in our datasets. these included heavy chain genes encoding iga, igg, igm and ige, which have previously been described in the megabat, cynopterus sphinx. no evidence for the transcription of igd was observed in the p. alecto transcriptome, a result which is consistent with c. sphinx [67]. the two light chain subtypes, kappa and lambda and a variety of b cell co-receptors including cd19, cd22, cd72, cd79a and cd79b were also identified in our datasets (additional file 3). many of the bat immune transcripts showed high levels of sequence similarity compared to homologues from other mammals. among the most conserved bat innate immune genes were the prrs; the tlrs, rig-i helicases and nlrs, which displayed >80% amino acid identity with homologues. this likely reflects their roles in the recognition of conserved pathogen motifs. members of the oas family were also highly conserved, in particular oas1 which shared 87% amino acid identity with the dog oas1 sequence. in addition, the nk co-receptor, cd56 shared 93% amino acid identity with mouse, hamster, guinea pig and human sequences. among the adaptive immune genes, mhc associated proteins, calnexin, tap1 and cathepsin s shared 89-95% identity with corresponding sequences from other mammals reflecting their conserved roles in the antigen processing and presentation pathway. several members of the mhc class i and ii families were also highly conserved, including cd1b and cd1d which shared 88 and 89% amino acid identity with horse and chimp sequences respectively.
the bat mhc class ii doa and dra shared 91 and 89% amino acid identity with orthologous sequences in other mammals. the t cell co-receptor, cd28 shared 90% identity with the rhinoceros cd28 sequence and the constant domain of igm shared 92% identity with camel igm. there were >77,000 unannotated contigs in the thymus and pooled datasets. only about 3% of these contigs matched predicted cdnas from the p. vampyrus genome sequence, which are annotated using orthologous sequences from other species [68]. the unannotated contigs contained a total of 3266 open reading frames (orfs) longer than 300 bp. of these, 92.6% (e-value < 10^-3) aligned to the closely related p. vampyrus whole genome sequence and represent highly divergent homologues or bat specific genes. the remaining loci represent either misassembled contigs or bat-specific transcripts that are located in sequencing gaps in the low coverage p. vampyrus genome sequence. the 3266 long (>300 nt) orfs were searched for conserved domains using profile hidden markov models with hmmscan (hmmer v3; http://hmmer.org/) obtained from the pfam database [69]. this identified 345 orfs containing 214 unique domains, including several defensins, antimicrobial peptides and dna-binding domains. searches using domain models from the pfam-b database identified a further 437 unique, predicted-conserved domains in 733 orfs. a further 2188 orfs remained unannotated. a high proportion of these were rich in cysteine, tryptophan and proline, and prolines frequently appeared in low complexity regions (additional file 4: figure s2a and b). further characterisation of these unannotated transcripts will provide insight into whether they are functionally significant and in particular whether any unique bat specific transcripts are involved in the antiviral immune response. bats are a highly diverse, species rich group of mammals that have evolved a variety of distinctive characteristics since their divergence from other mammals [70].
despite the central importance of bats in harbouring a variety of viruses with the potential to spill over to other species, very little is known about antiviral immunity in bats. next generation sequencing provides the opportunity to survey genes that are conserved between distantly related species as well as to provide insights into novel adaptations through the identification of previously unidentified transcripts. to identify genes involved in the immune response, we carried out a transcriptome analysis of thymus and immune cells and tissues of the australian black flying fox, p. alecto. this study represents the first survey of expressed bat immune genes and complements existing low coverage bat genome sequences. our analysis provides a broad overview of the bat transcriptome and contains representatives of all of the major classes of immune genes. the results are consistent with bats having all of the components of the immune system present in other mammals. the majority of these correspond to genes that have not previously been described in any species of bat and thus represent an important resource for future investigations into antiviral immunity in bats.
animals
p. alecto bats used in this study were wild caught from east brisbane, queensland, australia. bats were handled and euthanised as previously described [15]. all experiments were approved by the australian animal health laboratories animal ethics committee (protocol aec1281). the thymus was removed from a juvenile male bat and immediately stored in rnalater (ambion) for rna extraction. the spleen, lymph nodes (ln), thymus, bone marrow and peripheral blood were collected from one adult male and one pregnant female bat. single cells were extracted from the spleen, thymus and ln by tissue extrusion through a 70 μm sterile sieve (bd biosciences) in the presence of dmem supplemented with 15 mm l-glutamine, 100 units/ml penicillin and 100 units/ml streptomycin (invitrogen).
splenocytes and peripheral blood lymphocytes (pbmls) were isolated by density centrifugation over lymphoprep (nycomed, oslo) as described previously [21]. cells were resuspended in dmem with 10% fcs, 15 mm l-glutamine, 100 units/ml penicillin and 100 units/ml streptomycin and cell numbers were determined using a haemocytometer with trypan blue exclusion. the thymus and bone marrow cells were stored in rnalater (ambion) for rna extraction and the spleen, thymus and ln were cultured with a variety of stimulants. the isolated splenocytes, ln and pbmls from each bat were pooled and were then seeded at 1 x 10^7 cells per well in 24 well tissue culture plates (nunc) with pha (10 μg/ml; sigma) and lps (10 μg/ml; sigma); pma (50 μg/ml; sigma) and ionomycin (2nm/ml; sigma); or polyic (30 μg/ml; invivogen) and incubated in a humidified atmosphere of 5% co2 in air at 37°c. cells were harvested in rlt buffer (qiagen) at 0, 1, 4 and 18 hours and homogenised using a qiashredder (qiagen) following the manufacturer's instructions. the lysate was then stored at -80°c and total rna extracted the next day (0, 1, and 4 hours) or processed immediately (18 hours). rna extraction was carried out as previously described using the rneasy mini kit (qiagen) with removal of genomic dna with dnase i digestion [16]. total rna from the thymus of a juvenile male bat was used for illumina sequencing separately from all other samples. total rna obtained from the stimulated and unstimulated cells from the two adult bats was pooled as follows: 22% thymus total rna (11% from each bat) and 78% pooled total rna from the rest of the mitogen stimulated and unstimulated cells/tissues (~3.45% for each sample; total of 22 samples).
sequencing
mrna isolation from total rna, library preparation and single-end read sequencing were performed by geneworks pty ltd, thebarton, south australia using the illumina genome analyser iix sequencing platform.
library preparation was performed as per illumina's mrna sequencing sample preparation guide (part # 1004898 rev. d) except 5 μg of total rna was used for mrna selection using poly-t oligo-attached magnetic beads. the thymus library was run on a single lane of a flow cell resulting in more than 12.5 million 65-base sequences for a total of about 0.82 gigabases (gb) of sequence. the pooled library consisted of 4 lanes resulting in 24 million 76 bp sequences for a total of about 1.8 gb of sequence.
sequence pre-processing and de novo assembly
the quality of the sequences was evaluated using fastqc [71]. sequences were pre-processed in two stages. first, all bases at the 3' end of the reads with quality scores of 3 or lower were removed. second, poly a/t tails, uninformative sequences (ns) and primer/adaptor contaminants were trimmed using snowhite (version 1.1.3) [72], a cleaning pipeline for next-generation cdna sequences, which includes seqclean [73] and tagdust [74]. we ran snowhite with two runs of seqclean and one run of tagdust, and a final minimal length cutoff of 20 bp was used. the pre-processed sequences were de novo assembled using two different approaches. (1) the reads were assembled with velvet (version 1.0.12) [75] using individual kmers from 19 bp to 31 bp. next, the contigs produced by velvet were processed using oases (version 0.1.15) [76]. oases loci were then merged using cd-hit-est (version 4.0) [77] with a global sequence identity threshold of 1.0. finally, a length cutoff was set to 50 bp and the default coverage cutoff of 3 was used. we term the final result of this process a contig. (2) the reads were also assembled using mira 3 (v3.2.0rc3) [78] with default setting for est and illumina reads assembly, i.e.
maximum front and end gap clip of 2 bp, maximum length of possible vector leftover of 18 bp, minimum quality score, window length and read length all set to 20, clipping of poly a/t at ends allowed, and minimum read coverage per contig of 2. the bat contigs were first annotated using the best hits of a blastx [79] search against the nr protein database and the kegg pathway database with an e-value cutoff of 0.001, in order to annotate protein coding contigs that were conserved with other species. the remaining unannotated contigs were then annotated using a blastn search against the refseq_rna database with an e-value cutoff of 10^-5, in order to capture contigs containing conserved utrs but no significant protein coding regions. the contigs not annotated by the above two steps were further analysed using blastn against the cdnas from megabat (p. vampyrus) and microbat (m. lucifugus). we translated the unannotated transcripts into protein sequences in all six frames and extracted the orfs longer than 300 bp. this was performed separately for the 2 datasets. these orfs were searched against the pfam-a and pfam-b databases to identify conserved domains. the two sets of long orfs were pooled and clustered using cd-hit with a sequence identity of 50% [77]. the amino acid compositions of the nonredundant long orfs were further analysed with composition profiler [80]. all the kegg ids of the human proteins identified by the blastx searches were extracted from the annotation process and were mapped to uniprot ids. then the go analysis for the uniprot proteins (uniprotkb-goa: gene_association.goa_human) was used to assign the go terms for the transcripts. the number of genes in categories of the go slim database was counted using the go term classification counter, categoriser [81, 82] and the immune category of the bat genes was annotated using the generic gene ontology term mapper [83].
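the six-frame translation and orf extraction step described above can be sketched in a few lines. this is a minimal illustration rather than the pipeline used in the study: it assumes the standard genetic code, treats an orf as any stretch between stop codons (no start codon is required, which is an assumption, since the text does not define the orf boundaries), and the function names and toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def long_orfs(seq, min_nt=300):
    """Return (strand, frame offset, peptide) for stop-to-stop stretches
    longer than min_nt nucleotides in any of the six reading frames."""
    seq = seq.upper()
    found = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for peptide in translate(template[offset:]).split("*"):
                if len(peptide) * 3 > min_nt:
                    found.append((strand, offset, peptide))
    return found


# Toy contig (made up); a low threshold is used so the example prints something.
toy_contig = "ATG" + "GCT" * 5 + "TAA"
for hit in long_orfs(toy_contig, min_nt=12):
    print(hit)
```

with the default `min_nt=300` the function keeps only peptides of more than 100 residues, which mirrors the 300 bp cutoff stated in the text.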
the go classifications were further grouped into twelve broad categories as follows: cell death and apoptosis go:0005783; extracellular matrix go:0005739; cell, go:0005623 and nucleoplasm, go:0005654). binding (binding, go:0005488; protein binding, go:0005515 go:0003677; nucleotide binding go:0004872; actin binding, go:000377; calcium ion binding, go:0005509; chromatin binding, go:0003682; carbohydrate binding, go:0030246 and rna binding, go:0003723). reproduction and development (development enzyme activity (catalytic activity go:0016787 ca), based on the protein alignment to retain codon positions. based on the nucleotide and protein alignments, phylogenetic trees were constructed by the neighbour joining method [85], maximum parsimony and minimum evolution using the mega4 program the genbank accession numbers for sequences used in the sequence and phylogenetic analysis are as follows: mhc class i: (cap58485) hla-a; mhc class iia: human, homo sapiens (hosa) dma (nm_006120) dqa (m21931), dra (u13648) dra (nm_001113706), dma (nm_001004039) gallus gallus (gaga) b-la (ay357253) mhc class iib: hosa dqb (m20432), drb (nm_021983), dob (l29472), dpb (m57466), dmb (u15085) rattus norvegicus (rano) dqb (x56596) equus caballus (eqca) dqb (l33910) dmb (dq431246), drb (ay191776) ovis aries (ovar) dqb (l08792) mumu (nm_001136068) hosa (af135187_1) avian mx: gaga (np_989940) mammal species of the world: a taxonomic and geographic reference in the timetree of life human ebola outbreak resulting from direct exposure to fruit bats in luebo, democratic republic of congo swanepoel r: fruit bats as reservoirs of ebola virus studies of arthropod-borne virus infections in chiroptera. iv. the immune response of the big brown bat (eptesicus f. 
fuscus) maintained at various environmental temperatures to experimental japanese b encephalitis virus infection experimental inoculation of plants and animals with ebola virus transmission studies of hendra virus (equine morbillivirus) in fruit bats, horses and cats experimental hendra virus infection in pregnant guinea-pigs and fruit bats (pteropus poliocephalus) bats: important reservoir hosts of emerging viruses bats as a continuing source of emerging infections in humans experimental nipah virus infection in pteropid bats (pteropus poliocephalus) australian bat lyssavirus infection in a captive juvenile black flying fox pathogenesis studies with australian bat lyssavirus in grey-headed flying foxes (pteropus poliocephalus) a phylogenetic supertree of the bats (mammalia: chiroptera) establishment, immortalisation and characterisation of pteropid bat cell lines immunoglobulin heavy chain diversity in pteropid bats: evidence for a diverse and highly specific antigen binding repertoire molecular characterisation of rigi-like helicases in the black flying fox, pteropus alecto molecular characterisation of toll-like receptors in the black flying fox pteropus alecto interferon production and signaling pathways are antagonized during henipavirus infection of fruit bat cell lines type iii ifn receptor expression and functional characterisation in the pteropid bat, pteropus alecto type iii ifns in pteropid bats: differential expression patterns provide evidence for distinct roles in antiviral immunity complete mitochondrial genome of a neotropical fruit bat, artibeus jamaicensis; and a new hypothesis of the relationships of bats to other eutherian mammals parallel adaptive radiations in two major clades of placental mammals molecular phylogenetics and the origins of placental mammals resolution of the early placental mammal radiation using bayesian phylogenetics molecules consolidate the placental mammal tree pegasoferae, an unexpected mammalian clade revealed by 
tracking ancient retroposon insertions the phylogenetic position of the talpidae within eutheria based on analysis of complete mitochondrial sequences monophyletic origin of the order chiroptera and its phylogenetic position among mammalia, as inferred from the complete sequence of the mitochondrial dna of a japanese megabat, the ryukyu flying fox (pteropus dasymallus) maximum likelihood analysis of the complete mitochondrial genomes of eutherians and a reevaluation of the phylogeny of bats and insectivores confirming the phylogeny of mammals by use of large comparative sequence data sets characterization and phylogenetic utility of the mammalian protamine p1 gene differential roles of mda5 and rig-i helicases in the recognition of rna viruses shared and unique functions of the dexd/h-box helicases rig-i, mda5, and lgp2 in antiviral innate immunity function of nod-like receptors in microbial recognition and host defense regulation of immune pathways by the nod-like receptor nlrc5 nlrp3 inflammasome activation: the convergence of multiple signalling pathways on ros production? interferon-inducible antiviral effectors transgenic mice with intracellular immunity to influenza virus the mx gtpase family of interferon-induced antiviral proteins. 
microbes and infection the interferon-induced mx protein of chickens lacks antiviral activity stalk domain of the dynamin-like mxa gtpase protein mediates membrane binding and liposome tubulation via the unstructured l4 loop structural basis of oligomerization in the stalk region of dynamin-like mxa transport of the murine mx protein into the nucleus is dependent on a basic carboxy-terminal sequence natural killer cell receptors in cattle: a bovine killer cell immunoglobulin-like receptor multigene family contains members with divergent signaling motifs comparative genomics of natural killer cell receptor gene clusters characterization of the opossum immune genome provides insights into the evolution of the mammalian immune system the leukocyte receptor complex in chicken is characterized by massive expansion and diversification of immunoglobulin-like loci the chicken leukocyte receptor complex encodes a family of different affinity fcy receptors the ever-expanding ly49 gene family: repertoire and signaling natural killer cell receptors in the horse: evidence for the existence of multiple transcribed ly49 genes identification of natural killer cell receptor clusters in the platypus genome reveals an expansion of c-type lectin genes the phylogenetic origins of natural killer receptors and recognition: relationships, possibilities, and realities nk gene complex dynamics and selection for nk cell receptors signaling pathways engaged by nk cell receptors: double concerto for activating receptors, inhibitory receptors and nk cells of mice and men: different functions of the murine and human 2b4 (cd244) receptor on nk cells biology and functions of human leukocyte antigen-g in health and sickness* nomenclature for factors of the hla system three-dimensional structure of the human class ii histocompatibility antigen hla-dr1 sequence organisation of the class ii region of the human mhc evolutionary relationships of class ii majorhistocompatibility-complex genes in mammals 
class ii drb polymorphism and sequence diversity in two vesper bats in the genus myotis non-neutral evolution of the major histocompatibility complex class ii gene drb1 in the sac-winged bat saccopteryx bilineata mhc class ii drb diversity, selection pattern and population structure in a neotropical bat species, noctilio albiventris prominence of gamma delta t cells in the ruminant immune system characterization of avian t-cell receptor γ genes the two suborders of chiropterans have the canonical heavy-chain immunoglobulin (ig) gene repertoire of eutherian mammals the pfam protein families database a molecular phylogeny for bats illuminates biogeography and the fossil record tagdust-a program to eliminate artifacts from next generation sequencing data velvet: algorithms for de novo short read assembly using de bruijn graphs oases: de novo transcriptome assembler for very short reads cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences using the miraest assembler for reliable and automated mrna transcript assembly and snp detection in sequenced ests gapped blast and psi-blast: a new generation of protein database search programs composition profiler: a tool for discovery and visualization of amino acid composition differences categorizer: a web-based program to batch analyze gene ontology classification categories generic gene ontology (go) term mapper multiple sequence alignment using clustalw and clustalx the neighbor-joining method: a new method for reconstructing phylogenetic trees mega4: molecular evolutionary genetics analysis (mega) software version 4.0 statistics of local complexity in amino acid sequences and sequence databases accuracy of protein flexibility predictions the universal protein resource (uniprot)
we thank craig smith, carol de jong, deborah middleton low complexity regions in protein sequences were detected with the seg program with default parameters [87]. the transcription of a bat mhc class i gene was examined using quantitative pcr (qpcr) as described previously [18]. briefly, total rna was prepared from lymph node, spleen, liver, lung, heart, kidney, small intestine, brain and salivary glands using the rneasy mini kit (qiagen) as described above. cdna was generated using a quantitect reverse transcription kit for rt-pcr (qiagen). qpcr primers were designed using primer express 3.0 (applied biosystems) with default parameter settings (5'-acgactcctattccccaggatag-f and 5'-gaaagccactggtacctgtgaga-r). reactions were carried out using express sybr greener qpcr supermix universal (invitrogen) and an applied biosystems 7500 fast real-time qpcr instrument. additional file 1: table s1. summary of additive multiple-kmer velvet/oases/mira3 assembly. additional file 2: figure s1. overview of the bat transcriptome. the distribution of 178,554 and 285,268 transcriptome sequences that have mapped to human orthologues from p. alecto thymus and pooled tissue datasets based on go slim terms. sequences within the three areas of gene ontology: molecular function, biological process and cellular component are further divided into subgroups at the go slim level. additional file 3: sequences of all genes described in the manuscript. additional file 4: figure 2. amino acid composition of large unannotated orfs. the horizontal axis shows amino acids sorted by flexibility index [88]. a. amino acid composition of 1656 large unannotated non-redundant orfs relative to proteins in the swissprot database [89]. the amino acids trp, cys and pro have twice the abundance in unannotated orfs compared to swissprot proteins. b.
amino acid composition of 1195 low complexity regions in unannotated orfs relative to 1656 unannotated non-redundant orfs. prolines are abundant in low complexity regions, but trp and cys are not. the authors declare that they have no competing interests. key: cord-289033-vfh3op6a authors: algammal, abdelazeem m.; w. el-kholy, ali; riad, emad m.; mohamed, hossam e.; elhaig, mahmoud m.; yousef, sulaiman a. al; hozzein, wael n.; ghobashy, madeha o. i. title: genes encoding the virulence and the antimicrobial resistance in enterotoxigenic and shiga-toxigenic e. coli isolated from diarrheic calves date: 2020-06-10 journal: toxins (basel) doi: 10.3390/toxins12060383 sha: doc_id: 289033 cord_uid: vfh3op6a calf diarrhea is one of the considerable infectious diseases in calves, which results in tremendous economic losses globally. to determine the prevalence of shiga-toxigenic e. coli (stec) and enterotoxigenic e. coli (etec) incriminated in calf diarrhea, with special reference to shigatoxins genes (stx1 and stx2) and enterotoxins genes (lt and sta) that govern their pathogenesis, as well as the virulence genes; eaea (intimin) and f41(fimbrial adhesion), and the screening of their antibiogram and antimicrobial resistance genes; aadb, sul1, and bla-tem, a total of 274 fecal samples were collected (april 2018–feb 2019) from diarrheic calves at different farms in el-sharqia governorate, egypt. the bacteriological examination revealed that the prevalence of e. coli in diarrheic calves was 28.8%. the serotyping of the isolated e. coli revealed 7 serogroups; o(26), o(128), o(111), o(125), o(45), o(119) and o(91). furthermore, the congo red binding test was carried out, where 89.8% of the examined strains (n = 71) were positive. the antibiogram of the isolated strains was investigated; the majority of e. coli serotypes exhibit multidrug resistance (mdr) to four antimicrobial agents; neomycin, gentamycin, streptomycin, and amikacin. 
polymerase chain reaction (pcr) was used to detect the prevalence of the virulence genes; stx1, stx2, lt, sta, f41 and eaea, as well as the antimicrobial resistance genes; aadb, sul1, and bla-tem. the prevalence of stec was 20.2% (n = 16), while the prevalence of etec was 30.4% (n = 24). briefly, the shiga toxin genes, stx1 and stx2, are the most prevalent virulence genes associated with stec; they are responsible for the pathogenesis of the disease, aided by the intimin gene (eaea). in addition, the lt gene is the most prevalent enterotoxin gene in the etec strains, either alone or in combination with sta and/or f41 genes. the majority of pathogenic e. coli incriminated in calf diarrhea possess the aadb resistance gene, followed by the sul1 gene. enrofloxacin, florfenicol, amoxicillin-clavulanic acid, and ampicillin-sulbactam are the most effective antimicrobial agents against the isolated stec and etec strains. calf diarrhea is one of the most predominant syndromes in newborn animals all over the world and is associated with remarkable economic losses and high morbidity and mortality rates [1]. the most prevalent bacterial pathogen incriminated in diarrhea of young calves is escherichia coli (e. coli); moreover, the most common viral causes are rotavirus and coronavirus [2]. based on the molecular and pathological criteria, the most common pathotypes of e. coli incriminated in neonatal colibacillosis are; shiga-toxigenic e. coli (stec), enterotoxigenic e. coli (etec), and enterohemorrhagic e. coli (ehec) [3]. etec is a common pathotype associated with infectious diarrhea in calves. newborn calves are highly susceptible to etec, which is associated with watery diarrhea. etec colonize the small intestine after fimbrial adhesion and predispose to severe watery diarrhea [4]. two main virulence factors are involved; the fimbriae and the enterotoxins [5].
etec is capable of producing 2 major types of enterotoxins; heat-labile (lt) and heat-stable (sta and stb) enterotoxins, in both man and animal [6]. ruminants are considered the main reservoir of stec; the severity of infection in ruminants varies depending upon the animal age, immunity, and the gastrointestinal tract conditions. certain animals undergo exaggerated shedding of stec (>10^4 cfu/g fecal matter), which results in the contamination of the environment and the transmission of the infection [7]. a large number of stec outbreaks were reported, due to the ingestion of vegetables and fruits contaminated with animal feces. stec infection in humans is mainly associated with hemolytic uremic syndrome and hemorrhagic colitis. stec has major public health importance, since it is incriminated in causing several food-borne outbreaks [8]. stec are associated with dysentery in young calves. they produce two potent types of shiga-toxins; stx1 and stx2, and certain types of stec have the ability to produce the intimin. the e. coli strains which possess the eaea gene and do not harbor the stx1 and stx2 genes were defined as enteropathogenic e. coli [9, 10]. the bacteriophages play a major role in the transmission of stx genes. the stx-phages share an identical sequence that is analogous to that of lambdoid phages. the presence of stx genes in the lysis portion of the phage genome illustrates the link between the production of shiga-toxins and the release of phage during the lytic growth [3, 10]. antimicrobial resistance is usually associated with pathogenic e. coli and could be attributed to the widespread improper use of antibiotics. multidrug resistance (mdr) is common in e. coli and is primarily associated with several genes such as bla-tem and blactx (β-lactamase genes), sul1 (sulfonamide resistance gene), and aadb (aminoglycoside resistance gene) [11, 12]. the current study was performed to investigate the prevalence of shiga-toxigenic and enterotoxigenic e.
coli incriminated in calf diarrhea, with particular reference to shiga-toxins genes (stx1 and stx2) and enterotoxins genes (lt and sta) that govern their pathogenesis, as well as the virulence genes; eaea and f 41. in addition, the screening of their antibiogram and antimicrobial resistance genes; aadb (aminoglycosides-resistance gene), sul1 (sulfonamides-resistance gene), and bla-tem (extended β-lactamase gene) is conducted, in order to select the antibiotics of choice. the bacteriological examination of 274 fecal swabs obtained from diarrheic calves revealed that the overall prevalence of e. coli was 28.8% (n = 79), based on the microscopical examination, colonial characters on macconkey's agar and eosin-methylene-blue agar, as well as biochemical tests. concerning the age of the examined calves, the results revealed that 15 diarrheic calves were infected with e. coli in the first 2 months of age (33.3%), 41 calves at 2-4 months old (28.5%), and 23 calves at 4-6 months old (27.1%) ( table 1) . in the present study, the serological identification of the retrieved isolates revealed that a total of 64 (81.01%) of the isolated strains were typable with o antisera, while 15 isolates (18.99%) were untypable. the most common serogroup was o128 (16.5%), followed by o111 (13.9%) and o26 (11.4%), o125 (11.4%), o91 (10.1%), o45 (8.9%), and o119 (8.9%). the frequency of e. coli serogroups was illustrated in table 2 . there is no statistically significant difference in the prevalence of e. coli serogroups (p > 0.05). the congo red test revealed that 89.8% of the tested e. coli strains (n = 71) were positive, including the members of serotypes; o26, o111, o125, o128, o45, o119 and 15 untypable strains, while the strains of the o91 serogroup were cr negative (n = 8). the antimicrobial susceptibility testing of the isolated e. coli strains (table 3) showed a remarkable resistance to neomycin (96.2%), gentamycin and streptomycin (95%), and amikacin (93.7%). 
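the age-wise prevalence figures above can be checked with a short python sketch. the numbers of positive calves (15, 41, 23) come from the text; the numbers of calves examined per age group (45, 144, 85) are not stated in this section and are back-calculated here from the reported percentages, so they should be read as an assumption (they do sum to the 274 calves sampled). the test used by the authors is not named in this section; a pearson chi-square test is assumed.

```python
import math

# (age group, E. coli-positive calves, calves examined)
# Positives are from the text; denominators are back-calculated from the
# reported percentages (assumption, table 1 is not reproduced here).
AGE_GROUPS = [
    ("0-2 months", 15, 45),
    ("2-4 months", 41, 144),
    ("4-6 months", 23, 85),
]


def prevalence(positive, examined):
    """Prevalence as a percentage."""
    return 100.0 * positive / examined


def chi_square_2xk(groups):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    total_pos = sum(pos for _, pos, _ in groups)
    total_n = sum(n for _, _, n in groups)
    stat = 0.0
    for _, pos, n in groups:
        for observed, column_total in ((pos, total_pos), (n - pos, total_n - total_pos)):
            expected = n * column_total / total_n
            stat += (observed - expected) ** 2 / expected
    return stat, len(groups) - 1


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    for label, pos, n in AGE_GROUPS:
        print(f"{label}: {pos}/{n} = {prevalence(pos, n):.1f}%")
    stat, df = chi_square_2xk(AGE_GROUPS)
    print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value_df2(stat):.3f}")
```

under these assumed denominators the p-value comes out at roughly 0.75, far above 0.05, which agrees with the conclusion that prevalence does not differ significantly among the age groups.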
furthermore, substantial sensitivity was recorded to enrofloxacin (84.9%), florfenicol (82.4%), and both amoxicillin/clavulanic acid and ampicillin/sulbactam (78.5%, each). the statistical analysis proved that the resistance of the tested strains against various antimicrobial agents was significantly different (p < 0.0001). regarding the occurrence of the multidrug-resistance and the distribution of the antimicrobial resistance genes among the isolated strains, we noticed that 41.8% of the tested e. coli strains (n = 33) exhibited multidrug resistance to neomycin, gentamicin, streptomycin, and amikacin and harbored the aadb resistance gene. moreover, 27.8% of the examined strains (n = 22) showed multidrug-resistance to neomycin, gentamicin, streptomycin, amikacin, and trimethoprim/sulfamethoxazole, and harbored both aadb and sul1 resistance genes. in addition, 21.5% of the tested strains (n = 17) exhibited a multidrug-resistance to neomycin, gentamicin, streptomycin, amikacin, amoxicillin/clavulanic acid, and ampicillin/sulbactam, and harbored both aadb and blatem resistance genes (table 4) . concerning the distribution of the antimicrobial resistance genes; the aadb gene is the most predominant antimicrobial resistance gene associated with the retrieved e. coli strains, either alone or in combination with the sul1 gene (27.8%), or in combination with the blatem gene (21.5%). moreover, the sul1 gene was determined alone in 8.8% of the examined strains. statistically, there is a significant difference in the distribution of the antimicrobial resistance genes among the examined e. coli strains (p < 0.05) (tables 5 and 6, and figure 1 ). p value = 0.04 (significant; p < 0.05). regarding the distribution of the virulence genes among the examined e. coli strains, the pcr revealed two pathotypes (40/79, 50.6%); shiga-toxigenic e. coli (stec) (16/79, 20.2%) and enterotoxigenic e. coli (24/79, 30.4%). 
regarding the stec; the stx1 gene is the most predominant shiga-toxin gene, either found alone (n = 4) or in combination with stx2 and eaea genes (n = 7), or in combination with the stx2 gene (n = 2), while the stx2 gene was detected alone in three e. coli strains. concerning the etec; the lt gene was the most prevalent enterotoxin gene, either found in combination with the f41 gene (n = 10), or in combination with sta gene (n = 4), or in combination with sta and f41 genes (n = 3). in addition, the sta gene was detected alone in seven e. coli strains (tables 6 and 7, and figures 2 and 3). there is no statistically significant difference in the distribution of virulence genes among the isolated e. coli strains (p > 0.05).
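the gene-profile counts reported above can be tallied with a small python sketch to confirm that they add up to the stated pathotype totals (16 stec and 24 etec). the classification rule is a simplification for illustration: a strain carrying stx1 or stx2 is called stec, one carrying lt or sta is called etec, and eaea without stx is called epec, following the definitions given in the introduction. checking stec before etec is an arbitrary choice made here, and the function names are hypothetical.

```python
from collections import Counter

SHIGA_TOXIN_GENES = {"stx1", "stx2"}
ENTEROTOXIN_GENES = {"lt", "sta"}

# Virulence-gene profiles and strain counts as reported in the text.
PROFILE_COUNTS = {
    ("stx1",): 4,
    ("stx1", "stx2", "eaea"): 7,
    ("stx1", "stx2"): 2,
    ("stx2",): 3,
    ("lt", "f41"): 10,
    ("lt", "sta"): 4,
    ("lt", "sta", "f41"): 3,
    ("sta",): 7,
}


def pathotype(genes):
    """Assign a pathotype from a virulence-gene profile (simplified rule)."""
    genes = set(genes)
    if genes & SHIGA_TOXIN_GENES:
        return "STEC"
    if genes & ENTEROTOXIN_GENES:
        return "ETEC"
    if "eaea" in genes:
        return "EPEC"  # eaea without stx genes
    return "unclassified"


def tally_pathotypes(profile_counts):
    """Number of strains per pathotype."""
    totals = Counter()
    for profile, n in profile_counts.items():
        totals[pathotype(profile)] += n
    return totals


def tally_genes(profile_counts):
    """Number of strains carrying each gene, alone or in combination."""
    totals = Counter()
    for profile, n in profile_counts.items():
        for gene in profile:
            totals[gene] += n
    return totals


if __name__ == "__main__":
    print(dict(tally_pathotypes(PROFILE_COUNTS)))
    print(dict(tally_genes(PROFILE_COUNTS)))
```

summing over profiles also gives the per-gene frequencies, for example stx1 in 13 strains and lt in 17 strains, which is what makes them the most frequent shiga-toxin and enterotoxin genes, respectively.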
this study was aimed at determining the prevalence of stec and etec incriminated in calf diarrhea, with special reference to the shiga-toxins genes (stx1 and stx2) and enterotoxins genes (lt and sta) that govern their pathogenesis, as well as the virulence genes; eaea and f41, and the screening of their antimicrobial resistance profiles and antimicrobial resistance genes; aadb, sul1, and bla-tem. the overall prevalence of e. coli was 28.8%, which is lower than what was reported in diarrheic calves in egypt (63.6%) [13] , ethiopia (36.8%) [14] , argentina (30.1%) [15] and india (85.04%) [16] ; however, a lower prevalence was reported in other previous studies in korea (22%) [17] and switzerland (5.5%) [18] . differences in the prevalence of e. coli may be due to the differences in geography, management practices, floor type, health conditions, and the calf's age [2, 13, 15] . further, the high rate of e. coli isolation in the present study could be attributed to many reasons, such as the mixing of different age groups, poor environmental and hygienic conditions, or the poor quantity and/or quality of colostrum. in addition, the colostrum maternal antibodies cannot neutralize the high dose of pathogenic e. coli infection [14] . regarding the age of the examined calves, the prevalence of e. coli was 33.3%, 28.5%, and 27.1% in the first 2 months old, 2-4 months old, and 4-6 months old aged calves, respectively. there is no statistically significant difference in the prevalence of e. coli among the different ages (p = 0.74). a previous study reported that the prevalence of e. coli was high in the young calves, and then decreased as the age increased [14] . the serological identification of the retrieved isolates revealed that a total of 64 (81.01%) strains were typable, while 15 isolates (18.99%) were untypable. the most common serogroup was o128 (16.5%), followed by o111 (13.9%) and o26 (11.4%), o125 (11.4%), o91 (10.1%), o45 (8.9%), and o119 (8.9%). 
the identified serogroups in the current study have a different variety than those previously recorded [13, 19-21], which are often associated with sick children with diarrhea [19, 22]. in the current study, 89.8% of the e. coli strains were cr positive, including serogroups o26, o111, o125, o128, o45, o119 and 15 untypable strains; a finding endorsed by a previous study [23], which stated that 95% of e. coli isolates are pathogenic by cr binding, which indicates the virulence of these strains. in other studies, it was reported that 61.9-90% of e. coli isolates were cr positive [24, 25]. in the present study, the isolated strains exhibited a remarkable resistance against four antimicrobial agents, including; neomycin (96.2%), gentamycin and streptomycin (95%), and amikacin (93.7%). a previous study from egypt in diarrheic buffalo-calves farms reported that the most prevalent phenotypic resistance patterns were; ampicillin (71.4%) and amoxicillin (64.3%), as well as trimethoprim/sulfamethoxazole (50%) and gentamicin (42.8%) [26]. a previous study in iran reported that e. coli strains retrieved from diarrheic calves showed maximum resistance to penicillin, streptomycin, tetracycline, lincomycin and sulfamethoxazole [27].
the resistance pattern observed in our study and the previous other studies indicated that the emergence of multidrug-resistant species has become an increasing problem worldwide, due to the overuse of antimicrobial drugs in both animal and human medicine [28, 29] . the increased pattern of multidrug resistance could be attributed to the accumulation of genes encoding for the antibiotic resistance, either on the bacterial chromosome or plasmid [12] . concerning the occurrence of the multidrug-resistance patterns and the distribution of the antimicrobial resistance genes among the isolated e. coli strains, 41.8% of the tested strains showed multidrug resistance to the aminoglycosides antibiotics; neomycin, gentamicin, streptomycin, and amikacin (harbored aadb gene), while 27.8% exhibited multidrug resistance to the aminoglycosides antibiotics and trimethoprim/sulfamethoxazole (harbored both aadb and sul1 genes). although clavulanic acid and sulbactam are known as effective β-lactamase inhibitors, 21.5% of the tested strains showed multidrug resistance to amoxicillin/clavulanic acid, ampicillin/sulbactam, and the aminoglycosides antibiotics (harbored both aadb and blatem genes). in addition, the most predominant antimicrobial resistance gene was aadb gene, which occurred in 72/79 of the examined strains. these findings are endorsed by a previous study, which reported that an aminoglycoside-resistance determinant (aadb) gene was prevalent among the pathogenic e. coli strains originated from diarrheic calves. such type of resistance could be attributed to the presence of aadb gene, as well as the widespread improper use of aminoglycosides for the treatment of calf diarrhea in the past years [30] . genes encoding resistance to aminoglycosides, sulfonamides, and ampicillins were aadb, sul1, and blatem genes, respectively [31] . 
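the resistance-gene figures above can be cross-checked with a brief python sketch. the strain counts for the three aadb-containing genotypes (33, 22, 17) are from the results; the count of 7 strains carrying sul1 alone is not stated and is inferred here as the remainder of the 79 isolates (the text gives only 8.8%), so it is an assumption. the gene-to-class mapping follows the statement that aadb, sul1 and blatem encode resistance to aminoglycosides, sulfonamides and β-lactams, respectively.

```python
# Resistance gene -> antimicrobial class, as stated in the text.
GENE_TO_CLASS = {
    "aadb": "aminoglycosides",
    "sul1": "sulfonamides",
    "blatem": "beta-lactams",
}

# Genotype -> number of strains. The first three counts are reported;
# the last one is inferred (79 isolates minus the 72 carrying aadb).
GENOTYPE_COUNTS = {
    ("aadb",): 33,
    ("aadb", "sul1"): 22,
    ("aadb", "blatem"): 17,
    ("sul1",): 7,  # assumption: only the percentage (8.8%) is reported
}

TOTAL_ISOLATES = 79


def strains_with(gene, genotype_counts):
    """Number of strains carrying a gene, alone or in combination."""
    return sum(n for genotype, n in genotype_counts.items() if gene in genotype)


def predicted_classes(genotype):
    """Antimicrobial classes a genotype is expected to resist (genes only)."""
    return sorted(GENE_TO_CLASS[gene] for gene in genotype)


if __name__ == "__main__":
    for gene in GENE_TO_CLASS:
        n = strains_with(gene, GENOTYPE_COUNTS)
        print(f"{gene}: {n}/{TOTAL_ISOLATES} ({100 * n / TOTAL_ISOLATES:.1f}%)")
    print(predicted_classes(("aadb", "sul1")))
```

the three aadb-containing genotypes sum to 72 strains, which matches the 72/79 reported for the aadb gene. a genotype-based prediction of this kind covers only the genes screened and does not replace the phenotypic antibiogram.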
a previous study revealed that the most frequently detected β-lactamase gene is blatem, occurring in 66/74 of the tested strains; the resistance to the antimicrobial agents could be due to the presence of different gene variants in the resistant isolates [32]. in the present study, the prevalence of the enterotoxigenic e. coli was 30.4%, while the prevalence of shiga-toxigenic e. coli was 20.2%. concerning the isolated stec strains, the stx1 gene is the most predominant shiga-toxin gene, either found alone or in combination with stx2 and eaea genes, or in combination with the stx2 gene. a previous study of diarrheic calves in india reported that the virulence genes of the stec isolates were found in diverse combinations, and the combination of hlya and eaea genes was the most predominant [16]. shiga toxins usually inactivate the host cell ribosomes with subsequent inhibition of protein biosynthesis. the co-occurrence of the stx1, eaea, and stx2 genes is of epidemiological significance, as previous studies reported that the combination of these genes could increase the ability of e. coli to cause severe human illness [16, 33]. regarding the isolated etec strains, the lt gene was the most prevalent enterotoxin gene, either found in combination with the f41 gene, or in combination with the sta gene, or in combination with both sta and f41 genes. lt (heat-labile toxin) usually stimulates the adenylate cyclase system, while sta (heat-stable toxin) activates the guanylate cyclase system, resulting in severe watery diarrhea. the f41 gene is usually associated with the occurrence of diarrhea in calves and has a higher prevalence in diarrheic calves than in healthy ones, which suggests an important role of this gene in etec pathogenesis [34]. in argentina, the most prevalent virulence genes of pathogenic e. coli isolated from dairy calves were k99, f41 and f5 [15].
in brazil, the prevalence of the combination of sta and lt genes was 3.9% in e. coli isolated from diarrheic calves [35]. in china, 15.5% of the etec strains that originated from healthy calves carried the lt and sta genes [36]. furthermore, in italy, the sta gene was absent in e. coli isolated from diarrheic buffalo calves [37]. the diversity in the prevalence of shiga-toxin genes, enterotoxin genes, and other virulence-related genes between the present study and the other studies may be attributed to the geographical origin of samples, the sample size, the handling of collected samples, the number of examined strains, the type of the examined virulence genes, and the role of the examined virulence genes in the pathogenesis of the disease. in conclusion, e. coli continues to be one of the major causes of calf diarrhea, resulting in severe economic losses. shiga-toxigenic and enterotoxigenic e. coli are the most prevalent pathotypes incriminated in the disease occurrence. the shiga-toxin genes stx1 and stx2 are the most prevalent virulence genes associated with stec; they are responsible for the pathogenesis of the disease and are assisted by the intimin gene (eaea). furthermore, the lt gene is the most prevalent enterotoxin gene in the etec strains isolated from calf diarrhea, either alone or in combination with the sta and/or the f41 genes. the majority of the isolated stec and etec harbored the aadb antibiotic resistance gene and exhibited a multidrug-resistance pattern to neomycin, gentamicin, streptomycin, and amikacin. moreover, enrofloxacin, florfenicol, amoxicillin-clavulanic acid, and ampicillin-sulbactam are the most effective antimicrobial agents against the isolated stec and etec strains. continual antibiogram profiling is needed to determine the most effective antibiotic, due to the high prevalence of multidrug-resistant e. coli strains.
a total of 274 fecal specimens were aseptically gathered using sterile rectal swabs from april 2018 to february 2019, from diarrheic calves aged between 1 day and 6 months, from two private farms (124 calves from farm i and 150 from farm ii) at el-sharqia governorate, egypt. the collected specimens were rapidly transferred to the bacteriological lab, faculty of veterinary medicine, suez canal university, for further analysis. the handling of animals was carried out by well-trained scientists, according to the instructions of the animal ethics review committee of suez canal university (scu-362-27/12/2016). the collected specimens were inoculated in nutrient broth and then incubated for 24 h at 37 °c. a loopful from the culture suspension was streaked on macconkey's agar and eosin methylene blue (emb) agar (oxoid, uk). the recovered typical colonies (pink colonies on macconkey's agar and metallic green sheen colonies on emb) were identified morphologically and biochemically, as described by quinn et al. [38]. the obtained e. coli isolates were serogrouped by the detection of o antigens using the slide agglutination test, according to the method previously described by edwards and ewing [39]. the congo red (cr) binding test was used to detect invasive e. coli, using trypticase agar supplemented with 0.03% cr dye (oxoid, uk). the colonies were streaked on congo red agar and incubated at 37 °c for 24 h, then the plates were kept at room temperature for 48 h. the invasive strains (red colonies) were observed and recorded according to the methods previously described by panigrahy and yushen [24]. the isolated e. 
coli strains were tested against nine antimicrobial agents: amoxicillin/clavulanic acid (20/10 mcg), ampicillin/sulbactam (10/10 mcg), amikacin (10 mcg), neomycin (30 mcg), enrofloxacin (5 mcg), florfenicol (30 mcg), streptomycin (10 mcg), trimethoprim/sulfamethoxazole (25 mcg) and gentamicin (30 mcg) (oxoid, uk), using the disc diffusion method. the diameter of the inhibition zone was measured in millimeters and interpreted as sensitive, intermediate, or resistant, as described by clsi [40]. bacterial dna of purified bacterial cells was extracted using the qiaamp dna mini kit (invitrogen, usa). recovered dna templates were quantified using a nanodrop (nanodrop 1000, thermo scientific, loughborough, uk) and adjusted to 100 ng µl^−1. to assess the virulence genes (stx1, stx2, lt, sta, f41 and eaea), as well as the antimicrobial resistance genes (aadb, sul1, and blatem) in the obtained e. coli strains, pcr was performed using specific sets of primers (metabion, germany) (table 8). the pcr reaction (25 µl) consisted of 12.5 µl go taq ® green master mix 2x (promega, wisconsin, usa), 1 µl (20 pmol) of each primer, 5 µl dna extract, and pcr grade water up to 25 µl. the cycling conditions are listed in table 8. a negative control (no dna template) and positive control reference strains (previously isolated and kindly provided by a.h.r i, dokki, egypt) were used in the pcr assay. amplified fragments were screened by 1.5% agarose gel electrophoresis (applichem gmbh, darmstadt, germany) for 45 min at 100 v in 1× tae, visualized using 15 µl of dna gel stain (sigma-aldrich, st. louis, mo, usa) and photographed under a uv transilluminator. a 100 bp ladder (fermentas, thermo scientific, darmstadt, germany) was used. the chi-square test was performed to analyze the data and to test the null hypothesis of different treatments, using the statistical analysis software (sas ® software version 9.4, sas institute, cary, nc, usa). the significance level was set at p < 0.05.
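The chi-square analysis described above was run in SAS; as an illustration only, the following minimal Python sketch shows how a Pearson chi-square test of independence works for a 2 × 2 table. The function name `chi_square_2x2` and the counts in the example table are hypothetical and are not data from this study; no continuity correction is applied, and the p-value formula is valid only for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value) with 1 degree of freedom,
    without continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts (NOT from the study): rows are two groups,
# columns are positive / negative isolates.
example = [[20, 30], [30, 20]]
statistic, p = chi_square_2x2(example)
print(f"chi-square = {statistic:.3f}, p = {p:.4f}, significant = {p < 0.05}")
```

With the significance level of p < 0.05 used in the text, the null hypothesis of no association would be rejected whenever the returned p-value falls below 0.05.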
comparison of electrocardiographic parameters and serum electrolytes and microelements between single infection of rotavirus and coronavirus and concurrent infection of cryptosporidium parvum with rotavirus and coronavirus in diarrheic dairy calves an overview of calf diarrhea-infectious etiology, diagnosis, and intervention clinical and sero-molecular characterization of escherichia coli with an emphasis on hybrid strain in healthy and diarrheic neonatal calves in egypt pathophysiology of diarrhea in calves prevalence, molecular typing, and antimicrobial resistance of bacterial pathogens isolated from ducks calf diarrhea: epidemiological prevalence and bacterial load in oyo and ogun states escherichia coli o157:h7: animal reservoir and sources of human infection increase in multistate foodborne disease outbreaks-united states genetic variation among avian pathogenic e. coli strains isolated from broiler chickens epidemiological study of diarrheagenic escherichia coli virulence genes in newborn calves prevalence, the antibiogram and the frequency of virulence genes of the most predominant bacterial pathogens incriminated in calf pneumonia prevalence and molecular epidemiological characterization of antimicrobial-resistant escherichia coli isolates from japanese black beef cattle the occurrence of the multidrug resistance (mdr) and the prevalence of virulence genes and qacs resistance genes in e. coli isolated from environmental and avian sources the distribution of escherichia coli serovars, virulence genes, gene association and combinations and virulence genes encoding serotypes in pathogenic e. 
coli recovered from diarrhoeic calves, sheep and goat characterization of escherichia coli isolated from calf diarrhea in and around kombolcha molecular screening of pathogenic escherichia coli strains isolated from dairy neonatal calves in cordoba province prevalence and antimicrobial resistance pattern of shiga toxigenic escherichia coli in diarrheic buffalo calves causative agents and epidemiology of diarrhea in korean native calves prevalence of four enteropathogens in the faeces of young diarrhoeic dairy calves in switzerland prevalence and genetic profiles of shiga toxin-producing escherichia coli strains isolated from buffaloes, cattle, and goats in central vietnam prevalence and comparative studies of some major serotype of e. coli from cattle and buffalo calf scour diarrhoeagenic escherichia coli and salmonella in calves and lambs in kashmir: absence, prevalence and antibiogram human infections with non-o157 shiga toxin-producing escherichia coli an alternative approach for evaluating the phenotypic virulence factors of pathogenic escherichia coli differentiation of pathogenic and nonpathogenic escherichia coli isolated from poultry virulence factors of escherichia coli isolated from diarrheic sheep and goats serotyping, antibiotic susceptibility, and virulence genes screening of escherichia coli isolates obtained from diarrheic buffalo calves in egyptian farms characterization of escherichia coli virulence genes, pathotypes and antibiotic resistance properties in diarrheic calves in iran changing patterns of infectious disease multidrug resistant commensal escherichia coli in animals and its impact for public health characterization of class 1 integrons-mediated antibiotic resistance among calf pathogenic escherichia coli virulence factors in escherichia coli isolated from calves with diarrhea in vietnam prevalence and molecular characterization of clinical isolates of escherichia coli expressing an ampc phenotype serotypes, virulence genes, and intimin 
types of shiga toxin (verotoxin)-producing escherichia coli isolates from cattle in spain and identification of a new intimin variant gene (eae-ξ) a systematic review and meta-analysis of the epidemiology of pathogenic escherichia coli of calves and the role of calves as reservoirs for human pathogenic occurrence and characteristics of virulence genes of escherichia coli strains isolated from healthy dairy cows in inner mongolia, china characterization of enterotoxigenic e. coli (etec), shiga-toxin producing e. coli (stec) and necrotoxigenic e. coli (ntec) isolated from diarrhoeic mediterranean water buffalo calves identification of enterobacteriaceae performance standards for antimicrobial disk and dilution susceptibility tests for bacteria isolated from animals: approved standard molecular basis of virulence in clinical isolates of escherichia coli and salmonella species from a tertiary hospital in the eastern cape detection of shiga-toxigenic escherichia coli (stec) in diarrhoeagenic stool & meat samples in mangalore occurrence and characteristics of enterohemorrhagic escherichia coli o26 and o111 in calves associated with diarrhea multiplex pcr for enterotoxigenic, attaching and effacing, and shiga toxin-producing escherichia coli strains from calves relative distribution and conservation of genes encoding aminoglycoside-modifying enzymes in salmonella enterica serotype typhimurium phage type dt104 simple and reliable multiplex pcr assay for detection of blatem, blashv and blaoxa-1 genes in enterobacteriaceae genetic diversity and antimicrobial resistance of escherichia coli from human and animal sources uncovers multiple resistances from human sources this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license the authors are grateful to the researchers supporting project number (rsp-2020/53), king saud university, riyadh, saudi arabia. the authors declare no conflict of interest. 
key: cord-315072-b28yikvj authors: giotis, efstathios s.; robey, rebecca c.; skinner, natalie g.; tomlinson, christopher d.; goodbourn, stephen; skinner, michael a. title: chicken interferome: avian interferon-stimulated genes identified by microarray and rna-seq of primary chick embryo fibroblasts treated with a chicken type i interferon (ifn-α) date: 2016-08-05 journal: vet res doi: 10.1186/s13567-016-0363-8 sha: doc_id: 315072 cord_uid: b28yikvj viruses that infect birds pose major threats—to the global supply of chicken, the major, universally-acceptable meat, and as zoonotic agents (e.g. avian influenza viruses h5n1 and h7n9). controlling these viruses in birds as well as understanding their emergence into, and transmission amongst, humans will require considerable ingenuity and understanding of how different species defend themselves. the type i interferon-coordinated response constitutes the major antiviral innate defence. although interferon was discovered in chicken cells, details of the response, particularly the identity of hundreds of stimulated genes, are far better described in mammals. viruses induce interferon-stimulated genes but they also regulate the expression of many hundreds of cellular metabolic and structural genes to facilitate their replication. this study focusses on the potentially anti-viral genes by identifying those induced just by interferon in primary chick embryo fibroblasts. three transcriptomic technologies were exploited: rna-seq, a classical 3′-biased chicken microarray and a high density, “sense target”, whole transcriptome chicken microarray, with each recognising 120–150 regulated genes (curated for duplication and incorrect assignment of some microarray probesets). overall, the results are considered robust because 128 of the compiled, curated list of 193 regulated genes were detected by two, or more, of the technologies. 
electronic supplementary material: the online version of this article (doi:10.1186/s13567-016-0363-8) contains supplementary material, which is available to authorized users. the interferon (ifn) response is one of the most important arms of host innate immunity against virus infection [1, 2] . infected cells are able to recognise foreign nucleic acids and induce the synthesis and secretion of type i ifn (ifn-α and ifn-β) and type iii ifn (ifn-λ), which bind to receptors on the surface of neighbouring cells and trigger the transcriptional regulation of genes involved in the antiviral state. studies in mammals have demonstrated that there are several hundred such ifn-regulated genes (irgs). because the vast majority are up-regulated they are overwhelmingly referred to as ifn-stimulated genes (isgs) so, hereafter, they will be referred to generically as isgs (or specifically as chicken isgs, chisgs), except where the more generic term avoids confusion. induction of isgs involves the jak/stat signalling pathway: stat1 is either recruited directly to target promoters for a relatively weak activation or, more commonly, is recruited in a complex called isgf3 in association with stat2 and irf9 [1, 3] . isgs are the focus of considerable current attention with regard to: (i) their antiviral activity, (ii) an increasing appreciation of the complexity of their regulation and (iii) their targeting by virus-encoded modulators of ifn-induced responses [1, 3, 4] . these studies require comprehensive catalogues of the isgs, especially where system-wide approaches are undertaken. even though many key mammalian isgs have been known for some time, it is with the relatively recent advent of transcriptomic technologies that the full complement has been catalogued (mainly using microarrays [5] ; see also schoggins et al. [6] ). in contrast to the mammalian ifn system our equivalent knowledge of the avian system has lagged behind. 
although ifn was discovered in chickens in 1957 [7] the first chicken ifn gene was characterised in 1994 [8] and the key chicken isg, pkr, was identified in 2004 [9] . the derivation of the chicken genome sequence, first drafted in 2004 [10] , did not greatly advance our understanding of chicken isgs because of the incomplete nature of the gallus gallus genome assembly, even at v4 (galgal4), which might be partly due to the fact that the chicken karyotype has six pairs of macrochromosomes (but 33 pairs of microchromosomes), and the difficulties in annotating immunity genes, which are some of the most divergent between mammals and birds [11] . however, it has become apparent that key genes of the innate immune system, such as the transcription factors irf9 and one member of the irf3/irf7 dyad [12, 13; unpublished] , are absent from avian species, indicative of significant functional differences between them and mammals. moreover, for reasons that are not understood, the cytosolic pattern recognition receptor, rig-i, appears to have been lost from chicken as well as other galliformes [13, 14] . to generate a chicken isg database we have compared data from three transcriptomic technology platforms: (i) the classical 3′-biased genechip chicken genome array (32k; affymetrix, high wycombe, uk), (ii) the chicken gene 1.0 sense target (st) whole transcriptome array (affymetrix) and (iii) illumina (little chesterford, uk) rna-seq. this three-way comparison allowed a high level of cross-validation of data from each technology, beyond what would normally be achieved by qrt-pcr. it also allows subsequent studies, constrained to use any particular technology, to be more broadly compared. we monitored irg expression in chicken embryo fibroblast (cef) induced for 6 h with 1000 units recombinant chicken ifn-α (rchifn1; hereafter routinely referred to as ifn), a time chosen to reflect predominantly primary signalling targets. 
the expression data for selected genes were also validated by pcr and qrt-pcr. overlapping data show generally high degrees of concordance in the identity of the irgs and their relative levels of regulation by ifn, with disparity mainly where multiple microarray probes exist for single genes. the study was presented in a preliminary form as a poster at the international cytokine and interferon society (icis) meeting ("cytokines 2015"; october 11-14, 2015) in bamberg, germany [15]. freshly isolated cef were provided by the former institute for animal health (compton, uk, now the pirbright institute, pirbright, uk). cells were seeded in t25 flasks (greiner bio one, kremsmünster, austria; 5.6 × 10^6 cells/flask) and cultured overnight in 5.5 ml 199 media (gibco thermo fisher scientific, paisley, uk) supplemented with 8% heat-inactivated newborn bovine serum (nbcs; gibco), 10% tryptose phosphate broth (tpb; sigma-aldrich, gillingham, uk), 2% nystatin (sigma-aldrich) and 0.1% penicillin streptomycin (gibco). recombinant chicken ifn-α (rchifn1) was prepared as previously reported [16] and was added in culture media to a final concentration of 1000 units/ml. confluent cells were treated with ifn or mock-treated and incubated for six hours before harvesting. cells were stored at −80 °c in rnalater (sigma-aldrich) until rna extraction. the experiment was repeated in triplicate with three different batches of cef. total rna was extracted from cells using an rneasy kit (qiagen, crawley, uk) according to the manufacturer's instructions. on-column dna digestion was performed using rnase-free dnase (qiagen) to remove contaminating genomic dna. rna samples were quantified using a nanodrop spectrophotometer (thermo fisher scientific, paisley, uk) and checked for quality using a 2100 bioanalyzer (agilent technologies, wokingham, uk). all rna samples had an rna integrity number (rin) ≥9.6.
rna samples were processed for microarray with the genechip ® chicken genome array (affymetrix) using the genechip ® 3′ ivt express kit (affymetrix) or for microarray with the chicken gene 1.0 st array (affymetrix) using the ambion (paisley, uk) wt expression kit for affymetrix genechip ® whole transcript (wt) expression arrays (ambion) and the genechip wt terminal labelling and controls kit (ambion), following the manufacturers' instructions, as described previously [17]. total rna (100 ng) was used as input and quality checks were performed using the 2100 bioanalyzer at all stages suggested by the manufacturer. rna samples were processed in two batches of 18 but batch mixing was used at every stage to avoid creating experimental bias. hybridisation of rna to chips and scanning of arrays was performed by the medical research council's clinical sciences centre (csc) genomics laboratory (hammersmith hospital, london, uk). rna was hybridised to genechip chicken genome array chips (affymetrix) in a genechip hybridization oven (affymetrix), the chips were stained and washed on a genechip fluidics station 450 (affymetrix), and the arrays were scanned in a genechip scanner 3000 7g with autoloader (affymetrix). cdna was synthesised from rna samples from untreated and ifn-treated cef using the quantitect ® reverse transcriptase system (qiagen) according to the manufacturer's instructions. glyceraldehyde 3-phosphate dehydrogenase (gapdh) was used as a reference gene. all target gene expression levels were calculated relative to gapdh expression levels and the target gene expression level in −2 h uninfected cef using the comparative ct method (also referred to as the 2^(−ΔΔCt) method). triplicate untreated (control) and ifn-treated cef were processed for transcriptome analysis by rna-seq. the cell samples used were identical to those used for the microarray analyses.
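As an aside on the qrt-pcr quantification mentioned above, the comparative Ct calculation can be sketched as follows. This is a minimal illustration that assumes roughly 100% amplification efficiency for both target and reference (gapdh) assays; the function name `relative_expression` and the Ct values in the example are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Each argument is a (mean) threshold cycle value. The reference gene
    normalises for input amount; the control sample is the calibrator.
    Assumes approximately 100% amplification efficiency for both assays.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (NOT from the study): the target crosses the
# threshold 5 cycles earlier after treatment, the reference is unchanged.
fold = relative_expression(ct_target_treated=20.0, ct_ref_treated=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(f"fold induction relative to control: {fold:.1f}")
```

A ΔΔCt of −5 corresponds to a 2^5-fold increase, which is why small Ct differences translate into large fold changes for strongly induced genes.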
total rna was extracted as for microarrays (above) and rna libraries were prepared for deep sequencing using the truseq rna sample preparation kit (illumina) according to the manufacturer's instructions. total rna (2.5 μg) was used as an input for each library. a total of six rna adapter indices were randomly assigned to the 12 samples to allow multiplexing of libraries. at the end of the protocol, libraries were quantified using a nanodrop spectrophotometer and checked for quality using a 2100 bioanalyzer high sensitivity dna chip (agilent technologies). rna library qpcr quantification, multiplexing and sequencing was performed by the medical research council's clinical sciences centre (csc) genomics laboratory, hammersmith hospital, london, uk. libraries were quantified using the kapa biosystems (london, uk) library quantification kit (kk4824) on an abi 7500 fast qpcr machine (applied biosystems). libraries were then diluted to a 2 nM stock solution, pooled for multiplexing, denatured and diluted to a final molarity of 20 pM. libraries were loaded on to the flow cell (8-16 pM per lane) for clustering and cluster generation was performed by the illumina cbot using version 3 kits. sequencing of the flow cell was then carried out on the illumina hiseq 2000 using the version 3 kits. microarray data were processed using workflows in genespring ™ (agilent) and partek ™ (partek inc., st louis, mo, usa) commercial software suites. data (.cel files) were analysed and statistically filtered using either partek genomic suite 6.6 (partek gs) or genespring version 7.2 (agilent technologies) software. input files were normalized with either gcrma or genespring algorithms for gene array on core metaprobesets. a one-way anova was performed using either software across all samples. statistically significant genes were identified using mixed model analysis of variance with a false discovery rate (benjamini-hochberg test) of p < 0.05.
fold-change values <±3.0 were removed. rna-seq data were imported into clc bio's genomics workbench (clc bio, aarhus, denmark; now qiagen), quality-controlled and thereafter processed using that package (versions 6 and 7). after quality control, the reads were subjected to quality trimming then mapped against ensembl galgal4 annotated genes (release 75 [18] ) for quantitative analysis of expression. fold change and false discovery rates (fdr) were calculated using kal's z test [19] , with pooled data, or baggerly's test [20] , using separate triplicates. initially, we used the 32k genechip ® chicken genome array (affymetrix) because, as well as displaying probes for 32 773 chicken transcripts, it displays probes for 684 transcripts from 17 different viral pathogens of chickens, which offers advantages to those studying virus infections in a chicken background. subsequently, we used the more refined chicken gene 1.0 st array (affymetrix) because it offers a higher probe density against 18 214 chicken genes and should allow detection of transcript isoforms, including non-polyadenylated and alternatively polyadenylated, though it does not include probes for viral genes. separate weekly batches of cef, produced from pools of eggs from the same flock (rhode island red) held in spf-like conditions at the former compton laboratory of the institute for animal health (now the pirbright laboratory) served as biological replicates. principal component analysis of the microarray data (data not shown) indicated limited variation between batches so, thereafter, biological triplicates were used routinely. irgs were identified from expression analysis data determined using the 32k genechip following ifn treatment (1000 units, 6 h) of cef. after quantile normalization, significant hits were identified with genespring using an unpaired t test with asymptotic p-value computation and benjamini-hochberg multiple testing correction to generate false discovery rates (fdr). 
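The filtering logic described above (multiple-testing correction followed by a fold-change cut-off) was carried out in commercial packages; the sketch below only illustrates the general idea in Python. It assumes per-probeset p-values and signed fold changes are already available; the function names, the record layout and the example values are hypothetical, and the cut-offs are left as parameters.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_regulated(records, fc_cutoff=3.0, fdr_cutoff=0.01):
    """Keep probeset ids passing both the FDR and |fold change| cut-offs.

    Each record is a dict with keys "id", "fc" (signed fold change,
    negative for down-regulation) and "p" (raw p-value).
    """
    fdrs = benjamini_hochberg([r["p"] for r in records])
    return [r["id"] for r, q in zip(records, fdrs)
            if q < fdr_cutoff and abs(r["fc"]) > fc_cutoff]


# Hypothetical probesets (NOT data from the study).
probesets = [
    {"id": "probe_a", "fc": 10.0, "p": 0.0001},  # strong, significant
    {"id": "probe_b", "fc": 2.0, "p": 0.0001},   # significant, small change
    {"id": "probe_c", "fc": -5.0, "p": 0.5},     # large change, not significant
]
print(select_regulated(probesets))
```

Applying the fold-change filter after the FDR correction, rather than before, keeps the multiple-testing burden tied to the full set of probesets tested.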
a matrix of fdr (from <0.001 to 1) plotted against fold change (fc; from 1.0 to >3) is shown in table 2 . a relatively conservative fdr of <0.01 returned 250 differentially expressed probesets. overlaying this with a value for fc for which changes in expression might reasonably be expected to be readily and reliably assayed using other technologies, namely >3, reduced the number of selected, significant probesets to a manageable 181 (180 up, 1 down). these settings were therefore chosen for further analysis. for 23 of these probe sets, no currently recognised genes were automatically assigned. of the remaining 158 probe sets, 29 were assigned to genes recognised in duplicate by other probe sets. consequently 129 recognised genes were identified as differentially expressed (the down-regulated transcript was not, at that time, assigned to a recognised gene). with the chicken gene 1.0 st array, 157 probe sets demonstrated differential expression (156 up, 1 down) at the same settings (fc > 3, fdr < 0.01). amongst these, there were five duplicated probe sets and 27 that were not automatically assigned to recognised genes therefore 125 recognised genes were uniquely identified as differentially regulated. illumina rna-seq yielded a total of 170 million reads (100 bases; paired) for the mock-treated cef triplicate samples and 167 million for the ifn-treated samples. upon quality trimming and mapping to ensembl galgal4 annotated genes (release 75), using clc bio's genomic workbench, 138 recognised genes were identified as differentially regulated (137 up, 1 down) using kal's proportion-based z test [19; as implemented in the clc bio package] at the same settings (fc > 3, fdr < 0.01). kal's is performed on the pooled reads from ifn-treated and untreated samples. it is perhaps, therefore, more widely applicable; it also returned a number of irgs comparable to those returned by the microarrays. 
triplicate-based analysis using baggerly's proportion-based beta-binomial test [20; as implemented in the clc bio package] at the same settings (fc > 3, fdr < 0.01) returned an additional 37 up-regulated genes. comparison of the complete raw gene lists from the three technologies using the most compatible identifier (essentially the gene symbol) with an online venn diagram tool (venn diagram generator; [21]) demonstrated that 233 recognised genes were identified as differentially regulated. of these, 51 were identified in common by all three technologies and a further 57 were identified by two out of three technologies, meaning that 108 were identified by at least two technologies. a total of 125 were therefore each identified only by individual technologies (figure 1a). as well as comparing the identities of the differentially regulated genes, the correlation of expression of the genes identified by the different platforms was examined in terms of both level and rank of fc (figures 2a and b). for instance, comparing rna-seq data with the 32k genechip data, spearman correlation values were 0.93 for fc level and rank. considering the current state of assembly and annotation of the chicken genome, the correlation of isgs in terms of gene identity as well as the level and rank of induction as indicated by all three technology platforms is reassuring. nevertheless the platform transcriptomic data were validated for selected genes by rt-pcr (data not shown) and by qrt-pcr (figure 3a). a 6 h time point was chosen for microarray and rna-seq analysis of ifn treatment as it has been widely used and is known to result in significant levels of a broad range of isgs in mammals, making it suitable for defining the chicken interferome.
use of this single time point does not, however, provide unequivocal insight into mechanistic interpretation of isg induction; for instance, it does not discriminate between strictly isre-dependent induction of isgs and isre-independent induction of isgs by mechanisms that might include immediate high-level induction of irf1, which has been observed in mammalian systems [22] [23] [24] . kinetic analysis of the induction of expression of a subset of isgs was therefore conducted at 45, 90, 180 and 360 min post application of ifn (see figure 3b ). even among highly-induced isgs, different temporal profiles were observed, from the rapid accumulation of ifit5 (1000-fold by 90 min) and rsad2 (which remain at steady levels to 360 min) to the steadier, sustained accumulation of mx and the more modestly induced stat1; with lgp2 and trim25 peaking at 180 min. although differences in mrna stability and turnover will influence the profiles, this identification of the isgs will allow detailed analysis of their promoters to investigate elements (and the factors that bind them) that contribute to the complexity of the observed induction patterns. of the 51 irgs initially identified by all three technologies, 47 ). this suggests either that the mammalian equivalents are isgs but that they are not included as such in interferome or that they are not isgs in mammals. the raw lists were refined by manual "curation", allowing for synonyms of recognised genes (for instance isg12-2 versus isg12(2)) and, after bioinformatic analysis using blast, etc., assigning recognised gene identifiers to probe sets that previously lacked them. 
at the end of this process ( figure 1b ; additional files 1, 2), it was apparent that some (n = 12) differentially regulated genes identified by the microarrays were also identified as differentially regulated by rna-seq but that they fell outside of the strict fc > 3 and fdr < 0.01 parameters, reflecting unsurprising disparity in the sensitivity of the three technologies. those genes that were expressed down to fc > 2.5 or with an fdr up to < 0.05 were, therefore, also incorporated to produce a final list ( figure 1c ; additional files 1, 2). it is obvious that this manual curation of the data, to allow for alternative gene id nomenclature used by the three technologies and for differences in sensitivity, introduced minor changes to the figures from the automatic comparisons cited above (figure 1 ; additional files 1, 2). curation, therefore, reduced the number of irgs from 233 to 193. it also increased the number of differentially expressed genes detected by two out of three technologies from 108 to 118 (compare figures 1a and b) . relaxing the criteria for detection of differentially regulated genes by rna-seq (to fc > 2.5 and/or fdr < 0.05) further increased the number of genes detected by all three technologies from 70 to 72 (representing 37%) or by at least two of the technologies from 118 to 128 (66%), leaving 65 genes detected by single technologies (compare figures 1b and c) , with 29 of those detected by rna-seq alone (using the kal's test, at fc > 3.0 and fdr < 0.01; additional files 1, 2). of the 37 additional isgs identified by rna-seq as significant (fc > 3 fdr < 0.01) by the more sensitive baggerly's test but not by kal's (table 3) , two were also identified as significant by kal's using the relaxed criteria (fdr < 0.05). 
baggerly's, therefore, identified 35 isgs additional to those described in the above analyses using rna-seq (kal's analysis) and the microarrays (table 3) analysis of rna-seq data depends directly on the extant annotated genome sequence. perhaps not surprisingly therefore, rna-seq identified the largest proportion of genes amongst the set of 193 unique irgs that we compiled (150; 78%). nevertheless, the microarrays each identified 63% of the genes (122 and 121) . congruence was highest, and almost identical, between rna-seq and each microarray (98 and 95; 51 ± 1%; all percentages referring to the total of 193 unique irgs). between microarrays it fell to 41% (79). for two-way-only comparisons, the distribution of unique genes between the microarrays was symmetrical (42 and 43; 22%). between rna-seq and each microarray, unique genes were biased >2-fold towards rna-seq: 52 (27%) versus 24 (12%) against the genechip and 55 (28%) versus 26 (13%) against the st array. clearly in simple terms of numbers of irgs identified, rna-seq outperforms the microarrays. this is probably attributable to the historic nature of the array design based on earlier genome assemblies and annotations, with consequent effects on overall coverage (which might disproportionately affect conditionally expressed genes such as those of the innate immune responses). nevertheless, the ability of microarrays to quantify expression of 50% (about 100) of such a large pool of important genes will often prove sufficient for the experimental objectives where other considerations might affect the choice of technology (see below). moving away from actual numbers of genes, it is worth noting that deeper analysis (in the form of validation by alternative approaches) will, by definition, be required to determine which of the genes identified uniquely as irgs by individual technologies are actually irgs. 
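The cross-platform bookkeeping discussed above (genes found by all, by at least two, or by a single technology, and their share of the 193 curated irgs) can be reproduced with simple set operations. The sketch below is illustrative only: the function names are hypothetical, and the assignment of gene symbols to platforms in the example is invented for demonstration and does not reflect the actual gene lists.

```python
def overlap_summary(platform_hits):
    """Summarise how many genes are shared between platforms.

    `platform_hits` maps a platform name to the set of gene symbols
    it reported as differentially regulated.
    """
    all_genes = set().union(*platform_hits.values())
    n_platforms = {
        gene: sum(gene in hits for hits in platform_hits.values())
        for gene in all_genes
    }
    return {
        "total": len(all_genes),
        "all_platforms": sum(1 for n in n_platforms.values()
                             if n == len(platform_hits)),
        "at_least_two": sum(1 for n in n_platforms.values() if n >= 2),
        "single_platform": sum(1 for n in n_platforms.values() if n == 1),
    }


def percent_of_total(count, total=193):
    """Rounded percentage of the curated list (193 irgs in the text)."""
    return round(100 * count / total)


# Hypothetical platform assignments (NOT the study's gene lists).
hits = {
    "genechip_32k": {"mx", "ifit5", "rsad2", "stat1"},
    "st_array": {"mx", "ifit5", "rsad2", "trim25"},
    "rna_seq": {"mx", "ifit5", "stat1", "trim25", "lgp2"},
}
print(overlap_summary(hits))

# Percentages quoted in the text, recomputed from the reported counts.
print(percent_of_total(150), percent_of_total(128), percent_of_total(72))
```

Matching on a single shared identifier, as done here with gene symbols, is the step where synonyms and unassigned probesets cause apparent disagreements, which is why manual curation changed the counts.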
genomic loci for each of the predicted isgs were visually inspected using genomic workbench's genome browser, displaying tracks showing: gene, transcript, exon and orf annotations for the current chicken genome build as well as read-mapping for control and ifn-treated reads [27] . on occasions, such inspection revealed the presence of non-annotated, inducibly-transcribed regions, representing exons, whole genes or even gene families. examples include those previously described at the chicken ifitm locus [28; data not shown], at the herc locus (described below) or downstream of ccl19 (loc100857191; "c-c motif chemokine 26-like"; figure 4 ). systematic analysis of these isgs is outside the scope of this manuscript but the data deposited from this study (european nucleotide archive (ena) study number prjeb7620 [44] ) will facilitate ongoing study and improved annotation. in some cases, although not currently annotated on the ensembl chicken genome, the genes have ids in ncbi and were identified as isgs by one of the microarrays. examples of these include loc415756, loc415922 ("guanylate-binding protein 4-like") and loc422513 ("hect domain and rld 4-like", a member of the herc family, discussed below). about 10% of the reads from cefs did not map to the current chicken genome. the unmapped reads combined from the control and ifn-treated samples were assembled into contigs using the de novo assembly function of genomic workbench. the rna-seq function of genomic workbench was then used to quantitate expression of the contigs in control and ifn-treated samples. 
one of the most highly-expressed contigs was one which, when analysed by blast, proved to represent a homologue of stat2, which is missing from the current ensembl annotated reference chicken genome assembly (galgal4; release 84), though at ncbi it has recently been placed as a refseq gene on chromosome 33 in the new assembly galgal5 (an annotated form of which has not yet been released and is currently not scheduled for release). the de novo assembled contig sequence was used to derive primers for rt-pcr; characterisation of chicken stat2 will be reported elsewhere.
(figure legend fragment) in (b) pyurf shows 24-fold suppression by ifn but the sequence surrounding pyurf shows 87-fold induction from the right-hand end of the unannotated, antisense loc422513 and considerably higher upregulation from the left-hand end (due to its lower uninduced levels), consistent with these representing homologues of ifn-inducible human genes herc6 and herc5.
the data on differential expression showed an overwhelming over-representation of genes up-regulated by ifn. for each of the technologies, only one gene was detected as down-regulated. corresponding geneids were pyurf (pigy upstream reading frame; ensgalg00000026229) for rna-seq and pigy (phosphatidylinositol glycan anchor biosynthesis, class y; ncbi geneid: 101748971) for the st array. the down-regulated 32k genechip probe (gga.8802.1.s1_at), though not mapped to a known gene at the time of initial processing, is now also assigned as pyurf according to the affymetrix netaffx™ analysis center [29]. in humans, pigy and pyurf represent different open reading frames on the same spliced transcript of a gene on hs chromosome 4 located downstream of herc6 then herc5. the pyurf/pigy gene is overlapped on the opposite strand by herc3, which extends downstream to be followed by fam13a.
similarly, the chicken pigy (ncbi) and pyurf (ensembl) genes map to a locus lying upstream of herc3 then fam13a on gg chromosome 4 (see figure 4) , with herc-like loc422513 ("hect domain and rld 4-like") starting upstream but spanning and extending downstream of the chicken pyurf. our rna-seq data ( figure 4) indicate that this locus is poorly annotated and demonstrates complex regulation of the component genes by ifn. thus, although the pigy/ pyurf transcript is down-regulated by ifn, as recorded by all three technologies, it appears to be closely flanked upstream and downstream by still unannotated multiple exons that are clearly strongly induced by ifn (figure 4 ). sequences within these upstream and downstream regions (which are represented by the single ncbi refseq (gal-gal5) gene, loc422513, but appear as though they may represent two separate genes, figure 4 ) bear homology with genes of the herc family, consistent with the fact that herc5 neighbours the human pigy/pyurf gene and that herc3 neighbours the chicken pigy/pyurf gene. the chicken herc3 gene shows no evidence of induction by ifn. description of the interferon-inducibility of the chisgs serves as the first step in understanding the regulation of their expression and their role in anti-viral (and potentially broader anti-microbial) activities. there is considerable current interest in the antiviral responses of particular cell types, particularly those of the lymphoid, myeloid and dendritic lineages. however, the definition of a wide variety of these cell types is not so advanced in avian species so we felt it best to produce baseline data for readily available, primary cells, namely chick embryo fibroblasts (cef) as they are highly responsive to ifn. they also remain important for commercial production of vaccine viruses (including human vaccines) as well as for the routine isolation and diagnosis of avian pathogens. 
given the currently incomplete nature of the chicken genome assembly (even at galgal5) and of its annotation (as currently available for galgal4 and even as awaited for galgal5) it is inevitable that updates will continue to be released but the primary data reported here, and publicly-available, for microarrays and rna-seq, can always be applied to updated microarray assignments as well as to subsequent genome assemblies and annotations. all things being equal, rna-seq would seem to be the method of choice for transcriptomic analysis of chicken ifn responses, particularly given its ability to produce high-resolution quantitative and qualitative data. moreover the data are readily portable and can be easily mined by others with different research focus. they can also be applied immediately to newly released genome assemblies and annotations (whether global or local), whereas microarray analysis must await the generation of annotation updates for each technology. however, although the cost of sequencing has fallen, and will probably continue to do so, there remain considerable overheads to handling large data sets from extensive, complicated experiments, especially in terms of computing and data storage capacity, as well as speed of processing and archiving. for such experiments, microarrays continue to offer a tractable approach, capable of quickly quantifying and comparing the expression of the central core of irgs producing relatively compact data for rapid analysis and easy archiving. induction of innate responses with pamps will trigger different or broader ranges of responses by virtue of the fact that they will trigger other or more pathways than just the ifn-pathway. for instance we (giotis et al. unpublished) and others [12] have begun to analyse the responses induced by the dsrna analogue poly[i:c]. 
regulation of isg expression might affect the innate responses observed in different cell lines or tissues, so it will be important to understand the mechanisms involved. additionally, we have observed suppression of isg induction in the spontaneously immortalized chicken fibroblast cell line, df-1 [30], due to its enhanced basal expression of the regulatory isg, socs1 (giotis et al., unpublished). identification of the isgs means that their promoters, enhancers and other regulatory elements can be systematically analysed to help understand the complex kinetics of their expression (figure 4). several studies have investigated changes in host gene expression in response to infection in vivo or in culture with particular avian viruses [31-39]. although many of these genes will represent innate (and potentially antiviral) host responses, the majority will be involved in the metabolic, cell cycle and ultrastructural changes that the virus has to induce to facilitate replication. furthermore, it is not unusual for viruses to modulate the expression of signalling molecules key to the antiviral responses or of antiviral effectors themselves. for instance, we have shown that even an attenuated strain of fowlpox virus blocks induction of ifn-β (chifn2) and is highly resistant to the antiviral activity induced by ifn [16, 40]. the results of existing and future studies of infection in vivo or in culture with particular avian viruses can now be compared with data presented here for isg induction by ifn to look for evidence of modulation of isg expression by viruses, whether that be modulation of individual isgs, subsets [4] or the complete set. for instance, fowlpox virus blocks essentially all isg expression but a mutant defective in the fpv012 ankyrin repeat/f-box protein identified by laidlaw et al. [40] induces modest levels of a subset of the isgs (giotis et al., unpublished).
such analyses can be extended to important avian zoonotic viruses and pathogens with huge impact on the global poultry industry. although this study relates to type i ifn, extensive comparison with the effects of type iii ifn could now be conducted, extending the qrt-pcr comparison made by masuda et al., who looked at induction of mx and oas by ifn-β, ifn-γ and ifn-λ [41].
interferons and viruses: an interplay between induction, signalling, antiviral responses and virus countermeasures
inborn errors of anti-viral interferon immunity in humans
interferon-stimulated genes: a complex web of host defenses
pathogenic influenza viruses and coronaviruses utilize similar and contrasting approaches to control interferon-stimulated gene responses
functional classification of interferon-stimulated genes identified using microarrays
a diverse range of gene products are effectors of the type i interferon antiviral response
virus interference. i. the interferon
chicken interferon gene: cloning, expression, and analysis
characterization of the chicken pkr: polymorphism of the gene and antiviral activity against vesicular stomatitis virus
international chicken genome sequencing c (2004) sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution
evidence of the adaptive evolution of immune genes in chicken
functional analysis of chicken irf7 in response to dsrna analog poly(i:c) by integrating overexpression and knockdown
defense genes missing from the flight division
innate sensing of viruses by pattern recognition receptors in birds
id: 217: transcriptomic analysis of the chicken interferome
genetic screen of a library of chimeric poxviruses identifies an ankyrin repeat protein involved in resistance to the avian type i interferon response
species difference in anp32a underlies influenza a virus polymerase host restriction
dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources
differential expression in sage: accounting for normal between-library variation
involvement of the irf-1 transcription factor in antiviral responses to interferons
constitutive expression of an isgf2/irf1 transgene leads to interferon-independent activation of interferon-inducible genes and resistance to virus infection
ifn regulatory factor-1 bypasses ifn-mediated antiviral effects through viperin gene induction
interferome v2.0: an updated database of annotated interferon-regulated genes
chicken interferon-inducible transmembrane protein 3 restricts influenza viruses and lyssaviruses in vitro
the df-1 chicken fibroblast cell line: transformation induced by diverse oncogenes and cell death resulting from infection by avian leukosis viruses
transcriptomic profiling of virus-host cell interactions following chicken anaemia virus (cav) infection in an in vivo model
molecular responses to the influenza a virus in chicken trachea-derived cells
early host responses to avian influenza a virus are prolonged and enhanced at transcriptional level depending on maturation of the immune system
transcriptional analysis of host responses to marek's disease viral infection
analysis of the early immune response to infection by infectious bursal disease virus in chickens differing in their resistance to the disease
a comparative analysis of host responses to avian influenza infection in ducks and chickens highlights a role for the interferon-induced transmembrane proteins in viral resistance
analysis of the crow lung transcriptome in response to infection with highly pathogenic h5n1 avian influenza virus
integrated analysis of microrna expression and mrna transcriptome in lungs of avian influenza virus infected broilers
differential expression of micrornas in marek's disease virus-transformed t-lymphoma cell lines
genetic screen of a mutant poxvirus library identifies an ankyrin repeat protein involved in blocking induction of avian type i interferon
biological effects of chicken type iii interferon on expression of interferon-stimulated genes in chickens: comparison with type i and type ii interferons
we are grateful for the skilled support of laurence game, nathalie lambie and adam giess of the medical research council's (mrc) clinical sciences centre's (csc) genomics facility in conducting microarray analysis and illumina sequencing. we gratefully acknowledge sarah butcher and geraint barton of the bioinformatics support service at imperial college london for their advice.
the datasets supporting the conclusions of this article are available from the following repositories: european bioinformatics institute (ebi) arrayexpress accession numbers e-mtab-3711 (for the 32k genechip; [42]) and e-mtab-3712 (for the st array; [43]); european nucleotide archive (ena) study number prjeb7620 (for illumina rna-seq; [44]).
additional file 1. table of additional file 2. detailed information on chisgs identified by rna-seq and microarray technologies. (1) technologies identifying significant irgs are listed as "1" rna-seq (using kal's z test); "2" affymetrix 32k genechip chicken genome array and "3" chicken gene 1.0 st array. chisgs significant by one or both microarrays and rna-seq using kal's z test under relaxed criteria (fc > 2.5 or fdr < 0.05) are indicated by "(1)". "+" after the technology identifier indicates that ifn-induced rna-seq read density was observed at the location of the unannotated gene. (2) interferome status [45]. (3) human homologue data (hugo) [46]. (4) mouse orthologue data (mgi) [47].
ifn: interferon; irgs: ifn-regulated genes; isgs: ifn-stimulated genes; cef: chicken embryo fibroblasts; rchifn1: recombinant chicken ifn-α; rin: rna integrity number; qrt-pcr: quantitative real-time pcr; gapdh: glyceraldehyde 3-phosphate dehydrogenase; fc: fold change; fdr: false discovery rate.
the authors declare that they have no competing interests.
esg and rcc design of the study, data acquisition and analysis, drafting the manuscript. ngs data compilation and analysis, drafting the manuscript. cdt design, production, curation and maintenance of chisg browser website. sg design of the study, critically reviewing the manuscript. mas design of the study, data analysis, finalizing manuscript. all authors read and approved the final manuscript. key: cord-323307-nu9ib62h authors: dong, dong; lei, ming; hua, panyu; pan, yi-hsuan; mu, shuo; zheng, guantao; pang, erli; lin, kui; zhang, shuyi title: the genomes of two bat species with long constant frequency echolocation calls date: 2016-10-26 journal: mol biol evol doi: 10.1093/molbev/msw231 sha: doc_id: 323307 cord_uid: nu9ib62h bats can perceive the world by using a wide range of sensory systems, and some of the systems have become highly specialized, such as auditory sensory perception. among bat species, the old world leaf-nosed bats and horseshoe bats (rhinolophoid bats) possess the most sophisticated echolocation systems. here, we reported the whole-genome sequencing and de novo assembles of two rhinolophoid bats – the great leaf-nosed bat (hipposideros armiger) and the chinese rufous horseshoe bat (rhinolophus sinicus). comparative genomic analyses revealed the adaptation of auditory sensory perception in the rhinolophoid bat lineages, probably resulting from the extreme selectivity used in the auditory processing by these bats. pseudogenization of some vision-related genes in rhinolophoid bats was observed, suggesting that these genes have undergone relaxed natural selection. an extensive contraction of olfactory receptor gene repertoires was observed in the lineage leading to the common ancestor of bats. further extensive gene contractions can be observed in the branch leading to the rhinolophoid bats. such concordance suggested that molecular changes at one sensory gene might have direct consequences for genes controlling for other sensory modalities. 
to characterize the population genetic structure and patterns of evolution, we re-sequenced the genomes of 20 great leaf-nosed bats from four different geographical locations in china. the results showed similar sequence diversity values and little differentiation among populations. moreover, evidence of genetic adaptations to high altitudes in the great leaf-nosed bats was observed. taken together, our work provides a useful resource for future research on the evolution of bats.
bats (order chiroptera) are one of the largest monophyletic clades in mammals, and constitute nearly 20% of living mammalian species. they can perceive their surroundings using a wide range of sensory systems, and have long been regarded as the most unusual and specialized of all mammals. most bats are sophisticated echolocators and rely on their echolocation systems for navigation. however, old world fruit bats have no laryngeal echolocating ability, and navigate largely by vision. based on overwhelming molecular genetic evidence, it has been proposed that echolocating bats are paraphyletic (teeling, et al. 2005). old world fruit bats and some laryngeal echolocators (including the rhinolophidae, hipposideridae, craseonycteridae, megadermatidae, and rhinopomatidae families) form a natural group, the suborder yinpterochiroptera, and the remaining laryngeal echolocating bats are grouped into another suborder, yangochiroptera. two distinct navigation approaches can be employed by echolocating bats: low duty cycle (ldc) echolocation and high duty cycle (hdc) echolocation (teeling 2009). ldc echolocators can separate pulse and echo in time to avoid forward masking, whereas some species of hdc echolocators separate pulse and echo in frequency. it has been documented that rhinolophoid bats might possess the most sophisticated echolocation systems (jones and teeling 2006).
recently, results from some hearing-related genes suggested sequence convergence in laryngeal echolocating bats (li, et al. 2008; davies, et al. 2012) . we attempted to investigate whether similar patterns can be detectable in other hearing-related genes. furthermore, a sensory trade-off between investment in vision and echolocation has been identified (dechmann and safi 2009) . loss-of-function in short-wave sensitive opsin (sws1 gene) occurred in rhinolophoid bats, which use hdc echolocation and can emit long constant frequency calls (zhao, et al. 2009 ). although several bat genomes have been sequenced (zhang, et al. 2013) , the evolutionary mechanisms of the rhinolophoid bats remains unclear. comparative genomics will provide us opportunities to investigate whether similar patterns can be detectable in other sensory genes. the great leaf-nosed bat (hipposideros armiger) and the chinese rufous horseshoe bat (rhinolophus sinicus) are two important species of rhinolophoid bats. first, these are model organisms with remarkable hdc echolocation ability and can emit continuous ultrasonic calls of long constant frequency with remarkable acoustic features (doppler-shift compensation) (schnitzler, et al. 2003) . we can comprehensively explore how rhinolophoid bats evolved a specialized form of echolocation. second, they are important reservoir hosts of emerging viruses, and the chinese rufous horseshoe bat has been suggested to carry the direct ancestor of severe acute respiratory syndrome (sars) coronavirus (ge, et al. 2013) . in this work, we presented the genomes of the great leaf-nosed bat and the chinese rufous horseshoe bat using the next generation sequencing platform (illumina hiseq 2500). the result revealed the adaptation of auditory sensory perception in hdc echolocators, and showed an extensive contraction of olfactory receptor gene repertoires as well as pseudogenization of some vision-related genes. 
furthermore, we performed genome re-sequencing to analyze the population genetic structure of the great leaf-nosed bats. the genomic data provide genetic evidence of adaptive evolution in rhinolophoid bats.
a female great leaf-nosed bat (hipposideros armiger) and a female chinese rufous horseshoe bat (rhinolophus sinicus) were captured from a cave (n 30°20.497′ genomic dna was extracted from bat muscle using the qiagen dneasy blood and tissue kit. six paired-end libraries with insert sizes of 170 bp, 500 bp, 800 bp, 3 kb, 8 kb and 20 kb were constructed and sequenced for each of the great leaf-nosed bat and the chinese rufous horseshoe bat. the libraries were sequenced using the illumina hiseq2500 platform, with a read length of 101 bp. low quality sequencing reads were filtered out and potential sequencing errors were removed. the following filtering criteria were applied: 1) reads with >5% unidentified nucleotides were removed; 2) reads with >10 nucleotides aligned to the adapter sequence, allowing <3 mismatches, were removed; 3) putative pcr duplicates generated by pcr amplification in the library construction process were removed. finally, we generated 476.5 gb and 288.5 gb of sequences for the great leaf-nosed bat and the chinese rufous horseshoe bat, respectively. the genome sequences were assembled using allpaths software (butler, et al. 2008). briefly, contigs were generated by constructing a de bruijn graph with the sequencing reads from short-insert library data. the graph was simplified to generate the contigs by removing tips, merging bubbles and solving repeats. the sequencing reads were mapped to the assembled contigs, and the scaffolds were constructed by weighting the rates of consistent and conflicting paired-end relationships. finally, we retrieved the read pairs with one end that uniquely mapped to the contigs and the other end located in the gap region, and a local assembly of these collected reads was performed to fill the gaps.
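to make the first assembly step concrete, the following is a minimal sketch of how a de bruijn graph can be built from short reads. it is a textbook illustration only, not the allpaths implementation: the function name, the toy reads and the choice of k are all assumptions, and the simplification steps mentioned above (tip removal, bubble merging, repeat resolution) are not shown.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads (hypothetical data).
    toy_reads = ["ACGTT", "CGTTA"]
    for node, successors in sorted(build_de_bruijn_graph(toy_reads, 3).items()):
        print(node, "->", sorted(successors))
```

nodes with a single successor form unbranched paths that can be collapsed into contigs; branching nodes mark the repeats and sequencing errors that graph simplification has to resolve.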
a more detailed genome assembly method is provided in supplementary methods. total rnas of the two bats were extracted from brain, cerebellum, heart, liver, stomach, kidney, lung and muscle tissues for the generation of transcriptome data. paired-end libraries for rna sequencing were constructed using the illumina mrna-seq prep kit. the quality and integrity of the rna samples were determined using the agilent2100 bioanalyzer. poly(a) mrnas were isolated using oligo(dt) beads, fragmented, and converted to cdnas followed by end repair, adaptor ligation, and pcr amplification. the libraries thus generated were sequenced using the illumina hiseq2500 platform as described above. we searched for tandem repeats across the genomes using tandem repeats finder. transposable elements were predicted in the genomes by homology search against the known transposable elements (te) in repbase (jurka, et al. 2005 ) (version 20110920) using repeatmasker version 3.3.0 (tarailo-graovac and chen 2009). the protein-coding genes of the bat genomes were annotated by combining homology-based, ab initio and rna-seq gene prediction methods. at first, rna-seq data were assembled using the trinity package (trapnell, et al. 2013) . pasa (version r2012-06-25) (haas, et al. 2003) was then used to map the assembled transcripts. based on the set of gene models, a training set was constructed for de novo predictors by selecting the genes with complete structures and at least 100% mapping rate for uniprot vertebrate proteins. for the ab initio prediction, augustus (stanke and waack genes with the training set generated by pasa. for homology-based gene prediction, the protein sequences of human, mouse, dog, cow, little brown bat and large flying fox were downloaded from ensembl release 72 and mapped onto the repeat-masked genome using genblasta (she, et al. 2009 ). rna-seq data were mapped to the genome using tophat (trapnell, et al. 
2009), and the transcription-based gene structures were generated by cufflinks (trapnell, et al. 2013). the final gene set was generated by merging all predicted genes using glean software (http://sourceforge.net/projects/glean-gene/). gene function was inferred from the best match of the alignment to the swissprot and translated embl nucleotide sequence data library databases using blastp. interproscan (mulder and apweiler 2007) was used to determine motifs and domains in the final gene set. to evaluate the completeness of the genomes and annotations, the cegma method (parra, et al. 2007) was employed. we used the treefam methodology (li, et al. 2006) to define gene families in 14 mammalian genomes (human, macaque, mouse, rat, dog, cat, horse, rhinoceros, cow, pig, little brown bat, large flying fox, great leaf-nosed bat and chinese rufous horseshoe bat). the protein sequences of the other 12 mammalian species were obtained from the ensembl database (release 72). gene family expansion and contraction analysis was performed using cafe software (de bie, et al. 2006). a random birth and death model was used to study gene gain and loss in the gene families across a user-specified phylogenetic tree. a global parameter λ (lambda), which describes both the gene birth (λ) and death (μ = λ) rates across all branches of all gene families, was estimated using the maximum likelihood method. a conditional p-value was calculated for each gene family, and families with conditional p-values less than 0.05 were considered to have a significantly accelerated rate of expansion and contraction. protein sequences of the aforementioned 14 mammals were aligned using muscle software (edgar 2004). all orthologous genes were concatenated into one super gene for each species. raxml (stamatakis 2014) was applied to build phylogenetic trees. we partitioned the data by coding genes, and evaluated the model parameters independently for each partition.
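as an illustration of the birth and death model used for the gene family analysis above, the sketch below evaluates the standard transition probability of a linear birth-death process in which the per-gene birth and death rates are equal. this is a generic textbook formula written for illustration; it is not taken from the cafe source code, and the function name, the rate and the branch length in the usage example are assumptions.

```python
from math import comb


def transition_probability(start, end, rate, time):
    """P(family size changes from start to end along a branch of length time).

    Assumes a linear birth-death process with equal per-gene birth and
    death rates (both equal to rate).
    """
    if start == 0:
        # An extinct family cannot regain genes under this model.
        return 1.0 if end == 0 else 0.0
    alpha = rate * time / (1.0 + rate * time)
    total = 0.0
    for j in range(min(start, end) + 1):
        total += (comb(start, j)
                  * comb(start + end - j - 1, start - 1)
                  * alpha ** (start + end - 2 * j)
                  * (1.0 - 2.0 * alpha) ** j)
    return total


if __name__ == "__main__":
    # Hypothetical rate (per gene per million years) and branch length.
    for size in range(6):
        print(size, transition_probability(3, size, rate=0.002, time=50))
```

in a full analysis these branch-wise probabilities would be combined over the whole tree to obtain the likelihood that is maximised with respect to the rate parameter.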
in all partitioned analyses, the empirical base frequencies and the evolutionary rates were estimated independently for every partition. bootstrap support was obtained by repeating the original partitioned ml raxml analysis on 100 bootstrap replicates for each dataset using different random number seeds in each repetition. next, we inferred the species tree using a coalescent method, maximum pseudo-likelihood estimation of species trees (mp-est) (liu, yu, et al. 2010). an individual gene tree for each gene was estimated using the maximum-likelihood method and rooted by an outgroup (human). species trees were estimated from the rooted gene trees in the program mp-est with 100 bootstrap replicates. the results supported that bats are members of scrotifera (chiroptera + carnivores + perissodactyla + cetartiodactyla), with the bat lineage diverging from fereuungulata (carnivores + perissodactyla + cetartiodactyla). the values of ka, ks and the ka/ks ratio were estimated for each gene using the codeml program in the paml package (yang 2007). in order to detect positively selected genes, the optimized branch-site likelihood model (zhang, et al. 2005) was used. we separately explored the positively selected genes in the great leaf-nosed bat and the chinese rufous horseshoe bat. for each analysis, only one bat species was selected as the foreground branch, and all other species were regarded as the background branches. the revised branch-site model a was employed, which attempts to identify positive selection acting on some sites on the "foreground branches". using a likelihood ratio test (lrt), the alternative hypothesis that positive selection occurs on the foreground branches (ka/ks > 1) is compared with the null hypothesis (ka/ks = 1). bayesian empirical bayes values were used to identify sites under positive selection. then, the branch two-ratio model was applied to detect genes with accelerated evolution in a specific lineage.
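the likelihood ratio test described above can be sketched as follows. this is a minimal illustration that assumes the test statistic is compared against a chi-square distribution with one degree of freedom, a common choice for the branch-site test; the text does not state which reference distribution was used, and the function names and log-likelihood values are hypothetical.

```python
from math import erfc, sqrt


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2.0))


def lrt_p_value(lnl_null, lnl_alt):
    """P-value of the likelihood ratio test under the 1-df assumption."""
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for the null and alternative models.
    print(lrt_p_value(lnl_null=-1000.0, lnl_alt=-995.0))
```

the log-likelihoods would come from two codeml runs on the same alignment and tree, one with the foreground ka/ks fixed at 1 and one with it free to exceed 1.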
the one-ratio model assumed an equal ka/ks ratio for all lineages in the phylogeny, and the two-ratio model assumed two ka/ks ratios: one for the background branches and one for the foreground branch leading to the specific species. then, clade model c was employed to test for positive selection along the rhinolophoid bats. the two clades were assumed to share sites under purifying selection and neutral evolution, but to differ at a third site partition under divergent selection. the null model used for clade model c was m2a_rel (weadick and chang 2012), whose lrt has a relatively lower false-positive rate. go annotations were downloaded from ensembl databases and were assigned to these orthologous genes. the binomial test was used to identify go categories with more than 20 genes that had an excess of non-synonymous changes in bat lineages. next, we used the program mapp (multivariate analysis of protein polymorphism) (stone and sidow 2005) to evaluate the physicochemical impact of these convergent amino acid substitutions on bats. physicochemical variations can be used to predict how these particular convergent amino acid substitutions might affect protein function. in this work, we performed a probabilistic analysis of the sequence convergence in echolocating bats. a maximum likelihood approach, implemented in the software package codeml ancestral, was used. we compared the pair-wise branches of two echolocating bats in the phylogeny, and posterior probabilities of all possible amino acid substitutions were calculated. the probabilities of divergent and convergent substitutions were calculated as the sum of joint probabilities of substitutions between the two branches of echolocating bats. convergence and divergence estimates were based on posterior distributions of ancestral states and substitutions. the same state (same amino acid) represents convergent substitutions, and the different state represents divergent substitutions.
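the summation of joint probabilities described above can be sketched for a single site as follows. this is a simplified illustration, not the procedure implemented in codeml ancestral: it assumes that the posterior probabilities of substitutions on the two branches can be multiplied as if independent, and the function name, input layout and probabilities in the usage example are all hypothetical.

```python
def convergence_and_divergence(branch_a, branch_b):
    """Sum joint probabilities of convergent and divergent substitution pairs.

    Each argument maps (ancestral_state, derived_state) to a posterior
    probability for one branch at one site. Pairs in which either branch
    shows no change are ignored.
    """
    convergent = 0.0
    divergent = 0.0
    for (anc_a, der_a), prob_a in branch_a.items():
        if anc_a == der_a:
            continue  # no substitution on branch a
        for (anc_b, der_b), prob_b in branch_b.items():
            if anc_b == der_b:
                continue  # no substitution on branch b
            joint = prob_a * prob_b  # assumes the branches are independent
            if der_a == der_b:
                convergent += joint  # both branches end in the same amino acid
            else:
                divergent += joint   # both changed, but to different states
    return convergent, divergent


if __name__ == "__main__":
    # Hypothetical posterior probabilities at one alignment site.
    bat_1 = {("A", "S"): 0.6, ("A", "A"): 0.4}
    bat_2 = {("T", "S"): 0.5, ("T", "G"): 0.3, ("T", "T"): 0.2}
    print(convergence_and_divergence(bat_1, bat_2))
```

summing these per-site values over an alignment gives branch-pair totals of the kind that can be compared with totals obtained from sequences simulated under a null model.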
finally, to further validate that the convergence between two branch pairs of echolocating bats was significant, we performed a simulation analysis to compare the observed probabilities against those of the null hypothesis. simulated sequences were generated using evolver, another program in the paml package (yang 2007). the branch-wise convergence probabilities were calculated with 1,000 replicates. we used an in silico method similar to that previously reported (dong, et al. 2009). first, we used previously published or genes in vertebrates as query sequences (niimura and nei 2007) and conducted a tblastn search against the genome sequences with a cutoff e value of 1e-10 to identify the or gene repertoires. here, we identified or gene repertoires from eight mammalian genomes (the site (http://genome.ucsc.edu). then, the non-redundant blast hits were extended in the 5' and 3' directions along the genome sequences, and the potential coding regions were extracted from these sequences. the chemosensory receptor genes in mammals have high sequence similarity. we therefore repeated the tblastn search against the genome sequences using the or coding genes identified from each species, and the non-redundant blast hits were used to identify the or pseudogenes containing interrupting stop codons or frameshifts. to identify partial or genes from these sequences, we extracted the sequences that did not have any nonsense or frameshift mutations. we then constructed a multiple alignment of these sequences together with functional or genes using the program e-ins-i in mafft version 5.8 (katoh, et al. 2002). from those alignments, we extracted partial or sequences that met the following criteria. when the c-terminal region of an or gene is missing from the genome sequence, the n-terminal region should contain an initiation codon at a proper position and should not contain any nonsense mutations, frameshifts, or long gaps.
when the n-terminal region is missing, the c-terminal portion should have a stop codon at a proper position and should not contain any nonsense mutations, frameshifts, or long gaps. we also identified 6 and 10 sequences containing a nonsense stop codon in the great leaf-nosed bat and the chinese rufous horseshoe bat, respectively, which lack both a start codon and a stop codon. however, these sequences were removed because they are relatively short (~400 bp) and have strong sequence similarity with bitter taste receptor genes. to assign the identified or genes to distinct or gene families, a collection of protein sequences from horde database version 42 (safran, et al. 2003) was used. to detect the extensive gains and losses in or gene repertoires, we employed the reconciled tree method (nam and nei 2005), in which the topology of a gene tree is reconciled with that of a species tree. an in-house program was applied. briefly, based on the phylogenetic tree of or genes, we compared the condensed gene tree and the species tree under the parsimony principle. the number of ancestral genes can thus be estimated, together with information on past gene expansions and contractions. here, we used a 70% condensed tree of or genes for analyses. a list of vision-related genes was obtained from the go category of visual perception (go:0007601). we subjected human vision-related proteins to tblastn against the genomes with a cutoff threshold of e-value 1e-5. we identified the best hit for each human protein using the criterion that more than 30% of the aligned sequence showed an identity above 30%. the genewise algorithm was employed to identify potential pseudogenes with parameters -genesf -for -quiet. those genes with frameshifts or premature stop codons were considered as candidates. 
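the screening for frameshifts and premature stop codons described above can be illustrated with a small check on an extracted coding sequence. this is only a sketch under simplifying assumptions: it uses the standard genetic code, treats a length that is not a multiple of three as a crude proxy for a frameshift, and does not reproduce the alignment-based logic of genewise or of the in-house pipeline. the function name and labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def classify_coding_sequence(seq: str) -> str:
    """Label a putative coding sequence as intact or as a pseudogene candidate."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        # Crude proxy: the reading frame cannot be completed.
        return "frameshift_candidate"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon anywhere before the final codon interrupts the reading frame.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "premature_stop"
    return "intact_candidate"


print(classify_coding_sequence("ATGAAACCCTAA"))
print(classify_coding_sequence("ATGTAACCCTAA"))
```

a real pipeline would locate frameshifts from the protein-to-genome alignment rather than from sequence length, as in the genewise step above.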
we then filtered them as follows: 1) we aligned all human proteins to their corresponding genomic loci, and those genes with frameshifts or premature stop codons in human-to-human alignments were removed; 2) as for the human-to-human alignments, those genes with obvious splicing errors near their frameshifts or premature stop codons were removed; 3) candidate pseudogenes with a low number of sequencing reads covering their frameshift or premature stop codon sites were regarded as assembly errors. those genes with a number of reads containing genotype variations at these sites were considered heterozygous and were also removed. we used a method based on ka/ks to identify go categories with ka/ks significantly above average in the great leaf-nosed bat genome and the chinese horseshoe bat genome. at first, the ka and ks rates were calculated with the paml package from all aligned bases with a quality score larger than 20 in orthologs, using the f3x4 codon frequency model and the rev substitution matrix. in order to examine the evolution of functional categories, we downloaded the go annotation of human genes from the ensembl biomart database (release-71). we estimated the average ka and ks values for those genes with annotated go terms using equations (s1, s2), where t is the number of annotated genes within go categories, $a_i$ and $A_i$ are the numbers of non-synonymous substitutions and sites, and $s_i$ and $S_i$ are the numbers of synonymous substitutions and sites in gene i, as estimated by paml, respectively. the expected proportion of non-synonymous substitutions $p_A$ in a go category was then calculated (s3). for a given go category c, the probability $p_c$ of observing an equal or higher number of non-synonymous substitutions and synonymous substitutions was calculated assuming a binomial distribution (s4), where $a_c$ and $s_c$ are the total numbers of non-synonymous and synonymous substitutions in go category c, respectively. 
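the binomial test for a go category can be sketched as follows. equations (s1) to (s4) are not reproduced in this text, so the sketch makes two loudly stated assumptions: the expected proportion of non-synonymous substitutions is taken as the pooled proportion over all annotated genes, and substitution counts are integers (paml estimates would have to be rounded). all function names are hypothetical.

```python
from math import comb


def pooled_nonsyn_proportion(counts):
    """counts: list of (nonsyn, syn) integer substitution counts, one pair per gene.

    Assumption: the expected proportion is the pooled share of
    non-synonymous substitutions across all annotated genes.
    """
    total_a = sum(a for a, _ in counts)
    total_s = sum(s for _, s in counts)
    return total_a / (total_a + total_s)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def category_pvalue(a_c, s_c, p_a):
    """Probability of at least a_c non-synonymous changes among a_c + s_c."""
    return binomial_upper_tail(a_c, a_c + s_c, p_a)


# Hypothetical counts (illustrative only, not data from the study).
genes = [(3, 10), (1, 8), (6, 9), (2, 12)]
p_a = pooled_nonsyn_proportion(genes)
print(category_pvalue(a_c=9, s_c=19, p_a=p_a))
```

a small p-value for a category indicates more non-synonymous substitutions than expected from the pooled proportion; the study additionally restricted the test to categories with more than 20 genes.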
we applied an approach similar to the binomial test described above to identify go categories that have an excess of non-synonymous changes on one lineage. for lineages x and y, the average proportion of non-synonymous substitutions was calculated by the following formula (s5), where x is the total number of non-synonymous substitutions in the x lineage and y is the total number of non-synonymous substitutions in the y lineage. the divergence between the observed and expected proportions of non-synonymous substitutions in the different lineages follows a binomial distribution, as given in equation (s6). as described for the absolute rate tests, we then computed this statistic for every go category, as well as for every category in 10,000 randomly permuted data sets. we sampled a total of 20 great leaf-nosed bats distributed in four different locations. genomic dna was extracted from the wing membranes of each individual. a paired-end sequencing library with an insert size of 500 bp was constructed for each sample, and sequenced on the illumina hiseq 2500 platform in 2×101 bp mode. duplicate sequencing reads were filtered out according to the following criteria: 1) any reads with >10% unidentified nucleotides; 2) reads with >10 nt aligned to the adapter sequence, allowing <10% mismatches; 3) reads with 50% of bases having phred quality <5. the filtered reads were mapped to the genome using bwa, and samtools was used to call snps. then, we filtered snps using vcftools and gatk under the following criteria: 1) coverage depth >4 and <10000; 2) root mean square mapping quality >10; 3) the distance between adjacent snps >10 bp; 4) the distance to a gap >10; 5) read quality value >30. to estimate phylogenetic relationships, the genetic distances were calculated among all samples to generate a neighbor-joining (nj) tree using phylip. we performed a principal component analysis using the package gcta. 
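the five snp filtering criteria listed above can be expressed as a simple predicate. the sketch below is illustrative only: the record fields are hypothetical names rather than vcftools or gatk fields, all snps are assumed to lie on one chromosome, and the adjacent-snp distance is evaluated after the per-site filters, an ordering that the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    pos: int           # 1-based position on a single chromosome
    depth: int         # coverage depth
    rms_mapq: float    # root mean square mapping quality
    gap_distance: int  # distance to the nearest assembly gap
    quality: float     # read quality value


def passes_site_filters(s: Snp) -> bool:
    """Criteria 1, 2, 4 and 5 from the text, applied to one site."""
    return (4 < s.depth < 10000 and s.rms_mapq > 10
            and s.gap_distance > 10 and s.quality > 30)


def filter_snps(snps):
    """Apply the site filters, then drop SNPs within 10 bp of a neighbour."""
    kept = sorted((s for s in snps if passes_site_filters(s)), key=lambda s: s.pos)
    result = []
    for i, s in enumerate(kept):
        left_ok = i == 0 or s.pos - kept[i - 1].pos > 10
        right_ok = i == len(kept) - 1 or kept[i + 1].pos - s.pos > 10
        if left_ok and right_ok:
            result.append(s)
    return result


# Hypothetical records (illustrative only).
snps = [Snp(100, 20, 40.0, 500, 50.0), Snp(105, 25, 45.0, 495, 60.0),
        Snp(200, 30, 50.0, 400, 55.0), Snp(300, 3, 50.0, 300, 55.0)]
print([s.pos for s in filter_snps(snps)])
```

with these example records only the site at position 200 survives: the first two sites are within 10 bp of each other and the last fails the depth threshold.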
the population structure was inferred using frappe (v1.1) with a maximum likelihood method (tang, et al. 2005). a sliding-window approach (10 kb windows sliding in 10 kb steps) was employed to quantify polymorphism levels (θπ, pairwise nucleotide variation as a measure of diversity) and genetic differentiation (fst) between the high altitude region (dq) and low altitude regions (tw, jx and gz). to detect significant signatures of selective sweeps, z-transformed fst values were calculated. next-generation genome sequencing was carried out, generating 476.5 gb and 288.5 gb of sequences for the great leaf-nosed bat and the chinese rufous horseshoe bat (supplementary table s1), respectively. the genome size was estimated to be 2.18 gb and 2.07 gb for the great leaf-nosed bat and the chinese rufous horseshoe bat ( supplementary fig.1 supplementary fig.3 ). known transposon-derived repeats account for 25.8% and 28.5% of the genomes of the great leaf-nosed bat and the chinese rufous horseshoe bat, respectively, which are lower than in other non-bat mammalian species (supplementary table s5). to facilitate the genome annotation, we generated high-depth transcriptome data from these two rhinolophoid bats. with repeats masked, the genome was annotated by integrating homology-based prediction, ab initio prediction and transcription-based prediction methods. as a result, non-redundant reference gene sets of 22,009 and 23,152 protein-coding genes were generated for the great leaf-nosed bat and the chinese rufous horseshoe bat ( supplementary fig.4), respectively. we employed the cegma method to evaluate the completeness of the genome annotation. the result showed that the vast majority of the core genes were present in our predicted gene sets (97.08% for the great leaf-nosed bat and 96.14% for the chinese rufous horseshoe bat), indicating the completeness of the identified gene sets. 
next, we aligned the transcriptome sequencing reads to the predicted genes, and the result showed that approximately 96% of exons are accurately covered (96.8% for the great leaf-nosed bat and 97.1% for the chinese rufous horseshoe bat). comparative analysis showed a high gene sequence similarity between them (91%, supplementary fig.5 ). we next examined the level of homology between our predicted genes and sequences in the uniprot database. the result showed that >92% of the genes were functionally annotated (94% for the great leaf-nosed bat and 92.2% for the chinese rufous horseshoe bat). compared with the gene families in other three mammalian species -the little brown bat, large flying fox and human, we identified 8,792 homologous gene families shared by five species. a total of 975 gene families were specific to the rhinolophoid bats ( fig. 1) . further functional annotation indicated that the rhinolophoid bats specific gene families were significantly over-represented in two major functional categories: atp binding (43 genes, f.d.r.= 0.0002) and immunity and host defense (25 genes, f.d.r.= 0.0051; supplementary table s6) . until now, the relationship of bats to other members of superorder laurasiatheria has proven difficult to resolve. some studies insisted that bats belong to the clade of pegasoferae which comprises chiroptera, carnivores and odd-toed ungulates (lindblad-toh, et al. 2011; meredith, et al. 2011; mccormack, et al. 2012) , whereas others proposed that bats are a sister group to the clade comprising carnivores and euungulata (pumo, et al. 1998; murphy, et al. 2001; murphy, et al. 2007; song, et al. 2012; zhang, et al. 2013) . to determine the phylogenetic position of bats within the superorder laurasiatheria, a total of 4,569 single-copy 1:1 orthologous genes were fig.6 ). 
the result based on nucleotide data was in line with the previous analysis that bats are a sister group to odd-toed ungulates, whereas the result based on amino acid data supported that bats are a sister group to the fereuungulata (carnivores + perissodactyla + cetartiodactyla). to account for the tree discordance among loci, a coalescent method was applied. coalescent trees were highly consistent with the result inferred from amino acid data using the partitioned method ( supplementary fig.7). to dissect the phylogenetic signal, eight previously published phylogenetic hypotheses ( supplementary fig.8) were proposed (waddell, et al. 1999; murphy, et al. 2001; nishihara, et al. 2006; prasad, et al. 2008; lindblad-toh, et al. 2011; meredith, et al. 2011; mccormack, et al. 2012 table s7 , supplementary fig.9 ). the result is consistent after incorporating the data from the eulipotyphla group ( supplementary fig.10). we subsequently estimated the divergence times among these 14 mammalian species. the bat lineage seems to have diverged from fereuungulata around 81 million years ago, and the rhinolophoid bats seem to have diverged from the old world fruit bats around 68 million years ago. comparative genome analyses were carried out to assess the evolution and innovation within the rhinolophoid bats. we next determined the expansion and contraction of orthologous gene clusters during evolution. the result identified 48 significantly expanded and 65 significantly contracted gene families in the great leaf-nosed bat, and 46 significantly expanded and 54 significantly contracted gene families in the chinese rufous horseshoe bat (fig. 2). functional annotation showed that the contracted gene families included many olfactory receptor gene families in both rhinolophoid bat lineages (supplementary table s8), which is consistent with the result that the olfactory system is aberrant in some echolocating bats. 
many of the expanded gene families in both rhinolophoid bats are significantly enriched in immune-related functional categories (supplementary table s9). moreover, we identified 577, 453 and 182 positively selected genes in the great leaf-nosed bat, the chinese rufous horseshoe bat and the large flying fox, respectively (supplementary tables s10, s11 and s12). olfaction is of great importance in the lives of bats. many bats can use olfaction for mother-pup recognition, finding food and avoiding danger. in old world fruit bats, olfaction appears to be of particular importance, and fruit bats can find food from scent cues. animals that rely heavily on the sense of smell tend to have large numbers of or genes, while species that rely on other senses have fewer functional or genes (niimura and nei 2007). it has been suggested that bats display diverse olfactory abilities. in order to describe the diversity of bat or gene repertoires, we identified the entire set of or genes of four bat species (supplementary methods, supplementary table s13). in line with previous work (hayden, et al. 2014), we observed that echolocating bats have a lower fraction of or pseudogenes (18% for the great leaf-nosed bat, 16% for the chinese rufous horseshoe bat and 14% for the little brown bat) than non-echolocating bats (26% for the large flying fox). however, further analysis showed that the large flying fox and the little brown bat have more than 400 intact or genes, while the two rhinolophoid bats have fewer than 300 intact or genes. this finding is consistent with the result that rhinolophoid bats have a relatively smaller olfactory epithelium than the frugivorous pteropodidae (neuweiler 2000). next, we reconstructed a protein neighbor-joining tree of all newly identified intact or genes in bats (fig. 3a). 
it is obvious that or genes can be classified into two distinct classes based on sequence similarity: class i, postulated to bind to water-borne molecules, and class ii, hypothesized to bind to airborne molecules. the exact number of or genes in each class/or family is shown (supplementary table s14, table s15). it seems that the four bat species contain a similar number of or genes in class i, while or gene contraction occurred in the two rhinolophoid bats in class ii. previous works have documented that the number of or genes varies extensively among mammalian species, and extensive gains and losses of or genes have been observed (niimura and nei 2007). to further understand the evolutionary changes of or gene repertoires, we estimated the gains and losses of the or genes in a diverse range of mammals (supplementary methods). evolutionary changes in the number of or genes in mammals are shown in fig. 3b. an extensive or gene contraction event clearly occurred on the branch leading to the common ancestor of bats. further extensive gene contractions can be observed in the branch leading to the rhinolophoid bats. this finding also suggests massive "birth-and-death" of or genes in the bat species. table s18 ). since a high omega may be due to a stochastic effect caused by an extremely small sample size, we removed the genes with an omega value of 999. the result remained stable: more positively selected genes were detected in the branches leading to echolocating bats (12 genes, great leaf-nosed bat, p = 9.1e-5; 10 genes, chinese rufous horseshoe bat, p = 8.8e-3; 10 genes, little brown bat, p = 0.011). next, the branch model (two-ratio model) was applied in an attempt to detect genes with accelerated evolution in the bat species. the result further indicated that more hearing-related genes have higher ɷ values on the branches leading to echolocating bats than on all other lineages (supplementary table s19). 
clade model c implemented in paml was employed (weadick and chang 2012), and the result again showed that more positively selected genes were detected in the branches leading to echolocating bats (supplementary table s20). moreover, a significant association between the average number of non-synonymous substitutions for all the hearing-related genes leading to each mammalian species and the estimated frequency of best hearing sensitivity for that species (r = 0.84, p = 0.00032, fig. 4) was observed. no significant correlation between such hearing frequencies and the number of synonymous changes was observed (p = 0.132). a significant association between the number of non-synonymous changes between sister taxa was observed (r = 0.67, p = 0.006). it is obvious that echolocating bats have typically undergone many more non-synonymous changes in the hearing-related genes than non-echolocating mammals. these results indicated that the evolution of ultrasonic hearing in the rhinolophoid bats has involved adaptive amino acid replacements in the hearing-related genes, which may have conferred greater auditory sensitivity to ultrasonic frequencies. previous works have documented that seven hearing-related genes underwent convergent evolution in echolocators (li, et al. 2008; liu, cotton, et al. 2010; davies, et al. 2012; shen, et al. 2012). here, genome-wide signatures of convergent evolution were examined in laryngeal echolocating bats. in addition to the seven previously reported hearing-related genes, we identified a total of 10 genes in the perception of sound category containing potential sequence convergent loci (site-wise convergence posterior probabilities > 0.5). to confirm our result, we amplified and sequenced these 10 hearing-related genes from another two echolocating bats (eptesicus fuscus and miniopterus natalensis). 
the result also showed that these 10 genes have higher convergence probabilities in echolocating bats from a wider range of taxa, and the convergence probabilities between branches were significant based on simulations (supplementary table s21). however, maximum likelihood trees recovered the topology in which all laryngeal echolocating bats formed a monophyletic clade for only four genes (col1a1, icam1, bsnd and strc, supplementary fig.11). further analyses showed that echolocating bats are paraphyletic based on synonymous substitutions, whereas the non-synonymous trees revealed monophyly of laryngeal echolocators for only one hearing-related gene (strc, supplementary fig.12). next, multivariate analysis of protein polymorphism (mapp) was employed to detect the physicochemical impact of convergent substitutions in echolocating bats. mapp scores were estimated for the amino acid variants in the strc gene, and the result showed that these replacements had important functional effects (mapp score = 18.61, p = 1.44e-4 for h28q; mapp score = 10.33, p = 3.98e-3 for a39t; mapp score = 7.37, p = 2.27e-2 for v169i). we further measured the number of sites with convergent amino acid substitutions along the branches as a direct measurement of sequence convergence, and found that the number of convergent sites in the branch pairs is proportional to the number of divergent sites ( supplementary fig.13). the number of convergent sites in the laryngeal echolocating bats does not significantly exceed that between the branch pair of the little brown bat and the large flying fox, given their numbers of divergent sites (supplementary table s22). no significant difference was observed in the total number of sites that have experienced convergent substitutions in hearing-related genes. 
this result indicated that there is no exceptional genomic signature indicative of adaptive convergence between laryngeal echolocating bats, and adaptive convergent substitutions might be confined to a few specific genes. bats are nocturnal mammals. the eyes of most echolocating bats are relatively small and poorly developed, whereas old world fruit bats often have excellent eyesight. rhinolophoid bats have the most sophisticated echolocation ability, and it has been proposed that some genes involved in visual perception may have undergone relaxed selection (zhao, et al. 2009). we next examined the molecular basis for the poor visual perception in the echolocating bats. bats have long been regarded as important reservoir hosts of emerging viruses (calisher, et al. 2006). to examine population dynamics and understand evolutionary processes, we sampled 20 great leaf-nosed bats from 4 major distributed locations in china, including one group from a high-altitude region (fig. 6a, table s28) are located at the intergenic regions. in order to resolve their phylogenetic relationships, we constructed a neighbor-joining (nj) tree based on pairwise genetic distances (fig. 6b). this result showed that the great leaf-nosed bats formed separate groups according to the different locations. principal component analysis clearly divided these samples into four groups (dq, gz, jx and tw, fig. 6c). these results suggested that there was significant population structure among the great leaf-nosed bat populations. furthermore, we performed population structure analysis. when k=4, all four populations were clearly separated (fig. 6d). next, we measured the genetic diversity values (θπ) of the four populations, and found similar sequence diversity values (dq: 0.0012, gz: 0.0009, jx: 0.0009 and tw: 0.0011, supplementary fig. s14). 
we further calculated the population differentiation statistic (fst) between populations, and the result showed little differentiation among populations (fst ranging from 0.013 between jx and tw to 0.057 between tw and dq, supplementary table s29), which suggests universal inter-region gene flow. since the method of population differentiation has been widely used to detect selective sweeps (akey, et al. 2010; axelsson, et al. 2013; gou, et al. 2014 table s31 ). the result showed that genes related to catabolic processes are likely to have been targets of recent positive selection. interestingly, we found that five genes (epas1, plxnd1, gja1, sell and chdh) belong to hypoxia response-related go categories (pugh and ratcliffe 2003; storz and moriyama 2008), including 'angiogenesis', 'blood coagulation', 'blood vessel morphogenesis' and 'oxidoreductase activity'. epas1 can respond to changes in the available oxygen in the cellular environment under high-altitude conditions. our work suggested that epas1 was involved in a selective sweep during the move of bats from low to high altitude. although hypoxia go categories are not over-represented, these hypoxia-related genes give a clue that genetic adaptations might be associated with high altitude. using deep sequencing and de novo assembly, we generated two genomes of rhinolophoid bats. rhinolophoid bats can perceive the world by using a wide range of sensory mechanisms, some of which have become highly specialized. these genome data provide useful resources to decipher the molecular adaptations of phenotypic traits. rhinolophoid bats arguably possess the most sophisticated echolocation systems, and can emit relatively long calls adapted to detect and classify the wing beats of insects. they are heavily reliant on hearing for a variety of ecologically important roles. previous works have documented that hearing-related genes are predominantly evolutionarily conserved in mammals (kirwan, et al. 
2013). here, we found evidence that some hearing-related genes have undergone darwinian selection associated with the evolution of specialized constant frequency echolocation. positive selection acting on hearing-related genes in rhinolophoid bats might result from the extreme selectivity used in auditory processing by these bats. many previous works have reported the sequence convergence of some hearing-related genes reuniting echolocating bats (li, et al. 2008; liu, et al. 2011; davies, et al. 2012). we found no genome-wide sequence convergence for echolocation, indicating that erroneous phylogenetic groupings are still rare. it has been suggested that the enlargement of one area of the brain might be associated with a reduction in the size of other brain areas (harvey and krebs 1990). the auditory cortex and the inferior colliculus are extremely enlarged in volume in laryngeal echolocating bats (especially in rhinolophoid bats), while visual brain areas are relatively enlarged in old world fruit bats (dechmann and safi 2009). a trade-off in investment in brain tissues has been proposed because of the extreme energetic demands imposed by neural processing. our result showed that more visual perception genes have become pseudogenes in rhinolophoid bats, and it is reasonable to speculate that some visual perception genes may have undergone relaxed natural selection in echolocating bats. meanwhile, positive selection acting on some hearing-related genes was identified. such concordance suggests that some genes are impacted by natural selection, which raises the possibility that changes in some sensory genes will have direct consequences for genes controlling other sensory modalities, perhaps via trade-offs. this finding supports the longstanding but weakly supported assumption that bats are experiencing a trade-off between vision and audition. olfaction is of great importance in the lives of bat species. 
previous works have identified the olfactory receptor (or) gene repertoire in the little brown bat and the large flying fox using the profile hidden markov model (hayden, et al. 2010; hayden, et al. 2014 in specific gene family. a possible explanation is that the little brown bat has no well-developed olfactory ability, but tends to recognize specific odorants after recent or gene duplication. these comparative analyses have provided great insights into adaptation to their specialized sensory mechanisms. in this work, we re-sequenced the genomes of 20 great leaf-nosed bats from four distributed locations. the genome re-sequencing analysis was performed with the following aims: 1) to characterize the genetic diversity and patterns of evolution; 2) to understand the genetic bases of adaptation to high altitude in the great leaf-nosed bats. conservation efforts will benefit from knowledge of the population genetic structure of the great leaf-nosed bats. here, we found very little differentiation among populations, which suggests universal inter-region gene flow or incomplete lineage sorting. a broader geographical scale analysis is needed in the future. furthermore, we provided evidence of genetic adaptation in the great leaf-nosed bat that is associated with high altitude. selective sweep mapping was conducted for populations from different altitudes, and identified several hypoxia-related genes with a high extent of differentiation on the genome scale. epas1 is a transcription factor that responds to changes in the available oxygen in the cellular environment under high-altitude conditions, and mutations at epas1 are tightly associated with hematologic phenotypes (van patot and gassmann 2011). previous works have documented that epas1 polymorphisms are associated with lower hemoglobin concentrations in tibetan people (beall, et al. 2010). a loss-of-function role of epas1 might exist in high-altitude adaptation. 
so, our result indicated potential high-altitude hypoxia adaptation mechanisms of the great leaf-nosed bat. our work is based on a limited genome re-sequencing resource, and data from more samples are necessary for future work. however, false positives notwithstanding, the results provided valuable starting points for experimental follow-up, and suggested an initial evolutionary scenario of bats in adaptation to high-altitude hypoxia. to the best of our knowledge, this is the first report of de novo assembled genomes and genome re-sequencing of bats with long constant frequency echolocation calls. these data are essential for us to understand the evolution of bats. tracking footprints of artificial selection in the dog genome the genomic signature of dog domestication reveals adaptation to a starch-rich diet natural selection on epas1 (hif2alpha) associated with low hemoglobin concentration in tibetan highlanders prediction of complete gene structures in human genomic dna allpaths: de novo assembly of whole-genome shotgun microreads bats: important reservoir hosts of emerging viruses parallel signatures of sequence evolution among hearing genes in echolocating mammals: an emerging model of genetic convergence cafe: a computational tool for the study of gene family evolution comparative studies of brain evolution: a critical insight from the chiroptera evolution of olfactory receptor genes in primates dominated by birth-and-death process muscle: multiple sequence alignment with high accuracy and high throughput isolation and characterization of a bat sars-like coronavirus that uses the ace2 receptor whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia improving the arabidopsis genome annotation using maximal transcript alignment assemblies comparing brains ecological adaptation determines functional mammalian olfactory subgenomes a cluster of olfactory receptor genes linked to frugivory in bats the 
evolution of echolocation in bats repbase update, a database of eukaryotic repetitive elements mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform a phylomedicine approach to understanding the evolution of auditory sensory perception and disease in mammals the hearing gene prestin reunites echolocating bats treefam: a curated database of phylogenetic trees of animal gene families the hearing gene prestin unites echolocating bats and whales a high-resolution map of human evolutionary constraint using 29 mammals a maximum pseudo-likelihood approach for estimating species trees under the coalescent model convergent sequence evolution between echolocating bats and dolphins the voltage-gated potassium channel subfamily kqt member 4 (kcnq4) displays parallel evolution in echolocating bats parallel evolution of kcnq4 in echolocating bats parallel adaptive radiations in two major clades of placental mammals ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis impacts of the cretaceous terrestrial revolution and kpg extinction on mammal diversification interpro and interproscan: tools for protein sequence classification and comparison resolution of the early placental mammal radiation using bayesian phylogenetics using genomic data to unravel the root of the placental mammal phylogeny evolutionary change of the numbers of homeobox genes in bilateral animals the biology of bats extensive gains and losses of olfactory receptor genes in mammalian evolution pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions cegma: a pipeline to accurately annotate core genes in eukaryotic genomes confirming the phylogeny of mammals by use of large comparative sequence data sets regulation of angiogenesis by hypoxia: role of the hif system complete mitochondrial genome of a neotropical fruit bat, artibeus jamaicensis, and a new 
hypothesis of the relationships of bats to other eutherian mammals human gene-centric databases at the weizmann institute of science: genecards, udb, crow 21 and horde from spatial orientation to food acquisition in echolocating bats genblasta: enabling blast to identify homologous gene sequences parallel evolution of auditory genes for echolocation in bats and toothed whales parallel and convergent evolution of the dim-light vision gene rh1 in bats (order: chiroptera) consel: for assessing the confidence of phylogenetic tree selection resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model raxml version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies gene prediction with a hidden markov model and a new intron submodel physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity mechanisms of hemoglobin adaptation to high altitude hypoxia estimation of individual admixture: analytical and study design considerations using repeatmasker to identify repetitive elements in genomic sequences hear, hear: the convergent evolution of echolocation in bats? 
a molecular phylogeny for bats illuminates biogeography and the fossil record differential analysis of gene regulation at transcript resolution with rna-seq tophat: discovering splice junctions with rna-seq hypoxia: adapting to high altitude by mutating epas-1, the gene encoding hif-2alpha towards resolving the interordinal relationships of placental mammals an improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes paml 4: phylogenetic analysis by maximum likelihood comparative analysis of bat genomes provides insight into the evolution of flight and immunity evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level the evolution of color vision in nocturnal mammals this project is supported by the key construction program of the national '985' project of east china normal university to dong dong (79633006), and the national natural science foundation of china (no. 31570382) to shuyi zhang. we thank shanghai majorbio bio-pharm biotechnology co., ltd. for genome sequencing and dr. chao-hung lee for providing valuable advice. dd designed the study, and dd, ml, ph, yp, sm, gz, ep, kl and sz carried out the data analysis. dd wrote the manuscript. all authors read and approved the final manuscript. the authors declare no competing financial interests. key: cord-301546-yck1t3pp authors: kozaki, toshinori; kimmelblatt, brian a.; hamm, ronda l.; scott, jeffrey g. title: comparison of two acetylcholinesterase gene cdnas of the lesser mealworm, alphitobius diaperinus, in insecticide susceptible and resistant strains date: 2007-12-28 journal: arch insect biochem physiol doi: 10.1002/arch.20229 sha: doc_id: 301546 cord_uid: yck1t3pp two cdnas encoding different acetylcholinesterase (ache) genes (adace1 and adace2) were sequenced and analyzed from the lesser mealworm, alphitobius diaperinus. 
both adace1 and adace2 were highly similar (95 and 93% amino acid identity, respectively) with the ace genes of tribolium castaneum. both adace1 and adace2 have the conserved residues characteristic of ache (catalytic triad, intra‐disulfide bonds, and so on). partial cdna sequences of the alphitobius ace genes were compared between two tetrachlorvinphos resistant (kennebec and waycross) and one susceptible strain of beetles. several single nucleotide polymorphisms (snps) were detected, but only one non‐synonymous mutation was found (a271s in adace2). no snps were exclusively found in the resistant strains, the a271s mutation does not correspond to any mutations previously reported to alter sensitivity of ache to organophosphates or carbamates, and the a271s was found only as a heterozygote in one individual from one of the resistant a. diaperinus strains. this suggests that tetrachlorvinphos resistance in the kennebec and waycross strains of a. diaperinus is not due to mutations in either ache gene. the sequences of adace1 and adace2 provide new information about the evolution of these important genes in insects. arch insect biochem physiol. © 2007 wiley‐liss, inc. the lesser mealworm, alphitobius diaperinus, is a manure-breeding beetle that is the primary structural pest of the poultry industry in the united states (axtell, 1999; hinton and moon, 2003) . the lesser mealworm is also a reservoir of salmonella typhimurium, escherichia coli, tapeworms, avian leucosis virus, turkey coronavirus, turkey enterovirus (avincini and ueta, 1990; axtell and arends, 1990; despins et al., 1994; goodwin and waltman, 1996; mcallister et al., 1996; watson et al., 2000) , and may serve as a source of campylobacter contamination of poultry (bates et al., 2004) . high beetle populations consume significant amounts of bird feed (savage, 1992) . under dry conditions in the broiler house, beetles bite the skin of birds resting at night. 
to prevent these bites, birds will rest for short periods and then move (despins et al., 1987; vaughan and turner, 1984). this can affect the weight gain of chicks. organophosphate and carbamate insecticides have served as effective tools for control of the lesser mealworm, and tetrachlorvinphos continues to be used for this purpose in the united states. however, a recent study indicated that there were some populations of lesser mealworm in which a substantial portion of the population was highly resistant to tetrachlorvinphos (hamm et al., 2006). the mechanism responsible for this resistance has not been determined. organophosphate and carbamate insecticides exert their toxic effects via inhibition of acetylcholinesterase (ache). recent studies have discovered that some insect species have a single ace gene (drosophila melanogaster and musca domestica), while many other species have two ace genes (culex pipiens, bombyx mori, myzus persicae, among others). beetles (coleoptera) are the most evolutionarily successful metazoans, contributing 25% of all known animal species, far more than any other taxonomic order. despite the diversity and economic importance of coleoptera, ace genes have been reported from only two beetles: leptinotarsa decemlineata (say) (zhu et al., 1996) and tribolium castaneum (http://www.hgsc.bcm.tmc/edu/projects/tribolium/). evidence has accumulated that indicates a limited number of mutations in ace are associated with resistance (i.e., mutations that code for an organophosphate and/or carbamate insensitive ache) in insects (fournier, 2005; kono and tomita, 2006; oh et al., 2006). in this study we examined whether there were one or two ace genes in a. diaperinus, and whether mutations in ace could be correlated with tetrachlorvinphos resistance.
fig. 1. the nucleotide and deduced amino acid sequences of adace1 (drosophila ace orthologous) cdna (accession no. eu086056). the residues that make intra-disulfide bonds are marked with *, oxianion hole with §, catalytic triad with +, acyl binding site with ¶, and anionic subsite with ‡. the locations of the primers used for genotyping are indicated as **.
archives of insect biochemistry and physiology march 2008
three strains of lesser mealworm were used. the denmark-s (susceptible) strain was obtained from saturnia (bjerringbrovej 48 2610 rødovre, denmark). two strains (from kennebec, co., me, and waycross, ga) that contain high proportions of tetrachlorvinphos resistant individuals (hamm et al., 2006) were also used. a. diaperinus colonies were maintained at 28°c with 60-70% rh, and provided a diet of cracked corn:wheat bran (95:5) ad libitum. adult beetles from the kennebec and waycross strains were exposed to tetrachlorvinphos using a residual contact bioassay method as described previously (hamm et al., 2006). beetles that survived exposure to a concentration of tetrachlorvinphos that was 350-fold greater than the susceptible strain lc 99 for 48 hr (i.e. resistant individuals) were used in genotyping. five denmark-s adult beetles (84 mg) were used to isolate mrna using a quickprep micro mrna purification kit (ge healthcare, waukesha, wi), and cdna was synthesized with 500 ng of mrna using superscript iii reverse transcriptase (invitrogen, carlsbad, ca). a fragment of the ace cdna orthologous to drosophila ace (fournier et al., 1989) was amplified using the primers s11ace (taygar-tayttycciggitt), s1iace (garatgtggaaycci-aayac), and as17ace (ccicciccrtaiaycca). a fragment that encoded the ace-paralogous gene was amplified using dl3 (gciaciatgtgga aycciaa) and dr98 (ggytticcigtyttigcraa) with the following thermal cycler program: 95°c for 3 min, 35 cycles (95°c for 30 sec, 40°c for 30 sec, 72°c for 1.5 min) and a final extension at 72°c for 7 min. 
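the primers listed above are degenerate: letters such as r and y each stand for more than one base. the following is a minimal sketch of how such a primer can be expanded, assuming the standard iupac ambiguity codes and, as a simplification that is not stated in the paper, treating inosine (i) as matching any base. the function names are illustrative only.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes. Treating inosine ("i") as pairing
# with any base is a simplifying assumption made for this sketch.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt", "i": "acgt",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences a degenerate primer can represent."""
    count = 1
    for base in primer.lower():
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.lower()]
    return ["".join(combo) for combo in product(*choices)]

if __name__ == "__main__":
    # Short hypothetical primer: r = a/g and y = c/t give four sequences.
    print(expand("ry"))
    # dr98 from the text: two y, one r and three i positions.
    print(degeneracy("ggytticcigtyttigcraa"))
```

a high degeneracy is one reason a low annealing temperature, such as the 40°c step in the program above, is often paired with primers of this kind.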
we obtained a fragment of 151 bp for the drosophila ace orthologous gene and a fragment of 825 bp for the paralogous gene. gene specific primers for 5′-and 3′-race were designed based on these sequences. race was performed using the bd smart race cdna amplification kit (bd biosciences, mountain view, ca). to compare the ace alleles in the susceptible and tetrachlorvinphos-resistant beetles, the ace cdnas were sequenced from individual beetles. the mrna was prepared using a polyatract system 1000 (promega, madison, wi), and was concentrated with a microcon ym-100 (millipore, billerica, ma). one fourth of the mrna from a beetle was used in the rt-pcr (50°c for 30 sec, 94°c for 2 min, 40 cycles (95°c for 30 sec, 50°c 30 sec, and 72°c for 1 min) using the superscript iii one-step rt-pcr system and platinum taq (invitrogen). gene-specific primers, s54adace (aagctgcccaattcttgcta), s84adace (tct-acctcaacatctgggtgcctcagc), and as53adace (aagctagggccatccttttc) were used for the amplification of the drosophila ace-orthologous gene. primers s44adace (gctgaacaccaccac-catgc), s43adace (gacacggtgttcggggactt), and as51adace (gcgaactcgttgacgttaca) were used to amplify the drosophila ace paralogous gene. dna sequencing was performed with s84adace and as53adace for the drosophila ace orthologous gene and with s43adace and as51adace for the paralogous gene at the cornell biotechnology resource center. we obtained the nearly complete orfs for adace1 and adace2 (figs. 1 and 2) . both genes show a high similarity (95 and 93% amino acid identity, respectively) with the predicted ace genes of t. castaneum (xp_970774.1, xp_973462.1) . kyte-doolittle hydropathy plots indicated the c-terminal of both adace1 and adace2 were hydrophobic (data not shown), and thus potentially exchanged for glycolipids. 
the cdna sequence of adace1 (the drosophila ace orthologous gene, eu086056) was 2,123 bp, encoding 636 amino acid residues of an immature ache (fig. 1). the deduced amino acid sequence had the characteristic features of ache, including the residues for the intra-molecular disulphide bonds (c110(67)-c137(94), c312(254)-c329(265), c465(402)-c582(521)), catalytic triad (s260(200), e389(327), h503(440)), protein dimerisation (c602), anionic subsite (w127(86)), oxianion hole (g172(118), g173(119), a261(201)), and acyl binding site (w293(233), f352(290), f393(329)) (number in parentheses indicates the corresponding amino acid in torpedo ache). we could not unambiguously identify the translation start site because no stop codon was found in frame in the 5′ upstream region. if this transcript is similar in size to the ace gene in the colorado potato beetle, l. decemlineata (zhu and clark, 1995), it will be more than 13 kb in size. however, adace1 has an initiation codon that is identical to the one tentatively identified in l. decemlineata. given that adace1 does not have any of the mutations associated with organophosphate and/or carbamate resistance in drosophila (mutero et al., 1994), lucilia cuprina (chen et al., 2001), or m. domestica aches (kim et al., 2003; kozaki et al., 2001; walsh et al., 2001), we conclude that adace1 encodes an organophosphate-sensitive ache (characterized by m126, v205, g287, f352, and g390). this is consistent with the denmark-s strain being insecticide susceptible.
fig. 4. alignment of the deduced amino acid sequences from the drosophila ace orthologous genes in coleoptera. ad, tc, and ld represent a. diaperinus, t. castaneum, and l. decemlineata, respectively.
we sequenced 1,895 bp encoding 591 amino acids of adace2 (the d. melanogaster ace paralogous ache, eu086057) (fig. 2). as also found for adace1, the residues for the intramolecular disulphide bonds (c76(67)-c103(94), c275(254)-c295(265), c410(402)-c532(521)), catalytic triad (s208(200), e334(327), h448(440)), protein dimerisation (c558), anionic subsite (w93(86)), oxianion hole (g126(118), g127(119), a209(201)), and acyl binding site (w241(233), f298(290), f338(331)) were found in adace2. we were unable to complete the 5′-race for adace2, although we tried multiple variations of the protocol given by the manufacturer, including increasing or decreasing the cation concentration, increasing the viscosity of the reaction mix with bsa, and using an alternative cation (mn2+ in place of mg2+). the alignment of this gene with the drosophila ace paralogous aches showed that, as expected for an insecticide-susceptible strain, beetles from the denmark-s strain had an organophosphate and carbamate sensitive type. a partial cdna, covering the amino acid residues found to be responsible for insecticide resistance in other species, was sequenced from individual adults for both alphitobius ace genes to ascertain if resistance was due to a change in one or both genes. if resistance was due to a mutation in adace1 or adace2, all resistant individuals should have a unique allele (i.e., different from the susceptible strain). the drosophila ace orthologous gene, adace1, was sequenced from two susceptible denmark-s, four waycross (tetrachlorvinphos-resistant), and two kennebec (tetrachlorvinphos-resistant) adults. the deduced amino acid sequences from all individuals were the same. there were six synonymous polymorphisms detected (data not shown). the ace paralogous gene, adace2, was sequenced from three susceptible (denmark-s), five waycross (tetrachlorvinphos-resistant), and five kennebec (tetrachlorvinphos-resistant) adults. the sequences from all individuals were highly similar. 
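the genotyping comparison sorts each polymorphism into synonymous (no amino acid change) or non-synonymous (amino acid change). a minimal sketch of that classification follows, assuming the standard genetic code. the codons in the usage example are hypothetical, since the paper does not list the actual codons involved.

```python
# Standard genetic code, built in the conventional t, c, a, g order.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W"   # first base t
         "LLLLPPPPHHQQRRRR"   # first base c
         "IIIMTTTTNNKKSSRR"   # first base a
         "VVVVAAAADDEEGGGG")  # first base g
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon.lower()]
    alt_aa = CODON_TABLE[alt_codon.lower()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"

if __name__ == "__main__":
    # Hypothetical codons: gct and gcc both encode alanine.
    print(classify_snp("gct", "gcc"))
    # Hypothetical codons: gct (alanine) to tct (serine).
    print(classify_snp("gct", "tct"))
```

an alanine-to-serine change such as the second example is the kind of substitution that would be reported as non-synonymous, whereas the first would be counted among the synonymous polymorphisms.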
one of the denmark-s and one of the waycross beetles had an a271(261)s mutation (detected as a heterozygote in both individuals). there were an additional 10 synonymous polymorphisms identified (data not shown). given that neither adace1 nor adace2 differed between resistant and susceptible beetles, we conclude that the mechanism of tetrachlorvinphos resistance in these strains of a. diaperinus is not due to mutations in the ace genes (i.e., is not an altered acetylcholinesterase). alignments of the deduced amino acid sequences from the drosophila ace orthologous and paralogous genes in coleoptera are shown in figures 4 and 5, respectively. as expected, the ace orthologous sequences of the two tenebrionidae, a. diaperinus and t. castaneum, were more similar to each other than to that of leptinotarsa decemlineata (fig. 4). these three coleopteran sequences differed primarily at the n- and c-terminal regions, but showed expected conservation at most functionally important residues. similarly, the ace paralogous sequences from a. diaperinus and t. castaneum were highly similar, with the greatest number of differences found in the c-terminal region (fig. 5). the phylogenetic tree of the arthropod aches (fig. 3) shows there are two major groups, with the acari aches being intermediates. each group is further divided into subgroups, primarily by order. adace1 and adace2 clustered with the other coleoptera genes in their respective groups. this is consistent with the idea that beetles have two ace genes. the mutations related to organophosphate resistance were first studied in d. melanogaster and m. domestica. however, studies of drosophila ace orthologous genes failed to identify mutations responsible for organophosphate resistance in other species. subsequently, mutations in the drosophila ace paralogous genes were found to be associated with the resistance in some mosquitoes. the increasing number of insect genome sequences reveals that the ancestral condition, at least in pterygota, is two copies of ace. it also appears that mutations on the drosophila and musca ace paralogous genes are more important than the mutations on the drosophila orthologous genes, in terms of conferring organophosphate resistance, at least in many species.
fig. 5. alignments of the deduced amino acid sequences from the drosophila ace paralogous genes in coleoptera. ad and tc represent a. diaperinus and t. castaneum, respectively.
we thank d.a. rutz for the beetles and c. leichter, j. briddell, and c. reasor for technical assistance. the daljit s. and elaine sarkaria professorship and a research fellowship (to t.k.) from the japanese society for the promotion of science for young scientists supported this project.
manure breeding insects responsible for cestodiasis in caged layer hens
poultry integrated pest management: status and future
ecology and management of arthropod pests of poultry
relationship of campylobacter isolated from poultry and from darkling beetles in new zealand
the acetylcholinesterase gene and organophosphorus resistance in the australian sheep blowfly, lucilia cuprina
transmission of enteric pathogens of turkeys by darkling beetle larvae (alphitobius diaperinus)
construction profiles of high rise caged layer houses in association with insulation damage caused by the lesser mealworm, alphitobius diaperinus (panzer) in virginia
mutations of acetylcholinesterase which confer insecticide resistance in insect populations
drosophila melanogaster acetylcholinesterase gene structure, evolution and mutations
transmission of eimeria, viruses, and bacteria to chicks: darkling beetles (alphitobius diaperinus) as vectors of pathogens
resistance to cyfluthrin and tetrachlorvinphos in the lesser mealworm, alphitobius diaperinus, collected from the eastern united states
arthropod populations in highrise caged-layer houses after three manure cleanout treatments
cloning, mutagenesis, and expression of the acetylcholinesterase gene from a strain of musca domestica; the change from a drug-resistant to a sensitive enzyme
amino acid substitutions conferring insecticide insensitivity in ace-paralogous acetylcholinesterase
fenitroxon insensitive acetylcholinesterases of the housefly, musca domestica, associated with point mutations
reservoir competence of alphitobius diaperinus (coleoptera: tenebrionidae) for escherichia coli (enterobacteriales: enterobacteriaciae)
resistance-associated point mutations in insecticide-insensitive acetylcholinesterase
expression of the ace-paralogous acetylcholinesterase of culex tritaeniorhynchus with an amino acid substitution conferring insecticide insensitivity in baculovirus-insect cell system
reducing darkling beetles
residual and topical toxicity of various insecticides to the lesser mealworm (coleoptera: tenebrionidae)
identification and characterization of mutations in housefly (musca domestica) acetylcholinesterase involved in insecticide resistance
limited transmission of turkey coronavirus (tcv) in young turkeys by adult lesser mealworms, alphitobius diaperinus panzer (tenebrionidae)
comparisons of kinetic properties of acetylcholinesterase purified from azinphosmethyl-susceptible and resistant strains of colorado potato beetle
a point mutation of acetylcholinesterase associated with azinphosmethyl resistance and reduced fitness in colorado potato beetle
key: cord-314503-u1y1bznk authors: jaluria, pratik; konstantopoulos, konstantinos; betenbaugh, michael; shiloach, joseph title: a perspective on microarrays: current applications, pitfalls, and potential uses date: 2007-01-25 journal: microb cell fact doi: 10.1186/1475-2859-6-4 sha: doc_id: 314503 cord_uid: u1y1bznk with advances in robotics, computational capabilities, and the fabrication of high quality glass slides coinciding with increased genomic information being available on public databases, microarray technology is increasingly being used in laboratories around the world. in fact, fields as varied as toxicology, evolutionary biology, drug development and production, disease characterization, diagnostics development, cellular physiology and stress responses, and forensics have benefited from its use. however, for many researchers not familiar with microarrays, current articles and reviews often address neither the fundamental principles behind the technology nor the proper design of experiments. although microarray technology is conceptually simple, its practice requires careful planning and a detailed understanding of its inherent limitations. without these considerations, it can be exceedingly difficult to ascertain valuable information from microarray data. therefore, this text aims to outline key features in microarray technology, paying particular attention to current applications as outlined in recent publications, experimental design, statistical methods, and potential uses. furthermore, this review is not meant to be comprehensive, but rather substantive, highlighting important concepts and detailing steps necessary to conduct and interpret microarray experiments. collectively, the information included in this text will highlight the versatility of microarray technology and provide a glimpse of what the future may hold. 
although the principles behind microarray technology were conceived almost 20 years ago and developed from southern blotting, they did not gain widespread attention for nearly a decade, until researchers were first able to utilize high quality slides with precision robotics, resulting in reproducible results [1] [2] [3]. for instance, a quick pubmed search with the words 'microarray and 1995' results in 13 total articles, 5 of which are review articles. similar searches with the words 'microarray and 2000' and 'microarray and 2005' result in 288 total articles (78 review articles) and 3906 total articles (1037 review articles), respectively. despite this relative surge in microarray-related articles, few recent publications address core issues regarding design, implementation, and subsequent data analysis. in covering these and related issues, the present text aims to illustrate the strengths, weaknesses, and application of microarrays, especially to those unfamiliar with the technology. today's arrays are vastly superior to their predecessors in terms of quality, probe density, and structural layout [2, 3]. before dealing with these and other characteristics, it is important to discuss, at some length, what microarrays are as well as the fundamental concepts behind the technology. the term microarray is both descriptive and somewhat ambiguous as it is commonly used to describe a variety of platforms including protein microarrays and tissue microarrays [3, 4]. a microarray is typically defined as a collection of microscopic spots arranged in an array or grid-like format and attached to a solid surface or membrane, hence the term [4, 5]. these spots, typically referred to as probes, are designed such that each probe binds a specific nucleic acid sequence corresponding to a particular gene through a process termed hybridization [3]. 
the sequence bound to a probe, often referred to as the target, is labeled with some kind of detectable molecule or dye such as a fluorophore [4]. the level of binding between a probe and its target is quantified by measuring the fluorescence or signal emitted by the labeling dye when scanned. this signal, in turn, provides a measure of the expression of the specific gene containing the target sequence [2, 3]. although there are several different types of dna microarrays, for the purposes of this text only two will be considered: spotted microarrays and oligonucleotide microarrays [1]. details regarding these two platforms are highlighted in table 1. spotted microarrays are often referred to as dual-channel or two-color microarrays because two samples, each labeled with a different fluorophore, are hybridized onto a single slide [3, 6]. as a result of combining two samples onto a single slide, only relative expression levels can be determined using spotted arrays [1]. the probes in spotted arrays are oligonucleotides, complementary dna (cdna), or fragments of polymerase chain reaction (pcr) products, each type conferring different properties to the spotted array. despite these differences, all spotted arrays are similar in terms of array construction, target preparation, and data analysis [2, 7]. in contrast, oligonucleotide microarrays, also referred to as single-channel microarrays, are hybridized with only one sample and therefore generate absolute expression levels. these arrays utilize probes designed to complement mrna sequences and are produced using various methods including in situ synthesis, some type of deposition method, or photolithography [3, 4]. as alluded to earlier, two important elements of microarray technology are target preparation and probe construction. 
depending on the type of microarray being used, different cellular components can be used for target generation including: rna, genomic dna, cdna, complementary rna (crna), and pcr products [6, 7] . regardless of which of these are used, ensuring the quality, stability, and reproducibility of the generated targets is paramount for subsequent processing. similarly, probes can consist of any of the following: cdna, oligonucleotides, fragments of pcr products, restriction-enzyme digested fragments, oligomers, or expressed sequence tags (ests) [1, 2, 6] . irrespective of the exact composition of the probes, they all serve the same basic function; binding very specific sequences. although, probes are constructed in a variety of ways, depending on the type of array and the specific application, the same public databases are referenced for sequencing information [2, 5, 7] . typically, arrays are fabricated with duplicates of each probe, enhancing the likelihood of observing hybridization for each gene. a simple schematic of the entire process for a spotted cdna microarray experiment can be seen in figure 1a [8] . briefly, total rna, once isolated from a sample, is reverse transcribed to produce cdna, labeled with fluorescent dyes, and then hybridized onto the spotted array [5, 8] . hybridization is quantified using the intensities of the fluorescent dyes at particular wavelengths. by comparing fluorescence intensities, genes that are differentially expressed between the two samples can be identified, along with the direction of that difference (i.e. overexpression or under-expression relative to a control) [2, 3] . for example, figure 1b illustrates the significance of each color when the test sample is labeled with cy5 and the control sample is labeled with cy3. in this case, black represents no binding (i.e. 
no signal), green indicates greater binding of the control sample than of the test sample referred to as down-regulation, yellow indicates equal binding between the two samples, and red indicates greater binding of the test sample than of the control sample referred to as up-regulation. if the dyes are used in reverse (i.e. cy5 is used to label the control sample and cy3 is used to label the test sample) the colors would have the opposite representations [3, 7] . although a multitude of microarrays are commercially available, each designed for a specific species or general family of organisms; these arrays are limited by the information available in genomics databases [2, 9] . though, the genomes of only a few species have been entirely sequenced and made available to the public, microarrays for a large number of species are available [10, 11] . for instance, checking the website for affymetrix reveals genome-wide arrays are available for the following microbes: bacillus subtilis, escherichia coli, pseudomonas aeruginosa, members of the genus plasmodium, staphylococcus aureus, and members of the genus saccharomyces. small, custom arrays can be designed for many more species as long as genomic sequences are available for a particular organism or family of organisms [2, 9, 11] . continued genome exploration has resulted in the need for frequent updating and re-organization of spotted arrays. with more information constantly coming online, microarrays are continually refined to enhance reproducibility and detection levels of weak signals by modifying the positioning and sequences of the ests spotted [2, 10] . as previously mentioned, ests are essentially unique segments of cdna identical to a portion of a gene, thereby acting as binding domain. in addition, valuable information can still be ascertained by hybridizing samples onto arrays designed for other species [12, 13] . 
so, even without an entire genome being spotted onto commercially available arrays for a given species, microarray experiments can still yield important results. any discussion regarding microarray technology would be incomplete without a detailed examination of the various limitations and complexities inherently present. such a discussion is vital to properly conduct microarray experiments and analyze microarray data, overcoming technological limitations in the process. before conducting microarray experiments, the following questions need to be addressed: what are the goals of the experiment, what biological comparisons are most relevant to these goals, how should the experiments be designed and performed taking into account the various sources of variability, which platform should be used, what controls need to be in place, and how can the results be verified [14, 15]. in approaching these and other relevant questions, a great deal of information regarding microarray technology can be ascertained. to answer the first two questions regarding goals and relevant comparisons, a number of resources can be referenced. several organizations such as the microarray gene expression data (mged) society and the european bioinformatics institute (ebi) have established guidelines to aid researchers in the design and implementation of microarray experiments [8, 9, 16]. in general, narrowing the objectives of a microarray study can provide insight into which biological samples should be compared. clear and concise goals also help define the scope of the study, providing a framework within which subsequent experiments can be proposed and implemented. one of the most commonly cited proposals is the minimum information about a microarray experiment (miame), which includes a series of recommendations and standards on collecting and analyzing microarray data [16, 17]. 
this document was designed to allow data generated by microarray experiments to be interpreted and reproduced with certainty. in addition, repositories such as the gene expression omnibus (geo) created by the national center for biotechnology information (ncbi) and arrayexpress created by the ebi have been established to store and share gene expression data [16, 17] . array even extending to other experiments as long as the same reference sample is used [16, 17] . however, this setup can become costly if the goals of the experiment require multiple comparisons to be made. in contrast, the saturated design involves making every possible comparison exactly once [17] . this approach is balanced and simple to establish, however, it is not applicable to all conditions and is not appropriate when a series of experiments are planned. ultimately, the design selected must address the goals and requirements of the experiment being conducted. without these and other considerations, errors in analysis including the identification of false positives can result, masking underlying patterns and incorrectly deciphering biological behavior. there are multiple sources of variability such as differences in: arrays, dye labeling, efficiency in reverse transcription, and hybridization [10, 14] . some of these issues relate back to the actual production of arrays and how probes are prepared; elements of quality control on the part of the manufacturer. the remaining issues are best overcome by: incorporating replicates to generate statistical significance (i.e. averages and variance), performing dye-swapping experiments, and pooling samples to minimize biological variation [6, 7, 14] . both technical and biological replicates are commonly employed, each with a different purpose in mind. technical replicates aim to quantify procedural variations such as sample preparation and handling [16] . 
in contrast, biological replicates aim to identify variation in the biological system being studied [16] . similarly, dye swapping involves switching the dyes used for labeling in a manner that prevents one type of sample from being labeled by a single dye. this setup helps account for the dye effect; an important systematic error that stems from differences in the properties of the dyes. the pooling of samples also reduces inherent variation in biological samples while at the same time generating sufficient sample quantities for subsequent processing [17] . as discussed earlier, there are two main platforms to consider when designing microarray experiments; spotted microarrays and oligonucleotide microarrays. the advantages and disadvantages of each are outlined in table 1 along with examples of when a particular platform is most beneficial [2, 3] . for example, oligonucleotide microarrays are ideal for time-course experiments because each array is hybridized with only one sample, allowing any array to be compared to any other array. this translates into requiring a smaller number of total samples for the same number of duplicates while at the same time more accurately representing the control for a given condition. similarly, for static conditions in which a basic comparison between treated and untreated cell populations is needed, dualchannel microarrays may be the best fit. each application has its own set of criteria that should be carefully evaluated to determine the best platform to use [1, 15] . for instance, if specific genes are to be investigated, it should be verified that the platform includes those particular genes with the desired number of replicates. a simple search online will reveal a multitude of companies that manufacture microarrays and allow customers to construct their own custom arrays using specialized software. in addressing the various sources of error, systematic or otherwise, proper controls need to be implemented. 
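the dye-swap idea described above can be made concrete with a small calculation. this is a minimal sketch, assuming the dye effect acts as an additive bias on the log2 ratio that is identical on both arrays of a swapped pair. that additive model and the function names are illustrative assumptions, not a method given in the review.

```python
import math

def log2_ratio(cy5: float, cy3: float) -> float:
    """log2 of the cy5 (red) to cy3 (green) intensity for one spot."""
    return math.log2(cy5 / cy3)

def dye_swap_estimate(forward: float, swapped: float) -> tuple[float, float]:
    """Separate the biological signal from the dye bias.

    forward: log2(cy5/cy3) with the test sample labeled cy5.
    swapped: log2(cy5/cy3) with the test sample labeled cy3.
    Under the assumed model, forward = signal + bias and
    swapped = -signal + bias.
    """
    signal = (forward - swapped) / 2
    bias = (forward + swapped) / 2
    return signal, bias

if __name__ == "__main__":
    # Made-up intensities for a single gene on a dye-swapped pair.
    forward = log2_ratio(cy5=800.0, cy3=200.0)   # test in cy5
    swapped = log2_ratio(cy5=200.0, cy3=400.0)   # test in cy3
    signal, bias = dye_swap_estimate(forward, swapped)
    print(signal, bias)
```

because the bias term has the same sign on both arrays while the biological signal flips sign, halving the difference removes the bias from the estimate.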
there are two types of controls, as they pertain to microarray technology; internal controls and external controls [1, 18] . internal controls check for the quality of the printed microarray whereas external controls account for performance in terms of sensitivity and robustness. the internal controls often used include: hybridization controls, poly-a controls, normalization control sets, and housekeeping genes [10, 18] . each type of control is commonly found in commercially available arrays and serves a distinct function relating to one specific aspect of microarray processing. in addition, samples can be spiked with particular agents to isolate or quantify detection limits, non-specific noise, and similar parameters [17] . similarly, a number of approaches can be taken to minimize external variables such as discrepancies in: growing and preparing biological samples, isolating and purifying rna, cell synchronization, hybridization protocols, and target preparation [14, 19] . in general, standardizing procedures can greatly reduce these errors introduced during the course of the experiment. although, the preparation of control samples used in a microarray experiment is typically not critical, the samples must be stable throughout the experiment and be reproducible. to verify the quality of purified rna and/or cdna gel electrophoresis and/or spectrophotometry should be used. with regards to cell synchronization, whole-culture methods such as serum starvation (a method in which cells are deprived of animal serum, a commonly used media supplement, to direct cells towards quiescence) and dna arrest (a general method of using chemical or pharmacological agents to prevent one or more phases of dna replication, suspending cells in a particular stage of the cell cycle) are typically used [20, 21] . 
however, selective methods such as mitotic shake-off, a method that involves shaking a flask or plate to remove cells undergoing mitosis because these cells are loosely attached, have also been used due to questions about the validity of whole-culture methods [20, 21] . whatever synchronization method is used should be applied to all of the biological samples to ensure a valid comparison is being made.
typically, to validate microarray results, any one of a number of techniques such as rt-pcr, northern blotting, western blotting, and even the use of multiple microarray platforms can be employed [10, 14, 18] . figure 2 illustrates which of these methods are relevant to which aspects of microarray analysis. as mentioned earlier, verification is critical in order to assign distinct expression patterns to specific genes with certainty (i.e. statistical significance) because of the inherent variability present in microarray data. since a single microarray experiment evaluates the expression levels of tens of thousands of genes simultaneously, it would be extremely impractical to verify each and every gene using any of the methods listed above. instead, a number of key genes are typically verified depending on the purpose and scope of the experiment [2, 5] . in addition, not every gene can be assayed using each verification method because the necessary components, such as monoclonal antibodies for western blotting or labeled primers for rt-pcr, may not be available. as a result, multiple methods are often used to verify the results of microarray experiments.
figure 2. common steps employed to ensure quality and validity of microarray results. from a quality control standpoint, replicates should be performed using rna samples prepared at the same time under the same conditions. various features of the arrays being used should also be known, especially the controls. to verify the results generated from microarray experiments, a combinatorial approach is usually needed: checking the statistical significance associated with the expression levels of specific genes, reviewing the literature, and conducting additional experiments such as rt-pcr or northern blotting.
as described previously, expression levels for a given gene are determined using intensity values. one distinction between dual-channel microarrays and single-channel microarrays is that the former generates relative expression levels whereas the latter generates absolute expression levels [1, 6] . this distinction stems from the fact that dual-channel microarrays are hybridized with two different samples: one considered the test sample and the other considered the control sample. as a result, the expression level determined for a specific spot or gene is dependent upon both samples and is a ratio of the form:
$$\text{expression level} = \frac{\text{test sample intensity}}{\text{control sample intensity}}$$
therefore, the expression level for a gene in a dual-channel microarray is relative, not absolute. in contrast, single-channel arrays are hybridized with only one sample and therefore the expression level for a given gene is absolute [2, 4] .
once a scanned image for a hybridized microarray has been generated, visual inspection of the data can proceed, prior to normalization. this entails using imaging software to exclude specific spots with poor signaling and adjust the size/shape of grids that encompass the spots [10, 22] . next, normalization procedures can be applied to the data. essentially, normalization accounts for differences in labeling efficiencies and detection levels for the fluorescent dyes as well as differences in the quantity/quality of rna samples [4, 10, 23] . as such, normalization can be thought of as the first level of filtering applied to the data. advanced statistical software packages offered by companies such as partek and acuity are commonly used. private research institutes such as the institute for genomic research (tigr) and the sanger institute along with academic facilities around the world also provide free software for microarray analysis [22] .
although a number of normalization techniques can be applied to microarray data, the most commonly used are total intensity, regression, and ratio statistics [18, 23, 24] . all three of these techniques assume that for some group of genes on the array, the average expression ratio is equal to one [10] . total intensity normalization assumes both samples (test and control) are comprised of the same amount of rna and the total amount of rna hybridized from each sample is the same. therefore, the total intensity calculated from all the spots on an array should be the same for both fluorescent dyes (channels) [5, 10] . conversely, normalization using regression presumes that a significant number of genes are expressed to the same extent in both samples, a reasonable assumption for samples that are fairly similar [22] . if the labeling and detection efficiencies for the two samples were equivalent, then the slope of the plot shown in figure 3 would be one [10, 22] . figure 3 was constructed from unnormalized data obtained from a single, spotted cdna array. two different samples were hybridized onto the array, each labeled with a different dye. the graph illustrates inherent differences between the dyes in terms of labeling and detection efficiencies due to the characteristics of each dye such as stability. using regression techniques, the best-fit slope is calculated and modified to be equal to one by adjusting gene intensities. lastly, normalization using ratio statistics assumes that there exists some subset of genes with the same expression levels in both samples [10, 23] . these housekeeping genes, as they are often referred to, are used to calculate probability densities which in turn allow the mean expression ratio to be adjusted to one. each of these techniques calculates a normalization factor that is then used to scale the data, accounting for the variations previously mentioned [4, 9, 10] .
figure 3. scatter plot of measured intensities for both fluorescent dyes on a log-log scale prior to normalization. the measured intensities are in arbitrary units. each point in the graph represents a single spot on a hybridized microarray. in addition, the red line shown is the best-fit line calculated for the data with a slope that is close to unity.
following normalization, the data can be probed using a host of statistical techniques that evaluate and ultimately decipher microarray data. for the purposes of this text, only two types will be touched upon briefly: clustering and hypothesis testing [6] . in general, both types of statistical methods strive to categorize, shape, and illuminate underlying patterns and therefore can be very useful in analyzing microarray data [23, 25] . however, both methods rely on different underlying principles and assumptions that directly influence their employment. clustering algorithms rely on calculating some kind of 'distance metric' to position gene expression levels into a matrix of sorts with a certain level of commonality [10] . both differentially expressed genes and groups of genes with similar expression patterns can be highlighted using clustering techniques. the most widely used clustering algorithms include hierarchical clustering, self-organizing maps (soms), k-means, and principal component analysis (pca) [8, 10, 18] . the mathematical formulations behind each of these methods are too complex and lengthy to be dealt with here, and so for the sake of brevity, very basic information will be covered in this text with a strong recommendation to consult specific references [10, 18, 23] .
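before turning to the individual clustering algorithms, it may help to make the normalization step described above concrete. the following is a minimal sketch, not the method of any particular software package: it computes a total intensity factor and a regression slope for a dual-channel array and rescales the test channel so that the factor becomes one. the function names, the spot intensities, and the choice of a best-fit line through the origin on a linear scale are all assumptions made for illustration.

```python
# minimal sketch of two dual-channel normalization strategies (illustrative only)


def total_intensity_factor(test, control):
    """Ratio of the summed test intensity to the summed control intensity."""
    return sum(test) / sum(control)


def regression_slope(test, control):
    """Least-squares slope of test vs. control for a line through the origin.

    Fitting through the origin on a linear scale is an assumption of this sketch.
    """
    numerator = sum(c * t for c, t in zip(control, test))
    denominator = sum(c * c for c in control)
    return numerator / denominator


def normalize(test, factor):
    """Rescale the test channel so that the chosen factor becomes one."""
    return [t / factor for t in test]


# hypothetical spot intensities in arbitrary units; the test dye reads brighter
control = [100.0, 200.0, 400.0, 800.0]
test = [150.0, 300.0, 600.0, 1200.0]

factor = total_intensity_factor(test, control)
slope = regression_slope(test, control)
ratios = [t / c for t, c in zip(normalize(test, factor), control)]
print(factor, slope, ratios)
```

in this toy data set every test intensity is 1.5 times its control counterpart, so both approaches yield the same factor and the normalized expression ratios all equal one. real arrays rarely behave this cleanly, which is one reason the choice of technique matters.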
typically, these and other clustering algorithms are used to create a more accurate and meaningful interpretation of the data. figure 4 illustrates how four different algorithms (when applied to the same data set) can generate vastly different groupings, each providing a different perspective on patterns present in the data. the data shown in figure 4 were obtained by hybridizing human cell lines grown under varying conditions onto cdna microarrays. by applying clustering algorithms in sequence, one after another, synergy is possible, lessening the shortcomings in each individual method. for example, it is common to apply pca to data prior to analysis with either k-means clustering or soms in order to generate an estimate for the number of clusters to be formed [10] . hierarchical clustering, quite possibly the most commonly used clustering algorithm, links every gene in an array to every other gene through a series of expanding brackets that collectively form a dendrogram [22] . genes deemed to be closely associated, referring back to the concept of a distance metric which in fact can be computed using several different statistical frameworks, are connected by a node [18, 23] . each node links to other nodes of various sizes, in a repetitive process until every possible pair of genes is linked, as illustrated in figure 4b . this type of clustering is popular due to its simplicity and ability to visualize the data. however, the statistical framework has several disadvantages including: not being able to account for multiple ways in which expression patterns can be similar, having difficulty assimilating large quantities of data, and forcing a hierarchical system upon a data set that does not truly exhibit a hierarchical lineage [10, 23] . unlike hierarchical clustering, soms require initialization and are much less rigid in terms of structure while at the same time remaining robust and unique.
initialization involves defining a particular geometry, typically a grid or ring, with a specified number of groups or nodes [10, 23] . these nodes are mapped into a high dimensional space and successive iterations, usually tens of thousands, look to reduce the number of dimensions [10] . the algorithm also makes use of weighted vectors to select and group similar data entries together, essentially training itself after each phase. the end result of this process is a self-organized network that can be visualized [4, 6] . these and other features make soms a powerful tool in exploratory studies with an emphasis on visualization. similarly, k-means clustering aims to partition gene expression data into a specified number of disjoint clusters. again, a distance metric is used in these calculations and can be specified by the user. genes within a cluster are deemed similar to one another, but clusters are deemed dissimilar to one another, producing a series of clusters that are not related or connected, the opposite of the structure produced in hierarchical clustering [7, 10] . essentially, each gene is placed in one of the clusters initially specified and distances between clusters are calculated. next, genes are moved from one cluster to another until a local stability is reached in which the distance between clusters is maximized while at the same time minimizing the distance between members of a given cluster [22, 23] . this method is reliable and relatively simple and therefore is useful in analyzing data for which there is some prior knowledge such as classifying serotypes or strains. the difficulty lies in ensuring that the partitions constructed using k-means clustering have some type of real or actual significance [10] . pca is an algorithm that relies on visually highlighting similarities in data in a manner that reduces the number of dimensions.
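as a brief aside, the k-means procedure described above can be sketched in a few lines. this is a minimal illustration using the standard assign-and-update iteration (often called lloyd's algorithm), which is one common way to reach the local stability the text describes and is not necessarily what any given microarray package implements. the euclidean distance metric, the function names, the expression profiles, and the starting centroids are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_profile(rows):
    """Component-wise mean of a list of expression profiles."""
    return [sum(col) / len(col) for col in zip(*rows)]


def k_means(profiles, initial_centroids, max_iter=100):
    """Assign each profile to its nearest centroid, then update the centroids,
    repeating until the assignments stop changing (a local optimum)."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate input
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # no gene changed cluster
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_profile(members)
    return assignment, centroids


# hypothetical log-ratios for four genes measured under two conditions
profiles = [[1.0, 1.2], [0.9, 1.1], [4.0, 4.2], [4.1, 3.9]]
assignment, centroids = k_means(profiles, [profiles[0], profiles[2]])
print(assignment, centroids)
```

the number of clusters is fixed by the number of starting centroids, which mirrors the point made above that k-means works best when some prior knowledge about the expected groups is available. different starting centroids can lead to different local optima.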
pca can be applied to any number of data sets from a small group of genes within a single array to groups of experiments each with a number of arrays [10, 22, 23] . as seen in figure 4a , the plot generated from pca allows patterns in data to be visualized by examining the proximity of clusters. the method implements a series of calculations to best separate the data and project that final analysis onto a two- or three-dimensional plot [22, 23] . when combined with other clustering methods, pca can be a very useful tool, as described earlier. besides clustering algorithms, another statistical approach typically used to analyze microarray data is hypothesis testing, which aims to establish the statistical significance associated with divergent findings. if a group of genes, perhaps genes that constitute a particular pathway, are differentially expressed between two samples, hypothesis testing can quantify the extent of those differences. hypothesis testing is comprised of the following steps: specify the null hypothesis and the alternative hypothesis, select a significance level, calculate a statistic analogous to the parameter designated in the null hypothesis, calculate the probability value (p-value), compare the p-value with the significance level, and finally reject or fail to reject the null hypothesis [7, 26] . at the end of these steps, an observed outcome is associated with a statistical likelihood indicating whether or not the observed outcome is the result of chance rather than some real difference or phenomenon [26] . application of hypothesis testing is most useful when evaluating microarray data with specific genes or groups of genes in mind as opposed to discovery or exploration. a large number of microarray-related studies in the past have aimed to either characterize diseased cells in comparison to healthy cells or highlight the genes involved in a particular biological pathway [4, 8] .
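as a brief aside, the hypothesis testing steps listed above can be walked through for a single gene. the text does not name a specific test, so this minimal sketch substitutes an exact two-sided permutation test on the difference in group means, chosen only because it needs no distributional tables. the function names, the expression values, and the significance level of 0.05 are assumptions made for illustration.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means.

    Null hypothesis: the group labels are exchangeable (no real difference).
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    at_least_as_extreme = 0
    # enumerate every way of relabelling len(group_a) values as "group a"
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # small tolerance guards against floating-point rounding
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# hypothetical normalized expression values for one gene in two conditions
treated = [8.1, 8.4, 8.3, 8.6]
untreated = [6.9, 7.2, 7.0, 7.1]

alpha = 0.05  # significance level selected before looking at the data
p_value = permutation_p_value(treated, untreated)
reject_null = p_value < alpha
print(p_value, reject_null)
```

because a microarray measures thousands of genes at once, running such a test gene by gene would also call for some form of multiple-testing correction, which this sketch leaves out.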
infrequently, studies were undertaken for other purposes such as gene discovery or examining distinct cellular properties [5, 6] . however, in recent years, the number of studies utilizing microarrays in some capacity has increased greatly. more and more studies are relying on microarrays to provide insight into observed physiology, essentially using microarrays to further characterize biological systems [3, 9] . in most of these cases, microarray analysis has generated interesting results, but also raised additional questions requiring further investigation, limiting its successful implementation. for instance, the application of bio-informatics tools such as microarrays to characterize microbial populations exposed to toxins and pollutants has been explored [27] . being able to understand the catabolism of xenobiotics could enhance bioremediation processes with a direct impact on pollution control and environmental organization [27] . in addition, the exploration of previously uncharacterized microbes using microarrays could identify novel genes with relevant functionality [27] . in this context, a number of studies have focused on specific issues such as investigating how candida albicans, a human fungal pathogen, is able to protect itself from the toxic effects of nitric oxide produced by the immune system [27, 28] . microarray analysis revealed a group of nine genes were over-expressed during exposure to nitric oxide. of these nine genes yhb1, which produces a flavohemoglobin that detoxifies nitric oxide, was the most highly expressed [28] . evolutionary studies using microarrays have also gained prominence with the use of species-specific arrays in parallel. for example, researchers hybridized dna from the progeny of two yeast strains, one with a particular evolved trait (i.e. mating discrimination) and the other without, onto oligonucleotide microarrays [29] . 
the arrays used in this study were designed to detect a multitude of polymorphisms between the two strains. adaptive mutations were identified by linking polymorphisms to the evolved parental strain [29] . investigators then mapped known genes and constructed a computer simulation capable of evaluating various parameters impacting mapping precision [29] . finally, the researchers applied their method to yeast strains adapting to a changing glucose-galactose feed, illustrating that mutations in the same gene can lead to parallel adaptation [29] . similarly, scientists compared community-acquired invasive staphylococcus aureus strains to isolates from healthy people using a microarray constructed from seven previous sequencing projects [30] . ten dominant lineages were identified, each with a distinct group of genes with potential functions related to virulence and resistance. subsequent analysis suggested a common ancestor could be traced back for all of the strains studied, but evolutionary divergence must have occurred early on [30] . the development of therapeutics has also benefited from the implementation of microarrays as evidenced by a number of recent publications. for example, scientists examined gene expression profiles from patients with chronic drug abuse, intending to better understand addiction and therefore formulate better treatments [31] . analysis of the array data revealed very little overlap in the expression patterns for heroin and cocaine users [31] . these findings were contrary to widely held views regarding the shared effects of heroin and cocaine on dopamine, thus prompting reassessment of previous assumptions [31] . another study examined the mechanism behind acquired nisin resistance in bacteria [32] .
researchers found genes involved in the following pathways to be expressed differentially between resistant and non-resistant lactococcus lactis strains: cell wall biosynthesis, energy metabolism, fatty acid and phospholipid metabolism, regulatory functions, and metal/peptide transport and binding [32] . using this information, the researchers established mutant strains that either had genes knocked down or over-expressed and found these mutants had varying levels of nisin resistance as compared to the parental, wild-type strains [32] . in terms of disease characterization and detection, microarrays are also finding use. for instance, the pathogenicity of coxsackievirus b3 (cvb3) was examined; in humans this virus adversely affects the heart muscle [33] . using cdna microarrays, researchers compared murine hearts infected with the virus against non-infected murine hearts. in addition, oligonucleotide microarrays were used to compare infected hela cells over time [33] . together, these experiments identified a number of differentially expressed genes, providing clues as to the precise sequence of events following infection. similarly, the use of custom microarrays to characterize unknown samples from water treatment centers as part of a quality control measure was examined [34] . the microarray was constructed to target 16s ribosomal rna (rrna) from several groups of nitrifying bacteria and tested against reference samples with some success [34] . using microarrays in the capacity of diagnostics has also become relatively popular, especially in the context of outbreaks for which rapid diagnostic tools are needed to quickly evaluate pathogens and identify specific strains or serotypes [9, 35, 36] . for example, a microarray was constructed specifically to probe single nucleotide polymorphisms (snps) for foot and mouth disease virus (fmdv) [37] .
the results were classified using statistical methods in order to develop a procedure to test for specificity with diagnostic application [37] . similarly, a study combined the use of microarrays with reverse transcription-pcr to differentiate between two genetically similar enteroviruses; enterovirus 71 (ev71) and coxsackievirus a16 (ca16) [38] . this approach had a diagnostic accuracy of at least 92% for each of the two viruses as compared to reverse transcription-polymerase chain reaction (rt-pcr) and neutralization testing [38] . currently, studies are being conducted to explore the feasibility and implementation of similar methods for other pathogens [38, 39] . with advancements in software and robotics technology, microarrays are becoming inexpensive, robust, and reliable [2, 5] . the availability of custom arrays designed to probe a small subset of genes (usually several hundred) or specific pathways have also enhanced the potential utilization of microarray technology [1, 2, 9] . this section was designed to highlight the latest advances in the technology, speculate on novel applications of microarray technology, and outline areas of research that have just begun to use microarrays. together these aspects portray the potential of microarrays in terms of applications as well as from a technical standpoint. breakthroughs in various aspects of the technology from fabrication to commercialization are continually influencing the kinds of microarrays and techniques researchers are using. currently, microarray experiments are conducted in a series of steps with each step being distinct and in a particular order. however, newly developed chips equipped with electronic circuitry are circumventing a number of these steps particularly sample labeling [8, 9] . in addition, a number of companies and research facilities now offer specialized arrays for detection, sequencing, and/or diagnostic purposes [3, 6] . 
by commercializing such highly specific arrays, data gathering is being expedited for studies with explicit purposes. an integrated platform like the lab-on-a-chip (a system that combines multiple manipulations including sample mixing, labeling, and separation onto a single chip) is also influencing microarray technology. the miniaturization and automated techniques used to construct the lab-on-a-chip system are being applied to microarrays, leading to arrays that can be readily used for high-throughput applications [7] . one of the most promising areas of research is classification, particularly in the context of diseases and/or pathogens [40] [41] [42] [43] . for instance, in 2002 researchers at the national cancer institute used microarrays to organize biopsy samples of diffuse large-b-cell lymphoma from more than 200 patients [44] . they identified 3 subgroups with varying expression of 17 distinct genes, constructing a model capable of predicting survival rates following chemotherapy [44] . in another study, researchers used microarrays to confirm previous classifications of nonpathogenic, low-pathogenic, and high-pathogenic types for 94 different yersinia enterocolitica strains [45] . researchers identified clusters of genes as being representative of each type (i.e. being present in one group, but not in another) with functional implications [45] . another arena in which microarrays may prove beneficial is discovery, primarily in the context of gene functions and the identification of novel organisms. for instance, in a recent study researchers analyzed an escherichia coli strain, a49, with a mutation in the rnpa gene making it sensitive to temperature and therefore unable to grow at or above 43°c [46] . under varying growth conditions, researchers found a number of genes differentially expressed.
careful review of these genes revealed rnase p, the mutated gene product, may have more functions than what had been proposed previously, especially in the context of handling precursor rnas [46] . researchers in 2003 constructed a custom array with highly conserved arrangements from every fully sequenced viral genome available in genbank [39] . next, they hybridized a viral isolate from a severe acute respiratory syndrome (sars) patient onto the array and found a previously unidentified coronavirus [39] . subsequent work involving viral sequencing verified these findings and showcased the potential of custom arrays to expedite the identification of pathogens, a virtual necessity in combating future outbreaks [39] . in terms of biological products, particularly vaccines and therapeutic proteins, microarrays may also find use. as detailed in various governmental regulations, slight variations in a biological process may result in distinct final products, requiring further testing and validation [9] . microarrays may very well provide a means of avoiding these procedures by establishing criteria (i.e. expression patterns for a small set of genes) that can be used to verify consistency and reproducibility. extensive research would, however, be required to first establish the necessary criteria. in addition, it should be stressed that in this particular application, microarray results would have to be viewed in terms of patterns for a group of genes rather than the expression levels of individual genes [9, 15] . this is because the variability associated with a single gene can exceed levels needed to verify or validate biological processes, whereas the variability in groups of genes where an overall pattern is decoded is much less [10] . another area that has found and continues to find microarray technology beneficial is pathway probing, the illumination of biological pathways.
often, microarray data alone cannot decipher the sequential steps necessary for a particular mechanism to occur; however, it can provide insight into what genes or groups of genes should be investigated further [15, 35] . for instance, a paper published in 2003 used microarrays together with other experimental techniques to decipher a pathway responsible for regulating the expression of cyclooxygenase-2 (cox-2), a pro-inflammatory protein associated with arthritis and pain [47] . a continuation of this work was published in 2005, further illuminating the pathway and possible feedback mechanisms with important therapeutic implications [48] . perhaps the greatest potential lies in combining two fields within the scope of bio-informatics: genomics and proteomics. genomics is the study of genes and their function whereas proteomics is the study of proteins and their functions [42, 49] . by utilizing tools from each of these two disciplines, researchers may be able to construct more accurate and comprehensive models depicting specific biological processes. for example, in a recent study both two-dimensional gel electrophoresis and microarrays were used to identify genes involved in the acclimation to changing visible light in cyanobacteria [50] . focusing on the organism fremyella diplosiphon, researchers found approximately 80 proteins with different levels between cells grown in green light vs. red light as well as 17 genes not previously thought to be regulated by light [50] . further exploration revealed a number of these genes had homologs in other organisms, though their functionality had not been fully deciphered [50] . in another study, both microarrays and proteomics were used to evaluate an escherichia coli mutant secreting more α-hemolysin (hlya) than the parent strain [51] . the researchers found decreased levels of trna-synthetases in the mutant as compared to the parent strain [51] .
based on this information, the researchers designed a modified hlya gene to reduce the rate of translation by incorporating rare codons leading to the same amino acid sequence [51] . when the parent strain was transformed with this modified hlya gene, it secreted even more hlya than the mutant [51] . in other words, the study indicated it was possible to engineer cells using an approach that combined genomics and proteomics. microarrays are a powerful genomics tool, designed to illuminate differences in the expression of genes within cells. despite being a relatively new technology, the scientific community has quickly adopted its use in a variety of fields including drug development, evolutionary biology, and disease characterization [1, 52] . the strength of the technology rests on several factors including: ease of use, availability of platforms and lower cost relative to other exploratory methods such as northern blotting or ribonuclease protection assay (rpa), implementation of statistical methods for detailed analysis, and most importantly a global view of gene expression encompassing an entire genome. as previously alluded to, the technological limitations associated with microarrays manifest themselves in terms of variability typically seen as systematic errors. improvements in robotics, array fabrications, and continued genome sequencing can certainly address these issues, but not entirely remove them. this places limits on what microarray technology can achieve, although a comprehensive understanding of microarrays can help establish meaningful and reproducible data. an effort to properly design the experiment, establish quality control steps such as checking rna purity, analyze the data, and verify the results can also combat technological challenges [10, 14, 53] .
in addition, archiving databases and files is a consideration often overlooked, though quite important in being able to return to data with new leads and directions for subsequent research or simply cross-compare with new data. there are, of course, other inherent limitations that restrict the scope of microarray analysis just like any other tool. for example, microarrays only present a snapshot of the transcriptome which is continually changing and responding to cellular needs and signals. as such, microarrays only illuminate a part of what is going on inside a cell or a population of cells [3, 6] . in addition, there does not necessarily have to be a tight correlation between the expression of a gene and the amount of translated protein. therefore, differentially expressed genes may not translate into varying protein levels with functional implications [3] . furthermore, the complexity of microarray analysis makes it exceedingly difficult to ascertain meaningful data with real biological significance without clearly defined goals or targets. an intricate aspect of genomic analysis is the interplay between genes or groups of genes (i.e. mechanisms) and that information is not easily deciphered using microarrays. and finally, the functionality of a gene cannot be determined solely using microarrays [2, 3] . indeed, other methods and experimental tools are needed to decipher the proteome, understand the varying interactions between genes and/or proteins, and develop a more complete picture of cellular behavior. ultimately, microarrays will continue to be used in a variety of research areas as more options in the design of custom arrays become available along with an increase in the assortment of species-specific arrays. technological advancements may help bring down the cost as well as enhance reproducibility and reliability, promoting the application of microarrays in new and diverse fields.
in the end, the questions raised by microarray results are often just as vital as the answers they produce; a key to expanding the role of any scientific method to encompass new fields. exploring the new world of the genome with dna microarrays recent advances in dna microarrays dna microarrays: their use and misuse. microcirculation the use and analysis of microarray data microarray expression profiling: capturing a genome-wide portrait of the transcriptome microarray expression profiling: analysis and applications from patterns to pathways: gene expression data analysis comes of age bioinformatics in microbial biotechnology -a mini review computational analysis of microarray data differential gene expression in recombinant pichia pastoris analysed by heterologous dna microarray hybridization an analysis of the use of genomic dna as a universal reference in two channel dna microarrays a framework to analyze multiple time series data: a case study with streptomyces coelicolor design of studies using dna microarrays applications of microarrays in the pharmaceutical industry methods and approaches in the analysis of gene expression data uncovering evolutionary patterns of gene expression using microarrays microarray analysis of gene expression during the cell cycle biological methods for cell-cycle synchronization of mammalian cells tm4: a free, open-source system for microarray data management and analysis comparisons and validation of statistical clustering techniques for microarray gene expression data cluster analysis and display of genome-wide expression patterns systematic determination of genetic network architecture fundamentals of biostatistics environmental genomics: exploring the unmined richness of microbes to degrade xenobiotics transcriptional response of candida albicans to nitric oxide and the role of the yhb1 gene in nitrosative stress and virulence high-resolution mutation mapping reveals parallel experimental evolution in yeast microarrays reveal 
that each of the ten dominant lineages of staphylococcus aureus has a unique combination of surface-associated and regulatory genes distinctive profiles of gene expression in the human nucleus accumbens associated with cocaine and heroin abuse transcriptome analysis reveals mechanisms by which lactococcus lactis acquires nisin resistence genetic determinants of coxsackievirus b3 pathogenesis dna microarray detection of nitrifying bacterial 16s rrna in wastewater treatment plant sample oligonucleotide microarrays in microbial diagnostics gene expression studies using microarrays: principles, problems, and prospects microarray-based identification of antigenic variants of foot-andmouth disease virus: a bioinformatics quality assessment combining multiplex reverse transcription-pcr and a diagnostic microarray to detect and differentiate enterovirus 71 and coxsackievirus a16 viral discovery and sequence recovery using dna microarrays mixed-genome microarrays reveal multiple serotype and lineage-specific differences among strains of listeria monocytogenes diagnostic oligonucleotide microarray fingerprinting of bacillus isolates brousseau r: the development of a dna microarray-based assay for the characterization of commercially formulated microbial products potential use of microarray technology for rapid identification of central nervous system pathogens the use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma application of comparative phylogenomics to study the evolution of yersinia enterocolitica and to identify genetic differences relating to pathogenicity the effect of a single, temperature-sensitive mutation on global gene expression in escherichia coli shear-induced cyclooxygenase-2 via a jnk2/c-jun-dependent pathway regulates prostaglandin receptor expression in chondrocytic cells konstantopoulos k: divergent responses of chondrocytes and endothelial cells to shear stress: cross-talk among cox-2, the phase 2 
response, and apoptosis dna microarray technology for target identification and validation genomic dna microarray analysis: identification of new genes regulated by light color in the cyanobacterium fremyella diplosiphon engineering hlya hypersecretion in escherichia coli based on proteomic and microarray analyses genetic analysis and attribution of microbial forensics evidence quantitative oligonucleotide microarray fingerprinting of salmonella enterica isolates
funding was provided by the intramural program at the national institute of diabetes & digestive & kidney diseases, national institutes of health. the authors would also like to thank members of the biotechnology unit for their input and willingness to proofread the manuscript.
the authors declare that they have no competing interests.
pj formulated the content, performed the literature search, and drafted much of the manuscript. kk contributed to revising the manuscript and adding content. mb contributed to formulating the content and layout. js contributed to formulating the content, revising the manuscript, and drafting portions of the manuscript.
key: cord-319519-mb9ofh12 authors: ding, j.; hostallero, d. e.; el khili, m. r.; fonseca, g. j.; milette, s.; noorah, n.; guay-belzile, m.; spicer, j.; daneshtalab, n.; sirois, m.; tremblay, k.; emad, a.; rousseau, s. title: a network-informed analysis of sars-cov-2 and hemophagocytic lymphohistiocytosis genes' interactions points to neutrophil extracellular traps as mediators of thrombosis in covid-19 date: 2020-07-02 journal: nan doi: 10.1101/2020.07.01.20144121 sha: doc_id: 319519 cord_uid: mb9ofh12 abnormal coagulation and an increased risk of thrombosis are features of severe covid-19, with parallels proposed with hemophagocytic lymphohistiocytosis (hlh), a life-threatening condition associated with hyperinflammation. the presence of hlh was described in severely ill patients during the h1n1 influenza epidemic, presenting with pulmonary vascular thrombosis.
we tested the hypothesis that genes causing primary hlh regulate pathways linking pulmonary thromboembolism to the presence of sars-cov-2 using novel network-informed computational algorithms. this approach led to the identification of neutrophil extracellular traps (nets) as plausible mediators of vascular thrombosis in severe covid-19 in children and adults. taken together, the network-informed analysis led us to propose the following model: the release of nets in response to inflammatory signals, acting in concert with sars-cov-2, damages the endothelium and directs platelet activation, promoting abnormal coagulation that leads to serious complications of covid-19. the underlying hypothesis is that genetic and/or environmental conditions that favor the release of nets may predispose individuals to thrombotic complications of covid-19 due to an increased risk of abnormal coagulation. this would be a common pathogenic mechanism in conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.
hlh genes are significantly enriched within the sars-cov-2 host protein interactome
in the case of the sars-cov-2 pandemic, with widespread impact across the world, there is an urgency that requires the adaptation of different strategies to understand covid-19. in this paper, we exploited the knowledge existing within protein interaction networks to identify the molecular pathways underpinning thrombotic complications of covid-19 using advanced computational algorithms. as described in the introduction, a subset of patients suffering from severe complications of covid-19 present clinically with symptoms similar to hlh. therefore, we have assembled a list of candidate genes responsible for primary hlh and associated syndromes to explore their relationships with covid-19 24,25 (supplementary table s1). the first question asked was whether these hlh genes had potential interactions with sars-cov-2.
we assembled a protein interaction network between the recently published sars-cov-2 host interaction protein network 23 and the hlh genes using an algorithm that we created for this purpose, genelist2covid19. the algorithm establishes the shortest path between the candidate genes and the known host proteins interacting with sars-cov-2 and calculates an overall connectivity score for the network (a smaller value represents a greater connectivity) (fig. 1 and supplementary table s1). we computationally validated the predictions of genelist2covid19 to identify significant interactions. to demonstrate that the method can assign significant connectivity scores to genes associated with covid-19, we obtained a list of 10 confirmed covid-19 related genes 26, which are differentially expressed in severe covid-19 patients (supplementary table s1). we then calculated the "covid-19" connectivity score for those 10 genes (sa) as well as for all the genes (sb) using genelist2covid19. we found that sa is significantly (p-value=0.017) smaller than sb, which indicates that those 10 covid-19 related genes are indeed "significantly connected" to sars-cov-2 proteins (fig. 2). to show the specificity of the method, we also calculated the "covid-19" connectivity score for 100 randomly selected genes (sc) and compared it to the connectivity score of all genes (sb). we found that sc is not significantly smaller than sb (the background) (p-value=0.106) (fig. 2). in other words, those 100 random genes are not "significantly connected" to sars-cov-2 proteins, which reflects the fact that those genes were randomly picked. as an additional control, we repeated the analysis using genes linked to male infertility 27, a condition that has not been associated with covid-19 (supplementary table s1).
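the one-sided rank-sum comparison used in these controls (is one list of connectivity scores shifted towards smaller values than the background list?) can be sketched as follows. this is a minimal illustration and not the genelist2covid19 implementation: the function name `mann_whitney_less` is hypothetical, the p-value uses the normal approximation with a tie correction rather than an exact test, and the scores in the example are made-up numbers, not data from this study.

```python
import math


def mann_whitney_less(a, b):
    """One-sided Mann-Whitney U test of H1: values in `a` tend to be smaller than in `b`.

    Returns (U_a, p_value). The p-value uses the normal approximation with
    midranks for ties and a continuity correction (an assumption of this sketch).
    """
    # Pool both samples, remembering which sample (0 = a, 1 = b) each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((a, b)) for value in sample)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the average rank.
        midrank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u + 0.5) / math.sqrt(var_u)  # continuity correction, lower tail
    p_value = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return u_a, p_value


# Hypothetical connectivity scores (smaller = more strongly connected).
scores_of_interest = [1.0, 2.0, 3.0, 4.0, 5.0]
scores_background = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
u_stat, p_less = mann_whitney_less(scores_of_interest, scores_background)
```

a small `p_less` would indicate that the list of interest has smaller connectivity scores than the background, which is how "significantly connected" is read in the text; swapping the two arguments tests the opposite direction.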
for the male infertility genes, the connectivity score was not significantly different from all other genes (p-value=0.872), further demonstrating the specificity of genelist2covid19, which is not restricted to random genes but can also discriminate gene lists associated with other conditions (fig. 2). after the method was validated, we compared the "connectivity" score for the hlh genes listed above with all genes that connect to sars-cov-2 proteins through our assembled protein-protein interaction network (fig. 2). we found that the score for the hlh marker genes is significantly smaller compared to all other genes (p-value=0.0082, one-sided rank sum test) (fig. 2). as an additional control, we compared the hlh genes to a list of vascular angiogenesis genes linked to both h1n1 and sars-cov-2 pulmonary infections 17 (supplementary table s1 and fig. 2). the hlh genes' connectivity score was smaller, which means that those genes had closer theoretical interactions to sars-cov-2. this suggests that hlh genes and their associated pathways are of high interest in the study of sars-cov-2 infections.
differential expression of hlh genes in health conditions related to covid-19.
we next investigated whether the expression of hlh genes in lung tissue was highly regulated (up [...] condition. we hypothesized that hlh genes that may play an active role in thrombotic complications of covid-19 are more likely to be regulated in co-morbid conditions. rab27a expression was found altered in all studied conditions, while ap3b1 expression was also found altered in all conditions except one, lung cancer (fig. 3). [...] table s1). the majority of the genes were expressed in neutrophils and 7 of them (unc13d, lyst, ap3b1, magt1, rab8a, golga2 and g3bp1) were significantly elevated in either inactive or active sjia compared to control neutrophils (fig. 4).
it is worth noting that the gene most connected to sars-cov-2, ap3b1, which can directly interact with the sars-cov-2 e protein, is in the list of up-regulated genes in neutrophils derived from sjia. an important message stemming from our discovery that nets may be drivers of coagulopathies in reactive hlh is the potential susceptibility of a subset of the pediatric population, identifying them as at risk of severe complications of covid-19. this led us to the next step, predicting populations potentially vulnerable to thrombotic complications of covid-19 based on their susceptibility to release nets.
identifying potentially vulnerable populations to covid-19 based on nets release.
the world health organization has established that identifying vulnerable populations is an urgent public health priority in the context of the covid-19 pandemic 48. based on the analyses above, nets may play an important role in promoting thrombosis in covid-19. the role of neutrophils in coagulopathies is becoming increasingly recognised, particularly that of nets 41. therefore, we hypothesized that health conditions associated with increased release of nets could be a predictive factor for thrombotic complications of covid-19. based on this hypothesis and in order to identify vulnerable populations, we developed a method called forward (informative random walk for ranking diseases) to rank different diseases associated with nets based on their relevance to a gene set of interest (here, genes in the hlh-sars-cov-2 network) (see methods for details).
we obtained two net gene signatures from previous studies 49, 50 and combined them to obtain a list of net-associated genes (supplementary table s1). then, we obtained the list of diseases associated with these genes from the disgenet database 51. entries with gene-disease
association (gda) < 0.4 were filtered out to arrive at a set of 99 diseases. we used the set of 24 genes in the hlh-sars-cov-2 interaction network (fig. 1) as the query set and used the humannet integrated network 35 as the gene interaction network in forward. the full list of 99 diseases, ranked using forward, is provided in supplementary table s3. table 2 lists the diseases with a normalized disease score (nds) greater than 0.5, meaning that these diseases are enriched above the background probabilities. most of the 10 top-ranked diseases associated with genes linked to nets can be sub-grouped into 4 major categories: 1) immune/infectious (alzheimer's disease, immunodeficiency 8); 2) cardiovascular (myocardial reperfusion injury, hemolytic anemia due to g6pi, bleeding disorder, platelet-type, 15); 3) metabolic (diabetes); and 4) cancer (liver carcinoma). it is important to note that some of these diseases would be putatively associated with net deficiency, such as immunodeficiency 8, more often presenting clinically with bleeding. whether patients suffering from these diseases are protected from thrombotic complications of covid-19 remains to be determined. however, diseases associated with increased net release are expected to yield a greater risk of thrombosis and may identify populations vulnerable to severe thrombotic complications of covid-19.
this preprint (which was not certified by peer review) was posted july 2, 2020. the copyright holder is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd 4.0 international license.
discussion
based on recent literature, we hypothesized that severe pulmonary thrombotic complications of covid-19 are associated with a hematologic cytokine storm that could be, in part, defined using genes causing hlh.
the network-informed analysis presented in this paper revealed that 1) the top go biological function associated with hlh genes is neutrophil degranulation, consistent with a recent report highlighting the undervalued role of neutrophils in hlh 36; 2) hlh genes are significantly enriched within the sars-cov-2 human interactome; 3) the top-ranked hlh gene, ap3b1, has roles in cargo loading of type ii pneumocytes, where it may interact with sars-cov-2 to disturb surfactant physiological functions and promote inflammation/pro-coagulation activities; 4) diseases/syndromes associated with increased release of neutrophil extracellular traps (nets) may predict vulnerable populations, including those affecting children.
taken together, the network-informed analysis led us to propose the following model: the release of nets in response to inflammatory signals, acting in concert with sars-cov-2, damages the endothelium and directs platelet activation, promoting abnormal coagulation that leads to serious complications of covid-19 in susceptible individuals (fig. 5). the underlying hypothesis is that genetic and/or environmental conditions that favor the release of nets may predispose individuals to thrombotic complications of covid-19 due to an increased risk of abnormal coagulation. this would be a common pathogenic mechanism amongst numerous conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.
the role of neutrophils in coagulopathies is becoming increasingly recognised, particularly that of nets 41. interestingly, an elevated neutrophil count is the best single leukocyte predictor of cardiovascular risk 52, bettered only by a high neutrophil-to-lymphocyte ratio 52, a clinical feature of covid-19 10.
net release can be triggered by various inflammatory mediators found elevated in severe covid-19, including crp, il-1b, il-6 and il-8 43. there is also a positive correlation between circulating serum levels of il-6, il-8 and crp and net levels 53. nets are found in a variety of conditions such as infection, malignancy, atherosclerosis, and autoimmune diseases, with reports now emerging that describe their presence in covid-19 9,54-56. amongst the known diseases associated with nets, several are related to children, including cystic fibrosis 57, meningococcal sepsis 58, lyme neuroborreliosis 59, juvenile dermatomyositis 60 and pediatric inflammatory bowel diseases 61. in pediatric sepsis, net levels were elevated and correlated with disease severity, mirroring results in mice, where higher net levels in response to lipopolysaccharides are found in infant mice compared to adults 62.
one of the quickest ways to decrease the burden of covid-19 on health care systems throughout the world is to identify at-risk populations in order to emphasize the importance of infection prevention measures for those individuals. since these measures incur high personal, social and economic costs, precise knowledge is essential. we presented a novel computational algorithm that enabled us to identify potential diseases linked with nets (table 2). interestingly, amongst the identified diseases, diabetes, a well-established comorbidity of covid-19 63, is ranked 4th and 7th.
our study provides additional insight into the potential mechanisms involved, with increased net formation resulting from the underlying chronic inflammation as a key factor promoting coagulopathies in diabetics suffering from covid-19. as for the top-ranked disease, alzheimer's disease, whether nets in the brain can lead to an increased risk of systemic thrombosis looks less likely than the reverse, that sars-cov-2 infection may increase net release in the brain, which could exacerbate alzheimer's disease-driven pathology, including a greater risk of stroke. this may be an important question for future studies due to the susceptibility of the elderly to covid-19 and its severity in this group, notably the extreme mortality seen in long-term care homes around the world, where cognitive impairment is highly prevalent 64,65.
it has been suggested that covid-19 should be added to the list of hyperferritinemic syndromes, which includes adult-onset still's disease, septic shock, catastrophic anti-phospholipid syndrome, and mas (reactive hlh) 66. collectively, these diseases may share similar underlying factors of complications, including an underappreciated role of nets leading to coagulopathies. it is possible that individuals contract sars-cov-2 infection in addition to other factors that underlie any of these conditions (other viruses, for example), which may lead to a further amplification loop. pcr-negative sars-cov-2 patients presenting with clinical symptoms of hyperferritinemic syndrome should be considered highly vulnerable and appropriate infection control measures should be put in place.
disorders associated with bleeding should decrease the risk of thrombotic complications of covid-19 (however, they may still lead to severe covid-19 via other mechanisms). nevertheless, they can be informative on pathophysiology. the strongest connectivity to the sars-cov-2 e protein was ap3b1 (fig.
1). loss of function of ap3b1 leads to hermansky-pudlak syndrome type 2, which is associated with bleeding and coagulation defects 67,68. the sars-cov [...] therefore, both proteins have a coherent subcellular localization supporting their potential interaction. moreover, in post-mortem immunohistochemical analysis of lung tissue, the sars-cov-2 s (spike) and e proteins were found to localize with the respiratory epithelia, the interalveolar, and the septal capillaries 5. in addition, septal and intra-alveolar neutrophilia was observed 5, colocalizing some of the key players of a neutrophil-driven sars-cov-2 enhanced coagulation cascade in covid-19 (fig. 5). whether the sars-cov-2 e protein can directly or indirectly penetrate neutrophils and/or platelets remains unknown, as these cells are not reported to highly express ace2+/tmprss2+, the two key host proteins for viral entry. however, both of these proteins are highly expressed on type ii pneumocytes 71, where ap3b1 is important for cargo loading of lamellar bodies 72. a postmortem examination in a covid-19 patient who succumbed to a sudden cardiovascular accident revealed sars-cov-2 viruses present in pneumocytes despite pcr-negative nasal swabs 73, indicating a prolonged risk in the lower airways for complications.
immunodeficiency 8, resulting from the loss of function of coronin-1, also leads to bleeding. coronin-1 plays key functions in pmn trafficking 74, in part via its interaction with the integrin β2. β2 integrin-mediated systemic net release is a viral mechanism of immunopathology in hantavirus-associated disease, such as kidney and lung damage 75, similar to the immunopathology in severe covid-19.
overall, diseases associated with a putative loss-of-function of nets suggest mechanistic roles for ap3b1, coronin-1 and integrin β2 in regulating net-mediated coagulopathies in the lung alveolar and peri-alveolar areas (fig. 5).
the analysis in this study is based on a new algorithm that we developed (freely available at https://github.com/phoenixding/genelist2covid19). genelist2covid19 can systematically evaluate the connection of any given gene list to sars-cov-2 proteins, both within host proteins and between host and viral proteins. therefore, it can be used to study a wide variety of biological problems associated with covid-19, especially in circumstances where experimental data on covid-19 (e.g. transcriptomics or genomics) are not yet available for the problems of interest. the algorithm was found effective on positive (proven to be associated with covid-19) and negative (irrelevant to covid-19) gene lists. in terms of limitations, genelist2covid19 is dependent on prior knowledge of the protein interactome within the host, and between the host and virus. currently, we have a well-established protein-protein interactome for the human species. however, the interactome between the sars-cov-2 proteins and human proteins is relatively limited 23 and far from complete. for example, there are no reported interactions for ace2 and tmprss2, which are critical to sars-cov-2 infection. we provided an option (-v) in genelist2covid19 to utilize any new host-viral protein interactome data when it becomes available. while genelist2covid19 is good at telling whether an input gene list is associated with covid-19, it cannot test the mechanistic hypothesis generated. it can provide the network that connects the genes in the list to the sars-cov-2 proteins, but it cannot determine which nodes/edges in the network are more critical (and when they are activated).
at this stage, the most useful information is derived from considering the entire network. as the availability of covid-19 related "-omics" data increases, we will extend the method into a joint model that integrates all those omics data for a more comprehensive, high-definition network model that can provide additional and more precise insights into the role of genes in covid-19. the second computational algorithm provided, forward (https://github.com/ddhostallero/forward), is also limited by the requirement of known gene-disease associations. [...] conditions. further studies in well-defined cohorts of covid-19 patients are mandatory to confirm the relevance of the observations highlighted in the present study. such knowledge may be important for identifying novel covid-19 severity biomarkers that will be needed in the management of individuals at risk of complications.
methods
datasets
for this study we used the following datasets: the interactions between the sars-cov-2 proteins and host proteins 23, reporting 332 interactions that involve 26 sars-cov-2 proteins and 332 human host proteins. each interaction in the map was assigned an interaction score that represents the strength of the interaction (a score between 0 and 1). we also collected the protein-protein interactions (with interaction scores) between all human host proteins from the hippie database 78. we obtained a list of highly/lowly (h/l) expressed genes under different health conditions that are potentially associated with covid-19 10,28,29.
to identify the vulnerable populations using forward, the full list of genes associated with these diseases was downloaded from disgenet [...] for the analysis, we did not use the bootstrapping option, selected homo sapiens as 'species', and used default values for all other parameters. we obtained go terms with a "difference score" above 0.5. this score represents the normalized difference between the query probabilities and the baseline probabilities in the rwr algorithm, with the best score observed as 1 (table 1).
building the hlh-sars-cov-2 interaction network
we first built a network that connects all the sars-cov-2 proteins and the human host proteins based on the collected protein interaction data. the edges connecting different proteins are weighted based on the interaction scores obtained from the original datasets above. next, we inferred the signaling paths from sars-cov-2 proteins down to a list of proteins (genes) of interest.
a few key assumptions must be made before we can make such an inference. first, since the collected protein interactions within the host (and between the host and the sars-cov-2 virus) do not have directions, the reconstructed network graph is undirected. here, we assumed that the information (i.e., infection) flows from the sars-cov-2 proteins to the proteins that directly interact with sars-cov-2 proteins, next to other intermediate signaling proteins, and finally to the target genes (proteins) of interest. there might be multiple intermediate proteins residing between the direct sars-cov-2 interacting proteins and the target proteins of interest.
second, we did not allow loops in our path from sars-cov-2 proteins to the target proteins, to reduce the computational complexity. although feedback loops have been reported in previous studies 80, they are still relatively rarely observed in the human protein-protein network 81. last, we assumed that the interaction score between two proteins is proportional to the strength (or the likelihood) of their interaction. a larger interaction score represents either a stronger or more likely interaction, which results in a "stronger" connection edge in both cases.
the objective of the analysis was to find the strongest (or most likely) "connecting" path from sars-cov-2 proteins to the target proteins (genes) of interest in the constructed network, where the connection strength was quantified by a "connectivity score". we formulated the above problem as a "shortest path problem", which we solved using dijkstra's algorithm (with a quadratic time complexity in the number of vertices).
the above optimization strategy relies heavily on the strong edges (interactions with high scores). the preference for high-score edges may lead to oversized paths composed of only high-score edges. to avoid oversized paths, we penalized/constrained the length of the path (number of edges in the path) while minimizing the connectivity score (a smaller connectivity score represents stronger connectivity). here, we revised the aforementioned optimization problem into a 2-pass strategy. in the first pass, we find all the shortest paths X(s,t) (with the same path length) that connect sars-cov-2 proteins to the target proteins of interest, without considering the edge
weights (interaction scores) in the graph. in the second pass, we find, among the candidate paths X(s,t) from the first pass, the path x(s,t) that produces the minimal connectivity score connectivity_constrained(s,t), taking the edge weights into consideration only for those candidate paths.
we have packaged all the code into a tool named genelist2covid19, which is freely available for academic uses.
ranking diseases using forward
we developed forward to rank a set of diseases (with known associated genes) based on their relevance to a set of genes (here, hlh genes). this method works on the principles of random walk with restarts (rwr) for ranking genes and gene sets on heterogeneous networks 33,79,82, and enables integration of gene-level interactions to rank a set of diseases, with known associated genes, based on their relevance.
forward requires three types of inputs (sup. fig. 1): 1) a set of diseases along with the genes associated with each disease and the score of the gene-disease associations (optional), 2) a gene interaction network (e.g. co-expression, protein-protein interaction, etc.), and 3) a set of query genes. using these inputs, forward first generates a heterogeneous network comprising gene-gene edges and disease-gene edges, with normalized edge weights representing the strength of the gene-gene interaction and the strength of evidence for the gene-disease interaction (e.g. from the disgenet database). then, the query set is superimposed on this network and is used as the restart set in an rwr algorithm. using rwr in this algorithm allows us to capture topological information within the network both locally (the neighborhood surrounding the query set) and globally. after the convergence of the rwr, the steady-state probabilities of the disease nodes represent their relevance to the query set.
in order to correct for the network bias (i.e. to avoid diseases with a large number of associated genes being ranked highly independent of their relevance to the query set), we run the rwr one more time with all the genes in the network as the restart set, providing a background steady-state probability for each disease node. the difference between the steady-state probabilities of these two rwrs is then normalized between 0 and 1. more specifically, let $d_i$ represent the difference between the steady-state probabilities of the two rwrs for disease $i$, where $i = 1, 2, \dots, n$ and $n$ is the total number of diseases to be ranked (note that $-1 \le d_i \le 1$). also, let $d_{\max} = \max_i |d_i|$. the normalized disease score (nds) for the $i$-th disease is then obtained by rescaling $d_i$ by $d_{\max}$ into the range from 0 to 1. it is important to note that nds values above 0.5 reflect diseases whose similarity score with respect to the query set is larger than their similarity score with respect to all genes (i.e. the background).
the rwr (which we used in forward) is an algorithm for scoring the similarity between any given node of a weighted network and a query set of nodes. starting from some initial node, at each step the random walker moves to an adjacent node with a probability proportional to the edge weight connecting the two nodes, and with some probability (known as the probability of restart) it jumps to one of the nodes in the query set (also known as the restart set). the restart probability controls the balance between the influence of the local topology of the network (surrounding the query set) and that of its global topology. we used a restart probability of 0.5 to balance the influence of these two factors.
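the scoring procedure described in this section can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the forward implementation: the function names (`build_adjacency`, `rwr`, `normalized_disease_scores`) and the toy network are hypothetical, the network is assumed undirected with positive weights and every node having at least one edge, and the rescaling $\mathrm{nds}_i = \frac{1}{2}\left(\frac{d_i}{d_{\max}} + 1\right)$ is an assumption inferred from the stated properties (range 0 to 1, with 0.5 marking the background level), since the exact expression is not given here.

```python
def build_adjacency(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def rwr(adj, restart_set, restart_prob=0.5, tol=1e-12, max_iter=10_000):
    """Steady-state visiting probabilities of a random walk with restart.

    At each step the walker jumps to a uniformly chosen restart node with
    probability `restart_prob`; otherwise it moves to a neighbour with
    probability proportional to the connecting edge weight.
    """
    nodes = list(adj)
    strength = {u: sum(adj[u].values()) for u in nodes}  # total edge weight per node
    p0 = {u: (1.0 / len(restart_set) if u in restart_set else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart_prob * p0[u] for u in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart_prob) * p[u] * w / strength[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


def normalized_disease_scores(adj, diseases, query_genes):
    """Query-vs-background difference per disease, rescaled to [0, 1] (assumed form)."""
    genes = {node for node in adj if node not in diseases}
    p_query = rwr(adj, set(query_genes))
    p_background = rwr(adj, genes)  # all genes as restart set corrects for network bias
    d = {dis: p_query[dis] - p_background[dis] for dis in diseases}
    d_max = max(abs(x) for x in d.values())
    if d_max == 0:
        return {dis: 0.5 for dis in diseases}
    return {dis: 0.5 * (d[dis] / d_max + 1.0) for dis in diseases}


# Toy heterogeneous network: a chain of four genes with one disease at each end.
adj = build_adjacency([
    ("g1", "g2", 1.0), ("g2", "g3", 1.0), ("g3", "g4", 1.0),
    ("D1", "g1", 1.0), ("D2", "g4", 1.0),
])
nds = normalized_disease_scores(adj, {"D1", "D2"}, ["g1"])
```

in this toy network the query gene g1 sits next to disease D1, so D1 scores above 0.5 and the distant D2 scores below it; the background run is what prevents a disease with many associated genes from ranking highly regardless of the query.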
software availability
the software genelist2covid19 is written in python and is available as an open-source tool on github (https://github.com/phoenixding/genelist2covid19). an implementation of forward in python is freely available on github (https://github.com/ddhostallero/forward). these github repositories include the source code as well as detailed instructions on how to install and use the methods.
contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.
 | hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency | 0.500 | (-) | none
10 | bleeding disorder, platelet-type, 15 | 0.500 | (-) | none
abbreviations used: nets = neutrophil extracellular traps.
a: a "+" sign indicates demonstrated net release in the disease, a "-" sign a deficiency in nets. brackets "( )" around the "+" or "-" signs indicate prediction without published data.
network shows all the paths connecting the sars-cov-2 proteins to the hlh proteins (genes). the red nodes represent the sars-cov-2 proteins, the yellow nodes are the human host proteins that directly interact with sars-cov-2 proteins, the green nodes are the intermediate interacting host proteins, and the blue nodes denote the target hlh proteins (genes). the edge weights in the network represent the interaction strength (or probability).
figure 2. hlh genes are significantly enriched within the sars-cov-2 host protein interactome. a connectivity score was calculated for each of the genes of interest (e.g. hlh genes in this work). we further analyzed the network connectivity of all genes to the sars-cov-2 proteins (or randomly picked genes). with these two analyses, we ended up with two lists of connectivity scores: sa (for hlh genes) and sb (for all background genes). then, we calculated the statistical significance (p-value) using a one-sided mann-whitney rank test to determine whether sa is significantly smaller than sb (stronger connectivity). a significant p-value implies that the list of proteins (genes) of interest is "significantly connected" to the sars-cov-2 proteins. a) a list of 10 known covid19 related genes (differential genes in severe covid19 patients) has statistically stronger connections to the sars-cov-2 proteins compared with all background genes (p-value=0.017). b) a list of 100 random genes does not "significantly" connect to the sars-cov-2 proteins.
c) the 23 male infertility genes do not "significantly" connect to the sars-cov-2 proteins. d) the 11 hlh genes have statistically (p-value=0.00821) stronger connections to the sars-cov-2 proteins (compared with all background genes). e) a list of 45 vascular angiogenesis genes linked to both h1n1 and sars-cov-2 pulmonary infections significantly (p-value=4.89e-5) connects to the sars-cov-2 proteins. the 11 hlh genes have the smallest mean/median connectivity score compared to all the gene lists analyzed. please note that the p-values here only indicate whether the input gene lists have significantly smaller connectivity scores than all the background genes, and they could be affected by the size of the gene list. to compare the strength of the "connectivity" of input gene lists to sars-cov-2 proteins, we should also look at the mean (represented by a green triangle) and the median (represented by a vertical line) connectivity scores.
figure 3. differential expression of hlh genes in covid19 associated health conditions. gene on the log fold change (condition vs control). genes whose fold change was among the top 25% were classified as high (h) and those whose fold change was among the bottom 25% were classified as low (l). finally, the 11 hlh genes were assessed to determine whether they are among the h or l genes in each condition. the red blocks represent hlh genes that were highly expressed (h, top 25%) in condition (vs. control) while the blue blocks represent hlh genes that were lowly expressed (l, bottom 25%) in condition (vs. control). we compared the hlh genes and all background genes in terms of h/l expression under various conditions.
we first counted the number of h/l conditions (differentially expressed between the condition and control) for each of the hlh genes, and then for each of the background genes. next, we used a one-sided mann-whitney rank test to determine whether the hlh genes have significantly (p-value<0.05) larger absolute fold changes (i.e., are differentially expressed) in covid19 associated conditions compared to all the background genes. the average number of h/l conditions for hlh genes (red or blue blocks) is 3.82, which is significantly larger (one-sided rank-sum test p-value=1.01e-4) than the average number of h/l conditions for all genes (1.70).
figure 4. expression of hlh genes in control, inactive sjia and active sjia neutrophils. gene expression of hlh-sars-cov-2 and positive covid-19 genes (supplementary table 1) in sjia was calculated from geo series gse122552 47 . data were mapped to the hg38 genome and normalized by reads per kilobase per million (rpkm). values for hlh genes were displayed for control and sjia patients that were either in remission (inactive sjia) or had active symptoms (active sjia).
figure 5. model of net-mediated endothelial damage contributing to pulmonary vascular thrombosis in severe covid-19. infection by sars-cov-2 in vulnerable populations will lead to hyperinflammation, arising from underlying genetic mutations, specific epigenetic landscapes or external factors, that will result in the increased circulation of acute phase reactants such as crp and pro-inflammatory cytokines associated with neutrophilia like il-6, il-17a/f and cxcl8 (il-8). il-17a activates the endothelium to induce neutrophil adhesion 93 , while the increase in crp can trigger the release of nets, resulting in damage to the endothelium as well as aggregation and activation of platelets.
additionally, the presence of sars-cov-2 e protein in type ii pneumocytes could disturb the surfactant cargo via its interaction with ap3b1, leading to impaired secretion of sp-d and greater net formation by septal and intra-alveolar neutrophils, increasing the risk of thrombosis in the pulmonary microvasculature. in some predisposed patients the combination of these mechanisms will lead to severe covid-19 complications. the identification of mediators of this pro-coagulation cascade is essential in achieving the two-fold task of identifying vulnerable populations and developing a personalized medicine approach.
the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd 4.0 international license. this version posted july 2, 2020. https://doi.org/10.1101/2020.07.01.20144121
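the one-sided mann-whitney rank test used in the figure 2 caption (is sa significantly smaller than sb?) can be sketched in plain python. this is a minimal illustration, not the genelist2covid19 code: it uses the normal approximation with tie and continuity corrections, the function name is illustrative, and the score lists below are invented toy values. in practice a library routine such as scipy.stats.mannwhitneyu with alternative="less" would normally be used instead.

```python
import math


def mann_whitney_less(a, b):
    """One-sided test of whether values in `a` tend to be smaller than in `b`.

    Returns (U statistic for `a`, p-value from the normal approximation).
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u + 0.5) / math.sqrt(var_u)   # continuity correction
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return u_a, p_value


# Invented connectivity scores: smaller means stronger connectivity.
sa_scores = [0.1, 0.2, 0.3]
sb_scores = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
u_stat, p = mann_whitney_less(sa_scores, sb_scores)
print(u_stat, p)
```

as the caption itself cautions, the p-value depends on the size of the gene list, so it is best read alongside the mean and median connectivity scores.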
abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia
clinical course and risk factors for mortality of adult inpatients with covid-19 in wuhan, china: a retrospective cohort study
high incidence of venous thromboembolic events in anticoagulated severe covid-19 patients
high risk of thrombosis in patients with severe sars-cov-2 infection: a multicenter prospective cohort study
complement associated microvascular injury and thrombosis in the pathogenesis of severe covid-19 infection: a report of five cases
large-vessel stroke as a presenting feature of covid-19 in the young
imbalanced host response to sars-cov-2 drives development of covid-19
covid-19: consider cytokine storm syndromes and immunosuppression
neutrophil extracellular traps in covid-19
clinical course and risk factors for mortality of adult inpatients with covid-19 in wuhan, china: a retrospective cohort study
clinical predictors of mortality due to covid-19 based on an analysis of data of 150 patients from wuhan, china. intensive care
weathering the covid-19 storm: lessons from hematologic cytokine syndromes
hemophagocytic lymphohistiocytosis induced by severe pandemic influenza a (h1n1) 2009 virus infection: a case report.
case rep. med. 2011, 951910 analysis of fatal cases of pandemic influenza a (h1n1) virus infections in 667 pediatric patients with leukemia hemophagocytic lymphohistiocytosis associated with 2009 pandemic 669 influenza a (h1n1) virus infection report of a fatal pediatric case of 671 pulmonary vascular endothelialitis, thrombosis, and angiogenesis in 674 adult haemophagocytic syndrome an outbreak of severe kawasaki-like disease at the italian epicentre of the 678 -2 epidemic: an observational cohort study outbreak of kawasaki disease in children during covid-19 pandemic: a 681 prospective observational study in kawasaki-like disease: emerging complication during the 684 covid-19 pandemic hyperinflammation, rather than hemophagocytosis, is the 686 common link between macrophage activation syndrome and hemophagocytic 687 lymphohistiocytosis a sars-cov-2 protein interaction map reveals targets for drug 689 repurposing advances in the pathogenesis of primary and 693 secondary haemophagocytic lymphohistiocytosis: differences and similarities host-viral infection maps reveal signatures of severe covid-19 patients key functional genes of spermatogenesis identified by 698 microarray analysis incidence, clinical characteristics and prognostic factor of patients with 700 covid-19: a systematic review and meta-analysis sars-cov-2 response signaling and regulatory networks the role of rab27a in the regulation of neutrophil function rab27a and rab27b regulate neutrophil azurophilic granule exocytosis 708 and nadph oxidase activity by independent mechanisms localization of the ap-3 adaptor complex defines a novel endosomal exit 711 site for lysosomal membrane proteins knowledge-guided analysis of 'omics' data using the knoweng cloud 713 platform string v11: protein-protein association networks with increased 715 coverage, supporting functional discovery in genome-wide experimental datasets humannet v2: human gene networks for disease research mechanisms of action of ruxolitinib 
in murine models of hemophagocytic 720 lymphohistiocytosis neutrophil extracellular traps in immunity and disease covid-19 and kawasaki disease: novel virus and novel case enhanced formation of neutrophil extracellular traps in kawasaki disease neutrophil extracellular traps induce 728 aggregation of washed human platelets independently of extracellular dna and histones an emerging role for neutrophil extracellular traps in noninfectious 731 disease neutrophil cytoplasts induce th17 differentiation and skew 733 inflammation toward neutrophilia in severe asthma role of c-reactive protein at sites of inflammation and 735 infection secondary hemophagocytic lymphohistiocytosis in pediatric patients: a 737 single center experience and factors that influenced patient prognosis neutrophils in pediatric autoimmune disease the role of extracellular histones in systemic-onset 742 juvenile idiopathic arthritis neutrophils from children with systemic juvenile idiopathic arthritis 744 exhibit persistent proinflammatory activation despite long-standing clinically inactive 745 blueprint and covid-19 global substrate profiling of proteases in human neutrophil 748 extracellular traps reveals consensus motif predominantly contributed by elastase neutrophil extracellular traps contain calprotectin, a cytosolic protein 751 complex involved in host defense against candida albicans the disgenet knowledge platform for disease genomics which white blood cell subtypes predict increased cardiovascular 756 risk? 
crp induces netosis in heart failure patients with or without diabetes targeting potential drivers of covid-19: neutrophil extracellular traps primary tumors induce neutrophil extracellular traps with targetable 762 metastasis promoting effects neutrophil extracellular traps sequester circulating tumor cells and 764 promote metastasis fibrosis lung disease from childhood to adulthood: neutrophils trap (net) formation, and net degradation neutrophil extracellular traps in tissue and periphery in juvenile 774 neutrophil extracellular traps in pediatric inflammatory bowel disease neutrophil extracellular traps (nets) exacerbate severity of infant sepsis association of blood glucose control and outcomes in patients with covid-780 19 and pre-existing type 2 diabetes dementia care during covid-19 epidemiology of covid-19 in a long-term care facility in king 784 storm, typhoon, cyclone or hurricane in 786 patients with covid-19? beware of the same storm that has a different origin identification of a homozygous deletion in the ap3b1 gene causing 789 altered trafficking of lysosomal proteins in hermansky-pudlak syndrome due to 792 mutations in the î²3a subunit of the ap-3 adaptor subcellular location and topology of severe acute respiratory 794 syndrome coronavirus envelope protein coronavirus envelope protein: current knowledge sars-cov-2 receptor ace2 is an interferon-stimulated gene in human 798 the alveolar epithelium determines susceptibility to lung fibrosis in 801 pathological evidence for residual sars-cov-2 in pulmonary tissues of a 803 ready-for-discharge patient coronin 1a, a novel player in integrin biology, controls neutrophil trafficking 805 in innate immunity î²2 integrin mediates hantavirus-induced release of neutrophil 807 extracellular traps complement activation contributes to severe acute respiratory 809 high level of neutrophil extracellular traps correlates with poor prognosis 811 key: cord-103505-9adtbwp2 authors: hale, a. 
t.; zhou, d.; bastarache, l.; wang, l.; zinkel, s. s.; schiff, s. j.; ko, d. c.; gamazon, e. r. title: the genetic architecture of human infectious diseases and pathogen-induced cellular phenotypes date: 2020-07-21 journal: nan doi: 10.1101/2020.07.19.20157404 sha: doc_id: 103505 cord_uid: 9adtbwp2 infectious diseases (id) represent a significant proportion of morbidity and mortality across the world. host genetic variation is likely to contribute to id risk and downstream clinical outcomes, but there is a need for a genetics-anchored framework to decipher molecular mechanisms of disease risk, infer causal effect on potential complications, and identify instruments for drug target discovery. here we perform transcriptome-wide association studies (twas) of 35 clinical id traits in a cohort of 23,294 individuals, identifying 70 gene-level associations with 26 id traits. replication in two large-scale biobanks provides additional support for the identified associations. a phenome-scale scan of the 70 gene-level associations across hematologic, respiratory, cardiovascular, and neurologic traits proposes a molecular basis for known complications of the id traits. using mendelian randomization, we then provide causal support for the effect of the id traits on adverse outcomes. the rich resource of genetic information linked to serologic tests and pathogen cultures from bronchoalveolar lavage, sputum, sinus/nasopharyngeal, tracheal, and blood samples (up to 7,699 positive pathogen cultures across 92 unique genera) that we leverage provides a platform to interrogate the genetic basis of compartment-specific infection and colonization. to accelerate insights into cellular mechanisms, we develop a twas repository of gene-level associations in a broad collection of human tissues with 79 pathogen-exposure induced cellular phenotypes as a discovery and replication platform. 
cellular phenotypes of infection by 8 pathogens included pathogen invasion, intercellular spread, cytokine production, and pyroptosis. these rich datasets will facilitate mechanistic insights into the role of host genetic variation on id risk and pathophysiology, with important implications for our molecular understanding of potentially severe phenotypic outcomes. the genetic basis of infectious disease (id) risk and severity has been relatively a schematic diagram illustrating our study design and the reference resource we provide 130 can be found in figure the id-associated genes tend to be less tissue-specific (i.e., more ubiquitously 212 expressed) than the remaining genes ( figure s1a , mann whitney u test on the  statistic, p = 213 7.5x10 -4 ), possibly reflecting the multi-tissue predixcan approach we implemented, which 214 prioritizes genes with multi-tissue support to improve statistical power, but also the genes' 215 pleiotropic potential. we hypothesized that tissue expression profiling of id-associated genes 216 can provide additional insights into disease etiologies and mechanisms. for example, the 217 intestinal infection associated gene ndufa4 is expressed in a broad set of tissues, including 218 the alimentary canal, but displays relatively low expression in whole blood ( figure s1b ). in 219 addition, tor4a, the most significant association with bacterial pneumonia (table 1) table 5 ). these data identify specific molecular mechanisms across id traits with critical 239 regulatory roles (e.g., protein modifications) in host response among the id-associated genes. we tested the hypothesis that distinct infectious agents exploit common pathways to find 241 a compatible intracellular niche in the host, potentially implicating shared genetic risk factors. notably, 64 of the 70 id-associated genes ( notably, we identified an enrichment (fdr = 9.68x10 -3 ) for a highly conserved motif figure 7a ). 
in addition, we identified significantly more replicated snp our study provides a reference atlas of genetic variants and genetically-determined integrating predicted transcriptome from multiple tissues improves association detection. plos genet 15, e1007889. human genomics. the genotype-tissue expression (gtex) pilot analysis: multitissue ojala the apoptotic v-cyclin-cdk6 complex phosphorylates and inactivates bcl-2 survival of tissue-resident memory t cells requires exogenous lipid 708 uptake and metabolism conversion of p35 to p25 deregulates cdk5 activity and promotes neurodegeneration cdk5 deletion enhances the anti-inflammatory potential of gc-mediated gr activation during inflammation. 714 frontiers in immunology 10 the human papillomavirus type 716 16 e7 gene encodes transactivation and transformation functions similar to those of adenovirus 717 e1a microbial genome-wide association studies: 719 lessons from human gwas principal components analysis corrects for stratification in genome-wide association studies the ability to replicate in macrophages is conserved between 724 yersinia pestis and yersinia pseudotuberculosis genome-wide methylation analysis and epigenetic unmasking 727 identify tumor suppressor genes in hepatocellular carcinoma development of a large-scale de-identified dna biobank to enable 731 personalized medicine atg9a controls dsdna-driven dynamic translocation of 734 sting and the innate immune response quantitative proteomics reveals metabolic and pathogenic properties of chlamydia 737 trachomatis developmental forms human immune disorder arising 739 from mutation of the alpha chain of the interleukin-2 receptor contrasting the genetic architecture of 30 complex traits from summary association data analysis of genome-wide association data highlights candidates for drug repositioning in 745 psychiatry phosphoproteomics to characterize host response during influenza a virus infection of human macrophages phosphoproteomic analyses reveal 
signaling pathways that facilitate lytic 752 gammaherpesvirus replication gene set enrichment 755 analysis: a knowledge-based approach for interpreting genome-wide expression profiles lactic acidosis in sepsis: it's not all anaerobic: 758 implications for diagnosis and management diagnostic value of muc1 and epcam mrna as tumor markers in differentiating benign from malignant pleural effusion structure and regulation of the cdk5-p25(nck5a) complex essential function for the nuclear protein akirin2 in b cell activation and humoral 766 immune responses akirin2 is critical for inducing 769 inflammatory genes by bridging ikappab-zeta and the swi/snf complex investigating the possible 773 causal association of smoking with depression and anxiety using mendelian randomisation 774 meta-analysis: the carta consortium subversion of the actin cytoskeleton 776 during viral infection genome-wide association and hla region fine-mapping studies identify susceptibility 779 loci for multiple common infections. nat commun 8, 599. known genetic associations and discovery of new genetic disorders pathogen culture and virology data linked to whole-genome genetic information 1017 the sd consists of a wide range of clinical microbiological data. for individuals with 1018 whole-genome genetic information, we analyzed pathogen (bacterial, mycobacterial, and fungal) 1019 culture data derived from the following positive cultures for the indicated clinical samples 2) sputum (n = 2,478), 3) sinus/nasopharyngeal (n = 1,820), 4) bronchial-1021 alveolar lavage (n = 1,265), and 5) tracheal sampling (n = 422). furthermore, we analyzed a 1022 respiratory panel containing 28 viral strains from 2,890 individuals with whole-genome genetic 1023 information. 
viral strains included the following: 1) adenovirus, 2) bocavirus, 3) bordetella 1024 parapertussis, 4) bordetella pertussis, 5) chlamydia pneumoniae, 6) coronavirus coronavirus hku1, 8) coronavirus nl63, 9) coronavirus nos, 10) coronavirus oc43 12) human metapneumovirus, 13) influenza a, 14) influenza a, h1 16) influenza a, h3, 17) influenza b, 18) mycoplasma pneumoniae 20) parainfluenza 1, 21) parainfluenza 2, 22) parainfluenza 3, 23) parainfluenza 1029 4, 24) respiratory syncytial virus (rsv), 25) rsv, a, 26) rsv, b, and 27) rhinovirus. the 1030 pathogen information for each individual in our study included: 1) total number of cultures number of ambiguous cultures (i.e., 1032 normal upper respiratory bacteria or low level contamination); 4) number of positive cultures 1033 (i.e., the number of cultures with growth consistent with clinical infection); 5) genus or genera 1034 isolated (up to 96 unique genera per sample site), which ranged from zero to 10 per sample. 1035 1036 methods details 1037 gwas of id traits (ivs) only biallelic non-palindromic variants were considered as ivs mr-egger 1091 regression generalizes the inverse-variance weighted method, where the intercept is assumed 1092 to be zero. we also used the weighted-median estimator high-throughput human in vitro susceptibility testing about/, and phenotype 1101 definitions and family-based gwas of the hi-host phenome project were previously 1102 described 2015) were obtained from the coriell institute. the lcls represented 1104 diverse populations, including esn (esan in nigeria lcls were cultured in rpmi 1640 media containing 10% fetal bovine serum, 2 mm glutamine salmonella 1110 infection was performed using pmmb67gfp (pujol and bliska, 2003), and sifa deletion was 1111 constructed using lambda red and validated using pcr correction for the total number of genes tested (n = 9,868) across 35 phenotypes (i.e., p < 1058 1.4x10 -7 ). 
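the experiment-wide threshold quoted above follows from dividing a family-wise error rate by the number of gene-trait tests. a quick check in python, assuming a family-wise alpha of 0.05 (this value is not stated in the passage, but it reproduces the reported threshold); the function name is illustrative:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05; 9,868 genes tested across 35 phenotypes.
experiment_wide = bonferroni_threshold(0.05, 9868 * 35)
print(f"{experiment_wide:.1e}")  # prints 1.4e-07
```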
trait-specific significance was determined using bonferroni correction for the total 1059 number of genes tested (n = 9,868, p < 5.07x10 -6 ). genomic ancestry was quantified using key: cord-352190-1987sfyz authors: xia, hongyue; li, xibao; zhao, wenliang; jia, shuran; zhang, xiaoqing; irwin, david m.; zhang, shuyi title: adaptive evolution of feline coronavirus genes based on selection analysis date: 2020-08-13 journal: biomed res int doi: 10.1155/2020/9089768 sha: doc_id: 352190 cord_uid: 1987sfyz purpose: we investigated sequences of the feline coronaviruses (fcov), which include feline enteric coronavirus (fecv) and feline infectious peritonitis virus (fipv), from china and other countries to gain insight into the adaptive evolution of this virus. methods: ascites samples from 31 cats with suspected fip and feces samples from 8 healthy cats were screened for the presence of fcov. partial viral genome sequences, including parts of the nsp12-nsp14, s, n, and 7b genes, were obtained and aligned with additional sequences obtained from the genbank database. bayesian phylogenetic analysis was conducted, and the possibility of recombination within these sequences was assessed. analysis of the levels of selection pressure experienced by these sequences was assessed using methods on both the paml and datamonkey platforms. results: of the 31 cats investigated, two suspected fip cats and one healthy cat tested positive for fcov. phylogenetic analysis showed that all of the sequences from mainland china cluster together with a few sequences from the netherlands as a distinct clade when analyzed with fcov sequences from other countries. fewer than 3 recombination breakpoints were detected in the nsp12-nsp14, s, n, and 7b genes, suggesting that analyses for positive selection could be conducted. 
a total of 4, 12, 4, and 4 positively selected sites were detected in the nsp12-nsp14, s, n, and 7b genes, respectively, with the previously described site 245 of the s gene, which distinguishes fipv from fecv, being a positive selection site. conversely, 106, 168, 25, and 17 negative selection sites in the nsp12-14, s, n, and 7b genes, respectively, were identified. conclusion: our study provides evidence that the fcov genes encoding replicative, entry, and virulence proteins potentially experienced adaptive evolution. a greater number of sites in each gene experienced negative rather than positive selection, which suggests that most of the protein sequence must be conservatively maintained for virus survival. a few of the sites showing evidence of positive selection might be associated with the more severe pathology of fipv or help these viruses survive other harmful conditions. feline coronaviruses (fcov) belong to the genus alphacoronavirus within the subfamily coronavirinae of the family coronaviridae in the order nidovirales [1] . fcov include the feline enteric coronavirus (fecv) and the feline infectious peritonitis virus (fipv) [2] . similar to other coronaviruses, such as the sars and mers viruses, fipv infections are distributed worldwide and can cause a fatal pathogenic disease fip in their hosts, thus, seriously endangering the life and health of cats [3] . however, the more common variant, fecv, causes an asymptomatic or mild enteric infection [3] . fipv and fecv are antigenically divided into two types (i and ii) based on a difference in the nucleotide sequence of the s gene, which encodes the spike protein [4] . most natural cases of feline coronavirus infection are type i fcov; however, these viruses poorly propagate in cell culture, whereas type ii fcov viruses can grow in several different cell lines [5] . the pathogenesis of fip is not yet completely understood. 
it has been suggested that large viral quasispecies of fcov, which arise from copying errors of its rna genome, may destroy the weak immune system found in some individuals, leading to fip [6, 7]. the coronavirus spike (s) glycoprotein is a typical class 1 viral fusion protein and plays a central role in receptor binding and viral entry [8]. in addition to the s gene, there are several other genes in the fcov genome. the nonstructural proteins nsp12, nsp13, and nsp14 are the rna-dependent rna polymerase, the helicase, and the exoribonuclease, respectively, which are all essential for genome replication. the n gene encodes the nucleocapsid and is commonly used in phylogenetic analysis [9]. the 7b gene is a small orf that is located downstream of the n gene and is important for virulence [10]. compared to the more predominant type i fcov, type ii fcov viruses are only found in 2-30% of infections [11]. at present, a high incidence of type i fcov occurs in europe, japan, australia, korea, and the usa. however, fipv cases in japan and taiwan are more frequently associated with type ii fcov [5, 12], suggesting a difference in the geographical distribution of the different serotypes of fcov. to date, almost all fcov strains isolated from china are type i [13]. the goal of our study is to increase the sampling of fcov in china and also to examine the selective pressures acting on the genes of these viruses isolated from different parts of the world. to study this, we obtained ascites samples from 31 cats with suspected fip as well as feces samples from 8 healthy cats. these samples were collected at several pet hospitals in liaoning province, china, during the period october 2017 to may 2019. of these 39 samples, 3 were found to be positive for fcov.
the adaptive evolutionary properties of fcov, including selective pressure, were systematically analyzed using paml and datamonkey for the key fcov functional proteins involved in viral entry, 1 replication, and virulence. the aim of this study was to provide insight into the adaptive evolution of fcov, which might provide insight into their pathogenic mechanisms. 2.1. sampling. samples were collected from veterinary hospitals in liaoning province, china, between october 2017 and may 2019. a total of 39 samples were obtained, with 31 being from cats with suspected fip, as they had clinical symptoms such as loss of appetite and weight and increased abdominal girth with peritoneal effusion and/or pleural effusion [11] that are associated with this disease. some, but not all, of the cases were examined for clinical hematologic and biochemical analysis. effusions were collected by needle and syringe puncture guided by ultrasound. in addition, 8 fresh feces samples were collected from healthy cats using anal swabs, which were suspended in pbs and then stored at -80°c. 2.2. viral rna and reverse transcription. viral rna was extracted from 140 μl of effusion or feces suspension with the qiaamp viral rna minikit (qiagen, shenyang, china), following the manufacturer's instructions, and stored at -80°c. extracted rna was used as the template for cdna synthesis with primescript™ ii 1st strand cdna synthesis kit (takara, china) with random hexamers, following the manufacturer's instructions. 2.3. pcr, cloning, and sequencing. to amplify the s gene, we designed primers based on available feline coronavirus sequences. the partial s gene was divided into two segments, with primers designed separately for each part. the best primers and reaction conditions for each pcr reaction were selected using gradient pcr. primer pairs s2b2-f and s2b2-r and sc1-f and sc1-r were used to amplify the 5 ′ and 3 ′ ends of the s gene, respectively. 
pcr was performed using 2 μl of cdna in a 10 μl reaction containing 1 mmol/l concentration of each primer and 5 μl of 2x es taq mastermix (beijing comwin biotech co., ltd.). to amplify partial nsp12, nsp13, and nsp14 gene sequences, we designed primer pairs 1b3f and 1b1r and 1b6f and 1b6r. to amplify part of the n gene, primer pair n1 and n2 was used [14] . to amplify the 7b gene, the previously designed primer pair 7b-f1 and 7b-r1 was used [3] . all pcr reactions had the same preheating temperature of 94°c and an extension temperature of 72°c. annealing temperatures of the different amplifications are listed in table 1 . products of the amplifications were separated by electrophoresis, with dna from appropriate bands extracted, cloned, and sequenced (sangon biotech, shanghai, china) to confirm virus detection. as previously described [15] , type i fcov is entirely a feline virus; however, type ii fcov is a recombinant of type i fcov and a canine coronavirus (ccov) that resulted in a fcov genome containing the s gene and parts of the adjacent genes from ccov. this results in the recombinant type ii fcov s gene having a different size from the type i s gene. thus, amplification of the s gene allows typing of the fcov virus. studies. in addition to the fcov genomes sequenced in this study, genome sequences that contained all of the 5,895 bases that we amplified in our sequences and represent the diversity found in several select countries (united states, united kingdom, netherlands, and belgium) were downloaded from the genbank database for analysis. the accession numbers for these sequences are listed in supporting information table s1 . to establish the phylogenetic relationships of these viruses, conserved regions of the nsp12, nsp13, nsp14, s, 7b, and n genes were concatenated into a single sequence. from the 24 fcov sequences from mainland china and other countries, an alignment of 5,895 bases was generated using clustalw as implemented in mega 6.0. 
bayesian phylogenetic trees, based on the nucleotide sequences, were constructed using mrbayes 3.1 with 5,000,000 generations, sampled every 100 generations (mcmcp samplefreq=100), a burn-in of 20,000 generations, 4 chains, and the best-fit substitution model gtr+i+g, which had been selected using jmodeltest [16]. canine cov (ccov) sequences (accession numbers kc175339.1 and jq404410.1) were used as the outgroup to root the trees. since recombination can influence the detection of positive selection, we assessed whether recombination had occurred within our aligned dataset using the gard (genetic algorithm for recombination detection) method [17]. a model selection procedure was run for each gene (the nsp12, nsp13, and nsp14 genes were considered to be a single gene), which sifts through all 203 possible time-reversible models in a hierarchical testing procedure combining nested lrt tests with aic selection to pick a single "best-fitting" rate matrix, with site-to-site rate variation accounted for by the β-γ distribution [17]. the best-fitting substitution model for the s gene was the trn model; the best-fitting models for the other three regions have no conventional name and are identified by their model codes: (010230) with an aic of 20622.10 for nsp12-14, (010230) with an aic of 5874.10 for n, and (012232) with an aic of 5108.58 for 7b. to detect the presence of positive selection in the fcov sequences from the different countries, we applied the branch, site, and branch-site tests from the paml suite [18]. values of ω (the nonsynonymous/synonymous rate ratio) greater than 1 suggest positive selection. p values were calculated with likelihood ratio tests (lrt), in which a model that allows positive selection is compared with a null model that does not; the null model was rejected when the p value was < 0.05. the branch model detects positive selection acting on particular lineages [19, 20]. 
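the likelihood ratio test described above can be sketched in a few lines. this is a minimal illustration, not the authors' pipeline: the log-likelihood values are hypothetical, the function name `lrt_p_value` is our own, and the degrees of freedom (commonly 1 for one-ratio versus two-ratio, and 2 for m1 versus m2 or m7 versus m8) are an assumption based on standard paml usage rather than something stated in the text. the chi-square tail probability is computed in closed form for 1 and 2 degrees of freedom so that only the standard library is needed.

```python
import math

def lrt_p_value(lnl_null, lnl_alt, df):
    """Likelihood ratio test for two nested models.

    lnl_null, lnl_alt: log-likelihoods of the null and alternative models.
    df: difference in the number of free parameters (1 or 2 in this sketch).
    Returns (test statistic, p-value) using the chi-square approximation.
    """
    # Test statistic: twice the log-likelihood difference, floored at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square(1) survival function: P(Z^2 > x) = erfc(sqrt(x / 2)).
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square(2) survival function: exp(-x / 2).
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 or 2 are handled in this sketch")
    return stat, p

# Hypothetical log-likelihoods for a null model and a selection model.
stat, p = lrt_p_value(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}, reject null: {p < 0.05}")
```

a closed-form tail probability keeps the example dependency-free; for other degrees of freedom a statistics library would be the natural choice.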
a variety of models, including one ratio, free ratio, and two ratios (where the foreground lineages should be labeled), were analyzed. comparing the free ratio and one ratio models examines whether the ω ratios differ among lineages. comparing the two ratio and free ratio examines whether the ω ratios are different between the foreground and background lineages. the site models allow different ω ratios among sites [21, 22]. the one ratio model m0 assumes that the same ω exists for all sites across the phylogeny, while the nearly neutral model m1, positive selection model m2, discrete model m3, beta model m7, and beta and ω model m8 assume 2, 3, 3, 10, and 11 classes of codons, respectively, with different ω values, including some that suggest positive selection. when a site class with ω greater than 1 was found, individual sites with a posterior probability > 95% of belonging to that class were identified as positively selected. posterior probabilities were calculated by naïve empirical bayes (neb) and bayes empirical bayes (beb) [21]. selection pressure was also analyzed through the datamonkey suite of programs, including fixed effects likelihood (fel; p < 0.05), random effects likelihood (rel; bayes factor > 50), and mixed-effects model of evolution (meme; p < 0.05). we considered a site to be under positive or negative selection when detected as such by at least two different methods. fcov. these three positive cats were derived from different households. all of the positive cases were classified as type i fcov, as the amplified s gene products had the size expected for type i, rather than type ii, fcov. symptoms such as fever, anorexia, loss of weight, panting, and abdominal distension were observed in the suspected cases, and the two fip-positive cats subsequently died within 10 days of admission to the hospital, while the healthy cat remained asymptomatic. 
nsp12, nsp13, nsp14, s, 7b, and n gene clones were obtained from these three positive samples for sequencing and were used in the following analyses. accession numbers for the fcov sequences obtained in this study are provided in supporting information table s2. analysis. an alignment of 5,895 bases, containing the partial nsp12, nsp13, nsp14, s, and n genes and the complete 7b gene sequence, was used for the phylogenetic analyses, with canine sequences used to root the tree. this analysis showed that our new fcov sequences from mainland china clustered together with previously characterized sequences from china, as well as a few sequences from the netherlands, as a distinct clade, separate from those of other countries (figure 1). the analysis identified two additional clades, one composed of sequences only from the united kingdom and a second composed of sequences from belgium, united states, and the netherlands (figure 1). apart from the clustering of the sequences from the united kingdom, no clear geographic separation of fcov sequences can be observed for the other two clades. since recombination can influence the detection of positive selection [23], we tested our fcov sequences for evidence of recombination using gard with kh testing [17]. the results of this analysis for each gene are shown in table 2. gard detected evidence for one breakpoint within the nsp12-nsp14 genes and 2 within each of the s, n, and 7b genes. however, of these potential breakpoints, only those at locations 1020 (nsp12-nsp14), 360 and 582 (both n), and 434 (7b) were significantly supported by the kh test, with p values < 0.05. as only a few recombination breakpoints were reliably detected by gard, this suggests that recombination has little effect on our sequences and should not affect the detection of sites under positive selection [23]. 3.4. measurement of selection pressure. we used 5 methods to identify sites within the fcov sequences with evidence for positive selection. 
the results of these analyses are shown in table 3. 
for the nsp12-nsp14 gene region, four positively selected sites (6, 488, 562, and 631) were identified by at least two methods showing significance at p < 0.05, bayes factors > 50, or posterior probabilities > 95% (table 3). site 562 in the nsp12-nsp14 genes showed evidence of positive selection with methods from both paml and datamonkey, while the three remaining sites in this gene fragment were only detected by two or more methods from datamonkey. a total of twelve positively selected sites (22, 43, 101, 149, 151, 166, 167, 172, 175, 229, 245, and 475) that were identified by at least two methods were found in the s gene. sites 151, 172, 175, and 245 showed evidence for positive selection from methods using both paml and datamonkey (table 4), while the other eight sites were detected by methods from only one platform (either paml or datamonkey) (table 4). for the n gene, only one positively selected site, position 21, was identified by methods from both paml and datamonkey, while three others (13, 52, and 195) were identified only with methods from paml (table 5). three positively selected sites (41, 149, and 187) in the 7b gene were detected by methods from both paml and datamonkey (table 6), with one site (5) showing evidence for positive selection only with methods from paml (table 6). in addition to positive selection, the fel and rel methods can both identify sites with evidence of negative selection. using these methods, we identified 106, 168, 25, and 17 negative selection sites in the nsp12-14, s, n, and 7b genes, respectively, which had significant evidence by both methods (supporting information tables s3-s10). when the branch and branch-site models of paml were applied to the alignments, no evidence for positive selection was found for any branch in the phylogenies. to study the evolution of feline cov (fcov) in china, we collected a total of 39 samples from cats (31 with suspected fip and 8 healthy), with 3 of these 39 samples testing positive for type i fcov. 
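the consensus rule stated in the methods (a site is called as selected only when at least two different methods detect it) can be sketched as follows. this is an illustration under stated assumptions: the function name `consensus_sites` is ours, the per-method site calls are hypothetical placeholders rather than results from this study, and each method's own significance threshold is assumed to have been applied before the calls are passed in.

```python
from collections import Counter

def consensus_sites(calls, min_methods=2):
    """Return the sites reported by at least `min_methods` distinct methods.

    calls: dict mapping a method name to an iterable of codon positions
           that passed that method's own threshold.
    """
    counts = Counter()
    for sites in calls.values():
        # set() ensures each method votes at most once per site.
        counts.update(set(sites))
    return sorted(site for site, n in counts.items() if n >= min_methods)

# Hypothetical per-method calls (positions are placeholders, not study data).
calls = {
    "FEL": [10, 20],
    "REL": [20, 30],
    "MEME": [10, 20, 40],
    "PAML_BEB": [50],
}
print(consensus_sites(calls))
```

requiring agreement between methods trades some sensitivity for a lower rate of method-specific false positives, which matches the conservative intent of the rule.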
the identification of type i fcov in cats in liaoning province, china, is in line with a previous study that showed that type i fcov is most prevalent in cats in china [13]. as fip arises when fecv mutates into fipv [24] [25] [26], we selected both fipv and fecv sequences from several other countries for our phylogenetic analysis of our fcov sequences. phylogenetic analysis of the concatenated gene sequences obtained in this study yields some insight into the regionalism of type i fcovs. all fcov sequences from mainland china cluster together with a few sequences from the netherlands. the biomed research international clustered together with other fipv sequences of china; however, it was an early diverging lineage in this clade (figure 1); thus, the other fipv sequences from china possibly contain additional disease-causing mutations. sequences from the united kingdom cluster together as a separate clade. the third cluster of fcov sequences includes isolates from belgium, united states, and the netherlands. apart from the united kingdom clade, an obvious geographical separation of the fcov sequences is not observed, which might be a consequence of the trade in cats and tourism between countries. our analysis shows that fcov can be transmitted from one country to another, although a particular geographic position, such as the island location of the united kingdom, might limit its spread. the datamonkey suite of programs can identify codons under selection as well as the presence of recombinant sequences within a dataset [27]. ideally, we should first determine whether recombination has occurred among the gene sequences, which can be assessed using gard [17]. 
although a breakpoint was identified within the nsp12-14 gene segment and 2 more within each of the other three genes examined, the number of breakpoints is less than three and thus likely would have little effect on the identification of sites experiencing positive selection by empirical bayes methods [23]. in addition, it is known that the type ii fcov emerged via double recombination between type i fcov and type ii ccov and that recombination events in type i fcov have rarely been reported [28]. previous studies have shown that the transmission of natural recombinant strains rarely occurs; thus, we should be cautious in inferring putative recombination events from these computational analyses [29]. although none of the branches of the fcov phylogeny from different countries investigated here showed evidence of positive selection, when site models were applied, four sites in nsp12-14, twelve sites in the s gene, 4 sites in the n gene, and 4 sites in the 7b gene were detected as having evidence for positive selection. site 245 of the s gene, which previously had been shown to distinguish fipv from fecv [30], was found to be a positive selection site. both the fel and rel methods, from the datamonkey suite of programs, can identify sites experiencing negative selection. from our analyses, 106, 168, 25, and 17 negative selection sites were identified by both fel and rel in the nsp12-14, s, n, and 7b genes, respectively (supporting information tables s3-s10). viruses can experience both positive and negative selection. positive selection leads to an increase in the abundance of a specific genetic variant, while negative selection results in genetic conservation [31] [32] [33]. a larger number of sites experiencing negative rather than positive selection were found for all of the genes that we analyzed, which suggests that most sites are conserved and only a few can adapt. 
the proteins encoded by the nsp12-14 genes are responsible for viral genome replication, the s gene product for viral entry into host cells, the n gene product for forming the virus nucleocapsid, and the 7b gene product for viral virulence. negative selection in these genes indicates that they are essential to the virus. the few positively selected changes that have occurred likely help these viruses survive and adapt under harmful conditions, with the positively selected change at site 245 of the s gene potentially related to the high virulence of fipv [30]. positive selection at other sites may contribute to entry into host cells and enhanced virulence. by analyzing the selective pressure experienced by genes in the fcov genome involved in replication, entry, and virulence, we have identified a few sites that potentially experienced adaptive evolution. as more sites within the fcov genes are under negative than positive selection, this suggests that only a few sites can beneficially adapt to allow greater infectivity in the host and that most sites are under strong negative selection to conserve function. a few of the positively selected sites in fcov might be associated with the occurrence of fipv and lead to these viruses causing a more severe disease. additional experimental analysis needs to be conducted to better understand the consequences of the positively selected changes identified here. all virus data obtained or analyzed during this study are included in this published article. sequences of the obtained viruses have been uploaded in genbank with accession numbers in supplementary materials table s2. the authors declare that they have no conflict of interests. 
the ministry of science and technology of the people's republic of china (the national key research and development program, number 2016yfd0500300) and the educational department of liaoning province of china (climbing scholar). table s1 : other coronavirus isolate sequences used in this study. table s2 : accession numbers of the coronavirus isolate sequences obtained in this study. table s3 : negative selection sites for the nsp12, nsp13, and nsp14 genes based on rel analysis. table s4 : negative selection sites for the s gene based on rel analysis. table s5 : negative selection sites for the n gene based on rel analysis. table s6 : negative selection sites for the 7b gene based on rel analysis. table s7 : negative selection sites for the nsp12, nsp13, and nsp14 genes based on fel analysis. table s8 : negative selection sites for the s gene based on fel analysis. table s9 : negative selection sites for the n gene based on fel analysis. mutations of 3c and spike protein genes correlate with the occurrence of feline infectious peritonitis feline coronavirus 3c protein: a candidate for a virulence marker? 
genetics and pathogenesis of feline infectious peritonitis virus feline infectious peritonitis virus with a large deletion in the 5′-terminal region of the spike gene retains its virulence for cats isolation and molecular characterization of type i and type ii feline coronavirus in malaysia quasispecies composition and phylogenetic analysis of feline coronaviruses (fcovs) in naturally infected cats persistence and evolution of feline coronavirus in a closed catbreeding colony genotyping coronaviruses associated with feline infectious peritonitis coronavirus genomics and bioinformatics analysis genomic organization and expression of the 3' end of the canine and feline enteric coronaviruses an outbreak of feline infectious peritonitis in a taiwanese shelter: epidemiologic and molecular evidence for horizontal transmission of a novel type ii feline coronavirus genetic diversity and phylogenetic analysis of feline coronavirus sequences from portugal circulation and genetic diversity of feline coronavirus type i and ii from clinically healthy and fip-suspected cats in china rt-pcr detection of feline infectious peritonitis virus persistence and transmission of natural type i feline coronavirus infection mrbayes 3: bayesian phylogenetic inference under mixed models automated phylogenetic detection of recombination using a genetic algorithm paml: a program package for phylogenetic analysis by maximum likelihood likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution synonymous and nonsynonymous rate variation in nuclear genes of mammals likelihood models for detecting positively selected amino acid sites and applications to the hiv-1 envelope gene maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus a effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites two related strains of feline infectious peritonitis 
virus isolated from immunocompromised cats infected with a feline enteric coronavirus feline infectious peritonitis viruses arise by mutation from endemic feline enteric coronaviruses significance of coronavirus mutants in feces and diseased tissues of cats suffering from feline infectious peritonitis detecting signatures of selection from dna sequences using datamonkey full genome analysis of a novel type ii feline coronavirus ntu156 differential stepwise evolution of sars coronavirus functional proteins in different host species spike protein fusion peptide and feline coronavirus virulence predicting adaptive evolution phylogenetics by likelihood: evolutionary modeling as a tool for understanding the genome codon-substitution models for detecting molecular adaptation at individual sites along specific lineages we thank the staff in the petmate pet hospitals, without whose help, the samples could not be successfully obtained for this study. we also thank dr. xianchun tang for the technological support. this work was supported by grants from hx contributed to the design of research and participated in the analysis and interpretation of data. hx, xl, wz, sj, and xz performed the experiments. hx wrote the draft manuscript and dmi modified the manuscript. sz designed the study and supervised the work. all authors read and approved the final manuscript. key: cord-315498-gpzee1f2 authors: parkinson, n.; rodgers, n.; head fourman, m.; wang, b.; zechner, m.; swets, m.; millar, j. e.; law, a.; russell, c.; baillie, j. k.; clohisey, s. title: systematic review and meta-analysis identifies potential host therapeutic targets in covid-19. date: 2020-09-01 journal: nan doi: 10.1101/2020.08.27.20182238 sha: doc_id: 315498 cord_uid: gpzee1f2 an increasing body of literature describes the role of host factors in covid-19 pathogenesis. 
there is a need to combine diverse, multi-omic data in order to evaluate and substantiate the most robust evidence and inform development of future therapies. we conducted a systematic review of experiments identifying host factors involved in human betacoronavirus infection (sars-cov-2, sars-cov, mers-cov, seasonal coronaviruses). gene lists from these diverse sources were integrated using meta-analysis by information content (maic). this previously described algorithm uses data-driven gene list weightings to produce a comprehensive ranked list of implicated host genes. 5,418 genes implicated in human betacoronavirus infection were identified from 32 datasets. the top ranked gene was *ppia*, encoding cyclophilin a. pharmacological inhibition with cyclosporine in vitro exerts antiviral activity against several coronaviruses including sars-cov. other highly-ranked genes included proposed prognostic factors (*cxcl10*, *cd4*, *cd3e*) and investigational therapeutic targets (*il1a*) for covid-19, but also previously overlooked genes with potential as therapeutic targets. gene rankings also inform the interpretation of covid-19 gwas results, implicating *fyco1* over other nearby genes in a disease-associated locus on chromosome 3. pathways enriched in gene rankings included t-cell receptor signalling, protein processing, and viral infections. we identified limited overlap of our gene list with host genes implicated in ards (innate immune and inflammation genes) and influenza a virus infection (rna-binding and ribosome-associated genes). we will continue to update this dynamic ranked list of host genes as the field develops, as a resource to inform and prioritise future studies. updated results are available at https://baillielab.net/maic/covid19. there are multiple sources of information that associate host genes with sars-cov-2 viral replication, the subsequent host immune response and the ensuing pathophysiology. 
integrating these sources of information may provide more robust evidence associating specific genes and proteins with key processes underlying the mechanisms of disease. this is needed in order to make informed judgements about new therapies for inclusion in model studies and clinical trials. the pace of new research into covid-19 pathophysiology, including host dependency factors, immune responses, and genetics, has made it nearly impossible to read every report. in addition, assessing the quality and relevance of new evidence is difficult, time-consuming, and requires a high level of expertise. information from diverse sources has varying quality, scale, and relevance to host responses to sars-cov-2. computational approaches can aid data evaluation and integration. simple, intuitive methods have a conceptual advantage for translation to decision-making: if both the processes and results are easily comprehensible, then it is easier for human users to trust the conclusions. sars-cov-2 is a betacoronavirus with a 30kb single-stranded positive-sense rna genome, and is genetically similar to other human coronaviruses: sars-cov, mers-cov and the seasonal 'common cold' 229e, oc43, hku1 and nl63 coronaviruses. like all viruses, sars-cov-2 relies on host machinery to replicate. host dependency factors represent an attractive target for new therapeutics, as evolution of drug resistance is expected to be slower for host-directed than viral-directed therapies. 1 treatment directly targeting viral replication can target viral proteins (e.g. remdesivir 2 ), or host proteins upon which the virus depends. 3 host-targeted therapies may have an important role in infectious diseases in general, and the only treatment so far found to reduce mortality in covid-19 -dexamethasone 4 -is likely to act by targeting host immune-mediated organ damage. 5 other host-directed treatments (e.g. 
anakinra, tocilizumab, sarilumab, mavrilimumab), repurposed from other indications, are currently under investigation. 2, 6, 7, 8, 9, 10 in this analysis we systematically identify and combine existing data from human betacoronavirus research to generate a comprehensive ranked list of host genes as a resource to inform further work on covid-19. to identify existing literature which could provide informative datasets for host gene prioritisation, we conducted a systematic review of published studies and preprint manuscripts pertaining to host gene involvement in human betacoronavirus infection and associated disease. results from identified studies, in the form of lists of implicated host factor genes, were combined using meta-analysis by information content (maic), 3 an approach we previously developed to identify host genes necessary for influenza a virus (iav) replication. we have previously demonstrated that the maic algorithm successfully predicts new experimental results from an unseen future experiment. 3 our gene prioritisation results both recapitulate existing understanding of covid-19 pathophysiology and highlight key host factors and potential therapeutic targets that have to date been largely overlooked. maic allows the combination of data from diverse sources without prior assumptions regarding the quality of each individual data source. the maic approach begins with the following assumptions: 1. there exists a set of true positives: host genes involved in covid-19 pathogenesis. 2. a gene is more likely to be a true positive if it is found in multiple experiments. 3. a gene is more likely to be a true positive if it occurs in a list containing a higher proportion of genes with supporting evidence from multiple sources. 4. due to experimental biases, the evidence that a gene is a true positive is further increased if it is found across experimental types. 
with these assumptions, maic allows the quantification of the information content in a gene list by comparing that list to the results from other experiments that might reasonably be expected to find some of the same genes. input gene lists can be categorised by data type (table 2) , allowing comparison both within and between methodologies. maic produces a weighting factor for each experiment, and this weighting is used to calculate a score for each gene. the analysis then produces a final ranked list of genes based on this score, which summarises the combined evidence from all input sources of that particular gene being involved in sars-cov-2 pathogen-host interaction. a full description of the maic algorithm can be found in our original report. 3 inclusion and exclusion criteria are shown in table 1 . to complement emerging data pertaining to the novel sars-cov-2, we included studies of other human coronaviruses. included methodologies are shown in table 2 . table 2 candidate-gene human genetic studies < 5 hosts in virus group or control group in patient studies meta-analyses, in silico anayses, re-analysis of data published elsewhere potentially relevant pre-print manuscripts were identified by screening all papers categorised as covid-19-related in the biorxiv and medrxiv servers. titles and abstracts of all returned papers were first assessed for relevance and duplication by a single member of the review team. following this, full-length texts were obtained and an in-depth review was carried out by two further reviewers, independently, in order to confirm eligibility according to table 2 and table 1 . in cases where a consensus was not reached, a third reviewer appraised the paper. this method ensured each paper was assessed for eligibility by a minimum of three independent reviewers. relevant data, as shown in table 3 , was extracted from each reviewed paper. 
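the weighting idea described above (each experiment receives a weight, and each gene's score combines the weights of the lists that contain it) can be illustrated with a deliberately simplified, single-pass sketch. it is not the published maic implementation, which is described in the original report and differs in detail. the assumptions here are ours: a list's weight is the fraction of its genes that also appear in at least one other list (assumption 3), and a gene's score is the sum, over experimental categories, of the best weight among the lists in that category that contain it (assumptions 2 and 4). the function name `rank_genes`, the list names, the category labels and the gene names are all hypothetical.

```python
from collections import defaultdict

def rank_genes(gene_lists):
    """Toy, single-pass illustration of information-content weighting.

    gene_lists: {list_name: (category, iterable_of_gene_names)}
    Returns (ranked gene names, gene scores, list weights).
    """
    lists = {name: (cat, set(genes)) for name, (cat, genes) in gene_lists.items()}

    # Count how many input lists contain each gene.
    support = defaultdict(int)
    for _, genes in lists.values():
        for gene in genes:
            support[gene] += 1

    # List weight: share of its genes corroborated by at least one other list.
    weights = {}
    for name, (_, genes) in lists.items():
        corroborated = sum(1 for gene in genes if support[gene] > 1)
        weights[name] = corroborated / len(genes) if genes else 0.0

    # Gene score: per category keep only the best supporting list, then sum
    # across categories so that cross-category evidence adds up.
    best = defaultdict(dict)
    for name, (cat, genes) in lists.items():
        for gene in genes:
            best[gene][cat] = max(best[gene].get(cat, 0.0), weights[name])
    scores = {gene: sum(per_cat.values()) for gene, per_cat in best.items()}

    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked, scores, weights

# Hypothetical input: one perturbation screen and two transcriptomic lists.
ranked, scores, weights = rank_genes({
    "screen_1": ("perturbation", ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]),
    "rnaseq_1": ("transcriptomic", ["GENE_A", "GENE_B"]),
    "rnaseq_2": ("transcriptomic", ["GENE_A", "GENE_E"]),
})
print(ranked)
print(weights)
```

taking only the best list per category prevents many similar experiments of one type from dominating the score, which is one simple way to reflect the concern about experimental biases in assumption 4.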
table 2; organism: human, rodent, non-human primate; cell / tissue type: vero6, a549, serum; peer reviewed or pre-print: peer-reviewed, pre-print. relevant gene lists were identified and extracted. datasets were excluded from the analysis where insufficient data were available to construct a meaningful unbiased gene list, for example where results for only a non-systematically selected subset of genes of interest were reported. gene lists were categorised based on methodology as shown in table 2. gene list rankings were preserved where possible, if sufficient numerical data were available. rankings were based on significance or magnitude of effect. adjusted measures of significance, usually adjusted p, were prioritised over raw p and !"#$% to determine ranking where multiple values were available. for studies reporting comparisons at multiple time points, genes were ranked based on the minimum p across all comparisons. to exclude irrelevant genes, a significance or effect size threshold was applied to all lists. this was either the threshold used by the authors for reporting, or where full data were provided this was determined as adjusted gene, transcript and protein names or identification numbers were converted to the associated hgnc gene symbol, or an equivalent ensembl or refseq symbol where no hgnc symbol existed. non-primate genes were mapped to their human homologues using the ncbi homologene database, 11 or excluded from the analysis if no human homologue could be identified. rank-based gene set enrichment analysis was performed using the package fgsea in r version 3.5.2, with genes ranked by maic score. 12 p-values were estimated using an empirical probability distribution based on 10^6 permutations. gene set over-representation in the top 100 genes was analysed by a fisher's exact test as implemented in enrichr. 13 the benjamini-hochberg procedure was used to control the false discovery rate (fdr < 0.05) for both methods. 
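the benjamini-hochberg step mentioned above can be sketched in plain python. this is a generic illustration of the procedure, not the code used by fgsea or enrichr; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

in this toy input only the first gene set stays below the 0.05 threshold after adjustment, although three of the four raw p-values are below 0.05, which shows why the correction matters when many gene sets are tested.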
we identified a total of 31 studies with available data meeting our eligibility criteria (12 pre-print manuscripts and 19 peer-reviewed studies), yielding 32 gene lists (supplementary figure 1 , supplementary table 1 ). the included gene lists comprised 11 ranked and 21 unranked lists, in 8 experimental categories, with list lengths ranging from three to 9,967 genes (median 61). the datasets included 3 genetic perturbation screens (crispr, rnai and interferon-stimulated gene overexpression), 3 genetic studies (of which 2 were in humans), 7 protein-protein interaction studies, 7 proteomic and 12 transcriptomic studies. is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted september 1, 2020. . https://doi.org/10.1101/2020.08.27.20182238 doi: medrxiv preprint a single gene list only. beyond ranks around 700 in this study, gene scores approach baseline, indicating they have little corroborative evidence. of 5,418 genes implicated in human betacoronavirus infection in these 32 datasets, 4,150 are supported by a single paper only, 629 had evidence from more than one source within the same experimental category, and 639 are supported by data from multiple study types. although extensive within-category overlap was seen for transcriptomic studies, there was less concordance within categories such as proteomics and protein-protein interaction. as with our previous study of influenza, contributions from one crispr screen dominate the overall information content (figure 1 b) . this was due in part to list length and hence contribution to scores for multiple lower-ranking genes. information contributions for the top 100 genes are more balanced (figure 1 c) . the maic score distribution (figure 1 d) reflects the degree of cross-category overlap, with the highest ranked genes supported by data from three distinct experimental categories. 
the highest ranking genes and their contributing evidence sources are shown in figure 3a and supplementary table 2. top genes include il1a 14 and other components of the innate and adaptive immune systems (such as cxcl10, cd4 and tlr3), which have previously been shown to contribute to covid-19 pathogenesis. other top genes have not previously received much attention in the context of coronavirus infection. these include ppia (cyclophilin a), and rybp (ring1 and yy1 binding protein), which play roles in protein folding, transcriptional repression, regulation of proteasomal degradation and apoptosis. an up-to-date prioritised list of implicated genes is available at baillielab.net/maic/covid19. we will repeat the analysis regularly as new data become available.

death in severe covid-19 is usually a consequence of lung injury leading to ards (acute respiratory distress syndrome), a final common pathway that can occur in any severe acute respiratory infection. host susceptibility factors in covid-19 may be shared with other infections or ards. we compared the output from our analysis to our previous maic analysis of influenza a virus. 3 among the top 500 ranked genes from each output, we found 72 overlapping genes (figure 3b), including a large number of rna-binding, ribosome-associated and chaperone genes. unexpectedly few immune related genes overlapped. this is surprising as both viruses are single-stranded rna viruses and despite differing in sense, intracellular pathogen detection mechanisms are expected to be similar.
to expand this analysis we manually curated genes associated with ards and determined the overlap with our maic output (supplementary table 4). among the top 500 ranked genes from the coronavirus maic output we found an overlap of 13 genes with the ards list (consisting of 103 genes) (figure 3c). here we saw a number of genes associated with innate immunity and modulation of inflammation (tnfα, il6, il18, ccl2, il1b, tlr1, il13, nfκbia). the inflammatory profile, including hyperferritinaemia, observed in covid-19 has led to the suggestion that a form of secondary haemophagocytic lymphohistiocytosis (hlh), a hyper-inflammatory syndrome, could be occurring. 15 we manually curated genes involved in the familial form of this syndrome and compared these with our maic output, finding no overlap, although only eleven genes were found in the literature to be associated with familial hlh.

to better understand the biological functions of the most strongly implicated genes, we performed gene set enrichment analysis in ten databases of functional annotations. we used two complementary methods, assessing enrichment either in terms of rank distribution across the whole dataset (permissive) or in overrepresentation in the top 100 genes only (conservative). there was extensive overlap between these approaches (figure 4a and supplementary table 3).
functional annotations that were significant using both methods and that were reflected in results from more than one database included terms related to cytokine, toll-like receptor and t-cell receptor signalling, protein processing, apoptosis, the complement system, vegf signalling, glucose metabolism and viral infections such as influenza a. as expected, the relative information contributions from different experiment types varied between pathways (figure 4b). for example, terms related to complement and coagulation received relatively little contribution from crispr screen data (derived from epithelial cells) but more information from proteomics and experiments using serum or swab samples, whilst pathways related to protein processing had relatively greater contributions from protein interaction studies. in all cases, enriched pathways drew information from a range of experiment types.

one of the principal applications of maic is in the interpretation of results of genome-wide association studies (gwas). these studies often implicate a locus containing a number of candidate genes, and the precise nature of the interaction between gene and disease may not be known. as an example, we applied our results to the locus in chromosome 3 associated with hospitalisation due to covid-19 in the sole covid-19 gwas published to date (data from which were also included in this analysis). 16 this locus contained six genes (slc6a20, lztfl1, ccr9, fyco1, cxcr6 and xcr1) that could all plausibly be linked to covid-19 pathophysiology on the basis of their known functions. of these, fyco1, which encodes a protein involved in vesicle transport and autophagy, was highly ranked in our results (rank 42).
fyco1 is supported by sars-cov-2-specific protein-protein interaction and transcriptomic data. ccr9 (rank 417) had additional support from a single transcriptomic study, while slc6a20, lztfl1, cxcr6 and xcr1 had low ranks in our results, with no corroborating evidence in other studies.

the interpretation of any meta-analysis is critically dependent on the criteria for inclusion. in this case, our objective is to cast the net wide, including a range of data sources that are both conceptually and methodologically divergent. experimental results bearing little relation to the composite of evidence from the other studies are downgraded by the maic algorithm, so the effect of irrelevant, noisy, or poorly conducted experiments is minimal. 3 by using permissive inclusion criteria, together with weighted meta-analysis, we have identified key elements of the host-pathogen interaction and promising therapeutic targets for further investigation and intervention. these include host factors involved in viral replication, and elements of the immune response, which have been overlooked in the contributing studies (figure 2). coronaviruses hijack host endomembranes to facilitate anchoring of the replication/transcription complex. 17 consistent with this, we observe an overrepresentation of endoplasmic reticulum-related genes. genes related to the function of the endoplasmic reticulum (faf2, ergic1, tent5c, tent5a, cfl1, stard5) and glycosylation (mogs, ugdh, uggt1, pdia6, pdia3) are observed along with a number of chaperone proteins (hspb1, hspa4, hspa8, hsp90aa1, st13). many of these genes are related to the unfolded protein response (upr), a stress response initiated by accumulation of misfolded proteins.
fyco1, implicated in published gwas results, has been suggested as a key mediator linking er-derived double membrane vesicles, the primary replication site for coronaviruses, with the microtubule network. 18 viral entry via spike (s) protein-mediated membrane fusion is well characterised. 19 the first step, spike activation, requires cleavage via host proteases such as cathepsin l (ctsl, rank 31). 20 in the wei et al. 21 crispr screen, knockout of ctsl restricted viral production. cathepsin l inhibition has been suggested as a promising therapeutic strategy for covid-19: specific small molecule inhibitors are in early stages of development, and direct or indirect inhibition is observed with a number of approved drugs including glycopeptide antibiotics, chloroquine and dexamethasone. 22 atp1a1 (rank 35), encoding a subunit of the na+/k+ cotransporter, has similarly been shown to be necessary for membrane fusion and viral entry for a number of coronaviruses. 23 inhibition of atp1a1 by cardiac glycosides suppressed mers-cov infection in vitro. 24 additional antiinflammatory 25 effects of these drugs make them theoretically attractive therapeutic options, but adverse effects may limit their practical application. the interferon-stimulated gene ly6e (rank 3), which plays a key role in enhancing cellular entry by rna viruses including influenza a virus, 26 was unexpectedly found to have a strong restricting effect on sars-cov-2, sars-cov and mers-cov. 27 such widely opposite differential effects on different viruses have been reported for other host genes, such as ifitm3, rsad2 and axl (ranked 1056, 1039 and 215 respectively by maic), 26 and for ppia (see below). consistent with the emerging understanding of the pathogenesis of covid-19, key genes in the inflammatory response to sars-cov-2 infection are highly represented in the top 100 genes. 
these include genes involved in recognizing the virus (tlr3, ifih1), activating the innate immune system (oas2, herc5, s100a9), chemotaxis (s100a9, cxcl10, cxcl8, ccl20, saa2) and pro-inflammatory cytokines (il1a, il18).

toll-like receptor 3 (tlr3) is an endosome-associated pathogen-associated molecular pattern receptor, constitutively expressed in the respiratory tract and many immune cells. tlr3 detects double-stranded viral rna and triggers production of type i interferons and other pro-inflammatory cytokines, such as il6 (rank 104) and tnfα (rank 182) via irf3 and nf-κb. 28 the chemokine cxcl10 (rank 17) is a key signalling molecule in viral immunity, which could contribute to pulmonary inflammation as well as aiding viral clearance. 29 cxcl10 levels are associated with outcome in influenza 30 and are thought to have a protective effect in sars, but a pro-viral effect in hiv. 31 cxcl10 has been proposed as a prognostic marker for the progression of disease in covid-19, with continuously high levels of cxcl10 associated with worse outcomes. 32 the high rankings of genes associated with activation and binding of t lymphocytes (cd4, cd3e, fgl2, lck, sell) are also likely related to their prognostic significance, as lymphopaenia is strongly associated with poor outcomes in covid-19. 33, 34 absolute counts of cd3+, cd4+ and cd8+ t lymphocytes have been proposed as a potential predictor of outcome in severe covid-19 patients 35 with an increase in numbers of these cells observed during recovery.
the highest-ranking gene is ppia, which encodes peptidyl-prolyl cis-trans isomerase a (ppia, also known as cyclophilin a, cypa), a cytosolic protein involved in protein folding and trafficking, cell signalling and t-cell activation via the calcineurin/nfat pathway. 36 cypa is a pro-viral factor for hepatitis c virus (hcv), hiv-1, and sars-cov, and an anti-viral factor for iav. 37, 38 the cyclophilin inhibitor cyclosporine has in vitro antiviral activity against hcv. 39, 40, 41 this was also observed in a hcv clinical trial, where cyclosporine combined with interferon-α was more efficacious in achieving sustained virologic response than interferon monotherapy. 42 similar in vitro and clinical results were demonstrated for the cypa inhibitor alisporivir (debio-025). 43 cypa is also a pro-viral factor for hiv-1 and alisporivir can inhibit hiv-1 replication in vitro. 44, 45 a genome-wide protein-protein interaction screen identified an interaction between the sars-cov nsp1 protein and cypa. 46 cyclosporine inhibits sars-cov replication in vero e6 cells, as well as hcov-229e, hcov-nl63, avian coronavirus and feline coronavirus. nsp1 also induced il-2 expression in hek293 cells through the calcineurin/nfat pathway, making inhibition of this pathway interesting from both an antiviral and immunomodulatory perspective. 47 the high ranking of il1a (encoding interleukin 1-α) is striking because monoclonal antibodies against the interleukin 1 receptor are a plausible therapeutic option for covid-19. 10 this pro-inflammatory cytokine, which is synergistic with tnfα, is constitutively expressed in epithelial cells and is upregulated after sars-cov-2 infection. 48 interleukin-1 receptor blockade with anakinra is now being tested in a number of randomised clinical trials in covid-19. 10
the principal advantage of the maic approach is that it allows integration of data from diverse sources. unlike other methods for gene list comparison such as vote counting or robust rank aggregation, 49 maic applies a data-driven weighting to each dataset, accepts both ranked and unranked lists, and includes user-defined categories which prevent any single method from overwhelming the results. maic outperforms other methods for predicting antiviral genes. 3 this meta-analysis is restricted to studies involving genome-wide hypotheses or screening data for large gene sets, and does not consider evidence from candidate gene genetic studies or single-gene perturbations. where a single gene has been investigated extensively but genome-scale studies are sparse, our approach may underestimate the relative strength of evidence for certain genes. single gene studies, however, are likely to focus preferentially on genes that fit pre-conceived ideas of disease pathogenesis and may be prone to other biases such as publication bias, something which we mitigated against in our inclusion criteria. genetic perturbation data are still relatively sparse for sars-cov-2 and other human betacoronaviruses: only one genome-wide crispr knockout screen and two other sub-genome-scale screens (kinome-wide rnai and interferon-stimulated gene overexpression screens) were included in the meta-analysis. limited data of this type could be responsible for the lower than expected rankings for ace2 (rank 320), a major functional receptor for the sars-cov and sars-cov-2 spike (s) proteins, and tmprss2 (rank 3037), a serine protease required for s protein priming.
19, 50, 51 while ace2 was identified as a host dependency factor in the crispr screen, tmprss2 was not, and as neither gene was included in the other two screens, the effects (or lack thereof) could not be confirmed. the only other supportive evidence for a role in disease pathophysiology, in studies included here, came from a single transcriptomic study for each; there was no evidence from protein-protein interaction, proteomics or genetics. 48, 52 both of these genes have been proposed as possible therapeutic targets for covid-19, and clinical trials are underway for the tmprss2 inhibitors nafamostat and camostat mesylate. systematic review and meta-analysis are routine elements in the assessment of clinical evidence and some fields in genomics, but have been less widely applied to mechanistic biology. using a flexible and intuitive method, we have systematically reviewed and meta-analysed host gene-level data from studies that address a range of complementary questions regarding human betacoronavirus infection. this provides external validation for numerous host genes implicated in both viral life cycle, and immune response, and identifies several plausible therapeutic targets with broad support from multiple sources. translational genomics. 
targeting the host immune response to fight infection
genome-wide crispr screen identifies host dependency factors for influenza a virus infection
dexamethasone in hospitalized patients with covid-19 - preliminary report
tissue-specific tolerance in fatal
gm-csf blockade with mavrilimumab in severe covid-19 pneumonia and systemic hyperinflammation: a single-centre, prospective cohort study
interleukin-1 blockade with high-dose anakinra in patients with covid-19, acute respiratory distress syndrome, and hyperinflammation: a retrospective cohort study
an algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation
enrichr: interactive and collaborative html5 gene list enrichment analysis tool
a dynamic immune response shapes covid-19 progression
covid-19: consider cytokine storm syndromes and immunosuppression
sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum
unconventional use of lc3 by coronaviruses through the alleged subversion of the erad tuning pathway
angiotensin-converting enzyme 2 is a functional receptor for the sars coronavirus
cathepsin l functionally cleaves the severe acute respiratory syndrome coronavirus class i fusion protein upstream of rather than adjacent to the fusion peptide
genome-wide crispr screen reveals host genes that regulate sars-cov-2 infection
cathepsin l-selective inhibitors: a potentially promising treatment for covid-19 patients
atp1a1-mediated src signaling inhibits coronavirus entry into host cells
screening of fda-approved drugs using a mers-cov clinical isolate from south korea identifies potential therapeutic options for
digoxin and digoxin-like immunoreactive factors (dlif) modulate the release of proinflammatory cytokines. inflammation research : official journal of the european histamine research society
ly6e mediates an evolutionarily conserved enhancement of virus infection by targeting a late entry step
ly6e impairs coronavirus fusion and confers immune control of viral disease
sensing of viral infection and activation of innate immunity by toll-like receptor 3
covid 19: a clue from innate immunity
progression of whole-blood transcriptional signatures from interferon-induced to neutrophil-associated patterns in severe influenza
cxcl10/ip-10 in infectious diseases pathogenesis and potential therapeutic implications
the cytokine storm in covid-19: an overview of the involvement of the chemokine/chemokine-receptor system
lymphopenia in severe coronavirus disease-2019 (covid-19): systematic review and meta-analysis
lymphopenia predicts disease severity of covid-19: a descriptive and predictive study
characteristics of lymphocyte subsets and their predicting values for the severity of covid-19 patients
cyclophilin a: a key player for human disease
cyclosporin a inhibits the influenza virus replication through cyclophilin a-dependent and -independent pathways
cyclophilin a restricts influenza a virus replication through degradation of the m1 protein
essential role of cyclophilin a for hepatitis c virus replication and virus production and possible link to polyprotein cleavage kinetics
cyclosporin a suppresses replication of hepatitis c virus genome in cultured hepatocytes
suppression of hepatitis c virus replication by cyclosporin a is mediated by blockade of cyclophilins
combined interferon alpha2b and cyclosporin a in the treatment of chronic hepatitis c: controlled trial
the cyclophilin inhibitor debio 025 combined with peg ifnalpha2a significantly reduces viral load in treatment-naïve hepatitis c patients
inhibition of human immunodeficiency virus type 1 replication in human cells by debio-025, a novel cyclophilin binding agent
the novel cyclophilin inhibitor cpi-431-32 concurrently blocks hcv and hiv-1 infections via a similar mechanism of action
the sars-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors
cyclosporine has a potential role in the treatment of sars
comparative transcriptome analysis reveals the intensive early-stage responses of host cells to sars-cov-2 infection
robust rank aggregation for gene list integration and meta-analysis
sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor
receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus
upper airway gene expression differentiates covid-19 from other acute respiratory illnesses and reveals suppression of innate immune responses by sars-cov-2. medrxiv : the preprint server for health sciences

the authors would like to acknowledge prof. c. wiley and prof j. doench for sharing data and their helpful comments on the manuscript.

key: cord-317779-j67vb7f3 authors: irizarry, kristopher j. l.; downs, eileen; bryden, randall; clark, jory; griggs, lisa; kopulos, renee; boettger, cynthia m.; carr, thomas j.; keeler, calvin l.; collisson, ellen; drechsler, yvonne title: rna sequencing demonstrates large-scale temporal dysregulation of gene expression in stimulated macrophages derived from mhc-defined chicken haplotypes date: 2017-08-28 journal: plos one doi: 10.1371/journal.pone.0179391 sha: doc_id: 317779 cord_uid: j67vb7f3 discovering genetic biomarkers associated with disease resistance and enhanced immunity is critical to developing advanced strategies for controlling viral and bacterial infections in different species. macrophages, important cells of innate immunity, are directly involved in cellular interactions with pathogens, the release of cytokines activating other immune cells and antigen presentation to cells of the adaptive immune response.
ifnγ is a potent activator of macrophages and increased production has been associated with disease resistance in several species. this study characterizes the molecular basis for dramatically different nitric oxide production and immune function between the b2 and the b19 haplotype chicken macrophages. a large-scale rna sequencing approach was employed to sequence the rna of purified macrophages from each haplotype group (b2 vs. b19) during differentiation and after stimulation. our results demonstrate that a large number of genes exhibit divergent expression between b2 and b19 haplotype cells both before and after stimulation. these differences in gene expression appear to be regulated by complex epigenetic mechanisms that need further investigation.

discovering genetic biomarkers associated with disease resistance and enhanced immunity is critical to developing advanced strategies for controlling viral and bacterial infections in various species. disease resistance and susceptibility depend on a variety of factors including genetics. in numerous species, disease resistance has been associated with major histocompatibility complex (mhc) haplotype, as well as polymorphisms in several immune genes such as tgfβ and tnfα [1, 2]. cytokine production, specifically secretion of pro-inflammatory molecules, has also been associated with increased resistance against disease [3, 4]. studies have demonstrated association of mhc-b haplotype in chickens and resistance to a variety of viral pathogens, including aiv, marek's disease virus (mdv), avian leukosis virus, newcastle disease virus and rous sarcoma virus [5] [6] [7] [8] [9] [10] as well as other pathogens [11, 12]. b2 haplotype chickens are more resistant to avian coronavirus infection than b19 haplotypes and these differences in disease resistance were observed early after infection in our previous studies [10].
this suggests that innate immunity plays a major role, with the macrophage being a key player in this enhanced immune response, as evidenced by the b2 haplotype birds' greater capability to produce nitric oxide (no) in response to ifnγ and poly i:c [13]. in addition, b2 macrophages activated t cells more efficiently than macrophages derived from b19 haplotypes [14]. macrophages are directly involved in cellular interactions with pathogens and demonstrate distinct immune responses from more disease resistant animals in response to infection [15] [16] [17] [18] [19] [20]. in addition, macrophages release cytokines that activate other immune cells and present antigen to cells of the adaptive immune response [21] [22] [23]. it has become increasingly clear that dysregulation of macrophage function is involved in inflammatory disease processes such as rheumatoid arthritis, inflammatory bowel disease and cancer [24] [25] [26]. involved in these interactions are crucial molecules such as toll-like receptors (tlrs) that recognize invading microorganisms, resulting in communication with the adaptive immune system such as increased expression of mhc surface molecules, t cell receptors and secreted cytokines [21, 23]. genetic differences in any of those molecules can potentially account for differences in immune competence and thus provide potential immunogenetic markers for disease resistance to various pathogens. ifnγ is a potent activator of macrophages and increased production has been associated with disease resistance in multiple species [27] [28] [29] [30] [31]. these findings indicate that chickens with enhanced ifnγ production are more resistant to certain infections. ifnγ enhances macrophage activation, expression of mhc and nitric oxide release, which aids in killing of pathogens, and also increases activity of cytotoxic t cells and secretion of th1 cytokines [13, 31], underscoring how crucial this process is for innate immune competence.
macrophage tlrs appear to be primed by ifnγ, reprogramming cellular responses to other cytokines, such as type i interferons and il-10 and activating the jak-stat pathway (janus kinase and signal transduction and activator of transcription) [24, 32, 33] . ifnγ, which increases tlr receptor availability for interaction with its ligands, has been shown to induce tlr2, 4, 6 and 9 [34] [35] [36] [37] . the response of macrophages to an immune stimulus is not just dependent on cell surface receptor and cytokine expression. other factors include the differentiation of monocytes into functional macrophages, a tightly regulated process that influences immune competence [38] . recent studies demonstrated a critical role for molecules such as a2b adenosine receptor for differentiation and proliferation of monocytes and macrophage function in immunity and inflammation [39, 40] . a2b expression is induced by ifnγ and leads to increase of anti-inflammatory signaling counteracting the inflammatory response activated within the ifnγ pathway. taken together, these studies emphasize the genetic basis of the activation of macrophages by ifnγ playing an important role in the innate immune response signaling and providing resistance to disease. in addition to inflammatory signaling, a number of transcription factor pathways and epigenetic mechanisms all contribute to immune function. dysregulation of any of these events will lead to an impaired innate immune response and consequently, increased susceptibility to disease. using an ex vivo model, we investigated the gene expression in macrophages from haplotypes b2 and b19 during differentiation and after stimulation with ifnγ. 
our experimental design leveraged an initial 6 day window for monocytes to differentiate into macrophages, which was followed by ifnγ stimulation between 1 and 24 h to further characterize subsequent rna gene expression and the molecular basis for dramatically different nitric oxide production and immune function between the b2 and the b19 haplotype chicken macrophages.

animal protocols were performed under the approval of the institutional animal care and use committee at western university of health sciences, pomona, california (westernu). fertilized eggs, descended from modified wisconsin line 3, were obtained from dr. w. elwood briles, northern illinois university, and incubated and hatched under standard conditions (38˚c, 50-65% humidity) [10, 13] at westernu. in addition to daily health monitoring, fresh food and water were provided ad libitum. experimental animals were euthanized by insufflation of isoflurane gas (butler, dublin, oh). whole blood samples were collected via jugular venipuncture in edta tubes from age-matched chicks at 14-18 weeks old. at no time did the amount of blood harvested from each animal exceed 1% of body weight.

and stored at -80˚c. the morphology of adherent cells was observed daily under bright field microscopy (20x objective). purity of monocyte cultures using this culture method was confirmed by ifa and facs using monoclonal antibody kul01 as previously described as part of a different aspect of this study [13]. a 50 ρg/ml ch-ifnγ solution (invitrogen, carlsbad, ca) was prepared in rpmi w/o phenol red culture medium (invitrogen, carlsbad, ca). after washing the cells twice with warm pbs, macrophage cultures were stimulated with 1 ml of rpmi-ch-ifnγ mixture [13]. nitric oxide production was measured [10, 43, 44] to confirm macrophage stimulation in assays by interferon (data not shown).
stimulation was evaluated as yes/no based on previously published results from b2 and b19 ifnγ stimulated macrophages [10].

sample collection and rna sequencing

a total of 145 gigabytes of rna sequence data was obtained from b19 and b2 haplotypes of chickens. two birds from each haplotype were selected for inclusion in the sequencing. each bird provided blood for extraction and isolation of peripheral blood mononuclear cells. purified monocytes were cultured for differentiation and cell samples were collected from nine time points for each bird. samples were collected for sequencing on the day they were cultured (day t-6), as well as on day -3 (t-3), day 0 (called 0 hours), and then six additional times over a 24 hour period corresponding to 1 hour, 2 hours, 4 hours, 8 hours, 16 hours and 24 hours after interferon stimulation. cells were lysed in wells with rlt buffer containing beta-mercaptoethanol (qiagen, valencia, ca) and stored at -80˚c. rna was processed with the qiashredder and rneasy kit from qiagen (valencia, ca) according to manufacturer's instructions and sent on dry ice to dr. calvin keeler at the university of delaware for generation of libraries and sequencing with an illumina hiseq 2000. an rna sequence library was constructed from purified rna. the library was fragmented in order to generate appropriately sized rna fragments suitable for templates in random primed first-strand cdna synthesis. second strand synthesis was completed in accordance with specifications for sequencing with illumina's hiseq 2000 platform. the samples corresponding to each time point from each bird were sequenced and the data was stored in a unique file for each sequenced sample and time point. forty fastq files were generated from the data totaling 145 gigabytes. the average file size was 3.65 gigabytes and the standard deviation was 2.25 gigabytes. the sequencing data provided 933,107,885 reads across the biological samples and time points (table 1).
across all time points for the two b2 samples, one produced 298,903,517 reads and the other produced 165,589,594 reads. similarly, across all time points, the two b19 samples produced 285,392,384 reads and 183,222,390 reads. for each time point (across all four birds) sequencing reads ranged from a low of approximately 78 million reads to a high of just over 171 million reads, with most time points producing over 88.4 million reads each and a few producing over 100 million reads each.

the chicken reference genome washuc2, corresponding to ensembl release 70, was downloaded from ensembl.org (http://www.ensembl.org/info/data/ftp/index.html). annotation files included the small rna annotation files obtained from mirbase release 19 (http://www.mirbase.org/). sequenced reads were filtered to remove low quality sequences from the data. filtered sequences were aligned to the reference genome using bowtie and tophat, available along with the software package cufflinks, from the johns hopkins university center for computational biology (https://ccb.jhu.edu/software.shtml). the aligned reads generated by bowtie produced gapped alignments on the reference genome which tophat used to identify splice junctions flanking exons. the resulting aligned reads were analyzed by cufflinks to construct transcripts corresponding to mrna sequences. next, cufflinks was employed to estimate transcript specific expression levels across the transcripts and genes within the reference genome based on the number of sequence reads for each mrna. the sequence read data was normalized using the fragments per kilobase of transcript per million mapped reads (fpkm) method to more accurately determine expression levels. the resulting transcriptome data was loaded into the mysql relational database to more effectively manage, explore, mine and annotate the data. gene expression data was hierarchically clustered using 1-pearson correlation on the rows and keeping the column order conserved.
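the fpkm normalization and the 1-pearson distance used for row clustering can be illustrated with a small python sketch. this is a simplified illustration under stated assumptions, not the cufflinks estimation procedure (which models transcript abundance rather than applying a single formula to raw counts); the function names and the example values are hypothetical.

```python
from math import sqrt


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Simplified definition: counts are scaled by transcript length (in kb)
    and by sequencing depth (in millions of mapped fragments).
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles.

    0 means perfectly correlated profiles, 2 means perfectly anti-correlated.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical transcript: 500 fragments, 2 kb long, 10 million mapped.
    print(fpkm(500, 2000, 10_000_000))

    # Two hypothetical gene profiles across the same ordered time points.
    gene_a = [1.0, 2.0, 4.0, 8.0]
    gene_b = [2.0, 4.0, 8.0, 16.0]
    print(pearson_distance(gene_a, gene_b))
```

because the distance depends only on the shape of each profile, genes with proportional expression across time points cluster together regardless of absolute level, while the column (time point) order is left unchanged.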
the resulting clustered data set was visualized as a heat map with black representing lack of gene expression, and darker shades of blue indicating lower expression values. dark purple represents higher expression values than any shade of blue while bright pink represents the highest expression values. for visualization purposes, the heat maps were generated with maximum heat map color assigned to expression set lower than the absolute maximum expression value contained in the entire data set, subsequently all values of expression greater than or equal to the assigned expression threshold (for example, 1000) shared the same color on the heat map (regardless of whether the actual expression level was 1000, 2000, 20,000 or 90,000) . this setting provided the optimal visualization of both high and low expressed genes in the heat maps. gene enrichment calculations were performed using the david bioinformatics database tool version 6.8 (https://david.ncifcrf.gov/). the analysis was performed using comparisons of successive time points within the b2 haplotype data to identify sets of genes that were enriched (p-value < 0.05). the b2 haplotype represents the robust macrophage phenotype as characterized by nitric oxide production compared to the b19 haplotype. subsequently, the gene enrichment was performed on the b2 data. gene enrichment was determined using three distinct databases: gene ontology biological process, kegg pathways, and reactome pathways corresponding to s2, s3 and s4 tables respectively. because a large number of enrichment annotation terms were produced, a subset of representative highlights from each of these three enrichment analyses was chosen for inclusion in the results. highlights were selected to provide examples of the biological process annotations, kegg pathways annotations and reactome annotations. realtime pcr was performed on a selected number of target genes to validate rna sequencing results. 
rna was taken from macrophages stimulated with ifnγ as described above for 2 and 4 hours; unstimulated samples (0 h) served as control. for realtime rt-pcr, cdna synthesis was performed using the superscript iii first strand synthesis kit (invitrogen, waltham, ma), according to the manufacturer's instructions. pcr conditions were as follows: 95˚c for 10 min hot start, then 40 cycles of 95˚c for 15 sec and 60 or 63˚c, depending on gene (see primers), for 30 sec, according to the manufacturer's instructions for the biotool 2x sybr green qpcr mix (biotool, houston, tx). primer sequences were designed using primer 3 (atp6v0c, litaf, il18r, tln-1). previously published primer sequences were used for tlr2, tlr4, tlr5, tlr6 and tlr7 [45]. atp6v0c (annealing 60˚c) forward tgttgtcatggcgggtatta, reverse acaaataacctgggctgctg; litaf (annealing 60˚c) forward atcctcacccctaccctgtc, reverse gacgtgtcacgatcatctgg; il18r (annealing 63˚c) forward ctcttcgtgcctccattgat, reverse accaagttcaactggccaaa; tln-1 (annealing 60˚c) forward tcaagcagaagttgcacacc, reverse gggagccattaaggatgtca. pcr analysis was done using the δδct method with 18s serving as housekeeping gene control. statistics were done using graphpad software (prism version 7), paired t-test, two-tailed. a total of 179 gene expression measurements were extracted from a published paper describing the fold change in expression levels of genes induced after 4 h exposure to cytomegalovirus [46]. the data was converted to a tab-delimited text file containing the official gene symbol and the reported expression level. the file was loaded into a mysql relational database and joined to the expression data produced from the b2 and b19 cells. the data was joined on the gene symbol and a set of 54 genes was identified. the fold change for the b2 and b19 expression data was calculated by taking log2(4 h expression / 0 h expression). b2 and b19 genes having expression = 0 for the initial time point were converted to 0.1 to prevent division by zero.
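the fold-change calculation described above, including the substitution of 0.1 for a zero baseline, can be sketched in python as follows. the function name is hypothetical, and the handling of a zero 4 h value is an assumption, since the text does not say how that case was treated.

```python
import math

ZERO_REPLACEMENT = 0.1  # value the text substitutes for a zero 0 h expression


def log2_fold_change(expr_0h, expr_4h, zero_replacement=ZERO_REPLACEMENT):
    """Return log2(4 h expression / 0 h expression).

    A zero baseline is replaced by `zero_replacement` to avoid division by zero.
    Assumption: a zero 4 h value is reported as negative infinity, because the
    source does not describe that case.
    """
    if expr_0h < 0 or expr_4h < 0:
        raise ValueError("expression values must be non-negative")
    baseline = zero_replacement if expr_0h == 0 else expr_0h
    if expr_4h == 0:
        return float("-inf")
    return math.log2(expr_4h / baseline)


# Hypothetical values: a gene rising from 10 to 40 has a log2 fold change of 2.
example_change = log2_fold_change(10.0, 40.0)
```

one consequence of the 0.1 floor is that genes switched on from zero receive large but finite fold changes whose size depends on the chosen floor, so they are best compared by direction rather than by magnitude.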
additionally, the fold-change reported for il6, 280.8, was changed to 35, in order to preserve the scale of the graphs and legibility of the resulting data represented in the histograms. the fold-change in expression for the b-haplotype birds and the published data was plotted using microsoft excel. a set of 13,618 unique genes from among all mapped sequencing reads was generated from the 4 birds across all 9 time points. next, we analyzed the expression data to determine the number of genes expressed in each haplotype within each time point. within the minus 6-day (t-6) time point, representing the time point after plating and adherence of monocytes and the start of differentiation into mature macrophages, 11,785 genes were expressed in the b2 birds while 12,089 were observed in the b19 birds, with 11,216 genes expressed in both. interestingly, 4770 genes were off in both b19 and b2 haplotypes while just 569 genes exhibited expression in only the b2 chickens and 873 genes were expressed only in the b19 birds. similar relationships were detected in each of the remaining eight time points. the t-3 day time point, representing 3 days of differentiation in cell culture, exhibited the greatest expression of genes with a total of 11,429 expressed in both b19 and b2 birds while just 4068 genes lacked evidence of expression in both haplotypes. also, during the t-3 day time point the greatest number of genes (1118) exhibit evidence of expression in the b2 birds while lacking evidence of expression in the b19 birds. at the t0 time point, after 6 days of differentiation and immediately before stimulation with interferon, 10,975 genes were expressed in both haplotypes while 4547 genes were not expressed in macrophages of either haplotype. likewise, the 1 h and 2 h time points exhibited 11,349 genes and 10,789, on in both haplotypes, respectively. it is worth noting that the time point with the most genes off in both haplotypes is 16 h with 5238 genes. 
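the on/off counts reported for each time point amount to a four-way partition of the gene set. the python sketch below shows that bookkeeping under the assumption that a gene counts as "on" when its expression value is greater than zero; the text does not state the cutoff that was actually used, and the gene names are placeholders.

```python
def partition_by_expression(expr_b2, expr_b19, threshold=0.0):
    """Split genes into on-in-both, b2-only, b19-only and off-in-both sets.

    expr_b2 / expr_b19 map gene symbol -> expression at one time point.
    Assumption: a gene is "on" when its value exceeds `threshold`.
    """
    genes = set(expr_b2) | set(expr_b19)
    on_b2 = {g for g in genes if expr_b2.get(g, 0.0) > threshold}
    on_b19 = {g for g in genes if expr_b19.get(g, 0.0) > threshold}
    return {
        "both_on": on_b2 & on_b19,
        "b2_only": on_b2 - on_b19,
        "b19_only": on_b19 - on_b2,
        "both_off": genes - on_b2 - on_b19,
    }


# Placeholder gene names and values, for illustration only.
b2_example = {"geneA": 12.0, "geneB": 0.0, "geneC": 3.5, "geneD": 0.0}
b19_example = {"geneA": 8.0, "geneB": 4.2, "geneC": 0.0, "geneD": 0.0}
groups = partition_by_expression(b2_example, b19_example)
```

the same arithmetic can be checked against the t-6 figures in the text: 11,785 genes on in b2 minus 11,216 on in both leaves 569 b2-only genes, and 12,089 minus 11,216 leaves 873 b19-only genes.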
overall, the data indicates that approximately 10,000 to 11,000 genes are on in both haplotypes at each time point while roughly 4000 to 5300 genes are off in both haplotypes at each time point. the number of genes on in one haplotype, while off in the other haplotype, ranges from about 400 to 1140 depending upon the haplotype and time point (fig 1). in order to better understand the cell biology underlying differences in macrophage differentiation and activation between b19 and b2 birds, we searched for genes exhibiting statistically significant differences between different time points within a single b haplotype as well as within the same time point between haplotypes. when comparing the expression profiles between the b2 and b19 haplotypes, we identified 210 genes exhibiting differential expression at the t-6 day time point. these genes represent 198 genes with higher expression in the b2 birds and just 12 genes for which expression was greater in the b19. after three days, at the t-3 day time point, thousands of genes exhibited altered expression patterns between the two groups. surprisingly, 7000 genes showed higher expression in the b19 birds while only 14 genes were expressed at higher levels in the b2 birds. by t0 hrs, which corresponds to 6 days of monocyte differentiation into macrophages, we observed 955 genes with significant differences in expression between the haplotypes. of these genes, 544 exhibited greater expression in the b2 haplotype while 411 exhibited higher expression in the b19 haplotype. cells were stimulated with ifnγ immediately following rna collection at the t0 hr time point. at 1 h (t1) post-stimulation, 665 genes show evidence of significant differences in expression between the haplotypes, where b2 birds had 109 genes expressed to higher levels while the b19 haplotype was associated with 556 genes having greater expression compared to t0. this pattern of increased expression in the b19 group is reversed by the 2 h time point.
at 2 h after ifnγ treatment, the b2 cells show a global increase in expression for 5989 genes while the b19 cells have just 18 genes on at higher levels than the b2 birds. by 4 hours after stimulation, the b2 birds still exhibit greater expression for 1029 genes while the b19 birds exhibit higher expression for 12 genes. this trend changes by 8 hours after treatment, at which time the slower responding b19 group begin showing increased expression in 797 genes while the b2 cells have greater expression for just 15 genes. by 16 hours after stimulation, only 66 genes are differentially expressed between the two haplotype groups. and, at the 24 hour mark, 406 genes show evidence of statistically significant differences in expression between them with the b2 cells exhibiting greater expression for 339 genes while the b19 cells have higher expression for 67 genes ( table 2) . the b2 and b19 haplotype birds represent distinct genetic variation within the b-locus on chromosome 16. subsequently, patterns of gene expression variation of the genes located within this region were investigated. among the seventeen genes exhibiting statistically significant differences in expression between the b2 and b19 birds, many displayed divergent gene expression patterns prior to ifnγ stimulation. in the b2 cells, gene expression peaks on day t-6 and expression is effectively inhibited by day t-3. this is not the case in the b19 cells. rather than reach maximum expression levels in a single day, the b19 cells don't achieve maximum expression until day t-3 ( fig 2) . for example trim7, trim27.1, bf2, tpn, and trim41 exhibit strong expression on day t-6 in the b2 cells while the same genes exhibited prolonged expression over day t-6 and day t-3 in the b19 cells. members of the trim (tripartite motif) family have been implicated in antiviral immune defense and several are ubiquitin ligases [47, 48] . 
tpn (tapasin) is a co-factor for mhc i critical for antigen presentation to cytotoxic t-cells. chickens express the single mhc i locus termed bf-2, which works with tpn in antigen presentation, and it has been shown that there are differences in the selection of high affinity peptides in b19 vs b15 haplotypes [49], highlighting their critical role in immune competence. additional genes within the b-locus display a similar pattern of pre-stimulatory differences in gene expression between the two different haplotypes, including genes involved in differentiation, cell growth and apoptosis such as ptpn2 (tyrosine protein phosphatase non-receptor 2) and nfkb. gene expression decreases to approximately baseline levels by time point t0 hours. a second distinction in the gene expression patterns between b19 and b2 cells is that b2 cells exhibited a fairly robust expression at 2 and 4 hours after interferon stimulation. unlike the b2 haplotype, the b19 haplotype appears incapable of generating such a rapid, robust and coherent gene expression profile. in contrast, the b19 cells generate a delayed, weak and uncoordinated lower level of expression that extends up to 8 hours, and in some cases even 16 hours. overall, this global pattern of temporally dysregulated gene expression represents a reoccurring theme with the b19 monocytes and macrophages.
distinct temporal gene expression patterns in b2 versus b19 monocytes/macrophages. b-locus haplotypes in chickens provide a mechanism for genetically perturbing the cluster of immunologically important genes on chromosome 16 and producing phenotypic variation affecting infectious disease susceptibility and resistance. the heat map allows visualization of gene expression between the two genetically distinct haplotypes. each row represents a gene within the b-locus (listed on the right) and each column corresponds to a particular time point when cells were collected for rna sequencing. black pixels indicate zero gene expression for a particular gene at a specific point in time, and dark blue corresponds to very low expression, while brighter blue indicates the next higher levels. dark purple represents higher expression levels than blue colors, and pink represents the highest levels of gene expression. monocytes were obtained from each haplotype of chicken and allowed to differentiate into macrophages in vitro for seven days beginning on day minus 6 (t-6). rna was sampled on day t-6, day t-3, and again three days later, which is denoted as 0 hours (t0), when ifnγ was initially added to the cultures. on t0, rna was sampled immediately before stimulation with ifnγ. subsequent time points correspond to the time following interferon stimulation, in hours (1 hour, 2 hours, 4 hours, 8 hours, 16 hours and 24 hours). as visible on the heat map, there are distinct differences in gene expression between the b2 and b19 cells. the most dramatic difference occurs on day t-6. b2 cells exhibit a rapid burst of gene expression, indicated as a single column of pink on the leftmost edge of the heat map. in contrast, the b19 cells appear to undergo a much slower and prolonged gene expression program that was not as rapidly down regulated as in the b2 cells. additional gene expression data for a number of proteins involved in cell growth and apoptosis is shown in the bottom half of the figure to highlight a similar pattern in gene expression and kinetics. the green border indicates the b2 haplotype expression pattern and the red border corresponds to the b19 expression pattern.
the divergent timing of gene expression observed in the b-locus genes is mirrored in many other genes as well, including members of the tlr signaling pathway, cellular mediators of apoptosis and cell survival, and components of cytokine signaling. the global dysregulation of gene expression among 7000 genes at the t-3 day time point, as well as the expression pattern of 6000 genes exhibiting altered expression, led us to explore the pattern of gene expression changes within each haplotype group over all of the time points. at the onset of the study, the b2 cells were actively expressing a diverse set of genes; however, by day t-3, most of those genes displayed reduced expression in the b2 group. even so, the b19 haplotype cells continue to express these 7000 genes at higher levels than the b2 birds. after stimulation, b2 macrophages again show different patterns of expression compared to b19 cells with regard to timing of peak expression and coherence of expression. four distinct patterns of divergent gene expression were identified between the b2 haplotype birds and the b19 haplotype birds (fig 3). the first interesting divergent pattern shows strong gene expression on day t-6 in the b2 birds while relatively low levels of expression are observed in the b19 birds on the same day. this pattern is of interest because it represents a group of genes that are differentially regulated at the onset of the experimental time course. specifically, these genes include the macrophage m1 marker ptgs2, as well as the b-locus gene cyp21. other genes exhibiting this pattern include secreted interleukin ligands il-1β, il4i1, and il6, along with genes associated with inhibition of cellular processes including irg1 and mip-3α. interestingly, the adenosine receptor also displays this pattern of expression. these genes may represent initial modulators of divergent monocyte to macrophage differentiation between the b2 and b19 cells.
the second example of divergent expression patterns is the single peak of day t-6 expression in the b2 haplotype cells compared to the prolonged multiple day expression in the b19 haplotype cells. some of these genes are macrophage differentiation mediators, like gata2 [50] , and fadd, while others are macrophage podosome (primary matrix structure) markers, including vcl and gsn. other genes exhibiting this divergent expression pattern include chemokine receptors, like cxcr4, fatty acid transport, such as slc25a17, and ubiquitin related factors, like dd5, which is associated with proteasomal degradation of gene products. additional interesting divergent gene expression patterns were observed between the two haplotypes occurring after stimulation by ifnγ (fig 3) . a notable difference in post-stimulatory induction of gene expression is a four-hour difference in peak expression timing for a large number of induced genes. in the b2 haplotype macrophages, the peak expression occurs between 2 and 4 hours, while in the b19 macrophages, the peak expression occurs between 4 and 8 hours. some of the most noticeable genes exhibiting this divergent gene expression pattern include litaf, il-1β, il12, and ifih1, genes involved in macrophage signaling and m1 macrophage polarization [26] . additionally, a number of genes implicated in invadosome assembly and function also exhibit this temporally displaced pattern of induction such as cd44, rac1, and src. another discernable difference in post-stimulatory induction of genes between the b2 macrophages compared to the b19 macrophages is one of coherence (fig 3) . specifically, there are a number of genes for which the b2 macrophages are able to rapidly turn on and reach relatively high levels of expression within 2 to 4 hours of ifnγ stimulation. in contrast, these same genes fail to exhibit a coherent peak of expression, even after 4 to 8 hours, in the b19 cells. 
instead, they exhibit a dispersed "smear" of gene expression extending from approximately 1 hour after stimulation to 16 hours post-stimulation. some of the most represented genes exhibiting this divergent pattern of expression include molecules involved in lysosome function and phagocytosis. cttn and actr3, genes implicated in fcr mediated phagocytosis, along with lysosomal-associated molecules, like laptm5 and lamp1, as well as the lysosomal transporter molecules atp6ap1, atp6v1g1 and atp6v0c, exhibit this non-coherent pattern of expression in the b19 macrophages. in contrast, immediately following stimulation, the b2 cells rapidly induce expression of roughly 6000 genes by 2 h following stimulation, while, at the same time, the cells derived from the b19 birds show no signs of induction among these genes until after 4 h. it is interesting to note that while the b2 birds show a statistically significant increase in expression for 6100 genes between 1 h and 2 h, the b19 cells exhibit increased expression for just 66 genes at this time point. the largest wave of increased gene expression occurs in the b19 cells during the transition from 2 h to 4 h post stimulation, when 1164 genes increase significantly over this time period. at the transition between 8 h and 16 h, the b2 haplotype group only exhibits differences in expression for 83 genes, with 44 having higher expression at the 16 h time point. yet, the b19 cells show differences in 386 genes during this same period, but interestingly, 356 of these genes exhibit decreased expression during this same time interval. taken together, these results suggest that a global disruption of temporal gene expression underlies the observed differences in differentiation, activation and nitric oxide production from macrophages derived from the two different mhc haplotypes.
four distinct patterns were identified as representative of the types of divergent gene expression that re-occur across many genes involved in macrophage differentiation, activation and function in b2 versus b19 macrophages.
1. day t-6: b2 high vs b19 low. this divergent pattern exhibits strong expression of genes on day -6 in the b2 birds while relatively low levels of expression are observed in the b19 birds at the same time point. genes of interest include an adenosine receptor (p2ry12).
2. day t-6: b2 = 1 day vs. b19 = 3 days. this example of divergent patterns is the single peak of day t-6 gene expression in the b2 haplotype cells compared to the prolonged multiple day expression until day t-3 in the b19 haplotype cells. genes of interest include macrophage differentiation gene gata, adenosine receptor a2a and macrophage podosome markers vcl and gsn.
macrophages. another interesting divergent gene expression pattern observed between the two haplotypes occurs after stimulation by ifnγ. there is a four-hour difference in peak expression timing for a large number of induced genes. in the b2 haplotype macrophages, the peak expression occurs between 2 and 4 hours, while in the b19 macrophages, the peak expression occurs between 4 and 8 hours.
4. maximum ifnγ stimulation: b2 = coherent vs. b19 = non-coherent. another discernable difference in post-stimulatory induction of genes between the b2 macrophages compared to the b19 macrophages is one of coherence. specifically, there are a number of genes for which the b2 macrophages are able to rapidly turn on and reach relatively high levels of expression within 2 to 4 hours of ifnγ stimulation. in contrast, these same genes fail to exhibit a coherent peak of expression, even after 4 to 8 hours, in the b19 cells. instead, they exhibit a dispersed "smear" of gene expression extending from approximately 1 hour after stimulation to 16 hours post-stimulation.
https://doi.org/10.1371/journal.pone.0179391.g003
gene expression was measured in separate samples of b2 and b19 cells following stimulation with ifnγ. change in expression was assessed at 2 hours and 4 hours post stimulation. atp6v0c exhibited the greatest induction of all genes assayed, showing an increased expression in the b2 cells at 4 hours that was 20 times the initial expression at 0 hours. expression of atp6v0c was dramatically less in the b19 birds. similarly, il18r exhibited greater than 9 times the initial expression in the b2 cells at 4 hours compared to the b19 cells, which exhibited less than 2 times the initial expression at 0 hours. litaf and tlr2 exhibited more than 7 times the expression at 4 hours in the b2 macrophages, while tln-1, tlr-5, tlr-6 and tlr-7 exhibited greater than 4 times the initial expression in the b2 macrophages. in contrast, the b19 macrophages failed to exhibit comparable induction of these genes (fig 4). in addition to the rt-pcr validation of gene expression, 54 genes, for which gene expression changes were described following cytomegalovirus stimulation, were used as comparisons for the corresponding genes in the b2 and b19 haplotype birds (fig 5). a total of 25 published genes exhibited decreases in expression following cytomegalovirus stimulation while 29 genes exhibited increased expression following stimulation. interestingly, all but one gene (fez1) in the b2 cells exhibited increased expression following ifnγ stimulation. in contrast, ten genes displayed decreased expression in the b19 cells. of the ten exhibiting fold-change < 0 in the b19 cells, 70% also exhibited decreased expression in the cytomegalovirus stimulated cells. in total, 28 genes (52%) expressed in the b2 cells matched the direction of the fold change reported in the published data while 33 genes (61%) corresponded between the b19 cells and the published data.
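the direction-of-change comparison summarized above can be expressed as a short python sketch. it counts how many shared genes change in the same direction in two data sets of log2 fold changes; the function name, the gene labels and the choice to treat a fold change of exactly zero as "not increased" are assumptions made for illustration.

```python
def direction_concordance(observed, published):
    """Count shared genes whose log2 fold changes have the same direction.

    observed / published map gene symbol -> log2 fold change.
    Assumption: a value of exactly 0 is grouped with decreases.
    Returns (matching genes, shared genes, fraction matching).
    """
    shared = observed.keys() & published.keys()
    if not shared:
        raise ValueError("the two data sets share no genes")
    matches = sum(
        1 for gene in shared if (observed[gene] > 0) == (published[gene] > 0)
    )
    return matches, len(shared), matches / len(shared)


# Placeholder genes and values, for illustration only.
observed_example = {"g1": 1.2, "g2": -0.5, "g3": 2.0, "g4": 0.7}
published_example = {"g1": 3.0, "g2": -1.0, "g3": -0.4, "g5": 1.0}
matches, shared_count, fraction = direction_concordance(
    observed_example, published_example
)
```

applied to the figures in the text, 28 matching genes out of 54 correspond to about 52% and 33 out of 54 to about 61%.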
of the ten published genes reported as having greater than 5-fold increased expression, 90% of the b2 genes exhibited fold-change in the same direction. overall, this data, in conjunction with the rt-pcr data, provides a set of validation data indicating that the b2 and b19 gene expression data is reproducible and similar to expression patterns observed in cells stimulated towards a macrophage activation pathway. visualization of gene expression via heat maps facilitated the identification of distinct expression patterns between the b2 and b19 haplotypes. because any initial differences in gene expression existing 6 days before ifnγ stimulation represent candidates responsible for the observed phenotypic differences between the two haplotypes, genes exhibiting divergent gene expression patterns between b2 and b19 birds on day -6 were identified (fig 6). the genes cluster into four major clades (clade1, clade2, clade3, and clade4) with a singleton labelled clade 5. among these genes, represented in clade1 and clade2, are a number of mirnas exhibiting strong expression in b2 cells (mir-147, mir-146b, mir-1618, mir-200a, mir-1649, and mir-1648a) compared to the b19 samples. likewise, mirnas contained in clade3 and clade4 exhibit greater expression in b19 cells (mir-1627, mir-222b, mir-1633, and mir-19a). a number of small nucleolar rnas (snornas) exhibit similarly dichotomous gene expression patterns (clade4). for example, snord24, snoz40, snord74, snora17, and snord12 exhibit substantially higher levels of expression in b19 cells on day -6 compared to b2 cells (fig 6). however, b2 cells also express snornas exhibiting divergent expression patterns between the two haplotypes, such as snou2_19 (clade2). in addition to non-coding rnas, divergent expression patterns are also observed with protein-coding rnas (fig 6).
for example, clade1 contains il6, il18, il-1β, ccl1, ptpn2 and mmp10, which exhibit higher initial expression on day -6 in the b2 birds. in contrast, the protein-coding genes lamp2, ubxn7, ube4b, pk3ca, ube2w, and cx3cr1, in clade3, exhibit higher initial expression patterns in b19 birds. clade5 contains the single gene ifnγ, which exhibits relatively low expression early in both b2 and b19 cells, but following stimulation rises to a higher level at 8 hours in the b19 birds. 54 genes, for which gene expression changes were previously described following cytomegalovirus stimulation were used as comparisons for the corresponding genes in the b2 and b19 haplotype birds. a total of 25 published genes exhibited decreases in expression following cytomegalovirus stimulation while 29 genes exhibited increased expression following stimulation. all but one gene (fez1) in the b2 cells exhibited increased expression following ifnγ stimulation. in contrast, ten genes displayed decreased expression in the b19 cells. of the ten exhibiting fold-change < 0 in the b19 cells, seven exhibited decreased expression in the cytomegalovirus stimulated cells. twenty-eight genes (52%) expressed in the b2 cells matched the direction of the fold change reported in the published data while 33 genes (61%) corresponded between the b19 cells and the published data. of the ten published genes reported as having greater than at least 5-fold increased expression, 90% of the b2 genes exhibited fold-change in the same direction. considering the diverse expression patterns discovered and the results indicating the involvement of non-coding rnas, further results including divergent non-coding rna expression will be described in more detail in further publications. enrichment analysis of genes exhibiting statistically significant differences in expression between time points and/or haplotypes (table 3 , s2, s3 and s4 tables) was performed using gene ontology and both kegg and reactome pathways. 
identification of divergent gene expression patterns between b2 and b19 macrophages. visualization of divergent gene expression patterns between the b2 and b19 haplotypes. a subset of genes exhibiting divergent gene expression was identified and visualized in a heat map following hierarchical clustering of the genes (rows), but not the time points (columns). the genes cluster into four major clades (clade1, clade2, clade3, and clade4) with a singleton gene (labelled clade 5). among these genes, represented in clade1 and clade2, are a number of mirnas exhibiting strong expression in b2 cells (mir-147, mir-146b, mir-1618, mir-200a, mir-1649, and mir-1648a) compared to the b19 samples. likewise, mirnas contained in clade3 and clade4 exhibit greater expression in b19 cells (mir-1627, mir-222b, mir-1633, and mir-19a). additionally, a number of small nucleolar rnas (snornas) exhibit similarly dichotomous gene expression patterns (clade4) such that snord24, snoz40, snord74, snora17, and snord12 exhibit substantially higher levels of expression in b19 cells on day -6 compared to b2 cells, while b2 cells express snornas such as snou2_19 (clade2). https://doi.org/10.1371/journal.pone.0179391.g006
the results of the gene enrichment provided a high-resolution perspective of the functional role of mrna sequenced within the b2 cells across the experimental time points. between the -6 day and -3 day time points, a large number of genes exhibit reduced expression. these genes are enriched for biological processes such as transcription and mrna splicing. the transition from day -3 to t0 (just prior to ifnγ stimulation) correlates with down regulation of genes associated with chromosome segregation, mitotic nuclear division and dna repair, cell cycle pathways, acetylation and some immunological functions of interleukin 3 and 5 signaling and interleukin receptor signaling.
genes exhibiting increases in expression during this interval were enriched in processes and pathways related to interleukin 4, biosynthesis of amino acids, and metabolic pathways. within an hour of ifnγ stimulation, genes exhibiting increased expression were associated with inflammatory and defense responses, toll-like receptor signaling pathways, cell chemotaxis, myd88 signaling, jak-stat signaling, cell adhesion, foxo signaling, and raf activation. genes exhibiting an increase in expression within two hours of ifnγ stimulation are enriched for biological processes of intracellular protein transport, er-to-golgi vesicle-mediated transport, endosome to lysosomal transport, chromatin remodeling, histone h3 acetylation, regulation of vesicle fusion and protein import into the nucleus. among the kegg pathways that exhibit enrichment for these genes are rna transport, protein export, lysosome, mrna surveillance pathway, endocytosis and mtor signaling. similarly, the enriched reactome pathways mirror these processes and include cytosolic sensors of pathogen-associated dna, rna polymerase ii initiation and promoter clearance, mtorc1-mediated signaling, map2k and mapk activation, downstream tcr signaling, fcε receptor 1 mediated mapk activation, and clec7a signaling. genes exhibiting decreased expression between 2 hours and 4 hours following ifnγ stimulation correspond to reduced expression of cell-cycle pathways and mediators, as well as genes implicated in g1/s transcription, dna replication, and separation of sister chromatids. biological processes identified within these genes include mitotic spindle assembly, chromosome segregation, and mitotic spindle checkpoint assembly.
conversely, genes exhibiting increased expression during this same time are enriched for biological processes of inflammatory response, defense response to virus, positive regulation of interleukin 12 production, negative regulation of viral genome replication, bacterial defense processes and positive regulation of il1α secretion. kegg pathways associated with these genes include cytokine-cytokine receptor interaction and genes implicated in influenza a signaling. reactome processes identified included stem cell population maintenance and tor signaling. over the remaining time points, from 4 hours to 8 hours, from 8 hours to 16 hours and from 16 hours to 24 hours, the b2 cells exhibit a systemic down regulation of the genes that were initially activated during the ifnγ stimulation. overall, the gene enrichment analysis of the rna sequence data provides a cellular-level picture of the specific biological processes that occur over time following activation of monocyte-derived macrophages. previous work in our laboratories investigated the association between chicken haplotype and disease resistance, specifically the enhanced resistance of b2 haplotypes to avian coronavirus ibv [10] and the influence of innate immunity leading to decreased clinical signs of illness. we showed that macrophages play an important role in this enhanced immunity, demonstrating much better activation in response to stimulation [13]. to analyze the gene expression involved in this process leading to increased macrophage nitric oxide release in b2 haplotypes, we stimulated macrophages from b2 and b19 chicks for rna sequencing. in addition, we had observed different cell morphology when isolated monocytes from b2 and b19 haplotypes were differentiating into macrophages, and therefore time points before stimulation, during differentiation of the macrophages, were included in this study.
the rationale for investigating the gene expression differences between the b2 and b19 haplotype birds was to address underlying questions that were raised at the end of our previous studies: 1. why do the ifnγ stimulated b19 derived macrophages exhibit decreased nitric oxide production compared to the ifnγ stimulated b2 derived macrophages? 2. how do the two lineages of macrophages differ at the gene expression level? 3. what specific patterns of gene expression correlate with divergent macrophage differentiation, activation and function? 4. what is the underlying cause of the divergent gene expression patterns observed between b2 and b19 macrophages? the data collection, analysis and interpretation of results described herein provide plausible answers to these questions based on bioinformatics and functional genomics approaches. although these answers are more realistically new hypotheses for further investigation, they do represent significant advances in the understanding of b2 and b19 monocyte differentiation into macrophages and the resulting divergent patterns of b2 and b19 macrophage activation and function. ultimately, the findings and interpretations we report must be functionally and experimentally validated. even so, the use of computational methods to answer these questions represents a valuable first step in deciphering the cellular phenotypes underlying mhc haplotype variation in macrophage cells. our results demonstrate that there are large numbers of genes differentially expressed in the two haplotypes, both during differentiation of peripheral monocytes into mature macrophages, as well as after stimulation of differentiated macrophages with interferon. 
the answer to the question of why the ifnγ stimulated b2 haplotype cells produce more nitric oxide than the ifnγ stimulated b19 cells lies in the timing of macrophage differentiation and the phenotypic variation that is set up early, prior to ifnγ stimulation, such as divergent expression of genes involved in differentiation and immune competence. at day t-6, after plating of monocytes without interferon stimulation, several genes relating to inflammation, interferon responses and differentiation are upregulated in the b2 haplotype. this is to be expected, as adherence of the monocytes is itself an activation signal, but it is notable that this signal does not result in the same gene expression pattern in the b19 haplotype. this pattern can be observed for genes such as il1β, ptgs2 and il6, which are mainly associated with the inflammatory m1 phenotype. on the other hand, adenosine receptor a2b also shows increased gene expression at this time in b2 cells, and this receptor plays an important role in differentiation, as well as in the inflammatory response. expression of these genes is consequently increased first at the time of adherence, and then again after stimulation with interferon, in the b2 haplotype. some, but not all, of these genes are expressed after stimulation with interferon in the b19 haplotype, but not to the same extent as in the b2 macrophages, which appears to relate to the initial lack of expression at t-6 days, the beginning of differentiation. another interesting observation was the differential expression of genes at day t-6 versus day t-3 in the two haplotypes: a large number of genes are highly expressed in both haplotypes at day t-6, but are then completely shut down in b2 haplotypes while showing delayed expression until day t-3 in the b19 birds. this seems to indicate a lack of appropriate regulation in the b19 birds, consequently leading to less coherent initiation of gene expression when stimulated.
some of the genes showing this pattern are the macrophage differentiation associated genes gata2 and fadd, as well as the macrophage podosome markers vcl and gsn. taken together, these results suggest that b2 differentiation from monocyte to macrophage is very tightly regulated, with many genes increasing in expression and this increased expression then being quickly shut down. in contrast, b19 gene expression is not well regulated, appearing to "linger" with either delayed or extended gene expression. consequently, we observed differences in expression of genes after stimulation with interferon. specifically, genes that were strongly expressed at day t-6 and not expressed (or only weakly expressed) at day t-3 in the b2 birds were robustly increased at 2 and 4 hours of stimulation. in contrast, the same genes showed weak and delayed expression in b19 birds after stimulation, emphasizing the importance of the regulation of gene expression during differentiation. this relates very well to the differences we previously reported in morphology of b2 and b19 macrophages during differentiation and after stimulation. our results provide insight into the complexity associated with macrophage differentiation, activation and function. thousands of genes are up-regulated and then down-regulated in a 24-hour period following ifnγ stimulation. the coordinated activity of multiple regulatory and gene expression control mechanisms is required to effectively achieve the dramatic changes in internal cellular programming that occur. although the b2 and b19 birds' haplotypes differ within the mhc locus, the functional consequences of this genetic difference extend well beyond the genes encoded within the b-locus, including macrophage differentiation, m1 and m2 macrophage markers, lysosomal factors involved in phagocytosis, podosome development, invadosome capabilities, chemotaxis potential and matrix degradation ability.
additionally, during the process of differentiation and activation, thousands of genes associated with basic cellular biology undergo rapid changes in expression in coordination with the expression of factors associated with cell renewal and proliferation such as cell cycle regulators, mitotic spindle components, factors involved in chromatin remodeling, and molecules required for chromosome segregation and nuclear division. taken together, our data and interpretations provide a framework of possible mechanisms of b2 and b19 macrophage biology in differentiation and activation. as such, our findings offer a number of hypotheses about macrophage cell biology that can be used for subsequent studies aimed at validating our findings. although we performed rt-pcr on a set of genes differentially expressed between the b2 and b19 macrophages, it is not feasible to systematically verify via rt-pcr every transcript observed at each time point in the experiment. even so, our pcr validation provides independent evidence that the pattern of gene expression we observed in the rna sequence data was consistent and reproducible, which is also in line with our previous research detailing differences in macrophage activation and function [13].
conclusions
we have tried to elucidate possible mechanisms involved in enhanced disease resistance and macrophage functions displayed by b2 haplotype chickens compared to b19. this study highlights the complex gene expression patterns involved in macrophage differentiation and activation. one of the main conclusions from the large number of differences seen in the gene expression of the two haplotypes is that there are not just a few genes or genetic markers that can be readily identified as being the ultimate cause of enhanced macrophage function in b2 chickens.
rather, it appears that events during differentiation of monocytes into macrophages have a significant impact on the subsequent ability for stimulation of immune genes after ifnγ treatment. the differences in gene expression correlate with the previously observed differences in morphology of the two haplotypes, with b2 macrophages having a more typical macrophage appearance [13] . considering the global temporal dysregulation of many genes in b19 haplotypes compared to the more resistant b2 chicks, it seems likely that a variety of genomic regulatory mechanisms (such as transcription factors, mirnas, snornas, ubiquitin mediated proteasomal degradation, and epigenetic regulation) might play a major role in this process which will be further detailed in a future publication. it will be of great interest to further elucidate these mechanisms and their connection to enhanced immunity. ultimately, our detailed model of macrophage differentiation, activation and function following ifnγ stimulation provides a high resolution molecular map of cellular biology which can be leveraged by other investigators to further explore the role of these genes in immunology. supporting information s1 table. significant differences in gene expression with p-values. pairwise gene expression differences between samples (b2 and b19 haplotype) and timepoints (-6 days, -3 days, 0 days, 1 hr, 2 hrs, 4 hrs, 8 hrs, 16 hrs, 24 hrs) are provided with p-values in excel format. the analysis described within this manuscript focused on differences between [1] matched time points between b2 and b19 haplotype chickens (such as b2 1 hr versus b19 1 hr) as well as [2] progressive timepoints within the same haplotype group (such as b2 1hr versus b2 2hr and b19 4 hr versus b19 8 hr). subsequently the data contained in this supplemental file also focuses predominantly on those comparisons. 
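the two families of comparisons just described can be enumerated programmatically. the sketch below is only an illustration of that bookkeeping: the 'ab' and 'ec' prefixes follow the s1 table note, but the time-point label strings, the underscore-joined sample names and both function names are hypothetical and are not taken from the supplemental file.

```python
# Time points listed for the S1 table. The label strings themselves are
# hypothetical; the actual sample names in the file may differ.
TIMEPOINTS = ["-6d", "-3d", "0d", "1h", "2h", "4h", "8h", "16h", "24h"]

# Sample prefixes from the S1 table note: 'ab' = B2 haplotype, 'ec' = B19.
B2_PREFIX = "ab"
B19_PREFIX = "ec"


def matched_comparisons():
    """Pairs of type [1]: same time point, B2 versus B19."""
    return [(f"{B2_PREFIX}_{t}", f"{B19_PREFIX}_{t}") for t in TIMEPOINTS]


def progressive_comparisons(prefix):
    """Pairs of type [2]: consecutive time points within one haplotype."""
    return [
        (f"{prefix}_{earlier}", f"{prefix}_{later}")
        for earlier, later in zip(TIMEPOINTS, TIMEPOINTS[1:])
    ]


if __name__ == "__main__":
    for pair in matched_comparisons():
        print("matched:", pair)
    for prefix in (B2_PREFIX, B19_PREFIX):
        for pair in progressive_comparisons(prefix):
            print("progressive:", pair)
```

under these assumptions there are nine matched comparisons and eight progressive comparisons per haplotype, which gives a quick check on whether a filtered copy of the table covers every intended pair.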
this file contains a total of 163,043 rows including the header line containing field names (genename-identifier for each gene either as gene symbol or ensembl gene id; locus-chromosome number and the start-end base pair location of the gene; sample1 and sample2-the "paired" samples compared for significant gene expression, note 'ab' corresponds to b2 haplotype and 'ec' corresponds to b19 haplotype; teststatus-indication that the analysis method performed by cuffdiff program within the cufflinks package was 'ok'; fpkm1 and fpkm2-the fpkm values for sample 1 and sample 2 respectively; log2fpkm-the log2 of the ratio of fpkm1 and fpkm2; teststat-the test statistic generated during the statistical analysis; pvalue-the p-value corresponding to the difference in expression between sample1 and sample2; qvalue-a multiple testing corrected p-value; signif-'yes' indicates that the pairwise difference in expression is in fact statistically significant). (xlsx) s2 table. significant gene ontology biological process enrichment analysis results. enriched gene ontology biological process terms identified within up and down regulated genes within the b2 haplotype chicken samples across progressive time points (-6 days, -3 days, 0 days, 1 hr, 2 hrs, 4 hrs, 8 hrs, 16 hrs, 24 hrs) are provided in excel file format. since the b2 chickens exhibited the most robust macrophage phenotype these samples were used for the analysis as a means of characterizing the biological processes that were associated with the altered gene expression across the experimental time points. note 'ab' indicates b2 haplotype.
the file contains a total of 362 rows including the header line containing field names (sample comparison-indicates the specific pair of time points for which gene expression changes were identified; gene set-indicates the specific set of differentially expressed genes, 'down' or 'up'; category-indicates the specific subset of terms that were used for the analysis; term-provides the gene ontology identifier for the identified gene ontology term/annotation; description-the specific enriched gene ontology biological process term; gene count-the number of genes within the differentially expressed genes that are mapped to the particular enriched gene ontology term; %-the corresponding percent associated with the specific number of genes; p-value-the p-value associated with the gene ontology term enrichment). (xlsx) s3 table. significant kegg pathway enrichment analysis results. since the b2 chickens exhibited the most robust macrophage phenotype these samples were used for the analysis as a means of characterizing the kegg pathways that were associated with the altered gene expression across the experimental time points. note 'ab' indicates b2 haplotype. the file contains a total of 110 rows including the header line containing field names (sample comparison-indicates the specific pair of time points for which gene expression changes were identified; gene set-indicates the specific set of differentially expressed genes, 'down' or 'up'; category-indicates the specific subset of terms that were used for the analysis; term-provides the kegg pathway identifier for the identified pathway term/annotation; description-the specific enriched kegg pathway term; gene count-the number of genes within the differentially expressed genes that are mapped to the particular enriched kegg pathway term; %-the corresponding percent associated with the specific number of genes; p-value-the p-value associated with the kegg pathway term enrichment). (xlsx) s4 table.
significant reactome pathway enrichment analysis results. since the b2 chickens exhibited the most robust macrophage phenotype these samples were used for the analysis as a means of characterizing the reactome pathways that were associated with the altered gene expression across the experimental time points. note 'ab' indicates b2 haplotype. the file contains a total of 110 rows including the header line containing field names (sample comparison-indicates the specific pair of time points for which gene expression changes were identified; gene set-indicates the specific set of differentially expressed genes, 'down' or 'up'; category-indicates the specific subset of terms that were used for the analysis; term-provides the reactome pathway identifier for the identified pathway term/annotation; description-the specific enriched reactome pathway term; gene count-the number of genes within the differentially expressed genes that are mapped to the particular enriched reactome pathway term; %-the corresponding percent associated with the specific number of genes; p-value-the p-value associated with the reactome pathway term enrichment). association of inos, trail, tgf-beta2, tgf-beta3, and igl genes with response to salmonella enteritidis in poultry.
genetics, selection, evolution: gse tumour necrosis factor-alpha-857t allele reduces the risk of hepatitis b virus infection in an asian population heterophils isolated from chickens resistant to extra-intestinal salmonella enteritidis infection express higher levels of pro-inflammatory cytokine mrna following infection than heterophils from susceptible chickens profiling pro-inflammatory cytokine and chemokine mrna expression levels as a novel method for selection of increased innate immune responsiveness association of the major histocompatibility complex with avian leukosis virus infection in chickens host age and major histocompatibility genotype influence on rous sarcoma regression in chickens resistance to marek's disease in chickens with recombinant haplotypes to the major histocompatibility (b) complex chicken mhc molecules, disease resistance and the evolutionary origin of birds major histocompatibility complex and background genes in chickens influence susceptibility to high pathogenicity avian influenza virus immune-related gene expression in two b-complex disparate genetically inbred fayoumi chicken lines following eimeria maxima infection the chicken major histocompatibility complex in disease resistance and poultry breeding dramatic differences in the response of macrophages from b2 and b19 mhc-defined haplotypes to interferon gamma and polyinosinic:polycytidylic acid stimulation macrophages from disease resistant b2 haplotype chickens activate t lymphocytes more effectively than macrophages from disease susceptible b19 birds intracellular survival of brucella abortus, mycobacterium bovis bcg, salmonella dublin, and salmonella typhimurium in macrophages from cattle genetically resistant to brucella abortus susceptibility to mycobacterial infections: the importance of host genetics role of strain differences on host resistance and the transcriptional response of macrophages to infection with yersinia enterocolitica differential responses of macrophages 
from bovines naturally resistant or susceptible to mycobacterium bovis after classical and alternative activation analysis of the transcriptional networks underpinning the activation of murine macrophages by inflammatory mediators transcription of innate immunity genes and cytokine secretion by canine macrophages resistant or susceptible to intracellular survival of leishmania infantum innate immunity: the virtues of a nonclonal system of recognition innate immunity: impact on the adaptive immune response innate immune recognition: mechanisms and pathways csf-1 as a regulator of macrophage activation and immune responses. archivum immunologiae et therapiae experimentalis differential effects of cpg dna on ifnbeta induction and stat1 activation in murine macrophages versus dendritic cells: alternatively activated stat1 negatively regulates tlr signaling in macrophages macrophage polarization and plasticity in health and disease interferon production and host resistance to type ii avian (marek's) leukosis virus (jm strain) in vivo effects of chicken interferon-gamma during infection with eimeria the role of tlr9 polymorphism in susceptibility to pulmonary tuberculosis atf3 confers resistance to pneumococcal infection through positive regulation of cytokine production chicken interferon-mediated induction of major histocompatibility complex class ii antigens on peripheral blood monocytes. vet immunol immunopathol crosstalk among jak-stat, toll-like receptor, and itam-dependent pathways in macrophage activation regulation of interferon and toll-like receptor signaling during macrophage activation by opposing feedforward and feedback inhibition mechanisms interferon-gamma and polyunsaturated fatty acids increase the binding of lipopolysaccharide to macrophages toll-like receptor 2 and 4 surface expressions on human monocytes are modulated by interferon-gamma and macrophage colony-stimulating factor. 
immunology letters triggering of toll-like receptors modulates ifn-gamma signaling: involvement of serine 727 stat1 phosphorylation and suppressors of cytokine signaling 1 and icsbp control constitutive and ifn-gamma-regulated tlr9 gene expression in mouse macrophages development of monocytes, macrophages, and dendritic cells shaping of monocyte and macrophage function by adenosine receptors a(2b) adenosine receptors in immunity and inflammation an avian, oncogenic retrovirus replicates in vivo in more than 50% of cd4+ and cd8+ t lymphocytes from an endangered grouse a dna vaccine expressing env and gag offers partial protection against reticuloendotheliosis virus in the prairie chicken (tympanicus cupido) differential nitric oxide production by chicken immune cells non-replicating adenovirus vectors expressing avian influenza virus hemagglutinin and nucleocapsid proteins induce chicken specific effector, memory and effector memory cd8(+) t lymphocytes. virology profile of toll-like receptor expressions and induction of nitric oxide synthesis by toll-like receptor agonists in chicken monocytes transcriptome analysis reveals human cytomegalovirus reprograms monocyte differentiation toward an m1 macrophage gata6 regulates aspartoacylase expression in resident peritoneal macrophages and controls their survival trim family proteins: retroviral restriction and antiviral defence evolution of a cytoplasmic tripartite motif (trim) protein in cows that restricts retroviral infection a mechanistic basis for the co-evolution of chicken tapasin and major histocompatibility complex class i (mhc i) proteins we would like to thank the vivarium staff of western university of health sciences and the sequencing facilities staff at the university of delaware for their work. 
we would also like to thank rich upshaw and richard applebee at western university, and christopher sullivan, at oregon state university, who helped acquire, set up and maintain the computational hardware required to complete this project. additionally, the authors wish to thank paul gettler for his expertise and time in helping with figures for this paper.visualization: yd ki. writing -review & editing: yd ki. conceptualization: ki yd ec. key: cord-302047-vv5gpldi authors: willemsen, anouk; zwart, mark p title: on the stability of sequences inserted into viral genomes date: 2019-11-14 journal: virus evol doi: 10.1093/ve/vez045 sha: doc_id: 302047 cord_uid: vv5gpldi viruses are widely used as vectors for heterologous gene expression in cultured cells or natural hosts, and therefore a large number of viruses with exogenous sequences inserted into their genomes have been engineered. many of these engineered viruses are viable and express heterologous proteins at high levels, but the inserted sequences often prove to be unstable over time and are rapidly lost, limiting heterologous protein expression. although virologists are aware that inserted sequences can be unstable, processes leading to insert instability are rarely considered from an evolutionary perspective. here, we review experimental work on the stability of inserted sequences over a broad range of viruses, and we present some theoretical considerations concerning insert stability. different virus genome organizations strongly impact insert stability, and factors such as the position of insertion can have a strong effect. in addition, we argue that insert stability not only depends on the characteristics of a particular genome, but that it will also depend on the host environment and the demography of a virus population. the interplay between all factors affecting stability is complex, which makes it challenging to develop a general model to predict the stability of genomic insertions. 
we highlight key questions and future directions, finding that insert stability is a surprisingly complex problem and that there is need for mechanism-based, predictive models. combining theoretical models with experimental tests for stability under varying conditions can lead to improved engineering of viral modified genomes, which is a valuable tool for understanding genome evolution as well as for biotechnological applications, such as gene therapy.
© the author(s) 2019. published by oxford university press. this is an open access article distributed under the terms of the creative commons attribution non-commercial license (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. for commercial re-use, please contact journals.permissions@oup.com
a large number of virus genomes have been engineered to carry additional sequences for a variety of purposes. viruses are often used as vectors for heterologous gene expression in cultured cells or the natural host. for example, the baculovirus expression system is widely used for expression work (chambers et al. 2018), lentiviruses show great promise for gene therapy (milone and o'doherty 2018), and phage display allows for selection of desired epitopes (wu et al. 2016). marker genes have also been built into viruses to facilitate tracking infection spread (dolja, mcbride, and carrington 1992). as viruses evolve rapidly, including the incorporation of genome-rearrangements, it is therefore unsurprising that the insertion of sequences into viral genomes often goes hand in hand with the rapid occurrence of deletions (koonin, dolja, and morris 1993; pijlman et al. 2001; zwart et al. 2014). the inserted sequence, and sometimes parts of the viral genome, are then rapidly lost. this genomic instability can have economic ramifications, leading to decreases in heterologous protein expression (kool et al. 1991; de gooijer et al. 1992; scholthof, scholthof, and jackson 1996). it can also introduce limitations and complications to working with marker genes (majer, daròs, and zwart 2013). understanding the stability of inserted sequences therefore has value from an applied perspective, but it could also shed light on basic questions. first, how stable are natural virus genomes, and under what conditions do they become unstable? second, since horizontal gene transfer (hgt) plays an important role in virus evolution, under what conditions are transferred sequences likely to be retained? in this review, we consider the stability of inserted sequences and the dynamics of their removal from virus genomes from an evolutionary perspective. first, we provide an overview of empirical results which shed light on insert-sequence stability for viruses, based on the baltimore classification. second, we present some conceptual considerations pertaining to sequence stability, identifying important parameters for understanding and potentially predicting stability. we identify theory and experiments that point toward viable strategies for mitigating the rapid loss of inserted genes, and point out key questions that should be addressed in future research. we argue that virus genome organization has a large impact on the stability of inserted sequences, whilst stability is a complex trait that can depend on environmental conditions. we provide an overview of empirical results for the stability of natural and engineered inserted sequences, following the baltimore classification. our primary focus is on engineered viruses: studies where gene insertions are an addition to the viral genome (leading to an increase in genome size) and where the subsequent fate of these inserted sequences has been tracked. as inserted sequences can incur a fitness cost, these are often quickly purged from the viral genome.
often these fitness costs are related to a disruption of the viral genome (e.g. gene order). we therefore also consider studies on genome rearrangements in wild viruses and introduce other relevant modifications that shed light on what the impact of genomic inserts can be. we provide an overview of the results and main conclusions of our review in table 1. several studies relating to the stability of double-stranded (ds) dna viruses have been published. the dsdna viruses have a wide range of genome sizes, from the small polyomaviridae and papillomaviridae ranging from 4.5 to 8.4 kbp, to the relatively

table 1. we provide an overview of the main conclusions, for all viruses and for the different baltimore classification groups.
columns: viruses; genera covered in relevant studies; conclusions of this review
viruses: all viruses
conclusions of this review:
• inserted sequences are often unstable and rapidly lost upon passaging of an engineered virus
• the position at which a sequence is integrated in the genome can be important for stability
• sequence stability is not an intrinsic property of genomes because demographic parameters, such as population size and bottleneck size, can have important effects on sequence stability
• the multiplicity of cellular infection affects sequence stability, and can in some cases directly affect whether there is selection for deletion variants
• deletions are not the only class of mutations that can reduce the cost of inserted sequences, although they are the most common
viruses: i: dsdna
genera covered in relevant studies: alphabaculovirus, lambdavirus, mastadenovirus, orthopoxvirus, t7likevirus, varicellovirus
conclusions of this review:
• large genomes that are readily engineered and also highly plastic, as exemplified by the 'genome accordion' in poxviruses
• small insertions can be stable, but larger insertions are rapidly lost
• classic studies with phages exemplify how lower limits to the size of packaged genomes can be used to increase insertion stability
viruses: ii:

the inverted terminal repeats of vaccinia virus undergo rapid changes in size due to
unequal crossover events leading to stable and unstable forms (moss, winters, and cooper 1981). the diversity in this region is needed for immune evasion and for the colonization of novel hosts and appears to be mainly regulated by recombination events. however, other processes such as mutation leading to accelerated rates of recombination cannot be ruled out. poxviruses, such as vaccinia virus, are classified as nucleocytoplasmic large dna viruses (ncldvs). these viruses have larger than average genome sizes and the more recently discovered giant viruses are also classified as such. the ncldvs appear to have undergone a dynamic evolution where gene gain and loss events go in parallel with host-switches between animal and protist hosts (koonin and yutin 2018). interestingly, the phylogenomic reconstructions performed by koonin and yutin (2018) suggest that giant viruses (for which the host range appears to be restricted to protists) have evolved from simpler viruses (infecting animals) on many independent occasions. this again suggests that the host plays an important role in genome stability, where in animals the pressure for smaller virus genomes is stronger than in protists (koonin and yutin 2018). experimentally it has also been shown that vaccinia virus has a highly plastic genome. after deletion of one host range gene of vaccinia virus, another host range gene increases in copy number (elde et al. 2012), leading to genomic expansion. the increased gene expression is in itself beneficial, but the high gene copy number also increases the supply of beneficial gain-of-function mutations. once these gain-of-function mutations are fixed in the population, the other copies of the gene are lost and thus the vaccinia genome size decreases again (reflecting the cost of an increased genome size) (elde et al. 2012), leading to accordion-like evolutionary dynamics (andersson, slechta, and roth 1998).
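the claim that a higher gene copy number increases the supply of gain-of-function mutations can be made concrete with a short calculation. the sketch below is a toy illustration only: it assumes that every gene copy mutates independently with the same per-replication probability, and the function name and all parameter values are hypothetical rather than estimates from elde et al. (2012).

```python
def p_gain_of_function(copies: int, u: float, genomes: int) -> float:
    """Probability that at least one gain-of-function mutation arises.

    Toy assumption: each of the `copies` gene copies in each of the
    `genomes` replicated genomes independently acquires the mutation
    with probability `u` (a hypothetical per-copy rate).
    """
    if copies < 0 or genomes < 0 or not 0.0 <= u <= 1.0:
        raise ValueError("copies and genomes must be >= 0 and u in [0, 1]")
    # Complement of "no copy in any genome mutates".
    return 1.0 - (1.0 - u) ** (copies * genomes)


if __name__ == "__main__":
    u = 1e-6           # hypothetical per-copy mutation probability
    genomes = 100_000  # hypothetical number of replicated genomes
    for copies in (1, 2, 5, 10):
        print(copies, round(p_gain_of_function(copies, u, genomes), 4))
```

in this toy model the probability rises with copy number and saturates toward one, which is the sense in which amplification widens the mutational target. it says nothing about the cost of the larger genome, which is what drives the later collapse in copy number.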
modified vaccinia virus ankara (mva) is used as a viral vector for the development of vaccines against infectious diseases such as malaria, influenza, tuberculosis, hiv/aids, and ebola (sutter and staib 2003; gómez et al. 2012; gilbert 2013; stanley et al. 2014). the optimization of poxvirus promoters in this viral vector has proven to be an effective strategy for increasing the stability of antigen (inserted sequence) expression, and therewith the development of mva-based vaccines (alharbi 2019). although live attenuated vaccines have substantially reduced rabies prevalence after oral-vaccination campaigns were conducted (lafay et al. 1994; macinnes et al. 2001), such live vaccines are not efficacious in all rabies vector species. as an alternative, recombinant human adenovirus vaccine vectors expressing the rabies glycoprotein have been developed. the fitness of a replication-competent human adenovirus expressing the rabies glycoprotein was similar to that of the wild-type virus, as tested in vitro (knowles et al. 2009). moreover, the inserted rabies virus gene was stable during both in vivo and in vitro passaging (knowles et al. 2009), demonstrating the potential of this recombinant vaccine vector as an effective alternative. non-human adenoviruses can be used as alternative vaccine vectors, providing several advantages such as a limited host range and restricted replication in non-host species. by using bovine adenovirus type 3, a variety of antigens and cytokines were successfully expressed in vivo (ayalew et al. 2015). the stability of bovine adenovirus type 1 was tested by inserting the eyfp marker and subsequently passaging the recombinant virus in cell culture (ren et al. 2018). although replication of this recombinant virus was less efficient than the wild-type virus, the inserted eyfp was stable. engineered alphabaculoviruses (infecting arthropods) are widely used as vectors for the expression of heterologous genes in insect cells.
nonetheless, during serial passaging defective interfering (di) baculoviruses that lack large portions of the genome are rapidly produced, in what appears to be an intrinsic property of baculovirus infection (pijlman et al. 2001). as a result of having a smaller genome size, these dis most likely have a replicative advantage (higher fitness). especially in bioreactor configurations where the cellular multiplicity of infection (moi, the number of virus particles infecting a cell) is high, faster-replicating dis can rapidly reach high frequencies (kool et al. 1991). the rapid generation of dis involves several recombination steps and prevents the development of stable baculovirus expression vectors, as inserted sequences are then also rapidly lost (pijlman et al. 2001). the loss of sequences inserted into baculovirus genomes is not only due to the formation of dis. when an origin of replication that is enriched in di genomes was removed, baculovirus genomic stability at high mois increased as no dis were observed. strikingly, inserted foreign sequences were still rapidly lost (pijlman, van schijndel, and vlak 2003), showing that rapid di generation is not the only impediment to the stability of inserted genes. addition of endogenous viral sequences-homologous repeat regions important for baculovirus replication-to inserted sequences promoted the stability of insertions (pijlman et al. 2004), highlighting the importance of the genomic context for insert stability. another study in which the importance of the genomic context was stressed involved the generation of infectious clones and determination of the stability of suid herpesvirus 1, the causal agent of aujeszky's disease. sequences inserted in infectious clones were genetically stable in escherichia coli. however, for the reconstituted viruses, the insertion at the gg locus was highly unstable, whereas the same insert was stable when inserted between the us9 and us2 genes (smith and enquist 1999, 2000).
stability was only determined in a short-term experiment, but these results nevertheless emphasize the importance of the genomic context for stability, even in viruses with relatively large and stable genomes. bacteriophages were instrumental in the development of molecular cloning methods. among dsdna phages, lambdaviruses of e. coli were widely used as cloning vectors, and methods were developed to increase the stability and maximum size of inserts (chauthaiwale, therwath, and deshpande 1992). one interesting approach made use of the fact that there is a minimum genome size for efficient packaging into virus particles. when endogenous genes that are non-essential for the lytic cycle are removed, not only can larger sequences be inserted, but there is also selection for maintaining the inserted sequences because they increase genome size and enable packaging (thomas, cameron, and davis 1974). moreover, it has been shown that phage t7 engineered with a biofilm-degrading enzyme (dispersin b) was superior to unmodified phage at clearing short-term biofilms (lu and collins 2007). although it provides a 'public' benefit in the form of an exoenzyme that can degrade host defenses, this insertion surprisingly does not have a cost and is therefore stable (schmerer et al. 2014). interestingly, the insertion of an endosialidase at the same locus was both beneficial and costly, although in this case evolutionary stability was not determined (gladstone, molineux, and bull 2012). in summary, engineered dsdna viruses containing foreign gene insertions are relatively unstable and stability is only reached when the genomic context and demographic conditions (e.g. census population sizes, bottleneck sizes, and population structure) are optimal. in contrast, in natural conditions dsdna viruses appear to be highly plastic, with increases and decreases in genome size occurring on a relatively short evolutionary time scale.
in particular, host-switches may play important roles in increased plasticity and stability of dsdna viral genomes. even though unstable viral genomes may help increase viral fitness by avoiding the hosts' immune system in natural conditions, this instability may also prevent the development of stable viral expression vectors in bioreactor configurations. the ssdna viruses have much smaller genome sizes as compared to the dsdna viruses (group i), ranging from the 1.8 to 2.3 kbp genomes of the circoviridae to the 24.9 kbp genome of the spiraviridae. judging only by the small range in genome size, one would expect that ssdna viruses are less plastic compared to their dsdna counterparts, and thus less likely to accept foreign genes in their genomes. although few studies have addressed genomic stability of ssdna viruses after an insertion, the geminiviridae, with genomes of about 2.5-3 kbp (monopartite) or 4.8-5.6 kbp (bipartite), provide an example in wild viruses of frequent sequence insertions, duplications, and deletions. during the course of geminivirus infection in plants, shorter subgenomic dnas often arise. these subgenomic dnas can range in size and some result in defective dnas (stenger et al. 1992; stanley et al. 1997; patil et al. 2007) that replicate at the expense of the full-length genome. these subgenomic dnas can lead to reduced symptom severity in plants and thereby act as modulators of viral pathogenicity. it is speculated that the (sometimes stepwise) deletion process leading to subgenomic dnas can also be the process leading to the reversion to wild-type full-length dna molecules with either insertions or deletions that make these bigger or smaller than the wild-type genome (martin et al. 2011). when inserting sequences into the genome of maize streak virus (msv, geminiviridae), the infection efficiency decreased as the size of the insert increased (shen and hohn 1991).
although some of the msv mutants obtained deletions and reverted to the wild-type length, the frequency of the deletion process did not increase linearly with the size of the insert, but rather depended on the nature of the sequence (shen and hohn 1991). deletion mutants of the african cassava mosaic virus (acmv) have also been shown to revert to the original wild-type genome length through recombination between the two components of the bipartite genome (etessami, watts, and stanley 1989). the selection pressure on the reversion to wild-type genome length is probably a strong size constraint on encapsidation, where in the case of acmv the size of encapsidated dna determines the multiplicity of geminivirus particles (frischmuth, ringel, and kocher 2001). the nanoviridae family includes ssdna viruses with a multipartite genome composed of six to eight circular segments. segmented ssdna viruses present unique challenges when thinking about the stability of inserted sequences, because the frequency of genomic segments is highly plastic for some of these viruses. these viruses might therefore downregulate segments for expression of the inserted sequence, even if downregulation of co-localized homologous genes is costly. a lower frequency would also entail a lower mutational supply, limiting evolvability, the capacity of the virus to generate beneficial variation and subsequently adapt. segmented viruses might therefore display rapid adaptive responses to inserted sequences, whilst simultaneously limiting their potential for longer-term evolution. to the best of our knowledge, this potentially interesting tradeoff has not been shown. inserted sequences can be unstable in ssdna phages, which like their dsdna counterparts can also have an upper limit to genome size. inserts of up to 163 bp were stable in φx174, despite markedly reducing fitness (russell and muller 1984).
genomes with larger insertions were still infectious, although the insert was then rapidly lost. later, it was shown that short palindromic sequences could be inserted in φx174, but that these inserts become more unstable as the number of repeats is increased and when the repeats are identical (williams and müller 1987). in other work, it has been shown that phage display (wu et al. 2016) can be used to select clones coding for peptides with high affinity for a particular target, although selection for m13 phages with no insert, due to their presumed faster replication, can hamper 'phage panning' (tur et al. 2001). based on the little evidence we obtained, there appears to be strong selection for genome streamlining in ssdna viruses. after a sequence insertion, reversion to the wild-type genome size is observed in both natural and laboratory conditions. interestingly, the nature of the insert appears to be more important than the size of the insert, indicating that the genomic context also plays an important role in the stability of ssdna viruses. the dsrna viruses have a range of genome sizes (3.7-30.5 kb) that is similar to the ssdna viruses. most of the dsrna viruses contain segmented genomes, where during replication, positive-sense ssrnas are packaged into procapsids and serve as templates for dsrna synthesis. thus, the progeny particles contain a complete set of equimolar genome segments. proper recognition and stoichiometric packaging of the ssrnas is indispensable for multi-segmented genome assembly. although different dsrna viruses employ different mechanisms for this assembly, these all rely on proper recognition of the ssrnas in either specific rna-protein or rna-rna interactions (borodavka, desselberger, and patton 2018). we therefore expect that dsrna virus genomes are highly streamlined, since most gene insertions will probably disturb the recognition and packaging process of the ssrnas.
interestingly, for rotaviruses it has been observed that genome segments containing sequence duplications are preferentially packaged into progeny viruses relative to wild-type segments (troupin et al. 2011), indicating that an increase in genome/segment size may not be a hard constraint. we hypothesize that few if any gene insertions will lead to viable genomes due to the perturbation of segmented genome assembly into virus particles. if a gene insertion happens to be viable, it will probably be rapidly purged from the viral genome. one exception to this hypothesis could be a gene insertion originating from a closely related virus, for example a virus with similar packaging signals, leading to a fitness advantage, such as increased packaging efficiency. only a small number of studies that test the stability of inserts in dsrna viruses are available, and these concern the generation of recombinant rotavirus expressing foreign genes. group a rotavirus, consisting of eleven segments, has been engineered to express fluorescent proteins (kanai et al. 2017, 2019; komoto et al. 2017) such as enhanced green fluorescent protein (egfp) and mcherry. however, segment 5 in which these genes were introduced is expressed at low levels and is subject to proteasomal degradation. based on limited evidence, we tentatively conclude that stoichiometric packaging of segmented genomes may form an impediment to engineering and insert stability. however, recent work also suggests that careful engineering of dsrna viruses may lead to stable sequence insertions. the generality of these conclusions for the dsrna viruses, and their dependence on environmental and demographic conditions, remain to be seen. the ssrna(+) viruses range in genome size from 2.3 to 31 kbp. it has been shown that both animal and plant ssrna(+) viruses can express inserted foreign genes.
however, the nature of the ssrna(+) genomes poses several limitations to efficient expression and maintenance of the insert. most ssrna(+) genomes used for the expression of foreign genes code for a polyprotein, a single orf that is further processed after translation into different mature peptides. the processing occurs through autocatalytic cleaving at specific cleavage sites located between the different proteins to be expressed. insertions should therefore be carefully engineered, including proper cleavage sites corresponding to the site of insertion. even when these design rules are respected, inserts may restrict viral replication when conformational constraints prevent proper protease cleavage. in addition, the genomes of rna viruses tend to be composed of overlapping genes (belshaw, pybus, and rambaut 2007), which limits their adaptive capacity (simon-loriere, holmes, and pagán 2013). overlap can form an impediment to engineering, and perhaps to the likelihood an inserted sequence is maintained, as insertions will often affect multiple genes. we focus exclusively on engineered viruses, given that there are many examples for this virus group. poliovirus is a good candidate as a live viral vector for the expression of foreign genes, since the attenuated sabin strains of poliovirus elicit strong protective immune responses without causing disease (sabin 1957). insertions of up to 534 bp of the rotavirus vp7 gene into sabin 3 poliovirus gave rise to infectious viruses that expressed portions of the vp7 outer capsid protein (mattion et al. 1994). this is promising as antibodies generated to vp7 are able to neutralize the virus. nevertheless, the size of the insert in this construction is limited as only inserts of about 300 bp or smaller were stable upon serial passages in tissue culture, whereas larger insertions failed to produce infectious viruses (mattion et al. 1994).
one of the major limitations appears to be the polyprotein nature of the genome. the recombinant viruses expressing the inserted gene were found to be slower in the assembly of infectious virus particles, and showed smaller plaques and lower virus titers. this is possibly due to slow cleavage at the artificial cleavage sites around the insert (mattion et al. 1994). sindbis virus, another ssrna(+) virus whose genome encodes a single orf, accepted relatively large inserts of 3.2 kbp in the 11.8 kbp genome (pugachev et al. 1995). however, the recombinant sindbis viruses appeared unstable and especially inserts at the 3′ end were rapidly lost during serial passages, suggesting a positional effect. for members of the flaviviridae family, such as west nile virus and hepatitis c virus (hcv), inserted reporter genes appear to be unstable. this instability is related to the size of the insert, and comes about because of the disruption of structural rna elements required for viral replication (ruggli and rice 1999; pierson et al. 2005). to cope with these issues, recombinant flaviviridae viruses carrying the split-luciferase gene were generated (tamura et al. 2018), including dengue virus, japanese encephalitis virus, hcv and bovine viral diarrhea virus. in vitro, these recombinant viruses appear to be evolutionarily stable and propagation was comparable to the wild-type virus, most probably due to the small 11 amino acid insert size. to demonstrate the utility of the split reporter system for determining in vivo viral dynamics and the efficacy of antiviral reagents, the recombinant hcv was tested in chimeric mice. chronic infection was established and the luciferase gene was stably maintained in the viral genome (tamura et al. 2018). live attenuated vaccines for porcine reproductive and respiratory syndrome virus (prrsv) have failed to provide effective protection due to the genetic diversity of circulating prrsv strains.
to improve the efficacy of prrsv vaccination, a recombinant virus expressing porcine interleukin-4 (a regulator of the immune response) was constructed (zhijun . this recombinant virus remained stable upon serial passaging in vitro, and induced higher ratios of interleukin-4 and cd4+cd8+ double-positive t cells in vivo. despite the presumably better immune response of the host, the recombinant prrsv vaccine did not significantly improve protection efficacy (zhijun . in another attempt, granulocyte-macrophage colony-stimulating factor (gm-csf) was inserted in a prrsv vaccine strain. the inserted gene was stably expressed upon serial passaging in vitro, and the presence of gm-csf led to increased surface expression of mhci+, mhcii+, and cd80/86+ (yu et al. 2014). although evaluated solely in vitro, this recombinant strain is expected to elicit stronger immune responses and thereby improve vaccine efficacy against prrsv infection. it has been shown that many different plant viruses can express foreign genes, and they have the advantage of being able to express these directly in vivo. as an initial strategy to express foreign genes in plants, on many occasions viral genes were replaced with the gene of interest (gene replacement instead of gene insertion). this strategy appeared to be (partially) successful in plant ssdna viruses (hayes et al. 1988; ward, etessami, and stanley 1988; hayes, coutts, and buck 1989), as the replaced coat protein did not appear to play an essential role in virus spread throughout the plant host (ward, etessami, and stanley 1988). however, viral ssrna(+) genomes seem to be less plastic as the replacement strategy was mostly unsuccessful in plant rna viruses. although the rna viral vectors permitted the expression of replaced genes, either they were only viable in protoplasts and not in whole plants (french, janda, and ahlquist 1986; joshi, joshi, and ow 1990), or they were unable to establish systemic infections (takamatsu et al.
1987; dawson, bubrick, and grantham 1988). shortly after, studies showed that gene insertion, rather than gene replacement, was better suited for expressing foreign genes in ssrna(+) viral genomes (dawson et al. 1989; donson et al. 1991; chapman, kavanagh, and baulcombe 1992). the chloramphenicol acetyltransferase (cat) gene (dawson et al. 1989), and the dihydrofolate reductase (dhfr) and the neomycin phosphotransferase (npt) genes (donson et al. 1991) were successfully expressed in plants by using tobacco mosaic virus (tmv) as a vector. in addition, the bacterial gus gene has been shown to be successfully expressed when inserted into the viral genome of potato virus x (pvx) (chapman, kavanagh, and baulcombe 1992). however, in all these cases the presence of a foreign gene leads to genomic instability resulting in the partial deletion of the gus and npt genes and a complete deletion of cat during systemic infection. this instability may result from the presence of the insert leading to lower accumulation levels of the genomic rna, as well as leading to mrna instability and/or interfering with synthesis of the viral proteins. sequence redundancy due to a promoter duplication can also lead to genomic instability and thus the subsequent deletion of the inserted sequence (dawson et al. 1989; chapman, kavanagh, and baulcombe 1992). indeed, for tmv and pvx it has been shown that replacing one of the promoter sequences with that from related viruses (donson et al. 1991), together with further removal of additional sequence duplications (dickmeis, fischer, and commandeur 2014), leads to increased stability of the insert. interestingly, as for the dna viruses, the site and size of the insert seem to be important for ssrna(+) viruses. first, the positioning of the cat gene downstream (instead of upstream) of the tmv coat protein resulted in a poorly replicating virus that was not able to systemically infect the host plants (dawson et al. 1989).
and second, the dhfr gene (238 bp) inserted in a tmv background appears to be maintained stably through several passages, while the 3.5× larger npt gene (832 bp) in the same experimental setup was unstable during systemic movement of the virus. this may also be related to the nature of the insert, where sequences with a codon usage similar to that of the viral vector may be retained longer than those that have an opposite codon usage. interestingly, chung, canto, and palukaitis (2007) generated recombinant plant viruses with inserted genes of unrelated plant viruses and observed instability and variation in the rate of partial or complete loss of the insert depending on the inserted sequence itself, the host used, or the viral vector used. sequences with a high toxicity for the host are also more likely to be deleted faster or to impede viral replication. in a previous study we reported on experimental evolution of pseudogenization in virus genomes using tobacco etch virus (tev) expressing egfp (zwart et al. 2014), a gene known to be toxic in many expression systems. in this case egfp can be considered a non-functional sequence, as it does not add any function to the viral genome. we showed that egfp has a high fitness cost in tev, and the loss of egfp depended on the passage length, where longer passages led to a faster and assured loss. similarly, prolonged propagation of tev and plum pox potyvirus expressing gus (dolja, mcbride, and carrington 1992; dolja et al. 1993; guo, lópez-moya, and garcía 1998), and tmv expressing gfp (rabindran and dawson 2001), led to the appearance of spontaneous deletion variants. due to the increase in genome size, viruses that carry an insert are unlikely to be as fit as the parental (ancestral) virus, even if they accumulate initially to similar levels.
the tev-egfp genomes that had lost the insert had a within-host competitive fitness advantage, where the smaller the genome the higher the within-host competitive fitness. interestingly, although the size of the deletions varied, convergent evolution did occur in terms of fixed point mutations (zwart et al. 2014). this result also suggests that a demographic 'sweet spot' exists, where heterologous insertions are not immediately lost while evolution can act to integrate them into the viral genome. in summary, in several studies passage duration has an effect on insert stability, with inserts being more stable in shorter passages. we explore these effects in the conceptual section presented at the end of this paper (see also box 1).

here we illustrate how demography can affect the observed stability of an inserted sequence, using a simulation model. this model is based on willemsen et al. (2016) and incorporates logistic virus growth, deterministic recombination with a fixed rate, and population bottlenecks after a given number of generations. to describe virus growth and recombination in each generation, two coupled ordinary differential equations are used. here, I is the number of viruses with the insertion intact, D is the number of viruses with a deletion, ω is the initial growth rate of each virus variant, κ is the carrying capacity, ρ is the rate at which I recombines to D, and ψ is a constant for determining the effect of each virus on the other's replication, with the effect of D on I being ψ_D = ω_D/ω_I and vice versa ψ_I = 1/ψ_D. the frequency of the deletion variant is f_D = D/(I + D). at the start of each passage, to simulate the bottleneck we draw the number I from a binomial distribution with a size α and success probability f_D from the previous time point, and then D = α − I. to illustrate the effects of bottlenecks we chose the parameters in table 2, set the initial f_D to zero, and considered various values of α.
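the passaging model described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' implementation. the two growth equations are assumed to take a competitive logistic form, dI/dt = ω_I·I·(1 − (I + ψ_D·D)/κ) − ρ·I and dD/dt = ω_D·D·(1 − (D + ψ_I·I)/κ) + ρ·I, which agrees with the symbol definitions given but is not confirmed by the text. the bottleneck is read as drawing the number of deletion genomes with success probability f_D, with the remaining genomes carrying the insert. apart from the ratio ω_I/ω_D = 0.8, every numerical value (growth rates, κ, ρ, passage duration, euler step) and every function name is a hypothetical placeholder.

```python
import random


def grow(i, d, duration, omega_i=0.8, omega_d=1.0, kappa=1e6, rho=1e-6, dt=0.01):
    """Integrate the assumed competitive logistic equations with forward Euler.

    i, d     : numbers of insert-carrying and deletion genomes at the start
    duration : length of one passage, in the time units of the growth rates
    All default parameter values are illustrative assumptions.
    """
    psi_d = omega_d / omega_i   # effect of D on the replication of I
    psi_i = 1.0 / psi_d         # effect of I on the replication of D
    for _ in range(int(round(duration / dt))):
        di = omega_i * i * (1.0 - (i + psi_d * d) / kappa) - rho * i
        dd = omega_d * d * (1.0 - (d + psi_i * i) / kappa) + rho * i
        i = max(i + di * dt, 0.0)
        d = max(d + dd * dt, 0.0)
    return i, d


def bottleneck(f_d, alpha, rng):
    """Sample alpha founders; each is a deletion genome with probability f_d.

    Assumed reading of the bottleneck step: the deletion count is binomial
    with size alpha and success probability f_d.
    """
    d = sum(1 for _ in range(alpha) if rng.random() < f_d)
    return alpha - d, d


def prob_deletion_sampled(f_d, alpha):
    """Probability that at least one of alpha founders is a deletion genome."""
    return 1.0 - (1.0 - f_d) ** alpha


def simulate(alpha, passages, rng, duration=10.0, **growth_params):
    """Return the deletion-variant frequency at the end of each passage."""
    f_d = 0.0                   # the population starts with the insert intact
    history = []
    for _ in range(passages):
        i, d = bottleneck(f_d, alpha, rng)
        i, d = grow(float(i), float(d), duration, **growth_params)
        f_d = d / (i + d)
        history.append(f_d)
    return history


if __name__ == "__main__":
    for alpha in (1, 100, 10000):
        trajectory = simulate(alpha, passages=20, rng=random.Random(1))
        print(alpha, ["%.3g" % f for f in trajectory])
```

calling `simulate` with different values of `alpha` follows the deletion frequency across passages for narrow and wide bottlenecks, and `prob_deletion_sampled` gives the matching per-passage probability, 1 − (1 − f_D)^α, that a deletion variant passes the bottleneck at all.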
the difference in fitness between the virus with insertion and without is large (ω_I/ω_D = 0.8). the simulation data illustrate how under these conditions narrow bottlenecks can lead to a stable inserted sequence (fig. 2). during each round of passaging the frequency of the deletion variant rises, but as it does not reach a frequency near 1/α this variant is not sampled during the bottleneck. only when the bottleneck is wider is the probability of sampling the virus variant with a deletion large enough for this to occur regularly. once a deletion variant has been sampled during the bottleneck, it rapidly goes to fixation as it has a much higher fitness than the full-length virus. figure 1 provides a simple illustration of the same principle.

when considering host species jumps using the same tev-egfp vector, we show that host switches can radically change evolutionary dynamics (willemsen, zwart, and elena 2017). after over half a year of evolution in two semi-permissive host species, with a large difference in virus-induced virulence, the egfp insert appears to remain stable. a fitness cost of egfp was only found in the host for which tev has low virulence. in the hosts for which tev has high virulence there was no fitness cost and viral adaptation was observed. this contradicts theories that suggest that high virulence could hinder between-host transmission. when considering the evolution of genome architecture, host species jumps might play a very important role, by allowing evolutionary intermediates to be competitive. the stability of an insert could change when considering insertions that might be beneficial for the virus. using the tev genome we simulated two hgt events, by separately introducing functional exogenous sequences that are potentially beneficial for the virus (willemsen et al. 2017). in one case, the insertion was rapidly purged from the viral genome, restoring fitness to wild-type fitness levels.
in another case, the inserted gene, the 2b rna silencing suppressor from cucumber mosaic virus, did not seem to have a major impact on viral fitness and was therefore not lost when performing experimental evolution. interestingly, this insertion duplicated the rna silencing suppression function of another gene in the genome. when mutating this functional domain of the tev gene, the inserted gene provided a replicative advantage. these observations suggest a potentially interesting role for hgt of short functional sequences in improving evolutionary constraints on viruses. besides hgt, another mechanism for evolutionary innovation is gene duplication. the effects on the stability of a genetically redundant insert might be variable. on one hand, one would expect the duplicated copy to be rapidly deleted from the genome as it does not confer an additional function. on the other hand, if a duplicated sequence is stable it may act as a stepping stone to the evolution of new biological functions. we have investigated the stability of genetically redundant sequences by generating (tev) viruses with potentially beneficial gene duplications (willemsen et al. 2016). all gene duplications resulted in a loss of viability or in a significant reduction in viral fitness. experimental evolution always led to deletion of the duplicated gene copy and maintenance of the ancestral copy. however, the stability of the different duplicated genes was highly divergent, suggesting that passage duration is not the main factor for determining whether the insert will be stable or unstable. the deletion dynamics of the duplicated genes were associated with the passage duration and the size of the duplicated copy. by developing a mathematical model we showed that the fitness effects alone are not enough to predict genomic stability. a context-dependent recombination rate is also required, with the context being the identity of the insert and its position.
in summary, these experimental observations demonstrate the deleterious nature of gene insertions in ssrna(+) viruses, where the highly streamlined genomes limit sequence space for the evolution of novel functions, and in turn adaptation to environmental changes. the ssrna(−) viruses are composed of genomes that range from 10 to 25.2 kbp in size. these viruses are particularly attractive candidates as viral vectors. while in ssrna(+) viruses inserts are subject to deletion, inserts in their ssrna(−) counterparts appear more stable (mebatsion et al. 1996; schnell et al. 1996). one reason for this stability is that in general the genes in the ssrna(−) viral genomes are non-overlapping and are expressed as separate mrnas, thus consisting of a modular organization that can be easily manipulated for the insertion of foreign genes. if correctly engineered (e.g. without affecting any regulatory regions), one could expect that gene insertions are more stable in ssrna(−) viruses as compared to ssrna(+) viruses, since the complexities surrounding correct processing of a polyprotein are not an issue here. moreover, if expressed as a separate mrna, the size of the insert is probably restricted only by the packaging limits of ssrna(−) viruses. the low rate of homologous recombination in ssrna(−) viruses can be another explanation for higher genomic stability (chare, gould, and holmes 2003; han and worobey 2011). non-homologous recombination will probably rarely lead to variants with the insert deleted and other regions undisturbed, given it is less constrained than homologous recombination, and hence low homologous recombination rates could be a limiting factor on sequence evolution. however, genomic deletions that disrupt the inserted sequence will be subject to fewer constraints, as for example they can disrupt the reading frame of the insert without affecting the expression of virus genes.
canine distemper virus (cdv), a species in the morbillivirus genus, is an important pathogen of a variety of animals, including the dog. this virus, however, has been shown to be a promising expression vector for the development of vaccines. although the replicative fitness of a recombinant cdv carrying the rabies virus glycoprotein was slightly lower than that of the wild-type cdv, the insert was stably expressed during serial passaging in vitro and inoculation in vivo induced specific neutralizing antibodies against both rabies and cdv. similarly, genes expressing foreign antigens can be cloned into recombinant measles virus where measles virus proteins and inserted genes are coexpressed. this relatively small vector can accept large gene insertions, which in most cases are stably expressed (billeter, naim, and udem 2009; malczyk et al. 2015). for example, for the development of a vaccine against middle east respiratory syndrome coronavirus (mers-cov), it has been shown that a recombinant measles virus expressing the spike glycoprotein of mers-cov is genetically stable in vitro and induces strong humoral and cellular immunity in vivo (malczyk et al. 2015). vesicular stomatitis virus (vsv) is a commonly used vaccine vector that has been engineered to express surface proteins from diverse viruses, including ebola (garbutt et al. 2004), human immunodeficiency virus type 1 (hiv-1) (johnson et al. 1997), and influenza a (roberts et al. 1999), which can stimulate protective immune responses against these pathogens (bukreyev et al. 2006). in addition, vsv has shown promise as a candidate for oncolytic virus therapy, as it replicates most efficiently in cells with diminished innate immunity such as cancer cells, which often have impaired production of and/or response to interferon (barber 2005). mutations that attenuate vsv growth in healthy immune-competent cells can further enhance the safety of this potential anti-cancer therapy (barber 2005).
what is particularly interesting about the genome organization of vsv and other ssrna(−) viruses is that promoter-proximal genes are more efficiently expressed than promoter-distal ones (iverson and rose 1981; wertz, perepelitsa, and ball 1998; pesko et al. 2015). the efficiency of expression of the inserted gene (and with it the strength of the immune response) can therefore be controlled through the position of the insert (tokusumi et al. 2002; roberts et al. 2004). however, inserting a foreign gene close to the promoter can also reduce the expression of downstream vector genes (skiadopoulos et al. 2002), which in turn can negatively affect virus transcription and rna replication (wertz, moudy, and ball 2002; zhao and peeters 2003). these empirical observations again show that the site of the insert plays an important role in recombinant vector stability. when considering the size of the insert and its stability, ssrna(−) viruses accept relatively large inserts without drastically affecting virus replication. sendai virus, with a genome size of about 15.3 kbp, can carry and efficiently express gene insertions up to 3.2 kbp (sakai et al. 1999). however, here too the insert size is limited, as the final virus titers in vitro are proportionally reduced as the insert size increases. while in vivo no such size-dependent effect was observed, an attenuated replication and pathogenicity were detected (sakai et al. 1999). insertions up to 3.9 kbp in the ~15.4 kbp genome of the human parainfluenza virus 3 were viable and replicated efficiently in vitro (skiadopoulos et al. 2000). nonetheless, the insertions longer than 3,000 bp reduced the robustness to environmental perturbation of the virus, as temperature sensitivity was augmented and replication was restricted to certain sites in vivo (skiadopoulos et al. 2000). the ssrna(−) viruses seem promising expression vectors, where one can control gene expression and introduce relatively large inserts that, in many instances, appear to be stable.
the constraints imposed on viral gene insertions seem to be the lowest in this group of viruses. yet, the ideal vector that accepts all types and sizes of foreign gene insertions without decreasing viral replication has not been identified. retro-transcribing ssrna(+) viruses, or retroviruses, are small viruses varying in genome size from 7 to 11 kbp and are classified in the retroviridae family. after entering a host cell, the retroviral rna genome is converted into dsdna by reverse transcription. the viral dna integrates into the host genome, where viral genes are translated. therefore, these viruses are often used for gene therapy. retroviruses frequently undergo genomic rearrangements, including gene insertions and deletions (indels). moreover, recombination can be common due to the combination of 'diploid' virus particles and high intrinsic recombination rates (jetzt et al. 2000). therefore, as a general observation this viral group appears to have a highly plastic genome, and should be relatively open to foreign gene insertions. as retroviruses integrate into the host genome, the stability of inserts does not necessarily depend solely on the retrovirus genome configuration and demographic conditions. as host genomes are in general less streamlined than those of viruses, one could expect that gene insertions are stable after integration into the host genome. however, the random integration of retroviruses in the host genome makes it hard to predict genomic stability. as a wild example, hiv-1 frequently undergoes genomic rearrangements, where indels are a significant source of evolutionary change. these indels appear to have an impact on virus transmission and adaptation, as for example indels in the hiv-1 pol gene are associated with drug resistance (rakik et al. 1999), and indels in the gag and vif genes are associated with disease progression and infectivity (alexander et al. 2002; aralaguppe et al. 2017).
the hiv-1 surface envelope glycoprotein contains five variable regions (v1-v5) that can tolerate a higher rate of indels than the rest of the genome. interestingly, indel rate estimates vary significantly among variable regions and subtypes (from different hosts) (palmer and poon 2018). when gfp was introduced into the five variable regions of hiv-1, certain regions (v4 and v5) were more tolerant to foreign gene insertions than the others (v1, v2, and v3) (nakane, iwamoto, and matsuda 2015). in particular, gfp insertions into the v3 region showed lower levels of expression (nakane, iwamoto, and matsuda 2015), which is consistent with v3 having the lowest indel rate (palmer and poon 2018) and thus a lower stability after gene insertions. this piece of empirical evidence again shows that the site of insertion plays an important role in determining expression levels and stability. retroviruses have valuable potential as vectors for introducing therapeutic genes into cancer cells. murine retroviruses are the most commonly used vectors in clinical trials today, and seem promising candidates for human gene therapy, as they target dividing cells with a high degree of efficiency and lead to stable gene transfer by integrating into the chromosomes of the target cell (edelstein et al. 2004). however, we still have to deal with important safety issues when using retroviruses for gene therapy. the random integration of retroviruses in the host genome poses a risk, as integration near the lmo2 proto-oncogene promoter can trigger the development of leukemia (hacein-bey-abina et al. 2003). besides the risks related to retroviral gene therapy, the limited efficiency of in vivo gene transfer poses another obstacle. replication-defective retrovirus vectors are often used in clinical trials, but their use is limited because they can only infect a fraction of solid tumor cells (rainov and ren 2003).
for the delivery of the transgene to all tumor cells, replication-competent retroviral vectors are a promising alternative. the suitability of murine leukemia virus (mlv)-based vectors for cancer gene therapy has been analyzed in vitro and in vivo by paar et al. (2007). they found that the choice of the virus strain, the position of the insert, and the host cells used can influence the replication kinetics, genomic stability, and transgene expression levels (paar et al. 2007). concordantly, when the egfp sequence was inserted into mlv under different configurations (i.e. site of insertion and flanking sequence), the reporter gene was deleted upon extended cell culture (duch et al. 2004). stability was improved by decreasing the length of the sequence repeats flanking the inserted sequence; however, egfp was eventually always (partially or completely) deleted (duch et al. 2004). in another study, transgenes of different sizes (gfp, hph, pac) were inserted into mlv. deletions were always observed, the deletion dynamics depended on the size of the insert, and preferred sites of recombination were detected (logg et al. 2001). using retroviral vectors for the expression and transfer of foreign genes is central to the development of gene therapy. an advantage of using retro-transcribing ssrna(+) viruses is that after reverse transcription a dsdna molecule stably integrates into the host genome. with careful design, testing, and engineering, the retroviruses are promising vectors for the treatment of diseases such as cancer.

the retro-transcribing dsdna (rt-dsdna) viruses have small genome sizes varying from 3 to 8.3 kb, and include the viral families caulimoviridae and hepadnaviridae. as the name suggests, the rt-dsdna viruses replicate through an rna intermediate, and in some cases the pre-genomic rna is alternatively spliced.
although genomic rearrangements appear to be frequent in rt-dsdna viruses, we hypothesize that gene insertions will often be unstable, because (1) these viruses tend to have compact genomes, and (2) insertions can easily disturb a viral regulatory sequence or lead to incorrect processing of the alternatively spliced products.

2.7.1 wild viruses

in contrast to the retroviruses (group vi), the genome replication of the caulimoviridae is entirely episomal. however, fragmented and rearranged endogenous caulimovirus sequences have been found in a wide variety of plant species (teycheney and geering 2011). for the hepadnaviridae, the viral genome can be integrated into the host genome through a process that exploits double-strand breaks in the host genome. although this is an infrequent event, the integrated viral dna often contains deletions, inversions, and duplications, which often inactivate the virus. in the case of hepatitis b virus (hbv), integration into the human genome can cause genetic damage and chromosomal instability leading to hbv-induced liver cancer (shafritz et al. 1981; furuta et al. 2018). several studies in the 1980s already reported the possibility of inserting foreign dna into specific sites of the cauliflower mosaic virus (camv) genome without greatly affecting viral infectivity or function (gronenborn et al. 1981; howell, walker, and walden 1981; dixon, koenig, and hohn 1983; brisson et al. 1984; lefebvre, miki, and laliberté 1987). in two of these studies, functional bacterial genes were introduced into the camv genome, where a fragment of the lac operator (gronenborn et al. 1981) and the dhfr gene (brisson et al. 1984) were successfully expressed. these studies raised issues regarding the stability of the insert: the lac operator was lost after five successive transfers and extended growth of the plants, and deletions in the dhfr gene started appearing after the second and third transfers.
in contrast, an inserted mammalian metallothionein gene appeared to be stable and functional in the camv genome (lefebvre, miki, and laliberté 1987). these studies suggest that the differences in stability of inserts in the camv genome depend on at least two factors. first, the site of the insert seems to be important, as many inserts are lethal for the virus (gronenborn et al. 1981; howell, walker, and walden 1981; dixon, koenig, and hohn 1983). second, the size of the insert is important, as camv can accept only small foreign genes due to viral encapsidation limits (gronenborn et al. 1981; lefebvre, miki, and laliberté 1987). as described throughout this review, vectors containing gfp as an insert are often designed to study the infection dynamics of viruses. however, gfp is relatively large (around 700 nt) and often leads to instability of vectors (zwart et al. 2014; nakane, iwamoto, and matsuda 2015). to cope with this size limitation, a split gfp system has been engineered (cabantous, terwilliger, and waldo 2005), in which only a small part of gfp is introduced into the viral vector and the other part is expressed by a transgenic host. when the two gfp fragments are together, spontaneous association leads to the formation of a fluorescent molecule. in the camv genome, this system allowed a camv protein to be tracked in vivo (dáder et al. 2019). the partial gfp insertion was stable for ten or four serial passages, depending on the host plant species used, suggesting that demographic conditions such as the host play an important role in stability. although the number of studies on insert stability in rt-dsdna viruses is limited, we reason that several constraints limit insert stability in these viruses. although small inserts will allow viral infection dynamics to be tracked, the use of rt-dsdna viruses for gene therapy does not seem practicable, as integration into the host genome is a rare event for these viruses.
sequence loss is inherently an evolutionary process, at a minimum involving mutation and selection, and therefore needs to be framed in an evolutionary context. here, we consider how theory might help to better understand and ultimately predict this process. first, inspired by empirical results we consider the effects of virus population and bottleneck sizes on sequence loss. second, we consider whether there are different evolutionary trajectories that lead to a restoration of fitness following insertion of a sequence, and their implications for sequence stability. we understand demography to be a description of the size and structure of virus populations over time. in this discussion we will consider virus populations that are divided into demes at the host or cell level. theory suggests that demography could have major implications for the loss of inserted sequences, with small population sizes, narrow bottlenecks, and short time intervals between bottlenecks resulting in high sequence stability. hence, the stability of the inserted sequence cannot be viewed solely as a property of a genome, rather it is a phenotype and therefore depends on the environment. in this section, we motivate this argument and present a simulation model that highlights the effects of demography on the deletion of inserted sequences. at its core, the stability of genomic insertions in viral genomes depends on two key factors. first, the supply of mutations removing the insertion is crucial, because selection can only act on existing heritable variation. second, selection then acts to fix variants with the inserted sequence removed. all other things equal, the larger the supply of mutations that remove the insertion and the larger the selection coefficients of variants with the insert removed, the less stable the insertion will be. 
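The dependence of insert stability on mutation supply and selection can be made concrete with a small deterministic recursion. The sketch below is only an illustration under stated assumptions, not the simulation model of this review: `mu` (the per-generation rate at which insert-carrying genomes lose the insert), `s` (the relative fitness advantage of the deletion variant), and both function names are hypothetical.

```python
def next_frequency(p, mu, s):
    """Advance the frequency p of deletion variants by one generation.

    Assumed model (illustrative only): mutation converts a fraction mu of
    insert-carrying genomes into deletion variants, after which deletion
    variants replicate (1 + s) times as well as insert-carrying genomes.
    """
    p_after_mutation = p + (1.0 - p) * mu
    mean_fitness = 1.0 + s * p_after_mutation
    return p_after_mutation * (1.0 + s) / mean_fitness


def generations_until(threshold, mu, s, max_generations=100_000):
    """Generations needed for deletion variants to reach `threshold`.

    Returns None if the threshold is not reached within max_generations.
    """
    p = 0.0  # start from a pure insert-carrying population
    for generation in range(1, max_generations + 1):
        p = next_frequency(p, mu, s)
        if p >= threshold:
            return generation
    return None


if __name__ == "__main__":
    # A larger mutation supply or a larger selection coefficient both
    # shorten the time until the insert is lost from half the population.
    for mu, s in [(1e-6, 0.2), (1e-4, 0.2), (1e-6, 1.0)]:
        print(mu, s, generations_until(0.5, mu, s))
```

Under these assumptions, raising either `mu` or `s` reduces the number of generations returned, which mirrors the verbal argument that both a larger mutation supply and stronger selection make an insertion less stable.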
the interplay between mutation and selection will govern the stability of genomic inserts, and in many cases demography has an important role in shaping this interplay. for example, low fitness can lead to small population sizes, which in turn will limit the mutation supply (chao 1990; lynch and gabriel 1990). a high-cost inserted sequence might therefore limit viral evolvability, thereby promoting its own stability. genetic drift can also play an important role in determining the stability of inserted sequences, as inserts can have high stability if a viral population regularly passes through population bottlenecks. this idea is inspired by the empirical observation that an insert in a group iv plant virus appears to be stable when short-duration passages are used, but not in long-duration passages (zwart et al. 2014; willemsen et al. 2016, 2018). viruses pass through bottlenecks at many points during infection, in vitro and in vivo (zwart and elena 2015); it is therefore important to consider these effects. even if there is a large supply of deletions and strong selection for the deletion mutant, if deletion mutants fail to reach a frequency 1/a, where a is the bottleneck size, they are unlikely to pass through the bottleneck (willemsen et al. 2016, 2018). this leads to a 'resetting' of the virus population by each bottleneck event (fig. 1), effectively resulting in high stability of the inserted sequences (box 1, see also fig. 2). short passages shorten the time for deletion mutants to reach the frequency 1/a, making it more difficult for these variants to pass through bottlenecks and thereby promoting insert stability. it is important to remember that assays for detecting deletion mutants, such as deep sequencing or the polymerase chain reaction, have limited sensitivity.
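The 'resetting' effect of bottlenecks can be sketched with a toy serial-passage simulation. This is a minimal illustration under assumptions of our own choosing and is not the model of box 1: within a passage, the deletion-variant frequency follows a deterministic mutation-selection recursion, and at each transfer exactly `a` founders are sampled at random. The function names and the default values for `mu` and `s` are hypothetical.

```python
import random


def grow(p, mu, s, generations):
    """Deterministic within-passage dynamics of deletion-variant frequency p."""
    for _ in range(generations):
        p = p + (1.0 - p) * mu                 # mutation: inserts are lost
        p = p * (1.0 + s) / (1.0 + s * p)      # selection favors deletions
    return p


def passage_experiment(a, generations, n_passages, mu=1e-5, s=0.5, seed=1):
    """Serial passages with a bottleneck of a founders between passages.

    Returns the deletion-variant frequency at the end of each passage,
    measured just before the bottleneck is applied.
    """
    rng = random.Random(seed)
    p = 0.0
    history = []
    for _ in range(n_passages):
        p = grow(p, mu, s, generations)
        history.append(p)
        # Bottleneck: each of the a founders is a deletion variant with
        # probability p, so variants far below 1/a are usually lost.
        founders_with_deletion = sum(rng.random() < p for _ in range(a))
        p = founders_with_deletion / a
    return history


if __name__ == "__main__":
    short = passage_experiment(a=10, generations=10, n_passages=20)
    long_ = passage_experiment(a=10, generations=60, n_passages=20)
    print("short passages:", [round(x, 4) for x in short])
    print("long passages: ", [round(x, 4) for x in long_])
```

With these illustrative parameters, deletion variants stay far below 1/a by the end of a short passage and are usually purged at transfer, whereas in long passages they approach fixation within the first passage. Lengthening passages or widening the bottleneck shifts the outcome toward insert loss.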
deletions may therefore also be detected more readily in longer passages, whilst low-frequency mutations that will be purged by bottlenecks may not be detected (bull, nuismer, and antia 2019). demography can also modulate the strength of selection itself. the moi (cellular multiplicity of infection) is a key demographic parameter at the cellular level, as it describes the number of virus particles infecting a cell. if an inserted sequence affects viral fitness in trans at the within-cell level (for example by being toxic), then the moi will determine whether there can be selection (miyashita and kishino 2010). at high mois there will be no selection, because the toxin is produced in all cells and affects the replication of both producers and nonproducers of the toxin (fig. 3). an interesting conundrum is that high mois also tend to promote the evolution of di viruses (huang 1973) due to within-cell selection, and hence these two effects must be weighed accordingly. in this review, we considered only a few cases in which inserted sequences potentially could have beneficial effects on a virus (thomas, cameron, and davis 1974; gladstone, molineux, and bull 2012; schmerer et al. 2014; willemsen et al. 2017). beneficial effects could promote insertion stability and are therefore interesting from a bioengineering perspective, but demography can once again play a role in determining sequence stability. heterologous expression of endosialidase, an exoenzyme that degrades a key biofilm component after phage-induced cell lysis, led to increased amplification of phage t7 in capsulated e. coli (gladstone, molineux, and bull 2012). however, a phage that did not express the dispersin outcompeted the engineered virus, as it could reap the benefits of dispersin production whilst not bearing the costs. this tragedy of the commons is a reversal of the situation sketched above for high mois (fig. 3).
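The link between cellular moi and the opportunity for selection on trans-acting costs can be illustrated numerically. The sketch below assumes that the number of particles of each genotype entering a cell is Poisson distributed and independent between genotypes; this is a common simplification that we adopt here for illustration, not necessarily the calculation behind fig. 3. The function name is hypothetical.

```python
import math


def single_genotype_fraction(moi, f_a):
    """Fraction of infected cells that contain only genotype a or only b.

    Assumption (illustrative): particles of genotype a and b infecting a
    cell are independent Poisson variables with means moi * f_a and
    moi * (1 - f_a). Requires moi > 0 and 0 <= f_a <= 1.
    """
    f_b = 1.0 - f_a
    p_no_a = math.exp(-moi * f_a)        # cell receives no genotype a
    p_no_b = math.exp(-moi * f_b)        # cell receives no genotype b
    p_only_a = (1.0 - p_no_a) * p_no_b
    p_only_b = p_no_a * (1.0 - p_no_b)
    p_infected = 1.0 - math.exp(-moi)    # cell receives at least one particle
    return (p_only_a + p_only_b) / p_infected


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0, 10.0):
        row = [round(single_genotype_fraction(moi, f), 3) for f in (0.5, 0.9)]
        print(f"moi={moi}: f_a=0.5 -> {row[0]}, f_a=0.9 -> {row[1]}")
```

Under this assumption, nearly all infected cells carry a single genotype at low moi, so a variant lacking a toxic insert can be favored. At high moi most cells are co-infected and selection on the trans-acting cost largely vanishes, unless one genotype is much more common than the other.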
one proposed strategy to increase stability would be setting up culture conditions such that phages grow in isolation or in spatially structured environments (gladstone, molineux, and bull 2012), another example of a demography-based approach to increasing insert stability. it will certainly not always be possible to address issues of insert stability through demographic changes, but theory suggests this can be an interesting approach. some experimental protocols already exploit some of these principles, in particular strict adherence to low moi (fitzgerald et al. 2006). one should caution against naive applications of evolutionary theory, as the details of each real-world system matter (schmerer et al. 2014).

there are multiple, non-mutually exclusive mechanisms by which an inserted sequence can be costly for a virus. consequently, deletion of the inserted sequence may not be the only class of mutation that ameliorates the insert's effects on fitness, a possibility we explore in this section. we argue that alternative trajectories may sometimes play a role, but that due to differences in the mutation supply of different types of mutations, deletion of the inserted sequence is the most likely trajectory. a cost of the insert can arise because of the attributes of the inserted sequence (i.e. metabolic costs of expressing extra genes, toxicity of gene products), reorganization of the genome due to the insertion (i.e. disruption of the regulation of gene expression, polyprotein processing, and subgenomic rnas), or limitations on genome size imposed by virus-particle packaging. deletion of the inserted sequence is therefore not the only plausible class of mutation that can restore viral fitness, as other mutations can also affect fitness. these mutation types are 1, regulatory mutations (i.e. promoter mutations) that downregulate gene expression (van opijnen, boerlijst, and berkhout 2006) , 2, removal of immunogenic sequence motifs (fros et al.
2017) , 3, alteration of unfavorable secondary rna structures (mcfadden et al. 2013) , and 4, adopting a more favorable codon usage (carrasco, de la iglesia, and elena 2007; agashe et al. 2013; cladel et al. 2013), as synonymous mutations can have marked effects on virus fitness. these different mutation classes are likely to have different mutation rates, and mutation bias might therefore drive the evolutionary route that is followed (stoltzfus and mccandlish 2017). for example, consider that recombination rates are high for many viruses (tromas and elena 2010), and there are many possible recombination events that partially remove an insertion. in contrast, probably only a small fraction of point mutations will be beneficial (sanjuán, moya, and elena 2004; carrasco, de la iglesia, and elena 2007), e.g. in this case by lowering expression of the inserted gene or leading to more favorable codon usage. we therefore conjecture that mutation supply is likely to favor the evolution of deletions in the transgene over beneficial point mutations that affect fitness cost. consider the 'genomic accordion' observed in poxviruses (elde et al. 2012): beneficial point mutations typically occur long after gene amplification by copy number variation. likewise, we expect deletions that remove an insertion to be fixed before point mutations that also lessen its impact occur. nevertheless, the occurrence of alternative evolutionary trajectories could, depending on the exact mutation supply and effect sizes for different classes of mutations, contribute to making stability of genomic inserts less repeatable and predictable in some cases (de visser and krug 2014; bolnick et al. 2018).

figure 3. in panel a, we illustrate how the cellular moi can have a direct effect on selection strength. consider a virus that expresses a product that is toxic and acts in trans within cells to lower replication levels, but in which deletions can remove the gene encoding this product. if there is a mixed virus population, with variants in which the insertion is intact and variants in which it is deleted, at high moi all cells will be infected with both variants and the toxin will lower replication. the ubiquity of the toxin will limit selection in favor of the virus variant with the deletion. when moi is low, due to genetic drift at the cellular level not all cells will contain both variants, and the virus variant with the deletion is selected because those cells infected only with this variant have higher replication. in panel b, the relationship between the cellular moi (ordinate) and the frequency of single-genotype infection (abscissa) for a virus population with genotypes a and b is given, for different frequencies of the two virus genotypes in the population (f_a shown, f_b = 1 - f_a). note that the frequency of single-genotype infections is given as the proportion of infected cells in which only virus genotype a or b is present. as the moi increases, the frequency of single-genotype infections decreases, although it depends on the frequency of the two virus genotypes in the population. if genotype a expresses a gene that has fitness costs that act in trans (e.g. toxicity), then selection can only act against this genotype when there is an appreciable number of single-genotype infections.

whereas some sequences inserted into viral genomes are stable, others are clearly not. although there are some factors that appear to explain these differences, at the end of the day there is still a great deal about the relatively simple question of stability that we do not understand. on the other hand, these different outcomes are encouraging, because they suggest that if we understand the process well enough, we can design more stable insertions. for most viruses, strong selective constraints appear to exist against increasing genome size. in natural conditions, this is an impediment for evolutionary innovation by gene duplication or hgt.
in laboratory conditions, this is an impediment for expressing a gene of interest by using engineered viral vectors. when stratifying by viral groups, we observe that the stability of viral genomes partially depends on the nature of the genome. viral genomes with separately expressed nonoverlapping orfs (group v: ssrna(-)) appear to have fewer constraints imposed on sequence insertions than genomes with genes encoded in one single orf (group iv: ssrna(+)). although the dsdna (group i) virus genomes are extremely plastic in natural conditions, this observation is not a good predictor for stability of engineered viral genomes, as inserts are generally lost. in the case of ssdna (group ii) viruses, the varying frequency of genomic segments might lead to rapid adaptive responses to inserted sequences. in the case of segmented dsrna (group iii) viruses, sequence insertions probably perturb segmented genome assembly. when comparing the retro-transcribing viruses, the rt-ssrna(+) (group vi) viruses appear to successfully express sequences of interest after stable integration into the host genome, whilst the rt-dsdna (group vii) viruses are less stable and only rarely integrate into the host genome. multipartite viruses, represented in various groups, also present unique challenges when thinking about the stability of inserted sequences. when comparing all viral genome architectures, we conclude that genomic stability is not a fixed, intrinsic property. although we show that insert stability depends on the nature of the genome, the site and size of the insert, and the recombination rate, the host species and demographic conditions (i.e. population and bottleneck size) can radically change viral evolutionary dynamics. we have illustrated this idea with a simple simulation model that considers the effect of genetic bottlenecks (box 1), where the observed stability of the viral genome decreases as the bottleneck is widened.
the interplay between all factors affecting insert stability appears to be complex and unexpectedly sensitive to the exact conditions under which a virus population evolves. given these complexities, we think it may be challenging to develop predictive models of insert stability, for different types of virus genomes under different conditions. we hope to see developments in this area, possibly linked to resurging interest in preventing and exploiting di viruses. however, we think that experimental tests of the stability of viral constructs will remain important in the foreseeable future. experimental evolution can detect design problems in engineered genomes by looking at fitness and evolutionary stability (springman et al.) . as springman and collaborators suggest, experimental evolution may also prove useful for optimizing the stability of expression vectors by ameliorating constraints for which solutions are hard to predict because we lack a mechanistic understanding, such as codon usage (carrasco, de la iglesia, and elena 2007; agashe et al. 2013) . this approach can lead to improved engineering of viral genomes, which is also of interest for designing vectors with tags to follow viral infection, and for the use of viral vectors for gene therapy as well as for vaccine vectors. finally, for real-world applications it can be useful to determine quantitatively the impact of the loss of inserted sequences on the desired output. for example, models suggest that deletions in vector vaccines may not have a large impact on eliciting the desired immune response (bull, nuismer, and antia 2019) . we have noticed that a surprisingly large number of studies draw conclusions on the stability of inserted sequences in viral genomes based on experiments with either no or low replication. we cannot stress enough the importance of replication in studying genomic stability, in the first place because mutation is a stochastic process. 
moreover, as illustrated by our simple simulations (in which mutation is deterministic), bottlenecks and population dynamics can also introduce further stochastic effects that influence stability (fig. 1). furthermore, empirical studies with high levels of replication show the extent to which observed stability varies between replicates (zwart et al. 2014).

good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme inhibition of human immunodeficiency virus type 1 (hiv-1) replication by a two-amino-acid insertion in hiv-1 vif from a nonprogressing mother and child poxviral promoters for improving the immunogenicity of mva delivered vaccines evidence that gene amplification underlies adaptive mutability of the bacterial lac operon increased replication capacity following evolution of pyxe insertion in gag-p6 is associated with enhanced virulence in hiv-1 subtype c from east africa bovine adenovirus-3 as a vaccine delivery vehicle vsv-tumor selective replication and protein translation the evolution of genome compression and genomic novelty in rna viruses reverse genetics of measles virus and resulting multivalent recombinant vaccines: applications of recombinant measles viruses (non)parallel evolution genome packaging in multi-segmented dsrna viruses: distinct mechanisms with similar outcomes', current opinion in virology expression of a bacterial gene in plants by using a viral vector nonsegmented negative-strand viruses as vaccine vectors recombinant vector vaccine evolution protein tagging and detection with engineered self-assembling fragments of green fluorescent protein distribution of fitness and virulence effects caused by single-nucleotide substitutions in tobacco etch virus overview of the baculovirus expression system fitness of rna virus decreased by muller's ratchet' potato virus x as a vector for gene expression in plants bacteriophage lambda as a cloning vector stability of
recombinant plant viruses containing genes of unrelated plant viruses synonymous codon changes in the oncogenes of the cottontail rabbit papillomavirus lead to increased oncogenicity and immunogenicity of the virus split green fluorescent protein as a tool to study infection with a plant pathogen modifications of the tobacco mosaic virus coat protein gene affecting replication, movement, and symptomatology a tobacco mosaic virus-hybrid expresses and loses an added gene a structured dynamic model for the baculovirus infection process in insect-cell reactor configurations empirical fitness landscapes and the predictability of evolution potato virus x-based expression vectors are stabilized for long-term production of proteins and larger inserts mutagenesis of cauliflower mosaic virus tagging of plant potyvirus replication and movement by insertion of beta-glucuronidase into the viral polyprotein systemic expression of a bacterial gene by a tobacco mosaic virus-based vector transgene stability for three replication-competent murine leukemia virus vectors' gene therapy clinical trials worldwide 1989-2004-an overview poxviruses deploy genomic accordions to adapt rapidly against host antiviral defenses size reversion of african cassava mosaic virus coat protein gene deletion mutants during infection of nicotiana benthamiana protein complex expression by using multigene baculoviral vectors' bacterial gene inserted in an engineered rna virus: efficient expression in monocotyledonous plant cells the size of encapsidated single-stranded dna determines the multiplicity of african cassava mosaic virus particles cpg and upa dinucleotides in both coding and non-coding regions of echovirus 7 inhibit replication initiation post-entry', elife correction: characterization of hbv integration patterns and timing in liver cancer and hbv-infected livers properties of replication-competent vesicular stomatitis virus vectors expressing glycoproteins of filoviruses and arenaviruses 
clinical development of modified vaccinia virus ankara vaccines evolutionary principles and synthetic biology: avoiding a molecular tragedy of the commons with an engineered phage poxvirus vectors as hiv/aids vaccines in humans propagation of foreign dna in plants using cauliflower mosaic virus as vector susceptibility to recombination rearrangements of a chimeric plum pox potyvirus genome after insertion of a foreign gene lmo2-associated clonal t cell proliferation in two patients after gene therapy for scid-x1 homologous recombination in negative sense rna viruses stability and expression of bacterial genes in replicating geminivirus vectors in plants rescue of in vitro generated mutants of cloned cauliflower mosaic virus genome in infected plants defective interfering viruses localized attenuation and discontinuous synthesis during vesicular stomatitis virus transcription high rate of recombination throughout the human immunodeficiency virus type 1 genome specific targeting to cd4þ cells of recombinant vesicular stomatitis viruses encoding human immunodeficiency virus envelope proteins bsmv genome mediated expression of a foreign gene in dicot and monocot plant cells entirely plasmid-based reverse genetics system for rotaviruses in vitro and in vivo genetic stability studies of a human adenovirus type 5 recombinant rabies glycoprotein vaccine (onrab)', vaccine reverse genetics system demonstrates that rotavirus nonstructural protein nsp6 is not essential for viral replication in cell culture detection and analysis of autographa californica nuclear polyhedrosis virus mutants with defective interfering properties evolution and taxonomy of positive-strand rna viruses: implications of comparative analysis of amino acid sequences vaccination against rabies: construction and characterization of sag2, a double avirulent derivative of sadbern mammalian metallothionein functions in plants a recombinant canine distemper virus expressing a modified rabies virus 
key: cord-307202-iz1bo218 authors: shaw, dominick; portelli, michael; sayers, ian title: asthma date: 2014-05-02 journal: handbook of pharmacogenomics and stratified medicine doi: 10.1016/b978-0-12-386882-4.00028-1 sha: doc_id: 307202 cord_uid: iz1bo218 asthma is a common respiratory disease with a complex etiology involving a combination of genetic and environmental components. current asthma management involves a step-up and step-down approach based on asthma control with a large degree of heterogeneity in responses to the main drug classes currently in use: β(2)-adrenergic receptor agonists, corticosteroids, and leukotriene modifiers. importantly, asthma is heterogeneous with respect to clinical presentation and the inflammatory mechanisms that underlie it. this heterogeneity likely contributes to variable results in clinical trials, particularly when targeting specific inflammatory mediators. these factors have motivated a drive toward stratified medicine in asthma based on clinical/cellular outcomes or genetics (i.e., pharmacogenetics). 
significant progress has been made in identifying genetic polymorphisms that influence the efficacy and potential for adverse effects of all main classes of asthma drugs. importantly an emerging role for genetics in phase ii development of newer therapies has been demonstrated (e.g., anti-il4). similarly, the stratification of patients based on clinical characteristics (e.g., blood and sputum eosinophil levels) has been critical in evaluating newer therapies (e.g., anti-il5). as a proof of concept, anti-ige is the latest therapy to be introduced into clinical practice, although only for severe, allergic patients (i.e., in a stratified manner). as new asthma genes are identified using genome-wide association, among other technologies, new targets (e.g., il33/il33 receptor (il1rl1)) will emerge and pharmacogenetics in these development programs will be essential. in this chapter we review the current understanding of asthma pathobiology and its clinical presentation, as well as the use of stratified medicine, which holds great promise for maximizing clinical outcomes and minimizing adverse effects in existing and new therapies. umbrella definition will require integrating bench and bedside approaches using data from ongoing genomic and proteomic profiling studies of large, well-characterized asthma populations-such as the current eu-wide ubiopred study (formally, "unbiased biomarkers in prediction of respiratory disease outcomes")-with feedback to improved in vitro and in vivo animal models and human studies. the international consensus report on the diagnosis and treatment of asthma defines asthma as "a chronic inflammatory disorder of the airways in which many cells play a role, including mast cells and eosinophils. 
in susceptible individuals this inflammation causes symptoms which are usually associated with widespread, but variable, airflow obstruction that is often reversible either spontaneously or with treatment, and causes an associated increase in airway responsiveness to a variety of stimuli." the interaction of these features determines the clinical manifestations and severity of the disease and the response to treatment [3] . how these features relate to each other, how they are best measured, and how they contribute to clinical manifestations of the disease remain unclear. age of disease onset also complicates understanding: although there are many shared features in the diagnosis of asthma in children and in adults, there are also important differences. the differential diagnosis, the natural history of wheezing illnesses, and the ability to perform certain investigations are all influenced by age [4] . according to the world health organization (who), between 100 and 150 million people around the globe (roughly equivalent to the population of the russian federation) suffer from asthma. worldwide, deaths from asthma have reached more than 180,000 annually. overall, asthma affects 5-10% of the population in many developed countries. one study found that the annual estimated incidence of physician-diagnosed asthma ranged from 0.6 to 29.5 per 1000 persons. risk factors for incident asthma among children include male gender, atopic sensitization, parental history of asthma, early-life stressors and infections, obesity, and exposure to indoor allergens, tobacco smoke, and outdoor pollutants. risk factors for adult-onset asthma include female sex, airway hyper-responsiveness, lifestyle factors, and work-related exposures. while the exact cost of asthma care is difficult to determine, a 2009 systematic review [5] found eight national studies reporting a national total cost. the total costs in 2008 in u.s. 
dollars were high and varied widely: singapore, $49.36m; canada, $654m; switzerland, $1413m; germany, $2740m; united states, $8256m. although the costs of asthma care vary by country, it is estimated that worldwide the number of disability-adjusted life years (dalys) lost from asthma is about 15 million per year. worldwide, asthma accounts for around 1% of all dalys lost, which reflects the high prevalence and severity of asthma. the number of asthma-caused lost dalys is similar to that for diabetes, cirrhosis of the liver, and schizophrenia. costs are related to other factors, including asthma control, demographics, medical history, and doses of inhaled corticosteroid (ics) prescribed [6] . improving asthma control is associated with decreased cost. asthma classically causes wheeze, cough, chest tightness, and breathlessness (dyspnoea). the pattern of symptoms can be a clue to the diagnosis. episodic symptoms worse at night and in the early morning, or in response to exercise, allergen and cold air exposure, or after taking aspirin or beta blockers are very suggestive of asthma. moreover, symptoms that are relieved by bronchodilators point toward asthma as the underlying cause. symptoms that lower the probability of asthma include those of prominent dizziness, light-headedness, and peripheral tingling (all point toward dysfunctional breathing); a chronic productive cough in the absence of wheeze or breathlessness (more likely bronchiectasis); a repeatedly normal physical examination of chest when symptomatic; and voice disturbance, symptoms with colds only, or a significant smoking history. taking a detailed patient history and recording spirometry when the patient is symptomatic is crucial to the accurate classification of the disease. there are several differential diagnoses that must be considered when making a diagnosis of asthma. these are outside the scope of this chapter. 
a good guide to the clinical diagnosis and management of asthma is the british thoracic society asthma guidelines, which are updated regularly and are available at www.brit-thoracic.org.uk/clinical-information/asthma/. triggers for asthma symptoms include allergens (proteins) derived from, among others, house dust mite (hdm), cat and dog dander, cockroach, and fungi, especially aspergillus fumigatus. there are numerous other nonallergic symptom triggers, including weather and atmospheric change, air pollution, exercise, menstruation, emotion and laughter, viral infection, gastroesophageal reflux, and rhinitis. various occupations can both cause and worsen asthma (e.g., welding, baking, paint spraying). asthma symptoms often overlap with other allergies, including oral allergy syndrome and food allergies. delineating between causes of symptoms can be difficult. if a specific aeroallergen is identified that triggers asthma (classically tree or grass pollen), a course of either sublingual or subcutaneous desensitization can be undertaken to reduce symptoms on exposure to that specific allergen. there are multiple other symptom triggers that can include both allergic and non-allergic mechanisms; examples include exposure to chemicals, such as perfumes, paint and bleach, plants (e.g., ligustrum vulgare, found in privet hedges), salicylates, and sulphites. multiple factors have been shown to either directly cause or worsen symptoms of asthma [4] . these are listed in box 28.1. other factors not directly related to the disease can worsen symptoms and are associated with more frequent exacerbations. these include dysfunctional breathing, vocal cord dysfunction, psychosocial problems, and poor adherence to treatment regimes [4] . other forms of asthma are recognized, among them occupational, exercise-induced, and pregnancy-related asthma. these subdivisions are somewhat arbitrary, but may have different causes, prognoses, and complexities. 
for example, occupational asthma includes that triggered by ige-mediated mechanisms, that due to specific agents with unclear pathophysiology, and that secondary to irritants (also known as rads-reactive airways dysfunction syndrome). underlying asthma can also be made worse by occupational exposure (work-aggravated). most experts regard exercise-induced asthma as a different disease, albeit one resulting in similar symptoms. exercise-induced asthma is often diagnosed with different investigations, including eucapnic voluntary hyperventilation or exercise testing. it is associated with certain sports (swimming and cross country skiing in particular) and has a unique pathophysiology directly related to damage of the airway epithelium. asthma control can deteriorate in pregnancy, due to both hormonal effects and the physical effects of a gravid uterus on diaphragm and respiratory muscle function. pregnancy can also impact treatment decisions. leukotriene modifiers are generally contraindicated, and the health of the fetus as well as that of the mother needs to be considered. poor adherence is probably the biggest single issue affecting asthma control. studies have shown that people with asthma over-report use of ics; one study found that although 95% of responders said they used their inhalers, only 58% actually did. adherence to ics therapy is associated with a lower rate of death, whereas increasing use of reliever (salbutamol/bricanyl) medication, which improves symptoms but does not treat underlying airway inflammation, is associated with increased mortality [7, 8] . a 2010 report entitled evaluation of the scale, causes and costs of waste medicines: report of dh funded national project (trueman et al. york health economics consortium and university of london school of pharmacy, 2010) put figures to the possible cost benefits from improving asthma treatment adherence in the united kingdom. 
it estimated the resulting net benefit associated with compliance to be £2250 per patient, with a reduction in expected annual treatment costs of approximately £75 per patient per year. based on an asthma point prevalence of 5.8%, the findings suggest that there are almost 1.8 million asthmatics in the united kingdom who are noncompliant. if interventions were available to change the behavior of all partially compliant medicine users to raise the percentage to 80% or more, the report estimated that over £130 million in treatment cost savings could be realized in the united kingdom alone. a more modest target of doubling current compliance rates would result in savings of approximately £90 million per year. the aim of asthma treatment is to improve symptoms, maintain lung function, and prevent exacerbations. from a patient's perspective, these are simple aims, but for the purposes of study design and treatment trials they do not address the complexity of the disease process. accordingly, guidelines now specifically address asthma control (symptoms and lung function) and severity (need for treatment) separately. the current global initiative for asthma (gina) guidelines assess levels of control based on daytime symptoms, limitation of activities, nocturnal symptoms/awakening, need for reliever/rescue treatment, lung function (pef or fev1), and exacerbations [9] . treatment steps are defined by the treatment level required to maintain asthma control (table 28.1), with patients at steps 1-2 having well-controlled asthma requiring little treatment, and patients at step 4 having poorly controlled asthma despite four or five different drugs. the step approach is similar to that used by the british thoracic society (bts) guidelines [4] . asthma is a variable disease both temporally and clinically. there is also overlap between symptoms and exacerbations. 
this overlap was the subject of a recent consensus document that defined asthma exacerbations as "events characterized by a change from the patient's previous status." severe exacerbations are "events that require urgent action on the part of the patient and physician to prevent a serious outcome, such as hospitalization or death from asthma," and moderate asthma exacerbations are "events that are troublesome to the patient, and that prompt a need for a change in treatment, but that are not severe…clinically identified by being outside the patient's usual range of day-to-day asthma variation" [10] . the most practical definition of an asthma exacerbation is probably an episode of worsening symptoms not responding to increasing bronchodilator therapy. this definition was employed in an elegant study on the time course of peak flow changes [11] . for the purposes of clinical studies, most authors define exacerbation as the need for rescue courses of systemic corticosteroids (prednisolone/prednisone). although the majority of patients with asthma are treated and investigated in primary care settings, the main burden of asthma is due to its severe form (i.e., step 4) and exacerbations. using the united kingdom as an example (population 60 million), asthma is responsible for more than 1200 deaths per year and for over 50,000 hospital admissions, with an annual expenditure of £800 million on pharmaceutical costs alone. it was recently estimated that a patient controlled at the mildest end of the spectrum (british thoracic society (bts) guidelines: step 1 with no exacerbations) would cost 50 times less to provide his or her package of care than would a patient at the worst end of the spectrum (bts guidelines: step 5 with exacerbations) [12] . it is also calculated that 5% of asthma patients are responsible for at least 50% of the total healthcare burden [4] ; the most expensive individual cost is an icu admission (bts level 3) due to a life-threatening exacerbation. 
in the u.s. more than half (53%) of asthma sufferers in 2008 had an asthma attack, and of these half of the children and one-third of the adults missed school or work because of it; on average, children missed four days of school and adults missed five days of work. exacerbation pathogenesis is not fully understood. although research has focused on infective agents, especially viral, there may be other explanations. asthma exacerbations are associated with both inflammatory and immunological cell infiltration. the inflammatory cell infiltrate is composed of varying numbers of eosinophils, neutrophils, and lymphocytes.
step | symptoms | nocturnal symptoms | fev1 or pef (% predicted) | pef variability
step 1 (intermittent) | fewer than once/week; asymptomatic, normal pef between attacks | two or fewer/month | 80% or more | less than 20%
step 2 (mild persistent) | more than once/week, fewer than once/day; attacks may affect activity | more than 2 times a month | 80% or more | 20-30%
step 3 (moderate persistent) | daily; attacks affect activity | more than once/week | 60%-80% | more than 30%
step 4 (severe) | continuous; limited physical activity | frequent | 60% or less | more than 30%
this airway inflammation, combined with smooth-muscle hypertrophy and thickening of the lamina reticularis, is accentuated by mucus plugs, serum protein deposition, inflammatory cell and cellular debris, leading to blockage of the airway (airflow obstruction) and wheeze. approximately 80% of exacerbations are associated with respiratory tract viral infections, with rhinovirus responsible for about two-thirds of cases [13] . asthmatic subjects also have much more severe lower respiratory tract illness with rhinovirus infection than do healthy control subjects. the mechanism for this is not fully understood. 
infection induces inflammation, increasing levels of neutrophils, eosinophils, cd4+ cells, cd8+ cells, and mast cells through increased mrna expression and translation of il-6, il-8, il-16, eotaxin, ifn-γ-induced protein 10 (ip-10), chemokine (c-c motif) ligand 5 (ccl5/rantes), and other proinflammatory cytokines. other viruses implicated in exacerbations include enterovirus, coronavirus, influenza, parainfluenza, respiratory syncytial virus, metapneumovirus, adenovirus and bocavirus [14] . although influenza vaccination is recommended for all individuals with asthma, there is currently no hard evidence that this improves outcomes. bacterial infection has also been implicated in asthma exacerbation. individuals with asthma have an increased risk of invasive pneumococcal disease [15] , and an increased frequency of detection of chlamydophila pneumoniae [16] . one study found mycoplasma pneumoniae infection in 20% of patients hospitalized for severe asthma [17] . several trials have evaluated the role of macrolide antibiotics in treating and preventing asthma exacerbations. one study randomized adults with exacerbations to the ketolide antibiotic telithromycin or placebo; the telithromycin group had a small but significant improvement in symptoms and lung function from exacerbation to the end of treatment [18] . further studies of macrolide antibiotics (which have immunomodulatory as well as antibiotic properties) are under way, but as yet macrolide antibiotics are not recommended by international asthma guidelines. new techniques for assessing the airway bacterial microbiota have established that the airways are not normally sterile, and may permit the role of bacterial infection in airways disease to be further delineated. a recent study found that pathogenic proteobacteria, particularly haemophilus species, were much more frequent in the bronchi of patients with asthma or copd than in controls [19] . 
prolonged exposure to aeroallergens can result in chronic airway inflammation via th2-driven ige mechanisms. this immunological reaction may intensify airway inflammation, increase inflammatory cell activation, and stimulate mucus glands to hypersecrete, leading to airway obstruction. studies of bronchoalveolar fluid before and after allergen challenge show eosinophilic inflammation as the major airway response, associated with late-phase responses of airflow obstruction. in addition, il5 and il13 are significantly raised. exposure to seasonal allergens has been implicated in sudden asthma-related deaths; hdm, cat, and cockroach allergen sensitization are risk factors for emergency treatment [20] . grass pollen sensitization, or "thunderstorm asthma," has also been associated with epidemics of exacerbations [21] . fungal allergens are found both outdoors and indoors, and sensitivity to them is a risk factor for the development, persistence, severity, and mortality associated with asthma. individuals with asthma are often sensitized to fungi such as cladosporium species, alternaria species, penicillium species, and candida species [22] . alternaria species sensitization and exposure are associated with symptoms and a 200-fold increased risk of respiratory arrest in subjects with asthma [23] . fungal exposures may worsen disease by different mechanisms; fungal-associated proteases may lead to the development of allergic airway inflammation along with ige-mediated responses; fungal allergen-induced asthmatic reactions evoke an il5 response with increased eosinophil recruitment and degranulation [24] . the relationship between fungal exposure and asthma is complicated by the degree of sensitivity to fungal allergens and the resultant allergy. 
a specific disease entity called allergic bronchopulmonary aspergillosis (abpa) exists in which colonization of the respiratory tract with the ubiquitous aspergillus fumigatus is associated with an increased allergic response (both type 1 and type 3) and severe asthma exacerbations. trials in both abpa and a related disease, severe asthma with fungal sensitization (safs), have shown some benefit with the oral antifungal itraconazole [25] , but good-quality trial data are lacking. unlike other chronic conditions, there is no gold standard for the diagnosis of asthma. although tests of airway function can be used to diagnose and study different aspects of airway disease (airway inflammation/airway hyper-responsiveness), diagnosis still depends on the presence of specific symptoms, which include wheeze, cough, chest tightness, and difficulty breathing. these symptoms are often variable, worse at night, or worse in response to triggers. they often respond to asthma therapy and may be associated with atopy/allergy and a family history of similar problems. the lack of a gold standard affects clinical provision and studies alike; most estimates suggest that asthma is wrongly diagnosed in approximately 30% of patients [26] . the development of a cheap and reliable noninvasive test that can diagnose asthma is seen as the holy grail for diagnosis-based research. measurements of airflow limitation provide an assessment of the severity, reversibility, and variability of airflow obstruction, and can help confirm the diagnosis of asthma. spirometry (forced expiratory volume in 1 second (fev1), forced vital capacity (fvc), and fev1/fvc) is the preferred method of measuring airflow limitation, as it is more repeatable and less effort dependent than peak flow. 
an fev1/fvc ratio of less than 0.7 is consistent with a diagnosis of airflow obstruction, and an increase in fev1 of >12% (or >200 ml) after administration of a short-acting bronchodilator indicates reversible airflow limitation consistent with asthma [27] . importantly, most asthma patients will not exhibit reversibility at each assessment, and repeated testing is advised to confirm a diagnosis. consequently, an absent response to bronchodilators does not exclude asthma. peak expiratory flow (pef) measurements can be used to monitor asthma. they are ideally compared to the patient's own previous best measurements using his or her own peak flow meter. an improvement of 60 l/min (or >20% of the prebronchodilator pef) after inhalation of a bronchodilator, or diurnal variation in pef of more than 20% (with twice-daily readings, more than 10%), suggests poorly controlled asthma. other approaches have been utilized to diagnose asthma, both to confirm a formal diagnosis and to identify corticosteroid response. these approaches include noninvasive measurements of airway inflammation or assessment of airway hyper-responsiveness. measuring airway inflammation is relatively easy, either using techniques to induce sputum and then counting the differential eosinophil and neutrophil cell numbers present, or by measuring the fraction of nitric oxide (no) present in the exhaled breath (feno) using portable chemical analyzers. airway hyper-responsiveness can be assessed by several different methods. all are designed to provoke bronchoconstriction, which is measured using spirometry. the concentration (or dose) at which bronchoconstriction occurs is then used to categorize the degree of airway responsiveness and the presence or absence of asthma. 
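the spirometry and peak-flow thresholds quoted above can be written out as a short python sketch. the class and function names are hypothetical and chosen only for illustration; the cut-offs are those stated in the text (fev1/fvc below 0.7, an fev1 gain of more than 12% or more than 200 ml, a pef gain of 60 l/min or more than 20% of the prebronchodilator value). it is a reading aid under those assumptions, not a diagnostic tool.

```python
from dataclasses import dataclass


@dataclass
class Spirometry:
    """Pre- and post-bronchodilator values, all volumes in litres."""
    fev1_pre: float
    fvc_pre: float
    fev1_post: float


def has_airflow_obstruction(s: Spirometry) -> bool:
    # An FEV1/FVC ratio below 0.7 is consistent with airflow obstruction.
    return s.fev1_pre / s.fvc_pre < 0.7


def is_reversible(s: Spirometry) -> bool:
    # Reversibility as worded in the text: FEV1 rises by >12% or by >200 ml.
    gain_l = s.fev1_post - s.fev1_pre
    gain_pct = 100.0 * gain_l / s.fev1_pre
    return gain_pct > 12.0 or gain_l > 0.200


def pef_bronchodilator_response(pef_pre: float, pef_post: float) -> bool:
    # PEF in L/min: a gain of 60 L/min, or of more than 20% of the
    # pre-bronchodilator PEF, counts as a response.
    gain = pef_post - pef_pre
    return gain >= 60.0 or gain > 0.20 * pef_pre


# Hypothetical patient: FEV1 2.0 L, FVC 3.2 L, FEV1 2.3 L after bronchodilator.
patient = Spirometry(fev1_pre=2.0, fvc_pre=3.2, fev1_post=2.3)
print(has_airflow_obstruction(patient), is_reversible(patient))
```

because most patients will not show reversibility at every assessment, a negative result from such a check on a single visit would not exclude asthma.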
although these tests are widely employed in research settings, their use in a clinical setting is hampered by their varying sensitivity and specificities for asthma diagnosis [4] , as both airway inflammation and airway hyper-responsiveness can be present in healthy individuals without symptoms. the symptoms of asthma in children are recurrent wheezing, cough, difficulty breathing, and chest tightness. evaluation of these symptoms is critical to the diagnosis, treatment, and outcome measures used in clinical studies; however, asthma triggers and inflammatory cell type are increasingly used to define childhood asthma phenotypes. pediatric asthma is complicated by wide variations in symptom prevalence. the majority of children with asthma have mild or moderate disease; 5% of all asthmatic children have chronic symptoms and/or recurrent exacerbations, despite maximum treatment with conventional medications [28] . asthma in childhood is heterogeneous in several ways, including etiology, clinical presentation, and response to treatment. several attempts have been made to stratify treatment on the basis of different asthma phenotypes in children, with varying degrees of success. stratifying treatments in pediatric asthma is more complicated because of patient age and the inability (dependent on age) to perform more complicated or invasive tests. in children, the onset and progression of asthma result from a complex interplay between genetic background and environmental exposures. the complexity is confounded by wheezing illnesses consisting of several distinct disease entities, with no general agreement on their number or underlying mechanisms [29] . among the environmental factors that influence the risk of asthma are viral and bacterial respiratory infections. exposure to environmental tobacco smoke is also associated with increased rates of early viral illnesses. 
other factors associated with the risk of wheeze include physical factors associated with increased breathing (exercise, laughing, crying, and excitement) or allergens (aeroallergens and food allergens). there are considerable age-related changes in the relative importance of trigger factors for wheeze in children. human rhinovirus (hrv) has been implicated in both the etiology of asthma and exacerbations; infants who have hrv infections with wheezing are at a significantly increased risk for subsequent asthma, and over 50% of childhood asthma exacerbations are triggered by hrv. it is important to note that exposure to hrv does not lead to wheezing illness in all children, nor does wheezing illness result in asthma in all cases, suggesting that the host genotype also plays a role. a recent study of two cohorts of children with asthma found that variants at the 17q21 locus were associated with asthma in children who had hrv wheezing illnesses, implicating an interaction between hrv wheezing illness in early childhood and the 17q21 genotype (see section 28.3.2). the presentation of asthma symptoms and exacerbations in childhood also varies. some children have frequent exacerbations with few daily symptoms, while others have recurrent symptoms without airway inflammation and exacerbations. other diseases coexist with childhood asthma, including gastroesophageal reflux disease, severe asthma with fungal sensitization, obesity, and vocal cord dysfunction. targeting these coexisting disorders has met with varying clinical success [30] . an important component of the development of asthma is the inflammatory cascade. 
this involves the infiltration of a number of inflammatory cells, such as eosinophils, neutrophils, b- and t-lymphocytes, macrophages, mast cells, dendritic cells, and basophils, into the airway, and the release of inflammatory mediators (e.g., leukotrienes, histamine, cytokines, and chemokines) by airway structural cells (including epithelial, smooth-muscle, endothelial, and fibroblast cells). however, since the degree of inflammation in asthma is not directly related to asthma severity, this suggests that other factors, such as structural changes in the airways, also play a role in disease modulation and progression. these structural changes have been termed airway remodeling and may exist in the presence or absence of inflammatory mechanisms in the airway [31] . the inflammatory response in asthma is a result of excessive activation of mast cells in the airway, where in the early response to allergen challenge, degranulation of mast cells results in the release of a number of proinflammatory factors such as il4, leukotrienes, and histamine. this causes an immediate hypersensitivity response that in turn leads to airway narrowing. the concomitant release of other inflammatory factors such as cytokines and chemokines from the same mast cells provides an optimal environment for recruitment to the airways of other inflammatory cells such as eosinophils, basophils, neutrophils, and t-lymphocytes. once in the airways, these cells act in tandem with allergen-activated macrophages, resulting in what is termed the late asthmatic response (occurring at 4-8 hours postchallenge). here, through the action of a number of released potent immunomodulators (e.g., tnfα, il3, il4, il5, and il13), airway narrowing occurs. this may occur in periods that last more than 24 hours. 
of the inflammatory cells involved in the asthmatic response, it is the infiltration of eosinophils that is considered characteristic of the asthmatic phenotype, particularly that of mild/moderate allergic asthma (figure 28.1). for a more extensive description of inflammatory mechanisms underlying allergic asthma see hodge & sayers [32] . although inflammation in asthma has been extensively studied, the relationship between inflammation and clinical symptoms of asthma is still unclear. this may be partially explained by a possible link between airway inflammation and airway hyper-responsiveness [33] . here, airways become more susceptible to sensitizing agents, which would otherwise be unable to trigger an airway response, because of (a) increased release of mediators such as histamine and leukotrienes, (b) abnormal smooth-muscle behavior, and (c) airway thickening. airway remodeling plays an important role in asthma pathogenesis. airway structural changes characteristic of airway remodeling occur because of prolonged airway inflammation. here, prolonged release of inflammatory factors results in: thickening of the smooth-muscle bronchial walls, leading to airway narrowing; denudation of the bronchial epithelium; hyperplasia and hypertrophy of the airway smooth muscle; and hyperplasia and hypertrophy of the epithelial goblet cells, resulting in the formation of large mucus plugs liable to occlude the airway and increase airway hyper-responsiveness. airway remodeling also involves changes occurring in the extracellular matrix and its constituent proteins (collagens i/iii/iv, fibronectin, and laminin) and structural changes such as angiogenesis, vasodilation, increased airway blood flow, and changes in autonomic neurological function [34] . the development of asthma results in a degree of airway remodeling. 
the structural changes that define airway remodeling are an important feature in asthmatic airways, where even in newly diagnosed asthma patients, a degree of structural change can be identified in the bronchial wall [35]. airway remodeling can be defined as a process of sustained disruption and modification of structural cells and tissues leading to the development of a new airway wall morphology [36]. airway remodeling (figure 28.2) is initiated either through various inflammatory pathways, highlighting the importance of the inflammatory pathway [37], or through bronchial hyper-responsiveness, where airway remodeling occurs independently of inflammation [38].
figure 28.1 asthma pathophysiology. allergens (for example, dermatophagoides pteronyssinus 1 (der p1) and dactylis glomerata (dacg1)) originating from various sources, including pollen, house dust mite, and mold, are taken up by dendritic cells, b-lymphocytes, epithelial cells, and macrophages, which present the antigens to t-lymphocytes. this activates the t-lymphocytes to produce cytokines that regulate immunoglobulin e (ige) production by the b-lymphocytes. bound ige activates mast cells in the airways. activated mast cells initially release a number of factors, including il4, leukotrienes, and histamine, that cause airway hyper-responsiveness and result in an early response to allergen stimulus via bronchoconstriction. the concurrent release of cytokines and chemokines from the mast cell recruits eosinophils, basophils, and t-lymphocytes to the airway. these cells, in association with t cell-activated neutrophils and with antigen-stimulated macrophages, release chemokines and leukotrienes over an extended period of time, resulting in a late response to allergen stimulus via bronchoconstriction. prolonged stimulation of the late-response factors ultimately leads to airway structural changes, such as mucus hypersecretion and myofibroblast and airway smooth-muscle proliferation, which are common to the asthmatic lung.
our understanding of the extent and nature of genetic variation in the human genome has dramatically improved since the publication of the first draft genome sequence in 2001, and it continues to improve exponentially with initiatives such as the 100,000 genomes project, which aims to genome-sequence this number of uk subjects, and the encyclopedia of dna elements consortium (encode), an initiative to identify the genome's regulatory regions. recent figures suggest that more than 6.9 million single-nucleotide polymorphisms (snps) or single-base-pair changes exist in a genome consisting of 3 billion base pairs. similarly, there is a growing realization that deletions, insertions, and expansions of tandem repeats also represent significant variation. these genetic polymorphisms are found throughout the genome and have the potential to influence gene function in several ways leading to human disease: (1) a coding-region nonsynonymous polymorphism can alter the amino acid sequence and structure/function of the protein; (2) a polymorphism can introduce a stop codon leading to the production of a nonfunctional protein; and (3) a polymorphism in a regulatory region can alter the expression levels of a gene. it has long been known that atopic diseases such as asthma run in families. in 1916, using 621 atopic probands and 76 nonatopic controls and their respective families, it was shown that 48.4% of atopic probands had a family history of atopy, compared with 14.5% in the control population [39]. similarly, a study of 176 families showed a very high concordance of asthma, hayfever, and eczema in parents and children [40]. twin studies have been useful in identifying a significant concordance of asthma that is higher in monozygotic twins (identical genotype) versus dizygotic twins (on average sharing half of their genes).
in the largest study to date, using 11,688 danish twin pairs, it was suggested that 73% of susceptibility to asthma was genetic, with a substantial environmental component [41]. more recent studies suggest hereditability estimates of 35-95% for asthma [42]. therefore, asthma is considered a complex genetic disorder and, in contrast to single-gene disorders (e.g., cystic fibrosis), involves multiple genes, with expression influenced by both genetic and environmental factors. multiple environmental factors are important in asthma development, including tobacco smoke exposure, respiratory viral infections, antibiotic use, diet, and allergen exposure. gender and ethnic background also make a significant contribution. this complex mode of inheritance combined with the heterogeneity in the presentation of the disease has made gene discovery a challenge.
figure 28.2 asthmatic airway pathology. this schematic comparison of a normal airway with that observed in severe chronic asthma indicates histological changes that accompany recurring inflammation seen over time. unlike that from the unaffected individual, the bronchial mucosa from the severe asthmatic displays thickening of the basement membrane, airway smooth-muscle hypertrophy, leukocyte infiltration, epithelial cell desquamation, goblet cell hyperplasia in the epithelial lining accompanied by mucus hypersecretion, and plugging of the bronchial lumen, as well as edema and collagen deposition in the submucosal area. (not drawn to scale.) source: reproduced with permission from hodge and sayers [32].
early studies investigated inheritance through families containing multiple affected children, using linkage analyses and candidate gene approaches based on biology or location in the genome. however, the reproducibility of these findings was disappointing, primarily because of inadequate power, subject heterogeneity (different phenotype definition), population stratification, and multiple testing without correction.
more recently, hypothesis-free genome-wide association studies (gwas), which examine association with typically 500,000+ common (>5% frequency) polymorphisms spanning the genome in cases and controls, using very stringent statistical thresholds (figure 28.3), have been very successful. via these multiple approaches, over 190 asthma genes have been described in more than 1000 publications. the major reproducible findings for asthma follow below (for more comprehensive reviews see [42, 43]). although this chapter focuses on the genetics of asthma diagnosis, subphenotypes in asthma (e.g., atopy, serum total ige levels, and lung function (e.g., fev1)) have also been investigated for genetic influences. positional cloning involves linkage analyses (region or whole genome) followed by fine mapping using association. linkage analysis uses family data to follow the transmission of genetic information spanning the entire genome (commonly short tandem repeat markers) across generations. these data are used to determine if a genetic marker is close to, or linked with, a gene involved in a particular disease using families with multiple affected children. once a specific chromosomal region has been identified in this hypothesis-free approach, snps spanning the region are tested to see if they are more common in people with asthma compared to people without. the first gene to be identified with confidence using positional cloning was disintegrin and metalloprotease 33 (adam33) [44]. using a cohort of 460 families, linkage was demonstrated to a locus on chromosome 20p13 for asthma, bronchial hyper-responsiveness (bhr), and total ige levels. subsequently, polymorphisms spanning adam33 were associated with asthma in the second stage. adam33 is thought to contribute to airway remodeling via its enzymatic functions.
using positional cloning, multiple genes have been identified with functions varying from transcription factors to epithelial differentiation and tissue remodeling. these genes have provided a novel insight into potential mechanisms that are altered in asthma (table 28.2). polymorphisms spanning the urokinase plasminogen activator receptor (plaur or upar) were also associated with asthma susceptibility and with the rate of decline of lung function in asthma through linkage/association analyses on chromosome 19q13 [45]. upar is a serine protease receptor involved in multiple mechanisms, including cell proliferation, migration, and extracellular matrix degradation via plasmin generation. these data suggest that genetic factors may be important in determining airway structural changes or "remodeling" in asthma [46].
gwas approach (figure text): genotype typically 500,000+ snps in large numbers of cases and controls using arrays; compare differences to identify disease-specific snps using stringent correction for multiple testing.
candidate gene studies are hypothesis-driven and based on the suggested biology of a gene product or the location of the gene in a chromosomal region previously linked to asthma. such studies commonly employ asthma cases versus controls, although family-based association has also been used. for excellent reviews of candidate gene studies in asthma, see ober and hoffjan [47] and undarmaa et al. [48]. the primary highly reproducible genes identified using candidate gene approaches are multiple components of the interleukin 4/13 axis (see table 28.2), including the cytokine genes themselves (il4/il13), the related receptor (il4ra), and the downstream signaling effector signal transducer and activator of transcription 6 (stat6). the role of the il4/il13 axis in the pathogenesis of asthma has been extensively documented [49].
human studies have identified elevated numbers of cells expressing il13 mrna in the bronchial tissue of atopic and nonatopic asthmatic subjects [50]; administration of recombinant il13 in mouse lungs resulted in an increase in airway mucus secretion, development of subepithelial fibrosis, airway hyper-responsiveness (ahr), and eosinophilic airway inflammation, that is, several key features of the human disease [51]. il13 is produced by a variety of cells, including th2 cd4+, th1 cd4+, cd8+ t cells, mast cells, basophils, and eosinophils. il13 mediates its effects by interacting with a complex receptor system comprising an il-4rα/il13rα1 heterodimer and the il13rα2 receptor [49]. the finding that genetic polymorphisms within il13 that determine its levels and/or structure are associated with the development of asthma makes biological sense. the use of gwas approaches to identify asthma susceptibility genes has revolutionized our understanding of asthma genetics. while positional cloning showed great success with simple mendelian disorders, it did not perform well for complex diseases. similarly, candidate gene studies are useful but very inefficient. the capacity to interrogate the entire genome for 500,000+ common snps in cases and controls has provided an excellent platform for gene discovery. the first gwas for asthma used a discovery cohort of 994 patients with childhood onset asthma and 1243 nonasthma controls, and identified significant association to a locus on chromosome 17q21 [52]. this locus includes genes for zona pellucida binding protein 2 (zpbp2), gasdermin b (gsdmb), and orm1-like protein 3 (ormdl3). the 17q21 locus has been reproduced as a locus associated with childhood onset asthma in many studies; however, the identification of the specific gene(s) underlying these effects remains to be resolved. zpbp2, gsdmb, and ormdl3 have been implicated in gene transcription, cell apoptosis, and sphingolipid synthesis, respectively.
this initial gwas has now been superseded by several larger-scale studies. in particular, the gabriel consortium study, involving 10,365 asthma cases and 16,110 controls, identified association between asthma and polymorphisms spanning il33, il1rl1/il18r1, hla-dq, smad3, and il2rb [53]. to date, two of the most reproducible association signals are to the il33 gene on chromosome 9p24 and the il33 receptor (il1rl1 or st2) gene on chromosome 2, both of which highlight the significance of this ligand/receptor system in asthma. interestingly, these loci are associated with severe asthma as well [54]. there is good evidence from biology that the il33/st2 axis may be of relevance in asthma; mice induced to develop allergic airway disease were treated with an antibody that blocks il33 binding to its receptor. this treatment was shown to block many of the features of the disease, reducing serum ige and airway eosinophil and lymphocyte counts. significantly, both induction and resolution of the disease were attenuated [55]. in human studies, il33 has been shown to be elevated in the airways of asthma patients, particularly in the airway structural cells, including the bronchial epithelium, and is induced in these cells by relevant stimulations (e.g., hdm [56]). in addition, a soluble form of the st2 receptor has been shown to be elevated during asthma exacerbation [57]. overall, it is thought that the il33/st2 axis may be particularly important in attracting and activating inflammatory cells in the airways; however, the precise role(s) of this pathway remains to be resolved.
note: for candidate gene approaches, genes are limited to those replicated in more than 10 studies [47, 48]. for gwas, genes are limited to those meeting conventional genome-wide significance (p < 5 × 10^-8) and/or independent replication in the caucasian population.
as outlined in table 28.2, other genes have been identified using gwas, including those involved in diverse roles such as inflammatory cell function and airway smooth-muscle contraction, again providing a unique insight into potentially altered mechanisms in asthma. however, most single variants identified by gwas confer only a very modest odds ratio of 1.1-1.5 of developing asthma. for a more comprehensive review of gwas findings in asthma, see akhabir and sandford [43]. over the last 40 years, there has been great progress in identifying asthma susceptibility genes. current and future approaches include meta-analyses using cohorts of tens of thousands of asthma patients, the incorporation of gene expression/snp analyses (expression quantitative trait loci (eqtl)), the investigation of asthma subphenotypes using gwas, and whole-genome sequencing. the real challenge is the translation of these genetic findings into a deeper understanding of the biology of asthma and the potential identification of therapeutic targets. already several of these newly identified genes (e.g., il33 and il33 receptor (il1rl1/st2)) represent excellent pharmacological targets, and it is anticipated that genetics in these loci will be essential to identifying patients most likely to gain clinical benefit from targeting this pathway. the lack of concordance between approaches (e.g., linkage versus gwas) can be explained by the fact that the methodologies are designed to detect different types of variants. for example, linkage analysis has good power to detect high-risk disease-causing alleles but is not effective at identifying the common alleles of modest effect that gwas is designed to detect. it is reassuring that many of the genes identified in candidate gene approaches have been reproduced in gwas (e.g., the il13/il4 locus on chromosome 5q31, a region that has also been linked to asthma).
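The case-control comparison behind a GWAS signal, and the modest odds ratios quoted above, can be illustrated with a minimal sketch. It assumes the simplest possible analysis, an allelic 2 × 2 chi-square test for a single SNP, whereas published studies typically use regression models with covariates and quality-control steps that are not shown here. The allele counts and all function names are hypothetical and chosen only for illustration; the 5 × 10^-8 threshold is the conventional genome-wide significance level cited in this chapter.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic for a 2x2 table (1 df, no continuity correction)."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical allele counts for one SNP (two alleles per subject):
# 10,000 cases and 15,000 controls. These numbers are invented.
case_alt, case_ref = 4600, 15400
ctrl_alt, ctrl_ref = 6000, 24000

odds_ratio = allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref)
chi2 = chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref)
p_value = p_value_1df(chi2)
is_genome_wide_significant = p_value < GENOME_WIDE_ALPHA

print(f"OR={odds_ratio:.2f}, chi2={chi2:.1f}, p={p_value:.2e}")
```

With these invented counts, an allele frequency difference of only three percentage points gives an odds ratio inside the 1.1-1.5 range yet still passes the genome-wide threshold, which illustrates why very large cohorts are needed to detect common variants of modest effect.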
while there has been much success in recent years in identifying genes for asthma using gwas, the overall genetic variation accounted for by these genetic polymorphisms is very small, leading to the concept of "missing hereditability." possible explanations for missing hereditability include: (1) rare variants (<5% frequency) with larger effect size not measured on existing platforms; (2) structural variation (e.g., copy number variation); (3) gene-environment contributions; (4) gene-gene interactions; (5) epigenetic mechanisms; and (6) overestimation of initial hereditability.
the aim of treatment is to achieve control of asthma and prevent exacerbations. most guidelines adopt a stepwise approach to the management of asthma and advise stepping up treatment as necessary and stepping down when control is achieved for at least three months. although there are several pharmacological options for asthma treatment (see figure 28.4), the two main classes remain bronchodilators (short- or long-acting) and corticosteroids. short-acting bronchodilators (e.g., salbutamol/terbutaline) are sympathomimetics (β2-adrenergic receptor agonists) that relax airway smooth-muscle, enhance mucociliary clearance, decrease vascular permeability, and possibly modulate mediator release from mast cells. they are used for symptom control, acute exacerbations, and exercise-induced asthma. frequent or regularly scheduled use of rapid-acting inhaled β2 agonists for long-term management of asthma does not adequately control symptoms, pef variability, or airway inflammation. long-acting bronchodilators (e.g., salmeterol/formoterol) have the same activity as short-acting β2-adrenergic receptor agonists, but should be used in combination with inhaled corticosteroids because of concerns regarding increased mortality when they are used as monotherapy. long-acting bronchodilators are normally introduced at treatment step 3.
inhaled corticosteroids (ics) form the backbone of treatment for all but the very mildest asthma. ics improve lung function, symptoms, and quality of life, reduce exacerbations, and reduce mortality. they have broad anti-inflammatory and immunosuppressive effects. corticosteroids enter the cell cytoplasm and bind with the inactive glucocorticoid receptor complex. the activated glucocorticoid receptor binds to dna at the glucocorticoid response element sequence and promotes synthesis of anti-inflammatory proteins (transactivation), and inhibits transcription and synthesis of many proinflammatory cytokines (transrepression). they also reduce the number of t-lymphocytes, dendritic cells, eosinophils, and mast cells in the airway, and reduce inducible nitric oxide production. ics (beclomethasone dipropionate/fluticasone propionate/budesonide/flunisolide/ciclesonide/mometasone) all have side effects related to their transactivation properties, including local effects on the oropharynx (hoarseness, candidiasis, cough, and dysphonia) and systemic effects (cushing syndrome, osteoporosis, cataracts, dermal thinning and bruising, adrenal insufficiency, and, in children, growth suppression) [58]. consequently, the lowest dose required to control symptoms and prevent exacerbations should be prescribed. other medications are normally reserved for second- or third-line treatment. these include methylxanthines (theophylline), leukotriene modifiers (montelukast), inhaled anticholinergics (ipratropium bromide/tiotropium), and sodium chromoglycate.
a wide range of interventions have been assessed for both the primary prevention of asthma and the secondary prophylaxis of the disease once present. these treatments have varying effects, but may be recommended in individual cases. primary prevention measures include aeroallergen and food allergen avoidance, fish oil supplementation, avoidance of tobacco smoke and air pollutants, immunotherapy, and immunization.
measures that have been assessed for secondary prophylaxis include hdm and allergen avoidance, smoking reduction, subcutaneous and sublingual immunotherapy, dietary manipulation, weight loss, breathing techniques, and exercise training. some investigators argue that severe asthma is an inflammatory condition separate from mild/moderate asthma, rather than simply the severe end of the disease spectrum. this is difficult to prove, as underlying asthma may be complicated by other aspects (obesity/reflux/smoking/poor adherence), making it more difficult to treat and control. there are different definitions of severe asthma, but all require high-dose inhaled corticosteroids or oral corticosteroids and include recurrent exacerbations and poor control. as high-dose ics and oral corticosteroids are associated with side effects, attempts have been made to design treatments that either reduce the amount of corticosteroid prescribed (steroid-sparing agents) or treat the underlying asthma in a different manner. steroid-sparing agents trialled include methotrexate, gold, cyclosporin, and azathioprine. currently, none have been recommended in any asthma guidelines. box 28.2 lists other treatments used on an individual basis. the need for a stratified-medicine approach to asthma has become apparent over the last decade. this has followed the realization that the term asthma covers a whole range of disease phenotypes. as techniques for investigating and researching asthma have evolved, it has become clear that different disease processes are present that require different treatments and approaches. whether this represents disease phenotyping or just a better understanding of disease pathogenesis is unclear. over the years, research focus has moved from the airway smooth-muscle via airway inflammation to airway immunology, and on to the epithelial-mesenchymal trophic unit.
future insight into disease pathogenesis and new treatment developments will follow with the use of the latest technologies, which include "omics" platforms and assessment of airway microbiota. improving our understanding of stratified medicine for asthma (targeting the right therapy to the right patient) aims to reduce hospital admissions due to poorly controlled asthma and fatalities. although most asthma can be controlled by ics and bronchodilators, there remains a cohort of patients with persistent symptoms, recurrent exacerbations, and worse quality of life. the current stepwise management approach (see figure 28.4) is inefficient, particularly in light of heterogeneity in clinical presentation and response to existing therapies, making a stratified approach essential. early work conceptualized asthma as extrinsic (allergic) and intrinsic (nonallergic) [59]. however, the recent use of unbiased approaches to classify disease using three large datasets and cluster analysis [60-62] has highlighted the different types of disease under the umbrella term asthma (figure 28.5). although these early studies were performed in different parts of the world and had statistical variations, their results were similar. age at disease onset was found to be a key differentiating factor; early-onset disease was associated with more atopic and allergic conditions, whereas later-onset disease was associated with eosinophilic inflammation and obesity. these studies were all limited by a cross-sectional design and by the fact that treatment may have determined clusters. longitudinal studies initiated in childhood are ongoing. for example, pacman (formally, "pharmacogenetics of asthma medication in children: medication and anti-inflammatory effects") aims to differentiate children with uncontrolled asthma, despite ics treatment, using a range of clinical and cellular markers [63].
the role of allergy (an inappropriate and harmful immune response to a normally nonharmful substance which requires sensitization) and that of atopy (the tendency to develop ige antibodies to commonly encountered environmental allergens by natural exposure in which the route of entry is across intact mucosal surfaces) have been well studied in asthma. people with extrinsic asthma were thought to develop the disease earlier in life, be atopic, and have identifiable allergic triggers as well as other allergic diseases such as rhinitis or eczema. intrinsic asthma was thought to develop later in life (after 40 years of age), and be associated with aspirin-exacerbated respiratory disease but not with allergic sensitization. when small studies in humans suggested that levels of th2 cytokines were similar in extrinsic and intrinsic asthma, and that treatment with ics was effective in the majority of mild to moderate asthma cases, the distinctions between extrinsic and intrinsic asthma fell out of favor [64].
box 28.2 new and emerging therapies reserved for severe asthma: anti-ige; bronchial thermoplasty; macrolide antibiotics; antifungals (itraconazole/voriconazole); anti-il5; anti-il13.
heterogeneity in response to existing medication is exemplified by malmstrom's study, which investigated individual responses to a leukotriene receptor antagonist (ltra; montelukast 10 mg once daily) and the inhaled steroid beclomethasone (200 μg twice daily) over a 12-week period in 895 chronic asthma patients [65]. overall, there was a clear improvement in lung function (primary outcome fev1) for both treatments; however, when stratified based on individual patient data, there was a very large degree of heterogeneity for both treatment groups, with some patients showing up to 50% improvement in fev1 and others showing a decline in lung function with a 30% decrease in fev1 (figure 28.6).
this interindividual variability, a component of clinical trials that is seldom appreciated, is thought to have a strong genetic component. the therapeutic regimen for the treatment of asthma has been well defined and was described earlier in this chapter. despite such well-defined treatment regimens, which are constantly updated and streamlined, as well as the availability of high-quality medications, in many cases asthma is still not sufficiently controlled. while current asthma therapeutics are able to successfully treat 90-95% of the asthmatic population, around 50% still have daily symptoms and almost all patients report limitations of daily activities [66] . this also leaves around 5-10% of patients who do not respond to conventional treatments. this demonstrates that the current one-size-fits-all approach is not perfect, and emphasizes the need for a stratified medicine approach to asthma. variability in current therapy involves variation in drug efficacy and presentation of side effects, each of which presents challenges to successful maintenance of asthma. variability in drug efficacy, especially when occurring for β 2 -adrenergic receptor agonists and glucocorticosteroids, and in the presentation of side effects, can result in patient therapy having to deviate from the standard therapeutic regimen for the treatment of asthma. this high degree of interpatient variability in treatment response and in side effects can be attributed to a number of factors: (1) the interaction of genetic factors; (2) individual patient characteristics, such as weight, gender, and pregnancy; and (3) exposure to environmental insults, such as air pollution, allergens, and cigarette smoke. these factors interact, making the control of asthmatic episodes and the reduction of exacerbations more difficult. 
for example, asthma attacks can affect different age groups differently according to the season: children are more affected in summer, while the older population suffers more in winter. when considering variation in treatment efficacy, one must also be mindful of individual patient factors such as inhaler technique and noncompliance (discussed earlier). currently asthma therapy is planned without consideration of genetic variance. however, genetic variance is arguably the most important of the factors listed previously, a claim supported by evidence gained from investigating the repeatability of response to therapy. this approach demonstrated that repeatability was between 60% and 80%, with a substantial proportion of the variance due to genetic factors [67]. one important form of variance in therapeutic efficacy is non-response. non-response is an important limitation in the maintenance and control of asthma, in that therapeutics expected to manage the disease at a certain level of severity fail outright, leading to the use of a trial approach in which different therapies are utilized with variable results until a suitable regimen is determined. a good example is severe asthma, where the patient is on permanent oral glucocorticosteroid therapy because of non-response to other therapies such as ics, leukotriene receptor antagonists, and β2-adrenergic receptor agonists. non-responders present with longer duration of asthma exacerbations and worse morning lung function, as well as a more frequent family history of asthma, when compared to those individuals responsive to glucocorticosteroid therapy [67]. the recognition that allergen-specific ige activation of mast cells is central in driving allergic asthma led to the discovery that the primary mast cell-signaling cascade could be inhibited by a monoclonal antibody directed at the ige binding site for the high-affinity receptor (fcer1).
anti-ige (omalizumab) became the first specific biologic to be used in the treatment of severe allergic asthma. clinical trials revealed efficacy as well as almost total inhibition of early and late asthmatic responses to inhaled allergen [68]. following a rigorous examination of trial data by the national institute for clinical excellence (nice), omalizumab is now recommended in the united kingdom as an option for treating severe persistent, confirmed allergic ige-mediated asthma as an add-on to optimized standard therapy in people aged six and older who need continuous or frequent treatment with oral corticosteroids (defined as four or more courses in the previous year). although the drug is prescribed for cases of therapy-resistant asthma associated with allergy, it is also licensed for patients with evidence of allergy but a normal ige. omalizumab represents the first stratified treatment for asthma, but there is evidence that its use can be further stratified by using biomarkers of th2 inflammation to predict response. in a recent paper studying the effect of omalizumab on uncontrolled severe persistent allergic asthma, treatment effect was analyzed in relation to feno, blood eosinophils, and serum periostin (an epithelial protein that is induced by il13). after 48 weeks of omalizumab, reductions in protocol-defined exacerbations were greater in high versus low subgroups for all three biomarkers, suggesting a potential prognostic ability [69]. this biomarker approach has helped divide asthma into th2 high and th2 low disease. asthma has traditionally been considered a th2 process linked to atopy and allergy, type i hypersensitivity reactions, eosinophilic inflammation, and response to corticosteroids. "th2 high disease," as evidenced by high feno and a sputum eosinophilia of >2%, has been associated with a better response to corticosteroids when compared to "th2 low disease" [70-72].
studies on lebrikizumab, a novel monoclonal antibody to il13 [73], have suggested that serum periostin may be a biomarker for a more general th2 asthmatic phenotype. in one study a subgroup of individuals who had asthma and persistent elevation in serum periostin showed greater improvements in airway function and fewer exacerbations after lebrikizumab treatment than those with lower serum periostin. interestingly, levels of feno, produced by inducible nitric oxide synthase, an enzyme induced in human airway epithelial cells by il13, were as informative as periostin in identifying th2-high individuals responsive to lebrikizumab. both the role of stratification based on phenotyping and the need for correct study end points in treatment trials have been informed by the salutary lesson of the anti-il5 antibody mepolizumab. although mepolizumab effectively blocks eosinophilic inflammation, initial studies were disappointing in that no effect was demonstrated on spirometry or peak flow, despite a significant reduction in blood eosinophil numbers [74]. when the effect of the drug was studied on an outcome measure related to eosinophilia, namely severe exacerbations, there were significantly fewer exacerbations with a concomitant improvement in asthma-related quality of life [75]. tellingly, there were no significant differences with respect to symptoms, postbronchodilator fev1, or airway hyper-responsiveness, suggesting that different physiological processes determine these asthma outcomes. the study also helps researchers reflect upon the need to measure the correct end point in a disease that has different pathological mechanisms that all respond differently to treatment. phenotyping asthma has led to the identification of new disease targets, but aside from the use of anti-ige, it has not yet led to a step change in management.
other approaches assessing the separate aspects of asthma pathophysiology (namely, airway inflammation, airway hyper-responsiveness, and airflow obstruction) have been trialled with differing results. the key determinant in the design of these stratified approaches was the recognition that although asthma is a disease defined by symptoms, with treatment response also assessed by symptom improvement, there is no clear correlation between pathophysiology and symptomatology/exacerbation risk. in the facet study (designed to evaluate the benefits of adding a long-acting β 2 -adrenergic receptor agonist to different doses of ics), higher-dose ics had a marked beneficial effect on exacerbation frequency, but relatively less effect on symptoms and peak expiratory flow. the opposite was true with the addition of long-acting β 2 -agonists [76] . this indicates that exacerbation frequency does not closely relate to symptoms and measures of disordered airway function, suggesting that the mechanisms responsible for these features are different [77] . numerous studies have demonstrated that in asthma the features of airway inflammation, airway hyper-responsiveness, variable airflow obstruction and associated symptoms can overlap, occur independently or change over time, in response to treatment or other external factors such as allergen exposure or viral infection. one obvious example of this is eosinophilic bronchitis, a condition characterized by corticosteroid-responsive cough and the presence of a sputum eosinophilia occurring in the absence of variable airflow obstruction or airway hyper-responsiveness [78] . although asthma guidelines recommend the assessment of airflow obstruction (fev 1 /pef) and related symptoms for their primary treatment response outcomes, the realization is that this approach looks only at one aspect of asthma pathophysiology, one that is not closely related to exacerbation risk. 
this has led to new approaches aimed at reducing asthma exacerbations and symptoms while maintaining corticosteroid burden at the lowest possible level. airway hyper-responsiveness is defined as increased sensitivity to an inhaled constrictor agonist and a steeper slope of the dose-response curve. two main forms of bronchoconstrictor stimuli exist: direct and indirect. direct bronchoconstrictors, such as histamine or methacholine, stimulate receptors on the airway smooth-muscle, while indirect ones cause bronchoconstriction by secondary release of bronchoconstrictor mediators from mast cells or activation of neural pathways. airway responsiveness is usually measured as the provocative concentration of methacholine causing a 20% fall in fev 1 (pc 20 ), obtained by linear interpolation of the log dose-response curve. in the general population, the distribution of airway hyper-responsiveness follows a continuous unimodal log-normal distribution, with asthma sufferers representing the hyper-responsive part of the distribution curve. a pc 20 is not usually measurable in non-diseased individuals, which suggests a large difference in airway responsiveness between non-diseased individuals and asthma patients. the cut-off used to identify asthma is normally a methacholine pc 20 of <8 mg/ml. this value had a sensitivity of 100%, a specificity of 93%, and a negative predictive value of 100% in a study on a population of 500 college students with a diagnosis of current symptomatic asthma. the use of methacholine pc 20 to diagnose asthma has been evaluated; one study demonstrated that when asthma is defined as consistent symptoms with objective evidence of abnormal variable airflow obstruction, a positive methacholine challenge is more sensitive than pef amplitude % mean and the acute bronchodilator response in diagnosis [79] . the use of methacholine pc 20 to guide treatment has also been assessed. 
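as an illustration of the interpolation step described above, the following is a minimal sketch in python of how a pc 20 might be derived from the last two steps of a methacholine challenge. it assumes the commonly used log-linear formula, in which the logarithm of the concentration is interpolated between the last concentration producing a fall in fev 1 of less than 20% and the first producing a fall of 20% or more. the function names, the example concentrations, and the example percentage falls are hypothetical and are not taken from any of the cited studies; the 8 mg/ml default simply mirrors the cut-off quoted in the text.

```python
import math


def pc20(c1, r1, c2, r2):
    """Estimate PC20 (mg/ml) by linear interpolation on a log-concentration scale.

    c1, c2 : the last two methacholine concentrations given (mg/ml), c1 < c2
    r1, r2 : percentage fall in FEV1 after c1 and c2, with r1 < 20 <= r2
    """
    if not (0 < c1 < c2):
        raise ValueError("concentrations must satisfy 0 < c1 < c2")
    if not (r1 < 20 <= r2):
        raise ValueError("falls in FEV1 must bracket 20% (r1 < 20 <= r2)")
    # Fraction of the way from c1 to c2 (in log units) at which the fall hits 20%.
    fraction = (20 - r1) / (r2 - r1)
    log_pc20 = math.log10(c1) + (math.log10(c2) - math.log10(c1)) * fraction
    return 10 ** log_pc20


def is_positive_challenge(pc20_value, cutoff=8.0):
    """Apply a diagnostic cut-off; 8 mg/ml is the value quoted in the text."""
    return pc20_value < cutoff


# Hypothetical example: 10% fall at 2 mg/ml, 30% fall at 8 mg/ml.
example = pc20(2.0, 10.0, 8.0, 30.0)
print(round(example, 2), is_positive_challenge(example))
```

because the interpolation is performed on the log scale, a 20% fall lying halfway between the two measured responses corresponds to the geometric mean, not the arithmetic mean, of the two concentrations.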
although it was found that pc 20 -guided treatment resulted in a reduction in asthma exacerbations, this was at the expense of increased inhaled corticosteroid use [80] ; consequently, although pc 20 is often used for asthma diagnosis, or as a study end point, it is not routinely used to guide treatment decisions. asthma has been traditionally viewed as a condition where airway inflammation causes airway hyper-responsiveness, which in turn leads to variable airflow obstruction and symptoms; however, cross-sectional and longitudinal studies of airway inflammation using sputum induction in large populations with a diverse range of presentations suggest that this hypothesis requires modification. a recent bronchoscopy study demonstrated that bronchoconstriction is independent of airway inflammation and can lead to airway remodeling [38] . the development of noninvasive techniques to assess airway inflammation, including induced sputum and f e no, has made it possible to relate airway inflammation to objective measures of disordered airway function in larger and more heterogeneous populations than was possible with bronchoscopy studies. in general, these newer studies contradict findings in earlier studies and do not find a correlation between sputum eosinophil count and various markers of airway dysfunction. it has been observed that a subset of patients with symptomatic asthma do not have sputum evidence of eosinophilic airway inflammation [81] . many have sputum neutrophilia. this sputum profile is evident in corticosteroid-naïve as well as corticosteroid-treated subjects, suggesting that it is not always an artifact related to treatment. importantly, patients with noneosinophilic asthma respond less well to inhaled budesonide than do a group with more typical sputum features [82] . 
similar sputum findings have been reported in more severe asthmatics; a subgroup of patients with refractory asthma have been identified who have bronchoscopic evidence of neutrophilic airway inflammation, normal eosinophil counts, and a normal basement membrane thickness. these findings suggest the presence of a distinct asthma phenotype characterized by a predominantly neutrophilic airway inflammatory response and relative corticosteroid resistance. furthermore, there is evidence that neutrophilic asthma may result from activation of the innate immune system with the production of proinflammatory cytokines. the use of noninvasive measures to guide treatment decisions by stratification based on the presence or absence of airway inflammation has been assessed. the strongest evidence for effect relates to the use of induced sputum to guide corticosteroid dose in moderate to severe asthma. two studies have shown that using induced sputum differential counts to guide treatment results in fewer exacerbations for the same overall corticosteroid burden [81, 83] . although the evidence is clear that this approach works, induced sputum is not widely employed, possibly because it is seen as time consuming, requiring expertise both to induce the sputum and to perform the differential counts. the discovery that levels of f e no, measured via a device similar to a breathalyzer, correlate well with sputum eosinophilia and relate to corticosteroid responsiveness, has driven the application of inflammometry (noninvasive measurement of inflammation in the airways). inflammometry using f e no measurements has also been used to guide and stratify treatment decisions. though one study was positive in patients with asthma in pregnancy, with a reduction in exacerbations, most have failed to show an improvement when compared to guideline-driven (fev 1 /pef and symptoms) management. the hunt is still on for a reliable, inexpensive, and valid biomarker of airway inflammation. 
fev 1 and pef measurements reflect changes in the caliber of the large airways. our knowledge of anatomical and physiological changes in the small airways of patients with asthma is based on small case series of resected lung tissue from patients with asthma undergoing surgery for cancer, or on cases of fatal asthma. these case series have demonstrated that there is significant inflammation present in the small airways (<2 mm diameter) in asthma. fatal asthma is associated with peripheral airway inflammation and differences in the number of activated eosinophils in the distal lung. other studies have revealed alterations in the epithelium and smooth-muscle, as well as mucous hypersecretion and distal airway plugging of the small airways. the presence of inflammation in the small airways in asthma may explain why small airways account for up to 50-90% of total airflow resistance in asthma, but only 10% of airflow resistance in normal airways. recently, the development of "small-particle" ics, designed to target the peripheral lung, and the advent of new technologies-nitrogen washout, impulse oscillometry, and hyperpolarized noble gas magnetic resonance imaging, which allows assessment of peripheral lung function-have led to a resurgence of interest in the distal lung. studies of small-particle ics have been inconsistent; those comparing small-particle and standard-particle icss have failed to demonstrate improved asthma outcomes when administered in clinically comparable doses. future asthma treatment may yet be stratified by the presence or absence of small airway inflammation. there have been several attempts to base treatment decisions on measures of airway inflammation, including induced sputum and f e no in children. generally, they have been unsuccessful, although some experts do stratify treatment decisions on the response to oral or intramuscular steroids. 
while methods of stratifying asthma patients to specific treatments based on nongenetic factors such as clinical outcomes, cellular measures, or protein biomarkers have shown some success, a large body of work has investigated the potential of genetic markers as predictors of patient responses to existing therapies, i.e., pharmacogenetics. pharmacogenetics, the investigation of the effect of genetic polymorphisms on response to treatment or risk of adverse side effects, is one of the first steps in developing personalized prescribing. the use of genetic information to stratify patient prescribing is potentially more desirable compared to nongenetic stratification, as technology now allows a small amount of blood or saliva sample to be taken and a dna test can be completed within hours. to date, asthma pharmacogenetic studies have suffered from relatively small retrospective designs and a focus on only a few candidate genes; however, more recent, larger prospective studies have been completed that provide greater confidence in original findings and hypothesis-free approaches such as gwas. these studies are primarily focused on pharmacodynamic aspects (e.g., improvement in lung function post-treatment), and only limited information is available about the impact of pharmacogenetics on adverse effects. pharmacogenetics in asthma is relatively advanced compared to that for other diseases. genetic factors influencing the main treatment classes (i.e., β 2 -adrenergic receptor agonists, corticosteroids, and leukotriene modifiers) have been identified with some confidence (table 28. 3). an overview of these findings follows; however, for more in-depth analyses, see portelli and sayers [84] . β 2 -adrenergic receptor agonists carry out their function through the β 2 -adrenergic receptor-a 413-amino-acid g-protein-coupled receptor encoded by an intronless gene (adrb2) located on chromosome 5q31.32. 
binding of agonists to the receptor activates adenyl cyclase through stimulatory gs proteins, which in turn activate protein kinase a. the latter phosphorylates several target proteins, resulting in a decrease in intracellular calcium that, importantly, causes smooth-muscle relaxation in the airways. it is not surprising that the majority of evidence of genetically driven effects on β 2 -adrenergic receptor-agonist therapy stems from the adrb2 gene, which has therefore been extensively studied for pharmacogenetic effects on β 2 -adrenergic receptor-agonist responses. adrb2 is a highly polymorphic gene containing 51 known and validated polymorphisms, of which 49 are snps and 2 are insertion/deletion variants. most studies have focused on the role of four nonsynonymous coding-region polymorphisms: arg16gly (arginine-to-glycine substitution at position 16 in the protein), gln27glu, val34met, and thr164ile [85] . in a caucasian population, the frequencies of the polymorphisms at positions 16 and 27 were identified as 59% (arg16gly) and 29% (gln27glu) [85] . the remaining val34met and thr164ile polymorphisms are rare, having approximate frequencies of <0.001% and 0.05%, respectively. the arg16 variant has been shown to have pharmacogenetic potential through association with (1) an enhanced acute response to β 2 -adrenergic receptor agonists; (2) a decline of asthma control following prolonged use of β 2 -adrenergic receptor agonists; and (3) a subsensitivity of response for bronchoprotection by β 2 -adrenergic receptor agonists. however, several studies have failed to reproduce these effects, meaning that a common consensus on the contribution of arg16 has yet to be reached. this lack of consensus can be explained when we consider that there are at least 12 different haplotypes (a specific combination of snps across the gene) in adrb2. 
investigation into the effect of a genotype in isolation, without consideration of its haplotype, is likely to introduce confounding into the association. functional effects of coding-region polymorphisms in adrb2 have been identified through in vitro work carried out in cell lines. these have included the following: [86] . in one of the larger studies, basu et al. identified an arg16 copy number-dependent increase in disease exacerbations in 1182 patients with asthma aged 3 to 22 years on daily exposure to β 2 -adrenergic receptor agonist (regularly inhaled corticosteroid plus salbutamol on demand group, and regularly inhaled corticosteroid plus salmeterol and salbutamol on demand group) [87] . it is important to note that this effect was driven by asthma patients who used salbutamol and/or salmeterol daily, supporting the suggestion that the arg16 polymorphism has an integral role in the effectiveness of β 2 -adrenergic receptor-agonist therapy [87] . however, as with other studies, study limitations did not allow a clear association to be made between the adrb2 gene (via its polymorphisms) and β 2 -adrenergic receptor-agonist efficacy. namely, all participants were using β 2 -adrenergic receptor-agonists as a reliever; a study arm of non-β 2 -adrenergic receptor-agonist reliever use would have provided a clearer interpretation of the detrimental effects of salbutamol/salmeterol in the arg16 subjects. large studies have failed to observe a clinically relevant effect of these polymorphisms. these include longitudinal studies of β 2 -adrenergic receptor-agonist efficacy and the study of concomitant administration of corticosteroids. a good example is a study of 2250 asthma patients randomly assigned to: (1) budesonide plus formoterol maintenance and reliever therapy; (2) fixed-dose budesonide plus formoterol; or (3) fixed-dose fluticasone plus salmeterol for six months. no overall effect of the gly16arg genotype on clinical outcomes was found [88] . 
another contributing factor to the pronounced differences in the conclusions from different studies investigating these adrb2 polymorphisms is trial design variation. the majority of studies have focused on the adrb2 arg16gly polymorphism; however, other potential pharmacogenetically relevant polymorphisms also affect β 2 -adrenergic receptor-agonist efficacy. multiple polymorphisms in the gene's regulatory regions that have potential clinical relevance are present in adrb2. by studying eight common haplotypes based on 26 snps, a recent in vitro approach using "whole gene" transfection identified differential effects on receptor expression and downregulation that are haplotype-driven [89] . this study identified four common haplotypes with elevated receptor expression and two haplotypes with enhanced receptor downregulation. another area recently investigated is polymorphisms occurring in the gene's untranslated regions, which may have an effect on gene expression and resultant drug efficacy. multiple genes are likely to be involved in the regulation of β 2 -adrenergic receptor-agonist response and expression of side effects, due to the agonists' known complex and multifactorial mechanism of action. one gene that has been associated with patient response to these agonists is arginase 1 (arg1), which was identified using a novel algorithm implemented in a family-based association test (fbat) [90] . in this study, snps from 111 candidate genes (42 involved in β 2 -adrenergic receptor-signaling/regulation, 28 involved in glucocorticoid regulation, and 41 from prior asthma association studies) were investigated for their association with acute response to inhaled β 2 -adrenergic receptor-agonist in 209 children and their parents, and the arg1 snp rs2781659 was associated with bronchodilator response (bdr). 
in agreement with this study, polymorphisms spanning arg1 and influencing patient response to salbutamol have also been identified in a candidate gene study involving 221 asthma subjects [91] . the arg1 polymorphisms identified in both studies were in linkage disequilibrium (ld) (inherited together), suggesting a common causative mechanism involving potential transcriptional regulation due to the polymorphisms' location (predominantly 5′ to the gene). this alteration in transcription has now been confirmed in promoter-reporter studies, which found that the key arg1 haplotype associated with improved bdr drives the highest level of arg1 promoter activity [92] . in the study by vonk et al., arg2 snps were also associated with patient responses to salbutamol [91] , suggesting an integral role for the arginase family in β 2 -adrenergic receptor-agonist therapy. recently an s-nitrosoglutathione reductase (gsnor) snp (rs1154400, promoter region) was associated with a decreased response to salbutamol in 107 african-american children [93] . in the same study, a post hoc multilocus analysis discovered that a combination of rs1154400 with adrb2 arg16gly, gln27glu, and the carbamoyl phosphate synthetase-1 (cps1) snp rs2230739 gave a 70% predictive value for lack of response to therapy [93] , implying that pharmacogenetic regulation of β 2 -adrenergic receptor-agonist therapy may depend on several loci acting together via gene-gene interactions. in confirmation, 4/5 snps tested in gsnor were associated with asthma patient responses to salbutamol in 168 puerto rican asthma patients [94] . these snps were also associated with asthma susceptibility, and the key risk haplotype was associated with increased transcriptional activity based on promoter-reporter studies [94] . gsnor is an alcohol dehydrogenase that breaks down gsno, an endogenous bronchodilator [95] . 
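to make the notion of linkage disequilibrium used above more concrete, the following is a minimal sketch in python of the two standard pairwise ld measures, d′ and r², computed from allele and haplotype frequencies at two biallelic loci. the function name and all frequencies in the example are hypothetical and chosen only for illustration; they are not estimates for the arg1, gsnor, or adrb2 polymorphisms discussed in the cited studies.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D', r^2) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the haplotype carrying both A and B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b
    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between alleles at the two loci.
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical example: two SNPs whose alleles are always inherited together.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
# Hypothetical example: two SNPs inherited independently of each other.
print(linkage_disequilibrium(0.5, 0.5, 0.25))
```

a high r² between two associated snps means that either one, or an untyped variant correlated with both, could be the causative change, which is why association signals in ld are usually interpreted at the level of the haplotype rather than the individual snp.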
in addition, gsno regulates nitrosylation of proteins, leading to alterations in function, including g-protein coupled receptor kinase 2 (grk2), which phosphorylates and desensitizes the β 2 -adrenergic receptor [96] . other novel genes have recently been associated with β 2 -adrenergic receptor-agonist therapy including the spermatogenesis-associated, serine-rich 2-like (spats2l) and collagen (col22a1) genes [97] . the col22a1 gene, through association with the intronic snp rs6988229, was associated with acute bronchodilator response to inhaled salbutamol in a genome-wide association study in which ∼500,000 snps were tested in 403 caucasian trios. this association was replicated in a pooled population of three asthma trial populations, as well as in three additional asthma populations [97] . spats2l was also identified as a modulator of β 2 -adrenergic receptor-agonist function through a gwas. here, the snp rs295137, located near the spats2l gene, was significantly associated with percentage change in baseline fev 1 in 1644 caucasian asthma patients; this was replicated in two alternate caucasian populations (n ∼ 500 each). molecular biology techniques confirmed these results and identified that spats2l may be an important regulator of β 2 -adrenergic receptor downregulation. the identification of patients at risk from potential adverse effects of β 2 -adrenergic receptor-agonists remains a critical clinical question, as does the targeting of this class of drug to patients most likely to benefit from it. while there has been clear progress with respect to study design (e.g., examining haplotypes instead of genotypes in isolation) and adequately powered studies using thousands of individuals, there is still a need for large prospective studies of asthma patients with matched phenotypes and carefully controlled covariates that include environmental influences. 
similarly, these studies would benefit from gwa approaches and the identification of gene-gene interactions. while investigations of the effect of genetic polymorphisms on β 2 -adrenergic receptor-agonist responses have predominantly focused on clinical end points (e.g., lung function parameters) in the different genotype groups, there has been recent interest in the real-life application of genetic knowledge. as outlined, the first-line treatment for asthma is a short-acting β 2 -adrenergic receptor-agonist (e.g., salbutamol) as needed (step 1); if symptoms persist, the addition of inhaled corticosteroid (e.g., beclomethasone) is considered (step 2); for further control, a long-acting β 2 -adrenergic receptor agonist (e.g., salmeterol) or a leukotriene receptor antagonist (ltra) (e.g., montelukast) is added (step 3) (figure 28.4). in a recent study, lipworth et al. set out to use genetic information on the adrb2 arg16 polymorphism to inform the choice of prescribing salmeterol or montelukast as add-on therapy [98] . children with persistent asthma and homozygous for the arg16 genotype (n = 62) were randomized to receive salmeterol (50 μg, bd) or montelukast (5 or 10 mg, once daily) as an add-on to inhaled fluticasone propionate for one year. the study tested whether carriers of the arg16 genotype were more prone to adverse effects (e.g., exacerbations associated with prolonged β 2 -adrenergic receptor-agonist use), and hence whether montelukast provided superior control for this preselected population. outcomes were school absences (primary outcome), exacerbation score, reliever use (salbutamol), morning dyspnoea, and asthma control questionnaire (acq) quality-of-life scores. montelukast provided superior benefit for all measures, with clinically relevant differences within three months. no significant difference in fev 1 (% pred) was observed, providing further evidence of limited correlation between lung function and symptom-based scores [98] . 
these results suggest that larger and longer prospective studies are warranted to provide more definitive data on the clinical utility of arg16 stratification; including a gly16 study arm and using additional markers to define the population would provide clearer interpretation. as outlined, leukotriene synthesis inhibitors (ltsis) and leukotriene receptor antagonists (ltras) are commonly used as an add-on therapy in asthma to provide greater control or steroid-sparing effects. the cysteinyl leukotrienes (ltc 4 , ltd 4 , lte 4 ) and dihydroxy leukotriene (ltb 4 ) contribute to the inflammatory process in asthma and are synthesized from arachidonic acid via the 5-lipoxygenase pathway (figure 28.7). cysteinyl leukotrienes have been implicated in bronchoconstriction, mucus secretion, vascular permeability, inflammatory cell infiltration and cytokine production. arachidonic acid is converted to 5-hydroperoxyeicosatetraenoic acid and leukotriene a 4 (lta 4 ) by 5-lipoxygenase (5-lo/alox5) and 5-lipoxygenase-activating protein (flap/alox5ap), which acts as an adaptor protein for this reaction. lta 4 is then converted to leukotriene b 4 (ltb 4 ) by lta 4 hydrolase (lta4h) or, alternatively, is conjugated with reduced glutathione to form ltc 4 via the actions of leukotriene c 4 (ltc 4 ) synthase (ltc4s). ltc 4 is next transported to the extracellular space via the multidrug resistance protein 1 (mrp1) (figure 28.7). leukotriene modifiers include ltsis (e.g., zileuton), which act by targeting 5-lo, resulting in a decrease in all leukotriene biosynthesis (ltc 4 , ltd 4 , lte 4 , and ltb 4 ), and ltras (e.g., montelukast and zafirlukast), which act by specifically blocking cysteinyl leukotrienes from binding to their primary receptor, cysltr1, which is found on many cell types including inflammatory cells and airway smooth-muscle cells. data demonstrating the large degree of heterogeneity of asthma patient responses to this class of drug are summarized in figure 28.6. 
these data have led to extensive genetic studies investigating the roles of snps in a large number of candidate genes associated with leukotriene production and/or activity and acute responses to ltras and ltsis. associations have been described for polymorphisms in alox5, alox5ap, ltc4s, cysltr1, cysltr2, and mrp1; however, many of these studies have been small and so have led to a lack of reproducibility in findings [99] . more recent studies have looked at different aspects of leukotriene modifier functions, including the association of montelukast absorption (i.e., pharmacokinetics) with polymorphisms in oatp2b1 [100] . however, others have failed to observe these associations [101] . 5-lo (along with 5-lo-activating protein) is the major regulatory switch for leukotriene production, and extensive studies have demonstrated that the level of 5-lo can have dramatic effects on both cysteinyl and dihydroxy leukotriene production. 5-lo catalyzes the conversion of arachidonic acid to lta 4 , one of the early stages in leukotriene production (figure 28.7). alox5 is found on chromosome 10q11.2 and is a large gene composed of 14 exons and 13 introns; it spans approximately 82 kb. multiple studies suggest that alox5 polymorphisms can influence clinical responses to ltras and ltsis; in particular, a functional repeat polymorphism in the promoter region (resulting in alterations in sp1 transcription factor binding) has been associated with altered gene transcription and with response to the ltsi, abt-761 [99, 102] . these early studies have now been extended to investigate multiple polymorphisms, spanning the entire gene, and association with ltra and/or ltsi responses in asthma patients. for example, tantisira et al. observed an association between alox5 intronic snps (rs892690, rs2029253, and rs2115819) and change in fev 1 post-ltsi (zileuton) in a cohort of 577 asthma patients [103] . 
the rs2115819 snp was also a predictor of response to the ltra, montelukast, in a previous study of 252 asthma subjects using percentage change in fev 1 as the primary outcome [104] . in both cases, the gg versus ga or aa genotype had the greatest improvement in lung function (fev 1 ) post-drug [104] . the lack of association for the rs892690, and rs2029253 snps in the montelukast study may be due to the reduced power of this study compared to the zileuton study. the functional significance of these snps remains to be resolved, and it is important to note that they may not be the causative genetic change, and that polymorphisms inherited at the same time (i.e., in ld) may be of relevance. two alox5 snps (rs4987105 (synonymous thr120thr) and rs4986832 (5′ region)) were also associated with improvement in pef on treatment with montelukast in a 12-week study in 174 asthma patients. this further confirmed the relevance of alox5 snps and this key enzyme in leukotriene production [105] . additional support for the relevance of alox5 in asthma control is provided by a recent study of 270 children which found that the alox5 sp1 promoter polymorphism determined urinary lte 4 levels and was associated with reduced lung function and, potentially, an acq score in non-5 copy carriers [106] . in this study, there was an inverse relationship between urinary lte 4 levels and fev 1 . another important enzyme in the leukotriene pathway is leukotriene c 4 synthase (ltc4s), which specifically results in the generation of the first cysteinyl leukotriene, ltc 4 , by conjugating glutathione to lta 4 . there has been intense research into the role of ltc4s in asthma susceptibility because this enzyme, like 5-lo, is thought to be a key regulatory switch for cysteinyl leukotrienes, which are thought to play a more prominent role in asthma (e.g., via bronchoconstriction) than the dihydroxy ltb 4 . ltc4s is relatively small, spanning 2.51 kb on chromosome 5q35. 
in particular, a promoter polymorphism (a-444c, rs730012) has been intensely investigated for pharmacogenetic effects as it is thought to alter ltc4s levels via transcription (higher levels in c allele carriers) and therefore alter ltc 4 production [107] . of the ten published studies investigating the role of ltc4s-444 a>c, five showed an improvement in outcome measures, including fev 1 post-ltra therapy for carriers of the c allele, as hypothesized, although results were not always statistically significant. in a more recent study of zileuton responses (fev 1 change over time) in 577 asthma subjects, the ltc4s-444 polymorphism showed no effect; however, an alternative ltc4s polymorphism (rs272431, intron 1) was associated with improved response (mean fev 1 ) [103] . as these results were obtained in different populations and using different end points, the relative contribution of the ltc4s polymorphism to ltra or ltsi therapeutic responses in asthma is still unclear. larger prospective studies using multiple ethnic groups are required. montelukast and other ltras target the cysteinyl leukotriene receptor 1, which is expressed in a variety of cells, including airway smooth-muscle and various inflammatory cells (e.g., eosinophils), therefore inhibiting the activation of this receptor by cysteinyl leukotrienes (figure 28.7). cysteinyl leukotriene receptor 1 is thought to be the main receptor mediating cysteinyl leukotriene-induced smooth-muscle contraction and inflammatory cell cytokine production in asthma. again, genetic variation in the target receptor for these compounds may influence how effective they are in carriers of these alleles by regulating receptor function/expression. the cysteinyl leukotriene receptor 1 gene, cysltr1, is intronless and found on chromosome xq13-21; it generates a 337 amino acid g-protein-coupled receptor (gpcr). 
a second receptor for cysteinyl leukotrienes has also been described and, while not the target of ltras, may modulate the effects of cysteinyl leukotrienes in vivo. the gene for cysteinyl leukotriene receptor 2 is found on chromosome 13q14 and encodes for a 346 amino acid gpcr. interestingly, in combination with several alox5 snps, polymorphisms spanning cysltr1 have been identified as determinants of responses to montelukast, suggesting snp-snp interactions may be important [108] . however, direct evaluation of cysltr1 polymorphisms and responses to montelukast [104] or zileuton [102] have not identified a significant effect for them to date. studies of cysltr2 are limited; however, snps rs91227 and rs912278 (3′utr) have been associated with an improvement in morning pef following administration of montelukast for 12 weeks [105] . while the majority of studies have focused on candidate polymorphism analyses in genes directly involved in leukotriene production or activity, several studies have identified additional genes of relevance to ltsi and ltra response. in addition to its effects on vasodilation and bronchoconstriction, the prostaglandin d2 receptor (gene: ptgdr) is thought to regulate, at least in part, levels of leukotriene c 4 . in a study of 100 asthmatic children prescribed montelukast (5 mg/day), a modest effect of the ptgdr-4441t/c was observed. similarly, in a more extensive study of 169 snps in 26 candidate genes in asthma patients prescribed fluticasone, fluticasone propionate plus salmeterol or montelukast, a significant association between four snps in the cholinergic muscarinic receptor 2 gene (chrm2) and montelukast response (change in fev 1 ) over 16 weeks was observed [106] . chrm2 and chrm3 are thought to mediate airway tone via regulation of contractile/relaxation responses and targeting of chrm3 using antagonists; for example, tiotropium has shown clinical efficacy in multiple respiratory diseases. 
these data potentially suggest that the nature of the airway obstruction/tone determined by altered chrm2 expression and/or activity may influence responses to montelukast. while the majority of studies have investigated the association between gene polymorphisms and acute responses to asthma therapy, a recent study by mougey et al. showed that solute carrier organic anion transporter family, member 2b1 (slco2b1, alternative name: organic anion transported sb1; oatp2b1), was able to mediate montelukast permeability using a model cell system engineered to express the oatp2b1 protein [100]. subsequently, the same group also showed that a naturally occurring alteration in the protein structure of oatp2b1, that is, arg312gln (rs12422149), was associated with reduced morning plasma concentrations of montelukast following an evening dose during one month or six months of treatment in 80 asthma patients [100]. more specifically, ga (arg/gln) genotype carriers had ∼20% lower montelukast concentration than the gg (arg) group at one month, and ∼30% lower concentration at six months, and, of clinical relevance, a allele carriers did not demonstrate benefit from montelukast using a symptom-based score [100]. the same researchers have now replicated these findings on montelukast absorption [109]; however, another study failed to identify any effect of the arg312gln polymorphism on montelukast plasma concentrations, albeit in a different study design involving fewer subjects [101]. these studies highlight the potential importance of genetic factors influencing drug transporters that may be anticipated to affect pharmacokinetics and pharmacodynamics.
similarly, several snps in another transporter protein, multidrug resistance protein 1 (mrp1) (alternative name: atp binding cassette, subfamily c, member 1 (abcc1)), for example rs119774, have been associated with montelukast response (change in fev1 % predicted) [110] and zileuton response [103]. this association potentially confirms the pharmacogenetic significance of this mrp1 snp marker. this snp is intronic, and the underlying functional mechanism, including which genetic variant explains these effects, remains to be resolved. overall, there has been good progress in the pharmacogenetics of leukotriene modifier therapy in asthma, with snps in multiple leukotriene-synthesising enzymes (e.g., alox5) showing robust association with ltra and ltsi responses. multiple genes in the leukotriene synthesis pathway and/or receptors have polymorphisms that have been associated with asthma susceptibility, which implies that there may be a more leukotriene-driven form of asthma that is therefore more amenable to treatment with ltsis and ltras. of interest in this drug class, preliminary data suggest a significant contribution of snps in drug transporter genes, ultimately determining both the pharmacokinetics and the pharmacodynamics. these data set the scene for larger prospective studies to provide accurate effect sizes and determine clinical implications with greater confidence. corticosteroids are an important pharmacogenetic target in asthma. initial investigations into a pharmacogenetic approach for the corticosteroid element of asthma therapy focused on the glucocorticoid receptor gene (gr, alternative name: nuclear receptor subfamily 3, group c, member 1, nr3c1), which maps to chromosomal region 5q31, a region associated with multiple asthma phenotypes.
as with other genes discussed in this chapter, several polymorphisms have been described in the gr gene that have functional consequences, such as a val641asp polymorphism that has been shown to influence the binding affinity for dexamethasone. however, similar to most polymorphisms shown to have potential pharmacogenetic effects, several of these polymorphisms are rare and their functional significance is questionable [84]. despite this, the gr remains a tantalizing target for asthma pharmacogenetics, and it is surprising that there have not been further recent investigations into the role of gr polymorphisms in corticosteroid responses. gr is not the only gene to be associated with genetic-based variance in corticosteroid efficacy in asthma treatment. a number of other genes have been implicated, including crhr1, tbx21, nk2r, stip1, dusp1, and fcer2, and these are discussed below. snps in the corticotrophin-releasing hormone receptor 1 (crhr1) gene are associated with response to inhaled corticosteroid treatment, based on end point change in fev1 following 8 weeks of treatment in three asthmatic cohorts using a candidate gene approach (131 snps in 14 genes, n = 1117 asthma subjects). for example, the intronic crhr1 snp rs242941 exhibited genotype-specific changes in the percentage predicted change in fev1 in response to corticosteroid therapy in 470 adult asthma subjects [111]. this study was the first to show a pharmacogenetic effect for steroid efficacy in an asthmatic cohort, and it highlights the crhr1 gene as at least one of a number of factors determining corticosteroid efficacy. crhr1 is thought to influence responses to exogenously administered corticosteroid through the regulation of endogenous levels of corticosteroid.
the t box 21 (tbx21) gene has been shown to be a predictor of improvement in bronchial hyper-responsiveness (bhr) (4 year change) post-corticosteroid treatment in children, based on a genotype-dependent variation of the nonsynonymous snp rs2240017 (his33gln), where the presence of the g allele gave greatest improvements [112]. however, the association of this gene remains under question because of its low allele frequency in caucasians (maf ∼0.04), which resulted in only a few subjects (n = 5) being available to contribute to this observation [112]. additional data have given more confidence to the involvement of tbx21 because the his33gln polymorphism was significantly associated with improved asthma control in the presence of corticosteroids, although the alternative allele to that described was associated with greater control in 53 korean asthma patients during 5-12 weeks treatment [112]. interestingly, in the same study, ye et al. identified a novel gene, the neurokinin 2 receptor (nk2r), which was associated with improved asthma control in the presence of corticosteroids; the g allele (gly) of the nk2r snp rs77038916g/a (gly231glu) was associated with the greatest improvement. neurokinin a induces bronchoconstriction and inflammation; therefore, modulation of the neurokinin receptor may influence the magnitude of these responses. multiple snps in the stress-induced phosphoprotein 1 gene (stip1) were associated with variable fev1 responses to treatment with the inhaled corticosteroid flunisolide in 382 asthma subjects [103]. this study investigated snps spanning eight candidate genes and identified stip1 snps rs4980524, rs6591838, and rs2236647 as snps affecting percent change in fev1 in response to flunisolide at 4 weeks and 8 weeks [103]. stip1 codes for an adaptor protein that coordinates functions with hsp70 and may be involved in formation of the glucocorticosteroid receptor heterocomplex.
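the small number of tbx21 his33gln carriers noted above follows directly from the low allele frequency. as a rough illustration, the python sketch below estimates how many carriers of a minor allele a cohort would be expected to contain, assuming hardy-weinberg proportions (an assumption made here for illustration, not something stated in the cited studies). the function names and cohort sizes are hypothetical; only the maf of ∼0.04 comes from the text.

```python
def carrier_fraction(maf: float) -> float:
    """Fraction of individuals with at least one minor allele.

    Assumes Hardy-Weinberg proportions (illustrative assumption):
    non-carriers are homozygous for the major allele, frequency (1 - maf)^2.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie between 0 and 1")
    return 1.0 - (1.0 - maf) ** 2


def expected_carriers(maf: float, cohort_size: int) -> float:
    """Expected number of minor-allele carriers in a cohort of the given size."""
    if cohort_size < 0:
        raise ValueError("cohort_size must be non-negative")
    return carrier_fraction(maf) * cohort_size


if __name__ == "__main__":
    maf = 0.04  # approximate frequency quoted in the text for his33gln
    for n in (50, 100, 500):  # hypothetical cohort sizes
        print(f"n={n}: ~{expected_carriers(maf, n):.1f} expected carriers")
```

with a maf of 0.04, roughly 8% of subjects would be expected to carry at least one copy of the minor allele, so a cohort of around 100 subjects yields only a handful of carriers, which limits the statistical power of any genotype comparison.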
the dual-specificity phosphatase 1 (dusp1) gene encodes a protein that has dual specificity for tyrosine and threonine and inactivates p38 mitogen-activated protein kinase (mapk). recently, several snps including a 5′ region snp, rs881152, were associated with (1) bronchodilator responses and (2) asthma control in the presence of corticosteroid in a cohort of asthma patients (n = 430) [113]. the mechanism underlying these effects is unclear; however, it was suggested that corticosteroid induces dusp1 expression, which may be altered in carriers of the rs881152 5′-region snp. this influences the ability of dusp1 to target the p38 mapk signaling pathway. in a cohort of 311 asthmatic children, the low-affinity ige receptor gene (fcer2) has been highlighted as putatively involved in corticosteroid (budesonide) regulation of exacerbation by virtue of an association between the snp t2206c (rs28364072, intronic) and the relative risk of severe exacerbations while taking budesonide [114]. interestingly, the c allele was also associated with elevated ige levels in the 311 asthma patients [114]. importantly, this pharmacogenetic association has now been replicated in two other asthma cohorts (n = 386 and 939, respectively), with the 2206c allele being associated with increased hospital visits and uncontrolled asthma in patients receiving ics [115]. it is not surprising that this polymorphic variation in an ige receptor influences receptor expression, which alters ige levels and is associated with more severe forms of asthma. in turn, these subjects cannot be adequately controlled by corticosteroid treatment. the first gwas in asthma, published in 2012, identified snps in the glucocorticoid-induced transcript 1 gene (glcci1) as determinants of glucocorticoid response in 935 asthma subjects [116]. this study examined 534,290 snps using the human hap 550v3 beadchip to identify 100 snps of interest in an initial cohort of 403 parent-child trios.
the primary outcome was change in fev1. of interest was a snp in the glcci1 gene promoter region (rs37972c/t) that was associated with attenuation in fev1 improvement post-corticosteroid in three of four cohorts tested [116]. a subsequent gwas identified the t-gene as a novel player in lung function response (fev1) to inhaled corticosteroids. analyses were carried out in 418 caucasian asthmatics and replicated in a secondary asthmatic population (n = 407), where 3 out of 47 successfully replicated snps were associated under the same genetic model in the same direction, including 2 of the top 4 snps ranked by p value. these snps (rs3127412c/t and rs6456042a/c) were in strong ld with a variant located in the t-gene, suggesting that this gene is a novel player in the pharmacogenetic regulation of corticosteroid efficacy. this was confirmed by a follow-up association, where the t-gene variant was associated with lung function response to inhaled corticosteroids in the initial gwas analysis (an average 2-3-fold increase in fev1 response for homozygous wild-type subjects: rs3127412 tt and rs6456042 cc) [117]. not all recent developments have been due to gwas. for example, a recent hypothesis-driven study implicated the cytochrome p450 3a enzymes in glucocorticosteroid efficacy through modulations in corticosteroid metabolism. in this study, roberts et al. showed, through mrna-driven studies confirmed via microsomes and recombinant enzymes, that cyp3a4 and cyp3a5 metabolize corticosteroids (budesonide) into inactive metabolites [118]. they suggested that differences in the expression or function of these enzymes in the lung and/or liver could influence corticosteroid efficacy in the treatment of asthma. to date, the majority of pharmacogenetic studies have focused on one class of asthma therapy. however, several recent studies have investigated combination therapy, which more accurately reflects the clinical situation.
we have summarized the main findings regarding the influence of adrb2 polymorphisms, including arg16gly, on combination therapies, particularly fluticasone/salmeterol and budesonide/formoterol in section 28.5.5.1. however, it is clear that reports on associations between adrb2 arg16 and increased exacerbation in asthma patients using regular salbutamol in addition to their maintenance combination are mixed, with some studies failing to report the association, potentially because of differences in study design. more recently, additional genes have been investigated for snp associations with patient responses to combination therapy. in a study of 81 asthma patients regularly using an ics/laba combination, it was shown that polymorphisms in the nitric oxide synthase 3 (nos3) gene involving an amino acid change (g894t, glu298asp) were associated with post-drug improvement in lung function (change in fev1 (% pred)) [119]. nos3 codes for a key enzyme in the production of no in the airways, and feno has been shown to be elevated in asthma and is considered a marker of ongoing inflammation. importantly, earlier data identified that the glu298asp polymorphism is functional in nos3 and influences levels of feno, with the tt genotype associated with lower feno levels. these data suggest that lower nos3 activity and potentially lower feno levels identify patients more likely to respond to combination therapy; however, larger prospective trials are required to validate these findings because they are potentially counterintuitive if no is a marker of inflammation and ics target inflammation.
a recent 16-week study on the effect of genetic determinants (169 snps in 26 candidate genes) on asthma patient response to combined fluticasone propionate/salmeterol therapy identified three snps in the cholinergic muscarinic receptor 2 gene (chrm2) that are associated with asthma acq scores, and found a single snp, rs1461496 in heat shock 70kd protein 8 (hspa8), to be associated with change in fev 1 [106] . of interest, chrm2 snps were also associated with responses (lung function) to montelukast (see earlier), suggesting that the "tone" of the airways as determined by muscarinic receptors is important for therapies that target airway contraction. hspa8 is involved in protein folding in the cell, but it is important to note that hspa8 snps previously were not associated with response to ics [120] . drug development in asthma has been slow to generate new classes, and the major advances in patient care have come from new compounds and/or indications for existing drug classes (e.g., improved duration of action for labas). one potential explanation for this limited progress is the design of and recruitment to phase i and phase ii trials for the evaluation of new compounds. we have already outlined how initial trials of anti-il5 demonstrated disappointing effects on lung function and asthma symptoms. only after careful selection of patients, based on clinical and cellular parameters, were clinically relevant improvements observed. researchers now understand that asthma is heterogeneous and, particularly for strategies targeting specific mediators, careful selection of patients is necessary to fully understand the therapeutic potential of new drugs. recent approaches to the treatment of asthma [121] include the following: the specific targeting of single mediators is unlikely to provide a therapeutic option for asthma in its broadest definition. however, targeting specific subpopulations has shown clinical efficacy. 
very recently, a study investigating a human il13-neutralizing monoclonal antibody (tralokinumab) used a carefully balanced population of atopic and nonatopic asthmatics, with exclusions for additional respiratory pathology, cigarette smoking ≥10 pack-years, recent infection, or treatment with immunosuppressive medication, in a randomized double-blind study [122]. here, tralokinumab was associated with improved lung function based on an increase in fev1 and a decrease in daily β2-adrenergic receptor-agonist use [122]. these studies overall demonstrate that careful selection of asthma subjects, in this case based on phenotype, is critical in asthma drug development. as novel asthma genes are identified, it is paramount that their evaluation as potential drug targets, particularly in the context of phase ii trials, takes into account pharmacogenetic factors. preliminary data support genetic testing in phase ii trials of newer compounds, as shown in a recent report on an il4/il13 dual antagonist (pitrikinra). polymorphic variation in the target receptor for this antagonist (i.e., il4rα) significantly influenced outcomes in allergic asthma subjects [123]. pitrikinra is a recombinant form of il4 differing at two amino acid residues (i.e., a mutein, r121d/y124d). this has been shown to reduce late-phase antigen responses (lar) to inhaled antigen as defined by changes in lung function (fev1) over 4-10 hours postantigen exposure. in one study, subjects had followed a four-week period of twice daily active treatment of nebulized pitrikinra or placebo, with an increased lar ratio correlating with a reduced fev1 [124]. stratification of subjects (pitrikinra n = 15, placebo n = 14) into genotype groups for the nonsynonymous il4ra snp rs1801275 (gln576arg) identified arg/arg carriers as having an attenuated lar (p < 0.0001) following pitrikinra treatment compared to gln/gln or gln/arg genotypes.
similarly, stratification based on rs1805011 (glu400ala) showed an attenuated lar in the glu/ala group but not the glu/glu group [123]. interestingly, the arg576 variant (when in combination with val75) was shown to be a risk factor for allergic asthma and to lead to enhanced il4rα signaling post il4 stimulation [125]. these preliminary data suggest that selecting a subgroup of patients with a particular genotype when it is anticipated that the receptor/pathway may have a more dominant role in that individual's asthma is critical to interpreting phase ii clinical trials. approaches to further define asthma subphenotypes (e.g., using cluster analyses) have been useful and further confirm this heterogeneity in clinical presentation. while current medications have been extremely successful in asthma management (e.g., ics), it is clear that individual responses to these medications and to therapies in development are heterogeneous, with large variation in clinical benefits and/or detrimental effects. the stratification of asthma treatment using clinical and/or genetic approaches therefore shows great potential for maximizing clinical outcomes and minimizing adverse effects, leading to improvement in the management of asthma. there has been excellent progress in this area of research over the last five to ten years with the introduction of anti-ige (omalizumab) therapy into mainstay asthma treatment, but only for those patients with clinical indications (i.e., severe allergic asthma) with an elevated serum ige level. this stratification in prescribing for anti-ige shows proof-of-concept by targeting this expensive biologic to those patients most likely to show clinical benefit.
similarly, there is accumulating evidence that stratification of patients in phase ii trials of newer therapies (for example, anti-il5 (mepolizumab) and anti-il13 (lebrikizumab)), again based on clinical and cellular patient profiles, is essential to adequately evaluate these therapies and will be important if these therapies are introduced into clinical practice. clear progress in asthma pharmacogenetics has come with completion of newer studies that interrogate multiple snps in large prospective cohorts. to date, predominantly candidate gene/pathway approaches have been used, but several genetic variants have now been identified with confidence (e.g., alox5 and leukotriene modifier response; fcer2 and glucocorticoid response). importantly, these studies show independent replication, including specific snps and direction of effect (the gold standard in genetic studies). also importantly, several gwass have now been completed that provide integration of common variation spanning the entire genome (typically testing 500,000 snps), identifying novel gene variants in a hypothesis-free approach (e.g., the collagen (col22a1) locus and spats2l and response to albuterol/salbutamol). these more recent findings still need further replication and validation; however, they potentially pave the way for a "responder" and "nonresponder" profile based on multiple snps in multiple genes, all measured on the same platform. a key question is the relative contribution of common variation (measured on these platforms) and rare variation (not currently measured) to these outcomes. another remaining question relates to the magnitude of changes in clinical measures between genotype groups in asthma patients (e.g., fev1). data so far suggest that these differences can be considered clinically relevant; however, further work with larger populations is required to accurately estimate effect sizes.
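the need for replication of gwas findings is partly a consequence of the number of tests performed. a minimal sketch, assuming a simple bonferroni correction at a family-wise error rate of 0.05 (the cited studies may have used other criteria, and the function name is hypothetical), shows how stringent the per-snp significance threshold becomes when around 500,000 snps are tested:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha.

    Bonferroni correction: divide the overall alpha by the number of tests.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha / n_tests


if __name__ == "__main__":
    # 500,000 is the typical number of snps quoted in the text;
    # alpha = 0.05 is an assumed conventional error rate.
    threshold = bonferroni_threshold(0.05, 500_000)
    print(f"per-snp threshold: {threshold:.1e}")
```

under these assumptions a single snp must reach a p value of about 1 × 10^-7 to be declared significant, which is one reason why large cohorts and independent replication are emphasized.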
similarly, the genetic variation identified to date accounts only for a modest proportion of the overall variation of the trait (e.g., a recent estimate suggests the heritability of bronchodilator response at ∼28.5%). it is likely that studies so far have been confounded by factors including snp/haplotype analyses, gene-gene interactions, and the relative contribution of snps in different ethnic backgrounds. therefore, while our knowledge has dramatically increased, there is a need for prospective trials of these current compounds that involve large numbers of subjects and hypothesis-free approaches including gwas. as with gwas approaches to identifying disease susceptibility genes, "missing heritability" will be an important issue in pharmacogenetics studies. the small proportion of the heritability of corticosteroid response currently assigned to identified loci (e.g., crhr1 < 3%) needs to be addressed because asthma medications cannot be successfully personalized unless we resolve a larger degree of genetic variability and/or environmental factors. as with clinical/cellular approaches to stratifying phase ii trials of newer asthma therapies, it is clear that genetic stratification has a role to play. for example, the recent phase ii trials of the il4/il13 dual antagonist (pitrikinra) suggest patient selection is critical to evaluating the clinical efficacy of this compound, with specific il4rα genotypes identifying responder groups. more pharmacogenetic integration is required for phase ii trials, which represents a clear challenge for drug development pipelines. as our understanding of the genetic basis of asthma increases, other therapeutic targets for asthma will become apparent (e.g., il33 receptor antagonists). it is clear that the genetic variation associated with asthma susceptibility in these genes will influence pathway-targeting methods, as they essentially identify individuals in which the pathway may be particularly important.
it is also likely that existing therapies may be influenced by the genetic factors that underlie asthma susceptibility, particularly therapies that have multiple targets (e.g., ics). the impact of our genetic knowledge on asthma management and prescribing practice is limited at this time, as there is a need to further define the complex genetic basis of responders and adverse effects in large prospective studies. however, preliminary "real-life" studies with related outcome measures (e.g., days from school) have provided proof of concept that genetic information can have important implications for disease management (e.g., informing second-line therapy in asthma using adrb2 arg16) [98]. there remain many unanswered questions regarding the potential of stratified approaches in the management of asthma. while progress has been made in identifying patient subgroups based on clinical measures, cell counts, or marker expression, more work is required using very large studies to extensively characterize asthma patients not only cross-sectionally, but also longitudinally over many years to identify the natural progression of the disease. the field of genetics is rapidly progressing, with great technological developments moving at a dramatic pace, e.g., gwas comprising a million common snps or several million rare snps, gene expression studies, and whole-exome or targeted resequencing. these approaches will allow the interrogation of the effects of genetic variation in patient subgroups. it is likely that a drug-specific genetic profile involving several genes (e.g., receptors, signaling mediators, transcription factors) will be a step toward personalized medicine in asthma, with associated benefits including avoidance of adverse side effects and adequate control of the disease. we anticipate that through these approaches many novel common variants/genes will be identified as underlying responses to current asthma medication.
as the cost of genetic analyses continues to fall, the potential of a relatively simple test that will have an impact on asthma management costs becomes a real possibility. if genetic or clinical information is going to influence clinical practice, there must be a simple, reproducible diagnostic test based on snp combinations or simple clinical measures, with adequately high specificity and sensitivity that distinguishes responders and nonresponders. however, the introduction of routine genetic and/or clinical testing into practice will require a clear demonstration of health benefits and cost-effectiveness that outperforms the current stepwise approach to asthma management.
a plea to abandon asthma as a disease concept a brief history of asthma and its mechanisms to modern concepts of disease pathogenesis guidelines for the diagnosis and management of asthma. national asthma education and prevention program: expert panel report iii. nih publication no. 08-4051 british thoracic society and scottish intercollegiate guidelines network: british guideline on the management of asthma. 
a national clinical guideline economic burden of asthma: a systematic review asthma-related resource use and cost by gina classification of severity in three european countries inhaler competence in asthma: common errors, barriers to use and recommended solutions low-dose inhaled corticosteroids and the prevention of death from asthma global initiative for asthma (gina) and its objectives an official american thoracic society/european respiratory society statement: asthma control and exacerbations: standardizing endpoints for clinical asthma trials and clinical practice differences between asthma exacerbations and poor asthma control risk factors and costs associated with an asthma attack neutrophil degranulation and cell lysis is associated with clinical severity in virus-induced asthma viruses and bacteria in acute asthma exacerbations-a ga2len-dare* systematic review asthma as a risk factor for invasive pneumococcal disease increased frequency of detection of chlamydophila pneumoniae in asthma mycoplasma pneumoniae and asthma in children the effect of telithromycin in acute exacerbations of asthma disordered microbial communities in asthmatic airways etiology of asthma exacerbations thunderstorm outflows preceding epidemics of asthma during spring and summer the link between fungi and severe asthma: a summary of the evidence exposure to an aeroallergen as a possible precipitating factor in respiratory arrest in young patients with asthma asthma exacerbations 2: aetiology randomized controlled trial of oral antifungal treatment for severe asthma with fungal sensitization: the fungal asthma sensitization trial (fast) study a cross-sectional study of patterns of airway dysfunction, symptoms and morbidity in primary care asthma interpretative strategies for lung function tests severe asthma in childhood: assessed in 10 year olds in a birth cohort study exclusive viral wheeze and allergic wheeze: evidence for discrete phenotypes phenotypes of refractory/severe 
asthma. paediatr respir rev the effect of airway remodelling on airway hyperresponsiveness in asthma airway hyperresponsiveness in asthma: not just a matter of airway inflammation is asthma a nervous disease? the parker b. francis lectureship airway mucosal inflammation even in patients with newly diagnosed asthma airway remodelling structural and inflammatory changes in copd: a comparison with asthma effect of bronchoconstriction on airway remodeling in asthma human sensitisation the familial incidence of allergic disease genetic and environmental influence on asthma: a population-based study of 11,688 danish twin pairs the genetics of asthma and allergic disease: a 21st century perspective genome-wide association studies for discovery of genes involved in asthma association of the adam33 gene with asthma and bronchial hyperresponsiveness plaur polymorphisms are associated with asthma, plaur levels, and lung function decline evidence of a genetic contribution to lung function decline in asthma asthma genetics 2006: the long and winding road to gene discovery replication of genetic association studies in asthma and related phenotypes interleukin-13 in asthma pathogenesis elevated expression of messenger ribonucleic acid encoding il-13 in the bronchial mucosa of atopic and nonatopic subjects with asthma interleukin-13: central mediator of allergic asthma genetic variants regulating ormdl3 expression contribute to the risk of childhood asthma a large-scale, consortium-based genomewide association study of asthma genome-wide association study to identify genetic determinants of severe asthma resolution of allergic inflammation and airway hyperreactivity is dependent upon disruption of the t1/st2-il-33 pathway house dust mite allergen induces asthma via toll-like receptor 4 triggering of airway structural cells elevated soluble st2 protein levels in sera of patients with asthma with an acute exacerbation inhaled corticosteroids in lung diseases intrinsic asthma cluster 
research in the authors' laboratory is funded by asthma uk, the medical research council, and the british medical association. we thank dr emily hodge for originally generating figure 28.2.
key: cord-102935-cx3elpb8 authors: hassani-pak, keywan; singh, ajit; brandizi, marco; hearnshaw, joseph; amberkar, sandeep; phillips, andrew l.; doonan, john h.; rawlings, chris title: knetminer: a comprehensive approach for supporting evidence-based gene discovery and complex trait analysis across species date: 2020-04-24 journal: biorxiv doi: 10.1101/2020.04.02.017004 sha: doc_id: 102935 cord_uid: cx3elpb8 generating new ideas and scientific hypotheses is often the result of extensive literature and database reviews, overlaid with scientists’ own novel data and a creative process of making connections that were not made before. we have developed a comprehensive approach to guide this technically challenging data integration task and to make knowledge discovery and hypotheses generation easier for plant and crop researchers. knetminer can digest large volumes of scientific literature and biological research to find and visualise links between the genetic and biological properties of complex traits and diseases.
here we report the main design principles behind knetminer and provide use cases for mining public datasets to identify unknown links between traits such as grain colour and pre-harvest sprouting in triticum aestivum, as well as an evidence-based approach to identify candidate genes under an arabidopsis thaliana petal size qtl. we have developed knetminer knowledge graphs and applications for a range of species including plants, crops and pathogens. knetminer is the first open-source gene discovery platform that can leverage genome-scale knowledge graphs, generate evidence-based biological networks and be deployed for any species with a sequenced genome. knetminer is available at http://knetminer.org.

which is prone to information being overlooked and subjective biases being introduced. even when the task of gathering information is complete, it is demanding to assemble a coherent view of how each piece of evidence might come together to "tell a story" about the biology that can explain how multiple genes might be implicated in a complex trait or disease. new tools are needed to provide scientists with a more fine-grained and connected view of the scientific literature and databases, rather than the conventional information retrieval tools currently at their disposal.

scientists are not alone with these challenges. search systems form a core part of the duties of many professions. studies have highlighted the need for search systems that give confidence to the professional searcher and therefore trust, explainability, and accountability remain a significant

knetminer provides search term suggestions and real-time query feedback. from a search, a user is presented with the following views: gene view is a ranked list of candidate genes along with a summary of related evidence types. map view is a chromosome based display of qtl, gwas peaks and genes related to the search terms.
evidence view is a ranked list of query related evidence terms and enrichment scores along with linked genes. by selecting one or multiple elements in these three views, the user can get to the network view to explore a gene-centric or evidence-centric knowledge network related to their query and the subsequent selection.

(nilsson-ehle, 1914) and that the red pigmentation of wheat grain is controlled by r genes on the long arms of chromosomes 3a, 3b, and 3d (sears, 1944 figure 3a). this network is displayed in the network view which provides interactive features to hide or add specific evidence types from the network. nodes are displayed in a defined set of shapes, colors and sizes to distinguish different types of evidence. a shadow effect on nodes indicates that more information is available but has been hidden. the auto-generated network, however, is not yet telling a story that is specific to our traits of interest and is limited to evidence that is phenotypic in nature.

to further refine and extend the search for evidence that links tt2 to grain color and phs, we can provide additional keywords relevant to the traits of interest. seed germination and dormancy are the underlying developmental processes that activate or prevent pre-harvest sprouting in many grains and other seeds. the colour of the grain is known to be determined through accumulation of proanthocyanidin, an intermediate in the flavonoid pathway, found in the seed coat. these terms and phrases can be combined using boolean operators (and, or, not) and used in conjunction with a list of genes. thus, we search for traescs3d02g468400 (tt2) and the keywords: "seed germination" or "seed dormancy" or color or flavonoid or proanthocyanidin.
this time, knetminer filters the extracted tt2 knowledge network (823 nodes) down to a smaller subgraph of 68 nodes and 87 relations in which every path from tt2 to another node corresponds to a line of evidence to phenotype or molecular characteristics based on our keywords of interest (figure 3b).

overall the exploratory link analysis has generated a potential link between grain color and phs due to tt2-mft interaction and suggested a new hypothesis between two traits (phs and root hair density) that were not part of the initial investigation and previously thought to be unrelated. furthermore, it raises the possibility that tt2 mutants might lead to increased root hairs and to higher nutrient and water absorption, and therefore cause early germination of the grain. more data and experiments will be needed to address this hypothesis and close the knowledge gap.

biologists would generally agree to be informative when studying the function of a gene. searching a kg for such patterns is akin to searching for relevant sentences containing evidence that supports a particular point of view within a book. such evidence paths can be short e.g. gene a was knocked out and phenotype x was observed; or alternatively the evidence path can be longer, e.g. gene a in species x has an ortholog in species y, which was shown to regulate the expression of a disease related gene (with a link to the paper). in the first example, the relationship between gene and disease is directly evident and experimentally proven, while in the second example the relationship is indirect and less certain but still biologically meaningful. there are many evidence types that should be considered for evaluating the relevance of a gene to a trait. in a kg context, a gene is considered to be, for example, related to 'early flowering' if any of its biologically plausible graph patterns contain nodes related to 'early flowering'.
in this context, the word 'related' doesn't necessarily mean that the gene in question will have an effect on 'flowering

shown to a user; let alone if combining gcss for tens to hundreds of genes. there is therefore a need to filter and visualise the subset of information in the gcss that is most interesting to a specific user. however, the interestingness of information is subjective and will depend on the biological question or the hypothesis that needs to be tested. a scientist with an interest in disease biology is likely to be interested in links to publications, pathways, and annotations related to diseases, while someone studying the biological process of grain filling is likely more interested in links to physiological or anatomical traits. to reduce information overload and visualise the most interesting pieces of information, we have devised two strategies. 1) in the case of a combined gene and keyword search, we use the keywords as a filter to show only paths in the gcs that connect genes with keyword related nodes, i.e. nodes that contain the given keywords in one of their node properties. in the special case where too many publications remain even after keyword filtering, we select the most recent n publications (default n=20). nodes not matching the keyword are hidden but not removed from the gcs. 2) in the case of a simple gene query (without additional keywords), we initially show all paths between the gene and nodes of type phenotype/trait, i.e. any semantic motif that ends with a trait/phenotype, as this is considered the most important relationship to many knetminer users.

gene ranking

we have developed a simple and fast algorithm to rank genes and their gcs for their importance. we give every node in the kg a weight composed of three components, referred to as sdr, standing for the specificity to the gene, distance to the gene and relevance to the search terms.
specificity reflects how specific a node is to a gene in question. for example, a publication that is cited (linked) by hundreds of genes receives a smaller weight than a publication which is linked to one or two genes only. the specificity of a node x is defined in terms of n, the frequency of the node occurring in all N gcs. distance assumes that information associated more closely with a gene can generally be considered more certain than information that is further away; e.g. inference through homology and other interactions increases the uncertainty of annotation propagation. a short semantic motif is therefore given a stronger weight, whereas a long motif receives a weaker weight. thus, we define the second weight as the inverse shortest path distance of a gene g and a node x: $D(g, x) = 1 / d(g, x)$, with $d(g, x)$ the shortest path distance between g and x. both weights s and d are not influenced by the search terms and can therefore be pre-computed for every node in the kg. relevance reflects the relevance or importance of a node to user-provided search terms using the well-established measure of inverse document frequency (idf) and term frequency (tf) (salton & yang, 1973). the knetscore of a gene is defined as a sum over the nodes of its gcs. the sum considers only gcs nodes that contain the search terms. in the absence of search terms, we sum over all nodes of the gcs with r=1 for each node. the computation of the knetscore

biologists, such as tables and chromosome views, allowing them to explore the data, make choices as to which gene to view, or refine the query if needed. these initial views help users to reach a certain level of confidence with the selection of potential candidate genes. however, they do not tell the biological story that links candidate genes to traits and diseases.
in a second step, to enable the stories and their evidence to be investigated in full detail, the network view visualises highly complex information in a concise and connected format, helping facilitate biologically meaningful conclusions. consistent graphical symbols are used for representing evidence types throughout the different views, so that users develop a certain level of familiarity, before being exposed to networks with complex interactions and rich content.

scientists spend a considerable amount of time searching for new clues and ideas by synthesizing many different sources of information and using their expertise to generate hypotheses. knetminer is a user-friendly platform for biological knowledge discovery and exploratory data mining. it allows humans and machines to effectively connect the dots in life science data and literature, search the connected data in an innovative way, and then return the results in an accessible, explorable, yet concise format that can be easily interrogated to generate new insights.
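The gene ranking scheme described earlier (the sdr weights and the knetscore) can be illustrated with a minimal sketch. Several parts are assumptions rather than the authors' implementation: the text does not give the specificity formula, so an idf-like `log(N / n)` is assumed; the three weights are assumed to be combined as a product; relevance is passed in as a precomputed dictionary instead of being derived from tf and idf; and all names (`knet_score`, `gcs_by_gene`, the toy graph) are hypothetical.

```python
import math
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def specificity(node, gcs_by_gene):
    # ASSUMPTION: idf-like form log(N / n); the source text only says that
    # n is the frequency of the node across all N gene-centric subgraphs.
    total = len(gcs_by_gene)
    n = sum(1 for nodes in gcs_by_gene.values() if node in nodes)
    return math.log(total / n) if n else 0.0

def knet_score(gene, adjacency, gcs_by_gene, relevance=None):
    """Sum of S * D * R over the nodes of the gene's subgraph (product assumed).

    relevance=None mimics a query without search terms (R = 1 for every node);
    otherwise only nodes present in the relevance dict are summed.
    """
    dist = shortest_path_lengths(adjacency, gene)
    score = 0.0
    for node in gcs_by_gene[gene]:
        if node == gene or node not in dist:
            continue
        if relevance is None:
            r = 1.0
        elif node in relevance:
            r = relevance[node]
        else:
            continue  # node does not contain the search terms
        score += specificity(node, gcs_by_gene) * (1.0 / dist[node]) * r
    return score

# Hypothetical toy knowledge graph: three genes share one publication,
# and g1 additionally reaches trait2 through an ortholog.
adjacency = {
    "g1": ["pubA", "trait1", "orth1"],
    "g2": ["pubA"],
    "g3": ["pubA"],
    "pubA": ["g1", "g2", "g3"],
    "trait1": ["g1"],
    "orth1": ["g1", "trait2"],
    "trait2": ["orth1"],
}
gcs_by_gene = {
    "g1": {"pubA", "trait1", "orth1", "trait2"},
    "g2": {"pubA"},
    "g3": {"pubA"},
}
```

In this toy graph the shared publication contributes nothing to any gene, because under the assumed specificity a node linked to every gene has weight zero, while `trait2` counts half as much as `trait1` because it is two hops from `g1`.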
we
discovering protein drug targets using
the monarch initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species
a wheat homolog of mother of ft and tfl1 acts in the regulation of germination
zur kenntnis der mit der keimungsphysiologie des weizens in zusammenhang stehenden inneren faktoren
bioinformatics meets user-centred design: a perspective
meta-analysis of the heritability of human traits based on fifty years of twin studies
information retrieval in the workplace: a comparison of professional search practices
progress in biomedical knowledge discovery: a 25-year
on the specification of term values in automatic indexing
cytogenetic studies with polyploid species of wheat
knowledge graphs and knowledge networks: the story in brief
knetmaps: a biojs component to visualize biological knowledge networks
identification of loci governing eight agronomic traits using a gbs-gwas approach and validation by qtl mapping in soya bean
big data: astronomical or genomical?
sensitivity to "sunk costs" in mice, rats, and humans
iwgsc whole-genome assembly principal investigators whole-genome sequencing and assembly
shifting the limits in wheat research and breeding using a fully annotated reference genome
trend analysis of knowledge graphs for crop pest and diseases
mother of ft and tfl1 regulates seed germination through a negative feedback loop modulating aba signaling in arabidopsis
use of graph database for the integration
allelic variation and transcriptional isoforms of wheat tamyc1 gene regulating anthocyanin synthesis in pericarp
the authors declare that they have no competing interests.
key: cord-020101-5rib7pe8 authors: nan title: cumulative author index for 2008 date: 2008-11-17 journal: virus res doi: 10.1016/s0168-1702(08)00367-5 sha: doc_id: 20101 cord_uid: 5rib7pe8 nan dettori, g., see medici, m.c. (137) 163 devi, s., see osman, o.
(135) murine leukemia virus reverse transcriptase: structural comparison with hiv-1 reverse transcriptase the gprlqpy motif located at the carboxy-terminal of the spike protein induces antibodies that neutralize porcine epidemic diarrhea virus detection of ovine herpesvirus 2 major capsid gene transcripts as an indicator of virus replication in shedding sheep and clinically affected animals genetic characterization of equine influenza viruses isolated in italy between a new living cell-based assay system for monitoring genome-length hepatitis c virus rna replication unraveling the puzzle of human anellovirus infections by comparison with avian infections with the chicken anemia virus the contribution of feathers in the spread of chicken anemia virus cloning and subcellular localization of the phosphoprotein and nucleocapsid proteins of potato yellow dwarf virus, type species of the genus nucleorhabdovirus the p26 gene of the autographa californica nucleopolyhedrovirus: timing of transcription, and cellular localization and dimerization of product complete genomic sequence of turkey coronavirus recombinant l and p protein complex of rinderpest virus catalyses mrna synthesis in vitro molecular divergence of grapevine virus a (gva) variants associated with shiraz disease in south africa sequence analysis of a reovirus isolated from the winter moth operophtera brumata (lepidoptera: geometridae) and its parasitoid wasp phobocampe tempestiva (hymenoptera: ichneumonidae sars coronavirus replicase proteins in pathogenesis virus-induced gene silencing in medicago truncatula and lathyrus odorata evaluating the 3c-like protease activity of sars-coronavirus: recommendations for standardized assays for drug discovery hbx modulates iron regulatory protein 1-mediated iron metabolism via reactive oxygen species pathogenetic mechanisms of severe acute respiratory syndrome detection of a novel circovirus in mute swans (cygnus olor) by using nested broadspectrum pcr chimaeric hiv-1 
subtype c gag molecules with large in-frame c-terminal polypeptide fusions form virus-like particles cross-species recombination in the haemagglutinin gene of canine distemper virus sapovirus-like particles derived from polyprotein cauliflower mosaic virus gene vi product n-terminus contains regions involved in resistance-breakage, self-association and interactions with movement protein adenovirus vector induced innate immune responses: impact upon efficacy and toxicity in gene therapy and vaccine applications interfering with cellular signaling pathways enhances sensitization to combined sodium butyrate and gcv treatment in ebv-positive tumor cells evidence for recombination between pcv2a and pcv2b in the field retroviral reverse transcriptases (other than those of hiv-1 and murine leukemia virus): a comparison of their molecular and biochemical properties mitochondrial plasmids of sugar beet amplified via rolling circle method detected during curtovirus screening appearance of intratypic recombination of enterovirus 71 in taiwan from the circulation of subgenogroups b5 and c5 of enterovirus 71 in taiwan from in vitro replication of bamboo mosaic virus satellite characterization of the interaction of domain iii of the envelope protein of dengue virus with putative receptors from cho cells intrahost evolution of envelope glycoprotein and orfa sequences after experimental infection of cats with a molecular clone and a biological isolate of feline immunodeficiency virus the sars-coronavirus plnc domain of nsp3 as a replication/ transcription scaffolding protein limited compatibility between the rna polymerase components of influenza virus type a and b serotype-specificity of recombinant fusion proteins containing domain iii of dengue virus very virulent infectious bursal disease virus isolated from wild birds in korea: epidemiological implications genetic analysis and evaluation of the reassortment of influenza b viruses isolated in taiwan during the enhanced immune 
responses of mice inoculated recombinant adenoviruses expressing gp5 by fusion with gp3 and/or gp4 of prrs virus effect of antiviral treatment and host susceptibility on positive selection in hepatitis c virus novel hiv-1 reverse transcriptase inhibitors synthesis of recombinant human parainfluenza virus 1 and 3 nucleocapsid proteins in yeast saccharomyces cerevisiae evolutionary analyses of european h1n2 swine influenza a virus by placing timestamps on the multiple reassortment events hepatitis b pregenomic rna splicing-the products, the regulatory mechanisms and its biological significance human papillomavirus 16 e6, l1, l2 and e2 gene variants in cervical lesion progression pathology and hematology of the caribbean spiny lobster experimentally infected with panulirus argus virus tomato leaf curl virus satellite dna as a gene silencing vector activated by helper virus infection down-regulation of sclerotinia sclerotiorum gene expression in response to infection with sclerotinia sclerotiorum debilitation-associated rna virus presence of p1b and absence of hc-pro in squash vein yellowing virus suggests a general feature of the genus ipomovirus in the family potyviridae the truncated virus-like particles of c6/36 cell densovirus: implications for the assembly mechanism of brevidensovirus tubulovesicular structures are a consistent (and unexplained) finding in the brains of humans with prion diseases interferon antagonist function of japanese encephalitis virus ns4a and its interaction with dead-box rna helicase identification of novel viral interleukin-10 isoforms of human cytomegalovirus ad169 cross-reactive and serospecific epitopes of nucleocapsid proteins of three hantaviruses: prospects for new diagnostic tools genetic diversity of the vp1 gene of duck hepatitis virus type i (dhv-i) isolates from southeast china is related to isolate attenuation the major tegument structural protein vp22 targets areas of dispersed nucleolin and marginalized chromatin during 
productive herpes simplex virus 1 infection mutational events during the primary propagation and consecutive passages of hepatitis e virus strain je03-1760f in cell culture the n protein of tomato spotted wilt virus (tswv) is associated with the induction of programmed cell death (pcd) in capsicum chinense plants evolutionary relationships of virus species belonging to a distinct lineage within the ampelovirus genus a novel genomic constellation (g10p[3]) of group a rotavirus detected from buffalo calves in northern india phylogeny of lagos bat virus: challenges for lyssavirus taxonomy dc-sign enhances infection of cells with glycosylated west nile virus in vitro and virus replication in human dendritic cells induces production of different modulation of cellular transcription by adenovirus 5, Δe1/e3 adenovirus and helper-dependent vectors involvement of cytoskeleton in junín virus entry hiv-1 reverse transcriptase inhibitor resistance mutations and fitness: a view from the clinic and ex vivo genomic characterization of novel marine vesiviruses from steller sea lions restricted quasispecies variation following infection with the gb virus b molecular characterization of vp4, vp6 and vp7 genes of a rare g8p[14] rotavirus strain detected in an infant with gastroenteritis in italy retroviral reverse transcription mechanisms of resistance to nucleoside analogue inhibitors of hiv-1 reverse transcriptase truncation of cytoplasmic tail of eiav env increases the pathogenic necrosis characterization of russian rabies virus vaccine strain rv-97 bacteriophage preparation inhibition of reactive oxygen species generation by endotoxin inhibition of autographa californica nucleopolyhedrovirus (acnpv) polyhedrin gene expression by dnazyme knockout of its serine/threonine kinase (pk1) gene serine/threonine kinase (pk-1) is a component of autographa californica multiple nucleopolyhedrovirus (acmnpv) very late gene transcription complex and it phosphorylates a 102 kda polypeptide of 
the complex amino acid at position 95 of the matrix protein is a cytopathic determinant of rabies virus phosphorylation of the tgbp1 movement protein of potato virus x by a nicotiana tabacum ck2-like activity comparative genomics of serotype asia 1 foot-and-mouth disease virus isolates from india sampled over the last two decades low dna htlv-2 proviral load among women in são paulo city antiviral potentials of medicinal plants a molecular epidemiological study of rabies in puerto rico the latent membrane protein 1 (lmp1) encoded by epstein-barr virus induces expression of the putative oncogene bcl-3 sars coronavirus accessory proteins hepatitis b viruses: reverse transcription a different way increase in proto-oncogene mrna transcript levels in bovine lymphoid cells infected with a cytopathic type 2 bovine viral diarrhea virus plantibody-mediated inhibition of the potato leafroll virus p1 protein reduces virus accumulation interferon β1-a and selective anti-5ht2a receptor antagonists inhibit infection of human glial cells by jc virus the begomoviruses honeysuckle yellow vein mosaic virus and tobacco leaf curl japan virus with dnaβ satellites cause yellow dwarf disease of tomato genetic structure of a population of potato virus y inducing potato tuber necrotic ringspot disease in japan molecular characterisation and phylogenetic analysis of chronic bee paralysis virus inhibition of west nile virus replication in cells stably transfected with vector-based shrna expression system complete genome sequence analysis of dengue virus type 2 isolated in detecting molecular adaptation at individual codons in the glycoprotein gene of the geographically diversified infectious hematopoietic necrosis virus positive natural selection in the evolution of human metapneumovirus attachment glycoprotein sphingomyelin induces structural alteration in canine parvovirus capsid late steps of parvoviral infection induce changes in cell morphology molecular cloning and sequence analysis 
of the duck enteritis virus ul5 gene the interaction between kshv rta and cellular rbp-j and their subsequent dna binding are not sufficient for activation of modulation of hepatitis b virus replication by expression of polymerase-surface fusion protein through splicing: implications for viral persistence seroprevalence and genetic evolutions of swine influenza viruses under vaccination pressure in korean swine herds the cycle for a siphoviridae-like phage (vhs1) of vibrio harveyi is dependent on the physiological state of the host expression and biochemical characterization of nsp2 cysteine protease of chikungunya virus effect of sirna mediated suppression of signaling lymphocyte activation molecule on replication of peste des petits ruminants virus in vitro prevalence and molecular characterization of wu/ki polyomaviruses isolated from pediatric patients with respiratory disease in functional mapping of the porcine reproductive and respiratory syndrome virus capsid protein nuclear localization signal and its pathogenic association genetic variation of hepatitis c virus in a cohort of injection heroin users in wuhan identification of a conserved linear b-cell epitope at the n-terminus of the e2 glycoprotein of classical swine fever virus by phage-displayed random peptide library lópez-galíndez, an hiv-1 215v mutant shows increased phenotypic resistance to d4t hiv-1 p17 binds heparan sulfate proteoglycans to activated cd4+ t cells up-regulation of murid herpesvirus 4 orf50 by hypoxia: possible implication for virus reactivation from latency effective inhibition of japanese encephalitis virus replication by small interfering rnas targeting the ns5 gene size reversion of a truncated dnaβ associated with tobacco curly shoot virus f gene recombination between genotype ii and vii newcastle disease virus growth of tick-borne encephalitis virus (european subtype) in cell lines from vector and non-vector ticks dna recognition properties of the cell-to-cell movement protein 
(mp) of soybean isolate of mungbean yellow mosaic india virus emerging g9 rotavirus strains in the northwest of china implications of recombination for hiv diversity induction of apoptosis in vero cells by newcastle disease virus requires viral replication, de-novo protein synthesis and caspase activation subcellular localization of the triple gene block proteins encoded by a foveavirus infecting grapevines phylogenetic analysis of the ns5 gene of dengue viruses structural basis for drug resistance mechanisms for non-nucleoside inhibitors of hiv reverse transcriptase importance of cholesterol for infection of cells by transmissible gastroenteritis virus animal models and vaccines for sars-cov infection comparative full genome analysis revealed e1: a226v shift in oropouche virus entry into hela cells involves clathrin and requires endosomal acidification molecular evidence for polyphyletic origin of human immunodeficiency virus type 1 subtype c in transcriptomic analysis of responses to infectious salmon anemia virus infection in macrophage-like cells rnase h activity: structure, specificity, and function in reverse transcription a single nucleotide change in hop stunt viroid modulates citrus cachexia symptoms sequence analysis of mrna transcripts encoding jembrana disease virus tat-1 in vivo isolation of a type 3 vaccine-derived poliovirus (vdpv) from an iranian child with x-linked agammaglobulinemia a review of studies on animal reservoirs of the sars coronavirus genetic analysis of dengue 3 virus subtype iii 5′ and 3′ non-coding regions allopurinol, an inhibitor of purine catabolism, enhances susceptibility of tobacco to tobacco mosaic virus effects of acp26 on in vitro and in vivo productivity, pathogenesis and virulence of autographa californica multiple nucleopolyhedrovirus sars-cov replication and pathogenesis in an in vitro model of the human conducting airway epithelium rna transcription analysis and completion of the genome sequence of yellow head 
nidovirus mechanisms of inhibition of hiv replication by non-nucleoside reverse transcriptase inhibitors acute non-cytopathic bovine viral diarrhea virus infection induces pronounced type i interferon response in pregnant cows and fetuses extracellular vesicles containing virus-encoded membrane proteins are a byproduct of infection with modified vaccinia virus ankara analysis of jembrana disease virus mrna transcripts produced during acute infection demonstrates a complex transcription pattern complete sequence of the genome of avian paramyxovirus type 2 (strain yucaipa) and comparison with other paramyxoviruses cloning and sequencing of capsid protein of indian isolate of extra small virus from macrobrachium rosenbergii a member of a new genus in the potyviridae infects rubus molecular epidemiology of rabies in indonesia
key: cord-280897-el7bdkcf authors: wang, hai-fang; feng, liang; niu, deng-ke title: relationship between mrna stability and intron presence date: 2007-03-02 journal: biochemical and biophysical research communications doi: 10.1016/j.bbrc.2006.12.184 sha: doc_id: 280897 cord_uid: el7bdkcf abstract introns have been found to enhance almost every step of gene expression except mrna stability. by analyzing previously published genome-wide data on mrna stability, we found that human intron-containing genes have more stable mrnas than intronless genes, and that the arabidopsis thaliana genes with the most unstable mrnas have fewer introns than other genes in the genome. after controlling for mrna length, we found that mrna stability is still positively correlated with intron number in human intron-containing genes. in the yeast saccharomyces cerevisiae, however, two different datasets on mrna half-life gave conflicting results. the components of messenger ribonucleoprotein particles recruited during intron splicing may be retained in cytoplasmic mrnps and act as signals of mrna stability or simply as insulators that protect against mrna degradation.
introns are widespread noncoding sequences in eukaryotic genomes, but their costs and benefits to the host are still not established [1] [2] [3]. recent progress has revealed that splicing introns out of pre-mrnas can enhance almost every step of gene expression, from transcription to translation [4]. as mrna accumulation is determined by both synthesis and degradation, mrna stability is as important as transcription in regulating gene expression [5, 6]. do introns also increase the amount of protein produced from a gene by enhancing mrna stability? ryu and mertz [7] showed that mutants of the virus sv40 late transcript lacking introns are defective in mrna stability in the nucleus, but not in the cytoplasm. in maize, inclusion of the salt intron can stimulate cat gene expression to 10-18-fold higher than the intronless control gene, but the spliced mrnas are not more stable than those encoded by the intronless control gene [8]. nott et al. [9] found that human tpi intron 6 inserted into the renilla luciferase reporter gene can enhance mrna accumulation, but they did not observe any significant splicing-dependent alteration in mrna stability. chang et al. [10] found that insertion of a 138-bp intron into the sars-cov spike protein gene can enhance protein expression in mammalian cells, but the mrnas exhibited decay rates similar to those of the intronless control mrna. splicing was found to be essential for significant protein expression of the human β-globin gene. absence of introns results in inefficient 3′-end mrna processing, and the unprocessed β-globin mrna is substantially less stable than the 3′-end processed mrna [11]. in addition, the half-life of 3′-end processed β-globin mrnas encoded by the intron-containing gene was 21 ± 7 h, while that of the 3′-end processed β-globin mrnas encoded by the intronless gene was 15 ± 3 h [11].
but the authors [11] and another scientist who cited the paper [12] regarded this as a minor difference and did not conclude that introns can alter mrna stability. the accumulating data from genome sequencing and large-scale analyses of gene expression make it possible to re-examine or further test the conclusions of previous experimental studies on specific genes. a successful example is the survey of genes with one or more 3′-untranslated exons to test the rule for termination-codon position in nonsense-mediated mrna decay [13]. the relationship between mrna stability and intron presence, however, has not been studied extensively at the genome scale. gutierrez et al. [14] compared the 100 genes with the most unstable transcripts against genes encoding stable transcripts in arabidopsis thaliana and did not find significant differences in intron number. although most genes in the yeast saccharomyces cerevisiae are intronless, more than 70% of ribosomal protein genes have at least one intron [15]. five ribosomal protein genes were found to encode anomalously unstable mrnas (half-life < 10 min), and wang et al. [15] noticed that four of the five genes lack introns. the aberrant decay rates of these genes were attributed to specialized regulatory programs and distinct functions [15]. here, we collected genome-wide data on mrna stability (half-life or decay rate) in homo sapiens, s. cerevisiae, and a. thaliana from publication supplements and examined their relationships with intron presence and abundance.

the genome annotation files of h. sapiens (build 35 version 1), s. cerevisiae (updated february 6, 2006), and a. thaliana (updated november 16, 2005) were downloaded from the ncbi genome database (ftp://ftp.ncbi.nih.gov/genomes/). in the case of alternative splicing variants, we retained the longest mrna for analysis (similar results were obtained by analyzing the shortest mrna; data not shown).
the data on mrna stability were gathered from publication supplements, including the decay rates of human mrnas in the hepatocellular carcinoma cell line hepg2 and the primary fibroblast cell line bud8 [16], and mrna half-lives in human t lymphocytes stimulated with medium, with an anti-cd3 antibody, or with both anti-cd3 and anti-cd28 antibodies [17]. the mrnas with decay rate 60 h⁻¹ were excluded from our analyses. in assigning faster expressed genes and slower expressed genes and in surveying motifs that regulate mrna decay, we followed the methods and results of yang et al. [16]. we used gnf geneatlas version 2 [18] to determine human gene expression levels. the data on mrna half-life and gene expression levels of s. cerevisiae strain y262 [15] and s. cerevisiae strain rpb1-1 [19] were collected from publication supplements deposited by the authors at http://web.wi.mit.edu/young/pub/holstege.html and http://genome-www.stanford.edu/turnover/data.shtml, respectively. gutierrez et al. studied the mrna stability of a. thaliana [14], but they provided data only for the 100 genes with the most unstable transcripts (half-life ≤ 60 min). these 100 genes were therefore compared with the other genes annotated in the a. thaliana genome. in partial correlation analyses, the data were normalized by log10 transformation.

in human, the mrnas of intron-containing genes are more stable

in human hepg2 cells and bud8 cells, the mrnas of intron-containing genes have lower decay rates than those of intronless genes (table 1). in human t lymphocytes stimulated with medium and t lymphocytes stimulated with both anti-cd3 and anti-cd28 antibodies, the mrnas of intron-containing genes have longer half-lives than those of intronless genes (table 1).
in human t lymphocytes stimulated with anti-cd3 antibody, there is also a significant difference between the mrna half-lives of intron-containing genes and those of intronless genes (p = 0.045). judged by the mean mrna half-life, the mrnas of intronless genes are slightly more stable than those of intron-containing genes; judged by the median, the mrnas of intron-containing genes are more stable. because the nonparametric mann-whitney test computes its statistic from the ranks of the data rather than their raw values, the conclusion based on the median is the stronger one. that is, the mrnas of intron-containing genes have longer half-lives than those of intronless genes in human t lymphocytes stimulated with anti-cd3 antibody. as a stable mrna can be identified by a lower decay rate or a longer half-life, our analyses of different data sources consistently showed that the mrnas of intron-containing genes are more stable than those of intronless genes in human cells. we further tested the relationship between intron presence and human mrna stability while controlling for other biological characters, to see whether the above relationship is a byproduct of other relationships. some evidence suggests that short mrnas may be more stable [20, 21]. but the intronless genes we analyzed have significantly shorter mrnas than the intron-containing genes (supplementary table s1); if anything, their shorter length should make them more stable. the difference in mrna stability between intron-containing and intronless genes therefore cannot be attributed to the difference in mrna length. human intronless genes are not randomly distributed across molecular function categories [22]. meanwhile, previous studies in yeast and human showed a strong relationship between physiological function and mrna turnover rate [15, 16]. could the difference in mrna stability between intronless and intron-containing genes be attributed to functional differences?
according to the observed mrna decay rates, yang and coauthors [16] divided the gene ontology (go) categories into three groups: faster, slower, and not significant. we analyzed the distribution of intron-containing and intronless genes across the three go groups by chi-square test. there are no significant differences for the genes analyzed in bud8 cells or those analyzed in t cells (p > 0.10). for the genes analyzed in hepg2 cells, intronless genes appear to be enriched in the faster decay group (p = 0.046), but the p value is close to the conventional significance threshold of 0.05. furthermore, we designed a method to remove the effects of functional differences on mrna stability. for each human intronless gene, we selected an intron-containing gene with similar function and similar mrna length to pair with it. two genes were considered functionally similar if they share two or more go terms (http://www.geneontology.org/), and similar in mrna length if the length difference is below 30%. more stringent criteria yield far fewer gene pairs, giving too small a sample to study. where two or more intron-containing genes are paired with one intronless gene, we used the median mrna stability of the intron-containing genes in the pairwise comparison with the intronless gene. we found that the mrnas of intronless genes are less stable (i.e. have higher decay rates or shorter half-lives) than those of intron-containing genes with similar mrna lengths and functions (fig. 1). however, the differences are not significant in wilcoxon signed-rank tests of the data from t lymphocytes stimulated with anti-cd3 and from t lymphocytes stimulated with both anti-cd3 and anti-cd28 (fig. 1). some au-rich elements in the 3′-untranslated region of mrnas were found to decrease mrna stability [5, 16], but we found that they are not more abundant in the mrnas of intronless genes (supplementary table s2).
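the pairing procedure described above can be sketched as follows. this is a minimal illustration and not the authors' code: the gene records, the field names (`go`, `length`, `stability`), the toy values and the function `pair_genes` are all hypothetical, and the 30% length criterion is assumed to be measured relative to the intronless mrna, which the text does not specify.

```python
from statistics import median


def pair_genes(intronless, intron_containing, min_shared_go=2, max_len_diff=0.30):
    """Pair each intronless gene with matched intron-containing genes.

    Each gene is a dict with hypothetical keys: 'id', 'go' (set of GO terms),
    'length' (mRNA length) and 'stability' (half-life or decay rate).
    Returns (intronless id, its stability, median stability of its matches).
    """
    pairs = []
    for g in intronless:
        matches = [
            h for h in intron_containing
            if len(g["go"] & h["go"]) >= min_shared_go
            # Assumption: length difference is relative to the intronless mRNA.
            and abs(h["length"] - g["length"]) / g["length"] < max_len_diff
        ]
        if matches:
            # Several matches are collapsed to their median stability.
            pairs.append((g["id"], g["stability"],
                          median(h["stability"] for h in matches)))
    return pairs


# Hypothetical toy data (not from the paper).
toy_intronless = [
    {"id": "A", "go": {"t1", "t2", "t3"}, "length": 1000, "stability": 2.0},
]
toy_intron_containing = [
    {"id": "B", "go": {"t1", "t2"}, "length": 1200, "stability": 4.0},
    {"id": "C", "go": {"t2", "t3", "t9"}, "length": 900, "stability": 6.0},
    {"id": "D", "go": {"t1"}, "length": 1000, "stability": 50.0},        # one shared term only
    {"id": "E", "go": {"t1", "t2"}, "length": 1400, "stability": 50.0},  # length differs by 40%
]

pairs = pair_genes(toy_intronless, toy_intron_containing)
print(pairs)
```

the resulting list of paired stabilities is the kind of input a paired test such as the wilcoxon signed-rank test expects.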
on the contrary, they are slightly more abundant in the mrnas of intron-containing genes (supplementary table s2). the presence of introns thus seems to enhance mrna stability in human. a further question is whether multiple introns have a cumulative effect on mrna stability in intron-containing genes. after controlling for mrna length, we found significant negative correlations between intron number and mrna decay rate in human hepg2 and bud8 cells, and significant positive correlations between intron number and mrna half-life in human t lymphocytes (table 2). controlling for mrna length together with other potential factors such as gene expression level and gc content gave similar results (data not shown). the mrnas of intron-containing genes are more stable than those of intronless genes in yeast strain y262 (table 1). unexpectedly, the mrnas of intronless genes are more stable in yeast strain rpb1-1 (table 1), although the difference is very small. unlike in human, the mrnas of the yeast intronless genes we analyzed are significantly longer than those of intron-containing genes (supplementary table s1). we further compared mrna stability between intronless and intron-containing genes with similar mrna length and similar go annotations as above (because intron-containing genes are rare in yeast, we selected intronless genes with similar functions and mrna lengths to pair with each intron-containing gene). the mrnas of intron-containing genes are still more stable than those of intronless genes in yeast strain y262, but less stable in yeast strain rpb1-1 (fig. 1). controlling for mrna length, mrna stability and intron number are positively correlated in strain y262, but not in strain rpb1-1 (table 2). we re-examined the relationship between mrna stability and intron presence in a. thaliana. of the 100 genes reported to have the most unstable mrnas [14], 94 were found in the annotated genome of a. thaliana.
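the length-controlled correlations between intron number and mrna stability reported above can be illustrated with a first-order partial correlation on log10-transformed values. this is only a sketch under stated assumptions: the text does not say which correlation coefficient was used, so pearson's r is assumed here, and the function names and toy numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def log10_all(values):
    """Log10-transform positive values, as described for the normalization."""
    return [math.log10(v) for v in values]


# Hypothetical toy values (not from the paper).
intron_number = [1, 2, 4, 8, 16, 3]
half_life = [1.0, 1.5, 2.0, 4.0, 5.0, 2.5]
mrna_length = [800, 1500, 1200, 3000, 2500, 4000]

r = partial_corr(log10_all(intron_number), log10_all(half_life),
                 log10_all(mrna_length))
print(round(r, 3))
```

controlling for further covariates such as expression level or gc content would require a higher-order partial correlation, for example by correlating regression residuals.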
as the authors did not provide a list of genes with stable transcripts, we compared the 94 genes with all other 26448 genes annotated in the a. thaliana genome. we found that 36.2% of the 94 genes are intronless. by comparison, the percentage of intronless genes among the other 26448 genes is much lower (19.8%; pearson chi-square test, p = 7.3 × 10⁻⁵). in contrast to the previous study [14], we found a significant difference in intron number between the 94 genes with the most unstable transcripts and the other 26448 genes (table 3). furthermore, the difference in intron number between intron-containing genes with the most unstable transcripts and other intron-containing genes annotated in the a. thaliana genome is also significant (table 3). the data from yeast strain y262 [15] and from strain rpb1-1 [19] did not give consistent results, and we do not know which dataset is more reliable. considering the dates of publication (the dataset from strain y262 was published in 2002, that from strain rpb1-1 in 1998), the data from strain y262 [15] may be more accurate because of advances in analytical techniques. for that reason, we are inclined to think that introns and/or their splicing from pre-mrnas enhance mrna stability in the yeast s. cerevisiae. the data from human t lymphocytes gave weaker results than those from the hepatocellular carcinoma cell line hepg2 and the primary fibroblast cell line bud8. heat shock, hypoxia, and other stresses were reported to stabilize some mrnas [23, 24]; the stability of mrnas apparently varies with environmental or physiological changes. nonetheless, we can see a trend (although not a very distinct one) for the mrnas of intronless genes to be less stable. the enhancing effect seems not to be very strong, and so it may be overwhelmed by environmentally or physiologically induced changes. statistical analysis of genome-wide data can reveal small but significant differences.
by contrast, conventional experimental studies on specific genes can hardly reveal small differences. previous experiments [7] [8] [9] [10] [11] probably did not show the enhancing effect of introns on mrna stability because the effect is not very strong. in addition, a limited number of genes may not follow the general rule for specific reasons. how, then, can introns enhance mrna stability? the components of mrnps (messenger ribonucleoprotein particles) recruited during intron splicing or deposited onto exon-exon junctions [25, 26] may be retained in cytoplasmic mrnps and act as signals of mrna stability, or simply as insulators that prevent inter- or intra-rna base-pairing. some proteins have been shown to have such insulating effects, preventing rna:dna hybrids (i.e. r-loop structures) in the nucleus and thereby suppressing unwanted dna recombination [27] [28] [29] [30]. similarly, a gene with high intron density is expected to have better insulated mrnas in the cytoplasm. the mrnas of a human gene with no or few introns are more likely to form stem-loops, which stall ribosomes and trigger endonucleolytic mrna cleavage [31, 32]. these mrnas are also more likely to form double-stranded rna with other rnas in the cytoplasm, making them prone to degradation by rna interference [33]. in addition, there is evidence that the exon junction complexes of mrnps promote the association of mrnas with ribosomes [34]. the ribosomes attached to an mrna may stabilize it in a way similar to the protein components of mrnps.
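as a rough check on the a. thaliana comparison of intronless genes (36.2% of 94 genes versus 19.8% of 26448 genes), the pearson chi-square test for a 2 × 2 table can be reproduced with the standard library alone. the gene counts below are reconstructed by rounding the reported percentages, which is an assumption; the exact counts are not given in the text, so the result should only be expected to land near the reported p value.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Assumption: counts reconstructed from the reported percentages.
unstable_total, other_total = 94, 26448
unstable_intronless = round(0.362 * unstable_total)
other_intronless = round(0.198 * other_total)

chi2, p = chi_square_2x2(
    unstable_intronless, unstable_total - unstable_intronless,
    other_intronless, other_total - other_intronless,
)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

the closed-form tail probability is used here only to keep the sketch dependency-free; a statistics library would give the same quantity for one degree of freedom.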
[1] the biology of intron gain and loss
[2] the evolution of spliceosomal introns: patterns, puzzles and progress
[3] the rise and falls of introns
[4] how introns influence and enhance eukaryotic gene expression
[5] bringing the role of mrna decay in the control of gene expression into focus
[6] post-transcriptional control of gene expression: a genome-wide perspective
[7] simian virus 40 late transcripts lacking excisable intervening sequences are defective in both stability in the nucleus and transport to the cytoplasm
[8] intron-mediated enhancement of transgene expression in maize is a nuclear, gene-dependent process
[9] a quantitative analysis of intron effects on mammalian gene expression
[10] influence of intron and exon splicing enhancers on mammalian cell expression of a truncated spike protein of sars-cov and its implication for subunit vaccine development
[11] analysis of the stimulatory effect of splicing on mrna production and utilization in mammalian cells
[12] the effect of intron location on intron-mediated enhancement of gene expression in arabidopsis
[13] a rule for termination-codon position within intron-containing genes: when nonsense affects rna abundance
[14] identification of unstable transcripts in arabidopsis by cdna microarray analysis: rapid decay is associated with a group of touch- and specific clock-controlled genes
[15] precision and functional specificity in mrna decay
[16] decay rates of human mrnas: correlation with functional characteristics and sequence attributes
[17] genome-wide analysis of mrna decay in resting and activated primary human t lymphocytes
[18] a gene atlas of the mouse and human protein-encoding transcriptomes
[19] dissecting the regulatory circuitry of a eukaryotic genome
[20] the relationship between mrna stability and length in saccharomyces cerevisiae
[21] relationship between mrna stability and length: an old question with a new twist
[22] the non-random distribution of intronless human genes across molecular function categories
[23] molecular mechanisms regulating mrna stability: physiological and pathological significance
[24] induction of tsp1 gene expression by heat shock is mediated via an increase in mrna stability
[25] linking nuclear mrnp assembly and cytoplasmic destiny
[26] the exon junction complex is detected on cbp80-bound but not eif4e-bound mrna in mammalian cells: dynamics of mrnp remodeling
[27] cotranscriptionally formed dna:rna hybrids mediate transcription elongation impairment and transcription-associated recombination
[28] keeping rna and dna apart during transcription
[29] inactivation of the sr protein splicing factor asf/sf2 results in genomic instability
[30] mrna processing and genomic instability
[31] endonucleolytic cleavage of eukaryotic mrnas with stalls in translation elongation
[32] rna lost in translation
[33] post-transcriptional gene silencing by sirnas and mirnas
[34] splicing enhances translation in mammalian cells: an additional function of the exon junction complex

we thank jie guo and da-yong zhang for their suggestions. this work was supported by the national natural science foundation of china (grant no. 30270695) and beijing normal university.

key: cord-022178-4oh02tlr authors: markl, jürgen; sadava, david; hillis, david m.; heller, h. craig; hacker, sally d. title: evolution von genen und genomen date: 2018-10-12 journal: purves biologie doi: 10.1007/978-3-662-58172-8_23 sha: doc_id: 22178 cord_uid: 4oh02tlr the first world war ended in november 1918. the death toll of the four years of war, however, was soon surpassed by the victims of a massive influenza epidemic that killed more than 50 million people worldwide, more than twice as many as died in the battles of the first world war. the pandemic of 1918/1919 was remarkable in that the death rate among young adults, who are usually far less likely to die of influenza than children and the elderly, was 20 times higher than in earlier and later influenza epidemics.
why did this influenza virus prove so deadly precisely among the people who are normally the most resilient? the 1918 virus strain triggered an especially strong reaction in the human immune system. as a result of this overreaction, people with a robust immune system tended to be affected more severely. as a rule, we can rely on our immune system in the fight against viruses. the immune response is also the basis of vaccination. since 1945, dedicated vaccination programs against influenza viruses have helped to limit the number and severity of influenza cases. however, the vaccine of a given year is unlikely to work against the viruses of the following year. the reason: new strains of influenza virus evolve continually and generate genetic variability in the population. if they did not evolve, we could build up lasting resistance to them, which would make the annual influenza vaccination unnecessary.
because the viruses do evolve, pharmaceutical companies must develop a new, different influenza vaccine every year and supply it in sufficient quantity. the immune response of vertebrates is triggered when the immune system recognizes proteins on the surface of the virus. the virus can therefore escape the immune defense by changing its surface proteins. the more changes the surface proteins carry, the more likely the virus strains are to go unrecognized by the immune system; they can then infect their hosts and gain an advantage over other strains. biologists track how the surface proteins of influenza viruses change from year to year and thereby observe evolution in action. with this knowledge they can develop more effective vaccines. the study of rapidly evolving organisms has taught us a great deal about the molecular basis of evolution. the results of molecular evolutionary studies in turn find practical application, for example in developing better strategies to combat deadly diseases. answers to this question can be found in "experiment: why was the influenza pandemic of 1918/19 so severe?" in sect. 23.4 and in "faszination forschung" at the end of this chapter.

the genome of an organism or virus consists of the entirety of its genes together with all noncoding stretches of its genetic material. in eukaryotes, most genes lie on the chromosomes in the cell nucleus, but there are also genes in the mitochondria and chloroplasts. in sexually reproducing organisms, both male and female individuals pass on the genes of nuclear dna, whereas mitochondrial and chloroplast genes are as a rule transmitted only through the cytoplasm of the egg cell. using sequence alignments, biologists can detect nucleotide or amino acid substitutions that occur in individual organisms or species.
if only the number of nucleotide substitutions or amino acid replacements between sequences is counted, the true number of underlying changes is often underestimated. before genomes can be passed from parents to offspring, they must first be replicated. as you already know, however, dna replication is not error-free. errors in dna replication, that is, mutations, supply much of the raw material for evolutionary change. mutations are a basic prerequisite for the long-term survival of populations of organisms, because they are the ultimate source of the genetic variability that allows populations to evolve in response to changes in their environment. a particular allele of a gene can be passed on to subsequent generations only if an individual carrying that allele survives and reproduces. the allele must function in combination with numerous other genes of the genome; otherwise selection quickly eliminates it. in addition, the level and timing of a gene's expression are tightly regulated. for this reason, the genes of a single organism can be regarded as interacting members of a group among which labor is divided but which also depend strongly on one another. a genome is thus not simply an arbitrary collection of genes arranged on the chromosomes in random order. rather, it is a complex assembly of interacting genes, regulatory sequences, and structural elements. between them lie stretches of noncoding dna that probably have little direct function. the positions of the genes, like their order, are subject to evolutionary change; the same is true of the extent and location of the noncoding dna segments.
all of these changes can affect the phenotype of an organism. biologists have by now completely sequenced the genomes of a large number of organisms, including humans. the information contained in these sequences helps us to understand better how and why organisms differ, how they function, and how they have developed in the course of evolution. the field of molecular evolution studies the mechanisms and consequences of the evolution of macromolecules, in particular nucleic acids (dna and rna) and proteins.

[figure: amino acid sequence alignment; annotations: number of amino acids at this position; highly conserved]

the genomes of all organisms evolve over time. evolutionary changes can be detected by comparing the nucleotide and amino acid sequences of different species. by analyzing molecular evolution in laboratory experiments under controlled conditions, biologists can study many evolutionary processes directly. you should be able to . . . perform a sequence alignment and construct a matrix in which the similarities and differences between the sequences are compared.
show why the counted number of nucleotide or amino acid differences between two sequences often underestimates how many changes have actually taken place between them.

1. align the following sequences and then construct a distance matrix that compares the number of identical nucleotides and the number of differences (including insertion and deletion events).
2. explain why simply counting the differing nucleotides between two sequences often underestimates the true number of nucleotide substitutions since the sequences diverged. use a comparison of your alignment of sequences a and b from question 1 as an example.

as you have seen, molecular evolution is concerned with the evolution of genes and proteins, compares the nucleotide and amino acid sequences of different organisms, and reconstructs which changes took place during their phylogenetic history. next you will learn how genomes change and look more closely at some of the consequences of these changes. as you learned in sect. 12.2, a mutation is any change in the genetic material. one form of mutation that can become established in a population is the point mutation, the replacement of a single nucleotide. many such nucleotide substitutions in dna have no effect on a protein, even when the change occurs in a protein-coding gene, because most amino acids are specified by more than one codon (fig. 14.4). a substitution that does not lead to a different amino acid is called a synonymous, neutral, or silent substitution (fig. 23.4a).
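the distinction between substitutions that change the encoded amino acid and those that do not can be made concrete with a small lookup against the standard genetic code. this is an illustrative sketch rather than material from the chapter: codons are written as dna coding-strand triplets, and the names `CODON_TABLE` and `classify_substitution` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-nucleotide change at the given 0-based codon position."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


# CTT and CTC both encode leucine; CCT encodes proline.
print(classify_substitution("CTT", 2, "C"))
print(classify_substitution("CTT", 1, "C"))
```

most third-position changes come out as synonymous in this lookup, which reflects the statement that most amino acids are specified by more than one codon.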
synonymous substitutions do not affect the structure and function of a protein (although they can have other effects, as described in sect. 15.1) and are therefore probably less subject to natural selection than other kinds of substitution. neutral evolution differs from negative (purifying) and positive selection in that it does not affect the survival and reproduction of the organism concerned. the rate of fixation of neutral nucleotide substitutions within populations is independent of population size. by comparing the rates of synonymous and nonsynonymous substitutions, one can detect positive and negative selection in protein-coding genes. the genomes of organisms vary greatly in size, whereas the number of protein-coding genes is much less variable. the genetic code determines which codon encodes which amino acid (fig. 14.4). a nucleotide substitution that changes the amino acid sequence encoded by a gene is called a nonsynonymous substitution.

2. the differences in the amino acid sequences were each entered pairwise in a table. the amino acid changes were then mapped onto the evolutionary tree, and the number of convergent similarities between each pair of species was determined. the results can then be presented as a distance matrix. above the diagonal, the matrix shows the number of amino acid differences for each species pair; below it, the number of convergent similarities. the lysozyme sequences of the two species with a fermentation chamber for digestion account for the majority of the convergent similarities between the species pairs; molecular convergence thus accompanies the independent evolution of a fermentation chamber for digestion.

fig. 23.6 the number of genes per genome varies enormously. this figure shows the number of genes for a selection of organisms whose genomes have been completely sequenced, arranged according to their known evolutionary relationships. bacteria and archaea normally have fewer genes than most eukaryotes. among eukaryotes, multicellular organisms with organized tissues (plants and animals; the dark green and blue branches) have more genes than unicellular organisms (turquoise branches) or multicellular organisms without conspicuous tissue organization (yellow branches).

although the hoatzins and the mammals with a fermentation chamber have not shared a common ancestor in the last few hundred million years, they have evolved similar adaptations in their lysozyme that enable them to extract nutrients from their fermenting bacteria.

[figure: aligned lysozyme amino acid positions for baboon, human, rat, cattle, and horse, compared with the ancestral state]

it is well known that genome sizes differ considerably among organisms. across the major taxonomic categories, a certain correlation between genome size and organismal complexity can be seen. the genome of the tiny bacterium mycoplasma genitalium comprises only 470 genes. the genome of the bacterium rickettsia prowazekii, the agent of epidemic typhus, consists of 634 genes. homo sapiens, by contrast, has about 21,000 protein-coding genes. fig. 23.6 shows the number of genes for a selection of organisms whose genomes have been completely sequenced, arranged according to their known evolutionary relationships. as the figure shows, a larger genome does not imply greater phenotypic complexity.
(compare, for example, rice with the other plants.) it is not surprising that building and maintaining a large multicellular organism requires more, and more complex, genetic information than a small single-celled bacterium. what is surprising is that some organisms, such as lungfishes, some salamanders and lilies, have around 40 times as much dna in the nucleus as, for example, humans. a lungfish or a lily is of course not 40 times more complex than a human. why, then, does genome size vary so much? the differences in genome size are not so great if one considers only the fraction of the dna that actually encodes proteins or specifies the sequences of rnas other than mrnas. the organisms with the largest amounts of nuclear dna (some ferns and flowering plants) have 80,000 times as much dna as the bacteria with the smallest genomes, but no species has more than 100 times as many protein-coding genes as a bacterium. therefore most ...
as mentioned in the previous section, most multicellular organisms have far more genes than the majority of single-celled species. but multicellular organisms descended from single-celled ancestors. how did the number of genes in the genomes of multicellular organisms increase in the course of evolution? such an increase can come about mainly through two mechanisms: genes can be transferred from other species or duplicated within a species. genes can occasionally be exchanged between distantly related branches of the tree of life. the duplication of a gene provides opportunities for the evolution of new functions. some genes are present in the genome in multiple copies; these often evolve together over time. 
many gene duplications involve only one gene or a few genes, but in polyploid organisms (including many plants) entire genomes have been duplicated. duplication of all genes creates countless opportunities to evolve new functions. this is apparently what happened in the evolution of the vertebrates. the genome of the gnathostomata (jawed vertebrates, that is, all vertebrates with a movable lower jaw) appears to contain four diploid sets of many important genes. from this finding, biologists concluded that two duplications of the entire genome occurred in the ancestor of this group. these duplications allowed considerable specialization of individual vertebrate genes, whose expression today is highly tissue-specific.
fig. 23.9 a gene tree of the globin gene family. an analysis using the molecular clock method suggests that the α- and β-globin gene clusters (blue and green, respectively) diverged about 450 million years ago, that is, relatively soon after the origin of the vertebrates. the numbers give the estimated number of dna sequence changes along the respective branch of the tree. the α- and β-globin gene clusters arose through a duplication event.
although the members of the globin gene family have diversified in form and function, the members of many other gene families do not evolve independently of one another. for example, nearly all organisms have numerous copies (up to several thousand) of the genes for ribosomal rna. ribosomal rna (rrna) is the main structural element of ribosomes and as such plays an important role in protein synthesis. all living things must synthesize proteins, often in large amounts (especially during their early development). 
having many copies of the rrna genes ensures that organisms can rapidly produce many ribosomes and thus maintain a high rate of protein synthesis. the genes for ribosomal rna evolve like all other parts of the genome, so differences accumulate over time in the rrna genes of different species. within a species, however, the many copies of the rrna genes are very similar both structurally and functionally. this similarity makes sense, because ideally every ribosome of a species should synthesize proteins in the same way. in other words, within a species the many copies of these rrna genes evolve together. this phenomenon is called concerted evolution. www.life11e.com/a23.1 how does such concerted evolution come about? two different mechanisms evidently underlie it. one of them is unequal crossing over. during the replication of the dna of a diploid organism in the course of meiosis, the homologous chromosome pairs align with one another, and recombination occurs through crossing over (sect. 11.4). with highly repetitive genes such as the rrna genes, however, the copies easily become offset from one another when the chromosomes pair, because so many copies of the same gene are present on the chromosomes (fig. 23.10a). as a result, crossing over gives one of the two homologous chromosomes additional copies of the rrna gene, while the other chromosome receives correspondingly fewer copies. if a point mutation in the form of a base substitution occurs in one of the copies, unequal crossing over can increase its number step by step and thereby gradually fix it. conversely, unequal crossing over can also reduce its number step by step and eventually eliminate it. either way, the many copies of the gene remain very similar to one another. 
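the copy-number effect of unequal crossing over can be sketched in a few lines of python. this is only a minimal illustration under stated assumptions, not a model taken from the text: the function name `unequal_crossover`, the representation of a tandem array as a list, and the labels "A" (unchanged copy) and "M" (copy carrying a base substitution) are all hypothetical choices made for the example.

```python
def unequal_crossover(chrom_a, chrom_b, offset, point):
    """Recombine two tandem gene arrays that paired out of register.

    chrom_a, chrom_b: lists of gene-copy labels on the two paired chromosomes.
    offset: number of copies by which chrom_b is shifted relative to chrom_a
            (copy i of chrom_a pairs with copy i - offset of chrom_b).
    point:  crossover position, counted in copies along chrom_a.
    Returns (longer_product, shorter_product).
    """
    if not 0 <= offset <= point <= len(chrom_a):
        raise ValueError("need 0 <= offset <= point <= len(chrom_a)")
    if point - offset > len(chrom_b):
        raise ValueError("crossover point lies beyond the end of chrom_b")
    # Left part of A joined to the right part of B: gains `offset` copies.
    longer = chrom_a[:point] + chrom_b[point - offset:]
    # Left part of B joined to the right part of A: loses `offset` copies.
    shorter = chrom_b[:point - offset] + chrom_a[point:]
    return longer, shorter


# Hypothetical array of five copies; the third copy carries a substitution "M".
array = ["A", "A", "M", "A", "A"]
longer, shorter = unequal_crossover(array, list(array), offset=1, point=3)
print(longer)   # six copies, two of them "M"
print(shorter)  # four copies, none of them "M"
```

in this toy case a single misaligned exchange leaves one product with an extra copy of the variant and the other product with none, which is the stepwise increase or loss described above.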
the second mechanism that leads to concerted evolution is called biased gene conversion. this mechanism operates much faster than unequal crossing over and has proved to be the primary mechanism for the concerted evolution of rrna genes. breaks frequently occur in dna strands and are then repaired (sect. 13.4). the genes for ribosomal rna lie close together for most of the cell cycle. if one of the genes is damaged, a copy of the rrna gene on the homologous chromosome can serve as a template for repairing the damaged copy; in the process, the template sequence replaces the original sequence (fig. 23.10b). in many cases this repair is evidently rather one-sided, in that certain sequences are frequently used as templates. such a preferentially used sequence can thereby spread rapidly to all copies of the gene. in this way, a change that arises in a single copy can spread rapidly to all other copies. whichever of these mechanisms underlies concerted evolution in a particular case, the result is that the copies of a highly repetitive gene do not evolve independently of one another. mutations still occur, but when they arise in a single copy they either spread rapidly to all copies or are lost from the genome entirely. this process allows each copy to remain similar in sequence and function over time. horizontal gene transfer can also transfer particular gene functions between distantly related species. gene duplication can lead to the evolution of new functions. some highly repetitive genes undergo concerted evolution, which preserves their uniform functionality. you should be able to . . . 
describe how horizontal gene transfer can move genes between different lineages, particularly between bacteria. infer from a gene tree of a gene family when gene duplications took place in the evolutionary history of a group of species. explain how a duplicated gene provides opportunities for the evolution of new functions. diagram the two processes that underlie concerted evolution and point out their differences. ? ... of gene transfer for the organism that acquires new genes through this mechanism. the results of studies of molecular evolution have practical applications throughout biology, for instance in better understanding fundamental aspects of biological function or in research on human health. the evolutionary history of genes provides information about the function of proteins.
[figure: gene tree of the engrailed (en) genes. en2 clade: zebrafish en2b, zebrafish en2a, chicken en2, mouse en2, human en2; en1 clade: zebrafish en1b, zebrafish en1a, chicken en1, mouse en1, human en1; sea urchin en; lancelet en.] in vertebrates, a gene duplication gave rise to the two paralogous engrailed genes en1 and en2. further gene duplications occurred in the lineage leading to zebrafish.
instead of gene trees, many biologists, depending on the question, also work with protein trees, for example to analyse the relationships or the functional change of homologous proteins. as with genes, one distinguishes orthologous proteins (same function in different organisms) from paralogous proteins (different function in the same organism). thus the α and β subunits of human haemoglobin are paralogous (fig. 23.9), whereas the β subunits of human and mouse are orthologous. 
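the distinction between orthologous and paralogous proteins can be written down as a small rule. the sketch below is a simplification and rests on an assumption: each protein is labelled with its species and with the duplicate of the gene family it belongs to, so that the same duplicate in two species counts as orthologous and two different duplicates count as paralogous. the names `Protein` and `relationship` are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class Protein(NamedTuple):
    species: str
    family_copy: str  # which duplicate of the gene family, e.g. "alpha" or "beta"


def relationship(p1: Protein, p2: Protein) -> str:
    """Classify two homologous proteins of one gene family (simplified rule)."""
    if p1 == p2:
        return "identical"
    if p1.family_copy == p2.family_copy:
        # Same duplicate in different species: separated by speciation.
        return "orthologous"
    # Different duplicates: separated by a gene duplication.
    return "paralogous"


human_alpha = Protein("human", "alpha")
human_beta = Protein("human", "beta")
mouse_beta = Protein("mouse", "beta")

print(relationship(human_alpha, human_beta))  # paralogous
print(relationship(human_beta, mouse_beta))   # orthologous
```

in practice these labels are inferred from a gene tree rather than assigned by hand; the example only restates the haemoglobin case given above.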
earlier in this chapter you learned how biologists can identify gene regions that are subject to positive ... in the various duplicated sodium channel genes of the many pufferfish species, several different substitutions have occurred that led to resistance to ttx. however, numerous other changes that have nothing to do with the evolution of ttx resistance have also arisen in these genes. biologists who study the function of sodium channels can learn a great deal about how these channels work (and about neurological diseases caused by mutations in sodium channel genes) by finding out which changes were selected for ttx resistance. to do so, they compare the rates of synonymous and nonsynonymous substitutions in the genes of the various lineages in which resistance to ttx has evolved. in a similar way, the principles of molecular evolution are used to understand the function, and the diversification of function, of many other proteins. in studying the relationships between selection, evolution and function of macromolecules, biologists soon had the idea of exploiting molecular evolution in a controlled laboratory environment to produce novel macromolecules with useful properties. this was the birth of the applications of in vitro evolution ("evolution in the test tube"). they can then determine which of the currently circulating influenza strains show the largest number of changes in these positively selected codons. these influenza strains are the most likely to survive, multiply and cause future influenza epidemics, and are therefore the logical targets for new vaccines. 
this practical application of the principles of evolutionary theory leads to the development of more effective vaccines against influenza viruses, so that fewer people fall ill with, or even die of, influenza each year. food for thought: apply what you have learned. the genome of an organism is the totality of its genes, regulatory sequences and structural elements, including noncoding dna. the field of molecular evolution is concerned with the relationships between the structure of genes and proteins and the functioning of organisms. sequence alignment of nucleic acids or proteins from different organisms allows these to be compared and homologous positions to be identified. see fig. 23.1; activity 23.
key: cord-287396-18p171nr authors: schroyen, martine; tuggle, christopher k. title: current transcriptomics in pig immunity research date: 2014-11-15 journal: mamm genome doi: 10.1007/s00335-014-9549-4 sha: doc_id: 287396 cord_uid: 18p171nr swine performance in the face of disease challenge is becoming progressively more important. to improve the pig’s robustness and resilience against pathogens through selection, a better understanding of the genetic and epigenetic factors in the immune response is required. this review highlights results from the most recent transcriptome research, and the meta-analyses performed, in the context of pig immunity. a technological overview is given including whole genome microarrays, immune-specific arrays, small-scale high-throughput expression methods, high-density tiling arrays, and next generation sequencing (ngs). although whole genome microarray techniques will remain complementary to ngs for some time in domestic species, research will transition to sequencing-based methods due to cost-effectiveness and the extra information that such methods provide. 
furthermore, upcoming high-throughput epigenomic studies, which will add greatly to our knowledge concerning the impact of epigenetic modifications on pig immune response, are listed in this review. with emphasis on the insights obtained from transcriptomic analyses for porcine immunity, we also discuss the experimental design in pig immunity research and the value of the newly published porcine genome assembly in using the pig as a model for human immune response. we conclude by discussing the importance of establishing community standards to maximize the possibility of integrative computational analyses, such as was clearly beneficial for the human encode project. electronic supplementary material: the online version of this article (doi:10.1007/s00335-014-9549-4) contains supplementary material, which is available to authorized users. anticipating the need to efficiently feed an estimated 9 billion people, pig farms have significantly increased in size, which also unfortunately increases the risk of disease incidence. a better understanding of the immune system in swine is essential, because susceptibility to infectious diseases has great influence on pig performance (mellencamp et al. 2008; boddicker et al. 2012) . selection for improvement in economical traits such as feed conversion and prolificacy is already routine practice, and improvement of immune capacity through selection is gradually catching up (edfors-lilja et al. 1998; uddin et al. 2011; lu et al. 2012 ). in addition, pigs are an important biomedical model for humans as they share great similarity in anatomy, physiology, genetics, and genomics, including many genes in the immune system (dawson 2011; dawson et al. 2013) , which is very helpful when modeling human immune responses and diseases (lunney 2007; fairbairn et al. 2011; kapetanovic et al. 2012; meurens et al. 2012) . moreover, pigs have recently been genetically modified to serve as an improved model for human disease (suzuki et al. 2012) . 
for more genetically modified pig models for human diseases, we refer the reader to the reviews of ross and prather (2011) and walters et al. (2012). at the genomics and transcriptomics level, the pig is a very appealing model, since findings can be extrapolated more easily to humans owing to the high homology in gene sequence and chromosomal structure (groenen et al. 2012). the immune system components of the pig genome (immunome) have become the best-annotated genes in the current genome (dawson et al. 2013). this improved pig genome annotation will greatly aid the creation of biomedical models. when the cause of a medical condition in humans is found, the corresponding mutation(s) can be genetically engineered in pigs to mimic the disease, and (novel) drug therapies can be tested (walters et al. 2012). differences in genotype associated with a different immune response phenotype in pigs can easily be examined with the porcine 60k high-density snp chip (boddicker et al. 2012; lu et al. 2012; wang et al. 2012a). the last decade witnessed a steep increase in the amount of transcriptomic pig immune response data generated using whole porcine genome microarrays. meanwhile, custom-made arrays enriched in annotated genes of the pig immune system were used to address specific research questions (gao et al. 2010). in addition, novel techniques are surfacing in studies focusing on the porcine immune response transcriptome. high-density oligonucleotide tiling arrays and next generation sequencing (ngs) have further broadened the knowledge of the pig's immunome (mockler et al. 2005; morozova and marra 2008). these techniques are able to identify and quantify known and unknown transcripts and can be used to extend recent efforts made on the pig genome annotation (gao et al. 2012). the understanding of the function of diverse mirnas will profit immensely from ngs, and papers on the porcine mirnaome are being published (sharbati et al. 2010), as surely soon will papers on epigenetic changes such as dna methylation or histone modifications. several research groups have graciously provided information for our review on their recently published and unpublished studies, ranging from whole genome microarrays to rna-seq, mirna-seq, rip-chip, and some of those methylation and histone modification studies (table 1). the aim of this review is to summarize the use and results of recent transcriptomic research in pig immunity, identify current needs in the field, and anticipate future areas of progress. an overview will be given of studies using whole genome microarrays and immune-specific arrays, small-scale high-throughput expression methods, high-density tiling arrays, and ngs (table 2; fig. 1). experimental design concerns and studies using the newest sequencing techniques and epigenetic tools together with a systems biology approach to interpret the data will also be discussed. for this review, we searched the entrez pubmed database, using keyword search terms such as pig transcriptomics, pig immunity, microarray, immune-specific array, high-throughput qpcr, rna-seq and mirna-seq, and systems biology. in addition, references from the articles obtained by this method were checked for additional relevant material. the arrayexpress database from ebi was consulted to create supplementary table 1, using sus scrofa as organism and filtering on 'array assay' as technology.
whole genome microarray studies, the start of global high-throughput transcriptomics
the range of genes assayed with a whole genome microarray depends on the array used (table 2; fig. 1). the first porcine whole genome array available, the qiagen-nrsp8 array, was the result of a collaboration between qiagen-operon and the usda-nrsp-8 swine genome community (zhao et al. 2005). 
until then, human arrays were used to tackle porcine transcriptomic issues, or swine arrays were developed for a specific research question, as described in the next section. the pigoligoarray was a second-generation porcine 70-mer oligonucleotide array (steibel et al. 2009). currently, the best known are commercially available microarrays such as the porcine gene expression microarray of agilent and the genechip porcine genome array of affymetrix (tuggle et al. 2010; huang et al. 2011; zhou et al. 2011; bao et al. 2012). freeman and colleagues recently introduced the affymetrix snowball microarray, which is more comprehensive than the first affymetrix chip in coverage both for transcript structure and for the transcriptome as a whole, containing more than twice as many probes (freeman et al. 2012). details and examples of these whole genome arrays are shown in table 2. an overview of pig immunity experiments submitted to the arrayexpress database is given in supplementary table 1. since global porcine microarray studies have been extensively described earlier (... 2010), we will limit this section to highlighting the importance of using microarrays in a systems biology approach; we do provide a comprehensive table of all porcine immune-related microarray experiments submitted to arrayexpress (supplementary table 1). several microarray studies have been conducted to study host response to porcine reproductive and respiratory syndrome (prrs) using commercial or more specific pig annotated microarrays (bates et al. 2008; genini et al. 2008a; ait-ali et al. 2011; wysocki et al. 2012). badaoui et al. (2013) recently illustrated how the information from multiple prrs studies could be used simultaneously to gain insight into host response to prrsv challenges. they collected all publicly available microarray data covering multiple porcine immunology studies and including many different breeds, tissues, pathogens, and array platforms. 
the data of 779 general immune response arrays were assembled, and separate meta-analyses for differential expression were performed using these 779 arrays as well as a subset of 279 arrays specifically from prrs experiments (badaoui et al. 2013). to find prrs-specific expression responses, they eliminated differentially expressed genes common to these two meta-analyses. other meta-analysis studies, examining the immune response to salmonella typhimurium, combine microarray information with data such as serum cytokine measurements or microbiota differences. the results of these meta-analyses performed on prrs or s. typhimurium will be discussed in the section "overall value of transcriptomics in important infectious swine diseases." in addition, whole genome microarrays were used to study pig response to haemophilus parasuis infection by zhao et al. (2013). in one of their earlier studies, a genome-wide affymetrix array experiment revealed 931 differentially expressed genes between 3 non-infected and 3 h. parasuis-infected pigs in spleen at 7 dpi. among them were retn, s100a8, s100a9, and s100a12, all important innate immunity genes marking inflammation. chen et al. (2011b) reexamined the raw data of this first experiment and used a more robust genechip rma (gcrma) normalization (wu and irizarry 2004) and an improved annotation using anexdb (couture et al. 2009). a network analysis revealed that the ps100a8/ps100a9-casp3-slc1a2 pathway played an important role in h. parasuis infection, and that cebpb may act as a transcription factor of the two s100 family members (chen et al. 2011b). later, zhao and co-workers described a systems biology approach starting from the same affymetrix data. with the use of the kegg and reactome databases, 1,999 transcripts from the affymetrix chip were flagged as immunogenes. with this reduced dataset, exploratory analyses such as a principal component analysis (pca) and a gene set enrichment analysis (gsea) were conducted. 
with this pca/gsea method, they arrived at a core set of 16 differentially expressed genes indicative of an h. parasuis infection (zhao et al. 2013). to construct the immunologically important topology involved in an h. parasuis infection, the c3net algorithm was used (altay and emmert-streib 2010). several networks were created, and the largest network had the complement gene c1r as hub; although c1r was not differentially expressed on the microarray, it was predicted to play a crucial role in the immune defense. with this immunome-focused method, more subtle differences could be pulled out, and much new information was obtained as compared to the earlier whole genome array analysis (zhao et al. 2013).
the immune-specific array, a focused microarray
besides commercially available or custom-made transcriptome-wide microarrays, or analyses restricted to the immunogenes on those arrays, arrays focusing only on the genes involved in the porcine immune response are also used (table 2). in the past, several of these arrays were constructed to specifically focus on key immune genes (ledger et al. 2004), to explore gene expression in a specific immune tissue (dvorak et al. 2006; machado et al. 2005; niewold et al. 2005), or to examine host interactions with a specific pathogen (zhang et al. 2006; skovgaard et al. 2010). more information on these studies is given in table 2. next to these small-scale arrays, other immune-specific arrays were made by taking an existing whole genome array and adding probes for immune-related genes. in a study by flori and co-workers, the qiagen-nrsp8 array was extended by adding probes for pseudorabies virus rnas as well as probes for transcripts from the porcine major histocompatibility complex, also called the swine leukocyte antigen (sla) complex; this array is referred to as the qiagen + sla/prv in fig. 1. this array was the first to simultaneously examine viral transcripts and porcine immune transcripts (flori et al. 2008). 
gao and colleagues used the nrsp8-13k chip enriched with sla genes and immunity genes outside the sla complex and called it the sla-ri/nrsp8-13k chip (gao et al. 2010). in the future, when using microarrays to examine gene expression patterns, whole genome arrays or the extended versions of them are clearly preferred. when the objective of the study is to examine only a relatively small number of genes, other techniques such as the fluidigm digital array or nanostring (described in more detail below) can be used. these low-cost, user-friendly, and rapid approaches make the small-scale in-house printed arrays redundant.
upcoming sub-genomic-scale, high-throughput methods: fluidigm and nanostring
even though qrt-pcr is often used as the routine method to confirm microarray and rna-seq findings, it has been used in many pig immune studies as a primary means to measure expression levels of a dozen or more immune genes (borca et al. 2008; lastra et al. 2009; islam et al. 2012a, b; uddin et al. 2012; martins et al. 2013a, b; uddin et al. 2013), together with suitable housekeeping genes for normalization. although it is a very accurate and sensitive method, standard qrt-pcr has the major drawback that only a few genes in a few samples can be examined at once as compared to high-throughput methods (fig. 1). further, significant amounts of money and time are spent controlling sample and assay variation by using appropriate (and preferably multiple) housekeeping genes and validating pcr efficiencies or correcting for them by using standard curves. the use of high-throughput qrt-pcr methods, such as the fluidigm digital array, can help overcome these issues. the fluidigm digital array integrated fluidic circuit (spurgeon et al. 2008) is an example of a nanofluidic chip through which up to 9,216 qrt-pcr reactions can be performed at once, which represents 96 genes tested on 96 samples (ramakrishnan et al. 2013) (fig. 1). 
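the arithmetic behind the chip capacity, and the kind of housekeeping-gene normalization mentioned above, can be sketched as follows. this is a minimal illustration, not the procedure of any cited study: it assumes the common delta-delta-ct approach with a default amplification efficiency of 2 (perfect doubling per cycle), and the function names and all ct values are hypothetical.

```python
N_ASSAYS = 96    # genes per chip, as stated in the text
N_SAMPLES = 96   # samples per chip, as stated in the text
reactions_per_chip = N_ASSAYS * N_SAMPLES  # 96 x 96 reactions


def delta_ct(target_ct, housekeeping_cts):
    """Ct of a target gene relative to the mean Ct of several housekeeping genes.

    Averaging Ct values corresponds to a geometric mean of the
    underlying reference-gene quantities.
    """
    reference = sum(housekeeping_cts) / len(housekeeping_cts)
    return target_ct - reference


def fold_change(dct_treated, dct_control, efficiency=2.0):
    """Relative expression, treated versus control (delta-delta-Ct).

    efficiency=2.0 assumes perfect doubling per cycle; a measured
    efficiency from a standard curve could be passed instead.
    """
    return efficiency ** -(dct_treated - dct_control)


# Hypothetical Ct values for one immune gene and two housekeeping genes.
dct_infected = delta_ct(22.0, [18.0, 20.0])  # 22 - 19 = 3
dct_control = delta_ct(25.0, [18.0, 20.0])   # 25 - 19 = 6
fold = fold_change(dct_infected, dct_control)

print(reactions_per_chip)
print(fold)
```

with these made-up values the target gene is expressed about eight times higher in the infected sample than in the control.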
beyond sheer speed of analysis, an important advantage of nanotechnology is that a much smaller amount of rna per sample is necessary to perform 96 assays. in livestock, the first fluidigm experiments have been published only recently, as a stand-alone expression study or as a validation of a microarray/rna-seq experiment (robic et al. 2011; skovgaard et al. 2013; sorg et al. 2013; pilcher et al. 2014). in an h1n2 influenza virus study, skovgaard and coworkers used fluidigm to examine the expression of several pattern recognition receptors, ifn and ifn-induced genes, cytokines, and acute response protein genes at 24 hpi, 72 hpi, and 14 dpi. they expanded their study by including data on differentially expressed mirnas and mrnas in the lungs and saw several mirnas potentially targeting the differentially expressed mrnas (skovgaard et al. 2013). another high-throughput method uses the nanostring ncounter system, in which color-coded probes are counted to digitally measure gene expression (fig. 1). this non-amplification method can be seen as a small-scale, hybridization-based platform. per gene of interest, two sequence-specific probes are needed: a capture probe that has an affinity tag such as biotin to capture the gene on a surface, and a reporter probe with a multiple-fluorescent tag code that acts as a unique detector (geiss et al. 2008). its strength lies in the non-enzymatic approach, with no reverse transcriptase and no pcr amplification. this approach essentially eliminates primer-primer interactions, and up to 800 transcripts can be measured at a time. the technique has not been described in livestock species thus far. however, because it can use degraded rna samples, such as rna from formalin-fixed paraffin-embedded tissue samples, this method could become a very useful tool for measuring expression. 
high-density oligonucleotide tiling array for gene expression and genome annotation
high-density oligonucleotide tiling arrays have diverse genomic applications, such as transcriptome mapping and quantification. in general, tiling arrays are similar to microarrays as they both utilize oligonucleotide probes. however, through utilization of large numbers of probes, tiling arrays can span large genomic sequences without being restricted to annotated sequences (gao et al. 2012). thus, tiling arrays may contain probes covering the entire genome in an unbiased manner (liu 2007), and such partially overlapping probes can be used to find expressed genes, previously annotated or not. gao and co-workers used a nimblegen tiling array (386,620 probes) for the analysis of the transcription map of the sla complex (gao et al. 2012). ninety-seven genes were found to be differentially expressed between unstimulated peripheral blood mononuclear cells (pbmcs) and pbmcs stimulated with phorbol myristate acetate (pma)/ionomycin. these results are in nearly complete agreement with their previous experiment using the sla-ri/nrsp8-13k chip. furthermore, with the tiling array, they were able to confirm and refine the previous annotation of the sla genes (gao et al. 2012).
transcriptome sequencing
rna-seq is a recently developed method enabling the examination of the complete set of transcripts in a cell using a sequencing-by-synthesis approach. with this technique, mrna is converted into a library of cdna fragments, and adaptors are ligated at one or both ends of these fragments. each fragment is read in a highly parallel manner; typical read lengths are between 30 and 700 bp, and go up to an average of 4,000 bp and more with the pacbio rs sequencer (ferrarini et al. 2013). the resulting reads are aligned to a reference genome of the organism examined, and a genome-wide transcription map can be made describing the structure and enumerating the expression level of each gene. 
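the step from aligned reads to an expression level per gene can be illustrated with a toy counting routine. this is only a sketch under simplifying assumptions (one chromosome, half-open coordinates, a read counted for every gene it overlaps by at least one base, counts-per-million as the only normalization); the gene names, coordinates and function names are hypothetical and do not represent any published pipeline.

```python
def count_reads(genes, reads):
    """Count aligned reads per gene.

    genes: dict mapping gene name -> (start, end), half-open interval.
    reads: list of (start, end) alignment intervals on the same chromosome.
    A read is counted for each gene it overlaps by at least one base.
    """
    counts = {name: 0 for name in genes}
    for r_start, r_end in reads:
        for name, (g_start, g_end) in genes.items():
            if r_start < g_end and g_start < r_end:
                counts[name] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts by the total number of assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c * 1_000_000 / total for name, c in counts.items()}


# Hypothetical annotation and alignments.
genes = {"geneA": (100, 500), "geneB": (800, 1200)}
reads = [(120, 170), (450, 520), (900, 950), (600, 650)]

counts = count_reads(genes, reads)
cpm = counts_per_million(counts)
print(counts)
print(cpm)
```

the last read falls between the two annotated genes and is not counted; reads of this kind are what allow rna-seq to point to transcripts missing from the annotation.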
de novo assembly of a transcriptome is also possible if the reference genome is not available or poorly annotated (martin and wang 2011) ; integrated approaches are also possible. rna-seq has already proven to be a cost-effective way to investigate the sequence of all mrna transcripts in a specific tissue and/or for a specific physiological condition. not only is the sequence obtained together with the quantitative level of transcripts, but also splice variants can be detected, which is a major advantage in contrast with techniques such as microarrays. moreover, with rna-seq, there is very low, if any, background noise, and no transcript specific primers or probes are required, so no prior knowledge is needed ). this latter attribute is important when working with livestock species for which genomic sequence assemblies are incomplete (bauersachs and wolf 2012). as such, esteve-codina and colleagues reported hundreds of un-annotated protein coding genes in the porcine genome while studying the gonad transcriptome (esteve-codina et al. 2011 ). however, projects costs are influenced by the research question asked. if the goal is to examine general expression pattern differences in well-annotated, moderately expressed genes, sequencing does not have to be very deep, and samples can be multiplexed in one lane. for novel transcripts or genes with a low expression level, higher coverage is needed, and costs will increase (the encode consortium project 2011a). one may also consider depleting highly abundant genes of little interest, such as alpha and beta globin in a blood sample, prior to sequencing in order to improve detection sensitivity (choi et al. submitted) . in 2010, xiao and collaborators performed a 3' tag digital gene expression (dge) analysis of the porcine lung transcriptome on pigs infected with the prrs virus (xiao et al. 2010a, b) . this dge technique can be seen as a predecessor of rna-seq. whereas rna-seq will sequence all transcripts containing poly-a ? 
tails, dge detects those transcripts containing catg recognition sites because the transcripts are 'tagged' through digestion of cdnas at an nlaiii restriction site (hong et al. 2011). with dge, only a portion of the transcript is analyzed instead of the nearly full-length transcripts seen with rna-seq. both techniques offer similar estimates of gene expression levels, but rna-seq has the advantage of providing information about transcript structure and can consequently detect splice variants (hong et al. 2011). xiao and colleagues sequenced lung rna libraries of control noninfected pigs (n = 3), prrs-affected pigs necropsied at 4dpi (n = 3), and prrs-affected pigs necropsied at 7dpi (n = 3) after infection with the classical north american prrsv type (nprrsv) (xiao et al. 2010a) or the highly virulent prrsv (hprrsv), typically found in asia (xiao et al. 2010b), and found 4,520 (hprrsv) and 5,430 (nprrsv) differentially expressed genes. for both types of prrsv, a higher expression of anti-apoptotic genes and a lower expression of pro-apoptotic genes could be seen as a viral strategy for replication and spread (xiao et al. 2010a, b). as such, suppressed expression of short type i interferon (spi ifn) and ifna, both important innate anti-viral genes, was detected. additionally, an upregulation of cd163 was noted, which could indicate an increase in internalization of prrsv, since a positive correlation has been described between the expression of this prrsv receptor gene and prrsv infectivity (patton et al. 2009). patton and colleagues described that treatment with modulators such as lipopolysaccharide (lps) or il10, which affect cd163 expression in cells, consequently also affected the susceptibility of the host to prrsv, thereby increasing or decreasing viral infectivity (patton et al. 2009). miller and colleagues examined gene expression profile data obtained by sage tag analysis of tracheobronchial lymph nodes of sham- or prrsv-infected pigs at 13dpi.
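as a rough illustration of the tag-based idea behind dge, the sketch below locates the 3'-most catg (nlaiii) site in a cdna sequence and reports the site plus a fixed number of downstream bases as the tag. the function name `dge_tag`, the choice of the 3'-most site, and the default tag length of 17 bases are assumptions made for illustration; they are not taken from the studies cited here.

```python
def dge_tag(cdna, site="CATG", tag_len=17):
    """Return the restriction site plus `tag_len` downstream bases.

    Assumption (illustrative only): the tag is anchored at the 3'-most
    occurrence of the site. Returns None when the transcript has no site
    or when fewer than `tag_len` bases follow it.
    """
    cdna = cdna.upper()
    pos = cdna.rfind(site)  # 3'-most occurrence, -1 if absent
    if pos == -1:
        return None  # transcript without a CATG site is invisible to this approach
    tag = cdna[pos:pos + len(site) + tag_len]
    if len(tag) < len(site) + tag_len:
        return None  # site lies too close to the 3' end for a full tag
    return tag


# Hypothetical sequences, shortened tag length for readability.
print(dge_tag("GGCATGAAAAACCCCCGGGGGTTTTT", tag_len=5))
print(dge_tag("GGGGCCCC", tag_len=5))
```

a transcript lacking the recognition site yields no tag at all, which is one reason only part of the transcriptome is interrogated by this kind of approach.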
infection occurred with both highly pathogenic hprrsv (n = 8) as well as north american type nprrsv (n = 7) (miller et al. 2012). the hprrsv strain showed a more altered gene expression profile with higher fold change differences relative to the controls than the nprrsv, indicating the increased pathogenicity of the hprrsv. among the top 10 genes that were upregulated were three serum amyloid a2 acute-phase isoforms, resistin (retn), and three s100 calcium binding proteins, s100a8, s100a9, and s100a12 (miller et al. 2012). jiang et al. (2013) used the go, kegg, and reactome databases to analyze the data on nprrsv infection further and identified six biological system categories affected by prrsv, including cellular processes, genetic information processing, environment information processing, metabolism, organismal systems, and human diseases (jiang et al. 2013). in a large-scale prrsv infection study aimed at finding genes controlling variation in immune response outcomes, boddicker and colleagues identified a region on ssc4, containing snp marker wur10000125, that was strongly associated with both weight gain and prrs viral load for 21 days postinfection (boddicker et al. 2012, 2013). eisley and coworkers performed an rna-seq analysis of all genes in this wur10000125 region by comparing blood rna from 8 pairs of littermates with one of two different wur10000125 genotypes. they identified a strong candidate gene differentially expressed between favorable and unfavorable genotypes (eisley et al. 2014). several other immune-related rna-seq studies are forthcoming (table 1). their objective is to examine viral or bacterial immune responses in pig macrophages, dendritic cells, lymph nodes, globin-depleted whole blood, and tissue samples from the gastrointestinal tract. undoubtedly, the knowledge of pig immune responses at the transcriptome level will greatly benefit from these recently published and to-be-published studies.
next to exploring the "traditional" transcriptome, mirnaome analyses are also progressing in the pig (mcdaneld 2009; liu et al. 2010). micrornas (mirnas) are small non-coding rnas that can regulate gene expression through degrading or interfering with the respective mrna sequence, often at the 3' un-translated region (3'utr) of the gene. during the last few years, high-throughput sequencing has led to a steep rise in the discovery of novel porcine mirnas (xie et al. 2011). moreover, ngs can facilitate distinguishing between mirnas and other small rna fragments, and thus, ngs is the most promising technique for exploring the mirnaome. criteria to distinguish true mirna from other rna fragments are listed in kozomara and griffiths-jones (2011). mirna sequences and predicted targets for all important livestock animals can be found in mirbase, a comprehensive mirna information database (griffiths-jones et al. 2006). with regard to immunological responses in pigs, recent studies have reported that porcine mirna can intricately engage itself in host-virus interaction networks (he et al. 2009; loveday et al. 2012; guo et al. 2013). conversely, mirnas expressed by the viral pathogen can promote a favorable host cell environment for enhanced viral replication by targeting porcine mrnas (skalsky and cullen 2010). an example of this involving response to pseudorabies virus was published by anselmo and co-workers. using ngs, they analyzed both viral and host mirna expression in infected dendritic cells, and identified 5 viral mirnas, 156 known porcine mirnas, and 27 new porcine mirnas (anselmo et al. 2011). another group used ngs techniques to analyze mirna expression profiles in pseudorabies virus infected porcine epithelial cell lines. eleven mirnas were detected in the viral genome, and 209 known and 39 novel mirnas assigned to the porcine genome were also found.
wu and colleagues mainly focused on the viral mirnas, which were associated with regulation of viral gene transcription but were also proposed to control the expression of host genes annotated for immune processes, viral replication, cell death, as well as other processes. podolska and colleagues compared the mirnaome in necrotic and visually unaffected pieces of lungs from piglets infected with a. pleuropneumoniae and found 169 conserved and 11 candidate novel mirnas. twenty-nine were significantly up- or down-regulated between necrotic and unaffected tissue (podolska et al. 2012). timoneda and colleagues noted differences in mirna expression between animals infected with aujeszky's disease virus, or suid herpesvirus type 1 (suhv-1), and mock-infected animals, as well as differences between a virulent strain of suhv-1 and an attenuated one (timoneda et al. 2014).

mrna-seq and mirna-seq: a powerful combination

toward the elucidation of mirna-rna interactions, endale ahanda and colleagues analyzed the 3'utr variants of all genes of the sla region using rna-seq data. in this way, snps in mirna target sequences, potentially impacting gene expression, could be revealed. to investigate the co-expression between mrna and mirna, mrna-seq and mirna-seq data from an earlier study looking at liver, longissimus dorsi, and abdominal fat were used (chen et al. 2011a; endale ahanda et al. 2012). negative correlation between expression levels of mirnas and their predicted target genes could be found, which suggested that the prediction algorithms used were reliable (endale ahanda et al. 2012). since mirnas can play an important role in host-pathogen interactions (scaria et al. 2006), gao and colleagues used mirna-seq to look for host mirnas that could target prrsv transcripts. deep sequencing was performed on pams inoculated with a mock dose or with prrsv. the resulting data were mapped against all known mirnas listed in the current mirbase.
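a negative correlation between a mirna and its predicted target, as reported for the co-expression analysis above, can be checked with a plain pearson coefficient computed across samples. the sketch below is a minimal, hypothetical example: the function name `pearson` and all expression values are invented for illustration and do not come from the cited studies, which may have used a different statistic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical normalized expression across four samples.
mirna_expr = [1.0, 2.0, 3.0, 4.0]    # miRNA level rises ...
target_expr = [8.0, 6.0, 4.0, 2.0]   # ... while its predicted target falls

r = pearson(mirna_expr, target_expr)
print(round(r, 3))  # a value near -1 supports the predicted repression
```

in practice such a coefficient would be computed for every predicted mirna-target pair, and the resulting values would need correction for multiple testing before any pair is interpreted.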
mirna target prediction revealed that one mirna family, the mir-181 members, seemed to suppress prrsv replication in vivo at the early stage. one mir-181c target is the 3'utr of cd163 mrna, which encodes an important prrsv receptor. mir-181c is able to downregulate cd163 expression and thus interferes with viral attachment and penetration. similar results were seen for other mir-181 members. there is also good evidence that expression profile differences with regard to an s. typhimurium infection may be partially controlled by mirnas. huang and co-workers found that mir-155 was decreased in persistently shedding (ps) pigs in comparison with low shedding (ls) pigs after an s. typhimurium infection. mir-155 targets transcription factors cebpb and spi1, which in turn control the expression of important immunogenes (adamik et al. 2013; wei et al. 2014). bao and co-workers specifically investigated potential mirna-mrna regulatory interactions occurring during challenge with s. typhimurium. mrna-seq and mirna-seq data were collected on whole blood samples of ls and ps animals (bao et al. unpublished). they found 37 and 24 mirnas up- and down-regulated, respectively, at 2 dpi when looking at ls and ps pigs together. several of them were thought to be involved in innate and adaptive immune responses. they also discovered 3 mirnas to be more highly expressed in ls than in ps pigs at day 2, which could be interesting candidate biomarkers for selection toward low shedders. bao and colleagues subsequently used the sequence-based mirna target prediction software miranda to propose mirna-mrna regulatory relationships associated with the s. typhimurium infection (bao et al. unpublished). ye and colleagues searched for factors controlling susceptibility for enterotoxigenic e.
coli with fimbria 18 (etec-f18) in intestinal tissue in the sutai pig and found 58 differentially expressed mirnas, and after examining regulatory networks with differentially expressed mrnas that are target of one of those mirnas, 12 of them were shown as hubs for an enriched list of differentially expressed immune-related genes (ye et al. 2012) . several other porcine mirnaome studies are to be published soon focusing on the impact of mirnas after bacterial or viral infections (table 1) . in addition to different expression measurement techniques to examine pig immune responses, various experimental designs to study immunity have been used in these studies. first, there are in vitro versus in vivo studies. whereas in vitro studies are performed outside the living organism, and thus can be more controlled, in vivo studies usually better reflect the underlying biology. choices can be made to compare challenged and non-challenged or vaccinated versus non-vaccinated pigs, and whenever possible preferably littermates are chosen for such comparisons. to reduce genetic background variation even more, treated and untreated tissue within animal can be compared, as seen with the small intestinal segment perfusion (sisp) technique (hulst et al. 2013) . one can also choose to challenge all animals with a specific pathogen, and not using separate uninfected animals as controls, but contrast high and low responders, as was done in the salmonella experiments (uthe et al. 2009; huang et al. 2011; knetter et al. 2014) . less controlled, but perhaps more realistic, are studies performed after an on-farm outbreak, comparing healthy and diseased pigs or low and high responders (serao et al. 2014) . another key variable concerns the type of samples to be collected. 
most pig immune studies described in this review, conducted to identify host responses to common porcine pathogens or to immune response stimulators such as lps or pma/ionomycin, provide gene expression data from a single tissue or isolated cell type, and only at a limited number of time points post-infection/stimulation. to study specific components of immune response, it is clear that dissection and analysis of primary or secondary immune tissue are required. in human and mouse studies, this effort has been taken a step further, to the analysis of the transcriptome of highly specific cell types. such samples are isolated on the basis of cell surface marker expression. the parameters for cell selection and isolation are often complex, utilizing a multifactorial list of cell surface markers to identify a highly refined cell population (novershtern et al. 2011; shay and kang 2013). reports on, and options for, specific cell subsets are limited in swine (genini et al. 2008a; kapetanovic et al. 2012), mostly due to the relative lack of immune-targeted reagents critical for such detailed cell phenotyping. examining the whole blood transcriptome has several advantages, including ease of collection and repeated sampling of the same individual during response to a stimulus, which is especially useful in controlling for baseline variation in the study of immune responses. blood rna profiling is advantageous in screening for biomarkers as well; it can be used to study variation in immune response and develop gene signatures predictive of inflammatory and/or disease status. an example is given by the prrs host genome consortium (phgc) project, where blood of 200 infected animals per trial is collected at day 0 and 8 different times post-infection. however, since whole blood comprises a number of cell types, gene expression differences should be interpreted with great caution.
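because whole blood is a mixture of cell types, the expression of a gene measured in a blood sample can be modelled, to a first approximation, as the sum of cell-type-specific expression levels weighted by the fraction of each cell type in that sample. the sketch below fits this linear mixing model by ordinary least squares for a single gene. it is a simplified illustration under stated assumptions, not the published procedure of any cited study: the function names, the two-cell-type setup, and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def deconvolve(fractions, bulk):
    """Least-squares estimate of cell-type-specific expression for one gene.

    fractions[s][k]: fraction of cell type k in sample s (e.g. from blood counts)
    bulk[s]:         measured whole-blood expression of the gene in sample s
    Solves the normal equations (F^T F) x = F^T y.
    """
    k = len(fractions[0])
    ftf = [[sum(row[i] * row[j] for row in fractions) for j in range(k)]
           for i in range(k)]
    fty = [sum(row[i] * y for row, y in zip(fractions, bulk)) for i in range(k)]
    return solve(ftf, fty)


# Hypothetical data: three samples, two cell types (e.g. neutrophils, lymphocytes).
fractions = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
bulk = [6.8, 6.0, 4.4]  # simulated from cell-specific levels 10 and 2

estimate = deconvolve(fractions, bulk)
print([round(v, 3) for v in estimate])
```

the model needs at least as many samples as cell types, and the cell fractions must vary between samples; otherwise the cell-type-specific levels cannot be separated.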
with the aid of complete blood counts, the transcriptional response data can be deconvoluted to help identify the unique regulatory control of specific cellular responses to pathogens (shen-orr et al. 2010) . at the end of the day, the ultimate goal is to see how the results of all these individual transcriptomic studies fit into an improved understanding of porcine immune response. recently, such a meta-analysis was performed by combining results of several microarray-based pig immune studies to find prrs-specific responses (badaoui et al. 2013) . this meta-analysis successfully summarized the general pathway(s) believed to be induced by prrsv. several interferon regulatory factors (irfs) were highlighted in this analysis, and interferons clearly play an important role in viral infections. in agreement with the digital gene expression experiment (xiao et al. 2010a) , in several microarray and qpcr experiments, a dampened expression type i ifn response can be seen, which indicates an inadequate stimulation of the innate anti-viral immune response (genini et al. 2008a; xiao et al. 2010a; ait-ali et al. 2011; garcia-nicolas et al. 2014 ). genini and colleagues observed a strong elevation of ifnb at 9hpi, but only a slightly elevated expression of ifna (genini et al. 2008a ). ait-ali and colleagues noted a similar low ifna expression, but an, albeit late, accumulation of ifnb expression. they stated that prrsv could delay type i interferon transcriptional response in an attempt to counteract the host's early immune response (ait-ali et al. 2011) . van reeth and colleagues measured the ifna levels in bronchoalveolar fluids during prrsv infection and saw that its presence was low, a thousand-fold lower than with an infection with swine influenza virus or porcine respiratory coronavirus (van reeth et al. 1999 ). 
however, different prrsv isolates were shown to invoke different (and sometimes significantly higher) ifna expression levels, but no detectable ifna protein levels were found by elisa (lee et al. 2004). zhang and colleagues found that prrsv does not fail to induce ifna or ifnb mrna expression in monocyte-derived dendritic cells, but the protein seems to be blocked post-transcriptionally. although all described data point to a weakened ifn response, greatly responsible for a persistent viral infection, the data from these last two studies demonstrate the incomplete information obtained from looking solely at transcriptomic data. another overall prrs finding is the induction of proinflammatory chemokines and cytokines. the differential expression of a cell surface receptor involved in cytokine regulation, trem1, was found through the meta-analysis by badaoui et al. (2013) and was also present in the list of top ten upregulated transcripts in the rna-seq experiment of miller et al. (2012). in the meta-analysis study, trem1 changed, among others, the expression of chemokines such as ccl2 and ccl3, interleukins il6, il18, and il1b, and toll-like receptors tlr2 and tlr4. xiao et al. (2010a) showed an upregulation in the inflammatory response toll-like receptor genes tlr2, tlr4, cytokines (among which il1b), and chemokines. the acute-phase protein saa2 and inflammasome genes retn, s100a8, s100a9, and s100a12 were upregulated in the rna-seq experiment by miller and colleagues. for s. typhimurium, as mentioned earlier, transcriptomic studies in blood have examined differences between ls and ps pigs. in the in vivo study by knetter and colleagues, when looking at cytokine presence in serum at 2 dpi, the levels of the pro-inflammatory cytokines il1b, tnfa, and ifnγ were higher in ps pigs compared to ls and control pigs, and the anti-inflammatory il10 was upregulated in both ls and ps pigs, while only cxcl8 was elevated in the ls animals (knetter et al. 2014).
also, uthe and colleagues saw a correlation between ifnγ levels and shedding status. it seems that the ps animals have a much more extreme inflammatory response, as if the ps animals respond less quickly and thus extend their inflammatory response (uthe et al. 2009). additionally, looking at gene expression differences at 2 dpi compared to 0 dpi, ps animals showed a more extensive transcriptomic response, both in number of differentially expressed genes and in level of expression, compared to the ls animals. the most overrepresented regulation networks in ps animals at 2dpi involve the stat1, ifnb1, and ifnγ networks, showing a complex pro-inflammatory profile (knetter et al. 2014). the genes casp1, tnfa, and il10 were also found upregulated in these networks, and hence, a good correlation between serum cytokines and gene expression could be noted. other important regulators cebpb, spi1, and tlr4 in the ps upregulated expression pathways as well as the tnfa and ifnγ pathways were earlier reported by huang and colleagues in similar challenge studies. not surprisingly, the microbiota that differ between ls and ps on 2dpi point to microbiota that play a role in gastrointestinal inflammation (bearson et al. 2013). there are, however, regional differences in the expression pattern of the inflammatory response to s. typhimurium in the gut on 2dpi (collado-romero et al. 2010). while cytokine genes such as tnfa, il6, il1b, and ifnγ are upregulated in the jejunum and colon, they are not induced in the ileum. collado-romero and co-workers proposed that the ileum mucosa reacts slowly against the pathogen. martins and colleagues and wang and colleagues both examined transcriptomic differences due to s. typhimurium in mesenteric lymph nodes. martins and colleagues describe an elevation of ifnγ, il1b, cxcl2, cxcl8, casp1, slc11a, defb2, tlr8, and nod2 at 2dpi. nfκb was significantly down-regulated at 2 and 6dpi (martins et al. 2013a).
wang and colleagues also note an early repression of the nfκb pathway from 24 to 48 hpi in mesenteric lymph nodes. when looking at gene expression at only 2, 4, or 8hpi in jejunal scrapings, elevated expression of inflammatory genes such as il8, il1b, pap, and s100a9 was observed (hulst et al. 2013). they also noted an upregulation of nfkbia, an nfκb inhibitor, at this early time point. a study using in vitro stimulation with endotoxin of blood of animals prior to infection was also able to find cytokine differences between ls and ps pigs, showing an attenuated response in ls animals, in contrast to a clear pro-inflammatory response in ps pigs (knetter 2013). interestingly, while gene expression on day 0 showed a similar magnitude of response in ls and ps pigs, response differences to lps in blood at 2 dpi between ls and ps pigs were dramatic. only 14 probesets were differentially expressed in ls animals after endotoxin stimulation, while 959 probesets in ps animals changed significantly, showing an apparent tolerization mechanism in ls animals. since differences in gene expression patterns between ls and ps animals on day 0 were not significant enough to create predictor sets of genes in this and earlier studies, kommadath and co-workers used the more sensitive weighted gene co-expression network analysis (wgcna) technique after rna-seq profiling of 8 ls and 8 ps animals. wgcna groups co-expressed genes into modules, some of which were significantly correlated with shedding levels. they include interesting genes such as cytokine genes, genes involved in tlr, nfκb, and nod-like receptor pathways, and genes linked to bacterial infections (kommadath et al. 2014). for over a decade, microarrays have provided an enormous amount of information concerning different immune-related questions in pig transcriptomics. they have become increasingly inexpensive tools to search for gene expression differences between distinct immune phenotypes.
recently, rna-seq, and other sequence-based methods such as mirna-seq, bs-seq, chip-seq, and medip-seq, experiments have become more cost-effective. the advantage of gaining information about all expressed and modified genes, different regulation of splice variants, as well as information about sequence-specific and histone code regulation, is a big plus for rna-seq over microarrays. however, in the near future, the biggest challenge lies in comparing all existing data from many different kinds of platforms, so as to integrate such orthogonal data and better understand the physiology behind the disease phenotype and to find regulatory networks or biomarkers for disease resistance. such meta-analyses use a set of statistical techniques to combine results from different studies (badaoui et al. 2013) , requiring only that the platform elements can be matched. it is not necessary that the exact same questions were addressed; e.g., an experiment looking at high and low responders to an infection can be compared to a study with infected versus control animals. or an acute response study can be matched to a chronic response study. adding to this, the possibility to integrate new (and broader) information obtained from upcoming next generation sequencing studies would really improve our transcriptomic and epigenomic insights into pig immunity. an example of combining microarray studies for deeper insight is given by pérez-montarelo and co-workers. a meta-analysis was performed on 20 independent gene expression studies, using data from 480 of the same affymetrix array. by doing so, the expression of 12,320 genes could be checked in 27 tissues and they could identify tissue-specific genes and tissue-specific regulatory networks and transcription factors (perez-montarelo et al. 2012) . 
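the text above does not specify which statistical techniques such meta-analyses rely on. one classical option for combining evidence across independent studies is fisher's method, which pools the per-study p-values for a gene into a single p-value. the sketch below is offered only as an assumed, illustrative example of that general idea and is not the procedure used in the cited meta-analyses; the function name and the p-values are hypothetical.

```python
from math import exp, factorial, log


def fisher_combined(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    All p-values must lie in (0, 1].
    """
    if not pvalues or any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # equals X / 2
    return exp(-half_x) * sum(half_x ** i / factorial(i) for i in range(k))


# Hypothetical p-values for one gene in two independent studies.
combined = fisher_combined([0.05, 0.05])
print(combined)
```

two individually borderline results reinforce each other here, which is the practical appeal of pooling studies. the method assumes the studies are independent and ignores the direction of the effect, so opposite changes in two studies would still combine into a small p-value.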
the meta-analysis study by badaoui and colleagues described above illustrates how a meta-analysis experiment could be achieved for a prrs-specific research question even across different platforms. disparate microarray elements that mapped to the same ipa cdna assembly (couture et al. 2009 ) were considered to be comparable. 30,504 such elements could be compared across all 779 datasets used, of which more than a third was investigating prrs. to facilitate these kinds of meta-analyses, lessons can be learned from the human encode project by conducting experiments in a similar way and processing and archiving data using standard procedures (the encode consortium project 2011b; birney 2012; landt et al. 2012) . in this way, data quality is assured, data utility can be extended, and it becomes easier to compare datasets, combine computational analyses, and consequently perform meta-analysis. besides cross-platform meta-analyses, also cross-species meta-analyses are very promising, and r packages are freely available to conduct them (kuhn et al. 2008; kristiansson et al. 2013 ). however, until now, the main goal of these cross-species expression comparisons is often to employ model organisms for human diagnostic or therapeutic research (yu et al. 2012; grigoryev et al. 2013 ). kapetanovic and colleagues show how the clustering software biolayout express 3d (be3d) can be used to visualize inter-species expression comparisons (kapetanovic et al. 2013 ). when comparing differential gene expression in mouse, human, and pig macrophages after lps stimulation, they noted that pig macrophages act more human-like than do mouse macrophages. in a first paper, this group used be3d to identify and visualize sets of genes responding similarly over time to lps. they found that a subset of these genes had similar patterns of induction with human macrophage response, but not with mouse, where much lower stimulation or even repression was observed (kapetanovic et al. 2012) . 
they extended this work in a later paper by using the larger snowball array. taking only the differential genes between mouse and human, and removing those that did not have pig orthologues on the snowball array, be3d showed a large upregulated cluster in human compared to mouse macrophages which contained ido1 as the hub gene. this cluster was highly upregulated in pig macrophages as well. other genes such as those clustering around nos2a also behaved in a murine-specific manner, with an upregulated expression in mouse macrophages while no differential expression in human or pig (kapetanovic et al. 2013) . in addition to cross-platform and cross-species comparisons, it is important to investigate simultaneously different biological levels influencing the pig's immune status. in the past, disease research very often focused only on one part of the host-pathogen interaction (smits and schokker 2011) . one examined a portion of the host response, and looked at its ability to fight off an infection and examining its degree of disease susceptibility. or one focused solely on the pathogen, describing level of virulence among different pathogen variants. as well, very often, only a small part of the host immune response was measured, such as a specific cell type or tissue, or only one particular timeframe was targeted. narrow time windows or even single stages can be quite limiting, as immune response varies dramatically over time, and thus, time is a particularly crucial variable in an expression study of immune responses. with a systems biology approach, many levels of knowledge are gathered on both host and pathogen in a challenge study (genomic, transcriptomic, epigenomic, and metabolomic) and at different time points (smits and schokker 2011; tuggle et al. 2011) . the ultimate goal is to combine that data to fully explain host-pathogen interactions and discover emergent properties of the system that are difficult to reveal with current approaches. 
to disseminate public data and improve transcriptomic and epigenomic data mining, a livestock expression and epigenetic database (epidb) is under development. epidb includes data from chickens, cattle, pigs, sheep, and horses and provides a useful repository source, as well as tools to process and visualize expression and epigenetic data. examples of systems biology approaches integrating transcriptomic and epigenomic data can already be found in many mirna-seq experiments, where one mirna can regulate a network of several mrnas (giles et al. 2013; valdmanis et al. 2013; szeto et al. 2014; yang et al. 2014). a porcine mrna-seq study often precedes a mirna-seq experiment on the same sample set to test for correlations between mrna and mirna in order to predict influences of specific mirnas on components of the whole transcriptome (bao et al. unpublished; endale ahanda et al. 2012). however, the accuracy and depth of understanding stand or fall with the quality of the pig draft genome assembly and its annotation. recently, the immunome response annotation group (irag) was able to improve the characterization of the pig immunome by manual annotation of almost 3,500 transcripts mapping to over 1,400 genes (dawson et al. 2013). this was accomplished using the latest swine genome assembly version 10.2. for the genes without porcine rna sequence evidence, rna sequences of other species (often human) were used to annotate more than 1,100 transcripts using the alignment tools in otterlace (searle et al. 2004; loveland et al. 2012). furthermore, gene expression clustering after infection or stimulation for many independent challenge experiments provided evidence for the involvement of over 500 genes not previously annotated for function in immune response processes (dawson et al. 2013).
ongoing improvements of the draft assembly and additional annotation of the immunome will greatly improve the value of pig disease transcriptomic studies as well as further support the pig as a model for human immune response. since immune networks are very complex (gardy et al. 2009), a deeper understanding of such complexity is needed for advancements in unraveling porcine disease response mechanisms and in developing the pig as a viable model for human immunity. it is encouraging that substantial new high-throughput data have been reported in this area, and that analysis of such data is moving toward a systems biology approach by integrating different methods and combining multiple datasets. with the even higher throughput whole genome techniques coming to the forefront and performed at a relatively low cost, these comprehensive experiments will become more commonplace in the near future.

the making of encode: lessons for big-data projects
evidence for a major qtl associated with host response to porcine reproductive and respiratory syndrome virus challenge
validation and further characterization of a major quantitative trait locus associated with host response to experimental infection with porcine reproductive and respiratory syndrome virus
patterns of cellular gene expression in swine macrophages infected with highly virulent classical swine fever virus strain brescia
understanding haemophilus parasuis infection in porcine spleen through a transcriptomics approach
a global view of porcine transcriptome in three tissues from a full-sib pair with extreme phenotypes in growth and fat deposition by paired-end rna sequencing
porcine s100a8 and s100a9: molecular characterizations and crucial functions in response to haemophilus parasuis infection
increasing gene discovery and coverage using rna-seq of globin rna reduced porcine blood samples
evaluation of suitable reference genes for gene expression studies in porcine alveolar macrophages in response to lps and lta
key: cord-296979-8r851j4t authors: zhong, ying; liu, dong-ling; ahmed, mohamed morsi. m.; li, peng-hao; zhou, xiao-ling; xie, qing-dong; xu, xiao-qing; han, ting-ting; hou, zhi-wei; zhong, chen-yao; huang, ji-hua; zeng, fei; huang, tian-hua title: host genes regulate transcription of sperm-introduced hepatitis b virus genes in embryo date: 2017-10-31 journal: reproductive toxicology doi: 10.1016/j.reprotox.2017.08.009 sha: doc_id: 296979 cord_uid: 8r851j4t abstract hepatitis b virus (hbv) can invade the male germline, and sperm-introduced hbv genes could be transcribed in embryo. this study was to explore whether viral gene transcription is regulated by host genes. embryos were produced by in vitro fertilization of hamster oocytes with human sperm containing the hbv genome. total rna extracted from test and control embryos were subjected to smart-pcr, ssh, microarray hybridization, sequencing and blast analysis. twenty-nine sequences showing significant identity to five human gene families were identified, with csh2, eif4g2, pcbd2, psg4 and ttn selected to represent target genes. using qrt-pcr, when csh2 and pcbd2 (or eif4g2, psg4 and ttn) were silenced by rnai, transcriptional levels of hbv s and x genes decreased (or increased). this is the first report that host genes participate in regulation of sperm-introduced hbv gene transcription in embryo, which is critical to prevent negative impact of hbv infection on early embryonic development.
hepatitis b is a potentially life-threatening liver infection caused by the hepatitis b virus (hbv). transmission of hbv through blood transfusion [1], body fluids [2], intrauterine infection [3], cell, tissue and organ transplantation [4], and other routes, including hemodialysis units, intravenous drug injection and occupational exposure [5, 6], has been documented. in recent years, some studies have reported true vertical transmission of hbv via the germline and found that hbv has a negative effect on sperm motility in vivo and that couples in which the male is infected with hbv have a high risk of a low fertilization rate after in vitro fertilization (ivf) [7]. huang et al. provided direct evidence that hbv dna is able to integrate into the chromosomes of patient sperm, and that such hbv-carrying sperm can fertilize oocytes [8]. after fertilization, the sperm-introduced hbv c, s and x genes retain their functions of replication and expression in embryonic cells [9, 10]. subsequently, some studies showed that the hbv s protein can cause a series of apoptotic events resulting in reduced sperm motility, loss of sperm membrane integrity, sperm dysfunction, decreased fertility, and sperm death [11-13]. hu et al. collected 578 embryos from couples with at least one hbsag seropositive partner, and found that hbv dna was present in 14.4% (83/578) of embryos [14]. thus the effects of hbv infection on human reproduction and early embryonic development, and their mechanisms, have attracted the attention of researchers.
fig. 1. a: sperm samples from healthy donors were divided into two groups: the test group was transfected with plasmid containing the full-length hbv genome (t), and the control group was non-transfected (c). zona-free hamster ova were fertilized by transfected or non-transfected sperm to obtain two-cell embryos. total rna was extracted from t- and c-sperm-derived embryos, respectively, and then subjected to smart amplification, followed by ssh in both forward (t as tester) and reverse (c as tester) directions to enrich for up-regulated genes (t-c amplicon) or down-regulated genes (c-t amplicon). after t-a cloning and bacterial amplification, the forward and reverse libraries were cloned by pcr to obtain single insert-containing clones that were used for microarray assay. clones with a fold change value greater than 2 or less than 0.5 when compared with the average cy5/cy3 intensity ratio, the differential expression of which was statistically significant, were selected for sequencing. for the acquired sequences, blast was used to search for sequences homologous to human genes in the genbank nucleotide database. of the sequences, five were selected as the target genes and their expression in two-cell embryos was confirmed by rt-pcr. b: sperm in the test group was co-transfected with plasmid containing the full-length hbv genome and the target gene-specific shrna/sirna. sperm in the control group was transfected with plasmid containing the full-length hbv genome alone. the transcription levels of hbv s and x genes in the two-cell embryos were assessed by real-time qrt-pcr to identify whether the target genes participated in regulation of hbv gene expression. the same procedures listed above were also used to isolate and identify a control gene with a fold change value greater than 0.5 and less than 2 when compared with the average cy5/cy3 intensity ratio, the differential expression of which was not statistically significant.
hepatitis b infection and disease pathogenesis are known to be influenced by a number of factors, including host genetic factors [15]. many cellular proteins that possibly regulate hbv gene transcription in hepatocytes have been identified, including liver-enriched proteins, such as hnf1, hnf3, c/ebp, and klf15, ubiquitous factors, such as sp1, rfx1, nf-y, and ap1, and members of the nuclear receptor superfamily, such as hnf4, rxra, ppara and coup-tfs [16-19]. hepatotropism is a prominent feature of hbv infection, and virus infection appears to be restricted to hepatocytes [20]. because sperm-derived embryos differ from hepatic cells, this study was undertaken to explore whether host genes participate in regulation of hbv gene transcription in embryonic cells, which is critical to reveal the regulation mechanism of viral gene transcription in these cells and to prevent the negative impact of hbv infection on early embryonic development.
although embryos produced by fertilization of human oocytes with human sperm carrying hbv genes would be an ideal model, such a model presents major moral, ethical and legal problems. interspecific ivf between human sperm and zona-free golden hamster ova is highly associated with human in vitro fertilization [21] and does not raise the aforementioned problems. it is included in the "who laboratory manual for the examination and processing of human semen" [22] and is widely used in research in reproductive biology and medicine [23-25]; interspecific ivf was therefore employed to obtain two-cell embryos in the current study. as hbv dna has been detected in patient sperm [7, 8], donor sperm transfected with recombinant plasmid containing the full-length hbv genome was used instead of patient sperm to explore whether host genes participate in regulation of sperm-introduced hbv gene transcription in embryos.
three measures were taken to ensure reliable and accurate results (fig. 1). first, switching mechanism at 5′ end of rna template (smart) amplification was employed because, unlike cells from other tissues, the limited number of two-cell embryos made recovery of significant amounts of rna very difficult. smart amplification allows the synthesis of high-quality cdna for suppression subtractive hybridization (ssh) from sub-microgram levels of rna [26]. combined with t-a cloning and bacterial amplification, it yielded sufficient subtraction products for array probe preparation, microarray assay and cdna sequencing. next, we combined ssh with microarray to obtain cdnas highly enriched for differentially expressed genes of both high and low abundance and to greatly reduce the tedious work of screening subtraction libraries, as well as the likelihood of false-positive clones enriched via ssh [27]. finally, we silenced the target genes and a control gene by rna interference (rnai) and measured the effect of silencing on the transcriptional level of hbv genes, to determine whether host genes participate in regulation of hbv gene transcription.
semen samples were obtained from healthy donors. written, informed consent was obtained from all study subjects who allowed their sperm samples to be used for research. all protocols used in the current study involving human subjects were approved by the institutional ethical review boards of chengdu jingjiang hospital for maternal and child health care (approval number: cjhmchc-0010) and by the ethics committee of shantou university medical college (approval no. sumc-00-0031) according to the world medical association declaration of helsinki: ethical principles for medical research involving human subjects [28].
the animal protocol was designed to minimize pain or discomfort. the animals were acclimatized to laboratory conditions (23 °c, 12 h/12 h light/dark, 50% humidity, ad libitum access to food and water) for two weeks prior to experimentation. all animals were euthanized by barbiturate overdose (intravenous injection, 150 mg/kg pentobarbital sodium) for oocyte collection, and then their bodies were subjected to harmless treatment.
all procedures involving animals were reviewed and approved by the medical animal care & welfare committee in shantou university medical college (iacuc protocol number: sumc2015-152) according to the recommended guide for the care and use of laboratory animals published by the national research council (us) [29].
semen samples were incubated in a humidified incubator (37 °c, 5% co2 in air) for 30 min to allow liquefaction. motile sperm were selected using the swim-up method in biggers-whitten-whittingham (bww) medium supplemented with 0.3% bovine serum albumin (bsa) [30]. after washing, these sperm samples were used for subsequent experiments.
recombinant plasmid pbr322-hbv containing the full-length hbv genome was kindly provided by professor yi-ping hu (the open laboratory for molecular genetics, the second military medical university, china). construction and transfection of plasmid pires2-egfp-hbv (fig. 2) were performed as described previously [31]. briefly, a full-length cdna of hbv was isolated from pbr322-hbv by digestion with the restriction enzyme ecori and purified using a purification extraction kit (takara biotechnology (dalian) co., ltd, china). the pires2-egfp vector (promega, madison, wi, usa) was linearized, mixed with the hbv cdna at a 1:3-1:10 molar ratio in a ligation reaction and incubated at 16 °c overnight. the ligation mixture was used directly to transform e. coli dh5α competent cells according to the manufacturer's instructions (takara). the correct orientation and authenticity of the hbv coding sequence were confirmed by restriction digestion.
each sperm sample from a healthy donor was divided into two groups: a test group and a control group.
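the 1:3-1:10 vector-to-insert molar ratio used in the ligation step above translates into dna masses once fragment lengths are known. the following is a minimal sketch of that conversion, not the authors' protocol: the function name `insert_mass_ng`, the 50 ng vector input and both fragment lengths (about 5.3 kb for the vector backbone, about 3.2 kb for the hbv insert) are assumptions chosen for illustration.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector):
    """Insert mass (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    equimolar amounts need masses in proportion to fragment length.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# Hypothetical inputs for illustration; none of these values come from the paper.
VECTOR_NG = 50.0   # assumed mass of linearized vector in the reaction
VECTOR_BP = 5300   # assumed approximate length of the vector backbone
INSERT_BP = 3200   # assumed approximate length of the hbv insert

for ratio in (3, 10):
    mass = insert_mass_ng(VECTOR_NG, VECTOR_BP, INSERT_BP, ratio)
    print(f"vector:insert 1:{ratio} -> {mass:.1f} ng of insert")
```

with other fragment lengths or vector amounts only the three constants change; the proportionality itself is general for double-stranded dna.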
a 100-µl mixture comprising 1 µl of plasmid pires2-egfp-hbv (1.5 µg/µl) containing the full-length hbv genome, 6 µl of fugene hd, and 93 µl of hepes-buffered saline was incubated at room temperature for 15 min and then added to the sperm in the test group, which was incubated for 1.5 h. the sperm in the control group were not transfected. to measure the transfection efficiency, a nick translation kit (roche, basel, switzerland) and fluorescein-12-dutp (thermo fisher scientific, waltham, ma, usa) were used. the plasmid was labeled with fluorescein-12-dutp by nick translation according to the manufacturer's instructions. transfected sperm were detected by green fluorescence and counted under a fluorescence microscope. in our preliminary experiment, the transfection efficiency was 39.51 ± 2.72%, and the proportion of embryos derived from hbv-transfected sperm was 27.66%.
oocyte preparation and insemination were performed as described in reference [22]. briefly, zona-free hamster ova were fertilized by sperm from the test and control groups and then incubated in a humidified incubator (37 °c, 5% co2 in air) for 24 h to obtain two-cell embryos. two-cell embryos exhibiting green fluorescence, derived from the hbv-transfected sperm (the test group), and those without green fluorescence, derived from the non-transfected sperm (the control group), were collected under a fluorescence microscope (dmil led, leica, germany). after washing, total rna from the two-cell embryos was extracted for cdna library construction by smart amplification.
smart amplification and ssh were carried out using the smarter™ pcr cdna synthesis kit and the clontech pcr-select™ cdna subtraction kit (clontech laboratories, inc, ca, usa) according to the manufacturer's instructions. control mouse liver total rna (the positive group), deionized h2o (the negative group) and the primers (3′ smart cds primer ii a and 5′ pcr primer ii a) were provided by the kit.
briefly, 3.5 µl (50 ng) of total rna extracted from two-cell embryos of the test and control groups was subjected successively to first-strand cdna synthesis, amplification by long-distance pcr, purification by column chromatography, digestion with rsai, dilution to a final concentration of 300 ng/µl in 1x tne buffer, and adaptor ligation. ssh materials consisted of cdna from the test two-cell embryos as the tester and cdna from the control two-cell embryos as the driver for forward subtraction, and vice versa for reverse subtraction. after the first and second hybridizations, hybridized samples 1 and 2 were mixed and then incubated at 68 °c overnight, followed by a primary pcr and a secondary pcr. after an agarose/ethidium bromide gel analysis, the products were used for dna ligation. the same procedures listed above were also performed for reverse subtraction and control subtraction.
t/a cloning and amplification of subtraction products were performed using the pgem®-t easy vector system (promega) according to the manufacturer's instructions. briefly, ligation reactions containing pgem®-t easy vector, 2x rapid ligation buffer, t4 dna ligase and the secondary pcr products of ssh were incubated overnight at 4 °c. two microliters of the ligation reaction were added to 50 µl of jm109 high-efficiency competent cells, which were placed on ice for 20 min, heat-shocked at 42 °c for 45 s, returned to ice for 2 min, and then added to 950 µl of luria-bertani (lb) medium and incubated at 37 °c for 1.5 h with shaking. one hundred microliters of the transformation culture was plated onto lb/ampicillin/iptg/x-gal plates followed by incubation overnight at 37 °c. the 174 white (positive) colonies obtained were separately resuspended in 1.5 ml lb medium with ampicillin (100 µg/ml) and grown at 37 °c for 6 h.
pcr amplifications were carried out using 0.3 µl bacterial suspension and premix taq™ (takara), and 5 µl from each pcr reaction was subjected to 1% agarose gel electrophoresis. positive bands were detected for 152 of 174 colonies.
microarray hybridization and data analysis were performed by takara biotechnology (dalian) co., ltd. briefly, the pcr products for differentially expressed genes from 152 clones, obtained from the subtraction libraries, were purified using 2-propanol precipitation and then spotted in triplicate onto takara glass slides (takara) using an affymetrix arrayer 417 (takara). the pcr products from the forward- and reverse-subtracted libraries (2 µg each) were labeled with cy3 and cy5 fluorescent dyes and separately used to probe the glass slides containing the pcr-amplified cdna. the slides were hybridized overnight at 65 °c with labeled purified probes. quality control of the microarray chips was performed according to takara's method and standard. array slides were scanned using an affymetrix 428 array scanner (affymetrix, buckinghamshire, uk). the measured intensities are expressed as a ratio of cy5/cy3 intensities, which were background-corrected and normalized to the average cy5/cy3 ratio. the ratios were log2-transformed, and the following clones were selected for sequencing: (1) clones with a fold change value greater than 2 or less than 0.5 when compared with the average cy5/cy3 intensity ratio, the differential expression of which was statistically significant, were used to select target genes; (2) a clone with a fold change value greater than 0.5 and less than 2, the differential expression of which was not statistically significant, was chosen as the control [32].
dna sequencing and analysis were performed by beijing genomics institute shenzhen co., ltd. (shenzhen, china) using a 3730 xl dna analyzer (applied biosystems, ca, usa). sequencing reactions were carried out with bigdye v3.1 mix and pop-7™ polymer (applied biosystems).
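the fold-change rule used above to pick clones for sequencing can be sketched in a few lines. this is only an illustration under stated assumptions: the function name `classify_clones`, the clone names and the ratios are hypothetical, the statistical-significance requirement described in the text is not modelled, and clones falling exactly on a cut-off are treated as unchanged because the text does not say how ties were handled.

```python
from statistics import mean


def classify_clones(ratios, upper=2.0, lower=0.5):
    """Label clones by their cy5/cy3 ratio relative to the array-wide average.

    Only the fold-change criterion is applied here; the significance test
    that the text also requires is outside the scope of this sketch.
    """
    average = mean(ratios.values())
    labels = {}
    for clone, ratio in ratios.items():
        fold = ratio / average
        if fold > upper:
            labels[clone] = "up"
        elif fold < lower:
            labels[clone] = "down"
        else:
            labels[clone] = "unchanged"
    return labels


# Hypothetical background-corrected cy5/cy3 ratios (average is 3.1).
example = {"clone_a": 8.0, "clone_b": 3.0, "clone_c": 2.0,
           "clone_d": 2.0, "clone_e": 0.5}
for clone, label in classify_clones(example).items():
    print(clone, label)
```

on log2-transformed ratios the same rule reads as a difference from the log2 average of more than 1 or less than -1.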
for the acquired sequences, basic local alignment search tool (blast) was used to search for sequences homologous to human genes in the genbank nucleotide database. of the sequences, five with statistically significant differential expression were selected as the target genes, and one sequence without statistically significant differential expression was selected as a control. transcription of both target and control genes in the two-cell embryos was confirmed using rt-pcr. the extraction and reverse transcription of total rna was performed using an ambion cells-to-cdna™ ii kit (life technologies, ca, usa) according to the manufacturer's instructions. pcr amplification was performed in a 25 µl reaction mixture containing cdna (5 µl), premixed taq polymerase (12.5 µl) (takara), forward and reverse primers (10 µm, 0.5 µl each) and ddh2o (6.5 µl), and β-actin was co-amplified as an internal control [33]. the gene-specific primers were designed using primer-blast (table 1). cycling conditions were as follows: 2 min at 94 °c; 35 cycles of 30 s at 94 °c and 30 s at 57 °c; and one cycle of 1 min at 72 °c. the amplified pcr fragments were subjected to 1.5% agarose gel electrophoresis and stained with ethidium bromide.
eighteen healthy donors were randomly divided into six groups, and their sperm samples were used to fertilize zona-free hamster oocytes in vitro and assess the effects of the silencing of five target genes (csh2, eif4g2, pcbd2, psg4 and ttn) and a control gene (esrrg) on transcription of hbv s and x genes by real-time quantitative (q)rt-pcr. the sperm samples from three donors were individually used for assaying each gene, and each assay was repeated three times. gene-specific short hairpin (sh)rna or short interfering (si)rna used for silencing the target and control genes, and non-interfering scrambled oligonucleotides used as the negative controls, were designed and synthesized by shanghai genepharma co., ltd.
or qiagen china (shanghai) co., ltd, respectively (table s1). each sperm sample from a healthy donor was divided into two groups: a test group, which was co-transfected with plasmids containing the full-length hbv genome and the gene-specific shrna/sirna or a non-interfering scrambled oligonucleotide; and a control group, which was transfected with the plasmid containing the full-length hbv genome alone. real-time qrt-pcr was performed using an abi 7300 real-time pcr system (applied biosystems, ca, usa) to compare the transcriptional level of hbv genes in the two-cell embryos of the test and control groups, with gapdh as an internal control [33]. total rna was extracted from the two-cell embryos and cdna synthesis was performed as described previously [34]. a total volume of 20 µl of reaction mixture contained cdna (2 µl), 2× quantifast sybr green pcr master mix (10 µl) (qiagen, hilden, germany), forward and reverse primers (10 µm, 0.2 µl each) for s and x genes and gapdh (table 1), and rnase-free water (7.6 µl). cycling conditions were as follows: 5 min at 95 °c; 40 cycles of 15 s at 95 °c and 30 s at 60 °c; and one cycle of 15 s at 95 °c, 2 min at 55 °c, and 15 s at 95 °c. the data were analyzed quantitatively using the 2^-ΔΔct method. in the 2^-ΔΔct analysis, individual data were converted to a linear form using the 2^-ct calculation [35] and then subjected to a paired-sample t-test using spss 16.0 software to determine significant differences in average transcription levels of s and x genes between the test and control groups. a p-value of less than 0.05 was considered statistically significant.
smart amplification showed that dispersed positive bands were detected in the test, control and positive groups, and not in the negative group, indicating successful amplification.
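the relative quantification described above can be illustrated with a short calculation. what follows is a sketch under stated assumptions rather than the authors' analysis: the function names and all ct values are hypothetical, gapdh is taken as the reference gene as stated in the text, the linear form is computed on the reference-normalised Δct (an assumption about how the 2^-ct conversion was applied), and only the paired t statistic is computed, since the p-value in the study came from spss.

```python
from math import sqrt
from statistics import mean, stdev


def linear_expression(ct_target, ct_reference):
    """2^-dCt: target signal normalised to the reference gene, linear scale."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt: expression in the test group relative to the control group."""
    ddct = (ct_target_test - ct_ref_test) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** -ddct


def paired_t(test_values, control_values):
    """Paired-sample t statistic and its degrees of freedom."""
    diffs = [t - c for t, c in zip(test_values, control_values)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical Ct values: the target crosses threshold two cycles later in
# the test group, with identical reference-gene Ct in both groups.
fc = fold_change(25.0, 20.0, 23.0, 20.0)
print(f"fold change = {fc:.2f}, i.e. a {1 / fc:.1f}-fold reduction")

# Hypothetical paired linear-scale values from three donors.
t_stat, df = paired_t([1.0, 2.0, 4.0], [0.0, 0.0, 1.0])
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

a fold change below 1 is often reported as its reciprocal, so a value of 0.25 reads as a 4-fold reduction.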
after ssh, t/a cloning and bacterial amplification, 174 white (positive) colonies from subtracted libraries (123 from the forward and 51 from the reverse ssh libraries) were amplified by pcr, resulting in 152 positive insert-containing clones (103 from the forward- and 49 from the reverse-subtracted libraries). the insert sizes ranged from 500 to 1500 bp, and most inserts were approximately 1000 bp (fig. 3) , in agreement with the statistical prediction of rsai digestion. 
table 1. primers used to amplify the target genes, a control gene, the internal control genes and the hbv s and x genes (columns: forward, reverse). 
in the current study, the two-fold average ratio of cy5/cy3 intensities was 5.740, and the 0.5-fold average ratio of cy5/cy3 intensities was 1.4349. these two values served as the thresholds for determining whether gene differential expression was statistically significant. of 152 positive insert-containing clones, 29 showed fold change values greater than 2 or less than 0.5, and 123 exhibited fold change values greater than 0.5 and less than 2. these 29 clones and one of the 123 clones were selected for sequencing (table s2 ). blast analysis revealed that the 29 acquired sequences showed significant identity to five human gene families (average identity = 94.59%, ranging from 82% to 100%, e-value: 0.0) (table s2 ). of these, five representative genes (one from each family, identity: ≥96%, e-value: 0.0) were selected as target genes: chorionic somatomammotropin hormone 2 (csh2), eukaryotic translation initiation factor-4γ, 2 (eif4g2), pterin-4α-carbinolamine dehydratase/dimerization cofactor of hepatocyte nuclear factor 1α (tcf1)2 (pcbd2), pregnancy-specific β-1-glycoprotein 4 (psg4) and titin (ttn) ( table 2 ). 
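the fold-change cut-offs reported above can be expressed as a simple classification rule. the sketch below is a simplified illustration, assuming that the fold change of a clone is its cy5/cy3 ratio divided by the average ratio (2.870, the value implied by the reported two-fold threshold of 5.740); the function name and example ratios are hypothetical, and this fixed cut-off is not the distributional fold change test cited in the reference list.

```python
MEAN_RATIO = 5.740 / 2  # average Cy5/Cy3 ratio implied by the reported two-fold threshold


def classify_clone(cy5_cy3_ratio, mean_ratio=MEAN_RATIO):
    """Label a clone by its fold change relative to the average Cy5/Cy3 ratio."""
    fold = cy5_cy3_ratio / mean_ratio
    if fold > 2.0:
        return "up"         # ratio above the two-fold threshold
    if fold < 0.5:
        return "down"       # ratio below the 0.5-fold threshold
    return "unchanged"      # between the two thresholds


# Hypothetical ratios for three clones.
for ratio in (6.0, 2.9, 1.0):
    print(ratio, classify_clone(ratio))
```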
the selection criteria were as follows: first, three sequences showed significant identity to three human gene families, and each gene family had one sequence only; thus they (eif4g2, psg4 and ttn) were chosen as representative genes in these gene families. next, seven sequences showed significant identity to two members (nadh and pcbd2) of a human gene family. because nadh is located within mitochondria and not within the cytoplasm, it is difficult to determine the entry efficiency of nadh-specific shrna into mitochondria and to evaluate its silencing effects; thus pcbd2 was chosen as the representative gene in this gene family. finally, 19 sequences showed significant identity to one human gene family, in which a sequence from the clone f123 with the highest ratio of cy5/cy3 intensities showed the highest identity (99%) to the members (csh1 and csh2) of this gene family; thus csh2 was chosen as the representative gene. in addition, a sequence (f110) from the 123 clones showed significant identity to the human estrogen receptor-related receptor (esrrg) family and was chosen as a control gene (table 2, table s2). their transcription in the two-cell embryos was confirmed by rt-pcr (fig. 4) . the transcriptional levels of hbv s and x genes in two-cell embryos after the silencing of target genes and a control gene by rna interference (rnai) are shown in table 3 . when csh2 and pcbd2 were silenced, s and x gene expression in two-cell embryos was down-regulated. csh2 knockdown reduced the transcription of s and x genes by 11.1- and 8.33-fold in two-cell embryos, respectively, as compared to those in the control group. knockdown of pcbd2 reduced s and x gene transcription by 4.35- and 3.70-fold, respectively. when eif4g2, psg4 and ttn were silenced, s and x gene expression in two-cell embryos was up-regulated by 2.52- to 14.94-fold. there was no significant difference in the levels of s and x gene transcription in two-cell embryos silenced for esrrg compared to non-silenced two-cell embryos (table 3 ). in the current study, two non-interfering scrambled oligonucleotides (niso) were used as the negative controls. there was no significant difference in the levels of s and x gene transcription in the two-cell embryos between the test (niso treated) and control (niso non-treated) groups (table 4 ). 
table 2. the clones selected as the target genes and a control gene. footnotes: both (the two-fold and 0.5-fold average cy5/cy3 ratios) are the thresholds for determining whether gene differential expression is statistically significant. §2 identities: the extent to which two nucleotide sequences have the same residues at the same positions in an alignment, often expressed as a percentage [36] . §3 e value: the expectation value or expect value represents the number of different alignments with scores equivalent to or better than s that is expected to occur in a database search by chance. the lower the e value, the more significant the score and the alignment [36] . * the target genes. ¤ the control gene. 
table 3. effects of the silencing of the target and control genes on transcriptional levels of hbv s and x genes in two-cell embryos. eighteen healthy donors were randomly divided into six groups, and their sperm samples were used to fertilize zona-free hamster oocytes in vitro and to assess the effects of silencing five target genes (csh2, eif4g2, pcbd2, psg4 and ttn) and a control gene (esrrg) on transcription of hbv s and x genes in two-cell embryos using real-time qrt-pcr and the 2^(-ΔΔct) method. the sperm samples from three donors were individually used for assaying each gene, and each assay was repeated three times. each sperm sample from a donor was divided into two groups: a test group (t), which was co-transfected with plasmids containing the full-length hbv genome and the gene-specific shrna or sirna; and a control group (c), which was transfected with plasmid containing the full-length hbv genome alone. the data are presented as a fold change value (fcv) in transcription of s and x genes in the test group normalized to the internal control gene (gapdh) and relative to those in the control group. individual data were converted to a linear form using the 2^(-ct) calculation [35] and then subjected to a paired-sample t-test using spss 16.0 software to test for a significant difference in average transcriptional levels of s and x genes between the test and control groups. a p-value of less than 0.05 was considered statistically significant. * p < 0.05; ¤ p > 0.05. 
table 4. effects of the non-interfering scrambled oligonucleotides on transcriptional levels of hbv s and x genes in two-cell embryos. each sperm sample from a donor was divided into two groups: a test group (t), which was co-transfected with plasmids containing the full-length hbv genome and a non-interfering scrambled oligonucleotide (niso); and a control group (c), which was transfected with plasmid containing the full-length hbv genome alone. the data are presented as a fold change value (fcv) in transcription of s and x genes in the test group normalized to the internal control gene (gapdh) and relative to those in the control group. individual data were converted to a linear form using the 2^(-ct) calculation [35] and then subjected to a paired-sample t-test using spss 16.0 software to test for a significant difference in average transcriptional levels of s and x genes between the test and control groups. a p-value of less than 0.05 was considered statistically significant. ¤ p > 0.05. 
rnai is a biological process in which rna molecules inhibit gene expression or translation by neutralizing targeted mrna molecules. oligonucleotides are short dna or rna molecules, of which sirna and shrna are central to rnai. it has been demonstrated that hamster and human sperm have a strong tendency to interact with exogenous dna and are able to transfer dna to oocytes [37] . zhang et al. injected the plasmid rnai-ready-psiren-retroq-zsgreen, which was constructed for interrupting the zfx gene, into the testis of test group mice, and then mated the male mice individually with females, resulting in 78.75 ± 7.50% male offspring, significantly higher than in the offspring derived from the control groups (p < 0.01), suggesting that rnai could be used as a tool to control the sex ratio of mouse offspring by interrupting zfx/zfy genes in sperm cells [38] . in the current study, the two-cell embryos derived from sperm co-transfected with plasmids containing the full-length hbv genome and gene-specific shrna/sirna for interrupting the target genes served as the test group, and those derived from sperm transfected with plasmid containing the full-length hbv genome alone served as the control group. the transcription levels of hbv genes in the test and control groups were significantly different (p < 0.05), which suggested that in the test group the target genes had been silenced by rnai and that these genes participate in transcriptional regulation of hbv genes, so that their silencing changed hbv gene transcription levels. we found csh2, pcbd2, eif4g2, psg4 and ttn to be involved in transcriptional regulation of hbv s and x genes in two-cell embryos. the protein encoded by csh2 is a member of the somatotropin/prolactin family of hormones and plays an important role in growth control [39] . pcbd2 is a protein-coding gene associated with tetrahydrobiopterin biosynthesis [39] . silencing of csh2 and pcbd2 by rnai decreased the levels of s and x gene transcription in two-cell embryos, indicating that both genes up-regulate transcription of s and x genes. the protein encoded by eif4g2 is a subunit of eif4f, a cap binding protein complex that mediates translation initiation by specific recognition [39] . 
psg4 is a member of the carcinoembryonic antigen (cea) gene family and may play a role in regulation of the innate immune system [39] . ttn encodes a large abundant protein that is a key component in the assembly and functioning of vertebrate striated muscles, and in non-muscle cells, seems to play a role in chromosome condensation and chromosome segregation during mitosis [39] . when eif4g2, psg4 and ttn were silenced by rnai, the transcriptional levels of s and x genes in two-cell embryos increased, suggesting that these genes down-regulate s and x gene transcription. these results suggest that certain host genes participate in regulating hbv gene transcription in two-cell embryos. however, the interplay between the virus and host cells is complex, and it remains largely unknown how these genes interact with each other or act independently to allow differential transcription of hbv genes. some clues may help in understanding their interaction. first, certain previously identified host hbv-regulatory genes share high similarity in function and signaling pathways with genes (identified herein) that regulate hbv gene transcription. for example, sp1 is a ubiquitous factor that binds to gc-rich motifs in many promoters, can regulate transcription of hbv genes [16, 17] , and is involved in many cellular processes, including cell differentiation, growth, and apoptosis. the protein encoded by csh2 also plays an important role in growth control and provides avenues for developmental regulation and tissue specificity. both genes function in the prolactin signaling pathway, suggesting possible regulation of hbv gene transcription by a similar mechanism. next, some host genes might indirectly participate in regulating hbv gene transcription through general activation of transcription regulators or factors. for example, hnf1 is a liver-enriched transcription factor that can regulate transcription of hbv genes [16, 17] . 
pcbd2, identified herein, regulates dimerization of hnf1α and enhances its transcriptional activity. finally, our identified genes might directly contribute to regulation of hbv gene expression. for example, eif4g2, besides down-regulating transcription of s and x genes, might repress viral rna translation by forming translationally inactive complexes [39] . significant questions remain. human embryonic genome activation (ega) occurs at the 4-8 cell stage [40, 41] , so why can host genes regulate hbv gene expression in two-cell embryos? some studies offer useful clues that may help shed light on this question. first, it has been demonstrated that ega in mouse occurs at the two-cell stage and is controlled by maternally deposited rnas and proteins. yu et al. recently identified oocyte-expressed yes-associated protein as a key activator of ega in mouse [42] . in the current study, the two-cell embryos were produced by ivf of zona-free golden hamster oocytes with human sperm. ega in golden hamster also occurs at the two-cell stage [43] ; do certain activator(s) deposited in the golden hamster oocyte activate the human genome? next, in the development of the human preimplantation embryo, there are four unique embryonic stage-specific patterns (essps) of gene expression [44] . essp1 (maternally inherited oocyte mrnas) is expressed at high levels at the zygote stage and declines during development to the blastocyst stage. essp2 includes embryonic-activated genes, first transcribed at approximately the 8-cell stage. essp3 comprises genes not expressed until the blastocyst stage. essp4 includes persistent transcripts that maintain stable expression from the zygote to blastocyst stages [41] . it is unknown whether the target genes identified in the current study are included in essp4. undoubtedly, clarification of these questions would shed light on many mysteries in early embryonic development. 
in addition to the above questions, the hbv s gene encodes hbsag, which packages the viral components. the hbv x gene encodes the hbx protein, whose activity is absolutely required for in vivo replication and spread of the virus [44] . can the target genes affect virion assembly, or affect replication and spread of the virus through regulating viral transcription to block its transmission? moreover, the outcome of hbv infection is markedly heterogeneous, varying from acute asymptomatic self-limiting infection to fulminant hepatic failure or to decompensated cirrhosis and hepatocellular carcinoma. it has been recognized that host genetic background influences the outcome of viral hepatitis infection [45] . in the current study, we found that some host genes upregulate or downregulate transcription of s and x genes. can we assume that these host genes function together to maintain homeostasis? disturbance of such homeostasis could decrease or increase host susceptibility to hbv infection or allow infection to develop in different directions, leading to different clinical outcomes. these questions will be evaluated in future clinical studies. human embryo development is more fragile than that of many other species [41] . human fecundity rates are relatively low, largely due to pre- and post-implantation embryo loss [41] . in vitro, 50-70% of ivf embryos fail to reach the blastocyst stage [41, 46] . certain factors that may contribute to abnormal development before ega include inherited genetic mutations, aneuploidy, environmental insult to germ cells, events during fertilization and sperm-related factors [41] . in our previous study, we found that hbv infection induces various chromosomal abnormalities in patient sperm, including aneuploidy, acentric fragment and deletion, ring chromosome, triradial, dicentric chromosome and pulverization [8] . 
sperm with these chromosomal abnormalities can achieve normal fertilization and introduce these aberrations into the embryo, which may increase the risk of abortion, stillbirth, or birth defects [8] . in a recent clinical report, hbv mrna was found in abandoned ivf embryos of hbv-infected fathers, which confirmed that hbv could not only enter early cleavage embryos via sperm but also replicate in embryos, resulting in early abortion [47] . the aforementioned findings suggest that hbv may interfere with early embryonic development and thus affect pregnancy outcome. therefore it is very important to investigate hbv gene transcription and its regulation mechanism in embryos. this study, for the first time, provides experimental evidence that transcription of hbv genes occurs in early embryonic cells and is regulated by host genes. it is worth mentioning that, besides hbv, there are other emerging infectious disease viruses, such as hepatitis c virus (hcv), human immunodeficiency virus (hiv), severe acute respiratory syndrome virus (sars), ebola virus and zika virus (zikv), which pose a serious threat to human health [48] [49] [50] [51] [52] . recently, some studies have begun to investigate the true vertical transmission of hcv and hiv via the germline [34, [53] [54] [55] [56] , but research on such transmission of sars, ebola and zika viruses is still lacking. clarifying whether the aforementioned viral genes are transmitted via the germline, and how their expression is regulated in the embryo, would make a great contribution to exploring the interplay between the viruses and host cells and to maintaining human reproductive health. the authors declare no conflict of interest. the founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results. 
refining the risk estimate for transfusion-transmission of occult hepatitis b virus epidemiological and molecular features of hepatitis b and hepatitis delta virus transmission in a remote rural community in central africa #38: hepatitis b in pregnancy screening, treatment, and prevention of vertical transmission hepatitis b transmission by cell and tissue allografts: how safe is safe enough? hepatitis b reverse seroconversion and transmission in a hemodialysis center: a public health investigation and case report incidence of percutaneous injury in taiwan healthcare workers adverse effects of hepatitis b virus on sperm motility and fertilization ability during ivf effects of hepatitis b virus infection on human sperm chromosomes expression of hepatitis b virus genes in early embryonic cells originated from hamster ova and human spermatozoa transfected with the complete viral genome detection and expression of hepatitis b virus x gene in one and two-cell embryos from golden hamster oocytes in vitro fertilized with human spermatozoa carrying hbv dna hepatitis b virus s protein enhances sperm apoptosis and reduces sperm fertilizing capacity in vitro effects of hepatitis b virus s protein exposure on sperm membrane integrity and functions effects of hepatitis b virus s protein on human sperm function the presence and expression of the hepatitis b virus in human oocytes and embryos host genetic factors in hepatitis b infection, liver cancer and vaccination response: a review with a focus on africa core promoter: a critical region where the hepatitis b virus makes decisions regulatory elements of hepatitis b virus transcription kruppel-like factor 15 activates hepatitis b virus gene expression and replication genotype-dependent activation or repression of hbv enhancer ii by transcription factor coup-tf1 transcriptional regulation of hepatitis b virus by nuclear hormone receptors is a critical determinant of viral tropism correlation between the zona-free hamster egg 
sperm penetration assay and human in vitro fertilization world health organization, zona-free hamster oocyte penetration test, in: who cpg methylation participates in regulation of hepatitis b virus gene expression in host sperm and sperm-derived embryos the sperm penetration assay for the assessment of fertilization capacity sperm penetration assay as an indicator of bull fertility cdna amplification by smart-pcr and suppression subtractive hybridization (ssh)-pcr suppression subtractive hybridization world medical association declaration of helsinki: ethical principles for medical research involving human subjects guide for the care and use of laboratory animals world health organization, direct swim-up, in: who investigation of recombinant mouse sperm protein izumo as a potential immunocontraceptive antigen distributional fold change test − a statistical approach for detecting differential expression in microarray experiments both beta-actin and gapdh are useful reference genes for normalization of quantitative rt-pcr in human ffpe tissue samples of prostate cancer in vitro study on vertical transmission of the hiv-1 gag gene by human sperm analysis of relative gene expression data using real-time quantitative pcr and the 2(-delta c(t)) method sperm-mediated gene transfer into oocytes of the golden hamster: assessment of sperm function rnai as a tool to control the sex ratio of mouse offspring by interrupting zfx/zfy genes in the testis functional genomics of 5-to 8-cell stage human embryos by blastomere single-cell cdna analysis non-invasive imaging of human embryos before embryonic genome activation predicts development to the blastocyst stage oocyte-expressed yes-associated protein is a key activator of the early zygotic genome in mouse golden hamster embryonic genome activation occurs at the two-cell stage: correlation with major developmental changes x region mutations of hepatitis b virus related to clinical severity understanding the host genetics of 
chronic hepatitis b and c, semin does severe teratozoospermia affect blastocyst formation, live birth rate, and other clinical outcome parameters in icsi cycles? relationship between the mechanism of hepatitis b virus father-infant transmission and pregnancy outcome the impact of emerging infectious diseases on chinese blood safety sars-like wiv1-cov poised for human emergence hepatitis delta and hiv infection effect of ebola virus disease on maternal and child health services in guinea: a retrospective observational cohort study model-informed risk assessment for zika virus outbreaks in the asia-pacific regions factors affecting sperm fertilizing capacity in men infected with hiv the integrated hiv-1 provirus in patient sperm chromosome and its transfer into the early embryo by fertilization research on the vertical transmission of hepatitis c gene from father-to-child via human sperm effects of hepatitis c virus infection on human sperm chromosomes this work was financially supported by the national natural science foundation of china (grant number 30972526) and the applied basic research programs of sichuan province of china (grant number 2014jy0110). the authors thank professor stanley lin for his assistance in revising the final draft of manuscript and editing for english grammar and syntax. supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.reprotox.2017. 08.009. key: cord-266521-vovas81d authors: yokobayashi, yohei title: aptamer-based and aptazyme-based riboswitches in mammalian cells date: 2019-06-22 journal: curr opin chem biol doi: 10.1016/j.cbpa.2019.05.018 sha: doc_id: 266521 cord_uid: vovas81d molecular recognition by rna aptamers has been exploited to control gene expression in response to small molecules in mammalian cells. 
these mammalian synthetic riboswitches offer attractive features such as small genetic size and lower risk of immunological complications compared to protein-based transcriptional gene switches. the diversity of gene regulatory mechanisms that involve rna has also inspired the development of mammalian riboswitches that harness various regulatory mechanisms. in this report, recent advances in synthetic riboswitches that function in mammalian cells are reviewed focusing on the regulatory mechanisms they exploit such as mrna degradation, microrna processing, and programmed ribosomal frameshifting. envisioned practical applications of mammalian synthetic biology frequently require gene switches that recognize endogenous or exogenous chemical signals and turn on or turn off expression of proteins, which in turn regulate synthetic genetic circuits inside the cell. these chemical gene switches need to be flexible enough to be tailored to diverse chemical species, engineered to function as on-switch or off-switch, fine-tuned to adjust the sensitivity, and have a minimal genetic and metabolic footprint. 
protein-based engineered transcription factors (tfs), such as tet-on and tet-off systems derived from a bacterial tf, are among the most widely used tools to control mammalian gene expression in response to small molecule triggers [1] . however, there are a number of drawbacks of tf-based gene switches as generally applicable switches for mammalian and biomedical applications [2] [3] [4] : (i) adapting an engineered tf to respond to a new compound is challenging, (ii) expression of engineered tfs can trigger immunogenicity, (iii) genetic size and expression of an engineered tf can burden the host cell or the vector, and (iv) switch performance can be influenced by the expression level of the tf; therefore, optimization may be required. rna aptamers for desired ligands can be obtained relatively easily by in vitro selection (selex) [5, 6] ; therefore, riboswitches can potentially be engineered to respond to a variety of compounds more readily than engineered tfs. moreover, aptamers are typically small (20 to 100 nt). even with the additional nucleotides necessary to regulate gene expression, genetic footprints of riboswitches (few hundred nt) are small enough to satisfy vectors that have limited capacity (e.g. adeno-associated virus). furthermore, the lack of any translated proteins is expected to result in lower immunogenicity and metabolic burden on the host cell. in 1998, werstuck and green [7] demonstrated that insertion of an rna aptamer selected to bind the hoechst dye h33342 in the 5′ untranslated region of mrna allows repression of gene expression in response to the ligand in cho cells, presumably by blocking ribosome binding or scanning. it should be noted that this work predated the discovery of natural bacterial riboswitches [8, 9] , demonstrating that small molecules can directly and specifically affect gene expression in the absence of mediator proteins. 
importantly, rna aptamer-based regulation of gene expression allows bioengineers to harness the rich diversity of gene regulatory mechanisms that involve rna, for example, translation initiation, rna interference and micrornas, and rna splicing. for each rna mediated gene regulatory mechanism, there are multiple ways to couple its outcome with aptamer-ligand interaction, further enriching the potential diversity of synthetic riboregulators. an excellent review on rna-based gene switches in mammalian cells was recently published by ausländer and fussenegger [10] . therefore, this article reviews more recent developments in synthetic mammalian riboswitches with a focus on the diversity of the regulatory mechanisms harnessed by the riboswitches. in 2004, yen et al. demonstrated chemical regulation of gene expression in mammalian cells by embedding a hammerhead ribozyme in the untranslated regions (utrs) of an mrna [11] . as the 5′ and 3′ utrs are indispensable for translation of eukaryotic mrnas, self-cleavage within the mrna was expected to suppress gene expression. addition of toyocamycin to the cell culture medium resulted in nonspecific incorporation of the antiviral nucleotide analog into the mrna and statistical inactivation of the ribozyme activity [12] . while the ribozyme was not specifically regulated by a small molecule via an aptamer, this work paved the way for the subsequent riboswitches that employ allosterically regulated ribozymes (aptazymes) embedded in the 5′ and/or 3′ utr to chemically regulate gene expression in mammalian cells (figure 1a ) [13] [14] [15] [16] . this strategy continues to be popular, and the recent advances highlight new approaches to design and optimize aptazymes. earlier aptazyme-based riboswitches in mammalian cells were designed via trial-and-error, or based on medium-throughput or high-throughput screening in bacteria or yeast systems. 
however, it has been reported that ribozyme activity in living cells is not highly correlated among different cell types (bacteria, yeast, or mammalian cells), which is understandable considering the differences in translational mechanism, intracellular environment (rna binding proteins, ribonucleases, etc.), and mode of gene regulation by ribozymes in different cell types [17] . because high-throughput screening of riboswitches directly in mammalian cells is technically challenging, alternative design strategies focusing on aptazymes that function in mammalian cells are desirable. the farzan group demonstrated an intriguing strategy that combines empirical experimental data with quantitative modeling [18] . they first synthesized a panel of 32 aptazymes and measured their riboswitch performance in cultured mammalian cells, and attempted to correlate the experimental results with various design parameters such as calculated annealing energy and the number of hydrogen bonds in the communication module. the researchers concluded that the proximity of base pairing (or lack thereof) within the communication module to the ribozyme affects the ribozyme activity, and devised the 'weighted hydrogen bond score (whbs)' as a calculable parameter that correlates with the ribozyme performance. they used whbs as a guide to design aptazyme-based riboswitches using three aptamers with good switching characteristics [18] . it remains to be seen if the strategy can be extrapolated to different aptamers, ribozymes, and aptazyme architectures. dohno et al. addressed the aptazyme design problem with a unique approach [19] . instead of starting from an existing aptamer, they started with a rationally designed molecule targeted to mismatched dna/rna sequences. 
the naphthyridine carbamate tetramer with z-stilbene linker (z-ncts) designed by the group is known to bind xgg/xgg mismatches in dna and rna through watson-crick type hydrogen bonding between the naphthyridine moieties and the unpaired guanines [20] . they used this ligand as a 'molecular glue' to induce a tertiary contact between loops i and ii of a hammerhead ribozyme that is critical for ribozyme activity (figure 1b) . the tertiary contact between the two loops was disrupted by introducing an agg/ugg mismatch that is recognized by z-ncts. the engineered ribozyme was inserted into the 3′ utr of an mrna encoding firefly luciferase whose expression was reduced by 4-fold upon addition of z-ncts (0.6 mm) in hela cells. while it is notable that the switching was observed with a low ligand concentration, z-ncts exhibited cellular toxicity (lc50 = 0.8 mm) probably due to nonspecific interactions with cellular rnas. in contrast to these rational or semi-rational design efforts, high-throughput screening of aptazymes has mostly been executed in escherichia coli or saccharomyces cerevisiae. however, screening is generally labor and cost intensive while yielding sequence information for only a handful of 'hits'. screening also yields limited sequence-function information that can be exploited for further optimization or library design. to address the lack of comprehensive sequence-function relationship data for ribozymes, we developed an in vitro high-throughput ribozyme assay method using deep sequencing (figure 2 ) [21, 22, 23] . a library of ribozyme or aptazyme mutants is transcribed in vitro in a single tube as a mixture, and the resulting ribozymes (cleaved or uncleaved) are converted to dna templates for deep sequencing. the numbers of cleaved and uncleaved reads for each ribozyme variant are then counted to calculate cleavage efficiency under the reaction condition. 
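the read-counting step described above lends itself to a short illustration. the following is a minimal sketch, assuming per-variant counts of cleaved and uncleaved reads are already available for two conditions (without and with ligand); the data layout, function names, read-depth cut-off and counts are all hypothetical and do not reproduce the published analysis pipeline.

```python
def cleavage_efficiency(cleaved, uncleaved):
    """Fraction of reads that were cleaved; None if there are no reads."""
    total = cleaved + uncleaved
    return cleaved / total if total else None


def rank_candidates(counts, min_reads=100):
    """Rank variants by the ligand-dependent change in fraction cleaved.

    counts maps variant -> {"minus": (cleaved, uncleaved), "plus": (cleaved, uncleaved)}.
    Variants with fewer than min_reads reads in either condition are skipped.
    """
    ranked = []
    for variant, c in counts.items():
        if sum(c["minus"]) < min_reads or sum(c["plus"]) < min_reads:
            continue
        e_minus = cleavage_efficiency(*c["minus"])
        e_plus = cleavage_efficiency(*c["plus"])
        ranked.append((variant, e_minus, e_plus, e_plus - e_minus))
    # Largest absolute change first: candidates for ligand-responsive aptazymes.
    ranked.sort(key=lambda r: abs(r[3]), reverse=True)
    return ranked


# Hypothetical read counts for three variants.
example = {
    "variant_a": {"minus": (50, 950), "plus": (800, 200)},
    "variant_b": {"minus": (500, 500), "plus": (520, 480)},
    "variant_c": {"minus": (5, 5), "plus": (9, 1)},  # too few reads, skipped
}
for row in rank_candidates(example):
    print(row)
```

filtering on read depth before computing the fraction is a design choice made here so that variants with very few reads do not dominate the ranking.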
this strategy yields a complete sequence-function relationship for all members of the ribozyme library up to 10^4 variants or more [24], depending on library preparation and sequencing output. this method can unambiguously identify functional aptazymes if they exist, and provide a broader view of the sequence-function relationship that can be used to build a mechanistic model and/or to further optimize the aptazyme design. we used deep sequencing to identify guanine-activated ribozymes based on an hdv-like ribozyme which were subsequently used to control gene expression in hek 293 cells [21]. use of a low mg2+ concentration during in vitro transcription may have contributed to the positive correlation of the ribozyme activity in vitro and in mammalian cells. similar methods were also used to fine tune mammalian gene expression levels [25] and to screen for active ribozymes (but not aptazymes) directly in mammalian cells [26].
figure 2. schematic illustration of in vitro aptazyme assay strategy based on deep sequencing. an aptazyme library is prepared as a dna mixture which is transcribed into rna in vitro by t7 rna polymerase. the aptazyme pool is then converted to dna sequencing templates by reverse transcription and pcr. deep sequencing yields read counts of the cleaved and uncleaved fragments for every mutant in the library from which cleavage efficiency (% cleaved) is calculated. plotting cleavage efficiency of the mutants in the presence and absence of the ligand reveals aptazyme candidates which are subsequently evaluated as riboswitches.
programmed -1 ribosomal frameshifting (-1prf) [27] can result in translation of an alternative polypeptide from an mrna by shifting the translation reading frame by -1 nucleotide.
a canonical -1prf element consists of a 7-nt slippery sequence x xxy yyz (trinucleotides xxy and yyz representing the original frame and xxx yyy representing the -1 frameshift) followed by a stable secondary structure such as a pseudoknot or a stem-loop. the chang group first controlled -1prf in mammalian cells (figure 3a ) by incorporating theophylline and s-adenosyl-l-homocysteine (sah) aptamers [28] . an on/off ratio of 6 was reported but the frameshifting efficiency (fe) remained low. the same group used another -1prf stimulating pseudoknot from sars coronavirus to engineer theophylline inducible -1prf switches with up to fivefold activation of gene expression [29] . more recently, matsumoto et al. used the analogs of naphthyridine carbamate tetramers described above to control the formation of a pseudoknot, thereby chemically inducing -1prf (figure 3a , top) [30] . although up to 9-fold activation was observed in hela cells, the maximum fe was low (3.2%) in the presence of the pseudoknot inducing ligand. it is possible that the toxicity of the ligand prevented observation of higher fes in hela cells, as an fe as high as 24% was observed in rabbit reticulocyte lysate. besides controlling translation via an aptamer embedded in the targeted mrna, another strategy aims to control the microrna (mirna) processing pathway by an rna aptamer. the mirna product subsequently targets one or more mrnas in trans. an et al. first demonstrated that an aptamer incorporated in a mirna precursor allows chemical regulation of rna interference (rnai), in this case, by modulating the dicer mediated cleavage of a short-hairpin rna (shrna) [31] . the smolke group has developed an alternative strategy that controls the drosha mediated processing of primary mirna substrates by embedding an rna aptamer in the vicinity of the drosha cleavage site [32] . binding of the ligand to the aptamer inhibits drosha cleavage and upregulates the expression of the gene targeted by the mirna. 
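the slippery-sequence pattern x xxy yyz described above for -1prf can be located in a sequence with a short scan. the sketch below is only an illustration under stated assumptions: the function name and the `orf_start` parameter are hypothetical, no restriction is placed on which nucleotides x, y and z may be, and the downstream pseudoknot or stem-loop is not checked at all.

```python
from typing import List, Tuple


def find_slippery_sites(rna: str, orf_start: int = 0) -> List[Tuple[int, str, bool]]:
    """scan for 7-nt windows of the form X XXY YYZ.

    returns (0-based position, heptamer, in_frame) for each match, where
    in_frame is True when XXY and YYZ are codons of the reading frame that
    begins at orf_start (an assumed, caller-supplied coordinate).
    """
    rna = rna.upper().replace("T", "U")  # accept dna or rna letters
    sites = []
    for i in range(len(rna) - 6):
        h = rna[i:i + 7]
        # first three bases identical (X XX) and next three identical (Y YY)
        if h[0] == h[1] == h[2] and h[3] == h[4] == h[5]:
            # in the original frame the codons are XXY and YYZ, so the first
            # X must be the third base of the preceding codon
            in_frame = (i + 1 - orf_start) % 3 == 0
            sites.append((i, h, in_frame))
    return sites
```

a match only flags a candidate heptamer; whether it supports frameshifting depends on the stimulatory structure that follows it, which this scan ignores.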
recently, the group adapted an aptamer that they selected for (6r)-folinic acid (fa) to control a synthetic mirna precursor targeting il-2 receptors (figure 3b) [33]. although the modulation of the targeted gene expression by the ligand was rather modest (up to 35% activation), it was sufficient to observe a robust regulation of cell proliferation. the suess group reported a new strategy to control the dicer processing by inserting an aptamer that binds to tetr protein [34] near the cleavage site blocking the reaction in the presence of tetr [35]. they showed that addition of the tetr ligand doxycycline dissociates tetr from the mirna precursor and increases the formation of the mature mirna (figure 3c). doxycycline was able to tightly control the mature mirna level (1%-40% relative to unmodified mirna precursor), although the on/off ratio of the targeted reporter gene was more moderate (3). although this system requires an exogenous protein factor, it allows the use of doxycycline which has been used extensively in mammalian cells and animals. the pei group adapted another drosha modulation strategy originally reported by our group [36, 37] to control endogenous gene expression in cancer cell lines to induce apoptosis. a theophylline activated aptazyme was inserted between an inhibitory strand that blocks the 3' single-stranded region of the pri-mirna, thereby preventing the rna from being processed by drosha (figure 3d). the researchers targeted map4k4 in hepg2 cells [38] and bcl-2 in mcf-7 cells [39], and observed fourfold to fivefold increase in apoptosis. vogel et al. showed that exon skipping can be controlled by a tetracycline rna aptamer inserted near the 3' splice site along with a suicide exon (figure 3e) [40]. tetracycline induces exon skipping resulting in 5-fold activation of gene expression. in combination with an aptazyme device, the dynamic range increased to 7-fold in hek 293 cells.
an mrna can be targeted by an endogenous mirna by inserting one or more mirna target sequences in the 3' utr, a strategy sometimes used to achieve cell type-dependent gene expression. mou et al. controlled the accessibility of the mirna target sequence by an aptamer embedded near the target site in the mrna [41]. addition of tetracycline occluded the mirna target site resulting in upregulation of gene expression (figure 3f). the performance of the switch, as expected, was dependent on the mirnas and cell types used, but they observed tight regulation of gene expression with up to 19-fold activation in hela cells with a mir-21 target site. the diversity of gene regulatory mechanisms that involve rna has inspired and will continue to inspire synthetic rna-based gene switches in mammalian cells. it can be anticipated that riboswitches based on different regulatory mechanisms have advantages for different applications. for example, mirna-based riboswitches are more convenient for controlling endogenous gene expression because they operate in trans as opposed to the utr embedded aptazymes that function in cis. however, the existing mammalian riboswitches still exhibit poor and variable dynamic range and ligand sensitivity compared to the conventional tf-based gene switches. with further optimization and increasing availability of new aptamers and ligands for cellular applications, synthetic riboswitches should emerge as useful tools in synthetic biology of mammalian cells.
controlling mammalian gene expression with small molecules generation of cells expressing improved doxycycline-regulated reverse transcriptional transactivator rtta2s-m2 selecting the optimal tet-on system for doxycycline-inducible gene expression in transiently transfected and stably transduced mammalian cells tet-on systems for doxycycline-inducible gene expression systematic evolution of ligands by exponential enrichment: rna ligands to bacteriophage t4 dna polymerase in vitro selection of rna molecules that bind specific ligands controlling gene expression in living cells through small molecule-rna interactions sensing small molecules by nascent rna: a mechanism to control transcription in bacteria thiamine derivatives bind messenger rnas directly to regulate bacterial gene expression synthetic rna-based switches for mammalian gene expression control exogenous control of mammalian gene expression through modulation of rna self-cleavage identification of inhibitors of ribozyme self-cleavage in mammalian cells via high-throughput screening of chemical libraries controlling mammalian gene expression by allosteric hepatitis delta virus ribozymes genetic control of mammalian t-cell proliferation with synthetic rna regulatory systems conditional control of mammalian gene expression by tetracycline-dependent hammerhead ribozymes a ligand-dependent hammerhead ribozyme switch for controlling mammalian gene expression highly motif-and organism-dependent effects of naturally occurring hammerhead ribozyme sequences on gene expression this work surveyed a panel of ribozymes in various cellular and genetic contexts and showed that ribozyme activity is highly context dependent. 
the results imply that care must be taken when translating screening results obtained in nonmammalian systems to mammalian cells rational design of aptazyme riboswitches for efficient control of gene expression in mammalian cells restoration of ribozyme tertiary contact and function by using a molecular glue for rna the authors used a 'molecular glue' to restore a tertiary contact in a hammerhead ribozyme, presenting an alternative to canonical aptamers for controlling gene expression in mammalian cells naphthyridine tetramer with a pre-organized structure for 1:1 binding to a cgg/cgg sequence high-throughput assay and engineering of self-cleaving ribozymes by sequencing deep sequencing was used to exhaustively assay 512 aptazyme mutants without screening or selection in vitro. the aptazymes were then shown to function as riboswitches in mammalian cells high-throughput mutational analysis of a twister ribozyme applications of high-throughput sequencing to analyze and engineer ribozymes deep sequencing analysis of aptazyme variants based on a pistol ribozyme analyzing and tuning ribozyme activity by deep sequencing to modulate gene expression level in mammalian cells direct screening for ribozyme activity in mammalian cells changed in translation: mrna recoding by -1 programmed ribosomal frameshifting synergetic regulation of translational reading-frame switch by ligand-responsive rnas in mammalian cells rational design of a synthetic mammalian riboswitch as a ligand-responsive -1 ribosomal frame-shifting stimulator small synthetic molecule-stabilized rna pseudoknot as an activator for -1 ribosomal frameshifting artificial control of gene expression in mammalian cells by modulating rna interference through aptamer-small molecule interaction design of small molecule-responsive micrornas based on structural requirements for drosha processing regulation of t cell proliferation with drug-responsive microrna switches an rna aptamer that induces transcription although 
this system includes a protein component (tetr), tight control of natural microrna maturation was demonstrated using a widely used small molecule (doxycycline) in mammalian cells modulating endogenous gene expression of mammalian cells via rna-small molecule interaction conditional rna interference mediated by allosteric ribozyme regulation of map4k4 gene expression by rna interference through an engineered theophyllinedependent hepatitis delta virus ribozyme switch inducible bcl-2 gene rna interference mediated by aptamer-integrated hdv ribozyme switch a small, portable rna device for the control of exon skipping in mammalian cells conditional regulation of gene expression by ligand-induced occlusion of a microrna target sequence a new mode of engineered rna-based gene regulation in mammalian cells was demonstrated by controlling the accessibility of a mirna target site by aptamer-ligand interaction nothing declared. work in the author's laboratory mentioned in the article was financially supported by okinawa institute of science and technology graduate university. papers of particular interest, published within the period of review, have been highlighted as: of special interest of outstanding interest key: cord-272378-umvi0veu authors: subramanian, subbaya; steer, clifford j. title: special issue: microrna regulation in health and disease date: 2019-06-15 journal: genes (basel) doi: 10.3390/genes10060457 sha: doc_id: 272378 cord_uid: umvi0veu our understanding of non-coding rna has significantly changed based on recent advances in genomics and molecular biology, and their role is recognized to include far more than a link between the sequence of dna and synthesized proteins [...]. our understanding of non-coding rna has significantly changed based on recent advances in genomics and molecular biology, and their role is recognized to include far more than a link between the sequence of dna and synthesized proteins. 
micrornas (mirnas) are small regulatory rnas that play a crucial role in posttranscriptional gene regulation. more than 2500 mirnas have been identified and catalogued in humans and many of them are conserved in other species [1]. mirnas are implicated in almost every facet of fundamental cellular functions including development, senescence and disease. the past decade has experienced a remarkable increase in the understanding of mirna biogenesis, their target genes, mirna biomarkers and potential therapeutics for a growing number of disease conditions. rna-based markers and therapeutics have a potentially significant clinical impact, and many of the mirna-based therapies are at various stages of application and human clinical trial [2]. as background, non-coding rnas are divided into (i) transcription rnas (including both trna and rrna); (ii) small rnas, which are further subdivided into sirnas, mirnas, snornas, and snrnas; and (iii) most recently, long non-coding rnas, which are now known to encode short peptides [3]. micrornas are single-stranded non-coding rnas that are typically 18-25 nucleotides (nts) in length and are best known for their role in the post-transcriptional regulation of gene expression. they are the most abundant class of small endogenous non-protein coding rnas, and make up one of the largest well-conserved gene families found among viruses, plants and animals. the majority of mirna sequences in humans are transcribed from introns of non-coding and coding transcripts, with few transcribed from exonic regions. microrna genes are typically transcribed by polymerase ii or iii and generate primary mirnas (pri-mirnas), which can contain sequences for multiple mirnas and be hundreds of nts in length. these structures are then processed and cleaved by the drosha-dgcr8 complex, resulting in the formation of a hairpin-shaped stem-loop structure, which is known as the precursor mirna (pre-mirna) and is typically around 70 nts in length.
the pre-mirna is exported from the nucleus primarily by exportin 5. further processing takes place in the cytoplasm by dicer1-tarbp2, which is an rnase iii enzyme, resulting in a two-stranded duplex of mirna-mirna*. the duplex is typically 18-25 nt long, with one strand designated the guide strand and the other the passenger strand. finally, the guide strand is incorporated into the rna-induced silencing complex (risc), which is a large multiprotein mirna ribonucleoprotein complex that is the effector complex in modulating target gene expression. alternative pathways have been described that are drosha-dgcr8 independent as well as dicer-independent, and are likely to greatly advance our understanding of mirna biogenesis and involvement in conditions of health and disease. regulatory interactions between mirna and other noncoding rnas, including long noncoding rnas (lncrna) and circular rna (circrna), are now known to determine the cellular functional status and phenotype. micrornas use seed sequences (6-8 bases long) to bind microrna response elements (mres) located on their interacting partners, primarily at the 3'utr of coding transcripts [4]. however, it is possible that the frequency of mres in the entire transcriptome of a given cell contributes to the dynamic gene regulatory process by acting as a sponge for mature mirnas, thus regulating their functional availability. thus, the coding and noncoding transcripts sharing common mre sites compete with each other and define the gene expression profile of a given cell. these competing transcripts are collectively called competing endogenous rnas [5]. thus, gene expression regulation is a complex process involving dynamic interactions among mirnas, mrnas, lncrnas and circrnas. this complexity is increased multifold when these interacting partners are exchanged between cells via extracellular vesicles.
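the seed-based pairing between a mirna and an mre described above can be illustrated with a simple string search. this is a minimal sketch, not a target-prediction method: it assumes the seed is nucleotides 2-8 of the mature mirna (one common convention within the 6-8 base range quoted above), it accepts only perfect watson-crick complementarity (no g:u wobble), and the function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """reverse complement of an rna string written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def seed_sites(
    mirna: str,
    utr: str,
    seed_start: int = 2,  # 1-based start of the seed (assumed convention)
    seed_len: int = 7,    # assumed seed length within the 6-8 base range
) -> Tuple[str, str, List[int]]:
    """return (seed, site motif, 0-based positions of the motif in the utr)."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_start - 1 + seed_len]
    motif = reverse_complement(seed)  # what a perfectly paired mre looks like
    positions = []
    start = utr.find(motif)
    while start != -1:
        positions.append(start)
        start = utr.find(motif, start + 1)  # allow overlapping matches
    return seed, motif, positions
```

counting such motif occurrences across many transcripts gives a crude picture of how transcripts sharing mres could compete for the same mirna, although real site efficacy depends on context that this sketch does not model.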
recognizing this intricate system will significantly aid in our understanding of the health and disease process. we are beginning to study disease pathogenesis at the cellular, organ, and whole body levels; and the gut microbiota is increasingly recognized as a crucial player [6] . it is time to acknowledge each of these players as they take center stage in maintaining homeostasis and the normal physiological functioning of an organism. there are still significant gaps in understanding the complex regulatory mechanisms of mirnas; however, the field is advancing rapidly. further, it has been shown that a number of nuclear receptors are involved in the transcriptional regulation of mirna expression, including the small heterodimer partner (shp) and farnesoid x receptor (fxr). in general, mirnas are detected as (i) extracellular circulating mirna bound to different lipoproteins; (ii) part of a non-membrane ribonucleoprotein complex associated with argonaut proteins; and (iii) contained in exosomes as extracellular vesicles, where they act as nano-sized transporters involved in the communication between neighboring cells. the recent finding that mirnas are also transported from one cell to another via tunneling nanotubes underscores their importance in maintaining communication among all cell types, including those associated with cancer. this special issue of genes, entitled "microrna regulation in health and disease" consists of a series of articles spanning the clinical realm from colorectal cancer to pulmonary fibrosis. however, we begin with a research article by liu et al. who reported for the first time the existence of complemented palindromic small rnas (cpsrnas) from sars coronavirus, and propose that cpsrnas and palindromic small rnas (psrnas) constitute a novel class of small rnas [7] . 
such a discovery of cpsrnas could pave a way to find novel markers for pathogen detection and to reveal the mechanisms underlying infection or pathogenesis from a different perspective. in a study titled, "a two-cohort rna-seq study reveals changes in endometrial and blood mirnome in fertile and infertile women", rekker et al. compared mid-secretory phase samples between fertile and infertile women [8] . the study revealed 21 differentially expressed mirnas from the endometrium and one from blood samples. among the novel mirnas, chr2_4401 was validated and showed upregulation in the mid-secretory endometrium. in addition to the novel findings, the authors confirmed the involvement of mir-30 and mir-200 family members in mid-secretory endometrial functions. hueso et al. elegantly showed in their article that an exonic switch regulates the differential accession of micrornas to the cd34 transcript in atherosclerosis progression [9] . further, they proposed a new mechanism of mirna action, linked to a cryptic splicing site in the target-host gene, that would regulate the differential accession of mirnas to their cognate binding sites. li et al. studied the role of mirna-106a-5p in inhibiting c2c12 myogenesis via targeting pik3r1 and modulating pi3k/akt signaling [10] . their results showed that mir-106a-5p was elevated in aged muscles and dexamethasone (dex)-treated myotubes. the up-regulation of mir-106a-5p significantly reduced the diameters of myotubes accompanied by increased levels of muscular atrophy genes and decreased pi3k/akt activities. finally, mir-106a-5p was demonstrated to directly bind to the 3'-utr of pik3r1, thus, repressing pi3k/akt signaling. the microbiome appears to interact and perhaps influence an unlimited number of metabolic processes in health and disease. in their article, yuan et al. 
postulate that the altered nutrient composition and mirna expression in the colorectal cancer (crc) microenvironment selectively exert pressure on the surrounding microbiota, leading to alterations in its composition [11]. further, the authors present a detailed overview of the current understanding of the role of mirnas in mediating host-microbiota interactions in crc. "single nucleotide polymorphisms in mir143 contribute to protection against non-hodgkin lymphoma (nhl) in caucasian populations" by bradshaw et al. is the first to report a correlation between mirsnps in mir-143 and a reduced risk of nhl in caucasians [12]. further, it is supported by significant snps in high linkage disequilibrium (ld) in a large european nhl genome-wide association study (gwas) meta-analysis. axmann et al. compared the mirna profiles in serum and lipoprotein particles of healthy individuals with those of patients with uremia [13]. they observed a significant increase in cellular mirna levels using reconstituted high-density lipoprotein (hdl) particles artificially loaded with mirna, whereas incubation with native hdl particles yielded no measurable effect. based on the results, the authors concluded that there was no relevant effect of lipoprotein-particle-mediated mirna-transfer under in vivo conditions, though the mirna profile of lipoprotein particles can be used as a diagnostic marker. mullenbrock et al. carried out an elegant systems-level transcriptomic and proteomic study on the potential role of mirnas in pulmonary fibrosis [14]. they specifically targeted fibroblasts and myofibroblasts as the key effector cells responsible for the excessive extracellular matrix (ecm) deposition and fibrosis progression in both idiopathic pulmonary fibrosis (ipf) and systemic sclerosis (ssc) patient lungs.
the comprehensive analyses of mrna, mirna, and matrisome proteomic profiles in ipf and ssc lung fibroblasts revealed robust fibrotic signatures at both the gene and protein expression levels and identified novel fibrogenesis-associated mirnas whose aberrant downregulation in disease fibroblasts likely contributes to their fibrotic and ecm gene expression. somatostatin (sst) analogues were used to control the proliferation and symptoms of neuroendocrine tumors (nets) in an article by døssing et al., entitled "somatostatin analogue treatment primarily induce mirna expression changes and up-regulates growth inhibitory mir-7 and mir-148a in neuroendocrine cells" [15] . two mirnas which were highly induced by sst analogues, mir-7 and mir-148a, were shown to inhibit the proliferation of nci-h727 and cndt2 cells. sst analogues also produced a general up-regulation of the let-7 family members. sst analogues controlled and induced distinct mirna expression patterns among which mir-7 and mir-148a both have growth inhibitory properties. as fda-approved small rna drugs begin to enter the arena of clinical medicine, it is critical to expand both preclinical and clinical research studies for mirnas. a growing number of reports suggest a significant utility of mirnas as biomarkers for pathogenic conditions, modulators of drug resistance, and/or as drugs for medical intervention in almost all human health conditions. the pleiotropic nature of this class of nonprotein-coding rnas makes them particularly attractive drug targets for diseases with a multifactorial origin and few, if any, available treatments. the landscape of both diagnostic and interventional medicine will arguably continue to evolve as candidate mirnas pass successfully through phase 2 and 3 clinical trials. in this special issue of genes, we provide a series of articles that highlight micrornas as diagnostic, predictive and therapeutic agents for human disease. 
the development of bioinformatics programs to identify mirna-binding sites in target genes and their corresponding biological pathways, along with an expanding platform of in vitro and in vivo preclinical research models, has propelled mirnas into clinical medicine. the first sirna human trial was conducted in 2004 and, in 2018, the first sirna drug was approved, paving the way for a class of mirna transcripts whose active investigation began only a little more than 15 years ago. the future of human mirna clinical trials is absolutely guaranteed and that time has arrived. the development of mirna diagnostics and therapeutics is an exciting and potentially new frontier in treating diseases for which few treatment options exist. we believe and hope that this special issue of genes will be an important resource for a wide variety of audiences, including students at all levels, and established investigators who are interested in contributing to the remarkable and ever-expanding field of micrornas in health and disease. the authors declare no conflicts of interest. the potential for microrna therapeutics and clinical research a network of noncoding regulatory rnas acts in the mammalian brain the multilayered complexity of cerna crosstalk and competition cerna cross-talk in cancer: when ce-bling rivalries go awry microrna-mediated tumor-microbiota metabolic interactions in colorectal cancer velthut-meikas, a. 
a two-cohort rna-seq study reveals changes in endometrial and blood mirnome in fertile and infertile women an exonic switch regulates differential accession of micrornas to the cd34 transcript in atherosclerosis progression microrna-106a-5p inhibited c2c12 myogenesis via targeting pik3r1 and modulating the pi3k/akt signaling host-microrna-microbiota interactions in colorectal cancer single nucleotide polymorphisms in mir143 contribute to protection against non-hodgkin lymphoma (nhl) in caucasian populations serum and lipoprotein particle mirna profile in uremia patients systems analysis of transcriptomic and proteomic profiles identifies novel regulation of fibrotic programs by mirnas in pulmonary fibrosis fibroblasts somatostatin analogue treatment primarily induce mirna expression changes and up-regulates growth inhibitory mir-7 and mir-148a in neuroendocrine cells key: cord-023605-zibwrv76 authors: nan title: genetics and biotechnology of bacilli: a.t. ganesan and j.a. hoch (eds.): (proceedings of the second international conference on the genetics and biotechnology of bacilli, stanford university, stanford, ca, july 6–8, 1983) academic press inc., orlando, fl, 1984, xviii + 421 pp. ($41.50) isbn 0-12-274 60-9 date: 2003-01-16 journal: gene doi: 10.1016/0378-1119(87)90076-x sha: doc_id: 23605 cord_uid: zibwrv76 nan [ 179 authors; 48 articles. role of retroviruses: in neoplasia, in nature. w.p. rowe memorial lecture: changing dogmas in retrovirology (j.a. levy). retroviruses and murine model system endogenous leukemia viruses, xenotropic retroviruses, computer age, mammary tumorigenesis, proviral genome of radiation leukemia virus, friend leukemia cells, envelope gene, mhc, mmtv, antiviral drugs. retroviruses and the vertebrate model system: oncogenes in rats, avian leukemia viruses, feline retroviruses, bovine leukemia virus, ultraviolet-induced retroviruses, susceptibilities to 2-deoxy-d-glucose, functional heterogeneity, rat sarcoma galliera. 
retroviruses and human pathology: htlv viruses, retroviral genes in human reproductive tissue, aids, t-cell leukemia, ltrs of leukemia viruses, lav proteins, lymphocyte immune functions, anti-htlv-iii and anti-t-cell antibodies, seroepidemiological study of lav by elisa in aids, antibodies to saids, retroviruses expressed during oocyte maturation, psoriasis, human milk rnase, slow infections. retroviruses and oncogenes: retroviruses with two oncogenes, ras oncogenes, raf oncogene, cell attachment, amplification of oncogenes, nucleotide sequence of erba, osteosarcoma oncogene, c-myc gene, fes oncogene, myc and myb oncogenes, detection of oncogene transcripts]. [the integrated prophage (a.m. campbell); scatology and biotechnology (s. falkow). chromosomal organization: ribosomal and transfer rna genes; replication origin and terminus; membrane; gene amplification; cloning and gene fusion; tn917 insertional mutagenesis; sucrose system. secretion: β-lactamases; levansucrase; secretion vectors; serine protease; subtilisin control; α-amylase promoter and signal peptide. transcription: phage φ29; temporally expressed promoter; lipiarmycin as inhibitor of rna polymerase; phage spp1 as vector. cloning: inducible promoter; thiol-activated cytolysin; chloramphenicol inducibility; stability of bifunctional plasmid; cloning in bacilli; chromogenic detection. synthesis of sporulation-associated products: cloning and sequence of the spo0a locus (f. ferrari and j. hoch); amplification of sporulation genes; (84 authors; 25 articles. sol spiegelman's obituary; molecular basis of gene expression (d.d. brown). gene expression and its regulation: sv40 enhancer; cloned alcohol dehydrogenase in drosophila. dna methylation and globin gene expression; histone gene transcription; control of β-globin transcription; transcriptional enhancement; introns in murine kappa light chain genes.
in vivo gene transfer and development: drosophila; sea urchin; transgenic mouse; gene transfer into mice; surface receptors; nuclear transplantation; neuropeptide genes; homologous recombination in l cells. viral gene and oncogene systems: splicing of mrna; regulation of thymidine kinase expression; control of α genes of hsv1; early adenovirus transcription; mutational analysis of ad5 e1a gene; adenovirus va1 rna translational enhancer; early sv40 transcription; transforming genes of chicken and human b cell lymphomas; myc oncogene). biological regulation and development mueller); estrogens, brain cell functions, and behaviour (b.s. mcewen); transmembrane-mediated communication (g.l. nicolson); hormonal control of nacl and water transport in epithelia (a. taylor and l.g. palmer); metabolism of cell surface receptors - possible roles in cell sensitivity and responses to activators development and characterization of chinese hamster cell lines (cho); genetic manipulation cho lines genetic systems developed in cho lines; intermediary metabolism; cell structure and behaviour; mechanism of genetic variation; lineages of cho lines the bacteria, a treatise on structure and function biology: taxonomy of the pseudomonas; control of pseudomonas putida growth on agar surfaces; evolution of enzyme structure and function in pseudomonas; outer membrane permeability of pseudomonas aeruginosa; toxins and virulence factors of pseudomonas aeruginosa. ii. genetics: chromosome mobilization and genomic organization in pseudomonas degradative plasmids in pseudomonas; gene cloning and manipulation in pseudomonas; cloning of pseudomonas genes in escherichia coli. iii. biochemistry: biosynthetic and catabolic features of amino acid metabolism in pseudomonas; catabolic potential of pseudomonas cepacia; terpenoid metabolism by pseudomonas; biochemistry of aromatic hydrocarbon degradation in pseudomonas genetics, development, and evolution.
17th evolution and morphogenesis, gene action and morphogenesis in plants, mobile elements in maize, mutation, apical meristems and developmental selection in plants, properties of mutable alleles recovered from mutator key: cord-252781-06hs9pit authors: lai, wing-fu title: cyclodextrins in non-viral gene delivery date: 2013-10-05 journal: biomaterials doi: 10.1016/j.biomaterials.2013.09.061 sha: doc_id: 252781 cord_uid: 06hs9pit cyclodextrins (cds) are naturally occurring cyclic oligosaccharides. they consist of (α-1,4)-linked glucose units, and possess a basket-shaped topology with an “inner–outer” amphiphilic character. over the years, substantial efforts have been undertaken to investigate the possible use of cds in drug delivery and controlled drug release, yet the potential of cds in gene delivery has received comparatively less discussion in the literature. in this article, we will first discuss the properties of cds for gene delivery, followed by a synopsis of the use of cds in development and modification of non-viral gene carriers. finally, areas that are noteworthy in cd-based gene delivery will be highlighted for future research. due to the application prospects of cds, it is anticipated that cds will continue to emerge as an important tool for vector development, and will play significant roles in facilitating non-viral gene delivery in the forthcoming decades. cyclodextrins (cds) are cyclic (a-1,4)-linked oligosaccharides of a-d-glucopyranose [1] . the earliest reference to them can be dated back to 1891 when villiers described a crystalline substance called "cellulosine", which was isolated from a bacterial digest of starch. that substance lacks reducing properties but is resistant to acid hydrolysis [2] . it was later known as "cyclodextrin". the ring structures of cds were elucidated by freudenberg and colleagues in the late 1930s [3, 4] . 
somewhat later the basic physicochemical characteristics of cds (including reactivity, cavity size, chemical structure, solubility and the ability to form inclusion complexes with guest molecules) were described by cramer in his book "einschlussverbindungen" [5]. over the years, cds have gained prominence in a multitude of pharmaceutical areas, ranging from chiral separation of basic drugs [6] to controlled drug delivery [7–12]; however, their potential in gene delivery has received relatively less attention. in this article, we will first discuss the properties of cds for gene delivery, followed by an overview of the use of cds in development and modification of non-viral gene carriers. finally, areas that are noteworthy in cd-based gene delivery will be highlighted for future research. it is hoped that this article can not only provide a synopsis of recent advances in the field, but can also offer practical insights for future development of cd-based technologies for gene delivery. the most common cds are α-, β- and γ-cds, consisting of six, seven and eight α-d-glucopyranose units, respectively (fig. 1). owing to the presence of extensive hydroxyl groups, cds are soluble in water (table 1). however, because of the relatively high crystal energy of cds [13], molecules in the crystal state are strongly bound. therefore, the aqueous solubilities of cds are generally lower than those of the comparable linear dextrins. in addition, compared to α- and γ-cds (whose aqueous solubilities at ambient conditions are 1.5 and 2.4 g/l, respectively), the aqueous solubility of β-cd is only 0.2 g/l [14]. this is partially attributed to the formation of internal hydrogen bonds between secondary hydroxyl groups, thereby diminishing the capacity of β-cd molecules to form hydrogen bonds with surrounding water molecules [13]. structurally, cds have both hydrophilic exteriors and apolar cavity interiors.
this provides a micro-environment for encapsulation and solubilization of hydrophobic "guest" molecules [15, 16], and enables cds to be exploited as excipients of chemical drugs. examples of drugs formulated with cds are listed in table 2. apart from being used in drug formulations, the prospects of cds in delivering genes and other nucleic acids have become more evident in recent years [17, 18]. agrawal's team was one of the first to study the use of cds (and their analogs) in facilitating cellular uptake of oligonucleotides [19, 20] and in modulating oligonucleotide-induced immune stimulation [21]. later, abdou et al. examined the ability of various native and derivatized cds to enhance the action of an 18-mer phosphodiester oligodeoxynucleotide (od) (which is complementary to the initiation region of the mrna coding for the spike protein, and contains the intergenic consensus sequence of an enteric coronavirus) against viral growth in human adenocarcinoma cells [22]. they discovered that compared to the naked od, which resulted in only 12–34% of viral inhibition in vitro, up to 90% of viral inhibition could be obtained when the od was complexed with a β-cd derivative, 6-deoxy-6-s-β-d-galactopyranosyl-6-thio-cyclomaltoheptaose, in a molar ratio of 1:100 [23–27]. this, along with other studies [27, 28], has paved the way for subsequent intense research on cd-mediated gene delivery.
e-mail address: rori0610@graduate.hku.hk.
fig. 1. the structures and space-filling models of (i) α-cd, (ii) β-cd and (iii) γ-cd. in the space-filling models, hydrogen, carbon and oxygen atoms are colored in black, white and gray, respectively.
table 1. physical properties of α-, β- and γ-cd.
α-cd β-cd γ-cd
cds are appealing for gene delivery applications not only because of their binding affinity to nucleic acids [17, 29] but also because of their ability to attenuate the cytotoxicity of other gene carriers. the latter has been supported by a previous study [30], in which a series of linear cationic β-cd-based polymers (βcdps) were constructed via condensation of diamino-cd monomers with diimidate comonomers. compared with polyamidines lacking cds, the ic50 values of the βcdps in bhk-21 cells were remarkably higher [30]. this showed that cd incorporation into the backbone of the cationic polymer substantially lowers the polymer cytotoxicity. in addition to the properties mentioned above, cds are effective absorption enhancers in therapeutics delivery. as illustrated in vitro by skin permeation studies [31], after complexation with β-cd, meglumine antimoniate (ma) led to a 2-fold increase in the antimony flux. a similar absorption-enhancing effect of cds was shown by using dimethyl-β-cd, which, at a concentration of 5% (w/v), elevated the permeability of the nasal mucosa to the intranasally administered neurotrophic peptide, org2766, and enhanced the absorption in rabbits 1- to 2-fold from 10 ± 6% (mean ± s.d.) for administration of the peptide alone to 17 ± 8%, and in rats 5-fold from 13 ± 4% to 65 ± 21% [32]. all of this evidences the absorption-enhancing property of cds, and it is this property that may also facilitate gene delivery. the latter has been substantiated by an earlier study, which successfully improved adenoviral-mediated gene transfer to the rat jejunum by using cds [33]. the improvement has been ascribed to the cd-mediated enhancement of viral binding and internalization into the host cells. in fact, cds have comparatively large molecular weights (>972 da) and low octanol/water partition coefficients.
this, along with the presence of a plurality of hydrogen donors and acceptors on their molecules, has made cds unlikely to be directly permeable to lipophilic biological membranes such as skin and gastrointestinal mucosa [34–37]. it was hypothesized that cds enhance absorption mainly by increasing membrane permeability through complexation with membrane phospholipids and cholesterols [38]. this hypothesis was supported by an earlier study, which depleted membrane cholesterol from porcine, bovine and human erythrocytes by incubating the cells in suspensions of lecithin liposomes [39]. the study found that membrane permeability remained unaltered when the level of cholesterol removal was up to 30%, but upon more extensive cholesterol depletion, the transfer rates of nonelectrolytes and organic acids penetrating the membrane were considerably elevated [39]. however, the biphasic response of cholesterol depletion on membrane permeability in erythrocytes appeared not to be reproducible in artificial lipid membranes [39]. the effect of cholesterol depletion on membrane permeability is still obscure, and further study is required before the molecular basis of cd-mediated absorption enhancement can be fully elucidated. cds have practical potential in gene delivery, but due to their failure to form stable complexes with plasmid dna (pdna) [40], native cds have limited transfection efficiency. cds are, therefore, usually derivatized prior to their use in gene transfer. a good example of cd derivatives is polycationic amphiphilic cds (pacds) (which were constructed by the amendment of the facial anisotropy of the truncated-cone cd torus via installation of cationic and hydrophobic elements in the "skirt" or "jellyfish" architectures [41, 42]).
table 2. examples of drugs whose cd-containing formulations have been marketed.
by fine adjustment of the molecular parameters (e.g.
charge density, hydrophilic-hydrophobic balance, nature of the functional groups, and spacer length), the dna complexation capacity and transfection efficiency of pacds can be modulated [43–45]. more examples of cd derivatives are shown in fig. 2. they were fabricated by modification of β-cd with a pyridylamino, alkylimidazole, methoxyethylamino or primary amine group at the 6-position of the glucose units [40]. studies with 32p-labeled pdna indicated that these derivatives promoted cellular uptake of the transgene much more efficiently than native cds [40]. among these derivatives, those having unmodified 2- and 3-hydroxyls and possessing an amino, pyridylamino or butylimidazole group at the 6-position were found to have the best performance in cos-7 cell transfection [40]. these molecular constructs warrant further development as gene carriers. aside from being used directly for gene delivery after derivatization, cds have been used as linking agents or structural modifiers for development of gene carriers. by functioning as linking agents, cds are used to covalently link other polymers together to form larger molecular constructs as gene carriers. one example of polymers fabricated by this approach was synthesized by linking low molecular weight poly(ethylenimine) (pei), which is a cationic aziridine polymer exhibiting a high proton buffering capacity over a broad range of ph [46], with β-cd by using tosyl chloride to first generate amine-reactive tosyldeoxy-β-cd, which subsequently reacted with pei to generate the cd-pei conjugate (pei-β-cd) [25]. in vitro studies showed that pei-β-cd was basically nontoxic to hek293 cells at the working concentration for pdna delivery. compared to unmodified pei, pei-β-cd induced nearly 4-fold higher luciferase expression [25].
by anchoring human insulin (which was derivatized with a hydrophobic palmitate group) onto its polyplexes, the transfection efficiency obtained could even be over an order of magnitude higher than that provided by unmodified pei, either with or without the derivatized insulin [25]. notwithstanding the prospects discussed above, it is worth noting that the enhancing effect of cds on pei is valid only under the premise of proper optimization of the grafting ratio of cds. this was revealed by the observation that modification of 5%, 10% and 16% of the amine groups in pei with cds reduced the luciferase activity by 1, 2 and 4 orders of magnitude, respectively [26]. this reduction was hypothesized to be due to the altered pka profile of the pei amines, resulting in a decrease in the efficiency of endosomal release. such a hypothesis was supported by the evidence that compared to unmodified pei, pei-β-cd exhibited a lower buffering capacity [26]. another example of cd-linked polymers is linear βcdps, which were synthesized from difunctionalized cds and difunctionalized comonomers [47]. similar to pei-β-cd, high efficiency of this type of polymer in gene delivery necessitates fine structural optimization. results of the luciferase activity assay in bhk-21 cells showed that the highest transfection efficiency was achieved by the linear βcdp with 6 methylene units [30]. the transfection efficiency of 64 and 10% of that achieved by the one with 6 methylene units, respectively [30]. these results evidenced that different levels of cd incorporation can influence the transfection efficiency of the polymer [30]. besides forming linear polymers, cds have been used to fabricate star-shaped vectors, in which cds function as the cores and other polymers as the arms. examples of vectors formed by this approach are listed in table 3 [48–52].
these vectors can facilitate pdna delivery at different levels, and are worth further development for possible use in practical situations. apart from native cds, derivatives of cds have been used as linking agents. previously, huang et al. cross-linked pei by using (2-hydroxypropyl)-β-cd (2-hy-β-cd) and (2-hydroxypropyl)-γ-cd (2-hy-γ-cd) [53]. the two resulting polymers exhibited lower cytotoxicity than pei 25 kda, and had transfection efficiency in skov-3 cells approximately 20 and 2 times higher than that achieved by pei 600 da and pei 25 kda, respectively. more recently, β-cd has also been converted into the carboxymethyl-β-cd sodium salt, which has been combined with quaternized chitosan to form a dna carrier. the carrier could not only adsorb pdna perfectly at a polymer/dna mass ratio of 4:1, but could also reach 40% of the transfection efficiency attained by liposomes [54]. however, none of the polymers discussed above has been compared with those fabricated with native cds. how hydroxy- and carboxy-alkylation of cds affect the performance of the resulting polymers in gene delivery is still unknown. to further improve the performance of cd-linked polymers, one of the commonly adopted strategies is ligand conjugation. one example of ligands used is the cy11 peptide, which can facilitate fibroblast growth factor receptor (fgfr)-mediated endocytosis of polyplexes [55]. compared to unmodified pei-β-cd, cy11-conjugated pei-β-cd showed higher transfection efficiency in cos-7 and hepg2 cells [55]. this corroborates the applicability of the cy11 peptide to improve the efficiency and target specificity in gene delivery. another example is transferrin, an iron-binding and -transport protein which functions as a targeting moiety towards various cancer cell lines (including those of colon cancer, ovarian cancer and glioblastoma) [56].
in a previous study, transferrin was conjugated to the poly(ethylene glycol) (peg)-adamantane (peg-ad) conjugate, which was subsequently incorporated into dna nanoparticles of the linear imidazole-conjugated βcdp [57]. the transferrin-peg-ad conjugate could not only self-assemble with the nanoparticles via inclusion complex formation between adamantane and the cd moieties on the particle surface, but could also retain high receptor binding activities. luciferase activity assays in k562 leukemia cells found that the transfection efficiency of the nanoparticles surface-modified with transferrin-peg-ad conjugates was 4-fold higher than that of the unmodified counterparts [57]. these results reveal the promise of ligand conjugation in enhancing the performance of polymeric vectors in gene transfer. aside from functioning as linking agents, cds can be used to structurally modify existing gene carriers. here cds are utilized in two ways. the first is as threading devices. this is exemplified by the works from li's group, which has fabricated a range of supramolecular polyrotaxanes consisting of cationic α-cd rings threaded and blocked on a poly[(ethylene oxide)-ran-(propylene oxide)] (p(eo-r-po)) random copolymer [58]. approximately 12 α-cd rings were found in each molecule of p(eo-r-po), with the rings being located selectively on eo segments of the copolymer. in hek293 cells, the polyrotaxanes fabricated showed higher transfection efficiency than pei 25 kda [58], and deserve further evaluation as gene carriers for both in vitro and in vivo applications. the second way of utilizing cds is as pendants. an example of vectors developed by this approach is the polyamidoamine (pamam) dendrimer conjugates with α-, β-, and γ-cds. the conjugates could condense pdna and protect it from dnase i-mediated degradation [59].
table 3. examples of cd-based polymers with a star-shaped architecture.
vector: a star-shaped polymer consisting of a β-cd core and polyamidoamine (pamam) dendron arms. description: the polymer showed more than 1-fold higher transfection efficiency, but lower cytotoxicity, than the pamam control (g4, with an ethylenediamine core) in human neuroblastoma sh-sy5y cells. ref.: [48]
vector: a star-shaped polymer consisting of a γ-cd core and folate (fa)-modified oligoethylenimine (oei) arms. description: the polymer exhibited low cytotoxicity, and demonstrated the ability to target and deliver dna to specific tumor cells which over-expressed folate receptors (frs). in addition, the polymer was reported to be able to recover and recycle frs onto cellular membranes. this can facilitate continuous fr-mediated endocytosis of the polyplexes. ref.: [49]
vector: a cd derivative containing poly(l-lysine) (pll) dendrons. description: the derivative was prepared by click conjugation of per-6-azido-β-cd with the propargyl focal point pll dendron. it could not only load methotrexate drugs and show a sustained release behavior, but could also complex with pdna for transfection. ref.: [50]
vector: a star-shaped polymer consisting of a β-cd core and poly(2-(dimethylamino)ethyl methacrylate) (pdmaema) arms. description: the polymer showed much lower cytotoxicity but higher transfection efficiency than high molecular weight pdmaema homopolymers. ref.: [51]
vector: a star-shaped polymer consisting of a β-cd core and poly(poly(ethylene glycol)ethyl ether methacrylate)-modified pdmaema arms. description: compared to the polymer consisting of a β-cd core and unmodified pdmaema arms, this polymer demonstrated higher transfection efficiency. ref.: [51]
vector: a star-shaped polymer consisting of an α-cd core and oei arms. description: at an n/p ratio of 8 or higher, the polymer complexed with dna to form polyplexes with a diameter of 100–200 nm. it gave transfection efficiency comparable to, or even higher than, that of pei 25 kda in hek293 and cos7 cells, but its cytotoxicity was significantly lower. ref.: [52]
in vitro studies showed that the α-cde conjugate (which is a dendrimer conjugate with α-cd) had higher transfection efficiency than conjugates with β- and γ-cds [59]. its transfection efficiency in nih3t3 and raw264.7 cells was superior to lipofectin, and was 100-fold higher than that of the unmodified dendrimer [59]. to boost the efficiency of gene delivery, α-cde prepared from the g2 dendrimer was galactosylated with various degrees of substitution (ds). compared to the unmodified counterpart, the galactosylated conjugate with a ds value of 4 exhibited higher transfection efficiency in hepg2, nih3t3 and a549 cells. such an increase in transfection efficiency upon galactosylation, however, was found to be insensitive not only to the presence of competitors (asialofetuin and galactose) during transfection but also to the availability of asialoglycoprotein receptors on the cells to be transfected [60]. a similar phenomenon also happened in α-cde after mannosylation, which led to a receptor-independent increase in the efficiency of transfection [61]. the mechanism underlying this transfection enhancement is still unclear, but is proposed to be partially caused by the interaction between the modified conjugates and the intracellular galactose- or mannose-binding lectins [60]. such an interaction was thought to have increased the efficiency of intracellular trafficking and nuclear translocation of the polyplexes [60]. in addition to ligand conjugation, the gene delivery efficiency of α-cde can be augmented by fine adjustment of structural parameters. for instance, compared to the α-cde conjugates synthesized from g2 and g4 dendrimers, the one constructed with the g3 dendrimer demonstrated higher transfection efficiency [62]. furthermore, conjugates having different ds of α-cd not only displayed different membrane-disruptive abilities on calcein-encapsulated liposomes [63], but also showed different cytotoxic activities and gene delivery capacities.
in comparison with those having ds values of 1.1 and 5.4, the conjugate having a ds value of 2.4 showed higher transfection efficiency in nih3t3 and hepg2 cells, and could deliver pdna more efficiently to spleen, liver and kidney after intravenous administration [63]. these results point to the importance of structural optimization of the conjugate for transfection. more recently, folate (fa)-appended α-cdes (fa-α-cdes) with various ds of fa have been synthesized from the g3 dendrimer [64]. as fa is known to have negligible toxicity, low immunogenicity and high affinity to frs (whose isoform frα is over-expressed frequently in malignancies but not in normal tissues) [65–68], it is expected that after fa incorporation into α-cde, the resulting conjugate will exhibit higher tumor cell specificity and transfection efficiency. however, owing to the low receptor-binding activities of the resulting conjugate, no significant difference in transfection efficiency has been observed before and after fa incorporation [64].
table 4. examples of polymers modified with cds for gene delivery.
polymer: a polypseudorotaxane of the peg-grafted α-cd/pamam dendrimer conjugate with either α-cd or γ-cd. description: the polypseudorotaxane allowed sustained release of pdna. upon intramuscular injection into mice, the transfection efficiency of the one with γ-cd lasted for at least 14 days. ref.: [69]
polymer: a polyrotaxane with β-cd and α-cd rings threaded onto ionene-6,10. description: the polyrotaxane formed stable complexes with pdna and with a pdna/sirna mixture. it enhanced cellular uptake of the nucleic acid, and demonstrated low cytotoxicity. ref.: [70]
polymer: a pamam starburst dendrimer conjugate. description: the transfection efficiency of the conjugate was significantly higher than that of the unmodified α- and β-cde conjugates in a549 and raw264.7 cells. it exhibited higher endosomal escape efficiency, and successfully delivered the transgene to the nucleus 6 h after transfection in a549 cells. 12 h after intravenous administration to mice, this conjugate provided higher gene transfer activity in the kidney than unmodified α- and β-cde conjugates. ref.: [71, 72]
polymer: a pamam starburst dendrimer conjugate with lactose-bearing α-cd. description: in hepg2 cells, the conjugate exhibited higher transfection efficiency than unmodified pamam, lactosylated pamam and α-cde. it also showed negligible cytotoxicity even up to a carrier/dna charge ratio of 150/1. compared to jetpei™-hepatocyte, the conjugate demonstrated higher transfection efficiency in hepatocytes 12 h after intravenous administration to mice. ref.: [73]
polymer: β-cd-modified hyperbranched pamam. description: the polymer was fabricated by michael addition copolymerization of n,n′-methylene bisacrylamide with 1-(2-aminoethyl)piperazine and mono-6-deoxy-6-ethylenediamino-β-cd. it demonstrated an ability to condense and deliver dna. ref.: [74]
polymer: a polypseudorotaxane with γ-cd rings threaded onto linear pei. description: compared to unmodified linear pei, the polypseudorotaxane was more efficient in facilitating cellular uptake of pdna in nih/3t3 cells, and displayed much lower cytotoxicity. ref.: [75]
polymer: β-cd-conjugated poly(ε-lysine). description: in nih-3t3 cells, the transfection efficiency of the polymer was four orders of magnitude higher than that of unmodified poly(ε-lysine), and was 10 times higher than that of linear pei. ref.: [76]
to improve the fr-binding activity of fa-α-cde, peg has been used as a spacer between the dendrimer and fa, forming fa-peg-α-cde [64]. among the three fa-peg-α-cdes (ds = 2, 5 or 7) fabricated, the one having a ds value of 5 performed the best in transfection, and demonstrated superior binding ability to both pdna and frs. 12 h after intratumoral injection into mice, fa-peg-α-cde (ds = 5) exhibited remarkably higher pdna delivery efficiency than α-cde. these results indicate the potential of fa-peg-α-cde (ds = 5) as a carrier for tumor-targeted gene delivery.
besides fa-peg-α-cde and other vectors that have been discussed in this section, there are many other polymers developed from modification of existing polymers using cds. some of them are listed in table 4 [69–76]. these polymers have illustrated the potential of cds in enhancing the performance of polymeric systems in transfection. in the sections above, we have delineated some of the major approaches of employing cds for applications in gene delivery. these approaches are summarized in fig. 3. for future research, one area that deserves exploration is drug/gene co-delivery. as cds have an excellent drug loading capacity, along with their potential in gene transfer, cds are anticipated to emerge as attractive candidates for simultaneous transport of drugs and genes. such technical viability has been corroborated by hu et al., who conjugated tegafur to pei-β-cd to fabricate a prodrug of tegafur for drug/gene co-delivery [77]. they observed that at an n/p ratio of 25, the conjugate could condense pdna into complexes with a diameter of around 150 nm. however, conjugation of tegafur reduced the number of primary amines on pei, thereby compromising the dna binding ability and buffering capacity of pei-β-cd. compared with unmodified pei-β-cd, the transfection efficiency of pei-β-cd-tegafur in b16f10 and cos7 cells was much lower [77]. in addition to pei-β-cd-tegafur, the fu-pei-β-cd (fpc) conjugate, which was synthesized from 5-fluoro-2′-deoxyuridine (fdurd) and pei-β-cd, has been reported as a bifunctional anticancer prodrug [78]. compared to fdurd, fpc demonstrated stronger anti-proliferative and cytotoxic activities in glioma cells, and led to around 10-fold higher cellular uptake [78]. however, before pei-β-cd-tegafur and fpc can be used as prodrugs practically, more stringent tests are required to confirm whether the pharmacokinetic and pharmacodynamic properties of the drugs are the same after conjugation with pei-β-cd.
recently, a hydrogen bonding strengthened hydrogel has been fabricated by radical copolymerization of peg methacrylated β-cd (peg-β-cd) with 2-vinyl-4,6-diamino-1,3,5-triazine (vdt) [79]. experimentation showed that the hydrogel loaded ibuprofen (ibu) successfully and could control the rate of drug release. on top of that, the surface of the hydrogel anchored pdna through hydrogen bonding between diaminotriazine and the dna base pairs, and hence allowed reverse gene transfection in cos-7 cells cultured on the gel surface. such multifunctional potential makes it possible to further develop the hydrogel into a tissue engineering scaffold for drug/gene co-delivery. apart from drug/gene co-delivery, the prospects of stem cell transfection are also worth noting. such prospects have been illustrated by a recent study, in which a cell-penetrating peptide containing the protein transduction domain (ptd) of the hiv-1 tat protein has been conjugated to pei-β-cd, forming tat-pei-β-cd [80]. at an n/p ratio of 20, the polymer displayed reasonable transfection efficiency in somatic cell lines (34–40% in cho and hepg2, and 50–80% in 293t, u138 and u87). in placenta-derived mesenchymal stem cells (pmscs), the level of transgene expression achieved by tat-pei-β-cd was comparable to that obtained by fugene 6, and was approximately twice that mediated by pei-β-cd after 48 and 96 h of post-transfection incubation (fig. 4a). before and after transfection, the phenotypic profile of pmscs was examined. consistent expression profiles (including negativity for hla-dr, cd45, cd38 and cd34, and positivity for cd147 and cd90) were observed. this suggests the maintenance of the phenotypic profile of pmscs after tat-pei-β-cd-mediated transfection. moreover, mtt assays confirmed that the cytotoxicity of tat-pei-β-cd was negligible at the working concentration, with no appreciable loss of cell viability being observed in pmscs (fig. 4b).
as the therapeutic potential of stem cells relies considerably on the cells' homing ability and their capacity to synthesize and secrete therapeutic proteins [81–83], developing an efficient yet non-toxic and non-immunogenic gene carrier for stem cells will be of immense practical importance. the last area that deserves future attention is small rna transfer. since advances in cd-mediated rna delivery have been surveyed elsewhere [84], here we will not dwell on them. but it is worth pointing out that though the focus of our discussion in this article has been restricted to gene delivery, the possibility of a gene vector to interact electrostatically with dna may imply that the same vector can complex with rna [85]. this is confirmed by arima et al., whose fa-peg-α-cdes have not only been found to mediate gene delivery [64], but have also been able to deliver sirna to elicit rna interference (rnai) in tumor-bearing mice [86]. detailed evaluation of the efficiency of each of the cd-based gene vectors will doubtless be needed if those systems are to be used for rna transfer. moreover, owing to the extra vulnerability of rna to enzymatic degradation, additional challenges will be encountered when rna rather than dna is to be delivered [85]. but given the emergence of rnai and the bright prospects of rna technologies in diverse areas, ranging from therapeutic target validation [87, 88] to longevity enhancement [89], if existing cd-based gene delivery technologies turn out to be applicable to rna transfer, not only can their medical applications be significantly broadened, but a vista of new opportunities will also be opened up for rna-mediated therapies. finally, owing to the issue of intellectual property, innovations documented in the patent literature sometimes may not have been reported in scientific journals.
table 5 (partial): the matrix was found to be able to transfect ccd fibroblast cells. [101]
patent publications are thus a rich knowledge source complementary to the conventional scientific literature, and advances delineated in both scientific and patent publications deserve the same amount of academic attention. the earliest patent on the pharmaceutical use of cds can be dated back to 1953 [90]. besides documenting the basic properties of cds and disclosing a method to prepare cds in aqueous solution via precipitation, the patent delineates the ability of cds to improve the duration of activity, taste and chemical stability of bioactive compounds. the patent, however, has not been successfully put into industrial applications [91]. this is partially due to the safety concerns raised by a review article published in 1957 [92], which referred to unpublished data stating that rats fed orally with β-cd died within a week. though the observed toxicity was later found to be more likely caused by impurities rather than the substance per se [93], the pharmaceutical applications and acceptance of cds were hampered for many years, and it was not until the 1970s that the world's first pharmaceutical formulation containing cds emerged [13]. now more and more innovations on cds have appeared in the patent literature. though at this moment most of these patents are related either to production and structural modification of cds or to applications of cds as drug excipients, some efforts have already been directed to the use of cds in delivery of genes and other nucleic acids [94–101] (table 5). given the increasing awareness of the potential of cds in gene delivery, it is expected that cd-based gene carriers will escalate in number in the patent literature in the forthcoming decades. gene delivery is an expanding area of biotechnological research, and has exhibited high potential in biomedical applications [102–105].
though viral vectors entered clinical trials as early as the 1990s and are still the most extensively studied gene delivery systems, the safety risks involved with using viruses warrant development of non-viral alternatives. over the decades, copious polymers have been investigated as gene carriers [106–111], but the possible use of cds has rarely been seriously considered. this may pertain to the fact that native cds fail to form stable complexes with pdna [40], thereby showing much lower transfection efficiency in comparison with conventional polymeric vectors such as chitosan, poly(lactic-co-glycolic acid) (plga), pei and pll. however, judging from the evidence presented so far in this article, cds have favorable properties for their applications in gene transfer (e.g. having the ability to form inclusion complexes with chemical drugs for drug/gene co-delivery, being able to potentially function as an absorption enhancer in therapeutics transfer, and being capable of modulating the cytotoxicity of other polymers) and display great versatility in gene delivery, from functioning as linking agents for development of new vectors to modulating the performance of existing polymers in transfection. taking the promising potential of cds into account, there is no doubt that cds will continue to emerge as an important tool for vector development, and will play significant roles in facilitating non-viral gene delivery in the future.
cyclodextrins and their uses: a review
sur la fermentation de la fécule par l'action du ferment butyrique
über die schardinger-dextrine aus stärke
neue ansichten über die stärke
chiral separation of basic drugs by capillary zone electrophoresis with cyclodextrin additives
cyclodextrin-derived ph-responsive nanoparticles for delivery of paclitaxel
prolonged local antibiotics delivery from hydroxyapatite functionalised with cyclodextrin polymers
preparation and characterization of starch/cyclodextrin bioadhesive microspheres as platform for nasal administration of gabexate mesylate (foy) in allergic rhinitis treatment
the intracellular effects of non-ionic amphiphilic cyclodextrin nanoparticles in the delivery of anticancer drugs
cyclodextrin-based device coatings for affinity-based release of antibiotics
synthesis, properties and controlled release behaviors of hydrogel networks using cyclodextrin as pendant groups
cyclodextrins and their pharmaceutical applications
aqueous nanogels modified with cyclodextrin
role of cyclodextrins in improving oral drug delivery
past, present and future of cyclodextrin research
cyclodextrins in oligonucleotide delivery
vehicles for oligonucleotide delivery to tumours
synthesis, hybridization properties, nuclease stability, and cellular uptake of the oligonucleotide–amino-β-cyclodextrins and adamantane conjugates
use of cyclodextrin and its derivatives as carriers for oligonucleotide delivery
modulation of oligonucleotide-induced immune stimulation by cyclodextrin analogs
beta-cyclodextrin derivatives as carriers to enhance the antiviral activity of an antisense oligonucleotide directed toward a coronavirus intergenic consensus sequence
the first targeted delivery of sirna in humans via a self-assembling, cyclodextrin polymer-based nanoparticle: from concept to clinic
cyclodextrin-based pharmaceutics: past, present and future
cyclodextrin-polyethylenimine conjugates for targeted in vitro gene delivery
cyclodextrin-modified polyethylenimine polymers for gene delivery
new class of polymers for the delivery of macromolecular therapeutics
the influence of sodium glycocholate and other additives on the in vivo transfection of plasmid dna in the lungs
the interaction of β-cyclodextrin with nucleic acid monomer units
effects of structure of β-cyclodextrin-containing polymers on gene delivery
mode of action of β-cyclodextrin as an absorption enhancer of the water-soluble drug meglumine antimoniate
nasal administration of an acth(4-9) peptide analogue with dimethyl-β-cyclodextrin as an absorption enhancer: pharmacokinetics and dynamics
beta cyclodextrins enhance adenoviral-mediated gene delivery to the intestine
experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings
cyclodextrins in drug delivery
pharmaceutical applications of cyclodextrins. iii. toxicological issues and safety evaluation
cyclodextrins in transdermal and rectal delivery
use of cyclodextrins to manipulate plasma membrane cholesterol content: evidence, misconceptions and control strategies
changes of membrane permeability due to extensive cholesterol depletion in mammalian erythrocytes
cell transfection with polycationic cyclodextrin vectors
tailoring β-cyclodextrin for dna complexation and delivery by homogeneous functionalization at the secondary face
rational design of cationic cyclooligosaccharides as efficient gene delivery systems
polycationic amphiphilic cyclodextrins for gene delivery: synthesis and effect of structural modifications on plasmid dna complex stability, cytotoxicity, and gene expression
preorganized macromolecular gene delivery systems: amphiphilic β-cyclodextrin "click clusters"
cyclodextrin-scaffolded glycotransporters for gene delivery
recent advances in rational gene transfer vector design based on poly(ethylene imine) and its derivatives
self-assembling nucleic acid delivery vehicles via linear, water-soluble, cyclodextrin-containing polymers
efficient gene transfection in the neurotypic cells by star-shaped polymer consisting of β-cyclodextrin core and poly(amidoamine) dendron arms
folic acid modified cationic γ-cyclodextrin-oligoethylenimine star polymer with bioreducible disulfide linker for efficient targeted gene delivery
new cyclodextrin derivative containing poly(l-lysine) dendrons for gene and drug co-delivery
star-shaped cationic polymers by atom transfer radical polymerization from β-cyclodextrin cores for nonviral gene delivery
cationic star polymers consisting of α-cyclodextrin core and oligoethylenimine arms as nonviral gene delivery vectors
two novel non-viral gene delivery vectors: low molecular weight polyethylenimine cross-linked by (2-hydroxypropyl)-β-cyclodextrin or (2-hydroxypropyl)-γ-cyclodextrin
math1 gene transfer based on the delivery system of quaternized chitosan/na-carboxymethyl-β-cyclodextrin nanoparticles
fgf receptor-mediated gene delivery using ligands coupled to pei-β-cyd
transferrin receptor 2 is frequently expressed in human cancer cell lines
transferrin-containing, cyclodextrin polymer-based particles for tumor-targeted gene delivery
synthesis and characterization of polyrotaxanes consisting of cationic α-cyclodextrins threaded on poly[(ethylene oxide)-ran-(propylene oxide)] as gene carriers
enhancement of gene expression by polyamidoamine dendrimer conjugates with α
enhancing effects of galactosylated dendrimer/α-cyclodextrin conjugates on gene transfer efficiency
improvement of gene delivery mediated by mannosylated dendrimer/α-cyclodextrin conjugates
effects of structure of polyamidoamine dendrimer on gene transfer efficiency of the dendrimer conjugate with α-cyclodextrin
in vitro and in vivo gene transfer by an optimized α-cyclodextrin conjugate with polyamidoamine dendrimer
potential use of folate-polyethylene glycol (peg)-appended dendrimer (g3) conjugate with α-cyclodextrin as dna carriers to tumor cells
distribution, functionality and gene regulation of
folate receptor isoforms: implications in targeted therapy molecular cloning and characterization of the human folatebinding protein cdna from placenta and malignant tissue culture (kb) cells the role of a-folate receptor-mediated transport in the antitumor activity of antifolate drugs effect of hyperthermia on intracellular drug accumulation and chemosensitivity in drug-sensitive and drug-resistant p388 leukaemia cell lines polypseudorotaxanes of pegylated a-cyclodextrin/polyamidoamine dendrimer conjugate with cyclodextrins as a sustained release system for dna cellular delivery of polynucleotides by cationic cyclodextrin polyrotaxanes potential use of glucuronylglucosyl-b-cyclodextrin/dendrimer conjugate (g2) as a dna carrier in vitro and in vivo possible enhancing mechanisms for gene transfer activity of glucuronylglucosyl-bcyclodextrin/dendrimer conjugate in vitro and in vivo gene delivery mediated by lactosylated dendrimer/a-cyclodextrin conjugates (g2) into hepatocytes photoluminescent hyperbranched poly(amido amine) containing b-cyclodextrin as a nonviral gene delivery vector improved cell viability of linear polyethylenimine through g-cyclodextrin inclusion for effective gene delivery sunflowershaped cyclodextrin-conjugated poly(epsilon-lysine) polyplex as a controlled intracellular trafficking device polyethylenimine-cyclodextrintegafur conjugate shows anti-cancer activity and a potential for gene delivery bifunctional conjugates comprising b-cyclodextrin, polyethylenimine, and 5-fluoro-2'-deoxyuridine for drug delivery and gene transfer cyclodextrin-cross-linked diaminotriazine-based hydrogen bonding strengthened hydrogels for drug and reverse gene delivery cyclodextrin-pei-tat polymer as a vector for plasmid dna delivery to placenta mesenchymal stem cells autologous mesenchymal stem cell transplantation induce vegf and neovascularization in ischemic myocardium osteogenesis by human mesenchymal stem cells cultured on silk biomaterials: comparison of 
adenovirus mediated gene transfer and protein delivery of bmp-2 optimization of a gene electrotransfer method for mesenchymal stem cell transfection cyclodextrin-based sirna delivery nanocarriers: a stateof-the-art review nucleic acid delivery with chitosan and its derivatives folate-peg-appended dendrimer conjugate with a-cyclodextrin as a novel cancer cell-selective sirna delivery carrier rna interference-mediated validation of genes involved in telomere maintenance and evasion of apoptosis as cancer therapeutic targets rna interference in cancer: targeting the anti-apoptotic protein c-flip for drug discovery rnai-mediated rna degradation: future prospects of longevity enhancement verfahren zur herstellung von einschlubverbindungen physiologisch wirksamer organischer verbindungen cyclodextrins and their industrial uses. paris: éditions de santé the schardinger dextrins cyclodextrin technology cyclodextrin cellular delivery system for oligonucleotides formulations for non-viral in vivo transfection in the lungs linear cyclodextrin copolymers linear cyclodextrin copolymers linear cyclodextrin copolymers linear cyclodextrin copolymers method for preparing linear cyclodextrin copolymers cyclodextrin-based materials, compositions and uses related thereto nucleic acid delivery: roles in biogerontological interventions protein kinases as targets for interventive biogerontology: overview and perspectives nucleic acid therapy for lifespan prolongation: present and future delivery of therapeutics: current status and its relevance to regenerative innovations corneal gene delivery: chitosan oligomer as a carrier of cpg rich, cpg free or s/mar plasmid dna deliverydified pei protects against cytotoxic lymphocyte killing of a granzyme b inhibitor gene using carbamate-mannose mo gdnf gene delivery via a 2-(dimethylamino)ethyl methacrylate based cyclized knot polymer for neuronal cell applications dendrimer type bio-reducible polymer for efficient gene delivery in vivo nucleic 
acid delivery with pei and its derivatives: current status and perspectives synthesis and properties of chitosan-pei graft copolymers as vectors for nucleic acid delivery the author would like to thank the editor-in-chief, david f. williams, for offering the opportunity to proceed with this work. thanks are extended to ka-fai fung, yau-foon tsui, foon-lian lim and kenneth l. k. wu for support and help during the writing of this manuscript. key: cord-345475-ttrcmtu4 authors: de oliveira, luisa abruzzi; breton, michèle claire; bastolla, fernanda macedo; camargo, sandro da silva; margis, rogério; frazzon, jeverson; pasquali, giancarlo title: reference genes for the normalization of gene expression in eucalyptus species date: 2011-12-24 journal: plant cell physiol doi: 10.1093/pcp/pcr187 sha: doc_id: 345475 cord_uid: ttrcmtu4 gene expression analysis is increasingly important in biological research, with reverse transcription–quantitative pcr (rt–qpcr) becoming the method of choice for high-throughput and accurate expression profiling of selected genes. considering the increased sensitivity, reproducibility and large dynamic range of this method, the requirements for proper internal reference gene(s) for relative expression normalization have become much more stringent. given the increasing interest in the functional genomics of eucalyptus, we sought to identify and experimentally verify suitable reference genes for the normalization of gene expression associated with the flower, leaf and xylem of six species of the genus. we selected 50 genes that exhibited the least variation in microarrays of e. grandis leaves and xylem, and e. globulus xylem. we further performed the experimental analysis using rt–qpcr for six eucalyptus species and three different organs/tissues. employing algorithms genorm and normfinder, we assessed the gene expression stability of eight candidate new reference genes. classic housekeeping genes were also included in the analysis. 
the stability profiles of candidate genes were in very good agreement. pcr results showed that the expression of the novel eucons04, eucons08 and eucons21 genes was the most stable in all eucalyptus organs/tissues and species studied. we showed that using the combination of these genes as references when measuring the expression of a test gene results in more reliable patterns of expression than using traditional housekeeping genes. hence, the novel eucons04, eucons08 and eucons21 genes are the most suitable references for the normalization of expression studies in the eucalyptus genus.
for cellulose pulp has resulted in wood shortages in recent years (steane et al. 2002, foucart et al. 2006, grattapaglia and kirst 2008). hence efforts in many fields of research are being made to improve forest productivity, including molecular approaches such as whole-genome sequencing and high-throughput analysis of gene expression. with such objectives in mind, the eucalyptus genome network (eucagen) was created (http://www.ieugc.up.ac.za), representing one example of a valuable database platform for genome research in e. grandis and other species (rengel et al. 2009). with the recent availability of eucalyptus genome and transcriptome data, many efforts are being and will be made to assess eucalyptus gene expression with conventional or high-throughput techniques. independently of the method employed, the use of reference genes as internal controls for gene expression measurements is absolutely essential. such validated reference genes for eucalyptus are still scarce. dna macro- and microarray hybridizations and partial or whole transcriptome sequencing linked to digital transcript counting (rna-seq), among other techniques, allow the expression analysis of thousands of genes simultaneously, employing differentially labeled rna or cdna populations.
these techniques have the advantages of speed, high throughput and a high degree of potential automation compared with conventional quantification methods such as northern blot analysis, rnase protection assays, or competitive reverse transcription-pcr (rt-pcr; rajeevan et al. 2001, czechowski et al. 2005). reverse transcription followed by real-time, quantitative pcr (rt-qpcr) is the most sensitive and specific technique commonly used to assess gene expression levels (aerts et al. 2004). it allows more in-depth studies of smaller sets of genes across many individuals, treatments or cell/tissue types to be performed. rt-qpcr is the technique of choice to validate gene expression results derived from the above-mentioned high-throughput methods (rajeevan et al. 2001, czechowski et al. 2005). as mentioned previously, only good internal reference genes will allow confident comparison of gene expression results. internal control genes are used to normalize mrna fractions and are often referred to as housekeeping genes, whose expression should not vary during development, among the tissues or cells under investigation, or in response to experimental treatments. the most common housekeeping genes employed in plant gene expression studies are those encoding actin (bas et al. 2004, barsalobres-cavallari et al. 2009), tubulin (schmidt and delaney 2010, yang et al. 2010), glyceraldehyde-3-phosphate dehydrogenase (gapdh) (tong et al. 2009, maroufi et al. 2010), rrna (guénin et al. 2009, schmidt and delaney 2010, yang et al. 2010), polyubiquitin (libault et al. 2008, barsalobres-cavallari et al. 2009) and elongation factor 1α (silveira et al. 2009, tong et al. 2009). many studies make use of these housekeeping genes without proper validation of their presumed stability, based on the assumption that they would be constitutively expressed due to their role in basic cellular processes.
considerable amounts of data show that the expression of most studied housekeeping genes can vary considerably depending on the cell type or experimental condition (thellin et al. 1999, hruz et al. 2011). with the increased sensitivity, reproducibility and large dynamic range of the rt-qpcr methods, the requirements for proper internal control genes have become increasingly stringent. in recent years, large numbers of reference gene validation attempts have been reported for plants, most of them covering model, crop or ornamental species such as rice (jain et al. 2006), arabidopsis thaliana (remans et al. 2008, hong et al. 2010, dekkers et al. 2012), tobacco (schmidt and delaney 2010), sugarcane (iskandar et al. 2004), potato (nicot et al. 2005), brachypodium sp. (hong et al. 2008), soybean (jian et al. 2008, libault et al. 2008, hu et al. 2009, kulcheski et al. 2010), tomato (coker and davies 2003, expósito-rodríguez et al. 2008, løvdal and lillo 2009, dekkers et al. 2012), brachiaria sp. (silveira et al. 2009), coffee (barsalobres-cavallari et al. 2009), peach (tong et al. 2009), wheat (paolacci et al. 2009), chicory (maroufi et al. 2010), cotton (artico et al. 2010), cucumber, lolium sp. (martin et al. 2008), orobanche sp. (gonzález-verdejo et al. 2008) and cyclamen sp. (hoenemann and hohe 2011). few studies have focused on woody plants such as poplar (brunner et al. 2004), grape (reid et al. 2006) and longan tree (lin and lai 2010). reference genes for gene expression studies in eucalyptus have been recently presented. de almeida et al. (2010), working with e. globulus microcuttings rooted in vitro, indicated histone h2b and α-tubulin as the most suitable reference genes during in vitro adventitious rooting, in the presence or absence of auxin. boava et al. (2010), working with clonal seedlings of the hybrid plant (e. grandis × e.
urophylla) exposed to biotic (puccinia psidii) or abiotic (acibenzolar-s-methyl) stresses, concluded that genes encoding the eukaryotic elongation factor 2 (eef2) and ubiquitin were the most stable, and ideal as internal controls. both studies tested a small number of genes (11 and 13, respectively) selected according to literature data concerning other plant systems and experimental conditions. given the increasing interest in the functional genomics of eucalyptus and the need for validated reference genes for a broader set of species and experimental conditions, we sought to identify the most stably expressed genes in a set of 21,432 genes assayed by a microarray developed to compare stem vascular (xylem) and leaf tissues of e. grandis and e. globulus adult trees. the best candidate genes were then validated by rt-qpcr in assays with rnas from xylem and leaves of six eucalyptus species and flowers of e. grandis. seven traditional housekeeping genes among those most employed in expression studies in plants were also included in our analysis. the eucalyptus species selected in the present study are among the most planted trees in the tropics and the most employed in breeding programs in brazil (e. grandis, e. urophylla, e. globulus, e. saligna, e. dunnii and e. pellita). most importantly, they exhibit highly contrasting phenotypes, especially in growth rate, biotic and abiotic resistance and wood quality (fao 2001, coppen 2002), which, in principle, would make the search for general reference genes for the genus difficult. as a result, genes selected as the least variable among all conditions tested have not yet been described in the literature. this set of genes may represent an important molecular tool to analyze accurately the expression of eucalyptus genes in different tissues/organs and in different species via array hybridization or rt-qpcr.
data from microarray hybridizations conducted within the project 'genolyptus: the brazilian research network on the eucalyptus genome' (http://genoma.embrapa.br/genoma/genolyptus) were analyzed in order to select the most stably expressed eucalyptus genes. the microarray study was conceived with nine 50-mer oligoprobes covering the length of each one of the 21,432 unique sequences derived from the genolyptus expressed sequence tag (est) data set (genbank accession nos. ho763666-ho769458 and hs047685-hs075494). nine oligoprobes were also designed for 10 cdnas encoding known human proteins as negative controls. oligoprobes were synthesized 'on-chip' in duplicate, randomly distributed in two blocks of 10 identical slides. leaf blades and vascular (xylem) tissue samples were taken from two e. grandis clonal trees, i.e. derived from the same matrix tree and harboring the same genotype. two additional xylem samples were collected from two other e. grandis clonal trees of a different genotype and from two e. globulus clonal trees. therefore, 10 cy3-labeled cdna samples and 10 identical chips were produced at roche nimblegen for the microarray assays, with a total number of 385,956 features per slide [microarray results were submitted to gene expression omnibus (geo) under accession nos. gsm786737-gsm786746]. the most stably expressed genes were mined in the microarray data by employing two statistical algorithms named significance analysis of microarrays (sam) (tusher et al. 2001) and standard deviation microarray analysis (sdma; see materials and methods), that allow the representation of results in three-dimensional (3d) graphs. the input data to sam were gene expression measurements from the set of microarray experiments, as well as the response variable from each experiment. according to tusher et al. (2001) , sam computes a statistic d(i) for each gene 'i', measuring the strength of the relationship between gene expression and the response variable. 
it uses repeated permutations of the data to determine if the expression of any gene is significantly related to the response. the cut-off for significance is determined by a tuning parameter delta, chosen by the user based on the false-positive rate. one can also choose a fold change parameter, to ensure that called genes change by at least a pre-specified amount. in the present study, the value of delta was set to 0.2 so that we could mine the genes whose expression exhibited the lowest variation among the three conditions assessed in the microarrays, i.e. e. grandis leaves and xylem and e. globulus xylem (fig. 1a). the 50 top-ranked genes, whose fold change values were approximately equal to 1, were selected as reference candidate genes, since they presented the lowest variation of expression among the leaves and xylem of e. grandis and xylem of e. globulus (table 1). sdma is a novel and simpler algorithm based on the comparison of the average gene expression in relation to the global average of expressed genes in microarrays and the overall standard deviation, allowing the presentation of results in graphical mode (see materials and methods). sdma allowed us to generate a 3d graph that evidenced genes expressed in a position equivalent to their overall average expression among the three conditions analyzed in the microarrays (fig. 1b). when the aim of the analysis is to select genes whose expression is stable, the average value of gene expression by sdma should be as similar as possible to the global average of expression, and the overall standard deviation should tend to zero. using the same criteria applied in sam, we selected 50 genes whose standard deviations were close to zero, indicating the similarity between each gene's mean expression and the global mean expression (table 1). an sdma 3d graph is presented in fig. 1c, where the mean expression values of the 50 most stable genes selected under the conditions studied are plotted.
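the selection criterion described above (a standard deviation across conditions that tends to zero) can be illustrated with a minimal python sketch. this is not the sdma or sam implementation used in the study; it only ranks genes by their standard deviation across conditions and reports how far each gene's mean lies from the global mean. the function name, the input layout (genes in rows, conditions in columns, log-scale values) and the example data are assumptions made for illustration.

```python
import numpy as np

def rank_stable_genes(expr, gene_ids, top_n=50):
    """Rank genes by across-condition standard deviation (lowest first).

    expr: (genes x conditions) array of mean log-scale expression values,
          e.g. one column each for leaf and xylem samples (hypothetical layout).
    Returns a list of (gene_id, sd, mean_minus_global_mean) tuples.
    """
    expr = np.asarray(expr, dtype=float)
    gene_mean = expr.mean(axis=1)      # mean expression of each gene
    global_mean = expr.mean()          # mean over all genes and conditions
    sd = expr.std(axis=1, ddof=1)      # sample standard deviation per gene
    order = np.argsort(sd, kind="stable")[:top_n]
    return [(gene_ids[i], float(sd[i]), float(gene_mean[i] - global_mean))
            for i in order]

# hypothetical data: three genes measured in three conditions
expr = [[8.0, 8.0, 8.0],   # perfectly stable
        [5.0, 9.0, 7.0],   # variable
        [6.0, 6.1, 5.9]]   # nearly stable
for gene, sd, offset in rank_stable_genes(expr, ["g1", "g2", "g3"], top_n=2):
    print(gene, round(sd, 3), round(offset, 3))
```

a real analysis would also need the permutation-based significance testing that sam performs, which this sketch deliberately leaves out.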
note that points representing selected genes tend to form a straight line, indicating that their means of expression when compared with the global average have a standard deviation tending to zero. the use of either sdma or sam allowed us to identify the same group of 50 genes as the most stably expressed, confirming the robustness of the analysis performed by both algorithms. nevertheless, the rankings produced by the two methods differed, as shown in table 1. since none of the sequences selected presented molecular or biochemical identities similar to previously described eucalyptus genes or proteins, we named them eucons01 to eucons50, according to the ranking generated by sdma, as stated in the first column of table 1. selected sequences were annotated using blastx (altschul et al. 1997) against the available non-redundant protein sequences (nr), and their functional categories were determined by the blast2go software (conesa et al. 2005). although some sequences exhibited expected (e) values too high for a confident annotation, approximately half of them (48%) showed similarity to known proteins. the other half of the sequences corresponded to hypothetical proteins (10%) or returned a 'no hit' (42%) result (table 1). the gene ontology analysis of the 50 selected genes by blast2go allowed the functional classification of 35 (70%) of the sequences, as represented in fig. 2. most of the sequences were classified in functions related to cellular (12) or metabolic (12) processes, among six other functional categories. the remaining 15 (30%) sequences were classified as 'no hit', and were not represented in fig. 2. in order to validate the results further by rt-qpcr, we selected 10 candidate genes with the least variation in expression and whose annotation matched a known plant protein according to sdma (eucons01, 04, 06, 07 and 08) and sam (eucons15, 21, 27, 32 and 43). selected genes are highlighted in gray in table 1.
in order to check their true expression stability, primers for rt-qpcr validation of the 10 eucalyptus sequences selected as potential reference genes were designed and are presented in table 2. in addition, we designed primers for five genes traditionally employed as references, based on their housekeeping function, including a sand family protein (remans et al. 2008), gapdh (dambrauskas et al. 2003), histone h2b, ribosomal protein l23a (ribl23a) and tubulin (tua2) (czechowski et al. 2005), as presented in table 2. reference genes previously recommended for the analysis of gene expression during e. globulus rooting in vitro and named euc10 and euc12 (de almeida et al. 2010) were also evaluated, and the primers employed in rt-qpcr are also presented in table 2.
where the observed relative difference is identical to the expected relative difference with a delta set to 0.2. solid and dotted red and green lines represent genes whose observed relative differences were lower or higher than the expected relative differences, i.e. whose expression varied among tissues tested. (b) three-dimensional graph generated with the standard deviation microarray analysis (sdma) method showing genes (open circles) expressed in positions equivalent to their overall average expression among the three conditions analyzed in the microarrays, i.e. leaves (egrl, z-axis) and xylem (egrx, x-axis) of e. grandis and xylem of e. globulus (eglx, y-axis). the higher concentration of circles around the main diagonal line indicates that most genes exhibited very similar expression values in the analyzed tissues. the most differentially expressed genes appear proportionally far from the main diagonal line. (c) sdma 3d graph representing the 50 most invariable eucalyptus genes according to microarray data. points representing selected genes tend to form a straight line since their means of expression are similar to the global average, with a standard deviation tending to zero.
gene names and the identity of e. grandis genomic (eucagen) scaffolds where sequences are located, as well as the embl or genbank accession codes (gene id) and putative functional identity of sequences based on blast analysis are indicated along with the expected (e) value. results of the statistical analysis performed are indicated: standard deviations (sd) for the sdma method, and fold change and final score (d) of the sam method. genes were ranked from highest to lowest stability for both methods. lines shaded in gray represent genes selected for validation via rt-qpcr analysis.
total rna samples were prepared from six eucalyptus species, distributed as follows: flower, leaf and xylem of e. grandis, leaf and xylem of e. dunnii, e. pellita, e. saligna and e. urophylla, and xylem of e. globulus. rt-qpcr evaluations were conducted with biological duplicates and experimental quadruplicates. results were analyzed using the software genorm 3.5 (vandesompele et al. 2002) and normfinder (andersen et al. 2004) in order to generate comparable rankings of genes based on their stability of expression. the cq data collected for all samples were transformed to relative quantities using the 2^(-ΔΔCt) method developed by livak and schmittgen (2001). we did not succeed in obtaining satisfactory single-peak dissociation curves after rt-qpcr with primers designed for eucons01 and eucons15 (data not shown), and both candidate genes were discarded from the analysis. with genorm, the average expression stability (m-value) of all genes was first calculated. the m-value of a gene is defined as its mean pairwise variation relative to all of the other genes. the genorm software recommends an m-value below the threshold of 1.5 in order to identify genes with stable expression, although 0.5 has been used as the threshold limit by many authors (radonić et al. 2005, allen et al. 2008, coll et al. 2010, de almeida et al. 2010, taylor et al. 2010). as shown in fig.
3a, all 15 candidate genes examined showed a very high stability of expression, with m-values <0.12, independently of the tissues/organs evaluated. according to the genorm analysis, eucons04, eucons08 and eucons21 were the most stably expressed genes and should be considered as the best reference genes for rt-qpcr normalizations. in order to evaluate the optimal number of reference genes for reliable normalization, genorm allows calculation of the pairwise variation vn/vn+1 between the sequential ranked normalization factors nfn and nfn+1 to determine the effect of adding the next reference gene in normalization. the normalization factor is calculated as the geometric mean of the relative quantities of the two most stable genes, followed by the stepwise inclusion of the other genes in the order of their expression stability. a large pairwise variation implies that the added reference gene has a significant effect on normalization and should be included for calculation of a reliable normalization factor. considering the cut-off value of 0.15, below which the inclusion of an additional reference gene is not necessary (vandesompele et al. 2002), the use of the two most stably expressed genes, eucons08 and eucons21, was sufficient for accurate normalization (pairwise variation <0.02) in all organs studied (flower, leaves and xylem) from the six eucalyptus species (fig. 3a, b). the same applies when analyzing xylem and leaves separately, with eucons04 and ribl23a genes (leaves) and eucons06 and eucons08 genes (xylem; data not shown). in addition to genorm, the expression stability of candidate reference genes assayed by rt-qpcr was also analyzed with the normfinder software. this program takes into account the intra- and intergroup variations for normalization factor calculation, and the results are not affected by occasionally co-regulated genes.
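the two genorm quantities discussed above can be sketched in python as follows. this is an illustrative reimplementation following the definitions in vandesompele et al. (2002), not the genorm 3.5 software itself: the m-value of a gene is the mean, over all other genes, of the standard deviation of the log2 expression ratios across samples, and the pairwise variation is the standard deviation of the log2 ratio of two successive normalization factors. function names, the input layout (samples in rows, genes in columns) and the example values are assumptions.

```python
import numpy as np

def genorm_m_values(rq):
    """M-value per gene from a (samples x genes) array of relative quantities."""
    log_rq = np.log2(np.asarray(rq, dtype=float))
    n_genes = log_rq.shape[1]
    m = np.zeros(n_genes)
    for j in range(n_genes):
        # log2 ratio of gene j to every gene, sample by sample
        ratios = log_rq[:, [j]] - log_rq
        # pairwise variation V_jk = SD of the log ratios (V_jj is 0)
        v = ratios.std(axis=0, ddof=1)
        m[j] = v.sum() / (n_genes - 1)
    return m

def pairwise_variation(rq, ranked_idx, n):
    """V(n/n+1): SD of log2(NF_n / NF_n+1) across samples.

    ranked_idx lists gene columns from most to least stable; NF_n is the
    geometric mean of the n most stable genes (mean in log2 space).
    """
    log_rq = np.log2(np.asarray(rq, dtype=float))
    nf_n = log_rq[:, ranked_idx[:n]].mean(axis=1)
    nf_n1 = log_rq[:, ranked_idx[:n + 1]].mean(axis=1)
    return float((nf_n - nf_n1).std(ddof=1))

# hypothetical relative quantities: genes 0 and 1 vary in parallel, gene 2 does not
rq = [[1, 2, 1],
      [2, 4, 8],
      [4, 8, 2]]
m = genorm_m_values(rq)
print(np.round(m, 3))
print(round(pairwise_variation(rq, [0, 1, 2], 2), 3))
```

the full genorm procedure additionally removes the least stable gene and recomputes the m-values repeatedly to obtain the final ranking; that stepwise exclusion is omitted here.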
the best candidate will be the one with the intergroup variation as close to zero as possible and, at the same time, having the smallest error bar possible. hence values are inversely correlated to gene expression stability, which avoids artificial selection of co-regulated genes (andersen et al. 2004). according to the normfinder analysis of gene expression in leaves, xylem tissues and among species, the stability values of the 15 genes studied were <0.138, with error bars no greater than 0.044 (fig. 3c; table 3). when we analyzed the gene expression in all tissues/organs and species, the stability value was in the range between 0.017 and 0.106, indicating again that all genes selected are good references for rt-qpcr studies in eucalyptus. the ranking of the genes and their respective stability values are shown in table 3. according to the normfinder analysis and in agreement with the results of genorm, the three most stable genes were eucons04, eucons08 and eucons21 when considering all tissues/organs and species. when the expression in leaves is considered separately, the stability values were in the range between 0.008 and 0.086, and the three most stable genes in these organs were eucons04, eucons08 and eucons32. in xylem vascular tissues, the stability values were in the range of 0.01-0.138, and the genes eucons27, eucons07 and eucons06 were the most stable (table 3). the algorithm ranked eucons04 as the most stably expressed gene in all samples regardless of whether the samples were collected into one main group or divided into two groups. nonetheless, normfinder identifies only a single best gene for all samples when no groups are defined. therefore, a separate grouping was created to identify the most stable pair of genes (table 3). when leaves and xylem were tested as different groups, the stability values were in the range between 0.011 and 0.094. eucons04 exhibited the lowest stability value.
normfinder identified eucons04 and eucons08 as the most appropriate combination of genes, showing a stability value of 0.009.
fig. 2 functional classification of the 50 most stable eucalyptus genes according to microarray hybridization data analysis through sam and sdma. gene ontology hits registered for the 50 most constitutive genes that could be assigned a putative function based on swiss-prot query. only known genes are shown.
validation of the stability of eucalyptus reference genes via dxr differential gene expression analysis
terpenoids are all derived from two common precursors, isopentenyl diphosphate (ipp) and dimethylallyl diphosphate (dmapp). in higher plants, ipp and dmapp are synthesized through two distinct pathways in separate cellular compartments, the cytosolic mevalonate (mva) pathway and the 2-c-methyl-d-erythritol 4-phosphate (mep) pathway that takes place in plastids. the mep pathway, through which diterpenes are synthesized, has two important initial steps: (i) the formation of 1-deoxy-d-xylulose 5-phosphate (dxp) from pyruvate and glyceraldehyde 3-phosphate through the action of the dxp synthase (dxs), and (ii) the conversion of dxp into mep by the action of the dxp reductoisomerase (dxr). as dxs and dxr are key enzymes catalyzing the two committed steps for isoprenoid biosynthesis, genes coding for dxs and dxr may play important roles in controlling the plastidic synthesis of isoprenoids and the downstream diterpene products (carretero-paulet et al. 2002, liao et al. 2007, wu et al. 2009, yan et al. 2009). it is known that the expression patterns of the dxr enzyme and its encoding gene vary considerably according to the plant and organ being assessed. this enzyme and its encoding mrna show increased expression in inflorescences and leaves of a. thaliana (carretero-paulet et al. 2002) and salvia miltiorrhiza (yan et al. 2009), but decreased levels were reported in stems and roots.
in rauvolfia verticillata, on the other hand, higher levels of dxr mrna were reported in fruits and roots, with the lowest levels in flowers (liao et al. 2007).
eucalyptus reference genes (continued). gene name abbreviations, genbank, embl or tair accession codes and the putative functional identity of genes based on blast analysis are indicated. the eucagen scaffolds containing the genome sequences of the referred genes are presented. based on the eucalyptus sequences, primers were designed as shown along with amplicon lengths (bp).
in order to confirm the constitutive expression of the three best eucalyptus genes selected as references (eucons04, eucons08 and eucons21), we tested them by normalizing the patterns of dxr gene expression in eucalyptus and compared the results with those normalized by the traditional reference genes ribl23a and gapdh. to this end, the expression of dxr and the reference genes was measured by rt-qpcr in the same set of tissues/organs and eucalyptus species previously tested. dxr expression values were then normalized against the expression values of two reference genes, as shown in fig. 4. in order to allow comparisons among reference genes, the average value of the pairwise reference gene relative expression in the different organs/tissues tested was set to 1 and taken to normalize the dxr relative expression. as expected, steady-state mrna levels for the dxr gene were much higher in leaves, followed by flowers, with lower values observed in xylem tissues of e. grandis (fig. 4a). as shown in fig. 4, the pairwise combination of eucons04, eucons08 or eucons21 gave more reliable results than the ribl23a/gapdh pair. the relative expression of the dxr gene was much less variable when normalized with eucons genes and, most importantly, much more concordant than the results normalized by the ribl23a/gapdh pair. this is more evident in fig.
4b, where no statistical difference was observed in dxr relative expression values among the xylem from e. grandis, e. globulus and e. pellita when ribl23a/gapdh were used as references, whereas the values differed markedly when normalized with any two of the eucons genes. essentially the same conclusions were reached from the analysis of dxr relative expression obtained with leaves and xylem tissues from e. dunnii, e. pellita, e. saligna and e. urophylla (results not shown). real-time qpcr and cdna microarray measurements are highly reproducible techniques to assess gene expression at the steady-state mrna level (yue et al. 2001, stankovic and corfas 2003, stahlberg et al. 2004). however, in comparison with classical rt-pcr, the main advantages of rt-qpcr are its higher sensitivity, specificity of measurements, and broad quantification range of up to seven orders of magnitude (bustin 2002, gachon et al. 2004), besides being a great aid to study expression in genes whose transcript levels are known to be very low (higuchi et al. 1993). rt-qpcr has become the most common method for validating expression data from whole-genome microarrays or from smaller sets of genes, and for molecular diagnostics (giulietti et al. 2001, chuaqui et al. 2002). accurate normalization is an absolute prerequisite for correct measurement of gene expression, and the most commonly used normalization strategy involves standardization to a single constitutively expressed control gene. therefore, the ideal reference gene should exhibit invariable expression levels among all different cell types, tissues, organs, developmental stages or treatments to which the test organism is subjected (vandesompele et al. 2002, andersen et al. 2004). however, it has become clear that no single gene is constitutively expressed in all cell types and under all experimental conditions.
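the normalization of a target gene against a pair of reference genes, as described above for dxr, can be sketched as follows. this is a minimal illustration, not the authors' analysis script: the function names, the ct values and the amplification efficiency of 2.0 are hypothetical, and the use of a per-sample geometric mean of the two references, rescaled so that its average across samples is 1, is an assumption in the spirit of the multiple-reference strategy of vandesompele et al. (2002).

```python
import numpy as np

def relative_quantity(ct, efficiency=2.0):
    """Convert Ct values (one per sample) into relative quantities.

    The sample with the lowest Ct (highest expression) is scaled to 1.
    efficiency=2.0 assumes perfect doubling per cycle (an assumption).
    """
    ct = np.asarray(ct, dtype=float)
    return efficiency ** (ct.min() - ct)

def normalize_target(target_ct, reference_cts, efficiency=2.0):
    """Normalize a target gene against several reference genes.

    target_ct: Ct values of the target gene, one per sample.
    reference_cts: list of Ct vectors, one vector per reference gene.
    """
    target_q = relative_quantity(target_ct, efficiency)
    ref_q = np.array([relative_quantity(ct, efficiency) for ct in reference_cts])
    # Per-sample geometric mean of the reference genes.
    norm_factor = np.exp(np.log(ref_q).mean(axis=0))
    # Rescale so that the average normalization factor across samples is 1.
    norm_factor = norm_factor / norm_factor.mean()
    return target_q / norm_factor

# Hypothetical Ct values for three samples: the target and both references
# shift by the same number of cycles, i.e. only the cDNA input differs.
dxr_ct = [20.0, 21.0, 22.0]
ref_a_ct = [18.0, 19.0, 20.0]
ref_b_ct = [25.0, 26.0, 27.0]
normalized = normalize_target(dxr_ct, [ref_a_ct, ref_b_ct])
```

in this toy case the normalized values are identical across the three samples, because the apparent differences in the target are fully explained by the references; an unstable reference pair would instead distort the target profile.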
it has been shown extensively that the expression of the so-called 'housekeeping' genes, although constant under some experimental conditions, can vary quite considerably in other cases, implying that the expression stability of the intended control gene has to be verified before each experiment (thellin et al. 1999, volkov et al. 2003, czechowski et al. 2005, guénin et al. 2009, hruz et al. 2011).
fig. 4 relative expression of the isoprenoid biosynthetic gene dxr in different tissues/organs of eucalyptus by rt-qpcr and normalization with different reference gene pairs. gene pairs employed as references are indicated at the bottom of the graphics. average values of the relative expression of the reference genes in the different tissues were set to 1 in order to normalize the expression of dxr. (a) expression patterns of the dxr gene in flowers, leaves and xylem tissues of e. grandis. (b) expression patterns of the dxr gene in xylem tissues of e. grandis, e. globulus and e. pellita.
normalization with multiple reference genes is becoming the gold standard for the technique, but reports that identify such genes in plant research are limited, especially for woody species (rajeevan et al. 2001, coker and davies 2003, brunner et al. 2004, iskandar et al. 2004, nicot et al. 2005, jain et al. 2006, reid et al. 2006, expósito-rodríguez et al. 2008, gonzález-verdejo et al. 2008, hong et al. 2008, jian et al. 2008, libault et al. 2008, martin et al. 2008, remans et al. 2008, hu et al. 2009, løvdal and lillo 2009, paolacci et al. 2009, silveira et al. 2009, tong et al. 2009, de almeida et al. 2010, artico et al. 2010, boava et al. 2010, hong et al. 2010, kulcheski et al. 2010, lin and lai 2010, maroufi et al. 2010, schmidt and delaney 2010, yang et al. 2010, hoenemann and hohe 2011). in the present work, we evaluated the results of microarray data concerning 21,442 eucalyptus genes and selected the 50 most stably expressed genes in leaves of e.
grandis and the xylem of e. grandis and e. globulus. to do so, two statistical algorithms were employed, sam and sdma, and the same 50 candidate genes were pointed out as the most invariably expressed genes in microarrays, although the ranking of the genes was different (table 1) . while sam is a well established and popular program to analyze microarray data, with almost 6,500 citations in pubmed (tusher et al. 2001) , sdma is here presented as a novel algorithm developed to better represent, in 3d graphs, the results of the most stably expressed genes in microarrays. it is based on the principle that gene expression values with lower standard deviations are supposed to be the most similarly expressed among the samples being tested. by rt-qpcr, the expression stability of eight of the 50 best candidate genes selected by sam and sdma was addressed in different organs (leaves and flowers) and vascular tissues (xylem) derived from six species of eucalyptus. besides these eight novel genes, seven other genes previously tested as references in eucalyptus or other plants were also evaluated, including classic housekeeping genes such as those encoding tubulin (tua2), histone h2b, gapdh and the ribosomal protein l23a (ribl23a). genes encoding tua2, gapdh, histones, ribosomal proteins and rnas are the most employed and tested housekeeping genes in plants (thellin et al. 1999 , rajeevan et al. 2001 , volkov et al. 2003 , iskandar et al. 2004 , czechowski et al. 2005 , barsalobres-cavallari et al. 2009 , de almeida et al. 2010 , lin and lai 2010 , maroufi et al. 2010 . according to our rt-qpcr results and data analysis by genorm and normfinder, all these housekeeping genes showed quite consistent stability in expression in eucalyptus, especially ribl23a (fig. 3, table 3 ). 
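for readers unfamiliar with the stability measure reported by genorm, the following sketch computes the m value as defined by vandesompele et al. (2002): for each gene, the average, over all other candidate genes, of the standard deviation of the log2 expression ratios across samples. it is a simplified illustration and not the genorm software itself; the function name and the toy data are hypothetical, and the stepwise exclusion of the least stable gene performed by genorm is omitted.

```python
import numpy as np

def genorm_m(quantities):
    """Compute a geNorm-style stability value M for each gene.

    quantities: array of shape (samples, genes) holding relative
    expression on a linear scale (all values must be > 0).
    Lower M means more stable expression relative to the other genes.
    """
    log_q = np.log2(np.asarray(quantities, dtype=float))
    n_genes = log_q.shape[1]
    m_values = np.empty(n_genes)
    for j in range(n_genes):
        # Standard deviation of the log2 ratio gene j / gene k across samples.
        pairwise_sd = [
            np.std(log_q[:, j] - log_q[:, k], ddof=1)
            for k in range(n_genes)
            if k != j
        ]
        m_values[j] = np.mean(pairwise_sd)
    return m_values

# Toy data (hypothetical): genes 0 and 1 keep a constant ratio across the
# three samples, whereas gene 2 fluctuates independently.
toy = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 8.0],
    [4.0, 8.0, 2.0],
])
m = genorm_m(toy)
```

in the toy data the fluctuating gene receives the highest m value and would be the first candidate to be discarded.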
nevertheless, the employment of the pair gapdh/ribl23a to normalize the expression of the isoprenoid biosynthetic gene dxr proved that, at least together, these genes are not suitable as references for eucalyptus gene expression studies. indeed, the combination of any pair of the three best reference genes here presented, eucons04, 08 and 21, to normalize the results of dxr gene expression was intrinsically consistent, leading to a quite different interpretation of the dxr gene expression in xylem tissues of three eucalyptus species, as shown in fig. 4b . the stability of the tua2 gene has often been used to normalize rt-qpcr expression data (brunner et al. 2004 , gonzález-verdejo et al. 2008 , de almeida et al. 2010 . analysis of the eucalyptus microarray and rt-qpcr data revealed that, indeed, it has a quite stable expression. nevertheless, this gene is far from being the best reference for eucalyptus among those tested (fig. 3, table 3 ). tua2 has been shown to be a suitable normalization gene during plant development in orobanche ramosa (gonzález-verdejo et al. 2008) , for comparison of gene expressions among species of populus (brunner et al. 2004) , and during e. globulus adventitious rooting in vitro (de almeida et al. 2010), but it was unstable during seedling development in a. thaliana (volkov et al. 2003 , hong et al. 2010 , zhou et al. 2010 , in different tissues or under biotic and abiotic stresses in potato (nicot et al. 2005 ) and cucumber . similar results were also obtained by artico et al. (2010) in cotton, silveira et al. (2009) in brachyaria brizantha, and expósito-rodríguez et al. (2008) and dekkers et al. (2012) for a. thaliana and tomato seeds. like tua2, the gapdh gene was shown to reach stability values consistent enough to be considered a reference in our studies with eucalyptus (fig. 3, table 3 ), although much better candidates were pointed out. according to , the relative expression of gapdh in rice varied up to 2-fold. 
in brachypodium distachyon, results of rt-qpcr showed that the gapdh gene was stably expressed under various abiotic stresses, without considerable variation in response to growth hormones, although it exhibited less stability according to the tissue type being evaluated (hong et al. 2008) . in tomato, gapdh was poorly ranked as a good reference gene based on the analysis of est data (coker and davies 2003) and rt-qpcr assays during plant development (expósito-rodríguez et al. 2008) or under abiotic stress (løvdal and lillo 2009) . similar results were obtained with peach, where gapdh was not among the best reference genes in the experimental groups (tong et al. 2009 ). according to tong et al. (2009) , the reasons for the observed discrepancies may be that gapdh not only acts as a component of the glycolytic pathway but also takes part in other processes. therefore, the expression profile of gapdh might fluctuate according to the corresponding experimental conditions. the gene encoding histone h2b also exhibited levels of steady-state mrna quite constant in the different eucalyptus organs/tissues evaluated in this study (fig. 3, table 3 ). however, like the other traditional housekeeping genes, its stability was overcome by the novel eucons genes discussed later. during e. globulus adventitious rooting in vitro, h2b, along with tua2, was among the most stably expressed genes (de almeida et al. 2010) . the gene encoding histone h3 in chicory was also indicated as a good reference for rt-qpcr assay normalization (maroufi et al. 2010 ). nevertheless, czechowski et al. (2005) , based on the analysis of a large amount of data derived from microarray studies, showed that genes encoding histones were not among the best reference genes for a. thaliana. similar results were obtained by lin and lai (2010) when studying synchronized longan tree embryogenic cultures at different developmental stages and temperatures. 
genes encoding ribosomal proteins and rrnas are often viewed as a homogeneous collection of housekeeping genes and were employed as references in many works (thellin et al. 1999 , volkov et al. 2003 , iskandar et al. 2004 , barsalobres-cavallari et al. 2009 , de almeida et al. 2010 . nevertheless, members of this gene family were shown to have extraribosomal functions with strong variations in the pattern of their expression (wool 1996 , mcintosh et al. 2005 . for instance, these genes were shown to be specifically induced or repressed in particular tissues during different stages such as tuber (taylor et al. 1992 ) and root (williams and sussex 1995) development; or in response to stresses such as genotoxicity (revenkova et al. 1999 ) and cold (saez-vasquez et al. 2000 , kim et al. 2004 ). volkov et al. (2003) specifically evaluated the tissue-specific changes in the ribl23a mrna levels in different organs of a. thaliana. compared with leaves, the level of ribl23a mrna was increased in flowers and reduced in stems and siliques. these observations are in accordance with the idea that ribosomal protein genes in plants are transcriptionally up-regulated in actively growing tissues and down-regulated in metabolically inactive tissues (marty et al. 1993 , moran 2000 . interestingly, among the traditional housekeeping genes tested in the present work, ribl23a was the most stable, only outperformed by the eucons genes discussed later. one of the eucons genes tested in the present work is orthologous to at2g28390.1, originally identified as one of the best reference genes for a. thaliana gene expression analysis by czechowski et al. (2005) , both by microarray and by rt-qpcr analysis. the orthologous eucalyptus sequence was tested by de almeida et al. (2010) during in vitro adventitious rooting, proving it to be one of the best reference genes for rt-qpcr under the conditions assayed. 
the at2g28390.1 sequence putatively encodes a sand family protein member, a membrane protein related to vesicle traffic (cottage et al. 2004, czechowski et al. 2005). considering eucalyptus leaves, xylem and flowers tested in the present work, the at2g28390.1 gene was among the least stable genes (fig. 3, table 3). similar results were obtained for euc10 and euc12 genes. both sequences were previously identified as strong reference gene candidates for e. grandis vs. e. globulus xylem and leaf gene expression studies (unpublished results). de almeida et al. (2010) proved that euc12 is indeed a good reference gene for rt-qpcr studies during e. globulus in vitro rooting. in the present work, both genes exhibited acceptable stability values (fig. 3, table 3), but were outperformed by the eucons genes. eucalyptus grandis sequences for euc10 and euc12 were derived from at3g07640.1 (encoding an unknown protein) and at1g32790.1 (encoding a putative rna-binding protein) from a. thaliana, also pointed out by czechowski et al. (2005) as the best reference genes based on both microarrays and rt-qpcr. the analysis of the eucalyptus microarray data allowed us to identify the 50 most stable genes in the xylem of e. grandis and e. globulus and leaves of e. grandis (table 1). we named these potential reference genes eucons after 'eucalyptus constitutives'. rt-qpcr analysis of eight of the selected genes proved that these genes were indeed very reliable references for the normalization of gene expression in different eucalyptus organs and tissues, especially those named eucons04, 08 and 21. analysis of the function of the putative encoded proteins revealed that these genes may also belong to the so-called housekeeping class of genes. eucons04 putatively encodes a protein highly similar to cyclin-dependent protein kinases (cdks) such as r. communis cdk8 and cdks from a. thaliana (menges et al. 2005).
these types of proteins are able to phosphorylate protein target amino acids in different metabolic pathways and, most notably, in cell cycle control (umeda et al. 2005) . eucons08 is similar to r. communis and a. thaliana genes possibly encoding the transcription elongation factor sii (tfiis). sii is considered one of the numerous elongation factors that enable rna polymerase ii to transcribe faster and/or more efficiently. it engages transcribing rna polymerase ii and assists it in bypassing blocks to elongation by stimulating a cryptic, nascent rna cleavage activity intrinsic to rna polymerase (wind and reines 2000) . eucons21 encodes a protein with significant sequence similarity to a putative r. communis aspartyl-trna synthetase. aminoacyl-trna synthetases catalyze the addition of amino acids to their cognate trnas. in the case of aspartyl-trna synthetase, the amino acid bound to trnas is aspartate. in plants, all aminoacyl-trna synthetases are nuclear encoded and are post-translationally targeted to the compartments where protein synthesis takes place, i.e. the cytoplasm, mitochondria or plastids (duchêne et al. 2005) . according to the analysis of the rt-qpcr data performed with the software normfinder, eucons04 and eucons08 are the best reference genes pairwise when assessing test gene expression exclusively in leaves, or in leaves along with xylem tissues. if only xylem tissues are analyzed, eucons07 and eucons27 would be the best references (table 3) . eucons07 encodes a protein similar to a member of the abc transporter family from arabidopsis lyrata while eucons27 putatively encodes a factor related to peroxisome biogenesis. interestingly, an abc transporter atpase-encoding gene was indicated as one of the best reference genes for rt-qpcr analysis of embryogenic cell cultures of cyclamen persicum (hoenemann and hohe 2011). kamada et al. (2003) , analyzing expression profiles of genes encoding peroxisomal proteins in a. 
thaliana, showed that these genes are expressed in all plant organs, suggesting that they play a role in metabolic pathways of unidentified plant peroxisomes and may have a constitutive expression in plants. it is important to mention that, when considering all organs/tissues of all eucalyptus species evaluated by rt-qpcr, the stability values of eucons04, 08 and 21 are not statistically different from those observed for h2b, ribl23a and eucons06 according to the normfinder analysis, as can be observed in fig. 3c and table 3. nevertheless, the algorithm genorm also indicated eucons04, 08 and 21 as the best reference genes for the group of variables evaluated. although outperformed by eucons04 and 08 as reference genes, the remaining eucons06, 32 and 43 genes also presented consistently constant stability values in our analysis. eucons06 putatively encodes a plastidic atp/adp-transporter, while eucons32 and 43 encode a nitrogen regulatory protein and a serine/threonine-protein kinase, respectively. to our knowledge, none of these sequences was previously indicated as a potential reference to normalize studies of gene expression by rt-qpcr. based on the microarray expression analysis of >21,000 eucalyptus genes, we identified the 50 most stably expressed genes in leaves (e. grandis) and xylem tissues (e. grandis and e. globulus). we proved by rt-qpcr that eight representatives of these reference genes are indeed very stable in different organs/tissues and species of eucalyptus, outperforming traditional housekeeping genes. considering that two statistical programs allowed us to reach similar interpretations of the microarray results, and that potential discrepancies should be expected, the good agreement of our results with the independent approaches strongly suggests that eucons04, eucons08 and eucons21 should be regarded as the most suitable reference genes for normalization of gene expression studies in eucalyptus species.
although the selected reference genes were tested only in six species of a genus with >700 species, these six species represent some of the most widely planted trees in the tropics (fao 2001) and exhibit quite a large variation in growth rate, stress resistance and wood quality (coppen 2002). to our knowledge, the present work represents the widest in-depth study developed to validate optimal reference genes for the evaluation of transcript levels in different eucalypt organs and species. in summary, these findings provide useful tools for the normalization of rt-qpcr experiments and will enable more accurate and reliable gene expression studies related to functional genomics in eucalyptus. for microarray studies, xylem tissues were collected from 4-year-old, field-grown e. grandis and e. globulus trees located at hortoflorestal barba negra (aracruz celulose s.a., today's fibria) in barra do ribeiro, rs, brazil. xylem was collected by scraping the exposed vascular tissue after the removal of the 0.5-1 cm thick stem bark. two lines of genetically unrelated matrixes were chosen and each line was represented by two clones (biological duplicates), therefore totalling eight xylem samples. from both clones of one of the e. grandis lines, mature leaves were also collected. to minimize the proportion of primary xylem mainly located in the main veins of leaves, only leaf blades without the central vein were used for this study. tissue samples were immediately frozen in liquid nitrogen and stored at -80 °C. for rt-qpcr studies, the same e. grandis and e. globulus trees were sampled, along with xylem and leaves of field-grown e. dunnii, e. pellita, e. saligna and e. urophylla. eucalyptus grandis flowers were also collected under the same conditions. harvested organs/tissues were immediately frozen in liquid nitrogen and stored at -80 °C until further analysis.
total rna was extracted using the purelink plant rna purification reagent (invitrogen) according to the manufacturer's instructions for small-scale rna isolation. about 20 µg of total rna was sent to nimblegen systems inc. (reykjavik, iceland) for cdna synthesis and microarray hybridizations. microarray experiments were carried out by roche nimblegen. in total, 21,432 unigenes were selected from the genolyptus est data set to make up a basic chip. ten cdnas encoding known human proteins were also included in chips as negative controls. nine oligonucleotides, 50 bp long, distributed throughout each sequence and with close melting temperatures were designed and synthesized for each sequence consensus or singleton. probes were randomly distributed on two blocks of each chip in duplicated form, adding up to 385,856 features per chip. therefore, each chip was composed of two blocks (technical replicates) containing the same collection of randomly distributed probes, and 18 hybridization values were collected for each gene from every chip. a total of 10 identical chips were produced. two chips were hybridized with cdna samples from e. grandis mature leaves, and eight chips were used for xylem cdna hybridizations. after submission of total rna samples to nimblegen, prepared as described above, cdnas were labeled with cy3, and hybridizations, washing, scanning, data collection and initial data normalization were performed according to nimblegen's standard protocols. microarray expression data were normalized into log2 intensity values. afterwards, we carried out three distinct analyses. in the first one, we compared hybridizations from e. grandis leaf and xylem. in the second one, we compared e. grandis xylem and e. globulus xylem. in both previous analyses, the aim was to find the most similarly and the most differentially expressed genes. in the third analysis, we looked for the most similarly expressed genes in hybridizations from the three organs/tissues.
in each analysis, data were mean-centered as follows. a reference set was generated by averaging the expression of each gene over all hybridizations. each piece of hybridization data was subtracted from the reference data set, generating new mean-centered data. in the next step, the 'relative difference' in gene expression was computed. the relative difference score was used to identify the most similarly and the most differentially expressed genes. we performed the two-class unpaired sam (tusher et al. 2001) when comparing two tissues, and a multiclass sam when comparing the three organs/tissues. to perform the experiments, we used sam version 3.09 and r 2.9.2 tools and the sdma v1.0 tool as described next. in this paper we propose a new approach, called sdma, for finding the most similarly expressed genes in microarray studies. sdma is the acronym for standard deviation microarray analysis. the formal statement of sdma can be defined as follows: let $G = \{g_1, g_2, g_3, \ldots, g_m\}$ be a set of genes. let $H = \{h_1, h_2, h_3, \ldots, h_o\}$ be a set of hybridizations, where $o \geq 2$. let $M = \{t_1, t_2, t_3, \ldots, t_n\}$ be a set of tissues, where $n \geq 2$ and each element $t$ contains a set of hybridizations such that $t \subseteq H$, and $t_x \cap t_y = \emptyset$ for any $x \neq y$. let $E_{h_p} = \{e_{h_p g_1}, e_{h_p g_2}, e_{h_p g_3}, \ldots, e_{h_p g_m}\}$ be the set of expression levels of the $m$ genes in hybridization $h_p$, where $p \leq o$. let $\mathrm{avg}(t_p g_q)$ be the average of the expression levels of gene $q$ over all hybridizations from tissue $p$. let $\mathrm{SD}_{g_q}$ be the standard deviation of gene $q$ considering $\mathrm{avg}(t_1 g_q), \mathrm{avg}(t_2 g_q), \mathrm{avg}(t_3 g_q), \ldots, \mathrm{avg}(t_n g_q)$. $\mathrm{SD}_{g_q}$ can assume any value from 0 to 1. $\mathrm{SD}_{g_q}$ is equal to zero when $\mathrm{avg}(t_1 g_q) = \mathrm{avg}(t_2 g_q) = \mathrm{avg}(t_3 g_q) = \ldots = \mathrm{avg}(t_n g_q)$, i.e. the gene $q$ has exactly the same expression level in all tissues. the value of $\mathrm{SD}_{g_q}$ increases as the differences among $\mathrm{avg}(t_1 g_q), \mathrm{avg}(t_2 g_q), \mathrm{avg}(t_3 g_q), \ldots, \mathrm{avg}(t_n g_q)$ grow. so, sdma can rank the $N$ most similarly expressed genes, i.e. those $N$ genes for which $\mathrm{SD}_g$ is closest to zero. when viewing sdma graphs, a main diagonal line is supposed to exist since it contains every possible data point where $\mathrm{avg}(t_1 g) = \mathrm{avg}(t_2 g) = \mathrm{avg}(t_3 g) = \ldots = \mathrm{avg}(t_n g)$. although it is rare to find a gene obeying this restriction when comparing similar tissues, a concentration of data points around the main diagonal line is supposed to exist. otherwise, when comparing very dissimilar tissues, data points are supposed to be dispersed in space. regarding the eucalyptus microarray analysis, a set of 21,442 genes was considered, i.e. $G = \{g_1, g_2, g_3, \ldots, g_{21,442}\}$. there were three tissues evaluated, i.e. $M = \{t_1, t_2, t_3\}$, where $t_1$ represents e. grandis leaf, $t_2$ represents e. grandis xylem and $t_3$ represents e. globulus xylem. there was a set of 20 hybridizations, i.e. $H = \{h_1, h_2, h_3, \ldots, h_{20}\}$, where $t_1 = \{h_1, h_2, h_3, h_4\}$, $t_2 = \{h_5, h_6, \ldots, h_{12}\}$ and $t_3 = \{h_{13}, h_{14}, \ldots, h_{20}\}$. we also considered $\mathrm{SD}_g$ as the standard deviation of the expression levels of a gene $g$ in $t_1$, $t_2$ and $t_3$. moreover, microarray expression data were scaled into $\log_{10}$ intensity values. this resulted in values for expression levels from 4.66 to 5.20. the sdma approach ranked the genes according to their similarities in expression levels in the three distinct tissues. so, genes with the smallest standard deviation are supposed to be the most similarly expressed genes. otherwise, genes with higher standard deviation are supposed to be the most differentially expressed ones. primer pairs for rt-qpcr were designed using the program primerquest (http://www.idtdna.com/scitools/applications/primerquest) and are listed in table 2. the relative transcript abundance was detected by sybr green, and pcrs were carried out in a total volume of 20 µl using a thermocycler 7500 real time pcr system (applied biosystems).
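returning to the sdma ranking defined above, its core computation can be sketched in a few lines. this is an illustrative reimplementation under stated assumptions, not the sdma v1.0 tool: the function name, the argument layout (genes in rows, hybridizations in columns), the toy values and the use of the sample standard deviation (ddof=1) are all hypothetical choices, since the text does not specify them.

```python
import numpy as np

def sdma_rank(expression, tissue_of_hybridization, top_n):
    """Rank genes by the standard deviation of their per-tissue averages.

    expression: array of shape (genes, hybridizations), e.g. log10 intensities.
    tissue_of_hybridization: one tissue label per hybridization (column).
    top_n: number of most similarly expressed genes to return.
    Returns (indices of the top_n genes, standard deviation for every gene).
    """
    expression = np.asarray(expression, dtype=float)
    tissues = np.asarray(tissue_of_hybridization)
    labels = np.unique(tissues)
    # avg(t_p g_q): average of each gene over the hybridizations of a tissue.
    tissue_avg = np.column_stack(
        [expression[:, tissues == label].mean(axis=1) for label in labels]
    )
    # SD_g: standard deviation across the per-tissue averages
    # (sample standard deviation is an assumption).
    sd = tissue_avg.std(axis=1, ddof=1)
    # The smallest SD_g marks the most similarly expressed genes.
    order = np.argsort(sd, kind="stable")
    return order[:top_n], sd

# Toy example (hypothetical values): three genes, two tissues,
# two hybridizations per tissue.
toy_expression = np.array([
    [5.0, 5.0, 5.0, 5.0],   # identical in both tissues
    [4.0, 4.0, 6.0, 6.0],   # strongly tissue dependent
    [5.0, 5.5, 5.5, 5.5],   # slightly tissue dependent
])
toy_tissues = ["leaf", "leaf", "xylem", "xylem"]
top_genes, sd_values = sdma_rank(toy_expression, toy_tissues, top_n=2)
```

sorting the same standard deviations in descending order would instead list the most differentially expressed genes.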
reaction conditions included one initial cycle of denaturation at 95 °C for 5 min followed by 40 cycles of 95 °C for 15 s (denaturation), 60 °C for 10 s (annealing) and 72 °C for 15 s (elongation). pcrs were followed by a melting curve program (60-95 °C with a heating rate of 0.1 °C s⁻¹ and a continuous fluorescence measurement). a negative control was run without a cdna template in all assays to assess the overall amplification specificity. the gene ontology functional annotation tool blast2go (conesa et al. 2005) was used to assign go identities and enzyme commission numbers. this tool also enabled statistical analysis related to over-representation of functional categories based on a fisher exact statistic methodology.
selection of appropriate control genes to assess expression of tumor antigens using real-time rt-pcr reference gene selection for real-time rt-pcr in human epidermal keratinocytes gapped blast and psi-blast: a new generation of protein database search programs normalization of real-time quantitative reverse transcription-pcr data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets identification and evaluation of new reference genes in gossypium hirsutum for accurate normalization of real-time quantitative rt-pcr data identification of suitable internal control genes for expression studies in coffea arabica under different experimental conditions utility of the housekeeping genes 18s rrna, beta-actin and glyceraldehyde-3-phosphate-dehydrogenase for normalization in real-time quantitative reverse transcriptase-polymerase chain reaction analysis of gene expression in human t lymphocytes selection of endogenous genes for gene expression studies in eucalyptus under biotic (puccinia psidii) and abiotic (acibenzolar-s-methyl) stresses using rt-qpcr validating internal controls for quantitative plant gene expression studies quantification of mrna using real-time reverse
we acknowledge m.sc. marta dalpian heis for her guidance and review of the statistical tests. we are grateful to the staff of aracruz celulose s.a.
(today's fibria) for providing the biological material, and genolyptus project members that selected the eucalyptus material.

key: cord-253973-zr28uujh authors: maccoux, lindsey j; clements, dylan n; salway, fiona; day, philip jr title: identification of new reference genes for the normalisation of canine osteoarthritic joint tissue transcripts from microarray data date: 2007-07-25 journal: bmc mol biol doi: 10.1186/1471-2199-8-62 sha: doc_id: 253973 cord_uid: zr28uujh background: real-time reverse transcriptase quantitative polymerase chain reaction (real-time rt-qpcr) is the most accurate measure of gene expression in biological systems. the comparison of different samples requires the transformation of data through a process called normalisation. reference or housekeeping genes are candidate genes which are selected on the basis of constitutive expression across samples, and allow the quantification of changes in gene expression. at present, no reference gene has been identified for any organism which is universally optimal for use across different tissue types or disease situations. we used microarray data to identify new reference genes generated from total rna isolated from normal and osteoarthritic canine articular tissues (bone, ligament, cartilage, synovium and fat). rt-qpcr assays were designed and applied to each different articular tissue. reference gene expression stability and ranking were compared using three different mathematical algorithms. results: twelve new potential reference genes were identified from microarray data. one gene (mitochondrial ribosomal protein s7 [mrps7]) was stably expressed in all five of the articular tissues evaluated. one gene (hira interacting protein 5 isoform 2 [hirp5]) was stably expressed in four of the tissues evaluated. a commonly used reference gene, glyceraldehyde-3-phosphate dehydrogenase (gapdh), was not stably expressed in any of the tissues evaluated.
most consistent agreement between rank ordering of reference genes was observed between bestkeeper© and genorm, although each method tended to agree on the identity of the most stably expressed genes and the least stably expressed genes for each tissue. new reference genes identified using microarray data normalised in a conventional manner were more stable than those identified by microarray data normalised by using a real-time rt-qpcr methodology. conclusion: microarray data normalised by a conventional manner can be filtered using a simple stepwise procedure to identify new reference genes, some of which will demonstrate good measures of stability. mitochondrial ribosomal protein s7 is a new reference gene worthy of investigation in other canine tissues and diseases. different methods of reference gene stability assessment will generally agree on the most and least stably expressed genes, when co-regulation is not present. the quantification of gene expression allows the mechanism organising biological activity to be determined. at present, real-time rt-qpcr provides the most accurate and specific measure of gene expression, with an unsurpassed dynamic range and a high level of reproducibility. a number of variables will contribute to the variability of gene expression measurements, such as the number and type of cells in the tissue evaluated, the method and efficiency of mrna extraction, mrna handling techniques [1] , mrna integrity [2, 3] , method of reverse transcription [4] and analytical detection chemistry method [1] . these inter-sample differences are addressed through the process of normalisation [5] , whereby the expression of an individual gene within a sample is related to that of a calibrating gene known as a reference, control or "housekeeping" gene. ideally, a reference gene is expressed at a consistent and repeatable quantity across all samples being compared, so that relative differences in gene expression can be measured with confidence. 
commonly used reference genes such as beta-2-microglobulin (b2m), glyceraldehyde-3-phosphate dehydrogenase (gapdh) and beta actin (actb) are not constantly expressed across all tissue types and disease states [6, 7]. thus it is widely accepted that the selection of reference genes should be established through the validation of expression stability in the tissue or cells of interest, before use. a number of statistical algorithms exist for the optimisation of reference gene selection, such as genorm [6], global pattern recognition [8], bestkeeper© [9], equivalence tests [10] and normfinder [11]. in each case, mathematical evaluation of expression data allows the ordering of candidate reference genes on the basis of their relative expression stabilities. at present, no gold standard exists for the selection of reference genes; different methods have produced similar results in some reports [12-14] but not in others [11], and the optimal method for selection remains unknown. the identification of new reference genes from microarray data, within a particular tissue type, has been demonstrated to provide more "stable" reference genes than those conventionally used [11, 14-16], as determined using stability algorithms. microarray data can be stratified on the basis of fold changes in expression [14], the variance of expression [11, 16] or integrative correlations [15]. candidate genes can then be selected from stratified data, and frequently demonstrate expression stabilities greater than conventionally used reference genes [11, 14, 15]. however, microarray data has yet to identify a new reference gene which shows consistent stability across multiple tissue or cell types, and/or disease situations.
therefore a ubiquitous reference gene suitable for normalisation of gene expression in all experiments probably does not exist, but the identification of new reference genes with improved stability is important to reduce error in rt-qpcr experiments. in this study, we identified candidate reference genes from microarray expression profiling data generated from the evaluation of two different canine articular tissues (cartilage and cranial cruciate ligament). the relative stability of expression of each reference gene in normal and osteoarthritic canine articular tissues was determined from rt-qpcr reactions using statistical algorithmic packages. the stability of the new reference genes was compared between tissues, and related to a commonly used reference gene (gapdh).

identities and putative functions of each of the new reference genes are listed in table 1. although the genes selected did not localise to common pathways or functions, two of the genes coded for mitochondrial ribosomal proteins. the metrics of the candidate reference gene stability are presented in table 2.

all methods of stability analysis agreed in finding the new genes mrps7 and mrps25 to be stably expressed. likewise, c7orf28b and nck2 were determined to be the least stably expressed genes by both genorm (figure 2b) and normfinder (figure 2a). gapdh was identified as the 4th most stably expressed gene by both genorm and bestkeeper©, and the 8th most stably expressed gene by normfinder.

all three methods of reference gene analysis agreed on the most stably expressed reference genes, which were c7orf28b, mrps7 and mapk6. genorm (figure 2b) and normfinder (figure 2a) agreed that the least stably expressed gene was nck2. gapdh was identified as the 9th most stably expressed gene by normfinder, the 7th most stably expressed gene by genorm, and the 5th most stably expressed gene by bestkeeper©.
methods did not agree on the most stably expressed gene, although all methods agreed on the five most stably expressed genes (albeit not their order): atic, mrps7, c7orf28b, ormdl2 and hirp5. mrps25 was the least stably expressed gene as determined by both normfinder (figure 2a) and genorm (figure 2b). gapdh was identified as the 7th most stably expressed gene by normfinder, the 9th most stably expressed gene by genorm, and the 8th most stably expressed gene by bestkeeper©.

although bestkeeper© and normfinder agreed on the six most stably expressed genes (mrps25, atic, hirp5, tkt, mrps7, ptdss1), and nck2 was determined to be the least stably expressed gene by normfinder (figure 2a) and genorm (figure 2b), no further patterns of agreement in rank ordering of the expression profiles were identified. atic was identified as the most stably expressed gene by normfinder (figure 2a) and bestkeeper© (figure 2c), and the 6th most stably expressed gene by genorm.

rank ordering between normfinder and genorm agreed on the seven most stably expressed genes (c7orf28b, mrps25, pias1, ptdss1, atic, mrps7 and hirp5) but not their order, and on the least stably expressed gene (nck2). bestkeeper© (figure 2c) and normfinder (figure 2a) agreed on the most stably expressed gene (c7orf28b).

using the reference gene stability value (m) of 0.40 as the determinant of stable expression [6], mrps7 was stably expressed in all five tissues, and hirp5 was found to be stably expressed in four tissues (figure 2b). gapdh was found to be unstable in all of the tissues evaluated, which is consistent with the findings of a previous study of reference genes in these tissues [17].
comparison of gene stability (m) and pairwise stability (v) values with a previous study of commonly used reference genes using similar tissues further illustrates how optimal reference gene stabilities, as assessed by genorm, can be achieved using the new reference genes rather than the commonly used reference genes (table 3). no single reference gene was consistently identified as being the most stably expressed by normfinder, genorm or bestkeeper© across most tissues. there was not consistent agreement in the rank ordering, or the selection of the optimal candidates by the different methods, although agreement was generally reached on the most and least stable gene. for example, bestkeeper© and normfinder always identified the same gene as being most stably expressed. when looking at rank order across all three reference gene stability programs, fat pad showed the highest correlation between methods, followed by cruciate ligament, cartilage and bone, with synovium the least consistent (table 2). when the data for all tissues was compared together (figure 2a, b, c), a much clearer pattern of reference gene stability was observed. the stability metrics of the reference genes in different tissues show similar patterns across all three methods.

figure 2. reference gene stability measures as determined by: 2a. the normfinder algorithm (with a lower value indicating increased reference gene stability). 2b. the genorm algorithm (with a stability measure [m value] <0.4 indicating appropriate reference gene stability). 2c. the bestkeeper algorithm (with a higher value indicating increased reference gene stability). please note that, as only the top 10 genes (as ranked by the normfinder algorithm) are selected for analysis, there are not necessarily data points for each gene corresponding to each tissue.

mrps7 demonstrates the most consistent metric (low genorm m value, low normfinder value and high bestkeeper© correlation), with hirp5 and atic demonstrating a similarly consistent stability across all tissues. this is supported by the finding that mrps7 was consistently identified as being stably expressed in all tissues by genorm, as well as being ranked as one of the two most stable reference genes in four of the five tissues by genorm (cartilage, fat, bone and synovium), and in three of the five tissues using normfinder and bestkeeper© (cartilage, ligament and fat). filtering microarray data that had been normalised using the rt-qpcr methodology (reference gene normalisation) was not successful at identifying new reference genes with increased stability. nck2 was determined to be the least stably expressed gene in synovium and fat pad, and one of the four least stably expressed genes in cruciate ligament and cartilage. trappc2l was not identified as being stably expressed in any tissue using the genorm algorithm, and was not ranked higher than the 8th most stably expressed gene in any tissue using the normfinder algorithm. a number of different strategies have been employed to filter microarray data to identify new reference genes, such as selection on the coefficient of variation and level of expression [11], fold changes of expression [14, 16], or integrative correlations [15]. we used a combination of filtering on statistical significance, fold change and coefficient of variation (percentage standard deviation) to narrow the potential number of reference genes. furthermore, these criteria were applied to three different experiments, using two different data sets, to identify genes which were more likely to have generic stability across multiple tissues or diseases. genes were finally filtered on the basis of defined annotation and level of expression. in retrospect, genes should also have been selected on the basis of single transcript expression (i.e.
the absence of splice variants). although the two most stably expressed genes (mrps7 and hirp5) currently have no splice variants reported, the absence of splice variants did not necessarily confer reference gene stability across multiple tissues (as demonstrated by gapdh and mrps25, genes which do not have splice variants annotated and were not the most stably expressed); nevertheless, the presence of splice variants should be taken into account when selecting new reference genes, as another potential indicator of instability. our filtering method was straightforward, quickly performed and easily completed by any person without a full understanding of microarray data set handling, and as such could be applied to publicly available microarray data sets for a given experiment or disease. variability in the expression of commonly used reference genes has been recognised in the analysis of cell culture experiments [18] and clinical tissue specimens [19]. the selection of reference genes on the basis of their stability, as determined by the mathematical assessment of their expression values, is a widely accepted technique [6, 12-15, 20, 21]. we identified one gene which showed stable expression across normal and diseased articular tissues (mrps7), and a number of genes which demonstrated a relatively consistent stability across the majority of tissue specimens (hirp5). one should bear in mind that the tissues evaluated were from the same embryological origin (mesenchymal tissue), and hence there may have been a tendency towards identifying a reference gene which was stable in all tissues, although this is not supported by reports of reference gene stability in different tissues [21]. likewise, the diseases compared in the microarray data sets were the same as those affecting the tissue samples evaluated by real-time rt-qpcr, which may further tend towards identifying reference genes whose stability was constant.
therefore, although we identified one gene as being stably expressed in all tissues, we would not advocate its use as a reference gene in other tissues or diseases without assessment of its stability in the samples to be evaluated [6, 16, 21], as the utopia of a universal reference gene suitable for all studies probably does not exist on the basis of the published evidence to date. mitochondrial ribosomal protein s7 is involved in mitochondrial protein synthesis. the precise function of this gene is unknown in eukaryotes, but the protein is thought to be involved in organising the 3' domain of the 16s rrna in prokaryotes, and thus to be involved in the initiation of translation in mammalian mitochondria [22]. microarray data analysis indicated the mitochondrial ribosomal protein s25 was also stably expressed, although it was only stably expressed in two of the four tissues analysed by rt-qpcr (cartilage and fat pad). in a separate study, mitochondrial ribosomal protein l19 was one of six genes identified from microarray data obtained from different tissues and cells, as a good reference gene for real-time rt-qpcr experiments, when compared to conventional reference genes for mammary tumour expression profiling [16]. mitochondrial ribosomal gene expression appears to show greater stability across different tissues and thus may be less affected by tissue type or disease status, so these genes may be better candidate reference genes for other real-time rt-qpcr experiments. comparing the results of this study to a similar previous study of commonly used reference genes in multiple articular tissues demonstrates the increased stability of the "new" reference genes (table 3) [17]. the selection of candidate reference genes from microarray data identified new genes which were more stably expressed, and is consistent with the general outcome of previous studies using this methodology [11, 14-16].
the normalisation of microarray data by the geometric mean of three reference genes [6] did not identify genes (nck2 or trappc2l) with appropriate stability to be suitable for use as reference genes. the instability of these genes may be reflected, in part, by the greater variation identified in the triplicate repeats of each assay when compared to genes determined as being more stably expressed, such as hirp5 or mrps7. the less stable expression of the three conventional reference genes (gapdh, rpl13a and sdha) probably resulted in the selection of similarly "unstably" expressed reference genes from microarray data, and thus accounted for this being a futile method of trying to select reference genes, which agrees with the evaluation of these types of methodologies for the accurate normalisation of microarray data [23]. these genes were selected on the basis of a preliminary study of reference gene stability in canine oa tissues [24]; however, subsequent work evaluating greater sample numbers has determined that one of these genes (sdha) demonstrates differential expression in oa cartilage [25], and thus its use may have further predisposed to the selection of genes which were not stably expressed. furthermore, the conventionally used reference gene we evaluated (gapdh) did not show acceptably stable expression in any of the tissues we analysed. we used three different methods of ranking reference gene stability in each experiment. correlation coefficients could therefore be generated to compare methods and quantify the agreement of the rank ordering of different methods. previous studies have demonstrated that the generation of rank orders can be very similar between different methods [14], but this is not always the case [11].
the best correlation in rank ordering was observed between genorm and bestkeeper©, across all the tissues which is unsurprising as both are generated by pairwise comparisons (although genorm uses transformed data, whereas bestkeeper© uses threshold cycle values), although bestkeeper© and normfinder always identified the same gene as being most stably expressed. the rank order of reference gene stability was identified most consistently for fat pad, followed by cruciate ligament, cartilage, bone and least consistently for synovium. the advantage of using a model based stability assessment is that rank ordering can be changed if co-regulated genes are included in the stability assessment procedure, as pairwise assessments will determine an increase in stability between these methods [11] . as we identified a number of new reference genes which have very little functional information associated with their annotation, we checked for co-regulation between the most stably expressed genes by removing one of the highest ranked genes (as determined by pairwise comparisons) alternately, and reassessing the rank ordering of reference genes stabilities. no major changes in rank ordering or reference gene stability were observed when this was performed. however, it should be noted that other factors besides gene expression pathway similarities can contribute to co-regulation. yu et al. (2003) identified that genes targeted by similar transcription factors have complex relationships across the co-regulated genes [26] . the different methods for determining reference gene stability did not necessarily agree on rank order, but were good at determining both the most and least stably expressed genes, regardless of method. the top two most stably expressed genes analysed by genorm for each tissue were then used to study cytokine gene expression in canine osteoarthritis [27] . 
the use of microarray data for the selection of reference genes allowed the identification of multiple genes demonstrating greater stability than a conventional reference gene in multiple tissues. mitochondrial ribosomal protein s7 is suitable for use in all the experimental conditions we analysed, and is suitable for investigation in other experiments. different methods of assessment of gene stability do not always show correlation between the rank order of gene expression stability, but they do generally agree on which genes are suitable for use to normalise gene expression experiments. expression profiling data from 10 hip articular cartilage samples (5 control, 5 from osteoarthritic [oa] joints) and 16 cranial cruciate ligament (ccl) samples (4 normal low-risk of rupture, 7 normal high-risk of rupture, and 5 ruptured ligament from oa joints) were generated from a custom designed 44 k transcript canine whole genome 60 mer oligonucleotide microarray [28] . raw data was normalised by two methods; locally weighted scatterplot smoothing (lowess), or using the geometric mean of 3 conventional reference genes arbitrarily selected (glyceraldehyde-3-phosphate dehydrogenase [gapdh], ribosomal protein l13a [rpl13a], succinate dehydrogenase flavoprotein subunit a [sdha]). expression quantification was exported into an excel datasheet (microsoft excel 2003), and the data compared in three separate experiments as follows; 1) normal hip articular cartilage was compared to oa cartilage, 2) normal ccl (high-risk of rupture) was compared to normal ccl (low-risk of rupture), the stepwise procedure for identifying candidate reference genes is illustrated in figure 1 . data for each reference gene candidate was compared in each experiment by calculating the fold change in mean expression level (between the two comparison groups), student's t-tests and percentage standard deviation (co-efficient of variation). 
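the statistics named above (ratio of mean expression between groups and percentage standard deviation), together with the first three selection criteria that follow, can be sketched in code. this is a minimal illustration under stated assumptions, not the authors' procedure (they worked in an excel datasheet): the function names and example values are hypothetical, the p value is assumed to have been computed separately by a student's t-test, and the ratio is taken as the larger group mean divided by the smaller, which is an assumption about how the ratio was expressed.

```python
from statistics import mean, stdev


def percent_sd(values):
    """Standard deviation as a percentage of the mean (coefficient of variation)."""
    return 100.0 * stdev(values) / mean(values)


def expression_ratio(group_a, group_b):
    """Ratio of the two group means, larger over smaller (assumed convention)."""
    a, b = mean(group_a), mean(group_b)
    return max(a, b) / min(a, b)


def passes_experiment(group_a, group_b, p_value,
                      min_p=0.5, max_ratio=1.5, max_cv=30.0):
    """Apply criteria 1-3 to one gene within one two-group comparison."""
    return (p_value > min_p
            and expression_ratio(group_a, group_b) < max_ratio
            and percent_sd(group_a) < max_cv
            and percent_sd(group_b) < max_cv)


def passes_all(experiments):
    """A candidate is kept only if it passes in every experiment.

    `experiments` is a list of (group_a, group_b, p_value) tuples;
    p_value is assumed to come from a t-test run elsewhere.
    """
    return all(passes_experiment(a, b, p) for a, b, p in experiments)


# Hypothetical signal intensities for one gene in two comparisons.
steady = ([100, 110, 95, 105, 102], [98, 108, 101, 97, 104], 0.83)
shifted = ([100, 110, 95, 105, 102], [210, 190, 205, 220, 198], 0.0001)

print(passes_all([steady]))           # the steady comparison meets all three criteria
print(passes_all([steady, shifted]))  # rejected: one comparison breaks the criteria
```

requiring every comparison to pass mirrors the "(in all experiments)" qualifier attached to each criterion.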
to identify the most stably expressed genes across each of the experiments, the prospective reference genes were then selected using the following criteria:
1. student's t-test p value > 0.5 (in all experiments).
2. ratio of expression between the two groups compared in each experiment <1.5 (in all experiments).
3. standard deviation of the mean expression in each experimental group being <30% (in all experiments).
the data sets were reduced to 420 transcripts (lowess normalised) and 13 transcripts (reference gene normalisation). to further refine and filter the new reference gene list, data was ordered on the average signal intensity and:
4. the probe sequences used from the microarray experiments were entered into the ncbi blast® database [29] to confirm the gene identity.
5. gene function was determined [29] and the associated gene information checked to ensure no known involvement in oa.
complete filtering reduced the data set to 12 genes, of which 10 were selected from the lowess normalised data, ( and cytoplasmic protein nck2 [nck2]). glyceraldehyde-3-phosphate dehydrogenase [gapdh] was also selected as it is a commonly used reference. the sequence details and putative functions (determined by reference to the human transcripts [29]) are listed in table 1. a separate set of samples was collected for the analysis of the new reference genes. infrapatellar fat (n = 5), ruptured cranial cruciate ligament (n = 5), femoral head articular cartilage (n = 5), ulnar subchondral bone (n = 5) and synovial membrane (n = 5) were obtained from dogs with clinical oa secondary to naturally occurring joint disease. in each case the samples were obtained as part of the standard surgical treatment for the disease in question (total hip replacement, cranial cruciate ligament rupture surgery or fragmented coronoid process removal).
control samples (healthy) were obtained from infrapatellar fat pad (n = 5), cranial cruciate ligament (n = 5), synovial membrane (n = 5), hip articular cartilage (n = 5) and ulnar bone (n = 5) of dogs euthanized for reasons other than, and with no evidence of, joint disease. following the collection of the tissue, the samples were weighed and immediately stored in rnalater™ (qiagen inc, crawley, uk), according to the manufacturer's instructions, until extraction. for all of the tissue samples total rna was extracted using a phenol/guanidine hydrochloride reagent (trizol, invitrogen ltd, uk) with a chloroform extraction and ethanol precipitation, as previously described [30]. an on-column dna digestion step was included (rnase-free dnase set, qiagen ltd, crawley, uk). final elution of the total rna was performed using 30 μl of rnase-free water, and repeated to maximize the amount of rna eluted. total rna samples were stored at -80°c until use. the concentration of total rna representing each sample was quantified by using a nanodrop® nd-1000 uv/visible spectrophotometer (nanodrop technologies ltd, utah, usa). reverse transcription was performed using superscript ii reverse transcriptase (invitrogen, dorset, uk) according to the manufacturer's instructions [31]. initially 200 μg (10 μl) total rna was pre-incubated with 0.5 μg (1 μl) oligo-dt(12-18) (invitrogen, paisley, uk) and 10 mm (1 μl) dntp mix (invitrogen, paisley, uk) at 65°c for 5 minutes.
after chilling on ice, 4 μl of 5× first strand buffer (containing 250 mm tris-hcl (ph 8.3), 375 mm kcl, 15 mm mgcl2), 2 μl of 0.1 m dtt and 40 units (1 μl) of rnase (promega, southampton, uk) were added to each sample and the samples incubated for 2 minutes at 42°c, followed by the addition of 200 units (1 μl) of superscript ii reverse transcriptase (invitrogen, dorset, uk) and incubated for 50 minutes. reverse transcriptase activity was terminated by incubation at 70°c for 15 minutes, and samples stored at -80°c until use.

figure caption: microarray data normalised by two different methods was filtered to identify new reference genes using statistical significance, fold changes in expression between experimental groups, the co-efficient of variation and ontological evaluation.

transcript sequences were obtained from the national centre for biotechnology information [29] and were cross referenced to the ensembl canine genome database [32]. primer and probe sequences were then designed for each of the reference genes by using the universal probe library assay design centre (roche diagnostics ltd; [33]). blast searches were performed for all primer sequences to confirm gene specificity, and electrophoresis of the pcr reaction mixture confirmed a single product of the appropriate length in all cases. primers were synthesized by metabion international ag (martinsried, germany), and probes were synthesized by roche diagnostics (lewes, uk) using locked nucleic acid with 5'-end reporter dye fluorescein (fam (6-carboxy fluorescein)) and 3'-end dark quencher dye. real-time rt-qpcr assays were performed in triplicate using the lightcycler® 480 (roche diagnostics; lewes, uk) in 384 well format, with three no template controls used for each assay. the reaction volume in each well consisted of 5 [34] , the parameters of which are listed in table 1. all samples were checked for absence of genomic dna contamination using a canine genome specific rt-qpcr assay, previously described [25].
the assays were deemed to be reproducible, as determined by the average standard deviation of the triplicate repeats of each assay being less than 30% (table 1). real-time rt-qpcr data was exported into an excel datasheet (microsoft excel 2003) and analysed using three separate reference gene stability analysis software packages: genorm [6], bestkeeper© [9] and normfinder [11]. each of these methods generates a measure of reference gene stability, which can be used to rank the reference genes in order of stability. genorm generates a stability measure (the m value) for each gene, which is arbitrarily suggested to be lower than 0.4 (with a lower value indicating increased gene stability across samples), and a pairwise stability measure to determine the benefit of adding extra reference genes for the normalisation process, with again a lower value indicating greater stability of the normalised genes and an arbitrary cut-off value of lower than 0.15 indicating acceptable stability of the reference gene combination [6]. normfinder generates a stability measure of which a lower value indicates increased stability in gene expression. by using a model-based approach, normfinder groups samples to allow for a direct estimation of expression variation, in contrast to the pairwise comparison approach, which ranks genes according to the similarity of their expression profiles. therefore, taking a sample set which consists of two sample subgroups where all of the candidates but one show little difference between the groups, the one candidate which shows no difference will have the smallest stability value across all candidates and be the most stably expressed gene. bestkeeper© generates a pairwise correlation co-efficient between each gene and the bestkeeper© index (the geometric mean of the threshold cycle values of all the reference genes grouped together).
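the genorm m value described above can be sketched in a few lines. this is a minimal illustration, not the genorm software itself: it assumes the published definition (m is the mean, over all other candidates, of the standard deviation of the log2 expression ratio across samples), assumes an amplification efficiency of 2, and uses invented gene names and threshold cycle (ct) values.

```python
import math
import statistics


def genorm_m(ct_by_gene, efficiency=2.0):
    """Return {gene: M}; a lower M means more stable expression.

    ct_by_gene maps a gene name to its Ct values, one per sample,
    with samples in the same order for every gene.
    """
    genes = list(ct_by_gene)
    # log2 relative quantity: log2(E ** -Ct) = -Ct * log2(E)
    log_q = {
        g: [-ct * math.log2(efficiency) for ct in ct_by_gene[g]]
        for g in genes
    }
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # log2 expression ratio of gene j to gene k in each sample
            ratios = [a - b for a, b in zip(log_q[j], log_q[k])]
            pairwise_sd.append(statistics.stdev(ratios))
        m_values[j] = statistics.mean(pairwise_sd)
    return m_values


# Hypothetical Ct values for three made-up candidates in four samples.
example_ct = {
    "geneA": [20.0, 21.0, 22.0, 23.0],
    "geneB": [25.0, 26.0, 27.0, 28.0],  # tracks geneA exactly
    "geneC": [18.0, 21.0, 19.0, 24.0],  # varies independently
}
m = genorm_m(example_ct)
ranking = sorted(m, key=m.get)  # most stable first
```

in this toy data set genea and geneb keep a constant ratio to each other, so they share the lowest m value, and genec ranks last.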
stability measures for combined (normal and diseased) samples were recorded, as ultimately it is these measures which would be used to determine which genes were suitable for normalising expression data from genes of interest in a particular disease (osteoarthritis) in practice. bestkeeper© can only be used to analyse a maximum of 10 housekeeping genes so the three genes least stably expressed (as determined by normfinder) were excluded from bestkeeper© analysis. the stability values for each gene, as determined by each method of analysis, are illustrated in figure 2a , b, and 2c. statistical tests were performed using a statistical software package (minitab v14.1; minitab ltd.; coventry, uk). spearman rank correlation coefficients were then calculated using the ranking order of genes to compare the relationship of the relative ordering of genes by different methods of analysis (table 2) . finally, the stability parameters of the new reference genes were compared to those generated for commonly used reference genes in a similar study of canine oa tissues [17] (table 3) . 
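the comparison of gene orderings by spearman rank correlation can be sketched as follows. the study used minitab; this is only an illustrative stand-in that assumes both methods rank the same set of genes without ties (so a gene excluded from one analysis must be dropped from the other first), and the gene names are placeholders.

```python
def spearman_from_orderings(order_a, order_b):
    """Spearman's rho for two orderings of the same genes (no ties).

    Each argument lists the genes from most to least stable.
    """
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("orderings must contain the same unique genes")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two genes")
    rank_b = {gene: i for i, gene in enumerate(order_b, start=1)}
    # Sum of squared rank differences between the two methods.
    d_squared = sum(
        (i - rank_b[gene]) ** 2 for i, gene in enumerate(order_a, start=1)
    )
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical orderings produced by two analysis methods.
method_1 = ["g1", "g2", "g3", "g4", "g5"]
method_2 = ["g2", "g1", "g3", "g4", "g5"]  # top two genes swapped
rho_similar = spearman_from_orderings(method_1, method_2)
rho_reversed = spearman_from_orderings(method_1, method_1[::-1])
```

a coefficient near 1 indicates that two methods order the genes almost identically, while a value near -1 indicates opposite orderings.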
pitfalls of quantitative real-time reverse-transcription polymerase chain reaction
towards standardization of rna quality assessment using user-independent classifiers of microcapillary electrophoresis traces
quantification of mrna using real-time reverse transcription pcr (rt-pcr): trends and problems
sensitivity and accuracy of quantitative real-time polymerase chain reaction using sybr green i depends on cdna synthesis conditions
real-time rt-pcr normalisation; strategies and considerations
accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes
validation of housekeeping genes for normalizing rna expression in real-time pcr
customized molecular phenotyping by quantitative gene expression and pattern recognition analysis
determination of stable housekeeping genes, differentially regulated target genes and sample integrity: bestkeeper--excel-based tool using pair-wise correlations
equivalence test in quantitative reverse transcription polymerase chain reaction: confirmation of reference genes suitable for normalization
normalization of real-time quantitative reverse transcription-pcr data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets
reference gene selection for quantitative real-time pcr analysis in virus infected cells: sars corona virus, yellow fever virus, human herpesvirus-6, camelpox virus and cytomegalovirus infections
selection of reference genes for quantitative rt-pcr studies in striped dolphin (stenella coeruleoalba) skin biopsies
comparison of 12 reference genes for normalization of gene expression levels in epstein-barr virus-transformed lymphoblastoid cell lines and fibroblasts
selection of suitable reference genes for accurate normalization of gene expression profile studies in non-small cell lung cancer
statistical modeling for selecting housekeeper genes
expression stability of commonly used control genes in canine articular connective tissues
validation of endogenous controls for gene expression studies in human adipocytes and preadipocytes
a comparison of various "housekeeping" probes for northern analysis of normal and osteoarthritic articular cartilage rna
martens a: selection of a set of reliable reference genes for quantitative real-time pcr in normal equine skin and in equine sarcoids
development and evaluation of canine reference genes for accurate quantification of gene expression
identification of a mammalian mitochondrial homolog of ribosomal protein s7
clinically validated benchmarking of normalisation techniques for two-colour oligonucleotide spotted microarray slides
expression stability of commonly used reference genes in canine articular connective tissues
analysis of normal and osteoarthritic canine cartilage mrna expression by quantitative-pcr
genomic analysis of gene expression relationships in transcriptional regulatory networks
expression profiling of select cytokines in canine osteoarthritis tissues
design and production of a whole genome dog oligonucleotide microarray. advances in canine and feline genomics
information ncb: national centre for biotechnology information
assessment of the use of rna quality metrics for the screening of normal and pathological canine articular cartilage samples
oligonucleotides used as template calibrators for general application in quantitative polymerase chain reaction

lm was self funded, dnc was funded by the biotechnology and biological sciences research council, fs was funded by the university of manchester, and pjd was funded by the higher education funding council of england. the study was funded in part by a grant from the petplan charitable trust, uk, and in part by a project grant from the university of manchester. the manuscript preparation was funded by the university of manchester.
neither funding body had any role in the study design; in the collection, analysis, and interpretation of data; in the writing of the manuscript; or the decision to submit the manuscript for publication. dnc and lm carried out the microarray data analysis. lm and fs carried out the assay design. dnc, lm and fs performed the molecular genetic studies and dnc performed the statistical analysis. dnc and pjrd conceived the study, its design and coordination, and drafted the manuscript with lm. key: cord-263470-vmqvropy authors: rukavtsova, e. b.; abramikhina, t. v.; shulga, n. ya.; bykov, v. a.; bur’yanov, ya. i. title: tissue specific expression of hepatitis b virus surface antigen in transgenic plant cells and tissue culture date: 2007 journal: russ j plant physiol doi: 10.1134/s1021443707060088 sha: doc_id: 263470 cord_uid: vmqvropy the tobacco plants (nicotiana tabacum l.) carrying the hbsag gene controlled by (aocs)(3)amaspmas, the hybrid promoter that includes regulatory elements of the agrobacterial octopine and mannopine synthase genes, as well as plants controlled by the same promoter and adh1, maize alcohol dehydrogenase gene intron were obtained. the presence of the adh1 gene intron did not significantly change the level of expression of the hbsag gene in plants. the analysis of expression of hepatitis b virus surface antigen (hbs-antigen) in transformed plants expressing the hbsag under the control of different promoters was made. the level of hbs-antigen in plants carrying the hbsag gene controlled by (aocs)(3)amaspmas, the hybrid agrobacterium-derived promoter, was the highest in roots and made up to 0.01% of total amount of soluble protein. the level of hbs-antigen in plants carrying the hbsag gene controlled by the dual 35s rna cauliflower mosaic virus promoter was the same in all organs of the plant and made up to 0.06% of the total amount of soluble protein. 
hairy root and callus cultures of plants carrying the hbsag gene and expressing the hbs-antigen were obtained. transgenic plant-based technologies for the production of a new generation of subunit vaccines, which may become cheaper and safer alternatives to traditionally obtained vaccine preparations, are now being developed. plant cells contain the enzymatic systems of post-translational modification necessary for assembling the monomeric vaccine proteins they synthesize into immunogenic multimeric forms. plants are fully able to synthesize target antigens that can cause an active immune response [1] in the host organism. viral and bacterial antigens were shown to stimulate the production of immunoglobulins against the corresponding pathogens [2-7]. various transgenic plants are now being produced and tested in research centers worldwide as potential producers of vaccines against infectious agents causing various human diseases, including the hepatitis b virus [3, 8, 9]. however, transgenic plant-derived vaccine preparations are not yet commercially available. earlier we obtained tobacco plants expressing the synthetic gene of the hepatitis b surface antigen (hbsag) controlled by single and dual 35s rna cauliflower mosaic virus promoters (camv 35s and camv 35ss, respectively) [10, 11]. the presence of the dual 35s promoter increased expression of the antigen to the level of 0.05% of the total amount of soluble protein. transgenic potato plants expressing the hbsag gene controlled by the same promoter and also by the patatin promoter of potato tubers were also produced. the amount of the hbs-antigen in potato tubers exceeded 1 µg/g of tuber mass and was the highest in plants expressing the hbsag gene controlled by the dual camv 35ss promoter.
to obtain plants with tissue-specific expression of the hepatitis b vaccine gene in tissue cultures and whole plants (as potential producers of the vaccine), tissue-specific promoters, especially the hybrid agrobacterium-derived promoters (aocs)3amaspmas, prove to be effective. these promoters consist of the regulatory elements of the octopine synthase (ocs) and mannopine synthase (mas) genes of agrobacterium tumefaciens. as was shown earlier, these promoters strongly induced the expression of the β-glucuronidase (gus) gene in plants, up to a level greatly exceeding that in plants controlled by the dual 35s rna cauliflower mosaic virus promoter [12]. the objective of this study was to obtain transgenic tobacco plants synthesizing the hepatitis b surface antigen controlled by (aocs)3amaspmas promoters, which are built from regulatory elements of the agrobacterial octopine synthase and mannopine synthase genes, and also to analyze the expression profile of the hbsag gene in different cells of the whole plant as well as in callus and hairy root tissue cultures.

materials and methods

construction of plasmids for plant transformation. pss-hbsag, the recombinant plasmid carrying a synthetic gene encoding the hbsag/mayw polypeptide [13], was used as a gene source. two vectors, both containing the (aocs)3amaspmas promoter, were used for cloning and plant transformation: pe1802 and pe1945 (courtesy of dr gelvin, purdue university, united states) [12]. the extension of the hbsag gene with specific sequences necessary for molecular cloning was done by means of pcr with two primers containing the kpn i and sal i restriction sites: 5'-cgggtaccatggaaaacattactt and 5'-cggtcgacctatcattaaatgtaaac, respectively.
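as a quick sanity check on the primer design described above, the sketch below confirms that each primer carries the stated restriction site. the recognition sequences ggtacc (kpn i) and gtcgac (sal i) are the standard ones for these enzymes; the helper names are hypothetical, and the code strips any hyphens or 5' labels from the primer strings before searching.

```python
KPNI_SITE = "GGTACC"  # Kpn I recognition sequence
SALI_SITE = "GTCGAC"  # Sal I recognition sequence


def clean(seq):
    """Keep only the letters of a primer string and upper-case them."""
    return "".join(ch for ch in seq if ch.isalpha()).upper()


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Primer sequences as printed in the text.
forward = clean("5'-cggg-taccatggaaaacattactt")
reverse = clean("5'-cggtcgac-ctatcattaaatgtaaac")

has_kpni = KPNI_SITE in forward
has_sali = SALI_SITE in reverse
# The forward primer places the ATG start codon downstream of the site.
atg_after_site = forward.find("ATG") > forward.find(KPNI_SITE)
# The reverse primer anneals to the opposite strand, so the coding-strand
# view of its sequence is the reverse complement.
reverse_on_coding_strand = reverse_complement(reverse)
```

the same pattern extends to any other enzyme pair by swapping in the relevant recognition sequences.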
the reaction mixture contained 0.1 µg of pdes20 plasmid dna as a template, 25 mm kcl, 60 mm tris-hcl, ph 8.5 at 25°c, 1.5 mm mgcl2, 0.1% triton x-100, 10 mm 2-mercaptoethanol, 0.2 mm dntp mixture (usb, united states), 0.25 µm of each primer and 2.5 units of taq dna polymerase (sibenzyme, russia). the reaction volume was 50 µl. the reaction started with 5 min at 94°c, followed by 30 cycles of 1 min at 94°c, 1 min at 55°c and 1 min at 72°c, and ended with 7 min at 72°c. the geneamp pcr system 2400 (perkin-elmer, united states) was used for the experiments. the amplified gene was cloned between the kpn i and sal i restriction sites of the binary vector pe1802 for plant transformation. for cloning into the second vector, the hbsag gene was cut out of the pss-hbsag plasmid between the xho i and cfr 9 i restriction sites and incorporated into the pe1945 vector between the same sites. the plasmid constructs obtained were used for transformation of escherichia coli strain hb101. the constructs containing the hbsag gene downstream of the promoter were transferred into strain lba4404 (pal4404) of a. tumefaciens [14] by means of direct transformation [15]. the dna of the obtained clones was analyzed by southern hybridization with the 32p-labeled amplification product of the hbsag gene [16]. transgenic plants. the nicotiana tabacum l., cv. samsun plants were cultivated in vitro in 0.5- to 1-l cultivation containers on agar-solidified phytohormone-free ms medium [17] at 24 to 26°c, 2 klx, and 65% relative humidity. the resulting agrobacterial strain culture was used for the infection of leaf disks according to a standard protocol [18]. the leaf disks were co-cultivated with the overnight agrobacterial cultures for two days and then transferred onto the selection ms medium containing hormones (1 mg/l of ba and 0.1 mg/l of naa), 50 mg/l of kanamycin sulfate (km), and 500 mg/l of cefotaxime. the regenerated shoots were passed onto the selection ms medium.
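the cycling conditions above can be written down as a small data structure, which makes the programmed run time easy to check. this is only a bookkeeping sketch: the step names are labels chosen here, and the total covers hold times only, not the instrument's ramping between temperatures.

```python
# (step name, temperature in deg C, minutes) for each hold
initial_hold = ("initial denaturation", 94, 5)
cycle_steps = [
    ("denaturation", 94, 1),
    ("annealing", 55, 1),
    ("extension", 72, 1),
]
final_hold = ("final extension", 72, 7)
n_cycles = 30


def total_hold_minutes(initial, cycle, final, cycles):
    """Sum all hold times of the program, ignoring ramp time."""
    per_cycle = sum(minutes for _name, _temp, minutes in cycle)
    return initial[2] + cycles * per_cycle + final[2]


total_minutes = total_hold_minutes(
    initial_hold, cycle_steps, final_hold, n_cycles
)
print(f"programmed hold time: {total_minutes} min")
```

with the values given in the text, the holds add up to 5 + 30 × 3 + 7 = 102 minutes.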
calli were obtained from leaf explants of transformed plants on the ms medium containing 3% sucrose, 0.5 mg/l ba, 2 mg/l naa, and 50 mg/l km. the hairy root culture was grown on the hormone-free ms medium from leaf disks of transformed plants infected with strain a4 of a. rhizogenes [19]. extraction of dna from tobacco leaves. the dna from tobacco leaves was extracted as described in [20]. leaves were crushed in 1.5-ml eppendorf tubes, and 0.4 ml of the extraction buffer was added (containing 0.2 m tris-hcl, ph 7.5, 0.25 m nacl, 25 mm edta, and 0.5% sds); the mixture was incubated for 1 h at room temperature and then centrifuged at 12000 rpm for clarification. an equal volume of isopropanol was added, and the dna precipitate was dissolved in 100 µl of the te buffer (10 mm tris-hcl and 1 mm edta, ph 8.0). the plant dna obtained was used as a template for pcr. pcr analysis of the hbsag gene. for the pcr analysis of the hbsag gene, 0.5 to 1 µg of plant genomic dna was used. the reaction mixture and cycle conditions were the same as above. immunoassay of the surface antigen level. the immunoassay of the hepatitis b virus surface antigen level in transformed plants was done as described by us earlier [12] with minor modifications. leaves, roots, and calli of tested plants were ground in liquid nitrogen, and the extraction buffer (0.05 m sodium phosphate buffer, ph 7.5, 0.15 m nacl, 1 mm edta, 0.3% tween 20, 0.4 mm phenylmethanesulfonyl fluoride, and 0.5% sodium ascorbate) was added. the extract obtained was centrifuged for 20 min at 3500 rpm, and the supernatant was transferred to 1.5-ml eppendorf tubes and recentrifuged for 10 min at 12000 rpm. the "vektohep b-hbs-antigen" test systems (jsc vektor-best, russia) were used for measuring the hbs-antigen level in the supernatant. the recombinant yeast cell-derived hbs-antigen [21] was used as a positive control. the assay was carried out according to the manufacturer's instructions.
the total amount of soluble protein was measured according to bradford [22]. mechanical leaf wounding for induction of hbsag gene expression. leaves were immersed into the liquid ms medium, cut with a scalpel into strips 1 to 2 mm wide, and incubated at 24 to 26°c for 24 h. the injured leaves were used for immunoassay. the recombinant plasmids used for transformation of plants are shown in fig. 1. these plasmids are based on the pe1802 and pe1945 vectors containing the (aocs)3amaspmas promoter [12]. both plasmids carry hbsag, a synthetic gene [10]. the difference between the two vectors is that pe1945 carries the intron of adh1, the maize alcohol dehydrogenase gene, but lacks the tl enhancer. the obtained plasmid constructs carrying the hbsag gene under the (aocs)3amaspmas promoter were used for the direct transformation of strain lba4404 (pal4404) of a. tumefaciens. separate agrobacterial colonies were collected for extraction of plasmid dna and its visualization by agarose gel electrophoresis. the transformed agrobacterial strains were used for the infection of tobacco leaf disks. fifteen to 20 lines of transformed plants of each type were selected for subsequent molecular, genetic, and biochemical analysis. the presence of the hbsag gene in transformed plants was confirmed by means of pcr. the target dna, which corresponds to the hepatitis b virus surface antigen gene, was found in all transgenic plants tested (fig. 2). the immunoassay was carried out to study the expression profile of the surface antigen hbsag in the obtained plants. the hbs-antigen was found in different amounts in plants of all transgenic tobacco lines tested (see the table). the expression level of this antigen in the in vitro grown transgenic tobacco plants was up to 0.01% of the total amount of soluble protein. maximum expression of the surface hbsag antigen was observed in roots.
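expression levels in this study are reported as a percentage of total soluble protein. a minimal sketch of that conversion is shown below, assuming the immunoassay gives the antigen concentration in ng/ml and the bradford assay gives total soluble protein in mg/ml for the same extract; the function name and the numbers are illustrative and are not taken from the paper.

```python
NG_PER_MG = 1_000_000  # nanograms in one milligram


def percent_of_soluble_protein(antigen_ng_per_ml, protein_mg_per_ml):
    """Antigen as a percentage of total soluble protein in one extract."""
    if protein_mg_per_ml <= 0:
        raise ValueError("total soluble protein must be positive")
    protein_ng_per_ml = protein_mg_per_ml * NG_PER_MG
    return 100.0 * antigen_ng_per_ml / protein_ng_per_ml


# Hypothetical extract: 200 ng/ml antigen in 2 mg/ml soluble protein.
example_percent = percent_of_soluble_protein(200.0, 2.0)
```

in this made-up case the antigen accounts for about 0.01% of total soluble protein, the order of magnitude reported for roots in the text.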
the genetic construct pe1945-ag used for some of our experiments contained the alcohol dehydrogenase gene intron. the inclusion of introns in genetic constructs may enhance the expression of foreign genes in plant cells [23]. however, in our case the presence of the adh1 gene intron did not increase the expression of the target gene. the enhancement effect seems to depend on the location of the intron in the genetic construct carrying the target gene. the table shows the expression levels of the hbs-antigen in plants controlled by different promoters. the expression level of the hbs-antigen in plants controlled by (aocs)3amaspmas, the hybrid agrobacterial promoter, reached the maximum of 0.01% of the total amount of soluble protein in roots. the expression level of the hbs-antigen in plants controlled by the dual 35s rna cauliflower mosaic virus promoter was the same in all organs of the plant, accounting for up to 0.06% of the total amount of soluble protein. therefore, (aocs)3amaspmas, the hybrid agrobacterial promoter, can be regarded as a tissue-specific element of hbsag gene expression, with maximum expression being induced in roots. the activity of the agrobacterial mannopine synthase gene can be very different in different tissues and organs of the plant, with maximum expression being observed in roots and calli [24, 25]. the expression level of the hbs-antigen in leaves of transgenic plants controlled by the (aocs)3amaspmas promoter substantially increased after wounding and reached 0.01% of the total amount of soluble protein. these results agree with the available data on the induction of foreign gene expression in leaves of transgenic plants controlled by the mannopine synthase promoter under similar wounding conditions [24, 26]. eight lines of transgenic plants with the highest level of expression of the hbs-antigen were selected to obtain hairy root tissue cultures by infecting leaf disks with a culture of strain a4 of a. rhizogenes.
hairy roots, being effectively plant tumors transformed by a. rhizogenes [19], are a convenient system for production of secondary metabolites and recombinant proteins due to their genetic stability and fast growth in hormone-free medium. as a result of retransformation of plants carrying the hbsag gene with strain a4 of a. rhizogenes, hairy root cultures that also contained the hbs-antigen were obtained (fig. 3). the level of expression of the hbs-antigen in different lines of hairy root cultures remained the same as that in the parent transgenic plants, making up to 0.01% of total soluble protein. leaf explants of the transformed plants expressing the hbs-antigen controlled by the (aocs)3amaspmas promoter were used to obtain callus cultures. hbs-antigen expression was also observed in these cultures (fig. 4). this could be explained by the induction of activity of the mannopine synthase and octopine synthase promoters by a higher level of auxins in callus culture cultivated in the auxin-containing medium [24]. it should be noted that the results of our study do not confirm the existing data on more efficient expression of foreign genes controlled by the (aocs)3amaspmas hybrid promoters than of those controlled by the dual camv 35ss promoter [12]. in that study, the activity of the (aocs)3amaspmas promoter was judged on the basis of the gus staining assay. other data, however, are available [27] in which the researchers did not notice a considerable difference between the bt-cry1ia1, an 3 amaspmas promoters. attempts have also been made to use the (aocs)3amaspmas hybrid promoter for transformation of plants that could potentially produce a vaccine against sars (severe acute respiratory syndrome) [28]. the expression of the target sars-cov s protein was the highest in roots of transformed tobacco and immature fruits of the transformed tomato plants.
(aocs)3amaspmas, the hybrid promoter, can be considered as a potential tool for tissue-specific expression of various pharmaceutically important proteins in transformed plants and their cell cultures.

(1-5) tobacco plants transformed with pe1802-ag (lines 8, 11, 12, 13, 21); (6-9) tobacco plants transformed with pe1945-ag (lines 1, 4, 6, 9). average values are presented for three independent experiments; standard error in each of them did not exceed 0.0001%.

transgenic plants as vaccine production systems
expression of hepatitis b surface antigen in transgenic plants
edible vaccines
expression of immunogenic glycoprotein s polypeptides from transmissible gastroenteritis coronavirus in transgenic plants
plant expression systems for the production of vaccines
transgenic plants as edible vaccines
immunogenicity of porcine transmissible gastroenteritis virus spike protein expressed in plants
expression of human hepatitis b virus surface antigen gene in transgenic tobacco
polypeptides of hepatitis b surface antigen produced in transgenic potato
production of transgenic tobacco plants expressing the gene encoding of hepatitis b virus surface antigen
analysis of transgenic tobacco plants carrying the gene for hepatitis b virus surface antigen
strength and tissue specificity of chimeric promoters derived from the octopine and mannopine synthase genes
expression of hepatitis b virus surface antigen in transgenic potato plants and its characteristics
transfer of octopine t-dna segment to plant cells mediated by different types of agrobacterium tumour- or root-inducing plasmids: generality of virulence systems
binary vectors
molecular cloning: a laboratory manual, cold spring harbor: cold spring harbor lab
revised medium for rapid growth and bioassays with tobacco tissue cultures
transformation of dicotyledonous plant cells using the ti plasmid of agrobacterium tumefaciens and ri plasmid of a. rhizogenes, plant genetic transformation and gene expression. a laboratory manual, draper
transformation of several species of higher plants by agrobacterium rhizogenes. sexual transmission of the transformed genotype and phenotype
a simple and rapid method for the preparation of plant genomic dna for pcr analysis
recombinant plasmid dna pdes20 codes hepatitis b virus surface antigen (hbsag/mayw), yeast strain s. cerevisiae dan-041/pdts20 -producer of hepatitis b virus antigen and vaccine on its base: rf patent no. 2 088 664
a rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding
the effect of intron location on intron-mediated enhancement of gene expression in arabidopsis
dual promoter of agrobacterium tumefaciens mannopine synthase genes is regulated by plant growth hormones
combined usage of regulated promoters and grafting technique in gene fusions to lacz reveal new expression patterns of chimeric genes in transgenic plants
evaluation of bt-cry1ia1 (cryv) transgenic potatoes on two species of potato tuber moth, phthorimaea operculella and symmetrischema tangolias (lepidoptera: gelechiidae) in peru
severe acute respiratory syndrome (sars) s protein production in plants: development of recombinant vaccine

the authors thank s.v. chernyshov for assistance in the development of the pe1945-ag genetic construct. this study was supported by the presidium of russian academy of sciences, the program "dynamics of plant, animal, and human genofonds" and also by russian foundation for basic research (projects nos. 05-08-01473 and 06-04-81022).
key: cord-252536-gfx4cq03 authors: bieniossek, christoph; imasaki, tsuyoshi; takagi, yuichiro; berger, imre title: multibac: expanding the research toolbox for multiprotein complexes date: 2011-12-07 journal: trends biochem sci doi: 10.1016/j.tibs.2011.10.005 sha: doc_id: 252536 cord_uid: gfx4cq03 protein complexes composed of many subunits carry out most essential processes in cells and, therefore, have become the focus of intense research. however, deciphering the structure and function of these multiprotein assemblies imposes the challenging task of producing them in sufficient quality and quantity. to overcome this bottleneck, powerful recombinant expression technologies are being developed. in this review, we describe the use of one of these technologies, multibac, a baculovirus expression vector system that is particularly tailored for the production of eukaryotic multiprotein complexes. among other applications, multibac has been used to produce many important proteins and their complexes for their structural characterization, revealing fundamental cellular mechanisms.
multiprotein complexes catalyze cellular function

our understanding of cellular function has remarkably expanded in recent years, brought about by technological advances in dna and protein analysis [1]. the sequencing of complete genomes, such as the human genome, has set the stage to address proteome-wide interaction studies, which have revealed that proteins do not typically exist as isolated entities [2-4]. rather, they are assembled in complex molecular machines consisting of numerous proteins and, often, other biomolecules (such as nucleic acids, sugars, lipids and small molecules), arranged into functional modules that catalyze essential cellular processes. this molecular organization has been recently termed 'protein sociology' [5]. understanding cellular processes requires detailed knowledge of the 3d structure of the molecules involved, and the parameters and architectural features that dictate their interaction. structural genomics efforts strive to analyze comprehensively single proteins and protein domain structures on a whole-genome scale, and have provided atomic structures of thousands of protein structures and folds. furthermore, the architectures of essential macromolecular complexes, such as ribosomes, nucleosomes and rna polymerases, have been revealed at near-atomic resolution, providing a wealth of structural detail and crucial insight into the functions of these multicomponent machines [6-8]. integrated approaches combining structural, functional and computational data are emerging, and provide views of protein organization in space and time in entire organisms [5, 9]. notwithstanding, our molecular understanding of the very large number of protein complexes in the cell remains limited to a handful of examples for which detailed near-atomic structures are known. this is mostly because of the difficulty in obtaining samples in sufficient quality and quantity for molecular studies.
many essential complexes remain intractable because they exist in very low amounts in their endogenous host, which hinders their purification from native source material.

glossary
cre recombinase: enzyme that recognizes, binds to, cuts and religates specific dna sequences (called loxp sequences). four copies of cre recombinase bind to two loxp dna sequences, and then cre recombinase exchanges one strand each from the two loxp sequences. if the two loxp sequences are present on two different dna molecules, then these will be conjoined as a result, resulting in one dna molecule with two loxp sites.
homing endonucleases: restriction enzymes with particularly long (20-30 base pairs) recognition sequences. after cutting the dna, they often leave four-nucleotide overhangs that can be compatible with overhangs generated by another restriction enzyme, bstxi; then, dnas processed with homing endonucleases can be efficiently ligated to dnas tailored by bstxi. both recognition sequences are destroyed in the process, and the ligation product cannot be cut by either enzyme.
ligation independent cloning (lic): a method for inserting dna sequences (i.e. genes) into a plasmid dna molecule without using dna ligase. dnas to be conjoined are treated by exonucleases that trim back one strand of the dna double helix, exposing the complementary strand as a long sticky end. if two dna molecules have complementary sticky ends, they can be conjoined simply by mixing. lic methods have become popular in recent years because they do not require preselected dna sequences that serve as recognition sites for restriction enzymes. a version of lic that is entirely independent of any dna sequence is called slic (sequence and ligation independent cloning).
loxp sequence: short imperfect inverted repeat of 34 base pairs which serves as a recognition site, cleavage site and religation site for cre recombinase. the loxp sequence is not symmetric, therefore, the conjoining of two loxp sequences by cre recombinase is directional.
nuclear polyhedrosis virus (npv): npv attacks caterpillars, such as the larvae of the alfalfa looper or the silkworm. npvs are highly specific for their host and thus useful as pest control agents.
polyprotein: long polypeptide which contains several individual proteins that are connected by linker amino acids which either are capable of self-cleavage, or are recognition and cleavage sites of highly specific proteases.
tn7 transposon: originally discovered as a large (14 kb) dna segment from tn7 phage, which can insert into a specific location in the e. coli genome. the transposon encodes the tn7 transposase, a four-subunit protein complex that carries out this insertion reaction. three dna elements are minimally required for this reaction: the tn7l and tn7r dna sequences which flank the dna to be transposed, and a sequence called tn7 attachment site (atttn7) into which the transposed dna is inserted. this insertion mechanism can be efficiently exploited to conjoin recombinant dna by providing the tn7l and tn7r sequences on a plasmid at each end of a dna sequence of interest, and the atttn7 sequence on another dna which will serve as recipient. the reaction is started by addition of the transposase to these dnas.

corresponding author: berger, i. (iberger@embl.fr).

recombinant overproduction can resolve this bottleneck, and numerous expression systems, mostly for heterologous protein expression in escherichia coli, have been developed and refined over the past few decades. more recently, e. coli expression systems have been designed for coexpression of several proteins by using polycistronic mrna transcripts, or two or more plasmids that coexist in the same cell [10, 11] (box 1). however, many eukaryotic protein complexes cannot be produced efficiently in e. coli. they may contain subunits that are too large for the e.
coli transcription and translation machinery, or may require either specific chaperone systems for proper folding or protein modifications (such as phosphorylation or acetylation) that e. coli cannot provide. thus, the successful overproduction of these complexes, which is required to decipher their structure and function, depends on the availability of powerful eukaryotic expression technologies.

box 1. coexpression toolbox for production of protein complexes

expression systems for production of protein complexes in e. coli frequently make use of polycistronic expression cassettes with several genes of interest, spaced apart by ribosome binding sites (shine-dalgarno sequences), placed under the control of a single promoter, and typically followed by a sequence for efficient termination of mrna transcription [10, 11, 21, 60-62] (figure ia). in eukaryotic hosts, an analogous design can involve internal ribosomal entry sites (iress), which are inserted between gene coding regions under control of a single promoter [24-27] (figure ib). iress exist in the 5'-untranslated regions of many viral and cellular mrnas [24], and facilitate cap-independent translation by recruiting ribosomes for efficient protein production. ires sequences differ greatly and exhibit species specificity [26]. for example, ires elements from encephalomyocarditis virus (emcv) work well in mammalian cells, whereas iress from perina nuda virus (phv) and rhopalosiphum padi virus (rhpv) have been successfully used for protein complex production in insect cells [27].

polyproteins are long polypeptides containing individual proteins spaced by specific proteolytic cleavage sites. certain viruses, such as coronavirus, produce their entire proteome by proteolytic processing of polyproteins encoded by a few open reading frames. polyprotein approaches have proven to be particularly powerful for balancing the stoichiometry of coexpressed proteins [22, 28-30] (figure ic). polyprotein constructions can involve self-cleaving peptides found in picornavirus (called p2a peptides) or variants thereof [28, 29]. alternatively, constructions can be used that mimic open reading frames in positive-sense single-stranded rna viruses and provide a highly specific protease gene together with genes of interest arranged in a single large open reading frame [22, 30]. individual proteins are then liberated from the encoded polyprotein by the protease, which cleaves the proteolytic sites placed between the protein subunits (figure ic).

the elucidation of protein-protein interactions in complexes is of crucial importance for many applications including structural biology. a novel approach, coesprit, utilizes library-based construct screening for the identification and expression of soluble protein complexes in e. coli [31]. in coesprit, a subunit of a (putative) protein complex is provided as bait. the interacting partner is provided in the form of a random incremental library generated by exonuclease digestion of the full-length gene. prior input from bioinformatics analyses such as homology alignments or domain identification is not required for this approach. coexpression of the library with the bait protein allows identification of soluble complexes by immunofluorescence-assisted colony screening using labeled antibody markers against affinity tags present on the proteins screened. production of protein complex from high-expressor colonies thus identified can then be scaled up to milligram amounts for high-resolution studies by nmr or x-ray crystallography [31].

trends in biochemical sciences february 2012, vol. 37, no. 2

in this review, we describe multibac, a recent eukaryotic expression system that is specifically designed to tackle and overcome this crucial production bottleneck [12, 13].
we summarize the technology concepts underlying multibac and review its wide range of applications.

the multibac system

yeast, mammalian cells and insect cells have been successfully used for recombinant production of eukaryotic proteins [14]. in particular, the use of insect cell cultures infected by a recombinant baculovirus, carrying the eukaryotic gene of interest, was first demonstrated many years ago [15]. the exciting evolution of baculovirus from a pest control agent to a powerful recombinant protein production tool has recently been reviewed [16], and baculovirus expression systems have become increasingly popular for many applications [17-19]. multibac is a baculovirus expression system particularly designed for the production of eukaryotic multiprotein complexes with many subunits (figure 1). it consists of an array of small synthetic dna plasmids, an engineered baculovirus genome derived from the autographa californica nuclear polyhedrosis virus (acnpv; see glossary) that is used to infect cells of the caterpillar spodoptera frugiperda, and a set of protocols detailing every step from gene insertion into the plasmids to production of protein complexes in cultured insect cells [19, 20].

the presence of many subunits in a protein complex requires the assembly of many encoding genes and their integration into the baculovirus in a multicomponent co-production experiment. this process is laborious and technically challenging using conventional methods that typically involve serial insertion of genes into increasingly large and difficult-to-handle dna plasmids. multibac applies a different concept for multigene assembly that relies on recombination of small, custom-designed, synthetic dna plasmid molecules (< 3 kb) that are called 'acceptors' and 'donors' (figure 1a). acceptors and donors can be easily loaded with one or several genes each, and recombined in a single-step reaction into a multigene construct.
acceptors contain an origin of replication that allows propagation in standard cloning strains of e. coli, whereas donors harbor a conditional origin of replication (derived from phage r6kγ) that requires the presence of a specific protein (known as π) for replication. this protein is expressed from the pir gene inserted into the chromosome of specifically tailored e. coli strains that are used to propagate donor plasmids [13, 20]. donors and acceptors contain a resistance marker, a short imperfect inverted repeat (loxp), an expression cassette consisting of a baculoviral promoter (p10 or polh), a dna segment for inserting heterologous genes, and an efficient signal for eukaryotic polyadenylation (figure 1a). the expression cassettes are flanked by a homing endonuclease site and a compatible bstxi site, which allow for iterative assembly of several expression cassettes on each plasmid [21]. importantly, donors can survive in a pir-negative background only if they are conjoined with an acceptor that provides a nonconditional origin of replication, and this is the central feature that enables flexible and efficient generation of multigene constructs. these are achieved in vitro by cre recombinase, which fuses one or several donors (each with one or several inserted genes) to a single acceptor in a one-step reaction by conjoining via the loxp sites; this results in plasmid dimers, trimers and tetramers. the resulting multigene expression constructs are characterized by the precise combinations of resistance markers present on the fusions, and this can be exploited for combinatorial assembly strategies based on multiple antibiotic selection [21]. the process of inserting genes into acceptors and donors, which can be optionally done by ligation independent methods followed by cre-loxp fusion, is termed 'tandem recombineering' [22].

figure 1. (a) multibac consists of an array of small (3 kb), synthetic dna plasmids called acceptors and donors. acceptors have a regular origin of replication (ori cole1), whereas donors have a conditional origin derived from r6kγ phage (ori r6kγ) that requires special bacterial strains for their propagation. donors and acceptors contain expression cassettes controlled by late baculoviral promoters (polh or p10) as well as strong eukaryotic polyadenylation signals (from sv40 or hsvtk). all plasmids contain the loxp sequence (circles filled in red) for fusing donors to an acceptor by the cre recombinase. each plasmid has a different resistance marker: gentamycin resistance (gnR) for acceptors, and either chloramphenicol (cmR), kanamycin (knR) or spectinomycin (spR) resistance for donors. in addition, a 'multiplication' module is present to facilitate the assembly of several expression cassettes on a donor or acceptor based on specifically designed restriction sites (e.g. homing endonucleases and bstxi restriction endonuclease, shown as blue boxes flanking promoter and terminator, respectively) [18, 19, 21]. acceptors contain the dna sequences (tn7l and tn7r) required for transposition by the tn7 transposase. (b) the assembly of a composite multibac baculoviral genome for multigene expression. genes are inserted into donors and acceptors by using either restriction endonucleases and ligase, or sequence-independent and ligation-independent cloning methods (bottom). cre-mediated fusion produces a multigene construction that is inserted into the baculoviral genome by tn7 transposition in specifically tailored e. coli cells that contain the viral dna (also known as a bacmid). the dna located between the tn7r and tn7l sequences in the multigene fusion construct is inserted by the transposase enzyme into the tn7 attachment site (mini-atttn7). the multibac baculovirus was engineered for improved protein production and delayed host cell lysis by deleting specific genes [12]. the combination of sequence and ligation independent cloning (slic) for gene insertion and cre-loxp fusion for multigene construct generation is called tandem recombineering and can be performed in a parallelized mode on microtiter plates, optionally on a robot [21]. further heterologous genes can be inserted into an additional loxp site present on the bacmid. composite baculoviral dna is then purified from the bacteria and used to transfect insect cells. virus is expanded by infecting small-volume (50-100 ml) insect cell cultures and harvesting the budded virus particles released into the culture media. this virus is then used for protein complex expression in larger (typically one to several liters) insect cell cultures [13, 20].

gene insertions into the multibac genome take place in bacterial strains that contain the multibac viral genome as an artificial chromosome together with a plasmid encoding the tn7 transposon enzyme complex. multigene acceptor-donor fusion constructs are transformed into these bacterial cells, and the tn7 transposon enzyme inserts the collection of expression cassettes present on the acceptor-donor fusion in a single-step reaction into the tn7 attachment site engineered into the baculoviral genome (figure 1b). productive transposition disrupts a laczα-encoding gene, which enables blue/white screening of colonies. the multibac baculoviral genome has been engineered for improved protein complex production by removing genes encoding viral protease and apoptotic activities, thereby reducing proteolytic breakdown of the heterologous gene products and delaying lysis of the infected cells [12, 13]. as a second site of entry in addition to the tn7 attachment site, the multibac genome also contains a distal loxp site for adding further functionalities. for example, a gene encoding λ-phosphatase was inserted into this site to remove phosphates from a coexpressed protein complex [23].
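The link between cre-loxp fusion products and multiple antibiotic selection can be made concrete with a small enumeration. The following is a minimal sketch in python, assuming that each fusion carries exactly one acceptor and at most one copy of each donor type; the marker names follow the figure 1 legend, while the function and variable names are hypothetical and not part of any multibac protocol or software.

```python
from itertools import combinations

# Marker names follow the figure 1 legend. The data layout and all names
# below are illustrative assumptions, not part of the MultiBac protocols.
ACCEPTOR_MARKER = "gentamycin"
DONOR_MARKERS = ("chloramphenicol", "kanamycin", "spectinomycin")

OLIGOMER_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def fusion_products(acceptor=ACCEPTOR_MARKER, donors=DONOR_MARKERS):
    """List every fusion of one acceptor with one or more distinct donors.

    Each entry is (oligomer name, tuple of antibiotics that together
    select for exactly the plasmids present in that fusion).
    Simplifying assumption: each donor type occurs at most once per fusion.
    """
    products = []
    for n_donors in range(1, len(donors) + 1):
        for chosen in combinations(donors, n_donors):
            markers = (acceptor,) + chosen
            products.append((OLIGOMER_NAMES[len(markers)], markers))
    return products


if __name__ == "__main__":
    for name, markers in fusion_products():
        print(f"{name:9s} select on: {' + '.join(markers)}")
```

Under these assumptions, one acceptor and three donor types give three dimers, three trimers and one tetramer, each identified by a distinct combination of antibiotics.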
the composite multibac virus is then purified and used to transfect insect cells for protein production in petri dishes containing cell monolayers or, for larger volumes, shaker flasks containing suspension cultures [13, 19, 20]. multibac resolves several challenges encountered in protein complex production. these include constraints on handling many and often very large encoding dnas and the necessity to revise expression experiments rapidly and flexibly if a subunit or purification tag needs to be altered or exchanged, by replacing the respective donor or acceptor with a modified version in the tandem recombineering pipeline.

in addition to multibac, a growing number of technologies are currently being used for expressing protein complexes (box 1). examples of these approaches are the use of internal ribosome entry sites (iress) for multigene expression, and the use of polyprotein constructions involving self-cleaving peptide sequences or proteolytic processing of large precursor polypeptides by specific proteases [22, 24-30]. particularly promising for structural biology, where often a large number of constructs need to be scrutinized in parallel, are new methods coupling gene libraries to coexpression (box 1) [31]. these approaches are compatible with the multibac technology concept; for example, the polyprotein approach has been successfully used to produce a human transcription factor core complex by multibac, with improved production efficiency and yield [22].

multibac expression of protein complexes for structural studies

the multibac system, originally designed by x-ray crystallographers interested in studying multiprotein complexes [12], has been well received and put to good use by the structural biology community (figure 2, box 2). many proteins and their complexes have been produced by multibac, often for the first time, for structure elucidation, providing important insight into their biological function (figure 2). in some cases, for example for producing intact full-length protein kinase cβii, an existing transfer plasmid was combined with the optimized multibac baculovirus genome for protein production [32]. the modular concept of gene assembly by tandem recombineering has turned out to be instrumental for accelerating the process of obtaining high-quality samples for structural biology of numerous complexes [33-36]. two outstanding examples of the utility of multibac are the elegant production of the entire anaphase promoting complex, apc/c, a large (1.1 mda) 13-subunit multiprotein assembly that regulates defined cell cycle transitions [34], and the recent crystal structure elucidation of the mediator head module, a transcription factor complex that is essential for the expression of class ii genes in eukaryotes [35]. the gene assembly encoding the complete apc/c was inserted into two multibac baculoviruses, encoding eight and five subunits, respectively. co-infection of insect cells with the two viruses allowed the purification of the entire 1.1-mda complex and its structural determination by electron microscopy (em), revealing the structural basis for subunit assembly (figure 2) [34]. the production of the mediator head module from yeast (seven subunits, 223 kda), a paradigmatic case study for the successful application of the multibac system, is detailed in box 2 [35]. clearly, multibac is only one of many tools that bring such studies to fruition. nonetheless, quality sample preparation is a crucial component of the structural determination pipeline. the studies already available exemplify the aptitude of the system, and hint at recurring challenges encountered when using multibac in multiprotein complex research (box 3). the list of illuminating structures of complexes produced by using multibac can be expected to grow rapidly in the future.

figure 2. multibac expression for structural biology. the multibac system has been successfully used to produce many proteins and their complexes in high quality for structural analysis. the structures of lkb1-stradα-mo25α (pdb 2wtk) [55], a tumor suppressor kinase complex, the pkcβii kinase (pdb 3pfq) [32], and the rad9-rad1-hus1 complex, a dna damage checkpoint complex (pdb 3g65) [56], were determined by x-ray crystallography providing crucial insight into the function of these important proteins. the crystal structure of the n-terminal domain (ntd) of the chromatin remodeler mot1 bound to the tata-box binding protein (tbp) showed a molecular bottle-opener for tbp (pdb 3g65) [57, 58]. in a hybrid approach involving em and x-ray crystallography, a structure of a different chromatin remodeling enzyme, isw1 (lacking the atpase domain), bound to a nucleosomal template was elucidated (pdb 2y9z and emd-1877) [33]. strikingly, the entire 13 subunit yeast apc/c, an e3 ubiquitin ligase essential for the cell cycle, was assembled by multibac (box 3) as well as two multisubunit apc/c subcomplexes (tpr5 and sc8), leading to structures revealing many details of apc/c assembly by em (emd-1844, 1841 and 1845) [34]. molecular illustrations were prepared with pymol (http://www.pymol.org/).

complexes come in many forms: the coatomer

proteins and their complexes in their native tissue can exist as several isoforms, which may complicate their biochemical and structural characterization when extracted and purified from endogenous source material by immunoprecipitation or classical biochemical approaches. an example is the coatomer, the central protein component that covers certain vesicles, known as coat protein i (copi)-coated vesicles, which are thought to play an important role in the early secretory pathway.
coatomer consists of seven subunits and exists in four isoforms of different subunit composition. the four isoforms co-purify when extracted from animal tissue, which hampers their structural analysis and impedes attempts to understand the potentially different functions of the individual isoforms. for example, it remains unclear whether or not copi-coated vesicles are uniformly coated with individual isoforms, although these have been found to be unequally distributed in the golgi stack [37]. recently, all four coatomer isoforms have been successfully produced using multibac, setting the stage to individually address their structure and function [38]. this has been achieved by integrating the genes encoding the five subunits that are common to all isoforms into the tn7 attachment site of the multibac viral genome. the genes for the other two subunits, which are variable and define the four isoforms, were inserted by using a donor plasmid into the loxp site that is distal to the tn7 attachment site on the multibac viral genome. coatomer complex isoforms can then be purified successfully from insect cell cultures infected with the respective multibac virus [38].

silkworms, the larvae of the moth bombyx mori, have been used for thousands of years to produce silk for textile production, but they have gained a new role as protein production bioreactors after the successful expression of human α-interferon in these insects. a growing number of proteins, including interleukins and antibodies, have since been produced using this system [39, 40]. interestingly, it seems that silkworms can sometimes produce much higher yields of recombinant proteins than can be obtained from liquid cell cultures. for example, the activity of interleukin-3 produced in silkworms is 10 000-fold higher than that produced in cultures of immortalized african green monkey kidney (cos) cells, and 30-fold higher than that produced in insect cell suspension culture.
notably, 1 mg of purified human macrophage colony-stimulating factor can be extracted from 10 silkworms [39]. these impressive results have prompted the development of a version of the multibac system adjusted for protein complex production in a silkworm bioreactor [41]. in this system, the original baculovirus genome was replaced with that of a b. mori nuclear polyhedrosis virus (bmnpv), likewise provided as an artificial chromosome in bacterial cells. a tn7 attachment site was introduced into this bmnpv genome, and a plasmid expressing the tn7 transposon complex was co-transformed. thus, the silkworm multibac system retains the ability to accept multigene vector constructions prepared by tandem recombineering or by conventional restriction and ligation cloning. a distinct feature of the b. mori multibac system is the method of infecting the silkworms, which occurs via injection of virus solution with a needle. protein complexes are then purified from the hemolymph of infected silkworms after several days.

box 2.

mediator is a large multiprotein complex that is central to gene expression in eukaryotes and is organized into three modules termed head, middle or arm, and tail [63]. the size and complexity of mediator has long posed a major challenge for structural studies. mediator head (seven subunits, 223 kda) was first successfully produced recombinantly by infecting insect cells with single baculoviruses, each encoding one subunit, opening the door for structural studies [36]. however, this procedure required lengthy optimization to identify suitable, highly producing recombinant baculoviruses by repeated rounds of laborious screening, a process of 8 weeks or longer for each baculovirus. the demanding procedures constrained logistics and severely compromised attempts at structure determination of mediator head by x-ray crystallography, which typically necessitates flexible, streamlined and efficient production protocols for generating many variants of the protein specimen studied.

the multibac system elegantly resolved this fundamental impediment [35]. all genes encoding the subunits of the mediator head module were inserted into a single multibac baculovirus by using tandem recombineering [36, 64]. combinatorial assembly of the genes led to a series of multibac baculoviruses encoding either full mediator head (med17, med6, med18, med8, med20, med11 and med22) or subcomplexes including core head (med17, med6, med8, med11 and med22) and mini head (med17, med11 and med22). investigation of these complexes by em allowed initial assignment of subunit positions [64]. modification of med17 and med18 by eliminating flexible regions was crucial to obtain diffraction-quality head module crystals, and this was easily accomplished by simply exchanging the respective unmodified genes in the original multibac baculovirus expressing the full head [35]. the modular acceptor-donor construction principle of multibac is crucial to facilitate this alteration, removing the need to assemble the entire multigene construct de novo each time.

the recombinant head module has been purified and crystallized, and the architecture of this 223-kda complex has been determined by x-ray crystallography (figure i). the crystal structure of the mediator head reveals how this essential complex is built from its components to provide stability as well as flexibility for transcription regulation, resulting in a platform for other transcription factors [35]. notably, a portion of the head named 'neck domain' confers stability and integrity of the complex by forming a striking, novel multihelical bundle, engaging five of the seven subunits of mediator head.

recombinant human dna polymerase δ has been produced using silkworm multibac [41]. although the enzyme is thought to contain four subunits, the extraction from animal tissue usually leads to loss of two subunits during purification.
four milligrams of active enzyme containing all four subunits can be purified from 350 silkworms infected with a silkworm multibac virus carrying four genes, providing a simple, fast, reliable and economic platform to produce human dna polymerase δ. therefore, the silkworm multibac system might be useful for the production of other protein complexes, especially if an economic option for protein production is required.

extending the concept to mammalian cells

the concept of multigene delivery of fused acceptor and donor plasmids, each carrying one or several genes, is not restricted to baculovirus expression systems. originally, tandem recombineering was developed for generating multiexpression plasmids for production of protein complexes in e. coli to alleviate imbalances in expression levels observed with multiplasmid co-transformation [21]. expression of protein complexes in mammalian cells can be achieved by transfection with multiple plasmids, in analogy to multiplasmid co-transformation in e. coli. however, this approach can lead to imbalanced production from the plasmids and to heterogeneous cell populations with differences in the ratio of plasmids incorporated. this impediment can be efficiently resolved by using a single plasmid that contains all the heterologous genes. this multigene plasmid can be conveniently generated by tandem recombineering of properly modified acceptors and donors, in which the baculoviral promoters are exchanged for mammalian promoters [22, 42]. the resulting multigene plasmid is introduced into mammalian cells by using a transfection reagent [42]. the efficacy of this approach was demonstrated in a study where an assortment of proteins, systematically tagged with fluorescent markers, was expressed transiently or stably in animal tissue cells [42].
this multibac-derived system, called multilabel, can be used for simultaneously visualizing proteins and their actions in homogeneous cell populations by tracking the fluorescent signals. importantly, the ratio of the heterologous proteins is constant at the single-cell level. multilabel has been recently implemented to dissect the interactions of epidermal growth factor receptor (egfr), rab gtpases and phosphoinositol phosphates, which are components of membrane trafficking processes [42, 43].

gene therapy for obesity?

gene therapy involves either the targeted delivery of one or several wild-type genes that replace or substitute for disease-causing genes (which bear aberrant mutations) and produce the desired gene products, or the delivery of genes encoding therapeutic proteins that prevent, modulate or correct the disease [44]. baculoviruses have been intensively studied as potential delivery vectors for gene therapy because they transduce mammalian cells efficiently, do not seem to be toxic, and are nonhazardous (for the target cells and for those carrying out the experiment) as they do not replicate in mammalian cells [17, 45, 46]. further advantages are the ease of producing baculoviruses and the large tolerance for heterologous gene insertions into the viral genome without compromising virion integrity. however, there are formidable obstacles to overcome, such as the observed inactivation of baculoviruses by the human complement system and the lack of specificity with respect to cell types transduced. these problems have not been satisfactorily resolved; therefore, baculoviruses have not yet played a major role as gene therapy vectors. by contrast, recombinant adeno-associated virus (raav) does not have the drawbacks of baculovirus and is currently the best choice for efficient gene delivery for gene therapy [47]. however, clinical grade production of raavs, which are necessarily rendered replication incompetent, remains a major impediment for the field. a highly efficient scale-up protocol for raav production utilizes recombinant baculoviruses to produce gene-therapy-competent raav particles in insect cell cultures [48-51]. this protocol involves three recombinant baculoviruses that carry genes encoding different components of raav and the transgene to be delivered (figure 3). co-infection of insect cells with the three baculoviruses results in the assembly of intact raav particles, which can be further processed and purified to achieve clinical grade. a particularly noteworthy recent study has involved the production of raav particles for gene therapy of obesity in laboratory rodents (figure 3) [52]. the raav administered to laboratory rats fed on a high-fat diet contained leptin cdna as the transgene, introduced into the multibac baculoviral genome via cre-loxp fusion of a donor containing the leptin gene under control of a promoter that is active in mammalian cells (figure 3).

box 3. multibac debugged: challenges and solutions

a multitude of factors can compromise the production of multiprotein complexes. among the major impediments are:

insufficient knowledge of the composition of the complex. recombinant production of such a complex by coexpression will invariably fail if a subunit crucial for complex integrity has not been properly identified in original preparations. although proteomics and genomics technologies increasingly contribute to improve and validate the data available (reviewed in [1]), protein complex coexpression can already reveal important information about interacting partners even if the (presumed) complete complex cannot be obtained.

post-translational modifications (ptms). ptms such as phosphorylation and acetylation can be essential for proper assembly of a complex or for its activity. thus, the modification activity needs to be either coexpressed or supplied during purification. an additional problem is that insect cells may decorate proteins with nonphysiological ptms (frequently phosphorylation), which compromise complex formation or activity. therefore, mutations that abolish the modification activity must be introduced in the host genome, or enzymes (such as λ-phosphatase) that remove the modification have to be coexpressed or supplied [23]. alternatively, inhibitors of the particular modifying enzyme (if known) can be added to the growth media if compatible with cell growth and protein production.

complex instability. coexpression of an entire complex from a single virus may not be the ultimate solution. for example, a complex may dissociate during purification in a salt gradient. it may be worth identifying exactly what components dissociate during purification; stable subcomplexes may then be produced separately to reconstitute the complex in vitro by choosing proper conditions.

effort required to place all subunit genes on a single baculovirus. for very large complexes, this may be challenging, and applying coinfection with a small number (two or three) of multigene viruses instead may be a viable option.

virus instability. this is a drawback of every baculovirus expression system, because baculoviruses deteriorate during amplification and passaging. gene deletions, notably affecting heterologous dna inserts, can potentially abolish complex production. to minimize such deleterious events, specific protocols have been delineated [13, 20], and new approaches to overcome viral instability by genome engineering are being developed [65].

in this review, we have presented approaches, notably the multibac technology, for producing important eukaryotic protein complexes that were hitherto not amenable to structural and functional analysis. molecules as complex as the apc/c have been successfully produced using multibac, enabling structure determination and architectural dissection.
more recently, the technology concepts underlying multibac have been extended to mammalian gene delivery and even gene therapy, and we anticipate that many more applications will benefit from this approach. in addition, optimized multigene expression technologies that involve polyproteins and library approaches have become available, and the polyprotein technology has been integrated into multibac, increasing the prowess of the system. a beneficial next step could be to incorporate also library approaches such as coesprit [31] (box 1). this could conceivably accelerate the discovery of protein-protein interactions for complexes that rely on a eukaryotic expression host. protein-protein interactions are intensively studied in the pharmaceutical industry, with the aim to identify intermolecular surfaces that can be targeted for drug design [53] . it will be interesting to see whether this field of discovery will also benefit from the multibac technology in the future. considerable effort has been devoted to generating custom-designed insect and mammalian host cells, which provide specialized functionalities such as humanized glycosylation [54] . a logical extension of the multibac system will be to make use of these host cells, for example to produce antibodies with a close-to-human glycosylation pattern for therapeutic applications. the multibac system continues to catalyze progress in many fields of research. an entirely open question is where the limits of the multibac production technology may be. for instance, the transcriptional machinery producing eukaryotic mrnas contains, in addition to the genetic dna template, a stunning 100 proteins organized in multisubunit complexes. it may seem frivolous to set out with multibac to address this complexity structurally and functionally in a defined, fully recombinant setup. 
nonetheless, the first glimpses of the molecular architectures involved are emerging: the mediator head, produced by multibac, has been crystallized and the structure resolved. the stage is set. exciting times are ahead of us.
the authors declare no competing financial interest.
figure 3. application of the multibac system in gene therapy. raav particles are produced by co-infection with three different baculoviruses [48, 49]. bac-rep harbors two expression cassettes that contain genes for the major aav replication enzyme, rep78, and an n-terminal truncation of rep78, rep52d. raav-bac contains aav inverted terminal repeat (itr) elements that are required for rescue, replication and packaging of transgene sequences, together with rat leptin cdna under the control of a chicken β-actin promoter, which is inserted into the raav-bac baculoviral genome by cre-loxp-mediated fusion of a specifically tailored donor plasmid [52]. leptin is a hormone that acts in the brain to reduce food intake and stimulate energy expenditure. bac-vp produces the aav virion coat proteins. complete raav virions containing the leptin gene are produced in triply co-infected insect cells and purified [50, 52], and then administered to diet-induced obese rats. an obese rat is shown compared to a normal rat for illustration (bottom). diet-induced obesity renders laboratory rats (and presumably other species) resistant to leptin treatment. therefore, it is close to impossible to curtail diet-induced weight gain. this could be overcome by circumventing leptin resistance or restoring leptin actions in obese animals.
the surprising outcome of the study involving baculovirus-produced leptin raav as a gene therapy vector was that exercise, in this study wheel-running, was required to completely prevent weight gain when combined with the leptin gene therapy intervention, leading to the conclusion that work-out, in tandem with leptin gene delivery, may actually develop into a potential anti-obesity treatment [52]. the baculovirus schematic drawing is adapted from an image kindly supplied by k. j. airenne (university of kuopio, finland). the raav particles shown are based on pdb entry 1lp3 [59].
[1] getting a grip on complexes
[2] the tandem affinity purification (tap) method: a general procedure of protein complex purification
[3] functional organization of the yeast proteome by systematic analysis of protein complexes
[4] proteome survey reveals modularity of the yeast cell machinery
[5] the molecular sociology of the cell
[6] the ribosome in focus: new structures bring new insights
[7] nucleosome structural studies
[8] structure of eukaryotic rna polymerases
[9] proteome organization in a genome-reduced bacterium
[10] expression of protein complexes using multiple escherichia coli protein co-expression systems: a benchmarking study
[11] deciphering correct strategies for multiprotein complex assembly by co-expression: application to complexes as large as the histone octamer
[12] baculovirus expression system for heterologous multiprotein complexes
[13] protein complex expression by using multigene baculoviral vectors
[14] recent advances in the production of proteins in insect and mammalian cells for structural biology
[15] production of human beta interferon in insect cells infected with a baculovirus expression vector
[16] milestones leading to the genetic engineering of baculoviruses as expression vector systems and viral pesticides
[17] baculovirus as versatile vectors for protein expression in insect and mammalian cells
[18] baculovirus gene delivery: a flexible assay development tool
[19] new baculovirus expression tools for recombinant protein complex production
[20] multibac: multigene baculovirus-based eukaryotic protein complex production
[21] automated unrestricted multigene recombineering for multiprotein complex production
[22] robots, pipelines, polyproteins: enabling multiprotein expression in prokaryotic and eukaryotic cells
[23] multiprotein expression strategy for structural biology of eukaryotic complexes
[24] viral ires rna structures and ribosome interactions
[25] cellular ires-mediated translation
[26] ires mediated pathways to polysomes: nuclear versus cytoplasmic routes
[27] development of a prokaryotic-like polycistronic baculovirus expression vector by the linkage of two internal ribosome entry sites
[28] correction of multi-gene deficiency in vivo using a single 'self-cleaving' 2a peptide-based retroviral vector
[29] high cleavage efficiency of a 2a peptide derived from porcine teschovirus-1 in human cell lines, zebrafish and mice
[30] tev protease-facilitated stoichiometric delivery of multiple genes using a single expression vector
[31] coesprit: a library-based construct screening method for identification and expression of soluble protein complexes
[32] crystal structure and allosteric activation of protein kinase c βii
[33] structure and mechanism of the chromatin remodelling factor isw1a
[34] structural basis for the subunit assembly of the anaphase-promoting complex
[35] architecture of the mediator head module
[36] head module control of mediator interactions
[37] differential localization of coatomer complex isoforms within the golgi apparatus
[38] recombinant heptameric coatomer complexes: novel tools to study isoform-specific functions
[39] silkworm expression system as a platform technology in life science
[40] baculovirus-mediated production of the human growth hormone in larvae of the silkworm, bombyx mori
[41] production of recombinant human dna polymerase delta in a bombyx mori bioreactor
[42] a plasmid-based multigene expression system for mammalian cells
[43] neuropilin-1 promotes vegfr-2 trafficking through rab11 vesicles thereby specifying signal output
[44] the future of gene therapy
[45] potential cancer gene therapy by baculoviral transduction
[46] baculoviruses as gene therapy vectors for human prostate cancer
[47] therapeutic in vivo gene transfer for genetic disease using aav: progress and challenges
[48] scalable generation of high-titer recombinant adeno-associated virus type 5 in insect cells
[49] successful production of pseudotyped raav vectors using a modified baculovirus expression system
[50] a simplified baculovirus-aav expression vector system coupled with one-step affinity purification yields high-titer raav stocks from insect cells
[51] an inducible system for highly efficient production of recombinant adeno-associated virus (raav) vectors in insect sf9 cells
[52] synergy between leptin therapy and a seemingly negligible amount of voluntary wheel running prevents progression of dietary obesity in leptin-resistant rats
[53] making protein interactions druggable: targeting pdz domains
[54] protein n-glycosylation in the baculovirus-insect cell system
[55] structure of the lkb1-strad-mo25 complex reveals an allosteric mechanism of kinase activation
[56] crystal structure of the rad9-rad1-hus1 dna damage checkpoint complex-implications for clamp loading and regulation
[57] structure and mechanism of the swi2/snf2 remodeller mot1 in complex with its substrate tbp
[58] a bottle opener for tbp
[59] the atomic structure of adeno-associated virus (aav-2), a vector for human gene therapy
[60] strategies for protein coexpression in escherichia coli
[61] the pst44 polycistronic expression system for producing protein complexes in escherichia coli
[62] co-expression of protein complexes in prokaryotic and eukaryotic hosts: experimental procedures, database tracking and case studies
[63] structure of eukaryotic mediator complexes
[64] mediator head module structure and functional interactions
[65] stabilized baculovirus vector expressing a heterologous gene and gp64 from a single bicistronic transcript
we thank christiane schaffitzel, darren hart, sergei zolotukhin, francisco asturias and all members of our laboratories for helpful discussions, and tim richmond and roger kornberg for support. cb is recipient of a swiss national science foundation (snsf) advanced researcher grant. ti is a fellow of the human frontier science program (hfsp). yt is supported by the us national science foundation (grant mcb 0843026) and the american heart association (grant 073595n). ib acknowledges support from the snsf, the agence nationale de la recherche (anr), the centre national de la recherche scientifique (cnrs), the embl and the european commission (ec) through the joint eipod program, and the ec projects spine2-complexes and 3d-repertoire (framework program 6 (fp6)), as well as instruct, pcube, biostruct-x and complexinc (ec fp7).
key: cord-328899-kog99kk5 authors: ferrari, stefano; geddes, duncan m; alton, eric w.f.w title: barriers to and new approaches for gene therapy and gene delivery in cystic fibrosis date: 2002-12-05 journal: adv drug deliv rev doi: 10.1016/s0169-409x(02)00145-x sha: doc_id: 328899 cord_uid: kog99kk5
clinical trials of gene therapy for cystic fibrosis suggest that current levels of gene transfer efficiency are probably too low to result in clinical benefit, largely as a result of the barriers faced by gene transfer vectors within the airways. the respiratory epithelium has evolved a complex series of extracellular barriers (mucus, lack of receptors, immune surveillance, etc.) aimed at preventing penetration of lumenally delivered materials, including gene therapy vectors. in addition, once in the cell, further hurdles have to be overcome, including dna degradation, nuclear import and the ability to maintain long-term transgene expression.
strategies to overcome these barriers will be addressed in this review and include the use of: (i) clinically relevant adjuncts to overcome the extra- and intracellular barriers; (ii) less-conventional delivery routes, such as intravenous or in utero administration; (iii) more efficient non-viral vectors and ‘stealth’ viruses which can be re-administered; and (iv) new approaches to prolong transgene expression by means of alternative promoters or integrating vectors. these advances have the potential to improve the efficiency of gene delivery to the airway epithelium, thus making gene therapy a more realistic option for cystic fibrosis.
cystic fibrosis (cf) is the most common recessively inherited lethal disease among the caucasian population, affecting approximately one in 2500 newborns. cf is caused by mutations in the cystic fibrosis transmembrane conductance regulator (cftr) gene, and to date around 1000 different mutations have been identified. the underlying gene mutation leads to defective production of the cftr protein, a camp-regulated chloride (cl−) channel located in the apical membrane of epithelial cells. although the organs affected in cf also include the pancreas, gut, liver and reproductive tract, the clinical picture is dominated by pulmonary disease, with recurrent cycles of infection leading to inflammation, bronchiectasis and, in greater than 90% of patients, death from respiratory failure. the isolation of the gene responsible for cf in 1989 suggested the feasibility of new therapeutic treatments based on cftr gene transfer to patients with cf. the rationale for the development of gene therapy protocols relies on the fact that heterozygotes appear to be phenotypically normal, expression of cftr is low and the dysfunctional epithelial lining cells in the most affected organ, the lung, are accessible through non-invasive techniques. furthermore, although conventional treatments have increased life expectancy of cf patients to approximately 30 years, there is clearly an urgent need for a novel therapeutic approach.
preclinical studies carried out soon after the isolation of the gene showed that both viral and non-viral gene transfer agents (gtas) were able to correct the chloride ion transport defect in cf transgenic mice. the success of these studies led several groups to initiate clinical trials of gene transfer in cf patients and three classes of gtas have been used so far: adenovirus, adeno-associated virus (aav) and cationic lipids. although the lung remains the most medically relevant target, many investigators chose to start their clinical studies by looking at gene transfer to the nasal mucosa. nasal airway epithelia have a similar histology and the same cf-associated abnormalities in ion transport as pulmonary epithelia, but compared to lung, the nasal cavity has an easier access for both gene transfer and safety measurements and represents a reduced risk to the patient in case of the occurrence of side effects.
nine clinical trials using recombinant adenovirus as a gta have been published so far [1-9], with five involving administration of the virus to the lung epithelium [2, 6-9]. the results reported are not always consistent and of the 30 noses assessed following a single application, approximately one-third showed some changes in chloride transport. no functional measurements were assessed in the lung studies, although some evidence of vector-derived cftr mrna was found. common findings in all the studies published include (i) a dose-dependent mild local inflammation and (ii) the progressive lack of expression following repeated administration.
two phase i, single administration, dose escalation trials using aav as a gta have been reported so far. aav-cftr was delivered to the maxillary sinus [10] or nebulised to the lungs of cf subjects [11]. in both cases aav delivery was shown to be safe and vector dna was found by pcr, up to 70 days in the maxillary sinuses [10] and 14-30 days in the airways [11]. when delivered to the maxillary sinus, some degree of dose-dependent chloride transport correction was observed. however, in both studies no vector-derived mrna was detected.
cationic lipids have been used in eight clinical trials [12-19], with two of them involving nebulisation of the lipoplexes to the lower airways [16, 19]. different lipoplexes were used including dc-chol-dope [12, 14, 17], dotap [15], gl-67 [13, 16, 19] and edmpc-cholesterol [18], with dna doses ranging from 10 µg to 1.25 mg when administered to the nose and up to 42.2 mg when nebulised to the lungs [16]. results were similar to those reported with adenoviral vectors and a correction towards normal values of the chloride defect was observed both in the nose (about 20%) [12, 14-16] and in the lung (25%) [16], lasting between 7 days and 3 weeks [16]. interestingly, in one study naked dna was reported to be as efficient as lipoplexes [13]. unlike viruses, it has been reported that lipoplexes can be successfully re-administered without apparent loss of efficacy [17]. however, mild flu-like symptoms were noted in both lung trials following aerosolisation of liposome-dna complexes and are probably related to the presence of unmethylated cpg motifs in bacterially derived plasmid dna [16, 19].
in summary, within 13 years of the cloning of the cftr gene tremendous progress has been made and proof-of-principle for correction of the basic chloride defect has been established within the target organ in vivo in cf subjects. however, in none of the clinical trials cited has sodium hyperabsorption been altered. each of the three gtas used has achieved limited success, with none outshining the others. interestingly, despite non-viral vectors being arguably less efficient than viruses in animal models and laboratory studies, clinical trials have shown that this might be different in cf subjects. more importantly, it is now generally accepted that the efficiency of in vivo gene transfer with currently available vectors needs to be increased. we will here review the progress made in improving gtas and the hurdles to efficient transgene expression, which any vector has to overcome before entering the airway epithelial cells.
2. extracellular barriers
the respiratory epithelium presents a particular challenge for gtas, since one function of the upper respiratory tract is to keep foreign particles out of the lung. airway epithelia have evolved a complex series of barriers to prevent penetration of lumenally delivered materials (including both viral and non-viral gtas) into the cell or interstitial compartment. these barriers consist of: (i) a well-defined mucus layer that may bind inhaled vectors and remove them via mucus clearance mechanisms, (ii) a glycocalyx that may bind vectors and prevent binding to cell surface receptors and, perhaps most importantly, (iii) an apical cell membrane that is relatively devoid of viral receptors and growth/tropic receptors (fig. 1). this series of barriers is complemented by epithelial tight junctions that are 'moderately leaky' to ions but quite 'tight' for larger solutes, thereby preventing penetration by current vectors from the lumenal surfaces into the interstitium. in addition, cf lungs are characterised by the presence of a discontinuous barrier of purulent secretions, which contain exogenous actin, dna and inflammatory products that can modify the integrity of a lipoplex and limit gene delivery to the airways. cf sputum has been shown to retard the movement of particles having a size comparable with lipoplexes and to almost completely block 560-nm particles [20]. furthermore, binding of negatively charged cf mucus components to the gene complexes may change their surface charge and size, resulting in a decreased transport of the lipoplex through the mucus and therefore in a decreased cellular uptake [21].
fig. 1. extracellular barriers include the presence of infected mucus and sputum, mucociliary clearance and tight junctions between the cells, which limit the access of viral vectors to receptors localised on the basolateral membrane. alternatively, gene transfer agents could be delivered intravenously (i.v. route), even if it is unlikely that airway epithelial cells will be targeted. inflammation, ctl-mediated degradation of transduced cells and neutralizing antibodies can further limit transgene expression or re-administration of the gene transfer vectors. hidden stem or progenitor cells could be targeted by lentiviral vectors. once inside the cell, the genetic material has to overcome endosome and cytoplasmic degradation and to get into the nucleus. ce, ciliated cell; b, basal cell; g, goblet cell; tj, tight junctions; c, capillary; smg, submucosal gland.
using an ex vivo sheep trachea model, we demonstrated that cationic lipid-mediated gene transfer, but not adenoviral vectors or recombinant sendai virus [22], was inhibited by normal mucus, with removal of this layer significantly increasing transgene expression [23]. similar results were obtained when cationic polymers, such as pei, were used. however, we have recently shown that the mucus barrier can, in part, be overcome by treatment with mucolytic agents, such as nacystelyn, when either a cationic lipid (gl-67 or edmpc:chol) or a cationic polymer (pei) is used to deliver genes to airway epithelium [24].
similarly, sputum and bronchoalveolar lavage fluid (balf) recovered from cf patients have been shown to inhibit liposome- [25], adenovirus- [26] and aav-mediated gene transfer efficiency [27]. however, when cf sputum was treated with recombinant human dnase, an increased liposome-mediated gene delivery was observed [25]. recombinant human dnase has been shown to reduce sputum viscoelasticity, facilitate the transport of nanospheres through cf sputum [20] and, when mixed with lipoplexes or polyplexes, not to alter their ability to mediate gene delivery [28].
recent findings have suggested that the glycocalyx may also represent another major obstacle to gene transfer to the airway epithelium. the glycocalyx is composed of many carbohydrate-bearing structures, including glycoproteins, glycolipids and proteoglycans. treatment with agents to remove components of the glycocalyx, such as neuraminidase, has been shown to enhance the susceptibility of polarised cells to transduction by ad or aav vectors [29].
attempts have been made to circumvent barriers at the mucosal surface by delivering transgenes via the bloodstream. the intravenous route may make it possible to access the basolateral membrane of airway epithelial cells, characterised by a higher rate of endocytosis and an increased density of viral receptors. however, the difficulties in overcoming the large number of barriers between the vascular compartment and airway epithelial cells (endothelial cells, endothelial cell basement membrane, interstitium and epithelial basement membrane) make this route challenging. a large number of studies have been conducted to identify which cells in the lung are transduced after intravenous (i.v.) injection of liposome-dna complexes. most of these have concluded that the majority of the cells transfected are the pulmonary endothelial cells, with some studies reporting transfection of type i/ii alveolar epithelial cells [30]. only a few studies have reported airway epithelial cell transfection after i.v. injection of cationic lipid-dna complexes [31, 32]. this may have been due to properties of this specific formulation (such as a longer circulation time) since similar results were not obtained when other gtas such as dotap-dope, dc-chol-dope, gl-67a and 22 kda pei were administered [32].
independently from the route of administration, innate immune defence mechanisms and pre-existing immunity are likely to play a crucial role in defending the lung from foreign particles, including gene therapy materials. airway macrophages are known to reduce the amount of gta able to reach the epithelial cells either via a direct mechanism (phagocytosis) or, since they are antigen presenting cells, through stimulation of the host immune system. pre-existing immunity from antibodies to wild-type viruses, such as adenovirus and aav, could hinder their ability to target epithelial cells efficiently. a survey of normal and cf subjects has shown that virtually all subjects had antibodies to ad5 and to aav2 (the two most used viral vectors so far), although only 55 and 32%, respectively, were neutralising antibodies [33]. it has also been observed that individuals with a higher baseline anti-ad neutralising antibody titre mounted a higher neutralising antibody response after vector administration [34].
3. cell surface barriers
the early studies with model systems that employed poorly differentiated airway epithelial cells suggested that gene transfer efficiency for a variety of vectors would be high. however, with the advent of well-differentiated culture systems it became clear that (i) the airway lumen-facing columnar cell, the predominant cell type that must be transduced in vivo, is relatively resistant to viral and nonviral gene transfer and (ii) the apical surfaces of well-differentiated airway epithelial cells have a low basal and stimulated rate of endocytosis. in contrast, currently available cell cultures are composed of basal or poorly differentiated cells that are highly transducible, but do not mirror the airway lumen in vivo [35].
3.1. both synthetic and viral vectors are unable to [...]
[...] decreased progressively as the cells became more polarised and differentiated [36]. similar reports demonstrated that much of the reduction in gene delivery was due to a decrease in binding of the cationic lipid to the surface of mature airway epithelial cells [37] and also to a decline in the rate of internalisation of the bound complexes into these cells [38]. the hypothesis is that reduction in binding may be due to differences in surface charge between poorly and well-differentiated cells. while poorly differentiated cells exhibited receptor-mediated endocytosis, pinocytosis and phagocytosis, well-differentiated cells were only capable of receptor-mediated endocytosis [37]. some reports have indicated that cell proliferation may influence cationic lipid-mediated transfection activity, either by making epithelial cells themselves more susceptible to transfection or by temporary disruption of the tight junctions [38]. this would allow the lipoplexes to either access the basolateral surface of well-differentiated cells or to access more immature, less differentiated cells.
with regard to viral vectors, it has been demonstrated that the receptors for adenovirus [39], aav-2 [40] and retrovirus are localised to the basolateral membrane, and are therefore not accessible when these vectors are delivered topically.
in principle, four general strategies may improve gene transfer efficiency to airway epithelia.
3.2.1. tight junctions
the barrier function of epithelial tight junctions can be transiently disrupted so that vectors can access the basolateral membrane of target cells, which are rich in viral receptors and have a higher rate of endocytosis. this can be achieved by using ca2+ chelator agents, such as egta [41, 42], non-ionic detergents, such as polidocanol [43], and antibodies able to block the function of proteins involved in the tight junction-complex, such as e-cadherin [44]. in addition, medium-chain fatty acids, such as sodium caprate, have been shown to be better agents for enhancing gene transfer than egta, via disruption of claudin-1, a major structural component of the tight junctions [45]. croyle et al. recently found that a blend of sucrose, mannitol and pluronic f68 enhances adenoviral-mediated gene expression in both large and small lung airways. this formulation was shown to increase tight junction permeability and allow the use of 1/2 log lower viral dose [46]. airway instillation of perfluorochemical (pfc) liquid can also transiently open tight junctions, enhancing adenoviral- [47], aav- [48] and cationic lipid-mediated gene transfer [49], although this procedure can increase inflammation.
3.2.2. modification of the host
the apical membrane can be modified so that it binds vectors. one approach has been to incorporate unnatural sugars into membrane glycoproteins and use them as a molecular handle on which a novel receptor is constructed. the artificial receptor enhanced adenoviral binding and gene transfer to cells that were normally relatively resistant to adenovirus infection [50]. this suggests the feasibility of a gene transfer strategy in which the biosynthetic machinery of the cell is used to engineer novel receptors on the cell surface.
the gta can be modified in order to target a receptor on the apical membrane that has the capacity to both bind and internalise a vector. one of the first examples reported is the p2y2-purinoreceptor, which is highly expressed on the apical surface of epithelial cells and is stimulated to internalise upon utp activation. an increase in gene transfer efficiency was observed when adenoviral vectors conjugated to biotinylated utp ligands were used to target the endogenous p2y2-purinoreceptor on well-differentiated human airway epithelial cells, known to be refractory to adenovirus-mediated gene transfer [51]. similar results were obtained when gtas were retargeted to bind to the bradykinin receptor [52], the urokinase plasminogen activator receptor [53] and the serpin enzyme complex receptor (sec-r), all of which are expressed on the apical surface of airway epithelial cells. administration of plasmid dna carrying the cftr cdna condensed with poly-l-lysine sec-r ligand to cf mice produced a correction of the cl− efflux, a more normal sodium channel activity and a reversal of nitric oxide synthase-2 downregulation [54].
investigators are currently trying to characterise new ligands to target receptors present on the apical surface of airway epithelia. one method is to use a phage display library screened for peptides binding with high affinity to airway epithelial cells. this has been mainly done in vitro and several promising candidates have already been identified [55], but attempts to identify more relevant molecules by in vivo bio-panning are currently under evaluation.
an attractive alternative strategy is to evaluate systematically glycoproteins from other enveloped viruses for their ability to efficiently infect airway epithelia and to use these glycoproteins to pseudotype recombinant viral vectors. some viruses have already been shown to infect the polarised airway epithelium from the apical surface, including respiratory pathogens, such as human coronavirus 229e [56], h1n1, h2n2 and h3n2 influenza a virus strains [57] and viruses from the filoviridae family such as marburg and ebola virus. by using an ebola-pseudotyped hiv vector, kobinger et al. showed efficient and stable transduction of intact airway epithelium from the apical surface in vivo [58].
the modification of the vector can also be non-specific. given the low density of ad-receptors on the apical membrane, several strategies to increase cell binding via a non-fibre-dependent pathway have been developed. complexing ad vectors with polycations [59] or incorporating ad in calcium phosphate precipitates [60] have been shown to enhance gene transfer to airway epithelia in vitro and in vivo. these strategies are thought to neutralise the adverse charge interaction between negatively charged ad particles and the negatively charged cell surface, resulting in improved binding and uptake of ad vector.
furthermore, it has been reported that both synthetic [22] and viral vector [61] mediated gene transfer to differentiated airway epithelia can be increased by a prolonged contact time.
3.2.4. identification of new viral vectors
improved binding and entry from the apical surface have been reported with both new viral vectors and new serotypes of existing vectors. it has recently been shown that, in contrast to aav 2, aav serotype 5 is able to infect human airway epithelia from the lumenal surface, suggesting that 2,3-linked sialic acid is either a receptor for aav 5 or a necessary component of a receptor complex [62]. recombinant sendai virus, a single-stranded rna paramyxovirus, is also able to infect airway epithelial cells from the lumenal surface. its envelope proteins, f (fusion) and hn (haemagglutinin), have been shown to interact with cholesterol and sialic acid, respectively, both molecules known to be present on the apical membrane of airway epithelial cells [22]. similarly, a recombinant respiratory syncytial virus has recently been developed because of its ability to infect ciliated cells via the lumenal membrane [63].
4. intracellular barriers
viral and nonviral vectors enter cells through endocytosis, a normal process for the internalisation and degradation of extracellular material.
4.1. endosome
plasmid-based vectors are quite susceptible to endosomal degradation and attempts have been made to enhance transgene escape from the endosome-lysosome pathway (fig. 1). one approach involves using fusogenic lipids or peptides, such as gala and ha-2 from influenza haemagglutinin, to disrupt the endosome membrane. the neutral lipid dioleoylphosphatidylethanolamine (dope) is generally employed as a fusogenic helper lipid in a cationic lipid-dna complex. the second approach involves using a dna delivery system with a high buffering capacity and the flexibility to swell when protonated. the rationale that endosomes can be ruptured if the ph drop in the late endosome is inhibited by the buffering capacity of the formulation led to the use of ph-sensitive liposomes and polyethylenimine as gene transfer agents [64]. because of the great number of secondary amines, pei behaves as a 'proton sponge', able to buffer the low ph in the endosomes, resulting in the inhibition of the low ph-activated nucleases. this leads to a large increase in the ionic concentration inside the endosome, finally resulting in osmotic swelling due to water entry and rupture of the organelle [65].
in contrast to plasmid dna, viral vectors have evolved mechanisms to escape endosome degradation and for some of them it is a prerequisite for subsequent nuclear localisation of the virions. hansen et al. have shown that the passage through the endosome acidic component exposes aav to conditions that modify the capsid, allowing the virus to use the cytoskeleton for subsequent trafficking events to the nucleus [66]. sendai virus uses the f protein to fuse its envelope with the cell plasma membrane, allowing the genetic material to be released directly into the cytoplasm and thus avoiding endosome degradation. this led to the use of hvj-liposome as a new gene transfer agent in which uv-inactivated sev particles are mixed together with lipids in a liposome formulation in order to allow the plasmid dna to be introduced into the cytoplasm of the transfected cell [67].
4.2. cytoplasm-based degradation pathways
viral vectors and some synthetic vectors, such as pei and hvj-liposome, are characterised by the ability to escape endosome degradation. however, once released in the cytoplasm and before entering the nucleus, other hurdles are encountered. synthetic vector-based strategies are characterised by low gene transfer efficiency because plasmid dna is quickly degraded by ca2+-sensitive cytosolic nucleases, with an apparent half-life of 50-90 min [68, 69]. duan et al. showed that in polarised epithelial cells aav capsid is ubiquitinated after endocytosis and that this process is a barrier to raav transduction. proteasome-dependent degradation of ubiquitinated molecules represents a major pathway for disposal of both endogenous and foreign proteins. in vivo application of proteasome inhibitors in mouse lungs augmented raav-mediated gene transfer from undetectable levels to a mean of 10.4 ± 1.6% of the epithelial cells in large bronchioles [70].
4.3. nuclear import
once within the cytoplasm, the transgene must be imported into the nucleus to be transcribed. for plasmid-based expression, nuclear import is a rate-limiting step and intracellular trafficking of pdna, either naked or complexed to synthetic vectors, is largely uncharacterised (fig. 1). during non-viral gene transfer, entry of exogenous dna into the nucleus occurs only in cells that are actively dividing, i.e., when the nuclear envelope breaks down. this is consistent with the observation that well-differentiated, non-dividing airway epithelial cells show very low transfection efficiency. pollard et al. showed that less than 1/1000 naked cdna copies microinjected in the cytoplasm were effectively trafficked to the nucleus [71].
these results may be largely due to the inability of pdna to effectively translocate through the nuclear pore complexes (npcs). each npc is comprised of a large family of proteins (nucleoporins) forming a structure in which a central channel is surrounded by eight peripheral channels. it is thought that the peripheral channels, about 9-10 nm diameter, allow [...]
several approaches have been taken to improve nuclear entry of pdna, including the electrostatic binding of pdna to nls-containing proteins, such as hmg-1, and the covalent attachment of nls-motifs to double-stranded dna. with regard to the latter approach, most of the studies published so far have used the nls of the simian virus 40 (sv40) large t antigen because its trafficking to the nucleus is well characterised and has also been shown to mediate nuclear import of non-karyophilic proteins. by using this 'piggyback' approach, sebestyen et al. demonstrated nuclear accumulation in digitonin-permeabilised cells, but failed to show uptake of the modified dna in nuclei of intact cells. this suggests that covalent modification of pdna with a signal peptide may alter its behaviour and interaction with other cellular factors [73]. by capping a luciferase gene with a single nls peptide, zanta et al. showed a 10-1000-fold increase in gene transfer efficiency [74]. however, the major drawback of this technique is the relatively low amount of nls-pdna that can be produced.
the development of peptide nucleic acids (pnas), oligonucleotide analogues in which the sugar phosphate backbone of nucleic acid has been replaced by a synthetic peptide backbone, has led to further developments in this field. pna is capable of sequence-specific recognition of dna and rna following the watson-crick hydrogen-bonding scheme and can be used to link peptides, such as nls-motifs, to plasmid dna. the advantage over the other strategies is that the nls peptide can be linked to very specific regions of plasmid dna, with no effects on the transcription of genes located elsewhere on the plasmid. furthermore, a precise number of nls-motifs can be attached and large quantities of the modified pdna obtained [75].
small solutes and proteins up to 50-60 kda to freely however, up to now, none of the strategies described diffuse in an out the nucleus. larger proteins need a has been used to enhance nuclear import and gene nuclear localisation signal (nls) in order to be transfer expression in airway epithelial cells in vivo. actively transported through the central channel of in addition to this, there are reports suggesting that the pore. under physiological conditions, a super-some polycations, such as pei [71] and lactosylated coiled pdna has a diameter of 10 nm or larger if in poly-l-lysine [76] , may facilitate the nuclear uptake a relaxed conformation, thus suggesting that passive of pdna and that, unlike lipoplexes, the complexes diffusion of pdna through the npc is highly remain intact during nuclear translocation. this unlikely [72] . therefore, targeting of pdna to might suggest the existence of nuclear import pathnuclear 'shuttle' proteins has become part of the ways distinct from the conventional nls one. design of gtas developed to transfer non-dividing unlike plasmid-based approaches, viral vectors cells, as in cf gene therapy. have evolved efficient ways to enter the nucleus. the cytoplasmic movement of viral dna towards the directed at adenoviral capsid proteins present during nucleus is facilitated by the interaction of viral vector delivery, resulting in the production of neuproteins, such as polymerase or capsid proteins, with tralising antibodies (nab) that limit repeated vector the microtubular network [77] . it has recently been administration. shown that adenovirus type 2 docks at the can / several approaches have been taken to reduce nup214 protein of the nuclear pore, then hijacks expression-limiting immune responses in recipients histone h1 and specific h1-import receptors to effect of virus-based gene therapy. a targeted uncoating of its nucleocapsid at the (i) immunosuppressant drugs such as cyclophosnuclear pore. 
consequently, the viral dna is liber-phamide, cyclosporine and fk506 have been reated near the opening of the pore and positioned for ported to prolong transgene expression and facilitate translocation into the nucleus [78] . repeated gene transfer in the lung [81] . however, the for most of the viral vectors, but not retroviruses, potential to induce severe systemic side effects may nuclear import does not depend on the mitotic status limit the clinical application of these drugs. furtherof the cell. it has recently been reported that a more, prolonged general immunosuppression of cf polypurine tract (cppt) present in the hiv-1 genome patients, where lungs are colonised by pathogenic leads to the formation of a triple-stranded dna bacteria, could be injurious. structure, the hiv-1 flap, which is recognised by the (ii) topical corticosteroids, such as budesonide, nuclear import machinery of the host cell. this have allowed successful re-administration of adenoultimately results in the transport of the hiv-1 virus, at least twice. compared to saline-treated genome into the nucleus across the nuclear pore animals, budesonide treatment resulted in significomplex [79] . introduction of the cppt sequence in cantly higher transgene expression and lower recombinant lentiviral vectors has been shown to amounts of nab both in balf and serum. howenhance nuclear import and, therefore, transgene ever, these differences disappeared after five previexpression [80] . for other viruses, such as sendai ous exposures to the virus [82] . virus, nuclear import is not a rate-limiting step since (iii) co-administration of interferon-g (ifn-g) or replication and transcription both occur in the cyto-interleukin-12 (il-12) has been shown to diminish plasm. the activity of t cells and formation of nab, h2 allowing efficient re-administration (at least once) of recombinant virus [83] . a potential drawback of this 5 . 
host immune responses approach is that t cells are inhibited at the h2 expense of increased t activation. thus, both ifn-h1 as transgene expression is transient, and unless g and il-12, while capable of inhibiting humoral lung-resident stem cells are targeted, the treatment of immunity, might enhance the elimination of adeno-cf by gene therapy will require repeated administra-virus-transduced cells by ctls. tions throughout the lifetime of the patient. the (iv) both ctl and b cell responses require prevent the activation of humoral and cellular imbe overcome in order to re-administer the vector. mune response to viruses, thus causing prolonged these include: (i) an antigen-nonspecific, cytokine-transgene expression and efficient re-administration. dependent response resulting in acute inflammation; several strategies have been adopted including the 1 (ii) a cytotoxic (cd8 ) t-lymphocyte (ctl)-depen-use of non-depleting monoclonal antibodies to the dent response directed at cells expressing viral or cd4 molecule [84] and the blockade of either cd40transgene proteins, resulting in chronic inflammation cd40 ligand [85] or cd28-b7 (with ctla4-ig) and lack of persistent transgene expression; and (iii) [86] , co-stimulatory signals necessary for complete 1 a helper (cd4 ) t lymphocyte-dependent response t-cell activation. these treatments have resulted in suppression of cellular and humoral responses, pro-shortened version of the cftr gene ('mini-gene') longed transgene expression and, in some cases, [96] . another approach is to expand raav packaging vector re-administration up to four times [84] . how-capacity with trans-splicing or overlapping vectors. ever, in non-human primates, treatment with an antithe first reconstitutes gene expression from two cd40 ligand monoclonal antibody did not prevent independent raav vectors, each encoding unique, the elicitation of a virus-specific antibody response non-overlapping halves of a transgene. 
the overlapupon secondary challenge with vector [87] . it is ping vector approach uses homologous recombinapossible that a combination of blocking agents may tion between overlapping regions in two independent provide a more complete abrogation of t-and b-cell vectors [97] . preliminary results seem to indicate that immune responses. the trans-splicing approach is 4-10-fold more effi-(v) antibody neutralisation of the virus can be cient than the overlapping approach [98] . reduced by coating the vector particle with polyethylene glycol (peg) [88] , gl67-dope-peg [89] 5 .2. plasmid-based delivery and poly-l-lysine or deae-dextran [59] . these treatments have the added advantage of providing a plasmid-based gene delivery strategies are genermeans to retarget the vector through ligands coupled ally regarded as safer and less immunogenic alterto the coating polymer. another potential strategy for natives to viral vectors. repeated administration of effective repeated delivery is 'serotype switching' lipoplexes in mice [99] and cf patients [17] resulted where gene therapy is initiated with one virus in similar levels of transgene expression as observed serotype, then switched to a virus derived from a after a single delivery, thus suggesting that lipoplexsecond serotype for subsequent administration, there-es can be re-administered without apparent loss of by avoiding neutralising antibodies induced by the efficacy. however, the efficacy of repeated adminisfirst serotype. however, transgene expression may be trations is dependent on the dose used and the time limited by cross-reactive ctls that can also target interval between administrations. lee et al. reported cells infected by the second virus serotype [90] . 
that very little or no loss in efficacy was observed if the use of these strategies were to be combined provided the dose of lipoplexes was low or if the with adenovirus vectors that are devoid of all viral time interval between successive instillations was sequences ('gutless' or helper-dependent adenoviral sufficiently long [100] . it appears that the inflammavectors), thereby avoiding a cell-mediated immune tion elicited by the complexes may affect the efficacy response, it might be possible to repeatedly deliver of repeat administration. the virus. the use of these 'stealth' adenoviruses non-viral gene delivery agents can indeed have could eliminate the requirement for systemic im-inflammatory and toxic effects in vivo. scheule et al. munosuppression with repeated administration [91] . observed a dose-dependent pulmonary inflammation recombinant aav (raav) vectors are generally characterised by infiltrates of neutrophils and, to a less immunogenic in the airways and beck et al. lesser extent, macrophages and lymphocytes when have recently suggested that aav can evade immu-the cationic lipid gl-67 was administered to mouse nological surveillance and can be repeatedly de-lungs in vivo. associated with this were elevated livered to rabbit airways, because they are unable to levels of the pro-inflammatory cytokines il-6, tnftransduce antigen presenting cells (dendritic cells) a and ifn-g that peaked at day 1-2 post-instillation, [92] . however, this possibility has recently been but resolved to normal limits by day 14 [101] . ruled out unless different serotypes [93] or some histopathological analysis of lung sections from form of immunosuppression [94, 95] is used. further-mice treated with the individual components of the more, the small packaging size limit of the virion has lipoplex suggested that the cationic lipid was the restricted the use of genetic control elements that can major mediator of the observed inflammation. 
howdrive cftr expression compared to the relatively ever, results of clinical studies in which cf patients weak promoter activity of the native viral inverted were subjected to either aerosolised liposomes alone terminal repeat (itr) sequence. recent progress to [102] or cationic lipid-pdna complexes indicated overcome this problem has included the use of a that bacterially derived pdna may also be inflam-matory [16, 19] . each of the cationic lipid-pdna-tion, scaria et al. showed that by using an adenovirus treated patients, but not the liposome-treated con-co-expressing both human cftr and icp47 (a gene trols, exhibited mild flu-like symptoms (including shown to block the transporter associated with mhc 1 fever, myalgia and a reduction in fev of approxi-class i-mediated antigen presentation to cd8 t 1 mately 15%) over a period of 24 h. cells) a prolonged expression (up to 21 days) was one possible explanation for this response may be observed in monkey lungs, even though natural killer related to the presence of unmethylated cpg di-cell activity was enhanced [107] . nucleotide sequences in bacterially derived pdna. however, several lines of evidence suggest that compared with dna of eukaryotic origin, bacterial attenuation of promoter function may be the most genomic dna contains a 20-fold higher frequency significant factor in the lack of persistence of transof the dinucleotide sequence cpg. further, unlike gene expression. the transcriptional activity of the eukaryotic dna, in which approximately 80% of the widely used cytomegalovirus immediate early gene cytosines are methylated, bacterial dna is relatively promoter (cmv) is highly robust, but prone to unmethylated. instillation of bacterial dna or oligo-inactivation over time. cytokines induced by adenonucleotides containing immunostimulatory cpg virus-or plasmid-mediated gene delivery have been motifs into mouse lungs resulted in inflammation of shown to down-regulate cmv-driven expression. 
for the lower respiratory tract [103] . several strategies this reason, many investigators have evaluated alterhave been employed to decrease the immuno-native promoters, with many showing increased stimulatory properties of pdna, including (i) meth-persistence. these include the polyubiquitin c proylation of cpg sequences [104] , (ii) reduction of the moter (high-level transgene expression for up to 8 cpg frequency by eliminating non-essential regions weeks and still detectable after 6 months), the or by site-directed mutagenesis [105] and (iii) the elongation factor 1a promoter (expression up to 4 use of inhibitors of the cpg signalling pathway, such weeks) [108] and the cmv-ubiquitin b hybrid as chloroquine or quinacrine [105] . independent of promoter (expression up to 3 months, with 50% of the strategy used, the cpg-reduced pdnas were day 2 levels remaining at day 84) [109] . similar found to be less pro-inflammatory. however, meth-prolonged transgene expression was obtained when ylation of the cpg motifs can severely reduce the the e4 region from adenovirus 2 or simply the open expression of the transgene [104] . reading frame 3 (orf3) of e4 were cloned upstream of the cmv promoter on a plasmid backbone [110] . however, a potential disadvantage of this approach 6 . expression of the therapeutic gene is the immunogenicity of the e4 orf3 product once expressed in the transfected cell, thereby limiting its one of the main obstacles to the development of usefulness in vivo. in addition, there is growing gene therapy for the airways is the inability of evidence that genomic sequences, either within or current viral and nonviral gene transfer vectors to flanking the gene, might be essential to provide in direct sustained expression of a therapeutic trans-vivo long-term expression [111] . gene. 
this may be due to several causes including an alternative approach to achieve prolonged loss of the vector (especially if present in an episom-transgene expression is to use artificial chromosome al form), transcriptional silencing of the transgene vectors or integrating viruses, since they are stable promoter, loss of the transfected cell through cell over time and will propagate to daughter cells should turnover, or the generation of an immune response to cell division occur. huertas et al. have recently the transgene product or the transfected cell itself. developed a circular yeast artificial chromosome several approaches have therefore been taken to (yac) carrying the human cftr sequence and the achieve longer-term expression following each gene orip and ebna-1 genes from epstein-barr (ebv) transfer treatment. as outlined in section 5.1, im-virus [112] . plasmids carrying these two ebv genes munosuppressant drugs have been reported to have been shown to allow long-term episomal prolong transgene expression [106] , because of their maintenance of the dna, being able to replicate and ability to block t cell-mediated response. in addi-segregate in the daughter cells. however, unless lung-resident stem cells are targeted, these vectors integrated in the host genome will eventually be lost are unlikely to have greater advantages over conven-due to cell turnover. tional plasmids, since the airway epithelium is an alternative would be to target lung resident mainly composed of non-dividing cells. furthermore, stem cells so that the integrated transgene is continutheir size makes in vivo delivery and subsequent ously propagated to the daughter cells when cell nuclear trafficking quite difficult. division occurs. 
there has been a long-standing with regard to integrating viruses, aav and, more debate regarding the identity of airway epithelial recently, lentiviruses have been considered as good stem cells and two theoretical models of cell lineage candidates for prolonged expression. wild-type aav in the pseudostratified airway epithelium have been persists by site-specific integration into human chro-suggested. the 'stem cell niche' model suggests that mosome 19. however, recombinant aavs persist airway epithelial stem cells are localised to distinct predominantly in an episomal form and integrate and potentially inaccessible compartments of the randomly in the host genome at a much lower lung. however, there is evidence for great plasticity frequency than wild-type aav, probably because of in growth and differentiation potential of airway the lack of rep gene products [113] . furthermore, epithelial cells and earlier studies showed that both results demonstrate that, at least in the liver extra-basal and non-basal cells could regenerate a comchromosomal, not integrated genomes, are the pri-plete mucociliary epithelium in tracheal grafts. this mary source of raav-mediated gene expression observation led to an alternative 'unlimited plastici[114] . ty' model where many non-terminal cells with ample lentiviral vectors from different strains including progenitorial capacity are scattered throughout the human, feline and equine immunodeficiency virus epithelium [115] . a better understanding of this issue have been used to transduce airway epithelial cells is also likely to benefit any gene therapy strategy because of their potential to provide long-term using integrating vectors, considering the different expression through the integration of the provirus accessibility of stem cells in the two models. into the host cell genome. kobinger et al. recently used an hiv-based vector pseudotyped with the envelope from the ebola (eboz) virus to efficiently 7 . 
how many and which cells should be transduce airway epithelia in vivo. animals receiving corrected to achieve clinical benefit? eboz-pseudotyped hiv vector demonstrated minimal expression at day 7, but strong expression in the a key issue is to distinguish between two conairway epithelium, including submucosal glands, by cepts: 'percent of cells corrected' and 'level of cftr day 28 that persisted at day 63. on average, 30% of transduced / cell'. the entire tracheal epithelium was transduced by the vector at day 28 and 24% at day 63 [58] . 7 .1. percent of cells corrected despite these results, integrating vectors still have some problems. the first is that chromosomal posi-an in vitro study has shown that approximately tion and structure can negatively affect transgene 6-10% of 'corrected' cells are needed to restore 2 expression, thus leading to host shut-off of the normal cl transport function [116] . the amplifica-2 transferred expression cassette, a problem that has tion of functional correction reflects the fact that cl plagued retrovirus gene transfer vectors. this effect, can move via gap junctions from non-corrected cells known as transcriptional silencing, can be mitigated into adjacent corrected cells for secretion. an in vivo through the use of chromatin insulators, which are study has shown that 5% of the normal level of cftr protein-binding dna elements that lack intrinsic gene expression can correct the chloride abnormality promoter / enhancer activity, but shelter genes from (50% of normal) and, importantly, the intestinal transcriptional influence of surrounding chromatin. pathology seen in mice with cf [117] . however, the second problem is that the majority of the different levels of correction may be required to epithelium is comprised of differentiated cells that restore the various functions of cftr. 
the relationare replaced every few months, so that a transgene ship between efficiency of gene transfer and normali-sation of sodium transport is likely to be linear, gene, will have to be used to drive cftr expression. reflecting the fact that cftr directly regulates enac to produce more physiological cftr expression channels within individual cells. this suggests that other approaches such as the use of (i) genomic virtually every affected cell (100%) should be cor-context vectors in which cftr expression is driven rected [116] . the level of transfection required for by its natural promoter and regulatory sequences and other functions of cftr to be restored, e.g., sulpha-(ii) 'gene targeting' molecules, such as rna-dna tion / sialylation defects or transport of other mole-chimeric oligos and small dna fragments for cules, remains unknown and appears to depend on homologous replacement may be needed. this latter the cell type transduced and the gta used. zhang et strategy would allow the mutation within the cftr al. showed that cationic liposome-mediated cftr gene to be 'surgically' modified without altering the transfer achieved very low transgene expression with promoter and the regulatory sequences of the cftr insignificant correction of the chloride defect, but gene. mucus sulphation was reduced to levels seen in non-cf airways. the converse was seen with adenovirus that, despite higher levels of expression, did not 8 . non-conventional approaches transduce goblet cells [118] . in addition, the route of administration itself seems to have a role with regard because of the hurdles encountered by gtas in to the localisation of the delivered gene. instillation getting into airway epithelial cells, many less conof gtas into the mouse lungs results predominantly ventional strategies than those described above have in transfection of the alveolar and terminal bronchial been developed and will be reviewed here. 
cells, while, when they are aerosolised, a higher level of transfection is observed in the airway epithelium 8 .1. oligonucleotide-mediated strategies [119] . aerosolisation is more likely to lead to a more even deposition of the gtas throughout the lung because of their size, oligonucleotides have the than could be attained by instillation, which pre-potential to enter the cell and the nucleus much more sumably primarily deposits the complexes in the easily than plasmid dna. goncz et al. have deparenchyma. veloped a new strategy based on gene targeting by small fragment homologous replacement (sfhr). specific genomic sequences are targeted with small fragments of exogenous dna (400-800 bp) that are a delivered gene would ideally be expressed in a homologous to the targeted endogenous dna semanner similar to the normal pattern of the defective quences except for the particular base pairs that gene that it is replacing. to counterbalance the poor encode the desired modification. gene targeting is gene transfer efficiency with current gtas, very thought to have several advantages over classic gene strong non-specific viral promoters such as rsv and complementation, including long-term and tissue cmv have been used to drive transgene expression. specific expression of the functional gene, no intro-however, because the number of cftr molecules duction of foreign sequences and no immune reper respiratory epithelial cell is low (20-100 channel sponse. for the first time, goncz et al. showed the proteins / cell) and that the cftr protein regulates modification of specific genomic sequences in exon other ionic channels implicated in water and salt 10 of the mouse cftr after small dna fragments secretion, it is possible that high levels of cftr were delivered to the lungs of normal mice [122] . expression may perturb the function of other proteins recently, conversion of wild-type cftr to the or alter physiological properties of the cell. 
it has g551d mutation in primary rat hepatocytes has been been reported that a high level of cftr expression reported by using a different molecule, rna-dna can cause growth arrest and increased cell volume chimeraplasts [123] . [120, 121] , thus suggesting that either regulatable another strategy is to use oligonucleotides as expression cassettes [121] or epithelium-specific antisense molecules. friedman et al. used antisense promoters, such as that for the human cytokeratin 18 oligoribonucleotides to correct the cftr splicing mutation 3849 1 10 kb c → t in human and mouse uptake. the feasibility of these methods is very epithelial cells [124] . in another report, antisense organ-and tissue-specific and for the lung represents inhibition of b cell antigen receptor-associated pro-a challenge with many unknown aspects. gersting activated cl currents in [dphe ]cftr-expressing fected with plasmid dna mixed with superparamag-cho cells [125] . antisense strategy might suggest a netic nanoparticles (magnetofection) resulted in more new way to correct the defects present in cf, such as than 100-fold increase in gene transfer [128] . the mucin production or sodium hyperabsorption. challenge is now to see whether this very promising technique [129] can be applied to the airway epi-8 .2. spliceosome-mediated rna trans-splicing thelium in vivo. (smart) 8 .4 . in utero gene transfer a very recent technology developed by mitchell and collaborators takes advantage of the cell's because of the hurdles and barriers involved with endogenous splicing machinery as a strategy for conventional gene transfer, several groups have modifying pre-mrna. the smart process uses begun to look into in utero gene delivery as a new pre-therapeutic rna molecules (ptms) that are way potentially to increase the efficacy and duration designed to base pair with the intron of a targeted of transgene expression. 
[...] pre-mrna to suppress target cis-splicing while enhancing trans-splicing between the ptm and target. the aim is to repair mutant pre-mrna molecules and generate full length repaired mrna that is translated and processed into mature cftr protein [126]. in an in vivo model of df508 cf airway epithelia, liu et al. showed that human cf bronchial xenografts infected with a recombinant adenovirus encoding a ptm targeted to cftr intron 9 demonstrated partial correction of cftr-mediated cl- permeability to 22% of that seen in non-cf xenograft [127]. this strategy would allow a more physiological expression of cftr, as previously discussed. furthermore, as ptm expression cassettes can be much smaller than those encoding full-length cdnas, smart also allows for the use of smaller and less immunogenic vectors, such as raav with limited packaging capacity. however, the very high titres required to achieve correction and the potential of trans-splicing into non-cftr mrnas are disadvantages which should not be ignored.

8.3. physical methods

because of the inefficiency of currently available gtas, newer ways of increasing gene transfer have to be developed. physical methods such as magnetism, electroporation and ultrasound have been employed by several groups as a means to enhance gene [...]

cf is a particularly inviting target disease for the development of in utero gene therapy because amniotic fluid circulation provides vector exposure to pulmonary, gastrointestinal and sinus epithelia, all primary sites of cf pathology. viral vector introduced into amniotic fluid of mice, rats, sheep and rabbits results in reporter gene expression in both pulmonary and gastrointestinal epithelia. transgene expression was also demonstrated in pulmonary epithelia after intratracheal instillation of vector in utero. persistence of transgene expression in the lung ranged from 14 to 30 days post-infection. most of these studies have been carried out with adenoviral vectors, and many have led to substantial inflammation and subsequent foetal loss. fewer adverse events have been observed when aav is used. retroviral vector use for applications in utero has been limited because the amniotic fluid reduces infectivity (see ref. [130] for an extended overview).

the potential advantage of in utero gene transfer over other approaches is that, as the host immune system is not fully developed, it may be possible to tolerise the organism to viral vectors. however, recent reports have ruled out the possibility of successful repeated administration of viral vectors during adult life despite previous in utero exposure [131]. in humans this approach would be even less successful since the immune system becomes responsive by midgestation. larson et al. have recently presented an interesting and controversial report, showing that treatment of primate foetuses with an adenovirus expressing the cftr gene resulted in accelerated differentiation of the lung [132]. furthermore, after in utero gene transfer with an adenovirus containing the cftr gene, a permanent reversion of the lethal phenotype in cf knockout mice was observed [133]. this might suggest that cf is a developmental disease that could be prevented by transient in utero cftr gene expression at the proper time of lung and intestine differentiation.

clinical trials have shown that there is clearly a requirement for newer approaches to improve delivery and efficiency, increase duration of expression and permit repeated administration of gtas. in addition there is an increasing need by the scientific community that all these new gtas and technologies are tested on relevant airway test systems (i.e., only highly differentiated epithelial cells in vitro and a spectrum of in vivo systems) before entering clinical trials. cf mice have been a great [...]

[...] future a feasible option to detect gene expression in a non-invasive way in human airways.

10. conclusion

in conclusion, in the 13 years since the cloning of the cftr gene and after about 20 clinical trials, some of the crucial barriers limiting gene transfer have become clearer. the combination of newly developed gtas and technologies, better models to test gene-based therapies and new detection systems will hopefully allow these barriers to be overcome.

this work was supported by the cystic fibrosis research trust (sf) and a wellcome trust senior clinical fellowship (ewfwa). the authors are members of the uk cystic fibrosis gene therapy consortium (www.cfgenetherapy.org.uk).

references

cationic liposome mediated cftr gene transfer to the nasal r aerosol administration of a recombinant chadadenovirus expressing cftr to cystic fibrosis patients: a wick cationic lipid-mediated cftr crystal, gene transfer to the lungs and nose of patients with cystic airway epithelial cftr mrna expression in cystic fibrosis fibrosis: a double-blind placebo-controlled trial, lancet 353 patients after repetitive administration of a recombinant a phase i study of adenovirus-mediated webb, d.r. gill, repeat administration of dna / liposomes transfer of the human cystic fibrosis transmembrane conductto the nasal epithelium of patients with cystic fibrosis, gene ance regulator gene to a lung segment of individuals with ther safety and biological efficacy of a lipid-aerosol and lobar administration of a recombinant adeno-cftr complex for gene transfer in the nasal epithelium of virus to individuals with cystic fibrosis. ii. transfection adult patients with cystic fibrosis rosentransfer of aav-cftr in maxillary sinus a clinical inflammatory syndrome attributable to aerosolised lipid-dna administration in cystic fibrosis simoens, administration of tgaavcf to cystic fibrosis subjects with f.
de baets structural alterations of gene complexes by cystic geddes, liposome-mediated cftr gene transfer to the nasal fibrosis sputum efficient gene transfer dna alone for gene transfer to cystic fibrosis airway to airway epithelium using recombinant sendai virus, nat. epithelia in vivo the extra-and intracellular barriers a placebo-controlled study of liposomenative sheep airway epithelium a low rate adjuncts for nonviral gene transfer to airway epithelium, of cell proliferation and reduced dna uptake limit cationic gene ther the effect of mucolytic r.c. boucher, limited entry of adenovirus vectors into agents on gene transfer across a cf sputum barrier in vitro, well-differentiated airway epithelium is responsible for in-gene ther inhibitory effect of cystic fibrosis polarity influences the efficiency of recombinant adenoassputum on adenovirus-mediated gene transfer in cultured sociated virus infection in differentiated airway epithelia, epithelial cells increas-(raav) transduction by bronchial secretions from cystic ing epithelial junction permeability enhances gene transfer to fibrosis patients polyethylenimine shows properties of interest for cystic fibrosis gene therapy egta enhancement of adenovirus-(1999) 219-225. mediated gene transfer to mouse tracheal epithelium in vivo retargeting the coxsackievirus and adenovirus receptor to the apical surface of polarized epithelial cells enhanced in vivo airway gene transfer via transient modireveals the glycocalyx as a barrier to adenovirus-mediated fication of host barrier properties with a surface-active agent, gene transfer epithelial integrity resulting from e-cadherin dysfunction l.c. 
tsui, comparison between intratracheal and intravenous predisposes airway epithelial cells to adenoviral infection, administration of liposome-dna complexes for cystic fi-am systemic gene expres-enhanced epithelial gene transfer by modulation of tight sion after intravenous dna delivery into adult mice, science junctions with sodium caprate targeting transgene ment of novel formulations that enhance adenoviral-mediated expression for cystic fibrosis gene therapy, mol. ther. 4 gene expression in the lung in vitro and in vivo immune responses to adenovirus and adeno-associated virus liggitt, perfluorochemical liquid-enhanced adenoviral vector in humans variability of human systemic humoral immune halbert, perfluorochemical liquid enhances adeno-associated responses to adenovirus gene transfer vectors administered to virus-mediated transgene expression in lung use of perfluorocarbon (fluorinert) to adenovirus-mediated gene transfer to basal but not columnar enhance reporter gene expression following intratracheal cells of cartilaginous airway epithelia, hum. gene ther. 7 instillation into the lungs of balb / c mice: implications for (1996) 921-931. nebulized delivery of plasmids engineering novel cell surface reand differentiated airway epithelial cells in vitro and in vivo, ceptors for virus-mediated gene transfer of binding and entry of liposome-dna complexes decreases g-protein-coupled receptors as targets for gene transfer transfection efficiency in differentiated airway epithelial vectors using natural small-molecules ligands adeno-associated virus ing ad5 fibers to bradykinin receptors expressed on the type 2-mediated gene transfer: altered endocytic processing lumenal surface of human airway epithelium, pediatr. pulenhances transduction efficiency in murine fibroblasts targeting the urokinase sugimachi, k. 
sueishi, hvj (sendai virus)-cationic lipoplasminogen activator receptor enhances gene transfer to somes: a novel and potentially effective liposome-mediated human airway epithelia molecular conjugate mediated cftr gene transfer in cf mice affects secondary coutelle, a potential barrier to gene transfer a novel peptide, thalwht, for the targeting 482-497. of human airway epithelia mccray nucleases prevent efficient delivery to the nucleus of injected jr., human coronavirus 229e infects polarized airway epiplasmids l. processing limits gene transfer to polarized airway epithelia infection of human airway epithelia with h1n1, by adeno-associated virus polyethylenimine but not cationic lipids pseudotyped lentiviral vector can efficiently and stably promotes transgene delivery to the nucleus in mammalian transduce airway epithelia in vivo wood-quantitative studies on the nuclear transport of plasmid dna worth dna vector chemistry: the covalent attachment of signal peptides incorporation of adenovirus in calcium phosphate precipitates enhances gene transfer to airway 102 (1998) single nuclear localization signal peptide is sufficient to carry 184-193. dna to the cell nucleus adenovirus-mediated gene transfer to ciliated airway epithelia peptide nucleic acids: versatile tools for gene requires prolonged incubation time 3 (2001) virus type 5 to 2,3-linked sialic acid is required for gene 831-841. transfer nuclear import and export of viruses and virus genomes respiratory syncytial virus (rsv) infects ciliated cells of airway import of adenovirus dna involves the nuclear suppl. 22 (2001) abstr. 207. pore complex receptor can / nup214 and histone h1 a versatile vector for montagnier, gene and oligonucleotide transfer into cells in culture and in p. 
charneau, hiv-1 genome nuclear import is mediated by a vivo: polyethylenimine eponge a protons: un moyen d'entrer dans une gene transfer by lentiviral vectors is limited by nucleaŕc ellule auquel les virus n'ont pas pense cyclophosphamide di transient immunosuppression allows transgene exminishes inflammation and prolongs transgene expression pression following readministration of adeno-associated viral following delivery of adenoviral vectors to mouse liver and vectors successful readministration of adeno-associated virus vectors budesonide enhances repeated gene transfer and expression to the mouse lung requires transient immunosuppression in the lung with adenoviral vectors efficient expression of cftr function prevents formation of blocking iga antibodies to recombiwith adeno-associated virus vectors that carry shortened nant adenovirus and allows repeated gene therapy to mouse cftr genes repeated administracapacity with trans-splicing or overlapping vectors: a quantion of adenoviral vectors in lungs of human cd4 transgenic titative comparison expanding aav packaging 163 (1999) 448-455. capacity with trans-splicing and overlapping vectors: a antibody to cd40 (2001) abstr. 221. ligand inhibits both humoral and cellular immune responses to mouse airway blunting of immune second dose of a cftr cdna-liposome complex is as responses to adenoviral vectors in mouse liver and lung with effective as the first dose in restoring camp-dependent ctla4ig readministration of adenovirus vector in nonhuman primate lungs by blockade of cd40-nichols pegylation of adenotions of cationic lipids for efficient gene transfer to the lung, virus with retention of infectivity and protection from hum basis of complexed with polyethylene glycol and cationic lipid is pulmonary toxicity associated with cationic lipid-mediated shielded from neutralizing antibodies in vitro, gene ther. 
5 gene transfer to the mammalian lung circumvention of anti-adenovirus neutralizing safety of a single aerosol administration of escalating doses immunity by administration of an adenoviral vector of an of the cationic lipid gl-67 / dope / dmpe-peg5000 alternate serotype sponses against the virus and allow for significant gene a.m. krieg, cpg motifs in bacterial dna cause inflammaexpression upon readministration in the lung contribugino, repeated delivery of adeno-associated virus vectors to tion of plasmid dna to inflammation in the lung after the rabbit airway gelhardt, vector-specific complementation profiles of two 262. independent primary defects in cystic fibrosis airways transplant immunosuppression increases and waldrep, transgene expression in mouse airway epithelium prolongs transgene expression following adenoviral-mediby aerosol gene therapy with pei-dna complexes, mol. ated transfection of rat lungs cheng, biosynthetic and growth abnormalities are associ adenoviral vector expressing icp47 inhibated with high-level expression of cftr in heterologous its adenovirus-specific cytotoxic t lymphocytes in cells expression of the human cftr gene in epithelial cells sion of df508 cftr in normal lung after site-specific high and sustained transgene expression in vivo from modification of cftr sequences by sfhr, gene ther. 8 plasmid vector containing a hybrid ubiquitin promoter conversion of wild-type cftr to the g551d increased duration of transgene expression in the lung with mutation in primary rat hepatocytes using rna / dna plasmid dna vectors harbouring adenovirus e4 open oligonucleotides epstein-barr virus / human vector provides silverman, r. 
kole, correction of aberrant splicing of the high-level, long-term expression of a -antitrypsin in mice cftr) gene by antisense oligonucleotides expression of the human cftr gene from episomal orip-ebna1 control of cystic fibrosis transmembrane conductance regulator expression by bap31 repair of cftr epithelial cell line, gene ther. 3 (1996) 748-755. mrna by spliceosome-mediated rna trans-splicing virus vector genomes are primarily responsible for stable zhou partial correction of endogenous df508 cftr in human cystic fibrosis airway epithelia by spliceosome evidence for stem-cell niches in the 47-52. tracheal epithelium magnetofection of permanent and primary efficiency of gene transfer for of 2 parts) (2001) abstr. 1017. restoration of normal airway epithelial function in cystic fibrosis fection: enhancing and targeting gene delivery by magnetic a force in vitro and in vivo utero aav-mediated gene transfer to rabbit enpulmonary epithelium development of reexpression following readministration of an adenoviral a ferret model of the cystic fibrosis, pediatr. pulmonol. vector in adult mice after initial in utero adenoviral towards an ovine model of cystic fibrosis gene transfer into the a. fetal primate: evidence for the secretion of transgene delsing cftr modulates lung secretory cell prolifer-abstr. 1152. ation and differentiation

key: cord-306535-j26eqmxt authors: robertson, matthew j.; kent, katarzyna; tharp, nathan; nozawa, kaori; dean, laura; mathew, michelle; grimm, sandra l.; yu, zhifeng; légaré, christine; fujihara, yoshitaka; ikawa, masahito; sullivan, robert; coarfa, cristian; matzuk, martin m.; garcia, thomas x.
title: large-scale discovery of male reproductive tract-specific genes through analysis of rna-seq datasets date: 2020-08-19 journal: bmc biol doi: 10.1186/s12915-020-00826-z sha: doc_id: 306535 cord_uid: j26eqmxt background: the development of a safe, effective, reversible, non-hormonal contraceptive method for men has been an ongoing effort for the past few decades. however, despite significant progress on elucidating the function of key proteins involved in reproduction, understanding male reproductive physiology is limited by incomplete information on the genes expressed in reproductive tissues, and no contraceptive targets have so far reached clinical trials. to advance product development, further identification of novel reproductive tract-specific genes leading to potentially druggable protein targets is imperative. results: in this study, we expand on previous single tissue, single species studies by integrating analysis of publicly available human and mouse rna-seq datasets whose initial published purpose was not focused on identifying male reproductive tract-specific targets. we also incorporate analysis of additional newly acquired human and mouse testis and epididymis samples to increase the number of targets identified. we detected a combined total of 1178 genes for which no previous evidence of male reproductive tract-specific expression was annotated, many of which are potentially druggable targets. through rt-pcr, we confirmed the reproductive tract-specific expression of 51 novel orthologous human and mouse genes without a reported mouse model. of these, we ablated four epididymis-specific genes (spint3, spint4, spint5, and ces5a) and two testis-specific genes (pp2d1 and saxo1) in individual or double knockout mice generated through the crispr/cas9 system. 
our results validate a functional requirement for spint4/5 and ces5a in male mouse fertility, while demonstrating that spint3, pp2d1, and saxo1 are each individually dispensable for male mouse fertility. conclusions: our work provides a plethora of novel testis- and epididymis-specific genes and elucidates the functional requirement of several of these genes, which is essential towards understanding the etiology of male infertility and the development of male contraceptives.

the world human population reached nearly eight billion people in august 2019. this number continues to rise and is predicted to reach nearly ten billion by the year 2050 [1]. the increasing need to promote family planning through the development of reliable contraceptive options available to both men and women is widely recognized. currently there are numerous contraceptive options available to women; however, identification of a safe, non-hormonal contraceptive option for men is still an ongoing challenge. although several different fertility control alternatives for men have been investigated, none are currently clinically approved for use. our understanding of the mechanisms underlying male reproductive physiology is still at an early stage as the identification and elucidation of the function of key reproductive proteins is still an ongoing effort. identifying druggable protein targets expressed in the male reproductive tract has been the focus of numerous studies dedicated to the development of male contraception. the mammalian epididymis is a segmented organ composed of a single, highly coiled tubule with functionally and morphologically distinct regions that can be subdivided most simply into a proximal, central, and distal region, conventionally named the caput, corpus, and cauda regions, respectively [2].
as mammalian spermatozoa transit through the epididymis, they acquire the ability to recognize and fertilize an egg, properties that they did not possess upon exiting the testis [3]. considering its essential role, the epididymis, in addition to maturing germ cells of the testis and spermatozoa, is a prime target for the development of a male contraceptive. to advance progress towards the development of a non-hormonal male contraceptive, several previous high-throughput studies have been published that identified a number of human, mouse, and rat genes as testis-specific or epididymis-specific [2, 4-9]. in 2003, schultz et al. conducted the first study to identify male reproductive tract-specific genes using microarrays. through affymetrix-based genome-wide gene-expression analysis of meiotic and post-meiotic spermatogenic cells, together with parallel analysis of available data from the ncbi unigene database, the authors identified 271 mouse genes as testis-specific, which included genes with both known and unknown function at the time [4]. in the following 5 years, through two additional microarray-based studies of rat testes and purified rat testicular cells, johnston et al. identified 58 [5] and 398 [8] additional or overlapping genes as testis-specific. in 2014, as part of the continued effort to identify novel contraceptive targets, the newer rna-seq-based transcriptomics methodology was utilized, identifying 364 human genes as testis-specific [9]. together with antibody-based protein profiling, many of these genes were characterized in terms of the spermatogenic cell populations showing expression [9]. the first high-throughput transcriptomics study to identify epididymis-specific genes was a 2005 mouse epididymal transcriptome study, in which rna isolated from each of the 10 epididymal segments was analyzed by microarray analysis, identifying 75 epididymis-specific genes with distinct patterns of segmental gene regulation [2].
later in 2007, additional transcriptome profiling utilizing whole genome microarrays resulted in identification of 77 previously unreported epididymis-specific transcripts in the mouse [6] and 110 epididymis-specific transcripts in the rat [7]. a significant number of the identified mouse and rat genes in these studies were not known at the time, and only the probe identification numbers were presented. when evaluating potential druggability in a target-based drug discovery process, one must consider the protein properties that are required for safe and effective inhibition. among the most significant are tissue expression specificity (to minimize potential adverse effects), protein function and whether protein activity or interaction with other proteins is potentially druggable, sequence similarity to closely related paralogs that may be ubiquitously expressed, and whether genetically manipulated animal models demonstrate a functional requirement for the target of interest [10]. several noteworthy review publications have mentioned numerous genes whose critical functions, high expression, and specificity to the testes or epididymides make them viable non-hormonal male contraceptive targets [11-18]. however, among the identified genes, a significant number either (1) are required for fertility, but are expressed in non-reproductive tissues, or (2) are reproductive tract-specific, but, when disrupted, lead to subfertility [10]. both are ineffective and highly undesirable outcomes for a potential male contraceptive target. therefore, the identification of additional novel male reproductive tract-specific genes would allow for further advances to be made in the quest to develop an effective and safe non-hormonal male contraceptive.
in this study, 21 newly acquired and 243 previously published human and mouse rna-seq datasets [9, 19-26] were processed in parallel through a custom bioinformatics pipeline designed to identify novel reproductive tract-specific and reproductive tract-enriched transcripts. additional databases obtained from illuminating the druggable genome [27], mouse genome informatics [28], and ensembl biomart [29] were utilized to stratify the results into subgroups based on protein druggability and on the availability of a mouse model. numerous reproductive tract-specific and reproductive tract-enriched, potentially druggable targets for which no published mouse model exists, congruent in expression across both mouse and human datasets, were identified through our analysis and verified through conventional polymerase chain reaction (pcr). we present the data in a manner that should be most relevant and of substantial interest to the male contraceptive development field, since identification of new targets worthy of consideration for further functional analysis in a knockout animal model and potential drug targeting continues to be of vast importance. from our results, we identified four novel epididymis-specific genes (spint3, spint4, spint5, and ces5a) and two novel testis-specific genes (pp2d1 and saxo1) worthy of functional validation in an animal model. through the crispr/cas9 system, we generated four individual gene knockouts (spint3, ces5a, pp2d1, and saxo1) and one double knockout mouse model (spint4/5), revealing an essential requirement for spint4 and spint5 in male mouse fertility, and the potential utility of pursuing spint4 in humans as a non-hormonal contraceptive target.
despite significant advances in our understanding of the human and rodent testis and epididymis transcriptome, mostly through microarray-based studies, no prior studies have utilized purified human testis cells for the identification of human testis-specific transcripts, no prior studies have utilized the more state-of-the-art rna-seq-based transcriptomics methodology for analysis of human epididymis-specific transcripts, and no prior studies have utilized rna-seq analysis of rodent reproductive tissues or cells to identify rodent reproductive tract-specific transcripts. to address these gaps in knowledge, and to increase the number of identified reproductive tract-specific genes in both species using the most relevant high-throughput transcriptomics methodology, we analyzed a large number of published and newly acquired human and mouse rna-seq datasets in parallel on a custom bioinformatics pipeline. one hundred and sixty-two previously published human and 81 previously published mouse rna-seq datasets were retrieved from the sequence read archive (sra). the sra value for each sample is listed in additional file 1: table s1 and additional file 1: table s2. we also generated 12 new human and 9 new mouse reproductive tissue rna-seq samples (geo accession gse150854). the final dataset comprises 3 new and 5 previously published human testis datasets [9], 27 previously published purified human germ cell datasets [23, 24], 6 previously published purified human sertoli cell datasets [23, 26], 9 new and 6 previously published human epididymis segment datasets [21], 6 previously published mouse testis datasets [19], 9 new mouse epididymis datasets, 10 previously published purified mouse germ cell datasets [22, 25], and 3 previously published purified mouse sertoli cell datasets [20].
an additional 118 previously published datasets contributed to the 26 non-reproductive human tissues [30] and 62 previously published datasets contributed to the 14 non-reproductive mouse tissues [19]. figure 1a, b summarizes all the samples acquired for the study. we performed a principal component analysis to visualize the variation in the samples after correcting for batch effects. human reproductive and non-reproductive tissues grouped according to sample type. the reproductive tissue samples clustered by tissue type regardless of whether they were newly generated or acquired from the sra (fig. 1c). mouse data showed a similar variation in the samples based on the tissue type (fig. 1d). for both human and mouse reproductive tissues, samples separated by whether the rna-seq was performed on isolated cells or the whole tissue. epididymal tissue was distinct from testis tissue in both human and mouse (fig. 1c, d). to identify potential male reproductive tract-specific drug candidates, we analyzed the aggregated rna-seq data to find genes whose expression in a reproductive tissue or cell differed significantly from that in the non-reproductive tissue that had the maximum expression for that gene. this gene list was then further refined by filtering for genes that were lowly expressed in the non-reproductive tissue that had the maximum expression for that gene (tpm less than or equal to 1.0 for human; tpm less than or equal to 2.0 for mouse). finally, this tpm-filtered list was filtered for the genes that had a reproductive tissue or cell expression value greater than or equal to 10.0 tpm for human, or 8.0 tpm for mouse (fig. 2a). across all the reproductive tissues, 720 candidate genes were identified in the human and 1304 candidate genes were identified in the mouse samples (fig. 2b, additional file 2: fig. s1).
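the three-step filter described above can be sketched in code. this is a minimal illustration and not the authors' pipeline: it assumes that the fdr for each gene has already been computed by a separate differential expression analysis and that mean tpm values per tissue are available. the fdr cutoff of 0.05 is taken from the fig. 2 legend; the `GeneRecord` class, the function `is_candidate`, and the example genes and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# cutoffs quoted in the text (tpm) and in the fig. 2 legend (fdr)
FDR_CUTOFF = 0.05
TPM_CUTOFFS = {
    "human": {"max_nonrepro": 1.0, "min_repro": 10.0},
    "mouse": {"max_nonrepro": 2.0, "min_repro": 8.0},
}


@dataclass
class GeneRecord:
    """hypothetical per-gene summary for one reproductive tissue or cell type."""
    symbol: str
    fdr: float                      # precomputed: reproductive sample vs. max non-reproductive tissue
    repro_tpm: float                # mean tpm in the reproductive tissue or cell of interest
    nonrepro_tpm: Dict[str, float]  # mean tpm in each non-reproductive tissue


def is_candidate(gene: GeneRecord, species: str) -> bool:
    """apply the fdr filter, then the two tpm filters, for one species."""
    cutoffs = TPM_CUTOFFS[species]
    max_nonrepro = max(gene.nonrepro_tpm.values())
    return (
        gene.fdr <= FDR_CUTOFF
        and max_nonrepro <= cutoffs["max_nonrepro"]
        and gene.repro_tpm >= cutoffs["min_repro"]
    )


# made-up values for illustration only
examples = [
    GeneRecord("gene_a", fdr=0.001, repro_tpm=25.0, nonrepro_tpm={"liver": 0.2, "kidney": 0.9}),
    GeneRecord("gene_b", fdr=0.001, repro_tpm=9.0, nonrepro_tpm={"liver": 1.5, "kidney": 0.1}),
    GeneRecord("gene_c", fdr=0.20, repro_tpm=50.0, nonrepro_tpm={"liver": 0.0, "kidney": 0.0}),
]

human_hits = [g.symbol for g in examples if is_candidate(g, "human")]
mouse_hits = [g.symbol for g in examples if is_candidate(g, "mouse")]
print(human_hits)  # ['gene_a']
print(mouse_hits)  # ['gene_a', 'gene_b']
```

in this toy input, gene_b passes only the mouse cutoffs, which shows how the looser mouse thresholds can admit a gene that the human thresholds reject; gene_c fails the fdr filter despite its favorable tpm values.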
additional file 3: table s3 and additional file 4: table s4 summarize the differential fold change, identity of the non-reproductive tissue with maximal gene expression based on the differential gene analysis, fdr, average and standard deviation tpm expression values, and log2 cpm gene expression value for the human and mouse samples, respectively. the results from the fdr and tpm expression value filtering for the human and mouse samples are summarized in additional file 5: table s5 and additional file 6: table s6, respectively. additional file 5: table s5 and additional file 6: table s6 report the log2 fold change for the reproductive tissue or cell of interest compared to the tissue with maximal gene expression. the genes identified in additional file 5: table s5 and additional file 6: table s6 pass the filters in at least one of the reproductive tissues or cells of interest. in additional file 5: table s5 and additional file 6: table s6, a value of zero for a given gene and fold expression comparison indicates that for that comparison, the gene did not pass the filters. the majority of genes were downregulated in the reproductive tissue of interest compared to the maximal gene expressing non-reproductive tissue (additional file 7: fig. s2). from the analysis, the majority of the candidate genes that passed the fdr and tpm filters were identified in the testis- or sperm-related cells in both human and mouse samples (additional file 7: fig. s2).

fig. 1 summary of the human and mouse rna-seq samples used in the identification of novel male reproductive tract-specific drug targets. the rna-seq samples used in the human (a) and mouse (b) analyses are schematically shown. principal component analysis was performed on the human (c) and mouse (d) non-reproductive and reproductive samples separately. the colors of the circles next to the tissues listed in a and b correspond to the colors used in the circles for the pca in c and d. sample size (n) values in red and/or black denote the number of new (red) and previously published (black) samples included in our analysis.

fig. 2 identification of candidate drug male reproductive gene targets. (a) diagrammatic representation of overall methodology used to identify reproductive tract-specific candidate genes in humans (720 genes) and in mice (1062 genes). the maximum gene expression was determined across all the non-reproductive tissue samples for each gene for a reproductive tissue or cell sample of interest. genes were then filtered for significance using a false discovery rate (fdr) of less than or equal to 0.05 based on the differential gene expression analysis for the non-reproductive tissue with maximum gene expression and reproductive tissue or cell sample of interest. genes that passed the fdr filter were filtered such that the average tpm expression value of the maximum expressing non-reproductive tissue was less than or equal to 1.0 tpm and the average tpm expression value of the reproductive tissue or cell of interest was greater than or equal to 10.0 tpm. (b) diagrammatic representation of the number of human and mouse candidate genes in terms of (1) the number of orthologs in the opposite species, (2) the number of genes previously or not previously identified in a prior transcriptomics-based drug target report, (3) the availability and phenotypic outcome of any reported mouse models, and (4) the number of novel genes without a reported mouse model congruent across both species. the main value in each bubble represents the total number of candidate genes identified regardless of tissue or cell identified in. the numbers in parentheses comprise the total number of candidate genes that are either epididymis-specific or specific to testis and epididymis, but not testis and/or testis cell-specific only.

the majority of candidate genes identified in our screen that were testis-specific were already identified by the human protein atlas [9] and/or our reanalysis of the hpa testis datasets (additional file 8: fig. s3 and additional file 9: table s7). thirty-six out of the 91 genes that were identified across all the human epididymis tissue were also identified by the human combined (newly acquired and previously published datasets) testis candidate gene list. finally, the majority of the candidate genes, 300, identified from the combined newly generated and previously published human testis datasets were shared with genes identified from the various testis cell datasets. we identified more candidate genes in the newly generated human epididymis tissues compared to previously published data: 19 out of 54 genes were unique to the newly generated caput samples compared to only 1 out of 36 genes which was unique to the previously published samples, 19 out of 75 genes were unique in the newly generated corpus samples compared to 12 out of 68 genes which were unique to the previously published corpus samples, and 33 genes were unique to the newly generated cauda samples compared to 2 genes in the previously published cauda data with no overlap between the two cauda gene lists (additional file 8: fig. s3 and additional file 9: table s7). there were 117 candidate genes that overlapped between the newly generated human testis samples and mouse testis sample gene lists, while there were 134 candidate genes that overlapped between the previously published human testis sample and mouse testis sample gene lists (additional file 8: fig. s3 and additional file 9: table s7). across all human epididymis tissue samples, including the newly generated and previously published samples, there were 16 genes in common with the combined list of candidate genes across all the mouse epididymis tissue samples.
there was a small overlap between the human and mouse samples when the newly generated human caput, corpus, and cauda tissues were individually compared to the mouse caput, corpus, and cauda tissues; there was an overlap of 10, 12, and 4 for the caput, corpus, and cauda, respectively (additional file 8: fig. s3 and additional file 9: table s7 ). this trend was continued for the candidate gene lists derived from the previously published human caput, corpus, and cauda samples when compared to the candidate gene list from the mouse caput, corpus, and cauda, with 7, 10, and 4 genes in common for the caput, corpus, and cauda comparisons, respectively (additional file 8: fig. s3 and additional file 9: table s7 ). additional file 9: table s7 details the genes that are unique and in common for each of the comparisons. to assess the potential usefulness of the candidate genes identified in each human reproductive tissue as drug targets, we assigned the genes to a protein family (i.e., gpcr or ion channel). the majority of identified genes were not from a traditional drug target family like kinases or enzymes. the testis and germ cell datasets provided the most potential targets while the epididymis datasets provided the fewest (additional file 10: fig. s4a ). the protein family classification for each candidate gene identified in each reproductive tissue is detailed in additional file 11: table s8 . the majority of the candidate genes do not have a reported mouse model (additional file 10: fig. s4b ). additional file 12: table s9 summarizes mouse model availability for each candidate gene identified from human reproductive tissues or cells. figure 3 shows the complete list of novel human genes without a reported mouse model as identified in each of the respective cell and/or tissue datasets. 
digital pcrs (heatmap) and conventional pcrs demonstrating expression of a subset of the novel human reproductive tract-specific genes without a reported mouse model that we identified are shown in figs. 4 and 5, respectively. additional file 13: fig. s5 shows the complete list of previously identified human genes that remain without a reported mouse model as identified in each of the respective cell and/or tissue datasets. additional file 14: fig. s6 shows the complete list of male reproductive tract-specific human genes for which a previously generated mouse model shows a male infertility phenotype, as identified in each of the respective cell and/or tissue datasets. through our bioinformatics analysis of previously published and newly acquired rna-seq datasets, we identified a total of 720 genes as reproductive tract-specific in humans (fig. 2). of these genes, 122 do not have a mouse gene ortholog, while 598 do (fig. 2). of those with a mouse gene ortholog, 477 have a single gene ortholog (324 have the same symbol in mouse, while 153 have a different symbol in mouse), while 121 have two or more orthologous mouse genes. seventy-six human genes had 2-3 orthologous mouse symbols, 36 genes had 4-10 orthologous mouse symbols, and 9 genes (fam205a, krtap10-6, magea10, or2ag1, pramef11, pramef2, ssx2, ssx3, and ssx4b) had more than 10 orthologous mouse symbols (11-93 symbols) (additional file 5: table s5). of the 720 human genes that we identified as male reproductive tract-specific, 435 have not been previously identified in a transcriptomics-based male reproductive tract-specific study [2, 4-9]. the sum of our human data confirms the findings of 232 out of 364 genes from djureinovic et al. [9].
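the ortholog breakdown reported above is nested, so each subtotal should equal the sum of its parts. a short sketch that checks this using only the counts stated in the text; the dictionary keys are labels chosen here for illustration.

```python
# Counts for the human gene list, copied from the text above.
human = {
    "total": 720,
    "no_mouse_ortholog": 122,
    "with_mouse_ortholog": 598,
    "single_ortholog": 477,
    "same_symbol": 324,
    "different_symbol": 153,
    "multiple_orthologs": 121,
    "two_to_three": 76,
    "four_to_ten": 36,
    "more_than_ten": 9,
}

# Each check compares a reported subtotal with the sum of its parts.
checks = {
    "total": human["no_mouse_ortholog"] + human["with_mouse_ortholog"] == human["total"],
    "with_ortholog": human["single_ortholog"] + human["multiple_orthologs"] == human["with_mouse_ortholog"],
    "single": human["same_symbol"] + human["different_symbol"] == human["single_ortholog"],
    "multiple": human["two_to_three"] + human["four_to_ten"] + human["more_than_ten"] == human["multiple_orthologs"],
}
print(checks)
```

all four checks hold for the reported figures (122 + 598 = 720, 477 + 121 = 598, 324 + 153 = 477, and 76 + 36 + 9 = 121).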
after re-identification of gene symbols from reported affymetrix ids and consideration of orthologous genes (mouse to human and rat to human), our human data confirm the findings of 19 out of 39 genes from johnston et al. [5], 77 out of 176 genes from schultz et al. [4], 5 out of 32 genes from johnston et al. [6], 36 out of 253 genes from johnston et al. [2], 4 out of 58 genes from johnston et al. [8], and 3 out of 19 genes from jelinsky et al. [7]. of the 598 genes that have a mouse gene ortholog, 346 have not been previously identified as male reproductive tract-specific, and of these, 233 human genes currently lack mouse phenotype information based on data obtained from ensembl biomart, mgi, impc, and ncbi. three hundred and eighty-six genes were identified as testis-specific through either the reanalysis of djureinovic et al. testis datasets (377 genes identified), analysis of our de novo testis datasets (322 genes identified), or both (additional file 5: table s5). three hundred and thirteen genes were congruent across both datasets, while 64 genes were uniquely identified through our reanalysis of djureinovic et al.'s datasets and only 9 genes (ac136352.4, ankrd20a1, ankrd62, fam230a, ggtlc2, iqcm, potec, prnt, and utf1) were uniquely identified through our de novo datasets (additional file 5: table s5). interestingly, of the 377 genes we identified through djureinovic et al.'s reanalyzed datasets, 143 were not previously identified in their report [9] or any of the other previous reports [2, 4-8]. of these 143 genes, we verified a randomly selected 21 as testis-specific in humans through conventional pcr (fig. 5). we also verified through rt-pcr an additional 15 genes (such as allc, cdkl3, cox7b2, or2h1, and sppl2c) that had been identified through previous studies (additional file 15: fig. s7).
of the 386 genes identified through either testis dataset, 150 have not been previously identified; of these, 117 genes have one or more mouse orthologs; and of these, 76 genes are lacking reported phenotype information. of the 76 novel genes lacking a reported mouse model, 7 genes encode enzymes (adam20, cpa5, dusp21, naa11, plscr2, prss38, and triml1), 6 encode transcription factors (bhmg1, foxr2, prdm9, tgif2ly, znf560, znf729), 2 encode transporters (slc22a14, slc25a52), and 61 encode proteins of unknown drug target type (such as etdb, smim36, bend2, btg4, cnbd1, dppa2, efcab5, erich6, fthl17, iqcm, mroh2b, ms4a5, oosp2, pnma6e, ppp4r3c, rbmxl3, rtl9, spdye4, spem2). all of these genes are listed in fig. 3, and many of these genes are listed in figs. 4, 5, and/or 6.

fig. 3 caption: two hundred and thirty-three novel human reproductive tract-specific genes that each have mouse orthologous genes but with no reported knockout mouse models. the listed genes were identified in one or more datasets as indicated in the venn diagram. underlined genes were also identified in our studies as reproductive tract-specific in mouse (109 genes). genes written in blue encode either enzymes, kinases, gpcrs, ogpcrs, transporters, transcription factors, or proteins involved in epigenetic regulation (74 genes). genes written in dark red were identified in both testis (testis and/or testis cell) and epididymis (10 genes).

to the best of our knowledge, no prior studies have utilized purified human testis cells for the identification of human testis-specific transcripts. through our analysis, we identified 291 genes as human testis-specific through one or more of the human germ cell datasets, but not through either of the human testis datasets (additional file 5: table s5).
seventy-six genes were identified exclusively through one or more of the five human spermatogonia datasets (genes such as anp32d, c13orf42, dscr4, or13g1, or2d2, or52e4, ssx2, tle7), while 18 genes were identified exclusively through the human spermatocyte datasets (genes such as h2bfm, mageb17, mageb18, or2b6, tcp10, and znf709) and 79 genes were identified exclusively through the human spermatid datasets (genes such as ac013269.1, clec20a, or7e24, pramef2, spata31a3, tmem191c, znf679). thirty-four genes were identified through all three cell types' datasets (genes such as ccdc166, eloa2, fam47a, heatr9, and spata31a1). many of these genes are listed in figs. 3, 4, 5, and/or 6. of the 291 genes identified as human testis-specific through one or more of the human germ cell datasets, 252 genes have not been previously identified, 201 of which have one or more equivalent mouse orthologs with 139 of these genes having not been knocked out in mouse. of these 139 novel genes with no mouse model, 8 encode enzymes (glt6d1, prss48, satl1, sult6b1, tmprss7, tpte2, triml2, and ttll8), 1 encodes an epigenetic protein (taf1l), 6 encode gpcrs (gpr156, tas2r13, tas2r30, tas2r46, tas2r50, vn1r2), 1 encodes a kinase (cdkl4), 33 encode ogpcrs (such as or2d3, or3a2, or52e5, or8g5, or10j1, and ) and obp2b met the candidate threshold through our analysis of human testes datasets but did not meet the candidate threshold in any of the germ cell or sertoli cell datasets, indicating potential expression in peritubular myoid cells, leydig cells, or other cells outside of the seminiferous epithelium. fam36a has not been previously identified, and neither of its mouse orthologs (1700011m02rik, gm9112) has been knocked out. obp2b was previously identified by djureinovic et al. [9] and johnston et al. [6]; however, of the equivalent mouse orthologs (lcn4, obp2a, obp2b), only obp2a has been knocked out, revealing abnormal coat/hair pigmentation [31].
ism2 and magec2 were identified through both human sertoli cell datasets, while also identified through testis and/or germ cell datasets. both genes have been previously identified (ism2 [8], magec2 [9]). ism2 knockout mice display non-reproductive phenotypes [9]. consistent with this finding, our mouse data do not identify ism2 as reproductive tract-specific in mice. magec2 lacks a mouse ortholog for functional analysis in mice.

fig. 6 caption: three hundred and two novel mouse genes with human orthologs without a reported mouse model. the listed genes were identified in one or more mouse datasets as indicated in the venn diagram. underlined genes were also identified through our studies as reproductive tract-specific in human (111 genes). genes written in blue encode either enzymes, kinases, gpcrs, ogpcrs, transporters, transcription factors, or proteins involved in epigenetic regulation (60 genes). genes written in dark red were identified in both testis (testis and/or testis cell) and epididymis (14 genes).

human sertoli cell-specific krtap2-3, krtap4-12, lhx9, and psg5 were identified through one or both human sertoli cell datasets but were not identified through any of the testis or germ cell datasets, indicating sertoli cell-specific expression in the testes (additional file 5: table s5). none of these genes have been previously identified as reproductive tract-specific in humans, although lhx9 and psg5 have mouse orthologs that have been knocked out [32-37]. human krtap2-3 has mouse orthologs krtap5-2, gm4559, gm40460, and gm45618, and human krtap4-12 has mouse orthologs krtap4-7 and gm11555; none of these mouse orthologs have been knocked out (fig. 3 and additional file 5: table s5). psg5 knockout mice display non-reproductive phenotypes [32-36]; however, lhx9 knockout mice display absent testes and sterility due to an essential requirement for lhx9 during mouse gonad formation [37].
a lhx9-gfpcreer knock-in mouse line, generated by knocking in gfpcreer at the endogenous lhx9 locus and crossed with the rosa26-tdtomato reporter mouse line, revealed cre recombinase activity in retinal amacrine cells, developing limbs, testis, hippocampal neurons, thalamic neurons, and cerebellar neurons [38]. thus, lhx9 is not reproductive tract-specific in mice. our mouse data confirm this finding. to the best of our knowledge, no prior studies have utilized rna-seq for analysis of human epididymis-specific transcripts. through our studies, we identified 39 genes as human epididymis-specific through one or more of the human epididymis segment datasets that were not identified through any of the other human male reproductive tissue or cell datasets, indicating true epididymis specificity (additional file 5: table s5). of these 39 genes identified as human epididymis-specific, 29 genes have not been previously identified, 24 of which have equivalent mouse orthologs with 16 of these genes having not been knocked out in mouse. of these 16 novel human epididymis-specific genes with no mouse model, 1 encodes an enzyme-related gene (spint3) and the remaining 15 encode proteins of unknown drug target type (such as actbl2, bsph1, mslnl, spag11a, spag11b, wfdc10a, and wfdc9) (fig. 3). seven genes were identified through our de novo sequenced human epididymis segment datasets that were not identified through our reanalysis of the human epididymis segment datasets by browne et al.; two of these genes are considered novel without mouse models: defb104a and defb104b (fig. 3, additional file 5: table s5, and additional file 9: table s7). meanwhile, five genes were identified through our reanalysis of the human epididymis segment datasets by browne et al. that were not identified through our de novo-sequenced human epididymis segment datasets; two of these genes are considered novel without mouse models: actbl2 and mslnl (fig.
3, additional file 5: table s5, and additional file 9: table s7). fifty-two genes met the criteria for identification as epididymis-specific through one or more of the human epididymis segment datasets, while also being identified as reproductive tract-specific through one or more of the testes, germ cell, and/or sertoli cell datasets (additional file 5: table s5). thus, these targets are not epididymis-specific per se, but may be desirable potential male contraceptive targets considering their broader target availability. of these 52 genes identified as human male reproductive tract-specific and epididymis-expressed, 20 genes have not been previously identified, 15 of which have one or more equivalent mouse orthologs with 11 of these genes having not been knocked out in mouse. all 11 of these novel genes with no mouse model encode proteins of unknown drug target type (al163195.3, ccdc168, ccnb3, defb121, defb134, eppin-wfdc6, krtap2-3, magea11, pnma6e, spem2, tex44) (fig. 3 and additional file 5: table s5). since model organisms other than mice may be of interest for the future functional study of human genes, especially those for which no known mouse ortholog exists, we list novel reproductive tract-specific human genes without a mouse ortholog in additional file 16: fig. s8, which may be of interest for generating null rat or marmoset models [39]. digital pcr (heatmap) demonstrating expression of a subset of these novel human reproductive tract-specific genes without mouse orthologs is shown in additional file 17: fig. s9. through our bioinformatics analysis of previously published and newly acquired mouse rna-seq datasets, we identified a total of 1062 genes as reproductive tract-specific in mice. of these genes, 303 genes do not have a human gene ortholog, while 759 genes do (fig. 2).
of those with a human gene ortholog, 632 have a single ortholog (451 with the same gene symbol in human; 181 with a different symbol), while 127 mouse genes have two or more orthologous human genes (fig. 2). ninety-two mouse genes have 2-3 orthologous human symbols, 16 genes have 4-10 orthologous human symbols, and 19 genes (such as 1700080o16rik, ankrd36, fam90a1b, gm15319, magea10, pramel1, spdye4a, spdye4b, and zfy1) have more than 10 orthologous human symbols (12-26 symbols) (additional file 6: table s6). of the 1062 mouse genes that we identified as male reproductive tract-specific in mouse, 743 have not been identified in a previous transcriptomics-based study [2, 4-9]. the sum of our mouse data confirms the findings of 150 out of 271 mouse genes from schultz et al. [4], 7 out of 54 mouse genes from johnston et al. [2], and 6 out of 23 mouse genes from johnston et al. [6] (additional file 18: table s10). of the 759 mouse genes that have a human ortholog equivalent, 482 have not been previously identified as male reproductive tract-specific, and of these, 302 genes currently lack mouse phenotype information based on data obtained from ensembl biomart, mgi, impc, and ncbi (fig. 6). digital pcr (heatmap) demonstrating expression of a subset of the novel mouse reproductive tract-specific genes with human orthologs and no reported mouse model, and with human reproductive tract enrichment, is shown in fig. 7.
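the ortholog categories used above (none, single, 2-3, 4-10, more than 10) amount to binning each gene by its number of orthologous symbols. a minimal sketch of that binning follows, assuming a per-gene ortholog count is already available; the function name `ortholog_bin` and the placeholder symbols and counts are hypothetical, not values from the study.

```python
from collections import Counter


def ortholog_bin(n_orthologs):
    """Assign a gene to the ortholog-count category used in the text."""
    if n_orthologs < 0:
        raise ValueError("ortholog count cannot be negative")
    if n_orthologs == 0:
        return "none"
    if n_orthologs == 1:
        return "single"
    if n_orthologs <= 3:
        return "2-3"
    if n_orthologs <= 10:
        return "4-10"
    return ">10"


# Placeholder genes and counts (assumption): for illustration only.
toy_counts = {"GENE_A": 0, "GENE_B": 1, "GENE_C": 3, "GENE_D": 7, "GENE_E": 12}
bins = Counter(ortholog_bin(n) for n in toy_counts.values())
print(dict(bins))
```

with real per-gene counts, the bin totals can be compared directly against the reported breakdown (for mouse: 92 + 16 + 19 = 127 genes with two or more human orthologs).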
seventeen novel genes without mouse models (8030474k03rik, fthl17b, fthl17c, fthl17d, fthl17e, fthl17f, gm15262, gm18336, magea3, magea4, magea5, magea6, magea8, mageb2, xlr5a, xlr5b, and xlr5c) were identified through the mouse id4+ spermatogonia datasets that were also identified as spermatogonia-specific through the human datasets (bend2, bx276092.9, fam9a, fthl17, magea3, magea4, magea6, mageb6, and pnma6e) and were not identified through either mouse or human testis datasets, indicating restricted expression in spermatogonia, spermatogonial stem cells, or both (additional file 5: table s5 and additional file 6: table s6). eight genes (1700080o16rik, ccnb3, gm21964, gm4779, pet2, prr23a1, prr23a2, and prr23a3) were identified through the mouse id4+ spermatogonia datasets, and their human orthologs (ccnb3, dcaf8l1, magea10, magea11, magea12, magea9, magea9b, prr23a, prr23b, prr23c, and tle7) were also identified through the human spermatogonia datasets. since these genes were also identified through mouse, human, or both species' respective testis datasets, this indicates either strong expression in the spermatogonia compartment or expression outside of and in addition to the spermatogonia compartment. twenty-two genes were identified as mouse sertoli cell-specific as they were not otherwise identified as reproductive tract-specific through our analysis of mouse testes or germ cell datasets (encode project consortium testes and helsel et al.'s id4+ germ cell datasets) (additional file 6: table s6).
of these 22 genes, 18 have human orthologs; of these 18 genes, 17 are novel and not previously identified as male reproductive tract-specific; of these 17 genes, 7 have not been previously knocked out in the mouse (c1ql2, gm45015, mageb18, mycs, shc4, sowahd, and tmsb15b2); and of these 7 genes, 2 genes (gm45015 and mageb18) were identified with human orthologs (pnma6e and mageb18) that were also identified through our human analysis as human reproductive tract-specific (additional file 5: table s5 and additional file 6: table s6). unlike the limited number of reproductive tract-specific genes we identified through human sertoli cell-specific datasets, a considerable number of genes (202 mouse genes) were identified as reproductive tract-specific through analysis of zimmermann et al.'s mouse postnatal day 35 sertoli cell datasets that were also identified through either the mouse testis datasets, mouse id4+ germ cell datasets, or both (additional file 6: table s6). of these 202 genes, 160 have one or more human orthologs; of these 160 genes, 54 are novel and not previously identified as male reproductive tract-specific; of these 54 genes, 36 have not been previously knocked out in the mouse; and of these 36 genes, 16 (1700011l22rik, 1700011m02rik, 1700018b08rik, 1700042g07rik, ankrd36, ankrd60, etd, gm39566, gm6657, gm9112, magea10, spata31d1b, tcp10c, tex44, tex48, and wfdc9) were identified with human orthologs that were also identified through our analyses as human reproductive tract-specific (fig. 6, additional file 5: table s5, and additional file 6: table s6). to the best of our knowledge, published rna-seq data of mouse whole epididymis or epididymis segments do not exist, for the identification of epididymis-specific transcripts or otherwise. therefore, we isolated caput, corpus, and cauda segments from adult (postnatal day 60) b6/129 mice and subjected the rna to sequencing.
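the nested "of these ..., of these ..." counts above describe a sequential filter in which each criterion is applied only to the genes that passed the previous one. a minimal sketch of such a funnel, assuming each gene carries four boolean annotations; the `Gene` record, its field names, and the five toy entries are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    has_human_ortholog: bool
    previously_reported: bool
    knocked_out: bool
    human_tract_specific: bool


def funnel(genes):
    """Apply the four criteria in order and record how many genes remain."""
    steps = [
        ("has human ortholog", lambda g: g.has_human_ortholog),
        ("novel", lambda g: not g.previously_reported),
        ("no knockout model", lambda g: not g.knocked_out),
        ("human ortholog tract-specific", lambda g: g.human_tract_specific),
    ]
    remaining = list(genes)
    counts = [("start", len(remaining))]
    for label, keep in steps:
        remaining = [g for g in remaining if keep(g)]
        counts.append((label, len(remaining)))
    return counts, remaining


# Toy records (assumption): one gene is dropped at each step.
toy = [
    Gene("TOY_A", True, False, False, True),
    Gene("TOY_B", True, False, False, False),
    Gene("TOY_C", True, False, True, False),
    Gene("TOY_D", True, True, False, False),
    Gene("TOY_E", False, False, False, False),
]
counts, survivors = funnel(toy)
print(counts)
```

the same shape of report (for example 202, 160, 54, 36, 16 for the mouse sertoli cell datasets) falls out of the `counts` list when the records are real annotations.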
sixty-six genes were identified as mouse epididymis-specific as they were not identified as mouse male reproductive tract-specific through our reanalysis of the entable s6 ). of these 66 genes, 48 have human orthologs; of these 48 genes, 34 are novel and not previously identified as male reproductive tract-specific; of these 34 genes, 17 have not been previously knocked out in the mouse (ascl4, bsph1, c1s2, clec18a, cyp2j13, cyp4a30b, defb42, gm45826, lce6a, lcn6, muc15, odam, spag11a, spag11b, spink13, svs1, tchhl1); and of these 17 genes, 5 genes (bsph1, defb42, gm45826, spag11a, and spag11b) were identified with human orthologs (bsph1, defb136, spag11a, and spag11b) that are also human epididymis-specific (fig. 6, additional file 5: table s5, and additional file 6: table s6). sixty-five genes were identified as reproductive tract-specific in mouse with expression in both epididymis and testis and/or testis cell. of these 65 genes, 48 have human orthologs; of these 48 genes, 24 are novel and not previously identified as male reproductive tract-specific; of these 24 genes, 14 have not been previously knocked out in the mouse (1700009n14rik, 4922502d21rik, 4930563d23rik, d330045a20rik, defb28, dgkk, fam90a1b, gm6871, hrasls5, nxf3, shc4, spint3, trpc5os, wfdc9); and of these 14 genes, 3 genes (spint3, trpc5os, and wfdc9) were identified with human orthologs (spint3, trpc5os, and wfdc9) that are also human epididymis-specific (fig. 6, additional file 5: table s5, and additional file 6: table s6). through the aforementioned studies, we identified spint3, spint4, ces5a, pp2d1, and saxo1 as congruent in expression across both mouse and human datasets with expression restricted to either the epididymis (spint3, spint4, ces5a) or the testis (pp2d1 and saxo1) (figs. 3, 4, 5, and 6; additional file 5: table s5; additional file 6: table s6). spint5 was also identified as epididymis-specific in mouse (fig.
5, additional file 6: table s6); however, in humans, spint5p is a pseudogene that is not processed into protein. conventional rt-pcr of a panel of mouse and human tissue cdnas confirmed epididymis- or testis-restricted expression of spint3, spint4, ces5a, pp2d1, and saxo1 in both species and spint5 in mouse (fig. 5). to glean insight into the onset of expression for the epididymis-specific genes spint3, spint4, spint5, and ces5a, whole epididymides from postnatal days (p) 3, 6, 10, and 14 and epididymis segments (caput, corpus, and cauda) from p21, p28, p35, and p60 mice were collected and analyzed through rt-pcr (additional file 19: fig. s10). spint3 expression begins as early as p3 (low) and gradually increases through p10 and p14, reaching steady levels from p21 to p60 in all three segments of the epididymis (additional file 19: fig. s10). in contrast, spint4 and spint5 display near identical expression levels, with no expression at p3, p6, or p10 and expression apparent at p14 and later time points; expression is restricted to the corpus at p21 and p28, and to the caput and corpus, but not the cauda, at p35 and p60 (additional file 19: fig. s10). rnascope-based fluorescence in situ hybridization revealed a distinct segment-specific pattern of expression for spint4 that was identical to spint5, with both showing expression in most of the epithelial cells restricted to a brief region of distal caput/proximal corpus (additional file 20: fig. s11). spint3, on the other hand, displayed a pattern of expression in a majority of epithelial cells that begins slightly further downstream along the corpus and persists throughout the corpus and into the cauda (additional file 20: fig. s11).
these results indicate that spint3 has a role that is distinct from that of spint4 and spint5, and they suggest a potential redundancy between spint4 and spint5 that may explain how humans lost the evolutionary pressure to keep spint5p as a protein-coding gene. to glean insight into the potential spermatogenic cell population(s) expressing pp2d1 and saxo1, we performed rt-pcr of mouse testes isolated at postnatal day (p) 3, a time point enriched for gonocytes; p6 (onset of expression of type a spermatogonia); p10 (early spermatocytes); p14 (late spermatocytes); p21 (spermatids); and p35 and p60, which display complete spermatogenesis [40] (additional file 19: fig. s10). expression of pp2d1 and saxo1 is detected at similar levels at p28 and later, but not at p21 or before, indicating expression during spermiogenesis and spermiation (additional file 19: fig. s10). to determine the male reproductive requirement and potential functional role of the identified novel male reproductive tract-specific genes, spint3, ces5a, pp2d1, and saxo1 were individually ablated by a crispr/cas9-mediated zygote approach. since in humans spint5p is a pseudogene, and in mice, spint5 protein is most similar in sequence to mouse spint4, we simultaneously ablated both mouse spint4 and spint5 genes, which on mouse chromosome 2 are separated by only 12.9 kilobases. the efficiency of generating each mutant is summarized in additional file 1: table s11. each of the genes contained deletions of differing sizes and genomic targets. the genomic sequences flanking the deletion in each mutant are presented in additional file 1: table s12, and representative sanger sequencing results for each mutant are presented in fig. 8. using the forward and reverse primer pairs presented in fig. 8a-e and listed in additional file 1: table s13, offspring carrying the mutant alleles were identified through routine genotyping (fig. 8k-o).
spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mouse lines were examined in parallel with littermate controls of equivalent age to determine the effect of gene ablation on spermatogenesis, sperm maturation, and fertility in male mice. none of the knockout strains generated in this study displayed any overtly abnormal appearance, difference in body weight (fig. 9 ) or composition, or difference in behavior when compared to the controls. to determine the male reproductive requirement of each of the genes of interest, spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout and control adult male mice were housed continuously with two females for 3 months and the size and number of litters were recorded. although spint3, pp2d1, and saxo1 knockout males sired a number and size of litters during the test mating period that was not significantly different from controls (fig. 9a-c) , spint4/5 and ces5a knockout males sired significantly fewer litters and pups over the test mating period (fig. 9a-c) . spint4/5 null males displayed a statistically significant 95% reduction in the number of litters and pups sired per male and statistically significant 64% reduction in litter size, over the 3-month mating period (n = 9 controls, n = 9 kos) (fig. 9 ). seven out of 9 males displayed complete infertility, and the two remaining males, who sired pups, sired pups at a significantly reduced number of litters and pups per month with litters of reduced litter size (fig. 9a-c) . this fertility defect in spint4/5 double ko males was not associated with any significant changes in epididymis and testis histology (additional file 21: fig. s12 ) or sperm numbers, motility, and morphology ( fig. 9g-i) . ces5a null males displayed a variegated phenotype with an overall statistically significant 50% reduction in the number of litters and pups sired per male, but no significant difference in litter size, over a 3-month mating period (n = 9 controls, n = 9 kos). 
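the percentage reductions quoted above compare a knockout group mean with its control group mean. a minimal sketch of that arithmetic, assuming the two group means are already computed; the function name and the example values (10.0 and 4.0) are illustrative and are not measurements from the study, and the statistical testing behind the reported significance is not reproduced here.

```python
def percent_reduction(control_mean, knockout_mean):
    """Reduction of the knockout mean relative to the control mean, in percent."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (control_mean - knockout_mean) / control_mean


# Illustrative values only (assumption), e.g. pups per litter in each group.
example = percent_reduction(10.0, 4.0)
print(f"{example:.0f}% reduction")
```

a knockout mean equal to the control mean gives 0%, and a knockout mean of zero (complete infertility) gives 100%.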
the fertility defect in ces5a ko males was associated with significant changes in epididymis histology (additional file 22: fig. s13) and significant reductions in sperm motility and progressive motility (fig. 9h, i). ces5a null males displayed a 50% reduction in sperm motility and progressive cells, a 50% increase in static cells, and a 25% decrease in average path velocity and progressive velocity after hyperactivation. no changes in testis histology were found (additional file 22: fig. s13), and despite the sperm motility defect, scanning electron microscopy failed to identify a morphological defect in ces5a null sperm in comparison to controls (additional file 23: fig. s14). the epididymides and testes weights of spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mice were not significantly different from littermate control mice (fig. 9e, f). histological analyses of testes from spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mice revealed all had seminiferous tubules with intact epithelia and the presence of all germ cell subtypes and all stages of spermatogenesis (additional file 21: fig. s12 and additional file 24: fig. s15). histological analyses of caput, corpus, and cauda from spint3, spint4/5, pp2d1, and saxo1 ko mice revealed spermatozoa in tubule lumens of all knockouts with no significant differences in epididymal histology in comparison to controls (additional file 21: fig. s12 and additional file 24: fig. s15). however, ces5a knockout mice displayed significant histological abnormalities including lumen dilation (possibly from occlusion), inflammation, and the appearance of abnormal epithelia (additional file 22: fig. s13). computer-assisted sperm analysis of spint3, spint4/5, pp2d1, and saxo1 knockouts showed no statistically significant differences across all measured parameters including sperm concentration, sperm motility, and progressive motility (fig. 9g-i). ces5a knockouts displayed significant decreases in sperm number and sperm motility (fig. 9h, i); however, cauda epididymal sperm isolated from a variety of ces5a null animals looked morphologically indistinguishable from controls (additional file 23: fig. s14).

fig. 9 phenotype analysis of crispr/cas9-generated null mice for determining the contraceptive potential of the selected genes. spint4/5 and ces5a null mice show significant fertility defects; meanwhile, spint3, pp2d1, and saxo1 null mice appear normal. fertility (a-c), body and reproductive organ weights (d-f), and sperm parameters (g-i) were all measured between knockout (−/−) and littermate control [wild-type (+/+) and heterozygous (+/−)] mice as indicated. bars represent mean ± sem. *p < 0.05, **p < 0.01, ***p < 0.005. ns, not significant.

to date, the etiology of idiopathic male infertility is not fully understood, and hormonal male contraceptives have not been effective. therefore, identification of novel reproductive tract-specific genes, and elucidating the functional requirement or lack thereof of these genes, is essential towards understanding the etiology of male infertility and the development of male contraceptives. despite significant advances in our understanding of the human and rodent testis and epididymis transcriptome, mostly through microarray-based studies, no prior studies have utilized purified human testis cells for the identification of human testis-specific transcripts, no prior studies have utilized the more state-of-the-art rna-seq-based transcriptomics methodology for analysis of human epididymis-specific transcripts, and no prior studies have utilized rna-seq analysis of rodent reproductive tissues or cells to identify rodent reproductive tract-specific transcripts.
to address these gaps in knowledge, and to increase the number of identified reproductive tract-specific genes using the most relevant high-throughput transcriptomics methodology, we analyzed in parallel on a custom bioinformatics pipeline a large number of published and newly acquired human and mouse rna-seq datasets. through our studies, we identified and verified many novel male reproductive tract-specific transcripts in both species, and through the crispr/cas9 system, we interrogated the reproductive requirement of a subset of these genes. we found that spint4 (together with spint5) is required for normal male fertility in mice, and that ces5a, although not strictly required for male fertility, plays a major biological role in the epididymis. we report the remaining genes that we knocked out (spint3, pp2d1, and saxo1) as dispensable for male reproductive function, which is essential information to disseminate to the scientific community. our study also verified the male reproductive tract-specific expression of many previously identified genes (additional file 13: fig. s5, additional file 15: fig. s7), and genes for which previously published mouse models display male infertility phenotypes (additional file 14: fig. s6). this latter group of already functionally validated genes serves as potential male contraceptive targets worth underscoring to the research community. prior to massively parallel microarray-based and rna-seq-based transcriptomics analyses for the identification of reproductive tract-specific genes, the ncbi unigene database was a valuable resource for many in the male reproductive biology field for identifying testis-specific transcripts [4, 41-43].
although in our study we only considered prior microarray-based and rna-seq-based studies when considering the novelty of the genes that we identified, it is worth noting that several genes that we identified, not previously identified in microarray-based and rna-seq-based transcriptomics studies, were previously identified through studies that solely utilized the unigene database [41, 42]. nineteen human genes that we identified (c16orf82, ccdc27, cpa5, fam217a, fam46d, fam47c, fbxo24, fbxw10, fkbp6, galntl5, kcnu1, mageb3, mroh2b, nutm1, prdm9, rbmxl3, spata31e1, triml1, and trpc5os) were previously identified by liu et al. [42], and seven mouse genes that we identified (1700013d24rik, akap3, ankrd36, hrasls5, spesp1, tex22, and ubqlnl) were previously identified by choi et al. [41] and liu et al. [42]. thus, our results confirm the findings of these previous studies. since more than half of all human protein-coding genes are categorized as unknown in terms of drug target potential (additional file 11: table s8), and only 30% encode for classically druggable enzymes, gpcrs, ion channels, nuclear receptors, and transporters (additional file 10: fig. s4), the potential to find new undiscovered drug targets that can be drugged using classical approaches is somewhat limited. indeed, in our study, we found one hundred and nine genes to be novel in terms of previously published high-throughput transcriptomics studies, without a current reported mouse model, and reproductive tract-specific in both humans and mice (figs. 2, 3, and 6). many of these genes (93 genes; 85%) fall into the category of unknown and may otherwise be considered "undruggable" due to various challenges with existing targeting approaches [10].
however, the contraceptive potential of these genes should not be overlooked, but rather investigated for potential identification of a high-affinity small molecule that can either interfere with protein-protein interaction (ppi) or target the protein specifically for degradation using a new technology called proteolysis targeting chimeras (protacs). protein-protein interaction targets are not deemed undruggable, based on the discovery of small molecules capable of deeper and higher affinity binding within the contact surfaces of the target protein [44]. additionally, once a high-affinity small molecule against a specific target protein is identified, an engineered protac molecule can mark a target protein for proteasomal degradation by linking the target protein to the polypeptide co-factor, ubiquitin [45-47]. there are currently various combinations of protacs developed to overcome the limitations of cell permeability, stability, solubility, selectivity, and tissue distribution [48-51]. therefore, disrupting ppis or utilizing protacs provides the potential to greatly promote the development of contraceptive drugs against the "undruggable" nonenzymatic target protein space. if gene knockouts for closely related and ubiquitously expressed paralogs display no abnormal phenotype, then unintended drug targeting of these proteins may result in no side effects in humans. however, the burden of safety for a male contraceptive is extremely high, and targeting non-reproductive tract-expressed proteins in humans should be avoided wherever possible since the functional requirements for these proteins may not be fully understood. further, although mice are one of the best models for human disease, they are unable to communicate when they are unwell, a phenomenon that may occur independent of any measurable phenotypic traits.
thus, potential reported side effects in humans during clinical trials may have been present, but missed, during animal studies, or in fact be present in humans and not in mice because of the vast biological differences across these two species. therefore, with drug safety in mind, reproductive tract-specific candidates should be prioritized based on somatic cell-expressed protein sequence similarity, especially in the drug binding pocket. according to ensembl, several novel reproductive tract-specific genes without mouse models that we identified and verified (ac022167.5, al672043.1, bhmg1, c1orf105, c2orf92, c4orf51, ccdc196, spint3, tex44, tex48, tex51, and trpc5os; figs. 3, 4, and 5; additional file 5: table s5) have no known associated paralogs, indicating reproductive tract-specific drug targeting is highly likely. additionally, spag11a and spag11b are epididymis-specific paralogs (figs. 3, 4, and 5; additional file 5: table s5) with no other known paralogs according to ensembl. efcab5 (figs. 3 and 5, additional file 5: table s5) has a ubiquitously expressed paralog, nsrp1, with only 7% amino acid sequence similarity, indicating specific drug targeting potential for this candidate. likewise, erich6 and mroh2b (figs. 3 and 5, additional file 5: table s5) have ubiquitously expressed paralogs erich6b and mroh2a, respectively, with 20% and 30% amino acid sequence similarity, indicating reasonable potential for specific drug targeting. prr23a, prr23b, and prr23c are testis-specific paralogs (figs. 3 and 5, additional file 5: table s5) with prr23d2 as the next closest paralog according to ensembl. since prr23d2 has less than 26% amino acid sequence similarity to prr23a, prr23b, and prr23c, and appears to be epididymis-specific according to the human protein atlas [52], all four proteins of unknown function make suitable drug candidates. likewise, spata31d1, spata31d3, and spata31d4 are testis-specific paralogs (figs.
3 and 5, additional file 5: table s5) with spata31a5 as the next closest paralog according to ensembl. since spata31a5 has less than 26% sequence similarity to spata31d1, spata31d3, and spata31d4, but also appears to be reproductive tract-specific according to the human protein atlas (hpa) [52], all four proteins with unknown function also appear to be worthy of consideration for potential drug targeting. tpte and tpte2 are testis-specific paralogs (figs. 3 and 5, additional file 5: table s5) with ubiquitously expressed pten as the next closest paralog according to ensembl. since pten has less than 25% sequence similarity to tpte and tpte2, off-target effects appear to be unlikely. likewise, wfdc10a, wfdc10b, and wfdc13 are all epididymis-specific paralogs with the closest non-reproductive tract-expressed paralog, wfdc5, having less than 26% sequence similarity to any of the reproductive tract-specific paralogs, also indicating high drug specificity potential. several additional novel reproductive tract-specific enzyme and gpcr genes without mouse models that we identified (gpr156, prss38, prss48, sult6b1, tmprss7, triml2, and ttll8; figs. 3, 4, and 5; additional file 5: table s5) have non-reproductive tract-expressed paralogs with less than 35% sequence similarity, indicating good drug specificity potential. a novel testis-specific transporter gene without a reported mouse model that we identified, slc25a52, would make a poor drug candidate since its closest paralog, slc25a51, is 93% similar in amino acid sequence and ubiquitously expressed. cpa5, iqca1l, and ppp4r3c have ubiquitously expressed paralogs, cpa1, iqca1, and ppp4r3b, respectively, with 50-60% protein sequence similarity, indicating that careful consideration must be given to potential drug targeting without off-target effects. of the seventy-three genes that our study identifies as reproductive tract-specific in humans and for which a published mouse model shows a male infertility phenotype (additional file 14: fig.
s6) [28, 29, 31], it is worth noting that 21 genes (cnbd2, defb110, fam170a, fbxo47, meig1, meiob, meioc, odf1, odf4, rec114, rnf17, spaca1, spata22, spem1, spo11, sycp1, terb1, tex19, tex38, tnp2, and topaz) do not have any associated paralogs and, thereby, may be considered most suitable for further drug development. however, it is also worth noting that genes required for male fertility in mice may not necessarily be required for male fertility in humans. of the seventy-three human reproductive tract-specific genes our study identified with male mouse infertility phenotypes, twenty-seven genes (actl7b [53], akap4 [54], boll [55], brdt [55-60], catsper4 [54], ccdc155 [61], fkbp6 [55, 61-63], meig1 [64], meiob [55-57, 65], nanos2 [55, 61], odf1 [55], prdm9 [61, 66, 67], prss37 [55], rad21l1 [68], rbmxl2 [55, 62], rnf17 [69], sohlh2 [61, 70], spaca1 [55], spata16 [55, 56], spem1 [55], spo11 [55, 58, 61], sun5 [55-57], sycp1 [55], tex38 [55], tnp2 [55], tssk1b [62], and zpbp [55]) are currently associated with mutations underlying human male infertility, suggesting that a similar functional requirement for these genes may exist in humans. for the remaining 45 genes, however, either these genes are not required for human male fertility as they are required in mice, or associated mutations in male infertile patients have not yet been reported. although many reproductive tract-specific genes have been studied through functional genetics approaches, the functions of many remain to be determined. elucidating the function of these novel genes is necessary to build a better understanding of the factors underlying spermatogenesis and sperm maturation, which has implications in understanding the etiology of male infertility and the development of male contraceptives.
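the paralog-based prioritization discussed earlier in this section (candidates with no known paralog first, then candidates ordered by amino acid similarity to their closest ubiquitously expressed paralog) can be sketched in python as follows. this is a minimal illustration, not the procedure used in the study: the `Candidate` type and `rank_by_off_target_risk` function are hypothetical names, the similarity values are the percentages quoted in the text, and simple ascending ordering is an assumption made here for illustration (the text does not define numeric cutoffs).

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    gene: str
    closest_somatic_paralog: Optional[str]  # None if no known paralog
    similarity_pct: Optional[float]         # amino acid similarity, None if no paralog


def rank_by_off_target_risk(candidates):
    """Order candidates from lowest to highest similarity to a somatic paralog.

    Genes without any known paralog sort first (treated as similarity -1),
    ties are broken alphabetically by gene symbol.
    """
    def key(c):
        similarity = -1.0 if c.similarity_pct is None else c.similarity_pct
        return (similarity, c.gene)

    return sorted(candidates, key=key)


# percentages taken from the discussion above
candidates = [
    Candidate("slc25a52", "slc25a51", 93.0),
    Candidate("mroh2b", "mroh2a", 30.0),
    Candidate("spint3", None, None),
    Candidate("efcab5", "nsrp1", 7.0),
    Candidate("erich6", "erich6b", 20.0),
]

ranked = [c.gene for c in rank_by_off_target_risk(candidates)]
print(ranked)
```

in this ordering, slc25a52 falls last, which matches its description in the text as a poor drug candidate.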
in this study, four epididymis-specific genes (spint3, spint4, spint5, and ces5a) and two testis-specific genes (pp2d1 and saxo1) were deleted in mice to determine their functional requirement in male fertility and potential utility as male contraceptive target. we chose to study these genes because all but saxo1 encode enzymes or enzyme-related protein products and are thus considered druggable in the classical sense. saxo1, a cilia-related gene, was chosen because prior literature demonstrated expression in sperm [71] . although not druggable in the classical sense, if targeted through non-canonical approaches, one could obtain a fast-acting drug with greater reversibility potential and a decreased likelihood of affecting testicular function and size. the epididymis-specific genes we chose to target for functional analysis, by the very nature of their tissue's expression, also fit this potential drug profile of modulating only the latest stages of sperm development. analyses of testis and epididymis organ weights and histology, sperm parameters and morphology, and fertility revealed no significant differences in spint3, pp2d1, and saxo1 knockout mice in comparison to littermate controls demonstrating that, individually, spint3, pp2d1, and saxo1 are not required for male mouse fertility and are not suitable targets for the development of a male contraceptive. however, we found partial effects on male fertility in ces5a knockout mice and profound effects on male fertility in spint4/5 double knockout mice. ces5a is a member of a multigene family of mammalian carboxylesterases that can hydrolyze ester, thioester, amide, and carbamate linkages in a wide variety of endogenous and exogenous molecular substrates, including triglycerides, thus playing key roles in both metabolism and detoxification [72] [73] [74] [75] . ces5a shares roughly similar percent homology (~40% homology) to all four of its related paralogs, ces1, ces2, ces3, and ces4a. 
human carboxylesterase 1 (ces1) is predominantly expressed in the liver and has been shown to have triglyceride hydrolase activity as overexpression of human ces1 in cells leads to an increase in cholesteryl ester hydrolysis and free cholesterol efflux [76] . further, mouse ces1g-a protein expressed by one of a cluster of eight syntenic genes (ces1a through ces1h) orthologous to the human ces1 gene-has been shown to have triglyceride hydrolase activity as ces1g null mice display hyperlipidemia and abnormal lipid homeostasis including increased liver and circulating cholesterol and triglycerides, and altered saturated and unsaturated fatty acid levels [77, 78] . therefore, it is likely that ces5a exhibits similar carboxylesterase activity in the epididymis hydrolyzing cholesteryl ester and affecting free cholesterol efflux. indeed, recombinant ces5a protein has been previously shown to have carboxylesterase activity hydrolyzing cholesterol ester and choline ester [79] . since sperm cholesterol content is significantly decreased during epididymal maturation [80, 81] and a proper cholesterol/phospholipid (c/pl) ratio of the sperm plasma membrane is required for sperm capacitation [82, 83] , ces5a may be pivotal in regulating sperm membrane cholesterol and lipid levels to ensure the normal function of male gametes in the last steps of the fertilization process. the most closely related paralog to spint4 is eppin, which, as reviewed in o'rand et al., has at least three physiological functions [16] . eppin inhibits sperm motility when it binds the semen coagulation protein semenogelin 1 (semg1) on the sperm surface [84] ; it modulates the proteolytic activity of prostate-specific antigen (psa), a serine protease, against its seminal plasma substrate, semg1 [85] ; and it exhibits strong antibacterial activity [86] . 
these functions are postulated to prevent premature hyperactivation and capacitation of sperm in the female reproductive tract [16], and to protect spermatozoa from proteolytic and bacterial attack during transit in the female reproductive tract [11, 16]. thus, it is possible that the physiological function of spint4 is similar. however, unlike spint1 and spint2, which have been shown to act as protease inhibitors against a wide variety of prss and tmprss proteases [87-90], when tested against a panel of eight proteases (including psa, trypsin, chymotrypsin, plasmin, urokinase, thrombin, factor xa, and elastase), spint3 and spint4 were shown to lack protease inhibiting capability [91]. this indicates that either the protease inhibiting properties of spint3 and spint4 were lost in favor of yet unknown functions or their protease inhibitory activity has a narrower spectrum directed against unknown targets. spint4/5 null male mice are severely subfertile, without an apparent difference in epididymis histology, sperm number, sperm morphology, and sperm motility parameters in comparison to the wild-type (wt) mice. this phenocopies the reproductive phenotype of several null mice of testis-, epididymis-, or prostate-specific genes (sof1, tmem95, and spaca6; pate8, and pate10), which reveal a requirement in regulating sperm migration through the oviduct and sperm-oocyte fusion in mice [92, 93]. a severe fertility defect associated with normal sperm number, morphology, and motility is also shared among mice lacking the sperm membrane protein adam3, thought to be crucial in sperm-zp binding and sperm migration through the uterotubular junction (utj) [94, 95]. more than 10 proteins, including 2 proteases (ace, adam1a, adam2, calr3, clgn, cmtm2a/b, pdilt, pmis2, prss37, rnase10, tex101, and tpst2), have been described that affect the processing and/or localization of adam3 protein in spermatozoa [96, 97].
further studies with spint4/5 null mice are required to determine whether sperm behavior in the female reproductive tract, specifically the sperm migration through the utj, is adversely affected. since a large 16,797-bp genomic region-including the intergenic region between the spint4 and spint5 genes-was deleted ( fig. 8 ; additional file 1: table s12 ), based on the evidence presented in this manuscript, we cannot exclude the possibility that cis-acting elements and/or trans-acting factors affecting the expression of other genes may have contributed to the phenotype of these mice. lack of protein-coding ability of human spint5p does not necessarily indicate that this pseudogene is functionally obsolete. pseudogenes have been shown to play roles in gene expression and gene regulation [98] . for example, pseudogene transcripts can act as competitive endogenous rnas (cerna) through competitive binding of mirna, which results in regulation of gene expression [99] . to this end, studying the functional requirement of spint5 in male mice is necessary to further our knowledge of evolutionarily conserved genes between species. since humans are genetically diverse, a limitation to phenotype characterization of genetically manipulated mice is the reliance of a single mouse background to the examination of complex genetic outcomes, such as fertility, that is under the control of many genes with different levels of contribution to the phenotype [100] . it is possible that a gene that causes complete infertility in an inbred mouse background may only cause partial infertility or subfertility in a different inbred line or more robust outbred background. 
since the mice used in our study were a cross between c57bl/6 (b6) mice and dba/2 (d2) mice, and these b6d2f1 mice are thus heterozygous for b6 and d2 alleles at all loci in their genome, we can eliminate infertility susceptibility of either the b6 or the d2 background as the cause for fertility defects in ces5a and spint4/5 mice. it does remain to be determined, however, whether the phenotype of the genes we knocked out would be more or less severe on a different mouse background, and if required for male fertility in humans, the level of contribution to male fertility of these genes across genetically diverse men. a limitation of this study is the reliance on mrna abundance positively correlating with protein abundance. future studies are necessary to elucidate the relationship between mrna and protein expression levels of the candidate genes identified in our study. furthermore, despite the batch corrections that were made, technical differences in sample preparation and integrity across the various published rna-seq datasets can influence our findings. one of the major advantages of our study design is the use of rna-seq datasets from purified human and mouse germ cells and sertoli cells to identify reproductive tract-specific targets, since the use of whole testes for the identification of cell type-specific transcripts in past studies is subject to dilution effects. however, this advantage could also be considered a disadvantage since purified cells from non-reproductive tissues were not used for comparison; if analyzed, such cells may have revealed significant levels of expression in non-reproductive tissues. ultimately, functional studies in animals and humans will help to confirm whether genes identified in our study are essential for male fertility and not any other physiological process.
through the integration of hundreds of published and newly acquired human and mouse reproductive and non-reproductive tissue and cell rna-seq datasets, we have generated a list of novel genes expressed predominantly or exclusively in the male reproductive tract that are worthy of consideration for functional validation in an animal model and potential targeting for a male contraceptive. our results further validate a functional requirement for spint4/5 and ces5a in male mouse fertility, while demonstrating that spint3, pp2d1, and saxo1 are each individually dispensable for male mouse fertility. identifying novel reproductive tract-specific genes congruent across species adds insight into organismal biology and valuable information that can be used to identify potential male contraceptive drug target candidates. furthermore, elucidating the individual functional requirement or lack thereof of these novel genes builds a better understanding of the factors underlying spermatogenesis and sperm maturation, which has implications in understanding the etiology of male infertility and further validation of the utility of a potential male contraceptive target. the de novo isolated human testes and epididymides included in this study were obtained from three donors through a local organ transplant program in quebec, canada, called transplant quebec. all procedures were approved by the local ethics committee, and written consent was obtained from each respective donor's family. the donors were of 40, 52, and 65 years of age with no preexisting medical condition that could affect reproductive function. donor testes and epididymides were removed under artificial circulation to preserve other organs that were assigned for transplantation. each testis and epididymis were dissected in the laboratory of robert sullivan at université laval. each epididymis was dissected into three segments corresponding to the caput, corpus, and cauda regions and minced into small tissue pieces. 
testis and epididymis tissue fragments were immediately snap frozen in liquid nitrogen, stored at −80°c, and shipped frozen to baylor college of medicine for further processing. eight non-reproductive tissue types (kidney, liver, lung, skin, spleen, and stomach) and 2 female reproductive tissues (ovary and uterus) were obtained from the baylor college of medicine tissue acquisition and pathology core. thirteen non-reproductive tissue types (adipose, adrenal gland, brain, colon, heart, leukocytes, pancreas, prostate, salivary gland, skeletal muscle, small intestine, smooth muscle, thyroid) were obtained as purified rnas from takara bio (kusatsu, japan). human testes and epididymis segments were used for de novo rna-seq analysis; all human tissues and/or resulting rnas were used for rt-pcr verification. mouse tissues [testis, caput, corpus, cauda, ovary, uterus, and 17 non-reproductive tissue types (adipose, bladder, brain, colon, eye, heart, kidney, liver, lung, prostate, skeletal muscle, skin, small intestine, spleen, stomach)] were obtained from dissection of b6/129 mice; the remaining 2 non-reproductive tissues (smooth muscle and thyroid) were obtained as purified rnas from takara bio. mouse epididymis segments were used for de novo rna-seq analysis; all mouse tissues and/or resulting rnas were used for rt-pcr verification. rna for rna-seq and/or rt-pcr verification was isolated from human and mouse tissues using the trizol/chloroform extraction method followed by the rneasy mini kit (qiagen) with on-column dnase (qiagen) treatment according to the manufacturer's protocol. rnas used for rna-seq were assessed by bioanalyzer for rna integrity. for rt-pcr, rna was reverse-transcribed to cdna using superscript iii reverse transcriptase from thermo fisher according to the manufacturer's protocol. cdna was then pcr amplified using gene-specific primers designed using the ncbi primer design tool. primer sequences are listed in additional file 1: table s14.
library generation for rna-seq

rna-seq libraries were made using kapa stranded mrna-seq kit (kk8420). briefly, poly-a rna was purified from total rna using oligo-dt beads; subsequently, it was fragmented to small size; and first strand cdna was synthesized. second strand cdna was synthesized and marked with dutp. resultant cdna was used for end repair, a-tailing, and adaptor ligation. finally, the library was amplified for sequencing on an illumina novaseq 6000 platform. the strand marked with dutp was not amplified, allowing strand-specific sequencing.

sequence alignment, quantification, and differential gene expression

human testes, human epididymis segments, and mouse epididymis segments were sequenced by the department of molecular and human genetics functional genomics core at baylor college of medicine (additional file 1: table s1 and additional file 1: table s2). previously published reproductive and non-reproductive tissue and cell sequences were downloaded from the sequence read archive (sra) [101] (additional file 1: table s1 and additional file 1: table s2). all sequences were trimmed using trim galore! and aligned against the human genome (grch38) or mouse genome (grcm38) using hisat2 [102, 103]. gene expression in each tissue was quantified using featurecounts, filtered for only protein-coding genes, and batch corrected by removing unwanted variation using the ruvr method from ruvseq [104, 105]. differential gene expression was determined for each reproductive tissue against each non-reproductive tissue using the r package edger [106]. principal component analysis (pca) was performed in the r statistical environment using the log2 counts per million (cpm) for each gene in the corresponding tissue after using the ruvr method to correct for batch variation as described above. our procedure for identifying a reproductive-specific gene was repeated for each reproductive tissue or cell sample independently.
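as an illustration of the log2 counts per million (cpm) scale used for the pca above, the conversion from raw per-gene counts can be sketched in python as follows. this is a simplified sketch under stated assumptions, not the study's pipeline (which was run in r with edger and ruvseq): the function name `log2_cpm` is hypothetical, and the fixed pseudocount added to each cpm value is an assumption chosen for simplicity rather than edger's prior-count scheme.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Convert raw per-gene read counts for one sample to log2 CPM.

    counts: dict mapping gene symbol -> raw read count.
    pseudocount: added to each CPM value before the log so that
    zero counts stay finite (an assumption of this sketch).
    """
    library_size = sum(counts.values())
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return {
        gene: math.log2(count / library_size * 1e6 + pseudocount)
        for gene, count in counts.items()
    }


# made-up counts for a single sample with one million reads in total
sample = {"gene_a": 500, "gene_b": 0, "gene_c": 999_500}
values = log2_cpm(sample)
print(values)
```

a gene with zero reads maps to log2(1) = 0 under this pseudocount, which keeps unexpressed genes on the same scale as expressed ones.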
the following selection criteria were applied to each reproductive tissue or cell sample. first, the non-reproductive tissue with the maximum expression, expressed as the log2 fold change between the non-reproductive tissue and the reproductive tissue or cell of interest, was identified for each gene using the results from the differential gene expression analysis. second, we identified reproductive-specific gene drug candidates using three filters: a false discovery rate (fdr) filter, a maximum transcripts per million (tpm) expression value filter on the non-reproductive sample with the maximum expression identified as described above, and a minimum tpm expression value filter on the reproductive tissue or cell sample of interest. a gene was kept if the fdr from the differential gene expression analysis was less than or equal to 0.05 for the comparison of the reproductive tissue or cell sample of interest to the non-reproductive tissue with the maximum expression. a gene was considered to be a male reproductive tissue-specific drug target if the average tpm expression value in the non-reproductive tissue with the maximum expression from the differential analysis was less than or equal to 1.0 for human (2.0 for mice), and if the average tpm expression value for that gene was greater than or equal to 10.0 for human (8.0 for mice) in the reproductive tissue or cell sample of interest. the average and standard deviation of the tpm expression value for each gene were calculated from the ruvr batch corrected counts per million expression value for each tissue or cell sample. we consolidated data from ensembl biomart [29] and mouse genome informatics (mgi) [28] to create a comprehensive database of mouse gene symbols orthologous to human genes and vice versa. each respective species' stable ensembl gene id was used for each conversion, with gene symbol as the final output.
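the three-filter selection rule described above (fdr, maximum tpm in the highest-expressing non-reproductive tissue, and minimum tpm in the reproductive sample) can be sketched in python as follows. the thresholds are the ones stated in the text; everything else, including the `GeneResult` type, the function names, and the example genes and values, is hypothetical and shown only to make the logic concrete.

```python
from dataclasses import dataclass

FDR_CUTOFF = 0.05
# per species: (maximum average TPM allowed in the highest-expressing
# non-reproductive tissue, minimum average TPM required in the
# reproductive tissue or cell sample of interest)
TPM_THRESHOLDS = {"human": (1.0, 10.0), "mouse": (2.0, 8.0)}


@dataclass
class GeneResult:
    symbol: str
    fdr: float               # FDR vs. the highest-expressing non-reproductive tissue
    max_nonrepro_tpm: float  # average TPM in that non-reproductive tissue
    repro_tpm: float         # average TPM in the reproductive sample of interest


def is_reproductive_specific(gene, species):
    """Apply the FDR filter and both TPM filters to one gene."""
    max_allowed, min_required = TPM_THRESHOLDS[species]
    return (
        gene.fdr <= FDR_CUTOFF
        and gene.max_nonrepro_tpm <= max_allowed
        and gene.repro_tpm >= min_required
    )


def select_candidates(genes, species):
    return [g.symbol for g in genes if is_reproductive_specific(g, species)]


# made-up values for illustration only
example = [
    GeneResult("gene_a", 0.001, 0.4, 55.0),  # passes every filter
    GeneResult("gene_b", 0.20, 0.4, 55.0),   # fails the FDR filter
    GeneResult("gene_c", 0.001, 1.5, 55.0),  # too high outside the tract for human
    GeneResult("gene_d", 0.001, 0.4, 9.0),   # too low in the tract for human
]

print(select_candidates(example, "human"))
print(select_candidates(example, "mouse"))
```

because the mouse thresholds are more permissive on both tpm filters, gene_c and gene_d pass under the mouse settings but not the human ones.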
as mentioned, several notable high-throughput gene expression studies using microarrays or rna-seq, focused on identifying male reproductive tract-specific genes, have been previously published [2, [4] [5] [6] [7] [8] [9] . tables and supplementary tables from these studies were gathered to collect the lists of genes previously identified. for microarray-based studies, affymetrix ids were used to confirm the identity of a listed gene, based on current sequence mappings, or in many cases to identify de novo the identity of a gene only known at the time of the study by its affymetrix id and not gene symbol. for mouse and rat studies, gene symbols were converted to orthologous human symbols, to systematically catalog both the rodent and corresponding human symbols as previously identified. for example, 399 affymetrix probe ids were listed as reproductive tract-specific in johnston et al. [8] . after re-identification of gene symbols based on current ensembl sequence mappings, 42 identified gene symbols remained the same, 160 received an updated/replacement gene symbol identification, 103 genes that were previously unidentified received a new gene symbol identification, 26 affymetrix ids lost mapping to any gene symbol, and 67 remain unidentified. out of the total of 305 affymetrix ids that mapped to 301 current rat gene symbols, 257 rat gene symbols converted to at least one human ortholog gene symbol that was either the same symbol or different. both rat and human symbols, based on new mappings, were considered previously identified. for the complete list of previously identified genes, see additional file 18: table s10 . genes were classified as encoding either enzymes, epigenetic-related proteins, g protein-coupled receptors (gpcrs), ion channels, kinases, nuclear receptors, orphan gpcrs (ogpcrs), transcription factors, transporters, or unknown proteins based on data obtained from illuminating the druggable genome [27] (additional file 11: table s8 ). 
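the re-identification counts reported above for johnston et al. [8] can be checked with a few lines of python. the numbers are taken directly from the text; the variable names are illustrative only.

```python
# Affymetrix probe IDs that map to a current rat gene symbol,
# broken down by what happened to the symbol on re-identification
mapped_categories = {
    "symbol unchanged": 42,
    "symbol updated or replaced": 160,
    "previously unidentified, newly assigned": 103,
}
mapped_ids = sum(mapped_categories.values())

rat_symbols = 301          # distinct current rat gene symbols
with_human_ortholog = 257  # rat symbols with at least one human ortholog

ortholog_fraction = with_human_ortholog / rat_symbols

print(mapped_ids)
print(round(ortholog_fraction * 100, 1))
```

the three mapped categories add up to the 305 affymetrix ids cited in the text, and roughly 85% of the 301 rat gene symbols convert to at least one human ortholog.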
we used data obtained from ensembl biomart [29], mgi [28], the international mouse phenotyping consortium (impc) [31], and pubmed searches to generate a comprehensive database identifying the existence of a mouse model for all mouse genes (additional file 12: table s9). we then queried our identified candidate human and mouse genes against this list. for a given human gene, we queried the equivalent mouse ortholog gene symbol(s). spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mice were produced at baylor college of medicine. b6d2f1 (c57bl/6 × dba2) mice were used as embryo donors, and cd1 mice were used as foster mothers. mice were purchased from charles river (wilmington, ma). all mice were housed in a temperature-controlled environment with 12-h light cycles and free access to food and water. mice were housed in accordance with nih guidelines, and all animal experiments were approved by the institutional animal care and use committee (iacuc) at baylor college of medicine.

generation of spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mice

to generate spint3, spint4/5, ces5a, pp2d1, and saxo1 knockout mice, grna/cas9 ribonucleoprotein complex was electroporated into fertilized eggs and transplanted into surrogate mothers as previously described [107]. briefly, to harvest fertilized eggs, card hyperova (0.1 ml, cosmo bio) was injected into the abdominal cavity of b6d2f1 females (charles river), followed by human chorionic gonadotropin (hcg) (5 units, emd chemicals). forty-eight hours after card hyperova, b6d2f1 males were allowed to mate naturally. twenty hours after mating, fertilized eggs with 2 pronuclei were collected for electroporation. custom crrnas targeting each gene were purchased from millipore-sigma. the sequences for all guide rnas used for crispr/cas9-mediated gene editing are listed in additional file 1: table s11. crrna and tracrrna (millipore-sigma) were diluted with nuclease-free water.
the mixture was denatured at 95°c for 5 min and allowed to anneal by cooling gradually to room temperature (1 h). each grna was mixed with cas9 protein solution (thermo fisher scientific) and opti-mem media (thermo fisher scientific), and then incubated at 37°c for 5 min to prepare the grna/cas9 rnps [final concentration, 300 ng/μl cas9 for 250 ng/μl of each grna]. the grna/cas9 rnp solution was placed between electrodes with a 1-mm gap in the ecm 830 electroporation system (btx). fertilized eggs were arranged between the electrodes, and then, the electroporation was performed with the following conditions: 30 v, 1-ms pulse duration, and 2 pulses separated by 100-ms pulse interval. for egg transfer, electroporated embryos were transplanted into the oviduct of pseudo-pregnant icr recipients. after 19 days, offspring were obtained by natural birth or cesarean section. the f0 mice with sequence-predicted heterozygous mutations were mated with additional b6d2f1 mice to generate homozygous mutants. the f2 or later generations were used for the phenotypic analyses. for sanger sequence analysis of mutant mice, genomic dna was isolated by incubating tail tips in lysis buffer [20 mm tris-hcl (ph 8.0), 5 mm edta, 400 mm nacl, 0.3% sds, and 200 μg/ml actinase e solution] at 60°c overnight. polymerase chain reactions (pcrs) amplifying the genomic region containing the insertion/deletion events were performed using kod xtreme enzyme (toyobo, osaka, japan); pcr products were purified using the qiaquick pcr purification kit (qiagen, carlsbad, ca, usa) and sent for sanger sequencing on an abi 3130xl genetic analyzer (thermo fisher scientific, waltham, ma, usa) using the forward primer. for routine genotyping of mutant mice, genomic dna was isolated by separately incubating ear snips and tail tips in 50 mm naoh solution at 95°c overnight and inactivating with 1 m tris (ph 8.0).
pcrs amplifying wild-type and mutant-specific amplicons were performed using 2x amfisure pcr master mix (gendepot, barker, tx). primer sequences are listed in additional file 1: table s13. upon sexual maturation (6-7 weeks of age), knockout and littermate control male mice (n = 6-9 mice per genotype) were continuously housed with two 7-8-week-old wild-type b6d2f1/j female mice per male for 12 weeks. during the fertility test, the number of pups was counted shortly after birth. the total number of litters and pups per male over the mating trial was calculated and divided by the number of months to generate averages and statistics per genotype. the average number of pups per litter is based on the average litter size per male where a litter is considered one or more pups. knockout and littermate control male mice (n = 5-13 mice per genotype) that were 12 weeks of age were used to examine body and reproductive organ weights, and testicular and epididymal histology. testes and epididymides were fixed in bouin's fixative, embedded in paraffin, sectioned at 5 μm thickness, and stained with 1% periodic acid-schiff (pas) stain followed by counterstaining with hematoxylin 2 solution. histological images were acquired with an aperio at2 slide scanner (leica microsystems). knockout and littermate control male mice (n = 5-13 mice per genotype) that were 12 weeks of age were used to examine sperm numbers and motility parameters using computer-assisted sperm analysis (casa). the cauda of both epididymides was isolated, transferred into human tubule fluid (htf) (irvine scientific, santa ana, ca) containing 5 mg/ml of bsa, minced, and placed in a humidified incubator for 15 min at 37°c with 5% co2. following incubation, the sperm were diluted 1:50 in htf, added to a pre-warmed slide, and analyzed using a hamilton-thorne bioscience's ceros ii instrument. several fields of view were illuminated and captured until at least 200 cells were counted. rnascope 2.5 hd reagent kit (red) (cat.
322350, advanced cell diagnostics, newark, ca, usa) was used to detect spint3, spint4, and spint5 mrna transcripts on pfa-fixed, paraffin-embedded sections from 3-month-old wild-type epididymis. the probes against mm-spint3, mm-spint4, and mm-spint5 were custom-made, and the standard positive control (mm-ppib, cat. 313911) and negative control (dapb, cat. 310043) probes were used. the assay was performed according to the manufacturer's instructions. slides were counterstained using dapi and mounted using prolong glass antifade mountant (thermo fisher scientific inc.). multi-channel fluorescent images were acquired with an aperio versa (leica microsystems). all measurements are expressed as mean ± standard error of the mean. statistical differences were determined using student's t test. differences were considered statistically significant if the p value was less than 0.05. supplementary information accompanies this paper at https://doi.org/10.1186/s12915-020-00826-z. additional file 1: tables s1, s2, s11, s12, s13, and s14. table s1. summary of human rna-seq datasets. this table contains the sra value for each previously published human rna-seq dataset that was reanalyzed as part of this study. the geo accession number for each new human rna-seq dataset generated and subsequently analyzed in this study is also included. table s2. summary of mouse rna-seq datasets. this table contains the sra value for each previously published mouse rna-seq dataset that was reanalyzed as part of this study. the geo accession number for each new mouse sample generated and subsequently analyzed in this study is also included. table s11. single-guide rnas targeting the genes' upstream (u) and downstream (d) regions used for generating knockout mice. efficiency of embryo transplantation was presented using the number of total pups delivered by pseudopregnant mice divided by the number of total embryos used for oviduct transplantation (total pups/embryos transplanted).
efficiency of genome editing was determined by the number of pups carrying enzymatic mutations divided by the number of pups subjected to genotyping (gm pups/pups genotyped). table s12. sanger sequencing of detailed genotype of mutant dna sequences in all the five mouse lines. table s13. primers and pcr conditions used for genotyping the mutant alleles of the knockout mouse lines. table s14. human and mouse rt-pcr primer sequences used for verification of reproductive tract-specificity. additional file 2: fig. s1. genes that passed the tpm and fdr filters in at least one of the measured reproductive tissues or cells were visualized using a heatmap of the ruvr batch corrected log2 cpm gene expression values for the human (a) and mouse (b) samples. additional file 3: table s3. human expression summary. contains differential fold change, identity of the non-reproductive tissue with maximal gene expression based on the differential gene analysis, false discovery rate (fdr) value, average and standard deviation tpm expression values, and log2 cpm gene expression value for the human samples. all protein-coding genes (18,305 genes) that had expression in at least one reproductive tissue or cell are listed. additional file 4: table s4. mouse expression summary. contains differential fold change, identity of the non-reproductive tissue with maximal gene expression based on the differential gene analysis, false discovery rate (fdr) value, average and standard deviation tpm expression values, and log2 cpm gene expression value for the mouse samples. all protein-coding genes (16,891 genes) that had expression in at least one reproductive tissue or cell are listed. additional file 5: table s5. all human male reproductive tract-specific genes that met the criteria of identification as reproductive tract-specific in at least one male reproductive tissue or purified cell type, with the level of fold change listed under the tissue or cell if all criteria were met.
the criteria of selection are as follows: fdr < 0.05; tpm in the reproductive tissue or cell > 10; maximum tpm across non-reproductive tissues < 1. a fold change value of 0 indicates the criteria were not met for that tissue or cell. additional columns are included indicating (1) the equivalent mouse ortholog gene symbols (single or multiple symbols) that exist, and (2) whether our studies identified any of these mouse orthologs as reproductive tract-specific in mouse. additional file 6: table s6. all mouse male reproductive tract-specific genes that met the criteria of identification as reproductive tract-specific in at least one male reproductive tissue or purified cell type, with the level of fold change listed under the tissue or cell if all criteria were met. the criteria of selection are as follows: fdr < 0.05; tpm in the reproductive tissue or cell > 8; maximum tpm across non-reproductive tissues < 2. a fold change value of 0 indicates the criteria were not met for that tissue or cell. additional columns are included indicating (1) the equivalent human ortholog gene symbols (single or multiple symbols) that exist, and (2) whether our studies identified any of these human orthologs as reproductive tract-specific in human.
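the selection criteria described for tables s5 and s6 can be illustrated with a minimal python sketch. it is not the authors' pipeline: the `GeneStats` record, the `reported_fold_change` helper and the example gene names are hypothetical, and only the numeric cut-offs (fdr < 0.05; tpm > 10 and < 1 for human; tpm > 8 and < 2 for mouse) are taken from the table legends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneStats:
    """Per-gene summary for one reproductive tissue or cell (hypothetical record)."""
    symbol: str
    fdr: float               # false discovery rate from the differential analysis
    tpm_repro: float         # TPM in the reproductive tissue or cell of interest
    tpm_nonrepro_max: float  # maximum TPM across non-reproductive tissues
    fold_change: float


# Cut-offs quoted in the legends of tables S5 (human) and S6 (mouse).
HUMAN = dict(min_repro_tpm=10.0, max_nonrepro_tpm=1.0)
MOUSE = dict(min_repro_tpm=8.0, max_nonrepro_tpm=2.0)


def reported_fold_change(gene, min_repro_tpm, max_nonrepro_tpm, fdr_cutoff=0.05):
    """Return the fold change if all three criteria are met, otherwise 0."""
    passes = (
        gene.fdr < fdr_cutoff
        and gene.tpm_repro > min_repro_tpm
        and gene.tpm_nonrepro_max < max_nonrepro_tpm
    )
    return gene.fold_change if passes else 0.0


if __name__ == "__main__":
    example = GeneStats("gene_x", fdr=0.01, tpm_repro=12.0,
                        tpm_nonrepro_max=0.5, fold_change=40.0)
    print(reported_fold_change(example, **HUMAN))
```

a value of 0 returned by the helper mirrors the convention in the tables, where a fold change of 0 marks a tissue or cell in which the criteria were not met.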
received: 17 april 2020 accepted: 8 july 2020 population division: un the mouse epididymal transcriptome: transcriptional profiling of segmental gene expression in the epididymis1 the human epididymis: its function in sperm maturation a multitude of genes expressed solely in meiotic or postmeiotic spermatogenic cells offers a myriad of contraceptive targets identification of testis-specific male contraceptive targets: insights from transcriptional profiling of the cycle of the rat seminiferous epithelium and purified testicular cells identification of epididymis-specific transcripts in the mouse and rat by transcriptional profiling the rat epididymal transcriptome: comparison of segmental gene expression in the rat and mouse epididymides1 stage-specific gene expression is a fundamental characteristic of rat spermatogenic cells and sertoli cells the human testis-specific proteome defined by transcriptomics and antibody-based profiling toward development of the male pill: a decade of potential non-hormonal contraceptive targets epididymal protein targets: a brief history of the development of epididymal protease inhibitor as a contraceptive disrupting the male germ line to find infertility and contraception targets male contraception: another holy grail male contraception: past, present and future metabolic cooperation in testis as a pharmacological target: from disease to contraception non-hormonal male contraception: a review and development of an eppin based contraceptive the control of male fertility by spermatid-specific factors: searching for contraceptive targets from spermatozoon's head to tail epididymal approaches to male contraception an integrated encyclopedia of dna elements in the human genome research resource: the dynamic transcriptional profile of sertoli cells during the progression of spermatogenesis expression profiles of human epididymis epithelial cells reveal the functional diversity of caput, corpus and cauda regions transcriptome analysis of 
highly purified mouse spermatogenic cell populations: gene expression signatures switch from meiotic-to postmeiotic-related processes at pachytene stage dynamics of the transcriptome during human spermatogenesis: predicting the potential key genes regulating male gametes generation chromatin and single-cell rna-seq profiling reveal dynamic signaling and metabolic transitions during human spermatogonial stem cell development id4 levels dictate the stem cell state in mouse spermatogonia human sertoli cells support high levels of zika virus replication and persistence pharos: collating protein information to shed light on the druggable genome 2018: knowledgebase for the laboratory mouse analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics high-throughput discovery of novel developmental phenotypes carcinoembryonic antigen-related cell adhesion molecule 2 controls energy balance and peripheral insulin action in mice shp1 phosphatase-dependent t cell inhibition by ceacam1 adhesion molecule isoforms deletion of the carcinoembryonic antigen-related cell adhesion molecule 1 (ceacam1) gene contributes to colon tumor progression in a murine model of carcinogenesis ceacam1a-/-mice are completely resistant to infection by murine coronavirus mouse hepatitis virus a59 carcinoembryonic antigen-related cell adhesion molecule 10 expressed specifically early in pregnancy in the decidua is dispensable for normal murine development the lim homeobox gene lhx9 is essential for mouse gonad formation generation and characterization oflhx9-gfpcreert2knock-in mouse line targeted germline modifications in rats using crispr/cas9 and spermatogonial stem cells spermatogenic cells of the prepuberal mouse. 
isolation and morphological characterization integrative characterization of germ cell-specific genes from mouse spermatocyte unigene library comparative and functional analysis of testis-specific genes genome engineering uncovers 54 evolutionarily conserved and testis-enriched genes that are not required for male fertility in mice reaching for high-hanging fruit in drug discovery at protein-protein interfaces pharmacological perturbation of cdk9 using selective cdk9 inhibition or degradation catalytic in vivo protein knockdown by small-molecule protacs hijacking the e3 ubiquitin ligase cereblon to efficiently target brd4 lessons in protac design from selective degradation with a promiscuous warhead assessing different e3 ligases for small molecule induced protein ubiquitination and degradation cell-penetrating peptides: 20 years later, where do we stand? tat peptide-mediated cellular delivery: back to basics proteomics. tissue-based map of the human proteome single nucleotide polymorphisms: discovery of the genetic causes of male infertility a comprehensive gene mutation screen in men with asthenozoospermia malacards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search genetic evaluation of patients with non-syndromic male infertility a systematic review on the genetics of male infertility in the era of nextgeneration sequencing genetics of male infertility: from research to clinic association study of single-nucleotide polymorphisms in faslg, jmjdia, loc203413, tex15, brdt, or2w3, insr, and tas2r38 genes with male infertility evaluation of 172 candidate polymorphisms for association with oligozoospermia or azoospermia in a large cohort of men of european descent point-of-care whole-exome sequencing of idiopathic male infertility the biology of infertility: research advances and clinical challenges mutation screening of the fkbp6 gene and its association study with spermatogenic impairment in idiopathic 
infertile men efficient typing of copy number variations in a segmental duplication-mediated rearrangement hotspot using multiplex competitive amplification genetic defects in human azoospermia single-nucleotide polymorphisms of the prdm9 (meisetz) gene in patients with nonobstructive azoospermia two single nucleotide polymorphisms in prdm9 (meisetz) gene may be a genetic risk factor for japanese patients with azoospermia by meiotic arrest single-nucleotide polymorphisms in the human rad21l gene may be a genetic risk factor for japanese patients with azoospermia caused by meiotic arrest and sertoli cell-only syndrome excess of rare variants in genes that are key epigenetic regulators of spermatogenesis in the patients with non-obstructive azoospermia mutations in sohlh1 gene associate with nonobstructive azoospermia human fam154a (saxo1) is a microtubule-stabilizing protein specific to cilia and related structures human carboxylesterases: a comprehensive review structure and catalytic properties of carboxylesterase isozymes involved in metabolic activation of prodrugs human carboxylesterases and their role in xenobiotic and endobiotic metabolism the mammalian carboxylesterases: from molecules to functions heterologous expression, purification, and characterization of human triacylglycerol hydrolase hepatic carboxylesterase 1 is essential for both normal and farnesoid x receptorcontrolled lipid homeostasis deficiency of carboxylesterase 1/esterase-x results in obesity, hepatic steatosis, and hyperlipidemia baculo-expression and enzymatic characterization of ces7 esterase lipid remodeling of murine epididymosomes and spermatozoa during epididymal maturation development changes occurring in the lipids of ram epididymal spermatozoa plasma membrane regionalization and redistribution of membrane phospholipids and cholesterol in mouse spermatozoa during in vitro capacitation variation in the cholesterol/phospholipid ratio in human spermatozoa and its relationship with 
capacitation analysis of recombinant human semenogelin as an inhibitor of human sperm motility characterization of an eppin protein complex from human semen and spermatozoa antimicrobial activity of human eppin, an androgen-regulated, sperm-bound protein with a whey acidic protein motif the role of type ii transmembrane serine proteasemediated signaling in cancer delineation of proteolytic and non-proteolytic functions of the membrane-anchored serine protease prostasin mechanisms of hepatocyte growth factor activation in cancer tissues regulation of cell surface protease matriptase by hai2 is essential for placental development, neural tube closure and embryonic survival in mice three genes expressing kunitz domains in the epididymis are related to genes of wfdc-type protease inhibitors and semen coagulum proteins in spite of lacking similarity between their protein products sperm proteins sof1, tmem95, and spaca6 are required for spermoocyte fusion in mice identification of multiple male reproductive tract-specific proteins that regulate sperm migration through the oviduct in mice disruption of adam3 impairs the migration of sperm into oviduct in mouse male mice deficient for germ-cell cyritestin are infertile co-expression of sperm membrane proteins cmtm2a and cmtm2b is essential for adam3 localization and male fertility in mice factors controlling sperm migration through the oviduct revealed by gene-modified mouse models pseudogenes regulate parental gene expression via cerna network interspecific recombinant congenic strains between c57bl/6 and mice of the mus spretus species: a powerful tool to dissect genetic control of complex traits the sequence read archive cutadapt removes adapter sequences from high-throughput sequencing reads transcript-level expression analysis of rna-seq experiments with hisat, stringtie and ballgown systematic selection of reference genes for the normalization of circulating rna transcripts in pregnant women based on rna-seq data 
featurecounts: an efficient general purpose program for assigning sequence reads to genomic features edger: a bioconductor package for differential expression analysis of digital gene expression data crispr/cas9 mediated genome editing in es cells and its application for chimeric analysis in mice publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. we thank drs. yumei li all data generated or analyzed during this study are included in this published article, its supplementary information files, and publicly available repositories. the sra values for each of the 162 previously published reproductive and non-reproductive human rna-seq datasets [9, 21, 23, 24, 26, 30] and 81 previously published reproductive and non-reproductive mouse rna-seq datasets [19, 22, 25] are listed in additional file 1: table s1 and additional file 1: table s2. all raw and processed data for the 12 new human and 9 new mouse samples generated in this study are deposited in ncbi geo (accession gse150854). all mice generated in this study, and any additional information about this study, are available from the corresponding authors upon request. ethics approval and consent to participate: human tissue acquisition was approved by the ethics committee at université laval with written consent obtained from each respective donor's family. all animal experiments were approved by the institutional animal care and use committee (iacuc) at baylor college of medicine. additional file 7: fig. s2. summary of number of statistically significant up and down-regulated genes, and quantification of candidate genes with respect to the individual reproductive tissue or cell of interest.
the plots in panels (a) and (b) summarize the number of statistically significant human or mouse genes, respectively, that are up-regulated or down-regulated in each reproductive tissue or cell of interest compared to the non-reproductive tissue with maximal gene expression. red columns depict the number of genes that are up-regulated and blue columns depict the number of genes that are down-regulated. changes in gene expression were considered statistically significant for an fdr of less than or equal to 0.05. the total number of candidate genes is designated by the black columns. candidate genes are genes that passed the fdr and tpm expression value filters. additional file 8: fig. s3. venn diagrams comparing the overlap between the candidate male reproductive genes identified by the indicated reproductive tissues. the human testis combined gene list is the list of genes from both new samples we isolated and from previously published testis samples. the human epididymis combined gene list is the list of genes identified in either previously published samples or the newly generated samples across all sections of the epididymis. lastly, the mouse epididymis combined gene list is the combined list of genes identified across all three sections of the mouse epididymis. additional file 9: table s7. complete cross-sample comparison identifying human and mouse reproductive tract-specific genes common to two or more samples and unique to each as identified through our studies. additional file 10: fig. s4. classification of genes into different protein families and identification of the existence of an experimental mouse model.
each candidate human gene was classified as an enzyme (enzyme), chromosome and histone modifier (epigenetic), g-protein-coupled receptor (gpcr), orphan g-protein-coupled receptor (ogpcr), kinase (kinase), transcription factor (tf), nuclear receptor (nr), ion channel (ic), chromosome and histone modifying transcription factor (tf; epigenetic), transporter (transporter) or unknown (a). the total number of candidate genes identified in our search for mouse models was plotted. orange columns designate the number of candidate genes where a model was identified while yellow designates candidate genes where a model was not identified (b). additional file 11: table s8. drug target type classification for human genes. genes are listed according to the tissue and/or cell that they were identified as reproductive tract-specific in. additional file 12: table s9. availability of a mouse model for human genes with a mouse ortholog. genes are listed according to the tissue and/or cell that they were identified as reproductive tract-specific in. additional file 13: fig. s5. one-hundred and forty-two previously identified human male reproductive tract-specific genes that remain without a reported mouse model. the listed genes were identified in one or more datasets as indicated in the venn diagram. underlined genes were also identified in our studies as reproductive tract-specific in mouse. genes written in blue encode either enzymes, kinases, gpcrs, ogpcrs, transporters, transcription factors, or proteins involved in epigenetic regulation. genes written in dark red were identified in both testis (testis and/or testis cell) and in epididymis. additional file 14: fig. s6. seventy-three human male reproductive tract-specific genes that each have a reported mouse model with male infertility phenotype. the listed genes were identified in one or more datasets as indicated in the venn diagram. underlined genes were also identified in our studies as reproductive tract-specific in mouse.
genes written in blue encode either enzymes, kinases, gpcrs, ogpcrs, transporters, transcription factors, or proteins involved in epigenetic regulation. genes written in dark red were identified in both testis (testis and/or testis cell) and in epididymis. additional file 15: fig. s7. rt-pcr confirmation of reproductive tract-specificity in both humans (a) and mice (b). the genes listed in this figure were identified through our studies and previous studies, but currently remain without a reported mouse model. gapdh and hprt are included as housekeeping genes. additional file 16: fig. s8. eighty-nine novel human genes without a mouse ortholog. the listed genes were identified in one or more datasets as indicated in the venn diagram. genes written in blue encode either enzymes, kinases, gpcrs, ogpcrs, transporters, transcription factors, or proteins involved in epigenetic regulation. genes written in dark red were identified in both testis (testis and/or testis cell) and in epididymis. additional file 17: fig. s9. novel reproductive tract-specific human genes that do not have any equivalent mouse orthologs. these genes may serve as potential contraceptive targets; however, functional validation would need to be carried out in a model organism other than mouse, such as rat or marmoset, which do have orthologs to these genes. the digital pcr (heatmap) depicts the average transcripts per million (tpm) value per tissue per gene from the indicated human rna-seq datasets as processed in parallel through our bioinformatics pipeline. white = 0 tpm, black ≥ 30 tpm. the expression profile of the human housekeeping gene, gapdh, is included as reference. for data obtained from published datasets, superscript values reference the dataset publication as previously mentioned. additional file 18: table s10. 1064 previously identified genes.
genes previously identified as male reproductive tract-specific through high throughput gene expression studies using either microarrays or rna-seq [2] [3] [4] [5] [6] [7] [8]. the human ortholog to genes identified in mouse and rat studies is included. additional file 19: fig. s10. developmental expression pattern of spint3, spint4, spint5, pp2d1, and saxo1 in epididymis and testis of postnatal and adult mice. whole epididymides were used at postnatal days 3, 6, 10, and 14 and epididymis segments (caput, corpus, and cauda) were used at postnatal days 21, 28, 35, and 60. whole testes were used at all time points. the housekeeping gene, hprt, was used as reference. additional file 20: fig. s11. multi-channel fluorescence images of bilateral epididymis serial sections stained with custom rnascope probes targeting either spint3, spint4, or spint5 mrna (red) and dapi (blue). the position of caput (cap), corpus (cor), and cauda (cau) is labeled in the overview image (left column). the position of the magnification over the epididymis is the same for all three sections (right column). additional file 21: fig. s12. representative periodic acid-schiff staining of spint3 and spint4/5 knockout and littermate control (wild-type) testes and epididymis segments (caput, corpus, and cauda) at 3 months of age. additional file 22: fig. s13. representative periodic acid-schiff staining of ces5a knockout and littermate control (wild-type) testes and epididymis segments (caput, corpus, and cauda) at 3 months of age. additional file 23: fig. s14. representative scanning electron microscopy images of spint4/5 and ces5a ko and littermate control mouse sperm. additional file 24: fig. s15. representative periodic acid-schiff staining of pp2d1 and saxo1 knockout and littermate control (wild-type) testes and epididymis segments (caput, corpus, and cauda) at 3 months of age. all authors have no competing interests.
key: cord-103150-e9q8e62v authors: mishra, shreya; srivastava, divyanshu; kumar, vibhor title: improving gene-network inference with graph-wavelets and making insights about ageing associated regulatory changes in lungs date: 2020-11-04 journal: biorxiv doi: 10.1101/2020.07.24.219196 sha: doc_id: 103150 cord_uid: e9q8e62v using a gene-regulatory-network based approach for single-cell expression profiles can reveal unprecedented details about the effects of external and internal factors. however, noise and batch effect in sparse single-cell expression profiles can hamper correct estimation of dependencies among genes and regulatory changes. here we devise a conceptually different method using graph-wavelet filters for improving gene-network (gwnet) based analysis of the transcriptome. our approach improved the performance of several gene-network inference methods. most importantly, gwnet improved consistency in the prediction of gene-regulatory-networks using single-cell transcriptomes even in the presence of batch effect. consistency of the predicted gene-network enabled reliable estimates of changes in the influence of genes not highlighted by differential-expression analysis. applying gwnet on the single-cell transcriptome profile of lung cells revealed biologically relevant changes in the influence of pathways and master-regulators due to ageing. surprisingly, the regulatory influence of ageing on type ii pneumocytes showed noticeable similarity with patterns due to the effect of novel coronavirus infection in human lung. inferring gene-regulatory-networks and using them for system-level modelling is widely used for understanding the regulatory mechanisms involved in disease and development. the interdependencies among variables in the network are often represented as weighted edges between pairs of nodes, where edge weights could represent regulatory interactions among genes.
gene-networks can be used for inferring causal models [1], designing and understanding perturbation experiments, comparative analysis [2] and drug discovery [3]. owing to the wide applicability of network inference, many methods have been proposed to estimate interdependencies among nodes. most of the methods are based on pairwise correlation, mutual information or other similarity metrics among gene expression values measured across different conditions or time points. however, the resulting edges are often influenced by indirect dependencies owing to low but effective background similarity in patterns. in many cases, even if there is some true interaction among a pair of nodes, its effect and strength are not estimated properly due to noise, background-pattern similarity and other indirect dependencies. hence recent methods have started using alternative approaches to infer more confident interactions. such alternative approaches could be based on partial correlations [4] or aracne's method of statistical thresholding of mutual information [5]. single-cell expression profiles often show heterogeneity in expression values even in a homogeneous cell population. such heterogeneity can be exploited to infer regulatory networks among genes and identify dominant pathways in a cell-type. however, due to the sparsity and ambiguity about the distribution of gene expression from single-cell rna-seq profiles, the optimal measures of gene-gene interaction remain unclear. hence recently, skinnider et al. [6] evaluated 17 measures of association to infer gene co-expression based networks. in their analysis, they found two measures of association, namely phi and rho, to have the best performance in predicting co-expression based gene-gene interaction using scrna-seq profiles. in another study, chen et al. [7] performed an independent evaluation of a few methods proposed for gene-network inference using scrna-seq profiles, such as scenic [8], scode [9] and pidc [10]. chen et al.
found that for single-cell transcriptome profiles, whether generated from experiments or simulations, these methods performed poorly in reconstructing the network. the performance of such methods can be improved if gene-expression profiles are denoised. thus the major challenge of handling noise and dropout in scrna-seq profiles remains an open problem. the noise in single-cell expression profiles could be due to biological and technical reasons. the biological sources of noise could include thermal fluctuations and a few stochastic processes involved in transcription and translation, such as allele-specific expression [11] and irregular binding of transcription factors to dna, whereas technical noise could be due to amplification bias and stochastic detection due to the low amount of rna. raser and o'shea [12] used the term noise in gene expression to mean the measured level of its variation among cells that are supposed to be identical. raser and o'shea categorised potential sources of variation in gene-expression into four types: (i) the inherent stochasticity of biochemical processes due to small numbers of molecules; (ii) heterogeneity among cells due to cell-cycle progression or a random process such as partitioning of mitochondria; (iii) subtle micro-environmental differences within a tissue; (iv) genetic mutation. overall, noise in gene-expression profiles hinders reliable inference about the regulation of gene activity in a cell-type. thus, there is demand for pre-processing methods which can handle noise and sparsity in scrna-seq profiles such that inference of regulation can be reliable. the predicted gene-network can be analyzed further to infer salient regulatory mechanisms in a cell-type using methods borrowed from graph theory. calculating gene-importance in terms of centrality and finding communities and modules of genes are common downstream analysis procedures [2].
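the idea of scoring gene importance by centrality, and comparing it between two groups of cells, can be illustrated with a minimal python sketch. it is an assumption-laden stand-in rather than the method used in this study: the network here is a thresholded absolute pearson correlation matrix, centrality is the weighted degree (strength) of each gene, and the function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def coexpression_network(expr, cutoff=0.5):
    """Weighted gene-gene adjacency from a (cells x genes) expression matrix.

    Edges are absolute Pearson correlations; weights below `cutoff` are dropped.
    Genes with constant expression would yield NaN and should be removed first.
    """
    weights = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(weights, 0.0)   # no self-edges
    weights[weights < cutoff] = 0.0  # keep only the stronger associations
    return weights


def strength_centrality(weights):
    """Weighted degree of every gene: the sum of its edge weights."""
    return weights.sum(axis=1)


def differential_centrality(expr_a, expr_b, cutoff=0.5):
    """Per-gene change in strength centrality from group a to group b."""
    cent_a = strength_centrality(coexpression_network(expr_a, cutoff))
    cent_b = strength_centrality(coexpression_network(expr_b, cutoff))
    return cent_b - cent_a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_a = rng.normal(size=(50, 6))  # toy data: 50 cells, 6 genes
    group_b = rng.normal(size=(50, 6))
    print(differential_centrality(group_a, group_b))
```

other centrality measures could be substituted for the weighted degree without changing the structure of the comparison.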
just like gene-expression profiles, inferred gene networks can also be used to find differences between two groups of cells (samples) [13] and so reveal changes in the regulatory pattern caused by disease, environmental exposure or ageing. in particular, comparison of regulatory changes due to ageing has gained attention recently owing to the high incidence of metabolic disorders and infection-based mortality in the older population. especially in the current pandemic caused by the novel coronavirus (sars-cov-2), in which older individuals have a higher risk of mortality, a pressing question is why old lung cells have a higher risk of developing severe disease following sars-cov-2 infection. however, understanding regulatory changes due to ageing using gene-network inference with noisy scrna-seq profiles of lung cells is not trivial. thus there is a need for a noise and batch effect suppression method for investigation of the scrna-seq profile of ageing lung cells [14] using a network biology approach. here we have developed a method to handle noise in gene-expression profiles for improving gene-network inference. our method is based on graph-wavelet based filtering of gene-expression. our approach is not meant to overlap or compete with existing network inference methods; its purpose is to improve their performance. hence, we compared the output of other network inference methods with and without graph-wavelet based pre-processing. we have evaluated our approach using several bulk sample and single-cell expression profiles. we further investigated how our denoising approach influences the estimation of graph-theoretic properties of the gene-network. we also asked a crucial question: how does the gene regulatory-network differ between the lung cells of young and old individuals? further, we compared the pattern of changes in the influence of genes due to ageing with differential expression in covid-infected lung.
our method relies on the premise that cells (samples) that are similar to each other have a more similar expression profile for a gene. hence, we first make a network such that two cells are connected by an edge if one of them is among the top k nearest neighbours (knn) of the other. after building the knn-based network among cells (samples), we use a graph-wavelet based approach to filter the expression of one gene at a time (see fig. 1). for a gene, we use its expression as a signal on the nodes of the graph of cells. we apply a graph-wavelet transform to perform spectral decomposition of the graph-signal. after graph-wavelet transformation, we choose the threshold for wavelet coefficients using sureshrink, bayesshrink or a default percentile value determined after thorough testing on multiple data-sets. we use the retained values of the coefficients for inverse graph-wavelet transformation to reconstruct the filtered expression of the gene. the filtered gene-expression is used for gene-network inference and other downstream analysis of regulatory differences. for evaluation purposes, we have calculated inter-dependencies among genes using 5 different co-expression measurements, namely pearson and spearman correlations, φ and ρ scores and aracne. biological and technical noise can both exist in a bulk sample expression profile [12]. in order to test the hypothesis that graph-based denoising could improve gene-network inference, we first evaluated the performance of our method on bulk expression data-sets. we used 4 data-sets made available by the dream5 challenge consortium [15]. three data-sets were based on the original expression profiles of the bacteria escherichia coli and staphylococcus aureus and the single-celled eukaryote saccharomyces cerevisiae, while the fourth data-set was simulated using an in silico network with the help of genenetweaver, which models molecular noise in transcription and translation using the chemical langevin equation [16].
the true positive interactions for all four data-sets are also available. we compared graph-fourier based low-pass filtering with graph-wavelet based denoising using three different approaches to threshold the wavelet coefficients. we achieved 5-25 % improvement in score over raw data based on dream5 criteria [15] with correlation, aracne and ρ based network prediction. with φ s based gene-network prediction, there was an improvement in 3 out of 4 dream5 data-sets ( fig. 2a) . all 5 network inference methods showed improvement after graph-wavelet based denoising of the simulated (in silico) data from the dream5 consortium ( fig. 2a) . moreover, graph-wavelet based filtering had better performance than chebyshev filter-based low-pass filtering in the graph fourier domain. this highlights the fact that even bulk sample gene-expression data can have noise, and that denoising it with graph-wavelets after making a knn-based graph among samples has the potential to improve gene-network inference. it also highlights another fact, well known in the signal processing field, that wavelet-based filtering is more adaptive than low-pass filtering.

in comparison to bulk samples, there is a higher level of noise and dropout in single-cell expression profiles. dropouts are caused by non-detection of true expression due to technical issues. using low-pass filtering after graph-fourier transform seems to be an obvious choice, as it fills in a background signal at missing values and suppresses high-frequency outlier signals [17] . however, in the absence of information about cell-types and cell-states, blind smoothing of a signal may not prove fruitful. hence we applied graph-wavelet based filtering for processing gene-expression data-sets from scrna-seq profiles. we first used an scrna-seq data-set of mouse embryonic stem cells (mescs) [18] . in order to evaluate network inference in an unbiased manner, we used gene regulatory interactions compiled by another research group [19] .
figure 1 : the flowchart of the gwnet pipeline. first, a knn-based network is made between samples/cells. a filter for the graph wavelet is learned for the knn-based network of samples/cells. the expression of one gene at a time is filtered using the graph-wavelet transform. filtered gene-expression data is used for network inference. the inferred network is used to calculate centrality and differential centrality among groups of cells.

figure 2 : improvement in gene-network inference by graph-wavelet based denoising of gene-expression. (a) performance of network inference methods using bulk gene-expression data-sets of the dream5 challenge. three different ways of shrinkage of graph-wavelet coefficients were compared to graph-fourier based low-pass filtering. the y-axis shows fold change in area under the curve (auc) of the receiver operating characteristic (roc) curve for overlap of the predicted network with the golden-set of interactions. for hard thresholding, the default value of the 70th percentile was used. (b) performance evaluation of network inference based on single-cell rna-seq (scrna-seq) of mouse embryonic stem cells (mescs) after filtering the gene-expression. the gold-set of interactions was adapted from [19] . (c) comparison of graph-wavelet based denoising with other related smoothing and imputing methods in terms of consistency in the prediction of the gene-interaction network. here, the phi (φ s ) score was used to predict the network among genes. for results based on other types of scores see supplementary figure s1 . predicted networks from two scrna-seq profiles of mescs were compared to check robustness towards the batch effect.

our approach of graph-wavelet based pre-processing of the mesc scrna-seq data-set improved the performance of gene-network inference methods by 8-10 percent (fig. 2b) . however, most often the gold-set of interactions used for evaluation of gene-network inference is incomplete, which hinders the true assessment of improvement.
hence we also used another approach to validate our method. for this purpose, we used a measure of overlap among network inferred from two scrna-seq data-sets of the same cell-type but having different technical biases and batch effects. if the inferred networks from both data-sets are closer to true gene-interaction model, they will show high overlap. for this purpose, we used two scrnaseq data-set of mesc generated using two different protocols(smartseq and drop-seq). for comparison of consistency and performance, we also used a few other imputation and denoising methods proposed to filter and predict the missing expression values in scrna-seq profiles. we evaluated 7 other such methods; graph-fourier based filtering [17] , magic [20] , scimpute [21] , dca [22] , saver [23] , randomly [24] , knn-impute [25] . graphwavelet based denoising provided better improvement in auc for overlap of predicted network with known interaction than other 7 methods meant for imputing and filtering scrna-seq profiles (supplementary figure s1a ). similarly in comparison to graph-wavelet based denoising, the other 7 methods did not provided substantial improvement in auc for overlap among gene-network inferred by two data-sets of mesc (fig. 2c , supplementary figure s1b ). however, graph wavelet-based filtering improved the overlap between networks inferred from different batches of scrna-seq profile of mesc even if they were denoised separately (fig. 2c , supplementary figure s1b ). with φ s based edge scores the overlap among predicted gene-network increased by 80% due to graph-wavelet based denoising (fig. 2c ). the improvement in overlap among networks inferred from two batches hints that graph-wavelet denoising is different from imputation methods and has the potential to substantially improve gene-network inference using their expression profiles. 
improved gene-network inference from single-cell profile reveals age-based regulatory differences

improvement in overlap among inferred gene-networks from two expression data-sets for a cell type also hints that after denoising predicted networks are closer to true gene-interaction profiles. hence using our denoising approach before estimating the difference in inferred gene-networks due to age or external stimuli could reflect true changes in the regulatory pattern. such a notion inspired us to compare gene-networks inferred for young and old pancreatic cells using their scrna-seq profiles filtered by our tool [26] . martin et al. defined three age groups, namely juvenile (1 month-6 years), young adult (21-22 years) and aged (38-54 years) [26] . we applied graph-wavelet based denoising to pancreatic cells from the three groups separately. in other words, we did not mix cells from different age groups while denoising. graph-wavelet based denoising of the single-cell profiles of pancreatic cells gave better performance in terms of overlap with protein-protein interaction (ppi) (fig. 3a , supplementary figure s2a ). even though, like chen et al. [7] , we have used ppi to measure improvement in gene-network inference, it may not be reflective of all gene-interactions. hence we also used the criterion of increase in overlap among predicted networks for the same cell-types to evaluate our method for scrna-seq profiles of pancreatic cells. denoising scrna-seq profiles also increased overlap between gene-networks inferred for pancreatic cells of old and young individuals (fig. 3b , supplementary figure s2b ). we performed quantile normalization of the original and denoised expression matrices, taking all 3 age groups together to bring them to the same scale, in order to calculate the variance of expression across cells for every gene. the old and young pancreatic alpha cells had a higher level of median variance of expression of genes than juvenile cells.
however, after graph-wavelet based denoising, the variance level of genes across all 3 age groups became almost equal and had a similar median value (fig. 3c ). note that it is not trivial to estimate the fraction of variance due to transcriptional or technical noise. nonetheless, graph-wavelet based denoising seemed to have reduced the noise level in single-cell expression profiles of old and young adults. differential centrality in the co-expression network has been used to study changes in the influence of genes. however, noise in single-cell expression profiles can cause spurious differences in centrality. hence we visualized the differential degree of genes in networks inferred using scrna-seq profiles of young and old cells. the networks inferred from non-filtered expression had a much higher number of non-zero differential degree values in comparison to the denoised version (fig. 3d, supplementary figure s2c ). thus denoising seems to reduce differences in centrality which could be due to the randomness of noise. next, we analyzed the properties of genes whose variance dropped most due to graph-wavelet based denoising. surprisingly, we found that the top 500 genes with the highest drop in variance due to denoising in old pancreatic beta cells were significantly associated with diabetes mellitus and hyperinsulinism, whereas the top 500 genes with the highest drop in variance in young pancreatic beta cells had no or insignificant association with diabetes (fig. 3e) . a similar trend was observed with pancreatic alpha cells (supplementary figure s2d ) . such a result hints that ageing causes an increase in stochasticity of the expression level of genes associated with pancreas function, and that denoising could help in properly elucidating their dependencies with other genes.

improvement in gene-network inference for studying regulatory differences among young and old lung cells
studying cell-type-specific changes in regulatory networks due to ageing has the potential to provide better insight about predisposition for disease in the older population. hence we inferred genenetwork for different cell-types using scrna-seq profiles of young and old mouse lung cells published by kimmel et al. [14] .the lower lung epithelia where a few viruses seem to have the most deteriorating effect consists of multiple types of cells such as bronchial epithelial and alveolar epithelial cells, fibroblast, alveolar macrophages, endothelial and other immune cells. the alveolar epithelial cells, also called as pneumocytes are of two major types. the type 1 alveolar (at1) epithelial cells for major gas exchange surface of lung alveolus has an important role in the permeability barrier function of the alveolar membrane. type 2 alveolar cells (at2) are the progenitors of type 1 cells and has the crucial role of surfactant production. at2 cells ( or pneumocytes type ii) cells are a prime target of many viruses; hence it is important to understand the regulatory patterns in at2 cells, especially in the context of ageing. we applied our method of denoising on scrnaseq profiles of cells derived from old and young mice lung [14] . graph wavelet based denoising lead to an increase in consistency among inferred genenetwork for young and old mice lung for multiple cell-types (fig. 4a) . graph-wavelet based denoising also lead to an increase in consistency in predicted gene-network from data-sets published by two different groups (fig. 4b) . the increase in overlap of gene-networks predicted for old and young cells scrna-seq profile, despite being denoised separately, hints about a higher likelihood of predicting true interactions. hence the chances of finding gene-network based differences among old and young cells were less likely to be dominated by noise. we studied ageing-related changes in pagerank centrality of nodes(genes). 
since pagerank centrality provides a measure of "popularity" of nodes, studying its change has the potential to highlight the change in the influence of genes. first, we calculated differential pagerank of genes among young and old at2 cells (supporting file-1) and performed gene-set enrichment analysis using enrichr [27] . the top 500 genes with higher pagerank in young at2 cells had enriched terms related to integrin signalling, 5ht2 type receptor mediated signalling, h1 histamine receptor-mediated signalling pathway, vegf, cytoskeleton regulation by rho gtpase and thyrotropin activating receptor signalling (fig. 4c) . we ignored oxytocin and thyrotropin-activating hormone-receptor mediated signalling pathways as an artefact, as the expression of oxytocin and trh receptors in at2 cells was low. moreover, genes appearing for the terms "oxytocin receptor-mediated signalling" and "thyrotropin activating hormone-mediated signalling" were also present in the gene-set for the 5ht2 type receptor-mediated signalling pathway. we found literature support for activity in at2 cells for most of the enriched pathways. however, there were very few studies which showed their differential importance in old and young cells. for example, bayer et al. demonstrated mrna expression of several 5-htr including 5-ht2, 5ht3 and 5ht4 in alveolar epithelial type ii (at2) cells and their role in calcium ion mobilization. similarly, chen et al. [28] showed that histamine 1 receptor antagonist reduced pulmonary surfactant secretion from adult rat alveolar at2 cells in primary culture. the vegf pathway is active in at2 cells, and it is known that ageing has an effect on vegf mediated angiogenesis in lung. moreover, vegf based angiogenesis is known to decline with age [29] .

figure caption (fragment): for comparing two networks it is important to reduce differences due to noise. hence the plot here shows the similarity of predicted networks before and after graph-wavelet based denoising. the results shown here are for the correlation-based co-expression network, while similar results are shown using the ρ score in supplementary figure s2 . (c) variances of expression of genes across single cells before and after denoising (filtering) are shown here. variances of genes in a cell-type were calculated separately for 3 different stages of ageing (young, adult and old). the variance (estimate of noise) is higher in older alpha and beta cells compared to young. however, after denoising the variance of genes in all ageing stages becomes equal. (d) effect of noise on estimated differential centrality. the difference in the degree of genes in networks estimated for old and young pancreatic beta cells is shown here. the number of non-zero differential-degree values estimated using denoised expression is lower than for networks based on unfiltered expression. (e) enriched panther pathway terms for the top 500 genes with the highest drop in variance after denoising in old and young pancreatic beta cells.

we further performed gene-set enrichment analysis for genes with increased pagerank in older mice at2 cells. for the top 500 genes with higher pagerank in old at2 cells, the terms which appeared among the 10 most enriched in both the kimmel et al. and angelidis et al. data-sets were t cell activation, b cell activation, cholesterol biosynthesis, fgf signaling pathway, angiogenesis and cytoskeletal regulation by rho gtpase (fig. 4d) . thus, there was 60% overlap in results from the kimmel et al. and angelidis et al. data-sets in terms of enrichment of pathway terms for genes with higher pagerank in older at2 cells (supplementary figure s3a , supporting file-2, supporting file-3). overall in our analysis, inflammatory response genes showed higher importance in older at2 cells.
the increase in the importance of cholesterol biosynthesis genes hand in hand with higher inflammatory response points towards the influence of ageing on the quality of pulmonary surfactants released by at2. al saedy et al. recently showed that high level of cholesterol amplifies defects in surface activity caused by oxidation of pulmonary surfactant [30] . we also performed enrichr based analysis of differentially expressed genes in old at2 cells (supporting file-4). for genes up-regulated in old at2 cells compared to young, terms which reappeared were cholesterol biosynthesis, t cell and b cell activation pathways, angiogenesis and inflammation mediated by chemokine and cytokine signalling. whereas few terms like ras pathway, jak/stat signalling and cytoskeletal signalling by rho gt-pase did not appear as enriched for genes upregulated in old at2 cells ( figure 3b , supporting file-4). however previously, it has been shown that the increase in age changes the balance of pulmonary renin-angiotensin system (ras), which is correlated with aggravated inflammation and more lung injury [31] . jak/stat pathway is known to be involved in the oxidative-stress induced decrease in the expression of surfactant protein genes in at2 cells [32] . overall, these results indicate that even though the expression of genes involved in relevant pathways may not show significant differences due to ageing, but their regulatory influence could be changing substantially. in order to further gain insight, we analyzed the changes in the importance of transcription factors in ageing at2 cells. among top 500 genes with higher pagerank in old at2 cells, we found several relevant tfs. however, to make a stringent list, we considered only those tfs which had nonzero value for change in degree among gene-network for old and young at2 cells. overall, with kimmel at el. data-set, we found 46 tfs with a change in pagerank and degree (supplementary table-1) due to ageing for at2 cells (fig. 4e) . 
the changes in centrality (pagerank and degree) of tfs with ageing was coherent with pathway enrichment results. such as etv5 which has higher degree and pagerank in older cells, is known to be stabilized by ras signalling in at2 cells [33] . in the absence of etv5 at2 cell differentiate to at1 cells [33] . another tf jun (c-jun) having stronger influence in old at2 cells, is known to regulate inflammation lung alveolar cells [34] . we also found jun to be having co-expression with jund and etv5 in old at2 cell (supplementary figure s4) . jund whose influence seems to increase in aged at2 cells is known to be involved in cytokine-mediated inflammation. among the tfs stat 1-4 which are involved in jak/stat signalling, stat4 showed higher degree and pagerank in old at2. androgen receptor(ar) also seem to have a higher influence in older at2 cells (fig. 4e ). androgen receptor has been shown to be expressed in at2 cells [35] . we further performed a similar analysis for the scrna-seq profile of interstitial macrophages(ims) in lungs and found literature support for the activity of enriched pathways (supporting file-5). whereas gene-set enrichment output for important genes in older ims had some similarity with results from at2 cells as both seem to have higher pro-inflammatory response pathway such as t cell activation and jak/stat signalling. however, unlike at2 cells, ageing in ims seem to cause an increase in glycolysis and pentose phosphate pathway. higher glycolysis and pentose phosphate pathway activity levels have been previously reported to be involved in the pro-inflammatory response in macrophages by viola et al. [36] . in our results, ras pathway was not enriched significantly for genes with a higher importance in older macrophages. such results show that the pro-inflammatory pathways activated due to aging could vary among different cell-types in lung. 
figure caption (fragment): for the same type of cells, the predicted networks for old and young cells seem to have higher overlap after graph-wavelet based filtering. the label "raw" here means that both networks (for old and young) were inferred using unfiltered scrna-seq profiles, whereas the same result from denoised scrna-seq profiles is shown as "filtered". networks were inferred using correlation-based co-expression.

in the current pandemic due to sars-cov-2, a trend has emerged that older individuals have a higher risk of developing severity and lung fibrosis than the younger population. since our analysis revealed changes in the influence of genes in lung cells due to ageing, we compared our results with expression profiles of lung infected with sars-cov-2 published by blanco-melo et al. [37] . recently it has been shown that at2 cells predominantly express ace2, the host cell surface receptor for sars-cov-2 attachment and infection [38] . thus covid infection could have its most dominant effect on at2 cells. we found that genes with significant up-regulation in sars-cov-2 infected lung also had higher pagerank in the gene-network inferred for older at2 cells (fig. 5a) . we also repeated the process of network inference and calculation of differential centrality among old and young using all types of cells in the lung together (supporting file-6). we performed gene-set enrichment for genes up-regulated in sars-cov-2 infected lung. the majority of the 7 panther pathway terms enriched for genes up-regulated in sars-cov-2 infected lung were also enriched for genes with higher pagerank in old lung cells (combined). in total, 6 out of 7 significantly enriched panther pathways for genes up-regulated in covid-19 infected lung were also enriched for genes with higher pagerank in older at2 cells in either of the two data-sets used here (5 in angelidis et al., 3 in kimmel et al. data-based results).
among the top 10 enriched wikipathway terms for genes up-regulated in covid infected lung, 7 has significant enrichment for genes with higher pagerank in old at2 cells (supporting file-7). however, the term type-ii interferon signalling did not have significant enrichment for genes with higher pagerank in old at2 cells. we further investigated enriched motifs of transcription factors in promoters of genes up-regulated in covid infected lungs (supplementary methods). for promoters of genes up-regulated in covid infected lung top two enriched motifs belonged to irf (interferon regulatory factor) and ets family tfs. notice that etv5 belong to sub-family of ets groups of tfs. further analysis also revealed that most of the genes whose expression is positively cor-related with etv5 in old at2 cells is up-regulated in covid infected lung. in contrast, genes with negative correlation with etv5 in old at2 cells were mostly down-regulated in covid infected lung. a similar trend was found for stat4 gene. however, for erg gene with higher pagerank in young at2 cell, the trend was the opposite. in comparison to genes with negative correlation, positively correlated genes with erg in old at2 cell, had more downregulation in covid infected lung. such trend shows that a few tfs like etv5, stat4 with higher pagerank in old at2 cells could be having a role in poising or activation of genes which gain higher expression level on covid infection. inferring regulatory changes in pure primary cells due to ageing and other conditions, using singlecell expression profiles has tremendous potential for various applications. such applications could be understanding the cause of development of a disorder or revealing signalling pathways and master regulators as potential drug targets. hence to support such studies, we developed gwnet to assist biologists in work-flow for graph-theory based analysis of single-cell transcriptome. 
gwnet improves inference of regulatory interactions among genes using a graph-wavelet based approach to reduce noise due to technical issues or cellular biochemical stochasticity in gene-expression profiles. we demonstrated the improvement in gene-network inference using our filtering approach with 4 benchmark data-sets from the dream5 consortium and several single-cell expression profiles. using 5 different ways of inferring networks, we showed how our approach for filtering gene-expression can help gene-network inference methods. our comparison with other imputation and smoothing methods and with graph-fourier based filtering showed that the graph-wavelet is more adaptive to changes in the expression level of genes with changing neighborhood of cells. thus graph-wavelet based denoising is a conceptually different approach for preprocessing of gene-expression profiles.

figure caption (fragment): shown for erg, which has higher pagerank in young at2 cells. most of the genes which had a positive correlation with etv5 and stat4 expression in old murine at2 cells were up-regulated in covid infected lung, whereas for erg the trend is the opposite. genes positively correlated with erg in old at2 cells had more down-regulation than genes with negative correlation. such results hint that tfs whose influence (pagerank) increases during ageing could be involved in activating or poising the genes up-regulated in covid infection.

there is a huge body of literature on inferring gene-networks from bulk gene-expression profiles and utilizing them to find differences among two groups of samples. however, applying classical procedures on single-cell transcriptome profiles has not proved to be effective. our method seems to resolve this issue by increasing consistency and overlap among gene-networks inferred using expression from different sources (batches) for the same cell-type, even if each data-set was filtered independently.
such an increase in overlap among predicted network from independently processed data-sets from different sources hint that estimated dependencies among genes reach closer to true values after graphwavelet based denoising of expression profiles. having network prediction closer to true values increases the reliability of comparison of a regulatory pattern among two groups of cells. moreover, recently chow and chen [39] have shown that age-associated genes identified using bulk expression profiles of the lung are enriched among those induced or suppressed by sars-cov-2 infection. however, they did not perform analysis with systems-level approach. our analysis highlighted ras and jak/stat pathways to be enriched for genes with stronger influence in old at2 cells and genes up-regulated in covid infected lung. ras/mapk signalling is considered essential for self-renewal of at2 cell [33] . similarly, jak/stat pathway is known to be activated in the lung during injury [40] and influence surfactant quality [32] . we have used murine aging-lung scrna-seq profiles however our analysis provides an important insight that regulatory patterns and master-regulators in old at2 cells are in such a configuration that they could be predisposing it for a higher level of ras and jak/stat signalling. androgen receptor (ar) which has been implicated in male pattern baldness and increased risk of males towards covid infection [41] had higher pagerank and degree in old at2 cells. however, further investigation is needed to associate ar with severity on covid infection due to ageing. on the other hand, in young at2 cells, we find a high influence of genes involved in histamine h1 receptor-mediated signalling, which is known to regulate allergic reactions in lungs [42] . another benefit of our approach of analysis is that it can highlight a few specific targets of further study for therapeutics. 
for example, jnk, a kinase that binds and phosphorylates c-jun, is being tested in clinical trials for pulmonary fibrosis [43] . androgen deprivation therapy has been shown to provide partial protection against sars-cov-2 infection [44] . along the same lines, our analysis hints that etv5 could also be considered as a drug target to reduce the effect of ageing-induced ras pathway activity in the lung.

we used the term noise in gene-expression according to its definition by several researchers such as raser and o'shea [12] : the measured level of variation in gene-expression among cells supposed to be identical. hence we first made a base-graph (network) in which supposedly identical cells are connected by edges. for every gene we use this base-graph and apply the graph-wavelet transform to get an estimate of the variation of its expression in every sample (cell) with respect to other connected samples at different levels of graph-spectral resolution. for this purpose, we first calculated distances among samples (cells). to get a better estimate of distances among samples (cells) one can perform dimension reduction of the expression matrix using tsne [45] or principal component analysis. we considered every sample (cell) as a node in the graph and connected two nodes with an edge only when one of them was among the k-nearest neighbours of the other. here we chose the value of k in the range of 10-50, based on the number of samples (cells) in the expression data-sets. thus we calculated the preliminary adjacency matrix using k-nearest neighbours (knn) based on the euclidean distance metric between samples of the expression matrix. we used this adjacency matrix to build the base-graph. thus each vertex in the base-graph corresponds to a sample and the edge weights to the euclidean distance between the connected samples.
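the base-graph construction described above can be sketched as follows. this is a minimal illustration and not the gwnet implementation: the function name `knn_adjacency`, the dense pairwise-distance computation and the absence of any dimension reduction step are assumptions made for brevity.

```python
import numpy as np

def knn_adjacency(expr, k=10):
    """Weighted kNN adjacency matrix for a samples x genes matrix.

    Hypothetical helper (not from the paper's code). Two samples are
    connected if either one is among the k nearest neighbours of the
    other; the edge weight is their Euclidean distance, as in the text.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    # Pairwise Euclidean distances via the expansion of ||a - b||^2.
    sq = np.sum(expr ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (expr @ expr.T)
    dist = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]     # indices of the k nearest samples
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    adj[rows, cols] = dist[rows, cols]
    # Symmetrize: keep the edge if i is a neighbour of j or j of i.
    # Caveat: two distinct samples at distance exactly 0 get weight 0,
    # which this dense representation cannot distinguish from "no edge".
    return np.maximum(adj, adj.T)
```

for large numbers of cells a sparse nearest-neighbour search would be preferable to the dense n x n distance matrix used in this sketch.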
the weighted graph $G$ built using the knn-based adjacency matrix comprises a finite set of vertices $V$ which correspond to cells (samples), a set of edges $E$ denoting connections between samples (where they exist), and a weight function which gives non-negative weighted connections between cells (samples). this weighted graph can also be described by an $N \times N$ ($N$ being the number of cells) weighted adjacency matrix $A$, where $A_{ij} = 0$ if there is no edge between cells $i$ and $j$, and $A_{ij} = \mathrm{weight}(i, j)$ if there exists an edge between $i$ and $j$. the degree of a cell in the graph is the sum of the weights of the edges incident on that cell. the diagonal degree matrix $D$ of this graph has entries $D_{ij} = d(i)$ if $i = j$ and 0 otherwise. the non-normalized graph laplacian operator $L$ for a graph is defined as $L = D - A$. the normalized form of the graph laplacian operator is defined as

$$L_{norm} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.$$

the two laplacian operators produce different eigenvectors [46] . we have used the normalized form of the laplacian operator for the graph between cells. the graph laplacian is further used for graph fourier transformation of signals on nodes (see supplementary methods) ([47] [46] ).

for filtering in the fourier domain, we used a chebyshev filter for the gene-expression profile. we took the expression of one gene at a time, considering it as a signal, and projected it onto the raw graph object (where each vertex corresponds to a sample) [17] . we took the forward fourier transform of the signal, filtered the signal using the chebyshev filter in the fourier domain, and then inverse transformed the signal to calculate the filtered expression. the same procedure was repeated for every gene, which finally gives the filtered gene-expression.

the spectral graph wavelet entails choosing a non-negative real-valued kernel function which can behave as a band-pass filter and is similar to the fourier transform. the re-scaled kernel function of the graph laplacian gives the wavelet operator, which eventually produces graph wavelet coefficients at each scale.
using continuous functional calculus, one can define a function of a self-adjoint operator on the basis of the spectral representation of the graph. for a graph with a finite-dimensional laplacian, this can be achieved using the eigenvalues and eigenvectors of the laplacian $L$ [47] . the wavelet operator is given by $T_g = g(L)$, and $T_g f$ gives the wavelet coefficients for a signal $f$ at scale 1. this operator acts on the eigenvectors $u_\ell$ as $T_g u_\ell = g(\lambda_\ell) u_\ell$. hence, for any graph signal, the operator $T_g$ acts on the signal by adjusting each graph fourier coefficient as

$$\widehat{T_g f}(\ell) = g(\lambda_\ell)\, \hat{f}(\ell),$$

with the inverse fourier transform given as

$$(T_g f)(m) = \sum_{\ell} g(\lambda_\ell)\, \hat{f}(\ell)\, u_\ell(m).$$

the wavelet operator at every scale $s$ is given as $T_g^s = g(sL)$. these wavelet operators are localized to obtain individual wavelets by applying them to $\delta_n$, with $\delta_n$ being a signal with 1 on vertex $n$ and zero otherwise [47] . thus, considering coefficients at every scale, the inverse transform can be obtained (see [47] ).

here, instead of filtering in the fourier domain, we took the wavelet coefficients of each gene-expression signal at different scales. thresholding was applied at each scale to filter the wavelet coefficients. we applied both hard and soft thresholding to the wavelet coefficients. for soft thresholding, we implemented the well-known methods sureshrink and bayesshrink. finding an optimal threshold for wavelet coefficients for denoising linear signals and images has remained a subject of intensive research. we evaluated both soft and hard thresholding approaches and tested an information-theoretic criterion known as the minimum description length (mdl) principle. using our tool gwnet, the user can choose from multiple options for finding the threshold, such as visushrink, sureshrink and mdl. here, we have used hard thresholding for most of the data-sets, as proper soft thresholding of graph-wavelet coefficients is itself a topic of intensive research and may need further fine-tuning.
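the filtering steps above (normalized laplacian, wavelet coefficients at several scales, hard thresholding per scale, inverse transform) can be sketched for a small graph as follows. this is an illustrative sketch under stated assumptions and not the gwnet code: it uses a full eigendecomposition, the kernels `exp(-x^2)` (low-pass) and `x * exp(-x)` (band-pass) are placeholder choices, the low-pass band is left unthresholded, and the inverse is the least-squares inverse of the stacked operators. the percentile cut-off plays the role of the hard threshold described in the text.

```python
import numpy as np

def normalized_laplacian(adj):
    """I - D^{-1/2} A D^{-1/2} for a symmetric non-negative adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])    # isolated nodes keep a zero scaling
    return np.eye(len(deg)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def graph_wavelet_denoise(adj, signal, scales=(1.0, 2.0, 4.0), percentile=70.0):
    """Hard-threshold graph-wavelet coefficients of one signal (e.g. one gene).

    Assumed, illustrative kernels: low-pass h(x) = exp(-x^2) and
    band-pass g(x) = x * exp(-x), evaluated on the Laplacian spectrum.
    """
    signal = np.asarray(signal, dtype=float)
    lam, U = np.linalg.eigh(normalized_laplacian(adj))
    lam = np.clip(lam, 0.0, None)            # remove tiny negative round-off
    kernels = [np.exp(-lam ** 2)] + [(s * lam) * np.exp(-s * lam) for s in scales]
    f_hat = U.T @ signal                     # graph Fourier transform
    coeffs = [U @ (k * f_hat) for k in kernels]   # g(sL) f for each band
    kept = [coeffs[0]]                       # low-pass band is not thresholded
    for c in coeffs[1:]:
        cut = np.percentile(np.abs(c), percentile)
        kept.append(np.where(np.abs(c) >= cut, c, 0.0))  # hard threshold
    # Least-squares inverse: every band is a function of L, so the normal
    # equations are diagonal in the eigenbasis of L.
    num = sum(k * (U.T @ c) for k, c in zip(kernels, kept))
    den = sum(k ** 2 for k in kernels)
    return U @ (num / den)
```

with `percentile=0` no coefficient is removed and the function returns the input signal up to round-off, which is a quick sanity check of the forward and inverse steps. for graphs with many cells, a polynomial (for example chebyshev) approximation of the kernels would replace the eigendecomposition.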
one can also use a hard-threshold value based on the best overlap between the predicted gene network and known protein-protein interactions (ppi). while applying it to multiple datasets, we found that the threshold cutoffs estimated by the mdl criterion and by the best overlap of the predicted network with known interactions and ppi were in the range of the 60th to 70th percentile. for comparing predicted networks from multiple datasets, we needed a uniform percentile cutoff to threshold graph-wavelet coefficients. hence, for uniform analysis of several datasets, we set the default threshold value to the 70th percentile: in default mode, wavelet coefficients with absolute value below the 70th percentile were set to zero. the gwnet tool is flexible, and any network inference method can be plugged into it for making regulatory inferences using a graph-theoretic approach. here, for single-cell rna-seq data, we used gene-expression values in the form of fpkm (fragments per kilobase of exon model per million reads mapped). we pre-processed single-cell gene expression by quantile normalization and log transformation. to start with, we used spearman and pearson correlation to obtain a simple estimate of the inter-dependencies among genes. we also used aracne (algorithm for the reconstruction of accurate cellular networks) to infer the network among genes. aracne first computes the mutual information for each gene pair. it then considers all possible triplets of genes and applies the data processing inequality (dpi) to remove indirect interactions. according to the dpi, if gene i and gene j do not interact directly with each other but show dependency via gene k, the following inequality holds: $I(g_i, g_j) \le \min(I(g_i, g_k), I(g_k, g_j))$, where $I(g_i, g_j)$ represents the mutual information between gene i and gene j. aracne also removes interactions with mutual information less than a particular threshold eps.
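the two pruning steps attributed to aracne above (the eps cutoff and the dpi) can be sketched in a few lines of python. this is an illustrative simplification under stated assumptions, not the aracne implementation: the mutual information matrix is taken as given, the function name is hypothetical, and the tolerance parameter of the original algorithm is omitted.

```python
from itertools import combinations


def dpi_prune(mi, eps):
    """Return the set of edges (i, j), i < j, kept after eps and DPI pruning.

    mi is a symmetric matrix (list of lists) of pairwise mutual information.
    Edges with MI below eps are dropped first. Then, in every fully connected
    triplet, the edge with the smallest MI is marked as indirect. Triplets are
    checked against the eps-filtered edge set, so marking an edge in one
    triplet does not hide it from the others.
    """
    n = len(mi)
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] >= eps}
    indirect = set()
    for i, j, k in combinations(range(n), 3):
        triplet = [(i, j), (i, k), (j, k)]
        if all(edge in edges for edge in triplet):
            weakest = min(triplet, key=lambda e: mi[e[0]][e[1]])
            indirect.add(weakest)
    return edges - indirect
```

for a chain in which gene 0 influences gene 2 only through gene 1, the weakest edge (0, 2) is the one removed, which is the behavior the inequality describes.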
we have used eps value to [...]. recently, skinnider et al. [6] showed the superiority of two measures of proportionality, rho ($\rho$) and phi ($\phi_s$) [48], for estimating gene-coexpression networks from single-cell transcriptome profiles. hence, we also evaluated the benefit of graph-wavelet based denoising of gene expression with the measures of proportionality $\rho$ and $\phi_s$. the measure of proportionality $\phi$ can be defined as $\phi(g_i, g_j) = \frac{\mathrm{var}(g_i - g_j)}{\mathrm{var}(g_i)}$, where $g_i$ is the vector containing log values of expression of gene i across multiple samples (cells) and $\mathrm{var}()$ represents the variance function. the symmetric version of $\phi$ can be written as $\phi_s(g_i, g_j) = \frac{\mathrm{var}(g_i - g_j)}{\mathrm{var}(g_i + g_j)}$, whereas rho can be defined as $\rho(g_i, g_j) = 1 - \frac{\mathrm{var}(g_i - g_j)}{\mathrm{var}(g_i) + \mathrm{var}(g_j)}$. to estimate both measures of proportionality, $\rho$ and $\phi$, we used the 'propr' package 2.0 [49]. the networks inferred from filtered and unfiltered gene expression were compared to the ground truth. the ground truth for the dream5 challenge dataset was already available, while for single-cell expression we assembled the ground truth from the hippie (human integrated protein-protein interaction reference) database [50]. we considered all possible edges in the network and sorted them by the significance of their edge weights. we calculated the area under the receiver operating curve for both raw and filtered networks by comparing against the edges in the ground truth. the receiver operating curve is a standard performance evaluation metric from the field of machine learning, and it has been used in the dream5 evaluation method with some modifications. the modification here is that, for the x-axis, we used the number of edges sorted according to their weights instead of the false-positive rate. for evaluation, all possible edges, sorted by their weights, were taken from the gene networks inferred from the filtered and raw graphs. we calculated the improvement as the fold change between the raw and filtered scores.
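to make the proportionality measures concrete, here is a small python sketch that computes $\phi$, $\phi_s$ and $\rho$ for two genes from their log-expression vectors, using the variance-based definitions commonly given for these measures. the study itself used the 'propr' r package; the function names below are hypothetical and the sketch assumes the inputs are already log-transformed.

```python
from statistics import variance


def _diff(gi, gj):
    """Element-wise difference of two equal-length log-expression vectors."""
    return [a - b for a, b in zip(gi, gj)]


def phi(gi, gj):
    """phi = var(gi - gj) / var(gi); 0 means perfectly proportional."""
    return variance(_diff(gi, gj)) / variance(gi)


def phi_s(gi, gj):
    """Symmetric phi = var(gi - gj) / var(gi + gj)."""
    total = [a + b for a, b in zip(gi, gj)]
    return variance(_diff(gi, gj)) / variance(total)


def rho(gi, gj):
    """rho = 1 - var(gi - gj) / (var(gi) + var(gj)); ranges from -1 to 1."""
    return 1.0 - variance(_diff(gi, gj)) / (variance(gi) + variance(gj))
```

two genes whose log expression differs only by a constant offset across cells give $\phi = \phi_s = 0$ and $\rho = 1$, since a constant offset in log space corresponds to a constant ratio in raw expression.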
we compared the results of our graph-wavelet based denoising approach with other methods meant for imputation or noise reduction in scrna-seq profiles. for comparison we used graph-fourier based filtering [17], magic [20], scimpute [21], dca [22], saver [23], randomly [24] and knn-impute [25]. brief descriptions of these methods and the corresponding parameters are given in the supplementary methods. the bulk gene-expression data used here for evaluation were downloaded from the dream5 portal (http://dreamchallenges.org/project/dream-5network-inference-challenge/). the single-cell expression profile of mesc generated using different protocols [18] was downloaded from the geo database (geo id: gse75790). the single-cell expression profile of pancreatic cells from individuals of different age groups was downloaded from the geo database (geo id: gse81547). the scrna-seq profile of murine aging lung published by kimmel et al. [14] is available with geo id: gse132901, while the aging lung scrna-seq data published by angelidis et al. [51] is available with geo id: gse132901. the code for graph-wavelet based filtering of gene expression is available at http://reggen.iiitd.edu.in:1207/graphwavelet/index.html. the codes are present at https://github.com/reggenlab/gwnet/ and supporting files are present at https://github.com/reggenlab/gwnet/tree/master/supporting_files.
[1] an integrative approach for causal gene identification and gene regulatory pathway inference
[2] single-cell transcriptomics unveils gene regulatory network plasticity
[3] chemogenomic profiling of plasmodium falciparum as a tool to aid antimalarial drug discovery
[4] supervised, semi-supervised and unsupervised inference of gene regulatory networks
[5] reverse engineering cellular networks
[6] evaluating measures of association for single-cell transcriptomics
[7] evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data
[8] scenic: single-cell regulatory network inference and clustering
[9] scode: an efficient regulatory network inference algorithm from single-cell rna-seq during differentiation
[10] gene regulatory network inference from single-cell data using multivariate information measures
[11] characterizing noise structure in single-cell rna-seq distinguishes genuine from technical stochastic allelic expression
[12] noise in gene expression: origins, consequences, and control, science
[13] comparative assessment of differential network analysis methods
[14] murine single-cell rna-seq reveals cell identity- and tissue-specific trajectories of aging
[15] wisdom of crowds for robust gene network inference
[16] genenetweaver: in silico benchmark generation and performance profiling of network inference methods
[17] enhancing experimental signals in single-cell rna-sequencing data using graph signal processing
[18] comparative analysis of single-cell rna sequencing methods
[19] a gene regulatory network in mouse embryonic stem cells
[20] recovering gene interactions from single-cell data using data diffusion
[21] an accurate and robust imputation method scimpute for single-cell rna-seq data
[22] single-cell rna-seq denoising using a deep count autoencoder
[23] saver: gene expression recovery for single-cell rna sequencing
[24] a random matrix theory approach to denoise single-cell data
[25] missing value estimation methods for dna microarrays
[26] single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns
[27] enrichr: interactive and collaborative html5 gene list enrichment analysis tool
[28] histamine stimulation of surfactant secretion from rat type ii pneumocytes
[29] aging impairs vegf-mediated, androgen-dependent regulation of angiogenesis
[30] dysfunction of pulmonary surfactant mediated by phospholipid oxidation is cholesterol-dependent
[31] age-dependent changes in the pulmonary renin-angiotensin system are associated with severity of lung injury in a model of acute lung injury in rats
[32] mapk and jak-stat signaling pathways are involved in the oxidative stress-induced decrease in expression of surfactant protein genes
[33] transcription factor etv5 is essential for the maintenance of alveolar type ii cells, proceedings of the national academy of sciences of the united states of
[34] targeted deletion of jun/ap-1 in alveolar epithelial cells causes progressive emphysema and worsens cigarette smoke-induced lung inflammation
[35] androgen receptor and androgen-dependent gene expression in lung
[36] the metabolic signature of macrophage responses
[37] imbalanced host response to sars-cov-2 drives development of covid-19
[38] single cell rna sequencing of 13 human tissues identify cell types and receptors of human coronaviruses
[39] the aging transcriptome and cellular landscape of the human lung in relation to sars-cov-2
[40] jak-stat pathway activation in copd, the european
[41] androgen hazards with covid-19
[42] the h1 histamine receptor regulates allergic lung responses
[43] late breaking abstract - evaluation of the jnk inhibitor, cc-90001, in a phase 1b pulmonary fibrosis trial
[44] androgen-deprivation therapies for prostate cancer and risk of infection by sars-cov-2: a population-based study (n = 4532)
[45] visualizing data using t-sne
[46] discrete signal processing on graphs: frequency analysis
[47] wavelets on graphs via spectral graph theory
[48] how should we measure proportionality on relative gene expression data?
[49] propr: an r-package for identifying proportionally abundant features using compositional data analysis
[50] hippie v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks
[51] an atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics
we thank dr gaurav ahuja for providing us valuable advice on the analysis of single-cell expression profiles of ageing cells.
none declared.
vibhor kumar is an assistant professor at iiit delhi, india. he is also an adjunct scientist at genome institute of singapore. his interests include genomics and signal processing.
divyanshu srivastava completed his thesis on graph signal processing for his master's degree at the computational biology department in iiit delhi, india. he has applied graph signal processing to protein structures and gene-expression datasets.
shreya mishra is a phd student at the computational biology department in iiit delhi, india. her interests include data sciences and genomics.
• we found that graph-wavelet based denoising of gene-expression profiles of bulk samples and single cells can substantially improve gene-regulatory network inference.
• more consistent prediction of gene networks due to denoising led to reliable comparison of predicted networks from old and young cells to study the effect of ageing using single-cell transcriptomes.
• our analysis revealed biologically relevant changes in regulation due to aging in lung pneumocyte type ii cells, which had similarity with the effects of covid infection in human lung.
• our analysis highlighted influential pathways and master regulators which could be topics of further study for reducing severity due to ageing.
key: cord-252859-zir02q69 authors: chung, t. philip; laramie, jason m.; meyer, donald j.; downey, thomas; tam, laurence h.y.; ding, huashi; buchman, timothy g.; karl, irene; stormo, gary d.; hotchkiss, richard s.; cobb, j.
perren title: molecular diagnostics in sepsis: from bedside to bench date: 2006-09-11 journal: j am coll surg doi: 10.1016/j.jamcollsurg.2006.06.028 sha: doc_id: 252859 cord_uid: zir02q69 background: based on recent in vitro data, we tested the hypothesis that microarray expression profiles can be used to diagnose sepsis, distinguishing in vivo between sterile and infectious causes of systemic inflammation. study design: exploratory studies were conducted using spleens from septic patients and from mice with abdominal sepsis. seven patients with sepsis after injury were identified retrospectively and compared with six injured patients. c57bl/6 male mice were subjected to cecal ligation and puncture, or to ip lipopolysaccharide. control mice had sham laparotomy or injection of ip saline, respectively. a sepsis classification model was created and tested on blood samples from septic mice. results: accuracy of sepsis prediction was obtained using cross-validation of gene expression data from 12 human spleen samples and from 16 mouse spleen samples. for blood studies, classifiers were constructed using data from a training data set of 26 microarrays. the error rate of the classifiers was estimated on seven de-identified microarrays, and then on a subsequent cross-validation for all 33 blood microarrays. estimates of classification accuracy of sepsis in human spleen were 67.1%; in mouse spleen, 96%; and in mouse blood, 94.4% (all estimates were based on nested cross-validation). lists of genes with substantial changes in expression between study and control groups were used to identify nine mouse common inflammatory response genes, six of which were mapped into a single pathway using contemporary pathway analysis tools. conclusions: sepsis induces changes in mouse leukocyte gene expression that can be used to diagnose sepsis apart from systemic inflammation. 
the incidence of sepsis and the number of sepsis-related deaths are increasing at substantial cost ($17 billion annually in the us), 1 as reported recently. the cornerstone of successful therapy is early, accurate diagnosis, coupled with eradication of the source of infection and appropriate antibiotic therapy. 2 unfortunately, efforts to identify specific and sensitive diagnostic markers for sepsis have met with limited success. 3 the diagnosis of sepsis in icus is especially challenging because it is frequently difficult to distinguish between systemic inflammation and systemic infection. clinicians are unable to identify the pathogen responsible for sepsis in up to 50% of patients or to determine whether patients are responding to antibiotic therapy 1 ; and the traditional means of identifying the organism responsible for bacterial infections are nonspecific (gram stain), slow (culture), or insensitive (both gram stain and culture). there is compelling evidence that efforts to identify clinically important biomarkers in sepsis will ultimately prove successful, 4-6 especially given the recent successes of high-throughput, genomic technologies in other fields that face diagnostic challenges (eg, leukemia, 7 breast 8, 9 and colon cancer 10 ). this molecular classification strategy involves searching for expression patterns in a subset of genes from tissues of known phenotype (the "training" data set), constructing a prediction rule, and then using these "biomarker" genes to predict the phenotype of new samples (the "test" data set). studies that apply these genome-wide technologies to inflammation and sepsis in animal models and patients are now underway, [11] [12] [13] [14] as reviewed recently. 15, 16 these reports suggest that genome-wide profiling of gene expression holds promise as a molecular diagnostic tool, capable of generating profiles from leukocytes that are sensitive, specific, and timely for pathogen detection.
13 despite provocative in vitro findings, 4, 5 there are few reports of using microarrays to study sepsis in vivo. these reports indicate that the transcriptional response during polymicrobial sepsis is organ-specific in mice 12 and rats. 11 there are no reports of using microarrays to make a diagnosis of sepsis, although very recent reports suggest that this will be feasible in patients. 17, 18 we hypothesized that leukocyte gene expression profiles obtained using dna microarrays could be used to predict septic states; in particular, distinguishing between sterile and infectious sources of systemic inflammation, a common conundrum in caring for the critically ill or injured. tissue samples from both septic patients and clinically relevant mouse models of sepsis were tested. using an investigational protocol approved by the washington university human studies committee, informed consent was obtained to collect samples of splenic tissue intraoperatively or immediately postmortem as described previously. 19 seven specimens for expression profiling were chosen retrospectively from patients with injury (trauma or operation) complicated by sepsis and organ dysfunction (sepsis group). these were compared with six age- and gender-matched control specimens from patients with injury (trauma) requiring splenectomy (injury group). a total of 13 patient spleen samples were collected. care and use of laboratory animals were conducted in accordance with a protocol approved by the washington university animal studies committee. seven- to nine-week-old, male c57bl/6 mice were purchased (harlan, inc) and allowed to acclimatize for at least 1 week. male mice were used to avoid the confounding effect of the female estrous cycle and the well-characterized sexual dimorphism of the adaptive response to cecal ligation and puncture (clp).
20 the seven experimental groups were designed to make classification difficult, reflecting clinically important distinctions relevant to care of icu patients: lethal abdominal inflammation from a sterile source, lethal abdominal infection, and antibiotic-treated abdominal infection. mice were assigned to the seven treatments listed in table 1, grouped into those with no deaths (controls) and those that were "sick" with substantial associated mortality (sterile or infectious causes of systemic inflammation). normal animals were untreated. previously reported protocols were used to perform clp and sham laparotomy. 21, 22 the ceca of some animals were punctured twice using a 25-gauge needle (clp), and others were punctured using a 27-gauge needle to produce a "milder" sepsis that caused much lower mortality (mild clp). animals that had laparotomy and cecal manipulation without ligation or puncture were included in the sham laparotomy group (sham). another group of animals had clp and treatment with a standard antibiotic regimen for mouse abdominal sepsis: ip ceftriaxone 6.4 mg/kg and metronidazole 3.5 mg/kg, delivered 1, 12, and 24 hours after clp (clp + antibiotics group). 23 to induce severe systemic inflammation without infection, ip injection of escherichia coli lipopolysaccharide (lps) serotype 0111:b4 20 mg/kg (sigma, inc) was performed (lps group). 24 the dose of lps (20 mg/kg) was used because it reliably produces death over several days in the animals; smaller doses in pilot experiments tended to produce morbidity without mortality. mice injected ip with normal saline vehicle (saline group) acted as the concurrent control group for lps treatment. the census of surviving mice was recorded at 24-hour intervals for 7 days.
abbreviations and acronyms: clp = cecal ligation and puncture; fdr = false discovery rate; est = expressed sequence tag; ip = intraperitoneal; lps = lipopolysaccharide; pca = principal components analysis.
in a separate cohort, mice were sacrificed at 24 hours after injury by cervical dislocation under halothane general anesthesia. the 24-hour time point after injury was chosen because pilot experiments showed that the ability to distinguish between spleen expression profiles was greater at 24 hours than at 48 or 72 hours after injury. 25 whole spleen tissue from eight clp and eight sham mice was harvested, flash-frozen in liquid nitrogen, and stored at -80°c. in another cohort of animals, intracardiac blood from eight animals per group was pooled using the paxgene blood rna system to derive total rna from whole blood (preanalytix gmbh). five genechips from blood were prepared for each of the seven groups, giving a total of 35 mouse blood genechips from 280 animals: 28 genechips were in the training data set and 7 genechips were in the test (de-identified) data set. whole blood was collected for automated white blood cell counts from eight animals in each group over two replicates, each run performed with samples from concurrent control animals. a hemavet multispecies hematology analyzer (cdc technologies) provided counts and an automated differential. data were expressed as the mean ± sem. total rna from human and mouse spleens was extracted per trizol protocol (life technologies, inc). total rna from mouse blood was extracted using the paxgene kit as recommended by the manufacturer. crna target for genechip hybridization was prepared from total rna using the protocol recommended (affymetrix, inc). the human spleen crna was hybridized with the human full-length genechip (approximately 7,000 probe sets). mouse spleen and blood crna were hybridized with the u74av2 genechip (approximately 12,400 probe sets). fluorescent hybridization signal was detected using a genearray scanner (hewlett-packard, inc) at the washington university alvin j siteman cancer center and, for the prospective mouse cohort, the genechip scanner 3000 (affymetrix).
the data discussed in this publication have been deposited in ncbi's gene expression omnibus (geo, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through geo series accession number gse5663.
tests for differential expression, class prediction, and pathway analysis
expression values were calculated from genechip .cel files using dna chip analyzer software (version 1.3) and default settings (only perfect match probes were used). 26 principal components analysis (pca) was used to visually explore global effects for genome-wide trends, unexpected effects, and outliers in the expression data (partek software). differentially expressed genes were identified using a mixed-model anova and linear contrasts. we next determined whether it was possible to accurately distinguish tissue samples resulting from models of sterile systemic inflammation or models of systemic infection in human spleen, mouse spleen, and mouse blood. to produce unbiased estimates of prediction accuracy while also optimizing the set of predictor genes, classifier type, and classifier parameters, we used a two-level, nested cross-validation procedure that produces prediction estimates that are not biased by gene, classifier, and parameter selection. this procedure makes use of an "outer" cross-validation to produce the estimate of accuracy, and an "inner" cross-validation to perform classifier and gene selection. 27 for class prediction, we compared k-nearest neighbor, nearest centroid, and linear and quadratic discriminant analysis classifiers. two complementary programs were used to query regulatory networks: pathwayassist (ariadne genomics) and pathway analysis (ingenuity systems). both pathwayassist and pathway analysis use the published literature or publicly available databases to identify relationships between genes, small molecules, or other objects.
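the two-level procedure described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' software: it uses only a nearest centroid classifier, ranks genes by the absolute difference in class means, tunes only the number of genes, uses leave-one-sample-out at both levels, and expects class labels coded as 0 and 1 with at least three samples per class. all function names are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def rank_genes(X, y):
    """Gene indices sorted by absolute difference in class means, largest first."""
    c0 = centroid([x for x, lab in zip(X, y) if lab == 0])
    c1 = centroid([x for x, lab in zip(X, y) if lab == 1])
    return sorted(range(len(c0)), key=lambda g: -abs(c0[g] - c1[g]))


def fit_predict(X_train, y_train, x_new, n_genes):
    """Select genes on the training data only, then classify x_new."""
    genes = rank_genes(X_train, y_train)[:n_genes]
    point = [x_new[g] for g in genes]
    best_label, best_dist = None, None
    for label in (0, 1):
        rows = [[x[g] for g in genes] for x, lab in zip(X_train, y_train) if lab == label]
        center = centroid(rows)
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, n_genes):
    """Inner loop: leave-one-out accuracy for a fixed number of genes."""
    hits = 0
    for i in range(len(X)):
        hits += fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], n_genes) == y[i]
    return hits / len(X)


def nested_loo(X, y, grid=(1, 2, 5)):
    """Outer loop: the held-out sample never influences gene or parameter choice."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        best_k = max(grid, key=lambda k: loo_accuracy(X_tr, y_tr, k))
        hits += fit_predict(X_tr, y_tr, X[i], best_k) == y[i]
    return hits / len(X)
```

the key design point is that gene ranking happens inside `fit_predict`, so it is repeated on every training fold; ranking genes once on the full data set before cross-validation would leak information from the held-out sample and inflate the accuracy estimate.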
this information in turn is used to map de novo, putative, interaction networks from a given list of input genes. survival data were calculated using the kaplan-meier method and survival curves were compared using the logrank test (prism v3.03, graphpad software, inc). cbc and cell differential data were analyzed by one-way anova using the tukey-kramer post-test to correct for multiple comparisons (prism v3.03). modified multiple organ dysfunction syndrome (mods) scores 28 are reported as the mean ± sem and were analyzed using the mann-whitney u test for nonparametric data (prism v3.03). the significance of change in mouse blood relative rna abundance was measured across all seven groups using two-way anova for the effects of treatment and batch (time). from this analysis, a step-up false discovery rate (fdr) of 0.1 was used to identify a subset of discriminating genes for treatment effect, visualized using pca. 29 for pair-wise group comparisons across mouse blood, a two-way anova for the effects of treatment and batch was used. because there was no batch effect for spleen, one-way anova was used. an fdr of 0.1 was applied to the raw p value for each pair-wise comparison, giving a list of informational genes for each comparison. the characteristics of the 13 patients sampled are found in table 2. none of the injury (control) patients had positive cultures or signs of infection or sepsis at the time of sample collection, and all but one recovered uneventfully after splenectomy. in contrast, all sepsis patients had positive bacterial or fungal cultures or obvious signs of infection (pus or necrotizing fasciitis) complicated by sepsis-induced organ dysfunction at the time of operation (modified mods score, p = 0.001 versus injury). dna chip analyzer hybridization signal analysis using default filters flagged 1 of 13 human genechips as a statistical outlier (> 15% probe pair expression values calculated as statistical outliers, suggesting unreliable hybridization signal).
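a step-up fdr procedure of the kind mentioned above can be sketched as follows. the python below assumes that "step-up" refers to the benjamini-hochberg procedure, which the text does not state explicitly; the function name is hypothetical and the input is a plain list of raw p values, one per gene.

```python
def step_up_fdr(pvals, fdr=0.1):
    """Return sorted indices of hypotheses rejected by a Benjamini-Hochberg step-up.

    The p values are ranked from smallest to largest. The largest rank k with
    p_(k) <= fdr * k / m is found, and all hypotheses with rank <= k are
    rejected, even those that missed their own rank-specific cutoff.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= fdr * rank / m:
            largest_rank = rank
    return sorted(order[:largest_rank])
```

with thousands of probe sets and only a handful of arrays per group, a per-gene cutoff of this kind is much less conservative than a bonferroni correction, which is consistent with the study using fdr for gene lists and bonferroni only to isolate a small, stringent set.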
this sample (patient 7) was omitted from additional analysis. pca was used to explore differences in gene expression for all genes across the remaining 12 microarrays, demonstrating considerable differences between the sepsis and injury classes (data not shown). consistent with the large variance in expression observed across subjects, there were no genes identified as differentially expressed between these two classes using an fdr of 0.1. to estimate sepsis class prediction accuracy among these samples, a 12 × 11 leave-one-sample-out nested cross-validation was performed. the average predictive accuracy was 67.1%. all control group animals (saline, sham, normal) survived the duration of the experiment. in contrast, "sick" animals with systemic infection (clp, clp + antibiotics, mild clp, and lps) exhibited typical signs of piloerection, anorexia, and lethargy followed by 7-day mortality rates of 78%, 25%, 44%, and 100%, respectively (fig. 1). absolute white blood cell counts varied among the seven experimental groups (p < 0.0001), in particular between control and sick animals (fig. 2). there was no difference among animals with sterile or infectious causes of systemic inflammation (lps, clp, mild clp, or clp + antibiotics). likewise, there was no difference in cell differentials among controls or among animals with sterile or infectious causes of systemic inflammation, except for lps versus clp + antibiotics (neutrophils and lymphocytes, p < 0.05).
mouse differential gene expression
dna chip analyzer hybridization signal analysis flagged 2 of the 35 mouse blood genechips as statistical outliers: one saline and one sham genechip from the training data set. these outlier genechips were omitted from additional analysis. thirty-three genechips remained in the mouse blood data set. pca was used to visualize treatment differences in expression for all genes in blood and spleen, demonstrating batch and any-clp treatment effects (p < 0.05, fig. 3).
we applied a stringent multiple test correction (bonferroni, 0.05) to the p values from the two-way anova to identify a small set of genes in blood differentially expressed between the seven treatment groups (25 probe sets corresponding to 24 genes, as probe sets for lipocalin 2 appeared twice [table 3]). 30 pca of these 25 probe sets revealed that the seven experimental groups were clustered into three apparent phenotypes (fig. 4): control animals, lps-treated animals (sterile source of systemic inflammation), and those that had any clp treatment (sepsis). comparisons of gene expression across groups generated several informational gene lists (fdr = 0.1); each indicated apparent increases and decreases in gene expression induced by clp or lps (fig. 5). using mouse spleen samples for gene expression analysis, we were able to classify the samples as clp or sham with 96.0% accuracy, estimated using cross-validation. for the blood data, the experimental design dictated that the first four replicates (batches) were used to train the classifiers. the best classifiers were trained on all 26 training samples (batches 1 to 4) and used to predict the seven de-identified test samples (batch 5). all seven test samples were predicted correctly as any-clp (septic) versus non-clp. we also performed fivefold leave-one-batch-out cross-validation, which produced an overall accuracy estimate of 94.4% (fig. 6). the prediction accuracy differentiating lps from controls was 93.2%. the ability to predict low versus high mortality after clp was substantially lower (62.4%). to obtain the final mouse blood classifier, leave-one-batch-out cross-validation was performed for the purpose of classifier selection. sixty-four classifiers tied for best prediction. the median number of genes for these 64 classifiers was 450. of these 450 genes, only 61 genes showed changes greater than twofold, and the majority (86.4%) were altered by twofold or less.
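leave-one-batch-out cross-validation differs from leave-one-sample-out only in how the folds are formed: every sample hybridized in the same batch is held out together. a minimal python sketch of the fold construction, with a hypothetical function name and batch labels assumed to be supplied as one value per sample:

```python
def leave_one_batch_out(batches):
    """Yield (held_out_batch, train_indices, test_indices) for each batch label.

    batches holds one batch label per sample, in sample order. Holding out
    whole batches keeps batch-specific technical effects out of the training
    data for the fold in which that batch is tested.
    """
    for held_out in sorted(set(batches)):
        train = [i for i, b in enumerate(batches) if b != held_out]
        test = [i for i, b in enumerate(batches) if b == held_out]
        yield held_out, train, test
```

with five batches this produces the five folds implied by the "fivefold leave-one-batch-out" design; a classifier would be trained on `train` and scored on `test` within each fold.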
there were nine genes that demonstrated increased rna abundance across spleen and blood and across clp and lps (fig. 5). we call this cluster of genes the "common inflammatory response cluster" (table 4). pathway analysis tools also were used to put this list into a biologic (functional) context. all but one of these genes is annotated and has been associated previously with a gene product, small molecule, or cellular process linked to inflammation (fig. 7). a single expressed sequence tag (est) completed the list of nine genes (genechip identifier, 99849_at). a blast search identified this sequence as retroviral. a search of the ncbi gene expression omnibus microarray database showed that rna for this affymetrix probe set is increased in a number of models of inflammation, both animal and human.
figure caption (fragment): [...] , saline, sham, and normal) is apparent, but less notable than the batch effect (appearing on pc3 and pc12, explaining 10% + 2% of variance, respectively). no pcs besides pc3 and pc12 showed a substantial difference between any-clp and non-clp groups (determined by t-test on each pc). note that the genechip scanner used for batches 1 to 4 was different than that used for batch 5 (latest generation), which is one explanation why batch 5 is most different from the other four batches.
the ability to diagnose sepsis more accurately would allow appropriate treatment to be instituted earlier, thereby improving the likelihood of survival. 31, 32 we hypothesized that expression profiles could distinguish between septic and nonseptic states in vivo, and that expression profiles could define lists of common responder genes using a systematic, unbiased approach. 15, 16 recent reports using microarray technology in vitro indicated that inflammatory and infectious insults produce distinct transcriptional signatures.
4, 5 the current study is the first examining the ability of microarray gene expression profiles to distinguish sterile from infectious causes of systemic inflammation in vivo. we examined gene expression profiles of splenic tissue from patients with injury versus those with injury complicated by sepsis and organ dysfunction. differences in apparent gene expression between the sepsis and injury (control) phenotypes were used to construct a classifier, the accuracy estimate of which was 67.1%. this small, exploratory clinical study provided "bedside" proof of feasibility using human transcriptional profiles to model the septic phenotype, but also demonstrated a large degree of variance in gene expression between subjects, because of both technical and biologic differences. to control more of this variance we moved from the bedside to the bench, performing a systematic examination of the diagnostic potential of spleen and blood gene expression profiling in inbred mice. consistent with the human data, spleen samples from mice after clp exhibited microarray patterns that could be modeled to predict the septic phenotype. the nested cross-validation accuracy estimate of 96.0% for sepsis prediction using mouse spleen was considerably better than that found using human spleen, likely because of the mouse experimental design that exploited fresh tissues and identical age, gender, and genotype across subjects. because the clinical use of gene expression analysis using splenic tissue is severely limited, we explored next the use of circulating blood for class prediction in our mouse models. the combined accuracy of the predictions for any-clp versus non-clp and lps versus controls was high at 94.4% and 93.2%, respectively. the accuracy estimate for distinguishing between the high and low mortality clp groups (clp versus mild clp and clp plus antibiotics) at this 24-hour time point was much less at 62.4%. 
table 3. abx, antibiotics.

these conclusions are consistent with the pca analysis in figure 4, in which we sought differences in apparent gene expression across the seven experimental groups. interestingly, the only samples that were cluster outliers were in the clp plus antibiotics group, consistent with an effect of antibiotics to change the septic phenotype toward the control phenotypes. we conclude that circulating blood gene expression profiles can be used to predict clp and non-clp phenotypes in prospective cohorts, in particular, distinguishing controls from lethal endotoxicosis (lps) from lethal infection (clp). it is important to note that at this 24-hour time point, there were no substantial differences between lps or any of the clp groups in clinical presentation or complete blood counts. microarray analysis could make the diagnosis of sepsis (distinguishing between sterile and infectious sources of systemic inflammation) when clinical criteria and white blood cell counts could not, a frequent occurrence in icus. 33 in addition, the differences between groups as measured by absolute fold-changes in individual gene expression were small (eg, in the final model for blood, 86.4% of genes were altered less than twofold), yet the changes in the patterns of expression across hundreds of genes were robust predictors of phenotype. to discover changes in expression generic to the inflammatory and septic responses, we used the intersection of gene lists identified by the pair-wise group comparisons (fig. 5). nine genes were commonly increased, validated across two tissue types (spleen and blood) and two insults (clp and lps). in contrast, there were no genes that were commonly decreased.
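as a minimal sketch of the intersection step described above (not the authors' actual pipeline), the following python snippet takes one list of increased genes per tissue/insult comparison and keeps only the genes present in every list. the helper name `common_genes`, the dictionary layout, and the toy gene lists (including the `placeholder_*` entries) are hypothetical and are not data from the study.

```python
def common_genes(gene_lists):
    """Return the genes present in every list, in the order of the first list."""
    if not gene_lists:
        return []
    shared = set(gene_lists[0])
    for genes in gene_lists[1:]:
        shared &= set(genes)  # keep only genes seen in all lists so far
    return [g for g in gene_lists[0] if g in shared]


# Hypothetical toy input: genes called "increased" in each pair-wise comparison.
increased = {
    ("spleen", "clp"): ["calgranulin_a", "lipocalin_2", "irf7", "placeholder_1"],
    ("spleen", "lps"): ["lipocalin_2", "calgranulin_a", "irf7", "placeholder_2"],
    ("blood", "clp"): ["irf7", "calgranulin_a", "lipocalin_2", "placeholder_3"],
    ("blood", "lps"): ["calgranulin_a", "irf7", "lipocalin_2"],
}

common = common_genes(list(increased.values()))
print(common)
```

a gene that is increased in only some of the comparisons (the `placeholder_*` entries here) drops out, which is why such an intersection tends to yield a short list.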
given that this list of nine genes was based on changes in relative rna abundance across a number of cell types, the network analysis performed served as an exploratory tool, validating in silico the role of six of nine genes in canonical pathways for inflammation, apoptosis, and signal transduction: inhibitor of dna binding 2, calgranulin a and b, interferon regulatory factor 7, lipocalin 2, and formyl peptide receptor-like 1 (table 4). several characteristics of these nine common inflammatory response genes are notable. inhibitor of dna binding 2 is required for normal mouse immune development, especially of lymph nodes and peyer's patches. 34 calgranulin a and b, which belong to a recently described group of proinflammatory molecules, form extracellular complexes that bind to and activate endothelial cells, promoting chemotaxis and phagocytic adhesion in a positive feedback manner. 35, 36 interferon regulatory factor 7 is a key regulator of monocyte development, essential to differentiation of monocytes to macrophages. 37 lipocalin 2 is a secreted protein that undergoes transcriptional induction after cytokine withdrawal and induces leukocyte-specific apoptosis. 38 lipocalin 2 transcription, translation, and secretion are induced by ligation of toll-like receptors on leukocytes, with secreted lipocalin 2 acting to sequester siderophores, thereby limiting bacterial growth. 39 formyl peptide receptor-like 1 is a member of the chemoattractant subfamily of g protein-coupled receptors that are involved in controlling leukocyte migration. 40 the other two annotated genes not listed in the network, neutrophilic granule protein and serum amyloid a3, have also been associated with inflammation and cellular defense, although less is known about their functions and protein interactions. neutrophilic granule protein is a cysteine protease inhibitor that has been associated with myeloid differentiation.
41 serum amyloid a3 is a high-density apolipoprotein, the only amyloid made by both hepatocytes and peripheral monocytes and macrophages. 42 it is believed to function by retargeting transported lipids, including cholesterol, in the disposal of toxins. 42 the function of the ninth gene, a retroviral species that is increased in a number of different models of infection and inflammation, is not known. we compared this common inflammatory response cluster with the list of proteins recently reported to diagnose intra-amniotic infection in patients, 18 as another means of validating the importance of these nine genes. of the 11 proteins and polypeptides detected in that study, 3 were also identified in our study, specifically calgranulin a and b and lipocalin 2 (neutrophil gelatinase-associated lipocalin). calgranulin a (s100a8) was also the most differentially expressed gene in blood in a small microarray study comparing eight septic patients with four surgical controls without systemic inflammation. 17 serum amyloid a protein was identified as a plasma proteome biomarker in patients with coronavirus (severe ards). 43 together, these studies support our hypothesis that there is a group of common inflammatory response genes that can be used as novel biomarkers to diagnose inflammation across species, tissue, and different types of infecting organisms, at either the rna or protein level. because of the large degree of variation (noise) in human spleen gene expression, no individual genes surveyed passed the fdr filter. nevertheless, the data from these patients have proved invaluable for the study of immune dysfunction in human sepsis 2 and provided proof-of-principle here that molecular profiles of human lymphoid tissues could be used to distinguish between septic and nonseptic phenotypes.
because cellular populations of mammalian tissues are heterogeneous, use of microarray profiles to study the cellular response to a given stimulus must be understood in the context of changes in cell populations. this substantially limits conclusions about whole spleen data, given our reports and those of others that sepsis accelerates splenocyte apoptosis. 2, 19, 44 in contrast, changes in blood cellular heterogeneity are measured routinely. although clp and lps stimuli changed both absolute wbcs and cell differentials compared with the controls, among the four groups of "sick" animals cell counts were indistinguishable (fig. 2) . this mirrors the clinical situation where differentiating between sterile (lps) and infectious (any-clp) sources of systemic inflammation is not possible based on clinical grounds or cell counts. microarray profiles, as we have discussed, were very successful at making this distinction. many questions remain unanswered. what are the optimal computational methods to identify robust predictors from microarray or proteomic data? can gene or protein expression profiles be used to diagnose sepsis in other animal models? if so, are there predictive gene sets that are common to different types of infection (eg, gram-positive versus gram-negative infections)? once the diagnosis has been made, are there markers that indicate response to therapy or prognosis? there are sufficient preclinical and preliminary patient data to justify testing these hypotheses. because of the heterogeneity of expression profiles, large-scale collaborative studies will be required to enroll sufficient patients to identify robust sepsis markers, and in the process, untangle the biology of infection from inflammation. 
13

back to the bedside: the promise of molecular profiling for sepsis diagnosis

in conclusion, our in vivo data corroborate in vitro findings indicating that microarray analysis holds promise as a means of identifying distinct expression profiles ("molecular fingerprints") that can diagnose the septic phenotype. our human spleen data join recent blood data from septic patients 17 and serum and amniotic fluid data from patients with intra-amniotic fluid infection, 18 indicating that transcriptome and proteome studies will deliver on the promise of novel inflammation diagnostics. 15, 16 a single inflammation gene, calgranulin a (s100a8), was detected in all three studies at either the rna or protein level. our data are unique in that they show that transcriptome molecular profiles can distinguish between sterile and infectious causes of systemic inflammation and can make a diagnosis of sepsis in prospective cohorts. importantly, we observed that the magnitude of change in gene expression that was needed to predict the septic phenotype was very small. it was the pattern of these small changes in expression that was predictive, not the magnitude of any single change. we and others reported recently validated clinical protocols for blood gene expression profiling used to characterize the human systemic inflammatory response. 13, 14 the data presented here suggest that these protocols should be extended to clinical trials, testing the efficacy of microarray gene expression profiling to diagnose human sepsis. we expect that these studies will provide new insight into how specific pathogens uniquely perturb the physiology of circulating leukocytes and how the host successfully mounts pathogen-specific defenses.

mouse common inflammatory response cluster. nine probe sets (red genes) were commonly altered regardless of tissue or insult.
a contemporary pathway analysis tool was used to automatically create a network of interactions among these nine genes, based on previously reported interactions in the literature. eight of the nine probe sets are known inflammation genes; the ninth probe set is an expressed sequence tag. this network validates in silico that six of these genes are involved in canonical pathways of inflammation, apoptosis, and regulation of signal transduction.

1. the epidemiology of sepsis in the united states from 1979 through 2000
2. the pathophysiology and treatment of sepsis
3. sccm/esicm/accp/ats/sis international sepsis definitions conference
4. the plasticity of dendritic cell responses to pathogens and their components
5. human macrophage activation programs induced by bacterial pathogens
6. measures, markers, and mediators: toward a staging system for clinical sepsis. a report of the fifth toronto sepsis roundtable
7. molecular classification of cancer: class discovery and class prediction by gene expression monitoring
8. molecular portraits of human breast tumours
9. gene expression profiles in hereditary breast cancer
10. broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
11. molecular signatures of sepsis: multiorgan gene expression profiles of systemic inflammation
12. sepsis gene expression profiling: murine splenic compared to hepatic responses determined using cdna microarrays
13. application of genome-wide expression analysis to human health and disease
14. a network-based analysis of systemic inflammation in humans
15. functional genomics of critical illness and injury
16. injury research in the genomic era
17. expression profiling: toward an application in sepsis diagnostics
18. diagnosis of intraamniotic infection by proteomic profiling and identification of novel biomarkers
19. apoptotic cell death in patients with sepsis, shock, and multiple organ dysfunction
20. effect of gender and sex hormones on immune responses following shock
21. evaluation of factors affecting mortality rate after sepsis in a murine cecal ligation and puncture model
22. inducible nitric oxide synthase (inos) gene deficiency increases the mortality of sepsis in mice
23. overexpression of bcl-2 in transgenic mice decreases apoptosis and improves survival in sepsis
24. sequence makes a difference: paradoxical effects of stress in vivo
25. physiological genomics
26. model-based analysis of oligonucleotide arrays: expression index computation and outlier detection
27. primary and secondary transcriptional effects in the developing human down syndrome brain and heart
28. multiple organ dysfunction score: a reliable descriptor of a complex clinical outcome
29. controlling the false discovery rate: a practical and powerful approach to multiple testing
30. simultaneous statistical inference
31. treating sepsis
32. reducing mortality in sepsis: new directions
33. determination of infection probability versus the diagnosis and treatment of antibiotic-responsive diseases
34. development of peripheral lymphoid organs and natural killer cells depends on the helix-loop-helix inhibitor id2
35. s100a8: emerging functions and regulation
36. phagocyte-specific s100 proteins: a novel group of proinflammatory molecules
37. monocyte differentiation to macrophage requires interferon regulatory factor 7
38. induction of apoptosis by a secreted lipocalin that is transcriptionally regulated by il-3 deprivation
39. lipocalin 2 mediates an innate immune response to bacterial infection by sequestrating iron
40. agonist-induced trafficking of the low-affinity formyl peptide receptor fprl1
41. identification of a series of differentiation-associated gene sequences from gm-csf stimulated bone marrow
42. murine serum amyloid a3 is a high density apolipoprotein and is secreted by macrophages
43. plasma proteome of severe acute respiratory syndrome analyzed by two-dimensional gel electrophoresis and mass spectrometry
44. apoptosis in lymphoid and parenchymal cell during sepsis: findings in normal and t & b cell deficient mice

we thank ms alice tong, ms sandra k
macmillan, and ms tracey h wagner for expert technical assistance.

critical revision: chung, downey, buchman, stormo, hotchkiss, cobb

tests for differential expression, class prediction, and pathway analysis:

for the mouse blood study, a mixed-model anova was used to detect differential expression between treatment groups, with a linear contrast between the any-clp and non-clp groups. the anova model was chosen to partition treatment group and technical batch variability from variability due to biological and experimental noise. the following linear mixed model was used to detect differential expression on a gene-by-gene basis in the mouse blood data:

$$y_{gij} = \mu_g + T_{gi} + B_{gj} + \varepsilon_{gij}$$

where $y_{gij}$ is the expression of the gth gene for ith treatment and jth batch. the mean expression for the gth gene is given by $\mu_g$. the symbols $T$ and $B$ represent effects due to treatment and batch, respectively. the error for the gth gene for sample ij is designated as $\varepsilon_{gij}$. for the mixed-model analysis of variance, treatment is a fixed effect and batch is a random effect. a batch constitutes 7 samples (one from each treatment group) which were processed and hybridized at the same time. in the case of the last batch (batch 5), the genechips were scanned on a different scanner. for the mouse and human spleen studies samples were processed in a single batch, so a simple one-way analysis of variance with a contrast between any-clp and non-clp was used to identify differentially expressed genes.
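the one-way analysis with a contrast described above can be sketched for a single gene as follows. this is a minimal illustration under stated assumptions, not the authors' software: the function name `contrast_t`, the toy expression values, the assignment of lps to the non-clp side, and the equal weights within the any-clp and non-clp sides are all hypothetical, and the random batch effect of the mixed model is not handled here.

```python
import math


def contrast_t(groups, weights):
    """Estimate and t statistic for a linear contrast of group means.

    groups  : dict mapping group name -> list of expression values (one gene)
    weights : dict mapping group name -> contrast weight (weights sum to zero)
    Returns (estimate, t statistic, error degrees of freedom).
    """
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Pooled within-group (error) sum of squares and mean square.
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df = sum(len(v) for v in groups.values()) - len(groups)
    mse = sse / df
    estimate = sum(weights[g] * means[g] for g in groups)
    se = math.sqrt(mse * sum(weights[g] ** 2 / len(groups[g]) for g in groups))
    return estimate, estimate / se, df


# Hypothetical toy data for one gene (log-scale expression, 3 samples per group).
groups = {
    "clp": [8.1, 8.4, 8.0], "mild_clp": [7.9, 8.2, 8.3], "clp_abx": [7.8, 8.0, 8.1],
    "lps": [7.0, 7.2, 6.9], "saline": [6.1, 6.0, 6.3],
    "sham": [6.2, 6.4, 6.1], "normal": [6.0, 6.2, 5.9],
}
# Assumed weights: average of the three any-clp groups minus average of the rest.
weights = {"clp": 1 / 3, "mild_clp": 1 / 3, "clp_abx": 1 / 3,
           "lps": -0.25, "saline": -0.25, "sham": -0.25, "normal": -0.25}

estimate, t_stat, df = contrast_t(groups, weights)
print(f"contrast estimate={estimate:.3f}, t={t_stat:.2f}, error df={df}")
```

in a full analysis the same calculation would be repeated for every gene, with the resulting p-values adjusted for multiple testing.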
the linear contrast between any-clp and non-clp is given by: where clp is the mean of the clp group, clp+abx is the mean of the clp+abx group, etc.

where possible, the following competing classifiers were considered for all tasks, and the optimal classifier was selected: number of genes (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 2000, 5000, and 10000), prior probabilities for nearest centroid (equal and proportional), functions for discriminant analysis (linear and quadratic), prior probabilities for discriminant analysis (equal and proportional), number of neighbors (k) for k-nn (1, 3, and 5), and distance functions for k-nn (euclidean distance, pearson's linear correlation, and absolute value distance). thus as many as 426 classification models were considered for each classification task.

for the mouse blood data, we used a leave-one-batch-out (5-fold, one for each of the 5 batches) outer cross-validation, while the inner cross-validation was leave-one-sample-out. we refer to this as nested cross-validation with an outer "leave-one-batch-out" layer and an inner "leave-one-sample-out" layer. (27) using this method, how many and which genes to use for classification were determined using only the training samples. in addition, all additional classifier parameters (e.g., number of neighbors and distance measure) were determined using only the training samples. for each held-out batch in the outer 5-fold cross-validation, the classifier and genes that performed best on inner cross-validation were selected and applied to the 6 or 7 held-out test samples (two batches only had 6 samples due to removal of an outlier sample). for the mouse and human spleen data where all samples were processed in a single batch, both the outer and inner cross-validation used full leave-one-sample-out.
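the nested cross-validation scheme described above can be sketched in plain python as follows. this is a simplified illustration under stated assumptions, not the authors' implementation: only a nearest-centroid classifier is tuned, genes are ranked by the absolute difference of the two class means, and the function names, toy data, and candidate gene counts are all hypothetical. gene ranking and the choice of gene count use training samples only, as the text requires.

```python
def centroids(X, y, genes):
    """Per-class mean expression over the selected gene indices."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return out


def rank_genes(X, y):
    """Rank gene indices by absolute difference between the two class means."""
    labels = sorted(set(y))  # assumes exactly two classes are present
    c = centroids(X, y, range(len(X[0])))
    a, b = c[labels[0]], c[labels[1]]
    return sorted(range(len(X[0])), key=lambda g: -abs(a[g] - b[g]))


def fit_predict(X_train, y_train, x_test, k):
    """Nearest-centroid prediction using the top-k genes chosen on training data."""
    genes = rank_genes(X_train, y_train)[:k]
    c = centroids(X_train, y_train, genes)

    def dist(label):
        return sum((x_test[g] - m) ** 2 for g, m in zip(genes, c[label]))

    return min(sorted(c), key=dist)


def inner_loo_accuracy(X, y, k):
    """Inner layer: leave-one-sample-out accuracy within the training samples."""
    hits = 0
    for i in range(len(X)):
        hits += fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
    return hits / len(X)


def nested_cv(X, y, batches, k_grid):
    """Outer layer: leave-one-batch-out; k is tuned by the inner layer only."""
    correct = 0
    for held_out in sorted(set(batches)):
        train = [i for i, b in enumerate(batches) if b != held_out]
        test = [i for i, b in enumerate(batches) if b == held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        best_k = max(k_grid, key=lambda k: inner_loo_accuracy(X_tr, y_tr, k))
        for i in test:
            correct += fit_predict(X_tr, y_tr, X[i], best_k) == y[i]
    return correct / len(X)


# Hypothetical toy data: 3 batches, 2 samples per class per batch, 3 genes.
X, y, batches = [], [], []
for b in range(3):
    for j in range(2):
        noise = 0.1 * b + 0.05 * j
        X.append([5.0 + noise, 1.0 - noise, 2.0 + noise]); y.append("septic"); batches.append(b)
        X.append([1.0 + noise, 1.0 + noise, 2.0 - noise]); y.append("control"); batches.append(b)

accuracy = nested_cv(X, y, batches, k_grid=[1, 2, 3])
print(accuracy)
```

because the held-out batch never influences gene selection or parameter tuning, the outer accuracy is an estimate of performance on unseen batches; extending the sketch to the other classifier families listed above would mean adding them to the grid searched by the inner layer.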
key: cord-258035-2tk7maqk authors: defilippis, victor; raggo, camilo; moses, ashlee; früh, klaus title: functional genomics in virology and antiviral drug discovery date: 2003-10-31 journal: trends in biotechnology doi: 10.1016/s0167-7799(03)00207-5 sha: doc_id: 258035 cord_uid: 2tk7maqk abstract virology research and antiviral drug discovery are poised to benefit from the post-genomic revolution for three main reasons. first, viruses need the host to replicate and are therefore vulnerable to inhibition of cellular pathways. knowledge of complete genomic sequences of both virus and host now permits the study of this interplay on a global scale. combining transcriptomics and proteomics with large-scale gene knockdown experiments will enable the identification of novel antiviral targets. second, massive parallel assay systems, such as dna microarrays, which define the post-genomic era, will facilitate viral diagnostics. third, the combination of genetics with genomics will enable the analysis of viral mutants and strains on an unprecedented scale. the dramatic effects of viral infection on host cell transcriptional patterns have been well-documented and will be briefly highlighted. in addition, we discuss recent trends that apply functional genomics methods to the discovery of new targets and therapies for viral disease. victor defilippis, camilo raggo, ashlee moses and klaus früh vaccine and gene therapy institute, oregon health and science university, 505 nw 185th ave, beaverton, or 97006, usa

virology has been a genomic science for longer than any other field because the first complete genomes sequenced were of viral origin [1]. as of this publication, 1535 full viral genome sequences are available in genbank (http://www.ncbi.nlm.nih.gov/genbank/index.html). however, only modern sequencing endeavors made possible the complete sequencing of the human genome as well as that of model organisms. 'virogenomics' research aims to describe the interaction between the products of two genomes using post-genomic methods such as dna microarrays. given that viruses require a host organism to propagate, this approach promises to deepen our understanding of virus-host interactions. eventually, virogenomics research will also lead to new treatments for diseases caused by viruses, because whole genome sequences (virus and host) represent the complete collection of possible antiviral targets. classically, antiviral drug discovery focused on viral gene products. however, the limited number of viral targets and the rapid emergence of drug-resistant mutations in many viral systems make the search for host cell targets an attractive alternative. modern virology and antiviral drug discovery are thus expected to be impacted dramatically by functional genomics methods.
the widespread use of these methods is reflected in issues of major virology journals, which are rarely published without articles that include or rely on functional genomics experiments, in particular, dna microarrays. however, most of the published studies are limited to showing that a given virus induces or represses a given set of genes. the future challenge is to determine which of the myriad of host cell gene products are essential for virus survival or for virally induced disease progression. such gene products represent targets for inhibiting virus infection, pathogenesis, or virus-induced tumor formation. recent breakthroughs in gene knockdown technology are expected to accelerate the pace by which host cell pathways crucial to virus replication can be identified. furthermore, diagnostic virology will be impacted by microarray technology because probes for thousands of viral species and strains can be matched against an unknown specimen. here, we highlight some published and as-yet unpublished studies that use dna microarrays and gene knockdown methods in virology. a thorough discussion of the widespread use of microarray technology in biomedicine is beyond the scope of this review, but there are many excellent papers that cover these subjects in depth [2-4]. briefly, microarrays are glass, silicon, or nylon platforms containing oligo- or polynucleotides (targets) identical or complementary to known genes. interrogation of biological samples is performed by synthesis from sample mrna of fluorescently labeled probes that are hybridized to target sequences. gene expression is quantified by measuring fluorescence emission with a confocal laser scanner. the ability to immobilize up to 20 000 different genes on a single array enables examination of the transcriptional activity of entire eukaryotic genomes in parallel. however, microarrays are adaptable because the investigator can specify target sequences.
examination of viral genes, host genes or any such combination can thus be accomplished on a single array. this technology can also be used to immobilize specific probes for many different viral species, different viral strains, or complex dna viruses. many common viruses, such as retroviruses, papillomaviruses and parvoviruses, have very small genomes (often <10 genes). therefore, using microarrays for examination of gene expression in such viruses is not practical. in these cases other techniques, such as northern blotting and quantitative rt-pcr (qpcr), are more appropriate. however, microarrays are useful for genotyping small viruses, characterization of strain variability and identification of virus type(s) present in a clinical sample. high-density oligonucleotide arrays were used to characterize sequence variability of the hiv quasi-species in patient samples [5] and oligonucleotide arrays were applied to correlate progression of cervical cancer with the presence of the hpv genotype [6]. in a particularly clever study, joe derisi's laboratory [7] constructed microarrays using polynucleotides of conserved regions from a broad range of common viruses. using random pcr amplification from infected samples, they detected infecting viruses by examining specific patterns of hybridization. recently, this technique confirmed that the agent of severe acute respiratory syndrome (sars) is a member of the coronavirus family (cdc press release, 24 march 2003). samples from sars patients reacted with probes from avian and bovine coronaviruses. microarrays allow not only simultaneous diagnosis of multi-strain viral infection from clinical specimens but also identification of new virus types (see [8, 9]). furthermore, diagnostic microarrays will have important applications in the fields of biological warfare and bioterrorism.
dna microarrays will be able to rapidly test individuals for exposure to different biothreats such as poxviruses, plague, anthrax and tularemia. these methods can also be used to determine whether viral strains and types used in an attack have been genetically altered. viral genome microarrays are useful for determining the complex transcriptional program of larger viruses, especially members of the herpesvirus family. for instance, the transcriptional program of the 250 kilobase human cytomegalovirus (hcmv) genome [10] was mapped by chambers and colleagues using oligonucleotide arrays representing all 226 hcmv open reading frames (orfs) [11]. because the function and expression pattern of many of the hcmv genes are still unknown, such classification might help in characterizing unknown orfs (e.g. non-structural proteins are generally immediate early and early genes whereas structural proteins are often late genes). transcription of the tumorigenic γ-2 herpesvirus kaposi's sarcoma-associated virus (kshv, human herpesvirus type 8) has also been examined with microarrays. the virus is mostly latent in infected and transformed cells but lytic replication can be induced either spontaneously in a small percentage of cells or artificially using phorbol esters in culture. correlation of viral gene expression with latent and lytic phenotypes has been investigated using nylon membrane [12] and glass slide-based microarrays [13]. jenner and colleagues [12] measured kshv transcription in latently infected cell cultures that were both lysis-induced and uninduced over a time course (0-72 hr). expression levels for each gene were grouped according to similarity in patterns of expression over all time points and both treatments by cluster analysis [14]. interestingly, this approach showed that some genes clustering with typical lytic genes had been previously misclassified as latent using other methods.
this was because their expression level was very high in a small percentage of cells undergoing spontaneous lytic replication, thereby contaminating the 'strictly latent' sample. other large dna viruses, such as poxviruses, are also excellent candidates for such research. the emergence of smallpox as a potential agent of bioterrorism and recent monkeypox outbreaks have renewed interest in finding new therapeutic, diagnostic and immunological targets for improved diagnosis, treatment and vaccination strategies for poxviruses. viral genome arrays also allow phenotypic classification of viral mutants. for instance, gene expression was examined using microarrays for herpes simplex virus type 1 (hsv1) employing both wild type and a mutant strain lacking immediate-early gene α27 [15]. although the mutant used in this study was well characterized, it is conceivable that unknown viral mutants can be classified according to their transcriptome fingerprint in a manner similar to the compendium approach described for yeast [16]. particularly appealing is use of this approach to characterize and classify randomly generated mutants with targeted mutants. much progress has been made in creating random transposon libraries of herpesvirus genomes cloned as bacterial artificial chromosomes (bacs) [17]. transcriptional profiles observed in the hcmv and hsv1 experiments agreed strongly with previous studies of gene expression that used alternative techniques (e.g. northern blotting, rnase protection assays, primer extension), thus validating the use of microarrays for examining gene transcription. a complementary approach to viral mutant characterization is the study of their role in manipulating the host cell transcriptional profile. the deletion of non-essential genes often results in a 'normal' phenotype in tissue culture; that is, the mutant virus behaves similarly to the wild type in traditional assays such as single-step and multi-step growth curves.
however, the microarray fingerprint of such a mutant might be very different from the wild type because the deleted gene might be involved in manipulating host cell pathways that are not essential in vitro but have an important role in vivo. an example of such a situation from our laboratory is shown in fig. 1. a mutant cmv lacking a nonessential gene grew much like wild type in fibroblasts. however, microarray analysis revealed dramatic differences in the transcriptional profile of fibroblasts infected with the mutant versus wild-type virus. in addition to regulating a similar set of genes as the wild type, the mutant virus induced or repressed a unique set of genes not regulated in cells infected with the wild type. such results will not only help to elucidate the function of nonessential viral genes but also could be used to classify viral mutants generated by random approaches. the investigation of host cell transcriptional changes in response to virus infection and viral protein expression has been a vigorous area of microarray research over the past few years. at last count (july 2003) there were 201 references containing the words 'microarray' and 'virus' in the pubmed literature database, most of which describe the host cell response to viral infection. given that virus replication involves use and manipulation of multiple host proteins and molecular processes, cellular pathways related to transcription and translation, signal transduction, metabolism, host defense and cell cycle control are commonly altered in response to, or as a result of, infection with diverse virus types. transcriptional changes in virally infected cells are either the result of anti-viral, 'pro'-viral or bystander host responses. as a common antiviral response, expression of interferon-stimulated genes (isgs) is often observed during infection by diverse, unrelated viruses.
isgs create a cellular state in which virus replication is blocked and thus there is natural selective pressure to circumvent or impair this host response (for a recent review see [18]). the ability to prevent, inhibit, or limit this induction might therefore contribute to virulence and represent a target pathway for antiviral treatments. for anti-viral therapy, it is conceivable to target viral proteins that interfere with the induction of isgs, thus releasing the innate immune response to combat viral infection. an anti-host strategy used by viruses that complicates transcriptome analysis involves interference with cellular rna metabolism. global inhibition of transcription or rna degradation can appear as gene 'repression' in a comparative analysis. a bias toward gene downregulation was indeed observed for influenza [19] as well as hsv [15]. interestingly, in each case several host transcripts were upregulated despite massive nonspecific downregulation of host cell mrna. picornaviruses induce degradation of the mrna cap binding complex eif4f, thereby inhibiting cellular gene expression by preventing recruitment of 40s ribosomal subunits to 5′ caps. following this, translation of genes can only occur when they contain internal ribosome entry sites (ires), as do many viral genes. by selectively isolating mrna-polysome complexes in poliovirus infected (eif4f depleted) and uninfected cells, johannes et al. subsequently used microarrays to identify cellular genes that were translated during poliovirus infection [20]. indeed, a small group of cellular genes were identified as being translated despite the lack of 5′ cap recognition in infected cells. although these genes had diverse functions it seems possible that the internal ribosomal entry site evolved to overcome the block by picornaviruses. thus, microarray analysis of polysome fractions can be used to screen for host cell mrnas that survive viral attack, thus potentially representing novel anti-viral host factors.
in contrast to host pathways that represent an antiviral response, others might be beneficial or even essential for viral replication. inhibiting such 'pro'-viral factors offers novel avenues for anti-viral drug discovery. there are several examples of such essential host factors that were identified by traditional, hypothesis-driven methods (for recent reviews see [21, 22]). in addition, non-hypothesis-driven global gene expression profiling can reveal host cell genes that promote viral growth. however, the challenge is how to pick those genes that might be important for viral replication. several investigators, including our laboratory, chose certain microarray-identified differentially regulated genes because they were associated with interesting biological characteristics or displayed unusual expression patterns. for instance, t. shenk's group noticed in a dna microarray experiment that several genes regulating prostaglandin metabolism were induced by human cytomegalovirus [23]. one of the induced genes was cyclooxygenase-2 (cox-2), which leads to the synthesis of prostaglandin e2 (pge2), thus modulating cellular gene expression and immune function. based on this initial observation, zhu et al. recently demonstrated that hcmv replication requires the function of cyclooxygenase-2 [24]. the authors showed that specific inhibitors of cox-2 lead to a >100-fold decrease in the yield of hcmv. furthermore, the addition of pge2 in the presence of such inhibitors restored normal hcmv replication. intriguingly, a virus closely related to hcmv that infects rhesus macaques actually carries a cox-2 homologue in its genome [25], which further emphasizes the importance of this host cell gene for cytomegalovirus reproduction. a recent example from jay nelson's laboratory further supports the notion that some transcriptionally upregulated host cell genes are important for viral replication (a. hirsch and j. nelson, pers. commun.).
examination of host gene expression in human cells infected with various flaviviruses revealed significant upregulation of a host cell kinase. cells treated with corresponding kinase inhibitors or with small interfering rna (sirna) that specifically blocks this particular host cell protein resulted in a nearly tenfold decrease in production of west nile virus (fig. 2) , as well as other flaviviruses. a special case of virus modulation of host cell gene expression in which functional genomics will reveal novel targets and treatments is viral oncogenesis. because oncogenic transformation involves the misregulation of gene expression, cancer research is a field in which transcriptomics and proteomics will have the most immediate and direct benefit for patients (for a recent review see [26] ). virus-induced transformation of infected cells occurs because viral oncogenes trigger cellular pathways that ultimately overcome growth restrictions. thus, the study of viral transformation using functional genomics methods will reveal targets relevant to the development of cancer in general. our laboratory is currently studying kaposi's sarcoma (ks), a virally induced tumor that frequently appears in aids-patients. a characteristic feature of ks is the transformation of vascular endothelial cells to so-called spindle cells; a process that is recapitulated in vitro upon infection of primary or life-extended endothelial cells with kshv [27] . this in vitro model represents an excellent system to study cellular events during transformation because oncogenesis can be reproducibly triggered by infection with kshv. microarray analysis by our laboratory and by others revealed substantial transcriptional changes during spindle cell formation upon infection with kshv [28, 29] . our study identified upregulation of the oncogene c-kit. this host gene is particularly interesting because it is involved in other non-viral cancers, some of which also form spindle cells (e.g. 
gastrointestinal stromal tumors). importantly, signal transduction by c-kit can be inhibited by the receptor tyrosine kinase inhibitor imatinib mesylate (sti571; gleevec) [30]. indeed, sti571 treatment prevented kshv-induced spindle cell formation in vitro [28]. this observation suggested that the anti-tumor drug gleevec might be useful in treating ks patients. this prediction was borne out in a recent clinical trial (http://www.asco.org/ac/1,1003,_12-002489-00_18-002003-00_197-00103219-00_29-00a,00.asp). treatment of cutaneous ks in aids patients with gleevec resulted in clinical and histological regression of lesions. these results suggest that c-kit [and platelet-derived growth factor (pdgf) receptor, another target for gleevec that is also present in ks lesions] represents a valid target for ks therapy. the examples listed earlier demonstrate that, in principle, functional genomics methods can identify new antiviral targets or targets for the treatment of diseases caused by viruses. however, finding the 'needle' in the 'haystack' of changes that occur in host cells during virus infection can be a daunting task. until now, the availability of specific inhibitors was one of the reasons for choosing to study a particular host cell gene that was upregulated in a virally infected cell. however, recent years have seen a dramatic improvement in gene knockdown methods using antisense rna or rna interference (rnai) [31, 32]. compared with earlier generations of phosphorothioate antisense molecules, which displayed many nonspecific effects, new antisense oligomers are highly specific and long-lived. for instance, phosphorodiamidate morpholino oligomers are highly resistant to enzymatic degradation and can be used to block translation of mrna [33]. 
the most recent method for gene inhibition studies is rnai using sirna, in which small 21-23-nucleotide (nt) rna duplexes interfere with gene expression by directing degradation of homologous mrna [34-37]. rna gene silencing mechanisms are common to plants, fungi, nematodes, drosophila and mammalian cells [34-36]. the mechanism by which sirna cleaves the target sequence has not been completely defined. the current model proposes that dsrna is cleaved into 21-23-nt fragments in an atp-dependent process by the rnase iii enzyme dicer [34,36-39]. the 21-23-nt sirnas are thought to form an enzyme complex, the rna-induced silencing complex (risc), that unwinds the duplex sirna and targets the complex to the recognition site on the targeted mrna, where cleavage occurs and the mrna is degraded [37, 38, 40, 41]. in mammalian cells, synthesized 21-nt sirna duplexes can lead to sequence-specific mrna degradation without inducing the antiviral interferon response activated by longer dsrna molecules [35]. both new-generation antisense oligomers and sirna can be used to knock down expression of genes that were found by dna microarrays to be upregulated in virally infected cells or organisms. what makes the combination of microarray and knockdown methods particularly attractive for viral systems is that, instead of de novo induction, microarrays usually indicate several-fold changes in the level of a given transcript upon infection. thus, it is not necessary to completely shut down expression of a target; it is sufficient to reduce the level of induction back to the pre-infection state. using the in vitro ks model, we applied morpholino antisense dna oligomers against c-kit to kshv-infected endothelial cells. a single treatment was sufficient to inhibit foci formation, a process that takes several days to develop [28]. similar results were obtained with sirna against c-kit (fig. 3). 
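the sequence-specific targeting described above can be sketched as a simple enumeration of candidate sites. this is only an illustration under stated assumptions: the 21-nt window length follows the text, but the example transcript is invented, and real sirna design applies further selection rules that are not modelled here.

```python
def complement_rna(seq):
    """Return the base-by-base complement of a lowercase RNA sequence."""
    pairs = {"a": "u", "u": "a", "g": "c", "c": "g"}
    return "".join(pairs[base] for base in seq)

def candidate_sites(mrna, length=21):
    """Yield (start, target, guide) for every window of the given length.

    target is the stretch of mRNA that would be recognised; guide is its
    reverse complement, i.e. the strand that would base-pair with it.
    """
    mrna = mrna.lower().replace("t", "u")
    for start in range(len(mrna) - length + 1):
        target = mrna[start:start + length]
        guide = complement_rna(target)[::-1]
        yield start, target, guide

# Hypothetical 24-nt transcript fragment, not a real gene.
example_mrna = "augc" * 6
sites = list(candidate_sites(example_mrna))
for start, target, guide in sites:
    print(start, target, guide)
```

a 24-nt fragment yields four overlapping 21-nt windows; on a full-length transcript the same loop would produce one candidate per position.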
having established that antisense approaches work in this system, we are now using these methods to evaluate a large number of host genes found to be induced by kshv on transformation of endothelial cells. at present, we have examined >30 different host genes induced by kshv. in several instances, we observed an inhibition of the transformed phenotype comparable to that seen with c-kit antisense or sirna (a. moses and k. früh, unpublished observations). the corresponding genes represent known or novel oncogenes and potential new targets for therapy of ks as well as other cancers. in other viral systems, sirna targeted against viral or host cell genes has been successfully used to inhibit viral replication. the p gene of the non-segmented negative-strand rna virus, respiratory syncytial virus, was selectively silenced, which resulted in a decrease in viral progeny and loss of syncytia formation [42]. poliovirus, a positive-strand rna virus, had reduced titers when treated with sirna against capsid or viral polymerase genes, confirming that cytoplasmic rna viruses can be inhibited by sirna [43]. however, a single nucleotide difference in the sirna target region of poliovirus can give rise to escape mutants, suggesting that multiple target regions need to be generated for therapeutic purposes [43]. similarly, sirna duplexes targeted against various hiv genes inhibited hiv replication [44-47]. in addition, decreasing cd4 levels by sirna results in decreased hiv entry and spread [44]. moreover, targeting of the downstream nuclear factor κb (nf-κb) p65 gene, thought to be involved in stimulating hiv gene expression, decreased hiv-1 replication [47]. similarly, human papillomavirus and hepatitis c virus replication were recently modulated with sirna [48, 49]. different vector-based sirna approaches have also been developed based on polymerase iii promoter expression of rnai molecules with small hairpin rna structures that can be processed by dicer [50]. 
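the escape-mutant problem noted above for poliovirus can be illustrated with a minimal sketch. all sequences here are invented, and the rule that a site is targeted only on a perfect match is a simplifying assumption; it is not a model of real sirna tolerance to mismatches.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def still_targeted(genome, targets):
    """Return the names of target sites that occur unchanged in the genome."""
    return [name for name, site in targets.items() if site in genome]

# Hypothetical 21-nt target sites and a toy genome containing both.
targets = {
    "site_1": "gcuagcuaggauccgauacgg",
    "site_2": "uuacgcaugcaaucggcuaau",
}
wild_type = "aaa" + targets["site_1"] + "cccc" + targets["site_2"] + "ggg"

# Introduce a single-nucleotide change inside site_1 to mimic an escape mutant.
mutant_site = targets["site_1"][:10] + "c" + targets["site_1"][11:]
escape_mutant = wild_type.replace(targets["site_1"], mutant_site)

print(still_targeted(wild_type, targets))
print(still_targeted(escape_mutant, targets))
```

the mutant escapes the first sirna but remains covered by the second, which is the rationale for targeting several regions at once.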
this type of expression cassette has been incorporated into retroviral vectors for stable delivery and expression [50, 51]. recently, the therapeutic potential of sirna was demonstrated by retroviral delivery of sirna that could inhibit mutant k-ras v12 in human pancreatic carcinoma, leading to loss of tumorigenicity [51]. the therapeutic potential of rnai in vivo has also been demonstrated in mice [52]. although these results demonstrate that sirna can inhibit a wide variety of viruses, it is our opinion that the real potential of this technique will be its use in validating which host cell products are important for virus replication or viral pathogenesis. as described earlier, pre-screening with dna microarrays can identify potential host cell targets. however, an even bolder approach is to bypass microarrays altogether by using a whole-genome knockdown approach in a given system and examining subsequent virus replication ability. although this approach is still difficult to implement in mammalian systems, owing to the large (and still unknown) number of genes and associated costs, a proof-of-principle study was successfully undertaken in drosophila cells. taking advantage of the completion of the drosophila genome and the fact that rnai is extremely efficient in more primitive eukaryotes, n. perrimon's laboratory (harvard medical school) used a high-throughput rnai screen based on 21,000 dsrnas corresponding to every known drosophila gene to monitor viral replication in schneider cells. this approach revealed dozens of host cell gene products that were essential for viral replication but not for replication of the host cell (s. cherry and n. perrimon, pers. commun.). once whole human-genome sirna libraries are available, it will be possible to use similar approaches to identify the complete set of human genes required for viruses to complete their replication. 
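the selection criterion of such a screen (host genes needed by the virus but not by the cell) can be sketched as a simple filter. the gene names, measurements and both cutoffs below are hypothetical and are not taken from the drosophila screen mentioned above.

```python
def proviral_hits(screen, max_virus=0.5, min_viability=0.8):
    """Return genes whose knockdown lowers virus yield but spares the cells.

    screen maps gene -> (relative virus yield, relative cell viability),
    both expressed as fractions of an untreated control.
    """
    return sorted(
        gene
        for gene, (virus, viability) in screen.items()
        if virus <= max_virus and viability >= min_viability
    )

# Hypothetical knockdown results, not real data.
screen = {
    "gene_a": (0.10, 0.95),  # virus blocked, cells healthy: candidate target
    "gene_b": (0.20, 0.30),  # virus blocked only because the cells die
    "gene_c": (0.90, 1.00),  # no effect on the virus
    "gene_d": (0.40, 0.85),  # candidate target
}

hits = proviral_hits(screen)
print(hits)
```

requiring both conditions separates genes that are essential for the virus from genes that are simply essential for the host cell.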
although most functional genomics studies of virology focus on cataloging global transcriptional and translational events, proof-of-principle studies are emerging demonstrating that some of the detected changes are events required or desired for virus replication or pathogenesis. until now, validation could only be achieved using small-molecule inhibitors or dominant-negative proteins. however, the recent revolution in gene inhibition by antisense and sirna will significantly facilitate the validation of a large number of potential targets identified by dna microarrays or other methods. moreover, whole-genome screens with sirna will enable a complete evaluation and validation of the role of host cell gene products for viral replication. these approaches thus represent an unprecedented opportunity to characterize the functional host gene products essential for virus replication and disease.
nucleotide sequence of bacteriophage phi x174 dna genomics, gene expression and dna arrays dna chips: state-of-the art primer on medical genomics. 
part iii: microarray experiments and data analysis extensive polymorphisms observed in hiv-1 clade b protease gene using high-density oligonucleotide arrays correlation of cervical carcinoma and precancerous lesions with human papillomavirus (hpv) genotypes detected with the hpv dna chip microarray method microarray-based detection and genotyping of viral pathogens typing and subtyping influenza virus using dna microarrays and multiplex reverse transcriptase pcr genotyping of 22 human papillomavirus types by dna chip in korean women: comparison with cytologic diagnosis analysis of the protein-coding content of the sequence of human cytomegalovirus strain ad169 dna microarrays of the complex human cytomegalovirus genome: profiling kinetic class with drug sensitivity of viral gene expression kaposi's sarcoma-associated herpesvirus latent and lytic gene expression as revealed by dna arrays transcription program of human herpesvirus 8 (kaposi's sarcoma-associated herpesvirus) cluster analysis and display of genome-wide expression patterns global analysis of herpes simplex virus type 1 transcription using an oligonucleotide-based dna microarray functional discovery via a compendium of expression profiles fast screening procedures for random transposon libraries of cloned herpesvirus genomes: mutational analysis of human cytomegalovirus envelope glycoprotein genes viruses and interferon: a fight for supremacy global impact of influenza virus on cellular pathways is mediated by both replication-dependent and -independent events identification of eukaryotic mrnas that are translated at reduced cap binding complex eif4f concentrations using a cdna microarray virogenomics: a novel approach to antiviral drug discovery cyclin-dependent kinases as cellular targets for antiviral drugs altered cellular mrna levels in human cytomegalovirus-infected fibroblasts: viral block to the accumulation of antiviral mrnas inhibition of cyclooxygenase 2 blocks human cytomegalovirus 
replication analysis of the complete dna sequence of rhesus cytomegalovirus the future of cancer management: translating the genome, transcriptome, and proteome long-term infection and transformation of dermal microvascular endothelial cells by human herpesvirus 8 kaposi's sarcoma-associated herpesvirusinduced upregulation of the c-kit proto-oncogene, as identified by gene expression profiling, is essential for the transformation of endothelial cells altered patterns of cellular gene expression in dermal microvascular endothelial cells infected with kaposi's sarcoma-associated herpesvirus inhibition of c-kit receptor tyrosine kinase activity by sti 571, a selective tyrosine kinase inhibitor gene silencing in mammals by small interfering rnas morpholino oligos: making sense of antisense? resistance of morpholino phosphorodiamidate oligomers to enzymatic degradation potent and specific genetic interference by doublestranded rna in caenorhabditis elegans duplexes of 21-nucleotide rnas mediate rna interference in cultured mammalian cells a species of small antisense rna in posttranscriptional gene silencing in plants rnai: double-stranded rna directs the atpdependent cleavage of mrna at 21 to 23 nucleotide intervals an rna-directed nuclease mediates posttranscriptional gene silencing in drosophila cells role for a bidentate ribonuclease in the initiation step of rna interference evidence that sirnas function as guides, not primers, in the drosophila and human rnai pathways atp requirements and small interfering rna structure in the rna interference pathway phenotypic silencing of cytoplasmic genes using sequence-specific double-stranded short interfering rna and its application in the reverse genetics of wild type negative-strand rna viruses short interfering rna confers intracellular antiviral immunity in human cells sirna-directed inhibition of hiv-1 infection modulation of hiv-1 replication by rna interference potent and specific inhibition of human immunodeficiency 
virus type 1 replication by rna interference rna interference directed against viral and cellular targets inhibits human immunodeficiency virus type 1 replication selective silencing of viral gene expression in hpv-positive human cervical carcinoma cells treated with sirna, a primer of rna interference clearance of replicating hepatitis c virus replicon rnas in cell culture by small interfering rnas rna interference: the new somatic cell genetics? stable suppression of tumorigenicity by virus-mediated rna interference rna interference in adult mice we thank alec hirsch, jay nelson, sara cherry and norbert perrimon for unpublished information. key: cord-104073-vsa5y7ip authors: warner, emily f.; bohálová, natália; brázda, václav; waller, zoë a. e.; bidula, stefan title: cross kingdom analysis of putative quadruplex-forming sequences in fungal genomes: novel antifungal targets to ameliorate fungal pathogenicity? date: 2020-09-23 journal: biorxiv doi: 10.1101/2020.09.23.310581 sha: doc_id: 104073 cord_uid: vsa5y7ip fungi contribute to upwards of 1.5 million human deaths annually, are involved in the spoilage of up to a third of food crops, and have a devastating effect on plant and animal biodiversity. moreover, this already significant issue is exacerbated by a rise in antifungal resistance and a critical requirement for novel drug targets. quadruplexes are four-stranded secondary structures in nucleic acids which can regulate processes such as transcription, translation, replication, and recombination. they are also found in genes linked to virulence in microbes, and quadruplex-binding ligands have been demonstrated to eliminate drug resistant pathogens. using a computational approach, we identified putative quadruplex-forming sequences (pqs) in 1362 genomes across the fungal kingdom and explored their potential involvement in virulence, drug resistance, and pathogenicity. 
here we present the largest analysis of pqs in fungi and identified significant heterogeneity of these sequences throughout phyla, genera, and species. moreover, pqs were genetically conserved. notably, loss of pqs in cryptococci and aspergilli was associated with pathogenicity. pqs in the clinically important pathogens aspergillus fumigatus, cryptococcus neoformans, and candida albicans were located within genes (particularly coding regions), mrna, repeat regions, mobile elements, trna, ncrna, rrna, and the centromere. genes containing pqs in these organisms were found to be primarily associated with metabolism, nucleic acid binding, transporter activity, and protein modification. finally, pqs were found in over 100 genes associated with virulence, drug resistance, or key biological processes in these pathogenic fungi and were found in genes which were highly upregulated during germination, hypoxia, oxidative stress, iron limitation, and in biofilms. taken together, quadruplexes in fungi could present interesting novel targets to ameliorate fungal virulence and overcome drug resistance. 
introduction [...] sequence nucleotides and can form intramolecular or intermolecular associations [17, 18]. this structure is further stabilised by the presence of monovalent cations, especially potassium [19]. moreover, the 5ʹ-to-3ʹ directionality of the strands, glycosidic bonding in the g-tetrads, the cation present, and number of stacked g-tetrads contribute to the wide variation of observed g4 structures and topologies [13]. conversely, ims form within cytosine-rich regions [...] there has been increased interest in the therapeutic potential of targeting quadruplexes following the implication of these secondary structures in disease, especially cancer, due to their prevalence in oncogene promoters [26]. however, there is also now a growing number of pathogens in which g4s [...] respectively; figure 2a and e). the basidiomycota and zoopagomycota had high pqs frequencies relative to genome size (0.445 and 0.373 pqs/kbp, respectively; figure 2b and f). the mucoromycota and basidiomycota displayed high pqs frequencies relative to gc content (459 and 340 pqs/gc%, respectively; figure 2c and g). fungi within the basidiomycota had the highest average gc content (53.3%; figure 2d and h). 
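the three normalisations used above (total pqs, pqs/kbp and pqs/gc%) can be illustrated with a minimal sketch. the motif search below is a generic regular expression for four runs of at least three guanines (or cytosines, for the opposite strand) separated by loops of 1 to 7 nt; it is an assumption made for illustration and does not reproduce the prediction tool or scoring thresholds used in the study. the test sequence is artificial.

```python
import re

def gc_percent(seq):
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def count_pqs(seq, min_run=3, max_loop=7):
    """Count non-overlapping putative quadruplex motifs on both strands.

    G-runs are searched directly; C-runs stand in for G-runs on the
    complementary strand.
    """
    seq = seq.upper()
    total = 0
    for base in "GC":
        pattern = (
            f"(?:{base}{{{min_run},}}[ACGT]{{1,{max_loop}}}?){{3}}"
            f"{base}{{{min_run},}}"
        )
        total += sum(1 for _ in re.finditer(pattern, seq))
    return total

def pqs_per_kbp(seq):
    """Normalise the motif count to sequence length (per 1000 bp)."""
    return count_pqs(seq) * 1000.0 / len(seq)

def pqs_per_gc_percent(seq):
    """Normalise the motif count to GC content (per percentage point)."""
    return count_pqs(seq) / gc_percent(seq)

# Artificial 1000-bp sequence: AT filler plus four telomere-like repeats.
example = "AT" * 488 + "TTAGGG" * 4
print(count_pqs(example), pqs_per_kbp(example), gc_percent(example))
```

normalising by length and by gc content separately matters because a large or gc-rich genome can hold many motifs in absolute terms while still having a low motif density.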
the microsporidia and cryptomycota scored lowest for total number of pqs (300 and 372, respectively), pqs/kbp (0.091 and 0.029, respectively) and pqs/gc% (8 and 11, respectively; figure 2). moreover, they also had low gc content (39.6% and 35.0%, respectively). considering g4s and ims form in guanine or cytosine rich regions, respectively, one would expect fungi with a higher genome gc content to have a higher pqs frequency by chance. to investigate this further, the frequency of pqs/kbp relative to the gc content in all fungi and their divisions was plotted. as expected, there was a positive correlation between gc content and pqs frequency amongst all the fungal species analysed (r=0.5290; p<0.0001; figure 3a). [...] and mucoromycota (r=0.5619, r=0.3891, and r=0.5239 and r=0.2883, respectively; all p<0.05; figure 3b-d). however, there was not a significant correlation observed within the [...] kickxellomycotina (n=7 species) were 8399.7, 6726.5, and 11315.1, respectively (figure 4a). the average pqs/kbp for each subphylum was 0.746, 0.081, and 0.292, respectively (figure 4b). the average pqs/gc% for each subphylum was 162.3, 162.7, and 305.4, respectively (figure 4c). finally, the average gc% for each subphylum was 54.4%, 35.3%, and 38.2%, respectively (figure 4d). [...] figure 4c). finally, the average gc% were 39.6% and 35.0%, respectively (figure 4d). finally, we also highlighted the frequency of pqs in fungal genera which contained important human and plant pathogens. we found that there was also large heterogeneity in the frequency of pqs between species within genera containing human pathogens (e.g. aspergillus spp., candida spp., cryptococcus spp., blastomyces spp.) and plant pathogens (e.g. verticillium spp., and fusarium spp.; figure 5). this variation was particularly wide within aspergillus spp., and cryptococcus spp. 
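the correlation between gc content and pqs frequency reported above is a pearson coefficient. a minimal sketch of that calculation follows; the data points are invented and do not correspond to any species in the study, and no p-value is computed.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical genomes: GC content (%) and PQS frequency (PQS/kbp).
gc_content = [35.0, 40.0, 45.0, 50.0, 55.0]
pqs_frequency = [0.05, 0.10, 0.12, 0.30, 0.45]

r = pearson_r(gc_content, pqs_frequency)
print(round(r, 3))
```

a value near +1 indicates that pqs frequency rises almost linearly with gc content, a value near 0 indicates no linear relationship, and a value near -1 indicates an inverse relationship.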
evolutionary conservation of genetic motifs within the genome is a hallmark of their fundamental importance to how that organism functions. therefore, we endeavoured to explore whether there was evolutionary conservation of pqs within fungal genomes. we chose to explore this relationship in aspergillus spp., due to the robustness and accuracy of the phylogenetic tree available [44]. notably, we found that the frequency of pqs/kbp appeared to be intrinsically linked to how closely related species were, with species within the same section displaying similar pqs frequencies (figure 6). aspergilli in this tree were divided into 13 sections (range of pqs/kbp [...] the ascomycota and basidiomycota contain many of the most prevalent fungal pathogens of both plants and humans, including the genera aspergillus spp., candida spp., and cryptococcus spp., which contain fungal species that account for most fungal-related deaths in humans. however, not all species within these genera are potential pathogens, and we found high variation in their pqs frequency. therefore, we compared the pqs frequency between pathogenic and non-pathogenic species to explore whether there was a link with pathogenicity. [...] similarly, comparing 16 species of cryptococcus (9 pathogenic, 7 non-pathogenic) we also found that pathogenic species had a significantly lower frequency of pqs/kbp (0.480 vs. [...] when only total pqs were considered, the largest number of pqs in all three fungal species could be found within the coding regions (cds), genes, and mrna, with few pqs found in other genomic features (figure 8a, b, and c). however, this was not the same when considering the frequency of pqs/kbp of the genomic features. in a. fumigatus, the greatest frequency of pqs could be found in the repeat regions (figure 8d). the lowest frequency could be found within the trna. in c. 
neoformans, the highest pqs frequencies were still in the cds, genes, and mrna, with a very low frequency found within the trna (figure 8e). in c. albicans, the highest frequency of pqs could be found in the rrna, followed by repeat regions and ncrna (figure 8f). there were no pqs found in the trna and low frequencies were again found in the mobile elements. the total number and frequency of pqs 100 bp before and after the annotated genomic features appeared to be evenly distributed (figure 8).
pqs are found in genes encoding proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification
as we knew the genomic location of the pqs, we could then identify the number and identity of the genes which contained these sequences. this further enabled us to identify the classes of proteins associated with pqs-containing genes. in a. fumigatus, 35.1% of genes contained at least one pqs. in c. neoformans, this number was almost double, with 59.9% of genes containing pqs. conversely, pqs were only found in 5.6% of genes in c. albicans. despite the discrepancies in the number of genes where pqs can be found between the organisms, in all cases, pqs were primarily located in genes which encoded proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification (figure 9). they were least likely to be found in genes encoding calcium-binding proteins, extracellular matrix proteins, cell adhesion molecules, and defense/immunity proteins (figure 9). in all organisms, pqs could be found in the highest frequency in genes associated with metabolite interconversion enzymes. in a. fumigatus, the number of genes associated with metabolite interconversion enzymes was 3.7-fold higher than the next represented protein class (434 genes vs. 117 genes for nucleic acid binding proteins and transporters; figure 9a). in c. 
neoformans the number of genes associated with these enzymes was 2.1-fold higher compared to nucleic acid binding proteins (491 genes vs. 231 genes, respectively; figure 9b). in c. albicans, the difference in the number of pqs-containing genes associated with metabolite interconversion enzymes and nucleic acid binding proteins was much lower (26 genes vs. 21 genes, respectively; 1.2-fold; figure 9c). surprisingly, when categorising genes based on gene ontology terms, there was an almost identical distribution of genes involved in [...] im in the promoter of the hiv-1 pro-viral genome has also recently been described [31]. thus, whether pqs could be found in genes associated with virulence/drug resistance in a. fumigatus, c. neoformans, and c. albicans was explored. although the list is not exhaustive (there are many proteins still yet to be characterised), there were many interesting candidates that arose from the analysis. in total, pqs were found in over 100 genes associated with the virulence, drug resistance, or key biological processes of a. fumigatus (39 genes), c. neoformans (41 genes), and c. albicans (27 genes; tables 1-3). in a. fumigatus, pqs could be found in notable genes, including the 14-α sterol demethylases (cyp51a and cyp51b), the 1,3-β-glucan synthase catalytic subunit fks1, and abc drug exporter atrf, which are involved in drug resistance. pqs were also found in genes involved in virulence, including transcription factors stua, hapx, and pacc, genes involved in pigment biosynthesis (pksp, arp2, abr1, abr2, and ayg1), a master regulator of secondary metabolism laea, and glin and glip, which are involved in the synthesis of gliotoxin (table 1). as pqs could be found in almost two-thirds of c. neoformans genes, it was not surprising that pqs could be found in those associated with virulence. 
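the fold differences quoted above follow directly from the gene counts given in the text. the short check below uses only those counts; the dictionary layout and variable names are illustrative.

```python
# Gene counts quoted in the text: (metabolite interconversion enzymes,
# next most represented protein class).
gene_counts = {
    "a. fumigatus": (434, 117),
    "c. neoformans": (491, 231),
    "c. albicans": (26, 21),
}

def fold_difference(larger, smaller):
    """Return the ratio of two counts rounded to one decimal place."""
    return round(larger / smaller, 1)

folds = {
    species: fold_difference(enzymes, next_class)
    for species, (enzymes, next_class) in gene_counts.items()
}
for species, fold in folds.items():
    print(f"{species}: {fold}-fold")
```

the ratios agree with the 3.7-fold, 2.1-fold and 1.2-fold values reported for the three species.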
these included the abc transporter afr1 (which is associated with fluconazole resistance), the protein kinases fsk and hog1, the calcineurin-associated genes crz1 and cna1, pacc/rim101 as in a. fumigatus, and numerous capsule-associated genes (the capsule being the main virulence factor of cryptococcus) including cap2, cap5, cap10, cap59, cap60, cap64, cas31, cas33, and cxt1 (table 2). there were very few genes in c. albicans that contained sequences likely to form quadruplexes, and thus, quadruplexes might be less important in this organism. notable genes included the iron permeases ftr1 and ftr2, and a gene associated with flucytosine resistance (rrp9; table 3). the highest scoring potential quadruplex-forming sequences for each of these genes were then re-analysed in an alternative pqs predictive algorithm called qgrs mapper. in this instance, the scores of known quadruplex-forming sequences were compared to scores of the pqs in fungi. this was conducted to provide further insight into whether these sequences were likely to form quadruplex structures. [...] figure 10b). in all cases, the average pqs frequencies in the upregulated genes were higher than the average pqs observed throughout the entire genome (figure 10b). the average pqs frequencies in upregulated pqs-containing genes were 2.97 pqs/kbp (germinating conidia), 3.72 pqs/kbp (oxidative stress), and 2.26 pqs/kbp (biofilms; figure 10b). however, a range of pqs frequencies was observed between the genes, from 0.34 to 11.90 pqs/kbp. the genes containing the highest pqs frequencies for each condition were afua_8g01710 in germinating conidia and hyphae (11.90 pqs/kbp), afua_4g09580 in hypoxic fungi (5.59 pqs/kbp), afua_3g03650 during iron limitation (8.50 pqs/kbp), afua_5g10220 during oxidative stress (5.28 pqs/kbp), and afua_8g01980 in biofilms (5.90 pqs/kbp). 
interestingly, each of these genes was upregulated in at least 3 out of the 6 conditions investigated. in this study, the number of potential quadruplex-forming sequences within the genomes of fungi was computationally predicted and their potential involvement in pathogenicity was discussed. several important observations were made. this was the first study to identify the heterogeneity of pqs amongst genetically distinct fungal species. moreover, we highlighted that pathogenic aspergillus and cryptococcus species contained fewer pqs compared to their non-pathogenic counterparts and these could be found throughout known genomic features, including genes, mrna, repeat regions, trna, ncrna, and rrna. genes containing pqs were associated with metabolism, nucleic acid-binding proteins, protein modifying enzymes, and transporters. notably, pqs likely to form quadruplexes were identified in genes linked with fungal virulence or drug resistance, such as cyp51a, and could be found in genes upregulated during fungal growth and in response to stress. the frequency of pqs throughout genomes is highly variable; for example, human genomes were shown to contain around 0.228 pqs/kbp, whereas the genomes of escherichia coli contain around 0.028 pqs/kbp [15]. in this study we also found significant differences in the [...] interestingly, loss of pqs has recently also been observed in pathogenic coronaviridae [47]. it has also been reported that host nucleolin (an rna-binding protein) can bind and stabilise quadruplexes in the ltr promoter of hiv-1, which can silence viral transcription [48]. therefore, in this situation, loss of quadruplexes would be beneficial for immune evasion. [...] (pacc/rim101). the most notable virulence factor of c. neoformans is its polysaccharide capsule and pqs could be found in numerous capsule-associated genes (cas31, cap60, cap59, cap64, cap2, cap5, cas33, cap10, and cxt1) [69]. 
in c. albicans pqs could be found in genes such as the iron permeases ftr1 and ftr2 [70]. notably, many of these genes contained pqs which have previously been shown to be capable of forming bona fide quadruplexes, such as the sequence ggaggaggagg [71]. it is also interesting to highlight that these organisms contained many more g2+l1-12 compared to g3+l1-12 pqs sequences, which is a characteristic shared with s. cerevisiae [15]. there are now an ever-increasing number of g4s identified within genes linked to microbial pathogenicity. g4-forming motifs located in the hsds, recd, and pmra genes of s. [...] the pearson correlation coefficient was used to determine the association between pqs and gc content. p<0.05 was considered statistically significant.
stop neglecting fungi. nat microbiol strategies for engineering natural product biosynthesis in fungi the regulation and functions of dna and rna g-quadruplexes i-motif dna: structural features and significance to cell biology whole genome experimental maps of dna g-quadruplexes in multiple species quadruplex dna: sequence, topology and structure an intramolecular g-quadruplex structure with mixed parallel/antiparallel mycocosm portal: gearing up for 1000 fungal genomes g4hunter web application: a web server for g-quadruplex prediction panther version 14: more genomes, a new panther go-slim and improvements in enrichment analysis tools applications for protein sequence-function evolution data: mrna/protein expression analysis and coding snp scoring tools comparative transcriptome analysis revealing dormant conidia and germination associated genes in aspergillus species: an essential role for atfa in conidial dormancy additional oxidative stress reroutes the global response of aspergillus fumigatus to iron depletion global transcriptome changes underlying colony growth in the opportunistic human pathogen aspergillus fumigatus a robust phylogenomic time 
tree for biotechnologically and 676 medically important fungi in the genera aspergillus and penicillium. mbio g-quadruplex-induced instability during leading-strand replication rna g-quadruplexes are globally unfolded in eukaryotic 681 cells and depleted in bacteria nucleolin stabilizes g-quadruplex structures folded by the ltr 687 promoter and silences hiv-1 viral transcription aspergillus fumigatus conidia survive and germinate 690 in acidic organelles of a549 epithelial cells genomic distribution and functional analyses of potential g-694 quadruplex-forming sequences in saccharomyces cerevisiae divergent distributions of inverted repeats and g-quadruplex forming 697 sequences in saccharomyces cerevisiae genome-wide prediction of g4 dna as regulatory motifs: role in 701 metabolism impacts upon candida immunogenicity and 703 pathogenicity at multiple levels metabolism in fungal pathogenesis. cold spring harb perspect med antifungal resistance, metabolic routes as drug targets, 707 and new antifungal agents: an overview about endemic dimorphic fungi secondary metabolite arsenal of an opportunistic pathogenic fungus candidalysin is a fungal peptide toxin critical for mucosal infection the fungal cyp51s: their functions, structures identification of aspergillus fumigatus 716 multidrug transporter genes and their potential involvement in antifungal resistance laea, a regulator of morphogenetic fungal virulence factors recognition of dhn-melanin by a c-type lectin receptor is 726 required for immunity to aspergillus aspergillus fumigatus virulence 728 through the lens of transcription factors role of afr1, an abc transporter-encoding gene, in the in vivo 730 response to fluconazole and virulence of cryptococcus neoformans distinct stress responses of two functional laccases in cryptococcus neoformans are revealed in the absence of the thiol-specific antioxidant the capsule of the fungal pathogen cryptococcus neoformans functional characterization of the ferroxidase, 
permease high-affinity 738 iron transport complex from candida albicans characterization of highly conserved g-quadruplex motifs as 742 potential drug targets in streptococcus pneumoniae g-quadruplex dna motifs in the malaria parasite plasmodium 744 falciparum and their potential as novel antimalarial drug targets characterization of g-quadruplex motifs in espb genes of mycobacterium tuberculosis as potential drug targets berberine antifungal activity in fluconazole-resistant pathogenic 752 yeasts: action mechanism evaluated by flow cytometry and biofilm growth inhibition 753 in candida spp key: cord-256837-100ir651 authors: smith, steven b.; dampier, william; tozeren, aydin; brown, james r.; magid-slav, michal title: identification of common biological pathways and drug targets across multiple respiratory viruses based on human host gene expression analysis date: 2012-03-14 journal: plos one doi: 10.1371/journal.pone.0033174 sha: doc_id: 256837 cord_uid: 100ir651 background: pandemic and seasonal respiratory viruses are a major global health concern. given the genetic diversity of respiratory viruses and the emergence of drug resistant strains, the targeted disruption of human host-virus interactions is a potential therapeutic strategy for treating multi-viral infections. the availability of large-scale genomic datasets focused on host-pathogen interactions can be used to discover novel drug targets as well as potential opportunities for drug repositioning. methods/results: in this study, we performed a large-scale analysis of microarray datasets involving host response to infections by influenza a virus, respiratory syncytial virus, rhinovirus, sars-coronavirus, metapneumonia virus, coxsackievirus and cytomegalovirus. 
common genes and pathways were found through a rigorous, iterative analysis pipeline where relevant host mrna expression datasets were identified, analyzed for quality and gene differential expression, then mapped to pathways for enrichment analysis. possible repurposed drugs targets were found through database and literature searches. a total of 67 common biological pathways were identified among the seven different respiratory viruses analyzed, representing fifteen laboratories, nine different cell types, and seven different array platforms. a large overlap in the general immune response was observed among the top twenty of these 67 pathways, adding validation to our analysis strategy. of the top five pathways, we found 53 differentially expressed genes affected by at least five of the seven viruses. we suggest five new therapeutic indications for existing small molecules or biological agents targeting proteins encoded by the genes f3, il1b, tnf, casp1 and mmp9. pathway enrichment analysis also identified a potential novel host response, the parkin-ubiquitin proteasomal system (parkin-ups) pathway, which is known to be involved in the progression of neurodegenerative parkinson's disease. conclusions: our study suggests that multiple and diverse respiratory viruses invoke several common host response pathways. further analysis of these pathways suggests potential opportunities for therapeutic intervention. respiratory viruses account for seasonal colds, bronchiolitis, acute otitis, sinusitis, croup, community-acquired pneumonia, and exacerbation of both chronic obstructive pulmonary disease and asthma [1] . the prevalence of pandemic orthomyxoviridae influenza a virus (flu) from april 2009 to 2010 was estimated to be approximately 60 million cases, 270,000 hospitalizations, and 12,000 deaths [2] . 
paramyxoviridae respiratory syncytial virus (rsv) infection results in nearly two million children requiring medical care with about 57,000 children younger than five years hospitalized annually [3]. in one survey, rsv was the most prevalent pathogen in children under five years with an acute respiratory infection, followed by adenoviridae adenovirus (adeno), and picornaviridae human rhinovirus (hrv) [4]. while initially effective, pathogen gene targeted treatments exert evolutionary selection on the infectious species, often leading to the emergence of drug resistant strains. as a result, there are increasing clinical reports of resistance against many drugs that directly act on viral proteins or their dna [5, 6]. in particular, resistance to different classes of antiviral drugs is becoming more clinically prevalent in respiratory virus infections as seen with rsv and flu treated with the antiviral drugs palivizumab [7] and oseltamivir [8], respectively. pathogens elicit two broad types of biochemical responses in the host. first is the activation of the host immune system. while the immune response is critical in combating pathogen infections, its over-activation often exacerbates tissue damage initiated by viral invasion [9, 10]. the second response is the up-regulation of host genes, such as protein biosynthetic pathways, that are crucial for sustaining pathogen invasion, replication and evasion [11]. interestingly, genetically distinct respiratory viruses often modulate common host proteins and biological pathways during infection [1]. for example, many respiratory viruses trigger similar general airway inflammatory responses such as the expression of cytokines interleukin-6 (hugo gene name il6), interleukin-8 (il8) and interleukin-11 (il11), and granulocyte macrophage-colony stimulating factor (csf2). these inflammatory responses in turn initiate iga production, b cell differentiation and t cell stimulation [12] [13] [14] [15] [16].
as a consequence, diagnosis for specific viral infections is difficult since diverse respiratory viruses cause similar, often indistinguishable patient symptoms [1] . however, because distinct respiratory viruses converge on similar immune responses, opportunities also exist for targeting host proteins and pathways which will potentially affect multiple viral pathogens [17] . moreover, human targets might be less susceptible to the evolution of drug resistance due to constraints on the virus to find alternative host pathways for its proliferation. individuals may experience a co-infection or sequential infections of multiple viruses or bacteria which can complicate both disease diagnosis and drug prescription decisions. furthermore, patients infected by multiple pathogens may have further complications due to drug-drug interactions, cumulative drug toxicities and immune system suppression, as observed during hiv and mycobacterium tuberculosis co-infections [18, 19] . indeed, a study in children under five years showed pervasive clinical occurrences of co-infections involving combinations of rsv, hrv, paramyxoviridae parainfluenza virus, flu, coronaviridae sars-coronavirus (coron), paramyxoviridae metapneumonia virus (mpneu), parvoviridae human bocavirus and adeno [4] . therefore, in addition to minimizing drug resistance, there is a need for new therapeutic approaches to safely and effectively treat co-infections by multiple viral and/or bacterial pathogens, particularly where strain-specific diagnostics or treatments are unavailable. the development of new antiviral therapeutics requires a greater understanding of the global host response when challenged by different types of viruses. such knowledge may lead to the identification of novel human genome targets that are shared across multiple viral infections as well as opportunities for repositioning existing drugs for the treatment of infectious diseases [20] . 
several recent studies have generated multiple mrna microarray gene expression datasets derived from experiments involving the infection of human cell-lines or animal models with one or more of the major respiratory viruses [21] [22] [23] . through a systematic analysis of these respiratory virus-human host gene expression datasets, we determined common sets of genes and pathways involved in host responses to viral infections. among the most significant pathways, we identified several potential new opportunities for repurposing existing drugs for the treatment of respiratory viral infections. we performed a large-scale analysis of published mrna microarray datasets from studies involving a wide range of respiratory viruses in human host infection models. we focused on human mrna array datasets in order to avoid complications inherent in cross-species comparisons. in order to ensure consistency in experimental conditions and reduce biases due to noisy or poor quality datasets, we instituted an iterative process of database querying, data filtering, and common pathway analysis across all published human mrna datasets for twelve relevant respiratory viruses. these viruses initially included the double stranded dna viruses herpesviridae human cytomegalovirus (cmv) and adeno; the positive sense single stranded rna viruses coron, picornaviridae coxsackievirus (cox), hrv, picornaviridae echovirus (echo), and picornaviridae enterovirus (entero); and the negative sense single stranded rna viruses flu, mpenu, rsv, bunyaviridae hantavirus (hant) and sin nombre virus (snv). this list was later narrowed to include only the subset listed in table 1 based on filtering processes outlined in the materials and methods and shown in figure 1 . 
a total of seven different respiratory viruses were analyzed, represented by fifteen unique gene expression omnibus (geo) datasets (indicated by geo series or gse accession numbers), nine different human cell types, and seven different array platforms for a total of 28 unique comparisons. note that one dataset (gse17156) contained two different viruses (flu and rsv) that were analyzed. after querying the geo database and prescreening for obvious non-candidate datasets such as those not associated with human array platforms, there were at least 23 datasets associated with at least one of the twelve respiratory viruses. however, after considering all conditions for geo dataset candidacy, at least four of these datasets were excluded. in one case, an adeno dataset (gse1291 [pmid unpublished]) had less than three samples per treatment group, as did a cox (gse712 [pmid unpublished]) and a cmv (gse19345 [24] ) dataset. as another example, a cmv dataset (gse675 [25] ) lacked a healthy/control treatment group. additionally, at least four datasets had some comparison groups that did not fit the filters for inclusion. for instance, an hrv (gse13396) dataset's original study design was to observe differences in hrv infectivity between asthmatic and non-asthmatic patients. the asthmatic comparison group data were eliminated from the analysis because of potential difficulties in distinguishing between host inflammatory responses due to viral infections from those associated with chronic asthma. similarly, a combined flu, hrv and rsv dataset (gse17156) contained two main patient groups. one group was classified as developing symptoms after exposure to a single virus under study, while the other group did not develop any symptoms after exposure. only the group that developed symptoms for each of the three viruses was considered for further analysis and the asymptomatic group was omitted. 
in total, 19 geo datasets, representing 42 unique comparisons (different time points and/or virus strains) were analyzed for quality because they met the four requirements for dataset candidacy. no single dataset exhibited overall poor quality control (qc), and therefore, all 19 datasets representing 42 comparison groups were analyzed for differential expression. however, qc analysis across all candidate datasets revealed two outliers in gse17156 (samples gsm429252 and gsm429279), two in gse11348 (samples gsm286647 and gsm286733), and one outlier each in dataset gse24132 [26] (sample gsm594166), gse1739 (sample gsm30367), and gse19580 (sample gsm487986) for a total of seven samples removed from five different datasets. an illustration of the kernel density and principal component analysis (pca) plots generated during the qc analysis is shown in figure 2 for gse17156's rsv treatment (median of 141 hours post infection) and rsv control (baseline) groups. additional qc analysis results including median of absolute deviation (mad) score plots and pair-wise correlation maps are shown in figure s1. initially, all samples except gsm429279 showed acceptable kernel density (figure 2a), pca (figure 2c), mad score (figure s1a) and pair-wise correlation (figure s1c) plots. the sample gsm429279 was removed because: a) it did not conform to the kernel density of the other samples; b) it fell outside of the hotelling t2 alpha threshold of 0.05 (represented by the superimposed ellipse on the pca plot); and c) it was an outlier in both the mad score and pair-wise correlation plots. a second qc round was performed, which resulted in a further non-conforming sample, gsm429252, being discarded. subsequent qc analysis generated acceptable results in kernel density (figure 2b), pca (figure 2d), mad score (figure s1b), and pair-wise correlation (figure s1d); thus this dataset passed our criteria for inclusion in the analysis.
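the iterative outlier removal described above can be illustrated with a minimal sketch. the text does not give the exact definition of the mad score, so the version below is an assumption: each sample is scored by its average absolute deviation from the per-probe median, and samples whose robust z-score exceeds a hypothetical cutoff of 3.5 are flagged. the function name, sample identifiers and cutoff are illustrative and are not taken from the study.

```python
from statistics import median


def mad_outliers(samples, cutoff=3.5):
    """Flag samples whose overall deviation from the probe medians is extreme.

    samples: dict mapping a sample id to a list of expression values,
             with every list in the same probe order.
    cutoff:  hypothetical robust z-score threshold (an assumption).
    """
    ids = list(samples)
    n_probes = len(samples[ids[0]])

    # Median expression of each probe across all samples.
    probe_medians = [median(samples[s][i] for s in ids) for i in range(n_probes)]

    # Average absolute deviation of each sample from the probe medians.
    deviation = {
        s: sum(abs(v - m) for v, m in zip(samples[s], probe_medians)) / n_probes
        for s in ids
    }

    # Robust centre and spread of the per-sample deviations.
    centre = median(deviation.values())
    mad = median(abs(d - centre) for d in deviation.values())
    if mad == 0:
        return []  # all samples deviate identically; nothing to flag

    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [s for s in ids if 0.6745 * abs(deviation[s] - centre) / mad > cutoff]


samples = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.1, 2.1, 2.9, 4.0],
    "c": [0.9, 1.9, 3.1, 4.1],
    "d": [1.0, 2.0, 3.0, 3.9],
    "e": [5.0, 7.0, 9.0, 1.0],  # deliberately inconsistent sample
}
flagged = mad_outliers(samples)
```

in a two-round procedure like the one described for gse17156, the flagged samples would be dropped and the function run again on the remainder until no sample is flagged.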
all datasets exhibiting acceptable quality were analyzed for probe differential expression. an example volcano plot is shown in figure s2 for rsv treatment group at peak symptoms versus control group (data originating from gse17156). cutoff levels of a 1.5-fold increase or decrease in probe expression and p-values <0.05 were used throughout (represented by red lines in figure s2). all comparison groups had at least some differentially expressed probes, although the number varied greatly, indicating potential false discoveries (for example, a comparison group within gse18816 had 111 differentially expressed probes while a comparison group within gse11408 had 2533 differentially expressed probes). however, the conservative pathway enrichment approach we employed tends to attenuate falsely discovered genes. there were three comparison groups that did not meet the least square mean (lsm) threshold requirement and were excluded from the differentially expressed probe list: two of the
for each comparison group, the differentially expressed probes were mapped to their corresponding genes, and then a p-value was assigned for each pathway map using the software genego (accessed june 2011). next, the comparison groups' significant pathway lists were combined to find the union of all significant pathways (that is, the combined pathway list where all treatment groups have at least one significant pathway). a total of 459 out of the approximately 650 pathway maps available in metabase were determined to be significant. comparison groups having <5% of the total significant pathways (that is, fewer than 23 significant pathways) were excluded, which removed eleven comparison groups from the union list.
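the two numeric filters in this step can be written out as a short sketch. it assumes fold change is expressed as a linear treatment/control ratio and that the 1.5-fold cutoff is inclusive; neither detail is stated in the text, and the function names are illustrative.

```python
import math

FOLD_CUTOFF = 1.5  # 1.5-fold increase or decrease
P_CUTOFF = 0.05    # p-value threshold


def is_differentially_expressed(ratio, p_value):
    """ratio is treatment/control on a linear scale (an assumption)."""
    changed = ratio >= FOLD_CUTOFF or ratio <= 1.0 / FOLD_CUTOFF
    return changed and p_value < P_CUTOFF


def min_significant_pathways(total_significant, fraction=0.05):
    """Smallest pathway count that reaches the given fraction of the total."""
    return math.ceil(total_significant * fraction)


def keep_groups(pathway_counts, total_significant):
    """Keep comparison groups with at least 5% of the significant pathways."""
    threshold = min_significant_pathways(total_significant)
    return {g: n for g, n in pathway_counts.items() if n >= threshold}


# 5% of 459 significant pathways is 22.95, so a group needs at least 23.
threshold = min_significant_pathways(459)
```

with 459 significant pathways the threshold is 23, which matches the "fewer than 23 significant pathways" exclusion rule stated above.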
excluded groups were: hrv at 8 hours (eliminating one comparison group from gse11348), hrv at 72 hours (eliminating one comparison group from gse17156), both strains of flu at 1 hour and 3 hours each and another strain at 6 hours (eliminating three comparison groups from gse18816), rsv at 24 hours (eliminating all comparison groups from gse24132), cmv at 24 and 72 hours (eliminating all comparison groups from gse24434 [27] ), and flu at 8 hours (eliminating all comparison groups from gse24533 [28] ). at the end of the final step in our filtering process, a total of 15 datasets, or 28 comparison groups remained (tables 1, s1 and s2). there were 67 enriched pathways in which all seven respiratory viruses were represented by at least one comparison group (table s3 ). the list is ranked first by the viral frequency, followed by the sum of the normalized viral expression (nve) for each pathway. also shown are the differentially expressed as well as the total number of network objects across all 28 comparisons. the top 20 enriched pathways are listed in table 2 along with the percentage and names of the differentially expressed genes with a viral frequency of at least five in each pathway. of these, the top five pathways were chosen for further analysis and mapping. these pathways are epidermal growth factor receptor (egfr) signaling, cd40 signaling, interferon-gamma (ifng) signaling, histamine receptor h1 (hrh1) signaling, and interleukin-17 (il17) signaling (figures s5. s6, s7, s8, s9; table s4 ). additionally, the parkin-ubiquitin proteasomal system (parkin-ups) pathway was chosen for further analysis because it has not been previously associated with the innate immunity and might be an interesting new mechanism of host response to respiratory viral infection ( figure 3 ). the nves for differentially expressed genes with frequencies of at least six viruses are shown in table 3 along with their associated pathways. 
the list is ranked by the greatest viral frequency, and then by number of pathways in which the gene is differentially expressed. the nve values for all genes, along with associated pathways, ranked by the greatest viral frequency, followed by the number of pathways in which the gene is differentially expressed are in table s5. we ensured that the nve was not biased toward any particular comparison group, and indeed no single dataset contributed to the overall nve for any single virus (table s2). hierarchical clustering on the quantile normalized fold change values for all genes having expression values in at least 26 out of 28 comparisons (at least 90% comparisons) and significant in at least seven comparisons (figure s3) as well as for genes with nve of at least six viruses (figure s4) did not reveal any dominant clustering by gse or virus type. the most consistently up-regulated genes (up-regulated in at least six viruses and down-regulated in no more than one virus) are: nuclear factor of kappa light polypeptide gene enhancer in b-cells inhibitor alpha (nfkbia), tumor necrosis factor alpha-induced protein 3 (tnfaip3), chemokine c-c motif ligand 2 (ccl2), interferon regulatory factor 1 (irf1), prostaglandin-endoperoxide synthase 2 (ptgs2), chemokine c-c motif ligand 20 (ccl20), dual specificity phosphatase 1 (dusp1), eukaryotic translation initiation factor 2-alpha kinase 2 (eif2ak2), tnf receptor superfamily member 6 (fas), suppressor of cytokine signaling 1 (socs1), tnf receptor-associated factor 1 (traf1), and ubiquitin-conjugating enzyme e2l 6 (ube2l6). there were no consistently down-regulated mrnas (down-regulated in at least six viruses and up-regulated in no more than one virus).
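the ranking rule and the definition of a consistently up-regulated gene can be sketched as follows. the record layout and gene labels are hypothetical placeholders, and "viral frequency" is taken here to mean the number of viruses in which a gene is differentially expressed in either direction, which is an assumption about the study's definition.

```python
def is_consistently_up(calls, min_up=6, max_down=1):
    """calls maps a virus name to 'up', 'down' or None (not differentially expressed)."""
    n_up = sum(1 for c in calls.values() if c == "up")
    n_down = sum(1 for c in calls.values() if c == "down")
    return n_up >= min_up and n_down <= max_down


def rank_genes(records):
    """Sort (gene, viral_frequency, n_pathways) tuples.

    Highest viral frequency first; ties broken by the number of pathways
    in which the gene is differentially expressed.
    """
    return sorted(records, key=lambda r: (-r[1], -r[2]))


# Hypothetical calls for one gene across the seven viruses.
calls = {
    "flu": "up", "rsv": "up", "hrv": "up", "coron": "up",
    "mpneu": "up", "cox": "up", "cmv": "down",
}
consistent = is_consistently_up(calls)

# Hypothetical records: (gene label, viral frequency, number of pathways).
ranked = rank_genes([("gene_a", 6, 3), ("gene_b", 7, 1), ("gene_c", 6, 9)])
```

swapping the roles of 'up' and 'down' in `is_consistently_up` gives the mirror-image test for consistently down-regulated genes, for which the study reports no hits.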
we sought drug repurposing candidate targets from the top five enriched pathways and the parkin-ups pathway by searching the drugbank database, version 3.0 (http://www.drugbank.ca/ accessed august 2011) [29] [30] [31], for drugs targeting any of the 67 differentially expressed genes with a viral frequency of at least five (table s6). of these, thirteen genes, or almost 20% of the original 67 genes, were associated with at least one approved small molecule or protein therapy. these genes were: prostaglandin-endoperoxide synthase 2 (ptgs2), tnf, matrix metallopeptidase 9 (mmp9), jun proto-oncogene (jun), interleukin 1 beta (il1b), ccl2, cd86, coagulation factor iii (f3), phosphoinositide-3-kinase regulatory subunit 1 (pik3r1), intercellular adhesion molecule 1 (icam1), nuclear factor of kappa light polypeptide gene enhancer in b-cells 2 (nfkb2), caspase 1 (casp1), and tubulin beta 3 (tubb3). a selection of these genes, along with other characteristics to evaluate their potential as drug targets such as involvement in immune response [29] [30] [31], jackson laboratory knock-in/knock-out mouse (jax) phenotype [32], approved or marketed small molecule drug or protein therapy, and current indications for that drug, are listed in table 4. note that the current indication may not be for the gene target listed. mimosine (gene target: ccl2) and glucosamine (gene targets: nfkb2 and mmp9) did not have a current indication, while the interactions of natalizumab (gene target: icam1) and gallium nitrate (gene target: il1b) with their gene targets were unclear. additionally, therapies associated with ptgs2 are cyclooxygenase (cox-2) inhibitors, which have known side-effect issues and thus were not explored further. therefore, nfkb2, icam1 and ptgs2 were excluded from table 4, leaving ten genes for potential drug repurposing. the potential cases for drug repurposing are discussed more in-depth for four targets: f3, il1b, tnf and casp1.
our study used a systematic process to minimize potential technical noise that could have arisen from our comparative analysis of fifteen unique datasets from nine different cell types, and seven different array platforms. these measures included candidate dataset filtering followed by qc, differential gene expression and pathway enrichment analyses. a total of 14 out of 42, about one third of the total comparisons, were removed as a result of this filtering process, which is indicative of our conservative analysis approach. we had previously used largescale and merged-sam analyses in integrating large-scale microarray datasets involving cancer tissues from multiple laboratories [33, 34] . however, the small sample size datasets used in the present study required a more rigorous methodology to identify data outliers. to our knowledge the qc analysis performed with each geo dataset is unique to this study. although no dataset was completely disregarded after qc analysis, some samples were clear outliers, thus potentially skewing the data. kauffmann and huber have demonstrated improvements in signal-to-noise ratios after performing post normalization qc analysis to remove array outliers within an experiment [35] . those authors used ma-plot and boxplots of the log-ratios to determine outliers instead of mad scores, pca and pair-wise correlations employed in this study. fundamentally, the concept of data improvement after outlier removal applies regardless of the qc analysis approach. 
signaling 26 jun, myc, nfkbia, stat1, fos, jak2, hbegf, dusp1, dusp4, ptk2, gsk3b, mmp9, nfkb2, pik3r1, prkca, sos2, tgfa
cd40 signaling 31 il8, jun, nfkbia, tnfaip3, ccl2, fas, il6, irf1, jak2, ptgs2, traf1, ccnd2, cd86, icam1, lyn, map2k3, map3k14, mapk14, nfkb2, pik3r1, tp53, traf5
ifng signaling 24 myc, stat1, cdkn1a, eif2ak2, irf1, jak2, socs1, stat2, camk2g, cebpb, icam1, mapk14, pik3r1, prkca, ptpn11
hrh1 signaling 25 il8, jun, nfkbia, fos, il6, tnf, csf2, f3, gnaq, gnb4, gng12, icam1
despite the diverse nature of the microarray data analyzed here, we found a large overlap between comparison groups in significant pathways, especially the immune system. of the top twenty enriched pathways, eighteen are associated with immune response (table 2). for example, egfr signaling is known to be activated during infection by respiratory viruses flu [36] and entero [37, 38]. cd40 signaling is associated with coron [39], rsv [40], and the general immune response [41]. interferon gamma (ifng) signaling is initiated by flu [42] and rsv [43], while interleukin 1 signaling is stimulated by flu [42]. as components of the general immune response, interferon and interleukin pathways are activated by infectious agents such as hepatitis c virus (hcv), hiv and tuberculosis as well as chronic diseases like crohn's disease, diabetes, and metastatic melanoma [44, 45]. the overall relationships between the transitory host immunity response launched by pathogenic infections versus that seen in chronic autoimmune and neurodegenerative diseases are complex and an intense area of investigation [46]. in addition, there are considerations about subtle shifts in gene function roles in different cell tissue types amongst the various diseases. thus, we are cautious about any linkages between pathways involved in infections and those of chronic diseases as implied by our analysis without further validation studies.
parkin (park2) is an e3-ubiquitin ligase associated with the progression of the neurodegenerative disorder parkinson's disease [47]. as a central hub protein in the parkin-ups pathway, park2 ubiquitinates proteins encoded by septin 5 (sept5) [48], tubulin alpha and beta [49], and the glycosylated form of synuclein, alpha (snca) [50] for degradation by the 26s proteasome. park2 also ubiquitinates synuclein, alpha interacting protein (sncaip) for regulation of snca [51], interacts with stip1 homology and u-box containing protein 1 e3 ubiquitin protein ligase (stub1) to enhance ubiquitination of g protein-coupled receptor 37 (gpr37) [52] (which associates with f-box and wd repeat domain containing 7 (fbxw7)), and cullin 1 (cul1) to ubiquitinate cyclin e [53]. park2 is deactivated by proteolytic cleavage by casp1 and caspase 8 (casp8) [54] and can be activated by either heat shock protein 70kd (hspa4) or stub1 [52]. the parkin-ups pathway is not commonly associated with general immune response to viral infection. however, other ubiquitylation proteins, such as isg15, are known to play roles in host defense [55, 56]. associations between influenza infection and neuroinflammation in early onset autosomal recessive parkinson's disease have been recently suggested [57] [58] [59]. at least one factor in the progression of parkinson's disease is the formation of neurotoxic lewy bodies due to increases in snca. increases in snca are believed to be the result of loss-of-function mutations in park2 which cause disruptions in the protein's localization and solubility [60] [61] [62]. polymorphisms in the gene park2 have also been associated with susceptibility to infectious diseases such as leprosy, typhoid fever and paratyphoid fever, although the exact mechanism is still unclear [63, 64]. jang et al.
observed activation of snca in mouse nervous tissue long after pathogenic h5n1 flu infection where the increased levels of snca mirror those found in parkinson's disease [57] . similarly, recent findings from rohn and catlin indicate flu as a potential causative factor for parkinson's disease [58] . indeed, links between flu and other neurodegenerative diseases have been suggested, and these include seizures, transverse myelitis, expressive aphasia, syncope, encephalitis, neuromyelitis optica, and central nervous system disease in general [65] [66] [67] . park2 itself has a low signal at the mrna level which might be due to its significant regulation by post-translation processes [52, 54] . further studies are needed to determine the mechanism our analysis suggests several potential repurposing opportunities for launched drugs against host-viral targets (table 4 ). this assumption is based on the occurrence of genes that are differentially expressed in infection models for at least five of the seven respiratory viruses, have involvement in a number of relevant pathways related to host immune response, and encode for known drug targets. the drugs associated with this gene list do not have current indications as anti-viral therapies, although pranlukast and clenbuterol are prescribed for relief of lung disorders such as bronchospasm after allergic reactions and asthma bronchoconstriction during asthma attacks, respectively. also, minocycline, sometimes called minocin, is a broad-spectrum tetracycline antibiotic as well as a caspase 1 (casp1) inhibitor while chloroquine is a well-known anti-malaria drug [29] [30] [31] . in fact, eight of the ten drug repurposing gene targets are involved in activation of the innate immune response, while the remaining two have some evidence of virus modulation. potential drug repurposing opportunities for f3, il1b, tnf, and mmp9, as well as the parkin-ups pathway gene product casp1, are discussed below. coagulation factor iii (f3). 
f3 normally binds to the native cofactor vii or viia to induce the blood coagulation cascade. treatment with recombinant coagulation factor viia promotes blood coagulation in hemophiliacs [29] [30] [31] . esmon et al. [68] suggest that coagulation could be used therapeutically to modulate inflammation responses and vice versa, but also caution about the danger of increased incidence of thrombosis. the consistent upregulation of f3 across five viruses suggests that the immunecoagulation axis is already initiated and supplemental f3 activation may cause thrombosis complications. further study is needed to develop therapeutics that could balance between innate immune response triggered by coagulation factor viia therapy and stabilization of the antithrombotic state. interleukin 1 beta (il1b). il1b is a cytokine involved in inflammatory response, cell proliferation, differentiation, and apoptosis. il1b is specifically cleaved into its active form by the protease casp1 after which it activates the nlrp3 inflammasome [29] [30] [31] 69] . indeed, il1b is consistently up regulated across cmv, flu, hrv, mpenu and rsv which likely correlates with inflammasome activation. however, overexpression of il1b causes multiple inflammatory disorders [69] . antagonists or neutralizers of il1b, such as canakinumab, could potentially reduce inflammation damage associated with viral infection. tumor necrosis factor (tnf). tnf has a wide range of biological functions including modulation of immune response to pathogen assault. mouse tnf knock-out phenotypes include abnormal immune system physiology, increased susceptibility to viral infection, and both increased and decreased susceptibility to bacterial infection [29] [30] [31] . in our study, tnf is mostly up regulated in infections by cmv, coron, cox, and flu but directionally ambiguous for mpneu and not expressed under rsv. 
while total disruption of tnf function would be deleterious to the host, there are instances where partial tnf inhibition provides a clinical benefit in patients with viral complications [70, 71]. pranlukast is a cysteinyl leukotriene receptor-1 antagonist that reduces bronchospasm caused by an allergic reaction, usually in asthmatic individuals. this drug inhibits tnf-alpha by blocking macrophage cysteinyl leukotriene 1 (cysltc4, d4) receptors [72] or by suppressing nf-kappa b activation [73]. pranlukast has been recently shown to be beneficial not only in cases of respiratory syncytial virus postbronchiolitis, but also in a wide variety of other diseases with strong inflammatory complications such as cystic fibrosis, cancer, atherosclerosis, eosinophilic cystitis, otitis media, capsular contracture, and eosinophilic gastrointestinal disorders [71]. amrinone is a type 3 pyridine phosphodiesterase inhibitor used in the treatment of congestive heart failure and is an inhibitor of tnf [74]. phosphodiesterase inhibitors have been shown to alter immune response [75] [76] [77] [78] and, in one case, specifically through tnf [79]. amrinone is known to modulate pro- and anti-inflammatory factors in endotoxin-stimulated cells [80]. type 4 phosphodiesterase inhibitors have been used to treat rsv-induced airway hyper-responsiveness and lung eosinophilia [81]. therefore, indirect evidence suggests that amrinone may be beneficial in respiratory viral infection situations by inhibiting tnf via type 4 phosphodiesterase, although this has yet to be seen in clinical studies. matrix metallopeptidase 9 (mmp9). mmp9 encodes a matrix metallopeptidase that degrades type iv and v collagens, and is implicated in arthritis and metastasis [29] [30] [31]. we can only speculate on the role mmp9 plays in infection. our analysis finds the gene to be up-regulated for three viruses while down-regulated for two different viruses.
in other studies, mmp9 has been observed to be up-regulated after exposure to double stranded rna and is important to airway injury [82], specifically by rsv [83]. mmp9 expression is induced by il1b [84] which, as mentioned above, is an activator of the nlrp3 inflammasome [85]. mmp9 inhibitors such as marimastat, minocycline or captopril could be beneficial, assuming that the protein is coopted by the infecting virus for tissue remodeling. blocking mmp9 may also reduce inflammatory damage by down-regulating the inflammasome.

caspase 1 (casp1). in the case of the parkin-ups pathway, inhibiting tubulin-beta formation may reduce viral proliferation given that flu utilizes acetylated tubulin for protein trafficking [86] and increases in neuronal class iii tubb occur after cox infection [87]. a casp1 inhibitor such as minocycline could be used to increase park2 ubiquitinase activity, in turn decreasing the tuba or tubb availability. as mentioned above, casp1 is a component of the nlrp3 inflammasome, activating the precursor to il1b [69]. therefore, a casp1 inhibitor would have an antagonist relationship with il1b, and hence with the inflammasome. further, casp1 inhibitors would be agonists for park2, thereby reducing accumulation of snca. in this regard, casp1 inhibitors may not only prevent unnecessary nlrp3 inflammasome activation via il1b, but may also reduce accumulation of neurotoxic lewy bodies through activation of park2. however, caspases are not specific to the parkin-ups pathway and inhibition in this regard may result in toxicity or other complications [88]. additionally, mouse jax phenotypes for casp1 show both increased and decreased susceptibility to bacterial infection, as well as decreased inflammatory response. while casp1 inhibition may prove beneficial in terms of increasing inflammatory responses, it is ambiguous in terms of benefit for bacterial infections. in our analysis, the expression of casp1 and tubb3 is also somewhat variable across virus types.
therefore, more study is needed specifically on the role of caspase and tubulin in host response to respiratory virus infection.

modulation of any human host pathway for the treatment of viral infections has potential drawbacks with respect to toxicity and other side-effects. for example, although interferon is widely used to help combat viral pathogens, the treatment is known to cause an array of side-effects related to toxicity including confusion, lethargy, impaired mental status, numbness, tingling, fevers, chills, headaches, anorexia and sepsis [89, 90]. another caveat is that some proteins are beneficial if up-regulated during initial viral infection but have detrimental effects if over-activated for prolonged periods. thus determining the desired mechanism and direction of therapeutic intervention requires careful study. although targeting host-pathogen interactions is a challenging therapeutic approach, there are considerable upside benefits with respect to overcoming pathogen-mediated drug resistance and the capability of treating multiple, co-infecting pathogens. our study suggests several potential human-host proteins that could be targets of future therapeutics as well as some possible drug candidates for further investigations of repurposing against respiratory virus infections.

the national center for biotechnology information's geo database (http://www.ncbi.nlm.nih.gov/geo/, accessed between january and july 2011) was searched for human mrna datasets for twelve respiratory viruses. dataset candidacy filtering (figure 1) reduced the number of viruses with suitable datasets to seven species (table 1). all analyzed geo datasets contain at least one "treatment group" and "control group". "treatment" was the experimental variable under study, usually a virus type, strain, or time point. "group" was a collection of individual "samples", or replicates, each of which originates from its own microarray chip.
"comparison group" was the treatment group compared to a control group. a particular dataset may have more than one comparison group. all criteria for dataset inclusion in the final analysis were chosen prior to the analysis. dataset candidacy filtering consisted of four criteria: 1) the dataset must contain at least 3 samples per treatment or control group, because a smaller sample size would mean a loss in statistical power for subsequent analysis; 2) the microarray platform must be supported by either affymetrix, agilent or illumina, due to the probe mapping abilities of the software used in subsequent analysis; 3) each gene expression profile had to be derived from human cells and probed using a human-based genome microarray platform and not that of another species; and 4) the dataset must contain at least one wild-type infection treatment group (i.e., unmodified virus strain or infectivity mechanism) and at least one healthy control group (i.e., no genetic or media modifications such as gene knock-outs or inhibitors, respectively).

prior to quality control (qc) analysis, we pre-screened and preprocessed each dataset. normalized raw data and the study design table were imported from the geo databases (the data was assumed to be normalized by robust multi-array average, but in some cases the published study used an alternative normalization method). where appropriate, the intensity values were log2 transformed. various experimental parameters such as time point, virus strain and number of replicates were extracted from the study design tables. samples irrelevant to the main study design were marked for segregation or exclusion from our downstream analysis, but not excluded from quality assessment. these were classified as "failing to meet treatment specification" at the candidate filtering step. studies that had a large number of missing intensity values (over 10%) were annotated and flagged.
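The four candidacy criteria and the missing-value flag described above can be sketched as a simple predicate. This is only an illustration: the `Dataset` record, its field names, and the helper functions are hypothetical and are not part of the software used in the study.

```python
from dataclasses import dataclass

# Platforms named in criterion 2 of the text.
SUPPORTED_PLATFORMS = {"affymetrix", "agilent", "illumina"}


@dataclass
class Dataset:
    # All field names are hypothetical, chosen for this illustration.
    platform: str
    species: str
    group_sizes: dict              # group name -> number of samples
    has_wild_type_infection: bool  # at least one unmodified-virus treatment group
    has_healthy_control: bool      # at least one unmodified control group
    missing_fraction: float        # fraction of missing intensity values


def passes_candidacy(ds: Dataset) -> bool:
    """Apply the four inclusion criteria listed in the text."""
    enough_samples = all(n >= 3 for n in ds.group_sizes.values())
    supported = ds.platform.lower() in SUPPORTED_PLATFORMS
    human = ds.species.lower() == "human"
    design_ok = ds.has_wild_type_infection and ds.has_healthy_control
    return enough_samples and supported and human and design_ok


def flag_missing(ds: Dataset, threshold: float = 0.10) -> bool:
    """Flag studies with more than 10% missing intensity values."""
    return ds.missing_fraction > threshold
```

A dataset with a two-sample group would fail `passes_candidacy`, while a high missing-value fraction only flags the dataset for annotation and does not by itself exclude it.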
the qc analysis assessed each sample in the dataset for kernel density, pca, mad, and pair-wise pearson correlation such that: 1) the kernel density was normally distributed; 2) pca values were within the hotelling t2 alpha level threshold of 0.05 [91] [92] [93]; 3) mad scores were in the range of +3 to -3 with no outliers [94]; and 4) inner-treatment group pair-wise correlations were ≥0.97 for samples derived from a single cell, or ≥0.90 if taken from individual donors [94]. figures were created using array studio software, version 4.1 (omicsoft corporation, research triangle park, nc, usa) [95]. during subsequent analysis, each comparison group was treated separately, regardless of dataset origination, in order to gain a wider, less biased view of representative genes and pathways. once a comparison group passed the qc analysis filters, lsm values were calculated for each probe using array studio in order to reduce the number of false positives due to low probe intensity values. probes within each of the filtered datasets were tested for biological and statistical relevance using the array studio implementation of fold change and statistical models, respectively. specifically, to determine a probe's fold change expression when compared to control, the geometric mean of each probe's log2 transformed intensity value within a treatment was generated, and then normalized to the corresponding control group's geometric mean. the treatment versus control data were fitted to a general linear model, and associated p-values for each probe were calculated using a modified t-test [96]. thus, to be considered differentially expressed, each probe within a comparison group must have a p-value <0.05 after the general linear model test and a fold change of at least 1.5 in either direction.
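The differential expression rule above can be illustrated with a short sketch. It rests on two assumptions that are not taken from the source: the group summary is computed as the arithmetic mean of log2 intensities (equivalent to the geometric mean of raw intensities), and the p-value is supplied by the caller, since the modified t-test implemented in array studio is not reproduced here. Function names are hypothetical.

```python
from statistics import mean


def fold_change(treat_log2, ctrl_log2):
    """Ratio of group geometric means, computed from log2 intensities.

    Assumption: the mean of log2 values is used, so the result equals
    geometric_mean(treatment) / geometric_mean(control) on the raw scale.
    """
    return 2.0 ** (mean(treat_log2) - mean(ctrl_log2))


def is_differentially_expressed(fc, p_value, fc_cutoff=1.5, alpha=0.05):
    """Probe passes if p < alpha and it changes at least fc_cutoff-fold
    in either direction (up: fc >= 1.5, down: fc <= 1/1.5)."""
    changed = fc >= fc_cutoff or fc <= 1.0 / fc_cutoff
    return p_value < alpha and changed
```

Under these assumptions a probe whose treatment group sits one log2 unit above control has a fold change of 2 and passes the cutoff if its p-value is below 0.05.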
to visualize a comparison group's significance and fold change, volcano plots were generated using array studio of a probe's -log(p-value) versus its fold change (fc) value, transformed according to a piece-wise function.

the differentially expressed probes were mapped to their corresponding genes using metacore/metabase (genego), a software/database package that creates biological pathways and networks from gene lists (database accessed june 2011) [97, 98]. if more than one probe mapped to a gene, the probe with the highest magnitude fold change value was used for that gene. thus, the mapped differentially expressed probe list became the differentially expressed gene list for each comparison group. the differentially expressed gene lists from each comparison group were analyzed for enriched pathways using genego. a p-value for each of the 658 pathway maps in the metabase was generated for each comparison group using a hypergeometric test [99]. to be considered enriched, a pathway had to have a p-value <0.01 within a comparison group and occur in >5% of the total studies. the enriched pathway list was ranked by its viral frequency, which is defined by the number of viruses represented by at least one comparison group, and then by the sum of normalized viral expression (nve) for each enriched pathway. the nve for each pathway was calculated using the number of comparisons containing significant pathways within a virus type relative to the number of comparisons within that virus type. for example, if one out of four flu comparisons for pathway a were significant, the nve for flu would be 1/4. ranking the pathways in this fashion resulted in a clearer determination of pathways shared across multiple viruses, irrespective of time, strain type, or number of comparison groups.
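The nve calculation and the two-level ranking described above can be sketched as follows. The input layout (a list of `(virus, is_significant)` pairs per pathway) and the function names are assumptions made for this illustration, not the format used by genego or the authors.

```python
from fractions import Fraction


def nve_by_virus(comparisons):
    """Per-virus nve for one pathway.

    comparisons: list of (virus, is_significant) pairs, one pair per
    comparison group (hypothetical input layout).
    """
    totals, hits = {}, {}
    for virus, significant in comparisons:
        totals[virus] = totals.get(virus, 0) + 1
        hits[virus] = hits.get(virus, 0) + (1 if significant else 0)
    return {v: Fraction(hits[v], totals[v]) for v in totals}


def rank_pathways(pathways):
    """Rank by viral frequency first, then by the sum of nve values.

    pathways: dict mapping pathway name -> list of (virus, is_significant).
    """
    def key(name):
        nve = nve_by_virus(pathways[name])
        viral_frequency = sum(1 for value in nve.values() if value > 0)
        return (viral_frequency, sum(nve.values()))

    return sorted(pathways, key=key, reverse=True)
```

Using exact fractions keeps the example from the text literal: one significant flu comparison out of four yields `Fraction(1, 4)`.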
after examining the ranked pathway list described above, the top five significant pathways and an additional pathway representing a unique mechanism were further analyzed. with each map, the proteins were labeled according to the number of viruses in which the transcript was differentially expressed thus yielding the viral frequency for that protein. in cases where a protein complex was made up of subunits, the greatest magnitude fold change value for any subunit was chosen to represent the entire complex. genego was used for the visualization of this pathway map. similar to the pathway nve, the nve for each gene within these six chosen pathways was calculated using the number of comparisons containing either up or down regulated genes for each protein within a virus type relative to the number of comparisons within that virus type. for example, if two out of three rsv comparisons for gene x were up-regulated, gene x's nve for rsv would be 2/3. we performed complete linkage and correlation distance hierarchical clustering using arraystudio on quantile normalized fold change values to determine the separation qualities of the analyzed data [100] . clustering was performed on genes that had expression values for at least 90% of the total number of comparisons. we used the matlab function 'knnimpute' to impute missing fold change values using k-nearest neighbors estimation (matlab version 7.11 (r2010b), mathworks, cambridge ma, 2010) [101, 102] . approved or marketed small molecule and protein therapeutics for each of the differentially expressed proteins modulated by 5 or more respiratory viruses were obtained from the drugbank database, version 3.0 (http://www.drugbank.ca/ accessed august 2011) [29] [30] [31] . we only considered those drugs that were launched products with experimental and clinical evidence of direct interaction with gene product in question. 
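Two of the selection rules in this part of the methods, choosing the greatest-magnitude subunit value to represent a protein complex and keeping only genes with values in at least 90% of comparisons before clustering, can be sketched as below. The sketch assumes signed fold change values and uses `None` for a missing value; both the data layout and the function names are hypothetical. The clustering and k-nearest neighbors imputation steps themselves are not reproduced.

```python
def complex_fold_change(subunit_fold_changes):
    """Represent a complex by the subunit value of greatest magnitude.

    Assumption: fold changes are signed, so magnitude is the absolute value.
    """
    return max(subunit_fold_changes, key=abs)


def genes_for_clustering(fold_changes, min_fraction=0.9):
    """Keep genes with a value in at least min_fraction of comparisons.

    fold_changes: dict mapping gene -> list of values, with None where the
    gene has no expression value for a comparison (hypothetical layout).
    """
    keep = []
    for gene, values in fold_changes.items():
        if not values:
            continue
        present = sum(v is not None for v in values)
        if present / len(values) >= min_fraction:
            keep.append(gene)
    return keep
```

Genes that pass this filter may still contain a few missing values, which is where an imputation step such as the one described in the text would apply.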
the therapy's interaction with the target and approved indication were identified using a combination of drugbank, the drug manufacturer's information page, and the national center for biotechnology information's pubchem (http://pubchem.ncbi.nlm.nih.gov/ accessed september 2011) [103] and gene (http://www.ncbi.nlm.nih.gov/gene/ accessed september 2011) databases. supplemental evidence of mechanism of action was obtained from immune or infection-related jackson laboratory knock-in/knock-out mouse (jax) phenotype (http://www.jax.org/ accessed september 2011) [32].

(table 3). the horizontal axis contains each of the 28 different comparisons labeled by virus, gse and time point. the vertical axis shows the clustering of 27 genes from the top five and parkin-ups pathways that have an nve of at least 6 and have an expression value in at least 26 comparisons. for genes present in more than one of the five pathways, the number of participating pathways is indicated by the count of "*" before the gene name. color scheme is as described for figure s3. (tif)

figure s5 epidermal growth factor receptor signaling pathway with viral frequency. viral frequencies superimposed for each of most frequently differentially expressed proteins, where red circles are differential expression of genes by 7 viruses, orange circles are differential expression of genes by at least 6 viruses, and blue circles are differential expression of genes by 5 viruses. see metacore website at http://www.genego.com/pdf/mc_legend.pdf for figure legend and table s4 for pathway map gene products' corresponding hugo gene names. (tif)

figure s6 cd40 signaling pathway with viral frequency. viral frequencies superimposed for each of most frequently differentially expressed proteins, where red circles are differential expression of genes by 7 viruses, orange circles are differential expression of genes by at least 6 viruses, and blue circles are differential expression of genes by 5 viruses.
see metacore website at http://www.genego.com/pdf/mc_legend.pdf for figure legend and table s4 for pathway map gene products' corresponding hugo gene names. (tif)

figure s7 interferon-gamma signaling pathway with viral frequency. viral frequencies superimposed for each of most frequently differentially expressed proteins, where red circles are differential expression of genes by 7 viruses, orange circles are differential expression of genes by at least 6 viruses, and blue circles are differential expression of genes by 5 viruses. see metacore website at http://www.genego.com/pdf/mc_legend.pdf for figure legend and table s4 for pathway map gene products' corresponding hugo gene names. (tif)

figure s8 histamine receptor h1 signaling pathway with viral frequency. viral frequencies superimposed for each of most frequently differentially expressed proteins, where red circles are differential expression of genes by 7 viruses, orange circles are differential expression of genes by at least 6 viruses, and blue circles are differential expression of genes by 5 viruses. see metacore website at http://www.genego.com/pdf/mc_legend.pdf for figure legend and table s4 for pathway map gene products' corresponding hugo gene names. (tif)

figure s9 interleukin-17 signaling pathway with viral frequency. viral frequencies superimposed for each of most frequently differentially expressed proteins, where red circles are differential expression of genes by 7 viruses, orange circles are differential expression of genes by at least 6 viruses, and blue circles are differential expression of genes by 5 viruses. see metacore website at http://www.genego.com/pdf/mc_legend.pdf for figure legend and table s4 for pathway map gene products' corresponding hugo gene names.
(tif) respiratory viruses other than influenza virus: impact and therapeutic advances estimating the burden of 2009 pandemic influenza a (h1n1) in the united states the role of immunoprophylaxis in the reduction of disease attributable to respiratory syncytial virus viral and atypical bacterial detection in acute respiratory infection in children under five years unsuccessful therapy with adefovir and entecavir-tenofovir in a patient with chronic hepatitis b infection with previous resistance to lamivudine: a fourteen-year evolution of hepatitis b virus mutations identification of broad-spectrum antiviral compounds and assessment of the druggability of their target for efficacy against respiratory syncytial virus (rsv) analysis of respiratory syncytial virus preclinical and clinical variants resistant to neutralization by monoclonal antibodies palivizumab and/or motavizumab weekly epidemiological record:recommended composition of influenza virus vaccines for use in the 2011-2012 northern hemisphere influenza season influenza-induced innate immunity: regulators of viral replication, respiratory tract pathology & adaptive immunity infection and apoptosis as a combined inflammatory trigger the dependence of viral rna replication on coopted host factors inflammatory mediator tak1 regulates hair follicle morphogenesis and anagen induction shown by using keratinocyte-specific tak1-deficient mice interleukin (il)-6 directs the differentiation of il-4-producing cd4+ t cells infection of a human respiratory epithelial cell line with rhinovirus. 
induction of cytokine release and modulation of susceptibility to infection by cytokine exposure interleukin-11: stimulation in vivo and in vitro by respiratory viruses and induction of airways hyperresponsiveness proud d (1992) production of granulocyte-macrophage colony-stimulating factor by cultured human tracheal epithelial cells systems biology and the host response to viral infection hiv-1/mycobacterium tuberculosis coinfection immunology: how does hiv-1 exacerbate tuberculosis? timing of antiretroviral therapy for hiv in the setting of tb treatment drug repositioning: re-investigating existing drugs for new therapeutic indications computational biology approaches for selecting host-pathogen drug targets systems integration of biodefense omics data for analysis of pathogen-host interactions and identification of potential targets host-directed drug targeting of factors hijacked by pathogens human cytomegalovirus infection causes premature and abnormal differentiation of human neural progenitor cells altered cellular mrna levels in human cytomegalovirus-infected fibroblasts: viral block to the accumulation of antiviral mrnas transforming growth factor beta is a major regulator of human neonatal immune responses following respiratory syncytial virus infection cytomegalovirus ie1 protein elicits a type ii interferon-like host cell response that depends on activated stat1 but not interferon-gamma systems-level comparison of host responses induced by pandemic and seasonal influenza a h1n1 viruses in primary human type i-like alveolar epithelial cells in vitro chembl. an interview with john overington, team leader, chemogenomics at the european bioinformatics institute outstation of the european molecular biology laboratory (embl-ebi). interview by wendy a. 
warr drugbank: a knowledgebase for drugs, drug actions and drug targets drugbank: a comprehensive resource for in silico drug discovery and exploration mouse mutant resource web site asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types large-scale integration of microarray data reveals genes and pathways common to multiple cancer types microarray data quality control improves the detection of differentially expressed genes the epidermal growth factor receptor (egfr) promotes uptake of influenza a viruses (iav) into host cells enterovirus 71 modulates a cox-2/pge2/camp-dependent viral replication in human neuroblastoma cells: role of the c-src/egfr/p42/p44 mapk/creb signaling pathway enterovirus 71 induces integrin beta1/egfr-rac1-dependent oxidative stress in sk-n-sh cells: role of ho-1/co in viral replication(1) severe acute respiratory syndrome (sars) coronavirus-induced lung epithelial cytokines exacerbate sars pathogenesis by modulating intrinsic functions of monocytederived macrophages and dendritic cells role of monocytes and eosinophils in human respiratory syncytial virus infection in vitro the cd40 antigen and its ligand cytokines and acute phase proteins associated with acute swine influenza infection in pigs respiratory syncytial virus impairs macrophage ifn-alpha/beta-and ifngamma-stimulated transcription by distinct mechanisms down-regulation of the interferon signaling pathway in t lymphocytes from patients with metastatic melanoma interleukin-1 beta targeted therapy for type 2 diabetes immunomodulatory functions of type i interferons autoregulation of parkin activity through its ubiquitin-like domain parkin functions as an e2-dependent ubiquitin-protein ligase and promotes the degradation of the synaptic vesicle-associated protein, cdcrel-1. 
proceeding of the national academy of parkin binds to alpha/beta tubulin and increases their ubiquitination and degradation ubiquitination of a new form of alpha-synuclein by parkin from human brain: implications for parkinson's disease parkin ubiquitinates the alpha-synuclein-interacting protein, synphilin-1: implications for lewy-body formation in parkinson disease pael receptor, endoplasmic reticulum stress, and parkinson's disease parkin is a component of an scf-like ubiquitin ligase complex and protects postmitotic neurons from kainate excitotoxicity caspase-1 and caspase-8 cleave and inactivate cellular parkin the role of ubiquitylation in immune defence and pathogen evasion interferon-stimulated gene 15 and the protein isgylation system highly pathogenic h5n1 influenza virus can enter the central nervous system and induce neuroinflammation and neurodegeneration immunolocalization of influenza a virus and markers of inflammation in the human parkinson's disease brain viral parkinsonism ring finger 1 mutations in parkin produce altered localization of the protein the c289g and c418r missense mutations cause rapid sequestration of human parkin into insoluble aggregates alterations in the solubility and intracellular localization of parkin by several familial parkinson's disease-linked point mutations park2/ pacrg polymorphisms and susceptibility to typhoid and paratyphoid fever susceptibility to leprosy is associated with park2 and pacrg pediatric neurologic complications associated with influenza a h1n1 influenza-associated central nervous system dysfunction: a literature review influenzaassociated monophasic neuromyelitis optica innate immunity and coagulation the nlrp3 inflammasome in health and disease: the good, the bad and the ugly effect of tnf-alpha production inhibitors on the production of pro-inflammatory cytokines by peripheral blood mononuclear cells from htlv-1-infected individuals antileukotriene drugs: clinical application, effectiveness and 
safety cysteinyl leukotrienes enhance tumour necrosis factor-alpha-induced matrix metalloproteinase-9 in human monocytes/macrophages pranlukast, a cysteinyl leukotriene receptor 1 antagonist, attenuates allergenspecific tumour necrosis factor alpha production and nuclear factor kappa b nuclear translocation in peripheral blood monocytes from atopic asthmatics effect of amrinone on tumor necrosis factor production in endotoxic shock the use of cytokine inhibitors. a new therapeutic insight into heart failure phosphodiesterase inhibitors suppress proliferation of peripheral blood mononuclear cells and interleukin-4 and -5 secretion by human t-helper type 2 cells differential efficacy of lymphocyte-and monocyte-selective pretreatment with a type 4 phosphodiesterase inhibitor on antigen-driven proliferation and cytokine gene expression modulating effects of nonselective and selective phosphodiesterase inhibitors on lymphocyte subsets and humoral immune response in mice differential regulation of human monocyte-derived tnf alpha and il-1 beta by type iv camp-phosphodiesterase (camp-pde) inhibitors effect of the phosphodiesterase iii inhibitor amrinone on cytokine and nitric oxide production in immunostimulated j774.1 macrophages type 4 phosphodiesterase inhibitors attenuate respiratory syncytial virus-induced airway hyper-responsiveness and lung eosinophilia double-stranded rna induces mmp-9 gene expression in hacat keratinocytes by tumor necrosis factor-alpha effect of respiratory syncytial virus on the activity of matrix metalloproteinase in mice il-1beta induces mmp-9 via reactive oxygen species and nf-kappab in murine macrophage raw 264.7 cells first dominique dormont international conference on ''host-pathogen interactions in chronic infections -viral and host determinants of hcv, hcmv, and hiv infections enhanced acetylation of alpha-tubulin in influenza a virus infected epithelial cells coxsackievirus preferentially replicates and induces cytopathic effects in 
undifferentiated neural progenitor cells viral subversion of apoptotic enzymes: escape from death row interferon in oncological practice: review of interferon biology, clinical applications, and toxicities systems biology of interferon responses on lines and planes of closest fit to systems of points in space analysis of a complex of statistical variables into principal components data analysis using arraystudio array studio software to fit the generalized linear model pathway mapping tools for analysis of high content data functional analysis of omics data and small molecule compounds in an integrated ''knowledge-based'' platform global functional profiling of gene expression another multidimensional analysis package, version missing value estimation methods for dna microarrays the mathworks i (2010) matlab pubchem: integrated platform of small molecules and biological activities human cytomegalovirus-regulated paxillin in monocytes links cellular pathogenic motility to the process of viral entry toll-like receptor 3 has no critical role during early immune response of human monocyte-derived dendritic cells after infection with the human cytomegalovirus strain tb40e dependent upregulation of mcl-1 by human cytomegalovirus is mediated by epidermal growth factor receptor and inhibits apoptosis in short-lived monocytes expression profile of immune response genes in patients with severe acute respiratory syndrome dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection human influenza is more effective than avian influenza at antiviral suppression in airway cells gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans systemslevel comparison of host-responses elicited by avian h5n1 and seasonal h1n1 influenza viruses in primary human macrophages gene expression profiles during in vivo human rhinovirus infection: insights into the host 
response rhinovirus-induced modulation of gene expression in bronchial epithelial cells from subjects with asthma identification of human metapneumovirus-induced gene networks in airway epithelial cells by microarray analysis differential recognition of tlr-dependent microbial ligands in human bronchial epithelial cells identification of gene biomarkers for respiratory syncytial virus infection in a bronchial epithelial cell line

we thank vinod kumar for his assistance in computational analysis.

key: cord-351548-jvl63652 authors: juranic lisnic, vanda; babic cac, marina; lisnic, berislav; trsan, tihana; mefferd, adam; das mukhopadhyay, chitrangada; cook, charles h.; jonjic, stipan; trgovcich, joanne title: dual analysis of the murine cytomegalovirus and host cell transcriptomes reveal new aspects of the virus-host cell interface date: 2013-09-26 journal: plos pathog doi: 10.1371/journal.ppat.1003611 sha: doc_id: 351548 cord_uid: jvl63652 major gaps in our knowledge of pathogen genes and how these gene products interact with host gene products to cause disease represent a major obstacle to progress in vaccine and antiviral drug development for the herpesviruses. to begin to bridge these gaps, we conducted a dual analysis of murine cytomegalovirus (mcmv) and host cell transcriptomes during lytic infection. we analyzed the mcmv transcriptome during lytic infection using both classical cdna cloning and sequencing of viral transcripts and next generation sequencing of transcripts (rna-seq). we also investigated the host transcriptome using rna-seq combined with differential gene expression analysis, biological pathway analysis, and gene ontology analysis. we identify numerous novel spliced and unspliced transcripts of mcmv. unexpectedly, the most abundantly transcribed viral genes are of unknown function. we found that the most abundant viral transcript, recently identified as a noncoding rna regulating cellular micrornas, also codes for a novel protein.
to our knowledge, this is the first viral transcript that functions both as a noncoding rna and an mrna. we also report that lytic infection elicits a profound cellular response in fibroblasts. highly upregulated and induced host genes included those involved in inflammation and immunity, but also many unexpected transcription factors and host genes related to development and differentiation. many top downregulated and repressed genes are associated with functions whose roles in infection are obscure, including host long intergenic noncoding rnas, antisense rnas or small nucleolar rnas. correspondingly, many differentially expressed genes cluster in biological pathways that may shed new light on cytomegalovirus pathogenesis. together, these findings provide new insights into the molecular warfare at the virus-host interface and suggest new areas of research to advance the understanding and treatment of cytomegalovirus-associated diseases.

the cytomegaloviruses, classified within the betaherpesvirinae subfamily, are a group of species-specific herpesviruses that establish life-long infection of their hosts. human cytomegalovirus (hcmv) can cause devastating disease and death in congenitally infected infants, and long-term neurological complications in survivors. in adults, hcmv can cause a spectrum of diseases in immune compromised patients involving multiple organs and tissues and is a primary cause of graft loss in transplant patients [1, 2]. in recent years, hcmv has been linked to lung injury in trauma patients [3] and is also postulated to act as a cofactor in atherosclerosis and some cancers [4, 5]. for these reasons, there is an urgent need for an effective vaccine and new antiviral intervention strategies that mitigate the toxicity and drug resistance shortcomings of current antiviral drugs [1, 6]. there exist a number of challenges to our understanding of cmv pathogenesis as well as progress in vaccine and antiviral drug development.
two outstanding challenges are the gaps in our knowledge of viral genes and how these gene products interact with host cellular gene products to cause disease. despite the publication of the first sequence of the hcmv genome in 1990 [7, 8], and the first sequence of the murine cytomegalovirus (mcmv) genome in 1996 [9], there are still important questions regarding the nature and number of genes for these viruses. mcmv is the most widely used model to study hcmv diseases and recapitulates many of the clinical and pathological findings found in human diseases. our understanding of mcmv viral genes and genomes has evolved with the technology used to study them. a major milestone in understanding mcmv came with decoding the first mcmv complete genome sequence by rawlinson and colleagues [9]. the authors identified a 230 kb genome predicted to encode 170 genes. subsequent refinements in the annotation of the mcmv genome were introduced by classical molecular and biochemical studies that are reflected in the current ncbi reference sequence. the application of new technologies to study the mcmv genome emerged in the last decade and includes proteomic [10], in silico [11], and gene array [12, 13] approaches that have led to major revisions in gene annotation. more recently cheng and colleagues [14] proposed additional changes after sequencing isolates to measure genome stability after in vitro and in vivo passage. also, lacaze and colleagues [13] extended the microarray approach to include probes specific to both strands of the genome, leading to the discovery of noncoding and bi-directional transcription at late stages of mcmv infection.
finally, a recent transcriptomic analysis of newly synthesized rna in mcmv infected fibroblasts [15] applied rna-seq technology to study regulation of viral gene expression and identified a very early peak of viral gene transcriptional activity at 1-2 hours post infection followed by rapid cellular countermeasures, but did not attempt to re-annotate the mcmv genome. altogether, these new technologies have refined and advanced our knowledge of viral genes and the mcmv genome. nevertheless, we still lack definitive annotation for the standard lab strains of mcmv and specific knowledge of how many of these genes function during natural infection and disease. currently, two annotations of the mcmv genome are used: the original rawlinson annotation with minor modifications (genbank accession no. gu305914.1), in which 170 open reading frames (orfs) are identified, and the ncbi reference sequence annotation (genbank accession no. nc_004065.1), with 160 orfs. we previously used a transcriptomic approach to analyze gene products of hcmv [16]. this was the first report to characterize abundant antisense and noncoding transcription in the hcmv genome, showing that there is greater complexity of herpesvirus genomes than previously appreciated. using rna-seq technology, gatherer et al. [17] showed that most protein coding genes are also transcribed in antisense but are generally expressed at lower levels than their sense counterparts. a more recent analysis of translational products of hcmv [18] by ribosomal footprinting identified 751 translated orfs, further underscoring the complexity of herpesvirus genomes. we describe mcmv transcriptional products that differ from predicted orfs, novel spliced transcripts, and novel transcripts derived from intergenic regions of the genome. additionally, we found that the most abundant viral transcript (mat) is a spliced transcript recently identified as a noncoding rna that limits accumulation of cellular mirnas [19, 20].
here we report that this transcript also specifies a novel protein; to our knowledge, this is the first viral transcript that functions both as a noncoding rna and an mrna. analysis of the host transcriptional response to infection revealed many unexpected host genes that are regulated by virus infection, including many noncoding rna genes. correspondingly, many host genes regulated by virus infection cluster in unexpected biological pathways that may shed new light on the pathogenesis of cytomegalovirus-associated diseases. together, these findings suggest important revisions are required for mcmv genome annotation and emphasize numerous aspects of mcmv biology and the host response to this infection that are unknown and require further study. in this study, we set out to complete a transcriptomic analysis of mcmv infection. we analyzed viral transcripts through classical cdna cloning and sequencing and through next generation sequencing of cdna generated from total cellular rna (rna-seq). analysis of cdna libraries is a well-proven approach to analyze viral transcripts based on isolation of long transcripts, molecular cloning of the transcripts, and traditional sanger-based sequencing of the cdna clones. traditional cloning has many advantages, including isolation of novel transcripts that may not be identified by probe-based technologies, as well as precise analysis of transcript splice sites and transcript 3′ ends. the introduction of massively parallel sequencing techniques represents a major new technology to study gene expression. basically, rna (total or fractionated) is converted to a library of smaller cdna fragments. adaptors are added to the fragments, and the shorter fragments are sequenced in a high-throughput manner using next generation sequencing technology.
this rna-sequencing (rna-seq) approach is free of selection biases associated with traditional cloning or probe-based methods and allows for the entire transcriptome to be analyzed in a quantitative manner (reviewed in [21]). first, cdna libraries representing the major temporal classes of viral gene expression were generated by collecting rna from infected mouse embryonic fibroblasts (mefs) at 9 time points after infection. for rna-seq analysis, rnas collected at the same 9 time points were pooled, converted to cdna, and sequenced on the illumina genome analyzer iix. of the 33,995,400 reads that passed the filter from infected cells, 11% aligned to the mcmv genome, indicating a 585-fold coverage of the viral genome. a total of 448 cdna clones were included in the final analyses [84 from the immediate early (ie) library, 163 from the early (e) library, and 201 from the late (l) library]. generally, temporal assignment of cdna clones in this study agrees with previous studies, and a detailed comparison, including discrepancies with earlier studies, is provided in dataset s1.

we have conducted a comprehensive analysis of the murine cytomegalovirus and host cell transcriptomes during lytic infection. we identify numerous novel spliced and unspliced transcripts of mcmv. unexpectedly, the most abundantly transcribed viral genes are of unknown function. we found that the most abundant viral transcript, recently identified as a noncoding rna regulating cellular micrornas, also codes for a novel protein. to our knowledge, this is the first viral transcript that functions both as a noncoding rna and an mrna. infection alters expression of many unexpected host genes, including many noncoding rna genes. correspondingly, many cluster in unexpected biological pathways that may shed new light on cytomegalovirus pathogenesis. together, these findings provide new insights into the molecular warfare at the virus-host interface and suggest new areas of research to advance the understanding and treatment of cytomegalovirus-associated diseases.

figure 1 . comparison of cdna cloning and rna-seq data in relation to current genome annotation. comparison of poly(a) cdna library (green arrows) and rna-seq analysis of murine cytomegalovirus (gray histograms). the longest clone from each group of clones in the cdna library is shown. eland alignments of rna-seq reads were loaded in integrative genomics viewer and compared to nc_004065.1 (red arrows) and gu305914.1 (blue arrows). the data range for rna-seq data was set to 20-5000. data is shown in 30 kb ranges with 1 kb overlap. data is shown for the first 120 kb of the mcmv genome and the figure legend is shown in figure 2 . doi:10.1371/journal.ppat.1003611.g001

figure 2 . [...] gu305914.1 (blue arrows). the data range for rna-seq data was set to 20-5000. data is shown in 30 kb ranges with 1 kb overlap. data is shown for genomic region spanning 119-230 kb of the mcmv genome. doi:10.1371/journal.ppat.1003611.g002

as shown in figures 1 and 2, transcriptomic data generated using these two experimental approaches were compared to currently available genome annotation (the ncbi reference sequence, genbank accession no. nc_004065.1, and a more recent sequence analysis of the smith strain, genbank accession no. gu305914.1). using this schematic overview, current annotations (red and blue arrows) largely agree. the mcmv transcripts identified through our classical cdna cloning and sequencing (green arrows) and the rna-seq expression profiles (gray histograms) showed complementary results to each other but diverged dramatically from current annotations. a summary of the cdna clones relative to genes annotated in the ncbi reference sequence is presented in table s1, and a complete list of the 448 cdna clones isolated in this study and their characteristics is presented in table s2.
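as a rough check on the 585-fold coverage reported above, coverage can be estimated as (aligned reads × read length) / genome length. the sketch below is illustrative only: the read length of 36 nt is an assumption (it is not stated in the text), chosen because it is a common setting for this sequencing platform, and the 230 kb genome size is the rounded figure given for the rawlinson sequence. the function name `fold_coverage` is hypothetical.

```python
def fold_coverage(total_reads, viral_fraction, read_length_nt, genome_length_nt):
    """Estimate average fold coverage of a genome from read counts.

    coverage = (reads aligned to the genome * read length) / genome length
    """
    viral_reads = total_reads * viral_fraction
    return viral_reads * read_length_nt / genome_length_nt


# Values from the text: 33,995,400 filtered reads, 11% aligned to mcmv,
# 230 kb genome. The 36-nt read length is an ASSUMPTION, not a reported value.
coverage = fold_coverage(
    total_reads=33_995_400,
    viral_fraction=0.11,
    read_length_nt=36,
    genome_length_nt=230_000,
)
print(f"estimated coverage: {coverage:.0f}-fold")
```

under these assumptions the estimate comes out at roughly 585-fold, in line with the reported value.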
a detailed comparison of transcripts cloned in this study with current annotations (gu305914.1 and nc_004065) is presented in table s3, and includes estimated relative expression measured as reads per kilobase per million mapped reads (rpkm) derived from rna-seq analysis. analysis of the cdna clones revealed many novel transcripts with the following characteristics: (1) novel antisense transcripts. excluding 20 cdna clones that mapped to intergenic regions, 275 (64%) of cdna clones were in the sense (s) orientation, 39 (9%) were antisense (as), and 114 (27%) overlapped more than one gene in both s and as orientations relative to the original annotation provided by rawlinson and colleagues [9]. these designations were reevaluated according to the current ncbi reference sequence (nc_004065), in which as transcripts to hypothetical or putative proteins were revised as s transcripts due to lack of evidence for the predicted sense transcript. if as transcripts that overlapped hypothetical proteins also overlapped s transcripts in this library, the as designation was preserved. according to these criteria, 431 (97%) transcripts were in s orientation, only 4 (0.9%) were in as orientation, and 9 (2%) overlapped more than one gene in both s and as orientations. (2) transcripts overlapping more than one currently annotated gene. several cdna clones were isolated that overlap more than one currently annotated gene. for example, four cdna clones in our library overlapped both the m15 and m16 genes. the longest of these clones specifies a 1673 bp transcript, whereas current annotation predicts two genes of 908 bp (m15) and 632 bp (m16). these data suggest the possibility that several orfs are translated from unexpectedly long or polycistronic transcripts. unusually long and polycistronic transcripts have recently been shown to be a feature of the hcmv transcriptome [18]. (3) the absence of transcripts in currently annotated regions.
several gene regions were not well represented in the cdna cloning study. for example, no cdna clones overlapping m01 or m170 were found in this study, and these regions had the lowest rpkm reads by rna-seq of 132 and 141, respectively (gray histograms in figures 1 and 2 and table s3 ), consistent with earlier gene array-based studies [12] . for comparison, the well-defined mcmv genes m04 and m138, both represented with multiple clones in our cdna library, have rpkm values of 14,137 and 16,935 respectively. because some viral transcripts with higher rpkm reads were not represented in the classical cdna library (for example, an rpkm of 2967 for the m98 gene), we conclude the cdna library cloning did not capture all viral transcripts. nevertheless, the failure to identify clones in regions poorly represented by rna-seq data suggests that further attention is required to prove or disprove the existence of genes predicted by in silico (orf) analysis. (4) novel spliced transcripts. one of the most striking aspects of the cloning study was the abundance of novel spliced transcripts. a total of 22 novel spliced transcripts were cloned in this study as well as spliced transcripts reported by others. a complete list of spliced transcripts identified in this and other studies is provided in table s4 . one of the most abundant novel spliced transcripts identified in this study maps to the m116 gene (8 cdna clones). current annotation for m116 predicts a protein of 645 aa, whereas the clones in this study predict a novel truncated protein product. splicing results in a change of open reading frame at amino acid residue 350 of the currently annotated m116 protein and introduction of a new stop codon at position 401 (dataset s2). by far the most abundant novel spliced transcript is that overlapping m169 in the s orientation, and m168 in the as orientation and is discussed in detail below. 
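the rpkm values quoted above normalize the raw read count of a gene for both its annotated length and the sequencing depth. a minimal sketch of the standard definition follows; the function name and the example numbers are hypothetical and are not taken from this study.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    rpkm = gene_reads / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 1,000 reads on a 2,000 bp gene in a library
# of 10 million mapped reads.
example = rpkm(gene_reads=1_000, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(example)
```

because gene length enters the denominator, changing the annotated boundaries of a gene changes its rpkm even when the underlying reads are identical.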
approximately 31% of the viral cdna clones and 41% of all viral reads from the rna-seq analysis mapped to the novel spliced transcript at the right end of the genome. the structure of this transcript relative to current gene annotation is shown in figure 3a and the spliced nature of this transcript is also apparent in the rna-seq profile (gray histogram). the longest predicted orf extends into the second exon and predicts a protein of 147 aa, of which the first 127 residues match the predicted m169 protein sequence (figure 3b). to determine if this transcript is translated, an antibody was prepared to the protein sequence predicted for m169. this antibody was used in immunoblot analysis of cell lysates prepared from mock-infected cells and cells infected with wild-type (wt) bac-derived smith strain virus, or various multi-gene and single-gene mutant virus strains. this antibody reacted with a protein of approximately the expected size (17 kda) (figure 3d) in cells infected with wt virus and mutant viruses that express most or all of the mat transcript as determined by northern blot analysis (figure 3c). while the mat protein is first detected 24 hrs after infection, maximal amounts are observed at 60 and 72 hrs after infection (figure 3e). mat protein was also detected in fibroblasts exposed to five different strains of mcmv isolated from wild mice, indicating that the coding region of the gene is conserved in laboratory and wild strains of mcmv (figure 3g). most remarkably, these findings demonstrate that the mat gene generates a single transcript with both noncoding [19, 20] and protein coding functions. we also investigated the possibility that mat protein accumulation is directly related to control of the mat transcript levels by cellular mir-27 [20].
marcinowski and colleagues have shown that when the binding site for mir-27 is mutated (m169-mut virus), mat transcript levels were increased twofold in comparison to levels obtained in cells infected with wt mcmv at 24 hours after infection due to loss of transcript regulation by the microrna. the difference in mat transcript abundance between wt and the m169-mut virus was lost by 48 hours after infection. however, we observed that mat protein levels were similar in cells infected with the wt and m169-mut viruses at 24 and 48 hrs after infection (figure 3h). we conclude that the noncoding function of the mat transcript (regulation of cellular mir-27) is unrelated to mat protein accumulation. in addition to providing valuable insight into transcript structure, rna-seq analysis revealed several new facets of the viral transcriptome. first, accumulation of individual viral transcripts varies by several orders of magnitude. figure 4a depicts the number of rna-seq reads mapped against the mcmv genome, in which rows 1, 2 and 3 visualize the data with the maximum number of reads set to 50,000, 5000, and 500, respectively. validating the classical cloning study, the most abundant transcript identified by rna-seq is the mat transcript. enumeration (rpkm) of the most abundant transcripts is presented in figures 4b and 4c, and shows that after the mat transcript, the most highly expressed genes are m119, m116, and m48, all genes without assigned functions. also highly expressed are the immune evasion genes m04 and m138, m55 (glycoprotein b), and additional genes of unknown function (m73 and m15). second, as shown by comparing figure 4b to 4c, both the overall magnitude of expression and the ranking of the abundance of different transcripts vary according to the annotation used for the rpkm analysis. third, analysis of reads mapped at the highest resolution (figure 4a, row 3) indicates that most of the viral genome is transcribed to some degree.
remarkably, 30 or 35% of the reads mapped to intergenic regions, depending on annotation ( figure 4d ). this percentage is reduced to 14% when annotation is modified to reflect the correct mat gene structure identified in this study. rna-seq also detected significant transcription in m74-m75 ( figure s4 ), m85-m87 and m88-m90 ( figure 4e ) intergenic regions. in contrast, the transcriptional profile of the annotated m87 shows less transcriptional activity than the adjacent intergenic regions. similarly, rna-seq identified transcription from genes that were not isolated in the classical cdna library or in previous studies using microarray technology [12, 13] . a detailed analysis of the sensitivity of this rna-seq study to previous studies is provided in supplemental datasets s1a-c. we also compared our rna-seq data to a recent rna-seq analysis of the mcmv transcriptome using bac-derived wt virus on nih-3t3 fibroblasts [15] . as shown in supplemental figure s1 , the profiles obtained from these two different rna-seq experiments are remarkably similar despite using different sequencing platforms and library generation approaches. also, either seven or eight of the 10 most abundant genes were identical in both datasets (supplemental dataset s1c). minor differences in abundance of some transcripts can be attributed to differences in the time points analyzed in these two studies as well as the fact that our analysis achieved an order of magnitude greater sequencing depth (compare reads analyzed for each histogram set in figure s1 ). together these findings demonstrate that rna-seq analysis is a highly sensitive method for detection of viral gene expression during infection. moreover, these findings highlight numerous incongruencies with current annotation for the mcmv genome. finally, rna-seq analysis revealed that many of the most abundantly expressed viral genes are of unknown function. 
because cdna cloning and rna-seq identified significant differences between the mcmv transcriptome and current annotations, we performed an in depth analysis of several genomic regions by northern analyses (figure 5, figures s2, s3, s4, s5) using our cdna clones to generate strand specific riboprobes (table 1). to investigate genomic regions where transcripts overlapping more than one gene were detected, we analyzed transcription in the m15-16 and m19-20 regions. in both regions multiple transcripts were detected with different temporal expression patterns. smaller transcripts tended to accumulate at later time points, a feature previously reported for certain transcripts in both hcmv and mcmv [22] [23] [24]. in the m15-m16 region 5 transcripts were cloned, all of which overlapped the predicted m15 and m16 genes, and one transcript was spliced (figure 5a). the rna-seq profile (figure 5a, gray histogram) also strongly indicated transcription that spans both predicted genes. consistent with our cdna library, no antisense transcripts were detected while the sense probe detected 5 transcripts (figure 5a, bands 1-5). the 3′ ends of all cdna clones lie at or close to nucleotide position 15700 (supplemental table s2) and rna-seq data alignment to the mcmv genome indicates a sudden drop in reads around this nucleotide position. assuming that transcripts in this region are coterminal, band sizes predict transcript initiation sites in m11, m12, m13 or m14, and m15 (figure 5a, bands 1-4). similar results were found in this region in cells infected with wild isolates of mcmv (alec redwood, personal communication). while the smallest band observed by northern analysis (figure 5a, band 5) corresponds in size to the novel spliced transcript, e253 (566 bp), we could not confirm splicing by pcr using intron-flanking primers (data not shown). therefore, it is likely this spliced transcript represents an aberrant transcript or a result of intramolecular template switching during reverse transcription [25].

figure 5 . verification of new transcripts by northern blot. balb/c mef cells were infected with bac derived smith virus and harvested at indicated times post infection. total rna was separated by denaturing gel electrophoresis, transferred to nylon membrane and incubated with probes specific for s and as transcripts. rna integrity and loading was evaluated by inspecting 28s (not shown) and 18s rrna bands under uv light after transfer to membrane. transcripts in the m15-16 (a), m19-m20 (b), m116 (c) and m71-m74 (d) gene regions were analyzed (due to smiling effects during gel electrophoresis for the image shown in 4a and c, the ladder was not accurate for inner lanes of the gel and the position of the ribosomal bands was therefore used to estimate the band sizes). predicted genes (rawlinson's annotation) are depicted as empty arrows, while thin black arrows show longest transcripts cloned in our cdna library as well as clones used to generate probes (marked with *). 3′ ends of transcripts are marked with arrowheads. the nucleotide coordinates relative to smith sequence (nc_004065.1) of isolated transcripts are given below thin arrows, while the names of the clones are written above. thin gray lines show isolated transcripts that cannot be detected with the probe. gray histograms show rna-seq reads aligned to mcmv genome. maximal possible exposure times were used to ensure even low abundance transcripts are detected and are noted on the blots. doi:10.1371/journal.ppat.1003611.g005

the m20 gene region also diverged substantially from current annotation. similar to the m15-16 region, in the m20 region 5 transcripts with differential temporal expression patterns were detected by northern analyses using clone ie205 as a probe (figure 5b).
no transcript was detected using an as probe derived from clone ie205 or l57, indicating that m19 is not transcribed in the sense orientation (figure 5b and figure s2). we therefore propose that m19 should be removed from mcmv genome annotation. this is consistent with our cloning study where 4 transcripts have been isolated in this region and none overlap m19 in the sense orientation (supplemental table s2). evidence from the cdna library and rna-seq alignment (table s2 and figure 5b) indicates that the 5 bands detected by northern analysis are co-terminal, with the 3′ end located close to nucleotide position 20430. the largest band at 4 kb (figure 5b, band 1) is detectable at 30 and 60 hours pi and we predict that this transcript initiates in m23. we failed to detect transcripts between the m20 and m25 genes, consistent with previous studies [12] and northern analyses using mutant viruses lacking genes in this region (data not shown). the lack of cdna clones can be explained by the low abundance and size of this transcript, as well as by the propensity of cdna libraries to enrich 3′ ends. the band slightly smaller than 3 kb (figure 5b, band 2) shows a peak accumulation at 60 hours pi and is consistent with a transcript overlapping m19-m21 (approx locations 20430-23220, 2.79 kb). the band of approximately 2 kb (figure 5b, band 3) shows peak accumulation 10 hours pi. based on the rna-seq profile this band could represent transcripts that initiate at nucleotide position 22060. finally, the late time points are dominated by smaller transcripts of approximately 1 kb (figure 5b, band 5; predicted start site at 21440) which correspond in size to the cdna clones detected in our study. in short, northern analyses support the conclusion that transcripts overlapping multiple genes in the m15-m16 region and m19-m20 region accumulate in infected cells and indicate that additional, larger transcripts are transcribed in these regions which have yet to be characterized.
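the size estimates above follow from simple coordinate arithmetic: if the transcripts in this region share a 3′ end near nucleotide 20430, each candidate start site implies a transcript length equal to its distance from that end. the sketch below, with a hypothetical helper name, applies this to the start positions mentioned in the text. poly(a) tails are not included, so bands on a northern blot would be expected to run somewhat larger than these values.

```python
def coterminal_lengths(three_prime_end, start_sites):
    """Map each candidate start coordinate to the implied transcript length (nt),
    assuming all transcripts end at the same 3' coordinate."""
    return {start: abs(start - three_prime_end) for start in start_sites}


# Coordinates quoted in the text for the m19-m21 region.
SHARED_3P_END = 20430
CANDIDATE_STARTS = [23220, 22060, 21440]

for start, length in coterminal_lengths(SHARED_3P_END, CANDIDATE_STARTS).items():
    print(f"start {start}: {length} nt ({length / 1000:.2f} kb)")
```

the first and last values (2.79 kb and 1.01 kb) match the sizes quoted for bands 2 and 5; the 22060 start gives about 1.63 kb, somewhat below the roughly 2 kb estimated for band 3.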
next we analyzed gene regions in which we detected novel spliced transcripts. the m116 region was chosen as an example of a highly abundant spliced transcript of unknown function in addition to the m168-169 transcript. current annotations predict an orf of 1.9 kb whereas rna-seq profiles and the cdna study both detected a slightly shorter (1.6 kb) transcript with an 81 bp intron. northern blot analysis (figure 5c) identified a strong band of appropriate size (1.6 kb) that starts to accumulate at ie times and peaks at e and l times after infection. due to the small intron and high abundance of this transcript, unspliced transcripts could not be definitively resolved by northern analysis but were confirmed by pcr using primers flanking the intron (figure s3). additionally, northern analysis detected another less abundant band of approximately 3 kb. leatham and colleagues [26] have detected a band of similar size in the homologous region in hcmv of 3.2 kb that encompasses the ul119-115 genes. while we failed to isolate cdna clones overlapping the m117 region, we predict the larger, less abundant 3 kb band observed initiates in m117, though additional northern or 5′ race studies are needed to confirm the start site of the larger transcript. the m72-m74 region was previously shown to have a very complex transcriptional profile [23, 27]. cdna library data, rna-seq data and results of northern analyses with l42 as a probe all are in agreement with the findings of scalzo et al. [23] of multiple spliced transcripts that share exon 2. bands 5-7 (figure 5d) correspond to previously reported m60, m73 and m73.5 spliced transcripts. in our cloning study four isolated clones correspond to m73.5 transcripts (represented by the longest clone, l443) and one to m73 (l33) (supplemental table s4). transcripts corresponding to m60 were not isolated in the cloning study; however, the rna-seq profile in the region corresponding to m60 exon 1 shows active transcription (figure s4d).
we have also detected a 1.1 kb band (figure 5d, band 6) that corresponds to longer m73 and m75 transcripts, and bands corresponding to unspliced transcripts of m73 and m73.5 (approx. 2 kb, figure 5d, band 4). in addition to these previously published transcripts, we have also cloned one novel spliced transcript from this region, e180. in accordance with work in the m60 region [23], we propose m71s as a designation for this novel gene. like other spliced transcripts from this region, e180 shares exon 2 with other transcripts while its splice donor site is located at 102830. the spliced nature of this transcript has been confirmed by pcr (figure s4c); however, more analyses are needed to determine its exact 5′ start site. northern analysis revealed a band of 0.5 kb (figure 5d, band 7) that corresponds in size to the e180 spliced transcript whereas the unspliced version is detected around 3 kb (figure 5d, band 3). in addition, a band of similar size (3.5 kb) transcribed from the plus genomic strand detected by the l147 probe (figure s4b) could correspond to the unspliced version of e180. all probes used in this region detected bands transcribed from the negative genomic strand that correspond to those previously reported by rapp et al. [27]. based on additional northern blots using cdna clones l69 (as to m72) and l147 (as to m74) (figure s4a and b) as well as previous reports [23, 27] we conclude that the 5 kb transcript starts in m75 and ends in m72 and corresponds to the transcript encoding gh, while the 3 kb transcript starts in m74 and ends in m72 and corresponds to the transcript encoding dutpase (supplemental figure s4). additional very large transcripts transcribed from plus or minus genomic strands detected by the l42 and l147 probes, respectively, have yet to be characterized but underscore the complex transcriptional patterns in this region. last, we set out to confirm novel antisense transcripts detected in the cdna library.
analysis of transcription in the m100-m103 region confirmed previously published findings. we have detected a single m102 transcript from the plus genomic strand as described by scalzo [28] (figure s5a). a probe derived from m100 detected a single transcript from the negative genomic strand corresponding to m100 [28], and 2 from the positive genomic strand that correspond to those described by cranmer et al. [29] and are in line with our cdna and rna-seq analysis (figure s5b). the presence of sense and antisense transcripts in this gene region corresponds to findings for hcmv [16]. finally, in the m103 gene region we detect 2 transcripts from the plus genomic strand that correspond to those described by lyons et al. [30] (figure s5c). temporal expression of transcripts detected by northern in this region is in line with our cdna analysis and previously published data [13, 15]. based on northern analyses of 5 regions, we conclude that our cdna and rna-seq analyses faithfully represent the mcmv transcriptome in infected primary fibroblasts and confirm the presence of novel transcripts. moreover, the distribution of clones in the ie, e and l cdna libraries accurately reflected the accumulation of transcripts detected by northern analyses. rna-seq analysis also enabled us to investigate changes in the host transcriptome. differentially expressed (de) murine genes in mcmv-infected cells compared to mock-infected cells were identified by calculating rpkm. this analysis identified 10,748 statistically significant (p<0.05) genes altered by infection (table s5). the top induced, upregulated, repressed and downregulated genes are presented in tables 2-5 (genes associated with characterized biological pathways are in bold). interferon b (ifnb1) and the interferon-inducible pyhin1 (a.k.a. ifi-209, ifix) were among the top induced genes, consistent with the expected host response to virus infection.
also congruous with expected host responses to infection were two highly induced genes associated with apoptosis induction (hrk and tnfsf10 [a.k.a. trail]). interestingly, transcription factors (foxa1, en2, insm1, tbx21 [a.k.a. t-bet], and tp73) were among the most strongly induced by mcmv infection. chemokine ligands dominated the group of the top upregulated genes. genes encoding proteins with roles in intrinsic cellular defense were also highly upregulated, including oas1, mx1, gpb5 and rsad2 (a.k.a. viperin). there were also a surprising number of genes involved in development, differentiation, and stem cell renewal strongly induced or upregulated by infection, including foxa1, spint1, lin28b, en2, gabrq, esx1, trim71, trp73, cpne5, cdh7, cited1, pou4f1, and jag2. the relevance of these genes, as well as others including art3, ugt8, and trank1, to infection is not clear. we analyzed protein levels of several induced and upregulated transcripts whose relevance to mcmv infection is unknown (figure 6), including the notch ligand jagged 2, the homeobox-containing transcriptional factor engrailed 2, and the e3 ubiquitin-protein ligase trim71. protein levels of all proteins tested correlated with their transcript levels in infected balb/c fibroblasts. interestingly, the top repressed and downregulated genes are primarily of unknown relevance to infection, though many are receptor or cell surface molecules (npy6r, rxfp, mc2r, cd200r3, antxrl, scara5, il1r2, agtr2, gpr165, the olfactory receptor genes olfr1314 and olfr78, and the lectin or lectin-like genes clec3b and reg3a). mcmv infection also caused repression or downregulation of noncoding (nc)rnas including the small nucleolar rna gene snord15a and genes of unknown function including 3 long intergenic noncoding rnas (lincrnas), the miscellaneous rna 4930412b13rik, and 2 antisense transcripts (gm12963, gm15883).
to summarize, while many of the top upregulated genes are associated with host responses to infection, the function of many of [...]

(fragment of the gene tables; columns: gene, description, value)
cxcl10    chemokine (c-x-c motif) ligand 10    8.5
ccl5    chemokine (c-c motif) ligand 5, rantes    8.5
trank1    tetratricopeptide repeat and ankyrin repeat containing 1    8.4
cxcl9    chemokine (c-x-c motif) ligand 9; mig    8.

genes and their products do not work in isolation but rather form pathways and networks. even small perturbations in gene expression in a pathway can exert profound influences on eventual processes or functions. therefore, we analyzed gene lists for shared common pathways. as expected, the top scoring gene networks for all differentially expressed (de) genes included (i) infectious disease, antimicrobial response, inflammatory response (28 focus molecules); (ii) inflammatory response, cellular development, cell-mediated immune response (27 focus molecules) and (iii) cell morphology, hematological system development and function, inflammatory response (19 focus molecules) (table s6a). these were also top networks when the subset of up- and down-regulated genes were evaluated (table s6b). also identified were networks associated with cell morphology and hematological system development and function. when this analysis was conducted with only induced and repressed genes, the top networks included cellular development, cell-mediated immune response, cellular function and maintenance, gene expression and embryonic development (table s6c). the relationships among the molecules in top networks for differentially regulated and induced/repressed genes are shown in figures s6 and s7. thus, an unexpected outcome of this analysis is that mcmv infection influences a subset of networks controlling development. the biological functions and/or diseases that were most significant to the molecules in the mcmv-regulated networks are shown in figure 7a.
immunological disease, cardiovascular disease, genetic disorders, and skeletal and muscular disorders ranked as the top bio-functions connected with genes altered by mcmv infection. among molecular and cellular functions, cell growth and proliferation were the top ranked perturbed functions, consistent with known effects of lytic mcmv infection of cells. nervous system development and function is at the top of the list of physiological and developmental biofunctions, followed by organismal and tissue development and, surprisingly, behavior with 92 associated genes. de genes were also evaluated for canonical pathways in the ingenuity library (figure 7b). the pathways most affected by mcmv included g-protein coupled receptor signaling followed by pathogenesis of multiple sclerosis and gaba receptor signaling. together, these analyses point to known and expected consequences of infection at the cellular level (i.e., cell growth and proliferation, g-protein coupled receptor signaling) and physiological level (i.e., nervous system development) but also highlight unexpected cell and molecular functions, as well as physiological systems and disorders that may advance the understanding of cmv pathogenesis. gene ontology (go) enrichment using gorilla ranked lists analysis [31, 32] was also used to analyze de genes. the full list of enriched go terms along with associated genes is shown in table s7. gorilla analysis highlighted processes associated with upregulated genes including cell differentiation, neuron differentiation, regulation of ion transport, and the g-protein coupled receptor signaling pathway. genes downregulated during mcmv infection were associated with many processes, including regulation of cell shape, adhesion, motility, and the extracellular matrix.
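gorilla [31, 32] scores enrichment in a ranked gene list with a minimum hypergeometric statistic. as a simpler stand-in (not the method used in this study), the sketch below computes an ordinary one-sided hypergeometric p-value for a single go term at one fixed cutoff. all names and numbers are hypothetical and serve only to illustrate the idea of testing whether a term is over-represented among de genes.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population carrying the GO term
    drawn:      genes in the list of interest (e.g. de genes)
    overlap:    genes in the list that carry the GO term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, 200 carry the term,
# 500 genes are in the list, 15 of which carry the term.
p_value = hypergeom_enrichment_p(20_000, 200, 500, 15)
print(f"p = {p_value:.3g}")
```

a ranked-list method such as the one gorilla implements avoids having to choose a single cutoff, which is the main limitation of this fixed-threshold version.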
altogether, gorilla analyses support results of the ingenuity pathway analysis and suggest novel processes regulated in infected cells, notably suggesting that infection leads to a restructuring of the extracellular environment of the infected cells. we report a comprehensive analysis of the mcmv transcriptome during lytic infection derived from cloning and sequencing of viral transcripts and next generation sequencing (rna-seq). by combining the approaches of rna-seq and traditional cdna cloning as well as northern and rt-pcr analyses in certain complex regions, we were able to construct a comprehensive profile of viral and host transcription during lytic infection. we also investigated the host transcriptome using rna-seq combined with differential gene expression analysis, pathway analysis, and gene ontology analysis. the major findings are as follows: 1) the mcmv transcriptome diverges substantially from that predicted by current annotation; 2) the identification of a novel viral protein specified by the mat transcript indicates that this transcript functions as an mrna and a non-coding rna; 3) the majority of the most abundantly transcribed viral genes are of unknown function; and 4) the host response to infection includes regulation of many host genes and gene networks of unknown relevance to infection. there are four major findings from the analysis of the mcmv transcriptome. first, we demonstrate novel transcripts of mcmv including novel splice variants, transcripts that map to noncoding regions, and transcripts overlapping multiple genes. earlier, we reported similar novel transcripts of hcmv through analysis of a classical cdna library [16] . this study revealed a dramatic increase in the complexity of viral gene products compared to currently available predictions and its findings were later on confirmed by rna-seq analysis [17] . 
a more recent analysis of hcmv translational products [18] by ribosomal footprinting identified over 700 translated orfs, a strikingly high number compared to annotated genes. this discrepancy is, at least in part, a consequence of the polycistronic nature of hcmv transcripts, which appear to code for many more orfs than previously predicted (internal in-frame or out-of-frame orfs, uorfs), as well as orfs coming from antisense or dedicated short transcripts. our analysis demonstrated that the mcmv transcriptome is similarly complex: we identified several regions where multiple 3' co-terminal transcripts expressed in different temporal phases are being transcribed. such transcripts have the potential to code for truncated protein forms or even completely new proteins as described for hcmv, suggesting that the size and complexity of the mcmv proteome, like the mcmv transcriptome, is currently underestimated. accumulation of ncrnas is also a prominent feature of the cytomegalovirus transcriptomes. our rna-seq analysis shows intense transcription in previously described stable mcmv introns and in intergenic regions, consistent with abundant ncrnas reported for hcmv and mcmv [16, 17, 33]. these findings have profound implications for studies of cmv gene function and underscore the need for transcriptomic maps in addition to genomic maps depicting only orfs. the functions of many mcmv genes have been elucidated by using deletion mutants [34]. however, in a transcriptionally complex region of the genome, any deletion will likely impact multiple transcripts and possibly multiple proteins, resulting in complex phenotypes. in line with previous studies [13], we identified novel antisense (as) transcripts of mcmv. interestingly, preliminary estimates in our cloning study indicate that as transcripts occur at much lower frequency than reported for hcmv [16]. there are likely to be additional as transcripts of mcmv.
because we did not capture every known sense transcript of mcmv, we may presume that the cdna cloning study did not capture all antisense (as) transcripts. in addition, the rna-seq analysis performed in this study was limited by the fact that the methods employed did not provide strand-specific information and could not identify novel as transcripts. as transcripts, even those expressed at low levels, may possess noncoding rna functions and contribute to complexity of the proteome, as has been described for hcmv [20]. therefore, further studies are needed to determine the number and nature of as transcripts derived from mcmv; these will be critical to generating definitive transcriptome and proteome maps of this virus. the cdna library analysis does suggest that the extent of mcmv as transcription is lower than that described for other herpesviruses, including hcmv. these results are consistent with a strand-specific rna-seq experiment performed by the dölken group [15] that also showed poor as transcription in comparison to sense counterparts. very little antisense transcription was also noted for the anguillid herpesvirus 1 (anghv1) infecting eels [35], though extensive antisense transcription was reported for other herpesviruses, including kshv and mhv68 [36, 37]. we conclude that different members of the herpesviridae family differ in the extent of antisense transcription during lytic infection. second, we observed similar inconsistencies between transcriptomic data and gene annotation for mcmv as previously reported for hcmv [16]. these discrepancies can profoundly impact future studies related to the quantitative analyses of gene expression, interpretation of microarray studies, comparisons to newly sequenced virus strains, and studies using deletion mutant virus strains.
the results presented here represent an important first step in re-annotation of the mcmv genome and underscore the utility of transcriptome studies in validating and refining genome annotation for microbial pathogens. third, analysis of the mcmv transcriptome revealed the striking abundance of the spliced mat transcript. this gene is also largely conserved in wild isolates of mcmv (alec redwood, personal communication, and [38]) and the protein is expressed by wild isolates tested in this study. mat abundance may reflect its multiple functions. the 3' untranslated region (utr) of this transcript facilitates degradation of murine mir-27, establishing that this transcript functions as a noncoding rna molecule [19, 20]. members of the alpha, beta, and gamma herpesvirus subfamilies all encode abundant, largely enigmatic noncoding rnas, including the latency associated transcript (lat) of herpes simplex virus (hsv), ebna rnas of epstein-barr virus (ebv), the b2.7 transcript of hcmv, the pan rna of kaposi's sarcoma herpesvirus (kshv) and the hsur transcripts of herpesvirus saimiri (hvs), which also downregulate cellular mir-27 (reviewed in [39]). in addition to the noncoding function of the mat, we demonstrate that this transcript also encodes a novel small protein of approximately 17 kda. to our knowledge, this is the first herpesvirus transcript that functions both as a noncoding rna and as an mrna that specifies a novel viral protein. fourth, a somewhat startling finding from the quantitative rna-seq analysis was that, after mat, the most abundant viral transcripts in infected cells are derived from genes without known functions, including m116. we report that m116 is a novel spliced transcript predicted to specify a much smaller protein compared to current annotation. these results highlight fundamental gaps in our understanding of basic mcmv biology.
we found that the cdna library and rna-seq approaches yielded remarkably complementary data including identification of novel transcripts and new insights into transcript abundance, despite different biases in each of these methods. for example, while there may be selection bias for isolating transcripts with long tracts of adenosines during cdna library construction [16], gc content, bias in the sites of fragmentation, primer affinity and transcript-end effects may influence rna-seq results [40]. future rna-seq studies may also facilitate novel gene identification, as rna-seq has now been applied to ab initio reconstruction of gene structure [41] using only rna-seq data and the genome sequence. however, currently available algorithms are unable to cope with highly dense genomes, such as mcmv and other viral genomes. until such tools are developed for very dense genomes, rna-seq data relies upon comparison to existing gene annotation and other experimental methods for gene structure prediction. in this study, we compared rna-seq to currently used annotations but also to the cdna library study, northern analysis, and rt-pcr studies to identify and validate numerous novel transcripts. we also report that lytic infection elicits a profound cellular response in fibroblasts. this study identified 10,748 differentially regulated genes. as the number of mouse genes is estimated to be 33,207 [42], we estimate that over 31% of mouse genes are altered in response to infection. many of the top upregulated and induced genes and gene networks were associated with immune responses to infection, including interferon and interferon-inducible genes such as pyhin1, a potential activator of p53 [43], the inflammasome regulator gbp5 [44] and rsad2 (a.k.a. viperin), also known to be induced by hcmv [45]. inflammatory chemokine ligand genes are also highly upregulated during infection.
mcmv encodes virally derived chemokine homologs specified by the m131/m129 genes [46, 47] and at least one chemokine receptor homolog, m33 [48]. numerous host chemokine receptors are also upregulated by infection, suggesting a remarkably complex interplay between mcmv-derived and host-derived chemokine signaling during infection. induction of inflammatory gene networks by mcmv also lends credence to the hypothesis that inflammatory responses link cmv infection to chronic diseases, such as chronic allograft rejection, cardiovascular disease, and cancer [2, 4, 5]. numerous transcription factors are also induced or upregulated by infection, including insulinoma-associated 1 (insm1). recently, insm1 was found to be strongly upregulated by hsv-1 infection and shown to promote hsv gene expression, probably by binding the hsv infected cell protein (icp)0 promoter [49]. this raises the intriguing possibility that insm1 plays a similar role in promoting virus gene expression during mcmv infection. another transcription factor induced at both the transcript and protein level is engrailed-2 (en2). this transcription factor is key to patterning cerebellar foliation during development [50]. we previously described a profound dysregulation of cerebellar development in brains of neonatal mice infected with mcmv [51], suggesting a possible physiological link to regulation of this gene. another top induced gene was the gaba receptor, gabrq. glutamate receptor signaling was also identified as a significantly impacted canonical pathway in our dataset. in the developing brain, gaba and glutamate receptors influence neuronal proliferation, migration, differentiation or survival processes [52].
whether and how these observations relate to our previous findings that mcmv infection of neonates results in decreased granular neuron proliferation and migration [51] are important areas for future study and may impact our understanding of neurological damage and sequelae associated with hcmv in congenitally infected infants. perhaps most importantly, many top regulated genes, especially downregulated and repressed genes, are associated with functions whose roles in infection are obscure, including many genes of unknown function. many downregulated or repressed genes are cell surface molecules, host lincrnas, antisense rnas, or small nucleolar rnas. regulation of lincrnas was recently observed during infection with severe acute respiratory syndrome coronavirus (sars-cov) and influenza virus, and has been suggested to impact host defenses and innate immunity [53]. further studies to identify the functions of these downregulated and repressed genes and noncoding rnas during mcmv infection may well provide novel insights into the virus-host molecular interface as well as possible therapeutic targets. this analysis also revealed immunological disease, cardiovascular disease, genetic disorders and skeletal and muscular disorders as top bio-functions connected with genes altered by mcmv infection. while mcmv involvement in cardiovascular disease is a subject of intensive research, potential involvement in skeletal and muscular disorders is not well documented but may be relevant to the novel observation that mice with a heterozygous trp53 mutation develop rhabdomyosarcomas at high frequency when infected with mcmv [54]. a primary caveat of rna-seq analysis is determining whether changes in gene transcript levels are also reflected at the protein level. this is particularly important as herpesviruses can control protein accumulation at the post-transcriptional, translational, and post-translational levels [55] [56] [57].
we confirmed that the notch ligand, jagged 2, is highly upregulated by infection at both the transcript and protein level. notch signaling is a highly conserved signaling pathway that plays important roles in development, including neurogenesis and differentiation of immune cell subsets [58]. jagged 2 is also upregulated by the alphaherpesviruses hsv-1 and pseudorabies virus [59]. kshv and ebv also exploit the notch signaling pathway to facilitate aspects of their life cycle [60], and notch signaling is proposed to influence hsv-2-induced interferon responses [61]. we show for the first time that the betaherpesvirus mcmv also influences notch signaling. dysregulation of jagged 2 as a consequence of mcmv infection is highly interesting since it plays a role in important processes affected by cmv, including inner ear development [62, 63], generation of motor neurons [64] and differentiation of immune cell subsets [65, 66]. to summarize, this study has refined the understanding of mcmv gene expression and identified new areas of research to advance our understanding of the host response to these ancient viruses. we describe what is, to our knowledge, the first herpesvirus transcript that functions as both a noncoding rna that limits accumulation of cellular mirnas and an mrna that specifies a protein. this study also revealed novel features of the host response to infection. perhaps most importantly, this study identified many virus and host genes of unknown function that are regulated during infection. it is highly likely that further study of these genes may lead to breakthroughs in the understanding and treatment of cytomegalovirus-related diseases. primary mouse embryonic fibroblasts (mefs) from balb/c or balb.k mice were prepared and maintained as described [67] and used between passages 3-8. immortalized murine balb.k mefs (mef.k) [68] and svec4-10 (atcc crl-2181) were used for immunoblot studies.
mcmv smith strain (atcc vr-1399) was propagated and titrated on primary balb/c mef by standard plaque assay as described in detail in [69]. wild type mcmv isolates k181 (genbank acc no: am886412.1), c4d, k6 and wp15b (genbank acc no: eu579860.1) [38] were a kind gift from a. redwood (university of western australia, australia). the Δ7s3, Δm167, Δm168, Δm169, Δm170, and m169-mut mutant viruses were described previously and were generated by et-cloning [70] using the full-length mcmv bac psm3fr [71]. the double deletion mutants (Δm168Δm169 and Δm169Δm170) were constructed exactly as described previously [20]. primers for construction of the double deletion mutants are also as described [20], using the forward primer for the first gene and the reverse primer for the second gene. all infections were conducted by exposing cells to 0.3 pfu/cell followed by centrifugal enhancement for 30 minutes at 800 × g, as described in [69]. smith mcmv-infected balb/c mefs were harvested 72 h post infection and viral dna was isolated as described [16]. rna was extracted from smith mcmv-infected balb/c mef at 4, 8 and 12 hrs after infection (ie library); 16, 24 and 32 hrs after infection (e library); and 40, 60 and 80 hrs after infection (l library). no drug was used to select for different temporal classes of transcripts, and equal amounts of rna from each time point were pooled prior to library construction. cdna libraries were generated as described previously for hcmv [16] by following the instruction manual for the superscript plasmid system with gateway technology for cdna synthesis and cloning (invitrogen) with some minor modifications. briefly, total rna was isolated using the trizol reagent (invitrogen, ca, usa). a poly(t)-tailed paci primer-adapter was used for first-strand cdna synthesis (5'-gcggccgcttaattaacc(t)15-3'). after second-strand synthesis, an ecori-pmei adapter was added to the 5' end and cdnas were cleaved with ecori and paci.
the ecori-pmei adapter was generated by annealing the following oligonucleotides: 5'-aattcccgcgggtttaaacg-3' and 5'-pho-cgtttaaacccgcggg-3'. cdna fragments were inserted into a modified pcdna3.1(+) previously digested with ecori and paci and transformed into xl1-blue supercompetent e. coli cells (stratagene, ca, usa). positive selection of viral cdna clones was performed as described previously [16]. mse i-digested genomic mcmv dna was labeled using a dig high prime dna labeling detection starter kit ii (roche applied science) and used to identify virally derived cdna clones. plasmids harboring cdna clones that reacted with probe were isolated and sequenced from the 5' end using the t7 primer for pcdna3.1(+) or from the 3' end using a primer (5'-gcaccttccagggtcaaggaag-3') or standard poly(t) primers at the osu plant-microbe genomics facility. sequences were compared to the mcmv smith strain genome (genbank accession no. nc_004065) using mega blast. total rna was extracted from balb/c mef cells cultured in 100 mm 2 petri dishes and exposed to 0.3 pfu/cell of the mw 97.01 strain of murine cytomegalovirus or mock-infected. at 4, 8, 12, 16, 24, 32, 40, 60 and 80 hours after infection, rna was isolated using trizol reagent. rna integrity was assessed on an agilent bioanalyzer and only samples with rna index values of at least 9 were used. equal amounts of rna from each time point were pooled (0.3 µg of rna per time point) and treated with dnasei. libraries were prepared with the illumina truseq rna kit according to the manufacturer's instructions and sequenced on an illumina genome analyzer iix as single-end 36 bp reads. the illumina truseq rna kit employed does not allow strand-specific information to be derived from the sequence data. datasets are available at the national center for biotechnology information (ncbi) sequence read archive (sra) under accession no. srr953479 (sequence reads from mcmv-infected mefs) and accession no. srr953859 (sequence reads from mock-infected mefs).
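as a quick sanity check on the ecori-pmei adapter described in the cdna library construction above, the two adapter oligonucleotides can be tested for complementarity. the following is a minimal python sketch and not part of the original protocol: the function and variable names are hypothetical, only the two sequences are taken from the methods, and the sketch assumes that the shorter oligonucleotide pairs with the 3' portion of the longer one.

```python
# Adapter oligonucleotides as listed in the methods, written 5'->3'.
# The 5'-phosphate on the second oligo is a modification, not a base.
ADAPTER_TOP = "aattcccgcgggtttaaacg"
ADAPTER_BOTTOM = "cgtttaaacccgcggg"

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str) -> str:
    """Return the unpaired 5' bases of `top` after annealing with `bottom`.

    Assumes (hypothetically) that `bottom` pairs with the 3' end of `top`;
    raises ValueError if the two strands are not complementary there.
    """
    paired = reverse_complement(bottom)
    if not top.endswith(paired):
        raise ValueError("oligonucleotides are not complementary")
    return top[: len(top) - len(paired)]


overhang = five_prime_overhang(ADAPTER_TOP, ADAPTER_BOTTOM)
duplex = reverse_complement(ADAPTER_BOTTOM)
print("5' overhang:", overhang)
print("contains pmei site (gtttaaac):", "gtttaaac" in duplex)
```

under these assumptions the annealed duplex leaves a four-base 5' overhang compatible with an ecori-cut end and carries a pmei recognition sequence in the double-stranded portion, which is consistent with the adapter's name.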
reads were aligned to the mouse (ncbi37/mm9 assembly) and mcmv (genbank acc. no. nc_004065.1) genomes using the eland aligner or the bowtie aligner (for comparison with data provided by lars dölken). it is important to note that neither the eland nor the bowtie aligner maps reads across splice junctions, and thus the two give concordant results. alignments were visualized using integrative genomics viewer (http://www.broadinstitute.org/igv/) [72]. differential gene expression was assessed by calculating rpkm (reads per kilobase per million mapped reads) using sammate release 2.6.1 with edger (http://sammate.sourceforge.net/) [73]. gene ontology (go) enrichment analysis was performed on filtered lists of differentially expressed genes (p < 0.05) using gorilla ranked lists analysis [31, 32]. ingenuity core analysis (ingenuity systems, www.ingenuity.com) was used for gene interaction network and canonical pathway analysis. gene lists were filtered for statistically significant differentially expressed genes (p < 0.05) and a fold change cutoff of 2 was set to identify molecules whose expression was significantly differentially regulated. for network generation, these molecules (network eligible molecules) were overlaid onto a global molecular network developed from information in the ingenuity knowledge base based on their connectivity. the functional analysis of a network identified the biological functions and/or diseases that were most significant to the molecules in the network. a right-tailed fisher's exact test was used to assess whether the biological functions and/or diseases assigned to data sets could be explained by chance alone. the nature of individual de genes was also investigated using the mouse genome informatics databases (http://www.informatics.jax.org/) [74] and entrez gene (http://www.ncbi.nlm.nih.gov/gene) [75]. rna was isolated using trizol reagent from mock- or mcmv-infected balb/c mef at 24 hours (mat) or 10, 30 and 60 hrs after infection.
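the quantities named in the rna-seq analysis methods above (rpkm, the p < 0.05 and two-fold filter, and the right-tailed fisher's exact test) can be illustrated with a minimal python sketch. this is not the sammate, edger or ingenuity implementation; all function and parameter names are hypothetical, the fold-change filter assumes nonzero rpkm in both conditions (so induced or repressed genes with zero expression in one condition would need separate handling), and the fisher's test is written as a plain hypergeometric tail sum.

```python
from math import comb


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passes_filter(rpkm_infected: float, rpkm_mock: float, p_value: float,
                  p_cutoff: float = 0.05, fold_cutoff: float = 2.0) -> bool:
    """True if p < p_cutoff and expression changes at least fold_cutoff-fold
    in either direction. Assumes both RPKM values are greater than zero."""
    fold = rpkm_infected / rpkm_mock
    changed = fold >= fold_cutoff or fold <= 1.0 / fold_cutoff
    return p_value < p_cutoff and changed


def fisher_right_tailed(hits: int, dataset_size: int,
                        annotated: int, reference_size: int) -> float:
    """P(X >= hits) under the hypergeometric null: the chance of drawing at
    least `hits` annotated genes when `dataset_size` genes are sampled from a
    reference of `reference_size` genes, `annotated` of which carry the term."""
    upper = min(dataset_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(reference_size - annotated, dataset_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(reference_size, dataset_size)


# Toy numbers for illustration only (not data from this study).
example_rpkm = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
example_p = fisher_right_tailed(hits=2, dataset_size=3, annotated=4, reference_size=10)
print(example_rpkm, example_p)
```

a small right-tailed p-value indicates that a function is over-represented among the filtered genes relative to the reference set; the toy values above are chosen only so the arithmetic can be checked by hand.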
rna (1 µg/lane or 10 µg/lane (mat)) was separated by formaldehyde agarose gel electrophoresis, transferred to positively charged nylon membrane and crosslinked by uv irradiation. membranes were reacted with dig-labeled probes overnight at 67 °C. for mat detection, a dig-labeled double-stranded dna probe was made using fragments corresponding to the mat gene sequences derived from cdna library clones e1, e125 and e134 using roche's dig-high prime dna labeling and detection starter kit i. for all other northern blots, single-stranded dig-labeled rna probes generated using roche's dig northern starter kit were used. antisense probes were generated by in vitro transcription from the t7 promoter present in pcdna3.1 plasmids containing cdna clones that harbor the desired gene fragments (table 1). therefore antisense probes are identical to transcripts cloned in the cdna library and can detect transcripts antisense to cdna clones. to generate sense probes, a t3 promoter was added by pcr to the 5' end of the complementary strand of the gene fragments used for antisense probes (table 1). the pcr fragments were then in vitro transcribed and dig labeled using t3 rna polymerase. care was taken to generate sense probes of length comparable to corresponding antisense probes. the m169 gene sequence was amplified by pcr using viral dna isolated from mcmv bac psm3fr using the following primers: f: 5'-tttttggatccatgagcaacgcggtcccgttc-3' and r: 5'-tttttctgcagtcatcacggggggcacctacc-3', digested with bamhi and hindiii (new england biolabs), inserted into the pqe30 expression vector and introduced into the e. coli bl21 prep4 strain (qiagen). the protein was induced according to the manufacturer's instructions and purified on a his-tag column. purified protein was used for immunization of balb/c mice and antibody titer in blood serum was measured by elisa. when antibody titer in serum reached adequate levels, animals were sacrificed and their spleens were isolated and fused with sp2/0 cells.
supernatants from mother wells were tested by immunoblot on purified mat protein, and positive wells were rescreened by immunoblot using lysates from mefs infected with wt, Δ7s3, Δ168-169, Δ169-170, Δm168, Δm169 and Δm170 mutants as described below. rna from mock- or mcmv-infected cells isolated for northern blots was also reverse transcribed using oligo-dt primers (protoscript m-mulv taq rt-pcr kit, new england biolabs). no reverse transcriptase (-rt) controls were run in parallel. splicing was then verified by pcr amplification using primers that flank putative introns (m116, f: cttcatcggattcggaggc, r: tgttgttgtcgacgtctgatgtg; m71-m75, f: atctcctctgcctccgacctc, r: cgatgtcatcttggaatccgacga; m72-m75, f: ccggatacgaccgtcagc, r: cgatgtcatcttggaatccgacga) using phusion high fidelity polymerase (new england biolabs). mock-infected or mcmv-infected primary mefs, or murine cell lines (mef.k, svec4-10), were lysed in ripa buffer. protein lysates were separated by sds-page and transferred to pvdf. mat protein was detected with the anti-m169 antibody described above, jag2 with antibody n-19 (santa cruz), engrailed 2 with en2 pa5-14363 antibody (thermo scientific), trim71 with pa5-19282 (thermo scientific), and actin with antibody c4 (millipore), followed by peroxidase-labeled secondary antibodies (jackson immunoresearch or abcam). proteins were visualized using amersham ecl prime western blotting reagent (ge healthcare) and quantified using imagej software (http://rsbweb.nih.gov/ij/). dataset s1 comparison of sensitivity and temporal gene expression data from this study to previous microarray studies of mcmv (s1a and s1b) and comparison of rpkm values in and this rnaseq experiment (s1c). dataset s2 spliced cdna clone overlapping m116 and comparison of predicted protein to current annotation. (pdf) figure s1 rna-seq profiles comparison.
rna-seq data from total rna obtained from mcmv-infected nih-3t3 fibroblasts at 25 and 48 hrs pi, sequenced by the dölken group (gse35833), were aligned against the mcmv genome (gb acc no nc_004065.1) using the bowtie aligner and visualized in igv in comparison to our rna-seq data. the view of the complete genome is shown at the top with 4 areas magnified below (labeled a-d), and the number of reads displayed is noted on the side. since viral genes display a wide range of expression levels, the whole genome view is shown in a wide data range (upper panel) more suitable for displaying highly transcribed regions and a narrowed data range (lower panel) more suitable for less transcribed regions. as can be seen, the profiles of the compared alignments are remarkably similar; the only differences are in the abundance of certain transcripts, which is due to the different time points analyzed in comparison to the pooled data of our rna-seq, and in the depth of our data, which is at least one order of magnitude greater than that of the marcinowski data. (tif) figure s2 analysis of the m20-19 region. balb/c mef cells were infected with bac-derived smith virus and harvested 10, 30 and 60 hrs post infection. total rna was separated by denaturing gel electrophoresis, transferred to nylon membrane and incubated with probe generated by in vitro transcription from the t7 promoter of l57 [a; probe should detect predicted m19(s) transcripts] or probe generated by in vitro transcription from the t3 promoter of the ie205 transcript [probe should detect m20(s)-m19(as) transcripts]. rna integrity and loading was evaluated by inspecting 28s (not shown) and 18s rrna bands under uv light after transfer to membrane. predicted genes (rawlinson's annotation) are depicted as empty arrows, while thin black arrows show longest transcripts cloned in our cdna library as well as clones used to generate probes (marked with *). 3' ends of transcripts are marked with arrowheads.
the nucleotide coordinates relative to smith sequence (nc_004065.1) of isolated transcripts are given below thin arrows, while the names of the clones are written above. gray histograms show rna-seq reads aligned to the mcmv genome. maximal possible exposure times were used to ensure even low abundance transcripts are detected and are noted on the blots. (tif) figure s3 verification of m116 splicing by pcr. schematic of the m116 gene region and clone l29 (a) and pcr analysis of the splice site (b). in (a) predicted genes (rawlinson's annotation) are depicted as empty arrows, while the thin black arrow depicts clone l29 with red arrows depicting the primers used in (b). the 3' ends of transcripts are marked with arrowheads. nucleotide coordinates are relative to the smith sequence (nc_004065.1). gray histograms show rna-seq reads aligned to the mcmv genome. rna isolated from wt-infected balb/c mef used in northern blots was reverse transcribed with oligo-dt primers, and then amplified with primers specific for m116 that flank the putative intron (marked by red arrows). no rt controls were run in parallel. spliced cdna clones and viral dna were used as spliced and unspliced amplification controls, respectively. (tif) figure s4 analysis of the m71-m75 region spliced transcripts by northern blot and pcr. balb/c mef cells were infected with bac-derived smith virus and harvested at indicated times post infection (a and b). total rna was separated by denaturing gel electrophoresis, transferred to nylon membrane and incubated with probes specific for s and as transcripts. rna integrity and loading was evaluated by inspecting 28s (not shown) and 18s rrna bands under uv light after transfer to membrane. predicted genes (rawlinson's annotation) are depicted as empty arrows, while thin black arrows show longest transcripts cloned in our cdna library as well as clones used to generate probes (marked with *).
images of northern blots are shown using probes derived from the m72 region in (a) and the m74 region in (b). the 3' ends of transcripts are marked with arrowheads. the nucleotide coordinates relative to smith sequence (nc_004065.1) of isolated transcripts are given below thin arrows, while the names of the clones are written above. thin gray lines show isolated transcripts that cannot be detected with the probe. gray histograms show rna-seq reads aligned to the mcmv genome. maximal possible exposure times were used to ensure even low abundance transcripts are detected and are noted on the blots (a and b). in (c), the m71-75 spliced transcript and one of two possible m72-75 spliced transcripts were verified by pcr amplification using primers that flank their putative introns. rna isolated from wt-infected balb/c mef used in northern blots was reverse transcribed with oligo-dt primers, and then amplified with primers specific to m71-m75 or m72-m75 spliced transcripts (marked by red arrows). no rt controls were run in parallel. spliced cdna clones and viral dna were used as spliced and unspliced amplification controls, respectively. in (d), the position of exon 1 of m60 reported by scalzo et al. [23] is compared to rna-seq data for this genomic region. (tif) figure s5 northern blot analysis of the m100-m103 region. balb/c mef cells were infected with bac-derived smith virus and harvested at indicated times post infection. total rna was separated by denaturing gel electrophoresis, transferred to nylon membrane and incubated with probes specific for s and as transcripts overlapping m102 (a), m100 (b) and m103 (c) genes. rna integrity and loading was evaluated by inspecting 28s (not shown) and 18s rrna bands under uv light after transfer to membrane. predicted genes (rawlinson's annotation) are depicted as empty arrows, while thin black arrows show longest transcripts cloned in our cdna library as well as clones used to generate probes (marked with *).
3' ends of transcripts are marked with arrowheads. the nucleotide coordinates relative to smith sequence (nc_004065.1) of isolated transcripts are given below thin arrows, while the names of the clones are written above. thin gray lines show isolated transcripts that cannot be detected with the probe. gray histograms show rna-seq reads aligned to the mcmv genome. maximal possible exposure times were used to ensure even low abundance transcripts are detected and are noted on the blots. (tif) figure s6 graphical representation of top 3 genetic networks for differentially regulated genes. upregulated genes are shown in red, while downregulated are shown in green. level of differential expression is represented by color saturation with most dramatically changed genes being shown in the most saturated color (strong red or green). these overlapping genetic networks are associated with (i) cell-mediated immune response, cellular development, cellular function and maintenance (30 focus molecules); (ii) infectious disease, antimicrobial response, inflammatory response (27 focus molecules) and (iii) antimicrobial response, inflammatory response, gene expression (24 focus molecules) (see supplemental table s5). (tif) figure s7 graphical representation of top 5 genetic networks for genes induced or repressed by infection. induced genes are shown in red, while repressed are shown in green. level of differential expression is represented by color saturation with most dramatically changed genes being shown in the most saturated color (strong red or green). these overlapping genetic networks are associated with various developmental processes (see supplemental table s6). (tif) cytomegalovirus: pathogen, paradigm, and puzzle manifestations of human cytomegalovirus infection: proposed mechanisms of acute and chronic disease cytomegalovirus reactivation in critically ill immunocompetent hosts: a decade of progress and remaining challenges infection and atherosclerosis.
an alternative view on an outdated hypothesis does cytomegalovirus play a causative role in the development of various inflammatory diseases and cancer? the search for new therapies for human cytomegalovirus infections analysis of the protein-coding content of the sequence of human cytomegalovirus strain ad169 the dna sequence of the human cytomegalovirus genome analysis of the complete dna sequence of murine cytomegalovirus identification of proteins associated with murine cytomegalovirus virions predicting coding potential from genome sequence: application to betaherpesviruses infecting rats and mice experimental confirmation of global murine cytomegalovirus open reading frames by transcriptional detection and partial characterization of newly described gene products temporal profiling of the coding and noncoding murine cytomegalovirus transcriptomes stability of murine cytomegalovirus genome after in vitro and in vivo passage real-time transcriptional profiling of cellular and viral gene expression during lytic cytomegalovirus infection antisense transcription in the human cytomegalovirus transcriptome high-resolution human cytomegalovirus transcriptome decoding human cytomegalovirus posttranscriptional regulation of mir-27 in murine cytomegalovirus infection degradation of cellular mir-27 by a novel, highly abundant viral transcript is important for efficient virus replication in vivo uncovering the complexity of transcriptomes with rna-seq characterization of the human cytomegalovirus ul34 gene the murine cytomegalovirus m73.5 gene, a member of a 39 co-terminal alternatively spliced gene family, encodes the gp24 virion glycoprotein regulation of cytomegalovirus late-gene expression: differential use of three start sites in the transcriptional activation of icp36 gene expression reverse transcriptase template switching and false alternative transcripts alternate promoter selection within a human cytomegalovirus immediate-early and early transcription unit 
(ul119-115) defines true late transcripts containing open reading frames for putative viral glycoproteins expression of the murine cytomegalovirus glycoprotein h by recombinant vaccinia virus dna sequence and transcriptional analysis of the glycoprotein m gene of murine cytomegalovirus cloning, characterization, and expression of the murine cytomegalovirus homologue of the human cytomegalovirus 28-kda matrix phosphoprotein (ul99) mapping and transcriptional analysis of the murine cytomegalovirus homologue of the human cytomegalovirus ul103 open reading frame gorilla: a tool for discovery and visualization of enriched go terms in ranked gene lists discovering motifs in ranked lists of dna sequences murine cytomegalovirus encodes a stable intron that facilitates persistent replication in the mouse strategies for the identification and analysis of viral immune-evasive genescytomegalovirus as an example genome-wide gene expression analysis of anguillid herpesvirus 1 redefining the genetics of murine gammaherpesvirus 68 via transcriptome-based annotation the lytic transcriptome of kaposi's sarcoma-associated herpesvirus reveals extensive transcription of noncoding regions, including regions antisense to important genes laboratory strains of murine cytomegalovirus are genetically similar to but phenotypically distinct from wild strains of virus noncoding rnps of viral origin local and global factors affecting rna sequencing analysis ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincrnas the mouse genome database (mgd): from genes to mice-a community resource for mouse biology stabilization of p53 in human cytomegalovirus-initiated cells is associated with sequestration of hdm2 and decreased p53 ubiquitination gbp5 promotes nlrp3 inflammasome assembly and immunity in mammals human cytomegalovirus directly induces the antiviral protein viperin to enhance infectivity cytomegalovirus mck-2 controls 
mobilization and recruitment of myeloid progenitor cells to facilitate dissemination spliced mrna encoding the murine cytomegalovirus chemokine homolog predicts a beta chemokine of novel structure functional analysis of the murine cytomegalovirus chemokine receptor homologue m33: ablation of constitutive signaling is associated with an attenuated phenotype in vivo herpes simplex virus induces the marked up-regulation of the zinc finger transcriptional factor insm1, which modulates the expression and localization of the immediate early protein icp0 the engrailed homeobox genes determine the different foliation patterns in the vermis and hemispheres of the mammalian cerebellum altered development of the brain after focal herpesvirus infection of the central nervous system glutamate and gaba receptor signalling in the developing brain unique signatures of long noncoding rna expression in response to virus infection and altered innate immune signaling cytomegalovirus infection leads to pleomorphic rhabdomyosarcomas in trp53+/2 mice the virion-packaged endoribonuclease of herpes simplex virus 1 cleaves mrna in polyribosomes getting the message direct manipulation of host mrna accumulation during gammaherpesvirus lytic infection regulation of translation initiation by herpesviruses notch: control of lymphocyte differentiation in the periphery transcriptional response of a common permissive cell type to infection by two diverse alphaherpesviruses notch and wnt signaling: mimicry and manipulation by gamma herpesviruses inhibition of gammasecretase cleavage in the notch signaling pathway blocks hsv-2-induced type i and type ii interferon production notch signaling regulates the pattern of auditory hair cell differentiation in mammals notch signaling and the developing inner ear jagged2 controls the generation of motor neuron and oligodendrocyte progenitors in the ventral spinal cord the notch ligands jagged2, delta1, and delta4 induce differentiation and expansion of 
functional human nk cells from cd34+ cord blood hematopoietic progenitor cells expression of notch receptors and ligands on immature and mature t cells a mouse model for cytomegalovirus infection. current protocols in immunology ly49p recognition of cytomegalovirus-infected cells expressing h2-dk and cmvencoded m04 correlates with the nk cell antiviral response dissection of the antiviral nk cell response by mcmv mutants mutagenesis of viral bacs with linear pcr fragments (et recombination) systematic excision of vector sequences from the bac-cloned herpesvirus genome during virus reconstitution integrative genomics viewer sammate: a gui tool for processing short read alignments in sam/bam format the mouse genome database (mgd): comprehensive resource for genetics and genomics of the laboratory mouse entrez gene: genecentered information at ncbi we thank corinna benkartek and martin messerle for the mutant viruses and alec redwood for the gift of the wild mcmv viruses and for generously sharing unpublished data. we thank lars dolken for kindly providing their rna-seq data for comparison. we also thank andrea henkel, misel satrak and guojuan zhang for help with the cdna library. key: cord-257843-nj2707mv authors: mariani, thomas j; qiu, xing; chu, chinyi; wang, lu; thakar, juilee; holden-wiltse, jeanne; corbett, anthony; topham, david j; falsey, ann r; caserta, mary t; walsh, edward e title: association of dynamic changes in the cd4 t-cell transcriptome with disease severity during primary respiratory syncytial virus infection in young infants date: 2017-08-17 journal: the journal of infectious diseases doi: 10.1093/infdis/jix400 sha: doc_id: 257843 cord_uid: nj2707mv background: nearly all children are infected with respiratory syncytial virus (rsv) within the first 2 years of life, with a minority developing severe disease (1%–3% hospitalized). 
we hypothesized that an assessment of the adaptive immune system, using cd4(+) t-lymphocyte transcriptomics, would identify gene expression correlates of disease severity. methods: infants infected with rsv representing extremes of clinical severity were studied. mild illness (n = 23) was defined as a respiratory rate (rr) < 55 and room air oxygen saturation (sao(2)) ≥ 97%, and severe illness (n = 23) was defined as rr ≥ 65 and sao2 ≤ 92%. rna from fresh, sort-purified cd4(+) t cells was assessed by rna sequencing. results: gestational age, age at illness onset, exposure to environmental tobacco smoke, bacterial colonization, and breastfeeding were associated (adjusted p < .05) with disease severity. rna sequencing analysis reliably measured approximately 60% of the genome. severity of rsv illness had the greatest effect size upon cd4 t-cell gene expression. pathway analysis identified correlates of severity, including jak/stat, prolactin, and interleukin 9 signaling. we also identified genes and pathways associated with timing of symptoms and rsv group (a/b). conclusions: these data suggest fundamental changes in adaptive immune cell phenotypes may be associated with rsv clinical severity. respiratory syncytial virus (rsv), the most important cause of respiratory tract illness in infants and young children, infects 50%-70% of infants during the first year of life [1] . although most infections are relatively mild, 1%-3% of infected infants require hospitalization, accounting for 74 000-126 000 admissions of infants aged < 1 year annually in the united states [2, 3] . additionally, rsv-related emergency department visits for infants aged ≤ 1 year of age range from 39 to 69 per 1000, and rsv-related office visits are 3 times as many [4] . 
although many risk factors for severe disease are recognized, such as prematurity, congenital heart disease, pulmonary disease, and neurologic and immunosuppressive conditions, the majority of infants brought to medical attention are healthy full-term infants. host and environmental factors also associated with more severe disease, although less overt, are male sex, lack of breastfeeding, tobacco smoke exposure, and low levels of maternally derived protective antibody [5] . host immune responses are also thought to influence disease manifestations, including severity [6] [7] [8] [9] . experimental animal models of rsv infection suggest that th2 cd4 immune-dominant responses, as well as diminished or impaired anti-inflammatory t-regulatory (treg) cell function and increased th17 responses, contribute to increased lung pathology [6, 7, 10] . data from infant studies are less compelling, with some supporting and others refuting that a th2-dominant response is responsible for severe disease [9, [11] [12] [13] [14] [15] . recently, gene expression analysis of whole blood collected during infection has been used to assess the immune response to rsv [16] [17] [18] . one study noted increased expression of interferon signaling and neutrophil gene pathways and diminished t- and b-cell gene pathways [17] . because cd4 t cells are critical in the development of adaptive immunity following infection and also influence the degree of inflammation, we sought to investigate gene expression patterns using high-throughput rna sequencing (rna-seq) of isolated cd4 t cells in healthy full-term infants aged <10 months at the time of primary rsv infection by comparing infants with mild and severe clinical disease. we identified unique gene expression patterns, implicating biological pathways associated with disease severity, which provide novel insight into the pathogenesis of rsv infection in this population. 
the research subject review boards of the university of rochester medical center (urmc) and rochester general hospital approved the study, and a parent provided written informed consent. respiratory syncytial virus-infected infants were selected for this analysis from 3 cohorts as part of the aspires study of rsv pathogenesis. a birth cohort, enrolled between august and december of both 2012 and 2013, was followed by a combination of passive and active surveillance for development of rsv infection during the subsequent winter. infants with respiratory symptoms were evaluated at home for rsv infection using an rsv group-specific reverse transcriptase-polymerase chain reaction (rt-pcr) assay [19] . a second cohort was enrolled when seen for acute respiratory symptoms in pediatric offices or emergency rooms and tested for rsv by antigen detection (quidel, san diego, ca) and/or rt-pcr. the third cohort consisted of infants hospitalized with rsv at urmc's golisano children's hospital. eligible subjects were full-term (>36 wk gestation) healthy infants born after the previous may 1 and aged <10 months at rsv infection to ensure primary infection. the hospitalized infants were seen daily until discharged, and charts were reviewed for signs of respiratory illness and lowest room air oxygen saturation (sao2). infants were evaluated by 2 members of the study team (a physician and a project nurse). demographic data, illness symptoms, findings on physical examination, and results of standard laboratory tests and chest radiographs, when available, were recorded at this evaluation, which was defined as visit 1 (acute). illness onset was determined by physician interview of parent(s) during evaluation. following evaluation, a nasal swab was obtained using an infant-sized flocked swab (floqswabs catalog no. 525cs01, copan, murrieta, ca) and placed in 2 ml of sterile ultraviolet-inactivated water, and 2-3 ml of heparinized blood was collected. 
a second blood sample was collected during a second visit (convalescent) 12-16 days after illness onset. the nasal swab was used for detection of other respiratory virus coinfection, streptococcus pneumoniae, and haemophilus influenzae using a taqman multiplex assay and separately for moraxella catarrhalis according to published methods [20] . heparinized blood was maintained at room temperature for up to 2 hours, and peripheral blood mononuclear cells were isolated by ficoll-hypaque gradient, flow-sorted into subsets including cd3 + cd4 + cd8 − lymphocytes, and stored in rna lysis buffer at −20°c [21] . rna sequencing was performed as previously described, starting with 1 ng of rna and using the smarter ultra low amplification kit (clontech, mountain view, ca) [20] . libraries were constructed using the nexteraxt library kit (illumina, san diego, ca) and sequenced on the illumina hiseq2500 to generate approximately 20 million 100-bp single end-reads per sample. preanalysis data processing was as previously described [20] . for categorical variables, we used samseq to identify genes with significant differences in mean expression (q < 0.05). both pearson and spearman correlation tests, with benjamini-hochberg correction to control false discovery rate at a 0.05 level, were used to select genes with significant correlation with continuous variables. we also conducted a multivariate linear mixed-effects regression analysis to study the linear association between the expression of individual genes (response variables) and various important demographic, clinical, and environment variables (see supplementary methods). differentially expressed genes were used for canonical pathway identification and upstream regulator analysis using ingenuity pathway analysis (qiagen, redwood city, ca). to assess t-cell transcription factor activity, we interpreted cd4 gene expression patterns associated with severity using bayesian estimation of transcription factor activity [22, 23] . 
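the benjamini-hochberg step described above can be sketched in a few lines of python. this is a minimal illustration of the standard procedure, not the authors' pipeline; the function names `benjamini_hochberg` and `select_significant` and the example p values are hypothetical, and the 0.05 threshold is the only value taken from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(gene_p, fdr=0.05):
    """Return genes whose adjusted p value is at or below the FDR level.

    gene_p maps a gene name to the raw p value of its correlation test
    (hypothetical input format, assumed for illustration).
    """
    genes = list(gene_p)
    q_values = benjamini_hochberg([gene_p[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q <= fdr]


if __name__ == "__main__":
    # Made-up p values, for illustration only.
    example = {"gene_a": 0.001, "gene_b": 0.5, "gene_c": 0.03}
    print(select_significant(example))
```

with this procedure a gene is retained when its adjusted p value does not exceed the chosen false discovery rate, so the number of selected genes depends on how many tests are run as well as on the raw p values.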
quantitative polymerase chain reaction was performed on selected genes for confirmation of the gene expression as described [20] . a p value < .05 was considered statistically significant. for continuous clinical variables, we performed 2-sample welch t tests to check the equality of mean values between 2 patient groups defined by disease severity. for binary variables, fisher's exact test was used. breastfeeding was modeled as a categorical variable (none < some < exclusive) and tested using spearman's rank correlation. of 86 rsv-infected infants enrolled in the study, we selected 46 representing the extreme ends of the severity spectrum: 23 with mild disease (n = 10 birth cohort, n = 13 second cohort), defined as maximum respiratory rate (rrmax) < 55 per minute and sao 2 ≥ 97%; and 23 with severe disease (all from the hospitalized cohort), defined as rrmax ≥ 65 and sao2 ≤ 92%. subjects with more severe illness were significantly younger (2.2 vs 4.0 months; p < .01), more likely to be exposed to environmental tobacco smoke (23% vs 4%; p = .01), and less likely to be breastfed (65% vs 87%; p = .05) ( table 1) . viral coinfection was similar between groups. the groups were equally colonized with moraxella catarrhalis, but the more severely ill group was significantly more likely to be colonized with s. pneumoniae and/or h. influenzae (65% vs 30%; p = .04). although all infants were considered full term, those with severe clinical symptoms were born at slightly lower gestational age. the cd3 + /cd4 + /cd8 − (cd4 + ) t cells from this group of 46 infants were sorted and subjected to rna-seq analysis. a total of 51 samples from 38 subjects (n = 19 in each group) passed quality control (qc), and the remaining samples were removed from analysis. the raw reads, mapping rate, and gene detection rates are shown in figure 1 . samples averaged approximately 20m reads with >90% mapping rate, and our filtered analytical data set included expression values for 10 446 genes. 
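the severity definitions used to select subjects can be written as a small rule. this is a sketch under stated assumptions: the function name `classify_severity` and the label "not selected" are hypothetical, and only the thresholds (rrmax < 55 with sao2 ≥ 97% for mild; rrmax ≥ 65 with sao2 ≤ 92% for severe) come from the text.

```python
def classify_severity(rr_max, sao2):
    """Classify an infant by the extreme-phenotype criteria in the text.

    rr_max: maximum respiratory rate per minute.
    sao2:   lowest room air oxygen saturation, in percent.
    """
    if rr_max < 55 and sao2 >= 97:
        return "mild"
    if rr_max >= 65 and sao2 <= 92:
        return "severe"
    # Infants meeting neither definition fall between the extremes;
    # this label is an assumption made for illustration.
    return "not selected"


if __name__ == "__main__":
    for rr, sat in [(50, 98), (70, 90), (60, 95)]:
        print(rr, sat, classify_severity(rr, sat))
```

because both criteria must hold at once, an infant with a high respiratory rate but normal saturation (or the reverse) falls outside both groups, which is consistent with selecting only the extremes of the severity spectrum.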
interestingly, cd4 + t cells appeared to express approximately 60% of the genome, consistent with prior data on sorted lymphocytes [21] . an assessment of cell type−specific markers (eg, cd3, cd4, cd8, cd19, mpo) confirmed the purity of the sorted cells (supplementary figure 1) . univariate analysis was used to identify gene expression patterns associated with each clinical or demographic variable (table 2). our analysis demonstrated that clinical severity was associated with the greatest effect size upon gene expression (n = 551 genes). the effect size for clinical severity was much greater than that for sex (n = 74 genes), rsv group (a/b; n = 53 genes), or days since onset of clinical symptoms (n = 35 genes). interestingly, we found that bacterial cocolonization seemed to have a greater effect (n = 12 genes) than viral coinfection (n = 3 genes). appreciating the potential of confounding variables to influence identification of significant gene expression patterns, we constructed a multivariate linear mixed-effects regression model that included the variables with the greatest marginal effects. numbers of significant genes identified by this more conservative analysis are summarized in table 2 . details for genes identified, including their relationships to individual variables, are provided in supplementary table 1 . in this multivariate model, severity of illness continued to have the strongest association with gene expression (n = 140 genes), albeit now only slightly greater than sex (n = 113 genes). the multivariate model again showed significant gene expression associated with days since onset of clinical symptoms (n = 63 genes), rsv group (n = 68 genes), and gestational age (n = 41 genes). 
in addition, the multivariate model identified a substantial number of genes significantly associated with viral coinfection/bacterial colonization status (n = 67 genes) and found that pathogenic bacteria colonization alone was significantly associated with gene expression (n = 44 genes). a heat map of the 140 genes associated with disease severity identified by multivariate analysis and stratified by time since onset of clinical symptoms is shown in figure 2a . these data demonstrated population-level heterogeneity (particularly in the severely affected subjects) and suggested greater differences in gene expression in the acute phase than in the convalescent phase. gene-set analysis was used to interpret the expression pattern of genes associated with severity, in both the more liberal univariate analysis (n = 551 genes) and the more conservative multivariate analysis (n = 140 genes), identifying a number of signaling pathways and intercellular signaling molecules that may be associated with clinical responses in rsv-infected infants (figure 2b and supplementary figure 2). although there was limited confirmation of significance at the individual gene level, with only 64 of 140 multivariate genes also identified in univariate analysis, there was a high degree of confirmation for canonical pathway discovery (figure 2b). these observations are consistent with the robustness of a pathway-based approach to analysis of transcriptomics data. genes associated with severity predicted activation of the jak/stat, prolactin, gα1, interleukin 1, inos, igf1, and phosphatidylinositol 3-kinase (pi3k)/akt signaling pathways in severely affected subjects. interleukin 9 (il-9) pathway signaling was also predicted to be altered, but it was unclear whether this pathway was activated or inhibited. identification of these pathways was predominantly driven by genes demonstrating ) . b, pathways associated with severe phenotype. 
ingenuity pathway analysis (ipa) was used to identify canonical pathways represented by genes associated with severity in cd4 lymphocytes from respiratory syncytial virus-infected subjects. the variables used to generate gene sets for ipa were multivariate severity phenotype (a; n = 140) and univariate severity phenotype (b; n = 551). thirteen pathways are shown where fisher's exact test p values were <.05 for at least 1 variable. orange and blue circles indicate predicted increased or decreased pathway activation (activation z score), respectively. genes included in each pathway are listed and are colored red if increased in severe subjects or green if decreased in severe subjects. genes with >1.5-fold increases are bold. regulation was associated with cd4+ t-cell gene expression in severely ill infants, implicating shifts in arachidonate metabolism. activation of nrf2 (nfe2l2), the major antioxidant pathway, was also associated with cd4 + t-cell gene expression in severely ill infants. this analysis also suggested activation of a number of regulators of mitogenesis, including myc, kras, kip1 (cdkn1b), and p53 (tp53). together these data support the conclusion that broad changes in cd4 t-cell activity and survival are associated with clinical severity in rsv-infected infants. in fact, some observed gene expression changes specifically implicated responses associated with cd4 t-cell subtype, such as regulation of il-9 signaling ( figure 2b ). also noted were increased activation of jak2 and increased expression of suppressor of cytokine signaling (socs) genes in cd4 t cells from severe subjects, particularly socs2, which antagonizes the other socs proteins, and together with jak2 regulates early th1/th2 differentiation [24] [25] [26] . 
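the fisher's exact test p values reported for the pathway analysis can be illustrated with a generic one-sided overrepresentation test. this is a sketch of the textbook hypergeometric calculation, not ingenuity pathway analysis's own implementation; the function name `enrichment_p_value`, the pathway size, and the overlap count are hypothetical, while 10 446 background genes and 140 severity-associated genes are taken from the text.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric) p value.

    Probability of seeing at least n_overlap pathway genes when
    n_selected genes are drawn at random from n_background genes,
    of which n_pathway belong to the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 10 446 measured genes and 140 severity genes come from the text;
    # the pathway size (80) and overlap (6) are made up for illustration.
    p = enrichment_p_value(10446, 80, 140, 6)
    print(f"enrichment p value: {p:.3g}")
```

the choice of background matters here: using the genes actually measured in the sorted cells, rather than the whole genome, avoids overstating enrichment for pathways whose genes are simply expressed in cd4 t cells.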
in an effort to directly test whether severity-associated changes in gene expression were consistent with alterations in t-cell subtype, we implemented bayesian estimation of transcription factor activity to predict transcription factor activity based upon cd4 gene expression patterns associated with severity [23] . we found changes in gata3 activity were predicted to be significantly associated with severity as defined by the univariate model (p = 1.62e-05), whereas rora was marginally significant (p = .04) and stat1/2/4 and smads were not significant (p > .05). gata3 is a transcriptional activator that is required for t-helper 2 (th2) differentiation. it also plays a role in differentiation of th9 cells. these data support the interpretation that changes in cd4 t-cell subtype differentiation are associated with severity in rsv infection. we explored biological interpretations of genes associated with the time since onset of clinical symptoms (figure 3 ). for this analysis, we focused on the set of genes significantly associated with the stage of visit-acute versus convalescent. as anticipated, activation of interferon signaling was noted early in the clinical course, based on the assessment of canonical pathway gene expression ( figure 3a ) and upstream regulator prediction ( figure 3b ). interestingly, both type i (ifna/b), type ii (ifng), and type iii (ifnl) interferon responses were implicated, which was confirmed by significant enrichment of stat1-stat2 and irf1 target genes (p = 9.033e-05 and .0004, respectively). this was associated with downstream changes in inflammasome activation, as implicated by predictions of mavs and il1rn as upstream regulators ( figure 3b ). changes in viral pattern recognition signaling, potentially driven predominantly by increases in tlr3/7/9-associated genes, were also identified. interestingly, decreases in proliferative activity in cd4 t cells, as indicated by reductions in cnot7 and mapk1, were also observed. 
the rsv group (a/b) was not independently associated with clinical disease severity but was associated with significant differences in cd4 t-cell transcriptome status (table 2). pathway-based interpretation of these gene expression responses identified signaling pathways that may be differentially affected by infection with these 2 rsv groups (figure 4a). based upon canonical pathway analysis, adenine/adenosine salvage, ppar, and sapk/jnk signaling were all predicted to be differentially regulated by rsv subgroup infection. interestingly, upstream regulator analysis (figure 4b) suggested rictor (mtor2 related), clock, and nfe2l2 (nrf2) may play a significant role in these responses. we attempted to validate expression estimates for 6 genes (cd4, cd7, fkbp5, oas1, rsad2, socs2) selected based upon rna-seq detection levels and biological interest (figure 5 and supplementary table 2). expression of 5 of these genes (cd4, fkbp5, oas1, rsad2, socs2) demonstrated significant concordance with rna-seq data (all p < .001). cd7 expression levels were low, which is likely the reason its expression estimates could not be validated. among these 5 genes, quantitative polymerase chain reaction expression estimates confirmed that socs2 expression was significantly associated with severity (figure 5a and supplementary table 2), whereas cd4, oas1, and rsad2 were all significantly associated with time since onset of clinical symptoms (figure 5b and supplementary table 2). a full understanding of the cellular immune mechanisms underlying disease severity during rsv infection in infants has been elusive, especially in normal, full-term, healthy infants, the population that comprises the majority of infants brought to medical attention. 
most studies investigating rsv disease pathogenesis have measured the presence and levels of various inflammatory proteins, such as cytokines and chemokines in blood or respiratory secretions, or their in vitro production by peripheral blood mononuclear cells, often with disparate results [9, [11] [12] [13] [14] [15] . recently, analysis of gene expression in whole-blood samples using microarray from rsv-infected infants was reported from 2 centers [16] [17] [18] . a study of 21 rsv-infected infants showed extensive activation of innate immune responses, specifically of the interferon signaling network, but could not identify differences according to disease severity [16] . in a much larger study involving 90 rsv-infected infants with mild and severe rsv disease, mejias and colleagues reported overexpression of neutrophil, inflammation, and interferon genes and suppression of t-and b-cell genes [17] . in this study, severe illness was associated with greater expression of neutrophil and inflammation genes than in mildly ill subjects, whereas mildly ill subjects showed overexpression of innate immunity genes. because cd4 t cells are important in the early adaptive immune response to rsv and are associated with the degree of inflammation during infection, we chose to investigate gene expression patterns in isolated cd4 t cells from infants with primary rsv infection of differing severity during their first year of life. to optimize identification of differentially expressed genes, we selected infants at the extremes of disease severity. interrogation of cd4 t-cell gene expression in these subjects identified a number of pathways of interest, including prolactin signaling, jak/stat signaling, pi3k/akt signaling, and il-9 signaling. our data indicated severely ill rsv-infected infants displayed greater activation of prolactin signaling in cd4 t cells. 
prolactin, a polypeptide hormone originating from the pituitary gland, has been shown to bind to the prolactin receptor expressed on cd4 t cells and to have a variety of effects on cd4 t cells [27, 28] . prolactin induction of t-bet transcription through phosphorylation of stat2 and stat5 is inversely dose dependent, and it has been suggested that exposure of cd4 t cells to high levels of prolactin might reduce th1 function [27] . another study found that treg cells express the prolactin receptor and that exposure to prolactin inhibited their suppressive effect on th1 cells in vitro [28] . interestingly, it has been reported that infants with severe rsv disease admitted to the intensive care unit had significantly higher serum prolactin levels than moderately ill infants [29] . the jak2 cytokine and jak/stat signaling pathways are integral to the production of the innate type i interferons ifnα and ifnβ, and thus it is not surprising that these pathways are differentially expressed according to disease severity. confirming that such an effect is detectable in circulating, purified cd4 t cells is novel. it is known that the rsv ns1 and ns2 proteins block type i interferon signaling effectively via degradation of the transcription activator stat2 [30] . the socs genes are included in the jak/stat pathways, and the socs2, socs3, and socs5 genes were all overexpressed in the severely ill subjects. it has been reported that socs3 is expressed during th2 immune responses and in in vitro experiments after exposure of murine respiratory epithelial cells to rsv [31] . both stats and socs are key regulators of t-cell differentiation, maturation, and function [32] . similar to stats, socs may cross-regulate one another, and the proteins are differentially expressed across th cell lineages [33] . socs1 and socs3 favor th2 and th17 differentiation, and socs2 favors the differentiation of th17 by the compensation of th2 differentiation [34] [35] [36] . 
it also was not surprising to find that the pi3k/akt pathway activation was increased in severe rsv disease. pi3k signaling is thought to have an important role in cd4 t-cell differentiation and function, including tregs [37] . respiratory syncytial virus has been shown to rapidly activate this pathway, which is associated with inhibition of cellular apoptosis as well as increased inflammatory cytokine production [38] . interleukin 9, a th2 cytokine produced by cd4 t cells, eosinophils, and neutrophils, has been found in the upper and lower airway secretions of infants with rsv bronchiolitis at relatively high levels but has not been previously correlated with disease severity [39, 40] . in a murine model of rsv infection, depletion of il-9 enhanced clearance of virus from the lungs, and the authors concluded that il-9 promoted a th2-type inflammatory response [41] . together, our data are supportive of a model where shifts in cd4 t-cell subtype toward a th2 phenotype are associated with severity of illness in rsv-infected infants. further studies, particularly in larger and more heterogeneous populations, will be required to confirm this observation and to determine whether these changes are mechanistic or are purely a result of the illness itself. the exclusion of infants with well-known risk factors for severe disease allowed us to remove these influences from our analysis. although we did not use a formal severity scoring system to define disease severity in our population, we believe that the clinical criteria selected provided a valid separation of mild and severe disease. the use of sorted cd4 t cells, rather than whole blood, and rna-seq are unique aspects of this study and add to the existing gene expression literature. a potential limitation of this method is that some of the findings might have been affected by manipulation of the cd4 t cells during purification and sorting. 
however, and critically, identical procedures were used for processing samples from all subjects, regardless of disease severity. all blood samples were collected between 8 and 10 am, kept at room temperature, and processed immediately. therefore, any impact of sample processing should be applicable to all subjects and result mostly in limitations of sensitivity. in summary, we were able to identify a number of differentially expressed genes and gene pathways involved in primary rsv infection in full-term healthy infants that are associated with increased disease severity. the results may help to ultimately identify potential biomarkers of severe disease and provide putative intermediate, molecular phenotypes of disease that can be assayed using experimental in vitro or in vivo infection models of rsv. supplementary materials are available at the journal of infectious diseases online. consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author. 
respiratory syncytial virus and parainfluenza virus respiratory syncytial virus-associated hospitalizations among children less than 24 months of age hospitalizations associated with influenza and respiratory syncytial virus in the united states the burden of respiratory syncytial virus infection in young children viral bronchiolitis in children respiratory syncytial virus: virology, reverse genetics, and pathogenesis of disease protective and dysregulated t cell immunity in rsv infection innate immune dysfunction is associated with enhanced disease severity in infants with severe respiratory syncytial virus bronchiolitis the human immune response to respiratory syncytial virus infection the cd4 t cell response to respiratory syncytial virus infection type 1 and type 2 cytokine imbalance in acute respiratory syncytial virus bronchiolitis predominant type-2 response in infants with respiratory syncytial virus (rsv) infection demonstrated by cytokine flow cytometry respiratory syncytial virus infection in infants is associated with predominant th-2-like response peripheral blood cytokine responses and disease severity in respiratory syncytial virus bronchiolitis macrophage inflammatory protein-1alpha (not t helper type 2 cytokines) is associated with severe forms of respiratory syncytial virus bronchiolitis global gene expression profiling in infants with acute respiratory syncytial virus broncholitis demonstrates systemic activation of interferon signaling networks whole blood gene expression profiles to assess pathogenesis and disease severity in infants with respiratory syncytial virus infection nasopharyngeal microbiota, host transcriptome, and disease severity in children with respiratory syncytial virus infection viral shedding and immune responses to respiratory syncytial virus infection in older adults the healthy infant nasal transcriptome: a benchmark study flow-based sorting of neonatal lymphocyte populations for transcriptomics analysis optimized logic 
rules reveal interferon-γ-induced modes regulated by histone deacetylases and protein tyrosine phosphatases comparative analysis of anti-viral transcriptomics reveals novel effects of influenza immune antagonism socs proteins, cytokine signalling and immune regulation regulation of cd4+ t-cell polarization by suppressor of cytokine signalling proteins viral exploitation of host socs protein functions prolactin can modulate cd4+ t-cell response through receptor-mediated alterations in the expression of t-bet prolactin levels correlate with abnormal b cell maturation in mrl and mrl/lpr mouse models of systemic lupus erythematosus-like disease the neuroendocrine stress response and severity of acute respiratory syncytial virus bronchiolitis in infancy bovine respiratory syncytial virus nonstructural proteins ns1 and ns2 cooperatively antagonize alpha/beta interferon-induced antiviral response respiratory syncytial virus (rsv) attachment and nonstructural proteins modify the type i interferon response associated with suppressor of cytokine signaling (socs) proteins and ifn-stimulated gene-15 (isg15) suppressors of cytokine signaling (socs) in t cell differentiation, maturation, and function suppressors of cytokine signaling proteins are differentially expressed in th1 and th2 cells: implications for th cell lineage commitment and maintenance loss of suppressor of cytokine signaling 1 in helper t cells leads to defective th17 differentiation by enhancing antagonistic effects of ifn-gamma on stat3 and smads functional cross-modulation between socs proteins can stimulate cytokine signaling socs-3 regulates onset and maintenance of t(h)2-mediated allergic responses the role of the pi3k signaling pathway in cd4(+) t cell differentiation and function respiratory syncytial virus inhibits apoptosis and induces nf-kappa b activity through a phosphatidylinositol 3-kinase-dependent pathway interleukin 9 production in the lungs of infants with severe respiratory syncytial virus 
bronchiolitis severe respiratory syncytial virus bronchiolitis in infants is associated with reduced airway interferon gamma and substance p il-9 regulates pathology during primary and memory responses to respiratory syncytial virus infection we would like to thank the following individuals whose efforts were instrumental in facilitating the key: cord-354829-god79qzw authors: mao, kaimin; geng, wei; liao, yuhan; luo, ping; zhong, hua; ma, pei; xu, juanjuan; zhang, shuai; tan, qi; jin, yang title: identification of robust genetic signatures associated with lipopolysaccharide-induced acute lung injury onset and astaxanthin therapeutic effects by integrative analysis of rna sequencing data and geo datasets date: 2020-09-23 journal: aging (albany ny) doi: 10.18632/aging.104042 sha: doc_id: 354829 cord_uid: god79qzw acute lung injury (ali) and acute respiratory distress syndrome (ards) are life-threatening clinical conditions predominantly arising from uncontrolled inflammatory reactions. it has been found that the administration of astaxanthin (ast) can exert protective effects against lipopolysaccharide (lps)-induced ali; however, the robust genetic signatures underlying lps induction and ast treatment remain obscure. here we performed a statistical meta-analysis of five publicly available gene expression datasets from lps-induced ali mouse models, conducted rna-sequencing (rna-seq) to screen differentially expressed genes (degs) in response to lps administration and ast treatment, and integrative analysis to determine robust genetic signatures associated with lps-induced ali onset and ast administration. both the meta-analyses and our experimental data identified a total of 198 degs in response to lps administration, and 11 core degs (timp1, ly6i, cxcl13, irf7, cxcl5, ccl7, isg15, saa3, saa1, tgtp1, and gbp11) were identified to be associated with ast therapeutic effects. 
further, the 11 core degs were verified by quantitative real-time pcr (qrt-pcr) and immunohistochemistry (ihc), and functional enrichment analysis revealed that these genes are primarily associated with neutrophils and chemokines. collectively, these findings unearthed the robust genetic signatures underlying lps administration and the molecular targets of ast for ameliorating ali/ards which provide directions for further research. acute respiratory distress syndrome (ards) is an acute inflammatory lung injury, associated with increased pulmonary vascular permeability, increased lung weight, and loss of aerated lung tissue [1] . its less severe form is acute lung injury (ali). most patients need mechanical ventilation for support. the initial acute or exudative phase of ali/ards is characterized by the rapid onset of dyspnea, hypoxemia, respiratory failure, and bilateral infiltrates on chest radiographs that are consistent with pulmonary edema [2] . ali/ards is common and has been associated with several clinical disorders, such as sepsis; pneumonia; aspiration of gastric contents, saltwater, or freshwater; major trauma; transfusion of blood products; acute pancreatitis; and drug reactions (for example, reactions to lipopolysaccharide) [3] . in the past 50 years, considerable progress has been made in understanding the epidemiology, pathogenesis, and pathophysiology of ards. however, ards is being increasingly recognized as a heterogeneous syndrome, generating momentum for the identification of clinical and biological features for classifying patients into subphenotypes that might be more responsive to specific therapies. lipid a (endotoxin), the hydrophobic anchor of lipopolysaccharide (lps), is a glucosamine-based phospholipid that makes up the outer monolayer of the outer membranes of most gram-negative bacteria [4] . 
in recent years, lps, which has been most widely used in drug-associated ali models, can effectively induce a neutrophilic inflammatory response accompanied by an increase in intrapulmonary cytokines. many studies have shown that oxidative stress plays a major role in the pathogenesis of lung injury in a murine model of ali induced by lipopolysaccharide (lps) [5] [6] [7] . in response to the increased formation of reactive oxygen species (ros), thioredoxin interacting protein (txnip) detaches from thioredoxin (trx), binds to the nucleotide-binding domain-like receptor protein 3 (nlrp3), and then activates the nlrp3 inflammasome [8] . the activation of the nlrp3 inflammasome results in the maturation and release of pro-inflammatory cytokines, such as interleukin-1β (il-1β), which further aggravates the production of inflammatory cytokines (tumor necrosis factor-α (tnf-α), il-6, inducible nitric oxide synthase (inos), and cyclooxygenase-2 (cox2)) and induces oxidative stress [9] [10] [11] . astaxanthin (ast) is a lipid-soluble, red-orange-colored xanthophyll carotenoid synthesized by many microorganisms and various types of marine life. the main producers of natural ast are microalgae and fungi. aquatic animals such as salmon, red seabream, shrimp, lobster and crayfish, which feed on ast-producing organisms, are significant dietary sources of ast for humans [12] [13] [14] . it has been revealed that ast can prevent inflammatory processes by blocking the expression of pro-inflammatory genes as a consequence of suppressing nuclear factor kappab (nf-κb) activation [15] . some studies also suggested that ast has a dose-dependent ocular anti-inflammatory effect, through the suppression of no, pge2, and tnf-alpha production, which is achieved by directly blocking nos enzyme activity [16] . 
furthermore, ast has great therapeutic value for lung disease, such as an antifibrotic effect against the promotion of myofibroblast apoptosis based on dynamin-related protein-1 (drp1)-mediated mitochondrial fission in vivo and in vitro [17] and anti-inflammatory effect against lps-induced ali, as mentioned above [18, 19] . however, at the transcriptional level, the mechanism of action of ast in the treatment of ali/ards remains unclear. therefore, we hope to explore the molecular targets of ast against ali/ards through further research, with the purpose of providing a new alternative for the clinical treatment of this acute lung disease. to determine the common molecular signatures underlying lps-induced mouse ali initiation, five microarray datasets were obtained from corresponding independent studies. the characteristics of the studies composing the five gene expression compendiums are listed in table 1 and supplementary table 1 . we extracted and annotated the five microarrays, which yielded a collection of 4093 unique genes from 64 samples, including 26 control and 36 lps-induced ali mice. before the meta-analysis study, we comprehensively analyzed the five datasets by identifying the differentially expressed genes in each dataset and evaluated overlapping significant genes. the overlapping results were used to generate a venn diagram (figure 1 ), and three genes (ccl12, zbp1, and cxcl13) were identified in the common region, suggesting that these three genes were significantly correlated with lps administration in mice in the five datasets. then, we performed a meta-analysis using networkanalyst (http://www.networkanalyst.ca), which is a comprehensive web-based tool designed to perform meta-analyses of gene expression data [20] . an overview outlining the procedure of the analysis is presented in figure 2a . 
using three meta-analysis approaches, namely fisher's method, fixed effect model and voting count, 3139, 2143 and 3223 differentially expressed genes, respectively, were identified. among these genes, 2097 were identified by all three methods ( figure 2b ), with 1043 (49.7%) genes being upregulated and 1054 (50.3%) being downregulated in the lps group compared with the control group. a full list of the 2097 common genes identified by the three meta-analysis methods is presented in supplementary table 2 . a heat map of the top 50 common degs across the five datasets is displayed in figure 2c . of note, the 10 top upregulated genes (p<0.05) were junb, vcam1, ehd1, ifrd, adm, cd83, nadk, litaf, tubb6, and ctps. the 2 most significantly downregulated genes (p<0.05) among the top 50 common degs were acss1 and abcd3. the merged data from this meta-analysis are listed in supplementary data 1. to further identify the robust expression signatures in lps-induced ali and investigate the transcriptional changes resulting from treatment of ali with ast, we divided mice into three groups, including the control group, lps group, and ast group. rna-sequencing (rna-seq) was performed to profile differentially expressed genes (degs) associated with lps-induced ali initiation and ast treatment. a total of 1187 degs were identified in the lps-induced ali group compared with the control group. among these genes, 989 were table 3 . then, we compared these genes with the degs obtained from the above meta-analysis, and generated two heat-maps of the common degs across the meta-analysis results and our experimental data, which are displayed in figure 3c and supplementary figure 1 . in total, 198 degs were detected in both published data and our experimental data, including 181 upregulated and 17 downregulated degs. 
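the combination rules named above can be illustrated with a minimal python sketch. it is not the networkanalyst implementation: the function names, the toy p-values and the gene sets are hypothetical, and only the thresholds (combined p < 0.05, vote counts ≥ 3, intersection of the methods) come from the text.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's method."""
    k = len(p_values)
    # Statistic X = -2 * sum(ln p_i); chi-square with 2k degrees of freedom under H0.
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def vote_count(p_values, alpha=0.05):
    """Number of datasets in which the gene is individually significant."""
    return sum(1 for p in p_values if p < alpha)

def common_degs(*gene_sets):
    """Genes reported by every method (set intersection)."""
    return set.intersection(*(set(s) for s in gene_sets))

# Hypothetical per-dataset p-values for one gene across five datasets.
toy_p = [0.01, 0.04, 0.20, 0.03, 0.008]
print(fisher_combined_p(toy_p) < 0.05, vote_count(toy_p) >= 3)

# Hypothetical gene lists returned by three methods.
print(common_degs({"junb", "vcam1", "adm"}, {"junb", "adm"}, {"adm", "junb", "cd83"}))
```

fisher's method assumes the per-dataset p-values are independent; vote counting ignores effect size, which is one reason the intersection of several methods is a conservative choice.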
to explore the therapeutic effect of ast against ali at the genetic level, we also compared the gene expression profile of the lps-induced ali group with that of the ast treatment group. in total, 21 degs were identified after ast treatment (figure 3b, table 4). we subsequently integrated the rna-seq and microarray meta-analysis data, and 11 core degs (timp1, ly6i, cxcl13, irf7, cxcl5, ccl7, isg15, saa3, saa1, tgtp1, and gbp11) that were upregulated in ali models and downregulated significantly after ast treatment were identified (table 2). to understand the function of the 11 core degs, go enrichment analysis including molecular function (mf), biological process (bp) and cellular component (cc) categories (supplementary table 5) was performed using the 'clusterprofiler' package in r [21] . in bp terms, the upregulated genes were associated with "cell chemotaxis," the "chemokine-mediated signaling pathway," and "neutrophil migration" (figure 4a). several studies have shown that neutrophil migration and related chemokine network regulation in the lung play roles in the pathogenesis and development of ali/ards [22] [23] [24] . in the mf category, the core degs were associated with "glycosaminoglycan binding," "chemokine activity," and "receptor-ligand activity" (figure 4b). since glycosaminoglycan-cytokine interactions have been reported to support cellular mechanisms that cause acute inflammation [25] , ast may affect these interactions by downregulating the genes involved to exert an anti-inflammatory effect. moreover, degs were enriched in the cc category involved in "high-density lipoprotein particles," "symbiont-containing vacuole membranes," and "plasma lipoprotein particles" (figure 4c). 
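go over-representation analysis of a short gene list such as the 11 core degs rests on a hypergeometric tail probability. the following python sketch shows that calculation for a single go term; it is an illustration only, not the clusterprofiler code, and the universe size, term size and overlap used in the example are hypothetical numbers.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, term_size, list_size, overlap):
    """P(X >= overlap) for a gene list drawn without replacement.

    universe_size: annotated background genes
    term_size:     background genes annotated to the GO term
    list_size:     genes in the query list (e.g. the core DEGs)
    overlap:       query genes annotated to the GO term
    """
    max_overlap = min(term_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical example: 4 of 11 query genes fall in a term that
# annotates 200 of 20000 background genes.
print(hypergeom_enrichment_p(20000, 200, 11, 4))
```

in practice one such p-value is computed per go term and the resulting set is corrected for multiple testing before terms are reported.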
to further confirm the differences in the expression of the 11 core degs (timp1, ly6i, cxcl13, irf7, cxcl5, ccl7, isg15, saa3, saa1, tgtp1, and gbp11) among the control group, lps group, and ast group, we divided mice into three groups and conducted qrt-pcr and ihc verification (figure 5a-5k, supplementary figure 4). the results demonstrate that the relative expression levels of all 11 genes were significantly upregulated in the lps group compared to the control group. more importantly, the expression levels of the above 11 degs, as analyzed by qrt-pcr, were significantly inhibited after the application of ast. of the 11 core genes, 8 were tested by ihc, and the results were consistent with the qrt-pcr results, further verifying the data (supplementary figure 4). overall, the qrt-pcr and ihc results were consistent with our integrative rna-seq analysis and meta-analysis, suggesting the critical role that the 11 core degs might play in the mechanism by which ast alleviates ali/ards. as a life-threatening condition, ali/ards is an underrecognized condition, and its treatment is an unmet medical need. it is thought that inflammatory storm is the key factor in the occurrence of ali [26] , and anti-inflammatory and antioxidant therapy should be the primary objective in ali/ards [27] . to find the conserved genes responsible for lps-induced ali initiation and the effects of ast treatment, we identified robust changes in gene expression related to ali by meta-analysis and rna-seq using the gene expression omnibus (geo) database and mice, respectively. moreover, we performed functional enrichment analysis of core genes using the gene ontology (go) database to explore the possible molecular mechanisms that mediate the therapeutic effect of ast. 
before the meta-analysis of the five microarray datasets, we compared the differentially expressed genes in each dataset, and 3 common differentially expressed genes (degs) were found in all five datasets: cxcl13, zbp1, and ccl12. cxcl13 is abnormally expressed in the lung tissues of patients with idiopathic pulmonary fibrosis (ipf), and its circulating concentration is also highly correlated with the clinical manifestations and disease progression of individual patients. in the lung tissues of patients with ipf, cxcl13 may promote focal infiltration of nonproliferating b cells through the cxcl13-cxcr5 axis [28] . zbp1 is a host protein that was shown to be an innate sensor of viral infection, regulating cell death, inflammasome activation, and proinflammatory responses in a variety of situations, including infection and embryonic development [29] . a previous study indicated that zbp1 is abnormally expressed in h1n1-induced pneumonia associated with acute respiratory distress syndrome in mice [30] . ccl12 (mcp1), which is elevated in pulmonary fibrosis, has been reported to mediate fibroblast survival through il-6 [31] . since fibroproliferation is initiated early in lung injury, it has been observed that ccl12 is highly expressed in ards induced by severe sepsis [32] . (figure legend: statistical analysis of significant differences between groups was achieved with one-way anova using prism 7 software; ****p < 0.0001, ***p < 0.001, **p < 0.01, and *p < 0.05 were considered statistically significant.) to reduce the study bias and increase the statistical power of individual microarray data, we performed a meta-analysis of five microarray gene expression profiles to assess the differentially expressed genes between lps-induced and control groups. consequently, 2097 differentially expressed genes (degs) were identified using three meta-analysis approaches. 
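the gain in statistical power from pooling datasets can be seen in a fixed-effect (inverse-variance) combination, one of the three approaches used here. the python sketch below is a generic textbook version, not the networkanalyst code; the per-study effect sizes and variances in the example are hypothetical.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    Returns (pooled effect, standard error, two-sided normal p-value).
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = pooled / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, se, p

# Hypothetical effect sizes (e.g. standardized mean differences) for one
# gene in five datasets, with their sampling variances.
pooled, se, p = fixed_effect_pool(
    [0.9, 1.2, 0.7, 1.0, 1.1],
    [0.30, 0.25, 0.40, 0.35, 0.20],
)
print(round(pooled, 3), round(se, 3), p < 0.05)
```

because the pooled standard error shrinks as studies are added, an effect that is not significant in any single small dataset can reach significance after pooling; the fixed-effect model assumes all datasets share one true effect.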
to further identify the robust expression signature related to lps-induced ali and investigate the transcriptional changes in response to the treatment of ali by ast, we performed rna-seq on three groups of mice and integrated the data with the results of the abovementioned meta-analysis. ultimately, we identified 11 core degs that were significantly associated with ast treatment. saa3, ly6i, saa1, irf7, cxcl5, ccl7, timp1, isg15, gbp11, tgtp1, and cxcl13 were found to be overexpressed in the lps group compared with the control group but relatively downregulated in the ast group. our qrt-pcr and ihc verification of the 11 core genes in the mice suggested that these genes might be the key mediators of the therapeutic effect of ast in ali/ards. among the 11 core genes that were differentially expressed in response to ast mediation, two genes are members of the serum amyloid a (saa) family. saa is a critical acute-phase protein that is often increased by infection, trauma, cancer, or other causes of inflammation and plays an important role in the regulation of inflammatory responses [33] . recent studies have indicated that an increased level of saa is positively correlated with the disease progression of covid-19, and can thus be a sensitive indicator for assessing the severity and prognosis of covid-19 [34] . in our study, saa3 was the gene most significantly inhibited by ast application in lps-induced ali mice, and its downregulation was further confirmed by qrt-pcr and ihc. saa3, one of three isoforms of saa expressed in mice, is stimulated intensely in lps-induced acute systemic inflammation, which is consistent with our findings [35] . high expression of saa3 in response to acute inflammation may be repressed by an interaction with noncoding rnas. it has been confirmed that mir-30b-3p may target saa3 to protect against lps-induced ali [36] . 
additionally, lncrna malat1 can also target saa3 directly or indirectly to cause many diseases such as inflammation, diabetes and septic cardiomyocytes [37, 38] . saa1, another member of the saa gene family, is believed to have a pro-inflammatory effect, and its expression may aggravate tissue inflammation and damage [39] . removing the n- and c-terminal sequences of saa1 can switch the protein to an anti-inflammatory role [40] . however, other research has suggested that mice induced to express genetically modified human saa1 have a partial protective effect against the inflammatory response and lung injury caused by lps [41] . moreover, saa1 is the direct target of mir-660, which can protect nucleus pulposus cells from tnf-α-induced apoptosis in intervertebral disc degeneration [42] . considering that saa might act as a biomarker of inflammatory disease, it is possible that its downregulation induced by ast may partly indicate the anti-inflammatory effect of ast. the deeper molecular mechanism underlying saa action in response to ast application deserves further exploration. interferon regulatory factor 7 (irf7) is considered the master regulator of ifn-α against pathogenic infections [43] . the excessive activation of irf7 promotes the development of acute lung injury (ali) caused by influenza a virus (iav), and attenuating irf7 activity can significantly prevent the progression of iav-induced ali in model mice [44] . thus, the present finding that irf7 was upregulated by lps and downregulated in response to ast treatment may suggest one mechanism by which ast protects against ali. regarding how irf7 regulates ifn production, mirna may act as an important mediator. previous research has shown that mir-302c can downregulate irf3 and irf7 expression to mediate influenza a virus-induced ifnβ expression [45] . additionally, mir-144 was shown to reduce the antiviral response by attenuating the traf6-irf7 pathway to alter the cellular antiviral transcriptional landscape [46] . 
however, whether mirna-irf7 interactions are involved in the pharmacological mechanism of ast remains to be further investigated. tissue inhibitor of metalloproteinase-1 (timp1), a member of timp family, is primarily recognized to regulate the degradation of the extracellular matrix by inhibiting the activity of matrix metalloproteinases (mmps) [47] . it has been reported that an imbalance between mmp9 and timp1 plays a pivotal role in the pathogenesis of ards mainly through participating in airway remodeling, thus indicating the function of the mmp9/timp1 ratio in the evolution of pulmonary fibrosis in ards [48] . indeed, increased systemic levels of timp1 were proven to be associated with increased 90-day mortality in ards patients according to a large, prospective, multicenter study [49] . additionally, other research has demonstrated that increased timp1 expression promotes an immune response, has a pro-inflammatory effect in the lungs after influenza infection and facilitates an injurious phenotype [50] . the above observations not only support our present results regarding timp1 but also provide a considerable explanation for the increase in timp1 expression after lps application. intriguingly, given that timps are highly expressed in liver fibrosis and that the imbalance of mmps/timps promotes the progression of fibrosis, shen et al. found that astaxanthin is able to repress the activation of hepatic stellate cells (hscs) to ameliorate liver fibrosis through downregulating the expression of nf-κb and tgf-β1 and preserving the balance between mmp2 and timp1 [48] . hence, it is reasonable to further investigate whether there is a similar mechanism by which ast downregulates the expression of timp1 to mitigate lps-induced ali. 
interferon-stimulated gene 15 (isg15), which encodes the ubiquitin-like protein isg15 and is primarily induced by type i interferons, is an essential player in regulating host signaling pathways such as damage repair responses and immune responses. isg15 can be induced by various pathogenic stimuli such as viral and bacterial infections, lipopolysaccharide (lps), retinoic acid, or certain genotoxic stressors [51] . in accordance with our findings, previous studies have observed increased levels of isg15 conjugates in macrophages in response to lps treatment [52] . moreover, research has found that systemic isg (mx1, isg15, ifit1, and ifit3) expression within the first days of ards onset is associated with disease severity and prognosis. this response should be considered along with other identified genetic, environmental, and complex demographic factors as the cause of heterogeneity in ards prognosis [53] . nevertheless, no data have been reported on the association between isg15 and ast in the literature. since the excessive recruitment of leukocytes appears to be a central contributor to the pathogenesis of ali, the elevation of proinflammatory cytokines and chemokines is considered the most important factor [54] . similarly, we found that the expression of chemokines such as ccl7, cxcl5, and cxcl13 was increased after lps instillation but decreased after ast treatment. previous reports have documented an increased level of ccl7 in a mouse model of acute lps-induced lung inflammation [55] . moreover, the expression of cxcl5 is also rapidly induced in ali murine models after lps administration [56] . however, no data on cxcl13 expression in ali models have been reported. therefore, we report for the first time the induction of cxcl13 after lps administration, which provides insights into the role of cxcl13 in the pathogenesis of ali. 
furthermore, the observation of decreased expression of ccl7, cxcl5, and cxcl13 may hint at the anti-inflammatory properties of ast. although the roles of other degs (ly6i, gbp11, and tgtp1) have been described in many other diseases in detail, their regulatory mechanisms in ali/ards are not fully understood. our results show that these degs are overexpressed to varying degrees in the lps group and that ast can effectively prevent this overexpression. further studies on the roles of these three genes in ali initiation and progression are needed. to determine the functional mechanisms of these 11 degs, go enrichment analyses were further conducted. according to the results, 61 terms in the biological process category, 18 in the cellular component category and 15 in the molecular function category were enriched. the 5 most significantly enriched terms in the bp category were associated with chemokines and neutrophils, indicating the dominant role of neutrophils and related chemokines in the pathogenesis and progression of lps-induced ali. in ali, the excessive recruitment of inflammatory cells and their mediators results in injury to endothelial and epithelial barriers [54] . thus, agents such as ast, which can exert robust anti-inflammatory effects, may provide potential treatment prospects. despite this, several limitations of the current study need to be addressed. first, our research did not use the ali mouse models induced by other agents; thus, it did not address heterogeneity of ali initiation. second, given that findings in animal models of lps-induced lung injury may depend on the time point at which samples are obtained and physiological data are captured, the dynamic changes in lps-induced ali models may have been ignored to a certain extent [57] . finally, in-depth research into the underlying mechanisms using knockout-gene mice for each differentially expressed gene will help further our understanding of the role of ast in ameliorating ali/ards. 
in conclusion, many genes were dysregulated in ali/ards. we not only identified genes that consistently differed in expression in the lps group compared to the control group but also revealed that ast can alleviate the abnormal expression of these genes and thus confer a certain therapeutic effect against ali/ards, suggesting the potential for ast to become a novel treatment for ali/ards. to identify the genes related to lps-induced acute lung injury in mice, five datasets (gse102016, gse2411, gse16409, gse104214, and gse18341) were obtained from geo (gene expression omnibus, http://www.ncbi.nlm.nih.gov/geo) [58] [59] [60] [61] [62] . lps and control treatments were used in this study. the detailed information (experimental design, transcriptome analysis, array information, data processing, and platform id) for these datasets can be obtained from the geo repository, and this information is partly summarized in table 1 and is described in more detail in supplementary table 1 . then, we conducted a microarray meta-analysis using networkanalyst 3.0 (https://www.networkanalyst.ca) [20] . networkanalyst is a visual analytics platform for comprehensive gene expression profiling and meta-analysis. all gene probes were converted to a common entrez id using the gene/probe conversion tool in networkanalyst. following quantile normalization, all datasets were preprocessed through a log2 transformation and variance stabilizing normalization (vsn). each dataset was visualized in box plots to ensure an identical distribution among the samples. differential expression analysis was performed independently for each dataset using networkanalyst, with an fdr of 0.05 and a significance of p < 0.05. the moderated t-test was based on the limma algorithm. 
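as an illustration of what quantile normalization and a log2 transformation do to an expression matrix, a minimal python sketch follows. it is a generic version written for this purpose, not the networkanalyst preprocessing code; the function names, the pseudocount and the toy matrix are hypothetical, ties are broken by position, and vsn is not reproduced.

```python
import math

def log2_transform(columns, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (one list per sample)."""
    return [[math.log2(x + pseudocount) for x in col] for col in columns]

def quantile_normalize(columns):
    """Force every sample (column) to share the same empirical distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # gene indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities: two samples, three genes each.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(samples))
print(log2_transform(samples))
```

after quantile normalization every sample has the same sorted values, which is what the box plots mentioned above are meant to confirm.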
for the meta-analysis, we used fisher's method, the fixed-effect model, and vote counting (combined p < 0.05 or vote counts ≥ 3 were considered significant) to identify the differentially expressed genes (degs) and we selected the common degs identified by these three methods as the final degs. male c57bl/6j mice (18-24 g, 6-8 weeks old) were purchased from beijing vital river laboratory animal technology co., ltd. (beijing, china). the mice were housed 5 per cage under a 12 h light/dark cycle in a laboratory at 23 ± 2 °c and 50% humidity. all experiment protocols conformed to the guidelines of the china council on animal care and use. these animal studies were approved by the institutional animal research committee of union hospital. the mice were randomly allocated into three groups: (1) the control group (n=10), which was exposed to pbs alone and received an intraperitoneal injection of sterile saline; (2) the lps group (n=10), which was exposed to pbs containing 0.5 mg/ml lps; and (3) the ast group (n=10), which was intraperitoneally injected with ast (10 mg/ml, dissolved in pbs) at a dosage of 50 mg/kg body weight every day for one week before exposure to lps to evaluate its preventive and protective effects [19, 63] , and intraperitoneally injected with 100 mg/kg ast (20 mg/ml, dissolved in pbs) 24 hours after lps exposure in order to confirm the therapeutic effect of ast [16] . ast was obtained from sigma-aldrich (st louis, mo, usa). for acute lps exposure, mice were exposed to an aerosol of phosphate buffer saline (pbs) alone or pbs containing 0.5 mg/ml lps for 2 h, in a custom-built cuboidal chamber. the lps solution was aerosolized with a constant output ultrasonic nebulizer (model: 402b, yuwell, china) at a flow rate of 35 ml/h. lps was purchased from sigma-aldrich (extracted from escherichia coli o55: b5, l2880). the chamber was 20 cm long, 15 cm wide and 15 cm high. 
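the ast dosing scheme above implies a specific injection volume per animal. the short python calculation below works it out from the stated doses and stock concentrations; the helper name and the 20 g example body weight (within the stated 18-24 g range) are assumptions made for illustration.

```python
def injection_volume_ml(body_weight_g, dose_mg_per_kg, stock_mg_per_ml):
    """Volume of stock solution that delivers the requested dose."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # mg needed for this animal
    return dose_mg / stock_mg_per_ml

# Preventive regimen: 50 mg/kg from a 10 mg/ml stock, 20 g mouse.
print(injection_volume_ml(20, 50, 10))
# Therapeutic regimen: 100 mg/kg from a 20 mg/ml stock, 20 g mouse.
print(injection_volume_ml(20, 100, 20))
```

both regimens work out to the same volume per unit body weight (5 ml/kg), so the doubled dose is delivered by the doubled stock concentration rather than by a larger injection.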
total rna was extracted from mouse lung tissue samples with trizol® reagent (invitrogen, ca) following the manufacturer's protocol. the concentration and purity of the rna were measured by a nanodrop 2000 spectrophotometer (nanodrop technologies, wilmington, de, usa), the rna integrity was detected by agarose gel electrophoresis, and the rin was determined using an agilent 2100 bioanalyzer (agilent technologies, santa clara, ca, usa). the construction of a single library required a total of 5 μg rna with a concentration of ≥200 ng/μl and an od 260/280 ratio between 1.8 and 2.2. then, oligo(dt) magnetic beads were used to capture mrnas that contained poly-a tails from the total rna. the resulting mrnas were subsequently randomly broken into small fragments of approximately 200 bp by adding fragmentation buffer. the mrna fragments functioned as the templates for double-stranded cdna (dscdna) synthesis using the superscript double-stranded cdna synthesis kit (invitrogen, ca, usa). under the action of reverse transcriptase, a strand of cdna was synthesized by using random primers with mrna as a template, which was followed by second-strand synthesis to form a stable double-stranded structure. since there was a cohesive terminus in the double-stranded cdna structure, end repair mix was added to patch it into a blunt end, and an a base was added at the 3' end to connect the y-shaped adaptor. to purify and enrich the dscdna, 15 cycles of pcr were performed, and clean dna beads were used to screen 200-300 bp bands. after quantification by tbs380 (picogreen, invitrogen, ca usa), high-throughput sequencing of the resulting libraries was performed on the illumina hiseq xten/novaseq 6000 sequencing platform (san diego, ca, usa), and the sequencing read length was paired-end (pe) 150. 
to ensure the accuracy of the subsequent bioinformatic analysis, the raw sequencing data generated from rna-seq were first filtered to obtain high-quality sequencing data (clean data). quality control of the raw reads was performed using seqprep (https://github.com/jstjohn/seqprep) and sickle (https://github.com/najoshi/sickle). the processes were as follows. the first step was to remove the adapter sequences from the reads, together with the reads that contained no insert fragment because of adapter self-ligation. second, bases with a low quality (quality value less than 20) at the end of the sequence (3' end) were trimmed. if there was still a quality value of less than 10 for the remaining sequence, the whole sequence was discarded; otherwise, it was retained. third, reads with n ratios over 10% and sequences with lengths less than 20 bp after quality trimming were also removed. finally, the error rate (%), q20 and q30 values, gc-content (%), and sequence duplication levels of the generated clean reads were assessed [64] . after filtering the raw data, the clean data were aligned to the mouse reference genome grcm38 by 'bowtie2' software [65] . then, read summarization was calculated by the 'feature count' tool. differentially expressed genes (degs) between the lps samples and control samples were identified by t-test using the 'deseq2' r package, as were degs between the ast samples and lps samples [66] . the raw p-value was adjusted to the false discovery rate by the benjamini method, and a false discovery rate (fdr) ≤ 0.05 and |log2fc| ≥ 1 were chosen as the thresholds. based on the hypergeometric distribution algorithm, go (gene ontology, http://www.geneontology.org/) biological process (bp), molecular function (mf) and cellular component (cc) pathway enrichment analyses were performed using the 'clusterprofiler' r package [21] . a p-value ≤ 0.05 was set as the cutoff criterion. 
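The p-value adjustment and thresholding step described above can be illustrated with a short sketch. It assumes that the "benjamini method" refers to the Benjamini-Hochberg procedure, and it starts from raw p-values and log2 fold changes that are already computed; the function names and the gene records are hypothetical and do not reproduce the deseq2 workflow itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(records, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with FDR <= fdr_cutoff and |log2FC| >= lfc_cutoff.

    records: list of (gene_name, log2_fold_change, raw_p_value) tuples.
    """
    fdr = benjamini_hochberg([r[2] for r in records])
    return [
        name
        for (name, lfc, _), q in zip(records, fdr)
        if q <= fdr_cutoff and abs(lfc) >= lfc_cutoff
    ]

# Hypothetical records, for illustration only.
records = [
    ("geneA", 2.0, 0.01),
    ("geneB", -1.5, 0.04),
    ("geneC", 0.5, 0.03),
    ("geneD", 3.0, 0.50),
]
print(select_degs(records))
```

Note that geneB has a raw p-value below 0.05 but is dropped once the adjustment is applied, which is the reason for filtering on the FDR and not on the raw p-value.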
to validate the combined findings from rna-seq and microarray meta-analysis, the expression of 11 core degs in the three groups was confirmed. rnaiso plus reagent (takara, tokyo, japan) was employed to extract total rna from mouse lung tissues from each group, and reverse transcription was performed to obtain cdna using primescript™ rt master mix (takara, tokyo, japan) along with the gdna eraser kit (takara, tokyo, japan). relative mrna expression levels were determined using rt-pcr performed on bio-rad cfx maestro (bio-rad, usa) with tb green® premix ex taq™ ii (takara, tokyo, japan). all the above experimental steps were performed according to the manufacturer's instructions for the corresponding kit. glyceraldehyde-3-phosphate dehydrogenase (gapdh) was selected as the reference, and the primer sequences are presented in supplementary table 6 . qrt-pcr was performed under the following conditions: 95 °c for 3 min, followed by 40 cycles at 95 °c for 30 s, 56 °c for 30 s, and 72 °c for 30 s. each analysis was implemented in triplicate, and the relative expression levels of the target genes were calculated by employing the 2^(-δδct) method [67] . inmex and networkanalyst were applied for the network-based microarray meta-analysis. for qpcr data, statistical analysis of differences between groups was achieved by one-way anova using prism 7 software (graphpad software inc., san diego, ca, usa). a two-tailed test was used for all data, and differences with a p-value <0.05 were considered significant. first, we used the search query lps[all fields] and ("lung"[mesh terms] or lung[all fields]) and ("mus musculus"[organism] and "expression profiling by array"[filter]), which returned 62 results in geo datasets. 
after eliminating mirna sequencing datasets, datasets not related to acute lung injury, and datasets that only examined rna from specific cells such as macrophages and type ii alveolar epithelial cells, 8 datasets remained (gse71648, gse104214, gse102016, gse38014, gse18341, gse16409, gse11662 and gse2411). on checking the sample descriptions in the corresponding articles, we found that some datasets had groups with n < 3 samples and some mainly studied ali or ards induced by excessive ventilation or non-lps chemicals. in the end, there were 5 datasets (gse102016, gse2411, gse16409, gse104214 and gse18341) that met the requirements of integrated analysis. we conducted a microarray meta-analysis using networkanalyst 3.0, combining three well-established meta-analysis approaches: fisher's method, the fixed-effect model, and vote counting. the features and main characteristics are given below (https://www.networkanalyst.ca). (1) fisher's method (-2*∑log(p)) is known as a 'weight-free' method and combines p values from multiple studies for information integration. (2) effect size is the difference between two group means divided by the standard deviation, which is considered combinable and comparable across different studies. in the fixed effects models (fem), the estimated effect size in each study is assumed to come from an underlying true effect size plus measurement error. (3) vote counting is the simplest method in meta-analysis. differentially expressed (de) genes are first selected based on a threshold to obtain a list of de genes for each study. the vote for each gene can then be calculated as the total number of times it occurred in all de lists. the final de genes can be selected based on the minimal number of votes set by the user.
after the mice were sacrificed, the lung tissues were collected. immediately, the tissue was fixed in 4% paraformaldehyde for 24 hours and embedded in paraffin. 
the embedded tissue was sliced into 5 µm sections for staining. after the tissue sections were deparaffinized and rehydrated, they were heated in citrate buffer at 121 °c for 30 minutes to restore antigen activity. the sections were then incubated with 0.3% hydrogen peroxide in methanol for 30 minutes to inhibit endogenous peroxidase activity. after blocking nonspecific reactions with 10% normal bovine serum, the sections were incubated with rabbit polyclonal antibodies specific for ccl7 (1:100, abcam), saa3 (1:50, ab231680, abcam), ly6i (1:2000, abcam), saa1 (1:100-1:200, thermo), irf7 (1:100, thermo), timp1 (1:100, thermo), isg15 (1:100, thermo) and cxcl13 (1:1000, abcam). the treated samples were placed at 4 °c for 12 hours. the sections were then washed with pbs and incubated with horseradish peroxidase-conjugated secondary antibodies at 37 °c for 2 hours. the stained sections were imaged under an inverted phase contrast microscope. acute respiratory distress syndrome acute respiratory distress in adults. 
the lancet, saturday 12 the acute respiratory distress syndrome lipopolysaccharide endotoxins cordycepin inhibits lps-induced acute lung injury by inhibiting inflammation and oxidative stress xanthohumol ameliorates lipopolysaccharide (lps)-induced acute lung injury via induction of ampk/gsk3β-nrf2 signal axis linarin prevents lps-induced acute lung injury by suppressing oxidative stress and inflammation via inhibition of txnip/nlrp3 and nf-κb pathways curcumin and allopurinol ameliorate fructose-induced hepatic inflammation in rats via mir-200a-mediated txnip/nlrp3 inflammasome inhibition troxerutin protects kidney tissue against bde-47-induced inflammatory damage through cxcr4-txnip/nlrp3 signaling astaxanthin: a review of its chemistry and applications astaxanthin, a carotenoid with potential in human health and nutrition biorefinery approach and environmentfriendly extraction for sustainable production of astaxanthin from marine wastes astaxanthin inhibits nitric oxide production and inflammatory gene expression by suppressing i(kappa)b kinase-dependent nf-kappab activation effects of astaxanthin on lipopolysaccharide-induced inflammation in vitro and in vivo astaxanthin prevents pulmonary fibrosis by promoting myofibroblast apoptosis dependent on drp1-mediated mitochondrial fission astaxanthin alleviated acute lung injury by inhibiting oxidative/nitrative stress and the inflammatory response in mice astaxanthin prevents against lipopolysaccharideinduced acute lung injury and sepsis via inhibiting activation of mapk/nf-κb networkanalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and metaanalysis clusterprofiler: an r package for comparing biological themes among gene clusters inflammatory cytokines in patients with persistence of the acute respiratory distress syndrome neutrophils in the initiation and resolution of acute pulmonary inflammation: understanding biological function and therapeutic potential evidence for chemokine 
synergy during neutrophil migration in ards fernández-botrán r. modulation of acute inflammation by targeting glycosaminoglycan-cytokine interactions contribution of neutrophils to acute lung injury antiinflammatory activity of a novel family of aryl ureas compounds in an endotoxin-induced airway epithelial cell injury model c-x-c motif chemokine 13 (cxcl13) is a prognostic biomarker of idiopathic pulmonary fibrosis zbp1: innate sensor regulating cell death and inflammation dynamic gene expression analysis in a h1n1 influenza virus mouse pneumonia model the cc chemokine ligand 2 (ccl2) mediates fibroblast survival through il-6 complement inhibition decreases early fibrogenic events in the lung of septic baboons the cytokine-serum amyloid achemokine network serum amyloid a is a biomarker of severe coronavirus disease and poor prognosis serum amyloid a3 is a high density lipoprotein-associated acute-phase protein exosomes derived from microrna-30b-3p-overexpressing mesenchymal stem cells protect against lipopolysaccharide-induced acute lung injury by inhibiting saa3 long non-coding rna malat1 regulates hyperglycaemia induced inflammatory process in the endothelial cells il-6 induced lncrna malat1 enhances tnf-α expression in lps-induced septic cardiomyocytes via activation of saa3 emerging functions of serum amyloid a in inflammation suppression of lipopolysaccharide-induced inflammatory response by fragments from serum amyloid a serum amyloid a promotes lps clearance and suppresses lps-induced inflammation and tissue injury knockdown of mir-660 protects nucleus pulposus cells from tnf-a-induced apoptosis by targeting serum amyloid a1 irf7: activation, regulation, modification and function attenuation of interferon regulatory factor 7 activity in local infectious sites of trachea and lung for preventing the development of acute lung injury caused by influenza a virus mir-302c mediates influenza a virus-induced ifnβ expression by targeting nf-κb inducing kinase 
mir-144 attenuates the host response to influenza virus by targeting the traf6-irf7 signaling axis cytokine functions of timp-1 imbalance between matrix metalloproteinases (mmp-9 and mmp-2) and tissue inhibitors of metalloproteinases (timp-1 and timp-2) in acute respiratory distress syndrome patients serum mmp-8 and timp-1 in critically ill patients with acute respiratory failure: timp-1 is associated with increased 90-day mortality timp-1 promotes the immune response in influenza-induced acute lung injury isg15 in antiviral immunity and beyond lipopolysaccharide activates the expression of isg15-specific protease ubp43 via interferon regulatory factor 3 extremes of interferon-stimulated gene expression associate with worse outcomes in the acute respiratory distress syndrome role of chemokines in the pathogenesis of acute lung injury proteinaseactivated receptor-1, ccl2, and ccl7 regulate acute neutrophilic lung inflammation extracellular atp mediates the late phase of neutrophil recruitment to the lung in murine models of acute lung injury molecular dynamics of lipopolysaccharide-induced lung injury in rodents modulation of lipopolysaccharideinduced gene transcription and promotion of lung injury by mechanical ventilation bpifa1 regulates lung neutrophil recruitment and interferon signaling during acute inflammation altered gene expression profiles in the lungs of benzo[a]pyreneexposed mice in the presence of lipopolysaccharideinduced pulmonary inflammation effects of age on the synergistic interactions between lipopolysaccharide and mechanical ventilation in mice clara cells attenuate the inflammatory response through regulation of macrophage behavior astaxanthin suppresses cigarette smokeinduced emphysema through nrf2 activation in mice fastp: an ultra-fast all-inone fastq preprocessor fast gapped-read alignment with bowtie 2 moderated estimation of fold change and dispersion for rna-seq data with deseq2 analysis of relative gene expression data using real-time 
quantitative pcr and the 2(-delta delta c(t)) method the authors declare that they have no conflicts of interest. please browse full text version to see the data of supplementary tables 1 to 3 key: cord-260496-s2ba7uy3 authors: moncany, maurice l.j.; dalet, karine; courtois, pascal r.r. title: identification of conserved lentiviral sequences as landmarks of genomic flexibility date: 2006-08-08 journal: c r biol doi: 10.1016/j.crvi.2006.07.001 sha: doc_id: 260496 cord_uid: s2ba7uy3 considering that recombinations produce quasispecies in lentivirus spreading, we identified and localized highly conserved sequences that may play an important role in viral ontology. comparison of entire genomes, including 237 human, simian and non-primate mammal lentiviruses and 103 negative control viruses, led to identify 28 conserved lentiviral sequences (clss). they were located mainly in the structural genes forming hot spots particularly in the gag and pol genes and to a lesser extent in ltrs and regulatory genes. the cls pattern was the same throughout the different hiv-1 subtypes, except for some hiv-1-o strains. only cls 3 and 4 were detected in both negative control htlv-1 oncornaviruses and d-particle-forming simian viruses, which are not immunodeficiency inducers and display a genetic stability. clss divided the virus genomes into domains allowing us to distinguish sequence families leading to the notion of ‘species self’ besides that of ‘lentiviral self’. most of acutely localized clss in hiv-1s (82%) corresponded to wide recombination segments being currently reported. to cite this article: m.l.j. moncany et al., c. r. biologies 329 (2006). hiv genome flexibility is characterized by a high frequency of spontaneous mutations and a variety of rearrangements. the error-prone replication by hiv reverse transcriptase is generally held responsible for the impressive fraction of defective viruses observed in productively infected lymphocytes. 
a variety of other mechanisms can contribute to generate modifications in the hiv genomes by restabilizing the viral information under new forms, among them recombination phenomena now thought to be mainly responsible for hiv genomic flexibility throughout an infection process [1] [2] [3] [4] . this raised the notion of 'mosaic viruses' composed of parts inherited from divergent entire or partial viral genomes that could be present in the cells when the replication steps occur [5] [6] [7] . as lentiviral genomes are composed of an alternation of long variable and short conserved sequences, this appeared to be a characteristic of each part of the genomes organized as a succession of segments that can evolve differently and independently. the sequence conservation can be considered as the maintenance of either the general function of a protein or a potential precise function associated with a nucleotide segment (e.g., restriction or binding site). the extended nucleotide variability allowed by natural selection could lead to the evolution or to the disappearance of a function. in view of this analysis, the short conserved sequences may play crucial roles both in viral ontology and viral divergence independently of gene products functions (e.g., regulatory or enzymatic processes) as some of these sequences overlap genes. they could correspond to recombinogenic sequences important to understand lentiviral genomic flexibility. studies of the hiv genetic variability required computerized methods to investigate genomic divergences [8] [9] [10] [11] . a recombinant identification program was applied to the hiv-1 gag and env coding regions and allowed to determine putative large recombination segments -thus delimiting 'recombination cassettes' -and to create phylogenetic trees [12] . 
however, these trees highly differed when the currently used computation methods were applied to either the gag, pol, or env genes, and when the reference genome in the same population was changed [6, 7, [12] [13] [14] . the situation was made more complex when fragment analysis of a single gene showed divergent phylogenetic trees for each studied fragment [5] [6] [7] [14] [15] [16] , this being due to the independent evolution of each gene. in lentiviral genomes, the programs have rarely revealed precise recombinogenic segments, but rather computerized plots or large domains possibly implied in the recombination process [10, 11, 17] . comparison of whole genomes of related 'species' (with the meaning of taxons) is currently considered to identify the genomic organization and functionality without prior biological characterization [8, 18] . in our global approach concerning complete genomes in all situations, we carried out the detection of conserved lentiviral sequences (clss), their precise location being harmonized thanks to the use of a single mmy 1 ® sequence starting reference (see section 2). the analysis was made on genomes belonging to mammal lentiviruses, negative control viruses and random-generated genome-like sequences. this scan of the dna lentiviral sequences for conserved stretches allowed to identify 28 clss mainly situated in gag, pol and, in a minor proportion, env structural genes. a few of them were also noticed in the ltrs and the regulatory genes. a large part (82%) of clss located in hiv-1s is situated in currently described recombination segments where they might form recombinogenic hot spots [6, 7, [13] [14] [15] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] . the similar position of each sequence in the different viral families led to establish the notion of 'lentiviral' and 'retroviral self' that was defined by the specificity of the sequences for restricted viral families. 
the viral genomes were retrieved from the genbank database and their loci are listed in appendix a. immunodeficiency lentiviral genomes correspond to 171 human viruses (155 hiv-1s and 16 hiv-2s), 33 simian viruses (3 cpz, 9 agm, 8 macaque, 2 mandrill, 10 sooty mangabey, 1 sykes' monkey viruses) and 33 non-primate mammal viruses (2 bovine, 2 caprine, 11 equine, 9 feline, 3 ovine and 6 ovine/caprine viruses). to test cls specificity, three kinds of negative controls were examined. first, non lentiviral yet retroviral genomes were screened (5 spumaviruses, 25 oncornaviruses and 4 d-particle-forming simian viruses). then, a set of human, animal and vegetal viruses was tested: 4 herpes viruses, 47 human and animal viruses (1 adenovirus, 6 coronaviruses, 2 filoviruses, 12 flaviviruses, 13 parvoviruses, 10 picornaviruses and 3 rhabdoviruses) and 18 vegetal viruses (5 geminiviruses, 2 potyviruses, 5 tobamoviruses, 1 tombusvirus, 3 tymoviruses and 2 necroviruses). finally, 50 10-kilobase-dna-like structures were randomly generated with a computer in order to eliminate any bias due to the possible subsequent biological role. using a previously published program [8, 37] , a general analysis was carried out to identify the highly conserved sequences common to all or a maximum of lentiviral genomes. we first established the length of sub-sequences with the following trial/error method selecting parts of genomes. when the length was inferior to 10 nucleotides, numerous sub-sequences were found in every genome, including the control ones. when it was superior to 50 nucleotides, very few sub-sequences were found. the correct length was determined when it corresponded to that of sub-sequences common to most of the immunodeficiency viral genomes in either all or certain viral families. a similar approach was used for the determination of the number of accepted transitions. 
the relationship between both parameters led to an optimized choice of a 15-nucleotide-long sequence with a maximum of three admitted transitions. all the sequences showing such a determined length were tested and positioned in genomes thanks to the expasy program available on the internet. some of them overlapped and created longer conserved domains. sequences present in most lentiviral genomes and checked not to be found in negative controls were selected. the number of accepted transitions was retained according to their lentiviral specificity to allow a variability of 15 to 20%. in some cases the limit of 25% was retained, which is thought to delimit the generally admitted jump from one family to another and might fit with a possible biological significance. the genomes supplied by the databases show variable lengths because ltrs are reported with different lengths. thus, as the first considered nucleotides at the 5' extremity vary from one genome to another, the rough detection of the sub-sequences gave heterogeneous localizations. another approach could use the real starting atg (as given in the databases) of the coding sequences as the initial nucleotide, but this was too complex because of the frequent presence of several other atgs. to sharpen and to harmonize the location of clss, we calculated their relative positions as compared to that of one of them chosen as the reference sequence (position +1). this sequence, mmy 1 ®, was previously described for pcr purpose [38, 39] and corresponds to the beginning of the pbs (reverse transcription primer binding site) at the 5' extremity of the genomes. it is situated, for example, between nucleotides 182 and 199 on the hiv-1 bru/lai genome reported in genbank. mmy 1 ® was found in hiv-1, hiv-2, agm, macaque, cpz, sooty mangabey, mandrill, equine and feline lentiviruses. 
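The detection and positioning steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the published program [8, 37]: "transition" is taken in the strict sense (a ↔ g or c ↔ t), any other mismatch disqualifies a window, and the function names, the toy genome and the toy motif are all hypothetical.

```python
# Purine-purine and pyrimidine-pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def count_transitions(window, motif):
    """Count transition mismatches; return None on any other mismatch."""
    count = 0
    for observed, expected in zip(window, motif):
        if observed == expected:
            continue
        if (observed, expected) in TRANSITIONS:
            count += 1
        else:
            return None  # transversion or ambiguous base
    return count

def find_sites(genome, motif, max_transitions=3):
    """Return (1-based position, transition count) for each accepted window."""
    genome, motif = genome.upper(), motif.upper()
    hits = []
    for start in range(len(genome) - len(motif) + 1):
        n = count_transitions(genome[start:start + len(motif)], motif)
        if n is not None and n <= max_transitions:
            hits.append((start + 1, n))
    return hits

def relative_position(position, reference_position):
    """Express a position relative to a reference sequence placed at +1."""
    return position - reference_position + 1

# Toy data: a 6-nucleotide motif stands in for a 15-nucleotide sequence.
genome = "GGACGTACGGGCGTATGG"
hits = find_sites(genome, "ACGTAC", max_transitions=3)
reference = hits[0][0]  # treat the first hit as the reference sequence
print([(relative_position(pos, reference), n) for pos, n in hits])
```

Expressing every hit relative to one reference site removes the dependence on how much of the 5' ltr a database entry happens to include, which is the motivation given in the text for using a single starting reference.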
when two ltrs were present in the genomic banks (hiv-1, hiv-2 and equine viruses), a second mmy 1 ® sequence detected at the 3' extremity of the genomes was not considered in the analysis. in the sykes' monkey, bovine, caprine, ovine and ovine/caprine viruses lacking mmy 1 ® , the indicated numbers corresponding to cls locations were determined as crude positions directly detected on the genomes. when investigated on 237 complete viral genomes, 28 clss were detected, six of them being the mmy ® ones that have been previously used as pcr primers for the early detection of possible hiv infection in highly exposed populations [38, 39] . these sequences together with some new determined clss classified from cls 1 to cls 22 are shown in table 1 . the relationships between precise positions of clss and genomic organization of the viral families are depicted in figs. 1 to 3. the cls characteristics together with their specific gene locations shown in tables 1-3 were divided into categories according to their lentiviral specificity. these sequences included all the mmy ones, cls 1 to 6 and cls 16. almost all of them were also present in simian viruses. it is worth mentioning the following particularities for some clss: cls 2 was detected twice in most hiv-1s, all cpz and mandrill (pol and nef genes) and most agm virus genomes (nef gene). it was observed only once (nef gene) for most hiv-2s, macaque and sooty mangabey viruses.
table footnotes: a detection degree of the tested cls in the genomes: (") at least 90%; (2) comprised between 10% and 89%; (!) less than 10%. b gene location of the clss: genes separated by ( / ) when cls was detected on each of the two genes; genes separated by (-) when cls overlapped the 2 genes; gene indicated in italics when cls was present at an occasional position. c a gradual detection of cls 14 was observed when the admitted transitions in this sequence varied from 1 (66%) to 7 (85%).
besides, 
the particular sykes' monkey virus presented the single cls 2 in the vif gene. when viral genomes did not possess the second cls 2 (nef gene), cls 16 (from which cls 2 derived) was detected in the pol gene. for cls 4 (gag gene), a maximum of three transitions (16.7%) and sometimes four transitions (22.2%) were introduced in the computation. in the control viruses, this sequence was only detected in one herpes virus after three permitted transitions. when reaching four transitions, it was detected in almost all lentiviruses and in some negative genomes. while cls 3 (pol gene) as well as cls 4 was found in many genomes in most viral families, they were the only ones present in the four simian d-particle-forming viruses whose presence does not develop aids-like syndrome. cls 6 (pol gene) was present in all the primate viruses, except the sykes' monkey ones. these sequences corresponded to cls 7 up to cls 15, in addition to those common to a maximum of hiv-1s and hiv-2s. the pattern of detection of hiv-1s was also found for cpzs except that cls 13 and 14 were missing. cls 7, 8 and 9 were only detected in hiv-1s and in cpz viruses. in particular, cls 10 was detected in most hiv-1s (pol gene), sometimes twice for 34/141 positive strains (gag and pol genes). it was not found in 5/6 hiv-1-o viruses (af407418, him302646, him302647, hivant70c and hivmvp5180) but was in af407419. when cls 10 was present, its position highly varied in all cpz viruses (pol gene), the two caprine viruses (vif gene), hiv-2s and sooty mangabey (env gene), and mandrill viruses (pol gene). cls 13 was present in most hiv-1 genomes (env gene), except for 12 out of 155 hiv-1s including the six hiv-1-o viruses, and was found twice in 40 out of 155 ones (env and pol gene). cls 14 only found in most hiv-1s was located mainly in gag gene and gradually detected (66% to 85%) when the admitted transitions varied from 1 to 7. 
cls 15 (pol gene) was specific to most hiv-1s, but was absent from all the hiv-1-o group, and was present in one cpz virus.
table footnotes: a detection degree of the tested cls in the genomes: (") at least 90%; (2) comprised between 10% and 89%; (!) less than 10%. b gene location of the clss: genes separated by ( / ) when cls was detected on each of the two genes; genes separated by (-) when cls overlapped the 2 genes; gene indicated in italics when cls was present at an occasional position. c n.a.: not attributed.
these sequences correspond to cls 17 up to 21, in addition to those common to a maximum of hiv-1s and hiv-2s. the five clss characteristic of hiv-2s were also conserved in most simian viruses except for cpz, mandrill and sykes' monkey viruses. cls 17 was present in most hiv-2s and macaque viruses (env gene) and sometimes in sooty mangabey and agm viruses (rev gene). cls 18 was found in gag gene in all hiv-2, agm and macaque genomes and in most sooty mangabey and mandrill ones. cls 19 (gag gene) and 21 (vpx gene) were detected in most hiv-2s and cls 20 (env gene) in all of them. cls 19, 20 and 21 were also present in all macaque and sooty mangabey viruses in the gag, env and vpx genes, respectively.
fig. 3. clss localization and genomic organization of cpz, d-particle-forming viruses, feline and equine viruses. the numbers represent the relative positions of the detected cls calculated versus that of the mmy 1 ® , except for d-particle-forming viruses that lack mmy 1 ® , whose numbers correspond to the crude positions of the clss directly referenced from the genomes (see section 2). the reference organization of cpz, d-particle-forming viruses, feline and equine viruses was represented using: af103818, af033815, fivz1 and af247394 viruses, respectively. occp.: cls found at an occasional position. 
in addition to the sequences common to a maximum of hiv-2s, cls 22 was characteristic of all agm and sykes' monkey lentiviruses in the env and tat overlapping genes, respectively. when present in one of the two mandrill viruses (sivmndgb1), cls 22 was located at an unusual position in the gag gene. non-primate lentiviruses showed less clss than primate ones, with 13/28 not present at all (table 3) , bovine lentiviruses showing the minimal number of two clss. the missing clss concerned particularly those detected in hiv-1s. it is worth mentioning that cls 2 was uniquely observed in all equine viruses where it was detected once at the unusual position in gag gene. cls 10 was only present in all caprine viruses (unusually in vif gene) as well as cls 16 in most feline ones (pol gene). mmy 1 ® was restricted to all equine and feline viruses. a few clss were displayed in some negative control viral genomes, while the random-generated tenkilobase-dna-like structures did not present any of them. cls 3 was found in 8/9 htlv-1s together with cls 4 in 5/9 of them while htlv-2s did not present any sequence. cls 3 and 4 were the only ones detected in d-particle-forming simian viruses in the pol and gag genes, respectively. mmy 1 ® was in 1/6 murine retroviral genomes. one should note that all these different oncornaviruses belong to families that are genetically stable and do not induce immunodeficiency. cls 1 and 22 were in retroviral spumaviruses (3/5 and 2/5, respectively). cls 3, 4, 10, 17, 20 and 21 were found in 1/4 herpes viruses. remarkably, hiv-1s present a crosstransactivation activity on herpes viruses [40, 41] as well as htlv-1s [42] . cls 18 was in 1/10 picornaviruses while cls 16 was in 5/13 parvoviruses. the observed sequences were mapped on all the genomes of the different viral families. from the particular organization of the hiv-1 and hiv-2 ( fig. 1) , agm and macaque (fig. 2) , cpz, feline, equine and d-particle-forming viruses (fig. 
3) and that of sooty mangabey, mandrill and other non-primate lentiviruses (supplementary data), it appears that a given cls occupied on the viral genome a specific position that was roughly conserved in the different viral families. at first sight, the clss were detected mainly in the gag and pol structural genes and, to a lesser extent, in the ltrs, the env structural gene and the nef, vpr, vpu, vpx, vif, rev and tat regulatory genes in a decreasing order. when analyzing the data among families, the hiv-1 genome displayed the highest number of clss that were mainly found in the first half of the genome, covering 5 ltr, gag and pol genes and a part of vif gene. particularly, the sequence detection evidenced the p17 and p24 proteins for gag gene and the p31 and p51 proteins for pol gene, while cls 11 and 13 were found in the gp120 protein of env gene (see tables 2 and 3) . moreover, it was noticeable that cls 7 was at the hinge of the two ltrs. in hiv-2 genome, clss were also mainly related to gag, pol and env genes. hiv-2 and siv genomes presented similar organizations, but some differences led us to differentiate the sivs in several categories (figs. 2 and 3; supplementary data) . for example, cpz viruses exhibited a striking analogy with hiv-1s, while agm, macaque and sooty mangabey genomes showed a cls organization rather similar to that found in hiv-2s, confirming molecular data. cpz viruses also presented such a similarity with hiv-1s at the molecular level, yet they belonged to the simian viruses concerning the immunological characterization (e.g., [1, 8, 43] ). besides, the sykes' monkey virus appeared to be particular since it presented only six clss, four of them being at the hinge of the pol/vif genes. the d-particleforming viruses showed the simplest organization with only cls 3 and 4 that framed the beginning of pol gene. 
for the non-primate mammal viruses, the data must be cautiously interpreted, because the low number of studied genomes was not representative for some of these families. however, it is noteworthy that their detected sequences were mainly situated in the gag-pol region. the clss of these viruses, whose number increased from bovine, equine, ovine/caprine, ovine, feline and finally up to caprine genomes, showed a similar pattern (fig. 3; supplementary data). lentiviral genomes contained 28 clss, which allowed them to be divided into regions corresponding to 'evolutive cassettes' that defined viral specific subtypes. about one third of clss were common to almost all primate lentiviral genomes. as to clss specific to hivs, their detection fitted with the known immunological families. maps of the viruses that can be reconstructed from building blocks delimited by clss were specific to each viral type, though they presented a similar high density in the gag-pol region. the revealed homology in gene location of clss between hivs and simian viruses confirmed the separation into two categories (hiv-1/cpz viruses, hiv-2s/other sivs). a clear barrier between primate and other mammal viruses appeared, due to shifting locations for some clss in the non-primate ones. many studies describe lentiviruses as mosaic viruses and numerous examples of recombination between different hiv-1 strains have been reported [6, 7, 13-15, 19-34]. the correlation between wide recombination segments (whose list is not exhaustive) and clss precisely localized in the same regions is shown in table 4. most clss located in hiv-1s (82%) corresponded to such reported sequences, an underestimated value since multiple recombinogenic segments lacking a precise location (e.g., [35, 36]) are not mentioned. cls 2 and 4 as well as mmy 28 were situated in the hiv-1 recombination sequence shown in the gag-pol region [21].
other domains contained cls 2 and cls 7 in the nef-ltr region [19], mmy 1 (ltr), mmy 3 (gag gene) and mmy 31, as well as cls 9 (pol gene) in a study using hiv-1 strains deleted in the env gene [22]. in complete hiv-1 genomes, recombinogenic segments have been reported in the pol gene at rt level (p51 protein), to which corresponded cls 8, 9, 10 and 12, and in the env/nef common part, containing cls 2 [20]. they concerned also the gag (p17 protein), env (gp41 and gp120 proteins) and nef genes that carried mmy 3 as well as cls 10 and 14, cls 11 and 13, and cls 2 and 7, respectively [6]. recombinogenic segments corresponded to clss 11 and 13 in env gene (gp120 protein) and cls 14 in the tat/rev overlapping region [34]. among the 22 clss present in hiv-1s, 82% belonged to the large recombination domains cited above. mmy 4 and cls 5 that corresponded to gag p24 protein (core protein) and clss 6 and 15 that corresponded to pol p31 protein (integrase) have not been correlated until now to recombinogenic segments.
table 4. correlation between recombinogenic segments described in hiv-1s and clss
references; gene location of reported recombinogenic segments; clss detected in corresponding segments
[22, 25]; 5' ltr; mmy 1
[6, 15, 21, 22, 26-28]; gag; mmy 3, 28, cls 4, 10, 14
[6, 7, 14, 20, 22-24, 29-31]; pol; mmy 31, 32, cls 1-3, 8-10, 12, 16
[13, 26, 27, 30, 32]; vpr/tat/rev; cls 14, 17
[6, 20, 26, 27, 33, 34]; env; cls 11, 13
[7, 19]; env/nef; cls 2, 7
[19]; nef/3' ltr; cls 7
clss were characterized by their nucleotide composition and exhibited at first look a clear gap in detection between primate and other mammal viruses. in fact, well-conserved clss can represent a signal specific of the strain, the sub-group, the group or the family of virus to which they belong.
these sequences can be classified as a function of the possible recombination events they could induce: 'hiv-1 type' between the different hiv-1 and cpz strains; 'hiv-2 type' between the different hiv-2s, macaque and sooty mangabey viruses and the feline ones; 'simian type' between agm, mandrill and sykes' monkey viruses; 'lentiviral type' between approximately all mammal lentiviruses; 'retroviral type' between almost all retroviruses. thus clss can be considered as the 'identifying self' of the viruses and their presence permitted the denomination of 'viral self', which could be a 'species self' (an hiv-1-type, for example), or an 'inter-species self' -'lentiviral' or 'retroviral' self. our global study of the entire lentiviral genomes suggested the involvement of recombinations in genome flexibility. it allowed us to postulate that if one recombination induced the formation of one variant, the association of series of related variants formed one subtype. the latter was produced by recombinations between distinct variants that could cause or not the emergence of a new subtype. the genomic flexibility is associated with the viral-derived dna sequences that can recombine [1, 19, 30] and/or produce complementing and/or recombining rnas [4, 44, 45] . once several elements of the viral genome have penetrated into a cell -and sometimes together with hiv dna pieces carried by virions [46] -they may be rearranged to possibly elicit productive infection in a new target tissue. for example, cls 11 and cls 13 (hiv-1 env gene) were located at the level of the v3 and v4 variable loops of the gp120 cellbinding protein, respectively corresponding to the recombinogenic segments previously described [6, 20, 26, 27, 33, 34] . this cls 11/cls 13 tandem seems to correspond to an important building block for the creation of a new cellular tropism. 
the v3 domain is critical for chemokine-mediated blockade of infection [47] and the v3 and v4 regions that are separated by the c3 constant domain represent the targets of many trials for the establishment of a candidate vaccine. a productive viral dissemination has been revealed that evidences recombinations implying independent gene evolutions [1, 14, 30], and possibly leads to a new gene acquisition [48, 49]. such genomic divergences both maintain viral specificity and allow the emergence of new families, raising the question: are these divergences selected by the specificity or are they specificity selectors while maintaining the viral integrity? so a new viral cell tropism can be created that correlates with the use of the ccr5 or cxcr4 classical co-receptors [50, 51] or a postulated new one [52]. another essential step was to link the determined landmarks of genomic flexibility with precise viral functions. it is worth mentioning that clss 16 and 2 have closely related functions. cls 16 was a part of the cppt involved in the initiation of lentivirus reverse transcription [53] where dna synthesis enhances dna/dna recombination [54]. cls 2 represented the well-conserved part of the distal ppt, and perhaps the action site, while the neighbouring sequences, which varied highly from one virus to another, might constitute a specific recognition site for the reverse transcriptase associated with a given virus. hiv-1s displayed two cls 2 at positions shown to be the cppt (pol gene) [44] and the distal ppt (nef gene) [53]. also, hiv-2s revealed a slightly different organization, since cls 16 was part of the cppt (pol gene) and cls 2 part of the distal ppt when the nef gene overlaps the 3' ltr. the role in recombination of clss situated in the conserved ppt was emphasized by the determination in the pol gene (protein p31, integrase) of a recombinogenic segment that could affect the viral productive cycle [14].
another interesting point concerns the sequence structure, which could also indicate a specific additional function for some clss. for example, the presence of the repeated aatt motif inside cls 15 (3 motifs) and 13 (2 motifs) is similar to a folding-like inducer of the dna [55]. cls 15 presented the noticeable triplicate aatt structure (aaactaaagaattacaaaaacaaattacaaaaatt) neighbouring cls 6 to form a termination structure. in view of the structure of the cts (pol gene, p31 protein) [56], cls 15 together with cls 6 showed the aaaaatt and aaatttt motifs corresponding respectively to 1 and 2, strong and weak, stop signals [57, 58]. the cts approximately represents the succession of these two sequences. the raison d'être of a cls (simply recombining, participating in the genetic expression or both) can imply important differences in the detected sequence function and/or viral ontology such as generation of a new subtype (e.g., [26, 44]). clss specificity revealed that two kinds of recombinations seemed to coexist. one involved sequences mostly common to all the lentiviral genomes separated by large domains, which could allow interspecific recombinations. the other type of recombination involved sequences that within the same family were separated by short distances or even overlapped, as shown in hiv-1s for cls 10 and 14 (gag gene). these findings led us to discriminate between restricted and expanded specificity. in a gene-to-gene study on gag and env genes, wide segments have been described where recombinations could be present, which implied that one recombinant virus characterizes one subtype [12]. the multiplicity of viral subspecies present in the same infected host may be a cause and/or a consequence of the recombination phenomena [1, 3, 4, 30], the generation of recombining strains increasing this process. the presence of clss is not directly associated with the role of a gene since a recombination can be either intra- or intergenic.
such a process involves two adjacent or distant clss that can be situated in the same gene or in two different genes. considering all the possibilities of recombination that happen during an infection, the viability of the newly recombinant genome is the single criterion of selection. the cascade of slow genomic divergences is an integral part of the viral ontogeny to ensure long-term survival. such genetic approaches could benefit from the defined clss, whose characteristic is to be well-conserved and to keep functional the viral genome. thus the mechanism that maintains the genomic flexibility would be an excellent tool to impede viral growth, particularly when sequences implied in recombination and genetic expression are modified or blocked. as an ad absurdum argument, d-particle-forming viruses that do not cause immunodeficiency presented a single sequence cls 3 (pol gene), and occasionally cls 4 (gag gene), a situation leading to the suppression of some necessary specificity steps. these two clss were the only sequences detected in htlv-1s showing a genetic stability like that in htlv-2s yet without cls, which suggests that they could play a role in gene exchange between oncorna-and/or retroviruses. during lentiviruses adaptation to changes caused by drug administration [2, 45, 59, 60] or by environmental conditions to ensure continued reproduction, the viral propagation beyond a critical level is constantly in a situation of flux. thus, clss can allow the spontaneous replacement of defective variants with newly 'recruited' recombined (or complementing) hiv genomes. the high degree of divergence is a vital part of the viral ontogeny, and recombinations induce sustained viral multiplication allowing the environment to select the most efficient genomic alternative. this emphasizes the importance of the number, the position and the specificity of clss on the viral genome, especially since most of them fitted with recombinogenic segments already described. 
the biological role validity of some clss not associated until now with a known function had to be checked in vitro, for example by reverse genetic experiments that could reveal the importance of the different sequences. clss, which represent essential landmarks of genomic flexibility, may become key targets for the establishment of new drug and/or gene therapies that can escape the resistances encountered with treatments presently in use. chimpanzee (3): af103818, af115393, sivcpzgab. african green monkeys af074965, af139382, af326583, af326584, af412314, hl2g12gnom, hl2v2cg, htlvcge, nc_001488. murine viruses a.3. negative control viruses herpes viruses recombinant hiv sequences: their role in the global epidemic different evolutionary patterns are found within human immunodeficiency virus type 1-infected patients recombination in hiv: an important viral evolutionary strategy mechanisms of retroviral recombination mosaic structure of the human immunodeficiency virus type 1 genome infecting lymphoid cells and the brain: evidence for frequent in vivo recombination events in the evolution of regional populations high frequency of recombinant genomes in hiv type 1 samples from brazilian southeastern and southern regions morgado, molecular epidemiology of hiv-1 in venezuela: high prevalence of hiv-1 subtype b and identification of a b/f recombinant infection fast analysis of genomic homologies: primate immunodeficiency virus a likelihood method for the detection of selection and recombination using nucleotide sequences in vivo characteristics of human immunodeficiency virus type 1 intersubtype recombination: determination of hot spots and correlation with sequence similarity a novel exploratory method for visual recombination detection scanning the database for recombinant hiv-1 genomes characterization of a highly replicative intergroup m/o human immunodeficiency virus type 1 recombinant isolated from a cameroonian patient sequence variability of the integrase 
protein from a diverse collection of hiv type 1 isolates representing several subtypes high prevalence of diverse forms of hiv-1 intersubtype recombinants in central myanmar: geographical hot spot of extensive recombination development and application of a highthroughput hiv type-1 genotyping assay to identify crf02_ag in west/west central africa stepwise detection of recombination breakpoints in sequence alignments sequencing and comparison of yeast species to identify genes and regulatory elements genetic characterization of the nef gene from human immunodeficiency virus type 1 group m strains representing genetic subtypes a precise mapping of recombination breakpoints suggests a common parent of two bc recombinant hiv type 1 strains circulating in china genotypic and phenotypic analysis of hiv type 1 primary isolates from western cameroon human immunodeficiency virus type 1 recombination: rate, fidelity, and putative hot spots v118i substitution in the reverse transcriptase gene of hiv type 1 crf02_ag strains infecting drug-naive individuals in cameroon hiv type-1 circulating recombinant form crf09_cpx from west africa combines subtypes a, f, g, and may share ancestors with crf02_ag and z321 isolation and characterization of a fulllength molecular dna clone of ghanaian hiv type 1 intersubtype a/g recombinant crf02_ag, which is replication competent in a restricted host range emergence of new forms of human immunodeficiency virus type 1 intersubtype recombinants in central myanmar independent introduction of transmissible f/d recombinant hiv-1 from africa into belgium and the netherlands mother-to-child hiv type-1 transmission in argentina: bf recombinants have predominated in infected children since the mid-1980s identification of ugandan hiv type-1 variants with unique patterns of recombination in pol involving subtypes a and d evolution and diversity of hiv-1 in africa -a review prevalence and origin of hiv-1 group m subtypes among patients attending a belgian 
hospital in 1999 dual human immunodeficiency virus type 1 infection and recombination in a dually exposed transfusion recipient. the transfusion safety study group an ab recombinant and its parental hiv type 1 strains in the area of the former soviet union: low requirements for sequence identity in recombination, unaids virus isolation network new hiv type-1 crf01_ae/b recombinants displaying unique distribution of breakpoints from incident infections among injecting drug users in thailand hiv type-1 bf recombinant strains exhibit different pol gene mosaic patterns: descriptive analysis from 284 patients under treatment failure the structure of hiv-1 genomic rna in the gp120 gene determines a recombination hot spot in vivo a probabilistic algorithm for interactive huge genome comparison late seroconversion in three multitransfused young haemophiliacs confirmed by hiv pcr analysis in vitro non-productive infection of purified natural killer cells by the bru isolate of the human immunodeficiency virus type 1 post-transcriptional transactivation of human retroviral envelope glycoprotein expression by herpes simplex virus us11 protein cross-talk between human herpesvirus 8 and the transactivator protein in the pathogenesis of kaposi's sarcoma in hiv-infected patients functional replacement of the hiv-1 rev protein by the htlv-1 rex protein wain-hobson, genetic organization of a chimpanzee lentivirus related to hiv-1 mechanisms associated with the generation of biologically active human immunodeficiency virus type-1 particles from defective proviruses how rna viruses exchange their genetic material viral dna carried by human immunodeficiency virus type 1 virions the v3 domain of the hiv-1 gp120 envelope glycoprotein is critical for chemokine-mediated blockade of infection gene acquisition in hiv and siv evolution of the primate lentiviruses: evidence from vpx and vpr biological and molecular aspects of hiv-1 coreceptor usage impact of antiretroviral treatment on the 
tropism of hiv-1 plasma virus populations identification and characterization of hiv-2 strains obtained from asymptomatic patients that do not use ccr5 or cxcr4 coreceptors a single-stranded gap in human immunodeficiency virus unintegrated linear dna defined by a central copy of the polypurine tract strand displacement synthesis in the central polypurine tract region of hiv-1 promotes dna to dna strand transfer recombination the structure of an oligo(da).oligo(dt) tract and its biological implications hiv-1 reverse transcription. a termination step at the center of the genome synthesis of dna by human immunodeficiency virus reverse transcriptase is preferentially blocked at template oligo(deoxyadenosine) tracts template-directed pausing of dna synthesis by hiv-1 reverse transcriptase during polymerization of hiv-1 sequences in vitro mutations in retroviral genes associated with drug resistance characterization of resistant hiv variants generated by in vitro passage with lopinavir/ritonavir we thank dr christiane plas for critical reading of this manuscript and for helpful discussions. 
appendix a af408631, af408632, a04321, a07867, him237565, him245481, him271445, him291719, him302646, him302647, hivant70c, hivbcsg3x, hivbrucg, hivcam1, hivelicg, hivf12cg, hivhxb2cg, hivibng, hivjrcsf, hivmalcg, hivmncg, hivmvp5180, hivndk, hivnl43, hivny5cg, hivoyi, hivpv22, hivp896c, hivrf, hivsf2cg, hivth475a, hivu455a, hivu43096, hivu43141, hivu46016, hivu51188, hivu51189, hivu54771, hivu69584 key: cord-024290-8z6us7v4 authors: allen, edward e.; farrell, john; harkey, alexandria f.; john, david j.; muday, gloria; norris, james l.; wu, bo title: time series adjustment enhancement of hierarchical modeling of arabidopsis thaliana gene interactions date: 2020-02-01 journal: algorithms for computational biology doi: 10.1007/978-3-030-42266-0_11 sha: doc_id: 24290 cord_uid: 8z6us7v4 network models of gene interactions, using time course gene transcript abundance data, are computationally created using a genetic algorithm designed to incorporate hierarchical bayesian methods with time series adjustments. the posterior probabilities of interaction between pairs of genes are based on likelihoods of directed acyclic graphs. this algorithm is applied to transcript abundance data collected from arabidopsis thaliana genes. this study extends the underlying statistical and mathematical theory of the norris-patton likelihood by including time series adjustments. cell signaling is accomplished via networks of transcriptional changes that lead to synthesis of distinct sets of proteins, which cause changes in growth, development, or metabolism. treatments that elevate levels of hormones result in cascades of changes in gene expression driven by activation and synthesis of transcription factors which are required to turn on downstream genes. one approach to model these gene regulatory networks is to collect measurements of changes in abundance of gene transcripts across a time course. 
the expression of a gene encoding a transcriptional activator or repressor protein may signal to the next gene to either turn on or turn off downstream genes and their encoded proteins. thus, time course transcriptomic data sets contain important information about how genes drive these changes in biological networks. yet genome-wide transcript abundance assays examine tens of thousands of genes so identification of patterns or networks within these large data sets is difficult. it is also critical to filter the meaningful transcript changes in these data sets to remove genes whose responses are not above background or that are dissimilar due to biological or technical variation. yet even though the bioinformatics community has developed statistical methods to filter the data [9] , additional approaches are needed to identify the networks and patterns in these large data sets. an important modern approach to statistical modeling includes bayesian techniques involving likelihoods and posterior probabilities. here, we extend our previous work on this problem by incorporating time series adjustments in the computation of bayesian likelihoods. we apply this method to time course data generated in response to treatments that elevate the levels of the hormone ethylene in arabidopsis thaliana. we take advantage of a previously published genome-wide transcriptional data set [9] , subjected to rigorous filtering and from which all the genes predicted to encode transcription factors have been identified. the goal is to predict gene regulatory networks that control time-matched developmental changes. the results in this paper are novel for several reasons. first, the methods use the hierarchical nature of the data sets. for example, replicate data are not averaged. rather, the method constructs a model over all of the data that uses each replicate as a source of information. 
the assumption is that at each level of the hierarchy there are commonalities in the data and parameters. thus, the replicate data is not independent. second, the addition of time series adjustment to improve the independence of the model's residuals gives these techniques stronger statistical foundations. third, the combination of bayesian model averaging with a cutting edge genetic algorithm provides rigorous estimates of posterior probabilities for edges. these computational modeling algorithms are derived using rigorous mathematical and statistical techniques and are computationally efficient. the models produced are easily understandable. many different techniques for modeling non-hierarchical data using gene expression data have been proposed. an excellent recent survey on this subject was given by emily [4] . there are many techniques for modeling gene and protein networks-with various different properties-available in the literature. our technique in this paper is a bayesian regression type method. variations of bayesian modeling can be found in [7, 11, 19] . other methods that use types of regression include [2, 21] which focus on logistic regression techniques, and [22, 23] which use poisson regression. other approaches to modeling these types of problems include differential equations [1] and boolean modeling [14] . this bayesian likelihood computational algorithm incorporates additional important features from earlier versions. earlier variations included computing posterior probabilities for a single replicate [11] and for multiple replicates with both hierarchical [18] and independent [17] structures. over the course of this research, the search procedure has changed from metropolis hastings to genetic algorithms. genetic algorithms' execution times are typically polynomial rather than the doubly exponential execution time, in terms of the numbers of time points and genes, of metropolis hastings. 
this variation also uses a bayesian version of the cross generational elitist selection, heterogeneous recombination, cataclysmic mutation algorithm (chc) [6] . genetic algorithms are motivated by the operators of selection, crossover, and mutation. the chc variation does not allow the crossover of similar parents. once the population becomes too homogeneous, then a cataclysmic mutation event regenerates the population from the current most fit parents. the bayesian chc (bchc) implemented in this paper uses a hierarchical statistical construct (the norris-patton likelihood) as the fitness function. the hormone ethylene (acc) is known to activate root growth in arabidopsis thaliana [9] . transcription factors (tfs) are cellular proteins that bind to dna to turn genes either on (activation) or off (repression). developmental changes are controlled by these genes. the data set used in this modeling process was the complete set of abundance levels of the twenty-six tfs believed/known to be involved in the activation of the growth of roots at eight time points after treatment with the ethylene precursor acc [9] . here, constructing an appropriate network model has potential agricultural applications in that it should lead to more complex understandings of root development. three network modeling paradigms are generally considered in the literature: cotemporal, next state one step and next state one and two steps. a next state one step model predicts the transcript abundance relationships between genes at time j based on the transcript abundance at time j − 1. in this paper, we will only consider next state one step models; for simplicity, we will refer to next state one step as next state. the time series adjusted (tsa) next state models are an amalgamation of next state modeling with standard time series adjustments [12] . the time series adjustment methodology makes the residuals (i.e., the estimated error terms) more independent. 
a directed graph $g = (v, e)$ consists of a pair of collections: $v$, a set of vertices (or nodes); and $e$, a collection of directed edges between pairs of vertices. a cycle is a sequence $v_1, e_1, v_2, e_2, \ldots$ of vertices and directed edges that returns to its starting vertex. directed acyclic graphs (dags) do not contain cycles. an example of a dag is given in fig. 1. in this modeling algorithm, dags form the mathematical foundation of our computational approach. the vertices of a dag represent genes and the directed edges are one-way relationships between pairs of vertices. when there is a directed edge from $v_i$ to $v_j$, $v_i$ is a parent of $v_j$ and $v_j$ is a child of $v_i$. for any dag $d$ with vertex set $v = \{v_1, v_2, \ldots, v_n\}$, the vertices can be topologically sorted. this gives a total order on $v$ in which every ancestor precedes its descendants.
conditional probability gives that for any two events $a$ and $b$, the probability of both occurring is $p(a \cap b) = p(a)\,p(b \mid a)$. similarly, the density function $f$ for two continuous variables $y_1$ and $y_2$ is $f(y_1, y_2) = f(y_1)\,f(y_2 \mid y_1)$. this extends recursively, using the order $<$ implied by topologically sorting the dag, to the set of continuous variables specific to a particular dag $d$. let $y_1$ be the gene that cannot have any parents. let $y_2$ be the gene that can have at most the parent $y_1$. similarly, let $y_h$ be the gene that can have parents from the collection $\{y_1, \ldots, y_{h-1}\}$. therefore, if we let $y_i$ represent the data of child $i$ for all of the $r$ replicates, we have for $d$
$$f(y_1, y_2, \ldots, y_n \mid d) = f(y_1 \mid d)\, f(y_2 \mid y_1, d)\, f(y_3 \mid y_1, y_2, d) \cdots f(y_n \mid y_1, \ldots, y_{n-1}, d).$$
statistical regression models of response (child) data from predictor (parent) data over time nearly always have residuals that are correlated over time. this is usually due to the remaining influence of the previous time's response data. in complicated modeling situations (e.g., like ours where we need to obtain closed form likelihoods of dags within a hierarchical structure in order to produce posterior probabilities of edges), it is common to derive results as if there were non-correlated residuals, as we have done in previous work.
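the ordering and factorization described above can be illustrated with a minimal sketch. it is not the authors' code: the four-gene dag, the gene names and the helper names are hypothetical, and the standard library module graphlib (python 3.9 or later) stands in for whatever sorting routine the implementation uses. each child is conditioned only on its parents, since the density of $y_h$ depends on the earlier variables only through the parents of $y_h$ in $d$.

```python
from graphlib import TopologicalSorter

# hypothetical toy dag: each key is a child gene, each value is the set of its parents
PARENTS = {"g1": set(), "g2": {"g1"}, "g3": {"g1", "g2"}, "g4": {"g2"}}


def topological_order(parents):
    """return the genes so that every parent appears before its children."""
    return list(TopologicalSorter(parents).static_order())


def factorization(parents):
    """list the conditional density terms of f(y_1, ..., y_n | d) in sorted order."""
    order = topological_order(parents)
    terms = []
    for child in order:
        # condition on the child's parents (in sorted order) and on the dag d
        conditioning = sorted(parents[child], key=order.index) + ["d"]
        terms.append(f"f({child} | {', '.join(conditioning)})")
    return order, terms


order, terms = factorization(PARENTS)
print(" * ".join(terms))
```

graphlib raises a CycleError when the input contains a cycle, so the same call doubles as a check that a candidate graph really is a dag.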
our previous work has shown utility both for simulated and biological data, but we now rigorously incorporate a time series adjustment into our model. this should result in substantially less correlated residuals and thus more accurate likelihoods for the dags. since these likelihoods are the foundations for the edges' estimated posterior probabilities, these estimates should also be improved. our time series adjustment is an integer autoregressive adjustment of order 1 in the commonly used family of markov conditioning. it is a version of kedem's and fokianos' autoregressive model [12, page 184]. in our setting, this simply adds the child's data at the previous time as an additional regressor for the child's data at the current time. thus, much of the influence of the child's data at the previous time would be regressed out, leaving less correlated, closer to independent, residuals from one time to the next. for each $h$, with $1 \le h \le n$, $f(y_h \mid y_1, y_2, \ldots, y_{h-1}, d)$ gives the density of $y_h$ given the data of $y_h$'s parents for dag $d$. now, let ${}^{i}y_c$ be the data vector of any given child $c$ from the $i$th replicate. the vector ${}^{i}y_c$ has dimension $t$, the number of utilized time points in the child $c$ data set for a given replicate $i$. the symbol ${}^{i}x_c$ is the $t \times k_c$ regressor matrix for ${}^{i}y_c$. for next state with time series adjustment, $t$ is the number of time points per replicate minus one, since at time 1 the child data has neither previous parent data nor previous child (tsa) data; the utilized child data therefore starts at time 2. the value of $k_c$ is the number of parents of $c$ plus two, since ${}^{i}x_c$ has a separate column for each of its parents' data at the previous time, a column of 1's for the intercept, and a column of the child's data at the previous time (the time series adjustment). a $k_c$-dimensional slope vector for child $c$'s regressors is ${}^{i}\beta_c$. the common within-replicate residual variance of child $c$ is $\sigma_c^2$.
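a small sketch may make the shape of the regressor matrix concrete. it assumes a single replicate with eight time points and two parents; the function name, the column order (parents, intercept, lagged child) and the numbers are illustrative assumptions rather than details taken from the implementation.

```python
def tsa_next_state_design(child, parent_series):
    """build the response vector and regressor matrix for one child in one replicate.

    child: abundances of the child gene at times 1..T
    parent_series: one list per parent, each with abundances at times 1..T
    returns (y, x) with len(y) == T - 1 and each row of x of length len(parents) + 2
    """
    n_times = len(child)
    y = child[1:]  # the utilized child data starts at time 2
    x = []
    for j in range(1, n_times):
        row = [series[j - 1] for series in parent_series]  # parents at the previous time
        row.append(1.0)                                     # intercept column
        row.append(child[j - 1])                            # time series adjustment column
        x.append(row)
    return y, x


# hypothetical abundances for one replicate with eight time points
child = [1.0, 1.4, 1.9, 2.1, 2.0, 1.8, 1.7, 1.6]
parent_a = [0.5, 0.9, 1.2, 1.1, 1.0, 0.9, 0.8, 0.8]
parent_b = [2.0, 1.8, 1.5, 1.3, 1.2, 1.2, 1.1, 1.0]

y, x = tsa_next_state_design(child, [parent_a, parent_b])
print(len(y), len(x[0]))  # t = 7 rows, k_c = 2 parents + 2 columns
```

dropping the last column of each row gives the unadjusted next state design, which is a convenient way to compare residual autocorrelation with and without the adjustment.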
assumptions which detail the hierarchical structure include that for a given [...]. the proof of theorem 1 uses the following lemmas whose computation can be found in [16] (a thesis from our research group). we include the proof of lemma 2 to show how the computation of the likelihood includes the slope parameters ${}^{i}\beta_c$ of each of the replicates separately. proof. using integration, we have [...]. letting $|m|$ denote the determinant of the matrix $m$, we have the following: [...]. extending lemma 2 to the product of density functions used in lemma 1, we have: [...]. note that $g$, $v_0$ and $\sigma_c^2$ are positive free parameters. in our modeling algorithm, we set $g = v_0 = \sigma_c^2 = 1$. the use of the time series adjusted next state norris-patton likelihood, along with a tailor-made genetic algorithm and bayesian model averaging, allows for the rigorous estimation of posterior probabilities for all gene pair interactions.
fig. 2 (surviving fragment of the tbchc listing, lines 17-24):
17: if indicator < 0 then
18: p(t) ← cataclysm(p(t))
19: indicator ← 50
20: end if
21: archive ← archive ∪ p(t)
22: end while
23: return archive
24: end procedure
simply put, a genetic algorithm (ga) takes the current population and produces the next generation using the operations of selection, crossover, and mutation [15]. individuals (i.e., dags) are automatically moved to the next generation with preference given to those with the higher likelihoods (the elitist strategy). the first population must be initialized. the genetic algorithm terminates after a specified number of iterations. the tbchc genetic algorithm is an extension of bchc [13] which was heavily influenced by the chc [5]. the tbchc fitness function includes the next state time series adjustment. the tbchc operators of selection, crossover, mutation, and repair will be discussed in the following paragraphs. the population of each generation consists of a fixed number of dags. each dag represents gene relationships.
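the overall loop can be sketched as follows. this is a toy illustration under several assumptions, not the tbchc implementation: the fitness is a hypothetical stand-in (closeness to a made-up target edge set) rather than the norris-patton likelihood, the repair step simply deletes random edges until no cycle remains, the rule for decrementing the indicator is guessed in the spirit of chc, and the cataclysm uses mutation only. all names and parameter values are invented for the example.

```python
import random

NODES = ["a", "b", "c", "d"]  # hypothetical genes
ALL_EDGES = [(s, t) for s in NODES for t in NODES if s != t]


def is_acyclic(edges):
    """strip vertices with no incoming edge until none are left (acyclic) or stuck (cycle)."""
    remaining, nodes = set(edges), set(NODES)
    while nodes:
        free = {v for v in nodes if not any(t == v for _, t in remaining)}
        if not free:
            return False
        nodes -= free
        remaining = {(s, t) for s, t in remaining if s not in free}
    return True


def repair(edges, rng):
    """assumed repair operator: delete random edges until the graph is a dag."""
    edges = set(edges)
    while not is_acyclic(edges):
        edges.remove(rng.choice(sorted(edges)))
    return frozenset(edges)


def hamming(d1, d2):
    """number of directed edges present in exactly one of the two dags."""
    return len(d1 ^ d2)


def crossover(d1, d2, rng):
    """exchange a random half of the edges on which the parents differ, then repair."""
    diff = sorted(d1 ^ d2)
    swap = set(rng.sample(diff, len(diff) // 2))
    return repair(d1 ^ swap, rng), repair(d2 ^ swap, rng)


def cataclysm(population, fitness, pop_size, rng):
    """rebuild the population from mutated copies of the top 10% of dags."""
    elite = sorted(population, key=fitness, reverse=True)[: max(1, pop_size // 10)]
    fresh = list(elite)
    while len(fresh) < pop_size:
        mutant = set(rng.choice(elite))
        for edge in rng.sample(ALL_EDGES, 2):
            mutant ^= {edge}  # toggle two random edges
        fresh.append(repair(mutant, rng))
    return fresh


def evolve(fitness, pop_size=20, generations=30, threshold=2, seed=0):
    rng = random.Random(seed)
    population = [repair(rng.sample(ALL_EDGES, 3), rng) for _ in range(pop_size)]
    archive, indicator = set(population), 2
    for _ in range(generations):
        rng.shuffle(population)  # random pairing of parents
        children = []
        for p1, p2 in zip(population[::2], population[1::2]):
            if hamming(p1, p2) > threshold:  # parents that are too similar may not mate
                children.extend(crossover(p1, p2, rng))
        if not children:
            indicator -= 1
        elite = sorted(population, key=fitness, reverse=True)[: max(1, pop_size // 10)]
        children.sort(key=fitness, reverse=True)
        population = (elite + children)[:pop_size]  # may be smaller than pop_size
        if indicator < 0:
            population = cataclysm(population, fitness, pop_size, rng)
            indicator = 2
        archive |= set(population)
    return archive


def toy_fitness(dag, target):
    """hypothetical stand-in for a likelihood: closer to the target is fitter."""
    return -len(dag ^ target)


target = frozenset({("a", "b"), ("b", "c")})
archive = evolve(lambda dag: toy_fitness(dag, target))
print(len(archive), "distinct dags archived")
```

archiving every visited dag, rather than only the final population, is what later allows the edge probabilities to be averaged over many models.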
the genetic algorithm's aim is to move from the current population of dags to a new generation where the overall quality improves (as measured by the norris-patton likelihood). the elitist strategy moves only the top 10% of dags from the current generation to the next, and the balance is filled by crossover. as tbchc iterates, all distinct dags are archived. the final gene interaction model is produced from this archived collection. generally, the selection operator chooses which members of the current population can potentially contribute children to the next generation. in fig. 2, selection is accomplished through a random pairing of all parents in the current population (lines 8-10). by assuming equally likely prior probabilities for the dags, the posterior probability of a given dag d is proportional to d's npl [3]. thus, the fitness of a candidate d can be computed using the npl. the crossover operator (line 12) exchanges genetic information (i.e., directed edges) between two parents, producing two new offspring. the edges to be exchanged are chosen randomly. there is one caveat: if the two parents are too similar, as determined by the hamming distance between them, then the two selected parent dags are not allowed to produce offspring (line 11). in a simple genetic algorithm, all selected parents are allowed to produce offspring. this tbchc prohibition of mating by similar parents may result in fewer dags in the next population than in the current population. since the modeling process is based on dags, if the crossover operator introduces a cycle in the offspring, a repair operator is applied. selection and crossover are used exclusively in tbchc until the population becomes too similar. at that point, cataclysmic mutation (line 17) is applied to reset the population by creating a new population of dags from the top 10% npl dags. there are no known techniques for assigning the optimum values to the genetic algorithm parameters.
however, experience and the literature give general criteria for appropriate values. still, values are often determined on a case-by-case basis. the tbchc algorithm parameters include the following: 20 parallel executions, each with 600 generations; the number of initial dags is 400; the crossover probability is 0.30; and the number of parents of any given node is limited to 3. cataclysmic mutation causes the population of dags to be replaced by dags generated by crossover and mutation on the top 10% of the population to restore the candidate class to 400. this tbchc algorithm is implemented in python 3.0 using the networkx [8] and dispy [20] packages. it is important to realize that each directed edge in the model is labeled by a number in the interval [0, 1] indicating the posterior bayesian probability that the associated relationship exists in the biological network. using bayesian statistics, the posterior probability of each edge is estimated from the archived dags by [...], which simply and appropriately weights each visited dag d according to its likelihood. this methodology requires equally likely priors, since in such a situation the posterior for d is proportional to its likelihood [3]. in order for this estimate to reflect its true value, it is necessary that the archive ar contain a large and varied collection of dags of high likelihood. using the transcript abundance data for 26 arabidopsis thaliana genes stimulated by acc, gene interaction models for a next state with and without time series adjustment were computationally created, as shown in fig. 3. each edge is labeled by its posterior probability. figure 4 provides comparisons of three similar models to those given in fig. 3. figure 4(a) shows a stronger and tighter distribution of posterior probabilities than fig. 4(b). there is significant agreement across the models for average posterior probabilities exceeding 0.8 and less than 0.2.
however, for average posterior probabilities greater than 0.2 and less than 0.8 there is a great deal of variance, which reflects the lack of a strong posterior probability over this range. a typical underlying assumption of statistical analysis is that the residuals are independent [3, page 737]. it is well understood, however, that the residuals associated with time course data are not usually independent. by incorporating time series adjustments into the modeling process, the residuals' independence is much improved, thus yielding a less approximated, more accurate likelihood function. the continuation of this research includes four tasks. first, the computational networks have been sent to the muday lab for biological investigation, confirmation and interpretation. second, in this paper, we investigated the enhancement provided by time series adjustment on a next state one step model. there are two other time paradigms, next state one and two steps and cotemporal, each of which has a time series adjustment analogue and a corresponding norris-patton likelihood. comparing and contrasting the computational results of these three distinct modeling methods, as well as their biological interpretations, is important in understanding the gene interaction models developed using this methodology. third, we will further consider higher order autoregressive adjustment to continue improving the independence of the residuals. fourth, effort is underway to implement nonuniform priors in the modeling techniques. this would permit construction of gene interaction models that reflect relationships found in the literature.
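the likelihood-weighted estimate of edge posterior probabilities described in the methods can be sketched as follows. this is a generic bayesian model averaging computation under equal priors, not the authors' code: the function name `edge_posteriors`, the archive layout as (edge set, log-likelihood) pairs, and the use of log-likelihoods with a max shift for numerical stability are assumptions.

```python
import math
from collections import defaultdict


def edge_posteriors(archive):
    """Estimate edge posteriors from an archive of distinct dags.

    archive : list of (edges, log_likelihood) pairs, where `edges` is an
              iterable of (parent, child) tuples.

    With equal priors, each dag's posterior weight is proportional to its
    likelihood, so an edge's posterior is the weight share of the archived
    dags that contain it.
    """
    if not archive:
        return {}
    max_ll = max(ll for _, ll in archive)
    # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(ll - max_ll) for _, ll in archive]
    total = sum(weights)
    posterior = defaultdict(float)
    for (edges, _), weight in zip(archive, weights):
        for edge in set(edges):
            posterior[edge] += weight / total
    return dict(posterior)


# hypothetical archive of two dags with likelihoods in a 3:1 ratio
archive = [
    ({("a", "b")}, math.log(3.0)),
    ({("a", "b"), ("b", "c")}, math.log(1.0)),
]
posteriors = edge_posteriors(archive)
```

in this toy archive the edge a→b appears in both dags and receives probability 1, while b→c appears only in the lower-likelihood dag and receives 0.25. the estimate is only as good as the coverage of the archive, which is why a large and varied collection of high-likelihood dags is needed.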
references
[1] modeling gene regulation networks using ordinary differential equations
[2] detecting gene-gene interactions that underlie human diseases
[3] probability and statistics, 4th edn
[4] a survey of statistical methods for gene-gene interaction in case-control genome-wide association studies
[5] the chc adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination
[6] evolutionary computation 1 - basic algorithms and operators
[7] using bayesian networks to analyze expression data
[8] exploring network structure, dynamics, and function using networkx
[9] identification of transcriptional and receptor networks that control root responses to ethylene
[10] bayesian model averaging: a tutorial
[11] continuous cotemporal probabilistic modeling of systems biology networks from sparse data
[12] regression models for time series analysis
[13] a bchc genetic algorithm model of cotemporal hierarchical arabidopsis thaliana gene interactions
[14] stochastic boolean networks: an efficient approach to modeling gene regulatory networks
[15] an introduction to genetic algorithms
[16] bayesian interaction and associated networks from multiple replicates of sparse time-course data
[17] bayesian probabilistic network modeling from multiple independent replicates
[18] hierarchical bayesian system network modeling of multiple related replicates
[19] bayesian network analysis of signaling networks: a primer
[20] dispy: distributed and parallel computing with/for python
[21] plink: a toolset for whole-genome association and population-based linkage analysis
[22] boost: a fast approach to detecting gene-gene interactions in disease data
[23] gboost: a gpu-based tool for detecting gene-gene interactions in genome-wide case control studies
acknowledgments. the authors thank the national science foundation for their support with a grant, nsf#1716279. john farrell thanks wake forest university for support as a wake forest fellow for summer 2019.
key: cord-350019-4nlbu54e authors: robinson, elektra k.; covarrubias, sergio; carpenter, susan title: the how and why of lncrna function: an innate immune perspective() date: 2019-09-02 journal: biochim biophys acta gene regul mech doi: 10.1016/j.bbagrm.2019.194419 sha: doc_id: 350019 cord_uid: 4nlbu54e next-generation sequencing has provided a more complete picture of the composition of the human transcriptome indicating that much of the “blueprint” is a vastness of poorly understood non-protein-coding transcripts. this includes a newly identified class of genes called long noncoding rnas (lncrnas). the lack of sequence conservation for lncrnas across species meant that their biological importance was initially met with some skepticism. lncrnas mediate their functions through interactions with proteins, rna, dna, or a combination of these. their functions can often be dictated by their localization, sequence, and/or secondary structure. here we provide a review of the approaches typically adopted to study the complexity of these genes with an emphasis on recent discoveries within the innate immune field. finally, we discuss the challenges, as well as the emergence of new technologies that will continue to move this field forward and provide greater insight into the biological importance of this class of genes. this article is part of a special issue entitled: ncrna in control of gene expression edited by kotb abdelmohsen. one of the most profound discoveries from the sequencing of the human genome is that over 85% of the genome is transcribed, yet < 2% encodes protein-coding genes [1] . large consortiums such as encode and fantom have embarked on attempting to characterize all functional coding and noncoding elements in the genome and have compiled important regulatory data for these elements [2] [3] [4] . long noncoding rnas (lncrnas) represent the largest group of non-coding rnas produced from the genome. 
lncrnas are defined as transcripts > 200 nucleotides in length that lack protein-coding potential. in the most recent gencode v30 release, there are 16,193 annotated lncrnas in the human genome [4]. additionally, there are over 14,000 pseudogenes that could fall under the description of long noncoding rnas, which is based simply on a length of 200 nucleotides or greater. less than ~3% of annotated lncrnas have ascribed functions; hence this class of rnas is greatly in need of further investigation [4]. from those that have been characterized, it is clear that lncrnas can function through a variety of mechanisms to regulate gene expression at both the transcriptional and post-transcriptional levels [5]. as we will discuss throughout this review, lncrnas can mediate their functions through interactions with proteins, rna, dna, or a combination of these. furthermore, the function of lncrnas can often be dictated by their localization, sequence and/or secondary structure. there are many categories and sub-categories of lncrnas, but some of the major classifications include antisense [6], bi-directional [7], enhancer-associated [8], intergenic lncrnas (lincrnas) [9] and pseudogenes [10]; a full review of all classifications can be found in the recent review by jarroux et al. [11]. lncrna function cannot be determined simply from the lncrna classification. the classification can sometimes provide insight into the mechanism of action, such as antisense lncrnas impacting their neighboring genes, but it can also lead to erroneous assumptions about how the lncrna regulates gene expression. recently, some lncrnas have been demonstrated to actually encode small peptides, indicating that these genes are misclassified as noncoding, although it is possible that they could also function as noncoding rnas in addition to their peptide coding capacity [12] [13] [14] [15] [16] [17].
it is therefore important to have a logical methodology to study the biological importance of these genes. the innate immune system functions as a rapid initial response against specific pathogens, while also promoting the activation and development of the adaptive immune system [18]. macrophages and dendritic cells are important innate immune cells that initiate the immune response by recognizing specific pathogen-associated molecular patterns (pamps) through their germline-encoded pattern recognition receptors (prrs) [18]. these receptors couple pathogen-sensing to activation of downstream signaling cascades, resulting in upregulation of numerous inflammatory pathways [19]. while a robust immune response is crucial for eliminating pathogens, prolonged activation can be detrimental to the host [20]. not surprisingly, many aspects of the inflammatory response are tightly regulated at both transcriptional and post-transcriptional levels, allowing for a transient antimicrobial response while subsequently promoting a return to homeostasis [21]. perturbations in this regulation can have significant consequences that can manifest in diseases such as arthritis [22], multiple sclerosis [23] and cancer [20, 24]. while the role of coding genes in immune cell function has been well characterized, the role of lncrnas in these processes is just beginning to emerge [25] (fig. 1). here, we use the biological model system of macrophage activation as a framework to demonstrate how we approach the study of lncrna biology. we provide a step-by-step guide to consider when studying lncrnas. furthermore, we discuss the challenges, as well as the emergence of new technologies that are helping to evolve the ways we study these genes.
the lncrna field is in its infancy, yet from what we do know, lncrnas play critical roles in a wide variety of biological processes and diseases, from cell differentiation, tissue and organ development, and flowering in plants to cancer metastasis, to name just a few [26] [27] [28] [29] [30]. we believe that lncrnas play regulatory roles in many biological processes and diseases. therefore, no matter what your research area is, there is a rich source of information to be obtained from the study of lncrnas in your field of interest. the bulk of lncrna studies to date have focused on the cancer field [31] [32] [33]. meanwhile, studies of lncrnas in the context of innate immunity have lagged, making up only ~4% of all lncrna papers to date (fig. 1). the innate immune system provides one of the first lines of defense against infection through the induction of inflammation [34, 35]. the inflammatory response of murine macrophages offers a powerful system for applying genomic approaches to the study of novel lncrnas within the framework of a pathway that has been studied for decades. macrophages are important mediators of inflammation and initiate this response by recognizing specific pathogen-associated molecular patterns (pamps) through their germline-encoded pattern recognition receptors (prrs). these receptors couple pathogen-sensing to activation of downstream signaling cascades, resulting in activation of numerous transcription factors, including nf-kappab (nf-κb) and interferon regulatory factors (irfs), that can act in combination to both positively and negatively regulate the expression of thousands of genes [36, 37]. there are 10 different tlr genes in the human genome and 13 tlr genes in mice [38] [39] [40], each binding a different pamp [41].
using this extensively studied biological system, we identified the first example of a tlr-stimulated lncrna, lincrna-cox2, which was capable of positively and negatively regulating distinct types of innate immune genes [42] [43] [44] [45] [46]. knockdown of lincrna-cox2 resulted in impaired production of proinflammatory genes (e.g., il-6), while ifn-related genes were hyperactivated in the absence of lincrna-cox2 [42] [43] [44] [45] [46]. numerous other studies have made use of the tlr-signaling biological system, uncovering and characterizing dozens of novel lncrnas that act through a wide range of mechanisms to either positively or negatively regulate this pathway, as reviewed in carpenter et al. and hadjicharalambous et al. [47, 48]. as mentioned, lncrnas are categorized into five main classes based on their genomic location: antisense, bidirectional, intronic, enhancer-associated, and intergenic. intergenic and enhancer lncrnas contain their own promoters and are distinct from protein-coding genes. bidirectional lncrnas share a promoter with, and are transcribed from the opposite strand of, a protein-coding gene, while intronic lncrnas are transcribed within an intronic region of a protein-coding gene (fig. 2a) [49, 50]. the specific class of a lncrna can often provide significant insight into how it may regulate gene expression. for example, lncrnas antisense to a coding gene have been demonstrated to be involved in transcriptional interference, negatively affecting the expression of their coding gene [51]. while all categories of lncrnas will no doubt be important in various biological processes, the most targetable lncrnas are intergenic lncrnas.
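the location-based classes above can be illustrated with a simplified classifier that compares a lncrna's coordinates with annotated coding genes. this is a sketch under stated assumptions: the `Locus` type, the 1 kb `promoter_window` used to approximate a shared promoter, and the rule order are hypothetical choices, and enhancer-associated lncrnas are not handled because they require enhancer annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # exclusive
    strand: str          # "+" or "-"
    introns: tuple = ()  # (start, end) pairs; used for coding genes only


def overlaps(a, b):
    """True if two loci on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tss(locus):
    """Transcription start site: 5' end on the locus's own strand."""
    return locus.start if locus.strand == "+" else locus.end - 1


def classify_lncrna(lnc, coding_genes, promoter_window=1000):
    """Assign a location-based class relative to the first matching gene."""
    for gene in coding_genes:
        if gene.chrom != lnc.chrom:
            continue
        if overlaps(lnc, gene):
            if any(s <= lnc.start and lnc.end <= e for s, e in gene.introns):
                return "intronic"
            if gene.strand != lnc.strand:
                return "antisense"
            return "overlapping (unclassified)"
        # divergent transcription from a nearby start site on the other strand
        if gene.strand != lnc.strand and abs(tss(lnc) - tss(gene)) <= promoter_window:
            return "bidirectional"
    return "intergenic"


# hypothetical coding gene with one intron
gene = Locus("chr1", 10_000, 20_000, "+", introns=((12_000, 15_000),))
label = classify_lncrna(Locus("chr1", 50_000, 52_000, "+"), [gene])
```

a real annotation pipeline would work from gtf records and exon structures; the point here is only that the class follows from position and strand relative to coding genes.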
a benefit of studying an intergenic lncrna (lincrna) is the immense variety of molecular techniques that can be applied to it but not to other types of lncrnas, such as antisense, bidirectional and intronic lncrnas, which often overlap coding genes, so that targeting them could lead to unwanted interference with the coding gene. numerous studies have established that lncrnas can regulate the expression of their neighboring coding genes (cis regulation) [51]. a recent study by engreitz et al. demonstrated that a significant portion of lncrnas had cis effects on their neighboring genes [52]. interestingly, in most cases, the cis effect did not require the production of the lncrna transcripts themselves, but instead required the processes associated with their production, such as transcription and splicing [52]. there are many examples of lncrnas that regulate their neighboring coding genes, for example: lnc-marcks, lnc-tnfaip3, as-il1α, lnc-il7r, and il-1β-erna [52] [53] [54] [55] [56]. recently we used genetic mouse models to show that lincrna-cox2 can function as an enhancer rna in cis to regulate its neighboring gene ptgs2 [46]. for these reasons, one aspect to consider when selecting a candidate, no matter the class, is the relationship between its transcriptional expression and that of neighboring coding genes. this candidate selection approach is sometimes referred to as "guilt by association" [57, 58]. this bioinformatic approach drives an initial hypothesis that the lncrna could be involved in a biological pathway similar to that of its neighboring protein-coding gene because of their co-expression. a large number of studies to date have also shown that lncrnas can regulate gene expression on different chromosomes (trans regulation) [59] (fig. 2c). the majority of lncrnas studied in immunity were initially identified following rna-sequencing to examine their expression profiles in specific cell lines or tissues during inflammatory activation.
for example, lincrna-cox2 was initially identified as an up-regulated lncrna in murine dendritic cells following tlr4 stimulation [60], as well as in murine bone marrow derived macrophages (bmdms) following tlr2-dependent stimulation [42]. studies have also highlighted the functions of lncrnas that are highly downregulated following inflammatory activation, such as lincrna-eps [61] and lnc13 [62]. both lincrna-cox2 and lincrna-eps were discovered in bmdms following tlr inflammatory activation and were chosen for further characterization based on their extreme expression profiles. lincrna-cox2 is rapidly upregulated and regulates a large number of interferon stimulated genes (isgs) and nf-κb regulated genes [42]. meanwhile, lincrna-eps is rapidly down-regulated during inflammation and acts as an inflammatory brake on all isgs during periods of homeostasis [61]. these are just two examples of lncrnas that have provided critical insights into the roles of lncrnas in immunity. for more information on specific lncrnas that are involved in innate immunity, we direct you to the following recent reviews on this topic [42, 47, 63]. in addition to these bulk rna sequencing studies, a small number of single cell rna sequencing studies have been performed in both human and mouse that can be utilized to examine differential expression of lncrnas in basal versus treatment conditions or between cell types [64] [65] [66] [67] [68]. numerous rna-seq (both bulk and single cell) datasets are available for a variety of primary cells, cell lines or tissues of interest, either basally or under a multitude of inflammatory or cellular differentiation treatments. these datasets, outlined in table 1 [42, 46, 74-77, 57, 61, 64, 69-73] and table 2 [53, 56, 82-90, 65-68, 78-81], provide a rich source of lncrnas for further investigation. evlncrnas [91], noncode [92] and lncipedia [93] are databases that categorize published information on all annotated lncrnas.
these databases can be utilized to determine if a lncrna is experimentally validated within one or more studies. additionally, these databases can provide information on whether a lncrna possesses multiple isoforms, secondary structure, cross-species conservation and/or disease-association via presence of single nucleotide polymorphisms (snps). multiple studies have demonstrated that lncrna expression is more cell type specific compared to protein-coding genes [94] [95] [96] . such specific expression patterns can often provide important clues into the specific biology that the gene could be involved in [46] (tables 1 and 2) . a variety of consortiums exist for both human and mouse and can be utilized to determine cell type specificity of a lncrna candidate further. for instance, gtex [97] and xena [98] are two websites that include rna sequencing on healthy primary human tissue samples, in addition to samples from patients with diagnosed cancers. this will further assist the initial understanding of the expression of the lncrna in specific tissues as well as obtaining information on whether a lncrna is involved in cancer. in contrast, if a researcher is studying a mouse candidate lncrna, the mouse cell atlas (mca) [99] , as well as tabula muris [100] are excellent tools to assess specificity in cellular expression, as well as differential splicing isoforms amongst differentiated cell types. additional websites for mouse and human expression datasets can be found at the encode project [101] , the european bioinformatics institute expression and the fantom projects [102] . these sites are filled with raw and analyzed data sets from either single cell or bulk rna sequencing from primary cells, tissues or immortalized cells ready to use to determine the statistical significance of expression for any annotated lncrna candidate. 
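cell type specificity of a candidate can be summarized numerically once expression values have been retrieved from such resources. the sketch below uses the tau index, a widely used tissue-specificity score that is not named in this review; it is swapped in here purely as an illustration, and the function name and input layout are assumptions.

```python
def tau_specificity(expression):
    """Tau index of expression specificity across tissues or cell types.

    expression : non-negative expression values (e.g. tpm), one per
                 tissue or cell type.

    Returns a value from 0 (uniform expression) to 1 (expressed in a
    single tissue or cell type).
    """
    values = [max(0.0, float(v)) for v in expression]
    if len(values) < 2:
        raise ValueError("need expression in at least two tissues or cell types")
    peak = max(values)
    if peak == 0:
        return 0.0  # not expressed anywhere; specificity is undefined, report 0
    return sum(1 - v / peak for v in values) / (len(values) - 1)


# hypothetical expression profiles across four cell types
restricted = tau_specificity([10, 0, 0, 0])
ubiquitous = tau_specificity([5, 5, 5, 5])
```

a lncrna scoring near 1 is expressed in essentially one cell type, which would point to the biological context most worth testing first.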
transcriptional activation is depicted basally ('part i'), as well as during inflammatory activation ('part ii') following stimulation of a pattern recognition receptor (prr). three example lncrna genes are shown: a, b and c. part iii depicts how a lncrna can undergo differential isoform expression, which is regulated co-transcriptionally. during active transcription, a lncrna can either undergo differential splicing or utilize a new transcriptional start site following an inflammatory stimulus such as lipopolysaccharide (lps). post-transcriptional regulation of a lncrna is broken up into three parts: iv, v and vi. after transcription is completed, there are several processes that a lncrna can undergo. part iv depicts rna modifications, which can change the structure of a lncrna molecule. these modifications can be added or removed depending on the inflammatory state of a cell. part v depicts the process of mirna biogenesis, where a lncrna can be processed to a mature mirna. part vi shows that a lncrna can also be translated if it has a small open reading frame (smorf). (c) the lower right panel of the figure illustrates the regulatory function of a lncrna in the nucleus and the cytoplasm. during active transcription (basally, 'part i', or during inflammation, 'part ii'), a lncrna can function to repress genes (mrna gene a and c) or activate genes (mrna gene a and b). a lncrna can either be a scaffold for transcription factors to enhance activation, or a scaffold for chromatin remodeler proteins to open or close chromatin. a lncrna can also regulate a mrna transcript co-transcriptionally ('part iii') by affecting the stability, the splicing activity, the editing of modifications, or even the capping of the mature mrna. finally, post-transcriptionally a lncrna can function to regulate the mrna in several ways. a lncrna can affect the stability of a mrna transcript ('part iv and part v').
alternatively, a lncrna can function as a mirna sponge, which indirectly de-represses the expression of a mrna that would otherwise be targeted by the mirnas. lastly, a lncrna can modulate the translation of a mrna by binding to ribosomes or mrna transcripts during translation. genome-wide association studies (gwas) have revolutionized the study of complex diseases by allowing quantitative disease-association of thousands of genetic loci [103]. these studies include evaluation of single-nucleotide polymorphisms (snps) or deletions and determination of their association with a disease phenotype. diseases studied range from inflammatory bowel disease (ibd) to schizophrenia [104, 105]. until recently, most gwas focused on protein-coding genes, even though 90% of disease-associated snps lie in noncoding regions of the genome [106]. there are several databases that summarize the plethora of published human sequencing studies, including uk biobank [107] and the gwas catalog (embl-ebi) [108]. other databases specifically focus on snps within lncrnas, such as lncrnasnp2 [109] and lnc2catlas [110]. to date, a couple of studies have clearly shown how snps from gwas can be used to identify clinically relevant lncrnas. castellanos-rubio et al. identified a snp, rs917997, associated with celiac disease and showed it was located within a novel lncrna, lnc13 [62]. lnc13 regulates inflammatory genes and mediates its function via an interaction with an hnrnp protein [62]. furthermore, they showed that the snp disrupted the rna-protein interaction, thus making the lncrna dysfunctional [62]. another fascinating study began by mining gwas for atherosclerosis-associated snps, which led to the discovery of a novel lncrna, linc00305, that had 5 snps associated with atherosclerosis, all located within an intronic region [111].
the group went on to characterize linc00305 in human primary and immortalized monocytes as a promoter of inflammation that activates the aryl-hydrocarbon receptor repressor (ahrr)-nf-κb pathway by directly binding to lipocalin-1 interacting membrane receptor (limr), acting as a scaffold to promote the interaction between limr and ahrr [111]. both of these studies started by investigating clinically relevant snps that led to the discovery of disease-associated lncrnas, providing the groundwork for future studies on potential biomarkers or the development of novel therapeutic targets for a variety of inflammatory diseases. additionally, as previously mentioned, xena and lnc2catlas are databases that combine rna sequencing of tissue samples from healthy individuals and cancer patients; these tools can also be adapted to assess the association of splice variants with age, gender, tissue or even disease [98, 110]. these disease-associated snps or splice variants will help guide further studies to determine therapeutic or biomarker potential. in summary, assessing disease association using a variety of databases can provide insight into the function of a lncrna [112]. unlike lncrnas, coding genes are highly conserved across distal and related species. the requirement of coding genes to encode functional peptides likely constrains the variation within the open reading frame (orf) sequence [113]. lncrnas by definition are not translated and often have poor sequence conservation across related species. many lncrnas display species-specific expression. thus, inferring function based on sequence similarity is a challenge (reviewed in [114]). a more useful conservation metric is to assess whether the lncrnas have conservation of synteny (location relative to flanking coding genes), along with expression conservation [115]. the databases lncipedia and noncode have user-friendly interfaces, allowing assessment of both conservation of synteny and sequence for lncrnas.
additionally, expression conservation can be useful to assess whether the specific lncrna has the same biological role in similar and divergent species [116]. conservation of lncrna expression can also be indicative of conservation of regulatory regions, such as transcription factor binding sites within promoters [117]. however, conservation of expression does not necessarily mean the rna product is important for lncrna function. enhancer rnas (ernas), for example, are thought to function predominantly by creating a localized, active transcriptional state, which can activate neighboring genes [118] [119] [120]. it is unclear to what extent the specific rna sequence of ernas is important for their function [118]. nevertheless, a couple of studies have provided examples of ernas for which the transcript sequence was necessary for their function [121, 122]. to further investigate conservation of a candidate, blast (basic local alignment search tool) on ncbi (national center for biotechnology information) [123] or lncrnadb v2.0 [124] can be utilized to explore conservation across species by inputting the entire sequence of a lncrna of interest. if a lncrna has a known structure, inputting the shorter structured rna sequence can enhance conservation results. one can also view the conservation track of the ucsc genome browser [125] to assess whether this specific sequence within the lncrna transcript is conserved across species. in summary, conservation of a lncrna is complicated to assess using current bioinformatic methods. however, knowing whether there is a functional motif (and therefore a shorter starting input sequence) within a lncrna can improve the assessment of functional conservation across species.
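conservation of synteny, as opposed to sequence conservation, can be checked by comparing the coding genes that flank a lncrna in two species. the following is a minimal sketch; the function names, the ordered gene lists, the ortholog mapping and all gene names are hypothetical, and the resources named above would supply the real annotations.

```python
def flanking_genes(lnc_name, gene_order):
    """Return the (upstream, downstream) neighbors of a lncrna.

    gene_order : gene names along one chromosome in positional order,
                 including the lncrna itself.
    """
    i = gene_order.index(lnc_name)
    upstream = gene_order[i - 1] if i > 0 else None
    downstream = gene_order[i + 1] if i + 1 < len(gene_order) else None
    return upstream, downstream


def is_syntenic(flank_a, flank_b, orthologs):
    """True if the flanking genes in species a map to those in species b.

    orthologs : dict mapping species-a gene names to species-b gene names.
    Sets are compared so that an inverted locus still counts as syntenic.
    """
    mapped = {orthologs.get(g) for g in flank_a if g is not None}
    mapped.discard(None)
    target = {g for g in flank_b if g is not None}
    return bool(mapped) and mapped == target


# hypothetical locus, inverted in the second species
species_a = ["Up1", "lncX", "Down1"]
species_b = ["DOWN1", "LNCX", "UP1"]
orthologs = {"Up1": "UP1", "Down1": "DOWN1"}
conserved = is_syntenic(
    flanking_genes("lncX", species_a),
    flanking_genes("LNCX", species_b),
    orthologs,
)
```

a positive result only says that the locus sits in the same genomic neighborhood; expression conservation still has to be assessed separately.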
activation of inflammatory pathways results in both up- and down-regulation of specific lncrnas, which in turn can have either positive or negative regulatory effects on the pathway, such as activation of sequestered transcription factors [45, 126] or enhanced or repressed expression of specific inflammatory cytokines [61, 84, 127]. genes that are immediate regulators of immunity are poised for transcriptional activation, which can be assessed by defining the openness of the promoter region of a lncrna pre- and post-inflammatory stimulation. common methods for assessing chromatin accessibility of promoters include dnase hypersensitivity (dnase hs) [128] and the assay for transposase-accessible chromatin (atac) [129]. dnase hs-seq and atac-seq datasets are available from a variety of tissues on encode for both mouse and human (tables 1 and 2). if the promoter is open (accessible), this is indicative of either a poised or an actively transcribed gene. accessibility of promoter regions in the hematopoietic cell lineage was assessed by lara-astiaso et al., who performed atac-seq for all cells in the hematopoietic lineage [76]. this dataset provides insight into a gene's promoter accessibility, as well as cell type specificity. for instance, if the promoter is open in all cell types, it is ubiquitously accessible and possibly expressed, while if a promoter is only accessible in myeloid cells or terminally differentiated macrophages, this provides insight into the cell type that could be most biologically relevant for a particular lncrna. another interesting dataset, from tong et al., provides atac sequencing from bone marrow derived macrophages (bmdms) pre- and post-inflammatory time course stimulation [72]. this dataset assesses both poised genes and genes that undergo promoter remodeling during inflammatory activation.
whether the promoter of a lncrna is inaccessible or accessible during inflammatory stimulation could provide insight into its regulation and biological significance during an immune response. once the accessibility of a promoter region is determined, defining post-translational modifications of histones at promoter regions will assess promoter activity in specific cell types or inflammatory states [130] [131] [132]. a histone modification is a covalent post-translational modification (ptm) of histone proteins, which includes methylation, phosphorylation, acetylation, ubiquitylation, and sumoylation (reviewed in [133]). post-translational histone modifications do not affect the dna nucleotide sequence but can modify chromatin availability to the transcriptional machinery [134]. identifying the types of epigenetic histone marks adds further layers of understanding as to whether the candidate is active, poised, silenced or an enhancer. many publicly available datasets provide this information, as outlined in tables 1 and 2. while the examples we outline here are immune focused, there is a vast array of additional primary and immortalized cell lines for every histone mark available through the easily accessible encode database [101]. finally, the transcriptional regulation of a lncrna can be defined by analyzing the transcription factor (tf) motifs that lie within the predicted promoter region [135] [136] [137]. rnareg2.0 [138] and homer [139, 140] are useful tools to predict tf binding sites by motif analysis. tf motifs can be indicative of biological pathway regulation, indicating when a gene is expressed. the predicted tfs can then be put into a gene ontology tool (panther or david) to assess how a candidate lncrna is transcriptionally regulated. for instance, the presence of pioneering transcription factors could provide information on the cell type specificity of a lncrna.
on the other hand, if the promoter motifs are enriched for p65, interferon response factors (irfs) or activating transcription factors (atfs) this would be indicative of inflammatory specific expression. this finding can be further supported by chromatin immunoprecipitation (chip) sequencing data sets from a multitude of labs, as well as data from the encode project (outlined in tables 1 and 2). the categorization of lncrnas as "noncoding" is initially determined bioinformatically using the arbitrary cut-off of < 100 codons [141] . this leaves the possibility that some lncrnas may be mrnas. therefore, one of the first steps in characterizing a candidate lncrna is to confirm that it is noncoding. for example, in drosophila, a gene annotated as a lncrna fbgn0087003, was shown to encode multiple small~ten amino acid peptides critical for development [142] . the discovery of these germline-encoded biologically active peptides has opened the door into new and exciting levels of regulation. however, this discovery also shows that we should be cautious when characterizing a lncrna to confirm that indeed they are noncoding. several bioinformatic tools exist for predicting small orfs (smorf) but have noted that the predictive ability for smaller orf size is in general very poor [143] . phylocsf uses codon substitution frequency, together with conservation across multiple species, to provide a score metric that can be used to determine the presence of a conserved orf [113] . other approaches have sought to identify novel small peptides using mass spectrometry. however, a major challenge is determining whether the peptides identified correspond to novel smorfs or represent degraded intermediates of larger proteins [144] . the development of ribosomal foot-printing coupled to next-generation sequencing (ribo-seq) has provided a powerful quantitative method for assessing global translation [145, 146] . 
ribosome profiling allows ribosome-protected, nuclease-resistant rna fragments to be mapped to transcripts, enabling quantitative measurement of translation efficiency [146]. in addition to mapping orfs, ribo-seq can be performed with a drug that stalls ribosomes at the start codon to globally map translation start sites; this revealed significant numbers of non-canonical translation initiation events from ctg codons [146]. ribo-seq has also found that many lncrnas appear to be translated, raising the possibility that some of these transcripts could be producing small peptides [145]. guttman et al. developed the ribosome release score metric as part of the ribosome profiling analysis pipeline to more accurately predict translational efficiency. their findings show that ribosome occupancy of lncrnas and 5′utrs does not always equate to translation [147]. furthermore, guttman et al. concluded that most noncoding transcripts are not translated into peptides. several additional studies have examined lncrna binding to ribosomes and have concluded that ribosome binding may not be functional and may instead serve as a quality control process to degrade transcripts with low coding potential via the nonsense-mediated decay (nmd) pathway [148, 149]. a recent study by jackson et al. used ribo-tagging in lps-stimulated mouse macrophages to identify ribosome footprints within hundreds of annotated lncrnas, raising the possibility that they may be producing functional peptides [150]. the authors characterized an 83-amino-acid peptide located within the previously annotated lncrna aw112010 and showed that it was produced by non-canonical "ctg" translation initiation [150]. they demonstrated that this small peptide had a critical role in mucosal immunity in mice and was specifically required for the expression of il12 mrna [150]. the exact mechanism by which this peptide drives il12 expression is yet to be elucidated.
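the codon cut-off and the non-canonical ctg initiation discussed in this section can be illustrated with a small open reading frame scan. the sketch below is a minimal illustration and not one of the published tools named above: it scans the forward strand only, and the function names (`find_orfs`, `longest_orf_codons`, `passes_noncoding_cutoff`) and the default `max_codons=100` are assumptions chosen to mirror the "< 100 codons" convention described in the text.

```python
# illustrative sketch only: forward-strand orf scan with a codon cut-off.
# all function names and defaults are hypothetical, chosen for this example.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=("ATG",)):
    """return (start, end, n_codons) for every orf on the forward strand.

    start/end are 0-based and end-exclusive (the stop codon is included);
    n_codons counts sense codons only, so the stop codon is excluded.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first in-frame start codon opens the orf
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, (i - start) // 3))
                start = None  # look for the next orf in this frame
    return orfs


def longest_orf_codons(seq, start_codons=("ATG",)):
    """length in codons of the longest orf, or 0 if none is found."""
    return max((n for _, _, n in find_orfs(seq, start_codons)), default=0)


def passes_noncoding_cutoff(seq, max_codons=100, start_codons=("ATG",)):
    """true if the longest orf is shorter than the cut-off."""
    return longest_orf_codons(seq, start_codons) < max_codons
```

passing `start_codons=("ATG", "CTG")` includes ctg-initiated frames in the scan. a transcript that passes this cut-off can still encode a small peptide, which is why the text recommends conservation-based scoring and ribosome profiling as follow-up evidence.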
the laborious process of functionally characterizing lncrnas has remained a major limitation to lncrna functional discovery. for example, while~16,000 long noncoding rnas (lncrnas) have been identified in the human genome, only 3% of all validated lncrnas have an ascribed function [47, 91, 151] . for more than two decades since its discovery, rna interference (rnai) has been the method of choice for loss of function studies. the ease and versatility of rnai make it appealing for use, requiring short complementary small rnas transfected into cells that can then utilize the endogenous cellular machinery to target specific transcripts [152, 153] (fig. 3a) . additionally, short hairpin rnas (shrnas) can be expressed in a lentiviral context to accomplish stable rnai in cells [154] . rnai is most active in the cytoplasm, which makes it most useful for targeting mrnas and lncrnas that reside in the cytoplasm [155] . success in knocking down some lncrnas, for example lincrna-cox2, has been attributed to the fact that this lncrna is expressed both in the cytoplasm and the nucleus (perhaps cycling between both compartments) and hence is susceptible to rnai [42] . however, many lncrnas are thought to be nuclear-restricted where rnai has been demonstrated to have limited efficiency [156] (fig. 3) . antisense oligonucleotides (asos) can be either rna or dna-based and can be used to target complementary sequences within a transcript. unlike rnai, asos do not engage the cellular rnai machinery [157] . instead, asos function by hybridizing to the target rna and inhibiting its function by either inducing the rnase h pathway or by steric inhibition. the rnase h enzyme is part of a cellular pathway that normally functions to resolve unwanted dna: rna interactions that can occur during replication and/or transcription [158] . 
one of the most widely used types of asos for targeting lncrnas are gapmers, which contain a "hybrid" modified/unmodified configuration consisting of a 10 nt dna core flanked by 2′-o-methyl or lna-modified synthetic nucleotides (fig. 3b). gapmers offer the benefit of modifications for stability and reduced toxicity, while still allowing engagement of the rnase h pathway (fig. 3b) [159]. efficient depletion of nuclear lncrnas has been demonstrated using modified dna antisense oligonucleotides [155]. ilott et al. used gapmers to successfully knock down tlr-induced enhancer rnas after being unable to knock down these same ernas with four different sirnas [56]. recent in vivo applications of dna oligonucleotides have also proven successful as a possible treatment for a neurodevelopmental disorder called angelman's syndrome. angelman's syndrome is a monogenic disorder caused by mutations in the e3 ubiquitin ligase 3a (ube3a) [160, 161]. as ube3a is a paternally-imprinted gene, a mutation in the maternal allele is sufficient to lead to disease [160, 161]. paternal imprinting of ube3a requires the expression of an antisense lncrna called ube3a-ats. therefore, targeting the paternal lncrna ube3a-ats using asos leads to de-repression of paternal ube3a, compensating for the defective maternal copy [161]. in a mouse model, the authors showed that even partial restoration of ube3a protein expression ameliorated some cognitive deficits associated with the disease [161]. mechanistically, it is important to dissect which features of a lncrna are important for its function, for example, determining whether the lncrna transcript is required or whether the mere act of transcription is the important feature needed to mediate its function (discussed further below). sirnas and asos are thought to inhibit gene expression at the posttranscriptional level, albeit by different mechanisms [162, 163].
nevertheless, there are examples of sirnas (reviewed in [164]) mediating transcriptional inhibition. targeting sequences near the 5′ end of the transcript can induce the "torpedo effect," resulting in premature termination of transcription [165, 166]. therefore, when targeting lncrnas with asos or sirnas, it is important to consider where along the transcript the target site lies, to ensure that the biological interpretation of the ablation is correct. the introduction of crispr/cas9 technology has revolutionized the field of functional genomics by providing a novel tool for interrogating gene function. cas9 is a deoxyribonuclease (dnase) that can be specifically targeted to genomic regions via a guide rna (grna) [167, 168]. targeting of cas9 to a region results in a blunt double-stranded dna break that engages the cellular non-homologous end-joining (nhej) dna repair pathway, which promotes imprecise repair, yielding small deletions in the repaired sequence. these small deletions can result in a frame-shift that disrupts the orfs within coding genes, thereby disrupting protein synthesis. lncrnas do not contain orfs and can span tens of kilobases of sequence. as such, targeting them with a single grna is thought to be insufficient to disrupt their function [169]. an alternative application of cas9 that has proven effective for targeting lncrnas involves using two grnas flanking the lncrna region of interest to induce deletion of the entire locus (fig. 3e). the main advantage of deleting lncrnas is obtaining complete loss-of-function, as demonstrated in a recent study that performed a crispr-mediated deletion screen to identify lncrnas that positively and negatively regulate cancer growth [170]. on the other hand, deletion of the dna sequence may make it impossible to resolve whether a phenotype is due to loss of lncrna production or loss of dna sequence (discussed below) [171].
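the two-grna deletion strategy described above can be sketched as a search for guide sites on either side of a locus. this is a minimal illustration under stated assumptions that are not taken from the text: it assumes the commonly used streptococcus pyogenes cas9, which needs an "ngg" pam directly 3′ of a 20 nt protospacer and cuts about 3 bp upstream of the pam; it scans the forward strand only; and it performs no off-target or efficiency scoring. the function names and the `flank` parameter are hypothetical.

```python
# illustrative sketch only: list forward-strand guide pairs whose predicted
# cut sites flank a locus. assumes an ngg pam and a 20 nt protospacer.


def forward_guides(seq, guide_len=20):
    """return (protospacer, cut_index) for every forward-strand ngg site.

    cut_index is the index of the first base 3' of the predicted cut,
    i.e. the cut falls between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    guides = []
    for pam_start in range(guide_len, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":  # n-g-g pam
            protospacer = seq[pam_start - guide_len:pam_start]
            guides.append((protospacer, pam_start - 3))
    return guides


def flanking_pairs(seq, locus_start, locus_end, flank=200):
    """pair upstream and downstream guides around [locus_start, locus_end).

    returns (upstream_guide, downstream_guide, deletion_length) tuples in
    which the predicted deletion removes the whole locus.
    """
    guides = forward_guides(seq)
    upstream = [g for g in guides
                if locus_start - flank <= g[1] <= locus_start]
    downstream = [g for g in guides
                  if locus_end <= g[1] <= locus_end + flank]
    return [(u, d, d_cut - u_cut)
            for u, u_cut in upstream
            for d, d_cut in downstream]
```

a real design would also scan the reverse strand and would check that the deleted interval does not overlap annotated enhancers or neighboring genes, since the text notes that loss of dna sequence can confound interpretation.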
in an alternative approach, liu and colleagues recently performed a crispr screen targeting the splice sites of over 10-thousand lncrnas and identified 230 lncrnas that proved essential for viability (fig. 3d) [172] . targeting of splice sites has also been shown to induce exon skipping within coding genes [173] . lncrnas have many of the same regulatory elements as coding genes. hence future studies may opt to target tf binding sites, secondary structure and/or polyadenylation sites as a way to more finely dissect the functional portions of lncrnas. the ease in which cas9 can be targeted to specific genomic regions sparked the development of a modified (catalytically inactivated) version of the protein fused to the krab (krüppel associated box) chromatin-silencing domain termed crispri [174, 175] (fig. 3c) . crispri can be used to target both coding and noncoding genes (such as lncrnas), triggering localized heterochromatin-silencing at the transcription start site (tss) [176] . unlike rnai, the transcription-based inhibition by crispri offers the ability to efficiently target lncrnas regardless of their localization in the cell. gilbert et al. have shown maximum knock-down efficiency when targeting regions +50 to +500 nucleotides relative to the transcription start site (tss) [174] . however, crispri is limited by the availability of an accurately annotated tss, particularly for lncrnas. incorporating cage-seq, gro-seq and/or chip-seq data into grna design may improve efficient targeting of a lncrna of interest [177] . nevertheless, a recent study from the weissman and lim groups utilized crispri to target 16,401 lncrna loci in 7 diverse cell lines and identified hundreds of lncrnas required for cell growth [96] . therefore, despite its caveats, it appears that crispri can be a useful and powerful tool for interrogating lncrna biology. an alternative approach to gene ablation is overexpression, gain-offunction. 
plasmid-based overexpression systems have been used for decades to overexpress specific coding genes [178]. plasmid-based overexpression of lncrnas is possible but is limited by the size of the specific lncrna (max size: 6-8 kb; unpublished observations). moreover, because lncrnas can function in cis, it is important that there is sufficient mechanistic characterization of the candidate lncrna to justify cloning it into a plasmid. the same catalytically inactivated cas9 has also been fused to a transcriptional activation domain, for example vp16, a strong transcriptional activator derived from herpesviruses [179, 180]. this strategy, termed crispra, allows crispr-based transcriptional gene activation that can be used to study gain-of-function phenotypes. the zhang group recently performed a crispr activation screen targeting over 10,000 lncrnas [181]. they identified 11 lncrnas that, upon activation, mediated braf inhibitor resistance in melanoma cells [181]. one of the advantages of using a crispra library is that the same library can be used across different cell types, which makes it more cost effective. nevertheless, there are several caveats to the crispra system, including the possibility that high expression of lncrnas may create non-physiological conditions, leading to incorrect conclusions about specific biology. lncrnas have been shown to mediate functions via binding to specific proteins. hence overexpression of a lncrna without its protein partner may result in an inability to identify important biology. in conclusion, it is important to understand the advantages and disadvantages of the loss- or gain-of-function methods used to modulate lncrna expression. all methods have their caveats, and these are important to consider when deciding which method to use. lncrnas can span large stretches of dna sequence and can contain important regulatory regions, such as enhancers, that are functionally independent of the lncrna product [182].
lncrna promoters have been proposed to also function as enhancers, promoting the recruitment of transcriptional-activating factors that can affect the local nucleosome environment and ultimately the expression of neighboring genes [118, 183, 184] . nevertheless, in vivo assessment of lncrna function has predominantly relied on assessing the consequence of deleting entire lncrna loci [171, 185] . numerous studies involving deletion of lncrna loci have been unable to rescue the deletion phenotype using a transgene approach, making it difficult to attribute the phenotype to the lncrna product itself [186, 187] . while deleting the entire lncrna is a useful first step to establish a phenotype, this approach can make it difficult to identify which component of the lncrna is important for the observed phenotype [171, 185, [188] [189] [190] . there are many approaches to generating a lncrna knockout/ knockdown mouse. for instance, the complete knockout is the easiest first step to take, which can be designed to remove the entire gene. after this approach, one can use a more fine-tuned approach to reveal exactly how the gene is working, including deleting specific regions of the gene or inserting a poly-adenylation cassette before the exon1 of the gene. a well-documented example of this phenomena is fendrr, which was shown to have a lethal phenotype in two independent studies, but the importance of the lncrna in development differed due to the mouse ablation strategy. one in vivo study generated a knockout mouse by removing the genomic loci completely and replacing it with lacz while leaving the native promoter intact [185] . this study identified fendrr as a key regulator of lung development and mesenchymal differentiation. however this study did not attempt to rescue the ablated fendrr allele via a transgene [185] . an additional group generated a fendrr knockdown mouse using an alternative approach. 
instead of disrupting the chromatin architecture or removing any possible dna enhancer regions, they inhibited the transcription of fendrr through the insertion of a poly-a cassette into exon1 [191] . using this lncrna knockdown mouse design, the scientists determined that loss of fendrr led to heart and body wall defects, which is slightly different from the phenotype in the ko mouse study. importantly, the heart and body wall defects caused by the terminator insertion were rescued by a transgene of fendrr in vivo, further confirming that the phenotype is due to the rna while the ko phenotype could be due elements within the genomic dna [191] . fendrr is located~4 kb downstream of foxf1 and~12 kb upstream of irf8. the observed phenotype of the ko mouse is possibly due to the deletion of an enhancer element that could impact the expression of protein-coding genes, which could have roles in lung development [192, 193] . we recently used multiple genetic mouse models to dissect both cis and trans functions of lincrna-cox2 in vivo [46] . working with a complete lincrna-cox2 knockout [185] , in which the gene is replaced with a lacz cassette, we observed a strong cis defect on the neighboring protein-coding gene ptgs2. from these studies, we concluded that lincrna-cox2 functions in cis through an enhancer rna mechanism to regulate its neighboring gene ptgs2 [46] . in order to determine how lincrna-cox2 functions in trans to regulate genes independent of its cis effects, we generated a mutant/intronless mouse by targeting the splice sites of lincrna-cox2 using crispr/cas9. this mouse represents a knockdown mouse, and because there is a low level of transcription of lincrna-cox2, ptgs2 levels are the same as wt. lincrna-cox2 is not inducible in the mutant mouse, probably because of transcript instability due to the lack of splicing, enabling us to study its trans regulatory roles. 
similar to our earlier in vitro work, we observed using an lps shock model that many genes are either up- or downregulated in the serum of mutant mice, indicating that lincrna-cox2 can indeed function in trans to regulate immune genes in vivo [46]. to prove genetically that a lncrna is functioning in trans, a trans rescue experiment can be performed using transgenic mice that constitutively express the lncrna. this rescue strategy has been utilized in studies of evf2 [194, 195], jpx [196] and pnky [197], all of which demonstrate successful rescue experiments in which the phenotype from germline ablation of a lncrna is rescued through generation of, and crossing with, a transgenic animal.

understanding the mechanism of action of a lncrna

lncrnas are immensely adaptable molecules that are capable of working through rna-rna, rna-dna, or rna-protein interactions. rna-directed technologies such as chromatin isolation by rna purification (chirp) [198, 199] or rna antisense purification (rap) [200, 201] will help uncover lncrna interactomes, whether rna, genomic or protein partners, for highly expressed candidates. if a candidate is lowly expressed, one can exogenously introduce a biotinylated form of the lncrna using the rna pull-down method, which we have successfully used to identify binding partners for lincrna-cox2. this has been performed for many lncrnas [42, 202, 203]. web servers can assist in quickly assessing experimentally determined or predicted rna-rna interactions [204] or even rna-protein interactions [205, 206]. functions of lncrnas are associated with their subcellular fates. the subcellular or extracellular compartment in which a lncrna localizes will allude to the regulatory role of the gene, as well as to how the lncrna might execute its function [207] [208] [209].
some lncrnas that are localized to the nucleus or chromatin have been experimentally shown to function in cis to regulate the transcription of a neighboring gene, or in trans to regulate the transcription of a subclass of genes through interactions with heterogeneous nuclear ribonucleoproteins (hnrnps) [42, 54, 60, 210]. in the cytosol, some lncrnas have been shown to interact with rnas and proteins to carry out their molecular functions [42, 79, 211, 212]. cellular localization of lncrnas can be predicted using a publicly available, user-friendly web server established at http://lin-group.cn/server/iloc-lncrna [213]. this can be a first step for a researcher attempting to predict the localization of a candidate based on its sequence. iloc-lncrna predicts the subcellular location of a lncrna by incorporating 8-tuple nucleotide features into the general pseknc (pseudo k-tuple nucleotide composition), and rigorous tests show that the overall accuracy achieved by the new predictor is 86.72%, which is over 20% better than previous algorithms [213]. in addition to prediction methods, there are also publicly available datasets from rna sequencing of fractionated cells. bhatt et al. utilized murine bone marrow derived macrophages, with and without inflammatory stimulation, and fractionated these cells into chromatin, nuclear, and cytoplasmic compartments to determine rna localization [74]. this data set can now be utilized to investigate the localization of any murine candidate expressed in macrophages. a recent study took this question a step further by performing rna sequencing on nine separate locations within a cell: nucleus, nucleolus, nuclear lamina, nuclear pore, cytosol, endoplasmic reticulum membrane (erm), outer mitochondrial membrane (omm), mitochondrial matrix (mito), and endoplasmic reticulum lumen [214].
this exciting study utilized an apex-sequencing method, where the peroxidase enzyme apex2 was localized to these nine separate locations in nine separate human embryonic kidney (hek) 293t cell lines. apex2 can biotinylate nearby rna molecules, allowing for streptavidin-based immunoprecipitation and rna sequencing. the apex-seq datasets will provide a powerful resource for referencing the localization of specific lncrna candidates that are expressed in hek293t cells [214]. there are a few commonly used experimental approaches that can be used to validate and determine the localization of a lncrna. subcellular chromatin, nuclear and cytoplasmic fractions of any primary or immortalized cells can be prepared using previously published procedures [74, 215], followed by rna isolation and rt-qpcr to assess the localization. if additional compartmental fractionation is desired, other compartments can be enriched, such as mitochondria [216, 217]. another gold standard technique is to visually determine cellular localization by rna fish [42]. rna localization can also be directly visualized by microscopy [218]. seqfish techniques have recently been pioneered for imaging thousands of cellular rnas at once using barcoded oligonucleotides [219]. the drawbacks of these fluorescence in situ hybridization (fish) based approaches, however, are the need for cell fixation and permeabilization, which can re-localize or extract cellular components [220], and the difficulty of assigning rnas to specific organelles or cellular landmarks due to spatial resolution limits. some of these difficulties can be overcome with the addition of stains for markers of specific organelles. conventionally, lncrnas display poor sequence conservation across species, with the exception of finite regions of conserved bases surrounded by large seemingly unconstrained sequences [221].
while sequence conservation does not constrain lncrna genes, lncrna function is found to be conserved across species when identifying motifs or structure. a great example of this phenomena is represented in the study of human maternally expressed gene 3 (meg3), which utilized the computer program mfold and multiple ex vivo and in vitro chemical probing techniques to identify common motifs critical for retained function in orangutan, rat, mouse and pig [222, 223] . another study highlighting the marsupial rsx lncrna was initially found to have no linear sequence similarity with the lncrna xist. however, it shared substantial levels of non-linear conservation within k-mer repeats that share functionally analogous protein-binding domains [224] . publicly available web servers can be utilized to determine the rna structure of a lncrna depending on the size. the caveats to these web servers are they do not work efficiently with large transcripts. to overcome this, one can attempt to identify the critical sequence within the lncrna that could be functional using rip-sequencing databases, which allow you to input a gene id to identify possible rna binding proteins (rbps). additionally, if a lncrna has already been identified to bind to a protein (s), rip-sequencing databases can elucidate the specific binding location(s). knowing the location of rna-protein interaction narrows down the sequence input, that can be used for many web servers, which could enhance the elucidation of a predictive structure. if there is no information about potential protein binding partners, then these rna structure webservers will be of little use. an alternative approach to further investigate structure is to utilize bioinformatic tools which now include parameters for covariation. 
covariation analysis identifies positions in an rna molecule that show similar patterns of variation; such covariation is attributed to structural constraints, as initially shown for ribosomal rna [225] and now also for lncrnas [123, 226]. one such study predicts structures for malat1, using over 130 vertebrate sequences, as well as for the lncrnas repa and hotair [227]. these powerful tools allow scientists to predict structures in an rna molecule based on covariance, which in turn drives the next steps of experimentally validating these findings. fortunately, there are several in vitro and in vivo experimental techniques used to assess rna structure for transcripts of any size, even up to 17,000 nt (xist). dimethyl sulfate (dms) probing uses a base-specific reagent that can bind and alter the methylation state of unpaired adenosine and cytosine nucleotides [228, 229]. dms "footprinting" is optimized for structural analysis of rna. protein binding to rna will generate a "footprint" that can be traced due to alterations in the rna structure. the transcript size that can be evaluated is rather small (< 500 nt), but this method can be performed both in vitro and in vivo, as dms can easily penetrate the cell membrane, and it has been shown to work for lncrnas such as the 590 nt braveheart (brvht) [230]. xue et al. utilized dms and shape to determine the multiple smaller order structures of brvht, including an agil motif and a 90-degree turn [231]. this agil motif is critical for transcription factor binding, which specifies the cardiovascular lineage. targeted structure-seq relies on rna methylation by dms performed in vivo. using this method, structural models of elements within xist were developed [232].
shape (selective 2′hydroxyl acylation by primer extension), as well as the modified shape-map [233] , in-cell shape-seq [234] and icshape-seq [235] , can interrogate the rna structure both in vitro and in vivo using the chemical nmia and its derivatives to detect flexible regions in rna secondary structure [236, 237] . this method has been proven valuable for xist [238] , repa [239] , and rox1/2 [240] . paris (psoralen analysis of rna interactions and structures) was recently developed to determine both rna structure and interactions in vivo [241] . using this approach, a model for the higher order structure of xist was interrogated [241] . these approaches are critical for identifying structural motifs and enhance conservation studies, as well as these identified elements, can be used as novel targets for further exploration in precise intervention suitable for therapeutic applications. alternative splicing (as) significantly impacts the diversity of rna isoforms produced, which in turn impacts the protein isoforms produced and can affect many aspects of the protein's biology including binding, intracellular localization, enzymatic activity, stability, posttranslational modifications [242] . as also impacts lncrna genes which can have multiple isoforms depending on the cell/tissue, age, and disease state [243] [244] [245] . the ucsc genome browser [125] , as well as noncodev5 [92] , have all the annotated isoform transcripts for each gene. these tools will identify annotated transcript isoforms while rna sequencing will provide information on which of these splicing events is utilized in a given cell or biological state. to date, there have only been a small number of papers focusing on the role that alternative splicing plays in controlling the immune system. 
one recent global study has shown that widespread shortening of 3′ untranslated regions and increased exon inclusion are evolutionarily conserved features of innate immune responses in primary human macrophages following listeria monocytogenes and salmonella typhimurium infection [78]. this is a transformative study for mrnas, and its data can be reanalyzed to examine as and possible contributions from lncrnas. publicly available resources for tissue-specific isoform expression of human genes are available through gtex [97] and xena [98]. tabula muris [100], a murine-specific dataset, is now available as a ucsc genome browser track on mm10 and can be used to view cell-type specific splicing events and isoform expression. in order to identify isoforms in illumina rna sequencing datasets, several tools can be utilized. miso (mixture of isoforms) is software that applies a probabilistic model to rna-seq data and can identify the specific 5′ splice sites used for each isoform [246]. three other tools highly used for splicing analysis are juncbase [247], majiq-spel [248], and drimseq [249]. these tools can be used to define whether a candidate gene undergoes any alternative splicing (alternative start site, exon inclusion/exclusion or alternative last exon) during a specific biological process, for example following inflammatory activation. these tools are limited by their dependence on a fully annotated transcriptome. therefore, if a lncrna is unannotated or has unannotated transcriptional isoforms, these events will not be captured. to overcome the limitations of incomplete transcriptomes, researchers can perform de novo transcriptome assembly using short rna-seq reads [250] [251] [252]. the future of rna sequencing is headed towards long read sequencing, which is being met by pacific biosciences single molecule real time sequencing technology (pacbio) and oxford nanopore technology (ont) [253, 254].
while both of these powerful technologies perform long read sequencing, their platforms are very different. pacbio technology is dependent on sequencing-by-synthesis. a dna polymerase incorporates nucleotides that each have a corresponding conjugated fluorescent dye. the dna polymerase works at a rate of 1000 bp/s, which is beyond the capabilities of current technologies. however, by circularizing the dna, pacbio has overcome this limitation through continuous long read sequencing, resulting in the ability to generate 500k-4 million reads at an error rate below 1% [255, 256]. on the other hand, ont's approach relies on a pore embedded in a membrane. as a long cdna or rna strand translocates through the nanopore, at single nucleotide precision under enzymatic regulation, the ionic current across the membrane is recorded. this technology can sequence full-length transcripts and can yield up to 10 million reads on the minion or up to 60 million reads on the promethion for cdna [257, 258]. an initial limitation of ont was the 5-10% per read error rate, which has been addressed with a new technology called rolling circle to concatemeric consensus (r2c2), bringing the error rate down to 2.5% by increasing the read coverage. overall, both of these technologies have overcome the transcriptome assembly and isoform identification limitations of short-read illumina sequencing [259]. as stated above, ont and pacbio can perform long read cdna sequencing [260] and, even more exciting, both technologies can perform direct long read rna sequencing [261, 262]. the appeal of ont and pacbio is the ability to sequence the full expressed isoform (cdna or direct rna) without the risk of misidentifying complex splicing patterns or rna cleavage events, which are problems faced using short-read illumina sequencing. because these methods do not rely on transcript annotation files, they can also lead to the identification of novel isoforms.
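the statement that repeated coverage of the same molecule lowers the per-base error rate can be illustrated with a simple majority-vote calculation. this is a deliberately idealized sketch and not the r2c2 consensus algorithm: it assumes every copy of a base is read independently with the same error probability and that all errors agree with each other (a worst case for voting). real sequencing errors are correlated and context dependent, so the numbers it gives should not be compared with the error rates quoted in the text. the function name `majority_error` is hypothetical.

```python
# illustrative sketch only: binomial model of majority-vote consensus error.
from math import comb


def majority_error(per_read_error, n_copies):
    """probability that more than half of n_copies reads are wrong at a base.

    assumes independent errors with identical probability per copy, and an
    odd n_copies so that a strict majority always exists.
    """
    if n_copies < 1 or n_copies % 2 == 0:
        raise ValueError("n_copies must be a positive odd integer")
    p = per_read_error
    k_min = n_copies // 2 + 1  # smallest number of wrong copies that wins
    return sum(
        comb(n_copies, k) * p ** k * (1 - p) ** (n_copies - k)
        for k in range(k_min, n_copies + 1)
    )
```

under these assumptions the consensus error falls quickly as the number of copies grows, which is the intuition behind trading read count for per-read accuracy.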
for lncrnas, a couple of studies have focused on isoform specificity and function for a given lncrna. studies of neat1 [263] and lncrna-pxn-as1 [264] show how one gene can have different functions depending on the rna isoform expressed. while these studies are not immunology specific, this field is still at an early stage and we anticipate that such work will become more prevalent in future studies. rna modifications are widespread and diverse in chemical nature, as well as highly conserved in their occurrence and function throughout species. rna modifications affect rna stability, localization, alternative site of poly-adenylation, and more [265]. since lncrnas can function as decoys and scaffolds, roles that are highly dependent on rna structure, a single modification can enhance or eradicate an rna-protein interaction. as you study a lncrna, keep in mind that the mechanism of this molecule could depend on a modified nucleotide. there are many techniques for detecting a single rna modification in a cell type and biological context of choice. site-specific cleavage and radioactive-labeling followed by ligation-assisted extraction and tlc (scarlet) technology gives scientists the ability to probe for n6-methyladenosine rna (m6a) modification status at single nucleotide resolution in mrna and long noncoding rna [266]. the significance of rna modifications to the control of the immune response is beginning to be appreciated. a study by winkler et al. showed that m6a modification controls the innate immune response to infection by targeting type i interferons [267]. a few recent studies have shown that lncrnas do carry rna modifications: malat1 contains m6a modifications [268, 269], hotair contains m5c and m6a [270, 271] and xist contains ψ, m6a and m5c modifications [269]. a study by zhou et al. showed that the rna modification, m6a, acts as a structural 'switch' in malat1.
when there is a modification at site 2515, it results in an increased ability to bind hnrnpg, while a modification at 2577 leads to an increase in binding to hnrnpc [269]. in clinical research, lncrna rp11-139j23.1 is highly expressed in colorectal cancer (crc) cells, and this specific upregulation was controlled by m6a methylation [272]. the study showed that m6a could regulate the lncrna, which in turn triggered the dissemination of crc cells via post-translational upregulation of the protein zeb1. this study, connecting the interplay of rna modifications and lncrnas, has paved the way for a novel predictive biomarker or therapeutic target in crc [272]. there are over 160 identified rna modifications, but only a few have been studied to any extent [273]. these modifications are enriched for analysis through an assortment of techniques including methylated rna immunoprecipitation (merip), merip-iclip (crosslinking and immunoprecipitation), suicide enzyme trap and clickable chemicals (reviewed in [274]). these techniques have many limitations and biases, but hopefully, future studies using direct rna nanopore sequencing will overcome all these pitfalls. in a recent study, direct rna sequencing using nanopore technology detected m6a modifications with 97% accuracy using designed synthetic sequences [275]. as the performance of the algorithm increases, use of this tool will be extremely insightful when analyzing myeloid or lymphoid primary cells with and without a treatment to understand how rna modifications are regulated in innate immunity and specifically as it relates to our long noncoding transcriptome.
the flow-chart provides a "beginning to end" guide to study lncrnas. not all suggested databases will be appropriate for all lncrnas being studied. a. selection of candidate lncrnas should factor in changes in expression, type of lncrna and nearby coding genes. b. bioinformatic characterization of lncrnas can be done using a variety of online databases, for example lncipedia to assess conservation or the mouse cell atlas (mca) to assess tissue specific expression. c. expression validation: this includes validation of expression and confirming that the lncrna is in fact non-protein coding. d. functional validation is the final stage of mechanistic characterization of a lncrna, which involves manipulating its expression as well as uncovering the specific cis-elements within the transcript that are important for its function.
lncrnas, including xist and h19, have been studied intensely for decades [275, 276]. at the time we had no idea that these genes would represent the largest family of rna genes produced in the genome. as louis pasteur once said, "chance favors the prepared mind," and this is especially true following the development of next-generation sequencing. rna-sequencing provided an unprecedented insight into the human genome. we did not identify new proteins; instead we found a wealth of noncoding rna transcripts. the lncrna field is growing at a blistering pace, with labs from all areas of biology, and now immunology, branching out to include questions about the regulatory impact of these pervasive long noncoding gene species. as detailed in this review, there are many publicly available datasets and web servers that will streamline how to begin a lncrna project, from picking a lncrna candidate by interrogating published rna-sequencing data to determining the best tools for studying the function and mechanism of a candidate (fig. 4). since this field is still at an early stage in its development, there are some shortcomings, including poorly annotated lncrna transcripts. however, this will be overcome with direct rna sequencing using ont and pacbio technology.
these technologies will enable us to determine the exact isoforms of transcripts expressed in a particular cell and begin to catalog the different rna modifications that exist basally and during a biological process such as activation of inflammation. since lncrnas are cell-type specific in their expression patterns, continued development of single-cell sequencing technologies will provide a complete catalog of lncrnas in the genome. as the list of annotated lncrnas grows, characterizing the function of all these genes has become a definite bottleneck in the field. however, high-throughput crispr screening provides an approach to quickly identify functional lncrnas in a particular biological system. utilizing all the tools outlined here should enable researchers to develop this field rapidly. for our research focus, gaining a better understanding of the role of lncrnas in regulating immune responses will provide novel insights into the molecular mechanisms governing inflammation. these data will be critical for identifying new avenues for therapeutic intervention in infectious and inflammatory disease. the transparency document associated with this article can be found in the online version.
pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding rnas the transcriptional landscape of the mammalian genome a promoter-level mammalian expression atlas the gencode v7 catalog of human long noncoding rnas: analysis of their gene structure, evolution, and expression long non-coding rnas and control of gene expression in the immune system bases of antisense lncrna-associated regulation of gene expression in fission yeast detection of bidirectional promoter-derived lncrnas from small-scale samples using pre-amplification-free directional rna-seq method enhancer rnas (ernas): new insights into gene transcription and disease treatment xlincrnas: genomics, evolution, and mechanisms history, discovery, and classification of lncrnas soybean enod40 encodes two peptides that bind to sucrose synthase a micropeptide encoded by a putative long noncoding rna regulates muscle performance a peptide encoded by a transcript annotated as long noncoding rna enhances serca activity in muscle mtorc1 and muscle regeneration are regulated by the linc00961-encoded spar polypeptide circ-znf609 is a circular rna that can be translated and functions in myogenesis viral infection identifies micropeptides differentially regulated in smorf-containing lncrnas control of adaptive immunity by the innate immune system innate immune recognition: mechanisms and pathways toll-like receptor stimulation in cancer: a proand anti-tumor double-edged sword long noncoding rna: novel links between gene expression and innate immunity role of low-grade inflammation in osteoarthritis role of toll-like receptors in multiple sclerosis interconnection between dna damage, senescence, inflammation, and cancer long noncoding rna in hematopoiesis and immunity long noncoding rnas: cellular address codes in development and disease long non-coding rnas: new players in cell differentiation and development long noncoding rna transcriptome of plants long 
noncoding rnas: regulation, function and cancer emerging role of lncrna in cancer: a potential avenue in molecular medicine noncoding rna: rna regulatory networks in cancer the impact of lncrna dysregulation on clinicopathology and survival of breast cancer: a systematic review and meta-analysis prostate cancer-associated lncrnas the critical role of toll-like receptors -from microbial recognition to autoimmunity: a comprehensive review innate immune pattern recognition: a cell biological perspective transcriptional control of the inflammatory response transcriptional regulation in the innate immune system toll-like receptors in innate immunity toll-like receptors (tlrs) and nod-like receptors (nlrs) in inflammatory disorders toll-like receptors, associated biological roles, and signaling networks in non-mammals toll-like receptors: key mediators of microbe detection a long noncoding rna mediates both activation and repression of immune response genes lincrna-cox2 promotes late inflammatory gene transcription in macrophages through modulating swi/snf-mediated chromatin remodeling lincrna-cox2 modulates tnf-α-induced transcription of il12b gene in intestinal epithelial cells through regulation of mi-2/nurd-mediated epigenetic histone modifications crispr/cas-based screening of long non-coding rnas (lncrnas) in macrophages with an nf-κb reporter genetic models reveal cis and trans immune-regulatory activities for lincrna-cox2 long non-coding rnas and the innate immune response cytokines and long noncoding rnas on the classification of long non-coding rnas the landscape of long noncoding rna classification neighboring gene regulation by antisense long non-coding rnas local regulation of gene expression by lncrna promoters, transcription and splicing the long noncoding rna rocki regulates inflammatory gene expression cutting edge: a natural antisense transcript, as-il1α, controls inducible transcription of the proinflammatory cytokine il-1α the human long noncoding rna 
lnc-il7r regulates the inflammatory response corrigendum: long non-coding rnas and enhancer rnas regulate the lipopolysaccharide-induced inflammatory response in human monocytes chromatin signature reveals over a thousand highly conserved large non-coding rnas in mammals genome regulation by long noncoding rnas cis-and trans-acting lncrnas in pluripotency and reprogramming a long noncoding rna lincrna-eps acts as a transcriptional brake to restrain inflammation a long noncoding rna associated with susceptibility to celiac disease, science (80-. ) immunobiology of long noncoding rnas cell composition analysis of bulk genomics using single-cell data innate immune landscape in early lung adenocarcinoma by paired single-cell analyses single cell rna sequencing of human liver reveals distinct intrahepatic macrophage populations single-cell analysis reveals that stochasticity and paracrine signaling control interferon-alpha production by plasmacytoid dendritic cells seq-well: portable, low-cost rna sequencing of single cells at high throughput unique signatures of long noncoding rna expression in response to virus infection and altered innate immune signaling digital cell quantification identifies global immune cell dynamics during influenza infection chromatin state dynamics during blood formation hhs public access, science (80-. 
) tissue-resident macrophage enhancer landscapes are shaped by the local microenvironment a stringent systems approach uncovers gene-specific mechanisms regulating inflammation analysis of genetically diverse macrophages reveals local and domain-wide mechanisms that control transcription factor binding and function transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular rna fractions transcriptional landscape of mycobacterium tuberculosis infection in macrophages a stringent systems approach uncovers gene-specific mechanisms regulating inflammation tissue damage drives co-localization of nf-κb, smad3, and nrf2 to direct rev-erb sensitive wound repair in mouse macrophages widespread shortening of 3′ untranslated regions and increased exon inclusion are evolutionarily conserved features of innate immune responses to infection the stat3-binding long noncoding rna lnc-dc controls human dendritic cell differentiation epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity β-glucan reverses the epigenetic state of lps-induced immunological tolerance specific and novel micrornas are regulated as response to fungal infection in human dendritic cells an interferon-related signature in the transcriptional core response of human macrophages to mycobacterium tuberculosis infection the long noncoding rna thril regulates tnfα expression through its interaction with hnrnpl catalog of differentially expressed long non-coding rna following activation of human and mouse innate immune response whole transcriptome analysis reveals differential gene expression profile reflecting macrophage polarization in response to influenza a h5n1 virus infection human body epigenome maps reveal noncanonical dna methylation variation infectious disease. 
life-threatening influenza and impaired interferon amplification in human irf7 deficiency longitudinal profiling of human blood transcriptome in healthy and lupus pregnancy the accessible chromatin landscape of the human genome evlncrnas: a manually curated database for long non-coding rnas validated by low-throughput experiments noncodev5: a comprehensive annotation database for long noncoding rnas lncipedia 5: towards a reference set of human long noncoding rnas landscape of transcription in human cells integrative annotation of human large intergenic noncoding rnas reveals global properties and specific subclasses crispri-based genome-scale identification of functional long noncoding rna loci in human cells genetic effects on gene expression across human tissues the ucsc xena platform for cancer genomics data visualization and interpretation mapping the mouse cell atlas by microwell-seq single-cell transcriptomics of 20 mouse organs creates a tabula muris the encyclopedia of dna elements (encode): data portal update paradigm shifts in genomics through the fantom projects chapter 11: genome-wide association studies 10 years of gwas discovery: biology, function, and translation the contribution of genetic variants to disease depends on the ruler human disease-associated genetic variation impacts large intergenic non-coding rna expression the uk biobank resource with deep phenotyping and genomic data genome-wide association studies and beyond lncrnasnp2: an updated database of functional snps and mutations in human and mouse lncrnas lnc2catlas: an atlas of long noncoding rnas associated with risk of cancers long noncoding rna linc00305 promotes inflammation by activating the ahrr-nf-k b pathway in human monocytes disease-associated snps in inflammation-related lncrnas phylocsf: a comparative genomics method to distinguish protein coding and non-coding regions evolutionary conservation of long non-coding rnas; sequence, structure, function unique features of long 
non-coding rna biogenesis and function perspectives on the mechanism of transcriptional regulation by long non-coding rnas evolutionary clues in lncrnas enhancer rnas and regulated transcriptional programs long non-coding rnas as local regulators of pancreatic islet transcription factor genes genome-wide analysis of enhancer rna in gene regulation across 12 mouse tissues expression of rd29a:: atdreb1a/cbf3 in tomato alleviates drought-induced oxidative stress by regulating key enzymatic and non-enzymatic antioxidants ernas are required for p53-dependent enhancer activity and gene transcription phylogenetic analysis with improved parameters reveals conservation in lncrna structures lncrnadb v2.0: expanding the reference database for functional long noncoding rnas the ucsc genome browser a long noncoding rna, lincrna-tnfaip3, acts as a coregulator of nf-κb to modulate inflammatory gene transcription in mouse macrophages p50-associated cox-2 extragenic rna (pacer) activates cox-2 gene expression by occluding repressive nf-κb complexes genome-wide mapping of dnase hypersensitive sites using massively parallel signature sequencing (mpss) atac-seq: a method for assaying chromatin accessibility genome-wide the tale of histone modifications and its role in multiple sclerosis the histone modification code in the pathogenesis of autoimmune diseases histone modifications and their role in epigenetics of atopy and allergic diseases regulation of chromatin by histone modifications acetylation and methylation of histones and their possible role in the regulation of rna synthesis regulatory networks involving stats, irfs, and nfκb in inflammation single-cell proteomics reveal that quantitative changes in coexpressed lineage-specific transcription factors determine cell fate post-transcriptional regulation of gene expression in innate immunity an enhanced computational platform for investigating the roles of regulatory rna and for identifying functional rna motifs analysis of 
genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities discrimination of non-protein-coding transcripts from proteincoding mrna peptides encoded by short orfs control development and define a new eukaryotic gene family small open reading frames: current prediction techniques and future prospect peptidomic discovery of short open reading frame-encoded peptides in human cells ribosome profiling reveals pervasive translation outside of annotated protein-coding genes genomewide analysis in vivo of translation with nucleotide resolution using ribosome profiling ribosome profiling provides evidence that large noncoding rnas do not encode proteins cytoplasmic long noncoding rnas are frequently bound to and degraded at ribosomes in human cells ribosome profiling reveals resemblance between long non-coding rnas and 5′ leaders of coding rnas the translation of non-canonical open reading frames controls mucosal immunity lncbook: a curated knowledgebase of human long non-coding rnas choosing the right tool for the job rna interference: from basic research to therapeutic applications short hairpin rna-mediated gene silencing cellular localization of long non-coding rnas affects silencing by rnai more than by antisense oligonucleotides knockdown of nuclear-retained long noncoding rnas using modified dna antisense oligonucleotides designing chemically modified oligonucleotides for targeted gene silencing ribonuclease h: the enzymes in eukaryotes fatty acid-modified gapmer antisense oligonucleotide and serum albumin constructs for pharmacokinetic modulation angelman syndrome, heal. 
care people with intellect towards a therapy for angelman syndrome by targeting a long non-coding rna rna interference-significance and applications molecular mechanisms of antisense oligonucleotides, nucleic acid ther rna interference in the nucleus: roles for small rnas in transcription, epigenetics and beyond transcriptional termination in mammals: stopping the rna polymerase juggernaut, science (80-. ) dicer promotes transcription termination at sites of replication stress to maintain genome stability cutting it close: crispr-associated endoribonuclease structure and function expanding the biologist's toolkit with crispr-cas9 targeting non-coding rnas with the crispr/cas9 system in human cell lines genome-scale deletion screening of human long non-coding rnas using a paired-guide rna crispr-cas9 library considerations when investigating lncrna function in vivo genome-wide screening for functional long noncoding rnas in human cells by cas9 targeting of splice sites frameshift indels introduced by genome editing can lead to inframe exon skipping genome-scale crispr-mediated control of gene repression and activation crispr interference (crispri) for sequence-specific control of gene expression the krüppel-associated box repressor domain induces reversible and irreversible regulation of endogenous mouse genes by mediating different chromatin states gene overexpression: uses, mechanisms, and interpretation multiplexed activation of endogenous genes by crispr-on, an rna-guided transcriptional activator system crispr-mediated modular rna-guided regulation of transcription in eukaryotes genome-scale activation screen identifies a lncrna locus regulating a gene neighbourhood linking rna biology to lncrnas promoter of lncrna gene pvt1 is a tumor-suppressor dna boundary element unlinking an lncrna from its associated cis element multiple knockout mouse models reveal lincrnas are required for life and brain development xist-deficient mice are defective in dosage 
compensation but not spermatogenesis deletion of the h19 transcription unit reveals the existence of a putative imprinting control element the rox genes encode redundant male-specific lethal transcripts required for targeting of the msl complex loss of the abundant nuclear non-coding rna malat1 is compatible with life and development targeted disruption of hotair leads to homeotic transformation and gene derepression the tissue-specific lncrna fendrr is an essential regulator of heart and body wall development in the mouse genomic and epigenetic complexity of the foxf1 locus in 16q24.1: implications for development and disease transcription factor irf8 plays a critical role in the development of murine basophils and mast cells balanced gene regulation by an embryonic brain ncrna is critical for adult hippocampal gaba circuitry evf2 (dlx6as) lncrna regulates ultraconserved enhancer methylation and the differential transcriptional control of adjacent genes lncrna jpx induces xist expression in mice using both trans and cis mechanisms the long noncoding rna pnky is a trans-acting regulator of cortical development in vivo chirp-ms: rna-directed proteomic discovery chromatin isolation by rna purification (chirp) rap-ms: a method to identify proteins that interact directly with a specific rna molecule in cells rna-rna interactions enable specific targeting of noncoding rnas to nascent pre-mrnas and chromatin sites rna pull-down procedure to identify rna targets of a long noncoding rna zbtb7b engages the long noncoding rna blnc1 to drive brown and beige fat development and thermogenesis catrapid omics: a web server for large-scale prediction of protein-rna interactions omictools: an informative directory for multi-omic data analysis, database (oxford) the emerging role of exosome-derived non-coding rnas in cancer biology global positioning system: understanding long noncoding rnas through subcellular localization cytoplasmic functions of lncrnas nuclear long noncoding 
rnas: key regulators of gene expression plasma long noncoding rna il-7r as a prognostic biomarker for clinical outcomes in patients with acute respiratory distress syndrome a feedforward regulatory loop between hur and the long noncoding rna linc-md1 controls early phases of myogenesis a long noncoding rna controls muscle differentiation by functioning as a competing endogenous rna iloc-lncrna: predict the subcellular location of lncrnas by incorporating octamer composition into general pseknc atlas of subcellular rna localization revealed by apex-seq co-transcriptional splicing of constitutive and alternative exons sub-cellular localization of membrane proteins localized translation near the mitochondrial outer membrane: an update visualization of single rna transcripts in situ stable in situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus immunolabeling artifacts and the need for live-cell imaging evolution to the rescue: using comparative genomics to understand long non-coding rnas maternally expressed gene 3 (meg3) noncoding ribonucleic acid: isoform structure, expression, and functions structural characterization of maternally expressed gene 3 rna reveals conserved motifs and potential sites of interaction with polycomb repressive complex 2 non-linear sequence similarity between the xist and rsx long noncoding rnas suggests shared functions of tandem repeat domains structural constraints identified with covariation analysis in ribosomal rna covariation analysis with improved parameters reveals conservation in lncrna structures the existence of phylogenetic covariation in base-pairing is strong evidence for functional dms footprinting of structured rnas and rna-protein complexes dms footprinting of structured rnas and rna-protein complexes genome-wide probing of rna structure reveals active unfolding of mrna structures in vivo visualizing the secondary and tertiary architectural domains of lncrna repa a 
g-rich motif in the lncrna braveheart interacts with a zinc-finger transcription factor to specify the cardiovascular lineage probing xist rna structure in cells using targeted structure-seq rna motif discovery by shape and mutational profiling (shape-map) simultaneous characterization of cellular rna structure and function with in-cell shape-seq transcriptome-wide interrogation of rna secondary structure in living cells with icshape exploring rna structural codes with shape chemistry selective 2′-hydroxyl acylation analyzed by primer extension (shape): quantitative rna structure analysis at single nucleotide resolution shape reveals transcript-wide interactions, complex structural domains, and protein interactions across the xist lncrna in living cells comparison of shape reagents for mapping rna structures inside living cells tandem stem-loops in rox rnas act together to mediate x chromosome dosage compensation in drosophila rna duplex map in living cells reveals higher-order transcriptome structure function of alternative splicing comprehensive network analysis reveals alternative splicing-related lncrnas in hepatocellular carcinoma a genomic view of alternative splicing of long non-coding rnas during rice seed development reveals extensive splicing and lncrna gene families the more the merrier-complexity in long non-coding rna loci analysis and design of rna sequencing experiments for identifying isoform regulation conservation of an rna regulatory map between drosophila and mammals majiq-spel: web-tool to interrogate classical and complex splicing variations from rna-seq data drimseq: a dirichlet-multinomial framework for multivariate count outcomes in genomics transrate: reference-free quality assessment of de novo transcriptome assemblies de novo prediction of stem cell identity using single-cell transcriptome data bridger: a new framework for de novo transcriptome assembly using rna-seq data comprehensive comparison of pacific biosciences and oxford 
nanopore technologies and their applications to transcriptome analysis relative performances of oxford nanopore minion vs. pacific biosciences sequel third-generation sequencing platforms in identification of agricultural and forest pathogens pacbio sequencing and its applications an evaluation of the pacbio rs platform for sequencing and de novo assembly of a chloroplast genome nanopack: visualizing and processing long-read sequencing data rna and gene expression analysis using direct rna and cdna sequencing improving nanopore read accuracy with the r2c2 method enables the sequencing of highly multiplexed full-length single-cell cdna transcriptome-wide analysis of a baculovirus using nanopore sequencing native molecule sequencing by nano-id reveals synthesis and stability of rna isoforms transcriptome profiling using single-molecule direct rna sequencing approach for in-depth understanding of genes in secondary metabolism pathways of camellia sinensis structural analyses of neat1 lncrnas suggest long-range rna interactions that may contribute to paraspeckle architecture the mbnl3 splicing factor promotes hepatocellular carcinoma by increasing pxn expression through the alternative splicing of lncrna-pxn-as1 a majority of m6a residues are in the last exons, allowing the potential for 3′ utr regulation probing n6-methyladenosine rna modification status at single nucleotide resolution in mrna and long noncoding rna m6a modification controls the innate immune response to infection by targeting type i interferons n(6)-methyladenosine modification in a long noncoding rna hairpin predisposes its conformation to protein binding m6a modification of non-coding rna and the control of mammalian gene expression topology of the human and mouse m6a rna methylomes revealed by m6a-seq comprehensive analysis of mrna methylation reveals enrichment in 3′ utrs and near stop codons m 6 a-induced lncrna rp11 triggers the dissemination of colorectal cancer cells via upregulation of zeb1 
modomics: a database of rna modification pathways. 2017 update detecting rna modifications in the epitranscriptome: predict and validate accurate detection of m6a rna modifications in native rna sequences a gene from the region of the human x inactivation centre is expressed exclusively from the inactive x chromosome the product of the h19 gene may function as an rna
support from: national institutes of health ar070973 and the tobacco-related disease research program 27ip-0017h to susan carpenter.
key: cord-304607-td0776wj authors: paszkiewicz, konrad h.; giezen, mark van der title: omics, bioinformatics, and infectious disease research date: 2010-12-24 journal: genetics and evolution of infectious disease doi: 10.1016/b978-0-12-384890-1.00018-2 sha: doc_id: 304607 cord_uid: td0776wj bioinformatics is basically the study of informatic processes in biotic systems. what actually constitutes bioinformatics is not entirely clear and arguably varies depending on who tries to define it. this chapter discusses the considerable progress in infectious diseases research that has been made in recent years using various "omics" case studies. bioinformatics is tasked with making sense of the data, mining it, storing it, disseminating it, and ensuring valid biological conclusions can be drawn from it. this chapter discusses the current state of play of bioinformatics related to genomics and transcriptomics, and briefly covers metagenomics, which finds use in infectious disease research, as well as the random sequencing of genomes from a variety of organisms. this chapter explains the various possibilities of the pan-genome and transcriptional reshaping, and also the enormous progress of proteomics studies. bioinformatic algorithms and tools are crucial in analyzing the data. the chapter also attempts to provide some details on the various problems and solutions in bioinformatics that current-day scientists face while concentrating on second-generation sequencing strategies.
although bioinformatics is generally perceived to be a modern science, the term was put forward over thirty years ago by paulien hogeweg and ben hesper for "the study of informatic processes in biotic systems" (hogeweg, 1978; hogeweg and hesper, 1978). it is necessarily nebulous: bioinformatics spans many disciplines and can have many shades of meaning. indeed it can be argued that it is the collation and analysis of data from different disciplines that has provided some of the greatest insights. in the field of genomics and transcriptomics, bioinformatics is an incredibly diverse field. evolution, epidemiology, ecology, and the response of an organism to its environment are all fields that require bioinformatics to accurately process and place into context various sources of data. at the heart of genomics and transcriptomics is the generation and analysis of vast quantities of sequence data. dna sequencing took off in the late 1980s when applied biosystems developed the first automated sequencing machine. the subsequent development of more efficient ways to sequence resulted in the phenomenal growth of the number of sequences deposited in genbank (figure 18.1). obviously, with over 100 million sequences deposited in genbank, it is not feasible to do any serious manual work with such a large dataset. the volume of data obtained from modern second-generation sequencers is on the order of 1000 times greater than that from capillary-based sequencers. it is now possible to routinely generate many gigabases of sequence data. bioinformatics is tasked with making sense of it, mining it, storing it, disseminating it, and ensuring valid biological conclusions can be drawn from it. many of the recent high-throughput functional genomics technologies rely on a bioinformatics component, though bioinformatics is just one part of the process.
for example, identification of proteins by mass spectroscopy, quantitative analysis of expression data, phylogenetics, and so on all make use of bioinformatics tools, methods, and databases. bioinformatics plays a key role at several steps in genomics, comparative genomics, and functional genomics: sequence alignment, assembly, identification of single nucleotide polymorphisms (snp), gene prediction, quantitative analysis of transcription data, etc. in this chapter, we will discuss the current state of play of bioinformatics related to genomics and transcriptomics and use relevant examples from the field of infectious diseases. the term "metagenomics" was originally used to describe the sequencing of genomes of uncultured microorganisms in order to explore their abilities to produce natural products (handelsman et al., 1998; rondon et al., 2000) and subsequently resulted in novel insights into the ecology and evolution of microorganisms on a scale not imagined possible before (see cardenas and tiedje, 2008; hugenholtz and tyson, 2008 for an overview). however, metagenomics now finds use in infectious disease research as well, as the random sequencing of genomes from a variety of organisms in, for example, patient material could lead to the identification of the cause of disease. in a quite straightforward metagenomics approach to identify pathogens in sputa from cystic fibrosis patients, standard microbiological culture techniques were compared to molecular methods using 16s rdna pcr (bittar et al., 2008). the well-known disadvantage of the microbiological methods is that they normally employ "selective" media that are designed to pick up those bacterial pathogens that are thought to be present. emerging pathogens will be missed using traditional culture techniques. indeed, bittar et al. 
identified 33 bacteria using cultivation while 53 bacterial species were detected using molecular methods (based on blast comparisons; altschul et al., 1990). interestingly, 30% of the latter were anaerobes, organisms missed in the routine cultivation methods. many bacteria identified using the molecular methods are traditionally not thought to be associated with cystic fibrosis. whether these novel species are associated with the physiopathology of disease remains to be studied. bittar et al. (2008) also noted that the number of bacteria detected increased with increased numbers of clones sequenced, a well-known phenomenon in environmental sequencing that relates to sample depth (huber et al., 2007; huse et al., 2010). however, with the increased use of next-generation sequencing methods in infectious disease research, the lessons learned from environmental studies relating to diversity and relative abundance of different microbes can be put to effective use. an example of the use of second-generation sequencing in a metagenomics approach of patient material is the study by nakamura et al. (2009) to identify viruses in nasal and fecal material. in this study, rna was isolated from patient material obtained during seasonal influenza infections and norovirus outbreaks. this rna was reverse transcribed into cdna, which was subsequently subjected to large-scale parallel pyrosequencing resulting in 25,000 reads on average per sample. although the reads from the influenza samples were mainly (>90%) human in origin, it was nonetheless possible to identify the influenza subtypes in each sample (nakamura et al., 2009). as the fecal samples were cleared of human and bacterial cells, yields were much better and the complete norovirus gii.4 subtype genome was sequenced with an average cover depth of up to 258×. 
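the dependence of detected diversity on the number of clones sequenced, noted above, can be illustrated with a classical rarefaction calculation. the sketch below is only an illustration: the clone counts are hypothetical (they are not the data of bittar et al.), and the function name `expected_richness` is an assumption introduced here.

```python
from math import comb

def expected_richness(counts, n):
    """Expected number of distinct taxa in a random subsample of n clones.

    counts: clone counts per taxon in the full library (hypothetical data).
    Classical rarefaction: a taxon with N_i clones out of N is missed by a
    subsample of size n with probability C(N - N_i, n) / C(N, n).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the library size")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts)

# Hypothetical library: a few dominant taxa and many rare ones (1000 clones).
library = [400, 250, 150, 80, 40, 20, 10] + [2] * 15 + [1] * 20
for depth in (10, 50, 200, 1000):
    print(depth, round(expected_richness(library, depth), 1))
```

with a skewed community such as this one, the expected number of taxa keeps rising as more clones are sampled, because rare taxa are only picked up at greater depth.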
in addition to the influenza and noroviruses, two recently discovered human viruses were also identified: wu polyomavirus and human coronavirus hku1 (nakamura et al., 2009). major bacterial species normally found in the respiratory tract were also identified. although nakamura et al. suggest that high-throughput sequencing is more sensitive than standard pcr-based analysis and might result in the detection of additional possible pathogens, they also warn that the increased sensitivity might necessitate follow-up work to decide which of the detected pathogens is the actual cause of the disease. important results are expected from the human microbiome project (http://www.hmpdacc.org/), which will obtain metagenomic information from various human microenvironments such as the gastrointestinal, nasooral, and urogenital cavities as well as the skin. understanding the human microbiome is expected to help answer questions such as whether changes in the human microbiome are related to human health. however, large-scale metagenomics projects that include eukaryotic genomes have thus far been quite costly and laborious due to the generally large genomes of eukaryotes. the lowering of sequencing costs may alleviate part of the problem, but sequence data are still accumulating at a faster rate than developments in computational analysis (hugenholtz and tyson, 2008). the organisms that have attracted the attention of genome centers are primarily those that cause disease, followed by model organisms such as saccharomyces cerevisiae (goffeau et al., 1996) and caenorhabditis elegans (the c. elegans sequencing consortium, 1998). indeed, the first bacterial genomes sequenced were those from pathogens (fraser et al., 1995; tomb et al., 1997), and these were preceded by many bacteriophage genomes such as bacteriophage ms2 (fiers et al., 1976) and ϕx174 (sanger et al., 1977) and viral genomes (fiers et al., 1978). 
currently, pathogen genomes represent at least one third of all sequenced genomes. obviously, for comparative genomics two genomes are required, and indeed, when the second bacterial pathogen was sequenced (mycoplasma genitalium by fraser et al., 1995), it was immediately compared with the first one (haemophilus influenzae by fleischmann et al., 1995). interestingly, the h. influenzae genome was completed using a "bioinformatics" approach. unlike previous sequencing projects, the shotgun approach used relied on a computational justification that sufficient random sequencing of small fragments would result in complete coverage of the whole genome. comparing the m. genitalium genome with the haemophilus genome suggested that the percentage of the total genome dedicated to genes is similar, although m. genitalium has far fewer genes (fraser et al., 1995). although the genome of m. genitalium is about three times smaller than that of h. influenzae, its smaller genome has not resulted in an increase in gene density or decrease in gene size. detection of several repeats of components of the mycoplasma adhesin, which elicits a strong immune response in humans, suggests that recombination might underlie its ability to evade the human immune response. that this initial genome study was only the tip of the comparative genomics iceberg was already clear from the last sentence of fleischmann et al. (1995): "knowledge of the complete genomes of pathogenic organisms could lead to new vaccines." a whole-genome effort at identifying vaccine candidates appeared some 5 years later when pizza et al. (2000) employed bioinformatics to extract putative surface-exposed antigens by genome analysis. although effective vaccines against neisseria meningitidis, the causative agent of meningococcal meningitis and sepsis, did exist, these vaccines did not cover all pathogenic serogroups. 
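the computational justification for shotgun sequencing mentioned above for h. influenzae can be sketched with a simple poisson model of random sampling. this is a minimal illustration under the assumption that reads are sampled uniformly at random; it is not the calculation used in the original project, and the genome size, read length, and function names are illustrative choices made here.

```python
import math

def shotgun_coverage(genome_size, read_length, n_reads):
    """Return (depth, expected fraction of bases covered) for random reads.

    Assumes reads start uniformly at random (Poisson approximation), so a
    given base is missed with probability exp(-depth).
    """
    depth = n_reads * read_length / genome_size
    return depth, 1.0 - math.exp(-depth)

def reads_needed(genome_size, read_length, target_fraction):
    """Number of random reads needed to cover target_fraction of the genome."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    depth = -math.log(1.0 - target_fraction)
    return math.ceil(depth * genome_size / read_length)

# Illustrative numbers only: a 1.8 Mb genome and 500 bp reads.
genome, read_len = 1_800_000, 500
for n in (5_000, 15_000, 25_000):
    depth, frac = shotgun_coverage(genome, read_len, n)
    print(f"{n} reads: depth {depth:.1f}x, covered {frac:.4f}")
print(reads_needed(genome, read_len, 0.999))
```

under this model the uncovered fraction falls exponentially with sequencing depth, which is why a sufficiently large number of random fragments is expected to cover nearly the whole genome.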
serogroup b had evaded the development of a good vaccine as its capsular polysaccharide (against which the vaccines of the other serogroups were developed) is identical to a human carbohydrate. in order to identify putative candidates for vaccine development, pizza et al. decided to sequence the whole genome of a serogroup b strain. all potential open reading frames (orfs) were analyzed for putative cellular locations using blastx. those orfs likely to be cytosolic were excluded from further analysis. the remaining orfs were analyzed to determine whether they encoded proteins that contained transmembrane domains, leader peptides, and outer membrane anchoring motifs using a variety of databases such as pfam (finn et al., 2010) and prodom (servant et al., 2002). this resulted in 570 orfs encoding putative exposed antigens. these 570 putative genes were cloned in escherichia coli and pizza et al. successfully expressed 350 orfs. these 350 recombinant proteins were used to generate antisera that were tested in enzyme-linked immunosorbent assay (elisa) and fluorescence-activated cell sorter (facs) analyses to test whether they detected proteins on the outer surface of serogroup b meningococcus strains. in addition, the sera were tested for bactericidal activity. of the 350 proteins, 85 reacted positively in at least one assay but only 7 were positive in all three assays. these 7 were subsequently tested on a large variety of strains to analyze their efficacy. a total of 5 seemed able to provide protection against 31 n. meningitidis strains and in addition, those 5 proteins are 95–99% similar to the homologous n. gonorrhoeae proteins, suggesting they might provide successful protection against that pathogen as well (pizza et al., 2000). arguably the most striking aspect of this study is that in 18 months the authors, using a novel genomics/bioinformatics approach, identified more vaccine candidates than had been found in the preceding 40 years (seib et al., 2009). 
this study resulted in a vaccine that is currently in phase iii clinical trials (giuliani et al., 2006). protozoan infections are a major burden on developing nations; they account for 8 of the 13 diseases targeted by the world health organization's special program for research and training in tropical diseases (http://www.who.int/tdr). over the last 5 years or so, more than 10 parasitic genomes have been sequenced in the hope that their sequences would reveal weak spots to target these pathogens. the trypanosomatids cause serious disease in africa and south america. trypanosoma brucei causes sleeping sickness in humans and wasting disease in cattle. trypanosoma cruzi is the causative agent of chagas disease and leishmania major leads to skin lesions. the completion of their genomes (el-sayed et al., 2005a; ivens et al., 2005) and the comparative analysis of all three genomes (el-sayed et al., 2005b) may be able to focus efforts toward obtaining vaccines, as current drugs have serious toxicity issues. although their genomes encode a different number of protein-encoding genes (around 8100 in t. brucei; 8300 in l. major; 12,000 in t. cruzi), comparative analysis resulted in the identification of about 6200 genes that constitute the trypanosomatid core proteome. all protein coding genes were compared in a three-way manner using blastp (el-sayed et al., 2005b) and the mutual best hits were grouped as clusters of orthologous genes or cogs (figure 18.2). trypanosomatid-specific proteins from these 6200 might be used in a broad-scale vaccine. the remainder of the protein-encoding genes from each parasite (26% of the genes in t. brucei; 12% in l. major; 32% in t. cruzi) consists of species-specific genes. interestingly, a large proportion of these genes encode surface antigens and this might relate to the different mechanisms these parasites employ to evade the host immune system. 
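the grouping of three-way mutual best hits into orthologous clusters described above can be sketched as follows. this is a simplified illustration rather than the pipeline of el-sayed et al.: it assumes the best blastp hit of every gene in each of the other two proteomes has already been parsed into dictionaries, and all gene identifiers and names below are hypothetical.

```python
def mutual_best_hit_triples(ab, ba, ac, ca, bc, cb):
    """Return (a, b, c) triples in which every pair is a mutual best hit.

    Each argument maps a query gene to its single best hit in another
    proteome, e.g. ab[a_gene] is the best hit of a_gene in proteome B.
    """
    triples = []
    for a, b in ab.items():
        if ba.get(b) != a:
            continue  # A-B pair is not reciprocal
        c = ac.get(a)
        if c is None or ca.get(c) != a:
            continue  # A-C pair is not reciprocal
        if bc.get(b) == c and cb.get(c) == b:
            triples.append((a, b, c))  # B-C pair closes the triangle
    return triples

# Hypothetical identifiers: tb = T. brucei, lm = L. major, tc = T. cruzi.
ab = {"tb1": "lm1", "tb2": "lm2", "tb3": "lm3"}
ba = {"lm1": "tb1", "lm2": "tb2", "lm3": "tb9"}
ac = {"tb1": "tc1", "tb2": "tc2"}
ca = {"tc1": "tb1", "tc2": "tb2"}
bc = {"lm1": "tc1", "lm2": "tc5"}
cb = {"tc1": "lm1", "tc2": "lm2"}
print(mutual_best_hit_triples(ab, ba, ac, ca, bc, cb))
```

only the first gene survives all three reciprocity checks in this toy input; genes that fail any check would fall outside the three-way core and, if they lack hits altogether, would be candidates for the species-specific set.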
in addition, it was noted that many genes encoding surface antigens are found at or near telomeres and that many retroelements seem to be present in these regions as well. this might be related to the enormous antigenic variation observed in both trypanosoma species. the presence of novel genes in these areas might suggest that their products play an unknown role in antigenic variation as well, which warrants further studies into these uncharacterized genes (el-sayed et al., 2005b). detailed knowledge of well-studied pathogens might be successfully used to understand the biology of closely related emerging pathogens. this was the driving force for the sequencing of six candida species (butler et al., 2009). candida species are the most common cause of opportunistic fungal infections in the world and c. albicans is the most common of all candida species causing infection. however, c. albicans incidence is declining while other species are emerging. comparison of eight candida species indicated that although genome size was variable, gene content was nearly identical across all species. as the analysis included pathogenic and nonpathogenic species, butler et al. (2009) specifically studied differences between these two groups. of the over 9000 gene families analyzed, 21 were significantly enriched in pathogenic species. many gene families known to be involved in pathogenesis were present in these 21 families (e.g., lipases, oligopeptide transporters, and adhesins). more interestingly, several poorly characterized gene families were also identified, suggesting these might play an unexpected role in pathogenesis as well. this comparative study revealed a wealth of new avenues to explore, which, combined with the large body of work performed on c. albicans, will aid understanding the newly emerging pathogenic candida species (butler et al., 2009). 
although comparative studies using multiple species can reveal hitherto unknown features as evidenced from the mentioned trypanosomatid and candida studies, they can also reveal something unexpected. because the definition of a bacterial species has been debated for a long time, tettelin et al. (2005) set out to address this question by sequencing multiple strains from streptococcus agalactiae, the most common cause of illness or death among newborns. unexpectedly, despite the presence of a "core-genome" shared between all 8 genomes, mathematical modeling suggested that each additional sequenced genome would add 33 new genes to the "dispensable genome." an additional analysis using s. pyogenes also suggested that sequencing additional genomes would continue to add new genes to the pool resulting in a pan-genome that can be defined as the global gene repertoire of a species. this cannot be extrapolated ad infinitum, as a similar analysis of bacillus anthracis indicated that after the fourth genome, no additional genes were identified in agreement with its known limited genetic diversity (keim and smith, 2002). subsequent analyses have confirmed the presence of pan-genomes for many bacterial species (hiller et al., 2007; lefébure and stanhope, 2007; rasko et al., 2008; schoen et al., 2008; lefébure and stanhope, 2009) and the ultimate gene repertoire of a bacterial species is much larger than generally perceived. whether this would be the case for eukaryotes remains to be shown. despite the apparently ever-expanding possibilities of the pan-genome, it has also resulted in a universal vaccine candidate for group b streptococcus (gbs). because various gbs serotypes exist, current vaccines only offer protection against a limited set of serotypes. eight genomes from six serotypes were compared resulting in the identification of a core-genome of 1811 genes and a dispensable genome of 765 genes, which were not present in each strain. 
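the distinction between core and dispensable genome drawn above reduces to simple set operations once each strain's genes have been assigned to gene families. the following sketch assumes that assignment has already been made; the strain names, gene-family labels, and function names are hypothetical, and it does not reproduce the mathematical model used by tettelin et al.

```python
def core_and_dispensable(genomes):
    """Split the pan-genome into core and dispensable gene families.

    genomes: dict mapping strain name -> set of gene-family identifiers.
    Core = present in every strain; dispensable = present in some, not all.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pan - core

def new_genes_per_genome(genomes, order):
    """Count gene families first seen as each strain is added in `order`."""
    seen, added = set(), []
    for strain in order:
        fresh = genomes[strain] - seen
        added.append(len(fresh))
        seen |= fresh
    return added

# Hypothetical strains and gene families.
strains = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2", "g4"},
    "s3": {"g1", "g2", "g3", "g5"},
}
core, dispensable = core_and_dispensable(strains)
print(sorted(core), sorted(dispensable))
print(new_genes_per_genome(strains, ["s1", "s2", "s3"]))
```

if the number of new gene families per added genome stays above zero as more strains are included, the pan-genome is still growing; if it drops to zero, as reported for b. anthracis, the gene repertoire has been sampled completely.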
both genomes were analyzed for the presence of putative surface-associated and secreted proteins. of the 598 identified genes, one third were part of the dispensable genome (193 genes). the authors subsequently produced recombinant tagged proteins in e. coli that were used to immunize mice. ultimately, a combination of four antigens turned out to be highly effective against all major gbs serotypes. three of these antigens were part of the dispensable genome. in addition, this bioinformatics approach highlights the importance of not dismissing unidentified orfs in genomes (generally up to 50% of sequenced genomes) as all four antigens had no assigned function. because of their identification using this method, it became obvious they were part of a pilus-like structure that had never been seen before in group b streptococcus (lauer et al., 2005). the presence of antigens that provide protection on these pilus-like structures suggests that these might play a role in pathogenicity. genomic information is useful as a scaffold. however, in a given environment pathogens and hosts only express a subset of their genes at any one time. the presence of pan-genomes only complicates matters even more. to investigate the response of an organism to an environmental or other stress it is necessary to examine the expression pattern of proteins. at present, this is not possible to accomplish directly on a large scale, but a good approximation can be made by sequencing and counting mrna molecules. currently the process involves converting the rna to cdna, which can introduce biases, but sequencing nonetheless has a great many advantages over traditional microarrays (ledford, 2008). these include high specificity with little or no background noise, and one also gains nucleotide-level resolution of expression. 
despite such drawbacks, microarrays are still extremely powerful tools to understand levels of gene expression, and this is obvious from the study by toledo-arana et al., who discovered novel regulatory mechanisms in listeria (toledo-arana et al., 2009). l. monocytogenes is normally harmless but can lead to serious food-borne infections. environmental change, from the soil through the stomach to the intestinal lumen and ultimately into the bloodstream, is thought to be responsible for the up- and downregulation of a plethora of genes. comparative genomics of the nonpathogenic l. innocua has resulted in the identification of a virulence locus (glaser et al., 2001). using microarrays, transcripts of one strain grown at 37 °c in rich medium were compared to three different conditions: stationary phase, hypoxia, and low temperature (30 °c). in addition, knockout mutants in three known regulators of listeria virulence gene expression (prfa, sigb, and hfq) were compared to the control strain as well. rna was also extracted from the intestine of inoculated mice and from blood from healthy human donors that were both infected with three different strains (control and prfa and sigb knockouts). this analysis resulted in the discovery of massive transcriptional reshaping under the control of sigb when listeria enters the intestines. however, in the bloodstream, gene expression is under control of prfa. various noncoding rnas were uncovered, which show the same expression patterns as virulence genes, suggesting a potential role in virulence (toledo-arana et al., 2009). because microarray data are based on a comparative difference in hybridization, high-throughput next-generation sequencing is seen as more quantitative as it is based on the number of hits for each sequenced transcript (van vliet, 2010). 
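the idea that sequencing-based expression estimates rest on the number of hits per transcript can be sketched in a few lines. the example assumes each read has already been assigned to a gene (or to none) by some alignment step, and it uses reads per kilobase per million assigned reads (rpkm) as one common normalization; the normalization choice, read counts, and gene lengths are assumptions made here for illustration and are not taken from the studies discussed.

```python
from collections import Counter

def count_hits(read_assignments):
    """Count reads per gene; None marks a read not assigned to any gene."""
    return Counter(g for g in read_assignments if g is not None)

def rpkm(counts, gene_lengths):
    """Reads per kilobase of gene per million assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in gene_lengths}
    return {
        gene: counts.get(gene, 0) * 1e9 / (length * total)
        for gene, length in gene_lengths.items()
    }

# Hypothetical read assignments and gene lengths (bp).
assignments = ["prfa"] * 30 + ["sigb"] * 60 + ["hfq"] * 10 + [None] * 5
lengths = {"prfa": 1000, "sigb": 2000, "hfq": 500}
hits = count_hits(assignments)
print(hits)
print(rpkm(hits, lengths))
```

in this toy input sigb collects twice as many reads as prfa but is also twice as long, so both genes end up with the same length-normalized value; raw hit counts alone would have suggested a twofold difference.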
however, when making cdna for next-generation sequencing transcriptomics in prokaryotes, there are several difficulties not found in eukaryotes, such as high levels of rrna and trna molecules as well as a lack of poly-a tails, making extraction difficult. nonetheless, it is possible to overcome these by either reducing the amount of rrna and trna using commercially available kits or by bioinformatic removal of such sequences postsequencing (van vliet, 2010). to date, some 20 rna-seq style experiments have been performed on prokaryotes. to give an example of the sort of novel insights that can be gleaned using such technology, passalacqua et al. (2009) sequenced the bacillus anthracis transcriptome using solid and illumina sequencing and clearly showed the polycistronic nature of many transcripts on a whole genome scale. although known for individual operons, this had never been shown on a genome-wide scale. they were also able to test the current genome annotations and discovered that 36 loci that were removed as nongenes showed significant transcriptional activity. in addition, 21 nonannotated regions had clear levels of transcription and should therefore be considered as genes (passalacqua et al., 2009). as internal methionines could have incidentally been identified as start codons, they also checked whether upstream regions were included in the transcribed region. in 11 cases this proved to be the case, suggesting the original start codons were incorrectly annotated. reassuringly, when comparing their data with microarray data, a strong correlation was observed. interestingly, because of the very high resolution of sequence-based transcriptomics studies, it is possible to identify novel regulatory elements. 
for example, when comparing expression levels under o2- and co2-rich conditions, the first gene of an eight-gene operon did not show a marked difference in expression level while all the others were significantly upregulated under co2 (passalacqua et al., 2009). indeed, a bioinformatics approach had suggested the presence of a t-box riboswitch between genes 1 and 2 of this operon (griffiths-jones et al., 2005). a similar approach to study how burkholderia cenocepacia, an opportunistic cystic fibrosis pathogen, responds to environmental changes revealed several new potential virulence factors (yoder-himes et al., 2009). as b. cenocepacia is routinely isolated from soil, two strains (one isolated from a cystic fibrosis patient and one from soil) were analyzed for their response to growth in synthetic human sputum medium and in soil medium. although their overall nucleotide identity is 99.8%, 179 and 120 homologous genes showed a significant difference in expression between the two strains when grown in synthetic sputum medium and soil medium, respectively. this suggests that despite the high level of relatedness, differential gene expression plays a large role in adaptation to their ecological niche (yoder-himes et al., 2009). interestingly, similar to passalacqua et al. (2009), several expressed noncoding rnas were uncovered with different expression levels depending on environmental condition. the significance of this needs to be investigated but highlights the ability of second-generation sequencing to unearth novel findings. although, under the pan-genome concept, a species' genome could well be larger than the actual genome content of any one member of that species, an organism's proteome is far more complex still. as discussed earlier, transcriptomics will reveal which subset of the genome is expressed under a given condition. 
however, posttranslational modifications of proteins make the actual proteome far more complex than the transcriptome. this is also the strength of proteomics, as can be seen in a study of the obligate intracellular parasite chlamydia pneumoniae. c. pneumoniae is the third-most-common cause of respiratory infections in the world, which, in part, is made possible by the unique bi-phasic life cycle of this bacterial pathogen. chlamydia spread via a metabolically inert infectious particle called the elementary body. these elementary bodies enter the host cell where they differentiate into reticulate bodies. as the elementary body is the infectious phase, proteins presented on the outer membrane would be ideal candidates for vaccine development, especially as effective vaccines are lacking and treatment is via antibiotic therapy. a large-scale genomics-proteomics study by montigiani et al. (2002) systematically assessed putative exposed antigens for possible use in vaccine development. of the 1073 c. pneumoniae genes, 636 have assigned functions; 72 of the latter are predicted to be peripherally located and were therefore selected for follow-up studies. in addition, the remaining 437 orfs were subjected to a series of search algorithms aimed at identifying putative surface-exposed antigens. in total, 141 orfs were identified as being possibly located on the cell surface. these 141 were subsequently used to produce recombinant proteins in e. coli. because both his-tagged as well as gst-tagged versions were made, a total of 173 recombinant proteins were produced and used for immunizations of mice. all antisera were used in facs analysis to test if they could bind to the c. pneumoniae cell surface. this resulted in the identification of 53 putative surface-exposed antigens. interestingly, apart from well-known antigens, 14 antigens from unidentified orfs were part of this group of potential vaccine candidates. 
all 53 candidates were tested on western blots to determine whether they generated a clean band of the expected size or cross-reacted with other proteins; 33 of the 53 were specific. finally, montigiani et al. conducted a proteomic analysis of total protein from the elementary body phase, identifying spots using mass spectrometry. protein sequencing using maldi-tof identified 28 putative surface-exposed antigens on the c. pneumoniae 2d gels (montigiani et al., 2002). a follow-up study by thorpe et al. (2007) clearly showed that one of the identified candidates, lcre, induced, amongst others, cd4+ and cd8+ t cell activation and completely cleared infection in a murine model. interestingly, lcre is homologous to a protein that is thought to be part of the type iii secretion system of yersinia. the exposed nature of lcre on the c. pneumoniae cell surface suggests that a type iii secretion system plays a role in chlamydia infection (montigiani et al., 2002). the importance of exposed outer membrane proteins as potential vaccine candidates has prompted berlanda scorza et al. to assess the complement of outer membrane proteins from an extraintestinal pathogenic e. coli strain (berlanda scorza et al., 2008). extraintestinal pathogenic e. coli is the leading cause of severe sepsis and current increases in drug resistance warrant the search for novel vaccine targets. in addition, current whole-cell vaccines suffer from undesired cross-reactions to commensal e. coli as well. the novel approach by berlanda scorza et al. is based on the observation that some gram-negative bacteria release outer membrane vesicles (omv) in the culture media, albeit in minute quantities. a tolr mutant appeared to release many more omvs than wild-type cells and subsequent large-scale mass spectrometric analysis of its protein content resulted in the identification of 100 proteins. the majority of these were outer membrane and periplasmic proteins. 
intriguingly, three subunits from the cytolethal distending toxin (cdt) were included. this toxin is unusual in that one of its subunits is targeted to the eukaryotic host cell, where it breaks double-stranded dna resulting in cell death (de rycke and oswald, 2001). to check whether the presence of cdt in the omv was due to the tolr knockout, wild-type extraintestinal pathogenic e. coli was tested using western blotting. indeed, cdt was detected in wild-type omv as well (berlanda scorza et al., 2008). this suggests that toxin delivery via vesicles might well be the key event in pathogenesis. interestingly, 18 of the 100 identified proteins were not predicted to be targeted to the periplasm or outer membrane by psortb (gardy et al., 2005). we see here excellent opportunities to train protein targeting algorithms with new wet-bench data as these algorithms generally have been trained on a limited set of model organisms that do not reflect the diversity encountered in real life. despite the enormous progress in genomics of infectious diseases, the discovery of new drugs has not kept equal pace. for example, no candidate drugs have been identified after 70 high-throughput screens using validated bacterial drug targets (payne et al., 2007). although broad-spectrum drugs might be more desirable, there has been a recent trend in targeting specific proteins from specific pathogens using structural biology. several structural genomics initiatives have been set up to target specific groups of pathogens. for example, the seattle structural genomics center for infectious diseases (http://ssgcid.org) and the center for structural genomics of infectious diseases (http://www.csgid.org) work on category a to c agents listed by the national institute for allergy and infectious diseases (niaid). other centers focus on specific organisms such as mycobacterium tuberculosis. examples are the mycobacterium tuberculosis structural proteomics project (http://xmtb.org) 
and the mycobacterium tuberculosis structural proteomics consortium (http://www.doe-mbi.ucla.edu/tb). the field of structural genomics aims to solve as many protein structures as possible from human pathogens with the aim of coming up with new drug targets or vaccines (van voorhis et al., 2009). obviously, correct selection of candidates for structural genomics projects is paramount and various criteria have been put forward (anderson, 2009; van voorhis et al., 2009). a protein that is already a validated drug target is an obvious choice. the proteins need to be essential for the pathogen and, ideally, absent in humans. proteins involved in the uptake of essential nutrients are another target. classically, drug design has focused on substrate binding sites. more recently, small molecules interfering with subunit binding have started to attract attention. as eukaryotic and prokaryotic inorganic pyrophosphatases differ in composition (the former are homodimers, while the latter are homohexamers), efforts are aimed at compounds that interfere with the oligomeric state of the enzyme. in contrast, the highly conserved active site of inorganic pyrophosphatase would not have been a good target (van voorhis et al., 2009). the 2003 sars outbreak that caught the infectious diseases community (if not the whole world) by surprise is one example where structural genomics has made enormous progress. although coronaviruses were known to cause serious diseases in animals, the fact that they only caused mild disease in humans meant that there was very little knowledge about coronavirus biology. the subsequent effort to understand viral assembly and replication/transcription, for example, has resulted in the elucidation of 12 solved sars-cov protein structures. interestingly, the novel fold-discovery rate was nearly 50%, while it would normally be closer to 6% (bartlam et al., 2007). 
in addition, one key protein, the sars-cov main protease, has since been at the center of structure-based drug discovery. because of the nature of the discipline, structural genomics is dependent on various other disciplines such as biochemistry, microbiology, structural biology, computational biology, and bioinformatics and can only flourish in a truly interdisciplinary environment (anderson, 2009). it is now possible to sequence the entire genome of a bacterial pathogen, assemble the raw sequence reads, perform automated annotation, and visualize the results within 3 weeks. at the same time (indeed even on the same sequencer) it is also possible to selectively sequence the transcriptome (rna-seq), regions of dna bound to protein (chip-seq), or, for relevant species, methylated dna to study epigenetic effects, as well as small rna molecules. it is also possible to perform the very same sequencing on the host organism at the same time. bioinformatic algorithms and tools are crucial in analyzing such unprecedented volumes of data. these data volumes have emerged as a result of second-generation sequencers such as the roche/454, illumina, and abi/solid systems. although useful information can be extracted by single researchers by targeted analysis of the sequencer output, to gain the most information out of such data, it is becoming increasingly common for multiple researchers or research groups with widely differing areas of expertise to collaborate. this collaboration is absolutely crucial if relevant insights are to be gained from large-scale datasets. as a result a vast array of data is generated, which needs to be annotated and curated as well as analyzed for information relevant to any particular experiment. in addition this information needs to be stored, shared, and distributed in a manner that enables reanalysis if and when new hypotheses are generated. 
platforms as produced by the gmod consortium (http://gmod.org), such as gbrowse, and underlying databases are excellent web-based tools for visualizing and comparing datasets. however, they currently offer limited scope for collaborative annotation or curation of datasets where relevant expertise can be brought to bear from a variety of different research groups. this problem is magnified with the advent of second-generation sequencers since much smaller groups of researchers tend to be involved, meaning that the expertise that large collaborations can muster (such as the influenza research database [fludb], http://www.fludb.org/) is much smaller. thus there is a need for integrated annotation and visualization pipelines to enable individual researchers to perform comparative genomics and transcriptomics. the broad institute offers a number of useful visualization tools to the individual researcher such as argo (http://www.broadinstitute.org/annotation/argo/) and the integrated genome viewer (igv) (http://www.broadinstitute.org/igv/). argo offers the ability to manually annotate and visualize a genome as well as provide a good graphical overview for comparative genomics and transcriptomics. currently, there is no one standard for bioinformatics pipeline development for next-generation sequencing. several efforts are underway or can be adapted from sanger sequencing pipelines. these include the prokaryote annotation pipeline xbase and the isga server (hemmerich et al., 2010) . these enable de novo sequenced prokaryote genomes to be annotated automatically and corrected manually at a later date. alternative sanger adaptations such as maker can also be used once an assembly has been generated. a large array of programs is now available to either align reads to a reference genome or to assemble them de novo (miller et al., 2010; paszkiewicz and studholme, 2010 ). 
they will not be listed in detail here as there are many considerations, including the sequencing platform used, the read length in use, the expected genome size, the length of the longest repetitive elements, gc content, and whether paired-end reads are in use. the proprietary newbler software from roche is the most popular method of de novo assembly of 454 reads (typically 400-500 bp). popular assemblers for short reads (i.e., mostly from illumina or solid platforms) are velvet (http://www.ebi.ac.uk/~zerbino/velvet) for the assembly of genomic dna and oases, from the same group, for the assembly of reads from transcriptomic cdna (http://www.ebi.ac.uk/~zerbino/oases) (zerbino and birney, 2008). other assemblers such as abyss (simpson et al., 2009), allpaths (butler et al., 2008), or soapdenovo (http://soap.genomics.org.cn/soapdenovo.html) are also popular. abyss enables assembly to be parallelized, thus speeding up assembly. allpaths has been shown to offer superior performance when multiple paired-end libraries are used. independent of read length, it is crucial that paired-end libraries are used when constructing de novo assemblies of any genome. note that the use of short-read sequences only can lead to significant gaps being left in the final assembly due to repetitive elements. however, for many analyses (especially for prokaryotic organisms) these gaps are generally not considered to be significant. in cases where closure of these gaps is more desirable, the addition of 454, sanger, or long-range pcr data can often help. where significant quantities of long- and short-read data are available, a joint assembly can be attempted. a recommended protocol is to assemble the short and long reads separately using their respective packages and then to merge the two assemblies using programs such as minimus (sommer et al., 2007).
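the effect of repetitive elements on short-read assembly, noted above, can be illustrated with a toy de bruijn graph, the kind of structure on which velvet is based (zerbino and birney, 2008). the python sketch below is illustrative only and is not the velvet implementation: the function names, the 12-base toy genome, and the error-free 5-base reads are all assumptions made for this example. a node with more than one successor marks a point where the assembler cannot tell which path to follow, which is where contigs tend to break.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor (ambiguous extension points)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical toy genome in which the 3-mer "ACG" occurs twice,
# each time followed by a different base.
genome = "TTACGGAACGCC"
read_length = 5
reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]

# With k = 4 the repeated 3-mer becomes a branching node.
ambiguous_k4 = branching_nodes(de_bruijn_edges(reads, k=4))

# With k = 5 the nodes are 4-mers, which are unique here, so the branch disappears.
ambiguous_k5 = branching_nodes(de_bruijn_edges(reads, k=5))

print("branching nodes, k=4:", ambiguous_k4)
print("branching nodes, k=5:", ambiguous_k5)
```

in this toy case a longer k-mer resolves the repeat, but k cannot exceed the read length; repeats longer than the reads therefore stay ambiguous unless paired-end or long-read data span them, which is the motivation for the joint assembly described above.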
another option is to use a template sequence from a related organism to help guide the assembly (note-this is distinct from remapping as described). the amoscmp package is useful for this purpose (pop et al., 2004). finally, whatever assembly method is used, it is important to remember that a longer assembly is not necessarily a better one. examining the reads making up a contig (e.g., using the amos package (http://amos.sourceforge.net) or the tablet viewer (http://bioinf.scri.ac.uk/tablet)) and alignment to a core-conserved group of genes should be standard practice to ensure that blatant errors are corrected. remapping of short reads to a reference genome is also a valid method of comparison. although software such as blat (kent, 2002) can be used with longer 454 reads, it is not an ideal tool for shorter read technologies where data volumes are much greater. where such a genome is available, software such as maq, its successor bwa, bowtie, soap, and others offer a wealth of tools to identify indels, snps, and other variants that may be of interest. crucially, in these cases it is important to have sufficient depth of coverage to ensure snp calls are valid. paired-end data are also valuable for highlighting the presence of indels. after remapping it is also common practice to assemble unmapped reads using de novo assembly software to reveal any novel sequence variants that may be absent in the reference. in the case where pathogens and hosts are sequenced together, if the sequence of at least one is known, then it is relatively straightforward to separate the two using bioinformatic techniques. to deal with transcriptomic data where a reference sequence is available, software such as erange (http://woldlab.caltech.edu/rnaseq/), tophat (trapnell et al., 2009), and cufflinks (http://cufflinks.cbcb.umd.edu/) is extremely useful.
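as a rough aid to the point made above about depth of coverage for snp calling, the expected average depth can be estimated from the number of reads, the read length, and the genome size (depth = reads × read length / genome size). the python sketch below assumes reads are spread uniformly over the genome, which real data only approximate; the function names, the example numbers, and the 30-fold target are hypothetical illustrations and are not recommendations taken from any of the tools cited here.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average depth of coverage, assuming reads are spread uniformly."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads giving at least the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 5 Mb bacterial genome sequenced with 36-base reads.
genome_size = 5_000_000
read_length = 36

depth = expected_coverage(5_000_000, read_length, genome_size)
required = reads_needed(30, read_length, genome_size)

print(f"expected depth: {depth:.1f}x")
print(f"reads needed for 30x: {required}")
```

the average is only a starting point: depth varies locally with gc content and repeats, so per-position depth at each candidate snp should still be checked before a call is accepted.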
the cufflinks module in particular offers the ability to predict the most likely exon isoform expression pattern using a combination of bayesian statistics and graph-based algorithms. we are aware that our treatment of the use of "omics" and bioinformatics in infectious disease research is not exhaustive. as mentioned in the introduction, what constitutes bioinformatics is not entirely clear and arguably varies depending on who tries to define it. however, we have attempted to show the considerable progress in infectious diseases research that has been made in recent years using various "omics" case studies. in addition, the last section is an attempt to provide a brief overview of the problems and (bioinformatics) solutions faced by current-day scientists who embark on second-generation sequencing strategies. this is a fast-moving field, but the provided references and websites should be a good first approach for those who wish to make further strides toward eradicating infectious diseases from our planet.
basic local alignment search tool structural genomics and drug discovery for infectious diseases structural proteomics of the sars coronavirus: a model response to emerging infectious diseases proteomics characterization of outer membrane vesicles from the extraintestinal pathogenic escherichia coli δtolr ihe3034 mutant the genome of the african trypanosome trypanosoma brucei molecular detection of multiple emerging pathogens in sputa from cystic fibrosis patients allpaths: de novo assembly of whole-genome shotgun microreads evolution of pathogenicity and sexual reproduction in eight candida genomes new tools for discovering and characterizing microbial diversity cytolethal distending toxin (cdt): a bacterial weapon to control host cell proliferation? 
the genome sequence of trypanosoma cruzi, etiologic agent of chagas disease comparative genomics of trypanosomatid parasitic protozoa complete nucleotide sequence of bacteriophage ms2 rna: primary and secondary structure of the replicase gene complete nucleotide sequence of sv40 dna the pfam protein families database whole-genome random sequencing and assembly of haemophilus influenzae rd the minimal gene complement of mycoplasma genitalium psortb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis a universal vaccine for serogroup b meningococcus comparative genomics of listeria species life with 6000 genes rfam: annotating non-coding rnas in complete genomes molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products an ergatisbased prokaryotic genome annotation web server comparative genomic analyses of seventeen streptococcus pneumoniae strains: insights into the pneumococcal supragenome interactive instruction on population interactions microbial population structures in the deep marine biosphere microbiology: metagenomics ironing out the wrinkles in the rare biosphere through improved otu clustering the genome of the kinetoplastid parasite, leishmania major bacillus anthracis evolution and epidemiology blat-the blast-like alignment tool genome analysis reveals pili in group b streptococcus the death of microarrays? 
evolution of the core and pan-genome of streptococcus: positive selection, recombination, and genome composition pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus campylobacter identification of a universal group b streptococcus vaccine by multiple genome screen the microbial pangenome assembly algorithms for next-generation sequencing data genomic approach for analysis of surface proteins in chlamydia pneumoniae direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach structure and complexity of a bacterial transcriptome de novo assembly of short sequence reads identification of vaccine candidates against serogroup b meningococcus by whole-genome sequencing the pangenome structure of escherichia coli: comparative genomic analysis of e. coli commensal and pathogenic isolates cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms nucleotide sequence of bacteriophage phix174 dna whole-genome comparison of disease and carriage strains provides insights into virulence evolution in neisseria meningitidis the key role of genomics in modern vaccine and drug design for emerging infectious diseases prodom: automated clustering of homologous domains abyss: a parallel assembler for short read sequence data minimus: a fast, lightweight genome assembler genome analysis of multiple pathogenic isolates of streptococcus agalactiae: implications for the microbial "pan-genome genome sequence of the nematode c. 
elegans: a platform for investigating biology discovery of a vaccine antigen that protects mice from chlamydia pneumoniae infection the listeria transcriptional landscape from saprophytism to virulence the complete genome sequence of the gastric pathogen helicobacter pylori tophat: discovering splice junctions with rna-seq next generation sequencing of microbial transcriptomes: challenges and opportunities the role of medical structural genomics in discovering new drugs for infectious diseases mapping the burkholderia cenocepacia niche response via high-throughput sequencing velvet: algorithms for de novo short read assembly using de bruijn graphs
we would like to acknowledge our colleague dr. david j. studholme for his suggestions and feedback.
key: cord-278136-ol2buwld authors: gonzales, natalia m.; howell, viive m.; smith, clare m. title: 29th international mammalian genome conference meeting report date: 2016-05-02 journal: mamm genome doi: 10.1007/s00335-016-9640-0 sha: doc_id: 278136 cord_uid: ol2buwld
during november 8-15, 2015, the 29th annual international mammalian genome conference (imgc) attracted researchers from all over the world to yokohama, japan to discuss the latest advances, tools, and techniques in mammalian genetics. organized by piero carninci (riken) and the international mammalian genome society (imgs; www.imgs.org), the meeting brought together 336 scientists from 28 countries, not to mention hundreds of online participants that followed the live tweeting of talks (#imgc15). social media has become an integral part of the imgc, complementing the scientific content and facilitating ongoing discussions between scientists around the globe. the conference opened with a bioinformatics workshop, guided tours of riken laboratories, and a trainee symposium, giving ph.d. 
students and early-career postdoctoral researchers an opportunity to share their works in a collegiate and mentoring setting and vie for the chance to present at the main meeting. participants were officially welcomed to yokohama at the evening reception, where old friends and new were met over a smorgasbord of local cuisine. the main meeting was divided into sessions showcasing the wide-ranging research interests of imgs members. these included human disease models and immunology, neuroscience, development and stem cells, genomics and computational analysis, epigenomics and noncoding rnas, advances in genome editing, and largescale resources. the verne chapman lecture was given by professor john mattick, director of the garvan institute of medical research in sydney, and the inaugural darla miller distinguished service lectureship was awarded to janan eppig, professor at the jackson laboratory and a pioneer of the mouse genome informatics (mgi) program project. several poster sessions, a mentor lunch for trainees and workshops on bioinformatics, systems genetics, scientific literature curation, gene enrichment analysis, and fantom were also featured in the meeting program, which culminated in a feast of epic proportions. abstracts from the meeting are available at www.imgc2015.jp. the imgc has a strong reputation for advocating new scientific talent, epitomized by the 33 trainee scholarships awarded for conference travel, the mentor-trainee lunch, presentation awards, and numerous opportunities to present work and receive feedback. the trainee symposium is an integral part of the meeting, featuring oral presentations from 16 graduate students and postdoctoral researchers. the wide variety of topics presented exemplified the diversity and utility of mammalian model systems for both clinical and basic research. 
xenograft mouse models were utilized by hiroyuki yoda (ts-01; chiba cancer centre research institute) and takahiro inoue (ts-13; chiba cancer centre research institute) to demonstrate the in vivo anti-tumor effects of novel alkylating agents targeting the oncogenes mycn in neuroblastoma, and kras g12d/v in colorectal cancer, respectively. the power of forward genetics was demonstrated by lisa gralinski (ts-02; university of north carolina) using the precollaborative cross to identify sars-coronavirus susceptibility loci and irina treise (ts-16; german mouse clinic), exploiting n-ethyl-n-nitrosourea (enu) mutagenesis to uncover molecular mechanisms of immunodeficiency. the sanger knockout mouse resources were highlighted by kifaythullah liakath-ali (ts-12; kings college london), who conducted a phenogenomics screen to investigate the genetic basis of skin phenotypes. gennadiy tenin (ts-10; university of manchester) followed up on human genomewide association studies (gwas) of the congenital heart defect tetralogy of fallot by developing an in vitro mouse heart culture for testing candidate genes by sirna knockdown. ximena ibarra-soria (ts-07; wellcome trust sanger institute) took us on a journey through the mouse olfactory system, demonstrating that novel olfactory receptor genes can be identified, characterized, and distinguished from environmental regulators despite the system's overwhelming complexity. several speakers focused on behavioral traits, including yuki matsumoto (ts-04; national institute of genetics), who identified a locus on chromosome 11 associated with mouse tameness in wild-derived heterogeneous mouse stocks, and akira tanave (ts-09; national institute of genetics) who is exploring the etiology of anxiety and stress by studying strain differences between wild-derived msm mice and c57bl/6 mice. 
guzel gazizova (ts-03; kazan federal university) introduced the dormouse as a genetic model, discussing how transcriptome-level differences between two genera of dormice can be analyzed to yield insights into the evolution of hibernation and to clarify the status of the dormouse in the mammalian phylogeny. belinda goldie (ts-06; kyoto university) took us further into the world of gene expression, demonstrating the importance of extending micro-rna (mirna) analysis beyond evolutionarily conserved targets when exploring gene regulation of human neuronal synapses. hazuki takahashi (ts-14; riken) demonstrated a high-throughput system to optimize antisense long-noncoding rnas that increase translation of target mrnas, and riti roy (ts-15; university of western australia) examined expression profiles of receptors and ligands in cell lines profiled in the fantom5 project and tumors profiled by the cancer genome atlas (tcga) to understand how cancer cells communicate. mice were not the only model system represented by trainee symposium talks. pavel mazin (ts-05; skolkovo institute of science and technology) examined brain rna-seq data from humans, chimpanzees, and macaques to identify species and age-related differences in alternative splicing patterns. pavel prosselkov (ts-11; riken) branched from the mammalian tree, using the sea squirt to investigate the role of gene paralogs in cognition. finally, brandon velie (ts-08; swedish university of agricultural sciences) showcased the horse, an underutilized organism in mammalian genetics, as a model for identifying genetic factors related to locomotion, allergic diseases, and eczema. ximena ibarra-soria, akira tanave, hazuki takahashi, and irina treise were named as the lorraine flaherty awardees for their talks during the trainee symposium ( fig. 1 ; table 1 ) and received the opportunity to present their work at the main conference. 
several plenary sessions focused on a range of mammalian tools and resources for modeling human disease. the session showcased tools such as recombinant inbred lines (rils), outbred populations, classic crosses, and enu mutagenesis to yield new understanding and identify candidate genes for disease susceptibility, while knockout and patient-derived xenograft mice enabled further mechanistic insight. the session featured many talks utilizing the phenotypic diversity and genetic mapping power of diversity outbred (do) mice and the collaborative cross (cc), community resources ~10 years in the making. fernando pardoumass medical school) showed that resistance loci underlying tuberculosis pathogenesis could be mapped in ~60 cc lines and ~20 lines of the incipient c57bl/6 × dba/2 (bxd) cross. interestingly, bacterial modules could be mapped onto the host genome to understand how both host and pathogen genomes determine disease outcome. the mapping resolution of the do population was also highlighted in cancer, with nigel crawford (o-19; national institutes of health, bethesda, md) using quantitative trait locus (qtl) mapping in tramp × j:do f1 males to identify metastasis susceptibility loci for prostate cancer. the power of the enu approach was demonstrated by several talks. gaetan burgio (o-02; australian national university) identified host factors altering malaria infection outcomes that could be targeted as a novel host-directed antimalarial therapy, and kart tomberg (o-16; university of michigan) mapped thrombosis modifier genes by bulk exome sequencing mice from a sensitized enu suppressor screen. both talks featured the use of crispr/cas9 gene editing of candidate mutations to validate causative alleles. 
the impact of mouse models on precision oncology was showcased by carol bult (o-01; the jackson laboratory), who discussed how patient-derived xenograft models can provide a platform for testing therapeutic options to guide treatments for breast and other cancers (fig. 2). kate ackerman (o-22; university of rochester) used inducible wt1 creert2 to determine the impact of loss of ctnnb1 at different time-points, concluding that β-catenin is critical for diaphragm development during a defined window of time. han kyu lee (o-18; duke university) analyzed polymorphisms among the ancestral haplotypes of 32 inbred mouse strains to map loci for ischemic stroke outcomes, validating the interleukin 21 receptor as a candidate by analysis of gene expression patterns and knockout models. other features of this session included a gwas of aerobic capacity in rats segregated on running ability by yu wang german center for neurodegenerative diseases tuebingen) conducted a massive forward genetic screen using human exome data, followed by systematic rnai screens in worms, flies, and human cell lines to identify genes and pathways involved in parkinson's disease. this year's imgc included a mini-session that sought to address the question of whether or not the mouse is still relevant as a model for human disease. while this particular audience needed no convincing of the fundamental scientific understanding gained through use of the mouse as a model, tsuyoshi miyakawa (o-23; fujita health university) gave a thought-provoking narration of his lab's response to a controversial publication claiming otherwise. his careful consideration of arguments made from both sides of the controversy emphasized the importance of understanding the experimental design and methods for analyzing a study before interpreting its results. miyakawa compared his re-analysis of mouse genomic data from seok et al. 
(2013) with the original study, which had concluded that genomic responses in mouse models poorly mimic human inflammatory diseases. miyakawa's group drew the opposite conclusion from the data, demonstrating that responses in mouse models greatly mimic human inflammatory diseases (takao and miyakawa 2015) . alterations to the seok et al. (2013) analysis included comparing genes that exist in both mouse and human, not just human disease genes that lack rodent homologs. he emphasized the need to define appropriate phenotypes and choose appropriate statistical methods. ultimately, he concluded, the 15 % overlap in gene expression between mouse and human does not mean that the mouse is a bad model; instead, the overlapping 15 % probably contain the genes that are the most important for disease. the panel responses also emphasized other advantages of mouse models, including but not limited to access to tissue, ability to measure responses at different time-points on the same background and the capacity to do epigenetic studies. the symposium also raised a conundrum of the scientific review process by highlighting ways in which the design and methods chosen to address a question can either obscure or reveal scientific truths, inciting a thoughtful series of questions about training and the process of peer review. how can we ensure that future generations of biologists are adequately trained to evaluate statistical methods? when the same data can be analyzed to show very different results, potentially affecting funding decisions on models, is the ultimate onus for publication on the journal or expert scientific reviewers? these questions were discussed in the open forum and reflect a larger conversation taking place within the wider scientific community, where they will undoubtedly continue to be discussed. 
this plenary session encompassed the use of mouse embryonic stem cells (mescs), gene expression analysis, and recent advances in genome engineering to address fundamental questions about development and degenerative disease. anne czechanski (o-06; the jackson laboratory) described how her experience deriving novel pluripotent mescs with the inhibitor cocktail 2i led to the unexpected observation that female cell lines experience a higher rate of attrition than male cell lines. future transcriptional profiling may uncover why the combination of x chromosome dosage and 2i culture conditions leads to attrition of female lines and will have important implications for those using mescs. sandra richardson (o-07; university of queensland) used retrotransposon capture sequencing (rc-seq) to deduce the timing and frequency of retrotransposon insertions in multigeneration c57bl/6 pedigrees. she presented data showing retrotransposition in the early embryo resulting in somatic and germline genetic mosaicism, adding to the evidence for retrotransposition as an important source of genetic diversity. patrizia rizzu (o-08; german center for neurodegenerative diseases) shared how she used fantom5 data to understand how a hexanucleotide repeat expansion influences the transcriptional profile of c9orf72, a gene involved in neurodegenerative disease. yasuhide furuta (o-09; riken) derived compound mutant mice from mescs containing multiple targeted mutations in the fibroblast growth factor (fgf) signaling system to study the role of fgf family genes in eye development. due to tight linkage between the genes of interest and reduced fertility in fgf mutant lines, it was previously impossible to generate compound mutants from an experimental cross. however, the development of crispr/cas9 allowed furuta's group to measure the extent of functional redundancy within the fgf pathway and uncover novel roles for its constituents. 
gabriela sanchez-andrade (o-10; wellcome trust sanger institute), recipient of this year's verne chapman young investigator award (table 1) , closed the session with a discussion of her efforts to identify new biomarkers of neurodegenerative disease in a mouse model of frontotemporal dementia and parkinsonism (ftdp) with severe olfactory deficits. olfactory dysfunction is one of the earliest and most common symptoms of neurodegenerative disease in humans, yet the underlying molecular mechanisms are unknown. sanchez-andrade's impressive analysis of the olfactory epithelium measured differential expression and protein dynamics in wild-type and transgenic mice to identify a set of candidate genes correlated with onset of ftdp. furthermore, she demonstrated that similar changes in the expression of these genes occur in other brain regions relevant to ftdp pathology, highlighting the potential of her approach to identify additional biomarkers of human disease. the imgc featured three sessions on genomics and computational analysis. although the 13 selected talks were diverse in both content and approach, each of them touched upon at least one of the following themes: advances in omic technology, molecular mechanisms underlying complex traits, and the relationship between genomic architecture and mammalian evolution. speakers in the first session discussed their experiences with single-cell rna-seq to address a series of questions that remain integral to the imgc each year; namely, what tools and technologies are driving our field forward? what exciting possibilities do they create, and what obstacles do we face in using these methods and interpreting their results? 
anna mantoski (o-11; roslin institute) shared her solutions to some of the analytical puzzles that technical variability can create for single-cell sequencing experiments and described how studying differences in gene in describing cap analysis of gene expression (cage), a method for capturing single-cell transcriptomes, charles plessy (o-12; riken) highlighted many of the challenges familiar to researchers using rna-seq and shared how his strategy of combining cage with techniques including ''pseudo-random'' primers to remove rrna, molecular tagging, and fragmentation can improve the quality of single-cell data. anton kratz (o-13; riken) further emphasized cage's potential as he described how his group applied translating ribosome affinity purification (trap), which can isolate the ribosome-associated transcriptome (the translatome), to purkinje dendrites to measure transcription at the subcellular level and identify biomarkers for specific cell types. much of the remaining work sought to address broad evolutionary questions dealing with the relationship between biological mechanisms, complex traits, and genomic architecture. martin taylor (o-33; university of edinburgh) discussed how the distribution of replication-associated polymorphisms in mouse and human genomes may be explained in part by patterns of transcription factor binding and chromatin accessibility in the paternal germline. satoshi oota (o-52; riken) used enu mutagenesis in the mouse to observe evolution in real time, which provided insight into how the distribution of gc content has evolved in mammalian genomes. andrew morgan (o-34; university of north carolina chapel hill) explored the functional and evolutionary impacts of large copy number variations at a specific locus in the mouse genome. 
another prominent theme among the research featured in the sessions on genomics and computational analysis was the relationship between genotypes and quantitative phenotypes at both the level of the cell and the organism. jason lin (o-36; chiba cancer center research institute) discussed how a new class of molecules that combine the histone deacetylase inhibitor saha and dna-binding pyrrole-imidazole polyamide (saha-pip) can induce epigenetic reprogramming and regulate pluripotency. steven munger (o-53; the jackson laboratory) used a systems genetics approach in the do to resolve the conflict between the expectation of high mrna-protein expression correlation from the central dogma of biology and recent observations suggesting a weak correlation. his work showed that while the levels of many proteins are regulated by nearby variants that influence mrna expression levels (cis-eqtl), the relative stoichiometry of proteins in stable partnerships and complexes is a key posttranslational regulator of protein abundance that can act to buffer individual member proteins against cis-acting transcriptional variation. robert young (o-35; university of edinburgh) presented evidence of divergence in the patterns of promoter gain and loss in humans and mice, offering insight into the evolutionary processes that contribute to gene expression and phenotypic diversity in both species. peter williamson (o-54; university of sydney) used a panel of inbred mouse strains to identify qtl related to metabolism, body composition, lactation, and other complex traits. a common approach featured at the imgc each year is the use of the mouse as a model for understanding how biological processes influence and respond to changes in the mammalian genomic landscape. notably, this year's sessions on genomics and computational analysis also included speakers that used a variety of other organisms in their research. 
in addition to the several talks mentioned above, we heard from david beier (o-32; seattle children's research institute) who analyzed the genomewide distribution of nonsense mutations in the exomes of individuals without severe mendelian disorders. he found that the strength of heterozygote selection correlated with the likelihood of recessive lethality and that many genes with established roles in developmental diseases had high heterozygote selection. martin frith (o-51; national institute of advanced industrial science and technology, tokyo) used sequence data from humans, chimps, orangutans, and dogs to classify different types of human chromosomal rearrangements, which occur in slowly evolving regions. finally, isaac adeyemi babarinde (o-37; national institute of genetics) introduced the audience to the capybara (the world's largest living rodent) and shared how he used its genome to understand the relationship between mutation rate and body size in rodents. for the first time at an imgc, two plenary sessions were devoted to epigenomics and noncoding rnas. this highlights the increasing awareness of looking beyond the traditional gene-protein relationship for understanding transcriptional control in both normal and diseased states in mammals. the influence of parental diet was featured in two presentations. joseph nadeau (o-25; pacific northwest research institute) illustrated that both folate supplementation and mutations in rna modulating genes such as dnd1, a1cf, and apobec1 bias fertilization toward wildtype genotypes. johannes beckers (o-26; helmholtz zentrum, munich) reported the epigenetic inheritance of an acquired metabolic disorder. he showed that a parental high-fat diet increased offspring susceptibility to obesity and type 2 diabetes. both speakers stressed the importance of understanding the underlying mechanisms related to diet for guiding future public health policies. 
different aspects of genomic location as a determinant of transcriptional regulation were also discussed in the first plenary session. through determination of the three-dimensional configuration of x chromosomes in different tissues and cells, christine disteche (o-24; university of washington) demonstrated that genes that escape x inactivation preferentially localize at the periphery of the nucleus. she also showed that the expressed alleles of imprinted genes have greater chromatin contacts than the silent alleles, suggesting that these expressed regions are under greater organizational constraints. reanalysis of the publicly available encode data led siddharth sethi (mrc harwell) to the discovery that transcription factor-binding sites are highly enriched in dnase1 hypersensitive sites compared to promoter and enhancer regions. as such, dnase1 hypersensitivity data could be used as an alternative method for discovering enriched regulatory motifs with the aim of improving understanding of phenotypic variability. the second plenary session continued the theme of using genomewide data to understand transcriptional control by different noncoding elements: promoters, enhancers, microrna, and l1-transposable elements. to dissect the relationship between dynamic changes in mrna and enhancer rna, erik arner (o-44; riken) measured the activities of promoters and enhancers over time in a number of cell types following different biological stimuli. he presented data supporting enhancer transcription as the earliest event in successive waves of transcriptional change. this phenomenon was observed in multiple biological systems, suggesting this may be a general feature of mammalian transcriptional regulation, contrary to models showing coexpression of enhancers and promoters. albin sandelin (o-43; university of copenhagen) provided a clinical application, presenting rna-seq data profiling colon specimens from patients with inflammatory bowel disease.
a promoter set was identified, which could accurately distinguish between the primary disease subtypes: crohn's disease and ulcerative colitis. moreover, 20,000 active enhancer regions were identified with subsets induced in general inflammation or specifically in one subtype. these enhancers had links to both known and novel genes involved in the pathogenesis of inflammatory bowel disease. michiel de hoon (o-45; riken) turned attention toward the role of mirna in gene regulation. he presented the results from a collaboration spanning 31 centers internationally, which analyzed deep sequencing data of paired small rna and cage libraries across a wide range of cell types. this analysis revealed two classes of mirna: cell-type-specific mirnas and ubiquitous mirnas. cell-type-specific mirnas are highly expressed in only a few cell types and may act as buffers of gene expression. ubiquitous mirnas are expressed in most cell types, but depleted in particular cell types and may be important in preventing inappropriate activation of transcriptional programs. valerio orlando (o-46; ircss fondazione santa lucia) presented the last talk in the plenary session focusing on the epigenetic role of retrotransposable elements, specifically long interspersed nuclear elements 1 (l1). analysis of l1 transcription followed that of enhancer elements and myogenic program in normal muscle cells but was absent in duchenne muscular dystrophy (dmd)-affected muscle cells. pharmacological rescue of the dmd phenotype by histone deacetylase inhibitors or gene therapy was accompanied by normal l1 expression. he proposed that deregulation of l1 mobilization is a key trait in loss of cell identity and disease. genome editing using crispr/cas9 technology has taken the scientific world by storm, allowing rapid and efficient editing in eukaryotic cells.
the in vitro and in vivo applications of crispr/cas9 genome editing were a strong theme at this year's meeting, with every plenary session including talks that made use of this technology, and one plenary session devoted specifically to advances in genome editing. marie-christine birling (o-29; institut clinique de la souris) kicked off the session demonstrating crispr/cas9 genome editing in rats. she described her group's effort to generate alleles with precise gene deletions and duplications of a 24-mb region by means of two different guide rnas on both sides of the target region. kazuto yoshimi (o-30; national institute of genetics) then combined crispr/cas9 ''scissors'' with single-stranded oligodeoxyribonucleotides as the ''paste'' mechanism to ligate the cut sites for efficient replacement of rat genes with human genes. finally, dave bergstrom (o-31; jackson laboratory) presented his lab's modified crispr approach to enable rapid ''humanizing'' of large segments of the mouse genome, giving the example of replacement of a mouse tumor suppressor gene with 25 kb of the orthologous human gene. inbred strains with whole-genome illumina sequencing and discussed the challenges of making alignments in complex regions. attendees were treated to the test site launch of the latest data available through the ucsc genome browser (http://www.hgwdev-mus-strain.sdsc.edu/cgi-bin/hggateway). the 2015 verne chapman lecture titled ''the hidden layer of regulatory rna in mammalian genome biology'' was delivered by eminent molecular biologist john mattick (o-42; garvan institute of medical research). mattick is known for his research in revealing the central role of nonprotein-coding dna in the production of regulatory nonprotein-coding rnas (ncrnas) and his efforts to understand how ncrna regulation contributes to the staggering level of phenotypic complexity observed throughout the animal kingdom.
he began his lecture with the quote ''are we letting a philosophy of the protein-coding gene control (our) reasoning? what then is the philosophy of the gene?'' he pointed out that this quote was not from this year or even this century; it was in fact a concern attributed to nobel laureate barbara mcclintock 75 years ago. he reminded us that the mammalian genome contains only ~20,000 protein-coding genes, the same number as in simple nematodes. on the other hand, the extent of nonprotein-coding dna increases with increasing developmental and cognitive complexity, reaching 98.5% in humans. he guided us through the events leading up to his discovery and the explosion of studies that followed, peppering his tale with vivid descriptions of the findings and figures that have influenced him over the years. he shifted effortlessly between concrete descriptions of data and philosophical speculations on the evolution of paradigm shifts, the mysteries of biological complexity, and the silent assumptions that inform (and occasionally hinder) scientific practice. he entertained the audience with his encyclopedic knowledge of molecular genetics, framing his research within a larger historical context that served to complement data-driven descriptions of his and others' work establishing ncrnas as prominent regulators of development, cognition, and disease in the mammalian genome. mattick also elaborated on the relevance of ncrnas to incipient areas of research and technology development in epigenomics and neurobiology, creating a memorable and thought-provoking experience for both experienced investigators and trainees (fig. 3). this year there were two additional keynote lectures, both showcasing scientific advancements using induced pluripotent stem cells (ipscs). masayo takahashi (o-28; riken center for developmental biology) provided a historical perspective of ipscs, from basic research to clinical application.
it is remarkable that the field has gone from the invention of ipsc technology to the clinic in only 7 years. dr. takahashi presented the first human application of ipsc-derived cells targeting age-related macular degeneration, a retinal disease. she described an impressive panel of experimental validations for retinal pigment epithelial cell sheets derived from ipscs, including whole genome, genotyping arrays, methylome, and single-cell analysis. next, her group hopes not only to treat the retinal epithelium but also to use ipscs to derive photoreceptors to completely restore vision in patients. the talk discussed the use of allogenic as well as autologous ipscs and treated the audience to an early view of potentially revolutionary treatments utilizing ipscs, highlighting the steadily growing wave of interest in this technology. hideyuki okano (o-47; keio university graduate school of medicine) then went on to discuss the use of ipscs in the study and treatment of neurological disorders. he described work to establish ipscs from patients with psychiatric disorders and characterize their pathophysiology. in a quest to investigate human psychiatric and neurological disorders more effectively, okano then presented his group's impressive work generating transgenic marmoset models of parkinson's disease. the marmosets express human synuclein and recapitulate typical features of the human disease, including sleep disturbances, lewy bodies, tremors, and gait abnormalities. in summary, the 2015 meeting showcased a variety of cutting edge mammalian genetic approaches, concepts, and results, from large-scale mapping to single gene approaches and everything in between. the meeting ended in a spectacular fashion, with attendees treated to the spa and massage facilities at the yokohama minatomirai manyo club, donning kimonos and yukatas for a traditional banquet (fig. 4). [fig. 4: a spectacular finale for an amazing conference]
attendees were entertained by local performers before awards were made to acknowledge exceptional trainee oral and poster presentations, and to thank the outgoing imgs secretariat and nominations and elections committee. it is certainly the first and only imgs meeting to date where ongoing scientific collaborations were made and discussed in a traditional japanese onsen: a most fitting end to a wonderful meeting. we eagerly await the next imgc, which will be part of mouse genetics 2016 at the allied genetics conference (tagc) in orlando, florida in july 2016. that conference will bring the mouse genetics community together with yeast, ciliate, c. elegans, drosophila, and zebrafish communities to highlight the importance of model systems in understanding and translating fundamental advances in genetics. updates about tagc 2016 and the following imgc in heidelberg, germany in 2017 can be found at www.imgs.org. follow us on twitter (https://www.twitter.com/imgs_news) or facebook (https://www.facebook.com/mammalian.genome/).
key: cord-280924-g6062fwk authors: hachim, mahmood yaseen; al heialy, saba; hachim, ibrahim yaseen; halwani, rabih; senok, abiola c.; maghazachi, azzam a.; hamid, qutayba title: interferon-induced transmembrane protein (ifitm3) is upregulated explicitly in sars-cov-2 infected lung epithelial cells date: 2020-06-10 journal: front immunol doi: 10.3389/fimmu.2020.01372 sha: doc_id: 280924 cord_uid: g6062fwk current guidelines for covid-19 management recommend the utilization of various repurposed drugs. despite ongoing research toward the development of a vaccine against sars-cov-2, such a vaccine will not be available in time to contribute to the containment of the ongoing pandemic.
therefore, there is an urgent need to develop a framework for the rapid identification of novel targets for diagnostic and therapeutic interventions. we analyzed publicly available transcriptomic datasets of sars-cov infected humans and mammals to identify consistent differentially expressed genes then validated in sars-cov-2 infected epithelial cells transcriptomic datasets. comprehensive toxicogenomic analysis of the identified genes to identify possible interactions with clinically proven drugs was carried out. we identified ifitm3 as an early upregulated gene, and valproic acid was found to enhance its mrna expression as well as induce its antiviral action. these findings indicate that analysis of publicly available transcriptomic and toxicogenomic data represents a rapid approach for the identification of novel targets and molecules that can modify the action of such targets during the early phases of emerging infections like covid-19. coronaviruses are a large family of viruses that were first described over 50 years ago (1) . since the turn of the millennium, there have been two major global outbreaks caused by coronaviruses, namely sars-cov in 2003 and mers-cov in 2012 (2) . the ongoing covid-19 pandemic caused by sars-cov-2 represents the third and most devastating of these outbreaks. these outbreaks, notably the covid-19 pandemic, are harsh reminders of the challenges posed by emerging infectious diseases. the global impact of the covid-19 pandemic has brought to the forefront the need to rapidly develop and deploy an effective vaccine. however, despite ongoing concerted research efforts, it is accepted that such a vaccine will not be available in time to contribute to the containment of the ongoing pandemic. current management guidelines include the use of repurposed drugs such as chloroquine and its analog hydroxychloroquine as well as antiviral agents (3) . 
however, the need for well-designed clinical trials to validate their efficacy continues to be highlighted. to effectively address the ongoing covid-19 pandemic, there is a recognized need for a framework for rapid identification of novel targets for diagnostic and therapeutic interventions as well as for determining clinically approved drugs with high potential for repurposed use against sars-cov-2. publicly available transcriptomic datasets generated from sars-cov infected humans and mammalian cells represent a wealth of data that could be used to identify consistent differentially expressed genes, which could then be validated against sars-cov-2 infected epithelial cells transcriptomic datasets. a comprehensive toxicogenomic analysis of the identified genes could potentially identify possible interactions with clinically proven drugs. this simple approach can be used for the rapid identification of novel targets and drugs for further validation. in this study, we have applied this approach, and our findings have identified ifitm3 as an early upregulated gene and indicate that valproic acid enhances ifitm3 mrna expression and antiviral action. publicly available transcriptomic datasets were retrieved from gene expression omnibus (geo) (https://www.ncbi.nlm.nih.gov/geo/). [table 1 row: mouse lung tissue transcriptome response to a mouse-adapted strain of sars-cov in wild type c57bl6/nj mice and tlr3-/- mice; c57bl6/nj; ma15; gse68820 (11)] only microarray gene expression datasets matching the term "sars-cov", comparing virus- or modified-strain-infected samples with mock-infected samples, and sampled no more than 48 h after the infection were included. twelve datasets fulfilled the criteria, as detailed in table 1. we used geoquery and limma r packages through the geo2r tool for each dataset (12). after sorting the genes according to the false discovery rate (fdr), the top 2,000 differentially expressed probes with fdr <0.05 were selected from each dataset.
the annotated genes of the 2,000 probes in each dataset were intersected with differentially expressed genes (degs) from all other datasets. the degs that were common in at least 9 out of the 12 (75%) datasets were identified as shared genes that are consistently deg in the first 48 h of sars-cov infection. enriched ontology clustering for the identified genes was performed using metascape (http://metascape.org/gp/index.html#/main/step1). the expression of the shortlisted genes was then explored in another dataset (gse147507), where rna-sequencing of primary human lung epithelium (nhbe) mock-treated or infected with sars-cov-2 was done to examine whether there is a difference in the response of sars-cov-2 from other strains in terms of degs (13). in total, 9,692 genes were differentially expressed (degs) between mock-infected and virally infected models in the 12 studies. thirty-eight genes that were degs in at least 9 out of 12 studies (75%) were considered common degs due to sars-cov infection of the lungs in the first 48 h post-infection. these genes are listed in table 2. in order to identify degs in sars-cov infected lung tissue that are specific to each of the models used and those which are shared, we intersected the degs from datasets that use the same model. human epithelial cells datasets (gse17400, gse33267, gse37827, gse47960, and gse48142), mice datasets (gse33266, gse50000, gse52920, gse64660, and gse68820), ferret (gse11704) and cynomolgus macaques (gse23955) were all intersected with the covid-19 infected epithelial cells dataset as shown in figure 1. the number of degs intersected between different species is listed in table 3. epithelial cells infected with sars-cov-2 shared 9 degs (mx1, oas3, xaf1, ifi44, mx2, irf7, stat1, ifit3, and ifit1) with human lung cells, mice, and cynomolgus macaques.
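the "at least 9 out of 12 datasets" rule described above can be sketched in a few lines of python. this is only an illustrative sketch, not the authors' geo2r workflow: the function name `shared_degs`, the `min_fraction` parameter, and the toy dataset labels and gene lists are all assumptions made for the example.

```python
import math
from collections import Counter


def shared_degs(deg_sets, min_fraction=0.75):
    """Return genes that are DEGs in at least min_fraction of the datasets.

    deg_sets maps a dataset label to an iterable of gene symbols
    (hypothetical input; the paper derives these lists with GEO2R).
    """
    # Count the number of datasets in which each gene appears;
    # set() prevents a gene from being counted twice within one dataset.
    counts = Counter(gene for degs in deg_sets.values() for gene in set(degs))
    # Integer threshold, e.g. 9 when there are 12 datasets and the cutoff is 75%.
    needed = math.ceil(min_fraction * len(deg_sets))
    return sorted(gene for gene, n in counts.items() if n >= needed)


# Toy example with four made-up datasets (threshold becomes 3 of 4).
toy = {
    "dataset_a": ["MX1", "STAT1", "IFIT1"],
    "dataset_b": ["MX1", "STAT1", "IL6"],
    "dataset_c": ["MX1", "IFIT1", "IL6"],
    "dataset_d": ["MX1", "STAT1"],
}
print(shared_degs(toy))
```

a counting approach is used here instead of a strict set intersection because the criterion tolerates a gene being absent from up to a quarter of the datasets.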
the identified genes are involved in the immune response against rna viruses
as expected, the top genes identified are involved in innate immune responses against rna viruses. these include the cytosolic dna-sensing pathway, toll-like receptor signaling pathway, and negative regulation of binding. interferon (ifn) response to viral infections such as type i interferon signaling pathway, defense response to the virus, the antiviral mechanism by ifn-stimulated genes, regulation of type i interferon production, response to interferon-alpha, and regulation of defense response to virus and influenza a, were also upregulated. genes that play significant roles in activating immune systems such as regulation of response to cytokine stimulus, negative regulation of immune response, myeloid cell homeostasis, and positive regulation of the multi-organism process are also upregulated (figure 2 and table 4). the identified genes expression levels were higher in human bronchial epithelium infected with sars-cov-2 compared to those mock-infected (figure 3). however, only ifitm3 showed a significant difference (p < 0.05), while two other genes, oas2 and mx1, showed a trend of enhancement, although it was not statistically significant (p = 0.06 using the two-stage linear step-up procedure of benjamini, krieger, and yekutieli). ifitm3 was one of the most highly expressed genes compared to the other identified genes at baseline in mock-infected hbe, and it was further induced by the virus, which results in overall high mrna levels.
valproic acid can upregulate the ifitm3 mrna expression
next, we searched the comparative toxicogenomics database (http://ctdbase.org/) to identify drugs/chemicals that might affect the mrna expression of ifitm3 in at least two reference studies (14).
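for readers unfamiliar with the two-stage linear step-up procedure of benjamini, krieger, and yekutieli mentioned above for the nhbe comparison, a minimal python sketch follows. it assumes the standard definition (a benjamini-hochberg pass at level q/(1+q), then a second pass rescaled by the estimated number of true nulls); the function names and the toy p-values are hypothetical, and the software the authors actually used is not stated in the text.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: list of booleans, True = hypothesis rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under its step-up threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def bky_two_stage(pvals, q=0.05):
    """Two-stage linear step-up procedure (assumed standard definition)."""
    m = len(pvals)
    q1 = q / (1 + q)                  # stage 1 uses a slightly stricter level
    stage1 = bh_reject(pvals, q1)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:            # nothing or everything rejected: stop
        return stage1
    m0_hat = m - r1                   # estimated number of true null hypotheses
    return bh_reject(pvals, q1 * m / m0_hat)


# Toy p-values (made up): the second stage rescues the third test.
toy_pvals = [0.001, 0.02, 0.04, 0.5]
print(bh_reject(toy_pvals, 0.05))
print(bky_two_stage(toy_pvals, 0.05))
```

the second stage is less conservative than a single benjamini-hochberg pass whenever the first stage suggests that some hypotheses are truly non-null, which is why it is often chosen for small gene panels.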
interestingly, valproic acid, carbon nanotubes, nickel, and tert-butylhydroperoxide were shown to upregulate ifitm3 expression, while pirinixic acid, acetaminophen, and ethinyl estradiol decreased it (table 5). in order to examine the effect of valproic acid therapy on the ifitm3 mrna expression in immune cells of the blood, a publicly available transcriptomics dataset (gse143272) was extracted and reanalyzed. healthy controls were compared to responder and non-responder patients on valproic acid therapy. we found upregulation of the mrna expression of ifitm3 in patients, and the difference was significant in the responder group only (p < 0.05) compared to healthy controls (figure 4). in response to viral rnas, as in the case of sars-cov-2, the innate immune system releases interferon (ifn) to activate antiviral mechanisms and effector cells such as natural killer cells (15). in mice infected with sars-cov, a delayed and prolonged type i interferon (ifn-i) signaling leads to lung immunopathology as it promotes the accumulation of pathogenic inflammatory cells with increased lung cytokine/chemokine levels and vascular leakage (16). this prolonged ifn-i and virally induced il-10 set the scene for secondary bacterial infection, which can add a strong il1β and tnfα-mediated inflammatory response to magnify lung damage (17). understanding how sars-cov-2 can manipulate ifn is vital in deciphering the battle of the body against the viral spread and its consequences. our reanalysis of transcriptomic data showed that although the ifn pathway is upregulated consistently in sars-cov related infection, sars-cov-2 showed specific upregulation of the gene for a unique interferon-induced protein, namely ifitm3. ifitm3 is a 15-kda protein that localizes to endosomes and lysosomes and is possibly acquired by mammalian ancestral cells via horizontal gene transfer (18).
interferon-induced transmembrane proteins (ifitms 1, 2, and 3) are innate immune responders to virus infections as they regulate the fusion of invading virus and endocytic vesicles and direct it to the lysosomes (19, 20). ifitm3 can further alter membrane rigidity and curvature to inhibit virus membrane fusion (21). such action is important to prevent the release of viral particles into the cytoplasm, which controls viral spread (22). during influenza a infection of human airway epithelial cells, ifitm3 was shown to cluster on virus-containing endosomes and lysosomes within a few hours post-infection, indicating its role in the early phase of viral entry (23). even platelets and megakaryocytes were shown to remarkably upregulate ifitm3 to prevent viral progression during influenza infection (24).
[table/figure gene list: bst2, ifi35, ifit2, ifit1, ifit3, irf7, isg20, mx1, mx2, oas2, oas3, sp100, stat1, isg15, ifitm3, usp18, xaf1, zbp1, rsad2, trim21, ddx58, il6, cxcl10, cxcl11]
the epithelial cells and resident leukocytes in lung upper and lower airways that constitutively express ifitm3 can withstand viral infections, and this is vital in deciding viral tropism, as viruses favor cells with low ifitm3 expression (25). ifitm3 enhances the accumulation of cd8+ t cells in airways to promote mucosal immune cell persistence (26). lung and circulating immune cells were reported to express less ifitm3 than other tissues, and this was a suggestive reason for covid-19 severity and cytokine release syndrome (27). interestingly, ifitm3-rs12252-c/c snp prevalence in the chinese population is 26.5%, and recent research confirmed that snps in ifitm3 could change the severity of influenza infection, as was shown in one case with covid-19 (28). ifitm3 polymorphisms have been linked with hospitalization and mortality during influenza virus infection (29).
expressing the gene is not the only prerequisite to the antiviral action of ifitm3, as it was found recently that within the protein, an amphipathic helix is critical for its blocking effect of viral fusion of similar pathogenic viruses like influenza a virus and zika virus (30). another factor that regulates the ifitm3 trafficking specificity to such viruses is that it requires s-palmitoylation (19, 20) . s-palmitoylation (s-palm) is the reversible process of linking a fatty acid chain to cysteine residues of the substrate protein (31) . multiple zinc finger dhhc domain-containing palmitoyltransferases (zdhhcs) can palmitoylate ifitm3 to make it a fully functional antiviral protein (32) . it seems that bats (order chiroptera), which act as natural hosts for many viral infections, use ifitm3 as an antiviral mechanism if there is s-palmitoylation of the protein; however, if this modification is disturbed, the bat can develop viral infection (33) . based on that, we can suggest that severe covid-19 cases might be due to either non-functional ifitm3 by snp, failure of lung cells to upregulate ifitm3 in response to interferon, a mutation in amphipathic helix sequence or modification in s-palmitoylation. further examination and screening for the ifitm3 dynamics in covid-19 might explain the possible therapeutic and diagnostic options. our toxicogenomic analysis showed that valproic acid increased the mrna expression of ifitm3, supporting a new report that the sars-cov-2-human protein-protein interaction map showed that valproic acid might be a potential repurposing drug for covid-19 (34) . virtual screening, docking, binding energy calculation, and simulation show that valproic acid forms stable interaction with nsp12 of cov and can inhibit its function (14) . valproic acid is currently used for the treatment of epilepsy and known to target histone deacetylases (hdacs) that modify the gene expression epigenetically (35) . 
valproic acid was shown to inhibit the release of mature and fully infectious enveloped viruses, as it alters cellular membrane composition (36). the modest and broad antiviral activity of valproic acid makes the drug an attractive possibility because of its availability and limited side effects during short-term use in acute viral disease (37). the reported side effects like hepatotoxicity and teratogenicity are mainly associated with the parental compound valproate and can be avoided by the use of its derivatives like valpromide (vpd) and valnoctamide (vcd). a recent open-label proof-of-concept trial of 10 days intravenous valproic acid for severe covid-19 showed a 50% reduction in the case fatality rate and length of stay (38). more studies are needed to explore the promising potential of valproic acid in the treatment of covid-19. one limitation of the study is that it is based on publicly available transcriptome datasets, which are limited in number, partly because this is a novel disease, but also because ongoing lockdowns have made it challenging for scientists to carry out the extensive laboratory work required. our evaluation showed that the analysis of publicly available transcriptomic data could be a reasonable approach to identify novel targets and suggest drugs that can modify the action of such targets during the early phases of emerging infections like covid-19 until a complete understanding of the disease becomes clear. this can justify the experimental use of clinically approved drugs and guide the clinicians in their limited options against such a lethal disease. all datasets presented in this study are included in the article/supplementary material.
coronaviruses in animals and humans
three emerging coronaviruses in two decades
covid-19 - navigating the uncharted
lack of innate interferon responses during sars coronavirus infection in a vaccination and reinfection ferret model
dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection
comparative pathogenesis of three human and zoonotic sars-cov strains in cynomolgus macaques
complement activation contributes to severe acute respiratory syndrome coronavirus pathogenesis
release of severe acute respiratory syndrome coronavirus nuclear import block enhances host transcription in human lung cells
a network integration approach to predict conserved regulators related to pathogenicity of influenza and sars-cov respiratory viruses
the pdz-binding motif of severe acute respiratory syndrome coronavirus envelope protein is a determinant of viral pathogenesis
toll-like receptor 3 signaling via trif contributes to a protective innate immune response to severe acute respiratory syndrome coronavirus infection
ncbi geo: archive for functional genomics data sets-update
sars-cov-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems. biorxiv
the comparative toxicogenomics database: update 2019
viral innate immune evasion and the pathogenesis of emerging rna virus infections
dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice
a model of superinfection of virus-infected zebrafish larvae: increased susceptibility to bacteria associated with neutrophil death
more than meets the i: the diverse antiviral and cellular functions of interferon-induced transmembrane proteins
ifitm3 directly engages and shuttles incoming virus particles to lysosomes
interferon-induced transmembrane protein 3 blocks fusion of sensitive but not resistant viruses by partitioning into virus-carrying endosomes
ifitm proteins restrict viral membrane hemifusion
antiviral protection by ifitm3 in vivo
ifitm3 clusters on virus containing endosomes and lysosomes early in the influenza a infection of human airway epithelial cells
human megakaryocytes possess intrinsic antiviral immunity through regulated induction of ifitm3
ifitm3: how genetics influence influenza infection demographically
enhanced survival of lung tissue-resident memory cd8(+) t cells during infection with influenza virus due to selective expression of ifitm3
two potential novel sars-cov-2 entries, tmprss2 and ifitm3, in healthy individuals and cancer patients
ifitm3 restricts the morbidity and mortality associated with influenza
snp-mediated disruption of ctcf binding at the ifitm3 promoter is associated with risk of severe influenza in humans
insights into protein s-palmitoylation in synaptic plasticity and neurological disorders: potential and limitations of methods for detection and analysis
the palmitoyltransferase zdhhc20 enhances interferon-induced transmembrane protein 3 (ifitm3) palmitoylation and antiviral activity
bat ifitm3 restriction depends on s-palmitoylation and a polymorphic site within the cd225 domain
a sars-cov-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing. biorxiv
exploring the inhibitory activity of valproic acid against the hdac family using an mmgbsa approach
new pharmacological strategies to fight enveloped viruses
trends in antiviral strategies
methods of an open-label proof-of-concept trial of intravenous valproic acid for severe covid-19. medrxiv
all authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.
key: cord-253450-k7p510p4 authors: keha, abi; xue, luo; yan, shen; yue, hua; tang, cheng title: prevalence of a novel bovine coronavirus strain with a recombinant hemagglutinin/esterase gene in dairy calves in china date: 2019-05-31 journal: transbound emerg dis doi: 10.1111/tbed.13228 sha: doc_id: 253450 cord_uid: k7p510p4 bovine coronavirus (bcov) is the causative agent of diarrhoea in newborn calves, winter dysentery in adult cattle and respiratory tract illnesses in cattle across the world. in this study, a total of 190 faecal samples from dairy calves with diarrhoea were collected from 14 farms in six chinese provinces, and bcov was detected in 18.95% (36/190) of the samples by reverse transcriptase polymerase chain reaction. full‐length spike, hemagglutinin/esterase (he), nucleocapsid and transmembrane genes were simultaneously cloned from 13 clinical samples (eight farms in four provinces), and most of the bcov strains showed a unique evolutionary pattern based on the phylogenetic analysis of these genes. interestingly, 10 of the 13 strains were identified as he recombinant strains, and these strains had experienced the same recombination event and carried the same recombination sites located between the esterase and lectin domain. they also shared an identical aa variant (f181v) in the r2‐loop. moreover, 9/10 strains displayed another identical aa variant (p, s158a) in the adjacent r1‐loop of the he gene, which differs from the other available bcov he sequences in the genbank database.
our results showed that bcov is widely circulating in dairy cattle in china, contributing to the diagnosis and control of dairy calves diarrhoea. furthermore, a bcov strain that carries a recombinant he gene has spread in dairy calves in china. to the best of our knowledge, this is the first description of an he recombination event occurring in bcov; this is also the first description of the molecular prevalence of bcov in china. our findings will enhance current understanding about the genetic evolution of bcov. bovine coronavirus possesses five major structural proteins: the spike (s), hemagglutinin/esterase (he), nucleocapsid (n), transmembrane (m) and the small membrane (e) (lai & cavanagh, 1997) . the s protein is involved in receptor recognition and carries distinct functional domains near its amino (s1) and carboxy (s2) termini, while the n-terminal s1 domains recognize sugar receptors, and the s2 subunit is a transmembrane protein that mediates viral and cellular membrane fusion during cell invasion (fang li, 2016) . s1 and s2 contain several antigenic domains, but s1 appears to be the most efficient at inducing antibodies with high neutralizing activities in its host (yoo & deregt, 2001) . the he protein contains two important functional domains: the lectin domain and the esterase domain. the lectin domain recognizes sugar receptors in the cell, whereas the esterase domain possesses a receptor-destroying enzyme activity capable of removing cellular receptors from the surfaces of the targeted cells. the receptor-binding (lectin) and receptor-destroying (esterase) domains may be important for virus entry (kienzle, abraham, hogue, & brian, 1990; schultze, wahn, klenk, & herrler, 1991) . therefore, in addition to the s protein, the he protein serves as a second viral attachment protein for infection initiation (groot, 2006) . 
the primary role of bcov n protein is to package the viral genome into long, flexible, helical ribonucleoprotein (rnp) complexes, protect the genome and ensure its timely replication and reliable transmission, as well as playing a role in viral transcription and translation (hurst, ye, goebel, jayaraman, & masters, 2010). in contrast, the m protein plays a crucial role in bcov assembly (oostra, haan, groot, & rottier, 2006). the high genetic diversity in coronaviruses is attributable to the high mutation rates associated with rna replication, the high recombination frequencies within the coronavirus family and the large coronavirus genomes (woo, lau, huang, & yuen, 2009). recombination in coronaviruses plays an important role in virus evolution and can result in the emergence of new pathotypes (menachery, graham, & baric, 2017; wang et al., 2015) as well as changing the host ranges and ecological niches (bakkers et al., 2017). thus far, recombination regions in coronaviruses have been extensively reported for the s gene (kin et al., 2015; lau et al., 2011; minami et al., 2016), a finding also applicable to bcov (martínez et al., 2012). recombination events in m (herrewegh, smeenk, horzinek, rottier, & groot, 1998), n (kin et al., 2015), rp3 (lau et al., 2010) and the orf1 gene (chen et al., 2017; kin et al., 2015) have also been reported. however, to date, recombination events in he have only been reported in mhv, a betacoronavirus, and this situation may act as a strong force for generating strains with new genotypes, host spectra and tissue tropisms (groot, 2006; luytjes, bredenbeek, noten, horzinek, & spaan, 1988; smits et al., 2005). the presence of bcov has been confirmed in chinese dairy cows (genbank accession number fj556872), but the prevalence and molecular characteristics of bcov are still largely unknown. therefore, we sought to investigate the prevalence of bcov in dairy calves with diarrhoea in china. 
unexpectedly, our results reveal that a bcov containing a recombinant he gene has emerged and spread in dairy calves in china. a total of 190 faecal samples were collected from dairy calves (≤3 months of age) with obvious diarrhoea at 14 farms in six provinces in china between september 2017 and may 2018 (table 1). the samples were shipped on ice and stored at −80°c. the faecal samples were fully resuspended in phosphate-buffered saline (1:5 w/v) and centrifuged at 10,000 × g for 10 min. viral rna was extracted from 300 μl of the faecal suspension using rnaiso plus (takara bio inc.) according to the manufacturer's instructions. the cdna was synthesized using the primescript™ rt reagent kit (takara bio inc.) according to the manufacturer's instructions and then stored at −20°c until required. bovine coronavirus nucleic acids in the faecal samples were identified using a pcr assay established in our laboratory that targets the bcov polymerase gene. after validating the specificity and stability of the assay, the detection limit for viral nucleic acid was determined to be 1 × 10⁻² pg/μl. in detail, a primer pair (f: 5′-cgagttgaacacccagat-3′, the complete s, he, n and m genes were pcr-amplified from samples already known to be bcov-positive based on rt-pcr assays previously reported (gélinas, boutin, sasseville, & dea, 2001; lau et al., 2011; martínez et al., 2012; park et al., 2006). all pcr products were purified using the omega gel kit (omega) following the manufacturer's instructions, after which they were ligated to the pmd19-t vector (takara bio inc.) and transformed into dh5α competent escherichia coli cells (yeasen) for sequencing. the s and n gene sequences were assembled using seqman software (version 7.0; dnastar inc.). the homologies of the nt and deduced amino acid (aa) sequences were determined using the megalign program in dnastar 7.0 software (dnastar inc.). 
mega 7.0 was used for multiple sequence alignment and to subsequently build the maximum-likelihood phylogenetic tree with bootstrap testing (1,000 replicates). recombination events were assessed using simplot software (version 3.5.1) and the recombination detection program rdp 4.0 (version 4.9.5) with the rdp, geneconv, chimaera, maxchi, bootscan, siscan and 3seq methods (martin, murrell, golden, khoosal, & muhire, 2015) . of the 190 faecal samples from the calves with diarrhoea, 36(18.95%) were found to be bcov-positive, which revealed that the virus was distributed in 13/14 farms across the six provinces (table 1) . full-length s, he, n and m genes were successfully cloned out of 13 positive samples from eight farms in four different chinese provinces (shanxi, two strains; henan, three strains; liaoning, five strains; and sichuan, three strains). the 13 s genes, at 4,092 -bp each, encode a protein of 1,363 aa, the cleavage site of which is located at aa 768 in all 13. sequence comparisons revealed that all 13 s genes share 98.6%-100% nt identity and 98.5%-100% aa identity with each other. they also share 96.8%-100% nt identity and 95.3%-100% aa identity with all 163 full-length bcov s genes available in the genbank database. a phylogenetic tree based on the complete s gene sequences using the maximum-likelihood method showed that 12 of the 13 s genes from this study together with 13 other bcov s genes from china (one strain from cattle, genbank accession number ku886291; 12 strains from yaks, bos grunniens, submitted by our team, genbank accession number mh810151-mh810162) clustered on an independent large branch. the remaining s genes clustered with three north american bcov strains (genbank accession number mh043952, mh043954 and mh043955) on a small independent branch of the tree ( figure 1 ). 
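the counts reported above can be checked with simple arithmetic. the following python sketch recomputes the detection rate (36 of 190 samples) and the protein length implied by the 4,092 bp s gene. the function names are illustrative only, and the length check assumes that the reported gene length includes one terminal stop codon, which the text does not state explicitly.

```python
def prevalence_percent(positive: int, total: int) -> float:
    """Percentage of positive samples, rounded to two decimals."""
    return round(100.0 * positive / total, 2)


def protein_length(orf_bp: int) -> int:
    """Number of residues encoded by an open reading frame.

    Assumption: orf_bp counts the coding sequence including one stop codon,
    so the protein has (orf_bp / 3) - 1 residues.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


print(prevalence_percent(36, 190))  # 18.95, matching the reported 18.95%
print(protein_length(4092))         # 1363, matching the reported 1,363 aa
```

both values agree with the figures given in the text, which is consistent with the stop-codon assumption for the s gene.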
compared with the other bcov s genes, 9/13 sequences from this study and the 13 other chinese bcov sequences mentioned above, which are located in the large independent branch, each have an identical aa variant (n1192y) in the s2 subunit. additionally, 4/13 sequences from this study and the above-mentioned 12 sequences from chinese yaks, which are located in the large independent branch, have an identical aa variant (e121v) in the s1 subunit. compared with the bcov mebus prototype strain, these bcov s genes have a total of 13 aa changes in the s1 subunit and 3 aa changes in the s2 subunit (figure 2). no frame shifts, deletions, insertions or recombination events were observed in the s gene sequences from any of the strains in this study. all 13 he genes were 1,275 bp long, and the protein they encode is 424 aa residues in length. fgds, the putative esterase active site in all he proteins, was located at aa positions 37-40, and nine n all of the 13 n genes were 1,347 bp in length, each encoding a protein of 230 aa residues. sequence comparison of these genes revealed that they share 99.8%-100% nt sequence identity and 99.3% bovine coronavirus, an important pathogen of calves, is globally responsible for severe economic losses in farming (azizzadeh et al., 2012; bok et al., 2015; johnson & pendell, 2017). in china, the prevalence of bcov is still largely unknown. therefore, in this study, we screened 190 faecal samples from calves with diarrhoea, 36 of which were found to be bcov-positive, and the positive samples were distributed across 13 of the 14 farms we screened across six provinces in the major dairy cattle production areas of china. the results showed that bcov is circulating widely in these dairy cattle, a finding that should help with the diagnosis and control of diarrhoea in these animals. 
most of the strains from this study are unique in their evolutionary histories based on our analysis of the full-length s, he, n and m genes, a finding similar to that for bcov in korea (ko et al., 2006; park et al., 2010). this may be the result of geographical, environmental and natural selection patterns (bidokhti et al., 2013; hasoksuz, sreevatsan, cho, hoet, & saif, 2002; martínez et al., 2012). the bcov s protein is involved in receptor recognition, host specificity, antigenic diversity and immunogenicity (fang li, 2016). its gene sequences are variable, and mutations in this protein are associated with alterations in viral antigenicity, viral pathogenicity, host range and tissue tropism (gallagher & buchmeier, 2001; peng et al., 2012). in this study, compared with other bcov s genes, we found that nine out of 13 of our sequences and 13 chinese bcov sequences (one strain from cattle and 12 strains from yaks), which clustered on a large independent branch of the phylogenetic tree, each had an identical aa variant (n1192y) in the s2 subunit. as a transmembrane protein, the s2 subunit mediates the fusion of viral and cellular membranes (luo & weiss, 1998); hence, the biological significance of this variant warrants further investigation. in addition, four out of 13 sequences and 12 sequences from chinese yaks were found to have an identical aa variant (e121v) in the s1 receptor-binding region, compared with the other s genes. figure 2 amino acid variants of the 13 complete s genes in this study. 
the numbers in the box indicate the amino acid changes shared by all 13 strains in this study compared with the s sequence of the bcov prototype strain mebus; the number marked with a triangle is a unique aa variant in four sequences from this study and 12 sequences from chinese yaks; the number marked with a circle is a unique aa variant in nine sequences from this study and 13 other sequences (12 from chinese yaks and one from chinese cattle); the number marked with a line is a unique aa variant in the shanxi strains in this study. uniqueness is relative to the other available bcov s sequences in the genbank database. hp, the first hydrophobic domain of the s2 subunit; hr-n and hr-c, the heptad repeats; s1a and s1b, the immune reactive domain; s1-ntd, receptor-binding domain; sp, signal peptide. the s1 subunit in the n-terminal of bcov (aa 1-330) recognizes a sugar receptor (peng et al., 2012), and aa substitutions in this region can change the receptor-binding capacity (li et al., 2005) and host receptor specificity (sheahan et al., 2008). two bcov strains (genbank accession number mk095177 and mk095178) from shanxi province were found to have unique aa substitutions (e1051v, s1076p) in the heptad repeat region. this region is crucial for viral entry and for viral and host cell membrane interactions to occur, and it promotes lipid bilayer fusion and nucleocapsid release into the cytoplasm (forni et al., 2015). thus, aa substitutions in this region may affect the interaction between the coiled-coil structure and the host cell receptor (martínez et al., 2012). the he protein has a receptor-binding function, which also plays a critical were recovered from eight farms in four provinces across a wide geographical distance, with the two furthest provinces being more than 1,000 km apart. thus, these novel he recombinant strains have been circulating widely in dairy cattle in china. 
to the best of our knowledge, this is the first report of a recombination event in the he gene from bcov in cattle, a finding that augments current understanding about the evolution of bcov. in fact, mhv, which is another member of lineage a of the betacoronavirus genus, has also reportedly undergone a non-homologous recombination event in the he gene (luytjes et al., 1988). recombination in the he gene from influenza c virus and in toroviruses has also been observed (groot, 2006; smits et al., 2005). recombination in the he gene may be a strong driving force for generating strains with new genotypes, host spectra and tissue tropisms (groot, 2006; luytjes et al., 1988; smits et al., 2005). notably, in our study, the he recombinant and non-recombinant strains simultaneously existed on the same cattle farm in liaoning province. interestingly, the reported decrease in the he receptor-binding capacity of hcov-oc43 betacoronaviruses was thought to be caused furthermore, nine of the 10 strains have another identical aa variant (p, s158a) in the adjacent r1-loop of their he genes, which is identical to the situation seen with hcov-oc43 (bakkers et al., 2017). thus, the effect of aa substitutions in the receptor-binding region of the he recombinant strains on receptor-binding capacity warrants further investigation. notably, monoclonal antibodies against the bcov he protein efficiently neutralized bcov infectivity in vitro (deregt & babiuk, 1987) and protected the intestinal epithelium of cattle from virus infection in vivo (deregt et al., 1989), indicating that the he protein of bcov may also play a significant role in the induction of protective effect on in conclusion, the results of our study have shown that bcovs are circulating widely in dairy calves in china and that most of these strains have a unique evolutionary pattern based on our phylogenetic analysis of the complete s, he, n and m genes. 
recombination events between the esterase and lectin domain of he were identified as occurring at remarkably high frequencies, and these recombinant strains are widely prevalent in dairy cattle in china. as far as we are aware, this is the first description of a recombination event in the he gene of bcov, and our findings will enhance current understanding about the genetic evolution of bcov. figure 6 phylogenetic tree based on the deduced 448 aa sequences of the complete n gene. sequence alignments and clustering were performed by clustalw in mega 7.0 software. the tree was constructed by the maximum-likelihood method with bootstrap values calculated for 1,000 replicates. the strains in this study were marked with a circle, and the other chinese bcov strains were marked with a triangle. figure 7 phylogenetic tree based on the deduced 230 aa sequences of the complete m gene. sequence alignments and clustering were performed by clustalw in mega 7.0 software. the tree was constructed by the maximum-likelihood method with bootstrap values calculated for 1,000 replicates. 
the strains in this study were marked with a circle, and the other chinese bcov strains were marked with a triangle factors affecting calf mortality in iranian holstein dairy herds betacoronavirus adaptation to humans involved progress loss of hemagglutinin-esterase lectin activity evolutionary dynamics of bovine coronaviruses: natural selection pattern of the spike gene implies adaptive evolution of the strains molecular and antigenic characterization of bovine coronavirus circulating in argentinean cattle during 1994-2010 two novel porcine epidemic diarrhea virus (pedv) recombinants from a natural recombinant and distinct subtypes of pedv variants monoclonal antibodies to bovine coronavirus: chacteristics and topographical mapping of neutralizing epitopes on the e2 and e3 glycoproteins monoclonal antibodies to bovine coronavirus glycoproteins e2 and e3: demonstration of in vivo virus-neutralizing activity the heptad repeat region is a major selection target in mers-cov and related corona-viruses coronavirus spike proteins in viral entry and pathogenesis bovine coronaviruses associated with enteric and respiratory diseases in canadian dairy cattle display different reactivities to anti-he monoclonal antibodies and distinct amino acid changes in their he, s and ns4.9 protein structure, function and evolution of the hemagglutinin-esterase proteins of corona-and toroviruses molecular analysis of the s1 subunit of the spike glycoprotein of respiratory and enteric bovine coronavirus isolates feline coronavirus type ii strains 79-1683 and 79-1146 originate from a double recombination between feline coronavirus, type i and canine coronavirus an interaction between the nucleocapsid protein and a component of the replicase-transcriptase complex is crucial for the infectivity of coronavirus genomic rna market impacts of reducing the prevalence of bovine respiratory disease in united states beef cattle feedlots. 
frontiers in veterinary science, 4, 189 structure and orientation of expressed bovine coronavirus hemagglutinin-esterase protein genomic analysis of 15 human coronaviruses oc43 (hcov-oc43s) circulating in france from 2001 to 2013 reveals a high intraspecific diversity with new recombinant genotypes molecular characterization of he, m, and e genes of winter dysentery bovine coronavirus circulated in korea during the molecular biology of coronaviruses the murine coronavirus hemagglutinin-esterase receptor-binding site: a major shift in ligand specificity through modest changes in architecture molecular epidemiology of human coronavirus oc43 reveals evolution of different genotypes over time and recent emergence of a novel genotype due to natural recombination ecoepidemiology and complete genome comparison of different strains of severe acute respiratory syndrome-related rhinolophus bat coronavirus in china reveal bats as a reservoir for acute, self-limiting infection that allows recombination events structure, function, and evolution of coronavirus spike proteins receptor and viral determinants of sara-coronavirus adaptation to human ace 2 roles in cell-to-cell fusion of two conserved hydrophobic regions in the murine coronavirus spike protein sequence of mouse hepatitis virus a59 mrna 2: indications for rna recombination between coronaviruses and influenza c virus rdp4: detection and analysis of recombination patterns in virus genomes molecular and phylogenetic analysis of bovine coronavirus based on the spike glycoprotein gene sialic acid receptors of viruses jumping species-a mechanism for coronavirus persistence and survival. 
current opinion in virology detection of novel ferret coronaviruses and evidence of recombination among ferret coronaviruses glycosylation of the severe acute respiratory syndrome coronavirus triple-spanning membrane proteins 3a and m detection and characterization of bovine coronaviruses in fecal specimens of adult cattle with diarrhea during the warmer seasons detection and molecular characterization of calf diarrhoea bovine coronaviruses circulating in south korea during crystal structure of bovine coronavirus spike protein lectin domain isolated he-protein from hemagglutinating encephalomyelitis virus and bovine coronavirus has receptor-destroying and receptor-binding activity mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium nidovirus sialate-o-acetylesterases: evolution and substrate specificity of coronaviral and toroviral receptor-destroying enzymes origin and possible genetic recombination of the middle east respiratory syndrome coronavirus from the first imported case in china: phylogenetics and coalescence analysis coronavirus diversity, phylogeny and interspecies jumping a single amino acid change within antigenic domain ii of the spike protein of bovine coronavirus confers resistance to virus neutralization prevalence of a novel bovine coronavirus strain with a recombinant hemagglutinin/esterase gene in dairy calves in china the authors declare that there is no conflict of interest. this study did not involve animal experiments besides the faecal sampling of diarrhoea dairy calves that visited farm for clinical treatment. 
https://orcid.org/0000-0002-8413-7260 cheng tang https://orcid.org/0000-0003-2680-0519 key: cord-264746-gfn312aa authors: muse, spencer title: genomics and bioinformatics date: 2012-03-29 journal: introduction to biomedical engineering doi: 10.1016/b978-0-12-238662-6.50015-x sha: doc_id: 264746 cord_uid: gfn312aa this chapter discusses the basic principles of molecular biology regarding genome science and describes the major types of data involved in genome projects, including technologies for collecting them. genome science is heavily driven by new technological advances that allow for rapid and inexpensive collection of various types of data. the emergence of genomic science has not simply provided a rich set of tools and data for studying molecular biology. it has been the catalyst for an astounding burst of interdisciplinary research, and it has challenged long-established hierarchies found in most institutions of higher learning. the next generation of biologists needs to be as comfortable at a computer workstation as they are at the lab bench. recognizing this fact, many universities have already reorganized their departments and their curricula to accommodate the demands of genomic science. the chapter discusses practical applications and uses of genomic data. for example, gene therapies that can repair genetic defects are within the foreseeable future. at the conclusion of this chapter, the reader will be able to: use key bioinformatics databases and web resources. in april 2003, sequencing of all three billion nucleotides in the human genome was declared complete. this landmark of modern science brought with it high hopes for the understanding and treatment of human genetic disorders. there is plenty of evidence to suggest that the hopes will become reality: 1631 human genetic diseases are now associated with known dna sequences, compared to the fewer than 100 that were known at the initiation of the human genome project (hgp) in 1990. 
the success of this project (it came in almost 3 years ahead of time and 10% under budget, while at the same time providing more data than originally planned) depended on innovations in a variety of areas: breakthroughs in basic molecular biology to allow manipulation of dna and other compounds; improved engineering and manufacturing technology to produce equipment for reading the sequences of dna; advances in robotics and laboratory automation; development of statistical methods to interpret data from sequencing projects; and the creation of specialized computing hardware and software systems to circumvent massive computational barriers that faced genome scientists. clearly, the hgp served as an incubator for interdisciplinary research at both the basic and applied levels. the human genome was not the only organism targeted during the genomic era. as of june 2004, the complete genomes were available for 1557 viruses, 165 microbes, and 26 eukaryotes ranging from the malaria parasite plasmodium falciparum to yeast, rice, and humans. continued advances in technology are necessary to accelerate the pace and to reduce the expense of data acquisition projects. improved computational and statistical methods are needed to interpret the mountains of data. the increase in the rate of data accumulation is outpacing the rate of increases in computer processor speed, stressing the importance of both applied and basic theoretical work in the mathematical and computational sciences. in this chapter, the key technologies that are being used to collect data in the laboratory, as well as some of the important mathematical techniques that are being used to analyze the data, are surveyed. applications to medicine are used as examples when appropriate. 
understanding the applications of genomic technologies requires an understanding of three key sets of concepts: how genetic information is stored, how that information is processed, and how that information is transmitted from parent to offspring. in most organisms, the genetic information is stored in molecules of dna, deoxyribonucleic acid ( fig. 13.1) . some viruses maintain their genetic data in rna, but no emphasis will be placed on such exceptions. the size of genomes, measured in counts of nucleotides or base pairs, varies tremendously, and a curious observation is that genome size is only loosely associated with organismal complexity (table 13 .1). most of the known functional units of genomes are called genes. for purposes of this chapter, a gene can be defined as a contiguous block of nucleotides operating for a single purpose. this definition is necessarily vague, for there are a number of types of genes, and even within a given type of gene, experts have difficulty agreeing on precisely where the beginning and ending boundaries of those genes lie. a structural gene is a gene that codes instructions for creating a protein ( fig. 13 .2). a second category of genes with many members is the collection of rna genes. an rna gene does not contain protein information; instead, its function is determined by its ability to fold into a specific three-dimensional configuration, at which point it is able to interact with other molecules and play a part in a biochemical process. a common rna gene found in most forms of life is the trna gene illustrated in figure 13 .3. structural genes are the entities most scientists envision when the word ''gene'' is mentioned, and from this point on, the term gene will be used to mean ''structural gene'' unless specified otherwise. the number and variety of genes in organisms is a current topic of importance for genome scientists. gene number in organisms ranges from tiny (470 in mycoplasma) to enormous (60,000 or more in plants). 
non-free-living organisms have even smaller gene numbers (hiv contains only nine). the number of genes in a typical human genome has been estimated to be about 30,000, perhaps the single most surprising finding from the human genome project. this number was thought to be as large as 120,000 as recently as 1998. the confusion over this number arose in part because there is not a "one gene, one protein" rule in humans, or indeed, in many eukaryotic organisms. instead, a single gene region can contain the information needed to produce multiple proteins. to understand this fact, the series of steps involved in creating a functional protein from the underlying dna sequence instructions must be understood. the central dogma of molecular biology states that genetic information is stored in dna, copied to rna, and then interpreted from the rna copy to form a functional protein (fig. 13.2). the process of copying the genetic information in dna into an rna copy is known as transcription (see chapter 3). the process is thought by many to be a remnant of an early rna world, in which the earliest life forms were based on rna genomes. it is at the level of transcription that gene expression is regulated, determining where and when a particular gene is turned on or off. the transcription of a gene occurs when an enzyme known as rna polymerase binds to the beginning of a gene and proceeds to create a molecule of rna that matches the dna in the genome. it is this molecule of messenger rna (mrna) that will serve as a template for producing a protein. however, it is necessary for organisms to regulate the expression of genes to avoid having all genes expressed in all cells at all times. transcription factors interact with either the genomic dna or the polymerase molecule to allow delicate control of the gene expression process. 
a feedback loop is created whereby an environmental stimulus such as a drug leads to the production of a transcription factor, which triggers the expression of a gene. in addition to this example of a positive control mechanism, negative control is also possible. an emerging theme is that sets of genes are often coregulated by a single or small group of transcription factors. these sets of genes often share a short upstream dna sequence that serves as a binding site for the transcription factor. figure 13.3 the transfer rna (trna) is an example of a non-protein-coding gene. its function is the result of the specific two- and three-dimensional structures formed by the rna sequence itself. one of the earliest surprises of the genomic era was the discovery that many eukaryotic gene sequences are not contiguous, but are instead interrupted by dna sequences known as introns. as shown in figure 13.2, introns are physically cut, or spliced, from the mrna sequence before the rna is converted into a protein. the presence of introns helps to explain the phenomenon that there are more proteins produced in an organism than there are genes present. the process of alternative splicing allows for exons to be assembled in a combinatorial fashion, resulting in a multitude of potential proteins. for example, consider a gene sequence with exons e1, e2, and e3 interrupted by introns i1 and i2. if both introns are spliced, the resulting protein would be encoded as e1-e2-e3. however, it is also possible to splice the gene in a way that produces protein e1-e3, skipping exon e2. much as transcription factors regulate gene expression, there are factors that help to regulate alternative splicing. a common theme is to find a single gene that is spliced in different ways to produce isoforms that are expressed in specific tissues. the process of reading the template in an mrna molecule and using it to produce a protein is known as translation. 
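before translation is described further, the exon-skipping example above (e1-e2-e3 versus e1-e3) can be sketched in python to show why alternative splicing is combinatorial. this is a simplified illustration, not a model of real splicing: it assumes that the first and last exons are always retained, that exon order is preserved, and that any subset of internal exons may be skipped. the function name is hypothetical.

```python
from itertools import combinations


def isoforms(exons):
    """List candidate spliced products for an ordered list of exons.

    Assumptions (for illustration only): at least two exons are given,
    the first and last exons are always kept, and any combination of
    internal exons may be skipped while the original order is preserved.
    """
    first, *middle, last = exons
    products = []
    # Start with all internal exons kept, then skip progressively more.
    for kept_count in range(len(middle), -1, -1):
        for kept in combinations(middle, kept_count):
            products.append("-".join([first, *kept, last]))
    return products


print(isoforms(["e1", "e2", "e3"]))  # ['e1-e2-e3', 'e1-e3']
```

under these assumptions a gene with n internal exons yields 2^n candidate products, which illustrates how a modest number of genes could give rise to a much larger number of proteins.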
conceptually, this process is much more simple than the transcription and splicing processes. a structure known as a ribosome binds to the mrna molecule. the ribosome then moves along the rna in units of 3 nucleotides. each of these triplets, or codons, encodes one of 20 amino acids. at each codon the ribosome interacts with trnas to interpret a codon and add the proper amino acid to the growing chain before moving along to the next codon in the sequence (see chapter 3). genome science is heavily driven by new technological advances that allow for the rapid and inexpensive collection of various types of data. it has been said that the field is data-driven rather than hypothesis-driven, a reflection of the tendency for researchers to collect large amounts of genomic data with the (realistic) expectation that subsequent data analyses, along with the experiments they suggest, will lead to better understanding of genetic processes. although the list of important biotechnologies changes on an almost daily basis, there are three prominent data types in today's environment: (1) genome sequences provide the starting point that allows scientists to begin understanding the genetic underpinnings of an organism; (2) measurements of gene expression levels facilitate studies of gene regulation, which, among other things, help us to understand how an organism's genome interacts with its environment; and (3) genetic polymorphisms are variations from individual to individual within species, and understanding how these variations correlate with phenotypes such as disease susceptibility is a crucial element of modern biomedical research. the basic principles for obtaining dna sequences have remained rather stable over the past few decades, although the specific technologies have evolved dramatically. 
the most widely used sequencing techniques rely on attaching some sort of "reporter" to each nucleotide in a dna sequence, then measuring how quickly or how far the nucleotide migrates through a medium. the principles of sanger sequencing, originally developed in 1974, are illustrated in figure 13.4. dna sequences have an orientation. the 5' end of a sequence can be considered to be the left end, and the 3' end is on the right. sanger sequencing begins by creating all possible subsequences of the target sequence that begin at the same 5' nucleotide. a reporter, originally radioactive but now fluorescent, is attached to the final 3' nucleotide in each subsequence. by using a unique reporter for each of the four nucleotides, it is possible to identify the final 3' nucleotide in each of the subsequences. consider the task of sequencing the dna molecule aggt. there are four possible subsequences that begin with the 5' a: a, ag, agg, and aggt. the technology of sanger sequencing produces each of those four sequences and attaches the reporter to the final nucleotide. the subsequences are sorted from shortest to longest based on the rate at which they migrate through a medium. the shortest sequence would correspond to the subsequence a; its reporter tells us that the final nucleotide is an a. the second shortest subsequence is ag, with a final nucleotide of g. by arranging the subsequences in a "ladder" from shortest to longest, the sequence of the complete target sequence can be found simply by reading off the final nucleotide of each subsequence. a series of new advances have allowed sanger sequencing to be applied in a high-throughput way, paving the way for sequencing of entire genomes, including that of the human. radioactive reporters have been replaced with safer and cheaper fluorescent dyes, and automatic laser-based systems now read the sequence of fluorescent-labeled nucleotides directly as they migrate. 
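the ladder logic of the aggt example can be sketched in a few lines of python. this is only a conceptual illustration of reading a sequence from sorted subsequences, not a description of any sequencing instrument or its software; the function names are hypothetical, and fragment length stands in for migration rate.

```python
def sanger_ladder(target):
    """All subsequences of the target that start at its 5' nucleotide."""
    return [target[:end] for end in range(1, len(target) + 1)]


def read_ladder(fragments):
    """Recover the sequence from an unordered pool of fragments.

    Fragments are sorted from shortest to longest (length is used here as a
    stand-in for migration through the medium), and the final 3' nucleotide
    of each fragment, the one carrying the reporter, is read off in order.
    """
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


pool = ["agg", "a", "aggt", "ag"]  # fragments in arbitrary order
print(sanger_ladder("aggt"))       # ['a', 'ag', 'agg', 'aggt']
print(read_ladder(pool))           # aggt
```

sorting by length is what makes the order of the reporters meaningful: each rung of the ladder contributes exactly one nucleotide of the final sequence.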
early versions of sanger sequencing only allowed for reading a few hundred nucleotides at a time; modern sequencing devices can read sequences of 800 nucleotides or more. perhaps most important has been the replacement of ''slab gel'' systems with capillary sequencers. the older system required much labor and a steady hand; capillary systems, in conjunction with the development of necessary robotic devices for manipulating samples, have allowed almost completely automated sequencing pipelines to be developed. not to be ignored in the series of technological advances is the development of automated base-calling algorithms. a laser reads the intensities of each of the four fluorescent reporter dyes as each nucleotide passes it. the resulting graph of those intensities is a chromatogram. statistical algorithms, including the landmark program phred, are able to accept chromatograms as input and output dna sequences with very high levels of accuracy, reducing the need for laborious human intervention. by assessing the relative levels of the four curves, the base-calling algorithms not only report the most likely nucleotide at each position, but they also provide an error probability for each site. a single state-of-the-art dna sequencing machine can currently produce upwards of one million nucleotides per day. large regions of dna are not sequenced in single pieces. instead, larger contigs of dna are fragmented into multiple, short, overlapping sequences. the emergence of shotgun sequencing (fig. 13 .5), pioneered by dr. craig venter, has revolutionized approaches for obtaining complete genome sequences. 
the fundamental approach to shotgun sequencing of a genome is simple: (1) create many identical copies of a genome; (2) randomly cut the genomes into millions of fragments, each short enough to be sequenced individually; (3) align the overlapping fragments by identifying matching nucleotides at the ends of fragments, and finally; (4) read the complete genome sequence by following a gap-free path through the fragments. until venter's work, the idea of shotgun sequencing was considered unfeasible for a variety of reasons. perhaps most daunting was the computational task of aligning the millions of fragments generated in the shotgun process. specialized hardware systems and associated algorithms were developed to handle these problems. following in the footsteps of high-throughput genome sequencing came technology that allowed scientists to survey the relative abundance of thousands of individual gene products. these technologies are, in essence, a modern high-throughput replacement of the northern blot procedure. for each member in a collection of several thousand genes, the assays provide a quantitative estimate of the number of mrna copies of each gene being produced in a particular tissue. two technologies, cdna and oligonucleotide microarrays, currently dominate the field, and they have opened the door to many exploratory analyses that were previously impossible. as a first example, consider taking two samples of cells from an individual cancer patient: one sample from a tumor and one from normal tissue. a microarray experiment makes it possible to identify the set of genes that are produced at different levels in the two tissue types. it is likely that this set of differentially expressed genes contains many genes involved in biological processes related to tumor formation and proliferation ( fig. 13 .6). a second common type of study is a time course experiment. microarray data is collected from the same tissues at periodic intervals over some length of time. 
for instance, gene expression levels may be measured in 6-hour increments following the administration of a drug. for each gene, the change in gene expression level can be plotted against time (fig. 13.7). groups of coregulated genes will be identified as having points in time where they all experience either an increase or decrease in expression levels. a likely cause for this behavior is that all genes in the coregulated set are governed by a single transcription factor. a final important medical application of microarray technologies involves diagnosis. suppose that a physician obtains microarray data from tumor cells of a patient. the data, consisting of the relative levels of gene expression for a suite of many genes, can be compared to similar data collected from tumors of known types. if the patient's gene expression profile matches the profile of one of the reference samples, the patient can be diagnosed with that tumor type. the advent of microarray techniques has rapidly improved the accuracy of this type of diagnosis in a variety of cancers.
figure 13.5 shotgun sequencing of genomes or other large fragments of dna proceeds by cutting the original dna into many smaller segments, sequencing the smaller fragments, and assembling the sequenced fragments by identifying overlapping ends.
cdna microarrays were the first, and are still the most widely used, form of high-throughput gene expression methods. the procedure begins by attaching the dna sequences of thousands of genes onto a microscope slide in a pattern of spots, with each spot containing only dna sequences of a single gene. a variety of technologies have emerged for creating such slides, ranging from simple pin spotting devices to technologies using laser jet printing techniques. the rna of expressed genes is next collected from the target cell population. through the process of reverse transcription, a cdna version of each rna is created.
a cdna molecule is complementary to the genomic dna sequence in the sense that complementary base pairs will physically bind to one another. for example, a cdna reading gttac could physically bind to the genomic dna sequence caatg. during the process of creating the cdna collection, each cdna is labeled with a fluorescent dye. the collection of labeled cdnas is poured over the microscope slide and its set of attached dna molecules. the cdnas that match a dna on the slide physically bind to their mates, and unbound cdnas are washed from the slide. finally, the number of bound molecules at each spot (gene) can be read by measuring the fluorescence level at each spot. highly expressed genes will create more rna, which results in more labeled cdnas binding to those spots. a more common variant on the basic cdna approach is illustrated in figure 13.8. in this experiment, rna from two different tissues or individuals is collected, labeled with two different dyes, and competitively hybridized on a single slide. the relative abundance of the two dyes allows the scientist to state, for instance, that a particular gene is expressed five times more in one tissue than in the other. oligonucleotide arrays take a slightly different approach to assaying the relative abundance of rna sequences. instead of attaching full-length dnas to a slide, oligonucleotide systems make use of short oligonucleotides chosen to be specific to individual genes. for each gene included in the array, approximately 10 to 20 different oligonucleotides of length 20-25 nucleotides are designed and printed onto a chip. the use of multiple oligonucleotides for each gene helps to reduce the effects of a variety of potential errors. fluorescently labeled rna (rather than cdna) is collected from the target tissue and hybridized against the oligonucleotide array.
one limitation of the oligonucleotide approach is that only a single sample can be assayed on a single chip; competitive hybridization is not possible.
figure 13.8 a cdna microarray slide is created by (1) attaching dna to spots on a glass slide, (2) collecting expressed rna sequences (expressed sequence tags, ests) from tissue samples, (3) converting the rna to dna and labeling the molecules with fluorescent dyes, (4) hybridizing the labeled dna molecules to the dna bound to the slide, and (5) extracting the quantity of each expressed sequence by measuring the fluorescence levels of the dyes.
although oligonucleotide and cdna approaches to assaying gene expression rely on the same basic principles, each has its own advantages and disadvantages. as already noted, competitive hybridization is currently only possible in cdna systems. the design of oligonucleotide arrays requires that the sequences of genes for the chip are already available. the design phase is very expensive, and oligonucleotide systems are only available for commercially important and model organisms. in contrast, cdna arrays can be developed fairly quickly even in organisms without sequenced genomes. in their favor, oligonucleotide arrays allow for more genes to be spotted in a given area (thus allowing more measurements to be made on a single chip) and tend to offer higher repeatability of measurements. both of these facts reduce the overall experimental error rate in oligonucleotide arrays relative to cdna microarrays, although at a higher per-observation cost. because of the trade-off between obtaining many cheap noisy measurements versus a smaller number of more precise but expensive measurements, it is not clear that either technology has an obvious cost advantage. both techniques share the same major disadvantage: only measurements of rna levels are found.
these measurements are used as surrogates for the much more desirable and useful quantities of the amount of protein produced for each gene. it appears that rna levels are correlated with protein levels, but the extent and strength of this relationship is not understood well. the near future promises a growing role for protein microarray systems, which are currently seeing limited use because of their very high costs. the ''final draft'' of the human genome was announced in april 2003. it included roughly 2.9 billion nucleotides, with some 30,000 to 40,000 genes spread across 23 pairs of chromosomes. the next phase of major data acquisition on the human genome is to discover how differences, both large and small, from individual to individual, result in variation at the phenotypic level. toward this end, a major effort has been made to find and document genetic polymorphisms. polymorphisms have long been important to studies of genetics. variations of the banding patterns in polytene chromosomes, for instance, have been studied for many decades. allozyme assays, based on differences in the overall charge of amino acid sequences, were popular in the 1960s. most modern studies of genetic polymorphisms, though, focus on identifying variation at the individual nucleotide level. the international snp consortium (http://snp.cshl.org) is a collaboration of public and private organizations that discovered and characterized approximately 1.8 million single nucleotide polymorphisms (snps) in human populations. in medicine, the expectation is that knowledge of these individual nucleotide variants will accelerate the description of genetic diseases and the drug development process. pharmaceutical companies are optimistic that surveys of variation will be of use for selecting the proper drug for individual patients and for predicting likely side effects on an individual-to-individual basis. 
most snps (pronounced ''snips'') are the result of a mutation from one nucleotide to another, whereas a minority are insertions and deletions of individual nucleotides. surveys of snps have demonstrated that their frequencies vary from organism to organism and from region to region within organisms. in the human genome, a snp is found about every 1000 to 1500 nucleotides. however, the frequency of snps is much higher in noncoding regions of the genome than in coding regions, the result of natural selection eliminating deleterious alleles from the population. furthermore, synonymous or silent polymorphisms, which do not result in a change of the encoded amino acids, are more frequent than nonsynonymous or replacement polymorphisms. the fields of population genetics and molecular evolution provide many empirical surveys of snp variation, along with mathematical theory, for analyzing and predicting the frequencies of snps under a variety of biologically important settings. simple sequence repeats (ssrs) consist of a moderate (10-50) number of tandemly repeated copies of the same short sequence of 2 to 13 nucleotides. ssrs are an important class of polymorphisms because of their high mutation rates, which lead to ssr loci being highly variable in most populations. this high level of variability makes ssr markers ideal for work in individual identification. ssrs are the markers typically employed for dna fingerprinting in the forensics setting. in human populations, an ssr locus usually has 10 or more alleles and a per generation mutation rate of 0.001. the fbi uses a set of 13 tetranucleotide repeats for identification purposes, and experts claim that no two unrelated individuals have the exact same collection of alleles at all 13 of those loci. as the technology for collecting genomic data has improved, so has the need for new methods for management and analysis of the massive amounts of accumulated data. 
the term bioinformatics has evolved to include the mathematical, statistical, and computational analysis of genomic data. work in bioinformatics ranges from database design to systems engineering to artificial intelligence to applied mathematics and statistics, all with an underlying focus on genomic science. a variety of bioinformatics topics may be illustrated using the core technologies described in the preceding section. it is necessary to carry out sequence alignments in order to assemble sequence fragments. all of these sequences, along with the vital information about their sources, functions, and so on, must be stored in databases, which must be readily available to users in a variety of locations. once a sequence has been obtained, it is necessary to annotate its function. one of the most fundamental annotation tasks is that of computational gene finding, in which a genome or chromosome sequence is input to an algorithm that subsequently outputs the predicted location of genes. a gene sequence, whether predicted or experimentally determined, must have its function predicted, and many bioinformatics tools are available for this task. once microarray data are available, it is necessary to identify subsets of coregulated genes and to identify genes that are differentially expressed between two or more treatments or tissue types. polymorphism data from snps are used to search for correlations with, for example, the presence or absence of a disease in family pedigrees. these questions are all of fundamental importance and draw on many different fields. by necessity, bioinformatics is a highly multidisciplinary field. genome projects involve far-reaching collaborations among many researchers in many fields around the globe, and it is critical that the resulting data be easily available both to project members and to the general scientific community. in light of this requirement, a number of key central data repositories have emerged. 
in addition to providing storage and retrieval of gene sequences, several of these databases also offer advanced sequence analysis methods and powerful visualization tools. in the united states, the primary public genomics resource is that of the national center for biotechnology information (ncbi). the ncbi website (http://www.ncbi.nlm.nih.gov) provides a seemingly endless collection of data and data analysis tools. perhaps the most important element of the ncbi collection is the genbank database of dna and rna sequences. ncbi provides a variety of tools for searching genbank, and elements in genbank are linked to other databases, both within and outside of ncbi. figure 13.9 shows some results from a simple query of the genbank nucleotide database. genbank data files contain a wealth of information. figure 13.10 shows a simple genbank file for a prion sequence from duck. the accession number, af283319, is the unique identifier for this entry. the genbank file contains a dna sequence of 819 nucleotides, its predicted amino acid sequence, and a citation to the chinese laboratory that obtained the data. the ''links'' icon in the upper right provides access to related information found in other databases. it is essential for those working in genomics or bioinformatics to become familiar with genbank and the content of genbank files.
figure 13.9 the result of a simple query of the genbank database at ncbi. this query found 9007 entries in the genbank nucleotide database containing the term ''tyrosine kinase.'' each entry can be clicked to find additional information.
figure 13.10 a simple genbank file containing the dna sequence for a prion protein gene.
ncbi is also the home of the blast database searching tool. blast uses algorithms for sequence alignment (described later in this chapter) to find sequences in genbank that are similar to a query sequence provided by the user.
to illustrate the use of blast, consider a study by professor eske willerslev at the university of copenhagen. willerslev and his colleagues collected samples from siberian permafrost that included a variety of preserved plant and animal material estimated to be 300,000-400,000 years old. they were able to extract short dna sequences from the rbcl gene. these short sequences were used as input to the blast algorithm, which reported a list of similar sequences. it is likely that the most similar sequences come from close relatives of the organisms that provided the ancient dna. the european bioinformatics institute (ebi, http://www.ebi.ac.uk) is the european ''equivalent'' of ncbi. users who explore the ebi website will find much of the same type of functionality as provided by ncbi. of particular note is the ensembl project (http://www.ensembl.org), a joint venture between ebi and the sanger institute. ensembl has particularly nice tools for exploring genome project data through its genome browser. figure 13 .11 shows a portion of the display for a region of human chromosome 7. ensembl provides comparisons to other completed genome sequences (rat, mouse, and chimpanzee), along with annotations of the locations of genes and other interesting features. most of the items in the display are clickable and provide links to more detailed information on each display component. many other databases and web resources play important roles in the day-to-day working of genome scientists. table 13 .2 includes a selection of these resources, along with short descriptions of their unique features. the most fundamental computational algorithm in bioinformatics is that of pairwise sequence alignment. not only is it of immediate practical value, but the underlying dynamic programming algorithm also serves as a conceptual framework for many other important bioinformatics techniques. 
the goal of sequence alignment is to accept as input two or more dna, rna, or amino acid sequences; identify the regions of the sequences that are similar to one another according to some measure; and output the sequences with the similar positions aligned in columns. an alignment of six sequences from hiv strains is shown in figure 13.12. sequence alignments have numerous uses. alignments of pairs of sequences help us to determine whether or not they have the same or similar functions. regions of alignments with little sequence variation likely correspond to important structural or functional regions of protein coding genes. by studying patterns of similarity in an alignment of genes from several species, it is possible to infer the evolutionary history of the species, and even to reconstruct dna or amino acid sequences that were present in the ancestral organisms. many methods for annotation, including assigning protein function and identifying transcription factor binding sites, rely on multiple sequence alignments as input. to illustrate the principles underlying sequence alignment, consider the special case of aligning two dna sequences. if the two sequences are similar, it is most likely because they have evolved from a common ancestral sequence at some time in the past. as illustrated in figure 13.13, the sequences differ from the ancestral sequence and from each other because of past mutations. most mutations fall into one of two classes: nucleotide substitutions, which result in these two sequences being different at the location of the mutation (fig. 13.13a), and insertions or deletions of short sequences (fig. 13.13b). the term indel is often used to denote an insertion or deletion mutation. figure 13.13b shows that indels lead to one sequence having nucleotides present at certain positions, whereas the second sequence has no nucleotides at those positions.
to align two sequences without error, it would be necessary to have knowledge of the entire collection of mutations in the history of the two sequences. since this information is not available, it is necessary to rely on computational algorithms for reconstructing the likely locations of the various mutation events. a score function is chosen to evaluate alignment quality, and the algorithms attempt to find the pairwise alignment that has the highest numerical score among all possible alignments. consider aligning the two short sequences cagg and cga. it can be shown that there are 129 possible ways to align these two sequences, several of which are shown in figure 13.14. how does one determine which of the 129 possibilities is best? alignments (a) and (b) each have two positions with matching nucleotides; however, alignment (b) includes three columns with indels, whereas (a) has only one. on the other hand, alignment (a) has one mismatch to (b)'s zero. there is no definitive answer to the question of which alignment is best; however, it makes sense that ''good'' alignments will tend to have more matches and fewer mismatches and indels. it is possible to quantify that intuition by invoking a scoring scheme in which each column receives a score, $s_i$, according to the formula
$$
s_i = \begin{cases}
m, & \text{the bases at column } i \text{ match} \\
d, & \text{the bases at column } i \text{ do not match} \\
i, & \text{there is an indel at column } i
\end{cases}
$$
using this scheme with match score $m = 5$, mismatch score $d = -1$, and indel score $i = -3$, the alignment in figure 13.14a would receive a score of $5 - 3 + 5 - 1 = 6$. similarly, the alignment in figure 13.14b has a score of $5 - 3 - 3 + 5 - 3 = 1$. the remaining alignments in figure 13.14 have scores of 0, 0, 1, 0, $-5$, and 1, respectively. alignment (a) is considered best under the standards of this scoring scheme, and, in fact, it has the best score of all 129 possible alignments.
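the column-scoring scheme and the count of 129 alignments can be checked with a short python sketch. the function names are illustrative, and the two gapped strings used for alignments (a) and (b) are an assumption: they are one arrangement consistent with the description above (one indel and one mismatch for (a); three indel columns and no mismatch for (b)). an alignment is taken to be any sequence of columns that pairs two bases or pairs a base with a gap, never a gap with a gap.

```python
def column_score(a, b, match=5, mismatch=-1, indel=-3):
    """Score one alignment column; '-' marks an indel."""
    if a == "-" or b == "-":
        return indel
    return match if a == b else mismatch


def alignment_score(row1, row2):
    """Sum the column scores of two equal-length gapped strings."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    return sum(column_score(a, b) for a, b in zip(row1, row2))


def enumerate_alignments(x, y):
    """Yield every alignment of x and y as a pair of gapped strings."""
    if not x and not y:
        yield "", ""
        return
    if x and y:  # pair the two leading bases
        for r1, r2 in enumerate_alignments(x[1:], y[1:]):
            yield x[0] + r1, y[0] + r2
    if x:        # leading base of x against a gap
        for r1, r2 in enumerate_alignments(x[1:], y):
            yield x[0] + r1, "-" + r2
    if y:        # gap against leading base of y
        for r1, r2 in enumerate_alignments(x, y[1:]):
            yield "-" + r1, y[0] + r2


print(alignment_score("cagg", "c-ga"))    # 6, as for alignment (a)
print(alignment_score("cagg-", "c--ga"))  # 1, as for alignment (b)

all_alignments = list(enumerate_alignments("cagg", "cga"))
print(len(all_alignments))                                    # 129
print(max(alignment_score(a, b) for a, b in all_alignments))  # 6
```

exhaustive enumeration is only workable here because the sequences are so short; the number of alignments grows very quickly with sequence length.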
this example suggests an algorithm for finding the best scoring alignment of any two sequences: enumerate all possible alignments, calculate the score for each, and select the alignment with the highest score. unfortunately, it turns out that this approach is not practical for real data. it can be shown that the number of possible alignments of 2 sequences of length $n$ is approximately $2^{2n}/\sqrt{2\pi n}$ when $n$ is large. even for a pair of short sequences of length 100, the number of alignments is $6 \times 10^{58}$, 35 orders of magnitude larger than avogadro's number! techniques such as the needleman-wunsch and smith-waterman algorithms, which allow for computationally efficient identifications of the optimal alignments, are important practical and theoretical components of bioinformatics. conceptually, the task of aligning three or more sequences is essentially the same as that of aligning pairs of sequences. the computational task, however, becomes enormously more complex, growing exponentially with the number of sequences to be aligned. no practically useful solutions have been found, and the problem has been shown to belong to a class of fundamentally hard computational problems said to be np-complete. in addition to the increased computation, there is one important new concept that arises when shifting from pairwise alignment to multiple alignment. scoring columns in the pairwise case was simple; that is not the case for multiple sequences. complications arise because the evolutionary tree relating the sequences to be aligned is typically unknown, which makes assigning biologically plausible scores difficult. this problem is often ignored, and columns are scored using a sum of pairs scoring scheme in which the score for a column is the sum of all possible pairwise scores. for example, the score for a column containing the three nucleotides cgg, again using the scores $m = 5$, $d = -1$, and $i = -3$, is $-1 - 1 + 5 = 3$.
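two of the ideas above lend themselves to a brief python sketch: the dynamic programming recurrence behind the needleman-wunsch algorithm, and sum of pairs scoring of a single column. the code is a minimal illustration under the simple match, mismatch, and indel scores used in this section; it returns only the optimal score, not the alignment itself, and the function names are hypothetical. scoring a gap paired with a gap as 0 in the sum of pairs function is an assumed convention that the text does not specify.

```python
from itertools import combinations


def needleman_wunsch_score(x, y, match=5, mismatch=-1, indel=-3):
    """Return the best global alignment score of x and y.

    best[i][j] holds the best score for aligning x[:i] with y[:j].
    The table has (len(x)+1) * (len(y)+1) cells, so the work grows with
    the product of the lengths rather than with the number of alignments.
    """
    rows, cols = len(x) + 1, len(y) + 1
    best = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        best[i][0] = i * indel          # x[:i] against gaps only
    for j in range(1, cols):
        best[0][j] = j * indel          # y[:j] against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if x[i - 1] == y[j - 1] else mismatch
            best[i][j] = max(
                best[i - 1][j - 1] + pair,   # align x[i-1] with y[j-1]
                best[i - 1][j] + indel,      # x[i-1] against a gap
                best[i][j - 1] + indel,      # gap against y[j-1]
            )
    return best[-1][-1]


def sum_of_pairs(column, match=5, mismatch=-1, indel=-3):
    """Score one multiple-alignment column as the sum over all pairs."""
    total = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue                    # assumed: gap-gap pairs score 0
        if a == "-" or b == "-":
            total += indel
        else:
            total += match if a == b else mismatch
    return total


print(needleman_wunsch_score("cagg", "cga"))  # 6
print(sum_of_pairs("cgg"))                    # 3
```

for cagg and cga the table has only 20 cells, yet it yields the same best score of 6 that exhaustive enumeration of all alignments would.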
other algorithms, such as the popular clustalw program, use an approach known as progressive alignment to circumvent this issue. almost all widely used methods for finding sequence alignments rely on a scoring scheme similar to the one used in the preceding paragraphs. clearly, this formula has very little biological basis. furthermore, how does one select the scores for matches, mismatches, and indels? considerable work has addressed these issues with varying degrees of success. the most important improvement is the replacement of the simple match and mismatch scores with scoring matrices obtained from empirical collections of amino acid sequences. rather than assigning, for example, all mismatches a value of $-1$, the blosum and pam matrices provide a different penalty for each possible pair of amino acids. since these penalties are derived from actual data, mismatches between chemically similar amino acids such as leucine and isoleucine receive smaller penalties than mismatches between chemically different ones. a second area of improvement is in the assignment of indel penalties. the alignments in figure 13.14b and 13.14e each have a total of three sites with indels. however, the indels at sites 2 and 3 of figure 13.14b could have been the result of a single insertion or deletion event. recognizing this fact, it is common to use separate open and extension penalties for indels. if the open penalty is $o = -5$ and the extension penalty is $e = -1$, then a series of three consecutive indels would receive a score of $-5 - 1 - 1 = -7$. the most common bioinformatics task is searching a molecular database such as genbank for sequences that are similar to a query sequence of interest. for example, the query sequence may be a gene sequence from a newly isolated viral outbreak, and the search task may be to find out if any known viral sequences are similar to this new one.
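the open and extension penalties for indels described above can be made concrete with a small python sketch. the function names are illustrative, and the gapped string in the usage lines is a made-up example rather than one of the alignments in figure 13.14.

```python
from itertools import groupby


def affine_gap_score(run_length, gap_open=-5, gap_extend=-1):
    """Score a run of consecutive indels: one open plus the extensions."""
    if run_length <= 0:
        return 0
    return gap_open + (run_length - 1) * gap_extend


def gap_runs(row):
    """Return the lengths of the runs of '-' in one gapped sequence."""
    return [len(list(group)) for char, group in groupby(row) if char == "-"]


def total_gap_score(row, gap_open=-5, gap_extend=-1):
    """Sum the affine scores of every gap run in a gapped sequence."""
    return sum(affine_gap_score(n, gap_open, gap_extend) for n in gap_runs(row))


print(affine_gap_score(3))          # -7, i.e. -5 - 1 - 1
print(gap_runs("c--ga-"))           # [2, 1]
print(total_gap_score("c--ga-"))    # -11, i.e. (-5 - 1) + (-5)
```

under the flat indel score of -3 used earlier, three indels cost -9 whether or not they are adjacent; the affine scheme charges -7 for one run of three but -15 for three separate single-site indels, which reflects the idea that one long event is more plausible than several independent ones.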
it turns out that this type of database searching is a special case of pairwise sequence alignment. essentially, all sequences in the database are concatenated end to end, and this new ''supersequence'' is aligned to our query sequence. since the supersequence is many, many times longer than the query sequence, the resulting pairwise alignment would consist mostly of gaps and provide relatively little useful information. a more useful procedure is to ask if the supersequence contains a short subsequence that aligns well with the query sequence. this problem is known as local sequence alignment, and it can be solved with algorithms very similar to those for the basic alignment problem. the smith-waterman algorithm is guaranteed to find the best such local alignment. even though the smith-waterman algorithm provides a solution to the database search problem for many applications, it is still too slow for high-volume installations such as ncbi, where multiple query requests are handled every second. for these settings, a variety of heuristic searches have been developed. these tools, including blast and fasta, are not guaranteed to find the best local alignment, but they usually do and are, therefore, valuable research tools. it is no exaggeration to claim that blast (http://www.ncbi.nlm.nih.gov/blast) is one of the most influential research tools of any field in the history of science. the algorithm has been cited in upwards of 30,000 studies to date. in addition to providing a fast and effective method for database searching, the use of blast spread rapidly because of the statistical theory developed to accompany it. when searching a very large database with a short sequence, it is very likely that one or more instances of the query sequence will be found in the database simply by chance alone. when blast reports a list of database matches, it sorts them according to an e-value, which is the number of matches of that quality expected to be found by chance. 
an e-value of 0.001 indicates a match that would only be found once every 1000 searches, and it suggests that the match is biologically interesting. on the other hand, an e-value of 2.0 implies that two matches of the observed quality would be found every search simply by chance, and therefore, the match is probably not of interest. consider an effort to identify the virus responsible for sars. the sequence of the protease gene, a ubiquitous viral protein, was isolated and stored under genbank accession number ay609081. if that sequence is submitted to the tblastx variant of blast at ncbi, the best matching non-sars entries in genbank (remember that the sars entries would not have been in the database at the time) all belong to coronaviruses, providing strong evidence that sars is caused by a coronavirus. this type of comparative genomic approach has become invaluable in the field of epidemiology. much work in genomic science and bioinformatics focuses on problems of identifying biologically important regions of very long dna sequences, such as chromosomes or genomes. many important regions such as genes or binding sites come in the form of relatively short contiguous blocks of dna. hidden markov models (hmms) are a class of mathematical tools that excel at identifying this type of feature. historically, hmms have been used in problems as diverse as finding sources of pollution in rivers, formal mathematical descriptions of written languages, and speech recognition, so there is a rich body of existing theory. predictably, many successful applications of hmms to new problems in genomic science have been seen in recent years. hmms have proven to be excellent tools for identifying genes in newly sequenced genomes, predicting the functional class of proteins, finding boundaries between introns and exons, and predicting the higher-order structure of protein and rna sequences. 
to introduce the concept of an hmm in the context of a dna sequence, consider the phenomenon of isochores, regions of dna with base frequencies that differ from those of neighboring regions. data from the human genome demonstrate that regions of a million or more bases have g+c content varying from 20% to 70%, a much higher range than one would expect to see if base composition were homogeneous across the entire genome. a simple model of the genome assigns each nucleotide to one of three possible classes (fig. 13.15a): a high g+c class (h), a low g+c class (l), or a normal g+c class (n). in the normal class, each of the four bases a, c, g, and t is used with equal frequency (25%). in the high g+c class, the frequencies of the four bases are 15% a, 35% c, 35% g, and 15% t, and in the low g+c class, the frequencies are 30% a, 20% c, 20% g, and 30% t. in the parlance of hmms, these three classes are called hidden states, since they are not observed directly. instead, the emitted characters a, c, g, and t are the observations. thus, this simple model of a genome consists of successive blocks of nucleotides from each of the three classes (fig. 13.15b). the formal mathematical details of hmms will not be discussed, but it is useful to understand the basic components of the models (fig. 13.16). each hidden state in an hmm is able to emit characters, but the emission probabilities vary among hidden states. the model must also describe the pattern of hidden states, and the transition probabilities determine both the expected lengths of blocks of a single hidden state and the likelihood of one hidden state following another (e.g., is it likely for a block of high g+c to follow a block of low g+c?). the transition probabilities play important roles in applications such as gene finding. a model of this kind can be used to answer questions such as: what is the chance of seeing a block of high g+c nucleotides shorter than 5000? these types of questions will be addressed in the examples discussed in the next section.
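to make the three-state isochore model more tangible, the following python sketch labels each nucleotide of a sequence with its most probable hidden state using the standard viterbi algorithm, which is one common way of decoding an hmm and is not described in the text itself. the emission probabilities are those given above, and the transition probabilities out of the n state follow the caption of figure 13.16 (90% n, 5% l, 5% h). the transition probabilities out of h and l and the uniform starting probabilities are assumptions made only for illustration.

```python
import math

STATES = ("H", "L", "N")

# emission probabilities from the text
EMIT = {
    "H": {"a": 0.15, "c": 0.35, "g": 0.35, "t": 0.15},
    "L": {"a": 0.30, "c": 0.20, "g": 0.20, "t": 0.30},
    "N": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
}

TRANS = {
    "N": {"N": 0.90, "L": 0.05, "H": 0.05},  # from the figure 13.16 caption
    "H": {"H": 0.90, "L": 0.05, "N": 0.05},  # assumed for illustration
    "L": {"L": 0.90, "H": 0.05, "N": 0.05},  # assumed for illustration
}

START = {state: 1.0 / len(STATES) for state in STATES}  # assumed uniform


def viterbi(seq):
    """Return the most probable hidden-state path for a dna string."""
    seq = seq.lower()
    if not seq:
        return ""
    # log-probabilities avoid numerical underflow on long sequences
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("gc" * 10))  # HHHHHHHHHHHHHHHHHHHH
print(viterbi("at" * 10))  # LLLLLLLLLLLLLLLLLLLL
```

because each state here stays put with probability 0.9 at every step, a block of a given state has an expected length of 1/(1 - 0.9) = 10 nucleotides; isochores spanning a million bases would require self-transition probabilities far closer to 1.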
figure 13.16 an hmm for the states in fig. 13.15. transition probabilities govern the chance that one hidden state follows another. for example, an n state is followed by another n state 90% of the time, by an l state 5% of the time, and by an h state 5% of the time. emission probabilities control the frequency of the four nucleotides found at each type of hidden state. in the hidden state l, there is 30% a, 20% c, 20% g, and 30% t.
the task of gene prediction is conceptually simple to describe: given a very long sequence of dna, identify the locations of genes. unfortunately, the solution of the problem is not quite as simple. as a first pass, one might simply find all pairs of start (atg) and stop (tag, tga, taa) codons. blocks of sequence longer than, say, 300 nucleotides that are flanked by start and stop codons and that have lengths in multiples of three are likely to be protein coding genes. although this simple method will be likely to find many genes, it will probably have a high false positive rate (incorrectly predict that a sequence is a gene), and it will certainly have a high false negative rate (fail to predict real genes). for instance, the method fails to consider the possibility of introns, and it is unable to predict short genes. gene finding algorithms rely on a variety of additional information to make predictions, including the known structure of genes, the distribution of lengths of introns and exons in known genes, and the consensus sequences of known regulatory sequences. hmms turn out to be exceptionally well-suited for gene finding, and the basic structure of a simple gene finding hmm is shown in figure 13.17. note that the hmm includes hidden states for promoter regions, the start and stop codons, exons, introns, and the noncoding dna falling between different genes. also note that not all hidden states are connected to one another. this fact reflects an understanding of gene and genome structure.
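the idea that only some hidden states are connected can be sketched as a small table of allowed transitions. the edges below are an assumed, block-level reading of the gene structure described in the text and are not taken from figure 13.17 itself; the names ALLOWED and is_valid_path are illustrative.

```python
# Allowed block-level transitions between hidden states.
# Assumption: these edges are inferred from the gene structure described in
# the text, not copied from the figure.
ALLOWED = {
    "noncoding": {"promoter"},
    "promoter": {"start"},
    "start": {"exon"},
    "exon": {"intron", "stop"},
    "intron": {"exon"},
    "stop": {"noncoding"},
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(nxt in ALLOWED.get(cur, set()) for cur, nxt in zip(states, states[1:]))


if __name__ == "__main__":
    good = ["noncoding", "promoter", "start", "exon", "intron", "exon", "stop", "noncoding"]
    bad = ["start", "intron", "stop", "exon", "promoter"]
    print(is_valid_path(good), is_valid_path(bad))
```

in a full hmm the same knowledge is expressed by setting the corresponding transition probabilities to zero, so that biologically impossible state sequences can never be predicted.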
the sequence of states start -intron -stop -exon -promoter is not biologically possible, whereas the series noncoding -promoter -start -exon -intron -exon -intron -exon -stop -noncoding is. good hmms incorporate this type of knowledge extensively. to put the hmm of figure 13 .17 to use, the model must first be trained. the training step involves taking existing sequences of known genes and estimating all of the transition and emission probabilities for each of the model's hidden states. for example, if a training data set included 150 introns, the observed frequency of c in those intron sequences could be used to come up with the emission probability for c in the hidden state intron. the average lengths of introns and exons would be used to estimate the transition probabilities to and from the exon and intron hidden states. once the training step is complete, the hmm machinery can be used to predict the locations of genes in a long sequence of dna, along with their intron/exon boundaries, promoter sites, and so forth. gene finding algorithms in actual use are much more complex than the one shown in figure 13 .17, but they retain the same basic structure. the performance of gene finders continues to get better and better as more genomes are studied and the quality of the underlying hmms is improved. in bacteria, modern gene finding algorithms are rarely incorrect. upwards of 95% of the predicted genes are subsequently found to be actual genes, and only 1-2% of true genes are missed by the algorithms. the situation is not as rosy for eukaryotic gene prediction, however. eukaryotic genomes are much larger, and the gene structure is more complex (most notably, eukaryotic genes have introns). the effectiveness of gene finding algorithms is usually measured in terms of sensitivity and specificity. 
if these quantities are measured on a per nucleotide basis, an algorithm's sensitivity, s_n, is defined to be the percentage of nucleotides in real genes that are actually predicted to be in genes. the specificity, s_p, is the percentage of nucleotides predicted to be in genes that truly are in genes. good gene predictors have high sensitivity and high specificity. the best eukaryotic gene finders today have sensitivities and specificities around 90% at the level of individual nucleotides. if the quantities are measured at the level of entire exons (e.g., did the algorithm correctly predict the location of the entire exon or not?), the values drop to around 70%. an emerging and powerful approach for predicting the location of genes uses comparative genomics. the entire human genome sequence is now available, and the locations of tens of thousands of genes are known. suppose that a laboratory now sequences the genome of the cheetah. since humans and cheetahs are both mammals, they should have reasonably similar genomes. in particular, most of the gene sequences should be quite similar. gene prediction can proceed by doing a pairwise sequence alignment of the two genomes and then predicting that positions in the cheetah genome corresponding to locations of known human genes are also genes in the cheetah. this approach is remarkably effective, although it will obviously miss genes that are unique to one species or the other. the degree of relatedness of the two organisms also has a major impact on the utility of this approach. the human genome could be used to predict genes in the gorilla genome much better than it could be used to predict genes in the sunflower or paramecium genomes. in addition to hmms and comparative genomics approaches, a variety of other techniques are being used for gene prediction. neural networks and other artificial intelligence methods have been used effectively.
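the per nucleotide definitions of sensitivity and specificity given above can be sketched as follows. the function name and the toy masks are hypothetical; each mask marks, position by position, whether a nucleotide lies in a real gene or in a predicted gene.

```python
def nucleotide_sn_sp(true_mask, pred_mask):
    """Return (sensitivity, specificity) as fractions, per the definitions in the text.

    true_mask[i] is truthy if nucleotide i lies in a real gene;
    pred_mask[i] is truthy if nucleotide i is predicted to lie in a gene.
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same length")
    both = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    n_true = sum(1 for t in true_mask if t)
    n_pred = sum(1 for p in pred_mask if p)
    sn = both / n_true if n_true else float("nan")
    sp = both / n_pred if n_pred else float("nan")
    return sn, sp


if __name__ == "__main__":
    # Toy example (made-up data): the prediction is shifted by one position.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    guess = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(nucleotide_sn_sp(truth, guess))
```

note that specificity as defined here (the fraction of predicted gene nucleotides that are truly in genes) is the quantity called precision or positive predictive value in other fields.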
perhaps most intriguing, as more and more genomes become available, are hybrid methods that integrate, for example, hmms with comparative genomic data from two or more genomes. once a genome is sequenced and its genes are found or predicted, the next step in the bioinformatics pipeline is to determine the biological function of the genes. ideally, molecular biological work would be carried out in the laboratory to study each gene's function, but clearly that approach is not feasible. two basic computational approaches will be described, one using comparative genomics and the other using hmms. comparative genomics approaches to assigning function to genes rely on a simple logical assumption: if a gene in species a is very similar to a gene in species b, then the two genes most likely have the same or related functions. this logic has long been applied at higher biological levels (e.g., the kidneys of different species have the same basic biological function even though the exact details may differ in the two species). at the level of genes, the inference is less accurate, especially if the species involved in the comparison are not closely related, but the approach is nonetheless useful and usually effective. simple database searches are the most straightforward comparative genomic approach to functional annotation. a newly discovered gene sequence that returns matches to cytochrome oxidase genes when input to blast is likely to be a cytochrome oxidase gene itself. complications arise when matches are to distantly related species, when the matching regions are very short, or when the sequence matches members of a multigene family. in the first case, the functions of the genes may have changed during the tens or hundreds of millions of years since the two organisms shared a common ancestor. 
however, if two or more such distantly related organisms have gene sequences that are nearly identical, a strong argument can be made that the gene is critical in both organisms and that the same function has been maintained throughout evolutionary history. short matches may arise simply as a result of elementary protein structure. for example, two sequences may have regions that match simply because they both encode alpha helical regions. such matches provide useful structural information, but the stronger inference of shared function is not justified. multigene families are the result of gene duplications followed by functional divergence. examples include the globin and amylase families of genes. at some point in the past, a single gene in one organism was completely duplicated in the genome. at that point, the duplicated copy was free to evolve a new, but often related, function. subsequent duplications allow for the growth and diversification of such families. because of their shared ancestry, all members of a gene family tend to have similar dna sequences. this fact makes it difficult to assign function with high accuracy when matches appear in database searches, but it often provides a general class of functions for the query sequence. efforts have been made to classify all known proteins into functional groups using comparative genomics. suppose that the genbank protein database is queried with protein sequence a and the result is that its closest match is protein sequence b. if the database is next queried using sequence b and the closest match for b is found to be sequence a, then these two proteins are said to be reciprocal best matches, and they are likely to have the same function. likewise, if the best match to sequence a is b, the best match to b is c, and the best match to c is a, then a, b, and c are likely to have the same function. 
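the reciprocal best match rule described above can be sketched in a few lines. the sketch assumes that the closest database match of every sequence has already been found (for example by a similarity search) and stored in a dictionary; the sequence names and function names are hypothetical.

```python
def reciprocal_best_matches(best_match):
    """Return sorted pairs (a, b) where a's best match is b and b's best match is a."""
    pairs = set()
    for a, b in best_match.items():
        if a != b and best_match.get(b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def best_match_cycle(best_match, start, max_len=10):
    """Follow best matches from start; return the chain if it loops back to start."""
    chain = [start]
    current = best_match.get(start)
    while current is not None and current not in chain and len(chain) < max_len:
        chain.append(current)
        current = best_match.get(current)
    return chain if current == start and len(chain) > 1 else None


if __name__ == "__main__":
    # Hypothetical best-match table: A and B are reciprocal, C -> D -> E -> C is a cycle.
    hits = {"A": "B", "B": "A", "C": "D", "D": "E", "E": "C", "F": "A"}
    print(reciprocal_best_matches(hits))
    print(best_match_cycle(hits, "C"))
```

sequence f in the toy table has a best match (a) that does not point back to it, so it is not grouped with a and b under this rule.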
this general principle has been used to create clusters of genes that are predicted to have similar or identical functions. the cogs (clusters of orthologous groups of proteins) database at ncbi (http://www.ncbi.nlm.nih.gov/cog) represents a comprehensive clustering of the entire genbank protein database using this type of scheme. there are many known examples of proteins or individual protein domains that have the same function or structure. the pfam (protein family) database (http:// www.sanger.ac.uk/software/pfam) includes multiple sequence alignments of almost 7500 such protein families. using the sequence data for each alignment, the pfam project members created a special type of hmm called a profile hmm. this database makes it possible to take a query sequence and, for each of the 7500 families and their associated profile hmms, ask the question, ''is the query sequence a member of this gene family?'' a query to the pfam results in a probability assigned to each of the included protein families, providing not only the best matches but also indications of the strength of the matches. currently, about 74% of the proteins in genbank have a match in pfam, indicating a fairly high likelihood of any newly discovered protein having a pfam match. pfam is of interest not only because of its effectiveness, but also because of its theoretical approach of combining comparative genomic and hmm components. a common experiment is to use microarray or oligonucleotide array technology to measure the expression level for several thousand genes under two different ''treatments.'' it is often the case that one treatment is a control while the other is an environmental stimulus such as a drug, chemical, or change in a physical variable such as temperature or ph. other possibilities include comparisons between two tissue types (e.g., brain vs. heart), between diseased and undiseased tissues (e.g., tumor vs. normal), or between samples at two developmental phases (e.g., embryo vs. 
adult). one of the primary reasons to carry out such an experiment is to identify the genes that are differentially expressed between the two treatments. the basic format of the data from a simple two-treatment microarray experiment is the following: each spot on a microarray corresponds to a single gene, and in competitive hybridization experiments, a single spot usually provides measurements of gene expression under two different treatments. note that the first column has been intentionally labeled ''spot'' instead of ''gene.'' it is important that the same gene be used and measured multiple times; therefore, a number of different spots will typically correspond to the same gene. the final column of data is the most important for interpreting this experiment. the most extreme difference in relative expression levels is found at spot 4, where the gene is expressed almost fourfold higher under treatment 2. the question now becomes, ''how large (or small) must the ratio be to say that the expression levels are really different?'' this question is one of variability and of statistical significance. phrased differently, would a ratio near 0.27 for spot 4 be likely if the experiment were repeated? the data in the table do not provide the necessary information to answer this question, and this fact points out the importance of replication in experimental design. whenever quantitative measurements are to be compared, replication is needed in order to estimate the variance of the measurements. this fundamental tenet of experimental design was largely ignored during the early history of microarray studies. fortunately, recent work has included careful attention to experimental design and proper analysis using the analysis of variance (anova). typical experiments now include five or more replicate measurements of each gene. in order to detect very small treatment effects on levels of expression, even larger amounts of replication are needed. 
a second type of microarray experiment is designed not to find differentially expressed genes, but to identify sets of genes that respond to two or more treatments in the same manner. this type of study is best illustrated with a time course study in which expression levels are measured at a series of time intervals. examples of such studies might involve measuring expression levels in laboratory mice each hour following exposure to a toxic chemical, expression levels in a mother or fetus at each trimester of a pregnancy, or expression levels in patients each year following infection with hiv. if plots of expression levels (y axis) against time (x axis) for each gene are overlaid as shown in figure 13.18, it is possible to visually compare the expression profiles of genes. the desired pattern is a group of genes that tend to increase or decrease their expression levels in unison. in figure 13.18 it appears that genes 2 and 5 have very similar expression profiles, as do genes 1 and 4. the similarity between the expression profiles of two genes can be described using the correlation coefficient, $$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$ where x_i and y_i are the expression levels of genes x and y at time point i, and \bar{x} and \bar{y} are the mean expression levels of the two genes over all time points. values near 1 or −1 indicate that the two genes have very similar profiles (for −1, mirror images of one another). when faced with thousands of profiles, the task becomes a bit more problematic. a common theme is to cluster genes on the basis of the similarity in their profiles, and many algorithms for carrying out the clustering have been published. all of these algorithms share the objective of assigning genes to clusters so that there is little variation among profiles within clusters, but considerable variation between clusters. top down clustering begins with all genes in a single cluster, then recursively partitions the genes into smaller and smaller clusters. bottom up methods start with each gene in its own cluster and progressively merge smaller clusters into larger ones.
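the correlation coefficient between two expression profiles, which can serve as the similarity measure when genes are clustered, might be computed as in the sketch below. it assumes the standard pearson correlation; the function name and the example profiles are hypothetical and are not data from figure 13.18.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 time points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up profiles at four time points.
    gene_a = [1.0, 2.0, 3.0, 4.0]
    gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape, higher level
    gene_c = [8.0, 6.0, 4.0, 2.0]   # mirror image
    print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

because the coefficient is computed after subtracting each gene's mean and scaling by its spread, two genes expressed at different overall levels can still have a correlation near 1 if their profiles rise and fall together.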
clustering algorithms may also be supervised, meaning that the user specifies ahead of time the final number of clusters, or unsupervised, in which case the algorithm determines the final number of clusters. the emergence of genomic science has not simply provided a rich set of tools and data for studying molecular biology. it has been the catalyst for an astounding burst of interdisciplinary research, and it has challenged long-established hierarchies found in most institutions of higher learning. the next generation of biologists will need to be as comfortable at a computer workstation as they are at the lab bench. recognizing this fact, many universities have already reorganized their departments and their curricula to accommodate the demands of genomic science. from a more practical point of view, the results of genomic research will begin to trickle into medicine. already, diagnostic procedures are changing rapidly as a result of genomics. the next phase of genomics will focus on relating genotypes to complex phenotypes, and as those connections are uncovered, new therapies and drugs will follow. consider, for example, a drug that is of significant benefit to 99% of users, but causes serious side effects in the remaining 1%. such drugs currently have difficulty remaining in the marketplace. however, the use of genetic screens to identify the patients likely to suffer side effects should make it possible for these drugs to be used safely and effectively. less imminent, but certainly in the foreseeable future, are gene therapies that will allow for repair of genetic defects. the continued interplay of biology, engineering, and the mathematical sciences will be responsible for exploration of these frontiers.
figure 13.18 overlaid expression profiles for 5 genes.
exercises 1. how many possible proteins could be formed by a gene region containing four exons? 2.
in general, eukaryotes have introns, whereas prokaryotes do not. what are possible advantages and disadvantages of introns? 3. most amino acids are encoded by more than a single codon. if one of these synonymous codons is energetically more efficient for the organism to use, what effect would that have on the organism's genome content? how might this fact be used in gene finding algorithms? 4. what is the chance that a 20-nucleotide oligonucleotide matches a sequence other than the one it was designed to match? assume for simplicity that all nucleotides have frequency 25%. how many matches to that oligonucleotide would one expect to find in the human genome? 5. if each of the 13 ssr markers used by the fbi for identification purposes has 20 equally frequent alleles, what is the chance that two randomly chosen individuals have the same collection of alleles at those 13 markers? 6. how many mammalian genomes have been completely sequenced? what are they? 7. what is the size of the anopheles gambiae genome? how many chromosomes does it have? how many genes does it have? 8. what is the length of the drosophila melanogaster alcohol dehydrogenase gene? 9. consider the following two alignments for the sequences cggtca and cagca: c-ggtca c-ggtca ca-g-ca ca-gca. a. find the score of each alignment using a match score of 5, mismatch penalty of −2, and gap penalty of −4. b. find the score of each if the gap penalty is −5 for opening and −1 for extending. 10. suppose a computer can calculate the scores for one million alignments per second. how long would it take to find the best alignment of two 1000 bp sequences by exhaustive search? 11. find an example of a zinc finger gene sequence using genbank. use blast to discover how many genbank sequences are similar to the sequence you found. what does the result tell you about zinc finger genes? 12. what are some additional features that might be added to the simple gene finding hmm of fig. 13.17?
13. draw a diagram of a simple gene finding hmm that might be useful for prokaryotes. the hmm should contain hidden states for exons and intergenic regions, and it should guarantee that exons have lengths that are multiples of three. 14. use the pfam website to give a brief description of the structure and function of members of the hamartin gene family. 15. in figure 13.18, gene 2 seems to be expressed at higher levels than gene 5. justify the claim that the two genes have similar profiles and might be coregulated. 16. compute the correlation coefficient for each pair of genes. do any of them have similar profiles? 17. the expression levels for two genes measured at four times are: implication of the correlation coefficient? 18. often, the gene sequences placed on microarray slides are of unknown function. suppose that an experiment identifies such a gene as being important for formation of a particular type of tumor. 19. when carrying out a database search using blast with a protein coding gene as the query sequence, there are two possible approaches. first, it is possible to query using the original dna sequence.
second, one could translate the coding dna and query using the amino acid sequence of the encoded protein.
key: cord-017853-mgsuwft0 authors: machado, roberto f.; garcia, joe g. n.
title: genomics of acute lung injury and vascular barrier dysfunction date: 2010-06-28 journal: textbook of pulmonary vascular disease doi: 10.1007/978-0-387-87429-6_63 sha: doc_id: 17853 cord_uid: mgsuwft0 acute lung injury (ali) is a devastating syndrome of diffuse alveolar damage that develops via a variety of local and systemic insults such as sepsis, trauma, pneumonia, and aspiration. it is interesting to note that only a subset of individuals exposed to potential ali-inciting insults develop the disorder and the severity of the disease varies from complete resolution to death. in addition, ali susceptibility and severity are also affected by ethnicity as evidenced by the higher mortality rates observed in african-american ali patients compared with other ethnic groups in the usa. moreover, marked differences in strain-specific ali responses to inflammatory and injurious agents are observed in preclinical animal models. together, these observations strongly indicate genetic components to be involved in the pathogenesis of ali. the identification of genes contributing to ali would potentially provide a better understanding of ali pathobiology, yield novel biomarkers, identify individuals or populations at risk, and prove useful for the development of novel and individualized therapies. genome-wide searches in animal models have identified a number of quantitative trait loci that associate with ali susceptibility. in this chapter, we utilize a systems biology approach combining cellular signaling pathway analysis with population-based association studies to review established and suspected candidate genes that contribute to dysfunction of endothelial cell barrier integrity and ali susceptibility. acute lung injury (ali) is a devastating syndrome of diffuse alveolar damage that develops via a variety of local and systemic insults such as sepsis, trauma, pneumonia, and aspiration [1].
deranged alveolar capillary permeability, profound inflammation, and extravasation of edematous fluids into the alveolar spaces are critical elements of ali, reflecting the substantial surface area of the pulmonary vasculature needed for alveolar gas exchange. ali, together with its severest form, acute respiratory distress syndrome (ards), afflicts approximately 190,000 patients per year in the usa and has a mortality rate of 35-50% [2, 3]. it is interesting to note, however, that only a subset of individuals exposed to potential ali-inciting insults develop the disorder and the severity of the disease varies from complete resolution to death. in addition, ali susceptibility and severity are also affected by ethnicity as evidenced by the higher mortality rates observed in african-american ali patients compared with other ethnic groups in the usa [4]. moreover, marked differences in strain-specific ali responses to inflammatory and injurious agents are observed in preclinical animal models [5]. together, these observations strongly indicate genetic components to be involved in the pathogenesis of ali. the role that genetics plays in determining ali risk or the subsequent severity of the outcome is one of the many unanswered questions regarding ali pathogenesis and epidemiology. the identification of genes contributing to ali would potentially provide a better understanding of the pathogenic mechanisms of ali, yield novel biomarkers, identify individuals or populations at risk, and prove useful for the development of novel and individualized therapies. however, a traditional genetic approach to studies using family linkage mapping is not feasible given the sporadic nature of ali and the necessity of an extreme environmental insult.
further, genetic studies of ali are challenging owing to the substantial phenotypic variance in critically ill patients, diversity in the lung injury evoking stimuli, presence of varied comorbid illnesses common in the critically ill patient, complex gene-environment interactions, and potentially incomplete gene penetrance [6, 7] . despite these inherent challenges, the unrivaled progress made in the post-human genome era combined with the utilization of sophisticated bioinformatics and high-throughput methods have allowed significant advances to be made. for example, these tools are now linked to escalating knowledge of the molecular mechanisms of lung endothelial permeability, a hallmark of ali and an attractive target for the design of novel therapies, to identify candidate genes whose variants are potentially involved in ali susceptibility. genome-wide searches in animal models have identified a number of quantitative trait loci that associate with ali susceptibility [8] . in this chapter, we utilize a systems biology approach combining cellular signaling pathway analysis with population-based association studies to review established and suspected candidate genes that contribute to dysfunction of endothelial cell barrier integrity and ali susceptibility. the integration of high-throughput gene expression profiling in preclinical models of ali with bioinformatics has led to the identification of differentially expressed genes in response to ali whose variants are potentially involved in ali susceptibility and severity. this approach confirmed long-suspected ali-associated candidate genes, but more importantly, identified novel genes not previously implicated in ali. increasing knowledge of the molecular mechanisms of endothelial-barrier-regulatory pathways has also enhanced the ability to find novel ali candidate genes. 
the analysis of the molecular pathways involving the cytoskeletal scaffolding and the dynamic cytoskeletal changes driving cell shape alterations, a key feature of vascular permeability, has identified additional genes contributing to the development and severity of ali, thereby providing novel therapeutic targets in this devastating illness. genes encoding proinflammatory cytokines, growth factors and mediators, receptors for barrier-regulatory agonists, and mechanical-stress-sensitive genes expressed in endothelium which regulate inflammatory responses also serve as attractive ali candidate genes and are representative of the diverse but fertile areas of exploration for candidate snps affecting ali susceptibility and severity. angiotensin-converting enzyme (ace) is a member of the renin-angiotensin system (ras), balancing the levels of angiotensin i and angiotensin ii, with significant expression in lung vascular endothelium as compared with other vascular beds [9]. the ras is considered to be an important regulator of inflammation that contributes to ali by altering vascular permeability, vascular tone, fibroblast activation, and endothelial-epithelial cell survival [10] [11] [12]. for example, angiotensin ii activates inflammatory processes by upregulating proinflammatory cytokines and chemokines via type i and type ii angiotensin ii receptors that subsequently activate the nuclear factor κb (nf-κb) pathway [13, 14]. the ras is also involved in the fibrotic response to ali via induction of transforming growth factor expression [15]. the most compelling evidence for ras involvement in ali has come from the effective attenuation of ali pathobiology by ace inhibitors or angiotensin receptor blocking drugs [16, 17] and ace knockout mice in preclinical models of ali [18].
an intronic insertion (i) or deletion (d) of a 287-bp alu repeat sequence in the human ace gene, located on chromosome 17q23, has been associated with ace levels and activity in serum [19, 20]. the d allele possesses a higher enzyme activity which parallels the higher gene expression in individuals with dd genotype [21]. the initial association of the dd genotype in the ace gene with increased ali mortality provided the impetus for subsequent studies to more firmly establish a genetic basis of ali and to identify ali candidate genes [22]. caucasian patients with ards show significantly higher frequencies of the dd genotype and the d allele as compared with ventilated intensive care unit (icu) patients without ards, patients after coronary artery bypass surgery, or healthy controls. moreover, ards patients with dd genotype show markedly higher mortality (54%) in comparison with the ii genotype (11%) or id genotype (28%) [22]. the higher mortality rate in ards patients with dd or id genotype as compared with ii genotype was subsequently confirmed in han chinese patients in taiwan, although the frequency of the d allele is significantly lower in the chinese population as compared with western populations [23]. compared with caucasians, a higher frequency of d allele has been reported among africans (nigerian and african-american populations) [24, 25], potentially contributing to the observed disparity in ali-associated higher mortality rates in african-americans [4]. however, to date, no association study of ace polymorphisms and lung injury has been performed in african-americans. in contrast, mexican and amerindian populations have slightly lower allelic frequencies of the d allele [25]. thus, ace represents a highly viable endothelial candidate gene and attractive target in acute inflammatory lung disease.
tumor necrosis factor (tnf)-α, an early mediator of ali development, is a potent proinflammatory cytokine which dramatically increases endothelial cell permeability, cytokine production, and a variety of cytotoxic and proinflammatory compounds which lead to subsequent vascular leakage and disturbed lung water balance. both tnf-α and tnf-β subtypes appear in the circulation, in bronchoalveolar lavage (bal) fluid and in pulmonary edema fluid during the onset of lung injury. as such, the elevated levels of tnf and its soluble receptors are commonly used as markers of inflammation and are associated with morbidity and mortality in ali patients [26]. both the tnfa and tnfb genes lie in close proximity within the major histocompatibility complex, with several polymorphisms described in this region. the -308g/a promoter polymorphism in the tnfa gene and the ncoi restriction fragment length polymorphism in the tnfb gene appear to influence the expression of tnf-α. the carriers of the -308a allele and homozygotes for the tnfb2 allele exhibit increased tnf-α expression and have increased susceptibility and mortality to sepsis [27, 28]. in patients with ards, the -308a allele is also associated with increased 60-day mortality, with the strongest association found among younger individuals [29]. however, in ards patients with direct or indirect pulmonary injury, these snps are associated with alterations in ali susceptibility (tnfa -308 g/a snp only in the direct pulmonary injury group, and tnfb ncoi only in the indirect pulmonary injury group). owing to the extent of linkage disequilibrium in the region, it remains unclear as to whether these are regulatory snps or if the tnf protein level is modulated by a third locus or a haplotype [30]. promoter snps within the tnfa gene (-238 g/a, -857 c/t) have been associated with inflammatory bowel disease along with the -308 g/a snp [31].
thus, the role of tnf variants in inflammatory disorders is apparent and indicates a need for further study of other tnf variants in association with ali. interleukin-6 (il-6) is an acute-phase response cytokine that plays a key role in the activation of b and t cells. inflammatory cytokines, including il-6, are essential for the immune system homeostasis; however, when il-6 production is exaggerated as observed in inflammatory lung disorders including ali [22, 32] , clearly detrimental outcomes are observed. ali-related increased levels of il-6 have been established in the bal fluid of critically ill patients with ards, sepsis, and trauma [33, 34] in association with ali adverse outcome [35] and development of multisystem organ failure [36] . in prior reports, we observed significantly higher expression of il-6 and the il-6 receptor genes across multiple-species ali models and in human lung endothelium exposed to ventilator-induced mechanical stress as well as in differential region-specific expression in lungs of the canine ali model [37] [38] [39] . on the basis of these data, the il-6 gene constitutes an excellent candidate gene to understand the genetic basis underlying ali. a functional polymorphism in the il-6 gene promoter region at the -174 position (g-174c) has been associated with alterations in both gene expression and il-6 levels and lower circulating il-6 concentrations and lower mortality rates in patients with acute respiratory failure admitted to the icu [22] . the contrasting correlation between g-174c alleles and circulating il-6 levels has also been reported [32] . the haplotype involving -174 g/c, 1753c/g, and 2954g/c is associated with higher mortality (and other secondary clinical outcomes) in a cohort of septic patients of european descent [40] . we further evaluated 14 il-6 gene tagging snps covering the entire gene for potential association in sepsis and ali patients of european descent [32] . 
no single snp was identified as significantly associated with ali; however, a common haplotype (comprising -1363g/-572g/-174g/1208a/1305a/4835c) with a frequency of 63% in cases and 49% in controls showed a significant association with ali susceptibility. in addition, homozygote carriers of the risk haplotype are about twice as frequent in ali cases (44.8%) as in controls (22.9%), yielding a significantly increased odds ratio for developing ali (odds ratio 2.73; 95% confidence interval, 1.39-5.37; p = 0.003). this haplotype spans the entire il-6 gene including the g allele at position -174, i.e. the risk allele for susceptibility to ali noted above. these data support the association of the il-6 gene with ali susceptibility and illustrate the value of haplotype analysis as a robust approach in association studies. vascular endothelial growth factor (vegf) is an endothelial-cell-specific mitogen that regulates angiogenesis, migration, and cell permeability [41]. vegf plays an important role in several organs by directly regulating vascular permeability to water and proteins. lung overexpression of vegf induces increased pulmonary vascular permeability, resulting in marked pulmonary edema [42], and plasma vegf levels are significantly elevated in ali patients [43]. several studies have reported the association of low levels of vegf with the severity of ards and elevated levels with the recovery from ards, indicating a role for vegf in the repair process of lung injury [44]. several polymorphisms have been described in the vegf gene, primarily in association with cancer susceptibility and severity. the c/t snp at position 936 of the 3′ untranslated region (utr) of the gene has been associated with higher vegf plasma levels in healthy subjects [45]. recently, the c936t snp in the vegf gene has been associated with ards susceptibility and severity (increased mortality) in subjects of european descent [46, 47].
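the odds ratio quoted earlier for homozygous carriers of the il-6 risk haplotype can be checked directly from the two carrier proportions given in the text (44.8% of ali cases, 22.9% of controls). the following is a minimal sketch of that arithmetic only; the function name is illustrative, and the confidence interval and p value cannot be reproduced here because the underlying genotype counts are not given.

```python
def odds_ratio(p_cases, p_controls):
    """Odds ratio from the carrier proportion in cases and in controls."""
    odds_cases = p_cases / (1.0 - p_cases)          # odds of carrying the haplotype among cases
    odds_controls = p_controls / (1.0 - p_controls)  # odds of carrying it among controls
    return odds_cases / odds_controls


# carrier proportions quoted in the text: 44.8% of ali cases, 22.9% of controls
il6_or = odds_ratio(0.448, 0.229)
print(round(il6_or, 2))
```

the result is about 2.73, in agreement with the reported value.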
the haplotype tct at positions c-460t, c+405g, and c+936t was significantly associated with a higher rate of mortality in ards patients and higher plasma levels of vegf [47]. these studies highlight the vegf gene as an attractive barrier-regulatory ali candidate gene and molecular target in ali therapeutic strategies. chemokine receptor 4 (cxcr4) is an a-chemokine receptor specific for stromal-derived factor 1 (sdf-1; also known as cxcl12) that plays an important role in cell migration, inflammation, b lymphocyte development, angiogenesis, and human immunodeficiency virus (hiv) infection (hiv coreceptor) [48] [49] [50]. chemokine receptors are g-protein-coupled receptors, which trigger diverse signaling cascades including activation of g proteins and the phosphatidylinositol 3-kinase, janus kinase/signal transducer and activator of transcription, rho-p160 rho kinase, and mitogen-activated protein kinase signaling pathways [51]. the activation of these signaling pathways is often accompanied by the internalization of chemokine receptors and their trafficking back to the plasma membrane. this intracellular turnover determines the leukocyte responsiveness to chemokines [52]. nonmuscle myosin ii a is a molecular motor that binds with the cytoplasmic tail of cxcr4 and ccr5 [52] and participates in the sdf-1-dependent endocytosis of cxcr4 via dynamic interaction with a-arrestin, a key component of the cxcr4 internalization pathway [50]. the cxcr4 gene was identified as a novel candidate gene in ali as it survived two filtering strategies dedicated to identifying ali-susceptibility genes associated with elevated levels of mechanical stress as observed in mechanical ventilator-associated lung injury (vali).
our orthologous gene approach determined ali-specific gene ontologies (coagulation, inflammation, chemotaxis/cell motility, and immune response) [38] involving recognized genes likely to participate in ali pathogenesis [il-6, aquaporin 1 (aqp-1), plasminogen activator inhibitor type i (pai-1)], as well as novel genes not previously known to be mechanistically involved in ali, including cxcr4 [38] (table 1). we subsequently utilized a consomic rodent approach with introgression of rat chromosomes 2, 13, 16, and 17, which contained the highest density of vali-responsive genes [39]. introgression of the vali-sensitive brown norway (bn) rat chromosome 13, containing several genes, including cxcr-4, into the vali-resistant dahl salt-sensitive (ss) rat resulted in conversion of the ss consomic rats to a vali-sensitive phenotype [39]. surface expression of cxcr4 is downregulated by interleukin-4, interleukin-13, and granulocyte-macrophage colony-stimulating factor and upregulated by interleukin-10 and transforming growth factor-b (tgfb) [53], suggesting that cxcr4 may also play a role in the fibrotic response to ali via tgfb signaling. polymorphisms in the cxcr4 gene have not yet been reported; however, a snp in the 3′ utr of the sdf-1 gene (g801a) is associated with susceptibility to aids and type 1 diabetes [54, 55]. we are currently exploring cxcr4 as a potential ali-associated candidate gene as suggested by the density of pubmatrix citations relating cxcr4 to inflammation (1,151 published papers), endothelium (297 published papers), ali (28 published papers) and endothelial permeability (eleven published papers). pubmatrix is a web-based tool that allows simple text-based mining of the ncbi literature search service pubmed using any two lists of keyword terms, resulting in a frequency matrix of term co-occurrence.
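the idea of a term co-occurrence frequency matrix can be illustrated with a short sketch. this is not pubmatrix itself and does not query pubmed: the function name, the toy abstracts, and the keyword lists below are hypothetical and invented purely for illustration. for every pair drawn from two keyword lists, the sketch counts how many documents mention both terms.

```python
from itertools import product


def cooccurrence_matrix(documents, row_terms, col_terms):
    """For each (row term, column term) pair, count documents mentioning both."""
    lowered = [doc.lower() for doc in documents]
    matrix = {}
    for row, col in product(row_terms, col_terms):
        # simple case-insensitive substring matching; a real tool would use
        # a literature search service rather than local text
        matrix[(row, col)] = sum(
            1 for text in lowered if row.lower() in text and col.lower() in text
        )
    return matrix


# hypothetical toy abstracts, not real publications
toy_abstracts = [
    "cxcr4 expression rises with inflammation in lung endothelium",
    "mif and inflammation in sepsis",
    "cxcr4 regulates endothelial permeability",
]
genes = ["cxcr4", "mif"]
topics = ["inflammation", "endothelium", "permeability"]

counts = cooccurrence_matrix(toy_abstracts, genes, topics)
for gene in genes:
    print(gene, [counts[(gene, topic)] for topic in topics])
```

each row of the printed matrix corresponds to a gene and each column to a topic, so a high count flags a gene-topic pair with a dense literature, which is how the citation counts for cxcr4 quoted above are used.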
the advent of high-throughput gene sequencing and expression technologies, and complete genome sequencing of model organisms, now provides the tools to perform large-scale analyses of the genome in complex disorders such as ali. whole genome scans, in silico approaches, utilization of consomic rats, and a candidate gene approach involving expression profiling and pathway analysis are proving exceptionally useful in identifying novel candidate genes and genetic variations (fig. 1). high-throughput whole genome scanning technology has recently emerged as a powerful tool, particularly in detecting disease-susceptibility genes with modest effects. the haplotype mapping project [56], which identified blocks of snps associated with each other, has allowed selection of the most informative snps for further disease association studies [57]. currently, the most commonly used high-throughput snp platforms involve assessment of over one million snps spanning the genome, i.e. genome-wide association studies (gwas). gwas platforms are effective and have been successfully used in diverse disorders such as age-related macular degeneration [58], inflammatory bowel disease [59], type 2 diabetes [60], and stroke [61]. although this approach has yet to be employed in either sepsis or ali, the application of gwas to the disease is clearly imminent. another method to identify ali candidate genes is an orthologue gene in silico approach. the basis of this approach is the hypothesis that patients with ali and preclinical animal models of ali would exhibit commonality in expression of evolutionarily conserved genes across species.
for example, profiling results from more than 50 affymetrix microarray chips obtained from ventilator-associated ali models (human, rat, mouse, canine) identified 3,077 genes whose expression was altered across all four species in response to ventilator-associated mechanical stress [37, 38]. filtering these results for a unidirectional change in gene expression with greater than 1.3-fold change in expression refined the list to 69 genes, reflecting specific ali-associated gene module/ontology categories: coagulation, inflammation, chemotaxis/cell motility, and immune response. this approach identified multiple genes already recognized as ali genes (such as il-6, aqp-1, and pai-1), but also identified several novel genes that were not previously known to be mechanistically involved in ali [38].
fig. 1 high-throughput gene sequencing and expression technologies, and complete genome sequencing of model organisms, now provide the tools to perform large-scale analyses of the genome in complex disorders such as ali. genome-wide association study (gwas) platforms are effective and have been successfully used in diverse disorders, but although this approach has yet to be employed in either sepsis or ali, the application of gwas to the disease is clearly imminent. the differential gene expression between lung apex/base regions as well as between gravitationally dependent/nondependent regions of the lung base in a canine model of ventilator-associated lung injury (vali) identified ali-implicated lung genes in response to local mechanical stress within the lung. this approach identified the already established ali gene macrophage migration inhibitory factor and novel genes such as growth arrest dna damage inducible (gadd45) and pre-b cell colony enhancing factor (pbef). our multispecies orthologous gene approach in human (endothelial cells), rat, mouse, and canine models of vali exhibits expression of common ali-implicated evolutionarily conserved genes (orthologues) across the species. the genes with a unidirectional 1.3-fold change (p > 0.05) are found to reside in high density on rat chromosomes 13 and 16, the chromosomal loci used to develop the consomic rodent model. together, these approaches identified novel ali genes such as pbef, chemokine receptor (cxcr-4), and gadd45. interrogating the prospective pathways involved in endothelial permeability and correlation with these differentially expressed genes in vali models identified the most putative ali genes such as myosin light chain kinase (mylk), sphingosine 1-phosphate receptor 1, cmet, and vascular endothelial growth factor (vegf).
complementing the in silico approach described above, a consomic rat approach can also be utilized to identify novel ali gene candidates. in an experimental study, two strains of inbred rodents were determined to have differing susceptibility to vali (20 ml/kg, 4 h): vali-sensitive bn rats and the vali-resistant dahl ss rats. using microarray analysis and a bioinformatic-intense candidate gene approach, we identified 245 differentially expressed potential vali genes with ontologies such as transcription, chemotaxis, and inflammation. because chromosomes 2, 13, 16, and 17 were found to contain the highest number of vali-response genes, consomic ss rats containing substituted bn chromosome 13 were exposed to vali mechanical stress, resulting in conversion of the resistant ss rat to vali sensitivity [39]. extensive expression profiling across preclinical ali models can extend the identification of ali gene candidates to determination of allelic frequencies of gene polymorphisms (snps) that may confer ali risk or severity.
this "candidate gene approach" has identified several candidates with hypothesized significant mechanistic roles in lung injury, inflammation, or repair in the setting of ali and vali [62] . further, given the availability of sophisticated bioinformatic methods and increasing knowledge of the molecular and cellular mechanisms of lung injury, candidate genes can also be identified via analysis of cellular pathways involved in ali pathogenesis [63, 67] . the application of the novel techniques described in the previous section is proving to be exceptionally useful in identifying novel candidate genes and genetic variations in the study of the pathobiology of ali. these novel gene and biomarkers are discussed in this section. myosin light chain kinase (mlck) is an enzyme that phosphorylates regulatory myosin light chains, which allows myosin cross-bridging interactions with f-actin. in endothelial cells, the contraction of the actomyosin complex generates a stronger centripetal force that overcomes the force keeping the adjacent endothelial cell tethered, leading to endothelial retraction, decreased intercellular adhesion, and increased vascular permeability [68, 69] . this phenomenon is physiologically relevant as evidenced by nonmuscle mlck (nmmlck) isoform knockout mice [which retain the smooth muscle mlck (smmlck) isoform] that are less susceptible to lipopolysaccharide (lps)-and ventilator-induced ali [70, 71] . further, treatment with a mlck inhibitor prior to lps exposure in the wild-type mice attenuates endothelial cell barrier dysfunction and inflammation [70] . thus, the myosin light chain kinase gene (mylk), which encodes for mlck, is an excellent ali candidate gene. 
since initial cloning of the highly expressed nmmlck in endothelium in our laboratory [72], we have identified substantial roles of nmmlck in cytoskeleton rearrangement of endothelial cells regulating vascular barrier function [64, 68], angiogenesis, and leukocyte diapedesis [73], consistent with a potential mechanistic role for mlck in the genesis of ali. the human mylk gene is located on chromosome 3q21 and encodes three proteins, including nmmlck, smmlck, and telokin. we sequenced exons, exon-intron boundaries, and 2 kb of the 5′ utr of mylk in healthy individuals, patients with sepsis alone, and patients with sepsis-associated ali, all of european and african-american descent [66], and identified 51 snps (ten exonic, 31 intronic, nine in the 5′ utr, and one in the noncoding exon 1), of which 28 were chosen for further linkage disequilibrium studies. five of the ten coding mylk snps confer an amino acid change (pro21his, pro147ser, val261ala, ser1341pro, and arg1450gln) in mlck. subsequently, association analysis of both single snps and haplotypes demonstrated very strong associations in both ethnic groups [66]. in european americans, the rs3845915a/mylk_037c haplotype was associated with more than a fivefold increase in the risk of developing ali and sepsis. in contrast, the haplotype mylk_021g/mylk_022g/mylk_011t conferred specific risk for ali but not sepsis [66]. the 5′ haplotype of the mylk gene also conferred ali-specific risk in both european- and african-descent subjects; however, the 3′ region haplotype was associated with ali only in african-descent subjects. in african-americans, the haplotype hcv1602689c/mylk_037a/rs11707609g is substantially more prevalent in ali (11%) as compared with sepsis (1%). this cag haplotype is not found in european americans, suggesting a potential genetic contribution to the observed ethnicity-specific differences in ali/ards prevalence and susceptibility [4].
we noted similar findings in association studies involving a cohort with trauma-induced ali [74]. we have also evaluated the association of 17 mylk genetic variants with severe asthma in both european american and african-american populations and identified a snp highly associated with severe asthma in african-americans [75], consistent with data linking this chromosomal locus (mylk, 3q21.1) to asthma and asthma-related phenotypes [75]. taken together, these data strongly implicate mylk genetic variants as risk variants in inflammatory lung disorders, such as ali and asthma. macrophage migration inhibitory factor (mif) is an ali candidate gene and recognized biomarker, initially discovered as a soluble product of activated t cells and named for its role in inhibiting random macrophage migration [76]. mif is a proinflammatory cytokine which binds to cd44 and cd74 and is produced by many cell types, including monocytes/macrophages, pituitary cells, vascular endothelium, and respiratory epithelium [77, 78]. mif may serve as a delicate regulator of the cytokine balance between immunity and inflammation as mif counterregulates the immunosuppressive effects of glucocorticoids [79]. the role of mif as an endogenous prosurvival factor has been demonstrated in vitro. lps-mediated induction of flice-like inhibitory protein (flip) by mif confers resistance to lps-mediated endothelial cell death [80]. suppression of mif by rna interference induces cell death and sensitivity to apoptotic stimuli [80]. in addition, mif interacts with the multidimensional nmmlck [81] isoform which regulates tnf-mediated apoptosis in addition to its potent effects on endothelial cell barrier dysfunction as discussed earlier [68, 69].
together, these findings implicate the role of mif in regulation of nonmuscle cytoskeletal dynamics and vascular pathophysiology, which is evident from the enhanced mif levels in the serum, bal fluid, and alveolar endothelium of patients with ards as compared with other critically ill patients [76, 78, 82] . we found significant increases in mif transcript and protein levels in murine and canine models of ventilator-induced lung injury (vili) (using high mechanical ventilation and endotoxin exposure, respectively) [82] and in human lung endothelium cells exposed to 48 h of cyclic stretch [83] . mif deficiency or immunoneutralization appears to protect mice or rats from fatal endotoxic shock or other inflammatory diseases [84] although these results are not without controversy [85] and our own studies in 8-12-week-old mice failed to demonstrate a vili/ali-related phenotype which was different from controls (data not shown). mif also upregulates the expression of aqp-1, the water channels expressed in alveolar endothelial and epithelial cells, and a candidate gene we identified in models of vili-associated mechanical stress [38] . mif may serve to modulate fluid movement into alveolar spaces, a cardinal feature of ali [86] . to extend the likelihood that mif serves as a putative candidate gene in ali and sepsis, we studied the association of eight mif polymorphisms, including the most studied mif promoter g/c snp at position -173, in a sepsis-induced ali cohort (n = 506) of african-and european-descent cases [82] . no individual snp showed a significant association with either ali or sepsis; however, the carriers of the cc genotype (rs755622) and the carriers of the tt genotype (rs2070767) showed more than twofold increased risk of developing sepsis and ali, respectively. this association was lost, however, after age and gender adjustment in a logistic regression model. 
in contrast, mif haplotypes at the 3′ region of the gene display strong association with ali and sepsis, conferring both protection as well as susceptibility to ali, in european and african populations [82]. furthermore, the haplotype at the 5′ promoter region of the gene involving a short tandem repeat at position -794 (catt)5 and the -173 g allele shows significant association with both ali and trauma [82]; however, no association was found between promoter region haplotypes and mif levels. rheumatoid arthritis patients with the -173c allele have higher levels of mif in the serum and synovial fluid than the carriers of the g allele and have a higher probability of developing idiopathic arthritis [87]. thus, given these diverse mif functions, mif remains an attractive target in inflammatory diseases including those of the lung. the bioactive sphingolipid metabolite sphingosine 1-phosphate (s1p) is an important lipid mediator that enhances endothelial cell barrier function in vivo and in vitro by ligating s1p receptor 1 (s1p1), which is encoded by an endothelial differentiation gene (edg1 or s1p1) [88, 89]. s1p1 is a pertussis-toxin-sensitive, gi-coupled receptor which induces rac gtpase-dependent substantial increases in cortical actin polymerization critical to endothelial cell barrier enhancement [88, 90]. s1p1 activation enhances the organization and redistribution of vascular endothelial cadherin and b-catenin in junctional complexes in endothelium by phosphorylation of cadherin as well as p120-catenin and inducing the formation of cadherin/catenin/actin complexes [91]. understanding the role of s1p in enhancing endothelial cell barrier function underscores its importance as a therapeutic target in reversing loss of endothelial cell barrier integrity.
in vivo administration of selective s1p1 competitive antagonists induces a dose-dependent disruption of barrier integrity in pulmonary endothelium [92, 93], whereas s1p1 agonists, sew2871 and fty720, promote vascular endothelial barrier function [94] [95] [96]. a compelling argument for s1p1 as an attractive ali candidate gene is not only its ability to transduce signals which restore barrier integrity but also that s1p1 is the target for transactivation by receptors for other potent barrier-protective agonists. these include epcr (receptor for activated protein c) [65], c-met [receptor for hepatocyte growth factor (hgf)] [97], cd44 (receptor for high molecular weight hyaluronan) [67], and the atp receptor [98]. we recently resequenced the s1p1 gene (14 african-americans and 13 european americans) to search for common variations in the edg1 gene and identified 39 snps in the edg1 gene, with several promoter snps associated with asthma, another inflammatory lung syndrome [99]. the role of hgf and its tyrosine kinase receptor c-met has been investigated in lung development, inflammation, and repair [100] as well as in neoplastic processes such as cellular transformation, neoplastic invasion, and metastasis [101, 102]. snps causing underexpression of c-met have been associated with autism and c-met snps/mutations appear to be linked to lung cancer disparities in different ethnic groups. these include an n375s mutation in the hgf-binding domain of c-met, an r988c snp/mutation in the juxtamembrane domain, and an activating m1268t mutation in the tyrosine kinase domain (exon 19), all linked to development of solid tumors such as lung cancer, renal cancer, gastric cancer, and hepatocellular carcinoma [102]. hgf influences morphogenesis in epithelial cells from a variety of organs, including lungs, where hgf antisense oligonucleotides block alveolar and branching morphogenesis [103].
hgf expression and activity increase after 3-6 h of lung injury with intratracheally administered hydrochloric acid, suggesting that hgf plays a role in reparative responses to lung injury [104] . c-met expression on type ii pneumocytes is likely involved in increased type ii pneumocyte proliferation and restoration of an intact alveolar epithelium [105] . c-met is composed of a 50-kda extracellular a subunit and a 145-kda transmembrane b subunit [106] which contains tyrosine kinase domains, tyrosine phosphorylation sites, and tyrosine docking sites [107] . we demonstrated that hgf-mediated c-met phosphorylation and c-met recruitment to caveolin-enriched microdomains (cems) protects against the lps-induced pulmonary vascular hyperpermeability that is regulated by high molecular weight hyaluronan (cd44 ligand) [108] . our novel findings indicate that hgf/c-met-mediated, cd44-regulated cem signaling promotes tiam1 (a rac1 exchange factor)/dynamin 2 dependent rac1 activation, and peripheral recruitment of cortactin (an actin cytoskeletal regulator), processes essential for endothelial cell barrier integrity. understanding the mechanism(s) by which hgf/c-met promotes increased endothelial cell barrier function may lead to novel treatments for diseases involving vascular barrier disruption, including inflammation, tumor angiogenesis, atherosclerosis, and ali. however, on the contrary, the higher mortality rate in ali patients with increased levels of hgf in bal fluids [109] and in pulmonary edematous fluids [110] indicates severer injury and inflammation in response to increased hgf levels. it has now become increasingly clear that hgf plays an important role in normal and injured lung and may have a therapeutic potential in lung diseases. 
pre-b cell colony enhancing factor (pbef) was first identified by samal and colleagues in 1994 as a protein secreted by activated lymphocytes in bone marrow stromal cells that stimulates early stage b cell formation in conjunction with stem cell factor and interleukin-7. a large body of work has now highlighted the power of a systems biology approach in the search for novel disease-susceptibility genes and potentially novel biomarkers, with pbef serving as an excellent example of this approach (fig. 2). we first identified marked upregulation of pbef via microarray analyses of murine and canine models of vili/ali with increased gene/protein expression in bal fluid and serum samples from critically ill icu patients with ali and sepsis [111]. with only a total of eight papers in pubmed at that time, we next directly sequenced the pbef gene in 36 subjects with ali, sepsis, and healthy controls and conducted a pbef snp-based association study in ali subjects of european and african-american descent [111]. we identified 11 snps in the pbef gene with two promoter snps, t-1001g and c-1543t, associated with ali and sepsis. genotyping of pbef c-1543t and t-1001g snps revealed significant associations with sepsis and ali, with the strongest association found with the -1543c/-1001g haplotype. univariate analysis found carriers of the g allele (t-1001g) to have 2.75-fold higher risk of developing ali as compared with controls (p = 0.002) [111]. these results were subsequently confirmed in a comparable but distinct replicate ali population [112]. interestingly, the -1543g/-1001c haplotype was also associated with increased icu patient mortality, whereas the -1543t/-1001t haplotype was associated with fewer ventilator days and decreased icu patient mortality [112]. a key challenge in genomic explorations is the ability to confirm the contribution of a snp to a dysfunctional-gene-involved disease process.
additional reports have highlighted the capacity for the pbef gene to have an influence far beyond any b-cell regulatory function, with a key role in regulating vascular permeability [113] as well as inhibiting neutrophil apoptosis [114] . to further explore mechanistic participation of pbef in ali and vili, we focused on the contribution of pbef to endothelial function. our prior immunohistochemical staining of canine-injured lung tissues localized pbef expression to vascular endothelial cells, in addition to infiltrating neutrophils and type 2 alveolar epithelial cells [111] . our in vitro studies showed that expression of pbef in pulmonary artery endothelial cells increases thrombin-mediated vascular permeability [111] , suggesting that enhanced pbef expression may mediate the early increase in vascular permeability that is characteristic of ali. neutrophils harvested from the circulation of septic and ali patients show marked inhibition of the apoptotic process in association with evidence of enhanced respiratory burst capacity [115, 116] , with both activities largely restored with administration of pbef antisense oligonucleotides. our initial in vitro studies further demonstrated recombinant human pbef (rhpbef) as a direct rat neutrophil chemotactic factor, with in vivo studies demonstrating marked increases in bal fluid leukocytes (polymorphonuclear leukocytes, pmns) following intratracheal injection in c57bl/6 j mice [117] . these changes were accompanied by increased bal fluid levels of pmn chemoattractants (kc and mip2) and modest increases in lung vascular and alveolar permeability. we also noted synergism between rhpbef challenge and a model of limited vili and observed dramatic increases in bal fluid pmns, bal protein, and cytokine levels (il-6, tnfa, kc) compared with either challenge alone. gene expression profiling identified induction of ali-and vili-associated gene modules (nf-kb, leukocyte extravasation, apoptosis, toll-receptor pathways). 
heterozygous pbef +/− mice were significantly protected (reduced bal fluid protein levels, bal fluid il-6 levels, peak inspiratory pressures) when exposed to a model of severe vili (4 h, 40 ml/kg tidal volume) and exhibited significantly reduced gene expression of vili-associated modules. finally, strategies to reduce pbef availability (neutralizing antibody) resulted in significant protection from vili [117]. pbef is now recognized as associated with modestly increased risk of type 2 diabetes and elevated levels of acute-phase proteins [118] and a c-948g snp has been associated with an increased diastolic blood pressure in obese children [119]. these studies implicate pbef, now associated with a number of inflammatory disorders such as inflammatory bowel disease, multiple sclerosis, cystic fibrosis, and asthma [120] [121] [122], as a key inflammatory mediator intimately involved in both the development and the severity of ventilator-induced ali. growth arrest dna damage inducible a (gadd45a), a member of an evolutionarily conserved gene family, is implicated as a stress sensor that modulates the response of mammalian cells to genotoxic or physiological stress [123, 124]. gadd45a is a small 21-kda predominantly nuclear protein that interacts with other proteins implicated in stress responses, including proliferating cell nuclear antigen, p21, cdc2/cyclin b1, mekk4, and p38 kinase [125, 126]. gadd45 induces cell cycle arrest and apoptosis in most cells as well as promoting dna repair functions and survival [126]. growth arrest and dna damage gene (gadd45) also maintains genomic stability in a p53-responsive manner [127]. despite the multiple known functions of gadd45, its role in ali, endothelial/epithelial barrier dysfunction, or repair of injured lung is unknown [38].
gadd45 exhibited differential expression in orthologous global gene expression profiling in multispecies ali models [38] and in region-specific lung tissue expression profiling [62], and was markedly upregulated in response to vili [128]. we explored the mechanistic involvement of gadd45a in endotoxin (lps)- and ventilator-induced inflammatory lung injury (vili) by comparing multiple biochemical and genomic parameters of inflammatory lung injury in wild-type c57bl/6 and gadd45a −/− knockout mice exposed to high tidal volume ventilation (vili) or intratracheally administered lps [129]. gadd45a −/− mice were modestly susceptible to lps-induced injury but were profoundly susceptible to vili, demonstrating increased inflammation and increased microvascular permeability. vili-exposed gadd45a −/− mice manifested striking neutrophilic alveolitis with increased bal fluid levels of protein, igg, and inflammatory cytokines. expression profiling of lung homogenates revealed strong dysregulation in the b cell receptor signaling pathway in gadd45a −/− mice, suggesting the involvement of phosphatidylinositol 3-kinase/akt signaling components. western blots confirmed a threefold reduction in akt protein and phosphorylated akt levels observed in gadd45a −/− lungs. electrical resistance measurements across human lung endothelial cell monolayers transfected with small interfering rnas to reduce gadd45a or akt expression revealed significant potentiation of lps-induced endothelial barrier dysfunction which was attenuated by overexpression of a constitutively active akt1 transgene. whereas other lung injury studies failed to demonstrate a role for gadd45 in hyperoxic lung injury [130, 131], our studies validate gadd45a as a novel inflammatory lung injury candidate gene and a significant participant in vascular barrier regulation via effects on akt-mediated endothelial signaling [132]. thus, both akt and gadd45 are extremely attractive ali candidate genes.
the human gadd45a gene contains 25 validated snps (national center for biotechnology information snp database) whose role in ali pathogenesis is completely unknown [123]. we are currently pursuing further characterization of the role of gadd45a and the association of its genetic variants with sepsis and ali. the identification of novel pathways involved in the pathobiology of ali also opens doors for the exploration of new therapeutic targets for the disease. as such, the use of agents that attenuate the endothelial barrier dysfunction and the inflammatory response characteristic of ali has shown promise in preclinical studies, which will hopefully lead to their use in trials of human ali (fig. 3). s1p, an important lipid mediator generated by the phosphorylation of sphingosine by sphingosine kinase, decreases endothelial permeability to both water and solute via cytoskeletal reorganization and adherens junction assembly [88, 89]. s1p-induced barrier protective effects could serve to attenuate the increased pulmonary vascular permeability that is an essential factor in the development of ali. the s1p analogue fty720 (0.1 mg/kg), when administered to c57bl/6 mice with endotoxin-induced lung injury, decreases lung edema formation, solute transport across the alveolar capillary endothelium, and inflammatory cell infiltration into lung parenchyma [94]. similarly, the prophylactic administration of s1p attenuates both alveolar and vascular barrier dysfunction while significantly reducing shunt formation associated with lung injury in rodent and canine models of ali induced by combined intrabronchial endotoxin administration and high-tidal-volume mechanical ventilation [133].
fig. 3 mechanism-based novel therapies for ali. the identification of novel pathways involved in the pathobiology of ali also facilitates the exploration of new therapeutic targets. sphingosine 1-phosphate (s1p1) attenuates the endothelial barrier dysfunction associated with ali, whereas blocking of pbef attenuates vali
in a recent study of a canine model of ali, we demonstrated that when bacterial endotoxin was instilled intratracheally followed in 1 h by intravenous administration of s1p (85 mg/kg) or vehicle and 8 h of high-tidal-volume mechanical ventilation [134], s1p treatment attenuated the severity of ali-induced increases in shunt fraction and the presence of both protein and neutrophils in bal fluid compared with vehicle controls. interestingly, bal fluid cytokine production was not altered significantly by intravenous administration of s1p, and s1p potentiated the endotoxin-induced systemic production of the inflammatory cytokines tnfa, c-x-c chemokine ligand-1, and il-6, without resulting in end-organ dysfunction. these data suggest that s1p may represent a viable therapy for the prevention and treatment of ali. as previously described in this chapter, pbef appears to play a central role in the promotion of several pathogenetic aspects of ali and vali. therefore, interventions aimed at attenuating the effects of pbef could have a potential therapeutic effect in these disorders. to begin to address the potential for pbef to serve as a therapeutic target in ameliorating vili, we assessed the effect of pbef neutralizing antibody on rhpbef-stimulated lung inflammation [117]. simultaneous instillation of rhpbef and pbef neutralizing antibody produced dramatic reductions in rhpbef-induced neutrophil recruitment.
further, the intratracheal delivery of pbef neutralizing antibody (30 min before high-tidal-volume mechanical ventilation) abolished vili-induced increases in total bal fluid cell counts and significantly decreased neutrophil influx into the alveolar space as well as vili-mediated increases in the level of lung tissue albumin. ali is a major cause of morbidity and mortality in critically ill patients. given the unacceptably high mortality rate observed in ali and the paucity of novel therapies and biomarkers, it is essential to recognize molecular targets associated with ali to identify individuals at risk and to develop novel therapeutic targets and biomarkers. it is clear that derangements in endothelial cell barrier regulation play a major role in the pathobiology of ali and genetic variants regulate endothelial cell barrier function, thereby determining ali risk or subsequent severity of outcome. high-throughput gene sequencing and expression technologies, and complete genome sequencing of model organisms, have allowed for the performance of large-scale analyses of the genome in ali. in this chapter, we have highlighted how global gene expression profiling in multispecies ali models served to broaden our net knowledge of ali-implicated genes and provide a basis for hope that increased insights and therapies may be forthcoming. as genotyping becomes more rapid and easily accessed, combining advanced bioinformatics techniques with high-throughput methods will be the future practice of personalizing treatment strategies. continued challenges will be the gene-gene and gene-environment interactions, which add complexity to our understanding of the genome. these novel genetic approaches may prove exceptionally useful in ushering in the era of personalized medicine for critically ill individuals. 
the acute respiratory distress syndrome genomics of acute lung injury clinical predictors of the adult respiratory distress syndrome race and gender differences in acute respiratory distress syndrome deaths in the united states: an analysis of multiple-cause mortality data (1979-1996) ozone-induced acute pulmonary injury in inbred mouse strains making genomics functional: deciphering the genetics of acute lung injury wading into the genomic pool to unravel acute lung injury genetics susceptibility to neoplastic and non-neoplastic pulmonary diseases in mice: genetic similarities fate of angiotensin i in the circulation angiotensin ii induces apoptosis of human endothelial cells. protective effect of nitric oxide haemodynamic and endocrine effects of type 1 angiotensin ii receptor blockade in patients with hypoxaemic cor pulmonale angiotensin converting enzyme insertion/deletion polymorphism is associated with susceptibility and outcome in acute respiratory distress syndrome angiotensin ii, via at1 and at2 receptors and nf-kb pathway, regulates the inflammatory response in unilateral ureteral obstruction inflammation and angiotensin ii angiotensin ii and the fibroproliferative response to acute lung injury angiotensin ii induces apoptosis in human and rat alveolar epithelial cells abrogation of bleomycin-induced epithelial apoptosis and lung fibrosis by captopril or by a caspase inhibitor angiotensin-converting enzyme 2 protects from severe acute lung failure an insertion/deletion polymorphism in the angiotensin i-converting enzyme gene accounting for half the variance of serum enzyme levels evidence, from combined segregation and linkage analysis, that a variant of the angiotensin i-converting enzyme (ace) gene controls plasma ace levels alhenc-gelas f (1993) angiotensin i-converting enzyme in human circulating mononuclear cells: genetic polymorphism of expression in t-lymphocytes genetic polymorphisms associated with susceptibility and outcome in ards polymorphism 
of the angiotensin-converting enzyme gene affects the outcome of acute respiratory distress syndrome differences in frequency of the deletion polymorphism of the angiotensinconverting enzyme gene in different ethnic groups angiotensin-converting enzyme gene (ace) insertion/deletion polymorphism in mexican populations elevated plasma levels of soluble tnf receptors are associated with morbidity and mortality in patients with acute lung injury high berylliumstimulated tnf-a is associated with the -308 tnf-a promoter polymorphism and with clinical severity in chronic beryllium disease tumor necrosis factor gene polymorphism and septic shock in surgical infection 308ga and tnfb polymorphisms in acute respiratory distress syndrome influence of tnfa gene polymorphisms on tnfa production and disease single nucleotide polymorphism in the tumor necrosis factor-alpha gene affects inflammatory bowel diseases risk il6 gene-wide haplotype is associated with susceptibility to acute lung injury cytokine balance in the lungs of patients with acute respiratory distress syndrome a prospective study of inflammation markers in patients at risk of indirect acute lung injury persistent elevation of inflammatory cytokines predicts a poor outcome in ards. 
plasma il-1 beta and il-6 levels are consistent and efficient predictors of outcome over time pressure-time curve predicts minimally injurious ventilatory strategy in an isolated rat lung model science review: searching for gene candidates in acute lung injury orthologous gene-expression profiling in multispecies models: search for candidate genes use of consomic rats for genomic insights into ventilator-associated lung injury the association of interleukin 6 haplotype clades with mortality in critically ill adults vascular endothelial growth factor and related molecules in acute lung injury lung overexpression of the vascular endothelial growth factor gene induces pulmonary edema vascular endothelial growth factor may contribute to increased vascular permeability in acute respiratory distress syndrome decreased vegf concentration in lung tissue and vascular injury during ards a common 936 c/t mutation in the gene for vascular endothelial growth factor is associated with vascular endothelial growth factor plasma levels vascular endothelial growth factor gene polymorphism and acute respiratory distress syndrome genotypes and haplotypes of the vegf gene are associated with higher mortality and lower vegf plasma levels in patients with ards chemokine receptors as hiv-1 coreceptors: roles in viral entry, tropism, and disease involvement of chemokine receptors in breast cancer metastasis myosin iia is involved in the endocytosis of cxcr4 induced by sdf-1a the a-chemokine, stromal cell-derived factor-1a, binds to the transmembrane g-protein-coupled cxcr-4 receptor and activates multiple signal transduction pathways rab11-family interacting protein 2 and myosin vb are required for cxcr2 recycling and receptor-mediated chemotaxis role of tyrosine phosphorylation in ligand-independent sequestration of cxcr4 in human primary monocytes-macrophages a common stromal cell-derived factor-1 chemokine gene variant is associated with the early onset of type 1 diabetes genetic 
restriction of aids pathogenesis by an sdf-1 chemokine gene variant. alive study, hemophilia growth and development study (hgds) high-resolution haplotype structure in the human genome complement factor h polymorphism in age-related macular degeneration a genome-wide association study identifies il23r as an inflammatory bowel disease gene a genome-wide association study identifies novel risk loci for type 2 diabetes genomewide association studies of stroke microarray analysis of regional cellular responses to local mechanical stress in acute lung injury novel interaction of cortactin with endothelial cell myosin light chain kinase cytoskeletal regulation of pulmonary vascular permeability activated protein c mediates novel lung endothelial barrier enhancement: role of sphingosine 1-phosphate receptor transactivation novel polymorphisms in the myosin light chain kinase gene confer risk for acute lung injury transactivation of sphingosine 1-phosphate receptors is essential for vascular barrier regulation. 
novel role for hyaluronan and cd44 receptor family regulation of endothelial cell gap formation and barrier dysfunction: role of myosin light chain phosphorylation regulation of endothelial cell gap formation and paracellular permeability protein kinase involved in lung injury susceptibility: evidence from enzyme isoform genetic knockout and in vivo inhibitor treatment critical role of non-muscle mlck in ventilator-induced lung injury myosin light chain kinase in endothelium: molecular cloning and regulation adherent neutrophils activate endothelial myosin light chain kinase: role in transendothelial migration variation in the mylk gene is associated with development of acute lung injury after major trauma a variant of the myosin light chain kinase gene is associated with severe asthma in african americans macrophage migration inhibitory factor: a regulator of glucocorticoid activity with a critical role in inflammatory disease macrophage migration inhibitory factor role for macrophage migration inhibitory factor in acute respiratory distress syndrome mif as a glucocorticoid-induced modulator of cytokine production macrophage migration inhibitory factor governs endothelial cell sensitivity to lps-induced apoptosis intracellular interaction of myosin light chain kinase with macrophage migration inhibition factor (mif) in endothelium macrophage migration inhibitory factor in acute lung injury: expression, biomarker, and associations role of macrophage migration inhibitory factor (mif) in human and animal models of acute lung injury (ali) and sepsis: association of a promoter polymorphism and increased gene expression a novel dna vaccine-targeting macrophage migration inhibitory factor improves the survival of mice with sepsis role of macrophage migration inhibitory factor (mif) in allergic and endotoxininduced airway inflammation in mice aquaporin-1: a candidate gene in sepsis and lung injury a novel 5¢-flanking region polymorphism of macrophage migration inhibitory 
factor is associated with systemic-onset juvenile idiopathic arthritis sphingosine 1-phosphate promotes endothelial cell barrier integrity by edg-dependent cytoskeletal rearrangement regulation of sphingosine 1-phosphate-induced endothelial cytoskeletal rearrangement and barrier enhancement by s1p1 receptor, pi3 kinase, tiam1/rac1, and alpha-actinin regulation of the micromechanical properties of pulmonary endothelium by s1p and thrombin: role of cortactin protective effects of high-molecular weight polyethylene glycol (peg) in human lung endothelial cell barrier regulation: role of actin cytoskeletal rearrangement synthesis and biological evaluation of g-aminophosphonates as potent, subtypeselective sphingosine 1-phosphate receptor agonists and antagonists enhancement of capillary leakage and restoration of lymphocyte egress by a chiral s1p1 antagonist in vivo protective effects of sphingosine 1-phosphate in murine endotoxin-induced inflammatory lung injury tipping the gatekeeper: s1p regulation of endothelial barrier function effect of s1p1 receptor agonists on murine lung airway function differential regulation of sphingosine-1-phosphate-and vegf-induced endothelial cell chemotaxis. 
involvement of g ia2 -linked rho kinase activity endothelial cell barrier enhancement by atp is mediated by the small gtpase rac and cortactin sphingosine-1-phosphate receptor 1 variant increases promoter activity and decreases susceptibility to human asthma hepatocyte growth factor/scatter factor induces a variety of tissue-specific morphogenic programs in epithelial cells the biological role of hgf-met axis in tumor growth and development of metastasis ) c-met mutational analysis in small cell lung cancer: novel juxtamembrane domain mutations regulating cytoskeletal functions involvement of hepatocyte growth factor in formation of bronchoalveolar structures in embryonic rat lung in primary culture hepatocyte growth factor may act as a pulmotrophic factor on lung regeneration after acute lung injury intratracheal administration of hepatocyte growth factor/scatter factor stimulates rat alveolar type ii cell proliferation in vivo met receptor dynamics and signalling c-met signalling: spatio-temporal decisions cd44 regulates hepatocyte growth factor-mediated vascular integrity. role of c-met, tiam1/rac1, dynamin 2, and cortactin keratinocyte growth factor and hepatocyte growth factor in bronchoalveolar lavage fluid in acute respiratory distress syndrome patients hepatocyte growth factor and keratinocyte growth factor in the pulmonary edema fluid of patients with acute lung injury. 
biologic and clinical significance pre-b-cell colonyenhancing factor as a potential novel biomarker in acute lung injury pre-b-cell colony-enhancing factor gene polymorphisms and risk of acute respiratory distress syndrome pre-b-cell-colonyenhancing factor is critically involved in thrombin-induced lung endothelial cell barrier dysregulation pre-b cell colony-enhancing factor inhibits neutrophil apoptosis in experimental inflammation and clinical sepsis dysregulated expression of neutrophil apoptosis in the systemic inflammatory response syndrome delayed neutrophil apoptosis in sepsis is associated with maintenance of mitochondrial transmembrane potential and reduced caspase-9 activity essential role of pre-b-cell colony enhancing factor in ventilator-induced lung injury a visfatin promoter polymorphism is associated with low-grade inflammation and type 2 diabetes effects of genetic variation in the visfatin gene (pbef1) on obesity, glucose metabolism, and blood pressure in children genome-wide search for atopy susceptibility genes in dutch families with asthma two stage genome-wide search in inflammatory bowel disease provides evidence for susceptibility loci on chromosomes 3, 7 and 12 chromosome 7q21-22 and multiple sclerosis: evidence for a genetic susceptibility effect in vicinity to the protachykinin-1 gene egad, more forms of gene regulation: the gadd45a story gadd45 in the response of hematopoietic cells to genotoxic stress liebermann da genotoxic-stress-response genes and growth-arrest genes gadd, myd, and other genes induced by treatments eliciting growth arrest myeloid differentiation (myd)/growth arrest dna damage (gadd) genes in tumor suppression, immunity and inflammation genomic instability in gadd45a −/− cells is coupled with s-phase checkpoint defects inhaled carbon monoxide confers antiinflammatory effects against ventilatorinduced lung injury gadd45a is a novel candidate gene in inflammatory lung injury via influences on akt signaling -independent 
induction of gadd45 and gadd153 in mouse lungs exposed to hyperoxia loss of gadd45a does not modify the pulmonary response to oxidative stress modulation of lipopolysaccharide-induced gene transcription and promotion of lung injury by mechanical ventilation sphingosine 1-phosphate reduces vascular leak in murine and canine models of acute lung injury sphingosine 1-phosphate rescues canine lps-induced acute lung injury and alters systemic inflammatory cytokine production in vivo
key: cord-102729-b1q7gbd6 authors: mickael, alexandra; klimovich, pavel; henckel, patrick; kubick, norwin; mickael, michel e title: asip (agouti-signaling protein) aggression gene regulate auditory processing genes in mice date: 2020-06-12 journal: biorxiv doi: 10.1101/2020.06.10.141325 sha: doc_id: 102729 cord_uid: b1q7gbd6 the covid-19 lockdown strategy has affected the lives of millions. the strict measures taken to contain the epidemic have exposed many households to inner tensions. domestic violence has been reported to increase during the lockdown. however, the reasons for this phenomenon have not been thoroughly investigated. the contribution of the melanocortin gpcr family to aggression is well documented. the asip (nonagouti) gene plays a vital role in regulating the function of the melanocortin gpcr family, and it is responsible for regulating aggression in mice. we conducted a selection analysis of asip. we found that it has been under negative (purifying) selection from sharks to humans. in order to better assess the effect of this gene in mammals, we performed rna-seq analysis of a crispr/cas9 asip knockout (ko) mouse model. we found that asip ko in mice upregulates several genes controlling auditory function, including phox2b, mpk13, fat2, neurod2, slc18a3, gon4l, gbx2, slc6a3 (dat1), aldh1a7, tyrp1 and lbx1. interestingly, we found that slc6a5 and lamp5, as well as il33, which are associated with startle disease, are also upregulated in response to knocking out asip.
these findings are indicative of a direct autoimmune effect between aggression-associated genes and startle disease. furthermore, in order to validate the link between aggression and the processing of auditory inputs, we conducted psychological tests of persons who experienced lockdown. we found that aggression has risen by 16% during the lockdown. furthermore, 3% of the subjects interviewed reported a change in their hearing abilities. our data shed light on the importance of the auditory input in aggression and open new perspectives for interpreting how hearing and aggression interact at the molecular neural circuit level. the link between social lockdown and aggression is well established. aggression can be categorized as a beneficial evolutionary trait, as it might favor survival when individuals are competing for resources. conversely, aggressiveness might also impede social cohesion. social lockdown can lead to psychological problems, including anxiety, depression, and antisocial and violence-related behaviors. for example, orcas confined in closed places exhibit aggressive behaviors [1]. in humans, release from mandatory confinement indoors was correlated with decreases in both verbal and physical aggression [2, 3]. during covid-19 lockdown, 45.3% of participants of a study investigating the effect of lockdown reported worsened sleep and increased levels of irritation, anger, and aggression compared to pre-pandemic times [3]. furthermore, 5% of all participants reported experiencing interpersonal violence (ipv) [3]. however, the factors causing aggression are not yet well investigated. the melanocortin system plays an important role in regulating aggression.
it is widely documented that the peptide hormone pro-opiomelanocortin (pomc) acts as a precursor for various (neuro)peptides including α-melanocyte-stimulating hormone (α-msh), adreno-corticotrophin (acth) and β-endorphin. the function of melanocortins is regulated by the activation of a family of melanocortin receptor (mcr) subtypes. α-msh binds to mcrs in the brain, where it regulates social behavior, appetite and stress physiology. α-msh acts as a neurotransmitter in the brain, where it can modulate behavior, mainly via central mc3r and mc4r, in a variety of ways including regulating the dopaminergic reward system. melanocortin-5 receptor deficiency reduces a pheromone signal for aggression in male mice [4]. interestingly, agouti signaling peptide (asip) and agouti-related peptide (agrp) have diverse functional roles in feeding, pigmentation and background adaptation mechanisms. asip serves as an inverse agonist of the mc1r and the mc4r receptors. it seems that the melanocortin system diverged from adenosine receptors around the time of divergence of hydra vulgaris [5]. serotonin receptors, which are known to play a distinctive role in aggression, seem to have evolved earlier, as they have been found in trichoplax adhaerens [5], while dopamine receptors diverged afterwards, around the time of emergence of ciona intestinalis [5]. however, the inter-regulation mechanism between these four families is still not clear. hearing as a sensory modality in the context of aggressive behavior has been shown to play a major role in controlling behavior [6]. precise integration and processing of sensory inputs are crucial to evoke a suitable behavioral response [7]. in crickets, aggression songs are associated with cricket fights [8]. mice lacking asic3 show reduced anxiety-like behavior on the elevated plus maze and reduced aggression [9]. asic3 was also shown to affect hearing [10]. these reports indicate that aggression and hearing are possibly interlinked.
previous reports have shown the involvement of various sensory modalities in mediating aggressive behavior in drosophila melanogaster, including olfactory, gustatory, and visual neural networks [6]. furthermore, it was found that neuronal silencing and targeted knockdown of hearing genes such as d trpl (transient receptor potential-like) and the ca2+ signaling-related genes arr2 and inad in the fly's auditory organ elicit abnormal aggression [6]. these observations indicate that hearing could regulate aggressive behavior. however, whether aggression controls hearing is not yet known. interestingly, melanocytes present in the cochlea have an essential function in inner ear physiology. they protect against various types of hearing loss, including age-related hearing loss (arhl) and noise-induced hearing loss (nihl), by means of calcium buffering, heavy metal scavenging, and antioxidant activities [11]. however, whether aggressiveness directly affects hearing abilities is still not known. furthermore, there have been no earlier reports showing a direct link between asip and hearing. in this study, we have investigated the role of asip in linking aggression and the hearing system. in order to confirm that our mouse model study would be representative of the aggression-hearing link, we conducted an evolutionary study that revealed that asip is negatively selected between mice and humans. we analyzed an rna-seq dataset in which asip was knocked out in the japanese wild-derived msm/ms strain using crispr/cas9 genome editing. we found that numerous hearing-associated genes were upregulated in the ko mouse model, including those linked to startle disease. in order to validate our results, we conducted behavioral tests to investigate whether the rise in aggression during the covid-19 period has affected hearing patterns. we found that 5% of the individuals interviewed experienced a change in their hearing abilities while engaging in arguments during lockdown.
phylogenetic investigation was done in three stages. first, asip family amino acid sequences were aligned using mafft via the iterative refinement method (fft-ns-i). next, we employed prottest to determine the best amino acid replacement model [12]. prottest results based on the akaike information criterion (aic) suggested that the best substitution model is lg+i+g+f, where lg is the substitution model, supplemented by a fraction of invariable amino acids ('+i'), with each site assigned a probability of belonging to given rate categories ('+g'), and observed amino acid frequencies ('+f'). the third stage involved using the protein alignment and the resulting substitution model to construct the tree with two different phylogenetic methods, namely (1) maximum likelihood and (2) bayesian inference. we performed the maximum likelihood analysis using phyml [13] implemented in seaview with 5 random starting trees [14]. we applied bayesian inference analysis using mrbayes, where we implemented a markov chain monte carlo analysis with 1,000,000 generations to approximate the posterior probability and a standard deviation of split frequencies <0.01 to indicate convergence, as previously described. we used the coding dna alignment and our final tree to investigate the ratio of non-synonymous (dn) to synonymous (ds) substitutions using the paml program. likelihood ratio tests (lrt), with p-values obtained from χ² tests, were used to compare selective pressure models against neutral models. one level of analysis was investigated. this level calculates the global ω for the tree using the one-ratio model m0 [15], where ω = dn/ds, with 0 < ω < 1 indicating purifying selection. the overall design of the rna experiment was as follows: the midbrain section was isolated from japanese wild-derived msm/ms mice (2 control and 3 ko) [16]. total rna was purified from dissected midbrain using trizol (thermo fisher scientific).
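returning to the likelihood ratio test described in the selection analysis above, the comparison can be sketched as follows. this is a minimal illustration and not the authors' pipeline: it assumes that two log-likelihoods have already been obtained (for example from paml), one with ω fixed at 1 (neutral) and one with ω estimated freely under m0, so that the test has one degree of freedom. the function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for two nested models differing by one parameter.

    lnl_null: log-likelihood of the constrained model (assumed: omega fixed at 1).
    lnl_alt:  log-likelihood of the free model (assumed: omega estimated, M0).
    Returns (test statistic, p-value).
    """
    # LRT statistic: twice the log-likelihood difference, floored at zero
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only (not values from the study)
lnl_neutral = -1250.0
lnl_m0 = -1240.0

stat, p_value = lrt_pvalue_df1(lnl_neutral, lnl_m0)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.2e}")
```

a small p-value here would only indicate that ω differs from 1; the direction (purifying versus positive selection) is read from the estimated ω itself.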
purified total rna was amplified and labeled with cy3 using the low-input quickamp labeling kit (agilent technologies). cy3-labeled rnas were hybridized to the sureprint g3 mouse gene expression v2 8x60k microarray kit (agilent technologies) at 65 °c for 17 h. the scanned images were processed with feature extraction software (agilent technologies) to extract the signal intensities of each probe. the extracted signal data were imported into the genespring gx 13.1.1 software (agilent technologies) and normalized using the default settings. rnaseq analysis was then performed in r using limma [17]. briefly, we employed the limma rna-seq differential gene expression method to compute non-parametric approximations of mean-variance relationships. this allowed us to calculate the weights for a linear model analysis of log-transformed counts in conjunction with the empirical bayes shrinkage of variance parameters. differential expression analysis was performed to determine the differences in gene expression between +lps cells and nontreated samples by fitting a linear model to compute the variability in the data with lmfit [72,73]. pathway enrichment was done using the library fgsea [18]. the network between chosen genes was calculated using the glasso module of the webserver geneck [19] with default settings. data were downloaded from http://www.proteinatlas.org [20]. for a protein to be a candidate biomarker, it should show medium or high expression in normal brain. we arbitrarily restricted our selection to candidate genes that were found to be upregulated in the rna-seq study. the present study included 25 participants who reported no prior aggression or depression diagnosis. the sample size was determined by the g power analysis. the participants' reported ages ranged from 27 to 65 years. the education level of participants varied from uneducated to master's degree.
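as an illustration of the per-gene comparison underlying the differential expression analysis described above, the sketch below computes a log2 fold change and an ordinary pooled-variance t-statistic for one gene across 2 control and 3 ko samples. it is deliberately simplified and is not a reimplementation of limma: limma's empirical bayes step shrinks the per-gene variances across all genes, which this sketch does not do. the function name and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change_and_t(control, ko):
    """Compare log2 expression values of one gene between two groups.

    control, ko: lists of log2-transformed expression values (>= 2 per group).
    Returns (log2 fold change KO vs control, pooled-variance t-statistic).
    Note: this is an ordinary t-statistic, not limma's moderated t.
    """
    n_c, n_k = len(control), len(ko)
    lfc = mean(ko) - mean(control)
    # Pooled sample variance across the two groups
    pooled_var = ((n_c - 1) * variance(control) + (n_k - 1) * variance(ko)) / (n_c + n_k - 2)
    se = math.sqrt(pooled_var * (1.0 / n_c + 1.0 / n_k))
    t_stat = lfc / se if se > 0 else float("nan")
    return lfc, t_stat


# Hypothetical log2 intensities for a single gene (2 control, 3 KO samples)
control_log2 = [8.0, 8.2]
ko_log2 = [9.1, 9.3, 9.2]

lfc, t_stat = log2_fold_change_and_t(control_log2, ko_log2)
print(f"log2 fold change = {lfc:.2f}, t = {t_stat:.2f}")
```

with only two and three samples per group, per-gene variance estimates are very unstable, which is the motivation for borrowing information across genes as limma does.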
informed written consent was obtained from the participants after explaining to them the purpose of the research. as an exclusion criterion, participants with a serious general medical condition were not selected. aggression was measured using the buss-perry aggression questionnaire [21]. this scale consists of four subscales: physical aggression, verbal aggression, anger and hostility. the participants were asked to evaluate each item on a 5-point likert scale ranging from 1 (not true) to 5 (true). the consistency for each category was confirmed, with cronbach alphas ranging from .70 to .78 [22]. in this study, the alphas for the subscales were .75 for physical aggression, .77 for verbal aggression, .76 for anger, and .73 for hostility, respectively. the descriptive scores of the four subscales were obtained by averaging the item scores. we examined whether aggressive behavior could be caused by social stress. we employed the bergen social relationship scale (bsrs), which measures interpersonal relationship problems [23]. it is a six-item self-report measure. the answers were given on a four-point scale ranging from "describe me well (3)" to "do not describe me at all". total scores range from 0 to 18, where a higher score signifies higher interpersonal conflicts. cronbach's alpha for the bsrs was reported to be 0.76. the test-retest correlation was reported to be 0.75. the construct validity of the bsrs ranged from 0.40 to 0.32, all statistically significant at p < 0.001. our results indicate that the conserved aggression-inducing gene asip is responsible for downregulating several genes responsible for hearing and acoustic processing in mice. our results also show that this gene is strongly conserved between mice and humans. when we analyzed rna-seq data for the asip ko mouse model, we found that several genes controlling hearing were upregulated in the ko samples.
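the internal-consistency values quoted for the questionnaires above are cronbach's alpha coefficients. as a reminder of what that statistic measures, the sketch below computes it with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). the function name and the response matrix are hypothetical and do not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses: list of rows, one row per respondent, one column per item.
    Uses sample variances for both the items and the total scores.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_columns = list(zip(*responses))          # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of five respondents to four items on a 1-5 Likert scale
example_responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha_example = cronbach_alpha(example_responses)
print(f"Cronbach's alpha = {alpha_example:.2f}")
```

values around .70 or higher, as reported for the subscales, are conventionally read as acceptable internal consistency.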
specifically, the phox2b, mpk13, fat2, neurod2, slc18a3, gon4l, gbx2, slc6a3, aldh1a7, tyrp1 and lbx1 genes were upregulated in asip ko mice. interestingly, we found that slc6a5 and lamp5, as well as il33, which are associated with startle disease, are also upregulated in response to knocking out asip. these results imply that asip plays a fundamental role in startle disease and that the startle disease pathology is connected to the patient's response to aggression. we found the link between hearing abilities and aggression mirrored in human samples, where people who experienced aggressive behavior during lockdown also reported a change in hearing abilities. our results shed more light on the link between aggression and the processing of acoustic signals in humans. agouti evolution was under negative selection. our results extend what has been reported by saeed et al. we were able to locate human asip gene homologs in chimps, orangutan, mice, and chicken (figure x). we found two homologs of the gene in zebrafish as well as in elephant shark (faa00702.1 and faa00750.1). interestingly, we were only able to find a single copy of the gene in the reed fish erpetoichthys calabaricus (xp_028660269.1), suggesting that reed fish lost one homolog. we were not able to locate any asip genes in lampreys. however, we were able to locate several melanocortin (e.g. xp_032827023.1), serotonin (xp_032806630.1), adenosine (xp_032823932.1) and dopamine (xp_032806557.1) receptors, suggesting that lampreys have evolved their own unique pathways for the regulation of aggression. we were not able to locate asip in hagfish (hyperotreti), ciona intestinalis, hydra vulgaris, drosophila melanogaster, trichoplax adhaerens or caenorhabditis elegans. these observations indicate that asip first emerged during the divergence of gnathostomata and hyperoartia (lampreys). identity and similarity analysis indicated that asip is highly conserved.
for example, the similarity between human and chimp asip was 97.7%, while with the mouse it was 75.5% (figure) . furthermore, ω (dn/ds) showed a value of 0.3, confirming that the genes evolved under negative selection (figure x) . we analyzed the geo dataset gse84840. in this dataset, crispr/cas9-mediated genome editing in wild-derived mice was performed to generate tamed wild-derived strains by mutation of the a (nonagouti) gene. these tamed mice show non-aggressive behavior when tested through the () test. we found genes related to hearing and acoustic signal processing upregulated in the ko mice. we also found that the serotonin level was directly down regulated by knocking out the agouti gene (figure 1). other aggressiveness-related genes that were down regulated include () . interestingly, however, we found that several genes associated with aggressive behavior were not affected, such as cfos, and…. notably, we did not see any change in the inflammatory pathway in this particular model. however, using the geo data set () that uses (), we found that the cd4 pathway was affected (figure 2) as well as the () pathway (figure 2). ( figure 3=ihc ) to investigate the role of asip in the auditory sensory mechanisms, we analyzed the public microarray of cochlear sensory epithelia versus embryonic stem cells. we found that asip is upregulated in cochlear sensory epithelia but not in embryonic stem cells (fold change 1.92). interestingly, mcr4 was not upregulated (1.07), indicative of a regulatory role of asip on melanocortin receptors under homeostasis. fat2, slc18a3, gon4l, dat1 and aldh1a7 were also not upregulated, supporting our hypothesis of a regulatory effect of asip on these hearing-related genes. to investigate the role of asip in the cochlear sensory epithelia, we analyzed the public microarray of cochlear sensory epithelia treated with lpl. we found that asip is upregulated in cochlear sensory epithelia but not in embryonic stem cells (fold change 1.92).
interestingly il 33 known to perform regulation of inflammatory pathway was not upregulated. interestingly mcr4 was not upregulated (1.07) indicative of asip a regulatory role of melanocortin receptors under homeostasis. fat2, slc18a3, gon4l, dat1, aldh1a7 were also not upregulated confirming our hypothesis of a regulatory effect of asip on these hearing related genes. our studies suggest a direct link between acoustic processing and aggression. we have investigated the relationship between aggression induced by lockdown and hearing. we found that 3% of individuals who answered the questionnaire reported some difference in their hearing ability. this includes both and negative abilities. we found these results reflected in the molecular pathway of melanocortin receptors 1 and 4, where their knocking out their negative agonist; asip resulted in upregulating various hearing associated genes such as fat2 among others. we also noticed that of the genes associated with hearing affected by knocking out of asip is slc6a5 and il33, which showed upregulation; both these genes were associated with startle disease. we investigated asip further and we found that it is expressed in cochlear sensory epithelial cells. from an evolutionary point of view, asip is more recent than melanocortin receptors. taken together, our results suggest that asip divergence represent the evolution of new mechanism linking hearing and aggression in higher animals. our analysis has shown that the asip gene diverged around 600 mya ago. ingo et al,( pmid: 21406593) have shown that agouti exist in teleost. we have found it in elasmobranchs, proving it was more ancient than previously thought. the endocrine related genes that play an important role in aggression include, serotonin, with low levels of serotonin associated with violent behavior and suicidal thoughts. 
serotonin performs its role in aggression through a network of genes including maoa and maob (which play a role in the metabolism of biogenic amines), slc6a4 (which acts as the serotonin transporter) and the tryptophan hydroxylase enzyme (tph1) (which catalyzes the rate-limiting step in the synthesis of serotonin). it has been shown that polymorphisms of metabolic enzymes, carrier proteins, and receptors of the serotonergic system are associated with an increased aggressive behavior pattern. interestingly, serotonin receptors diverged during the time of trichoplax adhaerens. in human studies, a positive relationship between aggressiveness and 3-methoxy-4-hydroxyphenylglycol (mhpg) (a norepinephrine product) has been established. the −3081t allele of the noradrenaline transporter (slc6a2) is more dominant in adhd in americans of european descent, thus providing a questionable link with aggression. interestingly, β-type noradrenergic receptor blockers have been shown to reduce aggressive behavior in some but not all patients tested, suggesting that aggression is controlled by a host of gene networks. adrenergic receptors also evolved during the time of evolution of trichoplax adhaerens. it seems the melanocortin system diverged from adenosine receptors during the divergence of hydra vulgaris (pmid: 27614546). conversely, arginine vasopressin, whose levels are positively correlated with a life history of aggression, evolved during the emergence of ciona intestinalis. another important endocrine mechanism is the dopamine reward system; for example, the gene dbh encodes a key enzyme in the synthesis of norepinephrine, which is associated with conduct disorder. however, dopamine receptors first appeared during the emergence of ciona intestinalis. acetylcholine receptors, also implicated in aggressive behavior, diverged during danio rerio emergence. we could not find asip in ciona intestinalis. we have noticed that knocking out asip increased the acetylcholine receptor slc18a3 (2.1 fold increase).
interestingly, knocking out asip did not affect the expression of dopaminergic or serotonergic receptors. since its emergence, asip has been subjected to tight negative selection (w=0.23). this indicates that, from a chronological point of view, serotonin and adrenergic receptors are responsible for regulating the basic mechanism controlling behavior in lower organisms; melanocortin and dopamine emerged as the need for better control of aggression appeared in hydra and ciona, while acetylcholine receptors play a role in higher animals. finally, asip emerged to play the role of regulator in higher invertebrates and vertebrates. asip is controlling melanocortin role in hearing. effect of asip on genes associated with auditory signal processing. asip is downregulating genes responsible for sound processing and regulation: phox2b, mpk13, fat2, neurod2, slc18a3, gon4l, gbx2, slc6a3 (dat1), aldh1a7, tyrp1 and lbx1. phox2b is expressed in the brain stem; mutations in this gene have been associated with brainstem dysfunction and abnormal brainstem auditory evoked potentials in 20% of the congenital central hypoventilation syndrome (cchs) patients investigated (25886294). phox2b is involved in the development of several major noradrenergic neuron populations, including the locus coeruleus in the pons of the brainstem, which is known to play a major role in aggressive behavior (27083854). fat2 plays an important role in hair cell regeneration in zebrafish (24706895). it was demonstrated that the lateral and basolateral amygdala nuclei fail to form in neurod2 null mice and that neurod2 heterozygotes have fewer neurons in this region. neurod2 heterozygous mice show profound deficits in emotional learning as assessed by fear conditioning (16203979). human neurod2 can induce neurogenic differentiation in non-neuronal cells in xenopus embryos, and is thought to play a role in the determination and maintenance of neuronal cell fates. gbx2 is required for the morphogenesis of the mouse inner ear.
in particular, absence of the endolymphatic duct and swelling of the membranous labyrinth are common features in gbx2-/- inner ears. more severe mutant phenotypes include absence of the anterior and posterior semicircular canals, and a malformed saccule and cochlear duct (15829521). in slc6a3:egfp larvae, it was found that egfp-positive dopaminergic fibers were located within the supporting cell layer and not within the hair cell layer. it was demonstrated that dopamine receptors are present in sensory hair cells at synaptic sites that are required for signaling to the brain. when nearby neurons release dopamine, activation of the dopamine receptors increases the activity of these mechanosensitive cells. a mutation in aldh7a1 has been suggested to contribute to meniere disease (md), an inner ear disorder characterized by tinnitus, vertigo, and hearing loss (lynch et al., 2002). acoustic overstimulation upregulates tyrp1 in rats (26520584). lbx1 acts as a selector gene in the fate determination of somatosensory and viscerosensory relay neurons in the hindbrain. interestingly we found that asip is upregulated in the the perception of sound involves the cochlear sensory epithelium (cse), which contains hair cells and supporting cells. hair cells are the transducers of auditory stimuli into neural signals, and are surrounded by supporting cells (pmid: 28662075). taken together, these reports indicate that asip is a key regulator of several genes, in different regions of the brain, that play various roles in developing and maintaining acoustic pathways. our results indicate a direct link between hereditary hyperekplexia and aggression. startle disease is characterized by an exaggerated startle response, evoked by tactile or auditory stimuli, leading to hypertonia and apnea episodes. missense, nonsense, frameshift, splice site mutations, and large deletions in the human glycine receptor α 1 subunit gene (glra1) are the major known cause of this disorder.
however, mutations are also found in the genes encoding the glycine receptor β subunit (glrb) and the presynaptic na+/cl−-dependent glycine transporter glyt2 (slc6a5). in this study, systematic dna sequencing of slc6a5 in 93 new unrelated human hyperekplexia patients revealed 20 sequence variants in 17 index cases presenting with homozygous or compound heterozygous recessive inheritance. five apparently unrelated cases had the truncating mutation r439x. genotype-phenotype analysis revealed a high rate of neonatal apneas and learning difficulties associated with slc6a5 mutations. from the 20 slc6a5 sequence variants, we investigated glycine uptake for 16 novel mutations, confirming that all were defective in glycine transport. although the most common mechanism of disrupting glyt2 function is protein truncation, new pathogenic mechanisms included splice site mutations and missense mutations affecting residues implicated in cl− binding, conformational changes mediated by extracellular loop 4, and cation-π interactions. detailed electrophysiology of mutation a275t revealed that this substitution results in a voltage-sensitive decrease in glycine transport caused by lower na+ affinity. this study firmly establishes the combination of missense, nonsense, frameshift, and splice site mutations in the glyt2 gene as the second major cause of startle disease. lamp5 lamp5 plays a pivotal role in sensorimotor processing in the brainstem and spinal cord. it is highly expressed in several brainstem nuclei involved with auditory processing including the cochlear nuclei, the superior olivary complex, nuclei of the lateral lemniscus and grey matter in the spinal cord. it was localized exclusively in inhibitory synaptic terminals, as has been reported in the forebrain. lamp5 knockout mice showed an increased startle response to auditory and tactile stimuli. 
in addition, lamp5 deficiency led to a larger intensity-dependent increase of wave i, ii and v peak amplitude of auditory brainstem response. (pmid: 30867010). we also found that il33−/− animals had deficits in acoustic startle response, a sensorimotor reflex mediated by motor neurons in the brainstem and spinal cord (fig. 2j , k; (23)). auditory acuity and gross motor performance were normal (fig. s5 i-j) . taken together, these data demonstrate that il-33 is required for normal synapse numbers and circuit function in the thalamus and spinal cord (pmid: 29420261). important (pmid: 25569367) a number of controlled experiments and clinical investigations have demonstrated roles for glucocorticoids in auditory function and protection. as early as the 1960s, clinical studies revealed that patients with adrenocorticosteroid deficiency presented with greater auditory sensitivity compared to normal volunteers (henkin et al., 1967) . moreover, treatment with prednisone brought hearing thresholds up to normal levels, demonstrating that the observed hypersensitivity was related to levels of circulating corticosteroids. similarly other studies revealed that patients with meniere's disease, an inner ear disorder affecting both cochlear and vestibular function, exhibited low levels of circulating corticosteroids. administration of adrenal cortex extract improved auditory function in these patients (powers, 1972) orca behavior and subsequent aggression associated with oceanarium confinement. 
animals (basel) confined to barracks: the effects of indoor confinement on aggressive behavior among inpatients of an acute psychogeriatric unit the german covid-19 survey on mental health: primary results melanocortin-5 receptor deficiency reduces a pheromonal signal for aggression in male mice an optimised phylogenetic method sheds more light on the main branching events of rhodopsin-like superfamily hearing regulates drosophila aggression emotional responses to multisensory environmental stimuli. sage open aggressive behavior in the antennectomized male cricket gryllus bimaculatus mice lacking asic3 show reduced anxiety-like behavior on the elevated plus maze and reduced aggression asic3(-/-) female mice with hearing deficit affects social development of pups progressive hearing loss in vitamin a-deficient mice which may be protected by the activation of cochlear melanocyte prottest 3: fast selection of best-fit models of protein evolution new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0. syst biol seaview version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building paml 4: phylogenetic analysis by maximum likelihood crispr/cas9-mediated genome editing in wild-derived mice: generation of tamed wild-derived strains by mutation of the a (nonagouti) gene. sci rep limma powers differential expression analyses for rna-sequencing and microarray studies geneck: a web server for gene network construction and visualization the human protein atlas: a spatial map of the human proteome the generalizability of the buss-perry aggression questionnaire a review on sample size determination for cronbach's alpha test: a simple guide for researchers measuring interpersonal stress with the bergen social relationships scale we would like to acknowledge professor macrious abraham for his ideas and advice. 
key: cord-274241-biqbsggu authors: shaw, timothy i.; srivastava, anuj; chou, wen-chi; liu, liang; hawkinson, ann; glenn, travis c.; adams, rick; schountz, tony title: transcriptome sequencing and annotation for the jamaican fruit bat (artibeus jamaicensis) date: 2012-11-15 journal: plos one doi: 10.1371/journal.pone.0048472 sha: doc_id: 274241 cord_uid: biqbsggu the jamaican fruit bat (artibeus jamaicensis) is one of the most common bats in the tropical americas. it is thought to be a potential reservoir host of tacaribe virus, an arenavirus closely related to the south american hemorrhagic fever viruses. we performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and illumina platforms to develop this species as an animal model. more than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncrnas. reciprocal blast best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. we also identified 466 immune-related genes, which may be useful for studying tacaribe virus infection of this species. the jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease. 
bats are an ancient and diverse group [1] and are the second largest taxonomic group of mammals with more than 1,200 identified species among the 5,499 known mammals [2, 3]. bats are the only mammals to have evolved powered flight, which has allowed dispersal across all continents other than antarctica. bats are critical components of ecosystems, serving as major predators of insects, pollinating flowers and dispersing seeds of keystone plant species worldwide. the body sizes of bats range from less than 2 g with 8 cm wingspans to more than 1 kg with 2 m wingspans. most contemporary species of bats are insect-, nectar-, or fruit-eaters, but about 1% are carnivores, including fish-eating and blood-drinking species. the evolutionary origin of bats remains controversial [4, 5]. in early work, bats were thought to be closely related to rodents and primates [6]. bats are now established within laurasiatheria; however, the placement of bats within laurasiatheria has been difficult to resolve because the major groups diverged from one another within a relatively short period of time [7]. different placements recently hypothesized for bats include: (a) sister to perissodactyla (horse) [8]; (b) sister to cetartiodactyla (cattle+dolphin) [5]; (c) sister to perissodactyla+cetartiodactyla (horse, cattle, dolphin) [9]; (d) sister to ferungulata (cattle+dolphin, dog+horse) [4, 10]; and (e) the pegasoferae hypothesis, which places bat with perissodactyla and carnivora (horse+dog) [11] (see [5] for a review). two bat genomes have been sequenced to date [12], the little brown bat (myotis lucifugus, 7× coverage) and the large flying fox (pteropus vampyrus, 2.6× coverage), but neither has been extensively annotated. these species represent the two major clades within bats: the microbats and megabats. transcriptome sequencing for another megabat species, the australian flying fox (pteropus alecto), has recently been published [13].
thus, a transcriptome for a microbat species is needed. many highly pathogenic viruses are hosted, or suspected to be hosted, by bat reservoirs, including ebolaviruses, marburg virus, hendra virus, nipah virus, rabies virus and coronaviruses [2] . in total, more than 100 viruses have been isolated from, or detected in, bats of dozens of species, yet many of the viruses that cause disease in humans cause little or no disease in the bats. significantly, the great majority of bat species have not been examined for infectious agents and are, thus, likely underappreciated as reservoir hosts. the continued encroachment of humans upon bat habitat and bat migrations caused by climate change may lead to novel infectious diseases among humans and livestock. moreover, some infectious diseases cause significant morbidity and mortality in bats that could have dramatic impacts on population numbers and cascading ecological effects [14] . thus, the study of bats and their infectious agents is an important but neglected aspect of zoonotic and wildlife disease research. jamaican fruit bats (artibeus jamaicensis) are one of the most common bats in the tropical americas, ranging from the caribbean islands, tropical south and central america, mexico and the florida keys [15] . the jamaican fruit bat is a microbat in the family phyllostomidae, which contains 56 genera and 192 species. they are a frugivorous generalist and fig specialist of medium size; about 80 mm in length with a wingspan of 130 mm and mass of about 50 grams. they can readily fly 20 km per night, although they typically maintain a smaller home range as long as food is available, and can live 9 years or more in the wild. females typically produce two offspring per year and provide maternal care for about 50 days, with pups reaching adult body weight by about 80 days. 
several microbes of interest have been isolated from or detected in jamaican fruit bats, including histoplasma capsulatum, trypanosoma cruzi, eastern equine encephalitis virus, mucambo virus, jurona virus, catu virus, itaporanga virus and tacaiuma virus, suggesting the species may be an important reservoir and vector of infectious diseases [15] [16] [17] [18] . it is unknown what diseases these pathogens may cause in bats. tacaribe virus (tcrv) was isolated from 11 artibeus bats (6 a. lituratus, 5 a. jamaicensis) in the late 1950s in and near port of spain, trinidad [19] . tcrv was the first arenavirus isolated in the americas and during the next decades other arenaviruses with substantial similarity to tcrv were identified that cause the south american hemorrhagic fevers (sahf) [20, 21] . the known reservoir hosts for all other arenaviruses are rodents, making tcrv exceptional for its repeated isolation from artibeus bats. because exhaustive searches for evidence of other potential reservoir hosts of tcrv failed to suggest another reservoir species [19, 22] , it has been suspected that artibeus bats are reservoirs of the virus. however, recent work by us demonstrated that trlv-11573, the only remaining isolate of tcrv, causes a fatal infection resembling the sahf in jamaican fruit bats, one of the species from which tcrv was isolated, or is cleared without disease, suggesting this species is not a suitable reservoir host for tcrv and the virus may be a significant pathogen for bats [23] . because of the equivocal role of artibeus bats as a reservoir host species for tcrv, and because of the similarities with human sahf, the jamaican fruit bat may be a novel model for studying the pathology of the disease. however, as an unusual, non-model organism, very little is known about its physiology, immunology or host response to tcrv. no antibodies are available with known specificity to jamaican fruit bat proteins, which dramatically limits its usefulness. 
to address some of these deficiencies, we have performed transcriptome sequencing and analysis of spleen, lung, kidney and poly-ic-stimulated primary kidney cells to identify genes of interest for assessing the host response to tcrv infection. more than 240,000 454 reads and 142 million illumina reads were obtained (table 1). the reads were submitted to short read archive (sra) under srr539297 and srr538731. reads from lung, kidney, and poly-ic-stimulated primary kidney cell libraries were pooled for a combined de novo assembly using the 454 gs assembler program, yielding 6,450 contigs. for the illumina spleen sequences, we first corrected reads using the soapdenovo correction tool and further assembled them using soapdenovo, yielding 214,707 contigs. a total of 367,317 snps and 44,679 indels were detected through gigabayes. at least 16 reads covering a site were required to ensure the snp was of high quality. using tgicl, a combined assembly of the 454 and illumina contigs was constructed that contained 102,237 contigs with n10, n50, and n90 of 3,882 bp, 1,004 bp, and 289 bp, respectively. human and mouse genomes were used as references to estimate the distribution of bat contigs within known gene transcripts. human and mouse genomes were chosen for completeness of their annotations. genomic features were divided into 5 groups: 1 kb upstream of the 5' utr, 5' utr, cds (coding sequence), 3' utr, and 1 kb downstream of the 3' utr. we found that 58.03% and 23.18% mapped to the cds region for the human and mouse genomes, respectively (figure 1a). because we performed transcriptome sequencing, we expected a majority of the sequences to map to cds and utr regions of the genome. many rna genes were also mapped, including long noncoding rnas and a substantial number of micrornas (figure 1b). annotation was concentrated on identifying micrornas because they could be cross validated through their rna secondary structure features.
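the n10, n50 and n90 values reported for the combined assembly are length-weighted summary statistics. as a minimal sketch of how such values are usually defined (the function name `nxx` and the toy contig lengths are illustrative assumptions, not taken from the tgicl output): contigs are sorted from longest to shortest, and nxx is the length of the contig at which the running total first reaches xx% of all assembled bases.

```python
def nxx(lengths, fraction):
    """Return the Nxx contig length for a list of contig lengths.

    Contigs are sorted longest first; the result is the length of the
    contig at which the cumulative sum first reaches `fraction` of the
    total assembled bases (fraction=0.5 gives N50).
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]


# hypothetical toy assembly (not real data): total of 1,500 bases
toy_contigs = [100, 200, 300, 400, 500]
n10 = nxx(toy_contigs, 0.10)
n50 = nxx(toy_contigs, 0.50)
n90 = nxx(toy_contigs, 0.90)
```

because nxx is read off a longest-first ordering, n10 is always at least as large as n50, which in turn is at least as large as n90, matching the ordering of the reported values (3,882 bp, 1,004 bp and 289 bp).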
to obtain a confident set of microrna sequences, a microrna prediction pipeline was used to cross validate the predictions from blast mapping. in the process, 42 confident microrna candidates were found that have been deposited within mirbase [24, 25]. we present the list of predicted micrornas as table s1. mapping the predicted snps on the genomic features indicates that the vast majority of snps are in the cds region (figure 1c-1d). although humans and mice are both outside laurasiatheria, the relatively fast rate of molecular evolution of mice is expected to result in more differences between bats and mice than between bats and humans [26] [27] [28] [29]. the presence of sequences mapped 1 kb upstream or downstream of the known transcript indicated possible alternative splicing from human and mouse transcripts. blast2go was used to functionally annotate contigs. a total of 20,020 contigs (19.58% overall) had significant matches to known proteins in the ncbi non-redundant protein (nr) database. horse and human were identified as the top two species with best blast hits for bat contigs (figure 2). the blastx annotation process is biased by the completeness of the annotation for each respective genome; therefore, despite the lack of a completely annotated horse genome, a high similarity between bat and horse genomes was apparent. the human genome is well annotated, which explains the high number of blast hits between bat and human. the go annotation divides the functional annotation into three main components: biological process, cellular component, and molecular function [30]. a majority of the annotated genes encoding proteins that function within a cell or organelle are involved in metabolic and cellular processes. the primary molecular functions of these genes are catalytic and binding activities (figures 3a-c). a total of 466 immune-related genes were annotated by blast2go.
these immune genes include toll-like receptors, cytokines, transcription factors, kinases and several chemokine receptors. in addition, categorizer was used to categorize the immune class using the goslim database, resulting in 30 categories representing a broad range of immune activities (figure 4). the immune response and lymphocyte activation genes represented the largest proportion of these transcripts. there were 82,218 unannotated contigs. a total of 16,869 sequences had open reading frames longer than 300 nt; 5,417 were identified through blastp to the nr database with e-value < 1e-3. for the remaining contigs, 54,892 mapped to the assembled myotis lucifugus genome, and 48,809 mapped to the assembled pteropus vampyrus genome. there were 20,145 contigs that mapped to pteropus alecto, australian flying fruit bat, and 18,359 that overlapped between genomic and transcriptome sequences for all three datasets (figure 5). through this process, we were able to account for 65,828 (80%) unannotated contigs. the completeness of genes mapped to immunological pathways was examined using human and mouse as reference species. based on the ortholog data obtained, all contigs were mapped onto immune system related kegg pathways (table 2), and we determined that many genes were missing from these pathways. this could be due to low expression within bat tissues or to the overly stringent e-value cutoff of 1e-20 that we chose during reciprocal blast annotation to limit the number of false positives. a kegggraph visual representation of contigs mapped onto the mouse pathway was generated (figures s1-s18). pathways involved in the adaptive immune response, such as the t and b cell signalling pathways, generally had more mapped genes than did those involved in innate response or natural killer (nk) cell-mediated cytotoxicity pathways (figures 6a and 6b).
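the reciprocal blast annotation mentioned above can be illustrated with a small sketch. this is not the pipeline used in the study; it only shows the general reciprocal-best-hit idea under stated assumptions: hits are given as hypothetical (query, subject, e-value, bit score) tuples, the best hit is the one with the highest bit score, and the cutoff is taken to be 1e-20. all identifiers and numbers in the toy data are invented for illustration.

```python
def best_hits(hits, max_evalue):
    """Map each query to its top-scoring subject among hits passing the cutoff.

    `hits` is an iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the e-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-20):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# hypothetical toy hit tables (bat contigs against human proteins, and back)
bat_vs_human = [
    ("bat1", "hs1", 1e-50, 200.0),
    ("bat1", "hs2", 1e-30, 120.0),
    ("bat2", "hs2", 1e-40, 150.0),
    ("bat3", "hs3", 1e-5, 40.0),   # fails the e-value cutoff
]
human_vs_bat = [
    ("hs1", "bat1", 1e-48, 195.0),
    ("hs2", "bat1", 1e-29, 118.0),  # hs2 prefers bat1, so bat2/hs2 is not reciprocal
    ("hs3", "bat3", 1e-6, 42.0),
]
orthologs = reciprocal_best_hits(bat_vs_human, human_vs_bat)
```

in this toy case only the bat1/hs1 pair survives. a stricter cutoff removes weak pairs such as bat3/hs3, which is the trade-off the text describes: fewer false positives, at the cost of genes that appear to be missing from a pathway.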
the nk cell cytotoxicity pathway appears to have almost half of its genes missing, whereas the b cell receptor pathway appears to have most of its genes present. nucleotide substitution in the coding region can be synonymous or non-synonymous. the ratio between the rate of synonymous (ds) and non-synonymous mutation (dn) can be used to infer the degree of selection operating on the system. we used the human genome as a reference for dn/ds calculations because the human genome is well annotated. reciprocal blast was used to identify human, mouse, and bat orthologs. macse was used to generate codon alignments. the alignments were trimmed for excessive gap codon triplets, and paml was used to calculate dn/ds for each gene. when genes are highly conserved, synonymous mutations (ds) tend to be estimated as 0, resulting in a larger dn/ds ratio, therefore those results were removed from the analysis. after filtering, dn/ds results were obtained for 14,717 genes. the majority of the genes have close to zero dn/ds with clear evidence of purifying selection, a feature common among mammalian genes [31] [32] [33] . for investigation of positive selection, tang et al. [34] have argued that a dn/ds threshold of greater than 1 for positive selection might be overly stringent. because of this, a dn/ds cutoff of 0.7 was chosen to investigate genes that might be experiencing weak purifying selection. a total of 138 genes above the 0.7 threshold were found (table s2) . for genes with evidence of positive selection, 32 exceeded the 1.0 dn/ds threshold (figure 7) . through annotation by david [35, 36] , there were 14 genes involved in transcriptional activation and regulation processes. there were 9 genes associated with cellular signaling. in particular, we found dna-damage-inducible transcript 4 (ddit4) gene with dn/ds 1.4053; this protein is involved in the mtor signaling pathway and it regulates cell growth and promotes neuronal cell death [37, 38] . 
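the filtering and thresholding steps described above can be sketched as follows. this is a simplified illustration, not the paml workflow itself: it assumes per-gene dn and ds estimates are already available as plain numbers, and the gene names and values in the toy data are hypothetical.

```python
def classify_dnds(records, weak_cutoff=0.7, positive_cutoff=1.0):
    """Compute dn/ds per gene and apply the two thresholds from the text.

    `records` is an iterable of (gene, dn, ds) tuples. Genes with ds == 0
    are dropped, because the ratio is undefined (or artificially inflated)
    when no synonymous change is estimated.
    """
    ratios = {}
    for gene, dn, ds in records:
        if ds <= 0:
            continue  # highly conserved gene: ratio cannot be trusted
        ratios[gene] = dn / ds
    above_weak = sorted(g for g, w in ratios.items() if w > weak_cutoff)
    above_one = sorted(g for g, w in ratios.items() if w > positive_cutoff)
    return ratios, above_weak, above_one


# hypothetical toy estimates (gene, dn, ds); not values from the study
toy = [
    ("geneA", 0.02, 0.40),  # ratio 0.05: strong purifying selection
    ("geneB", 0.30, 0.40),  # ratio about 0.75: above the 0.7 cutoff only
    ("geneC", 0.28, 0.20),  # ratio about 1.4: candidate for positive selection
    ("geneD", 0.01, 0.00),  # ds of zero: removed before thresholding
]
ratios, above_weak, above_one = classify_dnds(toy)
```

every gene above 1.0 is also above 0.7, so the second list is a subset of the first, in the same way that the 32 genes above 1.0 are part of the 138 genes above 0.7.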
ectodysplasin a (eda), involved with cytokine:receptor interaction pathways, had a dn/ds value of 1.23. the phylogenetic placement of bats within laurasiatheria is still unresolved. through reciprocal blast, we identified 8,785 putative orthologs across mouse, rat, cattle, horse and human (table s3). we then filtered out alignments with greater than 5% gaps; the remaining 2,378 genes were used to construct a multilocus bootstrap species tree with 500 iterations (see methodology). this resulted in a highly supported species tree placing bat sister to the clade containing cattle, horse, and dog (figure 8). the jamaican fruit bat transcriptome described here is a major new resource for genetic studies of bats. this bat is an important seed dispersing and pollinating species found in most of the tropical americas. it is likely susceptible to infectious diseases, could be a zoonotic reservoir and vector, and may be a suitable model for the pathogenesis of sahf. considering the importance of immunological functions in response to infections, we conducted a transcriptome assessment of genes from spleen, kidney and lungs so that genetic tools and methods can be used to study this species as well as other microbats. genes were identified that mapped to immune response pathways; based on categorizer classification of immune classes, we found 40 different immune classes. recently, transcriptome sequencing for the australian black flying fox was performed [13]. our data contain a greater proportion of lymphocyte related immune classes than does the flying fox's transcriptome dataset. however, our dataset also contained a lesser proportion of cytokine related immune classes than the flying fox's transcriptome dataset. pathways involved in the adaptive immune response generally had more mapped genes than pathways involved in innate responses.
from figures 6a and 6b, more genes were mapped to the b cell receptor signalling pathway than to the nk cell-mediated cytotoxicity pathway. this bias is likely due to the large number of b cells found in the spleen. because our blast criteria were stringent, relaxing the e-value threshold might map additional genes, but at the risk of more false positives. we deposited 42 microrna genes for a. jamaicensis into mirbase, and according to mirbase this is the first set of bat microrna genes to be deposited. estimates of substitutions within the orthologous contigs found 32 genes with a dn/ds ratio >1. this ratio provides a guide for indicating potential genes that are under positive selection. many genes were involved in transcriptional activation/regulation processes, suggesting potential differences in the transcriptional regulatory architectures of bats and humans. ddit4 and eda have a dn/ds ratio >1, suggesting these genes are under positive evolutionary selection. ddit4 is involved in regulation of cell death and its positive selection suggests a potential difference in cell death regulation between humans and bats; further analysis will need to be performed to verify the functional differences. another potentially positively selected gene, eda, is associated with ectodermal dysplasia type 1 [39], a disorder associated with abnormal development of physical structures, including skin, hair, nails, teeth, and sweat glands. we suspect the bat's eda gene could be used as a potential reference for future studies of the disorder. for transcripts that failed to be identified by reciprocal blast searches, we predicted the orf for the unannotated contigs and used blastp against the nr database to identify 5,349 unannotated contigs. for the remaining unannotated contigs, we used genomic data from myotis lucifugus and pteropus vampyrus, as well as transcriptome data from pteropus alecto to identify additional unannotated contigs.
existing artibeus contigs that were not present within the nr database, but overlapped among myotis and pteropus genomic and transcriptomic sequences, indicated the possibility of bat-specific transcripts. we also found contigs that mapped only to the myotis lucifugus genome, indicating the possibility of microbat-specific contigs. in total, we were able to account for 80% of the unannotated transcripts, and the remaining unannotated transcripts likely include misassembled contigs, contigs not sequenced sufficiently in the other bats to be included in their genome assemblies, as well as a few transcripts specific to artibeus jamaicensis. many additional analyses are warranted to further refine the transcriptome information from artibeus and other bats. phylogenomics is an important tool for resolving the tree of life, and this transcriptome data set provides an opportunity to study the evolutionary history of bats. bats were once thought to be closely related to primates [6]; however, further work using molecular information placed them within laurasiatheria [40]. our finding of bat as sister to the clade containing horse, dog, and cattle is consistent with the recent studies by mccormack et al. [4] and zhou et al. [10]. here, we used 2,378 loci from a microbat and species tree analyses to obtain 95% bootstrap support, whereas mccormack et al. used 683 loci to obtain 64% bootstrap support. a recent study by nery et al. [5] obtained a concatenated dataset from 3,733 loci from megabat with 100% bootstrap support and 1.0 posterior probability placing bat as sister to cattle. our phylogenetic tree is less resolved than that of nery et al., probably because we did not include the more limited transcriptome data available from dolphin and hedgehog.
maximum likelihood analyses are powerful, yet can lead to incorrect conclusions in certain situations, whereas species tree analyses are less powerful but more robust to well-known violations of the models used for maximum likelihood phylogenetic analysis, such as incomplete lineage sorting (see [5] and references therein). additional work is clearly warranted, especially by using additional taxa, testing for convergence and specific violations of gene-tree models, and other sources of conflict among protein-coding genes and other portions of the genome. a principal difficulty for identifying mechanisms of pathogenesis of the sahf, in which the immune response may play a contributory role, is a lack of animal model resources that faithfully recapitulate human disease [41]. although laboratory mice (mus musculus) and rats (rattus norvegicus), which have substantial experimental methodologies and reagents, can be infected with junín virus (junv), the etiologic agent of argentine hemorrhagic fever, the pathogenesis is markedly different from human disease. the guinea pig (cavia porcellus) typically exhibits signs of disease that closely resemble human disease; however, there are few immunological or genetic tools for assessing the host response to infection. junv is also a bsl-4 and select agent, thus use of virulent strains is confined to only a few laboratories with highly specialized containment facilities. the pathogenesis of tcrv, a bsl-2 agent, in jamaican fruit bats exhibits many similarities to the sahf in humans, thus the use of transcriptome data could be useful for studying pathogenesis using a variety of new technologies, such as pcr arrays for pathway discovery, and for the development of antibodies to specific artibeus proteins that are important in the pathogenesis of disease.
the transcriptome resource provided will facilitate research into artibeus host responses to infectious agents, including mechanisms of pathogenesis of arenavirus disease, and will also provide a further resource for understanding bat species evolution and physiological development.
figure 7. substitution estimation scatter plot. we calculated the nonsynonymous mutation rate (dn) and synonymous mutation rate (ds) using orthologous genes between bat and human. two lines were drawn representing the two dn/ds cutoffs of 0.7 (green) and 1.0 (blue). doi:10.1371/journal.pone.0048472.g007
figure 8. unrooted species tree from the orthologous dataset across six boreoeutheria mammals. the species tree was generated from 2378 gene loci. there was 95% bootstrap support for placing bats (chiroptera) sister to perissodactyla, cetartiodactyla, and carnivora. doi:10.1371/journal.pone.0048472.g008
all procedures were approved by the unc institutional animal care and use committee (iacuc) and were in compliance with the usa animal welfare act. unc animal care and use committee approval number, 1207c-ra-b-15. five bats from the university of northern colorado jamaican fruit bat colony were used for this work. male and female a. jamaicensis bats were euthanized by respiratory hyperanesthesia followed immediately by thoracotomy. tissues were aseptically removed and flash frozen in liquid nitrogen for subsequent rna extraction. tissues were homogenized in buffer rlt (rneasy kit, qiagen, valencia, ca) containing 2-mercaptoethanol (2-me) using a bead beater and silicone beads. the homogenate was passed over a qiashredder column prior to total rna extraction according to manufacturer's instructions. for cell culture, one kidney from one bat was collected in serum-free hbss and minced under aseptic conditions, then trypsinized (trypsin-versene) at room temperature in a sterile 50 ml trypsin flask. cells were washed 3× in 10% fbs-emem, then seeded into a vented t-25 flask.
the next day, unattached cells were removed and fresh 10% fbs-emem added. when cells approached confluence they were passaged with trypsin at a split ratio of 1:4. poly-ic was added to 50 mg/ml in two t-75 flasks containing 20 ml each of 10% fbs-emem and incubated for 6 hours, after which rna was extracted according to manufacturer's instructions (qiagen). total rna was extracted using the rneasy minelute cleanup kit (qiagen) and then shipped on dry ice to seqwright (houston, tx) for cdna library construction and sequencing. rna concentrations and quality were assessed by a260/a280 and a260/a230 absorbance values and agarose gel electrophoresis. a260/a280 values were all above 2.0 and a260/a230 were all above 1.9. electrophoresis of the rna samples demonstrated that 28s and 18s rrna were not degraded. libraries for the 454 were prepared from three tissues (kidney, lung, poly-ic-stimulated kidney cells). for 454 library construction, full-length cdna was synthesized with two sets of primers for driver and tester cdna [42, 43]. single-stranded cdna was used for hybridization instead of double-stranded cdna. excess amounts of sense-stranded cdna were hybridized with antisense-stranded cdna. after hybridization, duplex was removed by hydroxyapatite chromatography. normalized tester cdna was re-amplified with the tester-specific primer l4n. driver cdna could not be amplified using l4n. an illumina truseq rna library was made from spleens according to manufacturer's instructions. the libraries were then sequenced according to manufacturer's recommendations: 454 using titanium chemistry and illumina using 2×100 nucleotide paired-end sequencing on a hi-seq 2000. the 454 and illumina libraries were assembled individually and also by combining both libraries. bases from the 454 reads were called from the 454 generated sff file using pyrobayes [44] and 454 gs assembler (version 2.5) was used to perform the assembly.
soap denovo [45] (version 1.04) was used to assemble reads obtained from spleen (illumina library). only contigs greater than 200 bases were used in the final analysis. prior to performing the combined assembly, duplicates from pre-assembled contigs of lung, kidney and spleen tissues were removed with cd-hit [46] (cd-hit-2009-0427) at default criterion and then combined into longer fragments with tgicl [47]. gigabayes [48], a short-read snp/indel discovery program, was used to detect polymorphisms. snp/indel detection was performed for both libraries separately. to make snp/indel predictions more reliable, we used the criterion that minor allele and major allele (alleles with fewer reads are minor alleles, and alleles with more reads are major alleles) occur at least twice and 8 times for 454 and illumina libraries, respectively. to identify the approximate relative position of conserved mammalian genes, we mapped the bat contigs onto the genomes of mouse mm9 and human grch37 (downloaded from the ucsc genome browser) using blat v.34 [49] with a minimum score of 80 used as a filter. coordinates of the protein coding genes were obtained from ensembl (http://uswest.ensembl.org/index.html) xenoref and gtf files. we also normalized the number of blat hits based on the total annotated transcript regions (1000 nt upstream of 5′ utr, 5′ utr, cds, 3′ utr, and 1000 nt downstream of 3′ utr) that were present in the mouse and human. to predict precursor-microrna genes within assembled sequences, we downloaded precursor micrornas for mouse, rat and human from mirbase [24, 25]. we performed a blast search focused on high quality candidates, hits with ≥95% sequence identity [50]. based upon rnafold [51] secondary structure prediction, we further filtered out sequences that did not possess any hairpin loop structure.
previously, it had been demonstrated that micrornas tend to have deterministic folding [52] and, therefore, we used unpaired structural entropy (use) to evaluate the rna secondary structure base pairing distribution (cutoff 0.83 use score). mir-abela, a support vector machine learning program [53], was used to cross-validate the prediction. the sequences remaining after these filters are considered highly confident microrna candidates. orthologous contigs (against human, mouse, dog, cattle and horse) were identified using the reciprocal blast (blastn) approach [54] as it has been found to be superior to sophisticated orthology detection algorithms [55]. a stringent cutoff of 1e-20 was used to separate paralogs from orthologs. cdna sequences from human (homo_sapiens.grch37. 64 the substitution rate is inferred from orthologous genes between bat and mouse. sequences were aligned using macse [56] and an in-house java script was used to trim/remove codon gap triplets from the alignment. substitution rate was estimated using a maximum likelihood method implemented in the codeml program of paml 4.5 [57, 58]. the pairwise maximum likelihood analyses were performed with runmode = -2. estimated rates of non-synonymous to synonymous substitutions (dn/ds) were plotted as a scatter plot. blast2go [59] was used to functionally annotate contigs. a combined graph was generated for each go category. to prevent overloading graphs, the sequence filter value was changed to 500 in all 3 categories (biological process, molecular function and cellular component). functional annotation was performed separately for all assembled contigs present in the combined assembly. based on categorizer [60], we further classified the genes using the go slim database immune classes. the completeness of mapping the bat genes using euarchontoglires as a reference was further examined through kegg. to do this, we first downloaded the xml file of annotated kegg pathways [61, 62] for human and mouse.
to identify genes that are functionally important within kegg pathways, kegggraph was used to represent each kegg pathway as a graph. we further used kegggraph to compute the relative betweenness centrality, a measure of how strongly a node is involved in the paths through a network. we set a cutoff of the top 4 nodes within each network, that is, the top 4 functionally important genes within each pathway [63]. open reading frames were predicted from the assembled contigs through the orfpredictor web server (http://proteomics.ysu.edu/tools/orfpredictor.html) [64]. a customized java program was used to parse the predictions to identify sequences longer than 300 nt. to perform additional annotation of the predicted open reading frames we used blastp with an e-value of 1e-3 against the most recent nr database available from ncbi at the time of our analysis (august 26th, 2012). using contigs that were functionally unannotated, we compared the jamaican fruit bat contigs against three other available bat sequence datasets. myotis lucifugus and pteropus vampyrus genomes were downloaded from the ncbi tracedb ftp server (ftp://ftp.ncbi.nih.gov/pub/tracedb/). the pteropus alecto transcriptome was obtained from dr. a. papenfuss [13]. an e-value threshold of 1e-5 was used to indicate a blast hit. we then used the r package venndiagram [65] to display the mapped unannotated contigs that overlapped between the different bat genomes and transcriptomes. to resolve the evolutionary relationship for the artibeus bat species, we filtered the putative bat orthologs between human, mouse, dog, cattle, and horse. hedgehog (an insectivore) and dolphin were not used in our analysis due to limited gene annotation in these taxa.
to obtain the best multiple sequence alignment for each putative ortholog, we used the aqua pipeline for performing multiple sequence alignment; the pipeline consists of multiple sequence alignment through muscle and mafft, which is refined by rascal and assessed by normd [66] [67] [68] [69]. a customized java program was used to filter out alignments (obtained through aqua) with greater than 5% gap per sequence. additionally, we filtered for sequences that are at least 1,000 bp long. phyml 3 was used to generate a maximum likelihood gene tree [70, 71]. mraic, a perl script wrapper for phyml, was used to infer the best substitution model for each gene tree based on aic, aicc, bic, and akaike weights [72]. aic was used as the objective function since not much variation was observed across the different objective functions. njst was used to calculate the unrooted species tree based on our gene trees [73]. a customized r program was used to perform a nonparametric bootstrap species tree through resampling nucleotides within loci as well as resampling the loci within the data set as described by seo [74].
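the gap filter and the two-stage resampling described above can be illustrated with a short sketch. this is not the authors' customized java or r code; it is a minimal python illustration that assumes each locus is held as a dictionary mapping a taxon name to its aligned sequence, and all function names are hypothetical. gene-tree inference (phyml) and species-tree estimation (njst) would be run on each replicate and are not shown.

```python
import random


def passes_gap_filter(alignment, max_gap=0.05):
    """Keep a locus only if every sequence has at most max_gap gap characters."""
    return all(seq.count("-") / len(seq) <= max_gap for seq in alignment.values())


def resample_sites(alignment, rng):
    """Second stage: sample alignment columns with replacement within one locus."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def multilocus_bootstrap_replicate(loci, rng):
    """First stage: sample loci with replacement, then resample sites in each."""
    sampled = [rng.choice(loci) for _ in range(len(loci))]
    return [resample_sites(alignment, rng) for alignment in sampled]


if __name__ == "__main__":
    # hypothetical toy loci, far shorter than the 1,000 bp minimum in the text
    loci = [
        {"bat": "ACGTACGTAC", "human": "ACGTACGAAC", "mouse": "ACGAACGTAC"},
        {"bat": "GGCCTTAAGG", "human": "GGCCTTAAGC", "mouse": "GGCATTAAGG"},
    ]
    loci = [aln for aln in loci if passes_gap_filter(aln)]
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    replicates = [multilocus_bootstrap_replicate(loci, rng) for _ in range(500)]
    print(len(replicates), "bootstrap replicates generated")
```

resampling whole columns keeps the homology between taxa intact within each locus, while resampling loci accounts for the variation among gene trees.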
fossil evidence and the origin of bats bats: important reservoir hosts of emerging viruses iucn red list version 2011.2: tabel 3a -status category summary by major taxonomic group (animals) ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis resolution of the laurasiatherian phylogeny: evidence from genomic data mammalian phylogeny: shaking the tree mammalian phylogenomics comes of age using genomic data to unravel the root of the placental mammal phylogeny confirming the phylogeny of mammals by use of large comparative sequence data sets phylogenomic analysis resolves the interordinal relationships and rapid diversification of the laurasiatherian mammals pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions a highresolution map of human evolutionary constraint using 29 mammals the immune gene repertoire of an important viral reservoir, the australian black flying fox emerging diseases in chiroptera: why bats? 
artibeus jamaicensis identification of a new venezuelan equine encephalitis virus from brazil humoral and cell-mediated immunity to histoplasma capsulatum during experimental infection in neotropical bats (artibeus lituratus) experimental rabies virus infection in artibeus jamaicensis bats with cvs-24 variants tacaribe virus, a new agent isolated from artibeus bats and mosquitoes in trinidad, west indies the phylogeny of new world (tacaribe complex) arenaviruses phylogenetic analysis of the arenaviridae: patterns of virus evolution and evidence for cospeciation between arenaviruses and their rodent hosts serological evidence of infection of tacaribe virus and arboviruses in trinidadian bats tacaribe virus causes fatal infection of an ostensible reservoir host, the jamaican fruit bat mirbase: the microrna sequence database mirbase: integrating microrna annotation and deep-sequencing data fast genes and slow clades: comparative rates of molecular evolution in mammals determinants of rate variation in mammalian dna sequence evolution determination of mitochondrial genetic diversity in mammals correlates of substitution rate variation in mammalian protein-coding sequences the gene ontology categorizer natural selection on protein-coding genes in the human genome a scan for positively selected genes in the genomes of humans and chimpanzees positive selection on the human genome a new method for estimating nonsynonymous substitutions and its applications to detecting positive selection systematic and integrative analysis of large gene lists using david bioinformatics resources bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists rtp801 is elevated in parkinson brain substantia nigral neurons and mediates death in cellular models of parkinson's disease by a mechanism involving mammalian target of rapamycin inactivation rtp801 is a novel retinoic acid-responsive gene associated with myeloid differentiation mutations within 
a furin consensus sequence block proteolytic release of ectodysplasin-a and cause x-linked hypohidrotic ectodermal dysplasia parallel adaptive radiations in two major clades of placental mammals junin virus. a xxi century update construction of a uniformabundance (normalized) cdna library construction and characterization of a normalized cdna library pyrobayes: an improved base caller for snp discovery in pyrosequences de novo assembly of human genomes with massively parallel short read sequencing cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences tigr gene indices clustering tools (tgicl): a software system for fast clustering of large est datasets a general approach to single-nucleotide polymorphism discovery blat-the blast-like alignment tool a uniform system for microrna annotation fast folding and comparison of rna secondary structures analyzing modular rna structure reveals low global structural entropy in microrna sequence identification of clustered micrornas using an ab initio prediction method gapped blast and psi-blast: a new generation of protein database search programs phylogenetic and functional assessment of orthologs inference projects and methods macse: multiple alignment of coding sequences accounting for frameshifts and stop codons paml 4: phylogenetic analysis by maximum likelihood paml: a program package for phylogenetic analysis by maximum likelihood blast2go: a universal tool for annotation, visualization and analysis in functional genomics research categorizer: a web-based program to batch analyze gene ontology classification categories kegg for integration and interpretation of large-scale molecular data sets kegg: kyoto encyclopedia of genes and genomes kegggraph: a graph approach to kegg pathway in r and bioconductor orfpredictor: predicting proteincoding regions in est-derived sequences venndiagram: a package for the generation of highly-customizable venn and euler diagrams in r muscle: 
multiple sequence alignment with high accuracy and high throughput mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform aqua: automated quality improvement for multiple sequence alignments rascal: rapid scanning and correction of multiple sequence alignments morephyml: improving the phylogenetic tree space exploration with phyml 3 estimating maximum likelihood phylogenies with phyml model selection and multi-model inference estimating species trees from unrooted gene trees calculating bootstrap probabilities of phylogeny using multilocus sequence data we thank jason shaw for assistance with bat handling and stephanie james for laboratory assistance, and uga's georgia advanced computing resource center and institute of bioinformatics for computational resources and support. we thank anthony papenfuss for providing the assembled transcriptome for p. alecto. contributed reagents/materials/analysis tools: ts tg ll. wrote the paper: tis tg ts. key: cord-269352-0o3mryu1 authors: dhama, k.; mahendran, mahesh; gupta, p. k.; rai, a. title: dna vaccines and their applications in veterinary practice: current perspectives date: 2008-04-19 journal: vet res commun doi: 10.1007/s11259-008-9040-3 sha: doc_id: 269352 cord_uid: 0o3mryu1 inoculation of plasmid dna, encoding an immunogenic protein gene of an infectious agent, stands out as a novel approach for developing new generation vaccines for prevention of infectious diseases of animals. the potential of dna vaccines to act in presence of maternal antibodies, its stability and cost effectiveness and the non-requirement of cold chain have heightened the prospects. even though great strides have been made in nucleic acid vaccination, still there are many areas that need further research for its wholesome practical implementation. major areas of concern are vaccine delivery, designing of suitable vectors and cytotoxic t cell responses. 
also, the induction of immune responses by dna vaccines is inconclusive due to the lack of knowledge regarding the concentration of the protein expressed in vivo. alternative delivery systems having higher transfection efficiency and the use of cytokines as immunomodulators need to be further explored. recently, efforts have been made to modulate and prolong the active life of dendritic cells, in order to make antigen presentation more efficacious. dna vaccine clinical trials are underway for combating diseases like acquired immunodeficiency syndrome (aids), influenza, malaria and tuberculosis in humans, and foot and mouth disease, aujeszky's disease, swine fever, rabies, canine distemper and brucellosis in animals. this review highlights the salient features of dna vaccines, and measures to enhance their efficacy so as to devise an effective and novel vaccination strategy against animal diseases. the property of naked dna to transfect mammalian cells in vivo was first reported by ito (1960); three decades later, the concept of the dna vaccine was developed by wolff et al. (1990), when they administered a recombinant bacterial plasmid dna to obtain the expression of the β-galactosidase gene in mice. this paved the way for the development of nucleic acid based vaccination, an effective way for the in vivo expression of a desired protein to initiate an immune response (oshop et al. 2002; liu 2003). the application of dna immunization as a new generation vaccine has been well studied since its invention, and a variety of such vaccines have undergone clinical trials in veterinary practice (dunham 2002; oshop et al. 2002; babiuk et al. 2003; babiuk et al. 2007). dna vaccines elicit the desired immune responses, viz. cell mediated immunity (cmi) and humoral immune response (hir); they are also much easier to manipulate using recombinant dna techniques and to produce in bacteria using fed-batch fermentation (liu 2003; liu et al. 2006).
as an effective vaccine, the plasmid dna carries a gene encoding a protective antigen of a pathogen, which, when injected into the host, is transcribed and translated to induce a specific immune response. dna vaccines, described as genetic immunization to elicit a protective immune response, have been further improved by exploiting various gene delivery methods, cytokine adjuvants and prime-boost (dna vaccine priming and recombinant protein boosting) approaches (sharma and khuller 2001; jiang et al. 2007). dna vaccines have several advantages, which include simplicity of manufacture, biological stability, cost effectiveness and safety, ease of transport in lyophilized form and the ability to act in the presence of maternal immunity. besides, different genes can be combined simultaneously, making it possible to develop multivalent vaccines. the demerits of dna vaccines, which remain theoretical and not yet proven, are integration into the host genome, activation of proto-oncogenes, inactivation of tumor suppressor genes and the possibility of generating anti-nuclear antibodies (sharma and khuller 2001; dunham 2002). however, as the merits of dna vaccines outnumber the hypothesized demerits, they have now moved towards second stage clinical trials with promising results for human diseases like acquired immunodeficiency syndrome (aids), herpes infections, rabies, ebola, tuberculosis, malaria, and leishmaniosis. however, a commercial product has not reached the market yet, due to the safety concerns raised by the international regulatory organizations. regarding veterinary practice, the last few years have seen numerous trials of dna vaccines against various animal diseases like foot and mouth disease (fmd) and herpes virus infection in cattle, aujeszky's disease and classical swine fever in swine, rabies and canine distemper in canines, and avian influenza, infectious bronchitis, infectious bursal disease and coccidiosis in birds (oshop et al. 2002; dunham 2002; ding et al.
2005; gupta et al. 2006; patial et al. 2007). one of the distinct advantages of dna vaccines is the possibility of differentiating infected from vaccinated animals (diva), for effective disease eradication programs. the utility of 'marker' dna vaccines has been reported for diseases like fmd and avian influenza (lee et al. 2004; grubman 2005). even though dna vaccines have ushered in a new era in veterinary vaccinology, their potential to induce higher levels of immune response has to be further improved. in view of the appealing features and ease of generation of dna vaccines, the present review describes the salient features of dna vaccines, ways to improve their efficacy, and their potential applications in veterinary practice.
salient features of dna vaccines and strategies to improve vaccine efficacy
dna vaccines, generated using plasmids, include a gene encoding the target antigen under the transcriptional control of an effective viral/eukaryotic promoter, along with a polyadenylation signal sequence (poly-a) and a bacterial origin of replication (fig. 1) (gurunathan et al. 2000). the commonly used promoter is derived from cytomegalovirus (cmv). the poly-a provides stability and effective translation; and the antibiotic resistance gene facilitates selection of bacteria (gurunathan et al. 2000; sharma and khuller 2001; liu 2003; brandsma 2006). for complete optimization, the plasmid should have a kozak sequence (gcca/gcc) upstream of the initiator codon, and an enhancer downstream of the poly-a signal (gurunathan et al. 2000). as a vaccine, the amount of plasmid dna required for im administration is 10-100 μg in mice, 100-300 μg in small animals and 0.5-2.5 mg in large animals and humans; but, with a 'gene gun', only one hundredth of this amount is required (dunham 2002).
if properly optimized, the recombinant plasmids are capable of expressing the desired antigen in vivo (dunham 2002; babiuk et al. 2003; babiuk et al. 2007). but, to elicit an effective immune response, the protein should undergo post-translational modification and retain its tertiary structure. after in vivo generation, the antigenic peptides are processed and presented by professional antigen presenting cells (apcs), like dendritic cells, which are primed either through direct transfection or by obtaining proteins from myocytes via cross presentation (fig. 2) (fu et al. 1997); this allows presentation via both mhc i and ii, to induce cellular and humoral immunity. however, in some cases, dna vaccines have failed to produce measurable antibodies even when the host was protected, suggesting a major role for cellular immune responses (seo et al. 1997; kodihalli et al. 2000). in addition, the potential of dna vaccines to act in the presence of maternal antibodies is remarkable, due to the ability of dendritic cells to provide a durable source of the antigen. similarly, when administered with cationic liposomes, they enhance mucosal immunity, to protect the host from respiratory or enteric infections. besides, researchers have suggested that the plasmid itself has immunogenic properties, by virtue of having cpg motifs that function as an adjuvant and up-regulator of natural killer (nk) cells (krieg 2002). for improving dna vaccines, efforts have been made to enhance their efficacy through efficient vectors and vaccine delivery systems that combine delivery route and formulation.
fig. 1 essential features of a dna vaccine construct. the transcription unit comprises a promoter, the desired immunogenic or protective gene and a polyadenylation signal sequence. a bacterial origin of replication (ori) and an antibiotic resistance gene are also incorporated in the vector backbone to permit growth and selection of the plasmid in bacteria
in animals, dna vaccines have been found less effective owing to host impedance, improper transfection and low levels of expression. however, various delivery formulations and routes could enhance the immune response. delivery methods like suppositories (loehr et al. 2001), needle-free injector systems (van rooij et al. 1998), mucosal delivery (barnes et al. 2000) and topical application (oshop et al. 2002) have been found useful. vaccination using a gene gun helps in direct transfection of dendritic cells, favoring effective antigen presentation (dunham 2002; ulmer et al. 2006). also, by using cationic lipid complexes and cytokine adjuvants, the efficacy could be significantly improved (min et al. 2001; stevenson 2004; manoj et al. 2004a; dhama et al. 2007). however, the major lacuna that still persists is the lack of simultaneous generation of cmi and hir. hence, researchers have developed various options to improve dna vaccines so that cmi and hir increase together. addition of a eukaryotic secretory signal sequence has been found to improve the activity of cytotoxic and helper t cells.
fig. 2 professional apcs like dendritic cells and macrophages receive the secreted antigens (cross presentation) from transfected somatic cells or are directly transfected. they process and present the antigenic peptides on major histocompatibility complex (mhc) class ii molecules to helper t cells, which release a variety of cytokines to augment activation of various cells of the immune system. activation of cytotoxic t lymphocytes occurs through degraded antigenic peptides that are associated with mhc class i molecules. these two mechanisms help in the generation of cellular immune responses. for humoral or antibody responses, b lymphocytes recognize and respond to antigens that are present extracellularly, or as secreted antigens
also, linking of a ubiquitin molecule to enhance proteasome-based degradation, and the use of the toll-like receptor (tlr) adapter molecule myd88 along with the desired gene, may elicit a strong cmi and hir. further, the use of l-selectin or cytotoxic t lymphocyte antigen (ctla-4) ligands helps better antigen targeting to the immune cells (chaplin et al. 1999). similarly, targeting of the dendritic cells by heat shock proteins (hsp), microparticulate formulations, and complexing with nonionic block co-polymers, polycations or cochleates may also increase vaccine efficacy (manoj et al. 2004a). targeted intracellular delivery of plasmid dna can be further achieved using intracellular bacteria and viruses (liu 2003; daudel et al. 2007). recently, prime-boost regimens and ways to increase the functional life of dendritic cells have been explored (ulmer et al. 2006; tsen et al. 2007). also, electroporation a few days prior to vaccination has been found to force plasmid into cells or to promote the influx of inflammatory cells like macrophages into the muscle (babiuk et al. 2007; peng et al. 2007). further, nanoparticle-mediated plasmid delivery has been suggested for augmenting the immune responses (jiang et al. 2007).

veterinary vaccinology is a rapidly developing field; currently, vaccines are used not only for the prevention of diseases in animals but also to help solve public health crises by effectively checking emerging or re-emerging pathogens of zoonotic significance. advances in science and technology, together with improved knowledge of immunology, microbiology and recombinant technology, have played pivotal roles in introducing novel ideas in vaccinology (babiuk et al. 2003; liu et al. 2006). shams (2005) has pointed out that subunit vaccines, dna vaccines and vectored vaccines are rapidly gaining acceptance as new generation animal vaccines.
the last few years have seen the development of nucleic acid vaccines against many diseases like classical swine fever, aujeszky's disease, rabies, foot and mouth disease, brucellosis, bovine tuberculosis, equine herpes infections, avian infectious bronchitis, avian influenza, infectious bursal disease, chlamydiosis and avian coccidiosis (table 1). dna vaccines have promoted a revolution in the concept of vaccination, which was previously dominated by inactivated or live vaccines. among many advantages, dna vaccines also support a diva (differentiating infected from vaccinated animals) strategy, so that vaccine-induced herd/flock immunity can be differentiated for effective sero-surveillance (lee et al. 2004). the latest research developments in nucleic acid-based vaccination strategies for preventing various bacterial and viral diseases of domestic animals and poultry are discussed in detail below.

bovines

the genetic immunization approach against bacterial diseases of bovines offers attractive possibilities for rapid and effective vaccine development. against brucellosis, an intramuscularly (im) administered l7/l12 gene has been reported to result in intracellular expression of the immunodominant l7/l12 protein (kurar and splitter 1997). brucella abortus glyceraldehyde-3-phosphate dehydrogenase (gapdh), a t and b cell reactive protein, could induce partial protection when co-administered with il-12, and the copper-zinc superoxide dismutase (sod) antigen of b. abortus has also been utilized for the generation of an effective dna vaccine (rivers et al. 2006). against mycobacterial infections in cattle, huygen (2003) has explained the utility and feasibility of dna vaccine approaches. a dna vaccine based on the mycobacterium bovis protein mpb-83 has been shown to elicit protective immune responses when tested in mice (chambers et al. 2000). the use of the costimulatory molecule cd154 along with the esat-6 gene has been suggested to enhance the immunogenic properties of mycobacterial dna vaccines.
further, the m. bovis ag85b gene has been found suitable for incorporation in a plasmid vector owing to its ability to induce a th1 type of immune response (teixeira et al. 2006). in addition, dna vaccines used in combination with bacillus calmette-guerin (bcg) recently elicited superior protection during challenge studies (cai et al. 2006; dhama et al. 2007). against viral diseases of bovines, the earliest reports suggest the use of a gene encoding the vp4 protein of bovine rotavirus (brv), which was found effective in stimulating a th1-like immune response (suradhat et al. 1997). later, brillowska et al. (1999) reported the generation of an effective cellular immune response with a plasmid encoding envelope glycoprotein gp51 and transmembrane glycoprotein gp30 of the bovine leukemia virus (blv). also, a dna vaccine encoding the fusion (f) gene of bovine respiratory syncytial virus (brsv) has been found to induce protection against the infection in calves. besides, several workers have developed successful dna vaccine strategies against bovine herpesvirus-1 (bhv-1) infection (loehr et al. 2001; gupta et al. 2001; castrucci et al. 2005). gupta et al. (2001) reported that dna immunization with the gc gene of bhv-1 could induce neutralizing antibody and lympho-proliferative responses in bovines. also, the bhv-1 gb gene along with il-12 has been suggested to enhance ctl responses. further, suppositories containing a plasmid coding for the gd gene of bhv-1 induced mucosal immunity (loehr et al. 2001), and the bovine cd154 co-stimulatory molecule linked to the gd gene enhanced the immune responses (manoj et al. 2004b). it has also been suggested that the gd gene confers higher protection than the gc gene. further, the intercellular trafficking ability of the bhv-1 vp22 protein has been utilized to improve the efficacy of a dna vaccine encoding the gd gene (zheng et al. 2005).
a nucleic acid vaccine that could protect cattle against bovine viral diarrhea virus (bvdv) infection has also been developed. plasmid dna expressing the bvdv type 1 glycoprotein e2 induced virus-specific neutralizing antibodies (nobiron et al. 2003). the non-structural protein ns3 could also be used for inducing humoral immunity against bovine viral diarrhea (bvd). further, liang et al. (2006) found that dna prime-boost regimens were more effective for preventing bvd in cattle than the administration of dna vaccine or protein vaccine alone. vp1-based dna vaccines are being utilized for developing effective vaccines against foot and mouth disease (fmd) (dong et al. 2005). plasmid dna encoding the fmd virus (fmdv) vp1 protein, followed by boosting with a vp1 peptide conjugate, resulted in production of high titers of neutralizing antibodies, suggesting that a prime-boost strategy could be a key factor for the success of dna vaccines against fmd (jin et al. 2005). also, the use of il-1 along with the vp1 gene may provide an enhanced immune response (shao et al. 2005). recently, a microparticulate-based dna vaccine has been developed that codes for the t and b cell epitopes of vp1 of fmdv (wang et al. 2006). dna vaccines are also being reported against some rickettsial diseases and ectoparasites of bovines. a vaccine containing the gene of a major surface protein, msp1b, of anaplasma marginale offered partial protection against challenge infection (de andrade et al. 2004). dna constructs involving the orfs of the cpg1, groel and groes genes of ehrlichia ruminantium could partially protect cattle against heartwater disease. the potential of dna immunization with a plasmid encoding the antigen bm86 to induce humoral and cellular immune responses against the tick boophilus microplus has also been studied (ruiz et al. 2007).

ovines and caprines

nucleic acid vaccines have been developed that could confer protection against the common bacterial diseases of sheep and goats.
dna vaccination with genetically detoxified phospholipase d of corynebacterium pseudotuberculosis, linked with ctla-4, protected sheep against caseous lymphadenitis (chaplin et al. 1999). in the case of anthrax in sheep, the protective antigen (pa83) gene of bacillus anthracis has been employed for developing a highly promising dna vaccine (hahn et al. 2006). paratuberculosis or johne's disease, caused by mycobacterium avium subsp. paratuberculosis, has been successfully controlled by using plasmids coding for mycobacterial heat shock protein antigen (hsp-65) (sechi et al. 2006). against brucellosis, dna vaccines that encode brucella melitensis outer membrane proteins (omp), invasion protein b (ialb), periplasmic protein (bp26) and trigger factor (tf) have been found to induce significant immune responses, which could pave the way for the effective control of brucellosis in goats (gupta et al. 2007). for preventing viral diseases of small ruminants, dna vaccines have been developed that could protect against diseases like caprine arthritis-encephalitis (cae), foot and mouth disease (fmd), visna-maedi and rift valley fever. a plasmid expressing the cae viral envelope gene, used with a prime-boost vaccination strategy, has been shown to induce protective responses in goats (cheevers et al. 2003). with gene gun-based mucosal dna immunization against visna-maedi in sheep, a plasmid expressing the envelope gene of the virus in combination with the ifn-γ gene is expected to give significant protection and could also restrict virus replication. for preventing fmd infection, niborski et al. (2006) reported the development of a vp1-based nucleic acid vaccine and found that it was more effective when used along with a poly(lactide-co-glycolide) (plg) formulation. against rift valley fever, the viral glycoprotein gene of rift valley fever virus (rvfv), when used as a dna vaccine, has been found to induce neutralizing antibodies during experimental studies. recently, babiuk et al.
(2007) reported that a single dna vaccination with the hepatitis b virus surface antigen gene (hbsag), in combination with electroporation, approached the efficacy of the commercial subunit vaccine in maintaining long-term protective serum antibody titers against hepatitis b virus in sheep. dna vaccines are also being developed against many ecto- and endo-parasites of small ruminants. vaccination of sheep with a plasmid for the boophilus microplus antigen bm86, co-administered with a plasmid encoding ovine granulocyte-macrophage colony-stimulating factor (gm-csf), provided significant levels of protection against b. microplus infestation. an effective recombinant plasmid encoding the 15 kda sporozoite surface protein of cryptosporidium parvum has been developed for goats, which provided protective maternal immunity to offspring (sagodira et al. 1999). also, schistosoma japonicum genes (sj28gst and sj23) in dna vaccine formulation offered partial protection, as evidenced by a reduction in parasite counts (shi et al. 2001). similarly, a nucleic acid vaccine against tapeworms, using the 45w gene of taenia ovis, which encodes a protective membrane-bound antigen, showed an optimum humoral response (drew et al. 2000).

swine

very few studies have been reported on dna vaccines against bacterial diseases in swine. against swine enzootic pneumonia (sep), caused by mycoplasma hyopneumoniae, plasmid dna coding for the heat shock protein gene (p42) should be a suitable candidate as it is capable of inducing both th1 and th2 immune responses. the capability of the p97 adhesin repeat region of m. hyopneumoniae to produce immunogenicity in mice in a dna vaccine formulation has also been described (chen et al. 2006). however, considerable research has been directed towards developing dna vaccines to prevent swine fever, fmd and pseudorabies infection in pigs.
plasmid constructs have been developed that could protect swine against classical swine fever (csf), which causes significant losses to the pig industry in many asian and european countries (wienhold et al. 2005; andrew et al. 2006). immunization of pigs with a plasmid expressing the complete e2 protein of classical swine fever virus (csfv) conferred protection against viral challenge, and the co-delivery of il-3, il-18 and cd154 further enhanced the protective responses (wienhold et al. 2005; andrew et al. 2006; li et al. 2007). regarding fmd in swine, dna vaccines encoding the vp1 gene of the o, a and c strains of fmd virus showed protection against the disease when administered using a gene gun (benvenisti et al. 2001). two vp1 epitopes (amino acid residues 141-160 and 200-213) have also been found suitable to elicit both fmdv-specific t cell proliferation and neutralizing antibodies (wong et al. 2002). researchers have also developed successful dna vaccines using the prm and envelope (e) genes for japanese encephalitis virus and the nucleoprotein (n) gene for transmissible gastroenteritis virus. for constructing dna vaccines against pseudorabies (aujeszky's disease), the immunogenic viral protein genes such as gb, gc and gd are often considered (van rooij et al. 1998; gerdts et al. 1999; dory et al. 2005). plasmids encoding these glycoproteins, when co-injected with cpg motifs, improved the humoral immune response and provided better clinical protection against lethal pseudorabies infection (dory et al. 2005). also, it has been suggested that the gb and gd genes prime the immune system efficiently even in the presence of maternal antibodies. the utility of dimethyl-dioctadecylammonium (dda) as an adjuvant to pseudorabies dna vaccine has also been reported.
for the prevention of porcine reproductive and respiratory syndrome (prrs) in swine, dna immunization strategies could be formulated utilizing the viral orf 5 region that codes for the major envelope glycoprotein gp5. for endo-parasitic infestations like taeniasis (cysticercosis), a dna vaccine using taenia solium b antigen has recently been developed (guo et al. 2007).

canines

during the last couple of decades, immunoprophylactic agents have been developed that have greatly reduced the incidence of infectious diseases of pet animals. in canines, even though live vaccines have been found superior in efficacy, it is expected that new generation vaccines may dominate the market in the near future. attention has focused on developing dna vaccines to eliminate bacterial diseases like leptospirosis and lyme disease. leptospirosis, being a zoonotic disease, is being given due attention, and several studies have suggested the generation of successful vaccines based on its endoflagellar (flab2) gene. cpg motifs exist within the flab2 gene, which could give the dna vaccine an additional immunostimulatory property without the use of adjuvants (dai et al. 2003). also, the hemolysis associated protein (hap1) gene has been found to elicit protection against leptospirosis when encoded in plasmids. against borrelia burgdorferi, the spirochete that causes lyme disease, dna vaccine experiments are at the trial stage. the role of the outer surface protein genes (ospa and ospc) of b. burgdorferi in eliciting protective immune responses when administered as dna vaccines has also been explored. among the viral diseases affecting dogs, the important ones are rabies, canine distemper and parvoviral infections, and many studies have been conducted to analyze the utility of dna vaccines against them. the first nucleic acid vaccine to be successfully developed was against rabies (xiang et al. 1994). after this, it was jiang et al.
(1998) who reported a plasmid dna vaccine to protect the canine population from parvovirus infections, utilizing the vp1 gene of canine parvovirus (cpv). later, a successful dna vaccine was developed using the vp2 gene of cpv (gupta et al. 2005a). now, most of the research is centered on developing dna vaccines against rabies and canine distemper (cd). the advantage of a plasmid-based vaccine against rabies is that it is a valuable alternative for the mass production of cheaper rabies vaccine compared with cell culture-based ones. a dna vaccine encoding the rabies virus glycoprotein (g) has been found to yield stronger and more durable virus neutralizing antibody titers in dogs, as reported by several researchers (rai et al. 2005; gupta et al. 2005b), and the role of the trans-membrane domain of the rabies virus glycoprotein in assisting the humoral immune response has been experimentally demonstrated (gupta et al. 2006). further, a chimeric glycoprotein gene constructed from different lyssaviruses can be used as a multivalent vaccine. also, some studies have suggested the advantage of dna vaccine administered intradermally in the ear for inducing a long-lasting protective titer against rabies. recently, a bicistronic multivalent dna vaccine has been developed by patial et al. (2007), comprising rabies virus (g) and parvovirus (vp2) genes for inducing neutralizing antibodies against both viral pathogens. it was sixt et al. (1998) who first developed a plasmid vaccine against canine distemper (cd), a highly infectious and lethal disease of pups, that encoded the hemagglutinin (ha) and fusion protein (f) genes. also, plasmids containing the nucleocapsid (n), f and ha genes of a virulent cd virus strain can be developed, which could elicit strong humoral and cellular immune responses (jain et al. 2007).
further, immunization with an n protein-based dna vaccine against cd, to elicit serum n-specific igg responses and thereby generate satisfactory protection, should also be considered a viable option. similarly, the use of ha and f protein genes in a cationic lipid formulation should work well for protecting pups from clinical disease even in the presence of high titers of maternally derived antibodies. dna vaccines are also being attempted for canines, targeting protozoan diseases (fukumoto et al. 2007). the gene coding for the babesia canis protein p50 has been found to induce protective immunity against canine babesiosis, while the p36/lack antigen gene has been used against leishmaniosis. likewise, plasmid dna encoding a xenogeneic tyrosinase can also function as a tumor vaccine for prolonging the survival of dogs with malignant melanoma.

equines

the increasing international movement of horses and the relaxation of regulations have resulted in an increased incidence of equine infectious diseases. vaccination, along with management measures, has become the primary method for the effective control of rhodococcal infections, equine influenza, african horse sickness and herpes infections. the advent of recombinant technology has encouraged the development of new generation vaccines such as live-vectored vaccines and dna vaccines (minke et al. 2006). among bacterial diseases, dna vaccines have been generated to protect foals from rhodococcus equi, which causes pyogranulomatous bronchopneumonia. a nucleic acid vaccine expressing the virulence associated protein (vapa) gene of r. equi induced an anamnestic response and has been found capable of generating specific igg antibodies (vanniasinkam et al. 2005).
against equine herpesvirus-1 (ehv-1), a viral pathogen of horses causing respiratory, reproductive and neurological problems, plasmid dna encoding the envelope glycoprotein d (gd) has been suggested to induce a humoral response, and the administration of gm-csf along with such vaccines significantly enhanced virus neutralizing antibody responses to ehv-1 (minke et al. 2006). for equine influenza (ei), lunn et al. (1999) recorded complete protection of experimental ponies during challenge after administering a dna vaccine encoding the hemagglutinin (ha) gene of ei virus. in addition, administration of il-6 along with the ha gene enhances vaccine efficacy and protection (olsen 2000). to prevent west nile virus infection in horses, the envelope protein genes (prm and e) have been incorporated in a dna vaccine formulation to elicit satisfactory protection (hall and khromykh 2004). similarly, giese et al. (2002) developed effective dna vaccines encoding orfs 5 and 7 of equine arteritis virus (eav). development of nucleic acid vaccines has also been attempted for african horse sickness (vp2 gene) as well as vesicular stomatitis (envelope gene).

poultry

historically, inactivated whole viruses with various adjuvant systems or live vaccines have been used for the successful prevention of various bacterial and viral diseases of poultry. however, naked dna immunization as a third generation vaccine approach has been well studied, and a variety of such vaccines have recently entered clinical trials for use in poultry (oshop et al. 2002; dhama et al. 2006). a plasmid dna vaccine encoding the enterotoxigenic escherichia coli k88 fimbrial protein elicited satisfactory protection during e. coli challenge (cho et al. 2004).
similarly, plasmids expressing the major outer membrane protein (momp) of chlamydophila psittaci serovar a and d strains have been found to induce protective immunity with significant reduction in clinical symptoms (vanrompay et al. 2001). also, higher levels of protection against c. psittaci could be obtained by administering interferon gamma (ifn-γ) or vitamin d along with dna vaccines. besides, dna vaccines have been developed against major viral infections of poultry like avian influenza, utilizing the ha gene of the virus (kodihalli et al. 2000; lee et al. 2004). similarly, a vaccine encoding the fusion (f) and haemagglutinin-neuraminidase (hn) genes induced higher levels of antibodies against newcastle disease virus (ndv) in chickens (loke et al. 2005). for marek's disease, a dna vaccine containing a clone of virulent serotype-1 marek's disease virus (mdv) has been found useful during challenge infection in birds. vaccination with a mutated, non-oncogenic v-src gene construct, derived from avian leucosis virus (alv), induced cytotoxic t lymphocytes (ctl) to protect birds from tumors. the protective utility of a plasmid-encoded n protein gene of infectious bronchitis virus (ibv) has also been reported (seo et al. 1997), and dna vaccines expressing the s1 glycoprotein of ibv have been suggested. against infectious bursal disease (ibd), dna vaccination based on the vp2 gene or the vp2/4/3 polyprotein gene of ibd virus has been found effective in protection. dna vaccination against duck hepatitis b virus has been shown to reduce viremia, with rapid removal of the virus from the blood after challenge. against chicken infectious anemia (cia), the simultaneous in vitro and in vivo expression of viral proteins vp1 and vp2 has been studied for the first time for generating protective antibodies against the infection (senthil kumar et al. 2004).
also, dna vaccines have been developed against avian reovirus (σc protein gene) and egg drop syndrome (eds-76) virus (penton fiber gene fragment), and both vaccines were found effective during their respective challenge studies. likewise, against protozoan infections, especially coccidiosis, successful dna vaccines have been developed utilizing the 3-1e and etmic2 genes in combination with cytokines to provide protection from this economically important infection of birds (min et al. 2001; ding et al. 2005).

vaccination with dna is one of the most promising novel immunization techniques against pathogens for which conventional vaccination regimens have been less effective. after about 15 years of experimentation, dna vaccines, nicknamed the 'immunological silver bullet', have become well established in clinical trials. however, they have yet to proceed past second phase trials, primarily owing to their inability to induce more potent immune responses in higher mammals. in small experimental animals, milder host impedance has permitted dna vaccines to induce lasting protective effects, in contrast to the much tougher host barriers in large animals. significant efforts have been made to identify methods of enhancing the immune response to plasmid dna to enable its practical implementation. prime importance has been given to developing vaccines that elicit both humoral and cellular immune responses. researchers have tried a variety of immune modulators, cytokines and co-stimulatory molecules in this regard. if their potency is improved, plasmid dna vaccines, having numerous advantages, can be useful for active immunization against infectious diseases of animals. considering the current trends and myriad possibilities, efforts should be targeted towards improving their delivery or increasing their immunogenic potential.
poor cellular uptake and rapid in vivo degradation of plasmid dna have to be taken into account, and novel delivery systems have to be developed along with optimization of the plasmid vector. the major challenge for the future is improving the transfection efficiency of dna vaccines. gene gun and electroporation can increase transfection and improve immune responses significantly, but these technologies have not yet advanced to routine use in animals. another promising approach is the development of microparticles as delivery systems, or non-invasive plasmid dna immunization. although the immune response has been weak with topical application methods, stratum corneum disrupting agents and novel adjuvants may significantly improve it. further, the properties of dna vaccines have to be modulated using cationic liposomes to promote mucosal and systemic immunity simultaneously. the incorporation of such novel methodologies holds much promise for the development of effective, safe and economically viable nucleic acid vaccines. in this context, one should be optimistic about the continuing research efforts towards global implementation of dna vaccines as an effective immunological arsenal, which could ably address the threats posed by emerging and highly threatening infectious agents of animals.

porcine interleukin-3 enhances dna vaccination against classical swine fever induction of immune responses by dna vaccines in large animals a single hbsag dna vaccination in combination with electroporation elicits long-term antibody responses in sheep recent developments in mucosal delivery of plasmid dna vaccines.
current opinions in molecular therapy gene gun-mediate dna vaccination against foot-and-mouth disease virus dna vaccine design protection of cattle against bovine leukemia virus (blv) infection could be attained by dna vaccination a combined dna vaccine-prime, bcg-boost strategy results in better protection against mycobacterium bovis challenge vaccination trials against bovine herpesvirus-1 vaccination of mice and cattle with plasmid dna encoding the mycobacterium bovis antigen mpb83 targeting improves the efficacy of a dna vaccine against corynebacterium pseudotuberculosis in sheep prime-boost vaccination with plasmid dna encoding caprine-arthritis encephalitis lentivirus env and viral su suppresses challenge virus and development of arthritis evaluation of the immunogenicity of the p97r1 adhesin of mycoplasma hyopneumoniae as a mucosal vaccine in mice a plasmid dna encoding chicken interleukin-6 and escherichia coli k88 fimbrial protein faeg stimulates the production of anti-k88 fimbrial antibodies in chickens analysis of cpg motifs in endoflagellar gene (flab2) and expression vector (vr1012) of leptospiral dna vaccine. 
sichuan da xue xue bao yi xue ban use of attenuated bacteria as delivery vectors for dna vaccines immunization of bovines using a dna vaccine prepared from the jaboticabal strain of anaplasma marginale bm86 antigen induces a protective immune response against boophilus microplus following dna and protein vaccination in sheep immunity and disease resistance strategies in poultry: current and future prospects dna vaccines and prevention of infectious diseases in bovines: a review in ovo vaccination with the eimeria tenella etmic2 gene induces protective immunity against coccidiosis construction of recombinant plasmid with vp1 genes against asia i fmdv and elementary analysis of its immunological activity cpg motif in atcgat hexamer improves dna-vaccine efficiency against lethal pseudorabies virus infection in pigs humoral immune responses to dna vaccines expressing secreted, membrane bound and non-secreted forms of the taenia ovis 45w antigen the application of nucleic acid vaccines in veterinary medicine priming of cytotoxic t lymphocytes by dna vaccines: requirement for professional antigen-presenting cells and evidence for antigen transfer from myocytes prime-boost immunization with dna followed by a recombinant vaccinia virus expressing p50 induced protective immunity against babesia gibsoni infection in dogs potency of an experimental dna vaccine against aujeszky's disease in pigs stable and long-lasting immune response in horses after dna vaccination against equine arteritis virus development of novel strategies to control foot-and-mouth disease: marker vaccines and antivirals induction of protection against porcine cysticercosis in growing pigs by dna vaccination induction of immune responses in cattle with a dna vaccine encoding glycoprotein c of bovine herpesvirus-1 cloning of canine parvovirus vp2 gene and its use as dna vaccine in dogs immunogenicity of a recombinant plasmid dna containing glycoprotein gene of rabies virus cvs a dna vaccine that encodes 
rabies virus glycoprotein lacking transmembrane domain enhances antibody response but not protection induction of immune response in mice with a dna vaccine encoding outer membrane protein (omp31) of brucella melitensis 16m dna vaccines: immunology, application, and optimization comparison of the immunological memory after dna vaccination and protein vaccination against anthrax in sheep west nile virus vaccines on the use of dna vaccines for the prophylaxis of mycobacterial diseases a tumor reducing factor extracted by phenol from papillomatous tissue of cotton tail rabbits hemagglutinin (h) gene of canine distemper virus cloned in ptarget mammalian expression vector induces neutralizing antibody response in dogs nucleic acid immunization protects dogs against challenge with virulent canine parvovirus novel chitosan derivative nanoparticles enhance the immunogenicity of a dna vaccine encoding hepatitis b virus core antigen in mice dna prime followed by protein boost enhances neutralization and th1 type immunity against fmdv strategies for inducing protection against avian influenza a virus subtypes with dna vaccines cpg motifs in bacterial dna and their immune effects nucleic acid vaccination of brucella abortus ribosomal l7/l12 gene elicits immune response generation of reassortant influenza vaccines by reverse genetics that allows utilization of a diva (differentiating infected from vaccinated animals) strategy for the control of avian influenza oral dna vaccination with polyprotein gene of ibdv delivered by attenuated salmonella elicits protective immune response in chickens a semliki forest virus replicon vectored dna vaccine expressing the e2 glycoprotein of classical swine fever virus protects pigs from lethal challenge priming with dna encoding e2 and boosting with e2 protein formulated with cpg oligodeoxynucleotides induces strong immune responses and protection from bovine viral diarrhea virus in cattle dna vaccines: a review dna vaccines: recent 
developments and future possibilities suppository-mediated dna immunization induces mucosal immunity against bovine herpesvirus-1 in cattle improved protection from velogenic newcastle disease virus challenge following multiple immunizations with plasmid dna encoding for f and hn genes antibody responses to dna vaccination of horses using the influenza virus hemagglutinin gene approaches to enhance the efficacy of dna vaccines modulation of immune responses to bovine herpesvirus-1 in cattle by immunization with a dna vaccine encoding glycoprotein d as a fusion protein with bovine cd154 adjuvant effects of ill beta, il-2, il-8, 1l-15, ifn-alpha, ifn-gamma tgf-beta and lymphotactin in dna vaccination against eimeria acervulina use of dna and recombinant canarypox viral vectors for equine herpes virus vaccination efficacy of particle-based dna delivery for vaccination of sheep against fmdv dna vaccination against bovine viral diarrhoea virus induces humoral and cellular responses in cattle with evidence for protection against viral challenge dna immunization of dairy cows with the clumping factor a of staphylococcus aureus dna vaccination against influenza viruses: a review with emphasis on equine and swine influenza dna vaccination in avian virus neutralizing antibody response in mice and dogs with a bicistronic dna vaccine encoding rabies virus glycoprotein and canine parvovirus vp2 electric pulses applied prior to intramuscular dna vaccination greatly improve the vaccine immunogenicity development of rabies dna vaccine using a recombinant plasmid brucella abortus: immunity, vaccines and prevention strategies based on nucleic acids immune response in mice and cattle after immunization with a b. microplus dna vaccine containing bm86 gene protection of kids against c. 
parvum infection after immunization of dams with cp15-dna immunization with dna vaccines encoding different mycobacterial antigens elicits a th1 type immune response in lambs and protects against mycobacterium avium subspecies paratuberculosis infection development of dna vaccine against chicken anemia virus simultaneously using it's vp1 and vp2 proteins the carboxyl-terminal 120-residue polypeptide of ib virus nucleocapsid induces ctls and protects chickens from acute infection recent developments in veterinary vaccinology dna fragment encoding human il-1β 163-171 peptide enhances the immune responses elicited in mice by dna vaccine against foot-and-mouth disease dna vaccines: future strategies and relevance to intracellular pathogens laboratory and field evaluation of schistosoma japonicum dna vaccines in sheep and water buffalo in china cd virus dna vaccination induces humoral and cellular immunity and protects against a lethal intracerebral challenge dna vaccines and adjuvants dna immunization with a bovine rotavirus vp4 gene induces a th1-like immune response in mice dna vaccine using m. bovis ag85b antigen induces partial protection against experimental infection in balb/c mice enhancing dna vaccine potency by modifying the properties of antigen-presenting cells dna vaccines: recent technological and clinical advances effect of vaccination route and composition of dna vaccine on the induction of protective immunity against psuedorabies infection in pigs immune response to vaccines based upon the vapa protein of the horse pathogen, rhodococcus equi, in a murine model protection of turkeys against c. 
psittaci challenge by parenteral and mucosal inoculations and the effect of turkey interferon-gamma on genetic immunization enhanced immunogenicity of microparticulated multiepitope dna vaccine encoding t and b cell epitopes of foot and mouth disease virus in mice immunomodulatory effect of plasmids co-expressing cytokines in classical swine fever virus subunit gp55/e2-dna vaccination direct gene transfer into mouse muscle in vivo a dna vaccine against fmd elicits an immune response in swine which is enhanced by co-administration with interleukin-2. vaccine vaccination with a plasmid vector carrying the rabies virus glycoprotein gene induces protective immunity against rabies virus selection of protective epitopes for brucella melitensis by dna vaccination bovine herpesvirus 1 vp22 enhances the efficacy of a dna vaccine in cattle
key: cord-285656-7o7ofk1e authors: dawson, harry d.; chen, celine; gaynor, brady; shao, jonathan; urban, joseph f. title: the porcine translational research database: a manually curated, genomics and proteomics-based research resource date: 2017-08-22 journal: bmc genomics doi: 10.1186/s12864-017-4009-7 sha: doc_id: 285656 cord_uid: 7o7ofk1e background: the use of swine in biomedical research has increased dramatically in the last decade. diverse genomic and proteomic databases have been developed to facilitate research using human and rodent models. current porcine gene databases, however, lack the robust annotation to study pig models that are relevant to human studies and for comparative evaluation with rodent models. furthermore, they contain a significant number of errors due to their primary reliance on machine-based annotation. to address these deficiencies, a comprehensive literature-based survey was conducted to identify certain selected genes that have demonstrated function in humans, mice or pigs. 
results: the process identified 13,054 candidate human, bovine, mouse or rat genes/proteins used to select potential porcine homologs by searching multiple online sources of porcine gene information. the data in the porcine translational research database (http://www.ars.usda.gov/services/docs.htm?docid=6065) is supported by >5800 references, and contains 65 data fields for each entry, including >9700 full length (5′ and 3′) unambiguous pig sequences, >2400 real time pcr assays and reactivity information on >1700 antibodies. it also contains gene and/or protein expression data for >2200 genes and identifies and corrects 8187 errors (gene duplication artifacts, mis-assemblies, mis-annotations, and incorrect species assignments) for 5337 porcine genes. conclusions: this database is the largest manually curated database for any single veterinary species and is unique among porcine gene databases in regard to linking gene expression to gene function, identifying related gene pathways, and connecting data with other porcine gene databases. this database provides the first comprehensive description of three major super-families or functionally related groups of proteins (cluster of differentiation (cd) marker genes, solute carrier superfamily, atp binding cassette superfamily), and a comparative description of porcine micrornas. electronic supplementary material: the online version of this article (doi:10.1186/s12864-017-4009-7) contains supplementary material, which is available to authorized users. swine are important models for human anatomy, nutrition, metabolism and immunology [1] [2] [3]. their organs are anatomically and histologically similar to those of humans, as are their sensory innervation and blood supply [4]. 
pigs are naturally susceptible to infection with organisms that are closely related or identical to those species infecting humans including helminths (ascaris, taenia, trichuris, trichinella, schistosoma, strongyloides), bacteria (campylobacter, chlamydia, escherichia coli, helicobacter, neisseria, mycoplasma, salmonella), protozoans (toxoplasma) and viruses (coronavirus, hepatitis e, influenza, nipah, reovirus, rotavirus) [2, 5, 6]. the last 10 years have seen a boom in the development of genetically modified pigs as models for human cardiovascular and lung disease, neurodegenerative and musculoskeletal disorders [7, 8] and cancer [9]. there is also a robust effort to develop pigs as sources for organs and tissues for human xenotransplantation [10]. despite these potential strengths as a model, the lack of an annotated database for porcine gene and protein expression data is a limiting factor for translating findings in one species to another. multiple online databases exist for the storage and retrieval of diverse bovine, rodent or human biomedical data [11] [12] [13] [14] [15] [16] [17] [18] [19]. other databases exist for zebrafish (zfin, [20]), c. elegans (wormbase, [21]), and drosophila melanogaster (flybase, [22]). databases that encompass multispecies analysis such as homologene and/or that rely on manual annotation such as innatedb [23] include bovine but not porcine genes. several porcine genome companion databases exist; however, they lack robust manual annotation and are somewhat limited in scope or are infrequently updated [16] [17] [18] [19]. agbase, a large, multispecies functional analysis database, allows the user to search 51,489 porcine genes based on 12 criteria including gene and protein names (uniprot) and gene ontology (go) annotations. furthermore, databases can contain a significant number of errors due to their primary reliance on machine-based annotation [24]. 
for example, the sus-bar database [19] is designed to identify protein orthologs based upon data that includes annotations from the machine-annotated ncbi genome. ncbi has recently begun to include go annotations into curated entries for non-human and rodent species but most of these are indirect and often based on observations made in other species. as swine are an important model for comparative human studies, there is a critical need to have a centralized, manually-curated source of information for biomedical research. to address these needs, we created the porcine translational research database. to generate content of immunological relevance, broad-based literature searches were conducted using the following terms: apoptosis, b cell development or activation, cd markers, chemokines, chemokine receptors, cytokines, cytokine receptors, dendritic cells, type 1 ifn induced genes, inflammation, nuclear factor kappa-light-chain-enhancer of activated b cells (nf-κb) signaling pathway, toll receptor signaling pathway, t cell development or activation, th1 cell development and th2 cell development. immunologically related genes associated with the susceptibility to or pathology of allergy, asthma, arthritis, atherosclerosis and inflammation were also included. in addition, the gene ontology consortium's community annotation wikis for immunology, cardiovascular disease and muscle biology were searched (http://wiki.geneontology.org/index.php/main_page). the jackson laboratory database of knockout mouse phenotypes was searched for genes leading to defects in immune or metabolic phenotypes when over or under expressed. these genes include the vast majority of genes that are related to immunity and inflammation [2, 3, 25, 26]. for additional metabolically related genes, genes involved in the transport or metabolism of macronutrients, trace vitamins and minerals were searched. 
other genes, associated with the susceptibility to or pathology of atherosclerosis, diabetes, and obesity, were identified. this process identified 13,054 candidate human, bovine, mouse or rat genes/proteins of interest used to select potential porcine orthologs by searching various online sources of porcine gene information. one to one orthology of protein coding genes was determined by protein structure similarity (best reciprocal blast hits) and the presence of a corresponding gene in the syntenic region of the human and/or mouse genome. no 1:1 orthology could be established for members of some gene families including the leukocyte immunoglobulin-like receptor (lilr), killer cell immunoglobulin-like receptor (kir), carcinoembryonic antigen-related cell adhesion molecule (ceacam) and cytochrome p450 superfamilies. one to one porcine orthologs of human genes utilize the approved hgnc name according to the international society for animal genetics (isag) publishing guidelines. we defined pseudogenes by the criteria used by ensembl and enscode; namely the presence of one or more stop codons in the open reading frame that disrupt the protein structure, and (usually) a lack of intron structure at the genome level [27]. pseudogenes are further classified into processed, duplicated, unitary or polymorphic categories [27]. genbank (non-redundant, expressed sequence tag, high throughput genomic sequence, trace archive and whole genome shotgun contigs databases) was searched by discontiguous megablast using default settings (word size = 11), using reference sequence accession numbers to human, bovine, mouse or rat genes/proteins of interest. 
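the best-reciprocal-hit criterion for 1:1 orthology described above can be illustrated with a minimal python sketch. it is not the authors' pipeline: the function names, the tuple layout (query, subject, bit score) and the toy identifiers are assumptions made for illustration, and the second criterion used in the text (a corresponding gene in the syntenic region) is not modelled here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Keep pairs where each sequence is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


# Toy data (invented scores, illustrative identifiers only).
human_vs_pig = [
    ("IL10", "pig_IL10", 350.0),
    ("IL10", "pig_IL19", 120.0),
    ("KIR2DL1", "pig_KIR_x", 90.0),
]
pig_vs_human = [
    ("pig_IL10", "IL10", 348.0),
    ("pig_IL19", "IL19", 300.0),
    ("pig_KIR_x", "KIR3DL2", 95.0),
    ("pig_KIR_x", "KIR2DL1", 90.0),
]

orthologs = reciprocal_best_hits(human_vs_pig, pig_vs_human)
print(orthologs)
```

in this toy example the kir-like pair is dropped because the best hits are not reciprocal, which mirrors why no 1:1 orthology could be assigned for rapidly evolving families such as lilr and kir.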
a similar search was conducted in the following databases using the human or bovine reference sequence: nih intramural sequencing center (nisc) comparative vertebrate sequencing project [28]; national center for biotechnology information (ncbi), sus scrofa genome assembly releases 102 to 105 and ensembl v10.2 releases 83 to 89. for genes that were determined to be missing from build 10.2 (additional file 1) and for the mis-assembled or duplicated gene artifacts (additional file 1), we also constructed templates from de novo assemblies derived from illumina 80 bp reads of the pig alveolar macrophage transcriptome (dawson, unpublished results) using the de novo assembly algorithm of clc genomics workbench with a word size of 20 and a bubble size of 50. when necessary, predicted templates (from bovine or human sequences) were supplemented with porcine expressed sequence tag (est) assemblies, single ests and portions of the published tibetan (bioproject # prjna291130), wuzhishan (bioproject # prjna144099), goettingen (bioproject # prjna291011) [29], jinhua, meishan, bamei, large white, berkshire, hampshire, pietrain, landrace, rongshang and duroc (bioproject # prjna309108) porcine genomes [30]. ests were assembled using cap3 (http://doua.prabi.fr/software/cap3). rnaseq reads were then mapped to these predicted templates in order to derive the full-length consensus sequence (unambiguous 6x coverage) using clc genomics workbench 7.0 (qiagen bioinformatics, redwood city ca). the following settings were used: mismatch cost = 2, insertion cost = 3, deletion cost = 3, similarity fraction = 0.95, length fraction = 0.95. nucleotide sequences were translated using the expasy translate tool (http://web.expasy.org/translate/). a total of 1279 of these sequences have been deposited to the transcriptome shotgun assembly sequence database under bioproject prjna80971 and the short read archive under project srp013743. 
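the mapping thresholds quoted above (similarity fraction = 0.95, length fraction = 0.95) and the 6x coverage requirement can be read as simple acceptance tests. the sketch below is an interpretation, not clc genomics workbench code: it assumes that the length fraction is the share of the read that must align and that the similarity fraction is the identity required within the aligned part. the function names are hypothetical.

```python
from typing import List, Sequence


def passes_mapping_filter(
    read_length: int,
    aligned_length: int,
    matching_bases: int,
    length_fraction: float = 0.95,
    similarity_fraction: float = 0.95,
) -> bool:
    """Accept a read only if enough of it aligns and the aligned part is similar enough."""
    if read_length <= 0 or aligned_length <= 0:
        return False
    covers_enough = aligned_length / read_length >= length_fraction
    similar_enough = matching_bases / aligned_length >= similarity_fraction
    return covers_enough and similar_enough


def positions_below_coverage(depths: Sequence[int], minimum: int = 6) -> List[int]:
    """Return 0-based template positions whose read depth is under the minimum."""
    return [i for i, depth in enumerate(depths) if depth < minimum]


# An 80 bp read with 78 aligned bases, 76 of them identical to the template.
print(passes_mapping_filter(80, 78, 76))
# Per-base depths along a short toy template.
print(positions_below_coverage([8, 7, 6, 5, 9, 2]))
```

a template would be reported as full length only when `positions_below_coverage` returns an empty list across the whole transcript.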
in silico-derived full-length rna sequences are provided for an additional 3391 genes. this process/pipeline is summarized in fig. 1. a summary of these sequences is provided in table 1. we randomly chose 268 of these mrna to compare the 5′, 3′ and orf lengths with those of the corresponding human mrna. data are presented in additional file 4. for the 1041 protein-coding genes missing from the genome, we entered the gene symbols into david version 6.8 (https://david.ncifcrf.gov) to assess overrepresentation of groups of genes with related function. the functional data were limited to human. nine hundred and fifty six genes out of 1041 genes were recognized and 955 had functional annotations; of the unrecognized genes, 41 are pig or artiodactyl specific genes. data on functional enrichment of genes with a multiple comparison adjustment (benjamini) value of <0.05 are presented in table 3. we chose the 60 largest proteins of extreme size (>3000 amino acids) to compare the status (number of loci and completeness) in the ncbi and ensembl build 10.2 genome. because exon number is usually well conserved and there is fragmentation of certain areas of the porcine genome, the number of exons for the corresponding human gene was used for comparison. lastly, we determined the chromosomal location of 1307 duplicated gene artifacts (2889 loci, additional file 2) to identify problematic regions. data are expressed as duplication per megabase (number of bases derived from the ncbi genome build (http://www.ncbi.nlm.nih.gov/genome?term=sus%20scrofa)) and are presented in fig. 2. the currently described database was constructed in the filemaker pro advanced v14.0 program (filemaker inc.) and was deployed using the filemaker server advanced v14.0 program (filemaker inc., santa clara, ca). external access to the database has been successfully tested using chrome, internet explorer and safari browsers. 
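the benjamini adjustment used for the functional enrichment analysis described above is reported by david; how david computes it internally is not described in this text. as a point of reference only, the standard benjamini-hochberg step-up adjustment can be sketched as follows. the p-values in the example are invented and are not taken from table 3.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw enrichment p-values
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw={p:.3f} adjusted={q:.4f} significant={q < 0.05}")
```

only terms whose adjusted value falls below the chosen cutoff would be carried into a table of enriched functional groups.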
other areas of the database were populated from existing published or our own unpublished data. each publication is manually reviewed and data (antibodies, real-time pcr assays, rna or protein expression data, functional data) is abstracted and entered into the database, along with the pubmed id, in the appropriate field. we have developed taqman real-time pcr assays for 1867 of these genes, making them cross reactive for as many species as possible (1067 are partially or fully human gene cross reactive). this is to ensure that comparable areas of the gene are being analyzed as well as for economic reasons. we also conducted a literature survey to determine the sequence of porcine sybr green pcr assays. tissue-specific gene expression summaries, using these assays, are provided for these and other studies (i.e., those using microarray and rnaseq), and a comprehensive search of catalogs and published literature was conducted to identify antibodies to the corresponding proteins. last, the "notes field" in the database was populated with information such as types of errors discovered, degree of 5′ and 3′ utr conservation, degree of positive selection pressure in various species, and intron status. when the gene (sequence) is present in the genome but not annotated as a gene, we annotate the gene in the notes field as "not an identified gene in ensembl build 10.2." or "not an identified gene in ncbi build 10.2". to date, we have generated 9720 full-length transcripts representing 9165 genes (table 1). they include 1354 genes missing from ensembl build 10.2 (table 2 and additional file 1) and 1400 genes that have been sequenced at least two times (gene duplicated artifacts, shown in table 2 and additional file 2) that were annotated as separate genes in either ensembl or ncbi builds. 
functional enrichment analysis of 1041 protein-coding genes that are missing from the genome reveals that genes that are annotated as cytokines (24, p = 0.0053) and transcription factors (68) (particularly homeodomain-like transcription factors (34, p = 0.032) and cenp-b/helix-turn-helix (hth) domains (6, p = 0.035)) are significantly overrepresented (table 3). of note, the great majority of the interleukin 1 superfamily (il1f10, il1rn, il36a, il36b, il36g, il36rn, il37) members are significantly (p = 0.0073) overrepresented. data analyses that do not account for these genes risk missing important genes involved in inflammation and development. based upon gene number estimates from other closely related species such as human and cow, we estimated that our database has a coverage rate of approximately 42% of the porcine genome. these represent sequences found in 10,232 unigene entries (1.45 per gene) and 9967 ncbi loci (5756 are single loci that are not duplicated gene artifacts or split into multiple loci, and 1793 genes have multiple (4211) loci). a total of 2109 and 1616 of the genes have no assigned unigene number or ncbi loci, respectively. in addition to go and kyoto encyclopedia of genes and genomes (kegg) annotations, literature-based functional annotations (derived from more than 5500 references) are provided for these sequences. we have also discovered a relatively large number (178) of porcine or artiodactyl-specific paralogs (additional file 3) for 104 protein or non-protein coding porcine genes. for genes with multiple paralogs, genes are named in the order of phylogenetic distance from the parent human or bovine gene. some of these genes are expressed pseudogenes. some of these genes have been previously discussed (i.e., cd36, il1b [25, 31]) or will be discussed in the following sections. 
the transcripts we have generated for protein-coding genes include, on average, 70.5% of the corresponding 5′ and 3′ ends (each) of the human sequence (additional file 4). the orf is 99.4% conserved on a nucleotide count basis. these percentages indicate the fidelity of our procedure. we discovered extensive gene truncation (incomplete orf) and gene duplicated artifacts (genes sequenced more than once) among the machine annotated versions of these genes. these problems are common among 1st drafts of other genomes [32, 33] . gene duplicated artifacts appear most frequently for chromosomes 12, 2 and 3, and less frequently for chromosomes x, 11, 13, and 1 (fig. 2) . the most frequent areas should be targeted for re-sequencing or reassembly. analysis of the 60 largest porcine proteins in the database shows that gene fragmentation and truncation roughly correlate with protein size and number of exons (table 4 ). figure 4 shows blast search results from two extremely large proteins, hemicentin (hmcn1, panel a) and titin (ttn, panel b) that have 9 loci assignments each, in the current ncbi build. surprisingly, these proteins are not represented in ensembl build 10.2 as annotated genes. overall, of the 60 largest porcine proteins, only 6 and 10 are represented as single full-length sequences of the correct size in ensembl and genbank, respectively. we have deposited 12 de novo assemblies in the tsa archive and have provided in silico predicted rna and protein sequences for 37 of these genes. in previous studies, we extensively compared porcine, human and mouse genes related to immunity and inflammation [2, 3, 25, 26] . in the following section, we will summarize our findings for three major superfamilies or functionally related groups of proteins (cd marker genes, solute carrier superfamily, atp binding cassette superfamily) or non-coding rna (microrna) that have complete or nearly complete representation. 
cd markers (accessible as a group by entering cd markers in the annotations field) encode a heterogeneous group of cell surface proteins. the human leucocyte differentiation antigen (hlda) workshop has designated 408 molecules (some of which are grouped within a cd) as cd markers [34]. based upon our assembly and analysis, we could establish 1:1 orthology for 357 porcine genes to those that compose hlda version 10. forty-three genes are not present in the porcine genome or could not be designated as 1:1 orthologs. of these, nine genes (clec4c, clec4m, [35] [36] [37]. klrc2 (cd159c) is found in humans and rodents but not pigs. fcgr2c is a human-specific gene/pseudogene that belongs to a family of three low-affinity immunoglobulin gamma fc receptors (cd32) [38]. we have determined that pigs have two members of this family that roughly correspond to fcgr2a and fcgr2b. tnfrsf14 (cd270) is a marker for b cells, dendritic cells, monocytes, and treg cells [39] found in humans and rodents, but not cows. although canine, feline, equine and ursine homologs have been identified, this gene may be a pseudogene in pigs as the putative orf is interrupted by an endogenous retroviral sequence (h. dawson, unpublished). fcrl2 (cd307b) is a marker for b cells in humans. although sequences corresponding to fcrl2 have been identified in other mammals including dog and horse, no mouse ortholog has been identified [40]. this gene shows evidence of positive selection in humans [41] and is most likely a pseudogene in pigs. due to rapid evolution and post-speciation gene duplication, no 1:1 orthology could be established for most mouse and pig lilr or kir family members, including lilra4 (cd85g) and lilrb4 (cd85k) [42]. similarly, other than ceacam1 (cd66) and ceacam6 (cd66c), no 1:1 orthology could be established for most pig and mouse ceacam family members (ceacam3 (cd66d), ceacam5 (cd66e)). 
ceacam8 (cd67) may be a pseudogene as ests in unigene ssc.60435 predict a 243 amino acid protein interrupted by several stop codons. ceacam8 and ceacam6 were previously determined to have no direct murine orthologs [35]. several other shared human-pig cd marker orthologs (adgre2 (cd312), adgre3 (cd313r), cd1a, cd1e, cr1 (cd35), cd58, fcgr2a (cd32), fcar (cd89), fcrl3 (cd307c), fcrl4 (cd307d), icam3 (cd50), ncr2 (cd336), ncr3 (cd337) and tlr10 (cd290r)) have no rodent orthologs [2, 40, 43]. a significant number of errors were discovered in genes encoding porcine orthologs of human cd markers; 25 are not present in ensembl build 10.2, 88 of the proteins are truncated and 52 are duplicated gene artifacts. sixty-seven full-length mrna sequences encoding proteins, assembled from macrophage rna-seq reads, have been deposited in genbank. an additional 79 in silico constructs are provided. antibody data, gathered from publications, manufacturers or generated in house, is provided for 186 proteins including 395 monoclonal and 285 polyclonal antibodies. additional cross reactivity for 29 proteins is expected because they are >95% similar to human proteins. several members of the cd marker family are also members of other gene families, including the solute carrier and atp-binding cassette superfamilies. the human genome organization's gene nomenclature committee (hgnc) has assigned 395 genes to the solute carrier superfamily; 21 are pseudogenes and 374 encode proteins (accessible as a group by entering solute carrier superfamily in the annotations field). these are organized into 52 subfamilies; about 25% are dedicated to nutrient transport. the porcine solute carrier superfamily contains 398 protein-coding members and all human subfamilies are represented. forty-two of these genes are present in other porcine genomes but missing from ensembl build 10.2, 113 are truncated and 58 of these are duplicated gene artifacts. 
sixty-three full-length mrna sequences, assembled from macrophage rna-seq reads, have been deposited in genbank and an additional 159 in silico constructs are provided. forty-two of these genes are missing from all porcine genomes or are present as pseudogenes. among these genes are ucp1 (thermogenin), a protein involved in non-shivering thermogenesis and a pseudogene in pigs [44], and slc52a2, a primate-specific riboflavin transporter [45]. other species-specific genes include eight primate-specific genes (slc2a14, slc22a24, slc35e2, slc35g3, slc35g4, slc35g5, slco1b1, slco1b7), one human-specific gene (slc22a25) and 14 mouse or rodent-specific genes (slc6a20b, slc7a12, slc21a4, slc22a19, slc22a21, slc22a22, slc22a26, slc22a27, slc22a28, slc22a29, slc22a30, slco1a1, slco1b2, and slco6b1). slc25a18 is present in human and rodent genomes but is missing from bovine and porcine genomes. slc25a52 is present in primate and rat genomes but not mouse. slc9c2 is a pseudogene in mouse [46]. slc22a31 is an expressed pseudogene in pigs and is missing in rodents. slc22a11 is an expressed pseudogene in pigs and a non-expressed pseudogene in mouse. lastly, slc23a4, an intestinal nucleobase transporter [47], is a pseudogene in humans but is present in pig, cow and rodent genomes. several porcine or artiodactyl-specific gene expansions are found in subfamilies (additional file 3) including slc7a3 (14 members), slc7a13 (3 members), slc22a6 (2 members), slc22a10 (4 members) and slc47a1 (2 members). the biological functions of these paralogs remain to be determined; however, the parent genes are involved in amino acid (slc7a3, slc7a13) or dipeptide transport (slc22a6) [48, 49]. the hgnc has assigned 51 genes to the atp binding cassette superfamily; three are pseudogenes and 48 encode proteins (accessible as a group by entering atp binding cassette superfamily in the annotations field). 
these are organized into seven subfamilies (a-g); about 20% are dedicated to nutrient (i.e., carotenoid, cholesterol and vitamin a) transport. the porcine atp binding cassette family contains 57 members and all human subfamilies are represented. five of these genes are present in other porcine genomes but missing from ensembl build 10.2, 21 are truncated, and 18 are duplicated gene artifacts. eleven full-length mrna sequences, assembled from macrophage rna-seq reads, have been deposited in genbank and an additional 24 in silico constructs are provided. an analysis of this superfamily revealed that abcc11 has no murine ortholog [50] and abca8 has no direct rodent ortholog as the gene has diverged into two paralogs, abca8a and abca8b [51]. abcc4, a prostaglandin e2 transporter [52], has diverged from the parent gene into five paralogs (abcc4l1, abcc4l2, abcc4l3, abcc4l4 and abcc4l5; additional file 3). abca10, involved in human macrophage cholesterol transport [53], is a pseudogene in rodents. it may be an expressed pseudogene in pigs as the predicted protein is half (787 amino acids) the size of human abca10 (1543 amino acids) and weak expression (by rnaseq) was detected in macrophages and moderate expression in intestine (h. dawson, unpublished). abca17 is an expressed pseudogene in humans and pigs. like the solute carrier superfamily, most of the genes in the atp binding cassette superfamily have not been characterized at the functional level. nevertheless, the similarities and differences in the solute carrier and atp binding cassette superfamilies impact the suitability of rodents and pigs as models for human drug and nutrient transport and metabolism. the exact number of micrornas in the porcine genome is unknown. there are 4272 annotated micrornas in the human genome (build 30). 
although there are several papers describing the measurement of porcine micrornas in various tissues or estimating the number in the porcine genome [54] [55] [56] [57] and three partially overlapping sources of porcine microrna sequences, the exact number of porcine micrornas is currently unknown. there are only 382, 385 and 816 (non-redundant) annotated pig mirna sequences in mirbase, ncbi gene build, and ensembl build 10.2, respectively. these three sources of information have a significant amount of overlap (fig. 5a). we have consolidated this information and provide sequence data for our own predicted sequences based on conserved sequence identity to 1900 human, mouse or bovine sequences, to provide 1033 nonredundant porcine microrna sequences (accessible as a group by entering microrna in the annotations field). of note, all of the sequences found in mirbase were found in the ncbi gene build, 59 of the microrna sequences in ensembl were found to be duplicated artifacts, and 214 of the 1033 sequences are not present in the current ensembl gene build (10.2). this includes 81 that we have predicted based upon their presence in other species and other unfinished porcine genomes. we discovered the following species- or genus-specific micrornas: pigs (454), humans (199), primates (111), bovine (179), mouse (76) and rodents (20). many of the porcine-specific micrornas have arisen from biological duplication/expansion (additional file 3). a comparison of micrornas that are present in pigs and shared with at least one of the three other species (humans, cows, and mice) revealed that 318 micrornas are shared among the four species, 107 are shared between pigs, humans and cows but not mice, and 34 are shared between pigs, mice and cows but not humans (fig. 5b). thus, the frequency of non-conserved microrna preservation between human and pig is roughly three times that of mouse to pig. 
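the four-species comparison summarized above (fig. 5b) is a set-overlap calculation. the sketch below shows one way to compute such sharing patterns with python sets; the toy microrna names are invented, and only the three counts quoted in the text (318, 107 and 34) are taken from the source.

```python
from typing import Dict, Set


def sharing_pattern(pig: Set[str], human: Set[str], cow: Set[str], mouse: Set[str]) -> Dict[str, int]:
    """Count pig micrornas by which of the other species also carry them."""
    return {
        "all_four": len(pig & human & cow & mouse),
        "pig_human_cow_not_mouse": len((pig & human & cow) - mouse),
        "pig_mouse_cow_not_human": len((pig & mouse & cow) - human),
    }


# Toy sets with invented identifiers.
pig = {"mir-a", "mir-b", "mir-c", "mir-d"}
human = {"mir-a", "mir-b", "mir-c"}
cow = {"mir-a", "mir-b", "mir-c", "mir-d"}
mouse = {"mir-a", "mir-d"}
toy_counts = sharing_pattern(pig, human, cow, mouse)
print(toy_counts)

# Ratio behind the statement in the text, using the reported counts.
shared_with_human_not_mouse = 107
shared_with_mouse_not_human = 34
ratio = shared_with_human_not_mouse / shared_with_mouse_not_human
print(f"human-to-pig versus mouse-to-pig ratio: {ratio:.2f}")
```

with the reported counts the ratio is a little over three, which is the basis for the comparison of human and mouse made in the text.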
the porcine translational research database is so named because of its utility in translating findings made in rodents to pigs and those made in pigs to humans. a comprehensive literature-based survey was conducted to identify genes that have demonstrated function in humans, mice or pigs. the resulting data in the database is documented by >6000 references. the database currently contains 65 data fields for each entry. our efforts to improve the genome and its annotation are similar to other efforts, for example the sequencing of 12,000 genes to supplement annotation of the pig genome [32, 33, 58] and de novo assembly of multiple pig genomes to reveal 1737 protein coding genes that are missing from ensembl build 10.2 [30]. the online supplemental data from the latter manuscript was unavailable at the time of the preparation of this manuscript so no comparison could be made. the manual assembly of >9700 rna sequences has direct practical implications for genomics-based analysis. the state of the current genome build (mis-annotations, duplication artifacts, and missing sequences) effectively prohibits its use for aligning rnaseq reads. we have used these sequences to compare gene expression separately from ensembl 10.2 and have also compared the number of reads obtained from the corresponding templates in ensembl 10.2. for the great majority of transcripts compared, as expected, our full-length sequences provided a higher level of sensitivity than the corresponding ensembl sequences (h. dawson, unpublished). the full 5′ and 3′ representation of each gene will also allow for characterization of regulatory regions and mirna target sites. in our estimation, >40% of transcripts in ensembl or ncbi genomes do not represent the full-length gene. our efforts will also allow for further consolidation of porcine unigene numbers. currently, each gene is represented by 0 to >10 unigene assignments, and >10% of genes have more than one. 
it is significant that we discovered a large number of errors (about 30% of entries) in the publicly available sequence databases (these can be accessed by searching the "notes field" using the word "error" (fig. 3)). in addition to the duplication artifacts, mis-annotations and missing genes, we also encountered a number of rna sequences in publicly available archives belonging to other species. for example, human (ahr, af233432.1), panda (il2, nm_001199892.1) and rat (nudt14, ests in unigene ssc.85635) rna sequences are annotated as porcine derived. we also found sources of contaminating dna from completely unrelated species. for example, about 1/5 of porcine chromosome 4 clone cu076066.6 is from zebrafish. these sequences represent 6 zebrafish genes (loc100003615, loc447815, loc108179932, loc108183883, loc108183971, and loc103910681) and are annotated as porcine genes by ensembl build 10.2 (enssscg00000006223) and ncbi genomes (loc100739857). similarly, several ncbi loci (asna1l*, loc100737282, loc100737202, loc100620149, loc100737282) and one ensembl locus (enssscg00000026988) are derived from contaminating babesia bigemina genomic dna. we have discovered several sources of systematic errors in the ensembl and ncbi gene/protein prediction or annotation pipelines. for example, all selenoproteins in ensembl are truncated because the codon (uga) for selenocysteine is mistranslated or translated as a stop codon. we and others have identified a systematic error in the identification of another gene family, the taste receptor, type 2 (tas2r) superfamily. although these genes are intronless and mostly devoid of 5′ and 3′ utr regions, ensembl consistently fails to recognize them as genes [3]. these data illustrate the critical importance of the manual-curation process to reduce errors. 
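the selenoprotein truncation described above comes from reading an in-frame uga as a stop codon. the following python sketch shows the effect on a toy coding sequence. it uses the standard genetic code (ncbi translation table 1), while the `sec_codon_indexes` argument and the example sequence are assumptions for illustration. in real transcripts the recoding of uga depends on a secis element, which is not modelled here, so the selenocysteine positions are simply supplied by the caller.

```python
from itertools import product
from typing import AbstractSet

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, sec_codon_indexes: AbstractSet[int] = frozenset()) -> str:
    """Translate a coding sequence, reading TGA as selenocysteine (U)
    at the 0-based codon indexes listed in sec_codon_indexes."""
    sequence = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        if codon == "TGA" and start // 3 in sec_codon_indexes:
            protein.append("U")  # recoded UGA
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break  # ordinary stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)


toy_cds = "ATGGCTTGAGGTTAA"  # invented sequence with an in-frame TGA at codon index 2
naive = translate(toy_cds)
recoded = translate(toy_cds, sec_codon_indexes={2})
print(naive, recoded)
```

the naive translation stops at the in-frame tga and yields a truncated product, whereas the recoded translation continues to the true stop codon.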
we believe that this is the largest manually curated database for any veterinary species and that the informatics are unique among those targeting a veterinary species in regard to linking gene expression to gene function, identification of related gene pathways, and connectivity with other porcine gene databases, as well as for reagents that measure gene and protein expression. in addition, it is the largest source of centralized antibody information for the pig. any database must be updated frequently in order to be useful. currently the database is updated monthly and we anticipate expanding the content to include all porcine genes. there are several superfamilies of genes that will be the next targets of our efforts. one is the gpcr superfamily; its exact size is still unknown, but nearly 800 different human genes (or ~4% of the entire protein-coding genome) have been predicted to code for gpcrs. we will also continue to develop and annotate new assays. we intend to include our own prediction analysis for the promoter and 3′ utr region of rna for transcription factor and microrna binding sites. lastly, we intend to synchronize our database with the porcine "snowball" array and porcine gene expression atlas [59].
the pig as a model for human nutrition a comparative assessment of the pig, mouse and human genomes analyses of pig genomes provide insight into porcine demography and evolution livestock models in translational medicine the pig: a model for human infectious diseases chlamydiaceae infections in pig genetically modified pig models for neurodegenerative disorders porcine models of muscular dystrophy pigs as models of human cancers genetic modification of pigs as organ donors for xenotransplantation iris: a database surveying known human immune system genes an update on the functional molecular immunology (fimm) database lefranc mp.
imgt, the international immunogenetics information system(r): a standardized approach for immunogenetics and immunoinformatics gpx-macrophage expression atlas: a database for expression profiles of macrophages challenged with a variety of pro-inflammatory, anti-inflammatory, benign and pathogen insults immunoinformatics comes of age pede (pig est data explorer): construction of a database for ests derived from porcine full-length cdna libraries agbase: a functional genomics resource for agriculture piggis: pig genomic informatics system sus-bar: a database of pig proteins with statistically validated structural and functional annotation the zebrafish model organism database: new support for human disease models, mutation details, gene expression phenotypes and searching wormbase 2016: expanding to enable helminth genomic research flybase : a database for the drosophila research community innatedb: systems biology of innate immunity and beyond-recent updates and continuing curation use and misuse of the gene ontology annotations structural and functional annotation of the porcine immunome an in-depth comparison of the porcine, murine and human inflammasomes; lessons from the porcine genome and transcriptome the gencode pseudogene resource parallel construction of orthologous sequence-ready clone contig maps in multiple species functional analysis and transcriptional output of the gottingen minipig genome comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies activation of the transcription factor nuclear factor-kappa b in uterine luminal epithelial cells by interleukin 1 beta 2: a novel interleukin 1 expressed by the elongating pig conceptus lineage-specific biology revealed by a finished genome assembly of the mouse nomenclature of cd molecules from the tenth human leucocyte differentiation antigen workshop comparative genomics of natural killer cell receptor gene clusters signal regulatory proteins in 
the immune system siglecs and their roles in the immune system genomic organization of classical human low-affinity fcgamma receptor genes regulatory t cell expression of herpesvirus entry mediator suppresses the function of b and t lymphocyte attenuator-positive effector t cells fc receptor-like molecules from evolutionary genetics to human immunology: how selection shapes host defence genes inhibitory leukocyte immunoglobulin-like receptors: immune checkpoint proteins and tumor sustaining factors comparative functional evolution of human and mouse cr1 and cr2 the uncoupling protein 1 gene (ucp1) is disrupted in the pig lineage: a genetic explanation for poor thermoregulation in piglets novel riboflavin transporter family rfvt/slc52: identification, nomenclature, functional characterization and genetic diseases of rfvt/slc52 traditional and emerging roles for the slc9 na+/h+ exchangers identification and functional characterization of the first nucleobase transporter in mammals: implication in the species difference in the intestinal absorption mechanism of nucleobases and their analogs between higher primates and other mammals a new member of the cationic amino acid transporter family is preferentially expressed in adult mouse brain human organic anion transporter oat1 is not responsible for glutathione transport but mediates transport of glutamate derivatives characterization of the mouse abcc12 gene and its transcript encoding an atp-binding cassette transporter, an orthologue of human abcc12 evolutionary analysis of a cluster of atp-binding cassette (abc) genes multiple drug resistance-associated protein 4 (mrp4), prostaglandin transporter (pgt), and 15-hydroxyprostaglandin dehydrogenase (15-pgdh) as determinants of pge2 levels in cancer abca10, a novel cholesterol-regulated abca6-like abc transporter microrna buffering and altered variance of gene expression in response to salmonella infection deciphering the porcine intestinal microrna transcriptome 
structured rnas and synteny regions in the pig genome distribution of mirna genes in the pig genome large-scale sequencing based on full-length-enriched cdna libraries in pigs: contribution to annotation of the pig genome draft sequence a gene expression atlas of the domestic pig
the authors declare that they have no competing interests.
the dataset(s) supporting the conclusions of this article are included within the article, its additional file (additional files 1, 2, 3 and 4) and within the online database (http://www.ars.usda.gov/services/docs.htm?docid=6065).
authors' contributions: hdd, cc, bg and js contributed to the content of the database. hdd and jfu wrote the manuscript. all authors read and approved the final manuscript.
ethics approval and consent to participate: not applicable.
key: cord-022226-qxp0gfp3 authors: meager, anthony title: interferons alpha, beta, and omega date: 2007-09-02 journal: cytokines doi: 10.1016/b978-012498340-3/50026-9 sha: doc_id: 22226 cord_uid: qxp0gfp3 interferon alpha (ifn-α) is a mixture of closely related proteins, termed “subtypes,” expressed from distinct chromosomal genes. interferon β (ifn-β) is a single protein species and is molecularly related to ifn-α subtypes, although it is antigenically distinct from them. ifn omega (ifn-ω) is antigenically distinct from ifn-α and ifn-β but is molecularly related to both. the genes of these three ifn types are tandemly arranged on the short arm of chromosome 9. they are transiently expressed following induction by various exogenous stimuli, including viruses. they are synthesized from their respective mrnas for relatively short periods following gene activation and are secreted to act, via specific cell surface receptors, on other cells.
ifn-α subtypes are secreted proteins and as such are translated from mrnas as precursor proteins, pre-ifn-α, containing n-terminal signal polypeptides of 23 mainly hydrophobic amino acids (aa). pre-ifn-β contains 187 aa, of which 21 comprise the n-terminal signal polypeptide and 166 comprise the mature ifn-β protein. pre-ifn-ω contains 195 aa, the n-terminal 23 comprising the signal sequence and the remaining 172 the mature ifn-ω protein. at the c-terminus, the aa sequence of ifn-ω is six residues longer than that of ifn-α or ifn-β proteins. ifn-α, as a mixture of subtypes, and ifn-ω may be produced together following viral infection of null lymphocytes or monocytes/macrophages. the biological activities of ifns are mostly dependent upon protein synthesis with selective subsets of proteins mediating individual activities. ifns can also stimulate indirect antiviral and antitumor mechanisms, depending upon cellular differentiation and the induction of cytotoxic activity.
the phenomenon of viral interference was first described nearly 60 years ago when hoskins (1935) described the protective action of a neurotropic yellow fever virus against a viscerotropic strain of the same virus in monkeys. although viral interference was further investigated in the 1940s and 1950s, the underlying mechanism was not discovered until 1957 when isaacs and lindemann, working at the national institute for medical research (london, uk), isolated a biologically active substance from virally infected chicken cell cultures that, on transfer to fresh chicken cell cultures, produced a protective antiviral effect (isaacs and lindemann, 1957). the word interferon (ifn) was coined for this substance. its discovery aroused considerable scientific and medical interest since by 1957 antibiotics were widely available to control bacterial infections, but, in stark contrast, viral diseases such as influenza, measles, polio, and smallpox were virtually untreatable.
interest was further heightened by many subsequent studies that demonstrated that ifn could be produced by human cells and was active against a broad spectrum of viruses (see schlesinger, 1959, for an early review). at that time, ifn was being hailed by the media as a wonder drug, but it soon became clear that ifn was being produced naturally in too small quantities for that extravagant claim to be immediately confirmed. in fact, the low production of ifn was to bedevil attempts both to characterize it molecularly and evaluate it clinically for many years following its discovery. although the protein nature of ifn was recognized at an early stage in its development (see fantes, 1966, for an early review), it was only following the introduction of large-scale production methods in the 1970s (cantell and hirvonen, 1977) and the simultaneous development of efficient purification procedures (knight, 1976; rubinstein et al., 1978) that sufficient amounts of partially pure ifn protein became available for characterization and clinical use. gradually, it became apparent that ifn was not a single protein and that there were likely to be different types of ifn molecules. however, despite progress in the area of purification and in initial characterization by sequencing n-terminal polypeptides, ifn proteins all but defied full characterization until the advent of recombinant dna (rdna) technology in the late 1970s. this technology, spurred on by the pharmaceutical industry's desire to produce pharmacologically active proteins cheaply, revealed that one type of human ifn, now designated ifn-α, was a mixture of several closely related proteins, termed subtypes, expressed from distinct chromosomal genes. second and third types of ifn, designated ifn-β and ifn-γ respectively, have subsequently been "cloned" (taniguchi et al., 1980; gray et al., 1982) but, unlike ifn-α, are single protein species.
ifn-β is molecularly related to ifn-α subtypes but is antigenically distinct from them, whereas ifn-γ is both molecularly and antigenically distinct from either ifn-α subtypes or ifn-β. (for this reason, ifn-γ is considered separately elsewhere in this volume.) finally, a fourth type of ifn, antigenically distinct from ifn-α and ifn-β, but molecularly related to both, has more recently been cloned and characterized. rather untypically, this new ifn type has been designated ifn omega (ifn-ω) (adolf, 1987). the genes for ifn-α subtypes, ifn-β and ifn-ω are tandemly arranged on the short arm of chromosome 9. they are only transiently expressed following induction by a variety of exogenous stimuli, including viruses. ifn-α, ifn-β and ifn-ω proteins are synthesized from their respective mrnas for relatively short periods following gene activation and are secreted to act, via specific cell surface receptors, on other cells. early studies on the characterization of ifn receptors indicated that ifn-α and ifn-β were likely to share a common receptor, but it has only been comparatively recently that such receptors have been cloned (uzé et al., 1990; novick et al., 1994). ifn actions are initiated by activated receptors and cytoplasmic signal transduction pathways, which are now well characterized for the ifn-α/β receptor, and manifested following expression of a number of ifn-specific inducible genes. induction of the antiviral state, which is dependent on such protein synthesis, may now be viewed as just one of the many activities attributed to ifn in general; these activities include inhibition of cell proliferation and immunomodulation (see pestka et al., 1987, for a review). in the 1960s, two types of ifn were defined on the basis of the capacity of their antiviral activity to withstand acidification to ph 2. these were termed type i ifn for acid-stable ifn and type ii ifn for acid-labile ifn.
type i ifn included ifn produced by virally infected leukocytes, alternatively known as leukocyte ifn, and ifn produced by virally infected human diploid fibroblasts, alternatively known as fibroblast ifn. type ii ifn, which was only produced by antigenically or mitogenically stimulated human peripheral blood mononuclear cells (pbmc), has often been referred to as immune ifn (stewart, 1979). antigenic differences were described for leukocyte and fibroblast ifn and these were put on a molecular basis when knowledge of their respective n-terminal amino acid sequences became available (allen and fantes, 1980; knight et al., 1980; levy et al., 1980; zoon et al., 1980). at that time, an international nomenclature committee (stewart et al., 1980) reviewed the growing evidence for the existence of distinct molecular forms of ifn and introduced the greek alphabetical system to apply to the then known antigenically distinct types of ifn. leukocyte ifn was designated ifn-α, fibroblast ifn was designated ifn-β, and immune ifn became ifn-γ. however, complications immediately arose when it was revealed, following the cloning of several different leukocyte ifn complementary dnas (cdnas) (brack et al., 1981; goeddel et al., 1981; streuli et al., 1980), that leukocyte ifn was heterogeneous and contained many different, molecularly and antigenically related species, now commonly referred to as subtypes. the research group at hoffmann-la roche labeled the subtypes produced in e. coli αa, αb, αc, αd, etc. (evinger et al., 1981; rehberg et al., 1982), distinguishing them from natural components of leukocyte ifn, while the biogen group labeled these recombinant subtypes α1, α2, α3, α4, etc., regrettably without an appropriate alphabetical-numerical correspondence: for example, αa = α2, αd = α1. however, the numerical system now prevails. when an ifn preparation is a mixture of ifn-α subtypes, e.g., leukocyte ifn, lymphoblastoid ifn, this is often designated ifn-αn.
the later cloning of cdnas encoding ifn-α-like proteins (capon et al., 1985; hauptmann and swetly, 1985) initially led to the naming of this new ifn as ifn-α subclass ii, with all of the earlier-characterized ifn-α subtypes being reclassified as ifn-α subclass i. this large, unwieldy nomenclature system has been superseded by the renaming of ifn-α ii as ifn-ω; this has been generally accepted with the finding that ifn-ω is antigenically distinct from ifn-α and ifn-β proteins and thus qualifies as a separate type of ifn (adolf, 1987). fortunately, the nomenclature for ifn-β has remained straightforward since there is only one protein species, at least in humans (derynck et al., 1980; taniguchi et al., 1980). from the initial cloning of ifn-α cdnas, there has been a plethora of reports on the cloning of new, and sometimes distinct, genomic and cdna clones, and fairly disparate nomenclatures have arisen. diaz and allen (1993) therefore undertook the considerable task of compiling the ifn genes and genomic and cdna clones from the literature and introduced an arabic-alphabetical system for naming ifn genes to enable their distinction from ifn proteins. thus, ifn-α genes became ifna genes with the addition of a numeral to denote subtype, i.e., ifna1, ifna2, etc. (table 25.1). the ifn-β gene became ifnb and, since there is only one gene in humans, it is referred to as ifnb1. for ifn-ω genes, w has been used; hence ifnw1 (table 25.1). besides genes that are capable of being expressed and translated into ifn proteins, there are a number of pseudogenes which are unable to give rise to ifn proteins, and which in this new nomenclature system are designated by a p, e.g., ifnap22, ifnwp2, or simply ifnp1 where pseudogenes are clearly ifn-like but cannot be definitely included in any one of the ifna, ifnb or ifnw gene families (table 25.1). the ifn genomic and cdna clones have been designated in a variety of ways, as illustrated in table 25.1.
pseudogenic or non-translatable clones are normally prefixed with a greek ψ. for the purposes of this chapter, and to reduce the complexity of naming ifn clones and proteins, the nomenclature system adopted by weissmann and colleagues will be adhered to: ifn-α1, α2, α4, etc.
2.1 numbers, structure, and localization
in humans, there are 14 nonallelic ifna genes, one of which, ifnap22, is a pseudogene (table 25.1). in addition, there are a further four nonallelic pseudogenes that possibly also belong to the ifna gene family. probable allelic variants of certain ifna genes, e.g., ifna2, are also known to exist (goeddel et al., 1981; dworkin-rastl et al., 1982; emanuel and pestka, 1993). this extensive family of ifna genes is tandemly arranged on the short arm of chromosome 9 (9p23) and spans a region of approximately 400 kb (owerbach et al., 1981; shows et al., 1982; slate et al., 1982; ullrich et al., 1982). the ifnb gene and the ifnw gene/pseudogene family are also located in the same region of chromosome 9 (meager et al., 1979a,b; owerbach et al., 1981; henry et al., 1984; capon et al., 1985). there is a high degree of homology among the ifna genes, but these show much less homology to either ifnb or ifnw genes. nevertheless, all of these ifn genes share the common feature of being intron-less (taniguchi et al., 1980; goeddel et al., 1981; houghton et al., 1981; capon et al., 1985), suggesting a very ancient origin of their common ancestral gene. it has been proposed that the primordial ifn gene arose some 500 million years ago, with the first split occurring around 400 million years ago to yield the first ifna and ifnb genes (wilson et al., 1983). since then the ifna gene has evolved and duplicated many times to give rise to the multiple ifna genes found in present-day animals and man (miyata and hayashida, 1982; gillespie and carter, 1983).
around 100 million years ago, an ifna gene appears to have diverged sufficiently from the main group to give rise to the ifnw gene family, which is present in most mammals except mice and dogs (himmler et al., 1987; roberts et al., 1992). it is not clear what characteristic of ifna and ifnw genes enabled the numerous reduplication events to occur in comparison to the nonexistent or more limited (some mammals have more than one ifnb gene, e.g., bovines (wilson et al., 1983)) reduplication of the ifnb gene (ohlsson et al., 1985). it is apparent that gene conversion, as a result of mismatch repair and unequal crossover, contributed significantly to the creation of distinct, but highly homologous, nonallelic ifna (and ifnw) genes (de maeyer and de maeyer-guignard, 1988). in most cases, both coding and noncoding regions have diverged, but in the case of ifna13 the coding region has remained identical to that of ifna1, although its 5' and 3' flanking regions have diverged (todokoro et al., 1984).
[table 25.1. table body not recoverable from the source. adapted and modified from diaz and allen (1993) and allen and diaz (1996), which see for specific references for genomic clones and cdnas. reproduced with permission of mary ann liebert, inc., new york, usa.]
the structures of ifna, ifnb, and ifnw genes are similar. each gene has a 5' regulatory promoter region upstream from the transcriptional start (cap) site, a
coding region containing a nucleotide sequence encoding a signal polypeptide of 21-23 mainly hydrophobic amino acids, which is typical for secreted proteins, and consecutively the sequence encoding the mature ifn protein, followed by the 3' flanking noncoding region, which can vary in length up to 450 base pairs (bp) (figure 25.1) (derynck et al., 1980, 1981; nagata et al., 1980; streuli et al., 1980; taniguchi et al., 1980; degrave et al., 1981; goeddel et al., 1981; gross et al., 1981; lawn et al., 1981a,b; gren et al., 1984; capon et al., 1985; henco et al., 1985). the 5' flanking region contains a tata or hogness box, which delineates the boundary of the upstream promoter, approximately 30 bp from the cap site. farther upstream are found a number of hexameric repeat sequences gaaann, where n can be any base, which in their dimeric or multimeric forms act as binding sites for nuclear transcription factors and repressor molecules (fujita et al., 1985; ryals et al., 1985). (this area is covered in more detail in section 2.2, inducers and transcriptional control.) the 3' flanking regions vary in length and contain several polyadenylation sites and thus can give rise to mrnas of different lengths (mantei and weissmann, 1982; henco et al., 1985). they contain above-average numbers of the sequence motifs atta or ttatttat. such sequences are common, however, in many other cytokine genes and other genes, such as protooncogenes, that are inducibly and transiently expressed. it has been proposed that these sequences contribute to the relative instability and short half-lives of ifn and cytokine mrnas (caput et al., 1986; shaw and kamen, 1986). all ifn genes are normally silent and thus require some sort of stimulus to induce expression.
a wide range of inducers, including viruses, bacteria, mycoplasma, endotoxins, double-stranded polynucleotides or rna (dsrna), and some cytokines, have been shown to efficiently activate transcription of ifn genes (stewart, 1979; de maeyer and de maeyer-guignard, 1988). such inducers have in general the potential to induce expression of all ifna, ifnb, and ifnw genes; however, there appears to be cell- and inducer-specific selectivity that governs the type and numbers of ifn genes expressed. for instance, virally induced human diploid fibroblasts produce mainly ifn-β and only a minor amount of ifn-α (havell et al., 1978), whereas virally induced pbmc produce mainly ifn-α plus ifn-ω and only a minor amount of ifn-β (cantell and hirvonen, 1977; adolf et al., 1990). this has to some extent been confirmed at the mrna level (shuttleworth et al., 1983; hiscott et al., 1984). differences have also been reported in the proportions of individual ifn-α subtypes produced by different cell types (goren et al., 1986; finter, 1991; greenway et al., 1992), suggesting that ifna genes may be differentially expressed. however, the way in which such differential expression is regulated is presently not understood. transcriptional control of ifna genes resides in their 5' flanking region, upstream from the cap site. nucleotide deletions outside of position -117 from the cap site have little impact on transcriptional control, but deletions farther in eliminated induction, indicating that this region, -117 to -1, contained promoter regulatory elements (ragg and weissmann, 1983; weidle and weissmann, 1983). these have been further delineated as a purine-rich nucleotide tract between -109 and -64, containing hexameric repeats of gaaann (gaaa g/c t/c), which appears to be necessary for inducible transcription and which has been termed the "virus-regulating element" (vre) (ryals et al., 1985).
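the hexameric repeats mentioned above lend themselves to a simple pattern search. the following is a minimal sketch, assuming a made-up promoter fragment and hypothetical function and variable names; it only locates gaaann hexamers and the narrower gaaa g/c t/c form quoted for the vre, and says nothing about whether a given site is functional.

```python
import re

# gaaann, where n is any base; the lookahead reports overlapping hits too
HEXAMER = re.compile(r"(?=(GAAA[ACGT]{2}))")
# narrower form quoted in the text: gaaa, then g or c, then t or c
VRE_HEXAMER = re.compile(r"(?=(GAAA[GC][TC]))")


def find_hexamers(seq, pattern=HEXAMER):
    """return (0-based start, hexamer) for every match of pattern in seq."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# made-up fragment, not a real ifna promoter sequence
fragment = "TTGAAAGTGAAAGCGAAATGCC"
all_hits = find_hexamers(fragment)
vre_hits = find_hexamers(fragment, VRE_HEXAMER)
```

in this toy fragment the general pattern finds three adjacent hexamers, while the narrower pattern rejects the third one (gaaatg) because its fifth base is t.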
similar studies involving 5' deletions have been carried out with the ifnb gene, and it has been found that 5' sequences within -110 to -1 contain regulatory elements that are required for induction by viruses and dsrna. the minimum vre has been localized to -74 to -37 with respect to the cap site (goodbourn et al., 1986; goodbourn and maniatis, 1988) and contains two positive virus-inducible elements, termed positive regulatory domains (prdi, -77 to -64; prdii, -66 to -55), and a negative regulatory domain (nrd, -57 to -37) (figure 25.1) (fujita et al., 1985; fan and maniatis, 1989; whittemore and maniatis, 1990; nourbakhsh et al., 1993). in addition, the hexameric repeat sequences gaaann (also present in ifna genes) spanning from -110 to -65 contain variants of the prdi sequence, and two further regulatory elements, prdiii (-90 to -78) and prdiv (-104 to -91), have been identified that are required for a functional vre in ifnb gene expression in mouse l cells (dinter and hauser, 1987; du and maniatis, 1992). prdi and prdiii act as binding sites for a nuclear transcription factor, designated "interferon regulatory factor-1" (irf-1), whose expression is transiently increased by virus infection and which appears to mediate the activation of transcription of the ifnb gene (fujita et al., 1988; harada et al., 1989; xanthoudakis et al., 1989). a second virus-inducible factor, designated "interferon regulatory factor-2" (irf-2), also binds to prdi but suppresses, rather than activates, transcription (harada et al., 1989, 1990). prdii is a binding site for the nuclear transcription factor nf-κb (clark and hay, 1989; fujita et al., 1989a; hiscott et al., 1989; lenardo et al., 1989; visvanathan and goodbourn, 1989), which interacts with the major groove of the dna (thanos and maniatis, 1992). additionally, another protein, high-mobility group y/i, also binds to prdii, interacting with the minor groove of the dna (thanos and maniatis, 1992).
both factors appear to be necessary for virus induction of the ifnb gene promoter. prdiv contains a binding site for a protein of the camp response element binding protein (atf/creb) family of transcription factors (du and maniatis, 1992). viral induction of the ifnb gene is thought to occur following activation of pre-existing nf-κb and by de novo synthesis of irf-1; these nuclear transcription factors bind to the tandemly arranged prdi and prdii and act cooperatively to initiate/activate transcription (leblanc et al., 1990; lenardo et al., 1989; visvanathan and goodbourn, 1989; fujita et al., 1989b; watanabe et al., 1991). reporter constructs containing prdi supported by a simian virus 40 (sv40) enhancer, or (gaaagt)4, which contains the functional equivalent of dimeric prdi (näf et al., 1991), are activated not only by virus but also by overexpression of irf-1 (macdonald et al., 1990). however, in most cell lines, overexpression of irf-1 has led to poor induction of ifnb (and ifna) genes (harada et al., 1990; fujita et al., 1989b) or none at all (macdonald et al., 1990; reis et al., 1992). this has been attributed to the repressive effect of irf-2, a homologue of irf-1, which also binds to prdi (harada et al., 1989). in the undifferentiated murine embryonal carcinoma (stem) cell line p19, which is refractory to viral induction of ifnb (and ifna) genes and which expresses neither irf-1 nor irf-2, overexpression of an introduced irf-1 construct leads to activation of endogenous ifn genes and to activation of reporter plasmids with the ifnb promoter (harada et al., 1990). in addition, cell lines permanently transformed with an antisense irf-1 expression plasmid exhibited strongly reduced ifnb gene inducibility that nevertheless could be restored by transient transformation with an irf-1-overproducing expression plasmid (reis et al., 1992).
however, the role of irf-1 in virus-induced activation of the ifnb promoter remains controversial (whiteside et al., 1992; pine et al., 1990), and ruffner and colleagues (1993) have shown that in murine embryonal stem cells devoid of both irf-1 gene alleles (irf-1 0/0) viral induction of ifnb was only slightly higher in control irf-1 0/+ differentiated stem cells than that in irf-1 0/0 differentiated cells. this suggests that while irf-1 at high levels may elicit or enhance induction of ifnb under certain circumstances, it is not essential for viral induction. in cultured mouse fibroblasts devoid of irf-1, ifnb induction by the synthetic dsrna molecule poly(i):poly(c) was absent, whereas induction by newcastle disease virus (ndv) was normal (matsuyama et al., 1993). however, ifn induction in vivo by either virus or dsrna has been found to be unimpaired in irf-1 0/0 mice, indicating that irf-1 is not essential. it has also become clear recently that the prdi site can bind factors other than irf-1 and irf-2, and these may be more important for regulating ifnb gene activation (whiteside et al., 1992; keller and maniatis, 1991). in contrast, targeted disruption of the irf-2 gene to yield mouse fibroblasts deficient in the repressor irf-2 has been found to lead to upregulated induction of ifnb following ndv infection (matsuyama et al., 1993). this suggests that irf-2 negatively regulates or represses ifnb gene induction. the induction of the ifna1 gene appears to be regulated differently from that of the ifnb gene. irf-1 is bound poorly by the equivalent prdi site in the ifna1 promoter and this promoter also lacks an nf-κb site (figure 25.1; macdonald et al., 1990). the ifna1 gene vre does contain a hexameric repeat nucleotide sequence (gaaatg)4, designated a "tg-sequence" (macdonald et al., 1990), which appears to mediate virus inducibility when supported by an sv40 enhancer, but which does not respond to irf-1 (näf et al., 1991).
it has, however, been reported that overexpression of irf-1 can induce ifna genes, at least under special circumstances (harada et al., 1990; au et al., 1992). the ifnw gene, like ifna and ifnb genes, is virus inducible and has structural features in its 5' promoter region similar to those in ifna/b promoters (figure 25.1). in particular, hexameric repeat units are present, but are organized differently from those present in ifna/b genes (hansen et al., 1991; roberts et al., 1992). however, the regulation of transcription of the ifnw gene has not been studied in detail.
the 14 ifn-α subtypes are secreted proteins and as such are translated from mrnas as precursor proteins, pre-ifn-α, containing n-terminal signal polypeptides of 23 mainly hydrophobic amino acids (figure 25.2). the signal polypeptide is cleaved off before "mature" ifn-α molecules are secreted from the cell. from their cdna sequences, mature ifn-α subtypes have been predicted to contain 166 amino acids (except ifn-α2, 165 amino acids) (nagata et al., 1980; goeddel et al., 1981; lawn et al., 1981a,b; gren et al., 1984). the calculated molecular mass of recombinant ifn-α subtypes is approximately 18.5 kda, although apparent molecular masses of leukocyte-derived ifn-α subtypes in sodium dodecyl sulphate (sds)-polyacrylamide gels vary between 17 and 26 kda, possibly owing to variable processing of c-terminal amino acids (levy et al., 1981) and post-translational modifications. the amino acid sequences of ifn-α subtypes are highly related, with complete identity at 85 of the 166 amino acid positions (langer and pestka, 1985; de maeyer and de maeyer-guignard, 1988). this is illustrated in figure 25.2, where the amino acid sequences of the subtypes are compared to an idealized consensus sequence. many of the positions where amino acids differ from subtype to subtype are conservative substitutions.
interestingly, ifn-α subtypes contain four cysteine residues whose positions (1, 29, 99, and 139) are highly conserved (figure 25.2). these four cysteines form disulfide bridges (1-99, 29-139) which induce folding of the ifn-α molecule (figure 25.3) and whose integrity is essential for biological activity (morehead et al., 1984). ifn-α subtypes are predicted to contain a high proportion (~60%) of α-helical regions and are folded to form globular proteins (zoon and wetzel, 1984). it has not yet proved possible to apply x-ray crystallographic techniques to ifn-α subtypes, or to human ifn-β, but the three-dimensional crystal structure of recombinant mouse ifn-β, which is approximately 60% related in amino acid sequence to its human counterpart, has been solved (senda et al., 1992). this has revealed that mouse ifn-β has a structure which consists of five α-helices folded into a compact α-helical bundle (figure 25.4). from comparative sequence analysis it is predicted that these five α-helical domains are conserved in all mammalian ifn-α and ifn-β proteins (korn et al., 1994; horisberger and di marco, 1995), which thus show similarities with many other cytokines that also have α-helical bundle structures (bazan, 1990). with one exception, ifn-α subtypes do not contain recognition sites (asn-x-ser/thr) for n-linked glycosylation (henco et al., 1985); only ifn-α14 contains two of these sites (figure 25.2). nevertheless, o-linked glycosylation may be possible in other ifn-α subtypes. for example, it has been found that natural ifn-α2, purified from human leukocyte ifn, contains the disaccharide galactosyl-n-acetylgalactosamine in o-linkage to thr-106 (adolf et al., 1991a). however, since ifn-α2 is the only ifn-α subtype with a threonine at position 106, it may represent the only o-glycosylated ifn-α protein (figure 25 ...) ... comprise the mature ifn-β protein (derynck et al., 1980; taniguchi et al., 1980; houghton et al., 1981).
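the asn-x-ser/thr recognition motif for n-linked glycosylation mentioned above can be located in any protein sequence with a simple pattern search. the following is a minimal sketch, not a method from the cited work: the function name `find_sequons` and the toy peptide are hypothetical, and the exclusion of proline at the x position is a conventional assumption that the text itself does not state.

```python
import re

# Asn-X-Ser/Thr sequon. Excluding proline at X is an assumption
# (a common convention), not something stated in the chapter.
# The lookahead lets overlapping motifs be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that begin an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Made-up toy peptide for illustration only; NOT an interferon sequence.
toy = "MKNATLLNPSQQNGTA"
print(find_sequons(toy))  # [3, 13]  (the N at position 8 is followed by P)
```

applied to real sequences, a scan of this kind only reports potential sites; whether a site is actually glycosylated depends on the producer cell, as noted for ifn-β below.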
although ifn-β is the same length as the majority of ifn-α subtypes, it shows only approximately 30% amino acid sequence relatedness with them (figure 25.2) and is antigenically distinct. the ifn-β protein lacks the n-terminal cys-1 residue present in ifn-α subtypes, but contains three other cysteines at positions 17, 31, and 141, the latter two corresponding to the disulfide bond pairing 29-139 in ifn-α subtypes. replacement of cys-17 by serine does not result in any loss of biological activity, whereas serine substitution of cys-141 does (mark et al., 1981; shepard et al., 1981). as mentioned previously, on the basis of the three-dimensional structure of recombinant mouse ifn-β (figure 25.4), human ifn-β is predicted to contain five α-helices and to fold up into an α-helical bundle structure (senda et al., 1992). human ifn-β has one potential n-glycosylation site at asn-80 (taniguchi et al., 1980) and n-linked oligosaccharides, primarily of the biantennary complex type, are known to be attached to this site in natural ifn-β (hosoi et al., 1988). however, these may vary considerably depending on the producer cell type (utsumi et al., 1989). from cdna sequence data, it was predicted that pre-ifn-ω contains 195 amino acids, the n-terminal 23 comprising the signal sequence and the remaining 172 the mature ifn-ω protein (capon et al., 1985; hauptmann and swetly, 1985). the amino acid sequence of ifn-ω is therefore six residues longer at the c-terminus than ifn-α or ifn-β proteins. however, it has been found that natural ifn-ω is heterogeneous at the n-terminus owing to variable cleavage of pre-ifn-ω; about 60% of mature ifn-ω molecules carry two additional n-terminal amino acids (adolf, 1990; shirono et al., 1990). ifn-ω is approximately 60% related to ifn-α subtype sequences, but only 30% related to that of ifn-β (figure 25.2), and is antigenically distinct from both ifn-α and ifn-β (adolf, 1990).
nevertheless, the four cysteines occur in the same notional positions, 1, 29, 99, and 139, as they do in ifn-α subtypes and it is likely that ifn-ω will have a similar α-helical bundle structure to those predicted for both ifn-α and ifn-β proteins (senda et al., 1992). ifn-ω has one potential site at asn-78 for n-linked glycosylation and natural ifn-ω has been demonstrated to be a glycoprotein with biantennary complex oligosaccharides (containing neuraminic acid) attached at this site (adolf, 1990; adolf et al., 1991b). type i ifns (ifn-α, -β, and -ω) are produced by a variety of normal cell types responding to extracellular or intracellular stimuli (stewart, 1979). ifn-α, as a mixture of subtypes, and ifn-ω may be produced together following viral infection of null lymphocytes or monocytes/macrophages (cantell and hirvonen, 1977; adolf, 1990). the proportions of ifn-α subtypes may vary according to the type of virus used as inducer (hiscott, 1984; finter, 1991). however, production of ifn-β is usually restricted to double-stranded polynucleotide-induced (e.g., poly-inosinic, poly-cytidylic acid) or virally-induced normal fibroblasts and other tissue cell types, e.g., epithelial cells (meager et al., 1979; stewart, 1979). in all the above cases, the amount of ifn secreted is dependent on the dose of the inducer. besides normal cells, a range of transformed and tumor-derived cell lines are known ifn producers, e.g., the mg63 human osteosarcoma cell line (meager et al., 1982) and the namalwa b-lymphoblastoid cell line (phillips et al., 1986).
generally speaking, adherent fibroblastic cell lines produce mainly ifn-β and only a minor quantity of ifn-α (havell et al., 1978), whereas nonadherent myeloid or lymphoid cell lines produce mainly ifn-α and only a small amount of ifn-β (cantell and hirvonen, 1977; shuttleworth et al., 1983; zoon et al., 1992). actual production of ifns lasts only a few hours following induction. this is due to ifn mrna instability and the rapid shut-off of ifn gene transcription (caput et al., 1986; shaw and kamen, 1986). under conditions where ifn mrna stability is increased, e.g., by blocking protein and rna synthesis following induction, ifn production has been shown to be "superinduced" once the block on protein synthesis is removed (meager et al., 1979; stewart, 1979). it is well established that the biological activities of ifns are mostly dependent upon protein synthesis, with selective subsets of proteins mediating individual activities. antiviral, antiproliferative and immunomodulatory activities have been ascribed to ifn-α/β/ω (reviewed in pestka et al., 1987; de maeyer and de maeyer-guignard, 1988). the proteins and mechanisms involved in these activities are described below. despite there being vast numbers of viruses with different replication strategies, it appears that many viruses can be countered by relatively few ifn-inducible "antiviral" proteins (samuel, 1987). one of the best-characterized of these is a family of enzymes collectively known as "2-5a synthetase" which, in the presence of dsrna (often an intermediate of viral rna synthesis), catalyses the formation of an unusual oligonucleotide, ppp(a2'p)na (2-5a), which in turn activates an ifn-induced latent endonuclease, rnase l (williams and kerr, 1978; wreschner et al., 1981; ghosh et al., 1991; hovanessian, 1991; lengyel, 1982; zhou et al., 1993). when activated, rnase l degrades viral (and cellular) mrna and therefore inhibits viral protein synthesis. small rna viruses (picornaviridae), e.g., mengo virus and murine encephalomyocarditis virus (emcv), whose replication is cytoplasmic, are most inhibited by the induction of 2-5a synthetase (lengyel, 1982; rice et al., 1985; chebath et al., 1987; kumar et al., 1988).
a further important ifn-induced "antiviral" protein is a dsrna-dependent protein kinase, now designated pkr, which in the active form phosphorylates the peptide initiation factor, eif2, involved in polyribosomal translation of mrna (miyamoto and samuel, 1980; gupta et al., 1982; samuel, 1987). phosphorylated eif2 is inactive and thus viral protein synthesis is inhibited. this inhibition has been associated with the loss of replicating capacity of reoviruses and rhabdoviruses such as vesicular stomatitis virus (vsv). the 2-5a synthetase and pkr antiviral mechanisms are rather general and potentially could affect a wide range of viruses. however, ifn can induce certain proteins that specifically inhibit one class of virus. for example, the ifn-inducible mx proteins block the replication of influenza virus, probably by inhibiting the nuclear phase of viral transcription (mouse cells) or later cytoplasmic phases (human cells), without affecting the replication of many other viruses (staeheli, 1990; melén et al., 1992; ronni et al., 1993). in addition, since ifns can impair various steps of viral replication, including penetration, uncoating and assembly of progeny virions as well as transcription and translation, there are likely to be several other antiviral proteins and mechanisms (de maeyer and de maeyer-guignard, 1988). for instance, some viruses, e.g., herpes virus and certain retroviruses, appear to be inhibited at the relatively late stage of virus particle maturation and budding (aboud and hassan, 1983). in the course of evolution, many viruses have developed countermechanisms by which they can disrupt the antiviral mechanisms induced by ifn. such countermechanisms often point to the significance of particular "antiviral" proteins. one of the main "targets" for several different viruses is the ifn-inducible pkr.
the action of this kinase is overcome in adenovirus or epstein-barr virus (a member of the herpes virus family) by the production of small viral rna molecules, vai- and eber-rnas, respectively, which bind to pkr and block its activation by dsrna (clarke et al., 1991; ghadge et al., 1991; mathews and shenk, 1991). reoviruses and vaccinia virus (a member of the pox virus family) produce viral proteins (sigma3 and ski, respectively) that bind to dsrna and thus reduce activation of pkr (sen and lengyel, 1992). interestingly, if ifn-treated vsv-infected cells are coinfected by vaccinia virus, vsv replication is rescued, presumably partly by the inhibitory effect of ski on pkr (whitaker-dowling and youngner, 1983). vaccinia virus also produces a nonfunctional protein analog of eif2 which competes with the real eif2 for phosphorylation by pkr and thus dilutes out the antiviral effect of activated pkr (beattie et al., 1991). other viruses, such as influenza, may activate latent cellular inhibitors of pkr activity, e.g., a 58 kda protein (p58) (lee et al., 1992). the 2-5a synthetase-rnase l system can also be subverted. for example, emcv, a picornavirus, can inactivate rnase l in several cell lines, but this inactivation is usually blocked by ifn treatment (lengyel, 1982). herpes viruses, in contrast, appear to inhibit rnase l activation by producing competing analogs of 2-5a (cayley et al., 1984). some viruses even have the ability to block the transcription of ifn-inducible genes. the "early" e1a regulatory proteins of adenoviruses prevent the activation of isgf3 by ifn, probably by inhibiting the transcription of the isgf3γ subunit (gutch and reich, 1991; kalvakolanu et al., 1991; nevins, 1991). in the case of hepatitis b virus-infected cells, the so-called virus-specified "terminal protein" inhibits ifn-inducible gene expression.
the antiviral mechanisms induced by ifn are mediated by enzymes, e.g., 2-5a synthetase and pkr, whose activities have broad implications for cell growth and proliferation. viral replication may be regarded as a form of pathological growth of a foreign, "cell-like" entity at the expense of a living cell. in the presence of ifn, enzymes are activated which curtail protein synthesis in general, but because viral protein synthesis is normally rapid, the inhibitory effect on viral replication appears more dramatic than that on the slower and more complex cellular growth. possibly, the "ifn system" evolved more as part of a complex, interactive network of intercellular mediators of cell growth and proliferation than as one dedicated to antiviral mechanisms. recent investigations tend to support the role of ifns in regulating cell growth. for example, if a mutant form of pkr that is unable to phosphorylate eif2 is introduced into cells, they undergo neoplastic transformation (koromilas et al., 1992; lengyel, 1993; meurs et al., 1993). this suggests that pkr normally acts as a "tumor suppressor" gene product. therefore, one of the mechanisms by which ifn inhibits cell proliferation could be through its capacity to induce enhanced expression/activity of pkr. ifn-α has also been reported to inhibit cyclin-dependent cdk-2 kinase, which is responsible for phosphorylation of retinoblastoma (rb) protein, and this could contribute to antiproliferative activity (kumar and atlas, 1992; resnitzky et al., 1992; zhang and kumar, 1994). the 2-5a synthetase-rnase l system may also have antiproliferative and tumor suppressor activities. for instance, the levels of these two enzymes are high in growth-arrested cells; introduction of 2-5a-like oligoadenylates into proliferating cells also causes growth impairment (sen and lengyel, 1992; lengyel, 1993; zhou et al., 1993).
the ifn-stimulated increases in synthesis of pkr and 2-5a synthetase are dependent on ifn-inducible transcription factors, such as irf-1 (isgf2) (pine et al., 1990; williams, 1991; reis et al., 1992). irf-1 has a short half-life and thus transcription of ifn-inducible genes is rapidly repressed by the longer-lasting, inhibitory irf-2 (harada et al., 1989). if irf-2 is overexpressed, cells become transformed, as even the low-level constitutive production of pkr and 2-5a synthetase, which can regulate normal cell growth, is abrogated. this transformation was reversed by overexpressing irf-1 (harada et al., 1993), indicating that irf-1 can be viewed as a pivotal player in the growth-regulatory and tumor suppressor machinery. a variety of other ifn-induced mechanisms, including suppression of oncogenes (contente et al., 1990), depletion of essential metabolites (sekar et al., 1983), and increased cell rigidity (e. wang et al., 1981), could also contribute to its antiproliferative activity. the antiproliferative effects of ifns in different tumor cell lines cultured in vitro are highly variable. besides tumor cell lines, ifn-α/β have antiproliferative activity in hematopoietic precursor cells, e.g., of the myeloid lineage (rigby et al., 1985; de maeyer and de maeyer-guignard, 1988, for review). they are also potent inhibitors of angiogenesis, the process whereby blood capillaries are formed to vascularize tissues (sidky and borden, 1987). besides activating intracellular processes, ifns can also activate intercellular activities, especially within the immune system, which are an essential part of host defense against infectious and invasive diseases. thus, ifns can stimulate indirect antiviral and antitumor mechanisms, which in the main rest upon cellular differentiation and the induction of cytotoxic activity. for example, in the presence of antigen-specific antibodies, macrophages can effect cell-mediated cytotoxicity.
such antibody-dependent cell-mediated cytotoxicity (adcc) is enhanced by ifn, possibly through an augmentation of immunoglobulin g (igg)-fc receptor (fcr) expression (hokland and berg, 1981; vogel et al., 1983; de maeyer and de maeyer-guignard, 1988). in addition, another category of leukocytes comprising large granular lymphocytes and known as natural killer (nk) cells are activated, by unknown mechanisms, to kill virally-infected or tumor cell targets independently of major histocompatibility complex (mhc) antigen expression (trinchieri and perussia, 1984; rager-zisman and bloom, 1985; de maeyer and de maeyer-guignard, 1988). ifns can stimulate increased expression of class i mhc antigens, i.e., hla-a, -b, -c, which are crucial for recognition of foreign antigen by cytotoxic t lymphocytes (ctl, cd8+); recognition of virally infected cells by ctl depends on class i mhc antigen presentation of viral antigens at the cell membrane (heron et al., 1978; fellous et al., 1979). ifn-α/β have sometimes been observed to increase class ii mhc antigen expression, which is necessary to trigger both humoral and cell-mediated immunity, but probably play a lesser role than ifn-γ, which is the major class ii mhc antigen inducer (baldini et al., 1986; rhodes et al., 1986; de maeyer and de maeyer-guignard, 1988). all of the biological activities so far described (see above) for ifns have followed from in vitro experimentation. here it is possible to pick and choose conditions that favor particular outcomes, e.g., the antiviral response, by adjusting doses of ifn, times of incubation, levels of virus challenge, and so on. such experiments illustrate the range of biological activities of ifns but cannot define their physiological roles. that ifns have the potential for inducing antiviral and antitumor activity suggests their main role in vivo is to act as regulators of host defense mechanisms, and to prevent pathophysiological events occurring.
investigations in experimental animals have supported this likelihood. for example, injection of mice with anti-ifn-α/β antibody has been shown to increase their susceptibility to a range of virus infections (virelizier and gresser, 1978; gresser, 1984). the earliest evidence for an antitumor effect of ifn-α/β came from inoculation of murine l1210 cells into mice. l1210 cells are sensitive to the antiproliferative action of ifn-α/β in vitro and, in vivo, ifn-α/β prevented tumor growth by these cells. however, when a clone of l1210 was isolated that was resistant to the antiproliferative action of ifn-α/β, ifn-α/β treatment retarded tumor growth to a similar extent to that observed with "sensitive" l1210 cells, suggesting that ifn was acting indirectly in vivo by a host-mediated mechanism (gresser et al., 1970, 1972). a similar conclusion was reached more recently using ifn-resistant b-cell lymphoma cells. in the intervening years, many studies have been conducted confirming that ifn can act directly (e.g., human ifn-α2 against a range of human tumor xenografts in nude mice, where human ifn-α2 has no activity on the murine immune system) and indirectly (reviewed by balkwill, 1989). although antitumor activity has been clearly demonstrated by the application of exogenous ifns, it is not certain that endogenously produced ifns are involved in countering tumor growth. however, some experimental evidence that endogenous ifn could play a role in host resistance to cancer or its spread has been obtained by treating mice with anti-ifn antibodies. under these conditions, increased intraperitoneal transplantability of six different experimental murine tumors was observed (gresser, 1984). ifn-α/β can inhibit the growth of hematopoietic progenitor cells in vitro (rigby et al., 1985) and this is also likely to occur in vivo.
such an occurrence is undesirable in most instances, but suppression of proliferation of bone marrow hematopoietic cell precursors has been turned to advantage in protecting tumor-bearing mice against the cytotoxicity of chemotherapeutic agents such as 5-fluorouracil (5-fu) (stolfi et al., 1983). ifns exercise their actions in cells via ifn-specific cell surface receptors. these receptors bind ifns with high affinity (aguet, 1980) and transduce the signal occasioned by ligand (ifn) binding across the cell membrane into the cytoplasm. ifn-α, ifn-β and ifn-ω share the same binding sites (aguet et al., 1984; flores et al., 1991), but ifn-γ binds to different sites (branca and baglioni, 1981; aguet et al., 1988). the binding of ifn-α and ifn-β to lymphoid cells and fibroblasts has been studied extensively and has been reviewed (rubinstein and orchansky, 1986; branca, 1988; langer and pestka, 1988; grossberg et al., 1989), and results have generally demonstrated the presence of up to a few thousand complex, high-affinity receptors per cell. chemical cross-linking studies with 125i-labeled ifn-α or ifn-β to receptor-bearing cells have led to the identification of various ifn-receptor complexes, their molecular masses ranging from 80 to 300 kda (joshi et al., 1982; eid and mogensen, 1983; faltynek et al., 1983; raziuddin and gupta, 1985; thompson et al., 1985; hannigan et al., 1986; vanden broecke and pfeffer, 1988; colamonici et al., 1992). such results have suggested that there are either multiple binding sites for ifn-α, ifn-β, and ifn-ω or that there are complex multichain receptors (colamonici et al., 1992; hu et al., 1993). although on human cells all the ifn-α, ifn-β, and ifn-ω proteins compete for common binding sites, individual ifn-α subtypes show different levels of activities on cells (streuli et al., 1981; weck et al., 1981; rehberg et al., 1982) which appear to correlate with their binding behavior to the cell surface (uzé et al., 1985).
in particular, ifn-α1 shows a much lower binding for human membrane receptors than either ifn-α2 or ifn-α8 (uzé et al., 1988). interestingly, this differential binding of human ifn-α subtypes is not manifested in bovine cells, and all of the subtypes exhibit high specific activities (yonehara et al., 1983; shafferman et al., 1987). ifn-β and ifn-ω are also active in bovine cells (capon et al., 1985; adolf et al., 1990), but this cross-reactivity does not extend to mouse cells, a feature that has provided experimental systems in which to characterize ifn receptors. thus, somatic cell genetic studies with human × rodent hybrid cells containing various combinations of human chromosomes have provided evidence that the presence of human chromosome 21 confers sensitivity of such hybrid "rodent" cells to human ifn-α, ifn-β and ifn-ω (tan et al., 1973; slate et al., 1978; epstein et al., 1982; raziuddin et al., 1984). further, it was demonstrated that antibodies raised against human chromosome 21-encoded cell surface proteins were able to block the binding and action of human ifn-α to human cells, indicating that this chromosome contained a gene(s) specifying the human ifn cell surface receptor. the elucidation of the full complement of components of the ifn-α/β/ω receptor has long been sought. one methodology used to isolate receptor cdnas involves transfecting mouse cells with total human dna and then selecting for cells sensitive to human ifn-α. after several attempts, this approach led successfully to the isolation of a 2.7 kb cdna from a library constructed from human lymphoblastoid (daudi) cells, which encoded an ifn-α binding protein (uzé et al., 1990) containing 557 amino acids (molecular mass 63485 da) including a signal sequence of 27 mainly hydrophobic amino acids.
this protein has a structure typical of a transmembrane glycoprotein: a large n-terminal extracellular domain, which potentially could be highly glycosylated owing to a preponderance of n-linked glycosylation sites, a short hydrophobic transmembrane domain, and an intracellular or cytoplasmic tail (figure 25.5). its amino acid sequence shows little homology with any currently available sequences of proteins, including the sequence of the human ifn-γ receptor (aguet et al., 1988); however, the extracellular domain has been predicted to show structural similarities with the latter receptor and to a lesser extent with the so-called hematopoietin receptor supergroup (bazan, 1990a,b). the gene coding for this putative ifn-α receptor has been mapped to chromosome 21q22, in confirmation of the earlier rodent × human hybrid cell data (tan et al., 1973; slate et al., 1978; epstein et al., 1982; raziuddin et al., 1984). although the cloned "ifn-α receptor" could be shown to confer sensitivity to human ifn-α8 in transfected mouse cells (uzé et al., 1990), such cells were relatively insensitive to human ifn-α2 and human ifn-β. these findings, together with those from anti-ifn-α receptor antibody blocking studies (colamonici et al., 1990; revel et al., 1991; uzé et al., 1991) and affinity cross-linking studies with ifn-α (colamonici et al., 1992), have suggested that a second ifn-α receptor exists or that another component is required, besides the cloned ifn-α8 binding protein, to complete the receptor complex. this hypothesis is further supported by a study in which introduction of a yeast artificial chromosome (yac) containing a segment of human chromosome 21 into chinese hamster ovary (cho) cells conferred a greatly increased response to both ifn-α2 and ifn-α8, as well as an increased response to ifn-β and ifn-ω, whereas the expression of the ifn-α8 binding protein alone did not confer sensitivity (revel et al., 1991).
however, these increased responses can be "knocked out" by disruption of the ifn-α8 binding protein gene in the yac, and then reconstituted by expression of the cdna encoding the ifn-α8 binding protein (cleary et al., 1994), suggesting that cell surface expression of this protein is required for a fully functional receptor (see also hertzog et al., 1994; constantinescu et al., 1994). a second human ifn-α/β receptor, which is probably the additional component of the receptor complex referred to above, has been cloned (novick et al., 1994). the 1.5 kb cdna encodes a 331-amino-acid protein, including a signal sequence, which has the predicted structure of a transmembrane glycoprotein. the n-terminal ectodomain (217 amino acids) corresponds in sequence to a soluble 40 kda ifn-α/β binding protein, p40, isolated from urine. this domain is linked to a transmembrane segment (21 amino acids) and a relatively small cytoplasmic domain of 67 amino acids (figure 25.5). overall, the primary sequence shows little homology with that of the previously cloned ifn-α8 binding protein (uzé et al., 1990), but when the extracellular domains are compared, 23.4% relatedness is found (novick et al., 1994), suggesting that both of these ifn binding proteins belong to the same so-called class ii cytokine receptor family (uzé et al., 1995).
two classes of cytokine receptor (class i and class ii) have been proposed by bazan (1990a,b), these being distinguished by the positions of cysteine pairs in the extracellular domain. the latter is comprised of fibronectin type iii-like units containing around 200 amino acids and designated d200 (uzé et al., 1995). a schematic drawing of both ifn-α/β receptor chains is shown in figure 25.5. subsequently, it has been found that alternative splicing of the ifn-α/β receptor gene can produce a transcript encoding a long form of the receptor protein containing a larger cytoplasmic domain of 251 amino acids. mouse cells transfected with the cdna encoding the ifn-α/β binding protein bind ifn-α2 but are insensitive to its effects, suggesting that an accessory protein, possibly the cloned ifn-α8 binding protein, is required for signaling (novick et al., 1994; constantinescu et al., 1994). the findings that anti-p40 antiserum and a particular monoclonal antibody to the ifn-α8 binding protein (benoit et al., 1993) both block the biological activity of ifn-α/β/ω indicate that the ifn-α/β and ifn-α8 binding proteins are in close proximity, and thus probably interact to form a high-affinity ifn-α/β/ω receptor complex. the most likely scenario on present evidence is for a two-chain ifn-α/β/ω receptor, comprising the cloned ifn-α8 binding protein (uzé et al., 1990) and the long form of the ifn-α/β binding protein (lutfalla et al., 1995), each of which binds to some extent particular ifn types or ifn-α subtypes but which together more strongly bind all ifn-α, -β, and -ω species and function to transmit signals across the cell membrane. however, it is not completely ruled out that other cell surface components, e.g., membrane glycosphingolipids, are required for fully functional receptors (colamonici et al., 1992; platanias et al., 1994; ghislain et al., 1995; uzé et al., 1995) or, possibly, that there are alternative ifn receptors, e.g., the epstein-barr virus/complement c3d receptor as an ifn-α receptor on b-lymphocytes (delcayre et al., 1991). vaccinia virus and other orthopoxviruses contain a gene, b18r, encoding a soluble type i ifn receptor which, unlike the class ii cytokine type receptors, belongs to the immunoglobulin superfamily (symons et al., 1995; colamonici et al., 1995).
7.1 the intracellular domains of the two cloned ifn-binding proteins are unrelated to the tyrosine kinase class of receptors, e.g., epidermal growth factor receptor (egf-r) and platelet-derived growth factor receptor (pdgf-r), and are not predicted to have kinase activity of any sort (uzé et al., 1990; novick et al., 1994). however, it appears that the cytoplasmic domains of the ifn-α8 and ifn-α/β binding proteins associate with the nonreceptor tyrosine kinases tyk2 and janus kinase 1 (jak1), respectively, known to be involved in the signal transduction pathway of ifn-α/β and other cytokines (novick et al., 1994; ghislain et al., 1995; ihle, 1995; ihle and kerr, 1995; velasquez et al., 1995). the current understanding of this pathway is as follows. after binding of ifn-α/β/ω to their cognate receptors, the intracellular domains are phosphorylated by tyk2 and jak1. these phosphorylated domains act as docking sites for the cytoplasmic stat (signal transducers and activators of transcription) proteins p84/p91 (stat1α/β) and p113 (stat2) (ihle, 1996). the latter undergo tyrosine phosphorylation mediated by receptor-associated tyk2/jak1, dimerize, translocate to the nucleus, and combine with a dna binding protein, p48, to form the ifn-stimulated gene factor-3 (isgf3) transcription factor complex (schindler et al., 1992; velasquez et al., 1992; müller et al., 1993; platanias et al., 1994; shuai et al., 1994; gupta et al., 1996; yan et al., 1996). both tyk2 and jak1 need to be reciprocally activated for signal transduction to occur, since cell mutants lacking either tyk2 or jak1 are unresponsive to ifn-α (ihle, 1995; ihle and kerr, 1995). isgf3 binds to cis-acting ifn-stimulated response elements (isre), present in the promoter regions of ifn-inducible genes, to initiate their transcription (williams, 1991).
targeted disruption of the stat1 gene in mice has shown that stat1 has an obligatory role in ifn-α and ifn-γ signaling (durbin et al., 1996; meraz et al., 1996). the jak1/tyk2-isgf3 pathway may not be the only "receptor-to-cell nucleus" signaling mechanism activated in ifn-stimulated cells. there has been some evidence to implicate protein kinase c (pkc) pathways as well (reich and pfeffer, 1990; pfeffer et al., 1991; c. wang et al., 1993). however, this remains controversial owing to the lack of specificity of the kinase inhibitors used (kessler and levy, 1991; james et al., 1992). ifn-inducible genes have a common regulatory nucleotide sequence (g/a)ggaaan(n)gaaact in their 5' flanking region and this type of sequence, which resembles the vre sequences (ryals et al., 1985) present in ifn genes, is designated interferon-stimulated response element (isre) (williams, 1991). the resemblance between isre and vre sequences probably accounts for the finding that many ifn-inducible genes are transcriptionally activated by virus infection or dsrna, which also activate the transcription of ifn genes (hug et al., 1988; wathelet et al., 1988). as mentioned previously (see section 5.1.3), ifn-receptor occupation activates cytoplasmic isgf3 and this complex is translocated to the nucleus and binds to the isre of ifn-inducible genes as a transcriptional activator. in addition, a second factor, isgf2, forms complexes with isre in ifn-stimulated cells. isgf2 is a single, inducible phosphoprotein that has been shown to be identical to irf-1 (pine et al., 1990; williams, 1991; reis et al., 1992). the role of a third transcription factor, isgf1, which is constitutively produced and requires only the central 9 bp core of isre for binding, remains to be fully defined (kessler et al., 1988).
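the isre consensus quoted above, (g/a)ggaaan(n)gaaact, translates directly into a pattern that can be scanned against a 5' flanking sequence. the sketch below is illustrative only: the name `find_isre` and the toy sequence are hypothetical, the bracketed (n) is read here as an optional second spacer nucleotide (an interpretation, not something the text spells out), and only the given strand is searched.

```python
import re

# (G/A)GGAAAN(N)GAAACT, with N = any nucleotide.
# Assumption: "(N)" is optional, so the spacer is one or two bases long.
# The lookahead allows overlapping candidate sites to be reported.
ISRE = re.compile(r"(?=([GA]GGAAA[ACGT]{1,2}GAAACT))")


def find_isre(seq):
    """Return (1-based start, matched site) pairs on the given strand only."""
    return [(m.start() + 1, m.group(1)) for m in ISRE.finditer(seq.upper())]


# Made-up toy sequence with two planted sites; NOT a real promoter.
toy = "ttcagGGGAAAATGAAACTgcaAGGAAACCGAAACTtt"
for start, site in find_isre(toy):
    print(start, site)
```

a strict consensus match of this kind will miss functional elements that deviate from the consensus, so in practice it serves as a first filter rather than as evidence of ifn inducibility.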
a number of other negative regulatory factors, including irf2 (harada et al., 1989) and the isgf2 (irf1)/isgf3γ-related "human interferon consensus sequence binding protein" (icsbp) (weisz et al., 1992; bovolenta et al., 1994), which also bind to isre, are probably also involved in the regulation of transcription of ifn-inducible genes. it is clear that the regulation of expression of ifn-inducible genes is complex and that the mechanisms that control their selective expression are not fully understood (see taylor and grossberg, 1990, for review). ifn-inducible proteins, whose number probably exceeds 20, include both those proteins induced early after ifn stimulation and those proteins that may be produced at later times, often in response to the actions of "early" ifn-inducible proteins (sen and lengyel, 1992). the full set of ifn-inducible proteins is probably not known, but several have been identified and characterized. table 25.2 shows an incomplete list of ifn-inducible proteins together with their likely functions. some of these proteins are not exclusively induced by ifn-α/β/ω; ifn-γ and certain other cytokines, e.g., tumor necrosis factor-α (tnf-α), often induce spectra of proteins that overlap with the set induced by ifn-α/β/ω (revel and chebath, 1986; rubin et al., 1988; wathelet et al., 1992). it should be noted that ifn-inducible proteins tend also to be cell type-specific and thus not all proteins listed in table 25.2 will be expressed in all cell types. in some cases, ifn-inducible proteins are completely absent from a cell before ifn stimulation, but in other cases they are constitutively produced and their synthesis is augmented by ifn.

table 25.2 (ifn-inducible proteins and their likely functions), references: revel and chebath, 1986; sen and lengyel, 1992; samuel, 1987; staeheli, 1990. revel and chebath, 1986; lengyel, 1992. de maeyer and de maeyer-guignard, 1988; heron et al., 1978. schwemmle and staeheli, 1994. ronni et al., 1993. revel and chebath, 1986. c. wang et al., 1993. kumar and atlas, 1992. loeb and haas, 1992. aiidridge et al., 1989. fellous et al., 1982. sen and lengyel, 1992. sen and lengyel, 1992. bange et al., 1994. chebath et al., 1983. choubey and lengyel, 1992. lawn et al., 1981a. constantoulakis et al., 1993. porter and itzhaki, 1993. hokland and berg, 1981. thomas and linch, 1991. revel and chebath, 1986.

the genes for mouse ifn-α subtypes and mouse ifn-β (no functional mouse ifn-ω gene has been found) are located on mouse chromosome 4 (dandoy et al., 1984, 1985; de maeyer and de maeyer-guignard, 1988, for review). these genes are, like their human counterparts, intronless and of comparable structure. twelve mouse ifn-α genes or pseudogenes have been identified, of which the cdnas for 10 different genes have been cloned and expressed (langer and pestka, 1985; de maeyer and de maeyer-guignard, 1988). mouse ifn-α subtype proteins contain 166 or 167 amino acids, or exceptionally 162 (mouse ifn-α8), and the four cysteines at positions (langer and pestka, 1985). there is only a single-copy mouse ifn-β gene and this encodes the 161-amino-acid mature mouse ifn-β protein (higashi et al., 1983; de maeyer and de maeyer-guignard, 1988). mouse ifn-β contains only one cysteine and thus cannot form intramolecular disulfide bonds. it has three potential n-linked glycosylation sites and is heavily glycosylated when secreted from mouse fibroblasts; the molecular mass of the native glycoprotein is approximately 34 kda compared to the predicted 17 kda for the nonglycosylated counterpart (de maeyer and de maeyer-guignard, 1988). the amino acid sequence of mouse ifn-β is about 48% related to that of human ifn-β. the three-dimensional structure of mouse ifn-β has been solved (senda et al., 1992) (see section 3.1.1) and the protein has been shown to comprise five α-helices folded into a compact α-helical bundle (figure 25.4).
induction of transcription of mouse ifn-α subtype genes and the mouse ifn-β gene is probably regulated by transcription factor-binding nucleotide sequences present in the 5' noncoding promoter region, in a similar way to that of human ifn-α and ifn-β genes (see section 2.2). for example, repeated gaaa-rich sequences are present in the 5' flanking regions of most mouse ifn-α subtype genes and these are likely to be important for virus-inducible transcription (shaw et al., 1983; zwarthoff et al., 1985). inducers of mouse ifn-α and ifn-β synthesis, which include a number of viruses and double-stranded polynucleotides, are similar to those which induce human ifn-α, ifn-β, and ifn-ω production (stewart, 1979; de maeyer and de maeyer-guignard, 1988). similarly, the type of ifn produced follows the pattern found among different human cell types: fibroblastic and epithelial cell lines produce mainly ifn-β, whereas leukocytes produce mainly ifn-α subtypes (de maeyer and de maeyer-guignard, 1988). the biological properties of mouse ifn-α and ifn-β are similar to those of human ifn-α, ifn-β, and ifn-ω (see section 5). since mouse and human ifn-α subtypes are only 40% homologous, there is considerable species preference in biological activity, i.e., mouse ifn-α is weakly active in human cells and vice versa. mouse ifn-β is also not active in human cells (stewart, 1979). rather less is known regarding receptors for mouse ifn-α and ifn-β than for the human counterparts, but it is probable that they comprise two or more chains, as is the case for the human ifn-α/β/ω receptors (uzé et al., 1995). the mouse equivalent receptor chain to the ifn-α8 binding protein (uzé et al., 1990) has been cloned (uzé et al., 1992). the gene for this mouse ifn-α/β receptor has been located to mouse chromosome 16 (cheng et al., 1993).
the mouse ifn-α/β receptor is 564 amino acids long and is divided into a large n-terminal extracellular domain (403 amino acids), a short hydrophobic transmembrane segment (20 amino acids), and a cytoplasmic domain (141 amino acids) (uzé et al., 1992). the extracellular domain contains eight potential n-linked glycosylation sites and is predicted to exhibit the two-d200 domain structure of the human ifn-α8 binding protein extracellular domain (figure 25.5) (uzé et al., 1995). further mouse ifn-α/β receptors or components thereof await identification and characterization. signal transduction via mouse ifn-α/β receptors is expected to involve the jak1/tyk2-isgf3 (stat 1/2) pathway as outlined for human ifn-α/β/ω receptors (see section 7.1). stat 1 gene-deleted mice show no overt developmental abnormalities, but they display a complete lack of responsiveness to mouse ifn-α and ifn-γ (durbin et al., 1996; meraz et al., 1996). as a consequence, stat 1 -/- mice are highly susceptible to infection by viruses and microbial pathogens. stat 1 is therefore an obligatory mediator in the signal transduction pathway triggered by ifns. targeted disruption of the cloned mouse ifn-α/β receptor gave rise to a knockout with a similar phenotype (müller et al., 1994). such mice, lacking the ifn-α/β receptor, developed normally but were unable to respond to mouse ifn-α/β and thus unable to cope with viral infections.

the potent antiviral activity of ifn-α/β/ω together with their potential antitumor actions provided the impetus for large-scale manufacture of ifns for the purpose of clinical evaluation in a variety of viral and malignant diseases. in the early 1970s, ifn production depended on pooled human buffy coats (leukocytes) and thus only limited quantities could be made (cantell and hirvonen, 1977). later in that decade, human lymphoblastoid cells (e.g., namalwa), which could be grown to large culture volumes, became available for ifn production.
by the 1980s, following the cloning of ifn-α and ifn-β, these ifn species were mass-produced by recombinant dna (rdna) technology, leading to abundant availability of certain ifn-α subtypes, e.g., ifn-α2, and "stabilized" ifn-β ser 17. there followed production of ifn-γ and ifn-ω by this means. clinical usage of ifn-α preparations far exceeds that of ifn-β and ifn-ω because of early production difficulties with the latter types, though these are now solved. at the beginning of the 1980s there was tremendous enthusiasm, both from manufacturers of ifns and from clinicians, to evaluate the therapeutic potential of ifns. however, early clinical trials had been poorly devised, were not "blinded", and often yielded only anecdotal evidence of success. it was only after many controlled, randomized studies had been conducted that it became apparent that ifns in general, administered as a single agent, were not beneficial for the treatment of the majority of malignant diseases, including the major cancers (lung, breast, colon) of the developed world. the initial optimism all but vanished and was replaced in the mid-to-late 1980s by a more sober and realistic appreciation of the potential therapeutic value of ifns. a number of general conclusions have been drawn, as follows. (i) ifn-α and ifn-β, and to a lesser extent ifn-γ, have antitumor activity in a small number of cancers, particularly in those that are relatively slow-growing and well-differentiated. (ii) there is no indication that heterogeneous ifn-α preparations containing mixtures of ifn-α subtypes (e.g., leukocyte ifn, lymphoblastoid ifn) have different clinical effects from those of homogeneous, recombinant ifn-α subtype or ifn-β preparations. (iii) continuous or intermittent high dosing appears to be required for antitumor efficacy. (iv) ifns probably work best in patients with a minimal tumor burden (balkwill, 1989).
a major concern that has emerged from clinical studies is that ifns all generate a considerable number of undesirable, clinically observable side-effects, including fever, chills, malaise, myalgia, headache, fatigue, and weight loss, and in certain cases these have been severe enough for treatment to be halted (bottomly and toy, 1985; rohatiner et al., 1985; goldstein and laszlo, 1986). in addition, a variable proportion (1-40%) of patients treated with ifn-α or ifn-β, especially recombinant ifn-α2 and recombinant ifn-β ser 17, develop neutralizing antibodies to the ifn species used (rinehart et al., 1986; antonelli et al., 1991), which in some instances have been associated with clinical "resistance" to ifn (steis et al., 1988; oberg et al., 1989; freund et al., 1989; fossa et al., 1992). a further important, but generally unrecognized, side-effect of ifn-α treatment is the possible induction of certain types of autoimmune disease (feldmann et al., 1989; gutterman, 1994), probably mediated via ifn-induced upregulation of mhc antigen expression and generalized immunosuppression.

the most responsive cancer to ifn-α therapy is a very rare form of b-cell leukemia, known as "hairy cell" leukemia (hcl), in which a response rate of up to 80% has been reported (gutterman, 1994; baron et al., 1991; vedantham et al., 1992). in hcl patients, the "hairy cells" invade the spleen and bone marrow and the disease takes an indolent course. it has been shown convincingly that ifn-α therapy continued over several months leads to a clearance of "hairy cells" and in some patients a long-term remission is achieved. ifn-β ser 17 or ifn-γ were less effective against hcl (saven and piro, 1992). the ifn-α-induced mechanisms whereby clearance of "hairy cells" is achieved are not fully understood, but it is believed that a direct action of ifn-α leading to differentiation of "hairy cells" to a nonproliferating phenotype is involved (vedantham et al., 1992; gutterman, 1994).
not all patients benefit greatly from ifn-α treatment and some develop neutralizing antibodies, particularly when ifn-α2 is used (steis et al., 1988). when such neutralizing antibodies cause resistance to further ifn-α2 treatment, clinical responses can be "rescued" by switching to a heterogeneous ifn-α preparation, e.g., leukocyte ifn-α (von wussow et al., 1991). however, on the whole, ifn-α therapy of hcl appears at least as effective and durable as chemotherapy with the drug pentostatin (2'-deoxycoformycin) (saven and piro, 1992).

ifn-α therapy has also been shown to slow down the progression of chronic myelogenous leukemia (cml) (baron et al., 1991; gutterman, 1994). in this malignant disease, leukemic cells grow slowly in the initial chronic, but benign, phase and persist for 2-4 years, but there follows a dramatic "blast crisis" producing rapidly proliferating myeloid leukemia cells and a fatal outcome. cml patients treated with ifn-α in the chronic phase often achieve durable remissions, sometimes lasting up to 8 years, associated with the elimination of leukemic cells bearing the so-called "philadelphia chromosome".

other malignancies in which ifn-α therapy seems to work, although generally with a lower percentage of patients responding than in hcl and cml, include low-grade non-hodgkin lymphoma, cutaneous t cell lymphoma, carcinoid tumors, renal cell carcinoma, squamous epithelial tumors of the head and neck, multiple myeloma, and malignant melanoma. in most of these cancers, complete responses are low compared to partial responses, but ifn-α may help with maintenance therapy of diseases in some cases, e.g., multiple myeloma (mandelli et al., 1990; johnson and selby, 1994). the neovascularization of primary tumors is a crucial step in their development and thus the anti-angiogenic activity of ifn-α/β (sidky and borden, 1987) may have therapeutic value in certain early malignancies, e.g., primary melanoma (gutterman, 1994).
kaposi sarcoma, often found in aids patients, has been regarded as an angiogenic tumor or angioproliferative disease, which may explain why ifn-α treatment can lead to regression of lesions in up to 40% of patients with this condition (de wit et al., 1988; groopman and scadden, 1989). in preclinical systems, the combination of ifn therapy and conventional chemotherapy has appeared to offer greater chances of producing effective treatment of many cancers, but in clinical trials this strategy has produced mostly disappointing results (see wadler and schwartz, 1990, for review). this may be due to (i) the inability of preclinical models to predict the clinical situation accurately; (ii) the lack of understanding of the biochemical interactions and biological consequences of combining ifns and chemotoxic agents; (iii) a failure to incorporate information on dose, scheduling, and sequence of administration of ifns and chemotoxic agents into clinical trials.

despite having proven antiviral activity in vitro, ifns have not proved the hoped-for panacea for most common viral infections in man. ifn-α/β prevent the replication of common cold viruses (rhinoviruses and coronaviruses) in the test tube and when administered to volunteers in the form of a nasal spray, but cannot "cure" colds once they are established (scott et al., 1982; r.m. douglas et al., 1986; turner et al., 1986). ifn-α is only partially effective in preventing influenza virus infections (treanor et al., 1987). topical applications of ifn-α/β in the form of creams or ointments to herpes virus lesions, e.g., in herpes zoster (shingles), and to genital warts (condyloma acuminatum) caused by papilloma viruses have been investigated, but have given limited beneficial effects.
however, when administered parenterally, i.e., by intramuscular or intravenous injection, greater beneficial effects of ifn-α/β on virally caused lesions and warts have been found, although not to an extent that ifn therapy has become the treatment of choice (schneider et al., 1987; j.m. douglas et al., 1990; baron et al., 1991; gutterman, 1994). another wart-like disease, juvenile laryngeal papilloma (jlp), which can severely obstruct the airways of young children and is caused by the same papilloma virus types (6 and 11) as cause genital warts, has also been found to respond beneficially to ifn-α therapy. disappointingly, ifn-α therapy appears neither curative nor of substantial value as an adjunctive agent in the long-term management of jlp (healy et al., 1988).

probably the most successful application of ifn-α therapy to viral disease is in the treatment of chronic active hepatitis, caused by either hepatitis b or c viruses (baron et al., 1991; gutterman, 1994). up to about 40% of chronic active hepatitis b patients respond to ifn-α therapy; viral infectivity markers disappear and seroconversion and cure follow. it is interesting in the case of hepatitis b virus that viral activity is responsible for inhibiting the endogenous ifn system, and thus the administration of exogenous ifn-α constitutes a replacement therapy. in hepatitis c virus infection, some serotypes of the virus are apparently more sensitive to ifn-α therapy than others and prolonged treatment (>6 months) may be necessary to prevent relapses occurring (gutterman, 1994). both ifn-α and ifn-β have been shown to inhibit human immunodeficiency virus-1 (hiv-1) replication in vitro (hartshorn et al., 1987).
however, in vivo, there is little evidence showing that ifn-α therapy has any long-term beneficial effect in asymptomatic hiv-1-positive individuals or aids patients (friedland et al., 1988; lane et al., 1990), except for limited regressions in kaposi sarcoma lesions (de wit et al., 1988; groopman and scadden, 1989). combination therapies for hiv-1-infected individuals involving ifn-α and antiviral drugs such as zidovudine (azt) have also proved to be ineffective (berglund, 1991).

as mentioned earlier, ifn-α/β inhibits hematopoiesis and therefore induces leukopenia in patients. this effect has generally been thought to be undesirable and it can lead to immunosuppression; however, it has proved useful for the treatment of diseases in which there is uncontrolled leukocytosis, e.g., thrombocytosis (markedly elevated platelet numbers), associated with various myeloproliferative diseases (gisslinger et al., 1989). resistance to ifn-α2 therapy has occurred in such patients when neutralizing antibodies to ifn-α2 have developed, but successful retreatment with a heterogeneous ifn-α preparation, lymphoblastoid ifn (ifn-αn1), has been reported (brand et al., 1993).

the finding that production of ifn-α and ifn-γ was deficient in multiple sclerosis (ms) patients (neighbor and bloom, 1979) stimulated clinical trials to evaluate ifns in ms. rather unexpectedly, it has repeatedly been found that ifn-β, either natural fibroblast-derived or the later recombinant ifn-β ser 17 (ifn-β-1b), injected intrathecally, subcutaneously, or intramuscularly in patients with relapsing/remitting disease leads to a reduced rate of exacerbations of the disease and thus is possibly of clinical benefit in some patients (jacobs et al., 1981, 1987, 1993; the ifnb multiple sclerosis study group, 1993; paty et al., 1993).
the ifn-β-induced mechanisms that contribute to this beneficial outcome are not known, but immunomodulatory actions are probably involved, e.g., suppression of growth and activity of autoreactive t lymphocytes in the central nervous system (goodkin, 1994). it is unclear whether ifn-α would have a similar effect. however, the results with ifn-β treatment have been encouraging so far, although more follow-up of patients will be necessary to monitor any effects on the clinical progression of ms (ebers, 1994).

accumulation and breakdown of rna-deficient intracellular virus particles in interferon-treated nih 3t3 cells chronically producing moloney murine leukaemia virus inhibition of the cellular response to interferons by products of the adenovirus type 5 e1a oncogene antigenic structure of human interferon ω1 (interferon alpha ii): comparison with other human interferons monoclonal antibodies and enzyme immunoassays specific for human interferon (ifn) ω1: evidence that ifn-ω1 is a component of human leukocyte ifn purification and characterisation of natural interferon ω1: two alternative cleavage sites for the signal peptidase natural human interferon-α2 is o-glycosylated human interferon ω1: isolation of the gene, expression in chinese hamster ovary cells and characterization of the recombinant protein high affinity binding of 125i-labelled mouse interferon to a specific cell surface receptor various human interferon α subclasses cross-react with common receptors: their binding activities correlate with their specific biological activities molecular cloning and expression of the human interferon-γ receptor interferon β increases expression of vimentin at the mrna and protein levels in differentiated embryonal carcinoma (psmb) cells nomenclature of the human interferon proteins a family of structural genes for human lymphoblastoid (leukocyte-type) interferon neutralizing antibodies to interferon-α: relative frequency in patients treated with different
interferon preparations distinct activation of murine interferon-~ promoter region by irf-1/isgf-2 and virus infection human recombinant interferon c~-2c enhances the expression of class ii hla antigens on hairy cells cytokines in cancer therapy ifp35 is an interferon-induced leucine zipper protein that undergoes interferon-regulated cellular redistribution the interferons: mechanisms of action and clinical applications shared architecture of the hormone binding domains in type i and ii interferon receptors haemopoietic receptors and helical cytokines vaccinia virus-encoded eif-2 alpha homolog abrogates the antiviral effect of interferon a monoclonal antibody to recombinant human ifn-c~ receptor inhibits biologic activity of several species of human ifn-c~, ifn-[3 and ifn-m combined treatment of symptomatic human immunodeficiency virus type 1 infection with native interferon c~ and zidovudine clinical side effects and toxicities of interferon molecular interactions between interferon consensus sequence binding protein and members of the interferon regulatory factor family molecular analysis of the human interferon-cx gene family the interferon receptors evidence that types i and ii interferons have different receptors successful retreatment of an anti-interferon resistant polycythaemic vera patient with lymphoblastoid interferon-czn1 and in vitro studies on the specificity of the antibodies preparation of human leukocyte interferon for clinical use two distinct families of human and bovine interferon-c, genes are coordinately expressed and encode functional polypeptides identification of a common nucleotide sequence in the 3'-untranslated region of mrna molecules specifying inflammatory mediators activation of the ppp(a2'p)na system in interferon-treated, herpes simplex virus-infected cells and evidence for novel inhibitors of the ppp(a2'p)na-dependent rnase interferon-induced 56,000 m r protein and its mrna in human cells: molecular cloning and partial sequence of 
the edna constitutive expression of (2'-5') oligo a synthetase confers resistance to picornavirus infection gart, son, ifnar, and crf-24 genes cluster on human chromosome 21 and mouse chromosome 16 interferon action: nucleolar and nucleoplasmic localization of the interferoninducible 72-kd protein that is encoded by the if: 204 gene from the gene 200 cluster sequence requirement for specific interaction of an enhancer binding protein (ebp1) with dna binding of epstein-barr virus small rna eber-1 to the double-stranded rna-activated protein kinase dai knockout and reconstitution of a functional human type i interferon receptor complex characterization of three monoclonal antibodies that recognize the ifna2 receptor multichain structure of the ifn-cx receptor on hematopoietic cells vaccinia virus b18r gene encodes a type i interferon-binding protein that blocks interferon transmembrane signaling role of interferon cz/[3 receptor chain 1 in the structure and transmembrane signalling of the interferon ~/[3 receptor complex inhibition of revmediated hiv-1 expression by an rna binding protein encoded by the interferon-inducible 9-27 gene expression of gene rrg is associated with reversion of nih 3t3 transformed by ltr-c-h-ras linkage analysis of the murine interferon-0~ locus on chromosome 4 segregation of restriction fragment length polymorphism in an interspecies cross of laboratory and wild mice indicates tight linkage of the murine ifn-[3 gene to the murine ifn-cx genes nucleotide sequence of the chromosomal gene for human fibroblast (131) interferon and of the flanking regions epstein barr virus/complement c3d receptor is an interferon c~ receptor interferons and other regulatory cytokines isolation and structure of a human fibroblast interferon gene isolation and characterisation of a human fibroblast interferon gene and its expression in escherichia coli clinical and virological effects of high-close recombinant interferon alpha in disseminated aids-related 
kaposi's sarcoma nomenclature of the human interferon genes cooperative interaction of multiple dna elements in the interferon-j3 promoter cloning and expression of a long form of the 13 subunit of the interferon 0q3 receptor that is required for signalling a randomized trial of combination therapy with intralesional interferon alpha 2b and podophyuin versus podophyllin alone for the therapy of anogenital warts prophylactic efficacy of intranasal alpha-2 interferon against rhinovirus infections in the family setting an atf/creb binding site protein is required for virus induction of the human interferon ]3 gene targeted disruption of the mouse stat 1 gene results in compromised innate immunity to viral disease molecular cloning of human alpha and beta genes from namalwa cells treatment of multiple sclerosis isolated interferon alpha-receptor complexes stabilised in vitro human interferon-~a, -~2 and -~2 (arg) genes in genomic dna direct evidence that the gene product of the human chromosome 21 locus, ifrc, is the interferon-cx receptor recombinant human leukocyte interferon produced in bacteria has antiproliferative activity characterization of an interferon receptor on human lymphoblastoid cells two different virus-inducible elements are required for human ]3-interferon gene regulation purification, concentration and physico-chemical properties of interferons enhanced expression of hla antigens and ~2-microglobulin on interferon-treated human lymphoid cells modulation of tubulin mrna levels by interferon in human lymphoblastoid cells why are there so many subtypes of alpha-interferons human interferon omega (co) binds to the ~/~ receptor recombinant interferon c~-2a combined with prednisone in metastatic renal-cell carcinoma: treatment results, serum interferon levels and the development of antibodies expression of the terminal protein region of hepatitis b virus inhibits cellular responses to interferons c~ and 3' and double-stranded rna recombinant human 
interferon (ifn) alpha-2b in chronic myelogenous leukaemia: dose dependency of response and frequency of neutralizing anti-interferon antibodies a randomized placebo-controlled trial of recombinant human interferon alpha 2a in patients with aids delimitation and properties of dna sequences required for the regulated expression of human interferon evidence for a nuclear factor(s), irf-1, mediating induction and silencing properties to human ifn-j] gene regulatory elements involvement of a cis-element that binds an h2tf-1/nf-~cb like factor(s) in virus-induced interferon-[3 expression induction of the transcription factor irf-1 and interferon-j3 mrnas by cytokines and activators of second messenger pathways induction of endogenous ifn-~ and ifn-j3 genes by a regulatory transcription factor, irf-1 binding of the adenovirus vai rna to the interferon-induced 68-kda protein kinase correlates with function configuration of the interferon-cx/[3 receptor complex determines the context of the biological response cloning, sequencing and expression of two murine 2'-5'-oligoadenylate synthetases concerted evolution of human interferon ~ genes long-term interferon therapy for thrombocytosis in myeloproliferative diseases human leukocyte interferon produced by e. coli is biologically active the structure of eight distinct cloned human leukocyte cdnas interferon therapy in cancer: from imagination to interferon overlapping positive and negative regulatory domains of the human [3-interferon gene the human ~-interferon gene enhancer is under negative control interferon beta-lb human monocytes and lymphocytes produce different mixtures of s-interferon subtypes expression of human immune interferon edna in e. 
coli and monkey cells selective production of interferon-alpha subtypes by cultured peripheral blood mononuclear cells and lymphoblastoid cell lines novel human leukocyte interferon subtype and structural comparison of ~ interferon genes role of interferon in resistance to viral infection in vivo interferon and cell division. i. inhibition of the multiplication of mouse leukaemia l1210 cells in vitro by interferon preparations mechanism of antitumour effect of interferon in mice injection of mice with antibody to interferon enhances the growth of transplantable murine tumours interferon therapy for kaposi's sarcoma associated with the acquired immunodeficiency syndrome (aids) the structure of a thirty-six kilobase region of the human chromosome including the fibroblast interferon gene ifn-~ interferon receptors and their role in interferon action interferon action against reovirus: activation of interferon-induced protein kinase in mouse l929 cells upon reovirus infection the sh2 domains of stat 1 and stat 2 mediate multiple interactions in the transduction of ifn~ signals regression of the interferon signal transduction pathway by the adenovirus e1a oncogene cytokine therapeutics: lessons from interferon ~. proc. natl acad. sci. 
key: cord-284933-flbibrcm authors: kim, jong-oh; kim, jae-ok; kim, wi-sik; oh, myung-joo title: characterization of the transcriptome and gene expression of brain tissue in sevenband grouper (hyporthodus septemfasciatus) in response to nnv infection date: 2017-01-13 journal: genes (basel) doi: 10.3390/genes8010031 sha: doc_id: 284933 cord_uid: flbibrcm grouper is one of the favored seafood resources in southeast asia. however, outbreaks of viral nervous necrosis (vnn) disease caused by nervous necrosis virus (nnv) infection have led to mass mortality of grouper larvae. many aqua-farms have suffered substantial financial loss due to the occurrence of vnn. to better understand the infection mechanism of nnv, we performed a transcriptome analysis of sevenband grouper brain tissue, the main target of nnv infection. after artificial nnv challenge, the transcriptome of sevenband grouper brain tissue was subjected to next generation sequencing (ngs) using an illumina hiseq 2500 system. mrna from pooled samples of mock and nnv-infected sevenband grouper brains was sequenced. clean reads from the mock and nnv-infected samples were assembled de novo, yielding 104,348 unigenes. in addition, 628 differentially expressed genes (degs) in response to nnv infection were identified.
this result could provide critical information not only for the identification of genes involved in nnv infection, but also for understanding the response of sevenband groupers to nnv infection.

grouper is one of the highest valued marine fish and has become an important species in the aquaculture industry of various asian countries. in korea, sevenband grouper (hyporthodus septemfasciatus) is one of the favorite grouper fish consumed, and its production is increasing. however, viral nervous necrosis (vnn) causes high mortality, especially at the larval and juvenile stages of sevenband groupers during the summer season, which has caused vast economic losses [1]. viral nervous necrosis is a serious disease in the world aquaculture industry [2] [3] [4]. it was first reported in bigeye trevally (caranx sexfasciatus) in the 1980s and has since been reported in more than twenty species [2] [3] [4]. infected fish usually swim abnormally and show vacuolization and necrosis of the central nervous system in the brain [3]. in korea, mass mortalities caused by vnn have been reported since 1990 in various cultured marine fish such as sevenband grouper (hyporthodus septemfasciatus), rock bream (oplegnathus fasciatus), red drum (sciaenops ocellatus) and olive flounder (paralichthys olivaceus) [5] [6] [7]. nervous necrosis virus (nnv), the causative agent of vnn, has a non-enveloped icosahedral structure and belongs to the family nodaviridae (genus betanodavirus). its genome contains two single-stranded, positive-sense rnas: rna1 (approximately 3.1 kb in length) encodes an rna-dependent rna

died from day 3 after infection and showed 100% cumulative mortality after 1 week. the moribund fish at days 3 and 4 were selected for sampling. brain tissues of three of the ten challenged fish from each of the mock and virus-challenged groups were collected and pooled for ngs analysis.
to obtain high-throughput transcriptome data of sevenband grouper, complementary dna (cdna) libraries were prepared using a truseq rna sample preparation kit (illumina, san diego, ca, usa) according to the manufacturer's protocols. they were then paired-end (2 × 100 bp) sequenced using an illumina hiseq 2500 system (illumina, san diego, ca, usa). prior to de novo assembly, paired-end sequences were filtered and cleaned using the ngs qc toolkit [21] to remove low-quality reads (q < 20) and adapter sequences. in addition, bases with quality below q20 at both ends of the filtered reads were trimmed. this step improves read quality, because mrna degrades from both ends over time [22]. only high-quality reads were used for de novo assembly, which was performed with trinity (version 20130225) using default values [23]. cd-hit-est [24] was used to remove redundant sequences.

ncbi blast (version 2.2.28) was used for the homology search to predict the function of unigenes: blastx was run to search all possible proteins against the ncbi non-redundant (nr) database (accessed on 17 july 2013). similarity was considered significant at an expect-value of less than 1 × 10^-5. after obtaining the assembled transcriptome data using trinity, gene expression levels were measured with rna-seq by expectation maximization (rsem), a tool for measuring the expression level of transcripts without requiring a reference genome [25]. the tcc package was used for deg analysis through the iterative deges/deseq method [26]. normalization was iterated three times to identify meaningful degs between the compared samples [27]. degs were identified based on a p-value of less than 0.05. the go database classifies genes according to the three categories of biological process (bp), cellular component (cc) and molecular function (mf) and provides information on the function of genes.
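the end-trimming step described in the methods above (removing bases below q20 from both ends of each read) can be sketched as follows. this is a minimal illustration in python and not the ngs qc toolkit's own code: the function names `phred33_to_scores` and `trim_low_quality_ends` are hypothetical, and phred+33 quality encoding is an assumption.

```python
def phred33_to_scores(quality_string):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with quality < min_q from both ends of a read.

    Bases in the interior of the read are kept even if their quality is low;
    only the leading and trailing low-quality runs are removed.
    """
    start, end = 0, len(seq)
    # Advance past the low-quality run at the 5' end.
    while start < end and quals[start] < min_q:
        start += 1
    # Step back over the low-quality run at the 3' end.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    # Hypothetical read: first base and last two bases fall below Q20.
    read = "ACGTAC"
    scores = [10, 25, 30, 30, 15, 5]
    print(trim_low_quality_ends(read, scores))
```

a read whose bases all fall below the threshold is reduced to an empty sequence, so a real pipeline would also need a minimum-length filter after trimming.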
to characterize the genes identified by deg analysis, a go-based trend test was carried out using fisher's exact test. selected genes with p-values of less than 1 × 10^-5 were regarded as statistically significant. all the raw read files were submitted to the sequence read archive (sra) of the ncbi database (accession number srr5091816).

sequencing of the two libraries (mock and nnv-infected brain tissue) on the illumina hiseq 2500 platform generated a total of 45,101,102 (5,682,738,852 bases) and 34,715,846 (4,374,196,596 bases) raw reads, respectively (table 1). after the cleaning step with the ngs qc toolkit and removal of low-quality (q < 20) reads, 39,932,160 (5,006,434,933 bases) and 31,353,144 (3,932,946,324 bases) remained as clean reads, respectively (table 1). the percentages of clean reads were 88.1% and 89.9%, respectively (table 1). all the clean reads were submitted to trinity for de novo assembly. unigenes were identified after removing redundant sequences from the assembled transcripts. the number of unigenes was 104,348, and the total length and the average length of the unigenes were 88,123,224 bp and 845 bp, respectively (table 1). the length distribution of unigenes is presented in figure 1. among these unigenes, 66,204 unigenes (63.4%) were no more than 500 bp. a total of 15,382 unigenes (14.7%) were 501-1000 bp, 6991 unigenes (6.7%) were 1001-1500 bp, 4727 unigenes (4.5%) were 1501-2000 bp, 3332 unigenes (3.2%) were 2001-2500 bp, and 7712 unigenes (7.4%) were longer than 2500 bp.
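the summary statistics reported above can be re-derived from the raw counts. the sketch below is only an arithmetic check in python; all figures are copied from the text, while the variable and function names are hypothetical. the reported 88.1% and 89.9% are reproduced when the ratio is taken over base counts; taking it over read counts gives slightly higher values (about 88.5% and 90.3%).

```python
# Figures copied from the text (table 1 and the unigene length distribution).
RAW_BASES = {"mock": 5_682_738_852, "nnv": 4_374_196_596}
CLEAN_BASES = {"mock": 5_006_434_933, "nnv": 3_932_946_324}
TOTAL_UNIGENE_BP = 88_123_224
LENGTH_BINS = {
    "<=500": 66_204,
    "501-1000": 15_382,
    "1001-1500": 6_991,
    "1501-2000": 4_727,
    "2001-2500": 3_332,
    ">2500": 7_712,
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


def summarize():
    """Recompute clean-base percentages, unigene count, mean length and bin shares."""
    n_unigenes = sum(LENGTH_BINS.values())
    return {
        "clean_pct": {s: percent(CLEAN_BASES[s], RAW_BASES[s]) for s in RAW_BASES},
        "n_unigenes": n_unigenes,
        "mean_length": round(TOTAL_UNIGENE_BP / n_unigenes),
        "bin_pct": {b: percent(c, n_unigenes) for b, c in LENGTH_BINS.items()},
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```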
to estimate gene expression levels, we calculated the abundance of reads in the transcriptome. the top 20 most highly expressed transcripts are shown in table 2. in both the mock and nnv-infected groups, the most abundant genes were ribosomal proteins, such as ribosomal protein (rps) 15a, rpl39, rps28, rps14, rpls2, rps27a, rpl21 and rpl32, which are essential for biological metabolism. ubiquitin-like protein 4a (ubl4a), c-c motif chemokine 2 (ccl2), lysozyme g (lyg_epico) and two novel genes (id: sgu016297, sgu008676) were more highly expressed in the nnv-infected group than in the mock group. of these, ubiquitin-like protein 4a was the most abundant gene in the nnv-infected group. putative annotations of these transcripts were obtained using blastx as described in the methods section. after gene annotation using blastx against the non-redundant (nr) database, the putative functions of 43,280 sequences (41.5%) of the 104,348 unigene sequences were identified. a total of 3418 unigenes were differentially expressed based on deg analysis using the tcc package.
a total of 372 of the 3418 degs were annotated (table s1). immune-relevant degs were then curated manually (table 3). among the immune-relevant genes, a variety of cytokines were strongly up-regulated after nnv infection. the cytokine genes induced by nnv infection included members of the chemokine family, namely c-c motif chemokine ligand 2 (ccl2), ccl34, ccl19, ccl4, c-x-c motif chemokine ligand 13 (cxcl13), cxcl6, cxcl8 and cxcl9, as well as interleukin-12 subunit alpha (il12a) and beta (il12b), and il18b. ccl2 was the most strongly up-regulated gene in the infected group, showing a 10.66 log fold change (fc). ccl2 is involved in neuroinflammatory processes taking place in the central nervous system in various diseases [28]. cathepsins are lysosomal cysteine proteases with important roles in cellular homeostasis and the innate immune response [29]. among a dozen members of the cathepsin family, subtypes l, h, k, o, s and z were up-regulated in the brain of sevenband grouper after nnv infection. specifically, cathepsin l was highly expressed in the nnv-infected group, showing 8.3 log fc. several lectins were expressed at higher levels in the nnv-infected group than in the mock group, including c-type lectins (clec4m, clec10a), galectins (lgals9, lgals3), fucolectin (fucl4), and mannose-binding lectin (mbl). in the case of c-type lectins, the receptor (cd209) was also highly expressed in the infected group (table 3), indicating that c-type lectins might play specific roles in the response of sevenband grouper to nnv infection. as expected, a number of antiviral proteins also showed high levels of expression in the nnv-infected group. for example, radical s-adenosyl methionine domain-containing protein 2 (rsad2), also known as viperin, was highly expressed in the nnv-infected group with 10.40 log fc. the mx gene (mx), one of the important downstream effectors of interferon (ifn), was also more highly expressed in the infected group, with 8.64 log fc.
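to give a feel for the scale of the log fc values quoted above, the sketch below converts between expression values and log fold changes in python. the text does not state the base of the logarithm or how zero counts were handled, so base 2 and a pseudocount of 1 are assumptions made here for illustration only, and the function names are hypothetical.

```python
import math


def log2_fold_change(infected, mock, pseudocount=1.0):
    """log2 ratio of infected to mock expression.

    The pseudocount avoids division by zero when a transcript is absent
    from one sample (an assumption of this sketch, not the paper's method).
    """
    return math.log2((infected + pseudocount) / (mock + pseudocount))


def fold_from_log2(log2_fc):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2_fc


if __name__ == "__main__":
    # Hypothetical expression values: 7 in the infected sample, 0 in mock.
    print(log2_fold_change(7, 0))
    # Linear fold change implied by a log2 fc of 10.66 (the ccl2 value above).
    print(fold_from_log2(10.66))
```

under the base-2 assumption, a log fc of 10.66 corresponds to a difference of roughly 1600-fold between the infected and mock samples.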
besides mx, many ifn-induced proteins were up-regulated by nnv infection, including ifn-induced protein 44 (ifi44), ifn-induced protein with tetratricopeptide repeats 5 (ifit5), ifn-induced very large gtpase 1 (gvinp1), ifn-induced double-stranded rna-activated protein kinase (eif2ak2), and ifn-induced helicase c domain-containing protein 1 (ifih1). interestingly, of the various heat shock proteins (hsps), only hsp30 was significantly up-regulated in the nnv-infected group, with 8.42 log fc. nk-lysin, a known antibacterial protein, was also highly expressed in the nnv-infected group, with 8.85 log fc.

go is a widely used system for classifying genes and their products by function. therefore, the identified degs were subsequently used for go enrichment analysis. according to go terms, 2094 (61.3%) of the total of 3418 degs were classified into the three categories of molecular function, biological process, and cellular component. "binding" (1258 genes, 46.3%) was the major subcategory in the molecular function category. the largest subcategory in the biological process category was "cellular process" (1488 genes, 12.3%), while "cell" (1687 genes, 19.6%) and "cell part" (1687 genes, 19.6%) were the most abundant go terms in the cellular component category (figure 2). because one gene could be categorized into several subcategories, the sum of genes in the subcategories could exceed 100%. go analysis of the transcriptome revealed nine molecular function subcategories, 62 biological process subcategories, and 12 cellular component subcategories with p-values of less than 1 × 10^-5 (table s2).
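the go enrichment test used here relies on fisher's exact test. a minimal python sketch of the one-sided (over-representation) form is given below. the text only names fisher's exact test, so the one-sided variant, the function name `fisher_enrichment_p` and the argument layout are assumptions of this illustration rather than the authors' implementation.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation of a go term.

    k: degs annotated with the term
    n: all degs in the test set
    K: background genes annotated with the term
    N: all background genes
    Returns P(X >= k) where X follows a hypergeometric distribution.
    """
    total = comb(N, n)
    # Sum the hypergeometric probabilities of seeing k or more annotated degs.
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical toy numbers: 4 of 5 degs carry a term held by 5 of 10 genes.
    p = fisher_enrichment_p(k=4, n=5, K=5, N=10)
    print(p, p < 1e-5)
```

with the threshold from the text, a term would be reported as enriched only when the returned p-value is below 1 × 10^-5; the toy example is far from that threshold.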
nnv infection has caused high mortalities of sevenband groupers in aqua-farms during the summer season, especially at larval and juvenile stages. it has also caused tremendous economic losses [1]. because of the severe damage to the sevenband grouper industry, investigation of the molecular response to nnv infection is required to understand the outbreak mechanism of the disease and to develop prevention methods such as vaccines. in this study, we performed a transcriptome analysis of the brain tissue of sevenband grouper infected with nnv, compared to mock brain tissue, using rna-seq. the total number of unigenes and the average length of the unigenes were 104,348 and 845 bp, respectively. these values compare fairly well with previous ngs transcriptome studies on crimson spotted rainbowfish (107,749 transcripts, 961 bp) [30], common carp (130,292 transcripts, 1401 bp) [18], blunt snout bream (253,439 transcripts, 998 bp) [31], orange-spotted grouper (116,678 transcripts, 685 bp) [32] and asian seabass (89,026 transcripts, 1175 bp) [33]. gene annotation by blastx provides valuable information about the transcripts. in this study, 43,289 unigenes (41.5%) of 104,348 unigenes were annotated. this is similar to the result for orange-spotted grouper (45.8%) [32].
liu et al. have addressed the possible reasons for such poor annotation: (1) novel genes; (2) sequencing errors; and (3) artifacts of the assembly algorithm [33]. therefore, more genetic studies are needed to understand the biological functions.

the importance of innate defense mechanisms against viral infection has been extensively reviewed [34] [35] [36]. in this study, we identified innate immune response relevant genes of sevenband grouper involved in nnv infection. chemokines are critical components of the immune system. the roles of chemokines and their receptors in viral interactions have been reported in various studies [37].
the chemokine family comprises four subfamilies based on n-terminal cysteine motifs: c, c-c, c-x-c, and c-x3-c subfamilies [38] . in this study, we also detected significant up-regulation of ccl2, ccl34, ccl19, ccl4, cxcl13, cxcl6, cxcl8, and cxcl9 in sevenband grouper brain tissue after nnv infection. in particular, ccl2 was highly overexpressed, at about 10.66 log fc. ccl2 is a pro-inflammatory chemokine that is induced during several human acute and chronic viral infections, including human immunodeficiency virus (hiv) [39] , hepatitis c virus [40] , epstein-barr virus [41] , respiratory syncytial virus [42] , severe acute respiratory syndrome (sars) [43] , herpes simplex virus-1 [44] , and japanese encephalitis virus [45] . cathepsins are lysosomal proteases that play important roles in normal metabolism for the maintenance of cellular homeostasis. cathepsins are one of the superfamilies involved in the regulation of antigen presentation and degradation as well as immune responses, including apoptosis, inflammation, and regulation of hormone processing [46] [47] [48] . in addition, chandran et al. have shown that cathepsin b and cathepsin l are involved in ebola virus infection [49] . they are also involved in the entry of reovirus [50] . recently, cathepsin l has also been shown to be involved in the entry mediated by the sars coronavirus spike glycoprotein [51] as well as in the processing of the fusion glycoprotein of hendra virus [52] . in this study, cathepsin l and cathepsin s were found to be notably expressed after nnv infection. their functional roles in the interaction between grouper and nnv merit further studies. lectins are carbohydrate-binding proteins that are highly specific for sugar moieties. they mediate the attachment and binding of viruses to their targets [53] . lectins are also known to play important roles in the immune system. within the innate immune system, lectins can help mediate the first line of defense against invading microorganisms. 
in this study, several kinds of lectins were found to be highly induced in sevenband grouper brain tissue by nnv infection, such as c-type lectins (ctls), galectins, fucolectin, and mannose-binding lectin. ctls are the best-studied lectins. they can promote antibacterial and antiviral immune defense [54] . many ctls have been identified in teleosts but the exact function of ctls remains unclear. hundreds of interferon stimulated genes (isgs) were transcribed in sevenband grouper brain tissue during nnv infection. interferon induced protein 44 (ifi44) was expressed the most. ifi44 is an interferon-alpha inducible protein associated with infection of several viruses. power et al. have demonstrated that ifi44 can inhibit hiv-1 replication in vitro by suppressing hiv-1 ltr promoter activity [55] . carlton-smith and elliott have screened isgs related to bunyamwera orthobunyavirus replication using a nonstructural (nss) protein knockout virus. one of the isgs found to have inhibitory activity was ifi44 [56] . whether protein b2 of nnv has roles in virus replication and its relationship with isgs such as ifi44 merit further study. hsps are one of the most phylogenetically conserved classes of proteins with critical roles in maintaining cellular homeostasis and protecting cells from various stresses [57] . paradoxically, hsp70 suppresses infection by some viruses but supports the replication of others [58] . in this study, hsp30, rather than hsp70, was the highly induced gene. hsp30 has also been reported to be the most induced gene in nnv-infected asian seabass epithelial cells [33] . however, the function of hsp30 in nnv infection remains unclear. krasnov et al. previously reported the effects of nnv on gene expression in atlantic cod brain using a microarray [59] . 
compared to our study, a number of genes show a similar up-regulation result in the study, such as caspase, cathepsins, irf, radical s-adenosyl methionine domain-containing protein, tripartite motif-containing protein (trim) and so on. however, a lot of novel genes (e.g., nk-lysin, ubiquitin-like protein 4, granzyme a, etc.) were identified from our rna-seq result because a microarray can only evaluate the genes on a chip. our findings are preliminary based on the small scale of the study and the results have not yet been confirmed by an independent technique such as quantitative polymerase chain reaction (qpcr). in future studies, it will be necessary to confirm the expression level of genes and to characterize the function of genes that are highly involved in nnv infection. in conclusion, to the best of our knowledge, this is the first study reporting the transcriptome of brain tissue of nnv-infected sevenband grouper. in this study, we obtained the transcriptome of sevenband grouper. a total of 104,348 transcripts were obtained, including 628 degs between nnv infected and non-infected sevenband grouper. a large number of differential expressed genes relevant to immune response were identified as well as several candidate genes (ccl2, cathepsins, lectins, hsp30, and interferon-induced protein 44) that were intensely induced by nnv. their functions in sevenband grouper against nnv infection merit further study. the acquired data from such transcriptome analysis provide valuable information for future functional genes related to nnv infection and vnn outbreak. supplementary materials: the following are available online at www.mdpi.com/2073-4425/8/1/31/s1, table s1 : the annotated degs, table s2 : gene ontology of sevenband grouper transcriptome after nnv infection. 
prevalence of viral nervous necrosis (vnn) in sevenband grouper, epinephelus septemfasciatus farms betanodavirus infections of teleost fish: a review special topic review: nodaviruses as pathogens in larval and juvenile marine finfish viral and bacterial diseases of marine fish and shellfish in japanese hatcheries phylogenetic analysis of betanodaviruses isolated from cultured fish in korea comparison of the coat protein gene of nervous necrosis virus (nnv) detected from marine fishes in korea a fish nodavirus isolated from cultured sevenband grouper, epinephelus septemfasciatus transcriptome analysis of orange-spotted grouper (epinephelus coioides) spleen in response to singapore grouper iridovirus comparison of the coat protein genes of five fish nodaviruses, the causative agents of viral nervous necrosis in marine fish characterization of nodavirus and viral encephalophathy and retinopathy in farmed turbot, scophthalumus maximus (l.) using rna-seq to profile soybean seed development from fertilization to maturity rna sequencing reveals a diverse and dynamic repertoire of the xenopus tropicalis transcriptome over development transcriptome profiling reveals th17-like immune responses induced in zebrafish bath-vaccinated with a live attenuated vibrio anguillarum characterization of striped jack nervous necrosis virus subgenomic rna3 and biological activities of its encoded protein b2 de novo characterization of the spleen transcriptome of the large yellow croaker (pseudosciaena crocea) and analysis of the immune relevant genes and pathways involved in the antiviral response high-throughput sequence analysis of turbot (scophthalmus maximus) transcriptome using 454-pyrosequencing for the discovery of antiviral immune genes deep sequencing-based transcriptome profiling analysis of bacteria-challenged lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish de novo assembly and characterization of the spleen transcriptome of common carp 
(cyprinus carpio) using illumina paired-end sequencing development and application of quantitative detection method for nervous necrosis virus (nnv) isolated from sevenband grouper hyporthodus septemfasciatus. asian pac a simple method of estimating fifty percent end points fastqc a quality control tool for high throughput sequence data next-generation transcriptome assembly de novo transcript sequence reconstruction from rna-seq using the trinity platform for reference generation and analysis accelerated for clustering the next-generation sequencing data rsem: accurate transcript quantification from rna-seq data with or without a reference genome tcc: an r package for comparing tag count data with robust normalization strategies a normalization strategy for comparing tag count data chemokines and disease cloning, characterisation, and expression analysis of the cathepsin d gene from rock bream (oplegnathus fasciatus) rna-seq analysis reveals extensive transcriptional plasticity to temperature stress in a freshwater fish species transcriptome analysis and microsatellite discovery in the blunt snout bream (megalobrama amblycephala) after challenge with aeromonas hydrophila transcriptome analysis of the effect of vibrio alginolyticus infection on the innate immunity-related complement pathway in epinephelus coioides transcriptome analysis of genes responding to nnv infection in asian seabass epithelial cells innate immune response to viral infection innate immunity to virus infection pattern recognition receptors and the innate immune response to viral infection ccl2: a potential prognostic marker and target of anti-inflammatory strategy in hiv/aids pathogenesis hiv-1 tat induces monocyte chemoattractant protein-1-mediated monocyte transmigration across a model of the human blood-brain barrier and up-regulates ccr5 expression on human monocytes serum concentrations and peripheral secretion of the beta chemokines monocyte chemoattractant protein 1 and macrophage 
inflammatory protein 1α in alcoholic liver disease epstein-barr virus induces mcp-1 secretion by human monocytes via tlr2 differential chemokine expression following respiratory virus infection reflects th1-or th2-biased immunopathology modeling the early events of severe acute respiratory syndrome coronavirus infection in vitro herpes simplex virus 1 interaction with toll-like receptor 2 contributes to lethal encephalitis japanese encephalitis virus differentially modulates the induction of multiple pro-inflammatory mediators in human astrocytoma and astroglioma cell-lines sharma immunodiagnostic/protective role of cathepsin l cysteine proteinases secreted by fasciolaspecies the lysosomal cysteine proteases in mhc class ii antigen presentation cathepsin l in secretory vesicles functions as a prohormone-processing enzyme for production of the enkephalin peptide neurotransmitter endosomal proteolysis of the ebola virus glycoprotein is necessary for infection cathepsin l and cathepsin b mediate reovirus disassembly in murine fibroblast cells inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry cathepsin l is involved in proteolytic processing of the hendra virus fusion protein hepatitis c virus molecular clones and their replication capacity in vivo and in cell culture csctl1, a teleost c-type lectin that promotes antibacterial and antiviral immune defense in a manner that depends on the conserved epn motif ifi44 suppresses hiv-1 ltr promoter activity and facilitates its latency viperin, mtap44, and protein kinase r contribute to the interferon-induced inhibition of bunyamwera orthobunyavirus replication molecular chaperones in cellular protein folding the authors declare no conflict of interest. 
key: cord-273609-whm2ce4u authors: li, qingdi quentin; skinner, jeff; bennett, john e title: evaluation of reference genes for real-time quantitative pcr studies in candida glabrata following azole treatment date: 2012-06-29 journal: bmc mol biol doi: 10.1186/1471-2199-13-22 sha: doc_id: 273609 cord_uid: whm2ce4u background: the selection of stable and suitable reference genes for real-time quantitative pcr (rt-qpcr) is a crucial prerequisite for reliable gene expression analysis under different experimental conditions. the present study aimed to identify reference genes as internal controls for gene expression studies by rt-qpcr in azole-stimulated candida glabrata. results: the expression stability of 16 reference genes under fluconazole stress was evaluated using fold change and standard deviation computations with the hkgfinder tool. our data revealed that the mrna expression levels of three ribosomal rnas (rdn5.8, rdn18, and rdn25) remained stable in response to fluconazole, while pgk1, ubc7, and ubc13 mrnas showed only approximately 2.9-, 3.0-, and 2.5-fold induction by azole, respectively. by contrast, mrna levels of the other 10 reference genes (act1, ef1α, gapdh, ppia, rpl2a, rpl10, rpl13a, sdha, tub1, and ubc4) were dramatically increased in c. glabrata following antifungal treatment, exhibiting changes ranging from 4.5- to 32.7-fold. we also assessed the expression stability of these reference genes using the 2(-δδct) method and three other software packages. the stability rankings of the reference genes by genorm and the 2(-δδct) method were identical to those by hkgfinder, whereas the stability rankings by bestkeeper and normfinder were notably different. we then validated the suitability of six candidate reference genes (act1, pgk1, rdn5.8, rdn18, ubc7, and ubc13) as internal controls for ten target genes in this system using the comparative c(t) method. 
our validation experiments passed for all six reference genes analyzed except rdn18, where the amplification efficiency of rdn18 was different from that of the ten target genes. finally, we demonstrated that the relative quantification of target gene expression varied according to the endogenous control used, highlighting the importance of the choice of internal controls in such experiments. conclusions: we recommend the use of rdn5.8, ubc13, and pgk1 alone or the combination of rdn5.8 plus ubc13 or pgk1 as reference genes for rt-qpcr analysis of gene expression in c. glabrata following azole treatment. in contrast, we show that act1 and other commonly used reference genes (gapdh, ppia, rpl13a, tub1, etc.) were not validated as good internal controls in the current model. the investigation of gene expression has become increasingly prevalent in numerous animal, human, microorganism, and plant studies [1] [2] [3] [4] [5] . the quantitation of gene expression requires sensitive, precise, and reproducible measurements for specific mrna sequences. generally, gene expression levels can be determined by a variety of techniques, including northern blotting, rnase protection assay, semi-quantitative reverse-transcription pcr, and real-time quantitative pcr (rt-qpcr) [4] . rt-qpcr has gained favor as it is a highly sensitive, accurate, and fast technique that offers high throughput and the ability to detect low-abundance mrnas [6] and quantify mrna copy number [7] . thus, rt-qpcr has been used for countless different applications [1] [2] [3] [4] [5] . one of the main uses of rt-qpcr, when coupled with reverse transcription, is to measure gene expression at the mrna level in various biological samples. 
however, there is substantial technical variability associated with rt-qpcr, arising from inherent differences in samples, sample collection, rna degradation and extraction efficiency, quantity and quality of input rna, reverse transcription and pcr efficiency, and pipetting accuracy or error. researchers have employed a number of strategies to normalize their data, including normalization to (i) genomic dna, (ii) total rna, (iii) an external standard, and (iv) a reference gene. the most common practice is to normalize to an internal control gene termed a reference gene. a reference gene is subject to the same errors in cdna preparation as the gene of interest, making it an excellent normalizing control. however, selection of an inappropriate reference gene can add large unpredictable error to the analysis and result in incorrect estimates [8] . the ideal reference gene should have a stable rna transcription level under different experimental conditions and be sufficiently abundant across different tissues and cell types. however, it has become apparent that such an ideal reference gene has not yet been identified [9] . the most commonly used reference genes, including β-actin, cyclophilin, gapdh, tubulin, and 18s and 28s ribosomal rnas, have shown variable expression levels in different cells and tissues under different conditions, and therefore they are unsuitable for normalization purposes owing to large measurement error [6, . hence, it is no longer acceptable to arbitrarily select any reference gene for normalization; it must be demonstrated that the reference gene of choice is suitable for the experiment in question. in recent decades, candida glabrata has emerged as the second most common cause of invasive fungal infection [1, 34, 35] . azoles such as fluconazole are the first-line drugs for the treatment of fungal infections caused by c. glabrata. however, resistance to azoles can arise rapidly in c. glabrata during treatment of patients with azoles [36] . 
an increasing body of evidence has implicated atp-binding cassette transporters (e.g., cdr1 and pdr1) and sterol biosynthetic enzymes (e.g., erg3 and erg11) in azole resistance in c. glabrata in both clinical and laboratory settings [1, [34] [35] [36] [37] . the expression of these genes in c. glabrata in response to azoles is not completely understood. therefore, we set out to establish an in vitro model for investigating azole-inducible gene expression in c. glabrata, using rt-qpcr. for reliable gene expression analysis, a compulsory step is the selection of good reference genes for normalization; however, no validated reference genes have been reported for the relative quantification of the mrna expression profile in c. glabrata following exposure to azoles. we have been using act1 as the internal control for gene expression analysis by rt-qpcr in clinical isolates of c. glabrata in the absence of drug challenges [1] . other researchers also use act1 as the reference gene for azole-inducible gene expression studies by slot blotting in candida species [34, 37, 38] . however, the suitability of act1 in studies of azole-inducible gene expression in c. glabrata has not been validated. in this work, we evaluated 16 reference genes to establish their suitability as control genes for normalization and identified a set of genes that are suitable for quantitative gene expression analysis by rt-qpcr in c. glabrata following fluconazole treatment. all five c. glabrata strains (table 1 ) used in the present study were grown in ypd broth (difco laboratories, the susceptibility of each c. glabrata strain to fluconazole was determined on ypd agar medium using an e-test (ab biodisk, solna, sweden) according to the manufacturer's instructions (table 1) . total rna was extracted from c. glabrata logarithmicphase cultures grown in ypd broth, using trizol reagent (invitrogen, life technologies, grand island, ny, usa) according to the manufacturer's instructions. 
the concentration and purity of the rna were determined using a uv spectrophotometer (nanodrop 2000c; thermofisher scientific, waltham, ma, usa) by measuring the absorbance at 230 (od 230 ), 260 (od 260 ) and 280 nm (od 280 ). the od 260nm /od 280nm of the samples, reflecting the average purity, ranged from 1.80 to 2.05, and the od 260nm /od 230nm was in the range of 2.00-2.60. the integrity of the rna was further checked in a selected subset of samples by electrophoresis through 1% denaturing and non-denaturing agarose gels. reverse transcription (rt) was performed on 1 μg of total rna using a commercially available kit. prior to rt, the total rna samples were treated with dnase for 30 min at 37°c (turbo dna-free; ambion, life technologies, grand island, ny, usa) according to the manufacturer's instructions. rna was converted to cdna using the high capacity cdna reverse transcription kit (applied biosystems, life technologies, carlsbad, ca, usa). the reaction took place in a thermal cycler (t3 thermocycler; biometra, goettingen, germany) with a single cycle and incubation periods of 25°c for 10 min, 37°c for 120 min, and 85°c for 5 min. all investigated samples were transcribed with the same reverse transcription reaction conditions. negative controls, which were run simultaneously, contained either no rna (no template control) or no reverse transcriptase (rt negative control), to control for rna and genomic dna contamination, respectively. primers and probes were designed in our laboratory using the primer analysis software primer express 3.0 (applied biosystems). taqman probes were synthesized by applied biosystems, and primers were synthesized by invitrogen/life technologies. the primers and taqman probes used in the current study were selected to bind specifically to the cdnas of cg84u and other c. glabrata strains ( table 1 ). 
the sequences of taqman probes and forward and reverse primers, the gene numbers, and the localization for each pcr assay for the 16 reference genes and 10 target genes assessed in this study are listed in additional file 1, additional file 2, and additional file 3. rt-qpcr for reference gene rna transcription was performed by sybr green chemistry (sybr green pcr master mix; applied biosystems). the increase in fluorescence of the sybr green dye was monitored using a 7500 real-time pcr system (applied biosystems). this technique has been successfully used to validate reference gene expression levels in yeast and other cell types [41] . primers were used at 300 nm each for specific forward and reverse primers and cdna at 25 ng in 25-μl reactions. primer sets for the reference genes (additional file 2) were used to amplify the open reading frame (orf) region of the genes according to the following conditions: one cycle of 50°c × 2 min, 95°c × 10 min; followed by 40 cycles of 95°c × 15 s, 60°c × 1 min; with dissociation (a melting curve) during the last cycle of 95°c × 15 s, 60°c × 1 min, 95°c × 15 s. the dissociation protocol to determine the melting curve from 60°c to 95°c for each pcr product was added after thermocycling to verify that each primer pair produced only a single product. all samples gave only a single peak, indicating a single pure product and no primer-dimer formation. real-time pcr efficiencies were acquired by amplification of a standardized dilution series of the template cdna and were determined for each gene as the slope of a linear regression model. pcr efficiency was determined by measuring the c t to a specific threshold for a serial dilution of cdna. the corresponding real-time pcr efficiencies were then calculated according to the equation: $E = (10^{-1/\text{slope}} - 1) \times 100$. all pcrs displayed efficiencies between 94% and 119%. 
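the efficiency calculation described above can be illustrated with a minimal sketch. it assumes the slope comes from an ordinary least-squares regression of c t on log 10 input cdna; the function names and the dilution values are hypothetical and are not data from this study.

```python
import math

def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def pcr_efficiency_percent(cdna_amounts, ct_values):
    """E = (10**(-1/slope) - 1) * 100, slope taken from Ct vs log10(input)."""
    log_input = [math.log10(q) for q in cdna_amounts]
    slope = linear_slope(log_input, ct_values)
    return (10 ** (-1.0 / slope) - 1.0) * 100.0

# Hypothetical 10-fold dilution series (illustrative values only).
dilution_ng = [100.0, 10.0, 1.0, 0.1]
ct = [18.0, 21.32, 24.64, 27.96]  # slope of about -3.32 cycles per decade
efficiency = pcr_efficiency_percent(dilution_ng, ct)
```

a slope near −3.32 corresponds to a doubling of product per cycle, that is, an efficiency close to 100%.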
to study target gene expression, the amplification was detected in real time using taqman chemistry (taqman universal pcr master mix; applied biosystems) according to the manufacturer's instructions. rt-qpcr was performed in 96-well microtiter plates with a final volume of 25 μl, using a 7500 real-time pcr system (applied biosystems). primers were used at 300 nm each for specific forward and reverse primers; probes, at 200 nm; and cdna, at 25 ng in 25-μl reactions. primer sets and taqman probes for the target genes (additional file 3) were used to amplify the orf region of the genes under the following conditions: one cycle of 50°c × 2 min, 95°c × 10 min; and then 40 cycles of 95°c × 15 s, 60°c × 1 min. the parallel amplification between the reference genes and the target genes was confirmed for each probe-primer set. to minimize technical (run-to-run) variation between the samples, all samples were analyzed in the same run for both target genes and reference genes.
evaluation of reference gene expression stability using four different software packages
non-normalized gene expression levels from our experimental data were analyzed to evaluate the expression stability of potential reference genes, using four different software programs: hkgfinder [42] , genorm [9, 43] , bestkeeper [44] , and normfinder [26] . the hkgfinder software computes the pooled standard deviation (sd) of non-normalized expression data from both phenotypes (i.e., azole-treated and untreated c. glabrata cells), the fold change (fc) values between the two phenotypes, and student's t-tests of the log 2 fold-change values with holm-adjusted p-values. the reference genes with the smallest sd and the smallest, non-significant fc are identified as the best potential reference genes (additional file 4). 
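the quantities that hkgfinder is described as reporting can be sketched as follows. this is not the hkgfinder implementation: it assumes log 2 -scale expression values, reports the fold change as a value of at least 1, takes the t-test p-values as given, and uses hypothetical gene names and numbers throughout.

```python
import statistics

def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two groups (sample variances, n - 1)."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5

def fold_change(treated_log2, untreated_log2):
    """Fold change between group means of log2-scale data, reported as >= 1."""
    diff = statistics.mean(treated_log2) - statistics.mean(untreated_log2)
    return 2.0 ** abs(diff)

def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical log2 expression values: (treated, untreated) per gene.
data = {
    "gene_a": ([10.0, 10.2, 9.8], [10.1, 9.9, 10.0]),
    "gene_b": ([14.0, 14.5, 13.5], [10.0, 10.4, 9.6]),
}
summary = {g: (pooled_sd(t, u), fold_change(t, u)) for g, (t, u) in data.items()}
# Rank by smallest SD first, then smallest fold change.
ranking = sorted(summary, key=lambda g: summary[g])
```

in this toy example gene_a ranks first because both its pooled sd and its fold change are smaller than those of gene_b.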
the genorm software computes a stability value (m) and a pairwise variation (v), which are used to evaluate each individual reference gene candidate or each combination of reference genes, called a normalization factor (nf). the pairwise variability v of two genes j and k is the standard deviation of all log 2 ratios of a j /a k , while the stability value m of gene j is the mean of all possible pairwise variations v jk . graphs of the m values help identify the best individual reference genes, and graphs of the v values identify the optimal number of reference genes for an nf. note that an earlier version called genorm excel was produced as an add-in for ms excel and it required several hand calculations to convert crossing point (cp) values into relative expression values. that version is now unavailable and the new genorm plus from biogazelle does not require those hand calculations. the bestkeeper software uses pairwise correlation to determine whether potential reference genes should be included in a bestkeeper index, which is simply the geometric mean of the cp or cycle threshold (c t ) values. the normfinder software computes a different type of stability value (ρ ig ) based on the intragroup and intergroup variation of the expression data. the software instructions from each package were followed when inputting the rt-qpcr data, fetching the output, and interpreting the analysis results. stability of rna transcription of reference genes in c. glabrata following azole stimulation in the present study, 16 reference genes were chosen from among commonly used reference genes in published studies with yeast and mammalian cells, paying close attention to selecting genes that belong to different functional classes; their full names, symbols, functions, and gene numbers are listed in additional file 1. our aim was to identify reference genes with minimal variability under our experimental conditions. 
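the genorm stability measure described above (v as the standard deviation of log 2 expression ratios, m as the mean of a gene's pairwise variations) can be sketched as below. this is an illustration rather than the genorm code; the use of the sample standard deviation, the gene names, and the expression values are assumptions.

```python
import math
import statistics

def pairwise_variation(expr_j, expr_k):
    """V_jk: standard deviation of log2(a_j / a_k) across samples."""
    log_ratios = [math.log2(a / b) for a, b in zip(expr_j, expr_k)]
    return statistics.stdev(log_ratios)

def genorm_m_values(expression):
    """M_j: mean of V_jk over all other genes k.

    expression maps a gene name to its relative expression values,
    one value per sample, with samples in the same order for every gene.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        variations = [
            pairwise_variation(expression[j], expression[k])
            for k in genes
            if k != j
        ]
        m_values[j] = statistics.mean(variations)
    return m_values

# Hypothetical relative expression values (illustrative only).
expression = {
    "ref_a": [1.0, 2.0, 4.0, 8.0],
    "ref_b": [2.0, 4.0, 8.0, 16.0],  # constant ratio to ref_a, so V = 0
    "ref_c": [1.0, 1.0, 8.0, 2.0],   # varies independently
}
m_values = genorm_m_values(expression)
most_stable = min(m_values, key=m_values.get)
```

a lower m indicates a more stable candidate; here ref_c receives the highest m because its ratio to the other two genes fluctuates across samples.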
to this end, rt-qpcr was used to measure the rna transcription levels of 16 reference genes in c. glabrata cells following fluconazole treatment. to compare the different rna transcription levels after azole exposure, the c t values of the reference genes were directly compared between the drug-treated (t) and untreated (ut) samples by computing the c t change, i.e., the difference between c t (t) and c t (ut). the c t is defined as the number of cycles needed for the fluorescence signal to reach a specific threshold level of detection and is inversely correlated with the amount of template cdna present in the reaction. thus, a higher value of c t change indicates lower stability of a reference gene, considering that the expression of a reference gene should not change significantly with azole treatment. as expected, the rna transcription levels of the reference genes varied ( table 2 ). the three ribosomal rna subunits rdn5.8, rdn18, and rdn25 were the most stable reference genes, with c t change values less than 0.5, while ubc13, pgk1, and ubc7 were relatively stable with c t change values of only around 1.5. by contrast, the other 10 reference genes showed marked variation in response to fluconazole. among them, the most prominent variation was found in the rna transcription levels of sdha, act1, and rpl13a; as seen in table 2 , the c t change values of these reference genes were as high as 5.03. to validate the stability of candidate reference gene rna transcription under our experimental conditions, the levels were compared with the rdn5.8 rna transcription level. we chose to use rdn5.8 as a normalizer because it meets the requirement for both stability and suitability as a reference. 
first, we calculated the δc t between the c t values of each reference gene and rdn5.8 in fluconazole-treated (t) and untreated (ut) cells: $\Delta C_t(T) = C_t(\text{gene}, T) - C_t(\text{rdn5.8}, T)$ and $\Delta C_t(UT) = C_t(\text{gene}, UT) - C_t(\text{rdn5.8}, UT)$. in the second step, we subtracted the change in rna transcription in untreated samples from the change in treated samples to obtain the δδc t (t): $\Delta\Delta C_t(T) = \Delta C_t(T) - \Delta C_t(UT)$. thus, δδc t (t) indicates the change in rna transcription caused by fluconazole treatment after normalization to rna transcription changes in rdn5.8. a high δδc t (t) value indicates a significant fluconazole-related change in the rna transcription level of the tested gene. a positive δδc t (t) value indicates down-regulation of transcription, whereas a negative δδc t (t) indicates upregulation of a gene's transcription following azole treatment. we then transformed δδc t (t) into a 2 -δδct value, which indicates the fold change in rna transcription of a reference gene in response to fluconazole as compared with the level in untreated cells. the calculated δδc t (t) and 2 -δδct values of the 16 tested reference genes in drug-treated samples are given in table 2 . following stimulation with fluconazole, the rna transcription of sdha, act1, and rpl13a was highly regulated in c. glabrata cells, with changes ranging from 13- to 23-fold compared with transcription in the untreated cells. there was almost no regulation of rdn5.8, rdn18, and rdn25 rna transcription, while pgk1, ubc7, and ubc13 rna transcription showed only approximately 2-fold induction in response to drug treatment ( table 2) . to choose the best reference genes, the reference gene stability was evaluated using four different software packages: hkgfinder, genorm, bestkeeper, and normfinder. each of these software packages uses a slightly different metric to evaluate the candidate reference genes. our goal was to compare the findings from these four different methods and look for the best-scoring reference genes that might be common to these different methods. 
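the two-step normalization to rdn5.8 and the 2 -δδct transformation described above can be written as a short sketch. the function names and the c t values are hypothetical; only the arithmetic follows the text.

```python
def delta_ct(ct_gene, ct_normalizer):
    """Delta Ct: Ct of the tested gene minus Ct of the normalizer."""
    return ct_gene - ct_normalizer

def delta_delta_ct(ct_gene_t, ct_norm_t, ct_gene_ut, ct_norm_ut):
    """Delta-delta Ct: Delta Ct in treated cells minus Delta Ct in untreated cells."""
    return delta_ct(ct_gene_t, ct_norm_t) - delta_ct(ct_gene_ut, ct_norm_ut)

def fold_change_from_ddct(ddct):
    """Fold change relative to untreated cells, 2 ** (-ddCt)."""
    return 2.0 ** (-ddct)

# Hypothetical Ct values (illustrative only): the tested gene reaches
# threshold 4 cycles earlier after treatment; the normalizer is unchanged.
ddct = delta_delta_ct(ct_gene_t=20.0, ct_norm_t=12.0,
                      ct_gene_ut=24.0, ct_norm_ut=12.0)
fold = fold_change_from_ddct(ddct)  # negative ddCt, so fold > 1 (induction)
```

with these numbers δδc t is −4, which corresponds to a 16-fold induction; a positive δδc t would give a value below 1, that is, down-regulation.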
the hkgfinder software identifies the best reference genes by ranking the candidate genes according to their sd and fc values ( table 3 ). among the 16 potential reference genes, the sds ranged from 0.19 to 2.76, and the fcs ranged from 1.2 to 32.7. the best three reference gene candidates were rdn18, rdn25, and rdn5.8. the next three best candidate reference genes, which also had reasonable sd and fc values, were ubc13, pgk1 and ubc7. the genorm software evaluates reference genes by their m-stability values and v-pairwise variability values. low m values represent more stable expression and thus the most suitable reference genes ( figure 1 ). the genorm analysis identified rdn18, rdn25, and rdn5.8 as the three most stable genes; ubc13, pgk1, and ubc7 as relatively stable genes; and rpl13a, act1, and sdha as the three least stable genes under fluconazole treatment in c. glabrata. interestingly, the ranking of expression stability of the 16 reference genes was identical between the genorm program and the hkgfinder tool ( figure 1 ; tables 3 and 4 ). the genorm program also estimates the optimal number of reference genes that could be used in combination as an nf value ( figure 2 ). each nf was calculated as the geometric mean of the two most stable genes, then the pairwise variability v was computed between nfn and nfn + 1 for n = 2, . . ., 15. vandesompele et al. [9] proposed 0.15 as a cutoff value for v below which additional reference genes do not need to be added to the nf. adding the third gene to the most stable two reference genes, rdn18 and the fold change shown in the table represents the difference in reference gene expression between azole-treated and untreated c. glabrata without normalization to an internal control gene. the stability ranking is based on the values of standard deviation (sd), log fold change, and fold change of reference genes in fluconazole-treated c. glabrata cells. 
the best reference genes will have the smallest sd and smallest fold-change values. determination of the optimal number of reference genes as internal references for normalization using genorm analysis. the genorm program calculates a normalization factor from at least two reference genes and the mean pairwise variation (v) between every combination of sequential normalization factors in order to determine the minimum number of reference genes required for accurate normalization in the samples. for example, v5/ 6 represents the comparison of the normalization factors from five and six reference genes, respectively. on the left-most side is the pairwise variation when the number of reference genes is increased from two to three (v2/3). stepwise inclusion of less stable genes generates the subsequent data points. a decrease in the v value indicates a positive effect and means that the added gene should preferably be included for calculation of a reliable normalization factor. the cutoff value for v, below which the inclusion of an additional reference gene does not result in a significant improvement of normalization, was set at 0.15. it was apparent from the analysis of all studied samples that the combination of the two most stable reference genes is the best option and the combination of the five most stable reference genes is the second-best option for accurate normalization. the normfinder program was also used to rate candidate reference gene stability according to a stability value computed from the intragroup and intergroup expression variability. the least reliable reference genes identified by this program were rpl13a, act1, and sdha, which were identical to the worst reference genes identified by genorm and hkgfinder analyses (table 4) . however, the ranking order of the most stable genes and the relatively stable genes by the normfinder program was different from that generated by genorm and hkgfinder (figure 1; tables 3 and 4) . 
the genorm and hkgfinder analyses graded rdn18, rdn25, and rdn5.8 as the most stable reference genes, followed by ubc13, pgk1, and ubc7 based on gene expression stability, whereas normfinder rated ubc7, pgk1, and ubc13 as the most stable reference genes, followed by rdn5.8, rdn18, and rdn25 (figure 1; tables 3 and 4). finally, the bestkeeper program was used to grade candidate reference gene stability. this approach permits a comparative analysis across reference genes. ten reference genes analyzed were correlated and were combined into an index. subsequently, the correlation between each reference gene and the index was calculated. the best correlations between the reference genes and the bestkeeper index were obtained for ubc7, ubc13, and pgk1 (r = 0.983, 0.978, and 0.974, respectively; table 4). the rankings of the top three and the last three reference genes identified by the bestkeeper program were the same as those generated by the normfinder analysis, although the order of stability of the other reference genes differed slightly between the two programs (table 4). following the identification of the most stable reference genes from the full gene panel of 16 genes, the comparative ct method was used to validate their suitability. the comparative ct method, also referred to as the δδct method, is a relative quantitation of gene expression between a specific target gene and a reference gene. for the comparative ct method to be valid, the efficiency of the target amplification and the efficiency of the reference (internal control) amplification must be approximately equal, and this must be determined in a validation experiment. to this end, we first determined the amplification efficiency of 10 target genes (cdr1, pdh1, pdr1, snq2, yor1, erg2, erg3, erg4, erg10, and erg11) and six reference genes (act1, pgk1, rdn5.8, rdn18, ubc7, and ubc13). standard curves were generated by plotting the dilutions of the cdna of each gene against the ct values.
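the bestkeeper step described above (combining the candidate genes into an index and correlating each gene with it) can be illustrated with a short sketch. it assumes that the index is the per-sample geometric mean of the ct values and that the correlation is pearson's r; the function name and the ct values are hypothetical, and the excel-based tool performs additional checks that are not reproduced here.

```python
import numpy as np

def bestkeeper_r(ct):
    """Pearson r between each gene's Ct values and the per-sample index.

    ct: array of shape (samples, genes) with Ct values (assumed layout).
    """
    ct = np.asarray(ct, dtype=float)
    # index = geometric mean of the candidate genes' Ct values in each sample
    index = np.exp(np.log(ct).mean(axis=1))
    return np.array([np.corrcoef(ct[:, g], index)[0, 1]
                     for g in range(ct.shape[1])])

# hypothetical Ct values: 5 samples x 3 genes; gene 2 does not track the others
ct = np.array([
    [20.0, 18.1, 25.0],
    [21.0, 18.9, 24.0],
    [22.0, 20.2, 26.0],
    [23.0, 20.8, 24.5],
    [24.0, 22.0, 25.5],
])
r_values = bestkeeper_r(ct)
```

genes whose ct values follow the index closely obtain r values near 1, whereas a gene that varies independently of the others obtains a clearly lower r.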
the linear correlation coefficient (r²) for all 10 target genes and the six reference genes ranged from 0.98 to 1.0. based on the slopes of the standard curves, the amplification efficiencies of the cdna standards, derived from the formula $E = (10^{-1/\text{slope}} - 1) \times 100$, ranged from 94 to 119%. the ct values of all 16 genes in the samples were within the range of the standard curves. next, the δct ($\Delta C_t = C_{t,\text{target}} - C_{t,\text{reference}}$) was calculated using the ct values generated from standard curve mass points (target vs. reference gene). these δct values were then plotted versus log10 input amount of cdna to create a semi-log regression line. the slope of the resulting semi-log regression line was used as a general criterion for passing a validation experiment. in a validation experiment that passes, the absolute value of the slope of δct versus log10 input cdna would be <0.1, meaning the two ct versus log10 concentration curves are nearly parallel. as seen in table 5, our validation experiments passed for all reference genes analyzed except rdn18, which had an absolute value >0.1 for the slopes of δct versus log10 input cdna for all 10 target genes evaluated. thus, the amplification efficiency of rdn18 was clearly different from that of the ten target genes, whereas the other five reference genes (act1, pgk1, rdn5.8, ubc7, and ubc13) had pcr efficiencies that were similar or relatively equivalent to the target amplification efficiencies (table 5).
each value is the slope m of the line (y = mx + b) of the validation experiment and reflects the correlation of reference gene and target gene amplification efficiencies. an absolute slope value <0.1 is generally used as a criterion for passing a validation experiment, as it indicates that the amplification efficiency is approximately equal between the reference and target genes. the slope m = 0 indicates that the efficiencies of the two pcr reactions are equal.
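the efficiency formula and the validation criterion described above can be written out as a small sketch. the function names and the dilution-series ct values are hypothetical and serve only to show the arithmetic; a least-squares line fit is assumed for both the standard curve and the δct regression.

```python
import numpy as np

def fit_slope(log10_input, y):
    """Slope of the least-squares line of y against log10 input cDNA."""
    slope, _intercept = np.polyfit(log10_input, y, 1)
    return float(slope)

def efficiency_percent(slope):
    """E = (10^(-1/slope) - 1) x 100, with slope from the Ct standard curve."""
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0

def validation_slope(log10_input, ct_target, ct_reference):
    """Slope of delta-Ct (target minus reference) versus log10 input cDNA."""
    delta_ct = np.asarray(ct_target) - np.asarray(ct_reference)
    return fit_slope(log10_input, delta_ct)

# hypothetical ten-fold dilution series
log10_input = np.array([0.0, -1.0, -2.0, -3.0, -4.0])
ct_target = 20.0 - 3.32 * log10_input     # standard curve slope of -3.32
ct_ref_ok = 12.0 - 3.40 * log10_input     # nearly parallel to the target
ct_ref_bad = 8.0 - 3.60 * log10_input     # clearly different slope

slope_ok = validation_slope(log10_input, ct_target, ct_ref_ok)
slope_bad = validation_slope(log10_input, ct_target, ct_ref_bad)
passes_ok = abs(slope_ok) < 0.1
passes_bad = abs(slope_bad) < 0.1
```

a standard curve slope of about -3.32 corresponds to an efficiency of about 100%, that is, a doubling of product in every cycle. in this example the first reference passes the validation criterion and the second does not.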
to test the effect of azole on the expression of pleiotropic drug resistance genes in c. glabrata, we assessed the fluconazole-induced expression of two abc genes (cdr1 and pdr1) and one erg gene (erg4) in five c. glabrata strains, including the pdr1 mutant strain cgb4 (table 1). for comparison, we used both rdn5.8 and act1 as references for normalization. as shown in table 6, fluconazole markedly induced increases in erg4 mrna levels in all c. glabrata strains examined when normalized to rdn5.8. fluconazole also significantly increased cdr1 and pdr1 mrna expression in all of the strains except cgb4, consistent with the critical role of pdr1 in azole-induced transactivation of abc transporter gene expression, but not ergosterol biosynthesis gene expression, in c. glabrata. in contrast, when using act1 as the reference gene for quantification, fluconazole appeared to down-regulate the expression of all three target genes in the five c. glabrata strains (table 6). finally, we compared the fluconazole-inducible mrna expression levels of four target genes (cdr1, pdr1, erg4, and erg10) in c. glabrata (cg84u strain) after normalizing to different reference genes (act1, pgk1, rdn5.8, and ubc13), individually and in pairs. differences in quantitation were detected according to the reference genes used. as seen in table 7, normalization of the rt-qpcr data against the reference genes suggested as optimal by the four software packages (hkgfinder, genorm, bestkeeper, and normfinder) or the $2^{-\Delta\Delta C_t}$ method gave comparable relative expression levels of the target genes under fluconazole treatment in c. glabrata. however, normalization against act1 resulted in relative expression levels of the targets that were substantially different from those normalized using other reference genes, implying that act1 is not a suitable reference gene for these studies.
taken together with the data shown above, these results demonstrate that the relative quantification of azole-inducible gene expression varies largely depending on the reference gene and the number of reference genes used for normalization. this highlights the importance of choosing a suitable reference gene or reference gene pair when using rt-qpcr to determine the level of target gene expression in this model system.
in any gene expression study, the selection of a valid normalization or internal control gene to correct for differences in rna sampling is critical in order to avoid misinterpretation of results and to obtain reliable conclusions. when choosing a reference gene as the internal endogenous control for gene expression studies by rt-qpcr, two important criteria must be met. the expression of the reference gene must remain stable throughout the given intervention (i.e., stability), and the amplification efficiency of the reference gene should be similar to that of the genes of interest (i.e., suitability). in the present study, we used five different methods to evaluate 16 reference genes for potential use as internal controls and found that the reference genes performed differently in terms of stability and suitability in c. glabrata cells upon exposure to fluconazole. to our knowledge, this is the first report to validate reference genes as rna internal references in c. glabrata. the poor performance of act1 in c. glabrata cells was surprising, given that this gene has been used frequently as the reference gene in earlier gene expression studies [1, 34, 37, 38]. our data clearly demonstrate the unsuitability of act1 as an internal control for gene expression studies in c. glabrata following fluconazole treatment. the initial results gained from using act1 as the internal control suggested that target gene expression was not up-regulated (tables 6 and 7). in fact, the only substantial change caused by azole treatment was a greater increase in act1 rna transcription compared with target gene transcription. while these findings are relevant to our specific study, it appears that numerous other studies have also shown the potential of act1 to detrimentally affect the accuracy of results [11, 12, 15, 19, 26, 27, 29, 30, 32, 33, 45-47]. we have been using act1 as the internal control for quantitation of gene expression by rt-qpcr in clinical isolates of c. glabrata, and we find that this gene works well as the reference in cells without azole or other agent stimulation [1].
values indicate the fold change in rna transcription for each target gene in fluconazole-treated c. glabrata, as compared with untreated cells. $\Delta C_t = C_{t,\text{target}} - C_{t,\text{reference}}$; δct of each target gene was calculated by using rdn5.8 or act1 as the reference. $\Delta\Delta C_t(\text{t}) = \Delta C_t(\text{t}) - \Delta C_t(\text{ut})$; ut, untreated; t, fluconazole-treated. fold change = $2^{-\Delta\Delta C_t(\text{t})}$. fold changes of the target genes were also computed with the hkgfinder tool by using rdn5.8 or act1 as the reference. # p > 0.05 and *p < 0.05 for the azole-treated group vs. the untreated group after normalization to rdn5.8 or act1.
*values indicate the fold change in rna transcription for each target gene in fluconazole-treated c. glabrata after normalization to one or more reference genes, as compared with untreated cells. $\Delta\Delta C_t(\text{t}) = \Delta C_t(\text{t}) - \Delta C_t(\text{ut})$; ut, untreated; t, fluconazole-treated. fold change = $2^{-\Delta\Delta C_t(\text{t})}$. fold changes of the target genes were also computed with the hkgfinder tool by using one or more reference genes as the internal control for normalization. p < 0.05 vs. untreated cells, for all values shown in the table after normalizing to one or more reference genes as indicated.
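the δδct calculation defined in the table notes, and the way an induced reference gene can mask up-regulation of a target, can be illustrated with a short sketch. the ct values below are hypothetical and are not taken from tables 6 or 7; the function name is an assumption, and equal amplification efficiencies of 100% are assumed, as the $2^{-\Delta\Delta C_t}$ method requires.

```python
def fold_change(ct_target_t, ct_ref_t, ct_target_ut, ct_ref_ut):
    """Fold change = 2^-(delta-delta-Ct); t = treated, ut = untreated."""
    delta_ct_t = ct_target_t - ct_ref_t
    delta_ct_ut = ct_target_ut - ct_ref_ut
    return 2.0 ** -(delta_ct_t - delta_ct_ut)

# hypothetical target: Ct drops from 25 to 23 on treatment (4-fold more template)
stable_ref = fold_change(23.0, 15.0, 25.0, 15.0)    # reference Ct unchanged
induced_ref = fold_change(23.0, 17.0, 25.0, 20.0)   # reference Ct drops by 3 cycles
```

with the stable reference the calculation returns a 4-fold induction. with a reference that is itself induced 8-fold, the same target data return 0.5, which would be read as down-regulation.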
however, edlind and colleagues used act1 as the reference gene for azole-inducible gene expression studies by slot blotting in candida species, and their data clearly show the variation of act1 expression in response to azoles in their systems [34, 38]. thus, our data revealing the instability of act1 in c. glabrata following antifungal treatment, combined with evidence from mammalian and other fungal studies, add to the growing body of evidence that act1 expression is unstable across various cell types and under different experimental conditions [11, 12, 15, 26, 27, 29, 30, 32, 33, 45-47]. although act1 gene expression was variable in response to fluconazole, the three ribosomal rnas (i.e., rdn5.8, rdn18, and rdn25) remained unaffected and showed stable expression in azole-treated c. glabrata. these results indicate that ribosomal rna expression offers superior consistency compared with the expression of act1 and the other reference genes assessed. the stable expression levels of 18s and 28s rrnas relative to other reference genes under a variety of experimental conditions have previously been described for numerous systems, including both mammalian and yeast cells [10, 11, 15, 18, 23, 30, 48-50]. the levels of ribosomal rna, which represents 80% of total rna, are thought to be less likely to vary under conditions that affect the expression of mrnas because they are transcribed by a distinct rna polymerase. as an example, thellin et al. and other groups have recommended the use of 18s or 28s rrna as an internal control for mrna quantification studies because mrna variations are weak and cannot highly modify the total rna level [10, 11, 15, 18, 30, 39, 48-52]. however, our further validation experiments showed that the slopes of δct versus log10 cdna were sufficiently parallel between rdn5.8 and the target genes, but not between rdn18 or rdn25 and the target genes.
with rdn18 and rdn25, the absolute slope values of the δct versus log10 input cdna lines were >0.1 for all target genes. these data indicate that the amplification efficiency of rdn5.8 was similar to the efficiencies of the target genes, whereas the amplification efficiencies of rdn18 and rdn25 were different from the target gene amplification efficiencies. this may be attributable to the much higher abundance of rdn18 and rdn25, relative to both rdn5.8 and the target mrna transcripts, which makes it difficult to accurately subtract the baseline value in rt-qpcr data analysis. therefore, although all three ribosomal rna subunits were stable during fluconazole stimulation, only rdn5.8 may offer a more accurate and suitable alternative to act1 as an internal control for gene expression studies in c. glabrata. gapdh, a glycolytic enzyme, is encoded by a single gene and has the advantage of being highly conserved across different species [53, 54]. like 18s rrna and β-actin, gapdh has been commonly used as an internal control, often without testing. in the present study, gapdh showed much higher variability than any of the ribosomal rnas in fluconazole-treated samples. these data demonstrate that gapdh is not an appropriate control gene for these studies, as has been pointed out in previous examples, and that it may lead to incorrect results under specific experimental conditions [12-15, 19, 29-31, 33, 55]. previous studies have indicated the instability of gapdh in mammalian systems, and this study broadens the scope of this phenomenon to c. glabrata as well. pgk1 also plays important roles in the glycolytic pathway, and pgk1 and gapdh are potentially co-regulated [56]. in our data, however, their potential co-regulation was not significant. pgk1 mrna levels remained relatively stable, in contrast to the marked variation in gapdh mrna levels in c. glabrata cells, following fluconazole challenge.
moreover, our comparative ct calculations showed that the efficiency of pgk1 amplification was approximately equal to the efficiencies of the target gene amplifications. although pgk1 shows some variation as a reference gene, this may not affect experimental results as long as the intergroup difference being measured is greater than the reference gene variation. for example, a reference gene rna that has an error of 1 log2 may not be ideal, but it would be sufficient to measure a 2 log2 change in a gene of interest. thus, it is inferred that pgk1 may be a suitable reference gene for the analysis of expression for genes with higher azole-inducible mrna levels, such as erg4 and erg10. ubiquitin is a small regulatory protein that has been found in almost all tissues of eukaryotic organisms. the ubc gene codes for a polyubiquitin precursor protein [57]. due to its ubiquitous existence in different tissues and cells in eukaryotes, there are an increasing number of studies in the literature using the ubc gene as the internal standard for gene expression analysis in different eukaryotic cell systems [26, 29]. out of curiosity, we validated three ubc genes (ubc4, ubc7, and ubc13) in this study. interestingly, we found that ubc7 and ubc13 mrnas (particularly the latter) were relatively stable in c. glabrata during fluconazole treatment. in addition, our validation experiments demonstrated that the amplification efficiencies of these genes were approximately equal to those of the target genes. these findings indicate that like pgk1, ubc13 and ubc7 may also be suitable internal controls for quantifying the expression of specific genes with higher azole-inducible mrna levels in c. glabrata, such as some ergosterol biosynthesis genes. to successfully select reference genes for our studies involving azole treatment, we also investigated seven other reference genes, in addition to the reference genes mentioned above, with a diversity of functions.
these reference genes can be generally classified into several groups: transcription-related genes (ef1α), structure/cytoskeleton-related genes (tub1), protein synthesis-related genes (rpl2a, rpl10, and rpl13a), and finally, genes that cannot be clearly categorized, including ppia and sdha. potential reference genes such as ppia, rpl13a, and tub1 are other examples of commonly used internal controls [28, 29, 32]. for example, ppia has been used as a reference gene because of its remarkable evolutionary conservation and broad cellular and tissue distribution [58]. although these seven reference genes have been used as internal standards for normalization in countless studies, all of these genes showed unacceptably variable expression in our model system, with values ranging from a 4.4-fold induction with tub1 to a 23-fold induction with sdha after antifungal treatment. altogether, these results suggest that the choice of internal controls is highly specific to a particular experimental condition, thus highlighting the importance of validating reference genes for each experimental model before commencement of rt-qpcr studies. although it is now widely accepted that normalizing to a single reference gene represents a strategy that is simple to use and can control for every stage of the rt-qpcr, some researchers also advocate the use of two or more reference genes, rather than relying on a single rna transcript [9, 10, 44, 59]. this is a robust method for providing accurate normalization and is consequently preferable when fine measurements are to be made. according to vandesompele et al. [9], the purpose of normalization is to remove the sampling difference (such as rna quantity and quality) in order to identify real gene-specific variation. they provided evidence that a conventional normalization strategy based on a single gene can lead to erroneous normalization.
however, it is not always possible to measure multiple reference genes because of limited sample availability and cost. furthermore, even when multiple genes are chosen, the resolution of the particular assay remains dependent on the variability of the chosen reference genes. as to our case, the genorm analysis using the geometric mean of the expression of the 16 candidate cdnas suggested the use of rdn5.8, rdn18, and rdn25 in combination or the combination of these three ribosomal rnas plus ubc13 and pgk1 as the reference control in the current study. however, the genorm assessment is based solely on the variability of reference genes and does not take other factors into account. for example, we found that although rdn18 and rdn25 were quite stable, their amplification efficiencies were not equal to the amplification efficiencies of all target genes tested; thus, they may not be suitable as internal controls in our system. therefore, when multiple reference genes are necessary, we believe that the combination of rdn5.8 plus ubc13 and/or pgk1 would be a better choice for quantitation of gene expression by rt-qpcr in c. glabrata following azole stimulation. in this study, we evaluated 16 reference genes for potential use as internal controls for rt-qpcr analysis of gene expression in c. glabrata (cg84u strain) following 2 hours of exposure to fluconazole at 200 μg/ml. to our knowledge, this is the first identification and validation of rdn5.8, ubc13, and pgk1 as the most suitable and stably expressed reference genes among the 16 reference genes tested. therefore, we recommend the use of rdn5.8, ubc13, or pgk1 alone or the geometric mean of these genes as standards for normalization when analyzing differences in gene expression levels in c. glabrata during antifungal treatment. 
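normalization to the geometric mean of several reference genes, as recommended above, can be sketched as follows. this is an illustration under stated assumptions: ct values are converted to relative quantities with a fixed efficiency of 2 (100%), the function names are hypothetical, and the ct values are invented for the example rather than taken from this study.

```python
from statistics import geometric_mean

def relative_quantity(ct, efficiency=2.0):
    """Convert a Ct value to a relative quantity (assumes a fixed efficiency)."""
    return efficiency ** -ct

def normalized_expression(ct_target, ct_refs):
    """Target quantity divided by the geometric mean of the reference quantities."""
    nf = geometric_mean([relative_quantity(c) for c in ct_refs])
    return relative_quantity(ct_target) / nf

def fold_change_multi(ct_target_t, ct_refs_t, ct_target_ut, ct_refs_ut):
    """Treated (t) over untreated (ut) normalized expression."""
    return (normalized_expression(ct_target_t, ct_refs_t)
            / normalized_expression(ct_target_ut, ct_refs_ut))

# hypothetical target: Ct 25 (untreated) to 23 (treated), a 4-fold increase
both_stable = fold_change_multi(23.0, [15.0, 18.0], 25.0, [15.0, 18.0])
one_drifts = fold_change_multi(23.0, [15.0, 16.0], 25.0, [15.0, 18.0])
```

when both references are stable the 4-fold change is recovered. when one of the two references is itself induced 4-fold, the estimate falls to 2-fold rather than to 1-fold, which is what normalization to that reference alone would give. averaging dampens the bias from an unstable reference but does not remove it, so the stability of each gene in the set still matters.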
more specifically, rdn5.8 may be a more suitable reference gene for the analysis of expression for genes with lower azole-inducible mrna levels, while ubc13 and pgk1 may be better internal controls for quantifying the expression of genes with higher azole-inducible mrna levels in c. glabrata. in contrast, we demonstrated that 10 reference genes commonly used in published reports, including act1, gapdh, ppia, rpl13a, and tub1, had significant differences in their expression upon azole challenge, and thus were not validated as good endogenous controls in this model. as a main conclusion, this study emphasizes the importance of evaluation studies for the selection of the most appropriate internal controls for each experimental model used for quantitative expression studies.
microarray and molecular analyses of the azole resistance mechanism in candida glabrata oropharyngeal isolates quantitative real-time rt-pcr-a perspective the real-time polymerase chain reaction twenty-five years of quantitative pcr for gene expression analysis real-time pcr for mrna quantitation absolute quantification of mrna using real-time reverse transcription polymerase chain reaction assays an overview of real-time quantitative pcr: applications to quantify cytokine gene expression guideline to reference gene selection for quantitative real-time pcr accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes housekeeping genes as internal standards: use and limits beta-actin-an unsuitable internal control for rt-pcr beta-actin and gapdh housekeeping gene expression in asthmatic airways is variable and not suitable for normalising mrna levels gapdh as a housekeeping gene: analysis of gapdh mrna expression in a panel of 72 human tissues control selection for rna quantitation direct comparison of gapdh, beta-actin, cyclophilin, and 28s rrna as internal standards for quantifying rna levels under hypoxia mechanical injury of cartilage
explants causes specific time-dependent changes in chondrocyte gene expression gene expression studies in prostate cancer tissue: which reference gene should be selected for normalization? effect of experimental treatment on housekeeping gene expression: validation by real-time, quantitative rt-pcr validation of housekeeping genes for normalizing rna expression in real-time pcr comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes validation of endogenous controls for gene expression analysis in microdissected human renal biopsies variation in epidermal housekeeping gene expression in different pathological states control genes in quantitative molecular biological techniques: the variability of invariance quantitative assessment of gene expression in highly purified hematopoietic cells using real-time reverse transcriptase polymerase chain reaction unsuitability of using ribosomal rna as loading control for northern blot analyses related to the imbalance between messenger and ribosomal rna content in rat mammary tumors normalization of real-time quantitative reverse transcription-pcr data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets reference gene selection for quantitative real-time pcr in chrysanthemum subjected to biotic and abiotic stress selection of reference genes for quantitative gene expression normalization in flax (linum usitatissimum l.) 
selection of housekeeping genes for use in quantitative reverse transcription pcr assays on the murine cornea ribosomal 18 s rna prevails over glyceraldehyde-3-phosphate dehydrogenase and beta-actin genes as internal standard for quantitative comparison of mrna levels in invasive and noninvasive human melanoma cell subpopulations housekeeping gene expression during fetal brain development in the rat-validation by semi-quantitative rt-pcr reference gene selection for quantitative real-time pcr analysis in virus infected cells: sars corona virus, yellow fever virus, human herpesvirus-6. camelpox virus and cytomegalovirus infections standardization strategy for quantitative pcr in human seminoma and normal testis azole resistance in candida glabrata: coordinate upregulation of multidrug transporters and evidence for a pdr1-like transcription factor candida glabrata pdr1, a transcriptional regulator of a pleiotropic drug resistance network, mediates azole resistance in clinical isolates and petite mutants mechanism of increased fluconazole resistance in candida glabrata during prophylaxis upregulation of erg genes in candida species by azoles and other sterol biosynthesis inhibitors histone deacetylase inhibitors enhance candida albicans sensitivity to azoles and related antifungals: correlation with reduction in cdr and erg upregulation collaborative comparison of broth macrodilution and microdilution antifungal susceptibility tests function of candida glabrata abc transporter gene application of a master equation for quantitative mrna analysis using qrt-pcr office of cyber infrastructure and computational biology (ocicb), national institute of allergy and infectious disease (niaid) qbase relative quantification framework and software for management and automated analysis of real-time quantitative pcr data determination of stable housekeeping genes, differentially regulated target genes and sample integrity: bestkeeper-excel-based tool using pair-wise correlations 
effects of fasting on the expression of gastrin, cholecystokinin, and somatostatin genes and of various housekeeping genes in the pancreas and upper digestive tract of rats changes in beta-actin mrna expression in remodeling canine myocardium regulation of hypoxanthine phosphoribosyltransferase, glyceraldehyde-3-phosphate dehydrogenase and beta-actin mrna expression in porcine immune cells and tissues housekeeping gene variability in normal and carcinomatous colorectal and liver tissues: applications in pharmacogenomic gene expression studies the quantification of gene expression in an animal model of brain ischaemia using taqman real-time rt-pcr validation of oligonucleotide microarray data using microfluidic low-density arrays: a new statistical method to normalize real-time rt-pcr data relative quantitative rt-pcr to study the expression of plant nutrient transporters in arbuscular mycorrhizas expression profile analysis of the lowoxygen response in arabidopsis root cultures various rat adult tissues express only one major mrna species from the glyceraldehyde-3-phosphate-dehydrogenase multigenic family comparison of glyceraldehyde-3-phosphate dehydrogenase and 28 s-ribosomal rna gene expression as rna loading controls for northern blot analysis of cell lines of varying malignant potential validating internal controls for quantitative plant gene expression studies enzyme co-localization in pea leaf chloroplasts: glyceraldehyde-3-p dehydrogenase, triose-p isomerase, aldolase and sedoheptulose bisphosphatase regulatory mechanisms involved in the control of ubiquitin homeostasis cyclophilin: distribution and variant properties in normal and neoplastic tissues a: evidence based selection of housekeeping genes evaluation of reference genes for real-time quantitative pcr studies in candida glabrata following azole treatment we sincerely thank huei-fung tsai, jason noble, and bryan walker for useful discussions and gratefully acknowledge jason noble for helpful 
technical assistance. this research was supported by the intramural research program of the national institute of allergy and infectious diseases, the national institutes of health. additional file 1: summary of the reference genes evaluated in this study. additional file 2: primers and probes for rt-qpcr analyses of reference gene rna transcription in this study. the authors declare that they have no competing interests. the hkgfinder software is currently available as an r source script and it can be downloaded for free from the niaid exon website [http://exon.niaid.nih.gov/hkgfinder/] with a sample data set and complete instructions. it requires installation of r 2.11 or higher on computers using the microsoft windows operating system for complete compatibility with its graphic user interface (gui) elements. experienced r users should be able to use hkgfinder on an apple macintosh or linux/unix operating system with some reasonable adjustments. an hkgfinder webtool should be available soon on the niaid exon website listed above. the genorm software is available from biogazelle [www.biogazelle.com/genormplus/] with a free 15-day trial download as a part of the qbase plus software system. the genorm software has now been integrated into the qbase plus software, where calculation of relative quantities and genorm analysis are combined in a single program to speed up analysis. presently, many manual precalculations are not needed, and cross point (cp) values from rt-qpcr can now be directly used for the gene stability analysis using the qbase plus software. the qbase plus software currently requires microsoft windows xp or above or apple mac os x 10.6 (snow leopard) with java 1.6 or later. support for linux is also available, but no requirements are listed on the manufacturer's website. the bestkeeper software [http://gene-quantification.com/bestkeeper.html] is available for free download.
please note that the software requires a password generated by an automatic email response from genequan@wzw.tum.de or password@gene-quantification.info. it requires the microsoft windows operating system and microsoft excel, but specific versions are not listed on the manufacturer's website. the normfinder software is available for free download from the manufacturer's website [http://www.mdl.dk/publicationsnormfinder.htm]. it requires the microsoft windows operating system and microsoft excel 2003 or above.
authors' contributions: qql conceived of the project, conducted the studies, performed all the experimental procedures, carried out the analysis and interpretation of data, wrote the manuscript, and is the primary author of this paper. js developed the hkgfinder software and technically helped with the use of other software packages in the present study. jeb participated in the design and coordination of the study and critically reviewed the manuscript. all authors have read and approved the final manuscript.
key: cord-290282-oxyzndsj authors: ortego, javier; sola, isabel; almazán, fernando; ceriani, juan e; riquelme, cristina; balasch, monica; plana, juan; enjuanes, luis title: transmissible gastroenteritis coronavirus gene 7 is not essential but influences in vivo virus replication and virulence date: 2003-03-30 journal: virology doi: 10.1016/s0042-6822(02)00096-x sha: doc_id: 290282 cord_uid: oxyzndsj transmissible gastroenteritis coronavirus (tgev) contains eight overlapping genes that are expressed from a 3′-coterminal nested set of leader-containing mrnas. to facilitate the genetic manipulation of the viral genome, genes were separated by duplication of transcription regulating sequences (trss) and introduction of unique restriction endonuclease sites at the 5′ end of each gene using an infectious cdna clone.
the recombinant tgev (rtgev) replicated in cell culture with similar efficiency to the wild-type virus and stably maintained the modifications introduced into the genome. in contrast, the rtgev replication level in the lungs and gut of infected piglets and virulence were significantly reduced. rtgev in which gene 7 expression was abrogated (rtgev-δ7) were recovered from cdna constructs, indicating that tgev gene 7 was a nonessential gene for virus replication. interestingly, in vivo infections with rtgev-δ7 showed an additional reduction in virus replication in the lung and gut, and in virulence, indicating that tgev gene 7 influences virus pathogenesis. transmissible gastroenteritis coronavirus (tgev) is a member of the coronaviridae family, which, together with the arteriviridae and roniviridae families, forms the nidovirales order (cowley and walker, 2002; enjuanes et al., 2000a; mayo, 2002). tgev replicates in both the villous epithelial cells of the small intestine and the lung cells of newborn piglets, resulting in a mortality of nearly 100% (saif and wesley, 1992). the tgev genome contains a leader sequence at the 5′ end and a poly(a) tail at the 3′ end. genes are arranged in the order 5′-rep-s-3a-3b-e-m-n-7-3′ (enjuanes et al., 2000b; penzes et al., 2001). the 3′ end of the majority of tgev genes overlaps with the 5′ terminus of the next gene (enjuanes et al., 2000b), complicating insertion of heterologous genes into the viral genome and deletion of different genes to determine whether they are essential. the tgev gene 7, located at the 3′ end of the genome, encodes a 78 amino acid hydrophobic protein that may play a role in membrane-associated replication complexes or in virion assembly (tung et al., 1992). gene 7 is a group-specific gene (de haan et al., 2002) with homologous versions in group 1 coronaviruses such as feline infectious peritonitis virus (fcov) and canine enteric coronavirus (ccov) (enjuanes et al., 2000a; lai and cavanagh, 1997).
in contrast, group 2 coronaviruses such as mouse hepatitis virus (mhv), bovine enteric coronavirus (bcov), or the human coronavirus (hcov) oc43, and group 3 coronaviruses such as avian infectious bronchitis virus (ibv), do not have a homologous gene 7 (enjuanes et al., 2000a; lai and cavanagh, 1997). interestingly, the group 1 coronavirus hcov-229e does not have gene 7 (herold et al., 1993), which could indicate that this gene is nonessential for coronavirus replication, even in group 1 coronaviruses. to study whether gene 7 is dispensable for tgev replication, its deletion by reverse genetics would be required. we used a genomic tgev cdna clone assembled as a bacterial artificial chromosome (bac) (almazán et al., 2000; gonzález et al., 2002) to separate the overlapping genes by duplicating sequences at the 5′ flank of each gene and by introducing unique restriction endonuclease sites between each gene pair. gene separation allowed the deletion of gene 7, showing that it is nonessential for virus replication. furthermore, we show that the accumulation of modifications in gene domains where the trss are located and insertion of restriction sites led to the generation of a collection of recombinant tgevs (rtgevs) with variable virulence, including a highly attenuated recombinant. these viruses could be the basis for coronavirus vector development. to facilitate genetic manipulation of the viral genome, full-length cdna clones were constructed by separating the contiguous genes and inserting unique restriction sites between each gene pair (fig. 1a). cdna clones with the wild-type sequence or containing one [pbac-tgev-paci (p), pbac-tgev-mlui (m), pbac-tgev-fsei (f), and pbac-tgev-asci (a)], two [pbac-tgev-fsei-pmei (f-pm), pbac-tgev-pmei-asci (pm-a), and pbac-tgev-paci-mlui (p-m)], three [pbac-tgev-fsei-pmei-asci (f-pm-a)], or five [pbac-tgev-paci-mlui-fsei-pmei-asci (rs)] restriction endonuclease sites (fig. 
1b) were transfected into bhk cells expressing the porcine aminopeptidase n (papn) (bhk-papn cells). on the third day after transfection, a cytopathic effect was observed in cells transfected with each cdna, but not in mock-treated cells. virus production was amplified by passing the supernatants four times in cultured cells. after the fourth passage, viruses were cloned by three plaque-isolation steps, and their genomes were partially sequenced. all the rtgev viruses conserved the modifications engineered in the cdnas (data not shown), indicating that the orf separation and the insertion of unique endonuclease restriction sites between genes were stably maintained in the rtgev genomes. cloned rtgev viruses containing the unique restriction sites showed similar growth kinetics in cell culture to the parental rtgev-wt after infection at both low (0.05) and high (5) m.o.i. (fig. 2), indicating that removal of the overlapping region between tgev genes and the insertion of endonuclease restriction sites did not significantly affect the in vitro virus replication. to analyze whether gene 7 was essential for viral growth, recombinant virus genomes with gene 7 deleted were generated from pbac-tgev-rs constructs, containing either the s gene from the tgev strain pur-c11 (sánchez et al., 1999) able to infect both the enteric and the respiratory tracts (pbac-tgev-sc11-rs-Δ7) or the s gene from the strain ptv (sánchez et al., 1999) with an exclusive respiratory tropism (pbac-tgev-sptv-rs-Δ7). the rtgev-Δ7 contained a deletion spanning 21 nucleotides upstream of the orf 7 aug and the first 17 nucleotides of this orf (fig. 3a). bhk-papn cells were transfected with plasmids including five restriction endonuclease sites and carrying gene 7 (pbac-tgev-sptv-rs and pbac-tgev-sc11-rs), or without this gene (pbac-tgev-sptv-rs-Δ7 and pbac-tgev-sc11-rs-Δ7). virus titers were determined in supernatants throughout four additional passages in cell culture (fig. 3b and c). 
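the low (0.05) and high (5) m.o.i. conditions mentioned above can be put in perspective with a small calculation. the sketch below assumes the standard poisson model of infection, which the source does not state explicitly; the function names and the cell number in the usage lines are hypothetical and only serve as an illustration.

```python
import math


def fraction_infected(moi):
    """Expected fraction of cells receiving at least one infectious particle.

    Assumes particles are distributed over cells following a Poisson law,
    so P(at least one particle) = 1 - exp(-moi).
    """
    return 1.0 - math.exp(-moi)


def pfu_required(moi, n_cells):
    """Plaque-forming units needed to infect n_cells at the requested m.o.i."""
    return moi * n_cells


if __name__ == "__main__":
    # m.o.i. values taken from the text; the cell count is a made-up example.
    for moi in (0.05, 5):
        print(moi, round(fraction_infected(moi), 4), pfu_required(moi, 10**6))
```

under this model, an m.o.i. of 0.05 leaves most cells uninfected in the first round (multi-cycle growth), whereas an m.o.i. of 5 infects nearly every cell at once (single-cycle growth), which is why growth curves are usually compared at both extremes.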
viruses were recovered from the cdnas containing the deletion of gene 7, and viral titers increased with passage, basically to the same extent as the viruses with gene 7 (rtgev-sptv-rs and rtgev-sc11-rs). as expected, no virus was recovered from the mock-transfected cultures. after four passages in cell cultures, the recombinant viruses were cloned by three plaque isolation steps. the cytopathic effect and plaque morphology produced by the rtgev-sptv-rs-Δ7 and rtgev-sc11-rs-Δ7 were identical to those of the parental viruses containing the complete genome (data not shown). the isolate rtgev-sc11-rs-Δ7 induced the formation of large-size plaques (3-mm-diameter), whereas the virus rtgev-sptv-rs-Δ7 induced small-sized plaques (1-mm-diameter). the cloned viruses with gene 7 deleted showed standard growth kinetics after infection at an m.o.i. of 5 (fig. 3d and e). recombinant rtgev-sptv-rs-Δ7 and rtgev-sc11-rs-Δ7 generated the highest virus titers (around 6 × 10^8 and 10^7 pfu/ml, respectively) at 24 h postinfection, similar to those of the parental viruses rtgev-sptv-rs and rtgev-sc11-rs. these data indicated that the protein encoded by gene 7 was not essential for tgev replication in cell culture. to confirm that the subgenomic mrna (sgmrna) 7 was not transcribed, bhk-papn cells were infected with rtgev-sptv-rs or rtgev-sptv-rs-Δ7 viruses. total rna was extracted and analyzed by northern blot with a probe complementary to the 3′ end of the tgev genome (fig. 4a). the mobility and relative amount of the sgmrnas 2, 3, 4, 5, and 6 were indistinguishable in both viruses. as expected, sgmrna 7 was transcribed in rtgev-sptv-rs-infected cells, but not in cells infected with rtgev-sptv-rs-Δ7. analysis of viral proteins at 16 h p.i. 
by western blot showed that the amount of s, n, m, and e proteins was similar in rtgev-sptv-rs-Δ7-infected cells and in cells infected with the parental virus rtgev-sptv-rs, except for protein 7 that was not detected in rtgev-sptv-rs-Δ7 virus-infected cells (fig. 4b), confirming that the partial deletion of gene 7 prevented the synthesis of protein 7. identical results were obtained with rtgev-sc11-rs and rtgev-sc11-rs-Δ7 (data not shown). in vivo growth of a selected set of rtgevs containing the unique endonuclease restriction sites, and the rtgev-rs-Δ7 viruses, was determined by infecting newborn piglets. the animals were sacrificed at 1, 2, 3, and 4 days p.i. recombinant viruses with a modification including a restriction endonuclease site 5′ upstream of some genes, in general, showed lower titers than the wild-type virus, although there was some titer variability over the 4 days postinfection (fig. 5a and b).
fig. 1 (legend, partial). outlined sequences indicate the point mutations introduced to generate unique restriction sites. the core sequence (cs) is underlined. the atg start codon is shown in bold. duplicated sequences are indicated by dark boxes. tgev genes are indicated by light boxes. 1b, replicase 1b gene; s, spike gene; e, envelope gene; m, membrane gene; n, nucleoprotein gene. *, gene rep 1b termination codon (tga) and the initiation codon of gene s (atg) partially overlap. the sequence of the 24 and 83 nt located 5′ upstream of genes s and 3a, respectively, are described in the full-length tgev genome sequence previously reported (penzes et al., 2001). (b) schematic illustration of the full-length tgev cdna without (top bar) or with one (pbac-tgev-p, pbac-tgev-m, pbac-tgev-f, and pbac-tgev-a), two (pbac-tgev-f-pm, pbac-tgev-pm-a, and pbac-tgev-p-m), three (pbac-tgev-f-pm-a), or five (pbac-tgev-rs) restriction endonuclease sites. cmv, cytomegalovirus immediate-early promoter; rep, replicase; an, poly(a) tail of 24 a residues; hdv, hepatitis delta virus ribozyme; bgh, bovine growth hormone termination and polyadenylation sequences.
alteration in domains 5′ upstream of two or more genes did not lead to a significant decrease in virus titer in comparison with recombinants with a single modification. interestingly, analysis of viral growth in the gut of infected piglets showed a 100- to 5000-fold reduction of recombinant viruses containing one or more restriction sites in relation to the rtgev-wt virus (fig. 5d and e). the rtgev-rs, which included modifications 5′ upstream of five genes and insertion of endonuclease restriction sites between each contiguous gene, showed a titer decrease comparable with that of rtgevs with two (tgev-f-a) or three (tgev-f-pm-a) restriction endonuclease sites. these data show that modification of sequences 5′ upstream of genes affected virus replication in the gut. recombinant viruses were isolated from the gut and sequenced. all the modifications introduced were stably maintained in the tgev genome during in vivo infections (data not shown). the growth in lungs of gene 7 deletion mutants (rtgev-sptv-rs-Δ7 and rtgev-sc11-rs-Δ7) showed around 100-fold reduction of virus titers compared with the parental viruses rtgev-sptv-rs and rtgev-sc11-rs (fig. 5c). in contrast, in vivo growth of rtgev-sc11-rs-Δ7 in the gut showed slightly lower replication levels than rtgev-sc11-rs (5 × 10^4 pfu/g) probably because the introduction of modifications at the 5′ upstream of five genes had already caused a significant titer reduction (fig. 5f). as expected, the respiratory recombinants rtgev-sptv-rs and rtgev-sptv-rs-Δ7 did not grow in the gut since the s gene was derived from the respiratory ptv strain. piglet survival after infection by rtgev viruses was compared with survival after infection with their virulent parental virus tgev-pur46-c11 (tgev-sc11-wt). 
rtgev viruses with one, two, three, or five unique restriction sites were highly attenuated (they produced mild enteritis and led to 80 to 90% survival at 5 days p.i.), except for rtgev-p and rtgev-m viruses, in which the restriction sites paci and mlui were introduced by point mutations, without introducing trs duplication. in these two recombinant viruses the survival was from 0 to 20%. piglets infected by rtgev-sc11-rs-Δ7 showed 100% survival, although a very mild and transitory enteritis was observed in 50% of the animals. these data indicate that gene 7 deletion further reduced virus virulence. in general, no clear correlation was observed between the growth of recombinants with a single restriction endonuclease insertion and virulence. this could be due to differences in the distribution of viral antigens and inflammatory responses in pigs infected with wild-type or each mutant. nevertheless, in viruses with two or more restriction endonuclease sites inserted there was a good correlation between virus titers in the gut and mortality. tgev genomes with all the genes separated by unique endonuclease restriction sites have been engineered. the separation of tgev genes implied the duplication of sequences 5′ upstream of each gene, a sequence domain involved in regulating transcription, and affected in vivo virus growth and virulence. in this article, the first demonstration that tgev gene 7 is nonessential for virus viability is provided. in addition, it has been demonstrated that gene 7 deletion affects tgev replication and attenuates virus virulence in piglets, its natural host. interestingly, the introduction of one or more modifications into the tgev genome has led to the generation of a collection of tgev recombinants with a variable degree of virulence.
fig. 5. in vivo growth kinetics of rtgev viruses. two- to three-day-old non-colostrum-deprived swine were used to study the growth kinetics of rtgev viruses containing one (a and d) or more (b and e) endonuclease restriction sites, or partial deletion of gene 7 (c and f). groups of 12 to 20 piglets were oronasally (2 × 10^8 pfu/pig) and intragastrically (3 × 10^8 pfu/pig) inoculated. virus titers at the indicated number of days postinoculation were determined in the indicated tissue extracts. the whole organs were homogenized to obtain representative samples. error bars represent standard deviations of the mean from four experiments. (g) number of surviving piglets at different days postinoculation. results from a representative experiment of two that gave similar results are shown. recombinant virus titers were compared with that of the wild-type virus by the kruskal-wallis test and, in general, were significant (p < 0.05) between viruses with gene 7 deleted and the wild-type virus.
fig. 4 (legend, partial). western blot analysis of lysates from bhk-papn cells infected with either rtgev-sptv-rs or rtgev-sptv-rs-Δ7 viruses. cell extracts were obtained at 16 h p.i., resolved by 5 to 20% gradient sds-page, transferred to nitrocellulose membranes, and immunoblotted with monoclonal antibodies specific for s, n, m, and e, and with an antiserum specific for a protein 7 peptide (garwes et al., 1989).
overlapping of genes has been proposed as a mechanism by which nidoviruses preserve the genetic integrity of vital parts of their genomes (de vries et al., 2000). nevertheless, in coronavirus we have generated viable and stable rtgev viruses in which genes have been separated by the insertion of unique restriction endonuclease sites and modification of the domains where the transcription-regulating sequences (trss) map. maximum titers of mutant rtgev viruses in cell culture were similar to those obtained for the parental virus, indicating that changes in the sequences between tgev genes had little effect on viral yield. 
in contrast, in vivo assays showed lower tgev mutant virus replication in the lungs and gut, and attenuation of virulence in viruses in which a trs duplication was introduced between genes. in arteriviruses, mutants that had the overlap between orfs 4 and 5, or between orfs 5 and 6 removed, were also viable (de vries et al., 2000). taken together, these data demonstrate that gene overlapping is not an obligatory requirement for nidovirales viability. viral attenuation resulting from gene separation was possibly due to modification of the trss affecting mrna transcription levels. in the case of nonsegmented negative-strand rna viruses it has also been shown that sequence alterations, such as restriction endonuclease site or heterologous gene insertion, and gene rearrangement may affect virus replication (wertz et al., 2002). gene 7 is a group 1 specific gene, absent from the genomes of coronaviruses from groups 2 and 3 (armstrong et al., 1983; boursnell et al., 1985; kamahora et al., 1989; lapps et al., 1987; skinner and siddell, 1985). interestingly, no mortality was observed in piglets infected with rtgev-Δ7. since these recombinants still replicate with titers between 10^3 and 10^5 pfu/g of tissue (fig. 5c and f), tgev-Δ7 mutants could be good candidates as virus vectors. a relationship between gene 7 and virulence has also been observed in the fcovs. the most 3′ end genes of these viruses are genes 7a and 7b. deletions in fcov orf 7a, which is homologous to the tgev orf 7, have been reported in a natural infection of a cat population and correlated with a decrease in virulence (kennedy et al., 2001). similarly, mutations in fcov orf 7b occur in vitro and have also been correlated with loss of virus virulence (herrewegh et al., 1995). therefore, the most 3′ end genes of group 1 coronaviruses seem, in general, to influence virus pathogenesis. deletion of gene 7 did not affect virus replication in cell culture. 
therefore, the reduction of in vivo virus replication and virulence was probably due to an effect on virus-host interaction. it has been suggested that coronavirus group-specific genes, such as gene 7, may affect host immune response (de haan et al., 2002). it would be of interest to determine whether the immune response to a heterologous gene inserted in recombinant viruses with and without gene 7 is influenced by the presence of this gene. in an attempt to identify gene 7 homologous protein sequences or motifs, a sequence search was performed in the databases using gene 7 sequences without success. the high hydrophobicity of tgev protein 7 (garwes et al., 1989) could facilitate its insertion in membranes providing a role in virus replication, since coronavirus replication complexes have been associated with membranes (dennison and sims, 2001; snijder et al., 2001). similarly, the tentative location of protein 7 within the nucleus (garwes et al., 1989) could be taken as an indication for a possible interference with the cell cycle, similarly to coronavirus nucleoprotein that seems to interact with cell nucleolus proteins and interfere with cell cycle (hiscox, 2002). therefore, genes 3a, 3b, and 7 of group 1 coronaviruses are nonessential for replication. similarly, genes 2a/he and 4/5a, and possibly gene e, are dispensable in mhv (de haan et al., 2002; kuo and masters, 2002). manipulation of transcription-regulating sequences and deletion of nonessential genes, such as gene 7, will facilitate the study of the molecular basis of viral attenuation and provide an attractive approach to generate attenuated rtgevs with high potential as virus vectors. baby hamster kidney cells (bhk-21) stably transformed with the gene coding for the porcine aminopeptidase n (bhk-papn) (delmas et al., 1994) were grown in dmem supplemented with 2% fetal calf serum and geneticin g418 (1.5 mg/ml) as a selection agent. 
bhk-papn cells were used for all the experiments except for standard virus titrations that were performed in swine testis (st) cells (mcclurkin and norman, 1966). virus titers were compared by the kruskal-wallis test (motulsky, 1995), and significant differences (p < 0.05) were indicated. rtgev viruses were generated from pbac-tgev constructs containing the s gene from the virulent tgev strain pur-c11 (sc11) as described (almazán et al., 2000; gonzález et al., 2002). viruses containing the s gene from the attenuated strain ptv (sptv) were derived from the corresponding pbac-tgev vectors with gene sc11 by replacing this gene by that (sptv) of the respiratory strain. two different approaches were followed to introduce into pbac-tgev unique restriction endonuclease sites separating each gene of tgev genome (fig. 1a), leading to pbac-tgev-rs. the first approach was the introduction of point mutations in the tgev genome to generate the restriction sites paci (between genes rep 1b and s) and mlui (between genes s and 3a) (fig. 1b). due to overlapping in the tgev genome, the second approach involved the duplication of 13, 22, and 19 nucleotides from the 5′ transcription-regulating sequences of genes m, n, and 7 (trs-m, trs-n, and trs-7) after the restriction sites fsei, pmei, and asci, respectively (fig. 1a and b). point mutations, duplications, and insertion of restriction endonuclease sites were generated by overlapping pcr amplification from the tgev genome as described (ortego et al., 2002). the assembly of the full-length pbac-tgev constructs was performed as previously reported (almazán et al., 2000; gonzález et al., 2002). 
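the kruskal-wallis comparison of virus titers mentioned in the statistics note above can be sketched as follows. this is a minimal, standard-library illustration and not the authors' analysis: the log10 titers are invented placeholders, the function name is hypothetical, and the p value relies on the chi-square approximation, which is only rough for groups this small. for exactly three groups (2 degrees of freedom) the chi-square tail probability reduces to exp(-h/2), which keeps the example free of external dependencies.

```python
import math


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three independent samples.

    H is the tie-corrected Kruskal-Wallis statistic; p uses the chi-square
    approximation with 2 degrees of freedom, whose survival function is
    exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n_total = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # tied values share the average of ranks i+1 .. j
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    h /= 1.0 - tie_term / float(n_total ** 3 - n_total)
    return h, math.exp(-h / 2.0)


if __name__ == "__main__":
    # hypothetical log10 titers (pfu/g), NOT data from the study
    wild_type = [7.1, 7.4, 6.9, 7.2]
    with_sites = [5.0, 5.3, 4.8, 5.1]
    delta_7 = [3.2, 3.0, 3.5, 3.1]
    h, p = kruskal_wallis_three_groups([wild_type, with_sites, delta_7])
    print(h, p, p < 0.05)
```

a rank-based test is a natural choice here because virus titers span several orders of magnitude and the group sizes are small, so normality cannot be assumed.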
to generate the deletion Δ7, oligonucleotide primers 5′-asc-7.17-vs (5′-gaggcgcgcctgctgtatttattacag-3′), including the restriction site asci (underlined) and the deletion of 21 nucleotides from trs-7 plus the first 17 nucleotides of orf 7, and bgh34-rs (5′-cagatggctggcaactagaaggc-3′) were used to generate a pcr product spanning nt 28,094 to nt 28,764 of the tgev cdna clone. the pcr product was digested with asci and bamhi and cloned into the asci-bamhi-digested pbac-tgev-sc11-rs and pbac-tgev-sptv-rs, generating pbac-tgev-sc11-rs-Δ7 and pbac-tgev-sptv-rs-Δ7, respectively. bhk-papn cells were grown to 60% confluence on 35-mm-diameter plates and transfected with 10 μg of cdna plasmid and 15 μg of lipofectin (life technologies, gibco) according to the manufacturer's specifications. recovery and amplification of viruses were performed as described (ortego et al., 2002). cell lysates were analyzed by 5 to 20% gradient sodium dodecyl sulfate-polyacrylamide gel electrophoresis (sds-page). proteins were transferred to a nitrocellulose membrane and analyzed as described (ortego et al., 2002), using mabs specific for s (5b.h1), n (3d.c10), m (9d.b4), and e (v27) proteins (ortego et al., 2002), and a swine polyclonal antibody specific for tgev protein 7 (provided by p. britton, compton, uk). total rna was extracted by using an ultraspec rna isolation system (biotecx) according to the manufacturer's instructions and analyzed by northern blotting as described (ortego et al., 2002). two- to three-day-old non-colostrum-deprived swine, from crossing large white and belgium landrace, were used to study in vivo growth kinetics of rtgev, as described (sánchez et al., 1999). piglets were obtained from sows seronegative for tgev, as tested by radioimmunoassay. 
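the forward primer used for the Δ7 deletion, described in the cloning paragraph above, can be checked with a few lines of code. the sketch below locates the asci recognition sequence (ggcgcgcc) in the primer and computes the length of the amplified region, assuming that both genome coordinates are inclusive; the function and variable names are hypothetical, and the primer string is the sequence from the text with the line-break hyphen removed.

```python
# Forward primer 5'-asc-7.17-vs as given in the methods (hyphen from the
# line break removed). Sequences are written 5' -> 3' in lowercase.
FORWARD_PRIMER = "gaggcgcgcctgctgtatttattacag"
ASCI_SITE = "ggcgcgcc"  # AscI recognition sequence


def site_offset(primer, site):
    """Return the 0-based position of the site in the primer, or -1 if absent."""
    return primer.lower().find(site.lower())


def span_length(first_nt, last_nt):
    """Length of a region given inclusive 1-based start and end coordinates."""
    return last_nt - first_nt + 1


if __name__ == "__main__":
    print(site_offset(FORWARD_PRIMER, ASCI_SITE))  # position of the AscI site
    print(len(FORWARD_PRIMER))                     # primer length in nt
    print(span_length(28094, 28764))               # region covered by the pcr product
```

the two nucleotides that precede the site act as a short 5′ overhang; the check only confirms that the site is present in the primer and does not model digestion efficiency.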
engineering the largest rna virus genome as an infectious bacterial artificial chromosome
sequence of the nucleocapsid gene from murine coronavirus mhv-a59
sequences of the nucleocapsid genes from two strains of avian infectious bronchitis virus
the complete genome sequence of gill-associated virus of penaeus monodon prawns indicates a gene organisation unique among nidoviruses
the group-specific murine coronavirus genes are not essential, but their deletion, by reverse genetics, is attenuating in the natural host
genetic manipulation of equine arteritis virus using full-length cdna clones: separation of overlapping genes and expression of a foreign epitope
further characterization of aminopeptidase-n as a receptor for coronaviruses
mhv-a59 gene 1 proteins are associated with two distinct membrane populations
coronaviridae
nidovirales
the polypeptide of mr 14000 of porcine transmissible gastroenteritis virus: gene assignment and intracellular location
stabilization of a full-length infectious cdna clone of transmissible gastroenteritis coronavirus by the insertion of an intron
nucleotide sequence of the human coronavirus 229e rna polymerase locus
the molecular genetics of feline coronavirus comparative sequence analysis of the orf7a/7b transcription unit of different biotypes
the nucleolus-a gateway to viral infection?
sequence analysis of nucleocapsid gene and leader rna of human coronavirus oc43
deletions in the 7a orf of feline coronavirus associated with an epidemic of feline infectious peritonitis
genetic evidence for a structural interaction between the carboxy termini of the membrane and nucleocapsid proteins of mouse hepatitis virus
the molecular biology of coronaviruses
sequence analysis of the bovine coronavirus nucleocapsid and matrix protein genes
a summary of taxonomic changes recently approved by ictv
studies on transmissible gastroenteritis of swine. ii. selected characteristics of a cytopathogenic virus common to five isolates from transmissible gastroenteritis
comparing two paired groups: paired t and wilcoxon tests
generation of a replication competent, propagation-deficient virus vector based on the transmissible gastroenteritis coronavirus genome
complete genome sequence of transmissible gastroenteritis coronavirus pur46-mad clone and evolution of the purdue virus cluster
transmissible gastroenteritis
targeted recombination demonstrates that the spike gene of transmissible gastroenteritis coronavirus is a determinant of its enteric tropism and virulence
coding sequence of coronavirus mhv-jhm mrna 4
non-structural proteins 2 and 3 interact to modify host cell membranes during the formation of the arterivirus replication complex
the 9-kda hydrophobic protein encoded at the 3′ end of the porcine transmissible gastroenteritis coronavirus genome is membrane-associated
adding genes to the rna genome of vesicular stomatitis virus: positional effects on stability of expression
this work was supported by grants from the comisión interministerial de ciencia y tecnología (cicyt), la consejería de educación y cultura de la comunidad de madrid, fort dodge veterinaria, and the european union (frame v, key action 2, control of infectious disease projects qlrt-1999-00002, qlrt-1999-30739, and qlrt-2000. i.s. received postdoctoral fellowships from the community of madrid and the european union (frame v, key action 2, control of infectious disease projects). 
key: cord-279781-5ldpz9m9 authors: chen, chi-yuan; lin, chin-yu; chen, guan-yu; hu, yu-chen title: baculovirus as a gene delivery vector: recent understandings of molecular alterations in transduced cells and latest applications date: 2011-04-28 journal: biotechnol adv doi: 10.1016/j.biotechadv.2011.04.004 sha: doc_id: 279781 cord_uid: 5ldpz9m9 baculovirus infects insects in nature and is non-pathogenic to humans, but can transduce a broad range of mammalian and avian cells. thanks to the biosafety, large cloning capacity, low cytotoxicity and non-replication nature in the transduced cells as well as the ease of manipulation and production, baculovirus has gained explosive popularity as a gene delivery vector for a wide variety of applications. this article extensively reviews the recent understandings of the molecular mechanisms pertinent to baculovirus entry and cellular responses, and covers the latest advances in the vector improvements and applications, with special emphasis on antiviral therapy, cancer therapy, regenerative medicine and vaccine. baculoviruses are a diverse group of dna viruses capable of infecting more than 600 insects, among which autographa californica multiple nucleopolyhedrovirus (acmnpv) is the best characterized and most extensively employed, thus baculovirus discussed in this article refers to acmnpv unless otherwise noted. acmnpv contains a circular dsdna genome of ≈134 kb and replicates in a bi-phasic fashion. the viral proteins polyhedrin and p10 are expressed abundantly in infected cells and are dispensable for virus replication, thus recombinant baculovirus can be constructed by placing the foreign gene under the control of polyhedrin or p10 promoter, and used to infect insect cells for foreign gene expression. 
this baculovirus-insect cell expression system has been exhaustively utilized for the production of numerous recombinant proteins (kost and condreay, 1999) and commercial vaccine products such as cervarix® and provenge® (hitchman et al., 2009). in addition to insect cells, baculovirus is able to transduce animal cells of human, rodent, rabbit, porcine, bovine, fish and avian origins (hu, 2005, 2006) as well as relatively primitive cells including embryonic stem cells, adult stem cells (table 1) and induced pluripotent stem cells (fig. 1). within the baculovirus-transduced cells, the transgene can be expressed as long as it is driven by an appropriate promoter (e.g. cytomegalovirus immediate-early (cmv) or hybrid cag promoter consisting of the cmv early enhancer and chicken β-actin promoter). baculovirus cloning capacity is as large as 38 kb (cheshenko et al., 2001), thus allowing for the insertion of multiple genes and regulatory elements (kost et al., 2005; kost and condreay, 2002). baculovirus neither replicates inside the transduced cells nor integrates its dna into host chromosomes in the absence of selective pressure (chen et al., 2011a; merrihew et al., 2001), hence easing the safety concerns. humans do not possess pre-existing antibodies and t-cells specifically against baculovirus (strauss et al., 2007), therefore baculovirus may circumvent the pre-existing immunity problem faced by other viral vectors. finally, recombinant baculovirus can be readily constructed and propagated to high titers in biosafety level 1 facilities by infecting its natural host insect cells (hu, 2008). 
these attributes have fueled growing interest in exploring baculovirus for a wide variety of applications, ranging from protein production (jardin et al., 2008; liu et al., 2010), virus production (lesch et al., 2008, 2011; lucifora et al., 2008; mccormick et al., 2002; nakowitsch et al., 2006; zheng et al., 2010), virus-like particle production (chen et al., 2005; matsuo et al., 2006; wang et al., 2005), eukaryotic protein display (ernst et al., 2006; grabherr and ernst, 2010), vaccine development (madhan et al., 2010; tani et al., 2008), cancer therapy (wang and balasundaram, 2010) and cell-based assay development (condreay et al., 2006; condreay and kost, 2007; kost et al., 2010) to tissue engineering (lin et al., 2010b). this article will focus on recent understandings of intracellular events after baculovirus transduction and latest applications of baculovirus for antiviral therapy, cancer therapy, tissue regeneration and vaccine development. baculovirus entry into mammalian cells was initially suggested to depend on electrostatic interactions (duisit et al., 1999), heparan sulfate (duisit et al., 1999) and phospholipids (tani et al., 2001), but the exact cell surface molecules for baculovirus docking remained unknown. it was also proposed that clathrin-mediated endocytosis (long et al., 2006; matilainen et al., 2005) and macropinocytosis (matilainen et al., 2005) play roles in baculovirus entry. in contrast, a recent study (laakkonen et al., 2009) discovered that (1) baculovirus entered hek293 and hepg2 cells along fluid-phase markers from the raft areas into vesicles devoid of clathrin; (2) macropinocytosis-related regulators (e.g. eipa, pak1, rab34 and rac1) imparted no significant effects on virus transduction and (3) the internalization and nuclear uptake were affected by the regulators of clathrin-independent entry. 
these data unveiled a baculovirus entry pathway independent of clathrin-mediated endocytosis and macropinocytosis and suggested that phagocytosis might play a role (laakkonen et al., 2009), which echoed the observations reported previously (abe et al., 2005).
table 1. selected types of cells permissive to baculovirus transduction.
human cells: hepg2 (gerner et al., 2010; laakkonen et al., 2008; laakkonen et al., 2009); hek293 (laakkonen et al., 2009; lavdas et al., 2010; sollerbrant et al., 2001); huh7; vero e6 cells (liu et al., 2009a); coronary smooth muscle and endothelial cells (grassi et al., 2006); a549 (guo et al., 2010); colon cancer cells (paul and prakash, 2010; yin et al., 2010); glioblastoma u373 (lackner et al., 2008); osteosarcoma saos-2 (song et al., 2003); glioblastoma u251 (balani et al., 2009); astrocytes (balani et al., 2009; boulaire et al., 2009); neurons derived from es cells (zeng et al., 2009); bone marrow mesenchymal stem cells (bak et al., 2010; ho et al., 2005; lin et al., 2010a; lo et al., 2009); embryonic stem (es) cells (du et al., 2010; zeng et al., 2009)
non-human primate cells: cos-7 (pan et al., 2009); cv-1 (tani et al., 2001); vero (zheng et al., 2010); coronary smooth muscle cell (grassi et al., 2006)
rodent cells: bhk (zheng et al., 2010); cho (hu et al., 2003; pan et al., 2009); c2c12 (shen et al., 2008); l929 (airenne et al., 2000; cheng et al., 2004); sol 8 (shen et al., 2008); rat hepatic stellate cells (gao et al., 2002); primary mouse osteoblasts and osteoclast (tani et al., 2003); rat articular chondrocyte (ho et al., 2004); mouse amniotic fluid-derived stem cells (liu et al., 2009b)
rabbit cells: aortic smooth muscle (raty et al., 2004); intervertebral disk cells; primary articular chondrocyte (chen et al., 2009c; lee et al., 2009; sung et al., 2009)
[entry lost] (ping et al., 2006); chicken embryonic fibroblast (ping et al., 2006); chicken embryonic cells; duck embryonic cells
canine cells: mdck
norden laboratories feline kidney (nlfk)
moreover, other recent studies reported that baculovirus transduction is related to a direct fusion pathway induced by a short ph trigger (dong et al., 2010; paul and prakash, 2010). these conflicting data underline the need for more in-depth studies to elucidate the underlying mechanism and might suggest that the baculovirus entry pathway varies with cell type. nevertheless, one consensus is that baculovirus envelope protein gp64 is pivotal for entry because blocking gp64 can abrogate the baculovirus ability to transduce mammalian cells (abe et al., 2005; niu et al., 2008) and activate dendritic cells (dcs) (schutz et al., 2006). once inside the cells, baculovirus is transported to the endosome, followed by endosomal escape via acid-triggered gp64 fusion (kukkonen et al., 2003) and subsequent nuclear transport (van loo et al., 2001) with the aid of actin filaments (matilainen et al., 2005; salminen et al., 2005). a major component of type iii intermediate filaments, vimentin, also participates in intracellular trafficking (mahonen et al., 2010). vimentin is reorganized in the optimized culture medium and is linked to enhanced nuclear entry of baculovirus, underscoring the importance of culture medium in the cytoskeleton network assembly and in baculoviral gene delivery. baculovirus encodes ≈155 genes and a number of viral genes (e.g. orf149, ie0, p35 and gp64) are expressed in transduced mammalian cells (fujita et al., 2006; kitajima et al., 2006). among the baculoviral genes, ie1 is expressed early in insect cells and transactivates downstream gene expression. forced expression of ie1 by the minimal cmv promoter in vero e6 cells also markedly activates gp64 and pe38, and upregulates ie2, he65, pcna, orf16, orf17 and orf25 (liu et al., 2007). 
the critical role of ie1 in transactivating downstream genes was further unraveled in a recent study, which showed that a baculovirus deficient in the ie1 gene reduces residual baculoviral gene expression 10- to 100-fold (when compared with wild-type baculovirus) in transduced mammalian cells, thus enhancing the safety of baculovirus-based gene therapy. in contrast to ie1, ie2 overexpression driven by the cmv promoter upregulates only 2 baculoviral genes (pe38 and orf17), but baculovirus-mediated co-expression of ie1 and ie2 acts in concert to upregulate 59 out of 155 baculovirus genes in mammalian cells (liu et al., 2007). strikingly, ie2 is a strong transactivator of the cmv promoter in both vero e6 and u-2os cells (liu et al., 2009a). when overexpressed within the baculovirus context, ie2 forms nuclear foci that develop into large nuclear bodies (nbs) with a hollow center. the ie2 nb structure contains abundant g-actin, closely associates with rna polymerase ii, promyelocytic leukemia (pml) protein and small ubiquitin-like modifier-1 (sumo1) and is the site of active transcription, thereby contributing to the ie2-associated gene stimulation (liu et al., 2009a). furthermore, nb formation and cmv promoter activation require the n-terminal ring finger and c-terminal coiled-coil domains of ie2 (liu et al., 2009a). since the cmv promoter is extensively used for baculovirus-mediated gene transfer, the transactivating activity of ie2 may be useful for baculovirus-mediated protein production in mammalian cells.

due to the complex cascade of events during baculovirus transduction, it is not surprising that baculovirus can alter the cell morphology and trigger cellular responses. for instance, baculovirus transduction of hepg2 cells alters the size of pml nbs, induces remodeling of the host cell chromatin and arouses extensive ruffle formation on the cell surface (laakkonen et al., 2009).
shotgun proteomics also attests that baculovirus-transduced hepg2 cells exhibit a slight induction of proteins related to inflammation, cell survival and chromatin function (gerner et al., 2010). the best-characterized baculovirus-induced cellular response is the innate immune response, as manifested by the induction of cytokines such as interferon α/β (ifn-α/β), interleukin-6 (il-6), il-8, il-1β and tumor necrosis factor-α (tnf-α) (abe and matsuura, 2010; tani et al., 2008). baculovirus transduction of rat chondrocytes also elicits transient expression of ifn-α/β, which attenuates the transgene expression (lee et al., 2009). beyond cytokine secretion, in vitro baculovirus transduction also activates human (schutz et al., 2006) and mouse (abe et al., 2005) dcs. moreover, wild-type baculovirus transduction of mouse bone marrow-derived dcs (bmdcs) upregulates the major histocompatibility complex (mhc) i and ii and co-stimulation molecules (e.g. cd40, cd80 and cd86) (suzuki et al., 2010a). the activated bmdcs can further stimulate natural killer (nk) cells upon coculture, as evidenced by the ifn-γ production, cd69 upregulation and cell proliferation. in contrast to these differentiated, specialized cells, stem cells appear to be less sensitive to baculovirus transduction. upon wild-type baculovirus transduction, human bone marrow-derived mesenchymal stem cells (bmscs) secrete il-6 and il-8, but no detectable levels of ifn-γ, ifn-β, tnf-α, tnf-β, il-1β, il-2, il-4, il-5, il-10 and il-12 (chen et al., 2009b). human leukocyte antigen i (hla-i) is slightly upregulated, but the expression of hla-ii and other surface markers is barely disturbed (bak et al., 2010; chuang et al., 2009). neither does baculovirus transduction compromise the immunosuppressive properties of bmscs. likewise, wild-type baculovirus transduction of mouse induced pluripotent stem cells (ipscs) elicits only the chemokine ip-10, but not other well-known cytokines (unpublished data).
bmscs and ipscs are promising cell sources for regenerative medicine. the lack of these cytokine responses reduces the risk of mounting strong immune responses after transplantation of baculovirus-transduced bmscs and ipscs. in vivo, baculovirus administration triggers innate immune responses and activates macrophages (abe et al., 2005), dcs (hervas-stubbs et al., 2007; schutz et al., 2006) and nk cells (facciabene et al., 2004; kitajima et al., 2008). the baculovirus-induced innate immunity gives rise to antitumor activity (suzuki et al., 2010a) and is sufficient to confer protection against influenza virus in mice (abe et al., 2003), infectious bronchitis virus in chickens (niu et al., 2008) and foot-and-mouth-disease virus in mice (molinari et al., 2010). the innate immunity also endows baculovirus with adjuvant activity that promotes humoral and cellular immune responses against co-administered antigens (hervas-stubbs et al., 2007). moreover, intramuscular (i.m.) injection of baculovirus triggers t-cell responses against the vector, but the magnitude of the anti-baculovirus t-cell response is lower than that of the anti-adenovirus response (hervas-stubbs et al., 2007).

toll-like receptors (tlrs) are a family of pattern recognition receptors essential for initiating the innate immunity and shaping the adaptive immunity (ishii et al., 2008). for instance, tlr3 recognizes virus-derived dsrna while tlr9 recognizes unmethylated cpg dna motifs. upon engagement with cognate ligands, tlrs are activated and recruit adaptor molecules such as myeloid differentiation factor 88 (myd88) and tir domain-containing adaptor inducing ifn-β (trif) to transduce signals to downstream molecules.
in addition, stimulator of interferon genes (sting) is a cytoplasmic sensor that activates irf3 and ifn-α/β in response to viral dsdna, while retinoic acid-inducible gene i (rig-i) and melanoma differentiation-associated gene 5 (mda5) are tlr-independent cytoplasmic rna detectors that induce ifn-α/β production through ifn promoter-stimulator-1 (ips-1). baculovirus-induced innate immunity has been ascribed to the recognition of cpg motifs in viral dna and ensuing activation of the tlr9/myd88-dependent pathway (abe et al., 2005). the detection activates nuclear factor-κb (nf-κb) and leads to subsequent ifn-α/β production. however, ifn-α/β are still produced in peritoneal cells derived from myd88- or tlr9-deficient mice (abe et al., 2005). in mouse embryonic fibroblasts (mefs), baculovirus triggers ifn-β and ifn-inducible chemokines through tlr-independent and irf3-dependent pathways, and endosomal maturation is required for induction (abe et al., 2009). these data suggest that baculovirus dna might be recognized by at least two different pathways: tlr9-dependent endosomal recognition and tlr9-independent cytoplasmic recognition. the latter was suggested to be related to sting (abe and matsuura, 2010) because baculovirus-induced ifn-α/β production was impaired in sting-deficient mefs (ishikawa et al., 2009), yet ifn-α/β stimulation was found to be independent of cytoplasmic rna detectors such as rig-i and mda5 (abe et al., 2009). however, a recent study noted that rig-i and mda5 mrna levels were elevated in baculovirus-transduced cells. addition of dna methyltransferase inhibitors (dnmti) prior to transduction attenuated such upregulation and enhanced baculovirus-mediated gene expression, suggesting that dnmti may somehow help baculovirus evade cellular recognition and thus improve the transgene expression.
using the cdna microarray, wang and coworkers discovered that baculovirus injection into the striatum in the rat brain perturbed the expression of 628 genes, which represented 3.45% of the total gene probes on the microarray. the same study also identified 532 gene probes (1.99%) in human astrocytes and 1219 gene probes (4.32%) in human neuronal cells that were disturbed in response to baculovirus. despite the disparity between cell types, in all 3 samples baculovirus altered the expression of genes associated with the tlr signaling pathway (e.g. tlr2, tlr3, ccl5, cxcl10 and stat1) and cytokine-cytokine receptor interaction (e.g. cxcl10, cxcl11 and ccl5). moreover, interferon induction-related genes (e.g. cxcl10, mx1, mx2, oas1 and stat1) and genes of the antigen processing and presentation pathway (e.g. cd74 and rt-1ba) were affected. as such, wang and coworkers proposed that baculovirus recognition by tlrs triggers the expression of ifn-α/β, which initiates the subsequent signaling cascade involving stats, upregulates the expression of ifn-responsive genes and hence confers an antiviral state on the cells. consistent with the aforementioned findings, we also discovered that baculovirus transduction of human bmscs disturbed the expression of 816 genes, most of which were related to 5 signaling pathways: cell adhesion molecules, tlr, jak-stat, apoptosis as well as antigen processing and presentation (chen et al., 2009b). of note, baculovirus triggered the tlr3 pathway, resulting in downstream nf-κb and irf-3 activation and il-6/il-8 production. however, how baculovirus, which contains a dsdna genome, activated the dsrna sensor tlr3 remains an enigma. also, the transduction did not arouse the secretion of ifn-β (chen et al., 2009b), a signature cytokine associated with tlr3 activation, implying a signaling cascade somewhat distinct from that in immune cells.
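as a rough reading aid for the microarray figures quoted above (628 genes at 3.45%, 532 probes at 1.99% and 1219 probes at 4.32%), the short python sketch below back-calculates the array size implied by each count/percentage pair. the implied totals are our own arithmetic, not values reported by the cited study, and the function and variable names are hypothetical.

```python
# back-calculate the approximate number of probes implied by each
# (changed count, percent changed) pair quoted in the text.
# assumption: percent = changed / total * 100; the totals are estimates only.
reported = {
    "rat striatum": (628, 3.45),
    "human astrocytes": (532, 1.99),
    "human neuronal cells": (1219, 4.32),
}


def implied_total_probes(n_changed, percent_changed):
    """return the array size implied by a count and its percentage share."""
    return n_changed / (percent_changed / 100.0)


implied = {
    sample: implied_total_probes(n, pct) for sample, (n, pct) in reported.items()
}

for sample, total in implied.items():
    print(f"{sample}: roughly {total:,.0f} probes implied")
```

the differing implied totals are consistent with the rat and human samples having been profiled on different arrays, although the text does not state the array sizes explicitly.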
interestingly, the budded form of another baculovirus, antheraea pernyi nuclear polyhedrosis virus (apnpv), triggers tlr21 signaling in chicken macrophage-like cells (hd11) but not in chicken b cell-like cells (han et al., 2010). apnpv transduction of hd11 cells concomitantly induces the production of ifn-γ, il-12p40 and nitric oxide (no) (niu et al., 2008), which is accompanied by the phosphorylation of extracellular signal-regulated kinase 1/2 (erk 1/2), p38 mitogen activated protein kinase (mapk) and c-jun n-terminal kinase (jnk) as well as activation of p65-nf-κb (han et al., 2009). inhibition of p38 mapk and nf-κb by their respective inhibitors abrogates the expression of cytokines and no, whereas inhibition of jnk abolishes only the induction of cytokines. since in mammals tlr signaling activates the downstream nf-κb and mapk cascade comprising at least p38 mapk, erk 1/2 and jnk (ishii et al., 2008), these data altogether suggest that apnpv transduction of hd11 cells activates tlr21 and signals through the nf-κb, p38 mapk and jnk pathways, and that chicken tlr21 might play a role similar to mammalian tlr9 (han et al., 2009, 2010).

baculovirus envelope protein gp64 comprises an n-terminal signal peptide and a mature domain that encompasses the transmembrane domain and cytoplasmic domain (ctd). heterologous proteins/peptides have been inserted between the signal peptide and the mature domain; after expression under the polyhedrin or p10 promoter, the resulting fusion protein is translocated to the plasma membrane and incorporated into the viral envelope upon virus budding. this feature has been exploited for surface display of proteins/peptides to improve the virus transduction (grabherr and ernst, 2010; grabherr et al., 2001; raty et al., 2004) or for ligand-directed targeting if an appropriate ligand is chosen (kitagawa et al., 2005; makela et al., 2008).
for instance, baculovirus poorly transduces b lymphocytes (cheng et al., 2004; condreay et al., 1999). via gp64 fusion, the transduction has been enhanced by displaying the short peptide motif from gp350/220 of epstein-barr virus (ebv, which naturally infects b cells) on the baculovirus envelope (ge et al., 2007). alternatively, the cytoplasmic transduction peptide (ctp) has been fused to gp64 to enhance the baculovirus transduction of vero e6, u-2os and cho-rd cells (chen et al., 2011b). another paradigm is the display of the fragment crystallisable (fc) region of antibody on the baculovirus surface (martyn et al., 2009). fc receptors (fcrs) are membrane proteins that bind to the fc region of antibody and mediate phagocytosis and antigen presentation. the fc display allows for specific baculovirus targeting to cell lines and antigen presenting cells (apcs) expressing fcrs, hence augmenting the vaccine effect (martyn et al., 2009). the display system also allows for the surface presentation of functional membrane proteins to simplify subsequent isolation. aside from the gp64-aided display, expression of vesicular stomatitis virus g protein (vsvg) (chapple and jones, 2002; makela and oker-blom, 2006), influenza virus neuraminidase (borg et al., 2004), spodoptera exigua multiple nucleopolyhedrovirus f protein (yu et al., 2009), single chain antibody fragments (kitidee et al., 2010) and human endogenous retrovirus envelope protein (lee et al., 2010) in insect cells also leads to incorporation of the protein into the baculovirus envelope. among these strategies, display of vsvg or heterologous peptide/protein via the vsvg anchor is the most widely adopted and can tremendously enhance baculovirus transduction in vitro and in vivo (facciabene et al., 2004; kaikkonen et al., 2006; kaneko et al., 2006; kitagawa et al., 2005; lu et al., 2006; tani et al., 2001, 2003; zhou and blissard, 2008). serum complement proteins (e.g.
c5b-9) inactivate baculovirus, hence constituting a major hurdle in the in vivo use of baculovirus. the inactivation problem has been circumvented by using complement inhibitors such as compstatin (georgopoulos et al., 2009) and soluble complement inhibitor 1 (hoare et al., 2005; hofmann et al., 1999), by avoiding exposure to the complement (airenne et al., 2000) or by administering the vector to immunoprivileged sites (kinnunen et al., 2009; lehtolainen et al., 2002). alternatively, the baculoviral vector can be rendered complement-resistant by displaying human daf (decay accelerating factor) via gp64 fusion (huser et al., 2001; kaname et al., 2010). more recently, baculoviral vectors displaying different complement regulatory proteins (e.g. daf, factor h-like protein-1, c4b-binding protein and membrane cofactor protein) were generated by fusion with the membrane anchor of vsvg (kaikkonen et al., 2010). these surface-modified vectors exhibited varying degrees of complement resistance in vitro and daf yielded the highest level of protection. intraportal delivery of the daf-displaying baculovirus increased the survival and gene expression in immunocompetent mice. the daf-displaying baculovirus provoked lower levels of the inflammatory cytokines il-1β, il-6 and il-12p40 in macrophages and mitigated liver inflammation in mice when compared with the control virus. these results indicate that daf display protects the baculoviral vector against complement inactivation and attenuates complement-mediated inflammatory injury. other than the display on the envelope, heterologous protein has been displayed on the capsid by fusion with the major capsid protein vp39. the vp39 fusion with enhanced green fluorescent protein (egfp) neither interferes with the virus assembly nor affects the virus titer, thereby enabling intracellular baculovirus trafficking and biodistribution monitoring (kukkonen et al., 2003).
similarly, the zno binding peptide has been fused to the n-terminus of vp39 while retaining the viral infectivity and conferring the ability to bind nanosized zno powders (song et al., 2010). in addition, fusing the protein transduction domain (ptd) of human immunodeficiency virus (hiv) tat protein (a protein responsible for nuclear import of hiv genome) with vp39 yields an engineered baculovirus with improved transduction of various mammalian cells (chen et al., 2011b). in addition to surface modification by genetic engineering, baculovirus has been labeled with tracers for tracking (li et al., 2004) or biodistribution imaging (raty et al., 2007). chemical coupling of antigenic peptides also enables rapid modification of baculovirus particles and delivery of multiple epitopes (wilson et al., 2008). baculovirus can also be chemically conjugated with polyethylene glycol (peg) alone (kim et al., 2006) or together with folate (kim et al., 2007, 2010) to improve the transduction of folate receptor-positive kb cells. additionally, baculoviral vectors have been coated with positively charged polyethylenimine (25 kda) through electrostatic interactions (yang et al., 2009). the modification renders baculoviral vectors resistant to human and rat serum-mediated inactivation in vitro and elevates in vivo transduction in the liver and spleen after tail vein injection into mice.

rna interference (rnai) is a phenomenon that mediates sequence-specific post-transcriptional gene silencing and can be artificially elicited by the expression of short hairpin rnas (shrna) or microrna (mirna) precursors from an expression vector (garzon et al., 2010). since the initial demonstrations of baculovirus-mediated shrna delivery (nicholson et al., 2005; ong et al., 2005), baculovirus-mediated rnai has been explored for antiviral therapy.
one baculovirus expressing the shrna specific for the c-terminal nucleoprotein of the porcine reproductive and respiratory syndrome virus (prrsv) genome inhibits the viral replication in vitro (lu et al., 2006). the baculovirus harboring a bispecific shrna targeted against influenza virus a and b can inhibit the production of both virus types in transduced cell lines (suzuki et al., 2009b). another baculovirus expressing shrna against the peste des petits ruminants virus (pprv) represses pprv replication in vitro and the inhibition effect is superior to that mediated by the adenovirus expressing the same shrna (nizamani et al., 2011). hepatitis b virus (hbv) covalently closed circular (ccc) dna is the source of hbv transcripts in chronically infected patients. baculovirus expressing the hbv-specific shrna is able to hinder the formation of hbv ccc dna (starkey et al., 2009). additionally, the baculovirus expressing shrnas specific for the highly conserved core region of the hepatitis c virus (hcv) genome dramatically impedes the target protein expression (suzuki et al., 2009c). the replication and segregation of ebv genomes to daughter cells are coordinated by the binding of ebv nuclear antigen 1 (ebna1) to orip, an origin of replication derived from ebv. to prolong the transgene expression, baculoviral vectors incorporating the orip/ebna1 sequences were developed (shan et al., 2006; wang et al., 2008). based on this concept, suzuki et al. devised a baculoviral vector that accommodated the orip/ebna1 sequence and encoded the hcv-specific shrna. this vector inhibited hcv core protein expression for >14 days, which considerably outlasted the 3 days of inhibition conferred by the conventional vector (suzuki et al., 2009a). recently, we also exploited the baculovirus vector for mirna delivery and gene regulation.
the baculovirus vectors carried artificial gene-specific mirna sequences within the mir155 scaffold, which after expression under the cmv promoter could undergo the mirna processing pathway and knock down the expression of the target gene (e.g. egfp and tnf-α) (unpublished data). the gene suppression effect was extended by flanking the mirna expression cassette with the inverted terminal repeat sequences/direct repeats sequences (ir/dr) recognized by the sleeping beauty (sb) transposase. co-transduction of cells with the hybrid baculovirus/transposon vector and another sb transposase-expressing baculovirus resulted in the integration of the mirna expression cassette into the chromosome, giving rise to prolonged target gene suppression (unpublished data). these data altogether indicate the potential of the baculovirus-based rnai shuttle for antiviral therapy and for the treatment of indications necessitating prolonged gene regulation (e.g. arthritis).

the feasibility of baculovirus-mediated cancer therapy was first explored by wang et al. (2006), who developed a baculoviral vector expressing the diphtheria toxin a and demonstrated that the baculovirus impeded the growth of cultured glioma cells and glioma xenograft in the rat brain. since then, baculovirus has been utilized for the treatment of mouse hepatoma (kitajima et al., 2008), murine melanoma, lung cancer and brain cancer (kim et al., 2007). recently, wang and coworkers constructed another baculovirus that selectively expressed herpes simplex virus thymidine kinase (hsvtk) in tumors for glioma suicide gene therapy (balani et al., 2009). the hsvtk gene was driven by a truncated human high mobility group box2 (hmgb2) promoter, which allowed hsvtk expression in glioblastoma tissues but not in normal brain tissues. this vector triggered death of glioblastoma cells in the presence of ganciclovir, but did not affect the survival of human astrocytes and neurons.
in a mouse xenograft model, intratumoral injection of this baculovirus at 7 days after tumor inoculation suppressed the growth of human glioblastoma and prolonged the mouse survival. however, the tumor mass was not completely eradicated after a single baculovirus injection, presumably due to the transient hsvtk expression (balani et al., 2009). alternatively, the hsvtk gene was driven by the gfap (glial fibrillary acidic protein) promoter whose activity is similar in normal and tumor cells of the same lineage (wu et al., 2009a). to restrict the transgene expression to glioma cells, repeated target sequences of 3 mirnas (hsa-mir31, hsa-mir127 and hsa-mir143) that are enriched in astrocytes but are sparse in glioblastoma cells were appended to the 3′ untranslated region of the hsvtk gene. the baculovirus vector markedly improved in vivo selectivity when compared with the control vector without mirna regulation, effectively inhibited human glioma xenograft and imparted negligible toxicity to normal astrocytes. the incorporation of selected mirna target sequences thus provides an additional safety switch to prevent off-target transgene expression (wu et al., 2009a). in addition to hsvtk, the baculovirus expressing nes1 (normal epithelial cell specific-1) can inhibit the growth of gastric cancer cells (sgc-7901) in vitro and repress the sgc-7901 xenografted tumor growth after intratumoral injection, suggesting its potential for the therapy of gastric and colon cancers (huang et al., 2008). baculovirus can also carry tumor suppressor genes such as p53 or pdcd4 (programmed cell death 4) for cancer treatment.
in the mouse glioma xenograft models, the antitumor effect of the baculovirus-expressed p53 was enhanced by surface coating of the baculovirus envelope with polyethylenimine (yang et al., 2009) or by co-administering sodium butyrate (guo et al., 2011), a histone deacetylase inhibitor that enhances baculovirus-mediated gene expression (guo et al., 2010; hu et al., 2003; yin et al., 2010; zhou et al., 2010). the pdcd4-expressing baculovirus conjugated with folate-peg efficiently expressed pdcd4 protein and induced apoptosis in human epidermal carcinoma cells (kim et al., 2010). in a tumor xenograft, intratumoral injection of the pdcd4-expressing baculovirus significantly suppressed tumor growth and induced apoptosis. moreover, apoptin is a chicken anemia virus-derived protein that specifically triggers apoptosis in tumors. the vsvg-pseudotyped baculovirus expressing apoptin efficiently provoked apoptosis in mammalian cells, repressed the growth of xenograft tumors and prolonged mouse survival after intratumoral injection (pan et al., 2010). one determinant of the baculovirus-mediated antitumor effects is the innate immune response elicited by the virus (suzuki et al., 2010b). takaku and coworkers demonstrated that wild-type baculovirus, after intravenous (i.v.) injection into immunocompetent b6 mice, was taken up by the liver and spleen, preferentially entered dcs and b cells, activated dcs, induced nk cell proliferation in the liver and spleen, and enhanced the antitumor immunity in mice with b16 liver metastases (kitajima et al., 2008). baculovirus administration also increased the survival of c57bl/6, jα281 −/− and ifn-γ −/− mice bearing the b16 tumor, but did not enhance the survival of nk cell-depleted mice. these results indicate that wild-type baculovirus efficiently induces nk cell-mediated antitumor immunity (kitajima et al., 2008).
besides direct injection, baculovirus has been employed in conjunction with cell therapy for cancer treatment. bone marrow-derived dcs (bmdcs), after ex vivo transduction with wild-type baculovirus and i.v. injection into mice, significantly suppressed lung cancer (suzuki et al., 2010a). in a mouse melanoma model, baculovirus-transduced bmdcs inhibited tumor growth and improved animal survival, at least partly due to the induction of cd8+ t cell- and nk cell-dependent, cd4+ t cell-independent antitumor immunity. importantly, bmdc administration did not provoke evident signs of damage to the liver or kidney, as judged from the negligible disturbance of serum alanine aminotransferase (alt), aspartate aminotransferase (ast) and creatinine levels (suzuki et al., 2010a). another promising cell courier is bmscs, thanks to their intrinsic tumor homing property. in this regard, bmscs were transduced with a baculovirus expressing hsvtk and injected via the tail vein into mice pre-inoculated with human u87 glioma cells (bak et al., 2010). after ganciclovir injection, tumor growth was significantly repressed and the life span of animals was considerably prolonged. more recently, the same group generated msc-like cells from human embryonic stem (es) cells which, after transduction with the baculovirus expressing hsvtk and injection into the brain, were capable of inhibiting tumor growth and prolonging the survival of tumor-bearing mice in the presence of ganciclovir (bak et al., 2011). these data suggest that human es cell-derived mscs are a viable and attractive alternative for cancer therapy.

gene therapy has converged with tissue engineering, by which the therapeutic genes stimulating tissue regeneration can be administered to cells either in vivo or ex vivo with an appropriate vector. given the highly efficient gene delivery, baculovirus has been used for cartilage and bone engineering (lin et al., 2010b).
articular cartilage is a weight-bearing tissue comprising chondrocytes and extracellular matrix (ecm) composed of proteins (e.g. collagen ii) and glycosaminoglycans (gags) such as aggrecan. articular cartilage may be damaged due to direct trauma or joint diseases, but its ability to self-repair is limited, ultimately leading to debilitating pain and physical impairment. therefore, cartilage tissue engineering that combines cells, scaffolds and biological signals has emerged for cartilage repair. given that chondrocytes are pivotal in the synthesis, composition modulation and structural arrangement of ecm components and hence the mechanical properties, we developed a protocol for baculovirus transduction of rat (ho et al., 2004) and rabbit chondrocytes. this protocol involves the incubation of cells with the virus at 25-27°c for 4-8 h using dulbecco's phosphate-buffered saline (d-pbs) as the surrounding solution and confers efficiencies higher than 90%. the key to such high transduction efficiencies is that d-pbs lacks nahco3, which hinders the baculovirus transduction (shen et al., 2007). consequently, the surrounding solution has been replaced with nahco3-deficient dmem medium. critically, chondrocytes transduced with an egfp-expressing baculovirus remain capable of producing cartilage-specific collagen ii and gags, and grow into cartilaginous tissues when seeded into the porous poly(l-lactide-co-glycolide) (plga) scaffold and cultivated dynamically. these data demonstrate that baculovirus transduction of chondrocytes does not obstruct chondrocyte differentiation. therefore, we further constructed 3 baculovirus vectors each expressing one growth factor (tgf-β1, insulin-like growth factor-1 (igf-1) or bone morphogenetic protein-2 (bmp-2)) known to stimulate chondrogenesis, and confirmed the protein expression in chondrocytes isolated from new zealand white (nzw) rabbits.
among the 3 baculovirus vectors, only the bmp-2-expressing baculovirus (designated bac-cb) remarkably enhanced the aggrecan and collagen ii production by partially de-differentiated passage 3 (p3) cells and restored the differentiation. the baculovirus expressing tgf-β1 (designated bac-ct) modestly augmented the chondrogenesis but was insufficient to reverse the de-differentiation status of p3 cells. nonetheless, co-transduction of de-differentiated p3 chondrocytes with bac-cb and bac-ct synergistically modulated the re-differentiation program (sung et al., 2009). despite its chondrogenic potential, bac-cb transduction alone was insufficient to support uniform 3d cartilage growth in the static culture due to the lack of mechanical stimulation and poor oxygen/nutrient transfer. to tackle these problems, p3 chondrocytes transduced with bac-cb were seeded into plga scaffolds and cultured in a rotating-shaft bioreactor (rsb), where they grew into cartilage-like tissues after 3-week dynamic culture. implantation of the engineered cartilages into the osteochondral defects of nzw rabbits resulted in the regeneration of hyaline cartilages at week 8 and improved the integration of the host and engineered cartilages (chen et al., 2009c).

massive segmental bony defects often occur following trauma or tumor resection and remain a clinical problem. for bone regeneration, bmscs have evolved to be a promising cell source as they are immunoprivileged, immunosuppressive and capable of differentiating into osteoblasts. in this regard, baculovirus transduces bmscs with efficiencies exceeding ≈95% under optimized conditions, but the transduction efficiencies for bmsc-derived progenitors vary widely with the differentiation states at which the committed progenitors are transduced (lee et al., 2007b).
furthermore, transduction of human bmscs with bac-cb (which expresses the potent osteogenic factor bmp-2) directed in vitro commitment of naïve bmscs into osteoblasts (chuang et al., 2007). after injection into the back subcutis of immunodeficient nude mice, the transduced bmscs resulted in progressive mineralization and ectopic bone formation (chuang et al., 2007). implantation of the bac-cb-engineered human bmscs into rat calvarial defects stimulated mineralized bone matrix deposition and initiated the bone island formation at week 4, but without immunosuppression the xenogeneic cells were rejected and eradicated at week 12 (chuang et al., 2010). to circumvent the rejection, bmscs isolated from nzw rabbits were used for baculovirus transduction and allotransplantation. given the important roles of vascular endothelial growth factor (vegf) in angiogenesis, ossification and callus maturation, rabbit bmscs were transduced with bac-cv (expressing vegf) or bac-cb, mixed at a number ratio of 1:4, seeded into plga scaffolds and implanted into the critical-size (10 mm in length) femoral segmental defects of allogeneic, immunocompetent nzw rabbits, which represent a fairly rigorous bone healing scenario (lin et al., 2010a). x-ray radiography, positron emission tomography (pet), micro computed tomography (μct), immunohistochemical staining and biomechanical testing illustrated that the baculovirus-engineered cell/scaffold constructs not only accelerated the bone healing, but also gave rise to prominent angiogenesis and improved mechanical properties at week 8. adipose-derived stem cells (ascs) are another promising cell source for regenerative medicine thanks to the ease of isolation in abundance and their capacity for osteogenic differentiation (levi et al., 2010).
however, ascs appear to be inferior to bmscs in osteogenic differentiation, and ascs engineered by the conventional baculovirus transiently expressing bmp-2/vegf (s group) led to significantly poorer healing of segmental femoral bone defects than bmscs engineered by the same vectors (fig. 2). to use ascs for repairing large, segmental bone defects, we surmised that sustained expression of factors promoting osteogenesis (bmp-2) and angiogenesis (vegf) is necessary. as such, we employed a hybrid baculovirus system developed previously for persistent expression. the dual vector system comprises two baculoviruses, whereby one expresses the flippase recombination enzyme (flp) while the other accommodates the bmp-2 or vegf cassette flanked by the flippase recognition target (frt) sequences. within mammalian cells co-transduced with the hybrid baculoviruses, flp catalyzes the recombination of the frt-flanked cassette, resulting in excision of the cassette from the baculovirus genome, formation of an episomal circle and substantially prolonged transgene expression. likewise, the flp/frt-mediated recombination occurred efficiently in the nzw rabbit ascs, enabling persistent transgene expression for >28 days. allotransplantation of the nzw rabbit ascs transduced with the hybrid baculoviruses expressing bmp-2/vegf into the critical-size femoral segmental defects accelerated the healing and improved the bone quality and angiogenesis when compared with transplanting ascs engineered with the conventional baculoviruses (fig. 2). therefore, ascs engineered by the hybrid baculovirus vector hold promise for repairing massive segmental defects. the safety profile of the hybrid baculovirus vectors was recently assessed using human bmscs as the host (chen et al., 2011a). transduction of human bmscs with the hybrid baculovirus neither compromised the cell viability/differentiation, nor resulted in transgene integration into the host chromosome.
the transduction did not disrupt the bmsc karyotype, nor did it disturb the expression of well-known proto-oncogenes (c-myc, n-ras, k-ras and h-ras) and tumor suppressor genes (p53 and p16). furthermore, the transduced bmscs did not induce tumor formation in nude mice. these data altogether support the safe use of the flp/frt-based hybrid vector for stem cell engineering and tissue regeneration.

as discussed in section 2.3, wild-type baculovirus triggers innate immunity and potentiates adaptive immune responses, which protect animals from infection by several viruses. these attributes have sparked intense interest in developing baculovirus as a vaccine vector candidate, in which the antigens can be (1) expressed by the vector within the host cells, (2) displayed on the baculovirus surface or (3) both displayed and expressed by the vector (table 2). the feasibility of using baculovirus as a vaccine expression vector was first tested by in vivo inoculation of mice with baculovirus expressing the glycoprotein gb of pseudorabies virus (aoki et al., 1999) or hemagglutinin (ha) of h1n1 influenza virus (abe et al., 2003), which raised antigen-specific antibodies. moreover, baculovirus expressing the e2 glycoprotein of hcv or carcinoembryonic antigen could elicit antigen-specific t cell responses (facciabene et al., 2004). similarly, subcutaneous (s.c.) and intraperitoneal (i.p.) immunizations of mice with a baculovirus expressing the antigens of severe acute respiratory syndrome coronavirus (sars-cov) induced humoral immune responses and th1-biased cellular immunity (bai et al., 2008). the efficacy of baculovirus-based vaccines has been potentiated by surface display of vsvg protein on the envelope.
the vsvg-pseudotyped baculovirus expressing the gp5 and m proteins of porcine reproductive and respiratory syndrome virus (prrsv) under the cmv promoter elicited anti-prrsv neutralizing antibodies and ifn-γ, and conferred better immunogenicity than the dna vaccine expressing the same antigens. similarly, intramuscular (i.m.) immunization with a mixture of vsvg-pseudotyped baculoviruses expressing pseudorabies virus (prv) proteins triggered th1-biased immune responses, as manifested by the induction of ifn-γ and prv-specific igg2a antibodies (grabowska et al., 2009). the immunization also provoked nk cell activity (accompanied by the production of ifn-α and ifn-γ) and protected mice against lethal prv challenge. i.m. immunization of chickens with another vsvg-pseudotyped baculovirus expressing ha of h5n1 avian influenza virus (aiv) also evoked significantly higher levels of h5-specific antibody and cellular immunity than dna vaccines did, and conferred protection against lethal challenge with the homologous virus strain (wu et al., 2009b). similar vsvg-pseudotyped baculovirus vectors have been constructed to express the japanese encephalitis virus (jev) envelope protein (li et al., 2009b), the capsid protein of porcine circovirus type 2 (pcv2) (fan et al., 2008), and the toxoplasma gondii sag1 protein. all these studies showed that i.m. immunization with the vsvg-pseudotyped baculovirus vector results in stronger vaccine effects than the dna vaccine counterparts, which can be attributed to the adjuvant properties of baculovirus and more efficient transduction of muscle cells in vivo. effective transduction of muscle cells is desirable because it increases the chance of antigen presentation to dcs and because baculovirus-mediated expression is considerably prolonged in myogenic cells (shen et al., 2008).
note, however, that vsvg pseudotyping induces cytotoxicity and impairs the baculovirus titer, hence a truncated version of vsvg comprising the 21 amino acid ectodomain, the transmembrane domain and the ctd was designed for baculovirus pseudotyping (kaikkonen et al., 2006). the truncated vsvg reduced the cytotoxicity and was exploited to construct two pseudotyped baculoviruses: one expressing the capsid protein of foot-and-mouth disease virus (fmdv) and the other encoding the capsid and a t-cell immunogen. i.m. inoculation of the baculoviruses into mice induced fmdv-specific neutralizing antibodies and ifn-γ, and expression of the additional t-cell immunogen augmented the immunogenicity.

fig. 2. the ascs engineered with the hybrid baculovirus augment the healing of massive bone defects. the nzw rabbit ascs were transduced with the hybrid baculovirus vectors conferring sustained expression of bmp-2 or vegf, mixed at a number ratio of 4:1, loaded into cylindrical plga scaffolds (1.5 × 10^6 cells/scaffold) and implanted into the critical-size segmental defects at the femora of nzw rabbits (2 constructs/defect, designated l group). the s group contained ascs that were transduced with conventional baculoviruses transiently expressing bmp-2/vegf and implanted in a similar fashion. the mock group comprised the mock-transduced ascs as the negative control. x-ray radiography, gross appearance examination, micro computed tomography (μct), hematoxylin and eosin (h&e) staining and cd31-specific immunohistochemical staining (to detect blood vessel formation) performed at 12 weeks post-implantation collectively demonstrated that the l group (persistently expressing bmp-2 and vegf) resulted in significantly improved bone healing and angiogenesis in comparison with the s group (transiently expressing bmp-2 and vegf) and mock group. stars indicate the new bone while arrows indicate the blood vessel formation.
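the cell numbers quoted in the fig. 2 legend can be turned into a per-defect dose with simple arithmetic. the following is a minimal sketch, assuming that the 4:1 ratio refers to bmp-2-expressing versus vegf-expressing ascs (the order in which the two factors are named in the legend) and that both constructs in a defect are loaded identically; the function name and dictionary keys are hypothetical and are not taken from the original study.

```python
def cells_per_defect(cells_per_scaffold, constructs_per_defect, ratio_bmp2_to_vegf):
    """Split the total number of implanted cells by the stated mixing ratio.

    Assumption (not stated explicitly in the legend): the ratio is
    bmp-2-expressing : vegf-expressing cells, and every construct in a
    defect carries the same cell load.
    """
    bmp2_parts, vegf_parts = ratio_bmp2_to_vegf
    total = cells_per_scaffold * constructs_per_defect
    bmp2 = total * bmp2_parts // (bmp2_parts + vegf_parts)
    vegf = total - bmp2
    return {"total": total, "bmp2": bmp2, "vegf": vegf}


# Values from the legend: 1.5 x 10^6 cells/scaffold, 2 constructs/defect, ratio 4:1.
dose = cells_per_defect(1_500_000, 2, (4, 1))
print(dose)
```

under these assumptions each defect receives 3.0 × 10^6 ascs in total, of which 2.4 × 10^6 express bmp-2 and 0.6 × 10^6 express vegf.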
besides vsvg, human endogenous retrovirus (herv) envelope protein was used for pseudotyping the baculovirus (designated acherv-hp16l1) that encoded the l1 capsid protein of human papillomavirus 16 (hpv 16) (lee et al., 2010). after i.m. injection and 2 booster injections at 2-week intervals, the mice developed similarly high levels of igg, iga and neutralization titers, as well as markedly higher levels of ifn-γ, when compared with the mice immunized with the commercial virus-like particle vaccine gardasil® (25 μl/dose). thus acherv-hp16l1 holds promise as a cost-effective and efficient hpv vaccine. as mentioned in section 3.1, surface display of heterologous protein/peptide has been achieved by fusion with the gp64 gene and expression driven by either the polyhedrin or the p10 promoter. taking advantage of this technology, baculovirus displaying the rodent malaria plasmodium berghei circumsporozoite protein (pbcsp) (yoshida et al., 2003), the antigenic site a of fmdv vp1 protein (tami et al., 2004) or the sars-cov spike protein (chang et al., 2004; feng et al., 2006) has been shown to induce potent immune responses. more recently, baculovirus vectors displaying the pfs25 protein of plasmodium falciparum (mlambo et al., 2010) and the 19-kda carboxyl terminus of merozoite surface protein 1 (pymsp1-19) of plasmodium yoelii were also developed as potential vaccines against malaria. i.m. and intranasal (i.n.) immunizations of mice with the pymsp1-19-displaying baculovirus induced mixed th1/th2-type immune responses, but i.n. immunization yielded higher pymsp1-19-specific antibody titers and natural boosting after challenge. i.n. immunization also conferred complete protection thanks to the th1/th2-type immunity associated with the tlr9-dependent pathway. ha protein is the major antigen of influenza virus and, similar to gp64, is a class i transmembrane protein on the viral envelope.
gp64-based fusion was harnessed to display the ha of aiv using the ctd derived from ha or from gp64 (yang et al., 2007). in comparison with the ha ctd, the gp64 ctd endowed more efficient ha display and stimulated stronger hemagglutination inhibition (hi) titers, which are crucial for neutralizing the live aiv (yang et al., 2007). the profound effect of the gp64 ctd on ha display was confirmed in a recent study showing that the signal peptide and ctd of gp64 could enhance the display of influenza ha on the baculovirus surface, while the gp64 transmembrane domain impaired ha display (tang et al., 2010). based on this finding, a baculovirus simultaneously displaying 4 ha proteins derived from 4 subclades of h5n1 aiv was constructed. i.m. immunization of mice with this tetravalent baculovirus elicited hi titers against all 4 homologous h5n1 viruses, significantly reduced viral lung titers of challenged mice, raised high levels of ifn-γ-secreting and ha-specific cd8+ t cells, and provided 100% protection against lethal challenge with homologous h5n1 viruses (tang et al., 2010). the same design concept exploiting the gp64 ctd was also adopted to display the σc and σb proteins of avian reovirus (lin et al., 2008), the erns envelope glycoprotein, e2 (xu and liu, 2008) or ns3 protein (xu et al., 2009) of classical swine fever virus (csfv) and the e glycoprotein of jev (xu et al., 2011). these studies collectively showed that the antigen can be efficiently displayed on the baculoviral envelope and induce humoral/cellular immune responses against lethal viral challenge. other than displaying the antigen via genetic engineering, a cd4+ t helper epitope (sferfeipke) and the major b cell epitope (wltekegsyp) derived from the ha of influenza virus a/pr/8/34 have also been chemically conjugated to the baculovirus envelope (wilson et al., 2008). i.n.
administration of such baculovirus to mice elicited antigen-specific igg1a/igg2a, iga and ifn-γ, thus chemical coupling allows multiple epitopes to be attached to baculovirus for delivery. in addition, ha of aiv (h5n1) has been displayed on another baculovirus, bombyx mori npv (bmnpv), via fusion with gp64 (jin et al., 2008). the virus was produced in and purified from silkworm pupae infected with the recombinant virus. immunization of rhesus monkeys at doses of 2 mg/kg and 0.67 mg/kg, with aluminum hydroxide as the adjuvant, elicited neutralizing antibodies and protected the monkeys against influenza virus challenge. the vaccine did not cause appreciable toxicity at doses as large as 3.2 mg/kg in cynomolgus monkeys and 1.6 mg/kg in mice, indicating that the vaccine doses were safe (jin et al., 2008).

table 2 (partially recovered; entries listed as antigen, pathogen, reference where available).
expression: foot-and-mouth disease virus (cao et al., 2011); gp5 and m proteins, porcine reproductive and respiratory syndrome virus; hpv-16 l1 protein, human papillomavirus (lee et al., 2010); toxoplasma gondii (fang et al., 2010).
display: hemagglutinin protein, influenza virus (jin et al., 2008; tang et al., 2010; yang et al., 2007); (xu et al., 2009; xu and liu, 2008); 19-kda carboxyl terminus of merozoite surface protein 1, plasmodium yoelii; pfs25 protein, plasmodium falciparum (mlambo et al., 2010).
dual: hemagglutinin protein, influenza virus (he et al., 2008; lu et al., 2007; prabakaran et al., 2008, 2010a, 2010b, 2010c); plasmodium falciparum (strauss et al., 2007).

7.3. baculovirus as a dual vector for antigen expression/display

conceptually, the antigen-expressing baculovirus mimics the dna vaccine. following vector injection into the hosts, the antigen is expressed and presented via the mhc i pathway in the transduced cells, and activates cd8+ t cells. conversely, the antigen-displaying baculovirus mimics the subunit vaccine.
antigens on the envelope are internalized by professional antigen presenting cells (apcs) such as macrophages and dcs, and presented by the mhc ii pathway to stimulate humoral immune responses. to fully exploit both mechanisms, a dual baculovirus vector that simultaneously displays and expresses the plasmodium falciparum circumsporozoite (cs) protein was developed as a human malaria vaccine (strauss et al., 2007). this vector contained two expression cassettes, one encoding the cs-gp64 fusion protein under the polyhedrin promoter and the other accommodating the cs gene under the cmv promoter, such that cs can be displayed on the envelope and expressed after transduction into mammalian cells. upon i.m. injection into mice, the dual vector induced higher anti-cs antibody titers and higher frequencies of cs-specific cd4+ and cd8+ t cells than the vectors that only displayed or only expressed cs. recently, we adopted a similar approach and constructed 3 vectors: bac-ha64 harbored the ha gene of aiv under the p10 promoter for ha display; bac-cha expressed ha under the cmv promoter; and the dual vector bac-cha/ha64 encompassed both expression cassettes in opposite orientations, such that bac-cha/ha64 displayed ha on the envelope and expressed ha in the transduced cells. all 3 vectors, after administration (i.m., i.n. and s.c.) into balb/c mice, provoked ha-specific humoral (igg1, igg2a and hi titers), mucosal (iga titers) and cellular (ifn-γ and il-4 producing t cells and ifn-γ+/cd8+ t cells) immune responses. the strong cellular immunity provoked by bac-ha64, which in theory favors the mhc ii pathway and preferentially elicits humoral immune responses, was likely due to the potent adjuvant effects of baculovirus. regardless, via each administration route the dual vector bac-cha/ha64 triggered ha-specific immune responses that were superior, or at least comparable, to those of the other two vaccine forms, demonstrating the advantages of the dual form for vaccine design.
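the relationship between cassette design and vaccine form described above can be summarized as a small lookup model. the sketch below is illustrative only: it assumes that promoter activity is binary (polyhedrin and p10 active in insect cells, cmv active in mammalian cells, the wssv ie1 promoter active in insect, mammalian and avian cells), which is a deliberate simplification, and the names `Cassette`, `PROMOTER_HOSTS`, `active_hosts` and `vector_form` are hypothetical. an antigen is treated as displayed when it carries an envelope anchor and is made in the insect cells that produce the virus, and as expressed when its promoter is active in mammalian cells.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

# Simplified, binary host ranges for the promoters discussed in the text.
# Assumption: real promoter activity is graded, not on/off.
PROMOTER_HOSTS = {
    "polyhedrin": {"insect"},
    "p10": {"insect"},
    "cmv": {"mammalian"},
    "wssv_ie1": {"insect", "mammalian", "avian"},
}


@dataclass(frozen=True)
class Cassette:
    antigen: str
    promoters: Tuple[str, ...]  # a single promoter, or several in tandem
    envelope_anchored: bool     # e.g. a gp64 fusion, or a native envelope protein


def active_hosts(cassette: Cassette) -> Set[str]:
    """Union of the host cell types in which any promoter of the cassette is active."""
    hosts: Set[str] = set()
    for name in cassette.promoters:
        hosts |= PROMOTER_HOSTS[name]
    return hosts


def vector_form(cassettes: Iterable[Cassette]) -> str:
    """Classify a vector as 'display', 'expression', 'dual' or 'none'."""
    cassettes = list(cassettes)
    displayed = any(c.envelope_anchored and "insect" in active_hosts(c) for c in cassettes)
    expressed = any("mammalian" in active_hosts(c) for c in cassettes)
    if displayed and expressed:
        return "dual"
    if displayed:
        return "display"
    if expressed:
        return "expression"
    return "none"


# The three ha vectors described above, encoded under the stated assumptions.
bac_ha64 = [Cassette("ha", ("p10",), True)]
bac_cha = [Cassette("ha", ("cmv",), False)]
bac_cha_ha64 = bac_ha64 + bac_cha

for name, vec in [("bac-ha64", bac_ha64), ("bac-cha", bac_cha), ("bac-cha/ha64", bac_cha_ha64)]:
    print(name, vector_form(vec))
```

in this model a single cassette can also yield the dual form when its promoters jointly cover insect and mammalian cells, for example a gp64 fusion driven by tandem cmv and polyhedrin promoters.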
instead of using two separate cassettes as above, yoshida et al. (2009) devised a single cassette system whereby the pbcsp-gp64 fusion gene was driven by a tandem promoter consisting of the cmv and polyhedrin promoters, so that pbcsp was expressed in both insect cells and mammalian cells. i.m. immunization with this baculovirus elicited both th1 and th2 responses, as evidenced by the high pbcsp-specific igg1/igg2a titers and pbcsp-specific cd8+ t-cell responses, and conferred 100% protection against sporozoite challenge (yoshida et al., 2009). the same dual expression system was subsequently utilized to express the plasmodium vivax transmission-blocking immunogen (pvs25) for the development of malarial transmission-blocking vaccines against the sexual stages of the parasite (blagborough et al., 2010). both i.n. and i.m. immunizations of mice induced a mixed th1/th2 response, as evidenced by high pvs25-specific igg1 and igg2a/igg2b titers, as well as a strong transmission-blocking response after challenge. another simple approach to developing the dual vector is to drive the antigen expression using the white spot syndrome virus (wssv) ie1 promoter, which is active in insect, mammalian and avian cells (gao et al., 2007; he et al., 2008). baculovirus encoding the wssv vp28 envelope protein driven by the wssv ie1 promoter was able to display vp28 on the envelope (syed musthaq et al., 2009). immunization of shrimp with this baculovirus resulted in vp28 expression in shrimp tissues, elevated the survival rate and reduced the wssv viral load after wssv infection. the wssv ie1 promoter was also used in conjunction with vsvg pseudotyping. i.m. injection of the vsvg-pseudotyped baculovirus that expressed the e2 protein of csfv under the ie1 promoter induced csfv-specific neutralizing antibodies and lymphoproliferative responses in mice (li et al., 2009a).
the wssv ie1 promoter has been most extensively employed by the kwang group to drive the expression of ha of h5n1 aiv, which enabled ha display on the envelope and conferred hemagglutination activity on the viral particle. i.m. injection of this vector (designated bacha) into mice elicited antibodies with hi titers against the tested influenza virus. in sf-9 cells, the wssv ie1 promoter was more active than the cmv promoter, thereby giving rise to more efficient ha incorporation into the baculoviral envelope. the ie1 promoter also resulted in strong ha expression in the lung (after i.n. inoculation) and thymus (after i.m. inoculation) in chickens. as a result, immunization of chickens with the baculovirus bearing the ie1 promoter (bacha) elicited higher anti-ha antibody levels than the baculovirus bearing the cmv promoter. however, i.n. inoculation of bacha stimulated low anti-ha igg titers (prabakaran et al., 2008). to enhance the vaccine efficacy via the i.n. route, bacha was co-administered with recombinant cholera toxin b subunit (rctb) as the adjuvant, which significantly enhanced the serum igg, mucosal iga and serum microneutralization titers. with the adjuvant, bacha also triggered higher ha-specific humoral and mucosal immune responses than the inactivated h5n1 virus adjuvanted with the same dose of rctb, and conferred complete protection against challenge with homologous and heterologous h5n1 strains (prabakaran et al., 2008). additionally, gastrointestinal delivery of bacha into mice by oral gavage led to transduction in vivo and remarkably boosted the ha-specific igg, hi and mucosal iga titers (prabakaran et al., 2010c). the live bacha triggered stronger cross-clade neutralization against heterologous h5n1 strains than the inactivated bacha, and provided 100% protection against challenge with lethal doses of homologous and heterologous h5n1.
moreover, after challenge the immunized mice exhibited only minimal bronchitis in the lungs and regained their body weight more rapidly (prabakaran et al., 2010c). the oral vaccine efficacy was further potentiated by encapsulating the live bacha within a reverse micelle structure of phosphatidylcholine, which provided protection against the destructive environment in the intestinal lumen (prabakaran et al., 2010b). in comparison with the non-encapsulated bacha, gastrointestinal delivery of the encapsulated baculovirus into mice led to significantly enhanced ha-specific igg and iga responses, and higher hi titers. the encapsulated vaccine induced strong cross-clade neutralization titers against heterologous h5n1 strains and conferred protection against infection with highly pathogenic, heterologous h5n1 viruses (prabakaran et al., 2010b). using the same wssv ie1 approach, kwang and coworkers more recently selected 3 ha proteins from different vaccine strains for expression and display on the baculovirus surface (prabakaran et al., 2010a). the 3 ha proteins covered all the variations in the neutralizing epitopes among the h5n1 lineages. s.c. immunization of mice with a mixture of 3 baculoviruses displaying the 3 ha proteins (tri-bacha) induced antibodies capable of neutralizing viruses from clades 1, 2.1, 2.2, 4, 7, and 8 of h5n1 viruses. in contrast, s.c. immunization with a single ha-displaying baculovirus (mono-bacha) or a single strain of inactivated whole virus vaccine neutralized only clade 1 (homologous), clade 2.1, and clade 8.0 viruses. also, the tri-bacha vaccine protected 100% of the mice against challenge with three different clades (clades 1.0, 2.1, and 7.0) of h5n1 viruses.

since the discovery in 1995 that baculovirus effectively transduces mammalian cells and mediates transgene expression, baculovirus has emerged as a promising gene delivery vector.
despite the rapidly growing lists of permissive cells and applications, relatively little is known about what happens inside the cells after baculovirus transduction. evidence accumulating in recent years has indicated that after entry, baculovirus can translocate into the nucleus through a complex cascade of steps and express baculoviral genes and the transgene. the entire process results in the recognition of baculovirus by the cells via tlr9-dependent and -independent pathways, and disturbs the expression of a small percentage of host genes, particularly those pertaining to the innate immune responses. however, the reports are not fully consistent and the exact intracellular events remain elusive. the lack of comprehensive knowledge about the events governing virus entry, intracellular transport and cellular responses will constitute a roadblock to future applications in the clinical setting, and extensive research is therefore needed to elucidate the underlying mechanisms. although the innate immune responses raise concerns regarding the safety of baculovirus in gene therapy, these responses represent opportunities to harness baculovirus as a vector to defend against infectious agents and tumors. indeed, baculovirus-based vaccines have attracted intense interest over the past 3 years and in one case have entered trials in primates (jin et al., 2008). thanks to the intrinsic adjuvant properties, in most studies baculovirus-based vaccines were able to trigger potent immune responses in the absence of additional adjuvants. owing to the widespread use and encouraging preclinical data in comparison with other vaccine forms (e.g. dna vaccine, virus-like particle vaccine and whole virus vaccine), it is envisioned that baculovirus-based vaccines can progress into the next phase in the near future. the capability of shrna/mirna delivery also gives baculovirus an edge for target gene regulation and for future applications in antiviral therapy and immunotherapy.
furthermore, baculovirus transduction of neural cells (kenoutis et al., 2006) and adult stem cells (bak et al., 2010; ho et al., 2006; zeng et al., 2009) neither markedly alters their inherent properties nor impairs their differentiation capacity, making baculovirus a promising vector for cell therapy and tissue engineering. recent progress in baculovirus vector engineering with respect to surface modification, minimization of in vivo inactivation, transgene targeting and prolongation of expression further corroborates the potential of baculovirus in these applications. to advance the baculovirus technology from bench to bedside, other roadblocks still stand in the way. over the past few years, a wealth of literature has addressed the problems in baculoviral vector production (carinhas et al., 2009, 2010; dojima et al., 2010; tsai et al., 2007), quantification (chan et al., 2006; ferris et al., 2011; kärkkäinen et al., 2009; lo et al., 2011; lo and chao, 2004; roldao et al., 2009), purification (chen et al., 2009a; kaikkonen et al., 2008; transfiguracion et al., 2007; vicente et al., 2009, 2010a, 2010b; wu et al., 2007) and quality assurance (ihalainen et al., 2010; jorio et al., 2006). new baculoviral vectors taking advantage of hybrid promoters (gao et al., 2007; keil et al., 2009; lackner et al., 2008; pan et al., 2009) and new regulatory elements (du et al., 2010; mahonen et al., 2007) are also being developed and evaluated for their potential applications. additionally, the transduction conditions (keil et al., 2009; pan et al., 2009), supplements (guo et al., 2010; yin et al., 2010) and parameters (lee et al., 2007a, 2007b; shen et al., 2007, 2008) dictating the transduction efficiencies have been evaluated.
these technological advances will undoubtedly facilitate the production of baculoviral vectors in compliance with cgmp regulations, and advance baculovirus from a research tool to clinical applications.

host innate immune responses induced by baculovirus in mammals baculovirus induces an innate immune response and confers protection from lethal influenza virus infection in mice involvement of the toll-like receptor 9 signaling pathway in the induction of innate immunity by baculovirus baculovirus induces type i interferon production through toll-like receptor-dependent and -independent pathways in a cell-type-specific manner baculovirus-mediated periadventitial gene transfer to rabbit carotid artery induction of antibodies in mice by a recombinant baculovirus expressing pseudorabies virus glycoprotein b in mammalian cells vaccination of mice with recombinant baculovirus expressing spike or nucleocapsid protein of sars-like coronavirus generates humoral and cellular immune responses baculovirus-transduced bone marrow mesenchymal stem cells for systemic cancer therapy human embryonic stem cell-derived mesenchymal stem cells as cellular delivery vehicles for prodrug gene therapy of glioblastoma high mobility group box2 promoter-controlled suicide gene expression enables targeted glioblastoma treatment cell density effect in the baculovirus-insect cells system: a quantitative analysis of energetic metabolism intranasal and intramuscular immunization with baculovirus dual expression system-based pvs25 vaccine substantially blocks plasmodium vivax transmission amino-terminal anchored surface display in insect cells and budded baculovirus using the amino-terminal end of neuraminidase gene expression profiling to define host response to baculoviral transduction in the brain a pseudotype baculovirus expressing the capsid protein of foot-and-mouth disease virus and a t-cell immunogen shows enhanced immunogenicity in mice baculovirus production for gene therapy: the role of
cell density, multiplicity of infection and medium exchange improving baculovirus production at high cell density through manipulation of energy metabolism determination of the baculovirus transducing titer in mammalian cells induction of il-8 release in lung cells via activator protein-1 by recombinant baculovirus displaying severe acute respiratory syndrome-coronavirus spike proteins: identification of two functional regions non-polar distribution of green fluorescent protein on the surface of autographa californica nucleopolyhedrovirus using a heterologous membrane anchor baculovirus-mediated production of hdv-like particles in bhk cells using a novel oscillating bioreactor combination of baculovirus-mediated gene transfer and rotating-shaft bioreactor for cartilage tissue engineering combination of baculovirus-mediated bmp-2 expression and rotating-shaft bioreactor culture synergistically enhances cartilage formation concanavalin a affinity chromatography for efficient baculovirus purification baculovirus transduction of mesenchymal stem cells triggers the toll-like receptor 3 (tlr3) pathway the repair of osteochondral defects using baculovirus-mediated gene transfer with de-differentiated chondrocytes in bioreactor culture baculovirus as an avian influenza vaccine vector: differential immune responses elicited by different vector forms biosafety assessment of human mesenchymal stem cells engineered by hybrid baculovirus vectors membrane penetrating peptides greatly enhance baculovirus transduction efficiency into mammalian cells a rapid and efficient method to express target genes in mammalian cell by baculovirus a novel system for the production of fully deleted adenovirus vectors that does not require helper adenovirus baculovirus as a new gene delivery vector for stem cells engineering and bone tissue engineering baculovirus transduction of mesenchymal stem cells: in vitro responses and in vivo immune responses after cell transplantation xenotransplantation 
of human mesenchymal stem cells into immunocompetent rats for calvarial bone repair baculovirus expression vectors for insect and mammalian cells transient and stable gene expression in mammalian cells transduced with a recombinant baculovirus vector baculoviruses and mammalian cell-based assays for drug screening production of scfv-displaying bmnpv in silkworm larvae and its efficient purification autographa californica multicapsid nucleopolyhedrovirus efficiently infects sf9 cells and transduces mammalian cells via direct fusion with the plasma membrane at low ph the combined use of viral transcriptional and posttranscriptional regulatory elements to improve baculovirus-mediated transient gene expression in human embryonic stem cells baculovirus vector requires electrostatic interactions including heparan sulfate for efficient gene transfer in mammalian cells baculoviruses deficient in ie1 gene function abrogate viral gene expression in transduced mammalian cells improving baculovirus transduction of mammalian cells by surface display of a rgd-motif baculovirus vectors elicit antigen-specific immune responses in mice construction and immunogenicity of recombinant pseudotype baculovirus expressing the capsid protein of porcine circovirus type 2 in mice construction and immunogenicity of pseudotype baculovirus expressing toxoplasma gondii sag1 protein in balb/c mice model baculovirus surface display of sars coronavirus (sars-cov) spike protein and immunogenicity of the displayed protein in mice models evaluation of the virus counter ® for rapid baculovirus quantitation expression of autographa californica multiple nucleopolyhedrovirus genes in mammalian cells and upregulation of the host β-actin gene high efficiency gene transfer into cultured primary rat and human hepatic stellate cells using baculovirus vectors efficient gene delivery into mammalian cells mediated by a recombinant baculovirus containing a whispovirus ie1 promoter, a novel shuttle promoter between 
insect cells and mammalian cells targeting micrornas in cancer: rationale, strategies and challenges a surface-modified baculovirus vector with improved gene delivery to b-lymphocytic cells preclinical evaluation of innate immunity to baculovirus gene therapy vectors in whole human blood indications for cell stress in response to adenoviral and baculoviral gene transfer observed by proteome profiling of human cancer cells expression and subcellular targeting of canine parvovirus capsid proteins in baculovirus-transduced nlfk cells baculovirus for eukaryotic protein display developments in the use of baculoviruses for the surface display of complex eukaryotic proteins new baculovirus recombinants expressing pseudorabies virus (prv) glycoproteins protect mice against lethal challenge infection comparison between recombinant baculo-and adenoviralvectors as transfer system in cardiovascular cells sodium butyrate enhances the expression of baculovirusmediated sodium/iodide symporter gene in a549 lung adenocarcinoma cells antiglioma effects of combined use of a baculovirual vector expressing wild-type p53 and sodium butyrate upregulation of proinflammatory cytokines and no production in bv-activated avian macrophage-like cell line (hd11) requires mapk and nf-κb pathways involvement of tlr21 in baculovirus-induced interleukin-12 gene expression in avian macrophage-like cell line hd11 wssv ie1 promoter is more efficient than cmv promoter to express h5 hemagglutinin from influenza virus in baculovirus as a chicken vaccine insect baculoviruses strongly potentiate adaptive immune responses by inducing type i ifn baculovirus expression systems for recombinant protein production in insect cells highly efficient baculovirus-mediated gene transfer into rat chondrocytes transgene expression and differentiation of baculovirus-transduced human mesenchymal stem cells baculovirus transduction of human mesenchymal stem cell-derived progenitor cells: variation of transgene expression 
the authors acknowledge the financial support from the national tsing hua university booster program (99n2544e1, 98n2901e1, 97n2511e1), cgmh-nthu joint research program (99n2419e1, cmrpg380101, cmrpg390141) and national science council (99-2221-e-007-025-my3, 98-2221-e-007-047-my3, nsc99-2622-e-007-012-cc2, nsc 98-2627-b-007-006), taiwan.

key: cord-268098-71g1w1mc authors: beckman, m. f.; igba, c. k.; mougeot, f. b.; mougeot, j.-l. title: comorbidities and susceptibility to covid-19: a generalized gene set meta-analysis approach date: 2020-09-15 journal: nan doi: 10.1101/2020.09.14.20192609 sha: doc_id: 268098 cord_uid: 71g1w1mc

background: the covid-19 pandemic has led to over 820,000 deaths among almost 24 million confirmed cases worldwide, as of august 27th, 2020, per who report. risk factors include pre-existing conditions such as cancer, cardiovascular disease, diabetes, and obesity.
there are currently no effective treatments. our objective was to complete a meta-analysis to identify comorbidity-associated single nucleotide polymorphisms (snps) potentially conferring increased susceptibility to sars-cov-2 infection, using a computational approach.

results: snp datasets were downloaded from the publicly available gwas catalog for 141 of 258 candidate covid-19 comorbidities. gene-level snp analysis was performed to identify significant pathways using the magma program. a snp annotation program was used to analyze magma-identified genes. covid-19 comorbidities from six disease categories were found to have significant associated pathways, which were validated by q-q plots (p<0.05). the top 250 human mrna gene expressions for snp-affected pathways, extracted from publicly accessible gene expression profiles, were evaluated for significant pathways. protein-protein interactions of identified differentially expressed genes, visualized with the string program, were significant (p<0.05). gene interaction networks were found to be relevant to sars and influenza pathogenesis.

conclusion: pathways potentially affected by or affecting sars-cov-2 infection were identified in underlying medical conditions likely to confer susceptibility and/or severity to covid-19. our findings have implications in covid-19 treatment development.

keywords: sars-cov-2, covid-19, comorbidity, susceptibility, severity

visualization of protein-protein interaction networks was completed using the stringv11.0 [31] program by testing different confidence levels to identify ontologies of biological significance for the significant pathways associated with comorbidities. possible comorbidity-associated significant gene sets/pathways were checked for quality control by generating quantile-quantile (q-q) plots using observed quantiles and residual z-scores of genes within the gene set, based on the publicly available magmav1.07b rv3.6.2 script (posthoc_qc_107a.r) [32, 33].
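as an illustration of the quality-control step above, the following python sketch pairs sorted residual z-scores with the quantiles expected under a standard normal distribution, which is the information a q-q plot displays. it is not the magma posthoc_qc_107a.r script; the function name `qq_points`, the plotting positions and the z-score values are assumptions chosen for the example.

```python
from statistics import NormalDist


def qq_points(residual_z):
    """Pair each observed residual z-score with its expected normal quantile."""
    observed = sorted(residual_z)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n avoid the undefined quantiles at 0 and 1.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


# Hypothetical residual z-scores for the genes of one gene set.
points = qq_points([2.9, 0.4, 1.8, 3.5, 1.1])
for expected_q, observed_q in points:
    print(f"expected {expected_q:+.2f}  observed {observed_q:+.2f}")
```

points lying above the diagonal (observed larger than expected), as in this made-up example, correspond to the early deviation from the diagonal that the text treats as evidence of association.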
ensembl's variant effect predictor program (vep) [34] was used to analyze magmav1.07b annotation files for each gene set associated with comorbidities [35]. magmav1.07b annotation files were converted into vep format using a bash script. all converted annotation files were uploaded into the vep online tool separately. vep summary statistics and analysis tables were downloaded for the comorbidities' associated genes and pathways found significant by magmav1.07b. corresponding tables were merged. annotation files [36, 37] containing annotated gene symbols and entrez gene identifiers for all human genes were used to retrieve missing gene identifiers [38]. these tabular (.csv) files were merged and loaded into rv4.0.2. entrez gene ids were matched to gene symbols from vep analysis files to identify affymetrix gene symbols. genes and their corresponding entrez ids were then matched to the entrez ids of significant genes found through combined magmav1.07b-string analysis.

the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. all rights reserved. no reuse allowed without permission. this version posted september 15, 2020. https://doi.org/10.1101/2020.09.14.20192609

geo2r was used to test the top 250 human mrna gene expressions for each comorbidity based on available human data using ncbi geo [39], by only including comorbidities that had significant pathways identified by magmav1.07b and vep string analyses. human mrna expression datasets comparing disease group to healthy controls since 2010 were searched. if no datasets were available post-2010, the latest dataset was downloaded using characteristics described for prior datasets. for diseases with no publicly available datasets comparing healthy controls to disease type, the newest, most relevant dataset was used.
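the identifier matching described above (vep gene symbols to entrez gene ids, then to the genes found significant by magma) can be sketched in python as follows. this is a minimal illustration rather than the authors' r code; the function name `match_genes` is hypothetical, and the example rows are illustrative values that should be checked against the actual annotation file.

```python
def match_genes(annotation_rows, vep_symbols, magma_entrez_ids):
    """Map VEP gene symbols to Entrez IDs and keep those that MAGMA flagged."""
    symbol_to_entrez = {symbol.upper(): entrez for symbol, entrez in annotation_rows}
    matched, missing = {}, []
    for symbol in vep_symbols:
        entrez = symbol_to_entrez.get(symbol.upper())
        if entrez is None:
            missing.append(symbol)      # no identifier in the annotation table
        elif entrez in magma_entrez_ids:
            matched[symbol] = entrez    # present in both VEP and MAGMA results
    return matched, missing


# Illustrative rows only: (gene symbol, Entrez gene ID) pairs from an annotation table.
annotation_rows = [("KPNB1", 3837), ("STAT3", 6774), ("IL2RA", 3559), ("NUP153", 9972)]
vep_symbols = ["kpnb1", "stat3", "nup153", "unknown_gene"]
magma_entrez_ids = {3837, 9972}  # hypothetical set of MAGMA-significant genes

matched, missing = match_genes(annotation_rows, vep_symbols, magma_entrez_ids)
print(matched)
print(missing)
```

symbols without an identifier are returned separately so that they can be looked up in a second annotation source, mirroring the retrieval of missing gene identifiers mentioned in the text.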
tissue types used for analysis included: (i) peripheral blood mononuclear cells (pbmcs), (ii) cancer tissues, (iii) adipose tissue, (iv) pulmonary tissue, (v) post-mortem brain tissue, (vi) cardiovascular tissue, and (vii) blood stem cells. for each comorbidity, human mrna gene expression data corresponding to average log-fold change (alfc) were formatted for clustering of genes identified by magmav1.07b and vep and subsequently matched to string protein-protein interactions. gene weights were added manually to account for duplicate genes in the dataset. genes were mean centered and normalized. hierarchical clustering was completed for genes and arrays using manhattan (city-block) distance as the similarity metric with average linkage via cluster3.0v1.59. clusters were visualized as heatmaps created using javatreeviewv1.1.6r4 [40]. clustered groups of genes for magmav1.07b and vep genes were run separately through the genecodisv4.0 online tool [41] for identification of possible biological processes or pathways involved in viral infection [42].

google searches including hugo gene symbol and either "influenza" or "sars" [46, 47]. risk of bias was assessed according to "cochrane's handbook for systematic reviews of interventions" [48]. human tissue expression relevant to covid-19 for genes with direct involvement was validated using ensembl expression atlas [49, 50]. genes not generally expressed in central nervous, cardiovascular, or pulmonary systems were removed from the dataset. visualization of the protein-protein interaction network of genes directly involved with influenza and sars (caused by sars-cov-1) was completed in stringv11.0 using an interaction score of 0.400 [31].
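the clustering step described in the methods above (mean centering, normalization, manhattan distance, average linkage) can be illustrated with a short python sketch. it is not cluster 3.0 itself: the helper names are hypothetical, normalization is assumed to scale each gene to unit sum of squares, and the alfc values are invented for the example.

```python
from itertools import combinations


def center_and_normalize(row):
    """Subtract the row mean, then scale the row to unit sum of squares."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = sum(x * x for x in centered) ** 0.5
    return [x / norm for x in centered] if norm else centered


def cityblock(a, b):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_distance(rows, members_a, members_b):
    """Average-linkage distance: mean of all pairwise distances between clusters."""
    dists = [cityblock(rows[i], rows[j]) for i in members_a for j in members_b]
    return sum(dists) / len(dists)


def average_linkage(rows):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: average_distance(rows, clusters[pair[0]], clusters[pair[1]]),
        )
        distance = average_distance(rows, clusters[a], clusters[b])
        merges.append((clusters[a], clusters[b], distance))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical ALFC values: three genes (rows) across three comorbidities (columns).
alfc = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
rows = [center_and_normalize(gene) for gene in alfc]
merges = average_linkage(rows)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

in this toy input the first two genes have the same profile after centering and scaling, so they merge first; the third gene, whose profile runs in the opposite direction, joins last.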
the overall computational analytical design and associated primary results are presented in figure 1.

to conduct generalized gene set analysis, we retrieved publicly available gwas catalog datasets for 141 out of 258 possible covid-19 comorbidities/underlying medical conditions. the 141 comorbidities were grouped into 8 categories by disease type based on the organ most affected (table s1). following our magma analysis (figure 1: flowchart section a), gene set and reactome gene level analyses yielded 69 pathways representing 119 significant genes (p<0.05). these pathways were significant for 22 covid-19 comorbidities representing 6 disease categories, namely, cancer (n=9); cardiovascular (n=4); neurologic/mental (n=3); respiratory (n=2); skin/musculoskeletal (n=1); autoimmune/endocrine/metabolic (n=3). reactome significant pathways and genes obtained through magmav1.07b gene-level analysis from enrichment map are shown in tables 1a and b.

using the stringv11.0 program with the highest confidence interaction score (cis) of 0.9, processing of the 119 genes yielded a protein-protein interaction network of 70 genes, which was found to be highly significant based on a hypergeometric test with benjamini-hochberg correction (p = 4.36 × 10^-11) (figure 2a). the top kyoto encyclopedia of genes and genomes (kegg) pathway identified using stringv11.0 corresponded to epstein-barr virus infection, with a false discovery rate of 6.72 × 10^-9.

verification of significant pathways using q-q plots showed a high association between genes and their relative gene ontology defined pathways, since all plots show a distribution of residual z-scores deviating from the diagonal early on. there were no q-q plots with any ambiguous feature. significant genes had high levels of association with each pathway. q-q plots of more than five genes, representing the pathway ontologies "post-translational protein modification", "translocation of zap-70 to immunological synapse", "metabolism" and "cell cycle" and associated possible covid-19 comorbidities (including asthma), are described in figure s1.

annotation files were converted for 134 of the 141 comorbidities with gwas catalog datasets available (figure 1: flowchart section b). of 3704 hugo gene symbols extracted from vep, 2996 corresponding entrez gene ids were identified using the affymetrix human genome annotation file. of these gene ids, 50 were matched with the 119 significant genes identified by magmav1.07b for the 22 comorbidities with significant pathways (table 2). of the 50 genes, all were included in a protein-protein interaction network of 55 genes using a low cis in stringv11.0 (figure 2b). the top kegg pathway identified using stringv11.0 was htlv-1 infection, with a false discovery rate of 4.38 × 10^-7 using a hypergeometric test with benjamini-hochberg correction.

geo human mrna expression datasets were retrieved for 19 of 22 comorbidities. a description of geo datasets is presented in table s2. using the 119 magmav1.07b identified genes (figure 1: flowchart section a), javatreeviewv1.1.6r4 clustered 4 of 9 cancer types in the heatmap (partial view in figure 3a, full heatmap in supplementary image). also, interstitial lung disease, multiple sclerosis, asthma, obesity, and heart failure were clustered (figure 3a). vep string matched genes (n=50) also clustered 4 of 9 cancer types and clustered interstitial lung disease, multiple sclerosis, asthma, obesity, and heart failure together (figure 3b). in both heatmaps nucleoporin 160 (nup160), nucleoporin 153 (nup153), fibroblast growth factor [51-54].
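the results above report enrichment significance from a hypergeometric test with benjamini-hochberg correction. the python sketch below shows the generic form of both calculations; it does not reproduce string's internal computation, and the function names and input numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, population, pathway_size, sample_size):
    """P(X >= k): chance of seeing at least k pathway genes in the sample."""
    total = comb(population, sample_size)
    upper = min(pathway_size, sample_size)
    favorable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, sample_size - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (false discovery rates) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 10 genes overall, 3 in a pathway, 2 sampled, at least 1 hit.
p_single = hypergeom_sf(1, 10, 3, 2)
# Hypothetical raw p-values for four pathways tested together.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(round(p_single, 4))
print([round(q, 4) for q in fdr])
```

a pathway is then called significant when its adjusted value falls below the chosen false discovery rate threshold.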
we also identified three genes, kpnb1, signal transducer and activator of transcription 3 (stat3), and interleukin 2 receptor subunit alpha (il2ra), shown to play a significant role in sars [55-57]. genes identified as being possibly directly associated with influenza and/or sars are shown in table s3. the string protein-protein interaction network yielded 38/46 (82.6%) genes involved in influenza and 15/17 (88.2%) genes involved in sars, using an interaction score of 0.4 (figures 4a and b). no gwas study was found for sars-cov-1 infection to identify possible susceptibility genes within the 119 genes. additionally, no studies were found to be at high risk for bias (table s4).

this is the first study conducting generalized gene set analysis on a broad spectrum of possible covid-19 comorbidities, with the prospect of identifying comorbidity-specific genes that could impact infection by sars-cov-2. starting with a list of 258 diseases, our magma pipeline was able to identify 69 significant reactome pathways with a total of 119 significant genes corresponding to 22 comorbidities that might have implications in predicting the severity of sars-cov-2 infection (figure 1, table 1, table s1). of the 22 comorbidities, we were able to validate pathways associated with cardiovascular disease, diabetes, obesity, and pulmonary diseases. cardiovascular diseases identified included heart failure, atherosclerosis, kawasaki's disease, and hypertension. pulmonary diseases included asthma and interstitial lung disease. cancer has been reported as a possible risk factor for covid-19 [9]. we were able to identify nine cancers with gwas data and significant associated pathways, including acute myeloid leukemia, renal cell cancer, small cell lung cancer, and lung cancer. furthermore, the known covid-19 comorbidities hypertension, obesity and diabetes had significant pathways and genes.
while q-q plots indicated validity of our findings, they must be interpreted with caution, as these plots are normally used for pathways containing many genes. to a certain degree, they give us some confidence that there is a true association between gene and pathway [33]. in our analysis, however, the smaller number of genes identified allowed us to narrow possible gene targets and pathways. indeed, certain genes identified in our study may have significant biological relevance to infection by sars-cov-2. for instance, the sialyltransferase st6 n-acetylgalactosaminide alpha-2,6-sialyltransferase 3 (st6galnac3) was found significant in the post-translational protein modification pathway (figure s1). another sialyltransferase, st6galnac1, has been previously investigated as a drug target against infection of smooth airways epithelial cells by influenza virus [58]. it remains, however, to be determined whether st6galnac3, generally expressed at high levels in renal cell cancer [59], plays a significant role in covid-19 pathogenesis. interestingly, alfc gene expression of st6galnac3 was positive in our analysis for 8 of 19 comorbidities (including renal cell cancer), namely, in tumor, pulmonary, brain, adipose tissues and pbmcs (figure 3a and b).

stringv11.0 analysis produced significant enrichment for both magmav1.07b genes and vep matched genes containing snps that had characteristics of deleterious effects (table 2). therefore, we believe the interactions among genes from significant pathways from magma and matched vep genes are likely not due to chance and that these genes are biologically connected.
furthermore, stringv11.0 analyses identified top kegg pathways including the epstein-barr virus pathway (magma genes) and the htlv-1 pathway (vep matched genes). string was able to cluster 70 genes into four functional groups among the 119 magma significant genes: cell regulation and immune response; cell transport and nervous tissue function; protein homeostasis and gene expression; and transcriptional regulation and rna-mediated silencing (figure 2a). additionally, nup160, nup153, and kpnb1 clustered tightly together in the cell transport and nervous tissue function group. stringv11.0 analysis of the 50 vep matched genes required a lower confidence interaction score of 0.150 to obtain sufficient network connections for interpretation. although network analysis may be subjective and is dependent on established knowledge, it is important to note that the enriched protein-protein interaction p-value was statistically significant. for the vep matched gene stringv11.0 analysis, there were four distinct biological groupings recognized within the mapped network based on the closeness of protein interactions (figure 2b). those groupings were (i) antigen-specific immune response, (ii) cell division and molecule formation/development, (iii) cell growth, survival, proliferation, motility, and morphology, and (iv) voltage-gated ion channel transmembrane proteins. notably, one of the comorbidities with significant associated pathways, breast cancer, contained snps affecting the solute carrier family 4 member 7 (slc4a7) and solute carrier family 24 member 3 (slc24a3) genes. these genes are involved in sodium, calcium, and potassium ion transport and play a role in the malignant progression of breast cancer [60]. in addition, euchromatic histone lysine methyltransferase 2

epithelial signaling by fibroblast growth factors is required for effective recovery from lung injuries resulting from influenza infection [51]. our analysis coincides with previous findings linking induced inactivation of fgfr2 with increased mortality and influenza-induced lung injury [51]. kpnb1 (figure 2a). furthermore, kpnb1 is involved in the early stage of influenza virus replication via nuclear trafficking, by way of nuclear import of viral cdna or viral/host proteins into the host chromosome [52, 53]. the interaction between nup153 and kpnb1 has previously been investigated in relation to nuclear transport [63]. the degradation of nup153 in influenza a virus-infected cells, such as madin-darby canine kidney ii and human lung epithelial cells, results in an enlargement and widening of nuclear pores [54]. this disease process allows viral ribonucleoprotein complexes to be exported from the nucleus to the plasma membrane [54]. additionally, nup160 has been shown to work in conjunction with nup153 to mediate nuclear import and export [64]. therefore, degradation of one or both can prevent the import of signal transducers and activators of transcription, reducing the effectiveness of the anti-viral interferon response [65]. our results support the interactions between these genes and viral respiratory

further research is needed to confirm these genes (or associated regulations) as possible drug targets for sars-cov-2 infection.
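the effect of the interaction score thresholds used in this study (0.9, 0.4 and 0.150) on network size can be illustrated with a small python sketch. the edge list, gene names and scores below are entirely hypothetical and are not taken from string; the function name `connected_genes` is likewise an assumption.

```python
def connected_genes(edges, min_score):
    """Return the genes that keep at least one interaction at or above min_score."""
    kept = [(a, b) for a, b, score in edges if score >= min_score]
    return {gene for pair in kept for gene in pair}


# Hypothetical scored interactions: (gene, gene, combined score between 0 and 1).
edges = [
    ("gene_a", "gene_b", 0.95),
    ("gene_b", "gene_c", 0.62),
    ("gene_d", "gene_e", 0.45),
    ("gene_f", "gene_d", 0.18),
]
all_genes = {gene for a, b, _ in edges for gene in (a, b)}

for threshold in (0.9, 0.4, 0.15):
    network = connected_genes(edges, threshold)
    coverage = 100 * len(network) / len(all_genes)
    print(f"score >= {threshold}: {len(network)} of {len(all_genes)} genes ({coverage:.1f}%)")
```

lowering the threshold admits weaker interactions and therefore more genes, which is why a permissive score was needed to connect the 50 vep matched genes, at the cost of lower confidence in each individual edge.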
while there is no shortage of publicly available data, not all diseases have the same level of dedicated research. therefore, not all possible comorbidities had publicly available snp datasets from gwas catalog or human mrna gene expression datasets from ncbi's geo datasets database. this reduced the number of possible comorbidities from 258 to 141. additionally, we were only able to use 19 of 22 significant comorbidities for geo2r analysis and heatmap visualization. another caveat is that the geo2r mrna expression datasets were generated by different independent studies using different genomic platforms and analysis pipelines, so optimal normalization of raw data could not be implemented. little is still known about covid-19 pathogenesis, although research on the matter has increased greatly since the beginning of the pandemic.
dr. jean-luc mougeot and dr. farah bahrani mougeot conceived the study and contributed to the design of the analytical strategy, data interpretation and verification. micaela beckman designed the overall analytical strategy, conducted most computational analyses and data interpretation, and wrote the manuscript draft. chica igba contributed to the design of the analytical strategy, conducted analyses and participated in data interpretation. all authors contributed significantly to the writing and revision of the main manuscript, tables and figures.
176814; 174048; 176409; 174143; 179419; 174048; 113507; 5687128; 176412; 176408; 453276; 425407; 425393 activation of apc c and apc c: cdc20 mediated degradation of mitotic proteins; cyclin b; mitotic proteins; cell cycle proteins; cell cycle protein prior to satisfaction of cell cycle checkpoint; phospho-apc c mediated degradation of cyclin a; phosphorylation and regulation of apc c between g1 s and early anaphase; e2f enabled inhibition of prereplication complex formation; mapk mapk4 signaling; regulation of mitotic cell cycle; slc-mediated transmembrane transport; transport of inorganic cations anions and amino acids oligopeptides 3.57e-11; 3.32e-5 422475; 204998; 73887; 1266738; 416482; 9675108; 193648; 193704; 194840; 194315 axon guidance; cell death signaling via nrage, nrif, and nade; death receptor signaling; developmental biology; g alpha (12 13)
medrxiv preprint, this version posted september 15, 2020. https://doi.org/10.1101/2020.09.14.20192609
a list of candidate comorbidities (n=258) possibly associated with increased severity/infectivity of covid-19 was curated. snps associated with comorbidities with available gwas catalog data (n=141) were analyzed. multi-marker analysis of genomic annotation (magma) was performed. snps were annotated to genes using the ncbi gene reference file (ncbi37.3.gene.loc). in magmav1.07b, gene set/pathway analysis was performed for each snp that was identified, using the "multi-mean=snp-wise" model-generated results and taking into account ethnicities associated with a possible comorbidity. gene-level analysis was completed using reactome pathways retrieved from the enrichment map program. the stringv11.0 protein-protein interaction program was used to visualize the network of 119 significant genes. quantile-quantile (q-q) plots in rv3.4.2 for 69 significant pathways were used for quality control. ncbi gene expression omnibus (geo) human mrna differential expression datasets were downloaded via geo2r for each comorbidity with associated genes/pathways (n=19 of 22). human mrna expression was visualized with a heatmap of the 119 significant genes using cluster3.0v1.59 and javatreeviewv1.1.6r4. tissue expression relevance to sars and influenza was determined using the disgenetv6 and ensembl expression atlas databases.
magmav1.07b annotation files were converted to ensembl variant effect predictor (vep) format (n=134 of the 141 gwas datasets). gene symbols (n=3704) were extracted for vep analysis from the 22 significant comorbidity-associated genes/pathways per magmav1.07b analysis.
entrez gene ids (n=2996) were matched to gene symbols using affymetrix gene symbol annotation files (hg-u133a/b human genome files). the stringv11.0 protein-protein interaction program was used to visualize the network (n=50 genes). ncbi gene expression omnibus (geo) human mrna differential expression datasets were downloaded via geo2r for each of the 22 comorbidities with associated genes/pathways (n=19 of 22). vep genes were matched to network genes and formatted for cluster3.0v1.59. human mrna expression was visualized with a heatmap using average log-fold changes (alfc) in javatreeviewv1.1.6r4 (i.e., 50 vep identified genes matched to 119 magma identified genes).
genes were filtered by removing those with less than 60% of values present, then mean centered and normalized. hierarchical clustering was completed in cluster3.0v1.59 using weights for duplicate/synonymous genes, with city-block (manhattan) distance as the similarity metric for genes and arrays and with average linkage. heatmaps were created using javatreeviewv1.1.6r4. yellow depicts positive alfc, blue depicts negative alfc, and black depicts missing data and values of zero.
string protein-protein interactions of magmav1.07b identified genes with direct involvement with (a) influenza (n=46) and/or (b) sars (n=17) are shown.
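the filtering and distance steps named in the heatmap caption above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the cluster3.0 implementation: missing values are represented as none, the 60% threshold is applied per gene (row), and the city-block distance is taken as the mean absolute difference over the columns present in both rows. all function names are hypothetical, and the normalization and average-linkage steps are omitted.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one gene across datasets; None marks a missing value


def fraction_present(row: Row) -> float:
    """Share of non-missing values in a row."""
    return sum(v is not None for v in row) / len(row)


def filter_rows(rows: List[Row], min_present: float = 0.6) -> List[Row]:
    """Drop rows with less than `min_present` of their values present."""
    return [r for r in rows if fraction_present(r) >= min_present]


def mean_center(row: Row) -> Row:
    """Subtract the row mean (computed over present values) from each present value."""
    present = [v for v in row if v is not None]
    mean = sum(present) / len(present)
    return [None if v is None else v - mean for v in row]


def city_block(a: Row, b: Row) -> float:
    """Mean absolute difference over columns present in both rows (assumed convention)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("rows share no non-missing columns")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    genes = [[1.0, 2.0, 3.0, None], [1.0, None, None, None]]
    kept = [mean_center(r) for r in filter_rows(genes)]
    print(kept)
```

averaging over shared columns keeps distances comparable between gene pairs with different amounts of missing data; a plain sum would penalize well-covered pairs.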
level of stringency in stringv11.0 program was set to a medium confidence interaction score (cis) of 0.4 in both influenza and sars related molecular networks (a, b), resulting in a cluster of 38/46 (82.6%) and 15/17 (88.2%) genes, respectively.
clinical characteristics of coronavirus disease 2019 in china world health organization: q&a: influenza and covid-19 -similarities and differences. www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answershub/q-a-detail/q-a-similarities-and-differences-covid-19-and-influenza#:~:text=mortality for covid-19,quality of health care the incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application review of the 2019 novel coronavirus (sars-cov-2) based on current evidence evaluation, and treatment of coronavirus (covid-19) coronaviruses pathogenesis, comorbidities and multi-organ damage -a review prevalence of comorbidities and its effects in patients infected with sars-cov-2: a systematic review and meta-analysis determinants of severity in cancer patients with covid-19 illness effect of underlying comorbidities on the infection and severity of covid-19 in korea: a nationwide case-control study understanding modernday vaccines: what you need to know vaccine effectiveness: how well do the flu vaccines work accessed 1 identification of information flow-modulating drug targets: a novel bridging paradigm for drug discovery network-based relating pharmacological and genomic spaces for drug target identification transcriptome sequencing assisted discovery and computational analysis of novel snps associated with flowering in raphanus sativus in-bred 
lines for marker-assisted backcross breeding the connectivity map: using gene-expression signatures to connect small molecules, genes, and disease a survey on the computational approaches to identify drug targets in the postgenomic era a review on computational systems biology of pathogen-host interactions immunoinformatics--the new kid in town influenza research database: an integrated bioinformatics resource for influenza virus research cochrane handbook for systematic reviews of interventions version 6 expression atlas: gene and protein expression across multiple studies and organisms fgfr2 is required for aec2 homeostasis and survival after bleomycin-induced lung injury human host factors required for influenza virus replication a host of factors regulating influenza virus replication ms asthma 5581; 3575; 3117;3123; 6891; 3118 heart failure 64759 tns3 192154334 iv; dgv hypertension 776 lung cancer 79888 we thank kathleen sullivan for her editorial expertise in preparation of this manuscript. this work was supported by the atrium health foundation research fund (internal). 
dcun1d5 agbl1 inpp5f stat5a agtrap psmc3 cd44 gba ctsc rfwd2 ppp1cb stat3 cd86 btrc kcnj16 mtmr2 shtn1 acsl3 agtpbp1 elovl7 ncoa1 cacnb2 slc44a1 aqp9 syne1 nr3c2 ppkar2b iqgap2 muc2 mapk10 trim36 slc24a3 st6galnac3 cntrl il2ra atxn3 -dqa1 il7r sh2b1 adatms9 acoxl lpcat1 myl2 osbpl10 cacna1d mcm8 ehmt2 prkce kif21b tap2 ext1 ncoa1 iqgap2 muc2 adcy7 fgfr2 ngef vac14 tollip hla-drb1 pik3r2 slc4a7 ntm kcnd3 mad1l1 prkca slc22a1 rarb vps45 tns3 agtpbp1 nup160 lamc1 nup153 stat3 il2ra cacnb2 slc24a3 mapk10 nr3c2 syne1 key: cord-277491-q18b88lm authors: cao, ying-li; wang, ying; guo, rong; yang, fan; zhang, yun; wang, shu-hui; liu, li title: identification and characterization of three novel small interference rnas that effectively down-regulate the isolated nucleocapsid gene expression of sars coronavirus date: 2011-02-11 journal: molecules doi: 10.3390/molecules16021544 sha: doc_id: 277491 cord_uid: q18b88lm nucleocapsid (n) protein of severe acute respiratory syndrome-associated coronavirus (sars-cov) is a major pathological determinant in the host that may cause host cell apoptosis, upregulate the proinflammatory cytokine production, and block innate immune responses. therefore, n gene has long been thought an ideal target for the design of small interference rna (sirna). sirna is a class of small non-coding rnas with a size of 21-25nt that functions post-transcriptionally to block targeted gene expression. in this study, we analyzed the n gene coding sequences derived from 16 different isolates, and found that nucleotide deletions and substitutions are mainly located at the first 440nt sequence. combining previous reports and the above sequence information, we create three novel sirnas that specifically target the conserved and unexploited regions in the n gene. we show that these sirnas could effectively and specifically block the isolated n gene expression in mammal cells. 
furthermore, we provide evidence to show that n gene can effectively up-regulate m gene mediated interferon β (ifnβ) production, while blocking n gene expression by specific sirna significantly reduces ifnβ gene expression. our data indicate that the inhibitory effect of sirna on the isolated n gene expression might be influenced by the sequence context around the targeted sites.
severe acute respiratory syndrome (sars), which spread worldwide in 2003, is caused by a novel type of coronavirus called sars-associated coronavirus (sars-cov). sars-cov is an enveloped, positive-sense single-stranded rna virus with a typical genome size of 29.7 kb [1] [2] [3]. the viral envelope is a lipid bilayer embedded with three viral transmembrane proteins: the matrix (m), envelope (e) and spike (s) proteins. the genomic rna of sars-cov is protected and packaged by the viral nucleocapsid (n) protein, which forms a helical ribonucleoprotein complex within the viral envelope. in addition, the physical interaction between the n and m proteins might be necessary for the assembly of coronavirus [4] [5] [6]. the n gene of sars-cov is located in a region proximal to the 3' end of the viral genome; it is 1269nt long and encodes a 422aa basic protein. the main functions of n protein are twofold: 1) it binds to viral genomic rna to form the viral nucleocapsid; 2) it self-associates into a polymer that might be critical for helical structure formation [7]. protein structural studies reveal that the sars-cov n protein contains two structural domains flanked by intrinsically disordered (id) regions [8]. the n terminal domain (ntd, amino acid residues 45-181) can bind nonspecifically to a variety of nucleotide substrates [9] and functions as a putative rna binding domain associated with the viral rna genome [10], whereas the c terminal domain (ctd, amino acid residues 248-365) is thought to be responsible for the dimerized self-association [8, 11].
however, recent studies show that the c terminal sequence of n protein also promotes higher-order oligomerization and is able to interact with nucleic acids as well [7, 12]. moreover, chang demonstrated that all id sequences are able to bind rna [13], indicating that the multisite nucleic acid binding property of sars-cov n protein may have an inherent advantage in promoting the formation of the viral helical nucleocapsid core. sars-cov infection is a life-threatening disease that often develops into acute lung injury and acute respiratory distress syndrome. the inflammation and immune responses in the host are often induced or damaged by viral gene products such as n gene products. sars-cov n protein possesses multifarious activities and may be actively involved in sars-cov induced pathogenesis [14]. for instance, n protein induces pro-inflammatory responses by activating the promoter activity of either cyclooxygenase-2 (cox-2) or il-6 through direct interaction with the nf-κb binding element [15, 16]. overexpression of n protein in serum-starved cell lines such as monkey kidney cos-1 cells [17] or human pulmonary fibroblast hpf cells [18] induces apoptosis. more importantly, sars-cov n protein can function as an antagonist that counteracts the host innate immune response by inhibiting the activity of irf3 and nf-κb, and subsequently blocking interferon β production [19]. sars-cov n protein is believed to be one of the major pathological determinants in the host cells [14]. therefore, down-regulating n protein expression by the small interference rna (sirna) approach might be a good strategy to reverse the virus-induced damage to the host and may help the development of more effective therapeutic means to control viral transmission. sirna is a type of rna molecule 21-25nt in length that functions post-transcriptionally to down-regulate the targeted gene expression.
sars-cov n gene specific sirnas have been designed and investigated by a number of groups, and at least 23 targeted sites of n gene specific sirnas have been selected and tested [20] [21] [22] [23] [24]. sequence analysis indicates that all these known sirnas are located within the 5' two-thirds of the n gene sequence (within the first 860nt of the total 1269nt). nucleotide substitution and deletion mutations are frequently recovered from the first 440nt of the n gene. therefore, these sequence alterations have the potential to weaken sirna mediated gene silencing. in the current study, we compared the n gene sequences derived from 16 different isolates of sars-cov and selected three novel sirna targeting sites in the n gene, including one targeting the 3' terminus of the gene. functional analysis indicates that all three novel sirnas are effective in downregulating n gene expression. moreover, our study demonstrates that the sars-cov n gene is able to up-regulate m gene mediated interferon β production, while the n gene specific sirnas can effectively reduce this up-regulation.
sars-cov n protein is a multi-functional protein that contributes greatly to virus-induced pathogenesis and often serves as a therapeutic target for the development of anti-viral drugs against sars-cov infection, including sirna. sars-cov is a positive single-stranded rna virus frequently associated with sequence alterations during viral transmission. an sirna should therefore be designed to target a conserved region of the targeted sequence. to this end, we randomly selected the n genes derived from 16 different isolates of sars-cov: hku-39849 (ay278491), tor2 (ay274119), bj02 (ay278487), hzs2-fb (ay394987), zj01 (ay297028), sin2748 (ay283797), shanghaiqxc1 (ay463059), shanghaiqxc2 (ay463060), cuhk-ag01 (ay345986), pumc01 (ay350750), gz-b (ay394978), tc1 (ay338174), gz-c (ay394979), zs-c (ay95003), lc1 (ay394998.1) and lc5 (ay395002.1).
clustalw analysis revealed that a total of five sequence alterations (two sequence deletions and three nucleotide substitutions) occurred in the n gene in these isolates (figure 1). interestingly, sequence alterations in the n gene are frequently found in the 5' portion of the gene (within the first 440nt), which is similar to our previous observation on the sars-cov m gene [25]. however, unlike the m gene mutations, which are single nucleotide substitutions, the n gene mutations tend to be larger in size, such as a twelve-nucleotide deletion (nt14-25) and a di-nucleotide substitution (nt419-420, figure 1). the n gene also possesses two additional single nucleotide substitutions (nt74 and nt1128) and one single nucleotide deletion (nt384). previously, a number of n gene-specific sirnas based on random selection have been reported by other groups [20] [21] [22] [23] [24]. sequence analysis revealed that all these sirnas are located within the first 859nt of the n gene, and no targeted site has been selected in the last 410nt. considering the nucleotide substitutions in the n gene as well as the previous reports, we chose three unexploited regions that were well conserved in the n genes among the 16 isolates of sars-cov. the selected targeted sites were at +213 to +233nt, +863 to +883nt and +1240 to +1260nt relative to the 5' atg initiation codon, and their respective sirnas were named si-n213, si-n863 and si-n1240 (figure 1). a known sirna (named si-n#16, previously no. 16 in reference [21]) that targets the 5' terminus (+7 to +27nt) of the n gene was also constructed for control purposes.
three novel sirnas (si-n213, si-n863 and si-n1240) targeting 213-233nt, 863-883nt and 1240-1260nt, respectively, are underlined. a known si-n#16 (no. 16) targeting 7-27nt reported by shi et al. [21] is also indicated.
the full length n gene cdna was amplified from sars-cov strain hku-39849 infected vero e6 cells.
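the relationship between the sequence alterations and the sirna target sites listed above can be checked with a simple interval comparison. the sketch below uses only the nucleotide positions given in the text; the dictionary and function names are hypothetical and introduced for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive nucleotide positions relative to the atg start

# Sequence alterations reported among the 16 isolates (positions from the text).
VARIABLE_SITES: Dict[str, Interval] = {
    "deletion nt14-25": (14, 25),
    "substitution nt74": (74, 74),
    "deletion nt384": (384, 384),
    "substitution nt419-420": (419, 420),
    "substitution nt1128": (1128, 1128),
}

# sirna target sites described in the text.
TARGET_SITES: Dict[str, Interval] = {
    "si-n#16": (7, 27),
    "si-n213": (213, 233),
    "si-n863": (863, 883),
    "si-n1240": (1240, 1260),
}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def variable_hits(target: Interval) -> List[str]:
    """Names of the reported alterations that fall inside a target site."""
    return [name for name, site in VARIABLE_SITES.items() if overlaps(target, site)]


if __name__ == "__main__":
    for name, site in TARGET_SITES.items():
        print(name, site, variable_hits(site) or "no overlap")
```

under these positions, only the control sirna si-n#16 covers a variable region (the nt14-25 deletion), while the three novel sites avoid all five reported alterations.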
after confirmation by sequencing analysis, the n gene was subcloned into the eukaryotic expression vector pcmv-myc to generate a pcmv-myc-np plasmid construct. the expression of the n gene was confirmed by rt-pcr and western blot analysis (figure 2a and 2b). to visualize the subcellular localization of n protein, the n gene was also fused with egfp to make a pegfp-np fusion gene construct. after transfection into hek293 cells, the egfp-np fusion protein was mainly distributed in the cytoplasm (figure 2c). alternatively, hek293 cells transfected with pcmv-myc-np were subjected to immunostaining analysis. in agreement with the above result, as well as a previous report [14], the myc-tagged np gene products were indeed predominantly expressed in the cytoplasm (figure 2d). to test whether the selected sirnas have an inhibitory effect on n gene expression, pegfp-np was co-transfected with increasing doses of either si-n213 or si-n863 into hek293 cells. figure 3 demonstrates that both si-n213 and si-n863 inhibited pegfp-np expression post-transcriptionally and functioned in a dose-dependent manner, indicating that both sirnas are effective in inhibiting the targeted mrna expression. pegfp-np was co-transfected with increasing doses of either pbs/u6-si-n213 or pbs/u6-si-n863 into hek293 cells. total rnas were isolated from the transfected cells and subjected to standard rt-pcr using n specific primers. beta actin expression served as the internal control. the effect of sirna on n gene repression was further confirmed by western blot analysis. figure 4a demonstrates a significant reduction in n protein expression as the ratio of n to si-n213 increased. quantitation of the band intensity revealed an approximately four-fold reduction in n protein expression upon co-transfection with higher doses of si-n213 (figure 4b). the specificity of si-n213 on n gene expression was further confirmed by using a non-specific sirna, si-m3, as a negative control.
si-m3 has been shown to be a potent inhibitor of sars-cov m gene expression [25, 26]. higher doses of si-n213, but not si-m3, dramatically inhibited egfp-np gene expression, indicating the specificity of si-n213 mediated n gene repression (figure 4c). quantitative analysis by flow cytometry further demonstrated that si-n213 could specifically and markedly reduce egfp-np gene expression (figure 4d). similarly, si-n863, which targets the 3' half of the n gene, also dramatically inhibited n protein expression by about four-fold when the molar ratio of si-n863:n reached 6:1 (figure 5a and 5b). the specificity of si-n863 mediated n gene repression was demonstrated by the fact that higher doses of si-n863, but not si-m3, dramatically inhibited egfp-np gene expression (figure 5c). the results were further confirmed by measuring the mean fluorescent intensity (mfi) of cells co-transfected with egfp-np and the indicated sirnas (figure 5d). sequence analysis reveals that all the n gene sirnas reported by other groups have recognition sites located within the first 859nt of the n gene [20] [21] [22] [23] [24]. therefore, to our knowledge, the last 410nt of the n gene has not been targeted by sirna. to this end, we created a third sirna that targets the 3' terminus (+1240 to +1260nt) of the n gene, and named it si-n1240. western blot analysis demonstrated that si-n1240 markedly inhibited myc-tagged n protein expression in a dose-dependent manner (figure 6a and 6b). in addition, pegfp-np was also co-transfected with increasing doses of either si-m3 or si-n1240 into targeted cells. the result shown in figure 6c clearly demonstrates that higher doses of si-n1240, but not si-m3, dramatically inhibited egfp-np gene expression.
the result was further confirmed by flow cytometric analysis, measuring the mean fluorescent intensity (mfi) of cells co-transfected with egfp-np and the indicated sirnas (figure 6d). overall, the above results provide strong evidence that all three novel sirnas (si-n213, si-n863 and si-n1240) are specific and effective inhibitors of the isolated sars-cov n gene expression. to assess the strength of si-n213, si-n863 and si-n1240 mediated n gene repression, we chose a known sirna, si-n#16 (no. 16), which has been shown to be a potent n gene inhibitor [21], as a positive control. about 1.4 μg of pcmv-myc-np was co-transfected with 4.2 μg of each plasmid (pbs/u6, si-n#16, si-n213, si-n863 and si-n1240) into targeted cells. the real time qrt-pcr result showed that si-n#16, si-n213, si-n863 and si-n1240 all induced significant inhibition of the isolated n gene expression compared with the vector control (figure 7). the quantitative analysis showed that the n gene inhibitions induced by si-n#16, si-n213, si-n863 and si-n1240 were about 4.0, 5.8, 19.6 and 3.6-fold, respectively (figure 7), indicating that si-n863 might be a more potent inhibitor of n gene expression. previously, we demonstrated that the sars-cov m gene could upregulate ifnβ gene expression in a transient transfection system [25]. also, it has been shown that ifnβ gene expression and transcription could be inhibited by sars-cov infection, probably due to the presence of n gene products [27]. to detect the influence of the n gene on m mediated ifnβ production, n and m were co-transfected into hek293 cells. interestingly, we found that the n gene did not inhibit, but rather enhanced, m gene mediated ifnβ mrna production (figure 8a). addition of si-n213, si-n863 or si-n1240 to the co-transfection system significantly reduced ifnβ mrna production (figure 8a).
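the fold-inhibition values reported for figure 7 above can be restated as percent knockdown, which makes the difference between si-n863 and the other sirnas easier to read. this is a small worked example that assumes fold inhibition means control expression divided by expression in the presence of sirna; the text does not state the exact definition, and the names below are hypothetical.

```python
# Fold inhibition relative to the vector control, as reported in the text (figure 7).
FOLD_INHIBITION = {
    "si-n#16": 4.0,
    "si-n213": 5.8,
    "si-n863": 19.6,
    "si-n1240": 3.6,
}


def percent_knockdown(fold: float) -> float:
    """Percent reduction in expression, assuming fold = control / treated."""
    if fold < 1.0:
        raise ValueError("fold inhibition below 1 means expression increased")
    return (1.0 - 1.0 / fold) * 100.0


if __name__ == "__main__":
    for name, fold in FOLD_INHIBITION.items():
        print(f"{name}: {fold}-fold, about {percent_knockdown(fold):.1f}% knockdown")
```

under this assumption, a 4.0-fold inhibition corresponds to 75% knockdown and a 19.6-fold inhibition to roughly 95%, so differences in fold values at the high end reflect small differences in residual expression.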
the semiquantitative rt-pcr result was further confirmed by real time qrt-pcr analysis (figure 8b), indicating that the identified n gene specific sirnas could functionally counteract the n gene mediated cellular processes.
human embryonic kidney cell line 293 (hek293) cells were obtained from the cell culture center of the institute of basic medical sciences, chinese academy of medical sciences. african green monkey kidney epithelial cell line vero e6 was provided by dr. k.y. yuen from the university of hong kong. cells were cultured in dulbecco's modified eagle medium (hyclone, south logan, ut, usa) supplemented with 10% fetal calf serum and incubated in a 37 °c incubator containing 5% co2. anti-myc and anti-actin antibodies were purchased from santa cruz biotechnology (santa cruz, ca, usa). horseradish peroxidase (hrp)-labeled goat anti-mouse igg and the enhanced chemiluminescence (ecl) detection kit were also obtained from santa cruz biotechnology. the sars-cov n gene was isolated from sars-cov strain hku-39849 [28] (provided by dr. k.y. yuen, the university of hong kong) by standard rt-pcr with a pair of primers: n5: 5'-tatagaattctgtctgataatggaccccaat-3' and n3: 5'-tataggtaccttatgcctgagttgaatcag-3'. the amplified full length n gene was first subcloned into the pgem-t easy vector. after confirmation by sequencing analysis, the n gene products were released by ecori/kpni double digestion and then subcloned into the respective sites of pcmv-myc to generate pcmv-myc-n. the n gene products were also inserted into the ecori/kpni sites of the pegfp-n1 vector to form the pegfp-np fusion gene construct. for construction of n gene specific sirnas, three novel targeted sites (213-233nt, 863-883nt and 1240-1260nt) in the n gene coding sequence were selected.
two oligos for each targeted site were synthesized: sin-213-f, 5'-gggcgttccaatcaacaccaaaagcttttggtgttgattggaacgccctttttg-3' and sin-213-re, 5'-aattcaaaaagggcgttccaatcaacaccaaaagcttttggtgttgattggaacgccc-3'; sin-863-f, 5'-gggaccaagacctaatcagacaagcttgtctgattaggtcttggtccctttttg-3' and sin-863-re, 5'-aattcaaaaagggaccaagacctaatcagacaagcttgtctgattaggtcttggtccc-3'; sin-1240-f, 5'-ggagcttctgctgattcaactaagcttagttgaatcagcagaagctcctttttg-3' and sin-1240-re, 5'-aattcaaaaaggagcttctgctgattcaactaagcttagttgaatcagcagaagctcc-3'. the oligos sin-213-f and sin-213-re, sin-863-f and sin-863-re, and sin-1240-f and sin-1240-re were annealed pairwise to form duplexes. to construct the sirna targeting the 5' terminus of the n gene (named si-no16, targeting 7-27nt) as described by shi et al. [21], two synthesized oligos, 5'-gataatggaccccaatcaaacaagcttgtttgattggggtccattatctttttg-3' and 5'-aattcaaaaagataatggaccccaatcaaacaagcttgtttgattggggtccattatc-3', were also annealed. the duplex products were then subcloned into pbs/u6 [29] (provided by dr. yang shi, harvard medical school) to form pbs/u6-sin213, pbs/u6-sin863, pbs/u6-sin1240 and pbs/u6-no16, respectively.
total rnas were extracted from the cultured cells with trizol (invitrogen, carlsbad, ca). all primers used in the rt-pcr reactions are listed in table 1. one μg of total rna was first reverse transcribed using amv reverse transcriptase (promega, usa). about 2 μl of the transcribed cdna was subjected to standard pcr using n gene specific primers. one-step real-time quantitative rt-pcr (qrt-pcr) (takara biotechnology, dalian, china) was also performed to monitor the targeted gene expression. real time qrt-pcr was carried out with the iq5 real-time pcr detection system (bio-rad laboratories) under the following conditions: 42 °c for 5 min and 95 °c for 10 sec, followed by 40 cycles of 95 °c for 5 sec and 60 °c for 10 sec.
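the hairpin oligos listed above share a regular layout that can be reproduced from the 21nt target sequence alone. the sketch below is inferred from the oligo sequences themselves rather than from a stated design rule: the forward oligo reads as target, an aagctt loop, the reverse complement of the target, and a tttttg tail, and the reverse oligo reads as aatt followed by the reverse complement of the forward oligo. the constant and function names are hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("acgt", "tgca")
LOOP = "aagctt"        # palindromic spacer seen in every listed oligo
TERMINATOR = "tttttg"  # tail seen at the 3' end of every forward oligo
OVERHANG = "aatt"      # 5' extension seen on every reverse oligo


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase dna sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_oligos(target: str) -> Tuple[str, str]:
    """Build forward and reverse hairpin oligos for a 21nt target (inferred layout)."""
    if len(target) != 21 or set(target) - set("acgt"):
        raise ValueError("target must be 21 nt of a, c, g, t")
    forward = target + LOOP + reverse_complement(target) + TERMINATOR
    reverse = OVERHANG + reverse_complement(forward)
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = hairpin_oligos("gggcgttccaatcaacaccaa")  # si-n213 target
    print("forward:", fwd)
    print("reverse:", rev)
```

applied to the first 21nt of sin-213-f, this layout regenerates both sin-213-f and sin-213-re as printed in the text, which is a quick way to check an oligo list for transcription errors before ordering.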
the dissociation of the reaction products was conducted from 55 °c to 95 °c, with the temperature rising at 0.2 °c per ten seconds.
cells cultured in 35-mm dishes were transiently transfected with the indicated plasmid dnas using the profection ® mammalian transfection system (promega, usa) according to the manufacturer's instructions. briefly, transfected dnas were first mixed with 37 μl of 2 m cacl2 and brought to a total of 300 μl with sterile deionized water. then the dna-cacl2 mixture was added drop by drop into an equal volume of 2×hbs with gentle vortexing. after 15 minutes of incubation, the reaction mixture was evenly distributed into the cell culture medium and incubated for 48 hours before harvesting.
the transfected cells were lysed with a lysis buffer containing 1% np-40, 50 mm tris-hcl (ph 7.5), 120 mm nacl, 200 μm navo4, 1 μg/ml leupeptin, 1 μg/ml aprotinin, and 1 m pmsf. about 15 μg of cell lysate for each sample was resolved by 12% sds-page. after separation, the proteins were transferred onto a hybond nitrocellulose membrane (pharmacia). the membrane was first probed with a primary antibody. then, a secondary antibody labeled with horseradish peroxidase was added, and the signal was finally visualized with an ecl kit.
the sars-cov n gene has long been selected as one of the major targets for sirna design. however, genomic alterations such as nucleotide deletion and substitution frequently occur in sars-cov. this type of change has the potential to weaken the inhibitory effect induced by sirna. in the current study we analyzed the n gene sequences derived from 16 different isolates and found that, in addition to single nucleotide substitutions, the n gene also possesses a longer nucleotide deletion and a dinucleotide substitution. previous studies on m gene specific sirnas indicate that the sirna targeting site and its surrounding sequences may influence the inhibitory effect [25].
the current study further supports this observation by showing that si-n#16, which targets a twelve-nucleotide deletion region in the n gene of one viral strain, generates a weaker inhibitory effect than either si-n213 or si-n863 (figure 7b). although n gene specific sirna has been intensively studied, no sirna had been reported for the last 410nt of the sequence. in this study, we created and tested a third sirna (si-n1240), which effectively targeted the 3' terminal sequence of the n gene and downregulated n gene expression. interestingly, the inhibitory effect induced by si-n1240 was similar to that of si-n#16, implying that sirnas targeting either the 5' or the 3' terminal sequence of the isolated n gene might induce less inhibition. we also note a possible discrepancy between our study and a previous report by shi et al., in which si-n#16 (no. 16) was the strongest inhibitor among the eleven n gene sirnas tested [21]. this discrepancy might be due to: 1) the potential structural difference between vector expressed n mrna and virus derived subgenomic n mrnas and/or 2) the differences between the expression system used by us (pbs/u6) and that used by shi et al. (chemically synthesized sirnas). finally, we provide evidence that sars-cov n gene products were able to up-regulate m gene mediated ifnβ production, while n gene specific sirna could functionally reduce this enhancement. however, the mechanism underlying n gene mediated ifnβ production remains an interesting point to be further addressed.
the genome sequence of the sars-associated coronavirus
characterization of a novel coronavirus associated with severe acute respiratory syndrome
comparative full-length genome sequence analysis of 14 sars coronavirus isolates and common mutations associated with putative origins of infection
dissection and identification of regions required to form pseudoparticles by the interaction between the nucleocapsid (n) and membrane (m) proteins of sars coronavirus
a major determinant for membrane protein interaction localizes to the carboxy-terminal domain of the mouse coronavirus nucleocapsid protein
characterization of the coronavirus m protein and nucleocapsid interaction in infected cells
structure of the sars coronavirus nucleocapsid protein rna-binding dimerization domain suggests a mechanism for helical packaging of viral rna
modular organization of sars coronavirus nucleocapsid protein
biochemical and immunological studies of nucleocapsid proteins of severe acute respiratory syndrome and 229e human coronaviruses
structure of the n-terminal rna-binding domain of the sars cov nucleocapsid protein
crystal structure of the severe acute respiratory syndrome (sars) coronavirus nucleocapsid protein dimerization domain reveals evolutionary linkage between corona- and arteriviridae
carboxyl terminus of severe acute respiratory syndrome coronavirus nucleocapsid protein: self-association analysis and nucleic acid binding characterization
multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging
the sars-cov nucleocapsid protein: a protein with multifarious activities
nucleocapsid protein of sars-cov activates the expression of cyclooxygenase-2 by binding directly to regulatory elements for nuclear factor-kappa b and ccaat/enhancer binding protein
nucleocapsid protein of sars-cov activates interleukin-6 expression through cellular transcription factor nf-kappab
the sars coronavirus nucleocapsid protein induces actin reorganization and apoptosis in cos-1 cells in the absence of growth factors
m and n proteins of sars coronavirus induce apoptosis in hpf cells
severe acute respiratory syndrome coronavirus open reading frame (orf) 3b, orf 6, and nucleocapsid proteins function as interferon antagonists
prophylactic and therapeutic effects of small interfering rna targeting sars-coronavirus
inhibition of genes expression of sars coronavirus by synthetic small interfering rnas
small interfering rna inhibits sars-cov nucleocapsid gene expression in cultured cells and mouse muscles
kinetics and synergistic effects of sirnas targeting structural and replicase genes of sars-associated coronavirus
inhibition of sars-cov gene expression by adenovirus-delivered small hairpin rna
small interfering rna effectively inhibits the expression of sars coronavirus membrane gene at two novel targeting sites
sirnas targeting terminal sequences of the sars-associated coronavirus membrane gene inhibit m protein expression through degradation of m mrna
inhibition of beta interferon induction by severe acute respiratory syndrome coronavirus suggests a two-step model for activation of interferon regulatory factor 3
the complete genome sequence of severe acute respiratory syndrome coronavirus strain hku-39849 (hk-39)
a dna vector-based rnai technology to suppress gene expression in mammalian cells
this article is an open access article distributed under the terms and conditions of the creative commons attribution license
key: cord-273347-eyxc4rt0 authors: mohammadinejad, reza; dehshahri, ali; madamsetty, vijay sagar; zahmatkeshan, masoumeh; tavakol, shima; makvandi, pooyan; khorsandi, danial; pardakhty, abbas; ashrafizadeh, milad; afshar, elham ghasemipour; zarrabi, ali title: in vivo gene delivery mediated by non-viral vectors for cancer therapy date: 2020-07-04 journal: j control release doi: 10.1016/j.jconrel.2020.06.038 sha: doc_id: 273347 
cord_uid: eyxc4rt0 gene therapy by expression constructs or down-regulation of certain genes has shown great potential for the treatment of various diseases. the wide clinical application of nucleic acid materials depends on the development of biocompatible gene carriers. a wide variety of compounds have been investigated as non-viral gene carriers, including lipids, polymers, carbon materials, and inorganic structures. in this review, we will discuss the recent discoveries on non-viral gene delivery systems. we will also highlight in vivo gene delivery mediated by non-viral vectors to treat cancer in different tissues and organs, including brain, breast, lung, liver, stomach, and prostate. finally, we will delineate the state of the art and the promising perspectives of in vivo gene editing using non-viral nano-vectors. since the elucidation of the molecular mechanisms of several diseases along with the discovery of nucleic acid structure, the replacement of defective genes with functional versions has been considered as a new therapeutic paradigm called "gene therapy" (1, 2). gene therapy is carried out by expression constructs in order to increase the production of specific proteins inside the cells. on the other hand, down-regulation of specific genes has shown great potential for the treatment of various diseases (3). therefore, the modulation or silencing of such genes using accessed by the ribosome for the production of proteins. on the other hand, mrna directly enters the cytoplasm and interacts with the ribosome for protein production. these unique properties have made mrna a potential candidate not only for gene therapy but also for vaccine development, particularly for immunization against widespread viruses including sars-cov-2 (36). 
however, the major concerns regarding the application of mrna for gene therapy are its unstable nature and the existence of degrading enzymes such as rnases in the extra- and intra-cellular environments (37, 38). to overcome these problems, new developments, including snim (stabilized non-immunogenic mrna), have been introduced in which modified nucleotides are incorporated into the mrna structure to increase its stability and reduce its immunogenicity (4, 39-41). the aim of gene therapy is not just to increase the expression of a certain gene, as was expected in previous decades. there are several pathological conditions related to gene over-expression. in such conditions, the goal of gene therapy would be silencing the target genes. the knock-down of such genes could be achieved by different nucleic acid materials, including antisense and sirna. it must be considered that there are some differences between gene therapy and oligonucleotide therapy (42). oligonucleotide-based medications such as antisense do not need the transcriptional and translational machinery of the cells, while conventional gene therapy is based on the replacement of defective genes by functional ones as well as the introduction of a new gene into the cells, including germline or somatic cells (43). antisense technology is defined as a powerful tool to down-regulate a specific gene by transferring into the cells an antisense strand able to interact with the sense strand. the base pairing between the sense and antisense strands results in a translational block (44, 45). on the other hand, rnai technology employs several enzymes (e.g., dicer) and proteins (e.g., risc complex) to interfere with the layer on the surface of the carriers reduces the risk of aggregation and increases colloidal stability. 
the reduction of the interaction between the stealth gene carriers and serum components reduces the recognition of the vehicles by the mononuclear phagocyte system (mps), including macrophages, which in turn leads to enhanced circulation time (58). in order to direct the carriers to the precise site of action, smart gene carriers have been designed. these carriers can be targeted to specific receptors by the conjugation of small molecules as well as macromolecules, including monoclonal antibodies or aptamers (59-61). once the nano-carriers reach the cells, they may enter the endosomal compartment, which degrades the nucleic acid therapeutics and leads to failed transfection. hence, the promotion of the proton sponge effect or the conjugation of membrane fusogenic compounds could be considered as effective strategies to overcome the endo/lysosomal barrier (62, 63). while the sirna site of action is the cytosolic environment, plasmids must be able to cross the nuclear barrier. it has been shown that molecules with a molecular mass of 40-70 kda (10-25 nm) are able to passively diffuse via nuclear pores. however, the exact mechanism of nuclear entry is not completely understood (4, 64). it is not clear whether the polyplexes undergo vector unpackaging outside the nucleus or the transcriptional machinery of the cell dissociates the nucleic acids from the carrier inside the nucleus. regardless of the mechanism, it has been demonstrated that the cell cycle may have a crucial impact on the cell entry. cells in the s/g2 phases have shown the highest transfection efficiency. however, most cells are not in the dividing phase in vivo; therefore, alternative approaches, including the conjugation of nuclear localization signals (nls), must be employed to increase nuclear entry (65, 66). the real value of these important findings is dependent on their translation to clinical application. 
the approval of patisiran (onpattro®) as the first fda-approved sirna-based therapeutic for hereditary transthyretin-mediated (hattr) amyloidosis opened up new horizons for scientists seeking efficient delivery systems that enable nucleic acids to be used as therapeutic agents. patisiran has been formulated as lipid nanoparticles (nps) and is administered by intravenous infusion, while the second approval for sirna-based therapeutics belongs to givosiran (givlaari®) (67-69). givosiran has been prepared as n-acetylgalactosamine (galnac) conjugated sirna and is administered subcutaneously. the first polymer-based gene therapy investigation in humans was carried out with a transferrin-polylysine (adenovirus-enhanced transferrinfection; avet) carrier in order to transfer the plasmid encoding the il-2 gene for the treatment of melanoma (70). in the first-ever human study of polyplexes, ex vivo gene transfer was performed to deliver the plasmid dna into the patient cells. peg-conjugated polylysine was used to transfer pdna to treat cystic fibrosis as a nasal drug delivery system (16). in another study to design a vaccine for hiv, mannose-conjugated pei was prepared as the carrier for a plasmid encoding various hiv antigens and used as a dermal formulation in a human clinical trial (71). the intraperitoneal injection of peg-pei-cholesterol to transfer il-12 plasmid was also used for ovarian cancer treatment (72). the intravenous injection of transferrin-cyclodextrin oligocation complexed with sirna to silence ribonucleotide reductase m2 (rrm2) was applied in various solid tumors (22). since various routes of administration have been used to transfer non-viral delivery systems for gene therapy, it seems that the route is highly dependent on the characteristics of the carrier and nucleic acids as well as the prepared complex and the final formulation. 
it seems that there is no strict limitation to a specific route of administration for non-viral gene delivery carriers, at least in theory (table 2). despite advances in chemotherapy, surgery, and radiation therapy, lung cancer is one of the leading causes of cancer-related deaths globally (148, 149). even though there is some initial response to present conventional chemotherapy, patients develop resistance and exhibit poor survival with prolonged usage (150). several attempts to improve the survival of lung cancer patients using various combination therapies have shown no further improvement, suggesting the need for specific, less toxic treatment approaches such as genetic alterations. tumor suppressor genes and oncogenes are the two major genetic factors affecting the progression of the disease (151, 152). hence, altering these specific genes can advance the therapeutic benefit of present therapies (153). numerous gene therapy strategies have been adopted, such as the deletion of oncogenes, immune stimulation, replacement of tumor-suppressor genes and transfer of genes that enhance conventional treatments (154). here, there are several other non-viral vectors used for the delivery of various nucleic acid materials for lung cancer (166-173). another common genetic alteration in lung cancer is associated with the tumor suppressor genes. for example, the tumor suppressor gene tusc2/fus1 (tusc2) is inactivated in lung cancer. however, no drug development approach is available for targeting loss-of-function genetic deviations. roth ja and his team developed a systemic gene therapy approach by using a tusc2-expressing plasmid vector packaged in dotap:chol nanovesicles. they found that following tumor treatment with dc-tusc2, some major changes occurred in the intrinsic pro-apoptotic pathway (174, 175). 
these nanovesicles were administered intravenously in patients with lung cancer, and the results showed that tusc2 genes were delivered safely to both primary and metastatic human tumors (176). among the existing polymeric transporters, pei has been the most widely exploited to transfer genes for both in vitro and in vivo transfection. for example, scientists used pei to develop a ph-sensitive in vivo selective gene delivery system to transfer p53 dna to the tumor site. a single administration of the p53 dna nanocomplex along with laser radiation significantly inhibited tumor growth and prolonged median survival (177). gold nps were also used to deliver p53 dna to lung cancer cells (178). several other studies also demonstrated that p53-based gene delivery is able to improve the therapeutic outcome for lung cancer (179-182). in summary, based on these there are several strategies to treat breast cancers based on the severity and the mechanisms involved in the pathogenesis including autophagy and apoptosis (183). although there are cxcr4sirna and an rna-triple-helix in the hydrogels nps without synthetic polycationic reagents for the treatment of breast cancer (189). amorphous calcium carbonate fusion nanospheres fabricated with caip6 nps were efficient in carrying genes to the tumor site. scientists showed that akt1 sirna loaded caco3/caip6 nanocomplexes substantially inhibited tumor growth (190). similarly, a polypeptide containing lah4-l1-simdr1 loaded nanocomplexes displayed significant tumor growth inhibition when used along with ptx. in this study, high mdr1 gene silencing efficacy was observed in tumor-bearing nude mice (98). enormous efforts are still underway to develop novel and effective gene delivery systems based on biocompatible nanomaterials to transfer the target genes to the tumor site (167, 191, 192). 
for example, researchers have developed an elastin-like recombinant (elr) and specific muc1 aptamers for intracellular delivery of the muc1 gene to breast tumors (193). more recently, the same group developed a double protection tumor-specific nanomaterial device for gene therapy in breast cancer (86). functionalized peptides/ligands can also improve the delivery of nucleic acid-complexed nps to tumors (95, 194, 195). (196). cell-penetrating peptide (cpp)-containing and egfr-sirna loaded nanobubbles showed synergism with ultrasound irradiation mediated egfr-sirna delivery to tnbc (197). zhou et al. also developed cd105-conjugated targeted cationic microbubbles for antiangiogenesis gene therapy for breast cancer (198). similarly, endostatin loaded and cd105 antibody conjugated immunoliposomes were prepared for antiangiogenic and imaging therapy (199). gu nanocomplexes. this nanocomplex could inhibit tumor growth synergistically and prolonged the survival of mice bearing drug-resistant breast tumors (202). in another similar study, the investigators showed that the co-delivery of p53 dna and avpi peptide enabled a complete arrest of tumor growth when used in combination with a reduced dose of dox. in their study, they modified the avpi peptide not only to enable it to penetrate tumor cells but also to act as a gene delivery vehicle by forming a nanocomplex with a cationic r8 moiety (203). there are several studies demonstrating that p53-mediated gene therapy for breast cancer treatment is an efficient approach in cancer gene therapy (204, 205). overall, the combination of chemotherapy with gene therapy may enhance the therapeutic effects against breast cancer. there are other categorization methods for brain tumors including primary and secondary tumors. 
primary tumors originate from meninges, glands, nerves and other brain cells, while secondary tumors originate from other parts of the body and spread to the brain (206). the most common brain cancers are glioma, neuroblastoma, meningioma, vestibular schwannoma and pituitary adenoma. brain tumors can be primarily diagnosed using mri, ct scan, angiography, skull x-ray and biopsy. despite enormous advances in the fields of pharmaceutics and radiotherapy, brain cancers cannot be completely cured. polymer-based carriers are considered among the most effective carriers in drug delivery (207-211). moreover, gene delivery is considered a hopeful strategy for brain cancer treatment. one of the most important obstacles in brain drug delivery is the blood-brain barrier (bbb). therefore, there are many efforts to overcome this barrier, including functionalization and modification of non-viral gene delivery vectors (212, 213). the modification leads to the transcytosis and endocytosis of vectors through cell-penetrating peptide (cpp) mediated transmembrane transport, adsorptive-mediated endocytosis and receptor-mediated endocytosis (214). there are some receptors on the surface of it seems that r7l10 is safer than pei and conjugation with epo enhances its gene and drug delivery efficacy under hypoxic conditions (107). dendrimers have been considered as effective drug delivery carriers, and polyamidoamine (pamam) is one of the most well-known dendrimers in drug delivery. it seems that primary and tertiary amines in the dendrimer play a critical role in dna condensation and release (226). however, there are controversial reports on the safety of dendrimers owing to their positive surface charge, especially for g2-g4 dendrimers (227, 228). pegylated lactoferrin-dendrimer-dna has shown significantly less toxicity and higher transfection efficacy than non-pegylated ones. 
interestingly, they showed that brain uptake and transfection efficacy of the lactoferrin conjugated complexes were significantly higher than the transferrin substituted enhances cell viability, rat survival and vegf marker while decreases apoptosis as compared to the polyplex-trail, polyplex-hsv-tk and pbs in glioma-bearing sd rats (99). however, the complex containing sv-tk with erythropoietin and nestin intron 2 (ni2) showed that its complexation with reducible poly oligo d-arginine has significantly less cytotoxicity than pei, even under hypoxic conditions. furthermore, the polyplex induced significantly higher apoptosis and a greater decrease in tumor size in an intracranial glioblastoma rat model (106). overall, targeting strategies might be considered a prerequisite for non-viral vectors used for brain gene therapy. gastrointestinal cancers including colorectal and gastric cancers. these nanocarriers have been employed as delivery vehicles for rna silencing of oncogenes and for dna delivery of tumor suppressors, apoptosis inducers, suicide genes or immune-stimulatory molecules. colorectal cancer is the third most deadly diagnosed cancer in the world due to its metastasis tumor-bearing nude mice (266). in addition to the various materials used for the delivery of nucleic acids for colorectal carcinoma, electrotransfection is a promising route for facilitated delivery of genes into the target cells. gastric cancer is the second most malignant cancer worldwide, with a poor five-year survival of 30% (267). a range of nanoparticulate systems have been investigated for efficient and safe delivery of genes to gastric cancer models. for example, calcium phosphate nps (cpnps) were used to deliver a novel fusion suicide gene, ycdglytk, which is regulated by a cancer-specific cea promoter and a cmv enhancer (cv) (268-270). 
it was observed that cpnps specifically hepatocellular carcinoma (hcc) is another deadly cancer worldwide due to late diagnosis and impaired and insufficient treatments. therefore, it is necessary to develop carriers with enhanced targeting specificity, improved efficiency and safety (276, 277). its promoter is associated with the risk of several cancers including hcc. it can also be a predictive factor for poor hcc prognosis. in vivo efforts to re-express rassf1a have shown the arrest of hcc growth as well as improved sensitivity of hcc cells to mitomycin (282). plasmid dna was used as a reporter gene. the average hydrodynamic size and zeta potential of the carrier system at a c/p ratio of 25 were 157 ± 3 nm and +18 ± 0.3 mv, respectively. the nanovehicle was intratumorally injected into subcutaneous huh-7 xenografts in athymic nude mice. it was suggested that biodegradable 536 nps would also be appropriate for systemic or transarterial delivery due to their small size, which allows preferential localization in tumors through the epr effect (126). in another effort for hcc gene therapy, a multifunctional np targeted to hcc was designed to deliver the trail gene in mice (251). these self-assembled lipid-bilayer structures (lcpp nps) are composed of a calcium phosphate (cap) and protamine core, which act as a ph stimuli-responsive agent and a trail nuclear localization agent, respectively. moreover, the ca ions released from cap reverse the trail resistance. hcc-targeting peptide (sp94) was also used for for hepatocellular carcinoma treatment. the results showed that the treatment of the cells with such a system combined with ultrasound irradiation increased the mir-122 expression level by 30-fold in human hcc xenografts (130). hence, these methods have shown potential for further studies to develop safe and efficient gene therapy approaches. 
prostate cancer is the fourth most common cancer and the second most extensive cancer in males, leading to the mortality of around 300,000 individuals per year. almost 200,000 new patients have been diagnosed per annum. the late diagnosis of prostate cancer is the primary cause of death (290, 291). based on the stage and severity of the tumor, different treatments can be suggested to the patient, including prostatectomy, radiotherapy, hormone therapy, chemotherapy, gene therapy, and a combination of them. the most recent procedure is gene therapy, which is mainly initiated by transferring a new gene to achieve destruction or fixation of cancerous cells (292-294). transferrin and lactoferrin are two iron-binding proteins that are widely used as targeting ligands for prostate cancers (295, 296). another promising approach for prostate targeting is using the integrins that can be attached to the extracellular matrix of the prostate cancer microenvironment. integrin receptors are supposed to be over-expressed on prostate cancer cells (297, 298). prostate-specific membrane antigen (psma), integrins, and prostate stem cell antigen (psca) are glycoproteins that could be targeted by various ligands (299, 300). chemotherapy is the most common treatment of cancers, but it has various challenges and side effects, including the lack of selectivity for cancer cells and toxicity to healthy cells (305). different approaches such as gene therapy and combination therapy have been suggested to circumvent these limitations (111, 285, 306). combination therapy may decrease the toxicity of each agent by reducing the individual drug-related dose. in this field, co-delivery of drug and gene-based nps has attracted more attention (244, 245). co-delivery of doxorubicin and sirna against p-glycoprotein. 
it has been observed that the resulting synergistic effect is even more efficient than co-treatment with chemotherapeutics and sirna (116, 193, 244, 285). only in these systems, cas9 protein is an essential compartment for dna interference. generally, this system contains a nuclease protein (cas9) and a guide rna (grna) (213). since the grna could be replaced by sgrna (synthetic chimeric single guide rna), the cas9 protein could be directed to the target site using sgrna, which consequently leads to the induction of double-stranded dna breaks (dsbs). finally, the major repair pathways of the cells are responsible for inducing the alterations. this simple, robust, user-friendly, specific, and efficient system has enabled researchers to create models for various diseases as well as novel therapeutic approaches (314-317). generally, there are three different approaches for crispr/cas9 delivery (318). the ultimate goal is to transfer the whole system into the cells. however, the ribonucleoprotein complex could since the difficulties for efficient delivery of cas9 protein reduces the transfection efficiency of sgrna and cas9 protein, the alternative strategy is to use the cas9 mrna with sgrna. for efficient delivery of cas9 protein and sgrna, the delivery platform must be able to transfer a large positively charged protein (cas9) and a negatively charged nucleic acid (sgrna) together. designing such delivery systems is not simple. the second approach involves the delivery of two rna molecules with similar biophysical properties, which facilitates the design of delivery systems. besides, cas9 mrna introduced into cells does not need to enter the cell nucleus for subsequent transcription. therefore, the main advantage of this approach is the quick onset of action. 
the transient expression of cas9 mrna in the cytosol along with the quick onset of cas9 action makes this approach an attractive way for researchers to reduce the off-target effects associated with the long-term presence of cas9 protein inside the cells. however, the low stability of mrna is the major hampering factor for this delivery method. various non-viral delivery strategies have been employed to transfer cas9 mrna together with the sgrna, including zwitterionic aminolipid nps (332-335) and branched-tail lipid nps. since there are several problems with the efficient delivery of cas9 protein, cas9 mrna and sgrna, a third method has been introduced, which involves the design of a plasmid encoding cas9 and sgrna inside the cells. the stability of plasmid-based crispr/cas9 systems is considerably higher than that of protein or mrna, making these systems more attractive for in vivo applications. however, there are several major obstacles reducing their clinical applications. this system must be able to cross the nuclear membrane and access the transcriptional machinery of the cells. since the transcription of the plasmids and the production of cas9 protein and sgrna need more time than the direct introduction of these macromolecules into the cells, a delay in the onset of therapeutic action is expected. in addition, the off-target effects associated with the long-term production of cas9 protein are more probable than with the previous methods. also, the risk of the integration of plasmid into the genomic materials may reduce their potential for in recent decades, various oligonucleotide-based therapeutics have been introduced for human clinical applications. this novel category of therapeutic materials includes antisense oligonucleotides and aptamers as well as sirna-based medications. the clinical applications of these new drugs are the result of breakthrough discoveries in molecular biology. 
however, the translation of these achievements to clinical applications is substantially dependent on the development of efficient and safe delivery systems. an optimized delivery system for nucleic acids should be able to form a stable structure outside the cells and release the payloads at the specific site of action. in addition, the toxicity of the delivery vehicle must be tolerable by human cells. the biophysical properties and the pharmacokinetic characteristics of the vehicles are the other significant points which determine the potential of a delivery system for human applications. in order to improve these properties, stealth technology using various materials such as peg and targeting strategies have been introduced. using these approaches, the biophysical characteristics of the carriers could be modified and their pharmacokinetic properties might be improved. generally, polymer and dendrimer-based delivery systems have shown higher transfection efficiency (5, 25, 53). however, their toxicity is the major concern for further development towards clinical applications. for these carrier systems, the main modification strategy is focused on the reduction of cytotoxicity through the modulation of cationic charge or the design of biodegradable polycationic compounds. in addition, these materials suffer from low targetability for specific cells or tissues (346). therefore, the addition of targeting moieties to these materials could be considered as an effective way to improve their properties. these materials are appropriate delivery systems for the formation of complexes based on the electrostatic interaction between the nucleic acid and carrier. on the other hand, lipid-based carriers have demonstrated higher biocompatibility than the polymeric delivery systems (347, 348). these delivery systems have shown great potential for clinical applications due to their low toxicity. 
however, the transfection efficiency of such materials is generally lower than that of the polymeric compounds. therefore, the major approaches to improve the properties of these vehicles are focused on the augmentation of their transfection efficiency. similar to the polymeric delivery systems, lipid-based materials need targeting moieties for efficient transfer of nucleic acids to the target cells or organs. although the toxicity of lipid-based delivery systems is lower than that of the polycationic polymers or dendrimers, they may induce inflammatory responses following systemic administration. the translation of these materials to commercial application needs a scalable production process which leads to commercial products with the highest batch-to-batch uniformity. the most recent clinical trial on the application of mrna as a potential vaccine for sars-cov-2 has been conducted with lnps, which shows the importance of this category of delivery system for human application (36).
prospects and problems of gene therapy: an update recombinant adeno-associated virus-mediated alpha-1 antitrypsin gene therapy prevents type i diabetes in nod mice therapeutic and biological activities of berberine: the involvement of nrf2 signaling pathway comparison of the effectiveness of polyethylenimine, polyamidoamine and chitosan in transferring plasmid encoding interleukin-12 gene into hepatocytes gene therapy clinical trials worldwide to 2017: an update progress and problems with the use of viral vectors for gene therapy a tragic setback after glybera's withdrawal, what's next for gene therapy? accumulation of sub single-stranded dna-packaged polyplex micelle as adeno-associated-virus-inspired compact vector to systemically target stroma-rich pancreatic cancer virus treatment questioned after gene therapy death polymer systems for gene delivery-past, present, and future. 
progress in polymer science cationic polymer based gene delivery systems design and development of polymers for gene delivery evaluation of cellular uptake and gene transfer efficiency of pegylated poly-l-lysine compacted dna: implications for cancer gene therapy regulatory sequences of the h19 gene in dna based therapy of bladder cancer current issues of rnai therapeutics delivery and development treatment of disseminated ovarian cancer using nonv k -12 v mannose polyethylenimine conjugates for targeted dna delivery into dendritic cells the first targeted delivery of sirna in humans via a self-assembling, cyclodextrin polymer-based nanoparticle: from concept to clinic shedding light on gene therapy: carbon dots for the minimally invasive image-guided delivery of plasmids and noncoding rnas-a review preparation, characterization and transfection efficiency of cationic pegylated pla nanoparticles as gene delivery systems conjugation of poly (amidoamine) dendrimers with various plant virus nanoparticles: novel and robust nanocarriers for drug delivery and imaging cancer gene therapy using plasmid dna: purification of dna for human clinical trials advances in non-viral dna vectors for gene therapy preparation, characterization, and transfection efficiency of low molecular weight polyethylenimine-based nanoparticles for delivery of the plasmid encoding cd200 gene modified polyethylenimine: self assemble nanoparticle forming polymer for pdna delivery plasmid dna delivery into hepatocytes using a multifunctional nanocarrier based on sugar-conjugated polyethylenimine systemic therapeutic gene delivery for cancer: crafting paris' arrow new amphipatic polymer-lipid conjugates forming longcirculating reticuloendothelial system-evading liposomes evaluation of circulation profiles of liposomes coated with hydrophilic polymers having different molecular weights in rats phosphatidyl polyglycerols prolong liposome circulation in vivo sheddable coatings for long-circulating 
nanoparticles plasmid dna delivery using l-thyroxine-conjugated polyethylenimine nanocarriers tetraiodothyroacetic acid-conjugated polyethylenimine for integrin receptor mediated delivery of the plasmid encoding il-12 gene double domain polyethylenimine-based nanoparticles for integrin receptor mediated delivery of plasmid dna polyplex evolution: understanding biology, optimizing performance absence of na+, k (+)-atpase regulation of endosomal acidification in k562 erythroleukemia cells polyethylenimine: a versatile, multifunctional non-viral vector for nucleic acid delivery nanoparticles of compacted dna transfect postmitotic cells uptake pathways and subsequent intracellular trafficking in nonviral gene delivery clinical advances of sirna therapeutics strategies, design, and chemistry in sirna delivery systems oligonucleotides to the (gene) rescue: fda approvals phase-i clinical trial of il-12 stimuli-responsive release and efficient sirna delivery in non-small cell lung cancer by a poly(l-histidine)-based multifunctional nanoplatform ultrasound-microbubbles-mediated micrornaand therapy synthesis and application of a novel gene delivery vector for non-small-cell lung cancer therapy liposome mediated-cyp1a1 cancer toxicological exploration of peptide-based cationic liposomes in sirna delivery chloroquine in combination with aptamer-modified nanocomplexes for tumor vessel normalization and efficient erlotinib/survivin shrna co-delivery to overcome drug resistance in egfr-mutated non-small cell lung cancer tumor-targeting anti-microrna-155 delivery based on biodegradable poly(ester amine) and hyaluronic acid shielding for lung cancer therapy liposomes co-loaded with 6-phosphofructo-2-kinase/fructose-2, 6-biphosphatase 3 (pfkfb3) shrna plasmid and docetaxel for the treatment of non-small cell lung cancer lung cancer gene therapy: transferrin and hyaluronic acid dual liganddecorated novel lipid carriers for targeted gene delivery knockdown of importin 7 inhibits lung 
tumorigenesis in k-rasla1 lung cancer mice treatment of breast cancer with autophagy inhibitory micrornas carried by ago2-conjugated nanoparticles polycation-carbon nanohybrids with superior rough hollow morphology for the nir-ii responsive multimodal therapy effect of protein corona on the transfection efficiency of lipid-coated graphene oxide-based cell transfection reagents a double safety lock tumor-specific device for suicide gene therapy in breast cancer a self-assembled rna-triple helix hydrogel drug delivery system targeting triple-negative breast cancer peptide-targeted polyplexes for aerosol-mediated gene delivery to cd49f-overexpressing tumor lesions in lung a flexible bowl-shaped magnetic assembly for multifunctional gene delivery systems novel facile thermosensitive hydrogel as sustained and controllable gene release vehicle for breast cancer treatment regulation of ca2+ signaling for drug-resistant breast cancer therapy with mesoporous silica nanocapsule encapsulated doxorubicin/sirna cocktail ultrasound-guided delivery of thymidine kinase-nitroreductase dual therapeutic genes by pegylated-plga/pie nanoparticles for enhanced triple negative breast cancer therapy efficient targeted tumor imaging and secreted endostatin gene delivery by anti-cd105 immunoliposomes reactive oxygen species-biodegradable gene carrier for the targeting therapy of breast cancer enhanced delivery of sirna to triple negative breast cancer cells in vitro and in vivo through functionalizing lipid-coated calcium phosphate nanoparticles with dual target ligands self-sensibilized polymeric prodrug co-delivering mmp-9 shrna plasmid for combined treatment of tumors effective gene silencing mediated by polypeptide nanoparticles lah4-l1-simdr1 in multi-drug resistant human breast cancer polylysine-modified polyethylenimine polymer can generate genetically engineered mesenchymal stem cells for combinational suicidal gene therapy in glioblastoma delivery of sirna in vitro and in vivo 
using pei-capped porous silicon nanoparticles to silence mrp1 and inhibit proliferation in glioblastoma a non-viral suicide gene delivery system traversing the blood brain barrier for non-invasive glioma targeting treatment magnetic ternary nanohybrids for nonviral gene delivery of stem cells and applications on cancer therapy targeting ezh2 for glioma therapy with a novel nanoparticle-sirna complex optimization of in vivo dna delivery with nickfect peptide vectors folate-conjugated gene-carrying microbubbles with focused ultrasound for concurrent blood-brain barrier opening and local gene delivery peptide micelle-mediated delivery of tissue-specific suicide gene and combined therapy with avastin in a glioblastoma model thymidine kinase gene delivery using curcumin loaded peptide micelles as a combination therapy for glioblastoma biodegradable polymeric nanoparticles show high efficacy and specificity at dna delivery to human glioblastoma in vitro and in vivo dual-targeting nanoparticles with excellent gene transfection efficiency for gene therapy of peritoneal metastasis of colorectal cancer disrupting g6pd-mediated redox homeostasis enhances chemosensitivity in colorectal cancer modified nanoparticle mediated il-12 immunogene therapy for colon cancer a theranostic micelleplex codelivering sn-38 and vegf sirna for colorectal cancer therapy development of self-assembled multi-arm polyrotaxanes nanocarriers for systemic plasmid delivery in vivo colorectal cancer combination therapy using drug and gene co-delivered, targeted poly (ethylene glycol)-ε-poly (caprolactone) nanocarriers. drug design, development and therapy modified pamam vehicles for effective trail gene delivery to colon adenocarcinoma: in vitro and in vivo evaluation. 
artificial cells, nanomedicine, and biotechnology multifunctional nucleus-targeting nanoparticles with ultra-high gene transfection efficiency for in vivo gene therapy microbeads mediated oral plasmid dna delivery using polymethacrylate vectors: an effectual groundwork for colorectal cancer. drug delivery rpm peptide conjugated bioreducible polyethylenimine targeting invasive colon cancer a nanoparticle formulation that selectively transfects metastatic tumors in mice micrornas targeting mutant k-ras by electrotransfer inhibit human colorectal adenocarcinoma cell growth in vitro and in vivo competition of charge-mediated and specific binding by peptide-tagged cationic liposome-dna nanoparticles in vitro and in vivo rgd peptides-conjugated pluronic triblock copolymers encapsulated with ap-2α m v in vivo multifunctional drug carrier based on pei derivatives loaded with small interfering rna for therapy of liver cancer polymeric nanoparticles as cancer-specific dna delivery vectors to human hepatocellular carcinoma development of self-assembling peptide nanovesicle with bilayers for enhanced egfr-targeted drug and gene delivery effect of diphtheria toxin-based gene therapy for hepatocellular carcinoma sp94-targeted triblock copolymer nanoparticle delivers thymidine kinase-p53-nitroreductase triple therapeutic gene and restores anticancer function against hepatocellular carcinoma in vivo ultrasound-assisted mir-122-loaded polymeric nanodroplets for hepatocellular carcinoma gene therapy apoe-modified liposomes mediate the antitumour effect of survivin promoter-driven hsvtk in hepatocellular carcinoma golgi membrane protein gp73 modifiedliposome mediates the antitumor effect of survivin promoter-driven hsvtk in hepatocellular carcinoma dual delivery of ligand-mediated targeting of cytokine interleukin-27 enhances its bioactivity in vivo delivery of the gene encoding the tumor suppressor sef into prostate tumors by therapeutic-ultrasound inhibits both tumor angiogenesis 
and growth tat modified and lipid-pei hybrid nanoparticles for co-delivery of docetaxel and pdna targeted gene delivery of polyethyleneimine-grafted chitosan with rgd dendr m αvβ3 -overexpressing tumor cells. carbohydrate polymers nanoghosts as a novel natural nonviral gene delivery platform safely targeting multiple cancers study on the prostate cancer-targeting mechanism of aptamer-modified nanoparticles and their potential anticancer effect in vivo highly efficient cationic hydroxyethylated cholesterol-based nanoparticle-mediated gene transfer in vivo and in vitro in prostate carcinoma pc-3 cells cancer statistics global epidemiology of lung cancer emerging insights of tumor heterogeneity and drug resistance mechanisms in lung cancer targeted therapy hallmarks of cancer: the next generation revisiting the hallmarks of cancer state-of-the-art human gene therapy: part ii. gene therapy strategies and clinical applications gene therapy leaves a vicious cycle non viral vectors in gene therapy-an overview viral and non-viral vectors in gene therapy: technology development and clinical trials production and clinical development of nanoparticles for gene delivery the development of functional non-viral vectors for gene delivery sirna conjugated nanoparticles-a next generation strategy to treat lung cancer aptamer-targeted delivery of bcl-xl shrna using alkyl modified pamam dendrimers into lung cancer cells polymetformin combines carrier and anticancer activities for in vivo sirna delivery multi-functional self-assembled nanoparticles for pvegf-shrna loading and anti-tumor targeted therapy folate-conjugated polyspermine for lung cancer-targeted gene therapy exosome-mediated microrna-497 delivery for anticancer therapy in a microfluidic 3d lung cancer model mdm2 knockdown mediated by a triazinemodified dendrimer in the treatment of non-small cell lung cancer at2r gene delivered by condensed polylysine complexes attenuates lewis lung carcinoma after intravenous injection 
or intratracheal spray elastin-like recombinamers with acquired functionalities for gene-delivery applications cationic lipid guided shorthairpin rna interference of annexin a2 attenuates tumor growth and metastasis in a mouse lung cancer stem cell model preparation and evaluation of chitosan-dna-fap-b nanoparticles as a novel non-viral vector for gene delivery to the lung epithelial cells visualization and expression of genes with biomimetically mineralized zeolitic imidazolate framework-8 (zif-8) therapeutic effect of phlip-mediated ceacam6 gene silencing in lung adenocarcinoma tumor suppressor fus1 signaling pathway synergistic tumor suppression by coexpression of fus1 and p53 is associated with down-regulation of murine double minute-2 and activation of the apoptotic protease-activating factor 1-dependent apoptotic pathway in human non-small cell lung cancer cells phase i clinical trial of systemically administered tusc2(fus1)-nanoparticles mediating functional gene transfer in humans highly specific in vivo gene delivery for p53-mediated apoptosis and genetic photodynamic therapies of tumour

fabrication of gold nanoparticles in absence of surfactant as in vitro carrier of plasmid dna

poly( -amino ester) nanoparticle delivery of tp53 has activity against small cell lung cancer in vitro and in vivo current status of gene therapy for lung cancer and head and neck cancer histone h2a-peptide-hybrided upconversion mesoporous silica nanoparticles for bortezomib/p53 delivery and apoptosis induction autophagic, apoptotic, and necrotic cancer cell fates triggered by acidic ph microenvironment toxicity concerns of nanocarriers. nanotechnology-based approaches for targeting and delivery of drugs and genes hydrogel doped with nanoparticles for local sustained release of sirna in breast cancer direct cytosolic sirna delivery by reconstituted high density lipoprotein for target-specific therapy of tumor angiogenesis codelivery of an optimal drug/sirna combination using mesoporous silica nanoparticles to overcome drug resistance in breast cancer in vitro and in vivo tpgs functionalized mesoporous silica nanoparticles for anticancer drug delivery to overcome multidrug resistance a self-assembled rna-triple helix hydrogel drug delivery system targeting triple-negative breast cancer caco3/caip6 composite nanoparticles effectively deliver akt1 small interfering rna to inhibit human breast cancer growth elastin-like recombinamers as smart drug delivery systems biocompatibility and immunogenicity of elastin-like recombinamer biomaterials in mouse models biocompatible elr-based polyplexes coated with muc1 specific aptamers and targeted for breast cancer gene therapy engineering nanoparticles for targeted delivery of nucleic acid therapeutics in tumor cell penetrating peptides, novel vectors for gene therapy synthesis of a novel pegdga-coated hpamam complex as an efficient and biocompatible gene delivery vector: an in vitro and in vivo study novel cell-penetrating peptide-loaded nanobubbles synergized with ultrasound irradiation enhance egfr sirna delivery for triple negative breast cancer therapy targeted antiangiogenesis gene therapy using targeted cationic 
microbubbles conjugated with cd105 antibody compared with untargeted cationic and neutral microbubbles efficient targeted tumor imaging and secreted endostatin gene delivery by anti-cd105 immunoliposomes reversal of p-glycoprotein-mediated multidrug resistance by cd44 antibody-targeted nanocomplexes for short hairpin rna-encoding plasmid dna delivery polycation-functionalized nanoporous silicon particles for gene silencing on breast cancer cells a cationic prodrug/therapeutic gene nanocomplex for the synergistic treatment of tumors dna nanocomplex as adjuvant therapy for drug-resistant breast cancer comparison of two silica based nonviral gene therapy vectors for breast carcinoma: evaluation of the p53 delivery system in balb/c mice novel luminescent silica nanoparticles (lsn): p53 gene delivery system in breast cancer in vitro and in vivo systematic study of naf nanoparticles in micelles loaded on polylactic acid nanoscaffolds: in vitro efficient delivery a systematic study of cu nanospheres embedded in nonionic surfactant-based vesicle: photocatalytic efficiency and in vivo imaging study analysis of nanoparticle delivery to tumours progress in microneedle-mediated protein delivery multifunctional polymeric nanoplatforms for brain diseases diagnosis zeb1 and zeb2 gene editing mediated by crispr/cas9 in a549 cell line non-viral gene delivery and therapeutics targeting to brain quantitative evaluation of monocyte transmigration into the brain following chemical opening of the blood-brain barrier in mice non-viral vectors based on cationic niosomes as efficient gene delivery vehicles to central nervous system cells into the brain ox26/ctx-conjugated pegylated liposome as a dual-targeting gene delivery system for brain glioma nanogels for oligonucleotide delivery to the brain a powerful nonviral vector for in vivo gene transfer into the adult mammalian brain: polyethylenimine. 
human gene therapy dual-targeting and microenvironmentresponsive micelles as a gene delivery system to improve the sensitivity of glioma to radiotherapy boosting rnai therapy for orthotopic glioblastoma with nontoxic brain-targeting chimaeric polymersomes growth inhibition, cytokinesis failure and apoptosis of multidrug-resistant leukemia cells after treatment with p-glycoprotein inhibitory agents stability, intracellular delivery, and release of sirna from chitosan nanoparticles using different cross-linkers receptor-mediated gene delivery using pamam dendrimers conjugated with peptides recognized by mesenchymal stem cells protective effect of pegylation against poly glioblastoma u-87mg tumour cells suppressed by zno folic acid-conjugated nanoparticles: an in vitro study. artificial cells, nanomedicine, and biotechnology preparation, characterization, and evaluation of the anticancer activity of artemether-loaded nanoniosomes against breast cancer improved drug delivery and therapeutic efficacy of pegylated liposomal doxorubicin by targeting anti-her2 peptide in murine breast tumor model targeted lung cancer therapy: preparation and optimization of transferrin-decorated nanostructured lipid carriers as novel nanomedicine for codelivery of anticancer drugs and dna biocompatible elr-based polyplexes biodegradable poly (ɛ-caprolactone)-poly (ethylene glycol) copolymers as drug delivery system pcl/peg copolymeric nanoparticles: potential nanoplatforms for anticancer agent delivery cationic micelle-based sirna delivery for efficient colon cancer gene therapy treating colon cancer with a suicide gene delivered by self-assembled cationic mpeg-pcl micelles novel polymer micelle mediated codelivery of doxorubicin and p-glycoprotein sirna for reversal of multidrug resistance and synergistic tumor therapy a multifunctional nanocarrier for trai -b herapy against hepatocellular carcinoma with desmoplasia in mice survivin-t34a: molecular mechanism and therapeutic potential 
regulation of apoptosis at cell division by p34cdc2 phosphorylation of survivin nanotechnology-based sirna delivery strategies for metastatic colorectal cancer therapy nanoparticle-based delivery of sidcamkl-1 increases microrna-144 and inhibits colorectal cancer tumor growth via a notch-1 dependent mechanism selective blockade of dcamkl-1 results in tumor growth arrest by a let-7a microrna-dependent mechanism cell state plasticity, stem cells, emt, and the generation of intra-tumoral heterogeneity synthetic anticancer gene medicine exploits intrinsic antitumor activity of cationic vector to cure established tumors nan fang yi ke da xue xue bao novel polyethyleneimine-r8-heparin nanogel for high-efficiency gene delivery in vitro and in vivo efficient inhibition of c-26 colon carcinoma by vsvmp gene delivered by biodegradable cationic nanogel derived from polyethyleneimine synergistic and low adverse effect cancer immunotherapy by immunogenic chemotherapy and locally expressed pd-l1 trap liposomal nanostructures for drug delivery in gastrointestinal cancers liposome-encapsulated plasmid dna of telomerase-specific oncolytic adenovirus with stealth effect on the immune system cationic liposome coupled endostatin gene for treatment of peritoneal colon cancer nanoparticles to deal with gastric cancer suicide gene delivery by calcium phosphate nanoparticles: a novel method of targeted therapy for gastric cancer calcium phosphate nanoparticles as a novel nonviral vector for efficient transfection of dna in cancer gene therapy tissue specific expression of suicide genes delivered by nanoparticles inhibits gastric carcinoma growth regression of gastric cancer by systemic injection of rna nanoparticles carrying both ligand and sirna photothermal and gene therapy combined with immunotherapy to gastric cancer by the gold nanoshell-based system novel polyethylenimine-derived nanoparticles for in vivo gene delivery development of an mri-visible nonviral vector for sirna delivery 
targeting gastric cancer characterization of polyethylene glycolgrafted polyethylenimine and superparamagnetic iron oxide nanoparticles (peg-g-pei-spion) as an mrivisible vector for sirna delivery in gastric cancer in vitro and in vivo current status of nanomaterial-based treatment for hepatocellular carcinoma advances in delivery vectors for gene therapy in liver cancer gene therapy approaches against cancer using in vivo and ex vivo gene transfer of interleukin-12 asialoglycoprotein receptor-magnetic dual targeting nanoparticles for delivery of rassf1a to hepatocellular carcinoma rassf1a expression inhibits the growth of hepatocellular carcinoma from qidong county a nanoparticle-based model delivery system to guide the rational design of gene delivery to the liver. 2. in vitro and in vivo uptake results gold nanoparticles delivered mir-375 for treatment of hepatocellular carcinoma advances in the application of nanotechnology in the diagnosis and treatment of gastrointestinal tumors pten and trail genes loaded zein nanoparticles as potential therapy for hepatocellular carcinoma m v αfetoprotein promoter-mediated tbid delivered by folate-pei600-cyclodextrin nanopolymer vector in hepatocellular carcinoma yap inhibition restores hepatocyte differentiation in advanced hcc, leading to tumor regression highly efficient and tumorselective nanoparticles for dual-targeted immunogene therapy against cancer prostate cancer gene therapy clinical trials prostate cancer: diagnosis and clinical management history of gene therapy non-viral gene delivery methods cancer nanotechnology: the impact of passive and active targeting in the era of modern cancer biology regression of prostate tumors after intravenous administration of lactoferrin-bearing polypropylenimine dendriplexes encoding tnf-α trai k -12 therapeutic efficacy of intravenously administered transferrin-conjugated dendriplexes on prostate carcinomas integrins and prostate cancer metastases tumor targeting via integrin 
ligands novel strategies for targeting prostate cancer targeted nonviral gene therapy in prostate cancer preparation of nanobubbles carrying androgen receptor sirna and their inhibitory effects on androgen-independent prostate cancer when combined with ultrasonic irradiation interleukin-27 gene delivery for modifying malignant interactions between prostate tumor and bone dna/lipid complex incorporated with fibronectin to cell adhesion enhances transfection efficiency in prostate cancer cells and xenografts autophagy modulators: mechanistic aspects and drug delivery systems which one performs better for targeted lung cancer combination therapy: pre-or postbombesin-decorated nanostructured lipid carriers? drug delivery near-infrared/ph dual-responsive nanocomplexes for targeted imaging and chemo/gene/photothermal tri-therapies of non-small cell lung cancer nanotechnological strategies for osteoarthritis diagnosis, monitoring, clinical management, and regenerative medicine: recent advances and future opportunities memory of viral infections by crispr-cas adaptive immune systems: acquisition of new information rna-guided genetic silencing systems in bacteria and archaea emt signaling: potential contribution of crispr/cas gene editing multiplexed crispr technologies for gene editing and transcriptional regulation efficient genome editing in zebrafish using a crispr-cas system exploiting crispr-cas nucleases to produce sequence-specific antimicrobials correction of a genetic disease by crispr-cas9-mediated gene editing in mouse spermatogonial stem cells targeted genome engineering in human cells with the cas9 rna-guided endonuclease harnessing nanoparticles for the efficient delivery of the crispr/cas9 system direct cytosolic delivery of crispr/cas9-ribonucleoprotein for efficient gene editing cytosolic delivery of crispr/cas9 ribonucleoproteins for genome editing using chitosan-coated red fluorescent protein gene disruption by cellpenetrating peptide-mediated delivery 
of cas9 protein and guide rna a ph-responsive silica-metal-organic framework hybrid nanoparticle for the delivery of hydrophilic drugs, nucleic acids, and crispr-cas9 genome-editing machineries a biodegradable nanocapsule delivers a cas9 ribonucleoprotein complex for in vivo genome editing

the authors would like to thank dr. horacio cabral (department of materials engineering, the university of tokyo) for his constructive comments.

key: cord-021063-4y8m33ea authors: hug, peter; sleight, richard g. title: chapter 18 the advantages of liposome-based gene therapy: a comparison of viral versus liposome-based gene delivery date: 2007-09-02 journal: nan doi: 10.1016/s1569-2582(97)80043-8 sha: doc_id: 21063 cord_uid: 4y8m33ea

viruses have evolved in such a way that they are able to efficiently introduce and express exogenous genes in eukaryotic cells. most viruses need to maintain high-level expression of their proteins for only a short time and need not be concerned with the viability of the host cell after infection. attempts to modify a virus into a gene therapy vector can be hampered by this conflict. virus-based methods of gene therapy are likely to be most useful in applications that require a burst of high-level expression in many of the patient's cells, such as in cancer therapy. liposomal methods of gene therapy are flexible in that all the components of the system are controlled by designers. as these systems have become more sophisticated, they have begun to take on several characteristics of the viruses that they are intended to replace. the use of basic substances to condense dna has increased the efficiency of encapsulation. the addition of nucleophilic proteins raises the efficiency of transfection. by adding antibodies or other targeting molecules to the surface of liposomes, preferential binding of vesicles to a desired cell type has been increased.

gene therapy as a treatment for human disease is in a stage of intensive development. 
already in limited use as an experimental therapy for cancer (oldfield et al., 1993; nabel et al., 1993) and inborn errors of metabolism (hoogerbrugge et al., 1992; morgan and anderson, 1993), it will become a tool to treat diseases that are currently difficult to manage. there are many competing strategies being developed to introduce exogenous genes into a subset of cells of a patient. several of these are likely to proceed to clinical application. deciding which therapeutic strategy is best suited for a given disease will require an understanding of the underlying biology of the disease, as well as the therapy. in using this chapter to learn about liposomal methods of gene therapy, one should remember that the material presented here is only current as of january 1994. for more recent developments and greater detail, it will be necessary to consult the original literature.

regardless of the specific disease being addressed by gene therapy, several factors must be considered before one treats patients. the first is that the method and the treatment itself must be commensurate with the severity of the disease. gene therapy will not be the preferred treatment for diseases such as phenylketonuria or galactosemia, both of which can be treated adequately by diet modification. a more widely recognized problem is that the treatment may, while curing the disease, create a new one in the process. the archetypal concern is the generation of insertional mutations by the integration of the curative gene into a cellular proto-oncogene, leading to cancer in the "cured" patient. once a method has been determined to be reasonably safe for the patient, it must be shown not to present a danger to others. this problem has hampered the widespread use of viral transfection vectors. substantial progress has been made in the construction of replication-defective retroviral and adenoviral vectors. 
however, the possibility remains that superinfection with wild-type adenovirus, or insertion of a wild-type retrovirus into an integrated defective retroviral genome, can generate new, infectious viral particles that could contain an oncogenic fragment of the therapeutic gene. these viral particles may have the ability to transfer genes between a patient's cells, and between individuals. in order for a method of gene therapy to be useful, it must significantly ameliorate the symptoms of the disease, and halt the progression of the disease state for a significant period of time. ideally, the disease would be cured permanently by one treatment. although periodic retreatments might be acceptable, every treatment brings with it the possibility of insertional mutagenesis or the generation of an immune response to the therapeutic vector. the final considerations involve ease of preparation and use. many candidate vectors, although potentially capable of introducing foreign genes into an organism, are so difficult to generate, or administer, that it is difficult to envision their widespread adoption. this aspect of vector design is often not considered until after the vector is developed. bearing in mind the above considerations, let us now consider the various approaches being developed to create vectors for gene therapy. in general, these can be divided into three groups: (i) active viruses, (ii) viral mimics that attempt to reproduce a subset of viral activities in a synthetic construct, and (iii) artificial delivery systems. given the amount of effort being expended on the development of these systems, and the highly varied nature of the intended target diseases and organs, it is likely that there will never be a single best choice of method. every disease will call for a tailored strategy that can be optimized to best fit the needs of the situation. table 1 contains a summary and comparison of various gene therapy delivery vectors. 
ten years ago, it seemed a foregone conclusion that retroviruses were the only serious candidate for an eventual therapeutic vector. today, many different viral vectors are being considered for use in gene therapy. because different viruses target different cell types and have different modes of replication, each viral delivery system may be best suited for treatment of different diseases. the viral vectors under most active development are retroviruses, adenoviruses, adeno-associated virus, and herpesviruses. some general characteristics of the viruses currently being developed as gene therapy vectors are presented in table 2.

wild-type retroviruses are contained within a lipid membrane and have two identical copies of an rna genome. after the viral particle's entry into the cell, the rna molecule is reverse-transcribed into a dna copy, which is then integrated into the host's genome. mrnas for the various viral proteins synthesized during the course of the infection are transcribed from this integrated dna. the integration of the retroviral dna genome into the host genome forms the basis for the use of retroviruses as vectors for the introduction of exogenous dna into eukaryotic cells (hoeben et al., 1992). to convert a wild-type retrovirus into a retroviral gene therapy vector, two steps are necessary: first, the therapeutic gene must be added to the retroviral genome and, second, the potential of the retrovirus to remain infectious after integration into the target cell must be eliminated. presently, this is accomplished by inserting a therapeutic gene into a retroviral genome that has had most of the genes necessary for packaging the virus removed or mutated. deletion of the viral genes eliminates infectivity in the treated cells by preventing the newly integrated genome from making infectious copies of itself that could complete further rounds of infection. 
it also has the added benefit of increasing the maximum length of a therapeutic gene that can be delivered by a retrovirus. some advantages of retroviruses as gene delivery vectors include: (i) the efficient and stable integration of the introduced gene into the host genome, (ii) a wide host range, and (iii) the ability to infect large numbers of cells. potential problems associated with the use of viral vectors include: (i) the possibility of recombination events that could convert a replication-defective vector into an infectious agent, (ii) the possibility that superinfection with another retrovirus may allow unwanted transfer of the introduced gene between individuals, (iii) a 7-13 kb limit on the amount of dna that can be packaged, (iv) potential problems in targeting the virus to specific cells, and (v) difficulty in maintaining high-level expression of the exogenous gene. wild-type adenoviruses cause respiratory disease in humans. the adenovirus genome is a linear double-stranded dna molecule, about 36 kb in length. although the virus does not have a lipid coat, it is taken up by cells via endocytosis and enters the cytoplasm in a manner analogous to the retrovirus. once inside the cell, the viral dna moves to the nucleus and begins its replication cycle without becoming part of the host cell's genome. adenoviruses that are being considered for use as gene therapy vectors have had substantial parts of their genome removed, rendering them unable to replicate except in specially developed cell lines (i.e., packaging cell lines) that express the removed proteins (kozarsky and wilson, 1993) . the therapeutic gene to be introduced is added to the adenoviral genome by homologous recombination with a plasmid containing the therapeutic gene flanked by adenoviral dna. although possible target organs for adenoviral gene therapy include liver, central nervous system, vascular endothelium, and muscle, the primary target organ is the lung. 
clinical trials of adenoviral gene therapy for treatment of α1-antitrypsin deficiency and cystic fibrosis are underway (crystal, 1992). advantages of adenovirus-based vectors include: (i) the ability to infect large numbers of cells, (ii) relative ease of constructing the vector and obtaining large amounts of virus, and (iii) absence of genomic integration, obviating the problem of insertional mutagenesis. some disadvantages are: (i) the necessity to periodically reinfect the patient, as the genome is lost from dividing cells; (ii) the concomitant development of an allergic response; and (iii) possible toxicity of high doses of virus. adeno-associated virus is a naturally occurring defective virus that infects human lungs. although it is unable to go through lytic infection of cells independently, it can do so in the presence of a separate infection by adenovirus or herpes simplex virus. in the absence of a helper infection by one of these viruses, the adeno-associated virus establishes a latent infection of the host cell by integrating itself into the host genome. about 70% of the integrations of the wild-type virus occur at a single site on chromosome 19 (samulski et al., 1991). because these integrations do not seem to be associated with any disease, this virus may eventually be used to add therapeutic genes to the human genome with a lower risk of insertional mutagenesis. to date, however, the targeted nature of the insertional events is not retained when the native viral genome is replaced by a therapeutic or marker gene. in addition, packaging constraints limit the size of any insert to 4.5-5 kb of dna. the genomes of herpesviruses are very large and complex. this fact alone might argue against the use of herpes simplex virus i (hsv-i) as a gene therapy vector, since a complex genome will be more difficult to effectively "tame." 
however, hsv-i has an advantage that compensates for this problem, in that the virus can exist and mediate persistent expression of its genes in neurons and other postmitotic cells (geller, 1993) . in the normal course of an infection, hsv-i will enter neuronal cells and remain latent, sometimes for years, before reactivating and starting a lytic cycle that results in a small patch of endothelial cell death (e.g., a cold sore). during latency, the host cell transcribes some viral genes, producing what are termed latency associated transcripts (lats). hsv-i deletion mutants have been isolated that are incapable of reactivation, and that once inserted into a neuron will remain latent indefinitely. if therapeutic genes added to these vectors have lat promoter elements, they will be expressed even though the virus itself remains latent. these vectors are being developed for the introduction of genes into cells of the central nervous system (geller, 1993) . in summary, all viral methods of gene therapy share several strengths, as well as numerous weaknesses. it is relatively easy to produce the large amounts of virus necessary to treat patients. viruses efficiently infect a large fraction of the cells with which they come in contact. although this may be a drawback in some cases where expression of the therapeutic gene must be restricted to specific cells, it may be possible to limit unwanted expression of the inserted gene by the use of tissue-specific promoters. all viral vectors for human gene therapy are derived from organisms that cause human disease. the possibility of reactivation of the vectors' infectivity or pathogenicity is hard to dismiss. in the end, it may prove easier and safer to construct a gene therapy vector having the elements of a successful virus, but derived from completely synthetic components. one such strategy involves the use of liposomes. liposomes are small vesicles usually made from pure preparations of phospholipids. 
because they can be made from completely pure components, they have been used extensively to characterize and define the physical properties of membranes and membrane proteins. liposomes have also been used as a means of delivering substances to cells, both in vitro and in vivo, including dna (hug and sleight, 1991). before discussing factors to be considered in preparing liposomes for gene therapy of a given disease, it is worthwhile to review some general aspects of liposome preparation and design. liposomes are generally classified by the number of bilayers (lamellae) they contain, by size, and by the composition of their membrane. the earliest liposomes were formed by hydration of a dried lipid film, followed by vortexing. this results in a cloudy solution of liposomes having many lamellar layers, resembling an onion. these are called multilamellar vesicles, or mlvs. although mlvs are effective for the packaging of lipophilic drugs, they have very little aqueous volume in their interiors, and are not efficient at entrapping water-soluble compounds or large dna fragments. because of these deficiencies, mlvs were quickly superseded by unilamellar vesicles, which have only a single lipid bilayer between the external fluid and the lumen. several methods are used to prepare unilamellar vesicles, each having its own advantages and disadvantages. because of space limitations, we will only briefly describe those methods of liposome preparation that are directly relevant to the construction of gene therapy vectors. table 3 gives a direct comparison of these methods of preparation as related to transfection. cationic liposomes. liposomes made entirely or partially of a cationic lipid have a charge-based affinity for dna. when such liposomes are incubated in solution with dna, they form complexes that can transfect cells. 
several types of cationic lipid are available which in general have similar characteristics (leventis and silvius, 1990; ito et al., 1990; gao and huang, 1991; hazinski et al., 1991; farhood et al., 1992). this technique was initially intended exclusively for in vitro use. however, in vivo transfections have been performed with several commercial preparations of cationic lipids. there are several reasons why cationic lipid-dna complexes have proved so popular for transfections: (i) they are very easy to prepare, since the liposomes and the dna are incubated together and then used without further preparation; (ii) the complexes are relatively non-toxic; (iii) transfections using this method are efficient with most cell types; and (iv) because the dna remains outside the liposome, questions of entrapment efficiency do not apply. in vivo transfections using cationic lipid-dna complexes are completely uncontrolled. that is, there are no targeting or specific binding agents present, and the complexes probably deliver dna to the first cell they contact. this occurs most often in the first capillary bed through which the injected complexes pass. when dna-lipid complexes are intravenously injected, the highest levels of gene expression are usually seen in the pulmonary vascular endothelium (brigham et al., 1993). dna-lipid complexes introduced into the airway cause a generalized transfection of the lung. because the site of transfection is completely uncontrolled, expression of the inserted gene may need to be regulated by means of tissue-specific promoters (hazinski et al., 1991). cell-specific promoters are not available for all tissues, and those currently available are sometimes less specific than desired. while both cationic liposomes and retroviral gene delivery vectors lack the ability to target specific cells, the lack of length constraints makes tissue-specific expression easier to achieve using dna-lipid complexes as compared to retroviruses. ca2+/edta chelation. 
in the presence of calcium, small unilamellar vesicles (suvs) composed of acidic phospholipids (typically phosphatidylserine) aggregate and fuse into large cochleate structures. when edta is added to chelate the calcium, these structures form large unilamellar vesicles (luvs) (papahadjopoulos et al., 1975) . in the process, the vesicles encapsulate a substantial fraction of the surrounding solvent and solutes. if dna is present, some of it becomes encapsulated in the vesicles. two advantages of this technique are: (i) the dna is not exposed to harsh conditions that would degrade it, and (ii) the entrapment is relatively efficient. however, the requirement for phosphatidylserine makes this technique impractical for most in vivo applications, with the exception of transfecting macrophages which have cell surface receptors for phosphatidylserine (ps). detergent dialysis (and virosomes). when membranes are mixed with detergents at concentrations above their critical mixed micellar concentration (cmmc), they combine with the membrane's phospholipids and proteins to form mixed micelles. because detergents are relatively water soluble, they can be removed from the mixed micelles by dialysis. when their concentration in the micelles falls below the cmmc, membrane bilayers regenerate and form liposomes. enveloped viruses have membrane proteins that promote the fusion of the viral membrane with cellular membranes. when an enveloped virus fuses to a cell, the nucleocapsid is released into the cytoplasm, and the infection proceeds. detergent solubilization of enveloped viruses produces a micellar mixture containing detergent, viral phospholipids, and viral membrane proteins. dialysis of these detergent mixtures results in the production of liposomes that are often called virosomes to indicate the presence of viral proteins. if the dialysis is performed properly, the fusion proteins of many viruses can be reconstituted in an active conformation (white et al., 1983) . 
viral fusion proteins having a variety of different characteristics have been identified. for example, some are only active at low ph and fuse with endosomal membranes, while others are active at physiological ph and fuse to the plasma membrane. some viral fusion proteins, such as the ha protein of influenza, bind to specific receptors before fusion. a few viral fusion proteins will only promote fusion to target membranes containing a specific mixture of phospholipids (cervin and anderson, 1991; nussbaum et al., 1992) . the major advantage of detergent dialysis is that it is very gentle since no sonication, vortexing, or organic solvent is used. this results in only a small amount of dna fragmentation during encapsulation. unfortunately, the efficiency of entrapment is very low, causing much of the dna to be wasted and resulting in a high fraction of empty liposomes being formed. loyter and co-workers have fused dna-containing liposomes with cells by making empty virosomes and dna-containing liposomes separately, and then fusing the two populations of vesicles in the presence of target cells (lapidot and loyter, 1990) . to date, this technique has not been used in vivo. reverse-phase evaporation. currently, the highest dna entrapment efficiencies can be achieved when vesicles (revs) are prepared by reverse-phase evaporation (fraley et al., 1980) . revs are prepared by bath sonication of a mixture of lipid, dna, ether, and buffered saline, followed by evaporation of the ether to form a paste of phospholipid inverted monolayers surrounding aqueous droplets. these are resolved into liposomes by vortexing the paste. encapsulation efficiency is high (up to 40 or 50%) because there is only a small amount of aqueous buffer present in relation to the amount of lipid. presumably, dna molecules present in the buffer are forced to curl up in small aqueous droplets, a conformation that is easily entrapped (szelei and duda, 1989) . 
because the size distribution of the revs is heterogeneous and hard to control, they are often sized by extrusion through a filter before use. extrusion. multilamellar vesicles formed by vortexing a hydrated lipid film are repeatedly passed (extruded) through a filter under pressure. as the liposomes are forced through the pores of the filter, they are converted from multilamellar to unilamellar vesicles, and the buffer and anything present in it is trapped inside the liposomes (mayer et al., 1986). to a certain extent, the size of the liposomes corresponds to the pore size of the filter used. advantages of this technique are: (i) nearly any lipid or lipid mixture can be used, (ii) liposome size from 25 to 100 nm in diameter can be controlled, and (iii) the encapsulation efficiency is excellent (up to 30%). the main disadvantage is that the high pressures used in the extrusion process can fragment the dna, and denature or dissociate dna-protein complexes. a central difficulty in using liposomes for gene therapy is the entrapment of the dna. the ideal encapsulation method would efficiently entrap but not degrade the dna. at the same time, the method would be sufficiently flexible with regard to size and vesicle composition to allow targeting of liposomes based on these properties. dna in solution has a linear conformation. in general, dna restriction fragments containing an entire gene are much longer than the diameter of commonly used liposomes. it should not be surprising then that most methods of liposome formation entrap dna only at low efficiency, and that the efficiency is inversely related to the length of the dna (fraley et al., 1980). two methods of increasing the entrapment efficiency of dna have been developed. in the first, the liposomes are formed in such a way that the aqueous phase, containing the dna, is broken into droplets small enough such that the dna is forced to curl in upon itself to remain within the droplet. 
this probably occurs during rev formation. the other method achieves the same result by complexing the dna with some substance in a tight and compact aggregate. to date, this has been done with spermine (tikchonenko et al., 1988) and basic proteins (jay and gilbert, 1987) . the efficiency of encapsulation can also be increased by compacting the dna in phage head particles (szelei and duda, 1989) before liposomal entrapment. the basis for successful transfections by liposomal vectors delivered intravenously lies in the maximization of the in vivo circulation time of the liposomes. most liposome formulations are cleared rapidly from the circulation by the reticuloendothelial system (res), with large amounts accumulating in macrophages (allen et al., 1989) . if the intended target of the gene therapy is the res, as in gaucher's disease, this is a benefit. in most cases, however, this fate will only serve to reduce the number of liposomes being exposed to their intended target cell population. while several different strategies have been explored to lengthen the circulation half-life of liposomes, they all have certain aspects in common (allen et al., 1989; gabizon and papahadjopoulos, 1992) . first, they seek to make the liposomes as small as possible. this seems to reduce their interaction with the res. second, they are designed to make the liposomal surface as much like the surface of a normal cell as possible. this can be accomplished by using lipid formulations containing a high fraction of cholesterol, high transition temperature phospholipids, and a substantial fraction of sphingomyelin or the ganglioside gm 1. a different approach has been to modify the liposomal phospholipids by addition of polyethylene glycol (peg) to the polar headgroup (lasic et al., 1991; blume and cevc, 1993) . peg-modified lipids have a higher hydrophilicity than native lipids, and appear to escape recognition by macrophages. 
a potential disadvantage of this approach is that targeting of liposomes with covalently attached antibodies (see below) is inhibited by the presence of peg-modified lipids (klibanov et al., 1991) . once a liposome formulation has been settled upon, attention must be paid to limiting the number of cells exposed to the dna delivered by the liposomes. this is necessary for several reasons. first, if the dna will be integrated into the host cell's genome, there is a chance that insertional mutagenesis will lead to activation of a proto-oncogene and the development of cancer by the host. by limiting the sites of dna delivery, the chance of insertional mutagenesis is lowered. second, limiting the number of cells that are transfected reduces the amount of dna, lipid, and other proteins needed to perform the gene therapy. this reduction can be substantial, and can reduce potential systemic toxicities, as well as cost. table 4 compares some different methods of targeting liposomes to specific cell types. there are three general methods by which liposomes may be directed to a subset of cells within an organism. first, the liposomes may be introduced into a compartment of the body that is sequestered from the general circulation. one example of this is the addition of liposomes to the airway in a fluid or an aerosol (hazinski et al., 1991; stribling et al., 1992) . because liposomes cannot cross the lung epithelial barrier, entrance into the general circulation is prevented. therefore, it is anticipated that liposome vectors will transfect only those cells present on the epithelial surface. another site where this approach may be applicable is in the central nervous system (cns). in the cerebral vasculature, blood is separated from the cns by the blood-brain barrier, which would normally be completely impermeable to any liposomes. 
however, if the cerebral vasculature is isolated, and perfused with a hyperosmotic solution, the blood-brain barrier is damaged, leaving holes large enough to admit liposomes into the cns (johansson, 1992). the blood-brain barrier reconstitutes itself shortly thereafter, leaving the liposomes trapped within the cns and unable to interact with any other cells. second, the lipid composition of the vesicles can be adjusted to cause them to be taken up preferentially by some types of cells. for example, macrophages have a very high affinity for liposomes containing phosphatidylserine (ps) on their surface (lee et al., 1992). consequently, liposomes containing a substantial fraction of ps are quickly cleared from the circulation by macrophages (allen et al., 1989). a similar targeting strategy uses lactosylceramide as a membrane component. this lipid binds to the asialoglycoprotein receptors found on hepatocytes, causing these cells to be preferentially targeted by the vesicles (grosse et al., 1984). the final method involves the addition of antibodies or other proteins to the surface of the liposome after it has been formed (loughrey et al., 1990). these "proteoliposomes" are taken up by cells to which they can bind. table 5 gives a representative list of proteins that have been used to target liposomes to cells. many proteins can be modified by the addition of phospholipid molecules or acyl chains and retain their native function. these modified proteins can be coupled to a preexisting bilayer, endowing the liposome with their binding properties. all three methods described above have been used successfully to direct the uptake of liposomes to specific cell types. the methods are not mutually exclusive and, in principle, it should be possible to combine them for more specific targeting. the central problem of any liposome-based gene therapy system is that dna must be introduced into the cytoplasm of the target cell. 
when dna-loaded liposomes are added to cells they are endocytosed, and most are carried to lysosomes where they are degraded by digestive enzymes. somehow, a small fraction of encapsulated dna escapes this pathway, enters the cytoplasm, and is expressed by the cells (straubinger et al., 1990) . several methods have been developed to increase the proportion of dna delivered to the cytosol. two general approaches that have been used successfully are: (i) bypassing the endocytic pathway completely, and (ii) using the initial stages of the endocytic pathway but escaping the pathway before delivery of the dna to lysosomes. the endocytic pathway of liposomal dna entry into the cell is bypassed when liposomes are fused directly to the plasma membrane of the cell. this has been accomplished using virosomes containing a constitutively active viral fusion protein (nakanishi et al., 1987) . as soon as the virosome binds to the cell surface, the fusion protein acts to fuse the virosomal membrane with that of the cell, causing the vesicle contents to be introduced directly into the cytoplasm. some liposome delivery systems use the initial stages of the endocytic pathway in the delivery process. this strategy takes advantage of the low ph of endosomes, which can be used to trigger fusion of liposomes with the endosomal membrane. ph-dependent fusion of liposomes with endosomes has been achieved using vesicles composed of phosphatidylethanolamine (pe) and substances such as dipalmitoylsuccinylglycerol that leave the membrane at low ph (liu and huang, 1990) . as the ph is lowered in the endosome, the effective concentration of pe rises. when the concentration rises above 60 mole percent, the vesicle membrane can no longer remain as a bilayer. as it converts to the inverted hexagonal phase, it destabilizes the adjacent endosomal membrane, and releases the liposomal contents into the cytoplasm. 
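the ph-triggered destabilization described above comes down to a mole-fraction calculation, which the following python sketch works through. it is only an illustration under stated assumptions: the membrane is treated as a two-component mixture of pe and a single ph-sensitive stabilizing lipid, the starting amounts and the fraction of stabilizer lost are hypothetical, and the function names are not taken from any published method. only the 60 mole percent threshold comes from the text.

```python
BILAYER_LIMIT_MOL_PERCENT = 60.0  # threshold quoted in the text


def pe_mole_percent(pe_mol, stabilizer_mol):
    """mole percent of pe in a two-component membrane (an assumed simplification)."""
    return 100.0 * pe_mol / (pe_mol + stabilizer_mol)


def bilayer_holds(pe_mol, stabilizer_mol, fraction_lost):
    """true while pe stays at or below the threshold after part of the stabilizer leaves."""
    remaining = stabilizer_mol * (1.0 - fraction_lost)
    return pe_mole_percent(pe_mol, remaining) <= BILAYER_LIMIT_MOL_PERCENT


# hypothetical membrane: equal amounts of pe and stabilizer at neutral ph
pe, stabilizer = 50.0, 50.0
for lost in (0.0, 0.2, 0.4):
    percent = pe_mole_percent(pe, stabilizer * (1.0 - lost))
    print(f"stabilizer lost: {lost:.0%}  pe: {percent:.1f} mol%  "
          f"bilayer holds: {bilayer_holds(pe, stabilizer, lost)}")
```

in this toy case the membrane crosses the threshold only once a sizeable share of the stabilizer has left, which is the behavior the mechanism relies on: the vesicle is stable in the circulation and destabilizes only after acidification in the endosome.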
similar results can be seen using virosomes containing ph-sensitive fusion proteins (gould-fogerite et al., 1989; lapidot and loyter, 1990) . once the dna has successfully been introduced into the cytoplasm of the target cell, the factors governing expression are much less well understood. however, proteins exist that are karyophilic, that is, they are actively transported into the nucleus. two such proteins are the semliki forest virus nucleocapsid protein (michel et al., 1990) and the non-histone high mobility group i chromosomal protein (hmg i) (kato et al., 1991; tomita et al., 1992) . if a gene is coencapsulated in a liposome with hmg i, expression increases as much as 10-fold over the levels seen without the protein. delivery of dna/protein complexes has the potential for greatly increasing transfection efficiency, and should prove to be compatible with most methods of liposomal preparation. one shortcoming that all present approaches to gene therapy share is their inability to maintain a high level expression of the transfected gene for long periods of time. for some applications, such as the activation of the immune system to fight a cancer, this will probably not be a problem. however, to effectively cure many genetic diseases, the gene will need to be expressed at near normal levels throughout the patient's life. successful transfection of cells with some currently available viral vectors may lead to expression of at least some viral proteins, resulting in the clearance of the cells by the host's immune system. this process may be partially responsible for the loss of expression seen in transfections with retroviral and adenoviral vectors. another possible cause of low levels of expression is that length constraints of viral vectors usually necessitate the use of cdna copies of genes. these constructs do not have introns, greatly reducing their length. 
there is some evidence from transgenic mouse experiments that expression of exogenous genes is higher when at least some of the introns are present (brinster et al., 1988; palmiter et al., 1991). if this is true for genes inserted by viral or liposomal methods, then it may be necessary to include introns in the delivered dna to produce an adequate therapeutic effect. unfortunately, increasing the size of the therapeutic gene would prevent the use of most viral vectors. while all the methods of gene therapy have specific advantages, none is suitable for all applications. in an environment as complex as a living organism, it is futile to hope that a single method might be ideal in all situations. however, the availability of many methods greatly increases the number and types of diseases that can be treated. viruses have evolved such that they are able to efficiently introduce and express exogenous genes (i.e., viral genes) in eukaryotic cells. however, it is important to remember that viruses have been optimized by selective pressure to meet criteria that are not precisely those of gene therapy. in particular, most viruses need to maintain high-level expression of their proteins for only a short time, and need not be concerned with the viability of the host cell after infection. attempts to modify a virus into a gene therapy vector will be hampered by this conflict. virus-based methods of gene therapy are likely to be most useful in applications that require a burst of high-level expression in many of the patient's cells, such as in cancer therapy. cationic lipid-dna complexes will never be as selective as either virus- or conventional liposome-based gene therapy. however, they will remain much easier to prepare and use. the complexes will probably find their greatest use in in vitro and animal gene transfer. 
although they are currently more widely used in all areas of gene transfer than any other technique, their lack of control and cell-specificity will probably lead to their replacement by viral and liposomal methods of gene introduction in clinical areas. liposomal methods of gene therapy are flexible, in that all the components of the system are controlled by the designers. as these systems have become more sophisticated, they have begun to take on several characteristics of the viruses that they are intended to replace. the use of basic substances to condense the dna has increased the efficiency of encapsulation. the addition of karyophilic proteins raises the efficiency of transfection. by adding antibodies or other targeting molecules to the surface of liposomes, preferential binding of vesicles to a desired cell type has been increased. the incorporation of viral fusion proteins into the membrane allows a greater fraction of liposomes binding to cells to deliver their contents into the cytoplasm. in short, many aspects of a virus from budding to delivery of the genetic material to the nucleus have been mimicked in liposomal vectors. current methods of liposome-based gene therapy may be improved by modifying the structure of the introduced gene. here, preliminary indications are that constructs resembling native genes, rather than cdnas, will provide the best long-term expression. not all these enhancements to the basic liposome have been used together, and not all of them are mutually compatible. however, it seems inevitable that the best liposome-based gene therapy vectors will eventually combine many of these enhancements. the resulting liposome, with a condensed dna-protein complex containing both histone-like proteins and nuclear targeting proteins, a membrane designed for long circulation within the blood, targeting molecules on the surface, and viral fusion proteins in the membrane, will in effect be a completely man-made nonreplicative virus. 
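the trade-off between intron-containing constructs and vector packaging limits, raised in the discussion above, can be made concrete with a short python sketch. the packaging figures are the ones quoted in the text (7-13 kb for retroviruses, 4.5-5 kb for adeno-associated virus, no stated length constraint for cationic lipid-dna complexes). treating them as simple cut-offs is a simplifying assumption, and the construct sizes and the function name are hypothetical, chosen only for illustration.

```python
# packaging limits (kb) as quoted in the text; the upper bound of each range is used.
# treating them as simple insert-size cut-offs is a simplifying assumption.
PACKAGING_LIMIT_KB = {
    "retrovirus": 13.0,                  # "7-13 kb limit"
    "adeno-associated virus": 5.0,       # "4.5-5 kb"
    "cationic lipid-dna complex": None,  # no length constraint stated
}


def vectors_that_fit(insert_kb):
    """return the vectors whose stated limit accommodates an insert of this size."""
    return [
        name
        for name, limit in PACKAGING_LIMIT_KB.items()
        if limit is None or insert_kb <= limit
    ]


# hypothetical construct sizes, chosen only for illustration
cdna_kb = 4.0      # intron-free cdna copy of a gene
genomic_kb = 40.0  # the same gene with its introns retained

print("cdna construct:", vectors_that_fit(cdna_kb))
print("genomic construct:", vectors_that_fit(genomic_kb))
```

with these illustrative sizes, the cdna fits every vector listed, while the intron-containing construct is left with only the non-viral option, which is the constraint the text describes.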
liposomes with prolonged circulation times: factors affecting uptake by reticuloendothelial and other tissues
molecular mechanism of the lipid vesicle longevity in vivo
expression of human growth hormone fusion genes in cultured lung endothelial cells and in the lungs of mice
introns increase transcriptional efficiency in transgenic mice
modulation of coronavirus-mediated cell fusion by homeostatic control of cholesterol and fatty acid metabolism
gene therapy strategies for pulmonary disease
effect of cationic cholesterol derivatives on gene transfer and protein kinase c activity
introduction of liposome-encapsulated sv40 dna into cells
the role of surface charge and hydrophilic groups on liposome clearance in vivo
a novel cationic liposome reagent for efficient transfection of mammalian cells
herpesviruses: expression of genes in postmitotic brain cells
chimerasome-mediated gene transfer in vitro and in vivo
flow cytofluorometric investigation of the uptake by hepatocytes and spleen cells of targeted and untargeted liposomes injected intravenously into mice
localization and induced expression of fusion genes in the rat lung
gene therapy for human inherited disorders: techniques and status
treatment of patients with severe combined immunodeficiency due to adenosine deaminase (ada) deficiency by autologous transplantation of genetically modified bone marrow cells
liposome-mediated transformation of eukaryotic cells
synthetic cationic amphiphiles for liposome-mediated dna transfection
basic protein enhances the incorporation of dna into lipid vesicles: model for the formation of primordial cells
experimental models of altering the blood-brain barrier
direct injection of hepatitis b virus dna into liver induced hepatitis in adult rats
activity of amphipathic poly(ethylene glycol) 5000 to prolong the circulation time of liposomes depends on the liposome size and is unfavorable for immunoliposome binding to target
gene therapy: adenovirus vectors
fusion-mediated microinjection of liposome-enclosed dna into cultured cells with the aid of influenza virus glycoproteins
sterically stabilized liposomes: a hypothesis on the molecular origin of the extended circulation times
recognition of liposomes by cells: in vitro binding and endocytosis mediated by specific lipid headgroups and surface charge density
interactions of mammalian cells with lipid dispersions containing novel metabolizable cationic amphiphiles. biochim. biophys
ph-sensitive, plasma-stable liposomes with relatively prolonged residence in circulation
optimized procedures for the coupling of proteins to liposomes
vesicles of variable sizes produced by a rapid extrusion procedure
karyophilic properties of semliki forest virus nucleocapsid protein
human gene therapy
direct gene transfer with dna-liposome complexes in melanoma: expression, biologic activity, and lack of toxicity in humans
the improved efficient method for introducing macromolecules into cells using hvj (sendai virus) liposomes with gangliosides
fusion of influenza virus particles with liposomes: requirement for cholesterol and virus receptors to allow fusion with and lysis of neutral but not negatively charged liposomes
gene therapy for the treatment of brain tumors using intra-tumoral transduction with the thymidine kinase gene and intravenous ganciclovir
heterologous introns can enhance expression of transgenes in mice
cochleate lipid cylinders: formation by fusion of unilamellar lipid vesicles
targeted integration of adeno-associated virus (aav) into human chromosome 19
endocytosis and intracellular fate of liposomes using pyranine as probe
aerosol gene delivery in vivo
entrapment of high-molecular-mass dna molecules in liposomes for the genetic transformation of animal cells
transfer of condensed viral dna into eukaryotic cells using proteoliposomes
direct in vivo gene introduction into rat kidney
membrane fusion proteins of enveloped animal viruses
key: cord-015850-ef6svn8f authors: saitou, 
naruya title: eukaryote genomes date: 2013-08-22 journal: introduction to evolutionary genomics doi: 10.1007/978-1-4471-5304-7_8 sha: doc_id: 15850 cord_uid: ef6svn8f general overviews of eukaryote genomes are first discussed, including organelle genomes, introns, and junk dnas. we then discuss the evolutionary features of eukaryote genomes, such as genome duplication, c-value paradox, and the relationship between genome size and mutation rates. genomes of multicellular organisms, plants, fungi, and animals are then briefly discussed. genome duplications sometimes occur in eukaryotes, especially in plants and in vertebrates, but genome duplication is so far not known for prokaryotic genomes. because the gene number of typical eukaryotic genomes is much larger than that of prokaryotes, there are many genes shared among most eukaryote genomes but absent from prokaryote genomes. some examples are listed in table 8.2. for example, myosin is located in animal muscle tissues, and its homologous protein exists in the cytoskeleton of all eukaryotes, but it is not found in prokaryotes. recently, kryukov et al. (2012; [ 1 ] ) constructed a new database on oligonucleotide sequence frequencies and conducted a series of statistical analyses. frequencies of all possible oligonucleotides of length 1-10 were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1-4. deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. figure 8.1 shows the distribution of the deviation for various organismal groups. the biological reason for this difference is not known. there are two major types of organelles in eukaryotes: mitochondria and plastids. figure 8.2 shows schematic views of mitochondria and chloroplasts. these two organelles have their own independent genomes.
this suggests that they were initially independent organisms which started intracellular symbiosis with primordial eukaryotic cells. because most eukaryotes have mitochondria, the ancestral eukaryotes, a lineage that emerged from archaea, most probably started intracellular symbiosis with the mitochondrial ancestor. a parasitic bacterium, rickettsia prowazekii, is so far phylogenetically closest to mitochondria [ 2 ], and a rickettsia-like bacterium is the best candidate as the mitochondrial ancestor. however, there is an alternative "hydrogen hypothesis" [ 3 ]. plastids include chloroplasts, leucoplasts, and chromoplasts and exist in land plants, green algae, red algae, glaucophyte algae, and some protists like euglenoids. mitochondrial genome sizes of some representative eukaryotes are listed in table 8.3. most animal mitochondrial genomes are smaller than 20 kb, and those of protists and fungi are somewhat larger. mitochondrial genomes of plants are much larger than those of other eukaryotic lineages, yet their sizes are mostly less than 500 kb. an ancestral eukaryotic cell, probably an archaean lineage, hosted a bacterial cell, and intracellular symbiosis started. initially, archaea and bacteria shared genes responsible for basic metabolism, and the situation was a sort of gene duplication for many genes, though the homologous genes were not identical but had already diverged a long time ago. in any case, division of labor followed, and only limited metabolic pathways were left in the bacterial system, which eventually became mitochondria. animal mitochondrial genomes contain a very small number of genes: 13 for peptide subunits, 22 for trna, and 2 for rrna [ 4 ]. [table 8.3, partial: genome size (kb); animals: homo sapiens (human) 16.5; takifugu rubripes (torafugu fish) 16.] [fig. 8.3: representative animal species mitochondrial dna genomes.] although most vertebrate mitochondrial dna genomes have the same gene order as in human (fig. 8.3a), gene order may vary from phylum to phylum.
yet the gene content and the genome size are more or less constant among animals. it is not clear why animal mitochondrial genomes are so small. one possibility is that animal individuals are highly integrated compared to fungi and plants, and this might have influenced a drastic reduction of the mitochondrial genome size. another interesting feature of animal mitochondrial dna genomes is the heterogeneous rates of gene order change. for example, platyhelminthes exhibit great variability in mitochondrial gene order (sakai and sakaizumi, 2012; [ 5 ] ). in contrast, plant mitochondrial genomes are much larger (see table 8.3). figure 8.4 shows the structure of the tobacco mitochondrial genome (from sugiyama et al. 2005; [ 6 ] ). horizontal gene transfers are also known to occur in plant mitochondrial dnas even between remotely related species [ 7 ]. the melon (cucumis melo) mitochondrial genome, ca. 2.9 mb, is exceptionally large, and its draft sequence was recently determined [ 8 ]. interestingly, the melon mitochondrial genome resembles a vertebrate nuclear genome in its contents, although its size is similar to that of a bacterial genome. the protein coding gene region accounts for only 1.7 % of the genome, and about half of the genome is composed of repeats. the remaining part is mostly homologous to melon nuclear dna, and 1.4 % is homologous to melon chloroplast dna. most of the protein coding genes of melon mitochondrial dna are highly similar to those of its congeneric species, watermelon and squash, whose mitochondrial genome sizes are 119 kb and 125 kb, respectively. this indicates that the huge expansion of its genome size occurred only recently. interestingly, cucumber (cucumis sativus), another congeneric species, also has a ~1.8-mb mitochondrial genome with many repeat sequences [ 9 ]. it will be interesting to study whether the expansions of the mitochondrial genomes of melon and cucumber were independent or not.
chloroplasts exist only in plants, algae, and some protists. they may change into leucoplasts and chromoplasts. because of this, the generic name "plastids" may also be used. the origin of chloroplasts seems to be a cyanobacterium that started intracellular symbiosis, as in the case of mitochondria. a unique but common feature of chloroplast genomes is the existence of inverted repeats [ 10 ], which mainly contain rrna genes. chloroplast dna contents may [ 11 ] . chloroplast genomes were determined for more than 340 species as of december 2013 [ 106 ]. their genome sizes range from 59 kb (rhizanthella gardneri) to 521 kb (floydiella terrestris). although the largest chloroplast genome is still much smaller than a typical bacterial genome, its average intergenic length is 4 kb, much longer than that of bacterial genomes. fragments of mitochondrial dna are sometimes inserted into nuclear genomes, and they are called "numts." an extensive analysis of the human genome found over 600 numts [ 12 ]. their sequences are randomly distributed with respect to their locations in the mitochondrial genome. this suggests that mitochondrial dnas themselves were inserted, not cdnas reverse-transcribed from mitochondrial mrna. a possible source is sperm mitochondrial dna that was fragmented after fertilization [ 12 ]. transfer in the reverse direction, from nucleus to mitochondria, was observed in melon, as discussed in subsection 8.2.1. an intron is a dna region of a gene that is eliminated during splicing after transcription of a long premature mrna molecule. introns were discovered by phillip a. sharp and richard j. roberts in 1977 as "intervening sequences" [ 13 ], but the name "intron," coined by walter gilbert in 1978 [ 14 ], is now widely used. it should be noted that some descriptions of introns by kenmochi [ 15 ] were used in writing this section. there are various types of introns, but they can be classified into two: those requiring spliceosomes (spliceosome type) and the self-splicing type.
figure 8.5 shows the splicing mechanisms of these two major types. most introns in nuclear genomes of eukaryotes are of the spliceosome type, and there are the common gu-ag type and the rare au-ac type, depending on the nucleotide sequences of the intron-exon boundaries [ 16 ]. the spliceosomes involved in these two types differ [ 17 ]. self-splicing introns are divided into three groups: groups i, ii, and iii. group i introns exist in organellar and nuclear rrna genes of eukaryotes and in prokaryotic trna genes. group ii introns are found in organellar and some eubacterial genomes. cavalier-smith [ 18 ] suggested that spliceosome-type introns originated from group ii introns because of the similarity in splicing mechanism and the structural similarity between group ii introns and spliceosomal rna. group iii introns exist in organellar genomes, and their splicing system is similar to that of group ii introns, though they are smaller and have a unique secondary structure. there is yet another type of intron which exists only in trnas of single-cell eukaryotes and archaea [ 19 ]. these introns do not have self-splicing functions; instead, an endonuclease and an rna ligase are involved in splicing. this type of intron is often located at a certain position of the trna anticodon loop. since the discovery of introns, their probable functions and evolutionary origin have long been argued (e.g., [ 20 , 21 ] ). because self-splicing introns could have arisen at any time, even in the very early stage of the origin of life, we consider only spliceosome-type introns. for brevity, we hereafter call this type simply "intron." there are two major hypotheses: introns early and introns late. the former claims that exons existed as functional units from the common ancestor of prokaryotes and eukaryotes, and "exon shuffling" was proposed as a way of creating new protein functions [ 14 ]. introns, which separate exons, should then also be of quite ancient origin [ 14 , 22 ].
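the boundary rule described above (common gu-ag type versus rare au-ac type) can be illustrated with a minimal sketch. the function name `classify_intron` and the example sequences are hypothetical and are not taken from the references; real intron annotation relies on additional signals, such as the branch point, which are ignored here.

```python
def classify_intron(intron_seq: str) -> str:
    """Classify a spliceosomal intron by its first and last two nucleotides.

    Accepts DNA or RNA spelling; T is treated as U. This is an illustrative
    check of the boundary dinucleotides only, not a splice-site predictor.
    """
    seq = intron_seq.lower().replace("t", "u")
    if len(seq) < 4:
        raise ValueError("sequence too short to hold both boundary dinucleotides")
    ends = (seq[:2], seq[-2:])
    if ends == ("gu", "ag"):
        return "GU-AG"   # common type
    if ends == ("au", "ac"):
        return "AU-AC"   # rare type
    return "other"


# Hypothetical toy introns (not real genomic sequences).
for name, seq in [("toy1", "GTAAGTTTTCAG"), ("toy2", "ATATCCTTTAC"), ("toy3", "CCCCCC")]:
    print(name, classify_intron(seq))
```

the check looks only at the intron ends, which is why it cannot distinguish spliceosomal introns from other sequences that happen to share the same terminal dinucleotides.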
in contrast, introns are considered to have emerged only in the eukaryotic lineage according to the introns-late hypothesis [ 23 , 24 ]. the protein "module" hypothesis proposed by go [ 25 ] is related to the introns-early hypothesis. patterns of intron appearance and loss have been estimated by various methods (e.g., [ 21 , 26 ] ). kenmochi and his colleagues analyzed in detail the introns of ribosomal protein genes in mitochondrial genomes and eukaryotic nuclear genomes [ 27 - 29 ]. these studies supported the introns-late hypothesis, because introns in mitochondrial and cytosolic ribosomal protein genes seem to have independent origins, and introns seem to have emerged in many ribosomal protein genes after eukaryotes appeared. introns do not code for amino acid sequences by definition. in this sense, most introns may be classified as junk dnas (see the next section). there are, however, evolutionarily conserved regions in introns, suggesting that introns have some functional roles. ohno (1972; [ 30 ] ) proclaimed that most of the mammalian genome is nonfunctional and coined the term "junk dna." with the advent of eukaryotic genome sequence data, it is now clear that he was right. there is in fact a great deal of junk dna in eukaryotic genomes. junk dnas, or nonfunctional dnas, can be divided into repeat sequences and unique sequences. repeat sequences are either of the dispersed type or of the tandem type. unique sequences include pseudogenes that keep homology with functional genes. prokaryote genomes sometimes contain insertion sequences; in many eukaryotic genomes, however, this kind of dispersed repeat constitutes the major portion. these interspersed elements are divided into two major categories according to their lengths: short ones (sines) and long ones (lines). one well-known example of a sine is the alu element in primate genomes. it is about 300 bp long and originated from the 7sl rna gene. let us look at a real alu element sequence from the human genome sequence.
if we retrieve the ddbj/embl/genbank international sequence database accession number ap001720 (a part of chromosome 21), there are 128 alu elements in the 340-kb sequence. the density is 0.38 alu elements per 1 kb. if we consider the whole human genome of ~3 billion bp, alu repeats are expected to exist in ~1.13 million copies. one example of an alu sequence is shown below, from coordinates 133600 to 133906 of this entry:
ggcgggagcg atggctcacg cctgtaatgc cagcactttg ggaggccgag gtgggtggat cacaaggtca ggagatagag accatcctgg ctaacacggt gaaacactgt ctctactaaa aacacaaaaa actagccagg cgtggtggcg ggtgcctgta atcccagcta ctcgggaggc tgaggcagga gaatggtgtg aacccaggaa gtggagcttg cagtgagctc agattgcgcc actgcactcc agcctgggtg acagagtgag actccatctc aaaaaaaata aaataaataa aaaaaa
if we do a blast homology search (see chap. 14 ) using the ddbj system ( http://blast.ddbj.nig.ac.jp/blast/blastn ) targeted to nonhuman primate sequences (pri division of the ddbj database), the best hit is obtained from chimpanzee chromosome 22, which is orthologous to human chromosome 21. interested readers are encouraged to try this homology search as an exercise. alu elements were first classified into j and s subfamilies [ 31 ]. the reason for choosing these two characters (j and s) is not clear, but the two authors (jurka and smith) probably used the initials of their surnames. in any case, this division was based on the distance from the alu consensus sequence; alu elements which are closer to the consensus were classified as s and the others as j. later, a subset of the s subfamily was found to be highly similar to each other, and these elements were named y after "young," for they appeared relatively recently. rough estimates of the divergence times of alu elements are as follows: the j subfamily appeared about 60 million years ago, and the s subfamily separated from j 44 million years ago, followed by further separation of y 32 million years ago [ 32 ].
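the copy-number estimate given above for alu elements can be retraced with a short calculation. the numbers are those quoted in the text (128 elements in 340 kb, a genome of ~3 billion bp, coordinates 133600 to 133906); the variable names are hypothetical, and the extrapolation assumes that the chromosome 21 region is representative of the whole genome, which is only an approximation.

```python
# Values quoted in the text for accession AP001720 (part of chromosome 21).
alu_count = 128               # Alu elements found in the region
region_kb = 340               # length of the region in kb
genome_bp = 3_000_000_000     # approximate human genome size in bp

density_per_kb = alu_count / region_kb
expected_copies = density_per_kb * (genome_bp / 1000)

# Length of the example Alu element, counting both end coordinates.
example_length = 133906 - 133600 + 1

print(f"density: {density_per_kb:.2f} Alu elements per kb")
print(f"expected copies genome-wide: {expected_copies / 1e6:.2f} million")
print(f"example element length: {example_length} bp")
```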
figure 8.6 shows the overall pattern of alu element evolution (based on [ 32 ] ). tandemly repeated sequences are also abundant in eukaryotic genomes, and the representative ones are heterochromatin regions. heterochromatins are highly condensed nonfunctional regions in nuclear dna, in contrast to euchromatins, in which many genes are actively transcribed. heterochromatins usually reside at telomeres, the terminal parts of chromosomes, and at centromeres, the internal parts of chromosomes to which spindle fibers attach during cell division. telomeric regions of more than 1 mb in arabidopsis thaliana were found to be tandem repeats of a ca. 180-bp repeat unit [ 33 , 34 ]. the nucleotide sequence below is the arabidopsis thaliana tandemly repeated sequence ar12 (international sequence database accession number x06467):
aagcttcttc ttgcttctca atgctttgtt ggtttagccg aagtccatat gagtctttgt ctttgtatct tctaacaagg aaacactact taggctttta ggataagatt gcggtttaag ttcttatact taatcataca catgccatca agtcatattc gtactccaaa acaataacc
the human genome also has a similar but nonhomologous sequence in centromeres, called "alphoid dna," with a 171-bp repeat unit [ 35 ]. the following is the sequence (international sequence database accession number m21746):
catcctcaga aacttctttg tgatgtgtgc attcaagtca cagagttgaa cattcccttt cgtacagcag tttttaaaca ctctttctgt agtatctgga agtgaacatt aggacagctt tcaggtctat ggtgagaaag gaaatatctt caaataaaaa ctagacagaa g
if we do a blast homology search (see chap. 13 ) targeted to the human genome sequences of the ncbi database, there is no hit with this alphoid sequence. this clearly shows that the human genome sequences currently available are far from complete, for they do not include most of these tandem repeat sequences. telomeres of the human genome are composed of hundreds of copies of the 6-bp repeat ttaggg. if we search the human genome with the ncbi blast using a 36-bp query consisting of six tandem copies of this 6-bp unit, many hits are obtained. as we already discussed in chap.
4, authentic pseudogenes have no function, and they are genuine members of junk dnas. when a gene duplication occurs, one of the two copies often becomes a pseudogene. because gene duplication is prevalent in eukaryote genomes, pseudogenes are also abundant. pseudogenes are, by definition, homologous to functional genes. however, after a long evolutionary time, many selectively neutral mutations accumulate in pseudogenes, and eventually they lose sequence homology with their functional counterparts. there are many unique sequences in eukaryote genomes, and the majority of them may be this kind of homology-lost pseudogene. a long rna is initially transcribed from a genomic region having an exon-intron structure, and then the rnas corresponding to introns are spliced out. these leftover rnas may be called "junk" rnas, for they are soon degraded by rnase. only a limited set of genes is transcribed in each tissue of multicellular organisms, but leaky expression of some genes may happen in tissues in which these genes should not be expressed. again these are "junk" rnas, and they are swiftly decomposed. a series of studies (e.g., [ 36 , 37 ] ) claimed that many noncoding dna regions are transcribed. however, van bakel et al. [ 38 ] showed that most of these transcripts were artifacts of the chip-chip technologies used in these studies. if a nonsense or frameshift mutation occurs in a protein coding sequence, that gene cannot make a protein. yet its mrna may be produced continuously until the promoter or its enhancer becomes nonfunctional. in this case, the mutated gene produces junk rnas. if only a small quantity of an rna is found in cells and it is not evolutionarily conserved, it is probably some kind of junk rna. as junk dnas and junk rnas exist, cells may also have "junk" proteins.
if mature mrnas are not produced in the expected way, various aberrant mrna molecules will be produced, and ribosomes will try to translate them into peptides based on this wrong mrna information. proteins produced in this way may be called "junk" proteins, for they often have little or no function. even if a protein is correctly translated and is moved to its expected cellular location, it can still be considered a "junk" protein. one good example is the abcc11 transporter protein of dry-type cerumen (earwax), for one nonsynonymous substitution in this gene caused the protein to be essentially nonfunctional [ 39 ]. there are various genomic features that are specific to eukaryotes other than the existence of introns and junk dnas, such as genome duplication, rna editing, c-value paradox, and the relationship between genome size and mutation rates. we will briefly discuss them in this section. the most dramatic and influential change of the genome structure is genome duplication. genome duplications are also called polyploidization, but this term is tightly linked to karyotypes or chromosome constellation. prokaryotes are so far not known to have experienced genome duplications, which are restricted to eukaryotes. interestingly, genome duplications are quite frequent in plants, while they are relatively rare in the other two multicellular eukaryotic lineages. an ancient genome duplication was found from the genome analysis of baker's yeast [ 40 ], and rhizopus oryzae, a basal lineage fungus, was also found to have experienced a genome duplication [ 41 ]. among protists, paramecium tetraurelia is known to have experienced at least three genome duplications [ 42 ]. because we humans belong to vertebrates and the two-round genome duplications occurred in the common ancestor of vertebrates (see chap. 9 ), we may be inclined to think that genome duplications often happen in many animal species. this is not the case.
so far, only vertebrates and some insects are known to have experienced genome duplications. the reason for this scattered distribution of genome duplication occurrences is not known. if we plot the distribution of the number of synonymous substitutions between duplogs in one genome, it is possible to detect a relatively recent genome duplication. this is because all genes duplicate when a genome duplication occurs, while only a small number of genes duplicate in other modes of gene duplication (see chap. 3 ). figure 8.7 shows a schematic view of two cases: with and without genome duplication. lynch and conery (2000; [ 44 ] ) applied this method to various genome sequences and found that the arabidopsis thaliana genome showed a clear peak indicative of a relatively recent genome duplication, while the genome sequences of the nematode caenorhabditis elegans and the yeast saccharomyces cerevisiae showed curves of exponential reduction. it is interesting to note that the genome duplication was not known for arabidopsis thaliana before its genome sequence was determined, while the genome of saccharomyces cerevisiae was later shown to have been duplicated a long time ago [ 40 ]. when a genome duplication occurred in some ancient time, the number of synonymous substitutions may be saturated and cannot give an appropriate result. in this case, the number of amino acid substitutions may be used, even though proteins vary in their rates of amino acid substitution. in any case, accumulation of mutations will eventually cause two homologous genes to lose their similarity to each other. therefore, although the possibility of genome duplications in prokaryotes has so far been rejected [ 45 ], it is not possible to infer events of the remote past simply by searching for sequence similarity. we should be careful before reaching a final conclusion. modification of particular rna molecules after they are produced via transcription is called rna editing.
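the idea of detecting a genome duplication from the distribution of synonymous substitutions between duplogs, described above, can be sketched as follows. this is a crude illustrative heuristic under stated assumptions, not the procedure of lynch and conery [ 44 ]: the function names, the bin width, the thresholds `ratio` and `min_count`, and the toy data are all hypothetical. the sketch bins the synonymous distances and reports a peak when a bin is clearly higher than the bin before it, which should not happen under a steadily decaying curve.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count duplicate-gene pairs per bin of synonymous distance."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


def has_secondary_peak(counts, ratio=1.5, min_count=10):
    """Return True if some bin clearly exceeds the bin before it."""
    return any(
        counts[i] >= min_count and counts[i] > ratio * counts[i - 1]
        for i in range(1, len(counts))
    )


# Toy data: pair counts that decay with distance (no genome duplication).
decay_counts = [40, 30, 22, 16, 12, 9, 7, 5, 4, 3]
decay_only = [(i + 0.5) * 0.1 for i, c in enumerate(decay_counts) for _ in range(c)]

# Same background plus 50 pairs created at one time point (toy duplication).
with_duplication = decay_only + [0.75] * 50

print(has_secondary_peak(ks_histogram(decay_only)))
print(has_secondary_peak(ks_histogram(with_duplication)))
```

with real data the thresholds would need to be chosen with care, and saturation of synonymous substitutions limits the approach to relatively recent duplications, as noted in the text.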
all three major rna molecules (mrna, trna, and rrna) may experience editing [ 46 ]. there are various patterns of rna editing; substitutions, in particular between c and u, and insertions and deletions, particularly of u, are mainly found in eukaryote genomes. guide rna molecules exist in one of the main rna editing mechanisms, and they specify the location of editing, but there are some other mechanisms [ 47 ]. it is not clear how the rna editing mechanism evolved. tillich et al. [ 47 ] studied chloroplast rna editing and concluded that many nucleotide sites of the chloroplast genome suddenly started to undergo rna editing, but that the number of edited sites later decreased steadily via mutational changes. they claimed that rna editing was not involved in gene expression. this result does not give rna editing a positive significance. because there are many types of rna molecules inside a cell, there also exist many sorts of enzymes that modify rnas. it may be possible that some of them suddenly started to edit rnas via a particular mutation. rna editing which did not cause deleterious effects to the genome may have survived by chance at the initial phase. this view suggests the involvement of a neutral evolutionary process in the evolution of rna editing. organisms with complex metabolic pathways have many genes. multicellular organisms are such examples. generally speaking, their genome sizes are expected to be large. in contrast, viruses, whose genomes contain only a handful of genes, have small genome sizes. therefore, their possibility of genome evolution is rather limited. even if amino acid sequences are rapidly changing because of high mutation rates, the protein function may not change. unless the gene number and genome size increase, viruses cannot evolve their genome structures. it is thus clear that the increase of the genome size is crucial to produce the diversity of organisms.
however, genomes often contain dna regions which are not indispensable. organisms with large genome sizes have many such junk dna regions. because of their existence, the genome size and the gene number are not necessarily highly correlated. this phenomenon was historically called the c-value paradox (e.g., [ 48 ] ): around 1950, the haploid dna amount was found to be constant within one species, yet to vary considerably among species (e.g., [ 49 - 51 ] ). "c-value" is the amount of haploid dna, and c probably stands for "constant" or "chromosomes." we now know that the majority of eukaryote genome dna is junk, and there is no longer a paradox in c-values among species. 56 ]) found conserved noncoding dna sequences from insects, nematodes, and yeasts by comparing closely related species. we will discuss more on conserved noncoding sequences (cnss) of vertebrates in chap. 9 . as for plants, kaplinsky ( [ 62 ] ) compared genome sequences of arabidopsis, grape, rice, and brachypodium and found >100 times more abundant cnss in monocots than in dicots. hettiarachchi and saitou [ 63 ] compared genome sequences of 15 plant species and searched for lineage-specific cnss. they found 2 and 22 cnss shared by all vascular plants and angiosperms, respectively, and also confirmed that monocot cnss are much more abundant than those of dicots. what kind of relationship exists between the genome size and mutation rates? if all the genetic information contained in the genome of one organism is necessary for the survival of that organism, the individual will die even if only one gene of its genome loses its function by a mutation. an organism with a small genome size and hence with a small number of genes, such as a virus, can survive even if the mutation rate is high. in contrast, organisms with many genes may not be able to survive if highly deleterious mutations often happen. therefore, such organisms must reduce the mutation rate.
however, when the nucleotide substitution type mutation rate per generation was compared with the whole-genome size, lynch (2006; [ 65 ] ) found a positive correlation. more recently, lynch (2010; [ 66 ] ) admitted that for organisms with small-sized genomes, these two values were in fact negatively correlated. however, when large-genome-sized eukaryotes were compared, a positive correlation was observed. we have to be careful when we discuss these two contradictory reports. one measured the rate per physical year, while the other used one generation as the unit. another difference is whether only the dna size of protein coding gene regions or the whole-genome size was used. the relationship between the mutation rate and genome size is not simple. drake et al. (1998; [ 67 ] ) examined this problem and found that the mutation rate per genome per replication was approximately 1/300 for bacteria, while mutation rates of multicellular eukaryotes vary between 0.1 and 100 per genome per sexual or individual generation. table 8.4 lists the mutation rates and genome sizes of various organisms. apparently there is no clear tendency. in this section, we will discuss genomes of three multicellular lineages of eukaryotes: plants, fungi, and animals. unfortunately, there seems to be no common feature of genomes of multicellular organisms, so each lineage is discussed independently. arabidopsis thaliana was the first plant species whose genome, of 125 mb, was determined, in 2000 [ 68 ]. a. thaliana is a model organism for flowering plants (angiosperms), with a generation time of only 2 months. in spite of its small genome size, only 4 % of the human genome, it has 32,500 protein coding genes. the genome sequence of its closely related species, a. lyrata, was also recently determined [ 69 ]. angiosperms are divided into monocots and dicots. a. thaliana is a dicot, and genome sequences of six more species were determined as of december 2013 (see table 8.5).
rice, oryza sativa, is a monocot, and its genome size, 370 ~ 410 mb, is much smaller than that of the wheat genome. genomes of its japonica and indica subspecies were determined [ 70 , 71 ], and the origin of rice domestication is currently in great controversy, particularly over whether there was a single or multiple domestication events (e.g., [ 72 , 73 ] ). the number of protein coding genes in the rice genome is 37,000 ~ 40,000 [ 74 ]. wheat corresponds to the genus triticum, and there are many species in this genus. the typical bread wheat is triticum aestivum, and it is a hexaploid with 42 (7 × 6) chromosomes. its genome arrangement is conventionally written as aabbdd [ 75 ]. because it now behaves as a diploid, genomic sequencing of 21 chromosomes (a1-a7, b1-b7, and d1-d7) is under way (see http://www.wheatgenome.org/ for the current status). the hexaploid genome structure emerged by hybridization of the tetraploid (aabb) cultivated species t. durum and the diploid (dd) wild species aegilops tauschii [ 75 ]. a genome duplication followed the hybridization. non-seed land plants are ferns, lycophytes, and bryophytes, in the order of closeness to seed plants (e.g., [ 76 ] ). a draft genome sequence of a moss, physcomitrella patens, was reported in 2008 [ 77 ], followed by genome sequencing of a lycophyte, selaginella moellendorffii, in 2011 [ 78 ]. these genome sequences of different plant lineages are helping to decipher the stepwise evolution of land plants. the genome sequence of baker's yeast (saccharomyces cerevisiae) was determined in 1996, as the first among eukaryotic organisms [ 79 ]. there are 16 chromosomes in s. cerevisiae, and its genome size is about 12 mb. there are a total of 8,000 genes in its genome: 6,600 orfs and 1,400 other genes. the genome-wide gc content is 38 %, slightly lower than that of the human genome.
the proportion of introns is very small compared to that of the human genome, and the average length of one intron is only 20 bp, in contrast to the 1,440-bp average length of exons [ 80 ]. as we already discussed, the ancestral genome of baker's yeast experienced a genome-wide duplication [ 40 ]. pseudogenes, which are common in vertebrate genomes, are rather rare in the genome of baker's yeast; they constitute only 3 % of the protein coding genes [ 80 ]. baker's yeast is often considered the model organism for all eukaryotes; however, its genome may not be a typical eukaryote genome. as of december 2013, genome sequences of more than 400 fungal species are available (see the ncbi genome list at http://www.ncbi.nlm.nih.gov/genome/browse/ for the present situation). figure 8.9 shows the relationship between the genome size and gene numbers for 88 genomes. there is a clear positive correlation between them. however, there are some outliers. the perigord black truffle (tuber melanosporum), shown as a in fig. 8.9, has the largest genome size (~125 mb) among the 88 fungal species whose genome sequences have so far been determined, yet the number of genes is only ~7,500 [ 81 ]. three other outlier species are postia placenta, ajellomyces dermatitidis, and melampsora larici-populina, shown as b, c, and d in fig. 8.9, respectively. interestingly, these four outlier species are not well clustered phylogenetically; two belong to pezizomycotina of ascomycota, and the other two belong to agaricomycotina and pucciniomycotina of basidiomycota. if we exclude these four outlier species, a good linear regression is obtained, as shown in fig. 8.9. this straight line indicates that, on average, one gene corresponds to 2.9 kb in a typical fungal genome. if we apply this average gene size to the truffle genome, its genome size should be ~22 mb, but the real size is 103 mb larger. this suggests that there is an unusually large amount of junk dna in this genome.
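the estimate of the excess dna in the truffle genome can be retraced with the figures given above (2.9 kb per gene from the regression, ~7,500 genes, ~125 mb genome). this is only a worked restatement of the arithmetic in the text; the variable names are hypothetical, and the expected size is rounded to whole megabases as in the text.

```python
kb_per_gene = 2.9          # average genome length per gene from the regression
truffle_genes = 7500       # approximate gene number of Tuber melanosporum
truffle_genome_mb = 125    # approximate observed genome size in Mb

# Genome size expected if the truffle followed the typical fungal relationship.
expected_mb = round(truffle_genes * kb_per_gene / 1000)

# DNA in excess of that expectation, in Mb and as a share of the genome.
excess_mb = truffle_genome_mb - expected_mb
excess_fraction = excess_mb / truffle_genome_mb

print(f"expected genome size: ~{expected_mb} Mb")
print(f"excess over expectation: {excess_mb} Mb ({excess_fraction:.0%} of the genome)")
```

the excess share of roughly four fifths of the genome is the quantity against which reported fractions of repetitive dna can be compared.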
in fact, 58 % of its genome consists of transposable elements [ 81 ]. the truffle genome must still have 24 % more junk dna regions. gain and loss of genes in each branch of the phylogenetic tree for fungal species are shown in fig. 8.10 (based on [ 81 ] ). it will be interesting to examine genome sizes of species related to the perigord black truffle, so as to infer the evolutionary period when the genome size expansion occurred. [fig. 8.9: the relationship between the genome size and gene numbers among 88 fungi genomes.] [...] system that is responsible for this is hox genes. we thus first discuss this gene system in this subsection. the genome of c. elegans, the first genome determined among animals, will be discussed next, followed by genomes of insects and those of deuterostomes. because genomes of many vertebrate species have been determined, we discuss them in chap. 9 , and the human genome in particular in chap. 10 . hox genes were initially found by edward b. lewis through studies of homeotic mutations that dramatically change the segmental structure of drosophila [ 82 ]. they code for transcription factors, and their dna-binding peptide, now called the homeobox domain, was later found in almost all animal phyla [ 83 ]. figure 8.11 shows the hox gene clusters found in 12 animal groups. there are four hox clusters in mammalian and avian genomes, and they were most probably generated by the two-round genome duplication in the common ancestor of vertebrates (see chap. 9 ). interestingly, the physical order of hox genes in chromosomes corresponds to the order of gene expression during development, a phenomenon called "collinearity" [ 84 ]. this suggests that some sort of cis-regulation is operating in hox gene clusters, and in fact, many long transcripts are found, and some of their transcription start sites are highly conserved among vertebrates [ 85 ].
figure 8.12 shows highly conserved [...]. the hox genes control expression of different groups of downstream genes, such as transcription factors, elements in signaling pathways, or genes with basic cellular functions. hox gene products interact with other proteins, in particular those in signaling pathways, and contribute to the modification of homologous structures and the creation of new morphological structures [87]. there are other gene families that are thought to be involved in the diversity of animal body plans. one of them is the zic gene family [88]. the zic gene family exists in many animal phyla with high amino acid sequence homology in a zinc-finger domain called zf, and members of this gene family are involved in neural and neural crest development, skeletal patterning, and left-right axis establishment. this gene family has two additional domains, zoc and zf-bc. interestingly, cnidaria, platyhelminthes, and urochordata lack the zoc domain, and their zf-bc domain sequences are quite diverged compared to those of arthropoda, mollusca, annelida, echinodermata, and chordata. this distribution suggests that zic family genes with the entire set of three conserved domains already existed in the common ancestor of bilaterian animals, and that some of them may have been lost in parallel in platyhelminthes, nematodes, and urochordates [88]. interestingly, phyla that lost the zoc domain have quite distinct body plans although they are bilaterian. caenorhabditis elegans was the first animal species whose genome was sequenced; its 97-mb draft genome sequence was determined in 1998 [89]. this organism belongs to the phylum nematoda, which includes a vast number of species [90]. brenner (1974; [91]) chose this species as a model organism to study the neuronal system because of its short generation time (~4 days) and its small size (~1 mm). the following description in this section is based on the information given in the online "wormbook" [86]. there are 22,227 protein coding genes in c.
elegans including 2,575 alternatively spliced forms, with 79% confirmed to be transcribed at least partially. the number of trna genes is 608, and 274 of them are located on the x chromosome. the three kinds of rrna genes (18s, 5.8s, and 26s) are located on chromosome i in 100-150 tandem repeats, while ~100 5s rrna genes are also in tandem form but located on chromosome v. the average protein coding gene length is 3 kb, with an average of 6.4 coding exons per gene. in total, protein coding exons constitute 25.6% of the whole genome. figure 8.13 shows the distribution of the protein coding genes, and fig. 8.14 the distribution of exon numbers per gene. both distributions have long tails. the median sizes of exons and introns are 123 bp and 65 bp, respectively. introns of c. elegans are quite short compared to those of vertebrate genes (see chap. 9). the density of protein coding genes varies among chromosomes: it is slightly higher on the five autosomes than on the x chromosome, and higher in the central region than at the ends of each chromosome. processed, i.e., intronless, pseudogenes are rare, and a total of 561 pseudogenes were reported in wormbase version ws133. about half of them are homologous to functional chemoreceptor genes. genome sequences of four congeneric species of c. elegans (c. brenneri, c. briggsae, c. japonica, and c. remanei) were determined (http://www.ncbi.nlm.nih.gov/genome/browse/). the fruit fly drosophila melanogaster was used by thomas hunt morgan's group in the early twentieth century and has been used for many genetic studies. because of this importance, its genome sequence was the first to be determined among arthropods, in 2000 [92]. heterochromatin regions of ~50 mb were excluded from sequencing, [...] [93]. their genome sizes vary from 145 to 258 mb, and the number of genes is 15,000-18,000. interestingly, d. melanogaster has the largest genome size and the smallest number of genes.
genomes of a total of 12 insect species other than the 12 drosophila species were sequenced by the end of 2011 [1]. as of december 2013, their genome sizes range from 108 mb to 540 mb, about a fivefold difference, and the gene numbers range from 9,000 to 23,000. deuterostomes contain five phyla: echinodermata, hemichordata, chaetognatha, xenoturbellida, and chordata. the genome of the sea urchin strongylocentrotus purpuratus [94] was determined in 2006. its genome size is 814 mb with 23,300 genes. genomes of two other echinoderms, the sea urchin lytechinus variegatus and the sea star patiria miniata, are also being sequenced, as is that of the hemichordate saccoglossus kowalevskii. chordata is classified into urochordata (ascidians), cephalochordata (lancelets or amphioxus), and vertebrata (vertebrates). because we will discuss genomes of vertebrates in chap. 9, let us discuss genomes of ascidians and lancelets only. the genome of the ascidian ciona intestinalis was determined in 2002 [95], and the genome sequence of its congeneric species, c. savignyi, was determined three years later [96]. the genome size of c. intestinalis is ~155 mb with ~16,000 genes. interestingly, it contains a group of genes for cellulose-synthesizing enzymes, which were probably introduced from some bacterial genomes via horizontal gene transfer [8, 97]. the c. intestinalis genome also contains several genes that are considered to be important for heart development [95], and this suggests that the hearts of ascidians and vertebrates may be homologous. through the superimposition of phylogenetic trees (see chapter a2) for five genes coding for muscle proteins, oota and saitou [98] estimated that vertebrate heart muscle was phylogenetically closer to vertebrate skeletal muscles. if both results are true, muscles used in the heart might have been substituted in the vertebrate lineage. the genome sequence of an amphioxus (the cephalochordate branchiostoma floridae) was determined by holland et al.
(2008; [104]), and it provides good outgroup sequence data for vertebrates. eukaryotic viruses rely on their eukaryotic host species for most metabolic pathways. therefore, the number of genes in virus genomes is usually very small. for example, influenza a virus has 8 rna segments coding for 11 proteins, and the total genome size is ~13.6 kb. as in bacteriophages, there are both dna-type and rna-type genomes in eukaryotic viruses. table 8.6 shows one example of classification of eukaryotic viruses based on their genome structure [99]. genomes of double-strand dna viruses are of four types: circular, simple linear, linear with proteins covalently attached to both ends, and linear with both ends closed. genomes of single-strand dna viruses are either circular or linear. rna genomes are all linear in both single- and double-strand types. single-strand rna genomes are classified into two types: plus strand and minus strand. a subset of the single-plus-strand rna genome type is experiencing [...] [100]. megavirus is phylogenetically close to mimivirus [101], a member of the nucleocytoplasmic large dna viruses, which include poxviruses. recently, a virus with an even larger genome, pandoravirus, with a genome of more than 2.5 mb, was discovered [105]. the phylogenetic status of these large-genome dna viruses is unknown at this moment.
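the genome-structure classes of eukaryotic viruses described above can be written down as a small lookup table. the python sketch below is only an illustration of that prose description: the dictionary layout, the labels, and the function name are assumptions made here, not the contents of table 8.6 itself.

```python
# illustrative encoding of the genome-structure classes described in the text;
# the layout and all names are hypothetical and chosen for this sketch only.

GENOME_FORMS = {
    ("dna", "double"): (
        "circular",
        "simple linear",
        "linear with proteins covalently attached to both ends",
        "linear with closed ends",
    ),
    ("dna", "single"): ("circular", "linear"),
    ("rna", "double"): ("linear",),
    ("rna", "single"): ("linear",),
}

# single-strand rna genomes are further split by strand sense
SSRNA_SENSES = ("plus", "minus")


def is_described_form(nucleic_acid: str, strandedness: str, form: str) -> bool:
    """Return True if the text lists this form for the given genome type."""
    return form in GENOME_FORMS.get((nucleic_acid, strandedness), ())


# print every class together with the forms listed for it
for (nucleic_acid, strandedness), forms in GENOME_FORMS.items():
    print(f"{strandedness}-strand {nucleic_acid}: {', '.join(forms)}")

print(is_described_form("dna", "double", "circular"))
print(is_described_form("rna", "single", "circular"))
```

keeping the classes in one table makes the asymmetry in the description easy to see: only dna genomes are listed with circular forms, while every rna class is linear.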
analysis of the genome sequence of the flowering plant arabidopsis thaliana
the genome of the cucumber, cucumis sativus l.
draft genome sequence of the oilseed species ricinus communis
the genome of black cottonwood, populus trichocarpa
the grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla
genome sequence of foxtail millet (setaria italica) provides insights into grass evolution and biofuel potential
a new database (gcd) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses
the genome sequence of rickettsia prowazekii and the origin of mitochondria
the hydrogen hypothesis for the first eukaryote
mitochondrial genome
the complete mitochondrial genome of dugesia japonica (platyhelminthes; order tricladida)
the complete nucleotide sequence of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants and multipartite organization
widespread horizontal transfer of mitochondrial genes in flowering plants
determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of dna having a nuclear origin
small, repetitive dnas contribute significantly to the expanded mitochondrial genome of cucumber
the complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression
changes in the structure of dna molecules and the amount of dna per plastid during chloroplast development in maize
pattern of organization of human mitochondrial pseudogenes in the nuclear genome
why genes in pieces?
introns. in encyclopedia of evolution.
tokyo: kyoritsu shuppan
comprehensive splice-site analysis using comparative genomics
the ever-growing world of small nuclear ribonucleoproteins
intron phylogeny: a new hypothesis
trnomics: analysis of trna genes from 50 genomes of eukarya, archaea, and bacteria reveals anticodon-sparing strategies and domain-specific features
the origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate?
the evolution of spliceosomal introns: patterns, puzzles and progress
genes in pieces: were they ever together?
nuclear volume control by nucleoskeletal dna, selection for cell volume and cell growth rate, and the solution of the dna c-value paradox
the recent origins of spliceosomal introns revisited
correlation of dna exonic regions with protein structural units in haemoglobin
remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution
new maximum likelihood estimators for eukaryotic intron evolution
analysis of ribosomal protein gene structures: implications for intron evolution
intron dynamics in ribosomal protein genes
so much "junk" dna in our genome
a fundamental division in the alu family of repeated sequences
whole-genome analysis of alu repeat elements reveals complex evolutionary history
characterization of highly repetitive sequences of arabidopsis thaliana
centromeric repetitive sequences in arabidopsis thaliana
sequence definition and organization of a human repeated dna
empirical analysis of transcriptional activity in the arabidopsis genome
identification and analysis of functional elements in 1% of the human genome by the encode pilot project
most "dark matter" transcripts are associated with known genes
a snp in the abcc11 gene is the determinant of human earwax type
molecular evidence for an ancient duplication of the entire yeast genome
genomic analysis of the basal lineage fungus rhizopus oryzae reveals a whole-genome
duplication
global trends of whole-genome duplications revealed by the ciliate paramecium tetraurelia
size of the protein-coding genome and rate of molecular evolution
the evolutionary fate and consequences of duplicated genes
comparative genomics in prokaryotes
functions and mechanisms of rna editing
the evolution of chloroplast rna editing
chromosome structure and the c-value paradox
la teneur du noyau cellulaire en acide désoxyribonucléique à travers les organes, les individus et les espèces animales (in french)
nucleoprotein determination in cytological preparations
the constancy of deoxyribose nucleic acid in plant nuclei
conserved linkage between the puffer fish (fugu rubripes) and human genes for platelet-derived growth factor receptor and macrophage colony-stimulating factor receptor
conserved noncoding sequences are reliable guides to regulatory elements
enrichment of regulatory signals in conserved non-coding genomic sequence
evolution at two levels: on genes and form
evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
utility and distribution of conserved noncoding sequences in the grasses
conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution
conserved noncoding sequences in the grasses
arabidopsis intragenomic conserved noncoding sequence
the banana (musa acuminata) genome and the evolution of monocotyledonous plants
computational analysis and characterization of uce-like elements (ules) in plant genomes
identification and analysis of conserved noncoding sequences in plants
viral mutation rates
the origins of eukaryotic gene structure
evolution of the mutation rate
rates of spontaneous mutation
analysis of the genome sequence of the flowering plant arabidopsis thaliana
the arabidopsis lyrata genome sequence and the basis of rapid genome size change
a draft sequence of the rice genome (oryza sativa l. ssp.
japonica)
a draft sequence of the rice genome
phylogeography of asian wild rice, oryza rufipogon, reveals multiple independent domestications of cultivated rice, oryza sativa
independent domestication of asian rice followed by gene flow from japonica to indica
curated genome annotation of oryza sativa ssp. japonica and comparative genome analysis with arabidopsis thaliana
multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants
the physcomitrella genome reveals evolutionary insights into the conquest of land by plants
the selaginella genome identifies genetic changes associated with the evolution of vascular plants
overview of the yeast genome
origin of genome architecture
perigord black truffle genome uncovers evolutionary origins and mechanisms of symbiosis
master control genes in development and evolution: the homeobox story
from dna to diversity
evolution of conserved non-coding sequences within the vertebrate hox clusters through the two-round whole genome duplications revealed by phylogenetic footprinting analysis
wormbook: the online review of c. elegans biology
function and specificity of hox genes
a wide-range phylogenetic analysis of zic proteins: implications for correlations between protein structure conservation and body plan complexity
genome sequence of the nematode c.
elegans: a platform for investigating biology
an improved molecular phylogeny of the nematoda with special emphasis on marine taxa
the genetics of caenorhabditis elegans
the genome sequence of drosophila melanogaster
evolution of genes and genomes on the drosophila phylogeny
the genome of the sea urchin strongylocentrotus purpuratus
the draft genome of ciona intestinalis: insights into chordate and vertebrate origins
assembly of polymorphic genomes: algorithms and application to ciona savignyi
a functional cellulose synthase from ascidian epidermis
phylogenetic relationship of muscle tissues deduced from superimposition of gene trees
genome science and microorganismal molecular genetics
distant mimivirus relative with a larger genome highlights the fundamental features of megaviridae
the 1.2-megabase sequence of mimivirus
ultraconserved elements in the human genome
genomu shinkagaku nyumon (written in japanese, meaning 'introduction to evolutionary genomics')
the amphioxus genome illuminates vertebrate origins and cephalochordate biology
pandoraviruses: amoeba viruses with genomes up to 2.5 mb reaching that of parasitic eukaryotes

key: cord-023647-dlqs8ay9 authors: nan title: sequences and topology date: 2003-03-21 journal: curr opin struct biol doi: 10.1016/0959-440x(91)90051-t sha: doc_id: 23647 cord_uid: dlqs8ay9 nan
garrell j, modolell j: the drosophila locus, an antagonist of proneural genes that, like these genes, encodes a helix-loop-helix protein. cell 1990, 61:39-48
in crystals of boc-aib-glu(obzl)-leu-aib-ala-leu-aib-ala-lys(z)-aib-ome. proc natl acad sci usa 1990, 87:7921-7925
cloning and expression of two distinct high-affinity receptors cross-reacting with acidic and basic fibroblast growth factors. embo j 1990, 9:1957-1962
hengst l, lehmeier t, gallwitz d: the ryh1 gene in the fission yeast schizosaccharomyces pombe encoding a gtp-binding protein related to rho and ypt: structure, expression and identification of its human homologue.
embo j 1990, 9:1949-1956
serotonin receptor that activates adenylate cyclase
domain of lutropin/choriogonadotropin receptor expressed in transfected cells binds choriogonadotropin with high affinity
chromosomal organization of adrenergic receptor genes
molecular characterization of a rat α2b-adrenergic receptor
identification of rpo30, a vaccinia virus rna polymerase gene with structural similarity to a eucaryotic transcription elongation factor
nucleotide sequence analysis of the l gene of vesicular stomatitis virus (new jersey serotype): identification of conserved domains in l proteins of nonsegmented negative-strand rna viruses
a novel human immunodeficiency virus type-1 protein, tev, shares sequences with tat, env, and rev proteins
phosphoprotein and nucleocapsid protein evolution of vesicular stomatitis virus new jersey
identification of a conserved region common to cadherins and influenza strain a hemagglutinins
sequence and evolutionary relationships of african swine fever virus thymidine kinase
an unusual structure of a putative t cell oncogene which allows production of similar proteins from distinct messenger rnas
identification of a 3rd protein factor which binds to the rous sarcoma virus ltr enhancer: possible homology with the serum response factor
genetic variation and multigene families in african swine fever virus
sequence of the genome rna of rubella virus: evidence for genetic rearrangement during togavirus evolution
derse d: equine infectious anemia virus tat: insights into the structure, function, and evolution of lentivirus trans-activator proteins
a comparison of the genome organization of capripoxvirus with that of the orthopoxviruses
evolutionary origin of human and simian immunodeficiency viruses
a new superfamily of putative ntp-binding domains encoded by genomes of small dna and rna viruses
envelope gene sequence of htlv-1 isolate mt-2 and its comparison with other htlv-i isolates
evolutionary relationship between
luteoviruses and other rna plant viruses based on sequence motifs in their putative rna polymerases and nucleic acid helicases
isolation and sequence analysis of caenorhabditis briggsae repetitive elements related to the caenorhabditis elegans transposon tc1. j mol evol 1990
selective cloning and sequence analysis of the human l1 sequences which transposed in the relatively recent past
jowcs m&: sequences related to the maize transposable element ac in the genus zea. j mol evol 1990
evolutionary pattern of the hemagglutinin gene of influenza b viruses isolated in japan: cocirculating lineages in the same epidemic season
the dna binding subunit of nf-kappa-b is identical to factor kbf1 and homologous to the rel oncogene product
sequencing analyses and comparison of parainfluenza virus type-4a and type-4b np protein genes. virology
complete sequence of the genomic rna of o'nyong-nyong virus and its use in the construction of alphavirus phylogenetic trees
molecular cloning of the rinderpest virus matrix gene: comparative sequence analysis with other paramyxoviruses. virology
callahan r: ancestry of a human endogenous retrovirus family
determination of an epitope of the diffuse systemic sclerosis marker antigen dna topoisomerase-i: sequence similarity with retroviral p30gag protein suggests a possible cause for autoimmunity in systemic sclerosis. proc natl acad sci usa 1989, 86:8492-8496.
mcgeoch dj: protein sequence comparisons show that the 'pseudoproteases' encoded by poxviruses and certain retroviruses belong to the deoxyuridine triphosphatase family
olszewski ne: properties of commelina yellow mottle virus's complete dna sequence, genomic discontinuities and transcript suggest that it is a pararetrovirus
hepatitis c virus shares amino acid sequence similarity with pestiviruses and flaviviruses as well as members of two plant virus supergroups
mosmann tr: homology of cytokine synthesis inhibitory factor (il-10) to the epstein-barr virus gene bcrf1
nucleotide sequence analysis of sa-omvv, a visna-related ovine lentivirus: phylogenetic history of lentiviruses
single copy sequences in galago dna resemble a repetitive human retrotransposon-like family. j mol evol 1990, 31:92-100
recombination resulting in unusual features in the polyomavirus genome isolated from a murine tumor cell line
sequence analysis of rice dwarf phytoreovirus genome segments s4, s5, and s6: comparison with the equivalent wound tumor virus segments
ho~tu~ ~: s71 is a phylogenetically distinct human endogenous retroviral element with structural and sequence homology to simian sarcoma virus (ssv).
virology
identification of a novel 65-kda cell surface receptor common to pancreatic polypeptide
molybdenum hydroxylases: the amino acid sequence of chicken hepatic sulfite oxidase
frequency of abnormal human haemoglobins caused by c → t transitions in cpg dinucleotides
evidence for conservation of ferritin sequences among plants and animals and for a transit peptide in soybean
a 32-kda lipocortin from human mononuclear cells appears to be identical with the placental inhibitor of blood coagulation
distinct ferredoxins from rhodobacter capsulatus: complete amino acid sequences and molecular evolution
peptide sequence analysis and molecular cloning reveal two calcium pump isoforms in the human erythrocyte membrane
cloning and characterization of a novel member of the cytochrome-p450 subfamily iva in rat prostate
a directly repeated sequence in the β-globin promoter regulates transcription in murine erythroleukemia cells
isolation and characterization of the alkane-inducible nadph-cytochrome-p-450 oxidoreductase gene from candida tropicalis: identification of invariant residues within similar amino acid sequences of divergent flavoproteins
protein kinase-c inhibitor proteins: purification from sheep brain and sequence similarity to lipocortins and 14-3-3 protein
averill ba: sequence homology between purple acid phosphatases and phosphoprotein phosphatases: are phosphoprotein phosphatases metalloproteins containing oxide-bridged dinuclear metal centers
negative regulation of the human ε-globin gene by transcriptional interference: role of an alu repetitive element
amino acid sequence of chicken calsequestrin deduced from cdna: comparison of calsequestrin and aspartactin
calsequestrin, an intracellular calcium-binding protein of skeletal muscle sarcoplasmic reticulum, is homologous to aspartactin, a putative laminin-binding protein of the extracellular matrix
bovine plasma protein c inhibitor with structural and functional homologous properties to human plasma protein c inhibitor
sequence
of silkworm hemolymph antitrypsin deduced from its cdna nucleotide sequence: confirmation of its homology with serpins. j biochem (tokyo)
human mast cell tryptase: multiple cdnas and genes reveal a multigene serine protease family
howard jc: mhc class ii region encoding proteins related to the multidrug resistance family of transmembrane transporters
merosin, a tissue-specific basement membrane protein, is a laminin-like protein
conservation of a cytoplasmic carboxy-terminal domain of connexin 43, a gap junctional protein, in mammal heart and brain
the arabidopsis thaliana plasma membrane h+-atpase multigene family: genomic sequence and expression of a 3rd isoform. j biol chem
opsin of calliphora peripheral photoreceptors r1-6: homology with drosophila rh1 and posttranslational processing
evolution of rhodopsin supergene family: independent divergence of visual pigments in vertebrates and insects and possibly in mollusks
ct~tpl¢: the general amino acid permease gene of saccharomyces cerevisiae: nucleotide sequence, protein similarity with the other bakers yeast amino acid permeases, and nitrogen catabolite repression
the 70-kda peroxisomal membrane protein is a member of the mdr (p-glycoprotein)-related atp-binding protein superfamily
a new class of lysosomal/vacuolar protein sorting signals. j biol chem
complete amino acid sequence and homologies of human erythrocyte membrane protein band 4.2. proc natl acad sci usa
the primary structure of a halorhodopsin from n. pharaonis: structural, functional and evolutionary implications for bacterial rhodopsins and halorhodopsins
soluble lactose-binding vertebrate lectins: a growing family
the α regulatory subunit of the mitochondrial f1-atpase complex is a heat shock protein.
identification of two highly conserved amino acid sequences among the α-subunits and molecular chaperones
sequence of human syndecan indicates a novel gene family of integral membrane proteoglycans
a protein with homology to the c-terminal
relationships between adenylate cyclase and na+,k+-atpase in rat pancreatic islets
human na+,k+-atpase genes: beta-subunit gene family contains at least one gene and one pseudogene
evolution of the mhc class i genes of a new world primate from ancestral homologues of human non-classical genes
the cdna sequence of mouse pgp-1 and homology to human cd44 cell surface antigen and proteoglycan core/link proteins
isolation of cdna encoding a human sperm membrane protein related to a4 amyloid protein
purification, characterization, and comparison with membrane carbonic anhydrase from human kidney
hypermutability of cpg dinucleotides in the propeptide-encoding sequence of the human albumin gene
dystrophin in electric organ of torpedo californica homologous to that in human muscle
botstein d: homology of a yeast actin-binding protein to signal transduction proteins and myosin-i
the complete sequence of drosophila alpha-spectrin: conservation of structural domains between alpha-spectrins and alpha-actinin
characterization of a fibrillar collagen gene in sponges reveals the early evolutionary appearance of two collagen gene families
the predicted amino acid sequence of α-internexin is that of a novel neuronal intermediate filament protein
olsen br: type xii collagen. a large multidomain molecule with partial homology to type ix collagen. j biol chem 1989
amyloid protein in familial amyloidosis (finnish type) is homologous to gelsolin, an actin-binding protein. biochem biophys res commun
key jl: characterization of a proline-rich cell wall protein gene family of soybean. a comparative analysis.
j biol chem
chicken liver
evolutionary relationships and implications for the regulation of phospholipase-a2 from snake venom to human secreted forms
identification of a locality in snake venom α-neurotoxins with a significant compositional similarity to marine snail α-conotoxins: implications for evolution and structure/activity
amphibian albumins as members of the albumin, alpha-fetoprotein, vitamin-d-binding protein multigene family
organization of the human lipoprotein lipase gene and evolution of the lipase gene family
expression of cloned human reticulocyte 15-lipoxygenase and immunological evidence that 15-lipoxygenases of different cell types are related
identification of a protein alt~
intraspecific evolution of a gene family coding for urinary proteins
conservation between yeast and man of a protein associated with u5 small nuclear ribonucleoprotein
structure and partial amino acid sequence of calf thymus dna topoisomerase-ii: comparison with other type-ii enzymes
oligonucleotide correlations between infector and host genomes hint at evolutionary relationships. nucleic acids res
scolnik pa: carotenoid desaturases from rhodobacter capsulatus and neurospora crassa are structurally and functionally conserved and contain domains homologous to flavoprotein disulfide oxidoreductases
deininger pl: structure and variability of recently inserted alu family members
a novel neutrophil chemoattractant generated during an inflammatory reaction in the rabbit peritoneal cavity in vivo: purification, partial amino acid sequence and structural relationship to interleukin-8. biochem j
the multifunctional 6-methylsalicylic acid synthase gene of penicillium patulum: its gene structure relative to that of other polyketide synthases. eur j biochem 1990
mammalian ubiquitin carrier proteins, but not e2(20k), are related to the 20-kda yeast e2, rad6. biochem biophys res commun
chambers gk: sequence.
structure and evolution of the gene coding for sn-glycerol-3-phosphate dehydrogenase in drosophila melanogaster
the complete sequence of botulinum neurotoxin type-a, and comparison with other clostridial neurotoxins
if: a family of constitutive c/ebp-like dna binding proteins attenuate the il-1α induced, nfκb mediated trans-activation of the angiotensinogen gene acute-phase response element
different forms of ultrabithorax proteins generated by alternative splicing are functionally equivalent
evolution of collagen-iv genes from a 54-base pair exon: a role for introns in gene evolution
evolution of the insulin superfamily
testins are structurally related sertoli cell proteins whose secretion is tightly coupled to the presence of germ cells
ivarie r: a bovine homolog to the human myogenic determination factor myf-5: sequence conservation and 3' processing of transcripts
protein serine threonine phosphatases: an expanding family
coppes zl: divergence of duplicate genes in three sciaenid species (perciformes) from the south coast of uruguay
castaneda m: trypanosoma (mu-~--a~) repetitive dna sequence evolution in 3 geographically distinct isolates. comp biochem physiol
repetitive sequence involvement in the duplication and divergence of mouse lysozyme genes
the structure of a subterminal [...]. nucleic acids res 1990
schoofs l: homologies between amino acid sequences of some vertebrate peptide hormones and peptides isolated from invertebrate sources. comp biochem physiol
bunting s, lazarus ra: platelet glycoprotein iib-iiia protein antagonists from snake venoms: evidence for a family of platelet-aggregation inhibitors
higher plant origins and the phylogeny of green algae
similarity between the transcriptional silencer binding proteins abf1
how big is the universe of exons
worldwide differences in the incidence of type i diabetes are associated with amino acid variation at position 57 of the hla-dq β chain
yeast general transcription factor gfi: sequence requirements for binding to dna and evolutionary conservation.
nucleic acids res
concerted evolution of primate alpha satellite dna. evidence for an ancestral sequence shared by gorilla and human x chromosome alpha satellite
the nucleotide sequence of five ribosomal protein genes from the cyanelles of cyanophora paradoxa: implications concerning the phylogenetic relationship between cyanelles and chloroplasts
wieslander l: a new member of a secretory protein gene family in the dipteran chironomus tentans has a variant repeat structure
the ~r sequence ~: distribution on the x-chromosome and y-chromosome of a large set of closely related sequences, most of which are pseudogenes
baltimore d: cloning of the p50 dna binding subunit of nf-kappa-b: homology to rel and dorsal
l-lactate 2-monooxygenase from mycobacterium smegmatis: cloning, nucleotide sequence, and primary structure homology within an enzyme family
genetic homogeneity between acute and chronic forms of spinal muscular atrophy
genetic variants of bovine β-lactoglobulin: a novel wild-type β-lactoglobulin w and its primary sequence. biol chem hoppe seyler
mitochondrial dna evolution in the obscura species subgroup of drosophila. j mol evol
lovell-badge r: a gene mapping to the sex-determining region of the mouse y chromosome is a member of a novel family of embryonically expressed genes
gentisate 1,2-dioxygenase from pseudomonas: purification, characterization, and comparison of the enzymes from pseudomonas testosteroni and pseudomonas acidovorans
specificities of the peptidyl prolyl cis-trans isomerase activities of cyclophilin and fk-506 binding protein: evidence for the existence of a family of distinct enzymes.
biochemistry
mitochondrial dna evolution in primates: transition rate has been extremely low in the lemur
homeobox containing genes in the nematode caenorhabditis elegans. nucleic acids res
sialic acid esterases of diverse evolutionary origins have serine active sites and essential arginine residues
de wachter r: the 18s ribosomal rna sequence of the sea anemone anemonia sulcata and its evolutionary position among other eukaryotes
inferred from sequence comparisons of a heat shock gene in two nematodes
the l~'/o multigene family of
cloning of cdna coding for the β chain of human complement component c4b-binding protein: sequence homology with the α chain. proc natl acad sci usa 1990
highly conserved core domain and unique n terminus with presumptive regulatory motifs in a human tata factor (tfiid) [letter]
identification and characterization of a novel member of the nerve growth factor/brain-derived neurotrophic factor family
properdin binds to sulfatide [gal(3-so4)β1-1cer] and has a sequence homology with other proteins that bind sulfated glycoconjugates
amino acid sequence of cinnamomin, a new member of the elicitin family, and its comparison to cryptogein and capsicein
soluble and membrane-associated human low-affinity adenosine binding protein (adenotin): properties and homology with mammalian and avian stress proteins. biochemistry
isolation of complementary dnas encoding a cerebellum-enriched nuclear factor-i family that activates transcription from the mouse myelin basic protein promoter
yeast mitochondrial dna polymerase is related to the family a dna polymerases
nucleotide and deduced amino acid sequence of a human cdna (nqo2) corresponding to a second member of the nad(p)h:quinone oxidoreductase gene family: extensive polymorphism at the nqo2 gene locus on chromosome-6.
b/oc.heraistry ult~ sltnlltt'leles a~llolltll enzyme pterin binding sites as demonstrated by a monoeinnal amiidiotypic antibody blundell tl molecular anatomy: phylogenetic relationship* derived from three~limenslonal structure~ of proteins subfamily structure and evolution of the hnmtn 1.1 family of repetitive scquence~. f mot evo 3 selmt~te mltochondrlal dna sequences are contiguous in htlmsa~ genol~ic dna l~t~lit~ within mmmm~lla~ sogl~tol deh~ --the prlmm'y structure of the human liver enzyme heterogeneous modifications of the l14/alo ltrote~a of ibtegleuldn-~t cells are concentrated in a/,ti~hly r~qg~.titlv ~ amino-t~ vaults.ell rebofmcleoprotein structures are msl~ conserved among higher and lower e~tes rnas le~d support to the monophyletic nature of the ~erla lmmunoloslcal ~lmllmtties ~etween cytosolic and partictdate tissue trans#utamilsc. febs lat mans~ti x#tope m~w~zed by a protective m~aodonm antibody is identical to the sta~e-specific embryonic antlgen-l. proc naa acad sa o~ 1990 the murg3 gene of t-brucei contains multiple dom.l.m of extensive editinil and is hofaoin~m~ to a subultit of nadh dehy~ neparm-bindl~ nenrotrophtc x~tor (hbnf) and mk, member's of z new i~mily of homolosous~ developmentally l~ted proteitm pugmattion and strucrmml ~on of pttcentel nad + .mtked 15-hydroxyproma#andm dehydtoffmase ~ the primary structure reveals the enzyme to belon 8 to the short-alcohol l)ehydrogena~ l~mlly. b/ochemistry structores and homologies of carbohydrate ~pho~ system ep~l~[ln, a ~o~a-gmjoclated mudn, is generated by a polymorphlc gene encodin8 splice variants with alternative amino termini a new member of the leucine zipper class of proteins that binds to the hia drct promoter. 
sc/ence attalysi~ of cdna for human ~ ajudgyrin i~dicltes a repeated structure with homology to tissue-differentiation a~td cell-cycle control protein the b subunlt of a rat hetefomeric ocaat-binding transcription factor shoes a striking sequence identity with the yeast hap2 transcription factor homology to mouse s-if and sequence similarity to yeast pt~2 stgucttu'e and evolution of the 02 small nuclear rna multigene family in primates: gene amplification under nat-¢wal selectinn? ident~catinn of an additional member of the proteln.tyrushle-phosp~ family --l*vidence f~ alternative spliclog in the tyrmine phosphzmme domain a 8~le am~o acid difference dis~ishes the human and the rat sequences of statlmaln, a ubiquitous intracehular pho~phoproteln ~ with cell item comp~ison of the seve~le~ gene* of drosop~ffa t~'ff~ end ma4~ muty, an adenine ~ active on g-a mislmirs, has homology to ~t evolution of largesubunit iutna structuge --the ~cation of imvetbe~t d3 dommin amon8 mmjor phyiolpmetic groups discrepancy in diveqlenoe of the mltodtondrlal and nuclear genomes of m sensor/and y~ j mot evot 1~90 adenylate deamll~t~. a mt~flige~e fam~ in p..m~,n, and rats isolmion and structure of ceerol#m, itna,~le hat~ peptmes, from the smm~m, ~ mo~ comp a~a rmm~ i~ vmotocin ge~ of the teleom f.,xott intro~ botany. ~ hot~ ot'l~mization. b~hemioy the adb gene areal share features of sequence structure and nudeast~protected sites. m0/cell bto/1990 the amino-acid sequence of multip/e lectins of the #.corn barnacle m~us-lgo~ and its homology with .animal ]~'tllls. bioclx'm btqobys acta amino add ~.-quence of mtmkey erythrocyte glycophorba mk. its amino acid ~'qu~'~icc ]f][~ a stri~tl~ homology with that of human glycophorin a flsp~r p& drtmophila proliferating cell nuclear antigen. structural and functional homology with its mammalian coonterpart phylogeny of n|trogen*me s~queac~ in ][~mnkla and other nlteogen-fixing ml~m$ vertebrate prot~mlne c~ne evolution.1. 
sequence alignments and gene structure florin l~ a major styl~ matrix polypeptid~ (sp41) is a member of the f~thogenesia-reiated proteins superciass complete amino acid sequence of rat kidney ornithine aminoteat~fet-~e --identity with ijver omithine aminotransferme. l bnxl;em (tokyo) rlbonuclease p --function and variation. j b/o/~bem the primary strum of glycoprotein-m from bovine adrenal medullary granules --sequence similarity with bnmmn serum protein-40,40 and rat sertoli cell giycoprotein-2 compm'ative ~quence/umlysis of m~mmantan f'a~or ix protaotegs the amino acid sequence of the b nman l~ia polymet'a~-h 33-kda subunit hrpb 33 is highly cotmerved among eukaryotes phylogenetic conservation of atylsulfatases --cdna cloturing and expre~ion of hnman aryisul~t~e-b. j b/o/cbem c.oll/l~'vlltion and diversity in fatnllies of coated vemcle adaptlns cllaracterizaflon of petel porcine bone sialoproteins, soca'~ted phosphopgotein ! (sppi, osteopontin), bone siaioprotein, and a 23.kda glycoprotetn ~ demonstration that the 23-kda glycoprotein is derived from the carboxyl terminus of sppi characterization of matteuccin, the 2.2s storag~ prote~ of the ostcich fern -evolutionary iteiatinnshlp to angiosperm seed storage ~ a new mmber of the glutamine-rlch protein gene family is characterized by the absence of internal lgepe~ts and the androgen control of its expression in the subm*ndlbuiar gland of pad 2 novel insect n~ with homology to peptides of the vea'te~ tachykinin family identircation of a novel platelet-derived neutrophli-chcmaotgctic po~ with structural homology to piatelet-factor-4 a novel repeated dna sequoncc located in the intergenic regions of ba~tceial chromosomes. 
nuc2eic.,k:ids res the proianlin storage protellx¢ of cere~ seeds ~ structure and evolution functional analysis of the 3'-terminal part of the balbiani ring gene by hlterspecies sequence comparison dr= mammaban ~yl phosphate symhetase (cp*) --cdna sequence and evolution of the cl m domain of the syrian hamster multifunctional protein cad mammalian dihydroorotase --nudeotide sequence, peptide sequences, and evolution of the imhydroorotsse domain of the multifunctinnal protein cad a receptor for tumor necrosis factor defines an unusual family of cellular and viral proteins the control of flower morphogenesis in a~..ffd~um majusthe protein shows homoinff~ to transcription factors an element of symmetry in ytmst tata-box binding protein transcription factor-lid --consequence of an ancestra/ duplication? c-type natciuretic peptide (cnp): a new member of nateinretic peptide family identified in porcine brain evolution of antioxidant m~: ediol-dependent petoxidm~.s and thiol~ ~umong ptocaryotes towards the evolution of ribozymes alkyl hydroperoxide reductase from sa/mone/ta ~ur/um --sequence and homology to thinredoxin reductase and other fiavoprotein disuliide oxidoreducmses fc: nonuniform evolution of duplicated, developmentally controlled c~azrion genes in a sillumoth the fission yeast cutl + gene regulates spindle pole body duplication and has homolosy to the buddin structural homology b~ween the hnmmn fur gene product mad the sub---like protea~ encoded by ye~t/~x2. nuc~ a¢/ds res 1990 nudeotide sequences and novel steuctut~ features of hnm=. and cimm~ lighter ~# primary stt~t~ and expression of a nuclear-coded subunit of complex-n n~ to protetm specified by the chtoropiast genome. 
b/0chera bnfhys r~ commun a novel gene member of the human giycophorin-a and glycophorin-b genc fatuily -molecular cloning and expression the x-chromosome of monotremes shares a highly conserved region with the eutherlan and marsupial x-o~romosomes despite the absence of x-chromosome ittactt~tion c~lract~tion and or~= nl~tion of dna sequences adjacent to the evidence for a new fmily of evolutionarily conserved homeobox genes elellatltlll and albolabrin purified peptides from viper venoms --homologies with the rgds domain of flbrinogen and yon willebrand pactor measurement of $~tiv~-site homology between potato and l~bbit muscle alpha-glum phosphoryiases through use of a iane~r free energy relationship white 1~ weiss 1~ the neuroflbromatosis typed gene encodes a protein related to gap the dna damage-inducible gcne-dinl of saocbarom3q~ewcet~#.s/ae encodes a regulatory subunit of elbonucleotide reductase and is identical to gnr3 fhlgegprinting of ne~lr-homogeneous dna hgase-i and ligase-h from eh,m~n cells --similarity of their amp-binding domains control of m11na st~mlity in • chnoc~qg.~um, by 3'inverted ltepeats: effects of stem and loop mutations on degradation ofxtmba mlna/n vt~ nuc/e~ ac alternative messenger rna structures of the ciil-gene of bacteriophage ~. determine the rate of its tt'ansbttion initiation alternative mrna structures of the cm genc of bacta~ophage ~ detc:'mine the rate of its translation initiation. j mo/b~0/1989 a model fog iina editing in klnetopiastid mltochondrla --guide rna molecules transcribed from max/circle dna provide the edited information elements and coding sequences. j mol bio11989, 210:417-427 . chang c-y, ~ d-a, mohandas til chung b-c: stt~ctut~e, ~-quence, chromo~maal location, and evolution of the human fercedoxin gene family. 
key: cord-018798-yzxy9ogf authors: jain, pradeep kumar; bhattacharya, ramcharan; kohli, deshika; aminedi, raghavendra; agrawal, pawan kumar title: rnai for resistance against biotic stresses in crop plants date: 2018-07-10 journal: biotechnologies of crop improvement, volume 2 doi: 10.1007/978-3-319-90650-8_4 sha: doc_id: 18798 cord_uid: yzxy9ogf rna interference (rnai)-based gene silencing has become one of the most successful strategies not only for identifying gene function but also for improving agronomic traits of crops, by silencing genes of pathogens and pests as well as plant genes that govern a desired trait. the conserved nature of the rnai pathway across different organisms increases its applicability in various basic and applied fields. here we attempt to summarize the knowledge generated on the fundamental mechanisms of rnai over the years, with emphasis on insects and plant-parasitic nematodes (ppns). this chapter also reviews the rich history of rnai research, gene regulation by small rnas across different organisms, and the application potential of rnai for generating transgenic plants resistant to major pests. however, some limitations restrict wider application of this technology to its full potential. further refinement of this technology to resolve these shortcomings constitutes one of the thrust areas in present rnai research. nevertheless, its application, especially in breeding agricultural crops resistant to biotic stresses, will certainly offer possible solutions for some breeding objectives that are otherwise unattainable. rna interference (rnai) is an invaluable technology for unraveling gene function in the area of functional genomics. it has been utilized in basic research ranging from functional studies to gene knockdown in plants and vertebrates and to suppression of cancer and viral diseases in medicine.
moreover, from an application point of view, it is being used extensively for trait modification by selective inhibition of gene expression across organisms. in agriculture, rnai has been extensively employed, particularly for imparting resistance against biotic stresses including insects, bacteria, nematodes, fungal infection, and viruses (tan and yin 2004; yanagihara et al. 2006; good and stach 2011; banerjee et al. 2017; majumdar et al. 2017; zhang et al. 2017). this chapter focuses on how rnai has been used in managing various biotic stresses which constitute serious impediments to crop productivity. damage due to insects, fungi, parasitic weeds, and plant-parasitic nematodes is a major biotic constraint causing significant yield losses in agriculture year-round. the basic concept involves a double-stranded rna (dsrna) molecule which post-transcriptionally silences the gene with complementary sequence. the rnai phenomenon was first described in a free-living nematode, caenorhabditis elegans (fire et al. 1998). the authors coined the term "rnai" to describe the effective silencing of gene expression by exogenously supplied sense and antisense rnas in this model nematode. this phenomenon, conserved among eukaryotes, was described as post-transcriptional gene silencing (ptgs) (carthew and sontheimer 2009; berezikov 2011). historically, the roots of this development can be traced back to 1990, when the chsa gene was overexpressed in transgenic petunia plants and silencing of both the endogenous gene and the transgene of chalcone synthase was observed (napoli et al. 1990). loss of endogenous as well as transgene-derived mrnas was described as co-suppression, a term formulated by napoli. the scientific community soon recognized the importance of this technology, and it has grown phenomenally since then.
in fungi, this mechanism of ptgs is known as quelling (agrawal et al. 2003). in nature, viruses mediate ptgs in plants, and the effect is amplified in the cytoplasm or in the nucleus. the major small noncoding rnas (ncrnas) include micrornas (mirnas), small interfering rnas (sirnas), and piwi-interacting rnas (pirnas), which are all involved in downregulation of gene expression (aalto and pasquinelli 2012). each class of small rna is unique in its biogenesis and mechanism of action, but there are a few similarities too. both mirnas and sirnas are processed from larger dsrnas through cleavage by dicer (a ribonuclease iii enzyme). both associate with argonaute (ago) proteins (ketting 2011) to form the rna-induced silencing complex (risc). risc is essentially an argonaute protein bound to a single strand of noncoding rna. varied ribonucleoprotein complexes arise because several ncrnas and argonautes are involved in the formation of risc (darrington et al. 2017). rnai-mediated gene silencing occurs in three stages (siomi and siomi 2009). the first involves processing of long dsrna into small dsrna by ribonuclease iii; in the second, unwinding of these small rnas yields a guide strand, which is loaded into the risc, whereas the other strand, known as the passenger strand, is degraded. finally, the risc, directed by the guide strand, locates mrnas containing sequences complementary to the guide, binds to these sequences, and either degrades the mrna or blocks its translation (winter et al. 2009). the mechanism of rnai is emerging in all its complexity, but with increasing clarity, as more and more players involved in the interference are identified and characterized. the involvement of sirna molecules as important intermediates of the rnai process became evident through independent investigations carried out by researchers around the world.
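the three stages described above (dicing, guide-strand selection, and target recognition by risc) can be illustrated with a toy sequence model. this is only a sketch under simplifying assumptions that are not taken from the chapter: fragments are cut at fixed 21-nt intervals without overhangs, the antisense strand is always chosen as the guide, and only perfect complementarity counts as a target site. all function names are hypothetical.

```python
# Toy model of the three stages of siRNA-mediated silencing.
# Assumptions (illustrative only): fixed 21-nt cuts, antisense strand is
# always the guide, and only perfect complementarity is recognized.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense_strand: str, size: int = 21) -> list[str]:
    """Stage 1: cut the sense strand of a long dsRNA into consecutive
    fragments of `size` nt; a shorter trailing remainder is dropped."""
    return [sense_strand[i:i + size]
            for i in range(0, len(sense_strand) - size + 1, size)]


def guide_strands(fragments: list[str]) -> list[str]:
    """Stage 2: keep the antisense strand of each duplex as the guide;
    the sense (passenger) strand is discarded."""
    return [reverse_complement(f) for f in fragments]


def target_sites(mrna: str, guides: list[str]) -> list[int]:
    """Stage 3: return sorted 0-based mRNA positions that are perfectly
    complementary to at least one guide strand."""
    sites = set()
    for guide in guides:
        site = reverse_complement(guide)  # sequence the guide pairs with
        start = mrna.find(site)
        while start != -1:
            sites.add(start)
            start = mrna.find(site, start + 1)
    return sorted(sites)


if __name__ == "__main__":
    # Hypothetical 42-nt trigger embedded in a hypothetical mRNA.
    trigger = "AUGGCUACGUUAGCCGAUAGC" + "UUACGGAUCCGAUUGCAAGCU"
    guides = guide_strands(dice(trigger))
    print(target_sites("GGG" + trigger + "AAA", guides))
```

real dicer products carry 2-nt 3' overhangs and strand selection depends on duplex thermodynamics; the model ignores both and only shows how a long trigger yields many guides that each address a different site on the same transcript.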
the accumulation of sirnas was first reported by hamilton and baulcombe (1999) in tomato lines transformed with 1-aminocyclopropane-1-carboxylate oxidase (aco) and later in drosophila syncytial blastoderm embryos (tuschl et al. 1999). two other independent studies showed experimentally that 21-23 nucleotide small rnas are intermediates in the degradation of mrna (zamore et al. 2000; elbashir et al. 2001). but how these small rna molecules are excised from their precursor was yet to be discovered. as rnase iii enzymes had already been recognized as dsrna nucleases, rnase iii domain-containing proteins were examined as candidate factors in sirna biogenesis. subsequent experimental studies revealed the involvement of rna-processing enzymes in cleaving dsrnas into sirna molecules. one of the crucial enzymes, dicer, was identified in drosophila by searching its genome for proteins with rnase iii endonuclease activity. in another study, the dicer protein of c. elegans (a bidentate nuclease) was characterized, revealing its functional role in small rna regulatory pathways (ketting et al. 2001). it was also deduced to be the ortholog of the drosophila dcr-1 protein. in the same study, ketting et al. (2001) showed that atp is required for regulating the rate of sirna synthesis. in yet another experiment, a 5000-fold reduction in atp levels in drosophila decreased the rate of sirna production (nykanen et al. 2001). it is now believed that dicer acts as a complex of proteins with domains for dsrna binding at its c terminus which are separable from motifs like helicase and paz. it was experimentally found to co-localize with an endoplasmic reticulum protein, calreticulin (caudy et al. 2002). however, the role of atp in the biogenesis of sirna remains unclear because it varies among the dicer proteins of different organisms.
a requirement for atpase activity in sirna production was shown for drosophila dicer-2 and c. elegans dcr-1 (tomari and zamore 2005), in contrast to human dicer, in which an atpase-defective mutant showed normal processing (carthew and sontheimer 2009). comprehensive biochemical, molecular, genetic, and structural studies revealed two main domains, paz and rnase iii, that perform a crucial role in excising the sirnas (zhang et al. 2004; macrae et al. 2006). once dicer has cleaved the dsrna, the resulting sirnas enter the risc. the double-stranded sirnas act as a template for the risc to recognize the complementary mrna, aided by argonaute proteins. argonaute proteins are required for risc assembly and have been biochemically characterized in drosophila. amplification of sirnas has been reported in nematodes, fungi, plants, and amoebae (dykxhoorn et al. 2003). on the basis of biochemical studies, rna-dependent rna polymerase (rdrp) is proposed to be involved in amplifying the sirna molecules (lipardi et al. 2001; sijen et al. 2001). sijen et al. (2001) demonstrated the fundamental role of the rrf-1 gene, which has sequence homology to rdrp, in the production of secondary sirnas in c. elegans. this study introduced the concept of a transitive rnai pathway induced by secondary sirnas. thus, a catalytic nature of rnai was proposed. the direct loss in crop productivity due to damage by insect pests and the input cost of agrochemical-based protection amount to billions of dollars every year worldwide. despite the alarming environmental hazard posed by residual toxicity of insecticides in the food chain, the consumption of insecticides has kept increasing. this is primarily due to resistance development in insect-pest populations and lack of awareness among the farming community. worldwide consumption of insecticides increases by almost 30% every 4 years.
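to put the quoted growth figure in perspective, the short calculation below converts "almost 30% every 4 years" into an equivalent yearly rate and a doubling time. it is a back-of-the-envelope sketch that assumes constant compound growth, which the chapter does not state; the function names are hypothetical.

```python
import math


def annual_rate(growth: float, years: float) -> float:
    """Constant yearly growth rate equivalent to `growth` over `years`."""
    return (1.0 + growth) ** (1.0 / years) - 1.0


def doubling_time(growth: float, years: float) -> float:
    """Years needed for consumption to double at that constant rate."""
    return years * math.log(2.0) / math.log(1.0 + growth)


if __name__ == "__main__":
    # 30% growth per 4-year period, assumed to compound steadily.
    print(f"yearly rate:   {annual_rate(0.30, 4):.3%}")
    print(f"doubling time: {doubling_time(0.30, 4):.1f} years")
```

under this assumption the figure corresponds to a little under 7% growth per year, so consumption would double in somewhat more than a decade.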
therefore, insect-pest management, preferably through an integrative approach and without indiscriminate use of insecticides, has become a highly sought-after area in research planning worldwide. millions of dollars have been granted for research on sustainable and low-cost alternative pest control strategies in the five most important agricultural crops. development of resistant cultivars seems to be the most acclaimed alternative for minimizing the application of insecticides. unfortunately, for most of the major crop-insect combinations, either such resistant cultivars are not available or the resistance has broken down. further insight into such examples reveals that the lack of a resistance source that can be exploited through classical breeding or transgenesis has been the major constraint. accessing unrelated gene pools through development of transgenics has emerged as the most promising avenue for overcoming this bottleneck. the success of bacillus thuringiensis (bt) toxin-mediated protection of a large number of crops has been celebrated widely and in fact demonstrated for the first time the potential of biotechnological means in developing genetic resistance. however, applicability of bt-mediated protection is limited, as many insect pests are not affected by bt toxin, and this technology has also faced the second-generation challenge of some major insect species developing resistance to bt (tabashnik 2008; tabashnik et al. 2008). it has been realized that the lack of useful insecticidal transgenes is the major limitation in transgenic-based engineering of genetic resistance. in contrast, through rnai, any important gene can be precisely targeted to elicit lethality in the insect species. use of rnai has rapidly progressed for gene function analysis in various insect orders, including diptera (lum et al. 2003; dietzl et al. 2007), lepidoptera (tian et al. 2009; terenius et al. 2011), coleoptera (baum et al. 2007; zhu et al. 2011; bolognesi et al.
2012), and hymenoptera (nunes and simoes 2009; meer and choi 2013; zhao and chen 2013). as in plants, rnai in insects is primarily involved in antiviral defense as a part of innate immunity. however, a number of studies indicate that several branches of rnai are involved in endogenous gene regulation in addition to silencing of genetic elements of pathogen invaders and transposons (van rij and berezikov 2009). gene silencing through rnai is systemic and transitive as originally described in c. elegans. a host-derived rna-dependent rna polymerase (rdrp) amplifies the rnai after elicitation by dsrna. in contrast to nematodes, in insects there is no definite proof of the presence of rdrp. in the absence of rdrp-mediated amplification of dsrna in insects, the silencing is expected to be more localized. therefore, elicitation of effective silencing will require continuous delivery of the dsrna directly to the target cells and tissues. the administered dsrna enters the sirna pathway in insect cells, in which a complex consisting of the rnase iii enzyme (dicer-2) and trbp cuts the dsrna into small 21-23 bp sirnas. the risc bound to ago recognizes the guide strands of the sirnas. this complex then binds to complementary sequences of target rnas, which are eventually degraded. two types of rnai pathway are known to occur in insects: cell-autonomous and non-cell-autonomous rnai. cell-autonomous rnai is limited to the cells in which the dsrna is administered or delivered. in contrast, when the silencing occurs in cells different from the cells delivered with or producing the dsrna, it is called non-cell-autonomous rnai. depending on how the dsrna is acquired by the cell, non-cell-autonomous rnai can be grouped into two kinds: environmental rnai and systemic rnai. in environmental rnai, dsrna is absorbed by a cell from the surrounding environment. therefore, this is seen in unicellular organisms or any cell lines when administered with dsrna.
environmental rnai does not necessarily result in systemic spread of the response. in multicellular organisms, the silencing signal is transported from one cell to another by systemic rnai. in the case of transgenic host-mediated delivery, the dsrna is delivered into the gut lumen of insects. to elicit effective rnai, dsrna must be taken up by gut cells from the gut lumen, a process known as environmental rnai. if the transcripts of target genes are predominantly expressed in tissues outside the gut, systemic rnai has to occur to spread the silencing signal. however, there is no definitive study assessing systemic rnai in insects. plant-parasitic nematodes (ppns) are grouped on the basis of lifestyle, i.e., sedentary, including root-knot nematodes (rkns) and cyst nematodes, and migratory, including root-lesion nematodes. sedentary endoparasites interact with the host through secretions which are vital cues for plant-nematode interactions. these secretory proteins are thus of major interest as targets for modulating the interaction. rnai has been extensively used in functional genomics of c. elegans and has opened up the possibility of deciphering the function of uncharacterized genes in parasitic nematodes. recent discoveries on the role of different components of rnai in parasitic nematodes have increased our understanding of the rnai mechanism. there are numerous reports on managing ppns using rnai. in nematodes, systemic rnai can be observed, resulting in a gene knockout that spreads throughout the organism. this is because rna-dependent rna polymerase (rdrp) is present in nematodes; it interacts with risc and leads to production of new dsrnas, which are acted upon by dicer enzymes to produce new sirnas (secondary sirnas) in a well-coordinated amplification reaction.
therefore, the effect of dsrna persists over development and can also be exported to neighboring cells, thereby leading to a silencing effect all over the organism (daniel and john 2008). c. elegans displays systemic rnai, wherein the dsrna/sirnas entering from the environment can spread from one cell to another. studies on identification of effectors of systemic rnai revealed the presence of the protein sid-1 in c. elegans (winston et al. 2002; feinberg and hunter 2003). interestingly, m. incognita and m. hapla, along with other parasitic nematodes, were found to lack sid-1 and other related proteins with a key role in dsrna uptake and spread, despite exhibiting successful rnai. several detailed comparative studies have postulated the presence, in different ppns and animal-parasitic nematodes, of rnai components that were reported in c. elegans (lendner et al. 2008; dalzell et al. 2011; haegeman et al. 2011). all these studies found rare proteins taking part in the rnai pathway. orthologs of seventy-seven c. elegans effectors were searched for in 13 nematode species, ancylostoma caninum, oesophagostomum dentatum, ascaris suum, brugia malayi, c. brenneri, c. briggsae, c. japonica, c. remanei, haemonchus contortus, meloidogyne hapla, m. incognita, pristionchus pacificus, and trichinella spiralis, using reciprocal blast followed by domain structure verification (maule et al. 2011). it was concluded that effector deficiencies cannot, in any way, be associated with reduced susceptibility in parasitic nematodes. surprisingly, minimum diversity was observed among these parasitic nematodes in most of the orthologous genes belonging to different functional groups (table 4.1). thus it was evident that all the species possess varied proteins from across the rnai spectrum, each with alternative proteins which are yet to be fully identified and characterized. the efficacy of gene silencing substantially depends on the method of dsrna uptake.
in the absence of systemic rnai, gene silencing will be limited to the cells that take up the dsrna. therefore, an appropriate delivery system is pivotal (terenius et al. 2011). delivery methods of dsrna that have been used for successful rnai in insects and nematodes include microinjection, feeding on artificial diet (table 4.2), and host-mediated delivery through transgenic plants (fig. 4.1). each of these methods has its own advantages and limitations. microinjection involves injection of dsrna or sirna directly into the body of an organism and has been demonstrated as one of the most successful delivery methods for rnai to validate gene functions (ober and jockusch 2006). in this method, dsrna is produced by in vitro transcription using t7 or sp6 promoter sequences. it has been employed successfully for suppressing genes in both insects and nematodes. in d. melanogaster, microinjection has been successfully used for delivering dsrnas for two genes, viz., frizzled and frizzled2, into embryos. the silencing resulted in defects in embryonic patterning similar to loss of wingless (wg) function. this was the first study proving the function of frizzled through dsrna microinjection in an insect (kennerdell and carthew 1998). since then, microinjection-based delivery has been used in several insect species. a comprehensive list of hemipteran insects subjected to microinjection for studying rnai is presented in table 4.3. direct injection of dsrna into the insect body leads to higher efficiency of gene expression attenuation compared to other methods. nevertheless, the microinjection delivery method has several limitations. in vitro synthesis of dsrna is skill intensive and costly. additionally, recovery of insects, especially smaller ones, from the aftershock of microinjection is relatively low. the significant aftershock is due to damage to the cuticle leading to adverse immune responses in the insect (roxstrom-lindquist et al.
2004). therefore, microinjection is rarely used in functional analysis of large numbers of genes from the point of view of insect-pest control. it is evident from table 4.3 that microinjection-mediated delivery has been carried out mostly in hemipteran insects. in nematodes, after dsrnas are injected into the worms, progeny are counted and scored for mutant phenotypes. a good rnai effect is usually observed 24 h after injection (fire et al. 1998). in c. elegans, dsrnas of genes like unc-22, unc-54, fem-1, and hlh-1 were injected into adult hermaphrodites, and the interference effect was observed. it was also proposed that in an antisense mechanism, interference of the endogenous gene is due to hybridization between the injected rna and endogenous mrna (fire et al. 1998). it is a classical technique, and dsrnas against different target mrnas can be injected simultaneously. however, microinjection has not been very successful in plant-parasitic nematodes in general and particularly in m. incognita. this is because of the small size of the infective stages and their inability to ingest fluid without host plant infection (banerjee et al. 2017). in this process, although a range of dsrna concentrations can be used, the success rate relies upon ample uptake or absorption by the worms (hull and timmons 2004). dsrna delivery through artificial diet has been the most popular method for delivering dsrna into the insect gut, especially for relatively small sap-sucking insects such as hemipterans. several insect species of different taxa have been studied for rnai by administration of dsrna through artificial diet, as presented in table 4.3. araujo et al. (2006) fed the blood-sucking rhodnius prolixus with an artificial diet containing dsrna of the nitrophorin2 (np2) gene and found that the saliva of control r. prolixus prolonged plasma coagulation by approximately fourfold compared with the saliva of np2-knockdown r. prolixus. feeding a.
pisum with an artificial diet supplemented with dsrna of the a. pisum aquaporin 1 (apaqp1) gene attenuated expression of the target gene, which resulted in an increased osmotic pressure of the hemolymph in this insect (shakesby et al. 2009). in nematodes, feeding involves ingestion of bacteria expressing dsrna of the gene targeted by rnai. timmons et al. (2001) engineered rnase iii-deficient bacteria that produce high levels of dsrna segments of a specific gene. c. elegans feeding on these engineered bacteria showed an rnai effect leading to loss-of-function phenotypes for the target genes. one of the advantages of this method is that it can be used for stage-specific rnai experiments, as worms of any stage can be fed with dsrna (kamath et al. 2001; ahringer 2006). the feeding method has some major advantages over other methods of delivering dsrna: (i) it is easy to perform; (ii) feeding dsrna is less traumatic to nymphs and juveniles than injection, so they remain healthier and their mortality is comparatively lower (shakesby et al. 2009); and (iii) perhaps most significantly, delivering dsrnas to early stages of insects and nematodes is more convenient by this method than by microinjection, which needs special equipment and often causes a high rate of mortality due to artefacts. however, some challenges need to be addressed, viz., the low efficiency of this method and the requirement for large quantities of dsrna. moreover, a detailed study of the mechanism by which ingested dsrna inhibits gene expression is yet to be carried out. the soaking method involves immersing nematodes in concentrated dsrna solution and subsequently scoring the worms or their progeny for phenotypes. rnai by soaking is useful for treating a moderately large number of animals (e.g., 10-100). rnai through the soaking method was first employed in c.
elegans as a tool for converting its genome sequence information into functional information (tabara et al. 1998) . apart from c. elegans, silencing of genes in plant-parasitic nematodes (ppn) through soaking technique has been popularly used but with minor modifications. other techniques like feeding and microinjection possess some limitations with respect to ppns. in microinjection, successful recovery of injected juveniles is difficult and ppn juveniles do not take up dsrna orally easily from the solutions. this was overcome by urwin et al. (2002) by inducing oral uptake of dsrna using octopamine, a neuroactive compound by cyst nematodes heterodera glycines and globodera pallida. this marked a revolution in imparting rnai-mediated resistance in cyst and root-knot parasitic nematodes. since then many reports on successfully governing the nematode growth utilizing rnai approach came into the picture. in later studies, compounds like resorcinol and serotonin were used for successful uptake of dsrna in m. incognita (rosso et al. 2005; huang et al. 2006) . apart from neuroactive compounds, fluorescein isothiocyanate (fitc) as a marker for observing dsrna uptake and as a mean of selecting affected individuals was used in many studies (urwin et al. 2002; rosso et al. 2005) . intestinal gene cysteine proteinase was suppressed through the soaking method in g. pallida, h. glycines, and m. incognita (nakai and horton 1999; schmidt et al. 1999) . gene silencing by rnai soaking has led to various abnormalities in processes like nematode hatching and molting and even resulted in reduced reproduction rates. many genes, namely, chitin synthase, neuropeptides, msp, c-type lectin, and aminopeptidases, were targeted (kennerdell and carthew 1998; schmidt et al. 1999; dernburg and karpen 2002; ischizuka et al. 2002) . but the efficiency and duration of the silencing effect were assessed for m. incognita calreticulin (mi-crt) and polygalacturonase (mi-pg-1) (rosso et al. 2005) . 
other genes targeted by this approach are cellulases, pectate lyase, chorismate mutase, and glutathione-s transferase (anandalakshmi et al. 1998; cogoni and macino 2000; hammond et al. 2001; matzke et al. 2001; carmell et al. 2002) . however, the silencing acquired by soaking in dsrna solutions is often transient as duration of soaking and the concentration of dsrnas affect the rnai mechanism (banerjee et al. 2017 ). another alternative method of dsrna delivery is through host-delivered rnai (hd-rnai) where gene is silenced in target organism by the host plant. since there is no synthesis of any gene product in hd-rnai, it is likely to address the biosafety concerns more favorably. genetic transformations of crop plants for expressing dsrna homologous to important insect gene entail several advantages. it delivers the dsrna to the target insect pest in a continuous fashion that leads to elicitation of rnai throughout the life cycle of the insects. host-mediated delivery of dsrna was first demonstrated against two important agricultural pests, cotton bollworm, helicoverpa armigera, and western corn rootworm, diabrotica virgifera (baum et al. 2007; mao et al. 2007 ). transgenic rice was developed by delivering dsrna targeting hexose transporter gene niht1, carboxypeptidase gene nicar, and the trypsin-like serine protease gene nitry of nilaparvata lugens. the study revealed reduced transcript levels of these three targeted genes in the insects that fed on these transgenic rice plants. however, insect lethality was not reported (zha et al. 2011 ). subsequently, several attempts have been made for attenuating key genes of the insects through transgenic host-mediated delivery of dsrna as presented in table 4 .4. the gene construct for expression of the dsrna essentially consists of 200-500 nucleotide tandem repeats of the target gene sequence under the control of a constitutive promoter. such strategy also offers the scope of tissue specific expression of the dsrna. 
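as a rough illustration of such a construct, the python sketch below assembles a hairpin insert from a target fragment: the fragment in sense orientation, a spacer, and the reverse complement of the fragment, so that the transcript can fold back on itself into dsrna. the sense-spacer-antisense arrangement follows the description given later in this chapter for nematode constructs. the function names, the toy sequences, and the use of the 200-500 nucleotide window as a hard check are assumptions made purely for illustration.

```python
# illustrative sketch only: names and toy sequences are hypothetical
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def build_hairpin_insert(fragment: str, spacer: str,
                         min_len: int = 200, max_len: int = 500) -> str:
    """Join sense fragment, spacer and antisense fragment into one insert."""
    fragment = fragment.upper()
    if set(fragment) - set("ACGT"):
        raise ValueError("fragment must contain only A, C, G, T")
    if not min_len <= len(fragment) <= max_len:
        raise ValueError(
            f"fragment length {len(fragment)} is outside {min_len}-{max_len} nt"
        )
    # sense arm + spacer (or intron) + antisense arm
    return fragment + spacer.upper() + reverse_complement(fragment)


if __name__ == "__main__":
    toy_fragment = "ATGGCTAACCGT" * 20          # 240 nt placeholder, not a real gene
    toy_spacer = "GTAAGT" + "A" * 50 + "CAG"    # placeholder for an intron or spacer
    insert = build_hairpin_insert(toy_fragment, toy_spacer)
    print(len(insert))
```

the promoter and terminator are deliberately left out of the sketch, since their choice (constitutive or tissue-specific) is a separate design decision.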
for example, for targeting phloem-feeding insect pests, phloem-specific expression of the dsrna and its transport in phloem sieve elements would be more desirable. however, several attempts in this direction clearly indicated that the effective level of protection depends on choosing suitable target genes, in addition to the desired level of expression and delivery of intact dsrna to the infesting insect pests (price and gatehouse 2008). further understanding of the uptake process and elicitation of rnai by dsrna in insects will facilitate tailoring the gene expression cassette of the dsrna in order to achieve effective protection. mao et al. (2007) used an rnai-mediated approach to reduce the insect's ability to cope with xenobiotic compounds, for example, gossypol. transgenic cotton plants expressing a hairpin dsrna targeting the gossypol-inducible p450 gene cyp6ae14 showed enhanced tolerance to the bollworm (mao et al. 2011), but were not lethal to the larvae. interestingly, when a cysteine proteinase, which is thought to damage the larval peritrophic matrix and so lead to higher accumulation of gossypol in the midgut, was co-delivered, the tolerance was further enhanced (mao et al. 2013). a similar strategy may be applicable for restoring insecticide sensitivity among resistant insect species (bautista et al. 2009; tang et al. 2012; figueira-mansur et al. 2013). host-mediated rnai for controlling insect pests has been considered particularly important for phloem-sucking hemipteran insect pests, viz., aphids. in the green peach aphid, plant-mediated rnai of several insect-specific target genes, such as those encoding the salivary proteins mpc002, mppinto1, and mppinto2 and the gut-specific gene rack-1, reduced fecundity (table 4.3). in a similar study, stronger aphicidal activity of a hairpin rna targeting v-atpase e or the tubulin folding cofactor d (tbcd) was demonstrated (guo et al. 2014).
rnai-mediated expression attenuation of a serine protease gene mysp in the green peach aphid, myzus persicae, led to a remarkable decrease in their fecundity and parthenogeneticity (bhatia et al. 2012 ). these studies on host-mediated delivery of dsrna and elicitation of rnai in infesting aphids demonstrated potential of rnai approach for developing genetic resistance against aphids. mao and zeng (2014) reported reduced attack by aphids on transgenic tobacco plants expressing dsrna against the gap gene hunchback, and reproduction rate of aphids was also retarded. interestingly, aphid nymphs parthenogenetically born from mothers reared on transgenic plants expressing dsrna continued to show downregulation of the target gene even when transferred on normal plants. an assessment of rnai effect over three generations of m. persicae revealed 60% reduction in aphid reproduction levels in transgenic arabidopsis plants expressing dsmpc002 compared to 40% decline on transgenics expressing dsrack1 and dsmppinto2. such transgenerational rnai was found to last over seven generations in sitobion avenae reared on transgenic barley plants expressing shp-dsrna (abdellatef et al. 2015) . such parental transmission of rnai effect adds to potential of the strategy. rnai mechanism partly occurs in the host itself and partly in nematodes feeding on the transgenic host plant expressing dsrna for the target gene. the plant rnai machinery produces sirnas which are ingested by nematodes feeding upon these plants through stylet (li et al. 2011) . by far hd-rnai is the most successful methodology for developing resistance against nematodes in important crops. this technique exploits the capability of ppns of ingesting macromolecules from the host plants. specifically, the method involves producing dsrna construct and developing transformed plants by agrobacterium-mediated transformation. 
for generating dsrna, a part of the target gene is cloned in sense and antisense orientation separated by an intron or spacer region and expressed under a constitutive or tissue-specific promoter. majority of researchers have adopted this time-consuming methodology and have successfully developed transgenics resistant against nematodes. another new approach with rapid screening system has been developed involving hairy root method for transformation of crops like soybean, tomato, and sugar beet. genes involved in various vital processes are mostly targeted by this approach being categorized into effector genes (most targeted), house-keeping genes, developmental genes, and genes associated with mrna metabolism. two genes encoding integrase and splicing factor were suppressed in m. incognita using host-delivered rnai. it was the first report eliciting rnai in m. incognita by developing transgenic tobacco lines (yadav et al. 2006 ). the lethality of these genes as rnai targets was further reconfirmed by kumar et al. (2017) in arabidopsis by utilizing this approach against m. incognita. effective silencing of 16d10 effector genes leads to 63-90% reduction in the infectivity of m. incognita in arabidopsis (huang et al. 2006 ). since 16d10 is highly conserved in meloidogyne species, resistance against three other major species was also developed (li et al. 2011) . m. chitwoodi also showed a reduction in the number of nematodes and eggs on silencing 16d10l gene via hd-rnai approach in transgenic arabidopsis and potato plants (dinh et al. 2014a, b) . cyst nematodes also exhibited gene suppression by this technique successfully. the suppression of four parasitism genes, ubiquitin-like (4g06), cellulose-binding protein (3b05), skp1-like (8h07), and zinc finger protein (10a06), in heterodera schachtii resulted in the reduction of females in rnai transgenic arabidopsis lines (sindhu et al. 2009 ). silencing of esophageal proteins in h. 
glycines leads to the reduction in reproduction (bakhetia et al. 2007 ). in another study, successful suppression of major sperm protein of h. glycines resulted in 68% decrease in eggs per gram root tissue when infected on transgenic soybean plants (steeves et al. 2006) . transgenic tobacco lines expressing dsrnas of two neuropeptides, flp-14 and flp-18, showed 50-80% decline in the infection of m. incognita (papolu et al. 2013) . other genes silenced using this methodology are mj-tisll, rpn7, tyrosine phosphotase, mitochondria stress 70 protein precursor and neuropepetides against meloidogyne spp(s) (hamann et al. 1993; lindbo et al. 1993; depicker and montagu 1997; pasquinelli 2002; lim et al. 2003; valdes et al. 2003) . host-mediated rnai strategy is more successful in root-knot (rkn) nematodes as compared to cyst nematodes (cn) owing to factors like more rnai sensitivity and larger size exclusion limit of rkns than in cns (li et al. 2011) . host-delivered rnai appears to be the most successful technique in controlling nematode infection. identification of appropriate target genes based on preliminary diet-based bioassay and ensuring adequate in planta expression of the dsrna in the transgenic host are pivotal requirements for effective host-mediated rnai. however, further understanding of the mechanisms on dsrna uptake by insect and nematodes will facilitate the tailoring of dsrna expression in hd-rnai. the dsrna uptake mechanism in insects is known to be achieved by either of the two pathways, viz., a protein-mediated pathway and via endocytic pathway. the major component of protein-mediated pathway is a multi-pass transmembrane protein known as systemic rna interference deficient-1 (sid-1) which exports the small interfering rnas across neighboring cells (bansal and michel 2013) . the second pathway is receptor-mediated pathway. in case of c. elegans, the endocytic pathway involves a sid-2 gene localized in intestinal cells. 
it encodes a membrane protein and is thought to import dsrna from the intestinal lumen, which is then exported to other cells with the help of sid-1 channels (winston et al. 2007; mcewan et al. 2012). hence, sid-1 and sid-2 proteins must work in conjunction to achieve environmental rnai. sid-1 genes have been reported to be evolutionarily conserved among insect orders, but the sid-2 gene is absent in insects. tribolium, which carries sid-1-like proteins, is considered the model insect for studying systemic rnai. interestingly, however, the sid-1 gene of tribolium was found to be orthologous to the tag-130 gene of c. elegans and not to ce-sid-1, and tag-130 has not been reported to be associated with systemic rnai in nematodes (tomoyasu et al. 2008). the presence of sid-1-like channel proteins varies among different orders of insects. the involvement of sid-1-like channel proteins in dsrna uptake has been reported in the brown plant hopper [bph, nilaparvata lugens (xu et al. 2013)], the colorado potato beetle [cpb, leptinotarsa decemlineata (cappelle et al. 2016)], and the red flour beetle tribolium castaneum (tomoyasu et al. 2008). in 2016, genes involved in the rnai pathway in insects were identified and classified; the study revealed the absence of sid-1/tag-130 orthologs in the order diptera (dowling et al. 2016). based on a study by ulvila et al. (2006), it was suggested that in drosophila melanogaster dsrna uptake is mediated via the endocytic pathway along with pattern recognition receptors (prrs). that study reported more than 90% reduction in the uptake of double-stranded rna on silencing of these two receptors by rnai technology. most of the studies examining dsrna uptake so far have focused on either the endocytic pathway or the sid-1-like dependent system. however, a clear understanding of the roles of these pathways in dsrna uptake across insect species is still lacking.
nevertheless, insects belonging to another order have been reported to have both the sid-1-like channel proteins and receptor-mediated endocytosis pathways playing a role in dsrna uptake (cappelle et al. 2016). however, the dsrna uptake mechanism in worms is quite different. the components involved in dsrna uptake have been well studied in c. elegans, and the presence of sid-1 and sid-2 genes along with other components like rsd-2, rsd-3, and rsd-6 has been well documented in the c. elegans genome. surprisingly, one study found that these proteins were not evolutionarily conserved. the dataset recognizes sid-1 orthologs in only two parasitic nematodes, viz., haemonchus contortus and oesophagostomum dentatum. the sid-2 protein was not found in other nematode species. intriguingly, plant-parasitic nematodes such as meloidogyne and globodera spp., despite the absence of sid-1 and sid-2 genes, exhibit systemic rnai when subjected to silencing technology, indicating the presence of a receptor-mediated endocytic process for dsrna uptake similar to that reported in insects. though a lot of information has been generated over the past few years, a clear understanding of the dsrna uptake mechanism(s) in worms is still elusive. other than insects and nematodes, there are agricultural pests belonging to the phylum arthropoda that affect crop productivity worldwide, and rnai-based strategies to control these pests have shown some success. these pests include fire ants, mites, locusts (order orthoptera), and many more. systemic rnai has already been demonstrated in these pests via microinjection. feeding worker ants of solenopsis invicta with 1000 ppm dsrna targeting the pban/pyrokinin gene increased the mortality rate of the fourth instar larvae. a direct toxic effect was also observed even when the dsrna concentration was reduced to 200 ppm (zhao and chen 2013).
in spider mite, gene silencing and increased mortality were observed when 160 ppm of dsrna, targeting several genes, was delivered via a permeated leaf disc assay (kwon et al. 2013). in another mite, varroa destructor, an ectoparasite of the honey bee, apis mellifera, both delivery methods, i.e., immersing mites in a dsrna solution or host-mediated rnai, wherein dsrna was fed to the honey bees and eventually delivered to the mites, were found to attenuate target gene expression through environmental rnai (campbell et al. 2010; garbian et al. 2012). interestingly, locust species displayed a systemic rnai response but were refractory to environmental rnai. a concentration as low as 15 pg of dsrna per mg body mass (~10 ng/insect) was enough to silence a gene in the desert locust, schistocerca gregaria (wynant et al. 2012). in the case of tribolium castaneum, the systemic response continued to increase over time in a dose-dependent manner and furthermore led to mortality 7 days postinjection. a similar dose-dependent response was also exhibited by the migratory locust, locusta migratoria, leading to target gene suppression and lethality, but this species was unresponsive to environmental rnai (luo et al. 2013). fungi are classified as a eukaryotic kingdom separate from plants and animals. the vital rnai components (rna-dependent rna polymerase (rdrp), dicer, and argonaute) have been found in different fungi, indicating the presence of a functional rnai pathway (dang et al. 2011). the rnai phenomenon is termed "quelling" in fungi and was first demonstrated in the ascomycete neurospora crassa (romano and macino 1992). silencing of fungal genes by rnai has been shown to be desirable for many fungal groups, including ascomycota, basidiomycota, zygomycota, and phytophthora species (nunes and dean 2012). several studies have reported the successful use of host-induced gene silencing (higs) to control fungal diseases (table 4.5) (koch and kogel 2014).
suppression of gus transcripts in a gus-expressing strain of fusarium verticillioides (a phytopathogenic filamentous fungus) while colonizing transgenic tobacco plants expressing a gus gene-interfering cassette was reported (tinoco et al. 2010). in vitro feeding of dsrna complementary to three genes involved in the ergosterol biosynthetic pathway, viz., cyp51a, cyp51b, and cyp51c, reduced the growth of fusarium graminearum (koch et al. 2013). in wheat, mycotoxin-specific genes were silenced in f. graminearum, which inhibited virulence (mcdonald et al. 2005). fungal pathogenicity genes have been shown to be appropriate targets for controlling fungal infection. a complete loss of pathogenicity was reported on targeting two of the host-selective act-toxin genes in the fungus alternaria alternata (miyamoto et al. 2008; ajiro et al. 2010). in similar reports, silencing of pathogenicity or avirulence genes proved successful in inhibiting fungal growth and development. in magnaporthe oryzae, silencing of 37 genes involved in the calcium signaling process adversely affected hyphal growth, sporulation, and pathogenicity (nunes and dean 2012). higs-mediated silencing of the effector gene avra10 reduced the number of haustoria in a powdery mildew-susceptible barley cultivar (koch and kogel 2014). to date, there are several successful reports of gene silencing in fungi with varied silencing efficiency. for instance, in moniliophthora perniciosa, the silencing efficiency varied depending upon the targeted gene, with reduction rates ranging from 18% to 97% for hydrophobin transcripts and from 23% to 87% for peroxiredoxin transcripts (santos et al. 2009), while when an rna hairpin precursor was used to transform the ascomycete ophiostoma novo-ulmi, expression levels of 6%, 22%, and 31% relative to the wild type were reported in three transformants (carneiro et al. 2010).
although usage of rnai for managing fungus growth is nowadays a favored approach by researchers, rnai silencing also leads to some off-target effects as observed by lacroix and spanu (2009) on silencing various genes in c. fulvum. these off-targets can be avoided by using specific silencing trigger sequence in rnai vector, by tissue-specific and inducible silencing (senthil-kumar and mysore 2011). the potential of rnai technology for controlling various pests has been well documented over the past decade. however, there are many limitations which need to be taken care of for successful deployment of rnai technology. there are several factors which need to be carefully looked into while designing rnai experiments, including the off-target effects, dsrna design, length and concentration of dsrna, and many more. therefore, to ensure a successful and effective rnai-based silencing, these factors need to be balanced optimally. in case of insects, persistency of rnai is a major problem due to which an optimum amount of dsrna needs to be determined for an effective silencing. interestingly, it is not true for every order of insect which is to be managed. for instance, about 60% (or lower) of gene knockdown was reported in certain recalcitrant insect species, while in coleopterans, 90% knockdown of gene was successfully achieved ensuing a long-lasting hereditary (baum et al. 2007; huvenne and smagghe 2010; zhu et al. 2011; bolognesi et al. 2012; rangasamy and siegfried 2012; li et al. 2013) . not only in insects but in nematodes also barriers like off-target effects have been reported while performing rnai technology based management approaches. designing an effective sirna sequence is a major limitation in rnai technology-based silencing. the following are some major barriers. off-target effects result from the knockdown of unintended genes other than the target gene. therefore, one of the most important aspects is avoidance of nonspecific target effects. 
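one simple way to screen a candidate dsrna for off-target risk is to look for short stretches of perfect identity between it and transcripts of non-target genes or organisms. the python sketch below is a minimal illustration, assuming that a shared stretch of 21 nucleotides (a typical sirna length) is the matching criterion. this threshold, the function names, and the toy sequences are assumptions rather than a published protocol, and real screens generally rely on dedicated alignment tools and looser matching rules.

```python
# illustrative sketch only: threshold, names and sequences are hypothetical
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalise(seq):
    """Upper-case the sequence and write RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return normalise(seq).translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-nucleotide windows in seq."""
    seq = normalise(seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def off_target_hits(dsrna, transcripts, k=21):
    """Map transcript name -> number of distinct k-mers shared with either dsRNA strand."""
    probe = kmers(dsrna, k) | kmers(reverse_complement(dsrna), k)
    hits = {}
    for name, seq in transcripts.items():
        shared = probe & kmers(seq, k)
        if shared:
            hits[name] = len(shared)
    return hits


if __name__ == "__main__":
    toy_dsrna = "ACGGACCAGCAAGGCACGAACCGAGACAGGCAACG"  # toy sequence, not a real gene
    toy_transcripts = {
        # contains 25 nt copied from the toy dsRNA, flanked by unrelated bases
        "pollinator_like": "TTTTT" + toy_dsrna[5:30] + "TTTTT",
        "unrelated": "GATTACAGATTACAGATTACAGATTACA",
    }
    print(off_target_hits(toy_dsrna, toy_transcripts))
```

any transcript reported by such a screen would be a reason to shift the dsrna to a different region of the target gene, for example a more variable stretch.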
it is the sequence used that determines possible off-target effects in the target organism and also in other species. apart from the sequence itself, off-target effects can arise because a wide range of sirnas is produced from a single dsrna, which increases the chance of nontarget effects. there are many reports of off-target effects; for instance, in the triatomid bug r. prolixus, two homologous nitrophorin genes were silenced in addition to the targeted gene (araujo et al. 2006). thus, selecting a sequence for synthesizing dsrna is a crucial and limiting step in rnai technology. selection of the target gene is the first step in the decision-making process for successful induction of rnai in an organism. the gene selected should have a crucial role in the concerned organism, and genes involved in parasitism or development are likely candidates fulfilling such requirements. moreover, it should be highly specific and not conserved across different genera (danchin et al. 2013), especially in pollinators. the next stage is to choose a suitable target site within the selected target gene. it is necessary to ensure that the dsrna designed is species-specific. bioinformatic tools for identifying potential target sites for eliciting effective rnai are available online. specificity of the dsrnas could be conferred by targeting either a conserved domain or a variable region, depending on the candidate gene, with the aim of minimizing the possibility of affecting any unintended genes or organisms. this is particularly important to ensure that dsrnas targeting agricultural pests do not share sequence similarity with the genes of beneficial pollinators. by targeting the utr regions, even closely related homologous genes can be selectively silenced through rnai, as demonstrated in d. melanogaster, t. castaneum, a. pisum, and the tobacco hornworm, manduca sexta, with respect to the vatpase gene (whyard et al. 
2009). the concept of dsrnas being used as tailor-made pesticides is emerging, wherein highly specific dsrnas are employed against havoc-creating pests while remaining eco-friendly. in general, longer rna molecules tend to have a longer half-life and therefore may be considered desirable while designing dsrnas. however, the size of the dsrna molecule could be a limiting factor for efficient uptake by the organisms. in nematodes, 28-140 kda dsrna could be efficiently ingested by meloidogyne species (urwin et al. 1997; li et al. 2007; zhang et al. 2012), though the limit is not known for other pests. in the red flour beetle, the length and concentration of dsrna had a profound effect on the efficiency as well as the persistence of the rnai effect; for example, 60- and 30-bp dsrnas induced 70 and 30% gene knockdown, respectively (miller et al. 2012). the same study also showed that multiple dsrnas, when injected together, led to competitive inhibition that influenced the effectiveness of rnai. in contrast, dsrna longer than 200 nucleotides, which is likely to generate multiple sirnas, contributes to an efficient rnai response (andrade and hunter 2016). multiple sirnas will help in overcoming target resistance that may arise due to polymorphism in the target. however, more studies are warranted to understand unambiguously the effect of length and concentration of dsrnas on the initial efficiency and persistence of the rnai effect. for realizing rnai-mediated gene silencing as an applicable strategy of pest control in agriculture, it remains imperative to achieve significant mortality or growth arrest of the pest population. therefore, the target gene whose expression is attenuated must be indispensable for the pest organism. this in turn underlines the importance of identifying an appropriate target gene for the target pest.
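the link between dsrna length and the number of possible sirnas discussed above can be made concrete with a little arithmetic. assuming, purely for illustration, that sirnas are 21 nucleotides long and that every 21-nucleotide window of the dsrna is a candidate, a dsrna of length n offers at most n - 21 + 1 distinct windows per strand. dicer processing does not sample all windows equally, so the figures produced by the sketch below are upper bounds rather than predictions, and the function name is hypothetical.

```python
def max_sirna_windows(dsrna_length, sirna_length=21):
    """Upper bound on distinct siRNA-sized windows along one strand."""
    if dsrna_length < sirna_length:
        return 0
    return dsrna_length - sirna_length + 1


# lengths mentioned in the text (30, 60, 200 nt) plus a 500 nt construct
for length in (30, 60, 200, 500):
    print(length, max_sirna_windows(length))
```

under this simple counting, a 30-nucleotide dsrna can yield at most 10 such windows and a 200-nucleotide dsrna at most 180, which is one way to see why longer triggers give more chances of producing an effective sirna.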
though most of the studies have used limited set of target genes reported earlier, more emphasis should be given on identification of novel candidate genes (pitino et al. 2011; zhu et al. 2011 ). the upcoming genomics and bioinformatics tools, like genome search (bai et al. 2009 ), cdna library (mao et al. 2007; baum et al. 2007 ), rna-seq and digital gene expression tag profile (dge-tag) , and rit-seq (alsford et al. 2011) , have been used for identification of new target genes. the persistence of silencing signal determines the effectiveness of rnai. studies on low persistence of silencing effect have been reported in a. pisum wherein silencing effect on aquaporin persisted for 5 days of delivery before subsiding (shakesby et al. 2009 ) indicating transient nature of rnai effect. thus, continuous supply of dsrna seems to be essential for effective rnai. it lends support for the transgenic host-mediated expression of the dsrna for persistent and effective silencing. persistent rnai will also be useful in manifesting desired effect on the target organism even in case of inefficient and partial downregulation of the target gene. selecting a life stage for larger silencing effects is species dependent that is to be targeted. in most cases, younger stage is preferred despite the efficient handling of older stages. in plant-parasitic nematodes, selecting the pre-parasitic juvenile stage for delivering dsrnas shows better silencing effect. similar observation was reported in insects, for example, in case of r. prolixus, no silencing effect was observed after treating its fourth instars compared to 42% silencing when using second instars (araujo et al. 2006 ). various methods of dsrna delivery have been used across the organisms. such methods include microinjection, feeding with bacteria expressing dsrna, feeding through diet supplementation, and host-mediated ingestion. 
the efficiency of rnai varies significantly among different organisms and when using different delivery methods. in insects, either microinjection or diet supplementation has been the method of choice, though the aftershock effect of microinjection remains a concern in many species. microinjection-mediated direct delivery bypasses the exposure of the dsrna molecule to the nucleases present in the digestive tract. however, for realizing true efficacy of the dsrna, it is desired to deliver through oral delivery that mimics the host-mediated delivery through ingestion. limited success in rnai in some of the insects has been attributed to rapid degradation of dsrna by saliva of the insects. the saliva of lygus lineolaris was found to contain rnases which interact with plant material prior to ingestion (allen and walker 2012) . presence of nucleases in the saliva and viruses in the hemolymph of insects also limits the silencing efficiency by degrading dsrnas (thompson et al. 2012; christensen et al. 2013 ). an ample number of studies in insect orders of coleoptera, diptera, lepidoptera, hemiptera, and others comprising of several insect pests have shown that rnai targeting insect genes can affect growth and development of insects, often leading to insect death (tables 4.3 and 4.4). the kind of genes for which a relatively high rnai efficiency could be achieved included genes encoding detoxification enzymes, metabolism and cytoskeleton structure, cell synthesis, nutrition, etc. alternative pathways of many of these genes in insects as well as relative importance of a particular pathway in an insect species are not known with certainty. therefore, use of rnai as a strategy for pest control will require an essential step of target selection. if an indispensible gene has to be identified for an insect species, it will involve large throughput screening rather than going for homologous genes, effective for other insect species. 
chitin covers the exoskeleton of the insect body, and the insect midgut, lined by the peritrophic membrane (pm), constitutes the major channel for absorption of nutrients as well as orally administered dsrna. therefore, genes expressed and functioning in the insect midgut have been screened by many researchers (wang and granados 2001). for example, a chitinase gene (oncht) and a chitin synthase gene (onchs2) were identified from gut-specific ests of the european corn borer (ostrinia nubilalis) (khajuria et al. 2010). chitin content of the pm is regulated by oncht, as demonstrated in a feeding experiment with dsrna, in which rnai-based suppression led to reduced growth and development of european corn borer larvae (khajuria et al. 2010). in a similar study, mao et al. (2007) identified several gossypol-inducible genes, including a putative p450 monooxygenase, cyp6ae14, from a midgut-specific cdna library from fifth-instar larvae exposed to gossypol. similarly, for screening targets for rnai in coleopteran insects, a large number of cdnas from the cdna libraries of western corn rootworm (diabrotica virgifera virgifera) were in vitro transcribed and used in feeding-based bioassays (baum et al. 2007). a rapid method of cdna screening was demonstrated by wang et al. (2011) by combining illumina's rna-seq and digital gene expression tag profile (dge-tag) in the asian corn borer (acb) (ostrinia furnacalis). in addition to being rapid and cost-effective, this method allows monitoring expression of the genes throughout the insect body, thus broadening the base of target selection. using illumina parallel sequencing technology, the abundance of >90,000 transcripts from trypanosome libraries was scored before and after induction of rnai. the results led to the constitution of a non-redundant set of protein-coding sequences (cds) comprising ~7500 genes (alsford et al. 2011). thus, these methods can derive a core set of essential gene loci if the genome sequence of the organism is known.
rnai-mediated attenuation of these core loci is most likely to significantly retard survival and fitness of the insect pests. in recent years, several modifications and methods for effective delivery and uptake of dsrna have been proposed. such methods include chemical modifications of the sirna duplex, delivery through nanoparticles and liposomes, sprayable rnai-based products, root absorption and trunk injection, and bacteria- or virus-based delivery. a few of them with considerable potential are described below. synthetic, nontoxic nanoparticles can be generated from natural as well as synthetic polymers. nanoparticles offer ease of surface modification and biodegradability in addition to greater penetration ability, and are thus an effective vehicle for delivery of dsrna (vauthier et al. 2003; herrero-vanrell et al. 2005). in mosquitoes, dsrna encapsulated in the polymer chitosan was used to achieve rnai. the encapsulation process used the electrostatic forces between the negative charges of the rna backbone and the positive charges of the amino groups of chitosan. zhang et al. (2015a, b) demonstrated effective knockdown of agchs1 and agchs2 in a. gambiae and a. aegypti (sema1a) during larval development by using chitosan nanoparticles. he et al. (2013) fed a lepidopteran pest, the asian corn borer (ostrinia furnacalis), with diet containing a mixture of fluorescent nanoparticle (fnp) and cht10-dsrna, naked cht10-dsrna, fnp and gfp-dsrna, or gfp-dsrna. rnai-mediated gene silencing occurred only in the larvae fed on the diet containing the mixture of fnp and cht10-dsrna, leading to retarded growth and eventually death of the larvae. liposome vesicles composed of nontoxic natural lipids are already being used in drug delivery. liposomes can cross the cell membrane effectively and deliver exogenous molecules. whyard et al. (2009) used cationic liposomes for encapsulating and delivering dsrna targeting the 3′-utr of the γ-tubulin gene in four different species of drosophila (d. 
melanogaster, d. sechellia, d. yakuba, and d. pseudoobscura) and demonstrated mortality of the insects only with encapsulated dsrna. in drosophila, the presence of sid1 homologues has never been confirmed, and the uptake of dsrna is likely to be by receptor-mediated endocytosis (ulvila et al. 2006). the higher efficiency of rnai with liposome-mediated delivery in certain cases could be attributed to the fact that it bypasses the gut nucleases, which reduce the efficacy of orally delivered dsrna. chemical modifications are known to increase the stability of rna molecules. for sirna too, such modifications have been proposed to improve half-life and pharmacokinetic properties of the sirna duplexes, target-binding affinity, and delivery (kurreck 2003; manoharan 2003; dorsett and tuschl 2004). interestingly, a couple of examples have demonstrated that such modifications may increase the specificity of dsrna. for example, methylation at the 2′-position of the ribosyl ring of the second base of the sirna could decrease off-target effects (jackson et al. 2003), an sirna duplex with 3′-overhangs at each end was more effective in gene silencing than a blunt-ended duplex (elbashir et al. 2001), and addition of 3′-tt overhangs (the "tuschl design") on both strands of the duplex sirna has been preferred in many cases. a few other designs, for instance, sirnas without 3′-overhangs and structures with a single 3′-overhang in the guide strand, have also been active in gene silencing (czauderna et al. 2003; lorenz et al. 2004). despite a few limitations, the application of rnai to improve crop resistance, especially against biotic stresses, is expected to be the most reliable and significant approach in the future, as evident from a plethora of studies. certain products based on rnai-mediated resistance, such as monsanto's smartstax pro, for control of western corn rootworm, and dupont pioneer's plenish high oleic acid soybean (majumdar et al.
2017) are likely to be commercialized soon. however, the efficacy of these plants remains to be proven in actual field situations. diverse classes of biotic factors affecting crop production worldwide have shown varied levels of susceptibility toward rnai, which warrants modified and improved dsrna delivery methods. a better understanding of host-pest interaction and the genetic basis of parasitism is likely to generate more potential target genes for effective hd-rnai. the crispr/cas system has emerged as a powerful technique for creating knockout mutants to unravel the complex mechanisms of parasitism, and thus paves the way for identification of the key pest genes. transplastomic expression of dsrna in plants would be a further improvement for achieving higher expression. applying dsrna through methods with low environmental risks, for instance, irrigation water, root drench, or trunk injection, would obviate the need for genetic transformation. these methods result in localized application along with rapid breakdown of dsrna and are therefore likely to be more acceptable from a biosafety point of view (joga et al. 2016). the successful use of layered double hydroxide clay nanosheets for topical application of dsrna against viruses (mitter et al. 2017) opens up the possibility of applying dsrna like any other protective agrochemical. to conclude, rnai has emerged as one of the most promising control mechanisms for pests such as insects, nematodes, and fungi. although much remains to be explored and understood about the molecular process of rnai in plants and their pests, the available knowledge and the studies reviewed in this chapter have established rnai technology as an important tool for identifying gene functions and targeting vital genes to control pest development.
rnai-mediated loss-of-function phenotypes not only determine the functions of unknown genes but also lead to identification of new specific targets for managing pests or improving agricultural traits. understanding the rnai mechanism is nevertheless of utmost importance, as the rnai machinery varies from genus to genus. several shortcomings need to be addressed, for instance, persistence of silencing effects and off-target effects of silencing. in addition, the biosafety, risk assessment, and government regulations related to commercialization of rnai-based transgenics still have to be developed. the discovery of rnai has revolutionized research in biotechnology. beyond pest management, the wide range of rnai applications includes modification of agronomic traits, elimination of mycotoxin contamination, and improvement of the nutritional value of crops. it is also proving its worth in rnai-based therapeutics research for human welfare. in toto, this technology is a potential boon in the arsenal of the scientific community to address the challenges associated with climatic changes, a burgeoning population, and the sustainability of the human race.
small non-coding rnas mount a silent revolution in gene expression silencing the expression of the salivary sheath protein causes transgenerational feeding suppression in the aphid sitobion avenae rna interference: biology, mechanism and applications the c elegans research community role of the host-selective act-toxin synthesis gene actts2 encoding an enoylreductase in pathogenicity of the tangerine pathotype of alternaria alternata saliva of lygus lineolaris digests double stranded ribonucleic acids high-throughput phenotyping using parallel sequencing of rna interference targets in the african trypanosome a viral suppressor of gene silencing in plants rna interference-natural gene-based technology for highly specific pest control (hispec) knocking-down meloidogyne incognita proteases by plant-delivered dsrna has negative pleiotropic effect on nematode vigor rna interference of the salivary gland nitrophorin 2 in the triatomine bug rhodnius prolixus (hemiptera: reduviidae) by dsrna ingestion or injection culture of drosophila primary cells dissociated from gastrula embryos and their use in rnai screening rna interference of dual oxidase in the plant nematode meloidogyne incognita qpcr analysis and rnai define pharyngeal gland cell-expressed genes of heterodera glycines required for initial interactions with the host rna interference: a novel source of resistance to combat plant parasitic nematodes core rnai machinery and sid1, a component for systemic rnai, in the hemipteran insect, aphis glycines control of coleopteran insect pests through rna interference rna interference-mediated knockdown of a cytochrome p450, cyp6bg1, from the diamondback moth, plutella xylostella , reduces larval resistance to permethrin evolution of microrna diversity and regulation in animals role for a bidentate ribonuclease in the initiation step of rna interference host generated sirnas attenuate expression of serine protease gene in myzus persicae characterizing the mechanism of 
action of double-stranded rna activity against western corn rootworm (diabrotica virgifera virgifera leconte) delivery of intrahemocoelic peptides for insect pest management gene-knockdown in the honey bee mite varroa destructor by a non-invasive approach: studies on a glutathione s-transferase the involvement of clathrin-mediated endocytosis and two sid-1-like transmembrane proteins in double-stranded rna uptake in the colorado potato beetle midgut the argonaute family: tentacles that reach into rnai, developmental control, stem cell maintenance, and tumorigenesis suppression of polygalacturonase gene expression in the phytopathogenic fungus ophiostoma novo-ulmi by rna interference origins and mechanisms of mirnas and sirnas fragile x-related protein and vig associate with rna interference machinery functional analysis of pathogenicity proteins of the potato cyst nematode globodera rostochiensis using rnai feeding based rna interference of a trehalose phosphate synthase gene in the brown plant hopper, nilaparvata lugens engineering resistance against root-knot nematode, meloidogyne incognita, by host delivered rnai metabolism studies of unformulated internally [3h]-labeled short interfering rnas in mice dsrna degradation in the pea aphid(acyrthosiphon pisum) associated with lack of response in rnai feeding and injection assay post-transcriptional gene silencing across kingdoms persistence and transgenerational effect of plant-mediated rnai in aphids gene silencing in adult aedes aegypti mosquitoes through oral delivery of double-stranded rna structural variations and stabilizing modifications of synthetic sirnas in mammalian cells rnai effector diversity in nematodes identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining rna interference in fungi: pathways, functions, and applications rnai-mediated crop protection against insects implementing the sterile insect technique with rna interference -a 
review influence of catalase gene silencing on the survivability of sitobion avenae post-transcriptional gene silencing in plants a genome-wide transgenic rnai library for conditional gene inactivation in drosophila rna interference of effector gene mc16d10l confers resistance against meloidogyne chitwoodi in arabidopsis and potato plant mediated rna interference of effector gene mc16d10l confers resistance against meloidogyne chitwoodi in diverse genetic backgrounds of potato and reduces pathogenicity of nematode offspring sirnas: applications in functional genomics and potential as therapeutics phylogenetic origin and diversification of rnai pathway genes in insects tomato transgenic plants expressing hairpin construct of a nematode protease gene conferred enhanced resistance to root-knot nematodes killing the messenger: short rnas that silence gene expression rna interference is mediated by 21-and 22-nucleotide rnas host-delivered rnai: an effective strategy to silence genes in plant parasitic nematodes resistance to ditylenchus destructor infection in sweet potato by the expression of small interfering rnas targeting unc-15, a movement-related gene orco mediates olfactory behaviors and winged morph differentiation induced by alarm pheromone in the grain aphid, sitobion avenae analysis of chitin synthase function in a plant parasitic nematode, meloidogyne artiellia, using rnai transport of dsrna into cells by the transmembrane protein sid-1 silencing of p-glycoprotein increases mortality in temephos-treated aedes aegypti larvae potent and specific genetic interference by doublestranded rna in c. 
elegans bidirectional transfer of rnai between honey bee and varroa destructor: varroa gene silencing reduces varroa population silencing of rieske iron-sulfur protein using chemically synthesized sirna as a potential biopesticide against plutella xylostella oral delivery mediated rna interference of a carboxylesterase gene results in reduced resistance to organophosphorus insecticides in the cotton aphid synthetic rna silencing in bacteria antimicrobial discovery and resistance breaking rna interference with the allato regulating neuropeptide genes from the fall armyworm spodoptera frugiperda and its effects on the jh titer in the hemolymph plant-generated artificial small rnas mediated aphid resistance analysis of the transcriptome of the root lesion nematode pratylenchus coffeae generated by 454 sequencing technology consecutive inactivation of both alleles of the gb110 gene has no effect on the proliferation and differentiation of mouse embryonic stem cells a species of small antisense rna in posttranscriptional gene silencing in plants post-transcriptional gene silencing by doublestranded rna fluorescent nanoparticle delivered dsrna toward genetic control of insect pests self-assembled particles of an elastin-like polymer as vehicles for controlled drug release black shank resistant tobacco by silencing of glutathione s-transferase engineering broad root-knot resistance in transgenic plants by rnai silencing of a conserved and essential root-knot nematode parasitism gene methods for delivery of double-stranded rna into caenorhabditis elegans mechanisms of dsrna uptake in insects and potential of rnai for pest control: a review a drosophila fragile x protein interacts with components of rnai and ribosomal proteins expression profiling reveals off-target gene regulation by rnai rnai efficiency, systemic properties, and novel delivery methods for pest insect control: what we know so far effectiveness of specific rna-mediated interference through ingested 
double-stranded rna in caenorhabditis elegans use of dsrna-mediated genetic interference to demonstrate that frizzled and frizzled 2act in the wingless pathway dicer functions in rna interference and in synthesis of small rna involved in developmental timing in c. elegans the many faces of rnai a gut-specific chitinase gene essential for regulation of chitin content of peritrophic matrix and growth of ostrinia nubilalis larvae a correlation between host-mediated expression of parasite genes as tandem inverted repeats and abrogation of development of female heterodera glycines cyst formation during infection of glycine max new wind in the sails: improving the agronomic value of crop plants through rnai-mediated gene silencing host-induced gene silencing of cytochrome p450 lanosterol c14î±-demethylase-encoding genes confers strong resistance to fusarium species functional characterization of a juvenile hormone esterase related gene in the moth sesamia nonagrioides through rna interference silencing of acetyl cholinesterase gene of helicoverpa armigera by sirna affects larval growth and its life cycle development of an rnai based microalgal larvicide to control mosquitoes host-delivered rnai-mediated root-knot nematode resistance in arabidopsis by targeting splicing factor and integrase genes antisense technologies: improvement through novel chemical modifications screening of lethal genes for feeding rnai by leaf discmediated systematic delivery of dsrna in tetranychus urticae silencing of six hydrophobins in cladosporium fulvum: complexities of simultaneously targeting multiple genes attempts to establish rna interference in the parasitic nematode heligmosomoides polygyrus host-derived suppression of nematode reproductive and fitness genes decreases fecundity of heterodera glycines ichinohe rapid in planta evaluation of root expressed transgenes in chimeric soybean plants rna interference in nilaparvata lugens (homoptera, delphacidae) based on dsrna ingestion rna 
interference of four genes in adult bactrocera dorsalis by feeding their dsrnas biotechnological application of functional genomics towards plant parasitic nematode control advances in the use of the rna interference technique in hemiptera resistance to root-knot nematode in tomato roots expressing a nematicidal bacillus thuringiensis crystal protein cathepsin b cysteine proteinase is essential for the development and pathogenesis of the plant parasitic nematode radopholus similis cloning and characterisation of a heterodera glycines aminopeptidase cdna vertebrate micro-rna genes induction of a highly specific antiviral state in transgenic plants: implications for regulation of gene expression and virus resistance rnai as random degradation pcr: sirna primers convert mrna into dsrna that are degraded to generate new sirnas steroid and lipid conjugates of sirnas to enhance cellular uptake and gene silencing in liver cells knockdown of heat-shock protein 90 and isocitrate lyase gene expression reduced root-knot nematode reproduction identification of hedgehog pathway components by rnai in drosophila cultured cells differential responses of migratory locusts to systemic rna interference via double-stranded rna injection and feeding structural basis for double-stranded rna processing by dicer rna interference (rnai) as a potential tool for control of mycotoxin contamination in crop plants: concepts and considerations rna interference and chemically modified sirnas feeding-based rna interference of a gap gene is lethal to the pea aphid, acyrthosiphon pisum plant-mediated rnai of a gap gene-enhanced tobacco tolerance against the myzus persicae silencing a cotton bollworm p450 monooxygenase gene by plant-mediated rnai impairs larval tolerance of gossypol cotton plants expressing cyp6ae14 double-stranded rna show enhanced resistance to bollworms cysteine protease enhances plant-mediated bollworm rna interference rna: guiding gene silencing an eye on rnai in nematode 
parasites rna silencing of mycotoxin production in aspergillus and fusarium species uptake of extracellular double-stranded rna by sid-2 formicidae (ant) control using double-stranded rna constructs rna interference suggests sulfakinins as satiety effectors in the cricket gryllus bimaculatus dissecting systemic rna interference in the red flour beetle tribolium castaneum: parameters affecting the efficiency of rnai clay nanosheets for topical delivery of rnai for sustained protection against plant viruses functional analysis of a multicopy host-selective act-toxin biosynthesis gene in the tangerine pathotype of alternaria alternata using rna silencing rnai knockdown of a salivary transcript leading to lethality in the pea aphid, acyrthosiphon pisum a secreted mif cytokine enables aphid feeding and represses plant immune responses psort: a program for detecting sorting signals in proteins and predicting their subcellular localization introduction of a chimeric chalcone synthase gene into petunia results in reversible cosuppression of homologous genes in trans rnai silencing of the meloidogyne incognita rpn7 gene reduces nematode parasitic success msp40 effector of root-knot nematode manipulates plant immunity to facilitate parasitism higs: host-induced gene silencing in the obligate biotrophic fungal pathogen blumeria graminis host-induced gene silencing: a tool for understanding fungal host interaction and for developing novel disease control strategies a non-invasive method for silencing gene transcription in honeybees maintained under natural conditions atp requirement and small interfering rna structure in the rna interference pathway the roles of wingless and decapentaplegic in axis and appendage development in the red flour beetle, tribolium castaneum host-induced gene silencing of wheat leaf rust fungus puccinia triticina pathogenicity genes mediated by the barley stripe mosaic virus utility of host delivered rnai of two fmrf amide like peptides, flp-14 and 
flp-18, for the management of root knot nematode, meloidogyne incognita micrornas: deviants no longer over-expression of cyp6a2 is associated with spirotetramat resistance and cross-resistance in the resistant strain of aphis gossypii glover aphid protein effectors promote aphid colonization in a plant species-specific manner silencing of aphid genes by dsrna feeding from plants. plosone 6:e25709 gene knockdown by rnai in the pea aphid acyrthosiphon pisum rnai-mediated crop protection against insects silencing of midgut aminopeptidase n of spodoptera litura by double-stranded rna establishes its role as bacillus thuringiensis toxin receptor validation of rna interference in western corn rootworm diabrotica virgifera virgifera leconte (coleoptera, chrysomelidae) adults rna interference of odorant-binding protein 2 (obp2) of the cotton aphid, aphis gossypii (glover), resulted in altered electrophysiological responses establishment of broad-spectrum resistance against blumeria graminis f. sp. 
tritici in triticum aestivum by rnai-mediated knock-down of mlo rnai-mediated knockdown of a spodoptera frugiperda trypsin-like serine-protease gene reduces susceptibility to a bacillus thuringiensis cry1ca1 protoxin quelling: transient inactivation of gene expression in neurospora crassa by transformation with homologous sequences application of rna interference to root-knot nematode genes encoding esophageal gland proteins parasite-specific immune response in adult drosophila melanogaster: a genomic study dsrna induced gene silencing in moniliophthora perniciosa, the causal agent of witches' broom disease of cacao new insight into the rna interference response against cathepsin-l gene in the pea aphid, acyrthosiphon pisum: molting or gut phenotypes specifically induced by injection or feeding treatments genetic and molecular characterization of sting, a gene involved in crystal formation and meiotic drive in the male germ line of drosophila melanogaster hv et al (2009) a water-specific aquaporin involved in aphid osmoregulation host-induced silencing of two pharyngeal gland genes conferred transcriptional alteration of cell wall-modifying enzymes of meloidogyne incognita vis-ã -vis perturbed nematode infectivity in eggplant on the role of rna amplification in dsrna-triggered gene silencing effective and specific in planta rnai in cyst nematodes: expression interference of four parasitism genes reduces parasitic success oral delivery of double-stranded rna in larvae of the yellow fever mosquito, aedes aegypti: implications for pest mosquito control on the road to reading the rna-interference code transgenic soybeans expressing sirnas specific to a major sperm protein gene suppress heterodera glycines reproduction rna interference of î²1 integrin subunit impairs development and immune responses of the beet armyworm, spodoptera exigua rnai in c. 
elegans: soaking in the genome sequence delaying insect resistance to transgenic crops insect resistance to bt crops: evidence versus theory rnai, a new therapeutic strategy against viral infection rna interference in lepidoptera: an overview of successful and unsuccessful studies and implications for experimental design toxicological and pharmacokinetic properties of chemically modified sirnas targeting p53 rna following intravenous administration developmental control of a lepidopteran pest spodoptera exigua by ingestion of bacteria expressing dsrna of a non-midgut gene ingestion of bacterially expressed dsrnas can produce specific and potent genetic interference in caenorhabditis elegans in vivo trans-specific gene silencing in fungal cells by in planta expression of a double-stranded rna microrna biogenesis: drosha can't cut it without a partner exploring systemic rna interference in insects: a genome-wide survey for rnai genes in tribolium rna interference in the light brown apple moth, epiphyas postvittana walker induced by double-stranded rna feeding targeted mrna degradation by double-stranded rna in vitro rna interference against gut osmoregulatory genes in phloemfeeding insects double-stranded rna is internalized by scavenger receptor-mediated endocytosis in drosophila s2 cells rna interference for the control of whiteflies (bemisia tabaci) by oral route ingestion of double-stranded rna by pre-parasitic juvenile cyst nematodes leads to rna interference resistance to both cyst-and root-knot nematodes conferred by transgenic arabidopsis expressing a modified plant cystatin with double stranded rna to prevent in vitro and in vivo viral infections by recombinant baculovirus small rnas and the control of transposons and viruses in drosophila drug delivery to resistant tumors: the potential of poly (alkyl cyanoacrylate) nanoparticles the pratylenchus penetrans transcriptome as a source for the development of alternative control strategies: mining for putative 
genes involved in parasitism and evaluation of in planta rnai stacking resistance to crown gall and nematodes in walnut rootstocks prolonged gene knockdown in the tsetse fly glossina by feeding double stranded rna molecular structure of the peritrophic membrane (pm): identification of potential pm target sites for insect control second-generation sequencing supply an effective way to screen rnai targets in large scale for potential application in pest insect control angiotensin-converting enzymes modulate aphid-plant interactions ingested double-stranded rnas can act as species-specific insecticides the structural sheath protein of aphids is required for phloem feeding systemic rnai in c. elegans requires the putative transmembrane protein sid-1 caenorhabditis elegans sid-2 is required for environmental rna interference many roads to maturity: microrna biogenesis pathways and their regulation oral delivery of double-stranded rnas and sirnas induces rnai effects in the potato/tomato psyllid tissue-dependence and sensitivity of the systemic rna interference response in the desert locust, schistocerca gregaria gene silencing of two acetylcholinesterases reveals their cholinergic and non-cholinergic functions in rhopalosiphum padi and sitobion avenae silencing the hahr3 gene by transgenic plant-mediated rnai to disrupt helicoverpa armigera development genome-wide screening for components of small interfering rna (sirna) and micro-rna (mirna) pathways in the brown planthopper, nilaparvata lugens (hemiptera: delphacidae) silencing of an aphid carboxylesterase gene by use of plantmediated rnai impairs sitobion avenae tolerance of phoxim insecticides the 8d05 parasitism gene of meloidogyne incognita is required for successful infection of host roots host generated double stranded rna induces rnai in plant parasitic nematodes and protects the host from infection effects of short interfering rna against methicillin-resistant staphylococcus aureus coagulase in vitro and in 
vivo efficiency of different methods for dsrna delivery in cotton bollworm (helicoverpa armigera) development of rnai methods for peregrinus maidis, the corn planthopper development of a host-induced rnai system in the wheat stripe rust fungus puccinia striiformis f. sp. tritici rnai: double-stranded rna directs the atp-dependent cleavage of mrna at 21-to 23-nucleotide intervals knockdown of midgut genes by dsrna-transgenic plantmediated rna interference in the hemipteran insect nilaparvata lugens peroxiredoxin 1 protects the pea aphid acyrthosiphon pisum from oxidative stress induced by micrococcus luteus infection single processing center models for human dicer and bacterial rnase iii chitosan/double-stranded rna nanoparticle-mediated rna interference to silence chitin synthase genes through larval feeding in the african malaria mosquito (anopheles gambiae) production of dsrna sequences in the host plant is not sufficient to initiate gene silencing in the colonizing oomycete pathogen phytophthora parasitica silencing of molt-regulating transcription factor gene, cihr3, affects growth and development of sugarcane stem borer, chilo infuscatellus feasibility, limitation and possible solutions of rnai-based technology for insect pest control silencing of cytochrome p450 cyp6b6 gene of cotton bollworm (helicoverpa armigera) by rnai full crop protection from an insect pest by expression of long double-stranded rnas in plastids chitosan/interfering rna nanoparticle mediated gene silencing in disease vector mosquito larvae next-generation insect-resistant plants: rnai-mediated crop protection double stranded rna constructs to control ants. 
us patent application publication no phyllotreta striolata (coleoptera, chrysomelidae): arginine kinase cloning and rnai-based pest control rna interference in the termite reticulitermes flavipes through ingestion of double-stranded rna ingested rna interference for managing the populations of the colorado potato beetle, leptinotarsa decemlineata improvement of pest resistance in transgenic tobacco plants expressing dsrna of an insect-associated gene ecr a novel meloidogyne enterolobii effector metctp promotes parasitism by suppressing programmed cell death in host plants key: cord-260793-bb4h255w authors: brann, david h.; tsukahara, tatsuya; weinreb, caleb; lipovsek, marcela; van den berge, koen; gong, boying; chance, rebecca; macaulay, iain c.; chou, hsin-jung; fletcher, russell; das, diya; street, kelly; de bezieux, hector roux; choi, yoon-gi; risso, davide; dudoit, sandrine; purdom, elizabeth; mill, jonathan s.; hachem, ralph abi; matsunami, hiroaki; logan, darren w.; goldstein, bradley j.; grubb, matthew s.; ngai, john; datta, sandeep robert title: non-neuronal expression of sars-cov-2 entry genes in the olfactory system suggests mechanisms underlying covid-19-associated anosmia date: 2020-05-18 journal: biorxiv doi: 10.1101/2020.03.25.009084 sha: doc_id: 260793 cord_uid: bb4h255w altered olfactory function is a common symptom of covid-19, but its etiology is unknown. a key question is whether sars-cov-2 (cov-2) – the causal agent in covid-19 – affects olfaction directly by infecting olfactory sensory neurons or their targets in the olfactory bulb, or indirectly, through perturbation of supporting cells. here we identify cell types in the olfactory epithelium and olfactory bulb that express sars-cov-2 cell entry molecules. bulk sequencing revealed that mouse, non-human primate and human olfactory mucosa expresses two key genes involved in cov-2 entry, ace2 and tmprss2. 
however, single cell sequencing and immunostaining demonstrated ace2 expression in support cells, stem cells, and perivascular cells; in contrast, neurons in both the olfactory epithelium and bulb did not express ace2 message or protein. these findings suggest that cov-2 infection of non-neuronal cell types leads to anosmia and related disturbances in odor perception in covid-19 patients. sars-cov-2 (cov-2) is a pandemic coronavirus that causes the covid-19 syndrome, which can include upper respiratory infection (uri) symptoms, severe respiratory distress, acute cardiac injury and death (1-4). cov-2 is closely related to other beta-coronaviruses, including the causal agents in pandemic sars and mers (sars-cov and mers-cov, respectively) and endemic viruses typically associated with mild uri syndromes (hcov-oc43 and hcov-229e) (5-7). clinical reports suggest that infection with cov-2 is associated with high rates of disturbances in smell and taste perception, including anosmia (8-12). while many viruses (including coronaviruses) induce transient changes in odor perception due to inflammatory responses, in at least some cases covid-related anosmia has been reported to occur in the absence of significant nasal inflammation or coryzal symptoms (11, 13-15). this observation suggests that cov-2 might directly target odor processing mechanisms, although the specific means through which cov-2 alters odor perception remains unknown. cov-2, like sars-cov, infects cells through interactions between its spike (s) protein and the ace2 protein on target cells. this interaction requires cleavage of the s protein, likely by the cell surface protease tmprss2, although other proteases (such as cathepsin b and l, ctsb/ctsl) may also be involved (4-6, 16-20).
other coronaviruses use different cell surface receptors and proteases to facilitate cellular entry, including dpp4, furin and hspa5 for mers-cov, anpep for hcov-229e, tmprss11d for sars-cov (in addition to ace2 and tmprss2), and st6gal1 and st3gal4 for hcov-oc43 and hcov-hku1 (6, 21-23). we hypothesized that identifying the specific olfactory cell types susceptible to direct cov-2 infection (due to, e.g., ace2 and tmprss2 expression) would provide insight into possible mechanisms through which covid-19 causes altered smell perception. the nasal epithelium is divided into a respiratory epithelium (re) and olfactory epithelium (oe), whose functions and cell types differ. the nasal re is continuous with the epithelium that lines much of the respiratory tract and is thought to humidify air as it enters the nose; main cell types include basal cells, ciliated cells, secretory cells (including goblet cells), and brush/microvillar cells (24, 25) (figure 1). the oe, in contrast, is responsible for odor detection, as it houses mature olfactory sensory neurons (osns) that interact with odors via receptors localized on their dendritic cilia. osns are supported by sustentacular cells, which act to structurally support sensory neurons, phagocytose and/or detoxify potentially damaging agents, and maintain local salt and water balance (26-28); microvillar cells and mucus-secreting bowman's gland cells also play important roles in maintaining oe homeostasis and function (24, 29) (figure 1). in addition, the oe contains globose basal cells (gbcs), which are primarily responsible for regenerating osns during normal epithelial turnover, and horizontal basal cells (hbcs), which act as reserve stem cells activated upon tissue damage (30-32).
osns elaborate axons that puncture the cribriform plate at the base of the skull and terminate in the olfactory bulb, whose local circuits process olfactory information before sending it to higher brain centers (figure 1). it has recently been demonstrated through single cell rna sequencing analysis (referred to herein as scseq) that cells from the human upper airway, including nasal re goblet, basal and ciliated cells, express high levels of ace2 and tmprss2, suggesting that these re cell types may serve as a viral reservoir during cov-2 infection (33). however, analyzed samples in that dataset did not include any osns or sustentacular cells, indicating that tissue sampling in these experiments did not include the oe (34, 35). here we query both new and previously published bulk rna-seq and scseq datasets from the olfactory system for expression of ace2, tmprss2 and other genes implicated in coronavirus entry. we find that non-neuronal cells in the oe and olfactory bulb, including support, stem and perivascular cells, express cov-2 entry-associated transcripts and their associated proteins, suggesting that infection of these non-neuronal cell types contributes to anosmia in covid-19 patients.

figure 1. schematic of a sagittal view of the human nasal cavity, in which respiratory and olfactory epithelium are colored (left). for each type of epithelium, a schematic of the anatomy and known major cell types is shown (right). in the olfactory bulb in the brain (tan) the axons from olfactory sensory neurons coalesce into glomeruli, and mitral/tufted cells innervate these glomeruli and send olfactory projections to downstream olfactory areas. glomeruli are also innervated by juxtaglomerular cells, a subset of which are dopaminergic.
to determine whether genes relevant to cov-2 entry are expressed in osns or other cell types in the human oe, we queried previously published bulk rna-seq data derived from the whole olfactory mucosa (wom) of macaque, marmoset and human (36), and found expression of almost all cov-entry-related genes in all wom samples (figure s1a). to identify the specific cell types in human oe that express ace2, we quantified gene expression in scseq data derived from four human nasal biopsy samples recently reported by durante et al. (37). neither ace2 nor tmprss2 was detected in mature osns, whereas both genes were detected in sustentacular cells and hbcs (figures 2a-e). in contrast, genes relevant to cell entry of other covs were expressed in osns, as well as in other oe cell types. we confirmed the expression of ace2 protein via immunostaining of human olfactory epithelium biopsy tissue, which revealed expression in sustentacular and basal cells, and an absence of ace2 protein in osns (figures 2f and s1e). together, these results demonstrate that sustentacular and olfactory stem cells, but not mature osns, are potentially direct targets of cov-2 in the human oe. given that the nasopharynx is a major site of infection for cov-2 (10), we compared the frequency of ace2 and tmprss2 expression among the cell types in the human re and oe (37). sustentacular cells exhibited the highest frequency of ace2 expression in the oe (2.90% of cells), although this frequency was slightly lower than that observed in respiratory ciliated and secretory cells (3.65% and 3.96%, respectively). while all hbc subtypes expressed ace2, the frequency of ace2 expression was lower in olfactory hbcs (0.84% of cells) than in respiratory hbcs (1.78% of cells) (figure 2b). in addition, all other re cell subtypes showed higher frequencies of ace2 and tmprss2 expression than was apparent in oe cells.
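the per-cell-type frequencies quoted above can be computed with a simple tally. the following is a minimal sketch, assuming that a cell counts as expressing a gene when it carries at least one umi for that gene (the criterion stated in the mouse figure legend); the function name and the toy counts are hypothetical and are not taken from the durante et al. dataset.

```python
from collections import defaultdict


def percent_positive(umi_counts, cell_types):
    """Percent of cells in each cell type with at least one UMI for a gene.

    umi_counts: per-cell UMI counts for a single gene (e.g. ACE2).
    cell_types: per-cell cell type labels, in the same order.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for count, cell_type in zip(umi_counts, cell_types):
        totals[cell_type] += 1
        if count > 0:  # assumption: any detected UMI marks the cell as positive
            positives[cell_type] += 1
    return {ct: 100.0 * positives[ct] / totals[ct] for ct in totals}


# Toy example (hypothetical counts, for illustration only).
ace2_umis = [0, 1, 0, 0, 2, 0, 0, 0]
labels = ["sus", "sus", "sus", "sus", "hbc", "hbc", "osn", "osn"]
result = percent_positive(ace2_umis, labels)
```

because detection is binarized, this measure reports how many cells express the gene, not how strongly they express it.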
these results demonstrate the presence of key cov-2 entry-related genes in specific cell types in the oe, but at lower levels of expression than in re isolated from the nasal mucosa. we wondered whether these lower levels of expression might nonetheless be sufficient for infection by cov-2. it was recently reported that nasal re has higher expression of cov-2 entry genes than re of the trachea or lungs (38), and we therefore asked where the oe fell within this previously established spectrum of expression. to address this question, we developed a two-step alignment procedure in which we first sought to identify cell types that were common across the oe and re, and then leveraged gene expression patterns in these common cell types to normalize gene expression levels across all cell types in the oe and re (figure s2). this approach revealed correspondences between goblet cells in the re and bowman's gland cells in the oe (96% mapping probability, see methods), and between pulmonary ionocytes in the re and a subset of microvillar cells in the oe (99% mapping probability, see methods); after alignment, human oe sustentacular cells were found to express ace2 and tmprss2 at levels similar to those observed in the remainder of the non-nasal respiratory tract (figure 2g) (38). these results are consistent with the possibility that specific cell types in the human olfactory epithelium express ace2 at a level that is permissive for direct infection.

(37). each dot represents an individual cell, colored by cell type (hbc = horizontal basal cell, osn = olfactory sensory neuron, sus = sustentacular cell, mv = microvillar cell, resp. = respiratory, oec = olfactory ensheathing cell, smc = smooth muscle cell). (b) percent of cells expressing ace2 and tmprss2. ace2 was not detected in any osns, but was observed in sus cells and hbcs, among other olfactory and respiratory epithelial cell types. olfactory and respiratory cell types are shown separately.
ace2 and tmprss2 were also co-expressed above chance levels (odds ratio 7.088, p-value 3.74e-57, fisher's exact test). (c) umap representations of 865 detected immature (gng8) and mature (gng13) osns. neither ace2 nor tmprss2 are detected in either population of osns. the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell). (d) umap representations of all cells, depicting the normalized expression of cov-2 related genes ace2 and tmprss2, as well as several cell type markers. ace2 and tmprss2 are expressed in respiratory and olfactory cell types, but not in osns. ace2 and tmprss2 are detected in hbc (krt5) and sustentacular (cyp2a13) cells, as well as other respiratory epithelial cell types, including respiratory ciliated (foxj1) cells. (e) various cov related genes including ace2 and tmprss2, are expressed in respiratory and olfactory cell types, but not in osns. gene expression for ace2 and tmprss2 as well as marker genes for olfactory and respiratory epithelial cell types are shown normalized by their maximum expression across cell types. mhv, mouse hepatitis virus. (f) ace2 immunostaining of human olfactory mucosal biopsy samples. ace2 protein (green) is detected in sustentacular cells and krt5-positive basal cells (red; white arrowhead). nuclei were stained with dapi (blue). bar = 25 µm. the ace2 and krt5 channels from the box on the left are shown individually on the right (g) gene expression across cell types and tissues in durante figure s4 ). the tissues correspond to progressive positions along the airway from nasal to distal lung. ace2 expression in olfactory hbc and sustentacular cells is comparable to that observed in other cell types in the respiratory tract. to further explore the distribution of cov-2 cell entry genes in the olfactory system we turned to the mouse, which enables interrogative experiments not possible in humans. 
to evaluate whether the expression patterns observed in the mouse correspond to those observed in the human oe, we examined published datasets in which rna-seq was independently performed on mouse wom and on purified populations of mature osns (39-41). the cov-2 receptor ace2 and the protease tmprss2 were expressed in wom, as were the cathepsins ctsb and ctsl (figures 3a and s3a) (39). however, expression of these genes (with the exception of ctsb) was much lower, and ace2 expression was nearly absent, in purified osn samples (figures 3a and s3a, see legend for counts). genes used for cell entry by other covs (except st3gal4) were also expressed in wom, and de-enriched in purified osns. the de-enrichment of ace2 and tmprss2 in osns relative to wom was also observed in two other mouse rna-seq datasets (40, 41) (figure s3b). these data demonstrate that, as in humans, ace2 and other cov-2 entry-related genes are expressed in the mouse olfactory epithelium. the presence of ace2 and tmprss2 transcripts in mouse wom and their (near total) absence in purified osns suggest that the molecular components that enable cov-2 entry into cells are expressed in non-neuronal cell types in the mouse nasal epithelium. to identify the specific cell types that express ace2 and tmprss2, we performed scseq (via drop-seq, see methods) on mouse wom (figure 3b). these results were consistent with observations made in the human epithelium: ace2 and tmprss2 were expressed in a fraction of sustentacular and bowman's gland cells, and a very small fraction of stem cells, but not in osns (zero of 17,666 identified mature osns, figures 3c and s3c-d). of note, only dorsally-located sustentacular cells, which express the markers sult1c1 and acsm4, were positive for ace2 (figures 3d and s3d-e). indeed, reanalysis of the ace2+ subset of human sustentacular cells revealed that all positive cells expressed genetic markers associated with the dorsal epithelium (figure s1d).
an independent mouse scseq data set (obtained using the 10x chromium platform, see methods) revealed that olfactory sensory neurons did not express ace2 (2 of 28769 mature osns were positive for ace2), while expression was observed in a fraction of bowman's gland cells and hbcs ( figure s4 , see methods). expression in sustentacular cells was not observed in this dataset, which included relatively few dorsal sustentacular cells (a possible consequence of the specific cell isolation procedure associated with the 10x platform, which distinguishes it from drop-seq; compare figures s4c and 3d) . staining of the mouse wom with anti-ace2 antibodies confirmed that ace2 protein is expressed in sustentacular cells and is specifically localized to the sustentacular cell microvilli ( figure 3e -k). ace2+ sustentacular cells were identified exclusively within the dorsal subregion of the oe; critically, within that region many (and possibly all) sustentacular cells expressed ace2 ( figure 3f -g). this observation is consistent with the possibility that ace2 protein can be broadly expressed in cell populations that exhibit sparse expression when characterized by scseq. staining was also observed in bowman's gland cells but not in osns ( figure 3h -j). taken together, these data demonstrate that ace2 is expressed by sustentacular cells that specifically reside in the dorsal epithelium in both mouse and human. considered positive if any transcripts (umis) were expressed for a given gene. sustentacular cells (sus) from dorsal and ventral zones are quantified separately. ace2 is detected in dorsal sustentacular, bowman's gland, hbcs, as well as respiratory cell types. (d) umap representation of sustentacular cells, with expression of cov-2 related genes ace2 and tmprss2, as well as marker genes for sus (both pan-sus marker cbr2 and dorsal specific marker sult1c1) indicated. 
each point represents an individual sustentacular cell, and the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell; in this plot ace2 expression is binarized for visualization purposes). ace2-positive sustentacular cells are found within the dorsal sult1c1positive subset. umap plots for other cell types are shown in figure s2 . (e) ace2 immunostaining of mouse main olfactory epithelium. as shown in this epithelial hemisection, ace2 protein is detected in the dorsal zone and respiratory epithelium. note that the punctate ace2 staining beneath the epithelial layer is likely associated with vasculature (see figure 5f ). bar = 500 µm. arrowheads depict the edges of ace2 expression, corresponding to the presumptive dorsal zone (confirmed in g). viral injury can lead to broad changes in oe physiology that are accompanied by recruitment of stem cell populations tasked with regenerating the epithelium (13, 30) . to characterize the distribution of ace2 expression under similar circumstances, we injured the oe by treating mice with methimazole (which specifically ablates osns), and then employed a previously established lineage tracing protocol to perform scseq on hbcs and their descendants during subsequent regeneration (see methods) (32) . this analysis revealed that after injury ace2 and tmprss2 are expressed in subsets of sustentacular cells and hbcs, as well as in the activated hbcs that serve to regenerate the epithelium (figures 4a-c and s5; note that activated hbcs express ace2 at higher levels than resting hbcs). analysis of the ace2+ sustentacular cell population revealed expression of dorsal epithelial markers ( figure 4d ). to validate these results, we re-analyzed a similar lineage tracing dataset in which identified hbcs and their progeny were subject to smart-seq2-based deep sequencing, which is more sensitive than scseq (32) . 
in this dataset, ace2 was detected in more than 0.7% of gbcs, nearly 2% of activated hbcs and nearly 3% of sustentacular cells but was not detected in osns ( figures s5b) . furthermore, larger percentages of hbcs, gbcs and sustentacular cells expressed tmprss2. immunostaining with anti-ace2 antibodies confirmed that ace2 protein was present in activated stem cells under these regeneration conditions ( figure 4e ). these results demonstrate that activated stem cells recruited during injury express ace2, and do so at higher levels than those in resting stem cells. given the potential for the re and oe in the nasal cavity to be directly infected with cov-2, we assessed the expression of ace2 and other cov entry genes in the mouse olfactory bulb (ob), which is directly connected to osns via cranial nerve i (cn i); in principle, alterations in bulb function could cause anosmia independently of functional changes in the oe. to do so, we performed scseq (using drop-seq, see methods) on the mouse ob, and merged these data with a previously published ob scseq analysis, yielding a dataset with nearly 50,000 single cells (see methods) (42) . this analysis revealed that ace2 expression was absent from ob neurons and instead was observed only in vascular cells, predominantly in pericytes, which are involved in blood pressure regulation, maintenance of the blood-brain barrier, and inflammatory responses (figures 5a-d and s6-7) (43) . although other potential cov proteases were expressed in the ob, tmprss2 was not expressed. we also performed smart-seq2-based deep sequencing of single ob dopaminergic juxtaglomerular neurons, a population of local interneurons in the ob glomerular layer that (like tufted cells) can receive direct monosynaptic input from nose osns (figures 5e and s8, see methods); these experiments confirmed the absence of ace2 and tmprss2 expression in this cell type. 
immunostaining in the ob revealed that blood vessels expressed high levels of ace2 protein, particularly in pericytes; consistent with the scseq results, staining was not observed in any neuronal cell type ( figure 5f ). these observations may also hold true for other brain regions, as re-analysis of 10 deeply sequenced scseq datasets from different regions of the nervous system demonstrated that ace2 and tmprss2 expression is absent from neurons, consistent with prior immunostaining results ( figure s9 ) (44) . given the extensive similarities detailed above in expression patterns for ace2 and tmprss2 in the mouse and human, these findings (performed in mouse) suggest that ob neurons are likely not a primary site of infection, but that vascular pericytes may be sensitive to cov-2 infection in the ob. here we show that subsets of oe sustentacular cells, hbcs, and bowman's gland cells in both mouse and human samples express the cov-2 receptor ace2 and the spike protein protease tmprss2. human oe sustentacular cells express these genes at levels comparable to those observed in lung cells. in contrast, we failed to detect ace2 expression in mature osns at either the transcript or protein levels. these observations suggest that cov-2 does not directly enter osns, but instead may target oe support and stem cells. similarly, neurons in the ob do not express ace2, whereas vascular pericytes do. thus primary infection of non-neuronal cell types -rather than sensory or bulb neurons -may be responsible for anosmia and related disturbances in odor perception in covid-19 patients. the identification of non-neuronal cell types in the oe and bulb susceptible to cov-2 infection suggests four possible, non-mutually-exclusive mechanisms for the acute loss of smell reported in covid-19 patients. 
first, local infection of support and vascular cells in the nose and bulb could cause significant inflammatory responses whose downstream effects could block effective odor conduction, or alter the function of osns or bulb neurons (14, 45). second, damage to support cells (which are responsible for local water and ion balance) could indirectly influence signaling from osns to the brain (46). third, damage to sustentacular cells and bowman's gland cells in mouse models can lead to diffuse architectural damage to the entire oe, which in turn could abrogate smell perception (47). finally, vascular damage could lead to hypoperfusion and inflammation, leading to changes in ob function. immunostaining in the mouse suggests that ace2 protein is (nearly) ubiquitously expressed in sustentacular cells in the dorsal oe, despite sparse detection of ace2 transcripts using scseq. similarly, nearly all vascular cells positive for a pericyte marker also expressed ace2 protein, although only a fraction of ob pericytes were positive for ace2 message when assessed using scseq. although ace2 transcripts were more rarely detected than protein, there was a clear concordance at the cell type level: expression of ace2 mrna in a particular cell type accurately predicted the presence of ace2 protein, while ace2 transcript-negative cell types (including osns) did not express ace2 protein. if humans also exhibit a similar relationship between mrna and protein (a reasonable possibility given the precise match in olfactory cell types that express cov-2 cell entry genes between the two species), then ace2 protein is likely to be broadly expressed in human dorsal sustentacular cells. thus, there may be many sustentacular cells available for cov-2 infection in the human epithelium (which in turn could recruit a diffuse inflammatory process). that said, it remains possible that damage to the oe could be caused by more limited cell infection.
for example, infection of subsets of sustentacular cells by the sdav coronavirus in rats ultimately leads to disruption of the global architecture of the oe, suggesting that focal coronavirus infection may be sufficient to cause diffuse epithelial damage (47) . the natural history of cov2-induced anosmia is only now being defined; while recovery of smell has been reported, it remains unclear whether in a subset of patients smell disturbances will be long-lasting or permanent (8) (9) (10) (11) (12) 48) . we observe that activated hbcs, which are recruited after injury, express ace2 at higher levels than those apparent in resting stem cells. while on its own it is likely that infection of stem cells would not cause acute smell deficits, in the context of infection the dual challenge of loss of sustentacular cells, together with the inability to effectively renew the oe over time, could result in persistent anosmia. many viruses, including coronaviruses, have been shown to propagate from the nasal epithelium past the cribriform plate to infect the ob; this form of central infection has been suggested to mediate olfactory deficits, even in the absence of lasting oe damage (18, (49) (50) (51) (52) (53) . the rodent coronavirus mhv passes from the nose to the bulb, even though rodent osns do not express ceacam1, the main mhv receptor (50, 54) ( figures s3c, s4e, s5a) , suggesting that covs in the nasal mucosa can reach the brain through mechanisms independent of axonal transport by sensory nerves; interestingly, ob dopaminergic juxtaglomerular cells express ceacam1 ( figure 4e ), which likely supports the ability of mhv to target the bulb and change odor perception. one speculative possibility is that local seeding of the oe with cov-2-infected cells can result in osn-independent transfer of virions from the nose to the bulb, perhaps via the vascular supply shared between the ob and the osn axons that comprise cn i. 
although cn i was not directly queried in our datasets, it is reasonable to infer that vascular pericytes in cn i also express ace2, which suggests a possible route of entry for cov-2 from the nose into the brain. given the absence of ace2 in ob neurons, we speculate that any central olfactory dysfunction in covid-19 is the secondary consequence of pericyte-mediated vascular inflammation (43) . we note several caveats that temper our conclusions. although current data suggest that ace2 is the most likely receptor for cov-2 in vivo, it is possible (although it has not yet been demonstrated) that other molecules such as bsg may enable cov-2 entry independently of ace2 ( figures 2e, s3c , s4e, s5a) (55, 56) . in addition, it has recently been reported that low level expression of ace2 can support cov-2 cell entry (57); it is possible, therefore, that ace2 expression beneath the level of detection in our assays may yet enable cov-2 infection of apparently ace2 negative cell types. we also propose that damage to the olfactory system is either due to primary infection or secondary inflammation; it is possible (although has not yet been demonstrated) that cells infected with cov-2 can form syncytia with cells that do not express ace2. such a mechanism could damage neurons adjacent to infected cells. any reasonable pathophysiological mechanism for covid-19-associated anosmia must account for the high penetrance of smell disorders relative to endemic viruses, the apparent suddenness of smell loss (which can precede the development of other symptoms), and the transient nature of dysfunction in many patients (8) (9) (10) (11) (12) (11, (13) (14) (15) ; definitive identification of the disease mechanisms underlying covid-19-mediated anosmia will require additional research. nonetheless, our identification of cells in the oe and ob expressing molecules known to be involved in cov-2 entry illuminates a path forward for future studies. human scseq data from durante et al. 
(37) was downloaded from the geo at accession gse139522. 10x genomics mtx files were filtered to remove any cells with fewer than 500 total counts. additional preprocessing was performed as described above, including total counts normalization and filtering for highly variable genes using the spring gene filtering function "filter_genes" with parameters (90, 3, 10) . the resulting data were visualized in spring and partitioned using louvain clustering on the spring k-nearest-neighbor graph. four clusters were removed for quality control, including two with low total counts (likely background) and two with high mitochondrial counts (likely stressed or dying cells). putative doublets were also identified using scrublet and removed (7% of cells). the remaining cells were projected to 40 dimensions using pca. pca-batch-correction was performed using patient 4 as a reference, as previously described (58) . the filtered data were then re-partitioned using louvain clustering on the spring graph and each cluster was annotated using known marker genes, as described in (37) . for example, immature and mature osns were identified via their expression of gng8 and gng13, respectively. hbcs were identified via the expression of krt5 and tp63 and olfactory hbcs were distinguished from respiratory hbcs via the expression of cxcl14 and meg3. identification of sus cells (cyp2a13, cyp2j2), bowman's gland (sox9, gpx3), and mv ionocytes-like cells (ascl3, cftr, foxi1) was also performed using known marker genes. for visualization, the top 40 principal components were reduced to two dimensions using umap with parameters (n_neighbors=15, min_dist=0.4). the filtered human scseq dataset contained 33358 cells. each of the samples contained cells from both the olfactory and respiratory epithelium, although the frequency of osns and respiratory cells varied across patients, as previously described (37) . 295 cells expressed ace2 and 4953 cells expressed tmprss2. 
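the co-expression statistic reported in the figure 2 legend (odds ratio 7.088, fisher's exact test) can be illustrated from the cell counts given here. the sketch below is a plain-python illustration, not the authors' analysis code. the number of cells expressing both genes (161) is an assumption: it is not stated in the text, and is simply the overlap that reproduces the reported odds ratio from the stated totals. the p-value computed is the one-sided hypergeometric tail, which may differ from the reported value if that test was two-sided.

```python
from math import comb


def odds_ratio(both, only_a, only_b, neither):
    """Sample odds ratio of a 2x2 contingency table."""
    return (both * neither) / (only_a * only_b)


def fisher_one_sided(both, n_a, n_b, n_total):
    """P(overlap >= both) under the hypergeometric null of independence."""
    denom = comb(n_total, n_b)
    numer = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(both, min(n_a, n_b) + 1)
    )
    return numer / denom  # exact big-integer ratio, converted to float


N_CELLS = 33358    # cells in the filtered human dataset (from the text)
N_ACE2 = 295       # cells expressing ace2 (from the text)
N_TMPRSS2 = 4953   # cells expressing tmprss2 (from the text)
N_BOTH = 161       # ASSUMPTION: inferred overlap, not stated in the text

or_value = odds_ratio(
    N_BOTH,
    N_ACE2 - N_BOTH,
    N_TMPRSS2 - N_BOTH,
    N_CELLS - N_ACE2 - N_TMPRSS2 + N_BOTH,
)
p_value = fisher_one_sided(N_BOTH, N_ACE2, N_TMPRSS2, N_CELLS)
```

under independence roughly 44 co-expressing cells would be expected (295 × 4953 / 33358), so an overlap of this size corresponds to strong enrichment.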
of the 865 identified osns, including both immature and mature cells, none expressed ace2 and only 2 (0.23%) expressed tmprss2. in contrast, ace2 was reliably detected in at least 2% and tmprss2 was expressed in close to 50% of multiple respiratory epithelial subtypes. the expression of both known cell type markers and known cov-related genes was also examined across respiratory and olfactory epithelial cell types. for these gene sets, the mean expression in each cell type was calculated and normalized by the maximum across cell types.

data from deprez et al. (34) were downloaded from the human cell atlas website (https://www.genomique.eu/cellbrowser/hca/; "single-cell atlas of the airway epithelium (grch38 human genome)"). a subset of these data was combined with a subset of the durante data for mapping between cell types. for the deprez data, the subset consisted of samples from the nasal re that belonged to a cell type with >20 cells, including basal, cycling basal, suprabasal, secretory, mucous multiciliated cells, multiciliated, sms goblet and ionocyte. we observed two distinct subpopulations of basal cells, with one of the two populations distinguished by expression of cxcl14. the cells in this population were manually identified using spring and defined for downstream analysis as a separate cell type annotation called "basal (cxcl14+)". for the durante data, the subset consisted of cells from cell types that had some putative similarity to cells in the deprez dataset, including olfactory hbc, cycling respiratory hbc, respiratory hbc, early respiratory secretory cells, respiratory secretory cells, sustentacular cells, bowman's gland, and olfactory microvillar cells. to establish a cell type mapping: 1) durante (37) and deprez (34) data were combined and gene expression values were linearly scaled so that all cells across datasets had the same total counts.
pca was then performed using highly variable genes (n=1477 genes) and pca-batch-correction (58) with the durante data as a reference set. 3) the table of votes t was z-scored against a null distribution, generated by repeating the procedure above 1000 times with shuffled cell type labels. the resulting z-scores were similar between the two possible mapping directions (durante -> deprez vs. deprez -> durante; r=0.87 pearson correlation of mapping z-scores). the mapping z-scores were also highly robust upon varying the number of votes cast per cell (r>0.98 correlation of mapping z-scores upon changing the vote numbers to 1 or 50 as opposed to 5). only cell-type correspondences with a high z-score in both mapping directions (z-score > 25) were used for downstream analysis. to establish a common scale of gene expression between datasets, we restricted to cell type correspondences that were supported both by bioinformatic mapping and shared a nominal cell type designation based on marker genes. these included: basal/suprabasal cells = "respiratory hbcs" from durante et al., and "basal" and "suprabasal" cells from deprez et al. we next sought a transformation of the durante data so that it would agree with the deprez data within the corresponding cell types identified above. to account for differing normalization strategies applied to each dataset prior to download (log normalization and rescaling with cell-specific factors for deprez et al. but not for durante et al.), we used an ansatz for the transformation in which the pseudocount p is a global latent parameter and the rescaling factors are fit to each gene separately. here t denotes the transformation, which is applied to the gene expression value for cell i and gene j in the durante data. the parameter p was fit by maximizing the correlation of average gene expression across all genes between each of the cell type correspondences listed above. the rescaling factors were then fitted separately for each gene by taking the quotient of average gene expression between the deprez data and the log-transformed durante data, again across the cell type correspondences above.

normalized gene expression tables were obtained from previously published datasets (36, 39-41). for the mouse data sets, the means of the replicates from wom or osn were used to calculate log2 fold changes. for the mouse data from saraiva et al. and the primate data sets (36, 39), the normalized counts of the genes of interest from individual replicates were plotted. below is a table with detailed sample information. sample information for the bulk rna-seq data analyzed in this study

a new dataset of whole olfactory mucosa scseq was generated from adult male mice (8-12 weeks old). all mouse husbandry and experiments were performed following institutional and federal guidelines and approved by harvard medical school's institutional animal care and use committee (iacuc). briefly, dissected main olfactory epithelium was cleaned up in 750 µl of ebss (worthington) and epithelium tissues were isolated in 750 µl of papain (20 u/ml in ebss) and 50 µl of dnase i (2000 u/ml). tissue pieces were transferred to a 5 ml round-bottom tube (bd) and 1.75 ml of papain and 450 µl of dnase i were added. after 1-1.5 hour incubation with rocking at 37°c, the suspension was triturated with a 5 ml pipette 15 times and passed through a 40 µm cell strainer (bd), and the strainer was washed with 1 ml of dmem + 10 % fbs (invitrogen). the cell suspension was centrifuged at 300g for 5 min. cells were resuspended with 4 ml of dmem + 10 % fbs and centrifuged at 300g for 5 min. cells were suspended with pbs + 0.01 % bsa and concentration was measured by hemocytometer. drop-seq experiments were performed as previously described (59). microfluidics devices were obtained from flowjem and barcode beads were obtained from chemgenes.
eight 15-min drop-seq runs were collected in total, obtained from 5 mice. the 8 replicate drop-seq samples were sequenced across 5 runs on an illumina nextseq 500 platform. paired-end reads from the fastq files were trimmed, aligned, and tagged via the drop-seq tools (v1.13) pipeline, using star (v2.5.4a) with genomic indices from ensembl release 93. the digital gene expression matrix was generated for 4,000 cells for 0126_2, 5,000 cells for 0105, 0126_1, 051916_ds11, 051916_ds12, 051916_ds22, 5,500 cells for 051916_ds21, and 9,500 cells for 0106. processing of the wom drop-seq samples was performed in seurat (v2.3.1). cells with fewer than 500 umis or more than 15,000 umis, or with more than 5% mitochondrial genes, were removed. potential doublets were removed using scrublet. cells were initially preprocessed using the seurat pipeline. variable genes (identified with "findvariablegenes", y.cutoff = 0.6) were scaled (regressing out effects due to numi, the percent of mitochondrial genes, and replicate ids) and the data were clustered using 50 pcs with the louvain algorithm (resolution=0.8). in a fraction of sustentacular cells, we observed co-expression of markers for sustentacular cells and other cell types (e.g. osns). re-clustering of sustentacular cells alone separated out these presumed doublets from the rest of the sustentacular cells, and the presumed doublets were removed for the analyses described below. the filtered cells from the preprocessing steps were reanalyzed in python using scanpy and spring. in brief, the raw gene counts in each cell were total counts normalized and variable genes were identified using the spring gene filtering function "filter_genes" with parameters (85, 3, 3); mitochondrial and olfactory receptor genes were excluded from the variable gene lists. the resulting 2083 variable genes were z-scored and the dimensionality of the data was reduced to 35 via principal component analysis.
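the cell filtering, total counts normalization and z-scoring steps described above can be sketched in a few lines. this is a minimal plain-python illustration under stated assumptions, not the seurat, scanpy or spring code used in the study: the function names are hypothetical, the thresholds are the ones quoted in the text, mitochondrial content is taken as the fraction of a cell's umis that map to mitochondrial genes, and the z-score uses the population standard deviation.

```python
from statistics import mean, pstdev


def passes_qc(total_umis, mito_umis,
              min_umis=500, max_umis=15000, max_mito_frac=0.05):
    """Keep cells with 500-15,000 UMIs and at most 5% mitochondrial UMIs."""
    if total_umis < min_umis or total_umis > max_umis:
        return False
    return mito_umis / total_umis <= max_mito_frac


def total_count_normalize(cell_counts):
    """Divide each gene's UMI count by the cell's total UMI count."""
    total = sum(cell_counts)
    return [count / total for count in cell_counts]


def zscore(values):
    """Z-score one gene across cells (population standard deviation)."""
    mu = mean(values)
    sd = pstdev(values)
    # A gene with no variation carries no signal; map it to zeros.
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


# Toy example (hypothetical values, for illustration only).
kept = passes_qc(total_umis=1000, mito_umis=20)
normalized = total_count_normalize([1, 3])
scaled = zscore([1.0, 3.0])
```

normalizing by total counts before z-scoring keeps differences in sequencing depth between cells from dominating the principal components.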
the k-nearest neighbor graph (n_neighbors=15) of these 35 pcs was clustered using the leiden algorithm (resolution=1.2) and was reduced to two dimensions for visualization via the umap method (min_dist=0.42). clusters were manually annotated on the basis of known marker genes and those sharing markers (e.g. olfactory sensory neurons) were merged. the mouse wom drop-seq dataset contained 29585 cells that passed the above filtering. each of the 16 clusters identified contained cells from all 8 replicates in roughly equal proportions. of the 17666 mature osns and the 4674 immature osns, none expressed ace2. in contrast, among the olfactory epithelial cells, ace2 expression was observed in bowman's gland cells, olfactory hbcs and dorsal sustentacular cells.
mice were sacrificed with a lethal dose of xylazine, and nasal epithelium with attached olfactory bulbs was dissected and fixed in 4% paraformaldehyde (electron microscopy sciences, 19202) in phosphate-buffered saline (pbs) overnight at 4°c or for 2 hours at room temperature. tissues were washed in pbs 3 times (5 min each) and incubated in 0.45m edta in pbs overnight at 4°c. the following day, tissues were rinsed with pbs and incubated in 30 % sucrose in pbs for at least 30 min, transferred to tissue freezing medium (vwr, 15146-025) for at least 45 min, frozen on crushed dry ice and stored at -80°c until sectioning. tissue sections (20 µm thick for the olfactory bulb and 12 µm thick for nasal epithelium) were collected on superfrost plus glass slides (vwr, 48311703) and stored at -80°c until immunostaining. for methimazole-treated samples, adult c57bl/6j mice (6-12 weeks old, jax stock no. 000664) were given intraperitoneal injections of methimazole (sigma m8506) at 50 µg/g body weight and sacrificed at 24, 48, and 96-hour timepoints. sections were permeabilized with 0.1% triton x-100 in pbs for 20 min, then rinsed 3 times in pbs.
sections were then incubated for 45-60 min in blocking solution that consisted of pbs containing 3% bovine serum albumin (jackson immunoresearch, 001-000-162) and 3% donkey serum (jackson immunoresearch, 017-000-121) at room temperature, followed by overnight incubation at 4°c with primary antibodies diluted in the same blocking solution. primary antibodies used are as follows. after secondary antibody incubation, sections were washed twice for 5-10 min in pbs, incubated with 300 nm dapi in pbs for 10 min and then rinsed with pbs. slides were mounted with glass coverslips using vectashield mounting medium (vector laboratories, h-1000) or prolong diamond antifade mountant (invitrogen, p36961). for co-staining of ace2 and nqo1, slides were first stained with ace2 primary antibody and donkey anti-goat igg alexa 488 secondary. after 3 washes of secondary antibody, tissues were incubated with unconjugated donkey anti-goat igg fab fragments (jackson immunoresearch, 705-007-003) at 30 µg/ml diluted in blocking solution for 1 hour at room temperature. tissues were washed twice with pbs, once in blocking solution, and incubated in blocking solution for 30-40 min at room temperature, followed by a second round of staining with the nqo1 primary antibody and donkey anti-goat igg alexa 555 secondary antibody. confocal images were acquired using a leica spe microscope (harvard medical school neurobiology imaging facility) with 405 nm, 488 nm, 561 nm, and 635 nm laser lines. multi-slice z-stack images were acquired, and their maximal intensity projections are shown. for figure 3e , tiled images were acquired and stitched by the leica las x software. images were processed using fiji imagej software (60) , and noisy images were median-smoothed using the remove outliers function built into fiji. 
sult1c1 rna was detected by fluorescent rnascope assay (advanced cell diagnostics, kit 320851) using probe 539921-c2, following the manufacturer's protocol (rnascope fluorescent multiplex kit user manual, 320293-um date 03142017) for paraformaldehyde-fixed tissue. prior to initiating the hybridization protocol, the tissue was pre-treated with two successive incubations (first 30 min, then 15 min long) in rnascope protease iii (advanced cell diagnostics, 322337) at 40°c, then washed in distilled water. at the end of the protocol, the tissue was washed in pbs and subjected to the 2-day immunostaining protocol described above.
human olfactory mucosa biopsies were obtained via an irb-approved protocol at duke university school of medicine, from nasal septum or superior turbinate during endoscopic sinus surgery. tissue was fixed with 4% paraformaldehyde and cryosectioned at 10 µm, and sections were processed for immunostaining as previously described (37). sections from a female nasal septum biopsy were stained for ace2 (figure 2f) using the same goat anti-ace2 (thermo fisher, pa5-47488, 1:40) and the protocol described above for mouse tissue. the human sections were co-stained with rabbit anti-keratin 5 (abcam, ab24647; ab_448212, 1:1000) and were detected with alexafluor 488 donkey anti-goat (jackson immunoresearch, 705-545-147) and alexafluor 594 donkey anti-rabbit (jackson immunoresearch, 711-585-152) secondary antibodies (1:300). as further validation of ace2 expression and to confirm the lack of ace2 expression in human olfactory sensory neurons (figure s1e), sections were stained with a rabbit anti-ace2 (abcam, ab15348; rrid:ab_301861, used at 1:100) antibody raised against human ace2 and a mouse tuj1 antibody against neuron-specific tubulin (biolegend, 801201; rrid:ab_2313773).
anti-ace2 was raised against a c-terminal synthetic peptide for human ace2 and was validated by the manufacturer to not cross-react with ace1 for immunohistochemical labeling of ace2 in fruit bat nasal tissue as well as in human lower airway. recombinant human ace2 abolished labeling with this antibody in a previous study in human tissue, further demonstrating its specificity (61) . the tuj1 antibody was validated, as previously described (37) . biotinylated secondary antibodies (vector labs), avidin-biotinylated horseradish peroxidase kit (vector) followed by fluorescein tyramide signal amplification (perkin elmer) were applied per manufacturer's instructions. for dual staining, tuj1 was visualized using alexafluor 594 goat anti-mouse (jackson immunoresearch, 115-585-146; rrid: ab_2338881). human sections were counterstained with 4',6-diamidino-2-phenylindole (dapi) and coverslips were mounted using prolong gold (invitrogen) for imaging, using a leica dmi8 microscope system. images were processed using fiji imagej software (nih). scale bars were applied directly from the leica acquisition software metadata in imagej tools. unsharp mask was applied in imagej, and brightness/contrast was adjusted globally. 2 month-old and 18 month-old wild type c57bl/6j mice were obtained from the national institute on aging aged rodent colony and used for the wom experiments; each experimental condition consisted of one male and one female mouse to aid doublet detection. mice containing the transgenic krt5-creer(t2) driver (62) and rosa26-yfp reporter allele (63) were used for the hbc lineage tracing dataset. all mice were assumed to be of normal immune status. animals were maintained and treated according to federal guidelines under iacuc oversight at the university of california, berkeley. the olfactory epithelium was surgically removed, and the dorsal, sensory portion was dissected and dissociated, as previously described (32) . 
for wom experiments, dissociated cells were subjected to fluorescence-activated cell sorting (facs) using propidium iodide to identify and select against dead or dying cells; 100,000 cells/sample were collected in 10% fbs. for the hbc lineage tracing experiments krt5-creer; rosa26yfp/yfp mice were injected once with tamoxifen (0.25 mg tamoxifen/g body weight) at p21-23 days of age and sacrificed at 24 hours, 48 hours, 96 hours, 7 days and 14 days post-injury, as previously described (32, 64) . for each experimental time point, yfp+ cells were isolated by facs based on yfp expression and negative for propidium iodide, a vital dye. cells isolated by facs were subjected to single-cell rna-seq. three replicates (defined here as a facs collection run) per age were analyzed for the wom experiment; at least two biological replicates were collected for each experimental condition for the hbc lineage tracing experiment. single cell cdna libraries from the isolated cells were prepared using the chromium single cell 3' system according to the manufacturer's instructions. the wom preparation employed v3 chemistry with the following modification: the cell suspension was directly added to the reverse transcription master mix, along with the appropriate volume of water to achieve the approximate cell capture target. the hbc lineage tracing experiments were performed using v2 chemistry. the 0.04% weight/volume bsa washing step was omitted to minimize cell loss. completed libraries were sequenced on illumina hiseq4000 to produce paired-end 100nt reads. sequence data were processed with the 10x genomics cell ranger pipeline (2.0.0 for v2 chemistry), resulting in the initial starting number before filtering of 60,408 wom cells and 25,469 hbc lineage traced cells. 
the scone r/bioconductor package (65) was used to filter out lowly-expressed genes (fewer than 2 umis in fewer than 5 cells) and low-quality libraries (using the metric_sample_filter function with arguments hard_nreads = 2000, zcut = 4). cells with co-expression of male (ddx3y, eif2s3y, kdm5d, and uty) and female marker genes (xist) were removed as potential doublets from the wom dataset. for both datasets, doublet cell detection was performed per sample using doubletfinder (66) and scrublet (67). genes with at least 3 umis in at least 5 cells were used for downstream clustering and cell type identification. for the hbc lineage tracing dataset, the bioconductor package scone was used to pick the top normalization ("none,fq,ruv_k=1,no_bio,batch"), corresponding to full quantile normalization, batch correction and removing one factor of unwanted variation using ruv (68). a range of cluster labels was created by clustering using the partitioning around medoids (pam) algorithm and hierarchical clustering in the clusterexperiment bioconductor package (69), with parameters k0s=(10, 13, 16, 19, 22, 25) and alpha=(na,0.1,0.2,0.3). clusters that did not show differential expression were merged (using the function mergeclusters with arguments mergemethod = 'adjp', cutoff = 0.01, and demethod = 'limma' for the lineage-traced dataset). initial clustering identified one macrophage (msr1+) cluster consisting of 252 cells; upon its removal and restarting from the normalization step, a subsequent set of 15 clusters was obtained. these clusters were used to filter out 1515 cells for which no stable clustering could be found (i.e., 'unassigned' cells), and four clusters consisting of 31, 29, 23 and 305 cells, respectively. doublets were identified using doubletfinder and 271 putative doublets were removed.
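the sex-marker doublet filter described above for the wom dataset, where each condition pooled one male and one female mouse, can be sketched as follows. this is an illustrative sketch in plain python rather than the authors' r code: the function name `is_sex_doublet`, the per-cell dictionary layout, and the choice of "at least one umi" as the definition of marker expression are assumptions.

```python
# Marker genes named in the text; mouse gene symbols are capitalised here.
MALE_MARKERS = ("Ddx3y", "Eif2s3y", "Kdm5d", "Uty")
FEMALE_MARKERS = ("Xist",)


def is_sex_doublet(umi_counts, min_umis=1):
    """Flag a cell that co-expresses male and female marker genes.

    umi_counts maps gene name -> UMI count for a single cell (hypothetical layout).
    Assumption: a marker counts as expressed when it has at least min_umis UMIs.
    """
    male = any(umi_counts.get(g, 0) >= min_umis for g in MALE_MARKERS)
    female = any(umi_counts.get(g, 0) >= min_umis for g in FEMALE_MARKERS)
    return male and female


# Toy cells with invented counts, for illustration only.
cells = {
    "female_cell": {"Xist": 4, "Omp": 50},
    "male_cell": {"Uty": 1, "Ddx3y": 2, "Omp": 40},
    "mixed_cell": {"Xist": 3, "Kdm5d": 1, "Omp": 90},
}
flagged = [name for name, counts in cells.items() if is_sex_doublet(counts)]
```

a filter of this kind only catches doublets formed from one male and one female cell, which is presumably why the text also applies doubletfinder and scrublet.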
inspection of the data in a three-dimensional umap embedding identified two groups of cells whose experimentally sampled timepoint did not match their position along the hbc differentiation trajectory, and these additional 219 cells were also removed from subsequent analyses.
analysis of the wom scseq data was performed in python using the open-source scanpy software, starting from the raw umi count matrix of the 40179 cells passing the initial filtering and qc criteria described above. umis were total-count normalized and scaled by 10,000 (tpt, tags per ten thousand) and then log-normalized. for each gene, the residuals from linear regression models using the total number of umis per cell as predictors were then scaled via z-scoring. pca was then performed on a set of highly variable genes (excluding olfactory receptor genes) calculated using the "highly_variable_genes" function with parameters: min_mean=0.01, max_mean=10, min_disp=0.5. a batch-corrected neighborhood graph was constructed by the "bbknn" function with 42 pcs and the parameter local_connectivity=1.5, and embedded in two dimensions using the umap function with default parameters (min_dist = 0.5). cells were clustered using the neighborhood graph via the leiden algorithm (resolution = 1.2). identified clusters were manually merged and annotated based on known marker gene expression.
the filtered hbc lineage dataset containing 21722 cells was analyzed in python and processed for visualization using pipelines in spring and scanpy (70, 71). in brief, total counts were normalized to the median total counts for each cell and highly variable genes were selected using the spring gene filtering function ("filter_genes") using parameters (90, 3, 3). the dimensionality of the data was reduced to 20 using principal components analysis (pca) and visualized in two dimensions using the umap method with parameters (n_neighbors=20, min_dist=0.5).
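the normalization steps described above for the wom data (total-count normalization to 10,000, log transformation, regression of each gene on total umis per cell, then z-scoring of the residuals) can be written out explicitly. the following is a minimal sketch in plain python, not the scanpy calls used in the study: the helper names `log_tpt`, `regress_out` and `zscore`, the use of log(1 + x) as the log transform, the population standard deviation in the z-score, and the toy count matrix are all assumptions.

```python
import math
from statistics import mean, pstdev


def log_tpt(cell_counts, scale=10_000):
    """Scale one cell's counts to `scale` total UMIs, then apply log(1 + x)."""
    total = sum(cell_counts)
    return [math.log1p(c / total * scale) for c in cell_counts]


def regress_out(y, x):
    """Residuals of y after ordinary least squares on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def zscore(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Toy matrix (rows = cells, columns = genes) with invented counts.
counts = [[5, 0, 15], [10, 10, 30], [0, 2, 6], [20, 4, 40]]
totals = [sum(row) for row in counts]          # total UMIs per cell
normed = [log_tpt(row) for row in counts]      # log-normalized expression
gene0 = [row[0] for row in normed]             # one gene across all cells
residuals = regress_out(gene0, totals)         # remove the depth effect
scaled = zscore(residuals)                     # values passed on to PCA
```

by construction, the residuals are uncorrelated with total umis per cell, so sequencing depth no longer drives the scaled values that enter pca.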
clustering was performed using the leiden algorithm (resolution=1.45) and clusters were merged manually using known marker genes. a cell was defined as expressing a candidate cov-2-related gene if at least one transcript (umi) was detected in that cell, and the percent of cells expressing candidate genes was calculated for each cell type. in the wom dataset, ace2 was detected in only 2 out of 28,769 mature osns (0.007 %), and in the hbc lineage dataset, ace2 was not detected in any osns. furthermore, ace2 was not detected in immature sensory neurons (gbcs, inps, or iosns) in either dataset.
single-cell rna-seq data from hbc-derived cells from fletcher et al. and gadye et al. (32, 64), labeled via krt5-creer driver mice, were downloaded from geo at accession gse99251 using the file "gse95601_oehbcdiff_cufflinks_eset_counts_table.txt.gz". processing was performed as described above, including total counts normalization and filtering for highly variable genes using the spring gene filtering function "filter_genes" with parameters (75, 20, 10). the resulting data were visualized in spring and a subset of cells was removed for quality control, including a cluster of cells with low total counts and another with predominantly reads from ercc spike-in controls. putative doublets were also identified using scrublet and removed (6% of cells) (67). the resulting data were visualized in spring and partitioned using louvain clustering on the spring k-nearest-neighbor graph using the top 40 principal components. cell type annotation was performed manually using the same set of marker genes listed above. three clusters were removed for quality control, including one with low total counts, one with predominantly reads from ercc spike-in controls (likely background), and one with high mitochondrial counts (likely stressed cells).
for visualization and clustering, the remaining cells were projected to 15 dimensions using pca and visualized with umap with parameters (n_neighbors=15, min_dist=0.4, alpha=0.5, maxiter=500). clustering was performed using the leiden algorithm (resolution=0.4) and cell types were manually annotated using known marker genes. the filtered dataset of mouse hbc-derived cells contained 1450 cells. the percent of cells expressing each marker gene was calculated as described above. of the 51 osns identified, none expressed ace2, and only 1 out of 194 inps and iosns expressed ace2. in contrast, ace2 and tmprss2 were both detected in hbcs and sus cells.
single-cell rnaseq data from whole mouse olfactory bulb (42) were downloaded from mousebrain.org/loomfiles_level_l1.html in loom format (l1 olfactory.loom) and converted to a seurat object. samples were obtained from juvenile mice (age postnatal day 26-29). this dataset comprises 20514 cells passing cell quality filters, excluding 122 cells identified as potential doublets.
a new dataset of whole olfactory bulb scseq was generated from adult male mice (8-12 weeks old). all mouse husbandry and experiments were performed following institutional and federal guidelines and approved by harvard medical school's institutional animal care and use committee (iacuc). briefly, dissected olfactory bulbs (including the accessory olfactory bulb and fractions of the anterior olfactory nucleus) were dissociated in 750 µl of dissociation media (dm: hbss containing 10 mm hepes, 1 mm mgcl2, 33 mm d-glucose) with 28 u/ml papain and 386 u/ml dnase i (worthington). minced tissue pieces were transferred to a 5 ml round-bottom tube (bd). dm was added to a final volume of 3.3 ml and the tissue was mechanically triturated 5 times with a p1000 pipette tip. after a 1-hour incubation with rocking at 37°c, the suspension was triturated with a 10 ml pipette 10 times and 2.3 ml was passed through a 40 µm cell strainer (bd).
the suspension was then mechanically triturated with a p1000 pipette tip 10 times and 800 µl were filtered on the same strainer. the cell suspension was further triturated with a p200 pipette tip 10 times and filtered. 1 ml of quench buffer (22 ml of dm, 2.5 ml of protease inhibitor prepared by resuspending 1 vial of protease inhibitor with 32 ml of dm, and 2000u of dnase i) was added to the suspension and centrifuged at 300g for 5 min. cells were resuspended with 3 ml of quench buffer and overlaid gently on top of 5 ml of protease inhibitor, then spun down at 70g for 10min. the pellet was resuspended using dm supplemented with 0.04 % bsa and spun down at 300g for 5 min. cells were suspended in 400 µl of dm with 0.04 % bsa. drop-seq experiments were performed as previously described (59) . microfluidics devices were obtained from flowjem and barcode beads were obtained from chemgenes. two 15 min drop-seq runs were collected from a single dissociation preparation obtained from 2 mice. two such dissociations were performed, giving 4 total replicates. 4 replicates of drop-seq samples were pooled and sequenced across 3 runs on an illumina nextseq 500 platform. paired end reads from the fastq files were trimmed, aligned, and tagged via the drop-seq tools (1-2.0) pipeline, using star (2.4.2a) with genomic indices from ensembl release 82. the digital gene expression matrix was generated for 8,000 cells per replicate. cells with low numbers of genes (500), low numbers of umis (700) or high numbers of umis (>10000) were removed (6 % of cells). potential doublets were identified via scrublet and removed (3.5 % of cells). overall, this new dataset comprised 27004 cells. raw umi counts from juvenile and adult whole olfactory bulb samples were integrated in seurat (72) . integrating the datasets ensured that clusters with rare cell types could be identified and that corresponding cell types could be accurately matched. 
as described below (see figure s5), although some cell types were observed with different frequencies, the integration procedure yielded stable clusters with cells from both datasets. briefly, raw counts were log-normalised separately and the 10000 most variable genes were identified by variance stabilizing transformation for each dataset. the 4529 variable genes present in both datasets and the first 30 principal components (pcs) were used as features for identifying the integration anchors. the integrated expression matrix was scaled and its dimensionality reduced using pca. based on their percentage of explained variance, the first 28 pcs were chosen for umap visualisation and clustering. graph-based clustering was performed using the louvain algorithm following the standard seurat workflow. cluster stability was analysed with clustree on a range of resolution values (0.4 to 1.4), with 0.6 yielding the most stable set of clusters (73). overall, 26 clusters were identified, the smallest of which contained only 43 cells with gene expression patterns consistent with blood cells, which were excluded from further visualisation plots. clustering the two datasets separately yielded similar results. moreover, the distribution of cells from each dataset across clusters was homogeneous (figure s5) and the clusters corresponded to previous cell class and subtype annotations (42). as previously reported, a small cluster of excitatory neurons (cluster 13) contained neurons from the anterior olfactory nucleus. umap visualisations of expression level for cell class and cell type markers, and for genes coding for coronavirus entry proteins, depict log-normalized umi counts. the heatmap in figure 4b shows the mean expression level for each cell class, normalised to the maximum mean value. the percentage of cells per cell class expressing ace2 was defined as the percentage of cells with at least one umi.
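the two summary statistics just defined, the percentage of cells per class with at least one umi and the per-class mean normalised to the maximum mean, can be sketched in a few lines. this is a minimal illustration in plain python under stated assumptions: the function names `percent_expressing` and `max_normalised_means`, the data layout, and the toy values are invented for the example and are not data or code from the study.

```python
def percent_expressing(counts_by_class):
    """Percentage of cells in each class with at least one UMI for a gene.

    counts_by_class maps class name -> list of per-cell UMI counts for one gene.
    """
    return {
        cls: 100.0 * sum(1 for n in counts if n >= 1) / len(counts)
        for cls, counts in counts_by_class.items()
    }


def max_normalised_means(expr_by_class):
    """Mean expression per class, divided by the largest class mean."""
    means = {cls: sum(vals) / len(vals) for cls, vals in expr_by_class.items()}
    top = max(means.values())
    # Guard against a gene that is not detected in any class.
    return {cls: (m / top if top > 0 else 0.0) for cls, m in means.items()}


# Toy per-cell UMI counts for a single gene (invented values).
toy_counts = {
    "pericytes": [2, 0, 1, 3],
    "neurons": [0, 0, 0, 0],
}
pct = percent_expressing(toy_counts)       # {'pericytes': 75.0, 'neurons': 0.0}
rel = max_normalised_means(toy_counts)     # {'pericytes': 1.0, 'neurons': 0.0}
```

in the text the heatmap is built from expression levels rather than raw counts; the same `max_normalised_means` helper would apply to log-normalized values, with raw counts used here only to keep the example short.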
in cells from both datasets, ace2 was enriched in pericytes but was not detected in neurons.
acute olfactory bulb 300 µm slices were obtained from dat-cre/flox-tdtomato (b6.sjl-slc6a3tm1.1(cre)bkmn/j, jax stock 006660 / b6.cg-gt(rosa)26sortm9(cag-tdtomato)hze, jax stock 007909) p28 mice as previously described (74). as part of a wider study, at p27 these mice had undergone brief 24 h unilateral naris occlusion via a plastic plug insert (n = 5 mice) or were subjected to a sham control manipulation (n = 5 mice); all observed effects here were independent of these treatment groups. single cell suspensions were generated using the neural tissue dissociation kit - postnatal neurons (miltenyi biotec, cat. no. 130-094-802), following the manufacturer's instructions for manual dissociation, using 3 fire-polished pasteur pipettes of progressively smaller diameter. after enzymatic and mechanical dissociations, cells were filtered through a 30 µm cell strainer, centrifuged for 10 minutes at 4°c, resuspended in 500 µl of acsf (in mm: 140 nacl, 1.25 kcl, 1.25 nah2po4, 10 hepes, 25 glucose, 3 mgcl2, 1 cacl2) with channel blockers (0.1 µm ttx, 20 µm cnqx, 50 µm d-apv) and kept on ice to minimise excitotoxicity and cell death. for manual sorting of fluorescently labelled dopaminergic neurons we adapted a previously described protocol (75). 50 µl of single cell suspension was dispersed on 3.5mm petri dishes (with a sylgard-covered base) containing 2 ml of acsf + channel blockers. dishes were left undisturbed for 15 minutes to allow the cells to sink and settle. throughout, dishes were kept on a metal plate on top of ice. tdtomato-positive cells were identified by their red fluorescence under a stereoscope. using a pulled glass capillary pipette attached to a mouthpiece, individual cells were aspirated and transferred to a clean, empty dish containing 2 ml acsf + channel blockers. the same cell was then transferred to a third clean plate, changing pipettes for every plate change.
finally, each individual cell was transferred to a 0.2 ml pcr tube containing 2 µl of lysis buffer (rlt plus, qiagen). the tube was immediately placed on a metal plate sitting on top of dry ice for flash-freezing. collected cells were stored at -80°c until further processing. positive (more than 10 cells) and negative (sample collection procedure without picking a cell) controls were collected for each sorting session. in total, we collected samples from 2.5 hours elapsed between mouse sacrifice and collection of the last cell in any session. samples were processed using a modified version of the smart-seq2 protocol (76). briefly, 1 µl of a 1:2,000,000 dilution of ercc spike-ins (invitrogen, cat. no. 4456740) was added to each sample and mrna was captured using modified oligo-dt biotinylated beads (dynabeads, invitrogen). pcr amplification was performed for 22 cycles. amplified cdna was cleaned with a 0.8:1 ratio of ampure-xp beads (beckman coulter). cdnas were quantified on qubit using hs dna reagents (invitrogen) and selected samples were run on a bioanalyzer hs dna chip (agilent) to evaluate size distribution. for generating the sequencing libraries, individual cdna samples were normalised to 0.2 ng/µl and 1 µl was used for one-quarter standard-sized nextera xt (illumina) tagmentation reactions, with 12 amplification cycles. sample indexing was performed using index sets a and d (illumina). at this point, individual samples were pooled according to their index set. pooled libraries were cleaned using a 0.6:1 ratio of ampure beads and quantified on qubit using hs dna reagents and with the kapa library quantification kits for illumina (roche). samples were sequenced on two separate rapid-runs on hiseq2500 (illumina), generating 100bp paired-end reads. an additional 5 samples were sequenced on miseq (illumina).
paired-end read fastq files were demultiplexed, quality controlled using fastqc (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using trim galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). reads were pseudoaligned and quantified using kallisto (77) against a reference transcriptome from ensembl release 89 (gencode release m17 grcm38.p6) with sequences corresponding to the ercc spike-ins and the cre recombinase and tdt genes added to the index. transcripts were collapsed into genes using the sumacrossfeatures function in scater. cell-level quality control and cell filtering were performed in scater (78). cells with <1000 genes, <100,000 reads, >75% reads mapping to ercc spike-ins, >10% reads mapping to mitochondrial genes or low library complexity were discarded (14% of samples). the population of olfactory bulb cells labelled in dat-tdtomato mice is known to include a minor non-dopaminergic calretinin-positive subgroup (79), so calretinin-expressing cells were excluded from all analyses. the sctransform function in seurat was used to remove technical batch effects.
an analysis of single-cell gene expression data from 10 studies was performed to investigate the expression of genes coding for coronavirus entry proteins in neurons from a range of brain regions and sensory systems. processed gene expression data tables were obtained from scseq studies that evaluated gene expression in retina (gse81905) (80), inner ear sensory epithelium (gse115934) (81, 82) and spiral ganglion (gse114997) (83), ventral midbrain (gse76381) (84), hippocampus (gse100449) (85), cortex (gse107632) (86), hypothalamus (gse74672) (87), visceral motor neurons (gse78845) (88), dorsal root ganglia (gse59739) (89) and spinal cord dorsal horn (gse103840) (90). smart-seq2 sequencing data from vsx2-gfp positive cells was used from the retina dataset. a subset of the expression matrix that corresponds to day 0 (i.e.
control, undisturbed neurons) was used from the layer vi somatosensory cortex dataset. a subset of the data containing neurons from untreated (control) mice was used from the hypothalamic neuron dataset. from the ventral midbrain dopaminergic neuron dataset, a subset comprising dat-cre/tdtomato positive neurons from p28 mice was used. a subset comprising type i neurons from wild type mice was used from the spiral ganglion dataset. the "unclassified" neurons were excluded from the visceral motor neuron dataset. a subset containing neurons that were collected at room temperature was used from the dorsal root ganglia dataset. expression data from dorsal horn neurons obtained from c57/bl6 wild type mice, vgat-cre-tdtomato and vglut2-egfp mouse lines was used from the spinal cord dataset. inspection of all datasets for batch effects was performed using the scater package (version 1.10.1) (78) . publicly available raw count expression matrices were used for the retina, hippocampus, hypothalamus, midbrain, visceral motor neurons and spinal cord datasets, whereas the normalized expression data was used from the inner ear hair cell datasets. for datasets containing raw counts, normalization was performed for each dataset separately by computing pool-based size factors that are subsequently deconvolved to obtain cell-based size factors using the scran package (version 1.10.2) (91). violin plots were generated in scater. (39) . normalized counts for each gene in the whole olfactory mucosa (wom) and olfactory sensory neurons (osns) are shown. each circle represents a biological replicate and each color indicates the category of the gene shown on the right (cov-2 and other covs: genes involved in the entry of these viruses, other categories: marker genes for specific cell types such fig. 2a for three bulk rna-sequencing datasets. mhv, mouse hepatitis virus. left plot is same as fig. 2a except for the addition of ceacam1. 
(c) gene expression for cov-related genes including ace2 and tmprss2 as well as marker genes for olfactory and re subtypes are shown normalized by their maximum expression across cell types. ace2 and tmprss2 are expressed in wom respiratory and non-neuronal olfactory cell types, but not in osns. (d) umap representations of gene expression in the wom dataset for cov-2 related genes ace2 and tmprss2, as well as marker genes for each cell type. each point represents an individual cell, and the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell). (e) fluorescent in situ hybridization of an identified dorsal sustentacular cell marker, sult1c1 (in yellow), combined with immunostaining for the known dorsal osn marker nqo1 (white). note that sult1c1 rna fills the apical cytoplasm; given that sustentacular cells are ubiquitous in the epithelium, this is apparent as broad antisense signal for sult1c in a pattern that is characteristic of the apical anatomy of sustentacular cells. sult1c1 rna is detected in sustentacular cells in the nqo1-positive dorsal olfactory epithelium. nuclei were stained with dapi (blue). bar = 20 µm. 
cluster labels (cluster number in parentheses): granule cells (0); granule cells (1); immature neurons (2); granule cells (3); calretinin neurons (4); astrocytes (5); olfactory ensheathing cells (6); immature neurons (7); interneurons (8); microglia (9); oligodendrocytes (10); dopaminergic neurons (11); interneurons (12); mitral/tufted cells - aon (13); astrocytes (14); vascular (15); oligo precursor cells (16); pericytes (17); external tufted cells (18); mitral/tufted cells (19); perivascular macrophages (20); vip neurons (21); vascular leptomeningeal cells (22); intermediate progenitor cells (23); granule cells (
dataset labels: hypothalamus (romanov); retina (shekhar); inner ear spiral ganglion (shrestha); dorsal root ganglia (usoskin)
clinical characteristics of coronavirus disease 2019 in china the epidemiology and pathogenesis of coronavirus disease (covid-19) outbreak characteristics of and important lessons from the coronavirus disease 2019 (covid-19) outbreak in china: summary of a report of 72 314 cases from the chinese center for disease control and prevention a pneumonia outbreak associated with a new coronavirus of probable bat origin differences and similarities between severe acute respiratory syndrome (sars)-coronavirus (cov) and sars-cov-2. would a rose by another name smell as sweet? coronaviruses - drug discovery and therapeutic options hosts and sources of endemic human coronaviruses coincidence of covid-19 epidemic and olfactory dysfunction outbreak self-reported olfactory and taste disorders in sars-cov-2 patients: a cross-sectional study virological assessment of hospitalized patients with covid-2019 olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (covid-19): a multicenter european study loss of smell and taste in combination with other symptoms is a strong predictor of covid-19 infection.
medrxiv a primer on viral-associated olfactory loss in the era of covid-19 sudden and complete olfactory loss function as a possible symptom of covid-19 european patients with mild-to-moderate coronavirus disease sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme 2 is a functional receptor for the sars coronavirus severe acute respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ace2 efficient replication of severe acute respiratory syndrome coronavirus in mouse cells is limited by murine angiotensin-converting enzyme 2 a crucial role of angiotensin converting enzyme 2 (ace2) in sars coronavirus-induced lung injury cleavage and activation of the severe acute respiratory syndrome coronavirus spike protein by human airway trypsin-like protease influenza and sars-coronavirus activating proteases tmprss2 and hat are expressed at multiple sites in human respiratory and gastrointestinal tracts middle east respiratory syndrome coronavirus and bat coronavirus hku9 both can utilize grp78 for attachment onto host cells the laboratory mouse comparative anatomy, physiology, and function of the upper respiratory tract phagocytic cells in the rat olfactory epithelium after bulbectomy supporting cells as phagocytes in the olfactory epithelium after bulbectomy ionic conductances in sustentacular cells of the mouse olfactory epithelium novel role of cystic fibrosis transmembrane conductance regulator in maintaining adult mouse olfactory neuronal homeostasis olfactory epithelium: cells, clinical disorders, and insights from an adult stem cell niche stem and progenitor cells of the mammalian olfactory epithelium: taking poietic license deconstructing olfactory stem cell trajectories at single-cell resolution sars-cov-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes a single-cell 
atlas of the human healthy airways a cellular census of human lungs identifies novel cell states in health and in asthma a transcriptomic atlas of mammalian olfactory mucosae reveals an evolutionary influence on food odor detection in humans single-cell analysis of olfactory neurogenesis and differentiation in adult humans sars-cov-2 entry genes are most highly expressed in nasal goblet and ciliated cells within human airways hierarchical deconstruction of mouse olfactory sensory neurons: from whole mucosa to single-cell rna-seq deep sequencing of the murine olfactory receptor neuron transcriptome dnmt3a regulates global gene expression in olfactory sensory neurons and enables odorant-induced transcription molecular architecture of the mouse nervous system pericytes and neurovascular function in the healthy and diseased brain tissue distribution of ace2 protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis sudden and complete olfactory loss function as a possible symptom of covid-19 a single-cell atlas of the airway epithelium reveals the cftr-rich pulmonary ionocyte morphologic changes in the nasal cavity associated with sialodacryoadenitis virus infection in the wistar rat early recovery following new onset anosmia during the covid-19 pandemic -an observational cohort study neurologic alterations due to respiratory virus infections functional consequences following infection of the olfactory system by intranasal infusion of the olfactory bulb line variant (oblv) of mouse hepatitis strain jhm systemic diseases and disorders the olfactory nerve and not the trigeminal nerve is the major site of cns entry for mouse hepatitis virus, strain jhm intranasal inoculation with the olfactory bulb line variant of mouse hepatitis virus causes extensive destruction of the olfactory bulb and accelerated turnover of neurons in the olfactory epithelium of mice ceacam1a-/-mice are completely resistant to infection by murine 
coronavirus mouse hepatitis virus a59 function of hab18g/cd147 in invasion of host cells by severe acute respiratory syndrome coronavirus sars-cov-2 invades host cells via a novel route: cd147-spike protein sars-cov-2 productively infects human gut enterocytes lineage tracing on transcriptional landscapes links state to fate during differentiation highly parallel genome-wide expression profiling of individual cells using nanoliter droplets fiji: an open-source platform for biological-image analysis angiotensinconverting enzyme 2 is reduced in alzheimer's disease in association with increasing amyloid-β and tau pathology temporally-controlled site-specific mutagenesis in the basal layer of the epidermis: comparison of the recombinase activity of the tamoxifeninducible cre-er(t) and cre-er(t2) recombinases cre reporter strains produced by targeted insertion of eyfp and ecfp into the rosa26 locus injury activates transient olfactory stem cell states with diverse lineage capacities performance assessment and selection of normalization procedures for single-cell rna-seq doubletfinder: doublet detection in single-cell rna sequencing data using artificial nearest neighbors scrublet: computational identification of cell doublets in single-cell transcriptomic data normalization of rna-seq data using factor analysis of control genes or samples clusterexperiment and rsec: a bioconductor package and framework for clustering of single-cell and other large gene expression datasets spring: a kinetic interface for visualizing high dimensional single-cell expression data scanpy: large-scale single-cell gene expression data analysis comprehensive integration of single-cell data clustering trees: a visualization for evaluating clusterings at multiple resolutions embryonic and postnatal neurogenesis produce functionally distinct subclasses of dopaminergic neuron a manual method for the purification of fluorescently labeled neurons from the mammalian brain separation and parallel 
sequencing of the genomes and transcriptomes of single cells using g&t-seq near-optimal probabilistic rnaseq quantification scater: pre-processing, quality control, normalization and visualization of single-cell rna-seq data in r the transcription factor pax6 regulates survival of dopaminergic olfactory bulb neurons via crystallin αa comprehensive classification of retinal bipolar neurons by single-cell transcriptomics single-cell rna-seq resolves cellular complexity in sensory organs from the neonatal inner ear characterization of spatial and temporal development of type i and type ii hair cells in the mouse utricle using new celltype-specific markers sensory neuron diversity in the inner ear is shaped by activity sensory neuron diversity molecular diversity of midbrain development in resource molecular diversity of midbrain development in mouse, human and stem cells dissociable structural and functional hippocampal outputs via distinct subiculum cell classes variation in activity state, axonal projection, and position define the transcriptional identity of individual neocortical projection neurons molecular interrogation of hypothalamic organization reveals distinct dopamine neuronal subtypes visceral motor neuron diversity delineates a cellular basis for nipple-and pilo-erection muscle control unbiased classification of sensory neuron types by large-scale single-cell rna sequencing neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types pooling across cells to normalize single-cell rna sequencing data with many zero counts we thank members of the datta lab, james schwob, bernardo sabatini, andreas schaefer, kevin franks, michael greenberg and vanessa ruta for helpful comments on the manuscript. we thank james lipscombe and andres crespo for technical support. data and materials availability: reanalyzed datasets are obtained from the urls listed in supplementary materials. 
all data is currently being deposited and will be made publicly accessible from the ncbi geo at accession gse148360. olfactory ensheathing cells (oec) and respiratory cells. (e) gene expression for cov-related genes including ace2 and tmprss2, as well as marker genes for olfactory and re subtypes, shown normalized by their maximum expression across cell types. ace2 and tmprss2 are expressed in wom respiratory and olfactory cell types, but not in osns. (f) cov-2 related genes ace2 and tmprss2, as well as marker genes for cell types in fig. 2c, in umap representation of the wom dataset with normalized expression. gfap-positive oecs (olfactory ensheathing cells) and muc5b-positive secretory cells are indicated by asterisks.
key: cord-005216-potmzdfs authors: sun, dong; wan, xin; pan, bin-bin; sun, qing; ji, xiao-bing; zhang, feng; zhang, hao; cao, chang-chun title: bioinformatics analysis of genes and pathways of cd11b(+)/ly6c(intermediate) macrophages after renal ischemia-reperfusion injury date: 2018-03-15 journal: curr med sci doi: 10.1007/s11596-018-1848-7 sha: doc_id: 5216 cord_uid: potmzdfs renal ischemia-reperfusion injury (iri) is a major cause of acute kidney injury (aki), which can lead to a poor prognosis. the purpose of this study was to characterize the molecular mechanism of the functional changes of cd11b(+)/ly6c(intermediate) macrophages after renal iri. the gene expression profiles of cd11b(+)/ly6c(intermediate) macrophages of sham surgery mice and of mice 4 h, 24 h and 9 days after renal iri were downloaded from the gene expression omnibus database. analysis of mrna expression profiles was conducted to identify differentially expressed genes (degs), biological processes and pathways by the series test of cluster. a protein-protein interaction network was constructed and analysed to discover the key genes. a total of 6738 degs were identified and assigned to 20 model profiles.
profile 13 was one of the predominant expression profiles, and its degs are involved in immune cell chemotaxis and proliferation. signet analysis showed that atp5a1, atp5o, cox4i1, cdc42, rac2 and nhp2 were the key genes involved in oxidation-reduction, apoptosis, migration, m1-m2 differentiation, and proliferation of macrophages. rps18 may be an appropriate reference gene as it was stable in macrophages. the identified degs and their enriched pathways point to factors that may participate in the functional changes of cd11b(+)/ly6c(intermediate) macrophages after renal iri. moreover, the vital gene nhp2 may be involved in the polarization of macrophages, which may be a new target to affect the process of aki bule injury. a subsequent switch to m2 macrophages can suppress the inflammatory response and induce a proliferative repair phase [11]. m2 macrophages are involved in producing extracellular matrix components, but may also contribute to tissue fibrosis should this process become dysregulated [12]. macrophage markers may differ between organs; therefore, defining macrophages according to their function has become challenging [13]. according to whole genome microarray analysis data, the cd11b+/ly6c(high) population was associated with the onset of renal injury and produced proinflammatory cytokines. in addition, the cd11b+/ly6c(int) population demonstrated a wound healing phenotype [14]. dragomir et al reported that cd11b+/ly6c(lo) cells were larger than cd11b+/ly6c(hi) cells, and more irregularly shaped. moreover, cd11b+/ly6c(lo) cells contained a highly vacuolated cytoplasm and an increased cytoplasm:nuclear ratio. rt-pcr analysis revealed that mrna expression levels of proinflammatory proteins (for example, tnf-α, inos, and the chemokine receptor ccr2) were significantly higher in cd11b+/ly6c(hi) cells than in cd11b+/ly6c(lo) cells.
in contrast, the mrna expression level of the anti-inflammatory cytokine il-10 was reduced in cd11b+/ly6c(hi) cells when compared to cd11b+/ly6c(lo) cells [15]. clements et al investigated the genes that were uniquely expressed in each population. it is also necessary to consider genes that were regulated in each phenotype over time [14]. the purpose of our study was to evaluate the proportion of macrophages that changed at different time points after iri. we used microarray analysis to identify the differentially expressed genes (degs) in cd11b+/ly6c(int) macrophages of c57bl/6 mice that underwent sham surgery or were examined 4 h, 24 h or 9 days after iri. we used the series test of cluster (stc) analysis, stc-gene ontology analysis and pathway analysis to identify changes in function and pathways in macrophages in the different groups. the protein-to-protein interaction (ppi) network was applied to select key genes by degree, which may help explain how macrophages were influenced at different time points. our study may provide further insight into a new target that affects the process of aki by changing the macrophage function. the transcription profile gse75808, generated on an affymetrix mouse genome 430 2.0 array platform (affymetrix, usa), was downloaded from the gene expression omnibus (geo) database, an open-access functional genomics data repository. twenty-five male c57bl/6 mice (8 to 10 weeks old) were divided into several groups and underwent either sham surgery or bilateral renal ischemia for 28 min followed by reperfusion. for rna isolation and amplification for cdna production, macrophage populations were sorted on a bd facsaria ii cell sorter (bd bioscience, usa) based on cd11b+/ly6c(high), cd11b+/ly6c(int) or cd11b+/ly6c(low) as previously described [14].
we analysed the expression of cd11b+/ly6c(int) macrophages of the sham (gsm1968232 and gsm1968235), 4 h iri (gsm1968226, gsm1968228 and gsm1968230), 24 h iri (gsm1968216, gsm1968217 and gsm1968218) and 9 day iri (gsm1968219, gsm1968221 and gsm1968223) groups to identify genes that are associated with this subset of macrophages. the probe-level data were converted into expression measures, and the expression values of all probes for a given gene in each sample were reduced to a single value by taking their average [16]. next, principal component analysis (pca) was performed and degs were identified as previously described [17]. in our study, we used the effective statistical method for small samples to identify differentially expressed mrnas among the four groups of macrophages by gcbi (https://www.gcbi.com.cn/gclib/html/index). these values include false discovery rate-adjusted p values and were considered significant when p<0.05. to validate the most probable set of clusters of the four-point time series, we used the stc algorithm of gene expression dynamics to profile the gene expression time series as previously described [18]. the stc algorithm identified which profiles had a significant number of genes assigned, and the result may indicate the pattern of change of the samples at different time points. to identify functional changes in macrophages, we analysed the role of degs in each significant expression profile. analyses were based on the gene ontology (go) database in funrich 3.0 software as previously described [19]. the degs of each significant expression profile were classified into a group of biological process categories from go annotation. go terms were considered significant at p<0.05. we identified the significant pathways that were changed in cd11b+/ly6c(int) macrophages. to identify the main biological function of cd11b+/ly6c(int) macrophages, pathways of genes with a similar expression trend were analysed.
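the probe averaging and false discovery rate steps described above can be sketched as follows. this is a minimal illustration, not the gcbi implementation: the source does not state which correction procedure was applied, so the benjamini-hochberg adjustment is an assumption, and the function names, probe identifiers and numeric values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene (one value per gene)."""
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a common FDR procedure; the
    source text does not name the correction that was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy data: two probes for one gene, one probe for another.
gene_values = collapse_probes(
    {"probe_1": 2.0, "probe_2": 4.0, "probe_3": 5.0},
    {"probe_1": "Nhp2", "probe_2": "Nhp2", "probe_3": "Rps18"},
)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in q_values]
```

with a threshold of 0.05 on the adjusted values, all four toy genes would be retained as degs.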
analyses were based on the reactome database in funrich. the threshold of significance was p<0.05. we created a ppi network to analyse the key genes which regulate other genes. based on the 927 degs in profile 13, ppi was analysed by string 10.5 and the cytohubba app of cytoscape software (version 3.5.1) as previously described [20]. interactions with a combined score >0.4 were retained. genes that showed a high degree were identified as key genes as previously described [21]. cytohubba is an approach that has frequently been used to select hub genes, and provides 11 topological analysis methods, including degree, maximum neighborhood component, edge percolated component, maximal clique centrality, density of maximum neighborhood component, and six centralities (bottleneck, eccentricity, closeness, betweenness, radiality, and stress) based on the shortest paths [22]. genes that appeared among the top 50 genes by more than 6 of these methods in cytohubba were identified. the top 50 hub forming genes were output by each of the ranking methods as a measure of significance. genes were considered significant when they were identified by both methods. we especially focused on the genes of the top 20 degrees in the ppi network as previously described [23]. in this study, we analysed degs from cd11b+/ly6c(int) macrophages, which were isolated from kidneys of mice undergoing sham surgery (n=2), and iri at 4 h, 24 h, and 9 days (n=3 per group). figure 1a and 1b present the data before and after normalization. in fig. 1c, the pca score plots of the four groups are shown. a total of 6738 normalized degs were identified (p<0.05 and q<0.05) at different time points. to identify target genes among the 6738 genes, twenty expression profiles were evaluated by cluster analysis. each profile contained genes with a similar expression pattern after iri. among the 20 profiles of genes, nine profiles (profile 13, 5, 10, 2, 4, 7, 11, 1 and 3) were significantly different (p<0.05) (fig. 2 and table 1). as shown in table 2, go terms that were downregulated after renal iri included cell cycle, inflammatory response, apoptosis, oxidation-reduction process, autophagy, and cell proliferation. profile 13 consisted of genes that were stable at early time points and then rapidly decreased at later time points. to gain insight into the biological effects of the genes in profile 13, we analysed the involved go terms using the go annotation in funrich 3.0 (the threshold of go terms was p<0.05). biological processes such as transport and apoptosis showed the most notable enrichment of the target genes. moreover, chemotaxis of several immune cells was also decreased at day 9 after iri. among cellular components, extracellular vesicular exosome and mitochondrion showed the maximum enrichment, whereas protein binding and rna binding showed the highest enrichment among molecular functions (fig. 3). we analysed the pathway enrichment using the reactome database for degs of profile 13. p<0.05 was considered significant for pathway analysis. the highest pathway enrichment was demonstrated for neutrophil degranulation (p=1.01e-41) and respiratory electron transport (p=1.47e-17) (fig. 4). string analysis was used to obtain the ppi of the 927 degs in profile 13. the minimum required interaction score was 0.4. we obtained a total of 6903 edges and 902 nodes, accounting for 97.30% of all degs. results demonstrated that gapdh (degree: 138), hsp90aa1 (degree: 110), actb (degree: 90), actg1 (degree: 81), atp5a1 (degree: 74), atp5o (degree: 69), cdc42 (degree: 68), pcna (degree: 66), uqcr11 (degree: 65) and cox4i1 (degree: 60), which all have a high degree, were identified as hub genes in the ppi network (fig. 5). moreover, rhoa, rac2, nhp2, rplp0, rpl5, rpl7, ppp2ca, mrpl2, and rps6 were among the top twenty degrees in the ppi network. in addition to string analysis, key genes were also identified by cytohubba.
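the degree-based selection of hub genes and the consensus across ranking methods described above can be sketched as follows. this is a hedged illustration in plain python rather than the string or cytohubba code: the function names, the toy edge list and its scores are hypothetical, and treating 0.4 as an inclusive minimum score is an assumption.

```python
from collections import Counter


def node_degrees(edges, min_score=0.4):
    """Count interaction partners per gene, keeping edges at or above min_score."""
    neighbours = {}
    for gene_a, gene_b, score in edges:
        if score >= min_score and gene_a != gene_b:
            neighbours.setdefault(gene_a, set()).add(gene_b)
            neighbours.setdefault(gene_b, set()).add(gene_a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


def top_by_degree(degrees, n):
    """Return the n genes with the highest degree (ties broken by name)."""
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


def consensus_genes(rankings, top_n=50, min_methods=7):
    """Genes found in the top_n of at least min_methods ranking lists.

    min_methods=7 encodes "more than 6" of the topological methods.
    """
    votes = Counter()
    for ranked_genes in rankings.values():
        votes.update(set(ranked_genes[:top_n]))
    return sorted(gene for gene, count in votes.items() if count >= min_methods)


# Hypothetical toy network: (gene, gene, combined score).
toy_edges = [
    ("Gapdh", "Actb", 0.9),
    ("Gapdh", "Pcna", 0.7),
    ("Gapdh", "Cdc42", 0.5),
    ("Actb", "Cdc42", 0.8),
    ("Pcna", "Nhp2", 0.3),  # dropped: below the score threshold
]
toy_degrees = node_degrees(toy_edges)
toy_hubs = top_by_degree(toy_degrees, 2)

# Share of degs retained as network nodes, using the counts reported above.
node_coverage = round(902 / 927 * 100, 2)
```

in this sketch a gene without any retained edge never enters the network, which is why the number of nodes can be smaller than the number of input degs.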
table 3 presents genes that were ranked among the top 50 key genes by more than 6 of the 11 topological analysis methods described above. because changes in genes with either a high degree or selected by cytohubba could regulate the expression of multiple genes, they were identified as key genes for further study. as aki can result in high mortality, it is important to restore homeostasis in a suitable way [4]. the cd11b+/ly6c(int) population is identified as a wound healing population, which at the same time carries membrane receptors and chemokines associated with inflammation [14]. understanding how the population changes may provide a new way to treat aki by modulating the immune system. in our study, we identified nine gene profiles, most of which were downregulated after iri. the proportion of cd11b+/ly6c(int) macrophages was at a maximum one day after iri and was markedly reduced at day 9 after iri. taking this into consideration, the genes in profile 13 were involved in infiltration and proliferation. go analysis of profile 13 showed that the genes related to the extracellular vesicular exosome were significantly changed, which was consistent with the finding that cd11b+/ly6c(int) macrophages could secrete anti-inflammatory cytokines [15]. go and pathway analysis indicated that immune cell chemotaxis (for example, neutrophil, leukocyte, lymphocyte, monocyte, and macrophage) and g1/s progression were downregulated at later time points. in mitochondria, the oxidation-reduction process and respiratory electron transport were significantly involved after iri, as atp5a1, atp5o, and cox4i1 are also key genes in the ppi network. it would be necessary to further assess the viability, phenotype switch and infiltration ability of cd11b+/ly6c(int) macrophages, given that cdc42 and rac2 were found to have an effect on apoptosis, migration, and m1-m2 differentiation [24, 25].
protein phosphatase 2a (pp2a) is a bona fide tumor suppressor that is involved in mitosis and apoptosis [26]. the homologues ppp2ca (pp2ac) and ppp2cb are the catalytic subunits of pp2a [27]. previous studies have shown that upregulation of ppp2ca in systemic lupus erythematosus decreased il-2 [28], a cytokine involved in the generation of effector and memory t cells and the maintenance of regulatory t cells [29]. moreover, it was suggested that reduction of ppp2ca may play a role in anti-inflammatory progression by increasing osteoprotegerin (opg) expression and decreasing receptor activator of nuclear factor κb ligand (rankl) expression [30]. in prostate cancer cells, loss of ppp2ca facilitated epithelial-to-mesenchymal transition [31]. taken together, the reduction of ppp2ca in profile 13 may be a reason why cd11b+/ly6c(int) macrophages decreased inflammation. rpl5 can induce the cell cycle, and the downregulation of rpl5 at day 9 after iri may provide an explanation for the reduction of cd11b+/ly6c(int) macrophages. moreover, the depletion of rpl5 strongly suppresses cell cycle progression in primary human lung fibroblasts [32]. rpl5 and rpl11 promote apoptosis and reduce cellular proliferation in tumor cells in vitro [33]. however, it has also been reported that a heterozygous deletion or mutation of rpl5 occurred in 11% of glioblastoma, 28% of melanoma and 34% of breast cancer cases [34]. nhp2, one of the key nodes in the ppi network, was significantly decreased when 1-day and 9-day cd11b+/ly6c(int) macrophages were compared. nhp2 is a part of h/aca ribonucleoprotein particles (rnps), which contain four common proteins (the pseudouridine synthase cbf5, nhp2, nop10 and gar1) and a substrate-specific h/aca rna. h/aca is considered to play a role in the biogenesis of spliceosomal small nuclear rna (snrna) and ribosomal rna (rrna) [35].
depletion of nhp2 reduced all h/aca snornas and impaired global rrna pseudouridylation, which play a role in the uridine selection or isomerization processes as they are required for the synthesis and stability of particles [36, 37]. nhp2 is highly expressed in spleen, thymus, small intestine, testis, ovary, prostate, colon (mucosal lining), skeletal muscle, kidney, heart, pancreas, placenta, and brain, whereas the expression levels in the liver are low [38]. nhp2 mrna was barely detectable in peripheral blood leukocytes and lung. during the differentiation of u937 cells into monocytes and macrophages induced by 12-o-tetradecanoylphorbol-13-acetate (tpa), the expression of nhp2 mrna was markedly decreased, but was upregulated during the retro-differentiation [38]. nhp2, htr, htert and nop10 could control telomere homeostasis, which is important for apoptosis and cell-cycle arrest [39, 40]. thus, nhp2 may affect the polarization of macrophages. finally, it is important to identify which genes could serve as reference genes. we analysed the expression levels of gapdh, actb, β2m, hmbs, hprt, rplp0, tbp, gusb, ppia, oaz1, nono, tfrc, eef2, hsp90ab1, rps18, sdha, ywhaz, ubc, rps17, rpl37a, pum1, psmc4, pop4, pgk1, pes1, mrpl9, ipo80, gadd45a, elf1, eif2b1, cdkn1b, cdkn1a, casc3, abl1, and pol2a, which are widely used [41] [42] [43]. the data showed that rps18 may be an appropriate reference gene as its expression was stable in macrophages after renal iri. in our study, a total of 6738 degs were found. we analysed the genes that were stable at early time points but were abruptly downregulated at late time points, investigated the functions and molecular pathways in which they were involved, and studied the critical genes involved. a possible explanation for the reduction of cd11b+/ly6c(int) macrophages on day 9 is that genes involved in chemotaxis and proliferation were decreased.
on the other hand, one of the key genes, nhp2, may be involved in the polarization of macrophages. taken together, our study will increase insight into the functional changes of macrophages; therefore, identifying the critical genes involved may provide novel targets for regulating the quantity and phenotype of macrophages. incidence, outcomes, and risk factors of community-acquired and hospital-acquired acute kidney injury: a retrospective cohort study identifying acute kidney injury in the community--a novel informatics approach netrin-1 regulates the inflammatory response of neutrophils and macrophages, and suppresses ischemic acute kidney injury by inhibiting cox-2-mediated pge2 production long-term outcome of severe acute kidney injury survivors followed by nephrologists in a developing country recognition and management of acute kidney injury in the international society of nephrology 0by25 global snapshot: a multinational cross-sectional study small interfering rna targeting tnf-alpha gene significantly attenuates renal ischemia-reperfusion injury in mice double-negative alphabeta t cells are early responders to aki and are found in human kidney vascular-resident cd169-positive monocytes and macrophages control neutrophil accumulation in the kidney with ischemia-reperfusion injury microrna 26a modulates regulatory t cells expansion and attenuates renal ischemia-reperfusion injury gm-csf promotes macrophage alternative activation after renal ischemia/reperfusion injury distinct macrophage phenotypes contribute to kidney injury and repair macrophage-mediated injury and repair after ischemic kidney injury renal f4/80+ cd11c+ mononuclear phagocytes display phenotypic and functional characteristics of macrophages in health and in adriamycin nephropathy differential ly6c expression after renal ischemia-reperfusion identifies unique macrophage populations role of galectin-3 in classical and alternative macrophage activation in the liver following acetaminophen 
intoxication improved normalization of systematic biases affecting ion current measurements in label-free proteomics data time-series analysis in imatinib-resistant chronic myeloid leukemia k562-cells under different drug treatments differential gene expression profiling analysis in workers occupationally exposed to benzene funrich: an open access standalone functional enrichment and interaction network analysis tool proteomic differences between developmental stages of toxoplasma gondii revealed by itraq-based quantitative transcriptome analysis of chicken kidney tissues following coronavirus avian infectious bronchitis virus infection cytohubba: identifying hub objects and sub-networks from complex interactome effects of beta-catenin on differentially expressed genes in multiple myeloma mif inhibits monocytic movement through a non-canonical receptor and disruption of temporal rho gtpase activities in u-937 cells rac2 controls tumor growth, metastasis and m1-m2 macrophage differentiation in vivo sv40 small t antigen and pp2a phosphatase in cell transformation ppp2ca knockout in mice spermatogenesis protein phosphatase 2a is a negative regulator of il-2 production in patients with systemic lupus erythematosus cutting edge: mechanisms of il-2-dependent maintenance of functional regulatory t cells protein phosphatase 2a calpha is involved in osteoclastogenesis by regulating rankl and opg expression in osteoblasts restoration of ppp2ca expression reverses epithelial-to-mesenchymal transition and suppresses prostate tumour growth and metastasis in an orthotopic mouse model loss of tumor suppressor rpl5/rpl11 does not induce cell cycle arrest but impedes proliferation due to reduced ribosome content and translation capacity ribosomal proteins l11 and l5 activate tap73 by overcoming mdm2 inhibition the ribosomal protein gene rpl5 is a haploinsufficient tumor suppressor in multiple cancer types aca small nucleolar rna pseudouridylation pockets bind substrate rna to form 
three-way junctions that position the target u for modification cb-f5p, a potential pseudouridine synthase, and nhp2p, a putative rna-binding protein, are present together with gar1p in all h box/aca-motif snornps and constitute a common bipartite structure nhp2p and nop10p are essential for the function of h/aca snornps expression of the human homologue of the small nucleolar rna-binding protein nhp2 gene during monocytic differentiation of u937 cells structure of the shq1-cbf5-nop10-gar1 complex and implications for h/aca rnp biogenesis and dyskeratosis congenita telomere maintenance mechanisms and cellular immortalization identification of optimal reference genes for rt-qpcr in the rat hypothalamus and intestine for the study of obesity appropriateness of reference genes for normalizing messenger rna in mouse 2,4-dinitrobenzene sulfonic acid (dnbs)-induced colitis using quantitative real time pcr identification of optimal reference genes for normalization of rt-qpcr data in cancerous and non-cancerous tissues of human uterine cervix the authors report no conflicts of interest. the authors alone are responsible for the content and writing of the paper. key: cord-252147-bvtchcbt authors: domingo-espín, joan; unzueta, ugutz; saccardo, paolo; rodríguez-carmona, escarlata; corchero, josé luís; vázquez, esther; ferrer-miralles, neus title: engineered biological entities for drug delivery and gene therapy: protein nanoparticles date: 2011-11-15 journal: prog mol biol transl sci doi: 10.1016/b978-0-12-416020-0.00006-1 sha: doc_id: 252147 cord_uid: bvtchcbt the development of genetic engineering techniques has speeded up the growth of the biotechnological industry, resulting in a significant increase in the number of recombinant protein products on the market. 
the deep knowledge of protein function, structure, biological interactions, and the possibility to design new polypeptides with desired biological activities have been the main factors involved in the increase of intensive research and preclinical and clinical approaches. consequently, new biological entities with added value for innovative medicines such as increased stability, improved targeting, and reduced toxicity, among others have been obtained. proteins are complex nanoparticles with sizes ranging from a few nanometers to a few hundred nanometers when complex supramolecular interactions occur, as for example, in viral capsids. however, even though protein production is a delicate process that imposes the use of sophisticated analytical methods and negative secondary effects have been detected in some cases as immune and inflammatory reactions, the great potential of biodegradable and tunable protein nanoparticles indicates that protein-based biotechnological products are expected to increase in the years to come. the design of new chemical entities (nce) for diagnosis and treatment of human diseases has relied on the discovery of active chemical drugs from a diverse library of compounds or from naturally occurring molecules. 1, 2 further chemical modifications improve pharmacokinetic properties to obtain a final product with a known mechanism of action and decreased toxicity. 3 nonetheless, using such approaches, the final products present low specificity for their target molecules, interacting with many other molecules and accumulating in some tissues, disturbing the correct homeostasis of the system. in some cases, the adverse effects of drug administration exceed pharmacological effect and despite the concise mechanism of action of the drug over the target molecule representing an improvement in the patient's state, the treatment has to be prevented or discontinued. 4 in fact, although a maintained steady increase in the number of launched nce has been observed in the last years, the question arises whether this classical approach has already exhausted the discovery of innovative molecules. 5 on the other hand, macromolecular new biological entities (nbe) have been used to supplement cellular deficiencies or to inhibit cellular pathways exploiting their relatively specific mode of action. proteins and peptides have been obtained first from their natural source or produced as recombinant versions after the development of genetic engineering techniques in the late 1970s.
however, the delivery of biological entities is sometimes hampered by their short half-life in the bloodstream due to unspecific degradation, resulting in an expensive and ineffective process. nevertheless, some solutions have already been explored for biopharmaceuticals to increase solubility and stability and to reduce immunogenicity, including posttranslational modifications such as glycosylation and covalent conjugation of polyethylene glycol. 6 thus, one of the main objectives in the use of drugs (for either nce or nbe) is to optimize the delivery system to reduce the pharmacological dose, which would consequently represent a concomitant reduction in toxicity and cost. in that scenario, new delivery approaches have been implemented using biological interactions such as antigen-antibody binding (immunoliposomes) 7 or more sophisticated interactions including the binding between nutrient concentrator sparc (secreted protein acidic and rich in cysteine) and albumin in the treatment of some types of cancer (abraxane). 8, 9 proteins can then be used for their targeting qualities as molecular delivery vehicles, both for the specific delivery of drugs or nucleic acids in gene therapy approaches and by themselves as therapeutic molecules. one of the interesting characteristics of proteins is their ability to form intermolecular driven complexes as sophisticated and structurally perfect as in the case of viral capsids. in addition, through the use of genetic engineering, recombinant proteins can be tuned to include additional properties to optimize drug delivery and nucleic acid delivery in gene therapy. in this chapter, the main available strategies to develop protein-based nanovehicles or biopharmaceuticals will be described. in this context, several parameters will be defined such as proper formulation, stability, immunogenicity, and delivery to the correct cell type and cell compartment.
modular protein engineering, virus-like particles (vlps), and other self-assembling entities are envisioned as modulatable novel protein nanoparticles able to include many desirable properties in the correct delivery of drugs and nucleic acids. finally, some successful examples of protein nanoparticles on the market will be described in addition to protein products currently in clinical trials and under preclinical research in order to envision which type of protein nanoparticles will be available soon on the market. with the therapeutic molecule to generate a vehicle capable of being transported in the blood if a systemic administration is needed and retaining a significant stability before reaching the target cell. 10, 11 in addition, the biological system poses specific barriers that have to be overcome such as membranes (cytoplasmic, endocytic, and nuclear), degradation (protease degradation induced by acid denaturation in lysosomes, cytosolic proteasomes, and nucleases), cytosolic transport, and nuclear entry if necessary. 12, 13 for central nervous system therapies, the blood-brain barrier (bbb) represents the main bottleneck, and for that, a specific strategy has to be designed. 14 furthermore, the therapeutic complex has to be flexible enough to release the therapeutic molecule in the specific cell compartment. thus, several protein motifs have been described to overcome each and every process described earlier so that a modular multifunctional protein can be generated including those modules that are necessary to achieve its goal. in order to get a rational construction of the multifunctional vector, each step has to be carefully taken into account so that every barrier on the way to its final goal is overcome (table i). the dna/rna condensation or drug interaction with the protein vector is a critical step in the formulation of protein nanoparticles for gene therapy.
they have to remain attached to the vector during the whole transport process through the body and the cell until they can be released in the desired localization within the target cell. highly positively charged peptides containing a large number of arginines or polylysines have been used to promote electrostatic interactions since nucleic acids are highly negatively charged molecules. 15-22 natural dna-condensing proteins such as nuclear histones or protamines can also be used to bind nucleic acids. 22-25 protamine, which is the protein that replaces histones during the spermatogenesis process, is a sperm chromatin component and, just as the histones do, it has very high dna condensation ability to protect nucleic acids from cytosolic endonucleases. 23, 26 in addition, as soon as the complex reaches the cellular nucleus, protamine is degraded by chromatin-remodeling proteins, releasing the transported dna and allowing its expression. 15, 23 in contrast, polycationic dna condensation modules such as polylysines and polyarginines (even though they can present higher dna condensation ability depending on the polycationic chain length) usually present lower dna-releasing ability, interfering negatively with the accessibility of cellular transcription factors and dna expression capacity. 15 all the dna condensation modules described above interact unspecifically with any dna they are incubated with. however, there are proteins such as gal4 that are able to recognize specific dna sequences 27-29 and that make it possible to bind and condense specific dna sequences in the final vector. 30 in many cases, the multifunctional protein vector is administered in vivo by the systemic route in order to travel in the blood and reach the target cells. that exposes the vector to all blood components, making it susceptible to degradation.
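The electrostatic condensation described above can be illustrated with a rough charge-balance estimate. The following is a minimal sketch, not a formulation method from the chapter: it assumes +1 per lysine or arginine, -1 per aspartate or glutamate, ignores histidine, the termini, and any pH dependence, and counts two phosphates per base pair. The function names and the example numbers are hypothetical.

```python
def peptide_net_charge(sequence: str) -> int:
    """Approximate net charge near neutral pH.

    Assumption: +1 for each K/R, -1 for each D/E; histidine and the
    chain termini are ignored for simplicity.
    """
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def charge_ratio(sequence: str, peptide_copies: int, dna_base_pairs: int) -> float:
    """Positive peptide charges per DNA phosphate.

    Assumption: double-stranded DNA carries two phosphates per base pair.
    """
    phosphates = 2 * dna_base_pairs
    return peptide_net_charge(sequence) * peptide_copies / phosphates


if __name__ == "__main__":
    # Hypothetical example: a decalysine module and a 5000 bp plasmid.
    polylysine = "K" * 10
    print(peptide_net_charge(polylysine))
    print(charge_ratio(polylysine, peptide_copies=1000, dna_base_pairs=5000))
```

A ratio above 1 in this simplified picture corresponds to an excess of positive charge on the complex; whether a given excess gives good condensation and release still has to be determined experimentally.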
thus, it is essential that the vector remain in the blood long enough to be able to reach the target cells. it has also been described that naked dna has an estimated half-life in blood of only minutes, 10 so protein nanovehicles in gene therapy, among other properties, are intended to protect nucleic acids from degradation. one important factor when the vector is exposed to the blood is that it can be recognized by the immune system components and elicit an immune response against the vector. thus, it is also very important to make the vector as weakly antigenic as possible in order to avoid its degradation or even toxicity to the organism. 32 peptide uptake or internalization requires a prior step, the binding of the protein to the cell surface. this attachment can be either specific or unspecific but in all cases the promotion of its internalization is required. 33 positively charged peptides usually bind the cellular surface by unspecific electrostatic interactions with the negatively charged cell surface proteoglycans. this kind of peptide can be used in the multifunctional protein if specific targeting is not required. 33 cell-penetrating peptides (cpps) have been widely described as unspecific cell-binding and internalization peptides 34-46 (see also the chapter "peptide nanoparticles for oligonucleotide delivery" by lehto et al. in this volume). however, specific interactions can be obtained by incorporating cell receptor ligands if cell or tissue targeting is required for the therapeutic action. moreover, some of those ligand-receptor interactions promote the ligand-receptor complex internalization. many peptides have been described in the literature as receptor-specific ligands so any of them can be added to the multifunctional proteins in order to confer cell specificity on them.
47-62 the most natural specific ligands that can also be used for cell targeting are monoclonal antibodies. 32, 63-65 in addition, if no specific peptides are available for an intended target, new specific binding peptides can be found by using phage display 66 or combinatorial chemistry. 67 2. endosomal escape several internalization pathways are possible depending on the vector properties, 27, 33 including endocytosis (clathrin/caveolae-mediated, clathrin/caveolae-independent), macropinocytosis, and non-endocytic pathways. it is known that more than one internalization pathway can operate at the same time but usually the peptide-based vector uses endocytic pathways. 68 moreover, it seems that proteins that interact with a specific cellular receptor are internalized by the clathrin-mediated endocytic pathway. 33 most of the generated endosomal vesicles will converge to late endosomes that eventually will fuse with cellular lysosomes. 15, 33 if it remains in the cellular endosomes, the multifunctional protein will be degraded, so it is strictly necessary that the internalized multifunctional proteins be released into the cellular cytoplasm, escaping from degradation. several peptides have been described that are able to promote endosomal escape and can be classified into two types depending on their escape mechanism: fusogenic peptides and histidine-rich peptides. 36 the fusogenic peptides are small peptides that have hydrophobic amino acids (aa-s) interspersed at constant intervals with negatively charged aa-s.
12, 19, 39, 40, 45, 46, 69-80 thus, when early endosomes become late endosomes, their low ph induces a conformational change in the peptide, which adopts an amphipathic alpha-helix structure able to fuse with the endosomal membrane, leading to pore formation and releasing all the endosomal content into the cell cytoplasm. 36 the histidine-rich peptides are small peptides with a high histidine content whose endosomolytic activity is mediated by a mechanism called "proton sponge". 12, 22, 80-83 when the endosomal ph becomes low in late stages, the imidazole groups of the histidines are protonated and attract endosomal cl− ions, buffering against the proton pump. thus, the endosomes collapse by an osmotic swelling process and the endosomal content is released to the cell cytoplasm. 36 further details are given in the chapter "peptide nanoparticles for oligonucleotide delivery" by lehto et al. in this volume. once the protein has reached the cellular cytosol, it can be degraded by cellular proteases or by the cellular proteasome system. 84 it is important to avoid this process, especially if the protein has to reach the cellular nucleus. if the final target of the nanoparticle is the cellular cytoplasm, it is necessary that it remain there at least long enough to perform its therapeutic action. several peptide proteasome inhibitors have been described that are able to avoid this type of protein degradation. by adding these peptides to the final protein vector it is possible to protect it and enhance cytoplasmic stability. epstein-barr virus nuclear antigen 1 (ebna1) contains a proteasome inhibitor consisting of glycine-alanine repeats able to prevent proteasomal proteolysis. it has been shown that a minimum of 4 aa-s gly-ala repeats are necessary to achieve such protective activity.
85-87 if the protein vector is carrying nucleic acids (dna or rna), degradation by the cytosolic endonucleases has to be taken into account, so it is also very important to protect this nucleic acid in order to maintain its integrity. some dna/rna condensing peptides such as protamines also protect the dna against cytoplasmic endonucleases and enhance its stability, as has been described above. 15 the cellular cytoplasm is a very crowded and compartmentalized environment where cellular organelles and the cytoskeleton make the free diffusion of macromolecules such as protein vectors difficult. however, cytoskeleton elements such as microtubules are used by endosomes and other cytosolic macromolecules for intracytosolic mobility. 33 dyneins have been described as being capable of carrying those macromolecules and endosomes along the microtubules in a retrograde transport toward the nucleus. some small peptides that are able to bind dyneins have been identified. they can be added to the multifunctional protein vector in order to mediate intracytosolic mobility toward the cellular nucleus. 36 several dynein-binding proteins have been identified in viruses that are able to use this transport system. comparing those protein sequences, a consensus peptide sequence (kstqt) that is able to bind to the dynein lc8 light chain has been identified. 88 molecules smaller than 45 kda/10-30 nm are able to enter the cellular nucleus by passive diffusion. however, macromolecules larger than 45 kda/10-30 nm generally require an active transport system through the nuclear pore system. this transport mechanism generally requires a specific targeting signal peptide named nuclear localization signal (nls). these signaling peptides are usually rich in basic aa-s, which are recognized by the cellular importins and actively transported through the nuclear pore.
15, 89 monopartite or bipartite nls sequences, which are nls peptides that have one or two recognized nls sequences respectively, have been described. 12 thus, these peptidic sequences can be added into the final multifunctional protein if nuclear localization is required in order to express a carried dna. it has been reported that a single nls sequence is sufficient to transport the vector to the nucleus and that a large number of nls sequences can result in inhibition of its activity. 90 among the most used nls signal peptides are fragments derived from the 111-135 aa-s of the simian virus sv40 large tumor antigen (t-ag). other nls sequences can be found in gal4, protamines, or tat. 23, 36, 37, 77, 91-102 it is important that when the transported dna reaches the cellular nucleus, it is released in order to be accessible to the nuclear transcription factors and achieve the desired expression level. thus, while designing the multifunctional protein vector, this aspect has to be taken into account. once the dna has been released in the cell nucleus, it will be necessary to control its expression level depending on which therapeutic action is being promoted. when the goal is to kill a cell as in cancer therapies, uncontrolled dna expression levels would not be a problem. however, when a specific protein expression level is required, achieving good control is very important. 13 some expression systems have been developed that can be pharmacologically regulated by oral drug formulation. 103 cell-specific promoters and enhancers can also be used in order to confer high cell specificity on the therapy. 104, 105 d. ways to get over the bbb the bbb is a hermetic barrier that only allows lipophilic molecules smaller than 400 da to cross it. however, some human proteins such as insulin, transferrin, insulin-like growth factor, or leptin are able to go across it by receptor-mediated transporters.
thus, the most important factor limiting central nervous system-targeting therapeutics is the presence of the bbb. 106 finding the way to cross it will be the main challenge. some peptides have been described that are able to reach the brain crossing the bbb. moreover, it has been seen that they can be associated with another molecule and transported through the barrier. thus, they could be interesting candidates to be included in the multifunctional vectors if central nervous system targeting is required. 14, 56, 107, 108 antibodies have also been described that bind transferrin and insulin receptors and that are able to cross the bbb efficiently. they can be conjugated with large molecules, allowing their translocation to the central nervous system. 63, 64, 109-111 synthesis, and rational design the development of genetic engineering techniques has expanded the natural repertoire of proteins, making it possible to design new proteins with desired functions. there are three main strategies leading to the construction of engineered proteins: (a) directed evolution, (b) de novo protein design, and (c) rational design. directed evolution has developed quickly to become a method of choice for protein engineers in order to create enzymes having desired properties for all kinds of processes. over the past decade, this technique has become a daily part of the molecular toolbox of every biochemist. this is emphasized by the increasing number of publications about the subject. 112 in nature, evolution and creation of new functionalities is achieved by mutagenesis, recombination, and survival of the fittest. directed evolution mimics this and is a process of iterative cycles of producing mutants and finding the mutant with the desired properties. mutations can be introduced at specific places using site-directed mutagenesis or throughout the gene by random mutagenesis.
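The iterative cycle of producing mutants and selecting the best one can be sketched in a few lines. This is a toy illustration only, under stated assumptions: the "screen" is an arbitrary scoring function (number of positions matching an arbitrary target string), mutagenesis is uniform random substitution, and all names, rates, and library sizes are hypothetical rather than taken from any laboratory protocol.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence: str, rate: float, rng: random.Random) -> str:
    """Random mutagenesis: each position is replaced with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
        for aa in sequence
    )


def toy_fitness(sequence: str, target: str = "KSTQTKKKRK") -> int:
    """Stand-in for a screening assay: count positions matching an arbitrary target."""
    return sum(a == b for a, b in zip(sequence, target))


def directed_evolution(parent, fitness, rounds=20, library_size=50, rate=0.1, seed=0):
    """Repeat: build a mutant library, screen it, keep the best variant if it improves."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        candidate = max(library, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
        history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = directed_evolution("AAAAAAAAAA", toy_fitness)
    print(best, history)
```

In practice the screening step is the bottleneck: the sketch scores every variant instantly, whereas real libraries can only be screened for properties that have a workable assay.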
several mutagenesis techniques have been developed in order to avoid codon bias. 113, 114 the first technique used to mimic evolution was dna shuffling. 115 this method is based on the mixing and subsequent joining of different related small dna fragments in order to form a complete new gene. in the process of shuffling, the recombination frequency is dependent on the degree of homology. a high level of recombination is important to get all possible combinations of mutations. since recombination can be biased, several methods to overcome problems arising from the use of shuffling in the early years were tackled by novel strategies, all having their own advantages and disadvantages. 112 the products obtained by these methods have to be screened for desired qualities and not all of them can be easily screened. de novo protein design offers the broadest possibility for new structures. it is based on searches for amino acid sequences that are compatible with a three-dimensional protein backbone template using in silico techniques. several research groups in the field have applied in silico methods to design the hydrophobic cores of proteins, with the novel sequences being validated with experimental data. 116 in silico protein design has allowed novel functions on templates originally lacking those properties, modifying existing functions, and increasing protein stability or specificity. beyond any doubt, intense research activities are ongoing in the field, the potential of which is simply enormous. 117 so far there have been numerous examples of full sequences designed ''from scratch'' that were confirmed to fold into the target three-dimensional structures by experimental data. 118 the zinc-finger protein designed by dahiyat and mayo 119 was the first one to appear by this method. 
rational design of proteins is based on the modification or insertion of selected amino acids or domains in a polypeptide chain backbone to obtain proteins with new or altered biological functions. when using that strategy, a detailed knowledge of the structure and function of the backbone protein is needed to make desired changes. this generally has the advantage of being inexpensive and technically feasible. however, a major drawback of this approach is that detailed structural knowledge of a protein is often unavailable or it can be extremely difficult to predict the effects of various mutations. modular engineering enables, by using simple recombinant dna techniques, the construction of chimeric polypeptides in which selected domains, potentially from different origins, provide the required activities. a balanced combination and spatial distribution of such partner elements has generated promising prototypes, able to deliver expressible dna or molecules to tissue culture but also to specific cell types in whole organisms. 120 modular fusion proteins that combine distinct functions required for cell type-specific uptake and intracellular delivery of dna or drugs present an attractive approach for the development of self-assembling vectors for targeted gene or drug delivery. 121 one of the first examples was described by uherek et al. they combined a cell-specific target module (antibody fragment specific for the tumor-associated erbb2 antigen), a dna-binding domain (gal4), and a translocation domain for endosomal escape. 121 in this context, many strategies for the construction of safer vehicles are being explored and the number of nonviral prototype vectors for gene and drug delivery is noticeably increasing. here, the common steps that an approach like this might explore are presented (fig. 1).
when designing a new protein for drug or gene delivery there are many critical aspects, namely (a) design of the vehicle itself, required functions, stability, etc.; (b) production of the protein, suitable expression system, purification procedure, scaling up process, etc.; (c) characterization of the vehicle by physicochemical and functional tests; and finally (d) the administration route and regulatory guidance for biological products. although all these aspects belong to different disciplines, they have to be considered together. here, the major needs of a modular protein for gene and drug delivery are presented. to enhance the physicochemical stability of the cargo molecules and their resistance to nuclease/protease-mediated degradation, protein vehicles should ideally exhibit, like their natural counterparts (viruses), nucleic-acid binding and condensing properties. 27 such abilities are, in general, conferred by cationic segments of the main scaffold molecules that interact with nucleic acids, mainly through electrostatic interactions. in addition, such complexes need to efficiently release the nucleic acid in the nucleus (if the cargo is a therapeutic gene), for which endosomal escape is required. such functions have been found in some peptides in many natural molecules and they are suitable for functionalizing protein vehicles. the ability to bind a particular cell type with high specificity is especially significant in a systemic delivery in which appropriate biodistribution and tissue targeting are essential. 122 for nuclear targeting, only naked short nucleic acids can freely enter the nucleus of nondividing cells via free diffusion through the nuclear pore. large molecules require active transport mediated by nlss that are often found in viral proteins. because the molecular mass of plasmidic dna varies from 2 to 10 mda, dna that is to be expressed, and essentially any macromolecular complex for nucleic acid delivery, requires nlss.
123 the role and types of functional module peptides used for all these purposes will be discussed in depth in the following sections. in vivo experiments finally, which protein or peptide is better for a given cargo is to be determined empirically and only a few rules can be taken literally. 38, 124 c. production of protein nanoparticles some steps in the production of a protein-based vehicle after molecular cloning such as protein production and protein purification 125 might be experimentally labor-intensive with a variable success rate. for that reason, when small proteins are needed, solid-phase peptide synthesis 44 guarantees the process. however, the classical procedure of biological production allows scaling up the process in most cases and the production of larger polypeptides and full-length proteins. generally, in protein nanoparticle approaches, the protein is composed of different modules, either from natural sources such as the cell-penetrating peptide transactivator of transcription (tat) derived from the tat of the human immunodeficiency virus (hiv) 126 or artificial sequences not present in any organism such as the polylysine dna-condensing sequence. 127 once it has been defined which modules will be part of the protein, it is important to define the order they will have in the final construct. it has been demonstrated by boekle and coworkers using melittin conjugated to polyethylenimine (pei) that depending on the side of the linkage (c- or n-terminus), the lytic activity could be changed. some other modules need to be in a determined position to function correctly. 128 when producing a protein for gene or drug delivery, it is important to know the origin of its domains to choose the most suitable expression system for its production.
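The design step discussed above (choosing modules, fixing their order, and checking that every required function is covered) can be sketched as a small data structure. This is a minimal illustration under explicit assumptions: the function labels, the helper names, and most example sequences are hypothetical placeholders; the 45 kda limit for passive nuclear entry and the kstqt dynein-binding consensus come from the text, while the average residue mass of about 110 da and the sv40 nls core "PKKKRKV" are outside values used only for illustration.

```python
from dataclasses import dataclass

# Functions a gene-delivery construct should cover (labels are hypothetical).
REQUIRED_FUNCTIONS = {"cargo_binding", "cell_targeting", "endosomal_escape", "nuclear_import"}
PASSIVE_DIFFUSION_LIMIT_KDA = 45.0   # threshold quoted in the text
AVERAGE_RESIDUE_MASS_DA = 110.0      # assumption: common rule-of-thumb value


@dataclass(frozen=True)
class Module:
    name: str
    function: str
    sequence: str


def missing_functions(modules):
    """Return the required functions that no module in the construct provides."""
    return REQUIRED_FUNCTIONS - {m.function for m in modules}


def assemble(modules):
    """Concatenate module sequences in the given N-to-C order (no linkers modeled)."""
    return "".join(m.sequence for m in modules)


def estimated_mass_kda(sequence):
    """Rough mass estimate from chain length."""
    return len(sequence) * AVERAGE_RESIDUE_MASS_DA / 1000.0


def needs_active_nuclear_import(sequence):
    """True if the estimated mass exceeds the passive-diffusion limit."""
    return estimated_mass_kda(sequence) > PASSIVE_DIFFUSION_LIMIT_KDA


if __name__ == "__main__":
    construct = [
        Module("polylysine", "cargo_binding", "K" * 10),
        Module("targeting ligand (placeholder)", "cell_targeting", "RGD"),
        Module("histidine-rich (placeholder)", "endosomal_escape", "H" * 6),
        Module("dynein-binding consensus", "cytosolic_transport", "KSTQT"),
        Module("sv40 nls core", "nuclear_import", "PKKKRKV"),
    ]
    print(missing_functions(construct))
    print(assemble(construct), estimated_mass_kda(assemble(construct)))
```

Because module order can change activity, a representation like this keeps the order explicit, so that alternative arrangements can be enumerated and then compared experimentally.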
for instance, if any module naturally carries a posttranslational modification that is essential for its biological function, the expression system chosen will have to be able to reproduce the same crucial modification. the main biological production systems for protein drugs are described below. escherichia coli is the most widely used prokaryotic organism for the expression of recombinant proteins. 129 the use of this host is relatively simple and inexpensive. 130 added advantages include its short duplication time, growth to high cell densities, ease of cultivation, and high yields of the recombinant product. however, since it lacks fundamental prerequisites for efficient secretion, recombinant proteins manufactured by e. coli systems are mainly produced as inclusion bodies. 125, 131 moreover, posttranslational modifications are not achieved with this system. there are many examples of proteins for gene delivery produced in e. coli with proven efficiency. 132, 133 like e. coli, yeasts can be grown cheaply and rapidly and are amenable to high-cell-density fermentations. besides possessing complex posttranslational modification pathways, they offer the advantage of being neither pyrogenic nor pathogenic and are able to secrete more efficiently. species established in industrial production procedures are saccharomyces cerevisiae, kluyveromyces lactis, pichia pastoris, and hansenula polymorpha. s. cerevisiae is the best genetically characterized eukaryotic organism among them all and is still the prevalent yeast species in pharmaceutical production processes. 131 in spite of their physiologically advantageous properties and natively high expression and secretion capacity, the employability of yeasts in some cases, however, might reach a limit, particularly when the pharmacological activity of the product is impaired by the glycosylation pattern.
in such cases, either a postsynthetic chemical modification has to be considered or the employment of more highly developed organisms. most examples of nanoparticles produced in yeast are for vlps. 134 animal cell expression systems show the highest similarity to human cells regarding the pattern and capacity of posttranslational modifications and the codon bias. however, their culture is more complicated and costlier and usually yields lower product titers. among the known systems, insect cells infected by baculovirus vectors have reached popularity since they are considered to be more stress-resistant, easier to handle, and more productive compared with mammalian systems and are thus frequently employed for high-throughput protein expression. for commercial application, scale-up related questions have to be solved. 135-137 preferably applied in pharmaceutical production processes are mammalian systems like chinese hamster ovary (cho) cells and baby hamster kidney (bhk) cells. these systems are considered genetically more stable and easier to transform and handle in scale-up processes, to grow faster in adherent and submerged cultures, and to be more similar to human cells and more consistent in their complete spectrum of modification. 138 in some cases, mammalian cell systems can be the only choice for the preparation of correctly modified proteins. peptides, being complex and unique molecules with regard to their chemical and physical properties, can be produced synthetically by the solid-phase method. 139, 140 this technology can be used to avoid problems related to biological production. general advantages of synthetic peptides are that they are very stable compounds, solid-phase chemistry produces highly standardized peptides, and the crucial polycation component is provided by a "natural" polycation, thus minimizing toxicity.
141 however, some disadvantages related to synthetic peptides have been reported, such as the difficulty of synthesizing long and well-folded oligopeptides or peptides with multiple cysteine, methionine, arginine, and tryptophan residues, due to technical limitations or production cost. 141 when working with protein nanoparticles, it is very important to characterize them physically and functionally in order to understand their behavior. the size and charge of protein/cargo particles are crucial properties which influence rates of diffusion, binding to polyanionic components of connective tissues, traversal of anatomical barriers, binding of serum proteins, attachment to cells, and mechanisms of endocytosis, among other factors. stability in physiological salt solutions is a key issue for in vivo delivery, as salt is found everywhere in the body. 141 mixing a multivalent polycation and dna results in electrostatic binding of both molecules, with charge neutralization of dna and the formation of a particle termed a conjugate. charge neutralization can be easily seen by retardation gel assays and particle formation by dynamic light scattering (dls). dls is a good method to see particle formation but not to quantify the relative number of particles of different sizes. 142 to visualize particles, many groups have used transmission electron microscopy (tem) 15, 143 with good results while others have used fluid particle image analyzer (fpia) to photograph individual particles in physiological solutions. 58 the net charge of protein/cargo particles is an important variable. generally, optimal gene delivery for cell lines requires a net positive charge but, as stated previously, it has to be determined empirically. one of the best techniques to determine the net charge is by calculating the zeta potential from the measured electrophoretic mobility of particles.
144 despite the fact that physical characterization is a key element, understanding and testing the functionality and pharmacokinetics of a gene or drug is the most important part of its development process. most of the initial tests are done using cell lines in in vitro experiments using reporter genes, rna, or drugs. 145, 146 quantifying the percentage of transfected cells or drug-induced changes is a very valuable tool to evaluate nanoparticle performance in nuclear and cytoplasmic delivery, respectively. in addition, in vitro experiments may be designed to select a candidate for the in vivo experiments from a group of possible therapy vectors. the quantitative kinetics of particle binding, the molecular basis of particle interactions with target cell membranes, the efficiency of particle internalization, and endosomal escape are all poorly understood. 141 interaction of particles with plasma membranes prior to protein internalization can be either unspecific or specific. untargeted delivery normally is the consequence of electrostatic interactions between anionic ligands in the cell surface and cationic components of the vehicle. on the other hand, targeted delivery to specific membrane molecules is a more sophisticated approach. it aims to improve cell specificity and efficiency by directing the vehicle to molecules that are only expressed or overexpressed in a particular cell type and that initiate internalization by endocytosis. targeting moieties include many types of molecules and are discussed later. internalization of particles, its mechanisms, and kinetics are not well known and most studies about nanoparticle delivery do not focus on this aspect. there are several endocytic pathways, each initiated by different ligands. 147 enhancement of delivery by addition of chloroquine, a synthetic molecule used primarily for the prophylaxis and treatment of malaria that disrupts endosomes, 148 is an accepted way to demonstrate endosomal localization of particles.
endosomal escape is the area most intensively investigated but is poorly understood. an important practical point to note is that some reagents that are used can be toxic. 141 to enhance this step, anionic fusogenic peptides can be used. these peptides fuse to membranes in an acid-dependent manner, causing membrane disruption. 149 in gene delivery approaches, translocation of dna expression plasmids into the cell nucleus involves an active, energy-dependent process through the nuclear pore complex. 150 dna directly injected into the cytosol is usually, but not always, poorly transferred to the nucleus 150, 151 and because of that, proteins carrying cationic nuclear-localizing sequences (such as that of sv40 large t antigen) have been widely used to overcome this step. 143
iv. natural self-assembling protein nanoparticles: vlps
ideal drug delivery and gene therapy vehicles must have some desired features such as appropriate packaging size for their cargo, target cell specificity, safe and efficient cargo delivery, and protection against immune recognition, or capability to escape immune recognition. moreover, these vehicles must avoid inflammatory toxicity and rapid clearance. 152 in this context, viral vectors have been exploited as one of the vehicles of choice. viruses are nano-sized (15-400 nm) supramolecular nucleoprotein-based entities, covered or not with a lipid bilayer (enveloped/nonenveloped viruses), that combine, in relatively simple structures, outstanding properties and functions that are relevant to drug and gene delivery. viruses are able to recognize and interact specifically with cells by receptor-mediated binding, internalize, escape from endosomes, and uncoat and release nucleic acids in different cellular compartments. they are also capable of transcribing and translating their viral proteins to self-assemble into new infectious virus particles and exit the host cell.
120, [153] [154] [155] despite all these relevant properties of viral vectors or some other rising vehicles in drug and gene delivery such as cationic liposomes, their therapeutic use presents some limitations and risks because of the complexity of production, limited packaging capacity, insertional mutagenesis and gene inactivation, low probability of integration, reduced efficacy of repeat administration or reduced expression over time, unfavorable immunological recognition or strong immune response against vehicle and transgene, inflammatory toxicity, and rapid clearance. 120, 152 in this context, virus capsids or vlps, produced from recombinant capsid proteins but lacking the viral genome, have noticeably emerged as a safer alternative to viral vectors.
a. structure of protein self-assembled nanovehicles
vlps are classically described as self-assembling, nonreplicative and nonpathogenic, highly organized supramolecular multiprotein nanoparticles (coats) (ranging from 20 to 100 nm) that can be formed by the spontaneous self-assembly of one or more viral structural capsid proteins. it has been described that the self-assembly of the structural viral proteins for vlp formation involves both spontaneous assembly, under favorable experimental conditions, and the requirement of scaffold proteins as catalysts. 156, 157 therefore, vlps are considered protein ''coats'', ''shells'', or ''boxes'' that lack the viral genome but still conserve the structure, morphology, and some properties of viruses. some of these properties such as cellular tropism and uptake, intracellular trafficking, membrane translocation, and transfer of nucleic acids or molecules across the cytoplasmic, endosomal, and nuclear membranes are important for drug delivery and gene therapy. 120, 153, 155, [158] [159] [160] usually, the degree of similarity of vlps and their viruses depends on the number of proteins incorporated into the constructs.
161, 162 since the first description in 1983 of the viral dna packaging into mouse polyomavirus (mpyv) vlps and its transduction in vitro, 163 vlps of different viruses such as papillomaviruses, [164] [165] [166] hepatitis b, c, and e viruses, [167] [168] [169] polyomaviruses, 163 vlps offer structural, dynamic, and functional features that make them appealing bionanomaterials to be exploited in biomedicine as drug and gene delivery vehicles; these features are discussed in detail below. on the one hand, viral coat proteins have the ability to spontaneously self-assemble, which ensures the formation of highly organized, regular, repetitive, structurally stable particles with very low morphological polydispersity that provide useful properties to be used as scaffolds for bioimaging, synthesis of bionanomaterials, and as nanocarriers in drug and gene therapy. 186 in addition, homogeneity of particle size and composition is a desired production factor when developing therapeutic molecules. the overexpression of structural viral proteins in a convenient expression system renders recombinant proteins capable of being folded and assembled in discrete organized nanoparticles with a defined size corresponding to the natural capsid geometry. [187] [188] [189] moreover, even though vlps are structurally stable particles, some biochemical and structural studies have observed that viral capsids and bacteriophages may show some structurally dynamic properties, varying in shape, size, or arrangement of the coat proteins in response to different factors such as ph. [190] [191] [192] [193] on the other hand, vlps are considered biologically safe nanostructures since they are not infectious (lack of viral genome) and do not replicate, representing a safer alternative to viral vectors. 160, [194] [195] [196] [197] however, they can elicit immune and inflammatory responses, especially when repeated administration is needed.
152 it has to be also noted that when used in vaccination, vlps could show excellent adjuvant properties and the majority of vlps stimulate strong cellular and humoral immune responses as direct immunogens. 198 it has been suggested that recombinant vlps derived from infection of insect cells with baculovirus or even those derived from prokaryotic systems could be contaminated with different residual components of these host cells, contributing those impurities to the adjuvant properties. 153 one interesting property of vlps is that coat viral proteins present an enormous elasticity and adaptability to be modified chemically and/or by protein genetic engineering 154, 160, 199 to incorporate multiple directed functionalities, in order to be addressed in biomedical applications such as drug delivery or gene therapy. it has been recently reviewed that chemically and/or genetically modified vlps, including cpmv, ccmv, ms2, m13 bacteriophages, and other virus-based nanoparticles, 155,186 could maintain their structural integrity and improve their physical stability 154 and, moreover, these modifications could also confer desired cell-targeting properties to the nanovehicle. [153] [154] [155] 186, 200, 201 vlps can be successfully engineered with spatial precision to incorporate (attached or genetically displayed on the surface) targeting tissue-specific ligands such as epidermal growth factor (egfr) and antibodies, or other molecules such as oligonucleotides, peptides, gold, and other metals, target proteins, carbohydrates, polymers, fluorophores, quantum dots, drugs, or small molecules. 152, 154, 155 moreover, one of the potential benefits of such modifications is that the specific geometric rearrangement confers precise recognition patterns. 
200, 201 furthermore, accessibility of the materials carried within the particle and the capacity for inclusion and separation of nucleic acids, small molecules, and unusual cargoes with appropriate charge are further outstanding features and key advantages of vlps that have also made them excellent vessels for gene and drug delivery. 152, 195 as described above, vlps can be used as empty nanocarriers to transport molecules chemically attached on their surface or can be loaded ex vivo with therapeutic cargoes such as drugs, dnas, mrnas, sirnas, oligonucleotides, quantum dots, magnetic nanoparticles, or proteins. 155, 157, 160 vlps of different papillomaviruses and polyomaviruses have been widely characterized and used for directed delivery in biomedical applications. 132, 165, 173, 174, 194, 202 osmotic shock and in vitro self-assembly of vlp subunits in the presence of the cargo have been the two main strategies used to package nucleic acids or other small molecules. it has to be taken into account that some attachment of the cargo on the vlp surface can occur. 195 besides, diversity of natural tropism, including liver for hepatitis b vlps, spleen for some papillomavirus and polyomavirus vlps, antigen-presenting cells for certain papillomavirus vlps, and glial cells for human polyomavirus jc (jcv) vlps, among others, 152 is one of the key advantages offered by vlps, providing a wide spectrum of specific targeting and distribution profiles depending on the intended application. although each vlp has its own characteristic receptors, entry pathway, and intracellular trafficking, it has been demonstrated that the tropism of vlps can be customized by modifying the residues identified as ligands of the cellular receptor on the vlp surface or even by varying the delivery route. 155, 189, 203 another key advantage of vlps is that they can be easily produced by using a wide range of hosts and expression systems, each of them with its own requirements.
162 in the past years, there has been an increasing need to improve and optimize efficient large-scale production systems, process control and monitoring, and up-and down-streaming processes. 153, 157, 159, 204 production of vlps usually involves transfection of the cell host expression system of choice with a plasmid encoding one or more viral structural proteins, further and rigorous purification for the removal of immunogenic cellular contaminants, and quality control of the produced vlp and encapsulation of the cargo ex vivo before administration. 152, 158 the most frequent and convenient expression systems, adaptable to large-scale processes are (1) yeast cells 176 205, 206 , (4) bacteria 204,207 , (5) green plants infected with modified viruses 208, 209 , and (6) cell-free systems. 163, 204 the preparative and large-scale manufacture of vlps in some of these hosts has been reviewed by pattenden et al. and can be classified into two main methods of bioprocessing: in vivo and in vitro systems. 157 in addition, the capability of in vitro dissociation and reassociation of vlps contribute to the application of easy and more accurate purification methods than those of viral vectors. 152, 157 furthermore, depending on the expression system, the resulting vlp might be significantly different even though expressing the same viral proteins. thus, a broad spectrum of vlps could be customized depending on the vlp type, the number of proteins needed for vlp assembling, and the targeted final application. 158, 210 as described above, vlps have great potential as nanocarriers in drug and gene delivery. 
at the same time, although there is an increasing flow of developments in this area, these vehicles also present some limitations that should be addressed and taken into account, such as residual cellular components, variable yield of functional vlps after the disassembly/reassembly process, immunostimulation and unsuitability for repeated administration, tolerance to the transgene, ineffective loading of therapeutic molecules, and low transfection rates. 152 due to their versatile nanoparticulate structure and morphology, and their nonreplicative and noninfectious nature combined with their natural immunogenic properties and ease of production, vlps have principally emerged as an excellent alternative to attenuated viruses for vaccination. 152 182 and ebola virus. 215, 216 although vlp-based vaccines have been primarily developed for use against the corresponding virus, in the last decades genetic engineering or chemical modifications have been applied in order to generate chimeric vlps. thus, on the one hand, short heterologous peptide epitopes or full proteins that are unable to form vlps or that are unsafe for vaccination have commonly been presented on surface-exposed loops or fused to exposed n- or c-termini of structural viral capsid proteins on vlps. 154, 161, 210 different hpv, 217-219 hbv, 220,221 parvovirus, 222, 223 and chimeric polyoma vlps have been engineered 170, 175 and tested for different applications including vaccination against viral or bacterial diseases, against virus-induced tumors, and more recently, for immunotherapy of nonviral cancer. 161, 210 on the other hand, chemical bioconjugation for covalent coupling of protein epitopes and small molecules to lysine, cysteine, or tyrosine residues of vlp surfaces has been applied in viral or cancer vaccines. 200 chackerian et al.
have demonstrated the efficient induction of protective autoantibodies using self-antigen conjugation to hpv vlps. 224 it is important to point out that vlps can also be engineered to incorporate heterologous cell-specific ligands to cell receptors, thus altering their cellular tropism. 154, 155, 186, 201 this great convertibility and flexibility of vlps to be modified (chemically and/or genetically), their high stability, natural and diverse tropism, their nanocontainer properties, and their ability to enter the cell and incorporate, bind, and deliver nucleic acids and small molecules have positioned vlps as appealing entities not only for vaccination applications but also for a broad spectrum of other diverse and emerging applications in nanomedicine and nanotechnology such as immunotherapy against cancer, 210, 225 gene therapy delivery of therapeutic genes into specific cells, 161, 165, 171, 184, 226, 227 and targeted delivery of drugs and small molecules using vlps as nanocarriers. 174, 196 although there is no commercial vlp vector for gene therapy, since the initial work in 1970 on uncoating of polyoma pseudovirus in mouse embryo cells as a gene delivery vector 228 and the establishment in 1983 of the viral dna packaging into mpyv vlps and its transduction in vitro, 163 different vlps such as hbv and hepatitis e virus, 229 hpv and polyomavirus nanoparticles 172, 178, 229 have been modified toward the specific delivery of therapeutic genes and proteins to different target cells, organs, and tissues in vitro and in vivo by systemic injection 229 or oral administration. 230 for example, recombinant vp1-based polyomavirus vlps can encapsulate exogenous dna in vitro and deliver it via cell surface sialic acid residues to human brain cells and fetal kidney epithelial cells.
178 furthermore, vlps have recently emerged as novel nanocarriers or nanocontainers to store unnatural cargos, deliver modified oligonucleotides, 154 synthetic small interfering rnas, and plasmids expressing short hairpin rnas as therapy to downregulate gene expression. 171, 231 in this context, chou et al. have recently described the use of jcv vlps as an efficient vector for delivering rnai in vitro using murine macrophage raw 264.7 cells and in vivo using balb/c mice in silencing the cytokine gene of il-10 without significant cytotoxicity for systemic lupus erythematosus gene therapy. 171 one of the key aspects in targeted gene and drug delivery is cell-specific delivery. it is important to point out that vlps are tunable nanoparticles that can also be chemically or genetically engineered to modify their natural cellular tropism in order to diversify the range of therapeutic applications in targeted gene or drug delivery. 154, 201 some effective approaches to modify the natural cellular tropism include: (1) genetic engineering of vlp chimeras incorporating heterologous cellspecific short peptides that contain recognition sites of target cell receptors. 232 in this context, polyoma and papillomavirus, with solved atomic structures of their major structural capsid proteins, have been extensively used to obtain chimeric vlps as delivery vector systems. 165, 233 however, this approach has some bioprocessing limitations such as low production levels as a consequence of vlp modification, alterations of size and properties of the vlps that could affect the structural interactions and conformations for vlp assembly, disassembly and packaging, and low transduction efficiencies. 157 (2) chemical bioconjugation of purified vlps with epitope-containing peptides 234, 235 or a wide range of small molecules conferring cell-specific targeting such as transferrins, folic acid, or other targeting molecules. 
as an example, cpmv vlps have been successfully conjugated with tfn using ''click'' chemistry 236 and with nhs-ester-derivatized folic acid, and the two conjugates were shown to be internalized into hela cells and kb cells, respectively. 183, 184 (3) high-throughput library and directed evolution methods are a rational approach that has been recently used to engineer viral vectors with the desired tropism properties. 237 (4) pseudotyping, which consists of replacing the envelope protein of one virus species by the envelope protein of another virus species. 238 (5) modification of the delivery route of the vlps. it has been shown that the levels of expression of β-galactosidase in heart, lung, kidney, spleen, liver, and brain are different depending on the delivery route of polyomavirus vp1 vlps. 203 the great accessibility and reactivity shown by vlps, as well as their ability to serve as nanocarriers, which made them suitable to be exploited in gene therapy, have also been applied to targeted drug delivery. 195 genetic modification and/or chemical functionalization of exposed amino acid residues on the capsid surface in order to attach small molecules, such as markers or bioactive molecules, is one of the most common approaches applied to targeted drug delivery. 174, 239 as an example, canine parvovirus (cpv) vlps produced in a baculovirus expression system and exhibiting natural tropism to transferrin receptors (tfrs) were chemically modified on accessible lysines of the capsid surface with fluorescent dye molecules and delivered to tumor cells. derivatization of cpv-vlps did not interfere with binding and internalization into tumor cells. 183, 184 one limitation of vlps in gene therapy is the low efficiency of gene transduction due to inefficient dna packaging. however, a recent study presented a novel in vivo dna packaging of jcv vlps in e. coli that effectively reduced human colon carcinoma volume in a nude mouse model.
in this study, the exogenous plasmid dna was transformed into the jcv vp1-expressing e. coli. the packaging of the second plasmid occurs simultaneously with the in vivo assembly of the jcv vlp. even though it is still not clear how the plasmid dna molecules are encapsidated in the vlp, the authors showed that gene transduction efficiency by their in vivo packaging system was about 80%, in contrast to the 1-2% gene transduction efficiency achieved by the in vitro osmotic shock system. 226 in addition, the administration of exogenous proteins may induce an immune system response, reducing therapy effectiveness or causing undesirable secondary effects, although the immunological response to protein nanoparticles can be modulated. 240 spontaneous protein self-assembly to form ordered oligomers is a common event in biology. it can prove advantageous in terms of genome-size minimization, formation of large structures, stabilization of complexes, and inclusion of functional features. 241 it has been widely documented that cellular oligomeric proteins as well as viral capsids are stabilized by several weak noncovalent interactions such as hydrophobic interactions, electrostatic energy, and van der waals forces. [242] [243] [244] these interactions result in a complex quaternary structure described by three symmetry point groups named cyclic (cn), dihedral (dm), and cubic (t, o, i). 245, 246 the development of computational techniques to predict protein-protein interactions using solved 3d protein structures makes it possible to predict and/or strengthen experimental data through in silico approaches. 247 furthermore, their use opens up the possibility to design proteins not only displaying specific biological functions but also interesting intermolecular interactions to obtain increased multivalency in the resulting complexes.
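the point groups named above fix how many identical subunits a closed homo-oligomer contains. the following is a minimal sketch, assuming one subunit per asymmetric unit and using the standard orders of the rotational point groups (n for cn, 2n for dn, and 12, 24, and 60 for t, o, and i). the function name and the label format ("C5", "D3", "T", "O", "I") are illustrative choices, not part of the original text.

```python
import re

# Orders of the three cubic rotational point groups.
CUBIC_ORDERS = {"T": 12, "O": 24, "I": 60}


def symmetry_order(point_group: str) -> int:
    """Return the number of symmetry-related subunits for a point group label.

    Accepts labels such as "C5", "D3", "T", "O" or "I". Assumes exactly one
    protein subunit per asymmetric unit of the assembly.
    """
    label = point_group.strip().upper()
    if label in CUBIC_ORDERS:
        return CUBIC_ORDERS[label]
    match = re.fullmatch(r"([CD])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized point group label: {point_group!r}")
    kind, n = match.group(1), int(match.group(2))
    if n < 1:
        raise ValueError("rotation order must be at least 1")
    # Cyclic groups have n rotations; dihedral groups add n twofold axes.
    return n if kind == "C" else 2 * n


if __name__ == "__main__":
    for group in ("C5", "D3", "T", "O", "I"):
        print(group, symmetry_order(group))
```

under this assumption an octahedral assembly contains 24 subunits and an icosahedral one contains 60; larger shells are built from multiples of these numbers.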
moreover, it should be considered that not only whole proteins can self-assemble into smart nanoparticles; oligopeptides are also capable of forming organized structures. many applications are possible due to the enormous number of different combinations and features that can be exploited with peptides. 248, 249 furthermore, protein-protein interactions are not the only parameters involved in particle formation; nucleic acid-peptide interactions, salt concentration, order of mixing, and the ratio between nucleic acid and protein can also strongly influence the condensation process. 250, 251 due to their natural tendency to self-assemble into highly ordered structures, viruses provide a wide variety of scaffold proteins which are used as gene/drug carriers. among them, vlps have been reviewed in the previous section. however, simple bacterial proteins can also be utilized as carriers for gene delivery. for example, heat shock proteins (hsp) from the hyperthermophilic archaeon methanococcus jannaschii can assemble into a small structure of 24 subunits with octahedral symmetry. these 12 nm structures are stable at high temperature, up to 70 °c, and over a wide range of ph. residue modifications are tolerated and allow specific attachment of small molecules. 186, 252 in bacteria, bacterial microcompartments (bmc), which are intracellular organelles consisting of enzymes encapsulated within polyhedral, protein-only shells, somewhat similar to viral capsids, have been described. bmcs are composed of a few thousand copies of a few repeated protein species (including one or more enzymes involved in specific metabolic pathways) and have sizes of around 100-150 nm in cross section. the general role of bmcs is to confine toxic or volatile metabolic intermediates, while allowing enzyme substrates, products, and cofactors to pass.
the first described bmc, the carboxysome, was isolated in the early 1970s 253, 254 and has been found to contain both co2-fixing ribulose bisphosphate carboxylase/oxygenase (rubisco) 253, 254 and carbonic anhydrase [255] [256] [257] enzymes. the function of carboxysomes is to enhance autotrophic co2 fixation at low co2 levels. other bmcs were later identified in cyanobacteria and some chemoautotrophic bacteria. among them, bmc proteins were later found to be encoded in the propanediol utilization operon (pdu) of the heterotroph salmonella 258 and by an operon for metabolizing ethanolamine (eut) in enteric bacterial species, including salmonella and escherichia. 259 salmonella enterica forms a polyhedral organelle during growth on 1,2-propanediol (1,2-pd) as a sole carbon and energy source, but not during growth on other carbon sources. 260, 261 the function of the pdu organelles is to minimize the harmful effects of a toxic intermediate of 1,2-pd degradation (propionaldehyde). [261] [262] [263] other studies have shown that a polyhedral organelle is involved in ethanolamine utilization (eut) by s. enterica. 259 the function of the eut microcompartment is to metabolize ethanolamine without allowing the release of acetaldehyde into the cytosol, therefore minimizing the potentially toxic effects of excess aldehyde in the bacterial cytosol [264] [265] [266] and also preventing volatile acetaldehyde from diffusing across the cell membrane. 267 so far, about 1700 proteins containing bmc domains have been identified, covering at least 10 different bacterial phyla. the typical bmc protein consists of approximately 90 amino acids, with an alpha/beta fold pattern. 268, 269 some individual bmc proteins self-assemble to form hexamers, which further assemble side by side to form the flat facets of the shell.
268, 270, 271 the formation of icosahedral, closed shells from such flat layers was elucidated in part by structural studies in carboxysomes: some bmc proteins assemble to form pentamers, which are located at and form the vertices of the icosahedral shell. 270 mechanisms directing enzyme encapsulation within protein-based bmcs have been studied in recent years. it has been described that, in some carboxysomes, the protein ccmm is used as a scaffold to form interactions between both shell proteins and enzymes, 272, 273 through a ccmm c-terminal region with homology to the small subunit of rubisco. 274 other studies revealed that pdu shells can self-assemble without needing interior enzymes 275 and that carboxysomes can self-assemble in vivo when rubisco has been deleted. 276 regarding properties of the encapsulated enzymes, in the pdu bmc some of the internal enzymes are encapsulated by specific n-terminal targeting sequences. 275, 277 along these lines, sutter and colleagues 278 described a conserved c-terminal amino acid sequence that mediates the physical interaction of an iron-dependent peroxidase (dyp) or a protein closely related to ferritin (flp) with a specific type of bmc (encapsulins). in another example, an icosahedral enzyme complex, lumazine synthase (aals) from bacillus subtilis and aquifex aeolicus, was engineered to encapsulate target molecules by means of charge complementarity and can also be modified to give different characteristics to the assembled structure. 279 moreover, enzymatic subunits, like e2 of pyruvate dehydrogenase from bacillus stearothermophilus, can be modified to be used in gene delivery. e2 peptides naturally form a 60-subunit dodecahedron 24 nm in diameter, which can be modified to accommodate drug-like molecules. the assembly and disassembly of these structures can be modulated by changing the ph of the experimental environment. these nanoparticles can also be functionalized with antigens for vaccine development.
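the arrangement described above, with pentamers at the vertices and hexamers tiling the facets, can be counted with the caspar-klug triangulation number t = h^2 + hk + k^2. the sketch below assumes an ideal quasi-equivalent icosahedral shell, which the text does not state for bmcs, so the numbers are illustrative rather than a description of any particular microcompartment. the function and key names are hypothetical.

```python
def icosahedral_shell(h: int, k: int) -> dict:
    """Count capsomers and subunits of an ideal icosahedral shell.

    h and k are the non-negative lattice steps that define the
    triangulation number T = h^2 + h*k + k^2 (not both zero).
    """
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    t_number = h * h + h * k + k * k
    pentamers = 12                    # one per icosahedral vertex
    hexamers = 10 * (t_number - 1)    # tile the facets between vertices
    subunits = 5 * pentamers + 6 * hexamers  # equals 60 * T
    return {
        "T": t_number,
        "pentamers": pentamers,
        "hexamers": hexamers,
        "subunits": subunits,
    }


if __name__ == "__main__":
    for h, k in ((1, 0), (1, 1), (2, 0), (2, 1)):
        print((h, k), icosahedral_shell(h, k))
```

with h = 1 and k = 0 (t = 1) the shell has 60 subunits and no hexamers, the same subunit count quoted above for the e2 assembly; larger t values add hexamers while the number of pentamers stays fixed at 12.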
281, 282 according to these results, specific targeting sequences could be of use in biotechnological applications to package proteins inside the stable self-assembled icosahedral shell of bmcs, offering appealing opportunities to manipulate such nanocages in the laboratory and fill them with therapeutic molecules. the simplicity of this system makes it very attractive for engineering studies to design, mimicking nature, new applications in biotechnology, providing a new, intriguing platform of microbial origin for drug delivery. bovine serum albumin (bsa) is able to form microspheres after sonochemical treatment in aqueous medium. chemical effects of ultrasound radiation and coupling with an anticancer drug such as taxol (paclitaxel) led to the assembly of a spherical carrier with an average diameter of 120 nm. bsa particles resulting from s-s bonds, due to ho2 radical formation, are able to release the encapsulated taxol in cancer tissue, with better results than treatment with taxol alone. this drug for breast cancer treatment is commercially available. 283, 284 small cationic peptides can also lead to self-assembling particles. among others, arginine-rich cationic peptides are widely known as good tools for gene delivery. for example, purified r9-tailored gfp in solution is described to form nanodisk particles 20 nm in diameter. this structure has been shown to be induced by the 9 arg tails and is able to bind and condense dna. these nanodisks are also able to deliver dna toward the nucleus, where the reporter gene is expressed. 285 on the other hand, the expression of recombinant proteins above physiological rates can impair the cellular quality control system, leading to self-organizing, pseudo-spherical protein aggregates known as inclusion bodies. these mechanically stable nanoparticles, ranging from 50 to 500 nm in diameter, were considered for a long time as undesired by-products.
recently, it became clearer that they are suitable for medical approaches when utilized as scaffold surfaces to promote cellular proliferation. [286] [287] [288] one of the most difficult goals in foreign gene delivery is reaching the nucleus. one approach to overcome this obstacle is to fuse an nls at a nonessential position of a dna-binding protein. this type of modification has been described for a tetracycline repressor protein (tetr) fused with an sv40 nls. the affinity and specificity of tetr-nls for the teto dna sequence are exploited to form spontaneous protein-dna complexes which enhance dna transport into the nucleus and subsequent expression of foreign genes, combining the characteristic properties of each fusion component. 289 there is still a tremendous gap between the progress made in protein-based nanoparticle research for drug delivery and clinical reality. hundreds of publications in basic research describe the combination of two or more functional elements in a single protein nanoparticle, by which the delivery of a carried drug is enhanced. these agents act by improving critical steps in the drug delivery process, such as increasing the systemic stability or tissue specificity, favoring internalization, endosomal escape, and entry into the nucleus, or transporting therapeutic material through the bbb, in in vitro and in vivo studies. besides the human recombinant therapeutic proteins currently on the market (or functional segments of them), there are also some fusion proteins approved for clinical use (most by incorporating an antibody fragment or a ligand to enhance cell specificity). sadly, no gene therapy trials have so far used full protein carriers in vivo, but rather peptide-functionalized vehicles.
one bottleneck between research and clinical application is that the us fda and the european medicines agency (emea) only approve human proteins, to avoid the risk of an immune response that could affect not only the effectiveness of the nanoparticle but also endanger patients' health. another critical factor is the administration route, since the protein may be degraded before arriving at the target; this problem could be solved or minimized by the use of protein d-isomers, pegylation, or the design of protecting groups for labile sites. despite the current situation mentioned above, there are many good examples of multifunctional modular proteins that, when carrying therapeutic material, can improve the prognosis in vivo in animal models for different diseases. these examples are reviewed below, along with those few protein nanoparticles that are currently on the market or in clinical trials. albumin is a natural protein transporter of hydrophobic molecules throughout plasma that has been approved by the fda to reversibly bind water-insoluble anticancer agents, as is the case of albumin-bound (nab) paclitaxel, abraxane. this albumin-nab technology-based drug is in use in patients with metastatic breast cancer who have failed combination therapy, and it is the first protein nanoparticle approved by the fda. albumin potentiates paclitaxel concentration within the tumor by increasing paclitaxel endothelial transcytosis through caveolae formation. a further contribution comes from the fact that tumors secrete an albumin-binding protein, sparc (also called bm-40), to attract and keep albumin-bound nutrients inside the tumor cell. 290 the albumin-paclitaxel complex was not formally considered a nanoparticle in the united states (due to an average size of 130 nm) but only so in europe. apart from the whole recombinant therapeutic proteins currently commercialized, there are also some examples of vehicles formed by chimeric proteins with target ligands already on the market.
dab389il-2 (denileukin diftitox or ontak) is a fusion of the diphtheria toxin catalytic and translocation domains, for lethal effect, with interleukin-2 (il-2), to gain cell specificity, for the treatment of persistent or recurrent t-cell lymphoma. belatacept (bms-224818) is a ctla4-ig fusion protein formed by the cytotoxic t-lymphocyte-associated antigen 4 joined to an immunoglobulin g1 fc fragment, developed by bristol-myers squibb. etanercept (enbrel) fuses the tumor necrosis factor receptor (tnfr), which specifically binds tnf and inhibits its activity, to an immunoglobulin g1 fc, to prevent tnf-mediated inflammation in autoimmune diseases like arthritis and psoriasis. there are also fusion proteins that include an anti-human epidermal growth factor receptor 2 (her2) monoclonal antibody that binds tumor cell surfaces, among them trastuzumab (commercialized as herceptin by roche) associated to dm-1, an antimitotic drug, aimed at improving the treatment of breast cancer. finally, vlps, that is, empty viral entities formed by the self-assembly of a viral capsid protein, are the only true protein nanoparticles (architectonically speaking) currently used in clinical practice. the hbsag recombinant protein of hbv expressed in yeast and the capsid l1 recombinant protein of hpv (types 6, 11, 16, and 18), currently administered as vaccines, tend to spontaneously form vlps that elicit t and b immune responses. recently, there have been preclinical and clinical trials to test the safety and efficacy of vlp vaccines against chikungunya 291 and seasonal influenza virus (http://www.medpagetoday.com/meetingcoverage//icaac/22129), respectively. influenza vlp vaccines have proven to provide complete protection against the 2009 h1n1 flu pandemic, 292 within a record preparation time when compared to the 9 months needed for traditional vaccines. the use of vlps as a delivery system for drugs or nucleic acids in gene therapy is still under investigation. 
194 drugs and proteins may be modified through pegylation, a process that can help them overcome some of the potential problems that delay the adoption of protein nanoparticles for clinical use. the covalent attachment of peg can reduce immunogenicity and antigenicity by hiding the particle from the immune system, can increase the circulating time by reducing renal clearance, and can also improve the water solubility of a hydrophobic particle. the use of pegylation has been approved for commercial use by the fda and emea, and some examples of pegylated protein products are adagen (peg-bovine adenosine deaminase), the first pegylated protein approved by the fda in 1990, pegasys (peg-interferon alpha), and oncaspar (peg-l-asparaginase). the majority of protein nanoparticles studied in clinical trials (http://clinicaltrials.gov) are fusion proteins composed of a therapeutic protein/peptide and a target cell-specific ligand. an example is alt-801, a biologic compound composed of il-2 genetically fused to a humanized soluble t-cell receptor directed against the p53-derived antigen. the clinical trials evaluated whether using alt-801 to direct il-2 activity to the patient's tumor sites that overexpress p53 results in clinical benefits (nct01029873, nct00496860). another ligand joined to il-2 is l19, a single-chain variable fragment (scfv) directed against the ed-b domain of fibronectin, one of the most important markers of neoangiogenesis; the resulting fusion is a tumor-targeted immunocytokine. l19-il-2 is in a phase i/ii study for patients with solid tumors and renal cell carcinoma (rcc) (nct01058538). l19 has also been fused to tnfα with the intention of targeting tnfα directly to tumor tissues, resulting in high and sustained intralesional concentrations of bioactive tnfα. 
l19-tnfα is in a clinical trial using isolated inferior limb perfusion (ilp) together with the standard treatment, melphalan at 10 mg/l limb volume, in subjects affected by stage iii/iv limb melanoma (nct01213732). ngr-htnf is another bifunctional protein, which combines a tumor-homing peptide (ngr) that selectively binds to aminopeptidase n/cd13, highly expressed on tumor blood vessels, thus affecting tumor vascular permeability, with htnf, which has direct anticancer activity. ngr-htnf is undergoing 14 clinical trials as a single agent to treat different cancers, as well as in combination with chemotherapy agents. another strategy to direct a therapeutic protein to the target cell is through fusion to a growth factor receptor ligand. an example is tp-38, a recombinant chimerical protein composed of the egfr-binding ligand (tgf-α) and a genetically engineered form of the pseudomonas exotoxin, pe-38, to treat recurrent grade iv malignant brain tumors (nct00071539). many clinical trials are based on a therapeutic protein fused to a targeting antibody, as is the case of apc8015. this drug stimulates the immune system and stops cancer cells from growing, in a combination of biological therapies with bevacizumab, an already approved monoclonal antibody that locates tumor cells and kills them in a specific way (nct00849290). there are also many putative protein drugs against cancer that include anti-integrin antibodies (e.g., cilengitide and imgn388), sometimes in combination with classical therapies. nanobodies, or single-domain antibodies, 293 are a recently developed tool with several advantages: small size (only 12-15 kda), which lowers the possibility of triggering an immune response; safety in clinical trials (nct01020383); and ease of conjugation to different kinds of compounds. 
all these features make nanobodies competent drugs against different diseases, and they have been tested in vivo as bifunctional proteins associated to a prodrug, proving very efficient in mouse cancer xenografts. 294 even though cpps are very useful tools for drug delivery and gene therapy (see the chapter ''peptide nanoparticles for oligonucleotide delivery'' by lehto et al. in this volume), their toxicity and endosomal entrapment slow their inclusion for systemic delivery in clinical trials. nevertheless, there are a few examples of their use. one is the prevention of undesirable cell proliferation in coronary artery bypass grafts with a cpp (r-ahx-r)4ahxb-pmo conjugate targeted to human c-myc and applied ex vivo. the trial, in phase ii, was completed in 2009 (nct00451256). another case is psorban, a product patented for the treatment of psoriasis based on a cyclosporine-polyarginine conjugate for local application, which circumvents the specificity problem of intravenous (i.v.) application. it is in a phase iii clinical trial, but not yet on the market. finally, kai-9803, a pkcδ inhibitor peptide conjugated to tat to function as an intravenous drug for the treatment of acute myocardial infarction, is currently in a phase 2b clinical trial (nct00785954, kai pharmaceuticals). there are many proteins, often organized as nanoparticles, that when associated to a drug, therapeutic protein, peptide, or nucleic acid increase therapeutic efficacy relative to the cargo alone in the treatment of various diseases. some of them have proved effective in animal models; these are discussed in more detail in this section, with relevant examples listed in table ii. 
these nanoparticles may simply be (a) a cpp to promote nonspecific internalization, 295-300 (b) a peptide that confers cargo specificity by binding a receptor distinctive of a cell type, including scfvs or peptides obtained by phage display, 301 or (c) a mixture of both, 302 since, as observed in several studies, the cpp does not reduce ligand specificity and increases nanoparticle potency. [303] [304] [305] more complex multifunctional vehicles include endosomal escape peptides, which enhance the therapeutic potency of the complex, or other domains that allow their selective activation in certain contexts. 306, 307 apart from the cases listed in table ii, the spectrum of additional examples of multidomain protein nanoparticles tested in vivo is wide, and a considerable proportion of them include cpps, mainly tat and polyarginines. a classical tat fusion protein is the transducible d-isomer ri-tatp53c′ fusion protein, which activates p53 protein in cancer cells but not in normal cells. ri-tatp53c′ treatment in preclinical models of terminal peritoneal carcinomatosis and peritoneal lymphoma results in significant increases in life span (more than sixfold) and full recovery from the disease. 308 there are also several in vivo studies using tat-fused therapeutic proteins, which have proven effective in treating tumors [309] [310] [311] and cerebral ischemia 312,313 when applied intraperitoneally (i.p.). regarding polyarginines, kumar and colleagues have presented two different models in which a bifunctional peptide formed by nine arginines (9r) and a specific ligand constitutes an effective sirna vehicle. in the first model, a chimerical peptide derived from rabies virus glycoprotein (to confer neuronal specificity) fused to nine d-arginines (rvg-9r) was able to transport sirna across the bbb and silence specific gene expression in the brain when applied intravenously. 
56 in the second model, a cd7-specific single-chain antibody was conjugated to an oligo-9-arginine peptide (scfvcd7-9r) for t cell-specific antiviral sirna delivery in humanized mice reconstituted with human lymphocytes. in hiv-infected humanized mice, this treatment controlled viral replication and prevented the disease-associated cd4 t cell loss. moreover, it effectively suppressed viremia in infected mice. 314 some other examples of polyarginines in tumor models are 9-d-arginines fused to a tumor-suppressor peptide, which stopped tumor growth in hepatocellular carcinoma-bearing mice when applied intraperitoneally, and cholesteryl oligoarginines carrying vegf sirna, which inhibited tumor growth in colon adenocarcinoma after local application. 315 another bbb-crossing peptide is g7, which is able to transport nanoparticles loaded with loperamide. 107 in general, the partner fusion peptide can confer specificity instead of penetrability, as is the case of an egfr fab fragment associated to liposomes that contain an anticancer drug, which increases the efficiency of the anticancer effect in egf-overexpressing xenograft tumors 316 ; in addition, rgd-4c-doxorubicin in human breast xenografts increases efficacy and diminishes toxicity. 317 in many conjugates, the therapeutic peptide of the chimerical protein is a toxin. anthrax lethal toxin has been modified to be activated by metalloproteases, and it has proved to be effective against human xenografted tumors such as melanoma, lung, and colorectal cancer. 318 anthrax toxin has also been associated to antibodies or growth factors for lethal effects specifically on cancer cells. 319 the specific cytotoxicity desired to treat a tumor might derive from a tissue factor, which promotes clotting to restrict blood supply in tumor vessels, fused to peptides that provide specificity, like v-cam antibodies, fibronectin, and integrin ligands. 
320 occasionally, drug activity may decrease when the drug is conjugated to a carrier protein, although if the entry of the drug is favored, the overall balance of activity can be much more favorable. 321 on the other hand, the use of noncovalently bound drug carriers could avoid interfering with the activity of the drug. an important issue for a preclinical study to be considered for a clinical trial is the administration route. in in vivo experiments, most of the protein nanoparticles are administered by local or intraperitoneal injection, avoiding systemic spreading and clearance in the vascular system, in a way very similar to in vitro experiments. the fda and emea, on the other hand, will preferentially approve i.v. and oral administration rather than intraperitoneal or local injection, except for very accessible tissues. another relevant issue is the number of active domains to be included in a therapeutic protein carrier, which seems to affect the functionality of the construct. for example, the cpp neutralization of a ligand may depend on the cpp/ligand ratio in the vehicle. 322 it has also been observed that the integrin-binding power of rgd-containing motifs increases with the number of rgd domains over the monomer, up to a maximum of four moieties. 323 another example is the enhancement of tat activity when it is attached to molecules that form tetramers, such as beta-galactosidase 108 and p53. 324 some multidomain protein carriers allow drug entry only into selected target cells by tailored, smart selective mechanisms. 325 for instance, cpps neutralized by polyanions are activated and enter the cells when they are released by metalloproteases 326 or by a lowering of the ph, 327 both situations being very common in tumors. cpp-morpholino oligomer (pmo) nanoparticles have also shown their effectiveness in treating viral infections by inhibiting viral replication, as demonstrated with the carrier (r-ahx-r)4ahxb-pmo administered i.v. 
in animal models infected with picornaviruses, i.p. in mice infected with coronaviruses and flaviviruses, and the carrier r9f2c-pmo administered also i.p. in mice infected with ebola virus. furthermore, it has also been shown in some of these studies that the efficacy of the treatment is dependent on the incorporation of arginine-rich peptides in the nanoparticle. 328 a good example of how a cpp can improve the internalization of a therapeutic protein is the case of insulin. the instability and low absorption of insulin in the digestive tract prevent its oral administration, even though this would be very convenient for a drug administered daily. in recent studies, noncovalent conjugation of insulin to different cpps enhanced its absorption without toxic intestinal effects, l-penetratin being the most efficient insulin carrier. 329 among the protein nanoparticles tested in vivo, it is worth making special mention of the trojan horses generated in pardridge's laboratory to cross the bbb, through a strategy of fusing, within a chimerical peptide, the therapeutic protein that has to reach the cns to a monoclonal antibody against the human insulin receptor (hirmab). this trojan horse is very potent in humans and primates, and has proven effective in transporting β-glucuronidase, α-l-iduronidase, gdnf, abeta amyloid peptides, paraoxonase, etc., with potential benefits in diseases like mucopolysaccharidosis type vii, hurler syndrome, parkinson's disease, alzheimer's disease, and organophosphate toxicity, respectively. 330 there are also promising results when protein nanoparticles have been tested as carriers for gene therapy in vivo, some examples being listed in table ii. 
in this regard, modular proteins generated by insertional mutagenesis of β-galactosidase and condensing the sod gene are able to protect neurons against ischemic injury 133 ; a bifunctional galactosylated polylysine is able to conjugate plasmid dna and to differentially promote expression in hepatocytes that display the asialoglycoprotein receptor 331 ; and a suicide multidomain protein particle formed by herpes simplex virus thymidine kinase (hsv-tk) conjugated to transferrin (tf) by a biotin-streptavidin bridge, administered i.v. in k562 massively metastasized nude mice, was able to reduce tumor size and to increase mouse survival. 332

in this chapter, proteins and peptides have been envisioned as potent biotechnological tools for the development of new biocompatible biological entities that can be used as therapeutic agents by themselves or as nanovehicles for the delivery of associated drugs. proteins are nanostructures that can form complex high-order entities such as vlps, resulting in appropriate cages for the internalization of therapeutic molecules. in addition, the design of modular proteins displaying selected functions has been made possible by using in silico approximations to the feasibility of recombinant protein production. this approach has demonstrated the versatility of such molecules in the generation of novel delivery nanovehicles, opening up the possibility of new functional combinations to enhance the specific interaction with the target tissue. such tunable specificity in the delivery of drugs, nucleic acids, or other proteins is one of the main properties that make multifunctional proteins appealing as more rational delivery vehicles. the presence on the market of such complex entities, which started with the approval of insulin for the treatment of diabetes, has been increasing over the past years, and this tendency is expected to continue. 
in fact, there are some products in clinical trials that will probably end up being approved, and some more are being explored in preclinical experiments and might enter clinical trials.

identifying actives from hts data sets: practical approaches for the selection of an appropriate hts data-processing method and quality control review natural products in the process of finding new drug candidates when analoging is not enough: scaffold discovery in medicinal chemistry dose-toxicity models in oncology is declining innovation in the pharmaceutical industry a myth? the impact of pegylation on biological therapies anticancer activity of celastrol in combination with erbb2-targeted therapeutics for treatment of erbb2-overexpressing breast cancers soon-shiong p. sparc expression correlates with tumor response to albumin-bound paclitaxel in head and neck cancer patients improved effectiveness of nanoparticle albumin-bound (nab) paclitaxel versus polysorbate-based docetaxel in multiple xenografts as a function of her2 and sparc status pharmacokinetics of plasmid dna in the rat instability, stabilization, and formulation of liquid protein pharmaceuticals peptide-guided gene delivery artificial viruses: a nanotechnological approach to gene delivery approaches to transport therapeutic drugs across the blood-brain barrier to treat brain diseases multifunctional protein nanocarriers for targeted nuclear gene delivery in nondividing cells synthetic and natural polycations for gene therapy: state of the art and new perspectives structure-activity relationships of poly(l-lysines): effects of pegylation and molecular shape on physicochemical and biological properties in gene delivery systemic circulation of poly(l-lysine)/dna vectors is influenced by polycation molecular weight and type of dna: differential circulation in mice and rats and the implications for human gene therapy a novel dnapeptide complex for efficient gene transfer and expression in mammalian cells branched 
cationic peptides for gene delivery: role of type and number of cationic residues in formation and in vitro activity of dna polyplexes comparative gene transfer efficiency of low molecular weight polylysine dna-condensing peptides low molecular weight disulfide cross-linking peptides as nonviral gene delivery carriers protamine sulfate enhances lipid-mediated gene transfer protamine-induced condensation and decondensation of the same dna molecule evaluation of nuclear transfer and transcription of plasmid dna condensed with protamine by microinjection: the use of a nuclear transfer score the protamine family of sperm nuclear proteins membrane-active peptides for non-viral gene therapy: making the safest easier enhancement of msh receptor-and gal4-mediated gene transfer by switching the nuclear import pathway target cell-specific dna transfer mediated by a chimeric multidomain protein: novel non-viral gene delivery system a multi-domain protein system based on the hc fragment of tetanus toxin for targeting dna to neuronal cells refined solution structure of the dna-binding domain of gal4 and use of 3j(113cd,1h) in structure determination immune responses to gene therapy vectors: influence on vector function and effector mechanisms peptide-assisted traffic engineering for nonviral gene therapy cell-penetrating peptides: a reevaluation of the mechanism of cellular uptake cell penetrating peptides: overview and applications to the delivery of oligonucleotides modular protein engineering in emerging cancer therapies oligomers of the arginine-rich motif of the hiv-1 tat protein are capable of transferring plasmid dna into cells tat-mediated delivery of heterologous proteins into cells tat peptide-mediated cellular delivery: back to basics a truncated hiv-1 tat protein basic domain rapidly translocates through the plasma membrane and accumulates in the cell nucleus cellular uptake [correction of utake] of the tat peptide: an endocytosis mechanism following ionic 
interactions the design, synthesis, and evaluation of molecules that enable or enhance cellular uptake: peptoid molecular transporters delivery of short interfering rna using endosomolytic cell-penetrating peptides conjugate for efficient delivery of short interfering rna (sirna) into mammalian cells cell penetration by transportan cellular translocation of proteins by transportan integrin-mediated vectors for gene transfer and therapy inhibition of tumor growth by rgd peptide-directed delivery of truncated tissue factor to the tumor vasculature hiv coreceptor downregulation as antiviral principle: sdf-1alpha-dependent internalization of the chemokine receptor cxcr4 contributes to inhibition of hiv replication cxcr4, inhibitors and mechanisms of action the transferrin receptor part ii: targeted delivery of therapeutic agents into cancer cells improved gene delivery into neuroglial cells using a fiber-modified adenovirus vector systemic genetic transfer of p21waf-1 and gm-csf utilizing of a novel oligopeptide-based egf receptor targeting polyplex specific systemic nonviral gene delivery to human hepatocellular carcinoma xenografts in scid mice a new n-acetylgalactosamine containing peptide as a targeting vehicle for mammalian hepatocytes via asialoglycoprotein receptor endocytosis transvascular delivery of small interfering rna to the central nervous system a novel peptide, plaeidgielty, for the targeting of alpha9beta1-integrins a synthetic peptide vector system for optimal gene delivery to corneal endothelium secretin-mediated gene delivery, a specific targeting mechanism with potential for treatment of biliary and pancreatic disease in cystic fibrosis a synthetic peptide containing loop 4 of nerve growth factor for targeted gene delivery neurotensin-spdp-poly-l-lysine conjugate: a nonviral vector for targeted gene delivery to neural cells identification of peptides that target the endothelial cell-specific lox-1 receptor selective transport of an anti-transferrin 
receptor antibody through the blood-brain barrier in vivo humanization of anti-human insulin receptor antibody for drug targeting across the human blood-brain barrier anti-gad antibody targeted non-viral gene delivery to islet beta cells novel challenges in exploring peptide ligands and corresponding tissue-specific endothelial receptors from combinatorial chemistry to cancer-targeting peptides cell surface adherence and endocytosis of protein transduction domains influenza virus hemagglutinin ha-2 n-terminal fusogenic peptides augment gene transfer by transferrin-polylysine-dna complexes: toward a synthetic virus-like gene-transfer vehicle the influence of endosomedisruptive peptides on gene transfer using synthetic virus-like gene transfer systems ph-dependent bilayer destabilization by an amphipathic peptide mechanism of leakage of phospholipid vesicle contents induced by the peptide gala association of a ph-sensitive peptide with membrane vesicles: role of amino acid sequence gala: a designed synthetic ph-responsive amphipathic peptide with applications in drug and gene delivery design, synthesis, and characterization of a cationic peptide that binds to nucleic acids and permeabilizes bilayers new basic membrane-destabilizing peptides for plasmid-based gene delivery in vitro and in vivo melittin enables efficient vesicular escape and enhanced nuclear access of nonviral gene delivery vectors the third helix of the antennapedia homeodomain translocates through biological membranes trojan peptides: the penetratin system for intracellular delivery histidine-rich peptides and polymers for nucleic acids delivery membrane permeabilization and efficient gene transfer by a peptide containing several histidines histidine containing peptides and polypeptides as nucleic acid vectors characterization of the gene transfer process mediated by histidine-rich peptides the proteasome metabolizes peptide-mediated nonviral gene delivery systems inhibition of ubiquitin-dependent 
proteolysis by a synthetic glycine-alanine repeat peptide that mimics an inhibitory viral sequence a minimal glycine-alanine repeat prevents the interaction of ubiquitinated i kappab alpha with the proteasome: a new mechanism for selective inhibition of proteolysis cis-inhibition of proteasomal degradation by viral repeats: impact of length and amino acid composition recognition of novel viral sequences that associate with the dynein light chain lc8 identified through a pepscan technique mechanisms of nuclear protein import gene delivery: a single nuclear localization signal peptide is sufficient to carry dna to the cell nucleus epstein-barr virus nuclear antigen 1 forms a complex with the nuclear transporter karyopherin alpha2 identification of the human c-myc protein nuclear translocation signal a chimeric fusion protein containing transforming growth factor-alpha mediates gene transfer via binding to the egf receptor the requirement of h1 histones for a heterodimeric nuclear import receptor core histones and linker histones are imported into the nucleus by different pathways nuclear targeting peptide scaffolds for lipofection of nondividing mammalian cells the carboxyl 35 amino acids of sv40 vp3 are essential for its nuclear accumulation pentapeptide nuclear localization signal in adenovirus e1a identification of domains involved in nuclear uptake and histone binding of protein n1 of xenopus laevis competition between nuclear localization and secretory signals determines the subcellular fate of a single cug-initiated form of fgf3 the human poly(adpribose) polymerase nuclear localization signal is a bipartite element functionally separate from dna binding and catalytic activity two interdependent basic domains in nucleoplasmin nuclear targeting sequence: identification of a class of bipartite nuclear targeting sequence long-term pharmacologically regulated expression of erythropoietin in primates following aav-mediated gene transfer rapid promoter analysis in 
developing mouse brain and genetic labeling of young neurons by doublecortin-dsred-express liverrestricted expression of the canine factor viii gene facilitates prevention of inhibitor formation in factor viii-deficient mice blood-brain barrier delivery targeting the central nervous system: in vivo experiments with peptide-derivatized nanoparticles loaded with loperamide and rhodamine-123 in vivo protein transduction: delivery of a biologically active protein into the mouse genetic engineering, expression, and activity of a fusion protein of a human neurotrophin and a molecular trojan horse for delivery across the human blood-brain barrier genetic engineering of a lysosomal enzyme fusion protein for targeted delivery across the human blood-brain barrier gdnf fusion protein for targeted-drug delivery across the human blood-brain barrier directed evolution: selecting today's biocatalysts novel methods for directed evolution of enzymes: quality, not quantity chemical and biochemical strategies for the randomization of protein encoding dna sequences: library construction methods for directed evolution rapid evolution of a protein in vitro by dna shuffling protein design automation advances in protein structure prediction and de novo protein design: a review solution structure and dynamics of a de novo designed three-helix bundle protein de novo protein design: fully automated sequence selection modular protein engineering for non-viral gene therapy a modular dna carrier protein based on the structure of diphtheria toxin mediates target cell-specific gene delivery gene therapy progress and prospects: non-viral gene therapy by systemic delivery using nuclear targeting signals to enhance non-viral gene transfer delivery of bioactive molecules into the cell: the trojan horse approach synthesis of cell-penetrating peptides and their application in neurobiology complexes of plasmid dna with basic domain 47-57 of the hiv-1 tat protein are transferred to mammalian cells by 
the authors appreciate the financial support received through grants bfu2010-17450 from micinn, ps0900165 from fiss, and 2009sgr-108 from agaur.
the authors also acknowledge the support of the ciber de bioingeniería, biomateriales y nanomedicina (ciber-bbn), an initiative funded by the vi national r&d&i plan 2008-2011, iniciativa ingenio 2010, consolider program, ciber actions and financed by the instituto de salud carlos iii with assistance from the european regional development fund. key: cord-335382-fk4um9nw authors: farver, carol f.; zander, dani s. title: molecular basis of pulmonary disease date: 2012-08-10 journal: molecular pathology doi: 10.1016/b978-0-12-374419-7.00018-4 sha: doc_id: 335382 cord_uid: fk4um9nw pulmonary pathology includes a large spectrum of both neoplastic and non-neoplastic diseases that affect the lung. many of these are a result of the unusual relationship of the lung with the outside world. every breath that a human takes brings the outside world into the body in the form of infectious agents, organic and inorganic particles, and noxious agents of all types. although the lung has many defense mechanisms to protect itself from these insults, these are not infallible; therefore, lung pathology arises. damage to the lung is particularly important given the role of the lung in the survival of the organism. any impairment of lung function has widespread effects throughout the body, since all organs depend on the lungs for the oxygen they need. pulmonary pathology catalogs the changes in the lung tissues and the mechanisms through which these occur. this chapter presents a review of lung pathology and the current state of knowledge about the pathogenesis of each disease. it suggests that a clear understanding of both morphology and mechanism is required for the development of new therapies and preventive measures. pulmonary pathology includes a large spectrum of both neoplastic and non-neoplastic diseases that affect the lung.
many of these are a result of the unusual relationship of the lung with the outside world. every breath that a human takes brings the outside world into the body in the form of infectious agents, organic and inorganic particles, and noxious agents of all types. although the lung has many defense mechanisms to protect itself from these insults, these are not infallible and so lung pathology arises. damage to the lung is particularly important given the role of the lung in the survival of the organism. any impairment of lung function has widespread effects throughout the body, since all organs depend on the lungs for the oxygen they need. pulmonary pathology catalogs the changes in the lung tissues and the mechanisms through which these occur. what follows is a review of lung pathology and the current state of knowledge about the pathogenesis of each disease. we believe that a clear understanding of both morphology and mechanism is required for the development of new therapies and preventive measures. lung cancer is a major cause of morbidity and mortality throughout the world. the most recent estimates available from the surveillance, epidemiology, and end results (seer) program of the national cancer institute are that in 2007 over 213,000 people in the united states were diagnosed with cancer of the lung and bronchus, and over 160,000 will have died due to this disease [1] . however, in the past decade incidence and mortality rates have begun to move in a more positive direction, particularly in men. overall, men show a decline in lung cancer incidence, while in women, although lung cancer rates grew from 1975 through 1998, they stabilized from 1998 through 2004 [2] . similarly, cancer death rates due to lung cancer have declined for men and have slowed for women. although, for women, lung cancer death rates have increased since 1975, the rate of increase has slowed to 0.2% annually from 1995 to 2004 [2] . 
these trends parallel changes in the prevalence of tobacco smoking, the most important risk factor for development of lung cancer. given the tremendous societal and individual impacts of this disease, it is not surprising that the molecular biology of lung cancer is a major focus of investigation. elucidation of the molecular pathogenesis of these neoplasms has progressed significantly, offering insights into new, targeted therapies, and predictors of prognosis and therapeutic responsiveness. recognition of precursor lesions for some types of lung cancers has been facilitated by our expanded understanding of early molecular changes involved in carcinogenesis. the world health organization (who) classification scheme is the most widely used system for classification of these neoplasms (table 18.1) [3]. although there are numerous histologic types and subtypes of lung cancers, most of the common malignant epithelial tumors can be grouped into the categories of non-small cell lung cancers (nsclcs) and small cell carcinomas (sclcs). nsclcs include adenocarcinomas (acs), squamous cell carcinomas (sqccs), large cell carcinomas, adenosquamous carcinomas, and sarcomatoid carcinomas. sclcs include cases of pure and combined small cell carcinoma. common pulmonary symptoms associated with these tumors include cough, shortness of breath, chest pain or tightness, and hemoptysis (coughing up blood). since some tumors cause airway obstruction, they predispose to pneumonia, which can be an important clue to the existence of a tumor in some patients. constitutional symptoms can include fever, weight loss, and malaise. some neoplasms will declare themselves with symptoms related to local invasion of adjacent structures such as chest wall, nerves, superior vena cava, esophagus, or heart. sclcs are known for early and widespread metastasis and are therefore particularly prone to being discovered through presentations as metastases in distant sites.
some tumors are discovered due to pathophysiologic changes triggered by the release of soluble substances from tumor cells. endocrine syndromes due to elaboration of hormones are well recognized, and include cushing syndrome, syndrome of inappropriate antidiuretic hormone, hypercalcemia, carcinoid syndrome, gynecomastia, and others. hypercoagulability commonly occurs with lung cancers, leading to manifestations of venous thrombosis, nonbacterial thrombotic endocarditis, and disseminated intravascular coagulation. hematologic changes can include anemia, granulocytosis, eosinophilia, and other abnormalities. other paraneoplastic syndromes such as clubbing of the fingers, myasthenic syndromes, dermatomyositis/polymyositis, and transverse myelitis are noted in subsets of patients. when lung cancer is suspected, evaluation of the patient includes a thorough clinical, radiologic, and laboratory assessment, with collection of tissue or cytology samples to establish a pathologic diagnosis of malignancy and to classify the tumor type. fiberoptic bronchoscopy is often performed to collect samples for diagnosis. sample types can include transbronchial and endobronchial biopsies, bronchial brushings, bronchial washings, bronchoalveolar lavage samples, and transbronchial needle aspirates. submission of sputum samples for cytologic examination can provide a diagnosis in some cases, particularly for centrally located tumors such as sqcc and sclc. tumors arising in a peripheral location can also be sampled, in many cases, by fine needle aspiration or core needle biopsy performed under radiologic guidance. if a pleural effusion is present in combination with a lung parenchymal tumor, analysis of the pleural fluid cytology often allows one to establish a diagnosis. pleural biopsy, mediastinoscopy with biopsy, and wedge biopsy can also be performed, depending on the clinical and radiologic findings.
for tumors with apparent distant metastasis, biopsy of the metastatic focus can both establish a pathologic diagnosis and determine the stage of the tumor. the prognosis of lung cancers is closely related to tumor stage. for nsclcs, the american joint committee on cancer tnm staging system is widely used (table 18.2) [4], and for sclcs, disease is classified as limited (restricted to one hemithorax) or extensive. overall, for lung cancers, the 5-year survival is 13.4% for men and 17.9% for women [5]. an important factor leading to this relatively poor survival is the late stage at which many lung cancers are diagnosed. information from the seer database, from 1996-2003, indicates that 16%, 35%, 42%, and 7% of patients were diagnosed with localized, regional, distant, or unstaged disease, respectively [5]. the corresponding 5-year survival rates are 49.0%, 15.3%, 2.8%, and 8.7%, and 10-year survival rates are 37.8%, 10.3%, 1.6%, and 5.1% [5].

table 18.2 tnm staging system
primary tumor (t)
t1: tumor ≤3 cm in greatest dimension, surrounded by lung or visceral pleura, without bronchoscopic evidence of invasion more proximal than the lobar bronchus
t2: tumor with any of the following features of size or extent: >3 cm in greatest dimension; involves main bronchus ≥2 cm distal to the carina; invades visceral pleura; associated with atelectasis or obstructive pneumonitis that extends to the hilar region but does not involve the entire lung
t3: tumor of any size that directly invades the chest wall, diaphragm, mediastinal pleura, parietal pericardium; or lies <2 cm distal to the carina but without involvement of the carina; or is associated with atelectasis or obstructive pneumonitis of the entire lung
t4: tumor of any size that invades the mediastinum, heart, great vessels, trachea, esophagus, vertebral body, carina; or has separate tumor nodule(s) in same lobe; or is associated with a malignant pleural effusion
regional lymph nodes (n)
nx: regional lymph nodes cannot be assessed
n0: no regional lymph node metastasis
n1: metastasis in ipsilateral peribronchial and/or ipsilateral hilar lymph nodes, including intrapulmonary nodes involved by direct extension of the primary tumor
n2: metastasis in ipsilateral mediastinal and/or subcarinal lymph node(s)
n3: metastasis in contralateral mediastinal, contralateral hilar, ipsilateral or contralateral scalene or supraclavicular lymph node(s)
distant metastasis (m)
mx: distant metastasis cannot be assessed
m0: no distant metastasis
m1: distant metastasis; includes separate tumor nodule(s) in a different lobe
stage grouping
occult: t0 n0 m0
stage 0: tis n0 m0
stage ia: t1 n0 m0
stage ib: t2 n0 m0
stage iia: t1 n1 m0
stage iib: t2 n1 m0; t3 n0 m0
stage iiia: t1 n2 m0; t2 n2 m0; t3 n1 m0; t3 n2 m0
stage iiib: any t n3 m0; t4 any

for patients with nsclcs, treatment depends on stage and comorbid conditions [6]. surgical resection is the preferred approach to treatment of localized nsclcs, provided there is no medical contraindication to operative intervention. lobectomy or more extensive resection (depending on tumor extent) is usually recommended rather than lesser surgeries, unless other comorbid conditions preclude these procedures. intraoperative mediastinal lymph node sampling or dissection is also recommended for accurate pathologic staging and determination of therapy. subsets of patients also benefit from chemotherapy and radiotherapy. for more advanced nsclc and for sclc, chemotherapy and radiotherapy are the primary treatment modalities [6]. rare patients with limited-stage sclcs can be considered for surgical resection with curative intent.
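as a rough consistency check on the seer figures quoted earlier, the stage distribution at diagnosis (16%, 35%, 42%, 7%) can be combined with the stage-specific 5-year survival rates (49.0%, 15.3%, 2.8%, 8.7%) into a single share-weighted average. the sketch below is illustrative only: the function and variable names are hypothetical, and it assumes that simple weighting by stage share is an acceptable approximation, which the source does not state.

```python
# Stage share at diagnosis and stage-specific 5-year survival (percent),
# both copied from the SEER 1996-2003 figures quoted in the text.
stage_share = {"localized": 0.16, "regional": 0.35, "distant": 0.42, "unstaged": 0.07}
five_year_survival = {"localized": 49.0, "regional": 15.3, "distant": 2.8, "unstaged": 8.7}


def weighted_survival(share, survival):
    """Return the share-weighted mean survival in percent.

    Assumption (not from the source): overall survival can be approximated
    as a plain weighted average of the stage-specific rates.
    """
    total_share = sum(share.values())
    return sum(share[stage] * survival[stage] for stage in share) / total_share


overall = weighted_survival(stage_share, five_year_survival)
print(f"approximate overall 5-year survival: {overall:.1f}%")
```

the weighted value comes out at about 15%, which sits between the reported 13.4% for men and 17.9% for women. most of it is contributed by the localized group, even though only 16% of patients are diagnosed at that stage.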
development of lung cancer occurs with multiple, complex, stepwise genetic and epigenetic changes involving allelic losses, chromosomal instability and imbalance, mutations in tumor suppressor genes (tsgs) and dominant oncogenes, epigenetic gene silencing through promoter hypermethylation, and aberrant expression of genes participating in control of cell proliferation and apoptosis [7]. there are similarities as well as type-specific differences in the molecular alterations between nsclcs and sclcs, and between sqccs and acs [8-10]. oncogenes that play a part in the pathogenesis of lung cancer include myc, k-ras (predominantly acs), cyclin d1, bcl2, and erbb family genes such as egfr (epidermal growth factor receptor) (predominantly acs) and her2/neu (predominantly acs) [11, 12]. also, lung cancers often display abnormalities involving tsgs including tp53, rb, p16ink4a, and new candidate tsgs on the short arm of chromosome 3 (dutt1, fhit, rassf1a, fus-1, bap-1) [11, 13]. as research advances, these lists continue to grow, and as knowledge has expanded about the roles of these genes in carcinogenesis and tumor behavior, new targeted therapeutic agents have been designed to treat this disease (figure 18.1 and table 18.3) [14]. many other agents are under investigation. in cancers, chromosomal regions harboring tsgs and oncogenes are often deleted or amplified. allele loss involving loci in 3p14-23 is a consistent feature of lung cancer pathogenesis [15, 16]. wistuba et al. reported allelic losses of 3p, often multiple and discontinuous, in 96% of the lung cancers studied and in 78% of the precursor lesions [15]. larger segments of allelic loss were noted in most sclcs (91%) and sqccs (95%) than in acs (71%) and preneoplastic/preinvasive lesions [15].
there was allelic loss in the 600-kb 3p21.3 deletion region in 77% of the lung cancers; 70% of the normal or preneoplastic/preinvasive lesions associated with lung cancers; and 49% of the normal, mildly abnormal, or preneoplastic/preinvasive lesions found in smokers without lung cancer, but no loss was seen in the samples from people who had never smoked [15]. 8p21-23 deletions are also frequent and early events in the pathogenesis of lung carcinomas [17], and other common alterations include loh at 13q, 17q, 18q, and 22p [16]. allelic losses that are more frequent in sqccs than acs include deletions at 17p13 (tp53), 13q14 (rb), 9p21 (p16ink4a), 8p21-23, and several regions of 3p [11, 15, 17, 18]. a recent study utilizing a bacterial artificial chromosome array to perform high-resolution whole genome profiling of sqcc and ac cell lines showed that regions of frequent amplification shared by both types of tumors included 5p, chromosome 7, 8q, 11q13, 19q, and 20q, and that common regions of deletion included 3p, 4q, 9p, 10p, 10q, chromosome 18, and chromosome 21 [10]. however, acs appeared to have higher frequencies of deletion of chromosome 6, 8p, 9q, 15q, and chromosome 16 than sqccs, and to possess small regions of amplification on chromosomes 12 and 14 not seen in sqccs. chromosome arms 2q and 13q were frequently deleted in ac but amplified in sqcc cell lines. both types of tumors showed deletion of chromosome arm 17p, but it was more frequent in the sqcc cell lines, while amplification of chromosome 17p was more frequent in acs. amplification of chromosome 3q was common to both types of tumors but showed frequent alteration at 3q23-3q26 in the sqcc lines and at 3q22 in the ac lines. inactivation of recessive oncogenes is believed to occur through a two-stage process.
it has been suggested that the first allelic inactivation occurs, often via a point mutation, and the second allele is later inactivated by a chromosomal deletion, translocation or other alteration such as methylation of the gene promoter region [19]. inactivating mutations in the tsg tp53, which encodes the p53 protein, are the most frequent mutations in lung cancers. these mutations are found in up to 50% of nsclcs and over 70% of sclcs, and are largely attributable to direct dna damage from cigarette smoke carcinogens [20]. tp53 mutational patterns show a prevalence of g to t transversions in 30% of smokers' lung cancers versus only 12% of lung cancers in nonsmokers [20]. p53 protein is a transcription factor and a key regulator of cell cycle progression; cellular signals induced by dna damage, oncogene expression, or other stimuli trigger p53-dependent responses including initiating cell cycle arrest, apoptosis, differentiation, and dna repair [21]. loss of p53 function in tumor cells can result in inappropriate progression through the dysregulated cell cycle checkpoints and permit the inappropriate survival of genetically damaged cells [22]. the p16ink4a-cyclin d1-cdk4-rb pathway, which plays a central role in controlling the g1 to s phase transition of the cell cycle, is another important tumor suppressor pathway that is often disrupted in lung cancers. it interfaces with the p53 pathway through p14arf and p21waf/cip1. thirty percent to 70% of nsclcs contain mutations of p16ink4a, including homozygous deletion or point mutations and epigenetic alterations, leading to p16ink4a inactivation [22]. almost 90% of sclcs and smaller numbers of nsclcs, on the other hand, display loss of rb expression [23], and mutational mechanisms usually responsible include deletion, nonsense mutations, and splicing abnormalities that lead to truncated rb protein [22].
p16 ink4a leads to hypophosphorylation of the rb protein, which causes arrest of cells in the g1 phase. the active, hypophosphorylated form of rb regulates other cellular proteins including the transcription factors e2f1, e2f2, and e2f3, which are essential for progression through the g1/s phase transition. loss of p16 ink4a protein or increased complexes of cyclin d-cdk4-6 or cyclin e-cdk2 lead to hyperphosphorylation of rb with resultant evasion of cell cycle arrest and progression into s phase [21, 23]. cell cycle progression is inhibited by p21 waf/cip1 through its inhibition of the cyclin complexes. the 10%-30% of nsclcs lacking detectable alterations in p16 ink4a and rb may have abnormalities of cyclin d1 and cdk4, which cause inactivation of the rb pathway [22]. figure 18.2 provides an overview of the p53 and retinoblastoma (rb) pathways, showing the complex interactions between the components [21]. epigenetic alterations (hypermethylation of the 5′ cpg island) of tsgs are also frequent occurrences during pulmonary carcinogenesis, and methylation profiles of nsclcs show relationships to smoke exposure, histologic type, and geography. methylation rates of p16 ink4a and apc and the mean methylation index (mi) (a reflection of the overall methylation status) in current or former smokers were significantly higher than in never smokers; the mean mi of tumors was highest in current smokers; methylation rates of apc, cdh13, and rarbeta were significantly higher in acs than in sqccs; methylation rates of mgmt and gstp1 in cases from the united states and australia significantly exceeded those from japanese and taiwanese cases; and no significant gender-related differences in methylation patterns were found [24]. proto-oncogene activation and growth factor signaling are important in pulmonary carcinogenesis.
the tyrosine kinase epidermal growth factor receptor (egfr) is frequently mutated in nsclcs, particularly in acs, and the mutational status is important in determining response to tyrosine kinase inhibitors. a related pathway, the phosphoinositide 3-kinase (pi3k)/akt/mammalian target of rapamycin (mtor) pathway, is frequently deregulated in pulmonary carcinogenesis. as reviewed by marinov et al., this pathway has been reported to mediate the effects of several tyrosine kinase receptors, including egfr, c-met, c-kit, and igf-ir, on proliferation and survival in nsclc and sclc [25] . clinical trials are ongoing, investigating the efficacy of the mtor inhibitor rapamycin and its analogues on lung cancer [26] . her2/neu is another related receptor tyrosine kinase that is upregulated in approximately 20%-30% of nsclcs [27, 28] , but unlike the situation with her2/neu-positive breast cancers, treatment with anti-her2/neu antibody (trastuzumab) does not seem to yield comparable benefits for nsclc when used alone or in combination with chemotherapy [28, 29] . point mutations of ras family proto-oncogenes (most often at k-ras codons 12, 13, or 61) are detected in 20%-30% of lung acs and 15%-50% of all nsclcs [22] . although farnesyl transferase inhibitors prevent ras signaling, these agents have not shown significant activity as single-agent therapy in untreated nsclc or relapsed sclc [30] . myc family genes (myc, mycn, and mycl), which play roles in cell cycle regulation, proliferation, and dna synthesis, are more frequently activated in sclcs than in nsclcs, either by gene amplification or by transcriptional dysregulation [22] . 
vascular endothelial growth factor (vegf) is a homodimeric glycoprotein that is overexpressed in many lung cancers and directly stimulates endothelial cell proliferation, promotes endothelial cell survival in newly formed vessels, and induces proteases involved in the degradation of the extracellular matrix needed for endothelial cell migration [31]. its angiogenic effects are mediated by three receptors: vegfr-1, vegfr-2, and vegfr-3; ligand binding leads to tyrosine kinase activation and activation of the signaling pathways required for angiogenesis [31]. monoclonal antibodies to vegf (bevacizumab) and tyrosine kinase inhibitors to vegfrs have been developed and show promise for treatment of nsclc. a phase iii trial of bevacizumab showed significantly improved overall and progression-free survival when this agent was used in combination with standard first-line chemotherapy in patients with advanced nsclc, and several small-molecule vegfr tyrosine kinase inhibitors have yielded favorable results in phase i and ii trials in nsclc [32]. micrornas are a recently discovered class of nonprotein-coding, endogenous, small rnas which regulate gene expression by translational repression, mrna cleavage, and mrna decay initiated by mirna-guided rapid deadenylation [33]. some micrornas such as let-7 have been suggested to play roles in carcinogenesis by functioning as oncogenes or tumor suppressors, negatively regulating tsgs and/or genes that control cell differentiation or apoptosis [33]. investigations of the therapeutic potential of micrornas are also under way. in the 2004 version of the who classification scheme, ac is defined as "a malignant epithelial tumour with glandular differentiation or mucin production, showing acinar, papillary, bronchioloalveolar or solid with mucin growth patterns or a mixture of these patterns" [34]. ac has become the most frequent histologic type of lung cancer in parts of the world.
it occurs primarily in smokers, but represents the most common type of lung cancer in people who have never smoked and in women. a small subset of these tumors arise in patients with localized scars or diffuse fibrosing lung diseases such as asbestosis and interstitial pneumonia associated with scleroderma [35]. these neoplasms usually arise in the periphery of the lung, and are more likely to invade the pleura and chest wall than other histologic types of lung cancers. radiologic studies can show one or more nodules, ground-glass opacities, or mixed solid and ground-glass lesions. on gross examination, the neoplasms are often solitary gray-white nodules or masses, sometimes with necrosis or cavitation, which pucker the overlying pleura. mucin-producing tumors can have a glistening, gelatinous appearance. other presentations include a pattern of consolidation resembling pneumonia (usually bronchioloalveolar carcinoma) (figure 18.3), multiple nodules, diffuse interstitial widening due to lymphangitic spread, endobronchial lesions with submucosal infiltration, and diffuse visceral pleural infiltration and thickening resembling mesothelioma. common histologic patterns displayed by acs include acinar (figure 18 [...]); mixtures of these patterns are very frequent. less common histologic subtypes include fetal ac, mucinous (colloid) ac, mucinous cystadenocarcinoma, signet ring ac, and clear cell ac [34]. acs usually exhibit differentiation toward clara cells or type ii pneumocytes or, less often, goblet cells. they manifest a range of differentiation extending from very well-differentiated tumors with extensive gland formation and little cytoatypia, to poorly differentiated, solid tumors that cannot be categorized as acs unless a mucin stain is performed (figure 18.7). however, most examples include readily identifiable glands.
invasiveness is reflected by the presence of neoplastic glands that infiltrate through stroma or pleura, stimulating a fibroblastic (desmoplastic) response (figure 18.4), or by cells in the lumens of blood vessels or lymphatics. in recent years, atypical adenomatous hyperplasia (aah) has been recognized as a precursor lesion for peripheral pulmonary acs. this lesion is defined as "a localized proliferation of mild to moderately atypical cells lining involved alveoli and, sometimes, respiratory bronchioles, resulting in focal lesions in peripheral alveolated lung, usually less than 5 mm in diameter and generally in the absence of underlying interstitial inflammation and fibrosis" (figure 18.8) [36]. aah exists on a histologic continuum with bronchioloalveolar carcinoma (bac), which is defined as an in situ (noninvasive) form of ac, in which the neoplastic cells grow along alveolar septa (lepidic growth) without invasion of stroma or vasculature (figure 18.5, figure 18.6) [34]. most bacs exceed 1 cm in diameter and consist of cells with greater degrees of cytoatypia than aah. although aah is found in approximately 3% of patients without lung cancer at autopsy [37], it has been reported in 9%-21% of lung resection specimens with all types of primary lung cancer and 16%-35% of lung resection specimens with ac [36]. the progenitor cell for bac and aah is believed to be an epithelial cell located at the junction between the terminal bronchiole and alveolus, termed the bronchioalveolar stem cell [38]. a recently published large-scale study of primary lung acs, using dense single nucleotide polymorphism arrays, described 57 significantly recurrent copy-number alterations in these tumors (table 18.4) [12]. twenty-six of 39 autosomal chromosome arms showed consistent large-scale copy-number gain or loss, and 31 recurrent focal events, including 24 amplifications and 7 homozygous deletions, were found.
although some of the alterations involved regions known to harbor a proto-oncogene or tsg, these genes remain to be identified in some of the other regions affected. amplification of chromosome 14q13.3 was the most common event noted, found in 12% of samples. this region includes nkx2-1, which encodes a lineage-specific transcription factor (thyroid transcription factor-1 [ttf-1]) that activates transcription of target genes including the surfactant proteins, and may be an important proto-oncogene involved in a significant fraction of lung acs. immunohistochemical staining for ttf-1 can be performed to detect expression of this factor in most lung adenocarcinomas, aiding in the determination of the lung as the site of origin of the tumor (figure 18.9). in additional work, small interfering rna (sirna)-mediated knockdown of this gene in lung cancer cell lines with amplification led to reductions in tumor cell proliferation, through both decreased cell cycle progression and increased apoptosis, suggesting that gene amplification and overexpression contribute to lung cancer cell proliferation rates and survival [39]. egfr and k-ras mutations are mutually exclusive mutational events in ac of the lung, which suggests the existence of two independent oncogenic pathways [40, 41]. egfr is a receptor tyrosine kinase whose activation by ligand binding leads to activation of cell signaling pathways such as ras/mitogen-activated protein kinase (mapk) and phosphatidylinositol-3-kinase, which in turn propagate signals for proliferation, blocking of apoptosis, differentiation, motility, invasion, and adhesion [21]. tumor-acquired mutations in the tyrosine kinase domain of egfr, often associated with gene amplification, have been found in approximately 5%-10% of nsclcs in the united states, and are associated with ac histology, never-smoker status, east asian ethnicity, and female gender [14, 40, 42].
egfr mutations are frequently in-frame deletions in exon 19, single missense mutations in exon 21, or in-frame duplications/insertions in exon 20, and occasional missense mutations and double mutations can also be detected [40, 43]. egfr mutation has an inverse correlation with methylation of the p16 ink4a gene and sparc (secreted protein acidic and rich in cysteine), an extracellular ca2+-binding glycoprotein associated with the regulation of cell adhesion and growth [41]. egfr status is an important predictor of response to egfr kinase inhibitors: patients with egfr mutations are most likely to have a significant response to egfr tyrosine kinase inhibitor therapy, and egfr amplification and protein overexpression have been reported to correlate with survival after egfr tyrosine kinase inhibitor therapy [14, 44]. k-ras is a member of the ras family of proteins, which function as signal transducers between cell membrane-based growth factor signaling and the mapk pathways [21]. k-ras mutations are associated with smoking, male gender, and poorly differentiated tumors [43]. her2 (also known as egfr2 or erbb2), a member of the egfr family of receptor tyrosine kinases, is mutated in less than 2% of nsclc, and these mutations do not occur in tumors with egfr or k-ras mutation [45]. the her2 mutations are in-frame insertions in exon 20 and are significantly more frequent in acs (2.8%), never smokers (3.2%), asian ethnicity (3.9%), and women (3.6%), similar to egfr mutations [45]. alterations in dna methylation appear to be important epigenetic changes in cancer, contributing to chromosomal instability through global hypomethylation, and aberrant gene expression through alterations in the methylation levels at promoter cpg islands [46].
atypical adenomatous hyperplasia, which has been defined as a precursor lesion for peripheral pulmonary adenocarcinomas, consists of a well-circumscribed nodule measuring several millimeters in diameter, in which alveolar septa are lined by mildly to moderately atypical cells. epigenetic differences exist between egfr-mediated and k-ras-mediated tumorigenesis, and may interact with the genetic changes. a recent study showed that the probability of having an egfr mutation was significantly lower among those with p16 ink4a and cdh13 methylation than in those without, and the methylation index was significantly lower in egfr mutant cases than in wild-type. in contrast, the probability of k-ras mutation was significantly higher in p16 ink4a methylated cases than in unmethylated cases, and the methylation index was higher in k-ras mutant cases than in wild-type [47]. sqcc is defined as "a malignant epithelial tumour showing keratinization and/or intercellular bridges that arises from bronchial epithelium," in the who classification scheme [48]. it is a common histologic type of nsclc that is closely linked to cigarette smoking. in most patients, this tumor arises in a mainstem, lobar, or segmental bronchus, producing a central mass on imaging studies. many of these tumors have an endobronchial component that can cause airway obstruction, leading to postobstructive pneumonia, atelectasis, or bronchiectasis.
not infrequently, it is the pneumonia that prompts evaluation of the patient and leads to discovery of the tumor. less often, sqccs develop in the periphery of the lung. gross examination reveals a tan or gray mass that usually arises in a large bronchus and often includes an endobronchial component (figure 18.10, figure 18.11). partial or complete airway obstruction can be associated with changes of pneumonia, bronchitis, abscess, bronchiectasis, or atelectasis. necrosis and cavitation are very common in these tumors. involvement of hilar lymph nodes by tan-gray tumor can be visible in some resected specimens. microscopically, the key features of this tumor are its keratinization, sometimes with formation of keratin pearls, and intercellular bridges (figure 18.12). as is true of acs, the degree of differentiation of this tumor varies from very well differentiated cases, in which there are abundant keratinization and intercellular bridges and little cytoatypia, to very poorly differentiated cases, in which keratinization and intercellular bridges can be quite inconspicuous and the tumor consists of sheets of large atypical cells with marked cytoatypia and frequent mitoses. however, most cases fall more toward the middle of the spectrum. invasiveness is reflected by the presence of irregular nests and sheets of cells that infiltrate through tissues, stimulating a fibroblastic response, or by cells inside vascular or lymphatic spaces. invasive sqccs are often accompanied by sqcc in situ and dysplasia, their precursor lesions. these lesions arise in the bronchi and may be contiguous with the invasive tumor or exist as one or more separate foci. these precursor lesions can also be observed without coexisting invasive carcinoma. as with invasive sqcc, tobacco smoking is the main predisposing factor for sqcc in situ and dysplasia. unlike invasive sqcc, however, these lesions do not extend through the basement membrane of the bronchial epithelium.
grossly, they may be invisible or appear as flat, tan or red discolorations of the bronchial mucosa, or tan wart-like excrescences. microscopically, these lesions encompass a range of squamous changes that include alterations in the thickness of the bronchial epithelium, the maturational progress of squamous differentiation, cell size, and nuclear characteristics (figure 18.13, figure 18.14) [11, 49]. as dysplasia increases from mild to moderate to severe, the epithelium thickens, and maturation is increasingly impaired. the basilar zone expands with epithelial cell crowding, the intermediate zone shrinks, and there is reduced flattening of the superficial squamous cells. cell size, pleomorphism, and anisocytosis usually increase, and there is coarsening of the chromatin and appearance of nucleoli, nuclear angulations, and folding. in carcinoma in situ, although the epithelium may or may not be thickened and the cell size may be small, medium, or large, there is minimal or no maturation from the base to the superficial aspect, and the atypical nuclear features are present throughout the entire thickness of the epithelium. mitoses appear in the lower third (mild or moderate dysplasia), lower two-thirds (severe dysplasia), or throughout the full thickness of the epithelium (carcinoma in situ). basal cells in the bronchial epithelium are believed to represent the progenitor cells for invasive sqcc, and the sequence of events leading to sqcc is believed to include basal cell hyperplasia, squamous metaplasia, squamous dysplasia, carcinoma in situ, and invasive sqcc (figure 18.14) [11, 49-51]. regression of lesions preceding invasive sqcc can occur, particularly the earlier lesions [52]. however, severe dysplasia and carcinoma in situ are associated with a significantly increased probability of developing invasive sqcc in patients followed over time with surveillance bronchoscopy [53].
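the mitotic criterion quoted above (lower third, lower two-thirds, full thickness) can be written as a small lookup. the following python sketch is illustrative only: the function name, the use of a continuous 0-1 fraction of epithelial thickness, and the cut-offs between the three named zones are assumptions made for the example, and actual grading also weighs maturation, cell size, and nuclear features as described in the text.

```python
def grade_from_mitotic_extent(fraction_of_thickness: float) -> str:
    """Map the highest level at which mitoses are seen to the grade named in the text.

    fraction_of_thickness is hypothetical: 0.0 is the basement membrane and
    1.0 is the epithelial surface. The text names only three zones, so treating
    the value as continuous with cut-offs at 1/3 and 2/3 is an assumption.
    """
    if not 0.0 <= fraction_of_thickness <= 1.0:
        raise ValueError("fraction_of_thickness must be between 0 and 1")
    if fraction_of_thickness <= 1 / 3:
        return "mild or moderate dysplasia"  # mitoses confined to the lower third
    if fraction_of_thickness <= 2 / 3:
        return "severe dysplasia"  # mitoses reach the lower two-thirds
    return "carcinoma in situ"  # mitoses throughout the full thickness


if __name__ == "__main__":
    for level in (0.2, 0.5, 0.9):
        print(level, grade_from_mitotic_extent(level))
```

the sketch deliberately uses one feature; it is meant to make the stated thresholds explicit, not to stand in for histologic assessment.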
wistuba and colleagues evaluated sqccs and precursor lesions for loss of heterozygosity (loh) at 10 chromosomal regions (3p12, 3p14.2, 3p14.1-21.3, 3p21, 3p22-24, 3p25, 5q22, 9p21, 13q14 rb, and 17p13 tp53) frequently deleted in lung cancer and found multiple, sequentially occurring allele-specific molecular changes in separate, apparently clonally independent foci, early in the pathogenesis of sqccs of the lung, suggesting a field cancerization effect [11, 18]. they observed clones of cells with allelic loss at one or more regions in 31% of histologically normal epithelium and 42% of specimens with hyperplasia or metaplasia; increasing frequency of loh within clones with increasing histopathologic lesional severity; the most frequent and earliest regions of allelic loss at 3p21, 3p22-24, 3p25, and 9p21; increasing size of the 3p deletions with progressive histologic changes; and tp53 allelic loss in many histologically advanced lesions (dysplasia and cis) [18]. an overview of the sequential molecular events leading to invasive sqcc is shown in figure 18.14 [11]. large cell carcinoma is an undifferentiated nsclc without light microscopic evidence of squamous or glandular differentiation, although squamous or glandular features may be detectable by ultrastructural examination (figure 18.15) [54]. histologic subtypes of large cell carcinoma include large cell neuroendocrine carcinoma (lcnec), combined lcnec, basaloid carcinoma, lymphoepithelioma-like carcinoma, clear cell carcinoma, and large cell carcinoma with rhabdoid phenotype [54]. clinical signs and symptoms resemble those of other types of nsclc. most tumors develop as peripheral lung masses, except for basaloid carcinomas, which usually form centrally located masses. histologically, large cell carcinomas consist of sheets and nests of large cells with vesicular nuclei, prominent nucleoli, and moderate or abundant amounts of cytoplasm.
lcnecs demonstrate neuroendocrine architectural features and immunohistochemical or ultrastructural evidence of neuroendocrine differentiation. basaloid carcinomas display nests of small, monomorphic, rounded or fusiform tumor cells with little cytoplasm, numerous mitoses, comedo-type necrosis, and hyaline or mucoid stromal degeneration. clear cell carcinoma consists of large tumor cells with clear cytoplasm. precursor lesions are not currently recognized for any of the subtypes of large cell carcinoma. however, basaloid carcinoma is associated with squamous dysplasia in about one-third of cases [54]. large cell carcinomas are poorly differentiated carcinomas that can demonstrate features of ac (most frequent), sqcc, or neuroendocrine differentiation when examined by immunohistochemistry, electron microscopy, or molecular methods [55]. these tumors often demonstrate losses of 1p, 1q, 3p, 6q, 7q, and 17p, and gains of 5q and 7p, more closely resembling acs than other histologic types of lung cancer [56]. common molecular abnormalities include tp53 mutation, c-myc amplification, and p16 promoter hypermethylation, while k-ras mutation is less common [55]. egfr tyrosine kinase domain mutation is not characteristic of large cell carcinomas, and egfrviii (deletion mutations in the extracellular domain of egfr) is uncommon [57, 58]. the major categories of pulmonary neuroendocrine (ne) neoplasms include small cell carcinoma (sclc), large cell neuroendocrine carcinoma (lcnec), typical carcinoid, and atypical carcinoid. sclc and lcnec are high-grade carcinomas, typical carcinoid is a low-grade malignant neoplasm, and atypical carcinoid occupies an intermediate position in the spectrum of biologic aggressiveness. in one large series, the 5-year and 10-year survival rates were, respectively, 87% and 87% for typical carcinoid, 56% and 35% for atypical carcinoid, 27% and 9% for lcnec, and 9% and 5% for sclc [59].
by light microscopy, these tumors display ne architectural features including organoid nesting, a trabecular arrangement, rosette formation, and palisading. these patterns are more prominent in carcinoids than in lcnecs and may or may not be visible in individual sclcs. typical carcinoids contain fewer than 2 mitoses per 2 mm² (10 hpf) and lack necrosis (figure 18.16), while atypical carcinoids show 2-10 mitoses per 2 mm² (10 hpf) or necrosis, which is often punctate [60]. sclc consists of small, undifferentiated tumor cells with scant cytoplasm, finely granular chromatin, and absent or inconspicuous nucleoli (figure 18.17). nuclear molding is characteristic, necrosis is common, and the mitotic rate is typically high, with a mean of over 60 mitoses per 2 mm² [61]. differences also exist in the characteristics of patients with carcinoids, as compared to patients with sclc and lcnec. patients with carcinoids are typically younger and less likely to smoke than those with sclcs and lcnecs, the vast majority of whom have a current or previous history of tobacco smoking [62, 63]. rare patients with carcinoids have the multiple endocrine neoplasia 1 (men1) syndrome, an association that is not seen with sclcs and lcnecs. in addition, an association with diffuse idiopathic pulmonary neuroendocrine cell hyperplasia (dipnech) has been noted for carcinoids but not for sclcs and lcnecs, leading to classification of dipnech as a preinvasive lesion in the most recent version of the who classification scheme [64]. dipnech is a diffuse proliferation of single cells, small nodules (ne bodies), and linear proliferations of pulmonary ne cells that may reside in the bronchial and/or bronchiolar epithelia (figure 18.19), and may be accompanied by extraluminal proliferations (tumorlets and carcinoids) [64]. however, morphologically identifiable precursor lesions for sclc and lcnec have not been established.
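the mitotic and necrosis criteria given above for typical and atypical carcinoids amount to a simple decision rule, sketched below in python. this is a minimal illustration, not a diagnostic tool: the function name and signature are hypothetical, and returning none for counts above 10 mitoses per 2 mm² is an assumption, since the text quotes carcinoid criteria only up to that value.

```python
from typing import Optional


def classify_carcinoid(mitoses_per_2mm2: float, necrosis: bool) -> Optional[str]:
    """Apply the carcinoid criteria quoted in the text.

    typical carcinoid:  fewer than 2 mitoses per 2 mm^2 and no necrosis
    atypical carcinoid: 2-10 mitoses per 2 mm^2, or necrosis
    Counts above 10 fall outside the quoted carcinoid criteria; this sketch
    returns None for them rather than assigning a category (an assumption).
    """
    if mitoses_per_2mm2 < 0:
        raise ValueError("mitotic count cannot be negative")
    if mitoses_per_2mm2 > 10:
        return None
    if mitoses_per_2mm2 < 2 and not necrosis:
        return "typical carcinoid"
    return "atypical carcinoid"


if __name__ == "__main__":
    print(classify_carcinoid(1, necrosis=False))
    print(classify_carcinoid(1, necrosis=True))
    print(classify_carcinoid(6, necrosis=False))
```

necrosis alone moves a tumor with a low mitotic count into the atypical category, which is why the rule tests the count and necrosis together for the typical branch.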
molecular markers of pulmonary ne tumors include chromogranin a, synaptophysin (figure 18.20), and n-cam (cd56). these markers are expressed by all categories of ne tumors, with higher frequencies observed in the carcinoids and atypical carcinoids than in small cell and large cell neuroendocrine carcinomas. gastrin-releasing peptide, calcitonin, other peptide hormones, the insulinoma-associated 1 (insm1) promoter, and the human achaete-scute homolog-1 (hash1) gene have also been reported as overexpressed by these tumors [65, 66]. thyroid transcription factor-1 (ttf-1) is expressed by 80%-90% of sclcs, 30%-50% of lcnecs, and 0%-70% of carcinoids [67-70]. sclcs [71-76]. more than 90% of sclcs and sqccs demonstrate large, often discontinuous segments of allelic loss on chromosome 3p, in areas encompassing multiple candidate tumor suppressor genes, including some of those listed previously [15, 75]. atypical carcinoids show a higher frequency of loh at 3p, 13q, 9p21, and 17p than typical carcinoids, but not as high as the high-grade ne tumors [77]. some typical and atypical carcinoids possess mutations of the multiple endocrine neoplasia 1 (men1) gene on chromosome 11q13 or loh at this locus [78], while these abnormalities occur with lower frequencies in sclcs and lcnecs, supporting separate pathways of tumorigenesis [79]. men1 encodes the nuclear protein menin, which is believed to play several roles in tumorigenesis by linking transcription factor function to histone-modification pathways, in part through interacting with the activator-protein-1 family transcription factor jund, modifying it from an oncoprotein into a tumor suppressor protein [80]. oncogenes frequently amplified in sclcs include myc (8q24), mycn (2p24), and mycl1 (1p34), and additional amplified genes that represent candidate oncogenes include the antiapoptotic genes tnfrsf4 (1p36), dad1 (14q11), bcl2l1 (20q11), and bcl2l2 (14q11) [76].
the myc proteins are transcription factors that are important in cell cycle regulation, proliferation, and dna synthesis, and can induce p14 arf, leading to apoptosis through p53 if cellular conditions do not favor proliferation [21]. tsgs are inactivated in the majority of sclcs. eighty percent to 90% of sclcs demonstrate tp53 mutations, as compared to more than 50% of nsclcs, fewer atypical carcinoids, and virtually no typical carcinoids [74, 81]. most of the tp53 mutations in sclcs are missense point mutations that result in a stabilized p53 mutant protein which can be easily detected by immunohistochemistry [71]. p53 protein overexpression occurs frequently in high-grade ne carcinomas, but is unusual in typical carcinoids and intermediate in atypical carcinoids [82, 83]. dysregulation of p53 produces downstream effects on bcl-2 and bax. antiapoptotic bcl-2 predominates over proapoptotic bax in the high-grade ne carcinomas, while the reverse is true for carcinoids [82]. lcnecs resemble sclcs in their high rates of tp53 mutation and predominance of bcl-2 expression over bax expression [84]. alterations compromising the p16 ink4a /cyclin d1/rb pathway of g1 arrest are consistent in high-grade pulmonary ne carcinomas (92%), primarily through loss of rb protein, but are less frequent in atypical carcinoids (59%) and are uncommon in typical carcinoids [23]. mutations in the rb1 gene exist in many sclcs, with associated loss of function of the gene product [71, 74, 85]. in another study, 89% of the ne carcinomas (excluding carcinoids) versus 13% of the non-ne carcinomas exhibited loh and loss of rb-protein expression [86]. the hypophosphorylated form of rb protein functions as a cell cycle regulator for g1 arrest; cyclin d1 overexpression and p16 ink4a loss produce persistent hyperphosphorylation of rb with consequent evasion of cell cycle arrest [23].
recent data also suggest that in sclcs, overexpression of mdm2 (a transcriptional target of p53) or p14 arf loss leads to evasion of cell cycle arrest through the p53 and rb pathways (figure 18.2) [71]. the transcription factor e2f-1 appears to play a role in cellular proliferation by activating genes required for s phase entry. the e2f-1 product is overexpressed in 92% of sclcs and 50% of lcnecs, and is significantly associated with a high ki67 index and a bcl-2:bax ratio >1 [87]. a mediator of the proteasomal degradation of e2f-1, the s phase kinase-associated protein 2 (skp2) f-box protein accumulates in high-grade ne carcinomas (86%), and its overexpression has been associated with advanced stage and nodal metastasis in pulmonary ne tumors [88]. in the high-grade ne tumors, skp2 appears to interact with e2f-1 and stimulate its transcriptional activity toward the cyclin e promoter [87, 88]. telomeres play an important role in the protection of chromosomes against degradation. telomerases, the enzymes that synthesize telomeric dna strands, serve to counterbalance losses of dna during cell divisions. high telomerase activity has been noted in over 80% of sclcs and lcnecs [89-91] versus 14% or fewer typical carcinoids [91, 92]. expression of human telomerase mrna component (hterc) and human telomerase reverse transcriptase (htert) mrna was reported, respectively, in 58% and 74% of typical carcinoids and in 100% and 100% of atypical carcinoids, lcnecs, and sclcs; telomere length alterations in lcnecs and sclcs were greater than in typical carcinoids [92]. aberrant methylation of cytosine-guanine (cpg) islands in promoter regions of malignant cells is an important mechanism for silencing of tsgs (epigenetic inactivation). methylation of dna involves the transfer of a methyl group, by a dna methyltransferase, to the cytosine of a cpg dinucleotide [93].
rassf1a is a potential tsg that undergoes epigenetic inactivation in virtually all sclcs and a majority of nsclcs through hypermethylation of its promoter region [94, 95]. ne tumors have lower frequencies of methylation of p16, apc, and cdh13 (h-cadherin) than nsclcs [95]. sclcs have higher frequencies of methylation of rassf1a, cdh1 (e-cadherin), and rarb than carcinoids [95]. promoter methylation of casp8, which encodes the apoptosis-inducing cysteine protease caspase 8, was also found in 35% of sclcs, 18% of carcinoids, and no nsclcs, suggesting that casp8 may function as a tsg in ne lung tumors [96]. although histologically defined precursors for sclc are lacking, a higher incidence of genetic abnormalities is found in the normal or hyperplastic airway epithelium of patients with sclc than in that of patients with nsclc [97]. by extension, it has been suggested that sclc may arise directly from histologically normal or mildly abnormal epithelium, rather than evolving through a sequence of recognizable histologic intermediary changes [11]. relatively little is known about molecular abnormalities in precursors of carcinoids. although carcinoids have been viewed as arising from tumorlets, 11q13 (int-2) allelic imbalance is significantly more common in carcinoids (73%) than in tumorlets (9%), and may represent an early event in carcinoid tumor formation [98]. the int-2 gene lies in close proximity to men1, a tumor suppressor gene frequently mutated in ne tumors [98]. the molecular pathology of dipnech remains to be elucidated. mesenchymal neoplasms included in the who classification scheme (table 18.1) encompass a spectrum of malignant and benign proliferations that show differentiation along multiple lineages. overall, these tumors are much less common in the lung than are epithelial neoplasms. information about molecular pathogenesis has emerged for some of the mesenchymal neoplasms.
pulmonary inflammatory myofibroblastic tumor (imt) is a lesion composed of myofibroblastic cells, collagen, and inflammatory cells that primarily occurs in individuals less than 40 years of age, and is the most common endobronchial mesenchymal lesion in childhood ( figure 18 .21) [99] . synovial sarcoma is usually a soft tissue malignancy, but uncommonly arises in the pleura or the lung and often takes an aggressive course [100] . pulmonary hamartomas are benign neoplasms consisting of mixtures of cartilage, fat, connective tissue, and smooth muscle, which present as coin lesions on chest radiographs and are excised in order to rule out a malignancy ( figure 18 .22). many imts demonstrate clonal abnormalities with rearrangements of chromosome 2p23 and the anaplastic lymphoma kinase (alk) gene [101] . the rearrangements involve fusion of tropomyosin (tpm) n-terminal coiled-coil domains to the alk c-terminal kinase domain, producing two alk fusion genes, tpm4-alk and tpm3-alk, which encode oncoproteins with constitutive kinase activity [102] . like their soft tissue counterparts, more than 90% of pulmonary and pleural synovial sarcomas demonstrate a chromosomal translocation t(x;18) (syt-ssx) [103, 104] . detection of this translocation can be very helpful for confirming the diagnosis of synovial sarcoma in this unusual location. most pulmonary hamartomas show abnormalities of chromosomal bands 6p21, 12q14-15, or other regions [105] , corresponding to mutations of high-mobility group (hmg) proteins, a family of nonhistone chromatin-associated proteins that serve an important role in regulating chromatin architecture and gene expression [106] . malignant mesothelioma (mm) is an uncommon, aggressive tumor arising from mesothelial cells on serosal surfaces, primarily the pleura and peritoneum, and less often the pericardium or tunica vaginalis. 
the most important risk factor for mm is exposure to the subset of asbestos fibers known as amphiboles (crocidolite and amosite) [107] . the incidence of this tumor in the united states peaked in the early to mid-1990s, and appears to be declining, likely related to decreases in the use of amphiboles since their peak period of importation in the 1960s [107] . these tumors are characterized by long latency periods between asbestos exposure and clinical presentation of the tumor, with a mean of 30-40 years [108] . radiation, a nonasbestos fiber known as erionite, and potentially other processes associated with pleural scarring have also been implicated in the causation of smaller numbers of cases of malignant mesothelioma [108] , and a role for simian virus 40 (sv40) in the genesis of this tumor has been suggested by some, but remains controversial [109, 110] . pleural mm most commonly arises in males over the age of 60. presenting features typically include a hemorrhagic pleural effusion associated with shortness of breath and chest wall pain. weight loss and malaise are common. by the time the tumor is discovered, patients usually have extensive involvement of the pleural surfaces. with progression, the tumor typically invades the lung, chest wall, and diaphragm. lymph node metastasis can cause superior vena caval obstruction, and cardiac tamponade, subcutaneous nodules, and contralateral lung involvement can also occur. from the time of diagnosis, the median survival is 12 months [110] . treatment may include surgery, chemotherapy, radiotherapy, immunotherapy, or other treatments, often in combination [110] . the intent of surgery is usually palliative. whether extrapleural pneumonectomy with chemotherapy and radiotherapy can lead to cure is unclear [111] . new agents are currently under investigation for their potential to improve the life expectancy and quality of life in patients with this aggressive malignancy. 
gross pathologic features of mm include pleural nodules which grow and coalesce to fill the pleural cavity and form a thick rind around the lung. a firm tan appearance is common, and occasionally the tumor can have a gelatinous consistency (figure 18.23). extension along the interlobar fissures and invasion into the adjacent lung, diaphragm, and chest wall are characteristic. further spread can occur into the pericardial cavity and around other mediastinal structures, and distant metastases can also develop. histologically, mm manifests a wide variety of patterns. the major histologic categories include epithelioid mesothelioma, sarcomatoid mesothelioma, desmoplastic mesothelioma, and biphasic mesothelioma [108]. epithelioid mesothelioma consists of round, ovoid, or polygonal cells with eosinophilic cytoplasm and nuclei that are usually round with little cytoatypia (figure 18.24). these cells most often form sheets, tubulopapillary structures, or gland-like arrangements, and some tumors can have a myxoid appearance due to production of large amounts of hyaluronate. sarcomatoid mesothelioma is composed of malignant-appearing spindle cells occasionally accompanied by mature sarcomatous components (osteosarcoma, chondrosarcoma, others). desmoplastic mesothelioma can be a diagnostic challenge due to its frequently bland appearance and resemblance to organizing pleuritis. it consists of variably atypical spindle cells in a dense collagenous matrix (figure 18.25). helpful features for separating this tumor from organizing pleuritis include invasion of chest wall muscle or adipose tissue and necrosis.
figure 18.23 malignant mesothelioma. the tan/white tumor involves the entire pleura, surrounding and compressing the underlying parenchyma, which appears congested but relatively unremarkable.
biphasic mesotheliomas include both epithelioid and sarcomatoid elements, each comprising at least 10% of the tumor [108] . pathologic diagnosis of mm has been greatly assisted by the expanded availability of antibodies for use in immunohistochemistry [112] . mesothelial differentiation can be supported by immunoreactivity with cytokeratin 5/6, calretinin ( figure 18 .26), hbme-1, d2-40, and other antibodies. histologic distinction of epithelioid mesotheliomas from metastatic acs is a common need in practice, and a panel approach using calretinin and cytokeratin 5/6, with other antibodies reactive with acs (cea, moc-31, ber-ep4, leu m1, b72.3, and others) will usually be successful. electron microscopy can also be helpful in difficult cases by demonstrating long thin microvilli in many mms with an epithelioid component. pan-cytokeratin staining is helpful for supporting a diagnosis of sarcomatoid or desmoplastic mm as opposed to sarcoma, since most (but not all) sarcomas will not stain for pan-cytokeratin. other mesothelial and mesenchymal markers can also be useful for assisting in the differentiation of mm from histologically similar sarcomas. precursor lesions for mm have not been clearly defined from a histologic standpoint, although it is likely that an in situ stage exists [108] . the term atypical mesothelial hyperplasia has been recommended for surface (noninvasive) proliferations of mesothelial cells of uncertain malignant potential [108] . exposure to asbestos fibers is believed to trigger the pathobiological changes leading to the majority of mms. currently, it is believed that asbestos may act as an initiator (genetically) and promoter (epigenetically) in the development of mms [113] . the degree to which tumorigenesis results from direct interactions of the fibers with the mesothelial cells, or through other mechanisms involving oxidative stress (or both), is unresolved [113, 114] . 
multiple chromosomal alterations are often noted in mms, and inactivation of tsgs plays an important part in the pathogenesis of mm [113]. a variety of genetic abnormalities have been reported including deletions of 1p21-22, 3p21, 4p, 4q, 6q, 9p21, 13q13-14, 14q, and proximal 15q, monosomy 22, and gains of 1q, 5p, 7p, 8q22-24, and 15q22-25 [108, 115]. the most common genetic abnormality in mm is a deletion in 9p21 encompassing the cdkn2a locus encoding the tumor suppressors p16ink4a and p14arf, which participate in the p53 and rb pathways and inhibit cell cycle progression (figure 18.2) [113, 116]. recent studies have shown that sv40 large t antigen (present in some mms) inactivates the tsg products rb and p53, raising the possibility that asbestos and sv40 could act as co-carcinogens in mm and suggesting that perturbations of rb- and p53-dependent growth-regulatory pathways may be involved in the pathogenesis of mm [115]. other common findings include inactivating mutations with allelic loss in the tsg neurofibromin 2 (nf2), found at chromosome 22q12 [117], and inactivation of cdkn2a/p14arf and gpc3 (another tsg) by promoter methylation [108]. loss of cdkn2a/p14arf also results in mdm2-mediated inactivation of p53 [116]. however, in mms, unlike many other epithelial tumors, mutations in the tp53, rb, and ras genes are rare [118]. the wnt signal transduction pathway is also abnormally activated in mms and appears to play a role in pathogenesis [119]. activation of the pathway leads to accumulation of β-catenin in the cytoplasm and its translocation to the nucleus. interactions with tcf/lef transcription factors promote expression of multiple genes including c-myc and cyclin d. the mechanism of activation does not appear to involve mutations in the β-catenin gene, but may instead involve more upstream components of the pathway, such as the disheveled proteins [119].
recent evidence also suggests that the phosphatidylinositol 3-kinase (pi3-k/akt) pathway is frequently activated in mms, and that inhibition of this pathway can increase sensitivity to a chemotherapeutic agent [120] . the wilms' tumor gene (wt1) is also expressed in most mms, but its role in the pathogenesis of mm is unclear [114] . finally, egfr signaling in mms has recently become a focus of greater attention, and there are some data showing that the egfr is an early cell membrane target of asbestos fibers and is linked to activation of the mapk cascade [113] . unfortunately, a phase ii clinical trial of gefitinib treatment in patients with mms did not show effectiveness, despite egfr overexpression in over 97% of cases [121] . another study found that common egfr mutations conferring sensitivity to gefitinib are not prevalent in human malignant mesothelioma [122] . further investigation continues into new, potentially efficacious agents for the treatment of mm. non-neoplastic pulmonary pathology comprises inflammatory and fibrosing diseases of the conducting airways, alveoli, vessels, and lymphoid tissue. this pathology may be localized or diffuse, may either have an obvious etiology or be idiopathic, and may cause injury that is reparable or irreparable. most importantly, an understanding of non-neoplastic lung pathology plays a vital role in the clinical management of these diseases. this section covers the major types of obstructive and interstitial diseases, the vascular lesions, the pneumonias, the occupational diseases, the major histiocytic conditions, and the most common developmental anomalies. this list does not include all of the non-neoplastic diseases that can affect the lung, but it represents those that are responsible for the majority of illness. also, the conditions highlighted within each of these categories are those about which we best understand the molecular biology of the disease mechanisms. 
obstructive lung diseases are characterized by a reduction in airflow due to airway narrowing. this airflow reduction occurs, in general, by two basic mechanisms: (i) inflammation and injury of the airway, resulting in obstruction by mucous and cellular debris within and around the airway lumen; and (ii) destruction of the elastin fibers of the alveolar walls, causing loss of elastic recoil and subsequent premature collapse of the airway during the expiratory phase of respiration. there are four major obstructive lung diseases: asthma, emphysema, chronic bronchitis, and bronchiectasis. asthma is a chronic inflammatory disease of the airways that affects more than 150 million people worldwide. the prevalence of disabling asthma has increased over 200% since 1969, ranging from as low as 1% in rural ethiopia to over 20% among children in parts of central and south america [123] . in the united states, asthma affects approximately 8%-10% of the population and is the leading cause of hospitalization among children less than 15 years of age [123] . clinically, the disease is defined as a generalized obstruction of airflow with a reversibility that can occur spontaneously or with therapy. it is characterized by recurrent wheezing, cough, or shortness of breath resulting from airway hyperactivity and mucus hypersecretion. the hyperresponsiveness is a result of acute bronchospasm and can be elicited for diagnostic purposes using histamine or methacholine challenges. the key feature of these symptoms is that they are variable-worse at night or in the early morning, and in some people worse after exercise. it has previously been assumed that these symptoms are separated by intervals of normal physiology. however, evidence is now accumulating that asthma can cause progressive lung impairment due to chronic morphologic changes in the airways. the treatment strategies for this complex disease are myriad. in atopic individuals, allergen avoidance should be the primary therapy. 
for example, in children, reducing exposure to house dust mites early in life decreases sensitization and the incidence of disease. for those who do develop the disease, avoidance of allergens later in life improves symptom control. established treatments for asthma flares include inhaled corticosteroids, and short-acting and long-acting β2-adrenoceptor agonists. phosphodiesterase (pde) inhibitors such as theophylline have been used for decades to treat asthmatic bronchoconstriction, but both cardiac and central nervous system side effects have limited their use. newer pde inhibitors without these side effects include non-xanthine drugs such as roflumilast. the pathologic changes to the airways in asthma are very similar to those seen in chronic bronchitis. they consist of a thickened basement membrane with epithelial desquamation, goblet cell hyperplasia, and subepithelial elastin deposition. in the wall of the airway, smooth muscle hypertrophy and submucosal gland hyperplasia are also present (figure 18.27). in acute asthma exacerbations, a transmural chronic inflammatory infiltrate with variable amounts of eosinophilia may be present, resulting in epithelial injury and desquamation that can become quite pronounced. one sees clumps of degenerating epithelial cells mixed with mucin in the airway lumen. these aggregates of degenerating cells are referred to as creola bodies and can be seen in expectorated mucin from these patients. also present in these sputum samples are charcot-leyden crystals, rhomboid-shaped structures that represent breakdown products from eosinophil cytoplasmic granules (figure 18.28). the changes seen in the walls of these airways represent long-term airway remodeling caused by prolonged inflammation. this remodeling may play a role in the pathophysiology of asthma. the amount of airway remodeling is highly variable from patient to patient, but remodeling has been found even in patients with mild asthma.
currently, the effect of treatment on this chronic pathology is unclear [124]. the pathogenesis of asthma is complex, and most likely involves both genetic and environmental components. most experts now see it as a disease in which an insult initiates a series of events in a genetically susceptible host. no single gene accounts for the familial component of this disease. genetic analysis of these patients reveals a prevalence of specific hla alleles and polymorphisms of fcεriβ, il-4, and cd14 [125, 126]. asthma can be classified using a number of different schemes. most commonly, asthma is divided into two categories: atopic (allergic) and nonatopic (nonallergic). atopic asthma results from an allergic sensitization usually early in life and has its onset in early childhood. nonatopic asthma is late-onset and, though the immunopathology has not been as well studied, probably has similar mechanisms to atopic asthma. although this nosology is convenient for purposes of understanding the mechanisms of the disease, most patients manifest a combination of these two categories with overlapping symptoms. the pathogenetic mechanisms of both types encompass a variety of cells and their products. these include airway epithelium, smooth muscle cells, fibroblasts, mast cells, eosinophils, and t-cells. the asthma response includes two phases: an early response comprising an acute bronchospastic event within 15-30 minutes after exposure, and a late response that peaks at approximately 4-6 hours and that can have prolonged effects. if one wants to understand this complex response, it is best to divide it into three components: (i) a type 1 hypersensitivity response, (ii) acute and chronic inflammation, and (iii) bronchial hyperactivity.
type 1 hypersensitivity
in general, human asthma is associated with a predominance of type 2 helper cells with a cd4+ phenotype.
these th2-type cells result from the uptake and processing of viral, allergen, and environmental triggers that initiate the episode. the processing includes the presentation of these triggers by the airway dendritic cells to naive t-cells (th0), resulting in their differentiation into populations of th1 and th2. the th2 differentiation is a result of il-10 release by the dendritic cells, and the th2 cells then further propagate the inflammatory reaction in two ways. first, they release a variety of cytokines such as il-4, il-5, and il-13 that mediate a wide variety of responses. il-4 and il-13 stimulate b-cells and plasma cells to produce ige, which, in turn, stimulates mast cell maturation and the release of multiple mediators, including histamine and leukotrienes. second, these th2 cells secrete il-5 that, together with il-4, also stimulates mast cells to secrete histamine, tryptase, chymase, and the cysteinyl leukotrienes, causing the bronchoconstrictor response that occurs rapidly after the exposure to the allergen. il-5 from these lymphocytes also recruits eosinophils to the airways and stimulates the release of the contents of their granules, including eosinophil cationic protein (ecp), major basic protein (mbp), eosinophil peroxidase, and eosinophil-derived neurotoxin. these compounds not only induce the bronchial wall hyperactivity but are also responsible for the increased vascular permeability that produces the transmural edema in the airways. the naive t-cells can also differentiate into th1 cells as a result of il-12 produced by dendritic cells. these th1 cells produce interferon-gamma (ifn-γ), il-2, and lymphotoxin, which play a role in macrophage activation in delayed-type hypersensitivity reactions as seen in diseases such as rheumatoid arthritis and tuberculosis [123].
these th1 cells are predominantly responsible for defense against intracellular organisms and are more prominent in normal airways and in airways of patients with emphysema than in asthmatics. however, in severe forms of asthma, th1 cells are recruited and have the capacity to secrete tumor necrosis factor (tnf)-α and ifn-γ, which may lead to the tissue-damaging immune response one sees in these airways (figure 18.29) [127, 128].
acute and chronic inflammation
the role of acute and chronic inflammatory cells, including eosinophils, mast cells, macrophages, and lymphocytes, in asthma is evident in the abundance of these cells in airways, sputum, and bronchoalveolar samples from patients with this disease. the number of eosinophils in the airways correlates with the severity of asthma and the amount of bronchial hyperresponsiveness. proteins released by these cells, including ecp, mbp, and eosinophil-derived neurotoxin, cause at least some of the epithelial damage seen in the active form of asthma. neutrophils are prominent in the more acute exacerbations of asthma and are probably recruited to these airways by il-8, a potent neutrophil chemoattractant released by airway epithelial cells [123]. these cells also release proteases, reactive oxygen species (ros), and other proinflammatory mediators that, in addition to the epithelial damage, also contribute to the airway destruction and remodeling that occurs in the more chronic forms of this disease. the susceptibility of the epithelium in asthma to this oxidant injury may be increased due to decreased antioxidants such as superoxide dismutase in these lungs [129]. finally, mast cells are activated to release an abundance of mediators through the binding of ige to fcεri, high-affinity receptors on their surface.
allergens bind to ige molecules and induce a cross-linking of these molecules, leading to activation of the mast cell and release of a number of mediators, most notably histamine, tryptase, and various leukotrienes, including leukotriene d4 (ltd4), which interact with the smooth muscle to induce contraction and the acute bronchospastic response [130].
bronchial hyperactivity
the cornerstone of asthma is the hyperactive response of the airway smooth muscle. the mechanism by which this occurs combines neural pathways and inflammatory pathways. as stated, the inflammatory component of this response comes predominantly from the mast cells. the major neural pathway involved is the nonadrenergic noncholinergic (nanc) system. although cholinergic pathways are responsible for maintaining the airway smooth muscle tone, it is the nanc system that releases bronchoactive tachykinins (substance p and neurokinin a) that bind to nk2 receptors on the smooth muscle and cause the constriction that characterizes the acute asthmatic response [123]. in addition to these acute mechanisms, the airway also undergoes structural alterations to its formed elements. in the mucosa, these changes include goblet cell hyperplasia and basement membrane thickening. within the submucosa and airway wall, increased deposition of collagen and elastic fibers results in fibrosis and elastosis, and both the smooth muscle cells and the submucosal glands undergo hypertrophy and hyperplasia. these irreversible changes are a consequence of chronic inflammatory insults on the airways through mechanisms that include release of fibrosing mediators such as tgf-β and mitogenic mediators such as epidermal and fibroblast growth factors (egf, fgf).
the exact mechanisms by which this occurs are not clearly defined, but the similarity of these factors with those involved in branching morphogenesis of the developing lung has led to a focus on the effect of inflammation on the interaction of the epithelium with the underlying mesenchymal cells [128]. the term chronic obstructive pulmonary disease (copd) applies to emphysema, chronic bronchitis, and bronchiectasis, those diseases in which airflow limitation is usually progressive, but, unlike asthma, not fully reversible [131]. the prevalence of copd worldwide is estimated at 9%-10% in adults over the age of 40 [132]. though there are different forms of copd with different etiologies, the clinical manifestations of the most common forms of the disease are the same. these include a progressive decline in lung function, usually measured as decreased forced expiratory volume in 1 second (fev1), a chronic cough, and dyspnea. emphysema and chronic bronchitis are the most common forms of copd and are the result of cigarette smoking. as such, they usually exist together in most smokers. chronic bronchitis is defined clinically as a persistent cough with sputum production for at least 3 months in at least 2 consecutive years without any other identifiable cause. patients with chronic bronchitis typically have copious sputum with a prominent cough, more commonly get infections, and typically experience hypercapnia and severe hypoxemia, giving rise to the clinical moniker blue bloater. emphysema is the destruction and permanent enlargement of the air spaces distal to the terminal bronchioles without obvious fibrosis [133]. these patients have only a slight cough, while the overinflation of the lungs is severe, inspiring the term pink puffer. the pathologic features of copd are best understood if one considers the whole of copd as a spectrum of pathology that consists of emphysematous tissue destruction, airway inflammation, remodeling, and obstruction [134].
the lungs of patients with copd usually contain all of these features, but in varying proportions. the pathologic features of chronic bronchitis include mucosal pathology that consists of epithelial inflammation, injury, and regenerative epithelial changes of squamous and goblet cell metaplasia. in addition, the submucosa shows changes of remodeling with smooth muscle hypertrophy and submucosal gland hyperplasia. these changes are responsible for the copious secretions characteristic of this clinical disease, although studies have reported no consistent relationship between these pathologic features of the large airways and the airflow obstruction [135]. the pathologic definition of emphysema is an abnormal, permanent enlargement of the airspaces distal to the terminal bronchioles accompanied by destruction of the alveolar walls without fibrosis [133]. the four major pathologic patterns of emphysema are defined by the location of this destruction. these include centriacinar, panacinar, paraseptal, and irregular emphysema. the first two of these are responsible for the overwhelming majority of the clinical disease. centriacinar emphysema (sometimes referred to as centrilobular) represents 95% of the cases and is a result of destruction of alveoli at the proximal and central areas of the pulmonary acinus, including the respiratory bronchioles (figure 18.30). it predominantly affects the upper lobes. the remaining two types of emphysema, paraseptal and irregular, are rarely associated with clinical disease. in paraseptal emphysema, the damage is to the distal acinus, the area that abuts the pleura at the margins of the lobules. damage in this area may cause spontaneous pneumothoraces, typically in young, thin men [136]. irregular emphysema is tissue destruction and alveolar enlargement that occurs adjacent to scarring, secondary to the enhanced inflammation in the area.
though this is a common finding in a scarred lung, it is of little if any clinical significance to the patient. though the emphysema in these lungs plays the dominant role in causing the obstruction, small airway pathology is also present. respiratory bronchiolitis refers to the inflammatory changes found in the distal airways of smokers. these consist of pigmented macrophages filling the lumen and the peribronchiolar airspaces and mild chronic inflammation and fibrosis around the bronchioles (figure 18 .33). the pigment in these macrophages represents the inhaled particulate matter of the cigarette smoke that has been phagocytized by these cells. the macrophages in turn release proteases, which destroy the elastic fibers in the surrounding area, resulting in the loss of elastic recoil and the obstructive symptoms. in general, copd is a result of inflammation of the large airways that produces the airway remodeling characteristic of chronic bronchitis as well as inflammation of the smaller airways that results in the destruction of the adjacent tissue and consequent emphysema. the predominant inflammatory cells involved in this process are the alveolar macrophages, neutrophils, and lymphocytes. the main theories of the pathogenesis of copd support the interaction of airway inflammation with two main systems in the lung: the protease-antiprotease system and the oxidant-antioxidant system. these systems help to protect the lung from the many irritants that enter the lung via the large pulmonary surface area that interfaces with the environment. in the protease-antiprotease system, proteases are produced by a number of cells, including epithelial cells and inflammatory cells that degrade the underlying lung matrix. the most important proteases in the lung are the neutrophil elastases, part of the serine protease family, and the metalloproteinases (mmps) produced predominantly by macrophages. 
these proteases can be secreted in response to invasion by environmental irritants, most notably infectious agents such as bacteria. in this setting, their role is to enzymatically degrade the organism. however, proteases can also be secreted by both inflammatory and epithelial cells in a normal lung to repair and maintain the underlying lung matrix proteins [137]. to protect the lung from unwanted destruction by these enzymes, the liver secretes antiproteases that circulate in the bloodstream to the lung and inhibit the action of the proteases. in addition, macrophages that secrete mmps also secrete tissue inhibitors of metalloproteinases (timps). a delicate balance of proteases and antiproteases is needed to maintain the integrity of the lung structure. an imbalance that results in a relative excess of proteases (either by overproduction of proteases or underproduction of their inhibitors) leads to tissue destruction and the formation of emphysema. this imbalance occurs in different ways in the two major types of emphysema: centriacinar and panacinar. in centriacinar emphysema, caused primarily by cigarette smoking, there is an overproduction of proteases primarily due to the stimulatory effect of chemicals within the smoke on the neutrophils and macrophages. though the exact mechanism is not completely understood, most studies support the view that nicotine from the cigarette smoke acts as a chemoattractant and that ros, also contained in the smoke, stimulate an increased release of neutrophil elastases and mmps from activated macrophages, leading to the destruction of the elastin in the alveolar spaces [137]. this inflammatory cell activation may come about through the activation of the transcription factor nf-κb that leads to tnf-α production [132]. in addition, the elastin peptides themselves may attract additional inflammatory cells to further increase the protease secretion and exacerbate the matrix destruction [137].
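the two routes to a relative protease excess described above (overproduction of proteases versus underproduction of their inhibitors) can be illustrated with a minimal sketch. this is a toy bookkeeping model, not a physiological one: the class name, function name, and the arbitrary activity units are all hypothetical, and the only assumption carried over from the text is that tissue destruction follows when protease activity exceeds inhibitor capacity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteaseState:
    """Hypothetical snapshot of the protease-antiprotease system.

    Both fields are in the same arbitrary units; they are illustrative
    placeholders, not measured quantities.
    """
    protease: float      # total protease activity (e.g. elastase, mmps)
    antiprotease: float  # total inhibitor capacity (e.g. aat, timps)


def imbalance_causes(state: ProteaseState, baseline: ProteaseState) -> List[str]:
    """Return the route(s) by which `state` has a relative protease excess.

    An empty list means the state is balanced. The baseline is assumed to
    be balanced itself, so any excess must come from more protease, less
    inhibitor, or both.
    """
    if baseline.protease > baseline.antiprotease:
        raise ValueError("baseline must be balanced")
    if state.protease <= state.antiprotease:
        return []  # inhibitors still cover protease activity
    causes = []
    if state.protease > baseline.protease:
        causes.append("protease overproduction")
    if state.antiprotease < baseline.antiprotease:
        causes.append("inhibitor underproduction")
    return causes


baseline = ProteaseState(protease=1.0, antiprotease=1.0)
# smoking-like pattern in this toy model: more protease, inhibitors unchanged
smoker = ProteaseState(protease=2.0, antiprotease=1.0)
# deficiency-like pattern: protease unchanged, inhibitor capacity reduced
deficient = ProteaseState(protease=1.0, antiprotease=0.2)
print(imbalance_causes(smoker, baseline))
print(imbalance_causes(deficient, baseline))
```

in this sketch the first pattern corresponds to the protease overproduction attributed to cigarette smoke, and the second to a shortage of inhibitors; the numbers are chosen only to make the two routes distinguishable.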
unlike centriacinar emphysema, panacinar emphysema is most commonly caused by a genetic deficiency of antiproteases, usually due to alpha-1 antitrypsin (aat) deficiency, a condition that affects approximately 60,000 people in the united states [138]. aat deficiency is due to a defect in the gene that encodes the protein aat, a glycoprotein produced by hepatocytes and the main inhibitor of neutrophil elastase. the affected gene is the serpina1 gene (formerly known as p1), located on the long arm of chromosome 14 (14q31-32.3). the genetic mutations that occur have been categorized into four groups: base substitutions, in-frame deletions, frame-shift mutations, and exon deletions. these mutations usually result in misfolding, polymerization, and retention of the aberrant protein within the hepatocytes, leading to decreased circulating levels. aat deficiency is an autosomal codominant disease with over 100 allelic variants, of which the m alleles (m1-m6) are the most common; these alleles produce normal serum levels of a less active protein [139]. individuals who manifest the lung disease are usually homozygous for the alleles z or s (zz and ss phenotype) or heterozygous for the 2 m alleles (mz, or sz phenotype) [139]. an aat concentration in plasma of less than 40% of normal confers a risk for emphysema [140]. in individuals with the zz genotype, the activity of aat is approximately one-fifth of normal [141]. the second system in the lung involved in the pathogenesis of emphysema is the oxidant-antioxidant system. as in the protease system, the lung is protected from oxidative stress in the form of ros by antioxidants produced by cells in the lung. ros in the lung include oxygen ions, free radicals, and peroxides.
the major antioxidants in the airways are enzymes including catalase, superoxide dismutase (sod), glutathione peroxidase, glutathione s-transferase, xanthine oxidase, and thioredoxin, as well as nonenzymatic antioxidants including glutathione, ascorbate, urate, and bilirubin [142] . the balance of oxidants and antioxidants in the lung prevents damage by ros. however, cigarette smoke increases the production of ros by neutrophils, eosinophils, macrophages, and epithelial cells [143] . evidence that damage to the lung epithelium and matrix is a direct result of ros includes the presence of exhaled h2o2 and 8-isoprostane, decreased plasma antioxidants, and increased plasma and tissue levels of oxidized proteins, including various lipid peroxidation products. in addition to this direct effect, ros may also induce a proinflammatory response that recruits more inflammatory cells to the lung. in animal models, cigarette smoke induces the expression of proinflammatory cytokines such as il-6, il-8, tnfa, and il-1 from macrophages, epithelial cells, and fibroblasts, perhaps through activation of the transcription factor nfkb [144, 145] (figure 18.34) . finally, there is some evidence that cigarette smoke further disturbs the oxidant-antioxidant balance in the lung by depleting antioxidants such as ascorbate and glutathione [132] . bronchiectasis represents the permanent remodeling and dilatation of the large airways of the lung most commonly due to chronic inflammation and recurrent pneumonia. these infections usually occur because airway secretions and entrapped organisms cannot be effectively cleared. this pathology dictates the clinical features of the disease, which include chronic cough with copious secretions and a history of recurrent pneumonia. the five major causes of bronchiectasis are infection, obstruction, impaired mucociliary defenses, impaired systemic immune defenses, and congenital. these may produce either a localized or diffuse form of the disease. 
localized bronchiectasis is usually due to obstruction of airways by mass lesions or scars from previous injury or infection. diffuse bronchiectasis can result from defects in systemic immune defenses in which either innate or adaptive immunity may be impaired. diseases due to the former include chronic granulomatous disease (cgd), and diseases due to the latter include agammaglobulinemia/hypogammaglobulinemia and severe combined immune deficiencies. defects in the mucociliary defense mechanism that is responsible for physically clearing organisms from the lung may also cause diffuse bronchiectasis. these include ciliary dyskinesias that result in cilia with aberrant ultrastructure and cystic fibrosis (cf). congenital forms of bronchiectasis are rare but do exist. the most common include mounier-kuhn's syndrome and williams-campbell syndrome, the former causing enlargement of the trachea and major bronchi due to loss of bronchial cartilage, and the latter causing diffuse bronchiectasis of the major airways probably due to a genetic defect in the connective tissue [146, 147] . the pathology of bronchiectasis is most dramatically seen at the gross level. one can see dilated airways containing copious amounts of infected secretions and mucous plugs localized either to a segment of the lung or diffusely involving the entire lung as in cystic fibrosis (figure 18.35) . microscopic features include chronic inflammatory changes similar to those of chronic bronchitis but with ulceration of the mucosa and submucosa leading to destruction of the smooth muscle and elastic tissue in the airway wall and the characteristic dilatation and fibrosis. these enlarged airways contain mucous plugs comprising mucin and abundant degenerating inflammatory cells, a result of infections that establish themselves in these airways following the loss of the mucociliary defense mechanism. bacteria may be found in these plugs, most notably p. aeruginosa. 
the pathogenetic mechanism of bronchiectasis is complex and depends on the underlying etiology. in general, the initial damage to the bronchial epithelium is due to aberrant mucin (cystic fibrosis), dysfunctional cilia (ciliary dyskinesias), and ineffective immune surveillance (defects in innate and antibody-mediated immunity), leading to a cycle of tissue injury, repair, and remodeling that ultimately destroys the normal airway. the initial event in this cycle usually involves dysfunction of the mucociliary mechanism that inhibits the expulsion from the lungs of organisms and other foreign substances that invade the airways. this may be due to defects in the cilia or the mucin. ciliary defects are found in primary ciliary dyskinesia, a genetically heterogeneous disorder, usually inherited as an autosomal recessive trait that produces immotile cilia with clinical manifestations in the lungs, sinuses, middle ear, male fertility, and organ lateralization [148] . over 250 proteins make up the axoneme of the cilia, but mutations in 2 genes, dnai1 and dnah5, which encode for proteins in the outer dynein arms, most frequently cause this disorder [149] . in cf the main defect affects the mucin. in patients with this autosomal recessive condition, there is a low volume of airway surface liquid (asl) causing sticky mucin that inhibits normal ciliary motion and effective mucociliary clearance of organisms. this is due to a defect in the cystic fibrosis transmembrane conductance regulator (cftr) gene, located on chromosome 7 that encodes a camp-activated channel which regulates the flow of chloride ions in and out of cells and intracellular vacuoles, helping to maintain the osmolality of the mucin. this protein is present predominantly on the apical membrane of the airway epithelial cells, though it is also involved in considerable subapical, intracellular trafficking and recycling during the course of its maturation within these cells. 
this genetic disease manifests in multiple other organs that depend on chloride ion transport to maintain normal secretions, including the pancreas, intestine, liver, reproductive organs, and sweat glands [150] . the genetic mutations in cf influence the cftr trafficking in the distal compartments of the protein secretory pathway, and various genetic mutations produce different clinical phenotypes of the disease. over 1600 mutations of the cftr gene have been found. however, only four of these mutations occur at a frequency of greater than 1%. these mutations are grouped into five classes according to their functional deficit: group i, cftr is not synthesized; group ii, cftr is inadequately processed; group iii, cftr is not regulated; group iv, cftr shows abnormal conductance; group v, cftr has partially defective production or processing. approximately 70% of cf patients are in group ii and have the same mutation, f508del cftr, a deletion of phenylalanine at codon 508 [154] . in these patients, most of the cftr protein is misfolded and undergoes premature degradation within the endoplasmic reticulum, though a small amount of the cftr protein is present on the apical membrane and does function normally. cf patients may have a combination of genetic mutations from any of the five groups. however, those patients with the most severe disease involving both the lungs and pancreas usually carry at least two mutations from group i, ii, or iii [151] . systemic immune deficiencies cause bronchiectasis through the establishment of persistent infection and inflammation. there are four major categories of immune deficiencies. the first category consists of a number of genetic diseases that cause either agammaglobulinemia or hypogammaglobulinemia. these include x-linked agammaglobulinemia (xla) and common variable immunodeficiency (cvi). 
xla is caused by a mutation of the bruton's tyrosine kinase (btk) gene that results in the virtual absence of all immunoglobulin isotypes and of circulating b lymphocytes. in cvi there is a marked reduction in igg and iga and/or igm, associated with defective antibody response to protein and polysaccharide antigens. as expected, both of these diseases increase susceptibility to infections from encapsulated bacteria. the second category of immune deficiency is hyper-ige syndrome, a disease with markedly elevated serum ige levels that is characterized by recurrent staphylococcal infections. the third category is chronic granulomatous disease (cgd), a genetically heterogeneous group of disorders that have a defective phagocytic respiratory burst and superoxide production, inhibiting the ability to kill staphylococcus spp. and fungi such as aspergillus spp. finally, severe combined immune deficiency (scid) comprises a group of disorders with abnormal t-cell development and b-cell and/or natural killer cell maturation and function, predisposing these patients to pneumocystis jiroveci and viral infections [152] . after the initial insult, the subsequent steps in the development of bronchiectasis include destruction of the epithelial cells and bronchial wall connective tissue matrix by the proteases and ros secreted by the neutrophils. this proinflammatory milieu is produced by multiple factors. first, infections can persist in these lungs due to defective host immune systems and to mechanisms that certain organisms have developed to evade these immune defenses. for example, pseudomonas aeruginosa changes from a nonmucoid to a mucoid variant and also releases virulence factors to protect against phagocytosis [153] . second, in the case of cystic fibrosis, neutrophils are directly recruited by proinflammatory cytokines, such as interleukin-8 (il-8), released from the bronchial epithelial cells as a result of the defective cftr protein [154] . 
finally, the necrotic cellular debris and other breakdown products act as chemoattractants that recruit more inflammatory cells to the airway wall, further exacerbating the damage. the final phase of the repair and remodeling begins when macrophages invade and recruit fibroblasts that secrete collagen, leading to the fibrosis seen in the pathology. however, in the absence of effective airway clearance mechanisms, these ectatic airways remain a reservoir of infection that continues the cycle of inflammation and tissue destruction. the idiopathic interstitial pneumonias (iips) comprise a group of diffuse infiltrative pulmonary diseases with a similar clinical presentation characterized by dyspnea, restrictive physiology, and bilateral interstitial infiltrates on chest radiography [155] . pathologically, these diseases have characteristic patterns of tissue injury with chronic inflammation and varying amounts of fibrosis. by recognizing these patterns, a pathologist can classify each of these entities and predict prognosis. however, the pathologist cannot establish the etiology, since these pathologic patterns can be seen in multiple clinical settings. the pathologic classification of these diseases, originally defined by liebow and carrington in 1969 [156] , has undergone important revisions over the past 35 years with the latest revision by the american thoracic society/european respiratory society in 2003 [157] . the best known and most prevalent entity of the iips is idiopathic pulmonary fibrosis (ipf), which is known pathologically as usual interstitial pneumonia (uip). uip is a histologic pattern characterized by patchy areas of chronic lymphocytic inflammation with organizing and collagenous type fibrosis. these patients usually present with gradually increasing shortness of breath and a nonproductive cough after having had symptoms for many months or even years. imaging studies usually reveal bilateral, basilar disease with a reticular pattern [155] . 
therapy begins with corticosteroids, advancing to more cytotoxic drugs such as methotrexate and cytoxan, but most current therapies are not effective in stopping the progression of the disease. the current estimates are that 20/100,000 males and 13/100,000 females have the disease, most of whom progress to respiratory failure and death within 5 years [158] . the pathology is characterized by a leading edge of chronic inflammation with fibroblastic foci that begin in different areas of the lung at different times. these processes produce a variegated pattern of fibrosis, usually referred to as a temporally heterogeneous pattern of injury [159] . because it occurs predominantly in the periphery of the lung involving the subpleura and interlobular septae, the gross picture is one of more advanced peripheral and basilar disease (figure 18.36) . the progression from inflammation to fibrosis includes interstitial widening, epithelial injury and sloughing, fibroblastic infiltration, and organizing fibrosis within the characteristic fibroblastic foci. deposition of collagen by fibroblasts occurs in the latter stages of repair. the presence of the abundant collagen produces stiff lungs that are unable to clear the airway secretions, leading to recurrent inflammation of the bronchiolar epithelium with eventual fibrosis and breakdown of the airway structure. this remodeling produces mucous-filled ectatic spaces giving rise to the gross picture of honeycomb spaces, which is seen in the advanced pathology (figure 18.36) [160] . theories of the pathogenesis of ipf have evolved over the past decade. early theories favored a primary inflammatory process, while current theories favor the concept that the fibrosis of the lung proceeds independently of inflammatory events and develops from aberrant epithelial and epithelial-mesenchymal responses to injury to the alveolar epithelial cells (aecs) [161] . 
the aecs consist of two populations: the type 1 pneumocytes and the type 2 pneumocytes. in normal lungs, type 1 pneumocytes line 95% of the alveolar wall, and type 2 pneumocytes line the remaining 5%. however, in lung injury, the type 1 cells, which are exquisitely fragile, undergo cell death, and the type 2 pneumocytes serve as progenitor cells to regenerate the alveolar epithelium [162] . though some studies have suggested that repopulation of the type 2 cells depends on circulating stem cells, this concept remains to be fully proven. according to current concepts, the injury and/or apoptosis of the aecs initiates a cascade of cellular events that produce the scarring in these lungs. studies of aecs in lungs from patients with ipf have shown ultrastructural evidence of cell injury and apoptosis as well as expression of proapoptotic proteins. further, inhibition of this apoptosis by blocking a variety of proapoptotic mechanisms, such as the fas-fas ligand pathway, angiotensin and tnfa production, and caspase activation, can stop the progression of this fibrosis [163] . the result of the aec injury is the migration, proliferation, and activation of the fibroblasts and myofibroblasts that leads to the formation of the characteristic fibroblastic foci of the uip pathology and the deposition and accumulation of collagen and elastic fibers in the alveoli (figure 18.37) . this unique pathology may be a result of the increased production of profibrotic factors such as transforming growth factor-a (tgfa) and tgfb, fibroblastic growth factor-2, insulin-like growth factor-1, and platelet-derived growth factor. an alternative pathway might involve overproduction of inhibitors of matrix degradation such as timps (tissue inhibitors of metalloproteinases) [164] . in support of the former mechanism, fibroblasts isolated from the lungs of ipf patients exhibit a profibrotic secretory phenotype [165] . 
multiple factors, such as environmental particulates, drug or chemical exposures, and viruses may trigger the initial epithelial injury, but genetic factors also play a role. approximately 2%-20% of patients with ipf have a family history of the disease with an inheritance pattern of autosomal dominance with variable penetrance. two genetic mutations have been implicated in this familial form of ipf. one large kindred has been reported with a mutation in the gene encoding surfactant protein c, and six probands have been reported with heterozygous mutations in genes htert or htr, encoding telomerase reverse transcriptase and telomerase rna, respectively, resulting in mutant telomerase and short telomeres [166] . adult respiratory distress syndrome (ards) represents a constellation of clinical, radiologic, and physiologic features in patients with acute respiratory failure that can occur after a variety of insults. ards is defined by clinical criteria that include a rapid onset of severe hypoxemia that is refractory to oxygen therapy, the presence of abnormal chest radiographs with evidence of bilateral alveolar filling and collapse, increased pulmonary artery occlusion pressure, and a resistance to improved oxygenation regardless of mechanical ventilation therapy [167] . treatment of ards includes eliminating the underlying cause, protective ventilation strategies that improve oxygenation, and supportive treatment that may include administration of corticosteroids. the pathology of ards is diffuse alveolar damage (dad), whose histologic picture is one of inflammation and fibrosis that diffusely involves all of the structures of the alveolus and is similar throughout the affected areas of the lung [168] . dad is divided into three major phases that follow each other chronologically after the original insult. these are exudative, proliferative, and fibrotic dad. 
the initial injury primarily involves the epithelium of the alveolar wall and the endothelium in the capillary, causing the destruction and sloughing of the type 1 pneumocytes into the alveolar space and a breakdown of the tight junctions of the endothelium. in combination, these two events result in the loss of the epithelial-endothelial barrier of the alveolus and leakage of plasma from the capillary into the alveolar space. this flooding of the airspace with fluid markedly decreases oxygen exchange and causes the hypoxia that these patients experience. in addition, acute inflammatory changes of the endothelium also cause thrombi to form in vessels, adding to a decreased amount of blood circulating through the lung and further compromising gas exchange. as air is brought into the alveoli, the positive pressure within the alveolar space forces the plasma against the alveolar wall, producing the membranous structures known as hyaline membranes that characterize the first phase of dad, exudative dad (figure 18.38) . this initial injury is followed by a sequence of events that represent the lung's efforts to repair itself. first, type 2 pneumocytes undergo hyperplasia and re-epithelialize the alveolar wall after the loss of the type 1 cells. this re-establishes the epithelial barrier and, because these cells secrete surfactant, results in increased surfactant production, which lowers the surface tension of the alveolus and inhibits its collapse. because of the increased numbers of type 2 pneumocytes, this is known as the proliferative phase of dad (figure 18.38) . in the final phase of dad, fibrotic dad, fibroblasts migrate in from the adjacent interstitium to the alveolar space and produce organizing and irreversible fibrosis within both the alveolar space and the interstitium. in addition to this mechanism, fibrosis may also occur in those areas where alveolar walls collapse when surfactant is decreased during the initial insult. 
the histopathologic picture during this fibrotic phase is one of thickened alveolar septa, intra-alveolar granulation tissue, microcyst formation, and areas of irregular alveolar scarring. in rare cases, these microcysts progress to large cysts, an adult equivalent of bronchopulmonary dysplasia. the cellular events of dad are complex and incompletely understood. in general, the disease can be broken down into two phases. in the first, a large influx of neutrophils and plasma enter the alveolar space. the role the neutrophils play in the initial cellular injury and death is unclear, but it is known that they are necessary for this injury to occur. in addition, clinical studies have shown that within the peripheral blood and bronchoalveolar lavages (bal) of these patients, neutrophils are present along with a myriad of proinflammatory cytokines, such as il-8, il-1, and tgfa, all of which are capable of recruiting them to the lung. also present in these fluids are mediators that recruit fibroblasts such as tgfb. all of these mediators are probably the result of upregulation of nfkb, a proinflammatory transcription factor, in alveolar macrophages. the adherence of neutrophils to the capillary endothelium in the lung occurs through adhesion molecules such as selectin, integrin, and immunoglobulins. neutrophil adherence and subsequent transmigration through the endothelium of the lung capillaries may cause some endothelial damage. however, most speculate that ros and reactive nitrogen species (rns) secreted by the neutrophils modulate the majority of this injury [169] . this is supported by the finding that patients with ards have products of oxidative damage such as hydrogen peroxide (h2o2) in the exhaled breath and myeloperoxidase and oxidized aat in the bal. 
the cell injury and death of the type 1 pneumocytes most likely occurs via two mechanisms: lipopolysaccharide (lps)-induced caspase-dependent apoptosis and hyperoxia-induced cell death through apoptosis and nonapoptotic mechanisms [170] . in the former, lps, an immunogenic component of the outer membrane of gram-negative bacteria, may trigger innate immune and inflammatory responses via toll-like receptors that bind fas-associated death domain protein and caspase-9, leading to epithelial cell death. in hyperoxia-induced cell death, hyperoxia may induce the expression of angiopoietin 2 (ang2) in lung epithelial cells. ang2 is an angiogenic growth factor that can activate caspase pathways and lead to apoptotic cell death [170] . cell death in ards is not limited to these mechanisms, and further study of the many pathways by which it can occur is needed. lymphangioleiomyomatosis (lam) is a rare systemic disease of women, usually in their reproductive years (average age of 35 years), that is characterized by a proliferation of abnormal smooth muscle cells giving rise to cysts in the lungs, abnormalities in the lymphatics, and abdominal tumors, most notably in the kidneys. in addition to sporadic cases (denoted as s-lam), lam also affects 30% of women with tuberous sclerosis (denoted as tsc-lam), a genetic disorder with variable penetrance associated with seizures, brain tumors, and cognitive impairment [171, 172] . global estimates indicate that tsc-lam may be as much as 5-fold to 10-fold more prevalent than s-lam, though at least some suggest that tsc-lam may have a milder clinical course than s-lam [172] . clinically, lam patients usually present with increasing shortness of breath on exertion, obstructive symptoms, spontaneous pneumothoraces, and chylous effusions or with abdominal masses consisting of angiomyolipomas and/or lymphangiomyomas. 
chest imaging studies characteristically reveal hyperinflation with flattened diaphragms and thin-walled cystic changes. mortality at 10 years from the onset of symptoms is 10%-20% [173] . lam appears as small, thin-walled cysts (0.5-5.0 cm) randomly throughout both lungs [174] (figure 18.39) . microscopically, lam lungs contain a diffuse infiltration of smooth muscle cells, predominantly around lymphatics, veins, and venules. most notably, one finds smooth muscle cells in the subpleural region with hemosiderin-laden macrophages in the adjacent field, and the macrophages are also seen on bronchoalveolar lavage specimens from these patients. the hemosiderin pigment in these lungs is thought to be secondary to microhemorrhages from the obstruction of the veins (figure 18.40) [175] . the smooth muscle cells in lam react to antibodies to hmb-45, a premelanosomal protein. other melanosome-like structures are also found in lam cells, suggesting that these cells have characteristics of both smooth muscle and melanosomes [176] . the lesional cells in lam are smooth muscle-like with both spindled and epithelioid morphology [177] . these cells are the same in both s-lam and tsc-lam and are a clonal population although they lack other features of malignancy [178] . molecular studies reveal that the abnormal lam cell proliferation is caused by mutations in one of two genes linked to tuberous sclerosis: tuberous sclerosis complex 1 or 2 (tsc1 or tsc2). these two genes control cell growth and differentiation through the akt/mammalian target of rapamycin (mtor) signaling pathway [172] . in this pathway, a growth factor receptor (such as insulin or pdgf receptors) becomes phosphorylated when an appropriate ligand binds, resulting in activation of downstream effectors and ultimately akt. 
the gene products of tsc1 and tsc2 are hamartin and tuberin, which act as dimers to maintain rheb (a member of the ras family) in a gdp-loaded state via statins, acting as a brake on the akt/mtor pathway, thereby retarding protein synthesis and cell growth. in lam cells, loss-of-function mutations in these two genes remove this inhibition, leading to enhanced rheb activation, mtor activation (with raptor), and subsequent phosphorylation of downstream molecules which result in uncontrolled cell growth, angiogenesis, and damage to the lung tissue (figure 18.41) [179] . the abnormal proliferation of lam cells is thought to damage the lung through overproduction of matrix metalloproteinases (mmps), which degrade the connective tissue of the lung architecture, destroy the alveolar integrity, and result in cyst formation with air trapping [179] . these destructive capabilities of the lam cells are enhanced by their secretion of the angiogenic factor vegf-c, which is thought to cause the proliferation of lymphatic channels throughout the lung [179] . sarcoidosis is a multisystemic disease that involves the lung in over 90% of the cases [180] . it is most common in the 20-40-year age group and among females. in the united states, african americans are more commonly affected than caucasians [181] . the clinical picture of sarcoidosis is variable, but most patients present with systemic symptoms including fatigue, weight loss, and fever. the most common finding on chest imaging studies is bilateral hilar lymph node enlargement and reticular, reticulonodular, and focal alveolar opacities within the lung parenchyma [182] . pulmonary sarcoidosis is characterized by granulomas which consist of activated histiocytes, called epithelioid histiocytes, that form nodules ranging in size from 15-20 microns (figure 18.42) [183] . 
unlike infectious granulomas that usually contain areas of central necrosis, the granulomas in pulmonary sarcoidosis are predominantly non-necrotizing [184] . also, the granulomas in sarcoidosis follow a distribution along the lymphatics, which includes the subpleural area, the interlobular septae, and the bronchovascular area containing the bronchiole and a branch of the pulmonary artery (figure 18.42). the granulomas occur much more commonly in the upper lobes, leading to the predominant upper lobe fibrosis and bronchiectasis that can be seen in longstanding sarcoidosis [185] . despite over 50 years of research on sarcoidosis, the etiology remains unknown. most agree that the disease is probably a result of environmental triggers acting on a genetically susceptible host [186, 187] . a genetic basis of sarcoidosis has been suggested by studies that demonstrate familial clustering and racial variation [188, 189] . further, complex inheritance patterns for the disease suggest that more than one gene may be involved [190] . several genes of the major histocompatibility complex (mhc) region of the genome have been implicated. most are clustered on the short arm of chromosome 6 that encompasses the human leukocyte antigen (hla) domain. the hla class i mhc molecules associated with sarcoidosis are the hla-b7 and hla-b8 class i alleles [191, 192] . hla class ii molecules implicated in susceptibility include the hla-dr alleles [193, 194] . genes other than mhc genes thought to regulate the susceptibility to sarcoidosis include those for chemokines such as macrophage inflammatory protein-1a and rantes (ccr5 and ccr3) [195, 196] . environmental factors that have been implicated are those that are aerosolized. therefore, these environmental agents have a mode of entry into the lungs and can cause granulomas in the lung, similar to sarcoidosis. these factors can be divided into two major categories, which include infectious and noninfectious agents. 
the mycobacteria have been the most extensively studied organisms. however, their role in this disease remains controversial due to the difficulty in identifying them by either culture or histochemical stains in sarcoid tissue. recently, molecular techniques have been able to demonstrate mycobacterial nucleic acid in sarcoid tissue [197, 198] . however, even studies using this technology have not produced consistent results, and the role of these organisms in the disease requires further study. the immune response in sarcoidosis has two major features: (i) the initial event leading to granuloma formation and (ii) the progression of this granulomatous response to either resolution or fibrosis [199] . the formation of the granulomas, triggered by activation of t-cells and antigen-presenting dendritic histiocytes, results in a release of proinflammatory cytokines and chemokines, and recruitment, activation, and proliferation of mononuclear cells, predominantly t-cells. these activated t-cells are predominantly cd4-expressing t-helper (th) cells, which release ifn-g and il-2. alveolar macrophages at the site release tnfa, il-12, il-16, and other growth factors. this results in the granuloma formation and alveolitis, the characteristic morphologic features of the disease [200] . the second phase of this immunologic response that leads to either resolution of the disease or persistence of the granulomas and fibrosis is less well characterized. ongoing granuloma formation and inflammation may be a result of the persistent presence of antigens, the excessive synthesis of chemotactic factors, or the persistence of the mononuclear cells within the granulomas. importantly, the role of the t-cells in these granulomas is to secrete cytokines that attract, stimulate, and ultimately deactivate the fibroblasts that are responsible for the fibrosis that is seen in the chronic disease. 
the balance between the profibrotic mediators such as tgfb, insulin-like growth factor-i, platelet-derived growth factor (pdgf), and the antifibrotic mediators, such as ifn-g, probably dictates the natural history of sarcoidosis in the lung [201] . genes involved in macrophage-derived cytokines, chemokines, and mediators of fibrosis are all possible candidates for the underlying genetic cause of this complicated disease. pulmonary alveolar proteinosis (pap) is a rare disease of the lungs characterized by accumulation of surfactant in the alveolar spaces. the names alveolar proteinosis, lipoproteinosis, or perhaps most accurately phospholipoproteinosis, apply equally to this entity. pap takes three forms clinically: (i) congenital (2%), (ii) secondary (5%-10%), and (iii) idiopathic or primary (88%-93%) [202] [203] [204] . pap arises in previously healthy adults with the median age at diagnosis of approximately 40 years and a male-to-female ratio of 2.7:1. the clinical presentation is variable and usually includes an insidious onset of slowly progressive dyspnea, a dry cough, and other symptoms of respiratory distress, including fatigue and clubbing. however, almost one-third of patients are asymptomatic and are found clinically by abnormal chest x-rays [205, 206] . the secondary form of pap can be found in patients with environmental exposures, including fine silica, aluminum, titanium dioxide, and kaolin dust [206] . also, secondary pap may be found in patients with malignancies, most commonly hematologic malignancies such as myelogenous leukemia [207, 208] . chest imaging studies in both the idiopathic and secondary forms most commonly show fine, diffuse, feathery nodular infiltrates, centered in the hilar areas, sparing the peripheral regions [206] . on chest computerized tomographs, the infiltrates may have a geometric-type shape, sometimes referred to as crazy paving [209] . 
the most prominent microscopic feature of both idiopathic and secondary pap is the filling of the alveoli with finely granular periodic acid-schiff-positive, diastase-resistant (pas-d) acellular material (figure 18 ...). this material consists of phospholipids (90%); surfactant proteins a, b, c, and d (10%); and carbohydrate (<1%) [210]. alveolar macrophages (ams) with prominent foamy cytoplasm are commonly seen, while alveolar septa are remarkably normal in appearance. in some alveolar spaces there are denser, more solid clumps of pas-d-positive material. definitive pathologic differences between the idiopathic and secondary forms of pap have not been well documented [211, 212]. the etiologies of the two adult forms of pap have been well studied, with the most known about the idiopathic variant. theories of the pathogenesis of this form have focused on the abnormal accumulation of the surfactant-like material within the alveolar spaces. since the regulation of surfactant levels in the alveoli depends on appropriate synthesis, recycling, and catabolism, the two opposing hypotheses have been overproduction versus decreased degradation of this material. in normal hosts, surfactant is essential to maintaining the low surface tension needed for proper alveolar inflation and gas exchange. the critical role of maintaining the proper composition and amount of surfactant in the alveoli is performed by two cell types: type 2 pneumocytes and alveolar macrophages [213]. the type 2 pneumocytes synthesize surfactant in the endoplasmic reticulum and golgi, and store it as lamellar bodies [213], which are then delivered to and fuse with the apical plasma membrane, secreting the surfactant into the airways [214]. catabolism of surfactant is carried out by type 2 pneumocytes and ams. in pap, most evidence suggests that the clearance of surfactant by the am is decreased [203, 215].
the first clue as to the underlying mechanism for this defect in am function came in 1994 when studies revealed that knockout mice deficient in granulocyte-macrophage colony-stimulating factor (gm-csf) develop lung lesions similar to those in patients with pap [216]. this rather serendipitous finding prompted explorations centered on the am and the effect diminished gm-csf might have on its cellular functions. subsequent studies from humans with pap revealed an autoimmune mechanism by which a circulating neutralizing antibody to gm-csf blocked its binding to the gm-csf receptor, depressing the effect of gm-csf on the ams [217-219]. neutralizing antibodies to gm-csf have most often been identified in the idiopathic variant of pap. however, recently these antibodies have also been reported in patients with secondary pap [220]. genes that govern many functions in the am are controlled by signaling pathways initiated by gm-csf binding to the am. one pathway is mediated through a transcription factor, pu.1, that controls genes involved in surfactant degradation, among other bactericidal functions [221, 222]. another transcription factor, peroxisome-proliferator-activated receptor γ (pparγ), is also part of a pathway activated by gm-csf. pparγ controls the expression of genes involved in intracellular lipid metabolism. ams from patients with pap have a deficiency of this transcription factor, which is correctable by gm-csf therapy [223]. overall, the lack of gm-csf-initiated signaling in ams from patients with pap leads to inhibition of both the pparγ and pu.1 pathways. this results in decreased surfactant catabolism and intracellular lipid metabolism, and in the accumulation of surfactant in the alveoli (figure 18.44).
pulmonary hypertension consists of a group of distinct diseases whose pathology is characterized by abnormal destruction, repair, remodeling, and proliferation of all compartments of the pulmonary vascular tree, including arteries, arterioles, capillaries, and veins. the classification of these diseases has undergone a number of revisions. the most recent revision (in 2003) groups these diseases based on both their pathologic and clinical characteristics [224]. there are five major disease categories in the current classification system: (i) pulmonary arterial hypertension (pah); (ii) pulmonary hypertension with left heart disease; (iii) pulmonary hypertension associated with lung disease and/or hypoxemia; (iv) pulmonary hypertension due to chronic thrombotic and/or embolic disease; and (v) miscellaneous causes, including sarcoidosis, histiocytosis x, and lymphangioleiomyomatosis. the clinical course of most patients with pulmonary hypertension begins with exertional dyspnea, and progresses through chest pain, syncope, increased mean pulmonary artery pressures and, eventually, right heart failure. the rate of this clinical progression varies among patients, from a few months to many years [225]. treatment of these diseases focuses on blocking the mediators involved in the pathogenesis of the diseases. however, current therapies rarely prevent progression of the disease, and lung transplantation provides the only hope for long-term survival. the major group of this classification, pah, can be subdivided into familial pah, idiopathic pah, pah associated with other conditions (such as connective tissue diseases, hiv, congenital heart disease), and pah secondary to drugs and toxins (such as anorexigens, cocaine, and amphetamines). in these diseases, the primary pathology is localized predominantly in the small pulmonary arteries and arterioles.
however, two other diseases in this group, pulmonary veno-occlusive disease and pulmonary capillary hemangiomatosis, involve predominantly other components of the pulmonary vasculature, the veins and the capillaries, respectively. the pathologic changes seen in the pulmonary vessels of these patients primarily reflect injury to and repair of the endothelium. early pathologic changes include medial hypertrophy and intimal fibrosis that narrows and obliterates the vessel lumen. these are followed by remodeling and revascularization, producing a proliferation of abnormal endothelial-lined spaces. these structures are known as plexogenic lesions and are the pathognomonic feature of pah (figure 18.45). in the most severe pathologic lesions, these abnormal vascular structures become dilated or angiomatoid-like and may develop features of a necrotizing vasculitis with transmural inflammation and fibrinoid necrosis. though the exact pathogenetic mechanism of pah remains unknown, research over the past 10 years has begun to offer some clues. the familial form of pah, with a 2:1 female-to-male prevalence, has an autosomal dominant inheritance pattern with low penetrance. the genetic basis for this has been found to be germline mutations in the gene encoding the bone morphogenetic protein receptor type 2 (bmpr2). these mutations account for approximately 60%-70% of familial pah and 10%-25% of patients with sporadic pah [233]. approximately 140 bmpr2 mutations have been identified in familial pah, each resulting in a loss of receptor function, either through alteration in transcription of the gene through missense, nonsense, or frameshift alterations in the codon or by rna splicing mistakes [226]. the mechanism by which a single mutation to the bmpr2 gene induces vascular smooth muscle proliferation and decreased apoptosis is not completely understood, but it most likely involves defects in the bmpr2 signaling pathway.
bmpr2 is a receptor for a family of cytokines (bmps) that are members of the tgf-β superfamily of proteins, which play a role in the growth and regulation of many cells, including those of the pulmonary vasculature. in the vascular smooth muscle cells of the lung, tgf-β signaling causes a proliferation of smooth muscle in pulmonary arterioles, while bmpr2 signaling causes an inhibition of the proliferation of these cells, favoring an apoptotic environment. the bmpr2 signaling occurs through an activation of a receptor complex (bmpr1 and bmpr2) that leads to phosphorylation and activation of a number of cytoplasmic mediators, most notably the smad proteins (mothers against decapentaplegic). these smad proteins, especially the smad 1, smad 5, and smad 8 complex with smad 4, translocate to the nucleus where they target gene transcription that induces an antiproliferative effect in the cell. in familial pah, the bmpr2 gene mutation may lead to insufficient protein product and subsequent decreased protein function, in this case decreased bmpr2 receptor function, decreased smad protein activation, and decreased antiproliferative effects in the vascular smooth muscle cells. the imbalance between the proproliferative effects of the tgf-βs and the antiproliferative effects of the bmps results in the formation of the vascular lesions of pah (figure 18.46) [227, 228]. despite these advances, questions regarding the pathogenesis of pah remain. most notably, why do only 10%-20% of patients with the mutation develop clinical disease? some speculate that genes confer susceptibility but a second hit is required to develop the clinical disease, such as modifier genes or environmental triggers, perhaps drugs or viral infections [227, 229]. pulmonary vasculitides present as diffuse pulmonary hemorrhage and are usually caused by one of three major pulmonary vasculitis syndromes: wegener's granulomatosis, churg-strauss syndrome, and microscopic polyangiitis.
all three diseases have similar clinical presentations and considerable overlap in their pathologic features as small vessel systemic vasculitides that affect the lung as well as other organs, most notably the kidney. wegener's granulomatosis (wg) is an unusual disease that affects the upper and lower respiratory tract and the kidneys. it usually presents between 40 and 60 years of age and is slightly more common in men than women. the clinical presentation depends on the affected organ, but when the lung is involved, hemoptysis is the major presenting symptom. chest imaging studies may show a variety of patterns, most commonly bilateral ground glass opacities with masses, usually in the lower lobes, that may cavitate. immunologic testing of peripheral blood or end organ tissue can be helpful in revealing characteristic immunofluorescent staining patterns for antineutrophilic cytoplasmic antibody (anca), an antibody that targets two substances: proteinase 3 (pr3) and myeloperoxidase (mpo). when present in either the blood or the tissue, the pattern of immunofluorescent staining can be cytoplasmic (c-anca) or perinuclear (p-anca). the former pattern is more commonly seen in wegener's granulomatosis, and the latter is more commonly seen in microscopic polyangiitis and churg-strauss syndrome (css). css is a systemic disorder defined by the presence of asthma, peripheral blood eosinophilia, and systemic vasculitis. similar to wg, it usually presents between 40 and 60 years of age, and a clinical diagnosis requires a history of asthma, a peripheral blood eosinophilia, neuropathy, an abnormal chest imaging study, and sinusitis. other organs involved include the heart, the central nervous system, kidneys (though less commonly than wg), gastrointestinal tract, and skin. chest imaging usually shows patchy, multifocal infiltrates; masses and cavitation are rare. laboratory tests reveal positive p-anca tests in 70% of patients.
microscopic polyangiitis (mpa) is similar to both wg and css in that it is a systemic vasculitis that involves the lung and usually presents in the fourth or fifth decade of life. the clinical onset is usually sudden with fever, weight loss, myalgias, and arthralgias. the kidney is the main organ involved, and mpa is the most common cause of pulmonary-renal syndrome. lung involvement occurs in approximately 50% of the patients, and skin and upper respiratory tract are other common sites. similar to wg and css, anca testing is helpful, with positive p-anca in 80% of patients. chest imaging usually shows bilateral infiltrates without masses, similar to css. treatment for all three diseases is immunosuppression with glucocorticoids or cyclophosphamide, and all three usually respond well, although wg has a greater relapse rate after treatment than either css or mpa [230]. the pathologies of wg, css, and mpa have overlapping features of an acute and chronic vasculitis that involves medium- and small-sized vessels in the lung. the inflammatory cell infiltrate that destroys the blood vessels is both lymphocytic and neutrophilic, and areas of fibrinoid necrosis are seen. however, in wg, there are characteristic areas of microabscesses that lead to masses of geographic necrosis with basophilia. scattered multinucleated giant cells are present, but no well-formed granulomas are seen. this helps to distinguish it from other vasculitides and infection (figure 18.47). similarly, the pathology of css has distinguishing features, with the early pathology characterized by an eosinophilic pneumonia with areas of loosely formed granulomas with central necrosis containing degenerating eosinophils (figure 18.48). the infiltrate is predominantly eosinophils, but neutrophils, lymphocytes, and plasma cells are also present.
capillaritis can be seen in wg, css, and mpa, and all three have hemosiderin deposition present, both within alveolar macrophages and deposited in the connective tissue of the interstitium and the vessel walls. the pathogenesis of these three pulmonary hemorrhage syndromes is similar to the mechanisms of these diseases in the kidney. in general, these diseases in the lung and the kidney represent immune-mediated necrotizing vasculitides that have few or no immune deposits in the vessels but exhibit the presence of anca autoantibodies to myeloperoxidase (mpo) and proteinase 3 (pr3), the components of primary granules of neutrophils. [figure caption fragment: these lesions are thought to be the early form of the larger areas of geographic necrosis that produce the mass-like nodules found in these lungs.] mpa and css are primarily diseases of mpo antibodies, and wg is primarily a disease of pr3 antibodies. the mechanism by which the ancas are induced is not known but may be part of an autoimmune response to environmental exposures early in life. these autoantibodies then inflict damage on the vessels through a mechanism that is not yet completely understood. one theory suggests that circulating ancas bind to pr3 and mpo on the surface of neutrophils and initiate a respiratory burst, degranulation, and apoptosis. ros and proteases are released and inflict endothelial and tissue damage on the adjacent vessel. the anca binding may also induce the release of proinflammatory cytokines and chemokines such as il-1 and tnf-α that further contribute to the vascular inflammation.
the second theory postulates that circulating immune complexes of excess anca antigen (mpo or pr3) and anca autoantibodies attach to the vascular endothelium and activate complement, which results in the chemotaxis and adhesion of inflammatory cells, causing these cells to undergo a respiratory burst and, as in the first theory, to release ros and proteases that cause the vascular endothelial damage. in both theories, it is important to remember that mpo and pr3 are also present in monocytes and that anca autoantibodies may be involved with monocytes in similar ways to release inflammatory mediators [231]. infectious diseases of the lung are a common cause of pulmonary disease given the constant exposure of the lungs to the environment. various organisms are capable of causing these infections, including common viruses and bacteria, as well as more uncommon fungi, parasites, and protozoa. the diagnosis of the specific etiologic agent can be challenging given that most have similar clinical features and many are difficult to identify in the lung tissue. this brief overview of the defense mechanisms the lung uses to protect itself will serve to introduce the pathology of these lung infections. the lung has multiple anatomic mechanisms by which it defends itself against invasion by various pathogens. first, the upper nasal cavities and respiratory tract serve as anatomic barriers to inhaled organisms. the ciliated epithelium and tortuous cavities of the sinuses screen large organisms (typically larger than 10 microns). for those particles that venture further down the respiratory tract, the cough reflex that the upper trachea elicits serves to expel them up and out. second, the mucociliary tree of the upper respiratory tract captures organisms that evade these two mechanisms. the bronchial epithelium contains cilia of up to 20 microns in length that extend into the air surface liquid (asl).
the asl is a bilayer of 50-100 microns in thickness consisting of a low-viscosity or watery lower layer that is covered by a high-viscosity or gel upper layer secreted by adjacent goblet cells. this sticky upper layer serves to trap organisms, and the coordinated beating of the cilia moves these entrapped invaders up this mucociliary escalator to the larynx, where they can be expectorated. present in the secretions of the large airways and within the surfactant lining the alveolar walls are soluble mediators secreted by various cells. these mediators include lysozyme and lactoferrin, which lyse bacteria and inhibit their growth; the defensins and cathelicidins, small peptides both with microbicidal properties; and surfactant proteins a and d at the alveolar level, which bind to microorganisms and enhance phagocytosis and also have direct bactericidal activity [232]. the major cells of the innate immune response of the lung are the alveolar macrophages (am) and the polymorphonuclear leukocytes (pmn). neutrophils phagocytize and destroy bacteria such as s. aureus, s. pneumoniae, and h. influenzae through a respiratory burst that generates nadph oxidase-dependent ros. in some instances, ams may ingest but not kill an organism. this occurs with such organisms as mycobacterium spp., nocardia spp., and legionella spp. because of the ability of these organisms to continue to replicate within the am, cell-mediated immunity is required for their complete elimination. patients with defects in nadph oxidase are especially prone to respiratory infections by such organisms as s. aureus, nocardia spp., and aspergillus spp. bronchial epithelial cells are important in innate immunity through secretion of cytokines and molecules including il-1, il-5, il-6, il-8, and granulocyte-macrophage colony-stimulating factor (gm-csf).
these molecules attract macrophages as well as neutrophils and other inflammatory cells to the area to enhance the inflammatory response to the organism [233]. bronchial epithelial cells also serve an important role in recognizing pathogens through pattern-recognition receptors (prrs). natural killer (nk) cells are involved in the innate immune response with surface receptors that recognize cells infected with viruses such as rsv, influenza, parainfluenza, and rhinovirus. the nk cells release ifn-γ, which recruits other immune cells to add to the antiviral response. dendritic cells are tissue histiocytes positioned around the airways and lymphatics in the lung that recognize pathogens and their antigens and trigger the proliferation and amplification of antigen-specific t-cells. this immune response bridges the innate immune response to the adaptive immune responses and is especially important in fungal infections. this mechanism is mediated through toll-like receptors (tlrs) that are able to distinguish pathogens from self-components by triggering cytokine production through nf-κb and ap-1 and expressing co-stimulatory molecules necessary for this t-cell activation [234]. for those organisms that evade the basic, innate immunity of the lung, there are adaptive immune mechanisms that encompass both humoral and cellular immune mechanisms. humoral immunity is an important defense against encapsulated bacteria, most notably s. pneumoniae, and against other pyogenic bacteria such as h. influenzae and staphylococci spp., and resolution of these infections requires the production of igg antibodies to the organisms. cellular immunity is especially important against such respiratory viral infections as influenza, rsv, cmv, and varicella, and also against opportunistic infections. these viruses induce a cd4+ and cd8+ t-cell response that clears the lung of these viruses within 8-10 days post infection.
granulomas are a common inflammatory response to both pathogens and foreign material. the most notable granulomatous infections in the lung are due to mycobacteria and fungal organisms. activation of cd4+ t-cells by these organisms leads to proliferation and differentiation of these cd4+ t-cells into t-helper-1 cells. the release of ifn-γ by the th-1 cells activates lung macrophages to form epithelioid macrophages that have an increased ability to kill the microorganisms and express surface molecules that promote cell-to-cell fusion into giant cells. in addition, activation of these macrophages results in the release of numerous cytokines including ifn-γ and tnf-α. in patients who are deficient in cd4+ t-cells or ifn-γ, granuloma formation is very poor, altering the pathologic picture of these infections. this effect is most obvious in the nontuberculous mycobacterial infections, which have numerous patterns of injury depending on the immune status of their host. pneumonias can be broadly categorized into one of five major clinicopathologic categories, including (i) community-acquired pneumonias (acute and atypical), (ii) nosocomial pneumonias, (iii) aspiration pneumonias and lung abscess, (iv) chronic pneumonias, and (v) pneumonias in immunocompromised hosts. each type presents with a characteristic clinical pattern and may be caused by any of several pathogens, so treatment is often empiric. the first category comprises community-acquired pneumonias (cap). these represent the majority of the lung infections that receive medical treatment, usually on an outpatient basis, with low (<1%) mortality. patients hospitalized for these infections typically have other comorbidities. the responsible organisms include respiratory syncytial virus (rsv), rhinovirus, parainfluenza, and influenza virus; bacteria, including mycoplasma pneumoniae and rickettsia; and most notably chlamydia pneumoniae.
chlamydia causes what is termed atypical pneumonia, with a clinical course characterized by a progressive onset of fever without chills, a dry cough, and chest imaging that reveals focal infiltrates. acute or typical cap presents abruptly with high fever, chills, productive cough, and radiographs with lobar or segmental consolidation. the most common pathogens are streptococcus pneumoniae, haemophilus influenzae, staphylococcus aureus, and moraxella catarrhalis. the second category, nosocomial pneumonias, consists of infections acquired within the hospital or from healthcare-associated facilities. these infections are usually found in patients with predisposing risk factors and are a major source of morbidity and mortality, with some studies reporting a mortality range of 20%-50%. the most common risk factors include respiratory ventilation, artificial airways, nasogastric tubes, supine positioning, and medications that alter gastric emptying. the responsible organisms include klebsiella spp., legionella spp., staphylococcus aureus, and pseudomonas aeruginosa. the third category includes aspiration pneumonias and lung abscesses. these infections occur in patients with aberrant swallow or gag reflexes that allow gastric or oral contents into the airways. the organisms that cause necrosis and cavity formation include s. aureus, k. pneumoniae, the anaerobic oral flora, and mycobacteria. clinically, these infections may have an acute course with fever and dyspnea or a more insidious course, often with patients first presenting with lung cavities, empyemas, or necrotizing pneumonias. the fourth category, chronic pneumonias, includes indolent infections that cause a localized mass-like lesion in an otherwise healthy host. nocardia and actinomyces spp. are the most common pathogens, but mycobacteria and fungi may also cause these pneumonias. the fifth category includes pneumonias that occur in the setting of an immunocompromised patient.
these include a number of organisms that otherwise would not act as pathogens, such as the viruses cmv and hsv, the fungi aspergillus and pneumocystis, and the bacterium mycobacterium avium complex.

streptococcus pneumoniae

streptococcus pneumoniae, a gram-positive diplococcus also known as pneumococcus or diplococcus pneumoniae, is a common cause of bacterial pneumonia in infants and elderly patients, alcoholics, diabetics, and patients with immunosuppression. this pneumonia usually presents abruptly with chills, a cough with rust-colored sputum and pleuritis, with high fevers, tachycardia, and tachypnea. the characteristic gross pathology is a lobar pneumonia that progresses from a red acute phase to a gray organizing phase. a fibrinous pleuritis is common, which eventually organizes to entrap the lung parenchyma in a fibrous capsule [235]. the microscopic examination reveals abundant fibrin, neutrophils, and extravasated red blood cells within the alveolar space and congested capillaries.

hemophilus influenzae

hemophilus influenzae is a gram-negative bacillus that inhabits the upper respiratory tract and can cause otitis media, epiglottitis, and meningitis, and usually enters the lung through aspiration or hematogenous spread. six serotypes are defined based on their capsular antigens, with type b the most common cause of pneumonias. this type of pneumonia is most commonly found in children or in the elderly with underlying chronic lung disease such as emphysema, cystic fibrosis, or bronchiectasis, in patients with hiv infection, or in alcoholics. this bacterial pneumonia is usually preceded by a viral or mycoplasma infection that damages the mucociliary elements in the airways and allows for colonization by h. influenzae. the symptoms include fever; a productive, purulent cough; and myalgias. the incidence of this pneumonia as a common community-acquired pneumonia in children is quite low due to the advent of effective vaccines.
however, it is increasing in incidence as a nosocomial infection [236]. like pneumococcal pneumonia, the pathology of h. influenzae pneumonia is in a lobar distribution with a neutrophilic-rich infiltrate and a pleural effusion. necrosis and empyema may occur but are uncommon.

staphylococcus aureus

staphylococcal pneumonia is caused by staphylococcus aureus, gram-positive cocci that usually spread to the lung through the blood from other infected sites, most often the skin. though a common community pathogen, it is found twice as frequently in pneumonias in hospitalized patients. it often attacks the elderly and patients with cf and arises as a co-infection with influenza viral pneumonia. the clinical course is characterized by high fevers, chills, a cough with purulent bloody sputum, and rapidly progressing dyspnea. the gross pathology commonly reveals an acute bronchopneumonia pattern (figure 18.49) that may evolve into a necrotizing cavity with congested red/purple lungs and airways that contain a bloody fluid and thick mucoid secretions. the histologic pattern is characterized by a bronchopneumonia that spreads distally from the small airways into the alveolar spaces (figure 18.50) to form abscesses that connect with the pleural surface and may result in empyemas. the treatment of this organism has become increasingly problematic due to antibiotic-resistant strains, most notably methicillin-resistant s. aureus.

legionella pneumophila

legionella are gram-negative bacilli found predominantly in aquatic habitats such as lakes, rivers, and ponds. standing pools of water from humidifiers and other water outlets may be other sources. approximately 50% of air conditioners contain these bacilli. though 15 serogroups of legionella have been identified, 3 cause the overwhelming majority of human pneumonia.
the clinical disease takes two forms: (i) legionnaires' disease, named after the outbreak of pneumonia at the 1976 american legion convention in philadelphia; and (ii) pontiac fever, a self-limiting flu-like disease with nonspecific symptoms. legionella pneumonia presents as a severe infection of the lung with chills and rigors with a nonproductive cough. it can progress rapidly to systemic symptoms of nausea, vomiting, and diarrhea and can lead to renal failure and death without immediate antibiotic therapy. the infected lungs are remarkably red and congested and appear to be distended with fluid. the microscopic picture reveals fibrinopurulent exudates that fill the alveolar space mixed with a necrotic, cellular infiltrate of degenerating neutrophils and monocytes (figure 18.51). hyaline membranes may form in the periphery of the lesions, and pleural effusions consisting of fibrinoserous exudates are common.

pseudomonas aeruginosa

pseudomonas aeruginosa is a gram-negative bacillus that is found throughout the environment and in 50% of the airways of hospitalized patients. it usually enters the body through a disruption of the epithelial surface by cuts, burns, or therapeutic devices such as mechanical ventilators or intravascular catheters. pneumonias caused by this organism are usually found in intensive care units of hospitals and burn units, in patients with underlying chronic lung diseases including cystic fibrosis and emphysema, and in patients with prolonged hospitalization. the pathology is necrosis with a bronchopneumonia pattern that usually consists of an area of congestion and hemorrhage that is surrounded by a halo of tan/white consolidation (figure 18.52). a necrotizing vasculitis with abundant organisms in vessel walls can be seen, and cavitation is common (figure 18.53). in treated lungs, healed cavities or pneumatoceles may appear as smooth-walled fibrous cysts.
other gram-negative bacilli

gram-negative bacilli such as klebsiella pneumoniae, acinetobacter, and various enterobacteriaceae spp. are common nosocomial pathogens. similar to p. aeruginosa, these pathogens colonize the oropharynx and are usually introduced into the lung by inhalation or aspiration of oral contents. the most notable of these is friedlander's pneumonia caused by k. pneumoniae, the most common cause of gram-negative bacterial pneumonia. this typically occurs in men over 40 years of age, usually in the setting of alcoholism, diabetes mellitus, or chronic lung disease. these patients produce large amounts of thick, bloody sputum, a product of the viscous mucopolysaccharide capsule of the organism, and present with severe systemic symptoms of hypotension and generalized weakness. the pathology of these pneumonias is similar to pseudomonas pneumonia with marked cavitation and abundant organisms on microscopic examination.

nocardia spp.

nocardiosis of the lung is caused by nocardia asteroides, a gram-positive rod found in the soil or organic matter. this infection is most common in immunocompromised adult patients and can be seen in the setting of pulmonary alveolar proteinosis, chronic lung diseases, and mycobacterial and other granulomatous diseases that affect the lung. its clinical course is indolent and usually begins 1-2 weeks before the patient presents for medical therapy. cough is common, often with thick, purulent sputum. in the immunocompromised setting, fever, chills, dyspnea, and hemoptysis are common, and weight loss may occur as the disease progresses. the pathology is remarkable for suppurative abscess formation with multiple cavities filled with green, thick pus. the inflammatory infiltrate consists of neutrophils, macrophages, and abundant necrotic debris with epithelioid histiocytes and giant cells within the wall of the cavity (figure 18.54). empyema and pleural involvement occur in the majority of cases.
mycoplasma and rickettsia pneumonias mycoplasma pneumoniae pneumonia is among the most common infections of the lower respiratory tract and usually occurs in small epidemics in closed populations. it often presents with atypical features of a progressive onset, fever without chills, a dry cough, diffuse crackles on physical examination, and chest imaging studies that reveal patchy interstitial infiltrates. the pathologic features are a result of the attachment of the organisms to the bronchiolar epithelium where they cause epithelial injury and ulcerations through secretion of peroxide [237]. in cases of severe infection, diffuse alveolar damage may be present. chlamydial pneumonia chlamydia spp. cause pneumonia in a variety of clinical settings. chlamydia trachomatis infection is found predominantly in the postnatal period; chlamydia psittaci infection is the result of direct transmission from infected birds, including parakeets, parrots, and pigeons. chlamydia pneumoniae is the most common of the three and is a frequent cause of community-acquired pneumonia. it typically causes a very mild or asymptomatic infection with fever, sore throat, and nonproductive cough. the course of this infection may be severe in the elderly. chest imaging studies show alveolar infiltrates, and pleural effusions are present in the majority of cases. the pathology has not been well defined since the infection is usually self-limited. however, in experimental animal models there is a neutrophilic response in the early stages, and an interstitial, peribronchiolar, and perivascular infiltrate of lymphocytes, macrophages, and plasma cells in the later stages of the infection. mycobacteria, a major cause of lung infections, are nonmotile, aerobic, catalase-producing, acid-fast bacilli. clinically significant lung infections can be caused by m. tuberculosis and by a group of nontuberculous mycobacteria (ntm).
the latter group consists of over 100 species, of which three cause the overwhelming majority of pulmonary disease. these are m. avium-intracellulare (m. avium complex), m. kansasii, and m. fortuitum-chelonae. throughout history, tuberculosis (infection with m. tuberculosis) was the major disease caused by these organisms and was responsible for worldwide morbidity and mortality. however, over the past two decades lung diseases caused by ntm have become much more common and now represent the majority of the pulmonary mycobacterial disease. mycobacterium tuberculosis pulmonary tuberculosis is spread by interpersonal contact through aerosolized droplets. once in the alveoli, the bacteria cause a cell-mediated inflammatory response that is capable of inducing granuloma formation and necrosis. as in all infections, the extent of the disease is a function of the host's immune response. the most susceptible patients are those with certain conditions that include immunosuppression, diabetes, malignancy, and renal failure, among others. clinically, an infected patient has a productive cough, fever, and weight loss, and may develop hemoptysis as the cavitation progresses and erodes into the pulmonary vessels. extensive involvement of the lung can produce significant dyspnea and pleuritic chest pain. the pathology of tuberculosis is primarily that of granuloma formation and acute pneumonia. the granulomas are predominantly necrotizing, and the pneumonia usually contains abundant fibrin and neutrophils that fill the alveolar spaces. the gross lesions are referred to as caseous or cheese-like, because of the amount of necrosis present. this caseous material can extend into airways and is commonly coughed up during the active disease. in chronic forms of the disease, the area can undergo fibrosis and involute into a firm, hard scar.
there are three major clinicopathologic variants of the disease: (i) primary tuberculosis, (ii) postprimary or reactivation tuberculosis, and (iii) progressive fibrocavitary disease. primary tuberculosis. in this form of the disease, the initial site of infection can be anywhere in the lung, but is usually in the lower lobe or anterior segment of the upper lobe, the areas that receive the most ventilation. the lesion usually consists of a dense consolidation with acute pneumonia and necrotizing granulomas. cavitation may occur, especially in the setting of immunocompromised hosts. from these foci, the organisms may spread through the lymphatics to elsewhere in the lung, the hilar lymph nodes, and the bloodstream, and lie dormant for long periods of time. the combination of the primary site of infection and the involved hilar lymph nodes is known as a ghon complex [238]. postprimary tuberculosis. this form of tuberculosis represents reactivation of old, scarred primary lesions long after the initial insult. the lesion can occur anywhere in the lung where the bacteria from the primary lesion have spread, but is usually apical. it consists of a focus of organizing pneumonia and fibrosis with central caseation. in an active lesion, the typical parenchymal pattern is an acute pneumonia with cavitation that expands to include the surrounding lung with aggregates of granulomas. the controversy surrounding this lesion arises as some evidence suggests that these lesions represent exogenous reinfection. the pathology of reactivation or reinfection may be indistinguishable, although reactivation tuberculosis may appear to arise out of a fibrotic, calcified chronic lesion [239]. progressive fibrocavitary disease. this form of the disease may arise out of either primary or postprimary tuberculosis. however, the latter is the more common scenario.
the cavities that develop in this form of the disease begin as a slowly progressive, necrotizing pneumonia with abundant granulomas (figure 18.55). the active disease may spread through the airways, causing ulceration, necrosis, and fibrosis of the surrounding bronchi and bronchioles. the extension of the disease in this way depends on the host, and patients with depressed immune systems can have large areas of the lung involved with massive pulmonary necrosis. usually, a fibrous capsule develops in the area of the cavitation, although inspissated necrotic material into the adjacent airways remains a continuous source of inflammation that can lead to reinfection and ongoing scarring [240].
nontuberculous mycobacteria the nontuberculous mycobacteria (ntm) are ubiquitous inhabitants of our environment, isolated from soil, fresh and brackish water, house dust, birds, animals, and food, and are increasingly important in causing pulmonary disease. there are currently more than 100 ntm species known. those organisms thought to be pathogenic to the lung include
the clinical presentation of these lung infections can vary from minimally symptomatic small lesions discovered by routine radiography to sudden hemoptysis from advanced disease with severe cavitation (table 18.5). the two most characteristic lesions are those of diffuse infiltrates in an immunocompromised patient, seen most commonly in the hiv-positive population and an
viruses most pulmonary infections are due to viruses from four major groups: influenza, parainfluenza, respiratory syncytial virus (rsv), and adenovirus (table 18.6) [241]. the clinical presentations of these infections have some common features, including insidious onset, nonproductive cough, fever, and chest pain. chest imaging studies usually reveal bilateral, multifocal infiltrates, most without evidence of cavitation or pleural involvement.
these infections are mild, self-limiting, and require no more than supportive therapy except in immunocompromised hosts, where the clinical course can be much more serious. also, immunocompromised patients are susceptible to other viruses such as herpesvirus and cytomegalovirus pneumonias, which are not common pathogens in normal hosts [242]. since the 1980s, a subset of pulmonary viral infections has emerged with a much more aggressive clinical course, most notably sars coronavirus and hantavirus. these viruses present with systemic symptoms of headache, myalgias, and weakness followed by a deteriorating clinical course with respiratory distress, shock and, in over 50% of the cases, death [243, 244]. therapy for most respiratory viral infections is supportive, although antivirals are available for some viruses, mostly used in the setting of immunocompromised patients. ribavirin, a guanosine analogue, is the main antiviral used for rsv; m2 inhibitors or adamantanes (amantadine and rimantadine) are used against influenza a, and neuraminidase inhibitors (oseltamivir and zanamivir) are used against both influenza a and b [245]. cytomegalovirus is treated with ganciclovir, foscarnet, or cidofovir, while herpesvirus is treated with acyclovir [241]. the pathologic patterns of injury for most viruses are similar, making morphologic distinctions among them difficult. however, some characteristic patterns emerge, most notably in those viruses that cause cytopathic changes. influenza, adenovirus, sars coronavirus, and hantavirus all cause an acute lung injury pattern with diffuse alveolar damage, and in the case of the latter two viruses, evidence of hemorrhage and edema. influenza and adenovirus will also cause a necrotizing bronchiolitis due to their preferential infection of bronchial epithelial cells. finally, some viral infections can be distinguished by their characteristic cytopathic inclusions.
adenovirus can be identified by characteristic smudge cells that present in advanced stages of the disease and represent adenovirus particles in the nucleus of an infected cell (figure 18 .56). cytomegalovirus has both nuclear owl's eye inclusions, as well as cytoplasmic inclusions (figure 18 .57). herpesvirus has glassy intranuclear inclusions and can also have multinucleation (figure 18 .58). fungi are larger and more complex than bacteria, and their patterns of injury in the lung are different and in general more destructive. these pathogens are common in our environment and enter the lungs through inhalation. though many fungi are capable of causing pulmonary disease, most only inhabit the lung as colonizers. those of most concern for causing clinical disease include the endemic fungi of north america-histoplasma capsulatum, blastomyces dermatitidis, and coccidioides immitis-and two fungi that are commonly seen in immunocompromised hosts-aspergillus fumigatus and pneumocystis jiroveci. histoplasma capsulatum histoplasma capsulatum is a dimorphic fungus most prevalent in the middle portion of the united states from the great lakes to tennessee. the fungus is present in soil that has been contaminated with guano and other debris by nesting birds, most commonly blackbirds and chickens, and by bats. the organism lives in the environment as spores or conidia and germinates to form hyphae. these structures divide to create the yeast forms, which, when inhaled, induce granuloma formation in the lung. approximately 75% of people have skin tests that are positive for exposure to h. capsulatum, but most exposures do not cause clinical disease. disease typically occurs in people exposed to large amounts of organisms, such as construction workers who move large volumes of dirt or spelunkers who venture into bat-ridden caves. the acute disease has flu-like symptoms which are self-limiting. 
healed disease may leave behind calcified granulomas in the lung that appear as buckshot on chest imaging studies. the most chronic forms of this disease may slowly progress, giving rise to cavitating and fibrous lesions. in the immunocompromised host, disseminating histoplasmosis can be seen, although reactivation is uncommon [246] . the pathology reveals characteristic necrotizing granulomas distributed around the airways (figure 18 .59), which contain silver-positive yeast forms of 2-4 microns. these granulomas may resolve into scarred nodules, which can calcify and produce the characteristic chest images. cavities may form in the apices with progression of the disease, and the disseminated form of the disease has an abundance of organisms both within macrophages in the lung and throughout many organs in the body. blastomyces dermatitidis blastomyces dermatitidis is also endemic to the middle united states, including the ohio and mississippi river valleys. it is found in wooded terrain, usually during the wet seasons, putting campers and outdoorsmen at risk. the clinical disease takes two forms, cutaneous and systemic, the latter beginning in the lungs through inhalation. the acute pulmonary infection takes a nonspecific form with fever, malaise, and chest pain. imaging studies may show either infiltrates or a mass-like infiltrate. thus, blastomyces infection may mimic other diseases, and the diagnosis may be delayed. some patients go on to chronic disease with cavitation or progressive pulmonary blastomycosis, which manifests as acute respiratory distress syndrome, cavitary lesions, and a poor prognosis [247] . the pathology of blastomyces infection is similar to histoplasmosis with necrotizing granulomas. however, the lesions are larger, showing more neutrophilic necrosis. the organisms are also larger (8-15 microns), with prominent broad-based budding, and are apparent on routine hematoxylin and eosin staining (figure 18 .60). 
coccidioides immitis coccidioides immitis is found in the semi-arid desert climate of the southwestern united states. the organisms are inhaled as spores, causing an acute disease characterized by fever, chills, chest pain, dyspnea, and hemoptysis. chest imaging studies typically show consolidation and cavitation, and hilar lymphadenopathy is common. reactivation and dissemination are possible in patients with previous infection, whether or not they are immunocompromised patients [248] . the pathology of pulmonary coccidioidomycosis is neutrophilic, suppurative, and granulomatous. the organisms appear as large spherules containing endospores, visible on silver stains. the spherules are 30-100 microns in diameter and the endospores that are released into the surrounding tissue proceed to mature into new spherules (figure 18.61) . as in histoplasmosis, cavitating lesions may have hyphal forms that begin to germinate. aspergillus fumigatus aspergilli are asexual mycelial fungi that are ubiquitous in the environment as airborne aspergillus spores. they are weak pathogens that produce invasive infections predominantly in immunocompromised hosts or in those with significant chronic lung diseases. in tissue, aspergilli form septate hyphae, 3-6 microns in diameter, with characteristic acute-angle, dichotomous branching (figure 18.62) . these organisms affect the lung in three major ways: (i) saprophytic growth in bronchi or pre-existent cavities; (ii) as an allergic or hypersensitivity reaction, predominantly in asthmatics; and (iii) invasive aspergillosis in immunocompromised hosts [249, 250] . as a saprophyte, aspergillus produces surface growths or minute masses of hyphae, usually in bronchiectatic cavities, emphysematous bullae, or scars from previous lung diseases such as tuberculosis or sarcoidosis. the pathology is usually that of a fibrous-walled cavity containing degenerating hyphae (figure 18 .63). 
in this setting, hyphae do not invade into the lung tissue, but surface erosion of a vascularized cavity may cause hemoptysis. aspergillus causes an immunologic response resulting in mucoid impaction or eosinophilic pneumonia in asthmatics, an entity known as allergic bronchopulmonary aspergillosis (abpa). pathologically, one sees mucoid plugs and superficial erosions of the airways with histiocytic inflammation, with only rare hyphal fragments present. the final form of the disease, invasive pulmonary aspergillosis, is found in severely immunocompromised, neutropenic patients. the hyphae, which disseminate through the blood, invade the blood vessels causing thrombosis, hemorrhage, and infarction to form typical targetoid lesions. this form of the disease has a poor prognosis despite aggressive antifungal therapy. pneumocystis jiroveci the taxonomy of pneumocystis jiroveci (formerly pneumocystis carinii) has changed over the past decade. previously thought to be a protozoan based on the histological characteristics of its trophozoite and cyst life forms, it has recently been placed in the fungal kingdom after ribosomal rna was found to have sequences compatible with the ascomycetous fungi [251]. the inability to culture pneumocystis jiroveci has slowed the understanding of this organism. animal models have helped in defining the antigenic and genotypic differences among the various pneumocystis organisms, which has led to the proposal for species-specific strains, with p. jiroveci found in human infections [252]. the molecular methods used for typing these species examine a number of gene loci. most importantly, sequence analysis of the thymidylate synthase (ts) and superoxide dismutase (soda) gene loci, the epsp synthase domain of the multifunctional arom gene, and the mitochondrial small subunit ribosomal rna (mtssu rrna) locus have been used to distinguish the various pneumocystis species that infect different mammalian hosts [253]. clinically, p.
jiroveci causes disease predominantly in the immunocompromised setting. pneumocystis pneumonia (pcp) has been found during recent times most commonly in the aids population, but prior to this epidemic, it was found in malnourished infants and other severely immunocompromised hosts. because this organism has not been cultured, the diagnosis of pcp continues to be challenging. the clinical characteristics are nonspecific and vary with the patient's immune status. in the hiv population, patients typically develop a subacute onset of progressive dyspnea, a nonproductive cough, malaise, and a low-grade fever. in the non-hiv population, the presentation is more acute, with fulminant respiratory failure associated with cough and fever, and usually requiring mechanical ventilation [254]. chest imaging studies typically show bilateral, symmetric, fine reticular interstitial infiltrates involving the perihilar area, which spread to involve the entire lung.
figure 18.62 aspergillosis. aspergillus fumigatus grows within necrotizing cavities of the lung as branching septated fungal hyphae, as seen on this grocott methenamine silver stain.
figure 18.63 aspergilloma. fungal hyphae from aspergillus fumigatus can colonize chronically inflamed lungs with cavities and may grow to form fungal balls with a dark, green color that are treated by surgical resection, as seen in this case of a lobectomy specimen.
treatment is usually with trimethoprim/sulfamethoxazole and intravenous pentamidine. survival is 50%-95% even in severely immunocompromised patients. the life cycle of p. jiroveci consists of three stages: trophozoite, cyst, and sporozoite. the trophozoite form, which adheres to the type 1 epithelium, replicates and enlarges through three precyst stages before maturing into a cyst form that is found in the alveolar space. sporozoites develop within immature cysts through meiosis and mitosis. the mature cyst contains eight haploid sporozoites.
the rupture of the cyst wall releases sporozoites into the surrounding environment where they mature into trophozoites. the pathology of the infection is predominantly due to the interaction of the organism with the epithelium. the attachment of the organism to the lung epithelium is via glycoprotein a present on the surface of the organism. the binding of the organism to the type 1 cell occurs via surface receptors on the type 1 cell that include macrophage mannose receptors. these interact with glycoprotein a and activate pathways in the organism that induce genes encoding for pathways that induce mating and proliferation responses, and for the formation of pheromone receptors, transcription factors, and heterotrimeric g-protein subunits [263]. in addition to these genetic effects, the cyst wall contains chitins, polymers, and other substances, in particular, β-1,3-glucan, that maintain its integrity and induce the inflammatory response of the host. the β-1,3-glucan in the wall of the organism stimulates the release by the macrophages of reactive oxidant species and the generation of potent proinflammatory cytokines, such as tnf-α, which bind to the organism and exert a toxic effect. once inside the macrophage, the organism is incorporated into the phagolysosome and degraded. tnf-α also directly recruits other inflammatory cells including neutrophils, lymphocytes, and circulating monocytes, and induces the release of il-8 and ifn-γ that recruit and activate inflammatory cells [255]. in aggregate, the recruitment of these inflammatory cells and the mediators they release is responsible for the damage to the lung epithelium and endothelium that is seen in this disease [255]. the pathology of pcp has typical and atypical variants. typically, the lung contains a dense interstitial plasma cell pneumonia that expands alveolar walls.
the epithelium consists predominantly of type 2 pneumocytes, and the alveolar spaces contain an eosinophilic, frothy exudate, which contains fine, hematoxylin-stained dots that represent a thickening in the cyst wall (figure 18.64). in this form of the disease, the organisms are abundant and the diagnosis can usually be made by bronchoalveolar lavage. atypical pathologic variants include a necrotizing variant that has a pattern similar to the typical form with exudative alveolar infiltrates, but which undergoes necrosis and cavity formation. these cavities heal into fibrous-walled cysts, similar in gross appearance to those found in pseudomonas pneumonia. a third variant has well-formed granulomas involving the airways, a pattern common to histoplasmosis and tuberculosis. in this form, the organisms are rare and very difficult to find, even with tissue organismal stains. in general, the pathologic pattern of injury depends on the host's immune status, with the typical pathology found in severely immunocompromised hosts such as the aids population and the atypical forms found in hosts with immune systems that are less compromised. pulmonary langerhans cell histiocytosis (plch) and erdheim-chester disease are histiocytic diseases that primarily affect the lung. other histiocytic diseases may affect the lung, such as niemann-pick disease, gaucher disease, hermansky-pudlak syndrome, and rosai-dorfman disease, but these are not considered primarily lung histiocytic diseases. pulmonary langerhans' cell histiocytosis (plch) is a disease of the dendritic histiocytes of the lung referred to as langerhans' cells (lcs). this disease is part of a group of diseases that are characterized by a proliferation of langerhans cells in organs throughout the body that range from a malignant systemic disease as is seen in children [256] to the pulmonary variant that is seen in adolescents and adults.
plch is usually the result of inflammatory or neoplastic stimuli in lungs of smokers or in lungs involved by certain neoplasms [257]. chest radiographs from patients with plch usually reveal bilateral nodules, predominantly in the upper lobes, which are worrisome for metastatic disease. treatment involves smoking cessation and steroid therapy. typically, the disease undergoes spontaneous regression. approximately 15%-20% of patients will progress to irreversible end-stage fibrosis [258]. the pathology of plch consists of airway-based lesions with a proliferation of lcs. the early cellular lesions contain a mixture of cells including langerhans' cells, lymphocytes, plasma cells, and eosinophils (figure 18.65). though it was previously referred to as eosinophilic granuloma, eosinophils are not the major cell type present, and the lesion is, at best, a loosely formed granuloma. immunohistochemistry reveals the lcs to be diffusely, strongly immunoreactive to s-100 protein and cd1a. ultrastructural analysis reveals intracytoplasmic organelles called birbeck granules, a normal constituent of langerhans' cells, in greater numbers in plch [259]. the pathogenetic mechanisms of plch focus on defects in the homeostasis of dendritic cells (dcs) in the lungs of smokers and the role tobacco smoke may play in stimulating the proliferation of these cells [260]. some studies suggest that stimulation of alveolar macrophages by chemicals in smoke results in secretion of such cytokines as gm-csf, tgf-β, and tnf-α [261]. in transgenic mice, accumulation of dcs around airways may be a result of excess gm-csf [262]. other theories suggest that cigarette smoke stimulates the secretion of bombesin-like peptide by the neuroendocrine cells in the bronchiolar epithelium and leads to a similar stimulation of alveolar macrophages and a cytokine milieu that promotes the proinflammatory proliferative changes [262].
not all smokers get plch, leading to the suggestion that only smokers with an underlying genetic susceptibility will develop the disease. studies have established that in some cases the lcs in plch are clonal, suggesting that cellular abnormalities must play some part in the pathogenesis of the disease [263]. to support this, studies have shown genetic mutations and allelic loss of tumor suppressor genes in smokers with plch [264]. the mechanisms by which this proliferation of lcs leads to the destruction of the bronchiolar epithelium and the other observed pathology are unclear. lcs in normal lungs have little ability to interact with t-cells or act as effective antigen-presenting cells, but the lcs of plch have a mature immunophenotype, expressing b7-1 and b7-2, the co-stimulatory molecules needed for lymphostimulatory activity [265]. whether this more mature immune phenotype leads to an unregulated immune response and destruction of the bronchial epithelial cells is not known. however, some studies have shown that bronchiolar epithelial cells may induce the expression of this mature phenotype by secreting cytokines in response to environmental stimulants such as cigarette smoke or viral infections, or by the development of hyperplastic or dysplastic lesions that express new foreign antigens [265]. erdheim-chester disease (ecd) is a systemic non-langerhans' cell histiocytosis of adults that most commonly involves the long bones. involvement of other organs, including the lung, has been reported. lung involvement occurs in approximately 20%-35% of the cases, and the patients usually present with cough, dyspnea, rhonchi, and pleuritic pain. radiographically, the lungs reveal infiltrates in a lymphatic distribution, predominantly upper lobe, with prominent interstitial septal markings that can mimic sarcoidosis [266-272].
pulmonary involvement by ecd may have an unfavorable prognosis, and the fibrosis that ensues is one of the most frequently reported causes of death [266, 273]. the treatment of ecd is variable, with corticosteroids, chemotherapy, surgical resection, and radiation therapy all reported [273]. non-langerhans' cell histiocytes of dendritic cell phenotype are the main cells present in this disease. this infiltrate contains foamy histiocytes with scattered giant cells, a scant number of lymphocytes or plasma cells, and some fibroblasts. the histiocytes express cd68 (macrophage antigen) and factor xiiia (dendritic cell antigen), but express s-100 protein weakly or not at all, and do not express cd1a. ultrastructural analysis reveals phagolysosomes, but no birbeck granules are present [273]. this infiltrate that involves the lung is usually present in the pleura and subpleura, within the interlobular septa and around the bronchovascular structures. the remainder of the lung parenchyma is unremarkable, though fibrosis and paracicatricial emphysema can appear in the late stages of the disease [266]. the etiology of ecd is not known, but this rare disease has been established as primarily a macrophage disorder [274]. these histiocytes have abundant phagolysosomes and express the antigen cd68 and are consistent with a phagocytic cell, most likely closely related to alveolar macrophages. the peripheral monocytosis and the proinflammatory cytokine profile that is found in these patients might suggest that the histiocytic infiltrate is a result of systemic monocytic activation and invasion of circulating monocytes into the tissues throughout the body [275]. recently, an ecd patient was successfully treated by an agent toxic to monocytes, supporting the theory that these cells play a part in the disease [275].
alternatively, end organ cytokine production by local inflammatory cells resulting in proliferation and differentiation of resident immature histiocyte populations may produce a similar picture. another interesting observation is that erdheim-chester disease has been reported to occur in patients with langerhans' cell histiocytosis [276], which may suggest that this is a disease where macrophages transition between two different phenotypes along the differentiation spectrum of tissue dendritic cells [276]. whether this is a benign or malignant proliferation has not been established. of 5 patients studied, clonality has been demonstrated in 3 by polymerase chain reaction [277]. environmental exposures are a major cause of lung disease and can cause a wide spectrum of both acute and chronic pathology. many organic and inorganic materials can cause lung damage, and because of their similar patterns of injury and long latent periods, it can be difficult to isolate the exact offending agent without a thorough clinical history. the two occupational lung diseases presented here, asbestosis and silicosis, are pneumoconioses, which are defined as diseases that result in diffuse parenchymal lung injury due to inhaled inorganic material. both have many pathologic patterns of injury that depend on the amount and length of time of exposure, and both can also cause neoplastic diseases of the lung. asbestos fibers are naturally occurring silicates that are commonly used in construction materials such as cement and insulation and in many textiles. they can be separated into two groups based on their mineralogic characteristics. serpentine fibers, named as such because they are long and curly, include chrysotile asbestos. amphibole fibers, more straight and rodlike, include predominantly amosite and crocidolite asbestos. in the united states most of the asbestos is chrysotile.
the amphiboles are more pathogenic and are responsible for most of the neoplastic and non-neoplastic pulmonary diseases associated with asbestos exposure. by definition, asbestosis is bilateral diffuse interstitial fibrosis of the lungs that can be attributed to asbestos exposure. the disease, which mostly affects textile and construction workers, is usually the result of direct exposure over 15-20 years. the latency to clinical disease is inversely proportional to the level of exposure. the symptoms are a gradual onset of shortness of breath, a cough with dry rales at the bases on inspiration, and digital clubbing. in the early disease, the chest x-ray shows basilar disease that begins predominantly as thickening of the subpleura, but progresses as infiltrates and fibrosis that involve the middle zone, eventually leading to thickening of the airways and traction bronchiectasis. the apex of the lung is usually spared. the clinical findings are nonspecific and have considerable overlap with uip, so the diagnosis is usually made only when a history of significant exposure is discovered. the gross picture includes a bilateral lower lobe gray/tan fibrosis with honeycomb changes in late disease. microscopically, asbestosis can cause many patterns of injury in the lung, but the most common is collagenous deposition in the areas of the lymphatics where the fibers are in the highest concentration. these areas include the subpleura, the interlobular septa, and the bronchovascular areas that contain a bronchiole and a branch of the pulmonary artery. hyalinized pleural plaques are a common manifestation of asbestos exposure but are not specific for asbestos and can be found in the absence of pulmonary parenchymal disease. eventually, the fibrosis involves the alveoli beyond the bronchioles and causes distortion of the lung architecture to form remodeled, dilated airspaces similar to those seen in uip.
distinguishing this fibrosis from other forms of fibrosing lung disease can be difficult, but the presence of ferruginous bodies, which are asbestos fibers coated by iron, proteins, and mucopolysaccharide, indicates significant asbestos exposure and supports this diagnosis (figure 18.66) [278].

figure 18.66 asbestosis. this cytopathologic preparation from a bronchoalveolar lavage specimen illustrates an asbestos fiber coated by an iron-protein-mucopolysaccharide substance, which appears as a golden brown, beaded structure known as a ferruginous body.

silicosis results from chronic, high-dose exposure to crystalline silica, which consists of silicon and oxygen with trace amounts of other elements, usually iron. the most common silica is quartz, which is present in large amounts in such rocks as granite, shale, and sandstone and is among the more fibrogenic of all silica types. thus, occupations most at risk for silicosis include sandblasting, quarrying, stone dressing, and foundry work where exposure to quartz is high. the disease takes three major clinical and pathologic forms that have different clinical characteristics. simple or nodular silicosis is marked by the presence of fine nodules, <1 cm, on chest imaging studies, usually in the upper lobes. patients with this condition are typically asymptomatic, with normal respiratory physiology. the pathology in these lungs reveals discrete, hard nodules that have a green/gray color, centered either on the small airways or in the subpleura. microscopically, these nodules have an early stellate shape that eventually transforms to a more whorled appearance with dust-laden macrophages scattered throughout. polarized light examination reveals weakly birefringent material. complicated pneumoconiosis shows similar pathologic findings, only with larger and more circumscribed nodules, which coalesce into a large upper lobe mass, a condition known as progressive massive fibrosis (figure 18.67).
these patients are symptomatic with a productive cough and mixed pulmonary function tests with a reduced diffusing capacity as the fibrosis increases. diffuse interstitial fibrosis may occur; however, unlike asbestosis, this pattern is found in pneumoconiosis. when complicated pneumoconiosis is found with rheumatoid nodules in the setting of a patient with rheumatoid arthritis, this is known as caplan's syndrome.

the pathogenesis of both asbestosis and silicosis depends upon inflammation and fibrosis caused by the inhaled fibers. in humans, the amount of fiber needed to cause fibrosis varies from person to person. this may be related to a difference in fiber deposition based on the size of the lungs or to the efficacy with which the lung clears these fibers [256]. some studies have also suggested that fiber length determines the amount of pathology. however, this association has not been confirmed in humans for either asbestosis or silicosis. in both diseases, it is known that other factors increase the risk of developing disease. for example, smokers exhibit worse disease than nonsmokers with similar exposures to asbestos. the mechanism for this effect is unclear, although speculation centers on the inhibition of fiber clearance in smokers. also, it is known that smoking enhances the uptake of fibers by pulmonary epithelial cells and in this way may increase the fibrogenic and inflammatory cytokine production by these cells. the cellular mechanisms by which both asbestos and silica fibers induce inflammation and fibrosis are mediated predominantly through alveolar macrophages. in the case of silica, it is known that the uptake of these fibers into the alveolar macrophages is by way of a scavenger receptor expressed on the surface of the cell known as marco (macrophage receptor with a collagenous structure). once inside the cells, the fibers activate the release of ros that can lead to cellular and molecular damage through a number of pathways.
first, ros can directly cause lipid peroxidation, membrane damage, and dna damage. second, silica-induced free radicals can trigger phosphorylation of cellular proliferation pathways through mitogen-activated protein kinases (mapks), including extracellular signal-regulated kinases (erks) and p38. these pathways are also involved in the proliferation of fibroblasts in asbestosis and of mesothelial and epithelial cells in the neoplastic diseases associated with the inhalation of these fibers [279]. in addition, these fibers can activate proinflammatory pathways controlled by such transcription factors as nuclear factor kappa b (nf-κb) and activator protein 1 (ap-1). these pathways result in the activation of the early response genes c-fos and c-jun and the release of proinflammatory cytokines such as il-1 as well as fibrogenic factors such as tnf-α [280]. tnf-α plays a prominent role in both diseases, and its regulation has been studied in animal models exposed to silica. it is now known that a transcription factor, nuclear factor of activated t-cells (nfat), plays a key role in the regulation of tnf-α. binding sites for nfat have been found in the promoter region of the tnf-α gene. the mediation of silica-induced tnf-α transcription is probably via superoxide (o2-) but not hydrogen peroxide (h2o2) [280, 281].

figure 18.67 complicated pneumoconiosis/progressive massive fibrosis. this sagittal cut section of lung reveals a large gray/black mass that extends from the apex to include the majority of the lung. the patient had a long history as a coal mine worker, and the microscopic sections revealed abundant anthracotic pigment and scarring in this area.

atresia of the lung represents a premature closure of the airway at any level of the bronchial tree including the lobar, segmental, or subsegmental airways. clinically, these children usually present between 10 and 20 years of age with symptoms of dyspnea, wheezing, or recurrent pneumonias, or with incidental findings on a chest imaging study.
these lesions are more common in the proximal segmental bronchi, right more often than left. when atresia is associated with anomalies of the vascular supply to the affected airway, the lesion represents a separate, aberrant segment of lung known as a sequestration, either intralobar or extralobar type. the pathology of bronchial atresias and sequestrations represents sequelae of chronic inflammation due to the accumulation of secretions in these blind-end airways. these features consist of cystically dilated airways with mucus and parenchymal fibrosis with honeycomb changes. in intralobar sequestrations (ils), the anomalous vessel is a muscular artery that enters through the pleura from an aortic source, usually from the thoracic area. ils are separate, isolated areas of lung invested with the normal visceral pleura without bronchial or arterial connections (figure 18.68). extralobar sequestrations (els) are pyramid-shaped accessory pieces of lung that have their own pleura with an artery from the lung but without airway connections.

the category of congenital pulmonary cystic diseases represents the majority of congenital pulmonary disease and includes foregut cysts and cystic adenomatoid malformations. foregut cysts include bronchogenic, esophageal, and thymic cysts that form from defects in the foregut branching. clinically, these cysts are usually incidental findings on chest imaging studies, but they can present with complications due to infection or hemorrhage. pathologic features of these cysts include subtle differences that are usually only apparent after microscopic examination. grossly, these cysts usually arise proximally either within the mediastinum (over 50%) or in the proximal regions of the lungs, right more commonly than left, along the esophagus, and rarely within the lung parenchyma or below the diaphragm [282].
microscopically, each cyst contains a simple cuboidal or columnar epithelium, ciliated or nonciliated, that may undergo squamous metaplasia. distinguishing among the three types of cysts requires the presence of other elements. bronchogenic cysts have submucosal glands and/or hyaline cartilage within their walls, and thymic cysts may contain residual thymus. congenital cystic adenomatoid malformations (ccam), now more commonly referred to as congenital pulmonary airway malformations (cpam), are segments of lung with immature airways and alveolar parenchyma. these are usually classified by their predominant cyst size into types 0-4. type 1 cysts, which contain a main large cyst of up to 10 cm, are the most common. these cysts are distinguished from foregut cysts upon the recognition in the cpam of immature alveolar duct-like structures connecting to the surrounding lung parenchyma. this type of cpam is also notable, as it is known to undergo malignant transformation, usually to mucinous bronchioloalveolar cell adenocarcinomas.

these anomalies arise due to defects during the various stages of development and are best considered within these developmental stages. the embryonic stage occurs within the first 3-7 weeks of life when the ventral wall of the foregut separates into the trachea and esophagus and branches to form the left and right lungs. the splanchnic mesenchyme that surrounds this foregut forms the vascular and connective tissues of the lungs. defects in this phase result in complete lack of lung development known as pulmonary agenesis and incomplete separation of the trachea and esophagus, causing tracheal-esophageal atresias and fistulas. the pseudoglandular stage, between weeks 7-17 of development, is a time of rapid development of the conducting airways including the bronchi and bronchioles and the expansion of the peripheral lung into the acinar buds. the mesenchymal tissue that surrounds these buds begins to thin, becomes vascularized, and forms the cartilage that surrounds the more proximal branching airways. during the canalicular (weeks 17-24), saccular (weeks 24-38), and alveolar (weeks 36 to maturity) stages of development, the acinar buds continue to expand, and the mesenchyme surrounding them continues to thin. during the canalicular stage, the pulmonary vascular bed begins to organize, the distance between the blood in the vascular spaces and the air in the alveoli narrows, and the respiratory epithelium begins to form. the gas exchange unit of the alveolus becomes functional during the saccular stage with further differentiation of the respiratory epithelium to include clara cells, ciliated and nonciliated cells, and type 2 cells with the first production of surfactant occurring during this period. this gas exchange unit continues to mature during the alveolar stage with the growth and septation of the alveoli. this process continues postnatally through 6-8 years of age.

figure 18.68 intralobar sequestration. the tan and white mass involving this left lower lobectomy specimen represents chronic pneumonia and fibrosis in the sequestered area of the lung. the dilated airways are features of an end-stage fibrosis that is commonly found in this entity.

the different types of cpams arise at different stages of development. cpams 0, 1, and 2 are a result of defects during the early embryonic and pseudoglandular stages of development, producing pathology with features of primitive alveolar buds and immature and abnormal airway cartilage structures. cpams 3 and 4 result from abnormal formation of the more distal airways and pulmonary parenchyma during the canalicular, saccular, and alveolar phases, causing pathology with immature alveoli or alveolar simplification with enlarged alveoli [283].
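the staging scheme above can be summarized as a small lookup, sketched here in python purely as a reading aid. the week ranges and the cpam-to-stage assignments are taken from the text; the names `STAGES`, `CPAM_STAGES`, `stages_at_week`, and `cpam_types_for_stage` are hypothetical, the ranges are treated as inclusive (so adjacent stages share a boundary week), and "maturity" is modeled as an open-ended upper bound. this is an illustration under those assumptions, not a clinical tool.

```python
from math import inf

# Week ranges as given in the text, treated as inclusive (an assumption).
# "weeks 36 to maturity" is modeled as an open-ended range.
STAGES = [
    ("embryonic", 3, 7),
    ("pseudoglandular", 7, 17),
    ("canalicular", 17, 24),
    ("saccular", 24, 38),
    ("alveolar", 36, inf),
]

# CPAM types mapped to the developmental stages in which, according to
# the text, the underlying defect arises.
CPAM_STAGES = {
    0: ("embryonic", "pseudoglandular"),
    1: ("embryonic", "pseudoglandular"),
    2: ("embryonic", "pseudoglandular"),
    3: ("canalicular", "saccular", "alveolar"),
    4: ("canalicular", "saccular", "alveolar"),
}


def stages_at_week(week):
    """Return every stage whose inclusive week range contains `week`."""
    return [name for name, start, end in STAGES if start <= week <= end]


def cpam_types_for_stage(stage):
    """Return the CPAM types attributed to defects arising in `stage`."""
    return sorted(t for t, stages in CPAM_STAGES.items() if stage in stages)


if __name__ == "__main__":
    print(stages_at_week(10))
    print(stages_at_week(37))
    print(cpam_types_for_stage("saccular"))
```

because the saccular and alveolar ranges overlap in the text (weeks 36-38), the lookup returns a list rather than a single stage.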
various genetic defects in the pathways that control lung morphogenesis have been associated with these congenital lung diseases. two major transcription factors are responsible for the normal branching morphogenesis. the first, thyroid transcription factor-1 (ttf-1), is a member of the nkx2.1 family of homeodomain-containing transcription factors. this factor plays a role in the lung epithelial-specific gene expression and proper lung bud development in the embryonic stage, as well as in the maturation of the respiratory epithelium. the second major factor is sonic hedgehog (shh)/gli, expressed by endodermally derived cells and required for branching morphogenesis. the development of the lung bud from the foregut endoderm depends on the appropriate expression of these lung-specific genes at the correct time in development. in the presence of genetic defects, aberrant lung development may occur. for example, mutations of various types in the shh/gli gene have been found to cause tracheoesophageal fistulas, anomalous pulmonary vasculature, and aberrant airway branching. also, deletions in the ttf-1 gene are associated with tracheoesophageal fistulas and a variety of forms of lung dysgenesis [284]. finally, factors present in the surrounding mesenchyme play a role in inducing the proper development of the pulmonary endoderm. a prominent mesenchymal factor in this process is fibroblast growth factor (fgf), which modulates both the proximal and distal lung branching morphogenesis. deletions in this gene may cause lung agenesis and tracheal malformations [284].

surfactant dysfunction disorders represent a heterogeneous group of inherited disorders of surfactant metabolism, found predominantly in infants and children. pulmonary surfactant includes both phospholipids and surfactant proteins, designated surfactant proteins a, b, c, and d (sp-a, sp-b, sp-c, sp-d), synthesized and secreted by type 2 cells beginning in the canalicular stage of lung development.
damage to type 2 cells during this time period can lead to acquired surfactant deficiencies. however, more commonly these deficiencies are the result of genetic defects of the surfactant proteins themselves. the major diseases are caused by genetic defects in surfactant protein b (sftpb, chromosome 2p12-p11.2); surfactant protein c (sftpc, chromosome 8p21); and adenosine triphosphate (atp)-binding cassette transporter subfamily a member 3 (abca3, chromosome 16p13.3). defects in sftpb and abca3 have an autosomal recessive inheritance pattern, and defects in sftpc have an autosomal dominant pattern. sp-b deficiency is the most common. it presents at birth with a rapidly progressive respiratory failure and chest imaging studies showing diffuse ground glass infiltrates. the gross pathology in these lungs consists of heavy, red, and congested parenchyma with microscopic features that range from a pap-like pattern to a chronic pneumonitis of infancy (cpi) pattern. in sp-b deficiency, the pap pattern predominates with a histologic picture of cuboidal alveolar epithelium and eosinophilic pas-positive material within the alveolar spaces that appears with disease progression. in the late stages of the disease, the alveolar wall thickens with a chronic inflammatory infiltrate and fibroblasts. this alveolar proteinosis-type pattern of injury can be confirmed with immunohistochemical studies that establish the absence of sp-b within this surfactant-like material. diseases due to abca3 or sftpc deficiency may present within a week of birth or years later; the former has a poor prognosis, but the latter has a more variable prognosis with some patients surviving into adulthood. indeed, sp-c mutations have also been recognized in some families as a cause of interstitial pneumonia and pulmonary fibrosis in adults [285]. the pathology of sp-c deficiency has more cpi features and less proteinosis.
in contrast, abca3 deficiency can have either pap or cpi features, with the former present early in the disease and the latter present in more chronically affected lungs [286].

the sp-b gene (sftpb) is approximately 10 kb in length and is located on chromosome 2. there are over 30 recessive loss-of-function mutations associated with the sftpb gene. however, the most common mutation is a gaa substitution for c in codon 121, found in about 70% of the cases. the lack of sp-b leads to an abnormal proportion of phosphatidylglycerol and an accumulation of a pro-sp-c peptide, leading to the alveolar proteinosis-like pathology. sp-c protein deficiency is due to a defect in the sftpc gene localized to human chromosome 8. there are approximately 35 dominantly expressed mutations in sftpc that result in acute and chronic lung disease. approximately 55% of them arise spontaneously, and the remainder are inherited. the most common mutation is a threonine substitution for isoleucine in codon 73 (i73t), found in 25% of the cases, including both sporadic and inherited disease [287]. this mutation leads to a misfolding of the sp-c protein, which inhibits its progression through the intracellular secretory pathway, usually within the golgi apparatus or the endoplasmic reticulum [288]. the absence of sp-c within the alveolar space causes severe lung disease in mouse models. infants with documented mutated prosp-c protein, the larger primary translation product from which sp-c is proteolytically cleaved, can have respiratory distress syndrome (rds) or cpi. in older individuals, pathologic patterns observed in the lungs with these mutations include nonspecific interstitial pneumonitis (nsip) and uip. in this affected adult population, the pathology and age of disease presentation vary even within familial cohorts, suggesting the involvement of a second hit, perhaps an environmental factor [289].
the abca3 protein is a member of the family of atp-dependent transporters, which includes the cftr, and is expressed in epithelial cells. mutation in this gene results in severe respiratory failure that is refractory to surfactant replacement. the cellular basis for the lack of surfactant in patients with this genetic mutation is not known. the presence of abnormal lamellar bodies within the type 2 cells by ultrastructural analysis suggests a disruption in the normal surfactant synthesis and packaging in this disease. there is some evidence that this gene contains promoters that share elements consistent with their activation by the transcription factors ttf-1 and foxa7, and deletions in either or both of these genes may play a role in this disease [289].

annual report to the nation on the status of cancer pathology and genetics of tumours of the lung, pleura ajcc cancer staging manual varying), national cancer institute, dccps, surveillance research program, cancer statistics branch, released diagnosis and management of lung cancer executive summary: accp evidence-based clinical practice guidelines genetics of preneoplasia: lessons from lung cancer allelotyping demonstrates common and distinct patterns of chromosomal loss in human lung cancer types genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering high resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array cgh lung cancer preneoplasia characterizing the cancer genome in lung adenocarcinoma focus on lung cancer new molecularly targeted therapies for lung cancer high resolution chromosome 3p allelotyping of human lung cancer and preneoplastic/preinvasive bronchial epithelium reveals multiple, discontinuous sites of 3p allele loss and three regions of frequent breakpoints genetic and molecular alterations allelic losses at chromosome 8p21-23 are early and
frequent events in the pathogenesis of lung cancer sequential molecular abnormalities are involved in the multistage development of squamous cell lung carcinoma hereditary cancers disclose a class of cancer genes tobacco smoke carcinogens, dna damage and p53 mutations in smoking-associated cancers molecular oncogenesis of lung cancer molecular genetics of lung cancer the p16/cyclin d1/rb pathway in neuroendocrine tumors of the lung smoke exposure, histologic type and geography-related differences in the methylation profiles of non-small cell lung cancer targeting mtor signaling in lung cancer the potential role of mtor inhibitors in non-small cell lung cancer her-2/neu expression in archival non-small cell lung carcinomas using fdaapproved hercep test epidermal growth factor family of receptors in preneoplasia and lung cancer: perspectives for targeted therapies lack of trastuzumab activity in nonsmall cell lung carcinoma with overexpression of erb-b2: 39810: a phase ii trial of cancer and farnesyl transferase inhibitors for patients with lung cancer angiogenesis inhibitors in the treatment of lung cancer antiangiogenic therapy in nonsmall cell lung cancer micrornas as oncogenes and tumor suppressors pathology and genetics of tumours of the lung, pleura, thymus and heart lung cancer associated with several connective tissue diseases: with a review of literature atypical adenomatous hyperplasia atypical adenomatous hyperplasia of the lung in autopsy cases identification of bronchioalveolar stem cells in normal lung and lung cancer genomic profiling identifies titf1 as a lineage-specific oncogene amplified in lung cancer clinical and biological features associated with epidermal growth factor receptor gene mutations in lung cancers exclusive mutation in epidermal growth factor receptor gene, her-2, and kras, and synchronous methylation of nonsmall cell lung cancer somatic mutations of epidermal growth factor receptor signaling pathway in lung cancers distinct 
epidermal growth factor receptor and kras mutation patterns in non-small cell lung cancer patients with different tobacco exposure and clinicopathologic features combination of egfr gene copy number and protein expression predicts outcome for advanced non-small-cell lung cancer patients treated with gefitinib somatic mutations of the her2 kinase domain in lung adenocarcinomas the role of dna methylation in the development and progression of lung adenocarcinoma mutational and epigenetic evidence for independent pathways for lung adenocarcinomas arising in smokers and never smokers squamous cell carcinoma squamous dysplasia and carcinoma in situ lung cancer and lung stem cells: strange bedfellows? follow-up of bronchial precancerous lesions and carcinoma in situ using fluorescence endoscopy the natural course of preneoplastic lesions in bronchial epithelium surveillance for the detection of early lung cancer in patients with bronchial dysplasia large cell carcinoma molecular pathology of large cell carcinoma and its precursors karyotypic characterization of bronchial large cell carcinomas abnormalities of epidermal growth factor receptor in lung squamous-cell carcinomas, adenosquamous carcinomas, and large-cell carcinomas: tyrosine kinase domain mutations are not rare in tumors with an adenocarcinoma component egfr mutations in non-small-cell lung cancer: analysis of a large series of cases and development of a rapid and sensitive method for diagnostic screening with potential implications on pharmacologic treatment survival analysis of 200 pulmonary neuroendocrine tumors with clarification of criteria for atypical carcinoid and its separation from typical carcinoid the concept of pulmonary neuroendocrine tumours small cell carcinoma pulmonary carcinoid: presentation, diagnosis, and outcome in 142 cases in israel and review of 640 cases from the literature pulmonary neuroendocrine tumors: incidence and prognosis of histological subtypes. 
a population-based study in denmark diffuse idiopathic pulmonary neuroendocrine cell hyperplasia the insulinoma-associated 1: a novel promoter for targeted cancer gene therapy for small-cell lung cancer hash1 expression is closely correlated with endocrine phenotype and differentiation extent in pulmonary neuroendocrine tumors ttf-1 expression is specific for lung primary in typical and atypical carcinoids: ttf-1-positive carcinoids are predominantly in peripheral location usefulness of cdx2 and ttf-1 in differentiating gastrointestinal from pulmonary carcinoids expression of thyroid transcription factor-1 in the spectrum of neuroendocrine cell lung proliferations with special interest in carcinoids cytokeratin 7 and 20 and thyroid transcription factor 1 can help distinguish pulmonary from gastrointestinal carcinoid and pancreatic endocrine tumors small cell carcinoma chromosomal imbalances in human lung cancer small-cell lung cancer is characterized by a high incidence of deletions on chromosomes 3p, 4q, 5q, 10q, 13q and 17p neuro-endocrine tumours of the lung. a review of relevant pathological and molecular data tumor suppressor genes on chromosome 3p involved in the pathogenesis of lung and other cancers combined microarray analysis of small cell lung cancer reveals altered apoptotic balance and distinct expression signatures of myc family gene amplification genetic changes in the spectrum of neuroendocrine lung tumors identification of men1 gene mutations in sporadic carcinoid tumors of the lung men1 gene mutation analysis of high-grade neuroendocrine lung carcinoma mechanisms of disease: multiple endocrine neoplasia type 1-relation to chromatin modifications and transcription regulation molecular genetics of small cell lung carcinoma apoptosis-related factors p53, bcl2, and bax in neuroendocrine lung tumors analysis of p53, k-ras-2, and c-raf-1 in pulmonary neuroendocrine tumors. 
correlation with histological subtype and clinical outcome neuroendocrine carcinomas and precursors rb and cyclin dependent kinase pathways: defining a distinction between rb and p16 loss in lung cancer loss of heterozygosity at the rb locus correlates with loss of rb protein in primary malignant neuro-endocrine lung carcinomas distinct pattern of e2f1 expression in human lung tumours: e2f1 is upregulated in small cell lung carcinoma e2f-1, skp2 and cyclin e oncoproteins are upregulated and directly correlated in high-grade neuroendocrine lung tumors telomerase activity in small-cell and non-small-cell lung cancers differential expression of telomerase reverse transcriptase (htert) in lung tumours lack of telomerase activity in lung carcinoids is dependent on human telomerase reverse transcriptase transcription and alternative splicing and is associated with long telomeres telomere length, telomerase activity, and expressions of human telomerase mrna component (hterc) and human telomerase reverse transcriptase (htert) mrna in pulmonary neuroendocrine tumors genetics of neuroendocrine and carcinoid tumours epigenetic inactivation of rassf1a in lung and breast cancers and malignant phenotype suppression dna methylation profiles of lung tumors differential inactivation of caspase-8 in lung cancers molecular changes in the bronchial epithelium of patients with small cell lung cancer 11q13 allelic imbalance discriminates pulmonary carcinoids from tumorlets. 
a microdissection-based genotyping approach useful in clinical practice inflammatory myofibroblastic tumour mesenchymal tumours alk1 and p80 expression and chromosomal rearrangements involving 2p23 in inflammatory myofibroblastic tumor tpm3-alk and tpm4-alk oncogenes in inflammatory myofibroblastic tumors primary intrathoracic synovial sarcoma: a clinicopathologic study of 40 t(x;18)-positive cases from the french sarcoma group and the mesopath group primary pulmonary and mediastinal synovial sarcoma: a clinicopathologic study of 60 cases and comparison with five prior series a high frequency of tumors with rearrangements of genes of the hmgi(y) family in a series of 191 pulmonary chondroid hamartomas chromosomal translocations in benign tumors: the hmgi proteins changing trends in us mesothelioma incidence pathology and genetics of tumours of the lung, pleura, thymus and heart sv40 and the pathogenesis of mesothelioma advances in malignant mesothelioma surgical management of malignant pleural mesothelioma: a systematic review and evidence summary application of immunohistochemistry to the diagnosis of malignant mesothelioma cellular and molecular parameters of mesothelioma diffuse malignant mesothelioma: genetic pathways and mechanisms of oncogenesis of asbestos and other agents that cause mesotheliomas asbestos, chromosomal deletions, and tumor suppressor gene alterations in human malignant mesothelioma cytogenetic and molecular genetic changes in malignant mesothelioma neurofibromatosis type 2 (nf2) gene is somatically mutated in mesothelioma but not in lung cancer advances in the molecular biology of malignant mesothelioma update on the molecular biology of malignant mesothelioma human mesothelioma cells exhibit tumor cell-specific differences in phosphatidylinositol 3-kinase/akt activity that predict the efficacy of onconase gefitinib in patients with malignant mesothelioma: a phase ii study by the cancer and leukemia group b common egfr mutations conferring 
key: cord-280691-nzc8ir0n authors: guo, sun-wei title: china’s “gene war of the century” and its aftermath: the contest goes on date: 2013-08-30 journal: minerva doi: 10.1007/s11024-013-9237-7 sha: doc_id: 280691 cord_uid: nzc8ir0n following the successful cloning of genes for mostly rare genetic diseases in the early 1990s, there was a nearly universal enthusiasm that similar approaches could be employed to hunt down genes predisposing people to complex diseases. around 1996, several well-funded international gene-hunting teams, enticed by the low cost of collecting biological samples and china’s enormous population, and ushered in by some well-connected chinese intermediaries, came to china to hunt down disease susceptibility genes. this alarmed and, in some cases, enraged many poorly funded chinese scientists, who perceived them as formidable competitors. some depicted foreign gene-hunters as greedy pilferers of the vast chinese genetic gold mine, comparing it to the plundering of national treasures from china by invaders in the past, and called upon the government and their fellow countrymen to rise up and protect china’s genetic gold mine. media uproar ensued, proclaiming the imminent “gene war of the century.” this article chronicles the key events surrounding this “war” and its aftermath, exposes some inherent complexities in identifying susceptibility genes for complex diseases, highlights some issues obscured or completely overlooked in the passionate and patriotic rhetoric, and debunks some misconceptions embedded in this conflict. 
in addition, it argues that during the entire course of this “war,” the public’s interest went conspicuously unmentioned. finally, it articulates several lessons that can be learned from this conflict, and outlines challenges facing human genetics researchers.

around 1997, amid talk of hong kong's upcoming return to china and, later, the asian financial crisis, a recurring topic in the chinese media was the so-called “gene war of the century”: the lopsided condemnation of foreign scientists coming purportedly to pilfer china's vast genetic resources for a profit. some scientists wrote articles and gave lectures, calling for heightened vigilance against such pilfering, and proposed that the country protect its precious genetic resources by conducting genetic research on its own. while the public might have been completely flummoxed by some of the esoteric and arcane jargon, the message was nonetheless loud and clear: the western capitalists were trying to profit from china's unique genetic heritage. in a country with a history of repeated foreign invasion, defeat, and humiliation, this message struck a tender emotional chord and caused an eruption of national furor. the person who likely triggered, perhaps unintentionally, the first spark of this “war” is xiping xu, a chinese expatriate at harvard at the time. despite repeatedly proclaiming himself a staunch and unwavering patriot loyal to his beloved motherland and dedicated to the advancement of china's science and technology, he nonetheless later became embroiled in an avalanche of controversies surrounding the “gene war.” he effectively became a lightning rod for all the controversy on genetic resources, intellectual rights, informed consent, and the protection of human research subjects. well over a decade has since passed. what was at stake? did these precious genetic resources actually exist? who was the most likely beneficiary of the gene-hunting endeavor? how did this “war” end? 
who were the winners and losers, if there were any in the first place? what happened after the conflict? as war is invariably a continuation of politics by other means, what was the politics behind it? what happened to the people who were embroiled in the “war”? all these questions can now be answered, at least in part, with the benefit of 20/20 hindsight. war breaks out simply because of irreconcilable conflicts of interest. the “gene war,” whether it was real or fictitious, was no exception. this article chronicles the key events surrounding the “war” and its aftermath, exposes some inherent complexities in identifying susceptibility genes for complex diseases, highlights some issues obscured or completely overlooked in the passionate and patriotic rhetoric, and debunks some misconceptions embedded in the lopsided condemnations. in addition, it describes how, during the entire course of the “war” of intense and often politically charged uproars, the patients' interest was conspicuously unmentioned and likely overlooked. examining the larger issues regarding science and politics, it also argues that the “war” and its surrounding events can be best understood through the lens of a credibility contest over resources. finally, it lists several lessons that can be learned from this conflict, and outlines challenges facing current researchers in human genetics.

“gene war of the century”: the genesis

in 1990, the human genome project (hgp) was launched. 
with a price tag of 3 billion us dollars and a 15-year timeline, this ambitious megaproject aimed to sequence the entire human genome, with the ultimate goal of “understand[ing] the human genome” and “knowledge of the human as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine.” supported by the us department of energy (doe) and the us national institutes of health (nih), the hgp was the culmination of several years of research building on a series of breathtaking breakthroughs in molecular genetics. considered the “genetic blueprint for human beings” and hailed as the “book of life” or simply the holy grail, the human genome, when completely sequenced, would purportedly unlock the secrets underlying a plethora of human traits as mundane as facial resemblance between parents and offspring and as complex as human behavior. against this backdrop, human genetics research entered a golden age. in 1989, scientists identified (or “cloned”) the genetic mutation responsible for a rare genetic disease called cystic fibrosis; that is, the gene responsible for the disease was identified, with its location and size known. in the inaugural year of the hgp, a gene responsible for breast cancer was localized, or “mapped,” to chromosome 17. in the following few years, the genes responsible for huntington's disease, breast cancer (5-10% of the cases), alzheimer's disease and other rare genetic diseases, often with tongue-twisting names, would be cloned (see table 1 for the timeline of research milestones and events surrounding the “gene war”). 
following on the heels of the successful cloning of genes for these mostly rare mendelian diseases in the early 1990s, there emerged a nearly universal enthusiasm, hope or even conviction that a similar gene-mapping approach could be employed to hunt down susceptibility genes predisposing people to various complex diseases, primarily common chronic diseases such as asthma, diabetes, and cancer that invariably have an elusive pathogenesis and collectively contribute to the major health burdens (lander and schork 1994; risch and merikangas 1996). it was hoped that once genes were identified, the characterization of their functions would not only help better understand genotype-phenotype relationships, but also facilitate the development of specific therapies and preventative measures and the identification of people at increased risk of developing the disease (collins and mckusick 2001). it was also hoped that once the risk of particular combinations of genotype and environmental exposure was known, medical interventions, such as lifestyle changes, could then be institutionalized to target high-risk groups or individuals (collins and mckusick 2001). some biotech companies quickly saw the potential of enormous business opportunities and joined the fray. human genome sciences, founded in 1992 by william a. haseltine, a noted harvard professor and aids researcher, partnered with some genomics companies and soon filed patents on 100,000 genes 1 and, in 1999, quadrupled its stock price (zimmer 2009). other genomics companies followed suit. yet this practice had one problem: most, if not all, of the patented “genes” (in fact, rna transcripts) were merely pieces of cdna without any known functions at the time of filing. to understand what a gene does and how it does it, and to establish the causal relationship between a gene and a human disease, let alone to develop a treatment, is by no means an easy task, even with modern technology. 
very often, it is a slow, arduous, painstaking, and imprecise process full of dead-ends and false leads. many other biotech companies and academic scientists took a different approach called “positional cloning.” this approach eliminates the need to understand the molecular genetic mechanisms underlying the disease of interest. instead, through the collection of pedigrees enriched with patients having the disease, the existing genetic signposts (called dna markers, which are scattered around the human genome with known locations and content) would be used to localize the responsible gene to a particular region. 2 this would allow researchers to zero in on the gene, identify it, and ultimately figure out its function and its relationship with the disease through extensive lab work. thus, by “positioning” the gene, the gene could be cloned and its functions and roles in disease pathogenesis elucidated without any prior knowledge of the possible pathogenesis of the disease. 3 this conceptual simplicity and beauty, coupled with increasingly fast and affordable methods of typing genetic signposts (called “genotyping”), attracted many biomedical scientists who were frustrated by the slow, arduous and often fruitless process of finding the cause(s) of disease, and even converted many of them to human genetics. thus, in the 1990s hardly a week went by without a news report or announcement of the discovery of genes for some disease, at least in the us. one high-profile study, published in 1993 in a prestigious journal, science, reported the localization to chromosome xq28 of a gene responsible for male sexual orientation. 4 this approach has one catch, however. one critical prerequisite for positional cloning is the availability of a sufficient number of pedigrees saturated with people having the same disease, along with correct diagnoses or precise measurements of the disease (called “phenotyping”). once such pedigrees are collected, phenotyping and genotyping can proceed and the positional cloning endeavor can start. since genotyping and phenotyping a large number of people can be costly, there are strong incentives to cut the cost of either process. all else being equal, locations with low costs of acquiring blood samples (for genotyping purposes) and labor (for phenotyping purposes) would be extremely attractive.

footnote 2: this discovery was later found to hold universally true in all organisms, including humans, and became a cornerstone and a principle in genetics. basically, if a trait is determined by a gene, then the gene will tend to go hand in hand with its neighboring signposts when transmitted from parents to offspring, hence the term “linkage.” if many relatives in a pedigree having the same trait all carry the same signposts, then there is a chance that the gene responsible for the trait is near the signposts. although different cross-breeding cannot be made in humans for obvious reasons, this difficulty can be offset by statistical computations. the advent of personal computers in the 1980s coincided with the booming of human genetics and proved to be indispensable in this endeavor. the discovery of various classes of dna markers also facilitated the gene hunting.

footnote 3: the approach based on pedigree data is called “linkage analysis.” since 1997, scientists have found that for common diseases, another approach, called “association studies,” can be more powerful in gene hunting. association studies identify disease genes by finding significant differences in gene frequency between a group of unrelated healthy individuals and another group of unrelated people with the disease of interest.

in 1995, sequana therapeutics, a start-up biotech company located in la jolla, california, announced that it had achieved two significant research milestones related to its asthma gene discovery program. 
it analyzed dna collected from about 300 people on tristan da cunha, an island in the south atlantic, about 1,500 miles from south africa. approximately 30% of the island residents had asthma, presumably passed on from an original settler generations ago. the announcement prompted cash payments of $2 million from boehringer ingelheim of ingelheim, germany, based on a prior agreement. boehringer later paid sequana an additional $13 million for the exclusive right to market the drug derived from the putative asthma gene, while sequana retained the exclusive right to develop gene-based diagnostics. sequana announced in late may 1997 that it had identified a mutated gene that predisposes people to asthma, a feat hailed by one clinical investigator as “perhaps this century's most important finding in the etiology of asthma” (asthma gene discovered 1997). early in the same year, sequana announced that it had signed a letter of intent with pe applied biosystems to form a broad-based dna sequencing joint venture in shanghai, china. circa 1996, several well-funded international gene-hunting teams, lured apparently by the low cost of collecting biological samples and the sheer population size and also ushered in by some well-connected chinese scientists working in north america, arrived in china to hunt down susceptibility genes for various complex diseases (shou 1997). one biotech company from california announced that it had discovered a large pedigree of hundreds of people enriched with asthma patients in a small village in zhejiang province, china. 5 perhaps the most notable team was one led by dr. xiping xu, who was at that time working at harvard and was well-connected in anhui, china. an anhui native, xu spent the mid-1970s after high school as a “barefoot doctor” in anhui, providing basic medical care for peasants after a small amount of training at a time when access to medical care in rural china was a luxury. 
he received his medical degree from anhui medical college (now anhui medical university) in the early 1980s, and was admitted to beijing medical college (now peking university school of medicine) for a student exchange program, a prep program for overseas study. he went on to get his ph.d. degree in epidemiology in japan, went to harvard to receive his post-doctoral training in the epidemiology of respiratory diseases, and received his master's degree in biostatistics from harvard university school of public health (hsph). he stayed on at hsph as a faculty member. even at harvard, xu apparently kept close ties with the anhui medical college, and was involved in several epidemiological studies in anhui province. when his post-doc supervisor, dr. scott weiss, a harvard university respiratory epidemiologist, told geoffrey duyk, a geneticist who had left harvard to join a biotechnology start-up called millennium pharmaceuticals, that one of his postdoctoral fellows came from anhui province and had access to a large number of dna samples, they instantly saw the potential and quickly formed a collaborative relationship to discover susceptibility genes for complex diseases in anhui (keim 2003). a formal partnership between harvard and millennium was established, under which xu would direct the collection of a large number of dna samples in anhui, for which millennium would pay the university $3 million (keim 2003). with tens of thousands of blood samples provided by anhui's villagers, the partnership hoped to identify the susceptibility genes predisposing people to asthma, obesity, miscarriages, schizophrenia and other illnesses. contingent upon its access to the anhui population's dna, millennium also secured sizable capital shortly afterwards from the swedish pharmaceutical company astra ab and from another pharmaceutical giant, hoffmann-la roche. 
the company's access to dna from the “large, homogeneous population” of anhui province was also featured prominently when millennium went public in 1996, raising $54 million in its initial public offering (keim 2003). leveraging existing and preliminary data collected from anhui, xu went on to apply for more funding support from the nih. a search of crisp, a searchable database of nih-funded biomedical research projects (crisp 2010), using “xiping xu” as the principal investigator (pi)'s name reveals that xu received one grant on genetics of airway responsiveness and lung function in 1997 besides two other nih non-genetics grants, and that in 2000 he received 5 nih grants on genetics of osteoporosis, airway responsiveness and lung function, nicotine addiction vulnerability, hypertension, and asthma on top of 4 other nih grants (table 2). being a pi almost concurrently on 9 nih grants is remarkable, especially for a junior faculty member without much of a track record, but having nih grant support in diverse research areas ranging from osteoporosis, hypertension, nicotine addiction, and asthma to human reproductive effects due to endocrine disruption or rotating shift work is extraordinary and certainly breathtaking. in all, he received well over 10 million usd in grant support from the nih, the pharmaceutical industry, the march of dimes, and other funding agencies to investigate the genetics of various complex diseases (yang 2003; keim 2003). recognizing his scholarly potential and the growth area he was in, hsph promoted xu to associate professor in 1996 and made him the director of the newly established program in population genetics (now disbanded) in hsph. the solid financial backing and extensive connections allowed xu to enlist the enthusiastic support of local government officials and his alma mater, who helped xu collect thousands of blood samples from rural villagers. 
the nearly palpable aura of harvard university, which xu embodied and which was (and still is) revered by many in china as the premier research institution and the most prestigious institution of higher learning, also helped. many villagers were illiterate, had no idea what would be done with their samples, and were given merely empty promises of free medical care in exchange for their blood samples. these lapses of oversight, deliberate or otherwise, would return later to haunt xu and his team. the increasing number of scientists like xu with huge financial backing arriving in china to conduct genetic research alarmed many poorly funded chinese scientists, who perceived the situation as a major threat to their profession and a danger of eclipsing their own work. in november 1996, about 30 leading chinese biomedical and genetic researchers gathered in xiang shan, beijing, and held a conference on “the human genome project and the development strategy for the 21st-century medicine” to evaluate and discuss the situation. 6 one incident further fueled the concern shared by all participants. a translated version of a science article was presented at the conference. the original stated that xu sought to gain “access to 200 million chinese through collaboration with six chinese medical centers,” but in the chinese version, the program became one that would “take blood samples from 200 million chinese” (hui and jue 1997). this seemingly astronomical figure incensed all conference participants, who were at that time cash-strapped and still playing catch-up with the west. 
they quickly reached a consensus and soon made it public: (1) china's genetic resources should not be pilfered by foreigners; (2) chinese scientists should immediately grasp the opportunity to find disease genes and patent them; (3) we should educate the people, and raise awareness of the importance of protecting our genetic resources; (4) we welcome all international collaborations based on fairness and mutual benefits; (5) through various avenues, chinese scientists should speak out against certain views deemed to be harmful to china's genetic research (xiao et al. 1997). soon after the xiang shan conference, some scientists published articles depicting foreign gene-hunters as greedy pilferers of the vast chinese genetic gold mine and comparing them to past foreign invaders plundering china's national treasures. they called upon the government and their fellow countrymen to rise up and protect the putative genetic gold mine of the chinese gene pool (fang 1997a, b; lv 1999). shortly after, financial details about the millennium-harvard deal based on anhui samples were leaked to the chinese press and caused a media blitz of condemnations. the media called it an imminent “gene war of the century” (shou 1997). in fact, the notion of foreign capitalists profiting from china's precious genetic resources sparked such a fury that several other genetic research projects unrelated to xu were stalled for a year (hui and jue 1997). around the same time, it was rumored that one prominent geneticist, who had received his ph.d. from caltech in the 1930s for his work on ladybug genetics, yet had no training in either medical genetics or genetic epidemiology, had written a letter to president jiang zemin urging the government to take the matter seriously and to protect china's precious genetic resources. this, along with the media furor, duly alarmed the central government. 
in 1998, the office for management of china's human genetic resources was quickly established under the auspices of the ministry of science and technology, and charged with overseeing the handling and export of all biological specimens potentially containing genetic materials. soon a de facto law, the interim measures for the administration of human genetic resources, promulgated by the general office of the state council (ministry of science and technology & the ministry of public health, 1998), was enacted in june 1998. it mandated that genetic resources were not allowed to be taken abroad without explicit permission and observance of due procedures as defined in the interim measures. funding for domestic human genetic research subsequently poured in (swinbanks 1998), which spurred human genetics research in china (du et al. 2001). two genome research centers, one located in beijing and the other in shanghai, were established.

while the phrase “china's genetic resources” has been used widely and extensively, surprisingly no definition has ever been officially provided. some scientists likened china's genetic resources to natural resources, claiming that, as the most populous nation in the world, china has the largest number of ethnic groups and also has the widest and the most complex disease spectrum (xiao et al. 1997). in addition, with a long documented history and many isolated populations, there are many genetic isolates, and thus china has the purest genetic heritage in the world. therefore, china is a “gene giant” and “the new world of genes,” making many technologically advanced nations envy and salivate (xiao et al. 1997). however, the analogy between “genetic resources” and natural resources has several problems. granted, the vast population facilitates the recruitment of patients with rare diseases for medical research and the low cost of collecting specimens is conducive to large-scale biomedical research. 
however, china did not and still does not necessarily have more types of diseases, and even if it did, few people outside china would be interested in finding the genetic causes for diseases that are unique to the chinese population. in fact, for many rare genetic diseases (called ''orphan diseases'') for which susceptibility genes have been cloned, pharmaceutical companies are often reluctant to invest in drug research and development (r&d) due to concerns about profitability. the values of natural resources are determined by their amount, their extractability, and market demand. there are two forms, renewable (such as wind power) and non-renewable (such as fossil fuels). a commodity is considered a natural resource when the primary human activities associated with it are extraction and purification, as opposed to creation. thus, mining, petroleum extraction, fishing etc. are natural resource industries, but agriculture is not. since gene identification requires much more than collecting blood samples and is both labor and knowledge intensive, genetic resources are, by definition, not natural resources. in addition, unlike fossil fuels, genetic resources are not entirely non-renewable, if the disease spectrum remains the same. with dramatic economic and social changes, the living standard in china has risen remarkably in the last 30 years. following these changes, diet and lifestyles also have changed quite dramatically. as a result, the disease spectrum in the chinese population has shifted, especially in coastal regions. the incidences of breast cancer, colon cancer, prostate cancer, hypertension, and type 2 diabetes all have risen sharply in the last decade (xiang et al. 2004). in large cities such as beijing, childhood obesity used to be very rare but is now reported to be in the range of 10% (and another 10% of children are overweight).
in contrast, the incidence of stomach cancer has decreased, especially in those high-incidence areas where living standards have improved. the rapidly changing disease spectrum means that, firstly, the genetic resources would be forever lost if not used in a timely fashion for gene-hunting purposes. secondly, hunting disease susceptibility genes for a disease that obviously has a strong environmental component was an uncharted territory: no one at the time was absolutely certain how it would turn out. over ten years would elapse before people realized that heritability would vanish even for human traits that are known to be mostly hereditary (maher 2008). lastly, the notion of china's genetic resources touched upon some thorny issues. unlike other natural resources, genetic materials, as in blood samples, exist only in the human body, which is technically owned by their hosts, not by the state. if a person strikes a deal with a drug company, or acts simply out of genuine altruism, and is willing to give away his blood sample, does the state have the right to intervene? if so, would such intervention infringe on the donor's human rights? giving away or even selling one's blood sample is certainly different from prostitution or selling one's own body parts. if the state does have the right to intervene, where can we draw the line? unfortunately, such questions were never raised and discussed. for biomedical research, there surely is such a thing as a genetic resource. but what is it? what constitutes a genetic resource? contrary to popular belief, population size and diversity of diseases, in and by themselves, actually do not make china ''an ideal place for gene hunters'' (guo et al. 1997). few among the common diseases in china are known to have a hereditary component or to be amenable to positional cloning.
in fact, while a small portion of breast cancer cases, for example, may be attributed to gene mutations, the large majority of common and complex diseases are unlikely to be due to a few gene mutations or polymorphisms (see below). as demonstrated by an interventional study conducted in finland, by reducing body weight, eating healthily and regularly engaging in physical exercise, the risk of developing type 2 diabetes can be reduced by nearly 60% (tuomilehto et al. 2001). the dramatic changes in incidence of various diseases in china clearly show that many complex diseases have a very strong environmental component. indeed, a 20-year interventional study conducted in da qing, china, shows that, after merely 6 years of lifestyle intervention after recruitment, those in the intervention groups had a 43% lower incidence of diabetes over the 20-year period as compared with control participants (li et al. 2008), very similar to the finnish results. what constitutes a genetic resource, then? an ideal population for positional cloning 7 and association studies should have a well-enumerated genetic disease heritage, such as that of the finns (norio et al. 1973), and a relatively well-delineated population genetic structure, such as in finland, where extensive church records often exist that document pedigree information for many populations. 8 the catalogue of the genetic disease heritage would allow for correct specification of genetic models and facilitate gene identification.

7 positional cloning is a method of gene identification in which a gene for a specific disease is identified. a scientist can know nothing about the disease etiology. but just by collecting family data, dna, and some sleuthing, s/he could roughly localize the putative gene in a chromosomal region (i.e. positioning). then, with other molecular genetic tools, the scientist can identify the gene from the region; thus the phrase, positional cloning.
a well-delineated genetic structure of the population should facilitate fine-mapping (i.e. zero-in) and genetic association studies. in contrast, when the ''gene war'' broke out, the documentation of chinese disease heritage was scant at best, and its research in population genetics and genetic epidemiology lagged far behind that of other developing countries. since genetic epidemiology is itself a new branch of epidemiology, and since the design, execution and analysis of genetic epidemiologic studies require not only the expert knowledge of disease epidemiology but also a good command of statistical genetics, genetic epidemiology in china was in its infancy at the time. consequently, there was little, if any, genetic epidemiologic research in china. as a result, little was known of the mode of inheritance, penetrance, 9 and gene frequency for major complex diseases. even credible estimates of disease incidence/prevalence were hard to find. therefore, the notion that china is ''an ideal place for gene hunters'' is questionable and somewhat dubious. the fact that after well over a decade no genes for any common, chronic disease have been identified in china is a testament to this. while calling for protection of china's genetic resources and equating large number of dna samples to huge commercial profits, virtually no one at the time was even remotely aware that there are actually numerous obstacles to this gene prospecting. first, there were huge financial barriers. hunting susceptibility genes for complex diseases usually requires a large number of blood samples, along with accurate phenotypic data in the first place. while genotyping costs have been reduced substantially, they were still expensive in the late 1990s, especially when the whole genome would be scanned. these procedures alone would require a significant upfront capital investment. 
and like any other scientific endeavor, the gene hunting expedition would carry an inherent risk of failure, lacking any guarantee that the investment would bear any financial rewards. the demand for huge resources, coupled with the uncertainty of yield from the investment, raises the question as to whether these endeavors were actually good investments, especially in a developing nation like china where there were and still are problems in providing affordable and equitable medical care for all its citizens (hsiao 2009). indeed, lifting living standards for all, improving child and maternal health, and providing better education on healthy lifestyles (smoking cessation, healthy diet, physical exercise etc.) have a proven benefit for the health of the general population. secondly, there were numerous scientific hurdles, some seemingly insurmountable. hunting for complex disease genes was an uncharted territory in 1997, and no one could be reasonably sure as to whether there were susceptibility genes, and, if so, whether they could be identified, especially within reasonable time and resource constraints. even if they could be identified, whether they could fulfill the promises of better diagnosis and treatment was also completely uncertain. the genetic mechanism for sickle cell anemia has been known for well over half a century, for example, but so far no gene-derived therapeutic is available. 10 lastly, there were technical hurdles.

8 a well-enumerated genetic disease heritage can provide the scientists with basic information about the disease of interest, such as mode of transmission and disease frequency. a well-delineated population genetic structure would come in handy when trying to narrow down the gene in a chromosomal region.

9 penetrance refers to the probability that a person with a certain genotype (a genetic makeup) will develop the disease.
to genotype a large number of samples with high accuracy and at reasonable cost was still a challenge around 1997. methodologically, how to analyze the data for diseases which apparently are also influenced by environmental factors and the aging process was, and still is, a serious challenge. in addition, how to handle gene-gene interaction, gene-environment interaction, and variable age of onset posed formidable analytical challenges (di and guo 2007). these hurdles were further compounded by the lack of a systematic catalogue and documentation of hereditary diseases in china in terms of disease frequency, mode of transmission and penetrance, the lack of documentation of population genetic structures in china, and the scarcity of well-trained genetic epidemiologists. even though the per-sample cost of sample collection was relatively low, this sole advantage rarely offset the other, more fundamental deficiencies, and boded ill for many gene-hunting endeavors. viewing the xiping xu case as the nexus of international, transnational, national, and local interests, sleeboom eloquently provides ten different perspectives representing the views and ideals of different interest groups on xu's genetics research program in china (sleeboom 2005). indeed, it is often stated that there are several stakeholders in the putative ''gene war'': chinese scientists, foreign gene hunters, and the chinese government. all of them apparently had a vested interest, mostly financial, in china's genetic resources. however, one critically important stakeholder and potential beneficiary of this gene prospecting, obscured by the media blitz, was actually the patients with various complex diseases in china and the rest of the world. somehow, their voices were muffled and not heard.
indeed, from a patient's perspective, it really does not matter which country finds the genes first and comes up with an efficacious therapeutic as long as s/he can access it at a reasonable cost and within a reasonable timeframe. in the 1990s, china's drug r&d lagged significantly behind the west. most, if not all, drugs and diagnostic kits with proven efficacy used in china today have been first discovered and developed outside china. in fact, all major pharmaceutical companies have now set up manufacturing facilities in china, and almost all drug companies market their products in china once approved by the state food and drug administration of china, a counterpart of the fda of the us. in fact, anecdotal evidence suggests that, when money is not an issue, many patients in china, especially those with potentially fatal diseases, usually prefer imported drugs or drugs made by foreign companies even though cheaper, domestically made drugs of purportedly equal efficacy are available. thus, one simple fact was overlooked in the entire media blitz: an intellectual right on genes can be profitable only when it has a market. china's market is too big to ignore for any rational pharmaceutical company. and when a drug company sells its gene-derived products to china using materials collected from chinese, some patients may still reap the fruits of gene prospecting. this seemingly obvious fact was completely overlooked at the height of the ''gene war.'' the attention that xu's projects drew was certainly unexpected and was likely distracting to xu's projects.

10 sickle cell anemia is a life-long blood disorder, characterized by abnormal, sickle-shaped red blood cells that do not have the usual elasticity of normal red blood cells. the disorder is caused by a single mutation in a gene called hemoglobin and manifests as excruciating pain and a shortened lifespan. it has been known to be an abnormality in the hemoglobin molecule since 1949.
xu vehemently denied that he was exploiting poor villagers in anhui for personal gains. he lamented that ''i came from china, and i love the country. but i have been treated like a traitor'' (hui and jue 1997) . fearing that the media furor and the lopsided condemnation from scientists in china would torpedo his projects in anhui, xu quickly moved to act. when a letter to the editor appeared in science questioning the validity of the ''gene war'' (guo et al. 1997) , xu quickly translated it into chinese and circulated it among chinese officials. xu also enlisted the support of several established chinese scholars in the us. he appealed to peng peiyun, who was then the director of the state family planning commission, soliciting support for his projects. he adamantly maintained that he was patriotic, and his projects in anhui and elsewhere in china had already trained chinese scientists and elevated their research capabilities. xu apparently had mastered the finesse of keeping a good relationship with the government officials and adroitly played the card of a patriotic oversea chinese scholar. the official people's daily reported in 2001 that ''in the last few years, the chinese biomedical researchers collaborated with the world-famous harvard university on various projects, and achieved exciting results in the pathogenesis of complex chronic diseases…. in particular, the research in genetic epidemiology of asthma and hypertension is now at the forefront in the world'' ([benefiting thousands and thousands of families.] 2002). in another report, it said that ''harvard's project has so far produced 8 postdocs, 4 doctoral students, 20 visiting scholars, and 4 senior investigators'' ([east and west collaborate for the health of humankind.] 2001). 
in xu's hometown, the provincial newspaper reported, after enumerating various human genetics research projects with harvard, that ''these collaborative projects not only injected new vitality to anhui's science and technology but also helped attract investment in the amount of about 50 million rmb yuan'' ([anhui-native scientist climb peak of human genetics.] 2002). it remarked that ''these projects helped establish tens of laboratories with advanced instruments, …, and laid the foundation for our nation to catch up and surpass the west in human complex diseases research in the 21st century'' ([anhui-native scientist climb peak of human genetics.] 2002). xu's preference for dealing with high-ranking officials, however, went overboard and nearly caused his undoing (see below). as xu's various gene-hunting projects in anhui took off, some disturbing incidents involving these projects gradually surfaced and leaked to the press. participants were initially promised by the research team that they would get free or reduced-cost medical care, but these promises were never honored. informed consent that was supposed to be obtained from potential participants actually was not, a flagrant violation of nih regulations. rumors circulated regarding coercion by local officials to participate in the projects, and sample mishandling and mix-ups in the lab. prompted by these allegations, a fact-finding team of six people from harvard, including xu and his mentor, scott weiss, arrived in anhui in march 1999 to investigate the ethical and scientific integrity of xu's projects, but found no irregularities. five months later, the department of health and human services (dhhs) launched its own investigation of harvard's genetic research in china, based on the complaint of a whistle-blower from hsph alleging violations of us human subject protection regulations (pomfret and nelson 2000).
in late 2000, the washington post ran a lengthy report detailing the allegations that chinese villagers had not been given low-cost medical care as they were promised in exchange for providing blood samples for xu's us-funded genetic research. the report also included allegations of xu's lapses in informed consent (pomfret and nelson 2000) . partly because of the controversy surrounding this case and others like it in china, the us embassy in beijing issued an advisory recommending that american scientists stop conducting medical research in extremely poor areas of china like anhui. in march 2001, an investigative report by two xinhua news agency reporters was published in outlook, a major chinese magazine. the report reiterated some of the allegations made in an earlier report published in the washington post and supplemented them with interviews with chinese farmers in an isolated region of the anhui province and their various predicaments. the controversies surrounding xu's anhui projects reached a new crescendo at the symposium on bioethics, biotechnology and biosecurity held in early april 2001, which was sponsored by the united nations educational, scientific, and cultural organization (unesco) and organized by the hangzhou municipal government. xiong lei, the lead reporter of the outlook report, presented her investigation and findings with their interview with chinese rural villagers in anhui. her presentation elaborated various irregularities in xu's projects, including the lack of informed consent, broken promises of providing medical care for those who participated in xu's projects, and xu's taking more blood samples than officially approved. an intense debate ensued, starting with xu's anhui collaborators, who maintained that the rate of getting informed consent was close to 100%, and that the chinese side did reap benefits from the collaboration with the harvard team. 
but their presentation was confronted by incensed chinese scientists, who questioned their numbers and practice. xu's legal counsel also made some comments, but the comments were challenged and ridiculed. some scientists expressed grave concerns about the loss of chinese genetic materials to the west, in fear of jeopardizing china's own genetic research. prompted by the outlook report, china's office for management of human genetic resources also launched its own investigation. it soon concluded, in june 2001, that xu's projects did not violate any chinese regulations, and told the us embassy so (pomfret 2001a). but the controversy took another turn in late june 2001. in a letter to xu dated june 26, 2001, the dean of the hsph reprimanded him, strongly criticizing him for writing two letters to senior chinese government officials, in which xu urged the government to silence his detractors and to take action against a major figure who had criticized his work. defending himself as a patriot, xu suggested in his letters that the outlook report had released state secrets to ''foreigners'' (pomfret 2001b). the dean condemned xu's actions, and warned him that he would ''not receive the continued support of the school for you or your research if you persist in exercising independent action.'' if he continued his campaign against journalists and others who questioned his research, the letter said, xu would face ''appropriate sanction'' (pomfret 2001b). yet xu's woes, unfortunately, showed no sign of abating. on march 28, 2002, the federal office for human research protections (ohrp) of the dhhs issued a scathing indictment of the hsph research. the ohrp found that 15 projects in china in which hsph was involved, including 12 projects on which xu served as the pi, failed to be approved by institutional review boards (irb); where approval had been granted, significant and unannounced changes were often made.
it found that many of the informed consent documents approved by the hsph irb included complex language that would not be comprehensible to all subjects, particularly rural chinese subjects. hsph was charged with failing to minimize risks to their vulnerable subjects, such as economically or educationally disadvantaged persons. in the end, subjects never even received the free medical care as promised. as a result of the indictment, xu was ordered to suspend all human subject interventions in his active studies pending the outcome of an internal audit. this new development was soon reported dutifully in china's press (xiong and wang 2002). on may 14, 2002, lawrence summers, then the president of harvard university, gave a speech at peking university. when responding to a question from a student in the audience regarding harvard's projects in anhui, summers admitted that some irregularities in the projects were ''wrong.'' xu eventually breathed a sigh of relief when ohrp sent a letter to hsph in early may 2003, stating that all issues raised in the hsph-involved china studies had been either resolved or dismissed because the allegations were unsubstantiated (as in the alleged forged informed consent documents). consequently, ''there should be no need for further involvement of ohrp'' in these matters. notably, the letter acknowledged that xu had decided not to continue the hypertension study due to ohrp's concern that some of the interventions in the protocol exceeded the level of minimal risk. shortly afterwards, hsph issued a press release announcing the ''[c]onclusion of u.s. government's inquiry into hsph genetic research in china.'' 11 soon after the hsph news release, one internet article, by xiong lei, alleged complacency and a likely cover-up on the part of the chinese government. it raised several issues (xiong 2003). first, among the 15 harvard projects in china that the ohrp found to have problems, 12 projects were headed by xu.
yet officially, only 3 of xu's projects had ever been approved by the government. hence there was a glaring discrepancy. second, the official from the office for management of human genetic resources, who was in charge of the investigation of the allegation of irregularities in xu's projects, told xiong privately that it was not an official investigation. however, the same official then turned around and told the american embassy that no violation was found. it was rather strange that the results of this unofficial investigation, which was shielded from the media and the public, would then be used by the americans to prove that there were no irregularities in these projects. lastly, one peasant in anhui whose ordeal led to the exposure of apparent lapses in informed consent later recanted his story after talking with officials from anhui and the central government. presumably, he changed his story because of the pressure he experienced. this changed story explained why the ohrp eventually found no wrongdoing in xu's projects (xiong 2003). since xiong's article appeared on a website that is officially blocked in china, it did not cause any stir. in china's news media, the criticism of xu's anhui projects also subsided consequently. xu's woes finally ended. more than a decade has passed since the purported ''gene war.'' despite well over a decade of hard work and well over 10 million us dollars in grant support of various forms, xu's team has so far not cloned a single gene for any complex disease or disorder. in fact, they are not even close. other teams were no luckier. in fact, besides numerous reports of association of diabetes, asthma, and other complex diseases with certain genetic polymorphisms, so far not a single gene has been proven to be chiefly responsible for any of these diseases. most genetic loci identified to be associated with the disease risk confer only minuscule relative risks, ranging from 1.02 to 1.5 (kraft and hunter 2009).
even when genetic polymorphisms that are associated with a modest increase in risk are combined, they generally have a low discriminatory and predictive ability (janssens and van duijn 2008). a more recent study reports that after examination of 101 genetic variants reportedly linked to heart disease, the variants turned out to have no value in predicting disease among 19,300 women who had been followed for over 12 years and that family history had better predictive value (paynter et al. 2010). for human height, a trait that is known to be mostly hereditary, it is calculated that approximately 93,000 single nucleotide polymorphisms are required to explain 80% of the population variation (goldstein 2009). this nearly astronomical number certainly inspires no enthusiasm for conducting large-scale gene hunting projects, and calls into question their value in genetic screening, genetic testing, and the possibility of developing gene-derived therapy (wade 2009). the idea that disease genes can be quickly identified and patented, and that quick profits can then be made, now seems too naïve. indeed, 10 years after the first draft of the human genome was completed, medicine has yet to see any large part of the promised benefits (wade 2010). even gene patenting, the very topic that made the ''gene war'' so contentious, has begun to encounter resistance (kintisch 2009). there is indication that, at least in the united states, the status of gene patenting is changing (kintisch 2009). in fact, the us government recently decided that human and other genes should not be eligible for patents because they are part of nature (pollack 2010). this move, viewed as ''a bit of a landmark, kind of a line in the sand,'' followed a surprising ruling made in march 2010, by judge robert w.
sweet of the united states district court in manhattan, which ruled the patents held by myriad genetics and the university of utah on two genes that predispose women to breast and ovarian cancer invalid (pollack 2010). on june 13, 2013, the u.s. supreme court ruled in association for molecular pathology vs. myriad genetics that ''naturally occurring'' dna sequences, but not lab-created synthetic cdnas, are off-limits for patent protection. millennium pharmaceuticals, the initial financial backer of xu's projects, pulled out of anhui early in 1999, with no significant medical or business discoveries to show for its $3.5 million investment (pomfret and nelson 2000). it has since moved into a field of drug r&d that seeks to customize medical treatments for individual patients. it has grown into a successful, billion-dollar enterprise. yet no doubt xu's anhui projects played a crucial role early in its success by securing a much needed infusion of funds (pomfret and nelson 2000). for sequana therapeutics, despite its public announcement of the discovery of the asthma gene in 1997, so far there have been no publications from the company regarding the gene. the claim has never been independently verified. the prospect of making diagnostics or therapeutics derived from the putative gene never materialized. sequana was acquired by arris pharmaceuticals, forming axys pharmaceuticals, which later formed axys advanced technologies, later bought by chemrx. the remains of axys were bought by celera. what used to be sequana therapeutics no longer exists. human genome sciences' stock price reached its peak at $109 on january 31, 2000 and went through two splits in 2000. its president and founder haseltine said that his work ''speeds up biological discovery a hundredfold, easily.'' he talked of finding ''the fountain of youth'' in genes in the form of ''cellular replacement'' therapies. 12 the company raised more than $2 billion in investments by 2000.
in september 2000, the company reported that it had found a way to treat large, painful sores that often plague elderly patients, using a protein spray called repifermin, made by a human gene called keratinocyte growth factor-2. in february 2004, however, the company said that it was ending the development of repifermin because it showed no more benefit than a placebo in clinical trials. another initial drug also failed in clinical trials. in late 2004, the company announced haseltine's retirement and named h. thomas watkins the new president and ceo. in 2000, the first draft version of the human genome was published, thanks to collaborative work among geneticists from china, france, germany, japan, united kingdom and united states. in 2003, the completed version of the human genome was published, marking the completion of the hgp. from the first draft of the human genome, it was quickly learned that there are about 20,000 genes, less than one quarter of 100,000 ''genes'' patented by the human genome sciences. along with this shrinkage in the number of genes, the company stock price also shrank dramatically: the closing price on july 14, 2009, was $2.50, a reduction of 97.7% from its historical high. other genomics companies have not fared much better. iceland-based decode genetics, for example, was founded in 1996 to identify human genes associated with common diseases using population studies. its stock price reached $28.75 on september 11, 2000, but plummeted to $0.53 on july 14, 2009, a reduction of 98.2% in value. its stock was removed from the nasdaq biotechnology index in november 2008. the company's 2006 annual report reveals that its net losses were in excess of 530 million us dollars, and that they have never turned a profit. 13 its 2009 annual report says that ''decode has recorded substantial operating and net losses over the past 3 years'' and the company filed for chapter 11 bankruptcy in the same year (http://www.decode. 
com/investors/dcgn-sec-filings.php). decode was acquired by amgen at the end of 2012 for 415 million usd. on the chinese side, human genetics research moved on with the infusion of research funding from the government. most scientists who participated in the 1996 xiang shan symposium established themselves in genetics research. several of them were later elected to the chinese academy of sciences, the most prestigious honor that can be bestowed on a scientist in china. besides all the trappings and perks, being an academy member also carries far more influence and political clout than its us counterpart. huanming yang, the most vocal in criticizing xu's projects, went on to establish china's premier genome research center, and his career culminated recently in the completion of the diploid genome of the first asian individual (wang 2008), rumored to be the dna extracted from yang himself. the purported ''gene war'' has a profound resonance. even today, over a decade later, the reverberations of the media blitz and the fallout are still palpable: a google search of ''gene war'' or ''genetic resource'' turns up numerous websites still talking about the ''gene war'' or even the purported attempt by the us to wage war against china using ''gene weapons'' (tong 2003). the ''war'' also spurred a flurry of research papers in chinese scholarly journals, universally calling for the protection of china's genetic resources and profit-sharing arising from gene research (jia and wang 2006; jiang and wei 2006). xu left harvard in 2005 and joined the school of public health, university of illinois in chicago, as a non-tenure track research professor of epidemiology (http://www.cade.uic.edu/sphapps/faculty_profile/facultyprofile.asp?i=xipingxu), apparently failing to secure a tenured position at harvard. the negative publicity surrounding xu likely made him seem more a liability than an asset to hsph, especially when he and his group had made no important discoveries.
in 2008, the epidemiologist-turned-genetic-epidemiologist went through another metamorphosis and became an entrepreneur. he was the chief technology officer and, as of writing, is now the president of ausa pharmed company in shenzhen, china. in a glowing profile of xu and his company by people's daily, xu is quoted as saying, ''i used to write prescriptions for people in a small village, and now i am writing a big prescription for people all over the world'' (wang 2008), 14 apparently referring to the company's blockbuster-to-be drug for stroke prevention, yiye (''yi'' is the pronunciation of the first syllable of enalapril in chinese and ''ye'' is that of folic acid). according to the company's website, 15 the drug, a polypill consisting of a combination of enalapril (an angiotensin-converting enzyme inhibitor, or acei, used as an anti-hypertensive drug, on which merck had a patent, now expired) and folic acid (a member of the vitamin b family, used to prevent neural tube defects for pregnant women, and, as an auxiliary, to treat hyperhomocysteinemia and other conditions), is the fruit of extensive research by ausa scientists, ''the only class i cardiovascular drug approved by the state food and drug administration (the us fda counterpart in china) in the last three years with all china-owned intellectual rights, and is the first novel drug in the world that can simultaneously control two risk factors for stroke, hypertension and hyperhomocysteine.'' in 2009, xu was among the first to be granted the ''thousand scholars'' support, a program designed to attract full-professor-level senior professionals from overseas to work in china. 16 this is a title with enormous privileges and perks given to a select group of the best scholars recruited from overseas by beijing.
on december 14, 2008, xu was featured in the oriental satellite tv's special program, 30 people in 30 years, a program that profiled 30 prominent people and their achievements in the 30 years of economic reform in china. in the program, xu talked about his early life as a ''barefoot doctor,'' his admission to peking university and then to harvard, and his dream, as a youth, of ''writing big prescriptions for people all over the world.'' he talked about his work in epidemiologic studies of air pollution and health and his new venture in developing drugs for chinese people. he made no mention, however, of his genetics studies and their associated controversies, and showed no trace of bitterness. 17 curiously, the ausa-sponsored clinical trial on the evaluation of yiye in prevention of stroke was registered at the clinical trial registry, www.clinicaltrials.gov, on november 19, 2008, which coincided with the official approval of yiye by the sfda. 18 the registered trial, china stroke primary prevention trial (csppt), is a phase iv trial (nct00794885), which compares the efficacy of the enalapril and folic acid combination vs. enalapril alone in preventing strokes. as of writing, its status is listed as ''on-going, but is not recruiting participants.'' its estimated completion date is august, 2014.
in modern society, science has become a firmly institutionalized social activity, attracting people by offering generally prized opportunities of engaging in socially approved patterns of association with one's fellows and the consequent creation of cultural products esteemed by the group, in addition to the economic benefits that science may offer (merton [1938]). as merton eloquently put it, ''such group-sanctioned conduct tends to continue unchallenged, with little questioning of its reason for being. institutionalized values are conceived as self-evident and require no vindication'' (merton [1938]).
in modern science, especially in biomedical research, scientific enquiries often require large amounts of resources: expensive instruments and reagents, lab space, and talented and hardworking students. hence the pressure for getting resources is enormous. anything that promises to help ease the pressure is welcome. scholars of sociology of science often view science as agonistic, made up of rival individuals or groups vying to have their scientific ideas or views recognized and/or accepted as valid (greenhalgh 2008). science, as a space of maps of culture, is drawn by scientists hoping to have their research accepted as valid and recognized, their practices esteemed and patronized, and their culture sustained as home of objectivity, reason, truth or utility. the maps are then used by all who are unsure about the reality (gieryn 1999). yet maps of science change over time, as competing cartographers are constantly drawing, erasing, and redrawing the boundaries of science. by doing this, the scientist cartographers claim authority over a particular issue by taking it within their science or turf. thus, vying for acceptance or the valid ''truth'' among scientists is essentially a credibility contest, with winners viewed as the epistemic authority (gieryn 1999). the one with the epistemic authority obviously would be the most influential, and their views and voices would be the most visible and vocal when it comes to policy issues (greenhalgh 2008). gieryn's credibility contest metaphor aptly depicts the quest for epistemic authority in science; it is also applicable, rather fittingly, to the situation when scientists vie for resources from funding agencies.
in fact, when the process of deciding who would get resources lacks procedural justice, and when there is a lack of tradition for open and rational debate and of a checks-and-balances system, the credibility contest becomes literally a ''beauty'' contest: the most glamorous in terms of rank, status, or simply seniority in the administrative ladder, but not necessarily in vision, merit, or talent, would get the resources. in a country where political loyalty and connections are valued far more than scientific merit and talent, the credibility contest further becomes a contest of political correctness or patriotism. coupled with the lack of a clean and efficient government and of transparency and also with the pervasiveness of guanxi or personal connections, this contest might create winners who are not necessarily the scientifically most visionary. as human beings, scientists are also susceptible to all human frailties. aside from questing for epistemic authority, they also compete for resources and influence, and often vie for political clout, credit, fame, and even glamour, especially when such activities help their quest for epistemic authority or increase their professional and personal gains. if there are no set rules of the game with certain procedural justice or a checks-and-balances system in place that can curtail the tendency and channel it into the maximization of the common good, then the human frailties, coupled with the lack of proper avenues for open debate, would make the contest an invitation for inefficiency, waste, corruption, and even disaster. winners might take all, but in the long run the bad money drives out the good. the spectacular fiascos in containing the sars epidemic and in sequencing the coronavirus that causes sars are prime examples (enserink 2003; cao 2004).
lured mostly by the low cost of collecting large dna samples and the perceived genetic homogeneity, many gene hunters from the west came to china in the hope of identifying genes responsible for complex diseases, and some of them may have hoped to get rich in the process. this was mostly accomplished through some well-connected chinese intermediaries ''as experienced guide,'' as washington post reporter john pomfret put it (pomfret and nelson 2000). letting the intermediary do the leg work did spare them from doing the dirty field work but also insulated them from the sentiment of villagers and the scientific establishment in china and prevented them from establishing a rapport with these people. from a scientific standpoint, many gene-hunting projects were launched without much understanding of the population genetic structure of the chinese population or foundational genetic epidemiologic data, let alone an appreciation of the inherent risk in this scientific endeavor. there was no plan b to fall back on in case what was planned did not pan out. the execution also was fraught with various deficiencies. with little or no oversight, the daily work was left in the hands of not-so-well trained people. and when rumors of irregularities surfaced, inspection was largely perfunctory, nothing more than sugar-coating or bandaging. it would have been a miracle if such projects were ever productive. faced with numerous well-endowed gene hunting teams coming to china, the cash-strapped chinese genetics scientists had every reason to be worried. the taking of large numbers of dna samples and, worse yet, the tracking down of some large pedigrees with rare genetic diseases would effectively deprive them of the chance of finding disease genes, outshining them in the genetic research of chinese populations and threatening their careers.
providing dna materials without any reasonable share of possible future financial payoff for the people who donated their blood could also be a concern, but it is not clear which was the primary concern. by calling the attention of the central government through evoking nationalism and calling for the protection of china's genetic resources, they got the resources and also claimed a territory that would be off-limits to ''foreign devils.'' yet in doing so, apparently no one was aware then of the numerous and enormous hurdles to gene prospecting, and its complexity and challenge were vastly underestimated. by evoking the urgency to protect china's genetic resources, some scientists played the card of nationalism, winning the contest for resources by nudging the unsuspecting government to cough up some much needed funds to thwart ''foreign devils''' pilfering act. through the drafting and implementation of the interim measures for the administration of human genetic resources, the domestic scientists effectively enacted an embargo on the transfer of all dna-containing materials from china to the outside by drawing a clearly demarcated boundary. this may explain why dr. xiping xu repeatedly proclaimed, on many public occasions during the entire course of the ''war,'' that he is a patriot. well-connected and clearly a master of the nuances of guanxi, he certainly knew the psyche of many chinese and government officials. lapses and missteps aside, he was no match for domestic scientists united in the name of patriotism. yet his biggest deficiency in the credibility contest was attributable to his betting on the wrong horse: many, if not all, of his well-funded projects did not pan out in the end, producing no headline scientific discovery and failing to establish an epistemic authority.
while the credibility contest in the quest for epistemic authority depicts science and its workings, the contest for credibility, glamour or patriotism in getting resources as we see in the ''gene war'' may be ultimately detrimental to china's science and technology. today, china's r&d investment, in terms of dollar amount, has increased dramatically as compared with the era of the ''gene war.'' it reached a historical high of 868.7 billion yuan, or about 133.6 billion us dollars, in 2011, amounting to 1.84% of china's gdp. 19 as a result, china's scientific output, measured by the number of papers published in international journals, also has increased remarkably and is reportedly ranked second in the world, just behind the us. 20 yet in terms of the average number of citations, a rough measure of impact or research quality, china was ranked a distant 14th. a more disconcerting observation is that the fruit of biomedical research seldom translates into better patient care, better therapeutics, better prognostics, or better prevention. in other words, the vast majority of tax-payers have not received any tangible benefit from the supposedly noble and grandiose scientific endeavor. this situation, if left unchanged, will neither justify heavy investment in biomedical research nor win the continued support of tax-payers in the long run. ultimately, it would raise the issue of the sustainability of biomedical research in china. this problem will become all the more acute as china becomes an aging society in which health care costs will surely skyrocket. it should be noted that, at the height of the ''gene war,'' similar concerns were also raised in finland and india. however, few seemed to have framed their concerns at the level of nationalism, and even fewer have gone overboard and demonized, often in passionate and patriotic rhetoric, all foreign-supported gene hunting projects.
the near paranoia instigated towards all foreign-backed gene-hunting projects did help to extract some much needed research funding from the government, but also fomented xenophobia and some utterly unfounded yet sensational nonsense such as ''gene weapons'' 21 (tong 2003) and tarnished genetic research in china. remarkably, when a book was published in 2003, sensationally claiming that the sars virus was deliberately manufactured by the west based on dna materials smuggled out of china (tong 2003), no one stood up and debunked such scientific rubbish. the chinese also carry a burden of humiliation and painful memories of the past as a result of repeated ravages by foreign aggression and exploitation in the last two centuries. consequently, issues that may be remotely related to national sovereignty or foreign exploitation are hot-button ones. minor incidents can easily escalate into a major event, as evidenced by the calls for boycotts of french carrefour and other foreign retailers in china in response to the disruptions of the olympic torch relay in paris in 2008, and, more recently, by the vandalization of japanese-made cars in many chinese cities at the height of anti-japanese sentiment rekindled by the territorial dispute between china and japan. china's current funding system and science policy-making also are vulnerable to subterfuge of various kinds, behind which personal gains are often masquerading as patriotism or national interest. as ambrose bierce once defined, patriotism is a ''combustible rubbish ready to the torch of any one ambitious to illuminate his name.'' it is a challenge to weed out charlatans posing as patriots, but the best bet would be a transparent system that values merit, talent and vision above cheap patriotic or nationalistic rhetoric. as china is aspiring to be a leader in science and technology, this ''gene war of the century'' and its aftermath, as narrated here, serve as a cautionary tale.
it reminds us, first and foremost, how important it is to be clear-headed and not to follow blindly whatever is in vogue. very often, what we see is the conspicuous ''me-too'' science, following whatever is fashionable. yet the most important ball that all eyes should be on, i.e. better health for all, is seemingly lost. in the absence of procedural justice in the process of deciding who would get resources, and when there is a lack of tradition for open and rational debate and of a checks-and-balances system, the credibility contest for resources would easily become a ''beauty'' contest. in a country where political loyalty and nationalism are valued more than scientific merit and talent, the credibility contest would further become a contest of political loyalty, political correctness, or patriotism. the news of pilfering of china's genetic resources by foreign companies could easily strike a chord of painful memories of foreign aggression and exploitation in the last two centuries. the isolation from the world community for nearly three decades since the founding of the people's republic, purportedly due to the prejudice, discrimination, and containment of the western imperialists and capitalists (or so it was told by the state media), also helped foster or reinforce a nearly xenophobic attitude towards anything perceived to be dangerous if coming from outside of china, especially if it touches on ideology. thus, the spark of a minor incident could easily kindle the fire of tumultuous nationalistic uproar.
21 one key finding of the hgp is that all human races are 99.99% identical, dna sequence-wise. therefore, racial differences are genetically insignificant. for many genes, it has been established that genetic variations within the same racial group are larger than those among racial groups. thus, scientifically, it is impossible to devise a ''gene weapon'' to target a specific racial or ethnic group.
hence, the ''gene war'' holds a lesson that being seemingly the most patriotic is no assurance of good science. the mere possession of resources does not guarantee scientific productivity, either. vision and brains are more important when it comes to scientific innovation and discovery. over-politicizing science will ultimately prove to be detrimental to china's science and technology. we have seen this during china's great cultural revolution, in which nearly everything in science and technology was politicized. science and technology then were essentially decimated, and the characteristic hallmark was jia, da, kong or falsehood, grandeur, and emptiness. in addition, when political loyalty prevails over talent and vision, some unscrupulous scientists can hijack the value system to their own advantage. and when there is also a lack of avenues for open debate, then one project purported to be of national importance could be usurped by another with purportedly greater importance. the conflict also reminds us that scientists are no more than human beings and have all the human frailties. personal gains or some ulterior motives can be camouflaged as patriotism or public interest, as shown in the politics of paleoanthropological nationalism in china (sautman 2001). as merton succinctly put it, ''any extrinsic reward-fame, money, position-is morally ambiguous and potentially subversive of culturally esteemed values. for as rewards are meted out, they can displace the original motive: concern with recognition can displace concern with advancing knowledge. an excess of incentives can produce distracting conflict'' (merton [1968]). without any measures or system to guard against this, the interest of the nation and of the public will suffer.
in human genetics, china's premier challenge now remains, as it was a decade ago, ''to build up a critical mass of highly competent and visionary scientists who will be able to bring chinese genetics into the world community'' (guo et al. 1997). the ultimate goal of biomedical research in general and human genetics in particular is to bring tangible benefits and better health to the general public, not merely some ranking of scientific output, for this is the source of sustainable support from the general public and of economic prosperity. science thrives on openness, reason, and the competition of ideas, and it suffers when subjected to political agenda, faux patriotism or nationalism.
anhui-native scientist climb peak of human genetics
noncoding rna transcription beyond annotated genes
implications of the human genome project for medical science
the search for genetic variants predisposing women to endometriosis
confirmation of susceptibility gene loci on chromosome 1 in northern china han families with type 2 diabetes
china's missed chance
shiji zhi zhang: jiyin qian duo zhan [war of the century: the war to seize the genes
shiji zhi zhang: jiyin qian duo zhan [war of the century: the war to seize the genes
asthma gene discovered
cultural boundaries of science: credibility on the line
common genetic variation and human traits
just one child: science and policy in deng's china
gene war of the century?
a linkage between dna markers on the x chromosome and male sexual orientation
backlash disrupts china exchanges
genome-based prediction of common diseases: advances and prospects
protection of genetic patent-a discussion of gene providers' sharing the benefits accruing from the use of genetic patent
the thought on dispute of cooperative gene research between china and america
out of sight, out of mind: how harvard university exploited rural chinese villagers for their dna
appellate court weighs 'obvious' patents
genetic risk prediction-are we there yet?
genetic dissection of complex traits
the long-term effect of lifestyle interventions to prevent diabetes in the china da qing diabetes prevention study: a 20-year follow-up study
jiyin da zhan'' [the escalating ''gene war
personal genomes: the case of the missing heritability
the puritan spur to science
behavior patterns of scientists
hereditary diseases in finland; rare flora in rare soul
association between a literature-based genetic risk score and cardiovascular events in women
people's daily. 2002. benefiting thousands and thousands of families
u.s. says genes should not be eligible for patents
china probe clears harvard's genetic research
an isolated region's genetic mother lode; harvard-led study mined dna riches; some donors say promises were broken
male homosexuality: absence of linkage to microsatellite markers at xq28
the future of genetic studies of complex human diseases
peking man and the politics of paleoanthropological nationalism in china
beijing youth daily
the harvard case of xu xiping: exploitation of the people, scientific advance, or genetic theft?
genome research set to take off in china
the last defence: the agony of the loss of chinese genes
prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance
genes show limited value in predicting diseases
a decade later, genetic map yields few new cures
new drug research in china lift heavy siege. people's daily
methods for time trend analysis of cancer incidence rates
experimental studies on somatic gene therapy for diabetes i. structuring of recombinant from human insulin gene and mammalian expression vector prc/cmv
scramble for chinese genes not ended
harvard's genetic research projects in china a ''violation
advances in science and progress of humanity: a global perspective on dna
the gene puzzle
acknowledgments the author would like to thank drs.
cong cao, partha majumder, and margaret sleeboom-faulkner for their constructive comments on an earlier version of this manuscript. he also thanks dr. yang huanming for the reading of an earlier draft, and two anonymous reviewers and the editors for their helpful comments. thanks also go to charles guo for his critical reading of and helpful comments on the manuscript.
key: cord-016187-58rqc0cg authors: opal, s. m. title: the challenge of emerging infections and progressive antibiotic resistance date: 2006 journal: intensive care medicine in 10 years doi: 10.1007/3-540-29730-8_6 sha: doc_id: 16187 cord_uid: 58rqc0cg
our collective vulnerability to the threat of emerging microbial pathogens remains disturbingly evident as we enter the twenty-first century. despite two centuries of knowledge about the germ theory of disease, breaking the genetic code, and sequencing the genomes of virtually every major bacterial and viral pathogen capable of causing disease in humankind, we still find ourselves susceptible to infectious diseases. densely concentrated cities with interconnected human societies linked by international aviation put us at continued risk from future epidemics that will inevitably occur [1]. the ever expanding population growth of our species will force environmental change as we venture into sparsely inhabited rainforests, populate remote ecosystems and cultivate natural habitats to support our voracious human appetite for goods and services. global warming, environmental degradation and land development along with human upheavals and natural calamities will create new outbreaks with novel pathogens and renew the spread of ancient scourges like cholera [2] and plague [3]. numerous examples of intercontinental spread of microbial pathogens within the last five years alone give notice of the susceptibility of human populations to emerging infectious diseases (table 1) [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20].
this is perhaps best exemplified by the tragic events set into motion in late 2002 when a previously unidentified, obscure, animal coronavirus (now known as severe acute respiratory syndrome [sars]-cov) was first introduced into an unsuspecting human population in southern china [9]. current molecular evidence indicates that a food handler in an exotic food 'wet market' in guangdong province probably first became infected by an animal coronavirus from a civet cat. this newly derived animal virus was adept at infecting humans and was efficiently spread person-to-person by infected aerosol [10]. an ill chinese physician from the affected region traveled to hong kong to attend a wedding. while spending a single night in amoy garden hotel in the city, this infected individual appeared to spread the virus to at least 12 other hotel guests. over the next several days these people returned to their homes in five different countries incubating the sars-cov pathogen in their respiratory secretions. over the next 3-4 months, this newly acquired coronavirus spread to over 27 countries worldwide and caused over 8000 cases of sars resulting in nearly 800 deaths in early 2003 [9]. through a global effort from a large number of very diligent public health officials and laboratory scientists, the outbreak ended within a year and has yet to be seen again, except for occasional laboratory-acquired infections [9]. a diverse array of pathogens has produced recent outbreaks and concerns for our vulnerability to pathogens within the global village we occupy and share with other flora, fauna and microorganisms (table 1).
table 1
avian influenza [4] [5] [6] [7] | influenza a (h5n1) | risk of pandemic influenza; sporadic human cases of avian flu in asia - mortality rates >70%
severe acute respiratory syndrome (sars) [8] [9] [10] | sars associated coronavirus | risk of spread of zoonotic viruses; outbreak from southern china to worldwide epidemic in 2003 - 8000 cases and 800 deaths
monkey pox [11] | orthopox virus | risk of exotic pet trade; outbreak in wild rodents and humans in mid-western usa from sale of gambian giant rats from africa
west nile virus (wnv) [12, 13] | mosquito-borne flavivirus | risk of international spread; wnv from africa to new york in 1999, thousands of cases and hundreds of deaths in north america over next 5 years
inhalational anthrax [14, 15] | intentional release of bacillus anthracis spores in usa mail system | vulnerability to bioterrorism; 11 cases, 5 [...]
[...] [17] [18] [19] [20] | [...] oseltamivir-resistant influenza, azole-resistant candida spp. | [...] genes - community outbreaks now occur

the spread of mosquito-borne west nile virus in north america [12, 13], prion-related food-borne variant creutzfeldt-jakob disease [21], and hemorrhagic fever viruses [16] are a constant reminder of our susceptibility to pathogens that naturally reside in other animal species. the omnipresent fear of the next pandemic of influenza has been heightened by recent evolutionary changes in virulence and transmissibility of avian flu viruses [22]. standard chemotherapeutic regimens for infectious diseases may not reliably rescue persons with severe infections in the new millennium. community and nosocomial outbreaks of multidrug resistant pathogens as evidenced by methicillin and vancomycin resistance [17] in staphylococcus aureus and resistance to the new anti-viral neuraminidase inhibitors [18, 19] by recent influenza isolates are cause for real concern.
the care of hospitalized, critically ill patients is likely to fundamentally change if current trends in the progressive emergence of antimicrobial resistance to commonly prescribed antibiotics are not significantly altered in the near future. regrettably, there is little evidence that the situation is likely to change unless concerted efforts are taken on several fronts to reverse the current trajectory of increasing antibiotic resistance [17, 20]. the fitness of a microorganism is dependent upon its capacity to genetically adapt to rapidly changing environmental conditions. antimicrobial agents exert strong selective pressures on microbial populations, favoring those organisms that are capable of resisting them. genetic variability may occur by a variety of mechanisms. point mutations may occur in a nucleotide base pair, which is referred to as micro-evolutionary change [23]. these mutations may alter the target site of an antimicrobial agent, interfering with its inhibitory capacity. point mutations inside or adjacent to the active sites of existing beta-lactamase genes (e.g., genes for tem-1, shv-1) have generated a remarkable array of newly recognized extended-spectrum beta-lactamases [23]. beta-lactam antibiotics have been known for almost 80 years and their widespread use has created selection pressures on bacterial pathogens to resist their inhibitory actions. at least 267 different bacterial enzymes have now been characterized that hydrolyze beta-lactam antibiotics [24]. the hydrolyzing enzymes exist in four basic molecular classes and are classified as listed in table 2. the enzymes are either serine hydrolases (class a, c, and d) or zinc-containing metalloenzymes with a zinc-binding thiol group in the active site (class b enzymes).
the microevolutionary events that account for the differential activities of this array of beta-lactamases have been carefully studied, and these bacterial enzymes now even have their own internet website devoted specifically to their molecular properties (http://www.lahey.org/studies/webt.htm). beta-lactamase activity has become so ubiquitous among bacterial populations that it has prompted the development of specific beta-lactamase inhibitor compounds (clavulanate, tazobactam and sulbactam) in an effort to combat this common bacterial resistance mechanism. this has been countered by the generation of inhibitors of these beta-lactamase inhibitors by multidrug-resistant bacteria [23] in the ongoing conflict between pathogens and chemotherapeutic strategies to eradicate these microorganisms. recently it has been demonstrated that at least some bacterial populations have the capacity to increase their mutation rates during times of environmental stress such as exposure to an antibiotic. this stress response is known as the 'sos' response or transient hypermutation [25]. it is highly advantageous for the organism to increase the rate of genetic variation at times of unfavorable environmental conditions. it is possible for bacteria to upregulate the pace of evolution in an attempt to develop a clone that can resist the action of an antibiotic. the dna polymerase in such organisms has reduced fidelity of replication and consequently an increased rate of mutation as a result of excess nucleotide mispairing. the recombination system of bacteria (the reca system) becomes less restrictive in the degree of homology required between dna sequences before a crossover event is permitted to occur. a flurry of mutational events occurs in stressed bacteria in a final attempt to generate a resistant subpopulation of bacteria in the presence of an environmental challenge such as the presence of a new antibiotic.
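the advantage of transient hypermutation described above can be illustrated with a rough calculation. the following is a minimal sketch, not a model taken from the chapter: it assumes that each cell independently acquires a given resistance mutation with a fixed per-generation probability, and the population size and mutation rates are hypothetical values chosen only for illustration.

```python
import math


def prob_resistant_mutant(population: int, mutation_rate: float) -> float:
    """Probability that at least one cell acquires a given resistance
    mutation in one generation: 1 - (1 - mu)^N.

    Assumes independent mutation events per cell (an illustrative
    simplification, not a claim from the source text).
    """
    # log1p/expm1 keep the result accurate when mu is tiny.
    return -math.expm1(population * math.log1p(-mutation_rate))


# Hypothetical numbers: one million cells, and a 100-fold rise in the
# mutation rate under antibiotic stress.
baseline = prob_resistant_mutant(10**6, 1e-9)
stressed = prob_resistant_mutant(10**6, 1e-7)

print(f"baseline mutation rate: {baseline:.4f}")
print(f"hypermutation:          {stressed:.4f}")
```

under these assumed values, raising the mutation rate by two orders of magnitude raises the chance of a resistant clone appearing in a single generation by nearly the same factor, which is consistent with the qualitative argument in the text.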
transient hypermutation has even been phenotypically linked with alterations in growth rate and biofilm formation in some strains of pseudomonas aeruginosa [26]. a second level of genomic variability in bacteria is referred to as a macro-evolutionary change and results in whole-scale rearrangements of large segments of dna as a single event. such rearrangements may include inversions, duplications, insertions, deletions, or transposition of large sequences of dna from one location of a bacterial chromosome or plasmid to another. these whole-scale rearrangements of large segments of the bacterial genome are frequently created by specialized genetic elements known as transposons or insertion sequences, which have the capacity to move independently as a unit from the rest of the bacterial genome [23]. foreign dna sequences from the extracellular environment may be taken up by naturally competent bacteria (e.g., some streptococci and neisserial organisms) by transformation. these sequences can then become integrated into homologous sequences of the host genome by the generalized recombination and dna repair systems of bacteria. inheritance of these foreign dna elements further contributes to the organism's ability to cope with selection pressures imposed upon it by antimicrobial agents [23]. a third level of genetic variability in bacteria is created by the acquisition of foreign dna carried by plasmids and bacteriophages. these extrachromosomal dna elements provide ready access to disposable yet potentially highly advantageous genes including antibiotic resistance genes from plasmids or phage particles. these elements are autonomously self-replicating, and they can remain unattached in the cytoplasm of bacterial cells or integrate directly into the chromosome of the bacterial host. they have the capacity to replicate and move independently from the chromosome, adding further variability to the entire bacterial genomic dna.
evidence from whole genome sequencing projects indicates that these genomic rearrangements, bacteriophage sequences and insertion sequences are commonplace in bacterial chromosomes [27]. these genetic variations provide bacteria with a seemingly limitless system to alter their genomes, rapidly evolve and develop resistance to virtually any antimicrobial agent. recent examples of vancomycin resistance in enterococci [23] and s. aureus [27], extended spectrum beta-lactamases [23], carbapenemase production [28] and transferable quinolone resistance in p. aeruginosa and enterobacteria [23] attest to the capacity of microorganisms to adapt to environmental stresses induced by antibiotic exposure. viruses [19] and fungi [20] are also quite capable of rapid antimicrobial resistance development, and these resistance capacities pose additional threats in the management of icu patients with serious infections from a variety of potential pathogens [29]. antibiotic resistance genes probably arose from detoxifying enzymes or synthetic enzymes with altered substrate specificity by critical mutations or recombination events resulting in the formation of mosaic genes with entirely new functions [30]. altered penicillin binding proteins that mediate beta-lactam resistance in multiple bacterial genera (e.g., methicillin-resistant s. aureus [mrsa], penicillin-resistant streptococci and pneumococci, chromosomal resistance in gonococci) may have evolved from gene fusions for penicillin binding proteins involved in bacterial cell wall synthesis [23]. another common resistance strategy is a change in the regulation of metabolic activity of an enzyme system that is affected by the antibiotic. increasing the rate of folate precursor synthesis, for example, can overcome the inhibitory effects of sulfa drugs and trimethoprim [30]. many common antibiotic resistance genes were accidentally acquired ('stolen') from antibiotic producing bacteria. 
streptomyces and related soil bacteria are the source of many standard antimicrobial agents in use in clinical medicine today. these bacteria have co-evolved the capacity to synthesize antibiotics along with the necessary resistance genes to protect their own metabolic machinery from the very antibiotic they produce. the resistance genes from these antibiotic producing bacteria provide a ready genetic blueprint to resist the target antibiotic if susceptible bacteria can acquire these resistance genes. recent evidence confirming that this does indeed occur was found by yokoyama and colleagues in japan during an investigation of a sudden outbreak of p. aeruginosa with high-level resistance to essentially all the clinically available aminoglycosides [31]. these investigators discovered that the resistant strain had acquired a new methylase gene whose product blocked the aminoglycoside binding site on a specific sequence of 16s ribosomal rna. this identical mechanism and a highly homologous gene are found in aminoglycoside-producing strains of streptomyces and related bacteria. at least seven distinctive mechanisms of antibiotic resistance have been described in bacteria and are summarized in table 3. detoxifying enzymes are used to degrade beta-lactams [24] and to modify aminoglycosides so they no longer enter bacterial membranes and attach to their ribosomal target. there are over 30 such enzymes identified that can inhibit aminoglycosides by one of three general reactions: n-acetylation, o-nucleotidylation, and o-phosphorylation [23]. detoxifying enzymes are also one of the resistance mechanisms against chloramphenicol, and are rarely utilized by certain bacterial strains to inactivate macrolides, lincosamides, tetracyclines and streptogramins. it was recognized early in the history of antibiotic development that penicillin is effective against gram-positive bacteria but not against gram-negative bacteria [23]. 
this difference in susceptibility to penicillin is due in large part to the outer membrane, a lipid bilayer that acts as a barrier to the penetration of antibiotics into the cell. situated outside the peptidoglycan cell wall of gram-negative bacteria, this outer membrane is absent in gram-positive bacteria. the outer portion of this lipid bilayer is composed principally of lipopolysaccharide (lps) made up of tightly bound hydrocarbon molecules that impede the entry of hydrophobic antibiotics, such as penicillins or macrolides. the passage of hydrophilic antibiotics through this outer membrane is facilitated by the presence of porins, proteins that are arranged so as to form water-filled diffusion channels through which antibiotics may traverse [23]. bacteria usually produce a large number of porins with differing physicochemical properties, permeability characteristics and size; there are approximately 10^5 porin molecules per cell in escherichia coli. bacteria are able to regulate the relative number of different porins in response to the osmolarity of their microenvironment. in hyperosmolar conditions, e. coli represses the synthesis of larger porins (ompf) while continuing to express smaller ones (ompc) [32]. mutations resulting in the loss of specific porins can occur in clinical isolates and confer increased resistance to beta-lactam antibiotics. resistance to aminoglycosides and carbapenems emerging during therapy has also been associated with a lack of production of outer membrane proteins. in p. aeruginosa, resistance to imipenem appears to be due to an interaction between chromosomal beta-lactamase activity and a loss of a specific entry channel, the d2 porin [33]. the rate of entry of aminoglycoside molecules into bacterial cells is a function of their binding to a usually non-saturable anionic transporter, whereupon they retain their positive charge and are subsequently 'pulled' across the cytoplasmic membrane by the internal negative charge of the cell. 
this process requires energy and a threshold level of internal negative charge before significant transport occurs (proton motive force) [34]. aminoglycoside-resistant isolates with altered proton motive force may emerge during long-term aminoglycoside therapy. these isolates usually have a 'small colony' phenotype due to their reduced rate of growth. table 3 abbreviations: gram-neg: gram-negative bacteria; tmp: trimethoprim; tcn: tetracycline; +++: most common mechanism; ++: common; +: less common; -: not reported (see reference [23]). active efflux of antimicrobial agents is increasingly utilized by bacteria and fungi as a mechanism of antibiotic resistance. some strains of e. coli, shigella, and other enteric organisms express a membrane transporter system that leads to multidrug resistance by drug efflux [35]. specific efflux pumps also exist that promote the egress of single classes of antimicrobial agents. efflux is the major mechanism of resistance to tetracyclines in gram-negative bacteria. some strains of s. pneumoniae, s. pyogenes, s. aureus, and s. epidermidis use an active efflux mechanism to resist macrolides, streptogramins, and azalides [23]. this efflux mechanism is mediated by the mef (for macrolide efflux) genes in streptococci and msr (for macrolide streptogramin resistance) genes in staphylococci. a similar efflux system, encoded by a gene referred to as mrea (for macrolide resistance efflux), has been described in group b streptococci. this mechanism of resistance may be more prevalent in community-acquired infections than was generally appreciated. dissemination of these resistance genes among important bacterial pathogens constitutes a major threat to the continued usefulness of macrolide antibiotics [36]. active efflux mechanisms may also contribute to the full expression of beta-lactam resistance in p. aeruginosa. multidrug efflux pumps in the inner and outer membrane of p. 
aeruginosa may combine with periplasmic beta-lactamases and membrane permeability components for full expression of antibiotic resistance [37]. active efflux of fluoroquinolones by specific quinolone pumps or multidrug transporter pumps has also been detected in enteric bacteria and staphylococci [23]. resistance to a wide variety of antimicrobial agents, including tetracyclines, macrolides, lincosamides, streptogramins and the aminoglycosides, may result from alteration of ribosomal binding sites. the mlsb determinant carries the genes that produce enzymes to dimethylate adenine residues on the 23s ribosomal rna of the 50s subunit of the prokaryotic ribosome, disrupting the binding of these drugs to the ribosome. resistance to aminoglycosides may also be mediated at the ribosomal level. mutations of the s12 protein of the 30s subunit have been shown to interfere with the binding of streptomycin to the ribosome. ribosomal resistance to streptomycin may be a significant cause of streptomycin resistance among enterococcal isolates. ribosomal resistance to the 2-deoxystreptamine aminoglycosides (gentamicin, tobramycin, amikacin) appears to be uncommon and may require multiple mutations in that these aminoglycosides bind at several sites on both the 30s and 50s subunits of the ribosome [23]. vancomycin and other glycopeptide antibiotics such as teicoplanin bind to d-alanine-d-alanine, which is present at the termini of peptidoglycan precursors. the large glycopeptide molecules prevent the incorporation of the precursors into the cell wall. resistance of enterococci to vancomycin has been classified as a-g based upon the genotype, type of target site modification and level of resistance to vancomycin and teicoplanin [38]. strains of e. faecium and e. faecalis with high-level resistance to both vancomycin and teicoplanin have class a resistance. class a resistance is mediated by the vana gene cluster found on an r plasmid. 
the protein encoded by this cluster synthesizes peptidoglycan precursors that have a depsipeptide terminus (d-alanine-d-lactate) instead of the usual d-alanine-d-alanine. the modified peptidoglycan binds glycopeptide antibiotics with reduced affinity, thus conferring resistance to vancomycin and teicoplanin. the other classes of vancomycin resistance genes vary in level of resistance, species distribution and specific cell wall alterations [23, 38]. vancomycin-intermediate strains of resistant s. aureus (visa) have been isolated with heterogeneous resistance patterns. visa strains express unusually thick peptidoglycan cell walls that are less completely cross-linked together. the cell wall in some strains of visa contains non-amidated glutamine precursors that provide an increased number of false binding sites to vancomycin [39]. the vancomycin molecules are absorbed to these excess binding sites, thereby reducing vancomycin concentrations at the growth point of peptidoglycan synthesis along the inner surface of the cell wall. the arrival of high-level vancomycin resistance from vana-expressing s. aureus [17] has created a renewed sense of urgency in the need to develop novel strategies to combat multi-drug resistant bacterial pathogens. beta-lactam antibiotics inhibit bacteria by binding covalently to penicillin-binding proteins (pbps) in the cytoplasmic membrane. these target proteins catalyze the synthesis of the peptidoglycan that forms the cell wall of bacteria. in gram-positive bacteria, resistance to beta-lactam antibiotics may occur by a decrease in the affinity of the pbp for the antibiotic or with a change in the amount of pbp produced by the bacterium [23]. these low-affinity pbps may be inducible, with their production stimulated by exposure of the microorganism to the beta-lactam drug [40]. the structural gene (meca) that determines the low-affinity pbp of mrsa shares extensive sequence homology with a pbp of e. 
coli, and the genes that regulate the production of the low-affinity pbp have considerable sequence homology with the genes that regulate the production of staphylococcal penicillinase [23]. the pbps of beta-lactamase-negative penicillin-resistant strains of n. gonorrhoeae, n. meningitidis, and haemophilus influenzae have shown reduced penicillin-binding affinity [41]. their pbps appear to be encoded by hybrid genes containing segments of dna scavenged from resistant strains of related species, similar to penicillin-resistant pneumococci [23]. dna gyrase (also known as bacterial topoisomerase ii) is necessary for the supercoiling of chromosomal dna in bacteria in order to have efficient cell division [23]. another related enzyme, topoisomerase iv, is also required for segregation of bacterial genomes into two daughter cells during cell division. these enzymes consist of two a subunits encoded by the gyra gene and two b subunits encoded by the gyrb gene (or parc and pare for topoisomerase iv). although spontaneous mutation of the a subunit encoded by the gyra locus is the most common cause of resistance to multiple fluoroquinolones in enteric bacteria, b-subunit alterations may also affect resistance to these drugs. dna gyrase (topoisomerase ii) is the primary site of action in gram-negative bacteria whereas topoisomerase iv is the principal target of quinolones in gram-positive bacteria. mutations in a variety of chromosomal loci have been described that resulted in altered dna gyrases resistant to nalidixic acid and the newer fluoroquinolones in enterobacteriaceae and p. aeruginosa. many of these mutations involve the substitution of single amino acids at key enzymatic sites (located between amino acids 67-106 in the gyrase a subunit) that are involved in the generation of the dna gyrase-bacterial dna complex [42]. there are two common genes that mediate resistance to sulfa drugs in a wide variety of pathogenic bacteria. these are known as sul1 and sul2. 
these genes give rise to altered forms of the target enzyme for sulfonamide, dihydropteroate synthase (dhps) [43]. the altered dhps enzymes mediated by the sulfonamide resistance genes no longer bind to sulfa yet continue to synthesize dihydropteroate from para-aminobenzoic acid substrate. trimethoprim is a potent inhibitor of bacterial dihydrofolate reductase (dhfr). a large number of altered dhfr enzymes with loss of inhibition by trimethoprim have been described from genes found primarily on r plasmids. these altered dhfr genes are widespread in gram-negative bacteria and are also found in staphylococci (the dfra gene) [44]. tetracycline resistance may be mediated by a mechanism that interferes with the ability of tetracycline to bind to the ribosome. the ubiquitous tetm resistance gene and related tetracycline resistance determinants protect the ribosome from tetracycline action. the tetm gene generates a protein with elongation factor-like activity that may stabilize ribosomal transfer rna interactions in the presence of tetracycline molecules [45]. sulfonamides compete with para-aminobenzoic acid to bind the enzyme dihydropteroate synthase, and thereby block the folic acid synthesis necessary for nucleic acid synthesis. sulfonamide resistance may be mediated in some bacteria by the overproduction of the synthetic enzyme dihydropteroate synthase. the gene responsible for dhps is folp, and strains of bacteria that produce excess dhps can overwhelm sulfa inhibition [43]. trimethoprim resistance may also occur in a similar fashion, by making excess amounts of dihydrofolate reductase from the bacterial chromosomal gene fola [44]. an unusual mechanism of resistance to specific antibiotics is by the development of auxotrophs, which have specific growth factor requirements not seen in wild-type strains. 
these mutants require substrates that normally are synthesized by the target enzymes, and thus if the substrates are present in the environment, the organisms are able to grow despite inhibition of the synthetic enzyme by an antibiotic. bacteria that lose the enzyme thymidylate synthetase are 'thymine dependent'. if they can acquire exogenous supplies of thymidine to synthesize thymidylate via salvage pathways from the host, they are highly resistant to sulfa drugs and trimethoprim [23]. once an antibiotic resistance gene evolves, the resistance determinant can disseminate among bacterial populations by transformation, transduction, conjugation, or transposition. favored clones of bacteria then proliferate in the flora of patients who receive antibiotics. antibiotic-resistance genes were found among bacteria even in the pre-antibiotic therapy era [23]. however, selection pressures placed upon microbial populations by a highly lethal antimicrobial compound create an environment in which individual clones that resist the antibiotic are markedly favored. these resistant populations then proliferate and rapidly replace other susceptible strains of bacteria. while some antibiotic resistance genes place a metabolic 'cost' on bacteria, many microorganisms have evolved strategies to limit this cost by limiting expression, alternate gene products or phase variation. these mechanisms allow favorable but sometimes 'costly' genes that mediate antibiotic resistance to persist in the absence of continued antibiotic selection pressure and yet be rapidly expressed upon re-exposure to antibiotics [46]. plasmids are particularly well adapted to serve as agents of genetic evolution and r-gene dissemination. plasmids are extrachromosomal genetic elements that are made of circular double-stranded dna molecules that range from less than 10 to greater than 400 kilobase pairs and are extremely common in clinical isolates of bacterial pathogens. 
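the selection dynamics described above (resistant clones replacing susceptible ones under antibiotic pressure, and a metabolic 'cost' of resistance once the drug is withdrawn) can be illustrated with a minimal sketch. it assumes a textbook two-clone model with discrete generations and constant relative fitness values; the function name and all fitness numbers are hypothetical and chosen only for illustration, and none of them are taken from the cited studies.

```python
def resistant_fraction(p0, w_resistant, w_susceptible, generations):
    """Track the resistant fraction of a two-clone population.

    Assumed model (not from the source text): each generation the
    resistant fraction p is reweighted by relative fitness,
        p' = p * w_r / (p * w_r + (1 - p) * w_s)
    """
    fractions = [p0]
    p = p0
    for _ in range(generations):
        mean_fitness = p * w_resistant + (1.0 - p) * w_susceptible
        p = p * w_resistant / mean_fitness
        fractions.append(p)
    return fractions


# Hypothetical numbers: under antibiotic exposure the susceptible clone
# is assumed to leave one tenth as many descendants per generation.
with_drug = resistant_fraction(
    p0=1e-6, w_resistant=1.0, w_susceptible=0.1, generations=10
)

# Hypothetical numbers: without antibiotic, resistance is assumed to
# carry a 5% fitness cost, so the resistant clone slowly declines.
without_drug = resistant_fraction(
    p0=0.5, w_resistant=0.95, w_susceptible=1.0, generations=10
)

print(f"resistant fraction after 10 generations with drug: {with_drug[-1]:.4f}")
print(f"resistant fraction after 10 generations without drug: {without_drug[-1]:.4f}")
```

in this toy model a one-in-a-million resistant clone comes to dominate within ten generations of strong selection, whereas a modest fitness cost erodes it only slowly once the drug is removed. real populations also depend on regulated expression, phase variation and gene transfer, which the sketch ignores.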
although multiple copies of a specific plasmid or multiple different plasmids, or both, may be found in a single bacterial cell, closely related plasmids often cannot coexist in the same cell. this observation has led to a classification scheme of plasmids based upon incompatibility groups [23]. plasmids may determine a wide range of functions besides antibiotic resistance, including virulence and metabolic capacities. plasmids are autonomous, self-replicating genetic elements that possess an origin for replication and genes that facilitate their stable maintenance in host bacteria. conjugative plasmids require additional genes that can initiate self-transfer. the transfer of plasmid dna between bacterial species is a complex process, and thus conjugative plasmids tend to be larger than non-conjugative ones. some small plasmids may be able to utilize the conjugation apparatus of a coresident conjugative plasmid. many plasmid-encoded functions enable bacterial strains to persist in the environment by resisting noxious agents, such as heavy metals. mercury released from dental fillings may increase the number of antibiotic-resistant bacteria in the oral flora. hexachlorophene and other topical bacteriostatic agents in the environment may actually promote plasmid-mediated resistance to these agents and other antimicrobial agents [47]. transposons are specialized sequences of dna that are mobile and can translocate as a unit from one area of the bacterial chromosome to another. they can also move back and forth between the chromosome and plasmid or bacteriophage dna. transposable genetic elements possess a specialized system of recombination that is independent of the generalized recombination system that permits recombination of largely homologous sequences of dna by crossover events (the reca system of bacteria). 
the reca-independent recombination system ('transposase') of transposable elements usually acts in a random fashion between non-homologous dna sequences and results in whole-scale modifications of large sequences of dna as a single event [23]. there are two types of transposable genetic elements, transposons and insertion sequences. these mobile sequences probably play an important physiologic role in genetic variation and evolution in prokaryotic organisms. transposons differ from insertion sequences in that they mediate a recognizable phenotypic marker such as an antibiotic-resistance trait. either element can translocate as an independent unit. both elements are flanked on either end by short identical sequences of dna in reverse order (inverted repeats). these inverted-repeat dna termini are essential to the transposition process. transposons and insertion sequences must be physically integrated with chromosome, bacteriophage, or plasmid dna in order to be replicated and maintained in a bacterial population. some transposons have the capability to move from one bacterium to another without being transferred within a plasmid or bacteriophage. these conjugative transposons are found primarily in aerobic and anaerobic gram-positive organisms and can rapidly and efficiently spread antibiotic resistance genes [30, 48]. transposition, like point mutation, is a continuous and ongoing process in bacterial populations. transposons are also essential in the evolution of r plasmids that contain multiple antibiotic-resistance determinants [47]. high-level vancomycin resistance (vana) in enterococci is mediated by a composite transposon that encodes a series of genes needed to express vancomycin resistance [38]. single transposons may encode multiple antibiotic-resistance determinants within their inverted-repeat termini as well [23]. genetic exchange of antibiotic-resistance genes occurs between bacteria of widely disparate species and different genera. 
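the inverted-repeat termini of transposons and insertion sequences described above can be made concrete with a short sketch. it assumes the usual convention that, read along a single strand, the right terminus is the reverse complement of the left terminus. the function names and the toy sequence are hypothetical and do not correspond to any real transposon.

```python
# Watson-Crick base pairing used to build the complementary strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_inverted_repeat_termini(element, repeat_length):
    """Check whether the last repeat_length bases of element are the
    reverse complement of its first repeat_length bases."""
    if repeat_length <= 0 or 2 * repeat_length > len(element):
        return False
    left = element[:repeat_length].upper()
    right = element[-repeat_length:].upper()
    return right == reverse_complement(left)


# Hypothetical toy element: an 8-base left terminus, a short internal
# region standing in for a resistance gene, and a matching right terminus.
toy_element = "GGCCAGTC" + "ATGAAACGT" + "GACTGGCC"

# The same termini arranged as a direct repeat, for contrast.
direct_repeat_element = "GGCCAGTC" + "ATGAAACGT" + "GGCCAGTC"

print(has_inverted_repeat_termini(toy_element, 8))
print(has_inverted_repeat_termini(direct_repeat_element, 8))
```

the first call reports inverted-repeat termini and the second does not, because a direct repeat is not the reverse complement of the left end. real inverted repeats are often imperfect, so an analysis of actual sequence data would need to tolerate mismatches, which this sketch does not.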
identical aminoglycoside-resistance genes can spread between gram-negative and gram-positive bacteria and between aerobic and anaerobic bacteria [49]. given the highly variable environmental selection pressures created by a wide variety of antibiotics and the plasticity of bacterial genomes, the ongoing evolution of multi-drug resistant bacterial organisms is probably inevitable [23]. the structural genes that mediate antibiotic resistance are often closely linked and may exist in tandem along the bacterial chromosome or plasmid. genetic analysis of sequences of dna adjacent to resistance genes has identified unique integration units near promoter sites [50]. these integration regions are known as integrons, and they function as convenient recombinational 'hot spots' for site-specific recombination events between largely non-homologous sequences of dna. the integron provides its own integrase function [94] with a common attachment and integration site for acquisition of foreign dna sequences. integrons are widespread in bacterial populations and provide a convenient site for insertion of multiple different resistance genes from foreign dna sources. there are four classes of integrons with type i integrons being the most common in pathogenic microorganisms [23]. integrons also serve as efficient expression cassettes for resistance genes. integrons possess a promoter site in close proximity to the 5' end of the newly inserted dna sequence. numerous clusters of different resistance genes have been linked into integrons through specific insertion sites. integrons may have as many as five resistance genes linked in sequence and flanked between specific 59 base-pair spacer units [50, 51]. integron-mediated multiple resistance gene cassettes have been flanked by transposons, mobilized to plasmids, and then transferred between bacterial species by conjugation. 
by these systems of genetic exchange, widespread dissemination of multiple antibiotic resistance genes is accomplished in a rapid and frighteningly efficient manner [50]. for some time concerned scientists have been warning about the possibility of widespread antibiotic resistance leading to the loss of effectiveness of antibiotics in clinical medicine [49] [50] [51] [52]. these warnings have largely been ignored as it was assumed that this human need and the profit motive of free enterprise would stimulate pharmaceutical companies to continuously develop new antibiotics. if we could discover new targets for future antimicrobial drugs it may be possible to keep pace with or even exceed the rate of antibiotic resistance gene development by microbial pathogens. for a number of disconcerting reasons, humans may be losing ground rather than gaining on pathogens in the 21st century. a recent survey of new pharmaceutical products in 2002 found only five new antibiotics out of the 506 new molecular entities in the research and development pipeline [52]. the pace of new antibiotic discovery has slowed to a trickle compared to what it was even 20 years ago [53]. the market reality is regrettably set against the development of new antibiotics in favor of more lucrative options with greater market profit from drugs for chronic illnesses with less risk and longer revenue streams [52] [53] [54]. the reimbursement and return on investments are unfavorable for antibiotics and the market system is not meeting the needs of society with respect to new antibiotic development. some far-reaching and bold initiatives are desperately needed if a crisis in loss of antibiotic effectiveness is to be avoided [52, 53]. the disincentives for new antibiotic development and some proposed solutions are listed in table 4. bacterial strains contain complex aggregations of genes that may be linked together to combat the inhibitory effects of antibiotics. 
since prokaryotic organisms all contribute to a common 'gene pool', favorable genes mediating antibiotic resistance may disseminate among diverse microbial genera and species. increasing evidence of multiple antibiotic resistance mechanisms within the same bacterium against a single type of antibiotic, and cooperation between bacterial populations within biofilms, attest to the remarkable ingenuity and flexibility of bacterial populations [23, 29, 30]. thus the use of one antibiotic may select for the emergence of resistance to another. mobile genetic elements and rapidly evolving integron cassettes with multiple antibiotic resistance genes endow bacteria with a remarkable capacity to resist antibiotics [50]. although the development of antibiotic resistance may be inevitable, the rate at which it develops can be reduced by the rational use of antibiotics. table 4 (fragment). disincentive: use of 19th century diagnostic methods (culture and susceptibility tests) to treat 21st century diseases encourages empiric broad-spectrum antibiotic drug use. proposed solution: employ real-time pcr, genomics, and proteomics to identify pathogens and resistance genes; target and treat specific infections with narrow-spectrum drugs. pcr: polymerase chain reaction. the wider accessibility of molecular techniques and computer technology to rapidly identify the specific microorganisms and their resistance potential, and to track their spread between patients within the hospital and/or the community, will be of considerable benefit in the control of antibiotic resistance. the need to utilize empiric, broad-spectrum antibiotics for days and even weeks while samples are being sent for culture and susceptibility testing needs to stop. we need specific information in real time to assure patients with specific infections are being treated with effective, narrow-spectrum drugs [23]. the use of antibiotics for non-medical purposes should be entirely banned. 
up to 50% of antibiotic use today is for non-medical use in agriculture, food preparation, and other industrial uses [52]. this adds to environmental contamination with low levels of antibiotics. sub-inhibitory concentrations of antibiotics foster the development of resistant clones of bacteria that can cause infections in humans. the use of non-antibiotic approaches to the management of infectious diseases needs to be supported and developed. the use of plasma-based antibody therapies and anti-bacterial, anti-viral and anti-fungal vaccines should be encouraged in the future [55] [56] [57]. the management of common invasive pathogens such as staphylococcal infections has become very complicated given the rapid spread of simultaneous beta-lactam, aminoglycoside, and quinolone-resistant isolates [58]. recent reports of vancomycin-resistant s. aureus in japan and the united states suggest that common, invasive, microbial pathogens may become refractory to any chemotherapeutic agent in the future [17, 23, 58]. new drug discoveries have allowed us to be one step ahead of the bacterial pathogens for the latter half of the twentieth century. it is unlikely we will continue this record of remarkable success against microbial pathogens in the new millennium. the rapid evolution of resistance has limited the duration of the effectiveness of antibiotics against certain pathogens. the best hope for the future is the continued development of new antibiotic strategies [53]. in order to retain the antimicrobial activity of existing and new antibiotics, clinicians can assist through careful antibiotic stewardship and tightened infection control measures. antimicrobial agents have had a substantial impact in decreasing human morbidity and mortality rates and have served us well over the antimicrobial era. it behooves us to improve our diagnostic and surveillance efforts and to exercise caution in administering antibiotics if we are to maintain their continued efficacy. 
forecast and control of epidemics in a globalized world multidrug resistance in yersinia pestis mediated by a transferable plasmid h5n1 influenza: a protean pandemic threat induction of proinflammatory cytokines in human macrophages by influenza a (h5n1) viruses: a mechanism for the unusual severity of human disease enhanced virulence of influenza a viruses with the haemagglutinin of the 1918 pandemic virus the structure and receptor binding properties of the 1918 influenza hemagglutinin viral shedding patterns of coronavirus in patients with probable severe acute respiratory syndrome planning for epidemics -the lessons of sars public health measures to control the spread of the severe acute respiratory syndrome during the outbreak in toronto multistate outbreak of monkeypox-illinois westward ho? -the spread of west nile virus death from the nile crosses the atlantic: the west nile fever story hemorrhagic fever viruses as biological weapons death due to bioterrorism-related anthrax: report of two patients isolation and partial molecular characterization of a strain of ebola virus during a recent epidemic of viral haemorrhagic fever in gabon staphylococcus aureus resistant to vancomycin-united states oseltamivir-resistant influenza? 
resistant influenza a viruses in children treated with oseltamivir: descriptive study resistance mechanisms in clinical isolates of candida albicans creutzfeldt-jakob disease and related transmissible spongiform encephalopathies outbreaks of avian influenza a (h5n1) in asia and interim recommendations for evaluation and reporting of suspected cases -united states mechanisms of antibiotic resistance a functional classification scheme for β-lactamases and its correlation with molecular structure error-prone polymerase, dna polymerase iv, is responsible for transient hypermutation during adaptive mutation in escherichia coli pseudomonas biofilm formation and antibiotic resistance are linked to phenotypic variation infection with vancomycin-resistant staphylococcus aureus containing the vana resistance gene multifocal outbreaks of metallo-β-lactamase-producing pseudomonas aeruginosa resistant to broad-spectrum β-lactams, including carbapenems antimicrobial resistance in intensive care units the origins and molecular basis of antibiotic resistance acquisition of 16s rrna methylase gene in pseudomonas aeruginosa interactions of outer membrane proteins 0-8 and 0-9 with peptidoglycan sacculus of escherichia coli k-12 interplay of impermeability and chromosomal beta-lactamase activity in imipenem-resistant pseudomonas aeruginosa membrane potential and gentamicin uptake in staphylococcus aureus drug efflux as a mechanism of resistance active efflux, a common mechanism for biocide and antibiotic resistance inner membrane efflux components are responsible for β-lactam specificity of multidrug efflux pumps in pseudomonas aeruginosa genetic characterization of vang, a novel vancomycin resistance locus for enterococcus faecalis contribution of a thickened cell wall and its glutamine nonamidated component to the vancomycin resistance expressed by staphylococcus aureus m450 transition from resistance to hypersusceptibility to beta-lactam antibiotics associated with loss 
of a low-affi nity penicillin-binding protein in a streptococcus faecium mutant highly resistant to penicillin penicillin-binding proteins and ampicillin resistance in haemophilus infl uenzae activity of quinolones against gram-positive cocci: mechanisms of drug action and bacterial resistance sulfonamide resistance in haemophilus infl uenzae mediated by acquisition of sul2 or a short insertion in chromosomal folp trimethoprim resistance high-level tetracycline resistance in neisseria gonorrhoeae is result of acquisition of streptococcal tetm determinant phenotypic switching of antibiotic resistance circumvents permanent costs in staphylococcus aureus plasmid-determined resistance to antimicrobial drugs and toxic metal ions in bacteria conjugative transfer of staphylococcal antibiotic resistance markers in the absence of detectable plasmid dna dispersal in campylobacter spp. of apha-3, a kanamycin resistance determinant from gram-positive cocci origins of the mobile gene cassettes found in integrons multidrug resistance among enterobacteriaceae is strongly associated with the presence of integrons and is independent of species or isolate origin trends in antimicrobial drug development antibiotics at the crossroads safety and antigenicity of non-adjuvanted and mf59-adjuvanted infl uenza a/duck/singapore/97 (h5n3) vaccine; a randomized trial of two potential vaccines against h5n1 infl uenza antibody concentration and clinical protection after hib conjugate vaccination in the united kingdom immunogenicity and impact on nasopharyngeal carriage of a nonavalent pneumococcal conjugate vaccine antimicrobial resistance: the example of staphylococcus aureus key: cord-275720-kf9m4zho authors: cho, won kyong; yu, jisuk; lee, kyung-mi; son, moonil; min, kyunghun; lee, yin-won; kim, kook-hyung title: genome-wide expression profiling shows transcriptional reprogramming in fusarium graminearum by fusarium graminearum virus 1-dk21 infection date: 2012-05-06 journal: bmc genomics 
doi: 10.1186/1471-2164-13-173 sha: doc_id: 275720 cord_uid: kf9m4zho background: fusarium graminearum virus 1 strain-dk21 (fgv1-dk21) is a mycovirus that confers hypovirulence to f. graminearum, which is the primary phytopathogenic fungus that causes fusarium head blight (fhb) disease in many cereals. understanding the interaction between mycoviruses and plant pathogenic fungi is necessary for preventing damage caused by f. graminearum. therefore, we investigated important cellular regulatory processes in a host containing fgv1-dk21 as compared to an uninfected parent using a transcriptional approach. results: using a 3′-tiling microarray covering all known f. graminearum genes, we carried out genome-wide expression analyses of f. graminearum at two different time points. at the early point of growth of an infected strain as compared to an uninfected strain, genes associated with protein synthesis, including ribosome assembly, nucleolus, and ribosomal rna processing, were significantly up-regulated. in addition, genes required for transcription and signal transduction, including fungal-specific transcription factors and camp signaling, respectively, were actively up-regulated. in contrast, genes involved in various metabolic pathways, particularly in producing carboxylic acids, aromatic amino acids, nitrogen compounds, and polyamines, showed dramatic down-regulation at the early time point. moreover, genes associated with transport systems localizing to transmembranes were down-regulated at both time points. conclusion: this is the first report of global change in the prominent cellular pathways in the fusarium host containing fgv1-dk21. the significant increase in transcripts for transcription and translation machinery in fungal host cells seems to be related to virus replication. 
in addition, significant down-regulation of genes required for metabolism and transporting systems in a fungal host containing the virus appears to be related to the host defense mechanism and fungal virulence. taken together, our data aid in the understanding of how fgv1-dk21 regulates the transcriptional reprogramming of f. graminearum. fusarium graminearum (teleomorph gibberella zeae) is a well known phytopathogenic fungus associated with fusarium head blight (fhb) disease, which causes blights, root rots, or wilts, especially in economically important cereal crops such as wheat, maize, and barley [1] . fhb is considered an important fungal disease because it drastically reduces grain yield and quality, and produces mycotoxins such as deoxynivalenol (don) and nivalenol (niv) in cereals, which are very harmful to human and animal health [2, 3] . the fungus can also infect several dicotyledonous plants including arabidopsis, tobacco, tomato, and soybean [4] . viruses that infect plant fungi are referred to as mycoviruses. infection by some mycoviruses confers hypovirulence by attenuating pathogenicity to their fungal hosts, which are mostly plant pathogens. mycoviruses tend to be double-stranded rna (dsrna) viruses [5] , and several fusarium-infecting mycoviruses have been isolated [6] [7] [8] . in addition, several whole genome sequences of dsrna mycoviruses strains derived from f. graminearum have recently been reported [9] [10] [11] [12] . in many cases, such as those of f. poae (fusarium poae virus 1, fpv1) [7] and f. solani (fusarium solani virus 1, fsv1), viral infection is not associated with phenotypic changes [8] . however, fusarium graminearum virus 1 strain-dk21 (fgv1-dk21) exhibits interesting phenotypes including reduced mycelial growth and the induction of dark red pigmentation [6] . several previous studies have provided strong evidence that hypovirulent mycoviruses could be used as substitutes for fungicides [13, 14] . 
a recent study demonstrated that protoplast fusion is the most efficient approach for transmitting mycoviruses among a wide range of phytopathogenic fungi and that this approach will facilitate the use of mycoviruses as a biocontrol agent [15] . with the increasing availability of whole genome sequences for representative plant fungal pathogens [16] , extensive and diverse genome-wide analyses can be performed, including transcriptomics, proteomics, and metabolomics [17] . proteomics approaches for different fusarium species have enabled examinations of extracellular proteins, proteins involved in fumonisin biosynthesis, and proteome profiles upon antagonistic rhizobacteria inoculation and mycovirus infection [18] [19] [20] [21] . several gene expression analyses based on microarrays have also been conducted [21] [22] [23] [24] [25] . for example, genome-wide expression profiling of f. graminearum was carried out to examine responses to treatment with azole fungicide tebuconazole and during perithecium development [22, 24] . microarrays provide a valuable tool for detecting and identifying fusarium species that produce specific metabolites such as trichothecene and moniliformin [23, 25] . moreover, the recently completed genome sequencing of three major fusarium species provides an important resource for studying pathogenicity and functions of individual genes [26] . several microarray-based studies have demonstrated transcriptional changes in fungal genes following mycovirus infection, although most of these studies examined only chv1-713 infecting the chestnut blight fungus cryphonectria parasitica. initially, a polymerase chain reaction (pcr)-based approach demonstrated that elevation of camp levels by chv1-713 resulted in reduced accumulation of the gtp-binding (g) protein subunit cpg-1 [25] . in addition, cdna microarrays containing 2,200 genes from c. 
parasitica showed transcriptional changes in g-signaling pathways following infection by hypoviruses that differ in virulence or phenotype [27] [28] [29]. infection by a virus leads to changes in diverse biological processes involving fungal host and viral factors. it is of interest to examine such alterations at the molecular level. however, no previous reports have examined expression differences between a fungus containing a mycovirus and its uninfected parent, aside from two papers that used cdna microarray chips based on expressed sequence tags to examine fungal host gene expression upon mycovirus infection [28, 29]. here, we examined genome-wide transcriptional differences in f. graminearum between a strain harboring fgv1-dk21 and its uninfected parent. this is the first report of a genome-wide fungal gene expression analysis during mycovirus infection using a 3′-tiling microarray, and our findings show global differences in host cellular pathways in f. graminearum harboring fgv1-dk21.
genome-wide 3′-tiling microarray to identify differentially expressed genes in f. graminearum harboring fgv1-dk21
the virus-infected f. graminearum exhibited strong inhibition of mycelial growth as well as reduced levels of don at 7 days after inoculation (figure 1). to visualize how gene expression patterns were affected at different time points, we generated scatterplots (figure 2).
figure 1 footnotes: a, radial growth was measured on pda 120 h after inoculation. b, the number of spores was measured in cmc broth at 25°c for 9 days; the number of spores indicated represents the number per 1 ml; data were described previously [6]. c, production of deoxynivalenol and 15-acetyldeoxynivalenol; for toxin analysis, all strains were grown in minimal medium supplemented with 5 mm agmatine and incubated for 7 days. d, n/d: not detected. *, the value is significantly different from that of the virus-free isolate as determined by a student's t-test. 
figure 2. differentially expressed f. graminearum genes during fgv1-dk21 infection identified by microarray. scatterplots of normalized signal intensities 36 h (a) and 120 h (b) after inoculation with fgv1-dk21 as compared to virus-free samples. all signal intensities were converted to a log10 scale. the diagonal lines indicate a two-fold change. the venn diagrams illustrate the total number of genes that were significantly differentially expressed at 36 h and 120 h (c), down-regulated genes (d), and up-regulated genes (e). the heat map shows the expression patterns of 384 genes that were hierarchically clustered and differentially expressed at both 36 and 120 h (f). red and green indicate up-regulation and down-regulation, respectively.
interestingly, the scatterplots showed that there were no significant differences in the number of differentially expressed genes between 36 h and 120 h; however, it appeared that the changes in gene expression at 120 h were somewhat more extensive than those at 36 h (figure 2). to identify differentially expressed genes, we first performed hierarchical clustering, which identified gene sets of significantly differentially expressed genes at two different time points (additional file 1: table s1 and additional file 2: table s2). most of the identified genes showed at least two-fold differential expression. a total of 1775 genes, representing 13.3% of 13,382 genes, were differentially expressed at one or both time points (figure 2c), with 1109 (5.4%) and 1050 genes (5%) identified as differentially expressed at 36 h and 120 h, respectively (figure 2c). moreover, 384 genes (3%) were differentially expressed at both time points (figure 2c). we further analyzed the lists of differentially expressed genes according to those down-regulated vs. those up-regulated (figure 2d, e). genes with an adjusted p-value less than 0.05 were selected as differentially expressed genes. 
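the selection criteria mentioned above can be sketched in a few lines. this is a minimal illustration, not the authors' pipeline: the text only states that genes with an adjusted p-value below 0.05 were selected and that most of them changed at least two-fold, so requiring the two-fold change here is an assumption, and the function names and intensity values are hypothetical. on a log10 scatterplot, the two-fold diagonals sit at an offset of log10(2) on either side of the identity line.

```python
import math

# Offset of the two-fold diagonals from the identity line on a log10 scale.
TWO_FOLD = math.log10(2.0)


def log10_ratio(infected, virus_free):
    """log10 of the infected / virus-free normalized signal intensity."""
    return math.log10(infected / virus_free)


def classify(infected, virus_free, adj_p, alpha=0.05):
    """Label one gene 'up', 'down', or 'unchanged'.

    Assumption: a gene must pass both the adjusted p-value cutoff and
    the two-fold change cutoff; the source only requires the former.
    """
    if adj_p >= alpha:
        return "unchanged"
    ratio = log10_ratio(infected, virus_free)
    if ratio >= TWO_FOLD:
        return "up"
    if ratio <= -TWO_FOLD:
        return "down"
    return "unchanged"


# Hypothetical intensities, for illustration only.
print(classify(400.0, 100.0, adj_p=0.01))
print(classify(150.0, 100.0, adj_p=0.01))
```

the adjusted p-values are taken as given here; the correction method is not stated in this part of the text.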
across the two time points, 940 genes (7%) were down-regulated whereas 904 genes (7%) were up-regulated (figure 2d, e). among the down-regulated genes, there were more differentially expressed genes at 120 h (561 genes) than at 36 h (499 genes) (figure 2d). in contrast, more genes were induced at 36 h (610 genes) than at 120 h (489 genes) (figure 2e). moreover, 120 genes and 195 genes were shared between the two time points in the groups of down-regulated and up-regulated genes, respectively (figure 2d, e). we then investigated the expression patterns of the 384 genes that were differentially expressed at both time points using hierarchical clustering, which sorted the genes into four groups according to expression patterns (figure 2f). group a contained 36 genes that were highly down-regulated at 36 h but were up-regulated at 120 h. in contrast, the 33 genes belonging to group b were highly induced at 36 h and subsequently down-regulated (figure 2f). group c included 120 genes that were strongly repressed regardless of virus infection time, and group d contained 195 genes that showed consistently elevated gene expression across both time points.
representative genes showing significant expression change and real-time validation of the microarray data by quantitative real-time reverse transcription pcr
representative fungal genes that showed significant changes in gene expression are listed in table 1. at 36 h, several genes encoding enzymes including phospholipase/carboxylesterase, polyketide synthase, eukaryotic aspartyl protease and dipeptidyl aminopeptidases were highly induced, whereas transcripts involved in transport, such as amino acid transporter permease and abc transporter, were up-regulated at 120 h (table 1). in contrast, several of the repressed genes at 36 h included maltose transporter, linoleate diol synthase, and genes with unknown functions, whereas those encoding ferric reductase and abhydrolase 3 were strongly repressed at 120 h (table 1). 
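the gene counts reported for figure 2c-f can be cross-checked with simple inclusion-exclusion arithmetic. the sketch below uses only numbers stated in the text; the helper names are hypothetical. it also suggests, as an observation rather than a statement from the authors, that the 5.4% and 5% quoted for the two time points match the genes unique to each time point (725 and 666) rather than the full 1109 and 1050.

```python
TOTAL_GENES = 13382  # genes represented on the array, as stated in the text


def union_size(n_first, n_second, n_both):
    """Inclusion-exclusion: size of the union of two gene sets."""
    return n_first + n_second - n_both


def percent(n, total=TOTAL_GENES):
    """Percentage of all genes, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


all_de = union_size(1109, 1050, 384)  # differentially expressed at 36 h or 120 h
down = union_size(499, 561, 120)      # down-regulated at 36 h or 120 h
up = union_size(610, 489, 195)        # up-regulated at 36 h or 120 h

groups = {"a": 36, "b": 33, "c": 120, "d": 195}  # clusters in figure 2f
shared = sum(groups.values())                    # genes changed at both time points
switching = groups["a"] + groups["b"]            # genes that change direction

# Direction-switching genes are counted once in the down-regulated union
# and once in the up-regulated union, so they are subtracted once here.
assert down + up - switching == all_de

print(all_de, percent(all_de))   # 1775 13.3
print(down, up, shared)          # 940 904 384
print(percent(1109 - 384), percent(1050 - 384))  # 5.4 5.0
```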
to validate the microarray data, we selected 20 genes whose expression was significantly affected in the host containing fgv1-dk21, as demonstrated by the microarray analysis, and determined their relative expression by quantitative real-time reverse transcription pcr (qrt-pcr) (figure 3 and additional file 3: table s3 and additional file 4: table s4). genes that showed diverse expression patterns were categorized into different functional classes (figure 3 and additional file 3: table s3). the results of the qrt-pcr were highly consistent with those of the microarray data. for example, according to the qrt-pcr and microarray results, the transcript levels for three genes, fgsg_01379, fgsg_03143, and fgsg_03911, were highly reduced at both 36 h and 120 h, whereas fgsg_03788, fgsg_00023, fgsg_07804, fgsg_07801, and fgsg_13222 were strongly induced regardless of the time point (figure 3a-c). when the mrna level of a gene is too low to quantify, or the p-values from the microarray data are very high, it is highly likely that qrt-pcr results will not correlate with microarray data, as was observed for fgsg_11119 and fgsg_04089 (figure 3a). as compared to the microarray approach, qrt-pcr offers a highly sensitive technique for detecting low amounts of transcripts and provides the transcript level for the gene of interest. for example, the expression intensities for 11 genes were relatively low, ranging from 0 to 2.97 (figure 3a), whereas the amounts of mrna for nad-dependent aldehyde dehydrogenases (fgsg_07801) and dipeptidyl aminopeptidases (fgsg_13222) were very high and ranged from 4424.26 to 5254.08, particularly in the host containing fgv1-dk21 at 36 h (figure 3c).
funcat classification for an overview of the transcriptional reprogramming of f. graminearum harboring fgv1-dk21
we subjected a total of 1775 differentially expressed genes to functional catalogue (funcat) annotation to gain insight into their functional classifications [29]. 
specifically, we divided differentially expressed genes into four groups (table 2 ). more than half of the differentially expressed genes were not assigned to any functional category. specifically, 344 genes (71.9%) in the group of down-regulated genes at 120 h were unclassified. of the functional categories, the vast majority of differentially expressed genes were associated with metabolism ( table 2 ). note that 118 genes (28.2%) were downregulated while 81 genes (22.9%) were up-regulated at 36 h, whereas there were more up-regulated genes (95) than down-regulated genes (79) at 120 h. based on the number of differentially expressed genes, it is likely that genes involved in various metabolic pathways were severely repressed at 36 h and then were gradually induced at 120 h. along with a gene set for metabolism, genes associated with energy were highly down-regulated (26 genes) at 36 h (table 2 ). in contrast, genes involved in transcriptional and translational machinery were dominantly up-regulated early after fgv1-dk21 infection. for example, genes associated with the cell cycle and dna processing (21 genes), transcription (46 genes), protein synthesis (35 genes), protein fate (18 genes), and those encoding proteins with binding function (81 genes) were highly upregulated at 36 h. the number of down-regulated genes associated with cellular transport at 36 h was almost twice that of up-regulated genes. conversely, the number of upregulated genes that govern cellular transport was similar to that of the down-regulated genes at 120 h. next, we analyzed the enriched gene ontology (go) terms of the differentially expressed genes. the identified enriched go terms are listed in additional file 5: table s5 . the directed acyclic graph (dag) illustrates the go terms that were over-represented (figures 4, 5, 6, 7, 8, and 9) . 
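for readers unfamiliar with how a go term comes to be called over-represented, the idea can be sketched with a one-sided hypergeometric test. this is an assumption made for illustration: this part of the text does not state which enrichment test or software was used, and the function name and example counts below are hypothetical. the test asks how likely it is to see at least k annotated genes in a list of n_drawn differentially expressed genes, given that n_term of the n_total genes in the genome carry the term.

```python
from math import comb


def hypergeom_upper_tail(k, n_total, n_term, n_drawn):
    """P(X >= k) for a hypergeometric draw without replacement.

    n_total: genes in the background (e.g. all genes on the array)
    n_term:  background genes annotated with the GO term
    n_drawn: genes in the differentially expressed list
    k:       genes in that list annotated with the GO term
    """
    upper = min(n_term, n_drawn)
    favourable = sum(
        comb(n_term, i) * comb(n_total - n_term, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


# Small hypothetical example: 10 genes, 4 annotated, 3 drawn, 2 or more hits.
print(hypergeom_upper_tail(2, 10, 4, 3))
```

in practice the resulting p-values would be corrected for the number of go terms tested; that step is omitted from this sketch.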
interestingly, go terms related to ribosome biogenesis, such as the ribosome ribonucleoprotein complex assembly (go: 0022618), ribosome assembly (go: 0042255), ribonucleoprotein complex biogenesis (go: 0022613), and ribosome biogenesis (go: 0042254), were highly over-represented (figure 4). moreover, go terms for rna processing (go: 0006396), ncrna processing (go: 0034470), rrna metabolic processes (go: 0016072), rrna processing (go: 0006364), and maturation of ssu-rrna (go: 0030490) were over-represented with high levels of transcripts (figure 5). similarly, go terms related to the nucleolus, such as the small nucleolar ribonucleoprotein complex (go: 0005732), the nuclear lumen (go: 0031981), and nucleolus (go: 0005730), were over-represented (figure 6). transcripts associated with the nucleolus were highly accumulated at 36 h. it was not surprising that the expression of genes involved in a range of metabolic pathways was significantly affected by fgv1-dk21 infection. consistent with the funcat annotation, go enrichment analysis showed that genes associated with a variety of metabolic pathways were significantly down-regulated, particularly at 36 h. for example, over-represented go terms included cellular aromatic compound metabolic processes (go: 0006725). in the cellular component ontology, go terms associated with membranes (go terms 0016020, 0044425, 0031224, and 0016021) were over-represented (figure 6). interestingly, genes associated with membranes were down-regulated regardless of the time point (figure 6). furthermore, transporter activity was one of the significantly over-represented go terms with respect to molecular function. of a total of 17 go terms for transporter activity, 14 were present at 36 h, including carbohydrate transmembrane transporter activity (go: 0015144), ion transmembrane transporter activity (go: 0015075), and sugar:hydrogen symporter activity (go: 0005351) (figure 9). 
at 120 h, three go terms, including primary active transmembrane transporter activity (go: 0015399) and atpase activity, coupled to transmembrane movement of substances (go: 0042626), were over-represented (figure 9).
transcription factors (tfs) play a key role in signal transduction pathways by regulating gene expression to control biological processes [30]. thus, it is of interest to understand their involvement in fungus-virus interactions at the transcriptional level. whole genome sequences of f. graminearum show that there are at least 659 tfs divided into 44 families [31]. we identified more differentially expressed tfs at the early time point (57 tfs) than at the late time point (27 tfs) (additional file 6: table s6). at the early time point, zn2cys6 (42 tfs) was the most prevalent tf family, followed by the c2h2 zinc finger (7 tfs) and bhlh (2 tfs) families. in addition, centromere protein b, dna-binding region; homeodomain-like; lambda repressor-like, dna-binding; nucleic acid-binding, ob-fold; tf jumonji; and bzip were also identified as tfs showing significant change at the transcript level (additional file 6: table s6). zn2cys6 (15 tfs) was also the most prominent tf family at the late time point, followed by c2h2 zinc finger (3 tfs), bhlh (2 tfs), and bzip (2 tfs) (additional file 6: table s6). moreover, the expression of tfs belonging to the myb (a negative transcriptional regulator), tf jumonji, zinc finger (cchc-type), and zinc finger (nf-x1-type) families was significantly changed. recently, a mutant library of 657 putative tfs was established via homologous recombination in f. graminearum, providing a valuable resource to study gene regulation in this fungus [32]. thus, it might be of interest to compare phenotypes of mutants for the 75 differentially expressed tfs. most tf knock-out mutants show distinct phenotypes. 
for example, five mutants, in which members of the c2h2 zinc finger family (fgsg_04083 and fgsg_12837) or the zn2cys6 family (fgsg_04901, fgsg_05789, and fgsg_09464) were deleted, displayed abnormal phenotypes compared to wild-type controls. interestingly, the deletion of fgsg_04083 resulted in a hypervirulence phenotype. the expression of all five of these tfs was strongly up-regulated 36 h after virus infection. two genes (fgsg_12837 and fgsg_09464) were highly expressed at both early and late time points. ascospores were not formed in the fgsg_12837 mutant, while the fgsg_09464 mutant exhibited increased resistance to oxidative stress.
fgv1-dk21 drastically induces the expression of fungal host genes required for their replication
perhaps the most striking finding of this study is that the host containing fgv1-dk21 accumulates transcripts associated with translation machinery, such as ribosome biogenesis and the nucleolus. ribosomes function in the production of proteins, while the nucleolus is the site of ribosomal rna synthesis and ribosome assembly [33]. the nucleolus actively participates in a variety of biological processes, including cell cycle regulation, cell growth, stress sensing, and viral infection [34]. moreover, genes involved in rna processing were highly induced. the products of these genes convert precursor rnas, such as non-coding rna (ncrna) and small subunit (ssu) ribosomal rna precursors, into mature rna molecules. taken together, these findings suggest that the entire complex for protein synthesis and processing in fungal host cells was highly activated by fgv1-dk21. viruses rely on host cell machinery and have evolved sophisticated mechanisms to achieve replication efficiently during virus infection [35]. given that nucleolus-localizing genes were up-regulated, it appears that the virus stimulates gene expression associated with the nucleolus. as a result, the nucleolus produces numerous ribosomes to maximize viral replication. 
in addition, the virus might control protein synthetic machinery in host ribosomes to produce viral proteins. recently, proteomics-based studies confirmed the involvement of the nucleolus in viral infection and replication [36, 37]. previous studies have shown that rna viruses can interact with several nucleolar proteins such as nucleolin, b23, and fibrillarin to facilitate virus replication [38]. thus, it might be of interest to examine the interactions between nucleolar proteins from f. graminearum and fgv1-dk21 viral proteins in future studies. although our data suggest that expression of genes related to ribosomes was strongly affected by virus infection, this might be a common phenomenon in hosts in response to various kinds of viruses. thus, we cannot exclude the possibility that ribosomes are indirectly involved in the replication of dsrna viruses. since little is known about dsrna viruses, we refer to many results from single-stranded (ss) rna viruses to support our data. the system for dsrna viruses might differ from that of ssrna viruses.
metabolism is the core of cellular functions, and comprises numerous reactions that function in the degradation of nutrients and biosynthesis of cellular components including proteins, lipids, carbohydrates, dna, and rna. our results, as well as a previous report on c. parasitica, found dramatic differences in gene expression associated with metabolic pathways [39]. however, there are inherent differences between that study and the present one. specifically, our microarray data demonstrated down-regulation of genes involved in metabolism. in contrast, the previous study found that the majority of metabolites, including lipids and carbohydrates, were significantly accumulated [39]. this difference may be due to different infection times. 
we found the most dramatic changes in gene expression at the early time point, whereas metabolic probing showed that a variety of metabolites accumulated with increasing infection time [39] . differential expression of genes related to metabolism might be associated with the altered host phenotype. for example, in the group of down-regulated genes at 36 h, genes for cell type differentiation were also highly enriched, suggesting that host cell differentiation seems to be induced by viral infection. interestingly, several studies provide evidence that filamentous differentiation in fungi is required for virulence [40] . regardless of viral infection stage, genes related to cell rescue, defense, and virulence were highly up-regulated, suggesting that the host defense system was consistently activated. compared to the whole genome, genes with significantly enriched functions were mostly found in the group showing up-regulation at 36 h, suggesting that the transcriptional regulation in the host harboring mycovirus is more important at the early time point than the late time point. of the altered metabolites, our study as well as a previous report [39] found dramatic changes in gene expression levels for polyamine production. polyamines play roles in many biological processes, such as cell growth, development, and responses to various stresses [41] . thus, it is likely that the down-regulation of genes involved in polyamine biosynthesis during the early stage (36 h) could be correlated with reduced levels of don, which confers hypovirulence to host f. graminearum (figure 1) . a previous study showed that polyamine biosynthesis inhibitors decreased mycelial growth of sclerotinia sclerotiorum [42] . this result is highly consistent with observed phenotypes in virus-infected f. graminearum showing strong inhibition of mycelia growth (figure 1 ). however, we do not know whether the inhibition of mycelial growth in f. 
graminearum is directly related to polyamine biosynthesis.
changes to the membrane-associated transporting system of the host harboring fgv1-dk21
along with the reduced amounts of many metabolites such as carbohydrates, gene expression for the transfer of such metabolites or ions from one side of the membrane to the other was greatly suppressed in the host harboring fgv1-dk21. this indicates that virus infection affects the transport of many micro- and macroelements in host cells at the transcript level. moreover, this transport system mediates cell-to-cell communication within the host via plasma membranes. for example, transcripts required for cellular communication were highly accumulated at 36 h, suggesting that the virus might first stimulate cell-to-cell communication in fungal host cells, which is necessary to trigger host defense mechanisms against viral pathogens. indeed, a recent study reported that infection by the chlorovirus paramecium bursaria chlorella virus 1 affects the transport activity of solutes via plasma membranes in chlorella [43]. plasma membranes are the first barriers to block pathogen attack and can transmit information and molecules between neighboring cells. viruses utilize plasma membranes to interact with signaling molecules. a previous study suggested that virus infection causes depolarization of the host cell membranes, thus decreasing the transport of solutes by active transporters via plasma membranes [43]. the impairment of plasma membranes by viruses suggests that all materials required for virus replication should be recruited within the host cell [43]. taken together, these data suggest that fgv1-dk21 might inhibit the transport of diverse metabolites via plasma membranes to maximize the energy available for virus replication within the nucleolus.
fungal-specific tfs are key players in the gene expression regulation in f. graminearum harboring fgv1-dk21
the expression of members of the zn2cys6 tf family was strongly altered at both the early and late time points. these tfs are known to be fungal-specific, and their functional roles are diverse, including sugar and amino acid metabolism, gluconeogenesis, respiration, vitamin synthesis, cell cycle, chromatin remodeling, nitrogen utilization, peroxisome proliferation, drug resistance, and stress response [30]. the zn2cys6 family is the largest tf family in f. graminearum, comprising 309 tfs. of the nine tfs that were differentially expressed at both the early and late time points, six belong to the zn2cys6 family, indicating that expression of zn2cys6 is necessary in the host harboring fgv1-dk21. furthermore, the enriched go terms for nucleic acid binding that were up-regulated at 36 h provide evidence for transcriptional regulation by tfs such as the zn2cys6 tf family (figure 9). to characterize the functional roles of the zn2cys6 tf family associated with fgv1-dk21, it is necessary to examine the phenotypes of knock-out mutants. therefore, the recently generated f. graminearum deletion mutant lines will be very useful resources for characterizing the functions of host tfs associated with hypovirulence.
fungi utilize camp-mediated signal transduction pathways to recognize and respond to diverse environmental stimuli. camp signaling is implicated in the regulation of hyphal growth, mating, and gluconeogenesis in many fungi [44, 45]. in addition, camp and the g-protein alpha subunit coordinate their activities to regulate differentiation and virulence in some fungi. we found that transcripts for genes associated with responses to camp were highly accumulated in the late stage of virus infection. this result is consistent with previous data that suggested up-regulation of camp levels in the fungal transcriptome by hypovirus infection [46]. thus, we hypothesize that fgv1-dk21 attenuates the pathogenicity of f. 
graminearum via camp-mediated signaling and that this process occurs relatively late after virus infection. recent years have seen extraordinary developments in genome-wide experimental methods. of these, microarray analyses facilitate an understanding of the dynamic gene expression patterns of target organisms during environmental stimuli such as biotic and abiotic stresses. here, given the benefits of the available whole genome sequences of f. graminearum, we generated a 3′ tiling microarray system covering whole genes. to decipher global transcriptional reprogramming in f. graminearum harboring fgv1-dk21 in detail, samples were harvested at two different time points, thus providing lists of differentially expressed genes early and late in the host containing fgv1-dk21 as compared to an uninfected strain. numbers of differentially expressed genes at the early and late time points were comparable, but the gene lists differed, suggesting time-dependent transcriptional changes. genes that were up-regulated at the early time point included those involved in protein synthesis, such as ribosome assembly, as well as nucleolus and ribosomal rna-processing genes, suggesting that fgv1-dk21 strongly modulates translational machinery in f. graminearum to maximize viral replication. moreover, the accumulation of transcripts associated with transcription, such as tfs, indicated that the transcriptional machinery, which is largely regulated by fungal-specific tfs, might be one of the main targets for virus infection. in contrast, genes involved in various metabolic pathways, particularly those that produce carboxylic acids, aromatic amino acids, nitrogen compounds, and polyamines, were highly down-regulated at the early time point (36 h) . interestingly, such components are closely associated with the host defense mechanism. 
these results suggest that fgv1-dk21 suppresses the production of such defense-related components until the transcriptional and translational machinery in host cells have adjusted to fgv1-dk21 replication. in addition, transport systems associated with membranes were severely damaged by hindering the recruitment of materials for viral replication within host cells. when faced with viral infection, f. graminearum tries to establish a defense mechanism by consistently up-regulating genes associated with defense and virulence at the late time point (120 h). taken together, our data provide strong genome-wide transcriptional evidence of how fgv1-dk21 regulates the transcriptional reprogramming in f. graminearum. virus-free and fgv1-dk21-infected f. graminearum strain-dk21 were stored in 15% (v/v) glycerol at −80°c and reactivated on pda at 25°c with a 12 h light-dark cycle. f. graminearum cultures used for rna extractions were grown as described previously [9]. freshly grown mycelia from pda media plates were inoculated in 50 or 200 ml complete media (cm) broth [6] and incubated at 25°c for 36 h or 120 h in an orbital shaker (150 rpm). hyphae were collected by filtering through 3mm paper, washed with distilled water, dried by blotting with paper towels, and frozen at −80°c. conidia of virus-free and fgv1-dk21-infected f. graminearum strains were harvested in 50 ml of cmc broth at 4 days after inoculation. conidial suspensions (2 × 10^5 conidia per dish) were grown in 20 ml of defined media containing 5 mm of agmatine [47] in 90 × 15 mm petri dishes for 7 days prior to filtrate harvests. three replicates were used for this treatment. mycotoxin was extracted from the isolates and analyzed with a shimadzu qp-5000 gas chromatograph-mass spectrometer, as described previously [48]. the trichothecenes were measured based on biomass produced by each strain.
preparation of total rna samples and cdna synthesis for rt-pcr
frozen mycelia were pulverized with a mortar and pestle using liquid nitrogen for nucleic acid extraction. total rnas were extracted with iso-rna lysis reagent (5 prime, germany). extracted total rna was treated with dnase i (takara, japan) to remove genomic dna according to the instructions provided by the manufacturer. these total rna samples were precipitated with ethanol and resuspended in depc-treated water. next, 5 μg total rna of each sample was used to synthesize first-strand cdna with an oligo(dt)18 primer and m-mlv reverse transcriptase (promega, usa) according to the manufacturer's protocols. all synthesized cdnas were diluted 1:10 with nuclease-free water for rt-pcr.
expression profiling was conducted with the gibberella zeae 135k microarray, which was designed based on the f. graminearum sequences released in march 2007 (http://www.broad.mit.edu/annotation/genome/fusarium_group/). the genome contains a total of 13,382 genes. ten 60-nucleotide probes were designed for each gene, starting 250 base pairs (bp) upstream of the end of the stop codon and shifting by 10 bp per probe. thus, these 10 probes covered 150 bp in the 3′ region of the target gene. mitochondrial genes (50 genes) and selected markers such as gfp, gus, hyg, bar, and kan were included. in total, 133,612 probes were designed. the average probe length was 60 nucleotides, and the tm values were adjusted from 75 to 85°c. the microarray was manufactured by nimblegen inc. (http://www.nimblegen.com/). random gc probes (40,000) to monitor the hybridization efficiency and four corner fiducial controls (225) to assist with overlaying the grid on the image were included. to assess the reproducibility of the microarray analysis, we repeated the experiment three times with independently prepared total rnas. thus, a total of 12 samples were subjected to total rna isolation and used for microarray analyses.
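the probe layout described above can be checked with a few lines of arithmetic. the following python sketch is illustrative only: the function names (tile_probes, covered_span) are hypothetical, and it assumes that successive probes are shifted 10 bp toward the stop codon, which the text does not state explicitly. under that assumption, ten 60-nucleotide probes starting 250 bp upstream span 60 + 9 × 10 = 150 bp, in agreement with the figure given.

```python
def tile_probes(n_probes=10, probe_len=60, step=10, start_offset=250):
    """Return (start, end) offsets for each probe.

    Offsets are distances in bp upstream of the end of the stop codon.
    Assumption (not stated in the text): each probe is shifted `step` bp
    toward the stop codon relative to the previous one.
    """
    probes = []
    for i in range(n_probes):
        start = start_offset - i * step   # 5' edge of probe i
        end = start - probe_len           # 3' edge of probe i
        probes.append((start, end))
    return probes


def covered_span(probes):
    """Total length (bp) of the region covered by the tiled probes."""
    return max(start for start, _ in probes) - min(end for _, end in probes)


if __name__ == "__main__":
    probes = tile_probes()
    for index, (start, end) in enumerate(probes, 1):
        print(f"probe {index:2d}: {start} to {end} bp upstream of the stop codon end")
    print("covered span (bp):", covered_span(probes))
```

with the default parameters the covered span evaluates to 150 bp; changing the step or probe count shows how quickly the tiled region grows.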
for the synthesis of double-stranded cdnas, the revertaid™ h minus first strand cdna synthesis kit (fermentas, lithuania) was used. in brief, 1 μl oligo dt primer (100 μm) and 10 μl (10 μg) total rnas were combined, denatured at 70°c for 5 min, and renatured by cooling the mixture on ice. first-strand cdna was synthesized by adding 4 μl 5 x first strand buffer, 1 μl ribolock™ ribonuclease inhibitor, 2 μl 10 mm dntp mix, and 1 μl revertaid™ h minus m-mulv reverse transcriptase enzyme and incubating at 42°c for 1 h. the reaction was stopped by heating at 70°c for 10 min. to synthesize the second strand, 66.7 μl nuclease-free water, 5 μl 10 x reaction buffer for dna polymerase i (fermentas, lithuania), 5 μl 10 x t4 dna ligase buffer (takara, japan), 3 μl 10 unit/μl dna polymerase i (fermentas, lithuania), 0.2 μl 5 unit/μl ribonuclease h (fermentas, lithuania), and 0.1 μl 350 unit/μl t4 dna ligase (takara, japan) were added to the first-strand reaction mixture and the reaction was performed at 15°c for 2 h. the double-stranded cdna mixture was purified using the minelute reaction cleanup kit (qiagen, usa). for the synthesis of cy3-labeled target dna fragments, 1 μg double-stranded cdna was mixed with 30 μl (1 optical density) cy3-9mer primer (sigma-aldrich, usa) and denatured by heating at 98°c for 10 min. the reaction was continued by adding 10 μl 50 x dntp mix (10 mm each), 8 μl deionized water, and 2 μl klenow fragment (50 unit/μl, takara, japan) and incubating at 37°c for 2 h. dna was precipitated by centrifugation at 12,000 x g after adding 11.5 μl 5 m nacl and 110 μl isopropanol. precipitated samples were rehydrated with 13 μl water. the concentration of each sample was determined using a spectrophotometer. a 10 μg aliquot of dna was used for microarray hybridization. the sample was mixed with 19.5 μl 2 x hybridization buffer (nimblegen, usa) and brought to a final volume of 39 μl with deionized water.
hybridization was performed with a maui chamber (biomicro, usa) at 42°c for 16-18 h. after hybridization, the microarray was removed from the maui hybridization station and immediately immersed in a shallow 250 ml wash i (nimblegen, usa) at 42°c for 10-15 s with gentle agitation, then transferred to a second dish of wash i and incubated for 2 min with gentle agitation. the microarray chip was then transferred to another dish of wash ii and further washed in wash iii for 1 min with agitation. the slide was dried in a centrifuge for 1 min at 500 g and scanned using a genepix 4000b scanner (axon, usa). the hybridized microarray chip was scanned with the genepix 4000b (axon instruments) preset with a 5 μm resolution for the cy3 signal. signals were digitized and analyzed by nimblescan (nimblegen, usa). the grid was aligned to the image with a chip design file called the ndf file. the alignment was assessed by ensuring that the grid's corners were overlaid on the image corners. this was further checked by uniformity scores in the program. the analysis was performed in a two-part process. first, pair-report files were generated in which the sequence, probe, and signal intensity information for the cy3 channel were collected. data-based background subtraction using a local background estimator was performed to improve fold-change estimates on arrays with high background signals. the data were normalized and processed with a cubic spline normalization using quantiles to adjust for signal variation between chips [49]. a probe-level summarization by robust multi-chip analysis (rma) using a median polish algorithm implemented in nimblescan was used to produce call files in order to improve the sensitivity and reproducibility of microarray results [50]. the multiple testing correction was performed using the limma package in an r computing environment [51].
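as an illustration of the multiple-testing correction step, the sketch below implements the benjamini–hochberg procedure in plain python. this is an assumption made for illustration: the text does not name the adjustment method, although benjamini–hochberg is the default in limma's topTable. the function name benjamini_hochberg and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative re-implementation; the study itself used the limma
    package in R, and the exact adjustment method is an assumption here.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]          # hypothetical per-gene p-values
    adj = benjamini_hochberg(raw)
    selected = [i for i, q in enumerate(adj) if q < 0.05]
    print("adjusted:", adj)
    print("genes passing the 0.05 threshold:", selected)
```

genes whose adjusted value falls below 0.05 would be retained, mirroring the threshold used in the analysis.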
linear models implemented in lmfit and empirical bayes methods implemented in ebayes were applied to assess the differential expression of genes. genes for which the adjusted p-value or false discovery rate was below 0.05 were collected and further selected. hierarchical clustering was performed by acuity 3.1 (axon instruments) with similarity metrics based on squared euclidean correlation, and average linkage clustering was used to calculate the distance of genes. the microarray data were deposited in the ncbi gene expression omnibus (geo) database with the accession number gse30545. to gain insight into the functions of the differentially expressed genes, go enrichment analysis was conducted with gominer [52, 53]. the 8,338 genes were matched to the m. grisea sequencing assembly sc5 (http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/) with scores of 100 and up by blastp analysis and were used as the total gene set for go enrichment analysis. the gominer program first categorizes each gene according to its go terms and the mode of gene expression (either down- or up-regulation). modes of expression are denoted as under, over, and change. the program then calculates p-values using a one-sided fisher exact test for the number of categorized go terms out of the total number of terms. false discovery rate (fdr) values were obtained from 100 randomizations. go terms for which the fdr was less than 0.05 were selected.
qrt-pcr analysis
qrt-pcr was performed on bio-rad's cfx96™ real-time pcr system using gene-specific internal primers. each reaction mix (10 μl) consisted of 25 ng total cdna, 5 μl 2 x ssofast™ evagreen® supermix (bio-rad, usa), and 10 pmoles each primer. the thermal profile was as follows: 3 min at 95°c, followed by 40 cycles of 5 s at 95°c and 5 s at 58°c; melting curve data were obtained by increasing the temperature from 65 to 95°c. elongation factor 1α (ef-1α; fgsg_08811) and cyclophilin (cyp1; fgsg_07439) were used as internal reference genes to normalize mrna levels between samples (ef-1α; genbank accession no. xm388987, cyp1; genbank accession no. xm387615). data were analyzed using the bio-rad cfx manager v1.6.541.1028 software (bio-rad, usa). rna was extracted from three independent replicate experiments, and each pcr product was evaluated in at least three independent experiments, including three technical replicates [54].
additional file 1: table s1. differentially expressed f. graminearum genes at 36 h.
additional file 2: table s2. differentially expressed f. graminearum genes at 120 h.
additional file 3: table s3. the 20 representative f. graminearum genes selected for qrt-pcr validation.
additional file 4: table s4. oligonucleotide primers used for qrt-pcr.
additional file 5: table s5. enriched go terms of differentially expressed genes.
additional file 6: table s6. f. graminearum transcription factors which were differentially expressed by fgv1-dk21 infection.
the authors declare that they have no competing interests.
authors' contributions
jy, kml, ms, ywl, and kk designed the experiment. jy, kml, and ms performed cultivation of f. graminearum and infection with fgv1-dk21. jy and km conducted mycotoxin analysis, isolated total rnas, and conducted rt-pcr. wkc, jy, and kk analyzed the microarray data and interpreted the results. jy and kk coordinated the study. wkc, jy, ywl, and kk wrote the manuscript. all authors read and approved the final manuscript.
[1] action and reaction of host and pathogen during fusarium head blight disease
[2] health effects of mycotoxins: a toxicological overview
[3] identification of deoxynivalenol- and nivalenol-producing chemotypes of gibberella zeae by using pcr
[4] arabidopsis is susceptible to the cereal ear blight fungal pathogens fusarium graminearum and fusarium culmorum
[5] a summary of taxonomic changes recently approved by ictv
[6] double-stranded rna mycovirus from fusarium graminearum
[7] genetic interrelationships and genome organization of double-stranded rna elements of fusarium poae
[8] cloning and characterization of mycovirus double-stranded rna from the plant pathogenic fungus, fusarium solani f. sp. robiniae
[9] molecular characterization of a dsrna mycovirus, fusarium graminearum virus-dk21, which is phylogenetically related to hypoviruses but has a genome organization and gene expression strategy resembling those of plant potex-like viruses
[10] complete nucleotide sequence of double-stranded rna viruses from fusarium graminearum strain dk3
[11] a novel double-stranded rna mycovirus from fusarium graminearum: nucleic acid sequence and genomic structure
[12] molecular characterization of fusarium graminearum virus 2 isolated from fusarium graminearum strain 98-8-60
[13] biological control of chestnut blight with hypovirulence: a critical analysis
[14] viruses of plant pathogenic fungi
[15] transmission of fusarium boothii mycovirus via protoplast fusion causes hypovirulence in other phytopathogenic fungi
[16] the dawn of fungal pathogen genomics
[17] comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus fusarium graminearum
[18] identification of genes associated with fumonisin biosynthesis in fusarium verticillioides via proteomics and quantitative real-time pcr
[19] a proteomics approach to study synergistic and antagonistic interactions of the fungal-bacterial consortium fusarium oxysporum wild-type msa 35
[20] proteomic analysis of fungal host factors differentially expressed by fusarium graminearum infected with fusarium graminearum virus-dk21
[21] development of a fusarium graminearum affymetrix genechip for profiling fungal gene expression in vitro and in planta
[22] development of a novel multiplex dna microarray for fusarium graminearum and analysis of azole fungicide responses
[23] dna microarray to detect and identify trichothecene- and moniliformin-producing fusarium species
[24] microarray analysis of transcript accumulation during perithecium development in the filamentous fungus gibberella zeae (anamorph fusarium graminearum)
[25] an oligonucleotide microarray for the identification and differentiation of trichothecene producing and non-producing fusarium species occurring on cereal grain
[26] comparative genomics reveals mobile pathogenicity chromosomes in fusarium
[27] specific and common alterations in host gene transcript accumulation following infection of the chestnut blight fungus by mild and severe hypoviruses
[28] microarray analysis of cryphonectria parasitica gα- and gβγ-signalling pathways reveals extensive modulation by hypovirus infection
[29] comparative analysis of alterations in host phenotype and transcript accumulation following hypovirus and mycoreovirus infections of the chestnut blight fungus cryphonectria parasitica
[30] transcription factors in fungi
[31] ftfd: an informatics pipeline supporting phylogenomic analysis of fungal transcription factors
[32] a phenome-based functional analysis of transcription factors in the cereal head blight fungus, fusarium graminearum
[33] genome organization in and around the nucleolus
[34] involvement of the nucleolus in plant virus systemic infection
[35] translation initiation of viral mrnas
[36] elucidation of the avian nucleolar proteome by quantitative proteomics using silac and changes in cells infected with the coronavirus infectious bronchitis virus
[37] quantitative proteomics using silac coupled to lc-ms/ms reveals changes in the nucleolar proteome in influenza a virus-infected cells
[38] rna viruses: hijacking the dynamic nucleolus
[39] major impacts on the primary metabolism of the plant pathogen cryphonectria parasitica by the virulence-attenuating virus chv1-ep713. microbiology
[40] the control of filamentous differentiation and virulence in fungi
[41] polyamines and abiotic stress tolerance in plants
[42] polyamine metabolism during sclerotial development of sclerotinia sclerotiorum
[43] chlorovirus-mediated membrane depolarization of chlorella alters secondary active transport of solutes
[44] conserved camp signaling cascades regulate fungal development and virulence
[45] signal transduction cascades regulating mating, filamentation, and virulence in cryptococcus neoformans
[46] extensive alteration of fungal gene transcript accumulation and elevation of g-protein-regulated camp levels by a virulence-attenuating hypovirus
[47] nutrient profiling reveals potent inducers of trichothecene biosynthesis in fusarium graminearum
[48] atp citrate lyase is required for normal sexual and asexual development in gibberella zeae
[49] a new non-linear normalization method for reducing variability in dna microarray experiments
[50] exploration, normalization, and summaries of high density oligonucleotide array probe level data
[51] linear models and empirical bayes methods for assessing differential expression in microarray experiments
[52] gominer: a resource for biological interpretation of genomic and proteomic data
[53] gene ontology: tool for the unification of biology. the gene ontology consortium
[54] the funcat, a functional annotation scheme for systematic classification of proteins from whole genomes
key: cord-346308-9h2fk9qt authors: kaur, rajwinder; yadav, bhoomika; tyagi, r.d.
title: microbiology of hospital wastewater date: 2020-05-01 journal: current developments in biotechnology and bioengineering doi: 10.1016/b978-0-12-819722-6.00004-3 sha: doc_id: 346308 cord_uid: 9h2fk9qt the study of hospital wastewater (hww) microbiology is important to understand the pollution load, growth of particular pathogenic microbes, shift and drift in microbial community, development and spread of antibiotic resistance in microbes, and subsequent change in treatment efficiencies. this chapter investigates the potential microbes such as bacteria, viruses, fungi, and parasites present in hww, along with the associated diseases and the treatment methods used. due to the indiscriminate release of antibiotics from hospitals, hww serves as a hotspot for the emergence of antibiotic-resistance genes (args) and antibiotic-resistant bacteria. this chapter discusses the occurrence of args in hww, their prevalence in the environment, the molecular tools used for their identification, and different mechanisms of horizontal gene transfer. thus, a better understanding of the microbiology of hww could further help in the development of advanced treatment technologies for effective removal of microbes and their bioproducts (toxins and infectious nucleic acid) from hww and contaminated water. wastewater includes any type of water (e.g., from agriculture, domestic means, industries, human excretion, commercial sectors, pharmaceuticals, healthcare units) whose quality has deteriorated under anthropogenic influence [1]. hospital wastewater (hww) is quite different from the wastewater discharged from other sources and is hazardous and infectious. it contains a wide range of micro- and macropollutants discharged from operation (surgery) rooms, wards, laboratories, laundry, polyclinics, research units, radiology, and medicine and nutrient solutions used in microbiology laboratories [2].
the micro- and macropollutants include radioactive isotopes, pharmaceuticals, stock cultures, heavy metals, media, pathogens, drugs, cotton particles, disinfectants, and chemical compounds [3]. the contraceptive-rich pharmaceuticals present in hww have been reported to be associated with endocrine disruption; for instance, exposure to pharmaceutical waste containing estrogen or androgen caused sex reversal in fish and, thus, reproductive impairment [4]. hospital effluent discharged directly into ponds has caused eutrophication, which is evident from oxygen depletion, dense algal blooms, and death of aquatic animals [1]. the discharge of wastewater depends upon the capacity of the hospital; generally, hospitals consume 400 to 1200 l of water per day per bed [5]. hww, with its many microbes and emerging infectious particles such as prions, viroids, and toxins, is hazardous for the environment and ultimately for human health. however, in many countries, hww is discharged directly into sewage without pretreatment. it then undergoes treatment along with the municipal wastewater, but this treatment may not be sufficient to remove micropollutants from hww [6]. moreover, the pharmaceutical compounds present in hww undergo biological transformation and form conjugate compounds whose toxicity could be even higher than that of the parent compound [7]. thus, treatment methods designed for municipal wastewaters are not able to remove the pharmaceutical-related micropollutants or conjugate compounds, which are generally present at lower concentrations than the macropollutants (chemical oxygen demand, biological oxygen demand, phosphorus, and nitrogen) and therefore cannot be removed using conventional wastewater treatment plant (wwtp) operations [8].
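the per-bed consumption range quoted above (400 to 1200 l/day/bed [5]) can be turned into a rough daily volume for a given hospital. the python sketch below is a back-of-the-envelope estimate only: the function name and the 250-bed example are hypothetical, and it assumes that all consumed water is discharged as wastewater, which likely overestimates the real flow.

```python
def daily_water_use_m3(beds, litres_per_bed_low=400, litres_per_bed_high=1200):
    """Estimate the daily water use of a hospital in cubic metres.

    The default per-bed range (litres per day per bed) is the one quoted
    in the text; treating all of it as wastewater is a simplification.
    """
    low = beds * litres_per_bed_low / 1000    # litres -> cubic metres
    high = beds * litres_per_bed_high / 1000
    return low, high


if __name__ == "__main__":
    beds = 250  # hypothetical hospital size
    low, high = daily_water_use_m3(beds)
    print(f"{beds} beds: roughly {low:.0f} to {high:.0f} m3 of water per day")
```

such an estimate only brackets the hydraulic load; it says nothing about pollutant concentrations, which depend on the activities of each hospital.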
moreover, the pathogens present in hww could cause a microbial population imbalance in existing municipal wwtps, and the metals or heavy metals present in hww can affect biological processes such as nutrient removal. unlike organic pollutants, heavy metals are nonbiodegradable and thus move to other pollutant sources [9]. hww can act as an ideal growth medium for various pathogenic microbes including bacteria, viruses, fungi, and parasites. wastewaters from hospitals also contain resistant bacteria and antibiotic residues, which could inhibit the growth of susceptible bacteria, thereby increasing the population of resistant bacteria in the receiving water. resistant bacteria discharged into the environment act either as vectors carrying a transmissible gene or as reservoirs of antibiotic-resistance genes (args) that could pose a threat to public health [10]. in addition, fungi, which are capable of growing rapidly and spreading their spores to the external environment, also pose a serious threat to the environment and human health [11]. the absence of specific pretreatment technologies for hww has also increased the frequency of gastroenteric viruses in aquatic bodies [12]. the direct discharge of hww containing disease-causing parasites into municipal wastewater has also increased the risk of skin infections and other harmful diseases in humans [13]. specific and advanced technologies need to be developed for hww treatment to prevent the release of harmful contaminants into the environment. therefore, the microbial community analysis of hww is of utmost importance for assessing the pollution load and for developing specific treatment methods to protect environmental health. accordingly, this chapter describes the various pathogens present in hww, args, and the tools used to study args.
moreover, metagenomics, a recent approach used to study different microbial communities and for gene-specific identification, is also explained in this chapter. in addition, horizontal gene transfer (hgt), which can efficiently contribute to the spread of args, is discussed briefly. hww poses a serious threat to humans with respect to contagiousness and the drastic spread of infectious diseases in healthcare units, society, hospital employees, and the environment [5]. different solvents, pharmaceuticals, and radionuclides are used in hospitals for the purposes of diagnostics, disinfection, and research. after consumption, various drugs remain nonmetabolized or are only partially assimilated in the human body and thus are excreted unchanged and end up in wastewater. the residual quantities of disinfectants used for treatment of skin microbial infection and to disinfect instruments and surfaces of hospitals also end up in the hww, resulting in an increase in the population of pathogenic microbes. the pathogenic microflora present in hwws also come from medical devices, the atmosphere, and water used in hospital practice, and the pathogens are released mainly in the excreta of patients [14]. therefore, hww consists of a mixture of pathogenic microbes including bacteria, fungi, yeasts, algae, viruses, protozoa, parasites, and bacteriophages. the effluent from hospitals is usually discharged and treated with domestic wastewaters without any prior treatment [15]. the pathogens in the receiving water, if untreated, survive for a long time in soil or water and enter the food chain, causing infectious diseases and health risks to human beings [16]. for economic reasons and because of the lack of resources for analysis of the actual pathogens, certain bacteria have been used for decades as indicators of water contamination [17] (box 4.1).
however, it has to be considered that the removal of coliforms, which are indicators of fecal contamination, cannot be directly correlated with the removal of viruses, pathogens, fungi, protozoa, or other bacteria from water samples [17]. in addition, indicator bacteria such as escherichia coli and clostridium perfringens are more sensitive than pathogenic bacteria, protozoa, and viruses to inactivation through processes such as natural competition, wastewater treatment, and high temperature [18]. traditional detection of pathogenic bacteria involves selective culture media and biochemical characterization methods. these techniques are inexpensive and simple; however, sampling error, long duration (5–11 days), tediousness, and monospecific detection (detection of only one type of pathogen) are the major limitations [17]. the enzyme-linked immunosorbent assay (elisa) is used for laboratory diagnostics of different pathogens. the polymerase chain reaction (pcr) has been adapted in many ways for analysis of waterborne pathogens: nested pcr, multiplex pcr, real-time pcr, fluorescence, and digital pcr [19] (box 4.2).
criteria for selection:
• should be present only in contaminated water, not in uncontaminated ones
• should not be able to grow and proliferate in water
• should be present in intestinal tract of warm blooded animals
• should have similar survival pattern as pathogens present outside the host
• should be easily detectable
• should be useful for all types of water
• should be relatively cheap to use
total coliforms are higher in number than any other pathogens. the subgroup fecal coliforms indicates fecal contamination of water and thus also indicates the presence of other pathogens, which are lower in number and hard to detect. however, even low numbers of pathogens are enough to cause morbidity and mortality in humans. indicator bacteria are detectable by inexpensive culture methods and do not pose any health risk to laboratory workers.
however, there are a few limitations of fecal indicator bacteria, which have been discussed in the text.
(table continued: biochemical tests, pcr, and sequencing [32].)
[...] bed-linens, or through infected patients [39]. hospital-acquired infections have resulted in an increase in methicillin-resistant s. aureus (mrsa) isolates. mrsa is associated with infections of skin and soft tissues and has been a global threat to human health. in the united states, 72,444 cases of mrsa infections were reported in 2014 [40]. e. coli and pseudomonas aeruginosa are also commonly found in hww along with other nosocomial pathogens including candida albicans, klebsiella, proteus, and enterobacter species [37]. most of the time, e. coli is harmless and is an important part of the human intestinal tract, but some strains are pathogenic to humans. e. coli o157:h7 is related to many outbreaks of water- and food-borne illness and is also responsible for 63,000 hemorrhagic colitis (bloody diarrhea) cases in the united states annually [41]. a study of hww in brazil also confirmed the presence of other bacterial species including citrobacter freundii, klebsiella ornithinolytica, proteus mirabilis, pantoea agglomerans, and serratia rubidaea [28]. many c. freundii bloodstream infections have been reported, with manifestations such as septic shock, pneumonia, hypothermia, oliguria, and thrombocytopenia [42]. another bacterium found in hww, p. mirabilis, is responsible for urinary tract infections that are further accompanied by the development of kidney stones [43]. p. agglomerans mostly affects plants but also causes opportunistic infections, such as skin infections, in immunocompromised individuals [44]. an important point about fluctuations of pollution was discussed in [44], according to which the microbial load present in wastewater was directly correlated with several hospital activities. a higher microbial load was present in hww during the day than in the evening and early morning.
another study [44] was conducted on hww collected during early hours (eh) and late hours (lh) of the day in abia state, nigeria. the microbiological analysis revealed a higher total bacterial count at lh (7.9 × 10^4 cfu/ml) than at eh (6.7 × 10^4 cfu/ml). similar results were observed for the total coliform count, which was also higher at lh (2.2 × 10^3 cfu/ml) than at eh (1.8 × 10^3 cfu/ml), due to the fact that hospital activities at night time were more restricted than during the day. in a recent study, hww samples were collected from three hospitals in different cities of romania [35]. the three hospitals (h1, h2, and h3) had different numbers of beds and inhabitants. the microbial community analysis revealed that wastewater from h1 was dominated by the phylum proteobacteria (97.27%); wastewater from h2 was dominated by proteobacteria (44.7%), followed by firmicutes (33.68%), bacteroidetes (16.4%), and actinobacteria (4.9%); and wastewater from h3 was dominated by proteobacteria (71.4%), followed by bacteroidetes (13.7%) and firmicutes (13.1%) [35]. a similar taxonomic composition was also reported in various studies using wastewater collected from different hospital facilities, nonhospital medical facilities, and municipal treatment plants [26, 45, 46]. the dominance of proteobacteria was correlated with their presence in human feces, long-term survival ability in wastewater, and exposure to several antibiotics present in hww. moreover, the dominance of firmicutes could be correlated with their capacity to survive in extreme environmental conditions and high contaminant levels [47]. in 2018, buelow et al. [48] found that the abundant bacterial taxa in hww differed from those in urban and suburban wwtp influents and comprised campylobacteraceae, aeromonadaceae, and carnobacteriaceae.
hww also contained several members of the human gut microbiota, such as the genus streptococcus and the family ruminococcaceae, because the sampling location was closer to the hospital sanitation areas than was the case for the wwtp influent. these microbes are not well suited to survive in such a complex environment, which results in progressively decreasing levels of these bacteria in urban wwtp influent [48]. another study compared the relative abundances of the most abundant bacterial orders in a wwtp over 7 years and showed that the wwtp environment was dominated by the phyla actinobacteria, bacteroidetes, and proteobacteria, and that the wastewater community was highly stable and unique to its environment [49]. hospital sewage also contained high levels of anaerobic bacteria, including orders such as bifidobacteriales, clostridiales, and bacteroidales, that were likely to originate from the human gut [1]. the release of antibiotics from hospitals creates a selection pressure on bacteria; as a result, effluents from hospitals contain high numbers of resistant bacteria as well as antibiotic residues. the prevalence and spread of antibiotic-resistant bacteria (arb) in the environment are a major problem worldwide due to improper antibiotic usage and the lack of effective hww management systems. therefore, arb and args are studied in particular among the hospital contaminants. for example, one study reported e. coli from hospital effluent to be multiresistant toward ampicillin, streptomycin, sulfamethoxazole, cephalosporin, ciprofloxacin, and tetracycline [29]. the hospital water environment, which includes potable water, faucets, wastewater drainage systems, and effluents, can be a reservoir of nosocomial pathogens (arb such as mdr enterobacteriaceae, acinetobacter baumannii, and pseudomonas species), which thus increasingly dominate the hww microbiome [50]. commonly found isolates such as p. aeruginosa, a.
baumannii, and enterobacteriaceae in hospital effluents have shown carbapenem resistance and have disseminated around the globe [50]. baricz et al. [51] reported an antimicrobial effect of a crude bacterial extract of janthinobacterium lividum against mdr bacteria of both clinical and environmental origin, for example, mrsa, methicillin-susceptible s. aureus, enterococci, and enterobacteriaceae. such studies not only help to provide promising candidates for the development of new antimicrobials but also help to propose new or improved treatment technologies to reduce the burden of antimicrobial resistance (amr) in the environment. the microbiological analysis of several hwws showed the prevalence of fungal species along with bacterial and coliform species. fungi have simpler nutritional requirements and a higher capability to grow at lower water activity than bacteria [52]. moreover, fungal populations can easily spread their spores to the external environment, hence affecting human beings directly [11, 52]. this explains their prevalence in hospital and clinical environments: even if healthcare facilities are considered free from fungi, environmental factors such as temperature, moisture, and nutrients can provide favorable conditions for the extensive growth of fungal species in storage containers holding clinical waste. fungal infections can range from moderate to fatal depending on the infection site as well as the immune system of the affected individual [53]. the prevalence of invasive fungal species has been reported to have increased over the past three decades, due to the increase in the number of immunocompromised patients. moderate fungal infections include athlete's foot and ringworm infections (cutaneous infections) in immunocompetent patients, while life-threatening infections include mucosal and systemic infections [53]. 
the fungal isolates, associated diseases, and treatment methods reported in various studies are presented in table 4-2. the study by neely and orloff [54] reported the prevalence of aspergillus species in hospital waste collected in the united states; worse still, this species was able to survive for more than a month on the hospital waste. the longer survival of aspergillus can result in recurrence of the fungus and a higher chance of causing disease. moreover, many spores can remain dormant under adverse conditions and then develop into fungi again when conditions become favorable. other fungal species were also detected, but the survival capacity of mucor (>20 days), paecilomyces (11 days), and fusarium (10 days) species was lower than that of aspergillus species. the survival efficiency of different fungal species is related to the specific structure of their spores. the spores of aspergillus species are spiny and rough and are thus able to adhere to any type of waste and survive for a longer time. however, spores of fusarium species are smooth and are thus less able to adhere and survive. wastewater collected in nigeria revealed a total fungal count of 3.5 × 10^3 cfu/ml during the day time activities in the hospital [72]. the most prevalent fungal species in the hww were aspergillus (33.3%), followed by candida (28.7%), cryptococcus (14.9%), and penicillium (14.2%) species. the fungal diversity (29 species) was reported to be higher for waste samples collected from the hematology section of a malaysian hospital [72]. moreover, glove waste contained the highest number of fungal species (19 species) among all the different types of waste collected from the hospital. a high fungal count of 7 × 10^5 cfu/ml was also reported in soil collected from a hospital dumpsite, which was correlated with the high organic material content of the hospital waste. 
fungi are heterotrophic microorganisms, which can consume organic compounds from the surrounding environment [72]. the common fungal species identified in hospital waste were penicillium rubrum, penicillium viricadum, trichothecium roseum, rhizopus nigricans, aspergillus flavus, aspergillus parasiticus, microsporum canis, and aspergillus niger. among them, a. niger was found to be the dominant fungal species (34.5%) because it releases a variety of enzymes such as amylases, pectinases, and proteases and can utilize a number of organic compounds as substrates [73]. a. flavus and a. niger can cause the disease aspergillosis and are considered pathogens of both humans and animals. they can cause either bronchopulmonary aspergillosis (coughing and wheezing) or invasive aspergillosis (which affects the lungs and can spread throughout the body) [74]. more than 200,000 cases and a 30%-95% mortality rate have been reported for aspergillosis infections worldwide. the nonpathogenic black bread mold r. nigricans was also found in abundance (27.5%). m. canis can cause infection in the upper, dead layer of the skin of domestic animals, including dogs and cats, and humans can later be affected by the infection as well [75]. other fungal species including t. roseum, p. viricadum, and p. rubrum are nonpathogenic fungi, which can only affect humans with weak immune systems. curvularia species have also been reported in hospital waste and can cause rare infections of the respiratory tract and cornea in human beings. curvularia lunata is also responsible for eumycetoma in farmers of chandigarh and rajasthan. most of the agricultural regions use azole fungicides (imidazole, metalaxyl, tebuconazole, and propiconazole). therefore the farmers responded poorly to antifungal treatment and were referred for long-term therapies [76]. 
a few fungal species of fusarium (1.5%) and trichoderma (2.2%) have also been reported in hospital waste, but infections caused by these fungal species are difficult to cure [77]. over 1.2 billion people worldwide are reported to suffer from fungal infections, and 1.5-2 million people die of fungal infections each year, which is far higher than the death toll of malaria or tuberculosis [76]. antifungal drugs are mostly used for the treatment of fungal infections, as they target the plasma membrane, sterol biosynthesis, dna biosynthesis, and β-glucan biosynthesis of different fungi. however, over the past decades, the increased use of antifungal drugs has resulted in the development of resistance, leading to an increase in morbidity and mortality. the drug resistance could be due to several reasons such as alterations of the drug target site, increased efflux of drugs, and biofilm formation. although several studies have reported it, antifungal drug resistance is not at the same level as bacterial resistance to antibiotics. other reasons for the high mortality rate of fungal infections include the limited treatment options available against invasive fungal diseases and the reduced susceptibility of immunocompromised patients to antifungal drugs [78]. for instance, in the united states, until 2013, there were reports of 19,262 organ transplants, which resulted in increased susceptibility to fungal infections in immunocompromised patients [78]. in addition, the adverse effects of antifungal drugs and the lack of effective antifungal therapies are further reasons that necessitate the development of novel treatment strategies and next-generation antifungal drugs. the development of ultrahigh-throughput screening techniques could help in the advent of novel antifungal drugs by providing rapid, effective, and economical drug screening. presently, viral infections are at the forefront as compared with pathogenic diseases caused by bacteria, fungi, and parasites (table 4-3). 
moreover, the detection, analysis, characterization, and epidemiology of viruses are entirely different from those of bacteria because the bacterial indicators used to represent water contamination cannot actually represent viral contamination. it has been reported that each of the materials present in hospital waste can carry viruses, and that viruses are able to survive on them for 5-8 days [78]. viral hepatitis is a very common and leading disease. moreover, human immunodeficiency virus infection, hepatitis b, and hepatitis c are among the deadly infectious diseases transmitted by direct contact with the blood of an infected person. however, these are easy to prevent once they are detected at an early stage and the obligatory precautions are followed by patients [5]. samples collected from different wwtps have been analyzed for detection and characterization of viruses, and research has also focused on the removal of viruses from wastewater [79]. however, in spite of all the health risks associated with viruses, very little research [5] has been conducted specifically on hww, which requires special attention because it is one of the main sources for the spread of pathogenic and deadly viral diseases. modern molecular biology techniques, including pcr assays, offer several advantages such as specificity, sensitivity, and wider data analysis for easy and faster detection of viruses. rotavirus a, norovirus genogroups i and ii, human adenovirus (hadv), and hepatitis were detected in samples collected from two different hospital wwtps. the load of rotavirus and hadv in the hospital wwtps was 2 log cycles higher than that in an urban sewage wwtp [80]. hadv consists of double-stranded linear dna (fig. 4-1) and belongs to the category of nonenveloped viruses, which makes it resistant to heat, dry, and acidic conditions [78]. the adenovirus consists of an outer capsid and an inner core with several histone proteins. 
the elongated fiber proteins on the surface of the virus interact with host cell receptors such as the coxsackie and adenovirus receptor and mcp and start infection in the host cells. after the interaction between the virus fiber proteins and the host cell receptors, uncoating of the virus particles takes place, resulting in dissolution of viral protein in the endosome. ultimately, the adenovirus is translocated along microtubules through the cytoplasm toward the nucleus [81]. hadv is most prevalent in patients with a weak immune system and acute lower respiratory disease [82]. seventy-nine types of hadv have been reported [83], and of these, hadv 40/41 is the most prevalent in aquatic systems such as sea water, sewage, rivers, surface water, and drinking water [12]. hadv is placed on the contaminant candidate list for drinking water by the u.s. environmental protection agency and is a real concern for human health [84]. it can cause several diseases in humans including ocular infections, conjunctivitis, genitourinary infections, pharyngoconjunctival fever, hemorrhagic cystitis, exanthema, encephalitis, and pneumonia [85]. hadv is found in wastewater more frequently than any other enteric virus. it is reported to be transmitted through the fecal-oral route, aerosol droplets, and contaminated materials. it can survive for an extended period of 3-8 weeks in the environment without a host. due to its stable and persistent nature in aquatic systems, it could also be considered an indicator of fecal contamination, as suggested by various studies [86, 87]. the prevalence of hadv species varies according to the hospital environment. according to prado et al. [80], hadv 40/41 (species f) was among the most prevalent genotypes found during molecular characterization of viruses detected in samples collected from two different hospital wwtps. 
species f came from hospitalized children with acute gastroenteritis. the other reported hadv strains include species c and d, which are associated with conjunctivitis and respiratory tract infections [80]. the frequency of hadv has also been tested in 102 samples collected throughout the year from a hospital wwtp located in tunisia city, tunisia; 64% of the samples were positive for hadv, and the most prevalent species was species f (hadv 40/41) [12]. similarly, the prevalence of hadv was determined in samples collected from different regions of the world: it was high in poland (92.1%), greece (76.9%-92.3%), norway (92%), and brazil (100%), while samples collected from morocco (45.5%), italy (21.6%), and taiwan (27.3%) showed a low prevalence of hadv [78]. the high and low prevalence of hadv reported in various countries could be explained by the fact that the circulation of the virus varies according to the geographical region and the epidemiological community profile. however, whether the prevalence is high or low, the presence of hadv questions the ability of wwtps to remove gastroenteric viruses and highlights the urgency of developing effective environmental control programs and innovative hww treatment plants that treat the hww before it is discharged into the sewage system. the detection of hadv includes the cell culture method, antigen detection, and the pcr method. serology is also sometimes used to detect adenovirus-specific antibodies. because there is no effective treatment against hadv, essential precautions should be taken to control infections, which include washing of hands, disinfection of instruments, and application of infection control protocols against adenovirus in hospitals to prevent outbreaks [88]. the rotavirus genome consists of 11 segments of double-stranded rna, and it is surrounded by three layers: the outer capsid, the inner capsid, and the central core. 
seven rotavirus groups, a to g, have been reported, and studies have revealed that groups a, b, and c affect both humans and animals, while groups d to g are reported to cause infection mainly in animals. a dual (binary) classification system is used for detailed classification of rotavirus on the basis of two outer capsid proteins, the glycoprotein vp7 and the protein vp4 (fig. 4-2), where the glycoprotein vp7 defines the g serotype and the protein vp4 defines the p serotype [89]. the virus attaches to the host cell surface through the glycoprotein vp7 or the surface protein vp8. after interaction with the host cell, calcium-dependent endocytosis takes place and the particle is uncoated, resulting in the release of the double-layered viral particle into the cytoplasm. transcription and translation then take place in the cytoplasm, and viral proteins are synthesized by cellular ribosomes. after assembly, the virus particle is released from the infected cell through cell lysis, and the infection spreads further in the host [90]. rotavirus is associated with around 111 million cases of acute gastroenteritis in newborn babies and children across the world, and 352,000-592,000 deaths have been reported globally for children aged <5 years. after infection and replication, rotavirus is discharged into surface water, groundwater, drinking water, and wastewater, and it is transmitted effectively through water due to its stable and resistant nature under adverse environmental conditions [91]. rotavirus can survive at ambient temperatures of 30 °c-35 °c and can persist on inanimate objects for 60 days without a host. rotavirus is mostly prevalent in humans, and 11%-42% of positive samples are associated with group a rotavirus [92, 93]. the detection methods for rotavirus include electron microscopy, elisa, passive particle agglutination tests, polyacrylamide gel electrophoresis, and rt-pcr. 
a vaccine against rotavirus a with an established safety and efficacy profile was developed and licensed in 2006, after the withdrawal of the first vaccine, which was associated with intestinal intussusception in children [94]. the prevalence of rotavirus in different seasons (summer, autumn, and winter) was determined by collecting 60 different samples over a period of 9 months from hww in shiraz, iran. enzyme immunoassays (eias) were performed, and positive specimens were further investigated by nested pcr with various primers. moreover, virus concentration methods such as the pellet and two-phase methods are reported to increase the concentration of virus 50- to 100-fold before eia is applied. fifteen samples (25%) were found to be positive for rotavirus a, and the highest prevalence was detected in autumn with a frequency of 46.67%, followed by 33.33% and 20% in winter and spring, respectively [95]. other studies also confirmed that rotavirus a is more prevalent in cold weather than in other seasons [96]. the molecular characterization of rotavirus further revealed the predominance of the g1 genotype in various clinical investigations, followed by the mixed genotype g1g4, which is associated with frequent water contamination and the generation of novel strains by genetic reassortment. moreover, various studies have reported the prevalence of rotavirus in treated wastewater as well, which is a real concern for public health [95]. the studies on rotavirus highlight the significance of environmental surveillance tools for detecting novel genotypes and for further analyzing the distribution and treatment of the hazardous effects caused by rotavirus in human beings. norovirus belongs to the family caliciviridae; it is nonenveloped and contains a positive-sense, single-stranded rna, which is organized into three open reading frames (orfs). 
the three orfs encode three different proteins: orf1 is 5 kb in size and encodes the nonstructural (ns) polyprotein, orf2 is 1.8 kb in size and encodes the structural capsid protein vp1, and orf3 is 0.6 kb in size and encodes the vp2 protein, which maintains the stability of the capsid protein [97]. the norovirus life cycle is presented in fig. 4-3. norovirus interacts with the host cell surface through histo-blood group antigens. after the vpg-linked rna genome of the virus is released into the cytoplasm of the host cell, translation of the viral rna takes place. once the orf1 polyprotein has been translated, co- and posttranslational processing by the viral ns6 protease releases the ns proteins for the formation of the replication complex. thereafter, viral replication is carried out by the rna-dependent rna polymerase using de novo and vpg-dependent mechanisms. the replicated genomes are then translated and packed into the capsid for virion assembly and exit [98]. norovirus is divided into seven genogroups, of which three, gi, gii, and giv, are responsible for infection in humans; gii genotype 4 (gii.4) is the predominant infectious virus, associated with 18% of diarrheal diseases and gastroenteritis infections around the world. in addition to gii.4, various non-gii.4 viral strains are also responsible for gastroenteritis infections; for instance, the gastroenteritis that emerged in southeast asia in 2014-15 was associated with the gii.17 viral strain. norovirus is associated with 677 million gastroenteritis cases and 210,000 deaths annually around the world [99]. waterborne norovirus spreads through contamination of water systems including drinking water, surface water, mineral water, and groundwater [100]. 
as infected individuals shed the virus into the water system, samples collected from hww are considered useful for studying norovirus epidemiology at the population level [99]. norovirus is transmitted through the fecal-oral route from person to person or from animal to person by intake of contaminated food or water. depending on the surface temperature and conditions, norovirus can survive for long periods outside the host. it can survive for weeks on hard surfaces, up to 12 days on contaminated fabrics, and for years in contaminated still water [101]. norovirus infection in humans can be detected by real-time pcr, multiplex gastrointestinal platforms, eias, and genotyping. the development of real-time pcr for detection of norovirus is considered a significant advancement with various environmental applications. the prevalence of norovirus was studied by ibrahim et al. [100], who collected 102 samples from 5 different basins of a hospital wwtp located in tunisia. of the wastewater samples, 65% were positive for norovirus gii and 1% for norovirus gi, and the frequency of norovirus decreased from one basin to the next. nevertheless, tertiary wastewater treatment is required before the water is recycled for bathing and other purposes. different studies have revealed the prevalence of norovirus in different seasons; for instance, studies on hww collected in italy and sweden have shown the occurrence of norovirus gii in the spring and summer seasons [102, 103]. however, a study from china revealed the prevalence of norovirus in winter conditions, which could be explained by the variability in temperature during different seasons, the immunity level of the host, humidity, and the socioeconomic conditions of different countries [104]. 
emerging recombinant noroviruses have been analyzed by illumina miseq and sanger sequencing of different clinical and wastewater samples collected within australia and new zealand. the pandemic variant (gii.pe/gii.4) was prevalent in sydney during 2014 and 2015, while a decline in the pandemic virus was observed in 2016 with the emergence of five new recombinant strains. these new viruses were held responsible for the gastroenteritis outbreaks of november 2016 [99]. the use of new-generation sequencing tools has advanced and reformed the genome sequencing of different pathogens present in wastewater. norovirus has been responsible for 210,000 deaths per year. in 2016, the world health organization (who) identified norovirus as a priority disease for vaccine development. however, several vaccines are still in the trial phase or in preclinical stages, and no licensed vaccine is available against norovirus infection. cost effectiveness, target population, and public acceptance are the major considerations for vaccine development [105]. because no treatment is available, public awareness of norovirus and of preventive measures should be spread to protect people from norovirus infection. hepatitis a virus (hav) is a nonenveloped virus that belongs to the picornaviridae family and has a positive-sense, single-stranded rna (fig. 4-4). the rna consists of only one orf, which encodes a large polyprotein [106]. six genotypes of the virus have been reported; genotypes i, ii, and iii are infectious to the human population and are further subgrouped into a and b, while genotypes iv, v, and vi affect primates other than human beings [107]. a nucleotide variation of 7.5% has been reported for each of the subgenotypes, as evaluated by pcr of a 168-nucleotide fragment containing the vp1/vp2a junction [108]. the interaction of hav with the host cell involves the endocytic pathway. 
the host cell receptor involved in the interaction with the virus is tim-1 (havcr1), which is reported to cycle between the plasma membrane and the lysosome of the host cell through clathrin-mediated endocytosis. the replication, translation, and assembly of the virus occur inside the cytoplasm of the host cell [108]. the genetic analysis of hav could be correlated with outbreaks of viral infection and with the transmission mode as well. the main transmission mode of the virus is the fecal-oral route, either directly from one human to another or through ingestion of contaminated food [109]. hav can survive for months outside the host body; it can survive freezing temperatures but is killed when exposed to high temperatures of >80 °c. geographically, the virus is mainly present in india, africa, the middle east, central america, and south america. hav is considered one of the most important food-borne pathogens and is a major cause of hepatitis in humans, with around 1.5 million reported cases globally. it has been estimated that half of all hepatitis cases are related to hav. wastewater coming from hospitals could be an emerging source of hav because human excretions contain enormous amounts of the virus; moreover, the virus is highly resistant to harsh environmental conditions and is able to survive for long periods [110]. hav has also been detected in rivers, raw or treated water, and dam water [111]. these viruses are a real cause of concern for human health; moreover, there are no strict regulations in existence related to the monitoring of these viruses in the environment and in water resources [112]. the consumption of contaminated food has also been directly linked with outbreaks of hav in several places. for instance, there were 397 reported cases of hav in michigan between august 2016 and october 2017, with 15 fatalities and 320 hospitalizations. 
the outbreak was linked to consumption of raw scallops served in a sushi chain. in another case in hawaii, 292 cases of hepatitis a infection were reported between june and october 2016, which were linked to contaminated frozen strawberries [113]. the prevalence of hav was determined in different clinical samples collected over a period of 2 years in patras and alexandroupolis, greece. nested real-time pcr revealed that 100% of the isolates belonged to genotype i, and particularly to subgenotype a [114]. the predominance of genotype i, subgenotype a has also been reported in wastewater samples collected in brazil [80]. the prevalence of the virus in wastewater was also reported to be high when few cases of hepatitis a were reported, which could be directly correlated with the shedding of virus in feces. the detection of igm antibodies and anti-hav in blood is related to hepatitis a infection. the presence of anti-hav in blood indicates infection or past vaccination. hav vaccination within 2 weeks of infection is the prevention method for hav infection and consists of a two-dose series for individuals above the age of 12 months. the vaccine contains inactivated virus and is safe for immunocompromised persons [113]. hww is heavily loaded with pathogens, and the consequences of discharging it directly into aquatic bodies are evident in skin infections and intestinal parasites. around the globe, millions of people are affected by deadly parasitic infections; for instance, 2.8 million suffer from giardiasis, 50 million from amebic dysentery, 740 million from hookworm infections, 795 million from trichuriasis, and 1.2 billion from ascariasis [13]. these parasites are mostly transmitted through the fecal-oral route under poor hygienic conditions, through contaminated water and food sources, and through poor wastewater disposal practices. 
parasites in their infective stages, such as cysts, eggs, and oocysts, survive adverse environmental conditions and many wastewater treatment processes due to the presence of a protective outer layer. therefore parasites can survive in wastewater for an extended time period in comparison with viruses and bacteria [115]. samples collected from a hospital sewage treatment plant located in zaria, nigeria contained eggs, cysts, and oocysts of various parasites. about 1648 eggs, cysts, and oocysts were present per liter of wastewater. ascaris lumbricoides was the most common parasite found in the hww with 307 eggs per liter (18.67%) of wastewater, followed by 287 eggs (17.42%) of taenia spp., 253 eggs (15.35%) of schistosoma spp., 176 eggs of toxocara spp., 135 eggs (8.19%) of ancylostoma spp., 130 eggs (7.89%) of cryptosporidium parvum, and 58 eggs (3.52%) of giardia lamblia. moreover, 92 eggs (5.58%) of trichuris and hymenolepis spp. were also found in the hww. the cysts of entamoeba histolytica and a. lumbricoides were found in many studies and remained viable for long periods in pond effluents that were then used to irrigate raw vegetables, thus entering the food chain and then reaching humans directly [13]. c. parvum causes the disease cryptosporidiosis and g. lamblia causes giardiasis; both could be life threatening in persons with a weak immune system. diarrhea in children is in many cases caused by cryptosporidium species [116]. taenia species can cause cysticercosis in humans, which can infect the muscles and brain and can ultimately cause the onset of seizures in adults, and they can also cause bovine cysticercosis in cattle after consumption of contaminated water. the main symptoms of parasitic infection include nausea, vomiting, malabsorption, diarrhea, and stomach cramps [117]. therefore wastewater should be treated before it is discharged into water bodies; otherwise it could be a great risk to public health. 
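the percentage shares quoted above for the zaria hospital sewage treatment plant can be re-derived from the per-liter counts. the following is a minimal python sketch, assuming that every percentage is taken relative to the stated total of about 1648 eggs, cysts, and oocysts per liter; the dictionary keys and the helper name share_percent are illustrative and do not come from the cited study.

```python
# Counts per liter of wastewater as quoted in the text (Zaria, Nigeria).
# Assumption: each percentage is relative to the stated total of 1648.
TOTAL_PER_LITER = 1648

COUNTS_PER_LITER = {
    "Ascaris lumbricoides": 307,
    "Taenia spp.": 287,
    "Schistosoma spp.": 253,
    "Toxocara spp.": 176,
    "Ancylostoma spp.": 135,
    "Cryptosporidium parvum": 130,
    "Trichuris and Hymenolepis spp.": 92,
    "Giardia lamblia": 58,
}


def share_percent(count, total=TOTAL_PER_LITER):
    """Return the share of `count` in `total` as a percentage (2 decimals)."""
    return round(100.0 * count / total, 2)


if __name__ == "__main__":
    for parasite, count in COUNTS_PER_LITER.items():
        print(f"{parasite}: {count} per liter, {share_percent(count)}%")
```

under this assumption most of the quoted percentages are reproduced; the computed share for a. lumbricoides (about 18.63%) is slightly lower than the quoted 18.67%, and the listed counts add up to less than 1648, so the total presumably includes forms that are not itemized in the text.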
antibiotics are among the most successful and important drugs used in therapeutic applications, and the indiscriminate use of these compounds has led to their release into the environment. the overuse and misuse of antibiotics, not only in human therapeutics but also in veterinary, agriculture, and aquaculture applications [118], has led to the emergence of args and arb, which compromise or decrease the effect of antibiotic compounds as bacteria become resistant to multiple drugs, thus causing major concern. some studies have shown that resistant bacteria can also be present where antibiotic concentrations are low, suggesting that subinhibitory concentrations, similar to those found in some aquatic and soil environments, can also promote resistance among bacteria [118]. just as natural antibiotics have existed for billions of years, args are also ancient [119]. args have been found in places where anthropogenic activities are minimal, such as genes encoding β-lactamase in a remote alaskan soil, suggesting that the environment is a reservoir of args [120]. antibiotic-resistant bacteria have been reported from lechuguilla cave, new mexico, which has been isolated for more than 4 million years, including some strains that were resistant to over 14 different antibiotics [121], and from ancient permafrost samples in which communities harbored resistance mechanisms at least 5000 years ago [122]. such studies improve our understanding of the prevalence of resistance genes in environments long before the human use of antibiotics. in recent years, evidence has shown mobilization of these resistance genes from the environment into pathogenic bacteria, causing health risks to humans and animals and demonstrating a link between environmental and clinical resistance [123]. 
moreover, due to the introduction of antibiotics from various human activities, the environment has turned into a reactor of arb and args, contributing to the evolution and spread of resistance genes. the emergence and spread of args has increased so intensely that 70% of all hospital-acquired infections show resistance toward at least one family of antibiotics. hww represents an important source of args and arb, and such effluents are highly hazardous due to their infectious and toxic features [78]. traditionally, resistance was viewed as a healthcare problem, but the nonclinical environment has now also been reported to be an important factor in the dissemination of resistance genes [124]. a wide range of antibiotics and args is released from hospitals and urban wastewaters and received by wwtps [125]. the wwtp also serves as a hotspot for the emergence and spread of args and arb in the ecosystem [126]. even after treatment, some of the antibiotics and args are not completely removed and are released into the receiving water bodies, making aquatic ecosystems an ideal place for the acquisition and spread of such genes. commensal bacteria of humans and animals also constitute a reservoir of such resistance genes, which can enter the environment through sewer systems or the use of animal manure [127, 128]. other than through wwtp effluents, args can reach soils, sediments, surface water, and groundwater bodies, including drinking water systems [129, 130], by surface runoff or infiltration of water that has been used for agricultural purposes [118]. the who has estimated that arb are responsible for 25,000, 23,000, and 38,000 deaths per year in the european union, the united states, and thailand, respectively [131]. resistance has been increasing in many human pathogens such as e. coli, a. baumannii, enterococcus sp., klebsiella pneumoniae, s. aureus, p. 
aeruginosa, serratia marcescens, citrobacter sp., and other enterobacter species, and a growing number of reports concern the prevalence and dissemination of these pathogens in various environmental settings [132-134]. knapp et al. [135] reported that args from different major classes of antibiotics, tested over the period from 1940 to 2008, had increased significantly since 1940, with tetracycline args around 15 times more abundant than in the 1970s. a resistance gene can be specific to one antibiotic, for example ciprofloxacin, or to a group of antibiotics such as β-lactams; as a result, hundreds of args are being detected [136]. genes conferring resistance are commonly observed for aminoglycoside (aac, aad, aph), chloramphenicol (cat, cml), sulfonamide (suli, sulii, suliii, sula), trimethoprim (dfra, dfrb), quinolone (qnra, b, s), tetracycline (teta-e, g, h, j, y, z, etc.), vancomycin (vana, vanb), macrolide (erma-c, e, f, t, v, x), and β-lactam and penicillin (bla, meca, pena) antibiotics in different environments [124, 136, 137]. infections caused by resistant strains are difficult to treat and have led to increasing rates of infection, higher hospital costs, and high rates of morbidity and mortality; for example, strains producing extended-spectrum β-lactamase (esbl) are responsible for higher rates of mortality [138]. the widespread occurrence of args originating from different settings has contributed to a web of resistance among humans, animals, and the environment; even after treatment, args can still be present and contribute to the spread of resistance in the environment. furthermore, there still exists an unexplored pool of genes that may have the potential to act as args and may be passed to pathogenic bacteria. various hypotheses have been proposed for how resistance genes traverse wwtps and how the treatment process influences them [139] (fig. 4-5).
thus, the development of antibiotic resistance has a broad impact on the environment and human health rather than being just a local health issue, and it has to be addressed. current risk assessment is inadequate, and advanced biological risk assessment evaluations are required to detect the proliferation of args and antibiotics. this includes minimizing the emergence and spread of resistance in the environment and its transmission to humans. to achieve this, it is necessary to define resistance in environmental samples and to standardize testing in those samples, which will in turn require the establishment of more comprehensive databases that combine environmental and clinical metadata. this would help to understand the relationships between resistomes of different settings and would improve the risk assessment of args and arb to further develop control strategies [140].
screening for args present in bacteria is important for optimal antimicrobial therapy of infections in patients, and this need is growing as resistance among bacteria increases. detection also helps predict the spread of resistant organisms and genes throughout the environment. a diversity of detection methods is available for both phenotypic and genotypic determination of args in isolates. antibiotic resistance is a selectable phenotype and can be detected with growth inhibition assays such as the disc diffusion method, broth dilution, gradient strips, or other methods that determine the minimum inhibitory concentration (mic) of antibiotics [141]. the mic is determined for each isolate and, based on the result, the isolate is classified as either susceptible or resistant to the antibiotic [142].
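the interpretation step described above, in which a measured mic is compared against a breakpoint, can be sketched in python. this is only an illustration under stated assumptions: the dilution series, the breakpoint values, and the names `mic_from_dilution` and `classify` are hypothetical, and the intermediate category is added here for illustration. real breakpoints are organism- and drug-specific and are published by standards bodies.

```python
def mic_from_dilution(growth_by_conc):
    """Return the lowest tested concentration (mg/L) with no visible growth.

    Returns None when the isolate grew at every tested concentration,
    i.e. the MIC lies above the tested range.
    """
    inhibitory = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(inhibitory) if inhibitory else None


def classify(mic, susceptible_max, resistant_min):
    """Classify an isolate against two breakpoints (hypothetical values)."""
    # Growth at all concentrations is treated as resistant, assuming the
    # tested range extends past the resistance breakpoint.
    if mic is None or mic >= resistant_min:
        return "resistant"
    if mic <= susceptible_max:
        return "susceptible"
    return "intermediate"


# Hypothetical two-fold dilution series: concentration (mg/L) -> visible growth?
series = {0.25: True, 0.5: True, 1: True, 2: False, 4: False, 8: False}
mic = mic_from_dilution(series)
label = classify(mic, susceptible_max=1, resistant_min=4)
print(mic, label)
```

with this toy series the isolate falls between the two assumed breakpoints, which illustrates why a single susceptible/resistant label can hide gradations of resistance.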
however, certain problems remain with phenotypic testing: resistance is graded rather than binary; the methods are time-consuming (taking several weeks for slow-growing bacteria such as mycobacterium tuberculosis) and culture-dependent; they relate only to the antibiotic concentration tested (making low-level resistance difficult to detect); and they give no information about dissemination of resistance via mobile genetic elements (mges) [141, 143]. to overcome these disadvantages, genotypic or molecular characterization methods such as pcr, hybridization techniques including microarrays, and whole genome sequencing (wgs) are used extensively to determine specific resistance genes, providing results within hours. genotypic characterization has led to the identification of args in fastidious bacteria that express few phenotypic features; for example, tropheryma whipplei, which causes whipple's disease, showed mutations in the gyra and parc genes conferring fluoroquinolone resistance [144].
fig. 4-5: the wwtp compartments (influent, activated sludge, effluent) present contrasting conditions, including different concentrations of metals, antibiotics, etc. changing stress concentrations may act as drivers for the microbial community and resistome, changing biomass per volume and the persistence or enrichment of arb. this can result in a strong shift in the whole microbial community composition and its antibiotic-resistant subset; these changes are expected to correlate with changes in the resistome. contig analysis allows further identification of marker genes for mobile genetic elements. finally, horizontal gene transfer may help in the transfer of resistance genes and may act on evolutionary timescales.
the use of a variety of pcr formats, such as standard, real-time, and multiplex pcr, to detect genes encoding resistance to aminoglycoside, chloramphenicol, macrolide, β-lactam, penicillin, trimethoprim, and tetracycline in bacteria is quite common [136, 145, 146].
isothermal amplification methods such as loop-mediated isothermal amplification (lamp) [147] are very rapid, are performed at one constant temperature, and have been developed to detect esbl genes and carbapenemase genes in bacteria, but they cannot be multiplexed to detect several genes simultaneously. very recently, a paper-based chip integrated with lamp and a switch molecule for fluorescent detection of args has been developed to allow more convenient and efficient detection, especially in resource-limited conditions [148]. many other pcr-based detection and genotyping techniques have been developed to identify args, such as restriction fragment length polymorphism-pcr, mismatch amplification mutation assay-pcr, and pcr-single strand conformation polymorphism, which have been used to detect gyra mutations in quinolone-resistant isolates [149]. schwartz et al. [150] used pcr to detect vana (vancomycin-resistance gene), meca (methicillin-resistance gene), and ampc (ampicillin-resistance gene) in wastewater, surface water, and drinking water biofilms. recently, both pcr and rt-pcr have been used across europe for detection of plasmid-mediated mcr-1 genes conferring resistance to colistin [151-154] (table 4-4). multiplex rt-pcr has been used to identify staphylococci from blood samples by targeting meca and other species-specific genes [158]. another example of multiplex pcr is the detection of prevalent esbl genes in e. coli, mainly bla shv , bla tem , bla ctx-m , and bla oxa [156]. multiplex pcr can also be devised to detect 35 genetically diverse tetracycline resistance genes [136], making it one of the most widely used techniques [159]. however, these methods have low throughput, sometimes give false results, mostly ignore nonculturable bacteria as potential reservoirs of resistance, and depend on primers, which leaves little room for the discovery of novel genes.
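the dependence of pcr on primer design can be illustrated with a small in silico sketch. the python code below assumes exact primer matching on a single strand, which is a deliberate simplification (real primers tolerate some mismatches, and annealing thermodynamics matter). the sequences, target names, and function names are invented for illustration and do not correspond to any real arg or published primer set.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the predicted amplicon, or None if a primer has no exact site."""
    start = template.find(fwd_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site appears as the reverse complement.
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def multiplex(template, panel):
    """Run every primer pair of a panel against one template."""
    return {target: in_silico_pcr(template, fwd, rev)
            for target, (fwd, rev) in panel.items()}


# Hypothetical primer panel and toy templates (not real gene sequences).
panel = {"target_a": ("ATGGCT", "TTACGC"), "target_b": ("GGGTTT", "AAACCC")}
known = "CCATGGCTGATTACAGGGCGTAATT"
variant = "CCATGACTGATTACAGGGCGTAATT"  # one substitution in the forward site

print(multiplex(known, panel))
print(multiplex(variant, panel))
```

the variant template differs from the known one by a single base inside the forward primer site, and under the exact-match assumption it yields no product. this is the sense in which primer-based assays leave little room for detecting novel or divergent genes.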
molecular hybridization is one of the oldest molecular techniques used to detect the presence of specific args, and several improvements have since been made to probe design and synthesis. southern hybridization demonstrated that tet genes and class-i integrons can be cotransferred from isolates present in soil to e. coli and/or pseudomonas putida [160]. pcr-southern blot assays were used frequently and reported tetracycline (tet39) and sulfonamide (sulii) resistance genes in acinetobacter species from fish farms in thailand [157]. probes can be labeled with a variety of reporters, including radioactive and nonradiolabeled systems. one nonradiolabeled method is fluorescence in situ hybridization, which has been used for rapid detection of macrolide, clarithromycin, and linezolid resistance [136].
high-throughput dna microarray is another technique that has been successfully used to detect args in a test organism in comparison to a reference strain, and it works with high speed and high sensitivity. in this technique, probes specific to a particular gene are spotted on a solid substrate (e.g., a glass slide). dna is labeled and hybridized, and the specific target-probe duplexes are detected. genomic diversity can be compared across a large number of test isolates for which wgs data are not available [145]. the technique enables detection of a large number of single genes, mutations, and mges and can also characterize a strain at the molecular level [161]. it has been used for antimicrobial resistance gene detection in a diverse range of bacteria [162]. a glass-based microarray has been developed for the detection of 17 tetracycline-resistance genes and one β-lactamase gene in multiple bacteria, using pcr products of c. 550 base pairs as microarray probes [134]. perreten et al. [163] developed a microarray for rapid and efficient screening of gram-positive bacteria for the presence of 90 args, which was further improved with additional oligonucleotides to detect 117 args [163, 164].
similarly, alere microarrays have been developed that are capable of detecting 75 clinically relevant antibiotic resistance genes and can be used in routine diagnostic laboratories [165]. disadvantages of microarrays include limited sensitivity, high cost (due to the use of platforms with probes of short oligonucleotides or pcr products as well as fluorescent dyes), and problems in dealing with complex samples [166, 167].
like pcr and microarrays, wgs is another potential method to detect genes and mutations conferring antibiotic resistance. its main advantage is the ability to detect different targets simultaneously and to subtype specific gene variants. new targets can be added to the database for analysis, and sequenced isolates can be rapidly reanalyzed in silico. to determine the presence of args within wgs data, comprehensive databases containing the relevant target dna are required, along with appropriate bioinformatics methods for interrogating the wgs data. databases that have been used for detection of args in wgs data include: resfinder, a web server that uses blast for identification of args [168]; the comprehensive antibiotic resistance database (card), which offers blast and the resistance gene identifier as two analysis options [169]; antibiotic resistance gene-annotation (arg-annot), which lets the user run local blast in the bio-edit software so that analysis can be done without a web interface [170]; and the antibiotic resistance genes database (ardb), a manually curated database unifying publicly available information on args such as resistance profile, ontology, mechanism of action, and much more [171].
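to make the database lookup concrete, the following python sketch screens an assembled contig against a small reference collection. it is not the algorithm used by any of the tools named above: shared k-mer fraction is swapped in for blast alignment to keep the example self-contained, and the reference sequences, gene names, and thresholds are made-up placeholders.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_contig(contig, reference_genes, k=8, min_fraction=0.9):
    """Report reference genes whose k-mers are mostly present in the contig.

    The score is the fraction of a gene's distinct k-mers found in the
    contig; this is a crude stand-in for alignment identity and coverage.
    """
    contig_kmers = kmers(contig, k)
    hits = {}
    for name, gene in reference_genes.items():
        gene_kmers = kmers(gene, k)
        if not gene_kmers:  # gene shorter than k, cannot be scored
            continue
        fraction = len(gene_kmers & contig_kmers) / len(gene_kmers)
        if fraction >= min_fraction:
            hits[name] = round(fraction, 2)
    return hits


# Hypothetical reference "database": arbitrary strings, not real ARGs.
reference_genes = {
    "toy_gene_a": "ATGGCTAGCTTGACCGATCGTACGGA",
    "toy_gene_b": "ATGCCGTTAGGCTAACGTTCAGGCAT",
}
# Toy contig that carries toy_gene_a between short flanking sequences.
contig = "GGCC" + reference_genes["toy_gene_a"] + "TTAACC"

hits = screen_contig(contig, reference_genes)
print(hits)
```

because new entries only need to be added to `reference_genes`, previously sequenced contigs can be rescreened without further laboratory work, which mirrors the in silico reanalysis advantage of wgs described in the text.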
other knowledge resources on amr include antibiotic resistance genes online, the collection of antimicrobial peptides, the database of antimicrobial activity and structure of peptides, the antimicrobial peptide database, and bacmet, which differ from each other in the molecular-level knowledge of args they reflect and the scope of resistance mechanisms they cover [172]. however, one major drawback of using molecular techniques for arg detection is that newly emerging resistances against some antibiotics may be overlooked, as observed for the ndm-1 gene encoding resistance to carbapenem antibiotics, which was first isolated using phenotypic methods and then characterized on the basis of genotype [141]. thus, each phenotypic and genotypic method has its limitations, and a combination of these screening methods can be used to monitor resistance. together, these techniques help in predicting resistance genes, setting up control measures for hospital infections, adapting to a specific therapy, and providing routine detection tools [173].
molecular analysis tools such as pcr, quantitative pcr, and 16s rrna analysis have provided in-depth knowledge about the microbial communities present in wastewater for decades [174]. however, the requirement for gene-specific primers and the species-specific approach of these tools limit them to the detection of certain targeted microbes, providing incomplete information about the microbial communities present in wastewater [175]. recently, metagenomic analysis has been introduced, which overcomes the limitations of conventional molecular analysis tools and is able to generate hundreds to thousands of sequences, providing a complete profile of the microbial communities present in unknown samples, so that a high abundance of potential microbes is detected [176].
metagenomic analysis consists of four steps: isolation of genetic material, cloning of the genetic material, construction of a library, and analysis of the genetic material from the metagenomic library. metagenomics combined with a search engine for amr [177] can analyze unknown samples collected from the environment and can provide full-length arg data. hww has been reported to have two overexpressed β-lactam-resistance genes (bla ges and bla oxa ) compared with water collected from other aquatic bodies, which could be correlated with antibiotic usage over time in hospitals and the discharge of antibiotic residues in the wastewater [176]. hww derived from clinical specialty wards has been identified as a major hotspot for amr. wastewater derived from different wards of a hospital and the final hospital effluent were tested and compared for resistance genes. the wastewater had a high abundance of class a β-lactamase genes (i.e., bla kpc , bla ctx-m , bla shv , bla tem ), and the wastewater from the clinical ward also contained a high level of bla kpc-2 genes (142, 200 x/gb), encoding carbapenem resistance. moreover, assembled scaffolds of incq and incf plasmids carrying quinolone-resistance genes (qnrs1, qnrs2) and the class a β-lactamase gene (bla tem-1 ) were present in the wastewater, which further aids the proliferation of amr [178]. metagenomic functional selection has been integrated with deep metagenomic sequencing to trace validated args in different environments. using these selections, studies have identified a wwtp resistome containing novel args, highlighting the wwtp as a reservoir of uncharacterized resistance genes. such metagenomic methods relate the number of clinically relevant resistance genes to the overall number of functional resistance genes and thus help to improve risk classification in environments [49]. a metagenomics strategy also revealed the association of isolated p.
aeruginosa from hww with amr frequency. the isolated strain was selected as a bio-indicator because of its ability to survive and colonize harsh environments, and it was used to assess viability, antibiotic susceptibility, and diversity in hww. samples collected during different treatment steps of hww revealed that each treatment step decreased the bacterial population by four log cycles; however, the antibiotic resistance profile did not decrease at each treatment level and, on the contrary, amr microbes and genes increased. the microorganism was able to survive in the spore form during the different treatment steps and to transform back into the vegetative form when released into an environment that supports survival [179]. healthcare clinics and hospitals, where antibiotic use is more common, are among the major contributors to the dissemination of pathogenic microorganisms into the environment. in addition, shotgun metagenomic analysis of hww collected from turkey revealed the presence of more than 100 antibiotic resistance determinants, the most common genes being those for resistance to aminoglycosides and β-lactamases. the prediction of high resistome diversity in hww raises an alarm for the health of the human population.
an important step in coping with the serious problem of the emergence and dissemination of args is to understand the pathways by which resistance genes spread. the ability of bacteria to respond to selective pressures and new environments can be explained by the acquisition of new genes through horizontal transfer. studies have shown that around 75% of the genes in each genome have been transferred by hgt during evolution [180]. many resistance genes are located on mges such as plasmids, transposons (also known as jumping genes), and integrons, which are capable of capturing and expressing gene cassettes [181]; these elements carry antibiotic resistance and function as vectors for its dissemination [182].
horizontal acquisitions might be neutral, have deleterious effects in the chromosome, or confer a selective advantage on the host. other than mutation, hgt has been one of the mechanisms most responsible for the dissemination of args among different bacterial species. it has caused the spread of antibiotic resistance from commensal and environmental bacteria to pathogenic species. transfer of args by hgt long predates the use of antibiotics by humans; for example, oxa β-lactamase genes, which confer resistance against β-lactam antibiotics, were mobilized from chromosomes to plasmids millions of years ago [183]. however, the rate of resistance gene transfer has increased tremendously in the last few decades due to the selective pressure caused by human antibiotic use. bacteria employ three mechanisms for hgt: conjugation, transformation, and transduction (fig. 4-6).
conjugation is said to have the greatest influence on the spread of args. conjugation requires physical contact between two cells via pili or adhesins to transfer the genetic material. this mechanism has been found between bacterial cells, with the exception of agrobacterium species, which use it to transfer genes into plants. the conjugative machinery, encoded by genes on plasmids or by integrative conjugative elements on chromosomes, facilitates this process. args are mostly associated with conjugative elements such as plasmids and transposons, and the transfer of these elements is most likely to occur by conjugation because it provides a more efficient way to enter the recipient cell and gives better protection in the environment [119]. conjugation can be of various types, such as (1) transfer of a self-transmissible conjugative plasmid, for example, the rp4 plasmid of e. coli; (2) mobilization of non-self-transmissible plasmids by the action of a conjugative plasmid, for example, the incq plasmid rsf1010; (3) cointegration of two different circular plasmids, which fuse to become one; and
(4) conjugative transposons, which may facilitate mobilization of plasmids or cointegration. the mges that mediate arg transfer have been found in bacteria ranging from soil and water ecosystems to food and human pathogens [185]. the transfer of conjugative elements between phylogenetically distant bacteria indicates the emergence of multiresistance between different reservoirs [186]. bla ctx-m esbl genes have been disseminated to various plasmids within enterobacteriaceae and other pathogens and can be found in various geographical locations [187]. the conjugation of plasmids has caused the worldwide dissemination of args that confer resistance to β-lactams, carbapenems, quinolones, aminoglycosides, colistin, sulfonamides, tetracyclines, and other classes of drugs. conjugation has also been reported between bacteria and eukaryotic cells and is observed in various environments such as soil, aquatic environments, marine sediment, wastewater, and activated sludge [124].
the second mechanism is transformation, in which cells take up naked dna from the environment. transformation has certain prerequisites: extracellular dna must be released and persist in the environment, the recipient cells must be competent, and the translocated dna must be stabilized by integration into the genome or by recircularization into a self-replicating plasmid [188]. the process takes place in two steps: first, the dna substrate is transferred from the cell surface to the cytoplasmic membrane, mediated by the type ii secretion system, the type iv secretion system, and type iv pili; second, the dna is transported across the cytoplasmic membrane through cytoplasmic membrane channels [189].
with the exception of neisseria gonorrhoeae, which is a naturally transformable bacterium [188], most species develop competence only in response to environmental conditions such as autoinducers, peptides, or stress; for example, sublethal concentrations of aminoglycoside and fluoroquinolone antibiotics induce competence in streptococcus pneumoniae and legionella pneumophila [190]. mao et al. [191] developed a method to extract extracellular and intracellular dna from water and sediments. they obtained a higher concentration of extracellular dna than intracellular dna in sediments from a river basin, which thus serves as a major arg reservoir from which genes could be transferred to indigenous bacteria through natural transformation. it has also been shown in the pneumococcal genome that a conjugative transposon disseminates via transformation in addition to conjugation [192].
transduction is the process by which dna is transferred with the help of bacteriophages. bacteriophages can transfer dna sequences such as chromosomal dna and mges (plasmids, transposons, and genomic islands) that are advantageous to their bacterial hosts and that also improve the survival of the bacteriophages [193]. transduction can be generalized, where random host dna is incorporated during cell lysis, or specialized, where the prophage excises itself from the host genome and also incorporates flanking host dna [194]. arg transfer by transduction has also been reported in various bacteria, for example, β-lactamase gene transfer by p1-like bacteriophages in e. coli [195], erythromycin resistance transfer by phage between clostridium difficile strains [196], tetracycline and gentamicin resistance transfer between enterococci [197], and transfer of resistance plasmids in mrsa [198]. bacteriophages can have a broad host range, that is, across different species or among different taxonomic groups [197], indicating that transduction is a common mechanism for the dissemination of args in the environment.
other mechanisms such as gene transfer agents (gtas) and cell fusion have also been described. gtas are delivery systems that carry random pieces of the host genome in capsids, which are then integrated into the recipient chromosome. the amount of dna a gta particle carries is insufficient to encode its own protein components, so gtas cannot self-propagate and do not necessarily carry gta-encoding genes, which is an important distinction from transduction [199]. the first gta was discovered in rhodopseudomonas capsulata and called rcgta; it was able to transfer antibiotic resistance to bacteria by a mechanism similar to transduction that does not require cell contact and is resistant to dnases [200]. apart from the evidence of gtas in various members of the rhodobacterales, for example ruegeria pomeroyi, which contains a complete rcgta-like gene cluster, roseovarius nubinhibens and ruegeria mobilis also showed rates of antibiotic resistance transfer around 10^6-fold higher than rates of transformation and transduction when added to microbial communities [199]. mcdaniel et al. [201] reported gta transfer frequencies around a million times higher than the frequencies of transformation and transduction previously measured in the marine environment. gtas have several advantages over other mechanisms: the dna is protected from environmental factors, transfer is not constrained by cell-cell contact, and the particles contain random host dna rather than mostly bacteriophage dna, as observed in transduction. another mechanism, cell fusion, has been observed in haloferax and sulfolobus species [202, 203]. symmetric and bidirectional cell fusion has been observed in haloferax volcanii.
studies have shown interspecies gene exchange in halophilic archaea, where cells fuse to form a diploid state containing the two different parental chromosomes, facilitating genetic exchange and recombination, followed by separation of the hybrid cells [202]. although bacteria continuously develop new mechanisms, identifying and mapping these resistance mechanisms provides knowledge of the sources of resistance and helps in designing new antimicrobial drugs.
prions are infectious protein particles present in the brain that are responsible for degenerative neurological disorders. in humans, the prnp gene on chromosome 20 produces the prp protein, and prions convert the prp protein into an abnormal, disease-causing isoform [204]. these pathogenic prions disrupt the function of neural cells, resulting in cell death and leading to memory problems, trouble with movement, and personality changes. prions can cause many infectious diseases, such as creutzfeldt-jakob disease (cjd), variably protease-sensitive prionopathy, gerstmann-sträussler-scheinker disease, scrapie, variant cjd, fatal insomnia, and kuru. these transmissible diseases affect a number of mammalian species, including sheep and goats, cattle, deer, moose, elk, and humans. scrapie and cjd are of particular concern because they can be transmitted horizontally and remain infectious in the environment for a very long time [205]. for instance, scrapie infectivity persisted even after burial in soil for 3 years. cjd is the most common human prion disease, is fatal, and arises in three ways: sporadic, genetic, and acquired. in most cases (85%-90%) cjd occurs sporadically, and approximately 1-1.5 people per million are affected annually; no cure is available for cjd or other prion diseases [206].
recently developed methods for the detection of prion infectivity include protein misfolding cyclic amplification, quaking-induced conversion, and quantitative tandem mass spectrometric techniques [207]. however, the study of prions is highly challenging because the high-resolution structure of the prion isoform is still to be determined and changes in structure directly affect the infectivity and pathology of the disease. prions are likely to enter the environment through live, infected hosts and can be shed through saliva, urine, feces, mucus, and blood; they enter wastewater through effluents from hospitals, research facilities, homes, slaughterhouses, and mortuaries [208]. wwtps are not able to remove prions; for instance, after entering a municipal wwtp, prions bind to sewage sludge, survive anaerobic digestion, and remain present in the treated biosolids. these biosolids are then used for land application, which introduces the prions into the environment. thereafter they can be swept along by wind, and prion-contaminated water can migrate into lakes, rivers, and oceans, directly affecting humans, aquatic life, and animals [209].
viroids are small (approximately 250-400 nucleotides), single-stranded, circular rna particles, which are distinguished from viruses by their smaller size, their lack of a capsid, and the fact that they do not encode any protein. they have the capacity to reproduce and are mostly known to cause infection in plants; however, hepatitis δ, which causes infection in humans, shares many properties with viroids and could be related to them [210]. a total of 29 viroid species have been identified, which are grouped into the avsunviroidae and pospiviroidae. viroids are mainly infectious for plants such as tomatoes, cucumbers, avocados, potatoes, apples, coconut palms, and chrysanthemums and are responsible for millions of dollars of lost agricultural revenue each year.
the most commonly reported plant diseases caused by viroids include apple fruit crinkle, tomato chlorotic dwarf, and chrysanthemum chlorotic mottle [211]. the yellowish, curled leaves of affected plants result from pairing of viroid rna with plant messenger rna, which interferes with proper translation. some viroids cause devastating effects; for instance, coconut cadang-cadang viroid has been lethal to coconut palms in the philippines, resulting in the loss of 40 million coconut palms [212]. other reported effects of viroids include epinasty, rugosity, necrosis, chlorosis, stem shortening, color alteration of fruits, and delays in flowering and ripening. microarrays have been identified as a significant tool for detecting many viroids simultaneously and can even detect emerging or established viroids in new hosts.
potent protein toxins such as tetanus and botulinum toxins, anthrax toxin, epsilon-toxin, and enterotoxin can cause some of the most significant diseases in humans and animals. these protein toxins are the most common virulence factors encoded by plasmids in clostridium and bacillus species, for example, the tetanus toxin plasmid and the conjugative toxin plasmids of c. perfringens.
extracellular vesicles (evs) are also infectious particles and are said to be produced by all types of microorganisms. evs can contain nucleic acids, toxins, lipoproteins, adhesins, nutrient scavenging factors, and enzymes that can play a major role in pathogenicity in hww. such vesicles from gram-positive bacteria, mycobacteria, and fungi contain virulence factors that can elicit host immune responses. for example, cryptococcus neoformans evs carry glucuronoxylomannan (a capsular polysaccharide), which is a virulence factor [212]. evs from gram-negative bacteria originate from the outer membrane and are called outer-membrane vesicles; they have been associated with cytotoxicity, virulence, invasion of host cells, and antibiotic resistance proteins.
in addition, certain factors and conditions may further increase the virulence of evs; for example, one study indicated a role for iron-limiting conditions in the virulence of mycobacterium evs. these emerging infectious particles are not easily removed by conventional treatment technologies, so novel methods and techniques are required to deal with them. because of their potential negative effects on human and animal life, these particles should be routinely analyzed for their presence in hww and studied with a view to their removal.
recent years have seen growing emphasis on the management of hww, and the number of studies focused on the microbial communities present in wastewater has increased immensely. the pathogenic microbes present in hww have affected human health for decades, and antibiotic-resistant microbes are also increasing significantly with time. the development of resistance toward antibiotics has been observed worldwide and has challenged both public and animal health. the use and release of various antibiotic agents in different settings has led not only to the prevalence of args in the environment but also to the emergence and spread of resistant bacteria. this has increased resistance in human pathogens, making the infections they cause difficult to treat and leading to higher mortality rates. however, surveillance of the spread and prevalence of resistance in the environment is limited and has to be expanded because of the broad impact of antibiotic resistance on human health. various culture-dependent and culture-independent techniques have allowed characterization and exploration of args and arb, which has increased our understanding of the evolutionary pathways for their dissemination within a community. further, metagenomic tools have advanced research toward analysis of the complete microbial profile and have increased knowledge of the microbial abundance present in hww.
however, the scientific community and public authorities must collaborate on further planning and implementation of strategies, policies, and experimental approaches to limit the use of antibiotics, detect microbial communities (resistant and/or sensitive) in wastewaters, and map resistance mechanisms, in order to clearly understand the role of mges and extracellular dna in the evolutionary process.

the microbiome and resistome of hospital sewage during passage through the community sewer system evaluation of wastewater discharge from al-sadr teaching hospital and its impact on the al-khorah channel and shatt al-arab river in basra city-iraq study of widely used treatment technologies for hospital wastewater and their comparative analysis microbiological and toxicological assessment of pharmaceutical wastewater from the lagos megacity contaminant properties of hospital clinical laboratory wastewater: a physiochemical and microbiological assessment occurrence of pharmaceuticals in finnish sewage treatment plants, surface waters, and their elimination in drinking water treatment processes review on fate and mechanism of removal of pharmaceutical pollutants from wastewater using biological approach hospital effluents as a source of emerging pollutants: an overview of micropollutants and sustainable treatment options groundwater contamination by microbiological and chemical substances released from hospital wastewater: health risk assessment for drinking water consumers antibiotic resistant bacteria from treated and untreated hospital wastewater at ayder referral hospital, mekelle, north ethiopia survival of opportunistic fungi in clinical wastes molecular detection and genotypic characterization of enteric adenoviruses in a hospital wastewater parasitological profile of raw wastewater and the efficacy of biosand filter in reduction of parasite ova and cysts disinfectant-resistant bacteria in buenos aires city hospital wastewater separate treatment of
hospital and urban wastewaters: a real scale comparison of effluents and their effect on microbial communities a global overview of treated wastewater guidelines and standerds for agricultural reuse determination of pathogenic bacteria in wastewater using conventional and pcr techniques pathogen detection methodologies for wastewater and reserviors dpcr: a technology review lab-on-a-chip technology for environmental monitoring of microorganisms the route of antimicrobial resistance from the hospital effluent to the environment: focus on the occurrence of kpcproducing aeromonas spp. and enterobacteriaceae in sewage hospital wastewater releases of carbapenem-resistance pathogens and genes in urban india assessment of antibiotic-and disinfectant-resistant bacteria in hospital wastewater, south ethiopia: a cross-sectional study antibiotic susceptibilities of enterococcus species isolated from hospital and domestic wastewater effluents in alice, eastern cape province of south africa beta-lactamase-producing enterobacteriaceae in hospital effluents insights into the relationship between antimicrobial residues and bacterial populations in a hospital-urban wastewater treatment plant system influence of hospital wastewater discharged from university of benin teaching hospital (ubth) multiresistance, beta-lactamaseencoding genes and bacterial diversity in hospital wastewater in rio de janeiro, brazil enumeration and characterization of antimicrobial-resistant escherichia coli bacteria in effluent from municipal, hospital, and secondary treatment facility sources antibiotic resistance and antibiotic resistance genes in escherichia coli isolates from hospital wastewater in vietnam antibiotic resistant bacteria in hospital wastewaters and sewage treatment plants dissemination of antibiotic resistance in methicillin-resistant staphylococcus aureus and vancomycin-resistant s aureus strains isolated from hospital effluents detection of antimicrobial-resistant gram-negative 
bacteria in hospital effluents and in the sewage treatment station of goiânia brazil antimicrobial resistance of 3 types of gram-negative bacteria isolated from hospital surfaces and the hands of health care workers abundance of antibiotics, antibiotic resistance genes and bacterial community composition in wastewater effluents from different romanian hospitals vancomycin resistant enterococci: from the hospital effluent to the urban wastewater treatment plant the microbiological effects of hospital wastes on the environment hospital effluent: a source of multiple drug-resistant bacteria hospital-associated microbiota and implications for nosocomial infections vancomycin resistance in staphylococcus aureus escherichia coli (e. coli 0157 h7), statpearls [internet citrobacter freundii fitness during bloodstream infection pathogenesis of proteus mirabilis infection pantoea agglomerans cutaneous infection influence of a non-hospital medical care facility on antimicrobial resistance in wastewater bacterial communities in full-scale wastewater treatment systems bacterial community characteristics under long-term antibiotic selection pressures limited influence of hospital wastewater on the microbiome and resistome of wastewater in a community sewerage system limited dissemination of the wastewater treatment plant core resistome the hospital water environment as a reservoir for carbapenem-resistant organisms causing hospital-acquired infections-a systematic review of the literature investigating the potential use of an antarctic variant of janthinobacterium lividum for tackling antimicrobial resistance in a one health approach inactivation of aspergillus spores in clinical wastes by supercritical carbon dioxide, arab overcoming antifungal resistance survival of some medically important fungi on hospital fabrics and plastics multidrug-resistant candida: epidemiology, molecular mechanisms, and treatment treatment of infections due to aspergillus terreus species complex 
candida and invasive mould diseases in non-neutropenic critically ill patients and patients with haematological cancer the spectrum of fungi that infects humans, cold spring harb emerging infectious diseases with cutaneous manifestations: fungal, helminthic, protozoan and ectoparasitic infections combination therapy for the treatment of pulmonary mold infections treatment of paecilomyces variotii pneumonia with posaconazole: case report and literature review first report of monomicrobial candida parapsilosis necrotizing fasciitis genetic diversity and in vitro antifungal susceptibility of 200 clinical and environmental aspergillus flavus isolates aspergillus niger infection in an immunosuppressed patient confined solely to the brain central lineàassociated mucor velutinosus bloodstream infection in an immunocompetent pediatric patient microbiological and antimicrobial analysis of hospital wastewater discharged into the soil environment trichophyton rubrum infection characterized by majocchi's granuloma and deeper dermatophytosis: case report and review of published literature darier disease complicated by terbinafine-resistant trichophyton rubrum: a case report zygomycetes in human disease cryptococcosis, diagnosis and treatment of human mycoses management of candidemia and invasive candidiasis introduction to fungi, the plant health instructor how a fungus shapes biotechnology: 100 years of aspergillus niger research fatal invasive aspergillosis caused by aspergillus niger after bilateral lung transplantation fungal infections from human and animal contact upsurge in curvularia infections and global emerging antifungal drug resistance fungal opportunist infection: common and emerging fungi in immunocompromised patients antifungal resistance: current trends and future strategies to combat longitudinal study on occurrence of adenoviruses and hepatitis a virus in raw domestic sewage in the city of limeira quantification and molecular characterization of enteric 
viruses detected in effluents from two hospital wastewater treatment plants the repertoire of adenovirus in human disease: the innocuous to the deadly acute lower respiratory tract infections in soldiers first isolation of a new type of human adenovirus (genotype 79), species human mastadenovirus b (b2) from sewage water in japan quantitative real-time pcr assays for detection of human adenoviruses and identification of serotypes 40 and 41 human adenovirus removal in wastewater treatment and membrane process evaluation of human adenovirus and human polyomavirus as indicators of human sewage contamination in the aquatic environment technical aspects of using human adenovirus as a viral water quality indicator human adenovirus surveillance-united states detection and molecular characterization of human rotaviruses isolated in italy and albania rotavirus disease mechanisms diarrhea, vomiting and inflammation: how and why occurrence of common pollutants and pharmaceuticals in hospital effluents rotavirus detection in environmental water samples by tangential flow ultrafiltration and rt-nested pcr first molecular detection of group a rotaviruses in drinking water sources in beijing, china first molecular detection of group a rotavirus in urban and hospital sewage systems by nested-rt pcr in shiraz, iran comparative study of enteric viruses, coliphages and indicator bacteria for evaluating water quality in a tropical high-altitude system surveillance of noroviruses in rio de janeiro, brazil: occurrence of new giv genotype in clinical and wastewater samples norovirus gene expression and replication emerging recombinant noroviruses identified by clinical and waste water screening quantification and molecular characterization of norovirus after two wastewater treatment procedures standardized multiplex one-step qrt-pcr for hepatitis a virus, norovirus gi and gii quantification in bivalve mollusks and water prevalence of norovirus and factors influencing virus concentrations 
during one year in a full-scale wastewater treatment plant detection and molecular characterization of noroviruses from five sewage treatment plants in central italy one-year monthly survey of rotavirus, astrovirus and norovirus in three sewage treatment plants in beijing, china and associated health risk assessment progress on norovirus vaccine research: public health considerations and future directions hepatitis a: epidemiology and prevention in developing countries hepatitis a virus subgenotyping based on rt-qpcr assays type a viral hepatitis: a summary and update on the molecular virology, epidemiology, pathogenesis and prevention quantitative pcr detection and characterisation of human adenovirus, rotavirus and hepatitis a virus in discharged effluents of two wastewater treatment facilities in the eastern cape fate of viruses in water systems incidence of human adenoviruses and hepatitis a virus in the final effluent of selected wastewater treatment plants in eastern cape province, south africa monitoring of adenovirus serotypes in environmental samples by combined pcr and melting point analyses hepatitis a virus: essential knowledge and a novel identify-isolate-inform tool for frontline healthcare providers molecular characterization of hepatitis a virus isolates from environmental and clinical samples in greece public health risks associated with food-borne parasites prevalence of protozoa species in drinking and environmental water sources in sudan comparative study on waterborne parasites between malaysia and thailand: a new insight antibiotics in the aquatic environment-a review-part i dissemination of antimicrobial resistance in microbial ecosystems through horizontal gene transfer functional metagenomics reveals diverse β-lactamases in a remote alaskan soil antibiotic resistance is prevalent in an isolated cave microbiome functional characterization of bacteria isolated from ancient arctic soil exposes diverse resistance mechanisms to modern 
antibiotics the role of aquatic ecosystems as reservoirs of antibiotic resistance environmental dissemination of antibiotic resistance genes and correlation to anthropogenic contamination with antibiotics occurrence of antibiotics and antibiotic resistance genes in hospital and urban wastewaters and their impact on the receiving river urban wastewater treatment plants as hotspots for antibiotic resistant bacteria and genes spread into the environment: a review levels of antibiotic resistance genes in manure, biosolids, and fertilized soil epidemiology of resistance to antibiotics: links between animals and humans detection of antibiotic resistance genes in source and drinking water samples from a first nation community in canada prevalence of antibiotic resistance in drinking water treatment and distribution systems antimicrobial resistance: global report on surveillance, world health organization society's failure to protect a precious resource: antibiotics hospital effluents are one of several sources of metal, antibiotic resistance genes, and bacterial markers disseminated in sub-saharan urban rivers identifying antimicrobial resistance genes with dna microarrays evidence of increasing antibiotic resistance gene abundances in archived soils since 1940 antibiotic resistance genes in water environment prevalence of antibiotic resistance genes and their relationship with antibiotics in the huangpu river and the drinking water sources multiple drug resistance and biocide resistance in escherichia coli environmental isolates from hospital and household settings wastewater treatment plant resistomes are shaped by bacterial composition, genetic exchange, and upregulated expression in the effluent microbiomes tackling antibiotic resistance: the environmental framework screening methods for the detection of antimicrobial resistance genes present in bacterial isolates and the microbiota culture-based methods for detection of antibiotic resistance in agroecosystems: 
advantages, challenges, and gaps in knowledge insights into antibiotic resistance through metagenomic approaches molecular evaluation of antibiotic susceptibility: tropheryma whipplei paradigm molecular methods for detection of antimicrobial resistance multi-centre evaluation of real-time multiplex pcr for detection of carbapenemase genes oxa-48 molecular detection of hepatitis a virus in urban sewage in rio de janeiro, brazil simultaneous detection of antibiotic resistance genes on paper-based chip using molecular detection of antimicrobial resistance detection of antibiotic-resistant bacteria and their resistance genes in wastewater, surface water, and drinking water biofilms occurrence and characterization of mcr-1-harbouring escherichia coli isolated from pigs in great britain from 2013 to co-occurrence of extended spectrum β lactamase and mcr-1 encoding genes on plasmids detection of mcr-1 encoding plasmid-mediated colistin-resistant escherichia coli isolates from human bloodstream infection and imported chicken meat emergence of plasmid-mediated colistin resistance mechanism mcr-1 in animals and human beings in china: a microbiological and molecular biological study antibiotic resistance elements in wastewater treatment plants: scope and potential impacts, wastewater reuse and current challenges molecular epidemiology of extended-spectrum β-lactamases among escherichia coli isolates collected in a swedish hospital and its associated health care facilities from the tetracycline resistance determinant tet 39 and the sulphonamide resistance gene sulii are common among resistant acinetobacter spp. 
isolated from integrated fish farms in thailand usefulness of multiplex real-time pcr for simultaneous pathogen detection and resistance profiling of staphylococcal bacteremia tetracycline antibiotics: mode of action, applications, molecular biology, and epidemiology of bacterial resistance, microbiol class 1 integrons and tetracycline resistance genes in alcaligenes, arthrobacter, and pseudomonas spp. isolated from pigsties and manured soil molecular methods for detection of antibiotic resistance, antimicrobial resistance in bacteria of animal origin dna microarray detection of antimicrobial resistance genes in diverse bacteria microarray-based detection of 90 antibiotic resistance genes of gram-positive bacteria a novel universal dna labeling and amplification system for rapid microarray-based detection of 117 antibiotic resistance genes in gram-positive bacteria evaluation of an expanded microarray for detecting antibiotic resistance genes in a broad range of gram-negative bacterial pathogens development of a miniaturised microarray-based assay for the rapid identification of antimicrobial resistance genes in gramnegative bacteria metagenomic and network analysis reveal wide distribution and co-occurrence of environmental antibiotic resistance genes identification of acquired antimicrobial resistance genes the comprehensive antibiotic resistance database arg-annot, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes ardb-antibiotic resistance genes database card 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database genetic methods for detection of antimicrobial resistance bacterial communities and antibiotic resistance communities in a full-scale hospital wastewater treatment plant by high-throughput pyrosequencing metagenomic analysis reveals the impact of wastewater treatment plants on the dispersal of microorganisms and genes in aquatic sediments overexpression of antibiotic resistance 
genes in hospital effluents over time using the whole genome sequence to characterize and name human adenoviruses characterization of metagenomes in urban aquatic compartments reveals high prevalence of clinically relevant antibiotic resistance genes in wastewaters diversity and antibiotic resistance profiles of pseudomonads from a hospital wastewater treatment plant trends and barriers to lateral gene transfer in prokaryotes plasmid encoded antibiotic resistance: acquisition and transfer of antibiotic resistance genes in bacteria detection of 140 clinically relevant antibiotic-resistance genes in the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to selected antibiotics phylogenetic analysis shows that the oxa b-lactamase genes have been on plasmids for millions of years antimicrobial-resistant bacteria in the community setting genetic exchange between bacteria in the environment large-scale analysis of plasmid relationships through genesharing networks ctx-m enzymes: origin and diffusion mechanisms of, and barriers to, horizontal gene transfer between bacteria two steps away from noveltyàprinciples of bacterial dna uptake bacterial transformation: distribution, shared mechanisms and divergent control persistence of extracellular dna in river sediment facilitates antibiotic resistance gene propagation composite mobile genetic elements disseminating macrolide resistance in streptococcus pneumoniae transfer of antibiotic-resistance genes via phagerelated mobile elements horizontal gene transfer: building the web of life characterization of a p1-like bacteriophage encoding an shv-2 extended-spectrum β-lactamase from an escherichia coli strain phage φc2 mediates transduction of tn6215, encoding erythromycin resistance, between clostridium difficile strains bacteriophage-mediated transduction of antibiotic resistance in enterococci efficient transfer of antibiotic resistance plasmids by transduction within methicillin-resistant 
staphylococcus aureus usa300 clone gene transfer agents: phage-like elements of genetic exchange genetic recombination in rhodopseudomonas capsulata high frequency of horizontal gene transfer in the oceans cell fusion and hybrids in archaea: prospects for genome shuffling and accelerated strain development for biotechnology a multicopy plasmid of the extremely thermophilic archaeon sulfolobus effects its transfer to recipients by mating current infection control recommendations for prion disease; a difficult problem for the deployed personnel of the armed forces kinetics of ozone inactivation of infectious prion protein epidemiological characteristics of human prion diseases early preclinical detection of prions in the skin of prion-infected animals prions in the environment: occurrence, fate and mitigation persistence of pathogenic prion protein during simulated wastewater treatment processes subviral agents virus taxonomy: ninth report of the international committee on taxonomy of viruses detection of coconut cadang-cadang viroid (cccvd) in oil palm by reverse transcription loop-mediated isothermal amplification (rt-lamp)

key: cord-017156-ximzvqbm authors: forsdyke, donald r. title: chargaff’s gc rule date: 2010-05-18 journal: evolutionary bioinformatics doi: 10.1007/978-0-387-33419-6_8 sha: doc_id: 17156 cord_uid: ximzvqbm

evolutionary selective pressures sometimes act to preserve nucleic acid features at the expense of encoded proteins. that this might occur in the case of nucleic acid secondary structure was noted in chapter 5. that this might also apply to the species-dependent component of the base composition, (g+c)%, was shown by sueoka in 1961 [2]. the amino acid composition of the proteins of bacteria is influenced, not only by the demands of the environment on the proteins, but also by the (g+c)% of the genome encoding those proteins.
chargaff's "gc rule" is that the ratio of (g+c) to the total bases (a+g+c+t) tends to be constant in a particular species, but varies between species. sueoka further pointed out that for individual "strains" of tetrahymena (ciliated protozoans) the (g+c)% (referred to as "gc") tends to be uniform throughout the genome: "if one compares the distribution of dna molecules of tetrahymena strains of different mean gc contents, it is clear that the difference in mean values is due to a rather uniform difference of gc content in individual molecules. in other words, assuming that strains of tetrahymena have a common phylogenetic origin, when the gc content of dna of a particular strain changes, all the molecules undergo increases or decreases of gc pairs in similar amounts. this result is consistent with the idea that the base composition is rather uniform not only among dna molecules of an organism, but also with respect to different parts of a given molecule." again, this observation has since been shown to apply to a wide variety of species, although many organisms have their genomes finely sectored into regions ("homostability regions" or "isochores") of low or high (g+c)% (see later). sueoka also noted a link between (g+c)% and reproductive isolation for strains of tetrahymena: "dna base composition is a reflection of phylogenetic relationship. furthermore, it is evident that those strains which mate with one another (i.e. strains within the same 'variety') have similar base compositions. thus strains of variety i ..., which are freely intercrossed, have similar mean gc content." it seems that, in identifying (g+c)% as the component of the base composition that varies between species, chargaff had uncovered what can now be recognized as the "holy grail" of speciation postulated by the victorian physiologist george romanes [3].
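the quantity behind the gc rule, (g+c) as a percentage of (a+g+c+t), can be illustrated with a minimal python sketch. the function names `gc_percent` and `windowed_gc` and the toy sequence are hypothetical choices made here for illustration; they are not taken from the chapter, and the window size is arbitrary. the windowed version only hints at how one might look for the uniformity (or the sectoring into regions of low or high (g+c)%) described above.

```python
def gc_percent(seq: str) -> float:
    """Return (G+C)% of a DNA sequence, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def windowed_gc(seq: str, window: int, step: int) -> list[float]:
    """(G+C)% of successive windows, a rough view of uniformity along a molecule."""
    return [
        gc_percent(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATGCGCGCTTAAATATGCGC"  # hypothetical sequence, not real data
    print(gc_percent(toy))                      # whole-sequence (G+C)%
    print(windowed_gc(toy, window=10, step=5))  # (G+C)% per window
```

for real genomes one would read sequences from a file and use much larger windows; the sketch only fixes the definition of the ratio.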
romanes had drawn attention to the possibility of what we would now call non-genic variations (germ-line mutations that usually do not affect gene products). as manifest in the phenomenon of hybrid sterility, these would tend to isolate an individual reproductively from most members of the species to which its close ancestors had belonged, but not from individuals that had undergone the same non-genic variation. romanes held that, in the general case, this isolation was an essential precondition for the preservation of the anatomical and physiological characteristics (genic characteristics) that were distinctive of a new species. in the early decades of the twentieth century william bateson also postulated non-genic inherited variations that tend to remain relatively constant (vary only within narrow limits) within a species, but would vary between species (i.e. a species member would not differ from its fellow species members, but would differ from members of allied species). the non-genic variations, in whatever was responsible for carrying hereditary information from generation to generation (not known at that time), would have the potential to lead to species differentiation, so that variant individuals (constituting a potential "not-self" incipient species) would end up not being able to reproduce with members of the main species ("self" species). reproduction being unsuccessful, the main species can be viewed as constituting a "reproductive environment" that moulds the genome phenotype ("reprotype") by negatively selecting (by denying reproductive success to) variant organisms that attempt (by mating and producing healthy, fertile offspring) to recross the emerging interspecies boundary. thus, the main species positively selects itself by negatively selecting variants.
should these variants find compatible mates, then they might accumulate as a new species that, in turn, would positively select itself by negatively selecting further variants. this is "species selection," a form of group selection that many biologists have found hard to imagine. indeed, richard dawkins, having scorned the "argument from personal incredulity," was obliged to resort to it when confronted with the possibility of species selection: "it is hard to think of reasons why species survivability should be decoupled from the sum of the survivabilities of the individual members of the species" [4]. when the latter sentence is parsed its logic seems impeccable. hold tight, and we will see if we can work it out. "the species" is the established main species, members of which imperil themselves only marginally, if at all, by mating with (denying reproductive success to) members of a small potentially incipient species. thus, in reproductive interactions between a main and an incipient species, survivability of the main species is coupled negatively to the sum of the survivabilities of individual members of the incipient species (i.e. it survives when they do not survive), much more than it is coupled positively to the sum of the survivabilities of its own individual members (i.e. it survives when they survive). in this sense, main species survivability is coupled to the sum of the survivabilities of individual members of the incipient species, and decoupled from the sum of the survivabilities of its own individual members. of course, by individual survivabilities is meant, not just mere survival, but survival permitting unimpeded production of fertile offspring. survival of members of an incipient species occurs, not only when classical darwinian phenotypic interactions are favourable (e.g. escape from a tiger), but also when reprotypic interactions are favourable (e.g.
no attempted reproduction with members of the main species). tigers are a phenotypic threat. members of the main species are a reprotypic threat [3]. individual members of a main species that are involved (when there is attempted crossing) in the denial of reproductive success to individual members of an incipient species are like individual stones in the walls of a species fortress against which the reproductive arrows of an incipient species become blunted and fall to the ground. alternatively, the main species can be viewed as a gulliver who barely notices the individual lilliputian incipients brushed off or trampled in his evolutionary path. just as individual cells acting in collective phenotypic harmony constitute a gulliver, so individual members of a species acting in collective reprotypic harmony constitute a species. that harmony is threatened, not by its own members, but by deviants that, by definition, are no longer members of the main species (since a species is defined as consisting of individuals between which there is no reproductive isolation). these deviants constitute a potential incipient species that might one day pose a phenotypic threat to the main species (i.e. they will become part of the environment of the latter). it is true that a member of a main species that becomes irretrievably pair-bonded with a member of an incipient species (e.g. pigeons) will leave fewer offspring, so that both members will suffer the same fate (have decreased survivability in terms of number of fertile offspring). but, in the general case, one such infertile reproductive encounter with a member of an incipient species will be followed by many fertile reproductive encounters with fellow members of the main species. members of the main species are most likely to encounter other members of the main species; hence, there will be fertile offspring.
members of an incipient species, being a minority, are also most likely to encounter members of the main species; hence, there will be infertile (sterile) offspring. much more rarely, a member of an incipient species will encounter a fellow incipient species member with which it can successfully reproduce, an essential precondition for species divergence. once branching (reproductive isolation) is initiated (fig. 7-4), the natural selection of darwin should help the branches sprout (extend in length). natural selection would favour linear species differentiation by allowing the survival of organisms with advantageous genic variations, and disallowing the survival of organisms with disadvantageous genic variations. these genic variations would affect an organism's form and function (the classical phenotype). darwin thought that natural selection might itself suffice to bring about branching. indeed, it appears to do so in certain circumstances, as when segments of a species have become geographically isolated from each other. however, here the branching agency is whatever caused the geographical isolation, not natural selection. speciation requires isolation in some shape or form. the problem of the origin of species is that of determining what form isolation takes in the general case. in his faith in the power of natural selection, darwin was like the early chemists who were satisfied with atoms as the ultimate basis of matter. but for some chemists phenomena such as swinging compass needles (magnetism), falling apples (gravity), and (later) radioactivity, were manifestations of something more fundamental in chemistry than atoms. likewise, for some biologists the phenomenon of hybrid sterility seemed to manifest something more fundamental in biology than natural selection [3]. romanes referred to his holy grail (speciating factor) as an abstract "intrinsic peculiarity" of the reproductive system.
bateson described his as an abstract "residue" with which genes were independently associated. goldschmidt's was an abstract chromosomal "pattern" caused by "systemic mutations" that would not necessarily affect genic functions (see chapter 7). these are just what we might expect of (g+c)%. indeed, in bacteria, which when so inclined intermittently transfer dna in a sexual fashion [5], differences in (g+c)% appear early in the speciation process [6], in keeping with sueoka's above observations in ciliates. as shown in chapter 3, where different levels of genetic information were considered, a metaphor for the role (g+c)% might play in keeping individuals reproductively isolated from each other is their accent [7]. a common language brings people together, and in this way is conducive to sexual reproduction. but languages can vary, first into dialects and then into independent sub-languages. linguistic differences keep people apart, and this difference in the reproductive environment can militate against sexual reproduction. at the molecular level, we see similar forces acting at the level of meiosis, the dance of the chromosomes. in the gonad similar paternal and maternal chromosomes (homologues) align. the early microscopists referred to this as "conjugation." if there is sufficient sequence identity (i.e. the dna "accents" match), then the band plays on. the chromosomes continue their minuet, progressing through various check-points [8], and gametes are formed. if there is insufficient identity (i.e. the dna "accents" do not match) then the music stops. meiosis fails, gametes are not formed, and the child is sterile, a "mule." thus, the parents of the child (their "hybrid") are reproductively isolated from each other (i.e. unable to generate a line of descendants due to hybrid sterility), but not necessarily from other members of their species.
at least one of the parents has the potential to be a founding member of a new species, provided it can find a mate with the same dna "accent." differences in (g+c)% have the potential to initiate the speciation process, creating first "incipient species" with partial reproductive isolation, and then "species" that, by definition, are fully reproductively isolated. to see how this might work, we consider the chemistry of chromosome alignment at meiosis [9]. in 1922 muller suggested that the pairing of genes as parts of chromosomes undergoing meiotic synapsis in the gonad might provide clues to gene structure and replication [10]: "it is evident that the very same forces which cause the genes to grow [duplicate] should also cause like genes to attract each other [pair] .... if the two phenomena are thus dependent on a common principle in the make-up of the gene, progress made in the study of one of them should help in the solution of the other." in 1954 he set his students an essay "how does the watson-crick model account for synapsis?" [11]. the model had the two dna strands "inward-looking" (i.e. the bases on one strand were paired with the bases on the other strand). crick took up the challenge in 1971 with his "unpairing postulate" by which the two strands of a dna duplex would unpair to expose free bases in single-stranded regions [12]. this would allow a search for sequence similarity (homology) between two chromosomes (i.e. between two independent duplexes). others later proposed that the single-stranded regions would be extruded as stem-loops. the "outward-looking" bases in the loops would be available to initiate the pairing process [13-15]. thus, for meiotic alignment, maternal and paternal chromosomal homologues should mutually explore each other and test for "self" dna complementarity, by the "kissing" mechanism noted in chapter 6 [16-18]. under this model (fig.
8-1), the sequences do not commit themselves, by incurring strand-breakage, until a degree of complementarity has been recognized. the mechanism is essentially the same as that by which trna anticodon loops recognize codons in mrnas, except that the stem-loop structures first have to be extruded from dna molecules that would normally be in classical duplex form. in all dna molecules examined, base-order supports the formation of such secondary structures (see chapter 5). if sufficient complementarity is found between the sequences of paternal and maternal chromosome homologues (i.e. the genomes are "reprotypically" compatible), then crossing over and recombination can occur (i.e. the "kissing" can be "consummated"). the main adaptive values of this would be the proper assortment of chromosomes among gametes, and the correction of errors in chromosome sequences (see below and chapter 14). "kissing" turns out to be a powerful metaphor, since it implies an exploratory interaction that may have reproductive consequences.
fig. 8-1. as negative supercoiling progressively increases, the strands of each duplex synchronously open to allow formation of equivalent stem-loop secondary structures so that "kissing" interactions between loops can progress to pairing. at the right, paternal and maternal duplexes differ slightly in (g+c)% (x, and x + 1). the maternal duplex of higher (g+c)% opens less readily as negative supercoiling increases, so strand opening is not synchronous, "kissing" interactions fail, and there is no progress to pairing. in this model, chromosome pairing occurs before the strand breakage that accompanies recombination (not shown). even if strand breakage were to occur first (as required by some models), unless inhibited by single-stranded dna-binding proteins the free single strands so exposed would rapidly adopt stem-loop conformations.
so the homology search could still involve kissing interactions between the tips of loops.
the model predicts that, for preventing recombination (i.e. creating reproductive isolation), a non-complementarity between the sequences of potentially pairing strands, in itself, might be less important than a non-complementarity associated with sequence differences that change the pattern of stem-loops. this implies differences in the quantities of members of the watson-crick base pairs in single strands (i.e. a parity difference). this is because parity between these bases would be needed for optimum stem formation. parity differences should correlate with differences in stem formation, and hence, different stem-loop patterns, as will now be considered. what role does the (g+c)% "accent" play in meiotic pairing? from calculated dna secondary structures, it has been inferred that small fluctuations in (g+c)% have great potential to affect the extrusion of stem-loops from duplex dna molecules and, hence, to affect the pattern of loops which would then appear (fig. 5-2). a very small difference in (g+c)% (reprotypic difference) would mark as "not-self" a dna molecule that was attempting to pair meiotically with another dna ("self"). this would impair the kissing interaction with the dna [19, 20], and so would disrupt meiosis and allow divergence between the two parental lines, thus initiating a potential speciation event. the total stem-loop potential in a sequence window can be analysed quantitatively in terms of the relative contributions of base composition and base order, of which base composition plays a major role (see chapter 5). of the various factors likely to contribute to the base composition-dependent component of the folding energy of an extruded single stranded dna sequence, the four simplest are the quantities of the four bases.
two slightly more complex factors are the individual bases, from each potential watson-crick base pair, that are present in lowest amounts. for example, if the quantities of a, g, c and t in a 200 nucleotide sequence window are 60, 70, 30 and 40, respectively, then what may be referred to as "at min" would be 40, and the corresponding "gc min" would be 30. these numbers would reflect the upper limit on the number of base pairs that could form stems, since the quantity of the watson-crick pairing partner that was least would place a limit on the possible number of base pairs. this value might be expected to correlate positively with folding stability. conversely, the excess of bases without a potential pairing partner (in the above example a-t = 20 and g-c = 40) might provide an indication of the maximum number of bases available to form loops. since loops tend to destabilize stem-loop structures, these "chargaff difference" values might be expected to correlate negatively with folding stability. although the bases are held in linear order, a vibrating single-stranded dna molecule has the potential to adopt many structural conformations, with watson-crick interactions occurring between widely separated bases. accordingly, pairing can also be viewed as if the result of random interactions between free bases in solution. this suggests that the two products of the quantities of pairing bases could be important (60 x 40, and 70 x 30, in the above example). the products would be maximal when pairing bases were in equal proportions in accordance with chargaff's second parity rule. in an attempt to derive formulae permitting prediction of folding energy values directly from the proportions of the four bases, jih-h. chen [21] examined the relative importance of eight of the above ten factors in determining the base composition-dependent component of the folding energy (fors-m; see chapter 5). these factors were a, g, c, t, at min,
cg min, a x t, and g x c (where a, c, g, and t refer to the quantities of each particular base in a sequence window). the products of the quantities of the watson-crick pairing bases (a x t, and g x c) were found to be of major importance, with the coefficients of g x c (the strongly interacting s bases), greatly exceeding those of a x t (the weakly interacting w bases). less important were at min and cg min, and the quantities of the four bases. all ten parameters were examined in an independent study, which confirmed the major role of the product of the quantities of the s bases in a segment. of particular importance is that it is not just the absolute quantities of the s bases, but the product of the multiplication of these absolute quantities. this should amplify very small fluctuations in (g+c)%, and so should have a major impact on the folding energy of a segment and, hence, on the pattern of stem-loops extruded from the duplex dna in a chromosome engaging in a "kissing" homology search for a homologous chromosome segment. if stem-loops are of critical importance for the initiation of pairing between segments of nucleic acids at meiosis, then differences in (g+c)% could strongly influence the establishment of meiotic barriers, so leading to speciation. but barriers may be transient. having served its purpose, an initial barrier may be superseded later in the course of evolution by a more substantial barrier (see fig. 7-4). in this circumstance evidence for the early transient barrier may be difficult to find. however, in the case of different, but related, virus species (allied species) that have the potential to coinfect a common host cell, there is circumstantial evidence that the original (g+c)% barrier has been retained. modern retroviruses, such as those causing aids (hiv-1) and human t cell leukemia (htlv-1), probably evolved by divergence from a common ancestral retrovirus.
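the ten base-composition factors described above (the four base quantities, the two minima, the two chargaff differences and the two products) can be tabulated for any sequence window. the following python sketch is only an illustration: the function name and dictionary keys are hypothetical, and it computes the composition quantities alone, not the folding energies or regression coefficients discussed in the text. the toy window reproduces the worked example of 60 a, 70 g, 30 c and 40 t.

```python
from collections import Counter


def composition_factors(window):
    """Return the ten base-composition quantities for one single-stranded window.

    The key names are illustrative labels, not terms taken from the source.
    """
    counts = Counter(window.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    return {
        "A": a, "C": c, "G": g, "T": t,
        "AT_min": min(a, t),       # upper limit on A-T pairs available for stems
        "GC_min": min(g, c),       # upper limit on G-C pairs available for stems
        "AT_diff": abs(a - t),     # Chargaff difference: unpaired excess of A or T
        "GC_diff": abs(g - c),     # Chargaff difference: unpaired excess of G or C
        "AxT": a * t,              # product of the W (weakly pairing) bases
        "GxC": g * c,              # product of the S (strongly pairing) bases
    }


# Worked example from the text: a 200-nucleotide window with
# A = 60, G = 70, C = 30, T = 40 (base order is irrelevant for these counts).
window = "A" * 60 + "G" * 70 + "C" * 30 + "T" * 40
factors = composition_factors(window)
for name, value in factors.items():
    print(name, value)
```

because the products depend on both members of a pair, they are largest when the pairing bases are present in equal amounts, which is the point made above about chargaff's second parity rule.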
branching phylogenetic trees linking the sequences of modern retroviruses to such a primitive retroviral "eve" are readily constructed, using either differences between entire sequences, or just (g+c)% differences [23]. the fewer the differences, the closer are two species on such trees. unlike most other virus groups, retroviruses are diploid. as indicated in chapter 2, diploidy entails a considerable redundancy of information, a luxury that most viruses cannot afford. they need compact genomes that can be rapidly replicated, packaged and dispersed to new hosts. however, different virus groups have evolved different evolutionary strategies. the strategy of retroviruses is literally to mutate themselves to the threshold of oblivion ("mutational meltdown"), so constituting a constantly moving target that the immune system of the host cannot readily adapt to. to generate mutants, retroviruses replicate their nucleic acids with self-encoded enzymes (polymerases) that do not have the error-correcting ("proof-reading") function that is found in the corresponding enzymes of their hosts. indeed, this is the basis of aids therapy with azt (azidothymidine), which is an analogue of one of the nucleotide building blocks that are joined together (polymerized) to form linear nucleic acid molecules ("polymers;" see chapter 2). azt is recognized as foreign by host polymerases, which eject it. but retroviral polymerases cannot discriminate, and levels of mutation (in this case termination of the nucleic acid sequence) attain values above the oblivion threshold ("hypermutation") from which it is impossible to recover ("error catastrophe"). below the threshold, there is a most effective mechanism to counter mutational damage. the retroviral counter-mutation strategy requires that two complete single-strand retroviral rna genomes be packaged in each virus particle (i.e. diploidy).
each of these genomes will be severely mutated but, since mutations occur randomly, there is a chance that each genome will have mutations at different sites. thus, in the next host cell there is the possibility of recombination (cutting and splicing) between the two genomes to generate a new genome with many fewer, or zero, mutations [24]. the copackaging of the two genomes requires a process analogous to meiotic pairing. on each genome a "dimer initiation" nucleotide sequence folds into a stem-loop structure. "kissing" interactions between the loops precede the formation of a short length of duplex rna, so that the two genomes form a dimer. this allows packaging and, in the next host, recombination can occur. what if two diploid viruses both infected the same host cell, thus releasing four genomes into an environment conducive to recombination? in many cases this would be a most favorable circumstance, since there would now be four damaged genomes from which to regenerate, by repeated acts of recombination, an ideal genome. thus, it would seem maladaptive for a virus with this particular strategy to evolve mechanisms to prevent entry of another virus ("superinfection") into a cell that it was occupying, at least in the early stages of infection [25]. this presupposes that a co-infecting virus will be of the same species as the virus which first gained entry. however, hiv-1 and htlv-1 are retroviruses of allied, but distinct, species. they have a common host (humans) and common host cell (known as the cd4 t-lymphocyte). when in the course of evolution these two virus species first began to diverge from a common ancestral retroviral species, a barrier to recombination had to develop as a condition of successful divergence. yet, these two virus types needed to retain a common host cell in which they had to perform similar tasks. this meant that they had to retain similar genes.
many similar gene-encoded functions are indeed found. similar genes imply similar sequences, and similar sequences imply the possibility of recombination between the two genomes. thus, coexistence in the same host cell could result in the viruses destroying each other, as distinct species members, by mutually recombining (shuffling their genomes together). without a recombination barrier each virus was part of the selective environment of the other. this should have provided a pressure for genomic changes that, while not interfering with conventional phenotypic functions, would protect against recombination with the other type. if (g+c)% differences could create such a recombination barrier (while maintaining, through choice of appropriate codons, the abilities to encode similar amino acid sequences), then such differences would be selected for. when we examine the (g+c)% values of each of these species there is a remarkable difference. hiv-1 is one of the lowest (g+c)% species known (i.e. it is at-rich). htlv-1 is one of the highest (g+c)% species known (i.e. it is gc-rich). this might be regarded as just a remarkable coincidence save for the fact that, in some other situations where two viruses from different but allied species occupy a common host cell, there are also wide differences in (g+c)% [3, 19]. as set out above, these (g+c)% differences alone should suffice to prevent recombination.
the plant which gives us tobacco, nicotiana tabacum, is a tetraploid which emerged some six million years ago when the two diploid genomes of nicotiana sylvestris and nicotiana tomentosiformis appeared to fuse. nicotiana tabacum is designated an allotetraploid (rather than an autotetraploid) since the two genomes were from different source species (greek: allos = other; autos = same). the two species are estimated to have diverged from a common ancestral species 75 million years ago.
as allied species they should have retained some sequence similarities; so within a common nucleus in the tetraploid there should have been ample opportunity for recombination between the two genomes. yet, the genomes have retained their separate identities. this can be shown by backcrossing to the parental types. half the chromosomes of the tetraploid pair at meiosis with chromosomes of one parent type. thus, recombination of the other chromosomes of the tetraploid with chromosomes of that parent type is in some way prohibited. in 1940 goldschmidt noted [26]: "clausen ... has come to the conclusion that n. tabacum is an allotetraploid hybrid, one of the genomes being derived from the species sylvestris, the other from tomentosa. by continuous backcrossing to sylvestris the chromosomes derived from sylvestris can be tested because they form tetrads with the sylvestris
the survival of a duplicate copy of a gene depends on a variety of factors, including (i) natural selection favouring organisms where a function encoded by the gene is either increased or changed (i.e. there is either concerted or divergent gene evolution), (ii) a recombination-dependent process known as gene conversion, and (iii) a recombination-dependent process that can lead to copy-loss (see fig. 8-2). these intragenomic recombinations can occur when there is a successful search for similarity between dna strands. this is likely to be greatly influenced by the (g+c)% environment of the original gene and the (g+c)% environment where the duplicate copy locates. once a (g+c)%-dependent speciation process has begun, factors other than (g+c)% are likely to replace the original difference in (g+c)% as an intergenomic barrier to reproduction (i.e. a barrier to recombination between diverged paternal and maternal genomes within their hybrid, if such a "mule" can be generated; fig. 7-4).
in this circumstance, (g+c)% becomes free to adopt other roles, such as the prevention of recombination within a genome (intragenomic recombination). this can involve the differentiation of regions of relatively uniform (g+c)%, that japanese physicists akiyoshi wada and akira suyama referred to as having a "homostabilizing propensity" and giorgio bernardi and his coworkers named "isochores" (greek: iso = same; choras = group) [28, 29]. these have the potential to recombinationally isolate different parts of a genome. thus, the attempted duplication of an ancestral globin gene to generate the α-globin and β-globin genes of modern primates might have failed since sequence similarity would favour recombination between the two genes and incipient differences (early sequence divergence) could have been eliminated ("gene conversion;" fig. 8-3). however, the duplication appears to have involved relocation to a different isochore with a different (g+c)%, so the two genes became recombinationally isolated to the extent that initially the sequences flanking the genes differed in (g+c)%. later the new gene would have increased its recombinational isolation by mutating to acquire the (g+c)% of its host isochore. as a consequence of the differences in (g+c)% the corresponding mrnas today utilize different codons for corresponding amino acids, even though both mrnas are translated in the same cell using the same ribosomes and same trna populations. so it is most unlikely that the primary pressure to differentiate codons arose at the translational level.
fig. 8-2. model for possible outcomes of a gene duplication. the duplication from (a) can result in identical multicopy genes (b) that confer an ability to produce more of the gene product. if this is advantageous, then the multicopy state will tend to be favored by natural selection.
if unmutated (white box in (b)) or only slightly mutated (light grey striped box in (c)), there are not sufficient differences between the duplicates to prevent a successful homology search (d). this allows the mutation (c) to be reversed to (b) by the process known as gene conversion (see fig. 8-3). this maintains identical copies, so allowing concerted evolution of the multicopy genes to continue. however, the recombination necessary for gene conversion can also result in removal of a circular intermediate (e, f), and restoration of the single copy state (g). the risk of copy-loss due to recombination (d-g) can be decreased by further mutation (dark grey striped box in (h)). this will decrease the probability of a successful homology search. being protected against recombination (i.e. preserved), the duplicate is then free to differentiate further by mutation (black box in (i)). if the product of the new gene confers an advantage, then the duplicate will be further preserved by natural selection (divergent gene evolution). in the general case, mutation facilitating recombinational isolation (h) precedes mutation facilitating functional differentiation (i) under positive darwinian selection.
fig. 8-3. gene conversion between p and m alleles (sequence diagram not recoverable from the source). in the alternative shown here, the status quo is restored to the top duplex (an a is mutated to t), but in the bottom duplex the t-t non-watson-crick base-pair is replaced with an a-t watson-crick base-pair (i.e. a t is mutated to an a). thus, there has been conversion of the sequence of the original m allele to that of the p allele. there has been a loss of heterozygosity (as in (a)) and a gain of homozygosity (as in (d)). in this example, gene conversion involves copies of homologous genes (alleles) on different chromosomes.
however, gene conversion can also involve homologous genes (non-allelic "paralogues") on the same chromosome (see fig. 8-2). note that, in chapter 4, sequence 3.1 (p above) is shown to form a stem-loop with the central bases being located in the loop (sequence 4.4). since the single base-pair difference between p and m versions is in this loop, then the m version has the potential to form a similar stem-loop. because the loops differ slightly, during the initial homology search loop-loop "kissing" interactions might fail and prohibit subsequent steps. however, cross-over points can migrate (e.g. (b) to (c)), so that if crossing over is prohibited in one region there is some possibility of a migration from a neighboring region that would reveal mismatches. thus, multiple incompatibilities (base differences) are most likely to inhibit the pairing of homologous chromosomes and the repairing of multiple mismatches.
each isochore would have arisen as a random fluctuation in the base composition of a genomic region such that a copy of a duplicated gene that had transposed to that region was able to survive without recombination with the original gene for a sufficient number of generations to allow differentiation between the copy and its original to occur. this would have provided not only greater recombinational isolation, but also an opportunity for functional differentiation. if the latter differentiation were advantageous, organisms with the copy would be favoured by natural selection. the regional base compositional fluctuation would then have "hitch-hiked" through the generations by virtue of its linkage to the successful duplicate (i.e. the copy would have been positively selected). by preserving the duplicate copy from recombination with the original copy, the isochore would, in turn, have itself been preserved by virtue of its linkage to the duplicate copy.
when functional differentiation of a duplicate is necessary for it to be selected (divergent evolution), there is the danger that, before natural selection can operate, recombination-mediated gene conversion will reverse any incipient differentiation, or intragenic recombination between the copies (paralogues) will result in copy-loss. in the case of duplicate eukaryotic genes that have diverged in sequence, koichi matsuo and his colleagues noted that divergence was greatest at third codon positions, usually involving a change in (g+c)% [30-33]. thus, there was a codon bias in favour of the positions of least importance for the functional differentiation that would be necessary for the operation of natural selection. where amino acids had not changed, different gene copies used different synonymous codons. it was proposed that the (g+c)% change was an important "line of defence" against homologous recombination between the duplicates. thus, recombinational isolation of the duplicate (largely involving third codon position differences in (g+c)%) would protect (preserve) the duplicate so allowing time for functional differentiation (largely involving first and second codon position differences), and hence, for natural selection to operate. in the general case, isolation would precede functional differentiation, not the converse. (g+c)% differentiation, largely involving third codon positions, would precede functional differentiation, largely involving first and second codon positions under positive darwinian selection. from all this it would be predicted that, if a gene from one isochore were transposed to an isochore of different (g+c)%, and its ability to recombine with its allele were advantageous, then the gene would preferentially accept mutations converting its (g+c)% to that of the new host isochore (i.e. organisms with those mutations would be genetically fitter and thus likely to leave more fertile offspring than organisms without the mutations).
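the observation that duplicates diverge mainly at third codon positions, with an accompanying change in (g+c)%, can be checked on any pair of aligned coding sequences. the python sketch below is a minimal illustration under stated assumptions: the two sequences are hypothetical toy constructs (not real paralogues), chosen so that both encode the same six amino acids (leu, pro, ala, gly, thr, val) with different synonymous codons of the standard genetic code, and the function names are invented for this example.

```python
def differences_by_codon_position(seq1, seq2):
    """Count base differences at codon positions 1, 2 and 3 of two aligned,
    in-frame coding sequences of equal length."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    diffs = [0, 0, 0]
    for index, (x, y) in enumerate(zip(seq1.upper(), seq2.upper())):
        if x != y:
            diffs[index % 3] += 1
    return diffs


def gc_percent_by_codon_position(seq):
    """Return (G+C)% at codon positions 1, 2 and 3 of an in-frame sequence."""
    seq = seq.upper()
    values = []
    for position in range(3):
        bases = seq[position::3]   # every third base, starting at this position
        values.append(100.0 * sum(base in "GC" for base in bases) / len(bases))
    return values


# Hypothetical toy duplicates: same peptide, different synonymous codons.
copy_low_gc = "CTACCAGCAGGAACAGTA"    # CTA CCA GCA GGA ACA GTA
copy_high_gc = "CTGCCGGCGGGGACGGTG"   # CTG CCG GCG GGG ACG GTG

diffs = differences_by_codon_position(copy_low_gc, copy_high_gc)
gc_low = gc_percent_by_codon_position(copy_low_gc)
gc_high = gc_percent_by_codon_position(copy_high_gc)
print("differences at positions 1, 2, 3:", diffs)
print("(G+C)% by position, low copy:", gc_low)
print("(G+C)% by position, high copy:", gc_high)
```

in this contrived case all differences fall at third positions and the encoded peptide is unchanged, which is the pattern the text associates with recombinational isolation preceding functional differentiation; real paralogues would show a mixture of positions.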
indeed, there is evidence supporting this. the sex chromosomes (x and y) tend not to recombine at meiosis except in a small region (the "pseudoautosomal" region; see chapter 14). transfer of a gene from a non-recombining part of a sex chromosome to the pseudoautosomal region forces the gene rapidly to change its (g+c)% value [34]. for various reasons (e.g. large demand for the gene product), certain genes are present in multiple identical copies. but, in the absence of some restraint, copies that are initially identical will inevitably diverge in sequence [3]. so how can multicopy genes (e.g. rrna genes) preserve their similarity to each other? to prevent divergence through the generations (i.e. to allow "concerted evolution"), they should mutually correct each other to eliminate deviant copies. this is likely to occur by a recombination-dependent process, "gene conversion" (figs. 8-2, 8-3; see chapter 10). thus, multicopy genes should all be either in the same isochore, or in isochores of very close (g+c)%, so that recombination can occur. before dna sequencing methods became available, "isochores" were described as dna segments that could be identified on the basis of their distinct densities in samples of duplex dna obtained from organisms whose cells had nuclei (eukaryotes). the method involved physically disrupting dna by hydrodynamic shearing to break it down to lengths of about 300 kilobases. the fragments were then separated as bands of distinct densities by centrifugation in a salt density gradient. the densities could be related to the average (g+c)% values of the segments, since the greater these values, the greater the densities. this way of assessing the (g+c)% of a duplex dna segment distinguished one large segment from another, and largeness became a defining property of isochores. isochores, as so defined, were not identified in bacteria, which do not have distinct nuclear membranes (prokaryotes; see chapter 10). since prokaryotes (e.g.
bacteria) and eukaryotes (e.g. primates) are considered to have evolved from a common ancestor, does this mean that the ancestor had isochores that were subsequently lost by prokaryotes during or after their divergence from the eukaryote lineage (isochores-early)? or did the ancestor not have isochores, which were therefore freshly acquired by the eukaryotic lineage after its divergence from the prokaryotic lineage (isochores-late)? if prokaryotes could be shown to have isochores, then this would favour the isochores-early hypothesis. indeed, prior to modern sequencing technologies, physical methods demonstrated small segments of distinct (g+c)% in the genomes of prokaryotes and their viruses. the 48 kb duplex genome of phage lambda (see chapter 5) was extensively sheared to break it down to subgenome-sized fragments. these resolved into six distinct segments, each of relatively uniform (g+c)%, by the density method [35], and into thirty-four "gene sized" segments by another, more sensitive, method (thermal denaturation spectrophotometry) [36]. with the advent of sequencing technologies, in 1984 mervyn bibb and his colleagues were able to plot the average (g+c)% values of every third base for small windows in the sequences of various bacteria (fig. 8-4) [37]. three plots were generated, the first beginning with the first base of the sequence (i.e. bases in frame 1, 4, 7, etc.), the second beginning with the second base of the sequence (i.e. bases in frame 2, 5, 8, etc.), and the third beginning with the third base of the sequence (i.e. bases in frame 3, 6, 9, etc.). in certain small regions (g+c)% values were relatively constant within each frame. these regions of constant (g+c)% corresponded to genes. note that the relative constancy of (g+c)% is most for the third codon position (mainly independent of the encoded amino acids), and least for the second codon position (most dependent on the encoded amino acids).
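the frame-specific plots just described amount to taking every third base from each of the three starting positions and averaging (g+c)% over a sliding window. the python sketch below is a minimal reading of that description, not the original authors' program: the function name, the one-codon step and the toy input are assumptions, while the window sizes of 14 and 42 codons are those mentioned for fig. 8-4.

```python
def frame_gc_profile(sequence, window_codons=42, step_codons=1):
    """Return three lists of windowed (G+C)% values, one per frame.

    Frame 1 uses bases 1, 4, 7, ...; frame 2 uses bases 2, 5, 8, ...;
    frame 3 uses bases 3, 6, 9, ... Each value is the average (G+C)% of
    `window_codons` consecutive bases taken from that frame.
    """
    sequence = sequence.upper()
    profiles = []
    for frame in range(3):
        bases = sequence[frame::3]            # every third base in this frame
        values = []
        last_start = len(bases) - window_codons
        for start in range(0, last_start + 1, step_codons):
            chunk = bases[start:start + window_codons]
            gc = sum(base in "GC" for base in chunk)
            values.append(100.0 * gc / window_codons)
        profiles.append(values)
    return profiles


# Toy input (hypothetical): 50 repeats of ATG, so frame 1 is all A,
# frame 2 is all T, and frame 3 is all G.
toy = "ATG" * 50
profiles = frame_gc_profile(toy, window_codons=14)
for number, values in enumerate(profiles, start=1):
    print("frame", number, "windows:", len(values), "first value:", values[0])
```

plotting the three lists against window position would give curves of the kind described; a smaller window (14 codons) follows local fluctuations more closely than a larger one (42 codons).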
the fluctuation in values at the second codon position is more apparent when a window size equivalent to 14 codons is used (b) than when a window size equivalent to 42 codons is used (a). this figure was redrawn from ref. [37].
thus, individual genes have a relatively uniform (g+c)% and each codon position makes a distinctive contribution to that uniformity. this is not confined to bacteria. wada and suyama noted that, whether prokaryotic or eukaryotic, "every base in a codon seems to work cooperatively towards realizing the gene's characteristic value of (g+c) content." this was a "homostabilizing propensity" allowing a gene to maintain a distinct (g+c)%, relatively uniform along its length, which would differentiate it from other genes in the same genome [38]. thus, each gene constitutes a homostabilizing region in dna. stated another way, if large size is excluded as a defining property, many bacteria have isochores. when isochores are defined as dna segments of relatively uniform (g+c)% that are coinherited with specific sequences of bases, then bacteria have isochores. to contrast with the classical isochores of bernardi, these are termed "microisochores," and their length is that of a gene, or small group of genes (see chapter 9). thus, classical eukaryotic isochores ("macroisochores") can be viewed as constellations of microisochores of a particular (g+c)%. the proposed antirecombination role of (g+c)% would require that, unless they represent multicopy genes, microisochores sharing a common macroisochore (i.e. they have a common (g+c)%) have other sequence differences that are sufficient to prevent recombination between themselves [39]. within an organism, genes with similar (g+c)% values may sometimes locate to similar tissues, so that there is a tissue-specific codon usage tendency [31]. since both prokaryotic (e.g. bacterial) and eukaryotic (e.g.
primate) lineages have some form of isochore, this appears most consistent with the isochores-early hypothesis. while not endorsing a particular role for (g+c)%, this underlines the fundamental importance of (g+c)% differences in biology. let metaphors multiply! a given segment of dna is coinherited with a "coat" of a particular (g+c)% "color." a given segment of dna "speaks" with a particular (g+c)% "accent," (and hence has a distinct potential vibrational frequency; see fig. 5-2). a fundamental duality of information levels is again manifest. as will be further considered in chapter 9, it is likely that differences in (g+c)% serve to isolate recombinationally both genes within a genome, and genomes within a group of species (a taxonomic group). the power to recombine is fundamental to all life forms because, for a variety of reasons, it is advantageous (see chapter 14). however, the same power threatens to homogenize (blend) genes within a genome, and to homogenize (blend) the genomes of members of allied species within a taxonomic group (i.e. genus). this would countermand evolution both within a species and between species. thus, functional differentiation, be it between genes in a genome, or between genomes in a taxonomic group (speciation), must, in the general case, be preceded (or closely accompanied) by the establishment of recombinational barriers. species have long been defined in terms of recombinational barriers (see chapter 7). in some contexts, genes are defined similarly. a species can be defined as a unit of recombination (or rather, of antirecombination with respect to other species). so can a gene. most definitions of the "gene" contain a loose or explicit reference to function. thus, biologists talk of a gene encoding information for tallness in peas. biochemists talk of the gene encoding information for growth hormone (a protein), and relate this to a segment of dna (see legend to fig. 10-1).
however, before it can function, information must be preserved. classical darwinian theory proposes that function, through natural selection, is itself the preserving agent. thus, function and preservation go hand-in-hand, but function is more fundamental than preservation. in 1966 biologist george williams in the usa, an originator of the "selfish gene" concept, seemed to argue the converse when arriving at a new definition. the function of any multipart entity, which needs more than one part for this function, is usually dependent on its parts not being separated. preservation can be more fundamental than function. williams proposed that a gene should be defined entirely by its property of remaining intact as it passes from generation to generation. he identified recombination as a major threat to that intactness. thus, for williams, "gene" meant any dna segment that has the potential to persist for enough generations to serve as a unit for natural selection; this requires that it not be easily disruptable by recombination. the gene is a unit of recombination (or rather, of antirecombination with respect to other genes) [40]. "socrates' genes may be with us yet, but not his genotype, because meiosis and recombination destroy genotypes as surely as death. it is only the meiotically dissociated fragments of the genotype that are transmitted in sexual reproduction, and these fragments are further fragmented by meiosis in the next generation. if there is an ultimate indivisible fragment it is, by definition, 'the gene' that is treated in the abstract discussions of population genetics. various kinds of suppression of recombination may cause a major chromosomal segment or even a whole chromosome to be transmitted entire for many generations in certain lines of descent. in such cases the segment, or chromosome, behaves in a way that approximates the population genetics of a single gene. . . .
i use the term gene to mean 'that which segregates and recombines with appreciable frequency'. . . . a gene is one of a multitude of meiotically dissociable units that make up the genotypic message." despite this, williams did not invoke any special chromosomal characteristic that might act to facilitate preservation. pointing to "the now discredited theories of the nineteenth century," and lamenting an opposition that "arises . . . not from what reason dictates, but from the limits of what the imagination can accept," his text adaptation and natural selection made what seemed a compelling case for "natural selection as the primary or exclusive creative force." no other agency was required. this tendency, which can be infectious, to bolster the scientific with the ad hominem in otherwise rational discourse, will be considered in the epilogue. in contrast, we have here considered intergenomic and intragenomic differences in (g+c)% as an agency, essentially independent of natural selection, which preserves the integrity of species and genes, respectively. within a species individual genes differ in their (g+c)%. relative positions of genes on the (g+c)% scale are usually preserved through speciation events. if, in an ancestral species, gene a was of higher (g+c)% than gene e, this relationship has been sustained in the modern species that resulted from divergences within that ancestral species. accordingly, when the (g+c)% values of the genes of one of the modern species are plotted against the corresponding (g+c)% values of similar (orthologous) genes in the other modern species, the points usually fit a close linear relationship (c.f. fig. 2-5). species with intragenomic isochore differentiation can themselves further differentiate into new species. in this case, a further layer of intergenomic (g+c)% differentiation would be imposed upon the previous intragenomic differentiation.
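the ortholog comparison described above (plotting the (g+c)% of each gene in one species against that of its ortholog in another, and looking for a close linear relationship) can be sketched as follows. this is a minimal illustration, not the procedure used for fig. 2-5: the function names are hypothetical, the sequences are invented toy strings rather than real genes, and an ordinary least-squares line is assumed as the way of summarizing the relationship.

```python
def gc_percent(seq):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    counted = [b for b in seq if b in "ACGT"]
    return 100.0 * sum(b in "GC" for b in counted) / len(counted)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy sequences invented purely for illustration (hypothetical, not real genes).
species_1 = {"gene_a": "GCGCGCGCAT", "gene_b": "ATATGCATAT", "gene_c": "GCGCATATAT"}
species_2 = {"gene_a": "GCGCGCGCGA", "gene_b": "ATGCGATATA", "gene_c": "GCGCGATATA"}

genes = sorted(species_1)
xs = [gc_percent(species_1[g]) for g in genes]  # (G+C)% in the first species
ys = [gc_percent(species_2[g]) for g in genes]  # (G+C)% of the orthologs
slope, intercept = least_squares_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

in this toy case every ortholog in the second species is shifted upward by the same amount, so the relative positions of the genes on the (g+c)% scale are preserved while the genome as a whole has moved, which is the pattern the text describes.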
again, when a sufficient degree of reproductive isolation had been achieved this initial barrier between species would usually be replaced by other barriers, thus leaving (g+c)% free to continue differentiating in response to intragenomic demands. however, (g+c)% is never entirely free. it can itself be constrained by demands on gene function (i.e. natural selection) that primarily affect first and second codon positions. furthermore, as we shall see next, in extreme environments, natural selection can make direct demands on (g+c)%, which might then conflict with its role as a recombinational isolator. there are few environments on this planet where living organisms are not found. hot springs, oceanic thermal vents, and radioactive discharges of nuclear reactors, all contain living organisms ("extremophiles"). fortunately, since heat and radiation are convenient ways of achieving sterilization in hospitals, none of these organisms has been found (or genetically engineered to become) pathogenic (so far). thermophiles are so-called because they thrive at high temperatures. proteins purified from thermophiles may show high stability at normal temperatures, a feature that has attracted commercial interest (i.e. they have a long "shelf life"). hence, the full genomic sequences of many prokaryotic thermophiles (bacteria and archaea) are now available. some thermophiles normally live at the temperature of boiling water. nucleic acids in solution at this temperature soon degrade. so how do nucleic acids survive in thermophiles? the secondary structure of nucleic acids with a high (g+c)% is more stable than that of nucleic acids with a low (g+c)%. this is consistent with watson-crick g-c bonds being strong, and a-t or a-u bonds being weak (see table 2-1). do thermophiles have high (g+c)% dna? in the case of genes corresponding to rnas whose structure is vital for rna function, namely rrnas and trnas, the answer is affirmative.
free of coding constraints (i.e. they are not mrnas), yet required to form part of the precise structure of ribosomes where protein synthesis occurs, genes corresponding to rrnas appear to have had the flexibility to accept mutations that increase g+c (i.e. organisms that did not accept such mutations perished by natural selection, presumably acting against organisms with less efficient protein synthesis at high temperatures). the g+c content of rrnas is directly proportional to the normal growth temperature, so that rrnas of thermophilic prokaryotes are highly enriched in g and c [41] [42] [43]. yet, although optimum growth temperature correlates positively with the g+c content of rrna (and hence of rrna genes), optimum growth temperature does not correlate positively with the overall g+c content of genomic dna, and hence with that of the numerous mrna populations transcribed from the genes in that dna (fig. 8-5a). instead, optimum growth temperature correlates positively with a+g content (fig. 8-5b; see chapter 12) [44]. the finding of no consistent trend towards a high genomic (g+c)% in thermophilic organisms has been interpreted as supporting the "neutralist" argument that variations in genomic (g+c)% are the consequences of mutational biases and are, in themselves, of no adaptive value, at least with respect to maintaining duplex stability [43, 45]. however, the finding is also consistent with the argument that genomic (g+c)% is too important merely to follow the dictates of temperature, since its primary role is related to other more fundamental adaptations. the stability of duplex dna at high temperatures can be achieved in ways other than by an increase in g+c content. these include association with small basic peptides (polyamines) and relaxation of torsional strain (supercoiling) [46, 47].
thus, there is every reason to believe that, whatever their (g+c)% content, thermophiles are able both to maintain their dnas in classical duplex structures with watson-crick hydrogen-bonding between opposite strands, and to adopt any necessary extruded secondary structures involving intrastrand hydrogen-bonding (i.e. stem-loops). this will be further considered in chapter 9. reflect the fact that relatively few thermophiles have been sequenced at this time. note that, whereas in (a) only 5% of the variation between points can be explained by growth temperature (r² = 0.05), in (b) 21% can be explained on this basis (r² = 0.21; see appendix 1). darwin held that biological evolution reflected the accumulation of frequent very small variations, rather than few intermittent large variations. that nature did not work by means of large jumps was encapsulated in the latin phrase "natura non facit saltum." however, huxley, while supporting most of darwin's teachings, considered it more likely that evolution had proceeded in jumps ("natura facit saltum"). according to the arguments of this chapter, both are correct. within some members of a species small variations in the genome phenotype (i.e. in (g+c)%) accumulate, so that these members become progressively more reproductively isolated from most other members of the species, initially without major changes in the conventional phenotype. as it accrues, reproductive isolation increasingly favors rapid change in the conventional phenotype, often under the influence of natural selection. so, when their appearance is viewed on a geological time scale, new species can seem to "jump" into existence. the rate increase reflects better preservation of frequent phenotypic micromutations rather than of infrequent phenotypic macromutations (i.e. of "hopeful monsters," to use goldschmidt's unfortunate term).
in other words, while there is continuity of variation at the genotype level, as far as speciation is concerned variants (mutant forms) seem to emerge discontinuously at the phenotype level. being infrequent, and hence unlikely to find a member of the opposite sex with the same change, organisms with macromutations are not the stuff of evolution. single strands extruded from duplex dna have the potential to form stem-loop structures that, through exploratory loop-loop "kissing" interactions, may be involved in the homology search preceding recombination. the total stem-loop potential in a sequence window can be analyzed quantitatively in terms of the relative contributions of base composition and base order, of which base composition, and particularly the product of the two s bases (g x c), plays a major role. thus, very small differences in (g+c)% should impair meiotic pairing, resulting in hybrid sterility and the reproductive isolation that can initiate speciation (i.e. because their hybrid is sterile, the parents are, in an evolutionary sense, "reproductively isolated" from each other). in chemical terms, chargaff's species-dependent component of base composition, (g+c)%, may be the "holy grail" responsible for reproductive isolation (non-genic) as postulated by romanes, bateson and goldschmidt. once a speciation process has initiated, other factors (often genic) may replace (g+c)% as a barrier to reproduction (preventing intergenomic recombination between species). this leaves (g+c)% free to assume other roles, such as defined as long segments of relatively uniform (g+c)% that are coinherited with specific sequences of bases. these may facilitate gene duplication. indeed, each gene has a "homostabilizing propensity" to maintain itself as a "microisochore" of relatively uniform (g+c)%. protection against inadvertent recombination afforded by differences in (g+c)% facilitates the duplication both of genes, and of genomes (speciation).
george williams' definition of a gene as a unit of recombination rather than of function is now seen to have a chemical basis. key: cord-264884-ydkigome authors: villarreal, luis p. title: the widespread evolutionary significance of viruses date: 2008-07-05 journal: origin and evolution of viruses doi: 10.1016/b978-0-12-374153-0.00021-7 sha: doc_id: 264884 cord_uid: ydkigome in the last 30 years, the study of virus evolution has undergone a transformation. originally concerned with disease and its emergence, virus evolution had not been well integrated into the general study of evolution. this chapter reviews the developments that have brought us to this new appreciation for the general significance of virus evolution to all life. we now know that viruses numerically dominate all habitats of life, especially the oceans. theoretical developments in the 1970s regarding quasispecies, error rates, and error thresholds have yielded many practical insights into virus–host dynamics. the human diseases of hiv-1 and hepatitis c virus cannot be understood without this evolutionary framework. yet recent developments with poliovirus demonstrate that viral fitness can be the result of a consortium, not one fittest type, a basic darwinian concept in evolutionary biology. darwinian principles do apply to viruses, such as with fisher population genetics, but other features, such as reticulated and quasispecies-based evolution, distinguish virus evolution from classical studies. the available phylogenetic tools have greatly aided our analysis of virus evolution, but these methods struggle to characterize the role of virus populations. missing from many of these considerations has been the major role played by persisting viruses in stable virus evolution and disease emergence. in many cases, extreme stability is seen with persisting rna viruses. indeed, examples are known in which it is the persistently infected host that has better survival.
we have also recently come to appreciate the vast diversity of phage (dna viruses) of prokaryotes as a system that evolves by genetic exchanges across vast populations (chapter 10). this has been proposed to be the “big bang” of biological evolution. in the large dna viruses of aquatic microbes we see surprisingly large, complex and diverse viruses. with both prokaryotic and eukaryotic dna viruses, recombination is the main engine of virus evolution, and virus host co-evolution is common, although not uniform. viral emergence appears to be an unending phenomenon and we can currently witness a selective sweep by retroviruses that infect and become endogenized in koala bears.
our understanding of virus evolution has reached a threshold in that it now appears to provide a much broader vista regarding its general significance and influence on host evolution. several developments have brought us to this point. one has been the realization that viruses often evolve by processes involving the collective action of a consortium, or quasispecies. the resulting adaptability and power of such evolution are unmatched by any other genetic entity. much of this volume is dedicated to this issue. in such consortia, the concept of a "wild-type" virus is no longer considered to be the fittest type, as the quasispecies itself provides fitness (see chapters 3 and 4). the quasispecies model resembles population genetics in some ways, but it has led to some significant departures from population genetics, and these departures are very well supported by experiments. another development that has recalibrated our view of the overall significance of viruses is information on the scale and diversity of viruses. viruses are present at a previously unappreciated global level and appear to have affected the evolution of all life on earth. much of this realization has been brought about by the development of metagenomic methods as applied to various habitats. measurements of major habitats (the oceans, soil, extreme environments) have established that our biological world is predominantly viral, in terms of both numbers and diversity (paul et al., 2002; breitbart et al., 2003; rohwer, 2005a, 2005b; edwards and rohwer, 2005; comeau et al., 2006). these two developments would seem reason enough to consider virus evolution in a new light. however, there have also been numerous theoretical proposals suggesting viral involvement in some of the very earliest events and major transitions in the evolution of life. we no longer think of viruses as recent agents that escaped from the host chromosomes as runaway replicons.
viruses now appear very old to us and they relate to and trace all branches of life. the last 30 years have been very active regarding virus evolution. major developments in theory, technology, medicine, and the study of human disease with respect to virus evolution have all occurred. and as we seek to grow and manage various life forms for human use, virus evolution has also had major impact on such efforts. as a science, virus evolution has benefitted greatly from traditional evolutionary biology. however, since viruses are molecular genetic parasites that are inscrutable by casual observation, our understanding of virus evolution has been dependent upon measuring sequence variation and sequence diversity in a large number of virus genomes. because of this, these small genetic parasites have been the last domain of life to yield their secrets of evolution. and viruses harbor some clearly distinct evolutionary abilities. for one, they are polyphyletic. all major viral lineages have their own distinct origins. they are also difficult if not impossible to define as species and are able to exchange genetic information across normal boundaries. even "dead" and defective viruses can participate in such exchanges, which confuses the definitions of fitness. we now know that viruses can evolve by a consortia-based process and also exchange information by recombination across vast genetic pools to assemble new mosaic combinations of genes. we thus no longer think of a specific genetic lineage in understanding virus evolution, but instead think of a cloud, matrix, or a population as the basis of virus evolution. viruses are inherently fuzzy entities that can differ from their relatives in any specific feature. yet even with such fuzziness, it is clear that common themes also link them. patterns of evolution have become clear. diversity and variation are often (but not always) observed. stability and host congruence can also be observed.
nevertheless, the evolutionary power of viruses has been learned at a human cost. the application of numerous analytical and phylogenetic tools has provided crucial insights into virus origin and evolution. yet these methods struggle to incorporate the fuzzy nature of viruses and have clear limits, especially regarding quasispecies and high recombination rates. structural biology now also adds tools that extend our vision of virus evolution beyond what can be seen in the genetic sequence. for example, common structural motifs from phage to eukaryotic dna viruses (t4 and herpesvirus) suggest very ancient links in virus evolution that span all domains of life (see below). nevertheless, our analytical methods are currently lacking as we struggle to understand complex genetic mixtures that provide fitness, reticulated relationships, polyphyletic origins, and virus-host congruence. virus evolution has, for the most part, been considered to be a specific, esoteric part of broader evolutionary biology and has been given limited attention in reference works on evolutionary biology (see pagel, 2002). historically, the focus has been on various rna viruses and some dna viruses that cause disease in humans and domesticated animals and plants. however, i have asserted that all life forms must be examined from the perspective of virus evolution (villarreal, 2005), not just those pathogens that impact on us. how viruses evolve in a more general sense informs us of evolutionary paradigms that have not been previously well understood (especially the evolution of consortia or the dynamics of vast reticulated gene pools). this volume now extends these traditional topics of virus evolution to include the vast virology of the prokaryotic world. in so doing, it illuminates the global consequences viruses have had on all life forms.
prior to the 1970s we saw some stunning successes in vaccine control for major viral human diseases, such as polio, measles, mumps, influenza, and especially smallpox virus. due to this historic success, american health agencies and educators considered virus disease as a thing of the past, no longer a serious threat. the era of infectious disease that they represented was now one for the historical record, an unfortunate part of human history, or so we thought. in what now seems to have been a clear case of hubris and naivety, we have been humbled by the evolutionary power of viruses, which was woefully underappreciated, even by most virologists. by the end of that decade, the evolution and emergence of hiv-1 permanently changed our views (see chapters 13 and 14). this has also been followed by a seemingly never-ending series of viral threats as newly emerging viral diseases have come to our attention. hiv-1 provides the only example of a public health situation that has reversed centuries of progress for extending human health and lifespan. it now limits human life expectancy in many parts of the world, especially sub-saharan africa. this development could not have been imagined in the 1970s. we are much less confident now about predicting the future of virus evolution and its potential impact on human health. diseases of domesticated microbiological, plant, and animal species have also experienced the trauma of the consequences of emerging viral diseases along with huge losses. however, the human hiv-1 story may not be a fluke of virology but may be telling us something basic about human and primate evolution. as we sought to understand the origin of this new virus, we have come to appreciate a much broader virus-host story which involves simian immunodeficiency virus (siv), foamy viruses, and the speciation of old world primates.
we have also come to learn that the genomes of these primates show much evidence of past viral interaction and ongoing endogenous retrovirus colonization. the evolution of retroviral endogenization has taken on a much greater significance in basic evolutionary biology. thus it is with great interest that we now study the ongoing endogenization of retroviruses in the koala bear genome (see below). historically, we are compelled to study viruses because they can cause serious disease. new viruses come to our attention also mainly because of disease. it is therefore understandable that most evolutionary biologists mainly think of viruses strictly as agents of disease. these are the products of runaway replicons that provide negative selection to host survival. in this light, the application of predator-prey based mathematical models has seemed most appropriate. with such viral disease, variation has long been observed and was initially used for the generation of most vaccines. however, this disease-centric view has also occluded another more prevalent virus-host relationship. for example, the emergence of hiv-1 has led us to conclude that it likely evolved from various versions of siv. but siv is not pathogenic in its native african primate host. nor does it show the genetic diversification of hiv-1 in these native primate hosts. it is a silent, asymptomatic infection. genomic and metagenomic analysis now allows us to identify many more silent, asymptomatic viruses that would not have previously been observed. we now know of many such viruses that are prevalent in a specific host. evolutionary biology must escape the confines of disease-centric thinking and seek to understand these relationships as well. in the last ten years i have attempted to provide another view concerning virus-host evolution. i have argued that viruses often attain evolutionary stability by species-specific persistence and that such states apply to all domains of life, including prokaryotes.
on an evolutionary time-scale, the majority of viral lineages tend to exist as species-specific persistent (aka temperate, latent, and chronic) infections in which individual hosts will be colonized by mostly silent (asymptomatic) viruses for the duration of their life. such persistence can have major consequences to the evolution of both virus and host, which also leads us to more directly link virus evolution to broader issues of host evolution. it is from this perspective that we start to clearly see that viruses indeed belong on the tree of life as major participants (villarreal, 2005, 2006). persisting viruses are not simply agents responsible for destruction of life, but are also agents that create genetic novelty on a vast scale that influences all life and promotes symbiosis (marquez et al., 2007; ryan, 2007; villarreal, 2007). the persistent lifestyle of such symbiotic virus-host relationships is not simply a less efficient, acute infection; nor is it simply a "reservoir" for acute virus (as epidemiologists are prone to assert). neither can it be attributable to concepts of selfish dna. persistence represents a major virus life strategy that is both fundamental and highly adapted. it has distinct genetic, fitness, and evolutionary characteristics that require intimate, host (tissue)-specific viral strategies and precise gene functions to attain stable maintenance in the presence of immunity and to allow biologically controlled reactivation. persistence also must resist displacement by similar viruses and competitors. it is virus-host persistence that provides the thread that allows us to link these polyphyletic viral lineages (and their clouds) with the entire tree of life. in turn, this link identifies a much more fundamental role for viruses in the evolution of host, visible from the very earliest to the most recent events in host evolution.
it is from such species-specific persistent states that the large majority of acute diseases evolve and emerge by various mechanisms. we know much about virus replication and disease. however, our understanding of the specific mechanisms of persistence is generally poor. persistence is a generally silent and inscrutable state; it does not lend itself to in vitro or cell culture experimental models. we are left with but a few examples from which to attempt to extrapolate the possible existence of general relationships. the study of virus evolution thus struggles to incorporate concepts of persistence. another recent and major development in virus evolution is the arrival of various proposals suggesting that viruses have been involved in some major innovations and transitions in the evolution of life. in all these proposals, however, it is necessary that the virus in question has attained a stable genomic persistence with its host. these evolutionary events thus seem to be the products of viral-mediated symbiogenesis of host (ryan, 2007; villarreal, 2007). proposals include the possibility that viruses may have originated the dna replication system of all three cellular domains (archaea, bacteria, eukarya) of life (forterre, 1999, 2005, 2006a, 2006b; filee et al., 2003; forterre et al., 2007). the discovery and analysis of the largest dna virus (1200 orf mimivirus), a lytic cytoplasmic virus of amebae (a distant relative of phycodnavirus and poxviruses), has also led to proposals that this virus lineage may represent an ancient fourth domain of life (raoult et al., 2004; desjardins et al., 2005; claverie et al., 2006). it is interesting that in an initial structural analysis, the large complex replication centers for mimivirus were confused with the host nucleus (suzan-monti et al., 2007).
thus, it seems relevant that others have proposed that a distant relative of phycodnaviruses and poxviruses may have originated the eukaryotic nucleus (villarreal, 1999; villarreal and defilippis, 2000; bell, 2001, 2006; takemura, 2001). such proposals, although consistent with various observations, remain outside of the consensus of most evolutionary biologists. nevertheless, numerous other observations continue to suggest viral involvement in other major host innovations, such as a viral origin of rag1/2 of the adaptive immune system (dreyfus et al., 1999; kapitonov and jurka, 2005; fugmann et al., 2006) or the role of endogenous retroviruses (ervs) in the evolution of the placenta (villarreal and villareal, 1997; harris, 1998; blond et al., 2000; mi et al., 2000; dupressoir et al., 2005; caceres and thomas, 2006). such possible roles for viruses in host evolution are at odds with accepted views of virus-host relationships, but might be the products of viral symbiogenesis. the metagenomic viral measurements mentioned above for prokaryotic dna viruses, along with the increasing realization that viruses and host can co-evolve, have led to various calls that a viral tree of life needs to be considered and developed (forterre, 2003; villarreal, 2006; filee et al., 2006b). a virosphere clearly exists but its nature and boundaries are not so clear. multiple viral origins, their diversity and numerical dominance in distinct and sometimes harsh environments, as well as their presence in host genomes, suggest that any viral tree of life will be huge, multidimensional, and connected to the host tree of life. as discussed below regarding double-stranded (ds)dna viruses of prokaryotes, they have all the above characteristics. such viruses may represent the big bang of biological novelty. with their unmatched capacity to generate diversity they can function as the mass creators of biological novelty as well as destroyers of most species.
surely, such capacity must have had big influences on the evolution of life. symbiosis, simply defined, is the stable coexistence of two previously separate lineages of organisms. there can be little doubt that many temperate phage can stably colonize a bacterial cell resulting in a stable descendant from two lineages. this is clearly symbiotic. endogenous retroviruses can similarly be found to persist in vertebrate genomes and also appear symbiotic. yet studies of symbiosis seldom consider a role for virus (villarreal, 2007). how important are viruses in general to evolutionary biology? the core concepts of evolutionary biology were developed well before we had a modern understanding and definition of viruses (luria, 1950). after all, the basic lysogenic model of phage integration was only clarified in 1962 when developed by campbell (see campbell, 2007). that cryptic and defective phage are ubiquitous in the genomes of all prokaryotes is generally considered uninteresting by many in the field of evolutionary biology. i suggest that the seemingly applicable concepts of selfish dna effectively derailed any thinking that persisting genetic parasites might have a more germinal role in the evolution of life (doolittle and sapienza, 1980; orgel and crick, 1980). yet as outlined above, virus footprints in major evolutionary transitions are clear and a direct role in such events now seems much more plausible. we therefore must seek to define the nature of the virosphere and how its evolution relates to the tree of life. this book represents the first integration of the entire field of virus evolution, including both prokaryotic and eukaryotic life forms. however, because our understanding of virus-host relationships remains uneven, the chapters necessarily focus on well-studied models (exemplars). these exemplars also tend to reflect a historic disease focus (i.e., e. coli, flowering plants, mouse, humans).
it is unfortunate that the silent species-specific viruses that tend to exist in stable states with long evolutionary histories seldom provide our exemplars. we understand these infections poorly and lack basic definitions concerning fitness or selective advantages. only metagenomics tools now seem able to inform us of their presence, but not their biology. self-organization and the evolution of rna molecules as an origin of biological information are discussed in chapter 1. autocatalytic chemical reactions, such as replication of rna, present issues such as how to optimize a rugged fitness landscape, yet allow the study of evolution in vitro with rna. rna in vitro reduces genotype-phenotype issues to rna secondary structure and minimal free energy states. this allows both continuity and discontinuity to be measured. these same issues are crucial for the study of rna viruses, whose sites of secondary structure often define replicator identity. these models currently offer the best system to evaluate an early and simple biological world for evolutionary principles (see chapter 3). viruses and viroids with their rna genomes may be the only extant survivors of this pre-dna world. since it was the consideration of error-prone replication that led to the development of the concept of quasispecies (see chapter 2), such models have provided a conceptual foundation which led to several basic concepts. chapter 4 discusses the foundations and various aspects of quasispecies theory. viruses appear to operate close to the error threshold, thus allowing maximum evolutionary exploration (biebricher and eigen, 2006). however, as presented below, the loss of the "fittest" type concept has also led to clear experimental evaluations of consortia-based evolutionary behaviors. such behaviors were not predicted by classical darwinian models.
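the statement that viruses operate close to the error threshold can be made concrete with a deliberately simplified two-class model: one "master" sequence that replicates faster than a lumped cloud of mutants, with no back mutation. this is a textbook reduction of eigen's quasispecies model and not an analysis taken from this chapter; the function names, the superiority value (sigma = 10) and the genome length (10 000 sites) are assumptions chosen only for illustration.

```python
def copy_fidelity(mu: float, length: int) -> float:
    """Probability that a genome of `length` sites is copied without any error,
    assuming an independent per-site error rate `mu`."""
    return (1.0 - mu) ** length


def master_frequency(sigma: float, mu: float, length: int) -> float:
    """Equilibrium frequency of the master sequence in the two-class model.

    sigma  : replication rate of the master relative to the mutant cloud (> 1)
    mu     : per-site error rate per replication
    length : genome length in sites

    At equilibrium sigma * Q equals the mean replication rate
    sigma * x + (1 - x), which gives x = (sigma * Q - 1) / (sigma - 1).
    When sigma * Q <= 1 the master is lost (the error threshold is crossed).
    """
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for a master sequence to exist")
    q = copy_fidelity(mu, length)
    if sigma * q <= 1.0:
        return 0.0
    return (sigma * q - 1.0) / (sigma - 1.0)


if __name__ == "__main__":
    # Illustrative sweep: the master frequency falls as the error rate rises
    # and reaches zero once sigma * Q drops to 1 or below.
    for mu in (1e-5, 1e-4, 2e-4, 3e-4):
        print(f"mu={mu:.0e}  master frequency={master_frequency(10.0, mu, 10_000):.3f}")
```

in this sketch the population keeps a substantial mutant cloud well before the master sequence is lost, which is the sense in which replication near the threshold permits broad exploration of sequence space.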
although virologists were initially attracted to quasispecies models, many evolutionary biologists were initially hostile to the application of the quasispecies concept to evolution. it was thought that the classical mathematical models of population biology, as originally developed by wright and fisher, and later applied by kimura and maruyama to asexual haploid populations at the mutation-selection balance, had already fully developed the needed models and precluded additional need for the quasispecies concept. the classical models were thus argued to provide adequate mathematical coverage for viruses, including quasispecies and error threshold (wilke, 2005). however, these two approaches differ fundamentally with regard to the significance of error-prone replication and it was the quasispecies approach that led to the clear experimental establishment that quasispecies selection, per se, is important for viral pathology and fitness (see below). the application of quasispecies theory to virology does indeed demonstrate distinct differences with population genetics. various phenomena, such as complementation, cooperation, competition, and even defective mediated extinction (domingo et al., 2001; domingo, 2006; grande-perez et al., 2005) have been observed, all of which fall outside of the parameters of classical population genetics. viral fitness has indeed been shown to be due to interaction within a diverse population, and not to the fittest or master type. and with rna viruses, error threshold has become a central issue (biebricher and eigen, 2005). the collective experience has thus made clear the value of theory to biology. many working biologists understand that life seems overly complex and defies most generalizations. thus they do not always appreciate attempts at general theory.
although in reality, biology may indeed often be too complex for accurate theoretical predictions, these theories have nevertheless clearly stimulated crucial concepts and experimental evaluation, and biologists should be encouraged by them. by providing new ways of thinking, entirely new experimental approaches can be developed. the existence of error-prone replication and quasispecies also raises the issue of the conservation of information. how are information stability and higher fitness attained with such errors? how can genetic complexity be created in such a circumstance? can cooperation (or consortia behavior) result from any of these models? these issues have yet to be resolved. some interesting suggestions, however, have been proposed. one involved cooperative evolution that results from ligated genomes. a model was proposed in 2000 by stadler and schuster in which they considered the dynamics of replicator networks resulting from higher order reactions involving the templated ligation of smaller genomes (stadler et al., 2000). although this was based on concepts such as triple-stranded nucleic acids, it clearly has some elements that also resemble the ligation of recombinational processes for dna phage (presented below). a most interesting outcome of these models is that, depending on initial replicator concentrations, permanent coexistence of replicators could result in a cooperative network. such cooperation is a rare outcome for most models and, given the conclusion that early dna-based life was a "horizontal" consortium, such models are of special interest. the issue of consortia will come up often. consortia selection directly implies cooperation, but cooperation of selfish replicators presents dilemmas.
replicator networks with interaction functions that give highly non-linear dynamics can result in complex mixtures, with behaviors ranging from survival of the fittest to attainment of a globally stable equilibrium tantamount to permanent coexistence. the fitness of populations, however, is inherent to current quasispecies concepts in virology. there may also be other ways to explain the genetic origin of cooperation (such as stable persistence involving addiction strategies). the stable persistence of a genetic parasite can compel cooperation and promote the conservation of information (see below). the definitions of fitness with respect to a virus in a natural habitat are far from clear. although the concept of relative replicative fitness is often applied to lab experiments of virus growth, we know many situations in which virus replication is not maximized in natural settings and many viruses can exist in relatively non-replicative states for long periods. even in the context of an acutely replicating virus in a host organism, the concept of fitness is clearly conditional, as the virus must replicate through various in vivo habitats that can have opposing selection. as presented below, in vivo models that study fitness and viral diversity have clearly indicated that diversity per se is important and fitness is the result of consortia. how do we define such fitness since the mixture clearly matters? also, how do we define the information content or integrity of a consortium? currently, we cannot. in the lab, the viability of a virus is usually measured by the ability to produce plaques. this has been a crucial and main assay for many experimental systems that study virus populations. here, the definition of fitness seems direct: plaque formation equals fitness. various highly useful models have thus been defined and developed that depend on these plaque assays (see chapter 4).
with this, populations and population growth are defined as relative growth of plaque-forming units. however, the concept has always been problematic when considered from a natural virology perspective. plaque formation is clearly not equal to fitness in natural habitats. there are many examples of highly successful viruses that either plaque poorly or not at all. consider the roughly 100 types of human papillomavirus (hpv), a simple small circular dna virus of epithelia; this does not form plaques in any known system (chapter 18). hpv is clearly fit, well adapted to, and stable in its human host. in addition, hpv evolution is phylogenetically congruent with that of its primate hosts, as is the case for most persistent viral infections. we have yet to understand the definition of fitness in this situation. in some cases, it seems selection for plaque propagation has clearly resulted in loss of highly conserved genes, such as with the plaque-adapted laboratory strains of cytomegalovirus (cmv). the problem posed by viruses with inefficient plaque formation is not limited to dna viruses. many persisting rna viruses also do not plaque well or at all, such as most rna viruses of plants or many insect picorna-like viruses, such as those found in drosophila and bees (which also conserve an extra orf). nor do most persistent infections make lots of virus. low-level persistence, such as hantavirus in rodents, for example, is common (hart and bennett, 1999). clearly, our simplifying assumptions of viral fitness and population dynamics cannot apply to these stable evolutionary states. however, if we limit our definition of viral fitness to relative replication or plaque formation we can perform some clear and quantitative evaluations. experimental evaluation forces us to study fitness by only those definitions that we can currently measure.
as fitness appears to be a relativistic and transient concept, depending very much on the tissue, time, place, extant adaptive and innate immunity, and competition, it is likely that we can only measure with any accuracy one aspect of fitness at any one time. hiv infection of humans shows evidence of this in that the r5 virus is more fit for transmission and early disease whereas the x4 virus is fit later during the aids disease phase. clearly conditional, time-dependent issues relate to fitness definitions. however, much more problematic is that we have no theory for viral persistence or its fitness. we lack specific or measurable parameters other than the simple maintenance of genetic material. yet it seems clear that some distinguishing features of persistence can already be recognized. for example, the possible participation of viral defectives (normally considered unfit), which in numerous circumstances can modify or mediate persistence, would need to be included. clearly, a defective role in persistence would also preclude them from being considered as genetic "junk," or selfish elements, since they would then matter in measurable ways to the biological outcome of virus persistent infection. persistence also requires an extended duration of infection, not simply maximized replication. in fact, persistence generally requires mechanisms to limit the replication of at least the same virus for at least some time. thus, limited replication must be an essential element for this life strategy. in my judgment, and much like the quasispecies concept, the concept of persistence will eventually be recognized for the fundamental (symbiotic) force it represents in virus evolution. the experimental work of domingo and holland spans the modern assessment of quasispecies theory that occurred in the 1980s and 1990s. these investigators were the chief proponents of this theory, bringing it to the attention of the broader virology community (chapter 4).
this work has transformed our thinking and laid the experimental foundations that we now build upon. this current volume is an extension from an earlier book on quasispecies and now encompasses both prokaryotic and eukaryotic viruses. since early experimental phage studies provided the foundations for quasispecies theory (eigen et al., 1988), using mathematical descriptions (differential equations) of mutation rates in t-even phage (luria and delbruck, 1943), this inclusion is appropriate. interestingly, a second early paper measuring replication rates by these same authors also noted the problems of viral interference and defectives (delbruck, 1945). other early experiments of phage rna polymerase (batschelet et al., 1976), especially with rna phage qβ (domingo et al., 1978), helped set the stage for the subsequent experiments of the 1980s and 1990s. from the test tube to mouse models to the study of human disease, the work of domingo and colleagues has spanned the entire history of viral quasispecies (domingo and gomez, 2007; chapter 4). quasispecies deals with the products of error-prone replication. however, it is worth repeating that products of error-prone replication are not behaving in a simple "selfish dna" capacity and are not devoid of biological relevance and phenotype. in their complex populations, they create clear and varied effects on viral adaptability, competition, and fitness. since quasispecies necessarily involve defective and mutant virus, it is easy (and common) to think of these entities simply as genetic junk (villarreal, 2005, 2006). defective and even lethal or interfering variation in viral genomes can contribute to adaptability. thus, viruses can clearly adapt as a cloud with a mutant spectrum. in addition, unending competition and exclusion, consistent with the red queen hypothesis, has also been observed (clarke et al., 1994).
the poliovirus-mouse model (see below) in particular has provided a solid experimental system for evaluating the adaptive consequence of quasispecies. it is thus ironic that these same experiments have also made clear that the original simplifying assumptions of the quasispecies ordinary differential equations as presented by eigen are violated by the resulting quasispecies. these products of error-prone replication do indeed strongly interact with each other in both positive and negative ways and such interactions contribute significantly to the observed fitness of the population. errors and interaction are important for fitness. for example, defectives have been reported to mediate extermination of a competing wild-type virus (grande-perez et al., 2005). complementation has also been observed (garcia-arriaza et al., 2004), as has trans-dominant inhibition (crowder and kirkegaard, 2005). genetic memory of past selection has been shown to be maintained in a minority of the population (ruiz-jarabo et al., 2000; briones et al., 2006). such cooperative (consortia) behavior, which can also depend on unfit or defective members, is at odds with classical darwinian notions regarding survival of the fittest. consider, for example, the fitness of a defective or mutant outside of its role in a quasispecies. such a consideration ignores the very nature of a quasispecies yet it is an issue that has often been posed and experimentally evaluated. we should refrain from thinking of viruses simply as fit or non-fit individual types since they clearly exist in populations that provide population-based adaptability. the selection of viral consortia or populations raises some fundamental issues for evolutionary biology. this is essentially group selection in which a population, not the fittest individual, is selected.
this view makes cooperation or interaction of individual genomes a significant component of selection, which is not commonly thought to be a general or accepted process in evolution. yet population selection is no longer a contestable issue in rna virus evolution (see below). i expect that many classical evolutionary biologists might interpret this as evidence that viruses really are an oddity in this feature and are not representative of broader processes of evolutionary biology. furthermore, viral-based group selection may not be limited to quasispecies-based evolution. as presented below, persistent viral infections may also provide population-based selective advantage (see below for the p1 and mouse hepatitis virus (mhv) persistence exemplars; villarreal, 2006, 2007). since viruses are ancient, numerically dominant, and the most diverse biological entities on earth, no life form can escape exposure to them. all extant life forms have evolved in a viral habitat. thus we should expect that the viral footprints (including defectives) that we now find in all genomes have likely played an active role in their evolution; a role, i would argue, that is fundamental, dynamic, and unending. if we can accept this assertion, we may start to see and appreciate the vast evolutionary power that viruses can bring to bear onto host evolution. we can start to attain a global perspective and appreciation for their ability to assemble genetic function from enormous, complex mixtures of genomes, and select gene sets needed to solve multivariant, temporally dynamic evolutionary problems. we can then seek evidence for the role of viral elements in fundamental host innovation and be open to evaluating the occurrence of viral entities from a constructive perspective and not instinctively dismiss such observations as due to coincidence, "junk," or selfish dna.
the advantage of such a perspective is that it will promote the specific experimental evaluations that can better assess any constructive role genetic parasites might have played in host evolution. for example, there is much reason to think ervs have played an active role in human evolution (for references see ryan, 2007). the quasispecies concept has provided the foundation for us to understand virus evolution and informed us of the evolutionary power viruses possess. if that power also links to host evolution, then the tree of life becomes enriched by virus, much larger and more dynamic. the recent experimental studies from andino and colleagues using poliovirus in the mouse model should, in my judgment, provide the keystone exemplar regarding the in vivo fitness of quasispecies (see chapters 6 and 7; vignuzzi et al., 2006). these studies make clear the importance of quasispecies and error-prone replication. such detailed in vivo experiments were made possible by a long and detailed history of poliovirus studies that has identified the nature of rna polymerase fidelity as well as developed mouse models for the study of pathogenesis. few other virus-host systems could have provided such potential for high resolution. these results also provide the experimental observations that distinguish quasispecies-based evolution from the classical fisher-based population genetics. the general importance of this story for understanding virus evolution thus deserves special emphasis. the very origins of modern animal virology stem from poliovirus studies with the need to develop in vitro cell culture technology in order to grow and evaluate poliovirus and generate variants. the live poliovirus vaccine is of special interest with regard to virus evolution and adaptability.
the "live" oral sabin vaccine can be considered to have been a miracle of the practical approach to virology developed in the 1950s (horaud, 1993) in that it was used well before our understanding of the relevant evolutionary theory. the sabin vaccine strain was the result of rodent-adapted virus and differs from the neurovirulent mahoney strain by 56 point mutations (in the consensus sequence), although only a small number of these mutations were needed for neurovirulence (christodoulou et al., 1990). one of the important neurovirulent mutations was within the rna polymerase gene (tardy-panit et al., 1993). however, the significance of this observation took many years to unravel and exploit. in time it became apparent that 3dpol mutants could affect replication fidelity. one poliovirus point mutant, 3dg64s, was shown to have enhanced high-fidelity replication, and it was shown that selective pressure could be designed to increase fidelity in rna polymerase (pfeiffer and kirkegaard, 2005). another major development was the molecular identification of the poliovirus receptor and the subsequent creation of transgenic mice expressing this receptor, making them susceptible to poliovirus infection. one of these transgenic lines allowed mouse brain infections with neurovirulent versions of poliovirus, and has provided a very useful animal model that allowed the evaluation of viral fitness in the context of in vivo pathogenesis. although 3dg64s replicates well in culture (with lowered error rate), it was less pathogenic in this mouse model and competed poorly with 3d wild-type virus. it seemed that the less diverse viral population was less able to generate the variation needed to get past bottlenecks due to multiple selective differences presented in vivo in tissues in the host, such as brain infection (pfeiffer and kirkegaard, 2006). this experimental system also makes clear the greater complexity of fitness in vivo relative to that typically measured in culture.
thus it seems that in vivo there may not be one fitness but several that cannot be distinguished or individually measured. it is likely that various in vivo barriers require distinct fitness solutions that tend to create bottlenecks and that the diversity per se is essential to get past such bottlenecks. a population, not a clone or a consensus, appeared more fit as higher titer infections of 3dg64s also failed to be pathogenic. thus, higher levels of a consensus virus are not equivalent to higher diversity. the relationship between rna polymerase structure, error rates, and ribavirin action is discussed by cameron in chapter 6 and has been the subject of numerous studies (crotty et al., 2001; vignuzzi et al., 2005). knowledge of the structure and catalytic mechanism of rna polymerase function has allowed what affects error rate to be considered in greatly enhanced detail (see castro et al., 2007; korneeva and cameron, 2007; marcotte et al., 2007). this has provided insight into the likely action of ribavirin on product fidelity (harki et al., 2002). thus, it appeared that even a mutant of rna polymerase with increased fidelity could still generate elevated diversity by various methods. such control of fidelity allowed for the design of control experiments in which the same consensus virus genome could be forced to generate either less or more diverse progeny populations. in no other virus-host system have we attained such detailed insight into issues of error rate as those that were put to such excellent use in the poliovirus-mouse system. how generally important is this poliovirus in vivo quasispecies result? although the poliovirus-mouse system provides us with a firm experimental result, it seems likely that the generality of this relationship will be questioned by evolutionary biologists for several reasons.
for one, this was observed in a lab-constructed model system, which, it could be argued, is not an accurate representation of in vivo virus-host fitness. also, as mentioned above, group selection is a process that will not readily be accepted as representative by the broader community. is there any evidence that this result with poliovirus indeed represents a general virus-host evolutionary relationship in natural settings? as presented in chapters 13-15, retroviruses and also human hepatitis virus c clearly exist as quasispecies populations that affect disease outcome. in the case of the retroviruses, viral populations show diversity that far exceeds that seen for other rna viruses. in both hiv-1 and hcv there is clear circumstantial evidence for the importance of quasispecies for in vivo disease outcome, drug resistance, and fitness. in addition, with hcv, cns infection may sometimes result, and such brain infections appear to be mediated by distinct quasispecies (forton et al., 2004; forton et al., 2006), reminiscent of the poliovirus-mouse model. quasispecies memory, as mentioned above, also seems to be an important issue with regard to failure of antiretroviral therapy (kijak et al., 2002) and it appears that pol gene mutations could also be involved in this (carobene et al., 2004). measurements of hiv quasispecies in individual patients indicate that multiple evolutionary patterns can be found in typical individual patients (casado et al., 2001), thus mixtures of hiv exist in patients (bello et al., 2004, 2005). and hiv-1 recombination is clearly contributing to diversity (kijak and mccutchan, 2005). thus, with both hiv-1 and hcv, their capacity to cause human disease is clearly associated with quasispecies compositions that affect fitness in complex ways. the poliovirus-mouse system therefore appears to reflect quasispecies issues as observed in natural virus-host situations.
consideration of retrovirus-host evolution introduces another large issue in evolution: genomic viruses. unlike poliovirus and most rna viruses, retroviruses (e.g. non-lentivirus) have colonized the genomes of animal species in large numbers and represent a large fraction of these genomes. genomic retroviruses are present in vast numbers, most of which are defective and mutant copies. in this genomic colonization they resemble the dsdna viruses of prokaryotes (discussed below) that also colonize all prokaryotes, although at much lower numbers. the human genome has fewer than 26 000 genes, but appears to have 500 000 retroviral-related ltr elements. some of these elements are intact and conserved (human ervs (hervs)) and this genomic population has some clear characteristics of a viral quasispecies. such large amounts of genetic material have previously been dismissed simply as selfish or junk dna of no fitness consequences to the host. however, given the importance of quasispecies mutant genomes for viral fitness and persistence, we might need to re-evaluate this dismissal. retroviruses are clearly part of the human ancestry, thus we should seek to understand, not dismiss, their role in human evolution. in contrast to the story above in which polio infection of mouse brain was dependent on the quasispecies resulting from lowered fidelity replication, a different relationship has been proposed for the nidoviruses. these are also positive single-stranded polycistronic rna viruses (gorbalenya et al., 2006). this group of viruses includes the coronaviruses (e.g. mouse hepatitis virus and sars-associated coronavirus), which are the largest rna viruses known (26-32 kb). it has been proposed that such large genomes have required the adaptation of a high-fidelity rna polymerase in order to increase the error threshold and accommodate large rna genomes.
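the proposed link between replicase fidelity and genome size can be illustrated with the standard single-peak form of the error threshold. this is a textbook simplification and not an analysis given in this chapter: the symbols σ (replicative superiority of the master sequence over its mutant cloud), μ (per-nucleotide error rate per replication), and the value σ = 10 are assumptions introduced only for illustration; the 26-32 kb range is the one quoted above.

```latex
% Probability of an error-free copy of a genome of L nucleotides
Q = (1-\mu)^{L} \approx e^{-\mu L}

% The master sequence is maintained only if \sigma Q > 1, which bounds genome length
\sigma Q > 1 \;\Longrightarrow\; L < L_{\max} \approx \frac{\ln \sigma}{\mu}

% Worked example under the assumed \sigma = 10 for a 30 kb genome
\mu < \frac{\ln 10}{3 \times 10^{4}} \approx 7.7 \times 10^{-5}
```

under these assumptions, a replicase making one error per 10^4 nucleotides would sit above the threshold for a 30 kb genome, since 10^-4 exceeds 7.7 × 10^-5, whereas the same error rate would be tolerated by a genome of roughly 23 kb or less. the bound depends only logarithmically on σ, so in this simplified picture lowering μ is the effective route to a larger genome.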
based on the phylogenetics of this polymerase and other rna-processing enzymes, this group of viruses appears to be monophyletic and it is thought that the acquisition of a high-fidelity rna replicase was central to the origin of this lineage. this type of replicase is unique to rna viruses. the monophyletic view stems from an analysis of a small set of conserved genes. overall, however, these larger genomes have many other genes that show no similarities to related viruses. the origins and evolution of these more diverse and numerous genes cannot be currently traced. this is an inherent problem in the analysis of virus evolution: a small selected set of hallmark genes with some similarity are assumed to trace an apparently linear (tree-based) viral lineage whereas the larger number of genes are not included and cannot be traced. if most of rna virus evolution is indeed mediated by a mixed cloud of genomes, any role for mutant mixtures thus becomes obscure. but perhaps there is little else we can currently do given the lack of information. how might we explain the increased fidelity and genome size of the nidoviruses? was there some change in viral adaptation in which quasispecies and generation of mixtures was no longer as important for adaptation? did the need and selection for a larger genome override the use of error to generate adaptability as seen in poliovirus and hiv-1? if so, what selective pressures might have changed this seemingly basic feature? what do we know about the natural biology of these viruses, which might provide some insight into this? unfortunately, the natural distribution and gene functions of the nidoviruses are generally poorly understood. in terms of coronaviruses, numerous mammal and avian species can be infected and the virus will cause acute disease. in several of these acute infections, the virus involved seems to have recently been adapted to the new host from other, often unknown sources.
with the recent emergence of the sars virus and human infections, however, much greater attention has been focussed on trying to understand the origin and evolution of this virus. it has recently become clear that there indeed appears to exist an evolutionarily stable source of this virus from which adaptation to humans was possible. various bat species have been found to support persistent asymptomatic infections by specific versions of sars viruses (tang et al., 2006; vijaykrishna et al., 2007). these studies also indicate that there appear to be three different and independent groups of sars viruses in bats. in fact, six novel coronaviruses were isolated from six different bat species, showing an astonishing diversity in bats. furthermore, phylogenetic analysis indicates that all bat coronaviruses appear to have descended from a common ancestor. only one of these bat groups includes sars and sars-like coronaviruses that adapted to acute human infections. thus, a prevalent and species-specific persistence of sars viruses is found in particular geographical populations. why is this relationship stable? could the adaptation to a host-specific persistence-based basal life strategy provide some explanations for the evolution of the higher fidelity rna replicase of these coronaviruses? as i have argued, persistent viral infection represents the majority of evolutionarily stable viral lineages (villarreal, 2006). however, we have almost no knowledge regarding how these bat sars viruses persist and escape elimination by innate and adaptive immunity and what, if any, role the high-fidelity replicase (or other genes) has in this life strategy. although we cannot yet evaluate natural sars virus persistence in native bat hosts, another coronavirus may be more informative regarding the effects of persistence on host populations.
mouse hepatitis virus (mhv) may provide our best exemplar of virus-host relationships and show how the concept of virus addiction relates to population persistence. mhv is the best-studied coronavirus. as a natural and prevalent virus of rodents, mhv is our best natural model of persistent rna virus-host relationships for any mammal. in general, rodents are the most studied non-domestic mammals with regard to natural virus distribution. overall, we know that wild-caught rodents seldom show signs of acute virus infection (kashuba et al., 2005). however, asymptomatic virus persistence is ubiquitous in wild rodents (descoteaux et al., 1977; gannon and carthew, 1980; schoondermark-van de ven et al., 2006), including voles (descoteaux and mihok, 1986). some field studies have evaluated broader patterns of virus persistence in mice (singleton et al., 1993; becker et al., 2007) which indicated that wild house mice are highly colonized with mhv (80-100% prevalence). in addition to mhv, mouse cytomegalovirus, mouse parvovirus, mouse thymic virus, and mouse adenovirus are also prevalent. other well-studied mouse viruses, such as lymphocytic choriomeningitis virus (lcmv) and polyomavirus (pyv), were at low natural prevalence. interestingly, some non-native house mice that have colonized isolated islands may lack mhv (moro et al., 1999), although most other isolated island populations retain mhv (moro et al., 2003). other small mammals have yet to show any viral disease whatsoever (hedgehogs, chinchillas, prairie dogs, gerbils, sugar gliders) (kashuba et al., 2005). thus, asymptomatic persistent viral infection is clearly the norm in rodents. yet, in spite of this usual asymptomatic viral persistence, historically, some zoonotic viral disease outbreaks have occasionally been documented in natural populations. one such early outbreak was an epizootic diarrhea that occurred in infant mice (adams and kraft, 1963).
later, it was established that one such infection was due to mouse hepatitis virus (carthew, 1977; ishida et al., 1978). in spite of this disease outbreak, with mhv, it has since become clear that asymptomatic persistent infections are the norm and are highly stable. yet mhv disease outbreaks, especially in virus-free mouse facilities, are also common and severe. how does mhv attain such stable and prevalent persistence in natural populations yet retain the ability to cause disease in naive populations? what maintains the mhv fitness of natural persistence? it is well known that once mhv is established in a mouse or rat colony it can be very difficult to eliminate (gannon and carthew, 1980; lussier and descoteaux, 1986), clearly indicating that stability is rapidly attained and likely genetically programmed by the virus. i propose that these stable evolutionary states of viral persistence are due to a strategy we can call virus addiction (villarreal, 2005) and that mhv can provide the exemplar of such a state. with mhv, only persistently infected mouse colonies are protected from the disease that is otherwise caused by the virus. in wild asymptomatic mice, mhv is found mostly as an enteric infection. the cns demyelinating disease that mhv can induce is most often observed in newborn pups (homberger, 1997; nash et al., 2001) and, once in the brain, mhv can persist in the cns with recurring disease (marten et al., 2001). this recurring cns disease is also associated with quasispecies (in the s gene) and recombination (rowe et al., 1998). the most serious cns disease is seen with the s-gene variant of mhv-4 (jhm); thus, as with the polio mouse model, pathogenic fitness with mhv is also associated with quasispecies. such mhv disease is the bane of all mouse colonies (knobler et al., 1982).
however, once mhv persistence is attained, the problem to a mouse facility is not due to acute disease, but because immunological measurements are significantly affected by mhv persistence. thus mhv alters mouse molecular identity regarding immunological (t-cell) reactions (wilberz et al., 1991). to establish stable asymptomatic persistence, however, mhv needs to infect newborns (weir et al., 1987), in which acute disease is prevented due to maternal passive immune antibody transfer (gustafsson et al., 1996). being born to immune mothers thus protects against cns disease and promotes enteric (not brain) virus colonization. in addition, it appears that persistence also promotes cross-species transfer (baric et al., 1999). mhv persistence may involve genome stability and result in a distinct evolutionary dynamic. asymptomatic persisting infections in a lewis rat, for example, showed no variation in mhv s gene sequence, and no quasispecies as seen in brain infections (stuhler et al., 1997). the need to establish stable persistence could then be providing a strong selection for increased genome complexity and stability and might better explain the selection for the enhanced rna polymerase fidelity in nidoviruses. how might such selection operate in natural populations? evolutionary biologists often consider what might differentiate one group from another very similar group in a way that leads to two isolated and distinct populations. consider two hypothetical adjacent hay stacks harboring two mus musculus colonies, one of which is persistently infected with mhv and the other of which is not. what is the fitness consequence to the colony harboring mhv relative to its uninfected neighbor? our experience with mhv in mouse breeding colonies provides a clear answer. the colony that is persistently infected with mhv will have a distinct advantage over its neighbor, as mhv introduced into the uninfected colony will have severe effects on the offspring.
eventually, we can expect that only the mhv-harboring colony will prevail in both hay stacks. this is a state i have called virus addiction. only mice harboring persistent mhv are protected against the potential pathogenic consequence of acute mhv (or related virus) infection. the population is addicted to the virus. such a state, however, is clearly affecting colonies (or groups) of hosts, not individuals. an individual either quickly succumbs to the virus infection or, if infected, transmits it to others in the colony. a colony is thus under selection by mhv. to generalize this state, we would expect the persistence of sars in specific bat populations to also affect the fitness of those bat populations. persistence is a more demanding phenotype than acute replication. it requires greater gene complexity to counter host immunity and also to promote self-regulation. thus the enhanced fidelity of rna replication is selected in order to conserve this greater genetic complexity and stability. we know that the high-fidelity rna replication system (including rna pol, helicase, endoribonuclease, and other activities) is also present in an ancient nidovirus relative of coronaviruses, the fish-isolated white bream virus (26 kb rna). i suggest there will also likely be species-specific persistent infections with this virus that require this enhanced replication fidelity and maintain this virus in its natural habitat. thus, i suggest, an ancient persistent life strategy could more easily explain the monophyletic character of the nidovirus lineage. it is particularly interesting that one of these unique and conserved replication proteins (adp-ribose-1''-monophosphatase) is dispensable for culture growth (putics et al., 2005). i suggest it will not be dispensable for persistence. the hiv-1 pandemic is an unfinished story.
hiv-1 represents a real-time biological event in human evolution that confirms for us the importance of quasispecies and retroviruses to human biology. however, even though its human toll is huge, modern medicine and culture have responded rapidly enough to limit the impact of hiv-1 to the point at which it will not likely be the cause of a selective evolutionary sweep that could have altered human genetic makeup (in contrast to the koala bear endogenization presented below). as described earlier, its amazing adaptability via quasispecies, along with extensive recombination, contributes directly to hiv-1's diversity and makes it the most dynamic genetic entity ever studied. many studies track the dominant hiv population and fail to examine minority populations. yet it is precisely these minority populations, which evolve independently of the majority population, that can determine drug resistance phenotype and biological outcome (charpentier et al., 2004; briones et al., 2006; morand-joubert et al., 2006). clearly, the specific makeup of a complex hiv population matters. furthermore, hiv defectives and variants can also have major consequences. in some cases, long-term non-progressors of hiv-1 have shown mixed populations and unusual polymorphism in the early phase of hiv infection, sometimes contributing to long-term non-progression (ltnp) (alexander et al., 2000). one population of ltnps was reported to have been colonized by an hiv variant that showed low virus replication and slow or arrested evolution (bello et al., 2005). in another case, a stable non-progressor was colonized by a replication-incompetent version of hiv-1 (wang et al., 2003). some of these non-progressors also appear to resist super-infection. it seems clear that at least in these exceptional situations, non-majority hivs are crucial to the outcome.
there is also reason to think that other retroviruses have had a major influence on recent primate and human evolution, such as apathogenic persisting foamy virus in primates (switzer et al., 2005; murray and linial, 2006). human antiretroviral genes seem to have undergone recent adaptations, such as apobec3, which can interfere with exogenous retroviruses (such as mlv and siv) and underwent an expansion in the hominid lineage (esnault et al., 2005). it thus seems clear that human and primate evolution has been significantly affected by earlier, prevalent primate retroviruses. another important human-virus quasispecies story that has long been recognized is that of hepatitis c virus (hcv) (see chapter 15; domingo and gomez, 2007). hcv seems to have adapted to humans in the recent past, possibly from asymptomatic enteric primate viruses currently found in africa (smith et al., 1997). as hcv remains an infection predominantly transmitted by blood, it does not appear to have fully adapted to the tissues of and transmission within its human host. however, like hiv-1, hcv has long been recognized to generate quasispecies in chronically infected people (martell et al., 1992) and it soon became apparent that the viral quasispecies are affected by and affect the outcome of antiviral therapy (hohne et al., 1994; kurosaki et al., 1994; okamoto and mishiro, 1994). thus, successful antiviral therapy is directly correlated with an initial dramatic reduction in genetic diversity. unfortunately, it has become clear that only a minority of hcv-infected individuals will respond favorably to a combination of interferon and ribavirin. thus it seems to be diversity per se and the resulting structure of an hcv quasispecies that has a direct consequence to human health. however, since hcv is less well-adapted to humans compared with hiv-1, it does not pose the same threat to potentially provoke an evolutionary event in human evolution.
vsv is a negative-stranded rna virus that has been a very important experimental model and has provided many laboratory measurements regarding quasispecies theory (see chapter 4). using vsv, evidence supporting the red queen hypothesis (involving unending adaptation to greater competition) and muller's ratchet has been presented (clarke et al., 1994; novella et al., 1995; elena et al., 1996). when vsv was evaluated as an arbovirus, requiring adaptation to the alternating and opposing fitness demands of insect and mammalian hosts, it was also apparent that minority quasispecies populations were responsible for maintaining the apparently antagonistic phenotypes (novella et al., 1999). thus here too, the consortia character of a quasispecies is clear. yet in natural settings several very different virus-host relationships can be seen with rhabdoviruses. a distant relative of vsv (vhsv) is also known to be responsible for mass die-off of commercially important fish (marty et al., 2003). this virus infects many teleost species and has shown 100% mortality in many experiments (i.e. with i.p. inoculation). in natural outbreaks, however, it has also shown surprising genetic stability (einer-jensen et al., 2006). clearly error-prone rhabdovirus replication must be kept in check by purifying selection in this situation. in contrast, another rhabdovirus, sigma virus of drosophila, is associated with no mortality but is a vertically transmitted persisting virus in specific drosophila populations (fleuriet, 1996). yet in some recent population measurements, sigma virus-infected drosophila are expanding for unknown reasons (fleuriet, 1994). clearly this particular virus-host persistent relationship has some undefined selective advantage that operates beyond the lab-based concepts as measured above.
other rhabdoviruses also have peculiar host-specific relationships, such as bats that tend to support many persistent infections (badrane and tordo, 2001; li et al., 2005), or birds that seem to be free of almost all rhabdoviruses. clearly, although vsv lab results have been highly informative, we still have much to learn regarding natural settings that affect rhabdovirus adaptation and evolution. another major paradigm for the high rates of negative-strand virus evolution is found with influenza virus. due to its history and potential for initiating great human epidemics, it has long held the special interest of evolutionary virologists (see chapter 5; nelson and holmes, 2007). however, this research has not much emphasized the quasispecies character of influenza virus evolution. instead, it concentrates on the evolution of the master template or clades of template for the purposes of vaccine development (webster and govorkova, 2006). the views stemming from this type of evolution have lent themselves well to master template-based phylogenetic analysis and have dominated how many researchers think of virus-host evolution. thus it is curious, given the above emphasis, that the quasispecies character of influenza populations often seems of low relevance to issues of acute disease and vaccination, other than to provide a source of diversity. in some situations, viral competitive interference may contribute to drift variation and displacement in antigenic epitopes (levin et al., 2004). yet outcomes of individual human and bird infections do not seem much affected by specific quasispecies structures, as we saw with hiv-1 and hcv. with influenza, we are mainly concerned with epidemic human disease. however, by sheer numbers of infections and deaths worldwide, it must be admitted that influenza virus is really a virus that affects mostly birds.
for example, during the 2005 outbreak in china, only 251 humans died whereas 230 million domestic birds died (smith et al., 2006). although our concern over the large potential for human disease is understandable, these numbers should inform us of a more basic virus-host biology. in this case, influenza shows a high affinity for various birds; migratory water birds in particular can have high prevalence (wallensten et al., 2006). some waterfowl, such as wild mallard ducks, have been called the stealth (asymptomatic) carriers of influenza h5n1, and free-grazing ducks seem to introduce virus into domestic bird populations (gilbert et al., 2006). thus waterfowl represent the well-accepted epidemiological concept of a reservoir species (louz et al., 2005). but these wild waterfowl, shorebirds, and gulls that are a natural host for avian influenza also seem to show a much slower rate of evolution (spackman et al., 2005). in contrast, the much higher rate of evolution seen in chickens and turkeys indicates that these hosts should not be considered as natural reservoirs (suarez, 2000). in waterfowl, influenza infections show several distinctions, such as virus co-infection or virus interference (sharp et al., 1997) as well as phylogenetically distinguishable waterfowl dendrograms, including specific m lineages (makarova et al., 1998; widjaja et al., 2004). the diverse and stable avian pool of influenza virus appears to be ancestral to the influenza viruses that infected human populations. the phylogenetic methods that have been adapted from evolutionary biology have been tremendously helpful and have allowed us to trace the seemingly untraceable: virus evolution (see chapter 5). thus, we have often been able to make informed judgments concerning broader patterns of virus evolution and this has become the major tool for the current study of virus evolution, such as that of influenza virus (nelson and holmes, 2007).
influenza a, for example, can be seen to show extended periods of stasis followed by periods of rapid adaptation that necessitate adaptations in vaccine strategy (wolf et al., 2006). however, the evolutionary variation between seemingly similar viruses can be surprisingly large (see above vsv section). for example, the very different phylogenetic behaviors of influenza a and measles virus, both acute human respiratory infections due to membrane-bound negative-stranded rna viruses, are striking. the reasons for the maintained genetic stability of measles virus remain poorly understood, but may well involve more complex fitness associated with systemic infections. phylogenetic methods can also be highly informative regarding the likely origins of viral lineages and possible sources of emergence. for example, the studies of dengue virus by holmes and colleagues suggest that this virus first entered its human host about 1000 years ago, and that sylvatic (african jungle) asymptomatic infection of primates may have provided the origin of this virus that later became a human pathogen (holmes and twiddy, 2003; holmes, 2006). such insight provides valuable clues concerning the likely selective pressures that may lead to the emergence of dengue virus. phylogenetic methods are also highly informative regarding classification and taxonomy relationships and have allowed us to understand viral relationships across broad species definitions (zanotto et al., 1996). however, phylogenetic approaches necessarily assume the master template is the fittest type and that mutations or variants in the rna populations are a source of genetic load that are deleterious and limiting to virus adaptation (pybus et al., 2007). on this view, such variation is mostly due to "unfit" mutations, which indicates that a viral cloud is mostly an unfit consortium. it would seem that such conclusions go against the concept of quasispecies as being fit per se as described above.
in this consideration we see a major weakness of extant phylogenetic methods. they were not developed to assess the evolutionary relationship and fitness of interacting mixtures. nor were they designed to follow the evolution of systems with high rates of recombination between numerous parental templates. we currently lack the analytical tools for such a population analysis. without such tools, however, it seems we can only evaluate those parameters we can define and will remain confused by those we cannot. evolution of a consortium thus provides a new direction for theoretical and laboratory research. we should seek to investigate the mixture, not just its average. another major virus-host system that has been highly studied is the viruses of agricultural plants. our understanding of plant viruses has also been highly influenced by disease associated with agricultural domestic species; thus natural virus-plant relationships are much less understood, although some recent field studies are starting to change this situation (see chapter 12). we currently have a rather uneven understanding of broader virus-host relationships and evolution in plants. for example, viruses of the more ancient ferns, if they exist, are essentially unknown. the prevalence and diversity of positive-stranded rna viruses in plants is striking. in addition, we are starting to appreciate that virus-virus interactions are also frequently involved, although this issue remains poorly studied. one well-studied family of plant viruses is the tobamoviruses of angiosperms (see chapter 11; gibbs, 1999). progenitors of this virus family also appear to be found in algae and fungi, consistent with a very long evolutionary history. both high transmission between hosts and virus-host congruence are observed with these viruses. virus-virus interactions also seem to be important.
for example, tobacco mosaic virus (tmv) and tomato golden mosaic virus (tmgv) appear to have shown interactions in australia which have apparently led to the extinction of tmv, but the retention of tmgv with no increase in genetic diversity (fraile et al., 1997). plant viruses have also been seen as quasispecies in some but not all settings (see chapter 12; roossinck, 2003; roossinck and schneider, 2006). besides the interactions expected for typical viral quasispecies, plants often show evidence of more extensive mixed virus infections. there are, for instance, many examples of satellite viruses that must necessarily interact with other rna viruses of plants. it is also clear that the subviral elements of even a single viral lineage can greatly affect the virus-host relationship. such subviral elements (dis) have been observed to both reduce and intensify disease, and also interact with satellite viruses (qiu and scholthof, 2001); thus virus-virus interactions are clearly crucial in many situations (simon et al., 2004) and viral interactions and synergism appear to have led to significant events in plant virus emergence (fargette et al., 2006). virus-virus interactions are not limited to plant rna viruses. the ssdna plant geminiviruses also display complex interactions with satellites as well as high diversity in field isolates of east africa (ndunguru et al., 2005). thus, plant viruses seem particularly prone to interactions. more recently, virus-mediated symbiosis with respect to host survival has been reported (roossinck, 2005) (discussed below). phylogenetic methods also struggle to address the occurrence of high rates of recombination in viral lineages. such a situation complicates the analysis, creating hard-to-define, reticulated trees, although these limitations can be partially overcome by using sliding windows for the analysis.
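to make the sliding-window idea concrete, the following is a minimal sketch in python, not a method taken from any of the studies cited here. it assumes three pre-aligned sequences of equal length (a query and two putative parents), scores each window by simple percent identity, and flags a candidate breakpoint wherever the closer parent switches. the function names, window size, step, and toy sequences are all hypothetical choices made for illustration; real analyses use statistical tests and phylogenetic support rather than raw identity.

```python
def window_identity(a, b):
    """Fraction of matching positions between two equal-length aligned strings."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def sliding_window_scan(query, parent_a, parent_b, window=8, step=4):
    """Score each window of the query against two putative parents.

    Returns a list of (window_start, identity_to_a, identity_to_b, closest),
    where closest is "A", "B", or "tie".
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        id_a = window_identity(query[start:end], parent_a[start:end])
        id_b = window_identity(query[start:end], parent_b[start:end])
        if id_a > id_b:
            closest = "A"
        elif id_b > id_a:
            closest = "B"
        else:
            closest = "tie"
        results.append((start, id_a, id_b, closest))
    return results


def candidate_breakpoints(results):
    """Window starts at which the closest parent switches (ties are skipped)."""
    calls = [(start, closest) for start, _, _, closest in results if closest != "tie"]
    return [s2 for (_, c1), (s2, c2) in zip(calls, calls[1:]) if c1 != c2]


if __name__ == "__main__":
    # Toy alignment (hypothetical data): the query copies parent A for the
    # first 12 positions and parent B for the last 12.
    parent_a = "A" * 24
    parent_b = "C" * 24
    query = "A" * 12 + "C" * 12
    scan = sliding_window_scan(query, parent_a, parent_b)
    for row in scan:
        print(row)
    print("candidate breakpoints:", candidate_breakpoints(scan))
```

a single tree built from the whole toy alignment would place the query ambiguously between the two parents, whereas the window-by-window view separates the two histories. this is the sense in which windows partially overcome the reticulation problem.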
such approaches have allowed surveys of recombination in some viral lineages, such as with the plant potyviruses (chare and holmes, 2006). however, the rampant recombination and quasispecies generation of hiv-1 makes a quantitative assessment of the virus population problematic. one proposed solution is to use a composition vector method (gao and qi, 2007). the issue of measuring recombination and tracing evolution in large populations is especially a problem that applies to the dna viruses (phage) of prokaryotes (see below). our perception regarding the overall importance of dna viruses of prokaryotes to the evolution of life on earth has undergone a major shift in recent years. the main realization is that dna phage are the numerically dominant genetic entity in most habitats on earth (mentioned above). in addition, as discussed in chapter 10, it is now clear that some of these viruses are surprisingly complex and that essentially the entire pool of dsdna viruses of prokaryotes may be exchanging dna via recombination at high rates. this would constitute by far the largest common gene pool on earth. historically, the evolution of the dna viruses of prokaryotes has seldom been considered in the broader context of virus evolution or evolutionary biology. although it has long been realized that there are many basic similarities between viruses of bacteria and eukaryotes (luria et al., 1959), not until structural studies solved the capsid structures of prokaryotic and eukaryotic viruses did the evolutionary relationships between these viruses become clear. in addition, there have been a number of striking proposals that suggest that dna viruses of prokaryotes may be involved in the origin of several major systems used by cells and that viruses appear to be involved in several major transitions during host evolution. thus we now consider the possibility that these dna phage were fundamental to the origin and evolution of life on earth.
it now seems likely that some large dna viruses infecting eubacteria, archaea, and eukaryotes share some common evolutionary histories. it also seems clear that such viruses can link all three domains of life. this realization was not apparent based on phylogenetic sequence conservation, which is absent. it stems from the structure and assembly of virion capsids, in which t4 phage, halophage, and the herpesviruses all show clear similarity, as well as similarity in replication strategies. in addition, phage prd1 and adenoviruses show similar broad structural and strategic conservation. some biochemical (dna pol family) and genetic similarities (gene order, gene programming) are also apparent, which taken together support the common origins of these viruses (hendrix, 1999, 2002; hendrix et al., 1999, 2003). t4-like viruses in particular seem to represent a major source of global genetic diversity. this giant genetic pool represents a huge potential to affect life (filee et al., 2005) and the viral genetic creativity represented by this pool would also be vast (nolan et al., 2006). since t4-like phage that infect cyanobacteria also encode virus-specific type ii photosynthetic core genes, viruses appear able to create the most complex of genes as well (clokie et al., 2006; sullivan et al., 2006). as presented in chapter 10, phage are now thought to evolve by distinct and highly mosaic "horizontal" processes of rampant recombination (hendrix, 2002, 2003). large dna phage appear to be ancient, present before the split of the three main branches of cellular life: bacteria, archaea, and eukarya (benson et al., 2004). luca, the last universal common ancestor, would represent the putative cell ancestor prior to this split. however, phylogenetic analysis of common or conserved genes of luca identifies only about 325 or fewer genes in extant cellular genomes (mushegian, 1999; koonin et al., 2001; mirkin et al., 2003).
ironically, the genes needed for dna replication are not part of this conserved set, calling into question the nature of the first dna-based cell. large-scale "horizontal" transfer seems to have clearly prevailed early in the evolution of dna-based cellular life and it has recently been asserted that luca existed in a highly horizontal "consortia" of cooperative genes that developed the common genetic code (vetsigian et al., 2006). since the dna replication proteins in the extant three domains of life have distinct compositions, it has been proposed by forterre that dna viruses and retroviruses were directly involved in the invention of the three extant cellular dna replication systems. according to this view, early cellular life was completely entangled with viral (phage) lineages; hence cells must have evolved from an ancestral "virus"-mediated population, not a single genetic lineage. thus the evolution of early life would have clear similarity to the quasispecies (consortia) state of genetic information as seen in rna viruses above. thus the huge creative and adaptive potential of viruses would have been directly involved in the very earliest evolution of life. clearly, such conjectures regarding the most ancient events in the evolution of life are hard to substantiate. but these theories are as viable as any other and deserve serious consideration. in spite of this seemingly unending mosaic exchange in dsdna phage, some phage isolates show surprisingly stable genetic makeup. we now accept that t4-related phage are an important source of the larger global phage genetic diversity and that most such viral genes are novel (filee et al., 2006b; nolan et al., 2006). yet even with t4-like viruses, there can be clear barriers to horizontal gene transfer which promote the evolution of stable viral lineages (filee et al., 2006a).
in t4-type phage, 24 similar core genes could be seen in all genomes, which seem to be inherited in gene blocks that preclude recombination. however, these blocks were not seen in the broader t-even and pseudo t-even genomes. other phage also show surprising genetic stability when repeatedly isolated from similar habitats, such as soil phages of burkholderia (summer et al., 2006) and bam35 (saren et al., 2005) as well as some hot spring isolates (khayat et al., 2005). this bam35 capsid also identifies another structural motif mentioned above that is broadly conserved in evolution and shows clear similarity to the capsids found in prd1 and pbcv-1 (discussed below). sh1 also has a clear prd1-related capsid, membrane, and genome; thus this halophilic euryarchaeon virus, although showing no sequence similarity to prd1 or any other bacterial phage, is clearly structurally related (bamford et al., 2005a). it is interesting that overall the viruses of hyperthermophilic crenarchaeota generally show no sequence relationship to phage of bacteria. in addition, the use of the term phage for these viruses can also be questioned, as most establish non-lytic chronic infections. many of these crenarchaeota viruses have unique morphologies not found in any other domain of life (prangishvili et al., 2006a, 2006b; ortmann et al., 2006). some, however, have clear structural and genetic similarity to specific phage (i.e. t4). considerations of phage evolution and rampant recombination (especially with t4 and t-even phage) often emphasize the viral lytic lifestyle and host death. in fact this lytic relationship was argued by many early phage researchers to be the fundamental and only character of phage-host relationships in general. we now know, however, that persisting (temperate) phage are also common, some of which have no independent lytic phase.
the fundamental model of phage persistence by unique integration into host chromosomes (temperate lysogeny) marks a major development in our understanding of molecular virology and virus-host relationships, which was first clarified by campbell in 1962 (see campbell, 2007). all free-living prokaryotes show the presence of colonized phage in their genomes. both complete and defective genomes of dsdna viruses have been observed in the sequenced dna of all free-living prokaryotic genomes (gelfand and koonin, 1997) (exceptions are some intracellular parasites and plastids). thus, the massive genetic diversity and novelty of phage evolution as presented above has a direct conduit into the genetic composition of all prokaryotes via lysogeny. the fitness and evolutionary consequences of such colonization to the evolution of the host and its virus should be considerable but are in need of theoretical development. fitness of temperate phage, however, is more complicated than that of a lytic virus and, like fitness of persistence discussed above, cannot be simply described by relative replication or efficient virion production. here too, successful phage colonization must inherently limit the replication of the same virus. thus, a temperate lifestyle also requires an autoinhibitory capacity. this generally involves an immunity gene set that not only limits self-replication but can also affect replication of other temperate and lytic viruses, i.e. lambda (even as a defective) precludes t4 and other t-even phage. uncolonized hosts are thus susceptible to lysis by highly prevalent acute tailed phage. host fitness is thus strongly affected by a temperate phage due to its ability to preclude and survive other competing phage.
i suggest this situation is similar to the mhv-mouse exemplar above, in that virus-colonized hosts are in a state of "virus addiction" in which persistence is needed to provide protection from the same or similar virus (villarreal, 2005, 2006). it is well established that most natural populations of bacteria have specific patterns of phage colonization, hence the utility of phage typing for strain identification. from this, we can infer that virus-virus competition is a prevalent and major issue regarding the prokaryotic fitness resulting from a symbiotic temperate phage-host combination. in addition, such virus-host symbiosis can also affect competition with other bacteria. this would be very much like the virus addiction concept outlined above for the mhv exemplar. the original observation of a lysogenic process and coining of this term occurred in the 1920s when two pure cultures of bacteria were grown together. it was observed that in some combinations, one strain would lyse the other strain (was lysogenic). later, it became clear that such lysis was mediated by reactivation of temperate phage present in the lysogenic strain, but absent from the non-lysogenic susceptible strain. in this relationship, we see another example of group selection operating on bacterial populations harboring a persistent virus. thus, which host is fit depends very much on the prevalent viruses it will encounter as well as the viruses that colonize it. bacterial populations that are colonized by the same or similar phage express the appropriate immunity functions and are protected from lysis by the same or similar phage. such a situation has significant implications for the evolution of immunity and group identity for cells. host stability becomes a major fitness issue for a persistent virus life strategy. it is generally thought that a temperate virus attains a stable colonization of its host by simply integrating into and becoming one with the host genome.
however, there are also clear examples of stable phage persistence in which the phage does not integrate and uses other strategies to attain host stability (similar to eukaryotic dna viruses; see below for the p1 phage exemplar of this). like a host colonized by a temperate phage, a host that is colonized by episomal persisting viruses has also been much affected in its evolutionary potential. it is clear that phage can have complex effects on host populations, but these phage themselves often exist in complex and mixed states that can be difficult to unravel (harcombe and bull, 2005). it has been known for some time that the presence of otherwise silent phage can greatly affect the growth of other viruses and the susceptibility of the host. one such silent and common phage that has long been studied is p1. p1 was initially discovered due to its effect on t4 and lambda. however, p1 has been a very interesting model, not because it causes disease or offers potential therapy against bacterial pathogens, but simply because it persists efficiently as an episome and competes effectively with many other phage (yarmolinsky, 2004). since it does so without integrating, p1 provides us with one of the only well-studied models that can inform us regarding the molecular strategies and details of how stability in non-genomic persistence is attained. curiously, a main strategy by which p1 attains this stability was inapparent and not suspected after several decades of study. it became apparent only after replication mutations were made that induced self-destruction and uncovered the existence of what came to be called "addiction modules" (lehnherr et al., 1993). p1 encodes several gene pairs (toxin/antitoxin pairs, such as the phd/doc pair) that protect bacteria harboring p1, but kill daughter bacteria that have lost the p1 genome (gazit and sauer, 1999). this strategy compels colonized e. coli to maintain p1 or die (doc, death on curing).
however, these very same addiction systems are also involved in protecting a p1-colonized colony from t4 and lambda infection and will also induce self-destruction when cells are infected by those viruses, protecting the colony (population). p1 also provides an exquisite level of molecular self-identification in that it will recognize a single second copy of its own genome (yarmolinsky, 2000). what then is the fitness and evolutionary consequence to e. coli harboring p1? clearly it is major, but mostly host fitness is affected relative to other viruses. accordingly, when contemplating the amazing complexity of the p1 immunity and how it evolved, yarmolinsky posed the question: "could the byzantine complexity of the controls at immi be the outcome, not of successive host-parasite accommodations, but of competition among related phages?" (yarmolinsky, 2004). if we answer yes to this question, then we would also conclude that virus-virus interactions and competition in general are major forces in the adaptability and evolution of persisting phage and surviving colonized host. in this light, viral persistence takes on a major role in virus and host evolution. the p1 exemplar has thus provided us the concept of viral addiction that also promotes host group selection. historically, we are biased to think of viruses (and phage) as agents that simply kill their host. some have proposed that the prokaryotic global biomass is partitioned by phage into those populations that live and those that die due to viral lysis. from such a perspective, viral novelty would seem of little relevance to host evolution. metagenomic projects, as noted above, have sequenced nearly 2 million phage genomes and report that most of these phage genes are unique, not in the database, and likely not derived from host (edwards and rohwer, 2005).
the protein repertoire of sequenced phage indicates that 80% of conserved phage genes are specific to phage and show an evolutionary independence from genes of the host. this identifies a massive genetic novelty from virus, which is especially apparent in large dna phage. as just discussed above, however, those hosts that live are also products of phage selection, and persisting temperate phage play a major role in this. such phage colonization allows this massive phage novelty to find its way into host genomes, which allows complex viral gene sets to be applied to novel problems of host adaptation. host novelty can thus be introduced by phage (comeau and krisch, 2005). that persistence is a major life strategy of phage is confirmed by the large numbers of genes associated with persistence (e.g. integrases, immunity) observed in metagenomic screens. there is also much practical experience that supports the crucial role of prophage in host evolution. one particularly well-characterized system, studied for over 50 years, is the ongoing evaluation of phage evolution as observed in the dairy industry (canchaya et al., 2003; brussow et al., 2004). the temperate phage analysis of these bacteria follows a long tradition of lambda and e. coli studies (campbell et al., 1992; canchaya et al., 2003, 2004). since lytic phage can severely disrupt dairy fermentation, it was of particular interest to understand and trace their evolution. these studies have led brussow to conclude that much of the more recent dairy bacteria evolution can be considered to have resulted from the action of temperate phage. a similar view applies to e. coli and cyanobacteria. in addition, the ecor collection of 72 sequenced e.
coli genomes of medical interest shows that they differ from each other mainly due to patterns of genetic colonization, mostly by prophage, but they also show the presence of trna-adjacent defective prophage and plasmid elements that differentiate these strains (hurtado and rodriguez-valera, 1999; mazel et al., 2000; nilsson et al., 2004). cyanobacteria (prochlorococcus) is a major model for the study of the origin of the type ii (plant-like) photosynthetic system. since such genes show much evidence of recent and massive horizontal movement, it seems quite likely that prophage are mediators of such transfers, especially as these phage encode their own version of these photosynthetic genes (lindell et al., 2004; sullivan et al., 2006). very similar prochlorococcus strains exist in distinct oceanic populations in various habitats known as ecotypes. some think that such ecotypes represent the initial type of genetic variation that leads to speciation. the sequencing of six ecotypes has shown that they are 99% similar to one another, but the genetic variation that distinguishes them is mostly due to patterns of prophage colonization (called phage islands) (bouman et al., 2006; coleman et al., 2006). thus in all these prokaryotic models, persisting viruses play a fundamental role in host evolution and host genetic novelty is mostly phage derived. such observations have led some to propose that "war is peace" regarding virus-host evolution (comeau and krisch, 2005). massive and complex innovation by phage appears to be a major force in the prokaryotic world. prokaryotes are the most adaptable of all cells. if we can accept the above conclusion concerning the role for viruses in the evolution of prokaryotes, we must then ask why such a successful evolutionary strategy was apparently not maintained in eukaryotes.
in eukaryotes we see little evidence that large-scale integration by dna viruses is an important evolutionary process (although the story with retroviruses is different). why should prokaryotes and eukaryotes differ in such a fundamental way? nevertheless, as noted at the start of this section, we do see good evidence that links the evolution of large dna viruses of prokaryotes to the large dna viruses of eukaryotes. in case we were becoming comfortable with the apparently clear distinctions between rna and dna virus evolution as outlined above (quasispecies vs. domain recombination, respectively), the evolution of the parvoviruses informs us that dna viruses can also evolve by a quasispecies process. parvovirus evolution (see chapter 17) can show a sharp contrast to the evolutionary pattern displayed by the other small dsdna viruses above (hpv, py). with the emergence of an acute pandemic in domestic dogs and cats (as well as other wild carnivore species), we see what is essentially evolution driven by single point mutations, mostly affecting the capsid genes and host cell receptor binding. this system provides us with one of the better studied examples of the evolutionary dynamics of an emergent viral disease. in addition, in vivo mouse studies with minute virus of mice (mvm) now make it clear that parvoviruses can behave much like rna viruses, generating quasispecies of diverse progeny that allow a high adaptability for the generation of fitness and disease in vivo (lopez-bueno et al., 2006). this story is very reminiscent of the study of poliovirus in mice mentioned above. human studies with b19 parvovirus are also consistent with high mutation rates (parsyan et al., 2007; shackelton and holmes, 2006). although not specifically addressed in this volume, the viruses of eukaryotic unicellular green algae are of special interest from the perspective of dna virus evolution.
these large, complex dsdna membrane-containing icosahedral viruses are abundant in some water habitats (van etten, 2003; ghedin and claverie, 2005). the reason they deserve special attention is that they clearly have many features that are characteristic of both prokaryotic and eukaryotic viruses. they resemble prokaryotic viruses in that their life cycle is clearly phage-like, with external virion attachment, injection of dna and no pinocytosis. in addition, they also encode many phage-like genes, such as restriction-modification enzymes and homing endonucleases (filee et al., 2006c). they also resemble eukaryotic viruses in that they have eukaryotic dna replication proteins (dna polymerase beta and pcna; chen and suttle, 1996; nagasaki et al., 2005; villarreal and defilippis, 2000) as well as many genes associated with eukaryotic signal transduction (van etten et al., 2002). thus they represent a clear link between prokaryotic and eukaryotic dna viruses. for example, the dna polymerase of paramecium bursaria chlorella virus (pbcv-1) is the most conserved gene; it most closely resembles that found in human herpesvirus and is distantly related to the similar family dna pol encoded by t4. this polymerase is distinct from that of the poxviruses or prd1/adenoviruses (associated with protein-primed dna replication). however, numerous other genes of the phycodnaviruses are similar to some genes found in the mimiviruses (giant dna viruses of ameba), including the presence of conserved inteins in the dna pol gene (ogata et al., 2005). in view of this it is most curious that in structural similarity, phycodnavirus capsids clearly resemble the prd1 capsid (khayat et al., 2005; nandhagopal et al., 2002). prd1 contains the double-barrel trimer capsid structure that was first observed in adenovirus (for references see saren et al., 2005). adenovirus also closely resembles prd1 in dna replication strategy (i.e.
linear dna with covalently closed ends) (benson et al., 2004; khayat et al., 2005). the lineage of adenovirus-like dna viruses, however, is thought to be distinct from that of herpes- and poxviruses, and its dna polymerase is clearly distinct from that of phycodnavirus. it is clear that related elements of all these viruses can be found in phycodnaviruses. overall, the phycodnaviruses, like phage, also appear to be creating genes in large numbers and they encode many genes unrelated to their host. what then is the evolutionary relationship that links all of these seemingly distinct viruses? as outlined above, the pattern of evolution of dsdna phage involves lots of exchange by recombination from a vast gene pool. this pool resembles a cloud from which various mosaic subelements and substrategies are assembled to allow viral gene acquisition and novelty (blum et al., 2001; benson et al., 2004). does such a distributed pattern of evolution and gene novelty also apply to the phycodnaviruses? recently, another distinct phycodnavirus has been sequenced: coccolithovirus (ehv-86) (allen et al., 2006a, 2006b). it conserves only 24 core genes in common with pbcv-1 and is unique among the phycodnaviruses in that it has acquired six dna-dependent rna polymerase subunit genes, which are absent in all other phycodnaviruses. as rna polymerase is considered a core viral gene function, it is clear that phycodnaviruses can alter some very basic molecular functions during their evolution. oceanic phycodnaviruses are thought to have a large influence on the free-living populations of eukaryotic algae, such as the termination of algal blooms reported for emiliania huxleyi virus (martinez et al., 2007; schroeder et al., 2003). however, not all phycodnaviruses are lytic. another lineage of phycodnaviruses is represented by two viruses of filamentous brown algae, esv-1 and firrv-1 (delaroque et al., 2003).
unlike the lytic phycodnaviruses noted above, these two viruses are "temperate phage"-like. that is, they exist as silent viruses whose dna is integrated into the germlines of their host. in this, they are unique among all known eukaryotic dna viruses; host chromosome integration is a normal part of their persistent life strategy. esv-1 has a 335 593-bp genome and encodes 231 likely genes (delaroque et al., 2001). these genes are mostly unique and only 28 are clearly related to pbcv-1 genes. the gene differences include many replication genes and their gene order is completely different. as with the temperate phage-host evolutionary relationship outlined above, it would be most interesting to understand how the integration of these large dna viruses has affected host evolution. thus, the phycodnaviruses appear to represent a basal but diverse viral lineage that has both acute and persistent lifestyles and has some clear relationships to most large eukaryotic dna viruses and many phage. the phycodnavirus exemplar above should leave us with several impressions regarding the nature and evolution of these large and ubiquitous dna viruses of algae, an early eukaryotic host. they show clear linkages by structure and function to both phage and various eukaryotic dna viruses. they also show major variation and novelty in their own genetic composition, including their core genes. in addition, they show clear relationships to distinct and seemingly separate viral lineages (adenoviruses, herpesviruses, poxviruses, iridoviruses). the picture we are left with is that they seem to resemble phage evolution in that they appear to have evolved from a diverse pool that has exchanged many basic viral features and created many new genes. this view, however, contrasts sharply with the work of iyer et al. (2001, 2006).
by considering the small number of conserved genes in four families of eukaryotic dna viruses (poxviruses, asfarviruses, iridoviruses, phycodnaviruses), they suggest that these viruses are monophyletic, evolving from a common nucleo-cytoplasmic large dna virus (ncldv) with an icosahedral capsid. given the above information, i find this view unhelpful and possibly confusing. it has numerous problems. the main problem is that it fails to acknowledge the clear link between prokaryotic and eukaryotic viruses. furthermore, by focussing on a small set of related genes, it represents a traditional perspective as found in evolutionary biology that assumes a common (fittest) linear lineage, not a cloud, cooperative, or mosaic pool, as the main source of novelty resulting in the matrix pattern of virus evolution. the virosphere is clearly not disconnected from itself, but it is also clearly not a linear or tree-like evolutionary system as suggested above. we must learn to think of virus evolution on its own terms: fuzzy, mixed, reticulated, and cloud-like. as mentioned in the phage section, there have been various publications that suggest a deep evolutionary relationship between the herpesviruses and dsdna viruses of prokaryotes (rice et al., 2004; khayat et al., 2005; duda et al., 2006; akita et al., 2007). such enormously distant relationships, however, cannot now be measured by any reliable metric. although herpes-like viruses are found in invertebrates (such as ostreid herpesvirus 1 (oshv-1)) in both lytic and asymptomatic states (barbosa-solomieu et al., 2005), our interest in their evolution has been mainly focussed on the vertebrate herpesviruses. vertebrate herpesviruses do tend to show clear sequence conservation that suggests broad patterns of evolution. one interesting feature of this evolution is the apparent link between the biology of the virus and its evolution.
a common, but not universal, pattern is that of virus and host co-evolution (mcgeoch et al., 2000, 2006; mcgeoch and gatherer, 2005). this trend has maintained several biological characteristics, such as highly host species- and tissue-specific persistence (e.g. neuronal and lymphoid persistence). the discovery of hhv-8 has further stimulated studies of herpesvirus evolution in that hhv-8 appears to have undergone much recombination with herpesviruses of related primate lineages (mcgeoch and davison, 1999). thus recombination seems prevalent in herpesviruses. the apparent link between herpesvirus evolution and recent human evolution, as well as an apparent link to primate retroviral evolution, is fascinating, but of unknown significance (kung and wood, 1994; lacoste et al., 2000). the herpesvirus lineages will often show the presence of lineage-specific genes. many of these genes affect innate and adaptive host functions, whereas others affect host metabolism. when the source of such genes has been contemplated, in contrast to phage, phycodnaviruses, or baculoviruses (herniou et al., 2001), it is often proposed that most such herpes genes originate from the host. it is well accepted that the three major lineages of herpesviruses descended from a common ancestor in vertebrates (mcgeoch et al., 2006). there have been numerous proposals that most new lineage-specific herpesvirus genes have originated from host (see becker and darai, 2000). this includes herpesvirus dutpase (davison and stow, 2005), and viral chemokines and viral bcl-2 (nicholas et al., 1998). in my evaluation of such claims, however, it seems clear that the possibility that there was an ancient viral source of such genes was not considered and cannot now be dismissed. we currently believe that ancient herpesvirus ancestors can be traced to tailed phage (hendrix, 1999; bamford, 2003; baker et al., 2005; duda et al., 2006; mcgeoch et al.
, 2006). other phage lineages also appear to trace to eukaryotic viruses (bamford et al., 2005b). within the herpesviruses, the same t = 16 icosahedral structure, as well as invertible dna regions, are also present in the very distant but much more recognizable oceanic ostreid herpesvirus 1. given the highly diverse and mosaic nature of large dna virus evolution in prokaryotes and lower eukaryotes described above, it seems quite possible that many other viral genes might also trace far back in virus evolution. consider the example of dutpase in avian and mammalian herpesvirus (davison and stow, 2005; mcgeehan et al., 2001). the current view requires very complicated gene rearrangements to account for the acquisition of this viral gene from its host. yet we know that diverse dutpases are found in many ancient viral lineages. for example, the ervs present in all vertebrate genomes also conserve dutpase (jern et al., 2005), as do exogenous retroviruses (e.g. lentiviruses) (mcintosh and haynes, 1996). in fact, since the herpesvirus genes are especially poor in introns, it would seem likely that any herpesviral gene acquisition would necessarily involve a retrovirus via a cdna. the oceans are especially filled with large complex dna viruses (such as mimivirus and phycodnavirus, plus numerous relatives of oshv-1) thought to be ancient ancestors of herpesvirus. the phycodnavirus (chlorella virus, pbcv-1) provides a clear bridge between phage and eukaryotic dna viruses. pbcv-1 also encodes a dutpase that has the highly conserved motif iii (zhang et al., 2005). many phage are also known to encode dutpases of diverse types, such as a phage of b. subtilis (spbeta) (persson et al., 2005), and a phage of thermus thermophilus (naryshkina et al., 2006).
this thermus phage (phiys40) is of special note since its dutpase gene is clearly related to the dutpases of eukaryotic viruses and has a version that has undergone multiple events of recombination from apparently distinct phage, exactly as expected for mosaic phage genes. thus, the origin of new herpesvirus genes might not be so different from that seen in other large dna viruses, and a potential ancient source of new genes from these ancestral viruses remains plausible. similar considerations apply to other possible examples of herpesvirus gene capture. for example, the herpes thymidylate synthase (ts) has also been considered to have originated by host gene capture. yet distinct versions of these genes are also found in different herpesviral lineages, which would necessitate multiple independent "capture" events of different versions of host ts genes. ts genes are present in ancient virus sources. for example, bacillus phage beta 22 encodes ts, which also has a self-splicing intron (bechhofer et al., 1994). also, phage phikz has a highly conserved ts (mesyanzhinov et al., 2002), yet this virus lacks a dna polymerase or other replication proteins, clearly indicating that the viral ts gene has a basic viral role. similarly, the cytokine-like genes (such as il-10) found in poxviruses and herpesviruses appear to have originated in at least three independent events prior to the divergence of mammalian eutherian orders. yet it is still presupposed that they are necessarily the products of host gene capture (hughes, 2002). comparative genomics supports the idea that the herpesvirus lineages are originating viral genes. a broader phylogenetic analysis of all herpesvirus genomes identified only 17 genes in common to all 30 taxa of herpesvirus. thus only 17 genes appear to be in common to all the herpesviruses. in this analysis, only a few genes of recent origin could be identified as possibly having been transferred between virus and host (e.g.
new genes found at tips of phylogenetic dendrograms). thus, gene gain in the herpesviruses (as in dna phage and phycodnavirus) is prevalent but the origination of such genes from the host is not prevalent. i suggest that our tendency to assume that new viral genes are usually "stolen" from the host should be revised (moreira and lopez-garcia, 2005). in contrast to that of the herpesviruses, poxvirus evolution tends to show little congruence with host evolution (see chapter 19). yet, they too show evidence of ancient linkages to other viruses. the replication of poxvirus dna is distinct in that it involves a linear genome with inverted ends that have covalently closed "snapback" dna. the resulting replication structures involve head-to-tail and tail-to-tail intermediates. this replication strategy is very different from that used by the host (and most other dna viruses), but is clearly related to that found in other eukaryotic and prokaryotic viruses. similar replication mechanisms are seen in all poxviruses, as well as african swine fever virus and phycodnaviruses (pbcv-1). this exact replication strategy is also present in archaeal lipothrixviruses (sirv1 and sirv2), which have been proposed to be ancestral to phycodnaviruses and poxviruses (persson et al., 2005). a similar replication strategy is also seen with n15 (lobocka et al., 1996), an unusual phage of e. coli that persists as a linear dna (casjens et al., 2004). conservation of such replication similarities clearly suggests ancestral relationships, but no sequence similarity can be seen between these viruses. the similarity between poxvirus and pbcv-1 dna replication deserves some additional comment. pbcv-1 and herpesvirus have very similar dna polymerase genes, yet differ fundamentally in replication strategy. furthermore, the poxviral dna polymerase gene is very different from that found in the herpesviruses.
yet, the pbcv-1 capsid is clearly similar to that of adenoviruses and prd1 phage (and iridovirus capsids). how then do we link poxvirus evolution to other more ancient dna viruses, such as pbcv-1, which has the same dna replication mechanism but distinct replication proteins? such observations might seem confusing, but they are clearly consistent with mosaic, reticulated evolution of dna viruses. various distinct phage lineages can link in multiple ways to various distinct eukaryotic dna viruses. the concept of a net or matrix rather than a tree is thus a better way to describe the broad topology of dna virus evolution. the issue of gene gain and gene loss is also of central interest to orthopoxvirus evolution. typically, we seek to understand poxvirus evolution from the perspective of pathogenesis, such as the origin of human-specific smallpox virus. with the comparative genomics of several orthopoxviruses now possible, we see curious overall patterns of gene loss in their evolution (randall et al., 2004). for example, comparing human smallpox to cowpox dna (a rodent virus that is phylogenetically basal to smallpox), we observe an overall diminution of gene content in smallpox virus. several poxviruses seem to have also lost genes relative to cowpox virus, especially genes that appear to affect immunity (hughes and friedman, 2005). i suggest that this evolutionary tendency for gene reduction is associated with a switch from a more demanding species-specific persistent life strategy to a less demanding, acute life strategy in a new host. cowpox is a naturally persistent infection in rodents (bank voles) (feore et al., 1997; chantrey et al., 1999), which have been called a natural virus reservoir (hazel et al., 2000). smallpox is a strictly acute and human-specific disease.
such gene loss in association with lost persistence could be a general situation and might also explain why clinical isolates of human cytomegalovirus show a strong tendency to delete genes with passage in culture (davison et al., 2003). most orthopoxviruses are not phylogenetically congruent with their vertebrate host. host switching and acute replication seem to be relatively common but recent occurrences in their evolution (babkin and shchelkunov, 2006). the avian poxviruses are not as well studied in this context, but curiously have significantly more complex genomes than the orthopoxviruses (jarmin et al., 2006). the entomopoxviruses are even less well understood from both a biological and molecular perspective, although they do conserve 49 genes found in all poxvirus family members (gubser et al., 2004). clearly these poxviruses share some degree of evolutionary history. it is most curious that entomopoxviruses have even larger, more diverse and complex genomes than the other poxviruses. why? as insects lack an adaptive immune system (the target of many orthopoxvirus genes), they would seem to present a simpler host for virus adaptation. this group appears to be the most basal phylogenetically, but evolutionary relationships between entomopoxvirus and insect evolution have not been studied. the entomopoxviruses are particularly prevalent in grasshopper and locust species, often in inapparent states. interestingly, within these viruses we can find examples of major shifts in core replication genes, such as the family of dna pol gene that is used (a shift from dna pol x to dna pol b in two entomopoxvirus lineages). we can recall that the dna pol b gene closely resembles that found in phycodnaviruses (and herpesvirus), but is distinct from that in orthopoxvirus (zhu, 2003).
we also see in the entomopoxviruses some clear links to phage genes, such as the t4-like rna ligase found in all entomopoxviruses (ho and shuman, 2002) as well as a lambda-like integrase seen in dlepv (hashimoto and lawrence, 2005). this integrase in dlepv implies possible integration and persistence, thus it is most significant that dlepv also shows a clear persistent host infection as well as symbiosis and apparent phylogenetic congruence between virus and host. this virus is symbiotic in its parasitoid wasp host in that virus is injected into the larval host along with the wasp egg (and also along with a second virus, the dlrhv rhabdovirus) and virus is needed for successful host parasitization. this symbiosis is clearly very reminiscent of the genomic polydnaviruses of other parasitoid wasp species. dlepv is also expressed in the male poison gland. however, it is unknown if dlepv is integrating into the host dna. clearly, dlepv is part of a complex virus-virus-host symbiotic interaction. the overall evolution of orthopoxviruses contrasts sharply with that of the papillomaviruses as presented by bernard in chapter 18. here, highly species-specific and tissue-specific host infections are the norm and the viral evolution is typically highly congruent with the host (with some exceptions). the resolution between virus and host can be high, in that human racial and geographical populations, for example, can often be differentiated based on the type of hpv they harbor. yet here too there is evidence of significant shifts in core gene usage early during papillomavirus evolution. in the human and rodent viruses, the e6 and e7 early genes provide a highly conserved function associated with replication and cell control. in particular, the prb-binding domain of the e7 gene is thought to be central to the biological strategy of the virus.
thus, it is most curious that the papillomaviruses of lagomorphs, such as bovine and reindeer papillomavirus, lack an e7 rb-binding domain and instead appear to use e5 or e9 genes for this regulatory function (narechania et al., 2004). it seems an early but significant and bifurcating shift occurred in the molecular strategy during the virus-host evolution of this group of viruses for unknown reasons. other small dna viruses (jcv, bkv, py) can also show similar high-resolution host congruence (shadan and villarreal, 1995), as well as similar curious shifts in basic molecular strategies. for example, the presence of a middle t-antigen in mouse virus (a third early gene), but its absence from primate viruses (gottlieb and villarreal, 2001), clearly differentiates these viral lineages. although the origins of all these small dna viruses are obscure, and any links to prokaryotic viruses are unknown, it does appear they have tended to retain their overall biological strategy and show a strong tendency for tissue-specific (especially kidney) persistence and virus-host congruence. since persistence requires the stable coexistence of a virus and its host, it also fits the simple definition of symbiosis (the stable living together of two distinct lineages of organisms). viral involvement in symbiosis is a foreign idea to many and possibly presents a fundamentally different view of the role viruses may have in host evolution. a major role for persisting (temperate, cryptic) viruses in the evolution of prokaryotes is no longer a controversial idea. thus, at least in the prokaryotic world, virus persistence can be accepted as adaptive. in eukaryotes, however, viral persistence is seldom considered adaptive. the mhv-mouse exemplar as presented above has suggested how persistence can directly affect host survival. can this be considered an example of symbiosis in the accepted sense?
a crowning achievement in the field of symbiosis has been to explain the origin of organelles (chloroplasts, mitochondria) from symbiotic prokaryotes in eukaryotic cytoplasm (margulis and bermudes, 1985). this idea involves the high adaptability of prokaryotes to provide innovation but would seem not to involve virus in any way. yet here too we can find viral footprints that suggest some involvement. for example, various organelle-specific rna and dna polymerases clearly resemble polymerases from t3/t7-like phage (cermakian et al., 1996; shutt and gray, 2006). other models of symbiosis also show evidence of a viral role, such as the sexual isolation of buchnera (moran et al., 2005). another very popular topic in the field of symbiosis is the symbiotic origin of the photosynthetic sea slug, elysia chlorotica. what could be more fascinating than a green sea slug, an animal that can use light for photosynthesis? e. chlorotica eats photosynthetic eukaryotic algae (vaucheria litorea) and retains the functional chloroplasts from the algae for months. here too, however, there lies a viral footprint. this slug harbors an unusual endogenous retrovirus which is expressed in large numbers during sexual reproduction, following which all slugs die via synchronized apoptosis and in which the chloroplasts have accumulated numerous viral particles (pierce et al., 1999; mondy and pierce, 2003). since there is reason to think gene movement from the algae to the slug genome is involved in this symbiosis, this retrovirus is a strong candidate to also be involved in symbiogenesis. clearly we should thus investigate retroviral elements as possible symbiotic participants and not dismiss them beforehand as irrelevant or "junk dna" (as is automatically done in many database screens). if viral persistence is a kind of symbiosis, viruses may also mediate the establishment of other symbiotic relationships (villarreal, 2007).
the recent studies by roossinck and colleagues (see chapter 12), in which a persisting virus, a plant, and a fungus were all symbiotically involved in altering the thermal tolerance of the plant, could be an example of this (marquez et al., 2007). many other virus-host relationships should also be examined for possible symbiosis. for example, placental vertebrate evolution has involved various endogenous retroviruses (e.g. herv-w, herv-frd). intact herv genomes, including env orfs, are important for placental trophoblast fusion (for references see ryan, 2007). some will dismiss this situation as the quirky usurping of a viral gene for host function, which is of little general significance; on this view, the specific erv involved is simply selfish and mostly defective genetic material of no general consequence. if so, why is it that in sheep a distinctly different lineage of retrovirus (enjsrv) was also selected to provide a related placental function to another mammal with significantly different placental reproductive biology? it has been experimentally well established that the enjsrv env is essential for sheep embryo implantation (dunlap et al., 2006a, 2006b). enjsrv is the endogenous version of jsrv, a problematic sheep-specific retrovirus that induces lung tumors (responsible for the death of dolly, the famous first cloned sheep). the endogenous virus (enjsrv) is present in 20 copies in the sheep genome and all sheep have this virus. sheep genomes also encode a trans-dominant enjsrv gag that is inhibitory to exogenous jsrv (mura et al., 2004; oliveira et al., 2006; murcia et al., 2007). it seems clear that this situation can also be considered from the perspective of viral symbiosis and/or virus addiction in host evolution. we should thus seek to understand why colonization by an erv population might generally provide a good solution to the evolutionary demands of placental biology.
there are many other opportunities to examine the potential role of persistent and symbiotic viruses in the evolution of viruses, animals, primates, and humans. for example, as we seek to understand the origins of the adaptive immune system we should pay attention to viral footprints. we can ask, for example, why the major histocompatibility complex (mhc) locus, the most polymorphic, diverse, and rapidly evolving gene set in our chromosomes, is so densely colonized with retroviral elements (andersson et al., 1998). why is a retrovirus also the basic element of the duplication unit that was thought to have been the progenitor for the expansion of the mhc class i (and ii) genes (gaudieri et al., 1997; kulski et al., 1998, 2005)? why do similar herv elements (l and 16) also differentiate between human and chimpanzee mhc i (watkins, 1995; kulski et al., 1999)? what was the role for siv in the evolution of primate mhc (vogel et al., 1999)? humans and primates appear to have undergone some significant and relatively recent evolution with regard to their endogenous and exogenous retroviruses. along these lines, apobec-like genes are a basic component of the adaptive immune response but they are also antiretroviral genes that act on retroviral cdna and gag (ohainle et al., 2006). the apobec3 antiviral system has expanded recently in humans, but not chimpanzees (sawyer et al., 2004; ohainle et al., 2006). why? all african primates support unapparent foamy viruses (and also siv co-infection), but not humans (murray and linial, 2006). apobec3c is active against foamy viruses (delebecque et al., 2006). old world primates also underwent an expansion of ervl colonization (a clear relative of foamy virus) (sawyer et al., 2004). was this ervl colonization of relevance to the ancient co-speciation of simian foamy viruses and their primate hosts (switzer et al., 2005)?
what exactly was the relevance of herv endogenization to human survival and adaptations? curiously, human brain (neocortex) specifically expresses many of these more recent ervs as transcripts (nakamura et al., 2003; yi et al., 2004). if we consider these situations as possible examples of virus-mediated symbiosis in human evolution, perhaps they may make more sense of the otherwise confusing role of hervs. as noted, all primates, but especially humans, show much evidence of recent endogenization by retroviruses. but these events mostly occurred in our extinct ancestors and we do not see ongoing evidence that any hervs remain active. however, we are currently witnessing a related virus-host evolutionary event of considerable interest. koala bears, native marsupials of australia, are currently experiencing a major epidemic caused by a leukemia-inducing retrovirus. as a consequence, they are undergoing massive endogenization by a gammaretrovirus (mlv-related). this virus is similar to gibbon ape leukemia virus, but most likely originated from rodent ancestry (tarlinton et al., 2006; fiebig et al., 2006). the expectation is that extinction awaits those koalas that do not adapt or endogenize the retrovirus successfully (stoye, 2006). this event has the appearance of a retroviral-driven addiction that will result in a genetic variant of koala bear that has acquired a new antiretroviral state. this seems equivalent to the expansion of human apobec3; or perhaps a closer analogy is the endogenization of a suppressive gag as occurred with enjsrv. the surviving koala bears will likely tolerate or be persistently infected with this retrovirus pool. the genome of the species will have undergone considerable (but unpredictable) genetic perturbations and likely contain a large pool of variant and defective retrovirus.
however, in so doing, the descendant koalas will likely present a biological hazard to any koala populations that remain virus-free (as in virus addiction). currently, one island colony of koalas is sufficiently isolated to have remained virus-free. this population will henceforth be under persistent threat from populations of endogenized koalas, now favored by group selection. from the very earliest events in evolution of prebiotic replicators to very recent events in human evolution, including the emergence of human-specific hiv, we expect viral evolution to show profound effects on the evolution of all life. unlike accepted host evolution, viruses also employ consortia and mixed populations to evolve, sometimes at unprecedented rates. thus viruses have informed us of quasispecies, group dynamics, and group selection in evolution. virus evolution should now be considered as basic science, not just a medical concern. we must acknowledge that the tree of life cannot be properly understood without virus evolution. this book helps to lay the foundation for such understanding.
epizootic diarrhea of infant mice: indentification of the etiologic agent the crystal structure of a virus-like particle from the hyperthermophilic archaeon pyrococcus furiosus provides insight into the evolution of viruses unusual polymorphisms in human immunodeficiency virus type 1 associated with nonprogressive infection a) genome comparison of two coccolithoviruses b) evolutionary history of the coccolithoviridae retroelements in the human mhc class ii region host switching in lyssavirus history from the chiroptera to the carnivora orders common ancestry of herpesviruses and tailed dna bacteriophages do viruses form lineages across different domains of life? a) what does structure tell us about virus evolution?
b) constituents of sh1, a novel lipid-containing virus infecting the halophilic euryarchaeon haloarcula hispanica ostreid herpesvirus 1 (oshv-1) detection among three successive generations of pacific oysters (crassostrea gigas) persistent infection promotes cross-species transmissibility of mouse hepatitis virus the proportion of revertant and mutant phage in a growing population, as a function of mutation and growth rate an intron in the thymidylate synthase gene of bacillus bacteriophage beta 22: evidence for independent evolution of a gene, its group i intron, and the intron open reading frame serological survey of virus infection among wild house mice (mus domesticus) in the uk molecular evolution of viruses, past and present: evolution of viruses by acquisition of cellular rna and dna viral eukaryogenesis: was the ancestor of the nucleus a complex dna virus? sex and the eukaryotic cell cycle is consistent with a viral ancestry for the eukaryotic nucleus coexistence of recent and ancestral nucleotide sequences in viral quasispecies of human immunodeficiency virus type 1 patients a subset of human immunodeficiency virus type 1 long-term nonprogressors is characterized by the unique presence of ancestral sequences in the viral population does common architecture reveal a viral lineage spanning all three domains of life? the error threshold what is a quasispecies an envelope glycoprotein of the human endogenous retrovirus herv-w is expressed in the human placenta and fuses cells expressing the type d mammalian retrovirus receptor the genome of the archaeal virus sirv1 has features in common with genomes of eukaryal viruses oceanographic basis of the global surface distribution of prochlorococcus ecotypes a) here a virus, there a virus, everywhere the same virus?
b) method for discovering novel dna viruses in blood using viral particle selection and shotgun sequencing metagenomic analyses of an uncultured viral community from human feces minority memory genomes can influence the evolution of hiv-1 quasispecies in vivo phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. microbiol the gene of retroviral origin syncytin 1 is specific to hominoids and is inactive in old world monkeys phage integration and chromosome structure. a personal history lambdoid phages as elements of bacterial genomes phage as agents of lateral gene transfer the impact of prophages on bacterial chromosomes differences in frequencies of drug resistance-associated mutations in the hiv-1 pol gene of b subtype and bf intersubtype recombinant samples lethal intestinal virus of infant mice is mouse hepatitis virus different evolutionary patterns are found within human immunodeficiency virus type 1-infected patients the pk02 linear plasmid prophage of klebsiella oxytoca two proton transfers in the transition state for nucleotidyl transfer catalyzed by rna- and dna-dependent rna and dna polymerases sequences homologous to yeast mitochondrial and bacteriophage t3 and t7 rna polymerases are widespread throughout the eukaryotic lineage cowpox: reservoir hosts and geographic range a phylogenetic survey of recombination frequency in plant rna viruses role of minority populations of human immunodeficiency virus type 1 in the evolution of viral resistance to protease inhibitors extensive recombination among human immunodeficiency virus type 1 quasispecies makes an important contribution to viral diversity in individual patients evolutionary relationships among large double-stranded dna viruses that infect microalgae and other organisms as inferred from dna polymerase genes the thymidylate synthase gene of hz-1 virus: a gene captured from its lepidopteran host mapping of mutations associated with neurovirulence
in monkeys infected with sabin 1 poliovirus revertants selected at high temperature the red queen reigns in the kingdom of rna viruses mimivirus and the emerging concept of "giant" virus transcription of a 'photosynthetic' t4-type phage during infection of a marine cyanobacterium genomic islands and the ecology and evolution of prochlorococcus war is peace - dispatches from the bacterial and phage killing fields genetic richness of vibriophages isolated in a coastal environment implications of high rna virus mutation rates: lethal mutagenesis and the antiviral drug ribavirin rna virus error catastrophe: direct molecular test by using ribavirin poliovirus pathogenesis in a new poliovirus receptor transgenic mouse model: age-dependent paralysis and a mucosal route of infection trans-dominant inhibition of rna viral replication can slow growth of drug-resistant viruses new genes from old: redeployment of dutpase by herpesviruses the human cytomegalovirus genome revisited: comparison with the chimpanzee cytomegalovirus genome a novel class of herpesvirus with bivalve hosts the complete dna sequence of the ectocarpus siliculosus virus esv-1 genome comparisons of two large phaeoviral genomes and evolutionary implications interference between bacterial viruses: iii.
the mutual exclusion effect and the depressor effect restriction of foamy viruses by apobec cytidine deaminases serologic study on the prevalence of murine viruses in a population of wild meadow voles (microtus pennsylvanicus) serologic study on the prevalence of murine viruses in five canadian mouse colonies new evolutionary frontiers from unusual virus genomes quasispecies and rna virus evolution: principles and consequences quasispecies: concept and implications for virology quasispecies and its impact on viral hepatitis nucleotide sequence heterogeneity of an rna phage population origin and evolution of viruses selfish genes, the phenotype paradigm and genome evolution asymmetric dde (d35e)-like sequences in the rag proteins: implications for v(d)j recombination and retroviral pathogenesis shared architecture of bacteriophage sp01 and herpesvirus capsids a) ovine endogenous betaretroviruses (enjsrvs) and placental morphogenesis b) endogenous retroviruses regulate periimplantation placental growth and differentiation syncytin-a and syncytin-b, two fusogenic placenta-specific murine envelope genes of retroviral origin conserved in muridae viral metagenomics molecular quasi-species genetic stability of the vhsv consensus sequence of g-gene in diagnostic samples from an acute outbreak evolution of fitness in experimental populations of vesicular stomatitis virus fluctuation of hepatitis c virus quasispecies in persistent infection and interferon treatment revealed by single-strand conformation polymorphism analysis apobec3g cytidine deaminase inhibits retrotransposition of endogenous retroviruses molecular ecology and emergence of tropical plant viruses the effect of cowpox virus infection on fecundity in bank voles and wood mice transspecies transmission of the endogenous koala retrovirus the role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies marine t4-type bacteriophages, a ubiquitous component of the
dark matter of the biosphere a) a selective barrier to horizontal gene transfer in the t4-type bacteriophages that has preserved a core genome with the viral replication and structural genes b) t4-type bacteriophages: ubiquitous components of the "dark matter" of the biosphere c) i am what i eat and i eat what i am: acquisition of bacterial genes by giant viruses female characteristics in the drosophila melanogaster sigma-virus system in natural-populations from languedoc (southern france) polymorphism of the drosophila melanogaster sigma virus system displacement of cellular proteins by functional analogues from plasmids or viruses could explain puzzling phylogenies of many dna informational proteins the great virus comeback-from an evolutionary perspective the two ages of the rna world and the transition to the dna world: a story of viruses and cells a) the origin of viruses and their possible roles in major evolutionary transitions b) three rna cells for ribosomal lineages and three dna viruses to replicate their genomes: a hypothesis for the origin of cellular domain luca: the last universal common ancestor origin and evolution of dna topoisomerases identification of unique hepatitis c virus quasispecies in the central nervous system and comparative analysis of internal translational efficiency of brain, liver and serum variants central nervous system changes in hepatitis c virus infection a century of tobamovirus evolution in an australian population of nicotiana glauca an ancient evolutionary origin of the rag1/2 gene locus prevalence of indigenous viruses in laboratory animal colonies in the united kingdom 1978-1979 whole genome molecular phylogeny of large dsdna viruses using composition vector method evolutionary transition toward defective rnas that are infectious by complementation retroelements and segmental duplications in the generation of diversity within the mhc the doc toxin and phd antidote proteins of the bacteriophage p1 plasmid
addiction system form a heterotrimeric complex avoidance of palindromic words in bacterial and archaeal genomes: a close connection with restriction enzymes mimivirus relatives in the sargasso sea evolution and origins of tobamoviruses free-grazing ducks and highly pathogenic avian influenza nidovirales: evolving the largest rna virus genome natural biology of polyomavirus middle t antigen. microbiol suppression of viral infectivity through lethal defection poxvirus genomes: a phylogenetic analysis maternal antibodies protect immunoglobulin deficient neonatal mice from mouse hepatitis virus (mhv)-associated wasting syndrome the viriosphere, diversity and genetic exchange within phage communities impact of phages on two-species bacterial communities synthesis and antiviral evaluation of a mutagenic and non-hydrogen bonding ribonucleoside analogue: 1-beta-d-ribofuranosyl-3-nitropyrrole placental endogenous retrovirus (erv): structural, functional and evolutionary significance hantavirus infections: epidemiology and pathogenesis comparative analysis of selected genes from diachasmimorpha longicaudata entomopoxvirus and other poxviruses a longitudinal study of an endemic disease in its wildlife reservoir: cowpox and wild rodents evolution: the long evolutionary reach of viruses bacteriophages: evolution of the majority. theor bacteriophage genomics bacteriophages with tails: chasing their origins and evolution evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage use of whole genome sequence data to infer baculovirus phylogeny bacteriophage t4 rna ligase 2 (gp24.1) exemplifies a family of rna ligases found in all phylogenetic domains sequence variability in the env-coding region of hepatitis c virus isolated from patients infected during a single source outbreak the evolutionary biology of dengue virus the origin, emergence and evolutionary genetics of dengue virus enterotropic mouse hepatitis virus albert b.
sabin and the development of oral poliovaccine origin and evolution of viral interleukin-10 and other dna virus genes with vertebrate homologues poxvirus genome evolution by gene gain and loss accessory dna in the genomes of representatives of the escherichia coli reference collection isolation of mouse hepatitis virus from infant mice with fatal diarrhea common origin of four diverse families of large eukaryotic dna viruses evolutionary genomics of nucleo-cytoplasmic large dna viruses avipoxvirus phylogenetics: identification of a pcr length polymorphism that discriminates between the two major clades use of endogenous retroviral sequences (ervs) and structural markers for retroviral phylogenetic inference and taxonomy rag1 core and v(d)j recombination signal sequences were derived from transib transposons small mammal virology structure of an archaeal virus capsid protein reveals a common ancestry to eukaryotic and bacterial viruses hiv diversity, molecular epidemiology and the role of recombination origin of human immunodeficiency virus type 1 quasispecies emerging after antiretroviral treatment interruption in patients with therapeutic failure virus persistence and recurring demyelination produced by a temperature-sensitive mutant of mhv-4 horizontal gene transfer in prokaryotes: quantification and classification structure-function relationships of the viral rna-dependent rna polymerase: fidelity, replication speed and initiation mechanism determined by a residue in the ribose-binding pocket the evolution of mhc diversity by segmental duplication and transposition of retroelements comparison between two human endogenous retrovirus (herv)-rich regions within the major histocompatibility complex ervk9, transposons and the evolution of mhc class i duplicons within the alpha-block of the human and chimpanzee interactions between retroviruses and herpesviruses.
singapore; river edge evolution and selection of hepatitis c virus variants in patients with chronic hepatitis c kshv-like herpesviruses in chimps and gorillas plasmid addiction genes of bacteriophage p1: doc, which causes cell death on curing of prophage and phd, which prevents host death when prophage is retained evolution and persistence of influenza a and other diseases bats are natural reservoirs of sars-like coronaviruses transfer of photosynthesis genes to and from prochlorococcus viruses protein repertoire of double-stranded dna bacteriophages characterization of the primary immunity region of the escherichia coli linear plasmid prophage n15 parvovirus variation for disease: a difference with rna viruses? cross-species transfer of viruses: implications for the use of viral vectors in biomedical research, gene therapy and as live-virus vaccines bacteriophage: an essay on virus reproduction mutations of bacteria from virus sensitivity to virus resistance virus growth and variation: ninth symposium of the society for general microbiology prevalence of natural virus infections in laboratory mice and rats used in canada different patterns of molecular evolution of influenza a viruses in avian and human populations crystal structure of poliovirus 3cd protein: virally encoded protease and precursor to the rna-dependent rna polymerase symbiosis as a mechanism of evolution: status of cell symbiosis theory a virus in a fungus in a plant: three-way symbiosis required for thermal tolerance hepatitis c virus (hcv) circulates as a population of different but closely related genomes: quasispecies nature of hcv genome distribution mhv infection of the cns: mechanisms of immune-mediated control molecular dynamics of emiliania huxleyi and co-occurring viruses during two separate mesocosm studies role of disease in abundance of a pacific herring (clupea pallasi) population.
can antibiotic resistance in the ecor collection: integrons and identification of a novel aad gene evolution of the dutpase gene of mammalian and avian herpesviruses the descent of human herpesvirus 8 integrating reptilian herpesviruses into the family herpesviridae toward a comprehensive phylogeny for mammalian and avian herpesviruses topics in herpesvirus genomics and evolution hiv and human endogenous retroviruses: an hypothesis with therapeutic implications the genome of bacteriophage phikz of pseudomonas aeruginosa syncytin is a captive retroviral envelope protein involved in human placental morphogenesis algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes apoptotic-like morphology is associated with annual synchronized death in kleptoplastic sea slugs (elysia chlorotica) the players in a mutualistic symbiosis: insects, bacteria, viruses and virulence genes low genetic barrier to large increases in hiv-1 cross-resistance to protease inhibitors during salvage therapy comment on 'the 1.2-megabase genome sequence of mimivirus' murine viruses in an island population of introduced house mice and endemic short-tailed mice in western australia pathogens of house mice on arid boullanger island and subantarctic macquarie island late viral interference induced by transdominant gag of an endogenous retrovirus the transdominant endogenous retrovirus enjs56a1 associates with and blocks intracellular trafficking of jaagsiekte sheep retrovirus gag foamy virus infection in primates the minimal genome concept algal viruses with distinct intraspecies host specificities include identical intein elements human endogenous retroviruses with transcriptional potential in the brain the structure and evolution of the major capsid protein of a large, lipid-containing dna virus lack of the canonical prb-binding domain in the e7 orf of artiodactyl
papillomaviruses is associated with the development of fibropapillomas thermus thermophilus bacteriophage phiys40 genome and proteomic characterization of virions natural history of murine gamma-herpesvirus infection molecular biodiversity of cassava begomoviruses in tanzania: evolution of cassava geminiviruses in africa and evidence for east africa being a center of diversity of cassava geminiviruses the evolution of epidemic influenza novel organizational features, captured cellular genes and strain variability within the genome of kshv/hhv8 site-specific recombination links the evolution of p2-like coliphages and pathogenic enterobacteria genetic diversity among five t4-like bacteriophages exponential increases of rna virus fitness during large population transmissions lack of evolutionary stasis during alternating replication of an arbovirus in insect and mammalian cells a new example of viral intein in mimivirus adaptive evolution and antiviral activity of the conserved mammalian cytidine deaminase apobec3h genetic heterogeneity of hepatitis c virus in vitro characterization of a koala retrovirus selfish dna: the ultimate parasite hot crenarchaeal viruses reveal deep evolutionary connections encyclopedia of evolution identification and genetic diversity of two human parvovirus b19 genotype 3 subtypes marine phage genomics cloning, expression, purification and characterisation of the dutpase encoded by the integrated bacillus subtilis temperate bacteriophage spbeta increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice bottleneck-mediated quasispecies restriction during spread of an rna virus from inoculation site to brain annual viral expression in a sea slug population: life cycle control and symbiotic chloroplast maintenance a) viruses of the archaea: a unifying view b) evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life adp-ribose-1"-monophosphatase: a conserved
coronavirus enzyme that is dispensable for viral replication in tissue culture phylogenetic evidence for deleterious mutation load in rna viruses and its contribution to viral evolution defective interfering rnas of a satellite virus structural proteomics of the poxvirus family the 1.2-megabase genome sequence of mimivirus the structure of a thermophilic archaeal virus shows a double-stranded dna viral capsid type that spans all domains of life plant rna virus evolution symbiosis versus competition in plant virus evolution mutant clouds and occupation of sequence space in plant rna viruses quasispecies development by high frequency rna recombination during mhv persistence memory in viral quasispecies viruses as symbionts a snapshot of viral evolution from genome analysis of the tectiviridae family prevalence of naturally occurring viral infections, mycoplasma pulmonis and clostridium piliforme in laboratory rodents in western europe screened from virus succession observed during an emiliania huxleyi bloom phylogenetic evidence for the rapid evolution of human b19 erythrovirus the evolution of small dna viruses of eukaryotes: past and present considerations coinfection of wild ducks by influenza a viruses: distribution patterns and biological significance bacteriophage origins of mitochondrial replication and transcription proteins plant virus satellite and defective interfering rnas: new paradigms for a new century prevalence of viral antibodies and helminths in field populations of house mice (mus domesticus) in southeastern australia the origin of hepatitis c virus genotypes emergence and predominance of an h5n1 influenza variant in china phylogenetic analyses of type a influenza genes in natural reservoir species in north america reveals genetic variation dynamics of autocatalytic replicator networks based on higher-order ligation reactions koala retrovirus: a genome invasion in real time no evidence for quasispecies populations during persistence of the
coronavirus mouse hepatitis virus jhm: sequence conservation within the surface glycoprotein gene s in lewis rats evolution of avian influenza viruses prevalence and evolution of core photosystem ii genes in marine cyanobacterial viruses and their hosts ultrastructural characterization of the giant volcano-like virus factory of acanthamoeba polyphaga mimivirus ancient co-speciation of simian foamy viruses and primates poxviruses and the origin of the eukaryotic nucleus prevalence and genetic diversity of coronaviruses in bats from a mutation in the rna polymerase of poliovirus type 1 contributes to attenuation in mice retroviral invasion of the koala genome unusual life style of giant chlorella viruses phycodnaviridae-large dna algal viruses collective evolution and the genetic code ribavirin and lethal mutagenesis of poliovirus: molecular mechanisms, resistance and biological implications quasispecies diversity determines pathogenesis through cooperative interactions in a viral population evolutionary insights into the ecology of coronaviruses dna virus contribution to host evolution viruses and the evolution of life how viruses shape the tree of life virus-host symbiosis mediated by persistence a hypothesis for dna viruses as the origin of eukaryotic replication proteins on viruses, sex and motherhood acute and persistent viral life strategies and their relationship to emerging diseases major histocompatibility complex class i genes in primates: co-evolution with pathogens high prevalence of influenza a virus in ducks caught during spring migration through sweden first demonstration of a lack of viral sequence evolution in a nonprogressor, defining replication-incompetent hiv-1 infection phylogenetic analysis, genome evolution and the rate of gene gain in the herpesviridae the evolution of major histocompatibility class i genes in primates h5n1 influenza-continuing evolution and spread elimination of mouse hepatitis virus from a breeding colony by temporary
cessation of breeding matrix gene of infl uenza a viruses isolated from wild aquatic birds: ecology and emergence of infl uenza a viruses persistent mhv (mouse hepatitis virus) infection reduces the incidence of diabetes mellitus in non-obese diabetic mice long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of infl uenza a virus bacteriophage p1 in retrospect and in prospect human endogenous retroviral elements belonging to the herv-s family from human tissues, cancer cells and primates: expression, structure, phylogeny and evolution a reevaluation of the higher taxonomy of viruses based on rna polymerases chlorella virus-encoded deoxyuridine triphosphatases exhibit different temperature optima persistence of extraordinarily low levels of genetically homogeneous human immunodefi ciency virus type 1 in exposed seronegative individuals key: cord-284015-vvtv492b authors: nikaido, masato; kondo, shinji; zhang, zicong; wu, jiaqi; nishihara, hidenori; niimura, yoshihito; suzuki, shunta; touhara, kazushige; suzuki, yutaka; noguchi, hideki; minakuchi, yohei; toyoda, atsushi; fujiyama, asao; sugano, sumio; yoneda, misako; kai, chieko title: comparative genomic analyses illuminate the distinct evolution of megabats within chiroptera date: 2020-09-23 journal: dna res doi: 10.1093/dnares/dsaa021 sha: doc_id: 284015 cord_uid: vvtv492b the revision of the sub-order microchiroptera is one of the most intriguing outcomes in recent mammalian molecular phylogeny. the unexpected sister–taxon relationship between rhinolophoid microbats and megabats, with the exclusion of other microbats, suggests that megabats arose in a relatively short period of time from a microbat-like ancestor. in order to understand the genetic mechanism underlying adaptive evolution in megabats, we determined the whole-genome sequences of two rousette megabats, leschenault’s rousette (rousettus leschenaultia) and the egyptian fruit bat (r. aegyptiacus). 
the sequences were compared with those of 22 other mammals, including nine bats, available in the database. we identified that megabat genomes are distinct in that they have extremely low activity of sine retrotranspositions, expansion of two chemosensory gene families, including the trace amine receptor (taar) and olfactory receptor (or), and elevation of the dn/ds ratio in genes for immunity and protein catabolism. the adaptive signatures discovered in the genomes of megabats may provide crucial insight into their distinct evolution, including key processes such as virus resistance, loss of echolocation, and frugivorous feeding. bats belong to the order chiroptera and have the ability of powered flight. accounting for one-fifth of all mammals in terms of the number of species, bats are one of the most successful groups of mammals. 1 it is of primary interest for biologists to identify the processes and mechanisms of dynamic adaptation in bats. traditionally, morphological and paleontological analyses placed the order chiroptera within the superorder archonta (primates, dermoptera, chiroptera, and scandentia). 2 however, dna sequencing data has challenged the validity of the archonta, and alternatively proposed the inclusion of bats into laurasiatheria (cetartiodactyla, perissodactyla, carnivora, pholidota, chiroptera and eulipotyphla). [3] [4] [5] [6] although laurasiatheria is now considered to be a natural assemblage, the phylogenetic position of bats within laurasiatheria remains to be resolved. 7, 8 the paraphyly of microbats is also under debate. traditionally, morphological studies proposed the sub-division of the order chiroptera into two suborders: microchiroptera (microbats) and megachiroptera (megabats or old-world fruit bats). 9 microbats use ultrasonic echolocation for flight and for foraging in the night, whereas megabats do not echolocate, and primarily use vision to fly and feed on fruits and/or nectars. 
megabats are also neuroanatomically distinct from microbats, as megabats have a developed visual system. 10 molecular data suggests that five lineages of microbats, including rhinopomatidae, rhinolophidae, hipposideridae, craseonycteridae, and megadermatidae, are more closely related to megabats than to other microbats. therefore, the five lineages of rhinolophoid microbats and megabats were re-classified as 'yinpterochiroptera' and the remaining microbats as 'yangochiroptera'. 5, 11, 12 thus, recent molecular studies suggest that several adaptive characteristics specific to megabats have emerged within a short period of time from a microbat-like ancestor. genome-wide analyses have been used to identify the unique evolution of bats in several studies. seim et al.'s 13 study determined the genome sequence of one microbat (brandt's bat) and found the signatures for adaptive evolution in genes related to physiology and longevity. zhang et al. 14 determined the genome sequences of one microbat (david's myotis) and one megabat (black flying fox) and found that genes for flight and immunity evolved due to positive selection. parker et al. 15 identified the genomes of three microbats, including the greater horseshoe bat, the greater false vampire bat, and parnell's mustached bat, and one megabat, the straw-coloured fruit bat. in comparing the genomes of these bats with those of other mammals, this study identified that genes related to hearing/deafness showed convergent evolution among echolocating mammals. pavlovich et al. 16 recently determined the whole genome of the egyptian fruit bat (r. aegyptiacus), which is a natural reservoir for the marburg virus, and revealed that the genes for immunity were expanded and diversified, suggesting an antiviral mechanism that is used to control viral infection. 
in particular, as bats are natural hosts for zoonotic viruses, including henipaviruses, filoviruses, and coronaviruses, which are emerging viruses with high fatality rates, comparative genomic studies in bats may contribute to an effective solution against the current global pandemic of coronavirus disease-2019. 17 in this study, we determined the genome sequences of two rousette megabats, leschenault's rousette (rousettus leschenaultii) and the egyptian fruit bat (r. aegyptiacus). we assessed the genomic signatures of natural selection that facilitated the dynamic and adaptive evolution of megabats. the main aim of determining the whole-genome sequence of the egyptian fruit bat, in addition to the previous study, 16 was to obtain higher-quality genome data, which facilitates more accurate and comprehensive gene annotation, especially for multi-gene families. in addition, the genome sequence of leschenault's rousette, which belongs to the same genus as the egyptian fruit bat, is of interest for identifying genomic differences between closely related bat species. these genome sequences were compared with those of 22 mammals, including six microbats and three megabats, available in the database. we used genome-wide phylogenetic analyses, followed by candidate gene analyses focussed on retroposons and chemosensory multi-gene families for taste, olfaction, and pheromone detection. in addition, we performed global positive selection analyses. as a result, the inter-relationships among laurasiatheria were consistently reconstructed, with the order eulipotyphla diverging first, followed by the divergence of chiroptera and the remaining groups, including cetartiodactyla, perissodactyla, pholidota, and carnivora. the reciprocal monophyly of yinpterochiroptera and yangochiroptera was also shown with reliable statistical support. 
we revealed several notable distinct features in megabat genomes, including extremely low activity of sine retrotranspositions and the expansion of the genes for the trace amine receptor (taar) and olfactory receptor (or). additionally, the signatures for positive or relaxed selection were observed in genes for immunity and protein catabolism. thus, our comparative genomic analyses may illuminate the genetic mechanisms underlying the dynamic adaptation of megabats during diversification in the order chiroptera. egyptian fruit bats (r. aegyptiacus) and leschenault's rousettes (r. leschenaultii), both of which were provided by ueno zoo, were maintained under controlled conditions using an air conditioner and moisture chamber. the animals were kept in steel cages and fed fruit and water at the same time every day. all experiments were performed in accordance with the animal experimentation guidelines of the university of tokyo and were approved by the institutional animal care and use committee of the university of tokyo. for egyptian fruit bats, we prepared kidney-derived primary cultured cells. a pregnant egyptian fruit bat was deeply anesthetized with isoflurane, the uteri were surgically removed, and the animal was euthanized by bleeding. the kidney from the fetus was fragmented using scissors and treated with tryple (gibco). the fragmented kidney was then cultured in dmem containing 5% fetal calf serum to obtain primary cultured cells. genomic dna was extracted from the frozen spleen tissue or cultured kidney cells of two individuals of egyptian fruit bat, and from frozen kidney tissue of one individual of leschenault's rousette, using a blood & cell culture dna kit (qiagen, hilden, germany) according to the manufacturer's protocol with minor laboratory customizations; details are available upon request. after quality and quantity checks, the dna samples (>20 kb) were sequenced as described below. 
to construct paired-end sequencing libraries, the genomic dna was fragmented using a covaris s2 focussed-ultrasonicator (covaris, woburn, ma, usa). the paired-end libraries were constructed using the truseq dna pcr-free library prep kit (illumina, san diego, ca, usa). mate pair libraries were prepared from genomic dna using the nextera mate pair sample preparation kit (illumina, san diego, ca, usa). all libraries were sequenced on an illumina-hiseq 2500 system using rapid-mode chemistry with paired-end sequencing. prior to assembly, data pre-processing was performed. first, the adapter sequences were trimmed using the fastq-clipper ea-utils v1.1.2, 18 setting the parameters to '-p 10 -m 1 -l 0'. second, we filtered the reads mapped to the mitochondrial genome using bwa-aln v0.6.2 19 with default parameters. finally, we performed base error correction using soapec v2.01 20 with the parameters '-k27 -l 150'. we then assembled the reads using platanus v1.2.1 21 with default parameters. contamination candidates were removed by mapping to escherichia coli and phix genomes using blastn v2.2.9, 22 setting the parameters to '-e 1e-30'. the statistics of the genome assemblies and the information of sequence libraries are summarized in supplementary tables s1-1 and s1-2. in order to test the quality of the reference assembly in the egyptian fruit bat, we additionally constructed a fosmid library, which was end-sequenced using abi 3730xl sequencers. the protein-coding genes in the genomes of egyptian fruit bat and leschenault's rousette were identified based on the alignment with annotated gene sequences of 14 mammals (cat, dog, horse, cow, hedgehog, human, macaque, mouse, rat, black flying fox, little brown bat, brandt's bat, david's myotis, and large flying fox; supplementary table s2 ) that are available in the database. the sequences for each gene of the 14 mammals were aligned to the two bat genomes by using blat 23 to identify approximate gene loci. 
the blat alignments of the gene sequences to the genomes were refined by the exonerate software to estimate the exon-intron boundaries. 24 in addition to the homology-based identification, rna-seq-based transcript reconstruction and ab initio gene prediction were performed to identify the protein-coding genes. rna of primary culture cells from the kidney of the egyptian fruit bat was extracted by using trizol reagent (thermo fisher). a total of 122,017,734 paired-end reads of mrna (illumina-hiseq, 101 bp) were aligned to the genomes using tophat. 25 in total, 97,696,475 and 88,528,280 paired-end reads could be mapped to the genome sequences of r. aegyptiacus and r. leschenaultii, respectively. transcript structures were reconstructed using augustus 26 based on the tophat alignment of the illumina reads to the bat genomes. the expression levels of the reconstructed genes were computed using cufflinks 27 based on the tophat alignment of the illumina reads to the genomes. a total of 8,079 genes were expressed with fragments per kilobase of transcript per million of reads mapped (fpkm) ≥ 1 in the kidney-derived primary cultured cells. examples are shown in supplementary fig. s1. ab initio genes were obtained by using genscan 28 and snap. 29 the genomic sequences were cut into seven-megabase-long fragments, and genscan was run on each fragment. the genes identified were assigned to gene loci based on the overlap of exons on the same strand, and the redundancies of the transcripts were removed. only transcripts annotated with the start codon (atg) and introns flanked by canonical splice dinucleotide pairs (gt-ag, gc-ag, and at-ac) were retained. a total of 46,249 and 47,073 transcripts were annotated over 20,005 and 20,913 gene loci, respectively, on the genomes of the egyptian fruit bat and leschenault's rousette. the completeness of the gene determination was evaluated by using busco. 30 
similarly, we assessed protein-coding genes on the genomes of four other bat species, parnell's mustached bat, the greater false vampire bat, the greater horseshoe bat, and the straw-coloured fruit bat. 15 due to the fragmented nature of these genome assemblies (n50: 15-27 kb), however, we did not apply the start codon and splice site criteria used in the annotation of the genomes of the egyptian fruit bat and leschenault's rousette. we identified the longest orf in each transcript mapped by exonerate by using transdecoder (https://github.com/transdecoder/transdecoder/blob/master/transdecoder.longorfs) and used it as the gene annotation. we identified 28,367-31,441 transcripts in 19,296-20,272 gene loci on these genomes (supplementary table s3). the ratio of complete genes among the annotated genes, as evaluated by busco, was 53.8-76.0%. in addition to this annotation, the tandemly duplicated receptor genes, including ors, taste receptors (t1rs and t2rs), vomeronasal receptors (v1rs and v2rs), formyl peptide receptors (fprs), and taars, were annotated separately. olfactory receptors were identified by the method described previously. 31, 32 the other receptor genes were identified using another protocol. 33 in short, we obtained protein sequences of mammalian t1rs, t2rs, fprs, v1rs, v2rs, and taars from the ncbi refseq database (https://www.ncbi.nlm.nih.gov/refseq/). redundant sequences, defined as those sharing more than 80% identity as identified by cd-hit, 34 were removed to establish representative query sequences. for t1rs, we used only the transmembrane regions as query sequences. we used the ncbi conserved domains database to annotate the 7-transmembrane domains of t1rs. using the query sequences, we performed a tblastn search against the whole-genome sequence assemblies available in genbank (https://www.ncbi.nlm.nih.gov/genbank/). the taxonomic classification and the accession numbers of the whole-genome sequences are summarized in supplementary table s2. 
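The redundancy-removal step for the receptor query sequences (discarding sequences that share more than 80% identity) can be illustrated with a small greedy sketch. This is not cd-hit itself: the `identity` function below is a deliberately crude, hypothetical stand-in (ungapped, left-aligned comparison over the shorter sequence), and the function names and example sequences are assumptions made only for illustration. It follows the general greedy idea of processing sequences from longest to shortest and keeping a sequence only if it is not too similar to any representative already kept.

```python
def identity(a: str, b: str) -> float:
    """Crude identity: fraction of matching residues over the shorter
    sequence, comparing position by position without gaps (illustrative only)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def representatives(seqs: dict, threshold: float = 0.80) -> list:
    """Greedy redundancy removal: visit sequences from longest to shortest and
    keep one only if its identity to every kept representative is <= threshold."""
    kept = []
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(identity(seqs[name], seqs[r]) <= threshold for r in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical toy sequences: "b" is a truncated copy of "a", "c" is unrelated.
    toy = {"a": "MKTLLVAGAA", "b": "MKTLLVAGA", "c": "QQQQQQQQ"}
    print(representatives(toy))
```

In practice a real aligner-based identity would replace the toy `identity` function; the greedy loop is the part the sketch is meant to show.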
the exon-intron structure of each sequence, which was obtained by tblastn, was predicted with the exonerate program 24 using translated query sequences as protein models. the resulting hit sequences were classified into 'intact', 'truncated', and 'pseudo-genes'. due to an assembly issue, the 'truncated' genes included poly 'n' sequences. in order to estimate the gene copy numbers, in these analyses, we treated the 'truncated' genes as 'putatively intact'. the pseudo-genes include inactivating mutations in the coding region. the resulting genes were assessed to determine whether they encode the chemosensory receptors of interest using blastx searches and ghostz 35 against the uniref50 database (https://www.uniprot.org/help/uniref). we used the framework for annotating translatable exons (fate), which is available in github (https://github.com/hikoyu/fate), for the automation of the procedures described above. we constructed a phylogenetic tree based on the single-copy orthologous gene sets of mammals, as previously reported by wu et al., 36 to elucidate the phylogenetic relationships of megabats with other mammals. briefly, the nucleotide sequences of the 6,365 protein-coding genes of the two megabat species and 22 other mammalian species (supplementary table s2) were aligned at the codon level using the prank software v.170427. 37 sites that are shared by <70% of the species were removed from the alignment. among the 6,365 genes, 2,093 genes were available for all species and were used for the analyses. a gene tree was constructed for each of the 2,093 genes using raxml v8.1.12 38 under the gtr+gamma+i model with 1,000 bootstrap replicates. we collected the best tree for each of the 2,093 genes and used these trees to infer the coalescent species tree with branch lengths using astral-iii. 39 the node support of the species tree was obtained by 1,000 replicates of bootstrapping. branch lengths shown in the tree are in coalescent units. 40 
we used the genome of leschenault's rousette for the identification of tes, based on two approaches: de novo characterization of tes and identification of homologous copies of known tes in another megabat, the large flying fox (supplementary table s2). in the first approach, repeatmodeler ver. 1.0.8 (http://www.repeatmasker.org/repeatmodeler.html) was used to obtain a collection of repetitive sequences. for each of the preliminary consensus sequences, we conducted a local nucleotide blast search (r = 2, g = 5, e = 2, with an e-value cutoff of 10^-10) and collected 80-100 copies along with their 10-kb flanking sequences. the copy sequences were aligned using mafft ver. 7. 41 the alignment was manually modified using mega 5.0 42 and a consensus sequence was re-constructed. the consensus sequence was used for the next round of the blast search, as described above, to obtain additional copies. this procedure was repeated until a full-length consensus sequence was completed. the full-length tes were characterized and classified based on the sequence structure, including terminal inverted repeats and long terminal repeats (ltrs), coding proteins such as transposase and reverse transcriptase, and by comparison with known elements using repeatmasker ver. 4.0.6 (http://repeatmasker.org), censor, 43 and rtclass1. 44 for the second approach, a te library of another megabat, the large flying fox, including 65 te families obtained from repbase, 45 was used as a query for a homology search against leschenault's rousette. the local nucleotide blast search, alignment of the copy sequences, and reconstruction of the consensus sequence were conducted as described above. a similar blast search was also conducted using a library of 102 te families of a microbat (the little brown bat, supplementary table s2); however, no novel tes were found beyond those obtained from the two approaches listed above. 
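The consensus re-construction step in the iterative te characterization (align the collected copies, then derive a consensus) can be sketched as a simple majority-rule call per alignment column. This is only an illustrative sketch under stated assumptions: the authors edited the alignments manually in mega, and the function name `consensus`, the gap symbol, and the rule of dropping gap-majority columns are choices made here, not details from the text.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.
    Columns whose most frequent symbol is the gap character are dropped.
    Ties are resolved by first occurrence in the column (arbitrary choice)."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical aligned te copies (for example, rows of a mafft alignment).
    copies = ["ACGT-A", "ACGTTA", "ACCT-A"]
    print(consensus(copies))
```

Repeating search, alignment, and consensus calling with the new consensus as the query mirrors the iterative extension described above.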
all of the newly characterized 118 te (sub)families were designated in conformity with the repbase classification. the repeat contents of the two rousettus genomes were estimated using repeatmasker with the sensitive option (-s) of cross-match search using the rousettus repeat library which we developed here. the te contents, such as the number of copies and length, were summarized based on their divergence (%) from the consensus sequence at the family/subfamily levels by using in-house perl scripts. the te contents of other species were summarized based on the repeatmasker output (http://www.repeatmasker.org/genomicdatasets/rmgenomicdatasets.html). orthologous genes under relaxed selection on megabat lineages were identified from the aligned 6,365 single-copy genes. on every alignment, we used the codeml branch model in paml 4.8 46 to detect the elevation of the dn/ds ratio (the non-synonymous substitution rate to the synonymous substitution rate) on stem and crown megabat branches. the species tree shown in fig. 1 was used as a guide tree in the analysis. likelihood ratio tests and inspections of the p-value were used to compare likelihoods between two models: (i) that assumed the megabat lineages as foreground branches; and (ii) that assumed the dn/ds ratio was not altered in all branches (null hypothesis), to evaluate the significance of the elevation of the dn/ds ratio for megabat branches. we performed further analyses for the genes of interest using the codeml branch-site models for analysing the positive selection on each site. in the branch-site test, we tested stem and crown megabats as the foreground branches and used microbats and outgroup species, including human, macaque, mouse, rat, cat, dog, chinese pangolin, sunda pangolin, bottlenose dolphin, cow, horse, hedgehog, asian musk shrew, and common shrew, as background branches. 
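The likelihood ratio test used to compare the two codeml models can be sketched as follows. The sketch assumes that the test statistic, twice the difference in log-likelihood, is compared with a chi-square distribution with one degree of freedom (one extra dn/ds parameter for the foreground branches); the text does not state the degrees of freedom, so this is an assumption, and the function name and example log-likelihoods are hypothetical. For one degree of freedom the chi-square tail probability has the closed form erfc(sqrt(x/2)), so no external library is needed.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom.
    The statistic is 2 * (lnL_alt - lnL_null); for df = 1 the chi-square
    survival function equals erfc(sqrt(statistic / 2))."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the alternative model does not fit better than the null
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods as would be read from two codeml runs.
    print(lrt_pvalue(lnl_null=-10234.6, lnl_alt=-10229.1))
```

When many genes are tested, as with the 6,365 alignments here, a multiple-testing correction would normally be applied to the resulting p-values; the text does not specify one.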
for the branch-site test, we used two models for the analysis, including one model of a null hypothesis that assumes that the gene was under two types of selective pressures (purifying selection and neutral selection), and one model that used an alternative hypothesis to assume the gene was under three categories of selective pressures, including positive selection on the megabat branches. the likelihood ratio test comparing the likelihoods of these two models was used to evaluate the significance of the alternative model. to assess the functionality of positively selected sites, protein structures deposited in the protein data bank (pdb) were used. the protein structures were depicted using the open-source version of pymol. 47 we constructed draft genomes of the egyptian fruit bat and leschenault's rousette by assembling short read data into contigs and scaffolding them using platanus v1.2.1. 21 the genome of the egyptian fruit bat is composed of 1.90 gbp with 4,974 scaffolds (n50 = 37.2 mbp) and the genome of leschenault's rousette is composed of 1.90 gbp with 8,141 scaffolds (n50 = 32.7 mbp) (supplementary tables s1-1 and s1-2). the high quality of the two genomes is demonstrated by the ratios of complete genes, which are 98.1 and 97.9%, respectively, as evaluated by busco 30 (supplementary table s3). the quality of both genomes in terms of the continuity of the scaffolds and the rate of n is high enough to facilitate genome-wide evolutionary analyses and characterization of multi-gene families. in addition, the independent genome assemblies and gene annotations of the two individuals of egyptian fruit bat determined in the previous study 16 and this study may be utilized as an initial step towards the identification of the genotypic, transcriptomic, and phenotypic variation of this species in future research. 
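The n50 values quoted for the scaffolds follow the usual definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions and are not taken from the assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths:
    the length L such that scaffolds of length >= L together contain
    at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy example: total length 54, and 10 + 9 + 8 = 27 reaches half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```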
figure 1 shows the maximum likelihood phylogenetic tree with the time scale for 24 mammals, including 11 bats (five megabats and six microbats) based on 2,093 single-copy orthologous gene sets. four species of euarchontoglires, including humans, macaca, mouse, and rat, were used as outgroups. the tree successfully highlights the evolutionary history of laurasiatherian mammals in that eulipotyphla diverged first among them. in this phylogenetic tree, chiroptera diverged after eulipotyphla; however, the bootstrap probability (bp) supporting this node was not so high (63.5). in addition, the grouping of pegasoferae (chiroptera, perissodactyla, and carnivora), which was originally proposed by the insertion of retroposons 7 and supported by several genome-wide analyses, 13, 14 was not supported. given that the bps for the inter-relationships of cetartiodactyla, perissodactyla (carnivora + pholidota), and chiroptera were relatively low (63.7, 63.5) and the branch lengths were markedly short, it is highly likely that the initial divergence of laurasiatherian mammals occurred rapidly during evolution. such rapid speciation events may hamper reconstruction of the consistent tree topology for these groups. 8, 48 importantly, as it was shown in the previous studies, 5, 49, 50 the reciprocal monophyly of yangochiroptera and yinpterochiroptera was successfully supported in this analysis, suggesting that the megabats are nested in microbat lineages. although it is difficult to estimate the ancestral state in the megabat ancestors due to the rarity of the fossil record, the phylogenetic tree suggests that several distinct characteristics in megabats, including the well-developed visual system, frugivorous diet, and the absence of echolocation, evolved in a short period of time during evolution from a 'microbat-like' ancestor. we next focussed on assessing the signatures for such adaptive evolution in these groups based on the genome-wide comparative analyses. 
in both the leschenault's rousette and egyptian fruit bat genomes, tes account for ~35% of the genome, including sines (3.9%), lines (21%), ltr retrotransposons (6.2%), and dna transposons (4.0%) (supplementary table s4 and fig. s2). it is notable that the proportions of tes in megabats, including the two rousettus species and the large flying fox, are considerably lower than those in other mammals, such as humans, where nearly half of the genome is covered by tes (fig. 2a and supplementary fig. s2). consistent with previous observations, the proportion of tes is generally correlated with genome size in mammals 51, 52 (supplementary fig. s2). co-variation between the accumulation of tes and dna loss by large segmental deletions is considered a major factor determining genome size. 50 therefore, the smaller genome sizes of the megabats may be due, at least in part, to a lower activity of tes. indeed, our analysis revealed that the number of young (recently retrotransposed) te copies in the megabat genomes is very small (fig. 2b and supplementary fig. s3). in general, young tes constitute a few percent of all tes in mammalian genomes, consistent with previous studies; 53 for example, in the microbat myotis lucifugus, the number of te copies showing <5% divergence from the consensus sequence is 180,000 (6.5% of all te copies; supplementary fig. s3). however, the copy number of young tes is only 2,900 (0.15%) and 7,300 (0.38%) for the rousettus species and the large flying fox, respectively (fig. 2b and supplementary fig. s3). the small proportion of young tes is partly accounted for by the low frequency of retrotransposition events in megabat-specific sines (fig. 2). in general, different types of sine families are distributed in each mammalian clade, such as an order, sub-order, or family. 52 in megabats, the only known active sines are the 5s rrna-derived meg sines. 54 
it should be noted that rousettus genomes contain no more than 23,000 copies of the meg-related sines, which cover 0.21% of the genome. in contrast, clade-specific sines are, in general, highly active in retrotransposition, with 10^5-10^6 copies present in each mammalian genome (fig. 2a). the large flying fox (pteropus vampyrus) also has only 22,000 copies of meg-related sines. based on the wide distribution of meg sines in megabats, including rousettus, macroglossus, eonycteris, and cynopterus, 54 the origin of meg can be traced back to the common ancestor of megabats, which existed at least 24 million years ago. 49 it is possible that the low retrotranspositional activity of the sines found in rousettus and pteropus is widespread among megabats. it has been demonstrated that flying vertebrates, including bats, have substantially lost tes and have smaller genome sizes in association with cellular metabolic constraints. 55, 56 the small proportion of meg sines in the megabats may also be a result of the constraint related to their powered flight. another notable te family is line-1 (l1), as it has been reported that the retrotranspositional activity of l1 has been lost in megabats. 57 it is unlikely that the extinction of l1 resulted from the quiescence of l1 itself, because a synthesized sequence of the reconstructed megabat l1 is capable of retrotransposition in human hela cells. 58 furthermore, we found that, in addition to l1, all types of tes have the lowest activity in megabats among the mammals investigated (fig. 2b). this low activity of young tes may be due to an unknown megabat-specific mechanism for te repression or may be a result of extensive dna loss during the past tens of millions of years. one of the possible mechanisms by which te activity may be tightly repressed is an antiviral immune system in megabats. 
several antiviral-related genes are known to have expanded in the egyptian fruit bat, suggesting that this species may possess a novel mode of antiviral defense. 16 for example, ribonuclease l, an interferon-inducible endoribonuclease that cleaves viral rnas, 59 evolved under relaxed selective constraint in bats. 16 ribonuclease l is also known to restrict retrotransposition of human l1 and mouse iap elements in human cells. 60 in addition, several other factors that restrict retrotransposition in humans and mice are known to be involved in an antiviral immune system. 61 thus, it is possible that a unique antiviral mechanism against exogenous parasites (i.e. viruses) is secondarily used for the restriction of the endogenous retroelements. as general mobilization of sines in mammals relies on the l1 machinery, the restriction of megabat l1 could limit the meg sine activity. 62 the low activity of tes may partly contribute to the small genome size (supplementary fig. s2), which could also be advantageous with respect to cell size and metabolic constraints in megabats as well as other flying vertebrates. 55, 56 therefore, the unusual characteristics of the tes, likely shared among megabats, provide an important example for studying the molecular mechanisms underlying restriction of retrotransposition. such future studies may shed light on the reason why bats have such compact genomes. it also remains unknown why ves sines in microbats are active even though the genome size of microbats is relatively small among mammals (fig. 2). the difference in the sine activity between megabats and microbats may be affected by a possibly distinct antiviral immune system between the two groups, given that expansion of some antiviral-related genes occurred specifically in megabats. 16 most of the chemosensory receptors are encoded by multi-gene families, allowing animals to detect highly diversified chemicals in the environment. 
the previously published studies have shown that the collections of the chemosensory receptor genes are flexible and highly variable among mammals, including the ors, taste receptors (t1rs and t2rs), vomeronasal receptors (v1rs and v2rs), fprs, and taars. 63 the number of certain chemosensory receptor gene families has been shown to have a strong correlation with the degree of dependence on these ligand chemicals for survival. 32, 64, 65 several studies have revealed that bats lost several chemosensory receptor genes, such as t1r1 for umami, 66 and v1rs for pheromone(s) 67 that may be due to the specific sensory adaptation in the ancestor of these groups. it is possible that megabats re-allocated the diversity in chemosensory receptor genes as a sensory trade-off, given that megabats have experienced the secondary loss of echolocation ability, which is one of the most specialized senses in bats. 68 to examine this possibility, we comprehensively characterized the chemosensory receptor genes and compared their diversity by focussing on whether or not the repertoires in megabats show notable differences from those in microbats. our comparative genomic analyses of chemosensory receptor genes in the genomes of 25 mammals revealed that the copy number of the intact genes and pseudo-genes show a certain variation among bat species. in t1rs, the absence of t1r1, the umami receptor, in all of the bats that we analysed is consistent with the findings of the previous studies. 66 all megabats possess two t1rs (t1r2 and t1r3), whereas microbats are somewhat variable, in that they can possess no (greater false vampire bat), one (little brown bat), or two (brandt's bat, greater horseshoe bat) t1rs (fig. 3a and supplementary table s5 ). it is noteworthy that all megabats possess t1r2, which is the sweet receptor, suggesting the importance of sweet taste for their frugivorous lifestyle. 
the absence of intact t1rs in the greater false vampire bat could be explained by its specific adaptation to a carnivorous diet, which resembles the blood-feeding habit of the vampire bat (desmodus rotundus), which also lost t1rs. 66, 69 as for t2rs, which are bitter taste receptors, the copy numbers are smaller in megabats than in microbats (fig. 3a and supplementary table s5). the smaller number of t2rs in megabats can also be explained by their frugivorous diet, as compared with that of microbats, which are mostly insectivores. indeed, the repertoires of t2rs in primates have a strong correlation with their diet, 70 suggesting the importance of t2rs for feeding adaptation in mammals. we identified little variation between megabats and microbats in fprs, which are expressed in the sensory neurons of the vomeronasal organ and mediate innate avoidance behaviours (fig. 3a and supplementary table s5). 71 megabats and microbats both possess two to eight fprs, suggesting that fpr-mediated chemodetection is not directly linked with the difference in their habitats. however, a previous study that compared orthologous sequences among a broad range of mammals found signatures of positive selection in fprs. 72 therefore, to examine the possible contribution of fprs to the adaptive evolution of megabats, a more detailed investigation is necessary, focussing on the dn/ds values among orthologous fpr sequences of many bat species, which are lacking at present. there was an extensive reduction in v1rs, which are known to be expressed in vno neurons of mammals and detect various pheromones, 73-75 in both megabats and microbats (fig. 3a and supplementary table s5). in particular, only one v1r was found in the genomes of megabats. the reduction of v1rs revealed in this study is consistent with the findings of previously published studies.
67 the inactivation of trpc2s 76, 77 and ancv1rs, 78, 79 which are responsible for vno function, suggested the degeneration of vnos in most bat lineages, including megabats. although most bats do not possess intact v1rs, parnell's mustached bat possesses four intact v1rs (fig. 3a and supplementary table s5), which is consistent with the presence of the vno in this species. 80 in addition, a recent study has suggested that there are a substantial number of v1rs in distantly related groups of phyllostomids and miniopterids, which possess an intact vno, suggesting that they retained v1r-mediated chemical communication. 77, 81 therefore, the ancestor of all extant bats is expected to have possessed an intact vno, as well as a certain number of v1rs, which degenerated independently after the divergence of each family, including megabats (pteropodidae). namely, the loss of echolocation and the degeneration of the vno occurred spontaneously in the ancestor of megabats. v2rs are expressed in the basal region of the vno neurons 74, 82, 83 and detect peptide pheromones in mice. 84, 85 however, intact v2rs have been identified only in a limited number of mammals, such as rodents, 63 mouse lemurs, 86 and opossum. 87 our comprehensive analysis found no putatively intact v2rs in the genomes of any of the bats or of most other mammals. this result suggests that the v2r-mediated pheromone detection system had already been lost in the common ancestor of all extant bat lineages, before the acquisition of echolocation ability. it is noteworthy that the hedgehog and the horse possess seven and one intact v2rs, respectively (fig. 3a and supplementary table s5). this provides the first description of intact v2rs in the genomes of laurasiatherian mammals. more detailed analyses may provide insight into the v2r-mediated pheromone detection system in these species. one of the most intriguing results in the chemosensory receptor genes was obtained from taars.
trace amine receptors have been believed to function as receptors for trace amines, for example, tyramine and octopamine, in the brain. 88 however, a recent study revealed that taars may be expressed primarily or exclusively in the moe, 89 and are responsible for detecting volatile amines, including ethological odors that evoke innate animal behavioural responses. 90 in this study, we revealed that the number of taars increased in the common ancestor of megabats. specifically, whereas five to seven copies of taars were identified in microbats, more than 15 copies were identified in megabats. in particular, leschenault's rousette possesses 38 putatively intact (29 intact and 9 truncated) taars, which is the largest number identified among mammals (fig. 3a; supplementary tables s5 and s6).
figure 3. (a) the number of intact, truncated, and pseudo-genes is indicated in blue, yellow, and red, respectively. we treated the truncated genes as 'putatively intact'. the dotted lines show the variation in the number of intact + 'putatively intact' genes among mammals. it should be noted that the number of taars is obviously higher in megabats than in microbats. (b) phylogenetic tree of intact taars in 24 mammals. only the intact genes were included in the tree. the taars of the egyptian fruit bat and leschenault's rousette are indicated by the square (green) and triangle (blue). it is obvious that the taars of subfamilies seven and eight were expanded in two rousettus bats. zebrafish taar13c in the ncbi database was used as an outgroup. mouse taar1-9 in the ncbi database was used as an indicator for each taar subfamily. accession codes for these database-derived genes are available in supplementary fig. s4.
the phylogenetic analyses of intact taars for the 24 mammals clearly demonstrated that the expansion of the genes in the two rousettus bats, the egyptian fruit bat and leschenault's rousette, occurred in subfamilies seven and eight in a species-specific manner (fig. 3b; supplementary fig. s4 and table s6). eyun et al. 91 also reported a high copy number of taars in one megabat, the large flying fox; however, the repertoire was quite different from that of these two rousettus bats (fig. 3a; supplementary tables s5 and s6). although taars were expanded in subfamilies seven and eight in the two rousettus species, they were expanded only in subfamily seven in the java fruit bat. the number of intact genes, as well as the pseudo-genes, was highly variable among the megabats, suggesting that birth and death of taars were quite active. phylogenetic, as well as copy number, analyses suggest that taars have made a large contribution to some process of adaptive evolution and diversification of megabats. interestingly, pavlovich et al. 16 revealed the expansion of mhc genes in the genome of the egyptian fruit bat, suggesting novel modes of antiviral defense. thus, the mhc genes and taars were both expanded in megabats. santos et al. 92 reported that taars may be a key mediator in mhc-dependent mate choice in the sac-winged bat (saccopteryx bilineata). based on these findings, it is possible that megabats use diversified taars for mate choice, by taking advantage of mhc-related molecules that are also diversified. functional experiments investigating taars and mating in megabats may provide insight into the possible link between taars and mhc genes. ors, which are expressed in the moe, have undergone extensive expansion and contraction that may be associated with environmental adaptations.
in ors, we also revealed a notable increase in gene number in megabats, which is more evident in the two rousettus bats, the egyptian fruit bat and leschenault's rousette (fig. 3a and supplementary table s5). although the copy numbers of putatively intact (intact and truncated) ors span from 249 to 543 in microbats, those of megabats range from 401 to 740. the increase in the number of ors in megabats may be a signature of reallocation in response to the loss of echolocation ability in the megabat ancestor. hayden et al. 65 identified convergent or patterns linked to frugivorous diet in megabats and new world fruit-eating microbats (phyllostomids). given that the increase in the ors is more extensive, these patterns of ors are linked not only to the frugivorous diet, but also to some other roles, such as predator avoidance and social communication. by extensively analysing the copy-number variations of chemosensory receptor genes between megabats and microbats, we revealed obvious differences in taars and ors, both of which are expressed in the moe. it is possible that the contraction of vno-mediated chemo-detection and echolocation in megabats led to the expansion of chemo-detection genes expressed in the moe. in addition, it is noteworthy that the repertoires of taars and ors were obviously differentiated even between the two closely related species belonging to rousettus, suggesting that birth and death of these genes are quite active in this genus (fig. 3a and b; supplementary tables s5 and s6). the results raise the possibility that the two rousettus bats are particularly dependent on olfaction through taars and ors.
in addition to the candidate approach, which focussed on retroposons and chemosensory receptor genes, we also performed global analyses on the protein-coding genes of megabats.
the elevation of dn/ds ratios was examined for the 6,365 single-copy orthologous genes 35 using the branch model of codeml implemented in paml4.8. 46 likelihood ratio tests identified a significant elevation of dn/ds ratios (p < 0.05) in 246 genes (supplementary table s7). as shown by the enrichment analyses of these 246 genes using webgestalt, 93 the elevation of the dn/ds ratios in megabats was remarkable in genes related to the immune system and protein catabolism (table 1 and supplementary table s8). the elevation of the dn/ds ratios in immune system genes has been reported in several comparative genomic analyses on mammals, including the pangolin, 94 microbat, 14 and megabat. 16 notably, microbats and pangolins have recently begun to attract attention as possible host reservoirs of sars-related coronaviruses responsible for the current outbreak of coronavirus disease-2019 (covid-19). 95, 96 pavlovich et al. 16 revealed the episodic evolution of immune response genes in the egyptian rousette, a natural reservoir of marburg virus, by showing an unusual expansion of the nkg2, cd94, mhc, and ifn gene families. we revealed the episodic evolution by showing the elevation of dn/ds ratios in many immune response genes in megabat lineages (table 1). the tolerance for zoonotic viruses without overt pathology in bats is consistent with the episodic evolution in immune response genes. namely, co-evolution of viruses and the immune system in these species may be facilitated by adaptive evolution. further molecular biological and physiological investigations of these candidate genes are of primary importance in elucidating how bats tolerate infections by various zoonotic viruses. interestingly, the elevation of the dn/ds ratio in a protein catabolism gene, the tyrosine aminotransferase gene (tat), was also reported in megabats.
97 to further investigate the evolution of the protein catabolism pathway in megabats, we focussed on another representative gene, 3-hydroxyacyl-coa dehydrogenase (hadh), in which the elevation of the dn/ds ratio was significant in the branch model (table 1; supplementary tables s9 and s10). hadh is involved in the degradation of ile, val, lys, and tyr to convert them into energy via the citric acid (tca) cycle (fig. 4a). the branch-site test for hadh (fig. 4 and supplementary table s10) revealed that seven sites were positively selected with a posterior probability (p) of >95%, including three sites with a p of >99% (fig. 4b). the likelihood for the operation of positive selection was not significant, as only a few sites were detected as positively selected (11%, supplementary table s10). we then mapped the positively selected sites on the human hadh dimer structure (pdb: 1f0y, fig. 4c).
figure 4. positively selected sites in hadh on megabat lineages. (a) in protein metabolism, hadh is involved in the degradation of ile, val, lys, and tyr and transforms them into acetyl-coa or succinyl-coa for the tca cycle (https://www.genome.jp/dbget-bin/www_bget?hsa:3033). (b) sequence alignment of the positively selected sites in hadh of the megabat lineages with microbat and human hadh. the codon alignment of all hadh sequences used in this study is available in supplementary alignment file s3. the sites were identified by the branch-site model on paml. positively selected sites are highlighted in yellow (p >95%) and red (p >99%). (c) positively selected residues on megabat lineages are mapped on the human hadh dimer (pdb: 1f0y). the a chain is presented as a spherical model (yellow and red). the hadh dimer a chain is shown as a cartoon model (white) and the b chain is shown as a surface model (gray). the ligands of hadh, nad and acetoacetyl-coa, are shown as stick models (blue and orange, respectively).
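as a rough illustration of the likelihood ratio tests used with the codeml branch and branch-site models, the sketch below compares the log-likelihoods of a null and an alternative model and converts the test statistic into a p-value. it assumes one degree of freedom (one extra dn/ds parameter in the alternative model), which is a common convention rather than a detail confirmed by the text, and the log-likelihood values in the usage lines are hypothetical, not taken from this study.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p_value) for a 1-degree-of-freedom likelihood ratio test.

    Assumption: the alternative model has exactly one extra free parameter
    (for example, a separate dN/dS ratio for the foreground branch).
    """
    # Test statistic: twice the gain in log-likelihood. Small negative values
    # caused by optimisation noise are clamped to zero.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for a single gene (illustration only).
stat, p = likelihood_ratio_test(lnl_null=-2345.67, lnl_alt=-2342.10)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

in practice the two log-likelihoods would be read from the codeml output of the null and alternative runs for each gene, and a multiple-testing correction may be appropriate when thousands of genes are screened.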
although the positively selected sites were not located on the ligand (nad and caa) binding sites, it was of interest that four sites (r221, e229, a247, and l286) were located on the dimer interface (fig. 4c). the mutations at these four residues, r221y, e229n, a247s, and l286s, change electric charge or polarity, suggesting that dimer formation is likely to be disrupted and enzyme catalysis impaired. shen et al. 97 identified significantly low activity of tat in megabats and proposed that the elevation of the dn/ds ratio in tat may reflect the relaxation of purifying selection in response to their frugivorous diet. megabats may utilize ingested proteins for the synthesis of new proteins, rather than for energy production through catabolism, as their diets, which include fruits and nectar, are rich in carbohydrates but poor in protein. accordingly, it is possible that megabats are less dependent on the protein catabolism pathway. in this study, we provide additional and inclusive evidence suggesting that the evolutionary constraints on genes for protein catabolism were relaxed due to the adaptation to frugivorous diets.
in summary, our comparative genomic analyses revealed several distinct signatures of adaptive evolution in megabats. (i) the activity of tes is considerably lower than in other mammals, which is possibly related to a defense mechanism against viruses. the small size of the genomes, which may be due to the low activity of tes, could be advantageous in association with the cellular metabolic constraints of flying organisms. (ii) taars and ors, which function in the neurons of the moe, show specific expansions, implying an important contribution of olfaction to their adaptation processes. (iii) positive selection in genes for immunity may suggest the coevolution of the immune system and viruses, providing crucial insights into the mechanism of asymptomatic infection of bats, as host reservoirs, by zoonotic viruses.
(iv) positive selection in genes for protein catabolism is consistent with frugivorous feeding, which is one of the adaptive characters specific to megabats.
bats-biology and behavior mammalian phylogeny: shaking the tree complete mitochondrial genome of a neotropical fruit bat, artibeus jamaicensis, and a new hypothesis of the relationships of bats to other eutherian mammals monophyletic origin of the order chiroptera and its phylogenetic position among mammalia, as inferred from the complete sequence of the mitochondrial dna of a japanese megabat, the ryukyu flying fox molecular evidence regarding the origin of echolocation and flight in bats parallel adaptive radiations in two major clades of placental mammals pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions phylogenomic analysis resolves the interordinal relationships and rapid diversification of the laurasiatherian mammals phylogenetic relationships of icaronycteris, archaeonycteris, hassianycteris, and palaeochiropteryx to extant bat lineages, with comments on the evolution of echolocation and foraging strategies in microchiroptera primate-like retinotectal decussation in an echolocating megabat, rousettus aegyptiacus integrated fossil and molecular data reconstruct bat echolocation maximum likelihood analysis of the complete mitochondrial genomes of eutherians and a reevaluation of the phylogeny of bats and insectivores genome analysis reveals insights into physiology and longevity of the brandt's bat myotis brandtii comparative analysis of bat genomes provides insight into the evolution of flight and immunity genome-wide signatures of convergent evolution in echolocating mammals the egyptian rousette genome reveals unexpected features of bat antiviral immunity a pneumonia outbreak associated with a new coronavirus of probable bat origin command-line tools for processing biological sequencing data fast and accurate short read alignment with
burrows-wheeler transform soapdenovo2: an empirically improved memory-efficient short-read de novo assembler efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads basic local alignment search tool blat-the blast-like alignment tool automated generation of heuristics for biological sequence comparison tophat: discovering splice junctions with rna-seq using native and syntenically mapped cdna alignments to improve de novo gene finding transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation prediction of complete gene structures in human genomic dna gene finding in novel genomes busco: assessing genome assembly and annotation completeness with single-copy orthologs identification of olfactory receptor genes from mammalian genome sequences acceleration of olfactory receptor gene loss in primate evolution: possible link to anatomical change in sensory systems and dietary transition evolution of vomeronasal receptor 1 (v1r) genes in the common marmoset (callithrix jacchus) cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences faster sequence homology searches by clustering subsequences rates of molecular evolution suggest natural history of life history traits and a post-k-pg nocturnal bottleneck of placentals phylogeny-aware alignment with prank raxml version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies astral-iii: polynomial time species tree reconstruction from partially resolved gene trees gene tree discordance, phylogenetic inference and the multispecies coalescent mafft multiple sequence alignment software version 7: improvements in performance and usability mega5: molecular evolutionary genetic analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods annotation, submission and screening of repetitive elements in repbase: repbasesubmitter and 
censor simple and fast classification of non-ltr retrotransposons based on phylogeny of their rt domain protein sequences repbase update, a database of repetitive elements in eukaryotic genomes paml 4: a program package for phylogenetic analysis by maximum likelihood the pymol molecular graphics system a genomic approach to examine the complex evolution of laurasiatherian mammals a molecular phylogeny for bats illuminates biogeography and the fossil record characterization of the mitochondrial genome of rousettus leschenaulti dynamics of genome size evolution in birds and mammals transposable elements as genetic accelerators of evolution: contribution to genome size, gene regulatory network rewiring, and morphological innovation pinpointing the vesper bat transposon revolution using the miniopterus natalensis genome 5s rrna-derived and trna-derived sines in fruit bats origin of avian genome size and structure in non-avian dinosaurs palaeogenomics of pterosaurs and the evolution of small genome size in flying vertebrates loss of line-1 activity in the megabats reviving the dead: history and reactivation of an extinct l1 viral encounters with 2 0 ,5 0 -oligoadenylate synthetase and rnase l during the interferon antiviral response rnase l restricts the mobility of engineered retrotransposons in cultured human cells restricting retrotransposons: a review line-mediated retrotransposition of marked alu sequences the evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity dramatic variation of the vomeronasal pheromone receptor gene repertoire among five orders of placental and marsupial mammals a cluster of olfactory receptor genes linked to frugivory in bats genomic and genetic evidence for the loss of umami taste in bats extreme variability among mammalian v1r gene families prenatal development supports a single origin of laryngeal echolocation in bats evolution of the sweet taste receptor gene tas1r2 in bats frequent expansions of the 
bitter taste receptor gene repertoire during evolution of mammals in the euarchontoglires clade formyl peptide receptor-like proteins are a novel family of vomeronasal chemosensors adaptive evolution of formyl peptide receptors in mammals molecular organization of vomeronasal chemoreception from genes to social communication: molecular sensing by the vomeronasal organ evolution of v1r pheromone receptor genes in vertebrates: diversity and commonality widespread losses of vomeronasal signal transduction in bats trpc2 pseudogenization dynamics in bats reveal ancestral vomeronasal signaling, then pervasive loss a single pheromone receptor gene conserved across 400 million years of vertebrate evolution inactivation of ancv1r as a predictive signature for the loss of vomeronasal system in mammals vomeronasal organ in bats and primates: extremes of structural variability and its phylogenetic implications expressed vomeronasal type-1 receptors (v1rs) in bats uncover conserved sequences underlying social chemical signaling a novel family of putative pheromone receptors in mammals with a topographically organized and sexually dimorphic distribution a multigene family encoding a diverse array of putative pheromone receptors in mammals the male mouse pheromone esp1 enhances female sexual receptive behaviour through a specific vomeronasal receptor sexual rejection via a vomeronasal receptor-triggered limbic circuit first evidence for functional vomeronasal 2 receptor genes in primates comparative genomic analysis identifies an evolutionary shift of vomeronasal receptor gene repertoires in the vertebrate transition from water to land a renaissance in trace amines inspired by a novel gpcr family a second class of chemosensory receptors in the olfactory epithelium trace amine-associated receptors: ligands, neural circuits, and behaviors molecular evolution and functional divergence of trace amine-associated receptors mhc-dependent mate choice is linked to a trace-amine-associated 
receptor gene in a mammal webgestalt, a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit pangolin genomes and the evolution of mammalian scales and immunity isolation of sars-cov-2-related coronavirus from malayan pangolins identifying sars-cov-2-related coronaviruses in malayan pangolins adaptive evolution in the glucose transporter 4 gene slc2a4 in old world fruit bats (family: pteropodidae) the authors thank mr. yujiro kawabe for the illustration of the animals in fig. 1 . computations were partially performed on the nig supercomputer at rois national institute of genetics. all nucleotide sequence reads and the genome assembly have been deposited in the ddbj sequence read archive (sra) for egyptian fruit bat (dra001680) and for leschenault's rousette (dra010589). the sequences of fosmid library of egyptian fruit bat were also deposited in the database (ddbj accession nos. ga901612-ga919400). the raw data for the rna-seq analyses of the kidney-derived primary cultured cells of the egyptian fruit bat has been deposited in ddbj sra (dra010068). none declared. the supplementary data are available at dnares online. key: cord-005476-q6o5239w authors: griesenbach, u; geddes, d m; alton, e w f w title: gene therapy for cystic fibrosis: an example for lung gene therapy date: 2004-09-29 journal: gene ther doi: 10.1038/sj.gt.3302368 sha: doc_id: 5476 cord_uid: q6o5239w gene therapy is currently being evaluated for a wide range of acute and chronic lung diseases. the requirement of gene transfer into the individual cell types of the complex lung structure will very much depend on the target disease. over the last decade, the gene therapy community has recognized that there is not even one vector that is good for all applications, but that the gene transfer agent has to be carefully chosen. 
gene therapy is particularly attractive for diseases that currently do not have satisfactory treatment options and is probably easier for monogenic disorders than for complex diseases. cystic fibrosis (cf) fulfills these criteria and is therefore a good candidate for gene therapy-based treatment. this review will focus on cf as an example for lung gene therapy and discuss the progress made in this field over the last couple of years.
gene therapy is currently being evaluated for a wide range of acute and chronic lung diseases including acute respiratory distress syndrome (ards), cancer, asthma, emphysema and cystic fibrosis (cf), not least because of the comparatively easy noninvasive accessibility of the lungs through aerosols. the lung is a complex organ and can be roughly divided into two main regions: the airways, consisting of trachea, bronchi, large and small airways, which transport air to the peripheral lung, and the alveoli, where gas exchange takes place (figure 1). the cell types facing the lumen vary greatly from pseudostratified, columnar ciliated and nonciliated epithelium in the larger airways, to single-layer cuboidal epithelium in the small airways and type i and ii pneumocytes in the alveolar epithelium. 1 the requirement of gene transfer into the individual cell types will very much depend on the target disease. in addition, tumour and perhaps inflammatory cells may also be important targets for gene transfer. over the last decade, the gene therapy community has recognized that there is no single vector that is good for all applications, but that the gene transfer agent (gta) has to be carefully chosen depending on the cell type to be targeted, the number of treatments (one versus repeat administration) required, and the size and nature (secreted versus cellular product) of the gene to be delivered.
gene therapy is particularly attractive for diseases that currently do not have satisfactory treatment options, and is probably easier for monogenic disorders than for complex diseases. cf fulfills these criteria and is therefore a good candidate for gene therapy-based treatment. this review will mainly focus on cf as an example for lung gene therapy. cf is the most common lethal autosomal recessive disease in the caucasian population and affects approximately 70 000 individuals worldwide. although several organs are affected, severe lung disease is the cause of most of the morbidity and mortality in cf individuals. 2 the cf gene, the cystic fibrosis transmembrane conductance regulator (cftr), was cloned in 1989 3 and encodes a chloride channel located in the apical membrane of epithelial cells. mutations in the cftr gene lead to imbalanced ion and water movement across the airway epithelium, resulting in accumulation of sticky mucus, chronic bacterial infection and inflammation. proof-of-principle for cftr gene transfer was quickly established in vitro and in animal models. 4, 5 the first clinical trials in cf patients were carried out in 1993 and to date 29 trial protocols, most of which have been completed, have been published (http//www.wiley.co.uk/genmed/clinica/). the initial hope was that cf gene therapy would progress rapidly, due to the ease of noninvasive access to the lungs, but delivery of the gene to the relevant cells remains a difficult task. here, we will review the considerable progress that has been made in pre-clinical and clinical gene therapy studies for cf over the last couple of years. in non-cf individuals, cftr is not expressed abundantly in the lungs, but high expression is seen in serous cells in the submucosal glands and isolated epithelial cells in the small airways. 6 it is currently unclear which of these cell types is the main target for cf gene therapy.
however, given that cf, at least in the early stages, presents as a small airway disease, airway epithelial cells (aecs) are likely to be an important target. topical delivery of gta to the lung is currently the preferred method for airway gene transfer. however, before the gta can reach the surface of the epithelial cells, a number of extracellular physical and immunological barriers have to be overcome (reviewed in ferrari et al 7 and weiss 8 ). briefly, the airway epithelium in the lung is generally covered by a thin mucus layer (figure 2), whose main role is to trap invading foreign particles. it has been shown that mucus significantly reduces the transfection efficiency of most viral and nonviral gtas. however, transfection efficiency could be increased through pretreatment with mucolytics or the anticholinergic drug glycopyrrolate in vitro and in vivo. 9 in cf patients, particularly at later stages in the disease, the airways are also filled with sticky sputum, consisting of inflammatory cells, cell debris, mucus and dna. to avoid the confounding effect of sputum in vivo, gene transfer should ideally be carried out early in the course of lung disease, before the lungs become filled with secretions. the glycocalyx is also a barrier to gene transfer and pretreatment with neuraminidase, which removes sialic acid residues, enhances adenovirus (ad) transfection of polarized cells in vitro. 10 although not formally shown, it is likely that cilia also sterically hinder access of gta to the apical surface of epithelial cells. in addition to the physical barriers, specific and nonspecific immune defences are important inhibitors of airway gene transfer. pulmonary macrophages have been shown to ingest gtas, and removal of these cells before transfection has increased reporter gene expression by 490% in animal models. 11 however, it is unlikely that removal of macrophages is clinically feasible.
in addition to the cellular immune response, humoral immune responses against gta are an important problem, severely restricting the use of viral vectors for chronic diseases such as cf. despite encouraging results in nasal and pulmonary tissues of pre-clinical models 12, 13 and being well tolerated at low-to-intermediate doses in humans, 14 adenovirus-mediated gene transfer in the absence of epithelial damage has been inefficient in cf patients. 15 this is mainly due to the absence of the coxsackie-adenovirus receptor (car) on the apical surface of the majority of human aecs, 16 and highlights the important differences in receptor distribution between animal models and humans. in an attempt to increase the transfection efficiency of adenoviral vectors in vivo, gregory et al 17 assessed the effects of applying sodium caprate (a tight junction opener) to the luminal surface of aecs in mouse lung, with the rationale that car expression is higher on the basolateral surface of epithelial cells. gene expression in total lung homogenate was increased 25-fold, which further increased to 45-fold when adenovirus was complexed with 2-(diethylamino)ethyl ether (deae) dextran. however, it is unclear if this increase in gene expression was attributable to increased epithelial cell transfection. a controversial issue is whether such tight junction openers can be used clinically, given the heavy bacterial colonization present in the cf lung and the attendant risk of systemic invasion. in addition to problems with low transfection efficiency, the use of adenovirus for a chronic disease like cf is limited due to effective cellular and humoral immune responses against the virus.
harvey et al 18 delivered three doses of ad-cftr to the lung of cf patients 3 months apart and demonstrated that after the third administration vector-specific cftr mrna was no longer detectable. helper-dependent adenoviral vectors, which are depleted of all viral genes, are less immunostimulatory and have improved safety profiles compared to first- and second-generation viruses, which have only a subset of viral genes deleted. recently, it was shown that helper-dependent adenovirus combined with the epithelial cell-specific cytokeratin 18 (k18) promoter leads to reduced inflammation and more prolonged expression in murine airways. 19 the use of adenoviral vectors for cf gene therapy is currently limited by low transfection efficiency and the inability to administer them repeatedly, but it remains to be seen if future virus improvements will resurrect their use.
aav vectors have attracted much interest due to their good safety profile, broad tissue tropism, long duration of expression, and their suggested superior escape from immune system surveillance compared with other viruses. several clinical trials have been carried out in the nose, sinus and single lobes of cf patients, all using the aav2-based vector tgaav-cftr (targeted genetics corp.). this vector contains the complete human cftr cdna and uses aav inverted terminal repeat (itr)-based promoter elements. phase i studies aerosolizing aav2-cftr into cf patients with mild-to-moderate lung disease have been conducted. there were no safety problems and the vector was detected in the proximal airways; however, vector-specific mrna was not found. 20 a phase ii trial was also undertaken in the maxillary sinuses of cf patients. 21 although the good safety profile was confirmed, none of the primary end points, including the time to sinusitis relapse, histopathology and interleukin (il)-8 measurements, changed significantly when compared to the contralateral control sinus.
most recently, results of the first repeat-administration lung trial (three doses of nebulized aav2 1 month apart) were published. the treatment was well tolerated and showed some evidence of improved lung function and reduced il-8 in induced sputum after the first administration. a follow-up trial sufficiently powered to detect pulmonary changes has recently started. 22 the small packaging capacity of aav (<5 kb) precludes the use of this vector for transfer of larger genes. although there is enough space for the cftr cdna, it is not possible to include potent promoter/enhancer elements. thus, all clinical trials carried out with aav2-cftr have relied on the comparatively weak itr regulatory elements, which may in part explain the disappointing efficacy data described above. strategies to overcome the aav packaging problem have therefore been developed, including approaches based on trans-splicing 23 and homologous recombination. 24 the basic principle of these techniques is to split the therapeutic cdna and required regulatory elements, and package them into two viruses, which when transfecting the same cell may recombine and generate a full-length therapeutic gene. one would speculate that both of these strategies would lead to reduced transfection efficiency, when compared with the administration of one intact virus to the lung. however, surprisingly, halbert et al 24 have demonstrated that aav2/6 (itr from aav2 and capsid from aav6) recombination-dependent vectors transduced lung cells in mice almost as efficiently as intact vector, with 10% of aecs being positive. several different isoforms of human aavs have been identified and further screening for new human and nonhuman primate isoforms is underway. 25 it has already been documented that a virus with aav5 or aav6 capsid protein can enter aecs more efficiently than aav2 viruses, but the overall transfection efficiency is still comparatively low.
26 recently, the atomic structure of aav2 has been identified, which should enable rational engineering of vector capsids for specific cell targeting. 27 shi et al 28 have already identified specific regions within the capsid protein that can tolerate the insertion of small exogenous peptides and have made an attempt at incorporating integrin-targeting peptides into this region. it has been postulated that aav may not infect antigen-presenting dendritic cells and thereby avoids activation of the host immune system. if this is true, aavs, in contrast to other viruses, may be suitable for repeat administration. the results of repeat administration have been reported to vary greatly and may depend on the host, delivery route and aav serotype tested. 29, 30 auricchio et al 30, 31 have shown that aav2/5 can be readministered once to the mouse lung 5 months after the first delivery. most recently, fischer et al treated nonhuman primates with serial doses (three administrations) of aerosolized aav2. this study goes some way towards demonstrating that repeat administration of aav2 may be possible, despite increasing titres of neutralizing antibodies. 32 importantly, repeat aerosolization of aav2-cftr into cf patients is safe and well tolerated 22 and phase ii efficacy trials are currently being carried out to determine if repeat administration in humans results in persistent gene expression.
the murine parainfluenza virus type 1 (or sendai virus (sev)), the human respiratory syncytial virus (rsv) and the human parainfluenza virus type 3 (piv3) have all been shown to efficiently transfect aecs via the apical membrane 33, 34 using sialic acid and cholesterol, which are abundantly expressed on the apical surface of aecs. these viruses have a negative-strand rna genome and replicate in the cytoplasm. they do not go through a dna intermediate and do not enter the nucleus.
only sev has been assessed in animal models in vivo and is currently the most efficient virus for airway gene transfer. first-generation recombinant sev carrying cftr cdna can produce functional cftr chloride channels in vitro and after transfection of the nasal epithelium in cf knockout mice. 33 further improvements in the sev vectors have been made by deleting the f-protein from the viral backbone (Δf), which rendered the second-generation viruses transmission-incompetent. inoue et al 35 have further improved the Δf/sev vector by introducing mutations into the matrix (m) and hemagglutinin-neuraminidase (hn) proteins, which reduce the amount of virus-like particles that are produced after transfection, thereby further improving the safety profile. sev-mediated gene expression is transient (lasting for about 7 days) and currently repeated administration is inefficient. several groups, including our own, are assessing a variety of immuno-modulatory strategies to improve the use of sev for chronic lung diseases.
in contrast to retroviruses, lentiviruses transfect nondividing cells and are, therefore, suitable for transfection of terminally differentiated aecs. the virus stably integrates into the genome of transfected cells and expression is therefore likely to last for the lifetime of the cell (approximately 100 days for aecs). however, when pseudotyped with the commonly used vesicular stomatitis virus g-glycoprotein (vsv-g), lentiviruses can only enter aecs via the basolateral membrane, using the inorganic phosphate receptor pit2. importantly, pit2 is also expressed on the apical surface and binds amphotropic virus equally well on both membranes. 36 thus, other as yet unidentified factors contribute to the inefficient transfection of this virus via the apical membrane. vsv-g-pseudotyped hiv-derived lentivirus carrying the cftr gene transiently and partially corrected the chloride defect in cf knockout mouse nose for up to 46 days.
37 however, to achieve efficient transfection in the mouse nose, pretreatment with the tight junction opener lysophosphatidylcholine was necessary. gene expression using β-galactosidase as a reporter gene was detected for up to 92 days, without a loss in transgene expression. these results may suggest hiv integration into stem/progenitor cells in the airways. however, longer follow-up will be required to determine this since the duration of follow-up (92 days) overlaps closely with the projected lifetime of aecs of about 100 days. it has previously been shown that lentivirus pseudotyped with envelope glycoproteins from the filoviruses ebola or marburg transfects aecs via the apical membrane, and that folate receptor alpha (fra) is a cellular receptor for filoviruses. a recent report has shown that fra is abundantly expressed on the apical surface of primary aecs, but interestingly does not appear to be absolutely required for filovirus uptake into the cells. 38 in the presence of anti-fra-blocking antibodies, virus entry was not affected. this indicates that cellular entry of lentivirus pseudotyped with filovirus glycoproteins is likely more complex than via a single receptor. as mentioned above, members of the paramyxovirus family, such as sev and rsv, transfect aecs very efficiently. this is due to rapid interaction of the f and hn envelope glycoproteins with cholesterol and sialic acid residues on the cell surface, respectively. the f and hn proteins are therefore promising candidates for pseudotyping lentiviruses and kobayashi et al 39 have recently described successful incorporation of f and hn envelope proteins into the capsid from simian immunodeficiency virus (siv). this vector was able to transduce polarized epithelial cells from both the apical and basolateral sides and we are currently evaluating this vector for airway transduction in animal models.
importantly, unless lentiviral vectors are able to hit airway stem cells efficiently, they will likely need to be re-administered and therefore will face the same immune-response problems as other viral vectors.
improving the efficiency of nonviral gene transfer to aecs has been a major focus with a variety of strategies being followed. several groups are modifying polyplexes such as polylysine and polyethylenimine (pei) by adding sugars, based on the rationale that aecs express lectins, which selectively bind and internalize glycoconjugates. although glycoconjugates containing lactose have been efficient in cell culture, 40-42 their efficacy in vivo remains to be demonstrated. receptor-mediated gene delivery has been developed for aecs by targeting the serpin-enzyme complex receptor (sec-r). 43 this receptor is responsible for the uptake of serine proteases bound to their cognate inhibitors into cells. the receptor recognizes a conserved five-amino-acid-binding motif, but tolerates large variation in the attached cargo. sec-r-directed complexes are prepared by condensing plasmid dna with a covalent conjugate of a peptide receptor ligand (17 amino acids) and polylysine. ziady et al 43 have recently demonstrated partial correction of the chloride transport defect in the nasal epithelium of cf knockout mice following administration of sec-r ligand complexed to a cftr plasmid. in nondividing cells, the nuclear membrane appears to be an important barrier to gene transfer and one reason why sec-r ligand polylysine complexes transfect airway cells efficiently might be their small size. with a diameter of 18-25 nm, these nanoparticles may be able to enter the nucleus via passive diffusion through the nuclear pore complex, which has a cutoff size of about 25 nm. however, formulation and stability problems have so far prevented phase i clinical trials.
peptides resembling integrin-binding domains have also been linked to plasmid dna and have been shown to transfect the airway epithelium of pigs when delivered at bronchoscopy. 44 it remains to be established if antipeptide immune responses will interfere with using peptide-carrying nonviral formulations for chronic diseases, but the risk of immune responses against the peptide can be minimized by using conserved human peptide sequences. importantly, traditionally used animal models may not be suitable to evaluate efficiency or repeat administration of human peptide formulations, if the chosen sequence is not conserved within the animal model. another nanoparticle formulation, consisting of a single plasmid molecule compacted with polyethyleneglycol (peg)-substituted polylysine (polymer of 30 lysines) has been developed. these dna nanoparticles have a rod-like structure (12-15 nm diameter, 100-150 nm length). a single-dose escalation study to evaluate the safety of nasal administration into cf patients has recently been carried out in 12 subjects. in addition to assessing safety, secondary end points included assessment of electrical correction of the ion transport defects and molecular analysis for the presence of vector-specific dna and mrna. administration of the nanoparticles was considered safe. in most patients, plasmid dna could be detected in at least one nostril. there was no evidence of vector-specific mrna in any patient, which may have been due to insufficient sensitivity of the assay. partial correction of the chloride transport defect was demonstrated in seven out of 12 patients, which persisted for up to 15 days. 45 although these initial results are encouraging, further phase ii trials will ultimately be necessary to determine the efficacy of these particles. in addition to improving nonviral dna condensing agents, several groups are improving the plasmid vectors for nonviral gene transfer.
yew et al 46 have demonstrated that reduction in cpg motifs in the pdna reduces the immunostimulatory capacity of pdna after systemic administration of liposome/pdna complexes. fewer changes in blood parameters of toxicity, reduced levels of inflammatory cytokines and decreased liver damage were observed after depletion of 80% of the cpg motifs. in addition, gene expression was prolonged in immunocompetent mice. similar results were observed after topical administration of liposome/pdna complexes to the lung (rk scheule, personal communication). gill et al 47 have studied the effect of different promoters on persistence of lung gene expression by comparing the frequently used human immediate-early cytomegalovirus (cmv) promoter to the constitutive endogenous polyubiquitin c (ubc) and elongation factor 1α (ef1α) promoters. although both eukaryotic endogenous promoters lead to about 10-fold less transgene expression at day 2, duration of gene expression was significantly improved when 'naked' pdna was administered to the lung (cmv: <1 week, ubc: >16 weeks) and ubc-mediated gene expression reached cmv day 2 levels approximately 4 weeks after transfection. similar results were reported by yew et al 48 using the ubiquitin b (ubb) promoter. promoter silencing is likely to contribute to these results and it has previously been demonstrated that the cmv promoter is silenced by tnfα and ifnγ, which are both upregulated after gene transfer. however, it is currently unknown why the ef1α and ubc promoters are more resistant to gene silencing. despite the comparatively low transfection efficiency, nonviral gtas offer important advantages over viral gtas for chronic disease. we and others are currently assessing a variety of physical delivery methods, including electroporation, magnetism, ultrasound and vibration, in an attempt to increase the transfection efficiency of nonviral formulations.
electroporation has been successfully used to enhance transfection in a variety of organs including muscle. initial results for lung gene transfer are encouraging and demonstrate that the transfection efficiency of naked dna can be enhanced in the presence of electrical fields 49 (and ian pringle, personal communication). clearly, important technical questions and safety considerations have to be resolved.
systemic delivery has long been postulated as a means for lung transfection and intravenous (i.v.) injection of many nonviral gtas leads to lung transfection. it is important to note that for the vast majority of gtas gene transfer is only achieved in alveolar endothelial cells and possibly pneumocytes, because the gta becomes trapped in the alveolar capillaries of the pulmonary circulation, the first capillary bed encountered after i.v. administration; transfected cells are found only rarely in the conducting airways, which are the targets for cf gene therapy. to be able to transfect the conducting airway epithelium, the gta has to pass through the pulmonary circulation, reach the left side of the heart and travel from there to the bronchial circulation, which supplies the airways (figure 3). here, the gta has to escape from the vessels and migrate through a dense layer of extracellular matrix to the basement membrane of the aecs. we have recently demonstrated that naked oligonucleotides are able to follow this route and transfect the cytoplasm of aecs efficiently. 50 koehler et al 51 have shown that the bronchial epithelium and submucosal glands can be transfected using plasmid dna complexed to the cationic liposome dodac:dope, although this was not reproducible with other lipids and appears to be a characteristic property of this particular liposome. a better understanding of liposome structure and charge interaction in the context of serum proteins will help with the rational design of nonviral gta for systemic cf gene therapy.
the addition of ligands for receptor-mediated uptake may also improve the transfection efficiency of aecs after i.v. delivery and proof-of-principle for this concept was published several years ago. ferkol et al 52 have shown that addition of ligands for the polymeric immunoglobulin receptor increases aec transfection following systemic administration. the addition of moieties to increase organ and cell-type specific targeting will be important to minimize systemic gene transfer and toxicity.
figure 3 schematic presentation of the pulmonary and bronchial circulation transporting gene transfer agents to the airway epithelium after intravenous injection (courtesy of steve smith, department of gene therapy, imperial college london).
gene therapy for cystic fibrosis u griesenbach et al
alternative non-cftr cdna nucleotide-based therapies
gene repair of the endogenous cftr gene has two major advantages over traditional gene therapy. if successful, gene repair should ensure gene expression for the lifetime of the cells and appropriate control of gene expression is likely because the endogenous cftr promoter is utilized. our preliminary results indicated that the genomic cftr locus could be modified in primary rat hepatocytes, but not primary aecs, using chimeraplasts (dna/rna hybrid oligonucleotides). 53 hepatocytes have previously been shown to be easily amenable to gene repair strategies, most likely due to efficient uptake of repair molecules into the nucleus. in addition, a similar approach using small-fragment homologous recombination (sfhr) was able to reintroduce the wild-type cftr sequence into the lungs of cf knockout mice. 54 overall, the mechanisms involved in gene repair are not well understood and it is currently uncertain if the required 'repair' proteins are present in terminally differentiated aecs.
in addition, uptake of repair oligonucleotides into the nucleus of aecs ex vivo and in vivo remains inefficient (uta griesenbach, unpublished observation) and is the first hurdle that needs to be overcome, before gene repair can be assessed. downregulation of gene expression through antisense molecules may be of therapeutic benefit in cf patients. lambert et al 55 showed that antisense inhibition of the b-cell antigen receptor-associated protein (bap) 31 increased expression of both wild-type cftr and mutant cftr and partially restored cftr chloride channel function. the exact function of bap31 is unclear, although the authors speculated that the protein may be involved in retaining mutant cftr in the er. several other chaperone proteins, mucins or the epithelial sodium channel (enac), which is hyperactive in cf, may be suitable candidates for antisense strategies. we have recently assessed rna interference-mediated gene silencing in the lungs in vivo, and although proof-of-principle could be demonstrated, efficiency was low, 56 likely due to low transfection efficiency. spliceosome-mediated rna trans-splicing (smart) has recently been introduced as a means to generate wild-type cftr mrna in cf xenograft models. cells were transfected with very high titres of adenovirus that produced the so-called pre-therapeutic wild-type cftr mrna molecules (ptms), which are designed to promote trans-splicing with the endogenous cftr mrna. 57 similar to gene repair, smart ensures cell-type-specific expression of wild-type cftr mrna; however, the technology requires further optimization with respect to efficiency and specificity.
the choice of the correct animal model is a crucial factor in developing gene therapy for cf. currently, the cf knockout mouse is the only cf animal model and although these mice do not develop the characteristic cf lung disease, they have the same ion transport defect as cf patients in their nasal epithelium.
this, combined with the fact that the nasal epithelium can easily be exposed to gtas, makes the cf mouse nose an ideal organ for assessing and optimizing gene transfer. in addition, non-cf primates, 58, 59 pigs 44 and most recently sheep 60 have been used to optimize airway gene transfer and allowed clinically relevant delivery methods such as nebulization to be assessed. more recently, first attempts have been made at generating cf ferrets and sheep based on targeting of the cftr locus in somatic cells coupled with nuclear transfer 61 (and jim mcwhir, roslin institute, personal communication).
the success of pre-clinical and clinical cf gene therapy studies stands and falls with the assays used to evaluate gene transfer. the development of new cftr-specific assays involving epithelial cell-specific detection of cftr mrna and protein, bacterial adherence to aecs, airway surface liquid height measurements and others is currently a major focus of the uk cystic fibrosis gene therapy consortium (www.cfgenetherapy.org.uk). for clinical studies, the most relevant end points are a reduction in decline of lung function over time and of episodes of infection. however, these end points are not suitable for early phase ii trials, because large patient numbers (>500) and long follow-up (>12 months) would be required. 62 it is therefore crucial to identify clinical surrogate end points (such as bacterial burden, inflammatory markers and imaging) that can be assessed in smaller patient cohorts with shorter follow-up. it is unlikely that one-time administration of a short-acting gta will change these clinical surrogate end points; changes will more likely require repeat administration, and it is therefore important to design future gene therapy trials with these surrogate end points in mind. an extensive discussion of assays is outside the scope of this review, but the topic has recently been reviewed. 63
over the last decade, it became apparent that gene transfer to the aecs is difficult.
this is perhaps unsurprising, given the lung has evolved to keep foreign particles out. the major obstacle for most viral gtas is the effective immune surveillance mechanisms in the lung, which prohibit repeat administration. many strategies to overcome this problem have already been explored, but have not yet been successful. in our view, this may be a difficult hurdle to overcome. nonviral gene transfer has traditionally been inefficient, but recently developed nanoparticles and ligand-targeting appear to be overcoming this problem. importantly, physical delivery methods to increase nonviral gene transfer are currently being assessed in the lung. although gene therapy for cf is not yet a clinical reality, the many innovative strategies currently being assessed should lead to efficient and repeatable airway gene transfer within the next few years.
1. the cells of the pulmonary airways
2. the molecular and metabolic basis of inherited disease
3. identification of the cystic fibrosis gene: cloning and characterization of complementary dna
4. non-invasive liposome-mediated gene delivery can correct the ion transport defect in cystic fibrosis mutant mice
5. correction of the cystic fibrosis defect in vitro by retrovirus-mediated gene transfer
6. expression of the cystic fibrosis gene in adult human lung
7. immunological hurdles to lung gene therapy
8. delivery of gene transfer vectors to lung: obstacles and the role of adjunct techniques for airway administration
9. barriers to and new approaches for gene therapy and gene delivery in cystic fibrosis
10. retargeting the coxsackievirus and adenovirus receptor to the apical surface of polarized epithelial cells reveals the glycocalyx as a barrier to adenovirus-mediated gene transfer
11. role of alveolar macrophages in rapid elimination of adenovirus vectors administered to the epithelial surface of the respiratory tract
12. aerosol delivery of a beta-galactosidase adenoviral vector to the lungs of rodents
13. adenovirus-mediated persistent cystic fibrosis transmembrane conductance regulator expression in mouse airway epithelium
14. safety of local delivery of low- and intermediate-dose adenovirus gene transfer vectors to individuals with a spectrum of morbid conditions
15. aerosol and lobar administration of a recombinant adenovirus to individuals with cystic fibrosis. i. methods, safety, and clinical implications
16. basolateral localization of fiber receptors limits adenovirus infection from the apical surface of airway epithelia
17. enhancement of adenovirus-mediated gene transfer to the airways by deae dextran and sodium caprate in vivo
18. airway epithelial cftr mrna expression in cystic fibrosis patients after repetitive administration of a recombinant adenovirus
19. reduced inflammation and improved airway expression using helper-dependent adenoviral vectors with a k18 promoter
20. a phase i study of aerosolized administration of tgaavcf to cystic fibrosis subjects with mild lung disease
21. efficient and persistent gene transfer of aav-cftr in maxillary sinus
22. repeated adeno-associated virus serotype 2 aerosol-mediated cystic fibrosis transmembrane regulator gene transfer to the lungs of patients with cystic fibrosis: a multicenter, double-blind, placebo-controlled trial
23. expanding aav packaging capacity with trans-splicing or overlapping vectors: a quantitative comparison
24. efficient mouse airway transduction following recombination between aav vectors carrying parts of a larger gene
25. serological characterisation of human and non-human primate aavs
26. adeno-associated virus type 5 (aav5) but not aav2 binds to the apical surfaces of airway epithelia and facilitates gene transfer
27. the atomic structure of adeno-associated virus (aav-2), a vector for human gene therapy
28. rgd inclusion in vp3 provides adeno-associated virus type 2 (aav2)-based vectors with a heparan sulfate-independent cell entry mechanism
29. repeated delivery of adeno-associated virus vectors to the rabbit airway
30. transduction by adeno-associated virus vectors in the rabbit airway: efficiency, persistence, and readministration
31. noninvasive gene transfer to the lung for systemic delivery of therapeutic proteins
32. successful transgene expression with serial doses of aerosolized raav2 vectors in rhesus macaques
33. recombinant sendai virus-mediated cftr cdna transfer
34. rsv and piv3 target human ciliated airway epithelial cells: efficient gene transfer vectors for cystic fibrosis lung disease
35. a new sendai virus vector deficient in the matrix gene does not form virus particles and shows extensive cell-to-cell spreading
36. apical barriers to airway epithelial cell gene transfer with amphotropic retroviral vectors
37. recovery of airway cystic fibrosis transmembrane conductance regulator function in mice with cystic fibrosis after single-dose lentivirus-mediated gene transfer
38. lentivirus vectors pseudotyped with filoviral envelope glycoproteins transduce airway epithelia from the apical surface independently of folate receptor alpha
39. pseudotyped lentivirus vectors derived from simian immunodeficiency virus sivagm with envelope glycoproteins from paramyxovirus
40. sugar-mediated uptake of glycosylated polylysines and gene transfer into normal and cystic fibrosis airway epithelial cells
41. uptake of plasmid/glycosylated polymer complexes and gene transfer efficiency in differentiated airway epithelial cells
42. lactosylated poly-l-lysine targets a potential lactose receptor in cystic fibrosis and non-cystic fibrosis airway epithelial cells
43. functional evidence of cftr gene transfer in nasal epithelium of cystic fibrosis mice in vivo following luminal application of dna complexes targeted to the serpin-enzyme complex receptor
44. evaluation of a porcine model for pulmonary gene transfer using a novel synthetic vector
45. single dose escalation study to evaluate safety of nasal administration of cftr001 gene transfer vector to subjects with cystic fibrosis
46. reduced inflammatory response to plasmid dna vectors by elimination and inhibition of immunostimulatory cpg motifs
47. increased persistence of lung gene expression using plasmids containing the ubiquitin c or elongation factor 1alpha promoter
48. high and sustained transgene expression in vivo from plasmid vectors containing a hybrid ubiquitin promoter
49. electroporation-mediated transfer of the na-k-atpase β1 subunit safely increases alveolar fluid clearance in rat lungs
50. towards nucleic acid transfer to the airway epithelium via the systemic route
51. targeting transgene expression for cystic fibrosis gene therapy
52. gene transfer into the airway epithelium of animals by targeting the polymeric immunoglobulin receptor
53. conversion of wild-type cftr to the g551d mutation in primary rat hepatocytes using rna/dna oligonucleotides
54. targeted replacement of normal and mutant cftr sequences in human airway epithelial cells using dna fragments
55. control of cystic fibrosis transmembrane conductance regulator expression by bap31
56. lacz sirna and antisense dna do not decrease β-galactosidase expression in the airways of k18-lacz mice
57. partial correction of endogenous deltaf508 cftr in human cystic fibrosis airway epithelia by spliceosome-mediated rna trans-splicing
58. deposition and expression of aerosolized raav vectors in the lungs of rhesus macaques
59. gene therapy for cystic fibrosis with aerosolized adenovirus-cftr: characterization of the aerosol and scintigraphic determination of lung deposition in baboons
60. transfection efficiency and toxicity following delivery of naked plasmid dna and cationic lipid-dna complexes to ovine lung segments
61. development of a ferret model of cystic fibrosis
62. identifying treatments that halt progression of pulmonary disease in cystic fibrosis
63. preclinical and clinical endpoint assays for cystic fibrosis gene therapy
key: cord-260345-ugd8kkor authors: giles, ian g.
title: a compendium of reviews in biochemistry and molecular biology published in the first half of 1992 date: 1992-12-31 journal: international journal of biochemistry doi: 10.1016/0020-711x(92)90283-7 sha: doc_id: 260345 cord_uid: ugd8kkor abstract 1. a compendium of reviews and mini-reviews in biochemistry and molecular biology published in the first half of 1992 is presented. in all 499 titles are listed from 95 different publications. 2. this compendium presents the references by journal name. keywords have been included with each reference to increase the value of the collection. keyword and author cross-reference indexes are not included but are available in the electronic database from which this version was constructed. should anyone wish to have this information in electronic form it can be distributed on ms-dos formatted floppy disks in either reference manager or medline format. the author should be contacted for details of the number of preformatted floppy disks required.
krasikov n., thompson k. and sekhon g.s. (1992) brief clinical report-monosomy 18q12.1→21.1-a recognizable aneuploidy syndrome-report of a patient and review of the literature. am. j. med. genet. 43, 531-534.
verloes a., mulliez n., gonzales m., laloux f., hermanns-le t., pierard g.e. and koulischer l. (1992) restrictive dermopathy, a lethal form of arthrogryposis multiplex with skin and bone dysplasia-3 new cases and review of the literature. am. j. med. genet. 43, 539-547. aplasia cutis congenita; pyloric atresia, newborn; sibs.
leonard c., huret j.l., imbert m.c., lebouc y., selva j. and boulley a.m. (1992) trisomy-16p in a liveborn offspring due to maternal translocation t(16-21)(q11-p11) and review of the literature. am. j. med. genet. 43, 621-625. spontaneous abortions; banding techniques; duplication 16p; infant; segregation.
xie l.q., markides k.e. and lee m.l. (1992) biomedical applications of analytical supercritical fluid separation techniques.
anal. biochem. 200, 7-19. chromatography-mass spectrometry; amino-acid derivatives; gas-chromatograph stationary phase; charge-exchange; silica-gel; extraction; glycosphingolipids; resolution; interface. hozier j.c. and davis l.m. (1992) review-cytogenetic approaches to genome mapping. anal. biochem. 208, 205-217. chaiken i., rose s. and karlsson r. (1992) quantitative analysis of protein interaction with ligands. (2) analysis of macromolecular interactions using immobilized ligands. anal. biochem. 201, 197-210. neurophysin self-association; affinity-chromatography; subunit-exchange chromatography; biosynthetic precursor; equilibrium-constants; peptide recognition; sense peptides; binding; purification; elution. ichikawa y., look g.c. and wong c.h. (1992) review-enzyme-catalyzed oligosaccharide synthesis. anal. biochem. 202, 215-238. gal-β-1→3(4)glcnac α-2→3 sialyltransferase; acetylneuraminic acid synthetase; immobilized β-galactosidase; gdp-l-fucose; sialic acids; escherichia coli; rat-liver; glycoprotein oligosaccharides; carbohydrate antigens; determines expression. gabriel o. and gersten d.m. (1992) staining for enzymatic activity after gel electrophoresis-review. anal. biochem. 203. sodium dodecyl-sulfate; red blood-cells; phosphoenolpyruvate carboxylase activity; nucleotide-linked dehydrogenases; alkaline-phosphatase isoenzymes; pathogenesis-related proteins; α-l-fucosidase; polyacrylamide-gel; produce phosphate; general-method. wood p.j. (1992) the measurement of parathyroid hormone. ann. clin. biochem. 29, 11-21. cyclic amp; immunoradiometric assay; primary hyperparathyroidism; humoral hypercalcemia; calcium homeostasis; intact parathyrin; clinical utility; circadian-rhythm; chromogranin a; lung-cancer. newman d.j., henneberry h. and price c.p.
(1992) particle enhanced light scattering immunoassay. ann. clin. biochem. 29. c-reactive protein; human chorionic-gonadotropin; latex agglutination-test; cell-labeled antibodies; shell core particles; counting immunoassay; turbidimetric immunoassay; spectroscopic immunoassay; passive hemagglutination; luteinizing-hormone. thompson d., milfordward a. and whicher j.t. (1992) the value of acute phase protein measurements in clinical practice. ann. clin. c-reactive protein; erythrocyte sedimentation-rate; inflammatory bowel-disease; tumor necrosis factor; amyloid a protein; rheumatoid arthritis; plasma-proteins; polymyalgia rheumatica; acute-leukemia; tissue-injury; acute phase. soldi s.j. (1992) drug receptor assays-quo vadis. ann. allen j.f. (1992) protein phosphorylation in regulation of photosynthesis. biochim. biophys. acta 1098, 275-335. light-harvesting complex; chlorophyll a; b protein; cyanobacterium synechococcus 6301; excitation-energy distribution; absorption cross-section; ii reaction center; thylakoid membrane polypeptides; state-1 state-2 transitions; amino-acid-sequence; randomized chloroplast lamellae. anthony c. (1992) the c-type cytochromes of methylotrophic bacteria. biochim. biophys. acta 1099, 1-15. methylobacterium extorquens am1; electron-transport chain; blue copper proteins; oxidation mutant classes; amino-acid sequence; sp strain am1; paracoccus denitrificans; methylophilus methylotrophus; obligate methylotroph; methanol dehydrogenase. hoch f.l. (1992) cardiolipins and biomembrane function. biochim. biophys. acta 1113, 71-133. rat-liver-mitochondria; fatty-acid composition; brown-adipose-tissue; cytochrome-c-oxidase; membrane lipid-composition; adenine-nucleotide translocase; age-related-changes; skeletal-muscle mitochondria; chronic ethanol-consumption; lateral proton conduction. bandekar j. (1992) amide modes and protein conformation. biochim. biophys. acta 1120, 123-143.
transforfn-infrared-spsccpy; laser raman-spectroscopy; secondary-structure-analysis; liver ahohd-dehydmgenase; hydrogadeuteriutn exchange; a transmembrane channel; valyl-glycyl-glycine; iii spectral region; vibratiaral analysis, gramicidin-a. b&him. biophys. acta 1123, 231-238. acetyl-coa carboxylase; coenzyme-a reductase; hormone-sensitive l&se; dependent multipmtein kinase; rat-lives; 3-hydmxy-3-methylglutaryl coenxyme; enxymic activity; phosphorylationdephosphorylatiar; hydmxymethylgfutaryl coenxyme; reversible phosphotylation. w&t k. (1992) origins and fates of fatty acyl-coa esters. biochim. biophys. acfu 1124, 101-111. coenzyme a syntbetase; performance liquid-chromatography; rat-liver micmsomcs; acyltransfemse-cataly~ cleavage; dependent transacylation system; rabbit alveolar macropbages; pemxisomal &oxidation; amcbidonic-acid; brain micmsomes; sbott-chain. low-density~l&protein; high-performance liquid; chrcmatogmphy mass spectmmetry; chicken vitellogam gene; thin-layer chmmatography; apolipoprotein-vldl-ii; fatty-acid composition; laying turkey hens; egg-yolk; plasma-lipoproteins. erlansonalbertsson c. (1992) pancreatic colipase-structural and physiological aspects. biuchim. biophys. acta 1125. 1-7. gastric-inhibitory polypeptide, messenger-rna levels; diabetic rats; pro-colipase; co-lipase; tymsine residues; porcine colipase; sequence; taurodeoxycholrte; triglyceride. coleman r. and rahman k. (1992) wehle e. (1992) reiter r.j. (1992) the ageing pineal gland and its physiological conxequences. bioessays 14, 169-175. malatonin receptors; admoet@c-receptors; circadian variations; n-acetylscrotonin, plasma melatcoin, serum melatouin; hamsters; gerbil; brain; reduction; downward j. (1992) regulatory mechanisms for ras proteins. bioessap 14, 177-184. gtaase-activating protein; neurofibranatosis type-l gene; nucleotide exchange-reaction; gap-associated proteins; growth-factor; tymsine phosphorylation; ras-p21 gtpase; stimulation; p21; recepors. rusciano d. 
and burger m.m. (1992) adamo m., roberta ct. and lemith d. (1992) how distinct are the insulii and ~sul~-l~e growth factor? -signalling systems. biofwtors 3.151-157. human-skin fibroblests; igf binding-protein; messenger ribonucleic-acid; cooh-teminal truncation; monoclonal-antibody; endothelial-cells; factor receptoc amniotic-fluid; dna-synthesis; rat-heart. hehnreich e.j.m. (1992) how pyridoxal s-phosphate could function in glycogen phosphorylaae catalysis. biofbctors 3, 159-172. aron d.c. (1992) insulin-like growth factor-i and erythropoiesis. biofators 3, 211-216. factor-binding-protein; erythroid colony formation; cultured human-tibroblasts; factor messenger-rna, n-terminal sequence; fetal bovine serum; igf-i; somatomedin c, clinical-applicstians; stimulating factors. silver b.j. (1992) platelet-derived growth factor in human malignancy. biofmtors 3, 217-227. terminal coding region; c-sis; b-chain. bmis w.d. and durst r.w. (1992) bajpai p. and bajpai p.k. (1992) arachidonic acid production by microorganisms. biofechnol. appl. biockm. 15. l-10. bellomo m.j., parlier v., muhlematter d., grob j.p. and beris p. (1992) three new cases of chromosome-3 rearrangement in bandq21 and band-q26 with abnormal thrombopoiesis bring further evidence to the existence of a 3q21q26-syndrome. cancer genet. cytogenet. 59. 138-160. acute nonlymphocytic leukemia; chronic myelogcnous leukemia; chronic myeloid-leukemia; acute megakaryoblastic leukemia; chronic myelocytic-leukemia; acute myeloblastic-leukemia; british cooperative group; myelodysplastic syndromes; blast crisis; tmnslocation t(l -3). pedersen b. (1992) survival of patients with t(l-7)(pl l-p1 l&report of 2 cases and review of the literature. cancer genet. cytogenet. 60, 53-59. acute nonlymphocytic leukemir; trsnslccation i-7; myelodysplastic syndromes; chromosome analysis; mycloid disorders; secondary. nossal g.j.v. (1992) the molecular and cellular basis of affinity maturation in the antibody response. cell 68.1-2. 
mutation. thomas g. (1992) map kinase by any other name smells just as sweet. cell 68, 3-6. protein-kinase; insulin; identification; muscle. teach r.e. (1992) type-2 astrocyte developme ciliary neurotrophic factor; fibroblast growth-factor. retinoic acid receptor; chick limb bud; tyrosine kinase; early xenopus embryos; proto-oncogene int-1; activin-a; w-locus. greenwald i. and rubin g.m. (1992) mellman i. and simons k. (1992) the golgi complex-in vitro veritas. cell 68, 829-840. asparagine-linked oligosaccharides; cell-free system; rough endoplasmic-reticulum; vesicular stomatitis-virus; plasma-membrane proteins; brefeldin a; cis-golgi; successive compartments; intracellular-transport; n-acetylglucosamine. chao m.v. (1992) hall a. (1992) signal transduction through small gtpases-a tale of 2 gaps. cell 69, 389-391. proteins. wen d.z., peles e., cupples r., suggs s.v., bacus s.s., luo y., trail g., hu s., silbiger s.m., benlevy r., koski r.a., lu h.s. and yarden y. (1992) neu differentiation factor-a transmembrane glycoprotein containing an egf domain and an immunoglobulin homology unit. cell 69, 559-572. epidermal growth-factor; human interleukin-2 receptor; human-breast-carcinoma; human mammary-tumors; factor-a; nucleotide sequence; molecular cloning; proto-oncogene; tyrosine phosphorylation; signal transduction. travers a.a. (1992) the reprogramming of transcriptional competence. cell 69, 573-575. position effect variegation; drosophila; protein. helenius a. (1992) unpacking the incoming influenza virus. cell 69, 577-578. amantadine; protein; virions. jan l.y., jan y.n. and hughes h. (1992) tracing the roots of ion channels. cell 69, 715-718. protein. gruss p. and walther c. (1992) pax in development. cell 69, 719-722. genes; conservation; drosophila; notochord; proteins; domain; box. varshavsky a. (1992) the n-end rule. cell 69, 725-735.
shon-hved protein; dependent pmteolytic system; ubiquitin-conjugating enzyme; repair gene ra&, escherichia coli; transfer rna; sacclurroqces cercvisiue; endoplasmic-reticulum; cell-cycle; amino-acid. cell signailing iwashita s. and kobayashi m. (1992) signal transduction system for growth factor receptors asso&ed with tyrosine kinase activity-epidennal growth factor receptor signalling and its regulation. cell signal. 4. 123-132. vogt w. and nagel d. (1992) eriksson l.c. and andersson g.n. (1992) nucleotide-sequence; femo-cofacscr. iritani n. (1992) nutritional and hormonal regulation of lipogenic-enzyme gene expression in rat liver. eiu. j. fatty-acid aynthase; acetyl-coa cuboxylase; post-transcriptiaral regulation; chickembryo hepatocytes; messenger rna levels; malic enzyme; thymid hormone; posttmnscriptional regulation; molecular-clcming. gavel y. and vonheijne g. (1992) the distribution of charged amino acids in mitochondrial inner-membrane proteins suggests different modes of membrane integration for nuclearly and mitochondriaily encoded proteins. eur. j. biochem. 205, 1207-1215. cytochmme-c-oxidrse; adp-awcartier. beef-heart mitochondria; nicotinamide nucleotide tmnshydrogenase; brown fat mitochondtia; diffemnt genes cdna, imn-sulfur proteirx saccharanyces cerevisiae; unce@ieg potdn; escherichia coli. frrmcklyn c., musierforsyth k. and schimmel p. (1992) youn y.k., lalonde c. and demling r. (1992) haas a. and goebel w. (1992) defromentel c.c. and soussi t. (1992) mackay t.f.c., lyman r.f. and jackson ms. (1992) hpitan n.l.v. (1992) sobell j.l., heston l.l. and sommer s.s. (1992) delineation of genetic predisposition to multifactorial disease-a general approach on the threshold of feasibility. genomics 12, l-6. polymense chain-reaction; fragment length polymorphisms; dependent diabetes-mellitus; sickle-cell anemia; factor-m gene; enzymatic amplificatiat; genomic dna; mutations; sequence; diagnosis; predisposition; genetics. troy f.a. 
(1992) polysialylation-from bacteria to brains. glycobiology 2, 5-23. cell-adhesion molecule; escherichia coli k1; rainbow-trout eggs; deaminated neuraminic acid; endo-n-acetylneuraminidase; group b meningococci; long oligosaccharide segment; neuroblastoma cells; polysialic acid; sialic-acid. varki a. (1992) diversity in the sialic acids. glycobiology 2, 25-40. n-acetylneuraminic acid; influenza c virus; de-ortho-acetylation; performance liquid-chromatography; hanganutziu-deicher antigen; liver golgi vesicles; melanoma-associated ganglioside; bombardment mass-spectrometry; human gastrointestinal-tract; deaminated neuraminic acid. stanley p. (1992) glycosylation engineering. glycobiology 2, 99-107. hamster ovary cells; recombinant human erythropoietin; tissue plasminogen-activator; n-linked oligosaccharide; protein glycosylation; biological-activity; lysosomal-enzymes; insect cells; animal-cells; sugar chains. harvey d.j. (1992) the role of mass spectrometry in glycobiology. glycoconjugate j. 9, 1-12. fast-atom-bombardment; 6-o-methylglucose polysaccharide; collisional activation; gas-chromatography; laser desorption; molecular mass; oligosaccharides; ionization; proteins. aquit d.a. and b~~~ii~ii c.f. (1992) heat-shock proteins and immunopathology-an overview. heat-shock t-cell receptor; messenger-rna degradation; gamma-delta; stress proteins; antigen-receptor; lymphocytes-t; mycobacterium tuberculosis; mammalian-cells; cyto-toxicity; dna-binding. fenick d.a. and gemmellhori l. (1992) potential developmental role for self-reactive t cells bearing gamma-delta t cell receptors specific for heat-shock proteins. dendritic epidermal-cells; toxic lymphocytes-t; antigen receptor; intraepithelial lymphocytes; rheumatoid arthritis; athymic mice; mycobacterium tuberculosis; immune-response; thymic ontogeny; a-chain; cell receptor. christmas s.e. (1992) cytokine production by lymphocytes-t bearing the gamma-delta t cell antigen receptor. antigen; antigen receptor; receptor.
tumor-necrosis-factor; bone-marrow transplantation; α-β+ lymphocytes; interferon-gamma; monoclonal-antibodies; peripheral-blood; cytotoxic activity; fetal thymocytes; different forms; human intestine. winfield j., jarjour w. and minota s. (1992) stress protein autoantibodies and the expression of stress proteins on the surface of human gamma-delta cells and other cells of the immune system. heat-shock proteins and gamma-delta t cells 53, 47-60. stress; autoantibodies; immune-system; heat-shock protein; rous-sarcoma virus; juvenile rheumatoid-arthritis; t-cells; synovial-fluid; lymphocytes-t; mycobacterium tuberculosis; borrelia burgdorferi; lupus erythematosus; transforming protein; stress protein. modlin r.l., lewis j., uyemura k. and tigelaar r.e. (1992) lymphocytes-t bearing gamma-delta antigen receptors in skin. dendritic epidermal-cells; heat-shock protein; mycobacterium-tuberculosis; intraepithelial lymphocytes; limited diversity; murine epidermis; concanavalin a; thy-1 antigen; nude-mice; expression. hohlfeld r. and engel a.g. (1992) the role of gamma-delta lymphocytes-t in inflammatory muscle disease. monoclonal-antibody analysis; natural-killer cells; mononuclear-cells; cyto-toxicity; receptor; myopathies; recognition; expression; proteins; complex. aquino d.a. and selmaj k. (1992) heat-shock proteins and gamma-delta t cell responses in the central nervous system. heat-shock proteins and gamma-delta t cells 53, 86-101. experimental autoimmune encephalomyelitis; experimental allergic encephalomyelitis; fibrillary acidic protein; myelin basic-protein; multiple sclerosis; stress proteins; spinal-cord; insitu hybridization; alexander's disease; prazosin treatment. mario t., nagasawa m. and yata j. (1992) gamma-delta t cells in patients with primary immunodeficiency syndrome-their function and a possible role in the pathogenesis. heat-shock proteins and gamma-delta t cells 53, 102-120.
wiskott-aldrich syndrome; heat-shock proteins; receptor-delta; ataxia telangiectasia; transgenic mice; lymphocytes t; bearing cells; recognition; expression; incmase. reardon cl., bom w. and obrien r.l. (1992) murine gammadelta lymphocyte-t recognition of hsp60 a possible source for bacterial immunity or autoimmunity. he&& j.e. and whitelaw p.f. (1992) the role of cellular oncogenes in myogenesis and muscle cell hyperuqhy. int. j. bkxhem. 24, 193203. muscle; all; rous sarcoma vins; fibroblast growth-factor; c-fos expression; embtyonal caminoma-cel~, chicken skeletal-muscle; myc messatger-rna; proto-oncogene; geneexpnssion; dna-binding; src gmc. colacicchi s.. ferrari m. and sotgiu a. (1992) in vivo electron paramagnetic resonance spectroscopy imaging -first experiences, problems, and perspectives. bat. j. b&hem. 24.205-214. &oxide spin labels; loop-gap resonator. free-radicals; soluble nitroxides; metabolism; phannacokinetics; oxygen; cells; specttusn~, sensitivity. seyer r.. richoux j.p. and aumelas a. (1992) probing angiotensin receptors. iru. j. biochem. 24, 369-377. message-address concept; if onnplementaty rna, paraventricular nucleus; biological-activities; subfomical organ; binding-sites; amino-acid; rat-brain; antagonists; analogs. tuck m.t. (1992) the formation of internal 6-methylademne residues in eucaryotic messenger rna. fnt. j. biochem. 24, 379-386. rekharsky m.v., nemykina e.v. and erokhin a.s. (1992) thermochemistry of n-c bonds hydrolysis in amides, peptides, n-acetyl amino acids and high-energy n-c bonds hydrolysis in n-acetyl imidazole and urea. lnl. konopinska d., rosinski g. and sobotka w. (1992) insect peptide hormones, an overview of the present literature. amino-acid-sequence; bombardment mass-spectranetry; pigment-concentrating hormone; locust adipokinetic hormone; periphneta americana l; akh-rpch family; corpora cardiaca; lencophaea mademe; cotpus caniiaann; matuiuca swrta. dinarello c.a. (1992) the biology of interleukin-1. 
interleukins-molecular biology and immunology 51, 1-32. tumor necrosis factor; colony-stimulating factor; smooth-muscle cells; human-immunodeficiency-virus; vascular endothelial-cells; blood mononuclear-cells; recombinant human interleukin-1; human monocyte interleukin-1; hepatic protein-synthesis; autocrine growth-factor. dower s.k., sims j.e., cerretti d.p. and bird t.a. (1992) the interleukin-1 system-receptors, ligands and signals. interleukins-molecular biology and immunology 51, 33-64. tumor necrosis factor; protein kinase c; growth-factor-receptor; nf-kappa-b; factor increase phosphorylation; prostaglandin e production; rheumatoid synovial-cells; high-affinity receptors; gtp-binding protein; human t-cells. ihle j.n. (1992) interleukin-3 and hematopoiesis. interleukins-molecular biology and immunology 51, 65-106. colony-stimulating factor; human granulocyte-macrophage; protein kinase-c; recombinant human interleukin-3; cell growth-factor; murine bone-marrow; express functional receptors; acute lymphocytic-leukemia; gtpase-activating protein; factor-independent growth. ascorbic-acid deficiency; ischemic-heart-disease; low-density-lipoprotein; eastern finnish men; diabetes mellitus; guinea-pigs; blood-pressure; oral-contraceptives; experimental atherosclerosis; plasma-cholesterol. mannella c.a., forte m. and colombini m. (1992) toward the molecular structure of the mitochondrial channel, vdac. j. bioenerg. biomembrane 24, 7-19. outer-membrane channel; neurospora crassa mitochondria; hexokinase-binding protein; voltage-dependent channel; rat-liver mitochondria; synthetic polyanion; electron-microscopy; pore protein; sequence; arrays; molecular-structure. depinto v. and palmieri f. (1992) benz r. and brdiczka d. (1992) the cation-selective substate of the mitochondrial outer membrane pore-single-channel conductance and influence on intermembrane and peripheral kinases. j. bioenerg. biomembrane 24, 33-39.
rat-liver mitochondria; hexokinase-binding protein, contact sites; synthetic polyanicn; creatine-kinase; inhibition; transport; stae; brain. arora k.k., parry d.m. and pedersen p.l. (1992) khorana h.g. (1992) rhodopsin. photoreceptor of the rod cell-an emerging pattern for structure and function. j. bid chetn 267, 1-4. schiff-base counterion; bovine rhodopsin; cysteine residue-l 10; molecular mechanism; visual excitation; outer segmenl; protein; light; transducin; binding; mds; rhodopsin; photoreceptors. pugh b.f. and tjian r. (1992) diverse transcriptional functions of the multisubunit eukaryotic tfbd complex. j. a-a-crystallin; tissue-specific expmssion; chicken 6-l crystallin gene; vettebrate lens ctystallins; non-lenticular tissues; x ray-analysis; y-crystaltin; transgenic mice; b-ctystallin; eye lens; gene-regulation. fibroblast growth-faaer; endothelial-cell mitcgen; vascular-permeability factor, tumor necrosis factor. bovine brain; extracellular-matrix; factor-a; neovascularixaticn in viva; dna-synthesis; acidic fgf. sardesai v.m. (1992) nutritiongi role of ~~~~~~~ fatty acids. f. nurt. ffioch. 3,154-u%. a-linolutic acid; coronary beast-disease; arachidatic-acid; cholesterol levels; lipid-metabolism; young ra% deficiency: plasma; hr& requirements. rcmto g., toth k., gaspar s. and csik g. (1992) transmission; somatic-cell hybrids; @mine-rich region; maternal inheritance; nucleotide-sequarce; gene organization; d-loop; heteroplasmy; melanogaster; evolution; genome. schaich k.m. (1992) metals and lipid oxidation--contemporary issues. lipids 27, 209-218. oxidation; unsaturated fatty-a&s; electnm-spin resonance: hydrogen-petoxide, linoleic-acid; free-radicals; pulse-radiolysis; amino-acids; catalyzed nutoxidation; butyl hydtopetoxido: transition-metals. microbial reviews zimmer r.s. and lowry c.v. (1992) regulation of gene expression by oxygen in saccharomyces cerevisiae. microbial. rev. 56. l-11. 
cytoehrome-c oxidase; upstream activation site; mitochondtial messenger-ma; yeast hap1 activator, tihoscmntl-protein genes; sex-determining mgiont dna-binding maif; nuclear gene; glucose mpmssiont cycl gene wittarts s.c. (1992) two-way chemical signaling in agr~tffi~-plot interactions. ~~~~. rev. 56. 12-31. tumefaeiens 7%plasmi& vir gate-expression; single-stranded-dna, crown-gall tumors; atopine synthase mhancen transgenic tobacco plants; mc&tducing plasmid; rhizogenes strain a4; opine-like compound; virulence genes. ac 24/l%-c dowling j.n., saha a.k. and glow r.h. (1992) microbial. rev. !%, [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] cdi; a neurotoxin, light-chains; spasmodic taticolhs; mouucluual-antibodies; mediciue. botsford j.l. and htttman j.g. (1992) cyclic amp in prokaryotes. microbid. rev. 56, edurichia cdi kl2; pertussis adcnylate-cyclase; camp receptorproteim cat&d&e-activaterprcteinadn; l-m-l& sugar phosphotrausferase system; site-directed mutagenesis; calmoduliu-like activity; gate-ngulauq ptuteiu; calcium-biudiug prctein. matthews k.s. (1992) dna kroping. microbial. rev. 56. 123-136. gslactese opercn; escluriclia coli; lac represson arac protein; operator interactieu; lactose nprerra; uabimnc 0pfaen; gene-zcgulatioru rna-pelymense; binding-sites. nishimura a., akiyama k.. kohara y. and horiuchi k. (1992) glycusyl-phosphatidyliuosituk ascospotogeuuus yeasts; molecular-cloniug; division arrest sutface prcteiu. silver s. and walderhaug m. (1992) gene regulation of plasmid-determined and chromosome-determined inorganic ion transport in bacteria. microbid. rev. 56, 195228. eschcrichia coli k12; arsenical resistance openm; gram-negative bacteria; pnol2g-encoded nidcel resistance; irondicitrate transport; syringae pv tomato; nucleotide-sequence; alcaligeneseutmphus; staphylucoccus 4wew; phosphrte legulon. osawa s.. jukes t.h., watanabe k. and mutt a. 
(1992) recent evidence for evolution of the genetic code. microbiol. complete nuclcotide-sequence; transfer-rna genes; mitochatdnal transfer-rna; neurapporo cmsaa mitochondris directional mutation pressure; transfer rilwnucleic-acid; isoleucine transfer-ma; coli tnnsfer-mas; escherichia cog uycoplosnw capricdtm. vsnrens g.l.m.. dejong w.w. and bloemendal h. (1992) a superfamily in the mammalian eye lens-the &crystallins. mol. biol rep. 16. i-10. x-ny-aualysir, y-ctystallin; ~-cqstalhu; geue family; differential synthesis; messenger-rna, evolutionary dationrhipr; smctural variation; nucleutide-sequeuce. volkenstein m.v. (1991) structure and dynamics of proteins. md. pancmatic trypsin-inhi*, nuclear m&c-resonance; amino-acid-squenc; glcbularptweins; clystd-structure; heliul prutein; enzyme-activity; evelutiat; conformation; principles. barker p.a. and murphy r.a. (1992) the nerve growth factor receptor-a multicomponsnt system that mediams the actions of the neurntrophin family of proteins. md. cd2 biochcm. 110. 1-15. nm growth; fatter recepta, wbaat-germ-rgglutinin; rat spinal-cerdt sympatketic-ganglia manbran% kigkrffinity recepon; human-melanana celts; human ngf receptor pcl2 cells; messenger-rna; sasory neurons; pheochmmocytonu alla. dicker lb. and seetharam s. (1992) what is known ahout the structure and function of the escherichia cdi protein oncogene cooper c.s. (1992) mini review--& met oncogene-from detection by transfection to tmnsmembrane mceptor for hepatocyte @owth factor. 0t8l%?@?f&? 7, 3-7. protdn t$vwk kinare; c@4bkks locus; human dhine; molecular-&xdng; s6ster f-mnnga activa gene; rearrangemens expfessinn. verma d.p.s. (1992) signals in root nodule organogenesis and endocytosis of rhizobium. pbt cell 4.373382. pwibac&roid mcmbmn~ glycine-mu; soybean nod&es; celldivision; m-phase; gene; protein; cunpa~ nodulatioq expmstiul. henderson b.w. and dougherty t.j. (1992) h ow does photodynamic therapy work? photo&m. photobid. 55, 145-157. silver s. 
(1992) plasmid-determined metal resistance mechanisms-range and overview. plasmid 27, 1-3. folate supplements prevent recurrence of neural tube defects anonymous (1992) fda dietary supplement task force folate supplements and neural tube defects gastric acidity, atrophic gastritis, and calcium absorption nutrition mission to iraq for unicef a retrovirus uses a cationic amino acid transporter as a cell surface receptor anonymous (1992) position of double bond of trans isomers of linoleic acid affects fatty acid deaaturation and elongation retinoic acid, bound to its nuclear receptor, enhances the expression of the gene for phosphoenolpyruvate carboxykinase 50.5254. seasaral-variations; parathymid-honncee; mineral centem. anonymous (1992) transplacental nutrient transfer and intrauterine growth retardation 65-70. pmcellagee messenger-rna, chick-embryo choudmcytes; alkaline phosphatare; extmcellular-matrix nutrition and health communication-the message and the media over l/2 a century ceramide-a new lipid 2nd messenger plasma-membraues; spbinganyehaase; brain amines; metabolism; polyamines; methionine; protein. anonymous (1992) sweet foods and calorie consumption at meals vitamin-e supplementation enhances immune response in the elderly nutritional requirements and dietary guidelines-the globe shrinks excerpts from dietary reference values for food energy and nutrients for the united kingdom-introduction to the guide and summary tables 95-101. oral glucose-load; bcdy-cunposition; dietary-fat; thennogenic response; basal metabolism health and nutritional consequences of the 1991 bangladesh cyclone unproven nutritional remedies and cancer buthionine sulfoximine, an experimental tool to induce glutathione deficiency-elucidation of glutathione and ascorbate in their role as antioxidants choline-a conditionally essential nutrient for humans anonymous (1992) gender differences in immune competence during copper deficiency 116-l 18. 
folinic acid; plasma; ccujugase; intestine relevance of physiology of nutrient absorption to formulation of enteral diets vi&u; dissimilatory sulfate reduction; border membrane-vesicles; peptide-chain length; human gut bacteria, free amino-acids; small-intestine dietary fiber, phytoestrogens, and breast cancer post-menopausal women; hotmone-binding globulin; serum estrogen-levels bound estradiok american whites; mammary-tumors first lines of physiology-prospective overview relationship between malnutrition and falls in the elderly nutritional management of patients with short-bowel syndrome high-frt; resection 181-223. calcium-t&are channek samuplasmic-reticulum membranes junctional foot ptoteim fast-twitch muscle; intramembrane charge movemenu extensor digktmt longus; catdiac f%ukinje4iben the varieties of ribonuclease p flavin-iinked peroxide r~~~~o~~-s~f~c acids aud the oxidative stress response flavoprotein disulfide oxidoseductases; alkyl hydmperoxide reductase; stmpmcoc4 nadh pemxidase sulmuueflo quhimgriw ghttathiate-reduetase; thiomdoxm reductasc; escheri& cofi; sequence; petoxide chromatin dynamics and the modulation of genetic activity neutralizing autibodies; envelopa glycoprotein; mouc&mal-antibody complex regulation of simple sugar transport in htsulin-responsive cells new insights into genetic eye disease. trends genet. 8.85-91. dentinant relinirb pigmenfma; retinal degeneration; photonceptor cells; molecular-genetics; tbodopain gene a new bind for myc dna-binding; activates tmnsa@on; protein, dimerimtion; motif; my& ~nsfo~~~ ptefetenees; contains; dimer the bmp proteins in bone fo~ati~ and repair gmwth-factor-& tgf-& xenopus; identification; inductian; family; bovine; gene element; arabi&.& thaliana; lilium henryi; dna-sequence 109-114. dmninant motphoiogical mutan tmns&ption factots; maize; gene; protein hormones, puffs and flies-the molecular control of metamotphosis by ecdysone. trends gener. 8, 132-138. 
imaginal disk motphogenesir, sequential gene activation; larval salivary-gland; drosophila melonogoslcr, alcoholdehydtogenarc; pulytene cbtumosomes the molecular basis of glucose 6-phosphate dehydrogenase deficiency. trends genrt. 8.138-143. plarmadivnr f&ipwwq n~~&-~u~~~ point mutations; red-cells; gene; expmssion; cdna 144-149. skeletal-muscle; neural controh mcuse embryo molecular genetics of sex detetmination in c. efegunr. trench genet. 8, 164-168. pest-amlnyonic development; cycle contro1 proteins; dosage canpensation hemtaphrodim, mutaticaas; trri-1 trends gatct. 8.169-174. position+ffect variegatimx dna methyhuien; gene-expression; cpg island; mplmatia~ 174-180. mjugation tube formation; succharomyee~ cerevisioe; tremslia mesenterica; dna; gene the retinoblastoma protein and cell cycle regulation patterning of the drosophila nervous system-the achaete-scute gene complex. tr& gener. 8. 202-208. aensay m development mating-tupe; yeast; intexunpsicm; fusion; fission yeast; schizosaccharomyces pom6e; cell linuge; dna arank ho gene; cassetks; asymmetly ustilogo viohxu; nst !itlt@ts; t'&unce; pcpi&ition cti~ dynamics; ctisciw; @ttc; reproduction integrins and metastases-an overview. tumor bid 12.309-320. murine melanoma-cells; mconstituted basement-membrane; fibronectin-recepor complex tumor-cells; surface galactosyltransferase; adhesion molecules; breast-cancer. carbohydrate cha& leukucy~ integrins rat brain haxddnase; amino-acid-sequence; intraceuular-localization; terminal sequence; tumor-cells; membrane; glucose; particulate; form; phosphorylation.mcenery m.w. (1992) the mitochondrial benzodiazepine receptor-evidence for association with the voltage-dependent anion channel (vdac). j. bioenerg. biomembrane 24, 63-69.membrane contad sites; rat-liver mitochondria; central nervous-system; binding-sites; heafl-mitochcndria; mdogcnous ligands; calcium-channel; creatine-k&se; cyclosporin-a; localization.thinnes f.p. 
(1992) evidence for extra-mitochondrial localization of the vdac/porin channel in eucaryotic cells. j. bioenerg. biomembrane 24, 71-75. dependent anion channel; outer-membrane; chloride channels; escherichia coli; cystic-fibrosis; conductance; protein; identification; hexokinase; state.beavis a.d. (1992) properties of the inner membrane anion channel in intact mitochondria. rat-liver mitochondria; heart-mitochcndria; plant-mitochondria; k+ flux; dibutylchlonnnethyltin chloride; triphenyltin compounds; conducting channel; transport pathway; light-scattering;binding-sites. moran 0. and sorgato m.c. (1992) high-conductance pathways in mitochondrial membranes. j. bioenerg. biomembrane 24. 91-98. contact sites; outer-membrane; brain mitochondria; eukaryotic cells; lipid bilayers; pot-in pores; ion channel; transport; phosphorylation;reconstitution.kirmally k.w., antonenko y.n. and zorov d.b. (1992) modulation of inner mitochondrial membrane channel activity. j. bioenerg. biomembrane 24, 99-110. anion channel; contact sites; selective channels; brain mitochondria; lot] channci; conductance; protein; ca*'; mitoplasts;catiolls. l. and herzfeld j. (1992) nmr studies of retinal proteins. j. bioenerg. biomembrane 24. 139-146.solid-state 'le. dark-adapted bacteriorhodopsin; schiff-base; lab&d hncteriorhodopsin; membrane-proteins; chromophore; dynamics; conformation; spectroscopy. rothschild k.j. (1992) ftir difference spectroscopy ol bacteriorhodopsin-toward a molecular model. resonance raman-spectroscopy; carboxyl protonation changes; hydrogen-dcutcrium exchange; oriented purple membrane; retinal schiff-base; to-blue transition; vibrational spectroscopy; low-temperature; neutron-diffraction; differcncc spea*scopy.lanyi j.k. (1992) proton transfer and energy coupling in the bacteriorhodopsin photocycle. j. bioenerg. biomembrane 24, 169-179. resonance raman-spectra; solid-state "c. 
purple membrane; halobacterium halobium; chromophore structure; photochemical cycle; schiff-base; retinal chromophore; absorption-spectra; aspartic acid-96. oesterhelt d., tittor j. and bamberg e. (1992) segrest j.p., jones m.k., deloof h., brouillette c.g., venkatachalapathi y.v. and anantharamaiah g.m. (1992) the amphipathic helix in the exchangeable apolipoproteins-a review of secondary structure and function. j. lipid res. 33, 141-166. high-density-lipoprotein; synthetic peptide analogs; lecithin-cholesterol acyltransferase; gradient gel-electrophoresis; lipid-protein interactions; a-i; plasma-lipoproteins; monoclonal-antibodies; electron-microscopy; neutron-scattering. hide w.a., chan l. and li w.h. (1992) dallongeville j., lussier-cacan s. and davignon j. (1992) modulation of plasma triglyceride levels by apoe phenotype-a meta-analysis-review. j. lipid res. 33, 447-454. levels; apolipoprotein-e polymorphism; coronary-artery disease; density-lipoprotein cholesterol; measured genotype information; apoprotein-e polymorphism; amino-acid-sequence; iii hyperlipoproteinemia; e isoforms; quantitative phenotypes; v hyperlipoproteinemia. bjorkhem i. (1992) mechanism of degradation of the steroid side chain in the formation of bile acids-review. j. lipid res. 33, 455-471. rat-liver peroxisomes; hepatic mitochondrial cytochrome-p-450; cerebrotendinous xanthomatosis; cholic-acid; 3-a,7-a,12-a-trihydroxy-5-b-cholestanoic acid; 26-hydroxylase system; chenodeoxycholic acid; 12-a-trihydroxy-5-b-cholestanoic acid; cholesterol 7-a-hydroxylase; vitamin-d3 25-hydroxylase. hofmann a.f. and mysels k.j. (1992) bile acid solubility and precipitation in vitro and in vivo-the role of conjugation, ph, and ca2+ ions. j. lipid res. 33, 617-626. guo z.s. and depamphilis m.l. (1992) a-protein; bacteriophage-~; purification. trumbly r.j. (1992) glucose repression in the yeast saccharomyces cerevisiae. mol. microbiol. 6, 15-21.
carbon catabolite repression; snf1 protein-kinase; invertase synthesis; molecular analysis; maltose fermentation; cytochrome genes; nuclear-protein; gal genes; suc2 gene; mutants. bischoff d.s. and ordal g.w. (1992) okane d.j. and prasher d.c. (1992) evolutionary origins of bacterial bioluminescence. mol. microbiol. 6, 443-449. amino-acid-sequence; yellow fluorescent protein; fischeri strain y-1; photobacterium phosphoreum; vibrio fischeri; nucleotide-sequence; lumazine protein; luminous bacterium; a-subunit; g-subunit. dirita v.j. (1992) obe g., johannes c. and schulte-frohlinde d. (1992) faisst s. and meyer s. (1992) compilation of vertebrate-encoded transcription factors. nucleic acids. nf-kappa-b; major histocompatibility complex; enhancer-binding-protein; nuclear factor-i; long terminal repeat; amp response element; heavy-chain promoter; growth-hormone gene; adenovirus dna-replication; mouse estrogen-receptor. strohl w.r. (1992) compilation and analysis of dna sequences associated with apparent streptomycete promoters. nucleic acids res. 20, 961-974. rna-polymerase; a-amylase gene; aminoglycoside phosphotransferase gene; β-lactamase gene; escherichia coli; nucleotide-sequence; transcriptional analysis; coelicolor a3(2); sigma-factor; acetyltransferase gene. perigaud c., gosselin g. and imbach j.l. (1992) nucleoside analogues as chemotherapeutic agents-a review. nucleos. nucleot. 11, 903-945. welch g.r. (1992) an analogical field construct in cellular biophysics-history and present status. prog. biophys. mol. biol. 57, 71-128. nonstationary electric-fields; sequential metabolic enzymes; free-energy; biological-systems; protein fluctuations; aqueous cytoplasm; dielectric theory; living systems; thermodynamics; mechanisms. martel p. (1992) biophysical aspects of neutron scattering from vibrational modes of proteins. prog. biophys. mol. biol. 57, 129-179.
pancreatic trypsin-inhibitor; low-frequency vibrations; time-of-flight; liquid glass-transition; hinge-bending mode; biological functions; globular protein; temperature-dependence; allosteric transition; supercooled liquids. cinti d.l., cook l., nagi m.n. and suneja s.k. (1992) the fatty acid chain elongation system of mammalian endoplasmic reticulum. prog. lipid res. 31, 1-51. rat-liver microsomes; swine cerebral microsomes; acyl-coenzyme a; very-long-chain; mouse-brain microsomes; enoyl-coa hydratase; peroxisomal bifunctional protein; cultured skin fibroblasts; lipid-lowering agents; ~-oxidation. heth2ea.m. lysolecithin-lysolecithin acyltransferase; rabbit alveolar macrophages; myocardial lysophos+&mse-tran~~; aortic endothelial-cells; cultured neuroblastoma; heart-muscle microsomes; precursor fatty-acids; rat-brain microsomes; guinea-pig heart; fish oil diet. kaya k. (1992) chemistry and biochemistry of taurolipids. prog. lipid res. 31, 87-108. phosphoenolpyruvate carboxykinase gtp; diet-induced hypercholesterolemia; promoter-regulatory region; tissue-specific expression; enhancer-binding-protein; pyruvate-kinase gene; rat-liver; messenger-rna; 6-phosphofructo-2-kinase/fructose 2,6-bisphosphatase; transcriptional regulation. barber j. and andersson b. (1992) lermrd j. (1992) moras d. (1992) structural and functional relationships between aminoacyl-trna synthetases. biochem. sci. 17, 159-164. 3-dimensional structure; atp; mechanisms; resolution; homology; ~ueteim tymsyl; site; aminoacyl-transfer-rna.
key: cord-022177-j0qcjbxg authors: markl, jürgen; sadava, david; hillis, david m.; heller, h. craig; hacker, sally d. title: genome date: 2018-10-12 journal: purves biologie doi: 10.1007/978-3-662-58172-8_17 sha: doc_id: 22177 cord_uid: j0qcjbxg canis lupus familiaris, the domestic dog, was domesticated by humans about 15,000 years ago. although there are many different varieties of wolves, they resemble one another quite closely; the same cannot be said of "man's best friend".
the fédération cynologique internationale (fci), the world's largest umbrella organization of dog breeders, recognizes more than 300 dog breeds. geneticists assume that there are about 100 true breeds and that the rest are varieties. dog breeds not only look quite different, they also differ greatly in body size. an average chihuahua, for example, weighs only 1.5 kg, whereas a scottish deerhound weighs in at 70 kg. no other mammal shows such strong phenotypic variability. in addition, there are hundreds of genetically determined diseases in dogs, and many of them have a counterpart in humans. the dog genome project began in the late 1990s to find out which genes are responsible for this genetic variability and how genes and diseases are related. as might be expected, some scientists founded companies that test dog dna for genetic variants and can thereby confirm the "purity of the breed" to concerned dog owners. in a similar way, the genomes of domestic cats, wild cats and various big cat species have been sequenced. comparisons of these animal genomes help to reconstruct the evolutionary history of the various mammalian lineages and to identify genes responsible for diseases and traits found in the different mammalian species. such studies are of course not limited to mammals: there are genome projects throughout the animal kingdom, as well as for plants, fungi, many other eukaryotes and numerous prokaryotes. what have we learned from sequencing the genomes of animals? answers to this question can be found in "experiment: comparative analysis of the tiger genome" in sect. 17.1 and in "fascination of research" at the end of this chapter.
genome sequencing determines the nucleotide sequence of the entire genome of an organism. for a prokaryote with a single chromosome, the genome sequence is one continuous series of base pairs (bp). for a diploid, sexually reproducing species with several autosomes and a pair of sex chromosomes (sect. 12.4), the term "sequenced genome" normally refers to the sequence of all bases in one haploid set of autosomes plus the two sex chromosomes (22 + 2 in humans, 38 + 2 in dogs). genomes are sequenced as short fragments that are matched to one another by means of their overlaps. functional genomics uses the sequence information to determine the functions of the different parts of the genome. comparative genomics compares the genome sequences of different organisms. technical advances in dna sequencing have led to an explosive increase in genetic information, which scientists can use in several ways. the genomes of different species can be compared to find out how they differ at the dna level; this information can then be used to trace evolutionary relationships. the sequences of individuals within a species can be compared to identify mutations that cause particular phenotypes. sequence information can be used to identify genes for particular traits, such as genes associated with diseases. the dna sequences of protein-coding genes can be located and the amino acid sequences of the corresponding proteins derived from them, if these are not yet known. before 1986, the possibility of sequencing the entire genome of a complex organism was not even considered.
at that time, however, the nobel laureate renato dulbecco and other scientists proposed that the worldwide scientific community be mobilized to tackle the sequencing of the entire human genome. one motivation was the wish to examine possible dna damage in people who had survived the atomic bombings of japan in the second world war and had been exposed to radiation. but in order to detect changes in the human genome, its sequence first had to be known. thus the publicly funded human genome project was launched, an enormous undertaking that was nevertheless completed successfully as early as 2003, considerably sooner than expected. these efforts were supported and complemented by privately funded groups. the project benefited from the development of many new and groundbreaking methods that were first applied to the sequencing of smaller genomes, those of prokaryotes and simple eukaryotes such as the model organisms you have already encountered in earlier chapters of this book. many of these methods are in widespread use today, including entirely new methods developed specifically for genome sequencing. methodological development in this field is continuing all the time. these techniques are complemented by novel methods for studying the phenotypic diversity of the proteins and metabolites in a cell. a basic prerequisite was and remains the dramatic advance in computer hardware and software needed to cope with the huge volumes of data generated. many prokaryotes have a single chromosome, whereas eukaryotes have many. because they differ in size, the chromosomes can easily be separated from one another.
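the idea that short sequenced fragments are matched to one another by their overlaps can be sketched in a few lines of python. this is a toy illustration only, not the procedure used by any genome project: the function names `overlap` and `assemble`, the minimum overlap of three bases and the example fragments are all assumptions made for the sketch, and real assemblers must also deal with sequencing errors, repeats and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments, min_len: int = 3):
    """Greedily merge the pair of fragments with the longest overlap until none is left."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining fragments do not overlap; leave them as separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


if __name__ == "__main__":
    # hypothetical reads taken from the made-up sequence aagctcaggtaccttgca
    reads = ["gtaccttgca", "aagctcagg", "tcaggtacc"]
    print(assemble(reads))
```

the greedy choice (always merge the longest overlap first) keeps the sketch short; it can give wrong results when a genome contains repeated sequences, which is one reason real assembly needs powerful computers.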
the most direct approach would seem to be to start at one end of a chromosome and simply sequence the entire dna molecule nucleotide by nucleotide. the task is simplified somewhat because only one of the two strands has to be sequenced; the other is complementary to it. consider the sequence 5′-…aagctca…-3′; the other strand must then read 3′-…ttcgagt…-5′. however, sequencing a dna molecule millions of base pairs long from one end to the other is not possible even with today's methods, nor is it necessary. with this strategy, at most a few thousand base pairs can be sequenced at a time. to determine a genome sequence, the dna strand of a chromosome, which is several centimeters long, must be broken into many short fragments, and thousands of such fragments are then sequenced simultaneously. in the 1970s, frederick sanger and his colleagues invented a method for sequencing dna using chemically modified nucleotides. these nucleotides had originally been developed to halt cell division in cancer cells. this method (or a variant of it) was used to determine the first genome sequences of humans and of several model organisms. by today's standards, however, the method is relatively slow, expensive and labor-intensive. in the first decade of the new millennium, faster and cheaper methods were developed, which are often grouped under the term high-throughput sequencing. these methods use miniaturized technology originally developed for the electronics industry together with the mechanisms of dna replication, often in combination with the polymerase chain reaction (pcr). pcr can be automated; it is an important method for sequencing small amounts of dna.
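the statement that the second strand follows from the first can be checked with a short python sketch. the names `COMPLEMENT`, `complement` and `reverse_complement` are chosen for this illustration only; the example sequence is the one used in the text.

```python
# base-pairing rules: a pairs with t, g pairs with c
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def complement(strand: str) -> str:
    """Complementary bases in the same left-to-right order (i.e. read 3'->5')."""
    return "".join(COMPLEMENT[base] for base in strand.lower())


def reverse_complement(strand: str) -> str:
    """The complementary strand written in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("aagctca"))          # ttcgagt
    print(reverse_complement("aagctca"))  # tgagctt
```

`complement` reproduces the 3′→5′ strand as written in the text; `reverse_complement` gives the same strand in the 5′→3′ orientation in which sequences are usually stored.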
details on pcr can be found in sect. 13.5. the methods of high-throughput sequencing, also known collectively as next-generation sequencing (ngs), are being improved rapidly. one of the many approaches is outlined here and illustrated in fig. 17.1. first, the dna fragments are prepared for sequencing by attaching them to a solid support and amplifying the dna by pcr (fig. 17.1a; figure label: universal adapter sequence). new genome sequences are being published at an ever faster pace, yielding a huge amount of data. this information is used in two related fields of research, both concerned with the study of genomes. in functional genomics, biologists use sequence information to identify the functions of the different parts of a genome (for example the important regions that encode mrnas or trnas, as well as regulatory sequences, sect. 14.4): open reading frames, that is, the protein-coding regions of genes: in protein-coding genes these regions are recognized by the start and stop codons for translation and by consensus sequences that mark the positions of the introns. an important goal of functional genomics is to determine the function of every open reading frame in every genome. original literature: cho ys et al. (2013) nature commun 4: 1-7. panthera tigris (the tiger) is the largest of all cats and one of the best-known endangered species. only about 4000 individuals remain in the wild. more than a century ago, nine genetically distinct subspecies were known, four of which are already extinct. the five remaining subspecies include the bengal tiger, which is often seen in zoos, and the siberian tiger (amur tiger), which lives in the snowy regions of russia, china and north korea.
the genomes of other cat species, for example lion, snow leopard and domestic cat, had already been sequenced, but that of the tiger had not. sequencing the genomes of big cats makes it possible to show that the phenotypic variability observed among cats can be traced to genetic variants. functional genomics also examines the following features: exon/intron patterns of genes, to learn about the evolution and role of introns and of modular protein domains; amino acid sequences of proteins, which can be deduced from the dna sequences of open reading frames by applying the genetic code (fig. 14.5); regulatory sequences, such as promoters, enhancers and terminators of transcription, which are identified by their proximity to open reading frames and because they contain recognition sequences for the binding of specific transcription factors; rna genes, for instance for rrna, trna, small nuclear rna (snrna) and microrna; and other noncoding sequences that can be assigned to various categories, for example centromeric or telomeric regions, transposons and other repeated sequences. sequence information is also used in comparative genomics, that is, for comparing a newly sequenced genome (or parts of it) with sequences from other organisms. the dog genome project described in the introduction to this chapter ("fascination of research: the dog genome project"), for example, has provided insights not only into dogs but also into how the dog genome is related to the genomes of other animals. genome comparisons yield additional information about the functions of sequences and, in particular, make it possible to infer evolutionary relationships between species. every genome that is sequenced can provide new insights.
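recognizing an open reading frame by its start and stop codons and deriving the amino acid sequence with the genetic code can be sketched as follows. this is a deliberately simplified illustration: it scans only the three forward reading frames of an intron-free sequence, and the names `CODON_TABLE` and `find_orfs`, the `min_codons` parameter and the example sequence are assumptions of the sketch rather than part of any established tool.

```python
from itertools import product

# standard genetic code, codons ordered t, c, a, g at each position; "*" marks a stop codon
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(dna: str, min_codons: int = 2):
    """Return a list of (start, end, protein) for atg...stop stretches in the 3 forward frames."""
    dna = dna.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "atg":
                start = i  # first start codon opens a candidate reading frame
            elif start is not None and CODON_TABLE[codon] == "*":
                protein = "".join(CODON_TABLE[dna[j:j + 3]] for j in range(start, i, 3))
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))  # end index includes the stop codon
                start = None
    return orfs


if __name__ == "__main__":
    # made-up sequence: cc | atg gct aaa tga | ctt
    print(find_orfs("ccatggctaaatgactt"))
```

for eukaryotic genes the consensus sequences at intron boundaries, mentioned in the text, would have to be taken into account before translation; the sketch leaves that step out.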
"experiment: comparative analysis of the tiger genome" presents the recent sequencing of the tiger genome and shows how this genome is related to the genomes of other cat species and of humans. in genome sequencing, chromosomes are broken into fragments, the fragments are sequenced, and the sequences are finally arranged in the correct order to give continuous sequences of complete chromosomes. today's sequencing methods are automated and require powerful computers. these methods use labeled nucleotides that are detected at the ends of the growing nucleic acid chains. comparing the genomes of prokaryotes and eukaryotes leads to an interesting conclusion: certain genes are universal, that is, present in all living organisms. not surprisingly, these include genes whose products are involved in replication, transcription and protein synthesis. there are also some (nearly) universal gene segments that occur in many genes of numerous organisms, for example the sequence encoding an atp-binding site in a protein. these findings suggest that there is an ancient minimal set of dna sequences, a minimal genome, common to all cells. one way to identify these genes is to search for them computationally by comparative analysis of sequenced genomes. another way to determine the minimal genome is to start with an organism that has the simplest possible genome, mutate one gene after another in a targeted manner, and observe each time what happens. mycoplasma genitalium has one of the smallest genomes; it contains only 482 protein-coding genes. even here, some genes are dispensable under certain conditions.
if, for example, the bacterium has genes for metabolizing both glucose and fructose, it can survive in the laboratory on a medium that contains only one of these two sugars. what about the remaining genes? a research group led by craig venter tried to answer this question experimentally using transposons as mutagens. when the transposons are activated in the bacterium, they insert at random into genes and thereby mutate and inactivate the affected gene ("experiment: determining the minimal genome by transposon mutagenesis"). the mutated bacteria are tested for growth and viability. the dna of interesting mutants is sequenced to find out which genes contain transposons. the astonishing result of these studies is that m. genitalium can survive in the laboratory with a minimal genome of 382 functional genes. in yeast the minimal genome consists of only 10 % of the 5000 protein-coding genes, and in the nematode caenorhabditis elegans the proportion is similar. one goal of this research is to develop new life forms for specific applications, for example bacteria that clean up an oil spill. in the next chapter you will learn more about this approach, which is called synthetic genetics. dna sequencing can be used to study the genomes of prokaryotes, many of which are important for humans and for particular ecosystems. functional genomics uses gene sequences to determine the functions of the gene products. comparative genomics compares gene sequences from different organisms to determine their functions and evolutionary relationships. transposable elements, which include composite transposons, move from one place to another in the genome.
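the logic of the transposon experiment (a gene that carries an insertion in a viable mutant is dispensable; a gene that is never hit in any viable mutant is a candidate essential gene) can be written as a small python sketch. the function name `candidate_essential_genes`, the gene names and all coordinates are hypothetical and serve only to illustrate the reasoning; in a real experiment a gene may also remain unhit by chance, so many independent mutants are needed.

```python
def candidate_essential_genes(genes: dict, insertions_in_viable_mutants) -> list:
    """
    genes: gene name -> (start, end) on the chromosome, end exclusive.
    insertions_in_viable_mutants: transposon positions found by sequencing mutants that grew.
    Returns the genes that were never disrupted in a viable mutant.
    """
    disrupted = {
        name
        for name, (start, end) in genes.items()
        if any(start <= pos < end for pos in insertions_in_viable_mutants)
    }
    return sorted(set(genes) - disrupted)


if __name__ == "__main__":
    # hypothetical toy genome with three genes and three sequenced insertion sites
    genes = {"gene_a": (0, 100), "gene_b": (150, 300), "gene_c": (320, 400)}
    insertions = [42, 310, 350]  # 310 falls between genes and disrupts nothing
    print(candidate_essential_genes(genes, insertions))
```

applied to the numbers in the text, 382 of the 482 protein-coding genes of m. genitalium (about 79 %) remain in the minimal genome, so 100 genes turned out to be dispensable under laboratory conditions.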
studies of the minimal genome could make it possible to create artificial species. you should be able to: describe the characteristic features of prokaryotic genomes; design experimental approaches to address questions in metagenomics; summarize the mechanisms by which transposable elements can move within the genome. original literature: hutchison c et al. (1999). mycoplasma genitalium has one of the smallest known genomes among the prokaryotes. but are all of its genes indispensable for life? by inactivating genes one after another, it has been possible to determine which are absolutely necessary for the survival of the cell. in bacteria, an essential minimal genome is sufficient for the survival of the cell. most of what is known about eukaryotic genomes has been learned from model organisms that have been studied extensively: the brewer's yeast saccharomyces cerevisiae, the nematode (roundworm) caenorhabditis elegans, the fruit fly drosophila melanogaster and, representing the flowering plants, the thale cress arabidopsis thaliana. these model organisms were chosen because they are relatively easy to grow and study in the laboratory, their genetics is well known, and they have characteristics that are representative of a larger group of organisms. (the main vertebrate models in use are the zebrafish danio rerio, the clawed frog xenopus laevis and the laboratory mouse mus musculus; the sea urchin psammechinus miliaris is also used for studying fertilization.) brewer's yeast: the basic eukaryotic model. yeasts are single-celled eukaryotes and belong to the fungi. as is typical of eukaryotes, they have membrane-enclosed organelles such as the nucleus and the endoplasmic reticulum, and haploid and diploid generations alternate in their life cycle (fig. 11.14).
it is therefore not surprising that the single-celled yeast has a larger genome with more protein-coding genes than a single-celled bacterium (tab. 17.2). gene inactivation studies (similar to those with m. genitalium, fig. 17.5) have shown that fewer than 10 % of yeast genes are indispensable for survival. the most striking difference between the yeast genome and the genome of e. coli, however, concerns the number of genes responsible for targeting proteins to the organelles (tab. 17.3). both single-celled organisms apparently use the same number of genes to maintain the basic functions of cell survival. most of the additional genes are required for the compartmentalization of the eukaryotic yeast cell into organelles. this finding reflects in the genotype what is obvious in the phenotype: the eukaryotic cell is far more complex than the prokaryotic cell. noncoding spacer dna lies between the members of the gene family. fig. 17.8 (figure label: chromosome 11) the globin gene family. the α-globin and β-globin clusters of the human globin gene family lie on different chromosomes. the genes of each cluster are separated by noncoding spacer dna. the nonfunctional pseudogenes are marked with the greek letter psi (ψ). of the γ genes there are the two variants Aγ and Gγ. eukaryotic genomes contain a large number of dna sequence repeats that do not encode proteins or peptides. they are also not normally located within protein-coding genes, which make up only a small percentage of the total dna. they comprise highly repetitive sequences, moderately repetitive sequences and transposons. fig. 17.9 a moderately repetitive dna sequence encodes rrna. a: this rrna gene, including the nontranscribed spacer region, occurs 280 times in the human genome, with clusters on five chromosomes.
b: this electron micrograph (tem) shows the transcription of multiple rrna genes. moderately repetitive sequences: moderately repetitive sequences are repeated ten to 1000 times in the eukaryotic genome. these sequences include the genes that are transcribed to produce the trnas and rrnas involved in protein synthesis. the cell synthesizes trnas and rrnas continuously, but even at the maximum rate of transcription, single copies of the trna and rrna genes would not be enough to supply the large quantities of these molecules that most cells need. the genome therefore contains several to many copies of these genes. in mammals the ribosome contains four different rrna molecules: the 18s, 5.8s, 28s and 5s rrnas. (the s stands for svedberg unit, or sedimentation coefficient, a measure of size determined in an ultracentrifuge.) the 18s, 5.8s and 28s rrnas are transcribed together as a single precursor rna molecule (fig. 17.9). in several posttranscriptional steps, the precursor is cut into the three final rrna products and the noncoding spacer rna is removed. the dna sequence encoding these rrnas is moderately repetitive in humans: a total of 280 copies of the sequence are located in clusters on five different chromosomes. question on fig. 17.9: are there similarities between the process shown in this photograph and the process of translation with polysomes (fig. 14.15)? apart from the rrna genes, most moderately repetitive sequences are transposons, which, like the transposons of prokaryotes (see above), can move through the genome. transposons make up about 40 % of the human genome and about 50 % of the maize genome, although the proportion is considerably smaller, at 3-10 %, in many other eukaryotes that have been studied. in tab.
17.6, the most important types of transposons in eukaryotes are summarized. the retrotransposons comprise three groups, according to the kind of repetitive sequences they contain: long terminal repeats ... the genomes of eukaryotes contain considerably more genes than those of prokaryotes. some of these "additional" genes encode functions connected with the compartmentalization of eukaryotic cells; others are required for multicellularity. the genome sequences of model organisms have been compared to find features common to eukaryotic genomes, such as the large amount of regulatory sequences, repeated sequences and noncoding dna. some eukaryotic genes form gene families, which may also include genes that are mutated and nonfunctional. since the sequence of the first human genome was completed at the beginning of the first decade of this millennium, the genomes of many more people have been sequenced and published. owing to the rapid development of methods, a human genome can now, as mentioned above, be sequenced for less than 1000 €. the human genome and its genes illustrate the properties of complex eukaryotic genomes. by comparing haplotypes of people who are or are not affected by a particular genetic disease, loci associated with the disease can be identified. pharmacogenomics studies how an individual's genome influences responses to drugs or other external factors. some interesting findings about the human genome are listed below: fig. 17.13 proteomics. a single gene can encode several proteins. as a reminder: sect. 16.5 discussed how alternative splicing can produce different mrnas from one gene.
in this way a single gene gives rise to a family of different proteins with different functions. as is well known, proteins can also be modified in posttranslational reactions, for example by proteolysis, glycosylation and phosphorylation (sect. 14.1, "experiment: testing the signal sequence"). the proteome is analyzed mainly by mass spectrometry. in this technique, electromagnets are used to identify proteins that have been ionized by a laser on the basis of their mass (or the masses of their proteolytically generated fragments). another standard method in proteomics is the electrophoretic separation of proteins in two-dimensional polyacrylamide gels. the ultimate goal of proteomics is at least as ambitious as that of genomics. whereas genomics serves to describe genomes and their expression, proteomics aims to identify and characterize all the proteins expressed at particular points in time. comparisons of the proteomes of humans and other eukaryotic organisms have shown that there is a common set of proteins that can be grouped into families with similar amino acid sequences and functions. considering the organisms as a whole, 46 % of the yeast proteome, 43 % of the nematode proteome and 61 % of the fly proteome match the human proteome. according to functional analyses, this set of 1300 proteins provides the basic metabolic functions of a eukaryotic cell, for example glycolysis, the citric acid cycle, membrane transport, protein synthesis, dna replication ... food for thought: apply what you have learned. given the now known effect of myostatin on muscle development, the question has been raised whether myostatin could be artificially manipulated in humans to treat muscle-wasting diseases such as muscular dystrophy.
you can surely imagine that athletes who would like bigger muscles are also very interested in this gene and its protein product. few scientific projects have aroused as much enthusiasm and hope as genome sequencing. at present, great efforts are being made to sequence the genomes of as many tumors and from as many people as possible in order to search them for mutations. when the brca1 gene was found to be mutated in breast cancer (chap. 15
key: cord-264996-og3sg0qw authors: howell, gareth j.; holloway, zoe g.; cobbold, christian; monaco, anthony p.; ponnambalam, sreenivasan title: cell biology of membrane trafficking in human disease date: 2006-09-17 journal: int rev cytol doi: 10.1016/s0074-7696(06)52005-4 sha: doc_id: 264996 cord_uid: og3sg0qw understanding the molecular and cellular mechanisms underlying membrane traffic pathways is crucial to the treatment and cure of human disease. various human diseases caused by changes in cellular homeostasis arise through single gene mutations that result in compromised membrane trafficking. many pathogenic agents such as viruses, bacteria, or parasites have evolved mechanisms to subvert the host cell response to infection, or have hijacked cellular mechanisms to proliferate and ensure pathogen survival. understanding the consequence of genetic mutations or pathogenic infection on membrane traffic has also enabled greater understanding of the interactions between organisms and the surrounding environment. this review focuses on human genetic defects and molecular mechanisms that underlie eukaryote exocytosis and endocytosis and current and future prospects for alleviation of a variety of human diseases. the human cell is a complex network of membranes and protein enclosed in a membrane lipid bilayer.
the interactions within and associated with such biomembrane bilayers have profound consequences for the organism as a whole; a single defect in just 1 of the potential 30,000-40,000 gene products made by each cell can cause devastating, if not fatal, effects for the whole organism. in addition to this, humans pass genetic information onto their offspring and, with it, any genetic mutations or polymorphisms. it is believed that at least 1 in 10 people have, or will eventually develop, a disease caused by mutation or variation at the gene level. understanding how genetic mutations increase risk for human disease is critical in our understanding and treatment of the majority of human ailments that are caused by interactions between the organism and the environment. this review focuses on the research undertaken in the past 30 years relating to the molecular mechanisms that underlie membrane trafficking within eukaryotic cells. we address mechanisms and factors that control protein progression through the secretory and internalization pathways and highlight key human diseases that illuminate mechanisms of membrane trafficking. in addition, current and future strategies for therapeutic intervention in such genetic disorders are considered. common to all eukaryotic cells is the presence of multiple biomembrane lipid bilayer compartments, or organelles, which are maintained by specific protein-protein and protein-lipid interactions. such interactions are maintained within each compartment in spite of continuous trafficking of membrane-bound and soluble components to different intracellular locations, and for secretion from the cell. in the majority of cases, this transfer of material occurs through vesicular movement: fission, docking, and fusion of membrane bilayer-enclosed intermediates occur between donor and acceptor compartments (palade, 1975).
proteins, including membrane-bound receptors, secreted enzymes, and antibodies, begin their journey by entering the early secretory pathway at the endoplasmic reticulum (er). from here they are transported through the golgi apparatus and finally distributed to their final destination such as other intracellular organelles, the plasma membrane, or the extracellular environment. but how does a specific protein "know" how to reach a specific cellular destination when hundreds of newly synthesized, different molecules require specific transport and targeting? many of these transport intermediates or vesicles, whether derived from the er, other internal organelles, or the plasma membrane, are "coated" with unique protein complexes, tethering factors, and regulatory factors that ensure correct targeting to an acceptor compartment. vesicle coat proteins, such as the clathrin or coat protein (cop) complexes, are relatively well studied. such complexes are assembled onto the cytoplasmic face of donor compartments to facilitate the fission of transport intermediates. allied with these coat proteins are different molecules that mediate recognition of cytoplasmic motifs in cargo proteins either directly (e.g., transmembrane proteins) or indirectly (e.g., soluble secreted enzymes). the snare hypothesis is central to our understanding of vesicular targeting to intracellular compartments (rothman, 1994; sollner et al., 1993). initially uncovered in a screen for intra-golgi transport docking and fusion regulators, the snare (soluble n-ethylmaleimide-sensitive fusion attachment protein receptor) proteins have been found to regulate different membrane interactions in all eukaryotes via a highly conserved mechanism for membrane trafficking based on accessory docking and fusion regulators.
snare proteins are present on both the vesicle (vesicular or v-snare) and the acceptor (target or t-snare) and comprise coiled-coil domains that assemble to facilitate vesicle docking and membrane fusion (bennett, 1995; pelham, 2001). in conjunction with snare proteins, small ras-related rab gtpases are implicated in further ensuring the fidelity of vesicle docking and fusion (olkkonen and stenmark, 1997). these 20- to 25-kda proteins are gtp-hydrolyzing enzymes that act to recruit different proteins or effectors to membranes in a gtp/gdp-regulated manner (collins, 2003). rab gtpase activity and protein conformation are regulated by interaction with soluble and membrane-bound proteins; such regulators can also tether vesicles to acceptor membranes and mediate intracellular signaling.
a. early secretory pathway
the endoplasmic reticulum (er) is the first stage of quality control along the secretory pathway. proteins destined for secretion (e.g., hormones), the plasma membrane (e.g., membrane-bound receptors), or other intracellular membrane compartments such as the lysosome (e.g., lysosomal proteases)
emphysema and liver cirrhosis (perlmutter, 2004) 107400 endophillin ii clathrin-coated pit formation leukemia (dreyling et al., 1996; jones et al., 2001; narita et al., 1999; tebar et al., 1999) alzheimer's disease presenilin 1 presenilin 1-involved in cleavage and trafficking of amyloid precursor protein to plasma membrane neurodegenerative disorder (uemura et al., 2004) tau tau -microtubular stability through formation of aggregates autosomal dominant polycystic kidney disease (adpkd) polycystin-1 or 2 causes a defect in e-cadherin assembly and basolateral trafficking renal cysts in kidney and other tissues leading to endstage renal failure (charron et al., 2000) 173900 autosomal dominant retinitis pigmentosa rhodopsin inhibited interaction of rhodopsin and arf4, leading to inhibited post-golgi delivery to rod outer segment narrowing of visual fields, 
night blindness (deretic et al., 2005) 180380 autosomal dominant ventricular tachycardia cardiac arrhythmia, hyperthermia (yano et al., 2005) 604722 autosomal recessive primary hyperoxaluria mistargeting of peroxisomal proteins to mitochondria kidney disease (danpure, 1998 ) 259900 ab-lipoproteinaemia mtp er retention thus preventing apob secretion vascular disease (sharp et al., 1993) 200100 batten's disease cln1-cln8 group of gene products implicated in regulating the processing and targeting of lysosomal and synaptic proteins neurological disease (pearce, 2000) 204200 breast cancer caveolin-1 deletion or dominant negative mutation of caveolin-1 promotes tumor progression breast cancer (bouras et al., 2004; williams and lisanti, 2005) (hayasaka et al., 1993; matsuyama et al., 2002) chediak-higashi syndrome (chs) chs1/lyst lyst involved in regulation of protein secretion from lysosomes -enlarged lysosomes partial albinism, recurrent bacterial infections, impaired chemotaxis and abnormal natural killer cell function (shiflett et al., 2002; ward et al., 2003) 214500 choroideremia (chm) rab escort protein 1 (rep1) rab27a remains cytosolic due to defective geranylgeranyl modification in chm lymphoblasts x-linked form of retinal degeneration 303100 combined factors v and viii deficiency ergic-53/p58 c-type lectin er retention and defective secretion of factors v and viii blood disease 227300 congenital finnish nephritic syndrome nephrin (nphs1), podocin (nphs2) er retention kidney inflammation (kestila et al., 1998; kramer -zucker et al., 2005) 256300 600995 pancreatic atp-sensitive potassium channel (k-atp) er or golgi retention of k-atp due to mutations in its sulfonylurea-1 (sur1) subunit excess insulin leading to hypoglycaemia (dunne et al., 2004; yan et al., 2004) congenital hypothyroid goiter thyroglobulin er storage disease. 
thyroglobulin is misfolded and accumulates in er constipation, large tongue, swelling around the eyes, failure to suckle, mental retardation (hishinuma et al., 1998; kim and arvan, 1998) (fan et al., 1999; garman and garboczi, 2002) familial hemophagocytic lymphohistiocytosis (fhl) perforin -defective ctl (cytotoxic t lymphocytes) mediated killing immunodeficiency (feldmann et al., 2003; stepp et al., 1999) munc (hogg et al., 1999; mathew et al., 2000)
three forms of menkes disease can arise from different mutations in the atp7a gene: premature stop codons, deletions, or splicing defects. these can prevent atp7a function and/or trafficking. classical menkes disease is the most common and fatality usually results by the age of 3 years. in two other nonfatal forms of menkes, mild and occipital horn syndrome, atp7a maintains the ability to transport copper ions across intracellular membranes, although trafficking to the plasma membrane can be compromised (la fontaine et al., 1999). atp7a is ubiquitously expressed and is the major copper transporter in cells of the intestine, kidney, and brain. in the liver, however, the major copper transporter is the wilson's disease protein, atp7b (bull et al., 1993). this second p-type atpase shares strong similarity with atp7a and also translocates copper ions across membranes. although these gene products share functional similarities, mutations in atp7b result in copper accumulation in the liver and brain. familial hypercholesterolemia is an autosomal dominantly inherited disease caused by mutations in the low-density lipoprotein receptor (ldlr), leading to premature atherosclerosis and coronary heart disease. in healthy individuals, the ldlr is expressed on the surface of cells, where it binds circulating ldl particles and promotes uptake and cellular metabolism of their constituents, which include cholesterol.
in these patients, ldlr alleles display amino acid substitutions (cassanelli et al., 1998; jensen et al., 1997), truncations (lehrman et al., 1987), or missense mutations (leitersdorf et al., 1993), which can result in er retention and degradation. the point mutation at residue 209 of the insulin receptor compromises the ability of the receptor to dimerize correctly within the er, therefore leading to er retention. decreased plasma membrane levels of insulin receptor reduce insulin binding after the stimulus of a meal and lead to subsequent elevations in plasma glucose levels. this then leads to type ii diabetes mellitus (kadowaki et al., 1991). a number of human diseases can induce the er stress response. here, the mutant protein is retained within the er, resulting in either dilation of the organelle, such as in congenital hypothyroidism (medeiros-neto et al., 1996) and hypofibrinogenemia (callea et al., 1992), or chronic er stress as is the case for hereditary emphysema (perlmutter, 2003). in pelizaeus-merzbacher disease, an x-linked leukodystrophy, er accumulation of proteolipid protein (plp) results in oligodendrocyte apoptosis (gow et al., 1998) and the subsequent disruption of white matter formation in the brain observed in humans and mouse models. plp is a central nervous system protein that is the major component of myelin and, when expressed in cultured fibroblasts, is localized to the plasma membrane (gow et al., 1994). the link between plp and the er stress response provides a tool for elucidating the cellular response to misfolded protein accumulation. accumulation of proteins within the er, leading to blockage of protein secretion, is an unwanted cellular property and mechanisms have evolved to overcome such events. this disposal of unwanted proteins is termed er-associated degradation (erad) (fig. 1).
as recently as the early 1990s, it was still believed that aberrant proteins were degraded within the er (fra and sitia, 1993); however, current models suggest that aberrant er-retained proteins actually undergo retrotranslocation and subsequent degradation in the cytoplasm. retrotranslocation has been proposed to occur through the same "pore" used to translocate nascent proteins into the er lumen during translation, namely the sec61 translocon (biederer et al., 1996; römisch, 1999). various yeast and mammalian proteins have been shown to be retrotranslocated from the er and degraded within the cytoplasm in a proteasome-dependent manner, including the budding yeast proteins carboxypeptidase y and a mutant pro-α-factor. when a mammalian protein such as cftr is expressed in budding yeast, it matures relatively slowly within the yeast er, leading to retrotranslocation to the cytoplasm and degradation (ward et al., 1995). a further example is that of α1-antitrypsin deficiency. α1-antitrypsin is responsible for inactivating the enzyme elastase produced by lung neutrophils. in this inherited disease, a mutated form of α1-antitrypsin is retrotranslocated and degraded in proteasomes, leading to retention of active elastase in lung tissues and thus is a cause of lung emphysema (rutishauser and spiess, 2002). however, retrotranslocation and proteasomal degradation may not be functionally coupled processes. pharmacological inhibitors that cause proteasome inactivation lead to egress of molecules such as mhc class i (wiertz et al., 1996a,b), ribophorin (de virgilio et al., 1998), and carboxypeptidase y (biederer et al., 1997) from the er to the cell cytoplasm. in contrast, inhibition of protein ubiquitination results in the retention of such molecules within the er. schmitz et al. (2004) suggest that two distinct proteasome-regulated pathways mediate degradation of retrotranslocated β-amyloid precursor protein.
interestingly, endocytosed toxins that target key cytosolic factors appear to use the erad pathway to move out of the er and into the cytosol (deeks et al., 2002; hazes and read, 1997). cholera and ricin toxins are routed from the cell surface through the golgi apparatus and to the er before being retrotranslocated into the cell cytosol. it is believed that the unusually low lysine content of these protein toxins prevents subsequent er-associated ubiquitination for degradation by the cytosolic proteasome.
fig. 1 quality control of protein assembly within the endoplasmic reticulum. proteins destined for the secretory pathway (this example shows a transmembrane protein) are cotranslationally translocated from the ribosome into the lumen of the endoplasmic reticulum (er) through a portal referred to as the sec61 translocon. as the newly synthesized protein enters the er, quality control mechanisms in the form of protein chaperones bind to it and fold it to its correct conformation. further processing occurs through interactions with other chaperones before the successfully folded protein is loaded into copii-coated vesicles and shuttled from the er to the golgi apparatus. however, if the protein carries a mutation that causes it to take on an aberrant conformation, the er chaperones will trigger a misfolded protein response. this has two outcomes: either the chaperones will remain bound to the misfolded protein, preventing its escape from the organelle (er retention), or the protein will be ubiquitinated and retrotranslocated through the sec61 complex for proteasomal degradation in the cell cytoplasm. a number of human genetic diseases are a result of key proteins failing to traffic through the secretory pathway and as a consequence are retained or degraded in this manner.
protein cargo is shuttled between the er and golgi within vesicular intermediates or 50-nm-diameter spherical vesicles containing coat protein complexes such as copi or copii.
initially discovered in mammals and yeast (kaiser and schekman, 1990; malhotra et al., 1989; novick et al., 1980; rothman and wieland, 1996), cop complexes are required for the formation of vesicles at the er, er-golgi intermediate compartment (ergic), and golgi apparatus. cop recruitment to membranes facilitates the specific capture, packaging, transport, and delivery of membrane-bound and soluble protein cargo to an acceptor compartment. copii recruitment to sites on the smooth er initiates the formation of anterograde (forward) transport vesicles. these copii vesicles move from the er to the ergic, or vesicular tubular clusters (vtcs). from here, copi-coated vesicles are thought to mediate the continued anterograde movement from the ergic to the cis face of the golgi apparatus (scales et al., 1997). the sar1p gtpase regulates copii vesicle formation via interaction with the sec12p guanine exchange factor (gef). sec12p-mediated activation of sar1p to a gtp-bound form leads to recruitment of the sec23p-sec24p heterodimer to membranes; this also initiates protein cargo selection within the er and recruitment of v-snares such as bet1p and bos1p. binding of sec23p-sec24p mediates further recruitment of the sec13p-sec31p complex. this copii complex then acts as a protein scaffold that causes deformation of the membrane, resulting in vesicular fission, with anterograde movement of protein cargo-containing copii vesicles to the ergic. copii docking at an acceptor compartment is thought to trigger sec23p function, causing a conformational change in sar1p and gtp hydrolysis and dissociation or uncoating of the copii complex. thus copii vesicle docking and fusion with an acceptor compartment are mediated by cognate v-snare/t-snare interactions (kirchhausen, 2000; kuehn et al., 1998; matsuoka et al., 1998; tang et al., 2005).
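the order of copii coat assembly and disassembly described above can be written down as a simple ordered checklist. the following is only an illustrative sketch: the step keys, the function name `next_copii_step`, and the reduction of the pathway to a strictly linear sequence are assumptions made for the example, not part of the review.

```python
# Illustrative sketch only: the step keys and the strictly linear ordering
# are simplifying assumptions; the descriptions paraphrase the text above.
COPII_STEPS = [
    ("sar1_gtp", "Sec12p (GEF) activates Sar1p to its GTP-bound form"),
    ("sec23_sec24", "Sar1p-GTP recruits Sec23p-Sec24p; cargo and v-SNAREs are selected"),
    ("sec13_sec31", "Sec23p-Sec24p recruits the Sec13p-Sec31p complex"),
    ("fission", "the coat scaffold deforms the membrane and the vesicle is released"),
    ("uncoating", "docking triggers Sec23p, Sar1p hydrolyses GTP, the coat dissociates"),
    ("fusion", "cognate v-SNARE/t-SNARE pairing mediates fusion with the acceptor"),
]


def next_copii_step(completed):
    """Return the key of the next step, or None once all steps are done.

    Raises ValueError if `completed` is not a prefix of the expected order.
    """
    keys = [key for key, _ in COPII_STEPS]
    done = list(completed)
    if done != keys[:len(done)]:
        raise ValueError("steps completed out of order")
    return keys[len(done)] if len(done) < len(keys) else None


if __name__ == "__main__":
    progress = []
    while (step := next_copii_step(progress)) is not None:
        print(step, "-", dict(COPII_STEPS)[step])
        progress.append(step)
```

the sketch only encodes dependency order (for example, that sec13p-sec31p recruitment follows sec23p-sec24p binding); it says nothing about kinetics or about steps that may overlap in the cell.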
a severe hereditary bleeding disorder called combined deficiency of factor v and factor viii (f5f8d) highlights the functional importance of trafficking between the er and ergic. some f5f8d patients are deficient in the ergic-localized ergic-53 (lman1) protein and display defective secretion of the factor v and viii clotting factors. ergic-53 is a mannose-binding lectin that acts as a "cargo receptor" and recycles between the er and ergic (neerman-arbez et al., 1999). however, 30% of f5f8d patients show normal levels of ergic-53/lman1, but are deficient in an associated protein, mcfd2, another ergic resident that interacts with ergic-53/lman1 in a calcium-dependent manner. small intestinal cells called enterocytes absorb fats and fat-soluble vitamins from food in the form of fatty acids and monoglycerides. the fats enter the lumenal surface of absorptive enterocytes by free diffusion across their membranes, and emerge from the basolateral surface as particulate structures referred to as chylomicrons. formation of chylomicrons occurs within the er and golgi apparatus by vesicular transport before being trafficked from the golgi to the plasma membrane. chylomicron retention disease (cmrd), anderson disease, and a neuromuscular disorder, cmrd associated with marinesco-sjögren syndrome (cmrd-mss), are examples of inherited diseases that result in compromised fat absorption, low blood cholesterol, and severely depleted blood chylomicron levels. jones et al. (2003) identified eight mutations in the sar1p gene product and copii component associated with these lipid absorption diseases, thus strongly implicating a role for the copii vesicular transport system in the movement of dietary fats from the intestine to the circulating bloodstream. copii mediates anterograde traffic from the er to the golgi apparatus; however, copi vesicles appear to function primarily in the retrograde (backward) transfer of proteins from the golgi and ergic back to the er.
this retrograde traffic is necessary for recovering escaped er resident proteins, coat and snare proteins that have arrived at the ergic and golgi from copii vesicles, or glycosylation enzymes that have been incorrectly modified (duden, 2003; lee et al., 2004). the golgi-associated copi coatomer is a complex of seven polypeptides: α-, β-, β'-, γ-, δ-, ε-, and ζ-cop gene products, which interact with the donor membrane to form copi vesicles. vesicle formation is triggered by the gtpase adp-ribosylation factor 1 (arf1), which recruits copi coatomer to the donor membrane. transmembrane proteins containing cytoplasmic lysine-based motifs such as kkxx or kxkxx, or soluble proteins containing the c-terminal kdel motif, are recycled by copi-coated vesicles from the golgi apparatus back to the er. the kdel motif, present in soluble er chaperones such as bip and protein disulfide isomerase, is recognized by the membrane-bound kdel receptor (majoul et al., 2001). in both cases, cytoplasmic motifs in these transmembrane proteins are recognized and bound by copi coatomer, promoting inclusion into vesicles destined for the er. actin microfilaments are also involved in this retrograde transport step (valderrama et al., 2001). this golgi-er step is regulated by the gtpase cdc42 and n-wasp protein (luna et al., 2002), factors previously implicated in actin-linked processes at the plasma membrane. live imaging of cells expressing an engineered fluorescent and temperature-sensitive vesicular stomatitis virus g-glycoprotein (ts045vsvg) demonstrated sequential action of copii- and copi-coated vesicles (scales et al., 1997). vsvg accumulated in structures close to the er that contained intermediate compartment resident proteins. these structures then matured into vesicles that contained copi proteins. stephens et al. (2000) showed that this "segregation" between copii and copi vesicles occurred at a location in close proximity to exit sites on er membranes.
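the er retrieval motifs mentioned above (kkxx, kxkxx and kdel) are short sequence patterns, so a pattern match is a natural way to illustrate them. the sketch below is a minimal example under stated assumptions: sequences are one-letter amino acid strings, each motif is required to sit at the extreme c-terminus, the function name `er_retrieval_motif` is invented for the example, and the test sequences are made up rather than taken from real proteins.

```python
import re

# Assumptions for illustration: one-letter amino acid codes, and each motif
# anchored at the extreme C-terminus ("X" in the motif = any residue).
KKXX = re.compile(r"KK[A-Z]{2}$")         # lysines at positions -4 and -3
KXKXX = re.compile(r"K[A-Z]K[A-Z]{2}$")   # lysines at positions -5 and -3
KDEL = re.compile(r"KDEL$")               # soluble ER-resident proteins


def er_retrieval_motif(sequence, transmembrane):
    """Name the retrieval motif found at the C-terminus, or return None.

    Lysine-based motifs are only checked for transmembrane proteins and
    KDEL only for soluble proteins, mirroring the distinction in the text.
    """
    seq = sequence.upper()
    if transmembrane:
        if KKXX.search(seq):
            return "KKXX"
        if KXKXX.search(seq):
            return "KXKXX"
        return None
    return "KDEL" if KDEL.search(seq) else None


if __name__ == "__main__":
    # Hypothetical tails, not real protein sequences.
    print(er_retrieval_motif("MAAAKKTN", transmembrane=True))
    print(er_retrieval_motif("MAAAKTKLL", transmembrane=True))
    print(er_retrieval_motif("MSSEKDEL", transmembrane=False))
```

a match of this kind only flags a candidate signal; whether a given protein is actually retrieved by copi vesicles depends on topology and context that a string pattern cannot capture.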
a cop-independent mechanism has also been implicated in retrograde traffic between the golgi apparatus and the er. the rab6 gtpase is implicated in regulating the movement of bacterial shiga toxin b fragment (stb) via a retrograde step from the golgi apparatus to the er. expression of a dominant-negative gdp-bound form of rab6 inhibited stb retrograde movement, whereas copi transport was unaffected (white et al., 1999). the golgi apparatus is composed of flattened cisternae and membrane compartments that are closely juxtaposed in a stack-like appearance. in mammalian cells these stacks are positioned end-to-end, forming a ribbon-like structure near the nucleus (barr and warren, 1996). the golgi apparatus is a highly dynamic organelle sited at the hub of the secretory pathway with key processing and sorting functions. the golgi is a polarized structure with proteins and lipids from the er received at the cis side, followed by the medial and trans subcompartments, where further glycosylation modifications occur; the trans-golgi network (tgn) is the final subcompartment where sorting and packaging events take place. the golgi apparatus also sorts proteins and lipids bound on a retrograde pathway from the cis-golgi back to the er. in addition, proteins can also return to the tgn from the endomembrane/lysosomal system (fig. 2). controversy exists regarding the mechanism for anterograde movement of cargo proteins within the golgi apparatus. the golgi apparatus contains secretory proteins that can vary in physical size, from relatively small polypeptides to large, bulky multisubunit complexes; all need to reach the tgn for final sorting into transport intermediates. there are also resident glycosylation enzymes that have spatially restricted functions within the golgi, that is, enzymes that function within specific subcompartments to ensure the correct addition or trimming of n- and o-linked sugars on secreted proteins as they progress through the pathway.
this raises a key question: how do protein and lipid cargo move through the golgi apparatus while resident enzymes retain their localization? we know that many golgi enzymes contain transmembrane golgi localization signals that mediate targeting to a specific compartment (munro, 1998). two models have been proposed: the cisternal maturation model and the vesicular transport model (elsner et al., 2003; storrie et al., 2000). briefly, the cisternal maturation model suggests that large proteins or aggregates remain within a single golgi cisterna, which matures through the retrograde transfer of resident enzymes via copi vesicles. in contrast, the vesicular transport model proposes that newly synthesized protein is trafficked from cisterna to cisterna via copi-coated vesicles that sequentially bud off membranes and fuse with the next subcompartment. in either case, copi-coated vesicles play a central role in intra-golgi transport.
figure: the secretory pathway and vesicular trafficking. protein enters the secretory pathway at the endoplasmic reticulum (er) and is trafficked in copii-coated vesicular structures to the intermediate compartment (ergic/vtc), from which copi-coated vesicles carry it to the cis face of the golgi. cargo protein (c) continues along the secretory pathway through the golgi apparatus to the trans-golgi network (tgn). retention signals in er resident proteins (r) ensure they undergo retrograde trafficking from the golgi in copi vesicles.
a number of snare proteins, such as membrin, rbet1, gs27, and syntaxin-5, have also been localized to the golgi apparatus and are required for intra-golgi transport and homeostasis (nichols and pelham, 1998). golgi-tethering molecules called golgins and golgi reassembly stacking proteins (grasps) belong to a family of regulatory factors involved in golgi maintenance and vesicular transport. the reader is pointed to an in-depth review that covers golgins in more detail (short et al., 2005).
in brief, the golgins can be anchored to golgi membranes through various mechanisms and contain characteristic coiled-coil domains that extend from the membranes as a rod-like structure (burkhard et al., 2001). golgins such as giantin and golgin-84 are securely anchored to the membrane via a transmembrane domain near their c terminus. electrostatic or ionic interactions mediate the attachment of other golgins to membranes. for example, proteins of the grasp family (grasp65 and grasp55) bind to gm130 and golgin-45 to recruit these factors to the cis and medial golgi membranes, respectively. moreover, a large number of golgins are recruited to membranes via interactions with the rab, arf, and arl (arf-like) gtpases. vesicular and cis-golgi membrane recruitment of golgin p115 is regulated by rab1, whereas membrane attachment of yeast golgin rud3p is regulated by arf1p. golgin-97 binds to membranes by interaction with arl1p, a member of a new class of arf-like gtpases termed arls (short et al., 2005). interestingly, autoantibodies directed against giantin, golgin-245, golgin-160, gm130, and golgin-97 golgins and grasps are present in patients with autoimmune conditions such as sjögren's syndrome and systemic lupus erythematosus. in sjögren's syndrome, moisture-producing glands are targeted by the autoimmune response, resulting in dry eyes and mouth (lichtenfeld et al., 1976). systemic lupus erythematosus is a chronic rheumatic condition that affects joints and muscles, causing skin rash and kidney problems. sjögren's syndrome patients can also simultaneously display both rheumatoid arthritis and systemic lupus erythematosus. golgi biogenesis requires golgin function at different stages during cell division. mammalian p115 is crucial for maintenance of the stacked nature of the golgi cisternae (puthenveedu and linstedt, 2004). during mitosis, the golgi stack disperses into clustered vesicles.
these vesicles then fuse in the daughter cells to form new cisternae, alignment and stacking of which result in the formation of a fully functional organelle. grasp65 tethers have been proposed to hold cisternae in close proximity through interactions with p115 and gm130 (shorter and warren, 1999). the golgin p115 is also involved in tethering copi vesicles to golgi membranes (sonnichsen et al., 1998) and may be needed for snare complex assembly (shorter et al., 2002). the budding yeast p115 homolog (uso1p) tethers copii-coated vesicles to golgi membranes during anterograde transport from er exit sites to the cis-golgi (barlowe, 1997; cao et al., 1998; sapperstein et al., 1996). mammalian p115 is also essential for the tethering of transport vesicles to the cis-golgi (alvarez et al., 2001) and during intra-golgi transport (seemann et al., 2000; waters et al., 1992). golgins such as golgin-84 are implicated in the regulation of golgi structure and the formation of the golgi ribbon (diao et al., 2003). golgin-97 may function as a tethering molecule in retrograde traffic from the endosome to the tgn (lu et al., 2004). moreover, golgins are also implicated as tethering components between the cytoskeleton and the golgi apparatus (short et al., 2005). the trans-golgi network (tgn) is the final golgi subcompartment where secreted proteins are sorted, packaged, and directed to their final destination. trafficking from the tgn can occur in either a constitutive or regulated manner. constitutive transport is the continuous release of protein from the trans-golgi network. regulated secretion occurs in response to extracellular stimuli such as secretagogues, metal ions, hormones, or growth factors, which trigger the docking and fusion of secretory granules or vesicles with the plasma membrane.
various mechanisms control the trafficking of proteins from the tgn by the formation and delivery of membrane-derived transport vesicles to the plasma membrane, endosomes, or lysosomal structures (ponnambalam and baldwin, 2003). the expression of inactive (dominant-negative) protein kinase d isoforms in tumor lines (liljedahl et al., 2001), polarized canine kidney cells (yeaman et al., 2004), and mouse fibroblasts (prigozhina and waterman-storer, 2004) has been shown to inhibit vesicle fission (release) from the tgn. vesicle release is modulated by this family of kinases in response to cellular diacylglycerol (baron and malhotra, 2002) and binding to an as yet unknown effector protein on the cytoplasmic face of the tgn (van lint et al., 2002). in addition, the cdc42 gtpase is linked to actin remodeling and has been shown to inhibit the exit of basolateral targeted proteins in polarized cells (kroschewski et al., 1999; musch et al., 2001) and copper-regulated protein transport (cobbold et al., 2002). copper is an essential element and cofactor required for functionality of many secreted enzymes (cuproenzymes). at steady state, atp7a (menkes disease copper transporter; section iii.a.1) resides in the tgn, where it provides newly synthesized cuproenzymes such as lysyl oxidase with copper ions as they traverse the secretory pathway. when intracellular copper ion levels rise, atp7a responds to this environmental danger by redistributing to the plasma membrane in a cdc42-regulated manner (cobbold et al., 2002). here, atp7a acts as a copper efflux pump to remove copper ions from the cytoplasm to maintain homeostatic function and prevent toxicity. when copper levels are reduced, atp7a recycles back to the tgn. this endocytic internalization and sorting event is independent of both clathrin and caveolae (cobbold et al., 2003), although relying on a cytoplasmic dileucine motif present in the atp7a c-terminus (petris and mercer, 1999).
dent's disease, an x-linked kidney disorder that presents with hypercalciuria, nephrocalcinosis (kidney stone formation), and progressive renal failure, is caused by missense, nonsense, and deletion mutations within the endosomal clc-5 voltage-gated chloride channel. clc-5 is a member of a large family of voltage-gated chloride channels that have a diverse array of cellular functions including membrane excitability, transepithelial ion transport, and cell volume regulation (thakker, 1997). when expressed in xenopus oocytes, a number of missense mutations in the clc-5 gene localized the channel to the golgi apparatus and showed reduced conductance and significantly reduced plasma membrane (pm) localization (ludwig et al., 2005). similarly, expression of mutant clc-5 alleles in cultured cells revealed an approximate 5-fold increase in golgi retention (carr et al., 2003).
a. receptor-mediated endocytosis
clathrin-coated vesicles (ccvs) are a route for protein internalization conserved from yeast to humans. roth and porter (1964) first observed this process in mosquito oocytes and these vesicles have subsequently become one of the best characterized membrane transport steps in eukaryotes. clathrin is one of the principal proteins involved in this transport step and, in combination with more than 25 clathrin-associated factors, this unique structural component forms transport vesicles on the cytoplasmic face of the tgn, endosomes, and the plasma membrane. clathrin-coated vesicles bud from their donor membranes and are directed to target membranes by associated proteins and factors. this highly conserved 600-kda clathrin complex comprises heavy (180 kda) and light (25 kda) chain proteins that are assembled into a three-legged structure called a triskelion. triskelions can be polymerized by accessory factors into striking lattice-like "cages" comprising pentagons and hexagons, resembling a soccer ball structure or buckminsterfullerene.
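two pieces of arithmetic sit behind the description above and can be checked in a few lines. the sketch below assumes that each of the three triskelion legs carries one heavy and one light chain (which gives a mass close to the quoted 600 kda) and that a closed cage has exactly three edges meeting at every vertex, as in a soccer ball; the function names are invented for the example.

```python
HEAVY_KDA = 180  # heavy chain mass quoted in the text
LIGHT_KDA = 25   # light chain mass quoted in the text


def triskelion_mass_kda(legs=3):
    """Mass of one triskelion, assuming one heavy and one light chain per leg."""
    return legs * (HEAVY_KDA + LIGHT_KDA)


def cage_counts(hexagons, pentagons=12):
    """Vertices, edges, faces and Euler characteristic of a polygonal cage.

    Assumes three edges meet at every vertex and every edge is shared by
    two faces, so polygon corners are counted 3x and polygon sides 2x.
    """
    corners = 5 * pentagons + 6 * hexagons
    if corners % 6:
        raise ValueError("these polygons cannot close into a three-connected cage")
    vertices, edges, faces = corners // 3, corners // 2, pentagons + hexagons
    return {"vertices": vertices, "edges": edges, "faces": faces,
            "euler": vertices - edges + faces}


if __name__ == "__main__":
    print(triskelion_mass_kda())   # 3 * (180 + 25)
    print(cage_counts(20))         # soccer ball: 12 pentagons, 20 hexagons
```

with 12 pentagons the euler characteristic comes out as 2 for any number of hexagons, which is the geometric reason a closed cage of pentagons and hexagons, whether a soccer ball, buckminsterfullerene or a clathrin lattice, always contains exactly 12 pentagons.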
clathrin cages are 70-120 nm in diameter, significantly larger than copi or copii vesicles. clathrin-coated vesicles are believed to assemble through a sequence of events that can be designated as activation, cargo capture, coat assembly, scission, movement, and vesicle uncoating (kirchhausen, 2000). members of a class of clathrin-associated factor termed adaptor protein (ap) complexes are recruited to donor membranes through interactions with a docking complex, which then further interacts with motifs within the cytoplasmic tail of cargo proteins, resulting in "cargo capture." this leads to clathrin cage assembly and the concomitant polymerization of the clathrin triskelion and resultant deformation of the donor membrane. scission, or vesicle release from the plasma membrane, is believed to occur through the action of the gtpase dynamin and other accessory proteins, such as amphiphysin (wigge et al., 1997). in the fruit fly drosophila melanogaster, a dynamin gene mutation (shibire) causes temperature-sensitive paralysis. this is likely due to a block in the endocytic uptake of synaptic vesicle proteins at the plasma membrane, leading to a block in recycling and reformation of competent synaptic vesicles at nerve terminals (chen et al., 1991; koenig and ikeda, 1989; kosaka and ikeda, 1983; van der bliek and meyerowitz, 1991). the expression of a dominant-negative gdp-bound dynamin mutant, k44a, results in compromised ccv formation (herskovits et al., 1993; van der bliek et al., 1993) and inhibition of clathrin-mediated internalization of the glucose transporter glut4 (al-hasani et al., 1998), human immunodeficiency virus (hiv) (daecke et al., 2005), and influenza virus (roy et al., 2000). the scission function of dynamin is assisted by specific lipid-modifying enzymes such as endophilin, synaptojanin, and phospholipase d (bi et al., 1997; havner et al., 1997; ringstad et al., 1999; schmidt et al., 1999; woscholski et al., 1997).
finally, ccv uncoating at the target membrane occurs through the actions of the heat shock protein hsc70 (schlossman et al., 1984) and auxilin (ungewickell et al., 1995). sorting of proteins from donor to target membranes involves the recognition of cytoplasmic sequences in membrane proteins by clathrin-associated ap complexes. four adaptor protein complexes (ap1-ap4), each comprising four different subunits, have been identified (robinson, 2004). the ap1 complex is involved in clathrin-coated vesicle formation at the tgn for transport to late endosomes; evidence has also implicated a role for this complex in a tgn-to-plasma membrane step (folsch et al., 2003). ap2 is the best-studied of the four complexes and mediates internalization of transmembrane receptors at the plasma membrane via clathrin-coated vesicles. the ap3 complex is involved in trafficking from early endosomes to either late endosomes or lysosome-related organelles such as melanosomes, platelet-dense bodies, and antigen-processing compartments. finally, the ap4 complex was the last to be cloned (dell'angelica et al., 1999a; hirst et al., 1999). in contrast to ap1-ap3, ap4 does not possess the β "ear" domain (see below), which allows interaction with clathrin and other cytosolic factors such as eps15 and auxilin 2 (lundmark and carlsson, 2002). by electron microscopy, ap4 has been localized to vesicles at the tgn, plasma membrane, and early endosomes, although there is debate as to whether these vesicles are clathrin-coated (barois and bakke, 2005; hirst et al., 1999). interestingly, ap3 and ap4 may function independently of clathrin (hirst et al., 1999; vowels and payne, 1998), suggesting the existence of another, as yet unidentified, coat protein that is analogous to clathrin. all four ap complexes comprise two large 100-kda subunits: a β subunit (β1-β4) plus a γ (ap1), α (ap2), δ (ap3), or ε (ap4) subunit.
in addition, each ap complex contains a 50-kda subunit (μ1-μ4) and a small 20-kda subunit (σ1-σ4). ap1, -2, and -3 contain two carboxyl "ear" domains connected to the head of each large 100-kda subunit by a flexible hinge of approximately 20-30 residues. importantly, the ear domain of the β subunit and the hinge domains of the γ and α subunits have been shown to bind clathrin (goodman and keen, 1995; morgan et al., 2000; owen et al., 2000), and consensus sequences in the hinge domains of β1 and β2 have clathrin-binding properties (dell'angelica et al., 1998). the β and μ subunits of the ap complex interact with motifs present in the cytoplasmic domains of transmembrane proteins to mediate cargo recruitment into clathrin-coated vesicles. such motifs include npxy, yxxø, and dileucine-based sequences (ø represents a bulky hydrophobic amino acid). one such motif, npxy, is present in key cellular receptors such as low density lipoprotein receptor (ldlr), epidermal growth factor receptor (egfr or erbb1), and insulin receptor, and mediates endocytosis and sorting. importantly, the jd mutation (y807c) in ldlr lies within this key motif and causes familial hypercholesterolemia (knoblauch et al., 2000). the second tyrosine-based motif, yxxø, mediates plasma membrane internalization, lysosomal targeting, and basolateral targeting of cargo. this motif is found in lysosomal residents such as lamp-1 and -2, cd63, the recycling transferrin receptor (tfr), and tgn-associated recycling membrane proteins, furin and tgn38. di-leucine motifs present on transmembrane transporters such as glut4 (glucose transporter), atp7a, and mannose-6-phosphate receptors (m6pr) can fall into two categories: [de]xxx[li] and dxxll related motifs. the [de]xxx[li] motif is associated with proteins internalized from the plasma membrane and targeted to lysosomes, while the dxxll motif is found in transmembrane proteins that shuttle between the tgn and endosomal system (bonifacino and traub, 2003).
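The sorting motifs named above are short sequence patterns, so scanning a cytoplasmic tail for them can be sketched with regular expressions. The following is a minimal illustration only, not a validated prediction tool: the patterns are transcribed as they are written in the text (x = any residue), the residue set chosen for "ø" (F, I, L, M, V) is an assumption, and the function name `find_motifs` and the example peptides are hypothetical. A pattern match does not by itself show that a sequence is a functional sorting signal.

```python
import re

# Assumption: "ø" (bulky hydrophobic) is taken here as F, I, L, M or V.
BULKY_HYDROPHOBIC = "FILMV"

# Patterns transcribed from the motif names in the text; "." stands for x.
MOTIFS = {
    "NPXY": r"NP.Y",
    "YXXØ": rf"Y..[{BULKY_HYDROPHOBIC}]",
    "[DE]XXX[LI]": r"[DE]...[LI]",
    "DXXLL": r"D..LL",
}


def find_motifs(sequence):
    """Return {motif_name: [(start_index, matched_text), ...]} for a peptide
    given in one-letter amino acid code (hypothetical helper)."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        regex = re.compile(rf"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(seq)]
    return hits


if __name__ == "__main__":
    # Made-up test peptides, not real protein sequences.
    for peptide in ("AANPVYAA", "GGYQRLGG", "ADEDLLA"):
        print(peptide, find_motifs(peptide))
```

Because the dileucine patterns are short and degenerate, they match many sequences by chance; any hit from a scan like this would need experimental confirmation.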
another class of clathrin-associated factor is the golgi-localized, γ-ear-containing, arf-binding proteins (ggas) found on the tgn and postulated to interact with ap1 to mediate transport of m6pr (section v.b) to endosomes (doray et al., 2002). ggas can act as multifunctional adaptors that link transmembrane proteins, arf gtpases, clathrin and accessory proteins at sites of ccv formation (robinson and bonifacino, 2001). the disease oculocerebrorenal syndrome of lowe (ocrl) is an x-linked disorder caused by mutations in the ocrl1 gene (lowe, 2005). the gene product is an inositol 5'-phosphatase that catalyzes the removal of the phosphate from this position on the inositol moiety. the preferred ocrl1 substrate is pi(4,5)p2, a phosphoinositide shown to be important in endocytosis because of its central role in recruiting accessory proteins to ccvs (padron et al., 2003). ocrl1 has been localized to clathrin-coated vesicles associated with endosomal and tgn membranes (choudhury et al., 2005). this is not surprising as ocrl1 interacts with clathrin and promotes its assembly into clathrin lattices and cages (choudhury et al., 2005; ungewickell et al., 2004). ocrl1 also interacts with the rac1 gtpase that regulates actin dynamics, possibly via a gtpase activation domain to accelerate gtp hydrolysis (faucherre et al., 2003). although the exact function of ocrl1 is still unclear, the disease phenotype hints at a function for ocrl1 in membrane trafficking. ocrl1 mutations can cause loss of protein expression and phosphatase activity. rnai-mediated inhibition of ocrl1 expression in cultured human cells results in partial redistribution of a cation-independent mannose-6-phosphate receptor and a tgn recycling protein (tgn46) to early endosomes (choudhury et al., 2005). this suggests that loss of ocrl1 perturbs endosome-to-tgn vesicle transport and that ocrl1 is functionally required for this membrane traffic step.
it is possible that ocrl1 plays a role in anterograde trafficking from the tgn to endosomes as well, since ocrl1 is abundantly present on tgn-associated clathrin buds destined for the endocytic pathway. ocrl disease symptoms include congenital cataracts, mental retardation, and renal tubular dysfunction (lowe et al., 1952). renal failure in ocrl patients is probably partly caused by defects in solute and protein reabsorption in kidney proximal tubules. this is likely due to missorting of megalin and cubilin, cell surface receptors involved in kidney solute uptake. in ocrl1 patients plasma membrane shedding of these receptors is reduced (norden et al., 2002), indicating ocrl1 regulation of either receptor trafficking from the tgn to the plasma membrane or recycling from the plasma membrane to the tgn. paraneoplastic stiff-person syndrome (sps) is a neurological autoimmune disease characterized by severe muscle stiffness and spasms, and often has secondary symptoms including diabetes, epilepsy, and breast cancer. autoantibodies are produced against the clathrin-associated regulator, amphiphysin i (de camilli et al., 1993), a protein shown to bind dynamin in nerve terminals (david et al., 1996) and which is implicated in regulating the endocytosis of neuronal synaptic vesicle components (burns, 2005). in support of this hypothesis, sommer et al. (2005) showed that sps-like symptoms could be triggered in rats injected with anti-amphiphysin antibodies from a human sps patient. genetic translocations leading to the formation of hybrid clathrin-accessory proteins can lead to forms of acute myeloid leukemia, lymphoblastic leukemia, and acute megakaryoblastic leukemia (dreyling et al., 1996; jones et al., 2001; narita et al., 1999; tebar et al., 1999).
in these diseases, an aberrant hybrid protein consisting of the putative transcription factor af10 and the clathrin accessory protein calm (clathrin assembly lymphoid myeloid leukemia protein) is formed because of a partial inversion of the af10 gene on chromosome 11 (salmon-nguyen et al., 2000). finally, in hermansky-pudlak syndrome (hps) type 2, a condition that results in partial albinism and prolonged bleeding, mutations have been found in the β3a gene that encodes a subunit of the ap3 adaptor complex (dell'angelica et al., 1999b). hps is discussed in more detail in section v.c. originally identified more than 50 years ago (palade, 1953; yamada, 1955), caveolae are flask-shaped invaginations of approximately 50-100 nm in diameter at the plasma membrane. these plasma membrane profiles are related to lipid rafts and contain unique mixtures of gpi-anchored proteins, transmembrane proteins, signaling factors and lipids, such as cholesterol. caveolae are believed to mediate the uptake of small solutes, regulate protein trafficking (hommelgaard et al., 2005; tagawa et al., 2005), transcytosis (transport across endothelial cells) (simionescu et al., 2002), signal transduction (insel et al., 2005; lisanti et al., 1994; ostrom and insel, 2004) and cholesterol homeostasis (fielding and fielding, 2001). however, their exact role in the internalization of membrane proteins and soluble protein ligands is controversial. caveolin-1, also known as vip21, is a structural component essential for the formation and stability of caveolae (kurzchalia et al., 1992; rothberg et al., 1992).
of the three members of the caveolin gene family (caveolin-1, -2, and -3) (tang et al., 1996), caveolin-1 and -2 are abundant in a wide variety of cell types including endothelial cells, adipocytes, alveolar type i pneumocytes, and smooth muscle cells (williams and lisanti, 2004), whereas caveolin-3 is a muscle-specific isoform expressed in striated muscle cells such as cardiac and skeletal myocytes (cohen et al., 2004; tang et al., 1996). caveolin-1 and -3 are both able to induce formation of caveolae at the plasma membrane (galbiati et al., 2001; li et al., 1996). however, caveolin-2 requires the presence of caveolin-1 for expression, membrane localization, and formation of caveolae (razani et al., 2002). caveolae are absent from cells that lack caveolin-1 but can be induced by ectopic expression of the gene (fra et al., 1995). caveolins adopt a hairpin-like structure that inserts into the membrane such that the n and c termini are cytoplasmic. caveolins can polymerize to form a striated coat surrounding an invagination site. caveolin-1 can bind cholesterol (murata et al., 1995), which is enriched within both caveolae and lipid rafts (sargiacomo et al., 1993); this may explain why caveolae have been considered a subset of lipid rafts. however, caveolae and lipid rafts are considered to be independent entities as some proteins can be found in one but not the other (liu et al., 1997). certain ligands can internalize via a lipid raft-dependent but clathrin-independent mechanism in cells that lack caveolae (lamaze et al., 2001). a large pool of the plasma membrane caveolar vesicles cluster into dense grape-like structures where individual caveolae appear stacked on top of one another (thomsen et al., 2002). these structures are intimately associated with the actin cytoskeleton (stahlhut and van deurs, 2000); caveola-associated proteins are also implicated in regulating plasma membrane dynamics and cellular movement.
a small pool of "transport-competent" caveolar vesicles may undergo short-range constitutive fusion and budding cycles just under the plasma membrane. caveolae and caveolins can also be detected at the tgn (dupree et al., 1993; kurzchalia et al., 1992) and may form stable "platforms" for the movement of proteins and lipids from the tgn to the plasma membrane (tagawa et al., 2005). the caveolar pathway can be hijacked and used by pathogens or toxins to gain entry into the cell. viruses such as polyomavirus, echovirus 1, and simian virus 40 (sv40) use caveolae to internalize viral particles. these viruses cluster lipid rafts and sequester them into caveolae through interactions with raft components such as integrins and glycosphingolipids; in the case of sv40, the virus binds to the raft component ganglioside gm1 (tsai et al., 2003). tagawa et al. (2005) have shown that sv40 can trigger the long-range movement of transport-competent caveolar vesicles. moreover, cell infection with sv40 more than doubles the number of caveolae capable of undergoing viral internalization and long-range trafficking. caveolae contain much of the molecular machinery required for "classical" vesicle fission, docking, and fusion, for example, snare proteins, monomeric and trimeric gtpases, annexins ii and vi, n-ethylmaleimide (nem)-sensitive fusion protein (nsf), and atpases (schnitzer et al., 1995). caveolae also contain the dynamin gtpases, which can be transiently recruited to sv40-loaded caveolae and are implicated in membrane scission (henley et al., 1998; oh et al., 1998; pelkmans and helenius, 2002). internalized caveola-derived vesicles move to an endocytic compartment termed the "caveosome" and eventually arrive at the early endosome. after fusion with the target compartment, caveolae do not disassemble but maintain their integrity in the membrane, preserving their compartmentalization and retaining their lipid and protein components (pelkmans et al., 2004).
after reaching the caveosome, internalized sv40 viruses eventually arrive at the smooth er (pelkmans et al., 2001). interestingly, mutations in caveolin have been implicated in muscular dystrophy and cardiovascular disease, and mutations causing the downregulation of caveolin have been linked to the progression of various human carcinomas; it is therefore possible that caveolins may have a tumor suppressor role. the caveolin-1 and caveolin-2 genes are located on human chromosome 7q31.1 near the microsatellite repeat marker d7s522. this region is commonly deleted in various cancers (engelman et al., 1998), hinting that caveolin gene deletion may be advantageous for tumor progression. in one report, the caveolin-1 p132l mutation was present in 16% of breast cancer patients studied (hayashi et al., 2001). the p132l mutation was also linked to the metastatic potential of tumors and disease prognosis. the caveolin-1 p132l mutation also conferred increased cell migration and altered morphology. caveolin-1 protein levels can be reduced or absent from a number of human breast cancer cell lines compared with normal mammary cells (lee et al., 1998). similarly, silent and missense mutations in caveolin-1 have also been associated with oral carcinomas (han et al., 2004). caveolin-1, and to a lesser extent caveolin-2, gene expression is downregulated in some cases of thyroid carcinoma (aldred et al., 2003). although it remains unclear as to why the loss of caveolin causes cell proliferation diseases such as cancer, one can speculate on the role of caveolin in regulating signaling pathways. in endothelial cells, which have a high abundance of caveolin, the key vascular endothelial growth factor receptor 2 (vegfr2) has been shown to be inactive when localized to caveolae (labrecque et al., 2003).
this receptor tyrosine kinase modulates the endothelial response to the key vegf-a cytokine and controls angiogenesis and new blood vessel formation, thus regulating neovascularization and tumor growth (neufeld et al., 1999). similarly, platelet-derived growth factor (pdgf) receptor tyrosine kinase activity is reduced when associated with caveolae (yamamoto et al., 1999). in addition to vegfr2 and pdgfr, a number of g protein-coupled receptors (gpcrs) have been shown to interact with caveola-associated factors (insel et al., 2005). gpcrs are a large family of transmembrane receptors involved in a variety of signal transduction events. these receptors are activated by a range of ligands, including hormones and peptides, and have been linked to a number of cancers such as thyroid, lung, and gastric cancers. the presence of a number of gpcrs in caveolae suggests that these plasma membrane structures may interact with gpcrs and modulate their signaling potential. lisanti and others (li et al., 1995a) have shown that caveolin-1 interacts solely with inactive forms of g-protein α subunits, lending credence to the negative regulation hypothesis caused by the association of caveolae with transmembrane signaling receptors. a number of mutations in muscle-specific caveolin-3 have been associated with four distinct but related autosomal dominant muscle disease phenotypes (woodman et al., 2004): limb girdle muscular dystrophy type 1c (minetti et al., 1998), rippling muscle disease, hyperckemia (persistently elevated levels of serum creatine kinase), and distal myopathy. some mutations cause aberrant retention of caveolin-3 in the golgi and subsequent degradation; other mutations may cause mutant caveolin-3 to act in a dominant-negative manner by forming unstable aggregates with wild-type caveolin-3 (galbiati et al., 1999; sotgia et al., 2003a,b).
a caveolin-3 t63s mutation that reduces plasma membrane levels has been found in hypertrophic cardiomyopathy (hcm) patients (hayashi et al., 2004). caveolin gene knockout mice are providing insights into protein function in different human diseases. for example, lack of caveolins can cause diabetes, atherosclerosis, and cardiomyopathies in mouse models (cohen et al., 2004; williams and lisanti, 2004). however, such phenotypes have yet to be linked to caveolin dysfunction in humans. phagocytosis is a process used by white blood cells such as macrophages, neutrophils, and dendritic cells to ingest large particulate material into specialized vesicles called phagosomes. these professional phagocytes are paramount in the defense against infection as they engulf and ingest whole microorganisms such as bacteria. they also use this route for "mopping up" apoptotic debris or senescent cells from tissues. in contrast to constitutive pinocytic transport, phagocytosis is regulated by cell surface-localized fc receptor (fcr) contact or interaction with complement- or antibody-coated particles, which results in clustering of fcr on the cell surface, a step important for subsequent intracellular signaling and cellular activation (daeron, 1997). polymorphisms in leukocyte-specific fcγ receptors may contribute to autoimmune diseases such as guillain-barré syndrome or rheumatoid arthritis, and enhanced susceptibility to infection (van sorge et al., 2003). fc-mediated binding can trigger a complex signaling response involving extrusion of fine plasma membrane projections (pseudopodia) from the macrophage to surround and engulf the pathogen, forming a phagosome. the signaling response is reviewed in greater detail elsewhere (bokoch, 2005; chimini and chavrier, 2000; niedergang and chavrier, 2005). in brief, the activation of tyrosine kinases and rho gtpases is triggered through fcr signaling.
the rac and cdc42 gtpases, in conjunction with the downstream effector wasp, mediate remodeling of the actin cytoskeleton, leading to pseudopodium formation and phagosome closure (castellano et al., 2001; chimini and chavrier, 2000). in contrast, the complement-mediated uptake of opsonized particles differs such that they appear to "fall" into the cell in a process that requires rho, but not rac or cdc42 (bokoch, 2005). phagocytosis, although designed to destroy pathogens, can paradoxically be used as a route of entry by pathogens such as mycobacterium (m. leprae and m. tuberculosis) or leishmania (nguyen and pieters, 2005; scott et al., 2003). normally, internalized pathogens are destroyed successfully through phagosome maturation into lysosomes and subsequent degradation. mycobacterium can evade host degradation by secreting a soluble serine/threonine protein kinase g molecule into the phagosome. this molecule initiates a signaling response that interferes with phagosome-lysosome fusion, and promotes intracellular pathogen survival (walburger et al., 2004). furthermore, phagosome maturation is compromised by a pathogen-induced block of p38 map (mitogen-activated protein) kinase recruitment to the tethering molecule early endosome antigen 1 (fratti et al., 2003). the leishmania protozoan parasite, which is transmitted to humans by sand flies, produces a membrane molecule called a lipophosphoglycan, which is inserted into the lipid bilayer of the phagosome in infected macrophages. this lipophosphoglycan is thought to modulate intracellular signaling pathways, resulting in a less fusogenic phagosome and preventing maturation; this would facilitate pathogen replication and disease progression (lodge and descoteaux, 2005). molecules internalized from the cell surface by receptor-mediated endocytosis and clathrin-coated vesicles are delivered to the early endosome for sorting.
molecules such as low-density lipoprotein receptor (ldlr) and transferrin receptor (tfr) are efficiently recycled between the early endosome and the plasma membrane. however, after ligand-mediated activation (fig. 3), receptor tyrosine kinases such as epidermal growth factor receptor (egfr) are sorted along the endocytic pathway for degradation. early endosomes are thought to be formed through the fusion of internalized vesicles and recruitment of specific proteins and lipids. one key endosomal regulator is the ubiquitously expressed rab5a gtpase. rab5a is present on the cytosolic face of the plasma membrane, vesicles, and tubular endosomal profiles (chavrier et al., 1990). a number of rab5a-associated effector proteins regulate endosomal fusion and mediate protein cargo movement and endosomal sorting (zerial and mcbride, 2001). such effector proteins, including , are clustered on the cytosolic face of the early endosome and stabilize the gtp-bound rab5a in an activated state (horiuchi et al., 1997). gtp-bound rab5a directly binds to early endosome antigen eea1 to regulate vesicular and endosomal tethering. eea1 contains a c-terminal rab5a-binding domain, and a phosphatidylinositol 3-phosphate-binding zinc finger domain referred to as an fyve (conserved in fab1, yotb, vac1, and eea1) domain (gaullier et al., 1998; stenmark et al., 1996).
fig. 3. protein trafficking through the endosomal-lysosomal system. cell surface receptors are internalized through clathrin-coated vesicles (ccvs) at the plasma membrane. in the cell cytoplasm, ccvs shed their coat components and fuse to produce endosomes. internalized receptors are either recycled from sorting endosomes (housekeeping receptors, e.g., transferrin receptor) or targeted for degradation within the lysosome (signaling receptors, e.g., growth factor receptors) after movement through the late endosome and multivesicular body (mvb) compartments.
overexpression of wild-type rab5a, or a constitutively active rab5a mutant, causes endosome enlargement and defective trafficking through this compartment, whereas expression of a constitutively inactive rab5a mutant leads to formation of small endosomes and decreased endocytosis (bucci et al., 1992). a family of effector proteins that accelerate gtp hydrolysis (rabgaps) has been identified: rabgap-5 binds to rab5a and regulates trafficking through the endocytic pathway. the importance of rab5a activity is further illustrated in the genetic disorder tuberous sclerosis (ts), a disease that causes tumors in the brain, eyes, heart, kidneys, lungs, and skin. ts arises when the tumor suppressor gene, tuberous sclerosis complex (tsc), is absent; introduction of the wild-type tsc2 gene into an animal model or cultured cells results in tumor suppression and reduced cellular proliferation (kobayashi et al., 1995; yeung et al., 1994). interestingly, the tsc2 gene product (tuberin) is implicated in regulating gtp/gdp exchange on rab5a, thus regulating trafficking through this endosome system (xiao et al., 1997). in chronic myelomonocytic leukemia (cmml) a genetic translocation causes fusion of the rab5a effector rabaptin-5 and the pdgfβr (magnusson et al., 2001). this chromosomal translocation results in enhanced cellular proliferation by compromising endosomal fusion and trafficking, and thus regulation of growth factor degradation. it is likely that this aberrant gene product is not degraded and triggers sustained intracellular signaling, leading to cell proliferation and tumor progression in a subset of lymphoid cells. recycling from endosomes back to the cell surface is often used by receptors that internalize nutrients such as lipoproteins and ions. receptor recycling rather than degradation conserves receptor functionality and nutrient uptake and reduces energy expenditure in the synthesis of new receptors (mukherjee et al., 1997).
genetic screens in the nematode caenorhabditis elegans identified rme-1 and delineated a new, conserved family of eps15 homology (eh) domain proteins. both the worm and mouse homologs of rme-1 are associated with the endosomal compartment: a dominant-negative rme-1 g429r mutant had little effect on receptor-mediated endocytosis but had a substantial effect on endosomal recycling, suggesting a functional role in this step. although information is currently limited, a number of neurological diseases are associated with dysfunction of early endosomal proteins. in some cases of demyelinating polyneuropathy, characterized by progressive weakening and sensory dysfunction of the legs and arms, eea1 autoantibodies have been detected (selak et al., 1999). a number of disorders, from muscular dystrophy to rheumatoid arthritis, reveal the presence of circulating anti-eea1 antibodies. interestingly, eea1 epitopes recognized by such autoantibodies varied from patient to patient (selak et al., 2003). autoantibodies against eea1 have also been detected in cases of subacute cutaneous systemic lupus erythematosus (scle), characterized by the appearance of an unsightly red rash, often occurring after sun exposure (mu et al., 1995). lysosomes are terminal, membrane-enclosed degradative compartments that interact with other organelles through vesicular transport originating from the secretory, endocytic, and autophagic pathways. this organelle stores various proteases, lipases, hydrolases, and degradative enzymes within an acidic environment that maximizes enzymatic activity and degradation. resident lysosomal membrane proteins, integral proteins, and glycoproteins are targeted to the organelle via the endosome. lysosomal proteases such as cathepsin d are processed in the golgi apparatus to add a mannose 6-phosphate (m6p) moiety to n-linked sugars.
the m6p moiety is recognized by plasma membrane or tgn-resident mannose 6-phosphate receptors (m6prs) and sorted to the late endosome and eventually the lysosome. here, the acidic ph (ph < 5.5) results in receptor-ligand dissociation and recycling of the m6pr to the tgn. fusion between the endosome and preexisting primary lysosomes allows the delivery of lysosomal resident proteins. the importance of m6p-mediated targeting of lysosomal proteins is highlighted in the human neurological disorder, i-cell disease (mucolipidosis ii), where lysosomal enzymes are secreted from cells rather than targeted to the lysosome. the defect in i-cell disease involves lack of m6p moiety addition as a result of mutations to the n-acetylglucosamine-1-phosphotransferase enzyme usually present within the golgi apparatus (ben-yoseph et al., 1987). how lysosomes are formed is still unclear (luzio et al., 2003). three mechanisms have been proposed to explain lysosomal biogenesis: vesicular transport between late endosomes and preformed primary lysosomes (griffiths and gruenberg, 1991), early endosomal "maturation" to lysosomes (murphy, 1991), or the current favored model of "kiss-and-run," in which transient interactions between endosomes and lysosomes transfer endosomal contents to the latter compartment (duclos et al., 2003; storrie and desjardins, 1996). late endosome and lysosome interactions in the kiss-and-run model are thought to be regulated by the rab7 gtpase, which is present on late endosomes; a vps complex, homologous to budding yeast vacuole fusion regulators, is also implicated in sorting and delivery to lysosomes (seals et al., 2000). the mammalian form of the vps complex interacts with syntaxin-7, a t-snare that is involved in regulating membrane dynamics along this route (kim et al., 2001).
danon disease is caused by point mutations in, or complete absence of, lysosome-associated membrane protein 2 (lamp2); these changes result in cardiomyopathy, myopathy, and mental retardation. in danon disease patients and lamp2-deficient mice, autophagic vacuoles accumulate within the cytoplasm; these vacuoles arise via intracellular engulfment of old membranes to form an autophagosome, thus sequestering membranes and proteins for eventual degradation (shintani and klionsky, 2004). autophagosomes fuse with lysosomes, leading to degradation for provision of molecules for cellular homeostasis. the accumulation of autophagic vacuoles in lamp2-deficient cells suggests that lamp2 mediates interactions between autophagosomes and lysosomes. this pathway is commonly activated during conditions of cellular stress such as starvation or pathogenic infection (kirkegaard et al., 2004). lysosomal storage diseases are caused through insufficient degradation of targeted components within lysosomes, leading to substrate accumulation and lysosome enlargement. more than 40 lysosomal storage diseases have been documented and generally manifest themselves as neurological disorders; disease severity correlates with the levels of lysosomal enzyme activity. niemann-pick disease is a neurodegenerative condition caused by sphingomyelin accumulation in reticuloendothelial cells and ganglion neurons, leading to cell death. it is classified into five types (a-e), each distinguished by either clinical severity or age-related disease phenotype. niemann-pick type a (npa) is most common, with death occurring before 3 years of age. npa patients have point mutations in the smpd1 gene that encodes a lysosomal sphingomyelinase (levran et al., 1991; takahashi et al., 1992). interestingly, in niemann-pick type c (npc) patients, endocytosed ldl particles are not fully degraded in lysosomes, leading to defects in cholesterol metabolism (li et al., 2005a).
npc disease is caused by mutations in the npc1 gene, which encodes a lysosomal resident protein with similarity to sterol-sensing enzymes and proteins (scott and ioannou, 2004). fabry disease is an x-linked condition caused by changes in lysosomal α-galactosidase activity resulting in glycosphingolipid accumulation within vascular endothelial lysosomes. this leads to angiokeratomas (a wart-like thickening of the skin), progressive renal impairment, cardiomyopathy, and cerebrovascular disease. mutations in the α-galactosidase a gene can also show reduced enzymatic activity of the encoded protein and retention within the endoplasmic reticulum (yasuda et al., 2003). receptor tyrosine kinases such as epidermal growth factor receptor (egfr) are degraded by lysosomes after ligand binding and receptor activation. egfr lysosomal targeting is dependent on ligand-stimulated ubiquitination of the cytoplasmic domain. binding of egf to egfr causes downstream signaling, clathrin-mediated internalization, and trafficking to endosomes. internalized receptor-ligand complexes are sorted to the late endosome or multivesicular bodies (mvbs), which eventually deliver their contents to the lysosome (katzmann et al., 2002). whereas other receptors such as tfr are recycled to the plasma membrane, egfr is moved through the endosome-lysosome system by a ubiquitin-dependent sorting and recognition system. components of this system include the hrs/stam heterodimer and tsg101 (bilodeau et al., 2002), which are present on endosomal membranes. the tsg101 tumor suppressor gene is mutated in nearly 50% of breast cancer patients and encodes a membrane-associated protein (lee and feinberg, 1997). this factor participates in the sorting of ubiquitinated proteins on the endosome, but its exact function is not clear. in some specialized cells, such as cytotoxic t lymphocytes (ctls), platelets, and melanocytes, regulated secretion can be routed through compartments other than the tgn.
such cells have evolved mechanisms whereby modified or secretory lysosomes release their contents at the plasma membrane in response to extracellular stimuli. these secretory lysosomes (sls) share lysosomal characteristics such as acid ph and lamp (lysosome-associated membrane proteins) residents but also contain unique markers such as tyrosinase, present in melanosomes. the ctl secretory lysosomes contain unique components such as perforin and granzymes required for triggering apoptosis in target cells. on ctl contact with a target cell, sls traffic toward the immunological synapse formed between the ctl and target cell. a signal then causes sl fusion with the ctl plasma membrane (stinchcombe et al., 2001), and release of sl contents and subsequent target cell death. a number of autosomal genetic diseases causing immunodeficiency and albinism involve defects in regulated lysosomal secretion (stinchcombe et al., 2004). in the rare, fatal disease familial hemophagocytic lymphohistiocytosis (fhl), sls congregate at the plasma membrane in ctls, where they can dock but cannot fuse with the membrane. in one group of fhl patients, this disease is due to a mutation in the gene encoding munc13-4; this is closely related to the neuronal munc13-1 gene product that is involved in snare complex formation in neuronal cells (feldmann et al., 2003). assembly of this neuronal syntaxin-1, snap-25, and synaptobrevin complex is regulated by munc18-1, which binds and locks syntaxin-1 (t-snare) in a closed, inactive conformation, thus preventing it from interacting with snap-25 (yang et al., 2000). however, munc13-1 (sassa et al., 1999) and rim (rab3a-interacting protein effector) (koushika et al., 2001) can compete with munc18-1 and displace it from syntaxin-1. this reinforces the syntaxin-1 open conformation and allows snare complex formation to occur.
munc13-1 may act as a conformational switch to shift the t-snare into an ''open'' state, thus allowing formation of the snare complex that mediates synaptic vesicle docking and fusion. from observations of fhl patients, one speculation is that munc13-4 has a role similar to that of munc13-1 in regulating snare complex formation for sl docking and fusion in ctls (yang et al., 2000). chediak-higashi syndrome (chs) is a key example of a disease affecting sl function in ctls, with patients displaying hypopigmentation (stinchcombe et al., 2004). chs patients have genetic mutations in the lyst or chs1 gene (barbosa et al., 1996; perou et al., 1996) and produce ctls containing strikingly enlarged sls that are able to polarize to the immunological synapse but are unable to fuse with the pm. this suggests a role for the chs1 gene product in regulating membrane docking and fusion. overexpression of chs1 leads to the presence of small lysosomes, indicating increased lysosomal fission (ward et al., 2003). in addition, chs1 interacts with snare proteins, further indicating a role in sl fusion (tchernev et al., 2002). griscelli syndrome patients also display defects in sl dynamics within ctls and exhibit hypopigmentation and silvery hair. in melanocytes, cells responsible for pigment storage and production, rab27a is required to recruit melanophilin to pigment granules called melanosomes (sls). melanophilin binds the myosin motor protein myosin va and regulates melanosome movement along actin cables to the plasma membrane (strom et al., 2002; wu et al., 2002). in type 1 griscelli syndrome patients, rab27a gtpase is missing or defective, whereas in type 2 griscelli syndrome patients the myosin va motor protein is absent. these defects are also evident in mouse models such as ashen (rab27a defective), dilute (myosin va defective), and leaden (melanophilin defective).
in both the human griscelli syndromes and the mouse models, melanosomes are clustered in a perinuclear location, a defect attributed to rab27a dysfunction (wilson et al., 2000; wu et al., 2001). interestingly, ctls isolated from type 1 griscelli syndrome patients and ashen mice (rab27a deficient) are unable to kill target cells, whereas type 2 griscelli syndrome patient and dilute mouse ctls are functional. this suggests that rab27a interacts with different effectors to induce sl fusion with the plasma membrane in different cell types (haddad et al., 2001). hermansky-pudlak syndrome (hps) is a fourth example of sl dysfunction and is characterized by oculocutaneous albinism, ceroid deposition, and excessive prolonged bleeding (hermansky and pudlak, 1959; swank et al., 2000). however, hps cannot be viewed as a single disease but rather as a group of at least seven autosomal genetic disorders. each of the seven subgroups (hps1-7) is due to mutations in individual genes, most of which encode components of multisubunit protein complexes involved in vesicle trafficking, whereas the function of others remains unclear. three of these complexes, termed blocs (biogenesis of lysosome-related organelle complexes), play a role in regulating trafficking involved in platelet and melanosome secretion, but their exact functions are unclear (di pietro and dell'angelica, 2005). in hps2 patients a nonsense mutation in the gene encoding the β3a subunit of the ap3 adaptor protein prevents expression of this subunit (huizing et al., 2002). as previously mentioned, ap3 is involved in the recruitment of transmembrane proteins into vesicles at the early endosome for delivery to the lysosome (peden et al., 2004). in melanocytes derived from hps2 patients, the tyrosinase that catalyzes the formation of melanin is not transported to maturing melanosomes (huizing et al., 2001). this leads to the characteristic pattern of albinism seen in patients with this condition.
furthermore, patients with hps2 display an impaired ctl response and immune response. ctls from hps2 patients have lytic granules that cannot move in an oriented fashion toward the microtubule-organizing center; therefore, when ctls are stimulated by contact with target cells, the lytic granules are not targeted to the immunological synapse for cell killing. studies on the cell biology of hiv infection have suggested the existence of a viral secretory compartment. work by marsh and others (pelchen-matthews et al., 2003) has localized viral envelope (gp120) and matrix proteins (p17) to tetraspanin-positive endosome-related organelles in infected macrophages and dendritic cells. these viral secretory compartments move from an intracellular localization to an infectious synapse when infected macrophages or dendritic cells form an immunological synapse with activated t cells. this may be one mechanism for subsequent viral infection of cd4-positive t cells, thus causing the impaired immune response seen in patients with acquired immunodeficiency syndrome (aids). cells require a highly organized framework or cytoskeleton to station and move membrane organelles within three-dimensional space. components of the cytoskeleton can guide organelles or vesicles to specific destinations within the cell. the microtubule cytoskeleton is commonly associated with the directional movement of intracellular transport vesicles or intermediates. in contrast, actin has been envisioned to have a structural role in determining cell shape, plasma membrane dynamics, and cell locomotion. however, evidence points to a role for actin in regulated trafficking from the tgn (allan et al., 2002; badizadegan et al., 2004; cobbold et al., 2004) and endocytosis (ascough, 2004; engqvist-goldstein and drubin, 2003). the cytoskeleton is a dynamic structure likened to a collapsible scaffold that can be rapidly disassembled and reconstituted depending on cellular requirements.
actin or tubulin polymerization (elongation) and depolymerization (breakdown) rely on the controlled addition or removal of monomers in a polarized and energy-dependent manner. protofilaments in both structures are polarized, with the plus end growing at a faster rate. actin cables are each composed of two parallel protofilaments that twist around each other, whereas microtubules are composed of a hollow cylindrical structure comprising 13 parallel protofilaments. actin nucleation is an initial step required for elongation, involving formation of a stable trimer subunit base for protofilament elongation. a heptameric complex termed arp2/3 (actin-related protein) binds to the ends and sides of actin filaments to nucleate and further accelerate the growth of the actin network (millard et al., 2004). the function of the arp2/3 complex can be regulated by membrane-associated rho gtpases. these regulators, which include cdc42 and various rac isoforms, act as molecular switches that cycle between an active gtp-bound state and an inactive gdp-bound state. cdc42 regulates arp2/3 indirectly through its downstream target wiskott-aldrich syndrome protein (wasp), which binds directly to the arp2/3 complex (jaffe and hall, 2005). patients with x-linked wiskott-aldrich syndrome display mutations in the wasp gene and have thrombocytopenia (reduced platelet count), eczema, recurrent infections, hematologic malignancy, and autoimmune disorders (lemahieu et al., 1999). approximately 300 disease mutations in wasp have been reported, which lead to defective control of wasp in actin polymerization and severe disease phenotypes (burns et al., 2004). wasp expression is restricted to hematopoietic cells, although the ubiquitously expressed n-wasp is present in various cells and tissues (burns et al., 2004).
the actin network is important for the formation of immunological synapses between cytotoxic t lymphocytes (ctls) and their targets, as well as t lymphocytes and antigen-presenting cells such as macrophages. the formation of the immunological synapse in ctls is essential for the transport, docking, and fusion of sls and subsequent destruction of the target cell as described above. defective wasp inhibits the formation of the immunological synapse and t cell activation (notarangelo and ochs, 2003), probably causing the immunological deficiencies observed in wiskott-aldrich syndrome patients. wasp deficiency in t lymphocytes also affects the regulation and composition of lipid rafts (dupre et al., 2002), indicating that formation of the immunological synapse is dependent on lipid rafts, wasp, and actin dynamics. motor proteins provide the physical force to move membrane vesicles along the polymerized cytoskeletal filaments via atp-dependent hydrolysis. actin-based motor proteins belong to the myosin superfamily. the myosin va gene is mutated in a small number of patients with griscelli syndrome (bahadoran et al., 2003; pastural et al., 2000) (other trafficking mutations contributing to griscelli syndrome are discussed in section v.c). mutations in the myosin viia gene can cause usher's syndrome, resulting in blindness and deafness. intracellular transport is probably compromised in usher syndrome patients; the mouse shaker model has a mutant myosin viia gene, displaying defective melanosome transport in retinal pigment epithelial cells (liu et al., 1998) and altered distribution in photoreceptor cells (richardson et al., 1997). microtubule-based transport depends on microtubule motor proteins, which actively move vesicles along the microtubules, and on microtubule-associated proteins (maps), which serve as docking molecules to bind cargo to motor proteins (gerdes and katsanis, 2005).
microtubule motor proteins belong to either the kinesin or dynein families. with the exception of the c-terminal kinesins, kinesin-based motors generally transport cargo toward the plus end of the microtubule, whereas the dyneins are minus end-directed motors. long-range vesicular transport is particularly important in neurons, where axons can reach up to 1 m in length. newly synthesized lipids, and secreted or membrane-associated proteins, are made in the cell body; long-range and directional transport is crucial for replenishing the constituents of the presynaptic cleft (at the terminal end of the axon) with synaptic vesicles and plasma membrane receptors (holzbaur, 2004). a number of human neurological diseases are linked to mutations in microtubule motors and associated proteins. lissencephaly, a greek term meaning ''smooth brain,'' causes severe brain malformation resulting in mental retardation and epilepsy. one of the genes mutated in the disease is lis1 (originally identified in miller-dieker syndrome patients with lissencephaly) (reiner et al., 1993). the lis1 protein regulates microtubule motor function by binding dynein1 and p150glued, a component of the dynactin complex that binds to and activates dynein (smith et al., 2000). it is proposed that lis1 regulates retrograde axonal transport. another gene mutated in some patients with lissencephaly is doublecortin, a microtubule-associated protein that binds tubulin and stabilizes microtubules (horesh et al., 1999; moores et al., 2004). the kif1b kinesin regulates transport of synaptic vesicle precursors along the neuronal axon. patients with charcot-marie-tooth disease type 2a display neuronal axonal degeneration due to a loss-of-function mutation in the motor domain of kif1b. in alzheimer's disease, a classical sign is hyperphosphorylated aggregates of the microtubule-associated protein, tau, in neuronal cells and tissues.
tau protein can influence vesicular transport (ebneth et al., 1998) by regulating the attachment of motors to microtubules (trinczek et al., 1999). one theory is that the tau protein can interfere with kinesin-dependent transport by blocking motor access to microtubules, thus slowing or preventing vesicle movement along axons (mandelkow et al., 2003). moreover, an early sign of alzheimer's disease is the loss of synapses and retrograde degeneration of neurons, complemented by a breakdown in intracellular transport. the disruption of microtubule-mediated vesicular trafficking may also be a causative factor of the neurodegenerative phenotype of huntington's disease. this disease is caused by expansion of polyglutamine repeats occurring in the brain-enriched protein huntingtin (htt). it has been demonstrated that htt enhances vesicular transport of brain-derived neurotrophic factor (bdnf) along microtubules (gauthier et al., 2004). htt is localized in the cytoplasm and is associated with vesicular and microtubule-based traffic through its ability to bind huntingtin-associated protein 1 (hap1) (li et al., 1995b), a protein that has affinity for the dynactin p150glued subunit (engelender et al., 1997). dysfunctional polyq-htt associated with the disease state may disrupt the transport of bdnf by binding and blocking the hap1/dynactin-mediated delivery of bdnf vesicles along microtubules (gauthier et al., 2004). this is further supported by the finding that bdnf levels are decreased in brains of huntington's disease patients (ferrer et al., 2000). the current treatment of genetic disorders involves addressing the symptoms rather than the cause. to that end, many mild forms of disorders such as niemann-pick disease, and familial hypercholesterolemia, can be controlled by diet regimens and lifestyle changes.
in contrast, a life-threatening disease such as cystic fibrosis requires extensive physiotherapy and pulmonary exercise to loosen and prevent mucus accumulation within the lungs. new antidementia drugs are increasingly successful in treating neurological disorders such as alzheimer's disease. drugs such as galantamine, donepezil, rivastigmine, and memantine target the posttranslational processing of βapp to reduce amyloid deposits (prasher, 2004). in familial hypercholesterolemia, statin treatment is a common strategy for reducing plasma ldl and cholesterol levels by targeting hmg-coa reductase, the rate-limiting enzyme in cellular cholesterol biosynthesis. furthermore, less commonly used ldl-lowering drugs such as probucol have shown some success (buckley et al., 1989). the administration of adrenalin receptor antagonists (β-blockers) to patients with the cardiac condition long-qt syndrome reduces arrhythmia risk. enzyme replacement therapy (ert) has been carried out in patients with fabry's disease, a lysosomal storage disease. patients are given recombinant lysosomal α-galactosidase (mignani and cagnoli, 2004) to reduce the risks of strokes and kidney failure associated with the condition. finally, organ transplantation is occasionally carried out for some disease states: for example, bone marrow transplants for chediak-higashi syndrome patients (liang et al., 2000), rab27a-defective patients with griscelli syndrome (schuster et al., 2001), and wiskott-aldrich syndrome patients (filipovich et al., 2001). however, although transplant operations can be successful in alleviating the immunological issues associated with these diseases, they do not address problems associated with the nervous system or pigmentation.
b. gene therapy: the next generation of medical treatment?
completion of the human genome sequencing project has given science the ability to track gene(s) responsible for potentially any genetic disorder and, as a consequence, to allow these genes to be corrected in patients. this is the goal of gene therapy research. of course, gene therapy has a fundamental limitation: it is only really suitable for single-gene defect diseases, and multigenic or chromosomal defects will be beyond the ability of the technique because of the complex nature of the disease. however, there are more than 2500 single gene defects that cause human disease, so there are many diseases requiring such approaches. the history of gene therapy is discussed in more detail by russell (1997) and scollay (2001). much effort has been made in developing techniques that allow successful replacement or augmentation of defective genes. gene therapy is performed by introducing a gene vehicle directly into the patient (in vivo) or by removing cells from a patient, introducing the gene into these cells in culture, and returning the cells to the patient (ex vivo). most studies have focused on the use of viral vectors as delivery vehicles. retroviruses are potentially the best gene delivery system (kurian et al., 2000). these rna viruses are able to infect a great many cell types and replicate by inserting their viral genes into the genome of the host. the host cellular machinery is then modulated to produce and assemble viral particles. in gene therapy, retroviruses could be used to express the target gene to be replaced but be modified to prevent viral disease (hiv, which is the causative agent of aids, is also a retrovirus). the principal drawback of a retrovirus vector is the possibility that genomic integration could elevate oncogene expression, thus causing cancer. therefore, the majority of clinical trials using retrovirus vectors have been performed ex vivo.
a ''successful'' gene therapy experiment was exemplified in the case of a 4-year-old female patient lacking adenosine deaminase (ada), which results in severe combined immunodeficiency (ada-scid) and dysfunctional t cells (blaese et al., 1995). in this case, a retroviral vector was used to deliver the coding sequence for ada into cells, resulting in successful expression of this enzyme in hitherto defective cells. although successful, it is uncertain whether enzyme replacement treatment (recombinant ada injections) also influenced the patient outcome. adenovirus (adv) (mcconnell and imperiale, 2004) is a dna virus and key gene therapy vehicle that maintains the viral genome as a separate transmissible episome within the nuclei of infected host cells. the use of attenuated or inactivated adv for human gene delivery has attracted much interest. the advantages of adv gene transfer are that its genome can easily be manipulated and recombinant virus can be grown to high titers in vitro with efficient transduction of target cells in vitro or in vivo. as adv can effectively infect nondividing cells such as those of pulmonary tissues, it is a popular vector of choice for gene therapy to treat cystic fibrosis (cf) patients. although there are promising studies (zabner et al., 1993), failures have also been noted (knowles et al., 1995). a major disadvantage of an adv-based approach is the triggering of a strong host immune response to the virus, which becomes a serious problem in subsequent long-term delivery of recombinant virus for disease alleviation. one approach to circumventing such an issue is to use a viral delivery system that produces a low host immune response such as the adeno-associated virus (aav) (flotte, 2005). aav is a nonpathogenic virus that requires coinfection with a helper virus to replicate. however, aav has broad host cell specificity and is difficult to grow in large quantities, probably because of its reliance on a helper virus.
finally, nonviral methods are increasingly available for the delivery of dna constructs directly into cells and tissues. these are often lipid-based reagents (e.g., liposomes) that bind to the plasmid dna and fuse with the plasma membrane, thus enabling cytosolic delivery of the gene. the plasmid dna would then be transported into the host nucleus by endogenous cellular machinery. this type of gene delivery can only be performed ex vivo and can be limited by the poor dna transfection efficiency of primary cells or tissues. this approach, however, is potentially useful for delivering genes into progenitors or precursors (e.g., stem cells) before cellular differentiation and tissue formation within a particular microenvironment in the body (mendell et al., 1995). numerous disease states are caused by protein misfolding within the er, leading to degradation (table i; sections iii.a.1 and iii.a.2). one strategy would be to promote the correct protein conformation in a mutant gene product either chemically or pharmacologically. a number of membrane-diffusible chemical and pharmacological ''chaperones'' have been identified that could affect protein folding in cells. chemical chaperones such as glycerol and trimethylamine n-oxide (tmao) can restore the wild-type trafficking and activity of cftrΔf508 in cultured epithelial cells (brown et al., 1996), and porcine kidney epithelial cells expressing cftrΔf508 and treated with dimethyl sulfoxide (dmso) showed increased plasma membrane levels of the channel protein (bebok et al., 1998). loo et al. (2005) have demonstrated that a novel quinazoline derivative specific for cftr will rescue the defective trafficking of cftrΔf508 in cultured cells. cell surface levels of a water channel, aquaporin-2, can be enhanced with dmso (tamarappoo and verkman, 1998).
defects in the aquaporin-2 gene can result in nephrogenic diabetes insipidus, a condition in which patients are unable to concentrate their urine because of an inability to reabsorb water from the kidneys into the blood. although chemical chaperones are somewhat nonspecific in their action (the protein folding of the whole cell is affected and not just the target protein), pharmacological chaperones can be tailored to individual proteins. for example, the compound sr121463a is a nonpeptide vasopressin v2 receptor antagonist (morello et al., 2000; robert et al., 2005). patients with a mutant vasopressin v2 receptor can also display nephrogenic diabetes insipidus. on treatment, the cell-permeant sr121463a compound would act as a chaperone and accompany the mutant v2 receptor to the cell surface to rescue correct functionality. geldanamycin, a naturally occurring antifungal agent, has potential as an anticancer drug (beliakov and whitesell, 2004; miyata, 2005). geldanamycin interacts with and inhibits activity of the heat shock protein and chaperone hsp90, a cytosolic cellular stress protein that supports the correct folding, stability, and function of ''client'' proteins. many hsp90 client proteins are implicated in cell cycle progression, proliferation, and angiogenesis (whitesell and lindquist, 2005). the erbb2 tyrosine kinase complex is implicated in regulation and development of epithelial breast tumors and is an hsp90 client. inhibition of hsp90 action by geldanamycin results in degradation of both erbb2 and downstream signaling effectors, resulting in reduced cellular growth and tumor formation (citri et al., 2004). a number of cell-permeable peptide sequences found in viruses and host proteins have been discovered that mediate the delivery of cargo (proteins, drugs, plasmid dna, oligonucleotides) directly into cells (brooks et al., 2005; gupta et al., 2005; schwartz and zhang, 2000).
such peptide sequences could be fused or attached to recombinant or engineered proteins and administered to patients to complement defects of a particular gene product. for example, the drosophila melanogaster antennapedia homeodomain (antp) transcription factor contains a short 16-residue sequence that mediates protein translocation across biological membrane bilayers in an energy-independent manner (derossi et al., 1994; joliot et al., 1991). other sources of cell membrane-permeable proteins have been uncovered in viruses. the hiv-1 replication protein tat contains a basic, arginine- and lysine-rich peptide sequence (residues 47-57) that modulates the translocation of exogenous tat across the plasma membrane in a number of cell types, and is able to activate intracellular genes controlled by an hiv promoter (frankel and pabo, 1988; mann and frankel, 1991). this basic 10-residue sequence can internalize conjugated β-galactosidase and horseradish peroxidase (fawell et al., 1994) as well as a fab antibody fragment (anderson et al., 1993). the major structural protein of herpesvirus (hsv-1), vp22, can traffic between cells (elliott and o'hare, 1997), whereas the pres-2 domain of hepatitis b virus surface antigen acts as a shuttle for peptides and functional proteins (such as egfp) in hepatocytes and other cells (oess and hildt, 2000), suggesting further the existence of naturally occurring peptide sequences that may act as drug delivery vectors. finally, a ''synthetic'' amphipathic peptide, fluos-klalklalkalkaalkla-nh2, has been shown to be internalized in mast and endothelial cells (oehlke et al., 1998). the employment of small molecular inhibitors as a method of treating human disease has moved at exponential pace. a number of compounds have been synthesized or isolated from nonhuman organisms that directly affect cellular function and have been used in research on a variety of diseases.
plant- and microorganism-derived polyhydroxylated alkaloids referred to as iminosugars have been used in the treatment of patients with gaucher disease (cox et al., 2000). gaucher disease (types i and ii) is a lysosomal storage disorder caused by a mutation in the gene encoding the acid β-glucosidase (gba) enzyme and results in the accumulation of toxic glucosylceramide in a patient's spleen, liver, and bones, manifesting itself in enlargement of these organs, as well as heart and lung disease. iminosugars act on glycosylating enzymes present within the er and golgi and inhibit their ability to transfer sugar moieties onto proteins. one member of the iminosugar family, n-butyldeoxynojirimycin (nb-dnj; also called miglustat or zavesca), inhibits the enzyme important in the maturation of the gba substrate glucocerebroside, namely ceramide glucosyltransferase (cgt) (butters et al., 2003). inhibition of cgt has resulted in significantly reduced levels of glucocerebroside in the liver and spleen of patients in clinical trials (cox et al., 2000). however, nearly 80% of patients in the trials displayed osmotic diarrhea as a side effect of the treatment. in a mouse model for human tay-sachs disease, which is caused by a mutation in the gene encoding hexosaminidase a, levels of the harmful glycosphingolipid gm2 were significantly reduced on treatment with nb-dnj (platt et al., 1997). in addition to the treatment of lysosomal storage diseases, an nb-dnj derivative called miglitol has been used in the treatment of diabetes mellitus, resulting in reduced activity of the sucrase-isomaltase enzyme complex and reduction of carbohydrate digestion (mitrakou et al., 1998). a major aspect of human disease is the production and subsequent degradation of misfolded proteins, either by the proteasome or within the lysosome.
lysosomotropic agents such as chloroquine cause an increase in the intralumenal ph of endosomes and lysosomes, reducing lysosomal protease activities and the trafficking of proteins through the endosome-lysosome system. a number of proteasome inhibitors such as mg132, lactacystin, and alln can specifically inhibit the activity of a range of serine and cysteine proteases and chymotrypsin-like enzymes (kisselev and goldberg, 2001). proteasome inhibition has been linked with a number of aspects of human disease. treatment of endothelial cells with proteasome inhibitors resulted in apoptosis of proliferating cells (drexler et al., 2000) and inhibition of plasminogen activator levels; this factor promotes angiogenesis and new blood vessel sprouting (oikawa et al., 1998). however, inhibiting proteasome function has broad cytotoxic and apoptotic effects in cells and tissues. chemotherapeutic agents targeting signaling pathways are currently of much interest in relation to cancer therapeutics. cellular proliferation can be regulated by growth factor binding to a cell surface receptor and signaling through either the mitogen-activated protein kinase (mapk) or phosphoinositide-3-kinase (pi3k) cascades. activation of these pathways induces the expression of oncogenes such as c-jun and c-fos and inhibits apoptosis through a sequence of protein phosphorylation events. shelton et al. (2003) demonstrated that inhibition of the mapk pathway with small molecule inhibitors specific for raf-1 or mek reduced cell proliferation and induced apoptosis in conditionally transformed hematopoietic cells. however, such pathways also regulate other cellular functions besides proliferation or apoptosis and there are likely to be consequences for cellular homeostasis. structural studies are important in the development of new small molecule inhibitors that target specific enzymes and regulators.
c-akt (pkb) is a serine/threonine protein kinase required for survival and proliferation in many human cancers, and its structure has been elucidated (kumar and madison, 2005; yang et al., 2002; and references therein). chemotherapeutic agents have consequently been designed that inhibit c-akt activity; molecules such as h-89 target the atp-binding pocket in c-akt (kumar and madison, 2005). compounds that bind specifically to c-akt isoforms or target specific domains within the kinase have been reported (barnett et al., 2005), but there are no reports of clinical trials with such compounds (kumar and madison, 2005). finally, small molecule inhibitors are being developed to target the posttranslational processing of proteins or peptides implicated in human disease. the enzyme that catalyzes the initial steps in β-amyloid synthesis, γ-secretase, is an attractive target for prevention of amyloid deposits in alzheimer's disease patients (churcher and beher, 2005). such small molecule inhibitors could also be used to treat pathogenic infections such as those caused by severe acute respiratory syndrome (sars), influenza, hiv, or hepatitis c viruses. attractive targets are virus-encoded or host proteases required for processing of viral proteins to generate infectious virus particles from the host cell. in the case of the sars virus, a viral chymotrypsin-like cysteine protease is responsible for processing sars viral proteins required for viral replication. inhibition of this protease would effectively inhibit viral replication. a molecule referred to as cs11 was found to inhibit the replication of human sars with no toxic effect on normal cells (dooley et al., 2006). much work is also being carried out in targeting host proteases required for the processing of hiv envelope glycoproteins by the biosynthetic secretory pathway to generate viral gp120 and gp41 polypeptides.
the completion of the human genome sequencing project has led to the prediction that a large number of diseases will be identified and understood at the gene level. as noted in this review, a number of examples exist in which a single gene mutation can have devastating effects on human function. at present, the symptoms of some mild forms of genetic diseases can be modulated through diet or drug regimens, and some success has been achieved with organ transplantation. gene therapy has attracted much attention but has suffered setbacks due to viral toxicity issues. an alternative strategy is the use of small molecule therapeutics, which may override specific defects or target specific pathways to compensate for gene defect(s). in addition, our understanding of how we respond at a genetic level to pathological infection will enable us to design effective drug strategies against presently chronic infections. in essence, understanding the cell biological basis for human diseases will enable us to design effective methods to deliver therapeutic strategies to patients.
caveolin-1 and caveolin-2, together with three bone morphogenetic protein-related genes, may encode novel tumor suppressors down-regulated in sporadic follicular thyroid carcinogenesis endocytosis of the glucose transporter glut4 is mediated by the gtpase dynamin motoring around the golgi the p115-interactive proteins gm130 and giantin participate in endoplasmic reticulum-golgi traffic cftr and chaperones: processing and degradation tumor cell retention of antibody fab fragments is enhanced by an attached hiv tat protein-derived peptide endocytosis: actin in the driving seat trafficking of cholera toxin-ganglioside gm1 complex into golgi and induction of toxicity depend on actin cytoskeleton comment on elejalde syndrome and relationship with griscelli syndrome identification of the homologous beige and chediak-higashi syndrome genes coupled er to golgi transport reconstituted with purified cytosolic proteins the akt/pkb family of protein kinases: a review of small molecule inhibitors and progress towards target validation the adaptor protein ap-4 as a component of the clathrin coat machinery: a morphological study role of diacylglycerol in pkd recruitment to the tgn and protein transport to the plasma membrane loss of function associated with novel mutations of the scn5a gene in patients with brugada syndrome disassembly and reassembly of the golgi apparatus activation of deltaf508 cftr in an epithelial monolayer hsp90: an emerging target for breast cancer therapy snares and the specificity of transport vesicle targeting mutations associated with neutropenia in dogs and humans disrupt intracellular transport of neutrophil elastase altered molecular size of n-acetylglucosamine 1-phosphotransferase in i-cell disease and pseudo-hurler polydystrophy congenital and acquired neutropenia the role of regulated cftr trafficking in epithelial secretion phosphatidic acid formation by phospholipase d is required for transport from the endoplasmic reticulum to the golgi complex
degradation of subunits of the sec61p complex, an integral component of the er membrane, by the ubiquitin-proteasome pathway role of cue1p in ubiquitination and degradation at the er surface the vps27p hse1p complex binds ubiquitin and mediates endosomal protein sorting t lymphocyte-directed gene therapy for ada-scid: initial trial results after 4 years regulation of innate immunity by rho gtpases signals for sorting of transmembrane proteins to endosomes and lysosomes caveolin-1 in breast cancer tat peptide-mediated cellular delivery: back to basics chemical chaperones correct the mutant phenotype of the delta f508 cystic fibrosis transmembrane conductance regulator protein the small gtpase rab5 functions as a regulatory factor in the early endocytic pathway probucol. a reappraisal of its pharmacological properties and therapeutic use in hypercholesterolaemia the wilson disease gene is a putative copper transporting p-type atpase similar to the menkes gene coiled coils: a highly versatile protein folding motif mechanisms of wasp-mediated hematologic and immunologic disease a step forward for stiff-person syndrome small-molecule therapeutics for the treatment of glycolipid lysosomal storage disorders hepatic endoplasmic reticulum storage diseases initial docking of er-derived vesicles requires uso1p and ypt1p but is independent of snare proteins a role for cbs domain 2 in trafficking of chloride channel clc-5 a "de novo" point mutation of the low-density lipoprotein receptor gene in an italian subject with primary hypercholesterolemia actin dynamics during phagocytosis adpkd: a human disease altering golgi function and basolateral exocytosis in renal epithelia localization of low molecular weight gtp binding proteins to exocytic and endocytic compartments multiple forms of dynamin are encoded by shibire, a drosophila gene involved in endocytosis function of rho family proteins in actin dynamics during phagocytosis and engulfment lowe syndrome protein ocrl1 interacts
with clathrin and regulates protein trafficking between endosomes and the trans-golgi network gamma-secretase as a therapeutic target for the treatment of alzheimer's disease the achilles heel of erbb-2/her2: regulation by the hsp90 chaperone machine and potential for pharmacological intervention lytic granules, secretory lysosomes and disease adaptor protein 3-dependent microtubule-mediated movement of lytic granules to the immunological synapse novel membrane traffic steps regulate the exocytosis of the menkes disease atpase the menkes disease atpase (atp7a) is internalized via a rac1-regulated, clathrin- and caveolae-independent pathway actin and microtubule regulation of trans-golgi network architecture, and copper-dependent protein transport to the cell surface role of caveolae and caveolins in health and disease the human genome project: lessons from large-scale biology rab and arf gtpase regulation of exocytosis novel oral treatment of gaucher's disease with n-butyldeoxynojirimycin (ogt 918) to decrease substrate biosynthesis copper transporting p-type atpases and human disease defective intracellular transport and processing of oa1 is a major cause of ocular albinism type 1 involvement of clathrin-mediated endocytosis in human immunodeficiency virus type 1 entry fc receptor biology the molecular basis of alanine:glyoxylate aminotransferase mistargeting: the most common single cause of primary hyperoxaluria type 1 a role of amphiphysin in synaptic vesicle endocytosis suggested by its binding to dynamin in nerve terminals the synaptic vesicle-associated protein amphiphysin is the 128-kd autoantigen of stiff-man syndrome with breast cancer a novel form of hereditary myeloperoxidase deficiency linked to endoplasmic reticulum/proteasome degradation the low lysine content of ricin a chain reduces the risk of proteolytic degradation after translocation from the endoplasmic reticulum to the cytosol low-density lipoprotein receptor-its structure, function, and mutations
association of the ap-3 adaptor complex with clathrin ap-4, a novel protein complex related to clathrin adaptors altered trafficking of lysosomal proteins in hermansky-pudlak syndrome due to mutations in the beta 3a subunit of the ap-3 adaptor rhodopsin c terminus, the site of mutations causing retinal disease, regulates trafficking by binding to adp-ribosylation factor 4 (arf4) the third helix of the antennapedia homeodomain translocates through biological membranes rab geranylgeranyl transferase alpha mutation in the gunmetal mouse reduces rab prenylation and platelet synthesis ubiquitination is required for the retrotranslocation of a short-lived luminal endoplasmic reticulum glycoprotein to the cytosol for degradation by the proteasome mutations in the mdr3 gene cause progressive familial intrahepatic cholestasis the coiled-coil membrane protein golgin-84 is a novel rab effector required for golgi ribbon formation the cell biology of hermansky-pudlak syndrome: recent advances from genome to drug lead: identification of a small-molecule inhibitor of the sars virus cooperation of ggas and ap-1 in packaging mprs at the trans-golgi network inhibition of proteasome function induces programmed cell death in proliferating endothelial cells the t(10;11)(p13;q14) in the u937 cell line results in the fusion of the af10 gene and calm, encoding a new member of the ap-3 clathrin assembly protein family remodeling of endosomes during lysosome biogenesis involves "kiss and run" fusion events regulated by rab5 er-to-golgi transport: cop i and cop ii function (review) hyperinsulinism in infancy: from basic science to clinical disease wiskott-aldrich syndrome protein regulates lipid raft dynamics during immunological synapse formation caveolae and sorting in the trans-golgi network of epithelial cells overexpression of tau protein inhibits kinesin-dependent trafficking of vesicles, mitochondria, and endoplasmic reticulum: implications for alzheimer's disease a novel gene for
autosomal dominant stargardt-like macular dystrophy with homology to the sur4 protein family intercellular trafficking and protein delivery by a herpesvirus structural protein cisternal maturation and vesicle transport: join the band wagon! (review) huntingtin-associated protein 1 (hap1) interacts with the p150glued subunit of dynactin genes encoding human caveolin-1 and -2 are co-localized to the d7s522 locus (7q31.1), a known fragile site (fra7g) that is frequently deleted in human cancers actin assembly and endocytosis: from yeast to mammals accelerated transport and maturation of lysosomal alpha-galactosidase a in fabry lymphoblasts by an enzyme inhibitor lowe syndrome protein ocrl1 interacts with rac gtpase in the trans-golgi network tat-mediated delivery of heterologous proteins into cells munc13-4 is essential for cytolytic granules fusion and is mutated in a form of familial hemophagocytic lymphohistiocytosis (fhl3) brain-derived neurotrophic factor in huntington disease caveolae and intracellular trafficking of cholesterol impact of donor type on outcome of bone marrow transplantation for wiskott-aldrich syndrome: collaborative study of the international bone marrow transplant registry and the national marrow donor program adeno-associated virus-mediated gene transfer for lung diseases the ap-1a and ap-1b clathrin adaptor complexes define biochemically and functionally distinct membrane domains the endoplasmic reticulum as a site of protein degradation de novo formation of caveolae in lymphocytes by expression of vip21-caveolin identification of a di-leucine motif within the c terminus domain of the menkes disease protein that mediates endocytosis from the plasma membrane cellular uptake of the tat protein from human immunodeficiency virus induction of p38 mitogen-activated protein kinase reduces early endosome autoantigen 1 (eea1) recruitment to phagosomal membranes caveolin-3 null mice show a loss of caveolae, changes in the microdomain distribution of the
dystrophin-glycoprotein complex, and t-tubule abnormalities phenotypic behavior of caveolin-3 mutations that cause autosomal dominant limb girdle muscular dystrophy (lgmd-1c). retention of lgmd-1c caveolin-3 mutants within the golgi complex hiv-1 trafficking to the dendritic cell-t-cell infectious synapse uses a pathway of tetraspanin sorting to the immunological synapse structural basis of fabry disease fyve fingers bind ptdins huntingtin controls neurotrophic support and survival of neurons by enhancing bdnf vesicular transport along microtubules microtubule transport defects in neurological and ciliary disease wilson disease the alpha chain of the ap-2 adaptor is a clathrin binding subunit intracellular transport and sorting of the oligodendrocyte transmembrane proteolipid protein disrupted proteolipid protein trafficking results in oligodendrocyte apoptosis in an animal model of pelizaeus-merzbacher disease evidence that rme-1, a conserved c. elegans eh-domain protein, functions in endocytic recycling the arguments for pre-existing early and late endosomes intracellular delivery of large molecules and small particles by cell-penetrating proteins and peptides a gtpase-activating protein controls rab5 function in endocytic trafficking defective granule exocytosis in rab27a-deficient lymphocytes from ashen mice synaptojanin 1: localization on coated endocytic intermediates in nerve terminals and interaction of its 170 kda isoform with eps15 mutation and aberrant expression of caveolin-1 in human oral squamous cell carcinomas and oral cancer cell lines trafficking, turnover and membrane topology of prp charcot-marie-tooth neuropathy type 1b is associated with mutations of the myelin p0 gene invasion activating caveolin-1 mutation in human scirrhous breast cancers identification and functional analysis of a caveolin-3 mutation associated with familial hypertrophic cardiomyopathy accumulating evidence suggests that several ab-toxins subvert the endoplasmic
reticulum-associated protein degradation pathway to enter target cells the delta f508 mutation shortens the biochemical half-life of plasma membrane cftr in polarized epithelial cells dynamin-mediated internalization of caveolae albinism associated with hemorrhagic diathesis and unusual pigmented reticular cells in the bone marrow: report of two cases with histochemical studies effects of mutant rat dynamin on endocytosis characterization of a fourth adaptor-related protein complex missense mutation (c1263r) in the thyroglobulin gene causes congenital goiter with mild hypothyroidism by impaired intracellular transport a novel leukocyte adhesion deficiency caused by expressed but nonfunctional beta2 integrins mac-1 and lfa-1 caveolae: stable membrane domains with a potential for internalization doublecortin, a stabilizer of microtubules a novel rab5 gdp/gtp exchange factor complexed to rabaptin-5 links nucleotide exchange to effector recruitment and function hereditary neutropenia: dogs explain human neutrophil elastase mutations ap-3 mediates tyrosinase but not trp-1 trafficking in human melanocytes nonsense mutations in adtb3a cause complete deficiency of the β3a subunit of adaptor complex-3 and severe hermansky-pudlak syndrome type 2 the leaden gene product is required with rab27a to recruit myosin va to melanosomes in melanocytes caveolae and lipid rafts: g protein-coupled receptor signaling microdomains in cardiac myocytes rho gtpases: biochemistry and biology a common w556s mutation in the ldl receptor gene of danish patients with familial hypercholesterolemia encodes a transport-defective protein antennapedia homeobox peptide regulates neural morphogenesis identification and molecular characterisation of a calm-af10 fusion in acute megakaryoblastic leukaemia mutations in a sar1 gtpase of copii vesicles are associated with lipid absorption disorders substitution of arginine for histidine at position 209 in the alpha-subunit of the human insulin receptor.
a mutation that impairs receptor dimerization and transport of receptors to the cell surface distinct sets of sec genes govern transport vesicle formation and fusion early in the secretory pathway metabolic and molecular bases of menkes disease and occipital horn syndrome receptor downregulation and multivesicular-body sorting stress signaling from the lumen of the endoplasmic reticulum: coordination of gene transcriptional and translational controls positionally cloned gene for a novel glomerular protein-nephrin-is mutated in congenital nephrotic syndrome endocrinopathies in the family of endoplasmic reticulum (er) storage diseases: disorders of protein trafficking and the role of er molecular chaperones a conditional mutation affecting localization of the menkes disease copper atpase. suppression by copper supplementation molecular characterization of mammalian homologues of class c vps proteins that interact with syntaxin-7 three ways to make a vesicle cellular autophagy: surrender, avoidance and subversion by microorganisms proteasome inhibitors: from research tools to drug candidates a cholesterol-lowering gene maps to chromosome 13q a controlled study of adenoviral-vector-mediated gene transfer in the nasal epithelium of patients with cystic fibrosis a germline insertion in the tuberous sclerosis (tsc2) gene gives rise to the eker rat model of dominantly inherited cancer disappearance and reformation of synaptic vesicle membrane upon transmitter release observed under reversible blockage of membrane retrieval possible temperature-dependent blockage of synaptic vesicle recycling induced by a single gene mutation in drosophila a post-docking role for active zone protein rim hiv interaction with endosomes in macrophages and dendritic cells organization of the pronephric filtration apparatus in zebrafish requires nephrin, podocin and the ferm domain protein mosaic eyes cdc42 controls secretory and endocytic transport to the basolateral plasma membrane of mdck cells
copii-cargo interactions direct protein sorting into er-derived transport vesicles akt crystal structure and akt-specific inhibitors defective human ether-a-go-go-related gene trafficking linked to an endoplasmic reticulum retention signal in the c terminus retroviral vectors vip21, a 21-kd membrane protein is an integral component of trans-golgi-network-derived transport vesicles regulation of vascular endothelial growth factor receptor-2 activity by caveolin-1 and plasma membrane cholesterol intracellular localization and loss of copper responsiveness of mnk, the murine homologue of the menkes protein, in cells from blotchy (mo blo) and brindled (mo br) mouse mutants interleukin 2 receptors and detergent-resistant membrane domains define a clathrin-independent endocytic pathway aberrant splicing but not mutations of tsg101 in human breast cancer tumor cell growth inhibition by caveolin re-expression in human breast cancer cells bi-directional protein transport between the er and golgi a missense mutation in the low density lipoprotein receptor gene causes familial hypercholesterolemia in sephardic jews novel mutations in the wiskott-aldrich syndrome protein gene and their effects on transcriptional, translational, and clinical phenotypes niemann-pick disease: a frequent missense mutation in the acid sphingomyelinase gene of ashkenazi jewish type a and b patients evidence for a regulated interaction between heterotrimeric g proteins and caveolin a huntingtin-associated protein enriched in brain with implications for pathology expression and characterization of recombinant caveolin hermansky-pudlak syndrome type 7 (hps-7) results from mutant dysbindin molecular, anatomical, and biochemical events associated with neurodegeneration in mice with niemann-pick type c disease the listeria protein internalin b mimics hepatocyte growth factor-induced receptor trafficking bone marrow transplantation from an hla-matched unrelated donor for treatment of chediak-higashi syndrome
familial sjogren's syndrome with associated primary salivary gland lymphoma protein kinase d regulates the fission of cell surface destined transport carriers from the trans-golgi network rme-1 regulates the distribution and function of the endocytic recycling compartment in mammalian cells caveolae, caveolin and caveolin-rich membrane domains: a signalling hypothesis organized endothelial cell surface signal transduction in caveolae distinct from glycosylphosphatidylinositol-anchored protein microdomains mutant myosin viia causes defective melanosome distribution in the rpe of shaker-1 mice rescue of deltaf508 and other misprocessed cftr mutants by a novel quinazoline compound organic-aciduria, decreased renal ammonia production, hydrophthalmos, and mental retardation; a clinical entity structure and function of the lowe syndrome protein ocrl1 autoantigen golgin-97, an effector of arl1 gtpase, participates in traffic from the endosome to the trans-golgi network functional evaluation of dent's disease-causing mutations: implications for clc-5 channel trafficking and internalization regulation of protein transport from the golgi complex to the endoplasmic reticulum by cdc42 and n-wasp the beta-appendages of the four adaptor-protein (ap) complexes: structure and binding properties, and identification of sorting nexin 9 as an accessory protein to ap-2 function and regulation of the mammalian copper-transporting atpases: insights from biochemical and cell biological approaches membrane dynamics and the biogenesis of lysosomes rabaptin-5 is a novel fusion partner to platelet-derived growth factor beta receptor in chronic myelomonocytic leukemia kdel-cargo regulates interactions between proteins involved in copi vesicle traffic: measurements in living cells using fret purification of a novel class of coated vesicles mediating biosynthetic protein transport through the golgi stack clogging of axons by tau, inhibition of axonal traffic and starvation of synapses endocytosis and
targeting of exogenous hiv-1 tat protein a novel point mutation in cd18 causing the expression of dysfunctional cd11/cd18 leucocyte integrins in a patient with leucocyte adhesion deficiency (lad) copii-coated vesicle formation reconstituted with purified coat proteins and chemically defined liposomes altered trafficking and adhesion function of mpz mutations and phenotypes of charcot-marie-tooth disease 1b biology of adenovirus and its use as a vector for gene therapy the putative tumor suppressors ext1 and ext2 form a stable complex that accumulates in the golgi apparatus and catalyzes the synthesis of heparan sulfate congenital hypothyroid goiter with deficient thyroglobulin. identification of an endoplasmic reticulum storage disease with induction of molecular chaperones mutations in rab27a cause griscelli syndrome associated with haemophagocytic syndrome myoblast transfer in the treatment of duchenne's muscular dystrophy enzyme replacement therapy in fabry's disease: recent advances and clinical applications signalling to actin assembly via the wasp (wiskott-aldrich syndrome protein)-family proteins and the arp2/3 complex mutations in the caveolin-3 gene cause autosomal dominant limb-girdle muscular dystrophy long-term effectiveness of a new alpha-glucosidase inhibitor (bay m1099-miglitol) in insulin-treated type 2 diabetes mellitus genetic disorders affecting proteins of iron and copper metabolism: clinical implications hsp90 inhibitor geldanamycin and its derivatives as novel cancer chemotherapeutic agents mechanism of microtubule stabilization by doublecortin identification and functional analysis of two novel mutations in the multidrug resistance protein 2 gene in israeli patients with dubin-johnson syndrome pharmacological chaperones: a new twist on receptor folding a conserved clathrin assembly motif essential for synaptic vesicle endocytosis eea1, an early endosome-associated protein.
eea1 is a conserved alpha-helical peripheral membrane protein flanked by cysteine "fingers" and contains a calmodulin-binding iq motif localization of proteins to the golgi apparatus vip21/caveolin is a cholesterol-binding protein maturation models for endosome and lysosome biogenesis cdc42 regulates the exit of apical and basolateral proteins from the trans-golgi network consistent detection of calm-af10 chimaeric transcripts in haematological malignancies with t(10;11)(p13;q14) and identification of novel transcripts molecular analysis of the ergic-53 gene in 35 families with combined factor v-factor viii deficiency vascular endothelial growth factor (vegf) and its receptors the trojan horse: survival tactics of pathogenic mycobacteria in macrophages snares and membrane fusion in the golgi apparatus mutations in the er-golgi intermediate compartment protein ergic-53 cause combined deficiency of coagulation factors v and viii regulation of phagocytosis by rho gtpases urinary megalin deficiency implicates abnormal tubular endocytic function in fanconi syndrome wiskott-aldrich syndrome: a model for defective actin reorganization, cell trafficking and synapse formation identification of 23 complementation groups required for post-translational events in the yeast secretory pathway cellular uptake of an alpha-helical amphipathic model peptide with the potential to deliver polar compounds into the cell interior non-endocytically novel cell permeable motif derived from the pres2-domain of hepatitis-b virus surface antigens dynamin at the neck of caveolae mediates their budding to form transport vesicles by gtp-driven fission from the plasma membrane of endothelium the proteasome is involved in angiogenesis role of rab gtpases in membrane traffic the evolving role of lipid rafts and caveolae in g protein-coupled receptor signaling: implications for molecular pharmacology the structure and function of the beta 2-adaptin appendage domain phosphatidylinositol phosphate
5-kinase iβ recruits ap-2 to the plasma membrane and regulates rates of constitutive endocytosis fine structure of blood capillaries intracellular aspects of the process of protein synthesis two genes are responsible for griscelli syndrome at the same 15q21 locus localization and processing of cln3, the protein associated to batten disease: where is it and what does it do? localization of the ap-3 adaptor complex defines a novel endosomal exit site for lysosomal membrane proteins infectious hiv-1 assembles in late endosomes in primary macrophages snares and the specificity of transport vesicle targeting caveolar endocytosis of simian virus 40 reveals a new two-step vesicular-transport pathway to the er caveolin-stabilized membrane domains as multifunctional transport and sorting devices in endocytic membrane traffic alpha1-antitrypsin deficiency: liver disease associated with retention of a mutant secretory glycoprotein in the endoplasmic reticulum alpha-1-antitrypsin deficiency: diagnosis and treatment identification of the murine beige gene by yac complementation and positional cloning the menkes protein (atp7a; mnk) cycles via the plasma membrane both in basal and elevated extracellular copper using a c-terminal dileucine endocytic signal participation of the endoplasmic reticulum chaperone calnexin (p88, ip90) in the biogenesis of the cystic fibrosis transmembrane conductance regulator prevention of lysosomal storage in tay-sachs mice treated with n-butyldeoxynojirimycin correction of a mineralization defect by overexpression of a wild-type cdna for col1a1 in marrow stromal cells (mscs) from a patient with osteogenesis imperfecta: a strategy for rescuing mutations that produce dominant-negative protein defects constitutive protein secretion from the trans-golgi network to the plasma membrane review of donepezil, rivastigmine, galantamine and memantine for the treatment of dementia in alzheimer's disease in adults with down syndrome: implications for the
intellectual disability population protein kinase d-mediated anterograde membrane trafficking is required for fibroblast motility gene replacement reveals that p115/snare interactions are essential for golgi biogenesis constitutive skipping of alternatively spliced exon 10 in the atp7a gene abolishes golgi localization of the menkes protein and produces the occipital horn syndrome trafficking and folding defects in hereditary spherocytosis mutants of the human red cell anion exchanger caveolin-2-deficient mice show evidence of severe pulmonary dysfunction without disruption of caveolae isolation of a miller-dicker lissencephaly gene containing g protein β-subunit-like repeats myosin viia is required for aminoglycoside accumulation in cochlear hair cells endophilin/sh3p4 is required for the transition from early to late stages in clathrin-mediated synaptic vesicle endocytosis mechanisms of cell-surface rerouting of an endoplasmic reticulum-retained mutant of the vasopressin v1b/v3 receptor by a pharmacological chaperone adaptor-related proteins adaptable adaptors for coated vesicles surfing the sec61 channel: bidirectional protein translocation across the er membrane yolk protein uptake in the oocyte of the mosquito aedes aegypti caveolin, a protein component of caveolae membrane coats mechanisms of intracellular protein transport protein sorting by transport vesicles early stages of influenza virus entry into mv-1 lung cells: involvement of dynamin science medicine, and the future.
gene therapy endoplasmic reticulum storage diseases calm-af10 fusion gene in leukemias: simple and inversion-associated translocation (10;11) assembly of the er to golgi snare complex requires uso1p signal transducing molecules and glycosyl-phosphatidylinositol-linked proteins form a caveolin-rich insoluble complex in mdck cells regulation of the unc-18-caenorhabditis elegans syntaxin complex by unc-13 visualization of er-to-golgi transport in living cells reveals a sequential mode of action for copii and copi identification, sequence, and expression of caveolin-2 defines a caveolin gene family an enzyme that removes clathrin coats: purification of an uncoating atpase endophilin i mediates synaptic vesicle formation by transfer of arachidonate to lysophosphatidic acid endoplasmic reticulum-localized amyloid beta-peptide is degraded in the cytosol by two distinct degradation pathways endothelial caveolae have the molecular transport machinery for vesicle budding, docking, and fusion including vamp, nsf, snap, annexins, and gtpases griscelli syndrome: report of the first peripheral blood stem cell transplant and the role of mutations in the rab27a gene as an indication for bmt peptide-mediated cellular delivery gene therapy: a brief overview of the past, present, and future phagosome maturation: a few bugs in the system the npc1 protein: structure implies function rab gtpases, intracellular traffic and disease a ypt/rab effector complex containing the sec1 homolog vps33p is required for homotypic vacuole fusion the role of the tethering proteins p115 and gm130 in transport through the golgi apparatus in vivo early endosome antigen.
1: an autoantigen associated with neurological diseases identification of the b-cell epitopes of the early endosome antigen 1 (eea1) cloning and gene defects in microsomal triglyceride transfer protein associated with abetalipoproteinaemia diverential evects of kinase cascade inhibitors on neoplastic and cytokine-mediated cell proliferation chediak-higashi syndrome: a rare disorder of lysosomes and lysosome related organelles autophagy in health and disease: a double-edged sword golgins and gtpases, giving identity and structure to the golgi apparatus a role for the vesicle tethering protein, p115, in the postmitotic stacking of reassembling golgi cisternae in a cell-free system sequential tethering of golgins and catalysis of snarepin assembly by the vesicle-tethering protein p115 transcytosis of plasma macromolecules in endothelial cells: a cell biological survey regulation of cytoplasmic dynein behaviour and microtubule organization by mammalian lis1 snap receptors implicated in vesicle targeting and fusion paraneoplastic stiv-person syndrome: passive transfer to rats by means of igg antibodies to amphiphysin a role for giantin in docking copi vesicles to golgi membranes phenotypic behavior of caveolin-3 r26q, a mutant associated with hyperckemia, distal myopathy, and rippling muscle disease phosphofructokinase muscle-specific isoform requires caveolin-3 expression for plasma membrane recruitment and caveolar targeting: implications for the pathogenesis of caveolin-related muscle diseases identification of filamin as a novel ligand for caveolin-1: evidence for the organization of caveolin-1-associated membrane domains by the actin cytoskeleton endosomal localization of the autoantigen eea1 is mediated by a zinc-binding fyve finger perforin gene defects in familial hemophagocytic lymphohistiocytosis copicoated er-to-golgi transport complexes segregate from copii in close proximity to er exit sites the immunological synapse of ctl contains a secretory domain and 
membrane bridges linking albinism and immunity: the secrets of secretory lysosomes the biogenesis of lysosomes: is it a kiss and run, continuous fusion and fission process? breaking the copi monopoly on golgi recycling a family of rab27-binding proteins. melanophilin links rab27a and myosin va function in melanosome transport retinal stimulates atp hydrolysis by purified and reconstituted abcr, the photoreceptor-specific atp-binding cassette transporter responsible for stargardt disease abnormal vesicular traycking in mouse models of hermansky-pudlak syndrome protein folding and translocation across the endoplasmic reticulum membrane role of calnexin in the glycan-independent quality control of proteolipid protein assembly and traycking of caveolar domains in the cell: caveolae as stable, cargo-triggered, vesicular transporters identification and expression of five mutations in the human acid sphingomyelinase gene causing types a and b niemann-pick disease. molecular evidence for genetic heterogeneity in the neuronopathic and non-neuronopathic forms defective aquaporin-2 traycking in nephrogenic diabetes insipidus and correction by chemical chaperones misfolding of mutant aquaporin-2 water channels in nephrogenic diabetes insipidus molecular cloning of caveolin-3, a novel member of the caveolin gene family expressed predominantly in muscle copii and exit from the endoplasmic reticulum identification of a familial hyperinsulinism-causing mutation in the sulfonylurea receptor 1 that prevents normal traycking and function of katp channels mutations of the pds gene, encoding pendrin, are associated with protein mislocalization and loss of iodide ezux: implications for thyroid dysfunction in pendred syndrome the chediak-higashi protein interacts with snare complex and signal transduction proteins clathrin assembly lymphoid myeloid leukemia (calm) protein: localization in endocytic-coated pits, interactions with clathrin, and the impact of overexpression on 
clathrin-mediated trayc how peroxisomes arise chloride channels cough up caveolae are highly immobile plasma membrane microdomains, which are not involved in constitutive endocytic traycking tau regulates the attachment/detachment but not the speed of motors in microtubule-dependent transport of single vesicles and organelles gangliosides are receptors for murine polyoma virus and sv40 protein traycking and alzheimer's disease role of auxilin in uncoating clathrincoated vesicles the inositol polyphosphate 5-phosphatase ocrl associates with endosomes that are partially coated with clathrin actin microfilaments facilitate the retrograde transport from the golgi complex to the endoplasmic reticulum in mammalian cells dynamin-like protein encoded by the drosophila shibire gene associated with vesicular trayc protein kinase d: an intracellular trayc regulator on the move fcgammar polymorphisms: implications for function, disease susceptibility and immunotherapy a dileucine-like sorting signal directs transport into an ap-3-dependent, clathrin-independent pathway to the yeast vacuole protein kinase g from pathogenic mycobacteria promotes survival within macrophages degradation of cftr by the ubiquitinproteasome pathway use of expression constructs to dissect the functional domains of the chs/beige protein: identification of multiple phenotypes a novel 115-kd peripheral membrane protein is required for intercisternal transport in the golgi stack rab6 coordinates a novel golgi to er retrograde transport pathway in live cells hsp90 and the chaperoning of cancer sec61-mediated transfer of a membrane protein from the endoplasmic reticulum to the proteasome for destruction the human cytomegalovirus us11 gene product dislocates mhc class i heavy chains from the endoplasmic reticulum to the cytosol amphiphysin heterodimers: potential role in clathrin-mediated endocytosis the caveolin genes: from cell biology to medicine a mutation in rab27a causes the vesicle transport defects 
observed in ashen mice caveolinopathies: mutations in caveolin-3 cause four distinct autosomal dominant muscle diseases four contiguous amino acid substitutions, identified in patients with laron syndrome, diverently avect the binding aynity and intracellular traycking of the growth hormone receptor synaptojanin is the major constitutively active phosphatidylinositol-3,4,5-trisphosphate 5-phosphatase in rodent brain rab27a enables myosin va-dependent melanosome capture by recruiting the myosin to the organelle identification of an organelle receptor for myosin-va the tuberous sclerosis 2 gene product, tuberin, functions as a rab5 gtpase activating protein (gap) in modulating endocytosis the fine structure of the gall bladder epithelium of the mouse caveolin is an inhibitor of platelet-derived growth factor receptor signaling sulfonylureas correct traycking defects of atp-sensitive potassium channels caused by mutations in the sulfonylurea receptor nsec1 binds a closed conformation of syntaxin1a abnormal ryanodine receptor function in heart failure fabry disease: characterization of alpha-galactosidase a double mutations and the d313y plasma enzyme pseudodeficiency allele protein kinase d regulates basolateral membrane protein exit from trans-golgi network predisposition to renal carcinoma in the eker rat is determined by germ-line mutation of the tuberous sclerosis 2 (tsc2) gene adenovirus-mediated gene transfer transiently corrects the chloride transport defect in nasal epithelia of patients with cystic fibrosis rab proteins as membrane organizers bleeding due to disruption of a cargo-specific er-to-golgi transport complex charcot-marie-tooth disease type 2a caused by mutation in a microtubule motor kif1b the lebanese allele at the low density lipoprotein receptor locus. nonsense mutation produces truncated receptor that is retained in endoplasmic reticulum sucrase-isomaltase deficiency in humans. 
key: cord-010278-loey5xq9 authors: huh, changgoo; nagle, james w.; kozak, christine a.; abrahamson, magnus; karlsson, stefan title: structural organization, expression and chromosomal mapping of the mouse cystatin-c-encoding gene (cst3) date: 1995-01-23 journal: gene doi: 10.1016/0378-1119(94)00728-b sha: doc_id: 10278 cord_uid: loey5xq9 cystatin c (cstc) is a potent cysteine-proteinase inhibitor. the structure of the mouse cstc-encoding gene (cst3) was examined by sequencing a 6.1-kb genomic dna containing the entire gene, as well as 0.9 kb of 5′ flanking and 1.7 kb of its 3′ flanking region. the sequence revealed that the overall organization of the gene is very similar to those of the genes encoding human cstc and other type-2 cst, with two introns at positions identical to those in the human gene. the promoter area does not contain typical tata or caat boxes. two copies of an sp1-binding motif, gggcgg, are present in the 5′ flanking region within 300 bp upstream from the initiation codon. a hexanucleotide, tgttct, which is a core sequence of the androgen-responsive element (are), is found in the promoter region. this region also contains a 21-nucleotide sequence, 5′-agactagcagctgactgaagc, which contains two potential binding sites for the transcription factor ap-1. the mouse cst3 mrna was detected in all thirteen tissues examined by northern blot analysis. cst3 was mapped in the mouse to a position on distal chromosome 2. the cystatins (cst) are a group of potent cysteine-proteinase inhibitors. there are at least five distinct types in the cst superfamily, each type consisting of several proteins (rawlings and barrett, 1990; devos et al., 1993).
cstc belongs to the family of type-2 cst and consists of 120 aa, with two intrachain disulfide bonds (barrett et al., 1986). although the proteinase-inhibiting function of cstc has been thoroughly investigated, less is known about its broader biological role. recent reports indicate that cstc may play a role in cancer progression (sloane, 1990), bone resorption (lerner and grubb, 1992), modulation of neutrophil chemotactic activity and inflammation (leung-tack et al., 1990a,b), and resistance to viral infection (collins and grubb, 1991). furthermore, a point mutation in the cst3 gene, resulting in a leu→gln substitution, is the primary cause of the autosomal dominant disorder hereditary cstc amyloid angiopathy (hccaa) (grubb et al., 1984). as young adults, carriers of this mutation suffer from repeated and massive brain hemorrhages due to deposition of the mutant protein in the walls of the cerebral arteries. the gene structures for several type-2 cst have been determined. the human cst1 and cst2 genes (saitoh et al., 1987), and the human cst3 (abrahamson et al., 1990) and cst4 (freije et al., 1991) genes show very similar structural organization with respect to the number and position of the introns. structural analysis of the genes for these proteins will be necessary to understand the function and evolution of the members of the cystatin multigene superfamily. in this paper, we report the structural organization and expression of the mouse cst3 gene, compare its regulatory elements with those of other cst genes and map the cst3 gene in the mouse. using two primers, mcyc3 (5'-atg gcc agc ccg ctg cgc tcc ttg-3') and mcyc4 (5'-ggc att ttt gca gct gaa ttt tgt cag-3'), a 420-bp dna fragment within the coding region was generated from mouse cst3 cdna, labeled with 32p by random primer extension, and used as a probe to screen a λfixii genomic dna library from 129/sv mice (purchased from stratagene, la jolla, ca, usa).
among several hybridizing clones, one clone, λcygl3, was chosen for characterization. southern blot and pcr analysis indicated the presence of the mouse cst3 gene in a 15-kb dna insert. southern blot analysis of the genomic dna did not show any evidence for the presence of a cst3 pseudogene. a 6.1-kb genomic dna fragment containing the entire gene, as well as 0.9 kb of 5' flanking and 1.7 kb of 3' flanking region, was subjected to nt sequencing. the 6125-nt sequence covering the entire mouse cst3 gene is shown in fig. 1. comparison of the mouse cst3 gene sequence with that of the corresponding cdna (solem et al., 1990) revealed that the gene contains two intron sequences located between the nt triplets encoding aa 55-56 and 93-94 of the proposed mature polypeptide chain, exactly as in the human cst3 gene. the presence of two introns, at homologous positions, has also been reported in the other type-2 cst genes fully characterized to date: the human cst1, cst2 and cst4 genes. the intron-exon junctions in the mouse cst3 gene are all close matches to the consensus sequences for the donor and acceptor splice sites of introns (mount, 1982). some differences were observed between the exons of the genomic dna sequence and the published cdna; the five differences are summarized in table 1. two of these positions, 994 and 3421, are in the coding region. the nt 994 in exon 1 results in a gcc codon (coding for ala) towards the c-terminal part of the leader sequence, whereas a ggc codon (coding for gly) was reported at this position in the published cdna. the gcc codon found in the genomic dna also exists at the corresponding site of the rat and human cst3 cdnas. the nt 3421 in exon 2 forms a ttg codon for leu (ttt, coding for phe, in the cdna). the differences between the genomic and cdna sequences may be due to an error during cdna synthesis or to polymorphism between the mouse strains 129/sv and balb/c.
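the two coding-region differences above can be checked against the standard genetic code. the following is only a minimal illustrative sketch: the codon table is deliberately restricted to the four codons named in the text, and the function name `compare_codons` is hypothetical.

```python
# Subset of the standard genetic code: only the four codons discussed above.
CODON_TO_AA = {
    "GCC": "Ala",
    "GGC": "Gly",
    "TTG": "Leu",
    "TTT": "Phe",
}


def compare_codons(genomic_codon, cdna_codon):
    """Return the encoded residues as 'genomic/cDNA', e.g. 'Ala/Gly'."""
    genomic_aa = CODON_TO_AA[genomic_codon.upper()]
    cdna_aa = CODON_TO_AA[cdna_codon.upper()]
    return f"{genomic_aa}/{cdna_aa}"


# nt 994 (exon 1) and nt 3421 (exon 2): genomic codon vs. published cDNA codon.
differences = {994: ("GCC", "GGC"), 3421: ("TTG", "TTT")}
for position, (genomic, cdna) in differences.items():
    print(position, compare_codons(genomic, cdna))
```

a single-nucleotide change in the second codon position (gcc/ggc) or third position (ttg/ttt) is enough to alter the encoded residue in both cases.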
the sequence of the 0.9-kb segment flanking exon 1 of the mouse cst3 gene at the 5' end did not reveal a typical tata or caat box in the suggested promoter area. however, a tata-box-like taaaa sequence is present 78-82 nt upstream from the start codon. a similar, slightly atypical tata-box is found at the homologous position in the human cst3 gene (ataaaa), the human cst4 gene (ataaat), and the human cst1 and cst2 genes (ataaa). the tata-box is preceded by an sp1-binding gc-box sequence (pugh and tjian, 1990) with the core consensus sequence, gggcgg, ending 23 nt upstream from the at-sequence. a corresponding sequence in the human cst3 gene is located slightly closer to the tata-box (distance 16 nt). in the human cst4 gene, a gc-box is also found in the immediate 5'-flanking region (upstream distance from the at-sequence 41 nt). by contrast, in the human cst1 and cst2 genes, a segment similar to the caat consensus is found instead of the gc-box. a core sequence of the are, tgttct, is found 22 nt downstream from the taaaa sequence. this hexanucleotide is located in a partial palindromic setting. some naturally occurring sequences and synthetic constructs containing this core sequence in a partial palindromic structure were shown to be inducible with androgens (ham et al., 1988). this are is not found in the promoters of any other published type-2 cst genes. however, it is present in cst-related protein-encoding genes whose expression is regulated by androgen in the ventral prostate and lachrymal gland (chamberlain et al., 1983).
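locating core motifs such as gggcgg, taaaa and tgttct relative to the start codon amounts to a simple string search over the 5'-flanking sequence. the sketch below illustrates the idea under stated assumptions: the sequence `toy_upstream` is made up for illustration and is not the mouse cst3 promoter, and the function name is hypothetical.

```python
def find_motif_offsets(upstream, motif):
    """Return distances (in nt) from the start codon to the 5' end of each hit.

    `upstream` is the 5'-flanking sequence ending immediately before the ATG,
    so a hit beginning at the last upstream base has distance 1.
    Overlapping hits are reported.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    offsets = []
    start = upstream.find(motif)
    while start != -1:
        offsets.append(len(upstream) - start)
        start = upstream.find(motif, start + 1)
    return offsets


# Toy sequence, made up for illustration (NOT the mouse cst3 promoter).
toy_upstream = "CCGGGCGGAATTTAAAACGCGTGTTCTGCA"
motifs = [("sp1 gc-box", "GGGCGG"), ("tata-like", "TAAAA"), ("are core", "TGTTCT")]
for name, motif in motifs:
    print(name, find_motif_offsets(toy_upstream, motif))
```

a real analysis would also scan the reverse complement, since elements such as the lbp-1 motif are counted in either orientation.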
an exact match of nine nt with the pituitary transcription factor (pit-1) recognition element is centered around nt -795 from the start codon, but is probably of low significance for the expression of the gene because multiple recognition elements have been shown to be needed for markedly increased expression of the rat prolactin gene by pit-1 (ingraham et al., 1988). the sequence motif recognized by the leader binding protein (lbp-1), 5'-wctgg-3' or its inverse, which is present in several copies in the hiv-1 promoter and contributes to its basal function (jones et al., 1988), is strikingly abundant in the 5'-flanking region of the mouse cst3 gene. another five lbp-1 motifs are found within a 300-bp segment in the first part of the first intron. transcription factor ap-1-binding sites that bind the jun-fos protooncogene complexes contain the consensus sequence 5'-tgactcagc. the mouse cst3 promoter contains two ap-1-like binding sequences within the sequence 5'-agactagcagctgactgaagc, immediately upstream from the first sp1-binding site. this 21-mer sequence contains direct repeats of two adjacent potential ap-1-binding sites, each slightly deviating from the consensus sequence. it has been shown that two adjacent ap-1-like binding sites act synergistically to confer inducibility beyond that observed for a single ap-1 consensus sequence (friling et al., 1992). the presence of the two ap-1-like binding sites in the promoter indicates that transcription factor ap-1 may play a role in cst3 gene expression. there is evidence that induction of gene expression by tgf-β is mediated by transcription factor ap-1. autoregulation of tgf-β1 expression is mediated by the binding of ap-1 to a loose consensus binding site, tgagaca, in the tgf-β1 promoter (kim et al., 1990). a strong positive regulation of the cst3 gene by tgf-β in serum-free mouse embryo cells has also been reported (solem et al., 1990). the presence of ap-1-like binding sites in the mouse cst3 promoter suggests that cystatin c induction by tgf-β may be mediated by the ap-1 complex.

table 1. differences between the mouse cst3 gene sequence and that of the published cdna (solem et al., 1990)
position (a)    genomic dna (b)    cdna (c)    aa (d) (genomic/cdna)
994             c (11)             g           ala/gly
3421            g (5)              t           leu/phe
4393            c 15)              t
4564-5          gc (3)             at
(a) the nt positions refer to fig. 1.
(b) the nt refers to the genomic dna sequence. sequence was determined on both strands (number of independent sequencing runs in parentheses).
(c) the nt refers to the cdna sequence.
(d) the aa encoded by nt in columns genomic dna and cdna.

the 5'-flanking region of the human cst3 gene has a notably high g+c content, with >70% g+c in the 400 bp sequence upstream from the start codon. the g+c-rich region also includes the coding part of exon 1 and the 5' part of the first intron, which together represent a 900-bp segment with a g+c content of 73%, and contains cpg/gpc dinucleotides in a ratio close to unity (abrahamson et al., 1990). the immediate 5'-flanking region of the mouse cst3 gene does not have such a strikingly high g+c content, but is more similar to the human cst1, cst2 and cst4 genes in having a g+c content of approx. 60%. however, the cpg/gpc ratio is 1/1.3 in the 300 bp region upstream from the start codon (as compared to 1/4.1 over the entire 6.1-kb sequenced region), differing markedly from ratios of 1/6, 1/9 and 1/16 for the human cst4, cst1 and cst2 genes, respectively. thus, the mouse cst3 gene 5'-flanking region is not a typical housekeeping gene promoter having extremely high g+c content. rather, it is similar to these promoters and the human cst3 promoter because it displays several sp1-binding sites and contains a high number of cpg dinucleotides. this may indicate a low degree of methylation due to constant transcription of the gene (bird, 1986). proposed promoter regions of several type-2 cst are compared in fig.
2 (saitoh et al., 1987; abrahamson et al., 1990; freije et al., 1991). sequence determination of the mouse cst3 gene 1.7-kb 3'-flanking segment revealed no alternative polyadenylation signals in addition to the one present in the corresponding cdna, 213 bp downstream from the stop codon. analysis of short tandem repeats within the entire gene sequence revealed the presence of (gt)21 and (ga)26 in the region immediately 3'-flanking the polyadenylation signal and three stretches of perfect ct repeats, (ct)13, (ct)14 and (ct)25, 1300 bp further downstream. analysis of two multilocus crosses was used to define the chr location for the mouse cst3 gene: (nfs/n or c58/j × m. m. musculus) × m. m. musculus and (nfs/n × m. spretus) × m. spretus or c58/j (adamson et al., 1991). dnas extracted from parental mice and progeny of the crosses were typed by southern blotting analysis for rflps of cst3 using the mouse cst3 cdna as probe. ssti digestion produced fragments of 8.6 and 6.0 kb in m. spretus and 6.0 and 2.8 kb in nfs/n and c58/j mice, and bamhi fragments of 22.0 and 18.8 kb were detected in m. m. musculus and nfs/n, respectively. inheritance of the parental fragments was followed in both crosses and compared with inheritance of almost 650 markers previously typed and mapped to all 19 autosomes and the x chromosome. as shown in fig. 3, cst3 maps to distal chr 2 [...] and distal to snap (encoding synaptosomal associated protein), markers which were typed in these crosses as previously described (joseph et al., 1990; grimaldi et al., 1992). it has recently been shown that the human homolog of this gene maps to 20p11 (schnittger et al., 1993). this is consistent with our results, since the distal end of mouse chr 2 contains a substantial region of linkage conservation with human chr 20 (siracusa and abbott, 1993; fig. 3b). the human cst3 is part of a cluster which includes up to eight members of the cst gene family (schnittger et al., 1993), further suggesting that the mouse homologs of these genes are likely to map to the same site on chr 2.

fig. 3. the numbers given between adjacent loci represent percent recombination ± the standard error calculated according to green (1981). (b) chr 2 linkage maps. the map on the right was generated from the two crosses described here and indicates the position of cst3 relative to the other markers typed in this cross. distances between adjacent markers (in centimorgans) are indicated to the immediate left of the map. the map on the left is an abbreviated version of the composite genetic map (siracusa and abbott, 1993). numbers to the left of the map are centimorgan distances from the centromere. human map locations for homologs of the underlined mouse genes are indicated to the far left of this map.

we examined the expression of the mouse cst3 gene in 13 different tissues by northern blot analysis using the mouse cst3 cdna probe. as expected, cst3 mrna was detected in all tissues examined, including stomach, brain, intestines, liver, muscle, spleen, heart, kidney, lung, pancreas, testis, uterus and ovary (data not shown). the pattern of mouse cst3 gene expression is similar to that of its human counterpart. both species show expression of the gene in all tissues examined, with high levels of cst3 messenger rna in brain and testis and the lowest level in pancreas. this overall similarity between the two species indicates that the mouse may be suitable for generating an animal model for the human genetic disease hccaa.
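the g+c content and cpg/gpc comparisons reported above for the 5'-flanking regions can be reproduced for any sequence with a few lines of code. the sketch below is a minimal illustration, not the authors' procedure; the function names are hypothetical and the example sequence is a toy string rather than real cst3 sequence.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_dinucleotide(seq, dinucleotide):
    """Count occurrences of a dinucleotide, allowing overlaps."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


def cpg_gpc_ratio(seq):
    """Ratio of CpG to GpC dinucleotides (NaN if there is no GpC)."""
    cpg = count_dinucleotide(seq, "CG")
    gpc = count_dinucleotide(seq, "GC")
    return cpg / gpc if gpc else float("nan")


toy_sequence = "GCGCGCATAT"  # toy example, not real cst3 sequence
print(gc_content(toy_sequence), cpg_gpc_ratio(toy_sequence))
```

a ratio written in the text as 1/1.3 corresponds to a `cpg_gpc_ratio` value of about 0.77, whereas 1/16 corresponds to about 0.06, which is the kind of cpg depletion expected in heavily methylated regions.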
structure and expression of the human cystatin c gene
the mouse homolog of the gibbon ape leukemia virus receptor: genetic mapping and a possible receptor function in rodents
nomenclature and classification of the proteins homologous with the cysteine-proteinase inhibitor chicken cystatin
cpg-rich islands and the function of dna methylation
isolation, properties, and androgen regulation of a 20-kilodalton protein from rat ventral prostate
inhibitory effects of recombinant human cystatin c on human coronaviruses
structure of rat genes encoding androgen-regulated cystatin-related proteins (crps): a new member of the cystatin superfamily
structure and expression of the gene encoding cystatin d, a novel human cysteine proteinase inhibitor
two adjacent ap-1-like binding sites form the electrophile-responsive element of the murine glutathione s-transferase ya subunit gene
genetics and probability in animal breeding experiments
genomic structure and chromosomal mapping of the murine cd40 gene
abnormal metabolism of γ-trace alkaline microprotein
characterization of response elements for androgens, glucocorticoids and progestins in mouse mammary tumour virus
a tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype
structural arrangements of transcription control domains within the 5'-untranslated leader regions of the hiv-1 and hiv-2 promoters
characterization and expression of the complementary dna encoding rat histidine decarboxylase
autoinduction of transforming growth factor β1 is mediated by the ap-1 complex
molecular genetic markers spanning mouse chromosome 10
neutrophil chemotactic activity is modulated by human cystatin c, an inhibitor of cysteine proteases
modulation of phagocytosis-associated respiratory burst by human cystatin c: role of the n-terminal tetrapeptide lys-pro-pro-arg
human cystatin c, a cysteine proteinase inhibitor, inhibits bone resorption in vitro stimulated by parathyroid hormone and parathyroid hormone-related peptide of malignancy
a catalogue of splice junction sequences
mechanism of transcriptional activation by sp1: evidence for coactivators
evolution of proteins of the cystatin superfamily
human cysteine-proteinase inhibitors: nucleotide sequence analysis of three members of the cystatin gene family
cystatin c (cst3), the candidate gene for hereditary cystatin c amyloid angiopathy (hccaa), and other members of the cystatin gene family are clustered on chromosome 20p11.2
mouse chromosome 2
cathepsin b and cystatins: evidence for a role in cancer progression
transforming growth factor beta regulates cystatin c in serum-free mouse embryo (sfme) cells

we thank dr. jakob reiser for critical reading of the manuscript.

key: cord-012542-rsqon0w0 authors: abbas, mostafa; el-manzalawy, yasser title: machine learning based refined differential gene expression analysis of pediatric sepsis date: 2020-08-28 journal: bmc med genomics doi: 10.1186/s12920-020-00771-4 sha: doc_id: 12542 cord_uid: rsqon0w0 background: differential expression (de) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. such analysis often provides a wide list of genes that are differentially expressed between two or more groups. in general, identified differentially expressed genes (degs) can be subject to further downstream analysis for obtaining more biological insights such as determining enriched functional pathways or gene ontologies. furthermore, degs are treated as candidate biomarkers and a small set of degs might be identified as biomarkers using either biological knowledge or data-driven approaches. methods: in this work, we present a novel approach for identifying biomarkers from a list of degs by re-ranking them according to the minimum redundancy maximum relevance (mrmr) criteria using a repeated cross-validation feature selection procedure.
results: using gene expression profiles for 199 children with sepsis and septic shock, we identify 108 degs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality with an estimated area under roc curve (auc) score of 0.89. conclusions: machine learning based refinement of de analysis is a promising tool for prioritizing degs and discovering biomarkers from gene expression profiles. moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnosis and prognosis biomarkers for sepsis. pediatric sepsis is a life-threatening condition that is considered a leading cause of morbidity and mortality in infants and children [1, 2]. sepsis is a systemic response to infection that is characterized by a generalized pro-inflammatory cascade, which may lead to extensive tissue damage [3]. early recognition of sepsis and septic shock will help pediatric care physicians to intervene before the onset of advanced organ dysfunction and thus reduce mortality and length of stay as well as post critical care complications [4]. however, reliable risk stratification of sepsis, especially in children, is a challenge due to significant patient heterogeneity [5] and existing poor definitions of sepsis in pediatric populations [6]. existing physiological scoring tools commonly used in intensive care units (icus), such as acute physiologic and chronic health evaluation (apache) [7] and sepsis-related organ failure assessment (sofa) [8], use clinical and laboratory measurements to quantify critical illness severity but provide little information about the risk for poor outcome (e.g., mortality) at the onset of the disease [2]. several recent studies have proposed sepsis prognostic biomarkers (e.g., [5, 9, 10]) as well as sepsis diagnostic biomarkers (e.g., [11-13]) by differentiating between infectious and non-infectious systemic inflammatory response syndrome.
to date, transcriptomic, proteomic, and metabolomic data have been used to identify sets of genes, proteins, or metabolites that are differentially expressed among patients [14]. however, a major challenge for developing clinically feasible sepsis biomarkers is to have a fast turnaround time [14, 15]. recent advances in high-throughput transcriptomic technology have created opportunities for precision critical care medicine by enabling fast and clinically feasible profiling of gene expression within a few hours. for example, wong et al. [16] used a multiplex messenger rna quantification platform (nanostring ncounter) to profile the expressions of previously identified 100 three subclass-defining genes [17] in 8-12 h. differential gene expression analysis is a commonly used computational approach for identifying genes whose expressions are significantly different between two phenotypes. given gene expression profiles for septic patients annotated with a targeted outcome (e.g., survivals vs. non-survivals), this analysis typically associates a p-value (which could be corrected for multiple hypothesis testing) with each gene by comparing the two groups (e.g., survivals and non-survivals). then, degs are those genes with p-values lower than a specific threshold (typically, 0.05) that also satisfy user-specified thresholds for fold change (fc) for up- and downregulated genes [18]. a typical de analysis of gene expression profiles often returns a hundred or more degs, a considerable number of which might be highly correlated with one or more other degs. against this background, we present a novel method for refining the results of statistical de analysis methods via re-ranking and prioritizing the genes from the outcome of de analysis.
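the deg definition above (a corrected p-value below a threshold plus a fold change cutoff) can be illustrated with a small pure-python sketch. this is not the limma workflow used in the study; the benjamini-hochberg step is a plain reimplementation, and the `min_abs_log2fc=1.0` cutoff, the gene names and the numbers are assumptions chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, p_values, log2_fold_changes, alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the adjusted p-value and fold change cutoffs.

    The default cutoffs are illustrative assumptions, not values from the study.
    """
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q < alpha and abs(lfc) >= min_abs_log2fc
    ]


# Hypothetical genes and statistics, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
p_values = [0.01, 0.04, 0.03, 0.005]
log2_fold_changes = [1.5, 0.2, -2.0, 0.5]
print(select_degs(genes, p_values, log2_fold_changes))
```

genes passing such a filter can still be strongly correlated with one another, which is the motivation for the re-ranking step described next.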
specifically, we propose a hybrid approach that leverages: i) statistical de analysis for identifying a wide list of degs; ii) supervised feature selection methods for selecting an optimal subset of degs with maximum relevance for predicting the target variable and minimum redundancy among selected genes; iii) supervised machine learning methods for assessing the discriminatory power of the selected genes. using gene expression profiles from blood samples collected from 199 children admitted to the icu and diagnosed with sepsis or septic shock, we first report a list of 108 degs and associated enriched functional pathways. then, we demonstrate the viability of our proposed gene re-ranking method in identifying a 10-gene signature for mortality in pediatric sepsis. finally, we make our python code (including notebook examples for refining degs and analyzing biomarkers using two example datasets) publicly available at https://bitbucket.org/i2rlab/rdea/. normalized and pre-processed transcriptomic gene expression profiles were downloaded from [19]. these gene expression profiles represent peripheral blood samples collected from 199 pediatric patients (later diagnosed with sepsis or septic shock) during the first 24 h of admission to the pediatric icu. out of these 199 pediatric patients, 28 patients are non-survivals. affymetrix cel files were downloaded from ncbi geo (accession number gse66099) and re-normalized using the gcrma method in the affy r package [20]. probe-to-gene mappings were downloaded from the most recent soft files in geo, and the mean of the probes mapping to the same gene was used as the gene expression level. we used the limma r package (version 3.42.0) [18] to identify the differentially expressed genes with the benjamini-hochberg (bh) correction method.
we calculated the fold change with respect to the non-survival (i.e., the upregulated genes are the genes with expression of the non-survival samples that are higher than the expression of these genes in the survival samples). we experimented with three commonly used machine learning algorithms for developing and evaluating binary classifiers for predicting mortality in pediatric sepsis: i) random forest [21] with 100 trees (rf100); ii) extreme gradient boosting [22] with 100 weak tree learners (xgb100); iii) logistic regression (lr) [23] with l2 regularization. the three algorithms are implemented in the scikit-learn machine learning library (version 0.21.2) [24] . we used two feature selection methods that have been widely used with gene expression data, random forest feature importance (rffi) [21] and minimum redundancy and maximum relevance (mrmr) [25] . for the rffi method, we trained a rf with 100 trees and then the feature importance scores, which quantify the contribution of each feature in the learned rf model, were used to sort and rank the input features and only the top k = 1, 2, …, 10 were selected for training our classifiers. for the mrmr feature selection method, we used the training data to select the top k features. these features were selected such that the objective function in eq. 1 is maximized:
$$\max_{x_j \in \omega_s} \left[ f(x_j, y) - \frac{1}{|s|} \sum_{x_l \in s} g(x_j, x_l) \right] \quad (1)$$
let ω, s, and ω_s denote input, selected, and non-selected input features, respectively. the first term in eq. 1 uses a relevance function f(x_j, y) to quantify the relevance of the feature x_j for predicting the target output y while the second term quantifies the redundancy among the selected features in s using the function g(x_j, x_l). we implemented the mrmr algorithm [25, 26] as a scikit-learn feature selection model using python. in our experiments, we used the scipy (version 1.2.1) implementation of the pearson correlation coefficient to compute redundancy between features.
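a greedy reading of the mrmr procedure described above can be sketched in plain python. this is an illustration under stated assumptions, not the authors' scikit-learn selector: it assumes the difference form of the objective (relevance minus mean redundancy), uses the absolute pearson correlation as the redundancy function, and scores relevance with a direction-agnostic single-feature auc. the names `pearson`, `auc_relevance`, and `mrmr` are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def auc_relevance(x, y):
    """AUC of one feature used directly as a score for binary labels (0/1).

    Assumption: the larger of AUC and 1 - AUC is returned so that
    down-regulated genes are not penalized.
    """
    pos = [a for a, label in zip(x, y) if label == 1]
    neg = [a for a, label in zip(x, y) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)


def mrmr(features, y, k):
    """Greedily pick k feature names from a dict of name -> list of values."""
    relevance = {name: auc_relevance(col, y) for name, col in features.items()}
    selected, candidates = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)  # sorted for determinism
        selected.append(best)
        candidates.remove(best)
    return selected
```

in this toy form, a gene that duplicates an already selected gene scores close to zero on the second step, so a less relevant but less correlated gene is preferred.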
for relevance functions, we considered three functions (implemented in scikit-learn): area under roc curve (mrmr_auc); χ 2 (mrmr_chi2); and f-statistic (mrmr_fstat) . marker genes discovery and performance evaluation we identified top discriminative features (i.e., marker genes) and estimated the performance of the machine learning classifiers using 10 runs of the 10-fold crossvalidation procedure. briefly, we repeated the following procedure 10 times: first, the dataset was randomly partitioned into 10 equal subsets (each with the same survivals to non-survivals ratio as the entire dataset). nine of the 10 subsets were combined to serve as the feature selection and training set while the remaining subset was held out for estimating the performance of the trained classifier. this procedure was repeated 10 times, by setting aside a different subset of the data as the test set. overall, we had 100 iterations of train and test experiments. the reported performance is averaged over the 100 iterations and the score of each feature represents the fraction of how many times this feature was selected in the 100 iterations (i.e., a feature with a score of 0.85 means that this feature had been selected to train the classifier in 85 out of 100 iterations). we assessed the performance of classifiers using five widely used predictive performance metrics [27] : accuracy (acc), sensitivity (sn); specificity (sp); and matthews correlation coefficient (mcc); area under roc curve (auc) [28] . auc is a widely used metric and summary statistic of the roc curve. however, when several models have almost the same auc score, we can still compare them by examining their roc curves to determine if a model has an roc curve that completely or partially (in the leftmost region) dominates all other roc curves. we used the function find_enriched_pathway in the keggprofile r package (version 1.28.0) [29] to map the differentially expressed genes in kegg pathway database [30] . 
in our experiments, pathways with adjusted p-value ≤0.05 and gene count ≥2 were considered significantly enriched. based on absolute fold change ≥1.5 and adjusted p-value ≤0.05, 108 from a total of 10,596 genes were found to be degs between survival and non-survival septic pediatric patients (see additional file 1: table s1 ) and additional file 2: fig. s1 ). table 1 shows the top 10 degs when the genes are ranked using the absolute value of their fold change. only one gene, tgfbi, is down-regulated while the remaining nine genes are upregulated. tgfbi is among the 11 genes that have been used in the sepsis metascore (sms) gene expression diagnostic method [11, 31] . the top three upregulated genes are slc39a8, rhag, and ddit4. slc39a8 is found in the plasma membrane and mitochondria and plays a critical role at the onset of inflammation [32] . both rhag (also called slc42a1) and slc39a8 belong to solute carrier (slc) group of membrane transport proteins. finally, increased expressions of dna damage inducible transcript 4 (ddit4) gene had been associated with higher risks of mortality in sepsis patients [10, 19] . in order to get biological insights into the functional rules of the identified 108 degs, we used the keggprofile r package to identify enriched human kegg pathways in this set of genes. in our experiments, we did not threshold on the p-value, adjusted p-value, or minimum number of genes in the pathway such that the returned results include all kegg pathways that have at least one gene in common with the target set of genes. the complete set of results is provided in additional file 1: table s2 . we considered a pathway to be significantly enriched if its adjusted p-value is ≤0.05 and at least two degs are included in that pathway. using these criteria, we got 8 significantly enriched pathways ( table 2 ). most of these pathways had been linked to inflammation and/ or dna damage. additional file 2 fig. 
s2 shows the heatmap of the correlation matrix of the 108 degs. the figure shows that up-regulated and down-regulated degs are clustered separately. we also noted that within each cluster, every gene might be highly correlated with multiple other genes.
table 2 (fragment): progesterone-mediated oocyte maturation 9.16e-04 3.89e-02
fig. 1 comparisons of lr, rf100, and xgb100 classifiers evaluated using four different feature selection methods and 10 runs of 10-fold cross-validation experiments. each boxplot represents the distribution of average auc score of 10 models evaluated using a given classification algorithm and feature selection method for selecting top k = 1, 2, …, 10 features
can a small subset of the degs discriminate between survivals and non-survivals? here, we report the results of evaluating 120 models obtained using a combination of three supervised classification algorithms, four feature selection methods, and 10 possible values for the number of selected features (k = {1, 2, …, 10}). additional file 1: table s3 shows the average performance metrics estimated over 10 runs of 10-fold cross-validation experiments. figure 1 shows the boxplots of the average auc scores for each combination of a classification algorithm and a feature selection method. interestingly, mrmr_auc is consistently the best feature selection method using any of the three classification algorithms considered in our experiments. surprisingly, we found that the models obtained using this feature selection method and lr algorithm not only have the best performance (in terms of auc scores) but also have the lowest variance in estimated auc (i.e., auc scores are between 0.84 and 0.85). additional file 1: table s4 shows the results of using the mann-whitney u test for pairwise comparisons of classifiers (in fig. 1 ) for each feature selection method. we found that the median auc score for lr is significantly higher than the median auc score for rf100 using the four feature selection methods.
we also found that the median auc score for lr is significantly higher than the median auc score for xgb100 using mrmr_auc and mrmr_chi2 feature selection methods. figure 2 shows that (using mrmr_auc feature selection) lr models outperformed corresponding rf100 and xgb100 models for any choice of the number of selected features in k = {1, 2, …, 10}. based on this figure, one might conclude that we should not use more than 2 features since adding more features did not yield any improvements in the auc score.. however, to accurately identify the best performing lr model, we inspected the average roc curves of these lr models (see additional file 2: fig. s3 ). the lr model using only 2 features is dominated in the leftmost region of the curve (i.e., region corresponds to specificity greater than 0.80) by all other models. for a target specificity greater than 0.80, the best roc curve corresponds to the model trained using top seven selected degs. we concluded that the best model (out of the 120 models evaluated in this study) is based on lr algorithm and mrmr_auc method for selecting top seven degs. therefore, only seven genes are needed to achieve the highest auc score of 0.85. due to the small dataset and the instability of feature selection methods, the top seven degs selected in each fold might be different. note that we conducted 10 runs of 10-fold cross-validation procedure. thus, we chose seven degs 100 times to train and evaluate the lr model. to determine the importance of each gene, we assigned each gene a score indicating how many times (out of 100) this gene had been selected among the top seven genes used to train the classifier. then, we simply normalized the scores by dividing by 100 such that gene importance scores of 1.0, 0.87, and 0.0 correspond to genes that have been selected 100, 87, and zero times, respectively. additional file 1: table s5 reports the gene importance scores for the 108 degs. only 31 genes have importance score greater than zero. 
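the gene importance score described above (the fraction of cross-validation iterations in which a gene is selected) can be sketched as follows. the sketch is dependency-free and only illustrates the bookkeeping; `stratified_folds`, `selection_frequency`, and the toy selector `top_k_by_mean_gap` are hypothetical names, and the toy selector is a stand-in for the mrmr_auc selector used in the study.

```python
import random
from collections import Counter


def stratified_folds(y, n_splits, rng):
    """Split sample indices into folds that preserve the class ratio."""
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def selection_frequency(features, y, select, k,
                        n_splits=10, n_repeats=10, seed=0):
    """Fraction of train/test iterations in which each gene is selected.

    `select(train_features, train_labels, k)` must return k gene names and
    is called on the training part of every fold only.
    """
    rng = random.Random(seed)
    counts, iterations = Counter(), 0
    for _ in range(n_repeats):
        for held_out in stratified_folds(y, n_splits, rng):
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            sub = {g: [col[i] for i in train] for g, col in features.items()}
            counts.update(select(sub, [y[i] for i in train], k))
            iterations += 1
    return {g: counts[g] / iterations for g in features}


def top_k_by_mean_gap(features, y, k):
    """Toy selector: rank genes by |mean(class 1) - mean(class 0)|."""
    def gap(col):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(features, key=lambda g: gap(features[g]), reverse=True)[:k]
```

with the default 10 splits and 10 repeats this gives 100 iterations, so a score of 0.87 would correspond to a gene selected in 87 of them.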
the top 15 genes and their importance scores are shown in fig. 3 . we noted that three genes (ddit4, rhag, and areg) had been consistently selected in each time. as a result of the small number of samples in our dataset, the performance of any predictive model estimated using 10-fold cross-validation procedure might vary for different random partitioning of the data into 10 folds. therefore, the repeated cross-validation is essential for obtaining more accurate estimates of model performance. to examine if the repeated cross-validation is also necessary for obtaining robust estimates of gene importance scores, we repeated the preceding experiment using a single run of 10-fold cross-validation procedure. the resulting gene importance scores are reported in additional file 1: table s6 . only 15 genes have non-zero scores. out of these genes, we found that 12 genes are in the top 15 genes determined using the repeated 10-fold cross-validation experiment. in summary, our machine learning based refining of degs outcome reduced the number of degs from 108 to 31 and provided an alternative ranking of these genes. next, we show how to use this ranking to determine the minimum set of degs that best discriminate between pediatric sepsis survivals and non-survivals. we used the top 15 genes in fig. 3 to search for a minimal set of genes that best discriminates between pediatric sepsis survivals and non-survivals. specifically, for top k = {4, 5, …, 15} genes, we obtained the average roc curves of lr models estimated using 10 runs of 10-fold cross-validation procedure (see additional file 2: fig. s4 ). we found no improvement in the roc curve when using more than top 10 genes. figure 4 shows the boxplots of the normalized gene expressions of these 10 genes. interestingly, all 10 genes are upregulated. the most expressed genes are cox7b and ddit4 while the least expressed genes are prg2 and areg. 
using this panel of 10 marker genes, we compared the three machine learning algorithms considered in this study. we found that the roc curve of the lr model almost dominates the two roc curves for rf100 and xgb100 classifiers (fig. 5) . performance comparisons of these three classifiers are provided in table 3 . the lr model has an average auc score of 0.89 while both rf100 and xgb100 have an average auc score of 0.86. moreover, the lr model has the best sensitivity, specificity, and mcc. additional file 1 table s7 shows the enriched kegg pathways of the 10 marker genes. since these 10 genes are minimally redundant with each other, it is hard to find pathways that include more than one of these genes. we found only two pathways, necroptosis (genes found: stat4 and tnfaip3) and pi3k-akt signaling pathway (genes found: areg and ddit4), with more than one hit from the 10 marker genes. we compared the lr model trained using the 108 degs to the lr models trained using only top 10 degs obtained using our proposed machine learning based gene ranking method (top10_ml) and two other ranking methods based on absolute fold change (top10_fc) and p-values (top10_pv). the average roc curves of the four lr models are shown in fig. 6 -a and the performance metrics of these models are reported in table 4 . the model using the 108 degs has the worst roc curve and the lowest performance estimates. the model based on top 10 genes obtained using the absolute fold change ranking slightly outperformed the model based on top 10 genes ranked using the p-values. finally, the model obtained using our proposed machine learning based ranking substantially outperformed all three models. although all the models based on the three ranking methods had acceptable performance (i.e., auc score ≥0.84), we found that the three sets of genes were not substantially overlapping with each other (see fig. 6-b) . 
every set of genes had at least 5 unique genes and the only common gene among the three sets was ddit4. figure 6 also visualizes the gene expression profiles for survival and non-survival patients in a 3d space defined by the top three marker genes in these three lists. differential expression (de) analysis has been widely used to analyze gene expression profiles and uncover the underlying biological mechanisms for complex diseases [33, 34] . in general gene expression profiles are characterized with high dimensionality (tens of thousands of genes) and high pairwise correlations between genes. therefore, the outcome of de analysis tools often includes hundred(s) of highly correlated genes (see additional file 2: fig. s2 ). therefore, it is impractical to use all degs for developing diagnostic and prognostic prediction tools. in general, identifying a gene signature (a small set of marker genes) can be done using domain knowledge or datadriven approaches [14] . in this study, we presented a datadriven approach to prioritize the marker genes using an instance of the mrmr feature selection algorithm for selecting genes with the highest auc for predicting the pediatric sepsis mortality and the minimal redundancy among selected genes in terms of pearson's correlation coefficients. the novelty of our work includes the integration of feature selection methods into the statistical pipeline for de analysis, the introduction of a new relevance scoring function based on auc scores for the mrmr algorithm, and the identification of a 10-gene signature of mortality in pediatric sepsis. an interesting observation in our analysis is that the widely used performance metrics such as sensitivity, specificity, and auc might not be sufficient to draw accurate conclusions regarding how different models compare to each other particularly when models are very competitive with each other and there is no model with an roc curve that dominates the roc curves for the remaining models. 
this underscores the drawback of quantifying the roc curves using their auc scores without visualizing the roc curves for more accurate comparisons. another interesting observation is related to the surprisingly superior performance of lr models compared with rf100 and xgb100 models. this superior performance, combined with the fact that lr models are linear interpretable models, makes the lr algorithm a preferred choice for developing prediction models based on gene expression profiles as long as marker genes can be reliably identified. it should be noted that supervised machine learning algorithms combined with feature selection methods could be directly applied to identify marker genes from the entire transcriptomic profiles. however, this approach suffers from two major limitations. first, the computation time might be extremely long because some feature selection methods (including mrmr, which often has a run time of hours when applied to gene expression datasets with tens of thousands of genes; feature selection based on genetic algorithms [35] ; and network-based feature selection [36] ) have a computational cost that grows with the number of features. second, it is challenging to apply functional enrichment analysis to the identified set of marker genes because of the small number of identified genes and the lack of significant redundancy among these genes [19] . therefore, it is less likely that these genes share any common functional pathways. the present approach utilizes supervised feature selection to refine the outcome of statistical de analysis. it will be interesting to explore novel approaches for separately applying statistical de and supervised feature selection to entire gene expression profiles and then integrating the outcome of the two methods. for example, the networkanalyst tool [37] supports comprehensive meta-analysis of multiple gene lists through heatmaps, venn diagrams, and enrichment networks.
one interesting way for obtaining more than one list of degs is to obtain them using different statistical and machine learning approaches. our de and machine learning analyses suggested three 10-gene marker lists for predicting mortality in pediatric sepsis with average auc score ≥0.86. these three lists had only one gene in common, which suggests the existence of multiple data-driven gene signatures for mortality in pediatric sepsis. similar observation had been reported by sweeney et al. [19] where the authors had reported four sets of sepsis marker genes with only few genes in common. this underscores the need for independent validation set as well as wet laboratory experiments to validate some of these markers and confirm the reported biological insights. we have identified a signature of 10 marker genes for reliably predicting mortality in pediatric sepsis. these 10 genes have been determined using a novel machine learning data-driven approach for re-ranking and selecting an optimal subset of 108 degs identified via a secondary analysis of, to the best of our knowledge, the largest publicly available transcriptomic cohort study for pediatric sepsis. our on-going work aims at: i) validating our proposed 10-gene signature using an independent test set; ii) testing and evaluating the proposed approach for identifying reliable biomarkers for challenging biomarker discovery tasks in critical care settings such as diagnosing and endotyping sepsis and acute respiratory distress syndrome (ards); iii) adapting our approach for single cell gene expression analysis [38, 39] . global epidemiology of pediatric severe sepsis: the sepsis prevalence, outcomes, and therapies study precision medicine in pediatric sepsis sepsis kills: early intervention saves lives improved risk stratification in pediatric septic shock using both protein and mrna biomarkers. 
persevere-xp defining pediatric sepsis the apache iii prognostic system: risk prediction of hospital mortality for critically iii hospitalized adults the sofa (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. on behalf of the working group on sepsis-related problems of the european society of intensive care medicine persevere-ii: redefining the pediatric sepsis biomarker risk model with septic shock phenotype critical care medicine differential gene expression analysis reveals novel genes and pathways in pediatric septic shock patients a comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set validation of the sepsis metascore for diagnosis of neonatal sepsis validation of a host response assay, septicyte lab, for discriminating sepsis from systemic inflammatory response syndrome in the icu biomarker panels in critical care sepsis biomarkers developing a clinically feasible personalized medicine approach to pediatric septic shock validation of a gene expression-based subclassification strategy for pediatric septic shock critical care medicine limma powers differential expression analyses for rna-sequencing and microarray studies a community approach to mortality prediction in sepsis via gene expression analysis affy-analysis of affymetrix genechip data at the probe level random forests xgboost: a scalable tree boosting system ridge estimators in logistic regression scikit-learn: machine learning in python minimum redundancy feature selection from microarray gene expression data min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data assessing the accuracy of prediction algorithms for classification: an overview the use of the area under the roc curve in the evaluation of machine learning algorithms. 
pattern recogn keggprofile: an annotation and visualization package for multi-types and multi-groups expression data in kegg pathway kyoto encyclopedia of genes and genomes benchmarking sepsis gene expression diagnostics using public data critical care medicine the slc39 family of zinc transporters. molecular aspects of medicine analysing differential gene expression in cancer edger: a bioconductor package for differential expression analysis of digital gene expression data genetic algorithms in feature and instance selection. knowl-based syst biomarker discovery in inflammatory bowel diseases using network-based feature selection network-analyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis bayesian approach to single-cell differential expression analysis nature methods bias, robustness and scalability in single-cell differential expression analysis not applicable. supplementary information accompanies this paper at https://doi.org/10. 1186/s12920-020-00771-4.additional file 1. supplementary tables s1-s4. the original dataset can be downloaded from ncbi geo repository (accession number gse66099). the normalized and pre-processed gene expression profiles can be obtained from the synapse portal accessed at https://doi.org/10.7303/syn5612563.ethics approval and consent to participate not applicable. not applicable. 
the authors declare no conflict of interest.received: 19 february 2020 accepted: 19 august 2020 key: cord-007259-vj9tv3or authors: guo, feng-biao; dong, chuan; hua, hong-li; liu, shuo; luo, hao; zhang, hong-wan; jin, yan-ting; zhang, kai-yue title: accurate prediction of human essential genes using only nucleotide composition and association information date: 2017-06-15 journal: bioinformatics doi: 10.1093/bioinformatics/btx055 sha: doc_id: 7259 cord_uid: vj9tv3or motivation: previously constructed classifiers in predicting eukaryotic essential genes integrated a variety of features including experimental ones. if we can obtain satisfactory prediction using only nucleotide (sequence) information, it would be more promising. three groups recently identified essential genes in human cancer cell lines using wet experiments and it provided wonderful opportunity to accomplish our idea. here we improved the z curve method into the λ-interval form to denote nucleotide composition and association information and used it to construct the svm classifying model. results: our model accurately predicted human gene essentiality with an auc higher than 0.88 both for 5-fold cross-validation and jackknife tests. these results demonstrated that the essentiality of human genes could be reliably reflected by only sequence information. we re-predicted the negative dataset by our pheg server and 118 genes were additionally predicted as essential. among them, 20 were found to be homologues in mouse essential genes, indicating that some of the 118 genes were indeed essential, however previous experiments overlooked them. as the first available server, pheg could predict essentiality for anonymous gene sequences of human. it is also hoped the λ-interval z curve method could be effectively extended to classification issues of other dna elements. 
availability and implementation: http://cefg.uestc.edu.cn/pheg supplementary information: supplementary data are available at bioinformatics online. catalogs of essential genes on a whole-genome scale, determined using wet-lab methods, are available for a large number of prokaryotic and eukaryotic organisms which are provided in deg and ogee databases (chen et al., 2012; luo et al., 2014) . computational methods with high accuracy offer an appealing alternative method for identifying essential genes. computational methods are broadly divided into three types: machine learning-based methods combining intrinsic and context-dependent features (cheng et al., 2013; deng et al., 2011) , flux balance analysis-based methods (del rio et al., 2009; gatto et al., 2015; kuepfer et al., 2005) and homology search and evolutionary analysis-based methods (peng et al., 2012; wei et al., 2013) . with respect to essential gene prediction in bacteria, we integrated the orthology and phylogenetic information and subsequently developed a universal tool named geptop (wei et al., 2013) , which has shown the highest accuracy among all state-of-the-art algorithms. some studies have focused on essential gene prediction in eukaryotic genomes. in 2005, chen and xu investigated protein dispensability in saccharomyces cerevisiae by combining highthroughput data and machine learning-based methods (chen and xu, 2005) . in 2006, seringhaus et al. reported a machine learningbased method that integrated various intrinsic and predicted features to identify essential genes in yeast s.cerevisiae genomes (seringhaus et al., 2006) . yuan et al. integrated informative genomic features to perform knockout lethality predictions in mice using three machine learning-based methods (yuan et al., 2012) . lloyd et al. 
analyzed the characteristics of essential genes in the arabidopsis thaliana genome and used a.thaliana as a machine learning-based model to transform the essentiality annotations to oryza sativa and s.cerevisiae (lloyd et al., 2015) . recently, three research teams approximately identified 2000 essential genes in human cancer cell lines using crispr-cas9 and genetrap technology (blomen et al., 2015; hart et al., 2015; wang et al., 2015) . their results showed high consistency, which further confirmed the accuracy and robustness of the essential gene sets (fraser, 2015) . these studies provided an in-depth analysis of tumor-specific essential genes and feasible methods to screen tumor-specific essential genes (fraser, 2015; hart and moffat, 2016) . the essential genes screened by these three teams provided a clear definition of the requirements for sustaining the basic cell activities of individual human tumor cell types. practically, these genes can be regarded as targets for cancer treatment (fraser, 2015) . the data from these three groups provided a rare opportunity to theoretically study the function, sequence composition, evolution and network topology of human essential genes. one of the most important and interesting theoretical issues in modern biology is whether essential genes and non-essential genes can be accurately classified using computational methods. the models established in the aforementioned three eukaryotic organisms, s.cerevisiae (chen and xu, 2005; seringhaus et al., 2006) , mus musculus (yuan et al., 2012) and a.thaliana (lloyd et al., 2015) , involved intrinsic features, or intrinsic and context-dependent features. these context-dependent features included those features extracted from experimental omics data. however, the features derived from experimental data are frequently unavailable; consequently, this type of machine learning model cannot be extended to a wide range of genomes. 
in the present study, we addressed this problem in humans by using only intrinsic features derived from sequences, from which certain features can be characterized using a λ-interval z curve. the λ-interval z curve considered information of both adjacent nucleotide compositions and internal nucleotide associations. to facilitate its use by interested researchers, we have provided a user-friendly online web server, pheg, which can be freely accessed without registration at http://cefg.uestc.edu.cn/pheg. we extracted the gene essentiality data from the deg database (http://tubic.tju.edu.cn/deg/), the updated version of which contained human gene essentiality information from three recent works (blomen et al., 2015; hart et al., 2015; wang et al., 2015) . these essentiality annotations serve as the basis for constructing our benchmark dataset. the flowchart shown in figure 1 illustrates the construction of the positive and negative datasets. in the three studies, 13 datasets were provided. because lines hct116 and kbm7 are represented by two datasets each, 11 cancer cell lines (kbm7, k562, raji, jiyoye, a375, hap1, dld1, gbm, hct116, hela, rpe1) are involved in total. blomen et al. and wang et al. identified the essential genes in the kbm7 cell line. we combined these two datasets into one gene set, kbm7. a total of 2073 and 386 essential genes were contained in the two datasets for hct116. the 386 genes in the smaller dataset were markedly different from those in the datasets for the other cell lines, so this dataset was excluded. ultimately, 11 essential gene sets were obtained, each corresponding to a single cell line. essential genes, by definition, are indispensable for the survival of organisms under optimized growth conditions. those genes are considered the foundation of life (juhas et al., 2011) , and they should be persistent in a wide range of cell lines and species.
therefore, we only retained genes that were identified as lethal genes in more than half of the investigated cell lines. when a gene appeared as essential in more than six cell lines (11/2 ≈ 6), it was selected as one sample in the positive dataset. according to this principle, we obtained a total of 1518 essential genes. we downloaded all of the protein coding gene sequences from the ccds database (harte et al., 2012) (https://www.ncbi.nlm.nih.gov/ccds/ccdsbrowse.cgi), and the annotations of protein coding genes were obtained from the hgnc database (http://www.genenames.org/cgi-bin/statistics, march 1, 2016), which contained 19 003 annotation entries. the essential gene sequences were extracted according to the annotations, and genes with no counterpart in the ccds database were excluded. according to this criterion, we excluded 2 genes and obtained 1516 essential genes. we used the essential gene annotation in the deg dataset, and the gene sequences were extracted from the ccds because the former did not contain the information for non-essential genes. for human essentiality annotations in the deg database, a number of scattered annotated essential genes aside from those in the 11 cell lines were identified. a total of 28 166 essential gene annotated entries (including conditional essential gene annotated entries) were obtained. among these annotations, there were many repeated annotation entries; therefore, there were considerably fewer unique entries. to obtain a more reliable negative dataset, i.e. absolutely non-essential genes, we excluded all of the human essential genes annotated in the deg database (luo et al., 2014) from the list of the protein coding genes. the remaining genes were regarded as the negative dataset, and their gene sequences were extracted from the ccds database. genes with no counterpart in the ccds database were also excluded. a total of 10 499 non-essential genes were obtained using this method.
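the construction of the positive and negative sets can be summarized with a short python sketch. it is only a schematic of the set logic described above; the function name `build_benchmark` and its arguments are hypothetical, and the vote threshold is exposed as a parameter because the text states the rule both as "more than half" of 11 cell lines and as "more than six" cell lines.

```python
from collections import Counter


def build_benchmark(cell_line_sets, annotated_essential, coding_genes,
                    ccds_genes, min_lines=6):
    """Return (positive, negative) gene sets.

    cell_line_sets      : dict cell line -> iterable of essential gene names
    annotated_essential : every gene with any essentiality annotation
    coding_genes        : all annotated protein coding genes
    ccds_genes          : genes that have a sequence in CCDS
    min_lines           : minimum number of cell lines (assumed 6 of 11)
    """
    votes = Counter(g for genes in cell_line_sets.values() for g in set(genes))
    positive = {g for g, n in votes.items()
                if n >= min_lines and g in ccds_genes}
    # Exclude anything ever annotated as essential, including genes that
    # appear in only a few cell lines, to keep the negative set conservative.
    ever_essential = set(annotated_essential) | set(votes)
    negative = {g for g in coding_genes
                if g not in ever_essential and g in ccds_genes}
    return positive, negative
```

genes that are essential in only a minority of cell lines end up in neither set, which mirrors the exclusion of all annotated essential genes from the negative dataset.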
ultimately, a total of 12 015 gene entries were obtained in the benchmark dataset: 1516 essential genes and 10 499 non-essential genes. (fig. 1. description of the construction of the human essential and non-essential gene datasets.) the protein coding gene annotations are provided in supplementary material s1, and information for the benchmark dataset is provided in supplementary material s2. the originally proposed z curve variables reflect single-nucleotide composition, using features derived from the phase heterogeneity of single nucleotides (zhang, 1997; zhang and chou, 1994; zhang and zhang, 1991, 1994). herein, we provide a summary of the z curve method used for gene identification (zhang and wang, 2000). let us suppose that the frequencies of bases a, c, g and t occurring in an orf or a gene fragment at positions 1, 4, 7, . . .; 2, 5, 8, . . .; and 3, 6, 9, . . . are represented by a_1, c_1, g_1, t_1; a_2, c_2, g_2, t_2; and a_3, c_3, g_3, t_3, respectively. those 12 symbols represent the frequencies of the bases at the 1st, 2nd and 3rd codon positions, respectively. according to the symbols defined above, the universal z curve mathematical expression is given as equation (1) (zhang and wang, 2000). because composition bias for oligonucleotides exists in coding dna sequence (cds) regions or open reading frames (orfs), the adjacent w-nucleotides z curve method was proposed (gao and zhang, 2004; guo et al., 2003). let us suppose that w represents the length of the adjacent nucleotide sequence. the z curve variables for the phase-specific adjacent w-nucleotides can be calculated according to equation (2), where k equals 1, 2 or 3 to indicate that the first oligonucleotide base is situated at the 1st, 2nd or 3rd codon position, respectively. recent studies demonstrated the existence of long-range associations in chromosomes and showed that these associations are crucial for gene regulation (fullwood et al., 2009).
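for reference, equations (1) and (2) can be written in the following form. this is a reconstruction from the definitions above and from the standard z curve formulation, not a copy of the typeset originals; the symbol n (the first w−1 bases of an adjacent w-nucleotide) and the notation p^k are labels assumed here for illustration.

```latex
% equation (1): phase-specific single-nucleotide z curve,
% with i = 1, 2, 3 the codon position
\begin{aligned}
x_i &= (a_i + g_i) - (c_i + t_i),\\
y_i &= (a_i + c_i) - (g_i + t_i),\\
z_i &= (a_i + t_i) - (g_i + c_i), \qquad i = 1, 2, 3.
\end{aligned}

% equation (2): phase-specific adjacent w-nucleotide z curve,
% with N any oligonucleotide of length w-1 and k = 1, 2, 3 the phase of its first base
\begin{aligned}
x^{k}_{N} &= \bigl[p^{k}(NA) + p^{k}(NG)\bigr] - \bigl[p^{k}(NC) + p^{k}(NT)\bigr],\\
y^{k}_{N} &= \bigl[p^{k}(NA) + p^{k}(NC)\bigr] - \bigl[p^{k}(NG) + p^{k}(NT)\bigr],\\
z^{k}_{N} &= \bigl[p^{k}(NA) + p^{k}(NT)\bigr] - \bigl[p^{k}(NG) + p^{k}(NC)\bigr].
\end{aligned}
```

written this way, equation (2) yields 3 coordinates × 3 phases × 4^(w−1) prefixes, and it reduces to equation (1) when w = 1 and n is empty.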
although the two adjacent nucleotides in the primary structure have no association in some cases, strong associations in terms of tertiary structure might exist. therefore, we introduced the k-interval z curve to virtually represent the interval range association. the details of this method are described as follows. let us use p_k(s_w x) to represent the frequency of the oligonucleotide s_w x in genes or orfs, where x is one of the four bases a, t, g and c. to facilitate this presentation, the length of the oligonucleotide s_w is denoted w. according to these definitions, we generated the universal equation for the k-interval z curve based on z curve theory as equation (3), in which the coordinates x, y and z represent the accumulation of the three base groups classified according to chemical bond properties. the subscript k of p_k denotes the phase-specific index of the first base in the nucleotide sequence s_w, and the interval parameter k (the 'k' of 'k-interval') represents the number of intervals between s_w and the base x. the first base in the oligonucleotide s_w is located at the position given by the phase index. the core part of the k-interval z curve is the formation of the oligonucleotide s_w x. a schematic diagram of the formation of these oligonucleotides is shown in figure 2. the oligonucleotide window s_w slides along a dna sequence according to phase, forming oligonucleotide sets with the base x, which is k intervals away from the last base of s_w. the periodicity derived from three codons is denoted as s_w x. when w is equal to 1 and the interval k is equal to 0, equation (3) can be transformed into equation (1). when w is more than 1 and the interval k is equal to 0, equation (3) can be transformed into equation (2). thus, the phase-specific single nucleotide z curve and the phase-specific adjacent w-nucleotide z curve are incorporated into the k-interval z curve. using the k-interval z curve, we can extract more features to characterize dna sequences.
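the windowing scheme described above can be sketched as follows. this is a minimal illustration, not the authors' implementation: the function name is hypothetical, frequencies are assumed to be normalized by the number of valid windows in each phase, and `w` here counts only the bases of s_w (the base x is extra), so an interval of 0 means that x immediately follows s_w.

```python
from itertools import product

BASES = "ACGT"

def k_interval_zcurve(seq, w, interval):
    """Return {(phase, s_w): (x, y, z)} for a DNA sequence.

    phase is 1, 2 or 3 (codon position of the first base of s_w); s_w is an
    oligonucleotide of length w; the paired base X lies `interval` positions
    beyond the last base of s_w.
    """
    seq = seq.upper()
    features = {}
    for phase in range(3):
        counts = {}
        total = 0
        # Slide the window codon by codon, starting at this phase.
        for start in range(phase, len(seq), 3):
            pos_x = start + w + interval
            if pos_x >= len(seq):
                break
            s_w = seq[start:start + w]
            base = seq[pos_x]
            # Skip windows containing ambiguous symbols such as N.
            if set(s_w + base) <= set(BASES):
                counts[(s_w, base)] = counts.get((s_w, base), 0) + 1
                total += 1
        for oligo in map("".join, product(BASES, repeat=w)):
            p = {b: (counts.get((oligo, b), 0) / total if total else 0.0)
                 for b in BASES}
            x = (p["A"] + p["G"]) - (p["C"] + p["T"])  # purine vs pyrimidine
            y = (p["A"] + p["C"]) - (p["G"] + p["T"])  # amino vs keto
            z = (p["A"] + p["T"]) - (p["G"] + p["C"])  # weak vs strong H-bond
            features[(phase + 1, oligo)] = (x, y, z)
    return features

feats = k_interval_zcurve("AAAAAAAAA", w=1, interval=0)
print(len(feats), feats[(1, "A")])
```

each call returns 3 phases × 4^w oligonucleotides, each with 3 coordinates, which agrees with the 3 × 3 × 4^w count quoted in the text for intervals greater than 0.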
according to equations (1) and (2), when w is greater than 0 and k is equal to 0 we can obtain 3 × 3 × 4^(w−1) variables to characterize dna sequences. when w and k are greater than 0, we can obtain 3 × 3 × 4^w variables. for convenience, we used fv w,k to represent the variables with the length of w for oligonucleotides s_w, where the highest interval length between oligonucleotides s_w and base x is k. to obtain more information from a dna sequence, the final variable is described as fv w,k, where w represents the longest oligonucleotides and k is the highest interval. this variable can be represented as the union of the individual variable groups, where the symbol '∪' represents the union set of fv w,k. fv 2,0 and fv 3,0 are the combination of adjacent phase-specific w-nucleotide z curve variables. we performed this prediction with w ranging from 2 to 4 and k ranging from 0 to 5. according to the discussion above, we obtained 4545 variables for fv 4,5. linear svms play a key role in handling ultra-large-scale data because of their effectiveness, speed and good generalization in training and prediction. liblinear, designed by fan et al. (2008), is an easy-to-use, freely available software tool to manage large sparse data. the new version of liblinear (version 2.1-4) supports not only classification, such as l2-loss and l1-loss linear support vector machine, but also regression, such as l2-regularized logistic regression. given the ultra-high-dimensional feature vectors and large samples contained in the benchmark dataset in the present study, we used the liblinear software package for prediction. the new version of liblinear can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/liblinear/. in the present study, we used the 5-fold cross-validation test to determine the best penalty parameter c, with c ranging from 2^-18 to 2^10. we further adopted the jackknife test to assess the predictive power of our classifier.
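the variable counts quoted above can be checked with a short calculation. the sketch below assumes that the full feature set is made up of the 19 groups listed later in the text (w = 1 to 3 with k = 0 to 5, plus w = 4 with k = 0), which is one grouping consistent with the stated total; the function name and the integer exponent step of the c grid are assumptions.

```python
def group_size(w, k):
    """Number of variables in group fv(w, k), using the counts given in the text."""
    if w < 1 or k < 0:
        raise ValueError("w must be >= 1 and k must be >= 0")
    # 3 coordinates x 3 phases x number of distinct oligonucleotides
    return 9 * 4 ** (w - 1) if k == 0 else 9 * 4 ** w

# 18 groups for w = 1..3 and k = 0..5, plus the single group fv(4, 0).
GROUPS = [(w, k) for w in (1, 2, 3) for k in range(6)] + [(4, 0)]
total = sum(group_size(w, k) for w, k in GROUPS)

# Candidate penalty parameters c = 2^-18 ... 2^10 (integer exponents assumed).
C_GRID = [2.0 ** e for e in range(-18, 11)]

print(len(GROUPS), total, len(C_GRID))
```

under these assumptions the groups sum to 189 + 756 + 3024 + 576 variables, which reproduces the total of 4545 reported for fv 4,5.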
the area under the receiver operating characteristic (roc) curve, the auc, is often used to measure the performance quality of a binary classifier. an auc of 0.5 is equivalent to random prediction, whereas an auc of 1 represents a perfect prediction. the auc is not biased by class imbalance when evaluating performance on an unbalanced dataset. therefore, we adopted the auc as the cross-validation criterion in the present study. first, the predictive power of a classifier can be influenced by the relevance and noise in the original features. second, high-dimensional features increase the time needed for training and prediction. feature selection (fs) technology is a powerful method for the removal of noise and redundant features from the original features. hence, the dimension of the features can be reduced. recursive feature elimination (rfe) through svm linear kernels is a powerful fs algorithm (guyon et al., 2002), but this method does not consider correlation bias. yan and zhang (2015) proposed an improved method, called svm-rfe + correlation bias reduction (cbr), which incorporates the cbr. the main concept is that the ranking criterion can be directly derived from the svm-based model. the feature with the smallest weight is excluded in each run. the training process is repeated, incorporating cbr, until the ranks of all features are obtained. we used svm-rfe + cbr fs technology to perform feature selection and improve the performance of the classifier. the final features of this method are described by fv w,k, which contains information on the composition of the adjacent w-nucleotides (gao and zhang, 2004) and k-interval nucleotides. the association information is also captured by fv w,k. therefore, this method achieves improved performance compared with using the original z curve. the following results confirm this point.
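the auc used as the evaluation criterion above can be computed directly from classifier scores. the following sketch uses the rank-based (mann-whitney) definition, i.e. the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one; it is an illustrative implementation with a hypothetical function name, not the evaluation code used in the study, and its quadratic running time is only suitable for small examples.

```python
def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

because the value depends only on how positives rank relative to negatives, it does not change when the ratio of negative to positive samples changes, which is the property the text relies on for the unbalanced benchmark.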
we performed a 5-fold cross-validation test with w ranging from 2 to 4 and k ranging from 0 to 5. the detailed results are provided in table 1, showing that area under the curve (auc) values gradually increased with increasing k when w was fixed. an examination of the performance under variable values for w and fixed values for k revealed that the performance of the classifier improved with increasing values for w. as shown in table 1, we obtained an auc value of 0.8002 under fv 2,0. however, after utilizing the k-interval nucleotide composition, the performance was improved; for example, the best auc achieved for this model was 0.8449 through the 5-fold cross-validation test under variable fv 4,5. the auc thus improved by 0.0447 (4.47 percentage points) compared with fv 2,0. the information redundancy and noise in the original features can influence the predictive power of a classifier, and high-dimensional features also increase the time costs for training and prediction. fs technology can mitigate these disadvantages. the svm-rfe + cbr method was adopted to rank these features in descending order based on the contribution of each feature. subsequently, the top 100 features were used to constitute the initial feature subset to train and test the model, and the next 100 features were added into the feature subset, followed by prediction using the same methods. this process was repeated until the top 4500 features had been added according to the rank order. the test results of each model were evaluated according to the auc scores via a 5-fold cross-validation test. the auc values for different top features are shown in figure 3a. among all 4545 features examined, the best auc of 0.8814 was achieved for the top 800 selected features. this auc value was 0.0812 (8.12 percentage points) higher than that for fv 2,0. to conduct an objective evaluation of this method, we performed a rigorous jackknife test based on the top 800 selected features using the parameters determined via a 5-fold cross-validation test.
we obtained an auc value of 0.8854. as expected, excellent performance was obtained after adopting the k-interval nucleotide composition and feature selection technology. those results illustrated that the essentiality of human genes could be well reflected by sequence information alone. as we are extremely interested in the actual essential genes in the predicted results, we used the positive predictive value (ppv) to further refine the evaluation. this evaluation index can be calculated using the formula tp/(tp + fp), where tp (true positive) and fp (false positive) represent the numbers of real essential and non-essential genes among the positive predictions. therefore, the ppv reflects the proportion of actual essential genes among the predicted essential genes. we obtained a ppv of 73.05% (tp = 515, fp = 190) using the jackknife test based on the top 800 features. sensitivity (sn) and specificity (sp) reflect the correctly predicted percentages of positive and negative samples, respectively. to give a comprehensive evaluation, we additionally calculated sn and sp. we obtained an sp of 98.19% and an sn of 33.97% under the default threshold of liblinear. note that the sn is much lower than the sp; this is caused by the unbalanced dataset (the number of non-essential genes is much larger than the number of essential genes). one of the simplest cross-validation tests is the holdout method. in this procedure, the dataset is separated into two subsets, namely, training and testing datasets. we randomly sampled one-fifth of the positive and negative samples from the benchmark dataset to train the model, and the remaining samples were used as the testing dataset. to comprehensively assess the method used in the present study, we repeated the holdout method 100 times. the composition of the training and testing samples differs in every holdout run. the mean auc score was used as the final evaluator. a mean auc score of 0.8537 with a variance of 1.67e-05 was obtained.
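the ppv, sn and sp values reported above follow from the stated counts and the sizes of the positive and negative datasets. the short check below recomputes them; the function name is hypothetical, and the false negative and true negative counts are derived here from the dataset sizes rather than quoted from the source.

```python
def confusion_metrics(tp, fp, n_pos, n_neg):
    """Derive ppv, sensitivity and specificity from tp, fp and the class sizes."""
    fn = n_pos - tp  # essential genes that were missed
    tn = n_neg - fp  # non-essential genes correctly rejected
    return {
        "ppv": tp / (tp + fp),
        "sn": tp / n_pos,
        "sp": tn / n_neg,
        "fn": fn,
        "tn": tn,
    }

# Jackknife counts from the text: tp = 515, fp = 190,
# with 1516 essential and 10 499 non-essential genes in the benchmark.
m = confusion_metrics(tp=515, fp=190, n_pos=1516, n_neg=10499)
print({k: round(100 * v, 2) for k, v in m.items() if k in ("ppv", "sn", "sp")})
```

the gap between sn and sp is visible in the counts themselves: only 190 of 10 499 negatives are misclassified, whereas about two-thirds of the 1516 positives fall below the default decision threshold.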
additionally, the proportions of samples in the training and testing datasets were changed for further investigation. one-tenth of the positive and negative samples were randomly sampled as the training dataset, and the remaining samples were used as the testing dataset. this procedure was repeated 100 times. we obtained a mean auc score of 0.8347 with a variance of 2.77e-05. these results further confirmed that this method was robust and accurate. features with fixed w and k values correspond to a specific group of variables. a total of 19 such groups were obtained, namely, fv 1,0, fv 1,1, fv 1,2, . . ., fv 1,5; fv 2,0, fv 2,1, . . ., fv 2,5; fv 3,0, fv 3,1, . . ., fv 3,5; and fv 4,0. we calculated the percentage of features in these groups, and the results are provided in table 2. for each group, there were two frequencies: p(a), which denotes the actual frequency of features in each group appearing in the top 800 selected features, and p(e), which denotes the expected frequency of the features in each group appearing in the original 4545 features. therefore, p(a) was obtained as the number of selected features in each group divided by 800, and p(e) was calculated by dividing the total number of features in each group by 4545. p(a) and p(e) are listed in columns 4 and 5, respectively. if p(a) is higher than p(e), then the group makes a higher-than-average contribution to the identification of essential genes. we calculated the selection tendency using the formula p(a)/p(e) − 1, and the results are listed in column 6 of table 2. we further conducted a hypergeometric distribution test for each group, and the p values are listed in column 7. figure 3b and table 2 show that fv 2,0 (p = 1.60e-06), fv 2,1 (p = 0.024077) and fv 3,0 (p = 7.89e-05) are preferentially selected and are statistically significant.
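the enrichment calculation described above can be sketched as follows. this is an illustration under stated assumptions: the function names are hypothetical, the test is assumed to be a one-sided upper-tail hypergeometric test, and the count of 20 selected features used in the example is invented for demonstration and is not a value from table 2.

```python
from math import comb

def selection_tendency(n_selected_in_group, group_size,
                       n_selected=800, n_total=4545):
    """p(a)/p(e) - 1: relative over- or under-representation of a group."""
    p_a = n_selected_in_group / n_selected
    p_e = group_size / n_total
    return p_a / p_e - 1

def hypergeom_upper_tail(k, group_size, n_selected=800, n_total=4545):
    """P(X >= k) when n_selected features are drawn without replacement."""
    upper = min(group_size, n_selected)
    favourable = sum(
        comb(group_size, i) * comb(n_total - group_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic; converted to float only in the final division.
    return favourable / comb(n_total, n_selected)

# Hypothetical example: a group of 36 variables, 20 of which are in the top 800.
tendency = selection_tendency(20, 36)
p_value = hypergeom_upper_tail(20, 36)
print(round(tendency, 3), p_value)
```

a positive tendency together with a small tail probability indicates that a group is selected more often than its share of the 4545 features would predict by chance.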
these results demonstrated that there are strong signals for classifying essential and non-essential genes when the character interval is equal to zero or one, but the other groups did not show these strong signals. to further confirm this result, the variables fv 2,0, fv 2,1, fv 2,2, fv 2,3, fv 2,4, fv 2,5 and fv 3,0, fv 3,1, fv 3,2, fv 3,3, fv 3,4 and fv 3,5 were used as input features. improved performance was obtained under fv 2,0, fv 2,1 and fv 3,0 compared with the other groups (table 2). those results demonstrated that the shorter interval association provides more information. however, longer interval association can still play an independent role. hence, integrating the interval information with the adjacent information could significantly improve the discriminative capacity of our classifier (table 1). for the convenience of interested researchers, we constructed a user-friendly online web server named pheg (predictor of human essential genes), which is freely accessible at http://cefg.uestc.edu.cn/pheg. pheg's algorithm is based on the k-interval z curve. additional parameters are not necessary, making this server convenient to use. pheg can predict whether one or more query genes in fasta format are essential using only the cds region of each gene as input. we integrated logistic regression into the pheg server to estimate the reliability of the predicted results. hence, this server can output a probabilistic estimate as a measurement of gene essentiality for the submitted coding region. this is the first available server for predicting human gene essentiality. by comparison, some computational models have been proposed for other eukaryotes, but none of them provides an online prediction service. we re-predicted the genes in the benchmark dataset via pheg and obtained an auc = 0.9249 and ppv = 83.84%. a total of 612 genes were identified as essential genes among the 1516 positive samples.
this means that the number of false negative samples is 904, which is quite large. however, this situation is pervasive in essential gene prediction because researchers try to keep a high true negative (tn) proportion in order to deal correctly with a very large number of negative samples. among the 10 499 negative samples in our benchmark, 118 were predicted as essential genes by pheg. to estimate how many genes among those predictions are real essential genes, we calculated precisions using 5-fold, 10-fold, 15-fold and 20-fold cross-validation tests, and we obtained precisions of 70.43%, 71.63%, 72.48% and 72.22%, which were approximately 70%. hence, we expect that approximately 82 (118 × 70%) are correctly predicted essential genes. the information for these 118 genes is provided in supplementary material s3. additionally, we used these 118 gene sequences to conduct a blast (basic local alignment search tool) search against essential genes in the genome of mus musculus (mouse). the current mouse essential gene set is accessible in the ogee database (chen et al., 2012). considering that no blast program is embedded in ogee, we downloaded the essential gene annotations (gene_essentiality) from http://ogee.medgenius.info/downloads/ and extracted the essential gene annotations of m. musculus. we obtained the essential gene sequences according to the annotations (http://ogee.medgenius.info/downloads/). a blast search was performed via ncbi-blast-2.2.30+-win64.exe (shiryev et al., 2007) using the data from ogee, and homologs for 20 genes were identified (e value < 1e-100) among the 118 predicted essential genes. the details for these 20 genes are provided in supplementary material s4. the exome aggregation consortium (exac) incorporates high-quality exome sequencing data, and it provides a rare opportunity to investigate the loss-of-function (lof) intolerance of a gene via the quantitative index pli (lek et al., 2016).
herein, the 20 genes identified as essential were further investigated using the exac browser (http://exac.broadinstitute.org/). a two-sample t-test illustrated that the mean pli values of those 20 predicted essential genes and of the remaining non-essential genes in our negative dataset are significantly different (p < 0.05), indicating that the functions of the 20 predicted genes are more vital. these results illustrated that at least a part of these 118 genes have a higher probability of being truly essential genes and have been overlooked in the essential gene screening of previous experimental studies. hence, the pheg server could be used to predict essentiality for anonymous gene sequences of human and closely related species, and identifying novel essential genes using pheg may supplement the list of human essential genes. the z curve has been widely used in the field of bioinformatics for tasks such as protein coding gene identification (chen et al., 2003; guo et al., 2003; guo and zhang, 2006; hua et al., 2015; zhang and wang, 2000), promoter recognition (yang et al., 2008), translation start recognition (ou et al., 2004), recombination spot recognition (dong et al., 2016), and nucleosome position mapping (wu et al., 2013). however, correlation and k-interval nucleotide composition have not been incorporated into the z curve method. in the present study, we present a k-interval z curve based on z curve theory. the dna sequence can be understood as an ordinary character sequence; therefore, the method proposed in the present study has the potential for applications in mining characteristics from other character sequences and can be used as a universal feature extraction method for dna sequences. based on the k-interval z curve, we obtained excellent performance in human essential gene identification.
this excellent performance might be attributable to the following points: first, we introduced the concept of intervals, reflecting association information and the k-interval nucleotide composition. second, we used feature selection technology in the present study. thus, noisy and redundant features could be removed from the original features. table 2 shows the improved performance obtained under fv 2,0, fv 2,1 and fv 3,0 compared with the other variable groups. further comparison with the other feature groups in table 2 shows that the auc values obtained with k-interval variables are smaller than those obtained with adjacent variables. however, the performance can be improved after adding k-interval oligonucleotide association information (see table 1). hence, the k-interval z curve should reflect additional important information for essential genes that cannot be contained in adjacent nucleotide association information. in 2005, chen and xu used a neural network and svm to predict the dispensability of proteins in the yeast s. cerevisiae based on the protein evolution rate, protein-interaction connectivity, gene-expression cooperativity and gene-duplication data (chen and xu, 2005). the next year, seringhaus et al. only used 14 features to predict essential genes in s. cerevisiae and obtained a ppv = 0.69 (seringhaus et al., 2006). yuan et al. assembled a comprehensive list of 491 candidate genomic features to predict a lethal phenotype in a knockout mouse using three machine learning methods (yuan et al., 2012), and the best auc value was 0.782. in 2015, lloyd et al.
investigated the relationship between phenotype lethality and gene function, copy number, duplication, expression levels and patterns, rate of evolution, cross-species conservation, and network connectivity, and the random forest-based model used in this study achieved an auc of 0.81, which is significantly better than that obtained by random guessing (lloyd et al., 2015). those previous studies in three eukaryotes illustrated that classifiers can give satisfactory predictions by combining sequence information with other features. for human essential gene identification, we only used the sequence composition and interval association information in the present study and still obtained an auc of 0.8854. considering that this result is better than the results obtained in previous studies using integrated features, the gene essentiality of the human genome can be accurately reflected based on the sequence information alone. we also surveyed two other properties related to gene essentiality. homologous genes between human and 17 other species were downloaded from hcop (http://www.genenames.org/cgi-bin/hcop). these data were used to calculate how many species maintain genes homologous to each human gene. the results of a two-sample t-test illustrated that there is a significant difference in persistence between essential and non-essential genes (p = 4.9493e-214). the mean persistence for essential genes was 12.7, whereas the mean persistence for non-essential genes was 8.3. we also downloaded the human protein-protein interaction data (biogrid-organism-3.4.139.mitab.zip) from the biogrid database (stark et al., 2006). the results of network topology analysis revealed a significant difference in the degree of connectivity between essential and non-essential genes via the two-sample t-test (p = 1.6160e-229). the mean connectivity degree for essential genes was 76, whereas the mean connectivity degree for non-essential genes was 19.
thus, essential genes tend to maintain persistence in more species and have more neighbors in protein-protein interaction networks than non-essential genes. if these two types of features are integrated in the future, we think the classifier of essentiality could be further improved.
gene essentiality and synthetic lethality in haploid human cells
zcurve_cov: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing sars-cov genomes
ogee: an online gene essentiality database
understanding protein dispensability through machine-learning analysis of high-throughput data
a new computational strategy for predicting essential genes
how to identify essential genes from molecular networks?
investigating the predictability of essential genes across distantly related organisms using an integrative approach
combining the pseudo dinucleotide composition with the z curve method to improve the accuracy of predicting dna elements: a case study in recombination spots
liblinear: a library for large linear classification
essential human genes
an oestrogen-receptor-alpha-bound human chromatin interactome
comparison of various algorithms for recognizing short coding sequences of human genes
flux balance analysis predicts essential genes in clear cell renal cell carcinoma metabolism
zcurve: a new system for recognizing protein-coding genes in bacterial and archaeal genomes
zcurve_v: a new self-training system for recognizing protein-coding genes in viral and phage genomes
gene selection for cancer classification using support vector machines
high-resolution crispr screens reveal fitness genes and genotype-specific cancer liabilities
bagel: a computational framework for identifying essential genes from pooled library screens
tracking and coordinating an international curation effort for the ccds project
zcurve 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes
essence of life: essential genes of minimal genomes
metabolic functions of duplicate genes in saccharomyces cerevisiae
analysis of protein-coding genetic variation in 60,706 humans
characteristics of plant essential genes allow for within- and between-species prediction of lethal mutant phenotypes
deg 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements
gs-finder: a program to find bacterial gene start sites with a self-training method
iteration method for predicting essential proteins based on orthology and protein-protein interaction networks
predicting essential genes in fungal genomes
improved blast searches using longer words for protein seeding
biogrid: a general repository for interaction datasets
identification and characterization of essential genes in the human genome
geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny
z curve theory-based analysis of the dynamic nature of nucleosome positioning in saccharomyces cerevisiae
feature selection and analysis on correlated gas sensor data with recursive feature elimination
human pol ii promoter recognition based on primary sequences and free energy of dinucleotides
predicting the lethal phenotype of the knockout mouse by integrating comprehensive genomic data
recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the z curve
a symmetrical theory of dna sequences and its applications
a graphic approach to analyzing codon usage in 1562 escherichia coli protein coding sequences
analysis of distribution of bases in the coding sequences by a diagrammatic technique
z curves, an intutive tool for visualizing and analyzing the dna sequences
we would like to thank dr. ke yan for providing the open source of the svm-rfe + cbr script, and prof. chun-ting zhang for inspiring discussions and providing invaluable assistance. conflict of interest: none declared.
key: cord-016062-h4vjkufn authors: gontier, nathalie title: evolutionary epistemology and the origin and evolution of language: taking symbiogenesis seriously date: 2006 journal: evolutionary epistemology, language and culture doi: 10.1007/1-4020-3395-8_10 sha: doc_id: 16062 cord_uid: h4vjkufn symbiogenesis is a form of horizontal evolution that occurred 2 billion years ago, with the evolution of eukaryotic cells. it will be argued that, just as we can develop universal selection theories based upon a general account of natural selection, we can also develop a universal symbiogenetic principle that can serve as a general framework to study the origin and evolution of language. (1) horizontal evolution will be compared with and distinguished from vertical evolution. (2) different examples of intra- and interspecific horizontal evolution will be given to show that horizontal evolution is quantitatively and qualitatively the most commonly occurring form of evolution throughout the history of life. (3) finally, three examples are given of how a universal symbiogenesis principle can be implemented in the study of language origins and evolution, more specifically within: (a) the study of language variation, (b) language genes and (c) conceptual blending. the universal selectionist models (campbell, 1959, 1960, 1974, 1977, 1987; cziko, 1995; hull et al., 2001), universal darwinism (dawkins, 1983) or philosophical darwinism (munz, 2001) developed by evolutionary epistemologists are all based upon the evolution of genes by natural selection. these theories, although they are very useful to study the evolution of animals, are not adequate to study phenomena such as language or culture. language and culture do not follow rigid evolutionary schemes analogous to the evolution of genes; rather, they have their own peculiarities that need to be studied in their own right.
to begin with, it is difficult to pinpoint one unit and one level of selection, because languages and cultures can take on many forms. languages are the result of many different elements that are combined: speech, thinking, grammar, semantics, sensory-motor actions, memory, (co-verbal) gesturing, language rules and language games. and in the case of culture, there is the individual that adheres to different (sometimes contradictory) views, that are categorized by a brain; there is the interaction with significant others within a community; and there are cultural artefacts that need to be taken into consideration. these peculiarities, however, can also be formalized, analogous to different evolutionary theories, one of the most important being symbiogenesis. ruse (1985) already pointed out that we have to take darwin seriously, meaning, amongst other things, that our cognitive capacities such as language and culture need to be studied as a product of darwinian evolution. however, here it will be argued that we should also take symbiogenesis seriously. symbiogenesis is a form of horizontal evolution and it will be argued that horizontal evolution is quantitatively and qualitatively the most commonly occurring form of evolution throughout life's history. evolution by natural selection is a form of evolution that, during and within the evolution of life, only plays a minor role within the evolution of animals. given this minor role that natural selection plays within evolution, it is too short-sighted to only develop general normative frameworks based upon neo-darwinian theory to study all of life's phenomena. first of all, it is bad science because it neglects all other evolutionary theories that provide adequate and scientific explanations regarding certain phenomena. 
secondly, this unnecessarily narrows down the options of linguists and anthropologists, leading to the view, widely defended today, that naturalistic approaches cannot adequately address culture or language, because the biology-based theories of language and culture that are introduced by evolutionary epistemologists supposedly cannot account for a diversity of research topics within language and culture (e.g. recursivity, creativity, religion, arbitrary rituals and so on). here a much more optimistic view is given. the above-mentioned criticisms should not be regarded as negative, but as positive. for where natural selection, or where ee(m)-models (gontier, this volume) based upon natural selection fail, it is not necessary to abandon a naturalistic approach altogether. we can turn to other evolutionary frameworks, such as symbiogenesis, that can deal with, and formalize, these phenomena. first, horizontal evolution is distinguished from vertical evolution. secondly, it is shown how both evolutionary concepts are being applied within the study of language. thirdly, symbiogenesis is universalized and it is shown how we can apply this normative framework to the study of phenomena such as language variation and evolution, language genes and conceptual blending. contrary to received wisdom, horizontal evolutionary processes occur quantitatively and qualitatively more often than vertical evolutionary processes. we will start off by defining our concepts. vertical evolution is evolution as we have all learned it at school: it is evolution by natural selection. neo-darwinian theory (ayala, 1978; mayr, 1978, 1983; dawkins, 1983, 1984, 2000; dennett, 1995; gould, 1980, 1982, 1991; maynard smith, 1993) adheres to the view that only speciation leads to the evolution of new species. to give only one standard example of how this speciation takes place, let us look into 'allopatric speciation by peripheric isolation'.
this catchy phrase refers to the following scenario. a subgroup of a population of a certain species gets isolated from the main group (by the eruption of a volcano that burns the ground, leading to the subgroup not crossing this land even if the ground has cooled off, because they do not recognize it as their territory or niche; or by another geographical barrier, such as water because of floods). given that this isolation takes a very long time, it is possible that one or several random mutations occur and spread within the subgroup. again, given enough time, should the subgroup and the main group meet again, it could be that these different groups cannot fertilize each other or cannot produce fertile offspring, while members of the same group can produce fertile offspring. if the latter is the case, then we have a new species. actual examples of allopatric speciation by peripheric isolation have only been reported a couple of times. processes like allopatric speciation by peripheric isolation, together with other such processes (for example species-mate-recognition patterns, 2 see for instance schwartz, 1999) lead, randomly, to the evolution of new species by speciation: new species evolve out of and split off from older, sometimes still existing species. hence a family tree with a branching pattern is regarded as the most suitable way of iconizing vertical evolution (figure 1). 3 this is what darwin talked about: species are not static entities; they evolve out of each other by speciation. neo-darwinian theory, more specifically ernst mayr, later on introduced the term allopatric speciation by peripheric isolation. neo-darwinian theory also added mendelian hereditary laws, genes, mutations and mathematics to this speciation concept (gontier, 2004) to answer the following question: what happens during vertical evolution? the answer to that question is: existing or new characteristics are retained by inheritance (genetically) and spread throughout the population across time, from generation to generation, sometimes leading to speciation or extinction. actually, today, we would not say that it is characteristics, but genes that are retained and inherited by recombination. this is because we do not quite know how genes encode for characteristics (all they seem to encode for are amino acids that form proteins that form tissue). the point is, however, that it is presumed that genes, in a yet to be explained manner, encode for certain characteristics of an animal, and that these genes need to be inherited from one generation to the next, in a vertical fashion.
1 what is being argued here might seem trivial but it is not. most non-biologically schooled scholars think that evolutionary biology is simple; it is not. evolution in some cultures today is part of a way of life, of a philosophical understanding of the world, leading to the idea that everybody knows what evolution is about and that evolution is easily comprehensible. again, it is not.
2 recognizing that someone belongs to the same species and that this organism is of the opposite sex and hence a potential mate.
3 gould (1984, 1991) already emphasized from within neo-darwinian theory that this iconicity however is not suitable to explain evolution, because vertical evolution implies that there is one common point of departure (and beginning) and from there, the tree diversifies, the idea being that the maximal diversification lies at the end of the tree, comparable with a christmas tree turned upside down. however, he showed that the maximal diversity lies not at the end of the tree of life, but somewhere in the middle. after the cambrian explosion, there was a decimation of most phylogenetic branches. only the few phylogenetic branches that did survive this decimation show an enormous diversification within these different phylogenies. there are few branches (body designs), but lots of twigs on these branches (species with a certain body plan), a process that he characterizes as early experimentation and later standardization.
again, we need to be more cautious with our definition: vertical evolution at the animal level implies that two members of the same species, of the opposite sexes, mate, and that during this mating process, one sex cell from each parent merges with the other to form a fertilized cell, from whereupon an embryo develops. so half of the genes of the mother and half of the genes of the father are recombined in a fertilized egg cell (as we shall see under section 2.2.1, this is not totally correct either). vertical here has two meanings: (1) half of the genes of the parents are passed on to the next generation; (2) only those characteristics that are somehow genetically encoded for in the individual can possibly get passed on to the next generation. if they are passed on, these genes and hence the characteristics are retained in the next generation. the more a gene is retained within the next generation through time, because the carrier is fit and hence able to reproduce, the more it is spread throughout the population in general and the more it can get fixed within a species. p1 gives the gene to offspring f1, 4 who mates with x, and passes the gene on to their offspring and these in turn have offspring with y and z and pass the gene on to their offspring. hence an existing gene will spread throughout a population, which does not necessarily lead to the evolution (by speciation) of a new species. sometimes a gene undergoes a random mutation. if this mutated gene is to be passed on to the next generation, a few conditions are given by neo-darwinian theory. the mutation can (possibly, not necessarily) only be passed on to the next generation, if the mutation occurs in one of the sex cells and if this sex cell (a sperm or an egg cell) is used during reproduction to form a zygote.
this means that the organism with the mutation has to be able to reproduce itself. it can only reproduce itself if it is able to survive long enough to find a mate, and it can only find a mate if the two parents recognize each other as potential mates. hence we find the introduction of concepts such as adaptation (to the environment in order to survive long enough) and fitness (to be able to reproduce itself at a maximal rate), concepts that all too often are intertwined. so neo-darwinian theory basically studies two processes: (1) the recombination of genetic material at the level of the sex cells (meiotic recombination); and (2) the possible occurrence of genetic mutations at the level of these sex cells, because if advantageous to the individual and passed on to the next generation, the mutations possibly lead to new species (gontier, 2004). vertical evolution, though, is primarily a zoological concept that can only be applied to a certain degree to animal evolution. most neo-darwinians are zoologists as well. horizontal evolution (figure 1) is evolution through symbiogenesis (section 2.2.1) or hybridization (section 2.2.2) but it can also occur at the inter- and intraspecific level of animal evolution (section 2.2.3). two general definitions can be given: (1) horizontal evolution is the coming together and merging of existing and independently evolved evolutionary lineages (we will return to this definition under sections 2.2.1-2.2.3). (2) horizontal evolution happens when existing characteristics are retained and spread out geographically within members of a population (and across generations through time). the second general definition of what horizontal evolution is about resembles the definition of vertical evolution, but there is a difference: only existing characteristics are retained and spread throughout a population.
neo-darwinians do not explain this phenomenon of horizontal evolution as a form of evolution, rather they regard the process of passing on already existing genes as part of variation. the more variation, the less genes are held in common within members of the same species; the less variation, the more genes are 'fixed' within a population and hence are regarded as (linked to) typical traits of that population. as mentioned above, the modern synthesis focuses on two steps: the sex cells, where genes possibly are passed on from one generation to the next, and possible random mutations that occur within these genes. hence the popular idea put forward by neo-darwinians that animals pass on their genes from one generation to the next (gontier, 2004) . this is not true: animals do not pass on their genes from one generation to the next, they pass on their sex cells (that contain genes) from one generation to the next, and here a horizontal element is involved: namely, two members of the same species, of the opposite sexes, mate and if all goes well a sperm cell penetrates an egg cell, resulting in the formation of a cell with diploid chromosomes. "zoologists, those who professionally study animals, have imposed a distinct concept of species, which they call the 'biological species concept'. coyotes and dogs in nature do not mate to produce fully fertile offspring. they are 'reproductively isolated'. the zoological definition of species refers to organisms that can hybridize-that can mate and produce fertile offspring. thus organisms that interbreed (like people, or like bulls and cows) belong to the same species. botanists, who study plants, also find this definition useful (margulis and sagan, 2002: 4-5, my emphasis) ." there is more to it than a mere definition process. this crucial horizontal step is taken for granted and even ignored by neo-darwinian theory, because of their focus on genes. 
every mating process, however, is a crucial horizontal (temporary merging) process of the parents, and every fertilization is a permanent merging and recombining of different cells that contain (mostly already existing) genes. since it mostly only involves the passing on or recombining of existing genes, i prefer to call this a form of horizontal evolution contrary to regarding this as part of the process of individual variation that occurs because of vertical genetic recombinations without there actually being vertical evolution (because no species evolves or goes extinct). vertical evolution carries connotations of speciation, mutations (the introduction of new genetic material) and branching, which leads to the idea that all animals that belong to the same species, carry the same genes and that these genes are the essential characteristics of that species. horizontal evolution emphasizes the coming together and spreading of already existing genetic material and involves a process worth studying on its own. only prokaryotes (bacteria, viruses) are able to pass on their genes, immediately and directly, within one generation or from one generation to the next. bacteria, who happen to bump into each other, can exchange and donate genetic material: they form a bridge (literally, made from proteins) and exchange genetic material in a direct way. a process that can be understood by the following analogy: imagine that in a coffee house you brush up against a guy with green hair. in so doing, you acquire that part of his genetic endowment, along with perhaps a few more novel items. not only can you now transmit the gene for green hair to your children, but you yourself leave the coffee shop with green hair. bacteria indulge in this sort of casual, quick gene acquisition all the time. bathing, they release their genes into the surrounding liquid. 
if the standard definition of species, a group of organisms that interbreed only among themselves, is applied to bacteria, then all bacteria belong worldwide to a single species. (margulis and sagan, 2000: 93) in contrast, all eukaryotic organisms (protists, plants, animals and fungi) pass on their sex cells (with genetic material) from one generation to the next. although darwin entitled his magnum opus on the origin of species, the appearance of new species is scarcely even discussed in his book. symbiosis [. . . ] is crucial to an understanding of evolutionary novelty and the origin of species. indeed, i believe the idea of species itself requires symbiosis. bacteria do not have species. no species existed before bacteria merged to form larger cells including ancestors to both plants and animals. [. . . ] [l]ong-standing symbiosis led first to the evolution of complex cells with nuclei and from there on to other organisms such as fungi, plants, and animals. (margulis, 1999: 8) as margulis (1999; margulis and sagan, 2000, 2002) shows, all life can be divided into organisms with two basic cell types: prokaryotic organisms and eukaryotic organisms. prokaryotes are all (archae)bacteria (the first kingdom), quantitatively the most common form of life. typical of these bacteria is the fact that they carry genetic material in their cells and that these genes encode for the proteins present in these cells. however, this genetic material is not organized on chromosomes, nor is it encapsulated within a nucleus. as said, these bacteria can exchange genetic material freely in a horizontal fashion. eukaryotes are all organisms that are part of the other four kingdoms of life: protists, animals, plants and fungi (mushrooms and yeast). the cells that make up these eukaryotic organisms, all contain genetic material that is organized on chromosomes, and encapsulated in a protecting nucleus. 
animal, plant and fungal cells, besides their nucleus, also contain organelles, little bodies in the cell that are enclosed by their own membrane, and contain their own genetic material. what is interesting about this genetic material is that, when compared with the genetic material from the nucleus, it shows little to no resemblance to it. however, when the genetic material that is present in all organelles of eukaryotic cells is compared with the genetic material of today's independently existing bacteria (that is, prokaryotes) they show a very high resemblance, so high, that we have to conclude that the organelles that are part of all eukaryotic cells, used to be bacteria that lived independently. somehow, 2 billion years ago, bacteria merged: instead of just exchanging genetic material, whole bodies fused together, they penetrated each other and literally started living in each other, as a form of permanent parasitism. these merged beings evolved into protists and multicellular organisms, ending with the evolution of the fungi, plant and animal kingdoms (figure 2). and, the types of bacteria that fused, still exist today, on their own, thereby excluding any deterministic process: the mergings that occurred, occurred randomly, otherwise we would not see members of these different types alive and on their own today. bacteria fused literally, by cannibalism or enforced parasitism. another interesting aspect of these organelles is that they are passed on from one generation to the next, in a non-mendelian fashion. only eggs contain mitochondria or chloroplasts, sperm cells lack these. so every eukaryotic organism, male or female, receives its organelles with their specific genetic material, from the mother. it is hence not true that we receive half of our genetic material from our mother and half from our father. 5 according to margulis' theory, we can only talk about 'species' at the level of eukaryotic organisms, where cell fusions are the driving forces of evolution, while bacteria can't be distinguished into species, rather they are classified into different types (that belong to one single species). mitosis and meiosis always occur at the level of the cell or between cells. hence according to margulis' view, all eukaryotic evolution, even today, during the reproduction of organisms belonging to these four eukaryotic kingdoms, requires a certain form of symbiogenesis. therefore, this horizontal process needs to be distinguished from a vertical evolution process.
figure 2. the symbiotic mergers that took place between different prokaryotic organisms (all bacteria beneath the dotted line) that led to the evolution of eukaryotic organisms (everything above the dotted line). important are the evolutionary lineages that, instead of forming a splitting pattern, merge (based on margulis, 1999: 41). labels in the figure: paracocci, thermoplasm, spirochetes, cyanobacteria, protoctista, animals, plants, fungi, nucleic cells.
plant hybridization is another form of horizontal symbiotic evolution and plants also by far outnumber animals. plant species that evolved independently from one another can cross-fertilize and produce fertile offspring. this is not a mere vertical process either because what we call incest is a rather common phenomenon in hybridizing plants. p1 can be the result of the hybridization of two different plant species, and f1 hybrids can possibly cross-fertilize again with p1. hybridization can also occur when for instance these f1 hybrids cross-fertilize with yet another plant species and their offspring, for the sake of argument, called f2 hybrids (although they are f1 hybrids of the crossings of f1 with yet another species) and these f2 hybrids can potentially cross-fertilize with p1 or f1 (figure 3).
it is a common thing in plants and indeed these symbiotic mergers are also a form of symbiogenesis, because they always involve the fertilization of whole cells, not merely or solely the passing on of genes. bacteria today still donate genes regularly. that is why, for instance, certain infectious bacteria become resistant to antibiotics. suppose that some bacteria develop a resistance gene: even if they die from a certain antibiotic, bacteria that are alive can pick up this gene from the dead bacteria. horizontal evolution however does not only occur at the prokaryotic level or within the evolution of eukaryotic plants. it can also occur at the intra- and interspecific level within the evolution of eukaryotic life in general, also within the evolution of animals. the sars (severe acute respiratory syndrome), hiv and ebola viruses, for instance (kahn, 2004), are passed on not only between members of the same species (intraspecifically) but also between members of different species (interspecifically). the recently evolved sars-virus is a virus that humans caught as a result of eating or being around the masked palm civet, a cat-sized animal. sars is a variant of a common corona-virus of these masked palm civets. once one human catches this virus, it can spread very rapidly within the human population at an intraspecific level. it can also be passed on from one generation to the next, not because the virus contaminates the sex cells. rather this occurs because when there is no external intervention like putting contaminated individuals in quarantine, the population as a whole 'carries' the virus. newborn babies can catch the disease because their grandparents have it, or grandparents can catch the disease because their grandchildren have it. the same goes for the hiv-virus, which humans caught eating the brains of primates, and primates in turn got the virus eating monkey brains that were infected with the siv-virus.
"hiv itself has been isolated from common chimpanzees, which are believed to be the original source of the aids pandemic after hunters killed and ate them. ironically, [. . . ] chimpanzees acquired their siv from monkeys they killed and eaten." (kahn, 2004: 58) . once the hiv-virus developed and was passed on interspecifically, it remains in a population, because it is passed on intraspecifically. and ebola for instance, is now killing great apes, while humans eating these apes, can again infect the human population with ebola. neo-darwinians do not explain these phenomena as evolution because most viruses do not infect the sex cells and as mentioned above, the researchers only understand two processes to be relevant for the study of evolution: the genes that get passed on sexually, and how these get passed on (mutated or not). however, ontogenetically, the hiv-virus can get passed on through the blood line (when for instance cutting the navel cord, or drinking the mother's milk). immunological processes of resistance against certain viruses, for instance, are also passed on through the mother's milk. infants who are breast-fed are more immune to the development of certain diseases that are caught by viruses or bacteria. this is because through the mother's milk, children receive antibodies (indeed again whole cells) that the mother already made when she, for example, caught this year's flu. so intraspecifically, there is a lot more going on than the mere transmission of genetic material from one generation to the next because of sexual recombination. viruses and bacteria which contain their own genetic material can also be passed on in a horizontal ontological fashion. neo-darwinian theory is not able to account for, or to formalize these sorts of evolution because of their excessive focus on the sex cells with the subsequent genetic variations and possible mutations. 
these forms of contamination, as said, can happen at the inter- and intraspecific level, but horizontal evolution, by means of cross-fertilization, can also happen at the level of animals. we of course all know the mule that is the result of cross-breeding a donkey and a horse. however the mule is infertile and hence neo-darwinians define species as those individuals that, when mating with members of the opposite sex, can produce fertile offspring. however, the giant panda (o'brien and menotti-raymond, 1999) is also the result of cross-breeding between the brown bear and other bear species. their chromosomes reveal these crossings and most importantly, the giant panda is fertile. it is threatened with extinction (because of its vanishing niche but also perhaps because of this genetic load that the animal carries), but the giant panda is nevertheless up until this day, fertile. neo-darwinian theory cannot cope with these different, everyday phenomena. therefore, a horizontal evolutionary concept that can cope is absolutely necessary. that is not to say that neo-darwinian theory is wrong, far from it; all that is being said is that there are different phenomena going on within the evolution of life which can very optimistically be explained from within other evolutionary theories. within the study of language (its origin, evolution and use), both evolutionary concepts are explicitly or implicitly put to use. most especially a vertical evolution concept is used, explicitly within the study of language, while a horizontal evolution concept is used implicitly. a vertical evolutionary concept can be distinguished within the disciplines of historical and theoretical linguistics, structuralism and within the darwinization of language (croft, 2002) and today takes on the form of the 'language-as-species metaphor' (mufwene, 2001).
although historical and theoretical linguistics are separate disciplines today, both can be understood as part of, and the result of, the sociological systems theory movement described in gontier (this volume), for they adhere to the view that language needs to be studied synchronically, as a closed, self-explaining and self-encapsulating system. language is understood as a static, unevolving entity (croft, 2002: 75-78 ) which leads to the entification and reification of language. this essentialism in turn implicitly subscribes to the idea that there is only one (ideal) language or one (grammatical) language structure understood as a platonic archetype which takes on different manifestations. so the idea arises that there is only one ideal language, that diversifies into different languages. essentialist thinking is always about distinguishing the accidental from the essential. de saussure for example developed his three laws. these state that the primary concern of linguistics is about coming to terms with the following three dichotomous relations within language: (a) the relation between lexicon and grammar; (b) the relation between form and meaning and (c) the relation between langue and parole. these dichotomous relations indeed are instruments to distinguish the accidental from the essential and hence are used to discover the core of 'the' language. this has four major consequences: (1) although language can have different manifestations (there are different languages belonging to different language families, there are dialects, and even child language is different from adult language), all these languages belong to the same 'universal' language, because all share the essential properties. the goal of linguistics, according to these theoretical linguists, is hence to distinguish the accidental from the essential and thus to answer the what-is-language question, thereby introducing a functionalistic approach. 
(2) since all languages are different manifestations of one language, all languages are uniform, meaning that there is no directionality to language change (newmeyer, 2003: 64) . if there were directionality, language(s) would evolve and there would be 'lesser' and 'more' languages, but the essential, reified, ideal, universal language is, once evolved, evolutionless. (3) the principle of uniformity adhered to by theoretical and historical linguists, implicitly implies that, since all languages are essentially the same, but different because of contingent and arbitrary elements such as culture and so on, the essential properties of language transcend everyday language use, and indeed the individual itself, which again leads to an entification and reification of language outside an individual organism. this entified structure, which obeys laws of its own, and is not part of the individual members of the species, forms its own structures and behaves on its own. (4) so when we want to understand language evolution, we need to study this structure on its own, using, for example, the internal reconstruction method and search for the point where this one language started to diversify and have different manifestations, in other words: we need to search for 'the' proto-language, because this will show the essential properties of language. this by no means implies that today historical linguists adhere to the idea that it is possible to reconstruct 'the' proto-language or that they believe that there was a proto-language from where all languages developed. newmeyer (2003: 63) gives credible evidence for the fact that we do not know whether there is one language from where all languages developed, but then again we cannot prove that two languages are unrelated either. 
and neither does this imply that these historical linguists themselves believe that their internal reconstruction or their use of the comparative method can shed light on the origin of language (newmeyer, 2003: 71-72). nevertheless, answering the questions whether all languages share a common descent and whether this gives clues as to how language evolved, used to be one of the goals of historical linguistics, and these goals are the ones under review in this article, for biologists have interpreted them in different ways which has given rise to the general academic climate which will be discussed in the next section of this paper (under sections 3.1.3 and 3.2). structuralism evolved out of historical linguistics and here chomsky (1965) makes his entrée. chomsky's main goal was to criticize behaviourism which stated that language can be understood without entering the black box that our brain is. chomsky never denied that we need to understand language from within biology or cognition, on the contrary, this was his main goal. however, he denied that language needs to be studied diachronically, that is, amongst other disciplines making use of evolutionary biology. because language was uniquely human, the evolution of non-linguistic species in itself could not help the study of human language. basically, developing de saussure's ideas further, chomsky distinguished between competence and performance, arguing that only the competence part is relevant for linguistics. this linguistic competence of individuals was believed to be universal: all human beings have access to a universal grammar, a language organ in the brain called the language acquisition device (lad). because performance can vary greatly, competence is what needs attention. therefore, he stated that: "linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogenous speech-community, who knows its language perfectly."
(chomsky in croft, 2002: 76) and hence here too the primary concern of linguists is to distinguish the accidental from the essential, that is, to search for the proper functions of language and to answer the what-question: "hence the logically prior task of elucidating precisely what evolved has taken research priority over elucidating how it evolved." (newmeyer, 2003: 60). essentialistic thinking is always associated with asking the 'what-question' (gontier, 2004). in a very real sense, it was chomsky who brought the 'what-question' into (biological) linguistics. according to plato, the archetypes which he talked about were part of a transcendental reality. one of his students, aristotle, internalized these archetypes in human and other beings and elements, saying that the 'ideal form' that the platonic archetype is, is not part of a transcendental reality, but is potentially part of things. this potential needs to be actualized by a process of becoming. in exactly this way chomsky internalized language, saying that the ideal form is part of the brain and that this potential only needs actualization. the point however, is that the actualization process is not a kind of evolution; it needs to be understood as a process of actualizing what was already inherently there (understood as an unfolding). the result of this thinking is of course that language again gets reified and entified: language in its idealized form (the universal grammar of the lad), once evolved, does not evolve anymore and hence is essentially evolution-less. different from his predecessors, chomsky states that the ideal entified structure is not somewhere out there, rather it is part of the individual, for it forms an organ in the brain. universal grammar is supposed to be part of the brain, where presumably a module is formed, where these universal grammatical structures are somehow stored, hot-wired, without these being able to undergo change.
since natural selection does not work in a manner where something this complex can evolve all at once, without evolutionary intermediates, chomsky denied any role to natural selection and assumed a qualitative emergent evolutionary step, leading, at one leap, to this language faculty. it is, however, one thing to assume that we can deduce ideal grammatical structures or rules from a language, and quite another to adhere to the view that these structures actually are somehow part of our brain. we have not yet been able to localize one grammatical rule within that big brain of ours. as we have seen, historical and theoretical linguists entified and reified language. the problem then was to locate this reified structure. contrary to his predecessors, chomsky localized the entified structure in the human brain, calling it the lad that took on the form of a module. basically, chomsky hence internalized language and combined this view with the principle of uniformity: after all, the universal grammar present in all humans, which forms the basis for the competence of these humans, is biologically the same. this in turn again implies that all languages share a 'common descent', for all language performances are manifestations of this ideal internal structure. when battling against behaviourism, the next logical step was to replace instructionist models (that evolved out of behaviourism and stressed the relation between the language structure out there (in a community) and the learning of that structure by individuals) with selectionist models (where natural selection is internalized (gontier, this volume)). the first problem to be tackled when interested in a biological, evolutionary study of language is to search for evidence of common descent (in the true sense of the words), that is, to search for commonalities between different languages, and to investigate whether these commonalities are the result of random events, or whether they can be explained in a homologue fashion.
"a biologist interested in exploring the evolution of some structural property of some species will first of all avail himself or herself of the comparative method, which involves the identification of homologues to the relevant property in the same species." (newmeyer, 2003: 61) . what is common to all languages was already a problem posed by essentialist theoretical linguists, while what is homologous is a question studied by historical linguists. although the above mentioned scientists never intended to use these theories when tackling questions about language origins, they nevertheless were interpreted in this way by different biologists. pinker and bloom (1990) developed chomsky's ideas further, thereby emphasizing the need for diachronic study of language as well. their main objective is to synthesize neo-darwinian theory with modularity theory and the theory of generative grammar. their idea: "no single mutation or recombination could have led to an entire universal grammar, but it could have led a parent with an n-rule grammar to have an offspring with an n+1 rule grammar, or a parent with an m-symbol rule to have an offspring with an m+1 symbol rule." (pinker and bloom, 1990: 753) this of course is obviously problematic: ee is being applied to fictitious models developed within historical, theoretical and structural linguistics. first of all, there is still no evidence of the existence of a universal grammar that statically is part of our brain. this is still only a hypothesis for we can neither pinpoint a module nor some part or neuron in the brain that contains a grammatical or all universal grammatical rules. it is of course useful to develop this idea further and to look into possible evidence for this theory, but it is another thing altogether to speculate upon a speculation or to assume upon an assumption. 
the position taken by pinker and bloom is an evolutionary epistemological one: given that organisms evolved by natural selection, and given that certain organisms evolved language which is the result of brain activity and other elements, language itself probably evolved by natural selection and is to be understood as an algorithmic process. so far there is no problem. what they take for granted, however, is that natural selection evolved a grammar module, as it evolved other modules (cosmides and tooby, 1994; sperber, 1996), and that these modules are encoded for by genes. this is problematic on three counts: it relies on modularity theory, but no module has been found within the brain to this day; it assumes that there are genes that encode for such modules, but no gene that encodes for a module has been found; and it takes universal grammar for granted, although we have not been able to locate this either. all these speculations are useful for the development of scientific theories, but speculations alone cannot form the basis of a theory. in the analysis above we fully see the merits of ee. ee is not a one-directional discipline that from within biology seeks to develop a normative framework that can be put to use to study other evolutionary phenomena. it is, at a minimum, a two-way process, because difficulties or problems that are obvious in one discipline (for example linguistics) can point to a less obvious but analogous problem within the discipline from which the general framework is obtained (biology). that a vertical evolution concept is being used by pinker and bloom is obvious, and it will be shown that historical linguists make use of such a concept as well. as hull (2002: 18) observes: "in the second half of the 19th century, some exchange took place between historical linguists and evolutionary biologists with respect to the methods that they used [...]."
what is crucial is that both linguists and biologists use tree models to classify languages and species, and even more importantly: at the beginning these tree models were not used to portray historical sequences showing common descent, rather they were a-historical sequences of platonic (and hence essentialist) archetypes. "the transformation from archetypes to ancestors turned out to be much more complicated than anyone at the time expected. the transition is still far from complete. the linnaean hierarchy was devised for a-temporal, abstract relations, not historical sequences." (hull, 2002 : 24) theoretical linguistics implicitly took over these observations and methods from historical linguistics, when they implemented sociological, synchronic methods to study language. the language-as-species metaphor is a direct result of essentialist thinking: languages are entified structures that are either part of the community, or part of the brain. they obey their own laws. turner (this volume) already showed that the biological species concept is still basically essentialist. hull (2002: 25) observes that "[. . . ] linguists can be found joining biologists to argue that languages are as much historical entities as are species [. . . ] . they come into being, split, merge, go extinct, etc." the definition of language given by chomsky equals the way biology looks at 'species'. some members of a species are regarded as ideal representatives of the gene-pool (in technical terms called the 'wild type'), and it is presumed that all organisms share essentially all important genes within a perfectly homogenous population. hence within biology this is an essentialist view of species (gontier, 2004; turner, this volume) and species are still distinguished from one another by dividing accidental from essentialist properties. the wild type is an abstract concept and an objective measure to discover what the essential properties of a species are. 
all other organisms that are part of that species are regarded as varieties of that wild type. the language-as-species metaphor (mufwene, 2001) makes exactly the same mistake as biologists make: adhering to the view that there is one (abstract) ideal type of language that equals a brain structure that is common to all humans and lies at the basis of the competence, while all acts of performance are regarded as accidental variation, different manifestations of that ideal grammatical structure (the linguistic wild type). 6 the family tree models used by historical linguists also emphasize speciation. these speciation models imply that at the origin of language lies one language from which all other languages evolved. just as in biology, where it is generally presumed that life arose from one urcell, called the last common ancestor (gontier, 2004), so it is presumed that language evolved from one proto-language (bickerton, 2002). here one can take two directions: (a) one can, within historical linguistics, try to reconstruct this proto-language, which is regarded as an idealized language, using the comparative and internal reconstruction methods; or (b) one can assume that this proto-language really existed, as biologically inspired scientists like bickerton (2002) propose. this idea, again just as in biology, leads to the assumption that older organisms, or older languages, are less complex and hence more primitive than later developed organisms or languages. hence ideas like the ladder of cone, where organisms are portrayed as evolving towards ever-increasing complexity, which is obviously false. this in turn means that if older 'primitive' or 'first' languages are viewed as less evolved, then younger languages are less complex: therefore, pidgin and creole languages, or child language (aitchison, 1995) for that matter, are explained as a form of proto-language comparable with the first languages ever spoken by human beings.
in other words, these ideas contradict the principle of uniformity, which states that all human languages are equally complex. a way out of this is found when one assumes that proto-language differs from human language, because proto-language is not human-bound; rather, it is presumed to be a characteristic of homo habilis, erectus and neanderthalensis (mellars, 1998; bickerton, 2002). an essentialist view of languages, or species for that matter, implies that one needs to distinguish essential from accidental properties, which in turn leads, for example, to de saussure, who develops three laws to explain what is essential to language. languages and (as) species are thus regarded as static entities that can neither mix nor influence each other because they follow rigid evolutionary lines. language mixing or hybridization cannot be explained using speciation models alone, a point also mentioned by hull (2002: 19): "the metaphor of a tree of life seems just right for both species and languages, but it can also be misleading. as trees are commonly depicted, they are totally a matter of splitting, splitting and more splitting. merger never takes place." hence species or languages 'die' when essential characteristics disappear, and are replaced: it is stated that these species or languages go extinct, rather than that they evolved into something new, with different properties. what evolved is regarded to be a totally new language or species.

6 the term 'manifestations' here is to be taken literally. although i cannot go into this argument any further, mutations, 'quasi-species', are always mutations from the wild type: because the wild type (the dominant genetic sequence in a population, and hence the most reproductive) accidentally mutates the most (just the law of large numbers here), mutations always resemble the wild type, and hence are different manifestations of that wild type (gontier, 2004).

turning to the darwinization of linguistics, again we encounter problems.
today neo-darwinian theory tends to reduce the evolutionary process to the process of adaptation. adaptation stricto sensu means: being able to survive in an environment long enough to reproduce. the emphasis here lies on the survival part. later on, with the development of the modern synthesis, adaptation was defined as being able to produce fertile offspring, which in turn was narrowed down to being able to reproduce at maximal speed (fitness). hence adaptation and fitness today are almost synonymous. more so, the 'being able to reproduce' part, which still assumes organisms that reproduce, is today being replaced with the idea that being adaptive is being able to pass on one's genes to the next generation, or more abstractly into the gene pool, at a maximum rate (gontier, 2004). popular biological scientists such as dawkins, for instance, also introduce theological elements into evolutionary biology, although they claim to do the contrary. just as chomsky introduced functionalistic, aristotelian methodology into linguistics, dawkins introduces aristotelian thinking into biology. dawkins (2000) defends the idea that it is the task of every biologist to answer the question of why organisms show design. here it is presumed that there is such a thing in nature as design: hands are for grasping, eyes are for seeing, legs are for walking, and so on. methodologically, when searching for design features, we need to answer the question of what a certain characteristic is for. the whole point, however, is that in aristotle's philosophy, the 'what'-question and the 'what for'-question are related (gontier, 2004). answering the 'what'-question is equal to distinguishing between the essential and the accidental and hence equal to defining the proper functions. since in aristotle's philosophy every final form is potentially present (it just needs to be actualized by a working agent, which relates to the 'how'-question), the 'what for'-question is related to the 'what'-question.
the 'essence' of a thing or organism (what is it, what is the function) is the same as its 'final goal' (what is it for). so when introducing the 'what for'-question, we need an answer to the 'what'-question, and when both questions are combined, a teleological point of view is taken. given this, let us take dawkins' argument a bit further: are hands also made for colouring, for typing? are our mouths, vocal cords and tongue made for shouting (anti)-governmental slogans? obviously a category mistake is being made here. in evolutionary biology the 'how'-question is, and should be, the most important question raised, instead of trying to give functionalistic accounts based upon concepts like adaptivity with respect to what language evolved for. needless to say, there are numerous accounts of maladaptive and neutral characteristics of animals (kimura, 1976; gould, 1980; gould and vrba, 1982). since we do not know how most things evolved, how can we even begin to answer the question of why they evolved, particularly in a scientific and non-speculative manner? neo-darwinians have worked long and hard, as well as fruitfully, to ban all kinds of theological/teleological thinking within biology. why on earth should we undo this job? pinker and bloom follow the same road pointed out by dawkins: language shows design and, therefore, it should have and must have evolved by natural selection, because somehow natural selection gets understood as the designer: the blind watchmaker (dawkins, 2000). since natural selection works slowly, grammar (under which form exactly, whether a module, a gene or an inherited brain structure, is not clear in their work) evolved stepwise as well, leading to a child with an n+1 rule grammar, while its parents got stuck with an n-rule grammar. how can this advanced and more complex child make itself comprehensible to its less complex parents?
as mentioned in gontier (this volume), ee is about developing a normative framework, based upon evolutionary thinking, that can explain all of phylogenetic evolution, but also all of ontogenetic evolution. ee studies the cognitive capacities of organisms from within evolutionary biology (bradie's and harms' (2001) eem programme) and it studies how theoretic evolutionary models can be put to use to study the products of these cognitive capacities, such as language and culture (bradie's and harms' (2001) eet programme). in practice this means that ee gets reduced to finding mechanisms and universal frameworks analogous to neo-darwinian thinking. the blind-variation-selective-retention scheme of campbell (1987), the generate-test-regenerate scheme of plotkin (1995), and the replication-variation-environmental interaction scheme introduced by hull (see hull et al., 2001, for the most up to date account) all base their universal schemes upon the evolution of genes by natural selection. blind variation refers to the random mutations genes can undergo, which lead to variation; the selective retention phase is about the selection of the advantageous random mutations that are heritably retained and spread within the species through time. the same goes for plotkin's scheme: the genetic material that is generated in the next generation because of sexual reproduction (mutated or not) needs to be tested (by the environment) in order to see whether it is suited for that environment, that is, whether the organism carrying these genes is able to survive long enough to reproduce; through reproduction, the genes are regenerated. hull emphasizes that the level where natural selection acts is the environment which interacts with a phenotype, instead of a replicator (a selfish gene); plotkin emphasizes the randomness of the testing phase; and campbell emphasizes the need for retention.
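the three schemes above share a common loop, which can be made concrete with a minimal python sketch. it is an illustration only, not an implementation taken from campbell, plotkin or hull: the bit-string 'genome', the target-matching fitness function and all names and parameter values (`vary`, `fitness`, `evolve`, `rate`, `pop_size`) are assumptions introduced here.

```python
import random

def vary(genome, rate, rng):
    """Blind variation: flip each bit with probability `rate`, regardless of fitness."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def fitness(genome, environment):
    """Toy 'test' by the environment: count positions matching a target pattern."""
    return sum(g == e for g, e in zip(genome, environment))

def evolve(environment, pop_size=20, generations=50, rate=0.05, seed=0):
    """Run a generate-test-regenerate loop and return the fittest genome found."""
    rng = random.Random(seed)
    n = len(environment)
    # generate: start from random genomes
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # test / selective retention: keep the better-scoring half unchanged
        population.sort(key=lambda g: fitness(g, environment), reverse=True)
        survivors = population[: pop_size // 2]
        # regenerate: survivors reproduce, with blind variation in the copies
        offspring = [vary(rng.choice(survivors), rate, rng)
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=lambda g: fitness(g, environment))
```

the sketch makes visible what the text goes on to criticize: the loop presupposes one clearly defined unit that varies (the genome) and one clearly defined level at which it is tested (the environment), which is exactly what is hard to pin down for language or culture.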
however, all these researchers base their models upon the evolution of genetic material by natural selection and extrapolate from there to processes like science, culture or language. so all of them take darwin seriously, or to be more precise: they take ruse (1985) seriously, who states that we should take darwin seriously, although darwin himself has little to do with neo-darwinian theory (gontier, 2004). darwin talked about gemmules that can blend, while neo-darwinians talk about genes that are impenetrable. these frameworks surely have their merits, and have helped different disciplines within biology and also throughout the study of complex phenomena within the life sciences; there is no question about that. croft (2000), for example, is the first one to actually use one of these normative frameworks introduced by evolutionary epistemologists. he develops a framework of language variation using hull's replication-variation-environmental interaction scheme. the replicator (the unit of selection) in his view is the lingueme: grammatical structures that are replicated in the utterances of people. variation arises because of phonological and semantic differences that occur in these utterances; and the environmental interaction (the level of selection) refers to the population of utterances (analogous to the gene pool) of people which interact with other such utterances within a language community. the merit of croft is that he takes the individual organism as the actor: language is not part of some superorganic structure, it is not out there in the community: language is part of a human being who in his utterances produces grammatical structures.
however, two problems arise here. first, although croft is a fierce critic of chomskyan linguistics, he does not seem to be able to completely abandon chomsky's view on language: the universal grammatical rules are unchangeable, perfectly and idealistically part of a brain, and these structures are replicated in an imperfect way within the utterances of individuals. so here he is ambivalent, rejecting chomsky while still using his ideas. a second problem arises with respect to his idea of a lingueme pool that is analogous to a gene pool, an ambivalence that can also be read in the works of hull, from which croft obtains his evolutionary framework: although the level of selection is the environment and the unit of selection is the entire organism, which interacts with that environment through the genes that make up its phenotype, ultimately genes or linguemes are all that counts. because of these extraordinary developments and the progress made within ee, the time has come to go one step further and to apply evolutionary thinking there where even natural selection, which has been so helpful, fails. natural selection cannot explain all of life's phenomena (if it were able to do this, then and only then it would be unscientific, for a theory that can explain everything explains nothing). as said in gontier (this volume), it is difficult to apply these rigid schemes to language or culture, because it is difficult to pinpoint one unit or level of selection within the evolution of language and culture. therefore, it is proposed in this article that a symbiogenetic view can complement the study of language from within neo-darwinian theory. just as the above schemes are generalizations made from evolution by natural selection, so can we develop a general framework using symbiogenesis. freeman dyson (1998) was the first to develop a 'universal symbiogenesis theory' that he applied to the evolution of universes and stars, within cosmology.
his definition, however, only needs a few adjustments (between brackets), to be useful for the purpose of this article: "universal symbiogenesis is the [re]attachment of two [or more] structures, after they have been detached from each other and have evolved along separate paths for a long time, so as to form a combined structure with behaviour not seen in the separate components. " (dyson, 1998: 121) symbiogenesis thus falls into place as a form of emergent evolution, known by the catchy phrase 'the whole is more than the sum of its parts'. within universal symbiogenesis, different elements (neither numbered in advance, nor defined in advance, not implying that these elements need to be replicated faithfully, or have longevity or fecundity) that evolved along separate lines (after they got split off from each other or even when they never showed signs of common descent) somehow are combined and this new combination (a structure, element) has new properties, that cannot be reduced to the parts that form the new structure. this universal symbiogenetic process can be implemented in the study of language evolution in at least three ways: in the study of language variation (section 4); language genes (section 5) and within the study of conceptual blending (section 6). implicitly, sociolinguistics (also called socio-historical linguistics) and anthropological linguistics use a horizontal symbiogenetic concept of language evolution. sociolinguists and social anthropologists study language variation or language change within a community or a subpopulation of that community and most importantly, they focus on the performance level: they study language as it is actually used by real speakers (croft, 2002) . they broaden the linguistic synchronic view that states that language can only be understood from within language, and use diachronic studies as well. 
hence they contextualize language, using and searching for political, economic, social and cultural factors that influence, affect or even cause certain types of language behaviour. typical research topics include language contact, language borrowing, language mixing, language death, bi- and plurilingualism, and personal and/or group attitudes towards certain forms of language use (see for instance crystal, 2002; nettle and romaine, 2002; thomason, 2001), and they find answers as to why these aspects are part of language. explanations are not reduced to linguistic structures (for example the relations between lexicon and grammar); rather, these elements are regarded as being influenced by warfare, trade, colonialism, hegemonic cultural factors, and cultural or social markers. pidgin and creole languages, in this view, are not comprehended as static manifestations of a proto-language, but as those languages that give us the best examples of how languages change and vary because of cultural, political, social and economic influences. these processes are already well and often described by sociolinguists, but we should also be able to explain these phenomena in the long run, and form predictions. therefore, a normative framework is required and symbiogenesis can provide that framework. first of all, the processes of language variation and language contact resemble the processes involved in contaminations by viruses or bacteria, which are at work at the level of the population. as said: most viruses do not infect the sex cells, but nevertheless stay 'alive' within the population from one generation to the next. during colonial times, for example, thousands of natives died because they caught measles from their western colonizers. those who survived grew resistant to these infections, and now measles is a childhood disease there just as it is in western countries. this is not the result of genetic adaptation towards measles.
viruses and bacteria do not form species, but are distinguishable into different types, so within this view there is no need for a language-as-species metaphor. when one regards language as a species, one gets into trouble relating this species in itself to the human species. secondly, language variation and language mixing also resemble plant hybridization processes. croft, therefore, introduced the 'plantish approach' to language contact and language change: "the zoöcentric view of phylogeny corresponds to the family tree model of language families in linguistics." (croft, 2000: 196) "in biology there are very similar phenomena to language contact once one leaves the animal kingdom, moving no further than to the plant kingdom." (croft, 2000: 198) thirdly, as symbiogenesis can occur very rapidly (from one generation to the next: bacteria penetrated each other through cannibalism or enforced parasitism, leading at once to eukaryotic beings), so language variation and language contact are something that can occur very rapidly within different members of the same population. the creolization of pidgin languages mostly happens very quickly, especially when children learn the pidgin language as their first language (bickerton, 2002); but when we look, for instance, at certain dialects or at different uses of language as a result of different ages, we too find that they are subject to rapid changes: as a teenager it is popular to use 'slang'. these teenagers also know how to use their language 'properly', and when they work on the weekend or go to school they use this proper form, while when going out with their friends they use slang. new words get introduced very rapidly; who used the word computer before 1950 and who did not know this word after 1960? fourthly, neo-darwinian theory cannot cope with these different aspects of language variation or language change.
that for instance [a] gets pronounced as [ae] by one subgroup and as [e] by another would be comprehended as individual variation (as opposed to mutation, which can introduce novelty: translated to linguistics, the introduction of a whole new vowel). the mechanisms at the base of language variation, however, can be comprehended as a form of horizontal evolution: just as bacteria can exchange genetic material freely within one generation, so languages can exchange grammatical structures, vowels and phonological elements freely. languages, therefore, show more resemblance to bacterial types than to rigid species. we can leave a coffee shop saying we are going to a [pàrtee] instead of a party. 'freely' of course has to be taken with a grain of salt: bacteria, even though they donate and receive genetic material, cannot change type (spirochetes will never become cyanobacteria). that is why it is important to study not so much the potential as the constraints of these organisms. the same goes for language: rather than applying an adaptationist, functionalistic approach to language (the 'what for'-question that automatically and teleologically directs itself to the future), one should study the constraints language has and ask 'how'- and 'when'-questions that direct the quest to the past, which is more appropriate when studying origins and evolutions.

hurst et al. (1990) were the first to report on the ke-family, a british family where half of the family members suffer from a severe speech disorder (at that time diagnosed as articulatory dyspraxia) that affects their language skills.
this pathology was later on diagnosed as specific language impairment (sli), because besides their overall orofacial and oromotor dyspraxia, the pathology of the affected ke-family members is also noticeable in nonverbal orofacial movements (vargha-khadem et al., 1995; alcock et al., 2000), in their receptive language skills (vargha-khadem et al., 1995; watkins et al., 2002) and in their brain structures (vargha-khadem, 1998; liégeois, 2003). in 2000, lai et al. narrowed the search for the gene responsible down to a specific region on chromosome 7 called the spch1 region, and finally the gene presumed responsible was identified, called the fox p2 gene (forkhead box, p2). within the affected family members, this gene has undergone a point mutation (lai et al., 2000; lai et al., 2001). the fox p2 gene is a regulatory gene that can be divided into two parts: one part contains a large polyglutamine tract; the other part contains a forkhead dna binding domain, meaning that part of the gene produces helix-turn-helix proteins that are able to (dis)activate other genes, thereby influencing and regulating development. enard's team (2002) showed that the human fox p2 protein has undergone two amino acid substitutions that occurred solely within the human lineage and are fixed within the human population. this fixation coincides with the emergence of anatomically modern humans (presumed 200,000 years old) (enard et al., 2002: 869-870). the fox p2 gene, however, is an old gene and is very well conserved throughout evolution: since the divergence between the lineage of the mouse and the lineage that would lead to humans, 70 million years ago, there has been only one amino acid substitution, which makes the fox p2 gene one of the 5% most conserved genes in evolution.
the fox p2 gene cannot, however, be called a specific language gene, because it is also activated during the development of the heart, the lungs and the gut, nor is it specifically human (given that language is uniquely human, one would assume that language genes would be as well). this, however, is typical for regulatory genes. regulatory genes differ from structural genes (genes that encode for proteins that make tissue that eventually leads to the formation of an organism) in that they produce proteins that return to the helix. these have the amazing property of being able to switch other genes on or off, thereby influencing development. regulatory genes were first discovered in 1975 by king and wilson (1975). ten years ago, a homeobox of genes was found in our genome (robertis et al., 1990; melton, 1991; wolpert, 1991, 1998; mcginnis and kuziora, 1994; gehring, 1998; davidson, 2001). we share with almost all eukaryotic organisms a homeobox of genes (called hox-genes) that regulates the development of our anatomical body plans. even more interesting is that the same regulatory genes contribute to the development of different species, because of the (dis)activation and elongation of these genes at different times and in different regions during development. the same gene that, for instance, lies at the basis of the development of a radially symmetrical body plan (such as that of a sea star) also lies at the basis of bilaterally symmetrical animals, such as humans (schwartz, 1999; gontier, 2004). fox-genes, which the fox p2 gene is part of, differ from hox-genes because they are spread throughout the genome, but they share the functional properties of hox-genes when it comes to switching other genes on or off.
these ontogenetic processes of gene activation or disactivation need to be comprehended from within a universal symbiotic and hence horizontal point of view, because different genes, after they have evolved in different ways and were (dis)activated in different regions and periods during ontogeny and phylogeny, can start interacting in new ways, which can lead to the development of new structures and even new species. hence contextualization and emergentism are what matter (gontier, 2004), because the newly developed features are not reducible to the mere elements that make them up. 7 neo-darwinian theory, which tries to explain vertical evolution using mathematical algorithms (dennett, 1995), cannot explain these phenomena using merely these algorithms. an algorithm basically is a linear binary system combined with logic: if gene a is activated (1), then amino acid (a) is formed; if gene a is not activated (0), then amino acid (a) is not formed. eventually we get a treelike top-down structure that tries to explain how genes encode for features.

7 an observation also made by adoutte in a somewhat different context: "the new molecular based phylogeny has several important implications. foremost among them is the disappearance of "intermediate" taxa [...] we have lost the hope, so common in older evolutionary reasoning, of reconstructing the morphology [...] through a scenario involving successive grades of increasing complexity based on the anatomy of extant "primitive" lineages. [...] in this respect, the situation is not unlike the new perspective emerging on the phylogeny of eukaryotes as a whole, in which most of the formerly intermediate taxa have been pulled upwards." (adoutte et al., 2000: 4455, my emphasis)
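the linear binary rule described in the paragraph above, and the treelike reading built on top of it, can be written out as a small python sketch. the gene names, the feature names and the rule that a feature appears only when every gene beneath it yields its product are hypothetical choices made for illustration; they are not taken from dennett or from any biological dataset.

```python
def products(active):
    """Linear binary rule: product x is formed if gene x is activated (1), not if it is 0."""
    return {gene: state == 1 for gene, state in active.items()}

def features(active, tree):
    """Top-down tree reading: a feature appears only if every gene listed
    beneath it in `tree` yields its product."""
    formed = products(active)
    return {feature: all(formed[g] for g in genes) for feature, genes in tree.items()}

# hypothetical activation states and a hypothetical gene-to-feature tree
state = {"a": 1, "b": 0, "c": 1}
tree = {"feature_1": ["a", "c"], "feature_2": ["a", "b"]}
print(products(state))
print(features(state, tree))
```

in this picture every gene maps to exactly one product and every feature is read off a fixed tree; nothing in it allows one gene's product to switch another gene on or off.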
however, when we look at the activation or disactivation of different regulatory genes, a network develops, where, if some genes are activated during a certain period at a certain region within the individual, then other genes are switched on or off by proteins that are encoded for by these genes. how many proteins are produced depends upon the activation of certain genes, the locus and the time of development. regulatory genes are characterized by their pleiotropic effects (gehring, 1998) and these cannot be formalized using algorithms alone. at the very least, non-linear dynamics need to be interwoven with boolean operators. and even then it is difficult to formalize how genes encode for characteristics, because mostly a 1-1 correspondence between a gene and a trait is lacking. what we find is that genes act more like risk factors. schrödinger (2000) already pointed out years ago that genes do not encode for characteristics or behaviours: "it seems neither adequate nor possible to dissect into discrete 'properties' the pattern of an organism which is essentially a unity, a 'whole'. [...] what we locate in the chromosome is the seat of the difference. (we call it, in technical language, a 'locus', or, if we think of the hypothetical material structure underlying it, a 'gene'.) difference of property, to my view is really the fundamental concept rather than property itself, notwithstanding the apparent linguistical and logical contradiction of this statement. the differences of properties actually are discrete, [...]." (schrödinger, 2000: 28-29, my emphasis) these differences cannot be explained using only neo-darwinian theory; a symbiogenetic view, however, can take this emergentism into account. finally, universal symbiogenesis can incorporate theories such as conceptual blending, developed by fauconnier and turner (2002).
these scientists understand language as a singularity: there are no gradually evolved grammatical structures and there are no intermediate stages of language, not now, not ever. language emerged. emergentism, however, does not imply discontinuity. they point to two fallacies: the cause-effect isomorphism and the function-organ isomorphism (fauconnier and turner, 2002: 175-176). the cause-effect isomorphism is the fallacious idea, widely defended by scholars, that given that the effect is amazing (in this case language), the cause has to be something extraordinary as well. however, when we look at the development and evolution of regulatory genes, this need not be the case. far from it: the development of an eye, for example, is the result of one gene (the pax 6 gene) that switches on 2500 other genes that make an eye. if this gene is not activated, eye development is not triggered (gehring, 1998). the (dis)activation of this gene, however, is only a tiny step in the process. the second fallacy put forward by fauconnier and turner is the function-organ isomorphism: the idea that with the development of every new function, a new organ is involved. this idea dates back to aristotle, and is also subscribed to by chomsky, who presumes the existence of a language organ in the brain. however, nature gives numerous accounts of organs losing functions or gaining new functions (gould and vrba, 1982). hence fauconnier and turner (2002: 177) point out that: "language is not an organ. the brain is the organ, and language is just a function subserved by it, with the help of various other organs. language is the surface manifestation of a capacity." this capacity, according to their view, is conceptual blending (the use of metaphorical or analogical thinking) and language is regarded as just one of the products of this blending capacity.
here again, we encounter a problem because they adhere to the view that there is a deeper lying, once evolved, unchangeable structure in the brain that has different manifestations, some of these structures emerging through blending. so fauconnier and turner too cannot seem to transcend a mere essentialist/potentialist view. however, conceptual blending can also be understood as a form of symbiogenesis, so therefore, i have redefined conceptual blending just to show how symbiotic this view really is: conceptual blending is the combining of two or more conceptual frames that results in a new conceptual frame with meaning not seen in the different components. it is important to note that in this definition, the components themselves are not static, unchangeable entities. universal symbiogenesis can be regarded as a complementation of neo-darwinian theory, because it can (at the least) integrate the following: (1) horizontal evolution, as different from vertical evolution, can be put to use to explain language variation and language change, phenomena already well described by sociolinguistics. however, we also need to be able to explain these phenomena and, therefore, a normative framework is urgently needed. (2) it can explain ontogenetic and phylogenetic processes concerning regulatory genes such as the fox p2 gene, by analogy, because the same genes, put in a different context and forming different interactions, lead to new emergent properties. (3) it can explain cognitive evolutions such as conceptual blending crucial to language, because symbiogenesis can provide a framework to explain these emergent processes that are the result of blending of conceptual frameworks. 
the new animal phylogeny: reliability and implications chimps, children and creoles: the need for caution oral dyspraxia in inherited speech and language impairment and acquired dysphasia the mechanisms of evolution evolutionary epistemology the speciation of modern homo sapiens methodological suggestions from a comparative psychology of knowledge processes blind variation and selective retention in creative thought as in other knowledge processes evolutionary epistemology comment on 'the natural selection model of conceptual evolution selection theory and the sociology of scientific validity aspects of the theory of syntax beyond intuition and instinct blindness: toward an evolutionary rigorous cognitive science explaining language change: an evolutionary approach. essex: pearson. croft, w. 2002 language death without miracles: universal selection theory and the second darwinian revolution genomic regulatory systems: development and evolution evolution from molecules to man replicators and vehicles the blind watchmaker. london: penguin books darwin's dangerous idea the evolution of science molecular evolution of fox p2, a gene involved in speech and language the way we think: conceptual blending and the mind's hidden complexities master control genes in development and evolution: the homeobox story de oorsprong en evolutie van leven: 15 van het standaardparadigma afwijkende thesen. van voorwoord en nawoord voorzien door philip polk en jean paul van bendegem. brussel: vubpress [the origin and evolution of life: 15 nonneodarwinian theses the panda's thumb: more reflections in natural history darwinism and the expansion of evolutionary theory smooth curve of evolutionary rate: a psychological and mathematical artifact wonderful life: the burgess shale and the nature of history. 
london: penguin books exaptation: a missing term in the science of form species, languages and the comparative method a general account of selection: biology, immunology, and behavior an extended family with a dominantly inherited speech disorder viral trade and global public health how genes evolve: a population geneticist's view evolution at two levels in humans and chimpanzees the spch1 region on human 7q31: genomic characterization of the critical interval and localization of translocations associated with speech and language disorder a forkhead-domain gene is mutated in a severe speech and language disorder language fmri abnormalities associated with fox p2 gene mutation the symbiotic planet: a new look at evolution. london: phoenix, orion books what is life? berkeley acquiring genomes: a theory of the origin of species the theory of evolution. cambridge: canto the unity of the genotype the molecular architects of body design neanderthals, modern humans and the archaeological evidence for language pattern formation during animal development the ecology of language evolution philosophical darwinism: on the origin of knowledge by means of natural selection. london: routledge vanishing voices what can the field of linguistics tell us about the origin of language the promise of comparative genomics in mammals natural language and natural selection darwin machines and the nature of knowledge: concerning adaptations, instinct and the evolution of intelligence. london: penguin books homeobox genes and the vertebrate body plan taking darwin seriously what is life? 
cambridge sudden origins: fossils, genes, and the emergence of species explaining culture: a naturalistic approach language contact: an introduction praxis and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder neural basis of an inherited speech and language disorder behavioural analysis of an inherited speech and language disorder: comparison with acquired aphasia do we understand evolution? the evolution of cellular development key: cord-003514-yyzbv7ys authors: arslan, mehboob; yang, xin; santhakumar, diwakar; liu, xingjian; hu, xiaoyuan; munir, muhammad; li, yinü; zhang, zhifang title: dynamic expression of interferon lambda regulated genes in primary fibroblasts and immune organs of the chicken date: 2019-02-14 journal: genes (basel) doi: 10.3390/genes10020145 sha: doc_id: 3514 cord_uid: yyzbv7ys interferons (ifns) are pleiotropic cytokines that establish a first line of defense against viral infections in vertebrates. several types of ifn have been identified; however, limited information is available in poultry, especially using live animal experimental models. ifn-lambda (ifn-λ) has recently been shown to exert a significant antiviral impact against viral pathogens in mammals. in order to investigate the in vivo potential of chicken ifn-λ (chifn-λ) as a regulator of innate immunity, and potential antiviral therapeutics, we profiled the transcriptome of chifn-λ-stimulated chicken immune organs (in vivo) and compared it with primary chicken embryo fibroblasts (in vitro). employing the baculovirus expression vector system (bevs), recombinant chifn-λ3 (rchifn-λ3) was produced and its biological activities were demonstrated. the rchifnλ3 induced a great array of ifn-regulated genes in primary chicken fibroblast cells. 
the transcriptional profiling using rna-seq and subsequent bioinformatics analysis (gene ontology, differentially expressed genes, and kegg analysis) of the bursa of fabricius and the thymus demonstrated an upregulation of crucial immune genes (viperin, ikkb, ccl5, il1β, and ap1) as well as the antiviral signaling pathways. interestingly, this experimental approach revealed contrasting evidence of the antiviral potential of chifn-λ in both in vivo and in vitro models. taken together, our data signify the potential of chifn-λ as a potent antiviral cytokine and highlight its future possible use as an antiviral therapeutic in poultry. viral pathogens pose significant threats to the poultry industry around the globe. this necessitates the development of novel and alternative antiviral therapies to contain the impacts of pathogens. avian influenza viruses (aivs) are a particular threat, which cause severe damage to the poultry industry, especially in developing countries where huge monetary losses are incurred [1, 2]. public health is also threatened by aivs, owing to their zoonotic importance. active preventive strategies would minimize the risk of viral transmission to humans and also benefit the poultry industry. interferons (ifns) are pleiotropic functional cytokines with antiviral, antitumor, and natural immune-boosting effects. ifns play a significant role in eliciting an antiviral state in vertebrates [3]. ifns are broadly categorized into three distinct types based on their molecular structure, receptor specificity, and induction pathway [4]. type i ifns include ifn-α, ifn-β, ifn-ε, ifn-κ, and ifn-ω, and all signal via common cell surface receptors (ifnαr-1) and (ifnαr-2), which are situated on a broad range of cells [3]. type ii ifns consist of ifn-γ, which is activated through highly specific ligand interactions with distinct ifn-γ receptors (ifn-γr1) and (ifn-γr2).
the third family of ifns consists of ifn lambda, which interacts with a heterodimeric receptor complex (il-28rα and il-10β). ifn-λ was first discovered in mammals and subdivided into ifn-λ1 (also known as il-29), ifn-λ2 (il-28a), ifn-λ3 (il-28b), and ifn-λ4 [5] . ifns are crucial in an innate immune response, as their expression and antiviral potential is dependent on their cognate receptor interaction in a particular system [6] . in chickens, type i ifns primarily interact in fibroblasts, whereas epithelial cells (gastrointestinal and respiratory tract) are the primary site for the actions of type iii ifns [7] . despite morphological diversity, ifns share integrated, interconnected, and a precisely coordinated cascade in immunity pathways [3] . ligand recognition and interaction by ifn receptors results in rapid activation of janus kinase/signal transducers and activators of transcription (jak-stat pathway). this leads to phosphorylation of stat1 and stat2, activation of interferon stimulated gene factor 3 (isgf3), binding of ifn-stimulated response elements (isres), and expression of ifn stimulated genes (isgs) [8] . once expressed, these isgs demonstrate an essential role in the antiviral response. it is evident from published data that ifns upregulate identical sets of isgs, which in turn express antiviral proteins. ifn-induced transmembrane protein (ifitms), viperin and myxovirus resistance protein (mx) are some of the potent antiviral proteins expressed in response to viral infections [9] . once expressed, these isgs control viral replication, which provides an antiviral atmosphere to limit viral propagation in infected cells. compared to the mammalian ifn-λ repertoire (ifn-λ1, ifn-λ2, ifn-λ3, and ifn-λ4), chicken ifn-λ is the sole member in birds and demonstrates structural identity with human ifn-λ3. 
ifn-λ is chiefly involved in protection against viral infection of the respiratory and gastrointestinal tract epithelia (aiv, ndv, ibv), and due to the distribution of il-28rα in epithelium-rich organs, ifn-λ demonstrates significant potential to limit viral propagation [10] . while most of the current studies in chickens are mainly focused on type i and type ii ifns, we investigated the potential of type iii ifns in innate and adaptive immunity. previously, it was established that chifn-α presented a significant delay in the propagation of rous sarcoma virus and confirmed in vivo [11] . it was also revealed that chifn-α treatment ameliorates infection progression in experimental chickens with highly pathogenic influenza a virus (hpaiv) subtype h5n1 [12] . compared to type i ifns, chifn-λ has also been shown to elicit moderate antiviral response in both the chicken macrophage cell line hd11 and primary chicken embryo fibroblasts (cef) [13] . another published study demonstrated that cefs treated with recombinant chifn-λ induced isgs in a temporal fashion [14] . however, the antiviral potential of chifn-λ in live animals (e.g., chickens) has not yet been investigated, which could provide evidence for the potential of chifn-λ in animals per se. to investigate the impact of exogenous chifn-λ on the innate immune system in chickens, we first expressed chifn-λ in a silkworm bioreactor platform utilizing a baculovirus expression vector system (bevs) [15] . compared to the autographa californica nucleopolyhedrovirus (acmnpv)-sf9 cell expression system, the bombyx mori nucleopolyhedrovirus (bmnpv)-silkworm system possesses greater post-translational modifications and enhanced expression efficiency [16, 17] . comparative transcriptomic profiling revealed the key mechanisms, signaling pathways, and expression patterns of genes involved in interferon-induced immunity. our results highlight the dynamics of chifn-λ roles in chicken innate immunity. 
bm5 cells (bombyx mori-derived cell line) were cultured and maintained at 27 °c with 10% fetal bovine serum (fbs, gibco, usa) in tc100 (insect cell culture medium) (applichem, darmstadt, germany) as per the published literature [18]. for co-transfection, bm5 cells were cultured at a constant density of 1 × 10^6 cells per well in six well plates for 12 hours with tc100 media containing fbs. tc100 media without fbs was used to wash the cells twice and a mixture of transfection and co-transfection was introduced to cells. between 4-6 h post-transfection, fbs was introduced to the cell culture media. for viral amplification and expression, cells were infected with a multiplicity of infection (moi) of 0.1 for 1-2 h. the ensembl chicken genome database (ftp://ftp.ensembl.org/pub/release-93/fasta/gallus_gallus/dna/) was extensively screened for homologues of chifn-λ by employing the blast algorithm (http://www.ncbi.nlm.nih.gov/blast/). a stretch of sequences demonstrating high sequence identity was identified and characterized. 1] were acquired from the national center for biotechnology information (ncbi) and aligned using the clustalw program, and phylogenetic analysis was performed using the neighbor-joining method with bootstrap n = 1000 in mega software (version 7). amino acid sequences of ifn-λ from multiple species were aligned using the clustalw algorithm. espript 3.0 (http://espript.ibcp.fr/espript/cgi-bin/espript.cgi) was utilized to analyze the sequences. in our previous study, we developed a novel defective-rescue recombinant bombyx mori bacmid (rebmbac) expression system [15]. we used this in-house built and developed system to express chifn-λ. the rebmbac-silkworm expression system was employed to construct chifn-λ (interferon lambda-3 [gallus gallus]; sequence id: xp_015144667.1; length: 186).
briefly, in order to enhance expression efficiency by codon optimization, chifn-λ genes were optimized for expression in the silkworm (bombyx mori) and synthesized by genscript company (china). plasmid-containing orf1629+ with gene of interest (chifn-λ) and pph as a promoter was co-transfected with rebmbac in the bm5 cell line. recombinant virus containing the chifn-λ gene was harvested 4-5 days post co-transfection. expression product was acquired after 4-5 days of silkworm/pupae infection. the plaque assay was performed to evaluate the recombination efficiency [18]. luciferase assay kit (promega, usa) was employed to analyze expression quantity of luciferase in 50 µg of protein lysate. the bradford method was used to measure the amount of protein [19]. antiviral activity of chifn-λ was assayed in the gfp-reduction assay using recombinant vesicular stomatitis virus (vsv-gfp) [20]. cefs were prepared from 9-11-day-old specific pathogen free (spf) chicken eggs and maintained in cell culture flasks [21]. after 24 hours, cefs were stimulated with chifn-λ and cells were harvested 12 hours post treatment, snap frozen, and stored at −80 °c for further processing. all experiments were performed in triplicate. the present study was conducted in accordance with animal ethics guidelines and approval was given by the beijing administration office of laboratory animals, china. a total of 60 newly hatched spf chicks were obtained from beijing arbor acre company ltd., p.r. china. chicks were reared in cages (n = 10 birds/cage) and placed in six cages in a temperature-controlled environment at the biotechnology research institute, chinese academy of agricultural sciences (caas), p.r. china. birds were offered standard commercial feed obtained from cp group ltd., p.r. china. unrestricted access to water was provided via nipple drinker lines and ad libitum feed was offered.
a treatment group of 14-day old chicks were injected daily with chifn-λ (10,000 iu/kg body weight) (105 iu/ml). phosphate buffer saline (pbs) was injected intramuscularly to the control group. the bursa of fabricius and thymus were obtained by euthanizing the chickens at five days post-treatment. tissue samples were rapidly collected, snap-frozen in liquid nitrogen, and stored at −80 °c for further processing. total rna was extracted from virus-infected or mock-treated cefs (in triplicates), as per manufacturer's guidelines [22]. similarly, a total of five immune organs (bursa of fabricius and thymus) were pooled (in duplicates) from randomly selected chickens from each virus- or mock-infected group. total rna extraction was performed as per manufacturer's instructions [22]. extracted rna quality was analyzed by employing 1% agarose gel and rna integrity was assured using the rna nano 6000 assay kit of the bioanalyzer 2100 system (agilent technologies, ca, usa). extracted samples were sent to novogene beijing for sequencing. samples were sequenced using hiseq x ten (illumina) and pe150 platforms. rna-seq data generated from cef, bursa of fabricius and thymus samples of chicken (both chifn-λ-treated and control groups) are presented in supplementary table s1. reads were mapped to the reference genome database (ftp://ftp.ensembl.org/pub/release-89/fasta/gallus_gallus/dna/). individually mapped reads for each sample were assembled by stringtie (v1.3.3b) using a reference-based approach. featurecounts v1.5.0-p3 was utilized to estimate read numbers mapped to each gene. fragments per kilo base of transcript sequence per million base pairs sequenced (fpkm) of each gene was calculated from the length of the gene and the read count mapped to this gene. differential expression analysis was accomplished by employing the deseq2 r package (1.16.1). using benjamini and hochberg's approach, p-values were adjusted for controlling the false discovery rate (fdr).
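the two numerical steps described above, fpkm normalization and benjamini-hochberg adjustment, can be sketched in a few lines of python. this is an illustrative sketch only, not the pipeline used in the study (which relied on featurecounts and deseq2); the function names `fpkm` and `benjamini_hochberg` and the example numbers are hypothetical and chosen for illustration.

```python
from typing import List


def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to: count / (length_in_kb * library_size_in_millions).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2 kb long, library of 20 million reads.
    print(fpkm(500, 2000, 20_000_000))
    # Hypothetical raw p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

the adjustment multiplies each raw p-value by the number of tests divided by its rank and then enforces monotonicity, so a gene is reported as significant only if its adjusted value falls below the chosen fdr threshold.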
genes with (p < 0.05, |log2fold change|>1) observed by deseq2 were designated as differentially expressed. for differentially expressed genes, both gene ontology (go) enrichment analysis and kyoto encyclopedia of genes and genomes (kegg) pathway enrichment was conducted using the clusterprofiler r package. go terms with adjusted p-values < 0.05 were considered as significantly enriched (http://www.genome.jp/kegg/). using the chicken ifn gene as a query, we constructed the phylogenetic tree by employing the neighbor joining method (bootstrap n = 1000). this demonstrates the relationship of chifnλ with its mammalian orthologues by illustrating that chifn-λ is distinct in its evolution. furthermore, this revealed the contrasting consensus sequence from databases including ensembl and genbank. chifn-λ encodes a putative protein of 186 amino acids and further demonstrates typical characteristics of type iii ifns. a pairwise blast analysis demonstrated that chifn-λ shares 36%, 34%, 39%, 34% and 33% sequence similarity with recently characterized pig, mouse, human, cattle, and frog ifn-λ, respectively. based on amino acid homology, conserved amino acids among distinct avian and mammalian ifn-λ are identified. taken together, this comparative characterization further shows that chifn-λ shares characteristic features of type iii ifns (supplementary figure s1a ,b). in order to construct chifn-λ, we employed a bevs study. in order to determine the expression efficiency, we used a luciferase reporter gene for quality control as we described previously [15] . the luciferase gene was acquired from pgl3-basic vector by employing bglii/xbai digestion and insertion into the bamhi/xbai-digested pvl1393 vector to construct pvl1393-luc vector. a combination of pvl1393-luc and rebmbac dna was co-transfected in bm5 cells (figure 1) . a viral plaque assay was used to determine a suitable virus strain with which to express luciferase. 
supernatant from bm5 cells containing recombinant bmnpv (rebm-luc) was harvested five days post-transfection before inoculation into silkworms. after four to five days, protein was harvested from silkworms and 50 µg protein from lysed larval haemolymph was subjected to luciferase assays. luminescence detected from silkworm larval haemolymph was approximately 3.42 ± 0.52 × 10^8 relative light units (rlu), compared to 150-300 rlu from luc-negative virus-infected samples. pcr amplification (qpcr) further verified and validated the chifn-λ gene expression in bevs (supplementary figure s1c). in order to investigate the possible biological, cellular, and molecular mechanisms involved in the cascade of interferon-induced immunity, we performed transcriptomic analysis on chicken embryo fibroblasts and organs of live chickens.
transcriptomes from the bursa of fabricius and thymus (most important immune organs in chicken) were compared with the control group to identify differentially expressed genes (degs) among all groups. experimentation started at day 14 post-hatch as this is a phase of rapid growth and development, and we hoped to achieve biologically active transcriptional changes. the differences in degs observed in the present study control cellular architecture, immune function, metabolic pathway, and muscular function. it has previously been established that huifn-λ signals via il-10 and il-28r and exhibits type-i-like antiviral potential [23]. protection from simian foamy virus (sfv) and avian influenza (ai) augments the antiviral functioning and further postulates its diverse antiviral potential against avian pathogens. in this context, we stimulated chickens with silkworm-expressed chifn-λ and profiled the gene expression in immune organs (thymus and bursa) and compared it with that in primary chicken fibroblasts using rna-seq. an overall low isg expression was noticed in chifn-λ-stimulated cef; out of 26,616 genes, 161 were degs (84 upregulated and 77 downregulated) (p < 0.05, |log2 fold change| > 1) (figure 2a). although cef do not possess receptors for ifn-λ, slight temporal expression of degs in response to chifn-λ treatment signifies its antiviral potential in primary cells.
next, we monitored the gene expression in the thymus and bursa. between the chifn-λ-treated and non-treated thymus, a total of 23,801 genes were expressed. among them, 331 genes were degs, in which 177 genes were upregulated and 154 genes were downregulated (figure 2b). in the bursa of fabricius, 289 out of 23,951 genes were differentially expressed (130 upregulated and 159 downregulated) (figure 2c). interestingly, a relatively low number of genes overlapped among these three groups (figure 2d). in order to confirm the expression of degs, we used a conventional approach (qpcr) and show (supplementary figure s2a,b) a scenario corresponding to the rna-seq data. on the basis of abundance and fold change, degs were further characterized (supplementary table s1). cumulatively, a significant upregulation of crucial cytokine and chemokine genes (il1-β, ccl4, ccl5, and cx3cl1) was observed. these are broadly involved in antiviral response, apoptosis, cellular proliferation and differentiation, cytokine-cytokine receptor interaction and inflammation pathways [24] (figure 3). due to the induction of a distinct subset of genes, a lower level of antiviral activity is observed as compared to type-i ifns. it is speculated that the activation of chifn-λ is similar to type-i ifns but they are diverse in functional capability. the chifn-λ have particular significance in viral infections of epithelial origin, where they are optimally active by eliciting a broad antiviral state. using conventional approaches, we have confirmed the expression of selected genes as shown in supplementary figure s2a and s2b.
degs were further analyzed for go terms and the kegg pathway by utilizing deseq2 [25]. of 956 go terms associated with chifn-λ-treated cef, 112 go terms were significant (p < 0.05) (figure 4a). in the bursa, among biological processes, we observed the wnt signaling pathway (wif1/camk2a), cytokine-cytokine receptor interactions (tnfsf11), the apelin signaling pathway (ryr2/myl4), and the significant antiviral pathway (novel gene) in cellular components (figure 4b). in the thymus, out of 1712 go terms, we observed 309 significant, and in the bursa, out of 2298 go terms, 637 were significant (p < 0.05). in order to understand the biological functions associated with degs, we further analyzed the data in three distinct categories, including biological processes (bp), cellular components (cc), and molecular functions (mf) (figure 4c).
further to gene ontology and differential expression, we investigated kegg pathway enrichment. in cefs, significant enrichment was seen in pathways including the mapk signaling pathway (fos/il1b/fosb), the toll-like receptor signaling pathway (fosb, il8l1, il1b, fos, ccl5), influenza a (il8l1/il1b/ccl5), cytokine-cytokine receptor interactions (ccl20/il8l1/il1b/cx3cr1/ccl5), salmonella infection (fosb/il8l1/il1b/fos), the nod-like receptor signaling pathway (il8l1/il1b/ccl5), and herpes simplex infection (fosb/il1b/fos/ccl5) (figure 5a). in bursa, wnt signaling (wif1), the apelin signaling pathway (ryr2), and the calcium signaling pathway (ryr2) were significantly observed (figure 5b). for the thymus, the nod-like receptor signaling pathway (plcb1/mapk11), the mapk signaling pathway (srf/mapk11), influenza a (rsad2/mapk11), and mapk11 (salmonella, toll-like, herpes simplex infection) were observed (figure 5c). collectively, apoptosis (jun/birc5/ctsc/actg1), rna degradation (eno1/btg2/c1d), the tca cycle (mdh1/idh3a), the p53 signaling pathway (perp1/ccnb2), biosynthesis of amino acid (eno1/idh3a), influenza a (rsad2/jun/actg1), and the toll-like receptor signaling pathway (jun) were among the most significant.
here, we present the first comprehensive report on cloning and expression of chifn-λ by employing bevs and demonstrate that it is biologically active in both cef (in vitro) and live chickens (in vivo). the identification of this potentially significant ifn among the ifn family advances fundamental aspects and functionality of chifn-λ in avian type-iii ifns.
it is evident from the data that this ifn, like human interferon lambda (huifn-λ) , demonstrates similar type-i ifn-like properties. however, a distinct pattern of expression of isgs in chifn-λ contrasts it from other type-i ifns. knowledge regarding ifns is fundamental as rapid outbreaks of viral pathogens cause huge economic losses to the poultry industry every year. the present study investigates the isgs and signaling pathways associated with avian immunity and will bring new horizons to target problematic viral pathogens, e.g., aivs, circulating within the poultry industry. interferon lambda is a biologically active type-iii interferon which primarily acts on epithelial tissues [3] . studies have demonstrated the antiviral potential of ifn-λ against highly pathogenic avian influenza by eliciting a broad antiviral state [10] . ifn-λ is structurally peculiar as it possesses five exonic regions located on chromosome 7, contrary to type-i ifns, which are intronless and situated on the z sex chromosome in chicken [5, 26] . this is in agreement with human ifn-λ subfamily which are anatomically identical by possessing five exonic regions on chromosome 1 of the human genome [5] . furthermore, 36% of amino acids are identical between huifn-λii and chifn-λ, which signifies the similarity of these two ifns. however, unlike mammals, only one member exists in chicken (chifn-λ). this is in agreement with the other types of chicken ifns, which have fewer members compared to mammalian ifns [27] . reduced expression of isgs in response to chifn-λ in our experiment demonstrates the fact that cefs are optimally less receptive to ifn-λ, which is in agreement with published reports [10] . one study revealed that chifn-λ can actively inhibit the viral replication of ai in primary embryonic tracheal organ cultures and clec-213 (chicken lung cell line). 
it is further postulated that, upon treatment with chifn-λ, isgs are expressed significantly, especially the mx gene, which is primarily expressed in epithelium-rich organs (i.e., trachea, lungs, and intestine); this was also observed in the present study [10]. furthermore, studies have also revealed that a high degree of cell type specificity in receptor-ligand interactions makes avian ifns distinct from mammalian ifns. recently, it has been established that chicken ifn-λ inhibits low pathogenic influenza virus replication in cefs; however, as compared to chifn-γ and chifn-β, higher doses are required to induce isgs and maintain the strong antiviral state in the cells [14]. go and kegg analysis of each experimental group demonstrated overlapping biological functions. an important gene involved in the host response of infected samples is rsad2, also termed viperin, which is one of the potent interferon stimulated genes (isgs) responsible for eliciting a broad antiviral state against a variety of viral and bacterial pathogens [28]. in mammals, it is highly expressed in response to invading viral infections [29]. elevated expression of viperin in chifn-λ-treated organs further augments the expression of isgs in response to injected ifn in vivo. viperin was upregulated in response to chifn-λ treatment, which is representative of all isgs. ifn-inducible transmembrane protein-1 (ifitm-1) is one of the potent isgs expressed in response to either type of ifn and plays an antiviral role by blocking cytoplasmic entry [30]. it has further been demonstrated that ifitm acts by altering membrane fluidity, hence producing curvature in the outer leaflets of the membrane, or by interfering with intracellular cholesterol homeostasis [31, 32]. significant upregulation of ifitm3 in the chifn-λ-treated thymus augments the temporal expression of isgs in response to ifn treatment. further studies are needed to investigate the possible future role of chifn-λ as a potent and novel therapeutic in the poultry industry.
although the immune response elicited by type iii ifns is still not very clear, in the present study we also found some novel genes involved in the cascade of the avian immune response. furthermore, in vitro exposure of cef to chifn-λ demonstrated a rapid surge of pro inflammatory cytokines. considering their vital role in immune pathways, cytokine gene expression is widely employed as an indicator for the immune response. we did observe some genes that were previously illustrated in publications; one such example is chemokine (c-c motif) ligand 1 (ccl1, ensgalt00000003670) [33] . chemokines are secreted chemotactic cytokines that play a fundamental role in the recruitment and migration of lymphoid and myeloid cells in target tissues, and hence govern the avian immune response [34] . ccl1 is a chemokine secreted by monocytes that is capable of activating macrophages and t lymphocytes [35] . ccl20, like its mammalian orthologue, is responsible for recruiting lymphoid cells and is involved in the early immune response in chickens [36] . likewise, ccl1, ccl4, and ccl5 were also upregulated in cef and are chiefly involved in the innate avian immune response. the present study describes the transcriptomic analysis of differential gene expression following exposure to chifn-λ and the resultant pro-inflammatory response in both cef and chicken tissues. this response ostensibly is due to rapid and sustained signaling via cell surface receptors and a surge of chemokines and cytokines, which in turn create an antiviral environment. a contrasting feature of the present study is the upregulation of the toll-like receptor (tlr) signaling pathway in all three treatment groups, where it is evident that numerous genes are upregulated in tlr mediated cytotoxicity. tlr15, a unique chicken receptor expressed on the surface of fibroblasts, heterophils, and macrophages, shares 30% sequence identity with tlr2 [37] . 
it is evident from experimentation that tlr15 is a broad spectrum tlr that has the capability to recognize heat stable components of both gram-positive and gram-negative bacteria, cpg oligonucleotides, lipopolysaccharide (lps), and tripalmitoylated lipopeptide [38] . tlr15, an avian-specific tlr, plays a significant role in avian immune responses against bacterial and viral pathogens. recently, it has been demonstrated that diacylated lipopeptide from mycoplasma synoviae activated tlr15 and regulated innate immune responses [39] . similarly, significant upregulation of tlr15, observed in the present study, highlights a possible role of chifn-λ against mycoplasma infections in chicken. however, it warrants future studies to delineate the molecular processes. it has also been established by repeated experimentation that chifn-λ has been seen to cause delay in viral excretion and the spread of highly pathogenic avian influenza (hpai) h5n1 [10] . it is evident that in mammals, ifn-λ elicits a protective antiviral response toward ai, whereas ifn-λ plays a minor role in lung epithelia [40] . similarly, in the respiratory tract of chickens, not all mucosal cells are responsive to chifn-λ. therefore, treatments can only delay, but do not significantly support the complete removal of viral loads of h5n1 or halt the virus crossing the epithelial barrier [41] . however, for low-pathogenic avian influenza (lpai), it is evident that chifn-λ has demonstrated significant antiviral activities [42] . recent reports revealed another contrasting feature of ifn-λ, where it significantly elicited strong antiviral potential on intestinal epithelial cells to control murine rotaviruses [43, 44] . it will be fascinating to investigate in the future whether the same antiviral phenomena occurs, and chifn-λ might also demonstrate epitheliotropism like rotaviruses and halt viral pathogens of the gastrointestinal tract in chickens. 
nuclear factor kappa-b (nf-kb) is the most significant, evolutionarily conserved, pleiotropic, inducible transcription factor responsible for regulating genetic expression in a variety of fundamental processes, including apoptosis, growth, immune response, inflammation, stress response, etc. [45] . notably, the upregulation of nf-kb in response to chifn-λ treatment on cef signifies their potent role in the immune response. activator protein 1 (ap-1) is a transcription factor complex highly responsive for cytokine signaling and growth promotion [46] . formed through noncovalent dimerization between the fos and jun family of nuclear oncogenes, this complex activates ap-1-dependent genes, hence controlling cell proliferation, differentiation, and apoptosis [47] . consistent with these observations, our study demonstrated that many genes associated with this pathway were upregulated. this finding suggests a link between ap1 and the transcriptional cascade associated with recombinant interferon treatment. overall, transcriptomic analysis revealed significant upregulation of fos and jun in cef and bursa, and thymus of chicken. the innate immune response is a highly complex, precise, interconnected, and integrated response that relies on many factors. the genes investigated in our study control direct protein interactions and are significantly involved in the avian innate immunity cascade. however, further validation of a broad set of immunity-related genes will also be required to elucidate the mechanism of interferon-induced immunity. a more comprehensive study including a larger set of immune genes and multiple recombinant ifns, which will correlate their integrated role, will enable researchers to provide comprehensive insight into the avian innate response. 
other future studies involving backyard poultry to assess whether similar patterns of innate immunity prevail in indigenous breeds in response to chifnλ are also important and will further develop our understanding of avian immunity. in the current study, we employed rna-seq to illustrate vital transcriptomes involved in the cascade of avian biology and observed divergent results in recombinant interferon-treated chickens compared to a control group chickens. our data suggest that significant antiviral, cell cycle regulators, and biologically active genes are expressed in response to administered chicken ifn. functional characterization of these vital genes warrants further investigation to determine the future possible role for recombinant chicken ifn in the poultry industry. the authors declare no conflict of interest. phylogeography and evolutionary history of reassortant h9n2 viruses with potential human health implications h9n2 avian influenza virus in korea: evolution and vaccination avian interferons and their antiviral effectors characterization and transcriptional analysis of the mouse chromosome 16 cytokine receptor gene cluster ifn-λs mediate antiviral protection through a distinct class ii cytokine receptor complex lambda interferon (ifn-λ), a type iii ifn, is induced by viruses and ifns and displays potent antiviral activity against select virus infections in vivo advances in avian immunology-prospects for disease control: a review interferon lambda genetics and biology in regulation of viral control partial antiviral activities detection of chicken mx jointing with neuraminidase gene (na) against newcastle disease virus antiviral activity of lambda interferon in chickens protective effects of type i and type ii interferons toward rous sarcoma virus-induced tumors in chickens highly pathogenic avian influenza viruses do not inhibit interferon synthesis in infected chickens but can override the interferon-induced antiviral state molecular cloning, 
expression, and characterization of chicken ifn-λ biological effects of chicken type iii interferon on expression of interferon-stimulated genes in chickens: comparison with type i and type ii interferons a highly efficient and simple construction strategy for producing recombinant baculovirus bombyx mori nucleopolyhedrovirus comparison of the n-linked glycosylation of human beta1,3-n-acetylglucosaminyltransferase 2 expressed in insect cells and silkworm larvae comparison of recombinant protein expression in a baculovirus system in insect cells (sf9) and silkworm a manual of methods for baculovirus vectors and insect cell culture procedures the bradford method for protein quantitation generating vesicular stomatitis virus pseudotype bearing the severe acute respiratory syndrome coronavirus spike envelope glycoprotein for rapid and safe neutralization test or cell-entry assay laboratory manual for the isolation and identification of avian pathogens purification of rna using trizol (tri reagent) interferons α and λ inhibit hepatitis c virus replication with distinct signal transduction and gene regulation kinetics transcriptional innate immune response of the developing chicken embryo to newcastle disease virus infection moderated estimation of fold change and dispersion for rna-seq data with deseq2 type i interferons and the innate immune response-more than just antiviral cytokines a genomic analysis of chicken cytokines and chemokines characterisation of chicken zap the antiviral response. 
microbes infect palmitoylome profiling reveals s-palmitoylation-dependent antiviral activity of ifitm3 ifitm proteins restrict viral membrane hemifusion the antiviral effector ifitm3 disrupts intracellular cholesterol homeostasis to block viral entry marek's disease virus-induced immunosuppression: array analysis of chicken immune response gene expression profiling cloning, expression and functional characterization of chicken ccr6 and its ligand ccl20 unique chemotactic response profile and specific expression of chemokine receptors ccr4 and ccr8 by cd4+ cd25+ regulatory t cells identification, mapping, and phylogenetic analysis of three novel chicken cc chemokines unique features of chicken toll-like receptors expression of the avian-specific toll-like receptor 15 in chicken heterophils is mediated by gram-negative and gram-positive bacteria, but not tlr agonists diacylated lipopeptide from mycoplasma synoviae mediates tlr15 induced innate immune responses ifn-λ: a new spotlight in innate immunity against influenza virus infection differential responses of innate immunity triggered by different subtypes of influenza a viruses in human and avian hosts type iii interferon gene expression in response to influenza virus infection in chicken and duck embryonic fibroblasts distinct roles of type i and type iii interferons in intestinal immunity to homologous and heterologous rotavirus infections interferon-λ and interleukin 22 act synergistically for the induction of interferon-stimulated genes and control of rotavirus infection interferons and viral infections innate immune sensing of dna viruses neuronal activity-dependent local activation of dendritic unfolded protein response promotes expression of brain-derived neurotrophic factor in cell soma key: cord-012035-rhpfpku9 authors: zhong, hui-hai; wang, hui-yuan; li, jian; huang, yong-zhuo title: trail-based gene delivery and therapeutic strategies date: 2019-08-23 journal: acta pharmacol sin doi: 
10.1038/s41401-019-0287-8 sha: doc_id: 12035 cord_uid: rhpfpku9 trail (tumor necrosis factor-related apoptosis-inducing ligand), also known as apo2l, belongs to the tumor necrosis factor family. by binding to the death receptor 4 (dr4) or dr5, trail induces apoptosis of tumor cells without causing side toxicity in normal tissues. in recent years trail-based therapy has attracted great attention for its promise of serving as a cancer drug candidate. however, the treatment efficacy of trail protein fell short of expectations in the clinical trials because of the short half-life and the resistance of cancer cells. trail gene transfection can produce a “bystander effect” of tumor cell killing and provide a potential solution to trail-based cancer therapy. in this review we focus on trail gene therapy and various design strategies of trail dna delivery including non-viral vectors and cell-based trail therapy. in order to sensitize the tumor cells to trail-induced apoptosis, combination therapy of trail dna with other drugs by the codelivery methods for yielding a synergistic antitumor efficacy is summarized. the opportunities and challenges of trail-based gene delivery and therapy are discussed. nucleic acid-based therapy has been considered one of the most promising strategies for the treatment of various diseases [1]. tumor necrosis factor (tnf) plays an important role in the homeostatic regulation of the immune system [2]. although tnf is potent in causing tumor necrosis, the first two clinical trials of tnf-like molecules for cancer therapy failed because of lethal inflammatory shock syndrome and fulminant liver toxicity [3, 4]. subsequently, a novel tnf family member, tnf-related apoptosis-inducing ligand (trail), was found [5, 6]; this protein is a type ii transmembrane protein and can be released from the cell surface in soluble form via proteolysis [7].
soluble trail is nontoxic to normal cells, and in fact, there is a trace amount of endogenous trail (~100 pg/ml) in healthy adult plasma [8, 9] . the trail protein is expressed in various tissues-predominantly in the spleen, lung, and prostate-and on the surface of cytotoxic t cells and natural killer (nk) cells [10] . its death receptors (drs), dr4 and dr5, are overexpressed in many types of cancer cells. importantly, trail is capable of killing tumor cells without causing lethal adverse effects [11, 12] . apoptosis is an essential function of the maintenance of cellular homeostasis and prevents a number of diseases, including cancer [13] . tumorigenesis is associated with defects in apoptosis regulation [14] . there are two major apoptotic pathways: the intrinsic, or mitochondrial, pathway usually induced by chemotherapy [10] , and the extrinsic, or dr, pathway that mediates extrinsic programs of cell death, such as trail-induced apoptosis. however, these two pathways usually associate with each other downstream via "crosstalk" [15] . the extrinsic pathway is activated by extracellular proapoptotic stimulators that bind to cell surface receptors [16] . there are five homologous human receptors for trail: the full-length intracellular death domain (dd)-containing receptors dr4 [17] and dr5 (trail-r2) [18] [19] [20] ; the decoy receptor 1 (dcr1 or trail-r3), which lacks an intracellular domain; [18, 19, 21] dcr2 (trail-r4), which contains a truncated dd; [22, 23] and the soluble receptor osteoprotegerin (opg). among these receptors, only the binding of trail to dr4 or dr5-because of their integrated intracellular structure-can induce an apoptotic effect; dr5 has the highest affinity for trail [24] . however, trail/dcr1 binding cannot induce downstream signaling, because dcr1 lacks an intracellular domain, while dcr2 and opg act as nf-κb ligand receptors, which induce nf-κb activation but not apoptosis. 
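the receptor properties described above can be collected into a small lookup table. the python sketch below is only an illustrative encoding of the statements in this section; the class name `TrailReceptor`, its field names, and the helper `apoptosis_competent` are hypothetical and do not correspond to any published software. the death-domain status of opg is not stated in the text and is therefore left as "unspecified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrailReceptor:
    """One of the five human trail receptors, as described in the text."""
    name: str
    membrane_bound: bool
    death_domain: str  # "full", "truncated", "none", or "unspecified"
    outcome: str       # signaling outcome of trail binding, per the text


RECEPTORS = (
    TrailReceptor("dr4", True, "full", "apoptosis"),
    TrailReceptor("dr5", True, "full", "apoptosis"),
    TrailReceptor("dcr1", True, "none", "no downstream signaling"),
    TrailReceptor("dcr2", True, "truncated", "nf-kb activation, no apoptosis"),
    TrailReceptor("opg", False, "unspecified", "nf-kb activation, no apoptosis"),
)


def apoptosis_competent(receptor: TrailReceptor) -> bool:
    """Only receptors with a full-length death domain can induce apoptosis."""
    return receptor.death_domain == "full"


# names of the receptors through which trail can trigger apoptosis
APOPTOSIS_RECEPTORS = [r.name for r in RECEPTORS if apoptosis_competent(r)]

if __name__ == "__main__":
    print(APOPTOSIS_RECEPTORS)
```

filtering on the full-length death domain recovers dr4 and dr5, the two receptors singled out in the text as able to induce apoptosis.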
after dr4 or dr5 binding to trimeric trail, the intracellular dd structure of the drs is altered, and binding to fas-associated death domain-containing protein (fadd) then occurs. then, fadd binds to procaspase-8/-10 via the death effector domain (ded) on the n-terminus, thereby forming dr4/dr5-fadd-procaspase-8/-10, which is called the death-inducing signaling complex (disc). oligomerization and autocatalysis of procaspase-8 leads to the activation of caspase-8, which consequently triggers cleavage of the effector caspases-3/-7/-9 to induce apoptosis. furthermore, caspase-8 promotes the release of cytochrome c, inducing intrinsic apoptosis via the mitochondrial pathway in type ii cells [25] . the trail-induced apoptosis process is summarized in fig 1. the ability of the tumor-specific action of trail to induce the apoptosis of cancer cells while sparing normal cells is attractive and renders trail signaling a potential therapeutic target. to date, clinical trials of trail have focused on recombinant trail protein and antibodies against trail-r (table 1) . however, clinical trials have shown inadequate treatment outcomes [26] . the recombinant form of trail and anti-trail-r antibodies, as well as their combination with other components, have not achieved the expected efficacy [7] . for example, the recombinant form of trail did not exhibit significant antitumor effects, partially due to its short half-life [26, 27] , while tas266, an antibody targeting dr5, showed acute toxicity in a phase i clinical study [28] . there are three major limitations of trail-based therapy: its short in vivo half-life [29, 30] , its poor tumor-targeting efficacy, and resistance to trail monotherapy [31, 32] . the emergence of nanotechnology has provided a useful tool to address these problems. 
trail-based nanotherapies offer the potential to improve the stability of trail and prolong its half-life in the bloodstream, to specifically deliver trail to target sites and to overcome resistance to trail [33]. compared with direct administration of trail proteins, trail gene therapy also has the unique advantages of delivering trail-encoding dna into tumor cells to locally secrete the trail protein on the membrane or into the tumor microenvironment, thereby overcoming the limitations of recombinant trail protein. notably, combination therapies of trail with other anticancer agents via a codelivery system may solve the problem of trail resistance. trail gene therapy also benefits from the "bystander" effect, by which not only the host cancer cells but also the neighboring cancer cells can be killed by both secreted and membrane-bound trail [34]. trail shows a unique advantage over other cell death-inducing ligands (e.g., fas ligand, fasl), of which only the membrane-bound form can induce apoptosis, while the intrinsic soluble form cannot [35]. liposome-bound trail induces even more efficient apoptosis than the soluble form [36]. trail-based gene therapy has been investigated in various types of tumors, such as hepatocellular carcinoma and cervical cancer. since the completion of the human genome project, the development of gene therapy has accelerated. gene therapy depends on the success of delivering specific nucleic acids to target sites by overcoming a series of biobarriers; in other words, it relies on the efficient delivery of vectors. viral vectors are well known for their high transfection efficiency, which mainly relies on their ability to integrate genes into the genome of host cells. this approach increases the risk of insertion mutations at the integration site, especially if there are hotspots near proto-oncogenes. thus, there is an urgent need to find a highly efficient vector with low genotoxicity and immunogenicity.
nonviral vectors delivering genes typically via a nanosized platform are another option, with obvious advantages of safety and the high packing capacity of nucleic acids. in addition, advances in nanotechnology have provided good insight into rational designs for targeted delivery. although numerous nonviral vectors have been developed, the amount of data from clinical trials has been very limited due to their low transfection efficiency [37]. thus, there is a pressing need to develop nonviral vectors with increased efficiency. nonviral vectors include cationic lipids, cationic polymers, and inorganic nanoparticles. polyethyleneimine (pei), for example, has high transfection efficiency in various cell lines. for the systemic delivery of gene therapeutics, delivery systems must cross a series of barriers, which is an important consideration in the design of delivery systems. regarding trail-based therapy, viral vectors are prominently used for cell therapy, while nonviral vectors deliver plasmid-encoded trail (ptrail) to targets. in this review, we summarize trail-based gene therapeutic strategies and discuss the challenges facing clinical trials of trail. most gene therapy clinical trials involved viral vectors [38]. modified viruses such as retroviruses, lentiviruses, adenoviruses, and adeno-associated viruses (aavs) are commonly used to deliver genes. viruses are able to transfer genes into many cell types, with highly efficient transfection. griffith et al. constructed a trail-encoded adenovirus and found that rapid expression of the trail protein and apoptosis of tumor cells were triggered by the activation of caspase-8 [39]. oncolytic adenoviruses (oads) can selectively replicate in cancer cells while sparing normal cells; thus, these viruses have been used in clinical trials for anticancer therapy. el-shemi et al.
applied systemic therapy with ad-δb/ing4 (inhibitor of growth 4) plus ad-δb/trail in an orthotopic human hepatocellular carcinoma (hcc)-bearing mouse model. this study found that the combination of these agents elicited potent eradicative effects by inducing apoptosis and immune responses, and suppressing tumor angiogenesis without causing obvious overlapping toxicity [40]. in addition, the use of viral vectors has been explored for trail-based cell therapy, which will be introduced in section 3. because of the inherent shortcomings of viral vectors, including their limited dna packaging capacity, complicated production processes, broad tropism, cytotoxicity, immunogenicity, and tumorigenicity, nonviral vectors have also been widely investigated as an option for gene therapy [41]. in contrast to viral vectors, nonviral vectors have the advantages of low immunogenicity, high delivery capacity, and easy preparation [42, 43]. before they reach the target cells, delivery systems need to cross a series of biobarriers. in the physiological environment, positively charged complexes are more prone to bind serum proteins and aggregate and thus be cleared rapidly [44, 45]. shielding the positive charge using polyethylene glycol (peg) or anionic materials such as γ-pga can be helpful. another challenge is selective accumulation at the target tissue, which requires specific designs, such as arming the vector with targeting ligands. subsequently, the dna of interest needs to be delivered to the nucleus for transcription [41]. thus, the design of gene delivery systems should account for these considerations.
fig. 1 pathway of trail-induced apoptosis. trail binds to five receptors, including four membrane-bound receptors (i.e., dr4, dr5, trail-r3, and trail-r4) and one soluble receptor (opg). only binding to dr4 or dr5 results in receptor trimerization and recruitment of fadd via the dds of dr4 or dr5. after further recruitment of caspase-8, these proteins form a complex named disc, which can activate caspase-8. in type i cells, caspase-8 activates caspase-3 and triggers apoptosis via the extrinsic pathway. however, in type ii cells, the intrinsic pathway is triggered via caspase-8/bid/tbid, and consequently, bax/bak on the mitochondrial membrane is activated to induce the release of cytochrome c, which promotes the formation of the apoptosome with apaf1 and procaspase-9. subsequently, activation of caspase-9 and caspase-3 is induced. figure adapted from fig. 2 in ref. [7].
polymers. cationic polymers are an important class of nonviral vectors. poly(l-lysine) (pll) and polyethyleneimine (pei) were developed for gene delivery in the 1990s. subsequently, numerous cationic polymers have been developed and used, including poly[(2-dimethylamino)ethyl methacrylate] (pdmaema), poly(β-amino ester)s, and various carbohydrate-based polymers and dendrimers. pei and its derivatives are the most commonly used polymeric vectors for gene delivery, with the advantage of the "proton sponge" effect, which facilitates the endosomal escape of gene drugs [46]. however, the major hurdles to overcome for the in vivo use of pei are the substantial cytotoxicity related to its strong positive charge and the issue of its stability in the bloodstream [47]. to address these issues, modification of pei and the surface coating have been investigated. for example, a γ-pga corona can shield the positive charge of the pei/dna complex, thereby decreasing its toxicity and increasing its stability. it has been reported that the γ-pga-coated branched pei/ptrail complex can efficiently transfect pancreatic stellate cells expressing fibroblast growth factor receptors [48]. although pei 25k is the gold standard for polymer transfection, its high molecular weight generally causes high toxicity. therefore, many studies focusing on low-molecular-weight pei have been reported.
for instance, a gene delivery system for brain targeting was established by using pei 10k modified with myristic acid (mc); mc-pei 10k/dna nanoparticles could interact with the cell membrane via the hydrophobic segment on mc that can be incorporated into the phospholipid bilayer. mc-pei 10k/porf-htrail nanoparticles were effective against the growth of intracranial tumors [49]. furthermore, cell-penetrating peptide (cpp)-modified and mannosylated low-molecular-weight pei (termed man-pei 5k-cpp) was constructed as a vector to deliver ptrail for colorectal cancer treatment. man-pei 5k-cpp increased the cellular uptake efficiency and improved the efficiency of transfection [50]. then, a ternary complex system was developed by a γ-pga-based γ-glutamyl transpeptidase (ggt)-targeting and surface camouflage strategy via a layer-by-layer self-assembly method. biodegradable polyanionic γ-pga protected the pei/pdna complexes from interaction with body fluid components and interacted with tumor-overexpressed ggt, which can mediate the endocytosis of nanoparticles for cervical cancer gene therapy (fig. 2) [51]. hyaluronic acid (ha) also acts as a biocompatible polyanionic biomaterial for shielding positive charges. an ha-decorated polyethyleneimine-poly(d, l-lactide-co-glycolide) nanoparticle (pei-plga np) system was established for targeted codelivery of ptrail and gambogic acid (ga) for triple-negative breast cancer (tnbc) therapy [52]. ga was encapsulated into the hydrophobic core of pei-plga nps, while ptrail was adsorbed onto the positively charged np surface. the ha coating on the pei-plga nps not only functioned as a shell to neutralize the excess positive charge on the nps but also served as a targeting ligand by binding to the cd44 receptor on tnbc cells. a series of terpolymers were synthesized via the enzyme-catalyzed copolymerization of lactone with dialkyl diester and aminodiol.
targeted delivery of ptrail to the tumor by the terpolymers resulted in significant inhibition of tumor growth with minimal toxicity in vivo, and the transfection efficiency was associated with the high molecular weight and increased hydrophobicity [53]. intriguingly, it was found that preparation via a high concentration process (i.e., a small reaction volume) resulted in large pei/dna complexes that had a higher gene transfection efficiency than their small counterparts prepared at a low concentration (fig. 3) [54]. the mechanisms were associated with macropinocytosis and fast dissociation. accordingly, large-sized pei/ptrail complexes exhibited increased anticancer efficacy via local or regional administration in subcutaneous xenograft and peritoneal xenograft mouse models.
table note: the clinical trials can be found at https://www.clinicaltrials.gov
poly(beta-amino ester)s (pbaes) are a class of cationic synthetic polymers that can be used as nonviral gene carriers. these compounds are easy to synthesize, bind effectively to dna, and are hydrolytically degradable under physiological conditions. tzeng et al. reported the use of pbae nanoparticles containing ptrail for the selective transfection of cancer cells and the induction of apoptosis in several human cancer cells [55]. additionally, shen et al. developed an ingenious vector comprising quaternary amines carrying n-propionic 4-acetoxybenzyl ester substituents; the 4-acetoxybenzyl ester group can undergo rapid intracellular esterase-catalyzed hydrolysis, which subsequently triggers a reversal of the polymer's charge from cationic to zwitterionic [56]. due to the high cytosolic esterase activity in cancer cells but low activity in fibroblasts, the loaded ptrail can be delivered specifically into cancer cells without damaging fibroblasts, preventing the expression of wnt16b (fig. 4) [56].
the same group also synthesized another enzyme-responsive cationic polymer, poly (pqdea), which is rapidly hydrolyzed by intracellular esterases to form anionic poly(acrylic acid), conferring low cytotoxicity and fast release of dna [57]. pqdea/dna polyplexes were further coated with a lipid layer to generate serum-stable lipidic polyplexes (lpqdea/dnas) for in vivo use. lpqdea/ptrail inhibited tumor formation as effectively as paclitaxel but resulted in less tumor relapse and longer survival. in addition, some inorganic materials have been modified with polymers for ptrail delivery. for example, a sandwich-type pei-coated gold nanocomposite coloaded with ptrail and nuclear-targeted dexamethasone (dexa) (termed au-pei/ptrail/pei-dexa) exhibited efficient transfection and significantly inhibited the growth of hep3b tumors [58]. a targeted iron oxide np coated with a chitosan-peg-pei copolymer and chlorotoxin was developed; this np efficiently delivered ptrail into human t98g gbm cells and induced the secretion of trail [59]. further, systemic administration to mice bearing t98g-derived flank xenografts resulted in almost imperceptible tumor growth and induced apoptosis in tumor tissue. to overcome treatment limitations of adenoid cystic carcinoma, the fe3o4-pei-plasmid complex (fpp) was generated, in which iron oxide nps were modified by positively charged pei to enable them to carry pactert-trail [60]. the efficiency of fpp-mediated transfection was sixfold higher than that of pei alone or of lipo2000, and fpp-mediated trail gene transfer efficiently inhibited sacc-83 tumor growth.

dendrimers. dendrimers, which are characterized by their well-defined size and low polydispersity index, have been widely used for drug and gene delivery [61]. jiang et al.
reported a gene vector generated by polyamidoamine (pamam) and angiopep-2. another strategy has been reported in which transferrin (tf) was conjugated to a generation 3 diaminobutyric polypropylenimine (dab) dendrimer; this delivery system harbors plasmids encoding tnf-α, trail, or il-12 and leads to therapeutic effects on prostate tumors following intravenous administration [63]. in addition, lactoferrin (lf)-bearing dab dendriplexes showed similar efficacy [64]. later, coumarin-anchored low-generation dendrimers (g1 pamam dendrimers) were established to improve dna binding and gene delivery [65]. the coumarin moieties endowed these materials with light-responsive drug delivery behaviors, and the drug-loaded nanoparticles exhibited complementary anticancer activity through the codelivery of 5-fluorouracil and ptrail. a triazine-modified dendrimer, g5-dat66, was synthesized and used as a vector for ptrail therapy, showing higher transfection efficacy than commercial transfection reagents such as lipo2000 and superfect [66]. furthermore, in vivo studies demonstrated that g5-dat66/ptrail efficiently inhibited tumor growth in osteosarcoma-bearing mice. pishavar et al. synthesized a vector composed of pamam dendrimers modified with alkyl-carboxylate chains, peg, and cholesteryl chloroformate for delivering ptrail, and these pamam g4-alkyl-peg (3%)-chol (5%)-trail complexes inhibited ct26 tumor growth in mice [67].

peptides and proteins. cationic peptides rich in residues such as lysine or arginine are able to condense dna. furthermore, conjugation of peptide ligands to delivery systems allows targeting to specific types of cancer cells [68].
a biomimetic vector was developed for delivering ptrail to tumors; it contained an adenovirus μ peptide for pdna condensation, a synthetic cyclic peptide for tumor targeting and intracellular delivery, a ph-responsive synthetic fusogenic peptide for endosome escape, and a nuclear localization signal from human immunodeficiency virus for intranuclear delivery [69]. up to 62% of zr-75-1 breast cancer cells were killed after exposure to ptrail in complexation with this vector.

fig. 4 illustration of the esterase-responsive charge-reversal polymer (erp) and its lipid-coated esterase-responsive polyplexes with trail plasmid for cancer gene therapy. a the erp is a pei whose amines are quaternized with propionic 4-acetoxybenzyl ester. hydrolysis of the phenolic acetate triggers the elimination of p-hydroxymethylphenol and consequent conversion of the cationic polymer into a zwitterionic form. b erp condenses plasmid dna into the polyplexes, which are easily coated with dc-chol/dope lipids to form lipidic esterase-responsive polyplexes (lerps). after i.p. injection into nude mice bearing hela cell-derived tumors, tumor cells internalize lerps into the cytosol, which is rich in esterases. the lerps disassemble and release the polyplexes to allow the esterases to trigger the charge reversal of the erp and thus plasmid release. these free plasmids enter the nucleus for effective gene expression, inducing apoptosis when delivering the trail gene. in tumor fibroblasts, the low esterase level cannot efficiently induce the charge reversal process, and the trail plasmids will not be expressed, preventing wnt16b production. reprinted with permission from [56]

a dimerized hiv-1 tat peptide was used to formulate a nanoparticle vector (dtat np) to leverage the efficiency of this cell penetration strategy for tumor-targeted gene delivery [70]. in cell culture, dtat np was an effective pdna transfection vector with negligible cytotoxicity.
gene expression in tumor tissues lasted for >14 days after intratracheal administration. bolus administration of dtat np-encapsulated ptrail markedly attenuated the growth of tumors derived from lewis lung carcinoma cells. phosphatase and tensin homolog (pten) and trail genes loaded into zein nanoparticles showed antiproliferative activity against hepg2 cells, indicating their potential for gene therapy of hcc [71].

liposomes. cationic lipids have been widely used for nucleic acid delivery, and their molecular design has driven advances in gene delivery. cationic lipids can spontaneously form specific types of complexes for condensing and encapsulating dna into particles [72]. huang et al. reported a novel lipid, 1,2-di-(9z-octadecenoyl)-3-biguanide-propane (dobp), that was designed with biguanide as the cationic head group [73]. this cationic lipid acted as a gene carrier and had a metformin-like antitumor activity via activation of the ampk pathway and inhibition of the mtor pathway. dobp-lpdtrail nps showed potent efficacy against tumor progression. then, lipid-coated protamine nps were used to transfer a secreted form of trail (strail) to tumor-associated fibroblasts (tafs) so that these cells would secrete the cytotoxic protein to tumor cells [74]. strail triggered apoptosis in tumor cell nests adjacent to tafs. furthermore, strail converted the residual fibroblasts to a quiescent state, thus arresting tumor growth and remodeling the microenvironment to facilitate second-wave nanotherapy. the system showed good efficacy in an orthotopic xenograft model of human pancreatic cancer, where the desmoplastic stroma is a major barrier to the delivery of therapeutic nanoparticles. chen et al. developed a tumor-targeted lcpp (lipid/calcium/phosphate/protamine) np to deliver ptrail into hcc cells in a mouse model of hcc [75].
ptrail was entrapped in a ph-responsive calcium phosphate (cap) core, and protamine was included to direct intranuclear delivery. trail resistance could be reversed by intracellular release of ca2+ from the cap core that induced dr5 up-regulation through camkii activation (fig. 5).

other vectors. gong et al. developed a well-tailored and versatile "core-shell" ternary system (rrphc) for systemic gene delivery to treat aggressive melanoma [76]. this system consisted of a core of fluorinated polymers (pfs) that bound to a plasmid (pdna) and a negatively charged multifunctional rrph (rgd-r8-peg-ha) shell constructed by grafting a hyaluronan polymer with peg side chains, which were further conjugated with the r8-rgd tandem peptide on the distal side, simultaneously targeting the cd44 receptors and integrin αvβ3 receptors overexpressed on the neovasculature and most malignant tumor cells. systemic injection of the proapoptotic mtrail plasmid by rrphc ternary complexes inhibited melanoma growth without noticeable toxicity. next, this group reported a similar artificial virus core-shell system to target cancer stem cell-like cells [77]. the intravenously injected nanoparticles accumulated at the tumor sites while reducing exposure of normal tissues, and efficiently arrested tumor growth without obvious systemic toxicity.

mesenchymal stem cells (mscs) are a population of fibroblast-like cells originally isolated from bone marrow and other tissues, including adipose tissue, peripheral blood, umbilical cord blood, and wharton's jelly, among others [78, 79]. mscs have therapeutic potential in several pathological conditions and unique immunological features [80] [81] [82]. in recent years, the cellular vehicle function of mscs has been used to transfer trail to the tumor parenchyma [83] [84] [85]. lee et al.
reported a novel application of magnetic core-shell nanoparticles for the dual purpose of delivering and activating a heat-inducible gene vector that encodes trail in adipose-derived mesenchymal stem cells (ad-mscs) [86]. this group developed a plasmid harboring the heat shock protein 70b' (hsp70b') promoter. the magnetic core-shell nanoparticles (mc nps) were composed of a znfe2o4 magnetic nanoparticle core and a mesoporous silica shell, as well as a surface coating of pei for dna binding. the mc nps facilitated the intracellular delivery of the heat-inducible plasmid into ad-mscs via magnetic guidance, and after systemic injection, the engineered ad-mscs could home to tumors/metastases. trail expression could be specifically activated via the induction of mild magnetic hyperthermia (~41°c). this system enhanced control over the activation of stem cell-based gene therapies. in another study, human mscs were transduced with trail and the ires-egfp reporter gene under the control of the tetracycline promoter using a lentiviral vector [87]. the transduced and activated mscs induced apoptosis and death in various cancer cells in coculture experiments. the in vivo studies demonstrated that the i.v. injected trail-expressing mscs significantly arrested tumor growth. gao et al. transfected ptrail into mscs with a nonviral vector, pei600-cyd, prepared by linking low-molecular-weight polyethyleneimine (pei) and β-cyclodextrin (β-cd) [88]. the lung tumor homing ability of mscs expressing trail in vivo proved to be efficient for lung metastasis therapy. fig. 6 shows a schematic of the specific process of this cell-based trail therapy. human umbilical cord-derived mesenchymal stem cells (humscs) were transfected by lentiviral vectors encoding strail under the alpha-fetoprotein (afp) promoter, and the treatment efficacy of these engineered humscs on orthotopically implanted hepatocarcinomas in mice was examined [89].
humscs could migrate to the hepatocarcinoma, where the afp promoter was triggered by the early hepatic differentiation of humscs; the cells then expressed strail at the tumor site and yielded significant antitumor activity. dominici et al. transduced human ad-mscs with a retroviral vector encoding full-length human trail [84]. ad-mscs could target various cancer cell lines in vitro, and trail resistance could be reversed by coadministration of bortezomib. these ad-mscs homed to tumors and induced apoptosis without apparent toxicity. trail-expressing mscs also induced apoptosis by cell-to-cell contact in trail-resistant ewing sarcoma (ews), which is also insensitive to amg655, an antibody against the death receptor dr5, and the treatment effect was confirmed in two orthotopic models of ews [85]. despite these encouraging results, there is some concern regarding the safety of inoculating both wild-type and genetically modified mscs, especially regarding their possible damaging effects on normal organs, malignant transformation, and promotion of cancer growth [90]. then, researchers incorporated suicide genes (the herpes simplex virus thymidine kinase (hsv-tk) and cytosine deaminase genes) into mscs to control their fate once infused [91]. this approach is based on a variant of human caspase-9 that binds with high affinity to a synthetic, bioinert small molecule (ap20187), leading to cell death [92]. conventionally, mscs have been genetically modified for cancer therapy by using viral vectors that can elicit oncogenicity, thus limiting their use in clinical trials. chen et al. used nonviral agents such as a polylysine-modified polyethyleneimine (pei-pll) copolymer to generate genetically engineered mscs with suicide genes, namely, hsv-tk and trail [93]. mscs armed with these suicide genes, together with the prodrug ganciclovir, induced a significant antitumor effect against glioblastoma both in vitro and, by intratumoral injection, in vivo.
another study on delivering the trail gene for stem cell-mediated gene therapy was conducted by using nonviral vectors (a less efficient but safer method). na et al. prepared polyplexes of ptrail and bpei, and photochemical internalization (pci) was applied to improve polyplex entrapment in hmscs and enhance the transfection efficiency of ptrail; the tumor-homing hmscs could also increase trail secretion in the tumors [94]. pci-mediated polyplex loading significantly enhanced trail expression in stem cells, and their homing ability enhanced cancer targeting. exposure of polyplex-loaded hmscs (ptrail/bpei@hmscs) to laser irradiation resulted in a beneficial therapeutic antitumor effect in a xenograft mouse model. in addition to mscs, hu et al. reported a dendritic cell (dc)-based therapy for colon cancer [95]. tyrosine kinase receptor 3 ligand (fl) and trail plasmids were constructed for combination therapy. fl, a hemopoietic growth factor, is important in progenitor cell proliferation and differentiation and can enhance the proliferation and antitumor effect of dcs. these two plasmids were transfected into dcs by lipo2000. the combination of fl-carrying dcs and trail-carrying dcs showed a good level of apoptosis in colon cancer [96].

fig. 5 (caption fragment) b trail expression and secretion were quantified by confocal microscopy and elisa, respectively. c sp94-lcpp nps without pdna significantly increased dr5 expression in a dose-dependent manner in hcc cells. d the camkii inhibitor k252a prevented the effects of lcpp nps on dr5 upregulation. e, f treatment with trail pdna loaded into sp94-lcpp nps showed higher cytotoxicity in hcc cells than in control cells. however, these nps exhibited slight cytotoxicity in murine hepatocyte fl83b cells. reprinted with permission from [75]

although trail-induced apoptosis is an attractive therapeutic target, resistance to trail readily develops during treatment.
this therapeutic resistance derives from two sources: intrinsic resistance in some highly malignant tumors [97] [98] [99] and acquired resistance after repeated exposure to trail [100]. resistance to trail is conferred by multiple receptors and involves a series of signaling pathways and activation of inhibitory molecules [99]. trail signaling begins with the binding of trail to the death receptors. first, binding to decoy receptors (e.g., trail-r3 and trail-r4) will not lead to the activation of downstream trail signaling. second, mutation or downregulation of the functional drs (e.g., dr4 or dr5) can also result in resistance. for instance, transfection of mutated dr4 into sw480 colon cancer cells caused a lower efficacy of cell killing than transfection of the wild-type counterpart [101]. low expression of dr5 contributed to trail resistance in anti-dr5 antibody therapy [102]. in this case, upregulation of dr5 is a therapeutic strategy for sensitization to trail treatment [101]. third, the multifunctionality of downstream trail signaling also induces resistance. the assembly of the disc can be inhibited by apoptosis inhibitors. for example, cflip, with a similar structure to caspase-8, can inhibit caspase-8 activation after binding to fadd [103]. the ratio of cflip/caspase-8 is correlated with trail resistance in various tumors, such as hepatocellular carcinoma and burkitt lymphoma [104, 105]. in type ii cells, trail-initiated apoptosis is mainly mediated by the mitochondrial pathway (fig. 7). upregulation of the proapoptotic proteins bax and bak promotes cell death, while the antiapoptotic proteins bcl-2 and bcl-xl determine cell survival [106, 107]. furthermore, inhibitors of apoptosis proteins (iaps) can block apoptosis by inhibiting the activity of caspases (e.g., caspase-3 and caspase-9), and overexpression of iaps can confer trail resistance [108, 109].
however, this effect can be reversed by iap antagonists such as smac/diablo, which promotes apoptosis by interacting with iaps [110, 111]. the mechanisms are summarized in fig. 7. in addition, epigenetic changes play a role in trail resistance via the regulation of caspase-8 gene expression, and dna demethylation can restore apoptosis in trail-resistant tumor cells. trail resistance is also related to nf-κb and mitogen-activated protein (map) kinases. however, these signaling pathways showed both proapoptotic and antiapoptotic effects [112]. in conclusion, dr dysfunction and overexpression of cflip, bcl-2, bcl-xl, and iaps contribute to trail resistance, and therefore, regulation of drs and inhibition of antiapoptotic proteins resensitizes cells to trail treatment [113].

the resistance of cancer cells to trail has encouraged the investigation of combination therapy. in recent years, many multifunctional drug delivery systems, including liposomes, micelles, and polymeric nanoparticles, have been developed for the codelivery of genes and drugs. here, we summarize the delivery and therapeutic strategies of combinations of ptrail and different types of drugs.

fig. 6 the nonviral vector pei600-cyd was used to transduce trail plasmid into mscs. trail-armed mscs showed a lung tumor-homing ability and an antitumor effect on lung metastasis-bearing c57bl/6 mice after i.v. injection. reprinted with permission from [88]

fig. 7 inhibition of trail signaling results in trail resistance. when the disc assembles, flip (flice-like inhibitory protein) can competitively bind to fadd and limit the recruitment of caspase-8. while bcl-2 and bcl-xl can inhibit bax and bak, tbid plays roles in inhibiting bcl-2 and bcl-xl and activating bax and bak in type ii cells. iaps can strongly inhibit the activation of effector caspases, as smac/diablo is needed to interact with iaps to release effector caspases. figure adapted from fig. 2 in ref. [7]

combination with trail sensitizers

the combination of trail with its sensitizers is a promising strategy to overcome trail resistance. it was found that polyether ionophore antibiotics (e.g., monensin) can overcome trail resistance via endoplasmic reticulum stress induction, dr5 upregulation and cflip downregulation [114]. huang et al. constructed a biocompatible nanosystem for the codelivery of ptrail and monensin, in which low-molecular-weight pei 1.8k (lmw-pei) was intermolecularly crosslinked via disulfide bonds using sulfhydryl β-cyclodextrin as a linker (fig. 8) [115]. the resulting β-cd-sspei binds efficiently with ptrail and thus serves as a carrier for codelivery, while monensin is encapsulated inside the cavities of the β-cyclodextrin molecules. the mo β-cd-sspei ptrail nanocomplex can further be modified by a polyanionic polymer, γ-pga, thus protecting the nanocomplex from interaction with serum proteins and consequently extending its half-life in the bloodstream. furthermore, γ-pga/mo β-cd-sspei ptrail can achieve tumor targeting via specific binding between γ-pga and tumor-overexpressed ggt, which can mediate the endocytosis of the nanocomplex. inside cancer cells, the acidic endosomal environment facilitates the detachment of γ-pga, whose conformation is ph-dependent, and the exposed pei facilitates endosomal escape. importantly, the intermolecular crosslinks of pei can be degraded, and ptrail and monensin can be released. monensin can increase intracellular ros levels and induce apoptosis. moreover, dr5 expression is upregulated, thus synergizing with trail-based treatment for colon cancer gene therapy.

combination with chemotherapeutic drugs

trail exhibits improved efficacy in combination with chemotherapy because chemotherapeutic agents can sensitize tumors to trail-induced apoptosis via crosstalk between the intrinsic and extrinsic pathways of cell death [116].
the chemotherapeutic drugs doxorubicin (dox) and paclitaxel (ptx) were explored to determine their synergistic antitumor effect with ptrail. ebrahimian et al. reported a vector composed of polypropylenimine (ppi) modified with 10-bromodecanoic acid for the codelivery of ptrail and dox for tumor therapy [117]. in addition, a host-guest conjugated nanoparticle for the codelivery of dox and ptrail was designed [118]. the adamantane-conjugated dox (ad-dox, guest component) and the pei-cyclodextrin conjugates (pei-cd, host) were self-assembled into the supramolecular pei-cd/ad-dox, which further bound with trail dna to form the pei-cd/ad-dox/pdna snps. the snps exhibited enhanced therapeutic efficacy and significantly increased the survival rate of the tumor-bearing mice. jiang et al. investigated the codelivery of the htrail-encoding plasmid open reading frame (porf-htrail) and dox using a tumor-targeting carrier, a peptide haiyprh (t7)-conjugated polyethylene glycol-modified polyamidoamine dendrimer (pamam-peg-t7) [119]. in this system, approximately 375 dox molecules were bound to one porf-htrail molecule, and t7 served as a ligand targeting tumor cell-overexpressed transferrin receptors. this codelivery system induced apoptosis of tumor cells and efficiently inhibited bel-7402 tumor growth in vivo. the combination therapy strategy was further applied to glioma, in which dendrigraft poly-l-lysine (dgl), modified by the t7 peptide and conjugated with dox via a ph-sensitive hydrazone bond, was used to deliver porf-htrail to glioma tissue [120]. in addition, other dual targeting systems have also been developed for the codelivery of dox and porf-htrail for glioma treatment [121, 122]. regarding the combination of ptx and trail, an angiopep-2 peptide-modified cationic liposome (ang-clp) for the efficient codelivery of pegfp-htrail and ptx to glioma was reported.
angiopep-2 can target the low-density lipoprotein receptor-related protein (lrp) overexpressed on the bbb and on glioma cells [123]. lu et al. developed two kinds of nanoparticles for the delivery of porf-htrail and ptx to glioma tissues [124, 125]. porf-htrail was delivered by c(rgdyk)-poly(ethylene glycol)-polyethyleneimine (rgd-peg-pei). rgd can bind to integrin αvβ3, which is overexpressed in the neovasculature and on u87 glioblastoma cells. ptx was loaded in a cdx-poly(ethylene glycol)-block-poly(lactic acid) micelle. cdx is a peptide derived from the loop ii region of the snake neurotoxin candoxin, with a high binding affinity to nicotinic acetylcholine receptors (nachrs). dominici et al. investigated the combination of ptrail and ptx for msc therapy [126]. ptx restored the sensitivity of pancreatic cancer to msc-delivered trail by reverting its prosurvival gene expression profile. additionally, a combination of cisplatin and trail with high anticancer activity was found [127]. these results indicate that trail gene therapy in combination with chemotherapy can be promising for ovarian cancer therapy. although it is not fully known how chemotherapy sensitizes cells to trail-induced apoptosis, the effect may be related to a p53-independent pathway, in addition to changes in the expression of proteins involved in trail signaling, which also play an important role [128].

fig. 8 the process of codelivery of ptrail and monensin nanocomplexes. as a trail sensitizer, monensin upregulates dr5 and sensitizes tumor cells to trail. reprinted with permission from [115]

combination with intracellular apoptosis-inducing agents

as trail triggers apoptosis from outside the cell through death receptors, its combination with agents that act on the intracellular apoptosis pathway can yield a synergistic effect.
histone deacetylase inhibitors (hdaci) induce cell cycle arrest and apoptosis in tumor cells, and hdaci can resensitize trail-resistant cancer cells. codelivery of ptrail and vorinostat (saha) with a reactive oxygen species (ros)-triggered charge reversal polymer (b-pdeaea) produced a good antitumor effect [129]. the increased transfection efficiency of ptrail and saha-induced ros accumulation caused significant apoptosis of the cancer cells. an msc-based gene therapy for glioma was developed, in which hat-mscs were transfected with pires2-egfp-strail [130]. the engineered mscs effectively inhibited the proliferation of malignant glioma cells and upregulated drs, yielding a potent antiglioma effect in combination with panobinostat. iaps play an important role in cancer cell resistance to trail, and the combination of trail with iap antagonists can help sensitize cells to trail-induced apoptosis. ge et al. demonstrated that an oncolytic adenovirus coexpressing trail and smac, in combination with the cyclin-dependent kinase (cdk) inhibitor sns-032, synergistically reinforced their individual anti-pancreatic cancer activities, and sns-032 enhanced zd55-trail-ietd-smac-induced apoptosis [131]. shi et al. developed an aav-mediated gene therapy characterized by coexpression of trail with mir-221-zip, which enhanced apoptosis induction synergistically; mir-221-zip sensitized cells by upregulating pten and downregulating survivin [132]. compared with trail monotherapy, combination therapy provides enhanced treatment outcomes. therefore, combinations including trail-mediated therapy are a promising approach for cancer therapy.

trail is a promising drug candidate for the treatment of many cancers.
the investigation of trail-based therapy in clinical trials focuses on recombinant trail proteins and anti-trail antibodies, but delivery and drug resistance issues are the main hurdles in the successful translation of these results. trail-based gene therapy is another potential treatment approach. importantly, the codelivery systems loaded with ptrail and drugs combined with trail have been actively explored. the development of safe and efficient gene vectors is the central issue for gene therapy. the elegant design of advanced systems provides the tumor-targeted codelivery of both agents, thus achieving synergistic treatment effects. in addition, it is necessary to identify proper biomarkers for maximizing trail-based therapy, which can also provide helpful information for the appropriate selection of drug combinations. however, clinical trials have made little progress. efforts should be directed toward the rational design of appropriate formulations of trail to improve its in vivo pharmacokinetic profile and the exploration of biomarkers specific to trail-based therapy. the future application of trail may benefit from a better understanding of the anticancer mechanisms of trail as well as its combination with trail sensitizers. it is expected that as precision medicine progresses, trail-mediated antitumor functions will be better understood to allow the development of precise and effective trail-based treatments for patients. 
overcoming cellular barriers for rna therapeutics the cd95 (apo-1/fas) and the trail (apo-2l) apoptosis systems cachetin/tnf-alpha in septic shock and septic adult respiratory distress syndrome lethal effect of the anti-fas antibody in mice identification and characterization of a new member of the tnf family that induces apoptosis induction of apoptosis by apo-2 ligand, a new member of the tumor necrosis factor cytokine family exploring the trails less travelled: trail in cancer biology and therapy rankl/ opg/trail plasma levels and bone mass loss evaluation in antiretroviral naive hiv-1-positive men apo2l/ trail and the death receptor 5 agonist antibody amg 655 cooperate to promote receptor clustering and antitumor activity targeting trail in the treatment of cancer: new developments tumoricidal activity of tumor necrosis factor-related apoptosis-inducing ligand in vivo safety and antitumor activity of recombinant soluble apo2 ligand a super way to kill cancer cells apoptosis-its significance in cancer and cancer-therapy cross-talk in cell death signaling proapoptotic activation of death receptor 5 on tumor endothelial cells disrupts the vasculature and reduces tumor growth an antagonist decoy receptor and a death domain-containing receptor for trail the receptor for the cytotoxic ligand trail control of trail-induced apoptosis by a family of signaling and decoy receptors trail-r2: a novel apoptosis-mediating receptor for trail cloning and characterization of trail-r3, a novel member of the emerging trail receptor family the novel receptor trail-r4 induces nf-kappab and protects against trailmediated apoptosis, yet retains an incomplete death domain a novel receptor for apo2l/trail contains a truncated death domain temperature-sensitive differential affinity of trail for its receptors. 
competing interests: the authors declare no competing interests. open access: this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
key: cord-018526-rz7id5mt authors: braun, serge title: non-viral vector for muscle-mediated gene therapy date: 2018-12-14 journal: muscle gene therapy doi: 10.1007/978-3-030-03095-7_9 sha: doc_id: 18526 cord_uid: rz7id5mt non-viral gene delivery to skeletal muscle was one of the first applications of gene therapy that went into the clinic, mainly because skeletal muscle is an easily accessible tissue for local gene transfer and non-viral vectors have a relatively safe and low immunogenic track record.
however, plasmid dna, naked or complexed to various chemistries, turns out to be moderately efficient in humans when injected locally and very inefficient (and very toxic in some cases) when injected systemically. a number of clinical applications have nevertheless been initiated, based on transgenes that were adapted to good local impact and/or to a wide physiological outcome (i.e., strong humoral and cellular immune responses following the introduction of dna vaccines). neuromuscular diseases seem more challenging for non-viral vectors. nevertheless, the local production of therapeutic proteins that may act distantly from the injected site and/or the hydrodynamic perfusion of safe plasmids remains a viable basis for the non-viral gene therapy of muscle disorders, cachexia, as well as peripheral neuropathies. skeletal muscle can act as an effective platform for the long-term production (and secretion) of therapeutic proteins with systemic distribution and for the introduction of dna vaccines eliciting strong humoral and cellular immune responses (for review see [1, 2]). conversely, the treatment of hereditary neuromuscular diseases is particularly challenging for non-viral vectors. among the issues are the following: (1) the size of the muscle tissue, which represents half of the total mass of the organism, (2) the poor accessibility of deep muscles or peripheral nerves, and (3) the progressive tissue remodeling along the natural history of some muscle diseases, with active processes of necrosis/regeneration and fibrosis/lipidosis.
s. braun (*), afm-telethon, evry cedex, france; e-mail: sbraun@afm-telethon.fr
on the other hand, non-viral vectors do bear interesting advantages over recombinant viruses. non-viral vectors are made of plasmid dna, naked or complexed to a variety of versatile molecules such as cationic lipids or polymers.
they are (1) well characterized, and their structure can be fine-tuned [3]; (2) mostly nonimmunogenic, provided they do not carry protein motifs, which allows repeated administrations for chronic diseases; (3) comparatively easy to produce at a large scale [4]; (4) less limited by size constraints, leaving the potential to deliver wild-type genetic material as large as 100 kb [5] (this is far beyond the size of coding sequences such as the dystrophin cdna for duchenne muscular dystrophy); and (5) able to remain functional for a long period of time in skeletal muscles [6]. episomal plasmid dna can persist for life in rodents and for many years in larger animals if it is delivered into low-turnover tissues, including the brain and spinal cord, heart, or muscle (for review see [7]). synthetic vectors have been constructed as substitutes for viral vectors for delivering therapeutic genes and many other drugs in humans [8]. the principle is based on the self-assembly of supramolecular complexes, often through electrostatic interactions between the positively charged vectors and the negatively charged phosphate residues of the dna [9]. in these complexes, dna is condensed and compacted and is less exposed to nuclease degradation. among these, cationic lipid- and polymer-based systems have been the most extensively studied [10-12]. in early studies, dna was encapsulated in neutral or anionic liposomes without changing the structures of the liposomes [9, 13]. the ratio between the cationic charge of the liposome and the negative charge of the dna usually controls the size of complexes [14], typically in the range of 200 nm to 2 μm quasi-spherical particles with an ordered (often multilamellar) organization. their positive total charge enables them to interact efficiently with the negative residues of the cell membranes and to be internalized into the cell, which occurs mainly through the endocytosis pathway [10, 15].
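the charge ratio mentioned above can be illustrated with a short calculation. the sketch below is not taken from the chapter: it assumes an average molar mass of about 330 g/mol per nucleotide (one negative phosphate charge each), a common textbook approximation, and the function names and the example lipid amounts are hypothetical.

```python
# Minimal sketch of a cationic-lipid/DNA charge-ratio (+/-) calculation.
# ASSUMPTION: ~330 g/mol per nucleotide, one phosphate charge per nucleotide.
# All names and example values here are illustrative, not from the chapter.

NUCLEOTIDE_MW_G_PER_MOL = 330.0


def dna_negative_charges_nmol(dna_ug: float) -> float:
    """Return nmol of phosphate (negative) charges in `dna_ug` micrograms of DNA."""
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    # micrograms -> nanograms, then ng / (g/mol) gives nmol
    return dna_ug * 1000.0 / NUCLEOTIDE_MW_G_PER_MOL


def charge_ratio(lipid_nmol: float, charges_per_lipid: int, dna_ug: float) -> float:
    """Return the +/- charge ratio of a lipid/DNA mixture."""
    if lipid_nmol < 0 or charges_per_lipid < 0:
        raise ValueError("lipid amount and charge count must be non-negative")
    positive_nmol = lipid_nmol * charges_per_lipid
    return positive_nmol / dna_negative_charges_nmol(dna_ug)


if __name__ == "__main__":
    # Hypothetical mixture: 1 microgram of plasmid with a monovalent cationic lipid.
    for lipid_nmol in (1.5, 3.0, 6.0, 12.0):
        ratio = charge_ratio(lipid_nmol, charges_per_lipid=1, dna_ug=1.0)
        print(f"{lipid_nmol:5.1f} nmol lipid -> charge ratio {ratio:.2f}")
```

a ratio above 1 corresponds to a net positive complex, which is the situation described in the text for interaction with negatively charged cell membranes; the relation between ratio and particle size would still have to be measured for each formulation.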
systemic biodistribution of non-viral vectors depends on their capability to escape from blood vessels in the target tissue. vectors must be small enough (less than 500 nm) to cross through vascular endothelial cells and gain access to surrounding tissues [16]. furthermore, they should also be designed so that they can be ignored in contrast to viral vectors, non-viral gene transfer is not elicited to a large extent by active intake processes. therefore, a sophisticated vector may be needed to facilitate the cellular uptake and appropriate intracellular processing of the transgene. significant developments in artificial complexes have combined different functions for improved gene transfer. many cationic liposomes are accompanied by a neutral lipid such as dioleoylphosphatidylethanolamine (dope) or cholesterol. dope is frequently useful because it can fuse with other lipids when exposed to a low ph, as in endosomes, which triggers the release of the associated dna into the cytosol [27]. other popular modifications use ligand binding to peg. various targeting approaches have been investigated, including incorporation of peptides, antibodies, and sugars into the lipid vesicles to facilitate tissue targeting (for review see [28]). however, the association of all of these components results in complex structures that require thorough formulation and galenic studies. after cell entry, intracellular barriers may impair successful gene delivery. vectors need to escape from the endosomal or lysosomal membrane to avoid degradation of the plasmid dna [29]. endosomal release of dna by cationic polyplex-based vectors may be based on the physical disruption of the negatively charged endosomal membrane after direct interaction with the cationic complex [30], or on a "proton-sponge" phenomenon [11] resulting in osmotic swelling and endosomal membrane rupture, followed by the release of the polyplexes into the cytoplasm.
addition of a fusogenic helper lipid such as dope facilitates the formation of a destabilizing hexagonal phase with the endosome membrane and enhances gene expression by promoting the release of dna from the endosomal compartment (fig. 9.1 and [31]). it should be mentioned that the majority of cytoplasmic plasmids fail to reach the nucleus due to cytoplasmic nucleases. in contrast to short nucleic acids (such as oligonucleotides), which diffuse freely, plasmid dna is imported into the nucleus by an active transport process via the nuclear pore complex and receptor proteins that include importin α, β, and ran [32]. nuclear localization signals or designed peptides can be linked to the plasmid dna to facilitate nuclear import (for review see [33, 34]). a number of therapeutic concepts have been explored in humans using more or less refined non-viral gene delivery systems, with a view to therapies for genetic and immunologic disorders. as of today, despite a number of very sophisticated chemistries, non-viral vectors have not been completely satisfactory in transferring genes to muscle tissues following systemic administration. many complexes show excellent transfection activity in cell culture, but most do not perform well in the presence of serum, and only a few are active in vivo [35]. they remain at least 3 orders of magnitude less effective than viral vectors. therapeutic doses require high concentrations of complexes. besides the relatively large size of many synthetic vectors (often above 150 nm), the main obstacles in the use of synthetic complexes via systemic delivery are their aggregation, instability, toxicity, and propensity to be captured by the mononuclear phagocyte system, leading to their rapid clearance by phagocytic cells in the liver, spleen, lungs, and bone marrow. these particles readily aggregate as their concentration increases. toxicity is often linked to the colloidal instability of synthetic vectors resulting from interactions with molecules in biological fluids, leading to large aggregates. these aggregates, which are generally ineffective gene delivery agents, can be adsorbed onto the surface of circulating red blood cells or embolized in the microvasculature, preventing them from reaching the intended target cells. this opsonization process can also activate the complement system, one of the innate immune mechanisms against "foreign" particles within the bloodstream, which in turn activates phagocytosis and initiates an inflammatory response [7, 19, 36]. skeletal muscles possess poorly permeable, tight endothelial layers (maybe less so in the case of chronically inflamed tissues) and a highly regulated microcirculation [37]. the implication is that one would not expect particulate systems to be distributed easily from the blood circulation to skeletal muscles. thus, the prospects for widespread distribution of non-viral particulate vectors from the systemic circulation are limited at present. only one systemic delivery attempt has been initiated in a neuromuscular disease indication. this was in hereditary inclusion body myopathy, in a single patient intravenously perfused with a lipoplex in a compassionate trial. the patient showed signs of an increase in sialic acid-related proteins and stabilization of the decline in muscle strength [38].
[fig. 9.1, caption fragment: (a) ..., including negatively charged naked plasmid dna (or polynucleotides) delivered either directly or combined with physical methods (ultrasound, electroporation) or complexed with various chemical entities such as cationic lipids or polymers. (b) uptake pathways involve either fusion with the muscle cell membrane or receptor-, clathrin-, caveolae-, or pinocytosis-dependent endocytosis. this is followed by endosome formation, escape from the endosome, degradation, nuclear import of the plasmid dna/polynucleotide, and transgene expression.]
the administration of vectors directly to the target tissue avoids most of the obstacles encountered by systemic delivery. however, this approach remains hampered by diffusion limitations and immune cell clearance in the interstitial region of the target organ. indeed, transgene expression following direct intramuscular needle delivery of complexes is often localized in regions that are close to the injection site. this implies that the dispersion of colloidal particles within muscle is a critical issue, and there is a need for basic studies of the effect of formulation on dispersion within solid tissues such as skeletal muscle. nevertheless, this poor efficiency remains compatible with applications that require only low levels of the therapeutic proteins, such as genetic vaccines, cancer, or peripheral limb ischemia (table 9.1). interestingly, retrograde transport seems to occur, as some gene expression was found in the peripheral and central nervous system following intramuscular administration [39]. delivery of therapeutic genes to peripheral neurons upon a peripheral and minimally invasive intramuscular administration of polymeric nanoparticles was shown to be feasible in animal models [40]. naked dna can be manufactured in a very cost-effective manner and is a very stable material that can be stored at room temperature for long periods of time following lyophilization. it is composed of a bacterial plasmid that contains the cdna of the therapeutic gene under the transcriptional control of various eukaryotic regulatory elements and a bacterial origin of replication to allow production in bacteria. a strong promoter may be required for optimal expression in mammalian cells. for this, some promoters derived from viruses such as cytomegalovirus (cmv) or simian virus 40 (sv40) have been used.
however, virally derived promoters, such as the cmv promoter, may not be suitable for applications to chronic diseases, as illustrated by the negative impact of inflammatory cytokines (interferon-γ or tumor necrosis factor-α) [41]. thus, muscle-specific alternatives to the cmv promoter have been proposed, such as the desmin promoter/enhancer, which controls expression of the cytoskeletal protein desmin [42], or the creatine kinase promoter [43]. even in vaccines, the immune responses obtained with such promoters were shown to be of a comparable magnitude to those in mice immunized with dna vaccines containing nonspecific promoters. for clinical efficacy and safety in chronic disease applications, it may be necessary to maintain appropriate levels of a gene product in order to prevent toxicity and to be able to modulate or resume transgene expression in response to disease evolution or immune problems. artificial systems for the control of genes are based on two elements: a chimeric transcription factor responding to a small inducer or even an electric field, and an artificial promoter composed of multiple binding sites for the transcription factor followed by a minimal promoter. inducible gene expression systems use endogenous elements that respond to exogenous signals or stress, such as cytokines, heat, metal ions, and hypoxia. however, neither muscle-specific promoters nor inducible promoters in the absence of induction are devoid of leaky activity [44]. if hypomethylated bacterial cpg sequences are maintained on the plasmid dna backbone or promoter elements, a t helper 1 (th1) immune response can be generated (but only for a short period and with no induction of anti-dna antibodies), which may however be advantageous in view of genetic vaccination, alone or in prime-boost regimens with viral vectors [45].
following the serendipitous demonstration of transgene expression in skeletal muscle injected with naked dna by wolff [46], plasmid dna has been used extensively in a variety of indications [7]. uptake and expression of numerous transgenes have been demonstrated in various species following intramuscular administration of naked dna. expression peaks at around 7 days, followed by a slow decrease and a prolonged steady state (years) in the case of a non-immunogenic transgene. the very long-term expression is probably linked to the postmitotic state of skeletal muscles and the persistence of the administered genetic material as extrachromosomal episomal elements [47]. the efficiency of plasmid gene transfer into skeletal muscle (and other tissues) by direct injection is low (~1% of cell nuclei) and remains confined to the injection site (along the needle track) across species [48], and it further decreases with plasmid size. nevertheless, naked plasmid dna administration has been used in animal models to provide a systemic source of therapeutic protein, for genetic vaccination against pathogens and tumor cells, or for therapeutic angiogenesis. in the latter case, local gene delivery to focal lesions in the peripheral vasculature, for the production of highly active hormones, is ideally suited to the use of intramuscular or percutaneous vector delivery. in humans, intramuscular injections of naked plasmid encoding angiogenic factors (such as vegf165 or hgf) were used in small numbers of patients with critical limb ischemia and did demonstrate promising clinical efficacy for the treatment of peripheral arterial disease. ischemic pain and ischemic ulcers in the affected limb were relieved or markedly improved in further trials ([49] and table 9.1). importantly, all those plasmid-based preclinical and clinical trials resulted in a very good safety record ([50] and table 9.1).
a meta-analysis of 12 clinical trials (1494 patients total) of local administration of pro-angiogenic growth factors (vegf, fgf, hgf, del-1, hif-1alpha) using plasmid or viral gene transfer by intra-arterial or intramuscular injections showed that, despite promising results in single studies, no clear benefit could be identified in peripheral artery disease patients, irrespective of disease severity [51]. locally injected naked dna is being evaluated in muscle regeneration approaches such as myostatin propeptide gene gun delivery [52] and for genetic motoneuron disorders. in the latter case, smn induction in a mouse spinal muscular atrophy model was observed following intramuscular injection of a tetanus toxin c fragment plasmid [53]. artificially or spontaneously regenerating muscle fibers display a higher, but still limited, efficiency of transfection [54]. physical methods (electric or ultrasound pulses, ballistic gene gun), which either create transient pores in the cell membrane or increase passive diffusion, were shown to improve gene transfer to skeletal muscles up to 100-fold [55]. the pulse parameters and the type of material used (i.e., needle versus externally applied plate electrodes) are of critical importance [44]. selective electro-sonoporation in a defined area using microbubble contrast agents showed increased plasmid-vegf165 delivery in skeletal muscle, allowing therapeutic angiogenesis in chronically ischemic skeletal muscles with undetectable tissue damage [56]. a slightly higher risk of random integration of plasmid dna into genomic dna may also be seen [57]. still, only limited penetration of the genetic material into the tissue is obtained (in the range of ~1 cm). widespread delivery to large or deep muscles remains challenging. these methods induce muscle damage and inflammation [58], which peak at around 7 days and resolve at 3 weeks postinjection, with both th1 and th2 immune responses potentially occurring [44].
therefore, this strategy may not be suitable in already inflamed tissue such as dmd muscles. high levels of gene expression in the limb and diaphragm muscles have been achieved by the rapid injection of naked dna in large volumes via locoregional hydrodynamic intravascular delivery, with both blood inflow and outflow blocked surgically or using external tourniquets [59, 60]. the endothelium in muscle is continuous and non-fenestrated, showing low permeability to macromolecules, including plasmid dna. the hydrodynamic pressure induces extravasation of the injected dna, probably by expanding the endothelium and thereby making pores accessible for dna entry. the mechanism of plasmid dna uptake by the muscle cells is still not clear and may involve both low-affinity receptor-mediated and nonspecific processes [1, 61]. the safety of the procedure is supported by a large body of data collected in mice, rats, dogs, and nonhuman primates. the edema caused by the injected fluid resolves within 24 h, and even the minimal signs of observed muscle toxicity clear within 2 weeks postinjection [62, 63]. the hind limb perfusion procedure is a rather quick and simple technique, which may be applied to chronically diseased muscles [64] or other chronic diseases such as anemia [65]. based on successful preclinical studies using the mdx mouse and golden retriever muscular dystrophy (grmd) dog models of duchenne muscular dystrophy, and the positive outcome (expression, though very low, and safety) of a phase i trial of intramuscular injection of myodys®, a full-length dystrophin plasmid, in duchenne patients (the first completed gene transfer clinical trial in neuromuscular diseases) [66], the ground was set for a human clinical trial using myodys® delivered into the forearm of duchenne patients. a dose escalation study of single-limb perfusion with 0.9% saline was carried out in nine adults with muscular dystrophies under intravenous analgesia. the study, led by fan et al., demonstrated feasibility and safety up to 35% of limb volume in the upper extremities of young adults with muscular dystrophy. perfusion at 40% limb volume was associated with short-lived physiological changes in peripheral nerves without clinical correlates in one subject [67]. this study used lower cuff pressures than our nonhuman primate studies (310-325 mm hg vs. 450-700 mm hg in nonhuman primates) [68, 69]. from our studies in the mdx mouse and grmd dog models of duchenne dystrophy, and in nonhuman primates, the minimal volume needed for efficient naked dna limb perfusion is 40% of the limb volume [70]. whereas arterial limb perfusion did not turn out to be safe in grmd dogs (personal data not shown), up to ten consecutive naked dna limb perfusions every other day appeared very safe in both dystrophic mice and dogs. even though a head-to-head comparison would be necessary, our studies suggested that gene transfer was higher in diseased muscles than in wild-type animals. we also noticed that the highest transfection efficiencies were found in nonhuman primates; up to 40% of limb muscles expressed reporter genes following a single-limb perfusion [68]. therefore, limb perfusion of naked dna remains a valid approach to treat dystrophic limb muscles as an alternative to viral vectors in seropositive patients or in indications that require large transgenes with regional gene transfer [71]. ex vivo approaches using stem cells gene-corrected with non-viral vectors are also being explored. human artificial chromosome (hac) vectors have the capacity to carry large genomic loci and to replicate and segregate autonomously without integration into the host genome. hac vectors containing the entire human dystrophin gene (dys-hac) with its native regulatory elements allow dystrophin expression at levels similar to native dystrophin isoform expression levels.
since they can be stably maintained as episomal elements in host cells, dys-hac vectors could be introduced into several types of patient stem or progenitor cells for ex vivo therapy, e.g., induced pluripotent stem cells, mesoangioblasts, ac133 cells, and mesenchymal stem cells [72]. one of the main issues, however, is the translatability of stem cell therapy in muscle disorders [73, 74]. the development of successful non-viral gene delivery systems to skeletal muscle is highly dependent on the proportion of muscle cells (or their innervating motoneurons) that need to be transfected. more than 25 years of research and testing in animal models and in human trials steer us toward two types of muscle-directed non-viral gene transfer applications:
1. direct injection. this represents a far simpler but poorly efficient approach. provided highly active gene products are used, non-viral gene therapy becomes increasingly applicable to infectious diseases, cancer, and peripheral ischemia. vectors could be either naked dna or synthetic complexes.
2. intravascular delivery. simple intravenous perfusion of non-viral vectors is as of today far less practicable. regional hydrodynamic delivery of naked dna offers several advantages over viral vectors that hold potential for muscle diseases, including limb-girdle muscular dystrophies and peripheral neuropathies. nevertheless, muscle gene therapy using systemic administration of non-viral vectors still faces major hurdles that need to be overcome before any human applications.
disclosure: the author declares having no potential competing financial interests.
the mechanism of naked dna uptake and expression non-viral gene delivery in skeletal muscle: a protein factory synthetic vectors for gene delivery: an overview of their evolution depending on routes of administration solid lipid nanoparticles for applications in gene therapy: a review of the state of the art in vitro and in vivo delivery of intact bac dna-comparison of different methods cationic liposomes as non-viral carriers of gene medicines: resolved issues, open questions, and future promises muscular gene transfer using nonviral vectors nonviral gene therapy lipofection: a highly efficient, lipid-mediated dna-transfection procedure mechanism of oligonucleotide release from cationic liposomes cellular and molecular barriers to gene transfer by a cationic lipid nuclear import of plasmid dna in digitoninpermeabilized cells requires both cytoplasmic factors and specific dna sequences improvement of exogenous dna nuclear importation by nuclear localization signal-bearing vectors: a promising way for non-viral gene therapy? 
bioplex technology: novel synthetic gene delivery pharmaceutical based on peptides anchored to nucleic acids cationic transfection lipids key issues in non-viral gene delivery microcirculation in skeletal muscle hereditary inclusion body myopathy: single patient response to intravenous dosing of gne gene lipoplex efficient gene transfer from innervated muscle into rat peripheral and central nervous systems using a non-viral haemagglutinating virus of japan (hvj)-liposome method bdnf gene delivery mediated by neuron-targeted nanoparticles is neuroprotective in peripheral nerve injury promoter attenuation in gene therapy: interferon-gamma and tumor necrosis factor-alpha inhibit transgene expression efficient vaccination by intradermal or intramuscular inoculation of plasmid dna expressing hepatitis b surface antigen under desmin promoter/enhancer control long-term expression of a fluorescent reporter gene via direct injection of plasmid vector into mouse skeletal muscle: comparison of human creatine kinase and cmv promoter expression levels in vivo electrotransfer into skeletal muscle for protein expression adjuvant properties of cpg oligonucleotides in primates direct gene transfer into mouse muscle in vivo long-term persistence of plasmid dna and foreign gene expression in mouse muscle direct gene transfer into nonhuman primate myofibers in vivo delivery of dna into muscle for treating systemic diseases: advantages and challenges high-pressure transvenous perfusion of the upper extremity in human muscular dystrophy: a safety study with 0.9% saline evaluation of hydrodynamic limb vein injections in nonhuman primates dose response in rodents and nonhuman primates after hydrodynamic limb vein delivery of naked plasmid dna functional efficacy of dystrophin expression from plasmids delivered to mdx mice by hydrodynamic limb vein injection gene-based therapies of neuromuscular disorders: an update and the pivotal role of patient organizations in their discovery and 
implementation dna vaccines to attack cancer concise review: mesoangioblast and mesenchymal stem cell therapy for muscular dystrophy: progress, challenges, and future directions recent progress in satellite cell/myoblast engraftment-relevance for therapy phase i clinical evaluation of a six-plasmid multiclade hiv-1 dna candidate vaccine phase i clinical trial safety of dna-and modified virus ankara-vectored human immunodeficiency virus type 1 (hiv-1) vaccines administered alone and in a prime-boost regime to healthy hiv-1-uninfected volunteers comparative ability of plasmid il-12 and il-15 to enhance cellular and humoral immune responses elicited by a sivgag plasmid dna vaccine and alter disease progression following shiv(89.6p) challenge in rhesus macaques dna gag/adenovirus type 5 (ad5) gag and ad5 gag/ad5 gag vaccines induce distinct t-cell response profiles induction of hiv-specific functional immune responses by a multiclade hiv-1 dna vaccine candidate in healthy ugandans induction of multifunctional human immunodeficiency virus type 1 (hiv-1)-specific t cells capable of proliferation in healthy subjects by using a prime-boost regimen of dna-and modified vaccinia virus ankara-vectored vaccines expressing hiv-1 gag coupled to cd8+ t-cell epitopes safety and immunogenicity of cytotoxic t-lymphocyte poly-epitope, dna plasmid (ep hiv-1090) vaccine in healthy, human immunodeficiency virus type 1 (hiv-1)-uninfected adults vaccine research center 004 study team (2006) phase 1 safety and immunogenicity evaluation of a multiclade hiv-1 dna candidate vaccine clinical experience with plasmid dna-and modified vaccinia virus ankara-vectored human immunodeficiency virus type 1 clade a vaccine focusing on t-cell induction a randomized, placebo-controlled phase i trial of dna prime, recombinant fowlpox virus boost prophylactic vaccine for hiv-1 first human trial of a dna-based vaccine for treatment of human immunodeficiency virus type 1 infection: safety and host response 
ev02: a phase i trial to compare the safety and immunogenicity of hiv dna-c prime-nyvac-c boost to nyvac-c alone excellent safety and tolerability of the human immunodeficiency virus type 1 pga2/js2 plasmid dna priming vector vaccine in hiv type 1 uninfected adults a human immunodeficiency virus 1 (hiv-1) clade a vaccine in clinical trials: stimulation of hiv-specific t-cell responses by dna and recombinant modified vaccinia virus ankara (mva) vaccines in humans safety and immunogenicity of a gag-pol candidate hiv-1 dna vaccine administered by a needle-free device in hiv-1-seronegative subjects cross-subtype antibody and cellular immune responses induced by a polyvalent dna prime-protein boost hiv-1 vaccine in healthy human volunteers clinical phase 1 testing of the safety and immunogenicity of an epitopebased dna vaccine in human immunodeficiency virus type 1-infected subjects receiving highly active antiretroviral therapy protective efficacy of a recombinant dna vaccine against hepatitis b in male homosexuals: results at 36 months induction or expansion of t-cell responses by a hepatitis b dna vaccine administered to chronic hbv carriers strong hcv ns3/4a, ns4b, ns5a, ns5b-specific cellular immune responses induced in rhesus macaques by a novel hcv genotype 1a/1b consensus dna vaccine a dna vaccine for ebola virus is safe and immunogenic in a phase i clinical trial development of a preventive vaccine for ebola virus infection in primates toxicological safety evaluation of dna plasmid vaccines against hiv-1, ebola, severe acute respiratory syndrome, or west nile virus is similar despite differing plasmid backbones or gene-inserts the threat of avian influenza a (h5n1). 
part iv: development of vaccines safety, efficacy, and immunogenicity of vgx-3100, a therapeutic synthetic dna vaccine targeting human papillomavirus 16 and 18 e6 and e7 proteins for cervical intraepithelial neoplasia 2/3: a randomised, double-blind, placebo-controlled phase 2b trial safety of asp0113, a cytomegalovirus dna vaccine, in recipients undergoing allogeneic hematopoietic cell transplantation: an open-label phase 2 trial clinical development of a cytomegalovirus dna vaccine: from product concept to pivotal phase 3 trial safety and immunogenicity of a bivalent cytomegalovirus dna vaccine in healthy adult subjects safety, tolerability and humoral immune responses after intramuscular administration of a malaria dna vaccine to healthy adult volunteers clinical trial in healthy malaria-naive adults to evaluate the safety, tolerability, immunogenicity and efficacy of mustdo5, a five-gene, sporozoite/hepatic stage plasmodium falciparum dna vaccine combined with escalating dose human gm-csf dna immunization with dna coding for gp100 results in cd4 t-cell independent antitumor immunity inability to immunize patients with metastatic melanoma using plasmid dna encoding the gp100 melanoma-melanocyte antigen gene electrotransfer of plasmid antiangiogenic metargidin peptide (amep) in disseminated melanoma: safety and efficacy results of a phase i first-in-man study phase i study of a plasmid dna vaccine encoding mart-1 in patients with resected melanoma at risk for relapse safety and immunogenicity of tyrosinase dna vaccines in patients with melanoma generation of mammaglobin-a-specific cd4 t cells and identification of candidate cd4 epitopes for breast cancer vaccine strategies immunogenicity of a plasmid dna vaccine encoding chimeric idiotype in patients with b-cell lymphoma her2/neu dna vaccination for breast tumors a phase i trial of dna vaccination with a plasmid expressing prostate-specific antigen in patients with hormone-refractory prostate cancer taking 
electroporationbased delivery of dna vaccination into humans: a generic clinical protocol dna vaccines: an active immunization strategy for prostate cancer development of pro-angiogenic engineered transcription factors for the treatment of cardiovascular disease therapeutic angiogenesis with intramuscular nv1fgf improves amputation-free survival in patients with critical limb ischemia constitutive expression of phvegf165 after intramuscular gene transfer promotes collateral vessel development in patients with critical limb ischemia effect of fibroblast growth factor nv1fgf on amputation and death: a randomised placebo-controlled trial of gene therapy in critical limb ischaemia naked plasmid dna encoding fibroblast growth factor type 1 for the treatment of end-stage unreconstructible lower extremity ischemia: preliminary results of a phase i trial clinical safety and preliminary efficacy of plasmid pudk-hgf expressing human hepatocyte growth factor (hgf) in patients with critical limb ischemia plasma vascular endothelial growth factor (vegf) levels after intramuscular and intramyocardial gene transfer of vegf-1 plasmid dna vascular endothelial growth factor-induced angiogenic gene therapy in patients with peripheral artery disease treatment with intramuscular vascular endothelial growth factor gene compared with placebo for patients with diabetes mellitus and critical limb ischemia: a double-blind randomized trial basic fibroblast growth factor in patients with intermittent claudication: results of a phase i trial results of a double-blind, placebo-controlled study to assess the safety of intramuscular injection of hepatocyte growth factor plasmid to improve limb perfusion in patients with critical limb ischemia non-viral vectors for gene therapy: clinical trials in cardiovascular disease intramuscular vascular endothelial growth factor gene therapy in patients with chronic critical leg ischemia results from a phase ii multicenter, double-blind placebo-controlled 
study of del-1 (vlts-589) for intermittent claudication in subjects with peripheral arterial disease design of the del-1 for therapeutic angiogenesis trial (delta-1), a phase ii multicenter, double-blind, placebo-controlled trial of vlts-589 in subjects with intermittent claudication secondary to peripheral arterial disease treatment of thromboangiitis obliterans (buerger's disease) by intramuscular gene transfer of vascular endothelial growth factor: preliminary clinical results long-term follow-up evaluation of results from clinical trial using hepatocyte growth factor gene to treat severe peripheral arterial disease induction of antigen-specific tolerance in multiple sclerosis after immunization with dna encoding myelin basic protein in a randomized, placebo-controlled phase 1/2 trial double-blind, placebo-controlled study of hgf gene therapy in diabetic neuropathy vascular endothelial growth factor gene transfer for diabetic polyneuropathy: a randomized, double-blinded trial improvement in chronic ischemic neuropathy after intramuscular phvegf165 gene transfer in patients with critical limb ischemia
key: cord-020969-lh2ergpm authors: strauss, james h.; strauss, ellen g. title: gene therapy date: 2012-07-27 journal: viruses and human disease doi: 10.1016/b978-0-12-373741-0.50014-3 sha: doc_id: 20969 cord_uid: lh2ergpm
molecular genetic studies during the last decades have led to an enormous increase in our understanding of the molecular biology of the replication of viruses. the complete nucleotide sequences of many virus genomes have been determined. information on the origins required for the replication of these genomes, the promoters used to express the information within them, and the packaging signals required for packaging progeny genomes into virions have been established for many. the mechanisms by which viral mrnas are preferentially translated have been explored. 
together with methods for cloning and manipulating viral genomes, this information has made possible the use of viruses as vectors to express foreign genes. in principle, any virus can be used as a vector, and systems that use a very wide spectrum of virus vectors have been described. dna viruses were first developed as vectors, since it is possible to manipulate the entire genome in the case of smaller viruses, or to use homologous recombination to insert a gene of interest in the case of larger viruses. recent developments now make it possible to manipulate the entire genomes of even very large dna viruses as artificial chromosomes, which potentially makes the use of these large genomes for expression of foreign proteins even more appealing. when complete cdna clones of rna viruses were obtained, it became straightforward to rescue plus-strand viruses from clones because the viral rna itself is infectious, and many such viruses have been used to express proteins. the use of minus-strand rna viruses as vectors was delayed because the virion rna itself is not infectious, but recent developments have made it possible to rescue virus from cloned dna by using coexpression of the appropriate viral proteins in a transfected cell. as a consequence, minus-strand rna viruses have also joined the club of expression systems receiving intense study. retroviruses have also been widely used because of their capacity to integrate into a host chromosome and potentially express foreign proteins indefinitely. a sampling of expression systems and their uses is given here to illustrate the approaches that are being followed. every virus system has advantages and disadvantages as a vector, depending on its intended use. one of the more exciting uses has been the development of viruses as vectors for gene therapy, that is, to correct genetic defects in humans. 
in the most general sense, gene therapy involves transfer of genetic information into a cell, tissue, organ, or organism with the goal of improving the clinical outcome, either by curing a disease, or alleviating an underlying condition in a patient. although results have been disappointingly slow in coming, such systems offer great promise. this use represents an example of taking these infectious agents that have been the source of much human misery and developing them for the betterment of mankind. such expression systems have a wide variety of other potential uses, however. efforts to engineer viruses to kill cancer cells are also receiving attention. viruses that express foreign proteins have potential uses in the engineering of new vaccines against other pathogens. finally, viral expression systems have proved very useful in the expression of proteins in cell culture that can be used for various studies. a representative sampling of viruses that are being developed as vectors is described next in order to illustrate some of the strengths and weaknesses of the different systems. the viruses used in most clinical trials to date have been the poxviruses, the adenoviruses, and the retroviruses, and these are described here. several other virus systems that may be used in the future for treatment of humans, or that are useful for other purposes, are also described. vaccinia virus is a poxvirus with a large dsdna genome of 200 kb (chapter 7). until recently, this genome was too big to handle in one piece in a convenient fashion, and homologous recombination has been used to insert foreign genes into it. the large size of the viral genome, however, does mean that very large pieces of foreign dna can be inserted, while leaving the virus competent for independent replication and assembly. another advantage of the virus is that it has been used to vaccinate hundreds of millions of humans against smallpox. 
thus, there is much experience with the effects of the virus in humans. although the vaccine virus did cause serious side effects in a small fraction of vaccinees, highly attenuated strains of vaccinia have been developed for use in gene therapy by deleting specific genes associated with virulence. a new approach to the use of poxvirus vectors has been the development of nonhuman poxviruses, such as canarypox virus, as vectors. canarypox virus infection of mammals is abortive and essentially asymptomatic, but foreign genes incorporated into the canarypox virus genome are expressed in amounts that are sufficient to obtain an immunologic response. a variety of approaches have been used to obtain recombinant vaccinia viruses that express a gene of interest, but only the first such method to be used, and one that remains in wide use, is described here. this method is illustrated in fig. 11.1. the thymidine kinase (tk) gene of vaccinia virus is nonessential for growth of the virus in tissue culture. furthermore, deletion of the tk gene results in attenuation of the virus in humans, which is a desirable trait. finally, the tk gene can be either positively or negatively selected by using different media for propagation of the virus.
figure 11.1 construction of recombinant vaccinia virus expression vector. the foreign gene (red) is inserted into a bacterial plasmid adjacent to a vaccinia promoter (black arrow), flanked with sequences from the vaccinia thymidine kinase gene (turquoise). plasmid dna is transfected into cells infected with wild-type (tk+) vaccinia virus. recombinant progeny from homologous recombination are all tk−, due to interruption of the tk gene, and can be selected by growth in bromodeoxyuridine (budr), since incorporation of budr into tk+ vaccinia is lethal. these tk− vaccinia will infect normally and express the foreign gene under control of the vaccinia promoter. adapted from strauss and strauss (1997) figure 2.25 on p. 115.
the starting point is a plasmid clone that contains a copy of the tk gene that has a large internal deletion. in the region of the deletion a vaccinia virus promoter is inserted upstream of a polylinker. the gene of interest is inserted into the polylinker using standard cloning technology. thus, we have the foreign gene downstream of a vaccinia promoter, and the entire insert is flanked by sequences from the vaccinia tk gene. the plasmid containing the cloned tk gene with its foreign gene insert is transfected into cells that have been infected by wild-type vaccinia virus. homologous recombination between the tk gene in the virus and the tk-flanking sequences in the plasmid occurs with a sufficiently high frequency that a reasonable fraction of the progeny have the gene of interest incorporated. these viruses have an inactive tk gene (they are tk − ), because the tk gene has been replaced by the deleted version containing the inserted foreign gene. the next step, then, is to select for viruses that are tk − by growing virus in the presence of bromodeoxyuridine (budr). an active tk enzyme will phosphorylate budr to the monophosphate form, which can be further phosphorylated by cellular enzymes to the triphosphate and incorporated into the viral nucleic acid during replication. incorporation of budr is lethal under the appropriate conditions, and thus viruses that survive this treatment are those in which the tk gene has been inactivated. it is usually necessary to select among the surviving progeny for those that possess the gene of interest, because inactivation of the tk gene can occur spontaneously through deletion or mutation. selection can be accomplished by a plaque lift hybridization assay in which virus in plaques is transferred to filter paper. virus plaques on the filter paper are probed with radiolabeled hybridization probes specific for the inserted gene. virus in plaques that hybridize to the probe is recovered and further passaged. 
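the two-stage selection just described can be summarized as a small logic sketch. this is only an illustration of the reasoning, not a laboratory protocol: the class `Virus` and the functions `survives_budr`, `hybridizes_to_probe`, and `select_recombinants` are hypothetical names introduced here, and each progeny virus is reduced to two yes/no properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Virus:
    """Toy model of one progeny virus (hypothetical, for illustration only)."""
    name: str
    tk_active: bool   # True for tk+ (functional thymidine kinase)
    has_insert: bool  # True if the gene of interest was recombined in


def survives_budr(v: Virus) -> bool:
    # An active tk enzyme phosphorylates BUdR, and its incorporation is
    # lethal, so only tk- viruses survive growth in BUdR.
    return not v.tk_active


def hybridizes_to_probe(v: Virus) -> bool:
    # The plaque-lift probe is specific for the inserted gene.
    return v.has_insert


def select_recombinants(progeny):
    # Stage 1: BUdR selection removes tk+ virus.
    survivors = [v for v in progeny if survives_budr(v)]
    # Stage 2: hybridization screening removes spontaneous tk- mutants
    # that do not carry the gene of interest.
    return [v for v in survivors if hybridizes_to_probe(v)]


progeny = [
    Virus("wild type", tk_active=True, has_insert=False),
    Virus("recombinant", tk_active=False, has_insert=True),
    Virus("spontaneous tk- mutant", tk_active=False, has_insert=False),
]
selected = select_recombinants(progeny)
print([v.name for v in selected])
```

the spontaneous tk− mutant passes the budr step but fails the probe step, which is why the second screen is usually needed.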
in this way a pure virus stock that will express the gene of interest can be isolated. the herpesviruses also have a large dna genome that is capable of accommodating large inserts of foreign dna. hsv-1, in particular, has been studied as an expression vector. recombination has been used to insert foreign genes and to delete virus genes involved in lytic growth or toxicity. because hsv-1 is neurotropic, it has been considered as a possible vector for the control or eradication of neural cancers. the viral dna does not integrate, and the virus is capable of infecting nonreplicating neurons and being maintained in a latent state, properties that suggest it could be used for this purpose. it might also be used as an expression vector that could produce protein for long periods in neurons, and as such might be useful for the treatment of spinal nerve injury, for example, or for pain therapy. the baculoviruses are insect viruses that have a large dna genome capable of accommodating large dna inserts. foreign dna is inserted by recombination and selection of appropriate viruses. they have been widely used to express high levels of protein in eukaryotic cells (insect cells in this case) that can be used, for example, in crystallization trials to determine protein structure, or to produce protein for immunization of animals, and other uses. recent studies have suggested that baculoviruses might be useful for gene therapy in humans. the viruses will infect a number of human cells resulting in expression of proteins of interest, but the viruses are nonpathogenic in humans, suggesting a level of safety in their possible application. whether problems associated with these viruses, such as low levels of expression and the rejection of them by the immune system, can be overcome remains to be determined. adenovirus infections of humans are common and normally cause only mild symptoms. deletion of virulence genes from adenovirus vectors further attenuates these viruses. 
in addition, adenovirus vaccines have been used by the military for some years and, therefore, some experience has been gained in the experimental infection of humans by adenoviruses, although gene therapy trials use a different mode of delivery of adenovirus vectors. because of their apparent safety, adenoviruses have been developed for use as vectors in gene therapy trials or for vaccine purposes. two approaches have been used. in one, infectious adenoviruses have been produced that express a gene of interest. in the second approach, suicide vectors are produced that can infect a cell and express the gene of interest, but which are defective and cannot produce progeny virus. suicide vectors cannot spread to neighboring cells, and the infection is therefore limited in scope and in duration. the genome of adenoviruses is dsdna of 36 kbp (chapter 7). thus, the genome is smaller than that of poxviruses or other large dna viruses such as the herpesviruses and the baculoviruses and can accommodate correspondingly smaller inserts. however, inserts large enough for most applications can be accommodated. the genome is small enough that the virus can be reconstituted from dna clones. such an approach is inconvenient, however, and homologous recombination is often used to insert the gene of interest into the virus genome. the foreign gene is inserted into the region occupied by either the adenovirus e1 or e3 genes, one or both of which are deleted in the vector construct. virus lacking e1 cannot replicate, and such viruses form suicide vectors. for gene therapy, suicide vectors are normally used so as to prevent the spread of the infection. to prepare the stock of virus lacking e1, the virus must be grown in a cell line that expresses e1. an overview of this process is shown in fig. 11.2. the complementing cell line, which produces e1 constitutively, supplies the e1 needed for replication of the defective adenovirus. 
the cells are transfected with the defective adenovirus dna and a full yield of progeny virions results. the progeny virus is defective and cannot replicate in normal cells, but it can be amplified by infection of the complementing cell line. on introduction of the virus into a human, the virus will infect cells and express the foreign gene, but the infection is abortive and no progeny virus is formed. the stock of defective virus must be tested to ensure that no replication-competent virus is present, since such virus can arise by recombination between the vector and the e1 gene in the complementing cell line. adenoviruses with only e3 deleted are often used to express proteins for vaccine purposes. these e3-deleted viruses possess intact e1 and will replicate in cultured cells and in humans, but are attenuated. because the virus replicates, expression of the immunizing antigen persists for a long time and a good immune response usually results. the procedure for insertion of the gene of interest by homologous recombination resembles that used for the poxviruses. the gene is inserted into a plasmid containing flanking sequences from the e1 or e3 region, and transfected into cells infected with adenovirus. recombinant viruses containing the gene of interest are selected and stocks prepared. it is also possible to transfect cells with the e1 or e3 expression cassette together with dna clones encoding the rest of the adenovirus genome, in which case homologous recombination results in the production of virus. in the case of insertions into e1, cells that express e1 must be used to produce the recombinant virus. adeno-associated viruses (aavs) have a single-stranded dna genome of 4.7 kb (chapter 7). they normally require coinfection of a cell by a helper virus, usually an adenovirus or a herpesvirus. 
they are being developed as expression vectors because they are not pathogenic in humans and because they normally integrate into the host-cell genome in a specific region, thus minimizing the problems of insertional mutagenesis. the genome size is small enough to be readily manipulated as a dna clone, but the small size also limits the amount of dna that can be inserted and therefore the applicability of the virus for gene transfer experiments. a related problem is that for expression studies, the genome is normally deleted for the aav genes with only the ends that function as promoter sites retained. however, site-specific integration requires the activity of the rep protein. nonetheless, the system is sufficiently attractive that efforts to develop aav as a gene therapy vector continue and the virus has been used in a number of clinical trials, as described later. retrovirus-based expression systems offer great promise because the retroviral genome integrates into the host-cell chromosome during infection and, in the case of the simple retroviruses at least, remains there as a mendelian gene that is passed on to progeny cells on cell division. thus, there is the potential for permanent expression of the inserted gene of interest. the essential components of a retrovirus vector are the long terminal repeats (ltrs), the packaging sequences known as ψ, the primer-binding site, and the sequences required for jumping by the reverse transcriptase during reverse transcription to form the dsdna copy of the genome (chapter 6). the process of creating and packaging a retrovirus-based expression cassette is illustrated in fig. 11.3. a packaging cell line is created that expresses the retroviral gag, pol, and env genes, but whose mrnas do not contain the packaging signal and so cannot be packaged. the vector dna/rna is created by modifying a dna clone of a retrovirus to contain the gene of interest in place of the gag-pol-env genes. 
in the process, all of the essential cis-acting signals required for packaging, reverse transcription, and integration are retained. the foreign gene can be under the control of the ltrs, or it can be under the control of another promoter positioned in the insert upstream of it. the resulting dna clone is transfected into the packaging cell line, and a producer cell line isolated that expresses the vector dna as well as the helper dna. vector rna transcribed from the vector dna is packaged into retroviral particles, using the proteins expressed from the helper dna. these particles are infectious and can be used to infect other cells or to transfer genes into a human. on infection of cells by the packaged vector, the vector rna is reverse transcribed into dna that integrates into the host-cell chromosome, where it can be expressed under the control of the promoters that it contains. the limitation on the size of the insert is about 10 kb, the upper limit of rna size that can be packaged. although murine leukemia viruses are not known to cause disease in man, it has been found that these viruses will cause tumors in immunosuppressed subhuman primates. thus, it is thought to be essential that there be no replication-competent virus in stocks used to treat humans. replication-competent virus can arise during packaging of the vector by recombination between the vector and the retroviral sequences used to produce gag-pol-env. at the current time, preparations of packaged vectors are screened to ensure that replication-competent viruses are not present. efforts are being made to reduce the incidence of recombination during packaging in order to simplify the procedure. one approach is to develop vectors that have very little sequence in common with the helper sequences, in order to reduce the incidence of homologous recombination. a second approach is to separate the gag-pol sequences from the env sequences in the helper cell. 
in this case, recombination between three separate dna fragments in the producer cell (that encoding gag-pol, that encoding env, and sequences in the vector) is required in order to give rise to replication-competent retrovirus. in gene therapy trials that use retroviruses, it has been found that the expression of the foreign gene in humans is often downregulated after a period of months. attempts are being made to identify promoters that will not be downregulated. different promoters might be required for different uses, and promoters that target transcription to particular cell types would be useful. a major problem with retroviral vectors is that simple retroviruses will only infect dividing cells. although they enter cells and are reverse transcribed into dna, the dna copy of the genome can enter the nucleus only during cell division. in many gene therapy treatments, it is desirable to infect stem cells in order to maintain expression of the therapeutic gene indefinitely. because stem cells divide relatively infrequently, it is difficult to infect a high proportion of them by vectors used to date. attempts are being made to identify methods to stimulate stem cells to divide during ex vivo treatment, so that a larger fraction of them can be infected. a second approach is to develop lentivirus vectors. lentiviruses, which include hiv, can infect nonreplicating cells and could potentially infect nondividing stem cells during ex vivo treatment. lentivirus vectors could also be useful for therapy involving other nondividing cells, such as neurons. it would be of considerable utility to be able to target retroviruses to specific cells. one possible approach to this is to replace all or part of the external domains of the retroviral surface glycoprotein with a monoclonal antibody that is directed against an antigen expressed only on the target cells. 
in principle, this approach is feasible, but whether it can be developed into something practical is as yet an open question. if specific cells could be infected, it would allow protocols in which the therapeutic gene would be expressed only in cells where it would be most useful. it would also allow the specific killing of cells such as tumor cells or hiv-infected cells. for example, the retrovirus could express a gene that rendered the cell sensitive to toxic drugs such as budr. a retrovirus vector that expressed such a gene could also be useful for conventional gene therapy, because it would allow the infected cells to be killed if the infection process threatened to get out of hand. the genomes of plus-strand rna viruses are self-replicating molecules that replicate in the cytoplasm, and they can express very high levels of protein. these properties make them potentially valuable as expression vectors. the alphaviruses possess a genome of single-strand rna of about 12 kb (chapter 3). their genomes can be easily manipulated as cdna clones, and infectious rna can be transcribed from these clones by rna polymerases, either in vivo or in vitro. rna transcribed in vitro can be transfected into cells and give rise to a full yield of virus, whereas rna transcribed in vivo will begin to replicate and produce virus. the structural proteins are made from a subgenomic mrna, making it easy to insert a foreign gene under the control of the subgenomic promoter. two approaches that have been used are illustrated in fig. 11.4. in one approach, a second subgenomic promoter is inserted into the genome downstream of the structural proteins, or between the structural proteins and the nonstructural proteins (fig. 11.4c). two subgenomic mrnas are transcribed, one for the structural proteins and the second for the gene of interest. the size of the insert must be relatively small, on the order of 2000 nucleotides or less, because longer rnas are not packaged efficiently. 
however, this system has the advantage that the resulting double subgenomic virus is an infectious virus that can be propagated and maintained without helpers. a second approach is to delete the viral structural proteins and replace them with the gene of interest. in this case, there is room for an insert of about 5 kb that will still allow the resulting replicon to be packaged. the replicon is capable of independent replication, and transcription of a subgenomic messenger results in expression of the gene of interest. the replicon constitutes a suicide vector. it cannot be packaged unless the cells are coinfected with a helper to supply the structural proteins, or unless a packaging cell line that expresses the viral structural proteins is used. alphavirus replicons can be extremely efficient in expressing a foreign gene. in some cases as much as 25% of the protein of a cell can be converted to the foreign protein expressed by the replicon over a period of about 72 hours. wild-type replicons are cytolytic in vertebrate cells, inducing apoptosis, and the infection dies out. however, replicons have been produced with mutations in the replicase proteins that are not cytolytic and will produce the protein of interest indefinitely. thus, a wide spectrum of choices is available, and the system chosen can be adapted to the needs of a particular experiment or treatment. viral expression systems would be more useful if they could be directed to specific cell types. an approach that uses monoclonal antibodies to direct sindbis virus to specific cells has been described. protein a, produced by staphylococcus aureus, binds with high affinity to igg. it is an important component of the virulence of the bacterium because it interferes with the host immune system. the igg-binding domain of protein a has been inserted into one of the viral glycoproteins. virions containing this domain are unable to infect cells using the normal receptor. 
however, the virus will bind igg monoclonal antibodies. if an antibody directed against a cell surface component is bound, the virus will infect cells expressing this protein at the cell surface. thus, this system has the potential to direct the virus to a specific cell type. one of the advantages of this approach is that the virus, once made, can be used with many different antibodies and thus directed against a variety of cell types. this approach is potentially applicable to any enveloped virus, and perhaps to nonenveloped viruses as well. a modification of the alphavirus system is to use a dna construct containing the replicon downstream of a promoter for a cellular rna polymerase, rather than using packaged rna replicons. on transfection of a cell with the dna, the replicon rna is launched when it is transcribed from the dna by cellular enzymes. once produced, the rna replicates independently and produces the subgenomic mrna that is translated into the gene of interest. as described in chapter 10, naked dna can be used to transfect muscle cells and perhaps other cells. plus-strand viruses that do not produce subgenomic mrnas, such as the picornaviruses and flaviviruses, present different problems for development as vectors. the translated product from the gene of interest must either be incorporated into the polyprotein produced by the virus and provisions made for its excision, or tricks must be used to express the gene of interest independently. two approaches with poliovirus will be described as examples of how such viruses might be used as vectors. poliovirus replicons have been constructed by deleting the region encoding the structural proteins and replacing this sequence with that for a foreign gene. the foreign gene must be in phase with the remainder of the poliovirus polyprotein, and the cleavage site recognized by the viral 2a protease is used to excise the foreign protein from the polyprotein. 
because the poliovirus replicon lacks a full complement of the structural genes (it is a suicide vector), packaging to produce particles requires infection of a cell that expresses the polioviral structural proteins by some mechanism. a construct that uses this approach to express the cytokine tumor necrosis factor alpha (tnf-α) is illustrated in fig. 11.5.
figure 11.5 generation of poliovirus replicons for expression of foreign genes in motor neurons. based on an earlier construct to express interleukin-2 via a poliovirus replicon, the gene for wild-type murine tumor necrosis factor alpha (tnf-α) was positioned between the vp0 and 2a proteins of poliovirus, replacing vp3 and vp1. it was flanked on either side by sites for cleavage by the poliovirus 2a protease. these constructs were injected into transgenic mice expressing the poliovirus receptor, and expression of murine tnf-α was monitored. adapted from bledsoe et al. (2000).
an "infectious clone" in which a dna copy of the viral genome is positioned downstream of a promoter for t7 rna polymerase is modified by replacing the genes for vp3 and vp1 with the gene for tnf-α. recognition sites for the poliovirus 2a protease are positioned on both sides of the tnf-α gene. the tnf-α protein is produced as part of the poliovirus polyprotein, and cleaved from the polyprotein by the 2a protease. packaged replicons were used to infect transgenic mice that expressed the polio receptor (chapter 1). one of the interests of this system is that poliovirus exhibits an extraordinary tropism for motor neurons in the central nervous system (cns) (chapter 3). the packaged replicons, on introduction into the cns, infected only motor neurons, and therefore the foreign gene was expressed only in motor neurons. such replicons may be useful to treat cns diseases in which motor neurons are affected. 
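the requirement that a foreign gene be "in phase" with the rest of the poliovirus polyprotein can be made concrete with a small check. the following is a minimal sketch, not a description of how the constructs above were designed: the function name, the rule that the insert must carry no in-frame stop codon, and the toy sequences are all illustrative assumptions.

```python
# illustrative sketch only: the function name, the stop-codon rule, and the
# toy sequences below are assumptions, not taken from the constructs in the text.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def insert_is_in_phase(insert_cds: str) -> bool:
    """return True if an insert could be read through as part of a polyprotein.

    checks two things: the length is a whole number of codons, so the
    downstream viral proteins stay in the same reading frame, and no codon
    in that frame is a stop codon, so translation continues past the insert.
    """
    seq = insert_cds.upper()
    if not seq or len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return all(codon not in STOP_CODONS for codon in codons)


print(insert_is_in_phase("GCCAAAGGT"))  # three complete codons, no stop
print(insert_is_in_phase("GCCAAAGG"))   # one base short: the frame would shift
print(insert_is_in_phase("GCCTAAGGT"))  # contains an in-frame stop codon
```

a check of this kind covers only the reading frame; it says nothing about whether the flanking 2a cleavage sites are recognized or whether the replicon can still be packaged.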
a second approach to the use of poliovirus replicons is to use a second internal ribosome entry site (ires) (chapter 1) to initiate the synthesis of the nonstructural proteins. if the foreign gene replaces the structural genes, it will be translated from the 5′ end of the genome. if the poliovirus nonstructural genes are placed downstream of a second ires, internal initiation at this ires results in production of a polyprotein for the nonstructural proteins. this approach is similar to the approach shown in fig. 3 .3, where the structural proteins are replaced by a gene of interest. the coronaviruses have long been considered as potential candidates for experimental expression vectors or as candidate vaccines. the viruses have the largest nonsegmented rna viral genomes, up to 31 kb (see also chapter 3), which is both an advantage in that they could potentially accommodate a large amount of heterologous nucleic acid, and a disadvantage due to the difficulties of manipulating large rna molecules. the essential genes are arranged 5′-replicase-s-e-m-n-3′, and are interspersed with a number of nonessential genes which are group specific. recent studies have revealed a number of characteristics which make them even more attractive as vector candidates. the first was that the deletion of the nonessential genes is sufficiently attenuating that no further mutations in the essential genes are required to produce an avirulent virus. second, as the precise domains on the s protein which interact with the species-dependent cellular receptors were determined, it was found that both species and tissue specificity could be altered by relatively minor changes in the sequence of s. third, it is possible to rearrange the linear order of the genes, and while this altered the relative amounts of the products, it reduces the possibility that the vector (vaccine) could undergo recombination with field strains. 
fourth, it is possible to insert heterologous genes anywhere in the genome by simply incorporating a cassette comprised of the gene of interest preceded by the specific intergenic sequence of the parental virus. the generation of a recombinant mhv (mouse hepatitis virus) encoding renilla luciferase is shown in figure 11 .6 to illustrate the strategies employed. these include maintenance of a replication defective genome as a bacterial plasmid under the control of a t7 promoter, transcription of rna in vitro, electroporation of rna into feline cells previously infected with a murine coronavirus engineered to infect feline cells, in vivo recombination, and selection for recombinants on murine cells. due to the high frequency of recombination in coronaviruses, many recombinants are unstable and not suitable for use as vaccines. however, the location of the foreign gene within the genome, the particular coronavirus used, and the identity of the particular heterologous gene all have significant effects on stability, and coronaviruses may yet prove to be useful for targeted gene delivery. in minus-strand rna viruses, the genomic rna is not itself infectious. ribonucleoprotein containing the n, p, and l genes is required for replication of the viral rna, and thus for infectivity, and only recently have methods been devised to recover virus from cdna clones. a schematic diagram of how virus can be recovered from dna clones of the rhabdovirus vesicular stomatitis virus (vsv) (chapter 4) is shown in fig. 11 .7. a cell is transfected with a set of cdna clones that together express n, p, and l as well as the genomic or antigenomic rna. the antigenomic rna usually works better, probably because it does not hybridize to the mrnas being produced from the plasmids. encapsidation of the antigenomic rna by n, p, and l to form nucleocapsids allows it to replicate and produce genomic rna that is also encapsidated. 
synthesis of mrnas from the genomic rna, together with continued replication, results in a complete virus replication cycle and production of infectious progeny virus that have as their genome the rna supplied as a cdna clone. the yield of infectious virus is small, but sufficient to isolate individual plaques and thus obtain viruses from the cdna clones.
figure 11.6 construction of a coronavirus expression plasmid. an attenuated and nonreplicating plasmid from mhv is constructed in which the 5′ end of the orf 1 sequence is fused to the last 28 codons of he. in addition, all of the accessory protein genes are deleted, and the gene for renilla luciferase (rl) inserted. rna is transcribed in vitro from this plasmid using t7 polymerase and the rna electroporated into feline cells which have been infected with mhv virus in which the spike s protein has been mutated to recognize receptors on feline cells (fmhv). after 24 hours these cells are plated out onto a monolayer of murine cells to select for coronaviruses that have undergone recombination between the plasmid and the helper virus. recombinants are plaque-purified and stocks grown.
the ability to rescue virus from a cdna clone makes it possible to manipulate the viral genome. since the rhabdovirus genome is transcribed into multiple mrnas, one for each gene, and the transcription signals recognized by the enzyme are well understood, it is relatively simple to add or delete genes. a modified vsv that was produced by using dna clones is illustrated in fig. 11.8. in this vsv, the surface glycoprotein present on the vsv particle, called g, has been deleted and replaced with cd4, the cell surface protein that is used as a receptor by hiv. in addition, a new gene has been inserted, the gene encoding the hiv coreceptor cxcr4, so that the virus now contains six genes. the virions produced are unable to infect the cells normally infectable by vsv because they lack the g protein. 
however, because they contain the hiv receptor and coreceptor on their surface, they do infect cells that express the hiv glycoprotein on their surface, such as cells infected by hiv. since vsv is a lytic virus, the hiv-infected cells are killed. viruses have been widely used as vectors to express a variety of genes in cultured cells. this use is of long standing and has led to important results. of perhaps more interest are efforts to develop viruses as vectors for medical purposes. the manipulation of virus genomes to develop new vaccines is very promising. although no licensed human vaccines have been introduced using this technology, clinical trials are ongoing and it is to be expected that several such vaccines will be licensed in the near future. there is also expectation that viruses will be useful as vectors for gene therapy, and numerous clinical trials are taking place. the results to date have been disappointing, but the promise remains. figure 11.8 producing a mutant vsv targeted to kill hiv-infected cells. (a) genome of a rhabdovirus in which the glycoprotein g gene has been replaced with sequences encoding cd4 and cxcr4, the hiv primary receptor and coreceptor, and a schematic of a cdna clone containing the genome sequence (cdna copy of vrna). (b) a susceptible cell is infected with vaccinia virus expressing the t7 rna polymerase, and four separate plasmids: the mutant genome plasmid from which full-length vc (plus strand) rna is transcribed, and three individual plasmids expressing vsv n protein, p protein, and l protein, all under the control of t7 promoters. (c) plus-strand mutant vcrna is transcribed and encapsidated with n, p, and l. the rnp then replicates and both viral proteins and cd4 and cxcr4 are expressed from individual mrnas transcribed from genome sense rnps. virions bud from the cell and (d) cannot infect a new susceptible cell as before in fig. 11 .7, but can infect an hiv-infected cell expressing hiv env proteins on its surface. 
infection with vsv is cytolytic and the hiv-infected cell dies. adapted from conzelmann and meyers (1996) . the genome by virus vectors, often by recombinant vaccinia virus. these studies have resulted in an understanding of the two viral proteases within the hcv genome, the processing pathway through which the polyprotein translated from the genome is processed, the function of the viral ires, and the function of the viral replicase, among other results. the use of virus vectors means that such studies on hcv can be conveniently conducted in mammalian cells under conditions that are related to the natural growth cycle of the virus. norwalk virus is another virus for which there is no cell culture system. the virus can be grown only in human volunteers, again limiting the range of studies that can be done. virus particles isolated from the stools of infected volunteers are often degraded and difficult to purify to homogeneity. thus, structural studies of infectious virus have been limited. expression of cdna copies of the structural proteins of the virus in baculovirus vectors has allowed the production of large amounts of viral structural proteins that spontaneously assemble into virus-like particles. these virus-like particles have been studied by cryoelectron microscopy, and detailed information on the structure of the virus has been obtained in this way. baculoviruses are also widely used to prepare large amounts of protein for crystallographic studies. such studies require 20 mg or more of protein, and the baculovirus system can be used to prepare such quantities. an advantage of the system is that the protein is made in a eukaryotic cell, which can be important for obtaining the protein folded into its correct three-dimensional conformation. also of importance is the use of secretion sequences in the constructs that lead to the secretion of the protein from the infected cell, making it easier to purify the protein. 
even for viruses for which cell culture systems exist, the use of virus vectors that express to higher levels can be advantageous. there are cell culture systems in which rubella virus will grow and plaque, and there is a full-length cdna clone of rubella virus from which infectious rna can be recovered. however, the cell culture systems produce only low amounts of virus proteins, especially of the nonstructural proteins, and it has been difficult to study the expression and processing of the nonstructural polyprotein. expression of the nonstructural region of rubella virus in vaccinia virus vectors or in sindbis virus vectors has allowed the production of much larger quantities of the polyprotein precursor. this has been used to determine the processing pathways, the identification of the virus nonstructural protease, and the identification of the cleavage sites that are cleaved by this protease. as a final example, vaccinia virus vectors and sindbis virus vectors have been used to map t-cell epitopes for a number of viruses (chapter 10). for this, defined regions of a viral protein are expressed in order to determine whether a particular t-cell epitope lies within that region. much effort is being put into the development of viruses as agents to immunize against other infectious agents, including other viruses. such an approach has a number of advantages. there is a large body of experience in the use of attenuated or avirulent viruses as vaccines. many of these, such as vaccinia virus or the yellow fever 17d virus, both of which have been used to immunize many millions of people, can be potentially developed as vectors to express other antigens, such as those of hcv or hiv. use of a live virus as a vector to express antigens of other pathogens has many of the advantages of live virus vaccines. 
this includes the fact that only low initial doses are required, and therefore the expense of vaccine production may be less; that subsequent virus replication leads to the expression of large amounts of the antigen over an extended period of time, and the antigen folds in a more or less native conformation; and that a full range of immunity, including production of ctls as well as of humoral immunity, usually develops. no human vaccines have been licensed that use such recombinant viruses, but there are ongoing clinical trials of several potential vaccines. several trials of candidate vaccines against hiv have been conducted that use vaccinia virus or retrovirus vectors to express the hiv surface glycoprotein. these trials have been moderately successful in the sense that immune responses to hiv glycoprotein were obtained, but these immune responses were not particularly vigorous and it is not known if the immune response is protective. hiv is able to persist in infected patients despite a vigorous immune response, and sterilizing immunity might be required. further, the hiv surface glycoprotein is highly glycosylated and neutralizing antibodies are difficult to obtain. studies in monkeys with related vaccines against simian immunodeficiency virus have given mixed results. in most such trials, immune responses were generated, but these were not fully protective. one recent trial did generate a protective response, however, giving hope that continued efforts in this direction will ultimately work out. recent studies with anti-hiv drugs given very soon after infection found that limiting the replication of the virus early appears to allow the generation of a protective immune response in at least some patients. although such studies remain preliminary, they do suggest that a nonsterilizing immune response that restricts virus replication early might prove to be protective. other clinical trials have also tested poxviruses as vectors. 
vaccinia virus has been used in an attempt to immunize against epstein-barr virus, and canarypox virus has been used as a vector for potential immunization against rabies virus. although no licensed human vaccines use poxvirus vectors, veterinary vaccines that are based on poxvirus vectors are in use. one such vaccine consists of vaccinia virus that expresses the rabies surface glycoprotein. this vaccine has been used to immunize wildlife. the recombinant vaccinia viruses are spread in baits that are eaten by wild animals that serve as reservoirs of the virus, such as skunks, raccoons, foxes, and coyotes. this approach has been useful in limiting the spread of rabies in wildlife populations. other poxvirus-based vaccines include vaccinia virus vectors to protect cattle against vesicular stomatitis virus and rinderpest virus, and to immunize chickens against influenza virus; pigeonpox virus vectors to immunize chickens against newcastle disease virus; fowlpox virus vectors to immunize chickens against influenza, newcastle disease, and infectious bursal disease viruses; a capripox virus vector to immunize pigs against pseudorabies virus; and a canarypox virus vector used to immunize dogs against canine distemper virus. thus, it should be possible to develop human vaccines based on poxvirus vectors. in a quite different approach, clinical trials of a novel vaccine against japanese encephalitis (je) virus have been conducted. je is a scourge in parts of asia, causing a large number of deaths and neurological sequelae in people that survive the encephalitis (chapter 3). vaccines in widespread use are inactivated virus vaccines, and the difficulties in preparing the large amounts of material required and delivering it to large segments of the population are significant. an attenuated virus vaccine, sa14-14-2, has been prepared in china by passing the virus in cultured cells and in rodent tissues. 
this vaccine is safe but overattenuated, so that the effectiveness is only 80% after a single dose. in contrast, the yellow fever virus (yf) 17d vaccine has an effectiveness of virtually 100% after a single dose, and immunity is long lasting, probably lifelong.
figure 11.9 construction of yellow fever/japanese encephalitis chimeric viruses. panels: infectious 17d yellow fever cdna clone; japanese encephalitis (nakayama) cdna (virulent); japanese encephalitis (sa) cdna (human vaccine strain). starting with the full-length cdna clone for 17d yellow fever virus, a number of chimeric viruses were constructed in which the m and e proteins were replaced with those of different strains of japanese encephalitis virus. however, when c, m, and e of je were put into the yellow fever clone, no viable virus was obtained. both prme chimeras grew well in tissue culture, and were neutralized by anti-je antiserum. yf/je-s prm-e was attenuated, and did not kill adult mice by intracerebral inoculation, but yf/je-n prm-e was neurovirulent. prnt is the log reciprocal of the dilution yielding 50% plaque reduction neutralization, based on 100 pfu on llv-mk cells, using either yf or je hyperimmune ascitic fluid. adapted from chambers et al. (1999).
a candidate je vaccine has been developed that consists of the 17d strain of yf virus in which the prm and e genes have been replaced with those of je, as illustrated in fig. 11.9. four chimeric viruses were tested. the je structural proteins were taken from either the virulent nakayama strain or from the attenuated sa14-14-2 strain. in both cases, chimeras containing all three structural proteins from je were tested as well as chimeras that contained only prm and e from je. chimeras containing c, prm, and e from je were not viable, whereas chimeras containing only prm and e from je were viable and grew well in culture (fig. 11.9). the viable chimeras were first tested in mice. 
the chimera containing the nakayama strain proteins caused lethal encephalitis in mice, as does the yf 17d virus (even though it is safe for use in humans). however, the chimera containing prm and e from the attenuated je strain was fully attenuated in mice and did not cause illness. the fully attenuated chimera was chosen for testing in monkeys, and was found to be safe and to protect monkeys against challenge with je virus. clinical trials of this candidate vaccine have taken place in humans. the vaccine appears to be safe and more effective than the je vaccines now in use. furthermore, this approach is applicable to other flaviviruses, such as the dengue viruses, for which no licensed vaccines exist, or west nile virus, which spread recently to the americas where it caused a number of fatal cases of human encephalitis. recombinant yf 17d expressing the prm and e proteins of all four serotypes of dengue viruses and recombinant viruses expressing prm and e of west nile virus are also in clinical trials with encouraging results. yet another possible approach to developing new generations of vaccines using the power of biotechnology is to attenuate a virus by making changes in the laboratory that are expected to cripple the virus. such an approach can be used with virtually any virus. a candidate vaccine strain of dengue virus has been constructed by making deletions in the 3′ nontranslated region of the genome that attenuate the virus, and such viruses are being tested in early trials. a number of genetic diseases result from the failure to produce a specific protein, usually due to a single defective gene. one of the more exciting possible uses for virus vectors is for the expression of a missing protein as a cure for the genetic defect associated with its absence. some of these "monogenic diseases" that might be curable through the use of gene therapy are listed in table 11 .1. 
for successful treatment, expression of the missing protein must be long-term and preferably lifelong, the levels of protein produced must be sufficient to alleviate the symptoms of the disease, the protein must be expressed in or translocated to those cells that require the normal protein for function, and infection with the virus vector must be free of disease symptoms. because of the requirement for long-term expression, viruses whose dna integrates into the host chromosome, such as the simple retroviruses as well as the lentiviruses, and adeno-associated viruses, offer the most promising system for many diseases. to date, several hundred patients have been treated with vectors based on moloney murine leukemia virus in clinical trials. clinical trials have also been conducted that use adenovirus, adeno-associated virus, poxvirus, and herpesvirus vectors (fig. 11.10). clinical trials in humans, which require extensive prior testing in animals, are divided into three phases. phase i involves relatively few, usually healthy individuals. the objective of phase i trials is to test the safety of a vaccine or treatment as well as the dosage that is tolerated, and the individuals are closely monitored during the trial. phase ii trials involve more individuals and test the efficacy of the treatment, and patients are again closely monitored. if a treatment passes both of these tests, phase iii trials can begin in which thousands of individuals are treated to test the efficacy of treatment. virtually all of the clinical trials in gene therapy conducted to date are phase i or phase ii; fewer than 2% of trials have progressed to phase iii (fig. 11.11), and no gene therapy treatments have been licensed to date. in addition to the possible treatment of genetic defects, virus vectors may also be useful for the treatment of a number of acquired diseases. 
these include cancer, hiv infection, parkinson's disease, injuries to the spinal cord, and vascular diseases such as restenosis and arteriosclerosis. a partial listing of acquired diseases that have been suggested as candidates for gene therapy is given in table 11 .2, and the number of trials for a number of different conditions is shown in fig. 11 .12. despite the large efforts to use gene therapy in clinical settings, the progress has been disappointingly slow, and many of the trials have been aborted due to unforeseen adverse consequences. nevertheless, as infectious clones of viruses continue to be developed, a large body of research is being devoted to construction of vectors, especially now to second and third generation vectors, as the problems associated with the initial systems are becoming clear. a comparison of various virus systems that are being considered for gene therapy is shown in tables 11.3 and 11.4. naked dna has also been used in a recent trial for coronary artery disease, and the properties of this system are included in table 11 .4. most of the modern vectors have had more and more of the dispensable viral genes deleted. deletion of these genes reduces pathogenicity, and prevents the production of immunogenic viral antigens. often only the gene of interest and the viral transcriptional regulatory elements are left, and to prepare the vectors for use in trials, all other functions must be supplied by a helper virus or a packaging cell line. another advantage of such stripped down vectors is the fact that it is improbable that the vector can recombine with wild-type viruses, either exogenous or endogenous, to cause disease. a partial listing of clinical trials that attempt to treat several different genetic defects by using virus vectors to deliver specific genes is given in table 11 .5. 
there have been few successes to date and the table is more of a compendium of the variety of genes and diseases, as well as the variety of delivery schemes, that are being examined. also included in the table is an impending trial for the vaccination of humans against hpiv-3 using a bovine virus. retroviruses have been used in a number of clinical trials to genetically tag cells. although this use does not fall within the narrow definition of gene therapy, it does provide background experience in the use of retrovirus vectors in humans. one such use has been in bone marrow transplantation for leukemia. severe forms of leukemia can sometimes be treated by ablation of the hematopoietic system with chemotherapy and/or x-rays in order to kill all tumor cells, followed by reconstitution of the system by transplantation of bone marrow from a compatible donor. although often successful, the leukemia sometimes recurs and it is desirable to know whether it recurs because of incomplete destruction of the patient's leukemic cells or whether the donor cells are the source of the leukemia. experiments in which the donor cells have been tagged using retroviruses that express a marker gene have been used to answer this question, which is important for the design of transplantation protocols. patients who lack the enzyme adenosine deaminase (ada) will die early in life unless treated. lack of ada results in the failure to clear adenosine from the body and, consequently, the accumulation of adenosine in cells throughout the body. adenosine is toxic at high concentrations, producing a variety of symptoms. the most serious symptom results from the extreme sensitivity of t cells to elevated adenosine concentrations. loss of t cells results in scid, severe combined immunodeficiency. both ctl responses (which are t-cell based) and humoral responses (which require t-helper cells) are impaired. 
people with scid syndrome are unable to mount an immunologic response to infectious agents, and scid is invariably fatal early in life unless treated in some way. ada deficiency accounts for about 25% of scid syndromes in humans.
table source notes: somia (1997), jolly (1994), and bukreyev et al. (2006); verma and weitzman (2005), jolly (1994), boulaiz et al. (2005), hu (2006).
scid can be treated by bone marrow transplantation if a suitable donor can be found. in the case of scid due to ada deficiency, weekly or twice weekly injections of ada mixed with polyethylene glycol (peg) have been used to successfully treat about 60 patients in whom bone marrow transplantation cannot be used because of the lack of compatible donors. of these, about 10 patients have also been treated with retroviral vectors that express ada. in these experiments, t cells were taken from the patient (or in the case of three newborns, umbilical cord cells were used), infected ex vivo with the retrovirus vector using a number of different cell culture and infection protocols, and the cells reinfused into the patient. many of the patients continue to produce ada from the vector several years after treatment. however, all of the patients continue to receive ada-peg injections, which are known to be an effective treatment. although some patients who have received retroviral therapy have been partially weaned from the supplementary ada-peg, it appears that some of these, and perhaps all, do not produce enough ada to be cured. thus, although no cures were effected in these early trials, the results were encouraging and suggested that future protocols might be more successful. two areas of retroviral therapy that needed improvement were the efficiency with which stem cells are infected and the need to prevent the retroviral promoter from being downregulated. a more recent trial involving two scid-ada patients in italy was more successful (table 11.5). 
two infants for whom no compatible donor existed and for whom no peg-ada treatment was available were treated with improved retroviral therapy. both patients developed functional immune systems and no adverse events have been reported. the improved results appear to arise from improved protocols as well as to selection in the patient for lymphoid progenitor cells that expressed adequate amounts of ada. scid disease can also be caused by a failure to produce the receptor for the cytokine interleukin-2 (chapter 10). thirteen scid patients with this deficiency were treated in two different clinical trials with retroviruses that expressed the defective gene (table 11 .5). the results illustrate the highs and the lows of gene therapy trials. all 13 patients developed functional immune systems, and the trials at first appeared to be a complete success. however, three of the patients later developed t-cell leukemia and one has died of the leukemia. the leukemia was at first suspected to arise from insertional mutagenesis, a chronic worry with vectors that insert into the host dna, and many gene therapy trials that used retroviral vectors were suspended. recent studies indicate that the disease is not due to insertional mutagenesis, however, but rather due to the oncogenic potential of the il2rg gene itself, as studies have shown that overexpression of this gene in mice results in leukemia. cystic fibrosis results from loss of the cystic fibrosis transmembrane conductance regulator (cftr), which regulates epithelial transport of ions and water. although lack of this protein results in damage to the epithelium in many parts of the body, the most serious manifestation is lung disease accompanied by chronic bacterial infection of the airways. clinical trials using adenoviral vectors, which infect respiratory epithelium, to express cftr in the lungs have been conducted. 
the first such studies were encouraging, but a more recent trial that was carefully controlled found no relief of symptoms. inflammation produced by the high doses of adenovirus used in trials is also a problem. it is difficult to get efficient delivery to the lung, especially through the thick mucus that is a characteristic of cystic fibrosis. in addition, the expression needs to continue for the life of the patient, which means either a very stable (integrated?) gene being expressed, or a system of repeated administration of vector that can be tolerated without immunologic consequences. lentiviruses have been proposed as an attractive vector, but there have been concerns about probable pathogenesis due to the lentivirus itself. however, for this disease, the most promising mode of gene delivery so far developed has been dna compacted into nanoparticles with polycations, in particular peg-polylysine, nicknamed "polyplexes." a clinical trial in humans used cftr polyplexes, and the phase i trial showed no adverse effects. for this disease a nonviral approach may well be the best solution. duchenne muscular dystrophy (dmd) is a severe muscle wasting disorder due to the lack of functional dystrophin protein. it occurs in 1/3500 male births. there are a number of difficulties in attempting to cure dmd with gene therapy, including the size of the protein, which is encoded in a cdna of 11 kb, and the need to deliver the vector to a large proportion of the body mass, that is, to all of the striated muscles and cardiac muscle. several approaches have been tried, and the first human clinical phase i trial has just been completed (see table 11.5). in this study, a plasmid containing the entire gene under the control of a cmv promoter was injected intramuscularly. 
although only low levels of dystrophin expression were observed in 6 out of 9 patients, there was no evidence for adverse reactions and the trial was considered a success. a second approach has been to attempt to introduce the gene in an aav vector, especially in one of the many human isolates to which most of the population show no preexisting immunity, or into a nonhuman aav. however, here the size of the gene is a problem, but it has been shown that the dystrophin protein contains a large number of repeated elements ( fig. 11 .13) and that "mini-dystrophin" and "microdystrophin" are functionally active in the mouse model for the disease, the mdx mouse. low-pressure intravenous injection of aav6 expressing micro-dystrophin into a mouse could transfect 90% of muscle, but high titers of the aav6 vector were required. a phase i/ii clinical trial has been initiated for delivery of micro-dystrophin in aav under the control of a cmv promoter into human patients, but there are no results as yet. a third approach attacks the specific nature of the genetic defect. it has been shown that 75% of dmd is caused by a frameshift mutation in one of the exons, such that no dystrophin is produced. since severely truncated dystrophin (like micro-dystrophin) can be functional (fig. 11.13 ), therapy to get rid of the exon causing the problem is being developed. modified antisense oligonucleotides (aos) can be used to alter the splicing pattern of the gene such that the exon in which the frameshift occurs is skipped, thereby restoring the reading frame. following success of an ao with a 2′o-methyl-phosphorothiolate backbone (2ome aos) to restore function in the mdx mouse, two clinical phase i trials have been initiated, one in the netherlands using 2ome aos and one in great britain using morpholino aos. since both are targeting the exclusion of exon 51, it will be possible to directly compare the two chemically modified aos. 
rheumatoid arthritis is a chronic, progressive inflammatory disease of the joints. an estimated 5 million people in the united states suffer from it. there is no cure. drug therapies are used that ameliorate the symptoms, but most of these drugs have side effects and cannot be taken indefinitely. if the disease progresses far enough, joint replacement may be required. the disease is associated with the release of inflammatory cytokines in the affected joints. clinical trials have started that use retroviruses to deliver the gene for an anti-arthritic cytokine to the joints. the gene encodes the interleukin (il)-1 receptor antagonist, which inhibits the biological actions of both il-1α and il-1β. it is hoped that such treatment might damp out the disease or at least keep it from progressing.
[figure caption fragment: (b) mini-dystrophin from a duchenne muscular dystrophy patient who was only mildly impaired. (c) micro-dystrophin engineered for delivery in aav vectors. adapted from figure 1 in foster et al. (2006).]
patients who have deficiencies in enzymes that participate in the urea cycle have increased concentrations of ammonia in the blood. high concentrations of ammonia result in various symptoms, which can include behavioral disturbances or coma. severe deficiencies in these enzymes result in early death, but moderate deficiencies can result in delayed appearance of symptoms and may be partially controlled by diet. one such enzyme is ornithine transcarbamylase (otc), whose gene is found on the x chromosome. deficiencies in otc are therefore more common in males than in females. gene therapy trials that use virus vectors recently received a major setback when a relatively fit 18-year-old male with an inherited deficiency for otc died 4 days after an adenovirus vector was injected into his liver. a high dose of adenovirus (4 × 10^10) that expressed otc was injected in an effort to achieve adequate levels of enzyme production. 
the virus unexpectedly spread widely and a systemic inflammatory response developed, inducing a fever of 40.3°c. he went into a coma, his lungs filled with fluid, and he died of asphyxiation. this unfortunate result makes clear the possible drawbacks to experimental treatments and the difficulties in designing protocols that allow an adequate margin of safety while trying to achieve a clinically relevant result. a gene therapy trial in patients with heart disease gave very encouraging results. although this study did not involve virus vectors, a brief description will be given since it serves as an incentive for continuation of gene therapy trials. coronary artery disease is common in older people. angioplasty or bypass surgery is used to open clogged arteries, but in many patients the arteries close up again (a process called restenosis). thirteen patients with chronic chest pain who had failed angioplasty or bypass surgery or both were injected in the heart muscle with dna encoding vascular endothelial growth factor. this factor promotes the growth of blood vessels, a process called angiogenesis. two months after treatment, all patients exhibited an improvement in vascularization of damaged areas of the heart, as shown by imaging and mapping studies. all patients reported a decrease in disease symptoms, and all had an improved performance in treadmill tests. although the number of patients is small, the uniformly positive results are encouraging. a large number of clinical trials have examined the possibility of using viruses as anticancer agents. table 11.6 lists a number of trials that were active in the year 2000, to give a flavor of what is being tried and the number of malignancies that are being considered as candidates for treatment using gene therapy approaches. these trials are all phase i or i/ii, but more than 1000 patients were participating in the trials listed in the table. as shown in fig. 
11.12, the development of gene therapy approaches for the control of cancers continues to attract much effort, such that the majority of clinical trials to date have been directed against cancers. although progress has been painfully slow and disappointing, the prospect remains that effective treatment may yet be obtained for at least some cancers using such approaches. in most of the trials in table 11.6, viruses are used to express proteins that control the growth of tumors or that are toxic to tumor cells. a number of different cytokines are being tried as antitumor agents, such as ifn-γ, il-2, tnf, and gm-csf. another approach is to try to repair the defective regulatory gene in the tumor cell, which is often p53. many other gene products are also being tested. the viruses used to express these products include the retroviruses, the adenoviruses, and the poxviruses. more recent trials have used additional viruses as well, in particular the herpesviruses and the adeno-associated viruses. herpes simplex type 1 would seem to be particularly appropriate for control of brain tumors, because the virus is neurotropic but sets up a latent infection in neurons. one idea would be to engineer herpes to express a protein that is toxic only in dividing cells and nontoxic in mature, nondividing neurons. the table also lists a number of trials that use lipofection to introduce the gene of interest into target cells. further afield, thought is being given to the possibility of using viruses to express proteins that are overexpressed in tumor cells in an attempt to stimulate the immune system to respond by killing tumor cells. this is in essence an attempt to vaccinate a person against a tumor. for this approach to succeed, an antigen overproduced by a tumor cell, such as a melanoma cell, must be identified and inserted into a suitable vector, and the person with the tumor must be infected with the virus vector in an attempt to stimulate the immune system. 
in principle, this approach may be feasible, but only time will tell whether it is in fact practical. another approach is to try to direct the virus, more or less specifically, to infect the tumor cells, so that upon infection the cells are killed. cell death might result either because the virus itself is cytolytic or because the virus expresses a protein that renders the cell sensitive to a toxic agent such as budr. a number of the trials listed in table 11 .6 use the tk gene for this, since cells that express tk are sensitive to budr. one possible approach is to engineer the virus so that its surface glycoprotein expresses a monoclonal antibody directed against an antigen expressed only on the tumor cell, while at the same time causing the virus to be unable to infect cells that do not express the antigen. experiments have established the possibility of this approach, at least in principle, with viruses such as the alphaviruses. another approach was illustrated by the experiments with vsv to design a virus that could infect only hiv-infected cells. the possible use of herpesviruses to control brain tumors, especially gliomal tumors, has been cited earlier. simple retroviruses are also being examined for this purpose. these viruses can only replicate in dividing cells. thus, they should be able to infect only tumor cells in the brain, since most neuronal cells are terminally differentiated and do not divide. if the retroviruses express a protein that renders the cells sensitive to a toxin, it might be possible to kill replicating cells and therefore only the tumor cells. 
a abbreviations: ifnγ, interferon gamma; il-2, interleukin 2; tk, thymidine kinase (sometimes used with bromodeoxyuridine); hsv-tk, herpes simplex thymidine kinase, often coupled with ganciclovir treatment; brca-1, breast cancer 1, early onset; psa, puromycin-sensitive aminopeptidase; cea, carcinoembryonic antigen; gm-csf, granulocyte-macrophage colony stimulating factor; mdr-1, multi-drug resistance protein 1 (used to insert chemotherapy-resistance genes into the hematopoietic lineage); cd80, protein involved in t-cell activation; cd, cytosine deaminase; tnf, tumor necrosis factor.
b definitions of phases in a clinical trial: phase i trials usually involve fewer than 100 healthy volunteers, primarily to gauge adverse reactions and to determine optimal dose and best route of administration. phase ii trials are generally pilot efficacy studies, usually involving 200-500 volunteers randomly assigned to control and study groups. phase ii will test for immunogenicity in the case of vaccines, and duration of expression and amelioration of symptoms for gene therapy. note that none of these trials have proceeded beyond phase ii, and most are in phase i.
source: wiley journal of gene medicine/clinical trial database at: http://www.wiley.co.uk/genetherapy/clinical/.
infectious clones
a stable fulllength yellow fever virus cdna clone and the role of conserved rna elements in flavivirus replication rescue of a segmented negativestrand rna virus entirely from cloned complementary dnas genetic engineering of animal rna viruses cloning the vaccinia virus genome as a bacterial artificial chromosome in escherichia coli and recovery of infectious virus in mammalian cells reverse genetics system for introduction of site-specific mutations into the double-stranded rna genome of infectious rotavirus reverse genetics systems for the generation of segmented negative-sense rna viruses entirely from cloned dna production of infectious rna transcripts from sindbis virus cdna clones: mapping of lethal mutations, rescue of a temperature sensitive marker, and in vitro mutagenesis to generate defined mutants infectious rabies viruses from cloned cdna infectious cdna clone of the epidemic west nile virus from new york city feline calicivirus: recovery of wildtype and recombinant viruses after transfection of crna or cdna constructs nonviral vectors for gene therapy non-viral and viral vectors for gene therapy dna vaccines as cancer treatment construction and characterization of encapsilated poliovirus replicons that express biologically active murine interleukin-2 cytokine production in motor neurons by poliovirus replicon vector gene delivery nonsegmented negative-strand viruses as vaccine vectors coronaviruses as vectors: stability of foreign gene expression development of bovine herpesvirus 4 as an expression vector using bacterial artificial chromosome cloning vaccinia virus: a selectable eukaryotic cloning and expression vector chimeric yellow fever virus 17d-japanese encephalitis virus vaccine: dose-response effectiveness and extended safety testing in rhesus monkeys evaluation of attenuation, immunogenicity and efficacy of a bovine parainfluenza virus type 3 (piv-3) vaccine and a recombinant chimeric bovine/human piv-3 vaccine vector in rhesus monkeys an 
alphavirus replicon particle chimera derived from venezuelan equine encephalitis and sindbis viruses is a potent gene-based vaccine delivery vector restricting expression prolongs expression of foreign genes introduced into animals by retroviruses recombinant measles viruses efficiently entering cells through targeted receptors virus vectors and their applications review: episomal vectors for gene expression in mammalian cells gene therapy vectors based on adeno-associated virus: characteristics and applications to acquired and inherited diseases transfer of genes to humans: early lessons and obstacles to success severe combined immunodeficiency: a model disease for molecular immunology and therapy gene therapy progress and prospects: duchenne muscular dystrophy baculovirus vectors for gene therapy enhanced functional recovery from spinal cord injury following intrathecal or intramuscular administration of poliovirus replicons encoding il-10 targeting sindbis virus-based vectors to fc receptor-positive cell types t lymphocytes with a normal ada gene accumulate after transplantation of transduced autologous umbilical cord blood cd34 + cells in ada-deficient scid neonates gene therapy progress and prospects-vectorology: design and production of expression cassettes in aav vectors functional lentiviral vectors for xeroderma pigmentosum gene therapy gene therapy in the cornea a look to future directions in gene therapy research for monogenic diseases update on adenovirus and its vectors herpes simplex virus 1 (hsv-1) for cancer treatment virus vectors and their applications the use of oncolytic vaccinia viruses in the treatment of cancer: a new role for an old ally? 
curr gene therapy: twenty-first century medicine viral vectors for gene transfer: a review of their use in the treatment of human diseases gene therapy: therapeutic gene causing lymphoma current prospects for gene therapy of cystic fibrosis key: cord-005089-jwcmmfdw authors: zhao, yin-he; möller, michael; yang, jun-bo; liu, ting-song; zhao, jin-feng; dong, li-na; zhang, jin-peng; li, cheng-yun; wang, guo-ying; li, de-zhu title: extended expression of b-class mads-box genes in the paleoherb asarum caudigerum date: 2009-11-11 journal: planta doi: 10.1007/s00425-009-1048-6 sha: doc_id: 5089 cord_uid: jwcmmfdw asarum caudigerum (aristolochiaceae) is a paleoherb species that is important for research in origin and evolution of angiosperm flowers due to its basal position in the angiosperm phylogeny. in this study, a subtracted floral cdna library from floral buds of a. caudigerum was constructed and cdna arrays by suppression subtractive hybridization were generated. cdnas of floral buds at different stages before flower opening and of leaves at the seedling stage were used. the macroarray analyses of expression profiles of isolated floral genes showed that 157 genes out of the 612 unique ests tested revealed higher transcript abundance in the floral buds and uppermost leaves. among them, 78 genes were determined to be differentially expressed in the perianth, 62 in the stamens, and 100 genes in the carpels. quantitative real-time pcr of selected genes validated the macroarray results. remarkably, apetala3 (ap3) b-class genes isolated from a. caudigerum were upregulated in the perianth, stamens and carpels, implying that the expression domain of b-class genes in this basal angiosperm was broader than those in their eudicot counterparts. electronic supplementary material: the online version of this article (doi:10.1007/s00425-009-1048-6) contains supplementary material, which is available to authorized users. 
typical angiosperm flowers comprise concentric whorls of four types of organs, arranged from the outside in: sepals, petals, stamens, and carpels. all of these organs are interpreted as modified leaves, as goethe proposed 200 years ago (pelaz et al. 2001; ditta et al. 2004). flowers develop under the control of homeotic genes, many of them from the mads-box gene family. they encode transcription factors that play crucial roles in the development of floral primordia and the establishment of floral organ identities (weigel 1998; theissen et al. 2000; theissen 2001; becker and theissen 2003). the genetic regulation of floral organ formation in typical eudicot flowers, such as arabidopsis thaliana or antirrhinum majus, was initially described with the abc model for the specification of floral organ identities (coen and meyerowitz 1991). since then, new classes of genes have been identified, such as d-function genes thought to be involved in ovule identity (angenent et al. 1995). the four class e sepallata genes encode proteins that are apparently required in floral organ identity determination (pelaz et al. 2000, 2001; ferrario et al. 2003; ditta et al. 2004). the products of the a, b, c and e genes act in a combinatorial manner to achieve three types of activity. in the outermost whorl, a and e genes function to control sepal identity. in the second whorl, a, b, and e genes together specify petal identity. in the third whorl, b, c, and e genes act in concert to direct stamen identity, while in the innermost whorl, c and e genes determine carpel identity (coen and meyerowitz 1991; honma and goto 2001; pelaz et al. 2000, 2001; theissen 2001; ditta et al. 2004). 
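the combinatorial rule just described can be written down as a small lookup. the following is only an illustrative sketch: the names `ORGAN_BY_CLASSES`, `organ_identity` and `WHORLS` are hypothetical and do not come from the study, and the mapping encodes nothing beyond the four whorl rules stated above.

```python
# Illustrative encoding of the combinatorial (ABCE) rule described in the text.
# All names here are hypothetical; they are not taken from the paper.
ORGAN_BY_CLASSES = {
    frozenset("AE"): "sepal",    # outermost whorl: A + E
    frozenset("ABE"): "petal",   # second whorl: A + B + E
    frozenset("BCE"): "stamen",  # third whorl: B + C + E
    frozenset("CE"): "carpel",   # innermost whorl: C + E
}


def organ_identity(active_classes):
    """Return the organ specified by a set of active gene classes, or None."""
    return ORGAN_BY_CLASSES.get(frozenset(active_classes))


# Gene classes active in each whorl, listed from the outside in.
WHORLS = [("A", "E"), ("A", "B", "E"), ("B", "C", "E"), ("C", "E")]
flower = [organ_identity(w) for w in WHORLS]
print(flower)
```

combinations not covered by the four rules return `None` here; the sketch makes no claim about what such combinations would produce in a real flower.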
the functions of a and c genes are considered mutually antagonistic and the expression of one represses that of the other (coen and meyerowitz 1991; drews et al. 1991; egea-cortines et al. 1999; pelaz et al. 2000; theissen 2001). in higher eudicots, b-class genes are represented by homologs of the arabidopsis thaliana genes apetala3 (ap3) and pistillata (pi), which control petal and stamen identity in the second and third whorls, respectively (jack et al. 1992; goto and meyerowitz 1994). in addition to petal and stamen expression, however, ap3 and pi transcripts are detected in carpel tissue in basal angiosperms such as amborella and nuphar (kim et al. 2005). other angiosperms and gymnosperms further possess a sister clade of b genes, termed bsister (bs) genes, and expression studies revealed that these genes are predominantly expressed in female reproductive organs (including carpels and ovules) (becker et al. 2002). asarum caudigerum hance is a paleoherb belonging to the family aristolochiaceae of the magnoliid order piperales, which is phylogenetically near the base of the angiosperms (kramer and irish 2000; angiosperm phylogeny group 2003). for another basal angiosperm genus, amborella (amborellales), actually the most basal lineage of the extant angiosperms (mathews and donoghue 1999; qiu et al. 1999; soltis et al. 1999; angiosperm phylogeny group 2003), buzgo et al. (2004) and soltis et al. (2007a) hypothesized that there is a gradual transition in the expression of b-class genes throughout the floral parts in basal angiosperms. they proposed a ''fading borders'' model of floral gene expression in these basal angiosperms, and kim et al. (2005) suggested that their b-class gene expression is broader than that of their counterparts in eudicots and monocots. in the present study, we investigated this hypothesis by studying mads-box genes in a little-studied basal angiosperm species, a. caudigerum. 
we took a genome-wide screening approach to study its mads-box gene homologs. we analyzed the expression profiles of all genes expressed during floral bud development by macroarray. we identified mads-box homologs in a. caudigerum by phylogenetic analysis, and determined how its b-class genes are expressed through gene expression pattern analyses, qrt-pcr and rna in situ hybridization. we further showed that the b-class genes isolated from this basal angiosperm have a broader expression pattern than those of their counterparts in higher eudicots.
plant materials, total rna extraction, dna sequencing, data analysis, and data deposition
plants of asarum caudigerum hance were introduced from the wild in qiubei county of yunnan province, sw china, and cultivated in the botanical garden of the kunming institute of botany (chinese academy of sciences, kunming, yunnan, china). seedling leaves and floral organs of buds at several stages before anthesis were collected and transferred directly to liquid nitrogen. total rna was isolated from the frozen leaves and floral buds using trizol (shanghai huashun company, shanghai, china) according to the manufacturer's protocol. to identify floral organ-specific genes and to elucidate their expression patterns, 612 unique ests were used in the present study: 567 unique ests were previously published (genbank est database dv038159-dv038720 and dv075851-dv075856), and 45 unique ests were obtained from 400 newly sequenced clones of the previously prepared cdna library described in zhao et al. (2006) and deposited in the genbank est database (ee127746 and ee605642-ee605686). protein similarity searches were performed against the ncbi database using the blastx program to assign putative functions to these ests. in the present study, e-values less than 1e-5 with more than 100 nucleotides of the ests were considered significant. 
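the significance criterion above amounts to a two-condition filter on blastx hits. the sketch below is a minimal illustration under stated assumptions: the `Hit` record, its field names and the example values are hypothetical, and "more than 100 nucleotides of the ests" is read here as the aligned length on the est, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One blastx hit for an EST (hypothetical record, not a BLAST API type)."""
    est_id: str
    evalue: float
    aligned_nt: int  # aligned length on the EST in nucleotides (assumed meaning)


def is_significant(hit, max_evalue=1e-5, min_nt=100):
    """Apply the stated thresholds: e-value below 1e-5 and more than 100 nt."""
    return hit.evalue < max_evalue and hit.aligned_nt > min_nt


# Made-up example hits, for illustration only.
hits = [
    Hit("est_1", 3e-20, 240),  # passes both criteria
    Hit("est_2", 1e-3, 300),   # e-value too high
    Hit("est_3", 1e-12, 90),   # alignment too short
]
significant = [h.est_id for h in hits if is_significant(h)]
print(significant)
```

both comparisons are strict, matching the wording "less than" and "more than" in the text.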
cdna macroarray preparation
a cdna macroarray was prepared according to a previously published method (ji et al. 2003), with minor modifications. the pcr product from each unique est was transferred from a 384-well plate to a nylon membrane (amersham biosciences, arlington heights, il, usa) using the biomek 2000 laboratory automation workstation (beckman coulter, fullerton, ca, usa). the pcr products were reamplified by pcr (perkin-elmer gene-amp pcr system 9600) using nested pcr primers 1 and 2r provided in the pcr-select cdna subtraction kit (clontech, palo alto, ca, usa). these primers were complementary to sequences flanking both sides of the cdna insert. thermo-cycling conditions were as follows: one step at 94°c for 3 min, followed by 28 cycles of 95°c for 10 s, 68°c for 3 min, and 72°c for 7 min. each clone was blotted in quadruplicate, with spots 1.125 mm in diameter and 1.25 mm apart. after air-drying, the membranes were denatured in 0.6 m naoh for 5 min, neutralized in 0.5 m tris-hcl (ph 7.5) for 5 min and then rinsed in distilled water for 3 min. the blotted cdna samples were cross-linked to membranes using a low-energy uv source and were baked for 2 h at 80°c. sars virus genes, distilled water, and the pcr reaction solution were also transferred onto the membrane as negative controls.
macroarray hybridization, washing, and radioactive scanning
total rnas prepared from leaves at the seedling stage, uppermost leaves (upper leaves close to the floral buds), perianth, stamens, and carpels were reverse transcribed and used as probes for expression profile analysis. the reverse transcription reaction was performed in 20 µl volumes set up as follows: 1 µl oligo(dt)18 (10 mm/µl), 5 µg total rna and distilled water (up to 8 µl). the mix was heated to 65°c for 5 min, then quickly chilled on ice, and the contents collected by centrifugation at 10,000g for 20 s. 
then, 4 µl of 5× first-strand buffer, 2 µl of 0.1 m dtt, 1 µl of 10 mm dntp mix (10 mm each datp, dttp and dgtp), 1 µl rnasin (40 u µl⁻¹), 3 µl [32p]-dctp (370 000 bq µl⁻¹) and 1 µl (200 u) of superscript ii polymerase (invitrogen, carlsbad, ca, usa) were added, the solutions mixed by gentle vortexing and incubated at 42°c for 1 h. the probes were denatured at 100°c for 5 min, and then chilled for 5 min on ice before hybridization. membranes were pre-hybridized in 20 ml church solution (1% bsa, 1 mm edta, 0.25 m na2hpo4-nah2po4, 7% sds) at 65°c for 5 h. the denatured probes were then added to the church solution and hybridization was carried out overnight at 65°c. after hybridization, the membranes were washed at 65°c in 2× ssc, 0.5% sds for 10 min, in 1× ssc, 0.5% sds for 10 min, then in 0.5× ssc, 0.5% sds for 10 min, and finally in 0.1× ssc, 0.1% sds for 10 min. they were then exposed to storage phosphor screens (amersham) for 3 days. images were acquired by scanning the membranes with a typhoon 9210 scanner (amersham). data were analyzed using the gpc visualgrid software (http://www.gpc-biotech.com). the radioactive intensity of each spot was quantified as volume values and the local background levels subtracted, resulting in subtracted volume values, designated svol. the mean of all spot intensities in each membrane was used as the internal control, the subtracted volume value of which was designated sref. all images were normalized by dividing the svol of each spot by the sref value within the same image, resulting in a normalized volume value (nvol) for each spot. the nvol values were comparable between all images. the ratios of the signal intensities for each est in the uppermost leaves, perianth, stamens, and carpels, to those of the leaves at the seedling stage were calculated as measures of the changes in the differential expression of the genes represented by the cdna spots on the macroarrays. 
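the normalization described above (svol, sref, nvol, and the ratio to seedling leaves) can be sketched in a few lines. this is an illustrative reconstruction only, not the visualgrid implementation: the function names and the example numbers are hypothetical, sref is taken to be the mean svol of the membrane, and the twofold cutoff used in the analysis is included as a parameter.

```python
def normalize(volumes, backgrounds):
    """Return nvol for each spot on one membrane.

    svol = spot volume minus local background; sref = mean svol of the
    membrane (assumed reading of the text); nvol = svol / sref.
    """
    svol = [v - b for v, b in zip(volumes, backgrounds)]
    sref = sum(svol) / len(svol)
    return [s / sref for s in svol]


def expression_ratios(nvol_tissue, nvol_seedling):
    """Ratio of each spot's nvol in a tissue to its nvol in seedling leaves.

    Assumes every seedling-leaf nvol is non-zero.
    """
    return [t / s for t, s in zip(nvol_tissue, nvol_seedling)]


def upregulated(ratios, cutoff=2.0):
    """Flag spots whose ratio is equal to or greater than the cutoff."""
    return [r >= cutoff for r in ratios]


# Made-up intensities for three spots on a tissue membrane.
nvol_tissue = normalize([110.0, 210.0, 310.0], [10.0, 10.0, 10.0])
nvol_seedling = [0.5, 1.0, 0.5]  # hypothetical seedling-leaf nvol values
ratios = expression_ratios(nvol_tissue, nvol_seedling)
print(nvol_tissue, ratios, upregulated(ratios))
```

dividing by the membrane mean is what makes nvol values comparable between images; spots with a zero reference signal would need separate handling, which the sketch leaves out.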
a twofold expression cutoff was applied to make the analysis more stringent, i.e., spots with a ratio equal to or more than two were judged significantly upregulated (jia et al. 2006) and loaded into the program hierarchical clustering explorer 3.0 (hce3.0) for analysis (http://www.cs.umd.edu/hcil/hce/hce3.html). total rna was reverse-transcribed with oligo-dt and reverse transcriptase (promega, madison, wi, usa) following the supplier's protocols. to examine the expression of the three putative mads-box transcription factors (dv038650, dv038420, dv038434) and one gibberellin-regulated gene (dv038510), quantitative real-time pcr (qrt-pcr) was carried out with an abi prism 7900 ht sequence detection system (applied biosystems, foster city, ca, usa). the a. caudigerum 28s rrna (dv038186) gene was used as internal control for normalization of the template cdna. the dv038420 sense primer was dv038420f (5′-gtccatggcgggggagtttctctctctttc-3′) and the anti-sense primer dv038420r (5′-gaagcagccattccaaggagtgtag-3′); the dv038434 sense primer was dv038434f (5′-agccattccaaggagtgtag-3′), and the anti-sense primer dv038434r (5′-gcatacttacattccaggtcttc-3′); the dv038510 sense primer was dv038510f (5′-attcgcatttcgtatccaac-3′), and the anti-sense primer dv038510r (5′-tgttatgcaaggcgcagatcgtg-3′); the dv038650 sense primer was dv038650f (5′-aagaccaaggaaggaggac-3′), and the anti-sense primer dv038650r (5′-cggccgcgaccacgctaatc-3′); the dv038186 sense primer was dv038186f (5′-aggagactgcgttgatgtg-3′), and the anti-sense primer dv038186r (5′-caaatccaaacgaaagggacaat-3′). qrt-pcr was performed in 1× sybr green i pcr master mix (applied biosystems), containing 200 nm of each primer and 1 µl 1:10 diluted cdna. the pcr was performed under thermal cycling conditions as follows: 1 cycle at 50°c for 2 min, 1 cycle at 95°c for 10 min, 40 cycles at 95°c for 30 s, at 60°c for 30 s, and at 72°c for 18 s. 
following amplification, melting curve analyses were performed at 95°c for 10 min followed by 40 cycles of 95°c for 30 s, 60°c for 30 s, and 72°c for 20 s. the data collected during each extension phase were analyzed initially using sds2.1 (applied biosystems). the 28s rrna gene was used as an internal calibrator to standardize the rna content of the different tissues. measurement of the leaves at the seedling stage was used as a sample calibration control. the abundance of dv038650, dv038420, dv038434 and dv038510 transcripts was calculated using the relative 2^(-ΔΔct) analytical method (livak and schmittgen 2001). the mean of triplicates of the same rna sample was used as the final result of gene expression, and the standard deviation of the three reactions was calculated. floral primordia at different stages of development were dissected in the greenhouse, fixed in faa (5% formaldehyde, 5% acetic acid, in 70% alcohol) for 13 h, and were then dehydrated through a series of alcohol solutions ranging from 70 to 100%. the materials were further dissected under a stereomicroscope, and the alcohol replaced by isopentanol acetate before the samples were dried in a hitachi hcp-2 co2 critical point dryer (cpd). the dried material was mounted on stubs and coated with gold-palladium. observations were made using a hitachi ky amray-100b sem at 25 kv and 0.5-1 mm working distance.
database searches and isolation of putative b-class genes
using the previously described est dv038650, we performed race using the 5′-race cdna amplification kit (clontech) according to the manufacturer's protocol. we subsequently performed database searches in genbank for the resulting mrna and predicted protein sequences. this enabled the elaboration of additional cdna sequence, deposited in genbank as eu368583. 
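the relative 2^(-ΔΔct) quantification used for the qrt-pcr data above (28s rrna as reference gene, seedling leaves as calibrator, triplicate reactions) can be sketched as follows. this is a minimal illustration of the livak and schmittgen (2001) arithmetic, not the sds2.1 software; the function name and all ct values are hypothetical.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^(-ddCt) method.

    dCt = Ct(target) - Ct(reference gene), computed for the sample and
    for the calibrator; ddCt = dCt(sample) - dCt(calibrator).
    """
    d_sample = ct_target - ct_ref
    d_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(d_sample - d_calibrator)


# Made-up Ct values: three replicate reactions for one tissue, with the
# reference gene and the seedling-leaf calibrator held fixed for simplicity.
replicate_cts = [22.0, 22.0, 22.0]
folds = [fold_change(ct, 12.0, 25.0, 12.0) for ct in replicate_cts]
print(mean(folds), stdev(folds))
```

a target that crosses threshold three cycles earlier than in the calibrator, at equal reference signal, comes out as an eightfold increase, which is the doubling-per-cycle assumption built into the method.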
the complete coding region of eu368583 was amplified by pcr using the specific sense primer eu368583f (5'-gatccatgggctgcgcgacgtccaa-3') and anti-sense primer eu368583r (5'-gtaggtgaccactttgttatgcaaggcgcagatcg-3'). the conditions for amplification were 94°c for 3 min followed by 30 cycles at 94°c for 30 s, at 54°c for 30 s and at 72°c for 1 min, plus a final extension at 72°c for 7 min. pcr products were purified and cloned into pgem t-easy vectors (promega) according to the manufacturer's protocol, and finally the clones were sequenced. the putative mads-box amino acid sequence of eu368583 isolated from a. caudigerum was aligned with closely related b-class and mads-box sequences from genbank using clustal x (thompson et al. 1997) followed by manual adjustment where necessary. alignments were created from the relatively conserved mik amino acids (the nucleotides were too variable to be aligned unambiguously) while the c domain was excluded. a neighbor-joining (nj) tree was constructed using the pairwise deletion option in mega4 (tamura et al. 2007). genetic distances were estimated under the poisson correction model. floral buds were fixed in faa, and were dehydrated through a standard ethanol series. the buds were transferred to liquid nitrogen, shock-frozen, and stored at -80°c. frozen tissues were later embedded in optimum cutting temperature (oct) compound ("tissue-tek"; miles laboratories inc., elkhart, in, usa), sectioned at 10 µm thickness and mounted on glass slides. rna in situ hybridization was carried out according to yang et al. (2005) with minor modifications. immunodetection with anti-dig antibodies conjugated with alkaline phosphatase was carried out using the dig northern starter kit (roche, mannheim, germany) according to the manufacturer's protocol.
a 326-bp antisense probe was prepared using the dv038650r (5'-agatttagaccgtagagt-3') and the dv038650 t7 primer containing the t7 promoter (5'-taatacgactcactataggg-3'). the sense probe was created using primer dv038650f (5'-ggtatagcgtataatgagac-3') and the t3 promoter primer (5'-aattaaccctcactaaaggg-3'). in situ hybridizations were performed at 46°c and subsequent washes were carried out at 37°c. images were captured with an olympus microscope (olympus company, guangzhou, guangdong, china). total rna from seedling leaves, uppermost leaves, perianth tissue, stamens, and carpels was used to synthesize a set of probes for the macroarray experiments (fig. 1). in total, 157 genes out of the 612 unique est genes tested showed an upregulated expression (table s1). of these, 43, 78, 62, and 100 were upregulated in the uppermost leaves, perianth, stamens and carpels, respectively. the expression of 29, 38 and 32 upregulated genes overlapped in the uppermost leaves and perianth, perianth and stamens, and stamens and carpels, respectively (fig. 2a-c and table s1). the gene expression pattern in the perianth was very similar to that in the uppermost leaves (fig. 1a-d). however, we did not detect transcripts encoding putative mads-box transcription factors in the uppermost leaves (table 1 and table s1). of the 78 upregulated perianth transcripts (table s1), seven putative mads-box transcription factors (dv038213, dv038420, dv038528, dv038434, dv038183, dv038296, dv038162), including three putative apetala3-like proteins, two agl6-like proteins, and two similar to putative mads542 protein were found (table 1). sixty-two genes were upregulated in the stamens (table s1). of the stamen-specific transcripts, three genes encoded mads-box transcription factors (dv038577, dv038682, dv038420), including one putative mads box transcription factor, ap3-like, and one putative mads542 protein (table 1).
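the twofold cutoff and the between-organ overlaps reported above amount to thresholding followed by set intersection. the sketch below shows that logic under stated assumptions: the est identifiers and ratios are invented, the function name is hypothetical, and the ratio is assumed to be organ signal relative to a control signal, which the text does not spell out.

```python
def upregulated(ratios, cutoff=2.0):
    """Return the set of gene ids whose expression ratio is >= cutoff."""
    return {gene for gene, ratio in ratios.items() if ratio >= cutoff}


# Hypothetical expression ratios per organ (not data from the study)
perianth_ratios = {"est_a": 3.1, "est_b": 2.0, "est_c": 1.4}
stamen_ratios = {"est_a": 2.6, "est_b": 1.2, "est_c": 4.0}

up_perianth = upregulated(perianth_ratios)
up_stamens = upregulated(stamen_ratios)

# Genes upregulated in both organs, as in the pairwise overlaps of fig. 2
shared = up_perianth & up_stamens
print(len(up_perianth), len(up_stamens), sorted(shared))
```

a ratio of exactly two counts as upregulated here, matching the "equal to or more than two" criterion in the methods.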
carpels showed more upregulated genes than the other organs, probably due to their complex tissue types, with 100 genes showing significant upregulation. they included three transcripts encoding putative mads-box transcription factors (table s1), two encoding putative ap3-like proteins (dv038213, dv038650), and one encoding a putative mads542 protein (dv038528) (table 1). the expression of three putative mads-box transcription factors (dv038650, dv038420, dv038434) and one gibberellin-regulated protein (dv038510) was further validated by qrt-pcr (fig. 3). our qrt-pcr results showed early floral organ development in a. caudigerum, floral primordia arise under transverse protuberances in axillary bract primordia. the whorl of the perianth appears first and then the first whorl of the androecium, quickly followed by the carpel primordia and then the second androecium whorl (fig. s1). a. caudigerum usually has six carpels and two whorls of six stamens, although some buds have only five carpels and five stamens per androecium whorl. the second androecium whorl continues to develop when the first whorl has already matured. therefore, the sequence of floral organ development progresses from the first whorl to the third, then the fourth, and lastly to the second whorl. we used race to obtain the full-length coding sequence of eu368583 and confirmed its homology with the previously published sequence dv038650 (fig. 4). the paleoap3 motif was not present in pi genes, and was not well conserved in the bs genes (fig. 4). phylogenetic analysis was performed to further clarify the homology and relationships of the full-length putative ap3-like mads-box transcription factor gene of a. caudigerum isolated in this study. the neighbor-joining analysis of the amino acid alignment yielded a high bootstrap support for the clade of ap3-like mads-box transcription factor genes from asarum (bootstrap, bs = 100%), including eu368583 (the full orf length of dv038650) isolated from a.
caudigerum. the latter formed a clade with ap3-1 of a. europaeum (bs = 94%) (fig. 5), and thus was not a member of the pi or bs gene families. in addition, the ap3-like b-class genes from asarum formed a cluster with ap3-1 of saruma henryi (bs = 77%).
fig. 2 comparison of the distribution of upregulated floral organ genes showing more than twofold increase in transcript abundance between different whorls and uppermost leaves. a genes that were expressed in both the uppermost leaves and the perianth. b genes that were upregulated in both the perianth and stamens. c genes that were upregulated in both the stamens and carpels.
in situ hybridization studies of the putative ap3-like homolog in a. caudigerum. to obtain information on the spatial expression pattern of the putative ap3 homolog isolated from a. caudigerum, rna in situ hybridization was performed. the putative mads-box protein dv038650, a partial fragment of eu368583, was upregulated in carpels at two different stages of development: mid-development (fig. 6c, e, f) or late-development (fig. 6a, b, d). in longitudinal sections, the signal was especially strong at the adaxial base of the cupules, where ovules would later develop, but was very weak in the perianth, and absent from the stamens (fig. 6a-c, e). in transverse cross sections, a strong signal was also detected in the female reproductive structures (fig. 6d, f). the use of sense probes for dv038650 as a control resulted in non-specific signals (fig. 6g, h). we have performed a macroarray analysis of gene expression in leaves and flowers at different stages of development in the paleoherb a. caudigerum, with special focus on mads-box genes. because genome information for a. caudigerum is not available, we employed a large-scale screening approach to identify the target genes.
in our study, it was noticeable that a considerable number of upregulated transcripts constituted heretofore-uncharacterized genes, or at least hypothetical proteins with unknown gene function. in addition, many transcription factors were differentially upregulated in the flower whorls. basal angiosperms are very interesting to the study of the origin, diversification, and evolution of angiosperms. using neighbor-joining analysis, we demonstrated that the putative mads-box transcription factor acap3-1 isolated from a. caudigerum is a b-class gene based on the presence of the conserved pi motif-derived and paleoap3 motifs (fig. 4) and the positioning in the phylogeny (fig. 5). it did not fall anywhere near aeap3-2 and the other bsister genes (fig. 5). although acap3-1 most closely resembled aeap3-1, the former showed carpel expression that had not been detected in the latter (kramer and irish 2000) (fig. 6). the bsister gene aeap3-2, however, is expressed in female reproductive structures (kramer and irish 2000). recent comprehensive expression studies in other plant groups revealed that bsister genes are mainly transcribed in female reproductive organs (becker et al. 2002; de folter et al. 2006). why ap3-1 from a. europaeum should behave differently from its a. caudigerum homolog is not yet understood. in the typical abc model based on a. thaliana, the b-class genes ap3 and pi control the specification of petals in conjunction with a-class genes and stamens in conjunction with c-class genes in the second and third whorls, respectively (coen and meyerowitz 1991; ma and depamphilis 2000). the borders of b-function gene expression are also not so clear-cut in arabidopsis: ap3 and pi, the major arabidopsis b-function genes, are not entirely restricted to the second and third whorl of the flowers; instead, ap3 is expressed in parts of the first whorl, and pi is expressed in parts of the fourth whorl at an early stage (jack et al. 1992; goto and meyerowitz 1994; chen et al. 2000).
fig. 4 alignment of the conserved c-terminal domain of predicted proteins from b and bs lineage homologs. the pi and paleoap3 motifs (kramer et al. 1998; jaramillo and kramer 2004) are shaded.
it is believed that physical interactions of the two proteins are required for protein stability and in turn maintenance of their expression, so that in older flowers their expression is limited to whorls two and three (goto and meyerowitz 1994; riechmann et al. 1996). it is possible that such a stabilization mechanism is not present in basal angiosperms. based on our cdna macroarray data, the expression domain of b-class ap3-like genes in a. caudigerum was found to be broader than for their counterparts in eudicots (fig. 7), as acap3-1 expression was found in carpels, and other ap3-like genes were also expressed in the perianth and stamens (fig. 7; table 1). comparative studies of b-class ap3 gene homologs conducted in the family aristolochiaceae reported ap3 gene expression for the genera saruma and aristolochia (jaramillo and kramer 2004). specifically, b-class ap3 genes were found to be expressed in the second, third and fourth whorls of saruma henryi, as well as the third and fourth whorls of aristolochia manshuriensis (jaramillo and kramer 2004). this is consistent with the view of a broader expression pattern of b-class genes in basal angiosperms (buzgo et al. 2004; kim et al. 2005). based on our sem observations, we found that the early floral organ development initiated in the sequence (1) first whorl, (2) third whorl, (3) fourth whorl, and (4) second whorl. therefore, the early floral organ development of a. caudigerum does not progress from the outer whorl inward, but instead the development of the second whorl is delayed. this is consistent with the hypothesis of buzgo et al. (2004), who suggested a gradual transition in the expression of floral b-class genes in amborella.
our cdna macroarray data, rna in situ hybridization, and qrt-pcr results indicated that acap3-1/dv038650 is expressed in carpels, and particularly strongly in the endothelium of ovules. this gene's unusual expression may explain the retarded development of the second-whorl organs. irish (1999, 2000) proposed that the abc model was not rigidly fixed during the earliest stages of angiosperm evolution. irish (2003) cautioned that the applicability of this model outside of the core eudicots would require testing. indeed, several lines of evidence suggest patterns of gradual transitions among flowers that prompt a "fading borders" view of gene expression. this evidence includes the morphological transition between organ identities from perianth to carpels, the results from genome-scale phylogenomics research, and the implied gradual shift in the expression of the b-class genes across the flower in the basal angiosperms (kramer et al. 2003; buzgo et al. 2004; kim et al. 2005; soltis et al. 2007a, b). however, there could be antagonistic interactions between the a-class gene products, which are partially responsible for perianth identities, and the c-class genes, which specify stamen and carpel identities in the higher eudicots (coen and meyerowitz 1991; drews et al. 1991; theissen 2001). in arabidopsis thaliana, apetala2 (ap2) of the a-class genes promotes the b gene expression domain by antagonizing agamous (ag) (zhao et al. 2007). winter et al. (2002a, b) pointed out that an ancestral b protein from the gymnosperm gnetum gnemon binds dna in a sequence-specific manner as a homodimer, whereas in antirrhinum majus the def-like protein binds to dna only as a heterodimeric complex with the glo-like protein. less strict requirements for heterodimerization or autoregulatory upregulation may facilitate spatial shifts in the b-function in taxa outside of the eudicots, as detected by the broader gene expression of b-class genes among flowers of a.
caudigerum, amborella and other basal angiosperm lineages. therefore, we support the contention that the abc model was not established at the outset of angiosperm evolution, but instead occurred in a simpler form in the basal angiosperms, and gradually evolved into the canonical abc model in the more derived higher eudicot lineages. a novel class of mads box genes is involved in ovule development in petunia an update of angiosperm phylogeny group classification for the orders and families of flowering plants: apg ii the major clades of mads-box genes and their role in the development and evolution of flowering plants a novel mads-box gene subfamily with a sister-group relationship to class b floral homeotic genes genes directing flower development in arabidopsis genetic interactions among floral homeotic genes of arabidopsis floral developmental morphology of amborella trichopoda (amborellaceae) minimal regions in the arabidopsis pistillata promoter responsive to the apetala3/pistillata feedback control do not contain a carg box the war of whorls: genetic interactions controlling flower development a bsister mads-box gene involved in ovule and seed development in petunia and arabidopsis the sep4 gene of arabidopsis thaliana functions in floral organ and meristem identity negative regulation of the arabidopsis homeotic gene agamous by the apetala2 product ternary complex formation between the mads-box proteins squamosa, deficiens and globosa is involved in the control of floral architecture in antirrhinum majus the mads box gene fbp2 is required for sepallata function in petunia function and regulation of the arabidopsis floral homeotic gene pistillata complexes of mads-box proteins are sufficient to convert leaves into floral organ the evolution of floral homeotic gene function the homeotic gene apetala3 of arabidopsis thaliana encodes a mads box and is expressed in petals and stamens apetala3 and pistillata homologs exhibit novel expression patterns in the unique 
perianth of aristolochia (aristolochiaceae) isolation and analyses of genes preferentially expressed during early cotton fiber development by subtractive pcr and cdna array annotation and expression profile analysis of 2073 full-length cdnas from stress-induced maize (zea mays l.) seedlings expression of floral mads-box genes in basal angiosperms: implications for the evolution of floral regulators evolution of genetic mechanisms controlling petal development evolution of the petal and stamen developmental programs: evidence from comparative studies of the lower eudicots and basal angiosperms molecular evolution of genes controlling petal and stamen development: duplication and divergence within the apetala3 and pistillata madsbox gene lineages complex patterns of gene duplication in the apetala3 and pistillata lineages of the ranunculaceae analysis of relative gene expression data using real-time quantitative pcr and the 2 -ddct method the abcs of floral evolution the root of angiosperm phylogeny inferred from duplicate phytochrome genes abnormal flowers and pattern formation in floral development ) b and c floral organ identity functions require sepallata madsbox genes conversion of leaves into petals in arabidopsis the earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes dimerization specificity of arabidopsis mads domain homeotic proteins, apetala1, apetala3, pistillata, and agamous angiosperm phylogeny inferred from multiple genes as a tool for comparative biology the floral genome: an evolutionary history of gene duplication and shifting patterns of gene expression the abc model and its applicability to basal angiosperms mega4: molecular evolutionary genetics analysis (mega) software version 4.0 development of floral organ identity: stories from the mads house a short history of mads-box genes in plants the clustal_x windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools from floral 
induction to floral shape evolution of class b floral homeotic proteins: obligate heterodimerization originated from homodimerization on the origin of class b floral homeotic genes: functional substitution and dominant inhibition in arabidopsis by expression of an orthologue from the gymnosperm gnetum a geraniol-synthase gene from cinnamomum tenuipilum expressed sequence tags (ests) and phylogenetic analysis of floral genes from a paleoherb species, asarum caudigerum ) mir172 regulates stem cell fate and defines the inner boundary of apetala3 and pistillata expression domain in arabidopsis floral meristem conflict of interest statement none. key: cord-002142-tdgu9sr9 authors: reniere, michelle l.; whiteley, aaron t.; portnoy, daniel a. title: an in vivo selection identifies listeria monocytogenes genes required to sense the intracellular environment and activate virulence factor expression date: 2016-07-14 journal: plos pathog doi: 10.1371/journal.ppat.1005741 sha: doc_id: 2142 cord_uid: tdgu9sr9 listeria monocytogenes is an environmental saprophyte and facultative intracellular bacterial pathogen with a well-defined life-cycle that involves escape from a phagosome, rapid cytosolic growth, and acta-dependent cell-to-cell spread, all of which are dependent on the master transcriptional regulator prfa. the environmental cues that lead to temporal and spatial control of l. monocytogenes virulence gene expression are poorly understood. in this study, we took advantage of the robust up-regulation of acta that occurs intracellularly and expressed cre recombinase from the acta promoter and 5’ untranslated region in a strain in which loxp sites flanked essential genes, so that activation of acta led to bacterial death. upon screening for transposon mutants that survived intracellularly, six genes were identified as necessary for acta expression. 
strikingly, most of the genes, including gshf, spxa1, yjbh, and ohra, are predicted to play important roles in bacterial redox regulation. the mutants identified in the genetic selection fell into three broad categories: (1) those that failed to reach the cytosolic compartment; (2) mutants that entered the cytosol, but failed to activate the master virulence regulator prfa; and (3) mutants that entered the cytosol and activated transcription of acta, but failed to synthesize it. the identification of mutants defective in vacuolar escape suggests that up-regulation of acta occurs in the host cytosol and not the vacuole. moreover, these results provide evidence for two non-redundant cytosolic cues; the first results in allosteric activation of prfa via increased glutathione levels and transcriptional activation of acta while the second results in translational activation of acta and requires yjbh. although the precise host cues have not yet been identified, we suggest that intracellular redox stress occurs as a consequence of both host and pathogen remodeling their metabolism upon infection. intracellular pathogens such as plasmodium spp., mycobacterium tuberculosis, salmonella enterica, trypanosoma cruzi, and leishmania spp. are responsible for an overwhelming amount of morbidity and mortality worldwide. successful dissemination of many of these pathogens requires complex life cycles that involve survival and replication in environmental or vector niches. to propagate within their hosts, these pathogens establish a variety of unique intracellular niches that are essential for their pathogenesis [1] . although there is considerable understanding of how intracellular pathogens manipulate host cell biology to promote their pathogenesis, less is known about the precise mechanisms by which these pathogens sense their host cell. such an understanding may lead to targets for therapeutic intervention. 
in this study we used listeria monocytogenes as a model system for understanding virulence gene regulation of a facultative intracellular bacterium that transitions from extracellular to intracellular growth. l. monocytogenes is a ubiquitous environmental saprophyte capable of causing severe disease as a foodborne pathogen [2] . l. monocytogenes is also a model system for studying bacterial adaptation to the host [3] . the bacterial virulence program is coordinated with a life cycle that begins upon entry into a mammalian cell either by phagocytosis or bacteria-mediated internalization. to commence intracellular growth, l. monocytogenes must first escape from the hostile phagosomal environment by the expression and secretion of a cholesterol-dependent cytolysin, listeriolysin o (llo) that mediates destruction of the phagosome [4] . upon entry into the cytosol, l. monocytogenes grows rapidly and expresses an essential determinant of pathogenesis, acta, an abundant surface protein that mediates host actin polymerization [5, 6] . appropriate regulation of llo and acta is critical for l. monocytogenes pathogenesis and transcriptionally coordinated by the master virulence regulator prfa [7] . prfa is a camp receptor protein (crp) family transcriptional regulator that is absolutely essential for l. monocytogenes virulence gene expression and pathogenesis [8] . prfa-mediated gene expression is regulated by prfa abundance, affinity for target promoters, and activation via cofactor binding [9] . prfa levels are controlled by three promoters. the most proximal promoter contains a site of negative regulation, while the most distal is a prfa-dependent readthrough transcript that is essential for appropriately high levels of intracellular gene expression [10] [11] [12] . prfa binds a palindromic dna sequence (prfa-box) and deviations from a consensus sequence result in lower affinity dna-prfa interactions [13] . 
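the idea that a prfa-box is palindromic, and that deviations from the consensus lower affinity, can be made concrete with a small sequence check. the sketch below is illustrative only. the 14-bp consensus string is the commonly cited prfa-box and is not given in this text, so treat it as an assumption; the example site is invented; and a mismatch count is only a crude proxy for binding affinity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

# Assumption: commonly cited PrfA-box consensus (N = any base); not stated in this text
PRFA_BOX_CONSENSUS = "TTAACANNTGTTAA"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def mismatches(site, consensus=PRFA_BOX_CONSENSUS):
    """Count positions where site deviates from the consensus, ignoring N positions."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        1
        for s, c in zip(site.upper(), consensus.upper())
        if c != "N" and s != c
    )


# Hypothetical candidate site with one deviation in the final position
candidate = "TTAACAAATGTTAG"
print(is_palindromic(PRFA_BOX_CONSENSUS), mismatches(candidate))
```

in this framing, a site with zero mismatches would correspond to a high-affinity box such as the one described for hly, and a site with deviations to a lower-affinity box such as the one described for acta.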
the affinity of prfa for dna determines the degree of transcriptional activation prior to prfa allosteric activation [14]. for example, the gene encoding llo (hly) has a high affinity prfa-box and consequently is expressed even during growth in broth when prfa is not activated. in contrast, the acta promoter contains a lower affinity prfa-box and is not expressed during growth in broth [15, 16]. upon entry into the host cell cytosol, prfa is over-expressed and is activated by a two-step process: first, binding of prfa to dna requires reduction of the four prfa cysteine residues; second, full transcriptional activation of prfa requires allosteric binding to glutathione [17]. the requirement for glutathione can be bypassed by mutations that lock prfa in its active conformation (prfa*) [18]. strains with prfa* mutations constitutively express prfa-activated genes and consequently have growth defects extracellularly, demonstrating the importance of regulating virulence gene expression [19, 20]. however, even prfa* strains grown in broth fail to synthesize the amount of acta observed intracellularly, which is likely attributable to translational control localized to the 5' untranslated region (5' utr) [21]. despite these findings of exquisite gene regulation, little is known about trans-acting factors that affect expression of prfa or prfa-activated genes. in a previous study, a genetic system was designed to select for l. monocytogenes mutants that failed to express acta intracellularly [17]. this screen led to the identification of l. monocytogenes glutathione synthase (gshf) and of glutathione, a tripeptide antioxidant, as the allosteric activator of prfa. in this study we sought to further understand the host cues that are recognized by intracellular pathogens during infection. we returned to the forward genetic selection and exhaustively screened for additional mutants that failed to express sufficient acta intracellularly.
this selection identified genes required at each stage of the intracellular lifecycle, including vacuolar escape, prfa activation, and cell-to-cell spread. these data suggest a model of compartmentalized gene expression, furthering our understanding of the l. monocytogenes virulence program. the goal of this study was to identify genes involved in regulation of a principal virulence determinant in l. monocytogenes, acta. a bacterial strain was constructed that failed to replicate upon activation of the acta gene, which is specifically up-regulated during cytosolic growth and is essential for pathogenesis. this 'suicide' strain harbored loxp sites in the chromosome flanking the origin of replication (ori) and several essential genes. codon-optimized cre recombinase was expressed from the acta promoter (fig 1a). the suicide strain grew like wild type in rich media but was unrecoverable after infection of bone marrow-derived macrophages (bmms). a himar1 transposon library was then constructed in the suicide strain background and used to infect bmms. when bacteria were isolated at five hours post-infection (p.i.), nearly all mutants harbored transposon insertions in cre, the acta promoter driving cre expression (acta1p), loxp sites, and gshf, encoding glutathione synthase. to identify additional genes required during infection, colonies were isolated at three and four hours p.i., generating a library of 1,090 transposon mutants from an initial inoculum of >1 million bacteria. colony pcr excluded strains with transposon insertions in cre and gshf, resulting in a collection of 700 strains (fig 1a). transposon mutants in the suicide background were screened individually for survival in bmms, narrowing the list to 300 mutants. six transposon insertions were identified in hly and nine insertions in prfa, emphasizing that cytosolic access and prfa are absolutely required for acta activation and subsequent cre expression.
saturation of the screen was further demonstrated after identification of 11 insertions in the acta promoter driving cre and 31 insertions in the loxp sites (which are each only 34 nucleotides). the remaining transposon mutations were transduced into a wild type background and analyzed in a plaque assay, a highly sensitive measure of cell-to-cell spread, which is completely dependent on acta expression [22]. using a threshold of 85%, 12 mutants were identified that formed plaques significantly smaller than wild type in l2 murine fibroblasts (fig 2a and table 1). with one exception, the transposon insertions were in open reading frames and likely resulted in loss-of-function mutations. the transposon in the promoter of lmo2191 (spxa1), a gene predicted to be essential in l. monocytogenes [23], resulted in a 10-fold decrease in spxa1 expression when the bacteria were grown in broth, essentially resulting in a knock-down strain (s1 fig). attempts to make an in-frame deletion of spxa1 using conventional methods were unsuccessful, consistent with a previous report [23].
fig 1 (partial caption). [66], teal arrows represent sites of transposon insertions, and numbers above these arrows correspond to mapped transposon locations (nucleotides 3' of the start codon). bolded numbers denote the transposon insertions used in this study.
as the goal of this selection was to identify mutations that affect acta expression in vivo, we measured acta abundance during infection of bmms. four hours post-infection, cells were lysed and acta and the constitutively expressed p60 protein were analyzed by immunoblot. nine strains were found to express less acta than wild type after normalizing to p60 abundance (fig 2b). the work-flow of this selection used cre expression from the acta promoter and plaque area as a criterion for inclusion in the core set of twelve mutants analyzed here. it was therefore unexpected that three mutants (lmo0441::tn, lmo0443::tn, and citc::tn) did not display a defect in acta abundance during intracellular growth. we hypothesize that these mutations may disrupt elements of bacterial physiology critical to appropriate cre activity or normal growth.
fig 2 (partial caption). (a) plaque area as a percentage of wild type. data are the mean and error bars indicate the standard error of the mean (s.e.m.) for three independent experiments. p values were calculated using a heteroscedastic student's t-test and all strains are significantly different from wild type (p < 0.001). (b) quantification of immunoblots of acta and p60 during infection. acta abundance was normalized to p60 abundance and measured as a percentage of wild type. data are the mean ± s.e.m. of at least three independent experiments. (c) female cd-1 mice were infected with 10^5 colony forming units (cfu) of each mutant. spleens were harvested 48 hours post-infection and cfu were quantified. the solid lines indicate the median, and data represent three pooled experiments totaling n = 15 mice per strain. p values were calculated using a heteroscedastic student's t-test; * p < 0.05; ** p < 0.01; *** p < 0.001. (d) gene expression of target genes measured by quantitative rt-pcr in wild type l. monocytogenes grown in broth compared to expression during infection of bmms. data are the mean ± s.e.m. of at least three independent experiments. p values were calculated using a heteroscedastic student's t-test; * p < 0.05.
the twelve mutants isolated by the genetic selection were identified based on in vitro assays for virulence. while these assays are correlated to in vivo outcomes, the importance of these genes to l. monocytogenes pathogenesis was confirmed in a murine model of infection. intravenous infection of mice revealed that four of the mutants displayed no virulence defect (lmo0441::tn, rsbx::tn, lmo2107::tn, and gtca::tn) while the remaining eight mutants were significantly attenuated (fig 2c).
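the figure legends above report mean ± s.e.m. and a heteroscedastic student's t-test, which is the unequal-variance test usually called welch's t-test. a minimal sketch of both calculations follows, using only the python standard library. the function names and the plaque-size values are hypothetical, and this is not presented as the authors' analysis pipeline.

```python
from math import sqrt
from statistics import mean, variance


def sem(values):
    """Standard error of the mean, using the sample variance (n - 1)."""
    return sqrt(variance(values) / len(values))


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical plaque areas as a percentage of wild type (not data from the study)
wild_type = [100.0, 102.0, 98.0]
mutant = [60.0, 65.0, 55.0]

t_stat, dof = welch_t(wild_type, mutant)
print(f"wild type: {mean(wild_type):.1f} +/- {sem(wild_type):.2f}")
print(f"mutant: {mean(mutant):.1f} +/- {sem(mutant):.2f}")
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

converting the statistic to a p value needs the t distribution, which the standard library does not provide; scipy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same test and returns the p value directly.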
it was surprising that four mutants exhibited impaired plaque formation yet were fully virulent; it is possible that these four mutants are impaired in other aspects of pathogenesis not reflected by changes in cfu during these infection conditions. to determine if the plaque defects in these mutants were due to cell-specific defects evident only in the l2 murine fibroblasts used for plaque assays, cell-to-cell spread defects were also analyzed in tib-73 cells, a murine hepatocyte cell line (table 1) . we observed consistent phenotypes between the plaque defects in tib-73 cells and l2 cells with the exception of citc::tn, p-spxa1::tn, and ohra::tn. however, these mutants were significantly attenuated during infection and thus it was unclear why they did not display a plaque defect in tib-73 cells. the specificity of the transposon insertion in seven of the eight attenuated strains was confirmed by expressing the disrupted gene in trans and complementing the plaque defect (s2 fig). attempts to complement the ppla::tn plaque defect were unsuccessful. however, ppla mutants are difficult to complement and the mutant we identified exhibited phenotypes consistent with published δppla defects [24] . other reports have identified genes necessary for virulence of l. monocytogenes by comparing changes in gene expression in vivo [25] [26] [27] . in our analysis, only gshf was differentially transcribed between host cells and rich media (fig 2d) . it remains to be investigated if the activity of these genes is regulated post-transcriptionally in response to the host. in this study we focused on the following genes that were required for acta expression and pathogenesis ( fig 1b) . yjbh (lmo0964) encodes a putative thioredoxin similar to yjbh in bacillus subtilis (57% amino acid similarity) [28] . a transposon in l. 
monocytogenes yjbh was previously identified in a screen for mutants defective in llo production in vitro and was found to be attenuated in a competitive infection model [29]. spxa1 (lmo2191) encodes an arsc family transcriptional regulator similar to the disulfide stress regulator spx conserved in firmicutes (83% amino acid identity to b. subtilis spx) [30]. the difference in nomenclature is due to the presence of a paralogous gene in l. monocytogenes (lmo2426 or spxa2) that is 59% identical to b. subtilis spx, while b. subtilis encodes only a single spx. in b. subtilis and staphylococcus aureus, yjbh post-translationally regulates spx [28, 31], although it is not known if this function is conserved in l. monocytogenes. lmo2199 encodes a hypothetical protein with a peroxiredoxin domain and is part of the organic hydroperoxide resistance (ohr) protein subfamily. it is cotranscribed with lmo2200, encoding a marr family transcriptional regulator which was not required for virulence, suggesting that lmo2200 may act as a transcriptional repressor [26]. in b. subtilis, homologs of lmo2199 and lmo2200 are named ohra (63% amino acid similarity) and ohrr (68%), respectively, and we have adopted this nomenclature for consistency [32]. arpj (lmo2250) encodes a predicted amino acid abc transporter permease that was originally identified in a screen for genes with increased intracellular expression [25]. however, the data presented here did not show an increase in arpj expression during infection of bmms. this may be explained by the different growth media and cell types used in the two studies. it is also possible that arpj is autoregulated, as the previous study analyzed arpj expression in an arpj transposon mutant. ppla (lmo2637) encodes a lipoprotein whose secretion is increased in a prfa* mutant [33]. the signal sequence of this lipoprotein is processed into a secreted peptide, which is required for vacuolar escape from non-phagocytic cells [24].
finally, gshf (lmo2770) encodes the only glutathione synthase in l. monocytogenes [34]. glutathione has been demonstrated to be an allosteric activator of prfa and therefore gshf mutants are severely attenuated in vivo due to insufficient virulence gene expression [17]. given the role of glutathione in activating prfa, we hypothesized that suppressor mutations of δgshf might illuminate alternative pathways for prfa activation, potentially involving other genes identified. accordingly, we screened for mutations that increased the virulence of a δgshf mutant. mice were serially infected with a high inoculum of δgshf; livers were harvested at 72 hours p.i., homogenized, and diluted to inoculate naive mice. after four successive infections, bacteria isolated from infected livers were analyzed by plaque assay. this approach previously identified a mutation in prfa that constitutively activates the protein (g145s), known as prfa*, completely bypassing the requirement for glutathione during infection [17]. the δgshf prfa* suppressor forms 100% plaques; therefore, for these experiments we selected bacteria that formed intermediate-sized plaques, which were then subjected to genome sequencing. two suppressor mutants were isolated and found to encode a g>a mutation 58 nucleotides 5' of the prfa start codon (fig 3a). this mutation lies within a previously identified site of negative regulation of prfa, the so-called "p2 promoter" (prfa2p, fig 3a), and deletion of the -35 region of this promoter (δp2 mutant) results in a 10-20-fold up-regulation of the prfa1p-dependent prfa transcript [11]. we hypothesized that the prfa -58 g>a mutation also inactivated the p2 promoter and resulted in greater prfa abundance. indeed, the δp2 gshf::tn double mutant and the δgshf prfa -58 g>a suppressor mutants all formed plaques approximately 60% the size of wild type (fig 3b).
these results did not directly implicate any of the other genes identified in our genetic selection; however, these findings did highlight the impact of both prfa abundance and activation during infection. prfa expression is controlled by a feed-forward loop in which activated prfa drives its own transcription [12]. strains expressing δp2 or prfa* decouple prfa abundance and activation: δp2 increases prfa abundance but still relies on glutathione for prfa activation, whereas prfa* increases both the amount and activity of prfa, independent of glutathione. we next sought to determine if the other mutants identified in the screen affected prfa abundance or activation by transducing each into l. monocytogenes δp2 and prfa* backgrounds and measuring the plaque size in each background (fig 3c). based on these analyses, mutants fell into three categories. the first category (yjbh::tn, p-spxa1::tn, ohra::tn, and arpj::tn) was unaffected by alterations in prfa expression or activity, indicating that these genes were required down-stream of prfa. in the second category was gshf::tn, which was partially rescued by δp2 and completely rescued by prfa*, consistent with the demonstrated role for glutathione as the allosteric activator of prfa. the third category describes ppla::tn, which formed 100% plaques in both the δp2 and prfa* backgrounds. these data suggested that the ppla mutant was capable of activating prfa (because it was rescued by δp2) but was deficient in expression of prfa-dependent genes required early during infection, before cytosolic access and glutathione-mediated activation of prfa. a principal difference between early and late prfa-dependent genes is that expression of early genes is less dependent on prfa activation by glutathione [35]. the two early genes are hly (encoding llo) and plca, which share a high-affinity prfa-box and are transcribed by unactivated prfa [35, 36].
the δp2 mutation results in increased transcription of early genes but does not affect late gene expression, whereas prfa* increases transcription of both early and late genes. we hypothesized that strains rescued by δp2 are specifically deficient in early gene expression. accordingly, we analyzed early gene expression (llo production) in broth for each mutant. several of the mutants were found to secrete less llo than wild type (fig 4a). to determine if the defect in llo production led to impaired phagosomal escape and thus a plaque defect, these mutations were transduced into a δhly mutant over-expressing hly from a constitutive hyper promoter (ph-hly strain) [37, 38]. in this background, efficiency of vacuolar escape should be equivalent in all strains, and indeed, equal llo secretion was confirmed in broth. constitutive expression of hly rescued the plaque defects of three mutants: p-spxa1::tn, ohra::tn, and ppla::tn (fig 4b). interestingly, there was discordance between llo production in broth and the defect in plaque formation one might predict from an llo deficiency. for this reason, measuring llo production in broth may be revealing aspects of bacterial physiology unrelated to llo production in vivo. the above results suggested that mutations in p-spxa1, ohra, and ppla resulted in aberrant llo secretion and/or that these mutants were unable to survive in the harsh environment of the vacuole. constitutive expression of hly would likely overcome either defect. we attempted to segregate these two possibilities by analyzing sensitivity to vacuolar conditions, including reactive oxygen species which l. monocytogenes must adapt to in order to survive [39, 40]. the response of each mutant to peroxide, disulfide stress, and organic hydroperoxide was analyzed by measuring their sensitivity to hydrogen peroxide (h2o2), diamide, and cumene hydroperoxide (chp), respectively.
knock-down of spxa1 and disruption of ohra or gshf significantly increased the sensitivity of l. monocytogenes to both peroxide and disulfide stress (fig 4c). in accordance with its annotation and the published role of ohra in b. subtilis [32], the ohra::tn mutant was significantly more susceptible to chp (fig 4c). as these results suggested a role for redox control of virulence genes, we tested the hypothesis that host reactive oxygen or nitrogen species may be sensed by the bacteria during infection to activate acta. however, growth of the suicide mutant was not rescued in bmms lacking inducible nitric oxide synthase (nos2-/-) or nadph oxidase (nox2-/-) (s3 fig). therefore, l. monocytogenes may activate virulence genes in response to multiple redundant host cues or depend on yet unidentified host pathways. constitutive production of hly restored the majority of the plaque defect for p-spxa1::tn and ohra::tn; however, it did not restore the plaque to 100% of the parent strain (fig 4b). we hypothesized that these mutants might also be impaired in the ability to grow in the host cytosol, independently from virulence gene expression. all of the mutants identified in the screen grew similarly to wild type in bmms with the exception of p-spxa1::tn and ohra::tn (fig 4d). in fact, p-spxa1::tn and ohra::tn were also the only mutants that exhibited growth defects in rich media (fig 4e). these pleiotropic growth defects and sensitivity to redox stress are likely why ph-hly was only partially able to complement the plaque defect of these mutants (fig 4b). previous work clearly demonstrated that glutathione was essential for transcriptional activation of virulence genes [17]. in order to assess which factors might be independent of glutathione-dependent transcriptional activation, we combined each transposon with an in-frame δgshf mutation. the only mutation not epistatic to gshf was yjbh::tn, which produced an additive plaque defect (fig 5a).
further, yjbh::tn was not rescued by constitutive activation of hly (fig 4b) or prfa (fig 3c). together, these data suggested that yjbh was required for acta expression post-transcriptionally. indeed, transcript levels of acta were identical in bmms infected with wild type or the δyjbh mutant (fig 5b). it is intriguing that arpj::tn was epistatic to gshf, yet not rescued by constitutive activation of prfa, indicating that arpj may contribute to glutathione-dependent transcriptional activation of acta through an unknown mechanism. the acta gene is preceded by 149 nucleotides of untranslated mrna (fig 5c), which is important for sufficient acta expression [21]. a strain was constructed in which acta was expressed independent of prfa by expressing the entire acta transcript (including the 5' utr) under the control of the constitutive hyper promoter in a strain deleted for acta (ph-acta strain, fig 5d). acta protein abundance was then analyzed by immunoblot. in this background, acta abundance was equivalent among all strains when the bacteria were grown in broth (fig 5e). however, during infection of bmms, disruption of yjbh resulted in significant impairment in acta abundance (fig 5f), indicating a failure to translationally activate acta. given that disrupting yjbh rescued the death of the suicide strain in which cre was expressed under acta1p and the 5' utr, these data indicate a genetic interaction between yjbh and the 5' utr of acta. to further support this genetic interaction we engineered a fluorescent strain of l. monocytogenes in which rfp was expressed under the acta1p promoter and 5' utr (acta1p-rfp, fig 5g). during infection of bmms the δyjbh acta1p-rfp strain exhibited significantly less fluorescence than wild type acta1p-rfp (fig 5h).
unfortunately, we were unable to interrogate the effect of a yjbh mutation on acta abundance in the absence of its 5' utr due to an inability to detect acta when the 5' utr was deleted, consistent with this region being critical for acta expression [21]. a drawback to ph-acta is that although acta is over-expressed in broth, this strain still elaborates much less acta in vivo and fails to form a plaque (fig 5e and 5f). to analyze the role of translational activation during infection, the acta gene and 5' utr were moved to a neutral locus within the l. monocytogenes chromosome [41]. in this strain, acta was expressed only from the prfa-dependent acta1p proximal promoter, eliminating read-through transcription from the distal acta2p promoter (fig 5c). this strain was called acta1p and was only mildly impaired in plaque formation and virulence (fig 5i and 5j). however, acta1p yjbh::tn was unable to form a plaque (fig 5i). the importance of acta translational activation was further underscored by a 3-log defect for acta1p yjbh::tn in the livers of infected mice (fig 5j). these data revealed a critical role for yjbh in acta activation that was less apparent in the wild type background due to redundant prfa-dependent promoters. in this study, rather than search for novel virulence factors or genes up-regulated in vivo, we screened for genes required for activation of an essential determinant of l. monocytogenes pathogenesis (acta) that is up-regulated over 200-fold during intracellular growth. mutants identified in the genetic selection fell into three broad categories: (1) those that failed to reach the cytosolic compartment; (2) mutants that entered the cytosol, but failed to activate the master virulence transcriptional regulator prfa; and (3) mutants that entered the cytosol and activated transcription of acta, but failed to synthesize it (fig 6).
and cfu were quantified. the solid lines indicate the median, and data represent two pooled experiments totaling n = 10 mice per strain. in all panels, p values were calculated using a heteroscedastic student's t-test; * p < 0.05; ** p < 0.01; *** p < 0.001; ns (not significant) p > 0.05. doi:10.1371/journal.ppat.1005741.g005
once phagocytosed by a host macrophage, l. monocytogenes (light blue rods) requires the gene products of spxa1 and ohra to survive in the phagosome. by a mechanism that is not yet understood, ppla is required for vacuolar escape in non-phagocytic cells. yjbh and arpj are then required for cell-to-cell spread. the l. monocytogenes pathogenicity island-1 is pictured below. early genes (depicted in red) are those with high-affinity prfa boxes that do not require active prfa (teal) for transcription. late genes (depicted in blue) are those with relatively low-affinity prfa boxes that require activated prfa to be transcribed and these are required later during infection, in the host cytosol. the transition from unactivated to activated prfa requires glutathione (orange circles), which is synthesized by gshf. yjbh (magenta) is then required for translational activation of acta, although the mechanism is not yet understood. see text for more details; model is not drawn to scale.
this approach highlighted how expression of virulence factors is spatially and temporally compartmentalized via regulation of transcription and translation during infection. one of the most striking findings of this study was that the majority of genes identified in the selection encode proteins predicted to control bacterial redox regulation, suggesting that redox changes represent one of the biological cues sensed by l. monocytogenes to regulate its virulence program. redox stress during infection can arise from endogenous by-products of bacterial metabolism and exogenously derived factors generated by the host.
however, it remains to be discovered whether the redox stress that may trigger virulence gene expression is produced by the host, the bacteria, or both. yjbh, spx, ohra, and gshf have defined roles in maintaining redox homeostasis in the presence of disulfide and organic peroxide stresses in firmicutes. in b. subtilis, ohra is a peroxiredoxin required during organic hydroperoxide stress [32]. in s. aureus and b. subtilis, yjbh interacts with spx to regulate the abundance and activity of spx [28, 31]. specifically, yjbh-bound spx is recognized by the clpxp protease and is degraded so that spx concentrations are low under steady-state conditions [42, 43]. during disulfide stress the yjbh:spx interaction is disrupted by intramolecular disulfide bonds in both proteins that result in reduced proteolysis of spx. b. subtilis spx represses transcription of 176 genes and activates transcription of 106 genes [44], the majority of which are required to adapt to redox stress, including genes for production of the low-molecular weight (lmw) thiol utilized by b. subtilis, bacillithiol [45]. l. monocytogenes spxa1 cannot be deleted and its regulon has not yet been characterized [23]. similarly, in streptococcus pneumoniae simultaneous deletion of both spxa1 and spxa2 paralogues is lethal [46], supporting the notion that the spx regulon(s) may contain essential genes in some firmicutes. mutants exhibiting the most severe virulence phenotypes contained insertions in gshf, which encodes the sole l. monocytogenes glutathione synthase [34]. glutathione is a tripeptide lmw thiol antioxidant present at millimolar concentrations that contributes to maintaining a reducing environment in both bacterial and host cells [47]. not surprisingly, l. monocytogenes δgshf mutants are more sensitive to redox stressors such as hydrogen peroxide and diamide and are 200-fold less virulent in mice, indicating that bacterially-derived glutathione is essential for pathogenesis [17].
however, δgshf mutants are fully virulent in l. monocytogenes harboring prfa* mutations that lock prfa in its constitutively active conformation. therefore, the primary role of gshf-derived glutathione during infection is to activate virulence gene expression via prfa activation, although we cannot rule out a contribution of imported host-derived glutathione [17]. indeed, host-derived glutathione activates virulence gene expression in burkholderia pseudomallei [48]. in the case of l. monocytogenes, gshf is transcriptionally up-regulated 10-fold during intracellular growth, suggesting the existence of an unidentified cue, likely redox-related, that stimulates glutathione production. the identification of many redox-related bacterial factors in this genetic selection led to our working model that specific redox changes during infection are sensed by the bacteria as a mechanism to identify their intracellular location and activate virulence genes appropriately. redox stress during infection could arise from host-derived antimicrobial factors. for example, the host generates antibacterial factors that assault invading pathogens with redox stresses, including: reactive oxygen species (ros), reactive electrophilic species (res) such as methylglyoxal, and reactive nitrogen species (rns) such as nitric oxide and peroxynitrite [40, 49, 50]. interestingly, these redox stresses from the host are spatially compartmentalized. rns and ros are produced in the phagosome and once in the host cytosol, l. monocytogenes is confronted with res and mitochondrial-derived ros [40, 51]. it is possible that the bacterial response to the redox stressors is also compartmentalized, requiring specific factors in the vacuole (such as spxa1 and ohra) and host cytosol (such as yjbh). eliminating host nitric oxide synthase (nos2) or nadph oxidase did not rescue growth of the suicide mutant (s3 fig). nos2-generated nitric oxide is required for efficient l.
monocytogenes cell-to-cell spread during infection, although this is due to the nitric oxide-mediated delay of phagolysosome maturation and not a direct effect on the bacteria [52]. together, these data suggest that a combination of host factors is likely required to activate acta during infection. alternatively, the source of redox stress may come from bacterial metabolism via ros generated from incomplete reduction of oxygen during aerobic respiration [53]. carbon source and phosphate abundance also affect the production of ros and methylglyoxal [54, 55]. prfa activity has been demonstrated to be sensitive to available carbon sources [2]. growth on plant-derived beta-glucoside sugars in the environment, such as cellobiose, represses prfa activation, whereas growth on host-derived sugars such as glucose-1-phosphate stimulates prfa-dependent gene expression [9, 56, 57]. therefore, entry of l. monocytogenes into the host cytosol results in a remodeling of carbon metabolism that may be linked to virulence gene regulation. glycerol is the principal carbon source used by l. monocytogenes intracellularly and growth on glycerol is a well-described stimulant of methylglyoxal production [58] [59] [60] [61]. in b. subtilis, methylglyoxal stress stimulates the spx regulon and production of bacillithiol, a low molecular weight thiol used by b. subtilis to detoxify methylglyoxal [62]. thus, the 10-fold increase in gshf transcript levels in l. monocytogenes may correspond to increased methylglyoxal production during infection, which would further link metabolism of an alternative carbon source to virulence. coupling of metabolism to virulence gene regulation may allow the system to remain off in the environment while remaining poised to turn on upon entering a host.
considering our finding of multiple redox factors that are required for proper virulence gene expression, we speculate that changes in carbon metabolism could alter the endogenous levels of ros and res produced, thus affecting prfa activation and leading to the "sugar-mediated repression" observed previously [9]. appropriate up-regulation of acta at the translational level is understood to require its 5' utr, although the mechanism remains unknown [21]. the data reported here further emphasize the sensitivity of acta translation to the environment in which l. monocytogenes is growing. in broth, the prfa* strain elaborated 2.4% of the amount of acta protein produced by constitutively expressed acta (fig 5e), and this amount increased 200-fold during infection (fig 5f), despite the fact that transcript levels of acta are equivalent in both growth conditions [17]. these data emphasize the importance of the translational control of this virulence factor. importantly, yjbh was required for the increased abundance of acta protein during infection. in wild type l. monocytogenes, multiple prfa-dependent promoters may compensate for loss of translational activation; however, when acta was isolated under its most proximal promoter, disruption of yjbh resulted in an attenuation of over 3-logs in the livers of infected animals (fig 5j). it seems unlikely that the thioredoxin yjbh activates translation of acta via direct binding to the 5' utr. however, yjbh may indirectly activate translation via interaction with another factor(s) or modulation of a small-molecule signal produced by the host. prfa-dependent transcription and activation are regulated redundantly at multiple levels, including: a temperature-sensitive riboswitch [63], allosteric activation by glutathione [17], multiple read-through transcripts [10, 64], positive and negative promoter elements [11, 65], and yet to be fully characterized translational control.
the complexity of acta activation is likely the result of selective pressure to respond appropriately to host-derived cues. this study investigated the virulence defects associated with failure to up-regulate virulence genes; however, over-production or inappropriate regulation of virulence factors extracellularly also results in a competitive disadvantage for l. monocytogenes [19, 20]. how l. monocytogenes and other intracellular pathogens regulate virulence gene expression is central to understanding their pathogenesis. results reported here suggest that redox cues are a mechanism by which intracellular pathogens recognize the host and represent an exciting new area of further investigation. this study was carried out in strict accordance with the recommendations in the guide for the care and use of laboratory animals of the national institutes of health. all protocols were reviewed and approved by the animal care and use committee at the university of california, berkeley (aup-2016-05-8811). all l. monocytogenes strains are derivatives of wild type 10403s [67, 68] and were cultivated in brain heart infusion (bhi, difco), shaking at 37°c unless otherwise stated. all e. coli strains were cultivated shaking in lb (miller) at 37°c. antibiotics (purchased from sigma) were used at the following concentrations: carbenicillin (100 μg/ml), streptomycin (200 μg/ml), chloramphenicol (7.5 μg/ml for l. monocytogenes and 10 μg/ml for e. coli), erythromycin (1 μg/ml), and tetracycline (2 μg/ml). all e. coli strains are listed in table 2 and all l. monocytogenes strains are listed in table 3. bacterial broth growth curves were performed as previously described [69]. the suicide strain was a gift from peter lauer and bill hanson (aduro biotech); details of its construction are reported elsewhere [17]. briefly, loxp sites were inserted on either side of the origin of replication by allelic exchange into a δactaδinlb strain of l. monocytogenes.
a transcriptional fusion of cre with acta that included the acta1p promoter, 5' utr, and ribosomal binding site of acta was inserted adjacent to a loxp site. knock-in of ppl2 derivative plasmids was performed by standard methods [41]. briefly, constructed ppl2 plasmids were transformed into chemically competent sm10 e. coli, selecting on chloramphenicol. donor sm10 and recipient l. monocytogenes were mixed at a 1:1 ratio on a non-selective bhi plate at 37°c for 4-24 hours, then trans-conjugation was selected for by plating bacteria on bhi containing streptomycin plus either chloramphenicol (ppl2), erythromycin (ppl2e), or tetracycline (ppl2t). single colonies were re-streaked for purifying selection onto bhi containing the same antibiotics as used after trans-conjugation. in-frame deletions of genes were accomplished by allelic exchange using pksv7-orit and conventional methods [64]. briefly, the constructed knock-out plasmid was transformed into sm10 e. coli, recovered on lb containing carbenicillin, and trans-conjugated into l. monocytogenes by mixing the donor sm10 and recipient l. monocytogenes at a 1:1 ratio on a non-selective bhi plate for 4-24 hours at 30°c, the permissive temperature for pksv7-orit to replicate in gram-positive organisms. trans-conjugation was selected on bhi containing both streptomycin and chloramphenicol at 30°c. after isolation of a single colony of l. monocytogenes containing pksv7-orit at 30°c, bacteria were grown at 42°c on bhi agar containing both streptomycin and chloramphenicol to select for chromosomal integration. colonies were restreaked onto selective media at 42°c two additional times for purifying selection of integrated pksv7-orit. this strain was then serially passaged at 30°c to enrich for excision and loss of pksv7-orit. mutants that lost pksv7-orit were identified by sensitivity to chloramphenicol using indirect patch-plating methods.
finally, allelic exchange was confirmed by pcr and, when necessary, sanger dna sequencing. preparation of electro-competent l. monocytogenes and himar1 transposon mutagenesis were performed as previously described [29], generating a transposon mutant library that was not fully characterized previously [17]. transposon junctions were mapped as previously described [71]. the position of each himar1 transposon refers to the distance of the insertion site, 3' of the first nucleotide of each gene. transposons were mapped to the 10403s genome; however, for continuity of nomenclature the egd-e loci names have been used. for reference: lmo0441 (lmrg_00133), lmo0443 (lmrg_00135), rsbx is lmo0896 (lmrg_02320), yjbh is lmo0964 transposons in the chromosome were introduced into different genetic backgrounds by generalized transduction using the phage u153, as previously described [29, 75]. briefly, a transducing lysate was generated by lysogenizing approximately 10^9 cfu of l. monocytogenes transposon donor with approximately 10^7 pfu of phage in 3-4 ml of 0.7% lb agar containing mgso4 and cacl2 (10 mm each) on lb agar and incubated overnight at 30°c. phage was soaked out of the agar by incubating with 5 ml of tm buffer (10 mm tris, ph 7.5 and 10 mm mgso4) for 8-24 hours and these recovered phage stocks were filter sterilized. with the newly generated transducing lysate, 10^8 l. monocytogenes recipients were lysogenized with 10^7 pfu of lysate, incubated at 30°c for 30 min in lb containing mgso4 and cacl2 (10 mm each), and then plated on selective bhi agar at 37°c. when transducing the himar1 transposon using erythromycin selection, colonies appeared after two days. these colonies were purified by restreaking transductants for single colonies and verified by sequencing the transposon junction. u153 phage stocks were propagated using l. monocytogenes strain slcc-5762.
knock-in plasmids were constructed as previously described using primers listed in table 4 and reagents are from new england biolabs, unless otherwise specified [71] . briefly, vectors for complementing yjbh and spxa1 were constructed by amplifying each gene along with its predicted native promoters using a reverse primer that appended a dna sequence encoding a six histidine affinity tag at the c-terminus. these dna fragments and ppl2 [41] were then digested with kpni and bamhi and ligated using quick ligase, according to manufacturer's instructions. the arpj and ohra complement vectors were constructed by amplifying their entire predicted operon and predicted native promoter (arpj: lmrg_01581-lmrg_01580, ohra: lmrg_01632-lmrg_01633) without addition of affinity tags. the dna fragment was combined with linearized ppl2t harboring a transcriptional terminator [71] and assembled using in-fusion cloning (clontech) or gibson assembly ultra (synthetic genomics). the ppl2t.p hyper -acta vector was constructed by amplifying both 5' utr and cds of acta (lmrg_02626), and combining the dna fragment with linearized ppl2t harboring a modified pspac-hy (p hyper ) [38] sequence: "aattgtgagcgctcacaattttgcaaaaagttgttgactttatctacaaggtgtgg cataatgtgtgtaattgtgagcgctcacaatt", inserted via gblock (idt), and a transcriptional terminator for assembly using in-fusion cloning (clontech). the pksv7-orit-δyjbh vector was constructed according to methods previously described [71] . briefly, the vector was constructed by sequentially amplifying~1 kb of homology flanking the yjbh coding region using primers in table 3 . these two fragments were joined by sequence overlap extension pcr, which included the coding region for the first six and last six amino acids of yjbh. the final pcr fragment and pksv7-orit were digested with kpni and psti (rsap was also included for the vector) and ligated using quick ligase. the ligation product was transformed into xl1 blue e. 
coli and transformants were screened by pcr for the presence of the insert, followed by sanger sequencing confirmation. the plaque assay was carried out by conventional methods [22, 76]. briefly, l2 fibroblasts (generated previously from l929 cells [77] and provided as a generous gift from susan weiss in 1988, as detailed in sun et al. [22]) or tib-73 hepatocytes (atcc tib-73) were maintained in high-glucose dmem medium plus 10% fbs (hyclone), 2 mm l-glutamine (gibco), and 1 mm sodium pyruvate (gibco). cells were plated at 1.2 x 10^6 cells per well in a six-well dish and infected the next day at an moi of 300 with l. monocytogenes grown overnight at 30°c, stationary. the infection was allowed to proceed for one hour before the wells were washed twice with pbs and 3 ml of medium plus 0.7% agarose and 10 μg/ml gentamicin was overlaid. at 48 hours post-infection the plaques were stained with 2 ml of medium plus 0.7% agarose, 10 μg/ml gentamicin, and 25 μl/ml neutral red (sigma). the plaques were then imaged at 72 hours post-infection. plaque area was quantified using imagej software [78]. each experiment represented an average of the area of at least five plaques per strain as a proportion to wild type plaques in that experiment. data are representative of at least three experiments. macrophage growth curves were performed as previously described [72, 79]. briefly, bone marrow-derived macrophages (bmms) were derived from bone marrow of c57bl/6 mice purchased from the jackson laboratory and were cultivated/differentiated in high-glucose dmem medium containing csf (from mouse csf-1-producing 3t3 cells), 20% fbs (hyclone), 2 mm l-glutamine (gibco), 1 mm sodium pyruvate (gibco), and 14 mm 2-mercaptoethanol (bme, gibco). bmms were derived as previously described and plated in 60 mm non-tc treated dishes that contained 14 tc-treated coverslips at 3 x 10^6 cells per dish.
these dishes were then infected at an moi of 0.1 for 30 minutes, washed twice with pbs prior to replacing media, and gentamicin was added at 50 μg/ml one hour post-infection. three coverslips were removed from each dish at 0.5, 2, 5, and 8 hours post-infection and added to 5 ml of sterile water. coverslips were rigorously mixed prior to plating on lb agar. each graph is representative of three experiments and each data point represents the average of three coverslips.

to analyze virulence, female cd-1 mice were infected intravenously (i.v.) via the tail vein using 200 μl of sterile pbs containing 10^5 cfu of each l. monocytogenes strain as previously described [80]. the infection was allowed to progress for 48 hours, at which point animals were euthanized and the spleens and livers were harvested. organs were homogenized in 0.1% np-40 and serial dilutions were plated on lb agar containing streptomycin. graphs represent pooled data from at least two experiments of greater than four mice each. groups were statistically compared using a heteroscedastic student's t-test.

in vivo suppressors were identified similarly to previously described methods [17]. briefly, cd-1 mice were infected i.v. with 1 x 10^7 cfu of δgshf for 72 hours and the livers were harvested, homogenized, and 100 μl was inoculated into broth. naïve mice were then infected with these liver homogenate cultures. after four successive infections, bacteria isolated from infected livers were analyzed via plaque assay and two strains with an intermediate plaque phenotype were selected for genome sequencing. genomic dna was isolated from l. monocytogenes using the masterpure gram-positive dna purification kit (epicentre) according to the manufacturer's instructions. genome sequencing and dna library preparation were performed as previously described [71] at the vincent j. coates genomics sequencing laboratory at uc berkeley.
data were assembled and aligned to the 10403s reference genome (genbank: gca_000168695.2), demonstrating >50x coverage. snp/indel/structural variation was determined as compared to the δgshf parent strain using clc genomics workbench (clc bio).

all immunoblotting was performed as previously described [17]. briefly, for bacteria grown in broth, overnight cultures were diluted 1:10 into bhi, incubated for five hours at 37°c, shaking, then the bacteria were separated from the supernatant by centrifugation. for secreted proteins, the supernatant was treated with 10% v/v tca for one hour on ice to precipitate all proteins. the protein pellet was washed twice with ice-cold acetone, followed by vacuum drying. the proteins were dissolved in lds buffer (invitrogen) containing 5% bme using a volume that normalized for od600 of harvested bacteria, boiled for 20 minutes, and separated by sds-page. for surface associated proteins, bacteria were suspended in 150 μl of lds buffer containing 5% bme, boiled for 20 minutes, and proteins separated by sds-page. immunoblots of bacteria grown intracellularly within infected bmms used 12-well dishes with bmms at a density of 10^6 cells per well and infected with an moi of 10. one hour post-infection the cells were washed and media containing gentamicin (50 μg/ml) was added. four hours post-infection the cells were washed twice with pbs and harvested in 150 μl lds buffer containing 5% bme. the samples were then boiled and separated by sds-page. primary antibodies were each used at a dilution of 1:5,000, including: rabbit polyclonal antibody against the n-terminus of acta [81], rabbit polyclonal antibody against llo, and a mouse monoclonal antibody against p60 (adipogen). p60 is a constitutively expressed bacterial protein used as a loading control [82]. all immunoblots were visualized and quantified using an odyssey imager and appropriate secondary antibodies from the manufacturer according to the manufacturer's instructions.
transcript analysis in broth was performed as previously described [83]. briefly, bacteria were grown overnight in bhi and subcultured 1:100 into 25 ml bhi. bacteria were harvested at an od600 = 1.0. transcript analysis during infection was performed as previously described [17]. briefly, bmms were plated at a density of 3 x 10^7 cells in 150 mm tc-treated dishes and infected with an moi of 10. one hour post-infection the cells were washed and media containing gentamicin (50 μg/ml) was added. four hours post-infection the cells were washed with pbs and lysed in 5 ml of 0.1% np-40. after collecting the lysate, the dishes were then washed in rnaprotect bacteria reagent (qiagen), which was combined with the lysate. bacteria were isolated by centrifugation. bacteria harvested from either broth or bmms were lysed in phenol:chloroform containing 1% sds by vortexing with 0.1 mm diameter silica/zirconium beads (biospec products inc.). nucleic acids were precipitated from the aqueous fraction overnight at -80°c in ethanol containing 150 mm sodium acetate (ph 5.2). precipitated nucleic acids were washed with ethanol and treated with turbo dnase per manufacturer's specifications (life technologies corporation). rna was again precipitated overnight and then washed in ethanol. rt-pcr was performed with iscript reverse transcriptase (bio-rad) and quantitative pcr (qpcr) of resulting cdna was performed with kapa sybr fast (kapa biosystems). primers used for qpcr are listed in table 4.

disk diffusions were performed similarly to previously described methods [84]. briefly, approximately 3 x 10^7 cfu from overnight cultures of bacteria were immobilized using 4 ml of molten (55°c) top-agar (0.8% nacl and 0.8% bacto-agar) spread evenly on tryptic soy agar plates. after the agar cooled, whatman paper disks soaked in 5 μl of 5% hydrogen peroxide, 1 m diamide solution, or 80% cumene hydroperoxide solution were placed on top of the bacteria-agar.
the zone of inhibition was measured after 18-20 hours of incubation at 37°c.

bmms were differentiated and cultivated as described for bmm growth curves. cells were plated at 5 x 10^5 cells per well in a 24-well dish in media without antibiotics. the following day bmms were infected at an moi of 5 with l. monocytogenes mutants that had been incubated at 30°c without shaking. after 30 minutes cells were washed once with pbs and fresh media containing gentamicin (50 μg/ml) was added. six hours post-infection, media was removed from each well, the cells were washed with 1 ml of pbs, and 0.5 ml of pbs was added back to each well. rfp fluorescence was measured using a plate reader (infinite m1000 pro, tecan) with 555 nm excitation, 584 nm emission, and 5 nm band filters. each well was interrogated 64 times on an 8 x 8 grid and the edge reads were excluded. data were normalized by subtracting baseline fluorescence of wild type (without rfp) infected cells and plotting data as a percentage of wild type expressing acta1p-rfp. each experiment represents three infected wells per l. monocytogenes genotype and data are representative of three pooled independent experiments.
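the fluorescence normalization described above can be sketched in a few lines of python. this is a minimal illustration rather than the authors' analysis code: it assumes each well is stored as an 8 x 8 list of reads, that "edge reads" means the outermost ring of the grid, and that interior reads and replicate wells are combined by a simple mean. all function names are hypothetical.

```python
from statistics import mean


def well_signal(grid):
    """Mean of the interior reads of one well (outermost ring of the 8 x 8 grid dropped)."""
    if len(grid) != 8 or any(len(row) != 8 for row in grid):
        raise ValueError("expected an 8 x 8 grid of reads")
    # Assumption: "edge reads" are the first/last row and first/last column.
    interior = [value for row in grid[1:-1] for value in row[1:-1]]
    return mean(interior)


def percent_of_wild_type(sample_wells, baseline_wells, reference_wells):
    """Baseline-subtracted signal of each sample well as a percentage of the reference.

    baseline_wells:  wells infected with wild type lacking RFP
    reference_wells: wells infected with wild type carrying the RFP reporter
    """
    baseline = mean(well_signal(w) for w in baseline_wells)
    reference = mean(well_signal(w) for w in reference_wells) - baseline
    if reference <= 0:
        raise ValueError("reference signal does not exceed baseline")
    return [100.0 * (well_signal(w) - baseline) / reference for w in sample_wells]
```

with three wells per genotype, `percent_of_wild_type` returns three percentages that can then be pooled across independent experiments.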
bacterial pathogen manipulation of host membrane trafficking listeria monocytogenes-from saprophyte to intracellular pathogen listeria monocytogenes: a multifaceted model listeriolysin o: a phagosome-specific lysin interaction of human arp2/3 complex and the listeria monocytogenes acta protein in actin filament nucleation monocytogenes-induced actin assembly requires the acta gene product, a surface protein coordinate regulation of virulence genes in listeria monocytogenes requires the product of the prfa gene the prfa virulence regulon regulation of listeria virulence: prfa master and commander regulation of the prfa transcriptional activator of listeria monocytogenes: multiple promoter elements contribute to intracellular growth and cell-to-cell spread dual promoters of the listeria monocytogenes prfa transcriptional activator appear essential in vitro but are redundant in vivo pleiotropic control of listeria monocytogenes virulence factors by a gene that is autoregulated sequence variations within prfa dna binding sites and effects on listeria monocytogenes virulence gene expression differential activation of virulence gene expression by prfa, the listeria monocytogenes virulence regulator intracellular induction of listeria monocytogenes acta expression expression of listeriolysin o and acta by intracellular and extracellular listeria monocytogenes glutathione activates virulence gene expression of an intracellular pathogen a gly145ser substitution in the transcriptional activator prfa causes constitutive overexpression of virulence factors in listeria monocytogenes prfa regulation offsets the cost of listeria virulence outside the host constitutive activation of prfa tilts the balance of listeria monocytogenes fitness towards life within the host versus environmental survival evidence implicating the 5' untranslated region of listeria monocytogenes acta in the regulation of bacterial actin-based motility isolation of listeria monocytogenes small-plaque mutants 
defective for intracellular growth and cell-to-cell spread identification in listeria monocytogenes of meca, a homologue of the bacillus subtilis competence regulatory protein identification of a peptide-pheromone that enhances listeria monocytogenes escape from host cell vacuoles five listeria monocytogenes genes preferentially expressed in infected mammalian cells: plca, purh, purd, pyre and an arginine abc transporter gene intracellular gene expression profile of listeria monocytogenes the listeria transcriptional landscape from saprophytism to virulence yjbh is a novel negative effector of the disulphide stress regulator, spx, in bacillus subtilis development of a marinerbased transposon and identification of listeria monocytogenes determinants, including the peptidylprolyl isomerase prsa2, that contribute to its hemolytic phenotype spx-rna polymerase interaction and global transcriptional control during oxidative stress wachenfeldt von c. the yjbh adaptor protein enhances proteolysis of the transcriptional regulator spx in staphylococcus aureus ohrr is a repressor of ohra, a key organic hydroperoxide resistance determinant in bacillus subtilis identification of novel listeria monocytogenes secreted virulence factors following mutational activation of the central virulence regulator a multidomain fusion protein in listeria monocytogenes catalyzes the two primary activities for glutathione biosynthesis allosteric mutants show that prfa activation is dispensable for vacuole escape but required for efficient spread and listeria survival in vivo prfa mediates specific binding of rna polymerase of listeria monocytogenes to prfa-dependent virulence gene promoters resulting in a transcriptionally active complex the 5' untranslated region-mediated enhancement of intracellular listeriolysin o production is required for listeria monocytogenes pathogenicity in vivo effects of sporulation kinases on mutant spo0a proteins in bacillus subtilis the role of the activated 
macrophage in clearing listeria monocytogenes infection localized reactive oxygen and nitrogen intermediates inhibit escape of listeria monocytogenes from vacuoles in activated macrophages construction, characterization, and use of two listeria monocytogenes site-specific phage integration vectors yjbh-enhanced proteolysis of spx by clpxp in bacillus subtilis is inhibited by the small protein yirb (yuzo) the yjbh protein of bacillus subtilis enhances clpxp-catalyzed proteolysis of spx a regulatory protein that interferes with activator-stimulated transcription in bacteria regulation of bacillus subtilis bacillithiol biosynthesis operons by spxa1, a novel transcriptional regulator involved in x-state (competence) development in streptococcus pneumoniae the many faces of glutathione in bacteria host cytosolic glutathione sensing by a membrane histidine kinase activates the type vi secretion system in an intracellular bacterium inducible nitric oxide synthase and control of intracellular bacterial pathogens cells producing their own nemesis: understanding methylglyoxal metabolism how mitochondria produce reactive oxygen species nitric oxide increases susceptibility of toll-like receptor-activated macrophages to spreading listeria monocytogenes bacterial adaptation to oxidative stress: implications for pathogenesis and interaction with phagocytic cells life of listeria monocytogenes in the host cells' cytosol bacterial production of methylglyoxal: a survival strategy or death by misadventure? 
carbon-source regulation of virulence gene expression in listeria monocytogenes glucose-1-phosphate utilization by listeria monocytogenes is prfa dependent and coordinately expressed with virulence factors carbon metabolism of listeria monocytogenes growing inside macrophages carbon metabolism of intracellular bacterial pathogens and possible links to virulence lethal synthesis of methylglyoxal by escherichia coli during unregulated glycerol metabolism methylglyoxal resistance in bacillus subtilis: contributions of bacillithiol-dependent and independent pathways an rna thermosensor controls expression of virulence genes in listeria monocytogenes dual roles of plca in listeria monocytogenes pathogenesis negative regulation of prfa, the key activator of listeria monocytogenes virulence gene expression, is dispensable for bacterial pathogenesis comparative transcriptomics of pathogenic and non-pathogenic listeria species adoptive transfer of immunity to listeria monocytogenes. the influence of in vitro stimulation on lymphocyte subset requirements comparison of widely used listeria monocytogenes strains egd, 10403s, and egd-e highlights genomic variations underlying differences in pathogenicity cyclic di-amp is critical for listeria monocytogenes growth, cell wall homeostasis, and establishment of infection a broad host range mobilization system for in vivo genetic-engineeringtransposon mutagenesis in gram-negative bacteria. bio-technology the pamp c-di-amp is essential for listeria monocytogenes growth in rich but not minimal media due to a toxic increase in (p) avoidance of autophagy mediated by plca or acta is required for listeria monocytogenes growth in invasive extravillous trophoblasts restrict intracellular growth and spread of listeria monocytogenes functional impact of mutational activation on the listeria monocytogenes central virulence regulator prfa. 
microbiology (reading, engl) generalized transduction of serotype 1/2 and serotype 4b strains of listeria monocytogenes a prl mutation in secy suppresses secretion and virulence defects of listeria monocytogenes seca2 mutants enhanced growth of a murine coronavirus in transformed mouse cells nih image to imagej: 25 years of image analysis role of hemolysin for the intracellular growth of listeria monocytogenes sting-dependent type i ifn production inhibits cell-mediated immunity to listeria monocytogenes constitutive activation of the prfa regulon enhances the potency of vaccines based on live-attenuated and killed but metabolically active listeria monocytogenes strains expression of the iap gene coding for protein p60 of listeria monocytogenes is controlled on the posttranscriptional level listeria monocytogenes is resistant to lysozyme through the regulation, not the acquisition, of cell wall-modifying enzymes mutations of the listeria monocytogenes peptidoglycan n-deacetylase and o-acetylase result in enhanced lysozyme sensitivity, bacteriolysis, and hyperinduction of innate immune pathways

the authors would like to thank nancy freitag (university of illinois at chicago college of medicine), pete lauer, and bill hanson (aduro biotech) for strains and helpful discussions. the authors would also like to thank nicholas garelis and sonya john for technical assistance, gabriel mitchell for assistance with microscopy, chen chen for assistance with flow cytometry, and brittany ruhland and eric d. lee for critical reading of the manuscript.
key: cord-267475-6f4h3cck authors: kozak, marilyn title: pushing the limits of the scanning mechanism for initiation of translation date: 2002-10-16 journal: gene doi: 10.1016/s0378-1119(02)01056-9 sha: doc_id: 267475 cord_uid: 6f4h3cck selection of the translational initiation site in most eukaryotic mrnas appears to occur via a scanning mechanism which predicts that proximity to the 5′ end plays a dominant role in identifying the start codon. this ‘position effect’ is seen in cases where a mutation creates an aug codon upstream from the normal start site and translation shifts to the upstream site. the position effect is evident also in cases where a silent internal aug codon is activated upon being relocated closer to the 5′ end. two mechanisms for escaping the first-aug rule – reinitiation and context-dependent leaky scanning – enable downstream aug codons to be accessed in some mrnas. although these mechanisms are not new, many new examples of their use have emerged. via these escape pathways, the scanning mechanism operates even in extreme cases, such as a plant virus mrna in which translation initiates from three start sites over a distance of 900 nt. this depends on careful structural arrangements, however, which are rarely present in cellular mrnas. understanding the rules for initiation of translation enables understanding of human diseases in which the expression of a critical gene is reduced by mutations that add upstream aug codons or change the context around the aug(start) codon. the opposite problem occurs in the case of hereditary thrombocythemia: translational efficiency is increased by mutations that remove or restructure a small upstream open reading frame in thrombopoietin mrna, and the resulting overproduction of the cytokine causes the disease. 
this and other examples support the idea that 5′ leader sequences are sometimes structured deliberately in a way that constrains scanning in order to prevent harmful overproduction of potent regulatory proteins. the accumulated evidence reveals how the scanning mechanism dictates the pattern of transcription – forcing production of monocistronic mrnas – and the pattern of translation of eukaryotic cellular and viral genes.

the scanning mechanism for initiation of translation postulates that the small (40s) ribosomal subunit enters at the 5′ end of the mrna and migrates linearly, stopping when the first aug codon is reached. consistent with the postulated 5′ end-dependent entry of ribosomes, translation in vivo is strongly augmented by the m7g cap (furuichi and shatkin, 2000; horikami et al., 1984; lo et al., 1998; neeleman et al., 2001) and ribosome binding in vitro is prevented by circularization of the mrna (kozak, 1979a; konarska et al., 1981). perhaps because the scanning mechanism has been around for a while, the evidence for some basic points has been forgotten. one recent commentary even questions whether the 40s ribosomal subunit has anything to do with it (mathews, 2002). the easiest answer is that the stop-scanning step is clearly mediated by pairing of the initiation codon with the anticodon in met-trnai (cigan et al., 1988a), and the 40s ribosomal subunit is the carrier of met-trnai·eif2. but the 40s subunit was already implicated by experiments done earlier. the experiments that gave rise to the scanning model concerned unusual polysome-like complexes formed in the presence of edeine, an antibiotic which blocks recognition of the aug codon (kozak and shatkin, 1978). analysis of the rapidly sedimenting complexes revealed 40s ribosomal subunits distributed throughout the body of the mrna.
because control experiments showed that, even in the presence of edeine, ribosomes can enter only from the 5′ end, the simplest explanation was that 40s subunits enter at the 5′ end and then migrate into the interior of the mrna; in the absence of edeine, the migration would stop when an aug codon is reached. independent experiments confirmed that edeine is targeted to the ribosome (herrera et al., 1986), and use of a fractionated translation system confirmed that the edeine-induced complexes are formed by 40s but not 60s ribosomal subunits (kozak and shatkin, 1978; kozak, 1979b). subsequent experiments, with edeine omitted, showed that scanning can be interrupted by inserting a base-paired structure between the cap and the aug codon; the resulting abortive complexes sediment around 40s (kozak, 1989, 1998; paraskeva et al., 1999). we are not yet sure which initiation factors are associated with the 40s ribosomal subunit during the scanning phase. the only factors whose role in scanning has been defined clearly are the gtp-binding protein eif2, which escorts met-trnai onto the 40s subunit, and eif5, which activates gtp hydrolysis by eif2 (asano et al., 2001; das et al., 2001). by controlling the rate of gtp hydrolysis, eif5 controls the fidelity of initiation, i.e. the fidelity of the stop-scanning step. other protein factors have not yet been fitted in. (the voluminous literature on factors focuses on modifications such as phosphorylation and cleavages rather than on defining the initiation pathway. basic questions, such as when each factor enters and leaves, have not yet been answered.) one untested possibility is that the large initiation factor eif3, bound to the 40s ribosomal subunit, might form a clamp around the mrna that is opened and closed by cycles of atp hydrolysis. scanning appears to be dependent on atp hydrolysis (kozak, 1980), thereby implicating eif4a, an rna-dependent atpase which might control the hypothetical clamp.
some ideas about the function of other initiation factors are reviewed elsewhere (dever, 2002; mccarthy, 1998; pestova et al., 2001). the strongest evidence that the scanning 40s ribosome/factor complex advances linearly is the position effect on selection of the start codon: initiation at the first potential start codon has been demonstrated in rigorous experimental tests (cigan et al., 1988a; kozak, 1983, 1995) and confirmed in many 'natural tests' wherein addition or removal of an aug codon produces the expected shift in the site of initiation (see below). the aforementioned blockade caused by inserting a base-paired structure between the 5′ cap and the aug start codon is further evidence that 40s ribosomal subunits traverse the leader sequence linearly, rather than hopping (discontinuous scanning) or entering directly at the aug codon. although the scanning mechanism predicts that translation should initiate at the aug codon nearest the 5′ end of the mrna, two ancillary mechanisms, reinitiation and context-dependent leaky scanning, enable additional initiation events at downstream aug codons in some mrnas. these well-defined mechanisms for escaping the first-aug rule are discussed below. an additional escape mechanism might involve direct entry of ribosomes at an internal site in the mrna. while there is evidence suggestive of direct internal initiation with picornavirus mrnas, the evidence for internal ribosome entry sites (ires) in cellular mrnas is problematic (kozak, 2001a). the absence of shared structural features among candidate cellular ires elements makes it impossible to predict which mrnas, if any, might use such a mechanism. rather than attempting to summarize the extensive literature on internal initiation, i refer the reader to other detailed reviews on that subject (dever, 2002; hellen and sarnow, 2001; pestova et al., 2001). the next section provides a terse summary of points that are easily explained by the scanning model.
the bulk of the review then focuses on complicated examples and issues.

2. constraints imposed by the scanning mechanism explain many common aspects of gene expression in higher eukaryotes

many plant and animal viruses produce dicistronic or polycistronic mrnas from which only the 5′ cistron can be translated (table 1). all these viruses solve the problem of 'silent 3′ cistrons' by producing, via splicing or discontinuous transcription or an internal promoter, additional forms of mrna in which the downstream cistron is repositioned closer to the 5′ end. the reason for the complicated pattern of splicing seen with human immunodeficiency virus type 1 (hiv-1), for example, is simply to produce mrnas that allow downstream open reading frames (orfs) to be translated. the broad range of viruses represented in table 1 merits attention. the same problem and same solution (post-transcriptional processing of polycistronic mrnas) underlie the expression of many genes in caenorhabditis elegans (blumenthal et al., 2002; hough et al., 1999). in mammalian cells, mrnas that contain two full-length nonoverlapping cistrons are extremely rare and, as with the aforementioned viruses, actual translation of the 3′ cistron probably occurs from a second, monocistronic mrna (pardigol et al., 1998; westerman et al., 2001) or from a second mrna in which the two cistrons are fused into a single translation unit (gray and nicholls, 2000; hänzelmann et al., 2002). a dicistronic transcript derived from the mouse snurf-snrpn locus barely supports translation of the second cistron, as discussed below in the section on reinitiation (section 4). recently discovered dicistronic transcripts produced from the mouse hyal locus support translation of only the 5′ cistron (shuttleworth et al., 2002). a few other reported dicistronic mrnas await testing.
wold et al., 1995; ziff, 1985
parvovirus: adeno-associated | capsid protein a | capsid proteins b/c | splicing | muralidhar et al., 1994
hepatitis b virus | core protein | s proteins (envelope) | promoter switch | schaller and fischer, 1991
retrovirus: avian, murine | gag (capsid) protein | env protein | splicing (e) | pawson et al., 1977; van zaane et al., 1977
retrovirus: human foamy | gag (capsid) protein | pol precursor | splicing | jordan et al., 1996
lentivirus: hiv-1 | tat | rev and nef (f) | splicing | schwartz et al., 1992
alphavirus: semliki forest | nonstructural proteins | capsid protein | internal promoter | glanville et al., 1976; strauss and strauss, 1994
calicivirus: feline (g) | nonstructural proteins | capsid protein | independent replication | carter, 1990; neill et al., 1991
miller et al., 1985; shih and kaesberg, 1976
tobacco mosaic virus | replicase | coat and movement proteins | internal promoters | grdzelishvili et al., 2000; hunter et al., 1976
potato virus x | 25 kda movement protein | 12 and 8 kda movement proteins (f) | ?? | verchot et al., 1998
carmovirus: turnip crinkle (g) | replicase (p28/p88) | p8 and p9 movement proteins | internal promoters | li et al., 1998; wang and simon, 1997
fütterer et al., 1994

(a) the silent downstream cistron identified in the third column is expressed only upon being moved closer to the 5′ end via production of a second, shorter mrna. translation of most genes derived from these viruses follows straightforward predictions of the scanning mechanism, although occasional deviations have been reported. in rare instances where a 3′ cistron appears to be translated from a dicistronic mrna (grundhoff and ganem, 2001; kirshner et al., 1999; nador et al., 2001; stacey et al., 2000), the virus in question employs a complicated pattern of splicing and therefore the existence of an undetected monocistronic mrna is not beyond the realm of reason.
in some other cases only a small amount of the protein encoded by the 3′ cistron was produced, and the published rna analyses were not sufficiently sensitive to rule out the presence of an additional subgenomic mrna (herbert et al., 1996).

(b) in some cases the listed example is arbitrary, i.e. with retroviruses, coronaviruses, closteroviruses, etc., there are additional polycistronic mrnas wherein translation is restricted to the 5′ cistron.

(c) whereas dna viruses and retroviruses use conventional promoter-switching or splicing mechanisms to generate alternative forms of mrna that allow translation of the downstream cistron, more complicated mechanisms underlie the production of subgenomic mrnas by some rna viruses.

(d) the presence of internal promoters that produce a shorter transcript for each downstream orf is suggestive, but testing of translation is still needed for the mrnas produced by cytomegalovirus and geminivirus.

(e) whereas all retroviruses employ splicing to produce the subgenomic mrna from which envelope protein (env) is translated, some retroviruses also employ an internal promoter which is postulated to mediate expression of novel orfs, such as the superantigen of mouse mammary tumor virus (reuss and coffin, 1998) and orf-x of the virus that causes lung cancer in sheep (palmarini et al., 2002).

(f) see leaky scanning in table 3 and fig. 1.

(g) in place of the usual m7g cap, the 5′ end of these viral rnas carries a covalently linked protein (vpg) or is unblocked. the need for a subgenomic mrna even in these cases emphasizes that translation is 5′ end-dependent even when it is not cap-dependent.

(h) the full-length genomic mrna supports translation of the 3′ cistron in vitro but the 3′ cistron is silent in vivo. the latter result is considered more reliable (meulewaeter et al., 1992).
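the logic behind the silent 3′ cistrons in table 1 can be illustrated with a toy model of strict first-aug scanning. the sketch below is a deliberate simplification under stated assumptions: the sequences are invented, the function name is hypothetical, and context effects, leaky scanning and reinitiation are ignored.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_first_aug(mrna):
    """Return (start, end) of the ORF that begins at the 5'-proximal AUG, or None.

    Models strict scanning only: the first AUG found from the 5' end is used,
    and translation runs in that frame to the first in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
    return start, len(mrna)  # no in-frame stop codon found


# Invented toy sequences (not taken from any virus in table 1).
leader = "GCCACC"
cistron_5p = "AUGGCUGCUUAA"
spacer = "CCGCC"
cistron_3p = "AUGCCCGGGUGA"

dicistronic = leader + cistron_5p + spacer + cistron_3p
subgenomic = spacer + cistron_3p  # shorter mRNA lacking the 5' cistron

s, e = scan_first_aug(dicistronic)
print(dicistronic[s:e])  # the 5' cistron; the 3' cistron is never reached

s, e = scan_first_aug(subgenomic)
print(subgenomic[s:e])   # the former 3' cistron is now 5'-proximal
```

in this model the downstream cistron becomes translatable only once a shorter transcript places its aug first, which is the arrangement footnote (a) describes.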
notwithstanding the documented inability to translate the 3′ cistron in natural dicistronic mrnas, synthetic dicistronic mrnas, constructed by inserting a putative ires element between two reporter genes, appear sometimes to allow translation of the downstream cistron. the interpretation that this occurs via direct internal initiation of translation has been questioned (kozak, 2001a) and defended (hellen and sarnow, 2001) in other reviews.

the position effect, indicative of scanning, is seen when a mutation creates an aug codon upstream from the normal start codon and translation shifts to the upstream site (bergenhem et al., 1992; cai et al., 1992; gross et al., 1998; harington et al., 1994; liu et al., 1999; lock et al., 1991; mével-ninio et al., 1996; muralidhar et al., 1994; wada et al., 1995). in the most stringent test of the rule, the first aug codon was shown to be the exclusive site of initiation even when the second aug was positioned just a few bases downstream from, and in the same optimal context as, the first (kozak, 1995). the position effect is seen also when removal of the first start codon activates initiation from the next aug downstream. some genes require production of two versions of the encoded protein, wherein the shorter version, initiated from an internal aug codon, lacks the n-terminal domain of the longer isoform. the problem of how ribosomes can gain access to an internal start codon is solved by producing, via splicing or a downstream promoter, a second form of mrna from which the first aug start codon has been removed. table 2 lists some examples. the n-terminally truncated isoform thus produced may reside in a different cellular compartment, or may function as an antagonist to the full-length protein (as seen with various transcription factors listed in table 2), or may function in a surprising way.
one such surprise was the discovery that a truncated form of tryptophanyl-trna-synthetase ('minitrprs') has angiostatic activity (wakasugi et al., 2002). the entries in table 2 and some other examples (aichem and mutzel, 2001; beuret et al., 1999; falvey et al., 1995; nagpal et al., 1992) are what i call natural tests of the position rule. additional evidence comes from experimental manipulations wherein removal of the first aug was shown to activate initiation from a downstream site (cahana et al., 2001; chenik et al., 1995; tailor et al., 2001; thoma et al., 2001).

the scanning mechanism predicts that the 5′ untranslated region (5′ utr, an unfortunate misnomer) is actually traversed by ribosomes. this explains why translation of the major coding domain is reduced when adventitious out-of-frame aug codons occur upstream. the upstream aug codons often create small orfs (uporfs) which are indeed translated, as shown by detecting the encoded peptide (hackett et al., 1986; raney et al., 2000; wang and wessler, 2001) or by fusing a reporter gene to the uporf (abastado et al., 1991; donzé et al., 1995; liu et al., 1999; steel et al., 1996; tanaka et al., 2001; xu et al., 2001). the fusion test is the more reliable, as small peptides are usually degraded rapidly. even if upstream aug codons are arranged in a way that allows reinitiation, there is a penalty because reinitiation is usually inefficient. this topic will be discussed at length in section 4.

the hypothesis that the 5′ utr is traversed by ribosomes explains why a highly structured 5′ leader sequence is so detrimental to translation. vertebrate mrnas characteristically have long, gc-rich (hence highly structured) leader sequences (kozak, 1991a; macleod et al., 1998), and the resulting difficulty in translation has been discovered over and over in the course of cloning.
even a short gc-rich 5′ utr can inhibit profoundly, as illustrated in cases where a gene produces a mixture of mrnas with different leader sequences, and the worst-translated mrna was found to be the form with the shortest 5′ utr (jiang and lucy, 2001; yang et al., 1998). a stem-and-loop structure, stabilized in some cases by a repressor protein, is most inhibitory when its proximity to the 5′ end blocks ribosome binding (goossen and hentze, 1992; kozak, 1989; wang and wessler, 2001). if the structure is far enough from the 5′ end to allow ribosome entry, the advancing 40s ribosome/factor complex apparently has some ability to disrupt base pairing, but this ability is notably less than that of 80s elongating ribosomes (kozak, 1986a, 2001b; lingelbach and dobberstein, 1988; paraskeva et al., 1999) and is curtailed in yeast (koloteva et al., 1997). while there are mechanisms for reducing the inhibitory effects of upstream aug codons, as discussed below, no mechanism has yet been defined for modulating the inhibitory effects of secondary structure. some studies suggest that secondary structure might be less inhibitory to translation in vivo than in vitro (charron et al., 1998; curnow et al., 1995; hensold et al., 1997; hoover et al., 1997; morrish and rumsby, 2001; van der velden et al., 2002). this could be due to production of an alternative transcript that simply eliminates the secondary structure - a reasonable possibility given that gc-rich domains often harbor promoter elements - or to modification of the translation machinery. interpretation of in vivo tests of translation could also be complicated by effects of secondary structure on mrna stability (stefanovic et al., 1999). whether and how translation of gc-rich leader sequences might improve in exponentially growing cells (nielsen et al., 1995) remains an important open question. the scanning mechanism rationalizes the occurrence of initiation at upstream acg or cug codons in some mrnas.
these alternative codons are usually too weak to actually substitute for the aug start codon (reviewed by kozak, 1999; for some exceptions see falvey et al., 1995; kiefer et al., 1994; riechmann et al., 1999; sadler et al., 1999). it is not uncommon, however, for initiation to occur at an upstream non-aug codon in addition to the first aug (see leaky scanning in section 3). this is observed frequently with cellular genes that have highly structured, gc-rich leader sequences (kozak, 1991b), perhaps because secondary structure slows scanning and thus allows more time for the mismatched codon to pair with met-trna_i. with some viruses, the extra protein isoform initiated from an upstream non-aug codon serves an essential function (muralidhar et al., 1994; portis et al., 1994, 1996). while the n-terminally-extended isoforms derived from some cellular genes also display distinct functions (arnaud et al., 1999; calkhoven et al., 2000; spotts et al., 1997) or patterns of localization (acland et al., 1990; lock et al., 1991; packham et al., 1997), it would not be surprising if some other upstream-initiated proteins turn out to be inadvertent byproducts generated in the course of slowly traversing a gc-rich leader sequence.
table 2. partial list of vertebrate genes that produce a second, shorter version of the encoded protein via a second form of mrna in which an internal aug codon becomes a functional start site upon elimination of the upstream aug start codon. (table body not preserved in this text; the only surviving entry is the citation sun et al., 1995.)
a production of long and short protein isoforms via this mechanism is seen also with genes from insects (mével-ninio et al., 1996), plants (cunillera et al., 1997; wimmer et al., 1997), yeast (beltzer et al., 1988; carlson et al., 1983; chatton et al., 1988; ellis et al., 1989; gammie et al., 1999; natsoulis et al., 1986; wolfe et al., 1996) and viruses (barbosa and wettstein, 1988; lambert et al., 1987; liu and roizman, 1991; liu and biegalke, 2002; weimer et al., 1987; welch et al., 1991; wu et al., 1993b; zheng et al., 1994).
b in these cases, the long and short protein isoforms have different functional effects. other genes that resemble this pattern, producing long and short isoforms with contrasting functions, are not listed in the table because the aug start codon for the shorter protein is carried on an alternative exon present only in the shorter mrna (e.g. koski et al., 1999; molina et al., 1993). that arrangement does not illustrate the main point of the table, which is that a silent internal aug codon in the longer mrna can be activated simply by truncating the transcript.
c the long and short isoforms are targeted to different cellular compartments.
d the long and short isoforms are expressed in different tissues.
e the long and short forms of β1,4-galactosyltransferase appear to function identically. the main significance of the promoter switch, which eliminates the first aug start codon, is that the shorter 5′ utr supports translation more efficiently (charron et al., 1998).
because the ability of the scanning mechanism to explain the big picture is generally accepted, the remainder of this review directs attention, not to examples that can be seen readily to support the model, but to mrnas that seem to be poorly designed for a scanning mode of initiation. the main point is that the scanning mechanism applies even in these difficult cases. understandably, such mrnas are translated inefficiently and this brings out a second important point: some critical regulatory genes require protein synthesis to be inefficient.
an earlier review raised awareness that genes that encode potent regulatory proteins - cytokines, growth factors, kinases, transcription factors - often produce mrnas in which the 5′ leader sequence is gc-rich or burdened by upstream aug codons (kozak, 1991a). some examples described herein validate the prediction that these encumbered 5′ sequences are nature's way of limiting the synthesis of potent proteins that would be harmful if overproduced. i also suggested in earlier reviews that, when a cdna sequence has so many upstream aug codons as to challenge the applicability of the scanning mechanism, it is wise to ask whether the cdna correctly reflects the structure of the mrna. that advice is not changed by what is written here. very often, cdna sequences that appear incompatible with scanning have been found to derive from incompletely spliced transcripts or to have been misinterpreted in other ways (kozak, 1996, 2000). in other cases, although an encumbered cdna sequence is correct, it derives from a transcript that does not support translation (hake and hecht, 1993; foo et al., 1994; larsen et al., 2002; lee et al., 2000). only after one is certain of the mrna structure should the mechanisms below be considered. in mammals, the optimal context for recognition of the aug start codon is gccrccaugg. within this motif, the purine (r) in position -3 is the most highly conserved (see section 6.1) and functionally the most important position. the importance of a or g (a is somewhat better than g) in position -3 was proved by mutagenesis experiments on a wide variety of genes (kozak, 1986b; and see entries marked 'tested' in table 3). the g in position +4 is also highly conserved and, especially in the absence of a in position -3, contributes strongly (kozak, 1997).
adherence to the rest of the gccrccaugg motif varies, without major consequences as long as positions -3 and +4 conform; the upstream gcc motif can be seen to contribute, however, in the absence of other elements (kozak, 1987b). the aforementioned mutagenesis experiments define two extremes: (i) when the first aug codon occurs in a strong context - annaugn or gnnaugg - all or almost all ribosomes stop and initiate at that point; (ii) when the first aug resides in a very weak context, lacking both r in position -3 and g in position +4, some ribosomes initiate at that point but most continue scanning and initiate farther downstream. this leaky scanning enables the production of two separately initiated proteins from one mrna, as documented below. it is harder to predict what happens at start sites that fall between the extremes, i.e. mrnas in which the first potential start codon has the sequence ynnaugg, gnnaugy or gnnauga. leaky scanning is seen in some but not all such cases. a possible explanation suggested by studies with test transcripts (kozak, 1990a) is that initiation might be restricted to the first aug codon, despite a suboptimal context, when downstream secondary structure slows scanning and thus provides more time for codon/anticodon pairing. suppression of leaky scanning via this mechanism requires a critical distance (13-15 nt, which corresponds to half the diameter of the ribosome) between the aug codon and the downstream structured element. table 3 lists some examples in which two proteins are produced from one mrna via leaky scanning. the postulated link between context and leaky scanning has been tested in many of these cases by showing that, upon improving the context at the upstream site, initiation from the second site is reduced or abolished.
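the context rules just stated (strong: annaugn or gnnaugg; very weak: neither a purine in position -3 nor g in position +4; everything else in between) can be written down as a small lookup. the following python sketch is only an illustration of those rules as worded here; the function name classify_context and the three labels are assumptions made for this example, and the sketch ignores secondary structure, distance from the cap, and the wider gcc motif.

```python
def classify_context(mrna: str, aug_index: int) -> str:
    """Classify the context of the AUG whose 'A' sits at 0-based aug_index.

    Labels follow the rules quoted in the text (hypothetical names):
      'strong'       -> ANNAUGN or GNNAUGG
      'weak'         -> no purine at -3 and no G at +4
      'intermediate' -> everything else (e.g. YNNAUGG, GNNAUGY, GNNAUGA)
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    # Position -3 is three nucleotides before the A of AUG (A is +1);
    # position +4 is the nucleotide right after the G of AUG.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""

    has_purine_minus3 = minus3 in ("A", "G")
    has_g_plus4 = plus4 == "G"

    if minus3 == "A" or (minus3 == "G" and has_g_plus4):
        return "strong"
    if not has_purine_minus3 and not has_g_plus4:
        return "weak"
    return "intermediate"
```

for instance, classify_context("GCCACCAUGG", 6) falls in the strong class because position -3 is a, whereas a start site of the form ynnaugg is reported as intermediate, the class for which the text says the outcome is hard to predict.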
(whether the second aug codon resides in a strong or weak context is not relevant; the ribosome reads the mrna linearly and thus the decision to stop or to bypass the first aug is not influenced by whether there is a better initiation site downstream.) the large number of genes that employ leaky scanning precludes discussion of the biological significance of the proteins thereby produced, but it merits noting that, for many of the viruses in table 3, replication requires production of both listed proteins. for some other viruses, the second protein is a virulence factor that weakens host defenses (bridgen et al., 2001; chen et al., 2001; weber et al., 2002). the biological importance of these downstream-initiated proteins shows that leaky scanning is a deliberately employed tool; it does not simply reflect sloppiness on the part of the translational machinery.
notes to table 3. in all mrnas here listed, the sequence flanking the first start codon deviates from the consensus sequence in position -3 and/or position +4, highlighted by underlining. when the postulated link between context and leaky scanning was tested (so marked in this column), mutations that improved the context at the first start site diminished access to the downstream start site. this test failed only with cucumber necrosis virus, where the short distance between the m7g cap and aug#1 allowed some leaky scanning even when the context was optimized.
c in some cases the first and second aug codons are in the same reading frame, generating long and short versions of the encoded protein which may function differently. in cases where the first and second start codons are in different reading frames, indicated by italicizing the second product, the extent of overlap between the two orfs ranges from a few codons (peanut clump virus, southern bean mosaic virus) to 626 codons (turnip yellow mosaic virus).
d access to the downstream initiation site via leaky scanning is augmented by a reinitiation shunt, as explained in the text (section 4.3) and diagrammed in fig. 1 for c/ebpβ mrna.
e mutations that eliminate aug#1 usually increase production of the second, downstream protein. in rare cases where the expected increase was not seen (e.g. von hippel-lindau, turnip yellow mosaic virus), it might be because translation of the second protein was restricted at the level of elongation. for a similar reason, improving the context around aug#1 occasionally fails to elevate production of the protein there initiated (fajardo and shatkin, 1990). these entries nevertheless satisfy the main prediction of the leaky scanning mechanism, which is that improving the context around aug#1 prevents initiation from the second, downstream site (fajardo and shatkin, 1990; iliopoulos et al., 1998).
f whereas feline leukemia virus produces an n-terminally-extended, glycosylated form of gag (gp80 gag) from the indicated weak aug codon, the corresponding upstream start site in murine leukemia virus is acccugg (portis et al., 1994). when that site was experimentally ablated, however, revertants expressed the extended protein from a weak upstream aug codon (uuuaugg) created by a point mutation. those revertants were selected because the extra glycosylated form of gag contributes to viral spread (portis et al., 1996).
g in the mrnas from baboon reovirus, influenza a virus, and southern bean mosaic virus, the indicated proteins derive from the first (weak) and fourth aug codons. aug#2 and aug#3 initiate small orfs that terminate before aug#4. thus, a combination of leaky scanning and reinitiation probably mediates access to the downstream start site.
the long list of examples in table 3 conforms to expectations in that the first start codon resides in a suboptimal context. there are, however, rare instances of leaky scanning despite a good context (r in position -3 and g in position +4) around the first aug. this can happen when the first aug codon is too close to the 5′ end to be recognized efficiently (kaneda et al., 2000; kozak, 1991c; ruan et al., 1994; sedman et al., 1990; slusher et al., 1991; spiropoulou and nichol, 1993; werten et al., 1999) or when the facilitating effect of g in position +4 is canceled by u in position +5 (kozak, 1997; sloan et al., 1999; stallmeyer et al., 1999). other occasional claims of leaky scanning despite a strong context at the first aug codon were simply mistaken (scherer et al., 1995); the shorter protein turned out to be translated from a second form of mrna (kogo and fujimoto, 2000). at the opposite extreme, there are rare mammalian mrnas in which, despite a very unfavorable context flanking the first aug codon, translation appears to initiate exclusively at that site (arai et al., 1991; hickey and roth, 1993; leslie et al., 1992; mcneil et al., 1992; plowman et al., 1990; wu et al., 1993a). leaky scanning might be suppressed in these and a few other cases because of downstream secondary structure, or because the wider context (c in positions -1, -2, -4, -5; g in position -6) compensates to some extent for the absence of r in position -3 and g in position +4, or for other unknown reasons. the same principle that allows initiation from the first and second aug codons when the first aug is in a suboptimal context (table 3) applies in cases where translation initiates at an upstream non-aug codon in addition to the first aug (acland et al., 1990; arnaud et al., 1999; carroll and derse, 1993; fajardo et al., 1993; florkiewicz and sommer, 1989; fütterer et al., 1996; fuxe et al., 2000; lock et al., 1991; muralidhar et al., 1994; packham et al., 1997; saris et al., 1991; spotts et al., 1997).
recognition of an upstream acg, cug or gug start codon requires a strong context (portis et al., 1994), despite which scanning is usually leaky because the initiator codon itself is weak. production of long and short protein isoforms via leaky scanning is harder to regulate - e.g. to achieve tissue specific expression of one or the other form - than when a unique mrna encodes each isoform, as in table 2. there are hints, however, that dual initiation via leaky scanning might be regulable (probst-kepper et al., 2001; spotts et al., 1997). this could conceivably be accomplished via proteins that stabilize downstream secondary structure or, perhaps, via a combination of leaky scanning and regulated reinitiation, if the mrna also has small uporfs. among the examples in table 3 are many plant viruses, indicating that the basic context rules extend to plant systems. mutagenesis experiments confirm the functional importance of r in position -3 and g in position +4 in plant mrnas (jones et al., 1988; lukaszewicz et al., 2000) and surveys of plant cdna sequences confirm the conservation of those key positions (pesole et al., 2000; rogozin et al., 2001). unlike mammalian mrnas, however, plant mrnas do not show a predominance of c in positions -1, -2, -4 and -5. the foregoing discussion pertains to mrnas from plants and vertebrate animals. there is some evidence for context-dependent leaky scanning in fungi (arst and sheerins, 1996), but context effects on initiation have not yet been studied carefully in protozoa, insects, and various other systems. the observation that trans-splicing of mrnas in c. elegans sometimes brings a purine into position -3 is interesting (hough et al., 1999) but the significance awaits testing. with a number of yeast genes, there is a hint of leaky scanning when the usual a in position -3 is replaced by a pyrimidine (gaba et al., 2001; slusher et al., 1991; vilela et al., 1998; welch and jacobson, 1999; wolfe et al., 1994).
context effects were not evident, however, in other studies of translation in yeast (cigan et al., 1988b). for whatever reason, leaky scanning is rare in yeast, apart from a few cases attributable to the first aug codon residing too close to the 5′ end. mrnas that initiate translation from three sites provide a striking illustration of how far leaky scanning, alone or in combination with reinitiation, can be pushed. fig. 1 shows some examples. the predominant translation product obtained from c-myc mrna is a 65 kda 'long form 2' which initiates at the first aug codon (fig. 1a). a small amount of a longer isoform (68 kda) derives from an upstream cug codon which is a weak start site (i.e. very leaky) because the codon is not aug. although the first aug codon has the required a in position -3, a small percentage of ribosomes bypass that site and initiate at the next aug, producing a third (50 kda) form of c-myc. this happens apparently because the context flanking the first aug codon is good but not perfect. thus, production of the 50 kda isoform was eliminated when the upstream site was changed from acgaugc to accaugg (spotts et al., 1997). fig. 1b depicts another example in which ribosomes initiate from three in-frame aug codons. with the mrna that encodes c/ebpβ, access to the far downstream site via leaky scanning is augmented by a reinitiation shunt, as explained in the legend to fig. 1 and discussed further in section 4. translation of c/ebpα mrna occurs by a mechanism similar to that depicted for c/ebpβ except that the first start site in c/ebpα is a cug codon (calkhoven et al., 2000), which generates a smaller amount of the longest protein (isoform a) than does the aug codon in c/ebpβ. with c-myc, c/ebpβ and c/ebpα mrnas, leaky scanning is biologically important because the long and short versions of the protein have opposing effects as regulators of transcription.
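the arithmetic behind initiation at three sites by leaky scanning alone can be made explicit with a toy calculation: if each start site captures some fraction of the ribosomes that reach it, the remainder scans on to the next site. the python sketch below is a deliberately simplified model; the function name initiation_fractions and the capture probabilities used with it are hypothetical values chosen for illustration, not measurements, and the model leaves out reinitiation, secondary structure and elongational occlusion.

```python
def initiation_fractions(capture_probs):
    """Split a scanning ribosome population over successive start sites.

    capture_probs: probability (0..1), per site in 5'-to-3' order, that a
    ribosome reaching the site stops and initiates there (assumed values).
    Returns (fractions, bypass) where fractions[i] is the share of all
    scanning ribosomes initiating at site i and bypass is the share that
    passes every listed site.
    """
    remaining = 1.0  # share of ribosomes still scanning
    fractions = []
    for p in capture_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("capture probabilities must lie in [0, 1]")
        fractions.append(remaining * p)  # stop here
        remaining *= 1.0 - p             # leak past this site
    return fractions, remaining


# Hypothetical numbers for a c-myc-like layout: a weak upstream CUG,
# a good-but-not-perfect first AUG, then a downstream AUG.
shares, bypass = initiation_fractions([0.10, 0.90, 1.00])
```

with these assumed probabilities the first aug dominates, the upstream non-aug site and the downstream aug each receive a minor share, and strengthening the first aug (raising its capture probability toward 1) drives the downstream share toward zero, which is the qualitative behavior reported for the acgaugc to accaugg change.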
it is striking that leaky scanning can operate even when the second initiation site resides far downstream from the first. with synthetic transcripts designed to test the processivity of scanning, there was no reduction in initiation from the downstream site when the inter-aug distance was expanded stepwise from 11 to 251 nt (kozak, 1998). in some remarkable viral mrnas, the second functional initiation site is more than 500 nt downstream from the first aug (herzog et al., 1995; sivakumaran and hacker, 1998). the pregenomic mrna of rice tungro bacilliform virus (fig. 1c) provides the most dramatic illustration of these points. use of a weak (non-aug) codon to initiate orf1 and an unfavorable context at the start of orf2 (uacauga) enables the majority of ribosomes to reach and initiate at the start of orf3. the orf3 polyprotein is thought to be a precursor from which coat protein, protease and reverse transcriptase are derived by proteolysis. the remarkable absence of aug codons from the long (563 nt) coding domain of orf1 and the presence of but one weak aug codon within orf2 underscore how carefully this mrna is constructed to support translation via scanning. the careful construction includes minimizing the overlap between adjacent cistrons. without that precaution, elongational occlusion might work against utilization of a far downstream start codon, as documented in other cases (kozak, 1995). the avian reovirus rna diagrammed in fig. 1d offers another example of initiation from three sites in one mrna. additional experiments are needed to validate the postulated mechanism.
fig. 1. examples of 'maximally leaky' scanning wherein one mrna produces three independently initiated proteins. major (thick arrow) and minor (thin arrow) translation products are identified below their respective start codons. sequences that cause the initiation site to be weak, and thus promote leaky scanning, are highlighted in red. offset rectangles represent orfs in different reading frames. (a) with c-myc mrna, a leaky scanning mechanism was inferred from experiments in which optimizing the context around the first aug codon suppressed production of the 50 kda isoform, while changing the upstream cug codon to aug suppressed production of both the 65 and 50 kda isoforms (spotts et al., 1997). access to the downstream start site might be more complicated than here depicted, as there is a small out-of-frame orf between the 65 and 50 kda start sites. (b) with c/ebpβ mrna, a mutation that strengthens the first start codon (uucaugc → accaugg) blocked production of all shorter isoforms, implicating a leaky scanning mechanism (calkhoven et al., 2000). a small uporf (blue) superimposes another level of control, causing more ribosomes to bypass the start site for isoform b1 than would be expected from leaky scanning alone. presumably because the aug start codon for isoform b1 is positioned close to the termination site of the uporf, reinitiation at site b1 is inefficient and some ribosomes thus reach the far downstream start site for the 20 kda isoform (lip). as evidence for this reinitiation shunt, calkhoven et al. (2000) showed that eliminating the aug codon of the uporf abolished production of lip and that strengthening or weakening the context around the uporf start codon caused corresponding changes in the yield of lip. although the smallest form of c/ebpβ can be generated in some situations by proteolysis (dearth et al., 2001), the effects of the aforementioned mutations clearly implicate a translational mechanism. the lap/lip ratio shows tissue and stage specific variation (dearth et al., 2001; descombes and schibler, 1991). (c) whereas leaky scanning allows initiation at multiple sites within a single orf in c/ebpβ and c-myc mrnas, leaky scanning allows translation of three separate orfs in the pregenomic mrna of rice tungro bacilliform virus. these orfs (not drawn to scale) have overlapping start and stop codons of the form auga. translation via leaky scanning was inferred from the strong reduction (>13-fold) in translation of orf2 and orf3 when the start codon of orf1 was changed from auu to aug (fütterer et al., 1997) and from the inhibitory effect on expression of orf3 when an adventitious aug codon was inserted into orf2. the 5′ leader sequence that precedes orf1 has ten small uporfs which are not depicted here because that peculiar leader sequence, postulated to be translated by ribosome hopping (fütterer et al., 1996), is not required for the leaky scanning mechanism that underlies translation of orfs 1, 2 and 3. (d) the avian reovirus s1 mrna supports translation of one structural and two nonstructural proteins (bodelón et al., 2001). the depicted mechanism postulates that orf1 has a dual function, encoding its own polypeptide (p10) and facilitating translation of orf3 by shunting some ribosomes past the strong aug start codon for orf2. the absence of extraneous aug codons in the 310 nt region between the end of orf1 and the start of orf3 is consistent with the idea that orf3 might be translated by reinitiation. some ribosomes would be expected to translate p17 (orf2) by leaky scanning, engendered by the poor context at the start of orf1. improving the context at the start of orf1 indeed increased production of p10 (shmulevitz et al., 2002); unfortunately, the yield of p17, which would be expected to decrease, was not monitored. the observation that strengthening the context at the start of orf1 had no effect on the yield of σc is not surprising because the reinitiation mechanism postulated to underlie translation of orf3 would probably be limited by other features, such as the relatively large size of orf1.
in contrast with the 'maximally leaky' mrnas in fig. 1, the mrnas in fig. 2 are minimally leaky: only a small fraction of ribosomes bypass the first aug codon and initiate downstream.
here the leaky scanning mechanism has been pushed to the limits in the sense that there is (low-level) initiation from a second site despite the presence of a strong context around the first aug codon. the explanation is that the context flanking the first aug start codon is good but not perfect. the resulting low-level leaky scanning enables the viruses depicted in fig. 2a,b to produce two proteins - one abundant, the second in small amounts - from a single mrna. experimental manipulations that support this interpretation are summarized in the legend to fig. 2. a few other viral genes that might fit this category have been described (chenik et al., 1995; jayakar and whitt, 2002). the hepatitis b virus example is noteworthy because, via the rube goldberg mechanism diagrammed in fig. 2b, reverse transcriptase encoded by the p gene is initiated independently from a far downstream site, unlike most other reverse transcriptase genes which lack an independent start codon and therefore require frameshifting during translation of the preceding core gene. production of a second protein isoform via low-level leaky scanning is seen also with some cellular mrnas. an interesting example is the production in rats of an osteogenic growth peptide (ogp) initiated from codon 85 in the histone h4 gene (fig. 2c). the leaky scanning explanation was tested by showing that production of ogp increased upon deleting the upstream h4 start codon, and that production of ogp was suppressed upon changing the h4 start codon from a good (aggaagaugu) to a perfect (gccaccaugg) context. a similar mechanism might operate with a few other cellular genes that produce a trace amount of a second protein isoform (short and pfarr, 2002; it is not clear whether leaky scanning or a change in splicing underlies the translational switch described by land and rouault, 1998). the fourth example in fig. 2 differs from the others in that a good-but-not-perfect context at the first start site serves, not to enable production of two proteins, but simply to modulate the yield of a2a-r from the second aug. examination of a2a-r genes from various organisms shows conservation of the overlapping orf, with the upstream aug codon always in a context that allows only low-level leaky scanning (lee et al., 1999). conservation of the structure supports the interpretation that this is a device contrived to limit the production of a2a-r protein. low-level leaky scanning caused by a not-quite-perfect context around the first aug codon might occur with other mrnas where it normally goes unnoticed because the downstream start site(s) are out-of-frame. antigenic peptides recognized by cytotoxic t-lymphocytes (ctls) might be produced in this way, as discussed in section 5.4. a small degree of leaky scanning that normally goes unnoticed could become significant if a mutation that shifts the normal start codon out of frame moves a downstream aug codon into the main reading frame. in some cases where low-level internal initiation was observed with such a mutated gene (e.g. maser et al., 2001), the possibility that the downstream site is reached via a combination of leaky scanning and reinitiation - a mechanism such as that proposed for hepatitis b virus (fig. 2b) - merits consideration. reinitiation occurs with mrnas, such as those depicted in fig. 3, that have small orfs near the 5′ end. our rudimentary understanding of what happens following translation of the first uporf may be summarized as follows. when the 80s ribosome reaches the termination site of the uporf, the 60s ribosomal subunit is thought to be released (this has not actually been shown) while the 40s subunit remains bound to the mrna, resumes scanning, and may initiate another round of translation at a downstream aug codon.
for the downstream reinitiation event to occur, the 40s subunit must reacquire met-trna_i and this appears to be an important point of control. reacquisition of met-trna_i is promoted by lengthening the intercistronic domain (abastado et al., 1991; kozak, 1987c), which provides more time for met-trna_i to bind, or by increasing the concentration of eif2 (abastado et al., 1991; hinnebusch, 1997). genetic experiments also implicate eif3 in the met-trna_i rebinding step (garcia-barrio et al., 1995). another potential point of control is at the termination site of the uporf, where certain features - perhaps nearby secondary structure (grant and hinnebusch, 1994; vilela et al., 1998) - might prevent the resumption of scanning or, in some other way, prevent reinitiation. this brief summary is based on studies carried out in yeast and mammals. some studies of reinitiation in plants suggest that the intercistronic sequence may have effects beyond simply providing time for ribosomes to reacquire met-trna (wang and wessler, 1998). some results obtained in early experiments with mammalian vectors were interpreted as evidence that ribosomes can scan backwards and thus reinitiate at an aug codon positioned upstream from the termination site (peabody et al., 1986), but recent experiments contradict that view (kozak, 2001b). indeed many studies have shown that the strongest inhibition is caused by an uporf that overlaps the start of the downstream cistron (babik et al., 1999; bates et al., 1991; byrne et al., 1995; cao and geballe, 1995; ghilardi et al., 1998; hansen et al., 2002; kos et al., 2002; lee et al., 1999; liu et al., 1999), which would not be the case if ribosomes could move backwards to reinitiate.
fig. 2. examples of minimally leaky scanning in which a strong, but not quite perfect, context at aug#1 causes most ribosomes to initiate there while allowing a low level of initiation downstream. with the depicted viral mrnas (a,b), the predominant product of translation is the capsid protein initiated from aug#1. low-level leaky scanning generates a small but adequate amount of the indicated second protein. with bovine coronavirus, a mutation in position +4 (u → g, indicated in red) flanking aug#1 strongly reduced translation from the downstream site (senanayake and brian, 1997), supporting the interpretation that the natural mrna is slightly leaky because the context flanking aug#1 is not a perfect match to the consensus sequence. with hepatitis b virus, ribosomes en route to the p start site (aug#5) apparently bypass the weak aug#2 by leaky scanning, while translation of the small orf initiated at aug#3 enables some ribosomes to miss the inhibitory aug#4 (inhibitory because it resides in a strong context and overlaps the p orf) and thus to reach aug#5. whereas the core protein start codon (aug#1) here depicted resides in a context which allows a low level of leaky scanning, a slightly longer mrna which encodes the pre-core protein has a stronger start codon (a in position -3) and polymerase cannot be translated from that form of mrna (fouillot and rossignol, 1996). the publications on which the scheme shown here is based (fouillot et al., 1993; hwang and su, 1998) also discuss some alternative possibilities vis-à-vis translation of polymerase. (c) the first aug codon in rat histone h4 mrna initiates translation of the full-length protein. the second aug, 85 codons downstream and in the same reading frame, initiates production of a peptide which has growth-regulatory properties (bab et al., 1999). (d) with rat a2a-r adenosine receptor mrna, an overlapping uporf that initiates at an aug codon in a strong context is used to minimize production of a2a-r protein. the overlapping arrangement precludes reinitiation but the not-quite-perfect context at the upstream start site allows low-level leaky scanning. this interpretation is supported by the observed ten-fold increase in translation of a2a-r in vivo when the start codon of the uporf was eliminated (lee et al., 1999). via a second promoter, the rat a2a-r gene produces some transcripts with additional uporfs, but no transcript has yet been found that lacks the inhibitory uporf discussed here. here and in fig. 3, the major coding domain is shaded gray. small regulatory orfs (blue rectangles) are not drawn to scale.
fig. 3. small upstream orfs in eukaryotic mrnas function in various ways to modulate translation. only the 5′ end of each mrna is depicted. (a) the presence of uporfs forces translation of the major orf to occur by a reinitiation mechanism, which is usually inefficient. the extent of inhibition depends on the number and arrangement of uporfs and whether the context flanking the upstream start codon(s) allows some escape via leaky scanning. (b) because reinitiation can occur only in the forward direction, an overlapping uporf strongly impairs translation of the major orf. (c) whereas type b mrnas have a single in-frame start codon which is bypassed due to the overlapping uporf, type c mrnas initiate from two in-frame start codons; the uporf serves to divert some ribosomes to the downstream start site. the depicted sequence is a simplified representation of glyrs mrna (mudge et al., 1998). translation of bag-1 mrna can also be fitted to this pattern: the first start site is an in-frame cug codon which produces the 50 kda form of bag-1; the next start site (aug#1, out-of-frame) initiates a small uporf within which the first in-frame aug codon (aug#2) resides, and that aug is thereby skipped; the 36 kda form of bag-1 is produced from aug#3 which is accessed by reinitiation following translation of the small uporf (packham et al., 1997). some other mrnas that use an uporf to dodge one aug codon in favor of another are described elsewhere (mittag et al., 1997; sarrazin et al., 2000). note that the reinitiation shunt as here defined adheres to the linear scanning mechanism, unlike a shunt postulated to operate with cauliflower mosaic virus mrna (ryabova et al., 2000). (d) the common feature of mrnas that use mechanism d is inhibition of translation in cis by a peptide encoded within the uporf. the amino acid sequence of the inhibitory peptide is different in each case (morris and geballe, 2000). in the column at the far right, asterisks indicate examples in which the translational control mechanism is regulated, e.g. via a change in concentration of eif2 (gcn4) or arginine (cpa1) or polyamines (adometdc) or, more commonly, via an alternative promoter that generates a simpler form of mrna devoid of uporfs (c-mos, mdm2, il-12; see text for other examples, e.g. in alderete et al., 2001).
the size of the first orf is a major limitation on reinitiation in eukaryotes: reinitiation can occur following the translation of a 'minicistron' (a small first orf) but not following the translation of a full-length 5′ cistron. the long list of mrnas that contain silent 3′ cistrons (table 1) underscores the point. the only apparent exception occurs with cauliflower mosaic virus, where a protein encoded by the virus appears to promote reinitiation following the translation of a full-length first cistron (park et al., 2001). the reason why reinitiation is usually restricted by the size of the first orf is not known, but a possible explanation is that certain initiation factors dissociate from the ribosome only gradually during the course of elongation. if the elongation phase is brief - i.e. if the first orf is a minicistron - the factors required for reinitiation would still be present when the 40s subunit resumes scanning.
although the postulated factors have not been identified, there is evidence for the idea that the duration of the elongation phase matters: when a short uporf which normally permits reinitiation was reconfigured to contain a pseudoknot that is known to slow elongation, reinitiation failed (kozak, 2001b) . that result makes it difficult to specify a cutoff size, i.e. one cannot say 'an uporf this long will allow reinitiation' while a longer orf will not. the permissible size is likely to vary depending on features, such as secondary structure or codon usage, that affect the rate of elongation. as a rough guide, however, one may note that reinitiation often has been observed following translation of a ten to 12 codon uporf, and that reinitiation was substantially reduced, but not abolished, when a 13 codon uporf was lengthened to 33 codons (kozak, 2001b) . in a different study, reinitiation occurred following a 24 codon uporf but not when the orf was lengthened to 40 codons (luukkonen et al., 1995) . some naturally occurring uporfs that strongly inhibit translation, perhaps because their size precludes reinitiation, include a 36 codon uporf in mitochondrial uncoupling protein 2 mrna (pecqueur et al., 2001) , a 71 codon uporf in polyoma virus jc mrna (shishido-hara et al., 2000) , and a 53 codon uporf in plant s-adenosylmethionine decarboxylase (adometdc) mrna (hanfrey et al., in press) . that the size of the uporf might be what limits translation of adometdc is suggested from the five-fold increase in translation observed when the uporf was shortened from 53 to 25 codons, but that result could also be explained in other ways. (the suggested interpretation is not contradicted by the fact that an alternative uporf in adometdc mrna caused little inhibition even when lengthened to 66 codons; the alternative uporf initiates from an aug codon in a weak context which would allow it to be bypassed to some extent by leaky scanning. 
the 53 codon uporf, in contrast, has a strong start codon.) with the mouse snurf-snrpn transcript, where the first cistron is 71 codons long, a very low level of reinitiation might account for translation of the downstream snrpn cistron. a naturally occurring atg-to-agg mutation in the start codon of the upstream snurf cistron was found to elevate translation of snrpn >15-fold (tsai et al., 2002), which implicates a scanning/reinitiation mechanism and rules out direct internal initiation. from cdna sequencing data, it is clear that many vertebrate mrnas have small orfs upstream from the start of the major coding domain, but an accurate count of genes in this class is difficult. the tallies that have been attempted (e.g. pesole et al., 2001; suzuki et al., 2000) are invariably flawed by inclusion of misinterpreted cdna sequences, such as cdnas in which a putative 5′ utr with 'upstream' aug codons turned out to be part of the coding domain or part of an intron that gets removed from the functional mrna (di fruscio et al., 1998; kozak, 1996, 2000; kubu et al., 2000; nishitani et al., 2001; santamarina-fojo et al., 2000; wagner et al., 1998). some transcripts with long, aug-burdened leader sequences are not associated with polysomes (hake and hecht, 1993; sanchez-góngora et al., 2000) or not able to support protein synthesis (foo et al., 1994; larsen et al., 2002; lee et al., 2000), emphasizing that not all cdnas correspond to functional mrnas. a more fundamental complication vis-à-vis which genes to count is the propensity for a single gene to produce transcripts with different 5′ leader sequences, only some of which have upstream aug codons (anant et al., 2002; aplan et al., 1991; eerola et al., 2001; huo and scarpulla, 1999; kawakubo and samuel, 2000; laurin et al., 2000; perälä et al., 1994; perrais et al., 2001; sanchez-góngora et al., 2000; suva et al., 1989; tanaka et al., 2001; tsuda et al., 2000; zimmermann et al., 2000).
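the raw material for any such tally is a scan of the leader sequence for aug codons upstream of the annotated start site. the sketch below is an illustration only, not a method taken from the surveys cited above: the function name `find_uorfs`, the 0-based coordinates, the codon-counting convention and the three labels it assigns are all assumptions made for this example. for each upstream aug it reports the length of the orf that the aug opens and whether that orf terminates before the main start codon or overlaps it. it cannot tell whether a cdna corresponds to a functional mrna, which is the caveat raised above.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna, main_start):
    """Describe the ORF opened by every AUG upstream of the main start codon.

    mrna       -- RNA (or cDNA) sequence, 5' to 3'
    main_start -- 0-based index of the A of the main AUG (assumed to be known)
    """
    mrna = mrna.upper().replace("T", "U")
    found = []
    pos = mrna.find("AUG")
    while pos != -1 and pos < main_start:
        # Walk codon by codon in the reading frame set by this upstream AUG.
        stop = None
        for i in range(pos + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                stop = i
                break
        in_frame = (main_start - pos) % 3 == 0
        if in_frame and (stop is None or stop > main_start):
            # Same frame, no stop before the main AUG: an N-terminal extension.
            kind = "in-frame extension"
        elif stop is None or stop + 3 > main_start:
            kind = "overlapping"
        else:
            kind = "terminates upstream"
        found.append({
            "aug": pos,
            # Sense codons including the AUG, excluding the stop codon
            # (a counting convention chosen for this sketch).
            "codons": None if stop is None else (stop - pos) // 3,
            "kind": kind,
            # Intercistronic distance in nucleotides, when there is one.
            "gap": main_start - (stop + 3) if kind == "terminates upstream" else None,
        })
        pos = mrna.find("AUG", pos + 1)
    return found


# Two hypothetical leaders: a uORF that ends before the main AUG,
# and an out-of-frame uORF that runs past it.
for seq, start in [("GGAUGCCCUAAGGCACCAUGGCUGCUUAA", 17),
                   ("GGAUGCCAUGGCUUUGAGG", 7)]:
    print(find_uorfs(seq, start))
```

the "overlapping" label corresponds to the arrangement in which reinitiation is precluded, and the "gap" value is the intercistronic distance whose length influences how readily met-trna i is reacquired.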
the significance of a particular form of rna cannot always be deduced from its abundance, inasmuch as a minor transcript is sometimes the major functional mrna (andrea and walsh, 1995; babik et al., 1999; ghilardi et al., 1998; mitsuhashi and nikodem, 1989; nielsen et al., 1990) and incompletely processed transcripts are sometimes more abundant than the fullyspliced, translatable mrna (boularand et al., 1995; frost et al., 2000; xie et al., 1991; zachar et al., 1987) . translational regulation mediated by small uporfs is important, as discussed below, but equally important are non-translational mechanisms -use of alternative promoters or splice sites -that simplify the 5 0 utr by eliminating uporfs in certain tissues or at certain times when elevated synthesis of the protein is required (aizencang et al., 2000; anusaksathien et al., 2001; arrick et al., 1994; babik et al., 1999; brown et al., 1999; horiuchi et al., 1990; landers et al., 1997; lee et al., 2000; nonaka et al., 1989; phelps et al., 1998; ren and stiles, 1994; steel et al., 1996; teruya et al., 1990) . because vertebrate mrna leader sequences are often gc-rich (section 2.4), secondary structure near the 5 0 end might impair translation even more than the presence of upstream aug codons. thus, it is not surprising that eliminating upstream aug codons does not improve translation in every case (rao et al., 1988; wood et al., 1996) . 
in many cases, however, mutations targeted to the upstream aug codons confirmed their role in restricting translation from downstream (anant et al., 2002; babik et al., 1999; bates et al., 1991; brown et al., 1999; child et al., 1999a; gereben et al., 2002; ghilardi et al., 1998; griffin et al., 2001; harigai et al., 1996; kos et al., 2002; lee et al., 1999; marth et al., 1988; meijer et al., 2000; pecqueur et al., 2001; ren and stiles, 1994; steel et al., 1996; tanaka et al., 2001; tsai et al., 2002; wang and wessler, 1998; wang and rothnagel, 2001; wera et al., 1995; wu et al., 2002) . this occurs by a variety of mechanisms, as summarized in fig. 3 and discussed next. while the efficiency of reinitiation varies, there is almost always a penalty -demonstrable by showing an increase in translation when the uporf is deleted -and the penalty can be severe. thus, the simplest function of small uporfs is to limit production of the protein encoded in the full-length orf by making downstream translation dependent on an inefficient reinitiation mechanism (fig. 3a) . the best studied example is yeast gcn4, which initiates from the fifth aug codon in the mrna; the long leader sequence contains four small uporfs. in a series of classic experiments, hinnebusch (1997) was able to reconstruct gcn4 regulation using only the first and fourth uporfs, and i will explain what happens in that simplified case. uporf1 is always translated efficiently (it is the first aug in the mrna), after which ribosomes resume scanning and reinitiate, usually, at uporf4. uporf4 is unusual in that its translation precludes further reinitiation events: thus, when uporf4 is translated, gcn4 is not. that is the situation in yeast cultures that have adequate nutrients. starvation for amino acids, however, causes some ribosomes to bypass the inhibitory uporf4 and reinitiate instead farther downstream. 
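the competition between uporf4 and the gcn4 start site can be put into a toy calculation. the sketch below assumes, purely for illustration, that a 40s subunit which resumes scanning after uporf1 reacquires met-trna i as a first-order process, with a rate constant `k` per nucleotide scanned that stands in for the availability of active eif2 (recall that reacquisition is promoted by higher eif2 concentration and by a longer intercistronic distance). the function name, the distances and the rate constants are hypothetical round numbers, not measured values.

```python
import math


def reinitiation_split(k, d_uorf4, d_gcn4):
    """Toy model of where a rescanning 40S subunit reinitiates.

    k        -- hypothetical rate of Met-tRNAi reacquisition per nucleotide
                scanned (a stand-in for the level of active eIF2)
    d_uorf4  -- nucleotides scanned from the uORF1 stop codon to the uORF4 AUG
    d_gcn4   -- nucleotides scanned from the uORF1 stop codon to the GCN4 AUG
    """
    # Probability that the subunit is already competent on reaching uORF4.
    # Translating uORF4 is taken to preclude any further reinitiation.
    p_uorf4 = 1.0 - math.exp(-k * d_uorf4)
    # Subunits that are not yet competent scan past uORF4 ...
    p_bypass = 1.0 - p_uorf4
    # ... and may become competent before they reach the GCN4 start codon.
    p_gcn4 = p_bypass * (1.0 - math.exp(-k * (d_gcn4 - d_uorf4)))
    return {"uorf4": p_uorf4, "gcn4": p_gcn4}


# Hypothetical values: a high k for well-fed cells, a low k for starved cells.
fed = reinitiation_split(k=0.03, d_uorf4=200, d_gcn4=350)
starved = reinitiation_split(k=0.005, d_uorf4=200, d_gcn4=350)
print("fed:", fed)
print("starved:", starved)
```

only the trend is meaningful: lowering `k` shifts reinitiation away from uporf4 and toward the gcn4 start site, which is the qualitative behaviour described for starved cells.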
this happens because starvation creates a pool of uncharged trnas which activate a protein kinase that phosphorylates, and thus partially inactivates, eif2. when eif2 levels fall, it takes longer for ribosomes to reacquire met-trna i and thus become competent to reinitiate. the slower acquisition of competence means that some ribosomes, scanning in the reinitiation mode, will bypass the nearby uporf4 and can thus reach the downstream gcn4 start site. three general lessons from the gcn4 story appear to carry over to mammals. (i) fig. 3a lists some examples of mammalian mrnas that are translated inefficiently due to small uporfs; many other examples were cited in section 4.2. (ii) experimental manipulations with c/ebpb mrna (fig. 1b ) support the interpretation that an aug codon which follows the uporf too closely is skipped (presumably because ribosomes have not yet reacquired met-trna i ), allowing reinitiation to occur farther downstream. the same mechanism might be invoked to explain how an internal start codon is accessed in minitrprs mrna (wakasugi et al., 2002) and baculovirus ie0 mrna (theilmann et al., 2001) , and how c-myb gets translated from a rearranged transcript generated by retrovirus insertion (jiang et al., 1997) . in each of these mrnas, the first aug codon that follows a small uporf must be bypassed to reach the functional start codon downstream. (iii) the third lesson from gcn4 pertains to regulation of reinitiation by manipulation of eif2 levels. although hints of this have been described with mammalian genes that encode c/ebp transcription factors (calkhoven et al., 2000) , macrophage receptor protein cd36 (griffin et al., 2001) and activating transcription factor 4 (harding et al., 2000) , the point requires much more careful study. the mrnas discussed in connection with fig. 3a have uporfs that terminate before the start of the major coding domain, thus allowing (inefficient) translation of the main orf by reinitiation. in fig. 
3b, however, the uporf overlaps the start of the major coding domain. this precludes reinitiation and profoundly reduces the translational yield. limited access to the main orf in some of these mrnas might be achieved by leaky scanning, as was discussed for a2a-r (fig. 2d). mrnas derived from the human thrombopoietin (tpo) gene have structures similar to that depicted in fig. 3b and much can be learned from the tpo story, as outlined in fig. 4. the normal gene produces a mixture of transcripts with different leader sequences, all of which translate tpo poorly because of an overlapping uporf (uporf7 in fig. 4). targeted mutagenesis (ghilardi et al., 1998) confirmed that upstream aug#7 is primarily responsible for blocking translation of tpo. this is because its near-optimal context (gccgccuccaugg) prevents leaky scanning and the overlapping arrangement precludes reinitiation. various mutations that restructure the 5′ utr in ways that increase production of tpo cause hereditary thrombocythemia. translation of tpo normally initiates at aug#8 in exon 3, but a splice-site mutation that causes deletion of exon 3 causes initiation to shift to a previously silent in-frame codon (aug#9) in exon 4; this is diagrammed in the center of fig. 4. the resulting truncated form of tpo lacks only the first four amino acids and appears to function normally. the problem, and the cause of the pathology, is that the mutation greatly increases translation of tpo by removing the inhibitory uporf7. in two other families affected with hereditary thrombocythemia, production of tpo is elevated by mutations that restructure uporf7. in one case, deletion of a g residue shifts uporf7 into the same reading frame as tpo, thereby causing overproduction of an elongated form of tpo initiated from aug#7 (kondo et al., 1998). in the other case, a g → t mutation creates a terminator codon within uporf7 and this shortening of the orf, which now terminates 31 nt before aug#8, enables efficient reinitiation at aug#8. these insightful studies of tpo expression make two important points: (i) the bulk of the transcripts produced by the wild type gene are virtually untranslatable; and (ii) it is necessary for this potent cytokine to be translated poorly; overproduction results in disease. with tpo as precedent, one suspects that in other cases where, despite the production of alternative leader sequences, it is hard to find even one form of mrna devoid of upstream aug codons (e.g. larsen et al., 2002; lee et al., 1999; pecci et al., 2001; peterson and morris, 2000; wang et al., 1999), the goal is to ensure that translation is very, very inefficient. the wig-1 growth-regulatory gene might be another example: an overlapping uporf initiates from a strong aug codon, while the wig-1 start codon itself is weak, and these distinctive features are conserved between the human and mouse genes (hellborg et al., 2001).

fig. 4. a low-level reinitiation mechanism normally prevents overproduction of tpo. translational yields from various forms of tpo mrna in transfected cos cells (far right column) are expressed relative to a control transcript that has a short, unencumbered 5′ utr. p1 and p2 are alternative promoters; a cluster of arrows indicates that p2 produces staggered start sites. the tpo coding domain (horizontal black bar) begins at an aug codon which is labeled #8 because, in the longest form of mrna (line 1), it is preceded by seven aug codons that initiate small uporfs. superscript letters indicate whether each upstream aug resides in a strong (s) or weak (w) context and horizontal blue lines depict the approximate length and arrangement of the uporfs. vertical lines demarcate the boundaries of exons; carets depict the introns in alternatively spliced transcripts. only the beginning of the tpo coding domain (exons 3-7) is shown. the key point is that the normal set of transcripts supports translation poorly because uporf7 overlaps the tpo start site. various mutations (shown in red) that relieve this constraint elevate the translation of tpo, and this overproduction causes hereditary thrombocythemia. among the normal set of mrnas, the 'rare' transcript from promoter p1 (line 2) supports translation slightly better than the others, perhaps because the short distance between uporf2 and aug#7 enables some reinitiating ribosomes to bypass aug#7 and thus reach aug#8. because of the strong context at augs #1 and #2, uporfs 1 and 2 would be more effective than uporfs 5 and 6 in setting up this reinitiation shunt. the depicted scheme is based on experiments described by ghilardi et al. (1998) and wiestner et al. (1998). additional mutations diagrammed near the bottom of the figure were described by , , and kondo et al. (1998).

whereas an overlapping uporf functions simply to down-modulate translation in the examples depicted in fig. 3b, with the mrnas in fig. 3c the overlapping uporf qualitatively affects the protein output. ribosomes that translate the uporf thereby miss the first in-frame aug codon but proceed to reinitiate at another start codon downstream. if the uporf itself has a suboptimal initiation site (u in position −3 in the depicted example), leaky scanning will allow some production of the long protein isoform from the first in-frame aug codon while the reinitiation shunt promotes production of the shorter protein isoform. the operation of a reinitiation shunt is most obvious when the uporf overlaps a potential start codon, as shown in fig. 3c, but the same principle applies in cases (discussed in section 4.3) where, although the uporf terminates prior to a potential downstream start codon, the intervening distance is too short to allow reinitiation. the fourth regulatory mechanism diagrammed in fig. 3 is used only rarely.
mammalian adometdc mrna is the best studied example in which a small uporf encodes a peptide which functions in cis to inhibit downstream translation. the nascent peptide (magdis) produced during translation of the uporf is thought to interact with ribosomes in a way that prevents completion of the termination process and thus prevents reinitiation (law et al., 2001) . the stalled ribosome, held at the termination site of the uporf, would also block other ribosomes from reaching the downstream start site via leaky scanning. biologically, this mechanism is important because ado-metdc is a key enzyme in polyamine biosynthesis and, at least in vitro, elevated polyamine levels stabilize the ribosome complex stalled at the end of the uporf (law et al., 2001) . in other words, elevated polyamine levels down-regulate translation of adometdc. it is interesting to note parenthetically that antizyme, a protein that downregulates polyamine levels, is also translated via a polyamine-sensitive mechanism. elevated polyamine levels up-regulate production of antizyme by promoting a ribosomal frameshift needed to translate the full-length protein (ivanov et al., 2000) . the foregoing examples illustrate how reinitiation operates as part of the normal translation mechanism in cases where uporfs are constitutively present in mrnas. there are other cases in which a reinitiation mechanism kicks in only when a nonsense mutation is introduced in a way that truncates the coding domain. in effect, the normal aug initiator codon becomes the start of an uporf, following which reinitiation occurs at a normally silent internal aug codon (chang and gould, 1998; ledley et al., 1990; zoppi et al., 1993) . the n-terminally truncated protein thus produced sometimes retains enough function to mitigate the pathological effects of the nonsense mutation (chang and gould, 1998) . 
this potential rescue device often fails, however, because many mrnas are rapidly degraded when a nonsense codon is introduced (frischmeyer and dietz, 1999; he and jacobson, 2001). the mrna decay pathway that targets these abnormal mrnas is activated in part by cis elements located in the coding domain (gudikote and wilkinson, 2002), which might explain why normal uporf-containing mrnas (e.g. those discussed in figs. 3 and 4) are not rapidly degraded.

initiation factor eif2 plays a key role in translational control (clemens, 2001; dever, 2002) and mutations that perturb regulation of eif2 have profound pathological consequences (delépine et al., 2000; han et al., 2001; harding et al., 2001; van der knaap et al., 2002). human genetic disorders have been traced also to disruption of regulatory mechanisms mediated by mrna binding proteins (cazzola and skoda, 2000; cazzola et al., 2002; kaytor and orr, 2001; mikulits et al., 1999). here, however, i focus on pathologies resulting from increases or decreases in translation caused directly by changes in mrna structure. the preceding paragraph mentioned some cases in which an n-terminally truncated protein is produced, apparently by reinitiation, when a mutation introduces a premature nonsense codon. the effects of some other types of mutations can also be understood in light of the scanning mechanism, as outlined next and discussed elsewhere in more detail (kozak, 2002).

recent investigations have identified diseases that result from failure to produce one of the two protein isoforms derived from genes that encode certain transcription factors (table 4). because the second isoform often functions as a modulator, the transcriptional imbalance caused by these changes in translation can have serious consequences.

hereditary diseases have been traced occasionally to point mutations that alter the context flanking the aug start codon. the list includes α-thalassaemia caused by an a → c change in position −3 of the α-globin gene (morlé et al., 1985), androgen insensitivity syndrome caused by a g → a mutation in position +4 of the androgen receptor gene (choong et al., 1996), and ataxia with vitamin e deficiency caused by a c → t mutation in position −1 of the α-tocopherol transfer protein gene (usuki and maruyama, 2000). there is an interesting report of a somatic point mutation (g → c in position −3) in the brca1 gene in a highly aggressive case of sporadic breast cancer (signori et al., 2001). in mice, a screen for mutations that cause defects in eye development uncovered an a → t change in position −3 of the pax6 gene (favor et al., 2001). each of these mutations was shown to cause a decrease (generally two- to four-fold) in translation. not every mutation or polymorphism within the consensus motif can be explained simply, however. other considerations, such as codon usage, might prevent an increase in translation even when the context is improved (i.e. translation might be limited at the level of elongation rather than initiation), and some mutations near the aug codon might affect mrna processing or stability rather than translation per se. a clinically relevant polymorphism in position −1 of annexin v appears to have an effect on translation which is inconsistent with the context rules (gonzález-conejero et al., 2002), but the effect was small and documented only by assaying translation in vitro, which is not always reliable (section 6.2). a natural polymorphism in position −5 of the glycoprotein ibα gene displayed a small effect on translation in vitro that was consistent with the rules (c worked better than t; afshar-kharghan et al., 1999) but, in the same study, mutations that changed position +4 from c to g did not augment translation. testing mutations in position +4 is tricky, however, because the change in identity of the penultimate amino acid might affect protein stability in ways that obscure the effects on translation.
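the context rules referred to above can be expressed as a small check on the two positions that matter most, −3 and +4, numbering the a of the aug codon as +1. the three-way labelling used below (strong, adequate, weak) is a simplification adopted for this sketch, and the function names and the test sequence are hypothetical. the code does not model codon usage, mrna stability, or the protein-stability effects of changing position +4 that are mentioned above.

```python
PURINES = {"A", "G"}


def aug_context(mrna, aug):
    """Classify the context of the AUG whose A sits at 0-based index `aug`."""
    mrna = mrna.upper().replace("T", "U")
    if mrna[aug:aug + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    # Position -3 is three nucleotides before the A; +4 directly follows the G.
    minus3 = mrna[aug - 3] if aug >= 3 else None
    plus4 = mrna[aug + 3] if aug + 3 < len(mrna) else None
    purine_at_minus3 = minus3 in PURINES
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


def point_mutant(mrna, index, base):
    """Return a copy of the sequence with a single base substituted."""
    return mrna[:index] + base + mrna[index + 1:]


wild_type = "GCCACCAUGG"                       # hypothetical; AUG starts at index 6
minus3_mutant = point_mutant(wild_type, 3, "C")  # A -> C in position -3
double_mutant = point_mutant(minus3_mutant, 9, "U")  # and G -> U in position +4

for name, seq in [("wild type", wild_type),
                  ("-3 mutant", minus3_mutant),
                  ("-3/+4 mutant", double_mutant)]:
    print(name, seq, aug_context(seq, 6))
```

a classifier of this kind predicts only the direction of the effect on initiation; whether translation actually falls has to be measured, ideally with an assay that monitors the initiation step directly.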
the solution is to use an assay that directly monitors the initiation step of translation (kozak, 1997) . the scanning mechanism predicts that a mutation which weakens or destroys the normal start codon should activate initiation from the next aug downstream. in some hereditary diseases in which the aug start codon is ablated, a truncated protein is indeed produced in this way but it does not function well enough to prevent the disease (cahana et al., 2001; huang et al., 1999; o'neill et al., 2001) . in the case of a mutated vasopressin gene in which the g of the aug start codon is deleted, the shorter signal peptide initiated from the second aug codon is not recognized by signal peptidase (beuret et al., 1999) . the resulting uncleaved vasopressin-precursor protein folds incorrectly, causing subsequent processing steps to fail, and therefore vasopressin never gets released from the endoplasmic reticulum. the second aug is only four codons downstream from the first, but the processing defect caused by this slight shift in the site of initiation causes diabetes insipidus. the scanning mechanism predicts that, when an out-offrame aug codon is introduced into the 5 0 utr, the adventitious upstream start codon should supplant the normal start site. a number of pathologies result from this kind of translational block. sometimes the upstream aug codon is created by a rare mutation (cai et al., 1992; liu et al., 1999) . other times it derives from a common polymorphism (bergenhem et al., 1992; endler et al., 2001; kanaji et al., 1998; kraft et al., 1998; zysow et al., 1995) . the reduction in translation is more or less severe depending on the context of the upstream aug codon and whether reinitiation is possible. i have already explained how hereditary thrombocythemia is caused by mutations that elevate translation of tpo by restructuring or eliminating an inhibitory uporf (fig. 4) . 
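the two scanning-model predictions stated above (loss of the normal aug shifts initiation to the next aug downstream; a new upstream aug supplants the normal start site) can be illustrated with a deliberately naive model in which initiation always occurs at the 5′-proximal aug. the sequences and helper names below are hypothetical, and the model ignores context, leaky scanning and reinitiation, so it shows only the direction of the shift and not its magnitude.

```python
def first_aug(mrna):
    """0-based index of the 5'-proximal AUG, or -1 if there is none."""
    return mrna.upper().replace("T", "U").find("AUG")


def substitute(mrna, index, base):
    """Return a copy of the sequence with a single base substituted."""
    return mrna[:index] + base + mrna[index + 1:]


def same_frame(i, j):
    """True if two start positions share a reading frame."""
    return (j - i) % 3 == 0


# Hypothetical mRNA: normal start at index 8, a second in-frame AUG at index 17.
wild_type = "CAGGCACCAUGGCUCUGAUGCCAUAA"
normal = first_aug(wild_type)

# Prediction 1: ablating the normal start codon (AUG -> AGG) moves initiation
# to the next AUG downstream, giving an N-terminally truncated product.
ablated = substitute(wild_type, 9, "G")
print("ablated start:", first_aug(ablated),
      "in frame with normal start:", same_frame(normal, first_aug(ablated)))

# Prediction 2: a point mutation (G -> U) that creates an AUG in the leader
# captures initiation; here the new AUG is out of frame with the normal start.
upstream = substitute(wild_type, 2, "U")
print("new upstream start:", first_aug(upstream),
      "in frame with normal start:", same_frame(first_aug(upstream), normal))
```

how much translation of the main orf is lost in the second case depends, as the text notes, on the context of the new aug and on whether reinitiation is possible, neither of which this sketch attempts to capture.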
translation of proto-oncogenes is also elevated in some cases by eliminating small uporfs from the mrna. the mdm2 oncogene is one example: whereas the normal mrna has a long 5′ utr that includes two upstream aug codons, in tumor cells the use of a different promoter eliminates the upstream augs and thus increases translational efficiency 20-fold (brown et al., 1999; landers et al., 1997). in the case of oncogene gli1, the upstream aug codons that restrict translation in normal cells reside in an intron which is eliminated by more efficient splicing in basal cell carcinomas (wang and rothnagel, 2001). translation of many other human or rodent oncogenes is restricted in normal cells by an encumbered (aug-burdened or gc-rich) leader sequence (arrick et al., 1991; bates et al., 1991; child et al., 1999a; harigai et al., 1996; hoover et al., 1997; horvath et al., 1995; manzella and blackshear, 1990; sarrazin et al., 2000); in some of these cases, a shorter 5′ utr that better supports translation is produced in transformed cells (arrick et al., 1994; marth et al., 1988). for other oncogenes, although there are alternative leader sequences that might regulate expression in normal tissues (link et al., 1992; sasahara et al., 1998), there is no evidence that switching leader sequences contributes to tumorigenesis.

table 4. pathologies resulting from a change in mrna structure which selectively abolishes production of the long or short form of a transcription factor
columns: gene (species) | translational mechanism that normally generates two protein isoforms | disease-associated change in mrna structure and translation | references
c/ebpα (human) | two proteins from one mrna via leaky scanning + reinitiation shunt | in acute myeloid leukemia, mutations near amino terminus eliminate production of longer isoform. | pabst et al., 2001
gata1 (human) | two proteins from one mrna via leaky scanning | in down syndrome-related leukemia, premature stop codon eliminates production of longer isoform. | wechsler et al., 2002
lef1 (human) | two proteins from two mrnas (via two promoters) | in colon cancer, failure to activate downstream promoter prevents production of shorter (inhibitory) isoform. | hovanes et al., 2001
rx/rax (mouse) | two proteins from one mrna via leaky scanning (a) | in eyeless mice, mutation of second aug, leaving only the weak upstream start codon, results in inadequate yield. | tucker et al., 2001
(a) here the long and short isoforms appear to function identically; the significance of the second aug start codon pertains to boosting the overall protein yield. the eyeless mouse serves as a spontaneous model for human anophthalmia.

whereas removal of small uporfs elevates the translation of the aforementioned mdm2 and gli1 oncogenes in tumor cells, addition of small uporfs shuts off the translation of some tumor suppressor genes. in the case of hyal1, retention of an intron which contains eight upstream aug codons renders the mrna untranslatable in squamous cell carcinomas (frost et al., 2000). a striking example of translational inactivation of a tumor suppressor gene is seen in some individuals with a predisposition to melanoma. in certain families, a point mutation (g → t) creates an upstream, out-of-frame aug codon in the cdkn2 gene. the small uporf initiated from this new aug codon overlaps the cdkn2 start codon, and the resulting inhibition of translation is profound. structural changes that attenuate the translation of viral mrnas can contribute to the development of persistent infections. the 5′ leader sequence on bovine coronavirus mrnas, for example, was found to evolve, by acquiring a small uporf, during the course of establishing a persistent infection (hofmann et al., 1993). shishido-hara et al. (2000) speculate that human polyomavirus jc might cause persistent rather than acute infection because all capsid-protein encoding transcripts produced by the jc virus have a small uporf.
with the related simian virus 40, in contrast, the uporf is sometimes eliminated by splicing, generating transcripts that better support translation of the major capsid protein. attenuating effects caused by introducing an upstream aug codon have been described also with other viruses (petty et al., 1990; slobodskaya et al., 1996) . a more drastic restructuring of mrnas sometimes occurs during the establishment of persistent infections by the measles virus. instead of the normal monocistronic mrna for the fusion protein, the predominant transcript in some persistently infected cells was a dicistronic mrna from which the f cistron, located at the 3 0 end, could not be translated (hummel et al., 1994) . a similar problem encountered in studies with recombinant rhabdoviruses provides insight into the transcriptional defects that can generate untranslatable dicistronic mrnas (quiñones-kochs et al., 2001) . in the case of a human parvovirus, productive infection is restricted to a subset of erythroid cells in which splicing generates a monocistronic mrna for each of the major capsid proteins. in nonpermissive cells, a slight shift in the position of a splice site imposes an upstream orf which is postulated to restrict translation of the capsid proteins (brunstein et al., 2000) . translational twists sometimes generate antigens which, by stimulating the ctl response, are important in the host defense against tumor cells and viruses (shastri et al., 2002) . leaky scanning is a likely explanation in several cases where the major orf starts with an aug codon in a suboptimal context and the ctl antigen derives from initiation at the next (out-of-frame) aug (aarnoudse et al., 1999; bullock et al., 1997; probst-kepper et al., 2001; rimoldi et al., 2000) . 
in one notable case, translation shifts upstream to an in-frame aug codon created during insertion of a provirus, and the resulting novel n-terminal amino acid extension functions as a tumor rejection antigen (wada et al., 1995). the scanning mechanism cannot explain the translation of ctl antigens for which the start codon resides far in the interior of the mrna (ronsin et al., 1999; wang et al., 1996). in these cases the antigenic peptide might be produced from an undetected alternative form of mrna. sensitive new assay techniques employed with some genes indeed reveal an array of alternative transcripts from which novel tumor antigens can be translated (behrends et al., 2002). in another study, a potent tumor rejection peptide, which maps to an internal aug codon in the full-length cdna, was expressed experimentally from a truncated cdna wherein the start codon for the antigenic peptide was made the first aug (rosenberg et al., 2002). additional analyses are needed to determine whether, in the melanoma cells wherein this antigen is expressed naturally, a transcript similar to the experimentally truncated cdna is produced via a downstream promoter or splice site.

6. surveys and assays and problems therein

6.1. cdna surveys

surveys of mrna/cdna sequences differ in other details, but every survey confirms the presence of a purine in position −3 in most (~90%) vertebrate mrnas (kozak, 1987a; pesole et al., 2000; rogozin et al., 2001; sakai et al., 2001). the occasional survey that purports to challenge the context rules involves distortions, such as emphasizing the low percentage of cdnas that have the full consensus sequence while ignoring the high percentage of cdnas that have the critical purine in position −3 (peri and pandey, 2001). a major uncertainty pertaining to all cdna surveys concerns the validity of the database.
when i re-examined the entries in one study (suzuki et al., 2000), i found numerous instances in which the aug start codon had been misidentified; the corrected start sites adhered more closely to the consensus motif (kozak, 2000). some authors pre-emptively defend their conclusions on the grounds that the (unidentified) cdna sequences used for their analysis derive from refseq, which is a curated database (pruitt et al., 2000). but the entries in refseq are not without errors, some of which (e.g. misidentified start codons, mistaken claims of upstream aug codons) can be traced by comparing curated genbank entries nm_005493, nm_005502, nm_000282 and nm_003605 with results published elsewhere (campeau et al., 2001; nishitani et al., 2001; nolte and müller, 2002; santamarina-fojo et al., 2000). some cdna surveys use misleading terminology, e.g. referring to upstream aug codons as 'unused' (peri and pandey, 2001). given that upstream aug codons are used, as proved by detecting the encoded peptide or by fusing the uporf to a reporter gene, it is not anomalous to find a good context around some upstream aug codons. all surveys tend to overestimate the incidence of upstream augs by scoring only the longest cdna isoform, ignoring the existence of alternative transcripts that have shorter, unencumbered 5′ utrs (section 4.2). the significance of upstream aug codons also tends to be misstated: the presence of small uporfs in vertebrate mrnas which are thereby translated inefficiently (see the foregoing discussion of tpo, oncogenes, etc.) constitutes evidence for, rather than against, the scanning model. the first-aug rule, which i cite as evidence for the scanning mechanism, derives not from statistical analysis of cdna sequences but from the experimentally observed fact (section 2.2) that translation shifts predictably upstream or downstream when an aug codon is added or removed.
in short, it makes more sense to use the scanning/context rules to evaluate cdna sequences (hatzigeorgiou, 2002) than to attempt the reverse. while conclusions about translation derived from experimental studies are arguably more meaningful than those derived from statistics, the interpretation of experimental results can be complicated. in vivo assays avoid the problem of reaction-conditions-dictating-the-outcome (see next paragraph), but there are other potential traps. the usually-valid assumption that polysomal association identifies actively translated mrnas is called into question by the recent discovery of mrnas trapped on large polysomes from which there is no polypeptide production (rüegsegger et al., 2001). the major problem when translation is studied in vivo is uncertainty about the structure of the mrna. for example, a claim that ires-mediated translation is developmentally regulated (créancier et al., 2000, 2001) is premature, inasmuch as those studies monitored the amount but not the form of mrna. when mrna structure is examined, the developmentally regulated expression might be found to reflect activation of an internal promoter rather than activation of an ires. other studies wherein translation of an encumbered leader sequence appears to improve under certain conditions or in certain cell types require better analyses to rule out a possible change in structure of the 5′ utr (bernstein et al., 1995; child et al., 1999b; li et al., 2001; zimmer et al., 1994). some useful hints may be found in reports that describe the belated discovery of alternative forms of mrna that were missed the first time around (cortner and farnham, 1990; déjardin et al., 2000; deng et al., 2002; frost et al., 2000; grundhoff and ganem, 2001; jordan et al., 1996; kastner et al., 1990b; kiss-lászló et al., 1995; laurin et al., 2000; peremyslov and dolja, 2002; zhang and liu, 2000; zheng et al., 1994).
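As a rough illustration of using the scanning rules to evaluate a cdna sequence, the following Python sketch lists every aug upstream of an annotated start codon and sorts each one into the three situations discussed in the text: a small uporf that terminates before the main start, an out-of-frame uporf that overlaps the main start, or an in-frame aug that would extend the n-terminus. The function name, the category labels and the example sequences are hypothetical, and the sketch assumes that the supplied sequence really is the full 5′ end of the functional mrna, which is precisely the assumption the text warns about.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_upstream_augs(seq, main_start):
    """Classify each AUG located upstream of the annotated start codon.

    seq        : mRNA or cDNA sequence (T is treated as U)
    main_start : 0-based index of the A in the annotated AUG
    Returns a list of (index, category) tuples in 5'-to-3' order.
    """
    rna = seq.upper().replace("T", "U")
    found = []
    for i in range(main_start):
        if rna[i:i + 3] != "AUG":
            continue
        # Walk codon by codon from the upstream AUG to its first in-frame stop.
        stop = None
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOP_CODONS:
                stop = j
                break
        if stop is not None and stop + 3 <= main_start:
            category = "terminates_upstream"   # small uORF; reinitiation possible
        elif (main_start - i) % 3 == 0:
            category = "in_frame_extension"    # longer, N-terminally extended isoform
        else:
            category = "overlapping_uorf"      # out of frame, runs past the main AUG
        found.append((i, category))
    return found


# Hypothetical leader: one small uORF (AUG CCC UAA) ahead of the main AUG at index 17.
print(find_upstream_augs("GGAUGCCCUAAGGCACCAUGGCG", 17))
```

A survey built on such a function would still overestimate the incidence of upstream augs if only the longest cdna isoform were scored; running it on each documented transcript variant separately is the safer design.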
in vitro translation assays pose a different set of problems. the commercial availability of in vitro translation kits is both a blessing (the systems are easy to use) and a curse, the latter because insufficient attention is paid to reaction conditions that can affect the selection of aug start codons. when the magnesium concentration is too low, the first aug codon may be bypassed despite an adequate context; when the magnesium concentration is too high, initiation may occur at upstream non-aug codons that are not naturally used. one solution is to include control transcripts for which start-codon selection was determined in vivo, and to adjust the in vitro reaction conditions to give the same result (kozak, 1990b). some suppliers of translation kits make it possible to adjust the magnesium concentration, but there is little awareness of the need to do so and the use of coupled transcription/translation systems makes it difficult. for whatever reason, in vitro translation results sometimes deviate significantly from what is seen in vivo vis-à-vis access to internal aug codons (grove et al., 1991; land and rouault, 1998; meijer et al., 2000; meulewaeter et al., 1992; mitchelmore et al., 2002; saucedo et al., 1999) and the degree of inhibition caused by small uporfs (ghilardi et al., 1998; harigai et al., 1996; pecqueur et al., 1999, 2001; tanaka et al., 2001; wang and wessler, 1998). the fidelity of initiation in vitro is clearly impaired, possibly due to degradation of the mrna, in cases where extraneous, low molecular weight polypeptides are produced (herbert et al., 1996; liu and biegalke, 2002; lekven et al., 2001, fig. 3c; maser et al., 2001, fig. 4b; packham et al., 1997, fig. 5). the possibility that the input mrna might undergo cleavage during incubation in vitro complicates attempts to study the expression of dicistronic mrnas, as discussed in the next section.
this type of artifact is not ruled out by finding that only certain downstream orfs are translated (o'connor and brian, 2000). extrapolating from what is seen when mrnas are deliberately cleaved in vivo (thoma et al., 2001), activation of internal start codons in vitro would depend on where the accidental cleavage occurs and whether the endolytic cleavage product persists long enough for a ribosome to engage the newly created 5′ end before exonucleases take over. this line of reasoning could explain the claim that an 'artificial ires', consisting of a multiple cloning region and a portion of the escherichia coli laci gene, supports internal initiation of translation in starved yeast cells (paz et al., 1999): starvation is likely to promote mrna degradation, and the 'ires' might fortuitously stabilize certain intermediates in degradation. the discovery that ires elements are actually targeted by some ribonucleases (elgadi and smiley, 1999; nadal et al., 2002) should be remembered. recent studies that use a primer-extension inhibition (toeprinting) assay to monitor the binding of ribosomes to mrnas have the advantage of focusing directly on the initiation step, but care is needed to distinguish authentic initiation complexes from artifactual pauses in primer extension caused by base-paired structures or extraneous proteins bound to the mrna. the complicating effects of mrna secondary structure, which are prominent when avian reverse transcriptase is used for toeprinting, can be minimized by using a form of the enzyme derived from murine leukemia virus (kozak, 1998). attempts to explain the origin of multiple isoforms of eif4g (bradley et al., 2002) illustrate how challenging it can be to interpret translation assays.
in vitro experiments presented in support of the idea that translation can initiate from five aug codons, in a single form of eif4g mrna, might have been compromised by mrna breakage; this would explain the production of an array of extraneous smaller polypeptides (byrd et al., 2002, fig. 3c, lanes 1, 3 and 5). translation of some eif4g isoforms from broken mrnas could also explain the variability in yields noted throughout that study. when the endogenous eif4g gene is expressed in vivo, access to certain downstream aug codons might occur via alternative splicing or internal promoters; both mechanisms have been documented in studies of eif4g by other investigators (han and zhang, 2002, and references therein). thus, even though one could rationalize the production of at least three isoforms of eif4g from one mrna via established translational mechanisms (an overlapping uporf could shunt some 40s ribosomal subunits past the first in-frame aug codon at position 275, and the unfavorable context at aug 395 might allow some ribosomes to reach aug 536 by leaky scanning), it would be premature to propose that solution. the in vitro experiments need to be repeated with careful attention to magnesium levels and with efforts to minimize mrna breakage. the latter might be accomplished by lowering the temperature to 25 °c and limiting the window for initiation to 5 or 10 min. (addition of edeine after the first 5 or 10 min, followed by another period of incubation, would allow polypeptides to be elongated without further initiation events.) the possible production of some eif4g isoforms by proteolysis also needs to be ruled out, as this protein is notoriously susceptible to cleavage. the study by byrd et al. (2002) included experiments carried out with dicistronic transcripts, predicated on the belief that eif4g mrna contains ires elements which allow direct internal initiation of translation.
an eif4g/egfp (enhanced green fluorescent protein) fusion gene positioned at the 3′ end of a dicistronic transcript was translatable in vitro, but the aforementioned possibility of mrna breakage complicates the interpretation. indeed, the unexpected translation of egfp from the 3′ position even without fusion to eif4g (byrd et al., 2002, fig. 7a, lane 3) is most easily explained by mrna cleavage. (the authors invoke reinitiation as the explanation, but reinitiation cannot occur following translation of a large 5′ cistron.) fragmentation of the mrna could explain why translation of the 3′ eif4g/egfp cistron persisted when translation of the 5′ cistron was blocked by a hairpin structure (byrd et al., 2002, fig. 7b). the hairpin test, widely used to test for internal initiation, is meaningless without evidence that the dicistronic input mrna remains intact. the in vivo tests of eif4g translation (byrd et al., 2002, fig. 8) also require careful rna analyses to document that the vector produces only the intended dicistronic mrna; the quality of the northern blot in that figure falls far short of what is required. to rule out the possibility that the 3′ cistron might be translated from an unintended monocistronic mrna, a promoter-deletion control is needed, that is, a control which shows that, upon deleting the promoter that precedes the 5′ cistron, expression of the 3′ cistron is also abolished. this test failed in studies with some other sequences, revealing that the candidate ires actually harbors a cryptic promoter (han and zhang, 2002; larsen et al., 2002). the foregoing discussion of eif4g translation alludes to only some of the problems associated with dicistronic vectors; a more detailed critique may be found elsewhere (kozak, 2001a).
use of a certain popular vector which harbors an intron near the 5′ end (jopling and willis, 2001) increases the likelihood of producing an unintended monocistronic mrna via splicing; the candidate ires need contribute only a cryptic 3′ splice site. this indeed happens in some cases (grundhoff and ganem, 2001; pinkstaff et al., 2001). claims of ires activity are problematic when supported by in vitro assays in which translation of the 3′ orf is very, very weak (e.g. deffaud and darlix, 2000, fig. 2; lekven et al., 2001, fig. 3b, lane 3; maser et al., 2001, fig. 4e). the simple idea that an ires can be identified based on the ability to support translation of a 3′ cistron runs into trouble when, for example, the β-globin mrna leader sequence, intended to serve as a negative control, turns out somehow to allow translation of a downstream cistron (van der velden et al., 2002). in another study, merely lengthening the intercistronic domain enabled substantial translation of the 3′ cistron (gallie et al., 2000), perhaps by providing room for rnases to cleave and thus release a translatable 3′ fragment. even with the paradigmatic ires elements derived from picornaviruses, the ability to support internal initiation was found to depend inexplicably on the choice and arrangement of 5′ and 3′ reporter genes (hennecke et al., 2001). these oddities, along with the notable inability to translate the 3′ cistron in natural dicistronic mrnas (table 1), are reason to worry about the validity of experiments that employ synthetic dicistronic constructs. the proffered rationale for a cap-independent internal initiation mechanism is that it would enable certain mrnas to be translated when eif4e levels decline, but recent experiments presented in support of that idea used the 5′ utr from poliovirus rather than 5′ utrs from cellular mrnas, such as eif4g, that are supposedly regulated via 'a dynamic interplay between cap-dependent and cap-independent processes'.
even if the proffered rationale is valid, convincing evidence for direct internal initiation with particular mrnas is needed. the widely used dicistronic assay has flaws, as outlined above. an alternative assay which involves circularization of the mrna has been attempted with only one viral ires element (chen and sarnow, 1995); the results await independent verification and extension to other sequences.

the scanning model provides a framework for understanding basic patterns of eukaryotic gene expression, such as the reliance on monocistronic mrnas, and for understanding how translation is perturbed by mutations that restructure the 5′ utr. a growing number of human diseases have been traced to such mutations. the scanning mechanism has been shown to operate not only with simple mrnas that have a short 5′ utr and initiate at the first aug codon, but also with mrnas that have complicated leader sequences and multiple start codons. one often hears the suggestion that an alternative, ires-mediated mechanism of initiation is required when a long leader sequence is encumbered by secondary structure or upstream aug codons (dever, 2002; pestova et al., 2001). that view is not well taken. scanning can occur over long distances, as evidenced by some bifunctional viral mrnas in which the second start site is more than 500 nt downstream from the first (e.g. peanut clump virus, southern bean mosaic virus, rice tungro bacilliform virus; table 3 and fig. 1c). the structure-prone, gc-rich leader sequences on mammalian mrnas strongly reduce translational efficiency but do not preclude operation of the scanning mechanism (van der velden et al., 2002). upstream aug codons also reduce translational efficiency and that is why they are there. to postulate the need for an alternative mechanism is to miss the point that an encumbered leader sequence ensures that translation via scanning will be inefficient, and thus ensures against harmful overproduction of cytokines (fig. 4) and other potent proteins. the high frequency of intron-containing cdna sequences (kozak, 1991a, 1996) might reflect another type of regulation. inefficient or regulated removal of the first intron has been documented in some cases (boularand et al., 1995; frost et al., 2000; van der leij et al., 2002; wang and rothnagel, 2001; xie et al., 1991; zachar et al., 1987) and i suspect that additional examples might be found, miscategorized, among the aforementioned cdnas that are postulated to require an alternative mechanism of initiation. removal of the intron, or use of a cryptic promoter therein, would eliminate the upstream aug codons that are barriers to scanning. examples in which translation is prevented deliberately by splicing-out the exon that contains the aug start codon (lin et al., 1998) or by other regulated splicing events (rueter et al., 1999) underscore the point that not every transcript, and hence not every cdna, corresponds to a functional mrna. before postulating the need for a new mechanism to explain how a funny looking cdna gets translated, one must be certain that it is translated. the mechanisms discussed herein for escaping the first-aug rule, within the constraints imposed by the scanning model, obviously cannot explain every report of initiation from an internal position. more information is needed to understand how n-terminally truncated versions of some proteins are produced apparently without truncation of the mrna (goss et al., 2002; maser et al., 2001; santagata et al., 2000; scharnhorst et al., 1999; vanhoutte et al., 2001). an ires element was postulated in some of those cases, based on the dicistronic test, but in one study there were no accompanying analyses of rna structure in vivo (goss et al., 2002), and in another study the use of an in vitro translation system produced too little of the truncated protein to be convincing (maser et al., 2001).
speculation about how some other interesting genes are translated (klemke et al., 2001) also must be postponed pending a search for possible additional forms of mrna. although i listed the von hippel-lindau tumor suppressor gene as a possible example of leaky scanning (table 3), definitive tests are needed to distinguish between that and other mechanisms for producing the short isoform (iliopoulos et al., 1998). those of us with an interest in translation have a tendency to interpret every change in mrna structure as a means to control translation, but transcriptional requirements (the need to turn on a gene in various tissues via whatever promoter works in each tissue) underlie most switches in 5′ leader sequences. in some cases the actual sequence of the 5′ utr is dictated by the presence therein of transcriptional control elements (akiri et al., 1998; minami et al., 2001; solecki et al., 1997; yin and blanchard, 2000; yu et al., 2001; zimmermann et al., 2000). regulation of transcription is the major reason for the gc-rich domains near the 5′ end of many mammalian genes; the accompanying down-modulation of translation is an inevitable consequence, and arguably a useful one because, given the long half-life of most mammalian mrnas, inefficient translation might be a necessity. it merits repeating that, although the m7g cap strongly promotes ribosome binding, the scanning mechanism is not dependent on the presence of the cap. the essence of the scanning model is 5′ entry of ribosomes and position-dependent selection of the aug start codon. those key points hold with naturally uncapped mrnas produced by some viruses (footnote g in table 1) and with synthetic uncapped mrnas used to study translation in vitro (kozak, 1979a). the inclination to invoke internal initiation based on indirect criteria (absence of a cap, or the ability to be translated in extracts from poliovirus-infected cells) should be resisted.
it is a mistake to think that, because archaeal mrnas lack a 5′ cap, translation in that system cannot occur via scanning. the discovery in archaea of proteins similar to certain eukaryotic initiation factors (kyrpides and woese, 1998) is intriguing for other reasons but has no direct bearing on whether the start codon in archaeal mrnas might be recognized via a prokaryotic- or eukaryotic-type mechanism. that interesting question, which bears on the evolutionary origin of scanning, awaits answering. fundamental questions about the molecular workings of the scanning mechanism also await answering. what drives migration of the 40s subunit during the scanning phase? how does the 40s subunit hold on at a terminator codon, in order to reinitiate? what prevents reinitiation when the size of the first orf exceeds a certain length? we know nothing about how recognition of the start codon is aided by a purine in position −3 and g in position +4. there is no evidence for base pairing between the gccrcc motif and rrna (or for binding of rrna to any other sequence in eukaryotic mrnas). there is as yet no convincing evidence for recognition of gccrcc by a trans-acting protein factor. it would be easy, and meaningless, simply to find proteins that bind an rna fragment which contains the motif. credible experiments would require controls based on what we know about the consensus sequence: that the purine (a > g) in position −3 plays a dominant role, and the full effect requires that the gccrcc motif abut the aug codon (kozak, 1999, fig. 1). with so much effort being directed to searching for possible exceptions to the scanning mechanism, one can only wish that some enterprising soul would tackle these important questions.
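The two context positions singled out above (a purine in position −3 and g in position +4, counting the a of the aug codon as +1) lend themselves to a small worked example. The Python sketch below labels an aug as strong, adequate or weak and then walks the mrna from the 5′ end, collecting augs until one in a strong context is reached. The three-way labeling, the function names and the stopping rule are simplifying assumptions made for illustration: the sketch ignores reinitiation, secondary structure, non-aug codons and the quantitative difference between a and g at −3, so it is not a predictor of actual initiation frequencies.

```python
PURINES = {"A", "G"}


def kozak_context(mrna, aug_index):
    """Label the context of the AUG whose A sits at 0-based index aug_index.

    Position -3 is three nucleotides before the A; position +4 is the
    nucleotide immediately after the G of the AUG.
    """
    rna = mrna.upper().replace("T", "U")
    if rna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = rna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = rna[aug_index + 3] if aug_index + 3 < len(rna) else ""
    has_purine = minus3 in PURINES
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


def candidate_start_sites(mrna):
    """First-AUG rule with leaky scanning, in caricature.

    Scans 5' to 3' and returns (index, context) for each AUG encountered,
    stopping at the first AUG in a strong context (assumed here to capture
    essentially all scanning ribosomes).
    """
    rna = mrna.upper().replace("T", "U")
    sites = []
    i = rna.find("AUG")
    while i != -1:
        context = kozak_context(rna, i)
        sites.append((i, context))
        if context == "strong":
            break
        i = rna.find("AUG", i + 1)
    return sites


# Hypothetical mRNA: a weak first AUG followed by an AUG in the gccAccAUGG context.
print(candidate_start_sites("UUUUUUAUGCCCACCAUGG"))
```

A binding experiment of the kind called for in the text would need exactly the controls this labeling encodes: a mutant at −3, a mutant at +4, and a construct in which the motif no longer abuts the aug.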
interleukin-2-induced, melanoma-specific t cells recognize camel, an unexpected translation product of lage-1 suppression of ribosomal reinitiation at upstream open reading frames in amino acid-starved cells forms the basis for gcn4 translational control subcellular fate of the int-2 oncoprotein is determined by choice of initiation codon kozak sequence polymorphism of the glycoprotein (gp) iba gene is a major determinant of the plasma membrane levels of the platelet gp ib-ix-v complex specific sequences in p120ctn determine subcellular distribution of its multiple isoforms involved in cellular adhesion of normal and malignant epithelial cells unconventional mrna processing in the expression of two calcineurin b isoforms in dictyostelium uroporphyrinogen iii synthase. an alternative promoter controls erythroid-specific expression in the murine gene regulation of vascular endothelial growth factor (vegf) expression is mediated by internal initiation of translation and alternative initiation of transcription abundant early expression of gpul4 from a human cytomegalovirus mutant lacking a repressive upstream open reading frame apobec-1 transcription in rat colon cancer: decreased apobec-1 protein production through alterations in polysome distribution and mrna translation associated with upstream augs identification of a brain-specific protein kinase cj pseudogene (cpkcj) transcript tissue-specific and ubiquitous promoters direct the expression of alternatively spliced transcripts from the calcitonin receptor gene structural characterization of sil, a gene frequently disrupted in t-cell acute lymphoblastic leukemia cloning and characterization of the gene encoding rabbit cardiac calsequestrin a new 34-kilodalton isoform of human fibroblast growth factor 2 is cap dependently synthesized by using a non-aug start codon and behaves as a survival factor inhibition of translation of transforming growth factor-b3 mrna by its 5 0 untranslated region enhanced translational 
efficiency of a novel transforming growth factor b3 mrna in human breast cancer cells translational initiation competence, 'leaky scanning' and translational reinitiation in area mrna of aspergillus nidulans multiple roles for the cterminal domain of eif5 in translation initiation complex assembly and gtpase activation biosynthesis of osteogenic growth peptide via alternative translational initiation at aug 85 of histone h4 mrna expression of murine il-12 is regulated by translational control of the p35 subunit the two proteins encoded by the cottontail rabbit papillomavirus e6 open reading frame differ with respect to localization and phosphorylation biosynthesis of human fibroblast growth factor-5 novel products of the hud, huc, nnp-1 and a-internexin genes identified by autologous antibody screening of a pediatric neuroblastoma library yeast leu4 encodes mitochondrial and non-mitochondrial forms of a-isopropylmalate synthase mutation creates an open reading frame within the 5 0 untranslated region of macaque erythrocyte carbonic anhydrase (ca) i mrna that suppresses ca i expression and supports the scanning model for translation the translational repression mediated by the platelet-derived growth factor 2/c-sis mrna leader is relieved during megakaryocytic differentiation positive and negative regulation of myogenic differentiation of c2c12 cells by isoforms of the multiple homeodomain zinc finger transcription factor atbf1 mechanism of endoplasmic reticulum retention of mutant vasopressin precursor caused by a signal peptide truncation associated with diabetes insipidus alternative splicing of the imprinted candidate tumor suppressor gene zac regulates its antiproliferative and dna binding activities a global analysis of caenorhabditis elegans operons the avian reovirus genome segment s1 is a functionally tricistronic gene that expresses one structural and two nonstructural proteins in infected cells the 2.2 kb e1b mrna of human ad12 and ad5 codes for two tumor 
antigens starting at different aug triplets the human tryptophan hydroxylase gene. an unusual splicing complexity in the 5 0 -untranslated region mass spectrometric analysis of the n terminus of translational initiation factor eif4g-1 reveals novel isoforms the procaspase-8 isoform, procaspase-8l, recruited to the bap31 complex at the endoplasmic reticulum bunyamwera bunyavirus nonstructural protein nss is a nonessential gene product that contributes to viral pathogenesis role of two upstream open reading frames in the translational control of oncogene mdm2 microarray identification of fmrp-associated brain mrnas and altered mrna translational profiles in fragile x syndrome identification of a novel rna splicing pattern as a basis of restricted cell tropism of erythrovirus b19 initiation codon scanthrough versus termination codon readthrough demonstrates strong potential for major histocompatibility complex class i-restricted cryptic epitope expression generation of multiple isoforms of eukaryotic translation initiation factor 4gi by use of alternate translation initiation codons translational control of mammalian serine hydroxymethyl-transferase expression targeted mutagenesis of lis1 disrupts cortical development and lis1 homodimerization two novel b-thalassemia mutations in the 5 0 and 3 0 noncoding regions of the b-globin gene translational control of c/ ebpa and c/ebpb isoform expression alternative translation initiation site usage results in two functionally distinct forms of the gata-1 transcription factor structure of the pcca gene and distribution of mutations causing propionic acidemia translational inhibition by a human cytomegalovirus upstream open reading frame despite inefficient utilization of its aug codon the secreted form of invertase in saccharomyces cerevisiae is synthesized from mrna encoding a signal sequence translation of equine infectious anemia virus bicistronic tat-rev mrna requires leaky ribosome scanning of the tat ctg initiation codon 
the highly structured insulin-like growth factor ii-leader 1 opposing roles of elk-1 and its brain-specific isoform, short elk-1, in nerve growth factor-induced pc12 differentiation identification of rauscher murine leukemia virus-specific mrnas for the synthesis of gag-and env-gene products in vivo translation of the triple gene block of potato virus x requires two subgenomic mrnas the yeast transcription factor genes yap1 and yap2 are subject to differential control at the levels of both translation and mrna stability rejection antigen peptides on balb/c rlf1 leukemia recognized by cytotoxic t lymphocytes: derivation from the normally untranslated 5 0 region of the c-akt proto-oncogene activated by long terminal repeat genomic architecture and transcriptional activation of the mouse and human tumor susceptibility gene tsg101: common types of shorter transcripts are true alternative splice variants a human aminoacyl-trna synthetase as a regulator of angiogenesis analysis of the two subgenomic rna promoters for turnip crinkle virus in vivo and in vitro inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize r gene role of mrna secondary structure in translational repression of the maize transcriptional activator lc utilization of an alternative open reading frame of a normal gene in generating a novel human cancer antigen post-transcriptional regulation of the gli1 oncogene by the expression of alternative 5 0 untranslated regions a novel, testis-specific mrna transcript encoding an nh 2 -terminal truncated nitric-oxide synthase rna diversity has profound effects on the translation of neuronal nitric oxide synthase bunyamwera bunyavirus nonstructural protein nss counteracts the induction of alpha/beta interferon genomic organization of human mxi1, a putative tumor suppressor gene acquired mutations in gata1 in the megakaryoblastic leukemia of down syndrome infectious tymv rna from cloned cdna: effects 
in vitro and in vivo of point substitutions in the initiation codons of two extensively overlapping orfs expression of the hepatitis b virus core gene in vitro and in vivo cytomegalovirus assembly protein nested gene family: four 3 0 -coterminal transcripts encode four in-frame overlapping proteins an internal open reading frame triggers nonsense-mediated decay of the yeast spt10 mrna deregulation of translational control of the 65-kda regulatory subunit (pr65a) of protein phosphatase 2a leads to multinucleated cells the short 5 0 untranslated region of the betaa3/a1-crystallin mrna is responsible for leaky ribosomal scanning the human achaete scute homolog 2 gene contains two promoters, generating overlapping transcripts and encoding two proteins with different nuclear localization an activating splice donor mutation in the thrombopoietin gene causes hereditary thrombocythaemia cloning and characterization of two novel thyroid hormone receptor b isoforms a 6700 mw membrane protein is encoded by region e3 of adenovirus type 2 the glyoxysomal and plastid molecular chaperones (70-kda heat shock protein) of watermelon cotyledons are encoded by a single gene analysis and mapping of a family of 3 0 -coterminal transcripts containing coding sequences of human cytomegalovirus open reading frames ul93 through ul99 evidence that aguauauga and ccaagauga initiate translation in the same mrna in region e3 of adenovirus e3 transcription unit of adenovirus interplay of heterogeneous transcriptional start sites and translational selection of augs dictate the production of mitochondrial and cytosolic/nuclear trna nucleotidyltransferase form the same gene in yeast mechanisms leading to and the consequences of altering the normal distribution of atp(ctp):trna nucleotidyltransferase in yeast the 5 0 -untranslated region of the n-methyl-d-aspartate receptor nr2a subunit controls efficiency of translation characterization of an upstream open reading frame in the 5 0 untranslated region 
of pr-39, a cathelicidin antimicrobial peptide identification and characterization of a new mammalian mitogen-activated protein kinase kinase, mkk2 alternative transcriptional initiation as a novel mechanism for regulating expression of a baculovirus trans activator expression of a mitogen-responsive gene encoding prostaglandin synthase is regulated by mrna splicing inhibition of corticotropin releasing hormone type-1 receptor translation by an upstream aug triplet in the 5 0 untranslated region ccaat/ enhancer binding protein 1 is preferentially up-regulated during granulocyte differentiation and its functional versatility is determined by alternative use of promoters and differential splicing identification of the start sites for the 1.9-and 1.4-kb rat transforming growth factor-b1 transcripts and their effect on translational efficiency dna methylation represses the expression of the human erythropoietin gene by two different mechanisms an ercc1 splicing variant involving the 5 0 -utr of the mrna may have a transcriptional modulatory function molecular identification and characterization of a and b forms of the glucocorticoid receptor evidence that a regulatory gene autoregulates splicing of its transcript identification of a new form of aqp4 mrna that is developmentally expressed in mouse brain identification of a noncanonical signal for transcription of a novel subgenomic mrna of mouse hepatitis virus: implication for the mechanism of coronavirus rna transcription novel short transcripts of hepatitis b virus x gene derived from intragenic promoter splicing in adenovirus and other animal viruses tissue specific expression of the retinoic acid receptor-b2: regulation by short open reading frames in the 5 0 -noncoding region analysis of the cc chemokine receptor 3 gene reveals a complex 5 0 exon organization, a functional role for untranslated exon 1, and a broadly active promoter with eosinophil-selective elements complete testicular feminization caused by an 
amino-terminal truncation of the androgen receptor with downstream initiation c/t polymorphism in the 5 0 untranslated region of the apolipoprotein(a) gene introduces an upstream aug and reduces in vitro translation research in my laboratory is supported by grant gm33915 from the national institutes of health. key: cord-291349-tq2n4mx3 authors: smith, kevin r title: gene transfer in higher animals: theoretical considerations and key concepts date: 2002-10-09 journal: j biotechnol doi: 10.1016/s0168-1656(02)00105-0 sha: doc_id: 291349 cord_uid: tq2n4mx3 gene transfer technology provides the ability to genetically manipulate the cells of higher animals. gene transfer permits both germline and somatic alterations. such genetic manipulation is the basis for animal transgenesis goals and gene therapy attempts. improvements in gene transfer are required in terms of transgene design to permit gene targeting, and in terms of transfection approaches to allow improved transgene uptake efficiencies. during the 1970s it became possible to introduce exogenous dna constructs into higher eukaryotic cells in vitro. mammalian (germline) transgenesis was first achieved in the early 1980s, mice being the subject species. transgenic members of a wide range of animal (and plant) classes and species have now been produced, including amphibians, cattle, chickens, fish, insects, nematodes, pigs, rabbits, and sea urchins. gene transfer methods have been used in gene therapy attempts on humans since 1990. gene therapy approaches have so far focused primarily on monogenic disorders and cancers. to date, limited clinical success has been achieved. how-ever, gene therapy is in its infancy and holds great promise for the future. as indicated above, gene transfer methods may be used to generate transgenic animals. 
such animals may in principle be utilised in either of two broad ways: (a) as models for fundamental or applied scientific study; and (b) as novel sources of pharmaceutical agents, or human-compatible organs for xenotransplantation. gene transfer in higher eukaryotes may in principle be applied directly for therapeutic purpose to humans in either of two broad ways: (a) somatic gene therapy, the genetic manipulation of a subset of cells in the body; and (b) germline gene therapy, involving an alteration of genetic information in germ cells. germline gene therapy has never (yet) been attempted with humans (unless one includes the transfer of foreign mitochondria during fertilisation), and is fraught with major ethical concerns, thus there is a dearth of scientific literature available on the subject. to achieve germline alterations, transgenesis must occur at an early stage of development. by contrast, somatic cell alterations, as in somatic gene therapy, involve a very wide range of cell types. various somatic cell types are mentioned at appropriate points throughout this paper. the present section reviews the cell types that may be used for germline transgenesis. genetic manipulation of a newly fertilised, single-cell egg (zygote) should in principle result in the development of an organism in which all or very many cells contain the (identical) alteration. hence, the zygote has been the main focus for transgenic engineering. (in keeping with common usage, in this paper the term 'egg' is applied to all developmental stages from oocyte to unhatched blastocyst.) pre-fertilisation eggs may in principle be suitable targets for transgenesis. however, fundamental practical problems have so far precluded their use. eggs collected following ovulation would have to be fertilised after transgenic manipulation. this would entail the use of in vitro fertilisation. potentially-transgenic eggs would thus have to endure a further, extensive ex vivo procedure. 
since this could only be detrimental to the eggs, it is difficult to see any role for transgenesis directed at this level rather than at the zygote. speculatively, en masse transgenic manipulation of pre-ovulatory oocytes in vivo could in principle be attempted. the resultant 'proto-transgenic' organism would be expected, upon each subsequent ovulation, to produce transgenically altered eggs ready for fertilisation. however, the relevant technology has not yet been sufficiently developed to support this type of in vivo approach, and no transgenic animals are reported to have been generated in this way. eggs that have undergone cleavage are less than ideal for transgenesis. where only one cell is transgenically altered, the resulting organism is likely to take the form of a genetic mosaic. such an organism would consist of: (a) cells harbouring the transgene; and (b) cells without the transgene. indeed, the more rounds of cleavage there have been prior to transgenesis, the lower the probable proportion of altered cells in the transgenic. manipulating more than one cell in an egg is technically very difficult. more crucially, each manipulated cell would not contain an identical alteration, for reasons to be explored later. thus, although comprising only altered cells, the resultant organism would again be a mosaic (jaenisch, 1988; whitelaw et al., 1993; wilkie et al., 1986) . mosaicism need not always be a problem. mosaic transgenics may produce gametes that contain the transgene, allowing the subsequent generation(s) to be fully transgenic. however, mosaic transgenics do not always contain genetically altered germline cells. of those transgenics that do, not all of the germline cells will necessarily be altered. thus the use of post-zygotic eggs would result in a lower ultimate efficiency of transgenesis in comparison with zygote-stage eggs. 
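the dilution argument above can be made concrete with a deliberately simplified sketch. it assumes synchronous cleavage (2^n cells after n rounds), that a fixed number of cells takes up the transgene, and that every cell contributes equally to the resulting animal; none of these assumptions comes from the text, and the function name `altered_fraction` is purely illustrative.

```python
from fractions import Fraction


def altered_fraction(cleavage_rounds: int, cells_altered: int = 1) -> Fraction:
    """Expected fraction of transgene-bearing cells in a mosaic.

    Simplifying assumptions (not taken from the source text):
    - cleavage is synchronous, so the egg holds 2**cleavage_rounds cells;
    - exactly `cells_altered` of those cells receive the transgene;
    - all cells contribute equally to the developing organism.
    """
    if cleavage_rounds < 0:
        raise ValueError("cleavage_rounds must be non-negative")
    total_cells = 2 ** cleavage_rounds
    if not 0 <= cells_altered <= total_cells:
        raise ValueError("cells_altered must lie between 0 and the cell count")
    return Fraction(cells_altered, total_cells)


if __name__ == "__main__":
    # Zygote (0 rounds) through the eight-cell stage (3 rounds).
    for rounds in range(4):
        frac = altered_fraction(rounds)
        print(f"{rounds} cleavage round(s): {float(frac):.1%} of cells altered")
```

under these assumptions the altered proportion halves with every round of cleavage, which is the sense in which later manipulation lowers the probable proportion of altered cells.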
a major limitation associated with eggs is that the targeting of transgenes to chosen genomic loci is not a practical proposition, due to the inability to select for eggs that contain rare targeted integration outcomes. although gene targeting has been reported following gene transfer to eggs, the rate of targeting (versus random integration) is too low for practical purposes (brinster et al., 1989). however, many transgenic experiments do not require gene targeting (rulicke and hubscher, 2000). randomly integrated transgenes have been used in a variety of approaches. examples include controlled transgene expression via administration of an extrinsic agent (kistner et al., 1996), ablation of hormone-producing tissues by expression of a toxin-producing transgene (wallace et al., 1991) and production of human blood clotting factor viii in the milk of transgenic sheep (niemann et al., 1999). inner cell mass (icm) cells from the mouse blastocyst can be propagated in vitro as embryonic stem (es) cells (reviewed by torres, 1998). in contrast to other cultured cell lines, es cells retain their normal karyotype even after several weeks in culture, during which time they remain totipotent. furthermore, es cells are capable of colonising the embryo. these unique properties allow es cells to form chimeras when injected into blastocysts or aggregated with morulae. the resultant embryos are transferred to the uterus of a pseudopregnant female mouse, where approximately 50% should develop successfully to term. approximately 50% of the resultant offspring should be chimeric. the es cell contribution to a mouse can be as high as 80% of the cells, and will often include the germline cells. it is during the in vitro culture stage that es cells may be transgenically manipulated (pirity et al., 1998). the great advantage of es cells is that they can be subjected to a range of selective agents in vitro, which allows the selection of particular transgenic modifications. 
this ability makes es cells extremely useful for gene targeting experiments and applications. es cell technology has enabled a large range of transgenic approaches in the mouse. types of gene modifications presently available in the mouse include targeted elimination of endogenous gene expression (gene 'knockout'), targeted gene repair/replacement, conditional gene targeting and 'gene trap' reporter systems (for relevant reviews see demayo and tsai, 2001; stanford et al., 2001; gao et al., 1999; jasin et al., 1996; ledermann, 2000; lewandoski, 2001; metzger and feil, 1999; muller, 1999). the use of es cells is limited because, to date, the mouse is the only animal from which es cell lines have been unequivocally established. it would be surprising if this limitation represents a fundamental biological barrier. however, further empirical work is needed before true es cell lines become available for other species. it is possible that the inbred strains of mice used to generate es cells may carry mutations that are essential for the generation of es cells. if such mutations represent a precondition for es cell derivation, then it may take a considerable amount of time to establish nonmurine es cell lines. nevertheless, major progress has recently been made in the analysis of molecular pathways of icm and trophoblast differentiation in mammals (niwa, 2001; rossant, 2001). such progress is expected to have a positive impact on nonmurine es cell establishment. when nonmurine es cells become available, the established mouse technologies will provide the basis for in vitro genetic modification of all species. building upon fundamental research into cell cycle co-ordination, wilmut et al. (1997), schnieke et al. (1997) and campbell et al. (1996) have reported the successful transfer of 'reprogrammed' ovine donor nuclei. unfertilised, metaphase-stage enucleated ('universal recipient') eggs received the transferred nuclei. 
nuclei were taken from somatic cells that had been forced into a form of cell cycle stasis (by incubating the cells in a minimal nutrient medium), such that dna replication and gene expression were halted (or virtually so). the transfer of 'static' donor nuclei to 'universal recipient' eggs resulted (in some cases) in successful embryo development, the donor nuclei having been 'reprogrammed' into totipotency. offspring were produced following the transfer of such 'reconstructed' embryos to recipient ewes. subsequent molecular genetic testing showed that the lambs' dna had originated from the donor cells. in some of the experiments, the donor nuclei were obtained from (ovine) embryo-derived cultured cell lines. following these ground-breaking experiments, successful cloning from cultured cells of various animals including cattle, goats and pigs has been reported (see reviews by tsunoda and kato, 2000; wolf et al., 2000). the prospects for germline transgenesis via nuclear transfer (nt) are very significant: transgenes can be introduced to donor cells in vitro, permitting the production of genetically modified animals by nt. moreover, because selection can be applied to cultured donor cells, nt can be used to produce gene-targeted transgenic animals (reviewed by clark et al., 2000). the generation of the first gene-targeted sheep by mccreath et al. (2000) provides a useful illustration. in this approach, fetal fibroblasts were transfected with a therapeutic transgene carried by liposomes (see section 3.3.3). key parts of the transgene construct included (a) sequences homologous to the ovine α1(i) procollagen locus and (b) a promoterless neomycin selectable marker. these transgene features were employed such that (a) homologous recombination (hr) between transgene and target would result in targeted transgene integration, and (b) targeted integration would confer g418 resistance by bringing the neomycin gene into proximity with the endogenous promoter. 
g418-resistant fibroblasts were cultured in reduced serum medium prior to the transfer of their nuclei into recipient (enucleated) oocytes. from 470 reconstructed embryos, 20 fetuses were produced, from which 14 live-born lambs resulted. of 16 lambs and fetuses analysed, 15 showed the presence of the transgene at the target locus. similarly, lai et al. (2002) have recently reported successful nt gene targeting in pigs. the target was the alpha-1,3-galactosyltransferase locus in porcine fetal fibroblasts. following in vitro selection, 338 reconstructed embryos were created, which in turn gave rise to 7 live-born piglets. of these, 4 piglets contained transgene dna at the target locus, such that in each case one allele of the alpha-1,3-galactosyltransferase gene was knocked out. thus, nt is potentially able to provide the same range of transgenic manipulations presently available in mice (via the es cell route) to all animal species. however, in comparison with es cell transgenesis, nt has thus far proved to be relatively inefficient: only a small proportion of reconstructed embryos survive to become live animals. for example, in the foregoing cases, live targeted sheep were produced at an efficiency of 3.6% (mccreath et al., 2000), and pigs at 1.2% (lai et al., 2002). the health status of nt-derived animals is also proving to be problematic (reviewed by smith et al., 2000; renard et al., 2002). developmental abnormalities are very common, and frequently result in death (fetal or postnatal) or debility. for example, of the 14 live-born lambs described above (mccreath et al., 2000), seven died within 30 h of birth, and four died within 12 weeks. similarly, out of the 7 piglets described above (lai et al., 2002), 2 piglets died shortly after birth, and one died at 17 days; only one appeared to be entirely free of developmental abnormalities. transgenesis and gene targeting are not of themselves implicated: the health problems are associated with nt per se. 
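the efficiency and mortality figures above are simple ratios of the quoted counts, and a short sketch shows how they can be recomputed. the helper name `percent` and the choice of ratios are illustrative assumptions; in particular, the 3.6% sheep efficiency is not recomputed here, because the text does not state which counts it was derived from.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts quoted in the text for the pig experiment (lai et al., 2002).
PIG_EMBRYOS = 338       # reconstructed embryos
PIG_LIVE_BORN = 7       # live-born piglets
PIG_TARGETED = 4        # piglets carrying the transgene at the target locus
PIG_EARLY_DEATHS = 2 + 1  # 2 shortly after birth, 1 at 17 days

# Counts quoted in the text for the sheep experiment (mccreath et al., 2000).
SHEEP_LIVE_BORN = 14
SHEEP_EARLY_DEATHS = 7 + 4  # 7 within 30 h of birth, 4 within 12 weeks

if __name__ == "__main__":
    print(f"pigs, targeted animals per embryo: {percent(PIG_TARGETED, PIG_EMBRYOS):.1f}%")
    print(f"pigs, live births per embryo:      {percent(PIG_LIVE_BORN, PIG_EMBRYOS):.1f}%")
    print(f"pigs, early deaths per live birth: {percent(PIG_EARLY_DEATHS, PIG_LIVE_BORN):.1f}%")
    print(f"sheep, early deaths per live birth: {percent(SHEEP_EARLY_DEATHS, SHEEP_LIVE_BORN):.1f}%")
```

the first ratio (4 targeted piglets from 338 reconstructed embryos) reproduces the 1.2% quoted for pigs; the remaining ratios are derived here only for illustration and are not reported in the text.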
during the in vitro (cell culture) stage, the pattern of chromosomal imprinting may change; there are indications that inappropriate expression of imprinted genes following such epigenetic alteration may be mainly responsible for the poor health of nt-derived animals (kono, 1998; rideout et al., 2001; wakayama and yanagimachi, 2001) . research into epigenetic reprogramming in nt embryos is in progress, and it is to be hoped that developmental abnormalities arising from nt will eventually be eliminated or reduced in frequency. meanwhile, it is anticipated that nt-related health problems, to the extent that the basis for such is epigenetic, are unlikely to affect the offspring of surviving first generation animals. as should become clear later, one of the major technical drawbacks of germline transgenesis is the difficulty of physically introducing exogenous dna (the transgene) to the zygote. given that the natural role of sperm cells is to deliver dna to the egg, an intriguing approach would be to induce sperm cells to carry transgenes. a number of groups have claimed varying degrees of success in this regard. however, others have had difficulty replicating such work, and there are as yet no clear answers to the questions: (a) is it possible; and (b) if so, how? (reviewed by smith, 1999 ). an alternative possibility could be to introduce the transgene into testicular (sperm) stem cells in vivo. this would in principle remove the need to collect, manipulate or transfer eggs, thus providing a major streamlining of germline transgenesis. preliminary results have been reported in mice, where transgene constructs were directly injected into the testis. 
for example, 60-70% of sperm were reported to carry the transgene following injection of naked dna into the vas deferens (huguet and esponda, 1998), with a follow-up report claiming detection of the transgene in the cells of 7.5% of offspring animals produced following fertilisation with the transgene-bearing sperm (huguet and esponda, 2000). similar results were reported by sato et al., using lipid-associated transgene molecules injected through the testicular capsule (sato et al., 1999). this section reviews several methods that may be used for getting exogenous dna into recipient cells. the most widely used and widely applicable methods (retroviral transfer and microinjection) are considered first. some fundamentals of transgene design are also considered, because most work in this regard has probably been carried out (and has the widest scope) with microinjection, at least as far as germline modifications are concerned. however, such fundamentals of transgene design also apply to other delivery methods, as should become clear in later sections of this paper. other delivery methods (i.e. those currently under development or less widely applicable) are considered also. retroviruses are found in many species including most mammals (reviewed by lazo and tsichlis, 1990). retroviruses have rna as their genetic material. following infection, the rna is reverse-transcribed by the virus-encoded enzyme reverse transcriptase. the resultant single-stranded dna is replicated as double-stranded dna (dsdna). the dsdna viral genome has the important property of being able to linearly integrate (as a provirus) into the chromosomal dna of the host cell. the site of integration appears to be essentially random. the genome of retroviruses can be manipulated to carry exogenous dna. eggs may be incubated in media containing high concentrations of the resultant retroviral vector. alternatively, retroviral vector-producing cell monolayers may be used, upon which eggs are co-cultivated. 
in either case, between 10 and 50% of (surviving) embryos will be infected. following egg transfer into pseudopregnant females, the infected embryos should give rise to transgenic offspring. molecular genetic analysis of transgenics produced in this way usually shows integration of a single proviral copy into a given chromosomal site. rearrangements of the host genome are normally restricted to short direct repeats at the site of integration. in many embryos the germline cells contain viral integrants: thus, transmission of the transgene to the next generation will often occur. methods have also been developed to allow infection into postimplantation embryos. virus uptake is effective for many somatic cell lines; however, germline cells are infected at low frequency, due to a high level of mosaicism (braas et al., 1996; morgan and french anderson, 1993). infection into highly developed tissues, such as those of foetuses, juveniles or adults, is also a possibility (reviewed by hu and pathak, 2000). this holds great promise in the context of somatic gene therapy. retroviral vectors have also been used to introduce transgenes into the es cell genome (for example, see robertson et al., 1986). the major advantages of retroviral vectors are: (a) the ease of introducing the transgene (the virus is naturally equipped to carry exogenous genetic material into cells and to integrate it into the cells' endogenous dna); and (b) the unitary form of integration (one intact copy per genome of transgene-positive cells is the norm). however, retroviral vectors are limited or problematic in a number of respects, as discussed in the following sections. because the size of the viral dna-protein complex precludes it from passing through the nuclear pores, the host cell must be in mitosis for integration to occur (miller et al., 1990). thus only actively dividing cells are infectable by retroviral vectors. 
this means that: (a) retrovirus-based gene therapies would not be applicable to many cell/tissue/organ types in the adult (e.g. brain tissue); and (b) eggs need to be at the eight-cell stage (at least) before infection can begin: this leads to problems of mosaicism. retroviral vectors are somewhat limited in respect of the length of transgene sequence they can carry. the upper limit of transgene length is 9-10 kb (for example see hu and pathak, 2000). retroviral vectors integrate into the host dna in a largely random fashion. thus the chromosomal location of the provirus/transgene varies between individual transgenics created with an identical retroviral vector construct. each particular chromosomal environment is likely to have a particular effect on the transcription of the transgene. the result is inconsistency and irreproducibility of expression (for example see jahner and jaenisch, 1985). transgene expression in the provirus is driven by the viral 5′ long terminal repeat. hence, it is problematic to engineer into the construct the ability for it to be controlled (a) in a tissue-specific manner or (b) by external (experimenter) influence (for example see hoatlin et al., 1995; pannell and ellis, 2001). using integration-deficient retroviral vectors, ellis and bernstein (1989) were able to target genomic loci such that vector sequences homologously recombined with endogenous sequences. however, the frequency of targeting was very low (approximately 1 targeted event per 3 × 10^6 infected cells). given the essentially random (nonhomologous) mode of integration of natural retroviruses, retroviral vectors do not appear to hold much promise for applications in which gene targeting is required. 3.1.6. 
instability/safety concerns it is possible for integrated retroviral dna to be spontaneously reactivated (weiss et al., 1985) , leading to the production of new integration within the dna of the cell, to new infection of other cells or to infection of other individual animals. such instability again may result in transgene expression problems and safety concerns (reviewed by temin, 1990; cornetta et al., 1991; gunter et al., 1993) . however, retroviral vectors may be engineered such that they lack the genetic sequences required for a normal life cycle (reviewed by vile et al., 1996) . the creation of such 'defective' retroviral vectors goes a long way towards curing instability-related expression and safety problems. however, the risk of reactivation can probably never be completely eliminated, because complementation by a competent 'helper' retrovirus cannot be ruled out. in a controlled laboratory environment this may well represent only a minor concern. however, for transgenically altered organisms entering the outside environment (e.g. transgenic dairy cattle bioreactors, or humans treated with gene therapy), the risk may be more acute. in addition to the risks of releasing infectious agents into the general environment, there are concerns for patients who have been treated with retroviral vector-based gene therapeutic agents. in such cases, retroviral reactivation could conceivably lead to oncogenesis. thus there are serious safety concerns over the use of retroviral vectors. such risks are generally difficult to qualify. whatever the actual risk may turn out to be, in the absence of certainty most regulatory bodies have tended to be fairly restrictive with regard to what they will allow in respect of retroviral vector usage (for example, see kessler, 1993) . in fact, a wide range of safety issues surrounds the use of retroviruses as agents of exogenous gene delivery. the following list summarises the main areas of concern. . 
integration problems: accidental integration into or near to an endogenous gene could lead to insertional mutagenesis/oncogene activation (although this is not unique to the retroviral method of gene delivery).
. pathogenesis: immunocompromised patients may be at risk from infection: such infection could result in direct damage or oncogenesis.
. pathogen contamination: if the packaging cells become infected, especially with another virus, this agent is likely to contaminate the retroviral product.
. pseudotyping: surface/structural changes to the virus may occur due to packaging cell viral-based sequences: this may result in altered/expanded host cell range (paradoxically, this may be useful in some possible applications if harnessed correctly; see review by dornburg, 1997).
. complementation: as alluded to previously, complementation by a competent 'helper' retrovirus cannot be completely ruled out. this may occur in vitro or in vivo.
despite the limitations and safety concerns referred to above, retroviral-mediated gene therapy has already been used in a number of gene therapy attempts, and appears to hold a good deal of promise in this regard. however, although retroviral-mediated gene therapy has been used successfully for (nonhuman) germline modifications, the most used (and to date the most useful) method for germline transgenesis is microinjection, reviewed in the following section. jon gordon in 1980 demonstrated that exogenous dna could be introduced into the genome simply by the physical injection of a solution of cloned dna into the zygote (gordon et al., 1980). subsequently, microinjection has become the most widely used method of germline transgenesis. the technique is most established with mice; however, microinjection is also carried out fairly commonly with other animals including rats, rabbits, farmyard animals and fish. a finely drawn out glass micropipette, loaded with dna solution, is used for the injection. 
under the microscope, with the aid of micromanipulators, the egg is held fast and penetrated with the micropipette. the micropipette is guided through the cytoplasm towards one of the egg's pronuclei. once the tip of the micropipette has entered the pronucleus, around 1-2 pl of dna solution is injected, bringing typically 200 dna molecules into the pronucleus. some eggs lyse following microinjection, probably due to the physical trauma of being penetrated by the micropipette. surviving eggs are transferred to the uterus of a pseudopregnant mother. following gestation, between 10 and 40% of the resulting offspring are likely to be transgenic (hogan et al., 1994). in this method of transgenesis, the transgene dna integrates into the endogenous dna. integration is random and usually occurs at only one chromosomal site in each transgenic. the number of copies of the transgene at an integration site may range from one to thousands. for multiple copy inserts, the most common arrangement is an array of molecules joined head-to-tail (reviewed by gordon and ruddle, 1985). less usually, tail-to-tail and head-to-head arrangements occur. deletions, duplications and other rearrangements may occur at the junctions between chromosomal and transgenic dna sequences (reviewed by bishop, 1996). only a limited amount is known about the mechanisms of transgene integration. it has become apparent that the majority (≈85%) of pronuclear microinjection-derived transgenic founders are mosaics of transgenic and nontransgenic cells (whitelaw et al., 1993). this surprising finding may be explained by postulating that (endogenous) dna replication is required for chromosomal integration. if transgene integration occurs during dna replication, only one of the two resulting cells will contain transgene sequences. during embryogenesis, a small number of cells (≥3 for mice) are recruited as embryo progenitors. 
thus, the resulting animal will most frequently comprise a mixture of transgene-harbouring and transgene-free cells. in approximately 15-25% of mosaic founders, the foreign dna apparently integrates at later stages of the embryo cell replication, resulting in mice that contain the transgene in varyingly small proportions of their cells. as with other mosaics, transmission of the transgene is dependent upon the existence and extent of germline colonisation by transgene-containing cells. in the vast majority of cases where transmission occurs (whether from fully transgenic founders, from mosaic founders or from subsequent generations of transgenics), the transgene is inherited in a stable mendelian fashion, although exceptions have been found (for example, see palmiter et al., 1984). due to the hemizygous nature of transgene insertion, even a nonmosaic founder will transmit its transgene to only (on average) 50% of its offspring. there is no particular restriction on the size of dna molecule used for microinjection. yeast artificial chromosome (yac) based transgene constructs consisting of >100 kb of dna have been successfully introduced into the mouse germline by pronuclear microinjection (reviewed by lamb and gearhart, 1995). indeed, it may become possible (pending development of the necessary transgene constructs) to microinject autonomous artificial 'mini-chromosomes' (mammalian artificial chromosomes, or macs), complete with centromeres, telomeres and replication origins in addition to structural genes, promoters and enhancers. these specialized constructs would be expected to give a number of benefits compared with integrated transgenes, the most important of which would probably be the absence of chromosomal position effects on transgenic expression (reviewed by sgaramella and eridani, 1996). 
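the transmission arithmetic described above (a hemizygous, single-site insert passed on to 50% of offspring by a nonmosaic founder, and to fewer offspring by a mosaic founder) can be illustrated with a minimal sketch. the function name and the `germline_fraction` parameter are hypothetical, introduced only for illustration; the sketch assumes simple mendelian segregation of a single integration site and ignores the exceptions noted in the text.

```python
def expected_transmission(germline_fraction: float) -> float:
    """Expected fraction of offspring inheriting the transgene.

    germline_fraction (hypothetical parameter): proportion of the founder's
    germ cells that carry the transgene; 1.0 for a nonmosaic founder.
    Assumes one hemizygous integration site and Mendelian segregation.
    """
    if not 0.0 <= germline_fraction <= 1.0:
        raise ValueError("germline_fraction must lie between 0 and 1")
    # A hemizygous germ cell passes the insert to half of its gametes.
    return 0.5 * germline_fraction


if __name__ == "__main__":
    founders = [
        ("nonmosaic founder", 1.0),
        ("mosaic founder, half of germline colonised", 0.5),
        ("mosaic founder, no germline colonisation", 0.0),
    ]
    for label, fraction in founders:
        print(f"{label}: {expected_transmission(fraction):.0%} of offspring")
```

under these assumptions a mosaic founder always transmits to fewer than 50% of its offspring, which is why founders are usually bred and their progeny screened before a line is considered established.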
because there are no special problems with microinjecting large transgene constructs, it is possible to incorporate structural gene-plus-promoter (plus other potentially useful sequences such as enhancers) combinations into the host genome. the main areas of application for microinjected gene-plus-promoter combinations (i.e. current transgene designs) are reviewed in the following sections. as noted previously, some of the fundamentals of transgene design also apply to other methods of transgene delivery, as should become clear in later sections of this paper. 'housekeeping gene' promoters, such as the β-actin promoter (beddington et al., 1989) and the histone h4 promoter (choi et al., 1991), can be fused with chosen structural genes. the 'housekeeping gene' promoters in such genetic constructs generally drive a fairly high level of constant transcription in most cell types and developmental stages when these constructs are integrated as transgenes. beyond simply driving gene expression, promoters may be chosen to allow specificity in, or control over, patterns of expression. a transgene comprising a particular structural gene fused with a tissue-specific promoter should only produce its gene product in the tissue(s) specified by that promoter. in terms of germline gene therapy, this might allow treatment to be directed exclusively to the required tissues or organs. another application would be to direct expression of a novel exogenous gene to a body part where the gene product will not cause physiological havoc, and where the product could be readily recovered. indeed, pharmaceutically-useful products such as human factor ix and human alpha-1-antitrypsin (haat) have already been obtained from the milk of transgenic animals. factor ix is an anti-haemophilic agent, and haat may be used for the treatment of emphysema. in the former case, the construct used comprised the factor ix structural gene fused with the ovine β-lactoglobulin promoter (clark et al., 1989). 
the same promoter was used in the latter case, fused with the haat structural gene (simons et al., 1987). if outside (i.e. experimenter) manipulation of gene expression is required, an inducible promoter may be used. inducible promoters are able to respond to specific environmental cues such as temperature, or to dietary factors such as zinc. thus the structural gene within a transgene can be switched on or off at will. for instance, a metallothionein (mt) promoter fused with a growth hormone (gh) gene (palmiter et al., 1982) should allow gh production to be switched on simply by providing the transgenic with a zinc-supplemented diet. this might avoid the possible physiological difficulties associated with continuous global production (particularly in utero) of transgene products such as gh. potential applications for inducible promoters in terms of gene therapy are conceivable. more recently, inducible systems employing prokaryotic tetracycline resistance gene components have been developed (gossen et al., 1995; kistner et al., 1996; park and rajbhandary, 1998; schultze et al., 1996; shockett et al., 1995). these systems usually require two separate transgenes: thus, for use with transgenic animals (as opposed to cells in vitro) these systems usually require the establishment of two separate transgenic lines, each line containing one of the two transgenes. double heterozygotes (containing both transgenes) are obtained by mating the two lines. one transgene (transgene i) includes a promoter construct consisting of (a) an array of tet operator sequences and (b) a minimal promoter sequence; (a) and (b) are coupled to the gene that is to be controlled (gene w). the other transgene (transgene ii) comprises a hybrid transcriptional transactivator gene fused to a suitable (e.g. tissue-specific) promoter. the hybrid transactivator gene product consists of a viral transcription-activating domain coupled with a tetracycline-binding domain. 
there are two main variants of the basic system: an 'on' system and an 'off' system. these variants are based upon functionally different transactivators. in the 'on' system, in cells in which transgene ii is active, exogenously administered tetracycline (or its analogue doxycycline) binds to the transactivator protein: this renders the transactivator able to bind to the tet sequences on transgene i, thereby activating expression of gene w. by contrast, the transactivator in the 'off' system binds to the tet sequences only in the absence of tetracycline: thus, administration of tetracycline prevents expression of gene w. several other promoter-based systems for the control of transgene expression are in the developmental stage. promising areas include natural promoters inducible by aryl hydrocarbons and promoter constructs inducible by steroid hormones (fussenegger, 2001; saez et al., 1997; smith et al., 1995; wang et al., 1994, 1997). site-specific recombination provides a novel means of controlling transgene expression (kuhn et al., 1995; stark et al., 1992; utomo et al., 1999). as with the tetracycline approach (above), two separate transgenes are usually required, necessitating the mating together of separate transgenic lines to produce double heterozygotes (where transgenic animals are required). one transgene (transgene i) consists of an appropriate (e.g. tissue-specific) promoter coupled to the gene that is to be controlled (gene x), engineered to contain a strong stop signal flanked on each side by a recombinase recognition site (e.g. loxp from bacteriophage p1). the other transgene (transgene ii) consists of a recombinase gene (cre in the case of bacteriophage p1) fused to an inducible promoter. exogenous administration of inducer drives the production of cre recombinase from transgene ii. 
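the 'on' and 'off' tetracycline-responsive variants described earlier in this section reduce to a small truth table, which the following sketch makes explicit. it is a purely logical model under stated assumptions: the function and argument names are hypothetical, expression is treated as all-or-nothing, and dose, timing and leakiness are ignored.

```python
def gene_w_expressed(system: str,
                     transgene_ii_active: bool,
                     tetracycline_present: bool) -> bool:
    """Boolean model of the two-transgene tetracycline systems.

    system: "on"  -> transactivator binds the tet sequences only WITH tetracycline
            "off" -> transactivator binds the tet sequences only WITHOUT tetracycline
    transgene_ii_active: whether the (e.g. tissue-specific) promoter driving
    the transactivator gene is active in the cell under consideration.
    """
    if system not in ("on", "off"):
        raise ValueError("system must be 'on' or 'off'")
    if not transgene_ii_active:
        # No transactivator protein is made, so gene w stays silent.
        return False
    if system == "on":
        return tetracycline_present
    return not tetracycline_present


if __name__ == "__main__":
    for system in ("on", "off"):
        for active in (False, True):
            for tet in (False, True):
                state = gene_w_expressed(system, active, tet)
                print(f"system={system!r:5} transgene_ii_active={active!s:5} "
                      f"tetracycline={tet!s:5} -> gene w expressed: {state}")
```

the table shows that, in either variant, gene w is silent wherever the promoter on transgene ii is inactive; the drug only decides the outcome in cells that already make the transactivator.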
the cre recombinase binds to the loxp sites on transgene i and catalyses the excision of the flanked stop signal, thereby rendering gene x competent for expression. a variant of this system can be used to inactivate a transgene, in which gene x (or an essential component thereof) is flanked by recombinase recognition sites. in this case, recombinase production results in the removal of essential sequences, thereby eliminating expression of gene x. transgene sequences integrating randomly into the host genome tend to give poor levels of expression, or exhibit inappropriate expression, in the form of temporally or spatially (ectopic) aberrant expression. the primary reason for such problems is the 'position effect', whereby the particular genetic environment at any point of insertion is likely to influence the expression of the integrated transgene. in some cases the remedy lies with transgene design: for example by ensuring that an appropriate enhancer sequence is included in the transgene construct. beyond this, it may be possible to insulate a gene from the position effect. matrix attachment regions are sequences which, when placed on either side of a gene within a transgene construct, appear to allow the gene(s) within an integrated transgene to occupy a separate chromosomal domain. ultimately, however, the best solution to transgene expression problems would be to avoid the position effect entirely. this is achievable through gene targeting: a transgene targeted to a chosen genomic locus will by definition avoid the position effect. as discussed in section 2.2 and section 2.3, reliable ways of germline gene targeting do exist. however, current germline gene targeting approaches rely upon the use of es cells or nt, and gene targeting is not possible at present with pronuclear microinjection, due to the inability to select for eggs that contain rare targeted integrations (see section 2.1). 
therefore, the use of germline gene targeting as a means to avoiding transgene expression problems awaits progress in es cell technology and nt technology. the above sections review ways in which gene function can be added-in by gene transfer. however, potential applications also exist for the ability to eliminate endogenous gene function in the recipient organism. for example, a dominant, gain-of-function single-gene disorder requires for its treatment the removal of the aberrant gene activity (rather than the addition of a missing activity). in animal transgenesis, the ability to eliminate specific gene function is often very desirable in the production of animal models for scientific/medical research. ideally, exogenous dna would be targeted to the precise chromosomal locus in question. integration of the transgene would cause disruption of endogenous dna sequences, resulting in nonexpression of the endogenous gene. however, as discussed above, gene targeting is not widely available and is presently incompatible with pronuclear microinjection. until gene targeting becomes readily available, alternative methods of gene function elimination may be employed. one such approach is to fuse a toxin gene (e.g. the gene for diphtheria toxin a) to a particular promoter (for examples, see palmiter et al., 1987; wallace et al., 1991, 1994; delaney et al., 1996). thus, the cell population(s) in which the promoter is active will die (or simply fail to develop) due to intracellular toxin production. an alternative approach employs a viral thymidine kinase (tk) gene as the expressing part of the transgene. gancyclovir (gcv), a nucleoside (guanosine) analogue, may be administered systemically to the transgenic animal or, in principle, to the gene therapy-treated patient. gcv is relatively nontoxic unless phosphorylated by the viral tk gene product. phosphorylated gcv is incorporated into the cellular genome during dna replication, leading to cell death. 
thus, dividing cells containing and expressing the transgene will be ablated. as with the 'toxin' transgenes considered above, promoter choice can determine which (transgene-containing) cells should express the tk gene. the tk/gcv approach is very promising for gene therapy applications, particularly in respect of antitumour approaches. two properties are important in this respect: (a) the ability to 'switch on' transgene activity (only) when required, by administration of gcv; and (b) the ablation only of dividing cells. ablating entire cell populations is a rather heavy-handed way to eliminate specific functions. a more precise approach employs antisense dna (reviewed by sokol and murray, 1996). this approach is feasible where the sequence of the gene to be turned off is known accurately. the antisense dna represents the sequence of the nontranscribed dna strand of the endogenous gene; the transgene construct is prepared by inverting the isolated structural gene and rejoining it with its promoter. expression of the integrated transgene results in antisense rna transcript production. the antisense rna should hybridise with the 'sense' transcripts produced by the endogenous gene (located at a separate chromosomal locus). the resulting duplexes of rna cannot be processed by the translational machinery; therefore protein production (hence, endogenous gene expression) is eliminated. beyond uses in basic research, antisense dna constructs may have potential uses in gene therapy. an anti-viral strategy represents perhaps the most promising approach. dna encoding an antisense viral rna sequence would be coupled to an appropriate promoter to form the transgene construct. once integrated into the host genome, the transgene should block viral replication by production of antisense rna. an alternative potential application for antisense constructs might be in antitumour therapy. 
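the relationship between a 'sense' transcript and the antisense rna produced from an inverted structural gene, as described above, is simply that of reverse complementarity. the short sketch below illustrates this; the function names and the example sequence are hypothetical, and real hybridisation also depends on transcript structure, abundance and cellular location, none of which is modelled here.

```python
# Watson-Crick pairing for RNA bases.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(sense_rna: str) -> str:
    """Return the antisense (reverse-complement) RNA, written 5'->3'."""
    sense = sense_rna.upper()
    unexpected = set(sense) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in RNA sequence: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return sense.translate(_RNA_COMPLEMENT)[::-1]


def forms_full_duplex(sense_rna: str, candidate_antisense: str) -> bool:
    """True if the candidate pairs with the sense transcript along its whole length."""
    return antisense_rna(sense_rna) == candidate_antisense.upper()


if __name__ == "__main__":
    sense = "AUGGCC"  # hypothetical fragment of an endogenous transcript
    anti = antisense_rna(sense)
    print(f"sense     5'-{sense}-3'")
    print(f"antisense 5'-{anti}-3'")
    print("full duplex:", forms_full_duplex(sense, anti))
```

taking the reverse complement twice returns the original sequence, which mirrors the fact that the antisense transcript is copied from the strand opposite to the one that normally serves as template.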
antisense dna-based constructs do not, however, hold a great deal of promise for the treatment of dominant, gain-of-function single-gene disorders. the problem is that, unless the rna sequence and size of a disease allele transcript happens to be quite different from that of the normal allele (assuming the sufferer to be a heterozygote), the antisense rna may also act against the normal allele's transcripts. although microinjection is well established as an effective technique of transgenesis, it does have certain inherent limitations. such limitations are reviewed in the following sections. 3.2.7.1. practical constraints. the most obvious drawback of microinjection is its cost. although conceptually very simple, the physical introduction of dna into egg pronuclei requires sophisticated levels of equipment and expertise. very high quality microscopes are required, together with micromanipulators, incubators, micropipette pullers, microinstrument forges, and a whole host of ancillary equipment. in terms of expertise it typically takes several years for a microinjector to be trained to a satisfactory level of competence. further, microinjection requires such a high level of concentration over the long duration of a single session that it is usually impossible for an individual microinjector to perform efficiently more than about once per week. a single murine microinjection session lasts around 6 h. this (unbroken) period is dictated by the biological 'window' available between fertilisation and pronuclear fusion. added to the above costs is the problem of low absolute efficiency of transgenic production. for mice, a successful laboratory typically microinjects around 200 eggs per session. from these eggs, around 20 offspring should result, of which between 2 and 8 are likely to be transgenic. thus the highest overall efficiency of murine microinjection is in the region of 4% (hogan et al., 1994) . efficiencies for nonmurine mammalian species (e.g. 
cattle, pigs, sheep) are much lower, typically less than 1% (wall et al., 1992) . such lower efficiencies may be accounted for by the fact that more expertise exists in respect of mice, and the existence of certain biological factors that hamper microinjection of nonmurine eggs. a major limiting biological factor associated with cow, pig and sheep eggs is the difficulty of visualising their pronuclei. cow and pig eggs are optically opaque, due to the presence of extensive cytoplasmic lipid particles. sheep eggs do not have dense cytoplasmic particles, however their pronuclei remain extremely difficult to visualise. this effect seems to be due to the pronuclei and cytoplasm sharing similar refractive indexes. another difficulty concerns the size of pronuclei. murine pronuclei are relatively large entities compared with those of farm animals. thus, the micropipette must travel further into the farm animal egg compared with that of the mouse. the extra distance travelled by the micropipette causes a larger entry hole (due to the tapering section of the micropipette), thus increasing the likelihood of cell death. more difficulties with agricultural animal eggs include poorly anchored pronuclei and a lack of visible indicators of post-injection egg damage. for practical considerations concerning transgenesis by pronuclear microinjection, the reader is referred to the following excellent reviews: wall (1996) , niemann and kues (2000) , wolf et al. (2000) . gene therapy via pronuclear microinjection has not been attempted with humans. current knowledge of human gametes lends support to the idea that, for microinjection purposes, the human egg would probably be more akin to an agricultural animal egg than to the mouse egg. 3.2.7.2. fundamental constraints. more basic than issues of cost and efficiency are limitations associated with the mode of transgene integration into the chromosome. the problem is that the whole process is, as mentioned previously, entirely random. 
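the efficiency figures quoted above for murine microinjection (around 200 eggs injected per session, around 20 offspring, of which 2 to 8 are transgenic) can be checked with a few lines of arithmetic. the function name is hypothetical and the numbers are those given in the text; this is a worked illustration rather than a model of any particular laboratory.

```python
def fraction(successes: int, attempts: int) -> float:
    """Simple proportion, guarding against a zero denominator."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


if __name__ == "__main__":
    eggs_injected = 200          # per murine session (figure from the text)
    offspring_born = 20          # expected live offspring (figure from the text)
    transgenic_range = (2, 8)    # transgenic offspring, low and high estimates

    for transgenic in transgenic_range:
        per_offspring = fraction(transgenic, offspring_born)
        per_egg = fraction(transgenic, eggs_injected)
        print(f"{transgenic} transgenic: {per_offspring:.0%} of offspring, "
              f"{per_egg:.0%} of injected eggs")
```

the upper estimate reproduces the overall efficiency of about 4% per injected egg, and the per-offspring values span the 10 to 40% range cited for transgenic offspring.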
there is no way of predicting or controlling major aspects of transgene integration such as copy number, copy orientation, endogenous sequence rearrangement or, most importantly, chromosomal site of insertion (reviewed by jaenisch, 1988). as mentioned previously (in the context of retroviral vectors), each chromosomal locus imparts a particular position effect on the transgene. thus, random integration may result in transcriptional failure in some transgenics, and always requires that each founder transgenic be treated as unique. another problem arising from random integration is that the exogenous dna can disrupt and therefore inactivate an endogenous gene (doerfler et al., 1997). such new mutations are thought to occur in a small but significant number of transgenics. in terms of applying transgenesis to particular purposes, it is not possible to deliberately introduce a null mutation in zygotes using microinjection. nor, therefore, is it currently possible to replace a particular gene function with a new function. the same applies to attempts to subtly alter (rather than switch-off) endogenous gene activity. one speculative possibility might be to co-inject a recombinase enzyme (or gene) along with the transgene, in order to increase the frequency of hr, thus increasing the frequency of gene targeting. however, although several candidate enzymes/genes for enhancing hr have been described, the use of these in gene targeting is at an early experimental stage (vasquez et al., 2001). finally, it should be noted that microinjection can be applied to many types of cells in addition to zygotes. es cells fall into this category. for es cells used in transgenesis, exogenous dna must enter the cells somehow, and microinjection into the es cell's nucleus is an effective option (see for example zimmer and gruss, 1989). the drawback is that microinjection does not allow es cells to be treated en masse. 
somatic cells, whether in culture, temporarily removed from the body or, in principle, in situ in the body, may also receive exogenous dna via microinjection (reviewed by celis, 1984) . as for es cells, the drawback is the inability to treat cells en masse, together with enormous practical difficulties surrounding any attempt at in situ microinjection. (i.e. the huge numbers of cells likely to need microinjection and the difficulty of locating, visualising and manipulating such cells.) thus, microinjection is fundamentally unsuitable for in vivo somatic cell gene therapy. somatic gene therapy via microinjection remains a possibility where ex vivo cells are involved. microinjecting a relatively small number of ex vivo stem cells might, following their return to the body, allow the stem cells to recolonise and hence, amplify the number of treated cells. alternatively, but more speculatively, somatic cells from foetal tissues could be microinjected prior to in vitro culturing; the treated cells would thus be amplified in culture, such that an adequate number could be (heterologously) transplanted to the patient. lastly, cells in culture may be microinjected simply in order to study the mechanisms of transgene integration. retroviral infection and, especially, microinjection have become the main methods of gene transfer in higher animals, the former for somatic approaches and the latter for germline approaches. however, alternative methods are available or have been proposed. the following sections review a range of such methods. insoluble molecules such as calcium phosphate and deae-dextran can, when mixed with dna molecules, co-precipitate to form granules. these granules are phagocytosed by cells and, in a proportion of recipients, the dna appears in the nucleus where it may be transiently expressed. in a small proportion of treated cells, the dna becomes stably integrated into the genome (robins et al., 1981) . 
despite its proven worth (and continuing use) with cultured somatic cells, co-precipitation is of limited use in transgenesis. co-precipitation is somewhat laborious compared with electroporation and liposome-mediated gene transfer. there are no reports of successful gene transfer to eggs by co-precipitation. es cells can be transfected by co-precipitation (gossler et al., 1986), as can ex vivo somatic cells. however, the range of potential cell types is limited: co-precipitation works well with fibroblasts, but has proved difficult to apply to other cell types. even in non-refractory cell types, co-precipitation remains less than ideal because it is associated with transgene mutations and ultra-high copy number integrations (calos et al., 1983; razzaque et al., 1983). additionally, for unknown reasons, co-precipitation appears to allow only relatively low levels of hr (reviewed by mohn and koller, 1995), making it less than ideal for gene targeting applications. finally, co-precipitation is strictly an in vitro procedure: it is difficult to envisage ways in which it could be adapted for in vivo gene therapy. retroviruses offer good opportunities for gene transfer due to their integration into the host chromosome. however, only actively dividing cells are infectable by retroviral vectors, yet many potential targets for gene therapy are organs and tissues comprising slowly dividing or non-dividing cells. thus, alternative viruses have been explored as potential agents for somatic cell gene therapy. such viruses include the adenoviruses (tomanin, 1997), hepatitis delta virus (hdv) (netter et al., 1993) and herpes viruses (hsv) (burton et al., 2001). a major limitation with these viruses as transgene vectors is that it is unclear whether they can reliably be induced to infect es cells, or eggs from a variety of species. thus, at present the major potential application for these viruses lies in the domain of (somatic) gene therapy. 
however, given that most if not all cultured cell types can be made permissive for infection by such viruses, one might speculate as to the likelihood of es cells being made similarly permissive by empirical advances. alternatively, nt (section 2.3) may in the future prove able to circumvent this limitation entirely. 3.3.2.1. adenoviruses. adenoviruses (ad) have a number of properties that make them potential candidates for gene therapy. ad are able to carry large transgenes (up to c. 38 kb) without adversely affecting their infectivity (bett et al., 1993). ad have a low host cell/species-specificity, providing a very large range of tissues and organs as putative candidates for treatment, and permitting current animal models to be used for testing adenovirus-mediated therapies. transgene expression is stable and persists beyond 1 year after a single treatment (stratford-perricaudet et al., 1992). moreover, given that: (a) serious disease following adenoviral infection is very rare; and (b) the viral genome rarely integrates into the host's chromosomes, adenoviral-mediated gene therapy promises higher safety levels than those associated with retroviral-mediated therapies (lee et al., 1995). in addition to their potential role in gene therapy applications, there are indications that ad may be usable as transgene vectors. tsukui et al. (1996) report the production of transgenic mice following the infection of zona-free eggs with a replication-defective ad vector. this intriguing result suggests the possibility of a promising new strategy for animal transgenesis. however, further research is required in order to determine (a) the parameters within which ad can reliably deliver transgenes to eggs; and (b) the range of species potentially amenable to this form of gene transfer. 3.3.2.2. hepatitis delta virus. hdv is potentially usable as an agent of gene therapy directed at somatic cells. 
hdv is self-replicating, and may reach high copy-numbers (≈300 000 per cell) in infected tissues. this property makes hdv attractive as a gene therapy vector, although there are safety issues to be considered in view of the possible consequences of germ cell infection by a self-replicating vector. another safety consideration is that hdv is potentially cytopathic, although it might be possible to modify the viral genome to reduce such effects. a limitation with hdv is its size: at only 1.7 kb, it is unlikely to be able to carry large exogenes (netter et al., 1993). hsv have large (125-229 kb) genomes, and therefore offer the potential for transferring large exogenes (roizman, 1994). herpes viruses also have the attractive capacity of being able to induce permanent latent infection in their hosts (stevens, 1989). although herpes viruses certainly offer great potential for gene therapy, particularly in respect of disorders of the nervous system (reviewed by lachmann and efstathiou, 1997), the development of gene therapy systems employing these viruses awaits substantial progress in certain areas, including the ability to control the viral life-cycle and to prevent immune attack of treated cells. further, the pathology of herpes viral infection is poorly understood; thus there remain potential safety problems to be resolved. 3.3.2.4. other viruses. the viruses considered above are not the only types of viruses that are under scrutiny as transgene vectors. no single virus has the necessary characteristics for all applications. nor have all possible types of viruses been assessed for their potential utility in somatic or germline transgenesis. other types of viruses will doubtless be added to the current store of potential gene transfer vectors. liposomes are artificial vesicles that can act as delivery agents for exogenous materials including transgenes (see excellent reviews by watwe and bellare, 1995; nicolau et al., 1987; ilies and balaban, 2001). 
like their natural cellular counterparts, liposomes comprise a lipid bilayer surrounding a small volume of aqueous solution. the liposome's lipid bilayer is similar to that of natural cells, consisting mainly of phospholipids and cholesterol. liposomes have been created in a variety of distinct forms, some of which are available as commercial preparations: the major differences are of structure, size and charge. the main structural difference is between liposomes with a single lipid bilayer and those with a multilamellar ('onion-skin') lipid bilayer. liposome sizes range from around 100 nm to several micrometers, and liposomes may be either negatively or positively charged. liposomes for use as gene transfer vehicles are prepared by adding an appropriate mix of bilayer constituents to an aqueous solution of dna molecules. in this aqueous environment, phospholipid hydrophilic heads associate with water while hydrophobic tails self-associate to exclude water from within the lipid bilayer. this self-organising process creates discrete spheres of continuous lipid bilayer membrane enveloping a small quantity of dna solution. the liposomes are then ready to be added to target cells (felgner, 1996; mahato et al., 1997). when they come into contact with a target cell, liposomes may interact with the cell membrane in a variety of ways. possible liposome-cell interactions include: (a) exchange of membrane constituents (lipsky and pagano, 1985); (b) adsorption to the cell membrane (eggens et al., 1989); (c) endocytosis (connor et al., 1984); and (d) membrane fusion (lamb, 1993). interactions (a) and (b) are undesirable, as they result in the transgene molecules remaining outside the cytoplasm. interaction (c) is undesirable where it results in the formation of late endosomes, as the outcome is the destruction of engulfed transgene molecules. nevertheless, interaction (c) may be desirable where early endosomes release transgene molecules into the cytosol. 
interaction (d) is desirable because liposome-cell fusion allows the exogenous dna to directly enter the cytosol. 'simple' liposome systems comprise negatively or neutrally charged liposomes: interaction (d) is quite rare in such systems, and generally takes several days for completion (schaefer-ridder et al., 1982). although the success rates (with cultured cells) are better than those obtained using standard co-precipitation gene transfer protocols, the relative rarity of liposome-cell membrane fusion presents a problem for transgenesis and gene therapy per se. furthermore, the slow rate of fusion is particularly problematic for in vivo therapies because free liposomes are rapidly captured and destroyed by macrophages (schoilew et al., 1990). more recent systems employ cationic liposomes. the positively charged lipids in such liposomes bind directly with the cell membrane. such liposomes have been developed into commercially available systems that offer ease of preparation, stability and high transfection efficacy; examples include lipofectin™, lipofectamine™ and transfectace™ (gibco-brl), transfectam® (promega), and dotap (boehringer mannheim) (reviewed by chisholm, 1995). an alternative approach to the problem of inefficient liposome-cell membrane fusion involves the use of fusion-inducing virus glycoproteins. glycoproteins from several viruses (including parainfluenza viruses, paramyxoviruses, coronaviruses and retroviruses) have powerful cell-fusion promoting properties, and inclusion of virus glycoproteins responsible for these properties into the liposome lipid bilayer results in a more frequent and rapid occurrence of liposome-cell fusion (bailey and chernomordik, 1997). the major potential use of liposomes is in gene therapy, particularly with in vivo somatic approaches, for example in antitumour approaches (dass et al., 1997). in this context, liposomes rival viral vectors as a dna delivery method.
given that they consist only of biological lipids, liposomes have low toxicity. there is a theoretical risk for patients with lipid metabolism disorders; however, studies designed to assess this have suggested minimal actual risk (for example see tsuboniwa et al., 2001). antigenicity does not appear to be a problem either (yanagihara et al., 1995), and there are of course no concerns about viral proliferation within the host. cationic liposome-mediated gene therapy has been carried out using animals/animal models. for example, partial correction of the ion transport deficit in the cystic fibrosis mouse model was reported following instillation (hyde et al., 2000) and nebulization (stern et al., 1998) of liposomes. similarly, canonico et al. (1994) and losordo et al. (1998) have reported expression of haat and gh genes in rabbits following aerosol delivery and intravenous injection of liposomes. fusion-inducing glycoprotein-based liposomes have been used to deliver genes in vivo in several animal studies. gene expression has been reported following injection of such liposomes into various tissues including heart, kidney, liver, lung, skeletal muscle and testis; gene expression has also been reported following trans-arterial delivery (reviewed by yanagihara et al., 1995). although liposome-based gene transfer systems are now able to efficiently deliver transgenes to the cytoplasm, only in a minority of cells will transgenes reach the nucleus. this represents a significant limitation in terms of transgene stability and expression. one way of avoiding this problem might be to characterise and utilise the biochemical apparatus that permits some viruses (e.g. poxviruses: carroll and moss, 1997; moss, 1996) to exist and replicate in the cytoplasm. following liposome-mediated gene transfer, amongst transgene molecules reaching the nucleus, only a minority integrate into the host cell chromosomes.
transgene expression is therefore essentially transient, with reported durations of expression varying widely between separate studies: the range is from around 10 days to several months, with a typical duration of perhaps around 20 days (see reviews by ilies and balaban, 2001; yanagihara et al., 1995). for in vivo gene therapy, transient expression would necessitate repeated administration of liposomes, perhaps for the entire lifetime of the patient. however, this may not be a fundamental problem, given the previously mentioned low toxicity and antigenicity associated with liposomes, although a suitably non-invasive administration route would certainly be required. from a safety perspective, the non-integration that underlies transient expression would be positively beneficial for in vivo therapies, particularly in respect of oncogenesis but also with regard to the genetic integrity of the patients' germ cells. germline transgenesis is possible with liposome-mediated gene transfer, and es cells have been successfully transfected by liposomes (for example see pain et al., 1999). however, the rate of transgene integration into the genome is broadly similar to that of electroporation. as with electroporation, the relatively low integration frequency renders liposome-mediated transfection impractical for use with mammalian zygotes. however, an indirect approach to the germline via the zygote remains a possibility, where liposomes would be used to deliver transgenes to sperm cells. some success has been claimed for this approach (reviewed by smith, 1999). in summary, liposomes look set to become increasingly important as agents of transgene delivery, particularly for in vivo gene therapy. efforts are being exerted towards improving and developing methods in a number of respects, including the following areas (sections 3.3.3.1, 3.3.3.2 and 3.3.3.3).
3.3.3.1. improved site-specificity of in vivo treatment.
beyond simply changing the site/route of liposome administration, accurate organ targeting might be achieved by incorporating tissue-specific recognition molecules (e.g. receptor-binding proteins, antibodies) into the liposome membrane (tiukinhov et al., 1999).
3.3.3.2. improved nuclear targeting.
efficient translocation of exogenous dna to the host cell's nucleus might be achieved by incorporating nuclear-localisation proteins into the liposome complex (reviewed by boulikas, 1997). improved nuclear translocation should: (a) enhance expression levels; and (b) increase the frequency of chromosomal integration, thus increasing the scope for germline modification.
3.3.3.3. improved expression longevity.
if chromosomal integration could be induced (see previous section), expression could persist for the lifetime of the host cell; however, nuclear translocation without integration would be very useful in itself. the capacity of liposomes to carry dna molecules of great size (more than 150 kb has been reported; see for example strauss et al., 1993) may in future allow transfer of macs. long-lived expression would be an expected outcome.
electroporation is a process by which high-intensity electric field pulses temporarily destabilise cellular membranes. during the destabilisation period, dna molecules present in the surrounding media are able to permeate the cell's external and internal membranes to enter the cytoplasm and nucleoplasm (reviewed by lurquin, 1997). electroporation provides a fast and inexpensive means of introducing exogenous dna into cultured mammalian cells. electroporation is not associated with transgene mutation. the process can be equilibrated to yield copy numbers (of integrated transgenes) of between 1 and 20 copies per genome, an advantage compared with microinjection. large transgene molecules (≥150 kb) can be transferred.
in addition to the advantages of being able to transfer large conventional transgenes, the dna transfer capabilities of electroporation may in future allow transfer of macs (section 3.2.1). such constructs should be immune to nuclease attack and might be designed to replicate in step with the host cell cycle, thus providing long-lived transgene expression. the main drawbacks of electroporation are that: (a) specialised equipment is required; (b) each cell type and culture system requires fairly extensive empirical optimisation; and (c) typically only around 0.01% of treated cells show genomic integration of transgene (see excellent reviews by chang et al., 1992; lurquin, 1997; potter and cooke, 1992). in terms of mammalian transgenesis, electroporation is an effective method of introducing exogenous dna into es cells (chu et al., 1987). the advantage of electroporation over microinjection in the context of es cells is that electroporation allows the en masse treatment of large numbers of cells. this is extremely useful where a rare integration event requires selection from a background of unwanted integrations, as in gene targeting. similarly, electroporation has been successful with nt transgenesis (mccreath et al., 2000; schnieke et al., 1997). the relatively low efficiency of electroporation renders it impractical for use with mammalian zygotes. the best superovulation protocols deliver around 30 eggs per animal for mice and pigs, and 10 for cattle and sheep (wall et al., 1992). extrapolating from these figures, a 0.01% efficiency rate (one integration per 10 000 treated eggs) would necessitate on average more than 300 mice or pigs, and approximately 1000 cattle or sheep, in order to obtain just one transgenic. however, for (non-mammalian) species that produce large numbers of easily recovered eggs, electroporation shows more promise in terms of transgenesis. many fish species are potentially useful in this respect, and some successes have been claimed. for example, murakami et al.
(1995) report the successful use of electroporation to create transgenic medaka, as do ono et al. (1997), with the latter also reporting successful transmission of transgenes to f1 progeny. an attractive putative use of electroporation for transgenesis would be as an adjunct to sperm cell-mediated dna transfer. for example, gagne et al. (1991) report an increase from 12 to 19% of transgenic bovine blastocysts when electroporation is included in an otherwise passive sperm-dna uptake protocol. similar findings were reported by rieth et al. (2000), again with transgenic bovine blastocysts. several experiments have indicated that fish species may be able to be genetically manipulated in this way (for examples see walker et al., 1995; patil and hong woo, 1996). however, these results await replication, and big questions remain over the effectiveness or otherwise of sperm cells as vectors per se (smith, 1999). if it turns out that there is substance to claims that sperm cells can be induced to carry transgenes, then the technique's efficiency would have to be high. otherwise, electroporation of sperm cells could be as limited as it is (in principle) for zygotes, with excessively high numbers of animals needed in order to obtain each transgenic. the only way around this limitation would be via the development of a selection system for 'positive' sperm cells in vitro, a highly unlikely possibility, given that sperm cells exist in a quiescent state as far as gene expression is concerned. in terms of its use in gene therapy, electroporation has the potential to be used in vivo. this field is at a very early stage of development, but empirical improvements may in future permit electroporation to be used to deliver transgenes to particular tissues or to tumours (hofmann et al., 1999; swartz et al., 2001). clearly, this has great potential utility in the context of gene therapy.
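returning to the zygote figures quoted earlier in this section (roughly one genomic integration per 10 000 treated cells, around 30 eggs per mouse or pig, around 10 per cow or sheep), the number of donor animals implied can be checked with a short calculation. the sketch below is illustrative only: the function name `animals_needed` is hypothetical, and it assumes, as the extrapolation in the text does, that the 0.01% integration rate seen in cultured cells carries over unchanged to zygotes and that every recovered egg is usable.

```python
import math


def animals_needed(eggs_per_animal: int, eggs_per_integration: int = 10_000) -> int:
    """Donor animals needed, on average, to obtain one transgenic.

    eggs_per_integration is the reciprocal of the integration rate:
    0.01% corresponds to one integration per 10 000 treated eggs.
    Both defaults are simplifying assumptions taken from the text.
    """
    if eggs_per_animal <= 0 or eggs_per_integration <= 0:
        raise ValueError("both arguments must be positive")
    # Round up: a fraction of an animal cannot be superovulated.
    return math.ceil(eggs_per_integration / eggs_per_animal)


if __name__ == "__main__":
    for species, eggs in (("mice or pigs", 30), ("cattle or sheep", 10)):
        print(f"{species}: {animals_needed(eggs)} animals per transgenic")
```

under these assumptions the calculation gives a little over 300 animals for mice or pigs and 1000 for cattle or sheep, in line with the figures in the text; a species yielding thousands of eggs per animal would bring the requirement down to a handful of donors.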
moreover, given that there is no evidence that any particular somatic cell types are inherently unable to be successfully electroporated, the range of treatable tissues, and hence diseases, is potentially very great. ex vivo gene targeting approaches using electroporation are also under development. hatada et al. (2000) used electroporation to correct by gene targeting a defective hypoxanthine phosphoribosyltransferase gene in hematopoietic progenitor cells. the approach was similar to gene targeting with es cells or nt, in that selection was used to enrich for targeted outcomes. if similar successes can be obtained with appropriate stem cells in humans, it may be possible to return such targeted cells to the body of the patient such that repopulation by the altered cells yields a therapeutic or curative outcome for various genetic disorders. finally, electroporation has recently been used to transfer genes into cultured mammalian embryos at defined stages of development (akamatsu et al., 1999; osumi and inoue, 2001). the purpose here is to gain insights into mammalian development at the molecular level (as opposed to generating transgenic animals). in the experiments described by osumi and inoue (2001), plasmid dna was injected into the neural tube of rat embryos prior to electroporation. following in vitro maintenance of the electroporated embryos, exogenous dna was detected in 10-100% of the cells in the target region.
finally, it is worth considering a highly novel technique, as an illustration of the many and varied means by which emerging technologies are enabling gene transfer. in particle bombardment, dna may be adsorbed onto spherical tungsten or gold particles (diameter c. 4 µm) and transferred into a mass of cells by a particle gun; once inside the target cells, the dna is solubilised and may be expressed (klein et al., 1992).
this approach, sometimes known as 'biolistics', was originally developed for plant transgenesis but has been shown to be effective for transferring transgenes into mammalian cells in vivo (cheng et al., 1993). indeed, there are indications that biolistics may be more efficient than alternative methods such as liposome-mediated transfection and recombinant viral infection (gainer et al., 1996), although the amount of research data presently available is too little to permit definitive comparisons. if the method does prove to be effective in vivo, tumours are the most likely targets for particle bombardment (reviewed by mahvi et al., 1997). biolistics, then, is a promising method for treating cells en masse, and looks most useful in terms of somatic gene therapy. there have been no reported attempts to utilise biolistics for altering germline cells. the en masse nature of the approach places it in a similar position to that of electroporation or liposome-based methods in respect of fertilised eggs: impractically large numbers of eggs would undoubtedly be required per successful transgenic event. in principle it might be possible to apply biolistics to es cells as a route to the germline; however, to date no such attempts have been recorded.
the ability to transfer genes into higher animal cells is a prerequisite for continued progress in animal transgenesis and in human gene therapy. however, improvements in gene transfer are urgently required, particularly if hopes of effective gene therapies are to be realised. key aspects for attention include improved control of target cell range, improved transgene uptake efficiencies, the ability to localise transgenes to the nucleus and improvements in gene targeting to enable the efficient integration of transgenes into chosen genomic loci.
mammalian elav-like neuronal rnabinding proteins hub and huc promote neuronal development in both the central and the peripheral nervous systems lipid effects in liposome cell fusion mediated by influenza hemagglutinin an in situ transgenic enzyme marker for the midgestation mouse embryo and the visualization of inner cell mass clones during early organogenesis packaging capacity and stability of human adenovirus type 5 vectors chromosomal insertion of foreign dna nuclear localization signal peptides for the import of plasmid dna in gene therapy strategies for the isolation and purification of retroviral vectors for gene therapy targeted correction of a major histocompatibility class ii ea gene by dna microinjected into mouse eggs multiple applications for replication-defective herpes simplex virus vectors high mutation frequency in dna transfected into mammaliancells sheep cloned by nuclear transfer from a cultured cell line aerosol and intravenous transfection of human a-1-antitrypsin gene to lungs of rabbits poxviruses as expression vectors microinjection of somatic cells with micropipettes */comparison with other transfer techniques. biochem overview of electroporation and electrofusion in vivo promoter activity and transgene expression in mammalian somatic tissues evaluated by using particle bombardment high efficiency gene transfer into mammalian cells a generic intron increases gene expression in transgenic mice electroporation for the efficient transfection of mammalian cells with dna expression of human anti-hemophilic factor ix in the milk of transgenic sheep gene targeting in livestock: a preview ph-sensitive liposomes á/acid-induced fusion safety issues related to retroviral-mediated gene-transfer in humans. 
hum cationic liposomes and gene therapy for solid tumors conditional ablation of cerebellar astrocytes in postnatal transgenic mice targeted gene regulation and gene ablation integration of foreign dna and its consequences in mammalian systems from the natural evolution to the genetic manipulation of the host-range of retroviruses specific interaction between lex and lex determinants */a possible basis for cell recognition in preimplantation embryos and in embryonal carcinoma-cells gene targeting with retroviral vectors */recombination by gene conversion into regions of nonhomology the impact of mammalian gene regulation concepts on functional genomic research, metabolic engineering, and advanced gene therapies electroporation of bovine spermatozoa to carry foreign dna in oocytes successful biolistic transformation of mouse pancreatic islets while preserving cellular function advanced transgenic and gene-targeting approaches dna-mediated genetictransformation of mouse embryos and bone-marrow */a review genetic transformation of mouse embryos by micro-injection of purified dna transcriptional activation by tetracyclines in mammalian-cells transgenesis by means of blastocyst-derived embryonic stem-cell lines the safety of retroviral vectors gene correction in hematopoietic progenitor cells by homologous recombination amplified and tissue-directed expression of retroviral vectors using ping á/pong techniques electroporation therapy of solid tumors manipulating the mouse embryo design of retroviral vectors and helper cells for gene therapy foreign dna introduced into the vas deferens is gained by mammalian spermatozoa generation of genetically modified mice by spermatozoa transfection in vivo: preliminary results repeat administration of dna/liposomes to the nasal epithelium of patients with cystic fibrosis recent developments in cationic lipid-mediated gene delivery and gene therapy transgenic animals retrovirus-induced de novo methylation of flanking host sequences 
correlates with gene inactivity regulation of somatic-cell therapy and gene therapy by the food and drug administration doxycycline-mediated quantitative and tissue-specific control of gene expression in transgenic mice transformation of mirobes, plants and animals by particle bombardment influence of epigenetic changes during oocyte growth on nuclear reprogramming after nuclear transfer inducible gene targeting in mice the use of herpes simplex virus-based vectors for gene delivery to the nervous system production of alpha-1,3-galactosyltransferase knockout pigs by nuclear transfer cloning yac transgenics and the study of genetics and human disease paramyxovirus fusion */a hypothesis for changes biology and pathogenesis of retroviruses embryonic stem cells and gene targeting adenoviral vectors conditional control of gene expression in the mouse intracellular translocation of fluorescent sphingolipids in cultured fibroblasts */endogenously synthesized sphingomyelin and glucocerebroside analogs pass through the golgi-apparatus en route to the plasma membrane gene therapy for myocardial angiogenesis: initial clinical results with direct myocardial injection of phvegf165 as sole therapy for myocardial ischemia gene transfer by electroporation nonviral vectors for in vivo gene delivery: physicochemical and pharmacokinetic considerations dna cancer vaccines: a gene gun approach production of genetargeted sheep by nuclear transfer from cultured somatic cells engineering the mouse genome by site-specific recombination gene-transfer by retrovirus vectors occurs only in cells that are actively replicating at the time of infection genetic manipulation of embryonic stem cells human gene therapy genetically engineered poxviruses for recombinant gene expression, vaccination, and safety ten years of gene targeting: targeted mouse mutants, from vector design to phenotype analysis development of a localized electroporation system for transgenic medaka experimental transmission of 
human hepatitis-delta virus to the laboratory mouse liposomes as carriers for in vivo gene transfer and expression expression of human blood clotting factor viii in the mammary gland of transgenic sheep transgenic livestock, premises and promises ecdysone-inducible gene expression in mammalian cells and transgenic mice transgenic medaka fish bearing the mouse tyrosinase gene: expression and transmission of the transgene following electroporation of the orange-colored variant gene transfer into cultured mammalian embryos by electroporation chicken embryonic stem cells and transgenic strategies cell lineage ablation in transgenic mice by cell-specific expression of a toxin gene dramatic growth of mice that develop from eggs microinjected with metallothioneine-growth hormone fusion genes transmission distortion and mosaicism in an unusual transgenic mouse pedigree silencing of gene expression: implications for design of retrovirus vectors tetracycline-regulated suppression of amber codons in mammalian cells nuclear internalization of foreign dna by zebrafish spermatozoa and its enhancement by electroporation embryonic stem cells, creating transgenic animals gene transfer into adherent cells growing on microbeads rearrangement and mutagenesis of a shuttle vector plasmid after passage in mammalian-cells nuclear transfer technologies: between successes and doubts nuclear cloning and epigenetic reprogramming of the genome electroporation of bovine spermatozoa to carry dna containing highly repetitive sequences into oocytes and detection of homologous recombination events germ-line transmission of genes introduced into cultured pluripotential cells by retroviral vector transforming dna integrates into the host chromosome herpesviridae: a brief introduction stem cells from the mammalian blastocyst germ line transformation of mammals by pronuclear microinjection inducible gene expression in mammalian cells and transgenic mice sperm-mediated gene transfer by direct injection 
of foreign dna into mouse testis liposomes as gene carriers */efficient transformation of mouse l-cells by thymidine kinase gene human factor ix transgenic sheep produced by transfer of nuclei from transfected fetal fibroblasts intracellulardistribution and destruction of liposomes, following in vitro phagocytosis by alveolar macrophages efficient control of gene expression by single step integration of the tetracycline system in transgenic mice mammalian artificial chromosomes a modified tetracycline-regulated system provides autoregulatory, inducible gene-expression in cultured-cells and transgenic mice alteration of the quality of milk by expression of sheep beta-lactoglobulin in transgenic mice cytochrome-p450 1a1 promoter as a genetic switch for the regulatable and physiological expression of a plasma-protein in transgenic mice benefits and problems with cloning animals antisense and ribozyme constructs in transgenic animals gene-trap mutagenesis: past, present and beyond catalysis by site-specific recombinases the effects of jet nebulisation on cationic liposome-mediated gene transfer in vitro widespread long-term gene-transfer to mouse skeletal-muscles and heart germ line transmission of a yeast artificial chromosome spanning the murine a-1(i) collagen locus sparking new frontiers: using in vivo electroporation for genetic manipulations safety considerations in somatic genetherapy of human-disease with retrovirus vectors. 
hum tissuetargeting of acoustic cationic liposomes: potential for directed gene delivery the use of embryonic stem cells for the genetic manipulation of the mouse safety evaluation of hemagglutinating virus of japanartificial viral envelope liposomes in nonhuman primates transgenesis by adenovirus-mediated gene transfer into mouse zonafree eggs the recent progress on nuclear transfer in mammals temporal, spatial, and cell type-specific control of cre-mediated dna recombination in transgenic mice manipulating the mammalian genome by homologous recombination retroviral vectors */from laboratory tools to molecular medicines mouse cloning with nucleus donor cells of different age and type gene transfer by electroporated chinook salmon sperm transgenic livestock progress and prospects for the future making transgenic livestock */genetic-engineering on a large-scale specific ablation of thyroid-follicle cells in adult transgenic mice consequences of thyroid-hormone deficiency induced by the specific ablation of thyroid-follicle cells in adult transgenic mice positive and negative regulation of gene expression in eukaryotic cells with an inducible transcriptional regulator a regulatory system for use in gene-transfer manufacture of liposomes molecular biology of tumor viruses: rna tumor viruses the majority of g0 transgenic mice are derived from mosaic embryos germline and somatic mosaicism in transgenic mice viable offspring derived from fetal and adult mammalian cells transgenic technology in farm animals */progress and perspectives liposome-mediated gene transfer production of chimaeric mice containing embryonic stem (es) cells carrying a homeobox hox-1.1 allele mutated by homologous recombination key: cord-017208-7oew461e authors: aurigemma, rosemarie; tomaszewski, joseph e.; ruppel, sheryl; creekmore, stephen; sausville, edward a. 
title: regulatory aspects in the development of gene therapies date: 2005 journal: cancer gene therapy doi: 10.1007/978-1-59259-785-7_29 sha: doc_id: 17208 cord_uid: 7oew461e preclinical therapeutics development research is directed toward fulfilling two overlapping sets of goals. a set of scientific goals includes defining the best molecule or biologic construct for the task at hand, and proving the case for its development. the second set of goals addresses regulatory requirements necessary to introduce the agent into human subjects. in the case of “small molecule” drugs, in most cases the identity of the molecule and appropriate safety studies are straightforward. in contrast, the development of biologic agents, including gene therapies discussed here, presents distinct challenges. the nature of the “drug” may be an organism subject to mutation or selection of variants through recombination. its properties may vary depending on the scale and method of its preparation, purification, and storage. how to test adequately for its safety prior to first introduction in humans may not be straightforward owing to intrinsic differences in response to the agent expected in humans as compared to animals.
the general principles, however, in allowing "first-in-human" experiences are similar for both small molecules and biologics. the ethical conduct of clinical trials in patients with a dire or life-threatening disease demands an understanding of the identity and dose of an agent that has the possibility of causing clinical benefit with adverse events expected at worst to be easily reversible and well predicted by the preclinical experience. in normal volunteers or patients who are otherwise well, evidence should be gathered that would support an initial range of doses of the test agent expected to be without substantial toxicity or long-term effects. thus, the successful clinical introduction of a novel therapeutic concept requires an organized approach to integrate scientific, technical, and regulatory requirements. this integration should begin in the research laboratory, as the concept becomes a candidate for the clinic, to prevent avoidable and expensive delays in clinical development. for example, if a product is created using a mammalian cell line for which viral or other contamination has not been ruled out, costly rederivation will be required before that product can be manufactured for clinical trials using current good manufacturing practices (cgmp). on the other hand, during the discovery phase, an excessive and premature concern over cgmp compliance can impede research. therefore, a clear strategic understanding of the principles underlying regulatory issues is desirable and is the goal of this chapter.
we proceed from the experience of the developmental therapeutics program (dtp) of the national cancer institute in the manufacture of biologicals, including gene therapy constructs for preclinical and clinical use. we outline the basis for our approach to safety testing studies to be included in an investigational new drug (ind) application to the food and drug administration (fda). we focus on studies that would allow phase i and perhaps early phase ii clinical trials. in 2002, dtp contributed to over 40 different cgmp biological projects. most of these activities were selected competitively from applications received from academic researchers or from the intramural laboratories of the national institutes of health (nih). dtp products include viruses for vaccines or gene therapy, plasmids, monoclonal antibodies, recombinant proteins, synthetic peptides, natural product fermentations, and oligonucleotides. during the 2002 fiscal year, over 30 different lots were manufactured and released under cgmp for clinical use or further cgmp manufacturing. most products are destined for phase i or phase ii clinical trials in cancer. beyond early (phase i/ii) clinical trials, technology transfer for some products has occurred, with further development through phase iii now addressed by pharmaceutical companies. based on experience accumulated over several years, we abstracted the initial profiles of the more successful concepts (table 1) , as well as some early project characteristics that can impede clinical development ( table 2) . we note correlations between the thoroughness of the early research, attention to "the rules," outlined in the references cited here, and the development of commercial interest in the product. gene therapy and other biologic therapeutics are regulated within the fda by the center for biologics evaluation and research (cber), which was created in 1972 to address products emerging from the new biotechnology. 
reorganization at the fda is currently under way that will result in regulation of many biotherapeutics by the center for drug evaluation and research, which has oversight of small molecule drugs. it is anticipated, however, that blood products, vaccines, and gene therapy products will remain with cber. the biological response modifiers advisory committee is a chartered advisory group with the role of advising the fda to ensure the safety and effectiveness of biological products, including gene therapy. the recombinant dna advisory committee (rac) also oversees gene therapy research through the nih office of biotechnology activities. the rac was established in 1974 in response to public concerns regarding the safety of recombinant deoxyribonucleic acid (dna) technology. human gene transfer trials in which nih funding is involved (either directly or indirectly) are to be submitted to the rac for review.

table 1 beyond a good idea: what the successful investigator has already done with a project leading to commercial development
defined candidate biologic (or molecule)
made comparisons with similar products
characteristics of product are consistent with pharmaceutical requirements
production scale is adequate
product characterization is adequate
laboratory reference standard exists
in vitro potency assay has been developed
stability studies develop confidence product is a "drug"
reproducible model systems have confirmed in vivo activity with clinical product
early animal work includes some toxicology
scale-up requirements practical for initial clinical trials
in general, reflects experience and scientific maturity of investigator

in addition to the us agencies that develop the regulations that govern drug development and licensing, the international conference on harmonization (ich) was formed in april 1990 involving the united states, the european union, and japan to address the issue of globalizing such regulations. 
the ich steering committee meets at least twice a year to continue their agenda of updating and harmonizing regulations for medicinal products; they emphasize safety, quality, and efficacy. expert working groups were formed within ich to address specific topics related to these basic areas. although the fda has not formally adopted all of the ich guidelines, these guidelines should be followed when they exist in preliminary form. for investigators planning to conduct investigational drug trials in foreign countries, it is imperative that they be familiar with, and adhere to, the regulations set forth by ich. in 1996-2001, a series of fda and ich guidance documents on characterization and preclinical safety evaluation of biotechnology-derived pharmaceuticals was developed (1-8). these guidances represent the fda's current thinking on preclinical safety evaluation of biotechnology-derived pharmaceuticals. these are defined as products derived from characterized cells using a variety of expression systems, including bacteria, yeast, insect, plant, and mammalian cells. the active substances may include proteins and peptides, their derivatives, and products of which they are components. these materials could be derived from cell cultures or produced using recombinant dna technology, including production by transgenic plants and animals. examples include cytokines, enzymes, fusion proteins, growth factors, hormones, monoclonal antibodies, plasminogen activators, recombinant plasma factors, and receptors. the intended indications for use in humans may include in vivo diagnostic, therapeutic, or prophylactic uses. the principles outlined in these guidance documents may also be applicable to recombinant dna protein vaccines, chemically synthesized peptides, plasma-derived products, endogenous proteins extracted from human tissue, and oligonucleotide drugs. 
the fda defines gene therapy as "a medical intervention based on modification of the genetic material of living cells" (9). cells may be modified ex vivo for subsequent administration to humans or may be altered in vivo by gene therapy given directly to the patient. when the genetic manipulation is performed ex vivo on cells that are then administered to the patient, this is also considered a form of somatic cell therapy (9). "the genetic manipulation may be intended to have a therapeutic or prophylactic effect or may provide a way of marking cells for later identification. recombinant dna materials used to transfer genetic material for such therapy are considered components of gene therapy and as such are subject to regulatory oversight." specific information related to gene therapy issues is contained in the 1998 "fda guidance for industry: guidance for human somatic cell therapy and gene therapy" (9). this guidance document updates and replaces the 1991 fda "points to consider" on this subject (9a). new information was intended to provide manufacturers with current information regarding regulatory concerns for production, quality control testing, and administration of recombinant vectors for gene therapy and of preclinical testing of both cellular therapies and vectors.

table 2 issues requiring attention at the outset of a project
inappropriate antibiotic selection markers (e.g., ampicillin for recombinant proteins)
lab-scale affinity purification
solubility problems
low yield
errors in genetic sequence
extraneous genetic material
poorly defined production systems
inadequate purification schemes
unvalidated or nonexistent in vitro potency assay(s)
lack of key reagents (e.g., antibodies to desired product)
poor biochemical characterization
inappropriate raw materials
raw material qualification problems
inappropriate cell banks
difficult or unidentified toxicology systems
failed vendor qualification
intellectual property concerns

the fda defines somatic cell therapy as "the administration to humans of autologous, allogeneic, or xenogeneic living non-germline cells, other than transfusable blood products, for therapeutic, diagnostic, or preventive purposes." the evaluation of the safety of gene therapy products is perhaps one of the more difficult tasks that faces toxicologists in the drug development arena today. because many of the agents, like other biologicals, are species specific and because these agents integrate into the host genome, the choice of animal models and study designs is fraught with uncertainty, and each product frequently breaks new regulatory ground. until recently, many investigators working in this field were probably lulled into a false sense of security because of the close scrutiny that preclinical studies and clinical protocols received from the nih rac and the fda. with the death of jesse gelsinger, a patient enrolled in a gene therapy clinical trial to correct a metabolic disease, in 1999 and the recent reports of a leukemia-like disease produced in children who received gene therapy treatments to correct severe combined immunodeficiency disease (scid) (10-13), the safety of these agents is called into question more than ever. as a result, the toxicologist is under even more pressure to design more rigorous safety evaluation programs. there have been a number of reviews in this area in recent years by toxicologists from the fda (14), industry (15, 16), and international workshops (17) that cover many of the fundamentals regarding safety evaluation of gene therapy products. these resources, in conjunction with this chapter and the various guidance documents from the fda and the ich (7, 9, 18), can be used by toxicologists to develop sound safety programs. these issues are discussed in greater detail in the latter half of this chapter. 
the basic foundation of regulations for drug development can be found in the code of federal regulations, title 21 food and drugs (21 cfr; 19-26). in addition to title 21, fda maintains an extensive number of web sites containing regulatory information that should be consulted during the development of a novel biotherapeutic. the collection of available regulatory information includes points to consider, guidance documents, drafts, and reports from public forums and symposia as well as information on the meetings of the biological response modifiers advisory committee. ich also sponsors a web site for obtaining the most recent guidelines. a free subscription to an e-mail advisory update service is also available (table 3). in addition to the regulatory guidelines provided by the fda, the nih has published, and frequently updates, the nih guidelines for research involving recombinant dna molecules, which can also be found electronically (table 3). although published documents disseminated by the fda and nih are essential starting points for planning a cgmp development strategy, it is important to realize that, in this rapidly evolving field, some requirements may be reflected in public comments or a growing consensus among industry long before they are formally adopted. furthermore, it is not unusual during the development of a new biologic to have also developed alternatives to conventional practices that are based on sound scientific data and are then implemented after discussions with the fda. product-specific factors can influence the regulatory requirements for an investigational agent. these issues should be explored in detail with the fda in a pre-ind meeting at which the ind sponsor presents relevant preclinical data and manufacturing and animal safety testing to support the proposed approach to clinical development. 
the types of further studies pertinent to the particular agent can then be proposed, and input from the agency can help shape the final development plan. interactions should take place with regulatory authorities at intervals that will facilitate the development of a product (table 4). a key issue frequently not understood is that regulatory demands become more stringent as a product moves from phase i (safety), through phase ii (activity), to phase iii (comparative efficacy) trials and licensure (6, 27-31). this philosophy reflects the conscious effort not to stifle innovation in early phase clinical testing, but to ensure that, by the time registration-oriented late-stage trials are contemplated, issues related to production variability, assay, and assurance of safety are mature and well-substantiated because the results of such trials could be the basis for sale of the agent to the public. another factor that affects the level of regulatory compliance is the nature of the study population. products manufactured for phase i trials in healthy normal volunteers typically must meet much stricter requirements than those studied in patients with dire, life-threatening conditions (e.g., cancer or end-stage acquired immunodeficiency syndrome). as improved technology becomes available, requirements also tend to increase (27-31). the level of regulatory compliance to be followed during different stages of development is dependent on the type of biologic product and the technology available for supporting its development. assays, methods, and technologies for monoclonal antibody development (32), for instance, are better defined than the techniques available for some of the new virus vectors that are emerging. furthermore, new technologies to support product development are also constantly evolving. 
the number of specific viral contaminant tests required of cgmp human cell lines, for example, has increased steadily as new pathogens are identified and assays become available. as new scientific knowledge accumulates, novel regulatory challenges can appear. the issue of transmissible spongiform encephalopathy, for example, has resulted in stringent requirements in raw material qualifications and traceability (33) . to minimize the impact of regulatory changes, careful record keeping of all processes and materials involved in deriving the product is highly recommended. finally, because of the availability of improved techniques for characterizing certain biologicals, the fda is reorganizing its regulatory approach in ways that are analogous to the regulation of small molecule drugs. technical demands will rise as regulatory requirements become more standardized. beyond identification and confirmation of an interesting novel concept, a major challenge in the preclinical development of biologicals is the optimal allocation of research and development resources. key to this is proper assessment of a candidate concept's readiness for clinical development. all applicants for the national cancer institute's biologicals production resources now receive a list of "generic questions" corresponding to the appropriate product type. at the beginning of a project, it is not always reasonable to expect all issues to be resolved, but the assumption is that, for a successful candidate, these issues should be in hand by the time the project is completing phase i clinical testing. table 5 lists the generic questions for cgmp production of recombinant virus vectors. because it is not possible to provide a complete guide to cgmp development in a few pages, we highlight some concerns common to many projects arising from academic laboratories. 
these are based on dtp's experience (both successes and failures) with projects making the transition from the preclinical research phase to pilot clinical studies. our discussion is organized primarily around the concepts of identity, purity, potency, and safety that underlie development, manufacture, and release. from the viewpoint of regulatory compliance, it is essential to establish the identity of the product and the components used to generate it during manufacturing (9, 22) . we have noted that, frequently in proposed gene therapy or recombinant dna-derived products submitted to us, dna sequencing shows some deviation from the sequence published and/or submitted by the investigator, sometimes with major consequences for the project. when the dna product, such as plasmid vaccines, will be administered to the patient, full plasmid sequencing has occasionally revealed unacceptable genetic sequences outside the open reading frame, as passengers from previous experiments, spurious promoters, frame shifts resulting in translation of nonsense sequences beyond the intended termination, and so on. dna sequencing and repair are available at relatively low cost compared to the cost of repeating critical preclinical experiments. the fda now requires complete sequencing of vectors of sizes up to 40 kilobases (kb) (34) . for viral vectors, genetic stability is a major concern, particularly with respect to the possible issues of recombination with generation of replication-competent viruses. specific guidelines are provided for adenovirus, retrovirus, and lentivirus vectors (9, 35, 36) . for other virus vectors, specific assays (e.g., neurovirulence testing of recombinant poliovirus and herpes virus vectors) are required to ensure that an attenuated phenotype is preserved after scale-up. if possible, the investigator should attempt to assess the genetic stability of the vector during preclinical studies, after administration in vivo or propagation in vitro. 
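the sequence check described above, comparing the sequenced construct against the sequence submitted by the investigator, can be sketched in a few lines of python. this is only an illustrative sketch under stated assumptions: the function name `sequence_deviations` and the example sequences are hypothetical, the comparison uses the standard-library `difflib` module rather than a dedicated alignment tool, and a qualified sequencing laboratory would use validated software for vectors approaching 40 kb.

```python
from difflib import SequenceMatcher


def sequence_deviations(reference: str, observed: str):
    """List every region where the observed sequence departs from the reference.

    Returns tuples of (kind, 1-based reference position, reference bases,
    observed bases), where kind is "replace", "delete", or "insert".
    Hypothetical helper for illustration only.
    """
    # Normalize: drop whitespace and line breaks, compare in upper case.
    ref = "".join(reference.split()).upper()
    obs = "".join(observed.split()).upper()
    # autojunk=False so that frequent bases are not ignored on long inputs.
    matcher = SequenceMatcher(None, ref, obs, autojunk=False)
    return [
        (tag, i1 + 1, ref[i1:i2], obs[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical short open reading frame with one substitution at position 8.
submitted = "ATGGCCAAGTAA"
sequenced = "ATGGCCAGGTAA"
for deviation in sequence_deviations(submitted, sequenced):
    print(deviation)
```

an empty result means the two sequences agree; any reported deviation (for example, a frame shift or passenger sequence outside the open reading frame) would need to be reviewed and, if necessary, repaired before critical preclinical experiments are repeated.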
in addition to the gene therapy product itself, the cells used to produce the product must be similarly identified and qualified for cgmp manufacturing. excellent guidance documents are available for the production of master cell banks, working cell banks, and master virus banks (9, 32, 37). at minimum, the complete cell history should be known and documented, and the cells should be tested to verify their origin. peptide sequencing or mapping employing liquid chromatography-mass spectrometry is typically used to provide critical information for synthetic peptides and recombinant proteins. for recombinant vectors containing transgenes, the expression of the desired gene product should be verified, for example, by immunoassay using a specific antibody against the product.

table 5 generic questions for cgmp production of recombinant virus vectors
   a. non-gmp for additional preclinical development
   b. cgmp (clinical grade)
2. provide details of molecular construct(s), including starting materials (e.g., plasmid, relevant vector maps, detailed vector construction scheme, and so on)
3. does the construct contain an antibiotic resistance gene or other selectable marker? are alternative methods of selection available? why was the proposed selection chosen?
4. is the vector replication competent or replication defective? (for replication-selective vectors, what is the molecular basis of the selectivity and the conditions under which the vector would replicate?)
5. does the vector have an altered cell tropism? define. what is the effect of altered tropism on anticipated host toxicity?
6. has this construct been sequenced? provide a sequence in an electronic format.
7. are data available evaluating the genetic stability of the recombinant vector? have mutation rates been established and/or rates of reversion to either wild-type or alternate viral genomes?
8. are data available evaluating the potential for genetic recombination with other organisms in the patient or in the environment?
9. is the organism currently grown in a qualified cgmp cell line? if not, is there a qualified cell line available for propagation of this vector? was the cell line genetically modified to support this vector? provide details of its construction and any information regarding the stability of the genetic alteration in the cell line.
10. is there a cgmp-qualified virus seed bank?
11. provide details of the proposed production method.
12. has this material ever been produced for laboratory or clinical studies using this production system?
13. has this material been produced in a related or other production system? if so, please provide the details.
14. please provide details of existing purification methods.
15. what is the average yield of the production system before and after purification? what is the largest amount of material that you have produced in your laboratory in a single production batch? please provide average ratios obtained by this production method for virus particle/infectious unit and infectious units/cell. how does this scale to anticipated quantities for clinical trial?
16. how much material is available as a reference standard?
17. is material available as bulk biological substance for preliminary pharmacology and toxicology studies?
18. are there reproducible assays for the product? please provide the following assays, if available:
   a. identity
   b. purity
   c. potency
19. what are to be the release criteria for the product? how does one know that a lot of product is qualified for use?
20. in what form (lyophilized, formulated product, and so on) and fill size is the desired final product? what is the desired final product formulation?
21. are there issues of formulation that must be resolved?
22. what is known about the product stability with respect to physical integrity and activity?
23. do you have any information regarding the estimated costs of this production project?
24. have you identified any possible sources of production with any commercial firms? please provide details.
25. are there any safety issues connected with the production, purification, and/or handling of the product?
26. what is the status of the product(s) regarding intellectual property issues?
27. sometimes, proposed projects are an improvement or modification of an existing approach. in these cases, this information may significantly affect the analysis of feasibility, cost, and other production issues. please provide a brief summary of the nature of any such antecedents or other approaches that appear closely related to the proposed project.
28. have there been any meetings scheduled with regulatory agencies, such as a pre-ind meeting with the fda or a presentation to the nih rac? if so, please indicate the type of meeting, the regulatory agency, and the date or proposed date.
29. if you have had a pre-ind or rac meeting, were any issues concerning manufacturing, safety, or stability raised that will have an impact on producing your product?
30. who will sponsor the ind for the proposed study?
31. has a source of funding been identified for performing the clinical trial with this product?

purification strategies depend on the nature of the biologic agent and expected impurities. these approaches are guided by the early development of reliable analytical techniques appropriate to the product and to the manufacturing approach. for example, purification of recombinant proteins and monoclonal antibodies for cgmp manufacture typically involves large-scale chromatography based on multiple isolation principles (e.g., charge, hydrophobicity, size, and so on). specific contaminants, such as dna, endotoxin, viruses from mammalian cell production systems, contaminants introduced in the manufacturing process, and the like must be quantified and may require additional specific purification measures to remove or inactivate them. 
problems in refolding or solubility, tendencies to aggregate, and product stability at intermediate holding points can be significant issues in process development for scale-up. these represent key challenges in scale-up from investigator laboratory-generated lots to a potentially suitable scale to allow clinical testing. additional concerns include subtle degradations of proteins that can lead to undesirable immunogenicity. a major concern is the impact of each additional step on the downstream product, which should be reassessed using in vitro potency assays as well as physicochemical characterization. at major development milestones, selected in vivo models should be reexamined using purified product. production cells must be cgmp qualified and tested for adventitious agents and other contaminants, before initiation of production as well as at end of production. a number of cgmp-qualified cell lines and starting vectors are available commercially at relatively low cost and should be considered for use as raw materials to initiate cgmp seed banks in preference to shared materials of uncertain provenance despite the good intentions of the original provider. in the handling of cell lines, care should also be taken to avoid contamination (e.g., from media components, trypsin, or activities taking place in nearby laboratory space). postproduction cells can be tested for specific contaminants in the presence of a viral product (e.g., using polymerase chain reaction [pcr]). in the presence of virus product, however, it is unlikely that the full set of cgmp tests (e.g., cocultivation) for adventitious agents can be performed on postproduction cells. before initiating cgmp production, therefore, investigators should consider the parallel propagation of a mock-infected control to provide a surrogate postproduction test article. 
in addition to the usual tests for sterility and purity of purified investigational product (9, 20, 21) , it is important to have an assay for residual host cell dna. assays for host cell proteins are not always required for all phase i products, but are required for phase ii and beyond. this consideration is another reason to start with cgmp-qualified cell lines from a commercial source because host cell dna and protein assays may already be available. the general requirement for adventitious agent testing is given in guidance documents (e.g., ref. 9). it is important to note that some specific assays are not yet described in published fda documents, but can enter widespread cgmp practice by sponsor-based industry consensus, liability considerations, or other factors. endotoxin assays are available as kits, which are useful to guide laboratory process development. a qualified good laboratory practice (glp) laboratory, however, should perform endotoxin assays for clinical product release. specific assays may also be required to quantify process residuals from production and purification components (cscl, antibiotics, and so on). in production facilities, particularly those where different types of gene therapy products may be produced, assays are necessary to support decontamination and cleaning, product changeover, environmental monitoring, and raw material qualification (25, 38) . in general, all equipment that has contact with the investigational product should be verified free of contaminants before use. special consideration must also be given to assays to qualify virus seeds and end product for the presence of defective particles, replication-competent viruses, or defective genomes. in addition to monitoring for replication-competent and/or pathogenic vectors during manufacture, suitable assays may also be required to monitor patients receiving the therapeutic agent. 
in this case, levels of sensitivity for detection must be suitable for different types of patient specimens (serum, urine, sputum, and so on). evolving requirements for long-term follow-up of gene therapy patients (39) should be consulted to ensure that proper assay support is maintained beyond the duration of the planned clinical trial. the fda regulations governing the performance of assays that support the production of biologics for human use can be found in 21 cfr 211, subpart i, "laboratory controls" (25). glp regulations only pertain to the performance of preclinical animal and in vitro studies (23). the measured activity of a biological candidate depends on the hypothesized molecular mechanism of therapeutic action (21 cfr 600.3; 9, 19). although it is most reassuring to see in vivo demonstration of efficacy in appropriate animal models, the efficient development of a cgmp process will strongly benefit from the availability of rapid, reliable, and reproducible in vitro assays relating to the mechanism of action in addition to assays for purity and identity. assays based on the basic therapeutic mechanism, therefore, are critical goals of early research and development. formulation development should begin as early as possible as suitable assays become available and experience with real-time stability accumulates. it is preferable to choose formulations from candidates likely to be acceptable to the fda, such as those whose components are already used for licensed products. as production reaches larger scales, handling and storage considerations become increasingly important. stability studies incorporate assays for identity, purity (including aggregation), and potency. although they can provide some useful information, accelerated stability studies are typically not reliable for predicting real-time stability of biologics. therefore, there is a need for real-time stability studies to be initiated as soon as possible. 
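a real-time stability program amounts to a dated list of pull points, each with the identity, purity, and potency assays named above. the following python sketch shows one way such a schedule could be laid out. the pull intervals (0, 3, 6, 9, 12, 18, and 24 months) are assumptions chosen for illustration and are not taken from this chapter or from any specific guidance; the function names are hypothetical.

```python
import calendar
from datetime import date

# Assay panel taken from the text: identity, purity (including aggregation), potency.
ASSAYS = ("identity", "purity (including aggregation)", "potency")


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month (e.g., Jan 31 -> Feb 29).
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def stability_schedule(release_date: date, pull_months=(0, 3, 6, 9, 12, 18, 24)):
    """Build (month offset, pull date, assays) entries for a real-time study.

    The default pull_months are illustrative assumptions, not a regulatory requirement.
    """
    return [(m, add_months(release_date, m), ASSAYS) for m in pull_months]


# Hypothetical lot released on 15 March 2024.
for offset, pull_date, assays in stability_schedule(date(2024, 3, 15)):
    print(f"month {offset:>2}: {pull_date.isoformat()}  {', '.join(assays)}")
```

the actual intervals and duration would be set for each product and agreed with the fda, and the schedule should extend through the planned clinical trial.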
suitability of formulated product should also be assessed in the identical administration and handling conditions expected in the clinic. this may include transient exposure to conditions expected during transit to the study site and storage in an environment that closely mimics study site conditions. these may result in markedly different product behavior at the study site from that expected from the behavior of vouchered specimens at a central repository site. the results of ongoing stability studies are useful to support process development; to evaluate product at intermediate hold points in scale-up production and at product release; and to support formulation development, product storage, shipment, and handling during toxicology studies and clinical applications. as development proceeds, master specifications for release of intermediate and final product should be established and refined. the ind must indicate a schedule for real-time stability studies to be performed throughout the duration of the clinical trial. some key early milestones common to all product areas include the attainment of an adequate scale of high-quality, single-batch production, the availability of adequate amounts of high-quality laboratory reference standard, and the development of reliable assays for identity, purity, and potency of the product. these milestones are necessary in addition to the exploration of animal models showing safety and promising evidence of efficacy. at early stages in a project, investigators should expect substantial variability in product quality, assays, and animal models. ideally, therefore, a single high-quality batch should be used to establish laboratory standards, support multiple assay qualification runs, and perform replicate animal model experiments. multiple production runs could then be performed to explore process development issues, including scale-up. 
in this way, fundamental issues could be explored at the research stage to prepare for development required for cgmp manufacture. following this reasoning, our facility often manufactures high-quality glp lots to provide a uniform supply of material for additional preclinical research and development for selected biologics of interest before making the decision to undertake cgmp production. the early establishment of certain aspects of glp (21 cfr 58) is crucial to the advancement of a drug development project. by following simple rules of laboratory cleanliness, documentation, and segregation of materials and activities at the start of development, time can be saved by avoiding the need to repeat experiments that were not properly performed or documented from the outset. at the discovery phase, development of reliable assays to explore basic therapeutic mechanisms of action is just as important as the performance of animal models in laying the groundwork for future cgmp product development. laboratory facilities and staff should be adequate to perform necessary studies. assay protocols should be specific and reproducible. research documentation should be kept at a glp level with complete and secure laboratory notebooks. records of all reagents (i.e., manufacturer, catalog number, lot number, certificates of analysis [coas], and expiration dates) should be routinely archived. even if cgmp production or testing is not contemplated in the development laboratory, fluctuations in product activity are not unusual during later scale-up, and these materials and information may be useful in resolving such issues. key assays for product or reagent identity should be repeated at appropriate intervals. access to critical raw materials and reagents, reference standards, and cell banks should be limited. staff should avoid comingling of research-grade, glp, and gmp activities or reagents by labeling reagent containers and sequestering them as much as possible. 
similarly, signs should be posted on dedicated equipment, and access should be limited as appropriate. if common equipment must be used, standard operating procedures should be developed to define the use and control of such equipment, to clean equipment before and after use to avoid cross-contamination, and to document the use, cleaning, and calibration of the equipment. critical raw materials (e.g., those used in seed development or pilot product manufacture) must be traceable to their source and obtained from reliable vendors. it is beneficial when possible to use vendors subjected to commercial audits. animal-derived reagents should be avoided; reagents such as glycerol, detergents, proteins, amino acids, and the like should preferably come from vegetable sources. if this is not possible, animal-derived reagents should come from acceptable herds in countries without endemic or questionable transmissible spongiform encephalopathy (33) . it is important to ensure that raw materials are stored under appropriate conditions and not used beyond their expiration date. inventories and logbooks should be used to track use of important reagents. critical materials requiring special storage conditions should be stored at more than one location to prevent loss in the event of equipment failure. cgmp-qualified cell lines should be purchased from vendors if possible, but if cgmp-qualified cell lines do not exist, cell lines should be obtained directly from a reliable repository source such as the american type culture collection (atcc) and documentation should be archived. incoming cell lines must be tested for sterility and mycoplasma contamination. thorough records should be kept on cell passages, observations, frozen storage, and the coas from media and other components used to grow, freeze, and otherwise manipulate the cells. critical cell lines should be segregated to prevent cross-contamination. 
stock cells should not be cultured in incubators containing virus-infected cell lines. vectors should be purchased from a reliable vendor, and documentation should be kept on the propagation, storage, and use, including coas, lot numbers, and so on of the reagents used to propagate the vector. if the vector is acquired from another laboratory (i.e., is unavailable for purchase from a vendor), a detailed history should be obtained of the generation of the vector, and detailed records should be kept from that time. all genetic manipulations of the vectors should be well documented and verified by sequencing. a lab-generated reference standard is a critical raw material for a biologic. it is ideal to have a large enough stock of this reference standard to use for the duration of the development work. it is often not possible, however, to produce sufficient quantities or material of sufficient stability at the early development stages. for this reason, it may be necessary to produce fresh batches and to test them thoroughly against independent standards or the current standard before that standard is depleted or loses potency. the same consideration should be given to other critical reagents, such as cell lines and compounds obtained from outside sources. reference standards and key reagents should be made or obtained in adequate amounts, characterized as well as possible, and stored under conditions that will maintain stability for at least the duration of the development process. as improved manufacturing and assay processes are developed, improved reference standards will be required, but quantities of the original standards should also be preserved to provide material for later comparisons as required (40) . in some cases, such as for retrovirus and adenovirus vector development, the fda has made available reference material against which all sponsors can standardize their own reference reagents. 
to avoid future questions about data reliability, investigators should consider outsourcing of difficult but common technologies, such as transmission electron microscopy, tandem liquid chromatography mass spectroscopy, peptide mapping, or dna sequencing, if these are not adequate in their facility. product-specific assays, such as potency studies and pilot animal efficacy and toxicity studies, are likely to be performed best at the researcher's own facility, early in development. preclinical assay protocol development and record keeping must ensure that data are useful for later ind submission. at some point in cgmp process development, consideration should be given to technology transfer of critical assays to a glp-compliant laboratory prepared to support the repetitive studies required during cgmp process development, manufacture, release, and postrelease stability studies for the duration of the clinical trial. toxicology material should ideally be manufactured using the same process used to manufacture the cgmp clinical material. therefore, a toxicology lot is produced late in process development. if there are concerns over batch-to-batch variability, production of a single lot for both toxicology and the initial phase i trial is recommended. typically, a toxicology lot can be available several months before the clinical lot is ready for release. for studies involving autologous cells, the handling of cells must be under gmp conditions to preserve sterility and prevent cross-contamination with other cells. for allogeneic cells, it is important to use a cgmp-tested cell line with adequate traceability, including its origin, passage history, and exposure to media products that may have been derived from animal sources. starting material must be routinely checked for sequence accuracy; therefore, the complete plasmid should be sequenced. 
it is also preferable to examine genetic stability, because instability can lead to the introduction of coding errors and changes in protein expression. the use of penicillin-like antibiotics (β-lactams) for selection is unacceptable because of the possibility of allergic reactions in patients administered products produced using this selection system. other antibiotics, such as kanamycin, are substituted, or alternative methods of selection are employed. measures of dna quality include supercoiled content, as well as assays for endotoxin, genomic dna, and other contaminants from the production system. more in-depth guidance is available through guidance documents (9). it is generally recommended by the fda that a vector smaller than 40 kb must be completely sequenced (34). as technology improves, this criterion may well be expanded to include larger vector genomes. those vectors with genomes larger than 40 kb (e.g., herpes viruses, poxviruses) must have the transgene sequenced along with 5' and 3' flanking regions and any significant modifications to the vector backbone or sites vulnerable to alteration during molecular manipulation. when qualified vaccine strains exist for the vector of interest (e.g., vaccinia, poliovirus) it is preferable for cgmp manufacture to derive investigational constructs using the vaccine strain if available from the nih, atcc, or commercial sources. for adenovirus vectors, the recent availability of an adenovirus reference standard allows for the normalization of dosing based on virus particle concentration and infectious unit (iu) titer. current recommendations by the fda are for a ratio of viral particle to infectious unit of less than or equal to 30:1 (35). for replication-defective adenoviruses, generation of replication-competent adenoviruses (rca) must be measured in lots produced for clinical use. the current target requirement is fewer than 1 rca per 3 × 10^10 virus particles as measured by a cell culture/cytopathic effect method (35). 
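the two adenovirus targets quoted above (particle-to-infectious-unit ratio of at most 30:1, and fewer than 1 rca per 3 × 10^10 virus particles) can be written as a simple arithmetic check. the sketch below is illustrative only; the function and parameter names are hypothetical, and the requirement that at least 3 × 10^10 particles be tested before a negative rca result counts is our own assumption about assay sensitivity, not a statement of fda policy.

```python
MAX_VP_PER_IU = 30.0        # target ratio quoted in the text (35)
RCA_REFERENCE_VP = 3e10     # "fewer than 1 RCA per 3 x 10^10 particles"


def particle_to_iu_ratio(vp_per_ml: float, iu_per_ml: float) -> float:
    """Ratio of total virus particles to infectious units."""
    if iu_per_ml <= 0:
        raise ValueError("infectious titer must be positive")
    return vp_per_ml / iu_per_ml


def meets_adenovirus_targets(vp_per_ml: float, iu_per_ml: float,
                             rca_detected: int, particles_tested: float) -> bool:
    """Check a lot against the two targets described above."""
    ratio_ok = particle_to_iu_ratio(vp_per_ml, iu_per_ml) <= MAX_VP_PER_IU
    # ASSUMPTION: at least 3e10 particles must be tested for the result to count.
    enough_tested = particles_tested >= RCA_REFERENCE_VP
    # "fewer than 1 per 3e10" means rca / tested < 1 / 3e10 (cross-multiplied).
    rca_ok = rca_detected * RCA_REFERENCE_VP < particles_tested
    return ratio_ok and enough_tested and rca_ok
```

for example, a lot at 1 × 10^12 particles/ml and 5 × 10^10 iu/ml has a ratio of 20:1 and would pass the first target; a single rca detected in exactly 3 × 10^10 particles would not satisfy "fewer than 1".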
for viruses that are replication selective, different testing strategies (e.g., quantitative pcr) may be called for and should be discussed with the fda. similarly, the agency may have special considerations for viruses with altered tropism to ensure appropriate containment and prevent the generation of a replication-competent adenovirus with an expanded cell tropism. it should be noted that rca assays must be optimized regarding the presence of defective particles and other factors that may affect the sensitivity of the assay. retroviruses are of special concern because of the possibility of insertional carcinogenesis. this potential safety problem is amplified if replication-competent retroviruses (rcrs) are generated (9, 36). the general guideline is to test at least 5% of the total virus vector supernatant produced by amplifying any rcr on a permissive cell line. in addition, 1% of the producer cells or 10^8 cells (whichever is less) must also be tested at the end of production by the method of coculturing on permissive cells (36). as with adenovirus vectors, retrovirus vectors with tropism modifications are of special concern and may require more stringent containment and patient follow-up (39). promoter modifications may also affect the safety profile of these virus vectors. lentiviruses generally have the same safety concerns as retroviruses, particularly because they can replicate in a broader variety of cells (dormant as well as actively dividing cells). although there is a retrovirus standard available through atcc to investigators who are developing retrovirus vectors, there is no lentivirus standard currently offered. a lentivirus standard is not planned for the future primarily because of the great variability in lentivirus backbones currently under development for clinical investigations (e.g., equine, murine, human). 
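the rcr sampling guideline stated above (at least 5% of the vector supernatant, plus 1% of the producer cells or 10^8 cells, whichever is less) reduces to a short calculation. the following is a minimal sketch; the function name and units are hypothetical, and it computes only the minimum amounts implied by the text, not a complete test plan.

```python
SUPERNATANT_FRACTION = 0.05   # at least 5% of total supernatant
CELL_FRACTION = 0.01          # 1% of producer cells ...
CELL_CAP = 1e8                # ... or 10^8 cells, whichever is less


def rcr_test_amounts(supernatant_ml: float, producer_cells: float):
    """Return (minimum supernatant volume in ml, number of producer cells) to test."""
    if supernatant_ml < 0 or producer_cells < 0:
        raise ValueError("amounts must be non-negative")
    supernatant_to_test = SUPERNATANT_FRACTION * supernatant_ml
    cells_to_test = min(CELL_FRACTION * producer_cells, CELL_CAP)
    return supernatant_to_test, cells_to_test
```

for a 2000 ml harvest from 5 × 10^9 producer cells this gives about 100 ml of supernatant and 5 × 10^7 cells; for a larger run with 5 × 10^10 producer cells the cell sample is capped at 10^8.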
herpes viruses under development for clinical use either must be replication defective or, if replication competent, must be shown to be nonneurovirulent. neurovirulence is an issue for poliovirus as well. adeno-associated virus (aav) vectors are of concern because, although these vectors are designed to be maintained episomally, there can be reversion to wild type, resulting in integration into the host chromosome, or the vector could be rescued in a patient with a concurrent adenovirus infection (41). several interesting concepts seek to employ modified bacteria as the therapeutic agent. as with recombinant viruses, general issues of safety as well as specific issues of genetic stability and exchange should be considered. stabilization of the new genetic material may be required by incorporation into the bacterial genome rather than through a plasmid that can be lost or exchanged. strategies to incorporate new genetic material into bacterial dna will depend on confirming the sequence accuracy of the target bacterial sequences as well as the novel genetic material. introduction of an antibiotic resistance gene through a manufacturing step raises special concerns and can be avoided using alternative selection approaches. whether evaluating small molecules or biologically derived materials such as gene therapy products, the basic intent of nonclinical toxicity studies is to define the pharmacological and toxicological effects predictive of the human response, not only prior to initiation of phase i clinical trials in humans, but also throughout the entire drug development process leading ultimately to biologics license application (bla). 
the goals of these studies include, first, to define an initial safe starting dose and dose escalation schemes for first-in-human clinical trials; second, to identify potential target organs for toxicity, biomarkers or other parameters that can be monitored in patients receiving these therapies, and to determine if this toxicity is reversible; and finally, to determine which patient populations may be at greater risk for developing toxicity to a given cellular or gene therapeutic product (42) . these nonclinical studies should be designed with the following points in mind: whether the product is transduced cells, the population of cells to be administered, or the class of vector used; the most appropriate animal species and physiological state of that model most relevant for the clinical indication and product class; and the intended doses, route of administration, and treatment regimens that will be used in the clinic. many of the questions that need to be taken into consideration and addressed during the design phase for safety studies include what is already known about the most likely toxicities related to the agent's biodistribution, local as well as systemic toxicity, immune responses (immunogenicity and immunotoxicity), the potential for insertional mutagenesis, and biological activities of the transgene product. then specific questions that arise with the new product or use are addressed. for example, are the safety issues primarily related to the vector, the transgene product, the method of administration, the formulation/excipient, or some combination of the above? how might existing published or unpublished nonclinical or clinical data address the questions mentioned above? 
safety issues that should be addressed in these studies include evaluation of the toxicity of the vector alone (irrespective of the transgene), including its potential toxicity and/or tumorigenicity (in some cases, this is apparent from previous evaluations with the same vector); toxicity of transgene expression in vivo that may not be evident from in vitro studies; occurrence and consequences of ectopic transgene expression in nontargeted tissues; occurrence and consequences of immune responses to transgene or vector proteins such as autoimmunity; and finally the possibility of germline transduction (34) . because conventional pharmacology and toxicity testing as typically used for the evaluation of small molecules may not always be appropriate to determine the safety and biologic activity of cellular and gene therapy products, issues such as species specificity of the transduced gene, permissiveness for infection by viral vectors, and comparative animal to human physiology should be considered in the design of these studies. available animal models mimicking the disease indication may be useful in obtaining both sufficient safety and efficacy data prior to entry of these agents into clinical trials. early pre-ind discussions with the fda during development of a toxicology plan may prevent delays and added expenses because of inadequate data or the use of inappropriate species. some of the questions that should be answered by preclinical pharmacology/toxicology studies are the following (43) : what is the relationship of the dose to the biologic activity? what is the relationship of the dose to the toxicity? does the route and/or schedule affect activity and/or toxicity? what risks can be identified for the clinical trial? for ex vivo gene transfer, the product is considered to be the transduced cells. the general safety test (21 cfr 610.11) must be performed on the final product. when appropriate, modified procedures may be developed according to 21 cfr 610.9. 
the fda is considering proposed rule making to amend the general safety test rules and scope of applicability, especially for cell therapy products (9) . finally, it is expected that these nonclinical toxicity studies will be conducted in compliance with glp regulations. however, there will be situations in which highly specialized assays will be required because of the nature of biotechnology-derived products, and it will not be possible to conduct these assays in full compliance with glps (e.g., in university or other discovery laboratories). it will be important that these areas be identified for any impact that they may have on the interpretation of toxicity data. in most cases, carefully performed studies such as this can be used to support inds and blas (7). when selecting the animal model that will be used in the various preclinical biodistribution, pharmacology, and toxicology studies, consideration should be given to the scientific rationale for the animal species used. for example, would there be an advantage to performing the studies in rodents when larger numbers of animals might be more practical, or is there a necessity for a large animal model, such as a canine or nonhuman primate? if nonhuman primate studies are proposed, is it clear that another large animal or rodent model would not provide the same information? would there be any utility in a genetically deficient model, and would this deficient model be more relevant to the proposed study either because of the potential for adverse immunologic consequences or because of the biological effects in the deficient condition? animal models of disease may not be available for every cellular or gene therapy system proposed for development. this makes species selection an even more difficult process. preclinical pharmacological and safety testing of these agents should employ the most appropriate, pharmacologically relevant animal model available. 
a relevant animal species might be one in which the biological response to the therapy mimics the human response. this entails some knowledge of the pathophysiology of the disease in humans and of how faithfully it is reproduced in the animal model. the species of animal chosen for preclinical toxicity evaluations of viral preparations should be selected for its sensitivity to infection and production of pathologic sequelae induced by the wild-type virus related to the chosen vector, as well as its utility as a model of biologic activity of the vector construct. there should be a reasonable expectation of a similar distribution of receptors or permissivity in the animal model as there is in humans. thus, the species utilized may vary with the vector administered, the transgene expressed, the route of administration, the patient population treated, and the disease studied. rodent models rather than nonhuman primates may be useful if they are susceptible to pathology induced by the virus class (e.g., cotton rats are semipermissive hosts for adenovirus infections) (44) ; the use of the scid mouse (45) or the cotton rat (46) may be suitable for the evaluation of herpes simplex virus (hsv) rather than the aotus monkey. some investigators have also suggested the use of miniature swine for evaluation of adenoviral vectors (47, 48) . when evaluating the activity of a vector in an animal model for the clinical indication, safety data can be gathered from the same model to assess the contribution of disease-related changes in physiology or underlying pathology to the response to the vector. some specialized circumstances illustrating these points follow. the inbred cotton rat (sigmodon hispidus) has been used extensively as an animal model for research since the 1940s, when it was first used in poliomyelitis research. since that time, it has been shown to be a semipermissive host for adenoviral infection (44, 49) . 
in those studies, it was shown that the pulmonary lesions and replication pattern of the virus seen in the cotton rat paralleled those seen in humans. virus persisted in the nasal mucosa and lung for up to 21 and 28 days, respectively, after inoculation. this was even in the presence of high-titer neutralizing antibody that was detected by day 7. although cotton rats have readily adapted to the laboratory environment, they have retained a number of the characteristics of their wild counterparts. these animals have a tendency to bite, panic when handled, jump out of their cages, and have a large fight-or-flight zone. care and handling of these animals have been described by other investigators (50, 51). the cotton rat has been used for the evaluation of numerous adenovirus vectors by many routes of administration, and some of these studies are described here. when early e3-deleted adenoviral vectors were evaluated in the cotton rat, it was discovered that the e3 region was not required for replication, but that this region plays a critical role in the pathogenesis of the disease in that these mutants induced a markedly greater lymphocyte and macrophage/monocyte inflammatory response in the lungs (52). e3 replacement recombinants were significantly less pathogenic than e3-deleted viruses after intranasal administration (53). this study also demonstrated that adenovirus replicated in balb/c and cba mice and produced results that were similar to those seen in cotton rats. the intracranial administration of a replication-defective adenovirus expressing the herpes simplex virus thymidine kinase (hsvtk) gene at a dose of 1.0 × 10^9 pfu into both adenoviral immune and adenoviral naïve cotton rats resulted in only mild gliosis and trace meningitis along the injection tract and approximated a "no toxic effect" dose (54). 
when this same vector was administered to either wistar rats or rhesus monkeys, direct neuronal injury or a dose-related inflammatory response was seen at the injection site and in the surrounding parenchyma. there was no apparent injury to tissues not of the central nervous system in any of the three models, and all cerebral spinal fluid, blood, urine, and stool samples failed to culture for adenovirus. in a study with a similar hsvtk adenovirus inoculated into cotton rats via intracardiac injection at doses up to 3 × 10^10 viral particles per animal with and without ganciclovir (gcv), the only significant microscopic lesions observed were epicardial inflammation and splenic hemosiderosis (55). vector sequences persisted throughout the 14-day assay period in the heart, lung, and lymphoid organs. infectious virions were detected for 24 hours, but these virions were only detected at the site of injection of two animals in the highest dose group. when a similar vector was administered as either one or two subcutaneous injection cycles with 2.3 × 10^12 viral particles/kg each or as a single course with 6.9 × 10^13 viral particles/kg, the only significant treatment-related histopathological finding was dermatitis with mild acanthosis at the site of vector injection (56). in addition to these local effects, mild hyperamylasemia, lymphocytosis, and granulocytosis were seen clinically, but no other clinical signs of toxicity or death were observed. vector sequences were detected in the skin at the injection site and to a lesser extent in the liver, spleen, and lungs, and small amounts of vector dna were detected in the ovaries. these were cleared rapidly, and the absence of viral sequences in the excreta and swabs of the majority of animals suggested that there was no significant replication of this adenovirus vector in this host and little shedding. 
the owl monkey (aotus trivirgatus or nancymae) has been an excellent model for oncogenic and nononcogenic viruses such as hsv type 1 (hsv-1) and others (57), and the herpes virus that infects these animals is a strain of hsv-1 (58). these animals have been routinely used to test vaccines against hsv-1 and found to mimic the course of the disease in humans (59, 60). as a result, it was only natural that these animals be used to evaluate the safety of gene therapy vectors produced from hsv-1. however, these animals tend to be more fragile to use than other species and as a result must be handled with greater care. g207, an attenuated, replication-competent hsv-1 recombinant, was tested for safety after intracerebral inoculation in the aotus (61). these animals received doses of either 10^7 or 10^9 pfu of g207 or 10^3 pfu of the wild-type hsv-1 strain f. wild-type hsv-1 caused rapid mortality and symptoms consistent with hsv encephalitis, including fever, hemiparesis, meningitis, and hemorrhage in the basal ganglia. for up to 1 year after g207 inoculation, seven of the treated animals were alive and exhibited no evidence of clinical complications, indicating that this form of hsv was considerably attenuated in comparison to wild-type virus. two animals were reinoculated with 10^7 pfu of g207 at the same stereotactic coordinates 1 year after the initial dose. these animals were alive and healthy 2 years after the second inoculation. as a further, more comprehensive clinical evaluation, animals were subjected to cerebral magnetic resonance imaging (mri) studies both before and after g207 inoculation. these studies failed to reveal radiographic evidence of the typical hsv-related sequelae in the brain seen in the animals treated with the wild-type virus. microscopic examination of multiple tissues found no evidence of hsv-induced histopathology or dissemination in spite of the fact that measurable increases in serum anti-hsv titers were detected. 
viral shedding and biodistribution in the aotus were also evaluated using pcr analyses and viral cultures of tear, saliva, or vaginal secretion samples (62). neither infectious virus nor viral dna was detected at any time-point up to 1 month postinoculation. analyses of tissues obtained at necropsy at 1 month or 2 years after inoculation showed the distribution of g207 dna was restricted to the brain, although infectious virus was not isolated in these samples. the safety of this construct was also evaluated in the aotus after intraprostatic injections (63). safety was assessed on the basis of clinical observations, viral biodistribution, virus shedding, and histopathology. none of the injected monkeys displayed evidence of clinical disease, shedding of infectious virus, or spread of the virus into other organs. no significant microscopic abnormalities were observed in the organs evaluated. the results of these studies demonstrated that g207 can be safely inoculated into either the brain or the prostate, and that the aotus monkey could be successfully used in preclinical toxicological evaluations. in addition to the studies performed with this vector in aotus monkeys, balb/c mice were also used to evaluate the safety of g207. mice were inoculated in the same manner as the aotus either intracerebrally or intraventricularly with 10^7 pfu of g207 and survived for over 20 weeks with no apparent symptoms of disease. in contrast, over 80% of animals inoculated intracerebrally with 1.5 × 10^3 pfu of hsv-1 wild-type strain kos and 50% of animals inoculated intraventricularly with 10^4 pfu of wild-type strain f died within 10 days. when mice were inoculated intrahepatically with g207 (3 × 10^7 pfu), all animals survived for over 10 weeks, whereas no animals survived for even 1 week after inoculation with 10^6 pfu of wild-type kos (64). mice were also injected in the prostate with either g207 or wild-type hsv-1 strain f and observed for 5 months (63). 
none of the g207-injected animals exhibited any clinical signs of disease or died. however, 50% of mice injected with strain f displayed sluggishness and hunched behavior and were dead by day 13. on microscopic examination, the prostates injected with g207 were normal, whereas those injected with strain f showed epithelial flattening, sloughing, and stromal edema. these studies and those described by whitley with the scid mouse (45) point to the fact that rodents can be used in place of the owl monkey and produce adequate safety data for the evaluation of hsv-1 vectors for gene therapy. finally, safety data can also be obtained in well-designed efficacy studies. in many cases, mouse studies can provide similar information as studies conducted in nonhuman primates, so smaller species should not be automatically rejected. the nonhuman primate should not be relied on for use as a model simply because of the comfort of going into studies in humans only after evaluation of the toxicity of the agent in nonhuman primates. experience has repeatedly shown for numerous classes of agents, both small molecules as well as biologicals, that no one species may be predictive of all human toxicities, and that not all human toxicities may be seen in other animal species (65) . finally, certain human populations may not be predictive of all other human populations. this last fact makes predicting each and every toxicity almost impossible. the doses of vectors used in nonclinical studies should be selected based on preliminary efficacy/ activity data from both in vitro and in vivo studies. a no-effect dose level, an overtly toxic dose, and several intermediate doses should be evaluated, along with appropriate controls, such as naïve or vehicle-treated animals. for new formulations, it is very important to include this last group to distinguish formulation-related effects from those of the agent of interest. 
when products are difficult to produce in large quantities and as a result are in limited supply or for products with an inherently low toxicity, a maximum tolerated dose may not be achievable; as a result, a maximum feasible dose may be administered as the highest level tested in the preclinical studies and should be so designated in appropriate reports. although this may not be intellectually or scientifically satisfying, the data derived from such a study should at least establish the safety of the clinical starting dose. preclinical safety/toxicity studies should include at least one dose that is equivalent to and at least one dose escalation level exceeding those proposed for the clinical trial. the multiples of the human dose required to determine adequate safety margins may vary with each class of vector employed, and the relevance of the animal model to humans and the rationale for dose selection should be provided. scaling of doses based on either body weight or total body surface area as appropriate facilitates comparisons across the animal species used and humans. although most small molecule cancer therapeutics are scaled based on body surface area (66), body surface area may not be appropriate for gene therapeutics. information generated in the preclinical studies can be used to determine the margin of safety of the vector for use in the clinical trial, as well as to gauge an acceptable dose escalation scheme depending on the steepness of the toxicity curve. in a cross comparison of doses for an adenoviral vector for cystic fibrosis (14), very similar toxicities were seen in cotton rats, mice, hamsters, rhesus monkeys, and baboons when the agent was directly instilled into the lungs of the animals. 
when the doses were scaled for body surface area, the no observable adverse effect levels for the various species were remarkably similar to one another and to the first human dose at which toxicity was observed, 0.4-2.4 × 10^9 iu/m^2 vs 1.2 × 10^9 iu/m^2 for humans. the only notable exception was the rhesus monkey at 4.6 × 10^7 iu/m^2. studies like this enable other investigators to make wiser choices in the selection of doses and species to evaluate. the route of administration of vectors can have an obvious influence on toxicity in vivo because of the distribution and concentrations of the agent that are produced. for example, intravenous bolus doses can produce very high concentrations for short durations; other routes of administration, such as subcutaneous, may produce much lower concentrations and more prolonged exposure. current practice recommends that safety evaluations in preclinical studies should be conducted by the identical route and method of administration as that proposed for the phase i clinical trial whenever possible. when this is difficult to achieve in a small animal species, a method of administration similar to that planned for use in the clinic is advised. for example, intrapulmonary instillation of adenoviral vectors by intranasal administration in cotton rats or mice is an acceptable alternative to direct intrapulmonary administration through a bronchoscope because the latter procedure is simply not feasible in rodents. if the proposed clinical route is nonintravenous (e.g., intratumoral), it may be wise also to conduct an intravenous study to provide "worst-case" data for what may happen in the event of an accidental injection directly into a patient's blood vessel. when possible, the schedule of administration in the animal studies should also be identical to that intended in the phase i clinical trial. 
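the body-surface-area scaling used in the cross-species comparison above can be sketched as a unit conversion from a per-kilogram dose to a per-square-meter dose. the conversion factors below (often called km factors, body weight divided by body surface area) are an assumption: they are the values commonly tabulated for small-molecule dose conversion, they do not come from this text, and, as noted above, body surface area may not be the appropriate basis for every gene therapeutic. the function names are hypothetical.

```python
# ASSUMPTION: commonly tabulated km factors (kg per m^2); not taken from this text.
KM_FACTORS = {
    "mouse": 3.0,
    "rat": 6.0,
    "rhesus monkey": 12.0,
    "human": 37.0,
}


def dose_per_m2(dose_per_kg: float, species: str) -> float:
    """Convert a dose per kg body weight into a dose per m^2 body surface area."""
    return dose_per_kg * KM_FACTORS[species]


def human_equivalent_dose_per_kg(animal_dose_per_kg: float, species: str) -> float:
    """Per-kg human dose that gives the same per-m^2 exposure as the animal dose."""
    return dose_per_m2(animal_dose_per_kg, species) / KM_FACTORS["human"]
```

under these assumed factors, 1 × 10^9 iu/kg in a mouse corresponds to 3 × 10^9 iu/m^2; expressing every species on the same per-m^2 basis is what allows no-observable-adverse-effect levels to be compared side by side.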
this may not be feasible in certain instances because of the production of neutralizing antibodies in the animal model that might preclude repeated administration; that may not be a factor in humans. in studies in which additional agents will be administered in combination with the gene therapy agent (e.g., in suicide therapy using hsvtk and gcv or hsv cytosine deaminase and 5-fluorocytosine), the route and schedule should also be identical to that planned for the clinic. evaluating the vector alone in animal models would not provide sufficient data for predicting additional toxicity that may be produced by the combination, but should be at least one arm of the planned preclinical animal studies. at a minimum, treated animals should be monitored for general health status (clinical observations, body weight and temperature changes, changes in food and water consumption), serum biochemistry, and hematological profiles. target organs and other critical tissues should be examined for gross and microscopic changes. the addition of other parameters to be evaluated will depend on the nature of the product studied, the species used, and the route of administration. there is no set of all-inclusive parameters that is sufficient for each and every new agent. studies should be designed specifically for each agent, utilizing the most appropriate tests to capture as much relevant data as possible. because many biotechnology-derived pharmaceuticals intended for human use will be immunogenic in animals, the use of animal-derived proteins/products, if available, should be considered to define the intrinsic toxicity of the new agent. this may entail parallel development processes in which a construct relevant to the species in the safety test is developed to a point to allow a most relevant safety test to proceed. the analogous human construct then may actually be brought into the clinic supported by these results. 
if human material is used, measurement of antibodies associated with administration of products should always be performed when conducting repeated dose toxicity studies. these data will assist the investigator in the interpretation of the results of these studies. antibody responses produced in animals should be fully characterized (e.g., titer, number of responding animals, neutralizing or nonneutralizing), and their appearance should be correlated with any pharmacological and/or toxicological changes observed. more specifically, the effects of antibody formation on pharmacokinetics and/or pharmacodynamics, incidence and/or severity of adverse effects, complement activation, or the emergence of new toxic effects should be considered when interpreting the data. attention should also be paid to the evaluation of possible pathological changes related to immune complex formation and deposition, especially in the kidney of treated animals. the detection of antibodies in animals should not be the sole criterion for the early termination of a preclinical safety study or modification of the duration of the study unless the immune response neutralizes the pharmacological and/or toxicological effects of the biopharmaceutical in a large proportion of the animals. in most cases, the immune response to biopharmaceuticals in animals will be variable, similar to such responses in humans. if these issues do not compromise the interpretation of the data from the safety study, then no special significance should be ascribed to the antibody response. the induction of antibody formation in animals is not necessarily predictive of a potential for antibody formation in humans. by the same token, humans may also develop serum antibodies against humanized proteins, and frequently the therapeutic response persists in their presence. the same may happen in animals if a purified protein is administered via a gene therapy viral vector. 
in the case of human factor ix, when the purified protein was administered to rhesus macaques, the monkeys did not make antibodies (67). however, when factor ix was administered in a first-generation adenoviral vector, the animals mounted an acute phase response that produced neutralizing antibody that eliminated factor ix from the circulation (68). finally, the occurrence of severe anaphylactic responses to recombinant proteins is rare in humans. the results of guinea pig anaphylaxis tests, which are generally positive for protein products, are not considered predictive for reactions in humans; therefore, studies such as this are considered of little value for the routine evaluation of these types of products even though they are frequently performed. inflammatory, immune, or autoimmune responses induced by the gene product may be of concern. animal studies should be conducted over a sufficient duration of time to allow development of such responses. host immune responses against viral or transgene proteins may limit their usefulness for repeated administration in the clinic. the immune status of the intended recipients of a gene therapy should be considered in the risk-benefit analysis of a product, particularly for viral vectors. if exclusion of immunocompromised patients would unduly restrict a clinical protocol, immune-suppressed, genetically immunodeficient, or newborn animals may be used in preclinical studies to evaluate any potential safety risks. it is extremely important to investigate the potential for undesirable pharmacological activity in appropriate animal models and, when necessary, to incorporate particular monitoring for these activities in nonclinical toxicity studies and/or clinical studies. safety pharmacology studies are designed to measure functional indices of potential toxicity. these indices may be investigated in separate studies or may be carefully incorporated into the design of nonclinical toxicity studies. 
the aim of these studies should be to reveal any functional effects on the major physiological systems of the body (e.g., cardiovascular, respiratory, renal, and central nervous systems) that will have a major impact on whether or how the agent is administered in the clinic. some of these investigations may include the use of isolated organs or other test systems that do not involve the use of intact animals, such as the use of a perfused rabbit heart model for the evaluation of torsade de pointes and qt prolongation (69, 70) . the results from all of these safety pharmacology studies may allow a mechanistically based explanation of specific organ toxicities, which should be considered carefully with respect to human use and intended indications. the use of additional biomarkers, exemplified by cardiac troponin t or i (71,72) for agents with potential cardiac toxicity, may be warranted in additional nonclinical animal studies and/or in clinical studies in humans. pharmacology studies can be divided into three main categories, depending on the nature of the effect: primary and secondary pharmacodynamic studies and safety pharmacology studies. safety pharmacology studies are defined in the ich guidance document (s7a) on this subject (18) "as those studies that investigate the potential undesirable pharmacodynamic effects of a substance on physiological functions in relation to exposure in the therapeutic range and above." this last point is particularly important in that these studies should be conducted at dose levels or serum concentrations that are therapeutic targets based on prior efficacy/activity studies. simply conducting these studies at low doses does not provide much useful information or adequately assess the safety of the agent. 
the objectives of these studies are to identify undesirable pharmacodynamic properties of a drug substance that may have relevance to its human safety and toxicity; to evaluate more fully adverse pharmacodynamic and/or pathophysiological effects of a drug substance that were previously observed in nonclinical toxicology and/or clinical studies; and to investigate the mechanism of action of the adverse pharmacodynamic effects that were either previously observed or suspected. the investigational plan developed to meet these objectives should be clearly identified and delineated by the drug development team. for biotechnology-derived products that achieve highly specific receptor targeting, it is often sufficient to evaluate safety pharmacology end-points as a part of well-designed toxicology and/or pharmacodynamic studies; therefore, the need for separate safety pharmacology studies can be reduced or eliminated. for those bioproducts that represent a new therapeutic class and/or those products that do not achieve highly specific receptor targeting, a more extensive evaluation in separate safety pharmacology studies should be considered. biodistribution studies are generally performed for gene therapy products, and typical pharmacokinetic studies used for most types of drugs that measure serum or plasma levels, half-life, clearance, and the like are generally not performed. these preclinical animal biodistribution studies are designed to determine the distribution of the vector to sites other than the intended therapeutic site as an indicator of potential toxicity. the goals of these studies are generally twofold: determination of dissemination of the vector to the germline and distribution of vector to nontarget tissues. the first has been routinely accomplished by assaying total gonadal tissue. the second provides information on potential target organs of toxicity. both may be addressed in the same preclinical study. 
studies may use normal, intact animals or animal models of disease. the latter study may be more representative of the clinical setting. whenever possible, the intended route of administration should be employed, again with the consideration that groups of animals might also be treated intravenously as a worst-case scenario. transfer of the gene to normal, surrounding, and distal tissues as well as the target site should be evaluated using the most sensitive detection methods possible, such as reverse transcriptase pcr, and should include evaluation of gene persistence. when aberrant or unexpected localization is observed, additional studies should be conducted to determine whether the gene is expressed and whether its presence is associated with any pathologic effects. biodistribution studies may not be necessary for all new agents (73) . with "previously defined" vectors, if there is previous experience with a similar vector, route of administration, formulation, and schedule (e.g., adenovirus type 5 vectors), if the transgene product is considered "innocuous" if expressed ectopically, and when the size of the new vector is not essentially different, biodistribution aspects of the prior agent may be referenced. on the other hand, studies may not be postponed if a new class of vector is used (i.e., there is little or no previous experience; e.g., aav, lentivirus, others); if there is a change in the formulation (i.e., lipid carrier instead of an aqueous formulation); if the route of administration is changed to an intentional systemic route from local administration of the "established" vector; and finally, if the transgene has the potential to induce toxicity if it is aberrantly expressed in nontarget organs. as with toxicity studies, there are a number of factors that should be taken into consideration when designing vector biodistribution studies. regarding species selection, nonhuman primates are not always needed. rodents may be perfectly acceptable. 
the animal gender should reflect the intended patient population. at least 3-5 animals per sex and group should be used as a minimum. the use of smaller animals (i.e., mice or rats) allows the inclusion of larger numbers of animals and the easy evaluation of more time points. as in other studies, the following dose groups should be included when practical: controls, the maximally feasible/clinically relevant dose, and a lower dose for establishment of the no observable adverse effect level. the route of administration should mimic intended clinical route to the greatest extent possible. regarding animal sacrifice and/or sampling time points, an early point that reflects peak vector transduction/expression should be included, as should a later timepoint determined by intended clinical use and a time-point that should reflect clearance from the gonads and nontarget organs to determine persistence. the following tissues are generally recommended: peripheral blood; gonads; injection site; highly perfused organs (to assist in determination of toxicity) such as brain, liver, lung, kidneys, heart, spleen; other tissues based on toxicity/pathology as determined by transgene (e.g., bone marrow); and those based on the route of administration, such as draining, contralateral lymph nodes. the methodology used to detect the agent should detect a sequence of the vector dna (or ribonucleic acid) that is unique to that product and should be appropriate to detect the vector sequence adequately in tissue samples from both preclinical animal studies and samples obtained during the initial clinical trials. many of the points presented and discussed in this section are elaborated in publicly accessible fda documents (43,73). shedding of viral vectors through the skin or in excreta is of obvious concern with highly infectious viruses. 
to measure the dissemination, persistence, and shedding of these vectors, multiple tissues (e.g., brain, heart, lungs, spleen, liver, kidneys, ovaries, and skin) as well as bodily fluids such as urine, feces, tears, saliva, vaginal secretions, and skin swabs are taken at multiple time-points throughout the study and analyzed by real-time quantitative pcr for the presence of vector sequences. if the vector sequences are rapidly cleared and viral sequences are absent in the excreta and swabs of the majority of animals, this suggests that there was no significant replication of the vector in the host (56,62,74). even if the intended clinical schedule involves repeated doses, single-dose studies may generate useful data to describe the relationship of dose to systemic and/or local toxicity and the steepness of the dose/toxicity curve. data from these studies can be used to select doses for repeated-dose toxicity studies. information on dose-response relationships may be gathered through the conduct of a single-dose toxicity study or as a component of pharmacology or animal model efficacy studies. the incorporation of safety pharmacology parameters in the design of these studies should be considered, which will reduce the number of animals used, the amount of product required, and the number of studies that must be performed. the route and dosing regimen for these studies (e.g., daily vs intermittent dosing) should reflect the intended clinical use or exposure (e.g., once a week for 3 weeks, every other day, etc.). a recovery period should be included to determine the reversal or potential worsening of pharmacological/ toxicological effects and/or the potential for delayed toxic effects. for biopharmaceuticals that induce prolonged pharmacological/toxicological effects, recovery group animals should be monitored until reversibility is demonstrated. this may not be fundamentally obvious at the outset of the study. 
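the biodistribution design factors given earlier (at least 3-5 animals per sex and group; control, high, and lower dose groups; early, later, and clearance time points) determine how many animals a study needs. the following is a minimal sketch of that arithmetic, assuming every time point is a terminal sacrifice and both sexes are used; the function name and its parameters are hypothetical and are not taken from any guidance document.

```python
def animals_required(dose_groups, animals_per_sex, time_points, sexes=2):
    """Total animals needed when each time point is a terminal sacrifice.

    Assumption (not from the text): animals are not reused across time
    points, so every dose group needs a fresh cohort at every time point.
    """
    return dose_groups * sexes * animals_per_sex * time_points


# Three dose groups (control, maximally feasible dose, lower dose) and
# three time points (peak expression, later point, clearance), evaluated
# at the lower and upper ends of the 3-5 animals per sex and group range.
for n in (3, 5):
    total = animals_required(dose_groups=3, animals_per_sex=n, time_points=3)
    print(f"{n} per sex and group: {total} animals")
```

the count grows multiplicatively with each added dose group or time point, which is one reason rodents make it practical to include more time points than nonhuman primates.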
the duration of repeated dose studies should be based on the intended duration of clinical exposure and disease indication. this duration of animal dosing has generally been 1-3 months for most biotechnology-derived pharmaceuticals, but this probably will not be the case for most gene therapy studies. however, in the case of life-threatening diseases such as cancer, longer term studies are generally not required to support phase i trials. one aspect of immunotoxicological evaluation includes the assessment of potential immunogenicity as described in section 8.4. many biotechnology-derived pharmaceuticals are intended to stimulate or suppress the immune system and therefore may affect not only humoral, but also cell-mediated immunity. inflammatory reactions at the injection site may be indicative of a stimulatory response. it is important, however, to recognize that simple injection trauma or specific toxic effects caused by the formulation vehicle may also result in toxic changes at the injection site. in addition, the expression of surface antigens on target cells may be altered, which has implications for autoimmune potential. for conventional small molecule drugs, reproductive toxicity is usually assessed in rats and rabbits. the species specificity and potential immunogenicity of biologicals has led to the increased use of nonhuman primates for this purpose. the need for reproductive and developmental toxicity studies will depend on the product, clinical indication, and intended patient population. the specific study design and dosing schedule may be modified based on issues related to species specificity, immunogenicity, biological activity, and/or a long elimination half-life. the issue of germline integration has prompted considerable public discussion (75) . for gene therapy products directly administered to patients, the risk of vector transfer to germ cells should be seriously considered. 
animal testicular or ovarian samples should be analyzed for vector sequences by the most sensitive method available. if a signal is detected in the gonads, further studies should be conducted to determine if the sequences are present in germ cells as opposed to stromal tissues; techniques used may include, but are not limited to, cell separations, in situ pcr, or other techniques. semen samples for analysis can be collected from mature animals, including mice, by well-established methods (76, 77) for determination of vector incorporation into germ cells. evaluation of biodistribution to the gonads may not be needed prior to all phase i clinical trials, and this issue should be considered carefully in pre-ind meetings with the fda. the informed consent form should address the lack of data and the unknown risks. genotoxicity studies, such as the ames salmonella assay, the micronucleus test, and the mouse lymphoma assay, which are routinely conducted for small molecule pharmaceuticals, are not applicable to biotechnology-derived pharmaceuticals, especially gene therapy products, and therefore are not needed. the administration of large quantities of peptides, proteins, or viruses may yield uninterpretable results. when there is cause for concern about the product, genotoxicity studies should be performed in available and relevant systems, including newly developed systems. the use of standard genotoxicity studies as indicated for assessing the genotoxic potential of process contaminants is not considered appropriate. if standard assays are performed for this purpose, the rationale should be provided. standard 2-year carcinogenicity bioassays in normal mice and rats are generally inappropriate for biotechnology-derived pharmaceuticals and probably more so for gene therapy products. 
this issue has received additional attention owing to the emergence of a lymphoproliferative syndrome in a potentially significant fraction of recipients of a vector designed to treat scid syndrome (10, 11) . this clinical result actually recapitulates to a certain degree toxicities anticipated from experience in animal models (78) . thus, product-specific assessment of carcinogenic potential will still be needed for biopharmaceuticals, and studies utilized must be refined after consideration of the duration of anticipated clinical dosing, the patient population, or the biological activity of the product (e.g., retrovirus vectors, growth factors, immunosuppressive agents, etc.). when there is a concern about carcinogenic potential, a variety of new approaches may be considered to evaluate this risk. when the potential to support or induce proliferation of transformed cells and clonal expansion leading to neoplasia is considered possible, the product should be evaluated with respect to receptor expression for the biopharmaceutical or for the transgene's expressed form in various malignant and normal human cells, especially those potentially relevant to the patient population under study. the ability of the biopharmaceutical to stimulate growth of normal and/or malignant cells expressing the relevant receptors should be determined. when in vitro studies such as this give cause for concern about carcinogenic potential, further studies in relevant animal models may be needed if these are available and relevant. as stated in this section, when gene transfer agents must be evaluated, the standard rodent models (mice and rats) and the 2-year carcinogenicity bioassay are probably not appropriate. daily administration of vector as is usually performed in these studies is not feasible; however, several of these vectors, including aav, continue to express over the lifetime of the animal. 
the other factor that may be limiting is that the host immune response to the vector or to the transgene may either limit the toxicity, perhaps because of the development of neutralizing antibodies, or may have effects on tumor development. it will be necessary to consult with the fda to develop product-specific studies on an individualized basis or to determine whether and which carcinogenicity studies are needed. local tolerance to administration of the new agent should be evaluated. the formulation intended for the clinical trial should be tested unless there is a cogent reason why this would not be feasible or biologically meaningful. in most cases, the potential adverse effects of the product at the site of administration can be evaluated in the single- or repeated-dose toxicity studies that are usually conducted in the normal course of development, thus eliminating the need for separate studies. adenoviral vectors can efficiently deliver genes to a wide variety of dividing and nondividing cell types both in vitro and in vivo, resulting in a high level of transient gene expression. considerable modifications have been made in the wild-type virus to reduce infectivity and toxicity in normal tissues or to improve transduction or tropism for tumor cells. the death of jesse gelsinger because of several complications, including liver failure, coupled with the fact that adenovirus infections in immunocompromised oncology patients can lead to fatal hepatotoxicity (79, 80), and reports of serious hepatotoxicity and death in nonhuman primates treated with different adenoviral vectors make the safety evaluation of these vectors for cancer treatment paramount. when a first-generation adenovirus vector expressing human factor ix was intravenously injected into rhesus macaques at doses from 1 × 10^10 to 1 × 10^11 pfu/kg, no toxicity was seen at the lower dose level, but substantial, dose-limiting liver toxicity was observed at the higher dose (68). 
this hepatotoxicity was manifested as elevated serum transaminase levels, hyperbilirubinemia, hypoalbuminemia, and prolongation of clotting times. all evidence of liver toxicity resolved except for persistent hypofibrinogenemia in the high-dose recipient, indicating possible permanent liver damage. these data suggested a very narrow therapeutic window for this first-generation adenovirus-mediated gene transfer vector. in follow-up studies (81), it was concluded that these abnormalities may be caused by direct toxic effects of the adenovirus vector itself, or may result indirectly from the accompanying acute inflammatory response marked by elevations in interleukin 6. when another first-generation adenoviral vector expressing β-galactosidase was intravenously injected into two baboons at doses of 1.2 × 10^12 or 1.2 × 10^13 particles/kg, the baboon receiving the high dose developed acute symptoms, decreased platelet counts, and increased liver enzymes and became moribund at 48 hours after injection; the baboon receiving the lower dose developed no symptoms (82). again, a very narrow therapeutic index was demonstrated. recombinant adenoviruses infused into the portal vein of adult rhesus monkeys at a dose of 10^13 particles/kg resulted in the formation of neutralizing antibody, severe liver toxicity, and death. readministration of a second vector was associated with the same degree of toxicity as the first vector, but prompted a much more vigorous neutralizing antibody response (83). the administration of several gene transfer vehicles and routes was studied in rhesus monkeys to develop a model for adenovirus-mediated gene transfer to the liver. vectors administered via the portal vein or saphenous vein were efficient, but this resulted in transient gene expression and was accompanied by an immune response to both vector and transgene products and acute hepatitis (84). 
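the dose arithmetic behind these comparisons can be made explicit. the sketch below is illustrative only: it computes the fold difference between the tolerated and toxic dose levels reported above, and it shows how a per-kilogram dose can be rescaled to body surface area, as in the cross-species comparison described earlier in this chapter. the km conversion factors (body weight divided by body surface area) are assumed values from commonly used interspecies dose-conversion tables, not from this chapter, and the function names are hypothetical.

```python
# Assumed Km factors (kg body weight per m^2 body surface area). These come
# from commonly used dose-conversion tables, not from this chapter.
KM = {"mouse": 3.0, "rat": 6.0, "monkey": 12.0, "human": 37.0}


def fold_difference(tolerated_dose, toxic_dose):
    """Ratio of the toxic dose to the tolerated dose (same units for both)."""
    return toxic_dose / tolerated_dose


def dose_per_m2(dose_per_kg, species):
    """Rescale a dose per kg body weight to a dose per m^2 body surface area."""
    return dose_per_kg * KM[species]


# Dose pairs quoted in the text: (no toxicity, dose-limiting toxicity).
studies = {
    "rhesus, factor ix vector (pfu/kg)": (1e10, 1e11),
    "baboon, beta-galactosidase vector (particles/kg)": (1.2e12, 1.2e13),
}

for name, (low, high) in studies.items():
    print(f"{name}: {fold_difference(low, high):.0f}-fold")

# Example rescaling: the lower rhesus dose expressed per m^2.
print(f"{dose_per_m2(1e10, 'monkey'):.1e} pfu/m^2")
```

in both studies the toxic dose is only tenfold higher than the dose that produced no toxicity, which is the quantitative sense in which the therapeutic window is described as very narrow.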
turning to models of intracerebral administration, baboons received intracerebral injections of either a high dose of a replication-defective adenoviral vector expressing hsvtk (1.5 × 10^9 pfu) with or without gcv or a low dose of adv/rsvtk (7.5 × 10^7 pfu) with gcv to evaluate the safety of this regimen. animals receiving the high-dose vector and gcv either died or became moribund and were sacrificed during the first 8 days of treatment. necropsy of these animals revealed cavities of coagulative necrosis at the injection sites. animals that received only the high-dose vector were clinically normal; however, lesions were detected with mri at the injection sites corresponding to cystic cavities at necropsy. animals that received the low-dose vector and gcv were clinically normal and exhibited small mri abnormalities, and although no gross lesions were present at necropsy, microscopic foci of necrosis were present. neutralizing antibodies were produced in the animals, but no shedding of the vector was found in urine, feces, or serum 7 days after intracerebral injection (74). intrapulmonary administration is exemplified by recombinant adenovirus vectors containing expression cassettes for human cystic fibrosis transmembrane conductance regulator, which were instilled through a bronchoscope into limited regions of lung in baboons. the only adverse effect noted was a mononuclear cell inflammatory response within the alveolar compartment of animals receiving doses of virus that were required to induce detectable gene expression. minimal inflammation was seen at 10^7 and 10^8 pfu/ml, but at 10^9 and, more prominently, at 10^10 pfu/ml, a perivascular lymphocytic and histiocytic infiltrate was seen (85). host immune elimination of infected cells often limits gene expression in vivo to 1-2 weeks after infection (86, 87). 
in addition to a cell-mediated immune response to the adenovirus infection, a humoral response to the injected virus is often generated (88). although this humoral response may prevent the use of adenoviral vectors for repeated dosing, it may be blocked or reduced by coadministration of immunosuppressive agents or cytokines. alternatively, the use of adenoviruses of different serotypes may allow for repeated administration, even in the presence of neutralizing antibodies (88). harvey et al. (89) reported on 6 years of experience with the local administration of low (<10^9 particle units) and intermediate (10^9 to 10^11 particle units) doses of e1-/e3- adenovirus vectors to six different sites. with a group incidence of only 0.7% for major adverse events and no deaths related to administration of the adenovirus vectors, local administration of low and intermediate doses of adenovirus vectors was well tolerated. second-generation adenoviral vectors, mutated in e2a, have been proposed to decrease host immune responses against transduced cells, reduce toxicity, and increase duration of expression as compared with first-generation vectors deleted only in e1. the safety of an e1-, e2a-, e3-deleted adenoviral vector (av3h82) encoding an epitope-tagged b-domain-deleted human factor viii complementary dna was evaluated in cynomolgus monkeys. animals received intravenous administration of either 6 × 10^11 or 3 × 10^12 particles/kg. vector distribution was widespread, with the highest levels observed in liver and spleen. histopathology, hematology, and serum chemistry analysis identified the liver and blood as major sites of toxicity. transient mild serum elevations of liver enzymes were observed, along with a dose-dependent inflammatory response in the liver. in addition, mild lymphoid hyperplasia was observed in the spleen. mild anemia and a transient decrease in platelet count were observed, as were marrow hyperplasia and extramedullary hematopoiesis (90). 
when vectors deleted in e1 and containing either a temperature-sensitive mutation in the e2a gene or a deletion of the e4 region were infused into the hepatic artery of nonhuman primates, minimal toxicity was seen. histopathology showed that portal inflammation was present throughout both livers in the animals receiving the high dose. no differences were seen in the level of portal inflammation in targeted and untargeted lobes. pcr analysis detected viral dna sequence in gonads and brain as well as many other tissues in baboons treated with high-dose vector. in baboons treated with lower doses of an e1-e4-deleted vector expressing the human ornithine transcarbamylase gene, dna was detectable by nested pcr in liver, but not gonads, at days 29 and 61. the data suggested that intraarterial administration of recombinant adenoviral e1-e4-deleted vector is feasible and safe (91). toxicity of first-generation and e2a-deleted vectors expressing human α1-antitrypsin was evaluated in c3h mice after administration of increasing doses starting at 1 × 10^12 particles/kg. both vectors induced dose-dependent toxicity, including transient thrombocytopenia, elevated alanine aminotransferase, and increased hepatocyte proliferation, followed by inflammation and then hypertrophy. there were no differences in toxicity between the two vectors when measured at matching levels of human α1-antitrypsin expression. however, the e2a-deleted vector had slightly reduced hepatocyte toxicity at an intermediate particle dose (92). although these vectors are purported to be less toxic, the fact remains that the human fatality that occurred in the ornithine transcarbamylase deficiency trial at the university of pennsylvania involved an e1-, e4-deleted construct (93). the current e1-deleted adenoviruses can infect a wide variety of cells through a specific interaction between the viral fiber protein and at least one cell surface receptor. 
entry of the virus into the cell is further enhanced through a specific interaction of the fiber with an integrin "coreceptor." the host's range of tissue susceptibilities to the virus can therefore be altered by various strategies so that it can bind more efficiently to the target cell surface (94-96). antibodies against tissue-specific cell surface proteins can also be coupled to the fiber protein to facilitate partial targeting of the virus (97). another approach to achieve "targeting" of the virus is the use of cell-specific promoters to drive expression of a therapeutic gene in the context of the recombinant virus (98). enhanced uptake strategies through fiber modification may present special concerns for toxicity, especially regarding hepatotoxicity when administered by an intravenous or direct hepatic artery injection. careful comparison of a tropism-modified adenoviral vector to the nontropism-modified vector in mouse toxicity and biodistribution studies as well as nonhuman primate and toxicity studies might be desirable. members of the family parvoviridae, aavs are among the smallest of the dna viruses (99). unlike autonomous parvoviruses, aavs or dependoviruses require coinfection with unrelated helper viruses for a productive infection to occur (100, 101). as recombinant vectors for gene therapy, they seem to have several advantages compared to other vectors, such as the transduction of terminally differentiated and nondividing cells (102, 103), relatively high stability of transgene expression (104), and the potential for targeted integration (105, 106). from a safety point of view, aav vectors show a lack of pathogenicity (107-109), low immunogenicity (104, 110), and low risk of insertional mutagenesis (111). also, there did not appear to be any evidence of transduction in the gonads of rhesus monkeys (112). however, aav has a limited dna capacity. 
hsv vectors can deliver large amounts of exogenous dna; however, cytotoxicity and maintenance of transgene expression are obvious obstacles to their use. they also have the advantage of being able to infect nondividing cells and to establish latency in some cell types. the ability to establish latency in neuronal cells makes hsv an attractive vector for treating neurological disorders such as parkinson's and alzheimer's diseases. in addition, the ability of hsv to infect efficiently a number of different cell types, such as muscle and liver, may make it an excellent vector for treating nonneurological diseases. one problem associated with hsv-based vectors has been the toxicity of the vector in many different cell types. the generation of hsv vectors with deletions in many of the immediate early gene products, which is similar to the strategy used for adenovirus, has resulted in vectors with reduced toxicity and antigenicity as well as prolonged expression in vivo (60-65). no clinical study has been reported in detail with these vectors. section 8.1.2 summarizes preclinical safety considerations pertaining to use of the aotus monkey in comparison to rodent species. retrovirus vectors are replication-defective and are primarily based on the moloney murine leukemia virus (mmlv), which is a well-studied and well-characterized retrovirus (113, 114) with numerous advantages. they have been extensively studied, produce stable integration into the host genome, and are very efficient at gene transfer. disadvantages include an infection that is limited to dividing cells, which makes gene transfer into nondividing cells such as hematopoietic stem cells, hepatocytes, myoblasts, and neurons an impossibility, and low titer of products. four theoretical concerns exist for retroviral-mediated gene transfer, and they relate to two potential delayed toxicities. 
these are insertional mutagenesis, recombination with endogenous retroviral sequences, transfer of exogenous genetic material, and accidental exposure to replication-competent murine retroviruses (115). because retroviral vectors can permanently integrate into the genome of the infected cell, there is a serious concern that insertional mutagenesis could cause the development of a secondary malignancy. the presence of rcrs is of major concern because rcrs have produced lethal malignant t-cell lymphomas in 3/10 rhesus monkeys (78). these concerns resulted in a publication on the fda considerations regarding these issues (116) and the issuance of a new fda guidance on this subject in october 2000 (36). some of these concerns are no longer theoretical. the elation that this type of retroviral-mediated therapy was successful in curing a number of children with scid (117) has been severely dampened by the reports of a leukemia-like disease produced in two of these children (10-13). unlike oncoretroviruses such as moloney murine leukemia virus, one subclass of retroviruses, the lentiviruses, can infect nondividing cells. this makes these viruses attractive for gene transfer. one of these viruses, human immunodeficiency virus (hiv), has been the subject of investigation by a number of groups. the most obvious concerns with using hiv for gene therapy are safety and the possible generation of replication-competent virus during vector production. addressing them involves engineering the vector so that it is replication defective. this has been done in a number of cases by eliminating all accessory genes, such as tat, vif, vpr, vpu, and nef, from a packaging construct that still has the ability to transduce cells (118). concern about the possibility of insertional activation of cellular oncogenes by random integration of the vector provirus into the host genome has led to the development of self-inactivating vectors (119) (120) (121) (122).
the use of self-inactivating viruses significantly improves the biosafety of hiv-derived vectors because it reduces the likelihood that rcrs will originate during vector production and target cells and hampers recombination with wild-type hiv in an infected host. in an attempt to make even safer constructs, other groups are working on the development of lentiviral vectors from hiv-2 (123), simian immunodeficiency virus (124), bovine immunodeficiency virus (125, 126) , and feline immunodeficiency virus (127) . these last vectors may be inherently more acceptable because they are not based on hiv-1. none of these newer constructs has moved toward the clinic, so there is little animal safety data and no human data on these vectors yet. this chapter presents a range of issues that might be considered in contemplating the development of a gene therapy agent to the point of an early phase clinical trial. as no gene therapy product has yet been recognized as "safe and effective," the standard approach to these issues should be regarded very much as a "work in progress." indeed, the nature of these agents would suggest that each new opportunity would call for its own unique set of requirements, so that a single approach will probably never "standardly" exist. rather, the principles that underlie regulatory policy should be woven into the approach to each new agent. in broad strokes, these involve approaches to answering the following questions: is the identity of the agent clearly defined? can successive batches of the material be made reproducibly in the quantity to support clinical development, and how is this known? are the biological features of the vector, and its transgene when applicable, clearly similar in the animal species used for safety studies and in the human, at least as far as this can be ascertained? what dose is likely to be required for therapeutic effect? what level of gene expression or replication is necessary to attain a therapeutic effect? 
when toxicity occurs because of the agent, what is the evidence the toxicity will be reversible? is toxicity after repeated doses of agent likely to be attenuated or magnified by immunological response to the agent? what are the consequences of long-term presence of the therapeutic agent in the recipient? is there a danger of producing directly (as the therapeutic agent itself) or indirectly (through recombination and/or replication) an infectious agent that acts horizontally in the population or vertically across generations? how will the presence and distribution of the gene therapy agent be followed in the patient? sponsors are above all encouraged to see the regulatory process as a collaborative interaction with the regulatory agencies with the end not only of protecting the patient, but also of advancing the most scientifically defensible and rigorous questions to clinical trial. far more costly than the conduct of experiments designed to be compliant with regulatory requirements is a failed or overtly injurious clinical trial. a clear understanding and a proactive approach in addressing regulatory issues outlined here will maximally ensure the likelihood of an interpretable clinical outcome. the regulatory issues outlined here must be approached with continuing appreciation of the evolving science associated with the gene therapy field. as such, requirements may evolve with the state of the science, and careful sustained contact with the regulatory agencies is important in incorporating the best and most current science into the design, conduct, and interpretation of regulatory studies. 
ich guidance on quality of biotechnology/biological products: derivation and characterization of cell substrates used for production of biotechnical/biological products ich: final guideline on quality of biotechnological products: analysis of the expression construct in cells used for production of r-dna derived protein products ich guidance on specifications: test procedures and acceptance criteria for biotechnological/biological products ich guidance for industry q1a (r2), stability testing of new drug substances and products ich guidance on viral safety evaluation of biotechnology products derived from cell lines of human or animal origin ich guidance: q7a good manufacturing practice guidance for active pharmaceutical ingredients ich guidance for industry: s6 preclinical safety evaluation of biotechnology-derived pharmaceuticals ich: final guidance on stability testing of biotechnological/biological products fda guidance for industry: guidance for human somatic cell therapy and gene therapy gene therapy: a tragic setback a serious adverse event after successful gene therapy for x-linked severe combined immunodeficiency regulators split on gene therapy as patient shows signs of cancer second cancer case halts gene-therapy trials preclinical development strategies for novel gene therapy products the role of the toxicologic pathologist in the preclinical safety evaluation of biotechnologyderived pharmaceuticals nonclinical safety evaluation of biotechnologically derived pharmaceuticals non-clinical safety studies for biotechnology-derived pharmaceuticals: conclusions from an international workshop ich guidance to industry: s7a safety pharmacology studies for human pharmaceuticals code of federal regulations, title 21, food and drugs, part 610.10, subpart b, general biological products standards; general provisions; potency code of federal regulations, title 21, food and drugs, part 610.12, subpart b, general biological products standards code of federal 
regulations, title 21, food and drugs, part 610.13, subpart b, general biological products standards code of federal regulations, title 21, food and drugs, part 610.14, subpart b, general biological products standards code of federal regulations, title 21, food and drugs, part 58, good laboratory practice for nonclinical laboratory studies current good manufacturing practice in manufacturing, processing, packing, or holding of drugs; general code of federal regulations, title 21, food and drugs, part 211, current good manufacturing practice for finished pharmaceuticals code of federal regulations, title 21, food and drugs, part 312, investigational new drug application inds) for phase i studies of drugs, including well-characterized, therapeutic, biotechnology-derived products fda guidance for industry: ind's for phases 2 and 3 studies of drugs, including specified therapeutic biotechnology-derived products, chemistry, manufacturing and controls content and format fda guidance for industry: content and format of chemistry, manufacturing, and controls information and establishment description information for a vaccine or related product fda guidance for industry for the submission of chemistry, manufacturing, and controls information for a therapeutic recombinant dna-derived product or a monoclonal antibody product for in vivo use fda guidance for industry: formal meetings with sponsors and applicants for pdufa products fda points to consider in the manufacturing and testing of monoclonal antibody products for human use fda letter to manufacturers of biological products: recommendations regarding bovine spongiform encephalopathy (bse) fda biological response modifiers advisory committee: current policy on sequence characterization of gene transfer products fda biological response modifiers advisory committee: adenovirus titer measurements and rca levels fda guidance for industry: supplemental guidance on testing for replication competent retrovirus in retroviral 
vector based gene therapy products and during follow-up of patients in clinical trials using retroviral vectors fda points to consider in the characterization of cell lines used to produce biologicals fda gene therapy patient tracking system final document fda guidance concerning demonstration of comparability of human biological products, including therapeutic biotechnology-derived products third national nih gene transfer safety symposium: safety considerations in the use of aav vectors in gene transfer clinical trials basic principles of gene therapy: basic principles and safety considerations preclinical animal models in gene therapy research a new animal model for human respiratory tract disease due to adenovirus use of aotus monkey to assess neurovirulence of replication-selective herpes vectors herpes simplex type 1 infects and establishes latency in the brain and trigeminal ganglia during primary infection of the lip in cotton rats and mice tropism of human adenovirus type 5-based vectors in swine and their ability to protect against transmissible gastroenteritis coronavirus porcine toxicology studies of sch 58500, an adenoviral vector for the p53 gene pathogenesis of adenovirus type 5 pneumonia in cotton rats (sigmodon hispidus) the cotton rat in biomedical research handling the cotton rat for research role of early region 3 (e3) in pathogenesis of adenovirus disease early region 3-replacement adenovirus recombinants are less pathogenic in cotton rats and mice than early region 3-deleted viruses intracranial administration of adenovirus expressing hsv-tk in combination with ganciclovir produces a dose-dependent, self limiting inflammatory response distribution, persistency, toxicity, and lack of replication of an e1a-deficient adenoviral vector after intracardiac delivery in the cotton rat subcutaneous administration of a replication-competent adenovirus expressing hsv-tk to cotton rats: dissemination, dersistence, shedding, and pathogenicity. 
hum the owl monkey (aotus trivirgatus) as an animal model for viral diseases and oncologic studies characterization of four herpesviruses isolated from owl monkeys and their comparison with herpesvirus saimiri type 1 (herpesvirus tamarinus) and herpes simplex virus type 1 immunization of experimental animals with reconstituted glycoprotein mixtures of herpes simplex virus 1 and 2: protection against challenge with virulent virus in vivo behavior of genetically engineered herpes simplex viruses r7017 and r7020. ii. studies in immunocompetent and immunosuppressed owl monkeys (aotus trivirgatus) attenuated, replication-competent herpes simplex virus type 1 mutant g207: safety evaluation of intracerebral injection in nonhuman primates viral shedding and biodistribution of g207, a multimutated, conditionally replicating herpes simplex virus type 1, after intracerebral inoculation in aotus preclinical safety evaluation of g207, a replication-competent herpes simplex virus type 1, inoculated intraprostatically in mice and nonhuman primates attenuated, replication-competent herpes simplex virus type 1 mutant g207: safety evaluation in mice concordance of the toxicity of pharmaceuticals in humans and in animals quantitative comparison of toxicity of anticancer agents in mouse, rat, hamster, dog, monkey, and man the rhesus macaque as an animal model for hemophilia b gene therapy adenovirus-mediated expression of human coagulation factor ix in the rhesus macaque is associated with dose-limiting toxicity experimental models of torsade de pointes drug-related torsade de pointes in the isolated rabbit heart: comparison of clofilium, d,l-sotalol and erythromycin elevations in cardiac troponin measurements: false false-positives: the real truth predicting cancer therapy-induced cardiotoxicity: the role of troponins and other markers preclinical considerations for gene transfer clinical trials: vector biodistribution adenoviral-mediated thymidine kinase gene transfer into the 
primate brain followed by systemic ganciclovir: pathologic, radiologic, and molecular studies meeting summary (available on-line at: www4.od.nih.gov/oba/rac/summaries/3-99sum.htm.) and meeting minutes rac minutes-03/11-12/99. available on-line at: www4.od a technique for the artificial insemination of mice an improved method for the artificial insemination of mice helper virus induced t cell lymphoma in nonhuman primates after retroviral mediated gene transfer adenovirus infection in the immunocompromised patient fulminant hepatic failure due to disseminated adenovirus infection in a patient with chronic lymphocytic leukemia toxicity of a first-generation adenoviral vector in rhesus macaques lethal toxicity, severe endothelial injury, and a threshold effect with high doses of an adenoviral vector in baboons gene transfer into the liver of nonhuman primates with e1-deleted recombinant adenoviral vectors: safety of readministration liver-directed gene transfer in non-human primates adenovirus-mediated transfer of the cftr gene to lung of nonhuman primates: toxicity study role of viral antigens in destructive cellular immune responses to adenovirus vector-transduced cells in mouse lungs clearance of adenovirus-infected hepatocytes by mhc class i-restricted cd4 + ctls in vivo circumvention of anti-adenovirus neutralizing immunity by administration of an adenoviral vector of an alternate serotype safety of local delivery of low-and intermediate-dose adenovirus gene transfer vectors to individuals with a spectrum of morbid conditions adenoviral vector-mediated expression of physiologic levels of human factor viii in nonhuman primates selective gene transfer into the liver of non-human primates with e1-deleted, e2a-defective, or e1-e4 deleted recombinant adenoviruses toxicological comparison of e2a-deleted and first-generation adenoviral vectors expressing alpha1-antitrypsin after systemic delivery recombinant adenovirus gene transfer in adults with partial ornithine 
transcarbamylase deficiency (otcd). hum targeting of adenovirus penton base to new receptors through replacement of its rgd motif with other receptor-specific peptide motifs generation of recombinant adenovirus vectors with modified fibers for altering viral tropism towards the use of replicative adenoviral vectors for cancer gene therapy targeted adenovirus gene transfer to endothelial and smooth muscle cells by using bispecific antibodies a new adenoviral vector: replacement of all viral coding sequences with 28 kb of dna independently expressing both full-length dystrophin and b-galactosidase characteristics and taxonomy of parvoviridae adenovirus-associated defective virus particles herpes simplex virus types 1 and 2 completely help adenovirus-associated virus replication prospects for the use of adeno-associated virus as a vector for human gene therapy adeno-associated virus vectors for gene therapy stable in vivo expression of the cystic fibrosis transmembrane conductance regulator with an adeno-associated virus vector site-specific integration by adeno-associated virus mapping and direct visualisation of regionspecific viral dna integration site on chromosome 19q13-qter epidemiology of adenoassociated virus infection in a nursery population a seroepidemiologic study of adenovirus-associated virus infection in infants and children adeno-associated viruses of humans high efficiency transfer of the t cell co-stimulatory molecule b7-2 to lymphoid cells using high-titer recombinant adeno-associated virus vectors. 
hum regulated high level expression of a human g-globin gene introduced into erythroid cells by an adeno-associated virus vector preclinical evaluation of aav vectors expressing the human ctfr cdna retroviral vectors for use in human gene therapy for cancer, gaucher disease, and arthritis effects of retroviral vector design on expression of human adenosine deaminase in murine bone marrow transplant recipients engrafted with genetically modified cells safety aspects of gene therapy evaluation of recommendations for replication competent retrovirus testing associated with use of retroviral vectors sustained correction of x-linked severe combined immunodeficiency by ex vivo gene therapy minimal requirement for a lentivirus vector based on human immunodeficiency virus type 1 development of a self-inactivating lentivirus vector advanced modular self-inactivating lentiviral expression vectors for multigene interventions in mammalian cells and in vivo transduction transduction of acute myeloid leukemia cells with third generation self-inactivating lentiviral vectors expressing cd80 and gm-csf: effects on proliferation, differentiation, and stimulation of allogeneic and autologous anti-leukemia immune responses self-inactivating lentivirus vector for safe and efficient in vivo gene delivery human immunodeficiency virus type 2 (hiv-2) vectormediated in vivo gene transfer into adult rabbit retina pseudotyped lentivirus vectors derived from simian immunodeficiency virus sivagm with envelope glycoproteins from paramyxovirus construction and molecular analysis of gene transfer systems derived from bovine immunodeficiency virus mapping of the bovine immunodeficiency virus packaging signal and rre and incorporation into a minimal gene transfer vector gene transfer to the nonhuman primate retina with recombinant feline immunodeficiency virus vectors key: cord-003387-82573enr authors: nam, gyu-hwi; mishra, anshuman; gim, jeong-an; lee, hee-eun; jo, ara; yoon, dahye; kim, ahran; 
kim, woo-jin; ahn, kung; kim, do-hyung; kim, suhkmann; cha, hee-jae; choi, yung hyun; park, chan-il; kim, heui-soo title: gene expression profiles alteration after infection of virus, bacteria, and parasite in the olive flounder (paralichthys olivaceus) date: 2018-12-24 journal: sci rep doi: 10.1038/s41598-018-36342-y sha: doc_id: 3387 cord_uid: 82573enr olive flounder (paralichthys olivaceus) is one of the economically valuable fish species in east asia. despite its economic importance, the genomic information available for the olive flounder is very limited. mass mortality caused by a variety of pathogens (viruses, bacteria and parasites) is the main problem in the aquaculture industry, including olive flounder culture. in this study, we carried out transcriptome analysis of olive flounder gill tissues after infection with each of three types of pathogen (virus: viral hemorrhagic septicemia virus; bacterium: streptococcus parauberis; parasite: miamiensis avidus). as a result, we identified a total of 12,415 differentially expressed genes (degs) from viral infection, 1,754 from bacterial infection, and 795 from parasite infection. to investigate the effects of pathogenic infection on the immune response, we performed gene ontology (go) enrichment analysis with the degs and sorted immune-related go terms for each of the three pathogen groups. notably, we verified various go terms, and the genes in these terms showed a down-regulated expression pattern. in addition, we identified 67 common genes (10 up-regulated and 57 down-regulated) present in all three pathogen infection groups. our goals are to provide ample genomic knowledge about olive flounder transcripts for further research and to report genes whose expression changed after specific pathogen infection. among viruses, viral hemorrhagic septicemia virus (vhsv) belongs to the genus novirhabdovirus, a member of the family rhabdoviridae 4 .
the vhsv genome of about 11 kb contains six genes, which encode nucleoprotein (n), phosphoprotein (p), matrix protein (m), glycoprotein (g), nonstructural viral protein (nv), and rna polymerase (l) in the order 3'-n-p-m-g-nv-l-5' 4 . infection with vhsv results in contagious viral hemorrhagic septicemia (vhs) in diverse fish species regardless of their habitat, seawater or freshwater 5 . in east asia, cases of infection in olive flounder have been reported steadily since vhsv was detected in the middle of the 1990s [6] [7] [8] [9] . a variety of scuticociliates have been reported as the cause of scuticociliatosis in marine species including turbot, guppy, and southern bluefin tuna [10] [11] [12] . in olive flounder, the disease has been reported to be caused by various scuticociliates: uronema marinum, pseudocohnilembus persalinus, philasterides dicentrarchi, and miamiensis avidus [13] [14] [15] [16] . interestingly, on the basis of infection experiments using various scuticociliates and the identification of 8 isolates acquired from olive flounders with symptoms of ulcers and haemorrhages, miamiensis avidus was suggested as the major aetiologic agent of scuticociliatosis because of its high pathogenicity and mortality rate compared with other scuticociliates 14, 17 . bacterial infection can also cause serious damage to fish. streptococcosis is known to be caused by a variety of streptococcal species (streptococcus parauberis, streptococcus iniae, streptococcus difficilis, lactococcus garvieae, lactococcus piscium, vagococcus salmoninarum, and carnobacterium piscicola) and has become a major nuisance in olive flounder farms [18] [19] [20] [21] . in particular, streptococcus iniae, lactococcus garvieae, and streptococcus parauberis have been reported to be associated with streptococcosis in olive flounder [19] [20] [21] [22] [23] .
the main issue for the aquaculture industry is to reduce economic loss by preventing fish mortality caused by various pathogens. a large number of immunological studies have examined immune-related genes that respond to pathogen infection 3, [24] [25] [26] [27] . the quantity of genomic information produced by next generation sequencing (ngs) has increased steadily over the last few years, so researchers can now gain a more comprehensive view of an organism's genome than is possible when studying a single gene. with the development of genome-wide analysis methods, it is no longer difficult to measure changes in gene expression after a chemical treatment or an environmental change. recently, large-scale gene identification studies were conducted in the olive flounder genome for research on vaccines, gonadal development, and sex determination [28] [29] [30] . in particular, the characterization of immune-related genes was reported in olive flounder spleen tissue 31 . many earlier studies focused on gene expression analysis for a single pathogen and defined the expression pattern of a limited number of genes [32] [33] [34] [35] [36] . further, infection by two or more pathogens has been reported in the olive flounder 37, 38 . to address this problem, plentiful genomic information is needed so that multiple pathogen infections can be responded to rapidly. however, no study has so far comprehensively analysed the change in gene expression pattern caused by different types of pathogen in the olive flounder genome. in this research, we identified differentially expressed genes (degs) by transcriptome analysis and conducted gene ontology (go) analysis with the genes identified. we then looked for important genes that showed a consistently meaningful expression change across the results of the three infection experiments.
as a result, we determined 10 up-regulated genes and 57 down-regulated genes common to infection with the three pathogens. we aimed to provide essential genomic information related to pathogen infection, to explore the consequences of the different infections, and to find common strategies against specific candidates involved in disease progression in the natural habitat of aquaculture. statistical summary of transcriptome analysis. to profile gene expression after infection with each of three pathogens (vhsv, streptococcus parauberis, and miamiensis avidus), transcriptome analysis was conducted using gill tissues of olive flounders. we prepared twelve olive flounders (three un-infected individuals as controls, three virus-infected, three bacteria-infected, and three parasite-infected individuals) to raise confidence. to obtain a sufficient number of transcripts, twelve independent rna samples from normal and pathogen-infected olive flounder gill tissues were used to construct cdna libraries. these cdna libraries were then sequenced using the illumina hiseq2500, generating approximately 78.4 million, 65.2 million, and 45.7 million raw reads from the three control samples; 44.0 million, 56.9 million, and 62.0 million raw reads from the bacteria-infected samples; 41.2 million, 62.4 million, and 40.0 million raw reads from the virus-infected samples; and 39.6 million, 41.6 million, and 53.1 million raw reads from the parasite-infected samples (table 1). after trimming of low-quality reads and adaptor sequences, the average number of clean reads was 62.3 million for control samples, 53.5 million for bacteria-infected samples, 48.0 million for virus-infected samples, and 44.3 million for parasite-infected samples.
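as a quick check of the sequencing depth reported above, the per-group mean of the raw read counts can be recomputed from the figures quoted in the text. the sketch below is illustrative only: the dictionary layout and the function name `mean_reads` are assumptions made for this example and are not part of the original analysis pipeline.

```python
from statistics import mean

# Raw read counts (millions) per replicate, as quoted in the text (table 1).
RAW_READS_MILLIONS = {
    "control": [78.4, 65.2, 45.7],
    "bacteria": [44.0, 56.9, 62.0],
    "virus": [41.2, 62.4, 40.0],
    "parasite": [39.6, 41.6, 53.1],
}


def mean_reads(reads_by_group):
    """Return the mean read count (millions) per group, rounded to one decimal."""
    return {
        group: round(mean(values), 1)
        for group, values in reads_by_group.items()
    }


if __name__ == "__main__":
    for group, avg in mean_reads(RAW_READS_MILLIONS).items():
        print(f"{group}: {avg} million raw reads on average")
```

these are means of the raw counts, so they need not match the clean-read averages reported after trimming.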
we then checked gene coverage to confirm that the reads we acquired were sufficient for quantitative gene expression analysis (supplementary fig. 1). the clean reads were assembled into 120,880 transcript sequences. from these transcript sequences, we identified a total of 40,100 genes, including 19,560 novel genes, using the interproscan database and the non-redundant protein database of the ncbi (table 2). to determine the effects of external pathogens on gene expression, we selected genes that showed a change in expression after pathogen infection, with a p-value of <0.05 compared with the control samples. as shown in table 3 and fig. 1, the largest number of gene expression changes was found in the vhsv infection group; a total of 12,415 degs were identified from transcriptome analysis. information on the degs derived from viral infection is given in the supplementary material. to explore the functional enrichment of these degs, we performed go enrichment analysis using the david tool 39 (table 2). following viral infection, we identified the degs with a p-value of <0.05 after infection with bacteria (streptococcus parauberis) and parasite (miamiensis avidus) (table 3 and table 2). miamiensis avidus affected the gene expression pattern in the olive flounder genome, and we selected the genes that showed a change in expression (supplementary table 2). we identified 795 degs caused by infection with miamiensis avidus. distributional pattern of total degs acquired from transcriptome analysis. after comparing gene expression levels among the twelve transcriptome analyses, we identified degs after pathogen infection. we then focused on selecting genes that showed a change in expression in common after all three types of pathogen infection. as a result, we summarized 10 up-regulated genes and 57 down-regulated genes (fig. 2).
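the selection steps described above (keeping genes with a p-value of <0.05 against the control, then retaining genes that change in all three infection groups) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `select_degs` and `common_degs`, the gene identifiers, and the toy values are all hypothetical, and the requirement that a common gene changes in the same direction in every group is our reading of the "up-regulated" and "down-regulated" counts.

```python
def select_degs(results, alpha=0.05):
    """Map each gene with p < alpha to 'up' or 'down' by the sign of its log2 fold change.

    `results` maps gene id -> (log2_fold_change, p_value) for one infection
    group compared with the control (hypothetical input layout).
    """
    degs = {}
    for gene, (log2_fc, p_value) in results.items():
        if p_value < alpha and log2_fc != 0:
            degs[gene] = "up" if log2_fc > 0 else "down"
    return degs


def common_degs(*deg_maps):
    """Return (up, down): genes that are DEGs in every group with one shared direction."""
    shared = set(deg_maps[0])
    for degs in deg_maps[1:]:
        shared &= set(degs)
    up = sorted(g for g in shared if all(d[g] == "up" for d in deg_maps))
    down = sorted(g for g in shared if all(d[g] == "down" for d in deg_maps))
    return up, down


# Toy values for illustration only; these are not data from the study.
virus = {"geneA": (-2.5, 0.001), "geneB": (1.8, 0.01),
         "geneC": (-1.2, 0.20), "geneD": (-3.0, 0.03)}
bacteria = {"geneA": (-2.1, 0.004), "geneB": (2.2, 0.02),
            "geneC": (-1.5, 0.01), "geneD": (1.1, 0.04)}
parasite = {"geneA": (-1.9, 0.03), "geneB": (1.4, 0.04),
            "geneC": (-2.0, 0.02), "geneD": (-2.2, 0.01)}

up_common, down_common = common_degs(
    select_degs(virus), select_degs(bacteria), select_degs(parasite)
)

if __name__ == "__main__":
    print("common up-regulated:", up_common)
    print("common down-regulated:", down_common)
```

in the toy data, geneC is dropped because it fails the p-value cut-off in one group, and geneD is dropped because its direction differs between groups.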
we analyzed these 67 degs to identify their gene symbols using their sequences against the non-redundant protein database of the ncbi, and 37 degs were annotated (table 4). the remaining 30 unannotated degs are shown in supplementary table 3. with the development of sequencing techniques, numerous genomic studies have been reported that aim to understand the results of infection by viruses 28, 31, 40 , bacteria 26, 41, 42 , and parasites 43 in the olive flounder genome. a fundamental way to overcome disease outbreaks caused by external pathogens is to approach them at the genome level. expanding the quantity of genomic information is essential for biological research on any target. investigating the overall change in gene expression after pathogen infection would provide clues to the cause of biological damage. gene regulation is essential for viruses, prokaryotes and eukaryotes, as it increases the versatility and adaptability of an organism by allowing the cell to express protein when needed. the phylogenetic diversity of pathogens (virus, bacteria and parasite) is also responsible for differential expression of genes in disease. each pathogen causes disease in a different way, which makes it challenging to understand the basic biology of infection. in this study, we examined the relation between each of three types of pathogen infection and differential gene expression in the olive flounder genome through transcriptome analysis. the diverse pathogens used in this study carry specific antigenic variations, which refers to the specific mechanisms by which an infectious agent infects the fish and the disease progresses. transcriptome analysis helps us to understand the progression of disease in fish after pathogen infection in light of the diversity of pathogens (virus, bacteria and parasite). this study shows that differentially expressed genes were up- and down-regulated to different extents in fish tissue.
interestingly, virus and bacteria infection produced more down-regulated genes, while parasite infection produced more up-regulated genes. these data suggest that the fish immune system interacts with bacteria and virus using the same strategy, but with parasites using a different one, owing to the difference in their modes of infection. for efficient prevention against pathogens, it is important to understand which genes are activated or repressed after infection, because a change in their expression reflects a change in the metabolic system of the body. in this study, we identified a total of 12,415 degs in the vhsv infection group, 1,754 in the streptococcus parauberis infection group, and 795 in the miamiensis avidus infection group (table 3). given the difference in the number of degs among the three pathogen groups, the virus appears to have had the greatest impact on gene expression in the olive flounder genome. interestingly, 11,051 degs (89% of the degs in this group) showed a down-regulated pattern after viral infection. this global decrease in gene expression after viral infection is likely to contribute to disease by lowering the expression of immune-related genes, which may finally lead to death. this view is supported by our findings (supplementary table 2), which showed decreased expression of all genes in the immune-related go terms. in particular, the go terms in the viral infection group showed that all genes tended to be down-regulated after pathogen infection, indicating a loss of resistance against pathogens through down-regulation of immune-related genes. in this way, functional information on genes obtained from go enrichment could help researchers to identify critical biological pathways that respond to an external factor. the immune mechanism in fish is composed of cellular and humoral systems and is divided into innate (inherited) and adaptive (acquired) components.
understanding the structure and function of the fish immune system is essential for developing new technologies and products that improve productivity. the transcriptome analysis exposes basic differences among the expression profiles that the pathogens induce in the host. these differences are due to the nature of each pathogen, its mode of infection, antigenic variation and many other factors. in addition, disease progression in the host depends on the external surface variation of the pathogens (viral, parasitic and bacterial) and on their recognition by the host immune system, which is the basis for initiating microbial clearance 44, 45 . disease research requires knowledge of key factors that are common to viral, bacterial and parasitic infections, such as the way pathogens avoid host immune surveillance, antigenic variation, subversion of immune responses through phagocytes, and inhibition of cytokines and chemokines 44, 45 . on the other hand, disease progression differs according to the type of pathogen. for viral infection, understanding complement inhibition and the blockade of cellular immunity is most important, whereas parasitic and bacterial infections require knowledge of the innate pathway and of acquired immunity 44, 45 . our analysis indicates that the complexity of, and differences among, the expression profiles could be due to all of the above reasons. in addition, fish vaccines depend fundamentally on innate and adaptive immunity 46 . many vaccine types exist, based on antigens, live microorganisms, specific dna segments of pathogens, or polyvalent formulations. all of these vaccines require thorough knowledge of pathogenicity and careful research on efficacy 47 . our study points to various immune and antigenic genes that can be chosen for pathway analysis and considered as therapeutic agents or vaccine candidates (supplementary table 2). 
despite the different infection pathways, we asked which degs were commonly affected by infection. as shown in fig. 2 , a total of 67 degs (10 up-regulated and 57 down-regulated) were common to the three pathogens, and 1 of the 10 up-regulated and 36 of the 57 down-regulated degs were annotated using the genomic database (table 4 ; , 408 genes (bacteria), and 561 genes (parasites). these genes were specific to each pathogen and can therefore be considered as candidate genes for vaccination or therapeutic agents. likewise, 10168 genes (virus), 177 genes (bacteria) and 60 genes (parasites) were down-regulated specifically in one infection group, so they can be considered as diagnostic markers for the corresponding pathogen. c-x9-c motif containing 4, col1a1; collagen type i alpha 1 chain, col1a2; collagen type i alpha 2 chain, slc14a2; solute carrier family 14 member 2) which showed the highest expression change (down-regulation), with a log 2 fold change of < −2.0 in at least two infection groups. in this study, we examined all of the above candidate genes and their reported roles in disease progression. these genes and their roles are described below. anpep, also called aminopeptidase n (apn), is a metallopeptidase that exerts a strong influence on various immune response mechanisms. for example, apn is known to degrade cytokines and peptides used by neurons [48] [49] [50] , and acts as a receptor for viruses 51, 52 . in addition, a relation between the expression level of apn and stimulated t-cells has been reported 53 . recently, it was suggested that apn controls the balance between innate and adaptive immunity by regulating the tlr4 signal transduction pathway in myeloid cells 54 . bglap, also known as osteocalcin, is a noncollagenous protein, mainly found in bone, which needs vitamin k for its synthesis. this protein is thought to play a role in calcium ion homeostasis and is used as a biological marker for bone formation 55 . 
in addition, it is involved in endocrine regulation by stimulating the release of insulin from β-cells of the pancreas and of adiponectin from fat cells 56 . besides these functions, it has been reported to promote energy availability and male sexual maturation by stimulating testosterone biosynthesis 57, 58 . cmc4, also called mtcp1, has mainly been reported in relation to various diseases. the mtcp1 gene was reported to affect t-cell homeostasis before leukemogenesis in transgenic mice 59 . although the function of this gene has not been entirely elucidated, dysregulation of mtcp1 affects cell survival and cell growth 60 . moreover, this gene is known to be involved in the pathogenesis of a subset of t-cell lymphoproliferative diseases 61, 62 . col1a1 and col1a2 encode the pro-α1 chain and the pro-α2 chain, respectively. type i collagen, which is composed of two pro-α1 chains and one pro-α2 chain, reinforces and supports most connective tissues such as bone, cartilage, skin, and tendon, and gives those tissues rigidity and elasticity 63, 64 . this protein was reported to stimulate the expression of pro-inflammatory cytokines and to activate professional phagocytes in the teleost fish gilthead seabream 65, 66 . in addition, receptor-mediated interactions between cells and collagen molecules have been reported to affect wound healing, inflammation, and the immune response by activating various factors such as cytokines, growth factors, and matrix metalloproteases 63, [66] [67] [68] [69] [70] . slc14a2, also known as urea transporter 2 (hut2), is an important gene involved in urea transport and plays a role in physiology. in mammals, two types of urea transporter (slc14a1 and slc14a2) have been reported 71 , and they are regulated by the hormone vasopressin 72 . 
the kidney uses urea to maintain the appropriate concentration and volume of blood. without control of these transporters, the urinary system of the organism would be severely damaged. in addition, previous studies have reported that genetic variation in this gene, including nucleotide changes, significantly influences blood pressure (bp) and metabolic syndrome 73, 74 . as shown in supplementary table 2 , immune-related degs were identified after infection with each of the three pathogens. pathogen infection activated the immune system to respond to the invasion of harmful external elements, as indicated by changes in the expression level of immune-related genes. down-regulation of a gene can in turn influence the expression of various molecules positioned downstream in the same pathway. in this regard, cd13 (aminopeptidase n), one of the genes down-regulated in common by the three pathogens, has been reported to inactivate interleukin 8 49 . the representative function of this cytokine is to induce migration of neutrophils and granulocytes toward the infection site. in addition, the absence of cd13 considerably improves cross-presentation of soluble antigen through regulation of receptor-mediated uptake 75 . thus, decreased cd13 expression might consequently activate the immune response in the olive flounder. the mtcp1 gene induced malignant t-cell transformation 59 and was related to the leukemogenic process of mature t-cell proliferation 61 . this gene is thought to maintain the balance of the t-cell immune response. the innate immune system mediates the initial inflammatory response to pathogen infection or injury. for a rapid response against external pathogens, infected cells secrete various cytokines to induce effector cells and complement. type i collagen, which is composed of the proteins encoded by col1a1 and col1a2, is involved in the expression of pro-inflammatory cytokines in the innate immune system. 
however, the two genes (col1a1 and col1a2) showed decreased expression after infection in our study. given the sampling time (7 days after infection), this may be explained by activation of the adaptive immune response in the olive flounder. it is difficult to understand the immune system of fish comprehensively. however, our results are expected to contribute to further study by extending genomic knowledge of the olive flounder. in conclusion, this study helps to clarify infection by diverse pathogens (antigenic variation) and their role in disease progression in the olive flounder. the differentially expressed genes identified by transcriptome analysis using three types of pathogens could be useful for studying basic diagnostic and therapeutic mechanisms, and offer opportunities for designing appropriate vaccines or drug targets based on pathogen-specific candidate genes. because of a lack of genomic information, or because only one infectious agent was used, previous studies were limited in their ability to reveal the global expression pattern of all genes in the olive flounder. we hope that this research will contribute to progress in various biological fields. ethical statement. all experiments with the olive flounders in this study were carried out in accordance with the guidelines and regulations approved by the ethical committee of pukyong national university. preparation of olive flounder gill tissues. gill tissues from twelve olive flounder (bw = ~50 g, n = 3/ group) including healthy and infected fish with each pathogen were used for this study. briefly, healthy fish (non-challenged), sampled fish at 7 days post challenge (dpc) with s. parauberis at 5.06 × 10^3 cfu/fish in 1/3 seawater of 21 °c, sampled fish at 7 dpc with vhsv at 106 pfu/fish in 1/3 seawater of 19 construction of cdna libraries for transcriptome analysis. 
transcriptome libraries were built with illumina's truseq rna protocol, using 1-2 μg of total rna for each sample. ampure xp beads (beckman coulter) and the ambion fragmentation reagents kit (ambion, austin, tx) were used for extraction of poly(a)+ rna and for its fragmentation, respectively. in the following steps, cdna synthesis, end-repair, a-base addition, and ligation of the illumina indexed adapters were carried out according to illumina's protocol. the cdna fragments were loaded on a 3% nusieve 3:1 (lonza) agarose gel, and fragments of 250-300 bp were size-selected for the libraries. the cdna fragments were recovered using qiaex ii gel extraction reagents (qiagen), and amplified using phusion dna polymerase (new england biolabs) for 14 pcr cycles. the amplified libraries were purified with ampure xp beads, and their concentration and product sizes were assessed on an agilent 2100 bioanalyzer. paired-end libraries were sequenced on the illumina hiseq2500 (2 × 100 nucleotide read length). transcriptome analysis and differential gene expression. transcriptome analysis was carried out with the rnaseq tuxedo protocol. sequences were mapped against the olive flounder draft genome (submitted at present) using tophat v2.0.9 with default options for paired-end sequences. transcript expression was estimated using the cufflinks program v2.1.1. total sequencing reads were preprocessed as follows: adapter trimming was performed using cutadapt with default parameters, and quality trimming (q30) was performed using fastqc with default parameters. processed reads were mapped to the olive flounder draft genome (submitted at present) using tophat and cufflinks with default parameters 76 . the differential analysis was performed using cuffdiff 76 with default parameters. 
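cufflinks reports transcript expression in fpkm (fragments per kilobase of transcript per million mapped fragments). as a reminder of what this unit means, the textbook formula can be sketched in python as below. this is a simplified illustration, not the estimator implemented in cufflinks, which fits a statistical model to assign fragments to transcripts; the function name and the example numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments
print(fpkm(500, 2_000, 20_000_000))
```

dividing by transcript length and by library size makes values comparable between long and short transcripts and between samples sequenced to different depths.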
further, the fpkm values from cuffdiff were normalized and quantitated using r package tag count comparison (tcc) 77 to determine statistical significance (e.g., p values) and differential expression (e.g., fold changes.). through these statistics analysis, we sorted degs having p < 0.05 and showed them as results. gene ontology analysis. deg set for go analysis was acquired from transcriptome analysis. degs were annotated from interproscan database and non-redundant protein database in the ncbi. david and uniprot tool were used for exploring the functional enrichment of these degs and sorting specially out the immune-related go terms with p-value of <0.05, respectively. immune relevant genes of japanese flounder, paralichthys olivaceus. comparative biochemistry and physiology development of a dna vaccine against hirame rhabdovirus and analysis of the expression of immune-related genes after vaccination identification and analysis of the immune effects of cpg motifs that protect japanese flounder (paralichthys olivaceus) against bacterial infection complete genomic sequence of viral hemorrhagic septicemia virus, a fish rhabdovirus viral haemorrhagic septicaemia virus in marine fish and its implications for fish farming -a review isolation of viral haemorrhagic septicaemia virus (vhsv) from wild japanese flounder. 
paralichthys olivaceus genetic relationship of the vhsv (viral hemorrhagic septicemia virus) isolated from cultured olive flounder, paralichthys olivaceus in korea an outbreak of vhsv (viral hemorrhagic septicemia virus) infection in farmed olive flounder paralichthys olivaceus in korea an outbreak of vhsv (viral hemorrhagic septicemia virus) infection in farmed japanese flounder paralichthys olivaceus in japan philasterides dicentrarchi (ciliophora, scuticociliatida) as the causative agent of scuticociliatosis in farmed turbot scophthalmus maximus in galicia (nw spain) fatal encephalitis due to the scuticociliata uronema nigricans in sea-caged, southern bluefin tuna thunnus maccoyii. diseases of aquatic organisms morphology and biology of parasite responsible for scuticociliatosis of cultured olive flounder paralichthys olivaceus complete small subunit rrna gene sequence of the scuticociliate miamiensis avidus pathogenic to olive flounder paralichthys olivaceus pseudocohnilembus persalinus (ciliophora: scuticociitida) is an additional species causing scuticociliatosis in olive flounder paralichthys olivaceus occurrence of scuticociliatosis in olive flounder paralichthys olivaceus by phiasterides dicentrarchi (ciliophora: scuticociliatida) pathogenicity of miamiensis avidus (syn. philasterides dicentrarchi), pseudocohnilembus persalinus, pseudocohnilembus hargisi and uronema marinum (ciliophora, scuticociliatida). diseases of aquatic organisms isolation and characterization of streptococcus sp. 
from diseased flounder (paralichthys olivaceus) in jeju island discrimination of streptococcosis agents in olive flounder (paralichthys olivaceus) bacterial diseases of cultured marine fish in japan multiplex pcr assay for detection of bacterial pathogens associated with warm-water streptococcosis in fish streptococcosis in cultured turbot (schophtalmus maximus) associated with streptococcus parauberis phenotypic characteristics of streptococcus iniae and streptococcus parauberis isolated from olive flounder (paralichthys olivaceus) molecular cloning and expression study on toll-like receptor 5 paralogs in japanese flounder characterization and functional analysis of a novel c1q-domain-containing protein in japanese flounder (paralichthys olivaceus) cathepsins in the kidney of olive flounder, paralichthys olivaceus, and their responses to bacterial infection molecular cloning and expression analysis of two hepcidin genes from olive flounder paralichthys olivaceus rna-seq transcriptome analysis of the olive flounder (paralichthys olivaceus) kidney response to vaccination with heat-inactivated viral hemorrhagic septicemia virus transcriptome analysis of the gonads of olive flounder (paralichthys olivaceus). 
fish physiology and biochemistry gonadal transcriptome analysis of male and female olive flounder (paralichthys olivaceus) de novo assembly of the japanese flounder (paralichthys olivaceus) spleen transcriptome to identify putative genes involved in immunity differentially expressed genes after viral haemorrhagic septicaemia virus infection in olive flounder (paralichthys olivaceus) cdna microarray analysis of viral hemorrhagic septicemia infected olive flounder, paralichthys olivaceus: immune gene expression at different water temperature transcriptional analysis of olive flounder lectins in response to vhsv infection immune response of olive flounder (paralichthys olivaceus) infected with the myxosporean parasite kudoa septempunctata cloning and expression analysis of cathepsin d in the olive flounder paralichthys olivaceus distribution of marine birnavirus in cultured olive flounder paralichthys olivaceus in korea the impact of co-infections on fish: a review systematic and integrative analysis of large gene lists using david bioinformatics resources in-depth profiling and analysis of host and viral micrornas in japanese flounder (paralichthys olivaceus) infected with megalocytivirus reveal involvement of micrornas in host-virus interaction in teleost fish a cdna microarray analysis to identify genes involved in the acute-phase response pathway of the olive flounder after infection with edwardsiella tarda heat shock protein profiles on the protein and gene expression levels in olive flounder kidney infected with streptococcus parauberis lectin histochemistry of kudoa septempunctata genotype st3-infected muscle of olive flounder (paralichthys olivaceus) anti-immunology: evasion of the host immune system by bacterial and viral pathogens fish immunity and parasite infections: from innate immunity to immunoprophylactic prospects advances in research of fish immune-relevant genes: a comparative overview of innate and adaptive immunity in teleosts present status and 
future prospects of fish vaccination: a in vitro degradation of opioid peptides by human placental aminopeptidase m inactivation of interleukin-8 by aminopeptidase n (cd13) t cell responses affected by aminopeptidase n (cd13)-mediated trimming of major histocompatibility complex class ii-bound peptides human aminopeptidase n is a receptor for human coronavirus 229e cd13 (human aminopeptidase n) mediates human cytomegalovirus infection the activation-dependent induction of apn-(cd13) in t-cells is controlled at different levels of gene expression cd13 restricts tlr4 endocytic signal transduction in inflammation milk ribonuclease-enriched lactoferrin induces positive effects on bone turnover markers in postmenopausal women endocrine regulation of energy metabolism by the skeleton regulation of male fertility by the bone-derived hormone osteocalcin osteocalcin signaling in myofibers is necessary and sufficient for optimum adaptation to exercise the mtcp1 oncogene modifies t-cell homeostasis before leukemogenesis in transgenic mice transgenic mice for mtcp1 develop t-cell prolymphocytic leukemia mtcp-1: a novel gene on the human chromosome xq28 translocated to the t cell receptor alpha/delta locus in mature t cell proliferations the chromosomal translocation t(x;14)(q28;q11) in t-cell prolymphocytic leukaemia breaks within one gene and activates another collagens, modifying enzymes and their mutations in humans, flies and worms procollagen trafficking, processing and fibrillogenesis collagen regulates the activation of professional phagocytes of the teleost fish gilthead seabream correlated expression profile of extracellular matrix-related molecules during the inflammatory response of the teleost fish gilthead seabream a role for specific collagen motifs during wound healing and inflammatory response of fibroblasts in the teleost fish gilthead seabream evolution of collagen-based adhesion systems structural insights into the interactions between platelet receptors and 
fibrillar collagen interactions between extracellular matrix and growth factors in wound healing urea and renal function in the 21st century: insights from knockout mice essential role of vasopressin-regulated urea transport processes in the mammalian kidney genetic variation in the human urea transporter-2 is associated with variation in blood pressure genetic variants of human urea transporter-2 are associated with metabolic syndrome in asian population cd13 regulates dendritic cell cross-presentation and t cell responses by inhibiting receptor-mediated antigen uptake differential gene and transcript expression analysis of rna-seq experiments with tophat and cufflinks tcc: an r package for comparing tag count data with robust normalization strategies this research was a part of the project titled "omics based on fishery disease control technology development and industrialization (20150242), " funded by the ministry of oceans and fisheries, korea. g.n. wrote the main manuscript text, a.m. and j.g. supported arrangement of contents in manuscript, h.l. and a.j. supported to complete fig. 2 , d.y. and a.k. prepared fish tissue samples, w.k. provided olive flounder genome reference sequence, k.a. provided bioinformatic advice for this study, d.k., s.k., h.c., y.c. and c.p. reviewed the manuscript, h.k. supervised entire flow of the manuscript and reviewed contents. supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-36342-y.competing interests: the authors declare no competing interests.publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. 
key: cord-016713-pw4f8asc authors: goyal, amit k.; rath, goutam; garg, tarun title: nanotechnological approaches for genetic immunization date: 2013-05-24 journal: dna and rna nanobiotechnologies in medicine: diagnosis and treatment of diseases doi: 10.1007/978-3-642-36853-0_4 sha: doc_id: 16713 cord_uid: pw4f8asc genetic immunization is one of the important findings that provide a multifaceted immunological response against infectious diseases. with the advent of r-dna technology, it is possible to construct vectors carrying immunologically active genes against specific pathogens. nevertheless, site-specific delivery of the constructed genetic material is an important contributory factor for eliciting specific cellular and humoral immune responses. nanotechnology has demonstrated immense potential for the site-specific delivery of biomolecules. several polymeric and lipidic nanocarriers have been utilized for the delivery of genetic materials. these systems seem to offer better compatibility, low toxicity and low cost, and are capable of delivering biomolecules to intracellular sites for better expression of the desired antigens. further, surface engineering of nanocarriers and targeting approaches can offer better presentation of antigenic material to immunological cells. this chapter gives an overview of existing and emerging nanotechnological approaches for the delivery of genetic materials. 
vaccine development offers an attractive and cost-effective preventive approach against deadly diseases. new advances in immunology, molecular biology and biotechnology allow for the development of unique, safe and effective vaccines against some dreadful diseases such as hiv, cancer, hepatitis, tuberculosis, etc. (table 1) . genetic immunization holds potential for the discovery of new vaccines and may be an efficient vaccine delivery system. in the early 1990s, dna vaccines burst into the scientific limelight. tang and johnston described the delivery of dna into mouse skin using a gene gun and suggested that this could be a useful technique for generating antibody responses against a specific transgene product (tang et al. 1992) . in 1992, at the annual vaccine meeting at the cold spring harbor laboratory, dna vectors were reported to drive both humoral and cellular immune responses against pathogens or tumor antigens in vivo. merck pharmaceutical company reported immune responses against influenza virus antigens in mice after intramuscular injection of naked plasmids (ulmer et al. 1993) . similarly, robinson demonstrated the ability of dna plasmids to raise responses against influenza virus antigens (fynan et al. 1993) . the capability of plasmids carrying hiv antigens or tumor antigens to generate immune responses and protection from tumors in mice has also been described (wang et al. 1993 ). importantly, a dna vaccine affects humoral as well as cellular immunity. the dna approach also promised to overcome the safety concerns associated with live vaccines (their risk of reversion and their potential spread to unintended individuals) and to avoid the risks linked to the manufacture of killed vaccines (ruprecht 1999) . vaccines are generally composed of whole organisms, either live and weakened or killed (first-generation vaccines). 
live, attenuated organisms such as smallpox and polio vaccines are able to induce killer t-cell (t c ) responses, helper t-cell (t h ) responses, and antibody immunity (fig. 1) . first-generation vaccines provide maximum protection but carry the risk that attenuated forms of a pathogen can revert to a dangerous form and may still be able to cause disease, especially in immunocompromised vaccine recipients (e.g., aids patients). killed vaccines cannot generate specific killer t-cell responses and are effective only against the limited set of diseases in which a cellular response is not essential (alarcon et al. 1999 ). these limitations initiated the research on second-generation vaccines. second-generation vaccines were the subunit vaccines, consisting of defined protein antigens (such as tetanus or diphtheria toxoid), recombinant protein components (hepatitis b surface antigen), or surface proteins (influenza). these vaccines are able to generate t h and antibody responses, but not killer t-cell responses, which again restricted their utility to a limited number of diseases. today is the era of genetic immunization, the next-generation (third-generation) vaccine, which appears to be highly effective to date. this strategy is based upon improved gene optimization, improved rna structural design, novel formulations and immune adjuvants, and more effective delivery approaches (alarcon et al. 1999; robinson and pertmer 2000) . at the cellular level, the introduction of nanotechnology and the development of nanocarrier-based vaccines provide effective immunization through better targeting and by triggering antibody responses. in order to induce effective protective immunity, these vaccines require boosting with agents called adjuvants. adjuvants and delivery vehicles have been shown to protect antigens from degradation. 
many current efforts to develop novel adjuvants and carriers have focused on systems at the micro- and nanoscale. immunization with traditional vaccines requires the administration of live attenuated virus or killed organisms, whereas dna vaccines can be constructed to encode specific antigenic determinants. dna vaccines are highly flexible, stable, easily stored and manufactured on a large scale, and can encode several types of genes, including viral or bacterial antigens and immunological and biological proteins (gengoux and leclerc 1995; kutzler and weiner 2008) . many potential advantages of dna vaccines are summarized in table 2 . the gene of interest carrying the antigenic determinant is inserted into a recombinant vector, for example into the multiple cloning region of a plasmid, enzymatically, synthetically or by pcr, and is delivered to the inoculation site in the skin (intradermally), the subcutaneous tissue, or the muscle by one of several delivery methods: physical (gene gun, electroporation), viral (virosomes), or nonviral (liposomes, microspheres, nanospheres). the mechanisms by which dna vaccines produce antigen-specific immunity in vivo are under intense investigation, with an idealized model presented in fig. 2 . figure 2 gives an overview of the mechanisms of plasmid uptake and proteinaceous antigen expression by either somatic cells (e.g., myocytes, keratinocytes) at the site of injection or the resident antigen-presenting cells (apcs), the immature dendritic cells. 
the mechanisms include (a) direct targeting of dendritic cells (langerhans cells, i.e., skin dendritic cells) in gene gun administration of plasmid dna, which involves high-speed shooting of gold microbeads coated with plasmid dna into the upper layers of the skin or (b) "cross-priming," most likely in intramuscular injections or any parenteral injections where the somatic cells mentioned above primarily express the protein encoded by the the success of dna vaccines depends on improving their immunogenicity and safety. therefore, there is an urgent need for the development of potent and safe adjuvants and delivery systems that can be used with the new generation of vaccines. as shown in fig. 3 , there are several ways in which antigen expression and immunogenicity can be improved for the dna vaccine platform, and many steps have been taken to modify the immunogenicity and safety of dna vaccines. the promoter is an important component of the plasmid that drives high levels of expression of the gene of interest. various promoters have been utilized to improve the expression of vaccine genes. the human cytomegalovirus (cmv) promoter has been extensively used for high levels of protein expression in mammalian cells (boshart et al. 1985) . however, there are some drawbacks associated with cmv promoters, such as chromatin condensation by histone deacetylase. recently, cmv promoter-based plasmids supplemented with histone deacetylase inhibitors have shown increased expression of dna vaccine antigens (lai et al. 2010 immediate-early promoter one (ie1) have also demonstrated better gene expression in insect cells compared with the cmv promoter (he et al. 2008 ). the porcine circovirus type 1 capsid gene promoter has enhanced antigen expression and immunogenicity in a hiv-1 plasmid vaccine (tanzer et al. 2011 ). 
regulation of transcriptional termination is a key element in the control of gene expression within the framework of a single transcriptional promoter (barr et al. 2002) . one of the most effective ways to increase protein production is the use of codon optimization, that is, adopting species-specific codon changes (gustafsson et al. 2004) . plasmid backbone optimization has also been an important contributory factor for dna vaccines. replacement of the sv40t polyadenylation and splicing signals of the paec plasmid vectors by a synthetic intron, synthetic rabbit beta globin-based termination/polyadenylation sequences and cpg motifs has enhanced cell-mediated ifn-gamma-secreting activity. the rna polymerase ii dependent cytomegalovirus immediate early (cmv ie) enhancer/promoter and the t7 promoter in psmcta and pshcta have been utilized to enhance the expression of antigenic substances (yu et al. 2005 ). viral vectors are a tool commonly used by molecular biologists to deliver genetic material into cells. viral vector vaccines use live viruses to carry dna into human cells. such a vaccine consists of a non-replicating virus that contains some defined genetic material from the pathogen to which immunity is desired. fig. 3 factors affecting the immunogenicity of dna vaccines. viruses have evolved specialized molecular mechanisms to efficiently transport their genomes inside the cells they infect. viral vector vaccines carry dna into a host cell for production of antigenic proteins that can be tailored to stimulate a range of immune responses, including antibody, t helper cell (cd4 + t cell), and cytotoxic t lymphocyte (ctl, cd8 + t cell) mediated immunity (draper and heeney 2010) . retroviruses, parvoviruses, adenoviruses, lentiviruses, adeno-associated viruses, and the herpes simplex virus are being investigated for their ability to transfer dna. 
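the idea behind codon optimization, mentioned above, can be illustrated with a deliberately simplified python sketch in which each amino acid is back-translated to a single preferred codon. the preference table is a small hypothetical subset chosen for illustration and is not a measured codon-usage table for any species; the function name is also our own, and practical tools weigh additional factors such as gc content, mrna structure and repeated sequences.

```python
# hypothetical preferred codon for a handful of amino acids (illustrative only)
PREFERRED_CODON = {
    "M": "ATG",  # methionine (start)
    "A": "GCC",  # alanine
    "K": "AAG",  # lysine
    "L": "CTG",  # leucine
    "S": "AGC",  # serine
    "*": "TGA",  # stop
}


def back_translate(protein: str) -> str:
    """return a dna sequence encoding the protein with one preferred codon per residue."""
    try:
        return "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as missing:
        raise ValueError(f"no preferred codon defined for residue {missing}") from None


print(back_translate("MAKLS*"))
```

the encoded protein is unchanged because only synonymous codons are chosen; what changes is how well the sequence matches the codon preferences of the host cell.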
gene expression with high transfection efficiencies in tissues such as kidney, heart, muscle, eye, and ovary has been achieved by using viral vectors. advantages of viral-vectored vaccines include their ease of production, a good safety profile, the ability to potentiate strong immune responses, to infect a broad spectrum of cell types, and to trigger t-lymphocyte activation, and the potential for nasal or epicutaneous delivery and mucosal immunization (chamberlain 2002; galimi and verma 2002; lien and lai 2002; martin et al. 2002; mctaggart and al-rubeai 2002; wolf and jenkins 2002). recombinant retroviruses can integrate into the host genome in a stable fashion because they carry a reverse transcriptase, which makes a dna copy of the rna genome, and an integrase, which allows integration into the host genome. 8-10 kb is the typical maximum length of an allowable dna insert in a replication-defective viral vector. lentiviruses are a subclass of retroviruses. their unique feature is the ability to integrate into the genome of nondividing cells, whereas other retroviruses can infect only dividing cells. when the virus enters the cell, the viral rna genome is reverse transcribed into dna, which is then inserted into the host genome by the viral integrase (cattoglio et al. 2007). adenoviral vectors are applied primarily in gene therapy and vaccination; their use in basic research is limited because adenoviral dna does not integrate into the genome and is not replicated during cell division. humans commonly come into contact with adenoviruses, which cause respiratory, gastrointestinal, and eye infections. adeno-associated virus (aav) is a small virus that infects both dividing and nondividing cells of humans and some other primate species, may incorporate its genome into that of the host cell, and causes only a very mild immune response. these features make aav a very attractive candidate for creating viral vectors for gene therapy (goff and berg 1976).
nowadays, incorporation of molecular adjuvants is the main strategy for the improvement of vaccines. plasmids encoding cytokines, chemokines, or co-stimulatory molecules such as death receptors, growth factors, adhesion molecules, and toll-like receptor ligands can be co-injected, individually or in combination, to maximize the immune response to the plasmid-encoded antigen in both prophylactic and therapeutic studies in the clinic. for example, co-administration of antigen with synthetic oligodeoxynucleotides containing unmethylated cpg motifs boosts the humoral and cellular responses in mice (higgins et al. 2007). recently, immunomodulation has been based on targeting antigen-presenting cells (apcs), mainly macrophages, by using the macrosialin promoter. plasmids expressing the jev envelope (e) protein under the control of the macrosialin promoter induced immunity against jev comparable to that induced by a construct driven by the ubiquitous cytomegalovirus (cmv) immediate early promoter (ahsan and gore 2011). nk group 2, member d (nkg2d) is also reported to be a potent activating receptor expressed by cells of the innate and adaptive immune systems. recombinant mouse cmv expressing the high-affinity nkg2d ligand rae-1γ has shown better expression and profound virus attenuation in vivo and could be a powerful approach for developing an immunogenic hcmv vaccine (slavuljica et al. 2010). in vivo targeting of dendritic cells (dcs) is an attractive approach with potential advantages in vaccine efficacy, cost, and availability. genetic targeting of the active transcription factor xbp1s to dcs, driven by dc-specific cd11c (xbp1s/dc), has potentiated vaccine-induced prophylactic and therapeutic antitumor immunity in multiple tumor models (tian et al. 2012). recently, heterodimeric, apc-targeted multireceptor ligand approaches have been implemented to assess the potential of including more than one apc-specific targeting unit in the antigenic molecule.
results revealed that heterodimeric barnase-barstar vaccine molecules were potent and provide a flexible platform for the development of novel dna vaccines with increased potency (spang et al. 2012). some mechanisms of adjuvant action are discussed below:
1. vaccine adjuvants can increase the potency and immune response of small, antigenically weak synthetic or recombinant peptides in immunologically immature, immunosuppressed, or senescent individuals.
2. they can improve the immune response to stronger antigens in respect of speed, vigor, and persistence. for example, dtp adsorbed onto aluminum adjuvants elicits an earlier and higher antibody response after primary immunization than do unadjuvanted preparations.
3. vaccine adjuvants can modulate antibody avidity, specificity, quantity, isotype, and subclass against epitopes on complex immunogens.
4. they target antigen to a cell-surface receptor on apcs by formation of multimolecular aggregates.
5. they can direct antigen presentation by direct peptide exchange on surface mhc molecules, or by the mhc class i or mhc class ii pathways by means of fusion with or disruption of cell membranes (newman and powell 1995).
the most important characteristic of any adjuvanted vaccine is that it is more efficacious than the aqueous vaccine; unfortunately, the absolute safety of adjuvanted vaccines, or of any vaccine, cannot be guaranteed. the real or theoretical risks of administering vaccine adjuvants are local acute or chronic inflammation, painful abscess, persistent nodules, ulcers, fever, hypersensitivity, anaphylaxis, chemical toxicity to tissues or organs, autoimmune arthritis, amyloidosis, anterior uveitis, glomerulonephritis or meningoencephalitis, immune suppression or oral tolerance, carcinogenesis, teratogenesis or abortogenesis, and spread of a live vectored vaccine to the environment (edelman and tacket 1990; bussiere et al. 1995; goldenthal et al. 1998).
an ideal adjuvant should have the following characteristics:
1. it must be safe, including freedom from side effects.
2. it should be affordable and stable.
3. it should be biodegradable or easily removed from the body after its adjuvant effect.
4. efficacy and immunogenicity should be achieved using fewer doses and/or lower concentrations of the antigen.
5. it should elicit a more vigorous protective or therapeutic immune response combined with the antigen than when the antigen is administered alone.
6. it must be defined chemically and biologically, so that there is no lot-to-lot variation in the manufactured product.
freund's adjuvant is a solution of antigen emulsified in mineral oil and used as an immunopotentiator (booster). freund's complete adjuvant (fca) contains inactivated and dried mycobacteria, whereas the incomplete form (fia) lacks the mycobacterial components. although fca has proved to be a potent inducer of cell-mediated immunity and can boost the humoral immune response, associated adverse side effects such as sterile abscesses, granulomas, muscle indurations, plasma cell neoplasia, ascites, and amyloidosis have limited its utility. in freund's incomplete adjuvant (fia), a modified version of fca, antigen is administered in a water-in-oil (w/o) emulsion without mycobacterial components. it consists of a mixture of mineral oil (drakeol 6vr, bayol f, marcol 52) (85 % v/v) and emulsifier (mannide monooleate) (15 % v/v) with an equal volume of aqueous solution of antigen. freund's adjuvants act by allowing a gradual and continuous release of the antigen through establishment of an antigen-containing depot at the site of injection, or through interaction with mononuclear cells such as phagocytic cells and antigen-presenting cells. fia has been included in veterinary vaccines (rabies, hog cholera, canine hepatitis) (freund et al. 1948; fastier and hansen 1964; ott 1966), as well as human vaccines (tetanus toxoid, influenza vaccines) (salk et al. 1952).
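the fia composition given above (oil phase of 85 % v/v mineral oil and 15 % v/v mannide monooleate, mixed with an equal volume of aqueous antigen solution) can be turned into component volumes with a short calculation. the sketch below is only an illustration: the function name is hypothetical, "equal volume" is read as a 1:1 ratio of oil phase to aqueous phase, and volumes are assumed to be additive on mixing.

```python
def fia_emulsion_volumes(total_ml: float) -> dict:
    """Split a target emulsion volume into its three components.

    Assumptions (not from the source): volumes are additive, and the
    oil phase and aqueous antigen solution are mixed 1:1 by volume.
    """
    if total_ml <= 0:
        raise ValueError("total_ml must be positive")
    aqueous_ml = total_ml / 2          # equal volume of aqueous antigen
    oil_phase_ml = total_ml - aqueous_ml
    return {
        "aqueous_antigen_ml": aqueous_ml,
        "mineral_oil_ml": 0.85 * oil_phase_ml,   # 85 % v/v of oil phase
        "emulsifier_ml": 0.15 * oil_phase_ml,    # 15 % v/v of oil phase
    }


if __name__ == "__main__":
    for name, volume in fia_emulsion_volumes(10.0).items():
        print(f"{name}: {volume:.2f}")
```

for a 10 ml emulsion this gives 5 ml of aqueous antigen solution, 4.25 ml of mineral oil, and 0.75 ml of emulsifier.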
in general, both fia and fca are very efficient in raising high antibody titers and inducing cytotoxic t lymphocytes (ctl), and they are used in priming immunizations. morozova et al. investigated the development of the inflammatory response in the rat myocardium after immunization of rats with a single subcutaneous injection of cardiac myosin (800 μg/kg) with incomplete freund's adjuvant (ifa) (gjessing et al. 2012). very few studies have examined the utility of freund's adjuvant for dna vaccination. it has been demonstrated that plasmid pv-16cpg suspended in ifa significantly enhanced both cellular and humoral immune responses to hbsag (luo et al. 2012).
aluminum compounds [aluminum phosphate (alpo4), aluminum hydroxide (al(oh)3), and alum] are currently the most commonly used adjuvants in human and veterinary vaccines owing to their good track record of safety, low cost, and adjuvanticity with a variety of antigens (gupta et al. 1993; gupta and siber 1995). however, aluminum adjuvants have certain limitations, such as local reactions at the site of injection, augmentation of ige antibody responses, ineffectiveness for some antigens, and inability to augment cell-mediated immune responses. two methods are used to prepare vaccines and toxoids with aluminum compounds: in situ precipitation of aluminum compounds in the presence of antigen, and adsorption of antigen onto preformed aluminum gel (aprile and wardlaw 1966; holt et al. 1994; hem and white 1995; gupta 1998). the mechanism of adjuvanticity of aluminum compounds includes formation of a depot, efficient uptake by antigen-presenting cells, stimulation of immune competent cells of the body through induction of eosinophilia, and activation of macrophages and complement.
recently, the adjuvanticity of alum has been attributed to cell death and the subsequent release of host cell dna, which acts as a potent endogenous immunostimulatory signal mediating alum adjuvant activity (marichal et al. 2011). gupta et al. (1996) showed that diphtheria toxoid adsorbed onto aluminum phosphate induced significant antibody levels in rabbits. manam et al. reported that aluminum phosphate adjuvant had no effect on the tissue distribution and integration frequency of the delivered genetic material (manam et al. 2000). similarly, liang et al. (2004) found no increase in hbsag expression when plasmid pcdna3.1-s was mixed with aluminum phosphate. however, they demonstrated a better antibody titer after intramuscular immunization of balb/c mice with pcdna3.1-s mixed with aluminum phosphate adjuvant, indicating that aluminum phosphate has potential for dna vaccination (liang et al. 2004). recently, yu et al. demonstrated that aluminum adjuvant used with dna vaccines against botulinum neurotoxins (bonts) induced protective humoral immune responses (yu et al. 2010). combined use of il-12 with alum adjuvant for dna immunization has also produced a significant change in the survival rates of animals vaccinated against toxoplasma gondii (khosroshahi et al. 2012).
cytokines are a group of low-molecular-weight proteins secreted by the cells of innate and adaptive immunity that have a major role in cell-to-cell communication. cytokines play an important role in the induction of immune responses during the processing and presentation of antigens. numerous cytokines, including interleukin-12 (il-12), granulocyte-macrophage colony stimulating factor (gm-csf), and interleukin-2 (il-2), have been shown to significantly modulate the inflammatory process when given systemically.
the local administration of il-2 increases local expression of major histocompatibility complex (mhc) class ii antigens and enhances skin antigen reactivity, but high bolus doses of il-2 cause hypotension and exacerbation of underlying autoimmune disease and induce vascular leak syndrome. studies have indicated that exogenous il-2 could be a valuable adjunct in the treatment of human immunodeficiency virus (hiv)-infected patients by decreasing the frequency of apoptotic peripheral blood mononuclear cells (pbmcs), which may contribute to the increase in circulating cd4+ t cells. il-2 also induces b-cell activation and antibody synthesis in vitro (cordiali fei et al. 1994). among various improvement strategies, the incorporation of cytokine-expressing plasmids as molecular adjuvants has been widely studied in the past years, yet still without significant clinical application. this chapter reviews recent progress in the co-application of cytokine-encoding genes used for enhancement and direction of immunogenicity and discusses their therapeutic potential for future applications. coadministration of pro-inflammatory agents (such as various interleukins, tumor necrosis factor, and gm-csf) plus th2-inducing cytokines increases antibody responses, whereas pro-inflammatory agents and th1-inducing cytokines decrease humoral responses and increase cytotoxic responses (which are more important in viral protection, for example). co-stimulatory molecules such as b7-1, b7-2, and cd40l are also sometimes used.
mpl (monophosphoryl lipid a), an immunostimulant, is derived from the lipopolysaccharide (lps) of salmonella minnesota r595. an important characteristic of mpl adjuvant activity is that it enhances the generation of specific immunity without being directly associated with an antigen. the choice of an mpl adjuvant formulation will depend on several factors, such as the nature of the antigen, the desired immune response characteristics, and the level of tolerable local reactogenicity.
aqueous dispersions of mpl in isotonic buffers, when admixed with soluble protein antigens, can provide a strong adjuvant effect. an advantage of these mpl plus antigen formulations is that they tend to be well tolerated and induce little or no local tissue reaction at the injection site (qureshi et al. 1985). mpl-a has been used to enhance immunity induced by dna vaccination against human immunodeficiency virus type 1 (hiv-1). results indicate that mpl performs as an effective adjuvant for immunogenic dna injection despite reduced expression of the encoded protein in muscle (sasaki et al. 1997). combination of mpl with antigen-encoding dna has enhanced the protective neutralizing antibody response against the glycoprotein of the cvs rabies virus (lodmell et al. 2000). lipid a has also been admixed with plasmid dna (pdna)-coated nanoparticles and studied for its immunological potential. plasmid dna with lipid a produced a significantly higher immunological response, especially the cellular response (cui and mumper 2003a, b). these studies indicate that lipid a is a potential adjuvant for further enhancing immune responses; however, few studies have used this adjuvant for dna vaccination.
several established methods have been utilized for transferring plasmid dna into cells, including calcium phosphate precipitation, electroporation, particle bombardment, liposomal delivery, polymeric delivery, viral-vector delivery, and receptor-mediated gene delivery. compared to viral vectors, nonviral vectors are easy to make and are less likely to produce immune reactions (edelman and tacket 1990). in addition, no replication is required. an engineered novel nano-construct may deliver immunogens safely, with the appropriate kinetics, to the appropriate location, and possibly together with the adequate recognition and maturation stimuli (fig. 4).
fig. 4 schematic representation of the immunological response generated by a novel dna-loaded nanocarrier
the use of nonviral particulate carriers for dna-based vaccination could provide better and safer delivery of encapsulated genetic material, circumvent the need for muscle involvement, and facilitate instead the uptake of the dna by apcs. however, transfection of apcs with dna encapsulated in particulate carrier systems will depend upon the choice of carrier surface charge, size, and lipid/polymer composition, or the presence of other biological agents [e.g., interleukin 2 and interferon-γ (ifn-γ)]. toxicity, transfection efficiency, nucleic acid (na) degradation, and free na release are challenging problems for all of the current nonviral gene delivery systems, including lipid and polymer carrier systems (pouton et al. 1998; cui and mumper 2003a, b). one current trend in dna vaccine formulation is the use of biodegradable polymeric microparticles and liposomal delivery systems, which are excellent formulations for delivery and enhanced immunogenicity in several different hosts such as mice, nonhuman primates, and humans (herrmann et al. 1999; kaur et al. 2004). as noted earlier, genetic materials attached to a particulate carrier are more likely to bring about a successful immunological reaction, and some carriers, such as chitosan particles, can act as adjuvants in their own right. natural polymers such as gelatin or albumin have been used as particulate drug delivery systems, although they are of uncertain purity and certainly have the potential for immunogenicity (pouton et al. 1998; cui and mumper 2003a, b; xiang et al. 2006; pichichero 2008). plasmid dna is trapped on the surface of polymers such as polylactide-co-glycolide, chitosan, polyethyleneimine, amine-functionalized polymethacrylates, cationic poly(β-amino esters), poloxamers, and polyvinylpyrrolidone (densmore 2003).
polymer-trapped plasmid dna is delivered systemically or directly to mucosal surfaces (orally or via the respiratory tract), where the complex is taken up by dendritic cells (dcs); this results in upregulation of dc activation markers and further augments systemic and mucosal immune responses.
liposomes offer considerable flexibility for vaccine optimization owing to their structural versatility, including vesicle surface charge (both cationic and anionic liposomes can be made), size, and lipid content. liposomes with other suitable adjuvants can protect dna from degradation by serum proteins during transfer of dna across membranes and after the release of genetic material following fusion with the endosome (gao and huang 1995; nakanishi and noguchi 2001). among the different approaches to drug delivery, lipid vesicles for both hydrophobic and hydrophilic drugs have attracted much attention. lipid-based gene delivery is the focus of several specialized high-technology companies, of which vical (san diego, ca, usa), genzyme (framingham, ma, usa), genemedicine (the woodlands, tx, usa) and megabios (burlingame, ca, usa) have products in clinical trials. engineered liposomal and non-liposomal systems such as ph-sensitive cationic and anionic liposomes, ph-sensitive immunoliposomes, fusogenic liposomes, genosomes (dna-liposome/lipid complexes), lipofection™ (lipid-dna complex), and, more recently, cochleates are being investigated as the major gene vectors (fig. 5). however, most of the commercially available nonviral gene vectors used for transfection are cationic liposome-dna complexes (fenske and cullis 2008). liposomes are self-assembling structures comprising concentric amphipathic lipid (e.g., phospholipid) bilayers separated by aqueous compartments (baca-estrada et al. 2000; saupe et al. 2006). in 1974, humoral immune responses were first observed in mice after injection of liposome-entrapped diphtheria toxoid (allison and gregoriadis 1974; manesis et al. 1978).
liposomal vaccines that have been investigated in human trials include those against malaria, hiv, hepatitis a, influenza, prostate cancer, and colorectal cancer (katre et al. 1998). in a liposome-based drug delivery system, genetic material is encapsulated in the liposome and then administered to the patient to be treated. an advantage of liposomal dna is that it may be taken up directly by apcs such as dendritic cells, which results in transfection and mhc class i and ii expression; the presented antigenic peptides stimulate cd4+ and cd8+ t cells, inducing ctl responses, and also stimulate b cells to produce antibodies. in contrast, after vaccination with naked plasmid dna, the plasmid is taken up by myocytes, which are transfected. unfortunately, there are a number of problems associated with the use of conventional liposomes as genetic vaccine delivery vehicles. the relatively low transfectivity of liposomes, particularly evident with insufficient quantities of polynucleotide within liposomal formulations, can be overcome by adding positively charged amphipathic lipid moieties to liposomal formulations. phospholipids and other components that may be used for the preparation of liposome-entrapped vaccines include phosphatidylcholine, phosphatidic acid, triolein, phosphatidylglycerol, phosphatidylserine, distearoyl phosphatidylcholine, dioleylphosphatidylethanolamine, phosphatidylethanolamine, polyethyleneglycol 6000, etc. overall, with modification, these systems may provide high membrane fluidity, flexibility, endocytosis, and fusogenic behavior, which makes them far better than other particulate carriers (fig. 6).
cationic liposomes are widely explored nowadays for the delivery of dna into eukaryotic cells. lipoplexes are formed by simple mixing of positively charged lipid bilayers with negatively charged naked dna. the resulting cationic liposome-dna complexes (lipoplexes) are taken up via endocytosis, followed by their release from an early endosomal compartment (duzgunes et al. 2003).
cationic lipid-dna complexes have been used successfully to deliver plasmid dna to the lungs, brain, tumors, and skin by local administration, or to vascular endothelial cells after systemic, intravenous injection (brigham et al. 1989). in addition to different cationic lipids (fig. 7 have also shown an important role in membrane perturbation and fusion for intracellular delivery of genetic material. liu et al. showed that lipoplexes gave much higher transfection in the liver than naked dna alone (liu et al. 2003). gregoriadis et al. showed for the first time that intramuscular immunization of mice with prc/cmv hbs (encoding the s region of hepatitis b antigen; hbsag) entrapped in positively charged (cationic) liposomes leads to greatly improved humoral and cell-mediated immunity (gregoriadis et al. 1997). these cationic liposome-entrapped dna vaccines generated titers of anti-hbsag igg1 antibody more than 100-fold higher, and increased levels of both ifn-γ and il-4, compared with naked dna or dna complexed with preformed similar (cationic) liposomes. further, modification of the liposomal surface with polymer offers potential for oral administration of plasmid dna and elicits markedly enhanced transgene-specific cytokine production following in vitro restimulation of splenocytes with recombinant antigen (somavarapu et al. 2003). modification of lipid/dna complexes with the polymer poly(d,l-lactic acid) was found to be consistently and significantly more effective than either unmodified liposomal dna or naked dna in eliciting transgene-specific immune responses to plasmid-encoded antigen when administered by the subcutaneous (s.c.) route (bramwell et al. 2002).
surface-modified mannosylated cationic liposomes were developed for targeted delivery of pdna to apcs. the results verified that the mannosylated (man) lipoplex induces significantly higher pub-m gene transfection into dendritic cells and macrophages than unmodified lipoplex or naked dna; it also strongly induces ctl activity against melanoma, inhibits tumor growth, and prolongs survival after tumor challenge compared with unmodified liposomes (lu et al. 2007).
an anionic lipid formulation called fluid liposomes was capable of delivering fluorescently labeled oligonucleotides into bacterial cells. it was composed of dppc and 1,2-dimyristoyl-sn-glycero-3-[phospho-rac-(1-glycerol)] (dmpg). the lack of further progress with these systems may be attributed to the poor association between dna molecules and anionic lipids caused by electrostatic repulsion between these negatively charged species (perrie and gregoriadis 2000). liposomes have been prepared from mixtures of anionic and zwitterionic lipids, dopg and dope, respectively, at a molar ratio of 17:83 (dopg:dope). efficient and relatively safe dna transfection using anionic lipoplexes makes them an alternative for gene delivery (patil et al. 2004). similarly, an anionic liposome-entrapped polycation-condensed dna delivery system (lpdii) incorporating the endosomolytic bacterial protein listeriolysin o (llo) has been developed; it demonstrated better condensation of the dna and improved transfection efficiency owing to the endosomolytic properties of llo (lorenzi and lee 2005). a combination of cationic lipoplexes and pegylated anionic liposomes has also been used to prepare anionic pegylated lipoplexes. gene expression with the anionic formulations was similar to that with the cationic formulation used as a control (mignet et al. 2008). overall, anionic lipoplex formulations show promise as nonviral vectors with high transfection efficiency and low cytotoxicity.
a growing body of literature describes the role of ph-sensitive liposomes in targeting and/or releasing encapsulated genetic material within cellular compartments. ph-sensitive liposomes are designed to release their contents in response to the acidic ph within the endosomal system while remaining stable in plasma, thus improving the cytoplasmic delivery of biopharmaceuticals. they can be generated by the insertion of dope into liposomes composed of acidic lipids such as cholesteryl hemisuccinate or oleic acid (venugopalan et al. 2002). the detergent removal method is reported to be a superior method for preparing glycosaminoglycan-resistant and ph-sensitive lipid-coated dna complexes; it produces stable, but acid-activatable, lipid-coated dna complexes (lehtinen et al. 2008). these lipids remain stable at neutral ph (about 7); at the acidic ph of the endosomal compartment they undergo protonation and collapse into a non-bilayer structure, which in turn helps in the rapid release of dna into the cytoplasm. recently, citraconyl-dope (a chemical derivative of dope) has been used to deliver dna-based therapeutics to cancer cells, thereby combining targeting with rapid endosomal release (reddy and low 2000). addition of the ph-sensitive fusogenic peptide gala (a peptide composed of repeating glu-ala-leu-ala sequences) to lipidic preparations is also a promising method for enhancing the expression of the desired proteins. studies demonstrated that addition of 0.1 μm gala to the plasmid/liposome complex significantly increased the transfection efficiency, especially in the case of lipofectin, but higher concentrations of gala decreased transfection efficiency (futaki et al. 2005; nakase et al. 2011). similarly, a ph-sensitive histidine-modified galactosylated cholesterol derivative (gal-his-c4-chol) has been synthesized that demonstrates much greater transfection activity than conventional liposomes in hepg2 hepatic cells (shigeta et al. 2007).
further, ph-sensitive tat-modified pegylated liposomes have been utilized as tumor-specific, stimuli-sensitive drug and gene delivery systems (kale and torchilin 2007).
immunoliposomes are sophisticated gene delivery systems in which functionalized antibodies attached to the lipid bilayer are used for cell targeting (maclean et al. 1997). using immunoliposomes, tissue-specific gene delivery has been achieved in brain, embryonic, and breast cancer tissue. recently, immunoliposomes containing an antibody fragment were successfully used in targeted delivery of tumor-suppressing genes into tumors in vivo (xu et al. 2002). a plasmid encoding the chloramphenicol acetyltransferase (cat) gene was entrapped in ph-sensitive immunoliposomes consisting of h-2kk antibody-coated liposomes with dope, cholesterol, and oleic acid. approximately 20 % of the injected immunoliposomes were taken up by the target rdm-4 cells, and uptake was much lower when liposomes without antibody were used (wang and huang 1987). the same authors also reported that delivery depended on the lipid composition of the liposome: the ph-sensitive lipid composition gave eightfold higher efficiency than the corresponding ph-insensitive composition (wang and huang 1989). ligand-modified immunoliposomes have been used to efficiently deliver plasmid dna expressing ns3-ns5b (an hcv-specific antigenic sequence) to antigen-presenting cells; this proved a more efficient delivery system than direct intramuscular inoculation with naked dna (zubkova et al. 2009). overall, studies have shown that immunoliposomes can be used efficiently for targeted delivery of genetic material, especially in the treatment of genetic disorders; however, very limited work has been done on the delivery of dna vaccines.
stealth liposomes, which contain polyethylene glycol (peg)-conjugated lipids, are sterically stabilized liposomal formulations. pegylation protects the liposomal vesicles from opsonization and recognition by the reticuloendothelial system, and it has been used in conjunction with other polymeric delivery systems such as pll to achieve longer circulation half-lives (mannisto et al. 2002). peg-grafted liposomes carrying an antigenic epitope of gp41, a transmembrane protein of hiv-1, have shown a higher immune response and more prolonged persistence of antibodies than plain liposome-based antigenic formulations (singh and bisen 2006). further, grafting of peg onto cationic liposomes has been reported to enhance lymphatic drainage without improving immune responses compared with non-pegylated liposomes (carstens et al. 2011). similarly, pegylated liposomes anchored with immune cell-specific ligands have been developed to provide selective uptake by immune cells. ultrasound (us)-responsive and mannose-modified gene carriers, man-peg(2000) bubble lipoplexes, have been utilized to transfer ovalbumin (ova)-expressing plasmid dna selectively and efficiently into antigen-presenting cells (apcs). these systems demonstrated 500-800-fold higher gene expression in apcs in vivo compared with the conventional lipofection method (un et al. 2010).
virosomes are lipidic envelopes devoid of genetic information that retain the antigenic profile and fusogenic properties of their viral origin. reconstituted lipid vesicles equipped with viral glycoproteins seem to possess many ideal properties for the delivery of immunogens, such as no limitation on the size of encapsulated immunogens, high efficiency of cytosolic delivery, simplicity of handling, and brevity of incubation time (okamoto et al. 1997).
virosome-mediated delivery has low toxicity and high immunogenicity, with various prospective applications in the treatment and prevention of cancer, neurodegenerative disorders, and infectious diseases. the use of immunopotentiating reconstituted influenza virosomes (irivs) as a delivery system for dna appears to be a promising tool in vaccinology and gene therapy. irivs are spherical, unilamellar vesicles with a mean diameter of ~150 nm and short surface projections of 10-15 nm. irivs are prepared from a mixture of natural and synthetic phospholipids containing 70 % egg yolk phosphatidylcholine (for enhancement of immune responses), 20 % synthetic phosphatidylethanolamine (able to directly stimulate b cells to produce antibodies), and 10 % envelope phospholipids originating from h1n1 influenza virus. irivs were first utilized in the manufacture of hepatitis a vaccine. the adjuvant function of virosomes is based on their virus-like particle structure, which provides repetitive antigen presentation to b cells, partial protection from extracellular degradation, and a depot effect (gluck et al. 1992).
proteosome-based immunogenic delivery systems generally use a noncovalent interaction between the proteosomes and the antigen to form the appropriate complexes for delivering apolar or amphiphilic antigens. these vaccines have qualified as safe and well tolerated in various human clinical trials, most of which involved intranasal administration. a proteosome-conjugated shigella flexneri 2a lps vaccine shows an immune response similar to that observed after immunization with the live pathogen (fries et al. 2001). intranasal delivery of proteosome-based vaccines may be able to produce both systemic and mucosal immunity. another very similar category of vaccines is the conjugate vaccine. these vaccines consist of a relatively non-immunogenic (especially in infants) antigen linked to a more immunogenic carrier such as a protein or toxoid.
the conjugate vaccines for h. influenzae type b (hib) were developed by conjugating the hib polysaccharide to diphtheria toxoid (prp-d), omp of neisseria meningitidis (prp-omp), the mutant diphtheria toxoid crm197 (hboc), or tetanus toxoid (prp-t) to render the hib antigen immunogenic (heath 1998).
cochleates are phospholipid-calcium precipitates with a unique structure consisting of a large, continuous, solid lipid bilayer sheet rolled up into a "jelly roll-like" structure (papahadjopoulos et al. 1975). cochleate delivery vehicles, composed of simple, natural materials (phosphatidylserine and calcium), are unique vaccine carrier and delivery formulations (mannino and gould-fogerite 1995). they are nontoxic, noninflammatory, and biodegradable. cochleates are prepared through the calcium-induced fusion of negatively charged phospholipid liposomes, which collapse into solid sheets that roll up or stack, excluding water. because the entire cochleate structure is a series of solid layers, components in its interior remain intact and are protected from degradation when exposed to harmful environmental conditions or enzymes. this protection of encochleated materials, together with the structural stability of the cochleate, allows efficient delivery of dna by mucosal routes (oral, intragastric, intranasal, and intraocular) and parenteral routes (intramuscular, subcutaneous, intraperitoneal, and intradermal). strong, long-lasting, mucosal and circulating, antibody and cell-mediated responses are generated, and protection from challenge with live viruses following oral or intramuscular administration has been achieved (mannino et al. 1998). cochleate efficiency can be improved by attaching surface glycoproteins of enveloped viruses, which can be integrated into the lipid bilayers. dna cochleates can be formed by trapping oligonucleotides or high molecular weight plasmids within or between the lipid bilayers (papahadjopoulos et al. 1975).
virus-like particles (vlps) are small particles consisting of one or more viral coat proteins; they can act as an adjuvant by carrying peptide sequences inside the apc and feeding them into the endogenous processing pathway (schirmbeck et al. 1995). vlps are safe, highly immunogenic, well tolerated, and noninfective; they need no additional adjuvant and can easily be handled in the laboratory. they use nature's own mechanisms and structural principles to trigger protective immunity: they stimulate cellular immunity by effectively inducing cd4 proliferative responses and cytotoxic t lymphocyte (ctl) responses, and humoral immunity by efficiently cross-linking the membrane-associated immunoglobulin molecules that constitute the b-cell receptor (chackerian 2007; jennings and bachmann 2008; buonaguro et al. 2010).
the immune-stimulating complex (iscom) is a highly versatile and effective particulate antigen delivery system that has been extensively studied as an adjuvant system for a range of viral, bacterial, parasite, and other antigens. iscoms are three-dimensional "cage-like" structures that form upon detergent removal from mixtures of saponins, detergents, and cholesterol. an iscom consists of protein antigen, cholesterol, phospholipid, and the saponin adjuvant quil a. a similar vaccine delivery vehicle and adjuvant, referred to as iscomatrix®, uses the same materials minus the antigen, which can be added later during formulation of the vaccine. this material seems to work similarly to iscoms but allows more general application by removing the requirement for hydrophobic antigens (pearse and drane 2005). iscoms potentiate both humoral and cellular immune responses to incorporated antigens (cox et al. 1998).
iscoms stimulate apcs to produce il-1, il-6, and il-12 and induce t-helper cells of both th1 and th2 type; the cell-mediated immune response includes cd8+ class i-restricted cytotoxic t cells. these effects have been shown in a variety of experimental animal models, and iscoms have now progressed to phase i and ii human trials (claassen and osterhaus 1992; barr and mitchell 1996). oral administration of iscom vaccines has been shown to be effective and immune-potentiating, but this route requires high and frequent dosing. one study suggested that iscom vaccines may elicit strong mucosal immune responses when administered in the pelvic presacral space of sheep, which could be useful for immunization against viral infections of the female genital tract (thapar et al. 1991). a quil a-containing iscom with modified cholera toxin a1 (cta1-dd) has been used as a mucosal vaccine carrier system for the influenza virus pr8 antigen (helgeby et al. 2006 -ji et al. 2000). nasal vaccination with p6 dna vaccine and matrix-m (an immunostimulatory complex adjuvant) produced significantly more iga-producing cells, in addition to th1 and th2 cytokine expression. this strategy may provide a new way to induce specific immunity at mucosal sites (kodama et al. 2011).
archaeosomes are nanometric liposomes made from the polar ether lipids of archaea, which are distinct from the ester lipids found in eukaryotes and bacteria. these polar ether lipids provide excellent physicochemical stability and self-adjuvanting properties for the delivery of vaccine preparations. archaeosomes have demonstrated relatively high stability to oxidative stress, high temperature, alkaline ph, and the action of phospholipases, bile salts, and serum proteins (patel and chen 2005; benvegnu et al. 2009). archaeosomes facilitated a strong antibody (th2) response to entrapped protein antigens.
the humoral antibody response was superior to that obtained with conventional liposomes and was in some instances comparable to that obtained with the potent but toxic freund's adjuvant (patel and sprott 1999; patel and chen 2005). sprott et al. have also described the role of coenzyme q10 (coq10) in archaeosome-based antigen formulations. incorporating coq10 into archaeosomes and conventional liposomes can enhance phagocytosis of the resulting vesicles by macrophages, alter their targeting to specific tissues when they are administered to an animal by different routes, and further enhance the immune response to coadministered immunogens. recently, "cationic archaeosomes," based on mixtures of neutral/cationic bilayer-forming lipids and archaeobacterial synthetic tetraether-type bipolar lipids, have shown better transfection efficiency and can be used for dna vaccination (rethore et al. 2007).
alongside the variety of lipid delivery systems, polymeric delivery systems have emerged as a promising alternative because of their ease of preparation, purification, and chemical modification as well as their high stability. polymeric nonviral carriers (polyplexes) are an effective means of delivering a therapeutic or other biologically active substance in a controlled and sustained manner. a polymeric particulate delivery system exerts an adjuvant effect on the incorporated antigen and reduces the frequency of vaccination required to establish long-term protection. both natural and synthetic polymers have been considered for encapsulating antigenic materials for vaccination (table 3). various polymeric delivery systems have been developed from these polymers, including micellar systems, emulsions, polymersomes, nanoparticles, microspheres, nanocapsules, dendrimers, and dendrosomes (fig. 8).
however, the use of polymers as vaccine delivery systems raises several concerns, such as toxicity, irritancy, allergenicity, and biodegradability. the advantages of natural polymers include their low cost, biocompatibility, and aqueous solubility. however, their use may be limited by the presence of extraneous contaminants, lot-to-lot variability, and low hydrophobicity. in contrast, synthetic polymers are more reproducible and can be prepared with a desired degradation rate, molecular weight, and copolymer composition. nevertheless, synthetic polymers may be disadvantageous because of their limited solubility: they are often soluble only in organic solvents and consequently may not release biologically active antigen (rice-ficht et al. 2010). polymeric vaccines may offer improved stability and activity of encapsulated antigen when exposure to the organic solvents used during formulation and to the acidic ph caused by polymer degradation is avoided (duncan et al. 2005). effective application of a polymeric nanoparticulate delivery system depends greatly on the specific polymer used, as this dictates the properties of the nanoparticle in vivo (hanson et al. 2008). for example, polycationic polymers can interact with negatively charged dna, resulting in improved intracellular dna delivery, whereas noncondensing polymers are neutral or slightly negatively charged polymers that physically encapsulate materials and can be used to target apcs and m cells in the mucosa (bhavsar and amiji 2007).
a number of factors affect the physicochemical properties and performance of polymeric delivery vehicles: molecular weight, degree of branching, cationic charge density, buffer capacity, and polyplex properties, as well as experimental conditions such as the polyplex concentration, the presence or absence of serum during transfection, the incubation time, and the transfection model chosen for the gene delivery experiment. to reduce their cytotoxicity and improve transfection efficiency, polyplexes have been modified by conjugation with polyethylene glycol (peg), histidine, and targeting ligands including polysaccharides, transferrin, and galactose. various biodegradable polymers, such as the aliphatic polyesters poly(lactic acid) (pla), poly(glycolic acid) (pga), poly(ε-caprolactone) (pcl), poly(hydroxybutyrate) (phb), and their copolymers, are being evaluated as vaccine adjuvants and delivery systems (panyam and labhasetwar 2003). recently, poly(amino acid)-based copolymers, such as poly-l-glutamic acid, poly-l-aspartic acid, poly-l-lysine, poly-l-arginine, poly-l-proline, poly-l-asparagine, and poly-l-histidine, have also been employed for the delivery of proteins, vaccines, and genetic materials. poly(amino acid)s have properties that mimic proteins, making them well suited for vaccine delivery. they provide better adjuvanticity, low toxicity, biodegradability, and targeting into intracellular compartments (chiang and yeh 2003). various types of polysaccharides, such as agarose, alginate, carrageenan, hyaluronic acid, dextran, chitosan, cyclodextrins, mannan, and pullulan, have been used for the delivery of vaccines (table 3).
when maintained at specific concentrations and temperatures, amphiphilic molecules (molecules containing both hydrophobic and hydrophilic regions) naturally form association colloids known as amphiphilic micelles as a result of hydrophobic interactions.
poly(ethylene glycol) (peg) is commonly incorporated as the hydrophilic segment in amphiphilic micelles (gaucher et al. 2005).
table 3, entries as recovered (polymer; reference; description; advantages; limitations; vaccine-related notes where given):
gelatin (lou et al. 1995). description: a denatured protein obtained by acid and alkaline processing of collagen; rendered insoluble in water to prepare hydrogels through chemical cross-linking with water-soluble carbodiimides and glutaraldehyde. advantages: easy processability, good biodegradability. limitations: poor mechanical properties, brittle. notes: capable of targeting fibronectin-bearing surfaces associated with some tumors.
silk fibroin (zhang et al. 2012). description: the silkworm bombyx mori produces silk to weave its cocoon, and its major components are fibroin and sericin; the material is lightweight, extremely strong, and elastic, with mechanical properties comparable to the best synthetic fibers produced by modern technology. advantages: environmentally safe, biocompatible, excellent mechanical properties. limitations: low production, high brittleness. notes: enhances the stability of a model antigen, up to 60 °c for more than 6 months.
fibrin (khan et al. 2012). description: a protein matrix produced from fibrinogen that provides an immune-compatible carrier for the delivery of active biomolecules and antigens; fibrin naturally contains sites for cell binding and has been investigated as a substrate for cell linkage, distribution, relocation, and propagation. advantages: induces improved cellular interaction; used as a cell carrier as well as an antigen carrier. limitations: rapid degradation, instability, low mechanical stiffness.
elastin (gaudreau et al. 2007). description: synthesized by vascular smooth muscle cells and secreted as a tropoelastin monomer that is soluble, hydrophobic, and non-glycosylated; a potent regulator of vascular smooth muscle cell activity, a regulation important for preventing fibro-cellular pathology. advantages: confers elasticity, precise molecular weight, low polydispersity. limitations: becomes insoluble and aggregates at a critical temperature.
soybean (moravec et al. 2007). description: the most cultivated plant in the world, rich in proteins (40-50 %), carbohydrates (26-30 %), and lipids (20-30 %); a legume that can be processed into protein-rich products. advantages: abundant, renewable, inexpensive, environmentally friendly, biodegradable. limitations: application of soy-based polymers in this field is still very narrow. notes: iga, mucosal iga antibody response after oral administration to mice.
chitosan (verheul et al. 2011). description: the fully or partially deacetylated form of chitin; the degree of deacetylation of commercial chitosan is usually between 70 and 95 %, and the molecular weight between 10 and 1,000 kda; chitosan exhibits ph-sensitive behavior as a weak polybase because of the large number of amino groups on its chain. advantages: enhanced immune response, mucoadhesive property.
poly(ester-amide)s (li and hu 2002). description: a polymer made up of a soft peg segment connected to a hard diester-diamide segment through an ether bond; a high-performance thermoplastic elastomer used to replace common elastomers (thermoplastic polyurethanes, polyester elastomers, and silicones) because of its lower density among tpe, superior mechanical and dynamic properties (flexibility, impact resistance, energy return, fatigue resistance) that are retained at low temperature (below −40 °c), and good resistance to a wide range of chemicals. advantages: enhanced cell-mediated immunity, superior mechanical and thermal properties. limitations: sensitive to uv degradation.
poly(lactide-co-glycolide) (plga) (moore et al. 1995). description: plga, or poly(lactic-co-glycolic acid), is a copolymer synthesized by random ring-opening copolymerization of two different monomers, the cyclic dimers (1,4-dioxane-2,5-diones) of glycolic acid and lactic acid; during polymerization, successive monomeric units (of glycolic or lactic acid) are linked together by ester linkages, yielding a linear, aliphatic polyester. advantages: degradation products are naturally occurring metabolites and are readily absorbed by neighboring cells. limitations: generates an acidic environment that affects stability. notes: yersinia pestis, hiv gp140; dominant th1 response.
block copolymer micelles are colloidal particles with a size of around 5-100 nm that are currently under investigation as carriers for the delivery of biopharmaceuticals. in contrast to cationic polymeric systems, nonionic polymers enhance gene expression through mechanisms that most likely do not involve dna condensation and facilitated transport within cells. adjuvant-active nonionic block copolymers are flexible, linear structures with a core of hydrophobic polyoxypropylene (pop) flanked on both ends by hydrophilic polyoxyethylene (poe) in variable ratios (newman et al. 1998). the block copolymers are useful as general surfactants and display enhanced biological efficacy as vaccine adjuvants. osmolarity, ph, and buffer salts mainly affected the size and morphology of the particles, whereas molecular weight and formulation mainly affected the titer and isotype of antibody. jain et al. evaluated a system combining poly(lactic acid) (pla) and poly(ethylene glycol) (peg) for the delivery of recombinant hepatitis b surface antigen (hbsag). pla forms the hydrophobic core in an aqueous medium and controls the release of the antigen as it degrades into lactic acid. an outer shell formed by peg allows prolonged release and enhanced mucosal uptake (jain et al. 2009). hunter et al. (1991) showed that the adjuvant activity of block copolymers varies with the lengths of the polyoxypropylene (pop) and polyoxyethylene (poe) chains.
pluronic block copolymers have been used extensively in a variety of pharmaceutical formulations, including those of low molecular mass drugs and polypeptides. kabanov et al. (2002) described how these molecules can modify the biological response during gene therapy in skeletal muscle, enhancing transgene expression and the therapeutic effect of the transgene. block copolymers were recently used to promote delivery of a plasmid encoding a food allergen, bovine beta-lactoglobulin (blg). tetronic 304-based block copolymers decreased blg-specific ige concentrations and reduced the local inflammatory response (adel-patient et al. 2010). similarly, triblock copolymers consisting of three alternating hydrophobic and hydrophilic segments are also used to deliver genetic materials. biodegradable and nontoxic triblock copolymers of pla-peg-pla and plga-peg-plga were used as micellar carriers for delivery of the encapsulated plasmid pcdna3.1(+)-ma against hcv. this carrier system provided a better, long-term adjuvant effect with no side effects (yang et al. 2011). similarly, copolymers of a hydrophilic poly(ethylene glycol) block and a cationic poly(aminoethyl methacrylate) (paem) block have been used for dna vaccine delivery. the resulting polyplex-based carrier systems induced a modest up-regulation of surface markers of dc maturation and better uptake by dcs in the draining lymph nodes (tang et al. 2010). further, cationic block copolymers of poly(ethylene glycol) (peg) with a positively charged poly(dimethylamino)ethyl methacrylate block have been synthesized and used to deliver hiv-1 tat dna. the results indicated that these cationic block copolymers were safe, able to deliver genetic material to the cell machinery, and promising candidates for dna vaccination (caputo et al. 2002).
similar to cationic block polymers, nonionic block copolymers of poly(ethyleneoxide)-poly(propyleneoxide) (peo-ppo) have also been used for dna vaccination with a beta-galactosidase (betagal)-encoding plasmid (mcilroy et al. 2009). herpes simplex virus type-1 genes specifying glycoproteins gb and gd have also been delivered by nonionic block copolymers. plasmids formulated with these block copolymers protected mice against lethal hsv-1 challenge when immunization was performed via the i.m. route (baghian et al. 2002).
dendrimers are a unique class of polymeric nanoconstructs with a highly branched, three-dimensional, nanoscale architecture, very low polydispersity, and high functionality. these hyperbranched molecules were first described in the early 1980s by donald tomalia and coworkers, who named them dendrimers. dendrimers are synthetic, spherical macromolecules with layered architectures that can be considered analogous to a globular protein. they have small diameters (1.5-14.5 nm) and the potential for high loading capacities through mechanisms such as complexation or the formation of chemical bonds at terminal branch points or other active sites (wiwattanapatapee et al. 2000). in addition, the low polydispersity of dendrimers should provide reproducible pharmacokinetic behavior, in contrast to some polymers that contain fractions of vastly different molecular weight within a given sample (parekh 2007). several dendrimer-based products have been approved by the fda and successfully commercialized for the treatment and diagnosis of diseases, including vivagel™ (starpharma), designed as a topical microbicide; superfect® (qiagen pvt ltd.), used for gene transfection; and alert ticket™ (us army research lab), for anthrax detection (merdan et al. 2002).
in the past decade, research on polyamidoamine, polyethylenimine, polylysine, polypropyleneimine, polyaryl ether, polyester, polyglycerol, and their derivatives for the design and synthesis of biocompatible dendrimers has increased. dendrimers form complexes by electrostatic interaction with all forms of nucleic acids, such as dna, rna, and antisense oligonucleotides. the nature of the dendrimer-nucleic acid complexes ("dendriplexes") depends on the stoichiometry and concentration of the dna phosphates and dendrimer amines, on bulk solvent properties (e.g., ph, salt concentration, buffer strength), and even on the dynamics of mixing. high ionic strength interferes with the binding process and affects the nature of the complexes formed by the different generations; for example, higher-generation ppi dendrimers at higher concentrations form water-soluble dendriplexes, whereas g1 and g2 ppi dendrimers form electroneutral complexes (tang and szoka 1997). a dendrimer-dna complex is formed by simply mixing the components in an aqueous solution. transfection can be improved by using an excess of cationic dendrimer: the negatively charged phosphate groups on the dna neutralize positively charged amine groups on the dendrimer through electrostatic interaction, and an overall positive charge on the complex is important for cell uptake (bielinska et al. 1999). the immunogenicity and efficacy of dna vaccines can be improved by physical conjugation of the pamam dendrimer to an mhc class ii-targeting peptide. dendrimers can therefore be explored further for dna-based vaccine development against the malaria parasite (pietersz et al. 2006). in a recent study, dendriplexes (complexes of dendrons and condensed plasmids containing the gene for the protective antigen (pa) of bacillus anthracis) were encapsulated in polylactide-co-glycolide (plg) particles using the double emulsion method.
studies indicated that animals immunized with the plg-dendriplex particles produced higher levels of anti-pa igg antibodies than animals immunized with the plg particles (ribeiro et al. 2007). fifth-generation polyamidoamine (g5-pamam) dendrimers, which provide a dna-loading surface, have been conjugated with mhc class ii-targeting peptides that can selectively deliver these dendrimers to apcs under conditions that enhance their immune stimulatory potency. dna conjugated with this platform efficiently transfected murine and human apcs in vitro. subcutaneous administration of dna-peptide-dendrimer complexes in vivo preferentially transfected dendritic cells (dcs) in the draining lymph nodes, promoted the generation of high-affinity t cells, and elicited rejection of established tumors. taken together, these findings show how pamam-dendrimer complexes can be used for high transfection efficiency and effective targeting of apcs in vivo, conferring properties essential for effective dna vaccines.
the multiple antigenic peptide (map) dendrimer system is used for vaccine and immunization purposes. map-based delivery systems can be prepared by the addition of multiple immune-functional components, such as b/t-cell epitopes, cell-penetrating peptides, and lipophilic moieties, or by the controlled synthesis of nanomaterials such as micelles, dendrimers, and nanoparticles (fujita and taguchi 2011). a tetravalent map dendrimer with four identical branches of a c-terminal peptide sequence of the rat gh-bp (gh-bp263-279) was synthesized and used as an immunogen in rabbits. the tetravalent rat gh-bp263-279 map dendrimer served as an effective immunogenic antigen, eliciting specific antibodies (aguilar et al. 2009). similar to map dendrimers, glycopeptide dendrimers containing both carbohydrates and peptides can also be used for the delivery of vaccine components (niederhafner et al. 2008; sebestik et al. 2011).
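the charge-balance argument made above for dendriplexes (an excess of cationic dendrimer amines over dna phosphates gives a net positively charged complex) is commonly expressed as an amine-to-phosphate (n/p) ratio. the following is a minimal sketch of that arithmetic, not a method taken from the cited studies. all constants are assumptions introduced for illustration: an average mass of about 330 g/mol per nucleotide (one phosphate each), and nominal textbook values for a g5 pamam dendrimer (128 surface primary amines, about 28,826 g/mol). only surface primary amines are counted, which is a simplification; the function names are hypothetical.

```python
# Sketch: N/P (dendrimer amine to DNA phosphate) ratio for a dendriplex.
# Every constant below is an illustrative assumption, not a value from the text.

AVG_NUCLEOTIDE_MASS = 330.0  # g/mol per nucleotide, one phosphate each (approximation)
G5_PAMAM_MW = 28826.0        # g/mol, nominal mass of a G5 PAMAM dendrimer (assumed)
G5_PAMAM_AMINES = 128        # surface primary amines per G5 PAMAM molecule (assumed)


def phosphate_nmol(dna_ug: float) -> float:
    """Nanomoles of phosphate groups in dna_ug micrograms of DNA."""
    # ug / (g/mol) gives umol; multiply by 1000 for nmol.
    return dna_ug * 1000.0 / AVG_NUCLEOTIDE_MASS


def amine_nmol(dendrimer_ug: float,
               mw: float = G5_PAMAM_MW,
               amines: int = G5_PAMAM_AMINES) -> float:
    """Nanomoles of surface primary amines in dendrimer_ug micrograms of dendrimer."""
    return dendrimer_ug * 1000.0 / mw * amines


def np_ratio(dendrimer_ug: float, dna_ug: float,
             mw: float = G5_PAMAM_MW,
             amines: int = G5_PAMAM_AMINES) -> float:
    """N/P ratio; values above 1 mean amines outnumber phosphates."""
    return amine_nmol(dendrimer_ug, mw, amines) / phosphate_nmol(dna_ug)


def dendrimer_ug_for_ratio(target_np: float, dna_ug: float,
                           mw: float = G5_PAMAM_MW,
                           amines: int = G5_PAMAM_AMINES) -> float:
    """Dendrimer mass (ug) needed to reach target_np for dna_ug of DNA."""
    needed_amine_nmol = target_np * phosphate_nmol(dna_ug)
    # nmol of dendrimer times g/mol gives ng; divide by 1000 for ug.
    return needed_amine_nmol / amines * mw / 1000.0


if __name__ == "__main__":
    mass = dendrimer_ug_for_ratio(target_np=5.0, dna_ug=1.0)
    print(f"dendrimer needed for N/P = 5 with 1 ug DNA: {mass:.2f} ug")
    print(f"check, N/P = {np_ratio(mass, 1.0):.2f}")
```

under these assumptions, an n/p ratio of 1 corresponds to nominal electroneutrality, and ratios above 1 correspond to the excess-dendrimer regime that the text associates with improved cell uptake. the actual charge state also depends on ph, ionic strength, and dendrimer generation, as noted above, so the ratio is a formulation guide rather than a prediction.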
a dendrimer-nucleic acid complex encapsulated within a lipophilic shell is known as a dendrosome. dendrosomes are novel vesicular, spherical, supramolecular entities that possess negligible hemolytic toxicity and higher transfection efficiency. they are reported to be completely nontoxic both in vitro and in vivo. poly(propyleneimine) dendrosome-based genetic immunization was found to be highly effective against hepatitis b compared with the dendrimer-plasmid dna complex, indicating that dendrosomes hold great potential in dna vaccination. in dendrosomes, the poly(propyleneimine) dendrimer-dna complex is largely protected by the multilamellarity of the vesicles. polyamidoamine dendrimer-based dendrosomes have been reported to be efficient systems for the delivery of s10sirna targeting e6/e7 oncogenes in cervical cancer (pourasgari et al. 2009). in vitro, pamam dendrosomes displayed superior transfection efficiency compared with other nonviral gene delivery vectors. nontoxic self-assembled dendritic spheroidal nanoparticles (den123) have been used for the delivery of pcmv-betv1; the loaded nanoparticles showed low toxicity, enhanced transfection efficiency, and an improved immune response against birch pollen allergy (balenga et al. 2006). similarly, the efficiency of dendrosome (a gene porter) has been assessed in transferring recombinant human rotavirus vp2 cdna. studies revealed that dendrosome has lower cytotoxicity and better transfectivity in a549, a human lung cell line (pourasgari et al. 2009). dendrosome has also been used to deliver dna vaccines encoding the hiv-1 p24-gp41 gene, and studies have demonstrated the efficacy of this carrier for the delivery of recombinant plasmid constructs (roodbari et al. 2012).
polymersomes are self-assembled polymeric colloidal vesicular systems containing an aqueous inner core.
polymersomes are made from amphiphilic block copolymers, which allow them to stably encapsulate or integrate a broad range of active molecules. the aqueous core can be used to encapsulate therapeutic hydrophilic molecules, and the membrane can integrate hydrophobic drugs within its hydrophobic part. further, the brush-like surface of the polymersome can provide better biocompatibility and longer blood circulation times. these systems have better loading efficiency and stability and provide sustained, controlled release of encapsulated therapeutics. they have also been used to deliver biotherapeutics, especially peptides, proteins, and nucleic acids, to site-specific cellular environments owing to their escape from endolysosomes (levine et al. 2008; christian et al. 2009). an amphiphilic diblock copolymer of poly(oligoethylene glycol methacrylate)-block-poly(2-(diisopropylamino)ethyl methacrylate), in association with tannic acid, forms dna-loaded polymersomes. these systems demonstrated better cytosolic release of encapsulated nucleic acid materials (lomas et al. 2011). calcein-loaded polymersomes have also been observed to achieve cytosolic delivery within dendritic cells (scott et al. 2012). similarly, poly(γ-benzyl-l-glutamate)-k (pblg50-k) polymersomes have been used for the delivery of influenza hemagglutinin antigen and showed better immunogenicity and adjuvanticity for the administered influenza antigen. in future, this nanostructured polymeric vesicular system may have great potential for the delivery of protein and dna vaccines.
emulsions can be manufactured as water-in-oil (w/o) or oil-in-water (o/w) particulate carrier systems. emulsion carrier systems are similar in size to pathogens and are taken up by epithelial or m cells at mucosal surfaces for subsequent delivery of the vaccine component to apcs and lymphoid tissue.
a nanoemulsion formulation of intranasal hepatitis b vaccine showed improved vaccine efficacy, stability, and ease of distribution (makidon et al. 2008). multiple emulsion formulations can also be used as vaccine carrier systems because of their longer stability and high entrapment efficiency for protein antigens, which are not damaged during emulsification. appropriate types of surfactants, processing methods, and stabilizers are required to make stable multiple emulsions (hanson et al. 2008). the emulsion adjuvant mf59 has been shown to recruit antigen-presenting cells (apcs) to the site of injection and to increase uptake of soluble antigen by the apcs. it is formulated by simply mixing the antigen with the adjuvant and has shown excellent compatibility with a variety of subunit antigens. the stronger immunogenicity of mf59 compared with other adjuvants is clearly seen in preclinical data published by ott et al., who reported that guinea pigs and goats immunized with glycoprotein d of herpes simplex virus (hsv) type 2 in the presence of mf59 showed 34-fold and ninefold increases in antibody titers, respectively, compared with aluminum hydroxide (ott et al. 1995). syntex adjuvant formulation (saf) is an effective oil-in-water (o/w) emulsion adjuvant that contains a muramyl dipeptide derivative (threonyl-mdp). threonyl-mdp showed a lack of side effects (pyrogenicity, uveitis, adjuvant-induced arthritis) and increased adjuvant activity. saf has been used with a variety of antigens, such as influenza and malaria, and induced both cell-mediated and humoral immune responses. saf, or a suitable equivalent, provides an excellent tool for vaccine research (lidgate et al. 1989; lidgate et al. 1992). there are several different types of montanide™: isa 51 and 720 have been used in human vaccine formulations, while isa 206 and 50 v have been used only in veterinary vaccine formulations.
they are composed of metabolizable squalene-based oil with mannide monooleate emulsifier and permit antigens to be released more rapidly. the montanide emulsions induce high antibody titers and ctl responses owing to the formation of a depot at the site of injection. these emulsions have been used in vaccines against malaria, hiv, and various cancers and were found to be safe and fairly well tolerated (lawrence et al. 2000; toledo et al. 2001).
various physical delivery methods are being heavily investigated because they transfect apcs directly with the dna vaccine (porgador et al. 1998). the transcutaneous microneedle can bypass the stratum corneum of the skin and thus reach langerhans cells, the apcs of the skin. jet-injection mechanical devices deliver dna vaccines into the viable epidermis and have increased efficacy in the prevention and/or therapy of infectious diseases, allergic disorders, and cancer (chen et al. 2002; imoto and konishi 2005; roberts et al. 2005). the tattoo perforating-needle device has been used to puncture the skin and transfer dna into skin-associated cells; its bundles of fine metal needles, which oscillate at a constant high frequency, have given better expression of reporter genes in mice and induction of immune responses. electroporation has been extensively studied for the delivery of therapeutic genes that encode a variety of hormones, cytokines, enzymes, or antigens in large animal species such as dogs, pigs, cattle, and nonhuman primates. several different strategies based on this technology are being pursued. however, too little is currently known about several of these devices, and much additional research in this area is warranted (van drunen littel-van den hurk et al. 2004; roos et al. 2006; hirao et al. 2008).
nanotechnology is the development of engineered devices in the nanometer range; their small size and large surface area enhance their action, for example in the early diagnosis of cancer and infectious diseases. advances in nanotechnology have also proved beneficial in therapeutic fields such as drug discovery, drug delivery, and gene/protein delivery. this concept has been found useful in developing nanovaccines for different routes of administration, such as oral, nasal, and parenteral.
the oral route is the most popular and convenient route of administration. oral delivery refers to absorption from the buccal through the rectal mucosa. the barriers to genetic vaccination by the oral route are generally attributed to (a) low permeability across biological membranes, (b) the harsh gastric environment, (c) hepatic first-pass metabolism, and (d) chemical instability. the major drawback of the oral route is that a higher vaccine concentration is required for efficacy, because the vaccine is diluted during transport through the gastrointestinal tract. to date, most gene delivery strategies have concentrated on the parenteral route, and oral administration has been largely ignored. different nano- and microparticulate delivery systems based on natural and synthetic lipids and polymers have been used to improve the stability and immunogenicity of oral dna vaccines (bhavsar and amiji 2007). oral vaccination with dna-chitosan nanoparticles is of interest because of their great stability and ease of target accessibility, in addition to the immunostimulatory properties of chitosan. one study demonstrated 47 % protection against parasite infection after delivery of chitosan nanoparticles loaded with dna encoding the rho1-gtpase protein of schistosoma mansoni (oliveira et al. 2012). similarly, chitosan nanoparticles have been used for an oral dna vaccine against vibrio anguillarum.
studies revealed that a chitosan-dna (pvaomp38) complex showed moderate protection against experimental v. anguillarum infection after oral vaccination in asian sea bass (rajesh kumar et al. 2008). the orally administered tresylmonomethoxypolyethylene glycol (tmpeg)-grafted liposome complex with modified vaccinia virus ankara (mva(iiib/beta-gal)) is also capable of delivering transgenes to mucosal tissues and enhanced the env-specific cellular and humoral immune responses after repeated oral immunization of balb/c mice (naito et al. 2007). mannosylated niosomes loaded with hepatitis dna have shown humoral (both systemic and mucosal) and cellular immune responses upon oral administration (jain et al. 2005). chitosan-coated and polyplex-loaded liposomes (plls) containing plasmid prc/cmv-hbs have been developed for oral delivery of vaccines, specifically for targeting peyer's patches. chitosan-coated plls demonstrated better uptake of encapsulated dna in the distal intestine and provided better stability against enzymatic degradation (channarong et al. 2011). the nasal route has been chiefly employed for producing local action on the mucosa. this route has a number of advantages, such as the high permeability of the nasal epithelium, which allows a higher molecular mass cut-off for permeation of approximately 1,000 da, as well as the rapid drug absorption rate. major challenges associated with intranasal delivery of vaccines include accurate and repeated dispensing of vaccine; mucociliary clearance; the presence of peptidases, proteases and nuclease enzymes in the mucus or associated with the nasal membrane; variation in the extent of absorption with mucus secretion and mucus turnover; deposition of the formulated vaccine to all areas of the nasal mucosa (especially lymphoid tissues); potential uptake of vaccine formulations by the primary olfactory nerves in the nasal cavity; local irritation; and unpleasant taste from concentrated drug reaching the mouth (oliveira et al.
2007; sharma et al. 2009). these problems can be overcome by the design of appropriate antigen carriers. nanocarriers for nasal vaccines are able to facilitate the transport of the associated antigen across the nasal epithelium, thus leading to efficient antigen presentation to the immune system, and to provide protection and stability for the encapsulated genetic materials (koping-hoggard et al. 2005). further, the use of mucoadhesive agents offers a strategy for increasing residence time and vaccine efficacy (alpar et al. 2005). polycarbophil (pc)- or polyethylene oxide (peo)-based in-situ mucoadhesive polymers have demonstrated better nasal absorption of plasmid dna (park et al. 2002). several studies have demonstrated the wide applicability of chitosan nanoparticles for the nasal delivery of dna vaccines, including those for severe acute respiratory syndrome coronavirus (sars-cov) (raghuwanshi et al. 2012), pneumococcal surface antigen a (psaa) (xu et al. 2011), hepatitis b antigen-encoding plasmid (khatri et al. 2008), and dna plasmid-expressing epitopes of respiratory syncytial virus (iqbal et al. 2003). further, several modifications of the chitosan polymers have been made to improve the potential of chitosan nanoparticles for nasal administration of dna vaccines, such as the preparation of low molecular weight chitosan and the development of water-soluble chitosan (n-trimethyl chitosan). blends of poly(lactic-co-glycolic acid) (plga) and polyethylene oxide (peo) have exhibited the capacity to associate and release plasmid dna in a controlled manner. results showed that dna-loaded nanoparticles elicit a significantly more pronounced immune response than naked plasmid dna for up to 6 weeks (csaba et al. 2006). dry-powder influenza virosome-based vaccines have also been advantageous for mucosal immunization (de jonge et al. 2007).
needle-free nasal immunization using a nanoemulsion made of soya bean oil, alcohol, water and detergents emulsified into droplets of 40 nm has been reported to be a safe and effective hepatitis b vaccine (makidon et al. 2008). when liquid or particles are released into an airflow that enters one nostril via a sealing nozzle and exits through the other nostril, the risk and problems related to deposition of particles in the lung, which occurs during conventional inhalation from a nebulizer, are minimized and the delivery of particles to the posterior part of the nasal mucosa is increased. encapsulation of the antigen into bioactive nanoparticles is a promising approach to nasal vaccine delivery (slutter et al. 2008). the ocular route holds immense potential for peptides/proteins intended for pathological ophthalmologic conditions. the eye mucosa, which occupies most of the external ocular surface, is a possible route for mucosal vaccines because it is an important entry point for environmental antigens and infectious materials (streilein et al. 1997). lymphoid follicles are found in close association with the epithelium of the conjunctival mucosa in humans, rabbits, guinea pigs, dogs, pigs, and many other mammals (chodosh et al. 1998 (seo et al. 2010). peptide epitopes of herpes simplex virus (hsv-1) glycoprotein d (gd) have been mixed with oligodeoxynucleotides containing unmethylated cpg motifs (cpg2007) for ocular mucosal delivery. results suggested an enhanced local and systemic immune response after multi-instillation of gd peptide epitopes with the cpg2007 adjuvant (nesburn et al. 2005). ocular mucosal administration of iron nanoparticles with glutamic acid containing the herpes stromal keratitis dna vaccine (prsc-gd-il-21) conferred protection against mucosal challenge with herpes simplex virus type 1 in mice (hu et al. 2011). vaginal mucosa is a portal of entry to many viral and bacterial pathogens.
the vaginal route serves as a potential site of drug administration for local and systemic absorption of therapeutically important molecules, proteins, peptides, small interfering rnas, oligonucleotides, antigens, vaccines and hormones (hussain and ahsan 2005). it is an alternative site for the systemic delivery of protein drugs because of the relatively high permeability of the vaginal epithelium, bypass of hepatic first-pass metabolism, large surface area and rich blood supply (gupta et al. 2011 (gordon et al. 2012). thermo-sensitive mucoadhesive vaginal vaccine delivery systems have also been tested for local and systemic antibody responses to hpv 16 l1 virus-like particles (park et al. 2003). vaginal delivery of vaccines against pathogens associated with vaginal infection could be a better alternative for inducing an immune response in the genital mucosa capable of controlling the entry of the pathogen. noninvasive gene delivery approaches can deliver and express naked plasmid dna with tissue-specific, localized delivery to the skin. there are several advantages of needle-free noninvasive gene administration, such as limited toxicity, potential cell receptor-independent uptake, minimal dna size restrictions, and the potential for multiple treatments via a relatively uncomplicated administration modality, thus improving patient compliance. topically applied formulations, especially nanosystems, have been shown to enter skin, accumulate in hair follicles, diffuse via dendritic cells to draining lymph nodes, and elicit antigen-specific humoral and cell-mediated immunity (nasir 2009). a number of methods have been developed to perform noninvasive topical gene delivery, including passive diffusion of genetic materials between a skin patch and skin, as well as active processes such as iontophoresis, sonophoresis, electroporation, and chemically enhanced diffusion (mehier-humbert and guy 2005).
topical vaccination has been achieved using topical application of naked dna, with or without tape stripping, and of dna/lipid-based complexes such as liposomes, niosomes, transfersomes, or microemulsions (cui and sloat 2006). an ethanol-in-fluorocarbon-based microemulsion has been used for topical delivery of an anthrax protective antigen (pa) protein-encoding dna vaccine (pgpa). the pgpa-loaded microemulsion significantly enhanced the anti-pa antibody responses (cui and sloat 2006). similarly, a novel lipid-based biphasic delivery system has achieved significant delivery of plasmid dna into the "viable" layers of skin (foldvari et al. 2006). cationic transfersomes loaded with plasmid dna encoding hepatitis b surface antigen (hbsag) have also been utilized for topical immunization. results revealed that dna-loaded cationic transfersomes elicited significantly higher anti-hbsag antibody titers and cytokine levels than naked dna. it was also observed that topical application of dna-loaded cationic transfersomes elicited serum antibody titers and endogenous cytokine levels comparable to those produced after intramuscular administration of recombinant hbsag (mahor et al. 2007). polystyrene nanoparticles of 40 or 200 nm have been studied for targeting active compounds to the hair follicle, which may result in better penetration and higher efficiency of compound uptake by skin-resident cells. studies demonstrated that 40 and 200 nm nps and modified vaccinia ankara (mva) expressing the green-fluorescent protein penetrated deeply into hair follicles, were taken up by apcs and were transported to the draining lymph nodes (mahe et al. 2009). a nanoengineered genetic vaccine formulation has been developed for topical immunization, comprising emulsifying wax (oil phase), ctab (cationic surfactant), mannan (dc ligand), dioleoylphosphatidylethanolamine (endosomolytic agent), and cholesterol.
all pdna-coated nanoparticles, especially the mannan-coated pdna-nanoparticles with dope, have shown a significant immune response (igg titers 16-fold over "naked" pdna alone) (cui and mumper 2002). diffusion patches and tape stripping techniques are used for delivery of small (<500 da) and large molecules, respectively. the liquid jet injector is an approach in which the dna vaccine is delivered around the langerhans cells by a high-speed injector. chen et al. (2002) reported that particle-mediated gene-gun dna immunization uses similar mechanical devices to deliver dna vaccines into the viable epidermis. a microneedle array is a set of needles of microscale length, with nanoscale tips coated with dna, that can accurately, efficiently and safely deliver biomolecules to the viable cells of the epidermis. recently, tran et al. (2008) developed a unique nanoliposomal ultrasound-mediated device for delivering small interfering rna (sirna) specifically targeting melanocytic tumors present in the skin, and they observed that it decreased early melanocytic lesion development in the skin and prevented the spread of cutaneous metastases of melanoma. these results suggest that skin may provide an appealing, noninvasive route of delivery for dna vaccines and other therapeutic genes. table 4 presents the positive and negative aspects of the various routes of administration, which is helpful for selecting a particular route. novel vaccine carriers, adjuvants, vehicles, and particle-based delivery strategies are being evaluated in a variety of vaccines, including those against diseases such as cancer, malaria, aids and hepatitis, in which a cellular and/or mucosal immune response is desired. different adjuvants generate different immune responses: mf59 and mpl® generated th1 responses, whereas vlps, virosomes, nondegradable nanoparticles, and liposomes generated cellular immune responses in humans.
viral vectors, iscoms, montanide™ isa 51 and 720, and various nanoparticulate immunopotentiators and antigen delivery vehicles have shown ctl responses. the desired responses can be achieved by using combinations of various adjuvants. systemic antibodies were produced in humans when viral-vectored vaccines as well as proteosomes were given in. the clinical trials required for vaccine approval are often very long and difficult. furthermore, since many vaccines are administered to healthy individuals, and frequently to infants, it is critical that they are proven safe and well tolerated in nonhuman primates before entering human trials. while the development of novel vaccine delivery systems and adjuvants has been aided by nanotechnology, potential problems must be considered before their approval, such as their high surface area and reactivity, their ability to cross biological membranes, the slow biodegradability of some materials, and their safety and tolerability. many challenges must be met before new classes of vaccines become available, such as the ability to stimulate humoral, cellular and mucosal immune responses, a longer duration of response, easy metabolism of vaccine components in the body, cost-effective production, and lower-risk, less invasive approaches for the administration of vaccinations. as these challenges are met, the prevention and therapy of many previously untreatable diseases should become increasingly possible.
block copolymers have differing adjuvant effects on the primary immune response elicited by genetic immunization and on further induced allergy map dendrimer elicits antibodies for detecting rat and mouse gh-binding proteins comparison of immune response generated against japanese encephalitis virus envelope protein expressed by dna vaccines under macrophage associated versus ubiquitous expression promoters dna vaccines: technology and application as antiparasite and anti-microbial agents liposomes as immunological adjuvants yersinia enterocolitica as a vehicle for a naked dna vaccine encoding brucella abortus bacterioferritin or p39 antigen biodegradable mucoadhesive particulates for nasal and pulmonary antigen and dna delivery aluminium compounds as adjuvants for vaccines and toxoids in man: a review vaccine delivery: lipid-based delivery systems carbohydrate biopolymers enhance antibody responses to mucosally delivered vaccine antigens protective immunity against lethal hsv-1 challenge in mice by nucleic acid-based immunisation with herpes simplex virus type-1 genes specifying glycoproteins gb and gd protective efficiency of dendrosomes as novel nano-sized adjuvants for dna vaccination against birch pollen allergy iscoms (immunostimulating complexes): the first decade transcriptional control of the rna-dependent rna polymerase of vesicular stomatitis virus preparation, characterization, cytotoxicity and transfection efficiency of poly(dl-lactide-co-glycolide) and poly(dl-lactic acid) cationic nanoparticles for controlled delivery of plasmid dna new generation of liposomes called archaeosomes based on natural or synthetic archaeal lipids as innovative formulations for drug delivery polymeric nano-and microparticle technologies for oral gene delivery dna complexing with polyamidoamine dendrimers: implications for transfection alginate coated chitosan nanoparticles are an effective subcutaneous adjuvant for hepatitis b surface antigen a very strong enhancer is 
located upstream of an immediate early gene of human cytomegalovirus liposome/dna complexes coated with biodegradable pla improve immune responses to plasmid encoding hepatitis b surface antigen in vivo transfection of murine lungs with a functioning prokaryotic gene using a liposome vehicle virus-like particles as particulate vaccines preclinical safety assessment considerations in vaccine development micellar-type complexes of tailor-made synthetic block copolymers containing the hiv-1 tat dna for vaccine application effect of vesicle size on tissue localization and immunogenicity of liposomal dna vaccines hot spots of retroviral integration in human cd34+ hematopoietic cells virus-like particles: flexible platforms for vaccine development gene therapy of muscular dystrophy development and evaluation of chitosan-coated liposomes for oral dna vaccine: the improvement of peyer's patch targeting using a polyplex-loaded liposomes needle-free epidermal powder immunization contribution of poly(amino acids) to advances in pharmaceutical biotechnology comparative anatomy of mammalian conjunctival lymphoid tissue: a putative mucosal immune site enhancement of t helper type 1 immune responses against hepatitis b virus core antigen by plga nanoparticle vaccine delivery polymersome carriers: from self-assembly to sirna and protein therapeutics the iscom structure as an immune-enhancing moiety: experience with viral systems apoptosis in hiv infection: protective role of il-2 iscoms and other saponin based adjuvants plga:poloxamer and plga:poloxamine blend nanostructures as carriers for nasal gene delivery topical immunization using nanoengineered genetic vaccines the effect of co-administration of adjuvants with a nanoparticle-based genetic vaccine delivery system on the resulting immune responses microparticles and nanoparticles as delivery systems for dna vaccines topical immunization onto mouse skin using a microemulsion incorporated with an anthrax protective antigen 
protein-encoding plasmid inulin sugar glasses preserve the structural integrity and biological activity of influenza virosomes during freeze-drying and storage delivery of messenger rna using poly(ethylene imine)-poly(ethylene glycol)-copolymer blends for polyplex formation: biophysical characterization and in vitro transfection properties polyethyleneimine-based gene therapy by inhalation priming with chlamydia trachomatis major outer membrane protein (momp) dna followed by momp iscom boosting enhances protection and is associated with increased immunoglobulin a and th1 cellular immune responses viruses as vaccine vectors for infectious diseases and cancer dendrimer biocompatibility and toxicity cationic liposomes for gene delivery: novel cationic lipids and enhancement by proteins and peptides encapsulation of antigenic extracts of salmonella enterica serovar. abortusovis into polymeric systems and efficacy as vaccines in mice an adjuvant vaccine against infectious canine hepatitis liposomal nanomedicines gene delivery into human skin in vitro using biphasic lipid vesicles immune response to rabies vaccine in water-in-oil emulsion safety and immunogenicity of a proteosome-shigella flexneri 2a lipopolysaccharide vaccine administered intranasally to healthy adults current status of multiple antigen-presenting peptide vaccine systems: application of organic and inorganic nanoparticles cellulose acetate butyrate-ph/ thermosensitive polymer microcapsules containing aminated poly(vinyl alcohol) microspheres for oral administration of dna unique features of a ph-sensitive fusogenic peptide that improves the transfection efficiency of cationic liposomes dna vaccines: protective immunizations by parenteral, mucosal, and gene-gun inoculations opportunities for the use of lentiviral vectors in human gene therapy cationic liposome-mediated gene transfer scaffold: a novel carrier for cell and drug delivery block copolymer micelles: preparation, characterization and 
application in drug delivery protective immune responses to a multi-gene dna vaccine against staphylococcus aureus in vivo induction of cd4+ t cell responses by antigens covalently linked to synthetic microspheres does not require adjuvant long-term safety analysis of preventive hiv-1 vaccines evaluated in aids vaccine evaluation group niaid-sponsored phase i and ii clinical trials a sequential study of incomplete freund's adjuvantinduced peritonitis in atlantic cod immunopotentiating reconstituted influenza virus virosome vaccine delivery system for immunization against hepatitis a construction of hybrid viruses containing sv40 and lambda phage dna segments and their propagation in cultured monkey cells preventive hiv type 1 vaccine clinical trials: a regulatory perspective targeting the vaginal mucosa with human papillomavirus pseudovirion vaccines delivering simian immunodeficiency virus dna liposome-mediated dna vaccination aluminum compounds as vaccine adjuvants adjuvants for human vaccines -current status, problems and future prospects adjuvants -a balance between toxicity and adjuvanticity adjuvant properties of aluminum and calcium compounds adjuvant properties of non-phospholipid liposomes (novasomes) in experimental animals for human vaccine antigens exploring novel approaches to vaginal drug delivery codon bias and heterologous protein expression nanoscale double emulsions stabilized by singlecomponent block copolypeptides wssv ie1 promoter is more efficient than cmv promoter to express h5 hemagglutinin from influenza virus in baculovirus as a chicken vaccine haemophilus influenzae type b conjugate vaccines: a review of efficacy data the combined cta1-dd/iscom adjuvant vector promotes priming of mucosal and systemic immunity to incorporated antigens by specific targeting of b cells structure and properties of aluminum-containing adjuvants immune responses and protection obtained by oral immunization with rotavirus vp4 and vp7 dna vaccines encapsulated in 
microparticles immunostimulatory dna as a vaccine adjuvant intradermal/subcutaneous immunization by electroporation improves plasmid vaccine delivery and potency in pigs and rhesus macaques origin and steady-state turnover of class ii mhcbearing dendritic cells in the epithelium of the conducting airways an ocular mucosal administration of nanoparticles containing dna vaccine prsc-gd-il-21 confers protection against mucosal challenge with herpes simplex virus type 1 in mice adjuvant activity of non-ionic block copolymers. iv. effect of molecular weight and formulation on titre and isotype of antibody the vagina as a route for systemic drug delivery needle-free jet injection of a mixture of japanese encephalitis dna and protein vaccines: a strategy to effectively enhance immunogenicity of the dna vaccine in a murine model nasal delivery of chitosan-dna plasmid expressing epitopes of respiratory syncytial virus (rsv) induces protective ctl responses in balb/c mice release characteristics of a model plasmid dna encapsulated in biodegradable poly(ethylene glycol fumarate)/acrylamide hydrogel microspheres mannosylated niosomes as adjuvant-carrier system for oral genetic immunization against hepatitis b synthesis, characterization and evaluation of novel triblock copolymer based nanoparticles for vaccine delivery against hepatitis b the coming of age of virus-like particle vaccines pluronic block copolymers as novel polymer therapeutics for drug and gene delivery enhanced transfection of tumor cells in vivo using "smart" phsensitive tat-modified pegylated liposomes pluronic f127 enhances the effect as an adjuvant of chitosan microspheres in the intranasal delivery of bordetella bronchiseptica antigens containing dermonecrotoxin multivesicular liposome (depofoam) technology for the sustained delivery of insulin-like growth factor-i (igf-i) immunogenicity in mice of a cationic microparticle-adsorbed plasmid dna encoding japanese encephalitis virus envelope protein vaccine 
potential of cytosolic proteins loaded fibrin microspheres of cryptococcus neoformans in balb/c mice plasmid dna loaded chitosan nanoparticles for nasal mucosal immunization against hepatitis b comparing the effect of il-12 genetic adjuvant and alum non-genetic adjuvant on the efficiency of the cocktail dna vaccine containing plasmids encoding sag-1 and rop-2 of toxoplasma gondii nasal immunization with plasmid dna encoding p6 protein and immunostimulatory complexes elicits nontypeable haemophilus influenzaespecific long-term mucosal immune responses in the nasopharynx nanoparticles as carriers for nasal vaccine delivery dna vaccines: ready for prime time? an hdac inhibitor enhances the antitumor activity of a cmv promoter-driven dna vaccine parenteral nutrition in the malnourished: dialysis, cancer, obese, and hyperemesis gravidarum patients effect of vaccination with 3 recombinant asexualstage malaria antigens on initial growth rates of plasmodium falciparum in non-immune volunteers glycosaminoglycan-resistant and ph-sensitive lipid-coated dna complexes produced by detergent removal method polymersomes: a new multi-functional tool for cancer diagnosis and therapy assembly of electroactive layer-by-layer films of myoglobin and ionomer poly (ester sulfonic acid) enhancement of a hepatitis b dna vaccine potency using aluminum phosphate in mice formulation of vaccine adjuvant muramyldipeptides. 3. processing optimization, characterization, and bioactivity of an emulsion vehicle sterile filtration of a parenteral emulsion gene therapy for renal disorders: what are the benefits for the elderly? 
poly(cationic lipid)-mediated in vivo gene delivery to mouse liver dna immunization in combination with the immunostimulant monophosphoryl lipid a polymersome-loaded capsules for controlled release of dna enhanced plasmid dna delivery using anionic lpdii by listeriolysin o incorporation interaction between fibronectin-bearing surfaces and bacillus calmette-guerin (bcg) or gelatin microparticles pmma particle-mediated dna vaccine for cervical cancer development of an antigen-presenting cell-targeted dna vaccine against melanoma by mannosylated liposomes plasmid dna containing multiple cpg motifs triggers a strong immune response to hepatitis b surface antigen when combined with incomplete freund's adjuvant but not aluminum hydroxide immunoliposomes as targeted delivery vehicles for cancer therapeutics (review) nanoparticle-based targeting of vaccine compounds to skin antigen-presenting cells by hair follicles and their transport in mice cationic transfersomes based topical genetic vaccine against hepatitis b pre-clinical evaluation of a novel nanoemulsion-based hepatitis b mucosal vaccine plasmid dna vaccines: tissue distribution and effects of dna sequence, adjuvants and delivery method on integration into host dna incorporation of hepatitis-b surface antigen (hbsag) into liposomes lipid matrix-based vaccines for mucosal and systemic immunization targeting immune response induction with cochleate and liposome-based vaccines structure-activity relationships of poly(llysines): effects of pegylation and molecular shape on physicochemical and biological properties in gene delivery dna released from dying host cells mediates aluminum adjuvant activity gene delivery to the eye using adeno-associated viral vectors spotlight on quadrivalent human papillomavirus (types 6, 11, 16, 18) recombinant vaccine(gardasil(r)) in the prevention of premalignant genital lesions, genital cancer, and genital warts in women dna/amphiphilic block copolymer nanospheres promote low-dose dna 
vaccination retroviral vectors for human gene delivery physical methods for gene transfer: improving the kinetics of gene delivery into cells prospects for cationic polymers in gene and oligonucleotide therapy against cancer anionic ph-sensitive pegylated lipoplexes to deliver dna to tumors immunization with a soluble recombinant hiv protein entrapped in biodegradable microparticles induces hiv-specific cd8+ cytotoxic t lymphocytes and cd4+ th1 cells production of escherichia coli heat labile toxin (lt) b subunit in soybean seed and analysis of its immunogenicity as an oral vaccine oral vaccination with modified vaccinia virus ankara attached covalently to tmpeg-modified cationic liposomes overcomes pre-existing poxvirus immunity from recombinant vaccinia immunization confocal and probe microscopy to study gene transfection mediated by cationic liposomes with a cationic cholesterol derivative application of a fusiogenic peptide gala for intracellular delivery nanotechnology in vaccine development: a step forward local and systemic b cell and th1 responses induced following ocular mucosal delivery of multiple epitopes of herpes simplex virus type 1 glycoprotein d together with cytosine-phosphate-guanine adjuvant immunological and formulation design considerations for subunit vaccines use of nonionic block copolymers in vaccines and therapeutics glycopeptide dendrimers. 
part i polyhydroxyalkanoates: materials for delivery systems induction of antibody response to human tumor antigens by gene therapy using a fusigenic viral liposome vaccine intranasal vaccines for protection against respiratory and systemic bacterial infections oral vaccination based on dna-chitosan nanoparticles against schistosoma mansoni infection hamster cell culture rabies vaccine design and evaluation of a safe and potent adjuvant for human vaccines biodegradable nanoparticles for drug and gene delivery to cells and tissue cochleate lipid cylinders: formation by fusion of unilamellar lipid vesicles the advance of dendrimers -a versatile targeting platform for gene/drug delivery in situ gelling and mucoadhesive polymer vehicles for controlled intranasal delivery of plasmid dna enhanced mucosal and systemic immune responses following intravaginal immunization with human papillomavirus 16 l1 virus-like particle vaccine in thermosensitive mucoadhesive delivery systems archaeosome immunostimulatory vaccine delivery system archaeobacterial ether lipid liposomes (archaeosomes) as novel vaccine and drug delivery systems anionic liposomal delivery system for dna transfection iscomatrix adjuvant for antigen delivery liposome-entrapped plasmid dna: characterisation studies improving vaccine delivery using novel adjuvant systems structure and design of polycationic carriers for gene delivery predominant role for directly transfected dendritic cells in antigen presentation to cd8+ t cells after gene gun immunization low cytotoxicity effect of dendrosome as an efficient carrier for rotavirus vp2 gene transferring into a human lung cell line: dendrosome, as a novel intranasally gene porter polycation-dna complexes for gene delivery: a comparison of the biopharmaceutical properties of cationic polypeptides and cationic lipids monophosphoryl lipid a obtained from lipopolysaccharides of salmonella minnesota r595. 
key: cord-298131-zolwjl9u authors: xiao, shuqi; jia, jianyu; mo, delin; wang, qiwei; qin, limei; he, zuyong; zhao, xiao; huang, yuankai; li, anning; yu, jingwei; niu, yuna; liu, xiaohong; chen, yaosheng title: understanding prrsv infection in porcine lung based on genome-wide transcriptome response identified by deep sequencing date: 2010-06-29 journal: plos one doi: 10.1371/journal.pone.0011377 sha: doc_id: 298131 cord_uid: 
zolwjl9u porcine reproductive and respiratory syndrome (prrs) is one of the most economically important diseases affecting the swine industry worldwide and causes great economic losses each year. prrs virus (prrsv) replicates mainly in porcine alveolar macrophages (pams) and dendritic cells (dcs) and causes persistent infection, antibody-dependent enhancement (ade), interstitial pneumonia and immunosuppression. however, the molecular mechanisms of prrsv infection are still poorly understood. here we report the first genome-wide host transcriptional response to infection with the classical north american type prrsv (n-prrsv) strain ch 1a, obtained using solexa/illumina's digital gene expression (dge) system, a tag-based high-throughput transcriptome sequencing method, and systematically analyse the relationship between pulmonary gene expression profiles after n-prrsv infection and infection pathology. our results suggest that n-prrsv uses multiple strategies for its replication and spread in infected pigs, including subverting the host innate immune response, inducing an anti-apoptotic and anti-inflammatory state, and developing ade. upregulated expression of virus-induced pro-inflammatory cytokines, chemokines, adhesion molecules and inflammatory enzymes, together with inflammatory cells, antibodies and complement activation, was likely to result in the development of inflammatory responses during n-prrsv infection. n-prrsv-induced immunosuppression might be mediated by apoptosis of infected cells, which caused depletion of immune cells and induced an anti-inflammatory cytokine response in which they were unable to eradicate the primary infection. our systems analysis will benefit the understanding of the molecular pathogenesis of n-prrsv infection, the development of novel antiviral therapies and the identification of genetic components of swine resistance/susceptibility to prrs. 
porcine reproductive and respiratory syndrome (prrs), also called "blue ear" disease because of a typical, but not often observed, hallmark of "blue ears", is widely accepted as one of the most economically important diseases affecting the swine industry. since its first appearance in the late 1980s in the us and europe, prrs has spread worldwide [1, 2, 3]. prrs is characterized by high mortality in piglets, reproductive failure (late-term abortions and stillbirths, premature farrowing, mummified pigs) in pregnant sows and respiratory disease (interstitial pneumonia, respiratory difficulties) in nursery and grower/finishing pigs. it causes highly significant economic losses to the swine industry worldwide, amounting to more than $560.32 million each year in the us alone [4]. the etiologic agent of prrs is prrs virus (prrsv), a small, enveloped, linear, single-stranded, positive-sense rna virus. it is a member of the family arteriviridae, which includes lactate dehydrogenase-elevating virus, equine arteritis virus and simian hemorrhagic fever virus, and enters in the newly established order delayed and their levels remain low, which cannot effectively eliminate prrsv-infected cells [12, 13]. because of these features of prrsv infection, prrs has been one of the most challenging subjects of research in veterinary viral immunology [12]. regulation of immune responses and genetic resistance to infectious viral diseases is an area of concern for both humans and swine [14]. prrsv strongly modulates the host's immune responses and changes the host's gene expression. studies have shown that prrsv inhibits type i interferons (ifn-α/β, spi ifn), especially ifn-α [15], and induces interleukin-10 (il10) [16, 17]. because the primary cellular target of prrsv is the porcine alveolar macrophage (pam) of the lung, several studies have analysed the immune responses of pams to prrsv infection. 
one group [18] used differential display reverse-transcription pcr to identify molecular genetic changes within prrsv-infected pams over a 24 h pi period. their results suggest that myxovirus resistance 1 (mx1) and ubiquitin specific protease (usp) genes may play important roles in clinical disease during prrsv infection. notably, a recent paper on the genome-wide transcriptional response of pams following infection with the lelystad prrsv strain (european type, eu prrsv) using affymetrix microarrays was published during the preparation of our manuscript [19]. the authors found that the expression of beta interferon 1 (ifn-β) was strongly upregulated while the expression of il-10 and tnf-α was weakly upregulated. at almost the same time, another group employed serial analysis of gene expression (sage) to examine global gene expression in pams infected with the vr-2332 prrsv strain (north american type, na prrsv). they identified over 400 unique tags with significantly altered expression levels [20]. in vitro studies are useful for investigating how prrsv modifies gene expression in primary target cells such as pams. however, many of the outstanding issues can be answered only in the context of prrsv-infected animals. hence, the characterization of the host immune response to prrsv in vivo is still an area in urgent need of investigation. lung pathogenesis is a major feature of prrsv infection. moreover, in addition to serving as a source of protein in the human diet, the pig is also an excellent biomedical model for humans because of its similarity in size and physiology, and in organ development and disease progression [14]. thus, understanding the host's immune response to prrsv infection is important not only for swine production but also for human consumption. however, to date, the immune response to prrsv in the porcine lung has not been analyzed by transcriptome profiling. 
next-generation high-throughput sequencing technology has been adapted for transcriptome analysis because it produces large volumes of sequence data inexpensively [21, 22, 23, 24]. the technology developed by illumina (formerly solexa sequencing) [25], which is also referred to as digital gene expression (dge) tag profiling, allows identification of millions of short rnas in a sample and of differentially expressed genes without the need for prior annotations. here we employed the illumina genome analyzer platform to perform a digital gene expression analysis of the porcine lung transcriptome response to n-prrsv infection, and used histopathological examination to analyze the pathological changes in the infected porcine lungs. the relationship between pulmonary gene expression profiles after n-prrsv infection and infection pathology was systematically analyzed. the comprehensive analysis of the global host response induced by n-prrsv suggested an inflammatory response, mediated by multiple inflammatory molecules early during infection, that induced tissue injury, and an immunosuppressive state, mediated by apoptosis of infected cells, which caused depletion of immune cells and induced an anti-inflammatory cytokine response in which they were unable to eradicate the primary infection. our systems analysis will benefit the understanding of the molecular pathogenesis of n-prrsv infection, the development of novel antiviral therapies and the identification of genetic components of swine resistance/susceptibility to prrs. our study was approved by the animal care and use committee of guangdong province, china. all animal procedures were performed according to guidelines developed by the china council on animal care and a protocol approved by the animal care and use committee of guangdong province, china. 
nine conventionally reared, healthy, 6-week-old crossbred weaned pigs (landrace × yorkshire) were selected from a high-health commercial farm that has historically been free of all major pig diseases, such as prrsv, porcine circovirus type 2, classical swine fever virus, porcine parvovirus, pseudorabies virus, swine influenza virus and mycoplasma hyopneumoniae infections. all pigs were prrsv-seronegative as determined by elisa (herdchek prrs 2xr; idexx laboratories) and free of prrsv as tested by rt-pcr. pigs were randomly assigned to two groups and raised in isolation rooms. six pigs were inoculated on day 0 with 6 ml of viral suspension (4 ml intranasally and 2 ml intramuscularly) of classical north american type prrsv (n-prrsv) strain ch 1a (isolated in china in 1996; a gift from dr. zhang guihong, south china agricultural university) at a dose of $10^{6.0}$ tcid50 ml$^{-1}$. three uninfected negative control (unc) pigs were treated similarly with an identical volume of dmem culture medium from uninfected marc-145 cells 1 day prior to experimental infection, and were immediately necropsied. n-prrsv-inoculated pigs were clinically examined daily and rectal body temperatures were recorded from days -2 to 7 pi. virus reisolation was performed after the pigs were killed: the infected group was positive and the unc group was negative. tissue homogenates and serum were examined by n-prrsv-specific quantitative pcr (qpcr). the oligonucleotide primers used were nsp2f (5'-gtgggtcggcaccagtt-3') and nsp2r , designed in the gene segment encoding nsp2. the taqman probe, 5' fam-cacagttctacgcggtgcagg-tamra 3', was synthesized. three randomly chosen infected pigs were necropsied at each of the time points 96 h pi and 168 h pi. lung samples were collected from the unc group (c), from three pigs at 96 h pi (n96) and from three pigs at 168 h pi (n168), and were immediately frozen in liquid nitrogen for rna isolation or fixed in 10% neutral buffered formalin for histological processing. 
lungs of unc and experimentally infected pigs were processed routinely for haematoxylin and eosin (h&e) and immunohistochemical staining, as described previously [26]. total rna was extracted from frozen lungs using standard protocols (trizol) and then treated with dnase to remove potential genomic dna contamination according to the manufacturer's protocols. rna integrity and concentration were evaluated with an agilent 2100 bioanalyzer (agilent technologies). for rna library construction and deep sequencing, equal quantities of rna from the three unc lungs were pooled, as were rna samples from the three infected lungs at 96 h pi (n96) and from the three infected lungs at 168 h pi (n168). approximately 6 μg of rna representing each group was submitted to solexa (now illumina inc.) for sequencing. sequence tag preparation was done with illumina's digital gene expression tag profiling kit according to the manufacturer's protocol. in brief, mrna was isolated from 6 μg of total rna by binding the mrna to magnetic oligo beads. first- and second-strand cdna were synthesized while the mrna was attached to the beads. the double-stranded cdna was digested with nlaiii, and all fragments other than the 3' catg fragment attached to the oligo bead were washed away. gex nlaiii adapter 1 was then ligated at the site of nlaiii cleavage. because gex nlaiii adapter 1 contains the recognition sequence for the restriction enzyme mmei, we subsequently applied mmei to create the 17 bp tag. gex adapter 2 was ligated at the site of mmei cleavage. a pcr of 12 cycles was performed with two primers that anneal to the ends of the adapters to enrich the adapter-ligated cdna construct. the resulting 85 bp fragments were purified from a 6% novex tbe page gel. subsequently, the purified cdna tags were sequenced on the illumina cluster station and genome analyzer. image recognition and base calling were performed using the illumina pipeline. 
all data are miame compliant. the raw data (tag sequences and counts) have been submitted to gene expression omnibus (geo) under series gse19970. for the raw data, we filtered out adaptor tags, low-quality tags and tags of copy number = 1 to obtain clean tags. subsequently, we classified the clean tags according to their copy number in the library, showed their percentage of the total clean tags and analyzed the saturation of the library. a preprocessed database of all possible catg+17-nt tag sequences was created using sus scrofa unigene (http://www.ncbi.nlm.nih.gov/unigene/ugorg.cgi?taxid=9823, unigene build #35 sus scrofa, nov 7th, 2008) from ncbi. to monitor mapping events on both strands, both the sense and the complementary antisense sequences were included in the data collection. information on the position of polyadenylation signals was also collected from the transcript database. we then aligned all clean tags to the reference sequences, and unambiguous tags were annotated. we counted the number of clean tags corresponding to each gene. to compare the differential expression (de) of genes across samples (n96/c, n168/c, n168/n96), the number of raw clean tags in each library was normalized to tags per million (tpm) to obtain the normalized gene expression level. detection of de genes or tags across samples was performed as described previously [27]. genes were deemed significantly differentially expressed with a p-value < 0.005, a false discovery rate (fdr) < 0.01 and an estimated absolute log2-fold change > 0.5 in sequence counts across libraries. to verify the dge results, we used qpcr analysis. the rna samples used for the qpcr assays were both the same samples as used for the dge experiments and independent rna extractions from biological replicates. qpcrs were done on the lightcycler480 (roche) with sybr-green detection (sybr primescript rt-pcr kit, takara biotechnology co., ltd.), according to the manufacturer's instructions. 
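the tag-count normalization and the differential-expression thresholds described above can be illustrated with a minimal python sketch. it assumes that the p-value and fdr for each gene have already been computed by the method of [27], which is not reproduced here. the function names and the `floor` value used to avoid taking the logarithm of zero are hypothetical and are not taken from the study.

```python
import math


def tags_per_million(counts):
    """Scale the raw clean-tag counts of one library to tags per million (TPM)."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(tpm_case, tpm_control, floor=0.01):
    """log2 ratio of two TPM values.

    `floor` is an assumed pseudo-value that replaces zero counts so that the
    ratio is defined; the source does not state how zeros were handled.
    """
    return math.log2(max(tpm_case, floor) / max(tpm_control, floor))


def is_differentially_expressed(p_value, fdr, log2fc,
                                p_cut=0.005, fdr_cut=0.01, fc_cut=0.5):
    """Apply the three thresholds quoted in the text."""
    return p_value < p_cut and fdr < fdr_cut and abs(log2fc) > fc_cut


# Toy libraries with made-up counts (not data from the study).
control = tags_per_million({"gene_a": 300, "gene_b": 700})
infected = tags_per_million({"gene_a": 600, "gene_b": 400})
change = log2_fold_change(infected["gene_a"], control["gene_a"])
print(change, is_differentially_expressed(0.001, 0.005, change))
```

a gene must pass all three cut-offs at once, so a large fold change with a weak p-value is not called de in this sketch.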
each cdna was analyzed in triplicate, after which the average threshold cycle (ct) was calculated per sample. the relative expression levels were calculated with the $2^{-\Delta\Delta C_t}$ method. the results were normalized to the expression level of hprt1 and expressed relative to the c sample. by browsing all health traits in the pig quantitative trait locus (qtl) database (pigqtldb, http://www.animalgenome.org/qtldb/pig.html) by trait class, we obtained mapping details of the qtl on the corresponding pig chromosomes. pig affymetrix elements corresponding to health trait qtl regions were then downloaded to an excel file. by matching the ids of de genes to all genes in the qtl regions, we obtained the de genes of the corresponding qtl region. pathway analysis was mainly based on the kyoto encyclopedia of genes and genomes (kegg) database. a two-sided fisher's exact test with multiple testing correction and a $\chi^2$ test were used to classify the pathway category. the false discovery rate (fdr) was used to correct the p-value. we chose only pathway categories that had p < 0.05. within a significant category, the enrichment $R_e$ was given by $R_e = (n_f/n)/(N_f/N)$, where $n_f$ is the number of flagged proteins within the particular category, $n$ is the total number of proteins within the same category, $N_f$ is the number of flagged proteins in the protein reference database list, and $N$ is the total number of proteins in the gene reference database list. stc is implemented entirely in java. the clustering algorithm first selects a set of distinct and representative temporal expression profiles. these model profiles are selected independently of the data. the clustering algorithm then assigns each gene passing the filtering criteria to the model profile that most closely matches the gene's expression profile, as determined by the correlation coefficient. since the model profiles were selected independently of the data, the algorithm can then determine which profiles have a statistically significantly higher number of genes assigned, using a permutation test. 
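the assignment step just described can be sketched in python as follows. this is an illustration only, not the stc implementation (which is a java program): it assumes pearson's correlation as the correlation coefficient and uses only the four model profiles that are named in the results of this paper, because the full set of model profiles is not listed in the text. the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series carries no trend information
    return sxy / math.sqrt(sxx * syy)


# Profile id -> expression change at (C, N96, N168), as reported in the results.
MODEL_PROFILES = {
    0: (0, -1, -2),
    1: (0, -1, -1),
    6: (0, 1, 1),
    7: (0, 1, 2),
}


def assign_profile(expression, profiles=MODEL_PROFILES):
    """Return the id of the model profile that correlates best with the gene.

    Correlation is unchanged by shifting or rescaling the series, so the
    expression values do not need to be re-based to start at zero.
    """
    return max(profiles, key=lambda pid: pearson(expression, profiles[pid]))


# Made-up log2 expression values for two genes at C, N96 and N168.
print(assign_profile((0.0, 1.1, 2.3)))  # steadily rising series
print(assign_profile((5.0, 3.0, 3.1)))  # drops, then stays low
```

because the model profiles are fixed before any data are seen, the number of genes landing in each profile can later be compared with the counts obtained after permuting the time points.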
this test determines an assignment of genes to model profiles using a large number of permutations of the time points. it then uses standard hypothesis testing to determine which model profiles have significantly more genes assigned under the true ordering of time points than the average number assigned to the model profile in the permutation runs. significant model profiles can either be analyzed independently or grouped together based on similarity to form clusters of significant profiles [28, 29]. stc-go supports gene ontology enrichment analyses for sets of genes having the same significant temporal expression pattern. at each iteration we select a random sample of $s_a$ genes ($s_a$ is the number of genes assigned to the same model temporal expression profile r) and compute fisher's exact test p-values for the selected genes in all go biological categories [30]. the two-sided fisher's exact test p-value for a category reflects a test of the null hypothesis that the category is not enriched in genes assigned to profile r with respect to what would have been expected by chance alone. to decide whether or not to follow up a category that appears enriched in these genes, we need to know the statistical reliability of the apparent enrichment. to assess the significance of a particular category, we need to know the distribution of p-values that would occur by random chance. the percentage of false positives to be tolerated will generally depend on the relative costs of false positives and false negatives in whatever follow-up study is to be done. this way of framing the question leads us to specify the false discovery rate (fdr) for a set of categories, rather than a significance level (p-value) for each category. 
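as an illustration of the two statistical ingredients discussed above, the following python sketch computes a two-sided fisher's exact test p-value for one category and then converts a list of p-values into fdr-adjusted values. the text does not say which fdr procedure was used; the benjamini-hochberg step-up procedure is assumed here purely for illustration, and stc-go itself estimates significance from random samples rather than from this closed-form adjustment. the symbols i, s_a, m and n are defined in the code comments; the function names are hypothetical.

```python
from math import comb


def category_table(i, s_a, m, n):
    """2x2 table for one GO category.

    i   = genes in the profile AND in the category
    s_a = genes assigned to the profile
    m   = genes in the category
    n   = genes in the reference list
    """
    return i, s_a - i, m - i, n - s_a - m + i


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables (same margins) that are no more
    likely than the observed table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """FDR-adjusted p-values (assumed procedure, see lead-in)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        k = order[rank - 1]
        running_min = min(running_min, p_values[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


# Made-up counts: 3 of the 4 profile genes fall in a 4-gene category, 8 genes in total.
p = fisher_exact_two_sided(*category_table(3, 4, 4, 8))
print(p, benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```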
with significance set at the 0.05 level, for a given category, the enrichment $R_e$ is given by $R_e = (i/m)/(s_a/n)$, where i is the number of genes assigned to profile r within the go category of interest, m is the total number of genes within the go category of interest, and n is the total number of unique genes in the gene reference database list. after n-prrsv infection, the affected pigs exhibited the following clinical symptoms within 3-7 days: fever of 40.08-40.8 °c, depression, anorexia, rough hair coats, dyspnoea, reddening of the skin, oedema of the eyelids, conjunctivitis, mild diarrhoea and shivering. the unc pigs did not show any obvious changes in body temperature or clinical signs. the qpcr assay showed that n-prrsv was present in each of the 6 infected pigs, but the n-prrsv nsp2 gene was not differentially expressed at 96 h pi and 168 h pi (table s1). histopathological examination of n-prrsv-affected pigs showed interstitial pneumonia in the lungs, with thickening of alveolar septa accompanied by infiltration of immune cells (figure 1b). most viral antigen was detected in alveolar cells and bronchiolar epithelial cells in lesions (figure 1c). to investigate the regulation of the host response to n-prrsv, we examined the global gene expression profiles in lungs using solexa/illumina's dge system, a tag-based transcriptome sequencing method. we sequenced three porcine lung dge libraries, from c, n96 and n168, using massively parallel sequencing on the illumina platform. the major characteristics of these three libraries are summarized in table 1. we obtained approximately 6.9 million total sequence tags per library with 515885 distinct tag sequences. prior to mapping these tag sequences to reference sequences, we filtered out adaptor tags, low-quality tags and tags of copy number = 1, producing approximately 6.6 million total clean sequence tags per library with 179589 distinct clean tag sequences. 
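the filtering step that turns raw tags into clean tags can be sketched in python as below. the source does not define "low quality", so this sketch assumes that a low-quality tag is one with the wrong length or with a base other than a, c, g or t. the expected length of 21 nt (catg plus 17 nt), the `adaptor_tags` argument and the function name are likewise assumptions made for illustration.

```python
from collections import Counter


def clean_tags(raw_tags, adaptor_tags=frozenset(), tag_length=21):
    """Return {tag: copy number} for tags that survive the three filters."""
    counts = Counter(raw_tags)
    clean = {}
    for tag, copies in counts.items():
        if tag in adaptor_tags:
            continue  # adaptor sequence, not a transcript tag
        if len(tag) != tag_length or not set(tag) <= set("ACGT"):
            continue  # assumed definition of a low-quality tag
        if copies == 1:
            continue  # copy number = 1 is discarded, as in the text
        clean[tag] = copies
    return clean


# Made-up reads: one tag seen three times, one singleton, one tag with an ambiguous base.
reads = (["CATG" + "A" * 17] * 3
         + ["CATG" + "C" * 17]
         + ["CATG" + "N" + "A" * 16] * 2)
kept = clean_tags(reads)
print(len(kept), sum(kept.values()))  # distinct clean tags, total clean tags
```

the two printed numbers correspond to the "distinct clean tag" and "total clean tag" figures reported for each library.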
the c library had the highest number of both total sequence tags and distinct sequence tags, followed by the n168 and n96 libraries. moreover, the c library had the highest ratio of distinct tags to total tags and the lowest percentage of distinct clean high-copy-number tags. these data showed that more genes were detected in the c library than in the other two libraries and that more transcripts were expressed at lower levels in the c library. saturation analysis of library capacity showed that the number of newly emerging distinct tags gradually decreased as the total number of sequence tags increased, once the number of sequencing tags was large enough. when the number of sequencing tags reached 3 million, library capacity approached saturation (figure s1). for tag mapping, we preprocessed one reference tag database that included 51670 sequences from sus scrofa unigene. to obtain the reference tags, we used nlaiii to digest all the samples and took all the catg+17 tags in a gene as that gene's reference tags, not only the 3'-most one. we obtained 194664 total reference tag sequences with 172119 unambiguous tag sequences. to allow for polymorphism across samples, one mismatch was tolerated in each alignment. by these criteria, 47.71%-51.04% of distinct clean tags mapped to the unigene virtual tag database, 39.42%-42.11% of the distinct clean tags mapped unambiguously to unigene, and 52.29%-48.96% of the distinct clean tags did not map to the unigene virtual tag database (table 1). the occurrence of unknown tags was probably due to the incompleteness of pig genome sequencing. most solexa experimental tags matched the 1st or 2nd 3' catg site in high-confidence transcripts (figure s2). 
because solexa sequencing can distinguish between transcripts originating from both dna strands, we used the strand-specific nature of the sequencing tags obtained and found evidence for bidirectional transcription in 6368 to 8271 of all detectable unigene clusters and for 3889 to 4043 antisense-strand-specific transcripts (table s2). by comparison, the ratio of sense to antisense strand of the transcripts was approximately 1.7:1 for all libraries. this suggests that, in spite of the high number of antisense mapping events detected, transcriptional regulation in the n-prrsv-induced immune response acts most strongly on the sense strand. to analyze the depth of transcriptome sampling in the dge libraries, we studied the rate of increase in the number of genes (sense + antisense strand) identified as the size of the corresponding library increased. when the library size reached one million, we could identify 45% of all genes and 30% of the genes identified by unambiguous tags (figure s3). at this point, library capacity approached saturation. to capture the global transcriptional changes in n-prrsv-infected porcine lungs, we applied the method described previously [27] to identify de genes from the normalized dge data by pairwise comparisons between time points (n96/c, n168/c, n168/n96) during infection. the results showed that 5430 genes had p-values < 0.005, a false discovery rate (fdr) < 0.01 and an estimated absolute log2-fold change > 0.5 in at least one of the pairwise comparisons, and these were declared to be differentially expressed during the course of infection (table s3). to characterize the functional consequences of gene expression changes associated with n-prrsv infection, we performed pathway analysis of de genes based on the kegg database using a two-sided fisher's exact test. we chose only significant pathway categories that had a p-value < 0.05 and an fdr < 0.05. 
as shown in figure s4, the significant signaling pathways included cell adhesion molecules (cams), t cell receptor signaling pathway, antigen processing and presentation, toll-like receptor signaling pathway, biosynthesis of unsaturated fatty acids, pantothenate and coa biosynthesis, etc. (table s4). to validate de genes identified by solexa sequencing, we selected 8 genes for qpcr confirmation. the set included two down-regulated genes (epithelial chloride channel protein (aecc) and hyaluronan and proteoglycan link protein 1 (hapln1)) and six up-regulated genes (inflammatory response protein 6 (irg6), dead (asp-glu-ala-asp) box polypeptide 58 (ddx58), usp18, cxcl10, cytochrome p450 (cyp3a88), and cd209). data were presented as fold changes in gene expression normalized to the hprt1 gene and relative to the c sample. pearson's correlation coefficients (r) showed that the dge and qpcr data (pooled samples) were highly correlated for the genes modulated by n-prrsv, with r values ranging from 0.781 (cyp3a88) to 0.997 (aecc) between the two methods (figure 2). qpcr analysis (of both pooled samples and independent rna extractions from biological replicates) confirmed the direction of change detected by dge analysis. this correlation indicated the reliability of the dge results. qtl play a central role in linking genomic information with phenotypes. the ultimate goal of qtl studies is to identify the actual gene(s) responsible for the phenotypic variation observed in a particular trait [31]. in the present paper, we mapped the de genes to pig qtl regions of health traits in the pig qtl database (pigqtldb). our search found that 240 de genes were distributed in 18 different known qtl regions related to pig health traits (figure s5; table s5). 
among the 240 de genes, 122 and 114 were located in qtl regions for cd4-positive leukocytes and cd2-positive leukocytes, respectively; 53 were distributed in the qtl regions for band-formed neutrophils and cd8-positive leukocytes. immune responses against pathogens depend in part on the generation of fully differentiated 'killer' (or effector) and memory cd8+ t cells. effective priming and maintenance of cd8+ t cell responses to viral infection require 'help' from cd4+ t cells, which also play a critical role in programming cd8+ t cell memory development [32]. moreover, a recent study showed that cd4+ t cells guide effector cytotoxic t lymphocytes (ctls) to virally infected tissues, where they can destroy infected cells [32, 33]. to profile the gene expression time series and search for the most probable set of clusters generating the observed time series, we used the stc algorithm of gene expression dynamics, which explicitly took into account the dynamic nature of temporal gene expression profiles during clustering and identified the number of figure s6 ; table s3 ) with 4 (profiles 1, 6, 0, 7) significant cluster profiles, which have significantly more genes assigned under the true ordering of time points than the average number assigned to the model profile in the permutation runs (figure s6). gene ontology (go) biological process (bp) enrichment analyses for sets of de genes having significant cluster profiles were then performed using a two-sided fisher's exact test (table s6 and table s7; figures s7, s8, s9 and s10). we chose only significant go categories that had a p-value < 0.05. 
the most prominently overrepresented go terms of significant cluster profile 1 (0, -1, -1) and profile 0 (0, -1, -2), which contain down-regulated genes, involved regulation of lipid and cholesterol biosynthetic and metabolic processes; regulation of skeletal muscle development and muscle cell differentiation; digestion; negative regulation of neuron apoptosis; and neurological system process (table s6; figures s7 and s8). the most prominently overrepresented go terms of significant cluster profile 6 (0, 1, 1) and profile 7 (0, 1, 2), which contain up-regulated genes, included negative regulation of fibroblast proliferation; natural killer cell, macrophage, lymphocyte, mononuclear cell, leukocyte and t cell proliferation, differentiation and activation; complement activation, immune response, inflammatory response, defense response, and apoptosis; response to stimulus (stress); lipid and fatty acid metabolic process and oxidation; positive regulation of ubiquitin-protein ligase activity and protein proteolysis; and protein targeting to mitochondrion (table s7; figures s9 and s10). these results are consistent with these genes and their associated processes playing important roles in n-prrsv replication and pathogenesis. viral infection of a host leads to the initiation of antiviral innate immune responses, which results in the induction of expression of the type i interferons [34]. meanwhile, many viruses have developed strategies to evade and subvert the immune response. as shown in figure 3a, transcripts of ifn-γ were significantly induced in n-prrsv-infected pigs at days 4 through 7 pi, but short type i interferon (spi ifn) gene expression was suppressed, and interferon alpha 5 (ifna5) gene expression was markedly down-regulated. lipid rafts, lipid microdomains of the cell membrane enriched in sphingolipids, cholesterol and associated proteins, play critical roles in the life cycle of many viruses [35]. 
some viruses enhance their replication by modulating host cell lipid metabolism [36]. dge analysis of pigs infected with n-prrsv showed a significant increase in the transcript abundance of many genes involved in lipid metabolism, including those for apolipoprotein b48 receptor (apob48r), apolipoprotein-e (apoe), low density lipoprotein b (ldlb) and phosphatidylinositol 3-kinase catalytic subunit type 3 (pik3c3) (figure 3b). perhaps n-prrsv alters the host's lipid metabolism to create a lipid-rich intracellular environment that facilitates its own multiplication. moreover, we observed that n-prrsv induced upregulated expression of anti-apoptotic genes in n-prrsv-infected lungs, including myeloid cell leukemia sequence 1 (bcl2-related) (mcl1), nuclear factor kappa-b 1 (nfkb1), nfkb2, adrenomedullin (adm) and interleukin 10 (il10), and downregulated expression of pro-apoptotic genes, including bak protein (bak) and apoptosis-related protein 1 (apr1) (figure 3d). this pattern would inhibit apoptosis, which might prolong cell life and increase the yield of progeny virions. n-prrsv infection caused anorexia and subsequent slow growth. accordingly, we observed that the transcript abundance of genes involved in digestion, such as gastric mucin (muc5ac) and cytochrome p450 (cyp39a1), was significantly decreased (figure 3e). simultaneously, the transcript abundance of genes associated with cell, muscle and cartilage development was markedly decreased (figure 3f). these genes include insulin-like growth factor binding protein 3 (igfbp3), collagen, type ii, alpha 1 isoform 2 (col2a1), connective tissue growth factor (ctgf) and epidermal growth factor (egf).
fever and heat shock
fever is frequently the host's initial response to infection [37].
after viral infection, pathogen-associated molecular patterns (pamps) in viral proteins and nucleic acids are recognized by host pathogen-recognition receptors (prrs), such as toll-like receptors (tlrs), which trigger gene expression and synthesis of the il-1β precursor. active caspase-1 (casp1) cleaves the il-1β precursor into mature, bioactive il-1β, the inflammatory cytokine most responsible for fever [37, 38, 39]. as shown in figure 4a, the transcript abundance of tlr1, 2, 4, 6, il-1β and casp1 was significantly increased in n-prrsv-infected porcine lungs. moreover, the transcript abundance of genes involved in the activation of casp1 and il-1β secretion, including apoptosis-associated speck-like protein containing a card (asc), prostaglandin e synthase 2 (pge2) and phospholipase a2, group vii (pla2g7), was significantly increased (figure 4a). the expression of heat shock proteins (hsps), known as stress proteins, can be markedly upregulated by all cells under conditions of stress, such as increased temperature (fever) and viral infection [40]. transcript abundance for most of these heat shock genes, including 90-kda hsp (hsp90), hsp70 and heat shock protein beta-1 (hsp27), was significantly elevated in n-prrsv-infected lungs relative to unc lungs (figure 4b). viral infection results in an inflammatory response, which is an essential component of the antiviral innate immune response [41]. after recognizing the pamps, either surface or intracellular prrs trigger intracellular signaling cascades that result in the activation of transcription factors, including nuclear factor-κb (nf-κb), interferon-regulatory factors (irfs), and signal transducers and activators of transcription (stats). as shown in figure 4a, transcripts of the toll-like prrs tlr1, tlr2, tlr4 and tlr6 were significantly increased in n-prrsv-infected pigs at days 4 through 7 pi, but no change was detected in tlr3, which specializes in the recognition of viral dsrna.
the cytoplasmic prrs (figure 5a) retinoic-acid-inducible protein i (rig-i, ddx58) and melanoma differentiation-associated gene 5 (mda5), the two most relevant for defense against viruses, were expressed at a high level after n-prrsv infection. cell surface prrs such as cd14, md-2 protein (md2) and cd163 (which is probably involved in prrsv entry during uncoating [42]) were likewise up-regulated after n-prrsv infection (figure 5a). after binding to n-prrsv viral pamps, prrs initiate intracellular signaling cascades that activate transcription factors, including irf1, irf7 and irf9 (but not irf3), and stat1, stat3 and stat6 (figure 5b). activated transcription factors and stats in turn induce the transcription of specific sets of interferon-stimulated genes (isgs) [34, 43] and the expression of multiple inflammatory genes [44], which induce a pro-inflammatory response and attract cells, such as neutrophils and macrophages, to sites of infection. accordingly, we observed a significant increase in the transcript abundance of many isgs (figure 5c), pro-inflammatory cytokines (such as il1b, il8) (figure 5d), chemokines (ccl2, cxcl9, cxcl10) (figure 5e), adhesion molecules (vcam, icam1, sell), and other inflammatory molecules (such as mmp-2) (figure 5f). moreover, immunoglobulins (such as igg2b, igg3) (figure 5g), three categories of fc receptors and mannose receptor c1 (mrc1) (figure 5h), and complement proteins (figure 5i) were also significantly induced in the n-prrsv-infected lungs. however, several complement inhibitors that possess inhibitory and/or decay-accelerating activity, such as decay-accelerating factor cd55, complement component 4 binding protein, alpha (c4bpa) and c4bpb, were significantly repressed in the n-prrsv-infected lungs (figure 5i). cytotoxic t lymphocytes (ctls) detect cells infected with a virus and destroy them through perforin-mediated apoptosis [45].
activation of cd8+ t cells requires t cell receptors (tcrs) to recognize cognate antigenic peptides presented on mhc class i molecules displayed on the surface of antigen presenting cells (apcs) [32]. accordingly, we observed that the transcript abundance of ubiquitin specific peptidase (usp) and ubiquitin enzyme (figure 6a), 16 proteasomes, and aminopeptidases (figure 6b) was significantly increased in n-prrsv-infected lungs. the transcript abundance of b2m, mhc class i antigen 2 (sla-2), sla-3, tap2, and chaperones (such as grp78) was markedly increased after infection with n-prrsv, while the transcript abundance of sla-b was significantly decreased (figure 6c). in addition to recognition of cognate peptides presented by mhc class i molecules, activation of cd8+ t cells also requires 'costimulatory' signals and help from cd4+ helper t cells [33]. as shown in figure 6d and 6e, 8 cathepsins and 5 mhc class ii antigens were significantly induced in n-prrsv-infected lungs. the transcript abundance of costimulatory molecules (such as cd86, icos), cams, and the tcr/cd3 complex as well as co-receptor molecules (such as cd8a, cd8b) was remarkably increased after infection with n-prrsv (figure 6f and 6g). activated ctls release perforin (prf) and granzymes, two effectors that act collaboratively to induce apoptosis of target cells. as shown in figure 6h, prf1 and granzyme b (gzmb) transcript abundance was significantly elevated in n-prrsv-infected lungs relative to unc lungs. in addition to cytotoxins released from ctls, the transcript abundance of other pro-apoptotic members (figure 6i), such as nfkbia, growth arrest and dna-damage-inducible protein alpha (gadd45a), bh3 interacting domain death agonist (bid), xiap-associated factor 1 (xaf1), cytochrome c (cycs) and casp10, was also significantly increased after infection with n-prrsv, which can induce apoptosis of virus-infected cells.
figure 6 (caption fragment): see table s3 for full gene names. doi:10.1371/journal.pone.0011377.g006
in addition, we identified upregulated expression of cytochrome b245 heavy chain (gp91-phox), a critical component of the membrane-bound oxidase of phagocytes (macrophages and neutrophils), and downregulated expression of heme oxygenase 1 (hmox1) during n-prrsv infection, which might result in an oxidative stress response and subsequent oxidative damage of tissues (figure 6j). from the data presented in this paper, a model for the relationship between pulmonary gene expression profiles and infection pathology can be surmised (figure 7). n-prrsv replicates and spreads by subverting the host innate immune response and hijacking host lipid metabolism, as well as by inducing an anti-apoptotic and anti-inflammatory state. this is indicated by suppressed expression of spi ifn and ifn-α; down-regulated expression of the pro-apoptotic genes bak, apr-1 and sarp3; and high-level expression of genes involved in lipid metabolism (such as apoe, ldlb, pik3c3), of the anti-apoptotic genes mcl1, bcl2a1, chfr, adm, nfkb and il10, and of the anti-inflammatory molecule pge2 as well as cd163. infection with n-prrsv resulted in fever and an inflammatory response, as indicated by high expression of pro-inflammatory cytokines and chemokines, adhesion molecules, inflammatory enzymes and receptors, such as il-1β, il8, sell, icam, ccl2, cxcl9, cxcl10, b2m, proteasomes and cathepsins. this was compounded by cell death and elevated expression of nfkbia, xaf1, gadd45a, perforin, granzymes and cytochrome c, coupled with increased ros-mediated oxidative stress, as indicated by upregulated expression of cytochrome b245. taken together, n-prrsv infection may have resulted in an excessive immune and inflammatory response that contributed to tissue damage.
infection of pigs with n-prrsv caused fever, dyspnoea, reddening of skin, oedema of the eyelids, conjunctivitis, depression, anorexia and mild diarrhoea.
histopathological examination showed interstitial pneumonia in the lungs, with thickening of alveolar septa accompanied by infiltration of immune cells [46] (figure 1). although great efforts have been made by many researchers, the molecular basis of n-prrsv infection is largely unknown. here we report the first genome-wide host transcriptional response to n-prrsv infection using solexa/illumina's digital gene expression (dge) system, a novel tag-based, high-throughput transcriptome deep sequencing method. given the nature of the methodology of illumina's dge system, we pooled biological replicates from three pigs for each group to make representative samples for deep sequencing analysis. we reached a sequencing depth of 6.3-7.9 million tags per library (table 1) and found over 5000 genes to be differentially expressed during n-prrsv infection (table s3). other studies have also pooled biological replicates for library construction and deep sequencing [47, 48]; however, pooling results in a lack of biological replicates and may obscure the impact of variation among the pooled samples. because pigs vary in their response to prrsv infection, it is possible that one pig could significantly affect the results in the absence of independent libraries. we therefore performed qpcr validation both on the same pooled material that was used for deep sequencing and on independent rna extractions from each pig, all of which confirmed the direction of change detected by dge analysis (figure 2). our dge analysis showed massive changes in the transcript abundance of known immune response genes and of genes that have been implicated in prrsv infection [18, 19, 20]. we also identified many interesting genes that had not been linked to prrsv infection in previous studies. for example, the transcript abundance of lipid metabolism-related genes, including apob48r, apoe and pik3c3, was significantly increased during n-prrsv infection.
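because the libraries differ in sequencing depth (6.3-7.9 million tags), raw tag counts are not directly comparable between infected and control samples. the text here does not state which normalization was applied, so the following is only a minimal sketch of one common approach for tag-based data: scaling counts to tags per million (tpm) and reporting a log2 ratio. the function names, the pseudocount and the example counts are hypothetical and are not taken from the paper.

```python
from math import log2


def tags_per_million(count, library_size):
    """Scale a raw tag count by library depth (tags per million)."""
    return count * 1_000_000 / library_size


def log2_fold_change(count_inf, size_inf, count_ctrl, size_ctrl, pseudo=1.0):
    """log2 ratio of depth-normalized abundance, infected over control.

    pseudo is a hypothetical pseudocount that avoids division by zero
    when a gene has no tags in one library.
    """
    tpm_inf = tags_per_million(count_inf, size_inf)
    tpm_ctrl = tags_per_million(count_ctrl, size_ctrl)
    return log2((tpm_inf + pseudo) / (tpm_ctrl + pseudo))


# illustrative numbers only: 790 tags in a 7.9 million tag library versus
# 63 tags in a 6.3 million tag library
example = log2_fold_change(790, 7_900_000, 63, 6_300_000)
```

a positive value indicates higher normalized abundance in the infected library; whether a change is called significant would still depend on a statistical test and multiple-testing correction, which this sketch does not cover.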
alterations in lipid metabolism, perhaps including those with significant upregulation in this study, have been observed in response to infection by a range of viruses including sars-cov, hcv, influenza a virus and dengue virus [35, 36, 49, 50]. although in vitro studies have investigated how prrsv modifies gene expression in pams [18, 19, 20], many of the outstanding issues will be answered only in the context of prrsv-infected animals [51]. hence, we characterized the genome-wide transcriptome response to prrsv infection in porcine lung by deep sequencing. studies of transcript abundance in lung tissues nevertheless have intrinsic limitations. for example, the transcriptome of lung tissue is actually a merged transcriptional response of a wide range of cell types, some of which are infected, some of which are responding directly to the infectious process and others of which are bystanders. moreover, increased cellularity of tissues may be mistaken for a biologically important increase in transcript abundance. despite such limitations, our dge study offers a broad, system-wide window into molecular processes that regulate gene expression and also provides new leads for functional studies of candidate genes involved in host-virus interaction, as illustrated in this paper. the induction of expression of type i interferons (ifns; including ifn-α and ifn-β) is a well-known innate antiviral immune reaction in virus-infected cells [52, 53]. however, n-prrsv infection suppressed spi ifn gene expression and decreased the transcript abundance of ifn-α (figure 3a). previous studies [19, 54, 55], both in vitro and in vivo, have also shown that prrsv elicited only minimal ifn-α production or even suppressed its expression. the suppression of spi ifn, and in particular of ifn-α, is probably a crucial step in the pathogenesis, because ifn-α has been shown to inhibit prrsv replication [11].
infection with other viruses, such as the 1918 influenza virus [56], hepatitis c virus (hcv) [57] and ebola virus [58], also suppressed type i ifn gene expression, which led to extensive viral replication and increased pathogenesis. irf3 plays an important role in type i ifn gene expression. the transcript abundance of irf3 was strongly decreased in n-prrsv-infected pigs by 168 h pi (figure 4b). one study [59] indicated that prrsv nsp1β inhibited irf3 and thereby down-regulated ifn-β gene expression. it is worth mentioning that the ns1 protein of influenza a virus can also suppress innate immunity by inhibiting irf3 activation and subsequently disrupting the induction of α/β-interferon [60]. research has indicated that the expression of cd163, a prrsv receptor [61], on macrophages in different microenvironments in vivo may determine the replication efficiency and subsequent pathogenicity of prrsv [62]. the transcript abundance of cd163 was significantly increased after n-prrsv infection (figure 5a). the internalization of prrsv via cd163 in the target cells may induce the expression of il10, which in turn may induce the expression of cd163 on neighboring undifferentiated monocytes and increase overall prrsv susceptibility [62]. moreover, infected pigs develop a strong and rapid humoral response, but these initial antibodies do not confer protection and can even be harmful by mediating antibody-dependent enhancement (ade) [12]. these antibodies enhance prrsv replication by coating the virus and enhancing the internalization of viral particles into macrophages [12]. as shown in figure 5g and 5h, igg and fcγ receptors were significantly induced during n-prrsv infection. interestingly, the presence of antibodies during feline enteric coronavirus (fecv) infection likewise does not provide sterilizing immunity; instead, these antibodies opsonize virus particles and facilitate their entry into monocytes and/or macrophages through fcγ receptors, resulting in disease enhancement [51].
the activation of the pro-inflammatory transcription factor nf-κb induces robust activation of the casp1 inflammasome and subsequent release of il-1β, which causes fever and inflammation [63, 64, 65, 66]. accordingly, we identified upregulated expression of the casp1, nf-κb and il-1β genes during n-prrsv infection (figure 4a). nf-κb activation also enhanced the expression of the matrix metalloproteinases mmp2 and mmp9, two cytotoxic substances in prrsv-infected cells [11, 67]. similarly, the transcript abundance of mmp2 and ngal (25 kda alpha-2-microglobulin-related subunit of mmp-9) was significantly increased in the lungs of n-prrsv-infected pigs (figure 5f). upregulated expression of mmps would likely facilitate infiltration of inflammatory cells and increase inflammation. upregulated expression of il8 (also known as cxcl8), an attractant for neutrophils and other polymorphonuclear leukocytes that is produced after acute infection, was observed in prrsv-infected pams [68, 69] and in the lungs of n-prrsv-infected pigs (figure 5d). other chemokines such as ccl2 (also known as mcp1), cxcl9 and cxcl10 (also known as ip10), which were significantly increased (figure 5d), may also be crucial for lymphocyte and macrophage infiltration into the sites of n-prrsv infection. ccl2, il8 and ip10 expression was upregulated during sars-cov [70, 71] and murine coronavirus [72] infection, which may recruit monocytes and/or macrophages to sites of infection and be a major cause of lung pathology. although the present study indicates that upregulated expression of pro-inflammatory molecules contributes to the pathogenesis of n-prrsv, increased transcript abundance of anti-inflammatory molecules, such as il10 and pge2, was also detected. upregulation of il10 gene expression was found previously in prrsv-infected porcine leukocytes, pams and dcs, and in vivo in prrsv-infected pigs [16, 17, 19, 73].
perhaps an increase in pro-inflammatory molecules followed by an increase in anti-inflammatory molecules is the normal progression of events in inflammation [74]. the upregulated expression of il10 might skew the immune response away from a protective th1-cell response towards a non-protective th2-cell response, thereby impairing clearance of virus, which benefits viral infection [51]. concurrent upregulated expression of anti-inflammatory and pro-inflammatory molecules was also observed after sars-cov and fipv infection [75, 76]. antibodies might also contribute to immunopathogenesis by increasing the uptake of virus by macrophages, resulting in activation of these macrophages and secretion of pro-inflammatory cytokines and chemokines. antigen-antibody complexes might increase the transcript abundance of complement (figure 5i), which leads to generation of the classical inflammatory response through the production of potent pro-inflammatory molecules [77]. furthermore, complement activation might also contribute to the development of pulmonary oedema and oedema of the eyelids. a better understanding of the roles complement plays in host-pathogen interactions may help to develop more effective pharmacological agents against infection. moreover, damage to the lungs of n-prrsv-infected pigs seems to occur directly by viral destruction of alveolar and bronchial epithelial cells and macrophages (figure 1c), as well as indirectly through production of immune mediators. activated ctls and nk cells release perforin (prf) and granzymes, two effectors that act collaboratively to induce apoptosis of target cells. the transcript abundance of prf1 and granzymes increased in the lungs of n-prrsv-infected pigs (figure 6h). the pro-apoptotic molecules xaf1, bid, cyto c, casp10 and aifm2 were significantly up-regulated after infection with n-prrsv, which may induce apoptosis of virus-infected cells (figure 6i).
simultaneously, we also observed upregulated expression of anti-apoptotic genes in n-prrsv-infected lungs, including bcl2a1, mcl1, chfr, nfkb, adm and il10 (figure 3d). the concurrent upregulation of anti-apoptotic and pro-apoptotic genes after n-prrsv infection seems contradictory. however, this may reflect a balance between apoptotic and anti-apoptotic mechanisms. perhaps prrsv actively induces an anti-apoptotic state to complete its replication cycle by delaying cell death, and then induces apoptosis of virus-infected cells after completion of virus replication to increase the rate of spread of the virus [19, 78, 79]. anti-apoptotic and pro-apoptotic activities were also observed in prrsv-infected marc-145 cells, in which prrsv stimulated anti-apoptotic pathways early in infection and caused apoptosis of prrsv-infected cells late in infection [80, 81]. infection with n-prrsv also increased the transcript abundance of nfkbia (figure 6i), an inhibitor of the tnf receptor activated transcription factor nf-κb. loss of nf-κb activity has been shown to increase the cytotoxic effects of tnf, which resulted in increased cell death [82]. an increase in the transcript abundance of pro-apoptotic genes might result in disruption of the mitochondrial transmembrane potential, thereby inducing release of cyto c from mitochondrial membranes to induce apoptosis and secondary necrosis [83]. the production of ros, especially superoxide radicals, and the subsequent oxidative damage of cells and tissues are recognized as key contributors to viral pathogenesis [82, 84]. ros-mediated oxidative stress might also be involved in the induction of apoptosis by prrsv [81].
accordingly, we identified remarkable upregulation of cytochrome b245 heavy chain (gp91-phox) (figure 6j), a critical component of the membrane-bound oxidase of phagocytes (macrophages and neutrophils), after infection with n-prrsv. this oxidase generates superoxide radicals that kill both infected and normal cells at sites of infection, which would further exacerbate the immunopathological response. infection of macrophages, monocytes and dcs, which are essential for immune function, is likely to be a key component in n-prrsv-induced pathogenesis [19, 85, 86, 87]. apoptosis of infected cells causes immune suppression by two mechanisms: it either decreases the number of immune cells, compromising both the innate and adaptive immune responses so that they are unable to eradicate the primary infection, or impairs immunity by inducing immunosuppressive effects in the surviving cells [88]. for example, uptake of apoptotic cells by normal macrophages and dcs stimulates immune tolerance by inducing the release of anti-inflammatory cytokines, such as il10, and suppressing the release of pro-inflammatory cytokines [89]. histopathological analysis of the lymph nodes of pigs infected with n-prrsv revealed a profound depletion of immune cells compared with those of unc (data not shown). in summary, the data presented in this study suggest that n-prrsv utilizes multiple strategies for its replication and spread in infected pigs, including subverting the host innate immune response, inducing an anti-apoptotic and anti-inflammatory state, and developing ade.
after infection of macrophages and possibly dcs, prr-pamp interactions triggered signaling cascades that increased the transcript abundance of multiple inflammatory molecules, including cytokines, chemokines, adhesion molecules and inflammatory enzymes. these molecules induce a pro-inflammatory response and activate and recruit immune cells, such as macrophages and neutrophils, to sites of infection for virus elimination, thereby producing the clinical symptoms of viral infection, such as fever, dyspnoea and interstitial pneumonia in the lungs. further, antibodies and complement activation might exacerbate the inflammatory response. n-prrsv might induce an immunosuppressive state, mediated by apoptosis of infected cells, which caused depletion of immune cells and induced an anti-inflammatory cytokine response, leaving the immune system unable to eradicate the primary infection.
figure s4 signaling pathways of de genes. pathway analysis was mainly based on the kegg database. a p-value of <0.05 and an fdr of <0.05 in the two-sided fisher's exact test were selected as the significance criteria. the vertical axis is the pathway category and the horizontal axis is the log10(p value) of these significant pathways. found at: doi:10.1371/journal.pone.0011377.s004 (0.66 mb tif)
figure s5 genes distributed in the known pig qtls of health traits. the x axis represents the qtl symbol, and the y axis indicates the number of genes associated with health traits. see table s4 for full qtl names.
figure s7 biological process go terms of profile 1. functional classification of the de genes was performed according to go biological processes. a p-value of <0.05 in the two-sided fisher's exact test was selected as the significance criterion. these de genes were sorted by the enrichment of go categories. the vertical axis is the go category and the horizontal axis is the enrichment of go. found at: doi:10.1371/journal.pone.0011377.s007 (1.53 mb tif)
figure s8 biological process go terms of profile 0.
functional classification of the de genes was performed according to go biological processes. a p-value of <0.05 in the two-sided fisher's exact test was selected as the significance criterion. these de genes were sorted by the enrichment of go categories. the vertical axis is the go category and the horizontal axis is the enrichment of go. found at: doi:10.1371/journal.pone.0011377.s008 (2.78 mb tif)
figure s9 biological process go terms of profile 6. functional classification of the de genes was performed according to go biological processes. a p-value of <0.05 in the two-sided fisher's exact test was selected as the significance criterion. these de genes were sorted by the enrichment of go categories. the vertical axis is the go category and the horizontal axis is the enrichment of go.
[1] epidemiology of porcine reproductive and respiratory syndrome (prrs): an overview
[2] isolation and identification of porcine reproductory and respiratory syndrome (prrs) virus from aborted fetuses suspected of prrs
[3] emergence of porcine reproductive and respiratory syndrome in sweden: detection, response and eradication
[4] assessment of the economic impact of porcine reproductive and respiratory syndrome on swine production in the united states
[5] nidovirales: a new order comprising coronaviridae and arteriviridae
[6] rapid differential detection of classical and highly pathogenic north american porcine reproductive and respiratory syndrome virus in china by a duplex real-time rt-pcr
[7] construction of infectious cdna clones of prrsv: separation of coding regions for nonstructural and structural proteins
[8] emergence of fatal prrsv variants: unparalleled outbreaks of atypical prrs in china and molecular dissection of the unique hallmark
[9] emergence of a highly pathogenic porcine reproductive and respiratory syndrome virus in the mid-eastern region of china
[10] highly virulent porcine reproductive and respiratory syndrome virus emerged in china
[11] challenges for porcine reproductive and respiratory syndrome virus (prrsv) vaccinology
[12] the challenge of prrs immunology
[13] porcine reproductive and respiratory syndrome virus-infected alveolar macrophages contain no detectable levels of viral proteins in their plasma membrane and are protected against antibody-dependent, complement-mediated cell lysis
[14] advances in swine biomedical model genomics
[15] interferon-alpha response to swine arterivirus (poav), the porcine reproductive and respiratory syndrome virus
[16] upregulation of il-10 gene expression in porcine peripheral blood mononuclear cells by porcine reproductive and respiratory syndrome virus
[17] upregulation of interleukin-10 gene expression in the leukocytes of pigs infected with porcine reproductive and respiratory syndrome virus
[18] molecular responses of macrophages to porcine reproductive and respiratory syndrome virus infection
[19] genomewide transcriptional response of primary alveolar macrophages following infection with porcine reproductive and respiratory syndrome virus
[20] effect of porcine reproductive and respiratory syndrome virus on porcine alveolar macrophage function as determined using serial analysis of gene expression (sage)
[21] gene expression analysis goes digital
[22] rna processing and its regulation: global insights into biological networks
[23] prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity
[24] sequencing technologies - the next generation
[25] toward the 1,000 dollars human genome
[26] protective immunity induced by a recombinant pseudorabies virus expressing the gp5 of porcine reproductive and respiratory syndrome virus in piglets
[27] the significance of digital gene expression profiles
[28] cluster analysis of gene expression dynamics
[29] optimal gene expression analysis by microarrays
[30] gene ontology: tool for the unification of biology. the gene ontology consortium
[31] a qtl resource and comparison tool for pigs: pigqtldb
[32] mobilizing forces-cd4+ helper t cells script adaptive immunity
[33] cd8(+) t lymphocyte mobilization to virus-infected tissue requires cd4(+) t-cell help
[34] viral evasion and subversion of pattern-recognition receptor signalling
[35] lipid rafts play an important role in the early stage of severe acute respiratory syndrome-coronavirus life cycle
[36] hepatitis c virus hijacks host lipid metabolism
[37] common and divergent immune response signaling pathways discovered in peripheral blood mononuclear cell gene expression patterns in presymptomatic and clinically apparent malaria
[38] the caspase-1 inflammasome: a pilot of innate immune responses
[39] recognition of rna virus by rig-i results in activation of card9 and inflammasome signaling for interleukin 1 beta production
[40] heat-shock proteins induce t-cell regulation of chronic inflammation
[41] post-transcriptional regulons coordinate the initiation and resolution of inflammation
[42] sialoadhesin and cd163 join forces during entry of the porcine reproductive and respiratory syndrome virus
[43] cytokine determinants of viral tropism
[44] innate immune modulation by rna viruses: emerging insights from functional genomics
[45] perforin-mediated target-cell death and immune homeostasis
[46] a review of evidence for immunosuppression due to porcine reproductive and respiratory syndrome virus
[47] deep sequencing of the zebrafish transcriptome response to mycobacterium infection
[48] a microrna catalog of the developing chicken embryo identified by a deep sequencing approach
[49] lipid raft disruption by cholesterol depletion enhances influenza a virus budding from mdck cells
[50] jnk phosphorylation, induced during dengue virus infection, is important for viral infection and requires the presence of cholesterol
[51] immunopathogenesis of coronavirus infections: implications for sars
[52] interferon-inducible antiviral effectors
[53] interferons and viral infections
[54] in vivo and in vitro interferon (ifn) studies with the porcine reproductive and respiratory syndrome virus (prrsv)
[55] differential production of proinflammatory cytokines in the pig lung during different respiratory virus infections: correlations with pathogenicity
[56] aberrant innate immune response in lethal infection of macaques with the 1918 influenza virus
[57] interferon regulatory factor-3 activation, hepatic interferon-stimulated gene expression, and immune cell infiltration in hepatitis c virus patients
[58] global suppression of the host antiviral response by ebola- and marburgviruses: increased antagonism of the type i interferon response is associated with enhanced virulence
[59] porcine reproductive and respiratory syndrome virus non structural protein 1β modulates host innate immune response by antagonizing irf3 activation
[60] inhibition of retinoic acid-inducible gene i-mediated induction of beta interferon by the ns1 protein of influenza a virus
[61] cd163 expression confers susceptibility to porcine reproductive and respiratory syndrome viruses
[62] modulation of cd163 receptor expression and replication of porcine reproductive and respiratory syndrome virus in porcine macrophages
[63] blocking il-1 in systemic inflammation
[64] infection, fever, and exogenous and endogenous pyrogens: some concepts have changed
[65] caterpillers, pyrin and hereditary immunological disorders
[66] inflammasome adaptors and sensors: intracellular regulators of infection and inflammation
[67] increased proteolytic activity and matrix metalloprotease expression in lungs during infection by porcine reproductive and respiratory syndrome virus
[68] increased production of proinflammatory cytokines following infection with porcine reproductive and respiratory syndrome virus and mycoplasma hyopneumoniae
[69] innate immune responses to replication of porcine reproductive and respiratory syndrome virus in isolated swine alveolar macrophages
[70] severe acute respiratory syndrome: clinical outcome and prognostic correlates
[71] sars coronavirus and innate immunity
[72] the t cell chemoattractant ifn-inducible protein 10 is essential in host defense against viral-induced neurologic disease
[73] porcine reproductive and respiratory syndrome virus infects mature porcine dendritic cells and up-regulates interleukin-10 production
[74] host gene expression profiling in pathogen-host interactions
[75] altered p38 mitogen-activated protein kinase expression in different leukocytes with increment of immunosuppressive mediators in patients with severe acute respiratory syndrome
[76] in vivo cytokine response to experimental feline infectious peritonitis virus infection
[77] complement and its role in innate and adaptive immune responses
[78] induced apoptosis supports spread of adenovirus vectors in tumors
[79] viral appropriation of apoptotic and nf-kappab signaling pathways
[80] porcine reproductive and respiratory syndrome virus modulates apoptosis during replication in alveolar macrophages
[81] porcine reproductive and respiratory syndrome virus induces apoptosis through a mitochondria-mediated pathway
[82] global host immune response: pathogenesis and transcriptional profiling of type a influenza viruses expressing the hemagglutinin and neuraminidase genes from the 1918 pandemic virus
[83] mitochondria and apoptosis
[84] pathogenesis of influenza virus-induced pneumonia: involvement of both nitric oxide and oxygen radicals
[85] ifn-alpha treatment enhances porcine arterivirus infection of monocytes via upregulation of the porcine arterivirus receptor sialoadhesin
[86] differential type i interferon activation and susceptibility of dendritic cell populations to porcine arterivirus
[87] porcine reproductive and respiratory syndrome virus productively infects monocyte-derived dendritic cells and compromises their antigen-presenting ability
[88] apoptosis and caspases regulate death and inflammation in sepsis
[89] death-defying immunity: do apoptotic cells influence antigen processing and presentation?
we thank the beijing genomics institute (bgi) shenzhen and genminix informatics ltd., co. for providing technical assistance with dge and bioinformatics analysis.
key: cord-018647-bveks6t1 authors: butnariu, monica; butu, alina title: plant nanobionics: application of nanobiosensors in plant biology date: 2019-10-01 journal: plant nanobionics doi: 10.1007/978-3-030-16379-2_12 sha: doc_id: 18647 cord_uid: bveks6t1 nanobiosensors (nbss) are a class of chemical sensors which are sensitive to a physical or chemical stimulus (heat, acidity, metabolism transformations) that conveys information about vital processes. nbss detect physiological signals and convert them into standardized signals, often electrical, to be quantified from analog to digital. nbss are classified according to the transducer element (electrochemical, piezoelectric, optical, and thermal) in accordance with biorecognition principle (enzyme recognition, affinity immunoassay, whole sensors, dna). nbss have varied forms, depending on the degree of interpretation of natural processes in plants. plant nanobionics uses mathematical models based on qualitative and less quantitative records. nbss can give information about endogenous concentrations or endogenous fluxes of signaling molecules (phytohormones). the properties of nbss are temporal and spatial resolution, the ability of being used without significantly interfering with the system. nbss with the best properties are the optically genetically coded nbss, but each nbs needs specific development efforts. nbs technologies using antibodies as a recognition domain are generic and tend to be more invasive, and there are examples of their use in plant nanobionics. through opportunities that develop along with technologies, we hope that more and more nbss will become available for plant nanobionics. the main advantages of nbss are short analysis time, low-cost tests and portability, real-time measurements, and remote control. plants are a fascinating research topic if we relate to environmental stress. 
because they are physically stuck in specific spots, the plants have to handle in that site, regardless of the environmental conditions. moving to another place is not an option. but what plants can do is to modify the internal "environment," and plants are "true masters" of manipulating their metabolism to deal with environmental disturbances. this feature is one of the reasons why plants are useful in various research; we can rely on them as "sensitive indicators" of environmental changes, even in completely new environments. in the absence of normal conditions, plants cannot use the classical pathways of metabolism, so they need to identify other solutions. this is what happens when plants adjust their metabolism for regulating gene activation, thus producing more or less proteins that are useful or not in the new environmental conditions. the different parts of the plant come with their own genetic regulation strategies. a number of genes that are involved in the creation and remodeling of cell walls are activated differently in growing plants. other genes with a role in identifying light, which are normally active on the leaves, are active at the root level. in leaves, many genes associated with transmission of hormonal information are suppressed, and genes associated with insect protection are more active. these trends are also seen in the (higher) number of proteins involved in message transmission, cell wall metabolism, and plant protection. these patterns of gene and protein functioning indicate that in unfavorable conditions, the plants respond by weakening the cell walls and creating new ways to understand the environment. it is possible to monitor changes in the genes in real time by labelling certain proteins with fluorescent elements. plants modified with fluorescent proteins can give useful information about how they respond to the environment. these modified plants act as biological sensors (nbss). 
specialized cameras and microscopes allow us to monitor how the plant uses these fluorescent proteins (hamers et al. 2014) . chemical or biological nbs functions on the principle of signal emission (voltage or electrical, photonic) in response to a chemical reaction involve a chemical or biological receptor, r (macrocyclic ligand, antibody enzyme), that binds to a specific target molecule of a sample to be studied, the analyte, a. signal transmission is achieved by coupling with a transducer t that interfaces nbs processes with the processing-transform unit in a measurable signal. analysis of signals in plant nanobionics aims at processing signals recorded by measurements in order to extract the maximum of useful information for diagnostics and these devices are mostly used in genetic engineering in agriculture, where it is necessary to know the mechanisms of reaction and the affinity of enzymes and microorganisms for different substrates of interest and signaling molecules. a biochip is a device that contains a structure of individual sensory elements (nbss) interconnected by functions and recognition specifications, integrated on a chip. the number of nbss on a chip may be of the order of 10 6 units. in this degree of integration, a set of distinct tests can be performed extremely quickly and efficiently. in contrast to microchips, biochips are not electronic structures (they contain different electronic structures coupled to nanobiosensory elements). each nbs can be thought of as a "microreactor" that performs a specific chemical reaction with an analyte. nbss from biochips can be designed to detect a wide variety of analytes, including dna, antibodies, proteins, and biomolecules. 
the advantages are multiple: sensors can be produced in batches or sequences that can be assembled in parallel or serial, providing a high manufacturing yield; sensors can be assembled on very small areas with reduced distances between them; 3d structures can be generated providing high signals besides 2d structures; any type of biochemical reaction may be incorporated; nbss can be produced separately and subsequently assembled according to the specificity and nature of the application. the important features regardless of industry or technology are selectivity, sensitivity, and stability in the design of sensory systems integrated with structures and arrangements of sensory elements (wan salim et al. 2013) . one of the integrated systems including rotational aseptic sampling is a robotic fluid and reusable electrodes formed by ink-jet printing injection system. the system contains an enzyme electrode with immobilized gox in gel, and the detection of hydrogen peroxide is carried out on a rhodinised carbon electrode (rh coating). although the enzyme electrode has stability and efficiency characteristics, the problem of automated sample monitoring and sampling in an integrated system requires multiparameter optimization due to reciprocal interferences. there is a requirement on specific domains (environment and genetic engineering) of highly performing integrated systems that work in in vivo conditions, such as dialysis, the use of biointerfaces, evanescent techniques, and atomic force microscopy to grasp in the depth of the biological phenomena (identifying and understanding the interaction of proteins). the in vivo exploitation of detection systems for both glucose and lactate was confirmed by the efficacy of using phospholipid copolymers and improving hemocompatibility. 
immunosensors provide an example for the development of integrated systems where microseparations, chromatographic methods, and electrochemical couplings with optical detectors are incorporated, which ultimately lead to a miniaturized system. there are examples where the level of integration and miniaturization becomes more pronounced (dna-nanotube or biochips-biointerfaces). nbss are expected to be widely used in plant nanobionics where physiological/biochemical parameters should be identified. advanced ink-jet technology has developed methods for analyzing nanoliter fractions on a three-dimensional nbs surface at a speed of 6 m/s. it is expected to produce one million nbss/cm 2 areas using photolithography, contact fingerprinting or self-assembling techniques, and adsorption/desorption under the laser beam that allows "writing" proteins on the surface to be analyzed with great precision. laser techniques, maple (matrix assisted pulsed laser evaporation) or dw (direct writing) approached for immobilizing biological materials on substrates, are in the laboratory stage but have prospects for use as molecular imprinting methods (potocký et al. 2014 ). whether nbss are individuals, in integrated systems, or areas of nbss, all are characterized by unique parameters such as sensitivity and detection limit for a range of analytes. trace detection of various analytes (indicators, additives, contaminants) with sufficient sensitivity and safety is the basic criteria of an nbs to be used. the detection limit in the laboratory is pushed to an atom when the atomic force microscopy is used. thus, the enzymatic electrodes, studied and continuously perfected, use palliative such as concentrating the analyte of interest, which leads to major design and miniaturization difficulties. nbss for phenol vapors were identified, where the phenoloxidase was immobilized on a glycerol gel with a range of interdigitated electrodes. 
phenol vapors are directly partitioned into the gel and oxidized to quinone. signal amplification was improved by redox amplification of the quinone/catechol couple to obtain a reasonable sensitivity resulting in a detection limit of 30 ppb phenol. this principle can be extended to other carbon compounds up to the ppt (parts per trillion) limits. dna structures have been studied as potential receptors. sandwich structures of liquid crystal dispersions and dna-polycation complexes have been studied with relatively good success in identifying different analytes. the polycation with the role to maintain structural integrity of dna and dna-protamine complexes allows detection of hydrolysis of the trypsin enzyme to the detection limit of 10 −14 m. elimination of the polycation leads to an increase in the distance between the two dna strands resulting in the appearance of an intensive band in the circular dichroism spectrum due to the texture modification (espinoza et al. 2014 ). from a practical standpoint, the disadvantage is their inherent instability. different strategies have been approached to improve longevity and preserve the structure of biological receptors. immobilization in matrices by sol-gel technique for glucose nbss is one of the strategies; fluorescence indicators are used: hexahydrate chloride (2,2′-bipyridyl) ruthenium (ii) and 1-hydroxypyrene-3,6,8 trisulfonic acid. in addition to the optical property improvement of the gel, the stability of the gox enzyme has also improved. other examples are the case of monooxygenase used in hydrocarbon detection; detection of organic halides with metalloporphyrins; and detection of carbon tetrachloride, haloalkane (perchloroethylene), and insecticides (ddt) (kazakova et al. 2013 ). improving the selectivity of an nbs can be approached on two levels: through direct transducer-biological receiver interfacing to reduce interference and new receptors with improved affinity or new affinity capabilities. 
selectivity is a key parameter that requires the performance of an nbs. pyrroloquinoline is used as a mediator in a glucose oxidase enzyme electrode for the measurement of glucose in the raw or elaborated cavity. alternatively, the electrocatalytic detection of the reaction products resulting from the enzymatic reactions can be improved by chemically modified electrodes such as rhodinised electrodes or hexacyanoferrate modified carbon electrodes. prussian blue is used to modify the electrode surface at amperometric detection of oxygenated water at both oxidation and reduction potentials for the enzyme electrode in lactate and glucose detection. one solution is to identify redox centers of the enzyme via a molecular wire to perform the electron transfer to the electrode (enzymes linked by molecular wires), but the concerns have focused on immobilized mediators on different polymer chains. molecular wires are regarded as intermediates in long-range electron transfer, consisting of two pyridine groups linked by thiophenes with different lengths. such wires can be used in conjunction with self-assembly techniques to produce an isolated electrode that transfers electrons to predetermined molecular pathways (jones et al. 2013 ). an ideal nbs is a device that will detect an "analyte," the subject under analysis and which is present in a given sample. most samples also contain other analytes which may interfere with the nbs response. it must have a specific selectivity to identify the target analyte. it is necessary to design nbss with selectivity for an analyte with the ability to discriminate the interferences produced by the other components in the analyzed sample. specific identification and selectivity capacity are the key components of molecular recognition. molecular recognition is accomplished by the sensor component of a host molecule (host-chemoreceptor) that binds selectively to the "guest" target molecule/guest molecule that needs to be identified. 
for each "host-guest" system, there is a specific chemical reaction among the multitude of possible reaction channels. once the host-guest response has been identified, the host molecule is immobilized or incorporated into the nbs, typically on a membrane or electrode in contact with the transducer. finally, a way has to be found to signal, through the transducer, that the binding/recognition event has occurred (rodríguez-sevilla et al. 2014). one of the requirements for molecular recognition is the existence of groups or centers with specific reactivity in the host molecule that can "enclose" or bind ions, atoms, molecules, and biomolecules. all living organisms use enzymes, which are proteins that contain "pockets," active centers, designed to recognize a specific analyte. this means that only a specific analyte is able to enter the enzyme pocket. enzymes can be used in nbss as host receptors with molecular recognition capability but are unstable. to design host molecules that can be used in an nbs, the following criteria are considered: the host molecule must be stable under the conditions in which it is to be used, must be able to selectively bind the analyte in the sample, must be able to be immobilized in a film/membrane that is in contact with the sample, must signal that a host-guest binding response has occurred, and must ideally release the analyte after detection so that the host is free to be reused. biological receptors include antibodies, membrane receptors, signaling molecules, enzymes, ribosomes, lectins, phytohormones, etc. they bind analytes using "lock-and-key" molecular recognition mechanisms (identification-immobilization). biological receptors are not practical solutions for many applications because their specificity, sensitivity, and stability cannot be optimized. artificial receptors are immobilization environments that can be optimized by molecular design for any type of application.
synthetic receptor design and synthesis are based on tools developed by proteomics and genetic engineering, producing recognition components that can respond to the occurrence and identification of metabolic deficiencies in plant nanobionics. there are platforms and areas of artificial receptors based on combinatorial mathematical techniques, interface biology, and surface chemistry. they have induced the development of various artificial receptor environments with rapid and diversified selection for any target analyte. the current technique for producing synthetic receptors is called cara (combinatorial array of receptor analysis) (fang et al. 2015) . supramolecular chemistry has developed a wide range of synthetic macrocycles. the most common feature for macrocycle classes is that they contain cavities that act as host pockets for guest molecules. the selectivity of the hosts can be done in "read mode" by varying the size of the preformed cavities. 12-crown-4 has a small cavity ideal for the binding of small ions such as li + , while 18-crown-6 has a large cavity that fits better with larger ions such as k + . the size of the cavities is important for the selectivity of the host, but the question remains what attracts an ion or molecule into a preformed cavity and which factors stabilize the host-guest complex. in enzymes, weak noncovalent interactions (hydrogen bonding, electrostatics, dipole-dipole, van der waals, etc.) are used to link the guest into the enzyme pocket (interactions stabilize host-guest interaction). macrocycles contain polar functionalities, capable of interacting with guests via hydrogen bonding, electrostatic interactions, and dipole-dipole interactions. it is desirable that bonding in the cavity is not strong, because it is important that the analyte, the guest, is released from the host after it has been detected and measured. 
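the size-matching argument for crown ethers can be illustrated with a minimal sketch. the cavity and ion diameters below are approximate textbook values that we assume for illustration; they are not taken from this chapter, and the function name `best_host` is hypothetical. the sketch considers size fit only and ignores the noncovalent interactions discussed above.

```python
# Approximate textbook diameters in angstroms (illustrative assumptions,
# not values reported in this chapter).
CAVITY_DIAMETER = {
    "12-crown-4": (1.2, 1.5),
    "15-crown-5": (1.7, 2.2),
    "18-crown-6": (2.6, 3.2),
}
ION_DIAMETER = {"Li+": 1.52, "Na+": 2.04, "K+": 2.76, "Cs+": 3.34}


def best_host(ion: str) -> str:
    """Return the crown ether whose cavity range is closest to the ion size."""
    d = ION_DIAMETER[ion]

    def mismatch(host: str) -> float:
        lo, hi = CAVITY_DIAMETER[host]
        if lo <= d <= hi:
            return 0.0  # ion fits inside the cavity range
        return min(abs(d - lo), abs(d - hi))

    return min(CAVITY_DIAMETER, key=mismatch)


if __name__ == "__main__":
    for ion in ION_DIAMETER:
        print(ion, "->", best_host(ion))
```

with these assumed values, li+ pairs with 12-crown-4 and k+ with 18-crown-6, in line with the examples given in the text.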
crown ethers and calixarene are ideal for bonding metal cations, based on the size of their cavity, but also on the high density of electrons present on the oxygen atoms in the cavity. the base compound selectively binds li + to other metallic cations; the modified version of the base macrocycle has a good selectivity for na + . by synthetic modification it is possible to increase the capacity of the host cavity, and new functionalities can be introduced that will favor the binding of specific molecules and ions (da silva et al. 2013) . other modified calixarenes, which demonstrate this principle, are the group of tetraphosphine oxide of the calix[4]arene. by changing binding groups on the same template, calix[4]arene from esters in phosphorus hydrogen oxides selectivity changes from na + to ca 2+ . by increasing the number of repeatable units in esters and phosphorus hydrogen oxides at six, the cavity increases, and the selectivity changes in favor of the higher cs + and pb 2+ cations, respectively. some host compounds have been developed for the selective detection of low molecular weight compounds. an example involves the use of the tetra (s-propanol) calix[4]arena containing four lateral chiral halves for selective differentiation of the phenylalanyl enantiomers. other techniques of supramolecular chemistry may be involved in the synthesis of synthetic receptors that simulate the properties of enzymes. the basic structures that can be modified are porphyrins, semiconductor polymers of the tetrathiafulvalene (ttf) class, and ppvs. other patterns can be considered modified polysaccharides. a linear archetype is the polyanilines containing the two types of redox states (meyer et al. 2007 ). among the multiple nbs classifications, bioaffinity has a range of applications, and antigen-antibody interactions (ag-ab) play a role and are considered to be an instrument in the development of molecular recognition principles. in vivo ag-ab interactions are reversible. 
factors that condition the ag-ab interaction are the structural complementarity between the antigenic determinant and the antibody combining site; this is the exclusive factor of the specificity of the reaction; the structural complementarity involves the conformational adaptation of the two reacting groups and was conceived in structural terms on the key-lock principle; the chemical complementarity of the reaction groups is the consequence of structural complementarity and signifies the entry into action of intermolecular forces that stabilize and consolidate the interaction of the two groups. the formation of intermolecular bonds requires the existence of atomic groups sufficiently close to the two molecules. the distance between them is inversely proportional to the degree of complementarity. although structural complementarity is not strictly obligatory, higher spatial matching is more conducive to interaction. it is expressed by the congruent of contact surfaces that provide intermolecular attraction forces that stabilize the complex. the ag-ab interaction involves the following types of noncovalent bonds: h bonds, electrostatic forces, van der waals linkages, and hydrophobic bonds. all are nonspecific forces of low value and their nature makes the reaction reversible. h bonds form when two atoms share an atomic h nucleus (one proton). the common proton is found between two atoms of n or o or between n and o. the h nucleus is covalently bonded to one of the two atoms (n or o). the h bond has a binding energy of 3-7 kcal/mol. intermolecular forces are involved in ag-ab complex formation. the action of these forces requires close contact between the two reaction groups. the h bonds result from the formation of an h bridge between two nearby atoms. the electrostatic forces are due to the attraction of the ionic groups with opposite charges located at the periphery of the two protein chains. 
the van der waals forces result from the interaction between different electron clouds, represented in the form of oscillating dipoles. the van der waals interactions, the weakest interaction forces, are active only at very small distances between the reaction groups. the binding energy is 1-2 kcal/mol. van der waals bonds are not based on a permanent separation of electrical charges but on charge fluctuations induced by the proximity of molecules. at intermolecular distances, instantaneous electric fields are formed, with a polarizing effect on neighboring molecules. between nearby atoms, there is a mutual attraction force arising from the fluctuating dipole that one atom induces in its neighbor (dispersion forces). their intensity depends on the distance between the groups involved and is inversely proportional to the seventh power of the distance. their value is optimal at 1-2 å. hydrophobic bonds, which can contribute up to half of the ag-ab binding force, are produced by the association of nonpolar, hydrophobic groups, whereby water molecules are excluded. the optimum distance between the reactive groups varies with the type of bond. the electrostatic forces (coulomb or ionic) are the result of the attraction between atoms or groups of atoms with opposite electric charges located on the two reacting groups: between a cation (na+) and an anion (cl−) or between coo− and nh3+ (agrawal et al. 2012). the binding energy of these forces is significant at very small (less than 100 å) distances between the reaction groups. exact juxtaposition of ions favors the action of these forces. the binding energy is 5 kcal/mol and varies inversely with the square of the distance between the two reaction groups (1/d^2). hydrophobic (or apolar) linkages occur between nonpolar (nonionized) groups in aqueous solutions and are the consequence of the tendency to exclude the ordered water molecules between the antigen and antibody molecules.
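the two distance laws quoted above (inverse seventh power for dispersion forces, inverse square for electrostatic forces) can be compared with a short sketch. the function names and the reference distance are hypothetical; the code only computes relative scaling and makes no claim about absolute force values.

```python
def relative_dispersion(d: float, d_ref: float = 1.0) -> float:
    """Dispersion interaction at distance d relative to d_ref (1/d^7 law)."""
    return (d_ref / d) ** 7


def relative_electrostatic(d: float, d_ref: float = 1.0) -> float:
    """Electrostatic interaction at distance d relative to d_ref (1/d^2 law)."""
    return (d_ref / d) ** 2


if __name__ == "__main__":
    # Distances in angstroms, relative to a 1 angstrom reference.
    for d in (1.0, 2.0, 4.0):
        print(d, relative_dispersion(d), relative_electrostatic(d))
```

doubling the distance reduces the dispersion term to 1/128 of its reference value but the electrostatic term only to 1/4, which is why van der waals interactions matter only when the reaction groups are in very close contact.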
these linkages are favored by amino acids with apolar groups that tend to associate, reducing the number of water molecules in their vicinity. by removing the water molecules between the reaction groups, the distance between the active sites decreases markedly, and the value of the stabilizing forces increases. taken on their own, neither spatial complementarity nor intermolecular forces are sufficient to form stable relationships. for the stability of the ag-ab interaction, both conditions are required. the higher the binding energy of the reactants, the more stable the ag-ab complexes. the interaction of the antigen- and antibody-reactive groups is defined by two parameters: the affinity and avidity of the antibodies. measurement of antibody affinity can be achieved by dialysis at equilibrium. the ag-ab interaction is reversible. within the dialysis bag, the hapten is partially free and partially bound to the antibodies, depending on the affinity of the antibodies. only the free hapten can diffuse through the membrane of the dialysis bag, and its external concentration will equal the concentration of the free hapten inside the bag (kersten and feilner 2007). measurement of the concentration of the hapten in the dialysis bag allows for the calculation of the amount of antibody-bound hapten. the constant renewal of the buffer results in total dissociation and loss of hapten from the dialysis bag, which indicates the reversible nature of ag-ab binding. affinity of antibodies measures the binding force between an antigenic determinant and the complementary binding site of a specific antibody. affinity is the result of attraction and repulsion forces that mediate the interaction of the two reactants. the strength of these interactions is measured in the reaction between a monovalent antigen (hapten) and specific antibodies.
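the equilibrium dialysis calculation described above can be sketched as follows. the bound hapten is the total concentration inside the bag minus the free concentration measured outside. the association constant formula assumes simple 1:1 mass-action binding to independent sites, which is our assumption and is not stated in the text; all function names and numerical values are hypothetical.

```python
def bound_hapten(inside_total: float, outside_free: float) -> float:
    """Antibody-bound hapten: total inside the bag minus free hapten.

    At equilibrium the free hapten concentration is the same on both
    sides of the membrane, so the outside value gives the free fraction.
    """
    return inside_total - outside_free


def association_constant(bound: float, free: float, total_sites: float) -> float:
    """K = [bound] / ([free] * [unoccupied sites]), assuming 1:1 binding."""
    return bound / (free * (total_sites - bound))


if __name__ == "__main__":
    # Hypothetical concentrations in mol/L.
    inside, outside, sites = 15e-6, 5e-6, 20e-6
    b = bound_hapten(inside, outside)
    print("bound hapten:", b)
    print("K (L/mol):", association_constant(b, outside, sites))
```

a larger constant corresponds to a higher affinity antibody; in this hypothetical example the constant is on the order of 10^5 l/mol.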
a high affinity interaction requires perfectly complementary structures, while imperfect complementarity of the reaction groups causes a low affinity, since the attraction forces are active only at very small distances and are diminished by the repulsion forces. complexes formed by antibodies with high affinity are rapidly eliminated from the circulation without adverse effects on renal function. the ag-ab interaction is permanently characterized by the formation and breaking of various types of intermolecular bonds. in vivo, probably all ag-ab reactions are reversible, but secondary in vitro reactions (agglutination, precipitation), under the conditions of reagent balance, are irreversible (sakamoto et al. 2018). it is essential that the host-guest binding event (the receptor-analyte interaction, r-a) is detected. it is thus necessary to have a way of identifying the signal from the receptor-analyte interaction and transducing it to the outside to be processed. the component that does this is generally called a transducer. the transducer must be in contact with the receptor or with the membrane in which the receptor is immobilized. electrode and interfacial interactions are decisive in capturing the signal from an r-a interaction and transforming it into an electrical or photonic signal. there are several ways of identifying the r-a event, collecting the signal, and transducing it as an external signal. the way the signals are identified and transduced defines the type of nbs. for example, immobilizing the receptor on an electrochemical transducer that measures a current (amperometric method) or a voltage (potentiometric method) between two electrodes gives an electrochemical nbs. if r is immobilized on an optical component, the result is an optical nbs (optical fiber, fluorescence, absorption, surface plasmon resonance (spr)).
in most electrochemical nbss, detection requires the membrane containing the host molecule to be placed on the surface of an electrode, which produces an electrochemical response when the guest binds. the approach works well when the target analytes are charged species such as metal cations. neutral molecules cannot be detected by electrochemical transduction, so optical detection methods have been successfully used instead. a chiral calixarene host contains fluorescent naphthyl units. upon binding to the guest, fluorescence is attenuated as a result of interaction between the naphthyl-phenyl groups in the host or analyte. the fluorescence attenuation is proportional to the concentration of the analyte. optical methods are used because they offer greater sensitivity than electrochemical techniques. in another example, the host compound does not exhibit fluorescence in the absence of the guest analyte, because the pyrenyl substituent can come into contact with the adjacent nitrophenyl substituent, and their interaction quenches the fluorescence. however, in the presence of na+ ions, fluorescence is observed, because the na+ ion enters the cavity and binds to the oxygen atoms of the phenoxy and carbonyl groups in the host. this binding induces a more rigid conformation that moves the nitrophenyl groups away from the pyrene and so prevents fluorescence attenuation (gaggeri et al. 2012). an nbs is a bioelectronic analysis system that combines a transducer with a biological component with which it is in a specific interdependence. nbss use biological systems with different levels of recognition of the substances to be determined. the first step in this interaction is the formation of the specific complex of the biologically active, immobilized substance r (the receptor, the substrate with the sensitive biological component) with the analyte a (often defined as the chemical signal).
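because the fluorescence attenuation is described above as proportional to the analyte concentration, an unknown concentration can in principle be read from a linear calibration. the sketch below fits a straight line by ordinary least squares to calibration points and inverts it. the calibration data and function names are hypothetical and serve only to illustrate the principle.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_attenuation(attenuation, slope, intercept):
    """Invert the calibration line to estimate analyte concentration."""
    return (attenuation - intercept) / slope


if __name__ == "__main__":
    # Hypothetical calibration: concentration (arbitrary units) vs.
    # measured fluorescence attenuation (arbitrary units).
    conc = [0.0, 1.0, 2.0, 3.0]
    atten = [0.0, 2.0, 4.0, 6.0]
    slope, intercept = fit_line(conc, atten)
    print(concentration_from_attenuation(5.0, slope, intercept))
```

in practice the proportionality holds only over a limited concentration range, so the calibration points should bracket the expected sample values.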
table 12 .2 summarizes specific patterns of nbss in relation to the nature of the receptor and the chemical/biochemical signal. there are two general classes of nbss that are based on the bioaffinity response between r and a that alters the distribution of electrical charges that can be measured with specific transducers or consuming the substrate by a specific reaction. the biological constituent of the molecular recognition element (r) is represented by various active species that can be enzymes or enzymatic systems, antibodies (ab) or antigens (ag), receptors, populations of bacteria or eukaryotic cells, tissue fragments, and sometimes even signaling molecules. analytes or substances that can be analyzed (a) are glucose or other sugars, amino acids, alcohols, lipids, and nucleotides. they can be identified by their specific interaction, or their concentration can be measured by various methods. both r and a represent distinct molecular species with high macromolecular specialization (antibodies, antigens, enzymes, receptors, etc.) or are complex systems (cells, tissues, etc.) (kersten and feilner 2007) . after the active biological component, they can be grouped as follows: • enzymatic nbss: enzymes are energy proteins characterized by their catalytic function. modified substrate molecules lead to oxidation, reduction, and hydrolysis reactions that can be measured by enzymatic nbss. enzymatic nbss produce a linear response depending on the substrate concentration. • immunosensors: antibodies are glycoproteins produced by the immune system when an external substance, antigen, is involved. it is theoretically possible to produce antibodies without identifying an antigen. an immunosensor is a high sensitivity nbs. the principle of operation is based on the ag-ab interaction of molecular recognition. 
• nbss with receptors: the regulation of biological processes is ensured by highly sensitive molecular processes based on the specialization of structural proteins (receptors) capable of recognizing a number of physiological signals. this is the case for neurotransmitters, whose action is mediated by receptors in the plasma membrane at target sites or target cells. activation of the biologically active site occurs via ion channels. the acetylcholine receptor was the first receptor identified in neurotransmission.
• nbss based on cells or tissues: measurement of molecular species is not limited to the interaction with the compounds to be analyzed; the transformations that occur can also be measured through the resulting products. it is desirable to operate with cell populations whose major metabolic pathways are known. a relevant example is the l-arginine nbs, which combines a population of streptococcus faecium bacterial cells with an ammonia electrode: arginine is metabolized by the microorganisms. such complex reactions are difficult to obtain outside cellular structures. in the same way that cell populations are used as sensitive elements, fragments or parts of plant tissues may be used. the advantage is then greater, because the cells remain in their natural arrangement and no extra effort is needed to keep them viable. for adenosine nbss, a tissue-based biosensitive element has been proposed. for dopamine nbss, specialists have focused on the pulp of the banana fruit, which is considered to have remarkable biocatalytic properties.
• nbss with redox proteins: redox proteins are involved in biochemical processes such as cellular respiration and photosynthesis (kersten and feilner 2007).
catalytic nbss use enzymes, microorganisms, or cells to catalyze a reaction with a target substance. affinity nbss use antibodies, receptors, and nucleic acids that bind to a target substance. the reactions are quantified by electrochemical, optical, evanescent-wave, and other transducers.
the main types of known redox proteins are as follows: cytochromes, which contain iron in the prosthetic group (cytochrome c is involved in the transfer of electrons in mitochondria); ferredoxins, which contain iron and sulfur ions in dimeric clusters (2fe-2s) in chloroplast ferredoxin and in tetrameric clusters, 2 (2fe-2s), in bacterial ferredoxin, and which are involved in photosynthesis and in nitrogen fixation, respectively; blue proteins, such as plastocyanin and azurin, which contain copper bound to a cysteine residue in a tetrahedral structure and mediate electron transfer in photosynthesis and possibly in nitrite reduction; and flavoproteins, such as flavodoxins, which contain an organic conjugated prosthetic group and are involved in electron transfer (agrawal et al. 2012). these proteins play their role in nature owing to the location of the redox centers relative to their surface. the subtle architecture of the molecules gives them selectivity and specificity in their interaction with other proteins or enzymes, as the structure of cytochrome c illustrates. the porphyrin iron (heme) is located at the center of the molecule and is well covered or hidden, being exposed to the solvent over only 0.06% of the total molecular surface. the protein has a positive potential of +9 mv due to an excess of basic lysine residues. it has a dipole moment of 324 debye, which reflects an imbalance in the spatial distribution of the acidic side chains. a number of lysine residues are distributed around the solvent-exposed part of the heme, which is where the interaction with redox proteins takes place (nelsen et al. 1990). from the above and from table 12.2, we can see that nbss can be classified into two groups according to the biological component. nbss are also classified into three generations. in first-generation sensors, the biocatalyst is attached to the surface of a membrane, and this arrangement is then fixed to the surface of the transducer.
in the second generation, the biologically active component is adsorbed on or covalently attached to the surface of the transducer, which allows the semipermeable membrane to be eliminated. the direct linking of the biocatalyst to the electronic device that translates and amplifies the signal, such as the compact transistor, is the basis of third-generation nbs miniaturization. the generations thus differ in the nature of the immobilization and in the interaction between the three components (the analyte a, the receptor r, and the membrane in contact with the electrodes of the transducer), and the processes in nbss have evolved from one generation to the next. first, specificity and selectivity are dominated by the biological component and are directly related to its nature: enzymes, antibodies, or microorganisms. the specificity derives from the binding of the analyte to the biological component used as the receptor. at the base of this dominant step are the a-r biochemical reaction and the collision process between a and r. second, the transport of the analyte to and through the surface carrying r is also an important factor. this process is the transport of a physical quantity by the typical mechanisms of diffusion, migration, and convection. third, the nbs signal depends on the a-r reaction, which is assumed to proceed at a constant rate. transient states and biochemical pseudo-equilibrium conditions are dominated by the reaction kinetics and by the nature of the transport, which in turn is coupled with the a-r reactions at the immobilized interface. even in the case of a true equilibrium, the reaction rate near the steady state is important in determining the response time. the kinetics of these processes require additional conditions (agrawal et al. 2012).
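the three transport mechanisms named above (diffusion, migration, and convection) are commonly written together as the nernst-planck flux equation. the form below is the standard textbook expression, not one given in this chapter, and the symbols are introduced here only for illustration: $\mathbf{J}_i$ is the flux of species $i$, $D_i$ its diffusion coefficient, $c_i$ its concentration, $z_i$ its charge number, $\phi$ the electric potential, and $\mathbf{v}$ the velocity of the solution.

```latex
\mathbf{J}_i
  = \underbrace{-\,D_i \nabla c_i}_{\text{diffusion}}
  \;\underbrace{-\,\frac{z_i F}{R T}\, D_i\, c_i \nabla \phi}_{\text{migration}}
  \;+\; \underbrace{c_i\, \mathbf{v}}_{\text{convection}}
```

for a neutral analyte ($z_i = 0$) the migration term vanishes, and in an unstirred layer next to the receptor surface ($\mathbf{v} \approx 0$) transport reduces to diffusion alone.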
new types of nanoscale materials with different levels of biocompatibility, a new generation of biocompound cells based on a better understanding of metabolism, and the manipulation of information stored at the molecular level have together led to a generation of nbss with a high level of integration. molecular information initially stored in the basic molecular components can be expressed directly at a higher level, called "supramolecular," where interactions between molecules follow preestablished algorithms, leading to adaptive, functional, and intelligent materials. such materials are built on conceptual, supramolecular, and combinatorial principles. separation, storage, and detection techniques are developed using "biomimetic" membranes that function according to biological models or precise physicochemical principles. the electrochemical dna nbs arose from the need in medical diagnosis to determine segments of a dna sequence quickly and accurately. results from genetics, molecular biology, and nanotechnology have led to one of the most accurate detection methods: electrochemical nbss with dna (combining the principle of the isfet nbs with molecular wires made from nanotubes). the operating principle consists of collecting the signal between two electrodes, a working electrode and a reference electrode. the auxiliary electrodes have a specified role of their own. the sensing mechanism consists of a change in the i-v (current-voltage) characteristic in the presence of a target molecule. carbon nanotubes (cnts) are exceptional materials for the working electrode, with a high electron transfer rate and excellent spatial resolution. the target in a dna nbs is an unknown dna sequence (or oligonucleotide) attached by functionalization to the carboxyl or amino groups of the cnt. researchers have reported enabling plants to capture 30% more energy by implanting carbon nanotubes into chloroplasts, the plant organelles where photosynthesis takes place.
they also managed to modify plants so as to detect nitric oxide by implanting another type of carbon nanotube. "the plants are suitable for the role of a technological platform. they heal themselves, they are durable, they resist the harsh environments and they have their own sources of energy and water." the transformation of plants into self-powered photonic devices, such as detectors of explosives or chemical weapons, is expected (panchal and upadhyay 2014). external or surface electrodes are metal electrodes that contact the bioactive component of the nbs either directly (dry or solid electrodes) or through an electrolyte solution (liquid electrodes). solid or dry electrodes are made of silver, platinum, gold, or nickel. internal electrodes are made of thin wires of a durable metal: stainless steel, platinum, or tungsten. the active part of the electrode can be covered with a metallic conductor layer (gold, silver) and the inactive part with an insulating layer (polymer/thin film). the active contact surfaces of such nbss are large compared to cell sizes and are used for extracellular recordings. microelectrodes are internal electrodes, but they are built to measure potentials in direct contact with the receptor in nbss. the contact surface with r is on the micrometer scale. microelectrodes can be solid, made by depositing a conductive layer (platinum, gold) on a glass support with a particularly fine tip; in another construction variant, a metallic or carbon fiber conductor is inserted into an epoxy resin support mixed with a conductive paste. a further type, used on cellular samples, consists of a glass pipette with a micrometer-sized tip, filled with an electrolyte solution containing potassium chloride. a conducting wire is immersed in the electrolyte solution to pick up the electrical potential (wu et al. 2014). it was long thought that electrons could not be transferred directly between an electrode and proteins because of the distortion of the proteins.
several practical observations led to the conclusion that, in contact with the electrode, the protein is irreversibly adsorbed at the active center of the heme, resulting in denaturation. modifying the gold electrode by surface adsorption of 4,4′-bipyridyl changed the configuration of the electrode surface so that it could interact with cytochrome c. 4,4′-bipyridyl is not electroactive in the potential region of interest and therefore does not act as a mediator. the electrochemistry was possible owing to the quasi-reversible binding of cytochrome c to the gold electrode modified with 4,4′-bipyridyl, in which the lysine residues form hydrogen bonds with the pyridyl nitrogen atoms on the modified electrode surface. the transient protein-electrode complex permits rapid, direct electron transfer, which proceeds by the following scheme: diffusion of cytochrome c to the electrode, binding of the protein to the surface, electron transfer, and desorption of the protein. following this approach, more than 60 surface modifications proved possible for the electrochemistry of proteins at the gold electrode. bifunctional reagents (x-y), in which the x group binds to the electrode through n, p, or s and the y group interacts with the protein, are further examples of the patterns developed (agrawal et al. 2012). the electrochemistry of proteins has been extended to the carbon electrode. the pyrolytic carbon-graphite forms, vitreous carbon and mesocarbon, are structures in which the graphene planes are stacked in an abab hexagonal arrangement or disordered in various turbostratic forms. the basal graphene plane is hydrophobic, but existing or induced defects lead to broken c-c bonds, and oxidation increases the number of c-o linkages. the direct electrochemistry of positively charged proteins can therefore be performed at the edges of the graphite planes of the carbon electrode.
the direct transfer of electrons to negatively charged proteins, such as plastocyanin, at the graphite electrode (plane edges) can be aided by mn, ca, or cr cations, including cr complexed with amino compounds, cr (am6) 3+, which are used as promoters of the reaction. in this context, the promoters are redox-inactive species in solution that nevertheless allow the transfer of electrons to the redox proteins. microminiaturized electrodes have specific advantages, among them improved polarization and improved contact with the biological material at the active sites. specificity and selectivity depend on the biological receptor component of the nbs and on its affinity for the analyte. affinity is a specific feature of enzymes, antibodies, and receptors, and it underlies their functions in living organisms. affinity is based on the chemical coupling between a component and its complementary partner. in the case of high-affinity components, the diffusion process is rapid, leading to the formation of a complex of the ag-ab type. the association reaction specific to molecular recognition is characterized by a first-order rate constant. in nbs measurements it is essential to treat the concentration of one component as constant and the others as variables. the results of the electrochemistry of proteins have been extended to amino acids and peptides (barroso et al. 2015). the addition of enzymes to solutions containing substrate molecules is the essential condition for enzyme-catalyzed reactions. extracting from enzyme science the information needed to develop nbss such as the enzyme electrode is an extremely difficult task. references will be used to outline some of the enzyme properties necessary to describe enzymatic nbss. consider a simple reaction in which a single substrate s combines with the enzyme e to form the intermediate enzyme-substrate complex es (e + s ⇌ es → e + p). this unstable complex undergoes a further reaction that yields the product p.
at steady state, the rates of formation and consumption of the complex are equal. as soon as e and s are brought together, the system is out of balance: the concentration of the complex is initially zero, and its rate of formation is much higher than its rate of consumption. as the reaction proceeds, the concentration of es increases, and with it the rate of disappearance of the complex relative to its rate of formation. initially, the excess of substrate determines the consumption of the enzyme; during the course of the reaction, the constant regeneration of the enzyme brings the system to its steady state. the analysis of these reactions leads to two important conclusions:
• at a low concentration of the substrate, the rate of the reaction is proportional to the substrate concentration and inversely proportional to the michaelis constant, that is, to the ratio between the rate at which the complex disappears (dissociation into the initial reactants plus decomposition into products) and the rate at which it forms.
• at a high concentration of the substrate, the maximum rate is limited by the enzyme concentration.
thus, the two regimes correspond to the two processes that can control the overall reaction rate (stein et al. 2011). when the reaction takes place in a homogeneous solution at a uniform rate, the same throughout the medium, only the change in the concentration of the components over time needs to be considered. three mechanisms of mass transport occur in solution: diffusion, convection, and migration. an electrochemical nbs is an autonomous, self-contained device capable of providing specific quantitative or semiquantitative analytical information using, as the molecular recognition element, a biochemical receptor (biological identification element) that is in direct spatial contact with an electrochemical transducer. electrochemical nbss are distinguished only by the nature of the transducer, regardless of the nature of the biological component, according to the classification in table 12.3.
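the two limiting regimes of the single-substrate enzyme reaction discussed above can be checked numerically. the sketch below assumes the standard steady-state rate law, v = k_cat · [e] · s / (km + s), with km = (k_dissoc + k_cat) / k_form; the function names, parameter names, and numerical values are hypothetical and chosen only for illustration.

```python
def michaelis_constant(k_form, k_dissoc, k_cat):
    """Return Km = (k_dissoc + k_cat) / k_form for E + S <-> ES -> E + P.

    k_form   : rate constant for formation of the ES complex
    k_dissoc : rate constant for dissociation of ES back into E + S
    k_cat    : rate constant for decomposition of ES into E + P
    """
    if k_form <= 0:
        raise ValueError("k_form must be positive")
    return (k_dissoc + k_cat) / k_form


def reaction_rate(s, e_total, k_cat, km):
    """Steady-state rate v = k_cat * e_total * s / (km + s)."""
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km positive")
    return k_cat * e_total * s / (km + s)


if __name__ == "__main__":
    # Illustrative (assumed) values: rate constants and a 1 nmol/l enzyme load.
    k_cat, e_total = 50.0, 1.0e-9
    km = michaelis_constant(k_form=1.0e6, k_dissoc=50.0, k_cat=k_cat)
    vmax = k_cat * e_total  # limit set by the enzyme concentration
    for s in (1.0e-7, 1.0e-4, 1.0e-1):
        v = reaction_rate(s, e_total, k_cat, km)
        print(f"s = {s:.0e} mol/l   v / vmax = {v / vmax:.4f}")
```

with these assumed values the rate grows almost linearly with s while s is far below km, equals half the maximum rate at s = km, and approaches the enzyme-limited maximum when s is far above km.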
owing to its ability to be calibrated repeatedly, an electrochemical nbs is distinguished from a bioanalytical system that requires additional processing steps, such as the addition of a reagent. nbss intended for a single measurement, that is, unable to monitor the analyte concentration continuously or to be regenerated rapidly and reproducibly, are defined as "disposable." nbss are classified according to their biological specificity, with reference to the mechanism that confers it, or according to the mode of transduction of the physicochemical signal (the transducer) (barroso et al. 2015). the biological recognition element is based on a catalyzed chemical reaction or on an equilibrium reaction with macromolecules that have been previously isolated or synthesized, or that are present in their original biological environment. in the case of reversible reactions, the steady state can be reached when there is no net consumption of the analyte by the biocomplexing agent immobilized and incorporated into the nbs. in addition to the quantitative determination of analytes, nbss are also used to detect and quantify microorganisms: the receptors are bacteria, yeasts, or oligonucleotides coupled with electrochemical, piezoelectric, optical, or calorimetric transducers. electrochemical nbss can be classified according to the analytes and reactions they monitor: direct monitoring of the concentration of the analytes, or of the reactions that produce or consume them, and, alternatively, indirect monitoring of an inhibitor or activator of the biological recognition element (the biochemical receptor). performance criteria include calibration characteristics (sensitivity, linearity, operational concentration range, limits of quantitative determination, and limits of detection), selectivity, steady-state and response time, reproducibility, lifetime, and stability (fang et al. 2015). the notion of recognition is used in nbss, or in nanobiosensory systems, by association with the sensory systems of the plants.
senses such as smell or taste rely on systems in which a recognition receptor cell is coupled with neural signal-processing pathways. such phenomena also occur in biochemosensors, but at a much simpler level than the complexity of molecular recognition in living systems (barroso et al. 2015). examples of single or multiple signal transfer, limited to the main biochemosensors, are shown in table 12.3. for the receptor types shown in table 12.3, different electrodes and measurement methods can be selected from table 12.4 to form an electrochemical nbs. nbss are classified by the recognition element (table 12.3) or by the transduction mode (table 12.4). irrespective of the type of classification, an nbs should be treated as a single microsystem in which the biological recognition element is in direct contact with the transducer element. an electrochemical nbs is an nbs with an electrochemical transducer (table 12.4). it is considered to be a chemically modified electrode (cme): the electrical conductor of the electrode may be a metal, a semiconductor, or an ionically conductive material coated with a biochemical or bioactive film. nbss may also use several types of non-electrochemical transducer: (a) piezoelectric nbss; (b) saw nbss, which measure surface acoustic waves in a resonance circuit (shear and surface acoustic waves); (c) thermometric nbss, in which the active element is coupled with a thermistor; and (d) optical nbss, which use optical phenomena (planar waveguide, optical fiber, surface plasmon resonance). spr nbss use the interaction between the analyte and a receptor immobilized on a metal film deposited on an optical prism, and they measure the variation of the refractive index caused by changes induced in the electrical charge of the metal, which transmits the electrons from the interaction process to the external electronic measuring system.
an electrochemical nbs is an integrated transducer microsystem capable of providing selective, quantitative or semiquantitative analytical information using a biological identification element. it can be used to monitor biological and nonbiological elements. chemical sensors that incorporate nonbiological components as receptors, although they are used to monitor biological processes (ph or oxygen sensors), are not nbss. the clark electrode is nevertheless of importance in the measuring range of nbss. similarly, physical sensors used in biological environments, such as those measuring pressure, are not considered nbss (jacoby et al. 2015). according to the terminology set out in tables 12.3 and 12.4, electrochemical nbss can be classified by their biological specificity, by the mechanism or mode of signal transduction, or by a combination of the two. they can be amperometric, potentiometric, field effect (fet), conductometric (measurement of electrical conductance), or impedimetric nbss. alternatively, a name such as enzymatic amperometric nbs specifies the nature of both the receptor and the transducer. the first nbss to be studied were enzyme sensors and immunosensors (fang et al. 2015). catalytic nbss are based on a reaction catalyzed by biomacromolecules that are present in their original biological medium, have been previously isolated, or have been produced synthetically. the reaction, driven by the biocatalyst immobilized in the nbs, is monitored by an integrated detector (transducer) that measures the stationary or transient states or the final reaction product. the commonly used types of biocatalyst are enzymes (single enzymes or enzymatic complexes), which are the most common recognition systems; cells and microorganisms (bacteria, fungi, eukaryotic cells, or yeasts); cell organelles or components (cell walls, mitochondria); and sections of plant or animal tissue. nbss with biocatalytic recognition elements are the best known and most studied, dating back to the pioneering work of clark and lyons.
one or more analytes, commonly called the substrates s and s′, react in the presence of one or more enzymes, cells, etc., to produce the products p and p′ (fang et al. 2015). there are four strategies whereby the associated transducer can monitor the consumption of the analyte s through the biocatalytic reaction:
• detection of the consumption of the cosubstrate s′, for example, oxygen depleted by a reaction layer containing an oxidase, bacteria, or yeast. the measured signal is the decrease relative to the initial value.
• recycling of a reaction product p, such as hydrogen peroxide, h+, co2, or nh3, in reactions catalyzed by oxidoreductases, hydrolases, lyases, etc. the signal from the transducer is amplified.
• detection of the state of the active centers of the biocatalyst (redox centers, cofactors, prosthetic groups), which evolves in the presence of the substrate s, by using an immobilized mediator. the mediator reacts quickly with the biocatalyst and is easily detected in the transduction chain. various ferrocene derivatives, the tetrathiafulvalene-tetracyanoquinodimethane (ttf+ tcnq−) organic salt, quinones, quinoid dyes, and ru or os complexes in polymeric matrices can be used as mediators.
• direct electron transfer between the redox active site of the enzyme and the electrochemical transducer.
the third strategy eliminates, partially or totally, the dependence of the nbs response on the concentration of the cosubstrate s′, which decreases the influence of interfering species. an alternative to the use of mediators is to decrease the substrate concentration within the reaction layer by using a suitable membrane whose permeability favors the transport of the cosubstrate. when several enzymes are immobilized within the same reaction layer, the performance and capabilities of nbss can be improved.
three possibilities are commonly used:
• several enzymes facilitate biological identification by sequentially converting the products of a series of enzyme reactions into an electroactive final form: this allows nbss to address a wide range of analytes.
• several enzymes, applied in series, can regenerate the cosubstrate of the first enzyme, and the nbs output signal can be amplified by regenerating another cosubstrate of the first enzyme.
• several enzymes, applied in parallel, improve the selectivity of nbss by lowering the local concentration of electrochemically interfering substances: this arrangement is an alternative to the use of a permselective membrane or of a differential method (e.g., comparing the output signal of an nbs with that of a reference sensor without the biorecognition element).
the operation of nbss with biocomplexing receptors is based on the interaction between the analyte and macromolecules or organized molecular assemblies. once equilibrium is reached, there is no further consumption of the analyte by the biocomplexing agent immobilized on the substrate. the response to the reaction between the analyte and the biocomplexing reagent is monitored by an integrated detector. in some cases, the biocomplexing reaction is itself monitored through a complementary biocatalytic reaction. the integrated detector monitors stationary or transient states. antibody-antigen interactions, the most relevant examples of nbss using biocomplexing receptors, are based on immunochemical reactions, e.g., the binding of an antigen (ag) to its characteristic antibody (ab). the ab-ag complexes formed can be detected provided that other, nonspecific reactions are minimized. each ag determination requires a particular ab, which must be isolated, purified, etc. some recent studies have analyzed the direct monitoring of ag-ab complex formation using ion-selective field effect transistors (isfets). the sensitivity of immunological nbss can be increased by adding specific enzyme labels to the ag-ab couples, but this requires additional synthesis steps.
as the binding strength, or affinity constant, varies widely, these systems either operate irreversibly (disposable nbss) or are coupled to flow injection analysis (fia) systems, in which the ab can be regenerated by dissociating the complex with agents such as glycine-hcl buffer at ph 2.5 (kurien et al. 2010). recently, ion channels, membrane receptors, and binding proteins have been used as molecular recognition systems in conductometric, isfet, or optical nbss. a transport protein, lactose permease (lp), can be incorporated into a liposomal bilayer, where it permits coupled transport of the sugar and protons with a stoichiometric ratio of 1:1. this mechanism was identified through the ph-dependent fluorescence of a fluorophore immobilized in the liposomes. liposomes containing lp were then incorporated into a lipid bilayer deposited on a ph-sensitive isfet. preliminary results show that this modified isfet is capable of reversibly detecting lactose in an fia system. nbss based on protein receptors have been developed recently. the binding of analytes, here called agonists, to immobilized receptor proteins is monitored through the change in the flow of ions through the associated channels. glutamate, as an agonist, can be determined in the presence of other, interfering agonists by measuring na+ or ca2+ fluxes with a conductometric method or with ion-selective electrodes. because the switching of the ion channel depends on the binding of the agonist, no enzyme labeling is needed to achieve the desired sensitivity. for dna-based nbss, two methods have been explored. the first relies on the intercalation into the oligonucleotide duplex, during the formation of the dna double helix, of a molecule that is electroactive. the second is the direct detection of guanine, which is itself electroactive. some of these nbss cannot operate with a separation membrane between the analyte and the sensing layer.
the sensitive layer often has to be in contact with the biological environment where the analytes are located (fang et al. 2015). nbss have been developed for the indirect monitoring of organic pesticides or inorganic compounds (heavy metals, fluorides, cyanides) that inhibit the biocatalytic properties of the enzymes used in their construction; such devices are irreversible. as in immunosensors, the initial biological activity can only be regenerated by chemical treatment, so these devices do not belong to the class of regenerable or reusable nbss. their potential application is to give a warning rather than to monitor a specific analyte accurately, and they are considered disposable. nbss for cyanide, which acts as an inhibitor of cytochrome c oxidase, can nevertheless be regenerated by washing with a phosphate buffer at ph 6.3 (armstrong and beckett 2011). since the development of the enzymatic glucose nbs, in which glucose oxidase was immobilized between two membranes, a literature has emerged on techniques for immobilizing biological receptors. enzymes, antibodies, cells, or tissues with high biological activity can be immobilized in a thin film on the surface of a transducer by a variety of methods. the following procedures for immobilizing biological receptors are used:
• entrapment behind a membrane, on the side not exposed to the sample: an enzyme solution or a suspension of cells or tissue is confined between an analyte-permeable membrane and the measuring electrode (electrochemical detector).
• entrapment of biological receptors in a polymeric matrix: polyacrylonitrile, agar gel, polyurethane (pu) or polyvinyl alcohol (pva), or redox hydrogels with redox centers such as [os(bpy)2cl]+/2+.
• entrapment of biological receptors between self-assembled monolayers (sams) or in bilayer lipid membranes (blms).
• covalent binding of receptors to membrane surfaces through bifunctional groups or spacers: glutaraldehyde, carbodiimide, sams, avidin-biotin, silanization.
• modification of the entire electrode material (carbon paste or graphite-epoxy resin modified with enzymes).
receptors are immobilized either alone or mixed with other proteins, such as bovine serum albumin (bsa), and either directly on the surface of the transducer or in a polymer membrane. preactivated membranes can be used directly to immobilize enzymes or antibodies without further chemical modification. covalent binding and cross-linking are more difficult to carry out than entrapment of the receptors in or behind a membrane. in microsensor structures, where the membrane is deposited directly on the transducer, covalent bonding is safer and more stable (muñoz et al. 2008). besides reactive layers or membranes with immobilized receptors, many nbss, especially those for clinical or biological applications, incorporate one or more internal or external auxiliary separation membranes with three important functions: a protective barrier, an outer diffusion barrier for the substrate, and a biocompatible surface. any nbs built on the principle of molecular recognition must be characterized by its response, which is related to the operating parameters and to the rate-limiting steps. accuracy, precision, sensitivity, and reproducibility are the basic criteria for estimating nbs performance. these parameters are directly related to the reaction mechanisms, the transport phenomena, and the kinetics of the processes in the bulk and at the interface. most criteria have been developed for enzymatic nbss, which are the most studied in the literature. in the case of immunosensors, the key element is the capture capacity of the surface, i.e., the number of surface molecules that are active. one method of checking this parameter is to measure the specific activity, meaning the ratio of the number of active molecules to the number of immobilized molecules.
this estimate depends on the mode of immobilization (molecular orientation, number of attachment points or active sites), and the ratio ranges between 0.15 and 0.3, rarely reaching unity. the capture capacity becomes important when the surface area decreases, as in microfluidic applications. a problem encountered in immunosensors is that of regenerating the surface without significant loss of activity. the performance criteria have so far lacked rigor (affi et al. 2016). the response signal is corrected for background noise. the reference concentration is usually taken in mol/l, although this high value is never used when measuring ranges refer to 10 −10 mol/l, and sensitivities of the order of nmol/l and pmol/l have now been reached. the transient response is important for dynamic assays and sampling techniques but is less significant in continuous monitoring. it is estimated from the maximum slope, (dr/dt)max, after the analyte is added to the measuring cell. one evaluation method is to introduce the nbs into an fia system for sequential sample analysis in a specified hydrodynamic regime. the sensitivity and the linear range of measurement of the stationary concentration are determined graphically. this method is more concise than conventional calibration curves, which plot the baseline-corrected response against the analyte concentration or its logarithm. the parameters are estimated in the linear response range of the nbs. every electrochemical nbs has an upper limit to the linearity of its response. this limit is directly related to the biocatalytic or biocomplexing properties of the biochemical or biological receptor. moreover, in the case of enzymatic nbss, this limit is significantly influenced by the membranes and immobilized substrates, in which diffusion barriers and secondary kinetics play a role.
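two of the quantities mentioned above are simple to compute from raw data: the specific activity of an immunosensor surface (active molecules divided by immobilized molecules) and the transient response (dr/dt)max. the sketch below estimates the maximum slope by finite differences between consecutive samples; the function names and the sample trace are hypothetical, and a real recording would normally be smoothed first.

```python
def specific_activity(n_active, n_immobilized):
    """Ratio of active to immobilized surface molecules (between 0 and 1)."""
    if n_immobilized <= 0 or not 0 <= n_active <= n_immobilized:
        raise ValueError("need 0 <= n_active <= n_immobilized and n_immobilized > 0")
    return n_active / n_immobilized


def max_transient_slope(times, responses):
    """Largest-magnitude finite-difference slope dR/dt of a response trace.

    times     : strictly increasing sample times after analyte addition
    responses : baseline-corrected sensor response at each sample time
    """
    if len(times) != len(responses) or len(times) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    best = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        slope = (responses[i] - responses[i - 1]) / dt
        if abs(slope) > abs(best):
            best = slope
    return best


if __name__ == "__main__":
    # Assumed example trace: time in seconds, response in arbitrary units.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    r = [0.0, 0.5, 2.0, 2.5, 2.6]
    print("max dR/dt:", max_transient_slope(t, r))
    print("specific activity:", specific_activity(3, 12))
```

in this illustrative data set the steepest rise lies between the second and third samples, and the specific activity falls inside the 0.15 to 0.3 range quoted in the text.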
the local concentration of the analyte in the reaction layer may be two orders of magnitude smaller than in the bulk solution (michelini and roda 2012). enzymatic kinetics are described by the michaelis-menten mechanism and expressed by the km and vm parameters. for the kinetics of the enzyme in the solution phase, km is usually determined from the lineweaver-burk graphical representation. for any electrochemical nbs, the number of standards used and the way the standard sample matrix is simulated or duplicated should be stated, and the procedures must be specified for each type of nbs in relation to its application. this is especially important for disposable nbss based on immunoaffinity or inhibition reactions. sensitivity is the slope of the calibration curve and should not be confused with the limit of detection (lod), which is defined relative to the baseline or noise signals. the working concentration range is bounded by the lower and upper detection limits (fang et al. 2015). selectivity and reliability are determined as for any kind of amperometric or potentiometric nbs. they depend on the choice of receptor and transducer. most enzymes are specific, but there are also nonselective enzyme classes, such as alcohol oxidases, sugar oxidases, peroxidases, laccases, tyrosinases, ceruloplasmin, alcohol dehydrogenases, glucose dehydrogenases, nadh dehydrogenases, etc. they have been used to develop nbss to determine environmental phenols or to monitor food quality. on the other hand, oxygen electrodes, ph electrodes, and isfets have a pronounced selectivity, unlike metal electrodes, which are sensitive to many substances. their selectivity may change when these transducers are associated with receptors. an enfet is sensitive to the ph of the buffer and to protonation, but its selectivity is not altered.
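the lineweaver-burk determination of km and vm mentioned above can be sketched as follows. the michaelis-menten rate law, v = vm·s/(km + s), is linearized as 1/v = (km/vm)·(1/s) + 1/vm, so a straight-line fit of 1/v against 1/s gives vm from the intercept and km from the slope. the function names and the data are hypothetical: the rates below are generated from assumed values km = 2 and vm = 10 purely to show that the fit recovers them.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate, rates):
    """Estimate (Km, Vm) from a double-reciprocal plot.

    1/v = (Km/Vm) * (1/S) + 1/Vm, so Vm = 1/intercept and Km = slope * Vm.
    """
    inv_s = [1.0 / s for s in substrate]
    inv_v = [1.0 / v for v in rates]
    slope, intercept = linear_fit(inv_s, inv_v)
    vm = 1.0 / intercept
    km = slope * vm
    return km, vm


# Synthetic, noise-free data from assumed Km = 2.0 and Vm = 10.0.
S = [0.5, 1.0, 2.0, 4.0, 8.0]
V = [10.0 * s / (2.0 + s) for s in S]
km, vm = lineweaver_burk(S, V)
print(round(km, 6), round(vm, 6))
```

the double-reciprocal transform weights low-concentration points heavily, so with real, noisy data a direct nonlinear fit is often preferred; the linear form is shown here because it is the representation the text refers to.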
when the transducer responds to interfering substances, such as ascorbate or urate in glucose nbss based on hydrogen peroxide detection, these side effects may be restricted by the use of outer or inner permselective membranes. alternatively, nbss with and without biological receptors can be operated together in a differential arrangement. the reliability of nbss in operation depends on the selectivity, reproducibility, and accuracy of the measurements (heyl et al. 2012). the clark construction principle relies on the electrochemical reduction of oxygen gas at a platinum metal electrode. a platinum electrode used for the electrochemical detection of oxygen is known as the clark electrode. the electrode has an organic membrane covering the electrolyte layer and two metal electrodes. oxygen diffuses through the membrane and is electrochemically reduced at the cathode. a fixed voltage, at which the oxygen reduction reaction takes place, is applied between the cathode and the anode. temperature greatly influences reaction rate and solubility. this is a polarographic electrode used to measure the concentration of oxygen in body fluids or gases. the sample is in contact with a membrane (polypropylene or teflon) through which the oxygen diffuses into a measuring chamber containing 50% saturated potassium chloride solution. inside the chamber are two electrodes: one is the reference, ag/agcl, and the other is platinum, sealed in glass. the electrical current at the polarization potential of -600 mv is proportional to the oxygen concentration in the solution. with reverse polarization at +600 mv, hydrogen measurements can be made. the reactions are very sensitive to temperature, which should be held constant to within ±0.1 °c. the electrode is calibrated using gas mixtures with known concentrations of the two gases, oxygen and hydrogen.
the oxygen electrode, or clark electrode, has proven useful as an analyzer of raw or evolved gas in clinical laboratory chemistry and in medical care, whether ambulatory or intensive. when an enzyme that reacts with oxygen is placed at the surface of the platinum electrode and enclosed by a membrane close to the surface, the result can be regarded as the simplest model of nbss. the oxygen concentration response was proportional to the glucose concentration. it was the first nbs built, and it greatly advanced laboratory analysis. oxygen diffuses through the membrane and is electrolytically reduced at the cathode. the higher the partial pressure of oxygen, the more oxygen diffuses per unit time. temperature sensors attached to the sample and the membrane allow compensation for the diffusion rate and for solubility. the measuring instruments record cathode current, sample temperature, membrane temperature, barometric pressure, and salinity. with this information one can calculate the oxygen contained in the sample, either in parts per million (ppm) or in percent of oxygen saturation. the geometric configuration of the clark electrode is of great importance. in particular, the thickness of the electrolytic layer between the cathode and the membrane must be kept within a certain limit to ensure linearity and to reduce current drift. calibrating a polarographic system is a must. proportionality between current and oxygen concentration must be ensured, with errors below 1% (the nature of the biological sample and the air parameters are essential). air is a gas mixture with a constant oxygen content of about 20.9%; when it is in contact with water, the amount of oxygen dissolved depends on several factors: the time available for oxygen dissolution, homogeneity of the water solution, water temperature, air pressure, salts contained in the water, and other water-soluble, oxygen-consuming substances.
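the proportionality between cathode current and oxygen content described above can be illustrated with a short python sketch. it assumes a two-point calibration (a zero-oxygen reading and an air-saturated reading taken at the same temperature) and a strictly linear response; temperature, pressure, and salinity compensation are deliberately left out. the function names and the current values are hypothetical and are not taken from any instrument.

```python
def percent_saturation(i_sample, i_zero, i_air):
    """Convert a cathode current to percent oxygen saturation.

    Assumes a linear response between the zero-oxygen current (i_zero)
    and the air-saturated current (i_air), both measured at the same
    temperature as the sample.
    """
    span = i_air - i_zero
    if span <= 0:
        raise ValueError("air-saturated current must exceed zero current")
    return 100.0 * (i_sample - i_zero) / span


def max_relative_error(currents, known_percent, i_zero, i_air):
    """Largest relative deviation of the estimates from known standards."""
    worst = 0.0
    for current, known in zip(currents, known_percent):
        estimate = percent_saturation(current, i_zero, i_air)
        worst = max(worst, abs(estimate - known) / known)
    return worst


# Made-up currents (arbitrary units) for standards of known saturation.
standards = [25.0, 50.0, 75.0]
readings = [0.502, 1.0, 1.497]
err = max_relative_error(readings, standards, i_zero=0.0, i_air=2.0)
print(err < 0.01)  # checks the "errors below 1%" criterion
```

a real instrument would also apply the recorded sample temperature, membrane temperature, barometric pressure, and salinity before reporting ppm or percent saturation.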
oxygen contained in water is decisive for biological and chemical processes, so measurement of dissolved oxygen in water is important; the partial pressure of dissolved oxygen is referred to that of pure water saturated at a certain temperature (wolfbeis 2015). the enzymatic electrode (in some references known as the enzyme electrode) is a combination of an electrochemical probe of any type (amperometric, potentiometric, or conductometric) with a thin layer (10-200 microns) of immobilized enzyme. in these devices the function of the enzyme is to provide selectivity by virtue of its biological affinity for a particular substrate molecule. an enzyme can selectively catalyze the reaction of one specific isomer of a given substrate from among many substrates and isomers. typically, the progress of an enzymatic reaction (directly related to the concentration of the analyte) is monitored by the rate of product formation or the disappearance of a reagent. if the product or reagent is electrically active, then the response can be directly monitored by amperometry, i.e., the variation of the current for a given applied potential. the main considerations are: does the enzyme contain active redox groups? are the biochemical reaction products electroactive? is one of the substrates or cofactors electrically active? what is the speed and response time? what is the final application of nbss? if the enzyme does not contain redox groups, then nbss are limited to measuring product formation or substrate consumption through their reaction at the transduction electrode. the electrical current is directly related to the analyte concentration. many nbss are based on the electrochemical response due to h2o2. the most common enzymes used in the design of enzyme electrodes are those that contain redox groups that change their redox state during the biochemical reaction. redox enzymes include oxidases and dehydrogenases, among them those dependent on pyrroloquinoline quinone (pqq).
they act by oxidizing the substrate, accepting electrons during the process, and passing into a reduced state. these enzymes return to the oxidized active state by transferring electrons to molecular oxygen, resulting in hydrogen peroxide (h2o2). since both oxygen and peroxide are electrochemically active, the reaction can be followed either by reduction of the cosubstrate (oxygen) or by oxidation of the reaction product (peroxide). the method based on the reduction of oxygen at the o2 electrode is one of the simplest but suffers from several disadvantages, namely, slow response, miniaturization difficulties, and low accuracy and reproducibility. measurements based on peroxide oxidation overcome these difficulties and are currently the most popular method. mediator systems: a major limitation of the peroxide detection scheme described above is the high operating potential (about 0.8 v against the ag/agcl reference electrode) required for oxidation of peroxide, which leads to increased interference. the use of mediators (molecules that can carry electrons between the enzyme redox center and the electrode) can minimize this inconvenience. depending on the nature of the mediator, the applied potential may be reduced below the threshold at which species such as ascorbate, urate, and paracetamol interfere. a large number of compounds are able to act as mediators in the enzyme electrode. of these, the most popular are the metal complexes. representative organometallic complexes are ferrocene and its derivatives, because their redox potentials are independent of ph. bienzymatic systems: recent work has focused on direct electron transfer between enzymes and electrodes. a success in this field is the peroxidase enzyme hrp (horseradish peroxidase), which catalyzes the reduction of hydrogen peroxide by a number of organic compounds.
when the enzyme is immobilized on the electrode, the organic reducing agent is no longer needed, because the electrode itself provides the reducing equivalents. the coupling of peroxidase with an oxidase enzyme allows the construction of bienzymatic systems in which the peroxide produced by the oxidase is detected by the electrode-peroxidase system, which operates at lower potentials than a simple platinum electrode. this is the origin of the reduced interference from electroactive species (frederickson matika and loake 2014). an optical fiber used as a nanobiosensor can be placed on the surface of or inside the plant to measure parameters directly. nbss with optical fiber have been proposed for many rapid medical determinations, and their applications are continuously expanding. the fiber can be mounted inside a hollow tubular instrument, which serves to dilate a hole or channel, and inserted into the tissue, allowing minimally invasive monitoring where it is needed. nbss with optical fiber are nontoxic and chemically inert and can be used successfully inside the plant. they can be coupled to plant monitoring equipment. they are easy to handle and have negligible weight. the evolution of fiber-optic nbss is driven by both performance and biocompatibility. biocompatibility is the first requirement for the plant's well-being: nbss should not affect the physiological parameters of the plant, and their functionality must not be compromised by the plant's metabolic waste products. fiber-optic nbss can be classified as extrinsic, where the fiber acts only as a path for the signal, and intrinsic, where the interactions occur in the fiber itself. there are two types of fobs: minimally invasive nbss that are introduced into the cavities of the plants and invasive nbss that are introduced into the organs or into the conductive tissue of the wood (liu et al. 2015). in the last decade, optical fiber has become a product widely used in all the cutting-edge fields of advanced science and technology.
given the ease with which it can be manipulated, its unlimited sterilization possibilities, and its reduced cost, it can be expected that this product will increasingly gain market share. the following applications are known to have used fiber-optic nbss: in the epidermis and vascular tissues, for analysis of raw sebaceous elements, oxygen saturation, raw sewage gas analysis, and sap ph; in plant breeding monitoring; in easy ph determination with a microabsorbent indicator and an acid-alkaline ph modulator; in vegetal tissues, when the aim is to monitor the temperature or to diagnose small and very small injuries that are difficult to reach; in the epidermis, where they can test the quality and integrity of the layers so that small lesions can be detected, can be used to stimulate tissues, and where a fobs based on oxygen demand (bod) can be used; and in the stem, where they can identify very small injuries. another possible application is to assess the color or integrity of the vascular tissues. optic-fiber nbss can now monitor electrolytes in raw or elaborated sap, as well as potassium, sodium, and calcium. the device takes the form of a tubular instrument able to open a channel for an optical fiber (0.5 mm in diameter) that can be inserted into vascular tissues. it can measure the gas concentration and the ph of the raw or elaborated sap, as well as oxygen saturation. the materials that make up the chemical transducers are ionophores that reversibly bind the electrolyte and are attached through a molecular separator (spacer) to a fluorophore. the degree of fluorescence, excited by violet light-emitting diodes, is modulated by the ionophores in proportion to the analyte concentration. nbss are used either extracorporeally in the external raw or elaborated sap gas analysis circuit or intracorporeally for continuous blood gas monitoring in critical situations.
the chemical parameter capable of monitoring cell state is ph, because lactic acid, formed when cell tissue dies, produces a decrease in ph. any drop in the ph of the raw or elaborated sap from 7.4 indicates cell death (mclamore et al. 2010). achievements in this domain include invasive ph nbss that determine the state of the cell. these nbss are composed of a fluorescent dye encapsulated in a gel matrix (polyacrylamide) attached to the end of the optical fiber. the dye is characterized by the emission of the acidic form centered on 580 nm and of the alkaline form centered on 680 nm. the two forms are ph sensitive. excitation occurs at 533 nm for both forms. the two emissions are separated directly by optical filters, and the sensitivity is 0.05 ph units, well within acceptable clinical standards (0.1 ph). a bacterial disorder is a multifactorial condition characterized by demineralization of the inorganic portion and destruction of the organic substance. each bacterial perturbation has a pathogenic species as its etiological agent. the bacterial load of the raw or elaborated sap is about 10⁹ per ml. therefore, raw or elaborated sap can be considered a selective medium for bacterial growth. a significant correlation has been demonstrated between the number of pathogens in the raw or elaborated sap and the prevalence of the associated problems. a simple method has been adapted for detecting and counting the pathogen: a device consisting of a special support coated with an agar culture medium containing 20% sucrose. it is inoculated with raw or elaborated sap, and the growth density of the pathogen is assessed after incubation for 48 hours at 37 °c. next comes the morphological identification of distinct colonies on selective and nonselective agar, based on distinct cell forms visible under the light microscope. the technique has some disadvantages: it takes time for bacterial growth and requires additional laboratory facilities.
to monitor pathogen activity in the sap, a fobs was developed that monitors the pathogen-mediated sucrose reactions through a photosensitive indicator immobilized in a porous glass coating. the surface of the optical fiber core is coated with a porous glass film using the sol-gel technique (kozan et al. 2007). spectroscopic analysis showed two phases in light absorption at 597 nm over a duration of 120 min: 0-60 min and 60-120 min. the investigation shows the potential of nbss in monitoring plant activity. the sol-gel technique is used to immobilize the photosensitive indicator, and the approach is simpler than the selective-medium method, which is slow and laborious. the experiment rests on the following criteria: pathogens are partially anaerobic, with optimal growth at 37 °c; glucosyltransferases and fructosyltransferases from pathogens catalyze the synthesis of water-insoluble glucan and fructan polymers from sucrose, and fermentation forms the lactic acid found in acidic sap; pathogenic agents in the sap synthesize both extracellular and intracellular polysaccharides from sucrose; extracellular polysaccharides help the bacteria adhere to the surface, while intracellular polysaccharides are stored as a bacterial energy reserve; intracellular polysaccharides help the bacterium to continue fermentation even when there is no exogenous food source; acid tolerance allows the pathogens to remain active even at low ph; a photosensitive ph indicator, which produces a characteristic color along a color gradient depending on the ph of the raw or elaborated sap, is used in the fobs construction (miranda et al. 2011). on the basis of these considerations, an experimental assembly was designed and built around a double-beam uv spectrometer in which the optical fiber was introduced in place of a cuvette. in the reference well, bromophenol blue buffer solution was used at different ph values, from 4 to 7.
the initial experiment helped determine the wavelength and the peak characteristic of the buffered indicator at the different ph values induced in the sap by bacterial activity. uv spectroscopic analysis of the bromophenol blue solution at ph 4 and ph 7 showed a slightly prominent peak at a wavelength of 590 nm; the peak intensity decreases from ph 7 to ph 4. comparison with literature data showed good agreement: in the medium containing sap, sucrose, and bromophenol blue, the absorption remained stable at 590 nm at time intervals of 15 min, 30 min, 1 h, and 24 h. each set of etched and processed fibers requires independent calibration because of the inhomogeneities resulting from the optic-fiber preparation steps. the fobs proves to be a rapid quantitative test of pathogen activity in raw or elaborated sap. this test can also be adapted to other plant nanobionics applications where bacterial activity is involved in cellular or tissue destruction. in forming a biosensor from an optical fiber for the observation and detection of the pathogen, the experiments of this study followed two phases: biochemical recognition of the signal and spectroscopic analysis. the idea of nanobionic plants evolved from work on solar cells that repair themselves, modeled on plant cells. the next step was to try to amplify photosynthesis in chloroplasts isolated from plants, for use in solar cells. chloroplasts host everything they need for two-step photosynthesis. in the first step, pigments such as chlorophyll absorb light, which excites electrons that flow through the chloroplast thylakoids. the plant captures this electrical energy and uses it to fuel the second stage of photosynthesis, creating glucose. chloroplasts can still carry out these reactions after they have been removed from the plants, but after a few hours they break down because light and oxygen destroy their photosynthetic proteins.
normally, plants can repair this damage, but extracted chloroplasts cannot. to prolong chloroplast productivity, researchers attached cerium oxide nanoparticles to them. these particles are powerful antioxidants that remove oxygen radicals and other highly reactive molecules produced by light and oxygen, protecting the chloroplasts from decomposition. the researchers introduced the nanoparticles into chloroplasts using a new technique called leep (lipid exchange envelope penetration). wrapping the particles in polyacrylic acid, a highly charged molecule, allowed them to penetrate the hydrophobic lipid membranes surrounding the chloroplasts. in these chloroplasts, the level of damaging molecules decreased dramatically. using the same technique, the researchers introduced semiconducting carbon nanotubes wrapped in negatively charged dna into chloroplasts. plants use only about one tenth of the available sunlight, but the carbon nanotubes functioned as artificial antennas that allowed chloroplasts to capture wavelengths they do not normally use, such as green, ultraviolet, and near infrared. when carbon nanotubes functioned as prosthetic photoabsorbers, photosynthesis, measured by the flow of electrons through the thylakoids, was 49% more intense than in isolated chloroplasts without attached nanotubes. when cerium oxide was delivered together with carbon nanotubes, the chloroplasts remained active for a few additional hours (nikolelis et al. 2008). the researchers then turned to living plants and used a technique called vascular infusion to deliver nanoparticles into arabidopsis thaliana, a small flowering plant. with this method, the researchers applied a nanoparticle solution to the underside of the leaf, where it penetrated through the stomata, the pores that usually let carbon dioxide in and oxygen out. in these plants, the nanotubes penetrated into the chloroplasts and increased photosynthetic electron flow by about 30%.
it remains to be discovered how these gains influence the sugar production of plants. scientists have been able to transform the arabidopsis thaliana plant into a chemical nbs by implanting nanotubes that detect nitric oxide, a pollutant produced by combustion (hines et al. 2015). nbss based on carbon nanotubes have been created for several chemicals, including hydrogen peroxide, trinitrotoluene, and sarin gas. when target molecules attach to the polymers wrapped around the nanotubes, the fluorescence of the nanotubes is altered. carbon nanotubes can be used to create nbss that detect free radicals in real time or signal the presence of molecules at concentrations too low to detect otherwise. this is a striking demonstration of how nanotechnology can be combined with synthetic biology to modify and enhance the functions of living plants. the way that nanoparticles arrange themselves can be used to enhance plant photosynthetic capacity, with the particles serving as nbss and stress reducers. by adapting nbss to specific targets, researchers hope to develop plants that could be used to monitor environmental pollution, pesticides, fungal infections, or exposure to bacterial toxins. attempts to incorporate electronic nanomaterials, such as graphene, into plants are currently being made. researchers have long tried to find the best way to transmit information in a timely manner, focusing on the electronics and mechanics that nbss need for their tasks. nbss used for agricultural purposes are not new. recombinant dna technology now offers the possibility of obtaining new biological insecticides that preserve the benefits of "classic" biological control agents and add some new features. these technologies are not accessible to all possible users, especially poorer ones, and they have also generated a series of public debates about their usefulness and their effects on nontarget organisms and on the environment.
obtaining pest-resistant plants is perhaps the most spectacular field of genetic engineering applied to plants, since it has allowed the regeneration of plants containing genes of bacterial origin that provide protection against harmful insects. this ensures, on the one hand, richer harvests and, on the other hand, lower pesticide costs for farmers (lukács et al. 2006; prasad et al. 2014, 2017). a series of new genes for resistance to insect attack that are transferable to plants has been discovered: genes encoding δ-endotoxin production from b. thuringiensis; genes for the synthesis of enzymes or enzyme inhibitors; plant genes encoding the synthesis of specific lectins; and genes that induce the synthesis of plant compounds such as phytoalexins. the development of cloning methodology was based on the observation that there is a group of gram-positive bacteria belonging to the species b. thuringiensis that produce a toxin called δ-endotoxin, or crystalline protein, capable of killing a wide range of insects (coleoptera, lepidoptera, diptera), depending on the bacterial strain. of great interest is the strain b. thuringiensis var. tenebrionis, which synthesizes a δ-endotoxin effective against the colorado beetle. the genes involved in the synthesis of this protein are located, in most bacterial strains, on large plasmids (75 kb), and the toxin is produced during sporulation. it has been shown that the crystalline protein (δ-endotoxin) is normally expressed as a large inactive protoxin, which undergoes proteolytic processing in the intestine of the sensitive insect, becoming an active toxin. it recognizes specific receptors in the intestinal cells and blocks the functions of these cells, leading to the death of the insect. studies on genes that encode inhibitory proteins produced by b.
thuringiensis led to their grouping into four types based on target species specificity and nucleotide sequence: type i cry genes encode 130 kda proteins specific for lepidopteran larvae; type ii cry genes encode 70 kda proteins active on dipteran and lepidopteran larvae; type iii cry genes encode 70 kda proteins with specific activity on coleopteran larvae; and type iv cry genes encode inhibitory proteins for dipteran larvae. the number of identified genes coding for δ-endotoxin-like proteins is high: 140 genes specific for lepidopterans, coleopterans, and dipterans (genfa et al. 2005). crystalline protein genes have been obtained from several strains of b. thuringiensis by pcr amplification. because expression of the whole gene for the crystal protein was found to be very poor in transformed plant cells, a modified gene was created containing only the n-terminal portion of the protein (amino acids 1-645). to increase gene expression in plants, the natural at-rich sequence for amino acids 1-415 was replaced by a synthetic gc-rich sequence containing the codons preferred by plant cells. these recombinant genes were introduced into ti plasmid-derived vectors (binary vectors containing the duplicated camv promoter, which increases transcription fivefold, and selection marker genes for resistance to antibiotics or to the herbicide phosphinothricin) and transferred to plant cells by a. tumefaciens carrying disarmed ti plasmids. the size of the recombinant plasmids obtained ranges between 5000 and 10,000 bp, depending on the size of the bacterial gene and the promoter sequence integrated into the vector. recombinant bacterial strains were then used to infect test plants (potato, tobacco, cotton). selection was first performed according to vector-borne selection markers (antibiotic resistance, gus test, herbicide resistance, etc.), and finally the regenerated plants were subjected to insect attack (takakusagi et al. 2013).
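the replacement of an at-rich natural sequence by a gc-rich synthetic one that encodes the same amino acids can be illustrated with a minimal python sketch. the two nine-base sequences and the three-entry codon table below are hypothetical toy examples chosen only to show the principle (synonymous codons, higher gc content); they are not taken from any cry gene, and the preferred codons of a real plant host would have to come from a codon usage table.

```python
# Toy subset of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "AAA": "K", "AAG": "K",   # lysine
    "TTA": "L", "CTG": "L",   # leucine
    "GAT": "D", "GAC": "D",   # aspartate
}


def translate(seq):
    """Translate a DNA coding sequence using the toy codon table."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_fraction(seq):
    """Fraction of G and C bases among the A/C/G/T bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no valid bases in sequence")
    return sum(b in "GC" for b in bases) / len(bases)


native = "AAATTAGAT"     # hypothetical AT-rich coding fragment
recoded = "AAGCTGGAC"    # hypothetical GC-rich synonymous version

print(translate(native), translate(recoded))
print(gc_fraction(native), gc_fraction(recoded))
```

the check that both versions translate to the same peptide is the essential point: only the nucleotide composition changes, not the encoded protein.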
regenerated plants have shown resistance to attack by insect pests, and this specific trait was maintained and expressed in field experiments. the first transgenic plant showing resistance to insect attack belonged to the species nicotiana tabacum; it expressed the whole or truncated cry 1a gene, cloned under the control of a constitutive promoter, so that the inhibitory protein represented 0.02% of the total leaf protein. cotton plants were obtained carrying the modified cry 1a (c) gene, cloned either under the control of the camv 35s promoter or under the control of a promoter and a sequence for a chloroplast transit peptide isolated from arabidopsis, so that expression of the gene of interest led to a high level of toxin: 0.1% and 1% of total protein, respectively. another variant of cloning the bacterial toxin gene used genetic elements that ensure expression of the gene of interest exclusively in the green portions of the plant (the promoter derived from the gene for pepc) or in pollen (by using a promoter derived from a gene for a calcium-dependent protein kinase, cdpk). a similar methodology has been used to transform rice plants, and by cloning a synthetic cry iii gene, tobacco and potato plants resistant to attack by the colorado beetle have been obtained (vigneux et al. 2007). a modified cry 1a (b) gene, cloned under the control of the camv promoter, was used to obtain sugarcane plants resistant to diatraea saccharalis larvae. given the practical significance of plant resistance to harmful insects, research has been extended to other plant species, producing eggplants resistant to attack by coleoptera, broccoli resistant to certain lepidopteran species, maize resistant to b. fusca, etc., as well as a number of advances in leguminous plants.
toxicity studies conducted on plants expressing the gene for δ-endotoxin have shown that the presence of the transgene does not alter the normal features of the plants, apart from conferring resistance to insect attack. in addition to the δ-endotoxin produced by b. thuringiensis strains, other bacterial species have also been identified that produce insecticidal proteins (liu et al. 2011). this is the case for b. cereus strains producing two proteins, vip1 and vip2, that act on insects in a manner similar to δ-endotoxin. expression by plants of enzymes such as chitinase, cholesterol oxidase, lipoxygenase, phenol oxidase, peroxidase, or isopentenyl transferase (ipt) could be an alternative to using the δ-endotoxin gene. among the enzymes that can protect plants against insect attack, a particular place is occupied by chitinases, enzymes that act on chitin, a basic component of the insect cuticle. tobacco plants expressing genes for chitinase isolated from insects or beans exhibit increased resistance to lepidopterans. cloning the chitinase gene isolated from the bacterium s. marcescens revealed a synergistic effect against s. littoralis larvae between the endochitinase and the endotoxin produced by plants that also contain the endotoxin gene. another enzyme of bacterial origin that exhibits insecticidal action is cholesterol oxidase. the introduction and expression of the cho a gene for cholesterol oxidase from streptomyces sp. in tobacco plants have led to increased plant resistance to a. grandis larvae. cloning the gene for the bacterial enzyme isopentenyl transferase (ipt), which is involved in cytokinin biosynthesis, as a fusion with the promoter of the protease inhibitor ii (pi-iik) gene produced n. plumbaginifolia plants resistant to lepidopterans (m. sexta or m. persicae). another approach was to introduce into plant cells the cpt2 gene, which encodes a trypsin inhibitory protein isolated from vicia faba.
this protein has an antimetabolic effect, protecting plants from attack by pests. similarly, other genes encoding protease inhibitors (kunitz trypsin inhibitor, bowman-birk proteinase inhibitor) or lectins have been cloned into different hosts, and the encoded proteins may be true "defense guns" for the plants that contain them. it is known that insects such as lepidopterans depend on serine proteases (trypsin, chymotrypsin, or elastase) as their primary digestive enzymes (alvarez-fernandez 2010). a series of genes encoding different inhibitors of serine proteases have been isolated from various sources (plants, microorganisms) and cloned into various plant species, such as m. sativa, tobacco, corn, etc.; the plants obtained showed increased resistance to various insect pests compared with normal non-gm plants. in some cases, it has been noted that inserting additional protease inhibitor genes alongside the endogenous genes increases the pathogen resistance of the transformed plants. however, the use of protease inhibitors for controlling insect pests requires thorough studies of plant-insect interactions, because for some inhibitors, such as those of serine proteases, the insecticidal effect also extends to beneficial insects (e.g., bees). insect carbohydrate metabolism is another target for inhibitory agents tested in numerous studies. genes for different α-amylase inhibitors (from wheat and beans) were isolated and characterized; after the gene isolated from wheat was cloned into tobacco plants, an increase in lepidopteran larval mortality of up to 40% was observed. when the α-amylase inhibitor gene from beans was cloned into pea plants under the control of the pha 1 gene promoter, increased expression of the foreign gene in the seeds was obtained, which resulted in higher resistance to callosobruchus sp. (ramirez and spears 2014).
plant lectins are a special group of glycoproteins that have protective functions against a number of harmful organisms. studies on these glycoproteins have shown that they strongly affect the development of various types of insects. the first example of plants containing genes for nonspecific lectins that show toxicity to pests is tobacco plants in which lectin genes from the pea have been cloned. however, many plant lectins also have a toxic effect on mammals, which limits their use as agents to increase pest resistance. many specialists have given special attention to the mannose-specific lectin from snowdrop and to concanavalin a: the genes for these lectins have been cloned into different plant species (tobacco, tomato, potato, sugarcane, rice, wheat), and the heterologous proteins synthesized have reduced sensitivity to attack by harmful insects (lepidopterans, aphids, coleopteran larvae). the results suggest that the use of plants containing insecticidal genes (such as those for lectins) together with integrated management is a promising way to control pests of many plant species, including wheat and rice (richard et al. 2014). despite the remarkable results achieved so far, the genes used to transform crop plants are either too specific or only partially effective against the target insect pests. to use plants as true "weapons" for pest control, they would need to carry genes that direct the synthesis of compounds with different actions on the same target. research in this direction is relatively recent and aims mainly to combine a gene for a b. thuringiensis δ-endotoxin with another inhibitory gene in the same host: for example, the gene for the v. faba trypsin inhibitor (cpti) or for serine protease inhibitors, and the coat protein gene of potato virus y. another interesting approach introduced the cry1a (c) gene into a p.
fluorescens strain able to colonize sugarcane by means of two plasmids, pder405 and pkt240, in which the gene is present in 13 and 28 copies, respectively. testing of the recombinant bacterial strains on sugarcane-specific insect pests revealed greater resistance in plants treated with the respective bacteria than in untreated plants. also, although pest-resistant plants have been obtained for several plant species, fewer results have been reported for cereals, vegetables, and oilseed plants (ibrahim et al. 2008). plant resistance to various diseases (caused by microbial, viral, and nematode phytopathogens) has long been studied, and a relatively large number of resistance genes have been identified. although it was thought that endogenous resistance genes would provide a durable effect for the corresponding plants, this has been true in very few cases. in the case of potato, the control program for certain diseases, such as the rot caused by phytophthora infestans, had to be abandoned because the resistance of potato plants obtained by transferring 11 resistance genes through crosses with solanum demissum proved to be short-lived. identifying resistance genes in the genomes of different plant species and transferring them to other crop plants is extremely difficult and time-consuming if traditional methods (intra- and interspecific hybridization) are used. the process is considerably accelerated by the use of molecular markers generated by techniques such as restriction fragment length polymorphism (rflp), random amplified polymorphic dna (rapd), simple sequence repeat (ssr), or single nucleotide polymorphism (snp) analysis. the application of molecular markers has allowed the isolation of nearly 20 resistance genes (r-genes) from genetically well-characterized plant species and has shown that many of them are grouped into specific chromosomal regions (they form clusters).
of these resistance genes, some have been transferred to plant species other than their species of origin through molecular cloning techniques, with the help of specific vectors that ensure the transfer of large fragments; this revealed that, transferred in this way, the r-genes act synergistically and provide lasting resistance to some diseases. as already mentioned, few r-genes have been shown to provide lasting control of plant diseases. this is the case for the pepper bs2 gene and the rice xa21 gene, which provide resistance to different phytopathogenic strains of x. campestris or x. oryzae in the species into which the genes have been cloned (e.g., tomato). the resistance conferred by these genes is due to the recognition of proteins produced by phytopathogenic bacteria or fungi, which triggers a hypersensitive response in the plant and necrosis of the affected tissues. another example of a long-acting r-gene is the recessive barley mlo gene, which makes the plants containing it resistant to all e. graminis strains through the accumulation in plant cells of a phenolic antifungal compound named p-coumaroylhydroxyagmatine. it is expected that this gene will be used to suppress, via the antisense technique, the dominant mlo gene of wheat or other plant species susceptible to erysiphe sp. in vitro processing of r-genes followed by their introduction into new hosts provides new possibilities for resistance. this is the case for the tomato avrpto gene which, after cloning under the control of a strong promoter such as the camv 35s promoter, confers on transformed tomato plants resistance to p. syringae pv. tomato strains and to unrelated pathogens such as x. campestris and c. fulvum. researchers' efforts to obtain resistance systems applicable to a larger number of plant species are not limited to r-genes but also include systemic acquired resistance (sar) mechanisms.
several genes involved in sar have been isolated and characterized, of which the npr1 gene, encoding a transcriptional regulator, is a key gene in this system. overexpression of this gene increases the resistance of arabidopsis thaliana or rice plants to a wide variety of pathogens (knecht et al. 2010). an interesting behavior has been observed for the myb1 gene, which is induced by tobacco mosaic virus (tmv) infection of resistant tobacco plants and which directs the synthesis of a transcription factor that binds to the promoter of a pathogenesis-related gene (the pr1a gene). modifying the expression level of the myb1 gene in tobacco plants leads to increased resistance to viral infection (with tmv) as well as to the pathogenic fungus r. solani (raymond et al. 2007). in addition, another recently cloned gene, pad4, isolated from arabidopsis thaliana, has proved extremely interesting for the development of broad-spectrum resistance: overexpression of the gene in plants increases resistance to phytopathogens. numerous biotech companies and universities have begun to assess the performance of plants that express antifungal proteins through both in vitro and field experiments. plants containing r-genes or genes involved in sar have been examined, as have plants carrying genes such as those encoding glucose oxidase (ago) from aspergillus sp., defensins, h2o2-generating enzymes, glucanases, or chitinases. although in the laboratory potato plants containing the fungal gene for glucose oxidase showed increased resistance to phytopathogens, the results were inconclusive in the field. for other genes, such as those for chitinase and intracellular α-1,3-glucanase, overexpression in tomato plants resulted in significant resistance to fusarium oxysporum (werner et al. 2016). companies have created nbss that farmers can use to detect information such as air pollution, soil humidity, and so on.
however, given that plants are themselves very good nbss and naturally react to external stimuli and changes, they can be used instead. this is the idea behind the latest nbs initiative, called advanced plant technologies. the idea is that, through genetic manipulation, researchers will be able to create self-sustaining plants that act as a kind of nbs for detecting chemical substances, pathogens, etc. this is not the first time the idea of plants as nbss has appeared, but earlier approaches drew on resources that the plants needed to survive, which in turn reduced their resistance. the new idea is that such nbss can sustain themselves, which means they can work longer in isolated places. in the future, plants could be used to detect when a biological attack will take place. in addition, because they are plants, they can be placed anywhere, and nobody will think twice about their presence or suspect that they might be nbss. through such examples, nature teaches us to optimize by exploring diversity. in this context, integrated nanoscale systems (nbss), energy sources (biofuel cells) that use plant metabolism, manipulators for nanoscale surgery, and drug reservoirs embedded in intelligent polymers are being explored. all of these are virus-sized. the proliferation of these types of nbss leads to a large number of applications, and their combination will in the future lead to microminiaturization, versatility, and functionality. plant species present great genetic diversity, and wild species constitute a large genetic reservoir from which genes of practical importance can be obtained. plant genetic engineering research has great theoretical significance, advancing knowledge of how the genes of these organisms act, the effects of phytohormones on plant development, gene inactivation mechanisms, etc.
by applying molecular biology techniques, useful information can be obtained on the genome of plants used for amelioration, the localization of genes of interest, the degree of relationship between different species, etc. as far as practical applications are concerned, a number of significant results have been obtained so far, some of which have already been applied, such as virus-resistant plants, plants resistant to the attack of pests, herbicide-resistant plants, plants of horticultural interest (ornamental plants with new phenotypes, plants producing softening resistant fruits), plants capable of synthesizing secondary metabolites in increased amounts, and plants producing "edible" antibodies, and enumeration could continue.
key: cord-018145-kssjdn8y authors: niemann, heiner; kues, wilfried; carnwath, joseph w. title: transgenic farm animals: current status and perspectives for agriculture and biomedicine date: 2009 journal: genetic engineering in livestock doi: 10.1007/978-3-540-85843-0_1 sha: doc_id: 18145 cord_uid: kssjdn8y the first transgenic livestock were produced in 1985 by microinjection of foreign dna into zygotic pronuclei.
this was the method of choice for more than 20 years, but more efficient protocols are now available, based on somatic cell nuclear transfer (scnt) which permits targeted genetic modifications. although the efficiency of transgenic animal production by microinjection technology is low, many animals with agriculturally important transgenic traits were produced. typical applications included improved carcass composition, lactational performance, and wool production as well as enhanced disease resistance and reduced environmental impact. transgenic animal production for biomedical applications has found broad acceptance. in 2006 the european medicines agency (emea) approved the commercialization of the first recombinant protein drug produced by transgenic animals. recombinant antithrombin iii, produced in the mammary gland of transgenic goats, was launched as atryn® for prophylactic treatment of patients with congenital antithrombin deficiency. pigs expressing human immunomodulatory genes have contributed to significant progress in xenotransplantation research with survival periods of non-human primates receiving transgenic porcine hearts or kidneys approaching six months. lentiviral vectors and small interfering ribonucleic acid (sirna) technology are also emerging as important tools for transgenesis. as the genome sequencing projects for various farm animal species progress, it has become increasingly practical to target the removal or modification of individual genes. we anticipate that this approach to animal breeding will be instrumental in meeting global challenges in agricultural production in the future and will open new horizons in biomedicine.
the production of transgenic farm animals is extraordinarily labor and cost intensive and depends upon advanced techniques in molecular biology, cell culture, reproductive biology and biochemistry (this contribution is mainly based on reviews of the authors, including niemann et al. 2005). the transfer of the foreign dna is only one step in this process.
critical steps involved in the production of transgenic farm animals are:
table 1. milestones (live offspring) in transgenesis and somatic cloning in farm animals. modified from niemann et al. 2005.
milestone | strategy | reference
first transgenic sheep and pigs | microinjection of dna into one pronucleus of a zygote | hammer et al. 1985
embryonic cloning of sheep | nuclear transfer using embryonic cells as donor cells | willadsen 1986
cloning of sheep with somatic donor cells | nuclear transfer using adult somatic donor cells | wilmut et al. 1997
transgenic sheep produced by nuclear transfer | random integration of the construct | schnieke et al. 1997
transgenic cattle produced from fetal fibroblasts and nuclear transfer | random integration of the construct | cibelli et al. 1998
generation of transgenic cattle by mmlv injection | infection of oocytes with helper viruses | chan et al. 1998
gene targeting in sheep | gene replacement and nuclear transfer | mccreath et al. 2000
trans-chromosomal cattle | artificial chromosome | kuroiwa et al. 2002
knockout in pigs | heterozygous knock-out | dai et al. 2002; lai et al. 2002
homozygous gene knockout in pigs | homozygous knock-out | phelps et al. 2003
the first successful gene transfer method in animals (mouse) was based on the microinjection of foreign dna into zygotic pronuclei. this was used to produce the first transgenic livestock more than 20 years ago (hammer et al. 1985). despite the inherent inefficiency of microinjection technology, a broad spectrum of genetically modified large animals has been generated since then for applications in agriculture and biomedicine, with the use of transgenic livestock for 'gene pharming' already at the level of commercial exploitation.
however, microinjection has several major shortcomings including low efficiency, random integration and variable expression patterns which mainly reflect the site of integration. research has focused on the development of alternate methodologies for improving the efficiency and reducing the cost of generating transgenic livestock. these include sperm mediated dna transfer (lavitrano et al. 1989; lavitrano et al. 2002; chang et al. 2002), intracytoplasmic injection (icsi) of sperm heads carrying foreign dna (perry et al. 1999; perry et al. 2001), injection or infection of oocytes and/or embryos by different types of viral vectors (haskell and bowen 1995; chan et al. 1998; hofmann et al. 2004), rna interference technology (rnai) (clark and whitelaw 2003) and the use of somatic cell nuclear transfer (scnt) (cibelli et al. 1998; baguisi et al. 1999; dai et al. 2002; lai et al. 2002; table 1). to date, somatic cell nuclear transfer, which has been successful in 13 species, holds the greatest promise for significant improvements in the generation of transgenic livestock (figure 1). the typical success rate (live births) of mammalian somatic nuclear transfer is low, usually only 1-2% of the transferred embryos. cattle seem to be an exception to this rule, as levels of 15-20% can be reached. recently, we have also obtained a significant improvement of porcine cloning efficiency by better selection and optimized treatment of the recipients, specifically by providing a 24 h asynchrony between the pre-ovulatory oviducts of the recipients and the reconstructed embryos. presumably, this gives the embryos additional time to achieve the necessary level of nuclear reprogramming. the improved protocol has resulted in pregnancy rates of ~80% and only a slightly reduced mean litter size. these results show that the efficiency of scnt is likely to be improved in the near future with significant impact on transgenic animal production.
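the live-birth rates quoted above translate directly into the number of embryo transfers needed per cloned offspring. as a minimal sketch, assuming that each transferred embryo succeeds independently with the stated probability (a simplification that the text itself does not make), the expected number of transfers per live birth is 1/p. the function name and the rate labels below are illustrative only.

```python
def expected_transfers_per_birth(success_rate: float) -> float:
    """Expected number of transferred embryos per live birth.

    Assumes independent outcomes with a fixed per-embryo success rate,
    so the count follows a geometric distribution with mean 1 / p.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in the interval (0, 1]")
    return 1.0 / success_rate


# Live-birth rates per transferred embryo as quoted in the text.
quoted_rates = {
    "most mammals, low end": 0.01,
    "most mammals, high end": 0.02,
    "cattle, low end": 0.15,
    "cattle, high end": 0.20,
}

for label, rate in quoted_rates.items():
    transfers = expected_transfers_per_birth(rate)
    print(f"{label}: about {transfers:.0f} transfers per live birth")
```

under these assumptions, a species at the low end of the typical range needs on the order of a hundred transfers per live offspring, whereas cattle need fewer than ten, which is why the text treats cattle as an exception.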
further qualitative improvements may be derived from technologies that allow precise modifications of the genome including targeted chromosomal integration by site-specific dna recombinases, such as cre or flp, or methods that allow temporally and/or spatially controlled transgene expression (capecchi 1989; kilby et al. 1993). the genomes of farm animals (cattle, chicken, horse, dog, bee) have been sequenced and annotated in http://www.ensembl.org and http://www.ncbi.org (both july 2008). they thus provide new opportunities for selective breeding and transgenic animal production. after 12,000 years of domestic animal selection (copley et al. 2003) based on the random mutations resulting from environmental factors such as radiation and oxidative injury, technology is now available to introduce or remove genes with known functions. here, we provide a comprehensive overview on the current status of transgenic animal production and look at future implications. we focus on large domestic species and do not cover recent developments in poultry breeding or in aquaculture. gene 'pharming' entails the production of recombinant pharmaceutically active human proteins in the mammary gland or blood of transgenic animals. this technology overcomes the limitations of conventional and recombinant dna based production systems (rudolph 1999) and has advanced to the stage of commercial application (ziomek 1998; dyck et al. 2003; schnieke this proceedings and walsh this proceedings). the mammary gland is the preferred production site mainly because of the quantities of protein that can be produced in this organ using mammary gland specific promoter elements and established methods for extraction and purification of the respective protein (rudolph 1999).
guidelines developed by the food and drug administration (fda) of the usa require monitoring the animals' health in a specific pathogen free (spf) facility, sequence validation of the gene construct, characterization of the isolated recombinant protein, and monitoring the genetic stability of the transgenic animals over several generations. this has necessitated, for example, the use of animals from scrapie free countries (new zealand) and maintenance of production animals under strict hygienic conditions. several products derived from the mammary glands of transgenic goats and sheep have progressed to advanced clinical trials (echelard et al. 2006). phase iii trials for antithrombin iii (atiii) (atryn® from gtc-biotherapeutics, usa), produced in the mammary gland of transgenic goats, have been completed and the recombinant product was approved as a drug by the european medicines agency (emea) in august 2006. this protein is the first product from a transgenic farm animal to be accepted as a fully registered drug. atryn® is registered for treatment of heparin resistant patients undergoing cardiopulmonary bypass procedures. gtc-biotherapeutics has also expressed at least eleven other transgenic proteins in the mammary gland of transgenic goats at concentrations of more than one gram per liter. the enzyme α-glucosidase (pharming bv) from the milk of transgenic rabbits has orphan drug status and has been successfully used for the treatment of pompe's disease (van den hout et al. 2001). similarly, recombinant c1 inhibitor (pharming bv), produced in the milk of transgenic rabbits, has completed phase iii trials and is expected to receive registration in the near future. the overall global market for recombinant proteins from domestic animals is expected to exceed $1 billion in 2008 and to reach $18.6 billion in 2013.
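the two market figures quoted above imply a steep growth rate. as a rough sketch, treating $1 billion as the 2008 value (the text says only that the market is expected to exceed it) and assuming constant compound growth, the implied annual rate is (end/start)^(1/years) - 1. the function name is illustrative and not taken from the source.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Constant compound annual growth rate linking two values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the text, in billions of US dollars.
# Assumption: the 2008 market is taken as exactly 1.0, its stated lower bound.
growth = implied_annual_growth(start_value=1.0, end_value=18.6, years=2013 - 2008)
print(f"implied compound annual growth: {growth:.1%}")
```

under these assumptions the forecast corresponds to growth of roughly 79% per year; a 2008 market above $1 billion would lower that estimate.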
an interesting new development is the production of recombinant proteins in the mammary gland of transgenic animals for use as antidotes against organophosphorus compounds used as insecticides in agriculture and chemical warfare. butyrylcholinesterase is a potent prophylactic agent against these compounds. recombinant butyrylcholinesterase has been produced at a concentration of 5 g/liter in the mammary gland of transgenic mice and goats (huang et al. 2007). the recombinant product was biologically active and had a half life in vitro which was sufficient to provide protection against organophosphorus intoxication. transgenic goats can produce sufficient butyrylcholinesterase to protect all humans at risk of organophosphorus poisoning. some gene constructs have failed to produce economically significant amounts of protein in the milk of transgenic animals, indicating that the technology needs further refinement to ensure consistent high-level expression. this is particularly true for genes having complex regulation, such as those coding for erythropoietin (epo) or human clotting factor viii (hfviii) (hyttinen et al. 1994; massoud et al. 1996; niemann et al. 1999). with the advent of transgenic plants that also produce pharmacologically active proteins, there is now an array of recombinant technologies that will allow selection of an appropriate production system for each required protein. the use of somatic nuclear transfer will accelerate production of transgenic animals for mammary gland specific synthesis of recombinant proteins. numerous monoclonal antibodies are being produced in the mammary gland of transgenic goats, and cloned transgenic cattle have been created which produce a recombinant bi-specific antibody in their blood (grosse-hovest et al. 2004). when purified from serum, this antibody is stable and mediates target-cell restricted t cell stimulation and tumor cell killing. an interesting new development is the generation of trans-chromosomal animals.
a human artificial chromosome (hac), containing the complete sequences of the human immunoglobulin heavy and light chain loci, has been introduced into bovine fibroblasts, which were then used in nuclear transfer. trans-chromosomal bovine offspring were obtained that expressed human immunoglobulin in their blood. this system is a significant step forward in the production of therapeutic human polyclonal antibodies (kuroiwa et al. 2002). follow-up studies showed that the hac was maintained in most animals for several years in first generation cattle (robl et al. 2007). how the hacs behave during meiotic cell divisions remains to be shown. functional human hemoglobin has been produced in transgenic swine. the transgenic protein could be purified from the porcine blood and showed oxygen binding characteristics similar to natural human hemoglobin. the main obstacle was that only a small proportion of porcine red blood cells contained the human form of hemoglobin (swanson 1992). alternate approaches to produce human blood substitutes have focused on linking hemoglobin to the superoxide-dismutase system (d'agnillo and chang 1998). today more than 250,000 people are alive only because of a successful human organ transplantation (allotransplantation). ironically, the success of organ transplantation technology has led to an acute shortage of appropriate organs, because cadaveric and live organ donation falls far short of meeting the demand in western societies. to close the growing gap between demand and availability of appropriate organs, transplant surgeons are now considering the use of xenografts from domesticated pigs (platt and lin 1998; kues and niemann 2004; yang and sykes 2007). prerequisites for successful xenotransplantation are: (i) overcoming the immunological hurdles, (ii) preventing the transmission of pathogens from the donor animal to the human recipient, and (iii) compatibility of donor organs with human physiology.
with a discordant donor species such as the pig, it is necessary to overcome both hyperacute rejection (har) and acute vascular rejection (avr). the two strategies that have been successfully explored for long term suppression of the har of porcine xenografts are: (i) transgenic synthesis of human proteins regulating complement activity (rcas) in the donor organ (cozzi and white 1995; bach 1998; platt and lin 1998) and (ii) inactivation of the genes producing antigenic structures on the surface of the donor organ, e.g. the α-gal-epitope (dai et al. 2002; lai et al. 2002; phelps et al. 2003). prolonged survival of xenotransplanted porcine organs where the 1,3-α-galactosyltransferase (α-gal) gene has been knocked out has been demonstrated. survival rates of up to six months have been achieved with transplanted porcine hearts (kuwaki et al. 2005) and survival of up to three months has been obtained with kidneys transplanted from α-gal knockout pigs to baboons (yamada et al. 2005). the current approach to increasing survival time beyond six months is to create donor pigs with multiple transgenes that block a range of additional immunological barriers. to this end, we have recently produced triple transgenic pigs expressing either human thrombomodulin (htm) or human heme oxygenase-1 (hho-1) on top of one or two rcas to suppress both har and the later stage coagulatory disorders observed in experimental porcine-to-primate xenotransplantation. reproducible survival of porcine xenografts for more than six months in non-human primate recipients is considered to be a necessary precondition to starting clinical trials with human patients. a particularly promising strategy for achieving long-term xenograft survival is to induce tolerance by creating permanent chimerism in the recipient by intraportal injection of embryonic stem cells (fändrich et al. 2002) or by co-transplantation of vascularized thymic tissue (yamada et al. 2005).
long term tolerance of hla-mismatched kidneys has recently been demonstrated in humans (kawai et al. 2008). extensive research has revealed that the risk of porcine endogenous retrovirus (perv) transmission to human patients is low, opening the door for preclinical testing of xenografts (switzer et al. 2001; irgang et al. 2003). rna interference (rnai) is a promising method for knocking down the already low level of perv expression in porcine somatic cells. using rnai mediated knockdown, perv expression has been further reduced in porcine somatic cells for 4-6 months; these cells were successfully used in scnt and gave normal piglets (dieckhoff et al. 2007; dieckhoff et al. 2008). rnai mediated perv expression knockdown provides an additional level of safety for porcine-to-human xenotransplantation. although additional refinements will always be possible, it is expected that appropriate lines of transgenic pigs will be available as organ donors within the next five to ten years. transplantation of pancreatic islets from (transgenic) pigs may take place even earlier. guidelines for the clinical application of porcine xenotransplants already exist in the usa and are currently being developed in other countries. the worldwide consensus is that the technology is ethically acceptable provided that the individual's well-being does not compromise public health (e.g., the risk of perv recombination). the improvement in quality of life for patients receiving conventional allotransplants is dramatic, but xenotransplantation is also economically attractive because the cost of maintaining patients with severe kidney disease on dialysis or long term treatment of patients with chronic heart disease can be greater than the cost of a successful transplantation. preliminary functional data on porcine kidneys and hearts in non-human primates are promising, although the long term effect of porcine organs on human physiology is to a great extent unexplored (ibrahim et al. 2006).
the physiology, anatomy, and life span of mice differ significantly from humans, making the rodent model inappropriate for many human diseases. farm animals, such as pigs, sheep or cattle, may be more appropriate models in which to study the treatment of human diseases such as atherosclerosis, non-insulin-dependent diabetes, cystic fibrosis, cancer and neuro-degenerative disorders, which require longer periods of observation than is possible with mice (theuring et al. 1997; palmarini and fan 2001; li and engelhardt 2003; hansen and khanna 2004). cardiovascular disease is increasing in ageing western societies where coronary artery diseases already account for the majority of deaths. because genetically modified mice do not manifest myocardial infarction or stroke as a result of atherosclerosis, new animal models, such as pigs that exhibit similar pathologies, are needed to develop effective therapeutic strategies (rapacz and hasler-rapacz 1989; grunwald et al. 1999). an important porcine model has been developed for the rare human eye disease retinitis pigmentosa (pr) (petters et al. 1997). patients with pr suffer from night blindness early in life due to loss of photoreceptors. transgenic pigs with a mutated rhodopsin gene have a phenotype quite similar to the human patients and effective treatments are being developed (mahmoud et al. 2003). an important aspect of scnt derived large animal models of human diseases (and the development of regenerative therapies using these models) is that somatic cloning per se does not necessarily result in shortened telomeres as once feared and thus does not necessarily lead to premature ageing (schätzlein and rudolph 2005). telomeres are the repetitive dna sequences at the ends of the chromosomes; they are crucial for chromosomal structural integrity and function and are thought to be related to lifespan.
telomere shortening is correlated with severe limitation of the regenerative capacity of cells, the onset of cancer, ageing and chronic disease with significant impact on human lifespan (schätzlein and rudolph 2005). expression of telomerase, which is the enzyme primarily responsible for the formation and rebuilding of telomeres, is suppressed in most somatic tissues postnatally. however, recent studies have revealed that telomere length is (re-)established early in preimplantation development at the morula-blastocyst transition due to telomerase activity (schätzlein et al. 2004).
agricultural exploitation of transgenic animal technology lags behind applications in biomedicine. nevertheless, table 2 gives an overview of work in the production of transgenic animals with improved agricultural traits. transgenic pigs bearing a hmt-pgh construct (human metallothionein promoter driving the porcine growth hormone gene) showed significant improvement in economically important traits including growth rate, feed conversion and body composition (muscle/fat ratio) without the pathological phenotype seen with earlier gh constructs (nottle et al. 1999). similarly, transgenic pigs carrying the human insulin-like growth factor-i gene (higf-i) had ~30% larger loin mass, ~10% more carcass lean tissue and ~20% less total carcass fat (pursel et al. 1999). unfortunately, commercialization of these pigs has been postponed due to the current lack of public acceptance of genetically modified foods. an important step towards the production of more healthful pork products was made by creating pigs with a desaturase gene, derived either from spinach or from caenorhabditis elegans, which increases the non-saturated fatty acid content in the skeletal muscles of these animals.
the higher ratio of unsaturated to saturated fatty acids means more healthful pork, since it is well known that a diet rich in non-saturated fatty acids is associated with a reduced risk of stroke and coronary diseases in humans (niemann 2004; saeki et al. 2004; lai et al. 2006).
the physicochemical properties of milk are mainly due to the ratio of casein variants, making these a prime target for the improvement of milk composition. dairy production is an attractive field for targeted genetic modification (yom and bremel 1993; karatzas and turner 1997) and it is possible to produce milk with a modified lipid composition by modulation of the enzymes involved in lipid metabolism and to increase curd and cheese yield by enhancing expression of the casein gene family in the mammary gland. the bovine casein ratio has already been altered by the over-expression of beta- and kappa-casein, demonstrating the potential of transgenic technology for improving the economic value of bovine milk (brophy et al. 2003). it should also be possible to create 'hypoallergenic' milk by knocking out or knocking down the β-lactoglobulin gene. one could envision the production of enhanced 'infant milk' containing human lactoferrin or the production of milk which resists bacterial contamination by expressing lysozyme, the antibacterial component of egg white and human tears. to generate lactose-free milk, a knockout or knockdown at the α-lactalbumin locus would suppress this key step in milk sugar synthesis. lactose reduced or lactose-free milk would render dairy products suitable for consumption by the large proportion of the world's adult population who do not produce an active intestinal lactase. lactose is the major osmotically active substance in milk and its absence might be expected to interfere with milk secretion.
however, a lactase construct has been tested in the mammary gland of transgenic mice and in hemizygous mice; this reduced lactose content by 50-85% without altering milk secretion (jost et al. 1999). on the other hand, experimental transgenic mice with a homozygous knockout for α-lactalbumin could not nurse their offspring because of the high viscosity of their milk (stinnakre et al. 1994). in the pig, increased transgenic expression of a bovine lactalbumin construct in the mammary gland resulted in increased lactose content and increased milk production, which in turn improved survival and development of the piglets (wheeler et al. 2001). increased survival of piglets at weaning would provide significant commercial benefits to the producer and improved animal welfare. these findings demonstrate the feasibility of producing significant alterations in milk composition by application of an appropriate transgenic strategy.
transgenic sheep carrying a keratin-igf-i construct expressed in their skin produced 6.2% more clean fleece than non-transgenic controls and no adverse effects on health or reproduction were observed (damak et al. 1996a, b). similar efforts to alter wool production by transgenic modification of the cysteine pathway have met with more limited success, although it is known that cysteine is the rate limiting biochemical factor for wool growth (ward 2000).
phytase transgenic pigs have been developed to address the problem of manure-related environmental pollution. these pigs carry a bacterial phytase gene under transcriptional control of a salivary gland specific promoter, which allows the pigs to digest plant phytate. without the bacterial enzyme, phytate passes through the animal undigested and pollutes the environment with phosphorus if uncontained. with the bacterial enzyme, fecal phosphorus output was reduced up to 75% (golovan et al. 2001).
these environmentally friendly pigs may be used for commercial production in canada within the next few years.
in most cases, susceptibility to pathogens represents the interplay of numerous genes, i.e. the trait is polygenic in nature. however, some genetic loci are known to confer resistance against specific diseases. transgenic strategies to enhance disease resistance include the transfer of major histocompatibility-complex (mhc) genes, t cell receptor genes, immunoglobulin genes, genes that affect lymphokines, or specific disease resistance genes (müller and brem 1991). a prominent example of a specific disease resistance gene is the murine mx-gene. production of the mx1-protein is induced by interferon. this was discovered in inbred mouse strains that were resistant to influenza viruses (staeheli 1991). microinjection of an interferon- and virus-inducible mx-construct into porcine zygotes resulted in two transgenic pig lines which expressed the mx-mrna; but no mx protein was detected (müller et al. 1992). the bovine mx1 gene was identified and shown to confer antiviral activity when transfected into vero cells (baise et al. 2004). transgenic constructs bearing the immunoglobulin-a (iga) gene have been successfully introduced into pigs, sheep and mice in an attempt to increase resistance against infections (lo et al. 1991). expression of the murine iga gene was successful in two transgenic pig lines but only the light chains could be detected and the iga-molecules showed only marginal binding to phosphorylcholine (lo et al. 1991). on the other hand, high levels of monoclonal murine antibodies with a high binding affinity for their specific antigen have been produced in transgenic pigs (weidle et al. 1991). attempts to increase ovine resistance to visna virus infection by transgenic production of visna envelope protein have been reported (clements et al. 1994).
the transgenic sheep developed normally and expressed the viral gene without pathological side effects. however, the transgene was not expressed in monocytes, the target cells of the viral infection, and antibodies were detected after artificial infection of the transgenic animals (clements et al. 1994) . passive immunity has been induced against an economically important porcine disease in a transgenic mouse model (castilla et al. 1998 ). these transgenic mice secrete a recombinant antibody in their milk that neutralized the corona virus responsible for transmissible gastroenteritis (tgev) and this conferred resistance to tgev. strong mammary gland specific expression was achieved over the entire period of lactation. extension of this work to pigs is promising. knockout of the prion protein is the only secure way to prevent infection and transmission of spongiform encephalopathies including scrapie and bse (weissmann et al. 2002) . it was possible to knock out the ovine prion locus; however, the cloned lambs carrying the knockout locus died shortly after birth (denning et al. 2001 ). on the other hand, cloned cattle with a knockout for the prion locus have been successfully produced and indeed show clear evidence of resistance to bse infection (richt et al. 2007 ). transgenic animals with modified prion genes will be an appropriate model for studying the development of spongiform encephalopathies in humans and are crucial for developing strategies for the elimination of prion carriers from the farm animal population. this work is a prerequisite for the future production of recombinant proteins for human medicine in the blood or the mammary glands of transgenic cattle. the level of anti-microbial peptides (lysozyme and lactoferrin) in human milk is many times higher than in bovine milk and transgenic expression of the human lysozyme gene in mice causes a significant reduction in bacterial contamination and a reduced frequency of mammary gland infections . 
lactoferrin has bactericidal and bacteriostatic effects in addition to being the main source of iron in milk. these properties make an increase in lactoferrin levels in the bovine mammary gland a practical way to improve milk quality. human lactoferrin has, in fact, been expressed at high levels in the milk of transgenic mice and cattle (krimpenfort et al. 1991; platenburg et al. 1994) and was associated with an increased resistance against mammary gland diseases (van berkel et al. 2002). similarly, lysostaphin was shown to confer specific resistance against mastitis caused by staphylococcus aureus. mastitis resistant cows have been produced by expressing a lysostaphin gene construct in the mammary gland (wall et al. 2005).
as discussed above, most work in transgenic technology has focussed on livestock species either for biomedical or agricultural purposes. however, the methodology is becoming routine and recent applications include the development of new varieties of ornamental fish. for example, fluorescent green transgenic medaka (oryzias latipes, rice fish) have been produced and approved for sale in taiwan (chou et al. 2001). the fluorescent medaka is currently marketed by the taiwanese company taikong. the "glofish ®" is a trademarked transgenic zebrafish (danio rerio) expressing red fluorescent protein from a sea anemone under the transcriptional control of a muscle-specific promoter (gong et al. 2003). green and yellow fluorescent proteins have also been introduced into the zebrafish to give different fluorescence colors. yorktown technologies (www.glofish.com, july 2007) initiated commercial sales of the transgenic zebrafish in the united states with retail prices of approximately $5.00 each. the glofish ® thus became the first transgenic animal freely distributed throughout the usa. a recent report of the fda contained no evidence that glofish ® represents a risk (us fda, 2003).
commercialization of fluorescent fish has gone forward in several countries other than the usa, including taiwan, malaysia, and hong kong, whereas marketing in australia, canada and the european union is currently prohibited.
transgenic applications will undoubtedly become more widespread if even more efficient gene transfer methods are developed and specific genetic traits can be targeted. some of the emerging technologies are described below. lentiviruses have proven to be efficient vectors for the delivery of genes into oocytes and zygotes. for example, transgenic cattle have been produced by injecting lentiviral vectors into the perivitelline space of oocytes (hofmann et al. 2004) and injection of lentiviruses into the perivitelline space of porcine zygotes resulted in a high proportion of transgene expressing founder animals from which several lines of transgenic pigs were established (hofmann et al. 2003; whitelaw et al. 2004). lentiviral mediated gene transfer in livestock has generated unprecedented high yields of transgenic animals due to multiple integration events. unfortunately, multiple integration brings the disadvantage that there is an increased probability of unwanted side effects caused by oncogene activation or insertional mutagenesis. additional problems which have been identified are gene silencing due to the presence of viral sequences (hofmann et al. 2006) and a high frequency of mosaicism in founder animals.
transgenic mice and farm animals harbouring the first generation of conditional promoter elements showed expression in response to heavy metals or steroid hormones but suffered from high basal expression levels and pleiotropic effects (lee et al. 1981; mayo et al. 1982; miller et al. 1989; pursel et al. 1989).
newer, binary expression systems based on prokaryotic control elements are responsive to exogenous iptg (isopropyl-β-d-thiogalactopyranoside), ru-486, ecdysone or tetracycline derivatives and give more tightly controlled expression (lewandowski 2001; corbel and rossi 2002). the first tetracycline system that was successfully used in mice required two dna constructs. one was for doxycycline controlled expression of a transactivator and the other contained regulatory elements which conferred transactivator dependent expression of the target gene. these dna constructs were typically integrated into two different lines of transgenic mice. after crossbreeding the two lines of transgenic mice, their offspring expressed the target gene only after stimulation with doxycycline (furth et al. 1994). unfortunately, the long generation intervals make this approach unfeasible in livestock species (niemann and kues 2003). recently, we reported on a tetracycline-responsive bicistronic expression cassette (nta) in which expression was amplified by transactivator mediated positive feedback. this was used to produce the first tetracycline controlled transgenic expression in a farm animal. the auto-regulatory cassette was integrated at a single chromosomal site in the pig genome after pronuclear dna injection and was designed to give ubiquitous expression of a human regulator of complement activation (rca) independent of cellular transcription cofactors. expression from this construct could be inhibited reversibly by feeding the animals doxycycline (tet-off system). in ten transgenic pig lines in which only one copy of the nta cassette was integrated, the transgene was silenced in all tissues with the exception of skeletal muscle where expression was limited to a small number of discrete muscle fibers (niemann and kues 2003).
however, crossbreeding of lines to produce animals with two nta cassettes resulted in reactivation of the cassettes and strong, tissue-independent, tetracycline-sensitive rca expression. it seems that crossing the transgenic pig lines, which doubled the level of transactivator, was able to overcome epigenetic silencing. transgene expression in the double transgenic pigs was inversely correlated with the level of nta cassette dna methylation. this approach highlights the importance of understanding epigenetic (trans)-gene regulation in the pig.
pluripotent embryonic stem (es) cells have the ability to participate in organ and even germ cell development following injection into blastocysts or aggregation with morulae (rossant 2001). true es cells (i.e. those able to contribute to the germ line) are currently only available from inbred mouse strains (kues et al. 2005a). these murine es cells have become an important tool for gene knock-out and knock-in experiments and to study large chromosomal rearrangements (downing and battey 2004). es-like cells and primordial germ cell cultures have been reported for several farm animal species, and es-like cells which can produce chimeric animals albeit without germ line contribution have been reported in swine (anderson 1999; shim et al. 1997; wheeler 1994) and cattle (cibelli et al. 1998). recent data indicate that somatic stem cells have a much broader developmental potential than previously assumed (kues et al. 2005b). whether these cells will improve the efficiency of chimera generation or somatic nuclear transfer in farm animals has yet to be shown conclusively (kues et al. 2005b; hornen et al. 2007). pluripotent cells are a valuable tool for improved production of animals with targeted genetic modification. a revolutionary breakthrough in direct nuclear reprogramming of mouse somatic cells was recently reported (takahashi and yamanaka 2006; okita et al. 2007; wernig et al. 2007).
cells transfected with constructs expressing oct4, sox2, myc and klf4, carried in retroviral vectors, were reprogrammed to a pluripotent state and were indistinguishable from es cells generated from fertilized embryos with regard to differentiation potential and morphology. these induced pluripotent stem cells (ips), derived from somatic cells, were able to populate the germ line upon injection into blastocysts and after transfer into recipient mice, clearly indicating complete reprogramming (okita et al. 2007; wernig et al. 2007; maherali et al. 2007). the same genes have recently been found to be effective in reprogramming human fibroblasts and other human somatic cells into cells with pluripotent properties (takahashi et al. 2007). this affords a new approach to the generation of pluripotent cells from farm animal species.
transplantation of transgenic primordial germ cells into the testes is potentially an alternative approach to the generation of transgenic animals. initial experiments in mice showed that the depletion of endogenous spermatogonial stem cells by treatment with busulfan prior to transplantation is effective and permits re-colonization of the testes by donor cells. transmission of the donor haplotype to the next generation after germ-cell transplantation has been achieved in goats (honaramooz et al. 2003). current major obstacles of this strategy are the lack of efficient in vitro culture methods for primordial germ/prospermatogonial cells and the lack of efficient gene transfer techniques for these cells. recently, adeno-associated virus (aav) was found to be suitable for delivering transgenes to male germ cells, and germline transmission was reported in goats and mice (honaramooz et al. 2007). the efficiency of this approach and putative silencing of the aav introduced transgenes require further investigation.
rna interference (rnai) is a conserved post-transcriptional gene regulatory process found in most biological systems including fungi, plants and animals. the common element is double stranded rna which is cleaved to form small interfering rna (sirna) molecules 19-27 base pairs in length. a single strand of these small rna molecules is incorporated in an rnainduced silencing complex (risc) which specifically binds to the complementary sequence of its target mrna causing endonuclease mediated degradation. the result is that no protein is produced from that mrna transcript (plasterk 2002) . natural rna interference is involved in gene regulation, specifically to suppress the translation of mrnas from endogenous and exogenous viral elements, so this can be exploited for therapeutic purposes (dallas and vlassow 2006) . for transient gene knockdown, synthetic sirnas can be transfected into cells or early embryos (clark and whitelaw 2003; iqbal et al. 2007) . for stable gene repression, the sirna sequences must be incorporated into a gene construct and constitutively expressed. the combination of sirna with lentiviral vector technology is now a highly effective tool in this respect. rnai knockdown of porcine endogenous retrovirus (perv) has been demonstrated in porcine primary cells (dieckhoff et al. 2007 ) and in cloned piglets (dieckhoff et al. 2008) . sirna mediated knockdown of the prion protein (prnp) gene has been accomplished in bovine embryos (golding et al. 2006) . the modification appears to be permanent as lentiviral delivered sirna has been shown to persist for three generations in rats (tenenhaus dann et al. 2006) . the combination of sirna and lentiviral vector technology provides a method for highly efficient targeted gene knockdown for functional genetic analysis in farm animals and could easily be integrated into existing breeding programs. 
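the base-pairing rule behind sirna targeting described above can be illustrated with a minimal python sketch. it slides a window of the stated 19-27 nucleotide size along a target mrna and reports, for each window, the guide strand that would be complementary to it. the function names and the toy sequence are hypothetical and serve illustration only; this is not the design procedure of any study cited here, and practical sirna design applies further selection criteria (for example specificity screening) that are not modelled.

```python
# illustrative sketch only: function names and the toy sequence are hypothetical
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def candidate_guides(mrna: str, length: int = 21):
    """List (position, target_site, guide_strand) for every window of the mRNA.

    The guide strand is the reverse complement of the target site, i.e. the
    strand that could base-pair with that stretch of the transcript.
    """
    if not 19 <= length <= 27:
        raise ValueError("siRNA length is expected to be 19-27 nucleotides")
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if set(mrna) - set("ACGU"):
        raise ValueError("sequence contains characters other than A, C, G, U/T")
    guides = []
    for start in range(len(mrna) - length + 1):
        site = mrna[start:start + length]
        guides.append((start, site, reverse_complement(site)))
    return guides


if __name__ == "__main__":
    toy_mrna = "AUGGCUAGCUAGGAUCCGAUCGUAGCUAGGCUA"  # made-up sequence, not a real gene
    for start, site, guide in candidate_guides(toy_mrna)[:3]:
        print(start, site, guide)
```

each returned guide is by construction the exact reverse complement of its target site, which mirrors the sequence-specific binding of the risc-loaded strand to its target mrna.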
concerns have been raised about the health of transgenic farm animals because it is known that insertional mutagenesis and other undesirable side effects can be caused by the integration and expression of recombinant gene constructs (van reenen et al. 2001, van reenen this proceedings). the health of all transgenic animals is closely monitored because of the time and money invested in their creation and because all work is basic research. a small number of studies have systematically investigated the health effects of transgenesis. a study of the effects of human growth hormone expression in pigs and sheep identified specific pathological phenotypes related to their accelerated growth rate. these problems were eliminated in subsequent transgenic animals by modifications to the gene constructs (nottle et al. 1999). in pigs transgenic for human daf and maintained under qualified pathogen free conditions, haematological parameters and blood chemistry were similar to non-transgenic controls (tucker et al. 2002). with the exception of slightly accelerated growth rates, no deviations were found. a detailed pathomorphological examination of nine lines of hemizygous pigs expressing human rcas revealed no adverse effects related to transgene expression (deppenmeier et al. 2006), providing clear evidence that transgenesis per se does not compromise animal health and welfare. investigation of animals carrying the nta bicistronic expression cassette, driving hcd59 and a tetracycline regulated transactivator, revealed that multi-transgenic animals display a normal health status (deppenmeier et al. 2006). the hemizygous lines were fertile and produced normal litter sizes.
transgenesis based on scnt is increasingly used for farm animals. in cloned animals, both pre- and postnatal development can be compromised and a proportion of scnt offspring in both ruminants and mice exhibit increased perinatal mortality.
the list of developmental abnormalities includes: extended gestation length, oversized offspring, aberrant placental function, cardiovascular problems, respiratory defects, immunological deficiencies, problems with tendons, adult obesity, kidney or hepatic malfunctions, behavioral changes and higher susceptibility to neonatal diseases, all of which are aspects of what has been called the "large offspring syndrome" (los) (renard et al. 1999; tamashiro et al. 2000; ogonuki et al. 2002; rhind et al. 2003). the incidence of los is stochastic and has not been linked to aberrant expression of any single gene or to any specific pathophysiology. the general assumption is that the underlying cause of los is faulty epigenetic reprogramming of the transferred somatic cell nucleus. despite these problems, critical surveys of the published literature have revealed that most cloned animals are healthy and develop normally (cibelli et al. 2002; panarace et al. 2007). this demonstrates that mammalian development can tolerate minor epigenetic aberrations and subtle variations in gene expression without affecting survival of cloned animals (humpherys et al. 2001). six-month-old cloned cattle do not differ from age-matched controls with regard to biochemical blood and urine parameters (lanza et al. 2001; chavatte-palmer et al. 2002), immune status (lanza et al. 2001), body score (lanza et al. 2001), somatotrophic axis (govoni et al. 2002), reproductive parameters, or milk yield and composition (pace et al. 2002). no differences were found in the meat or milk composition of bovine clones compared to age matched counterparts (tian et al. 2005; miller 2007). similar findings were reported for cloned pigs (archer et al. 2003). regulatory agencies around the world have agreed that food derived from cloned animals is safe and there is no scientific basis for questioning this (cf. the national academy of sciences committee on defining science-based concerns associated with products of animal biotechnology (national academy of sciences 2002), the expert committee from the japanese ministry of agriculture, forestry and fisheries (maff; kumugai 2002), the fda (rudenko et al. 2007; food and drug administration 2008) and efsa (european food safety agency, 2007)). since somatic cloning has only been used since 1997 and the lifespan of domestic animals is relatively long, the specific effects of cloning on longevity and senescence have not yet been fully assessed; however, preliminary data indicate no cumulative pathology even after serial cloning of mice and cattle (kubota et al. 2004). there are still insufficient numbers of transgenic farm animals produced by the newest technologies, including viral vectors and spermatogonial transgenesis, to reveal subtle effects on animal health and welfare.
biological products from any animal source are unique and must be handled differently than chemically synthesized drugs to assure their safety, purity and potency. proteins are heat labile, subject to microbial contamination, can be damaged by shear forces and can be immunogenic and allergenic. in the united states, the fda has developed guidelines to assure safe commercial exploitation of recombinant biological products. a crucial consideration with animal derived products is the prevention of transmission of pathogens from animals to humans. this requires sensitive and reliable diagnostic and screening methods for various pathogenic organisms. furthermore, transgenic farm animal based applications require strict standards of quality control. maldi-tof spectrometry is an important tool in this context (hughes et al. 2000; templin et al. 2002). meanwhile, improvements in rna isolation and in unbiased global amplification of picogram amounts of mrna enable researchers to analyse rna from single embryos (brambrink et al. 2002).
one can now monitor the entire transcriptome of a transgenic organ or organism to ensure the absence of unwanted effects (hughes et al. 2000; templin et al. 2002) . detailed genomic information and new genetic engineering tools will accelerate and improve transgenic animal production in the future. genetic technology presents not only a major opportunity to improve agricultural production but also offers exciting prospects for medical research by exploiting large animals as models of human health and disease. progress in animal genomics has broadly followed the route pioneered by the human genome project in terms of the assembly, publication and utilization of the data. this is evident in the advanced drafts of the bovine, porcine, horse, canine, chicken, and honeybee genomic maps. the ability to engineer the genome is new and the advent of new molecular tools and breeding technologies is benefiting this field. however, full realization of this exciting potential is handicapped by our currently limited understanding of epigenetic controls and the role of natural sirna and microrna in regulating gene expression. the convergence of recent advances in reproductive technology with the tools of molecular biology opens a new dimension for animal breeding. major goals are the continued refinement of reproductive biotechnology and a rapid completion of the various genome sequencing and annotation projects. induced pluripotent stem cell (ips) (takahashi and yamanaka 2006) research will play a critical role in understanding epigenetic controls. despite continued efforts, no es-cell lines with germ line potential have been established from mammals other than the mouse although es-like cells have been reported in several species and have been maintained in culture from 13 weeks to three years (gjorret and maddox-hyttel 2005) . 
true germ line competent es cell lines from farm animal species will permit exploitation of the full power of recombinant dna technology in animal breeding. this is critical for the development of sustainable and diversified animal production systems for the future. we anticipate that in the near future genetically modified animals will play a significant role in the biomedical field but that agricultural applications will develop more slowly due to the complexity of many economically important traits and to current resistance to the concept of engineered farm animals. embryonic stem cells in agricultural species hierarchical phenotype and epigenetic variation in cloned swine production of goats by somatic cell nuclear transfer conditional expression of type i interferon-induced bovine mx1 gt-pase in a stable transgenic vero cell line interferes with replication of vesicular stomatitis virus application of cdna arrays to monitor mrna profiles in single preimplantation mouse embryos cloned transgenic cattle produce milk with higher levels of b-casein and k-casein the new mouse genetics: altering the genome by gene targeting phenotyping of transgenic cloned pigs engineering passive immunity in transgenic mice secreting virus-neutralizing antibodies in milk transgenic cattle produced by reverse-transcribed gene transfer in oocytes effective generation of transgenic pigs and mice by linker based sperm-mediated gene transfer clinical, hormonal, and hematologic characteristics of bovine calves derived from nuclei from somatic cells uniform gfp-expression in transgenic medaka (oryzias latipes) at the f0 generation transgenic bovine chimeric offspring produced from somatic cell-derived stem like cells the health profile of cloned animals a future for transgenic livestock development of transgenic sheep that express the visna virus envelope gene direct chemical evidence for widespread dairying in prehistoric britain latest developments and in vivo use of the tet system: ex 
vivo and in vivo delivery of tetracycline-regulated genes the generation of transgenic pigs as potential organ donors for humans polyhemoglobin-superoxide dismutase-catalase as a blood substitute with antioxidant properties targeted disruption of the α1,3-galactosyltransferase gene in cloned pigs rnai: a novel antisense technology and its therapeutic potential improved wool production in transgenic sheep expressing insulin-like growth factor 1 targeting gene expression to the wool follicle in transgenic sheep deletion of the alpha(1,3)galactosyltransferase (ggta1) and the prion protein (prp) gene in sheep health status of transgenic pig lines expressing human complement regulator protein cd59 inhibition of porcine endogenous retroviruses (perv) in primary porcine cells by rna interference using lentiviral vectors knockdown of porcine endogenous retrovirus (perv) expression by perv-specific shrna in transgenic pigs technical assessment of the first 20 years of research using mouse embryonic stem cell lines making recombinant proteins in animals -different systems, different applications production of recombinant therapeutic proteins in the milk of transgenic animals reproductive characteristics of cloned heifers derived from adult somatic cells preimplantation-stage stem cells induce longterm allogeneic graft acceptance without supplementary host conditioning fda statement regarding glofish animal cloning: a risk assessment -final temporal control of gene expression in transgenic mice by a tetracycline-responsive promoter attemptstowards derivationand establishment of bovine embryonic stem cell-like cultures suppression of prion protein in livestock by rna interference pigs expressing salivary phytase produce lowphosphorus manure development of transgenic fish for ornamental and bioreactor by strong expression of fluorescent proteins in the skeletal muscle age-related changes of the somatotropic axis in cloned holstein calves cloned transgenic farm animals produce a 
bispecific antibody for t cell-mediated tumor cell killing identification of a novel arg->cys mutation in the ldl receptor that contributes to spontaneous hypercholesterolemia in pigs production of transgenic rabbits, sheep and pigs by microinjection spontaneous and genetically engineered animal models; use in preclinical cancer drug development efficient production of transgenic cattle by retroviral infection of early embryos efficient transgenesis in farm animals by lentiviral vectors generation of transgenic cattle by lentiviral gene transfer into oocytes epigenetic regulation of lentiviral transgene vectors in a large animal model fertility and germline transmission of donor haplotype following germ cell transplantation in immunocompetent goats adeno-associated virus (aav )-mediated transduction of male germ line stem cells results in transgene transmission after germ cell transplantation production of viable pigs from fetal somatic stem cells recombinant human butyrylcholinesterase from milk of transgenic animals to protect against organophosphate poisoning widespread aneuploidy revealed by dna microarray expression profiling epigenetic instability in es cells and cloned mice generation of transgenic dairy cattle from transgene-analyzed and sexed embryos produced in vitro selected physiologic compatibilities and incompatibilities between human and porcine organ systems parent-of-origin dependent gene-specific knock down in mouse embryos porcine endogenous retroviruses: no infection in patients treated with a bioreactor based on porcine liver cells pluripotency of mesenchymal stem cells derived from adult marrow production of low-lactose milk by ectopic expression of intestinal lactase in the mouse mammary gland toward altering milk composition by genetic manipulation: current status and challenges hla-mismatched renal transplantation without maintenance immunosuppression site-specific recombinases: tools for genome engineering generation of transgenic dairy 
cattle using in vitro embryo production serial bull cloning by somatic cell nuclear transfer the contribution of farm animals to human health from fibroblasts to stem cells: implications for cell therapies and somatic cloning isolation of murine and porcine fetal stem cells from somatic tissue epigenetic silencing and tissue independent expression of a novel tetracycline inducible system in doubletransgenic pigs safety of animal foods that utilize cloning technology cloned transchromosomic calves producing human immunoglobulin heart transplantation in baboons usion a1,3-glactosyltransferase knock out pigs as donors: initial experiments generation of cloned transgenic pigs rich in omega-3 fatty acids production of α1,3-galactosyltransferase knockout pigs by nuclear transfer cloning cloned cattle can be healthy and normal sperm cells as vectors for introducing foreign dna into eggs: genetic transformation of mice efficient production by sperm-mediated gene transfer of human decay accelerating factor (hdaf) transgenic pigs for xenotransplantation glucocorticoids regulate expression of dihydrofolate reductase cdna in mouse mammary tumour virus chimaeric plasmids conditional control of gene expression in the mouse progress toward generating a ferret model of cystic fibrosis by somatic cell nuclear transfer expression of mouse iga transgenic mice, pigs and sheep the production of recombinant pharmaceutical proteins in plants the effects of mammary gland expression of human lysozyme on the properties of milk from transgenic mice mammary gland expression of transgenes and the potential for altering the properties of milk directly reprogrammed fibroblasts show epigenetic remodeling and widespread tissue contribution lensectomy and vitrectomy decrease the rate of photoreceptor loss in rhodopsin p347l transgenic pigs the deleterious effects of human erythropoietin gene driven by the rabbit whey acidic protein gene promoter in transgenic rabbits the mouse metallothionein-i 
gene is transcriptionally regulated by cadmium following transfection into human or mouse cells expression of recombinant proteins in the milk of transgenic animals food from cloned animals is part of our brave old world expression of human or bovine growth hormone gene with a mouse metallothionein-1 promoter in transgenic swine alters the secretion of porcine growth hormone and insulin-like growth factor-i disease resistance in farm animals transgenic pigs carrying cdna copies encoding the murine mx1 protein which confers resistance to influenza virus infection safety of genetically engineered foods: approaches to assessing unintended health effects. sub-report on methods and mechanisms of genetic manipulation and cloning of animals. the national transgenic pigs expressing plant genes expression of human blood clotting factor viii in the mammary gland of transgenic sheep application of transgenesis in livestock for agriculture and biomedicine transgenic farm animals: present and future transgenic farm animals: an update application of dna array technology to mammalian embryos production and analysis of transgenic pigs containing a metallothionein porcine growth hormone gene construct early death of mice cloned from somatic cells generation of germline-competent induced pluripotent stem cells ontogeny of cloned cattle to lactation retrovirus-induced ovine pulmonary adenocarcinoma, an animal model for lung cancer how healthy are clones and their progeny: 5 years of field experience mammalian transgenesis by intracytoplasmic sperm injection efficient metaphase ii transgenesis with different transgene archetypes preovulatory embryo transfer increases cloning efficiencies in pigs production of pigs transgenic for human hemeoxygenase-i by somatic nuclear transfer genetically engineered large animal model for studying cone photoreceptor survival and degeneration in retinitis pigmentosa production of alpha 1,3-galactosyltransferasedeficient pigs rna silencing: the genomes 
immune system expression of human lactoferrin in milk of transgenic mice the future promises of xenotransplantation expression of insulin-like growth factor-i in skeletal muscle of transgenic pigs animal models: the pig hot topic: using a stearoyl-coa desaturase transgene to alter milk fatty acid composition lymphoid hypoplasia and somatic cloning cloned lambs--lessons from pathology production of cattle lacking prion protein transgenic animal production and animal biotechnology animal cloning and the fda: the risk assessment paradigm under public scrutiny stem cells from the mammalian blastocyst biopharmaceutical production in transgenic livestock functional expression of a delta12 fatty acid desaturase gene from spinach in transgenic pigs telomere length is reset during early mammalian embryogenesis telomere length regulation during cloning, embryogenesis and aging human factor ix transgenic sheep produced by transfer of nuclei from transfected fetal fibroblasts isolation of pluripotent stem cells from cultured porcine primordial germ cells intracellular immunization: a new strategy for producing diseaseresistant transgenic livestock? 
creation and phenotypic analysis of α-lactalbumin-deficient mice production of functional human hemoglobin in transgenic swine lack of cross-species transmission of porcine endogenous retrovirus infection to nonhuman primate recipients of porcine cells, tissues and organs induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors induction of pluripotent stem cells from adult human fibroblasts by defined factors postnatal growth and behavioral development of mice cloned from adult cumulus cells protein microarray technology heritable and stable gene knockdown in rats transgenic animals as models of neurodegenerative disease in humans meat and milk compositions of bovine clones the production of transgenic pigs for potential use in clinical xenotransplantation: baseline clinical pathology and organ size studies large scale production of recombinant human lactoferrin in the mik of trangenic cows enzyme therapy for pompe disease with recombinant human α-glucosidase from rabbit milk transgenesis may affect farm animal welfare: a case for systemic risk assessment cloning of mice to six generations genetically enhanced cows resist intramammary staphylococcus aureus infection transgene-mediated modifications to animal biochemistry genes encoding a mouse monoclonal antibody are expressed in transgenic mice, rabbits and pigs transmission of prions in vitro reprogramming of fibroblasts into a pluripotent es-cell-like state development and validation of swine embryonic stem cells: a review transgenic alteration of sow milk to improve piglet growth and health efficient generation of transgenic pigs using equine infectious anaemia virus (eiav) derived vector nuclear transplantation in sheep embryos viable offspring derived from fetal and adult mammalian cells xenotransplantation: current status and a perspective on the future risk assessment of meat and milk from cloned animals marked prolongation of porcine renal xenograft 
survival in baboons through the use of α1,3-galactosyltransferase gene-knockout donors and the cotransplantation of vascularized thymic tissue genetic engineering of milk composition: modification of milk components in lactating transgenic animals commercialization of proteins produced in the mammary gland key: cord-014368-4nasrbs6 authors: nan title: gene chip for viral discovery date: 2003-11-17 journal: plos biol doi: 10.1371/journal.pbio.0000003 sha: doc_id: 14368 cord_uid: 4nasrbs6 nan comparing the genomes of related organisms, researchers can see what parts of the genomes are conserved-highly conserved genes tend to be important-and then focus on these regions to track down genes and determine how they function. to construct a draft sequence of the c. briggsae genome, the researchers merged genomic data from three sources-one derived from whole-genome shotgun sequencing, another from physical genome mapping, and the third from regions of a previously "finished'' sequence. for the shotgun sequence, the researchers extracted dna from worms, randomly cut it into short pieces, sequenced them, and then assembled overlapping sequences to create thousands of stretches of contiguous dna sequence. to help fill in the gaps between these "contigs,'' stein and colleagues developed a "fingerprint'' map of the genome as a guide for aligning the shorter fragments. the map also helped them identify inconsistencies and misalignments in the genome assembly. finally, they integrated the previously finished sequence to improve the draft genome sequence. using these massive datasets, the authors produced a high-quality genome sequence; although it does not quite meet the gold standard of a "finished'' sequence, it covers 98% of the genome and has an accuracy of 99.98%. after confirming the accuracy of the draft, the researchers turned to the substance of the genome. 
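The shotgun step described above (cut the DNA into short pieces, sequence them, then join overlapping reads into contigs) can be illustrated with a toy sketch. This is a minimal greedy overlap merge, not the assembly pipeline used for the c. briggsae draft; the function names, the `min_len` threshold and the example reads are all hypothetical and chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair repeatedly; stop when no pair overlaps."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # remaining sequences are separate contigs (gaps between them)
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads sampled from one short sequence, plus one unrelated read.
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC", "TTTTCCCC"]
contigs = greedy_assemble(reads)
```

Reads that share no overlap stay as separate contigs, which is the situation the physical "fingerprint" map is described as helping to resolve. Real assemblers also have to cope with sequencing errors and repeated sequences, which this sketch ignores.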
examining two species side by side, scientists can quickly spot genes and flag interesting regions for further investigation. analyzing the organization of the two genomes, stein et al. not only found strong evidence for roughly 1,300 new c. elegans genes, but also indications that certain regions could be "footprints of unknown functional elements.'' while both worms have roughly the same number of genes (about 19,000), the c. briggsae genome has more repeated sequences, making its genome slightly larger. because the worms set out on separate evolutionary paths about the same time mice and humans parted ways-about 100 million years ago, compared to 75 million years ago-the authors could compare how the two worm genomes have diverged with the divergence between mice and humans. the worms' genomes, it seems, are evolving faster than their mammalian counterparts, based on the change in the size of the protein families (c. elegans has more chemosensory proteins than c. briggsae, for example), the rate of chromosomal rearrangements, and the rate at which silent mutations (dna changes with no functional effect) accumulate in the genome. this would be expected, the researchers point out, because generations per year are a better measure of evolutionary rate than years themselves. (generations in worms are about three days; in mice, about three months.) what is surprising, they say, is that despite these genomic differences, the worms look nearly identical and occupy similar ecological niches; this is obviously not the case with humans and mice, which nevertheless have remarkably similar genomes. both worm pairs-as well as mouse and human-also share similar developmental pathways, suggesting that these pathways may be controlled by a relatively small number of genes and that these genes and pathways have been conserved, not just between the worms, but also between the nematodes and mammals. 
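The point about generations versus years can be made concrete with rough arithmetic. The sketch below uses only the figures quoted above (divergence about 100 and 75 million years ago, generation times of about three days and about three months); treating generation time as constant, and applying the mouse figure to the whole mouse-human comparison, are simplifying assumptions made here for illustration.

```python
DAYS_PER_YEAR = 365.25


def generations_elapsed(years_since_split: float, generation_days: float) -> float:
    """Approximate number of generations along one lineage since divergence."""
    return years_since_split * DAYS_PER_YEAR / generation_days


# Figures quoted in the text; "three months" is taken as roughly 91 days (assumption).
worm_generations = generations_elapsed(100e6, 3)
mouse_generations = generations_elapsed(75e6, 91)

# How many more generations the worm lineages have had under these assumptions.
ratio = worm_generations / mouse_generations
```

Under these assumptions the worms have passed through on the order of ten billion generations against a few hundred million for mice, a gap of roughly forty-fold, which is consistent with the faster divergence of the worm genomes described above.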
this question, along with many others, can now be explored by searching the two species' genomes and comparing those elements that have been conserved with those that have changed. with the nearly complete c. briggsae genome in hand, worm biologists have a powerful new research tool. by comparing the genetic makeup of the two species, c. elegans researchers can refine their knowledge of this tiny human stand-in, fill in gaps about gene identity and function, illuminate those functional elements that are harder to find, and study the nature and path of genome evolution.
retraining the brain to recover movement
some 200,000 people live with partial or nearly total permanent paralysis in the united states, with spinal cord injuries adding 11,000 new cases each year. most research aimed at recovering motor function has focused on repairing damaged nerve fibers, which has succeeded in restoring limited movement in animal experiments. but regenerating nerves and restoring complex motor behavior in humans are far more difficult, prompting researchers to explore alternatives to spinal cord rehabilitation. one promising approach involves circumventing neuronal damage by establishing connections between healthy areas of the brain and virtual devices, called brain-machine interfaces (bmis), programmed to transform neural impulses into signals that can control a robotic device. while experiments have shown that animals using these artificial actuators can learn to adjust their brain activity to move robot arms, many issues remain unresolved, including what type of brain signal would provide the most appropriate inputs to program these machines. as they report in this paper, miguel nicolelis and colleagues have helped clarify some of the fundamental issues surrounding the programming and use of bmis. presenting results from a series of long-term studies in monkeys, they demonstrate that the same set of brain cells can control two distinct movements, the reaching and grasping of a robotic arm. this finding has important practical implications for spinal-cord patients: if different cells can perform the same functions, then surgeons have far more flexibility in how and where they can introduce electrodes or other functional enhancements into the brain. the researchers also show how monkeys learn to manipulate a robotic arm using a bmi. and they suggest how to compensate for delays and other limitations inherent in robotic devices to improve performance. while other studies have focused on discrete areas of the brain (the primary motor cortex in one case and the parietal cortex in another), nicolelis et al. targeted multiple areas in both regions to operate robotic devices, based on evidence indicating that neurons involved in motor control are found in many areas of the brain. the researchers gathered data on both brain signals and motor coordinates (such as hand position, velocity, and gripping force) to create multiple models for the bmi. they used
if you have ever spent an evening hoisting brews with your pals at the corner pub, chances are you never stopped to think: gee, how do i lift my glass now that it's only half full? it seems like a simple task (you raise that glass reflexively, whether it is empty or full), yet the neural calculations that determine the force needed to lift your arm smoothly to your lips in each case are anything but simple. the brain, it seems, operates like a computer to process variable cues (such as the weight of a glass and the position of your arm) to generate an appropriate response: lifting the glass. neuroscientists believe the brain builds a kind of internal software program based on past experience to transform such variable cues into motor commands. the brain's software, or internal model, depends on specialized sets of instructions, or "computational elements," in the brain. but exactly how the brain organizes these elements to process sensory variables that affect arm movements is far from clear. eun jung hwang and colleagues predict that these computational elements are based on a multiplicative mechanism, called a gain field, through which sensory signals to the brain are amplified by signals from the eye, head, or limbs. in this way, the brain can rely on past experience with one kind of sensory cue to predict how to respond to new but similar situations. while previous studies had established that some visual cues are combined through a gain field, this study shows that motor commands may also be processed via gain fields. this finding, the researchers demonstrate, accounts for a range of behaviors. based on previous studies showing that when people reach in various directions in a small space, they can extrapolate what they learn about the forces in one starting position to a significantly different position, it has been proposed that the way the brain computes movement is not terribly sensitive to limb position. citing other research with seemingly contrary conclusions (that the brain can be highly sensitive to limb position in calculating force and movement), hwang et al. set out to investigate whether, and how, the brain creates a template to translate sensory variables (limb position and velocity) into motor commands (force). they created a computer model to mimic the reaching behaviors observed in people in their experiments and found that the most accurate model used computational elements that are indeed sensitive to both limb position and velocity. if the brain processes these two independent variables through a gain field, it can use the relationship of the two variables, that is, the strength of the gain field, to adapt information about the force needed to move or lift something in one situation to accomplish a wide range of similar movements. when the researchers compared their model to previously published results, they found their model accounted for seemingly disparate findings. they explain that the brain's sensitivity to limb position can be either low or high after a task has been learned because the gain field itself is adjustable. the authors note that neurophysiological experiments suggest that the motor cortex may be one of the crucial components of the brain's internal models of limb dynamics. the next step will be to track the motor cortex neurons to see whether their activity supports this model. hwang et al. predict they will.
underlying principles of motor system organization revealed
time after time in biology, revelations about structure lead to insights about corresponding functional mechanisms. while evolution throws in the occasional spandrel, more often organizational structure serves a practical purpose. so naturally, neuroscientists wonder, does the architectural organization of the motor system reveal an underlying functional organization? progress on this question has been complicated by the fact that there appears to be no clear correspondence between the development of motor neurons centrally and their target muscles in the periphery. in the visual system, for example, retinal ganglion cells send axons in an ordered manner into the brain, where they form connections with neurons of the primary visual center in the brain responsible for detecting visual targets. the arrangement of these connections mirrors the neighboring relationships of the neurons in the retina, and so the neural map of connections in the brain is an "anatomical correlate" of the arrangements in the retina. the origin of these anatomical relationships can be traced through the process of development, allowing scientists to link the assembly of this sensory system with the function of the neurons involved. matthias landgraf and colleagues now report that in the fruitfly drosophila the arrangement of motor neurons corresponds to the distribution of their target muscles. thus, anatomical correlates also exist in the motor system, in the form of a "myotopic map," where the arrangement of motor neuron dendritic branches in the central nervous system reflects the distribution of their target body wall muscles in the periphery. starting with the larger question of how the neural networks governing locomotion are specified and assembled during development, the researchers decided to see if they could identify an elementary principle of motor system organization. working in drosophila, they examined motor neurons and the body wall muscles they innervate. with an eye toward understanding the mechanisms directing the assembly of the motor system, the researchers concentrated on the early stages of development, when the motor neurons first establish their characteristic dendritic territories. they found that the dendrites of motor neurons innervating internal muscles and those innervating external muscles do in fact project into distinct regions, corresponding to the distinct mapping of the muscles themselves. surprisingly, the arrangement of the dendrites in the myotopic map forms independently of the muscles they innervate. it may be, the researchers suggest, that the initial signals charting the location of the dendrites are set very early in development, when the coordinates for other structural elements are established. but that question requires further investigation. the researchers are among the first to reveal such an orderly connection between patterns of motor neuron dendrites and patterns of muscles. this organization, in the form of the myotopic map, may be mirrored by the patterning of processes of higher-order neurons, which form connections with the motor neuron dendrites themselves. in vertebrates, studies have shown that motor neurons are grouped into "pools" and "columns" that correlate with the muscles they innervate. but because these pools and columns represent the location of the cell bodies and not the areas of the spinal cord where the neurons receive most of their inputs, that is, their dendritic branches, scientists could not say whether the pools and columns are simply spandrels (an incidental result of the way motor neurons are generated) or mirror a functional organization of the motor system. this novel finding in drosophila will pave the way for future studies on the relationship between anatomy and physiology during development. it will be particularly interesting to discover whether such myotopic arrangements of motor neuron dendrites are unique to insects or whether this organizational principle occurs in other motor systems, including vertebrates.
novel "checkpoint" mechanism mediates dna damage responses
of all the tasks a cell must accomplish day in and day out, protecting its genome may be the most important. genomes confront all manner of potential assaults, from the strand-splitting action of gamma-radiation to the simple copying mistakes sometimes made when dna replicates before a cell divides. though some mutations are harmless, others can disrupt gene action, leading to cancer and other diseases. to guard against such events, healthy cells maintain quality-control "checkpoints" that sense and respond to dna injuries, as well as to defects in dna replication, and that prevent cell division until the dna can be repaired. if the damage is beyond repair, apoptosis pathways set about the business of destroying the afflicted cell.
many of the genes and protein complexes involved in these checkpoint responses have been identified, but the biochemical mechanisms that in some cases trigger cell cycle arrest are not fully understood. experiments by philip hanawalt and his student david pettijohn at stanford university in 1963 suggested that the molecular machinery of dna replication and repair, which they discovered at sites of damage, are quite similar and closely linked. while many studies have since supported that link, viola ellison and bruce stillman, the director of the cold spring harbor laboratory, have found new evidence that the two processes may indeed coincide by showing that protein complexes regulating a cellular checkpoint in dna repair operate much like similar complexes involved in dna replication. the molecular pathways governing the replication of dna before cell division are well known. as the double-stranded dna molecule unwinds, different protein complexes step in to ensure that each strand is faithfully reproduced. two protein complexes required for this process are replication factor c (rfc) and proliferating cell nuclear antigen (pcna). in the 1980s, stillman's laboratory isolated pcna and rfc and showed that they function together to "load" pcna onto a structure in dna that is created after dna synthesis begins. pcna forms a clamp around the dna strand and regulates the dna polymerases that duplicate the dna double helix. studies in yeast had identified a series of proteins required for the dna synthesis phase of the cell cycle and the dna damage checkpoint pathways; mutations in these proteins' genes make cells very sensitive to radiation (hence the name rad genes). a subset of these proteins, which are conserved in human cells, form two protein complexes (rsr and rhr) that function like rfc and pcna, respectively, with rsr loading the rhr clamp onto dna.
ellison and stillman demonstrate that both pairs of "clamp-loading" complexes follow similar biochemical steps, but, significantly, rfc and rsr favor different dna structures for clamp loading. while it was known that the rsr/rhr complexes exist in human cells, it had not been established that the two types of clamps prefer different dna targets. the researchers also show that the rsr/rhr biochemistry depends on rpa, a protein known to be involved in the dna damage-response pathway. the discovery that rsr loads its rhr clamp onto a different dna structure was unexpected; it suggests not only that the two clamp loaders have distinct replication and repair functions, but also how the checkpoint machinery might work to prevent dna damage from being passed on to future generations. by establishing the chemical requirements of rsr/rhr interactions as well as the preferred dna-binding substrate, the researchers have charted the way for determining the different functions of these cell cycle checkpoint complexes and how the complexes' different subunits affect these functions. the researchers propose that the role of this checkpoint machinery is not as an initial sensor of dna damage, but rather as a facilitator of dna repair, stepping in after preliminary repairs to dna lesions have been made. ellison and stillman's work helps establish a biochemical model for studying how both of these checkpoint complexes function to coordinate replication and repair, and promises to help scientists understand how cancer develops when the checkpoint repair mechanisms fail. during animal development, cells gradually grow, multiply, and specialize to create the tissues and organs that shape and sustain multicellular organisms. the progression from a single cell to a thousand-, million-, or trillion-celled animal follows an exacting schedule and plan involving an elaborate network of genes and proteins. one of the primary mechanisms coordinating this process is cell-to-cell communication.
cellular signaling regulates two crucial development mechanisms, apoptosis (programmed cell death) and cell proliferation, which work like chisel and clay to sculpt multiplying masses of cells into, say, a fly wing or a human finger. controlled by multiple signals operating at fixed intervals, the entwined pathways can be steered off-course by a single defect in the communication network, resulting in the death of a healthy cell, for example, or the survival of a damaged cell. such disruptions can lead to physical abnormalities, such as webbed hands and feet, when cells that should die remain alive; degenerative nerve disease, when healthy cells are killed; and cancer, when damaged cells survive and evade normal growth limitations. researchers have uncovered some of the mechanisms underlying these processes by studying genes involved in fruitfly (drosophila) development. following that tradition, stephen cohen and david hipfner have identified a gene critical to drosophila development that juggles cell growth and survival signals to help promote cell growth and prevent inappropriate apoptosis. they searched for genes associated with changes in tissue growth in fruitfly wings and identified some that can cause tissue "overgrowth" (abnormally large masses resulting either from cells growing faster than they divide or from cells escaping proliferation controls) when they are overexpressed. among these is a gene that encodes a newly identified kinase that contributes to the regulation of cell proliferation and survival (or death, depending on the circumstance) during drosophila development. cohen and hipfner called the gene slik based on its similarity to two human kinase-coding genes (slk and lok). little is known about these human proteins, though previous studies suggest they may affect cytoskeletal dynamics and cell adhesion. in this paper, the authors report preliminary evidence supporting the notion that slik may regulate the cytoskeleton, the "backbone" of the cell that confers structure and motility. interestingly, disturbances to cell adhesion and cytoskeletal structure are known triggers of apoptosis and are being explored as potential anticancer agents. kinases make up one of the largest families of proteins and are important regulators of cell signaling. to investigate the function of slik in drosophila, the researchers removed the gene and then studied the physical and cellular effects. they found striking delays in growth and developmental timing and showed that these effects result largely from the demise of the slik-deficient cells. while cells deprived of slik can grow, divide, and differentiate, most respond to the defect by killing themselves, even under conditions that normally promote survival. thus, cells without slik appear to have an intrinsic survival defect, suggesting that slik prevents apoptosis. when slik is overexpressed, cell proliferation increases, but so does apoptosis. only when apoptosis was blocked did the cells form tumor-like growths. this coupling of cell growth and cell death is characteristic of oncogenes (cancer-causing genes), and slik also seems to function in both pathways. the authors point out that the signal to proliferate may inherently sensitize cells to apoptosis, as has been shown previously for some cancer cells. this may keep an individual cell under the control of its neighbors, who collectively monitor the needs of the organism. for a cell to respond to a signal by dividing rather than dying, it must get the appropriate signs from its comrades. slik, the authors demonstrate, is a key factor in determining whether a cell lives or dies. whether its mammalian counterparts play a similar role is yet to be determined.
one might not expect that yeast lead terribly eventful lives, yet the single-celled fungus must struggle to survive just like everyone else.
and for yeast, like everyone else, survival means being able to detect and coordinate a rapid response to changes in its environment. though survival for humans is a bit more complicated, our cells use the same regulatory networks, which maintain cell growth and health when they work and contribute to diseases, from asthma to cancer, when they break down. given the variety of conditions even the lowly yeast is likely to encounter during its life, one might expect to find a multitude of molecules mobilizing a response. but yeast cells, it turns out, are fairly resourceful. as erin o'shea and colleagues report, just one protein in yeast activates different groups of genes in response to different amounts of an environmental stimulus. the researchers focused on how yeast responds to various levels of phosphate, an essential nutrient for all cells. one way that cells regulate responses to environmental stimuli is through the transcription (activation) of genes. these transcriptional responses are often controlled by a multistep process that shuttles gene-activating proteins into the nucleus, where they can generate the appropriate response for a given stimulus, or confines them to the cytoplasm if their gene products are not needed. during this process, called phosphorylation, the addition of a phosphate group to a protein (such as a receptor or transcription factor) acts as a mechanism for controlling gene expression. o'shea's team demonstrated that phosphorylation of a transcription factor called pho4 controls gene expression by controlling where that protein resides in the cell: in the cytoplasm or in the nucleus. as is the case with many proteins, pho4 can accept phosphate groups at multiple sites. to see whether the location of phosphorylation affects the action of pho4, o'shea's team exposed yeast to different levels of phosphate and tracked the cellular response.
they found that when yeast is deprived of phosphate, pho4 has no phosphate groups at any of its binding sites and enters the nucleus, where it binds to dna and activates a set of genes whose products can scavenge for phosphate or otherwise compensate for the scarcity. when yeast has ample supplies of phosphate, pho4 is phosphorylated and remains in the cytoplasm, unable to influence transcription, suggesting that the cells can absorb plenty of nutrients from their environs without having to engage a specialized foraging team. when the researchers exposed the yeast to intermediate amounts of phosphate, the results were surprising. middling concentrations of phosphate produced different forms of phosphorylated pho4, which varied in their ability to activate genes, and so added to the number of possible responses. pho4 partially phosphorylated at one site, for example, could still enter the nucleus, but activated only one type of phosphate-recovery gene and not others. while it is not unexpected that differential phosphorylation could have different functional outcomes, the authors say, it is surprising that one enzyme acting on one transcription factor can create different phosphorylation patterns (and therefore different gene-expression patterns) in response to different amounts of a single stimulus. their results show that cells rely on a highly regulated series of interactions that induce subtle changes in gene expression to fine-tune their response to small environmental changes. and they do this in a remarkably efficient manner, relying on a small cast of characters to orchestrate the responses essential for survival.
extraction, amplification, and decoding of viral sequences rapidly identify known viruses and classify new ones based on their genetic makeup. this was validated in march when the viral chip contributed to the identification of the cause for severe acute respiratory syndrome (sars) as a novel coronavirus.
in the article published in this issue, the researchers describe the chip (or microarray), how it was used in the classification of the sars virus, and how it provides direct access to viral genomic sequence. microarray technology works by taking advantage of the structural properties of dna. dna molecules normally exist as double helices, two complementary strands of nucleotides wrapped around each other. the microarray consists of a large number of single dna strands attached to a solid base. these probes (which in the case of the viral chip represent sequences from all fully sequenced reference viruses) can be used to interrogate unknown sequences: if a solution containing such sequences is passed over the chip, similar sequences will "hybridize," or bond in a signature double helix. known viruses hybridize in a characteristic pattern and can be identified quickly. because bonding occurs even when the match between probe and sample sequence is not perfect, new relatives of known viruses can be identified as belonging to a particular family (such as coronaviruses, in the case of sars). to quickly obtain more information on a novel virus, it is then possible to "syphon off" those viral sequences that stuck to their respective counterparts on the chip and to use the material to determine part of the genomic sequence. such sequence information provides more detail on how the new virus relates to known ones, which might provide clues about its origin and possible treatment strategies.
faced with all manner of potential threats in the form of billions of different viral, bacterial, and chemical pathogens, the mammalian immune system relies on a "safety in diversity" strategy for protection. with two distinct subsystems (one innate, the other adaptive), the immune system can recognize some 100 trillion antigens. the innate system deploys cells programmed to quickly recognize microbes with a particular set of conserved molecular structures.
the adaptive system relies on billions of uniquely outfitted lymphocytes (white blood cells) to identify just as many pathogens through their protein fragments, or antigens. a human being grinds out billions of these cells every day. in the absence of threats, the immune system maintains a quiescent state and many of these cells are discarded. but for the immune system, doing nothing takes a concerted effort. lymphocytes originate in the bone marrow, though not all differentiate there. one class of lymphocytes, called t cells, develops in the thymus, where every t cell acquires a one-of-a-kind receptor, called a t cell receptor (tcr), designed to recognize a different antigen. when an antigen gets bound by a tcr (a bound molecule is called a ligand), the antigen triggers a signaling cascade that tells the t cell either to attack the infected cell or to alert other immune cells of the infiltrator. but as jeroen roose, arthur weiss, and colleagues report, signaling pathways activated by bound tcrs appear to influence gene expression even in the absence of antigen or other receptor ligands, a process called ligand-independent signaling. these findings lend support to the notion that cellular signaling pathways regulated by surface receptors, like tcrs, exhibit a continuous low-level signaling (known as basal signaling) in the absence of a stimulus and that this continuous signaling, by influencing gene expression, has significant influence on cellular differentiation. roose, weiss, et al. focused on the tcr signaling pathway that regulates the expression of a group of genes, including rag-1 and rag-2, that are activated in two distinct waves during t cell development. rag genes play a crucial role in t cell development, a highly complex, multistage process that involves a reshuffling, or recombination, of tcr genes and the activation of different proteins and genes at different stages. 
rag genes regulate the genetic recombination and ultimate cell surface expression of tcrs. using chemical inhibitors and mutant human t cell lines deficient in critical signaling components involved in antigen receptor-dependent pathways, the researchers found that the loss of specific functions or specific proteins affected an unexpected set of target genes. notably, when downstream components (the protein kinases erk and abl) were disabled in the basal signaling pathway, the researchers saw a resurgence of rag gene expression. while erk was already known to play a prominent role in signaling pathways downstream of the tcr, it now appears that abl may also be regulated in tcr pathways. most importantly, these findings suggest that signaling pathways thought to be triggered only by ligated receptors can influence gene expression on their own. and it may be through this type of signaling that tcr pathways help regulate t cell development by repressing rag gene activity. these basal signals, the researchers postulate, may in effect save the rag expression machinery until recombination is called for. if rag genes were expressed at the wrong time, they could cause inappropriate genetic recombination and create t cells that either lack function or attack healthy cells, as happens in immunodeficiency and autoimmune diseases. elucidating the mechanisms and components of this basal pathway will contribute important insights into the development and function of the immune system. but these studies also establish a model for investigating other signaling systems, to determine whether biologically functional basal signaling is a rare phenomenon or whether it is a fundamental cell process needed to control the profile of gene expression in the quiescent state. a multicellular organism can have more than 200 different types of cells and as many as 100 trillion altogether. 
during the process of development, an organism enlists the service of hundreds of signaling molecules and thousands of receptors to direct cell growth, differentiation, and morphological destiny. any given cell has no use for most of these signals and gets by with just a limited repertoire of receptors on its surface. once a signal reaches a receptor, it triggers a series of biochemical reactions as different molecules transform the external signal into a biological response, in a process called signal transduction. one cell type controls all of its cellular functions (both universal and specialized) with just a few dozen receptors; each receptor elicits a wide range of responses by triggering a small number of interacting pathways. exactly how a receptor produces the right response at the right time is a fundamental question in biology. of particular interest is a class of receptors, called receptor tyrosine kinases (rtks), that regulate cell proliferation, differentiation, and survival and play an important role in embryonic development and disease. growth factor receptors are an important subset of rtks. the platelet-derived growth factor receptor (pdgfr) family activates downstream signaling enzymes that stimulate the growth and motility of connective tissue cells, such as vascular smooth muscle cells (vsmcs), oligodendrocytes (cells of the tissue encasing nerve fibers), and chondrocytes (cartilage cells). the pdgf beta receptor is essential for directing the differentiation of vsmcs. while studies of signal transduction of this growth factor have established a model of how receptor tyrosine kinases function, the role of individual downstream signaling components in a living organism is still unclear. using mouse molecular genetics, michelle tallquist and colleagues set out to determine the function of individual components in the pdgfr beta pathway. they discovered a quantitative correlation between the overall amount of signal produced by the receptor and the end product of the signal, formation of vsmcs. receptor responses, they report, are controlled in two ways: signaling was influenced both by the amount of receptors expressed and by the number of specific pathways engaged downstream of the receptor. surface receptors have "tails" that project into a cell's interior. when a surface receptor is activated, a number of potential binding sites (modified amino acid residues) are exposed on its intracellular tail. ten of these sites can bind to proteins with a specific amino acid sequence, called an sh2 domain; proteins with these domains can then initiate a signal transduction pathway. by introducing mutations in the sh2 domain-binding sites in mice, the researchers could evaluate how the loss of a particular binding site (and therefore pathway) affected the function of the receptor. they had previously investigated the functions of two other downstream signaling proteins in similar experiments. surprisingly, tallquist et al. found that losing some of the individual components did not produce a significant negative physiological effect. only when multiple downstream signaling pathways were disrupted did the researchers see a significant effect on the population of the cells. reductions in the numbers of both activated receptors and activated signal transduction pathways produced reductions in the population of vsmcs. these results have not been seen in tissue culture before, suggesting that signal transduction is more complex in vivo and that future studies would benefit from incorporating a global approach, rather than targeting a single signaling component. the next step will be to investigate exactly how the individual pathways contribute to this result. it is also unclear whether these results apply only to these growth factor receptors or explain how rtks operate in general. such questions have significant clinical relevance. overexpression of the pdgfr beta pathway has been linked to a variety of serious diseases, including atherosclerosis and cancer. understanding how cells control the action of this growth factor is an important step in developing targeted therapies. since many of these conditions result from a growth factor stuck in the "on" position, inhibiting overactive receptors promises to be an effective clinical intervention.
the protozoan parasite plasmodium falciparum causes falciparum malaria, a fatal parasitic disease in humans, and is transmitted by anopheles mosquito vectors (predominantly the anopheles gambiae complex and an. funestus in africa). there are about 300 million malaria cases and 1-2 million deaths annually, the brunt of which is borne mostly in africa by children under 5 years of age and by pregnant women. in many african countries, malaria poses a formidable challenge to an overburdened and underfunded public health system. the current malarial control strategies consist of chemotherapy directed against the malaria parasite and prevention of mosquito vector/human contact using insecticide-impregnated bednets and, to a lesser extent, indoor residual insecticide spraying and environmental control for reducing mosquito breeding sites. there are still no malaria vaccines in clinical practice. chemotherapy (the use of drugs to target disease) is used for both treatment and prevention. drug resistance is increasingly becoming a problem. some of the antimalarial drugs in current use include quinolines, artemisinins, antifolates, atovaquone/proguanil, and antibiotics. chloroquine (cq) is a cheap and widely used aminoquinoline, but cq-resistant parasites have become ubiquitous in endemic countries and other drugs are now used much more frequently (ridley 2002).
fansidar, a combination of sulphadoxine and pyrimethamine (sp), is a first-line treatment in several african countries, but resistance to sp is spreading rapidly. targeting the mosquito vector with pyrethroid-impregnated bednets, in addition to chemotherapy, is an effective method of controlling malaria transmission. however, pyrethroid resistance has been reported in an. gambiae s.s. in west africa, and there is concern about its emergence in east africa (chandre et al. 1999). thus, the public health problem due to malaria is exacerbated by the emergence of drug-resistant parasites and insecticide-resistant mosquitoes. the clinical application of efficacious intervention tools is therefore an urgent imperative for malarial control. this brings into sharp focus the importance of genomics research for drugs, vaccines, diagnostics, and insecticides. the unraveling of the genomes of humans, p. falciparum, and an. gambiae has ushered in a new era of hope that genomics research will result in the development of new and better tools for malaria control. the p. falciparum genome of 22.8 megabases (mbp) distributed among 14 chromosomes consists of 5,300 protein-coding genes (gardner et al. 2002). p. falciparum possesses a relict plastid, the apicoplast, homologous to the chloroplasts of plants and algae. the apicoplast is essential for parasite survival and functions in the anabolic synthesis of fatty acids, isoprenoids, and heme (seeber 2003). these essential metabolic pathways are not present in humans and are therefore ideal targets for the development of safe antimalarial drugs. inhibitors of type ii fatty acid biosynthesis (triclosan and thiolactomycin) and mevalonate-independent isoprenoid biosynthesis (fosmidomycin and fr900098) with potent antimalarial activities have been identified by computational mining of the genome data.
the fact that fosmidomycin has rapidly entered into clinical trials underscores the great utility of genomics research in the control of malaria (lell et al. 2003) . about 3,200 proteins (60%) in p. falciparum have no known functions (gardner et al. 2002) . the greatest challenge of malarial functional genomics (the elucidation of the functions of genes encoded by an organism's genome) is to assign functions to these proteins, thus comprehensively identifying the proteins that function at various lifecycle stages and that function together to carry out particular cellular processes, e.g., red blood cell invasion, signal transduction, growth, vesicular trafficking, etc. the application of functional genomics approaches allows the properties of many genes and proteins to be assessed in parallel on a large scale. these approaches are being used to address specific questions about the biology of p. falciparum. gene profiling (determining which genes are expressed) by microarray technology allows a rapid, parallel analysis of genome-wide changes in gene expression over a variety of experimental conditions (e.g., chloroquine versus saline control), tissues, and cell types; these genes can be clustered (ordered by expression pattern) to identify those that function in the same process. one of the most promising applications of microarrays is the study of differential gene expression during the complex p. falciparum lifecycle, specifically the formidable and challenging task of determining which subset of the 5,300 genes is represented in the transcriptome of each stage (bozdech et al. 2003; le roch et al. 2003) . these approaches are beginning to yield invaluable insights about new vaccine candidates, novel drug targets, and the molecular basis of drug resistance. proteomics is the study of all the proteins expressed in an organism. 
global protein analysis offers a unique means of determining not only protein expression, but also interacting partners, subcellular localizations, and post-translational modifications of proteins of whole proteomes. analyses of the proteomes of parasites that have been exposed to distinct environmental stimuli (e.g., chloroquine versus saline control) or that manifest distinct phenotypes (drug resistant versus drug sensitive) might also facilitate the identification of biochemical drug targets and of the specific proteins involved in drug resistance. comparative genomics (the comparison of genomes of related species), on the other hand, will yield invaluable insights about the biology of and the pathogenesis of disease associated with different parasites, i.e., p. falciparum on the one hand and p. vivax on the other [figure, doi: 10.1371/journal.pbio.0000039.g001]. the biology and pathology of the two parasites are quite distinct, e.g., the preference for reticulocytes (p. vivax) versus mature red blood cells (p. falciparum), the ability to cause severe (p. falciparum) versus mild (p. vivax) disease, and the implication of amino acid substitutions in pfcrt in cq resistance in one (p. falciparum) but not in the other (p. vivax). the 278 mbp sequence of the nuclear genome of the pest strain of an. gambiae s.s. has been published in draft form and is considerably larger than the 122 mbp assembled sequence of the fruitfly drosophila melanogaster (holt 2002). the an. gambiae genome includes a treasure trove of 79 odorant receptor genes and about 200 genes that encode glutathione-s-transferases, cytochrome p450s, and carboxylesterases. these and possibly other genes probably play a critical role in human host finding and detoxification of insecticides, respectively, and could be exploited, using gene profiling, proteomics, and comparative genomics, for the development of novel mosquito repellents or traps and insecticides.
the ability to introduce foreign genes into anopheles vectors is an exciting advance that might facilitate the development of transgenic mosquitoes that do not transmit malaria parasites (moreira et al. 2002). however, the future implementation of this control strategy, if current technical hurdles can be overcome, must take into consideration concerns about the environmental impact of releasing genetically altered mosquitoes. scientists in endemic countries must be active participants in malaria genomics research and not just conduits for field materials for northern partners. however, the reality is that there is an increasing technological gap between endemic- and developed-country researchers in the field. this needs to be urgently addressed. the world health organization special programme for research and training in tropical diseases has initiated a series of training workshops in bioinformatics in endemic countries; the howard hughes medical institute has supported one such workshop. the training must extend to other aspects of genomics and include infrastructure development. there is considerable optimism that genomics research will result in new drugs, vaccines, diagnostics, and tools for malarial vector control. strong linkages between genomics research and national malarial control programs will facilitate the translation of research findings into intervention tools. as it is for all new technologies, it might also be important for the communities in endemic countries to have a greater awareness and understanding of genomics research. this will enhance acceptance of the products and improve informed consent. there is therefore a unique opportunity for collaborations between social-economic scientists and genomics researchers.
the challenges of searching the scientific literature
the standard "front end" for biomedical literature search is medline and its entrez query system.
huge, well-managed, and nearly exhaustive, medline and its 11 million references provide incredible ease and facility for anyone who can type a boolean query. though not quite a parallel for google (which runs a kind of popularity contest for web links in real time), the entrez search has opened up the literature to anyone with a web browser. to those who grew up chasing citations and papers through the aisles of a scientific library, entrez is a dream come true. and yet. suspend disbelief and imagine for a moment a kind of literature search dream-tool. "find me all references citing my gene of interest," you could ask. but why stop there? "find me all references citing some or all of my four genes of interest with expression or in vitro data." and then, "bring up the text of the paragraph in which these citations occurred so i can view them in context. and do it in real time." tools that can perform such searches would go beyond google because they avoid the repetitiveness involved in multiple searches. and they would go beyond entrez because they would search the entire medical literature in full-text format and not, as medline does, just the abstracts. furthermore, they would go beyond both types of searches in that they would be at least somewhat intelligent. such text-mining efforts are the next frontier for both academic and commercial groups that have sprung up from pasadena to boston to tel aviv. but how realistic is this venture? text-mining and its more universal relative "information retrieval" are still in their infancy. the first paper on text-mining for biology was published only in 1997. furthermore, because biological text-mining comes so close to the challenge of comprehending human language (arguably the most complex invention in the history of the planet), it is what computer scientists call a "hard problem."
so even here, at the embryonic and fun stage in this technology's history, the outcome and especially the timing of improvement are impossible to predict. language-processing software tools have been successfully applied in text-mining of nonscientific sources, especially to newswire content. computer programs can already perform all three levels of text-mining (figure 1) effectively: retrieving documents relevant to a given subject; extracting lists of entities or relationships among entities; and answering questions about the material, delivering specific facts in response to natural-language queries. information retrieval and extraction can be performed on news data at success rates of 90%-95%, says lynette hirschman, a structural linguist. question-answering has been reported in the literature at 85% accuracy, she notes, which is "amazingly good." the question is, how soon can these levels be achieved for biology? good thing for biologists that hirschman has turned her energies in their direction. hirschman works in massachusetts at mitre corporation, a government-funded institution that pursues projects in the national interest, be they in defense and intelligence or, as in the case of text-mining, "anywhere we can move an entire field forward," says hirschman. the good news from news-mining is that improvement seems to arrive in direct proportion to the time and energy expended by the research community. similar improvement has occurred in speech recognition by computers, she adds (figure 2). when people took successively harder problems and worked on them for four or five years, she explains, it caused error rates to drop, as a rule, by a factor of two every two years.
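the rule of thumb hirschman describes, error rates falling by a factor of two every two years, can be written as a simple exponential decay. the python sketch below is illustrative only: the function names, the 25% starting error rate, and the assumption that the halving holds smoothly and indefinitely are hypothetical and do not come from the article.

```python
import math


def projected_error_rate(initial_rate: float, years: float,
                         halving_period: float = 2.0) -> float:
    """Error rate after `years`, assuming it halves every `halving_period` years.

    The smooth-decay model is an assumption made for illustration; the
    article only states the rule of thumb, not a formula.
    """
    if initial_rate < 0 or halving_period <= 0:
        raise ValueError("initial_rate must be >= 0 and halving_period > 0")
    return initial_rate * 0.5 ** (years / halving_period)


def years_to_reach(initial_rate: float, target_rate: float,
                   halving_period: float = 2.0) -> float:
    """Years needed for the error rate to fall from initial_rate to target_rate."""
    if not 0 < target_rate <= initial_rate:
        raise ValueError("need 0 < target_rate <= initial_rate")
    return halving_period * math.log2(initial_rate / target_rate)


if __name__ == "__main__":
    # Hypothetical starting point: a 25% error rate today.
    for year in range(0, 9, 2):
        print(year, projected_error_rate(0.25, year))
```

under this model, closing a fourfold gap in error rate would take about four years of sustained community effort, which is why the timing of improvement depends so heavily on whether the rule of thumb actually carries over to biology.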
one might think tackling the biomedical literature would be relatively easy, remarks hirschman: biology jargon has a lot of prefixes and suffixes, which can be parsed more easily than verbs and adverbs; it is highly regular, with greek-letter add-ons to gene or protein names signifying relatives or subtypes of the original proteins; and there are many resources available, such as databases and ontologies linking different biological terms. but whereas extraction of person and place names from news text routinely reaches 93%, results in biology remain mired in the 75%-80% range. "it's a little depressing," warns hirschman. "even something as simple as a slash may imply two different entities or a single compound." a chorus of assent greets her observation. programmers eager to codify the rules of biology have been stymied by what one bioinformaticist calls "a sea of exceptions." moreover, there is a chronic lack of data that have been "marked up" by software or humans to indicate the roles played by some of the key words. this marking-up process, however it is done, is crucial for machine-learning tasks. getting these data is both hard and expensive, says hirschman. to move biology text-mining forward, she believes, requires organizing different academic and commercial groups so that they are at least working on the same problem. only then can standards emerge that will allow progress in the field even to be measured. this type of shared problem, known as a "challenge evaluation," has become something of a "religion" in the speech and language community since the 1980s, says hirschman. by putting out a set of data to train on and then issuing a "challenge" for each group to extract the same information or answer the same questions, "you compare apples to apples. in the process you build a research community."
doi: 10.1371/journal.pbio.0000048.g001 the four levels of information retrieval: google and medline both use keywords to direct a searcher to documents. but the next level has been tough to crack. improved software would allow biologists to jump from the web or medline to specifics with a single query. (adapted with permission from the mitre corporation.)
last year, hirschman and others ran the very first challenge evaluation in biology, the kdd cup (officially called the knowledge discovery and data-mining challenge cup). six weeks in advance, the organizers gave participants a training set of 862 journal articles already included in the model organism database flybase, along with associated lists of genes and gene products, as well as relevant data fields from flybase. after building their software tools, the entrants were then asked to take a test set of 213 articles and pretend they were curators: the tools were supposed to determine whether the articles were appropriate for curation, based on whether they contained experimental evidence for gene expression products, including both rna transcripts and proteins. eighteen participants took a shot at the kdd cup and their results speak of the infant state of the field. on average, they could assign only 58% of the papers correctly and could determine whether relevant gene products were present only 35% of the time. the winning entrant, a joint group from the israeli company clearforest ("see the forest and the trees") and maryland-based celera genomics, did better. their entry made the right decision to curate 78% of the time and the right call on the presence of gene products 67% of the time. the winning group did so well by using a clever "trick," says hirschman admiringly.
their program searched for figure captions and then applied multiple techniques to find those gene products they were looking for. the techniques applied by clearforest and others fall into two broad categories, statistical and heuristic. statistical techniques are the next step up from keyword searches. they count words such as genes or gene products appearing close to one another, but apply no linguistic insights, such as whether an adjective modifies a noun. by contrast, heuristic approaches use hand-crafted rules designed for specific datasets: e.g., january, february, march, etc., are months; the word following "mr." is a name; and so forth. this approach is labor-intensive but especially useful when there is only a limited amount of data-as is the case with single scientific papers or small groups of papers. some statistical approaches have been labeled with the nickname "bag of words" because they fail to account for grammatical relationships; e.g., "man bites dog" and "dog bites man" would drop the same three words in the bag. a key observation at the kdd cup was that the most basic statistical approach, which counts word occurrences at the document level, is not sufficient unless it takes into account at least some higher-level context, such as the part of the paper from which the search terms are extracted. furthermore, the more hand-crafted rules there were, the better. many of the top teams included biologists who applied their expertise to help create empirical rules that became part of the program instructions. this points to a general theme in machine learning: the greater the degree of human intervention, the better. the best programs are covered with fingerprints. although the march toward better text-mining systems is building momentum, there are two issues that could stop it in its tracks. the first is access. experts in text-searching uniformly cite access as a key obstacle for developing better search tools. 
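the contrast drawn above between "bag of words" statistics and hand-crafted heuristic rules can be sketched in a few lines. this is a toy illustration, not any kdd cup entrant's code: the function names, the tag labels, and the sample sentence are assumptions made for the example; only the two rules (month names are months, the word after "mr." is a name) come from the text.

```python
from collections import Counter

MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def bag_of_words(sentence):
    """Statistical view: count words, discarding order and grammar."""
    return Counter(sentence.lower().split())


def heuristic_tags(sentence):
    """Heuristic view: apply two hand-crafted rules to each token."""
    tokens = sentence.lower().split()
    tags = []
    for i, token in enumerate(tokens):
        word = token.strip(",;")
        if word in MONTHS:
            tags.append((word, "MONTH"))   # rule 1: month names are months
        elif i > 0 and tokens[i - 1] == "mr.":
            tags.append((word, "NAME"))    # rule 2: the word after "mr." is a name
    return tags


if __name__ == "__main__":
    # Both sentences drop the same three words into the bag.
    print(bag_of_words("man bites dog") == bag_of_words("dog bites man"))
    print(heuristic_tags("mr. smith arrived in march"))
```

the bag-of-words comparison cannot tell the two sentences apart, which is exactly the limitation described; the rule-based tagger recovers some structure, but only for the cases someone thought to write a rule for.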
"access is a bigger problem than algorithms" is how one machine-learning expert puts it, and a half-dozen others agreed. the present "balkanized" situation for text-processing is filled with "dead ends" and "short circuits" in information flow among biologists, says david lipman, head of the united states' national center for biotechnology information, which runs pubmed, the medline database, as well as the national library of medicine and other critical resources in biology and bioinformatics. it is as if readers are marine biologists on a coastline whose beaches are 98% private. at best, asking permission to view every article slows down the work. at worst, there are some important tools one can never build owing to the missing context. medline itself would be much more powerful if it were based on full text, experts say. owing to lack of access, says hirschman, "we miss a great deal by not having large corpora of full-text articles" included in the design of both the kdd cup and the next challenge evaluation, called biocreative, being held later this year. many of the relevant biological data are found outside abstracts, but getting access to full text is complicated at best. for manual searching, researchers traditionally fall back on portalhopping: jumping from one full-text subscription (to nature, science, or cell, for example) to another, or from one portal (highwire, web of science) to another. that way, many scientists routinely obtain access to as many as 80% of the journals they need. the rest they can usually request via interlibrary loan or order as photocopies online. however, this approach fails for most automated search programs. just sorting out the permissions and keeping up with changes in the portals dramatically increase the headaches for anyone trying to build a search tool. the second threat to text-searching programs ever becoming widely useful has more of the ring of linguistics jargon. 
the so-called "ontology problem" threatens successful searching based on the very specific nature of biological terminology. the issue here is not only that scientists are truly terrible about sticking to established terminologies. "scientists would rather share each other's underwear than use each other's nomenclature," as biochemist keith yamamoto is fond of saying. consequently, the scientific literature is a hodgepodge of identical or overlapping terms. a naïve text-parsing program does not know whether "cat" refers to the catalase gene, the chloramphenicol transferase gene, or a household animal. the challenge is to build an ontology describing all the important relationships so your computer program can navigate among them without asking you what to do. consequently, an ontology would prescribe rules for understanding the interactions among genes based on the appearance of certain verbs ("inhibit," "express"), nouns ("agonist"), or phrases. although within each narrow scientific subdiscipline it may be possible to build exquisitely useful text-mining tools, as soon as programmers broach the borders of the narrowest subfields, they will run into a kind of heisenberg uncertainty principle of linguistics and science. every toolmaker is faced with the ontology problem in one respect or another, especially when the tool is meant to be a general one. david gilmour, chief executive officer of tacit inc., a knowledge management company in palo alto, california, is an industry veteran of exactly this war "and i have scars all over my body to prove it," he says. the issue in a nutshell, he explains, is that "ontologies scale poorly, and by the time they are useful," that is, large enough to capture most of the possible relationships among words, "they are unmaintainable." hirschman acknowledges that keeping up with the literature and new terminologies is challenging.
adapting tools to new domains has traditionally been one of the "critical stumbling blocks" for text-processing technology, she says. the dynamic growth of biological terminology does not help. there are 50-100 alterations every week to the nomenclature section of mouse genome database web page. staying within one's narrow domain, then, could be a recipe for success, as long as the vocabulary and user questions remain tightly constrained, especially if there is a way to tiptoe around the access problem. that is apparently the case at wormbase, though the newly available tool there, called textpresso, is still being built. the motivation for textpresso was simple, says hans-michael mueller, a postdoctoral fellow in the lab of paul sternberg at caltech in pasadena, california, where wormbase-the genetic database for the nematode worm caenorhabditis elegans-is curated. "we want the user to be able to avoid going to the library to read all those papers [on genes and proteins] that your favorite gene interacts with. that is very tedious." the other goal is equally recognizable in the biology community: no mere mortal can hope to keep up with the burgeoning literature, even in the relatively narrow field of worm biology. mueller, a nuclear physicist by background, called textpresso "a search engine for full-text searches of abstracts and articles" that can help find answers to more challenging queries than simple keyword searches. mueller and his team use human "taggers" to mark up the corpus of text to indicate categories like "biological processes" ("late larval activation"), "genes" (let-7), and "molecular functions." then, like the clearforest-celera program, textpresso searches for combinations of categories in the same or neighboring sentences. the ontology relating the expressions and categories to one another is based both on scientific and common sense as well as linguistic components. 
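the search strategy described above, looking for combinations of tagged categories in the same or neighboring sentences, can be sketched as follows. this is not textpresso's implementation: the data layout (one set of category labels per sentence), the function name, and the sample corpus are all assumptions made for illustration.

```python
def find_cooccurrences(tagged_sentences, wanted, window=1):
    """Return indices of sentences that contain at least one wanted category
    and whose neighborhood (+/- `window` sentences) covers all of them."""
    hits = []
    for i, categories in enumerate(tagged_sentences):
        start = max(0, i - window)
        stop = min(len(tagged_sentences), i + window + 1)
        nearby = set()
        for neighbor in tagged_sentences[start:stop]:
            nearby |= neighbor
        if wanted <= nearby and categories & wanted:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical mark-up: each sentence reduced to the categories tagged in it.
    corpus = [
        {"gene"},
        {"biological process"},
        set(),
        {"molecular function"},
    ]
    query = {"gene", "biological process"}
    print(find_cooccurrences(corpus, query))            # neighboring sentences allowed
    print(find_cooccurrences(corpus, query, window=0))  # same sentence only
```

widening `window` trades precision for recall: more sentence pairs qualify as hits, but the link between the tagged terms becomes looser.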
in less than two years of work, mueller and his team have already marked up 3.9 million terms in 16,000 abstracts and 2,000 full-text papers. a typical search asks a question such as "what can be found out about the negative regulatory aspects of a genetic network in the pharynx?" answers emerge in the form of citations, abstracts, and, if available, a paragraph or so from the text of the relevant paper. textpresso went up, unpublicized, on the web in february this year and already receives a couple of hundred hits a day, a big number in a field of about 2,000 researchers. mueller estimates that textpresso is 95% accurate and that about 35% of the relevant papers have been included. textpresso needs full-text access to be as good as it is, says mueller. "we noticed" that drawing on full text "greatly increased the chances of a true hit," not a false positive. he managed to avoid the access issue by claiming a kind of "curator's privilege." only the curators see the full text. once the data are on the web, users can only get at most a paragraph, which falls within fair use, said mueller. if a user happens to subscribe to the journal in question, he or she can click through to the publisher's portal and see the paper. whereas textpresso works exclusively on worm genetic data and commercial players like clearforest are just beginning to hunt for biological applications, a handful of companies have begun to market text-searching products to academic biomedical scientists. one such product is called quosa, for query, organize, share, and analyze. the software had its commercial launch in late 2002. put simply, the program (available on an institution-wide basis and already installed for hundreds of researchers at massachusetts general hospital and the dana-farber cancer institute in boston) allows a search across one's own documents. a front end for the literature that cooperates with medline, quosa pulls in and prioritizes full-text papers.
the program first allows the user to search for the relevant files and download them in full-text format to the extent permitted by her library's subscription agreements and licenses. once it becomes second nature to users, they rave about it. like the best of the first-generation software, quosa allows users to make connections they would not have otherwise made. like so many other early software products, its long-term success will hinge on demand as well as improvements made in the upgrades. because of the ontology problem, improvements in searching in the next couple of years are likely to result from the application of ever-better techniques within existing domains. collaborations among wormbase, flybase, and other model-organism database groups will help improve all their search tools. medline itself may benefit from more advanced search techniques, though these will be restricted to abstract searches. the big unknown for predicting further development of text-search tools is the path publishers will take. if each publisher or portal such as reed-elsevier or highwire were to license or develop its own tool for searching its own content, the result might be better than the status quo, but would still be unsatisfying. running the same search three times on three different subsets of content might be better than running it 15 times, but wouldn't it be easier to run it just once?
the transcriptome of the intraerythrocytic developmental cycle of plasmodium falciparum
we are grateful to drs. lisa ranford-cartwright (university of glasgow, glasgow, united kingdom), ayoade oduola and yeya toure (world health organization, geneva, switzerland), and wilfred mbacham (university of yaounde, yaounde, cameroon) for critical comments on the manuscript.
key: cord-016313-n4ewq0pt authors: baranyi, lajos; dropulic, boro title: advances in lentiviral vector-based cell therapy with mesenchymal stem cells date: 2012-09-27 journal: mesenchymal stem cell therapy doi: 10.1007/978-1-62703-200-1_14 sha: doc_id: 16313 cord_uid: n4ewq0pt the field of possible application of mesenchymal stem cells in medicine and research expanded tremendously with the advent of improved lentiviral-vectors capable of inserting stable copies of genes of interest and expressing proteins or biologically active rna species ad libitum, performing delicate gene editing or active gene silencing or serving as advanced drug delivery systems utilized in ex vivo cell therapy. the combination of these two fields has created a number of new areas of research in the landscape of modern medicine which are now extensively studied and discussed here. these areas include tissue engineering, tissue repair, wound healing and tissue implants, anticancer therapies, angiogenesis, myocardial infarction and repair as well as understanding and treating acute lung damage and injury. in addition, genetically modified, tagged mscs are being intensively deployed in research and therapeutic attempts of the various ailments of the central nervous system including parkinson’s disease, alzheimer’s disease, various phases of acute ischemia and trauma. the emergence of new and important data for type ii diabetes research is being followed with treatment suggestions and studies of senescence to find novel applications for genetically engineered mscs. we find in general that genetically modified mscs are at the cusp of breaking through from basic research into the next phase of clinical trials. stem cells, including the mesenchymal stem cells (mscs), are very close manifestations of plato's imagery of the shadows on the cave wall, since they are difficult to study outside their intimate interactions with their microenvironment [288].
our observation methods change their responses and characteristics [9, 74, 82, 103, 145, 252], as in quantum physics, when the observation changes the observed. with that caveat, we can admire the rapid development in stem cell research. alas, the difficulties in research are faithfully reflected in the confusion in the nomenclature used for describing and classifying stem cells, including the classes of stem cells of mesodermal origin. the recent definition of mscs by dominici states that mscs are a stromal cell type, possessing the following characteristics and markers: plastic adherence in cell culture, specific surface antigen expression of cd105(+), cd90(+), cd73(+), cd34(−), cd45(−), cd11b(−) or cd14(−), cd19(−) or cd79a(−), hla-dr1(−) and multi-lineage in vitro differentiation potential (osteogenic, chondrogenic, and adipogenic) [59]. however, this definition would neatly exclude cd34+ hematopoietic stem cells (hscs), while one also could argue that the hematopoietic stem cells are just a specialized subclass of the mesenchymal progenitors [180]. another subset of mscs that express the hyaluronan receptor (cd44), an adhesion molecule important for stem cell homing [14, 125, 281], would also be excluded, but their perivascular equivalents could be considered to be true mpcs [180]. it becomes even more complicated if we include the results of (stem) cell reprogramming, when more or less differentiated cell types are regressed into less differentiated, pluripotent cell types [91, 185, 192, 239], providing us with a never ending stream of novel biomarkers, more often represented by whole proteome analysis [1, 203]. time will tell which biomarkers and criteria properly characterize the particular stem cell populations, but there is a functional definition lingering around as a firm conceptual handle on the idea of cell plasticity, of which stem cells are prominent representatives [21, 22].
the plasticity indicates the ability of matured or not fully differentiated cells to differentiate into novel cell types, or more accurately, it describes the existence of cells specialized into becoming progenitors of differentiated cells while sustaining their own type and maturation level. combining this with the embryology and origins of cell lineages from the three primordial "dermata" (ecto-, endo-, and mesoderm) provides us with a useful and generalized definition of mscs being the pluripotent, self-renewing stromal cells of mesenchymal origin and allowing us to determine the specific biomarkers later, at our convenience, as the state of affairs in mesenchymal stem cell biology progresses and solidifies. there is no doubt that we will find the appropriate placement for the specialized subtypes as well as the proper and practical placement of some of the induced pluripotent cells (ipcs) in the realm of mscs. regardless of the exactitude of the classification, the (omni-) presence of the mesenchymal cell lineages in all of the organs and tissues [11, 102, 116, 132, 180] renders them good candidates not only for general stem cell therapy [22, 73, 102, 235], but even more promising is the potential use of mscs for gene therapy [35, 45, 176, 196], cell reprogramming [9, 36, 252], delivery of bioactive molecules [163, 196], and tissue engineering [4, 82, 153]. in addition, a number of new issues are arising from the results describing the importance of stem cells in inducing and sustaining the malignant phenotype and the potential therapeutic targeting of a wide range of the elusive cancer stem cell types [27, 183, 217, 244]. the genetic modification of mscs and associated cell types with lentiviral vectors opens their application beyond reliance upon the innate properties of the cells. expression of proteins that can modulate their biology or therapeutic properties enormously expands their utility for therapy.
the lack of a crisp definition of all the stem cell types affects targeting of lentiviral vectors to specific subsets of stem cells. however, recent successful efforts in the pseudotyping of lentiviral vectors are a step in the right direction. the use of the vsvg pseudotype expanded the tropism of the lentiviral vectors and, as a result, practically any cell type can be targeted; the narrowing of the tropism by developing novel vector pseudotypes will be addressed. the emergence of single chain antibodies as pseudotypes indicates that we can expect a rapid expansion of this technology in the near future, which will result in a precise tool for studying cell lineages. we focus and limit our review on the recent progress made in stem cell research using lentiviral vector-based gene delivery, a method that is emerging as the safest and most effective way to modify (stem) cells permanently or, if using non-integrating versions of the novel generations of lentiviral vectors, temporarily. both approaches have clear potential for a wide range of research applications in preclinical studies as well as therapeutic applications. the most commonly used lentiviral vector framework is hiv-1 based, although hiv2, siv (simian immunodeficiency virus), and fiv (feline immunodeficiency virus) have been successfully tested; see review by dropulic [62]. the native hiv-1 is a human pathogen, but it has been modified to eliminate pathogenicity and increase safety before considering it as a broadly available tool for gene transfers. typically, lentiviral vectors are generated by trans complementation, a process that separates the essential components of hiv (the genes encoding gag-pol, rev, and env) into separate plasmids, which lack the packaging signal and, therefore, can never end up in a packaged vector unless they appear in a recombinant sequence [63]. the components (tat, vif, etc.)
responsible for pathogenicity by upregulation of transcription [54] and export of genomic rna to the cytoplasm have been successively removed from the constructs. the potential for a recombination event is minimized, and for all practical purposes avoided, by carefully editing the genes and using codon degeneracy to reduce the chances of recombination with the wild-type virus. these separate plasmids are used to co-transfect a packaging cell, typically hek293, along with a payload plasmid that carries the packaging signal necessary for starting the envelope formation and encapsidation of the mrna that carries the payload gene(s) as well as the 5′ and 3′ long terminal repeats (ltrs) necessary for integration into the transcriptionally active regions of the host chromosomes. packaging is a delicate process, which ensures that with the rna, appropriate trnalys, protease, integrase, and reverse transcriptase enzymes are carried by the vector with the packaging elements necessary for successful cell entry, reverse transcription of vector rna to dna, transport of that dna into the nucleus, and the permanent integration of the dna into the host chromosome. the env gene encodes the protein gp160 that is cleaved into trimer-forming gp120, which appears as spikes decorating the vector particle, and gp41, which carries a transmembrane region and a carboxy-terminal sub-domain that interacts with the nucleocapsid within the envelope. the n-terminal domain has a fusogenic domain that facilitates cell entry by fusing the outer membrane of the vector with the cell membrane. a region further downstream from the amino-terminal region also binds to gp120, which in turn binds to primary hiv-1 receptors on target cd4+ t lymphocytes. this property, if left unmodified, would significantly limit the usability of the lentiviral vector, as very few cell types can be directly infected by hiv-1. pseudotyping overcomes this limitation and permits targeting to any mammalian cell.
pseudotyping essentially replaces the original hiv env gene with a corresponding molecule from other viruses and carries over the cell-targeting specificity (i.e., the tropism of the virus) and obviates some of the safety concerns related to gp120 [258]. the list of successful pseudotypes and cell tropism is rather lengthy [18, 76-79, 105, 222] and growing. the most successful pseudotype so far uses the env from vesicular stomatitis virus (vsvg) that successfully broadens the tropism to cells in the brain, kidney, and liver, among others. it extends to mesenchymal (stem) cells, even those in a nondividing (resting) state [77, 136, 280]. filovirus env pseudotypes shift the tropism to a more limited set of cells, airway epithelial and endothelial cells [130, 148]. baculovirus gp64 and hepatitis c virus e1 and e2 pseudotypes redirect the vectors toward liver cells targeting their respective receptor, cd81 (tetraspanin) [19]. rabies virus env has been shown to efficiently retarget the vectors to neuronal cells [162, 262]. rd114 env pseudotyped lentiviral vectors show preference for the hematopoietic cellular compartment [20, 56, 85, 115, 221, 268]. however, some applications require targeting a specific cell type, which is not necessarily covered by the available pseudotypes listed above. in those cases, new targeting methods have been developed to further tighten the tropism of lentiviral vectors by co-expressing cell-specific coreceptors that recognize one of the cell-type specific markers. the payload plasmid components providing the backbone for the transfer vector in the early hiv vectors were composed of a 5′ ltr, followed by a major splice donor site, a packaging signal site encompassing the packaging signal components of the 5′ region of gag (necessary for high efficiency packaging and high vector titer) and a deletion of the rest of the gag gene.
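the pseudotype-to-tropism pairings listed in this section lend themselves to a small lookup table. the sketch below simply restates the pairings given in the text; the dictionary layout, the normalized tissue labels, and the helper name are assumptions for illustration, and the vsvg entry is deliberately incomplete because the text describes its tropism as broader than the tissues it names.

```python
# Pairings restated from the text; the labels are normalized for the example.
PSEUDOTYPE_TROPISM = {
    # vsvg is described as broad; only the tissues named in the text are listed.
    "vsvg": {"brain", "kidney", "liver", "mesenchymal stem cells"},
    "filovirus env": {"airway epithelium", "endothelium"},
    "baculovirus gp64": {"liver"},
    "hepatitis c virus e1/e2": {"liver"},
    "rabies virus env": {"neurons"},
    "rd114 env": {"hematopoietic cells"},
}


def pseudotypes_for(target):
    """Return the pseudotypes whose listed tropism includes `target`."""
    return sorted(
        name for name, tissues in PSEUDOTYPE_TROPISM.items() if target in tissues
    )


if __name__ == "__main__":
    print(pseudotypes_for("neurons"))
    print(pseudotypes_for("liver"))
```

a table like this only records what has been reported; it says nothing about transduction efficiency, which would need to be checked experimentally for each cell type.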
deletion of the u3 region from the 3′ ltr promoters also became possible by relying on constructing genes with their own promoter(s). the latest generation of lentiviral vectors carries an additional safety element, the self-inactivating ltr (sin lentiviral vector [114]), replacing the ltr with an hiv-independent promoter from cytomegalovirus (cmv). in these vectors the ltrs are modified in a way that upon integration they lose their intrinsic promoter ability, reducing genotoxic potential. in addition, the irreversible changes that occur during integration diminish the ability to mobilize after integration and to recombine with other elements to form a full-fledged, replication capable virus [25, 279]. the formal proof of increased safety is still lacking, ironically, because of the inability to create and detect rcl capable viruses from lentiviral vector-treated cells [25], indicating that this risk is mainly theoretical. the removal of rre and the associated splice donor and acceptor elements results in significant loss of transduction efficiency of the vector [127], while adding a 100 nucleotide central polypurine tract (central dna flap) restores the transduction efficiency by improving the reverse transcription and nuclear transport efficiency [47, 283]. the woodchuck hepatitis virus posttranscriptional regulatory element (wpre) is another widely used regulatory element added to the lentiviral vector backbone to stabilize transgene mrna levels and improve transgene expression [292]. however, an open reading frame of the oncogenic whv-x element has been found within the native wpre sequence [128], so the sequence has been modified to remove the translation start site [282]. further optimization continues to improve the safety of lentiviral vectors, such as isolating the integrated vector dna to prevent translation beyond the vector boundaries by adding insulator elements.
however, the insulators have themselves proven to be genotoxic in some instances, and no proof has emerged that such insulators are truly needed [100]. gene switches such as tet-on and tet-off have been added to subsequent generations of lentiviral vectors and proven to be highly functional, operating with very low leakage [194, 260]. the cre-lox system has also been successfully implemented in the lentiviral context, allowing high efficiency engineering and sophisticated, site-specific recombination techniques including the delivery and irreversible switching by small hairpin rna (shrna) expression [135], a tool extensively used in gene function analysis [37, 193]. a major concern regarding the safety of lentiviral vectors has been the potential genotoxicity resulting in oncogenesis, as observed previously during clinical trials for treating x-linked severe combined immunodeficiency (x-scid) with transplanted hscs treated with murine oncoretroviral vectors carrying the gene for the gamma chain of il2r [96-99]. the preferential insertion of oncoretroviral vectors in proximity of the lmo2 proto-oncogene and the subsequent constitutive activation of the proto-oncogene driven by the enhancer element (ltr) in the vector resulted in uncontrolled cell proliferation. however, in a series of studies comparing oncoretroviral and lentiviral vectors, it has been shown that while the oncoretroviral vectors trigger a dose dependent acceleration of cancer onset in a mouse transplantation model sensitive to cancer-triggering genetic changes (cdkn2a−/−), lentiviral vectors lacked such activity [172, 175] even though the vector integration rate was significantly higher.
this important observation, implying a low level of insertional mutagenicity, has been confirmed independently by several groups, indicating a favorable safety profile for lentiviral vectors while emphasizing the importance of vector design and avoidance of strong enhancers in the vectors [32, 169, 171]. recent clinical trials have supported the good safety profile of lentiviral vectors. there have been no oncogenic effects reported in any of the trials using lentiviral vectors to date [30, 83, 123, 143, 161, 195, 210, 276]. gene silencing by small interfering rna (rnai) is based on duplex formation between the mrna and a short complementary micro rna or small inhibitory rna, each having the ability to interfere with the protein synthesis and downregulate the expression levels of the targeted protein. a major problem with the inhibitory rna technologies is the short half-life and delivery of the rnai. this can be resolved using lentiviral vectors encoding artificial genes with appropriate micro rna sequences that can be integrated into the host cell dna and efficiently transcribed into primary micro rna that utilizes the natural intracellular processing by microprocessor complex formation with drosha to form small hairpin rna (shrna) in the nucleus. the exported mirna is subsequently cleaved by dicer and produces the complex forming inhibitory rnai. the process is rather complex, but efficient enough to produce a significant blockade of protein expression that may be incomplete but readily achieves a significant reduction, one that is adequate for gaining insight into the function of the targeted protein and efficient enough for phase i and ii clinical trials, though no therapeutic use has been approved by the fda.
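the duplex formation described above depends on base complementarity between the inhibitory rna and its mrna target. a minimal sketch of that relationship follows; the function name and the five-base target are hypothetical, and real shrna design involves many further constraints (length, seed region, off-target screening) that are not modeled here.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_strand(mrna_target):
    """Return the RNA strand complementary to `mrna_target`, written 5'->3'."""
    # Accept DNA-style input by converting T to U before pairing.
    target = mrna_target.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    # Hypothetical target fragment, for illustration only.
    print(antisense_strand("AUGGC"))
```

because the two strands pair antiparallel, the complement is read from the far end of the target, which is why the sequence is reversed before each base is swapped.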
it is interesting to note that mscs are capable of secreting cholesterol-rich phospholipid microparticles encapsulating mirna, and therefore have the potential to facilitate intercellular communication and act as regulatory agents in their microenvironment [38]. hematopoietic or general pluripotent stem cells are often selected as targets for rna interference-based interventions, and one of the promising efforts deals with creating artificial virus resistance genes and virus resistant somatic cells. preventing hiv infection by reconstituting the immune system with such stem cell-derived virus-resistant progeny has been used as a model system with significant clinical relevance [121]. the idea is that an efficient hiv infection requires virus entry through the cd4 surface antigen and one or more virus co-receptors, among which ccr5 has been shown to play an essential role in the case of r5 tropic viral strains involved in primary hiv infection. clinical data indicate that ccr5 deficiency or certain mutations in this co-receptor protect the infected individuals from the onset of full blown aids, and the hope is that the artificial knockdown of ccr5 using gene therapy and rnai will achieve similar protection [8, 57, 121, 146, 233]. the relative inefficiency of the ccr5 suppression remains a significant issue, but major improvement and complete knockdown of ccr5 have been achieved with somewhat longer (28 bases instead of 23) shrna [7]. mesenchymal stem cell research is taking full advantage of the shrna techniques by characterizing the subtle, and not so subtle, changes induced by individual gene knockdowns. it is a long held view that mechanical stresses and mechanical characteristics of stem cells, as well as the microenvironment, can affect stem cell proliferation and differentiation.
lentiviral vectors are excellent and efficient targeting tools for these stem cells, even resting ones, and can deliver the shrna without causing major changes and stress that would otherwise alter the stem cells on their own. chowdhury et al. studied the spreading response of mscs and showed that myosin ii, f-actin, src, or cdc42 were essential for cell spreading, and that changes in the mechanical characteristics ("softening") of the stem cells led directly to the downregulation of the oct3/4 gene. this indicates the possibility that small mechanical events may affect the embryo, developing tissues, and even transplanted stem cells [ 41 ]. another area of efficient use of lentiviral vectors and rnai technology in stem cell research is the production of transgenic embryos that carry knockdown genes. production of transgenic embryos is highly efficient: if the fertilized egg is transduced at the single-cell stage, the entire germ line is affected, whereas partial chimerism can be achieved if multicellular embryos are treated with lentiviral vectors. an example of such a study is that by wang et al., who showed that the knockdown of runx1 in embryonal tissues and mscs by lentiviral vector-delivered interfering rna blocked chondrogenesis in limb buds [ 257 ]. the technique has been shown to be very efficient for transgenesis: an average germ-line transmission rate as high as 44% can be achieved [ 227 ], providing a new source of gene-modified mscs. a recent comprehensive review of the use of naturally occurring regulatory mirna technology in mesenchymal stem cell research has been written by guo et al. [ 93 ], indicating that stem cells have discrete and distinct expression profiles that can account for intrinsic stem cell properties such as self-renewal and pluripotency, a property that can no longer be overlooked by experts dealing with mscs.
the accumulating data indicate that the progenitors and terminally differentiated mesenchymal cells can be tracked and defined by function-related mirnas in addition to the already established sets of surface markers. the mirnas already identified affect osteogenic differentiation, chondrogenic differentiation, adipogenic differentiation, myogenic differentiation, neuronal differentiation, wound healing, and replicative senescence. these advances open a wide array of possibilities to direct the differentiation patterns of the stem cell population temporarily by using non-integrating lentiviral vectors, which are automatically lost from dividing cell populations. the control signal therefore disappears naturally after a few cell divisions, while potentially giving a push to the original stem cell population to develop in a preferred direction. extensive progress has been made with regard to the elucidation of the hedgehog signaling pathway in mscs using rna interference delivered with lentiviral vectors. the data suggest that at least some of the elements indeed act through the regulatory mirna network, by downregulating the cellular mirna levels. the data, however, also suggest significant off-target effects of the interfering rna molecules and indicate that we are a long way from the potential clinical use of the elucidated networks [ 124 ]. an ingenious method was devised by hu et al. to prepare the brain for traumatic interventions (surgery, extensive stem cell transplantations, etc.) by downregulating the cerebral matrix metalloproteinase 9 (mmp-9) using a lentiviral vector and mmp-9 shrna 2 weeks before the trauma.
the knockdown of mmp-9 with the shrna proved to be an effective way to preserve the blood-brain barrier, and they achieved a significant reduction of brain infarction volumes, of brain water content and evans blue/igg extravasation (a measure of edema formation), and of the neurobehavioral deficit in their rat brain trauma model [ 108 ], implying a potential for improved protocols for the traumatic brain interventions needed for more extensive types of intracranial stem cell implantation. as mentioned earlier, lentiviral vectors provide a very efficient method for generating transgenic embryos, significantly reducing the need to generate a high number of embryos to establish new sources of gene-modified stem cell lines, embryonic or other [ 227 ]. the lentiviral technology is able to deliver a payload of 6-8 kb very efficiently, but payloads of 10 kb can be handled and delivery of 12-13 kb is possible, at the cost of lower efficiency. this payload-carrying capacity allows the delivery of very large genes, such as the gene encoding blood clotting factor viii, a 2,351 amino acid long protein, together with its stabilizer, the von willebrand factor (2,813 amino acids in its native form), simultaneously; alternatively, one may need to use domain-engineered and shortened versions of both. similarly, it can be used to deliver all three chains of an igm molecule in a single, tri-cistronic complex. the implication is that the lentiviral vector system has sufficient payload capacity to deliver a number of relevant genes together with several supporting molecules envisioned for highly complex gene therapy scenarios currently outside the scope of monogenic gene therapy as practiced today. it may be used in the future to target multi-gene disorders such as high blood pressure, arthritis, or diabetes.
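the payload figures above can be checked with a rough back-of-the-envelope calculation. the sketch below is illustrative only: it assumes three nucleotides per codon plus one stop codon, and it ignores promoters, untranslated regions, and any other vector elements, so real constructs would be larger. the amino acid counts and the 8 kb and 13 kb limits are taken from the text; the function and variable names are hypothetical.

```python
# Rough coding-sequence size estimate for the proteins named in the text.
# Assumption: 3 nucleotides per codon plus one stop codon; promoters, UTRs
# and other vector elements are NOT counted, so real inserts are larger.

CODON_BP = 3

def coding_length_bp(amino_acids: int, include_stop: bool = True) -> int:
    """Return the minimal open reading frame length in base pairs."""
    codons = amino_acids + (1 if include_stop else 0)
    return codons * CODON_BP

# Amino acid counts as quoted in the text.
proteins = {"factor viii": 2351, "von willebrand factor": 2813}

# Payload limits quoted in the text (upper ends of the stated ranges).
EFFICIENT_MAX_BP = 8_000   # "6-8 kb very efficiently"
POSSIBLE_MAX_BP = 13_000   # "12-13 kb is possible" at lower efficiency

sizes = {name: coding_length_bp(aa) for name, aa in proteins.items()}
total_bp = sum(sizes.values())

for name, bp in sizes.items():
    print(f"{name}: {bp} bp, within efficient range: {bp <= EFFICIENT_MAX_BP}")
print(f"combined: {total_bp} bp, within possible range: {total_bp <= POSSIBLE_MAX_BP}")
```

under these assumptions, the two full-length coding sequences together exceed the 12-13 kb upper limit, which is consistent with the remark that domain-engineered, shortened versions of both proteins may be needed for simultaneous delivery.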
zinc-finger nucleases (zfns) have the remarkable ability to (a) bind to a specific location in the double-stranded dna; (b) break the double-stranded dna at that specific location; and, if a repair template is provided, (c) initiate homology-directed repair, restoring the integrity of the newly edited double-stranded dna. as their name implies, this class of enzymes has a specific dna-binding part, which consists of a tandem repeat of dna-binding zinc-finger motifs (hence the dna-binding specificity), and a catalytic domain, foki. for dna cleavage to occur, foki has to dimerize, one monomer on the sense and the other on the antisense strand, while the zinc-finger domains attach to the right target half site and the left target half site. upon binding, a nick with a 5′ overhang is initiated by foki between the target sites and the homology-directed dna repair mechanism is activated. what makes this configuration useful is that the spacer between the two target half sites can be several hundreds or even thousands of base pairs long and, by providing a template for the activated repair mechanism, a novel dna sequence of equal length can be introduced into the dna; see recent reviews by caroll [ 29 ] and others [ 50, 101, 117, 134, 246 ]. fundamentally, two factors determine the efficacy of the dna editing or repair that the technology allows. the first is the specificity of the zinc-finger binding, which also determines the length of the spacer and the proper specificity and uniqueness of the binding site, and allows the minimization of the off-target effects that may be introduced by similar sites far away from the desired and targeted locus [ 101 ]. huge efforts are being made to tailor the zinc-finger nucleases for particular applications and to improve the selectivity by successfully engineering the dna-binding specificity of the binding domain [ 3, 101, 158, 201, 220 ].
the second factor is the efficient delivery of the zfns and the template dna by vectors. while the early attempts relied on retroviral vectors, adeno and adeno-associated vectors, and even baculovirus vectors, the recent advances in the field clearly indicate that the lentiviral delivery system is considered to be a safer and more efficacious route. conversion rates as high as 50% can be achieved with lentiviral delivery in a variety of cell lines and human embryonic stem cells [ 154 ], as compared with the earlier best rates of 18% with other methods in human and other species [ 3, 101, 197, 198, 201, 220, 245, 264 ]. one of the many holy grails of medicine, the ability to replace diseased tissue or even entire organs, seems to be hovering on the not too distant horizon. there is rapid progress in a wide range of areas, but at the center of the solution is almost always a biocompatible scaffold that is populated with a wide variety of cells. the strategically positioned cells find their place within the 3d structure, propagate, differentiate, and fill the available space, while producing a structure that can replace or enhance the damaged tissue in the form of various implants or prosthetics. as for scaffolding, the options are quite numerous, including scaffolds obtained from cadavers or live organs (of animal or human origin) by removing the cells while preserving the fibrous tissue that maintains the basic morphology of the organ. alternatively, a scaffold can be printed with various 3d printers [ 126, 159, 214-216, 229, 255, 261 ]. processed cartilage can also yield a scaffold, which can be used to rebuild and regrow an implantable ear, nose, or cartilage for trachea reconstruction [ 174 ].
the culture, expansion, and differentiation of human mscs into artificial tissues represent a very complex series of events, and lentiviral vectors often serve as excellent research tools for marking, visualizing, and tracking the process [ 253 ], or for modifying the gene or protein expression patterns [ 68 ]. a number of tissue engineering attempts have reached the clinic, and lentiviral vectors have played various roles in the advancement of the technology. a very promising technology is the use of these scaffolded artificial tissues employing mscs and lentiviral vectors for delivering biologics for prolonged times. van damme succinctly described the potential of these artificial tissues built on scaffolds and providing artificial implants for drug delivery. lentiviral vectors were used to transduce mesenchymal cells to express green fluorescent protein (gfp) or fviii. expression was superior compared to oncoretroviral transduction, showing consistently higher transduction rates, and expression remained high for several months post-transduction. the transduced cells retained their stem/progenitor cell properties, and they were still capable of differentiating along adipogenic and osteogenic lineages in vitro, while maintaining high gfp and fviii expression levels. implantation of lentiviral vector-transduced human bone marrow mesenchymal cells into immunodeficient mice using collagen scaffolds resulted in efficient engraftment of gene-engineered cells and provided sites for transgene expression in vivo. in addition to the bone marrow-derived stem cells, adipose tissue-derived mesenchymal stem cells have been shown to be amenable to populating implantable scaffolds and to retain the potential to differentiate into osteogenic cells. some of these scaffolds have been engineered for use in reconstructing craniofacial bone defects. lentiviral vectors have been used to deliver fluorescent proteins to track cells during manipulation such as osteogenic differentiation.
the gfp-marked stem cells and their progeny remained fluorescent over the 8 weeks of the study period. the gfp-marked stem cells were successfully induced into osteogenic cells both in monolayers and in three-dimensional scaffolds. quantification showed no decrease in staining of the osteoinduced stem cells, indicating the efficiency and durability of the labeling [ 256 ]. tissue-engineered vascular grafts built on bilayered elastomeric poly(ester-urethane) urea scaffolds and seeded with pericytes have shown promise in the past. however, in vitro endothelialization is still an issue for the use of these types of grafts. in 2006, doebis et al. reported enhanced endothelialization using allogeneic endothelial cells or their precursors expressing a recombinant anti-alpha-mhc i single-chain antibody to prevent rejection. the recombinant antibody was delivered efficiently ex vivo using a lentiviral vector, and significantly reduced the mhc i expression levels as well as the killing of allogeneic cells by mhc i-specific cd8+ t cells [ 58 ]. the results suggest that these allogeneic cells may provide a suitable alternative supply for the lining of vascular prostheses. endothelial cells and their precursors are attractive targets for gene therapy, both for the treatment of cardiovascular disease and for the systemic delivery of recombinant gene products directly into the circulation. a few reports have examined the efficiency of lentiviral vector-mediated gene transfer. sacoda and colleagues compared the effectiveness of lentiviral vectors with that of adeno and oncoretroviral vectors. bovine aortic endothelial cells (baecs) were infected in vitro with these viral vectors. the transduction efficiency of beta-gal gene transfer in baecs by adenovirus, lentiviral vector, or retrovirus at a multiplicity of infection (moi) of 10 (determined on hela cells) was 69 ± 11, 33 ± 8, or 22 ± 6%, respectively.
at a higher moi (50), both adenovirus and lentiviral vectors achieved an almost 100% transduction rate. however, retroviral vectors showed only 48 ± 6% at moi 50 and no increase at moi 100. the percentage of beta-gal positive cells decreased rapidly with continued passaging of cells transduced by adenovirus. in contrast, lentiviral vector- and retrovirus vector-mediated transductions showed a sustained higher percentage of positive cells. furthermore, the transductions by lentiviral vectors had no significant effect on the viability of baecs, suggesting that for long-term cell therapy the lentiviral vectors have the best overall features [ 219 ]. expressing il10 in similar settings in the early, initiation phase also inhibited and delayed the onset of the rejection process [ 287 ]. one case in which the performance of the endothelial cells may need to be boosted is the need to increase the resistance of vascularized transplants and implants, or of normal tissues undergoing prolonged surgery, to ischemia-reperfusion injury. this is a condition that occurs too frequently and is responsible for devastating tissue injury caused by systemic activation of the complement system. lentiviral vectors can be used to force the overexpression of the anti-apoptotic gene bcl-xl, and indeed this has shown significant protection from early apoptotic loss of vascular endothelial cells [ 286 ]. recently, tooth tissue engineering has attracted more and more attention. stem cell-based tissue engineering is thought to be a promising way to replace a missing tooth. the potential mscs for tooth regeneration mainly include stem cells from human exfoliated deciduous teeth (sheds), adult dental pulp stem cells (dpscs), stem cells from the apical part of the papilla (scaps), stem cells from the dental follicle (dfscs), periodontal ligament stem cells (pdlscs), and bone marrow-derived mscs (bmscs). a recent review by peng et al. shows promising progress [ 190 ].
however, in practice, tissues other than bone marrow can serve as stem cell donors for oral tissue regeneration, including adipose tissue, periodontal ligament, and pulp [ 206 ]. the experimental data suggest that not only the stem cells ex vivo, but also cells in the osteogenic tissue are amenable to direct transduction by lentiviral vector [ 259 ]. this opens up the periodontal reconstruction interventions to the beneficial effects of gene therapy, enhancing wound healing and improving engraftment by expressing growth promoters at low and slowly decreasing concentrations. estrela published an excellent review on the potential of mscs in the regeneration of dental tissues [ 68 ] and rodrigues-loza reviewed the mesenchymal cell types recovered from dental tissues [ 208 ]. other data clearly show that the primary osteogenic cells are efficiently transduced by lentiviral vector, and that their infusion into the mandible is a feasible method for locally delivering dna to primary osteogenic and bone cells in rat models [ 259 ], indicating that future in vivo applications in dental implant enhancement using dental scaffolding, bone healing, and tooth regeneration may be feasible. recent efforts extend toward engineering dental repair by changing the expression of growth factors and bone morphogenic proteins leading to dentin formation, as discussed in a 2011 review by casagrande [ 31 ]; these approaches seem to be amenable to cell therapy efforts with non-integrating lentiviral vectors. one of the tissues that is often injured but that presents difficulties when it comes to healing and repair is the tendon. enhancing the healing process by in situ overexpression of helper factors such as il10 could reduce recuperation time and perhaps improve the quality of the repair. richetti et al. reported promising results in a murine model of patellar tendon injury after direct injection of an il10 transgene using a lentiviral vector.
although the tendons showed no obvious histological difference, the il-10-treated groups had superior mechanical characteristics by day 42 [ 205 ]. although the mechanism of wound healing in tendons is not yet understood, the involvement of mscs is suspected, and the delivery of additional factors that partake in the healing process is discussed by meyerose and ashlan [ 13, 163 ]. recent findings by shamis and colleagues [ 230 ] demonstrated that embryonic stem cells could be directed to specified and alternative mesenchymal cell fates whose function could be distinguished in engineered human skin equivalents. lentiviral shrna-mediated knockdown of hepatocyte growth factor (hgf) resulted in a dramatic decrease of hgf secretion from cell lines (edk cells), which led to a marked reduction in their ability to promote keratinocyte proliferation and re-epithelialization of cutaneous wounds. in contrast, h9-mscs demonstrated features of mscs but not those of dermal fibroblasts, as they underwent multilineage differentiation in monolayer culture, but were unable to support epithelial tissue development and repair and produced significantly lower levels of hgf. characterization of these induced mesenchymal cells in 3d engineered human skin equivalents demonstrated the utility of this tissue platform to predict the functional properties of stem cell-derived fibroblasts before their therapeutic use in reconstructive skin transplantation and wound healing. inhibition of hyper-keratinization by expressing a mutant form of tgf beta3 that has lost its binding site for latency-associated peptide reduced the re-epithelialization density and fibroblast/myofibroblast trans-differentiation within the wound area [ 251 ] in a mouse skin wounding model.
the expression of this mutated gene was achieved by injecting lentiviral vectors encoding the mutant tgf beta3 into the regenerating tissue. the changes induced by this intervention predict a significant decrease in keloid formation and provide a potential model for preventing the painful disfigurement that follows the abnormally strong skin remodeling and scar tissue formation that oftentimes accompany wound healing. the data indicate that future stem cell therapy with carefully designed interventions for patients prone to scar tissue formation could find widespread application. one of the causes of erectile dysfunction is damage to the penile cavernous smooth muscle cells (smcs) and sinus endothelial cells (ecs). song reports that it may be feasible to restore these cells by applying mscs to penile cavernous ecs or smcs. for this purpose, cells of the human bone marrow mesenchymal stem cell line b10, immortalized via a lentiviral vector encoding v-myc, were transplanted into the cavernosum of sprague-dawley rats and harvested 2 weeks later. the expression of cd31, von willebrand factor (vwf), smooth muscle cell actin (sma), calponin, and desmin was determined immunohistochemically in rat penile cavernosum. b10 cells were found to be multipotent, capable of adipogenic, osteogenic, or chondrogenic differentiation. expression of endothelial cell-specific markers (cd31 or vwf protein) and expression of smooth muscle cell-specific markers (calponin, sma, or desmin protein) were demonstrated in grafted b10 cells, indicating that human mscs may be good candidates for the treatment of penile cavernosum injury [ 238 ]. angiogenesis requires the presence and active involvement of mscs, and therefore mscs are ready to be recruited into areas where there is a need for new blood vessels: the inflamed, hypoxic, tumor-infested locations. gehmert et al. described an interesting model to study the migration of mscs.
in their work, immunodeficient mice were engrafted with human breast cancer cells (4t1) in the left mammary pad. a day later, the mice were injected ip with luciferase-labeled adipose tissue-derived mscs (using lentiviral vector technology). the mscs were found to rapidly migrate into the tumor, confirming the previous observations that mscs can be found within the tumor stroma and vasculature, even if inflammation is not present, as the immunodeficient mice lacked the inflammation signaling pathway. based on this result, it can be suggested that mscs can be attracted solely by the cytokines produced by the tumor. however, the power of inflammation has been clearly demonstrated in control animals, which received e. coli injections at contralateral locations and attracted all the mscs, leaving the tumor implant msc-free [ 84 ]. elucidating the migratory mechanisms of the mscs seems to be an important step toward finding a delivery system to inflammatory sites and finding the conditions for clear migration into established tumors. even the simple marking of tumor tissue with fluorescent proteins (such as gfp) holds important promise for surgeons, as delineating a breast cancer in situ during surgery would be possible by applying uv light and tracing the contours of the tumor. the technique already allows sophisticated molecular imaging combined with stem cell therapy [ 254 ]. wang and colleagues used the ability of mscs to differentiate into endothelial cells in vivo to establish whether the differentiated mscs persist in vivo and to determine if this potential persistence contributes to functional improvement after experimental myocardial infarction. they generated a lentiviral vector encoding two distinct reporter genes, one driven by a constitutive murine stem cell virus promoter and the other driven by an endothelial-specific tie-2 promoter.
the endothelial specificity of the lentiviral vector was validated by its expression in endothelial cells but not in undifferentiated stem cells. the lentivirus-transduced mscs were injected into peri-infarct areas of the hearts of severe combined immunodeficient mice. persistence of injected cells was tracked by bioluminescence imaging (bli) and verified by immunohistochemical staining. the bli signal from the endothelial-specific reporter revealed that the stem cells differentiated into endothelial cells 48 h after injection. however, both the constitutive and endothelial-specific signals disappeared by day 50. nonetheless, the improvement in left ventricle ejection fraction with therapy persisted for up to 6 months. immunohistochemical staining showed that stem cell-derived endothelial cells integrated into endogenous cd31+ vessels. furthermore, stem cell-transplanted hearts had more cd31+ vessels and a lesser degree of cardiac fibrosis compared with the controls at 6 months. increased angiogenesis and decreased fibrosis were associated with cardiac functional improvement. similarly, mscs double-marked with gfp-lentiviral vector and superparamagnetic iron oxide could be followed by mri for up to 8 months in a porcine model of infarction and revascularization [ 274 ]. endothelial cells respond to mild injurious stimuli by upregulating anti-apoptotic gene expression to maintain endothelial integrity. ec dysfunction and apoptosis resulting from ischemia/reperfusion (i/r) injury may contribute to chronic allograft rejection. under optimized conditions for lentiviral vector transduction of rat aortic endothelial cells (raec), the delivery of the anti-apoptotic gene bcl-xl via lentiviral vector protects raec from apoptotic death. the authors confirmed the damaging effect of the reperfusion phase. endogenous bax expression increased with i/r injury, whereas endogenous bcl-xl remained constant.
raec transduced with lentiviral vector expressing bcl-xl were protected from early apoptosis caused by i/r injury, correlating with reduced cytochrome c release into the cytosol. this protective effect may be attributed to altering the balance of pro- and anti-apoptotic proteins, resulting in sequestration of the harmful bax protein, and may open up new strategies for controlling chronic allograft rejection [ 286 ]. inhibition of na+/h+ exchanger 1 (nhe1) reduces cardiac ischemia-reperfusion (i/r) injury as well as cardiac hypertrophy and cardiac failure. although the mechanisms underlying these nhe1-mediated effects suggest a delay of mitochondrial permeability transition pore (mptp) opening and a reduction of mitochondrial-derived superoxide production, the possibility of nhe1 blockade targeting mitochondria has been incompletely explored. a short-hairpin rna sequence mediating specific knockdown of nhe1 expression was incorporated into a lentiviral vector (shrna-nhe1) and transduced into the rat myocardium. analysis of nhe1 expression in mitochondrial lysates revealed that shrna-nhe1 transduction reduced mitochondrial nhe1 (mnhe1) by approximately 60%, supporting the expression of nhe1 in mitochondrial membranes. electron microscopy studies corroborate the presence of nhe1 in heart mitochondria. immunostaining of rat cardiomyocytes also suggests colocalization of nhe1 with the mitochondrial marker cytochrome c oxidase. to examine the functional role of mnhe1, mitochondrial suspensions were exposed to increasing concentrations of cacl2 to induce mptp opening and, consequently, rat heart mitochondrial swelling. shrna-nhe1 transduction reduced the cacl2-induced mitochondrial swelling by 64 ± 4%, whereas the nhe1 inhibitor hoe-642 (10 μM) decreased mitochondrial ca2+-induced swelling by only 37 ± 6%.
because mitochondria from rats injected with shrna-nhe1 present a high threshold for mptp formation, the beneficial effects of nhe1 inhibition in i/r resulting from mitochondrial targeting should be considered as a future target for cell therapy [ 250 ]. oxidative stress is important in a number of pathologies, including cardiovascular diseases such as atherosclerosis and cardiac ischemia-reperfusion injury. an important mechanism for adaptation to oxidative stress is the induction of genes through the antioxidant response element (are), which regulates the expression of antioxidant and cytoprotective genes via the transcription factor nrf2 (nuclear factor e2-related factor 2). as nrf2-regulated genes are induced during oxidant stress, occurring for example in reperfusion after ischemia, hurttila et al. took a novel approach to exploit are for the development of oxidative stress-inducible gene therapy vectors. to this end, one, two, or three are-containing regions from human nad(p)h:quinone oxidoreductase-1, glutamate-cysteine ligase modifier subunit, and mouse heme oxygenase-1 were cloned into a vector expressing luciferase under a minimal sv40 promoter. the construct that was the most responsive to are-inducing agents was chosen for further studies, in which a lentiviral vector was produced for efficient transfer to endothelial cells. heme oxygenase-1 (ho-1), which has well-characterized anti-inflammatory properties, was used as the therapeutic transgene. in human endothelial cells, are-driven ho-1 overexpression inhibited nuclear factor-kappa b activation and the subsequent vascular cell adhesion molecule-1 expression induced by tumor necrosis factor-alpha. they concluded that the are element is a promising alternative for the development of oxidative stress-inducible gene therapy vectors [ 111 ]. progenitor cell therapy is a potential new treatment option for ischemic conditions in the myocardium and skeletal muscles.
however, it remains unclear whether umbilical cord blood (ucb)-derived progenitor cells can be therapeutic in ischemic muscles and, if so, whether ex vivo gene transfer can be used to improve the effect. the use of lentiviral vector led to efficient transduction of both ucb-derived hscs and mscs, resulting in long-term transgene expression. moreover, it did not alter the differentiation potential of either hscs or mscs. in addition, to examine their therapeutic potential, cd133+ and msc progenitor cells were transduced ex vivo with a lentiviral vector encoding the mature form of vascular endothelial growth factor d (vegf-d) or the enhanced green fluorescent protein (egfp) marker gene, and achieved permanent gene expression. the transplantation of the progenitor cells into nude mice serving as a mouse model of skeletal muscle ischemia enhanced the regeneration of ischemic muscles but, notably, without a detectable long-term engraftment of either cd133+ or msc progenitor cells. the results show that rather than directly participating in angiogenesis or skeletal myogenesis, the ucb-derived progenitor cells indirectly enhance the regenerative capacity of skeletal muscle after acute ischemic injury. however, rather counter-intuitively, the vegf-d gene transfer into the progenitor cells did not improve the therapeutic effect in ischemic muscles [ 131 ]. another cell type with improved adult stem cell functions has been discovered, and these cells have been isolated from the peripheral blood of young children. this clonally expandable, telomerase-expressing progenitor cell type is distinct from hematopoietic or mesenchymal stromal cells and resembles embryonic multipotent mesoangioblasts. cell numbers and proliferative capacity correlate with donor age, and the cells express the pluripotency markers klf4 and c-myc, as well as low levels of oct3/4, but lack sox2.
overexpression of sox2 by lentiviral transduction (sox-mabs) enhances pluripotency and facilitates differentiation to cardiovascular lineages. furthermore, the number of smooth muscle actin-positive cells was higher in sox-mabs. in addition, pluripotency of sox-mabs was shown in a mouse model by demonstrating the generation of endodermal and ectodermal progenies, and injection of sox-mabs into nude mice after acute myocardial infarction resulted in improved cardiac function compared to mice treated with control cells (cmabs). furthermore, cell therapy with sox-mabs resulted in an increased number of differentiated cardiomyocytes, endothelial cells, and smooth muscle cells in vivo [ 133 ]. mesenchymal stem cell therapy is emerging as a viable therapy in the context of acute lung injury/acute respiratory distress syndrome and chronic disorders, such as lung fibrosis and chronic obstructive pulmonary disease. there is evidence for beneficial effects of mscs on lung development, repair, and remodeling. engraftment in the injured lung does not occur easily, but several studies report that paracrine factors can be effective in reducing inflammation and promoting tissue repair. mscs release several growth factors and anti-inflammatory cytokines that regulate endothelial and epithelial permeability and reduce the severity of inflammation, as reviewed by arboreau et al. [ 2 ], suggesting that carefully controlled expression of these factors using transduced stem cells could enhance the beneficial effects of mesenchymal stem cell therapy. this may be a risky proposal, however, since constitutive expression of tgf beta/tgf alpha in epithelial mscs generated breast cancer stem cells [ 12 ]. acute respiratory distress syndrome (ards) is a crippling disease with no effective therapy, characterized by progressive lung damage followed by dyspnea.
mscs have been proposed as a new therapeutic modality for ards because the stem cells can attenuate inflammation and repair the damaged tissue by differentiating into several cell types. the beneficial effect of the stem cells is still a minor mystery, as it is known that macrophages participate in the development of ards and that mscs can only weakly modulate macrophage function. the chemokine ccl2 is a potent inducer of macrophage recruitment and activation, and its expression is elevated in patients with ards. a set of mscs has been generated by transducing the cells with a lentiviral vector expressing 7nd, a dominant-negative inhibitor of ccl2, with the expectation that the therapeutic function of the mscs would be enhanced if the hypothesis is valid. the transduction was effective, and the stem cells produced a large amount of 7nd. after lung injury was induced by bleomycin treatment, the iv-injected mscs readily migrated into the site of injury, as confirmed by immunostaining 24 h postinjection. this finding suggests that mscs could work as a drug delivery tool. mice treated with 7nd-expressing mscs showed significantly milder weight loss, less severe lung injury, lower collagen content, and less accumulation of inflammatory cells and inflammatory mediators, and ultimately showed significant gains in survival [218]. no evidence of 7nd-mesenchymal stem cell-induced toxicity was observed during or after treatment. thus, inhibiting the effects of macrophages may greatly enhance the ability of mscs to affect lung repair in ards.
direct transduction of lung tissues for gene therapy has always been an attractive proposal. the recurring problem, however, is that the airways are far less accessible to vector particles than hoped for, and the depth of penetration of inhaled substrate ends in the branches which are larger than 100 μm in diameter [48, 263]. an attractive alternative for delivering gene therapy components could be the intrapleural injection of mscs.
to enable tracking, the cells were labeled with green fluorescent protein (gfp) using a lentiviral vector, and were found readily attached to the pleura of sprague-dawley rats. the isolated and recovered cells preserved the typical mesenchymal stem cell phenotype and could differentiate into adipocytes, osteoblasts, and chondroblasts in vitro. the highest number of labeled cells was found adhered to the mediastinal pleura, but no labeled cells were detected in the lung parenchyma or other tissues/organs, such as the liver, kidney, spleen, and mesenterium, a remarkable compartmentalization of a stem cell transplant [200].
alzheimer's disease (ad) is one of the most devastating conditions, and its prevalence is still rising in parallel with the increase in average life expectancy. a hallmark of the disease is the accumulation of amyloid plaques and extensive neurodegeneration in the context of intracerebral inflammation, leading to progressive dementia. over the years, a tripartite set of goals has crystallized for potential treatments of ad: (a) stop the progression of the disease by reducing/reversing plaque formation; (b) stop the neurodegeneration that seems to be a consequence of both internal changes (neurofibrillary tangle formation and related issues) and changes external to the cells, related to plaque formation and degeneration of the neuronal microenvironment; and (c) recover neurological function by replenishing the lost neuronal compartment [71, 81, 94, 122, 152, 188, 291]. interestingly, mscs and stem cell therapy are increasingly considered a potentially important part of the toolset to achieve these goals. the symptoms that are collectively categorized as ad often have different backgrounds, some of which seem to have genetic roots, such as improper processing of beta amyloid peptide.
consequently, a disease-modifying therapeutic approach in alzheimer's disease aims to reduce the accumulation of neurotoxic beta amyloid peptide aggregates. habish et al. report new findings for a potential autologous stem cell-based strategy for delivery of enzymatic activities against beta amyloid formation in the brain. f-spondin and neprilysin (cd10), genes expressed in adult mscs, are known to be involved in the formation and degradation of beta amyloid peptides, respectively. coincubation of the converted mscs with hek-293 cells stably expressing amyloid precursor protein (app) led to a significant cell dose-dependent decrease of amyloid peptide release and deposition, indicating that mscs might be useful for delivering antiamyloid activity to treat ad [95]. this direction of research is gaining new momentum from the discovery of a new beta amyloid secretase and the tremendous progress made in recent years in understanding amyloid formation and its contribution to neurodegenerative diseases [122], allowing new gene therapies to be conceived and tested. one effort has utilized human umbilical cord blood-derived mscs (hucb-mscs), which were transplanted into amyloid precursor protein and presenilin1 (app/ps1) double-transgenic mice. this experiment resulted in significantly improved spatial learning and reduced memory decline. furthermore, beta amyloid peptide deposition, beta-secretase 1 (bace-1) levels, and the hyper-phosphorylation of tau proteins were dramatically reduced in hucb-msc transplanted app/ps1 mice. interestingly, these effects were associated with reversal of disease-associated microglial neuroinflammation, as evidenced by decreased microglia-induced pro-inflammatory cytokines, an increase in the number of alternatively activated microglia, and an increase in anti-inflammatory cytokines.
combining these findings with the potential cell therapy targeting, these mscs are expected to produce a sustained neuroprotective effect by establishing a feed-forward loop engaging the alternative activation of microglia, thereby ameliorating disease pathophysiology and reversing the cognitive decline associated with amyloid deposition [139]. peng and colleagues report additional details on the use of lentivirus-expressed sirna as a method to ameliorate alzheimer's disease neuropathology in app transgenic mice by reducing the levels of beta-site app cleaving enzyme 1, or bace1 [189]. a series of experiments demonstrated the potential of neural stem cells (nscs) transduced by a multigenic lentiviral vector to stably express recombinant human nerve growth factor (rhngf) in amounts relevant for therapeutic applications. the multigenic lentiviral vector contained a tricistronic cassette to express simultaneously up to three independent genes: (1) rhngf (beta subunit); (2) egfp (enhanced green fluorescent protein); and (3) neo(r) (neomycin antibiotic resistance gene). lentiviral vectors were released into the culture media and subsequently used to transduce mouse stem cells. remarkably, subsequent tests revealed that the engineered nscs were all positive for egfp and that after 30 passages in vitro the engineered cells maintained their multipotentiality to differentiate into neurons, astrocytes, and oligodendrocytes. furthermore, it was found that rhngf-stem cell-derived neurons expressed choline acetyltransferase (chat) and displayed enhanced axonal growth. the stem cells showed an altered sphere-forming frequency either in rhngf-nsc or in both groups of control nsc. lentivirus-mediated rhngf gene transfer into nscs was achieved without changes in the expression of neural differentiation markers, such as microtubule-associated protein 2 (map2) (a/b), glial fibrillary acidic protein (gfap), and chondroitin sulfate proteoglycan [34].
secreted rhngf increased axonal sprouting by rhngf-nsc-derived neurons, which was associated with chat expression. rhngf-nscs may prospectively be a good candidate for the treatment of neurodegenerative diseases. a protein that has been shown to promote app accumulation is beta-secretase (beta-site app cleaving enzyme 1, or bace1). typically, a marked increase in the level of bace1 is found in the cerebrospinal fluid of those affected with alzheimer's disease. in vivo studies using app transgenic mice have demonstrated that decreasing the expression of bace1 via lentiviral vector delivery of bace1 sirna has the potential to significantly reduce the cleavage of app, the accumulation of its cleavage products, and the consequent neurodegeneration. as such, lentiviral-expressed sirna against bace1 is a therapeutic possibility in the treatment of ad. neprilysin has recently been implicated as a major extracellular beta amyloid-degrading enzyme in the brain. a unilateral intracerebral injection of a lentiviral vector expressing human neprilysin (lenti-nep), tested in transgenic mouse models of amyloidosis, reduced amyloid-beta deposits by half relative to untreated mice, indicating that neprilysin may have a role in alzheimer's disease treatment. that said, a more efficient delivery system is likely required, a property that a neprilysin-expressing stem cell could potentially provide [160].
gene transfer to the central nervous system provides a powerful methodology for the study of gene function and gene-environment interactions in vivo, in addition to a vehicle for the delivery of therapeutic transgenes for gene therapy. research has been significantly aided by successful targeting of specific regions of the brain, which for parkinson's disease means the substantia nigra. the key to success is the ease of pseudotyping lentiviral vectors, which makes it possible to change the patterns of tropism. cannon et al.
used isogenic lentiviral vector particles encoding a gfp reporter and pseudotyped with envelope glycoproteins derived from vesicular stomatitis virus (vsv), mokola virus (mv), lymphocytic choriomeningitis virus (lcmv), or moloney murine leukemia virus (mulv). adult, male lewis rats were injected unilaterally with stereotactic infusions of vector into the substantia nigra. three weeks later, patterns of viral transduction were determined by immunohistological detection of gfp. different pseudotypes gave rise to different sites of transgene expression. vsv and mv pseudotypes transduced midbrain neurons, including a subset of nigral dopaminergic neurons. in contrast, lcmv- and mulv-pseudotyped lentiviral vectors resulted in transgene expression exclusively in astrocytes. the restricted transduction of astroglial cells was not explained by the cellular distribution of receptors previously shown to mediate entry of lcmv or mulv. the availability of neuronal and astrocyte-targeting vectors will allow dissociation of cell autonomous and cell nonautonomous functions of key gene products in vivo. similar tissue- and cell-specific patterns can be achieved in stem cells using cell/tissue-specific promoters and mirna [43, 79, 86, 177, 186, 199, 213, 232, 242, 266, 290].
multipotent mesenchymal stromal cells have raised great interest for brain cell therapy due to their ease of isolation from bone marrow, their immunomodulatory and tissue repair capacities, their ability to differentiate into neuronal-like cells, and their ability to secrete a variety of growth factors and chemokines. a subpopulation of human mscs, the marrow-isolated adult multilineage inducible (miami) cells, when combined with pharmacologically active microcarriers (pams), has shown great promise in a rat model of parkinson's disease.
pams are biodegradable and non-cytotoxic poly(lactic-co-glycolic acid) microspheres, coated by a biomimetic surface and releasing a therapeutic protein, which acts on the cells conveyed on their surface and on their microenvironment. in this study, pams were coated with laminin and designed to release neurotrophin 3, which stimulates the neuronal-like differentiation of miami cells and promotes neuronal survival. after adhesion of dopaminergic-induced (di)-miami cells to pams in vitro, the complexes were grafted into the partially dopaminergic-deafferented striatum of rats, which led to a strong reduction of the amphetamine-induced rotational behavior together with protection/repair of the nigrostriatal pathway. these effects were correlated with the increased survival of di-miami cells that secreted a wide range of growth factors and chemokines. moreover, the observed increased expression of tyrosine hydroxylase by cells transplanted with pams may contribute to this functional recovery [52] and provide an excellent new delivery system for genetically modified/enhanced cells into the substantia nigra.
lewy body disease is a heterogeneous group of neurodegenerative disorders characterized by alpha-synuclein accumulation in neurons; it includes gradually worsening dementia with lewy bodies (dlb) and advanced parkinson's disease (pd). recent evidence suggests that impairment of the lysosomal pathways (i.e., autophagy) involved in alpha-synuclein clearance might play an important role. for this reason, the expression levels of members of the autophagy pathway in brains of patients with dlb and alzheimer's disease and in alpha-synuclein transgenic mice were examined by immunoblot analysis. in dlb cases, the levels of mtor were elevated and those of atg7 were reduced compared to controls and ad. levels of other components of the autophagy pathway, such as atg5, atg10, atg12, and beclin-1, were not different in dlb compared to controls.
in dlb brains, mtor was more abundant in neurons displaying alpha-synuclein accumulation. these neurons also showed abnormal expression of lysosomal markers such as lc3, and ultrastructural analysis revealed the presence of abundant malformed autophagosomes. similar alterations were observed in the brains of alpha-synuclein transgenic mice. intracerebral infusion of rapamycin, an inhibitor of mtor, or injection of a lentiviral vector expressing atg7 resulted in reduced accumulation of alpha-synuclein in transgenic mice and amelioration of associated neurodegenerative alterations, supporting the notion that defects in the autophagy pathway, and more specifically in mtor and atg7, are associated with neurodegeneration. this supports the possibility that modulators of the autophagy pathway, delivered using genetically altered stem cells, might have therapeutic effects [44, 270]. although the advances in parkinson's disease research to date are significant, the lack of clinical use of genetically modified cells is a bit surprising and may indicate an oversight and underuse of the advanced tools provided by the combination of stem cells and lentiviral vectors.
lasting cerebral ischemia is a frequent (~80%) consequence of stroke and, as a result, most stroke research focuses on ameliorating the devastating consequences of ischemic events: endothelial damage, neurodegeneration, and breakdown of the blood-brain barrier (bbb) leading to difficult-to-treat cerebral edema [108]. data indicate that transplantation of human umbilical cord stem cells helps to protect the ischemic brain [149], and the protection is partially attributed to cytokines and protective factors produced by these stem cells [10, 149]. another promising finding is that mesenchymal and neuronal stem cells preserve their ability to differentiate into glial and neuronal cells [51, 119, 149, 231, 249, 275].
various studies on focal cerebral ischemic models have implicated the direct activation and expression of matrix metalloproteinases (mmps), especially mmp-9, as a key orchestrator of bbb disruption. moreover, studies have shown that mmp-9 sirna can protect the bbb from ischemia/reperfusion injury. one study investigated the neuroprotective role of a lentiviral vector-mediated mmp-9 shrna following focal cerebral ischemia [108], indicating that it is possible to deliver mmp-9 inhibitors by genetically enhanced stem cells. this study also showed the ability to deliver the target deeper into the affected area, which is normally not accessible by direct lentiviral vector infusion. the forerunner of such interventions is a study testing the hypothesis that transplantation of human neurotrophin-3 (hnt-3) over-expressing neural stem cells into rat striatum after a severe focal ischemia would promote functional recovery. the rat neural stem cells were transduced with a flag-tagged hnt-3 gene in a lentiviral vector. the stem cells were transplanted into the striatum ipsilateral to the injury of adult rats 7 days after 2 h occlusion of the middle cerebral artery. from 3 days to 2 weeks after transplantation, the modified cells (nscs-hnt3, as defined by flag immunofluorescence staining) that survived the transplantation procedures secreted significantly higher levels of neurotrophin-3 protein in the graft sites than controls (p < 0.001). furthermore, the rats that received nscs-hnt3 exhibited enhanced functional recovery on neurological and behavioral tests, compared with control animals transplanted with saline or untransduced stem cells, indicating that these cells might have value for enhancing functional recovery after stroke [285]. recovery from ischemic events is slow and rather unpredictable. however, there seem to be new therapeutic opportunities that could enhance the process, such as vegf-induction therapy [16].
there is accumulating evidence indicating that vegf has direct neuroprotective effects on various cultured neurons of the central nervous system. interestingly, in vivo vegf controls the correct migration of facial branchiomotor neurons in the developing hindbrain and stimulates the proliferation of neural stem cells in enriched environments and after cerebral ischemia. on the other hand, transgenic mice expressing reduced levels of vegf develop late-onset motor neuron degeneration, reminiscent of amyotrophic lateral sclerosis (als). also, reduced levels of vegf have been implicated in a polyglutamine-induced model of motor neuron degeneration. intracerebroventricular delivery of recombinant vegf protein delays disease onset and prolongs survival of als rats, whereas intramuscular administration of a vegf-expressing lentiviral vector increases the life expectancy of als mice by as much as 30%. deciphering the precise role of vegf at the neurovascular interface promises to uncover new insights into the development and pathology of the nervous system and should be helpful in the design of novel strategies to treat (motor) neurodegenerative disorders [137]. vegf-expressing mscs have also been found beneficial in parkinson's disease [16, 271]. the development of lentiviral particles engineered for macrolide-responsive human vascular endothelial growth factor 121 (vegf121) expression will bring closer the in vivo use of inducible growth factor cell therapies, expressing the factors only in ischemic conditions using a hypoxia-inducible erythropoietin promoter [6]. alternatively, the inducible vegf121 promoter system also compared favorably with isogenic streptogramin- and tetracycline-responsive configurations and showed excellent growth-factor fine-tuning following transduction into a variety of mammalian cell lines and different human primary cells.
chicken embryos transduced for macrolide-controlled vegf121 production can be fine-tuned to prime a dose-dependent neovascularization [168]. expression of survivin (svv) using an sin lentiviral vector carrying vascular endothelial growth factor further improved the expression of vegf and basic fibroblast growth factor in male sprague-dawley rats under hypoxic conditions. the in vivo experiment that produced this observation consisted of three groups of rats, one receiving an intravenous injection of 500 μl of phosphate-buffered saline without cells (control group) and two groups administered the same volume of solution with either three million gfp-mscs (group gfp) or svv/gfp-mscs (group svv). all animals were subjected to 2 h middle cerebral artery occlusion followed by reperfusion. modification with svv further increased secretion of both factors. the survival of the transplanted cells in the svv group was 1.3-fold higher at 4 days and 3.4-fold higher at 14 days after transplantation when compared with group gfp. svv modification also reduced the cerebral infarct volume by 5.2% at 4 days after stroke and improved post-stroke neurological function at 14 days after transplantation. modification with svv could further enhance the therapeutic effects of mscs, possibly through improving msc survival capacity and upregulating the expression of protective cytokines in the ischemic tissue [151].
the identification of the genes differentially regulated by ischemia will lead to an improved understanding of cell death pathways such as those involved in the neuronal loss observed following a stroke. furthermore, the characterization of such pathways could facilitate the identification of novel targets for stroke therapy.
one such novel approach was the amplification of the differential gene expression patterns in a primary neuronal model of stroke, by employing a lentiviral vector system to specifically bias the transcriptional activation of hypoxically regulated genes. over-expression of the hypoxia-induced transcription factor subunits hif-1 alpha and hif-2 alpha elevated hypoxia-mediated transcription of many known hif-regulated genes well above control levels. furthermore, many potentially novel hif-regulated genes were discovered that had not previously been identified as hypoxically regulated. most of the identified novel genes were activated by a combination of hif-2 alpha over-expression and hypoxic insult. these included several genes with particular importance in cell survival pathways and of potential therapeutic value. hypoxic induction of hif-2 alpha may therefore be a critical factor in mediating protective responses against ischemic injury. further investigation of the genes identified in this study may provide increased understanding of the neuronal response to hypoxia and may uncover novel therapeutic targets for the treatment of cerebral ischemia [202], and the genes need to be considered as useful targets in future mesenchymal stem cell therapies. however, the use of hypoxia-induced gene therapy has to be evaluated carefully in the light of recent provocative observations indicating that the hypoxic phenotype contributes to the appearance of highly malignant cancer forms, and that every step from the initial epithelial-mesenchymal transition to the ultimate organotropic colonization can potentially be regulated by hypoxia, suggesting a master regulator role of hypoxia and hifs in metastasis [6, 155]. furthermore, modulation of cancer stem cell self-renewal by hifs may also contribute to the hypoxia-regulated metastasis program.
the hypoxia-induced metastatic phenotype may be one of the reasons for the modest efficacy of anti-angiogenic therapies and may well explain the provocative findings that anti-angiogenic therapy increased metastasis in preclinical models [155].
the image of a wheelchair-bound superman exemplified for all of us the tragedy that affects many of the victims of traumatic spinal cord injury and motivated research into protecting and restoring spinal cord functionality above and beyond the usual efforts. the results are promising on many fronts [236]. on one hand, the intervertebral disk, cartilage, and bone injuries that threaten the integrity of the spinal cord can be almost completely healed, and the healing can be facilitated and enhanced by stem cell therapies in most of the experimental models. the treatment often includes stem cells engineered with lentiviral gene transfer for enhancing and promoting wound healing and tissue restoration [13, 87, 89, 109, 247]. significant success has been achieved by expressing bone morphogenic proteins in the injured tissue [80], and observations that mechanical stimulation has a multiplying effect in bone regeneration will hopefully carry the research into clinical trials [140] sooner rather than later. probably, the first trials will be done in well-designed spinal surgery, allowing even risky interventions that are currently not practiced [13, 88, 163]. progress is significantly slower when it comes to restoring the functionality of a severed spinal cord, but the successful demonstration that mscs migrate into the site of injury and differentiate into the cell types needed for healing [224] predicts potential breakthroughs. in this set of experiments, mesenchymal cells were labeled with green fluorescent protein using a lentiviral vector and injected into the subarachnoid space, and their migration and differentiation were observed.
cells were found on the surface of the injured spinal cord parenchyma and in deeper areas of the perivascular spaces, and some of them were found deeply integrated into the parenchyma. immunostaining for nestin demonstrated that some gfp-positive cells differentiated into neural stem cells and mature neurons or glial cells in situ. lentiviral vectors pseudotyped with rabies env were successfully used to deliver genes into the spinal cord and the site of injury and showed successful retrograde transfer into deeper areas, indicating that gene therapy is possible and that factors necessary for further differentiation of stem cells can be delivered [224, 241]. further advances in pseudotyping with rabies virus glycoprotein hold promise for more efficient motor neuron-specific delivery of transgenes and restoration of neuronal functions.
as the examples above indicate, mscs have been recognized as promising delivery vehicles for gene therapy in the cns. a particularly unmet need is the delivery of compounds that could help patients suffering from gliomas, a particularly aggressive form of cancer. a glimpse into a possible future can be gained from experiments in which stem cells were used to evaluate the antitumor effect of cytosine deaminase (cd) in a rat c6 glioma model. lentiviral vectors expressing cd and enhanced green fluorescent protein (egfp) were constructed and transduced into rat mscs, which were intracranially injected alone or in combination with c6 glioma cells supported by unlabeled parental mscs. the presence of the engineered stem cells was then correlated with possible effects on tumor growth, tumor cell apoptosis, tumor size, and rat survival in the presence of 5-fluorocytosine (5-fc). fei et al. found that the cd/egfp cells were largely localized at the junction of the tumor with normal tissue.
the mean survival time of rats co-injected with c6 glioma cells and mscs-cd/egfp cells was significantly extended to 45.9 days, with tumor size reduction, when compared with rats injected with c6 glioma cells alone, which survived an average of 15.3 days, or those co-injected with c6 glioma cells and parental cells, which survived only 16.0 days. in addition, data suggest that msc-cd/egfp-mediated gene therapy promoted tumor cell apoptosis in rat c6 gliomas [72]. without going into detail, hypoxia-induced genes seem to play an important role in the fate of mscs and require further study, as modifying and preconditioning these cells, as well as changing their effects temporarily by gene therapy, has yielded a plethora of important insights into the potential use of this complicated class of stem cells in tumor therapy [142, 147, 150, 155, 267, 272], and we expect rapid progress in this area in the near future. the rationale is that tumor cells have significantly altered metabolism, with a shift toward the anaerobic pathway and changes in the respective gene expression patterns, providing novel targets and delivery methods for cancer therapy.
transplantation of hscs to correct a series of lysosomal storage diseases and peroxisomal disorders has almost 25 years of history and involves over 20 diseases [23]. however, success has been limited to only a small subclass of diseases such as hurler syndrome, x-ald, and infantile krabbe disease. detailed studies are now available suggesting that hematopoietic stem cells are suitable only for carefully selected cases, leaving the field open for a more versatile mesenchymal stem cell therapy, especially in those instances having neurological symptoms [69]. bone marrow-derived mscs are another promising platform for cell- and gene-based treatment of inherited and acquired disorders, including a whole range of lysosomal storage diseases. several animal models exist in which to run preclinical studies [164].
human mscs distribute widely in a murine xenotransplantation model, and the human stem cells are amenable to lentiviral vector-mediated transduction to obtain expression of therapeutic levels of enzyme in xenotransplantation models of human disease (non-obese diabetic severe combined immunodeficient mucopolysaccharidosis type vii [nod-scid mpsvii]) [164]. transduced mscs persisted in the animals that underwent transplantation, and comparable numbers of donor mscs were detected at 2 and 4 months after transplantation. the levels of circulating enzyme were sufficient to normalize the secondary elevation of other lysosomal enzymes and reduce lysosomal distention in several tissues, providing additional evidence that transduced human mscs retain their normal trafficking ability in vivo and persist for at least 4 months, while being able to deliver therapeutic levels of proteins in an authentic xenotransplantation model of human disease. similar results have been reported by muller and colleagues, who were able to restore aryl sulfatase (asa) and beta galactosidase levels in genetically deficient bone marrow mscs, and showed that untransduced cells from patients with metachromatic leukodystrophy, who are asa deficient, took up a substantial amount of asa that was released into the media from mscs [173], an important milestone for future attempts at stem cell therapy of metachromatic leukodystrophy. gm1 gangliosidosis was successfully treated with mscs in a mouse beta-galactosidase knockout model, indicating that autologous transplantation may be feasible using lentiviral-transduced mscs [228].
fabry disease affects an estimated 1 in 40,000-60,000 males, and far less frequently females. it is an inherited lysosomal disorder caused by a deficiency of alpha-galactosidase a (alpha-gal a). the systemic accumulation of globotriaosylceramide (gb3) results in gradual tissue deterioration leading to organ failure.
a limited mouse model of the disease exists, in which alpha-gal a-deficient mice show gb3 accumulation. however, most of the important clinical manifestations are absent, and the lack of a relevant large animal model hinders the development of proper cell therapy. when compared to human alpha-gal a, porcine alpha-gal a showed a high level of homology in the coding regions. cell lysate and supernatants from fabry patient-derived fibroblasts transduced with a lentiviral vector carrying the porcine alpha-gal a cdna (lv/porcine alpha-gal a) showed high levels of alpha-gal a activity, and its enzymological stability was similar to that of human alpha-gal a. even more importantly, uptake of secreted porcine alpha-gal a by non-transduced cells was observed. furthermore, gb3 accumulation was reduced in fabry patient-derived fibroblasts transduced with the lv/porcine alpha-gal a. the finding that the porcine version of the gene is also x-linked (xq22) provides hope that a large animal (porcine) model of fabry disease can be constructed in the near future for use in testing a novel application of cell therapy using mscs [278]. the success of such a model, and eventually the feasibility of the treatment, depends on the "bystander phenomenon," i.e., the transduced mesenchymal cells intended for delivering the enzyme must secrete the enzyme in abundance, and the defective cells in their microenvironment must also be able to take up the enzyme and utilize it. to facilitate the uptake, a fusion protein between alpha-gal a and hiv tat protein has been made [104]. if successful, the range of the enzyme replacement therapy approach could widen significantly. the data published by higuchi et al. indicate that tat's ability to penetrate the cell membrane was indeed maintained in the recombinant fusion protein and that it enhanced the enzyme uptake, as expected.
since the different manifestations of the disease affect different organs (brain, kidney, and heart), mscs appear to be the best candidates for this enzyme replacement therapy, particularly as earlier attempts at enzyme replacement therapy in the mouse model showed insufficient efficiency [277].

the enormity of the problems posed by diabetes is reflected by the statistics published on the nih website (http://diabetes.niddk.nih.gov/dm/pubs/statistics/#dd). by the age of 65, almost one in four americans suffers from diabetes. the at-risk population of prediabetics is 37% of the population older than 20 years. the sheer number of patients indicates that restoring glucose metabolism by pancreas or pancreatic islet transplantation, even in the most severe cases, is impractical if not impossible. the low engraftment rate makes the prospects of such treatment even worse, especially as there will never be a sufficient number of donors. that leaves stem cell technology as the major source of hope for recovering regulated insulin production and glucose regulation in diabetes. a large number of clinical trials using mscs are under way [75, 106, 120], and a rather confusing set of stem cell markers is listed in these studies, indicating that a plurality of stem cells reside in different tissues, all of which have the potential to help pancreatic tissue regeneration. not surprisingly, the most obvious source of these stem cells could be the pancreas itself, from which resting stem cells can be isolated, reactivated, and expanded by a variety of stimulants. the data are still being evaluated and need further confirmation, reproduction, and lineage tracing. the currently available datasets could not firmly substantiate the claims when different markers were used (carbonic anhydrase ii vs. hepatocyte nuclear factor 1 beta) [61, 113, 237, 243].
since then, neurogenin 3 has also been considered as a marker for endocrine-type differentiation of proto-beta cells [273], leaving rather murky the question of whether well-defined adult beta islet cell progenitors truly exist in significant numbers. the phenomenon of in vitro trans-differentiation of acinar cells into beta cells upon exposure to egf, lif, and notch1 inhibitors [15] looks promising, and zhou and colleagues recently added a more extensive study on in vivo reprogramming of adult pancreatic exocrine cells into beta cells [289]. however, the reported efficiency was low and the progenitor cells remained elusive. this left the field searching for other sources, including mscs from bone marrow, liver, intestine, and neural tissue (reviewed by efrat [64-66] and jones [118]), that are capable of trans-differentiating into insulin-producing beta cells. the ultimate hope was that these cells could be used to seed the pancreas with new sets of insulin-producing islets. since lineage tracing was often omitted and the reproducibility of the results remained unsettled, the field, despite its high importance, seems to be somewhat in shambles [106, 107] and ready for deployment of novel, lentiviral vector-supported techniques. szabat et al. report a significant set of results on beta-cell maturation from a lentiviral vector-based lineage study using a novel pdx1/ins1 dual fluorescent reporter vector. they confirmed that individual adult human and mouse beta-cells exist in at least two differentiation states, distinguishable by the activation of the ins1 promoter. they performed real-time imaging of the maturation of individual cultured beta-cells, followed the kinetics of the maturation process in primary human and mouse beta-cells, and collected gene expression profiling data as well.
the gene expression profiling of facs-purified immature pdx1+/ins1(low) cells and mature pdx1(high)/ins1(high) cells from cultures of human islets, mouse islets, and min6 cells revealed that pdx1+/ins1(low) cells are enriched for expression of multiple genes associated with beta-cell development/progenitor cells, proliferation, and apoptosis, as well as genes coding for other islet cell hormones such as glucagon [240]. trans-differentiation can also be performed successfully using mafa, a leucine zipper transcription factor from the maf family that can be activated by p38 map kinase. this protein is a known pancreatic transcription factor controlling the beta-cell-specific transcription of the insulin gene [40]. expressing it from lentiviral vectors in placenta-derived multipotent stem cells (pdmscs) that constitutively expressed oct-4 and nanog resulted in significantly upregulated expression of a series of pancreatic development-related genes (sox17, foxa2, pdx1, and ngn3), similar to that of native pancreas and islet tissues. mafa increased the mrna expression levels of nkx2.2, glut2, insulin, glucagon, and somatostatin, and further facilitated the differentiation of pdmscs into insulin+ cells. importantly, the expression of mafa in pdmscs xenotransplanted into immunocompromised mice with stz-induced diabetes improved the restoration of blood insulin levels to control values and greatly prolonged the survival of the graft cells [40]. another successful lineage analysis, with monitoring of the induced trans-differentiation, was reported by cheng et al., in which a relatively abundant epithelial cell source, fetal human pancreas, was used to assess the proliferation potential, changes in lineage markers during culture, and capacity for generating insulin-expressing beta cells from fetal epithelial cells.
the fetal epithelial cells readily formed primary pancreatic progenitor cultures, although their replication capacity was rather limited. this was overcome by the introduction and expression of htert (human telomerase reverse transcriptase), which greatly enhanced cellular replication in vitro. however, during culture the htert-modified pancreatic progenitor cells switched their phenotype, gaining additional mesodermal properties. this phenotypic switching was inhibited when a pancreas-duodenal homeobox (pdx)-1 transgene was expressed with a lentiviral vector, along with inductive signaling through activin a and serum deprivation. this restored the endocrine properties of the htert-modified cells in vitro, and the cells were able to express insulin in vivo in an immunodeficient mouse model [39]. the complexity of these results indicates that sophisticated multi-gene cell therapies may be needed to properly modulate trans-differentiation pathways. other strategies using a lentiviral vector-based approach to achieve beta-cell proliferation through beta-cell-specific activation of the hepatocyte growth factor (hgf)/cmet signaling pathway are also being explored. one of these methodologies is based on the beta-cell-specific expression of a ligand-inducible chimeric receptor (f36vcmet) under transcriptional control of the promoter from the human insulin gene; in the presence of a synthetic ligand (ap20187), the receptor induces hgf/cmet signaling and results in specific proliferation of human pancreatic beta-cells [182]. the selective, regulated beta-cell expansion may help to increase the availability of cells for transplantation in patients with advanced diabetes. these recent studies show that rapid progress may be achieved in this field and that lentiviral vectors may provide the necessary tools to analyze the issues. however, some notable efforts aim to avoid stem cell therapy altogether in certain types of diabetes.
instead, they take a more direct route, applying in vivo gene therapy to express the insulin gene in cell types other than beta cells. ren et al. successfully restored near-normal insulin levels for 500 days by expressing insulin in resting liver cells transduced with a lentiviral vector in a rat diabetes model [204].

although monogenic diseases appear to be the most obvious human diseases to treat with gene therapy, since they are caused by a single-gene defect, the progress in clinical studies has thus far been rather limited. explanations for the lack of success include inefficiency of transduction in vivo, dangers posed by vectors, the failure to permanently correct the gene defect in a sufficient number of cells, and the rapid turnover of cells. alternative approaches therefore involve the search for and use of stem cell populations, together with depletion of the active stem cell compartments by ablation with cytostatic drugs to give transplanted stem cells a chance of increased engraftment. combining the versatility and availability of mscs, their ability to engraft, the use of autologous instead of allogeneic sources for safe transplantation, and the fact that the stem cell population can be expanded in vitro allows highly efficient ex vivo gene therapy relying on the latest generation of lentiviral vectors.

cystic fibrosis (cf) is caused by a mutation in the gene for the cystic fibrosis transmembrane conductance regulator protein (cftr). the mutant form of the protein causes a severe defect in mucus metabolism in the lungs and intestinal tract that deteriorates into a lifelong, deadly disease. cf is theoretically amenable to gene therapy. in spite of intensive research and a large number of clinical trials in the last 18 years, little practical success can be shown in treating cystic fibrosis [191]. the explanations include the fact that the deeper regions of the inner surface of the lung are not accessible to direct inhalation and direct treatment [48, 263].
in light of this finding, stem cells remain the most promising delivery vehicles. castellani et al. reviewed the recent attempts to identify lung- or bone marrow-derived populations of stem cells or progenitor cells and to apply such cells, either allogeneic or gene-corrected autologous cells, to colonize the airways while differentiating into functional respiratory columnar epithelial cells [33]. when reporter gene expression was analyzed in trachea-lungs and bronchoalveolar lavage, 0.4-5.5% of stem cells survived in injured airways, but no stem cells survived in control, healthy airways or in the epithelial lining fluid [138]. the most successful results thus far appear to have been obtained with bone marrow-derived mscs, although the trans-differentiation rate has been limited to below 10-14% [26]. as an alternative, the proven multipotent nature of bronchoalveolar stem cells isolated from lung tissue may provide another promising approach for stem cell therapy. some additional improvement is expected from more efficient targeting of lentiviral vectors. mitomo and colleagues built a sendai virus env-pseudotyped siv lentiviral vector that can be manufactured at sufficiently high titer and is capable of transducing the respiratory epithelium of the murine nose in vivo at levels that may be relevant for achieving clinical benefit in cystic fibrosis patients [167]. the availability of novel cystic fibrosis gene-carrying stem cell lines derived from placental mesenchymal cells will certainly help to speed up the research [53]. however, much more needs to be known about the normal differentiation and functioning of the airway's basal cells and about the differentiation and lineages of stem cells before more efficient treatment options become available for both gene therapy and stem cell therapy [207]. we expect that the intensity of research and the push for clinical trials will remain high as the outline of directions becomes clearer.
methods to derive respiratory cell types from stem cells will also remain a critical piece [181].

duchenne muscular dystrophy is an x-linked recessive disorder caused by a mutation in the dystrophin gene that destabilizes muscle cell membranes and causes muscular dystrophy in approximately 1 in 3,600 boys. the musculoskeletal abnormalities deteriorate to a fatal level, and the average life expectancy is no more than 25 years, even with high-quality care. research is facilitated by the availability of dystrophin-deficient transgenic mice (mdx mice) and double-knockout (utrophin/dystrophin-deficient) mice that can be used as experimental disease models [144, 269]. human immortalized pluripotent cell lines expressing the mutant dystrophin gene are also available [187]. the ability of mscs to differentiate into muscle cells places them at the top of the list of candidates that could be used to treat duchenne muscular dystrophy [157]. lentiviral vectors have been used in this field for conditional immortalization of human cells for basic biologic studies. cudre-maroux et al. demonstrated that lentiviral vector-mediated transduction of immortalizing genes into human primary cells is an efficient method for obtaining such cell lines. for duchenne muscular dystrophy, the muscle satellite cell model was used to examine the impact of the transduced genes on the genotypic and phenotypic characteristics of the immortalized cells. the most commonly used immortalizing gene, the sv40 large t antigen (t-ag), was extremely efficient at inducing the continuous growth of primary myoblasts, but the resulting cells rapidly accumulated major chromosomal aberrations and exhibited profound phenotypic changes.
in contrast, the constitutive expression of telomerase and bmi-1 in satellite cells from a control individual and from a patient suffering from duchenne muscular dystrophy yielded cell lines that remained diploid and conserved their growth factor dependence for proliferation. however, despite the absence of detectable cytogenetic abnormalities, clones derived from satellite cells of the control individual exhibited a differentiation block in vitro. in contrast, a duchenne-derived cell line exhibited all the phenotypic characteristics of its primary parent, including the ability to differentiate fully into myotubes when placed in proper culture conditions. this cell line should constitute a useful reagent for a wide range of studies aimed at this disease [46]. a realistic source of stem cells would be adipose tissue-derived stem cells, which can be enhanced for muscle repair. forced expression of myod from a lentiviral vector in vitro strongly induced myogenic differentiation, while adipogenic differentiation was inhibited. moreover, myod-expressing human multipotent adipose-derived stem cells had the capacity to fuse with dmd myoblasts and to restore dystrophin expression. importantly, transplantation of these modified human multipotent adipose-derived stem cells into injured muscles of immunodepressed rag2(−/−)gammac(−/−) mice resulted in a substantial increase in the number of human multipotent adipose-derived stem cell-derived fibers [92]. goncalves and colleagues went a step further and devised a technique to monitor the cell fusion events of myogenesis using an elaborate bipartite, recombinase-inducible genetic switch: one cell type expresses cre and the other carries the remaining elements with loxp sites, so that the switch turns on only upon fusion of the two cell types. this provides a sensitive tool to study the lineages and process of myocyte fusion in a transgenic system [90]. ikemoto et al.
used high-efficiency lentiviral vector-mediated gene transfer into freshly isolated autologous satellite cells. freshly isolated cells have better myogenic capability than satellite cell-derived myoblasts, and expansion of the satellite cells does not affect their regenerative potential. the transduced cells successfully regenerated the targeted muscle groups in mdx mice [112]. however, vsv-g-pseudotyped lentiviral vectors are inferior in transducing nondividing murine cells, and shunchang et al. demonstrated that better transduction rates can be achieved by pseudotyping with the feline immunodeficiency virus env [234].

we include chronic granulomatous disease because of the challenges researchers faced in attempts to use cell-based therapies. it is a rather rare x-linked immunodeficiency disorder caused by mutations in the cybb gene encoding the phagocyte nicotinamide adenine dinucleotide phosphate (nadph)-oxidase catalytic subunit gp91(phox). earlier attempts to restore the gene function with oncoretroviral vectors failed due to (a) gene silencing, which is common for retroviral inserts, (b) the high risk of genotoxicity that these oncoretroviral vectors pose [165], and (c) low transduction efficiency and the inability to target the appropriate cell lineage and achieve differentiation-restricted gene expression [17]. the solution to this compound problem seems to lie in using a safer and more efficiently targeting lentiviral vector system [17, 223]. it has been demonstrated and repeatedly confirmed that a lentiviral vector can transduce hscs as well as differentiated neutrophils from patients with x-linked chronic granulomatous disease (x-cgd) and correct the x-cgd phenotype in the nod/scid model. the lentivector was a vsv-g-pseudotyped, third-generation, self-inactivating (sin) lentivector encoding gp91(phox).
the lentiviral vector efficiently transduced cd34+ peripheral blood stem cells under ex vivo conditions nonpermissive for cell division, resulting in 54% of the cells expressing gp91(phox). the lentivector also achieved significant correction of differentiated human x-cgd neutrophils arising in vivo in nod/scid mice that underwent transplantation (20% and 2.4%, respectively). thus, the third-generation sin lentivector-gp91(phox) performs well as assessed in human x-cgd neutrophils differentiating in vivo, and the studies suggest that the nod/scid model is generally applicable for in vivo study of therapies evaluated in human blood cells expressing a specific disease phenotype [165, 209, 226]. however, a long-term solution can be expected only from transducing hscs rather than fully differentiated neutrophils, which have a limited lifespan, and the low genotoxicity of the third-generation sin lentiviral vector seems to address these requirements well.

wilson's disease is a genetic disease caused by a spectrum of mutations in the atp7b gene, whose product is a liver transporter protein responsible for coordinated copper export into bile and blood. zhang and colleagues reported in 2011 an attempt to restore the normal phenotype in hepatocytes derived by directed differentiation from human induced pluripotent stem cells. the phenotype correction was achieved with the chaperone drug curcumin, which can reverse the functional defect in vitro in the case of the r778l chinese hotspot mutation in the atp7b gene. they propose this model system for correcting the gene using lentiviral technology [284]. the atp7b gene seems to be relevant for the wider mesenchymal stem cell field, as over-expressing it protects mscs from copper toxicity. this in turn could be used as a selection advantage of transduced over non-transduced mscs in a copper-rich environment, enriching the transduced mesenchymal cell compartment before transplantation [225].
the image of the "fountain of youth" represents a mirage deeply engraved in the human psyche and expresses our fear and resentment of one of the inevitabilities of life: if we are lucky, we will get old and decrepit, suffer from a series of painful, chronic diseases, and finally succumb. the irony is that those whom we consider unlucky die young but are spared the long-lasting predicaments of aging. certainly, the intricacies of the factors leading to longevity, or the lack thereof, keep generations of stem cell researchers awake and busy, and for good reasons. a model for aging has been found in the condition known as progeria, or more precisely the hutchinson-gilford progeria syndrome, a rare disease affecting children of both sexes that is caused by a mutation in the lamin a gene that disrupts the normal processing of prelamin a. the mutant prelamin a retains a farnesyl group and is expressed in an abnormal form, progerin. progerin in turn is anchored to the nuclear membrane and destabilizes the nucleus, limiting the ability of cells to divide and leading to premature cell death. unlike the defects underlying other accelerated aging diseases, which affect dna repair (werner's and cockayne's syndromes), progerin may play a role in the normal aging process [211]: its production is slowly turned on in cells that have uncapped chromosomes, i.e., truncated telomeres, resulting in premature depletion of stem cell compartments. for details, see the popular ucsf website (http://www.ucsf.edu/news/2011/10/10766/aging-disease-children-sheds-lightnormal-aging). additional upregulation of multiple genes in major inflammatory pathways indicated an activated inflammatory response in progeria patients. this response has also been associated with normal aging, emphasizing the importance of studying progeria to increase the understanding of the normal aging process [178].
the progressing disease shows a pattern of tissue and organ degeneration that correlates with the depletion of a variety of stem cell compartments, a correlation first pointed out by favreau [70]. insight into the role of stem cell depletion in progeria has accumulated rapidly in the last couple of years [170, 178, 179, 211]. this led to the establishment in 2008 of an animal model, zmpste24 knockout mice [67], which showed premature senescence and progeroid symptoms. with the role of stem cells in aging and in progeria established, the doors opened for studying stem cell renewal via dedifferentiation. autologous or heterologous transfer of native or lentivector-enhanced cells [129, 184, 212, 265] is being actively considered as a possible intervention to slow down progeria as well as natural aging [28]. however, in both cases the changes are systemic, and murine gene therapy data from lysosomal storage disease models, which affect large segments of the body, indicate that therapy is more efficient if done at the earliest possible age [28, 129]. this may have something to do with the limited availability of the microenvironment for the modified or transplanted stem cells: the preexisting stem cells, even when "old" and malfunctioning, already occupy the microenvironment appropriate for stem cells. we already know that wound sites create new sites and attract mscs [5]. it is also possible that cancer growth is able to generate and maintain an appropriate microenvironment for cancer stem cells [24] as well as mscs [42] (potentially for use as anticancer agents), but normal tissue, even in aging, seems to resist accepting externally provided stem cells.
experiments are under way to create artificial microenvironments using nanotechnology to deliver stem cells that produce therapeutic factors [52], 3d scaffolds mimicking bone marrow niches are being designed for a similar purpose [55], and lentiviral vectors are often used to deliver the genes of interest [60, 141, 166, 248].

virxsys pioneered the use of lentiviral vectors in phase i clinical trials to deliver antisense hiv genes as an antisense rna therapy for aids [110, 156]. this established an initial safety profile for the ex vivo use of lentiviral vectors (see http://clinicaltrials.gov, identifier vrx496-usa-05-002). this phase i trial demonstrated the safety and tolerability of a single dose of approximately ten billion autologous hiv-infected cd4+ t cells transduced with the lentivector vrx496, which carries a 937-base antisense sequence targeting the hiv envelope. these encouraging results led to the design of a phase ii clinical trial to evaluate the safety, tolerability, and biological activity of four or eight repeated infusions of five to ten billion autologous vrx496-modified hiv+ cd4+ t cells. a major obstacle to completing this phase ii trial was manufacturing enough cells to administer multiple infusions to patients. for this study the safety issues were cleared successfully, opening the way for more extensive use of lentiviral vectors in clinical trials. currently, 16 lentiviral clinical trials are listed at the clinicaltrials.gov website. all of these trials are at an early stage, phase i or ii. three of these trials have not yet started and 11 trials are still recruiting patients. most of the clinical trials are focused on hematopoietic stem cells, which are outside the scope of this work.
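the cell-manufacturing obstacle noted for the vrx496 phase ii trial can be made concrete with simple arithmetic on the dose figures quoted in the text. the sketch below is illustrative only: the variable and function names are hypothetical, and it assumes that every infusion in the phase ii design falls within the stated five-to-ten-billion range.

```python
# Illustrative arithmetic only; dose figures are those quoted in the text,
# and all names are hypothetical (not part of any trial protocol).
PHASE_I_DOSE = 10_000_000_000             # ~ten billion cells, single infusion
PHASE_II_INFUSIONS = (4, 8)               # four or eight repeated infusions
PHASE_II_DOSE_RANGE = (5_000_000_000, 10_000_000_000)  # cells per infusion


def total_cells_per_patient(infusions: int, cells_per_infusion: int) -> int:
    """Total number of transduced cells one patient would receive."""
    return infusions * cells_per_infusion


# Smallest and largest totals implied by the phase II design.
phase_ii_min = total_cells_per_patient(min(PHASE_II_INFUSIONS), min(PHASE_II_DOSE_RANGE))
phase_ii_max = total_cells_per_patient(max(PHASE_II_INFUSIONS), max(PHASE_II_DOSE_RANGE))

if __name__ == "__main__":
    print(f"phase II total per patient: {phase_ii_min:.1e} to {phase_ii_max:.1e} cells")
    print(f"relative to phase I dose: {phase_ii_min // PHASE_I_DOSE}x to {phase_ii_max // PHASE_I_DOSE}x")
```

under these assumptions a single phase ii patient would need between 20 and 80 billion transduced cells, i.e., two to eight times the single phase i dose.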
one trial targets netherton syndrome (clinicaltrials.gov identifier: nct01545323) and attempts to restore lekti serine protease inhibitor levels in an affected 5 cm² skin area, a proof-of-principle study that has the potential to utilize mesenchymal stem cells in the future.

we have witnessed tremendous progress in characterizing and understanding stem cells and the factors needed for maintaining the stem cell phenotype, as well as for changing it in a predictable mode, forcing mesenchymal stem cells into various differentiation pathways. this progress provides a test bed for a higher level of bioengineering, in which the genetic makeup of the stem cells is changed to achieve well-defined therapeutic goals. the overview of the recent literature presents a long list of "proof of concept experiments" in which tantalizing possibilities are validated as achievable in a wide range of fields representing different pathologies: from the debilitating alzheimer's and parkinson's diseases, various traumatic neuronal injuries, diabetes, and neuronal and cardiac ischemia to age-related tissue degeneration, tissue engineering, and delivery of biologics for therapeutic purposes. in parallel, lentiviral vectors are becoming highly valued tools in this tedious work, as they are highly efficient vehicles for gene delivery to mark cells and to express genes of interest, proteins, and various inhibitory rna species in stem cells. consequently, these stem cells, especially the various forms of mscs, have been shown to be highly effective in delivering the targeted genes to difficult-to-reach tissues, including the cns. gene transfer for manipulating genes and gene expression has been made safe and efficient by the recent progress in lentivector technology and has merged successfully with stem cell technology. this field has reached an advanced stage, at which it has become feasible to use these technologies safely in a clinical environment.
more and more researchers as well as clinicians are becoming familiar with the power of these technologies for both ex vivo and in vivo cell therapy. it does not take a prophet to predict that advanced stem cell therapy is gaining a strong foothold, and even though a tremendous amount of work remains to be done for it to become an everyday intervention, it is here to stay and will become a routine treatment for the next generation of patients.

biological characteristics of stem cells from foetal, cord blood and extraembryonic tissues mechanisms of cellular therapy in respiratory diseases artificial dna cutters for dna manipulation and genome engineering tissue engineering technologies: just a quick note about transplantation of bioengineered donor trachea and augmentation cystoplasty by de novo engineered bladder tissue wound microenvironment sequesters adipose-derived stem cells in a murine model of reconstructive surgery in the setting of concurrent distant malignancy neuroprotective effect of combined hypoxia-induced vegf and bone marrow-derived mesenchymal stem cell treatment complete knockdown of ccr5 by lentiviral vector-expressed sirnas and protection of transgenic macrophages against hiv-1 infection specific transduction of hiv-susceptible cells for ccr5 knockdown and resistance to hiv infection: a novel method for targeted gene therapy and intracellular immunization stem cells and reprogramming: breaking the epigenetic barrier?
mesenchymal stem cell conditioned media attenuates in vitro and ex vivo myocardial reperfusion injury non-skin mesenchymal cell types support epidermal regeneration in a mesenchymal stem cell or myofibroblast phenotype-independent manner tgf-beta/tnf-alpha-mediated epithelial-mesenchymal transition generates breast cancer stem cells with a claudin-low phenotype genetically engineered mesenchymal stem cells: applications in spine therapy hyaluronan and mesenchymal stem cells: from germ layer to cartilage and bone notch signaling as gatekeeper of rat acinar-to-beta-cell conversion in vitro transplantation of human bone marrow-derived mesenchymal stem cells promotes will there be a live-attenuated hiv vaccine available for human safety trials by the year 2000? interview by gordon nary toward modeling the bone marrow niche using scaffold-based 3d culture systems transduction of human hematopoietic stem cells by lentiviral vectors pseudotyped with the rd114-tr chimeric envelope glycoprotein rna-based gene therapy for hiv with lentiviral vector-modified cd34(+) cells in patients undergoing transplantation for aids-related lymphoma an anti-major histocompatibility complex class i intrabody protects endothelial cells from an attack by immune mediators minimal criteria for defining multipotent mesenchymal stromal cells.
the international society for cellular therapy position statement vascular targeting and antiangiogenesis agents induce drug resistance effector grp78 within the tumor microenvironment adult pancreatic beta-cells are formed by self-duplication rather than stem-cell differentiation lentiviral vectors: their molecular design, safety, and use in laboratory and preclinical research a third-generation lentivirus vector with a conditional packaging system cell-based therapy for insulin-dependent diabetes mellitus cell therapy approaches for the treatment of diabetes beta-cell replacement for insulin-dependent diabetes mellitus nuclear envelope defects cause stem cell dysfunction in premature-aging mice mesenchymal stem cells in the dental tissues: perspectives for tissue regeneration novel treatment for neuronopathic lysosomal storage diseases-cell therapy/gene therapy expression of a mutant lamin a that causes emery-dreifuss muscular dystrophy inhibits in vitro differentiation of c2c12 myoblasts nanotherapeutics for alzheimer's disease (ad): past, present and future the antitumor effect of mesenchymal stem cells transduced with a lentiviral vector expressing cytosine deaminase in a rat glioma model dual origin of mesenchymal stem cells contributing to organ growth and repair hematopoietic stromal cells and megakaryocyte development bone marrow-derived stem cell transplantation for the treatment of insulin-dependent diabetes a novel lentivector targets gene transfer into hhsc in marrow from patients with bm-failure-syndrome and in vivo in humanized mice advances in the field of lentivector-based transduction of t and b lymphocytes for gene therapy measles virus glycoprotein-pseudotyped lentiviral vector-mediated gene transfer into quiescent lymphocytes requires binding to both slam and cd46 entry receptors strategies for targeting lentiviral vectors enhancement of posterolateral lumbar spine fusion using low-dose rhbmp-2 and cultured marrow stromal cells peptides for
therapy and diagnosis of alzheimer's disease mesenchymal stem cells in cartilage repair: state of the art and methods to monitor cell growth, differentiation and cartilage regeneration development of lentiviral gene therapy for wiskott aldrich syndrome limitation of in vivo models investigating angiogenesis in breast cancer rd114-pseudotyped retroviral vectors kill cancer cells by syncytium formation and enhance the cytotoxic effect of the tk/gcv gene therapy strategy tissue-specific shrna delivery: a novel approach for gene therapy in cancer cervical motion preservation using mesenchymal progenitor cells and pentosan polysulfate, a novel chondrogenic agent: preliminary study in an ovine model potential applications for using stem cells in spine surgery cervical interbody fusion is enhanced by allogeneic mesenchymal precursor cells in an ovine model rapid and sensitive lentivirus vector-based conditional gene expression assay to monitor and quantify cell fusion activity methods for making induced pluripotent stem cells: reprogramming a la carte enhancement of myogenic and muscle repair capacities of human adipose-derived stem cells with forced expression of myod the role of micrornas in self-renewal and differentiation of mesenchymal stem cells strategies, development, and pitfalls of therapeutic options for alzheimer's disease efficient processing of alzheimer's disease amyloid-beta peptides by neuroectodermally converted mesenchymal stem cells gene therapy of x-linked severe combined immunodeficiency insertional oncogenesis in 4 patients after retrovirus-mediated gene therapy of scid-x1 a serious adverse event after successful gene therapy for x-linked severe combined immunodeficiency optimized lentiviral vector design improves titer and transgene expression of vectors containing the chicken beta-globin locus hs4 insulator element zinc-finger nuclease based genome surgery: it's all about specificity role of mesenchymal stromal cells in solid organ
transplantation immunomodulatory effect of mesenchymal stem cells alpha-galactosidase a-tat fusion enhances storage reduction in hearts and kidneys of fabry mice transgenic rabbit production with simian immunode fi ciency virus-derived lentiviral vector the quest for tissue stem cells in the pancreas and other organs, and their application in beta-cell replacement lineage tracing evidence for transdifferentiation of acinar to duct cells and plasticity of human pancreas lentivirus-mediated transfer of mmp-9 shrna provides neuroprotection following focal ischemic brain injury in rats stem cellbased approaches for intervertebral disc regeneration ef fi cient lentiviral vector-mediated control of hiv-1 replication in cd4 lymphocytes from diverse hiv + infected patients grouped according to cd4 count and viral load oxidative stress-inducible lentiviral vectors for gene therapy autologous transplantation of sm/c-2.6(+) satellite cells transduced with micro-dystrophin cs1 cdna by lentiviral vector into mdx mice carbonic anhydrase ii-positive pancreatic cells are progenitors for both endocrine and exocrine pancreas after birth self-inactivating lentiviral vectors with u3 and u5 modi fi cations speci fi c and stable gene transfer to human embryonic stem cells using pseudotyped lentiviral vectors functional differences between mesenchymal stem cell populations are re fl ected by their transcriptome an update on targeted gene repair in mammalian cells: methods and mechanisms cell-based treatments for diabetes mesenchymal stem cells for the treatment of neurodegenerative disease human placenta-derived mesenchymal stem cells and islet-like cell clusters generated from these cells as a novel source for stem cell therapy in diabetes generation of hiv-1 resistant and functional macrophages from hematopoietic stem cell-derived induced pluripotent stem cells identi fi cation and biology of beta-secretase safety considerations in vector development hedgehog signaling, 
epithelial-to-mesenchymal transition and mirna (review) directing stem cell homing biosurface engineering through ink jet printing minimal requirement for a lentivirus vector based on human immunode fi ciency virus type 1 potential oncogene activity of the woodchuck hepatitis post-transcriptional regulatory element (wpre) neonatal gene therapy of mps i mice by intravenous injection of a lentiviral vector filovirus-pseudotyped lentiviral vector can ef fi ciently and stably transduce airway epithelia in vivo umbilical cord blood-derived progenitor cells enhance muscle regeneration in mouse hindlimb ischemia model resident vascular progenitor cells: an emerging role for nonterminally differentiated vessel-resident cells in vascular biology sox2 transduction enhances cardiovascular repair capacity of blood-derived mesoangioblasts methods in mammalian cell line engineering: from random mutagenesis to sequence-speci fi c approaches cre/loxp recombination system and gene targeting human mesenchymal stem cells (hmscs) expressing truncated soluble vascular endothelial growth factor receptor (tsflk-1) following lentiviral-mediated gene transfer inhibit growth of burkitt's lymphoma in a murine model vegf at the neurovascular interface: therapeutic implications for motor neuron disease developing cell therapy techniques for respiratory disease: intratracheal delivery of genetically engineered stem cells in a murine model of airway injury human umbilical cord blood-derived mesenchymal stem cells improve neuropathology and cognitive impairment in an alzheimer's disease mouse model through modulation of neuroin fl ammation cell therapy for bone regeneration-bench to bedside survival and neuronal differentiation of mesenchymal stem cells transplanted into the rodent brain are dependent upon microenvironment hypoxia preconditioned mesenchymal stem cells improve vascular and skeletal muscle fi ber regeneration after ischemia through a wnt4-dependent pathway gene transfer in humans 
using a conditionally replicating lentiviral vector improved motor function in dko mice by intravenous transplantation of bone marrow-derived mesenchymal stromal cells hypoxia and hypoxia inducible factors in cancer stem cell maintenance inhibition of hiv-1 infection by a unique short hairpin rna to chemokine receptor 5 delivered into macrophages through hematopoietic progenitor cell transduction mesenchymal stromal cells expressing heme oxygenase-1 reverse pulmonary hypertension human fetal trachea-scid mouse xenografts: ef fi cacy of vesicular stomatitis virus-g pseudotyped lentiviral-mediated gene transfer human umbilical mesenchymal stem cells promote recovery after ischemic stroke hypoxia-inducible factor-1alpha is essential for hypoxia-induced mesenchymal stem cell mobilization into the peripheral blood effects of transplantation with bone marrow-derived mesenchymal stem cells modi fi ed by survivin on experimental stroke in rats neuro fi lamentopathy in neurodegenerative diseases arti fi cial cell microencapsulated stem cells in regenerative medicine, tissue engineering and cell therapy gene editing in human stem cells using zinc fi nger nucleases and integrase-defective lentiviral vector delivery hypoxia and hypoxia-inducible factors: master regulators of metastasis antisensemediated inhibition of human immunode fi ciency virus (hiv) replication by use of an hiv type 1-based vector results in severely attenuated mutants incapable of developing resistance sources of adult mesenchymal stem cells applicable for musculoskeletal applications -a systematic review of the literature design, engineering, and characterization of zinc fi nger nucleases biomimetic mineral-organic composite scaffolds with controlled internal architecture neprilysin gene transfer reduces human amyloid pathology in transgenic mice recent advances in lentiviral vector development and applications rabies virus glycoprotein pseudotyping of lentiviral vectors enables retrograde axonal 
transport and access to the nervous system after peripheral delivery mesenchymal stem cells for the sustained in vivo delivery of bioactive factors lentiviral-transduced human mesenchymal stem cells persistently express therapeutic levels of enzyme in a xenotransplantation model of human disease site-speci fi c gene insertion mediated by a cre-loxpcarrying lentiviral vector targeting the tumor and its microenvironment by a dual-function decoy met receptor toward gene therapy for cystic fi brosis using a lentivirus pseudotyped with sendai virus envelopes in vivo transduction of hiv-1-derived lentiviral particles engineered for macrolide-adjustable transgene expression insertional transformation of hematopoietic cells by self-inactivating lentiviral and gammaretroviral vectors barrier-to-autointegration factor in fl uences speci fi c histone modi fi cations the genotoxic potential of retroviral vectors is strongly modulated by vector design and integration site selection in a mouse model of hsc gene therapy hematopoietic stem cell gene transfer in a tumor-prone mouse model uncovers low genotoxicity of lentiviral vector integration in vitro analysis of multipotent mesenchymal stromal cells as potential cellular therapeutics in neurometabolic diseases in pediatric patients in situ tissue engineering for tracheal reconstruction using a luminar remodeling type of arti fi cial trachea inserting optimism into gene therapy ex vivo gene transfer and correction for cell-based therapies a dual speci fi city promoter system combining cell cycle-regulated and tissue-speci fi c transcriptional control dedifferentiation rescues senescence of progeria cells but only while pluripotent in vitro pathological modelling using patient-speci fi c induced pluripotent stem cells: the case of progeria origin of hematopoietic progenitors during embryogenesis deriving respiratory cell types from stem cells regulated expansion of human pancreatic beta-cells targeting the perpetrator: breast 
cancer stem cell therapeutics ef fi ciency of lentiviral transduction during development in normal and rd mice reprogramming to pluripotency: stepwise resetting of the epigenetic landscape targeting of therapeutic gene expression to the liver by using liver-type pyruvate kinase proximal promoter and the sv40 viral enhancer active in multiple cell types disease-speci fi c induced pluripotent stem cells glial cells in (patho)physiology lentivirus-expressed sirna vectors against alzheimer disease mesenchymal stem cells and tooth engineering stem cell therapy for cystic fi brosis: current status and future prospects progress in understanding reprogramming to the induced pluripotent state conditional mutagenesis in mice: the cre/loxp recombination system lentiviral vectors encoding tetracycline-dependent repressors and transactivators for reversible knockdown of gene expression: a comparative study lentiviral vectors approach the clinic but fall back: national institutes of health recombinant dna advisory committee review of a fi rst clinical protocol for use of a lentiviral vector mesenchymal stem cells as therapeutics and vehicles for gene and drug delivery plant biotechnology: zinc fi ngers on target chimeric nucleases stimulate gene targeting in human cells identi fi cation of the sensory neuron speci fi c regulatory region for the mouse gene encoding the voltage-gated sodium channel nav1.8 intrapleural delivery of mesenchymal stem cells: a novel potential treatment for pleural diseases zinc-fi nger nucleases for somatic gene therapy: the next frontier identi fi cation of potential stroke targets by lentiviral vector mediated overexpression of hif-1 alpha and hif-2 alpha in a primary neuronal model of hypoxia de fi ning pluripotent stem cells through quantitative proteomic analysis longterm correction of diabetes in rats after lentiviral hepatic insulin gene therapy effect of interleukin-10 overexpression on the properties of healing tendon in a murine patellar 
tendon model stem cell technologies for tissue regeneration in dentistry airway basal stem cells: a perspective on their roles in epithelial homeostasis and remodeling mesenchymal stem cells derived from dental tissues third-generation, self-inactivating gp91(phox) lentivector corrects the oxidase defect in nod/scid mouse-repopulating peripheral blood-mobilized cd34+ cells from patients with x-linked chronic granulomatous disease current development of lentiviral-mediated gene transfer stem cell depletion in hutchinson-gilford progeria syndrome human-induced pluripotent stem cells produced under xeno-free conditions tissue speci fi city of enhancer and promoter activities of a herv-k(hml-2) ltr porous titanium scaffolds fabricated using a rapid prototyping and powder metallurgy technique collagen scaffolds reinforced with biomimetic composite nano-sized carbonate-substituted hydroxyapatite crystals and shaped by rapid prototyping to contain internal microchannels novel collagen scaffolds with prede fi ned internal morphology made by solid freeform fabrication gastric carcinogenesis and the cancer stem cell hypothesis mesenchymal stem cells stably transduced with a dominantnegative inhibitor of ccl2 greatly attenuate bleomycin-induced lung damage lentiviral vector-mediated gene transfer to endotherial cells compared with adenoviral and retroviral vectors engineering zinc fi nger nucleases for targeted mutagenesis of zebra fi sh lentiviral vectors pseudotyped with a modi fi ed rd114 envelope glycoprotein show increased stability in sera and augmented transduction of primary lymphocytes and cd34+ cells derived from human and nonhuman primates targeting retroviral and lentiviral vectors biochemical correction of x-cgd by a novel chimeric promoter regulating high levels of transgene expression in myeloid cells migration of mesenchymal stem cells through cerebrospinal fl uid into injured spinal cord tissue overexpressed atp7b protects mesenchymal stem cells from toxic 
copper lentivirus-mediated gene transfer of gp91phox corrects chronic granulomatous disease (cgd) phenotype in human x-cgd cells genotypic features of lentivirus transgenic mice intracerebral cell transplantation therapy for murine gm1 gangliosidosis indirect rapid prototyping of biphasic calcium phosphate scaffolds as bone substitutes: in fl uence of phase composition, macroporosity and pore geometry on mechanical properties fibroblasts derived from human embryonic stem cells direct development and repair of 3d human skin equivalents mesenchymal stem cell transplantation modulates neuroin fl ammation in focal cerebral ischemia: contribution of fractalkine and il-5 superior tissue-speci fi c expression from tyrosinase and prostate-speci fi c antigen promoters/enhancers in helper-dependent compared with fi rstgeneration adenoviral vectors a highly ef fi cient short hairpin rna potently down-regulates ccr5 expression in systemic lymphoid organs in the hu-blt mouse model expression of truncated dystrophin cdnas mediated by a lentiviral vector mscs: biological characteristics, clinical applications and their outstanding concerns feasibility of a stem cell therapy for intervertebral disc degeneration pancreatic exocrine duct cells give rise to insulinproducing beta cells during embryogenesis but not after birth potential differentiation of human mesenchymal stem cell transplanted in rat corpus cavernosum toward endothelial or smooth muscle cells induced pluripotent cancer cells: progress and application kinetics and genomic pro fi ling of adult human and mouse beta-cell maturation cervical spinal cord delivery of a rabies g protein pseudotyped lentiviral vector in the sod-1 transgenic mouse. 
invited submission from the joint section meeting on disorders of the spine and peripheral nerves micromanipulating viral-based therapeutics growth and regeneration of adult beta cells does not involve specialized progenitors tumor-initiating and -propagating cells: cells that we would like to identify and control highly ef fi cient endogenous human gene correction using designed zinc-fi nger nucleases genome editing with engineered zinc fi nger nucleases mesenchymal stem cells injection in degenerated intervertebral disc: cell leakage may induce osteophyte formation development of an in vitro model for the simultaneous study of the ef fi cacy and hematotoxicity of antileukemic compounds mesenchymal stem cell transplantation changes the gene expression pro fi le of the neonatal ischemic brain silencing of cardiac mitochondrial nhe1 prevents mitochondrial permeability transition pore opening gene delivery of a mutant tgfbeta3 reduces markers of scar tissue formation after cutaneous wounding induced pluripotent stem cells: fundamentals and applications of the reprogramming process and its rami fi cations on regenerative medicine transcriptional pro fi ling of human mesenchymal stem cells transduced with reporter genes for imaging molecular imaging of mesenchymal stem cell: mechanistic insight into cardiac repair after experimental myocardial infarction the engineering of patient-speci fi c, anatomically shaped, digits in vitro osteogenic differentiation of adipose stem cells after lentiviral transduction with green fl uorescent protein runx1/aml1/ cbfa2 mediates onset of mesenchymal cell differentiation toward chondrogenesis targeted transduction patterns in the mouse brain by lentivirus vectors pseudotyped with vsv, ebola, mokola, lcmv, or mulv envelope proteins local gene transfer to calci fi ed tissue cells using prolonged infusion of a lentiviral vector harnessing hiv for therapy, basic research and biotechnology macroarchitectures in spinal cord scaffold 
implants in fl uence regeneration transduction patterns of pseudotyped lentiviral vectors in the nervous system establishment of an erythroid cell line from primary cd36+ erythroid progenitor cells targeted genome editing across species using zfns and talens in vivo gene transfer into adult stem cells in unconditioned mice by in situ delivery of a lentiviral vector combinatorial control of suicide gene expression by tissue-speci fi c promoter and microrna regulation for cancer therapy effect of hypoxia on the gene pro fi le of human bone marrow-derived mesenchymal stem cells the in fl uence of semen-derived enhancer of virus infection on the ef fi ciency of retroviral gene transfer microdystrophin delivery in dystrophin-de fi cient (mdx) mice by genetically-corrected syngeneic mscs transplantation long-term ef fi cacy and safety of human umbilical cord mesenchymal stromal cells in rotenone-induced hemiparkinsonian rats vegf-expressing human umbilical cord mesenchymal stem cells, an improved therapy strategy for parkinson's disease mesenchymal stem cells promote cardiomyocyte hypertrophy in vitro through hypoxia-induced paracrine mechanisms beta cells can be generated from endogenous progenitors in injured adult mouse pancreas magnetic resonance evaluation of transplanted mesenchymal stem cells after myocardial infarction in swine transplantation of adipose tissue-derived stem cells for treatment of focal cerebral ischemia current advances in retroviral gene therapy correction of cardiac abnormalities in fabry mice by direct intraventricular injection of a recombinant lentiviral vector that engineers expression of alpha-galactosidase a sequencing and characterization of the porcine alpha-galactosidase a gene: towards the generation of a porcine model for fabry disease self-inactivating retroviral vectors designed for transfer of whole genes into mammalian cells lentivirus vector-mediated gene transfer to the developing bronchiolar airway epithelium in the fetal lamb 
glycosaminoglycans and pdgf signaling in mesenchymal cells validation of a mutated pre sequence allowing high and sustained transgene expression while abrogating whv-x protein synthesis: application to the gene therapy of was the hiv-1 dna fl ap stimulates hiv vector-mediated cell transduction in the brain rescue of atp7b function in hepatocyte-like cells from wilson's disease induced pluripotent stem cells using gene therapy or the chaperone drug curcumin transplantation of neural stem cells modi fi ed by human neurotrophin-3 promotes functional recovery after transient focal cerebral ischemia in rats lentiviral-mediated overexpression of bcl-xl protects primary endothelial cells from ischemia/reperfusion injury-induced apoptosis lentivirus-mediated gene transfer of viral interleukin-10 delays but does not prevent cardiac allograft rejection loss of proliferation and differentiation capacity of aged human periodontal ligament stem cells and rejuvenation by exposure to the young extrinsic environment in vivo reprogramming of adult pancreatic exocrine cells to beta-cells organ-speci fi c expression of the lacz gene controlled by the opsin promoter after intravenous gene administration in adult mice neurovascular pathways to neurodegeneration in alzheimer's disease and other disorders woodchuck hepatitis virus posttranscriptional regulatory element enhances expression of transgenes delivered by retroviral vectors key: cord-267733-fuz8r3vj authors: al ali, sally; baldanta, sara; fernández-escobar, mercedes; guerra, susana title: use of reporter genes in the generation of vaccinia virus-derived vectors date: 2016-05-21 journal: viruses doi: 10.3390/v8050134 sha: doc_id: 267733 cord_uid: fuz8r3vj vaccinia virus (vacv) is one of the most extensively-studied viruses of the poxviridae family. it is easy to genetically modify, so it has become a key tool for many applications. 
in this context, reporter genes facilitate the study of the role of foreign genes introduced into the genome of vacv. in this review, we describe the types of reporter genes that have been used to generate reporter-expressing vacv and the applications of the recombinant viruses obtained. reporter-expressing vacv are currently employed in basic and immunology research, in the development of vaccines and in cancer treatment. since the first description of recombinant dna techniques, many advances have been achieved in the field of molecular biology and genetic modification. currently, there is a wide variety of tools that allow the genetic modification of animals, plants, bacteria and viruses [1] [2] [3] [4]. the genetic modification of viruses has become one of the best strategies for introducing nucleic acids into different cells, tissues or even in vivo models, given its high transfection efficiency and ease of use compared to chemical or physiological methods [5, 6]. after the description of recombination events in cells infected with vaccinia virus (vacv) and the advent of recombinant dna technology [7, 8], vacv became a suitable model for the generation of recombinant virus vectors [9]. at first, the main purpose of introducing foreign genes into virus genomes was basic research on the biology of the viruses, both in vitro and in vivo. however, with the latest technical advances and a better understanding of the vacv viral cycle, genetically modified viruses now serve a wider range of purposes. thus, they can also be used for the development of vaccines or as oncolytic agents. this review aims to highlight the main aspects of the genetic modification of vacv and the generation and application of reporter-expressing viruses in this model. vacv is the prototype member of the poxviridae family, so most poxvirus research has focused on it [10]. vacv is a large double-stranded dna virus with a complex envelope. 
it was the live vaccine used to eradicate smallpox and nowadays is also used as a viral vector for recombinant vaccines and cancer therapy [9, 11]. the vacv genome is one of the largest of all dna viruses, with a size of 190 kbp and about 250 genes [12]. the genome is highly compact, with few and small intergenic and non-coding regions. the coding regions are continuous and therefore not subject to splicing [13, 14]. vacv completes its whole replication cycle inside the cytoplasm of the host cell, even though it is a dna virus (figure 1) [10]. this determines the genetic characteristics of the virus, which is completely independent of the replication and transcription machinery of the host cell. once the virion infects the host cell, the viral core is uncoated, and nearly 100 early viral genes are transcribed [15, 16]. early genes produce the enzymes required for viral core breakdown, viral dna replication and modulation of the host antiviral response [17]. viral dna begins to replicate inside the infected host cell, using viral enzymes, at 3 h post-infection. as soon as viral replication starts, downstream genes are transcribed; these encode regulatory proteins that induce the expression of the late genes. late genes encode the proteins and enzymes required for the assembly of new viral particles. after dna and all viral proteins are synthesized, the process known as morphogenesis begins, which results in the formation of new virions [18, 19]. these can be retained inside the cell until cellular lysis or released to the environment by other mechanisms [10, 18]. several features of the biology of vacv make it suitable for use as a vector in biological experiments, vaccine design or cancer therapy. the complete cytoplasmic replication of vacv facilitates the expression of foreign genes inserted in the viral genome and their detection or isolation [20, 21]. 
usually, bacterial or non-mammalian viral vectors fail to produce expressed proteins with full activity as antigens. however, vacv has the ability to transcribe its genes using its own transcription factors and enzymes. this means that if a foreign gene is placed directly under a vacv promoter element, it will be transcribed, and the foreign protein will reach high levels of expression in the infected cell. moreover, this replication cycle is an appropriate model for molecular and genetic investigations of the cis and trans factors required for gene expression [12, 22]. furthermore, since vacv remains in the cytoplasm, the risk of insertional mutagenesis and oncogenesis, the main problems encountered in gene therapy using integrative viruses, disappears. in some cases, patients treated with retroviral vectors have developed cancer years after treatment [5, 23]. vacv can replicate in different cell lines and primary cell cultures, and also grows in several animal species, such as mice, guinea pigs and rabbits [10]. this broad host range allows infection of cell lines with recombinant viruses for large-scale expression of heterologous proteins, which reduces its cost in comparison to other production systems [21, 24]. additionally, vacv reaches high production titers, which is an advantage in the manufacturing of large amounts of vaccine [6]. although the vacv genome is large and compact, it can tolerate the deletion of certain viral sequences and the insertion of exogenous genetic material [25]. a vacv vector has a transgene capacity of approximately 25-30 kb, higher than other viral vectors, including adeno-associated virus (4.5 kb), adenovirus (8-10 kb) and retrovirus (7-8 kb) [4]. thus, vacv is an excellent candidate vector for the design of polyvalent vaccines with antigens from several pathogens or different antigens from the same pathogen [9, 26]. 
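the capacity comparison above can be turned into a quick feasibility check. the following is a minimal python sketch, not part of the original review: the capacities are the approximate figures quoted in the text, while the function name `vectors_that_fit` and the example cassette size are purely illustrative assumptions.

```python
# approximate transgene capacities (kb) quoted in the text [4];
# each entry is a (low, high) range.
CAPACITY_KB = {
    "vacv": (25.0, 30.0),
    "adeno-associated virus": (4.5, 4.5),
    "adenovirus": (8.0, 10.0),
    "retrovirus": (7.0, 8.0),
}


def vectors_that_fit(insert_kb: float) -> list[str]:
    """Return the vectors whose conservative (lower-bound) capacity holds the insert."""
    return [name for name, (low, _high) in CAPACITY_KB.items() if insert_kb <= low]


# hypothetical polyvalent cassette: three antigen genes of 4 kb each
cassette_kb = 3 * 4.0
print(vectors_that_fit(cassette_kb))
```

using the lower bound of each range is a deliberately cautious choice; real constructs also depend on promoter, marker and flanking sequences, which this sketch ignores.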
finally, as far as its use as a vaccine vector is concerned, vacv is clearly immunogenic and effective, the strongest evidence being the eradication of smallpox in 1980 [11]. vacv is also safe and easy to inoculate, since it can be administered intradermally or with an air gun without medical training. in some organisms, preexisting immunity has been found to cause problems, but the probability of post-vaccination complications, such as progressive vacv infection or encephalitis, is low [27]. nowadays, thanks to better knowledge of vacv biology and of the immune response generated after vaccination, vaccines based on this virus are becoming safer [9]. in addition, vacv vectors are very stable and can be lyophilized and kept frozen for several years, which facilitates their transport and storage [23]. the main method used to obtain recombinant vacv expressing foreign genes is homologous recombination (figure 2) [28]. first, a plasmid containing the gene or transgene of interest is constructed. cells are then infected with the virus and subsequently transfected with this plasmid. alternatively, two viruses can be used, one defective for some genes and one wild-type virus acting as a helper [4, 29]. in both methods, the recombinant viruses are produced by homologous recombination inside the infected cell. another way to generate recombinant viruses is the method described by falkner and moss [30], termed transient dominant selection (tds), which allows the introduction of site-directed mutations into the vacv genome. generally, the recombinant viruses obtained by this method are rescued by metabolic selection, using the guanine phosphoribosyltransferase gene (gpt) from escherichia coli as a marker. 
the presence of the protein encoded by gpt allows the recombinant viruses to grow in the presence of mycophenolic acid, xanthine and hypoxanthine [31]. after this first metabolic selection, a second recombination event must occur to eliminate the selection marker while maintaining the mutation introduced into the vacv genome (figure 3) [32]. in contrast to the method described above, in the tds technique the marker should not be flanked by homologous regions of the vacv genome [30]. alternatively, puromycin resistance can be used as a selection marker in tds, increasing the efficiency of recombinant virus generation [33]. two important aspects to be considered when generating recombinant poxviruses are the vacv genome insertion sites and the reporter genes introduced. the vacv genome has about seven known insertion sites where foreign genes can be inserted (figure 4) [13]. 
the choice of insertion site depends mainly on the future application of the recombinant viruses. it may also be important for the later selection of the recombinant viruses obtained. for instance, inserting the gene of interest into the thymidine kinase (tk) locus confers a detectable phenotype (tk-): the recombinant viruses are able to grow in the presence of 5-bromo-2'-deoxyuridine (brdu), a synthetic analog of thymidine [28, 34]. another important insertion site that allows subsequent selection is the vacv hemagglutinin (ha) gene, as the recombinant viruses can be easily recognized by their inability to bind erythrocytes in a hemagglutination test [35-37].
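as a rough illustration of the insertional strategy described above, the following python sketch models a double crossover between a transfer plasmid and the viral genome as a simple string operation. it is a toy model under stated assumptions, not a bioinformatics tool: the sequences, the arm lengths and the helper name `homologous_recombination` are all hypothetical, chosen only so that the transgene interrupts a made-up tk open reading frame.

```python
def homologous_recombination(genome, left_arm, insert, right_arm):
    """Toy double crossover: the genome segment lying between the two
    homology arms is replaced by `insert` (hypothetical helper)."""
    start = genome.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found in genome")
    left_end = start + len(left_arm)
    right_start = genome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    return genome[:left_end] + insert + genome[right_start:]


# All sequences below are invented for illustration only.
left_arm = "ATGAACGGCG"    # 5' part of the made-up tk ORF
right_arm = "CAGTTGATAA"   # 3' part of the made-up tk ORF
TK_ORF = left_arm + "GACATATT" + right_arm
genome = "CCCC" + TK_ORF + "GGGG"
transgene = "ATGGCTAGCTAA"

recombinant = homologous_recombination(genome, left_arm, transgene, right_arm)

# The parental genome carries an intact tk ORF; the recombinant does not,
# which is the basis of the tk- phenotype used for selection.
print("tk intact in parent:     ", TK_ORF in genome)
print("tk intact in recombinant:", TK_ORF in recombinant)
print("transgene present:       ", transgene in recombinant)
```

the same function can model insertion at any other site by supplying arms that flank that site; what changes in practice is how the resulting recombinants can be selected.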
vacv has five more insertion sites: the bamhi site of the hindiii-f dna fragment [38]; the vacv growth factor gene (vgf), located in both inverted terminal repeats (itrs) [39]; the n2 and m1 genes, located on the left side of the vacv genome [40]; the m1 subunit of the ribonucleotide reductase (rr) gene in the hindiii-i dna fragment [41]; and the a27l gene encoding the 14 kda fusion protein, in the large hindiii-a dna fragment [42]. it is noteworthy that some strains of vacv, such as vacv lister variants, have only one copy of vgf [33]. recombinant production using these insertion sites, although successful, requires a marker gene or other strategies for later selection of the recombinant viruses. because of this limitation, the tk gene is the most common insertion site in the vacv genome [5]. some authors have used temperature-sensitive vacv strains, allowing the recombinant viruses to be selected in culture at 40 °c [43]. however, the most common way to identify recombinant viruses easily is the use of reporter genes as selectable markers, which is discussed in section 4.2 [44].
apart from the promoter and the regions between the promoter and the coding region, the insertion site also influences foreign gene expression and virus virulence [13, 23, 25]. insertion into the tk, vgf, rr or a27l genes has an impact on viral replication in vivo, but not in vitro [25, 45]. moreover, the method described in figure 2 requires the use of special cell lines or mutagenic selective agents, such as tk-/- cell lines and brdu [30]. for this reason, different strategies and new insertion sites are being studied to ensure the correct expression of the transgenes in vitro and in vivo [25, 33, 46, 47]. reporter-expressing viruses are recombinant viruses expressing a reporter gene [48]. in some cases, the reporter gene is placed downstream of a viral promoter to study biological pathways, or is fused with other viral or foreign genes. because reporter genes are expected to be easily detected, they are the best indicators for screening for successfully generated recombinant viruses. the reporter gene should be chosen so that its activity is not endogenous to the cell type, tissue or organism used to culture the viruses [44]. reporter genes have additional applications in vitro and in vivo, as the reporter gene acts as a substitute for the gene of interest. moreover, reporter genes facilitate the use of tissue-specific and pathway-specific promoters, as well as regulatory promoter elements, as biomarkers for a particular event or pathway. furthermore, it is important that the presence of the reporter gene does not affect the normal physiology and general characteristics of the transfected cells [48-50].
table 1 presents an overview of the reporter genes commonly used in the generation of recombinant vacv. chloramphenicol acetyltransferase (cat) was the first reporter gene used in transcriptional assays in mammalian cells.
cat is an enzyme from escherichia coli that detoxifies the antibiotic chloramphenicol, which inhibits protein synthesis in bacteria [58]. specifically, cat transfers acetyl groups from acetyl-coenzyme a (acetyl-coa) to chloramphenicol, preventing it from blocking the 50s ribosomal subunit. this gene is not found in eukaryotes, so eukaryotic cells do not show any basal cat activity [44]. the reaction catalyzed by cat can be quantified using fluorogenic or radiolabeled substrates, such as 3h-labeled acetyl-coa and 14c-labeled chloramphenicol. cat can be detected by thin-layer chromatography, autoradiography or enzyme-linked immunosorbent assay (elisa) [51]. there is a strong correlation between cat gene transcript levels and enzymatic activity, which is easy to quantify. thus, cat has become a suitable reporter gene for investigating transcriptional elements in a wide variety of experiments involving animal and plant cells, as well as viruses [51]. the cat system has some disadvantages, such as the larger number of cells required compared to other assays, like the luciferase assay (detailed in section 4.2.5). in addition, the cat system is not suitable for weakly expressed genes, and quantification of promoter activity with cat takes longer than with other reporter systems [52]. finally, this reporter gene has another important limitation: the use of radioisotopes [44]. the first study using lacz, the gene encoding β-galactosidase, as a reporter gene was published in 1980, and since then it has become one of the most commonly used reporter genes in molecular biology [49]. although the natural function of β-galactosidase is to cleave the disaccharide lactose into glucose and galactose, it also recognizes several artificial substrates, which has promoted its use as a reporter gene [58].
thus, β-galactosidase can hydrolyze substrates such as ortho-nitrophenyl beta-galactoside (onpg), 5-bromo-4-chloro-3-indolyl beta-d-galactopyranoside (x-gal) and 3,4-cyclohexenoesculetin beta-d-galactopyranoside (s-gal), yielding a yellow, blue or black product, respectively [53, 59]. furthermore, expression of the lacz gene can be induced with isopropyl beta-d-thiogalactopyranoside (iptg), a highly stable, synthetic and non-hydrolyzable analog of lactose [49]. one application of the lacz reporter gene is the selection of transformed bacterial colonies. recombinant (white) and non-recombinant (blue) bacteria are discriminated based on the interruption of the lacz gene by the insert dna or gene of interest, using x-gal as a substrate [53]. other uses are the visualization of β-galactosidase expression in transfected eukaryotic cells and the selection of recombinant viruses by viral plaque screening [60]. finally, lacz is used to detect β-galactosidase activity in immunological and histochemical experiments [44]. one of the main advantages of this reporter gene system is its low cost, since it does not require specific devices to detect the colorimetric reaction or to identify its expression. another escherichia coli-derived gene encoding a hydrolytic enzyme that lends itself to reporter assays is gus. the β-glucuronidase protein catalyzes the breakdown of complex carbohydrates, such as glycosaminoglycans. this reporter gene system has been widely used in transgenic plants, and it has also been used successfully in mammalian cells for the selection of recombinant vacv [54]. for the β-glucuronidase (gus) assay, 4-methylumbelliferyl beta-d-glucuronide (mug) or 5-bromo-4-chloro-3-indolyl beta-d-glucuronide (x-gluc) can be used as substrates; after cleavage they yield a fluorescent or a blue product, respectively [61, 62].
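the substrate and product pairs described above for the two hydrolase reporters can be collected in a small lookup table. the python sketch below only restates the pairs given in the text; the names `READOUTS`, `expected_readout` and `blue_white_colony` are hypothetical and serve purely as an illustration.

```python
# Substrate -> product readout for the lacZ and GUS reporters,
# restating the pairs listed in the text.
READOUTS = {
    "lacZ": {"ONPG": "yellow", "X-gal": "blue", "S-gal": "black"},
    "GUS": {"MUG": "fluorescent", "X-Gluc": "blue"},
}


def expected_readout(reporter, substrate):
    """Return the product readout for a reporter/substrate pair."""
    try:
        return READOUTS[reporter][substrate]
    except KeyError:
        raise ValueError(
            f"no readout recorded for {reporter!r} with {substrate!r}"
        ) from None


def blue_white_colony(lacz_interrupted):
    """Blue/white screening with X-gal: colonies whose lacZ gene is
    interrupted by an insert stay white, the others turn blue."""
    return "white" if lacz_interrupted else expected_readout("lacZ", "X-gal")


for reporter, substrates in READOUTS.items():
    for substrate in substrates:
        print(reporter, substrate, expected_readout(reporter, substrate))
```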
monitoring β-glucuronidase activity through a gus assay allows the spatial and temporal expression of the gene of interest to be determined [63]. the best-known fluorescent protein is green fluorescent protein (gfp), which was cloned from the jellyfish aequorea victoria. because of the great impact of fluorescent proteins on molecular biology applications, the 2008 nobel prize in chemistry was awarded to osamu shimomura, martin chalfie and roger y. tsien for the discovery and development of gfp [64, 65]. gfp is the most used reporter gene; however, genetic engineering has also produced a wide variety of color mutants, such as red fluorescent protein (rfp) and yellow fluorescent protein (yfp), among others [49]. fluorescent proteins tolerate n- and c-terminal fusions to a wide range of proteins, have been expressed in most known cell types and are used as non-harmful fluorescent markers in living cells and organisms. fluorescent proteins allow a variety of applications: cell lineage tracking, reporting in gene expression assays and measuring protein-protein interactions. additionally, cell fixation is not needed to examine their expression, and the probability of artifacts is quite small compared to immunocytochemical methods, which require cell fixation [44]. one disadvantage of these proteins is their size; in some cases, they can therefore affect the in vivo function of fused proteins or genes of interest. other limitations of gfp are its low sensitivity [66] and the fact that its signal cannot be exogenously amplified [50]. the first luciferase (luc), from the firefly photinus pyralis, was cloned in 1980, and luc has since been widely used as a reporter gene. luciferases were later also described in bacteria and dinoflagellates [44]. luciferases are enzymes that catalyze a chemical reaction resulting in the production of light.
firefly luciferase oxidizes d-luciferin in the presence of oxygen, with adenosine triphosphate (atp) as the energy source. as in β-galactosidase assays, an exogenous substrate is needed, which may be a disadvantage [49]. in other systems, such as the luciferase identified in bacteria (luxcdabe operon), the enzyme catalyzes the oxidation of long-chain aldehydes and reduced flavin mononucleotide (fmnh2) in the presence of oxygen to yield green-blue light [67]. although in bacteria this operon encodes all the components necessary for light emission, its function is limited in mammalian cells, so an exogenous substrate has to be added to improve the reaction [56]. besides the different substrates required, each luciferase system has specific kinetics, detection requirements and sensitivity, which require the experimental design to be adjusted [58, 67]. luciferase is very widely used in studies of biological systems, including cell proliferation assays, protein folding/secretion analyses, in vivo imaging and monitoring of viral spread in vivo [57, 67-69]. the main advantage of this system is its high sensitivity compared to other systems, such as cat. additionally, the luc system is more direct, rapid and suitable for weakly expressed genes, and it can be used to quantify gene activity. one disadvantage of the luc system is that ultrasensitive charge-coupled device (ccd) cameras are required to detect gene expression [56]. reporter-gene assays have helped pox virologists in basic research, for example in studying the location, structure and function of many vacv proteins during the infection cycle and their interaction with host cell proteins [44, 70]. dvoracek and shors [63] used the gus reporter gene to delete the gene encoding the d9r viral protein and to select the recombinant viruses, with the aim of understanding the role of this protein in the viral life cycle.
in addition, the lacz gene has been used mainly for the selection of recombinants [71]. moreover, several studies have reported transgene insertion sites and vacv promoters with which recombinant virus production was enhanced. these studies are essential for improving the development of vaccines based on recombinant vacv [62, 72]. on the other hand, fluorescent or luminescent markers such as gfp, yfp or luciferase are also useful for labelling replicative vacv strains. these viruses have allowed the study of processes such as virus entry, morphogenesis and exit in infected cells [68, 69, 73-76]. in these studies, fluorescent labelling of certain viral proteins makes it possible to study their interaction with other viral or cellular proteins [77]. furthermore, vacv is a clear example of how viruses have developed strategies to evade the immune response [78]. in this field, the generation of recombinant vacv with reporter genes is also useful for discerning the molecular mechanisms by which vacv proteins manipulate the immune system of the host. thus, in unterholzner et al. [79], a gfp-labeled recombinant vacv revealed that the c6 viral protein acts as an immunomodulatory agent, blocking the expression of type i interferon. another major application of reporter-expressing vacvs is the design of high-throughput assays. recombinant viruses expressing lacz or gfp can be used to optimize antibody neutralization assays [71, 80]. lastly, vacv and reporter genes have been used to study proteins from other viruses, particularly rna viruses, such as influenza or severe acute respiratory syndrome-associated coronavirus (sars-cov) [81]. to genetically modify these viruses, their rna must first be reverse transcribed to cdna; because this cdna is particularly unstable in plasmids, vacv is a good tool for functional studies of proteins from such viruses [82].
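the reporter-based neutralization assays mentioned above come down to normalizing a reporter signal against controls. the python sketch below shows one common way of doing this; the formula, the function name `percent_neutralization` and the example readings are assumptions made for illustration and are not taken from the cited studies [71, 80].

```python
def percent_neutralization(sample, virus_only, background=0.0):
    """Percent reduction of reporter signal relative to a virus-only control.

    sample      reporter signal (e.g. gfp fluorescence) with antibody present
    virus_only  reporter signal of infected cells without antibody
    background  reporter signal of uninfected cells
    The result is clamped to the 0-100 range (assumed convention).
    """
    span = virus_only - background
    if span <= 0:
        raise ValueError("virus-only control must exceed background")
    pct = (1.0 - (sample - background) / span) * 100.0
    return max(0.0, min(100.0, pct))


# Hypothetical plate readings in arbitrary fluorescence units.
readings = {"serum 1:10": 190.0, "serum 1:100": 550.0, "serum 1:1000": 910.0}
for label, signal in readings.items():
    value = percent_neutralization(signal, virus_only=1000.0, background=100.0)
    print(f"{label}: {value:.1f}% neutralization")
```

because the readout is a single number per well, this kind of calculation is easy to automate across plates, which is what makes reporter viruses attractive for high-throughput formats.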
there are several in vivo applications for recombinant reporter-expressing viruses. in virulence studies, the use of labeled viruses allows us to follow viral pathogenicity and to detect in which organs viral replication and dissemination occur [70, 76]. for example, zaitseva et al. [69] used the recombinant vacv western reserve strain (wr)-luc to analyze viral spread in vivo over several days, reducing the number of mice used. moreover, vacv is an effective enhancer of both humoral and cell-mediated immunity; it is used as a vector to study the immune system and the antigenicity of proteins from other pathogens. furthermore, vacv is used to explore immunopathological mechanisms, to determine which epitopes or antigens presented by a pathogen are able to induce the host immune response, and to demonstrate the specific role of a particular antigen during the pathogenic process [13, 83, 84]. despite the examples mentioned above, the most common in vivo uses of recombinant vacv are the production of prophylactic vaccines and treatments against cancer [4, 85]. table 2 shows some of the vaccines based on vacv, indicating the reporter gene and the insertion site employed in each case. in these vaccines, vacv acts as a vector capable of delivering antigens from other organisms [23]. while a viral antigen has been inserted in many recombinant vaccines, some have also been developed against bacteria [86] or protists [34, 87, 88]. these vaccines simulate infection by the pathogen from which the antigens are derived and thereby elicit an immune response against it. in several vaccines, mainly against human immunodeficiency virus (hiv) or influenza, genes of immunomodulatory cytokines are added for coexpression with the antigen, improving the immunogenicity of the vaccines [23, 89, 90].
as summarized in table 2, most of the transgene insertion sites are within the tk or the ha genes, which makes the selection of recombinants easier, as explained above. however, in several vaccines a reporter gene is used in addition to this strategy. the use of reporter genes facilitates preliminary tests of the vaccine in animal models. moreover, especially in vaccines used in animals, the reporter gene makes it possible to distinguish between vaccinated and infected animals [48]. for example, vacv has been used for nearly twenty years as an oral vaccine to eradicate rabies from wildlife. in this case, the recombinant vacv expresses the rabies virus glycoprotein and has been used to vaccinate raccoons, red foxes, skunks and coyotes in the united states and europe. this campaign has successfully eliminated rabies from some parts of europe and the united states [91]. (table 2, surviving row: streptococcus pyogenes | m protein | tk gene | reporter gene not mentioned | [108]) another application of vacv vectors is cancer treatment, known as oncolytic virotherapy [26]. this is the use of replication-competent viruses to selectively attack and destroy cancer cells without harming healthy host cells [109]. examples of the recombinant vacv used are summarized in table 3. one promising approach is the use of oncolytic vacv as a vector for the human sodium iodide symporter (hnis) gene in prostate cancer therapy, which has been shown to restrict tumor growth and to increase survival in mice [110]. vacv is also a promising therapeutic agent for pancreatic cancer [85], cholangiocarcinoma [111] and colorectal cancer [112]. it is worth mentioning that many of the viral vectors developed to treat tumors share several characteristics. generally, vacv oncolytic vectors have a deletion in the tk gene, which is essential for the pyrimidine synthesis pathway; this forces the virus to replicate in cells with large nucleotide pools, enhancing the viral tropism for cancer cells.
others have a deletion in the vgf gene, which prevents non-infected cells from proliferating [109]. furthermore, as in the development of vaccines, viral vectors are "armed" with genes that enhance the antitumor activity, the virus tropism or the immunoreactivity to promote better tumor destruction, such as the granulocyte-macrophage colony-stimulating factor (gm-csf) or erythropoietin genes (enhanced virus; table 3). another particular feature is that many of these recombinants carry reporter genes, so viral replication can be monitored by non-invasive imaging methods [68, 69, 76, 113]. the main limitation of using vacv as a vector is the short-term gene expression, since it is a lytic virus that kills the infected cells. thus, gene expression will not last for more than 12-24 h post-infection [13, 109]. additionally, since vacv replicates entirely in the cytoplasm of the infected cell, which is an advantage for some applications, it is hard to use vacv to engineer nuclear gene replacement [23]. the other main disadvantage is the limited immunogenicity in individuals vaccinated against smallpox. this pre-existing immunity reduces the effectiveness of vaccines based on vacv, although some trials have overcome this problem by mucosal vaccination with vaccinia vectors [5]. the vacv safety profile should also be considered, because the virus can cause progressive complications, especially in immunocompromised individuals [11]. these limitations primarily affect in vivo applications of vacv recombinants in vaccine development, so several attenuated strains of vacv are being generated [9]. as for other viruses, the development of vaccines or oncolytic therapies based on vacv requires an understanding of its pathogenesis and biology. despite improvements in vector design, such as the use of different promoters or insertion sites, homologous recombination has been almost the only way to obtain vacv recombinants [45].
homologous recombination requires the use of markers or reporter genes for selecting recombinants, which has several disadvantages. apart from the physical space needed for the marker gene, which is limited in therapeutic viruses, the use of certain markers can introduce mutations or generate artifacts that are only found after analysis of the generated virus. sometimes these problems cannot be detected in vitro, but they are very important to overcome when these vectors are used in vivo in animal models [46, 48]. in recent years, some strategies have been developed to avoid the risks of using markers, or at least to remove them from the final recombinant vacv. rice and colleagues [45] described a double selection method to improve the selection of recombinant vacv, so that a reporter or marker gene is not necessary. a helper virus is used to rescue a recombinant vacv, which is subsequently grown in cells that are non-permissive for the helper virus, allowing the selection of a large percentage of recombinant virions. however, the method that has had the greatest impact on genome modification is the clustered regularly interspaced short palindromic repeats (crispr)/crispr-associated protein 9 (cas9) system. briefly, the crispr/cas9 system consists of an endonuclease (cas9) that uses a guide rna to generate a break at a target site in the genome, which is later repaired either randomly or precisely using a specifically designed "repair" template [119]. the effectiveness of this system has been proven in different organisms, including viruses such as herpes simplex virus (hsv) [120], hepatitis b virus (hbv) [121] and hiv [76]. currently, this technique is also starting to be used in vacv [47]. for example, this system has achieved the deletion of vacv virulence genes, such as a46l and n1l.
a46l and n1l are vacv intracellular proteins that inhibit nuclear factor-kappa b (nf-kb) activation, so their presence is undesirable in vacv vectors intended for therapeutic purposes [78]. furthermore, given the efficiency of the method, "repair" vectors with excisable marker genes have been designed. recombinant viruses can therefore be isolated effectively, and the marker gene is eventually eliminated [46]. given the simplicity of generating recombinant vacv with the crispr/cas9 system, an exponential increase in applications, with better markers for basic research or without selectable markers for clinical application, is expected [119, 120]. in conclusion, the development of recombinant viruses is a promising therapeutic advance in the biomedical field. in this sense, the use of reporter-expressing vacvs has become a fundamental tool for a number of applications in basic research, vaccine design and cancer therapy. as many of these trials are still experimental, more information is required regarding the side effects of the viral treatment. continuing efforts are necessary to develop new reporter-expressing vacvs that are safer and more effective for future therapies. author contributions: sara baldanta wrote the paper and created the figures, saly al ali and the rest of the authors also wrote the paper. the authors declare no conflict of interest. wr: western reserve strain; x-gal: 5-bromo-4-chloro-3-indolyl beta-d-galactopyranoside; x-gluc: 5-bromo-4-chloro-3-indolyl beta-d-glucuronide; yfp: yellow fluorescent protein. genetic engineering of mammals genetic engineering of crops: a ray of hope for enhanced food security precision genome engineering in lactic acid bacteria developments in viral vector-based vaccines recombinant viruses as vaccines against viral diseases.
braz mammalian cell transfection: the present and the future molecular genetics of vaccinia virus: demonstration of marker rescue construction of poxviruses as cloning vectors: insertion of the thymidine kinase gene from herpes simplex virus into the dna of infectious vaccinia virus the evolution of poxvirus vaccines poxvirus tropism the world health organization and global smallpox eradication vaccinia virus transcription vaccinia virus vectors: new strategies for producing recombinant vaccines poxvirus genome evolution by gene gain and loss poxvirus cell entry: how many proteins does it take? viruses poxvirus host cell entry comparative analysis of viral gene expression programs during poxvirus infection: a transcriptional map of the vaccinia and monkeypox genomes vaccinia virus morphogenesis and dissemination vaccinia virus infection & temporal analysis of virus gene expression: part 1 exploring vaccinia virus as a tool for large-scale recombinant protein expression evaluation of production parameters with the vaccinia virus expression system using microcarrier attached hela cells cis-and trans-acting elements involved in reactivation of vaccinia virus early transcription recombinant vaccines and the development of new vaccine strategies. 
braz production of recombinant proteins by vaccinia virus in a microcarrier based mammalian cell perfusion bioreactor insertion sites for recombinant vaccinia virus construction: effects on expression of a foreign protein replicating poxviruses for human cancer therapy vaccinia virus-induced smallpox postvaccinal encephalitis in case of blood-brain barrier damage recombinant vaccinia virus vaccines vaccinia virus as a subhelper for aav replication and packaging transient dominant selection of recombinant vaccinia viruses escherichia coli gpt gene provides dominant selection for vaccinia virus open reading frame expression vectors methodology for the efficient generation of fluorescently tagged vaccinia virus proteins apoptin enhances the oncolytic properties of vaccinia virus and modifies mechanisms of tumor regression vaccine efficacy against malaria by the combination of porcine parvovirus-like particles and vaccinia virus vectors expressing cs of plasmodium enhanced cd8+ t cell immune response against a v3 loop multi-epitope polypeptide (tab13) of hiv-1 env after priming with purified fusion protein and booster with modified vaccinia virus ankara (mva-tab) recombinant: a comparison of humoral and cellular immune responses with the vaccinia virus western reserve (wr) vector. 
we thank all of the pox virologist who contributed to this study. this work is supported by grant fis2011-00127 and reference saf2014-54623-r to sg.
key: cord-004893-28mrzvsc authors: pavesi, angelo; de iaco, bettina; granero, maria ilde; porati, alfredo title: on the informational content of overlapping genes in prokaryotic and eukaryotic viruses date: 1997 journal: j mol evol doi: 10.1007/pl00006185 sha: doc_id: 4893 cord_uid: 28mrzvsc in genetic language a peculiar arrangement of biological information is provided by overlapping genes in which the same region of dna can code for functionally unrelated messages. in this work, the informational content of overlapping genes belonging to prokaryotic and eukaryotic viruses was analyzed. using information theory indices, we identified in the regions of overlap a first pattern, exhibiting a more uniform base composition and more severe constraints in base ordering with respect to the nonoverlapping regions. this pattern was found to be peculiar to coliphage, avian hepatitis b virus, human lentivirus, and plant luteovirus families. a second pattern, characterized by the occurrence of similar compositional constraints in both types of coding regions, was found to be limited to plant tymoviruses. at the level of codon usage, a low degree of correlation between overlapping and nonoverlapping coding regions characterized the first pattern, whereas a close link was found in tymoviruses, indicating a fine adaptation of the overlapping frame to the original codon choice of the virus. as a result of codon usage correlation analysis, deductions concerning the origin and evolution of several overlapping frames were also proposed. comparison of amino acid composition revealed an increased frequency of amino acid residues with a high level of degeneracy (arginine, leucine, and serine) in the proteins encoded by overlapping genes; this peculiar feature of overlapping genes can be viewed as a way with which they may expand their coding ability and gain new, specialized functions. 
a particular issue in the statistical analysis of genomic dna sequences concerns the characterization of codes and semantic patterns in the genetic language (trifonov 1989; smith 1989). in this language, overlapping genes represent an unusual pattern, as two, or exceptionally three, out-of-phase reading frames may lie in a single nucleotide sequence. such an arrangement, called "overprinting," is frequent in viruses, where it probably evolved to increase the density of genetic information (lamb and horvath 1991). the first genes of this type were identified by barrell and co-workers (1976) in the genome of φx174, a single-stranded dna phage, and similar overlapping regions were later detected in many other genes belonging to dna or rna viruses of both prokaryotes and eukaryotes (normark et al. 1983; samuel 1989 and references therein). translation of the different reading frames has been shown to be mediated by ribosomal frameshifting, which requires an upstream site of ribosomal slippage and a downstream stem-loop structure known as an "rna pseudoknot" (jacks et al. 1988; wilson et al. 1988; brierley et al. 1989). on the other hand, translation of multiple reading frames can occur simply by internal de novo initiation in an alternative frame and does not require ribosomal frameshifting (atkins et al. 1979; chang et al. 1989). originally developed to maximize the efficiency of transmission of electronic signals, information theory (shannon and weaver 1949) was later utilized to evaluate the complexity of dna sequences (gatlin 1968, 1972). in the past years, several papers have dealt with the connection between information theory and the analysis of overlapping coding regions (yockey 1979; granero-porati et al. 1980; smith and waterman 1981).
other studies have addressed the problem of the evolution of overlapping arrangement (miyata and yasunaga 1978; soeda and maruyama 1982; keese and gibbs 1992) and the restrictions imposed on proteins encoded by the overlaid genetic messages (sander and schulz 1979; smith and waterman 1980). here, we present an analysis at different levels of complexity (divergence from randomness of mono- and dinucleotide composition, choice of synonymous codons, and frequency of occurrence of amino acid residues) of the informational content of overlapping genes. using information theory indices and statistical methods of sequence analysis, the constraints acting on overlapping coding regions were quantitatively evaluated and compared to those occurring in the nonoverlapping regions belonging to the same viral genomes. results obtained from the information theory approach and those derived from codon usage and amino-acid composition correlation analyses are discussed in terms of evolution of overlapping genes.

the nucleotide sequences of the complete genomes of three prokaryotic (φx174, g4, and α3 coliphages), five animal (two avian hepatitis b viruses and three different strains of lentivirus human immunodeficiency type 1), and four plant viruses (beet and barley luteoviruses, turnip and eggplant tymoviruses), all containing a large density of overlapping coding regions, were selected from the embl database (rice et al. 1993). the genomic map of the five virus families is reported in fig. 1. divergence from randomness at the level of mono- and dinucleotide composition was evaluated, respectively, by the informational indices d1 and d2 (gatlin 1968):

$$ h = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad d_1 = h_{max} - h = \log_2 n + \sum_{i=1}^{n} p_i \log_2 p_i $$

where n is equal to 4 (number of symbols in the genetic language) and p_i is the relative frequency of base "i" in a sequence under examination. entropy h is measured in bits per symbol and its maximum value, h_max, corresponds to a 25% frequency for each base, equalling 2 bits/symbol.
the d1 index represents the divergence from maximum entropy due to constraints on mononucleotide composition.

$$ d_2 = -\sum_{i} p_i \log_2 p_i + \sum_{i}\sum_{j} p_{ij} \log_2 \frac{p_{ij}}{p_i} $$

where p_ij is the relative frequency of dinucleotide "ij" in a sequence under examination. the absolute frequency of dinucleotides is calculated by moving along the sequence with steps corresponding to one nucleotide position. the d2 index measures the divergence from an independent ordering of bases, thus accounting for the constraints acting on dinucleotide composition. therefore, for a random sequence with no order at any level we would expect values of d1 and d2 indices nearly equal to zero.

the additional step of our analysis takes into account the comparison, at the level of both codon usage and amino-acid composition, between overlapping and nonoverlapping coding regions. correlation analysis of the codon choice was carried out using both the relative synonymous codon usage (rscu) index (sharp and li 1987) and the pearson correlation coefficient r. the rscu value for each of the 59 degenerate codons was calculated as follows:

$$ rscu = \frac{d \cdot n_{codon}}{n_{aminoacid}} $$

where n_codon is the total number of times a given codon is used in a given coding region, n_aminoacid is the absolute frequency for the amino acid specified by that codon and its synonyms, and d is the degeneracy of that amino acid (when all synonyms are used with equal frequencies, a rscu value of 1 for each codon is expected). a set of rscu values obtained from a given overlapping gene was then compared with that of the nonoverlapping regions of the same viral genome by means of the pearson correlation coefficient (r), whose values, ranging from −1 to 1, reflect a completely discordant or concordant degree in the usage of synonymous codons, respectively. at the level of composition in amino acid residues, the degree of similarity between proteins encoded by overlapping and nonoverlapping genes was evaluated by the chi-square test (snedecor and cochran 1967).
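the two information indices can be illustrated with a short python sketch. it is a minimal example, not the authors' program: the function names are hypothetical, d2 is taken as the single-base entropy minus the entropy of a base conditional on its predecessor (one common way of writing gatlin's index), and the sequence is assumed to contain only the four bases a, c, g, and t.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: no ambiguity codes in the input


def entropy(probs):
    """Shannon entropy in bits; zero-frequency symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def d1_d2(seq):
    """Return (D1, D2) for a DNA sequence.

    D1 = log2(4) - H, the divergence from equiprobable bases.
    D2 = H - H_M, where H_M is the entropy of a base given the preceding
    base, estimated from dinucleotides read with a one-base step.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")

    mono = Counter(seq)
    h = entropy(mono[b] / len(seq) for b in BASES)
    d1 = math.log2(len(BASES)) - h

    # Overlapping dinucleotides: positions (0,1), (1,2), (2,3), ...
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    first = Counter()  # how often each base opens a dinucleotide
    for pair, count in pairs.items():
        first[pair[0]] += count

    h_m = 0.0
    for pair, count in pairs.items():
        p_ij = count / total
        p_i = first[pair[0]] / total
        h_m -= p_ij * math.log2(p_ij / p_i)

    return d1, h - h_m
```

for a strictly periodic sequence such as "acgt" repeated many times, the base composition is uniform (d1 = 0) while the order is fully determined (d2 = 2 bits), which is the extreme form of the "low d1, high d2" combination discussed in the text.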
data were arranged in a 2 × 2 contingency table to identify amino acid residues whose frequency of occurrence in proteins encoded by overlapping genes is significantly higher than that observed in the nonoverlapping counterpart.

from each of the 12 viral genomes under examination, two sets of data, including overlapping and nonoverlapping genes, respectively, were obtained and the constraints acting on base composition and base ordering were evaluated, respectively, by the d1 and d2 indices, whose values are reported in table 1. in eight cases, including α3, g4, and φx174 coliphages (bacalpha, mig4xx, phix174), duck and heron hepatitis b viruses (hbdgenm, hbhcg), and strains of hiv-1 lentivirus family (hivbrucg, hivcam1, hivndk), the d1 value of overlapping regions was found to be smaller than the d1 value of the nonoverlapping part. in two cases, corresponding to barley and beet luteoviruses (bydcg, bwyvfl1), the d1 values appeared to be similar and near to zero. the exception was represented by the family of eggplant and turnip tymoviruses (tymvcg, mtyrpvp), with a d1 value of the overlapping sequences higher than the d1 value of the nonoverlapping ones. moreover, the d1 value considerably different from zero obtained from both types of coding regions in plant tymoviruses reflects the highest divergence from a random base composition in the set of sequences considered in our analysis. when analyzed with respect to the divergence from an independent ordering of bases (table 1), all the overlapping sequences exhibited, with the exception of tymoviruses, a higher d2 index value, as compared with the nonoverlapping counterpart. the graphical representation (fig. 2) of the average values of the d1 and d2 indices, calculated by grouping the 12 viruses under examination in the five corresponding families (coliphage, hepatitis b virus, hiv-1 lentivirus, luteovirus, and tymovirus), led to the identification of two different informational patterns in the viral coding sequences. the first pattern is characterized by a clear tendency to possess, in the regions of overlap, a more uniform nucleotide composition (a lower d1 value) and more severe constraints in base ordering (a higher d2 value), with respect to the nonoverlapping regions lying in the same genome. it includes four of the five families considered in this study (coliphage, hepatitis b virus, hiv-1 lentivirus, and luteovirus). in the second pattern, which appears to be limited to the family of tymoviruses, both regions show, instead, similar compositional constraints, as evidenced by a slight variation of the corresponding d2 values.

the additional step of our analysis concerned the relationship between the frequencies of synonymous codons in overlapping or nonoverlapping genes. for each of the five virus families, the nonoverlapping regions belonging to the corresponding members were combined into a single entity, while the two frames of each overlapping gene arrangement were considered as two separate [...] (table 2) were then characterized, thus increasing the statistical relevance of our analysis. for example, the nonoverlapping set of tymoviruses (see the genomic map of tymoviruses in fig. 1) includes the coat gene and the nonoverlapping fraction of replicase gene of both eggplant and turnip virus. the overlapping regions of tymovirus family were considered, instead, as two distinct sets of data, the one including the overlapping region of replicase gene, the other the 69-kd protein gene.
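the codon usage comparison described in the methods can be sketched in python as follows. this is an illustration under stated assumptions, not the authors' code: the standard genetic code is assumed, all function names are hypothetical, and codons whose amino acid is absent from a region are skipped in the correlation (the source does not say how such cases were handled).

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

FAMILIES = {}  # amino acid -> list of its synonymous codons
for _codon, _aa in CODON_TABLE.items():
    FAMILIES.setdefault(_aa, []).append(_codon)

# The 59 codons with at least one synonym (no Met, Trp, or stop codons).
DEGENERATE = sorted(c for c, aa in CODON_TABLE.items()
                    if aa != "*" and len(FAMILIES[aa]) > 1)


def rscu(cds):
    """RSCU per degenerate codon; None when the amino acid never occurs."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codon in DEGENERATE:
        family = FAMILIES[CODON_TABLE[codon]]
        n_aminoacid = sum(counts[c] for c in family)
        d = len(family)
        values[codon] = d * counts[codon] / n_aminoacid if n_aminoacid else None
    return values


def pearson(x, y):
    """Pearson r for two equal-length lists (fails if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def codon_usage_correlation(cds_a, cds_b):
    """Correlate RSCU values over codons defined in both regions."""
    ra, rb = rscu(cds_a), rscu(cds_b)
    shared = [c for c in DEGENERATE if ra[c] is not None and rb[c] is not None]
    return pearson([ra[c] for c in shared], [rb[c] for c in shared])
```

in use, `cds_a` would be an overlapping frame read in its own phase and `cds_b` the concatenated nonoverlapping coding regions of the same virus family.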
the subsequent correlation analysis (table 2 ) evidenced that six overlapping genes exhibited a choice of synonymous codons highly different from that occurring in the corresponding nonoverlapping genes. they include the a, c, e, and k genes of coliphage family, the tat gene of lentivirus, and the vpg gene of luteovirus, all exhibiting an r value near to zero. the highest degree of relationship was found in the overlapping genes encoding the replicase and the 69-kd protein of tymoviruses, as evidenced by an r value of 0.90 and 0.74, respectively. more generally, when the r mean values of the virus families were considered (table 2) the overlapping regions related to the first informational pattern (coliphage, hepatitis b virus, hiv-1 lentivirus, and luteovirus families) exhibited a very low correlation with the usage of synonyms in the nonoverlapping counterpart, as documented by a range of variation from 0.24 in coliphage to 0.38 in hepatitis b virus. in contrast, a much higher relationship between overlapping and nonoverlapping regions (an r mean value of 0.82) was found in the family of tymoviruses, representing the alternative informational pattern. the statistical analysis testing a difference in the composition of amino-acid residues between each of the 24 overlapping frame encoded proteins and the corresponding nonoverlapping counterpart was performed by the chi-square contingency-table test. data were arranged in a 2 × 2 table whose a, b, c, d values correspond to the content of a given amino-acid residue in a given overlapping frame (a), in the nonoverlapping frames (b), and to the total amount of the other amino acid residues in the same overlapping frame (c) and in the nonoverlapping frames (d). the counting of the chi-square values above the 3.8 cutoff (p < 0.05 for 1 degree of freedom), expressing a significantly higher content of amino-acid residues in the overlapping genes, led to the general representation shown in fig. 3 . 
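the 2 × 2 test described above can be written out in a few lines of python. the sketch assumes the uncorrected pearson chi-square statistic (the source does not state whether a continuity correction was applied), and the function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square statistic (1 degree of freedom).

    a: count of the residue in the overlapping frame
    b: count of the residue in the nonoverlapping frames
    c: count of all other residues in the overlapping frame
    d: count of all other residues in the nonoverlapping frames
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


def enriched_in_overlap(a, b, c, d, cutoff=3.8):
    """True when the residue is significantly MORE frequent in the overlap.

    The statistic itself is two-sided, so the direction is checked
    separately by comparing the two relative frequencies.
    """
    higher = a / (a + c) > b / (b + d)
    return higher and chi_square_2x2(a, b, c, d) > cutoff
```

for instance, a residue making up 30 of 100 positions in an overlapping frame against 50 of 1000 positions in the nonoverlapping frames gives a statistic of about 84, well above both the 3.8 cutoff and the 30.0 threshold used for table 3.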
it appears that the amino acid composition bias within overlapping genes can be mainly ascribed to amino-acid residues with the highest level of codon degeneracy (e.g., arginine, serine, and leucine residues are expressed by six synonymous codons each and proline residue by four synonyms). this finding was also corroborated when considering the highest compositional differences. as summarized in table 3, out of a total of ten chi-square values higher than 30.0 (p < 0.00001), four were ascribed to arginine, three to leucine, two to proline, and one to methionine residues.

it has recently been proposed (keese and gibbs 1992) that overlapping gene arrangements may arise de novo, thus encoding novel specialized proteins; it has also been hypothesized that a new gene arisen in this way will have an unusual codon usage and will encode a protein with biased physicochemical properties. taking into account these observations, we have analyzed the informational content of viral overlapping genes at different levels of complexity. the use of information theory indices shows that viral sequences, albeit deriving from different sources, can be referred to two distinct patterns. considering that (luo et al. 1988) "the smallness of d1 represents the abundance of vocabulary and the largeness of d2 represents the clarity of grammatical rules," the informational measures of overlapping sequences related to the first pattern (a low d1 value, a high d2 value, see fig. 2) suggest a level of genetic information storage closely resembling natural languages. the occurrence of these constraints in coliphage, hepatitis b virus, hiv-1 lentivirus, and luteovirus also reflects a peculiar pattern in the usage of synonymous codons for most of the corresponding overlapping frames (table 2). since the most striking difference in the choice of synonyms concerns the family of coliphages, some speculations on the origin of its overlapping genes (see the genomic map shown in fig.
1) can be proposed. for example, the codon usage pattern of overlapping frames encoding the structural ''scaffolding'' b and d proteins is well correlated (an r value of 0.55 and 0.68, respectively) with that of the nonoverlapping genes, which encode the similarly structural j, f, g, and h proteins. this relatively high degree of correlation suggests an ancient origin for the b and d frames. on the other hand, the highly peculiar choice of synonyms in the genes e and k, which are entirely embedded within the d and a/c genes and exhibit an r value of 0.02 and 0.03, respectively, supports the idea of a more recent acquisition. since a low expression of the gene e is necessary and sufficient to induce lysis of the host cell (blasi et al. 1990 ), its peculiar codon usage pattern could represent a mechanism to regulate the rate of translation, thus preventing premature lysis of the host. a low expression during infectious cycle could also be required for the gene k, whose regulative role consists in increasing the burst size of phage production (gillam et al. 1985) . the codon usage pattern of the region of gene a which overlaps both b and k genes (an r value of 0.13 with the nonoverlapping frames) contrasts with that found in the nonoverlapping part of the gene a (an r value of 0.91). this observation suggests that a shorter gene a originally terminated in close proximity to a preexisting gene b and that the present overlapping arrangement evolved by a new termination codon of the gene a beyond the gene b. in a similar way, the codon usage of the overlapping fraction of gene c (an r value of 0.01 with the regions of nonoverlap) markedly differs from that occurring in the nonoverlapping region (an r value of 0.41). 
considering that the gene c of φx174, g4, and α3 coliphages contains a second in-phase atg codon localized in the nonoverlapping region, we predict an originally shorter gene c which evolved using as initiator codon an upstream atg localized at the end of the gene a.

fig. 3. frequency of occurrence of chi-square values above the 3.8 cutoff value, which reflect a significantly higher content of individual amino acid residues in proteins encoded by overlapping genes.

the alternative pattern revealed by the information [...] is preserved in the overlapping regions (a = 21%, t = 23%, g = 16%, c = 40%), and this excess of c residues tends to be clustered in the third base position of codons. in fact, a 49% content of c residues occurs in the third base position of nonoverlapping regions, a 44% content in the replicase protein (rp) overlapping frame, and a 36% content in the 69-kd protein overlapping frame. the high degree of relationship in the codon usage (an r mean value of 0.82) likely suggests that, at variance with the case of φx174, the infectious cycle of tymoviruses may require, in all coding regions, a more uniform adaptation to the translational machinery of the host. tymoviruses infect various members of the cruciferae (e.g., brassica rapa, arabidopsis thaliana) and they accumulate in leaves (bozarth et al. 1992), where the highly expressed genes code for the small subunit of ribulose 1,5-bisphosphate carboxylase and for the chlorophyll a/b-binding protein (murray et al. 1989). interestingly, the third base positions of the coding regions of these latter genes show a frequency of c residues (40%) similar to that occurring in all different frames of tymoviruses.
since the base frequency at third degenerate position in nonoverlapping regions (a = 16%, t = 21%, c = 49%, g = 14%) is closely similar to rp overlapping frame (a = 19%, t = 18%, c = 44%, g = 19%) and contrasts with 69-kd protein frame (a = 24%, t = 29%, c = 36%, g = 11%), we can also predict that this latter overlapping gene arose later by superimposition on a preexisting rp gene. the statistical analysis of the amino acid composition evidenced that the peculiar amino-acid usage occurring in overlapping genes can be mainly ascribed to a significantly higher frequency of amino acid residues having the highest level of codon degeneracy (fig. 3). for example, the high content of leucine and arginine residues in the overlapping region of lentivirus env gene (table 3) is related to a very peculiar choice of synonyms (leu/cta 4.9%; leu/tta 16.9%; leu/ctc 29.0%; arg/aga 31.9%; arg/cgc 17.0%), when compared with that occurring in the lentivirus nonoverlapping regions (leu/cta 18.6%; leu/tta 34.2%; leu/ctc 6.5%; arg/aga 65.0%; arg/cgc 0.7%). therefore, a localized high frequency of both leucine and arginine residues combined with a strongly different strategy of codon usage within the ancestral env gene frame can be hypothesized as a basic event to originate the tat, rev, and vpu overlapping frames. some overlapping genes exhibiting a strongly preferred occurrence of leucine or arginine residues (table 3) have previously been demonstrated to perform a crucial function in the viral life cycle. for example, the high content of leucines in the overlapping e protein of coliphages lies within a transmembrane domain that is required to determine escherichia coli cell lysis (buckley and hayashi 1986). the high frequency of arginines in the overlapping fraction of the core antigen of hepatitis b virus corresponds to a carboxyl-terminal signal that is involved in nuclear targeting of the protein (eckhardt et al. 1991).
a similar function has been ascribed to the polyarginine motifs of the rev protein of the hiv-1 lentiviruses (kubota et al. 1989 ). it has also been demonstrated (zapp et al. 1991 ) that the run of arginines in the center of rev protein is involved in the recognition, and nucleocytoplasmic transport, of unspliced viral mrnas. these data support the notion that the high frequency of amino-acid residues with a high level of codon degeneracy, which appears to be a peculiar feature of overlapping genes, can be viewed as a valuable tool with which to achieve a more flexible strategy in the choice of synonymous codons and/or to gain new specialized functions in the viral life cycle. binding of mammalian ribosomes to ms2 phage rna reveals an overlapping gene encoding a lysis function overlapping genes in bacteriophage x174 translational efficiency of x174 lysis gene e is unaffected by upstream translation of the overlapping gene d reading frame expression of orf-69 of turnip yellow mosaic virus is necessary for viral spread in plants characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot lytic activity localized to membranespanning region of x174 e protein biosynthesis of the reverse transcriptase of hepatitis b viruses involved de novo translational initiation not ribosomal frameshifting hepatitis b virus core antigen has two nuclear localization sequences in the arginine-rich carboxyl terminus the information content of dna gene k of bacteriophage x174 codes for a protein which affects the burst size of phage production informational parameters of an exact dna base sequence signals for ribosomal frameshifting in the rous sarcoma virus gag-pol region origins of genes: ''big bang'' or continuous creation? 
functional similarity of hiv-i rev and htlv-i rex proteins: identification of a new nucleolar-targeting signal in rev protein diversity of coding strategies in influenza viruses informational parameters of nucleic acid and molecular evolution evolution of overlapping genes codon usage in plant genes overlapping genes the embl data library polycistronic animal virus mrnas degeneracy of the information contained in amino acid sequences: evidence from overlaid genes the codon adaptation index, a measure of directional synonymous codon usage bias, and its potential applications semantic and syntactic patterns in the genetic language. in: colwell rr (ed) biomolecular data. a resource in transition protein constraints induced by multiframe encoding overlapping genes and information theory statistical methods molecular evolution in papova viruses and bacteriophages searching for codes in the sequences. in: colwell rr (ed) biomolecular data. a resource in transition hiv expression strategies: ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems do overlapping genes violate molecular biology and the theory of evolution? oligomerization and rna binding domains of the type i human immunodeficiency virus rev protein: a dual function for an arginine-rich binding motif we are grateful to professor franco conterio for support and encouragement. we also appreciate critical readings of the manuscript by simone ottonello and elena maestri. this work was supported by the national research council of italy and by the ministry of university and scientific and technological research. 
key: cord-273910-fna7s9te authors: bochud, pierre-yves; bochud, murielle; telenti, amalio; calandra, thierry title: innate immunogenetics: a tool for exploring new frontiers of host defence date: 2007-07-20 journal: lancet infect dis doi: 10.1016/s1473-3099(07)70185-8 sha: doc_id: 273910 cord_uid: fna7s9te the discovery of innate immune genes, such as those encoding toll-like receptors (tlrs), nucleotide-binding oligomerisation domain-like receptors (nlrs), and related signal-transducing molecules, has led to a substantial improvement of our understanding of innate immunity. recent immunogenetic studies have associated polymorphisms of the genes encoding tlrs, nlrs, and key signal-transducing molecules, such as interleukin-1 receptor-associated kinase 4 (irak4), with increased susceptibility to, or outcome of, infectious diseases. with the availability of high-throughput genotyping techniques, it is becoming increasingly evident that analyses of genetic polymorphisms of innate immune genes will further improve our knowledge of the host antimicrobial defence response and help in identifying individuals who are at increased risk of life-threatening infections. this is likely to open new perspectives for the development of diagnostic, predictive, and preventive management strategies to combat infectious diseases. environmental and host factors are important determinants of susceptibility to infection. in recent years, a rapidly growing body of evidence has underscored the importance of host genetic factors. the effect of genetic and environmental factors on the risk of death was assessed in a study of 960 adoptees. 1 death of a biological parent (but not of an adoptive parent) from infection before the age of 50 years resulted in a six times increase in the relative risk of dying from infection in the adoptee, strongly suggesting that susceptibility to infection aggregates in families.
individuals who are heterozygous for haemoglobin s are known to be protected against malaria, whereas homozygous individuals have sickle-cell anaemia. 2 the high frequency of sickle-cell anaemia and other red blood cell disorders in regions where malaria is highly prevalent suggests that infectious agents (eg, plasmodium falciparum) can exert quite substantial selective pressure on human populations. 3 although natural immunity ensures survival of the species as a whole, individuals themselves are not likely to be immunocompetent to all pathogens, and individual differences in susceptibility to specific pathogens are quite common. 4 the development of the human genome project in 1990 propelled the scientific community into a new era, allowing genetic mapping and the development of large-scale gene identification that has greatly facilitated the study of gene polymorphisms. we review recent advances in the field of innate immunogenetics of host defences and show how an interdisciplinary approach of combining genetic epidemiology, genetics, genomics, and molecular and cellular biology will improve our understanding of the pathogenetic basis of infectious diseases, and help the development of new preventive and therapeutic treatment strategies. little inter-individual variation exists within the human genome. in fact, all genetic differences between individuals are estimated to be caused by variability in 3 million bp, which represent about 0·1% of the human genome. 5 since the mutation rate in mammalian genomes is low (10⁻⁹ per bp per year), most interindividual variations are inherited. the most frequent variation is the single nucleotide polymorphism (snp), which occurs on average every 1300 bp. another type of genetic mutation is the variable number of tandem repeats (vntr); vntrs consist of repeats of sequences ranging from a single basepair to thousands of basepairs.
6 the term microsatellite is used for repeats of one to six nucleotides, whereas repeats of longer units are called minisatellites (seven to 100 nucleotides) or, in the extreme case, satellite dna (more than 100 nucleotides). since the number of repeats varies among individuals, vntrs have been widely used as genetic markers. within a coding region of a gene, an snp can either induce an aminoacid change (non-synonymous snps) or not (synonymous snps). snps may be located in the promoter region of a gene and therefore influence gene expression or splicing. similarly, different lengths of vntr regions have been associated with differential gene expression. 7 certain snps or vntr alleles, or both, may be linked together so that non-functional polymorphisms can be used as genetic markers of functionally important mutations. only 1·5% of snps are thought to be located in a coding region of a gene. the functions of nearly all snps that are located outside gene-coding or regulatory regions are unknown. in recent years, snp genotyping technologies with high throughput and affordable costs have become available. these technologies are based on a few basic biochemical reactions (hybridisation, pcr with differential primer extension, specific ligation, and differential cleavage), which are used on different support media and can be detected by different methods (figure 1). 8 recent high-throughput technologies allow genotyping at low cost (ie, a few cents per snp per sample). 9 once markers have been typed, two main approaches can be used to analyse them: single marker analysis or haplotype analysis. a haplotype refers to the arrangement of two or more alleles on the same chromosome. currently, there is much debate about which approach is the most appropriate. studies have proposed that the underlying structure of the human genome can be described by use of a relatively simple framework in which the data are parsed into a series of discrete haplotype blocks. 
10, 11 this observation has led to the development of haplotype tagging methods that aim to capture the haplotype structure in a candidate region. 11 haplotype tagging refers to the concept that most of the haplotypic structure in a particular chromosomal region can be captured by genotyping a smaller number of markers than all of those that constitute the haplotypes. the crucial markers to type would be the minimum set of markers that unambiguously identify each possible haplotype. the detection and estimation of familial aggregation is usually the first step in the genetic analysis of a trait. once familial aggregation has been documented, the traditional approach has been to narrow down the genetic region of interest by use of linkage analysis, followed by fine mapping and association studies (table 1). linkage and association studies are based on the same underlying principle: once a mutation occurs on a particular chromosome, it is subsequently transmitted to offspring together with nearby loci. this association is broken down at each successive generation by recombination (ie, homologous chromosomes pair during the meiotic cell division and exchange genetic material). when two loci are close enough on the same chromosome that their alleles cosegregate when passed on to the next generation, we say that the two loci are linked. 12 linkage disequilibrium refers to allelic association that is caused by linkage, or in other words, that has not yet been broken up by recombination. 12 an association between two loci, such as the non-independence of alleles at these loci, may be caused not only by linkage, but also by factors such as population stratification or chance. population stratification refers to the situation in which study participants are selected from genetically different subpopulations. 
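the haplotype tagging idea described above (typing the minimum set of markers that unambiguously identifies each haplotype) can be illustrated with a small brute-force search. the following is only a minimal sketch under stated assumptions: the function name `minimal_tag_set` and the four example haplotypes are hypothetical and are not taken from the studies cited here, and practical tagging methods work on haplotype frequencies and linkage disequilibrium estimates rather than on exhaustive enumeration.

```python
from itertools import combinations


def minimal_tag_set(haplotypes):
    """Return the smallest tuple of marker positions that tells all haplotypes apart.

    haplotypes: equal-length strings, one character per marker (hypothetical input format).
    Brute force: try marker subsets of increasing size until every distinct
    haplotype maps to a distinct pattern on the chosen markers.
    """
    distinct = sorted(set(haplotypes))
    n_markers = len(distinct[0])
    if any(len(h) != n_markers for h in distinct):
        raise ValueError("all haplotypes must cover the same markers")
    for size in range(n_markers + 1):
        for subset in combinations(range(n_markers), size):
            patterns = {tuple(h[i] for i in subset) for h in distinct}
            if len(patterns) == len(distinct):
                return subset
    # Not reached: the full marker set always separates distinct haplotypes.
    return tuple(range(n_markers))


# Hypothetical example: four haplotypes observed over five SNPs.
example_haplotypes = ["AACGT", "AATGT", "GACGC", "GATGC"]
tags = minimal_tag_set(example_haplotypes)
print(tags)  # positions (0-based) of the markers that would need to be typed
```

in this toy example two of the five markers are enough to identify every haplotype, because the remaining markers are either invariant or carry the same information as a tagged marker. the exhaustive search grows exponentially with the number of markers, so it is only meant to make the definition concrete.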
population stratification will only lead to a spurious association (and hence be a confounder) if both the allele and disease frequencies differ across subpopulations. 13 some researchers have argued that too much emphasis has been put on this issue and surprisingly few examples can be found that unequivocally show that population stratification has led to a spurious association. 14, 15 whereas linkage and association studies can be done in families, only association studies can be done in unrelated cases and controls (table 1). the main difference between related and unrelated cases is the number of meiotic events that separate them, so that unrelated cases share a much shorter chromosomal segment around a particular causative mutation than related cases. linkage and association can be obscured by incomplete penetrance (ie, there is no one-to-one correspondence between genotype and phenotype), misdiagnoses, genetic heterogeneity (several genes can produce a similar phenotype), phenocopies (ie, environmental factors mimicking the effect of certain genes), and disease … an important advance toward enabling efficient whole-genome-scan association studies is the determination of linkage disequilibrium patterns on a genome-wide scale through the hapmap project. 16 because most diseases are likely to be influenced by several genes and environmental factors, the analysis of gene-gene interactions (epistasis) and gene-environment interactions will represent an important task in the future, but this is, and will remain, a challenging issue for the years to come. the innate immune system assumes an essential role in the natural host defences against microbes. 
the recognition of microbial pathogens, either in tissue in contact with the host's environment or in the systemic circulation after invasion of the bloodstream, is done by macrophages, dendritic cells, natural killer cells, granulocytes, and monocytes, which act as sentinels of the innate immune system (figure 2). this process involves coordinated action of several families of proteins, such as toll-like receptors (tlrs), 17 nucleotide-binding oligomerisation domain (nod)-like receptors (nlrs), 18, 19 rna helicase-containing proteins, 20 and the c-type lectins. 21 tlrs are essential components of the innate immune system. 17, 22, 23 tlrs are type i transmembrane proteins that function as homodimers or heterodimers. the extracellular domain comprises multiple leucine-rich repeat structures that vary among different tlrs and are implicated in the selective recognition of a vast range of microbial-associated molecular patterns (mamps). 24 so far, 12 members of the tlr family have been identified in mammals. several molecules, including cd14, 25 cd36, 26 and md2, 25 have also been shown to participate in the sensing of microbial products and are therefore integral components of these receptor complexes. binding of microbial products to microbial-recognition molecules activates signal transduction pathways and the transcription of immune genes that code for costimulatory molecules expressed at the cell surface or for immunoregulatory effector molecules (including cytokines and chemokines) released in the extracellular milieu that orchestrate the host innate immune defence response. 23, 27 in addition to lipopolysaccharide of gram-negative bacteria, 28, 29 tlr4 detects other mamps that are structurally unrelated to lipopolysaccharide, such as mannan (candida albicans) or the fusion protein of respiratory syncytial virus (figure 3). 
other endogenous ligands, including fibrinogen, fibronectin, hyaluronic acid, heparin sulphate, beta-defensins, or heat-shock proteins, have been reported to activate tlr4. 17 however, endotoxin contamination has been argued to account for the tlr4 specificity of some of these putative tlr ligands. 30

figure legend: microbial-associated molecular patterns are recognised by transmembrane receptors (1: eg, toll-like receptors [tlrs]), which trigger the activation of several signal-transducing pathways, leading to the production of cytokines and expression of costimulatory molecules. cytokines induce and regulate the inflammatory response and orchestrate the adaptive immune response. by contrast with other tlrs, tlr3, tlr7, tlr8, and tlr9 are expressed mainly in the endosomal compartment (2), where local acidification is required for recognition of microbial products by their cognate receptors. intracellular pathogens or microbial products released intracellularly after lysis of ingested microorganisms may also interact with intracytoplasmic receptors, such as nucleotide-binding oligomerisation domain-like (nlr) proteins (3), or the rna helicase-containing molecules (4: rig-i or mda5). tcr=t-cell receptor.

figure legend: tlr4 detects lipopolysaccharide (lps), mannan (candida albicans), and the fusion protein of the respiratory syncytial virus. tlr2 forms a heterodimer with either tlr1 to detect triacyl lipopeptide or tlr6 to detect diacyl lipopeptide and zymosan. tlr2 is also involved in the recognition of lipoteichoic acid (lta), peptidoglycan (pg), lipoarabinomannan (lam), porins (neisseria spp), glycosylphosphatidylinositol mucin (trypanosoma spp; tgpi), and the haemagglutinin protein (ha, measles virus). tlr3, tlr7, tlr8, and tlr9 are located in the endosomal compartment and detect nucleic acids and/or haemozoin (plasmodium spp, tlr9). through their intracellular domain, tlrs interact with specific adaptor proteins, including the myeloid differentiation primary response protein 88 (myd88), the tir domain-containing adaptor protein (tirap), the tir domain-containing adapter inducing interferon (trif), and the trif-related adapter molecule (tram). these adaptors lead to the activation of several transcription factors such as the nuclear factor κb (nfκb), the activating-protein 1 (ap1), and/or the interferon regulatory factors 3 and 7 (irf3/7) that ultimately induce the production of pro-inflammatory mediators. ss=single-stranded. ds=double-stranded.

tlr2 and tlr6 heterodimers detect diacyl lipopeptides, whereas tlr2 and tlr1 heterodimers recognise triacyl lipopeptides. 17 tlr2 has also been proposed to sense lipoteichoic acid, peptidoglycan, lipoarabinomannan, phospholipomannan (c albicans), zymosan (saccharomyces cerevisiae), porins (neisseria spp), glycosylphosphatidylinositol mucin (trypanosoma spp), and the haemagglutinin protein of the measles virus. 17 tlr3, tlr7, tlr8, and tlr9, which are mainly expressed in endosomes, serve to detect viral or bacterial nucleic acids. tlr3 detects double-stranded rna and tlr8 detects single-stranded rna. 17 tlr9 senses dna containing the unmethylated cpg motifs found in bacteria and viruses and the malaria pigment haemozoin. 17 compartmentalisation of tlr3, tlr7, tlr8, and tlr9 thus allows the detection of pathogenic dna and rna within the endosomal compartment, while avoiding the detection of self-dna and mrna. 
23 on binding of cognate ligands, the intracellular toll-interleukin-1 receptor (tir) domain of tlrs recruits and activates different adaptor proteins, including myeloid differentiation primary response protein 88 (myd88), tir domain-containing adaptor protein, tir domain-containing adapter-inducing interferon β (trif; also known as ticam), and trif-related adapter molecule, ultimately leading to the activation of several specific signal-transducing pathways and transcription factors such as nuclear factor κb (nfκb) and activating protein 1 (ap1; figure 3 and figure 4). 40 myd88-dependent signalling pathways (nfκb and ap1) are activated by all tlrs, whereas myd88-independent, trif-dependent signalling pathways (interferon regulatory factor [irf] 3) are activated only by some tlrs (such as tlr3 and tlr4). the observation that different tlrs may activate different signalling pathways with different biological consequences shows that the innate immune system can produce pathogen-specific defensive responses. in addition to the tlrs, the family of proteins comprising nod proteins and the nalps (neuronal apoptosis inhibitor [like] proteins), also known collectively as nlrs or nacht-leucine-rich-repeat-containing proteins, have been shown to have a crucial role in the sensing of microbial products, invasive pathogens, and endogenous host proteins. nlrs are cytosolic proteins composed of three different structural domains, a carboxy-terminal ligand-binding domain consisting of leucine-rich repeats, a nucleotide oligomerisation domain, and an amino-terminal effector domain consisting of various caspase-recruitment domains (card), a pyrin domain, or a baculoviral inhibitor-of-apoptosis repeat. 17, 41 nod1 and nod2 have been shown to recognise specific bacterial peptidoglycan motifs, 42 and to interact with tlr signalling pathways. 
19, 42 nod2 detects muramyldipeptide, a peptidoglycan fraction of gram-positive and gram-negative bacteria, 43 whereas nod1 detects γ-d-glutamyl-meso-diaminopimelic acid, a peptidoglycan fraction found in gram-negative bacteria and in a few gram-positive bacteria (listeria and bacillus spp). 44, 45 on exposure to microbial products, nods activate transcription factors, including nfκb and the mitogen-activated protein kinase, and induce the cleavage of prointerleukin 1β into active interleukin 1β. [46] [47] [48] the nalp subfamily of nlr proteins interact with several adaptor molecules, including asc (apoptosis-associated speck-like protein containing a card domain), caspase 1, and caspase 5, and are essential for the activation of interleukin 1β. 49 the nalp-related protein card12 (also known as ipaf) is involved in salmonella typhimurium-induced activation of caspase 1. 50 nalp3 is implicated in the detection of atp, 51 bacterial rna, 52 and uric acid crystals. 53 however, most nalps are orphan recognition proteins with no known ligands. a series of fascinating articles have provided strong evidence implicating the innate immune system in the host defence against viruses. two intracytoplasmic molecules have been implicated in the detection of viral rna. retinoic-acid-inducible protein i (rig-i) 54 and melanoma differentiation-associated gene 5 (mda5) 55-57 possess a card domain and rna helicase domains that function as sensors of double-stranded rna. 
58 rig-i and mda5 signal through the adaptor molecule mavs (mitochondrial antiviral signalling protein; also known as cardif or visa) 18, 59, 60 and interact with several other signal-transducing molecules, including fadd (tumour necrosis factor receptor superfamily member 6 precursor [tnfrsf6, also known as fas]-associated death domain protein), ripk1 (receptor-interacting serine/threonine-protein kinase 1; also known as rip or rip1), tbk1 (traf family member-associated nf-kappa-b activator [tank]-binding kinase-1), and ikke (inhibitor of nfκb kinase subunit epsilon; also known as ikk-i). 40 these molecules are involved in the production of type i interferons (interferons α and β) in response to infection by rna viruses. therefore, rig-i and mda5 are able to detect double-stranded rna present in the cytoplasmic compartment and thus not accessible to endosomal tlr3. interestingly, rig-i and mda5 can discriminate between different types of viruses. rig-i is essential for the production of interferons in response to paramyxoviruses, influenza virus, and japanese encephalitis virus, whereas mda5 is crucial for detection of picornavirus. 20 a newly described form of innate immunity, termed intrinsic immunity, ensures protection by providing a constitutive, always-on line of defence, relying on intracellular obstacles to hinder the replication of pathogens. 61 this component of the immune system has gained much attention as a cornerstone of the resistance of mammals against several classes of retroelements and retroviruses. 61 among the best studied proteins are the family of apolipoprotein b mrna-editing enzyme catalytic polypeptide 3 (apobec3) proteins, which interfere with the viral lifecycle by incorporating themselves into viral particles, leading to viral dna hypermutation on the next round of infection. 
62, 63 a series of studies involving infection of human cd4+ t cells and macrophages with wild-type hiv-1 and hiv-1 deficient in the vif gene showed that the antiviral effect of abc3g (also known as cem15 or apobec3g) is counteracted by vif. 64 interestingly, in non-human primates, abc3g orthologues provide antiviral activity against wild-type hiv-1, 65 but not their cognate simian immunodeficiency viruses, suggesting that virus permissiveness in different primates results from species-specific differences within vif. 63 one human variant of abc3g has been associated with rapid hiv-1 disease progression. 66 the tripartite motif (trim) family is a well-conserved family of proteins characterised by a structure comprising a ring-finger domain, one or two b-box motifs, and a predicted coiled-coil region. 67 additionally, most trim proteins have additional carboxy-terminal domains. members of the trim protein family are involved in various cellular processes, including cell proliferation, differentiation, development, oncogenesis, and apoptosis. 68, 69 some trim proteins exert antiviral properties. trim5α is reported to restrict retroviral infection by specifically recognising the viral capsid and promoting its premature disassembly. 70 human trim5α has limited efficacy against hiv-1, whereas some primate trim5α orthologues can potently restrict this particular lentivirus. 68, 69 substantial interspecies sequence diversity characterises trim5α and may underlie differences in the pattern and breadth of restriction of multiple lentiviruses. human trim5α variants do not modify susceptibility to hiv-1; however, they change susceptibility to other retroviruses, such as n-tropic murine leukaemia virus. 71 polymorphisms found in trim5α might conceivably have been selected in past epidemics by viruses unrelated to hiv-1. the increasing availability of genomic data allows comparative analyses of genetic sequences involved in innate and intrinsic immunity. 
this approach, also described as evolutionary genomics, identifies the role of adaptive forces on protein-encoding genes by determining signs of positive (diversifying) or negative (purifying) selection. for example, positive selection in the human genome indicates shifts in living conditions experienced by modern human populations, such as different habitats, food sources, population densities, and exposure to pathogens. 72 several families of innate immunity genes have been investigated by use of comparative genomics. vertebrate tlrs are an example of evolutionary conservation that indicates the difficulty for microbes to mutate genes that encode mamps. 73, 74 the cd209 (dc-sign) proteins, a family of c-type lectins that participate in the recognition of various pathogens, display a complex pattern of evolution. whereas cd209 has been under a strong selective constraint that prevents accumulation of aminoacid changes, cd209l (also known as dc-sign2 or dc-signr) exhibits greater variation across human populations. 75 such variations may be tolerated because of the potentially redundant functional activities of the molecules encoded by these genes. 76 the killer-cell immunoglobulin-like receptor (kir) genes encode a family of receptors expressed by natural killer cells, which participate in early responses against infected or transformed cells through production of cytokines and direct cytotoxicity. 77, 78 by contrast with tlrs and cd209, only a small proportion of kir alleles are conserved among primates, showing a rapid species-specific diversification of the kir gene family members and a plasticity of the genomic region that parallels that of the mhc loci. 79 thus, the evolutionary forces driving the genesis of natural killer receptors and their hla ligands represent a concerted response to pathogens. 
finally, a remarkable success of evolutionary genomics in infectious diseases is the identification of protein regions relevant for host-pathogen interactions in hiv-1 infection. comparative analyses of the primate antiretroviral cellular defence genes encoding abc3g and trim5α have revealed the powerful selective pressures emerging from a long-standing battle between retroviruses and their hosts. [80] [81] [82] singular aminoacids or regions (patches) contain key residues that confer on primates the ability to combat hiv-1. given that the innate immune system is at the interface between the host and the pathogen, polymorphisms of innate immune genes are very likely to affect the host susceptibility to infections. since the innate immune system senses only a limited number of highly conserved microbial-associated molecular patterns 23 via a limited number of receptors and signalling molecules, as anticipated, several polymorphisms have been found to confer an increased susceptibility to specific pathogens (table 2, table 3, and figure 4). a study from turkey revealed an association between susceptibility to tuberculosis and an snp (r753q) in the tlr2 gene. 94 14 (9·3%) of 151 tuberculosis patients were homozygous for the minor allele compared with two (1·7%) of 116 healthy controls (odds ratio 6·0, 95% ci 1·3-3·9, p=0·009). of note, r753q was associated with decreased responsiveness to bacterial lipopeptides. 95 the role of a microsatellite polymorphism (gt repeat) in intron 2 of tlr2 has been studied in 176 korean patients with tuberculosis and 196 healthy controls. 7 a shorter gt repeat was found more frequently in tuberculosis patients than healthy individuals (49·4% vs 37·7%, p=0·02), and was associated with weaker promoter activities and lower tlr2 expression in cd14-positive peripheral blood monocytes. 
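the genotype counts quoted above for the turkish tuberculosis study (14 of 151 patients and two of 116 controls homozygous for the minor allele) are enough to recompute a crude odds ratio. the sketch below is an illustration only: it uses the standard cross-product odds ratio with a woolf (log-based) 95% confidence interval, which is an assumption on our part and not necessarily the method used in the cited study, and the function name `odds_ratio_with_ci` is hypothetical. a crude estimate obtained this way need not match a published estimate exactly.

```python
import math


def odds_ratio_with_ci(case_exposed, case_total, control_exposed, control_total, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    'exposed' here means carrying the genotype of interest.
    z=1.96 corresponds to an approximate 95% interval.
    """
    a = case_exposed                      # cases with the genotype
    b = case_total - case_exposed         # cases without the genotype
    c = control_exposed                   # controls with the genotype
    d = control_total - control_exposed   # controls without the genotype
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts as quoted in the text: 14/151 patients vs 2/116 controls.
or_estimate, ci_lower, ci_upper = odds_ratio_with_ci(14, 151, 2, 116)
print(f"odds ratio {or_estimate:.2f}, 95% ci {ci_lower:.2f}-{ci_upper:.2f}")
```

with only two homozygous controls, the interval is necessarily wide, which illustrates why small genetic association studies give imprecise effect estimates; an exact method would be preferable to the log-based approximation for cell counts this small.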
two nonsynonymous snps in the extracellular domain of tlr4 found to be in linkage disequilibrium (d299g and t399i) have been associated with an increased susceptibility to infections caused by gram-negative bacteria, 103, 104 brucella spp, 98 respiratory syncytial virus, 88 and p falciparum. 99 individuals heterozygous for the d299g and t399i snps were hyporesponsive to lipopolysaccharide as measured by bronchospastic response after inhalation of endotoxin. 31 furthermore, airway epithelial cells isolated from heterozygous individuals had deficient response to lipopolysaccharide, suggesting that d299g and t399i acted in a dominant fashion with respect to the wild-type allele. 31 however, monocytes and whole blood isolated from heterozygous individuals did not show abnormal responses to lipopolysaccharide, 127, 128 suggesting that the effects of these mutations may vary between cell types. although two studies had shown that d299g, 103 or d299g and t399i, 104 were associated with an increased risk of gram-negative infections or septic shock, three subsequent studies did not find an association in patients with meningococcal sepsis. [120] [121] [122] however, in one study, rare heterozygous missense mutations of tlr4 were linked with the development of meningococcal disease. 120 unexpectedly, d299g and t399i were associated with decreased rather than increased susceptibility to legionella pneumophila infection. 32 a stop codon polymorphism in tlr5 (r392stop), shown to abolish the ability of tlr5 to detect bacterial flagellin, has been associated with increased susceptibility to pneumonia caused by l pneumophila. 123 several studies have shown associations between mutations in genes encoding several proteins of the tlr signalling pathways (irak4, 33, 34 ikbkg, [36] [37] [38] and iκbα 35, 129) and rare inherited immunodeficiencies. 
complete recessive interleukin-1 receptor-associated kinase 4 (irak4) deficiency is characterised by recurrent infections with pyogenic bacteria at an early age that tend to disappear over time. by contrast, mutations affecting the other genes result in x-linked (ikbkg) or autosomal-dominant (iκbα) anhidrotic ectodermal dysplasia, which is characterised by increased susceptibility to a broader range of pathogens, such as atypical mycobacteria or pneumocystis jirovecii, and a complex disorder involving impaired development of skin appendages, conical teeth, and hypotrichosis. [36] [37] [38] taken together, these data clearly show that mutations in the genes encoding tlrs and downstream signal-transducing molecules influence innate immune responses and increase susceptibility to many infectious diseases. similarly, polymorphisms of cytokine and cytokine receptor genes, which encode key effector molecules, have also been associated with altered susceptibility to invasive pathogens. 130 polymorphisms in genes encoding nlrs have been shown to influence susceptibility to inflammatory diseases. polymorphisms in nod2 have been associated with susceptibility to crohn's disease. 131 common limitations of genetic association studies are shown in table 4. genetic studies done to date often fail: (1) to properly account for confounding factors (such as lack of information on ethnicity), and selection and information biases (insufficient data on the source population of cases and controls or study endpoints); (2) to present appropriate statistical analyses (such as lack of sample size calculation and correction for multiple testing); and (3) to provide convincing information about biological plausibility. 
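correction for multiple testing, mentioned in point (2) above, can be made concrete with two widely used family-wise procedures. this is a minimal sketch under stated assumptions: the bonferroni and holm procedures are chosen here for illustration and are not prescribed by the studies discussed, the function names are ours, and the four p values are hypothetical numbers, not results from any cited study.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare sorted p values with alpha/m, alpha/(m-1), ...

    Stops at the first p value that fails its threshold; all larger p values
    are then retained as non-significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break
    return rejected


# Hypothetical p values for four polymorphisms tested in the same study.
p_values = [0.001, 0.013, 0.02, 0.30]
bonferroni_result = bonferroni(p_values)
holm_result = holm(p_values)
print(bonferroni_result)
print(holm_result)
```

both procedures control the probability of at least one false-positive association across all tests; holm is never less powerful than bonferroni, which is why the two can disagree on intermediate p values. in genome-wide settings with many correlated markers, other approaches (for example permutation-based thresholds or false discovery rate control) are often preferred.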
as an example, among five studies that assessed the effect of tlr4 polymorphisms on susceptibility to, and outcome of, severe infections, only two included more than 100 patients, 105 two provided information about patients' ethnicity, 105, 106 and only one limited the analysis to a specific ethnic group. 106 comparison of data is often impaired by the fact that apparently similar studies used markedly different control groups and endpoints. [103] [104] [105] [106] [107] proving causality is never trivial. associations are likely to occur when non-causal markers are in linkage disequilibrium with the true disease locus. although the replication of a finding in an independent sample decreases the risk of a false-positive result, the functional significance of the genetic variant should ultimately be shown in biological studies. however, proving biological plausibility may be difficult in view of the limitations of in-vitro studies used as proxy of complex in-vivo biological processes. for example, use of gene-silencing techniques often reduces the biological observation to that of an on/off system, which does not allow the detection of quantitative variations (ie, a dose-response effect) of gene expression or discrete functional alterations. with the increasing use of high-throughput genotyping techniques, the number of genetic associations that will be reported in the years to come will most probably exceed our capacity to do proper functional studies and hence to provide convincing evidence for biological plausibility. 139 functional studies should therefore focus on genetic polymorphisms that exert a strong effect, have been replicated by independent investigators, and have potential diagnostic or therapeutic implications. to limit the importance of positive publication bias, it will be crucial for investigators and journal editors to become less reluctant to publish well-conducted negative studies. 
139 in recent years, innate immunogenetic studies of inherited genetic disorders have provided researchers and clinical investigators with crucial information that has improved our understanding of the host defences against microbial pathogens. table 5 shows examples of the effect of recent discoveries in the field of innate immunogenetics with foreseeable applications for the short, middle, and long term in areas such as vaccine development and predictive and preventive medicine. the persistence or emergence of potentially devastating infectious diseases, such as tuberculosis, malaria, hiv/aids, and, most recently, severe acute respiratory syndrome or avian influenza, underscores the need to develop new vaccines and therapeutic treatment strategies. a better understanding of microbial genomics and genetics and host innate immunogenetics is likely to provide important information for the development of new vaccines. vaccine immunogenicity is determined not only by the chemical and physical nature of microbial antigens and adjuvants, but also by the genetic make-up of vaccine recipients. analyses of polymorphisms of innate immune genes may also help understand why some individuals exhibit suboptimum responses to vaccination. 140 immunosuppression as a result of myeloablative chemotherapy, solid organ or haematological stem-cell transplantation, or corticosteroid therapy for autoimmune diseases represents another clinical condition for which immune gene polymorphisms may help to predict the risk of life-threatening infectious complications. the recent discoveries of genes encoding tlrs, nlrs, and the related signal-transducing molecules have markedly improved our understanding of innate immunity. the availability of high-throughput genotyping techniques opens new perspectives to further improve our understanding of the pathogenesis of infectious diseases and for the development of new diagnostic, predictive, and preventive treatment strategies. 
clinicians and researchers should be aware of the results and far-reaching implications of recent innate immunogenetic studies that have associated genetic polymorphisms with susceptibility to, or outcome of, infectious diseases. collecting dna should now be an integral part of epidemiological or clinical infectious disease studies. national and international consortia should be created to put together large cohort studies to promote and facilitate research in the field. we declare that we have no conflicts of interest. relevant articles for this review were identified by searching medline (1966 to november, 2006) by use of the terms "genetics", "single nucleotide polymorphisms", "toll-like receptors" or "tlrs", "nucleotide-binding oligomerization domain receptors" or "nods", "immunology", and "innate immunity", and by extracting references from these articles. the review was limited to articles published in the english language. genetic and environmental influences on premature death in adult adoptees protection afforded by sickle-cell trait against subtertian malarial infection the immunogenetics of resistance to malaria genetic dissection of immunity to mycobacteria: the human model a map of human genome sequence variation containing 1·42 million single nucleotide polymorphisms microsatellites: simple sequences with complex evolution the association between microsatellite polymorphisms in intron ii of the human toll-like receptor 2 gene and tuberculosis among koreans accessing genetic variation: genotyping single nucleotide polymorphisms single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput the structure of haplotype blocks in the human genome haplotype blocks and linkage disequilibrium in the human genome introduction and overview. 
statistical methods in genetic epidemiology reporting, appraising, and integrating data on genotype prevalence and gene-disease associations population stratifi cation and spurious allelic association counterpoint: bias from population stratifi cation is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer the international hapmap consortium. the international hapmap project pathogen recognition and innate immunity cardif is an adaptor protein in the rig-i antiviral pathway and is targeted by hepatitis c virus signalling pathways and molecular interactions of nod1 and nod2 diff erential roles of mda5 and rig-i helicases in the recognition of rna viruses myeloid c-type lectins in innate immunity toll-like receptors in the induction of the innate immune response inferences, questions and possibilities in toll-like receptor signalling innate immunity: an overview lps, tlr4 and infectious disease diversity cd36 is a sensor of diacylglycerides innate immune recognition defective lps signaling in c3h/hej and c57bl/10sccr mice: mutations in tlr4 gene md-2, a molecule that confers lipopolysaccharide responsiveness on toll-like receptor 4 endotoxin contamination in recombinant human heat shock protein 70 (hsp70) preparation is responsible for the induction of tumor necrosis factor alpha release by murine macrophages tlr4 mutations are associated with endotoxin hyporesponsiveness in humans tolllike receptor 4 polymorphisms are associated with resistance to legionnaires' disease distinct mutations in irak-4 confer hyporesponsiveness to lipopolysaccharide and interleukin-1 in a patient with recurrent bacterial infections pyogenic bacterial infections in humans with irak-4 defi ciency a hypermorphic ikappabalpha mutation is associated with autosomal dominant anhidrotic ectodermal dysplasia and t cell immunodefi ciency a novel x-linked disorder of immune defi ciency and hypohidrotic ectodermal dysplasia is allelic to 
incontinentia pigmenti and due to mutations in ikkgamma (nemo) specifi c missense mutations in nemo result in hyper-igm syndrome with hypohydrotic ectodermal dysplasia specifi c nemo mutations impair cd40-mediated c-rel activation and b cell terminal diff erentiation x-linked anhidrotic ectodermal dysplasia with immunodefi ciency is caused by impaired nf-kappab signaling nod-lrr proteins: role in host-microbial interactions and infl ammatory disease the role of toll-like receptors and nod proteins in bacterial infection host recognition of bacterial muramyl dipeptide mediated through nod2. implications for crohn's disease nod1 detects a unique muropeptide from gram-negative bacterial peptidoglycan an essential role for nod1 in host recognition of bacterial peptidoglycan containing diaminopimelic acid rick/rip2/ cardiak mediates signalling for receptors of the innate and adaptive immune systems role of nod2 in the response of macrophages to toll-like receptor agonists heterotypic interactions among nacht domains: implications for regulation of innate immune responses nlrs join tlrs as innate sensors of pathogens diff erential activation of the infl ammasome by caspase-1 adaptors asc and ipaf cryopyrin activates the infl ammasome in response to toxins and atp bacterial rna and small antiviral compounds activate caspase-1 through cryopyrin/nalp3 goutassociated uric acid crystals activate the nalp3 infl ammasome the rna helicase rig-i has an essential function in double-stranded rna-induced innate antiviral responses the v proteins of paramyxoviruses bind the ifn-inducible rna helicase, mda-5, and inhibit its activation of the ifn-beta promoter expression analysis and genomic characterization of human melanoma diff erentiation associated gene-5, mda-5: a novel type i interferon-responsive apoptosis-inducing gene mda-5: an interferon-inducible putative rna helicase with double-stranded rna-dependent atpase activity and melanoma growth-suppressive properties shared and 
unique functions of the dexd/h-box helicases rig-i, mda5, and lgp2 in antiviral innate immunity visa is an adapter protein required for virus-triggered ifn-beta signaling ips-1, an adaptor triggering rig-i-and mda5-mediated type i interferon induction lentiviral vectors and antiretroviral intrinsic immunity role and mechanism of action of the apobec3 family of antiretroviral resistance factors innate cellular defenses of apobec3 cytidine deaminases and viral counter-defenses isolation of a human gene that inhibits hiv-1 infection and is suppressed by the viral vif protein a single amino acid determinant governs the species-specifi c sensitivity of apobec3g to vif action apobec3g genetic variants and their infl uence on the progression to aids the tripartite motif family identifi es cell compartments trim family proteins: retroviral restriction and antiviral defence control of viral infectivity by tripartite motif proteins specifi c recognition and accelerated uncoating of retroviral capsids by the trim5alpha restriction factor highfrequency persistence of an impaired allele of the retroviral defense gene trim5alpha in humans a map of recent positive selection in the human genome toll-like receptor 5 recognizes a conserved site on fl agellin required for protofi lament formation and bacterial motility the evolution of vertebrate toll-like receptors novel member of the cd209 (dc-sign) gene family in primates the heritage of pathogen pressures and ancient demography in the human innate-immunity cd209/cd209l region mhc class i molecules and kirs in human history, health and survival the impact of variation at the kir gene cluster on human disease single haplotype analysis demonstrates rapid evolution of the killer immunoglobulin-like receptor (kir) loci in primates ancient adaptive evolution of the primate antiviral dna-editing enzyme apobec3g positive selection of primate trim5alpha identifi es a critical species-specifi c retroviral restriction domain patterns of 
evolution of host proteins involved in retroviral pathogenesis polymorphisms in cd14, mannose-binding lectin, and toll-like receptor-2 are associated with increased prevalence of infection in critically ill adults cd14 and tlr4 gene polymorphisms in adult periodontitis combined carriership of tlr9 −1237c and cd14 −260t alleles enhances the risk of developing chronic relapsing pouchitis cd14 promoter polymorphism −159c>t is associated with susceptibility to chronic chlamydia pneumoniae infection in peripheral blood monocytes the cd14 functional gene polymorphism −260 c>t is not involved in either the susceptibility to chlamydia trachomatis infection or the development of tubal pathology association between common toll-like receptor 4 mutations and severe respiratory syncytial virus disease genetic polymorphisms of cd14, toll-like receptor 4, and caspase-recruitment domain 15 are not associated with necrotizing enterocolitis in very low birth weight infants gene variants of the bactericidal/permeability increasing protein and lipopolysaccharide binding protein in sepsis patients: gender-specifi c genetic predisposition to sepsis tlr1 and tlr6 polymorphisms are associated with susceptibility to invasive aspergillosis after allogeneic stem cell transplantation polymorphisms in toll-like receptor 2 are associated with increased viral shedding and lesional rate in patients with genital hsv-2 infection a microsatellite polymorphism in intron 2 of human toll-like receptor 2 gene: functional implications and racial diff erences the arg753gln polymorphism of the human toll-like receptor 2 gene in tuberculosis disease a novel polymorphism in the toll-like receptor 2 gene and its potential association with staphylococcal infection lack of association between toll-like receptor 2 polymorphisms and susceptibility to severe disease caused by staphylococcus aureus heterozygous arg753gln polymorphism of human tlr-2 impairs immune activation by borrelia burgdorferi and protects from 
late stage lyme disease tlr4 polymorphism in iranian patients with brucellosis toll-like receptor (tlr) polymorphisms in african children: common tlr-4 variants predispose to severe malaria common polymorphisms of toll-like receptors 4 and 9 are associated with the clinical manifestation of malaria during pregnancy the toll-like receptor 4 (asp299gly) polymorphism is a risk factor for gram-negative and haematogenous osteomyelitis role of tlr4 receptor polymorphisms in boutonneuse fever human toll-like receptor 4 mutations but not cd14 polymorphisms are associated with an increased risk of gram-negative infections relevance of mutations in the tlr4 receptor in patients with gram-negative septic shock tlr4 and tnf-alpha polymorphisms are associated with an increased risk for severe sepsis following burn injury eff ects of functional toll-like receptor-4 mutations on the immune response to human and experimental sepsis polymorphisms in toll-like receptor 4 and the systemic infl ammatory response syndrome functional gene polymorphisms in aggressive and chronic periodontitis gingival epithelial cells heterozygous for toll-like receptor 4 polymorphisms asp299gly and thr399ile are hypo-responsive to porphyromonas gingivalis gene polymorphisms in pro-infl ammatory cytokines are associated with systemic infl ammation in patients with severe periodontal infections tolllike receptor (tlr) 2 and 4 mutations in periodontal disease toll-like receptor 4 thr399ile polymorphisms are a risk factor for candida bloodstream infection candida-specifi c interferon-gamma defi ciency and toll-like receptor polymorphisms in patients with chronic mucocutaneous candidiasis role of the toll-like receptor 4 asp299gly polymorphism in susceptibility to candida albicans infection relationship between a toll-like receptor-4 gene polymorphism, bacterial vaginosis-related fl ora and vaginal cytokine responses in pregnant women diff erences in infl ammatory cytokine and toll-like receptor genes and 
bacterial vaginosis in pregnancy the toll-like receptor 4 asp299gly variant: no infl uence on lps responsiveness or susceptibility to pulmonary tuberculosis in the gambia innate immunity genes infl uence the severity of acute appendicitis the role that the functional asp299gly polymorphism in the toll-like receptor-4 gene plays in susceptibility to chlamydia trachomatis-associated tubal infertility assay of locus-specifi c genetic load implicates rare toll-like receptor 4 mutations in meningococcal susceptibility variation in toll-like receptor 4 and susceptibility to group a meningococcal meningitis in gambian children a functional polymorphism of toll-like receptor 4 is not associated with likelihood or severity of meningococcal disease a common dominant tlr5 stop codon polymorphism abolishes fl agellin signaling and is associated with susceptibility to legionnaires' disease host susceptibility and clinical outcomes in toll-like receptor 5-defi cient patients with typhoid fever in vietnam polymorphisms in toll-like receptor 9 infl uence the clinical course of hiv-1 infection innate immune receptor genetic polymorphisms in pouchitis: is card15 a susceptibility factor? heterozygous tolllike receptor 4 polymorphism does not infl uence lipopolysaccharide-induced cytokine release in human whole blood monocytes heterozygous for the asp299gly and thr399ile mutations in the toll-like receptor 4 gene show no defi cit in lipopolysaccharide signalling inherited disorders of human toll-like receptor signaling: immunological implications cytokine promoter polymorphisms in severe sepsis host recognition of bacterial muramyl dipeptide mediated through nod2. 
implications for crohn's disease nod2 is a general sensor of peptidoglycan through muramyl dipeptide (mdp) detection card15 mutations in blau syndrome early-onset sarcoidosis and card15 mutations with constitutive nuclear factor-kappab activation: common genetic etiology with blau syndrome both donor and recipient nod2/card15 mutations associate with transplant-related mortality and gvhd following allogeneic stem cell transplantation mutation of a new gene encoding a putative pyrin-like protein causes familial cold autoinfl ammatory syndrome and muckle-wells syndrome chronic infantile neurological cutaneous and articular syndrome is caused by mutations in cias1, a gene highly expressed in polymorphonuclear cells and chondrocytes the bare lymphocyte syndrome and the regulation of mhc expression host genetic determinants in hepatitis c virus infection lipopolysaccharides from distinct pathogens induce diff erent classes of immune responses in vivo this work was supported by grants from the swiss foundation for medical and biological grants (1121 to pyb), the swiss national science foundation (81la-65462 to pyb, 32-49129.96 and 3100-066972.01 to tc), the bristol-myers squibb foundation (to tc), the leenaards foundation (to pyb and tc), and the santos-suarez foundation for medical research (to tc and at). key: cord-010038-0m2f0eh4 authors: caspi, jonathan; amitai, gil; belenkiy, olga; pietrokovski, shmuel title: distribution of split dnae inteins in cyanobacteria date: 2003-11-11 journal: mol microbiol doi: 10.1046/j.1365-2958.2003.03825.x sha: doc_id: 10038 cord_uid: 0m2f0eh4 inteins are genetic elements found inside the coding regions of different host proteins and are translated in frame with them. the intein‐encoded protein region is removed by an autocatalytic protein‐splicing reaction that ligates the host protein flanks with a peptide bond. this reaction can also occur in trans with the intein and host protein split in two. 
after translation of the two genes, the two intein parts ligate their flanking protein parts to each other, producing the mature protein. naturally split inteins are only known in the dna polymerase iii alpha subunit (polc or dnae gene) of a few cyanobacteria. analysing the phylogenetic distribution and probable genetic propagation mode of these split inteins, we conclude that they are genetically fixed in several large cyanobacterial lineages. to test our hypothesis, we sequenced parts of the dnae genes from five diverse cyanobacteria and found all species to have the same type of split intein. our results suggest the occurrence of a genetic rearrangement in the ancestor of a large division of cyanobacteria. this event fixed the dnae gene in a unique two‐genes one‐protein configuration in the progenitor of many cyanobacteria. our hypothesis, findings and the cloning procedure that we established allow the identification and acquisition of many naturally split inteins. having a large and diverse repertoire of these unique inteins will enable studies of their distinct activity and enhance their use in biotechnology. inteins are genetic elements present in protein coding regions. the entire element codes for a protein that is translated together with the coding region of its host gene. the intein protein is removed from the host protein by a protein-splicing reaction that joins the intein flanks with a peptide bond. this reaction is autocatalytic, fully catalysed by the intein and the residue c-terminal to it, with no need for other proteins, atp or such molecules (paulus, 2000). it is typically a cis intramolecular reaction but can also occur when the n- and c-terminal parts of the intein are split and encoded on separate protein chains, each with its own flank. a trans protein-splicing reaction then ligates the flanks of the two intein parts (shingledecker et al., 1998; southworth et al., 1998).
a split intein is naturally present in the dnae genes of a few cyanobacteria (gorbalenya, 1998; kaneko et al., 2001; nakamura et al., 2002). inteins are active with various flanks, in heterologous organisms and in vitro. biochemical studies of the protein-splicing mechanism led to the use of typical and split inteins in diverse biotechnology applications (perler and adam, 2000; ozawa et al., 2001; mootz and muir, 2002). inteins are present in a variety of protein genes from diverse bacteria and archaea and in several eukaryotes. however, intein distribution is extremely sporadic. although inteins are widely distributed, present in the three domains of life, they are relatively rare, and about 160 are currently known (perler, 2002). moreover, their distribution is discontinuous and irregular, with even closely related species differing in intein presence at homologous integration sites (liu, 2000; pietrokovski, 2001). consequently, intein presence is very difficult to predict. intein distribution is the result of two parallel processes. the primary process seems to be the independent, gradual loss of intein elements from separate species during evolution (pietrokovski, 2001). inteins are not known to contribute any advantage to their host genes or species and are believed to be selfish genetic elements (belfort et al., 1995). active selection against inteins seems to be relatively weak because of their apparently negligible disruption of their host genes, protein products and organisms. intein removal from the genome requires a precise dna excision event as they are inserted in highly conserved points of genes coding for essential proteins (derbyshire and belfort, 1998). counteracting this slow extinction is horizontal transfer to specific integration points, e.g. homing (belfort and roberts, 1997).
most intein proteins include a homing endonuclease domain that can mediate insertion of their intein gene into homologous unoccupied intein integration points (gimble and thorner, 1992). hence, homing activity of inteins can reinsert their gene into integration sites from which they were lost. cyanobacteria are a large and diverse group of bacteria. they can be clustered by their 16s rrna sequences into seven monophyletic evolutionary groups (honda et al., 1999). although the exact relation between the groups is not fully certain, each group of species is highly likely to have diverged from a distinct species. the recent genome sequencing of various cyanobacteria prompted us to examine the distribution and origin of the cyanobacterial split dnae inteins. here, we show that split inteins of the dnae gene are common in several groups of cyanobacteria. this gene arrangement seems to be fixed in at least three large and probably related groups that include scores of known species. we cloned split inteins from five diverse species of these groups and thus show how to obtain split inteins from many different species. the distribution of these genes allows us to reconstruct their evolution. we discuss necessary steps in the transition from contiguous to split inteins and whether this transition is likely to be common. we first screened for the presence of split inteins by database searches. dna polymerase iii alpha subunit (polc or dnae protein) was found to be encoded on two genes in several cyanobacteria: synechocystis species pcc 6803 (ssp pcc6803) (kaneko et al., 1996), synechococcus species pcc 7002 (ssp pcc7002) (yu et al., 2002), nostoc species pcc 7120 (nsp pcc7120, previously called anabaena species pcc 7120) (kaneko et al., 2001), nostoc punctiforme (npu) (meeks et al., 2001), thermosynechococcus elongatus bp-1 (tel) (nakamura et al., 2002) and trichodesmium erythraeum (ter, http://genome.jgi-psf.org/draft_microbes/trier/trier.home.html).
the gene organization is identical in all cases. the dnae coding sequences are split at the same highly conserved point into two genes. the 5′ part of the dnae gene (dnae1) is followed by a 5′ part of the split intein, and the dnae gene 3′ part (dnae2) is preceded by a 3′ part of the split intein (fig. 1). the ter dnae1 gene also includes other inteins and group ii introns (not shown). the mature dnae protein in each species is assumed to be ligated by the split intein in a trans protein-splicing reaction from the separately translated dnae1 and dnae2 proteins (wu et al., 1998; perler, 1999). dnae genes were also found in three other species of cyanobacteria, prochlorococcus marinus med4 (pma med4) and mit9313 (pma mit9313) (hess et al., 2001) and synechococcus species wh8102 (ssp wh8102, http://genome.jgi-psf.org/finished_microbes/synw8/synw8.home.html). in all these species, the dnae genes are contiguous, having no inteins, and are flanked by the same genes. these genes thus appear in the same genomic context (fig. 1). corresponding parts of known split dnae genes are each flanked by different genes. thus, split dnae genes are in different genomic contexts (fig. 1).
fig. 1. cyanobacterial dnae gene loci. each line shows the dnae gene locus or the loci of dnae1 and dnae2 split genes from one species, except for the top locus that is identical in all three listed species. gene protein-coding regions are shown as rectangles with an arrowhead at their 3′ ends. the dnae genes are shown in dark grey with the split intein parts in black. other homologous genes are indicated by similar patterns; there are only two such pairs, between nostoc species pcc7120 and prochlorococcus marinus med4, and nostoc species pcc7120 and nostoc punctiforme. gene names or functions are indicated where known. the distance between the split dnae genes in each species is indicated where known.
in addition to being integrated at the same point, the dnae split intein amino acid sequences are more similar to each other than to those of other inteins (fig. 2). the progenitor intein of all dnae split inteins was probably a typical, contiguous intein. at some time point, this intein and its dnae host gene were split by some genetic rearrangement event to form two genes (perler, 1999). although all known split dnae genes underwent further genomic rearrangements, evidenced by their differing genomic contexts, they retained the split intein parts. this indicates the stability of the split intein organization in dynamic genomes. five of the six above-mentioned known split dnae genes are present in species from three of the seven distinct groups of cyanobacteria; thermosynechococcus elongatus bp-1 (previously named synechococcus elongatus toray) is not placed in any of the seven groups by 16s rrna analysis and is considered as part of an early diverged group of cyanobacteria (honda et al., 1999). cyanobacteria with known contiguous dnae genes are all from a fourth group, and there are no public data on cyanobacterial dnae genes from the other three defined groups (fig. 2a). split intein genes are extremely unlikely to be transferred by homing. not only do both intein genes need to be copied, but they have no endonuclease domains. thus, the distribution of known split dnae inteins is probably the result of regular vertical transmission. we hypothesized that, as split dnae inteins are present in species from three diverse cyanobacterial groups, they might be very common, or even invariably present, in other species from these groups and maybe also in other related groups (fig. 2a). to test our hypothesis, we set out to clone the dnae genes from diverse cyanobacteria.
fig. 2. ... honda et al. (1999). species with known dnae sequences are marked by filled bullets for dnae sequences split by inteins and by empty bullets for contiguous dnae sequences. underlined filled bullets denote split dnae inteins identified and sequenced in this work (aphanizomenon, aphanothece, oscillatoria and thermosynechococcus vulcanus). the relations shown between the groups are the most probable ones but are not as certain as the group definitions (honda et al., 1999). listed for each group are species with known dnae or dnae intein sequences or a representative species chosen from the groups of honda et al. (1999). the listed aphanothece and oscillatoria species 16s rrna sequences were significantly most similar to each other (931/1000 bootstrap value). they were closest to group 2 species but could not be definitely clustered with it. centre. a dendrogram calculated from conserved dnae polymerase protein regions spanning 409 amino acids. the dendrogram is rooted by the position of all cyanobacterial dnae protein regions within a larger dendrogram of dnae proteins from various bacteria (not shown). right. a dendrogram calculated from conserved split dnae intein sequences spanning 132 amino acids (see fig. 4a). the dendrogram is rooted by the position of all split dnae inteins within a larger dendrogram of various inteins (not shown). bootstrap confidence values for grouping of dnae proteins and split inteins (values at the root) are from the larger dendrograms. bootstrap confidence values for the dnae polymerase and intein dendrograms were calculated from 1000 trials. nodes below values of 700/1000 were collapsed. thermosynechococcus vulcanus sequences are not included in the analysis as only a small part upstream of its dnae1 intein was determined. nevertheless, the determined sequences of the intein and polymerase regions are very similar to t. elongatus bp-1, and the two species are expected to cluster together.
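the fig. 2 legend states that dendrogram nodes with bootstrap support below 700/1000 were collapsed. the following is a minimal python sketch of what such a collapse does to a tree; it is not the software used in the study. the nested-dict tree layout, the function names and the example species labels (sp_a to sp_e) are assumptions made purely for illustration, and only the 700 threshold comes from the text.

```python
def collapse_weak_nodes(node, threshold=700):
    """Return a copy of the tree in which internal nodes with bootstrap
    support below `threshold` are removed and their children are attached
    directly to the parent (forming a polytomy). The root is never removed.
    Internal nodes without a "support" key are treated as unsupported."""
    if "children" not in node:              # leaf: nothing to collapse
        return dict(node)
    kept = []
    for child in node["children"]:
        child = collapse_weak_nodes(child, threshold)
        weak = "children" in child and child.get("support", 0) < threshold
        if weak:
            kept.extend(child["children"])  # splice grandchildren into parent
        else:
            kept.append(child)
    return {**node, "children": kept}


def shape(node):
    """Nested-list view of the tree topology (leaf names only)."""
    if "children" not in node:
        return node["name"]
    return [shape(child) for child in node["children"]]


# Hypothetical example tree: one well supported and one weakly supported clade.
tree = {"children": [
    {"name": "sp_a"},
    {"support": 931, "children": [{"name": "sp_b"}, {"name": "sp_c"}]},
    {"support": 450, "children": [{"name": "sp_d"}, {"name": "sp_e"}]},
]}

collapsed = collapse_weak_nodes(tree)
print(shape(collapsed))
```

in this toy tree the clade supported by 931/1000 trials is kept, while the clade supported by only 450/1000 is dissolved and its two members join the parent node directly.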
analysed species included aphanizomenon ovalisporum (aov), a freshwater cyanobacterium most similar by its 16s rrna sequence to nostoc species; microcystis species (vardi et al., 2002), a freshwater cyanobacterium belonging to group 5 (related to ssp pcc6803 and ssp pcc7002); aphanothece halophytica (aha) and oscillatoria limnetica (oli), unicellular and filamentous, respectively, facultative anaerobic photoautotrophic cyanobacteria that are most similar by their 16s rrna sequences to group 2 (not shown); and thermosynechococcus vulcanus (tvu), a thermophilic cyanobacterial species closely related to t. elongatus bp-1 (nakamura et al., 2002). using sequential degenerate primer polymerase chain reaction (pcr) amplifications, we determined the sequences of conserved dnae gene regions flanking one side of the known split intein integration point. we then amplified the region of the insertion site by single primer linear amplification and terminal transferase tailing (rudi et al., 1999) (fig. 3). the dnae genes of all five tested species were found to be split in two and have a split intein at the same point as the other known cyanobacterial split dnae inteins. we obtained complete sequences of the split inteins and the dnae flanks from four of the species (fig. 4a). together with the previously publicly available sequences, there are now nine known complete split dnae intein sequences. the length of the n′ split intein parts is between 101 and 123 amino acids, with the longest being from ssp pcc6803 and the shortest from aov. the length variability results from the c′ ends of these intein parts. in contrast, all eight known c′ split intein parts are 35 or 36 amino acids in length (fig. 4a). the combined length of the aov split intein is 137 amino acids, only three residues longer than the length of the shortest known intein, from the archaeon methanobacterium thermoautotrophicum (smith et al., 1997).
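the amplifications described above rely on degenerate primers designed against conserved dnae protein regions. as an illustration of the general idea only, the python sketch below back-translates a short peptide into a degenerate primer written in iupac nucleotide codes and reports its degeneracy (the number of distinct sequences in the primer pool). the peptide used in the example is hypothetical; the actual primers of this study are listed in its supplementary table s2 and are not reproduced here. the sketch assumes the standard genetic code and ignores codon usage.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_primer(peptide):
    """Back-translate `peptide` into a degenerate primer.

    Each codon position is encoded independently, so for amino acids with
    six codons (L, S, R) the pool also contains some codons for other
    residues. Returns (primer, degeneracy)."""
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        for pos in range(3):
            bases = "".join(sorted({codon[pos] for codon in CODONS[aa]}))
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


# Hypothetical conserved motif, chosen only to keep the example small.
print(degenerate_primer("MWDE"))
```

motifs rich in methionine and tryptophan (one codon each) give low-degeneracy primers, which is one reason highly conserved regions with such residues are attractive primer sites.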
none of the split inteins has an endonuclease domain, like 18% of known inteins (http://bioinfo.weizmann.ac.il/~pietro/inteins). all nine split dnae inteins have the six conserved sequence motifs that define the intein protein-splicing fold and active site (duan et al., 1997; pietrokovski, 1998). they are further conserved along their entire sequence, except for the c′ ends of the n′ parts, which differ from each other in sequence and length (fig. 4a). the conserved minimal sequence regions of the n- and c-intein parts are those defined by computational and experimental analyses (pietrokovski, 1998; ghosh et al., 2001; mootz and muir, 2002).
fig. 3. split inteins primer-walking amplification strategy. sequence of a dnae1, the dnae n′ part, and its n′ intein part are shown as boxes. amplification reactions are shown as lines beneath the sequence region amplified, with the primer regions shown as triangles. degenerate primers are shown as grey triangles, and specific primers are shown as black triangles. terminal transferase 3′ added tail is shown as a dotted line. the reaction order is from top to bottom. the first reaction amplified a region 5′ to the intein integration point using degenerate primers to two flanking conserved regions. the next reaction amplified the 3′ region of the previous reaction and a downstream region, using a specific 5′ primer and a degenerate 3′ primer. typically, two or three such reactions sufficed to determine the sequence up to the insertion site. to amplify the less conserved n-intein region and its downstream 3′ untranslated region, a linear amplification using a single specific 5′ primer was followed by terminal transferase polycytosine tailing. products from this reaction were then reamplified by a specific internal 5′ primer and a polyguanosine primer. finally, the sequence was verified by amplifying from genomic dna the intein part with upstream and downstream regions using specific primers. the c-intein part of the dnae2 gene was cloned similarly with the amplification reactions proceeding in the reverse orientation, advancing from the dnae2 central region towards its 5′ end.
split dnae inteins were found to be common in cyanobacteria. they are present in all examined dnae genes of species from three groups of cyanobacteria and an unassigned species (thermosynechococcus) and were probably fixed in them by one common event. we suggest that most, if not all, cyanobacteria originating from the last common ancestor of species with split dnae inteins also have this split intein. the apparent fixation of split dnae inteins in a subdivision of cyanobacteria contrasts with the discontinuous distribution of other inteins, including those present in cyanobacteria (table 1). split dnae intein presence is a taxonomic trait that could be used in classifying cyanobacteria. we suggest it to be an ancient, highly persistent and vertically transmitted trait that separates cyanobacteria into two separate clades. thermosynechococcus, classified by its 16s rrna gene as an early diverged genus of cyanobacteria (honda et al., 1999), is found by the presence of a split dnae intein to be part of a larger assembly of cyanobacterial groups. this could be investigated further by analysing the now available complete genome of t. elongatus bp-1 (nakamura et al., 2002). split cyanobacterial dnae genes are present in diverse loci, whereas all known cyanobacterial contiguous dnae genes appear in stable genomic positions. the original genomic rearrangement that split the dnae gene thus seems to have been followed by further genomic rearrangements. nostoc species pcc 7120 and related cyanobacteria undergo developmentally regulated genome rearrangements (golden et al., 1987; carrasco and golden, 1995). these are species in which split dnae genes also occur. it is possible that some cyanobacteria are more tolerant to genome rearrangements.
such events produce split genes that are reactivated by regulated rearrangements. one rearrangement may have split a dnae gene in its intein region. however, the two dnae gene parts could reassemble at the protein level by their intein regions.
fig. 4. a. dnae1 (top) and dnae2 (bottom) intein parts. the start position of each intein is shown in their host proteins, except for the dnae1 sequences determined in this work where only the 3′ parts of the genes were determined. asterisks indicate stop codons. underlined regions are intein conserved sequence motifs (pietrokovski, 1998). lower case sequence regions in the c′ end of the dnae1 intein parts are not conserved in these sequences and should not be considered aligned. below the sequence alignment is a graphical representation of it (henikoff et al., 1995).
there is no biological assay for intein activity. new intein genes can be identified by amplifying genes known to include inteins in some species from other related species (davis et al., 1994; fsihi et al., 1996; sander et al., 1998; saves et al., 2000; lazarevic, 2001). however, all known inteins are randomly distributed, and only one case was found of closely related species with consistent intein distribution (sander et al., 1998). usually only a few of the examined species were found to contain inteins. dnae split inteins seem to be an exception, being present in all cyanobacteria species that we examined that belong to certain well-defined groups. other species from these groups can thus serve as a reliable source of split inteins. the continuous distribution of dnae split inteins in cyanobacterial groups has theoretical and applied implications for intein studies. we can now obtain inteins that co-evolve with their host species. these inteins can be used to study the evolutionary change rate of inteins, for example to decide whether inteins are of ancient or recent origin (pietrokovski, 2001).
they could also be a valuable source in theoretical and experimental studies of protein-protein interaction and intein catalytic activity (gorbalenya, 1998; martin et al., 2001). from a practical aspect, our ability to obtain an apparently very large number of different dnae split inteins will be enhanced. for example, the pasteur culture collection of cyanobacteria (pcc) maintains about 475 axenic strains, many belonging to the groups in which we believe dnae split inteins are fixed. of particular interest will be split inteins from species living in extreme conditions such as high temperature or salinity. the protein-splicing activity of these split inteins may depend on the conditions in which their host species are active. our hypothesis is that split inteins are very difficult to lose. this is based on the idea that split inteins cannot be lost by one, or even two, simple dna excision events. cyanobacterial dnae proteins are split by inteins in a highly conserved motif appearing in all known dnae proteins (fig. 4b). this strongly suggests that the two split dnae host protein parts would have to be attached by a peptide bond at the intein integration point for the motif to adopt the fold, and have the activity, found in other dnae proteins. loss of one or both intein parts will leave the products of their gene flanks split with no mechanism to ligate them. to lose the intein genes and retain a functional dnae gene requires parallel precise loss of the intein parts followed directly by precise fusion of their flanks into one gene; this is a highly unlikely event. an alternative method for removal of split intein genes is to acquire a surrogate gene to replace the product of the split genes. however, dnae is an essential and tightly regulated gene forming the core of the bacterial replicative dna polymerase. it interacts with four other protein subunits in the polymerase holoenzyme and with various cofactors during replication (kelman and o'donnell, 1995).
thus, a surrogate gene must be precisely adapted to the species to replace its dnae gene. there are more than 50 known intein alleles, but only the cyanobacterial dnae intein allele is split. nevertheless, these split inteins are common in cyanobacteria. if split inteins are common, at least in one group of species, why is there only one known type (allele) of them? the rarity of split intein organization probably results from the difficulty in switching to a functional two genes and two products state from a single gene and a single product one. [table footnote: a. species include fully and almost fully sequenced cyanobacteria. +, intein is present; -, intein is absent; \, species does not have the orthologue of the intein host protein.] gene splitting would require acquiring a promoter and translation initiation signal for the downstream part of the split gene, adapting co-regulation for the two formed genes and evolving amino acid sequences to stabilize the two new ends of the intein parts. gene splitting of the intein might also require adaptation for trans-splicing and high-affinity interaction between the two intein parts. artificially created split inteins were found to have some affinity between their two parts and a low level of trans-splicing (shingledecker et al., 1998; southworth et al., 1998; mootz and muir, 2002). thus, adapting the protein function of a newly split intein gene might be the minor difficulty compared with adapting proper transcription and translation control and protein stability. once all these adaptations were complete, the unidirectional ratchet nature of the intein-splitting process would ensure the evolutionary persistence of the gene organization. however, still unknown are what initial circumstances led to the selection of the individual cyanobacterium in which the intein splitting occurred. significant advantage/s likely to have accompanied this event to compensate for all necessary genetic adjustments are discussed above. 
analysed species sequence determination of the dnae split inteins was done in two steps for each part of the dnae gene. first, a conserved region flanking the n′ or c′ split intein part was amplified by degenerate primer pcrs. in some cases, a number of partially overlapping regions were amplified until the intein parts were approached. next, linear (single primer) amplifications were done towards each split intein part using specific primers, designed from the previously amplified regions. the 3′ (intein) ends of the single-stranded amplification products were tailed by a homo-oligomer using terminal transferase. the tailed products were then pcr amplified by a primer complementary to the homo-oligomer tail and a second specific primer, nested to the first specific primer (rudi et al., 1999). this pcr amplification was repeated twice, and the products were cloned and sequenced or sequenced directly. to confirm the resulting sequence, the intein-containing region, together with its adjacent untranslated and dnae flanks, was amplified from genomic dna using specific primers and sequenced on both strands (fig. 3). pcr amplifications included ≈50 ng of genomic dna, 25 pmol of each specific primer and 30-100 pmol of each degenerate primer, 2 nmol of dntp mix, 2.5 units of taq dna polymerase (sigma) and 5 µl of 10× taq polymerase buffer (sigma) in 50 µl reaction volumes. linear amplifications were done in the same way except using ≈500 ng of genomic dna and 12.5 pmol of one specific primer. amplification schedules and temperatures are detailed in supplementary material (table s1). primer sequences are listed in supplementary material (table s2). the linear, single-stranded amplification products were purified using a pcr purification kit (qiagen), as recommended by the manufacturer. 
cytosine tailing was done in 20 µl reactions with 5 µl of the purified linear amplification products, 200 pmol of dctp, 30 units of terminal deoxynucleotidyl transferase (tdt) (fermentas) and 4 µl of 5× tdt buffer (fermentas). reaction mixtures were incubated at 37 °c for 20 min and stopped by a 10 min incubation at 72 °c. pcr amplifications with the polyguanine primer (polyg) used 4 µl of the tdt reaction products as template with all the other components as listed above. products were purified by a pcr purification kit (qiagen) and sent directly for sequencing. the confirmed sequence data have been submitted to the genbank database under accession numbers ay209003-ay209008 and ay311409-ay311410. phylogenetic analysis was done using programs from the phylip package (felsenstein, 1989), version 3.55. trees were calculated using the seqboot, protdist and neighbor programs with 1000 bootstrap trials and default settings. homing endonucleases: keeping the house in order prokaryotic introns and inteins: a panoply of form and function two heterocyst-specific dna rearrangements of nif operons in anabaena cylindrica and nostoc sp evidence of selection for protein introns in the recas of pathogenic mycobacteria lightning strikes twice -intron-intein coincidence crystal structure of pi-scei, a homing endonuclease with protein splicing activity phylip -phylogeny inference package, version 3.2 homing events in the gyra gene of some mycobacteria zinc inhibition of protein trans-splicing and identification of regions essential for splicing and association of a split intein homing of a dna endonuclease gene by meiotic gene conversion in saccharomyces cerevisiae different recombination site specificity of two developmentally regulated genome rearrangements non-canonical inteins automated construction and graphical presentation of protein blocks from unaligned sequences the photosynthetic apparatus of prochlorococcus: insights through comparative genomics detection of seven major 
evolutionary lineages in cyanobacteria based on the 16s rrna gene sequence analysis with new sequences of five marine synechococcus strains sequence analysis of the genome of the unicellular cyanobacterium synechocystis sp. strain pcc6803. ii. sequence determination of the entire genome and assignment of potential protein-coding regions complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium anabaena sp. strain pcc 7120 dna polymerase iii holoenzyme: structure and function of a chromosomal replicating machine ribonucleotide reductase genes of bacillus prophages: a refuge to introns and intein coding sequences protein-splicing intein: genetic mobility, origin, and evolution characterization of a naturally occurring trans-splicing intein from synechocystis sp an overview of the genome of nostoc punctiforme, a multicellular, symbiotic cyanobacterium protein splicing triggered by a small molecule complete genome structure of the thermophilic cyanobacterium thermosynechococcus elongatus bp-1 split luciferase as an optical probe for detecting protein-protein interactions in mammalian cells based on protein splicing protein splicing and related forms of protein autoprocessing a natural example of protein transsplicing inbase: the intein database protein splicing and its applications modular organization of inteins and c-terminal autocatalytic domains intein spread and extinction in evolution restriction cutting independent method for cloning genomic dna segments outside the boundaries of known sequences inteins in mycobacterial gyra are a taxonomic character inteins invading mycobacterial reca proteins molecular dissection of the mycobacterium tuberculosis reca intein: design of a minimal intein and of a trans-splicing system involving two intein fragments complete genome sequence of methanobacterium thermoautotrophicum strain dh: functional analysis and comparative genomics control of protein splicing by intein fragment reassembly 
dinoflagellate-cyanobacterium communication may determine the composition of phytoplankton assemblage in a mesotrophic lake protein trans-splicing by a split intein encoded in a split dnae gene of synechocystis sp. pcc6803 pgaas: a prokaryotic genome assembly assistant system we thank z. kelman the following material is available from http://www.blackwellpublishing.com/products/journals/ suppmat/mmi/mmi3825/mmi3825sm.htm table s1 . dna amplification conditions. table s2 . pcr primers. key: cord-004222-z4butywi authors: joyce, collin; burton, dennis r.; briney, bryan title: comparisons of the antibody repertoires of a humanized rodent and humans by high throughput sequencing date: 2020-01-24 journal: sci rep doi: 10.1038/s41598-020-57764-7 sha: doc_id: 4222 cord_uid: z4butywi the humanization of animal model immune systems by genetic engineering has shown great promise for antibody discovery, tolerance studies and for the evaluation of vaccines. assessment of the baseline antibody repertoires of unimmunized model animals will be useful as a benchmark for future immunization experiments. we characterized the heavy chain and kappa light chain antibody repertoires of a model animal, the omnirat, by high throughput antibody sequencing and made use of two novel datasets for comparison to human repertoires. intra-animal and inter-animal repertoire comparisons reveal a high level of conservation in antibody diversity between the lymph node and spleen and between members of the species. multiple differences were found in both the heavy and kappa chain repertoires between omnirats and humans including gene segment usage, cdr3 length distributions, class switch recombination, somatic hypermutation levels and in features of v(d)j recombination. the inference and generation of repertoires (igor) software tool was used to model recombination in vh regions which allowed for the quantification of some of these differences. 
diversity estimates of the omnirat heavy chain repertoires almost reached that of humans, around two orders of magnitude less. despite variation between the species repertoires, a high frequency of omnirat clonotypes were also found in the human repertoire. these data give insights into the development and selection of humanized animal antibodies and provide actionable information for use in vaccine studies. rag1, rag2 and artemis (among others). p and n nucleotides are added in the vh-dh and dh-jh junctions by artemis and tdt, dramatically increasing sequence diversity. after successful pairing of this newly formed heavy chain with surrogate light chain (slc), recombination of a light chain from v and j gene segments of the kappa or lambda loci occurs and the b cell swaps the slc for this new light chain. unless the immature b cell is autoreactive or anergic and undergoes receptor editing or clonal deletion, it matures into a naïve b cell and migrates to the periphery whereupon it can become activated by encountering antigen and form germinal centers with help from t-cells. sequence diversity is again enhanced in the germinal center by somatic hypermutation (shm) and/or class switch recombination (csr), two processes that depend on activation induced cytidine deaminase (aid). the omnirat was created by genomic integration of human immunoglobulin (ig) loci on a background of inactivated endogenous rat ig loci. it expresses chimeric heavy chains (i.e. human v, d, and j genes and rat constant genes) that pair with fully human light chains 11, 12 . we sought to characterize the circulating antibody repertoire diversity in this animal and make comparisons to humans. high throughput antibody sequencing has been used to describe the circulating antibody repertoire of organisms, including more recently at unprecedented depth in humans 13 . 
reverse transcription of antibody rna and combined tagging with unique molecular identifiers (umis) have allowed us 14 and others 15, 16 to correct for error and bias in antibody sequencing. using these methods to gain insight into the antibody repertoire of omnirats, we ask whether or not it accurately represents that of humans, and by extension allows for usefulness in the approximation of the human antibody response. we postulate that there are major differences in the repertoires due to distinctness in the ig loci genomic structure and genes that shape antibody diversity between species. here, we provide the most thorough description of humanized transgenic rodent antibody repertoires to date and leverage a novel extremely deep human dataset to make comparisons with implications of immediate use as a reference for omnirat immunization studies. we individually separated total rna from spleens and lymph nodes of three unimmunized omnirats and pcr amplified the heavy and kappa chain antibody v gene segments. the resulting amplicons were subjected to high throughput sequencing in conjunction with preprocessing and annotation by the abstar analysis pipeline 17 (methods) which resulted in a mean of ~3 × 10^6 processed heavy chain sequences and ~1.5 × 10^6 processed kappa chain sequences per transgenic animal (table s1). two previously published datasets 13, 18 of the same 10 humans which together contain a mean of ~3.6 × 10^7 processed heavy chain sequences and ~1.5 × 10^6 processed kappa chain sequences per individual were used for comparison. we started by making intra-animal comparisons, intra-species comparisons and inter-species comparisons of the immunoglobulin gene segment usage frequencies for each antibody repertoire by performing hierarchical clustering (fig. 1) and linear regression analysis (figs. s1 and s2). repertoires were found to cluster by species and tissue when variable heavy (vh) (fig. 1a), diversity heavy (dh) (fig. 
1b), joining heavy (jh) (fig. 1c) and variable kappa (vk) (fig. s5a), but not joining kappa (jk) (fig. s5b) gene usage was examined. differences between the lymph node and spleens of individual omnirats were next investigated. vh gene, dh gene and jh gene usage frequencies between these tissues were highly correlated (fig. s1), although a few vh gene segments were overrepresented in spleen as compared to lymph nodes including vh4-34, vh4-59 and vh5-1 (figs. 1a and s3a). dh gene and jh gene usage remained highly correlated with minor differences in specific genes (figs. 1b,c, s1 and s3b,c). inter-animal spleen gene usage was highly correlated for all three heavy chain gene segments (fig. s2). inter-human comparisons yielded similar, albeit slightly less correlated results (fig. s2). inter-species vh and dh usage comparisons show much weaker correlations, with lower r-squared values than any of the previous comparisons, while surprisingly jh gene usage was highly correlated (fig. s2). species specific gene usage biases were more predominant in variable genes (vh and vk) than in joining genes (jh and jk) (fig. s3a-e). we hypothesize that this may be due to species specific differences in variable gene order, but not joining gene order at the genomic loci 11, although no significant correlation between variable gene order and gene usage in the omnirat was found (data not shown). omnirats show a preference for dh gene families of shorter average length such as dh1, dh5, dh6 and dh7 as compared to humans which show a higher representation of longer dh genes from dh2, dh3, and dh4 families with the exception of dh3-9 which appears at similar frequencies between each species (figs. 1b and s3b). vk and jk gene usage frequencies were very similar for all comparisons made (figs. s1-3 and s6). differences in cdr3 length distributions of each repertoire were next determined. 
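the gene usage comparison described above can be sketched in a few lines of python. this is a minimal illustration, not the pipeline used in the study: the function names and the toy gene calls are hypothetical, and r-squared is computed as the squared pearson correlation, which equals the r-squared of a simple linear regression between two usage-frequency vectors.

```python
from collections import Counter


def usage_frequencies(gene_calls, genes):
    """Fraction of sequences assigned to each gene, in the order given by `genes`."""
    counts = Counter(gene_calls)
    total = sum(counts[g] for g in genes)
    return [counts[g] / total for g in genes]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length frequency vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy gene calls (invented counts, for illustration only).
genes = ["vh4-34", "vh4-59", "vh5-1"]
spleen = ["vh4-34"] * 50 + ["vh4-59"] * 30 + ["vh5-1"] * 20
lymph_node = ["vh4-34"] * 40 + ["vh4-59"] * 35 + ["vh5-1"] * 25

f_spleen = usage_frequencies(spleen, genes)
f_ln = usage_frequencies(lymph_node, genes)
print(f_spleen, f_ln, round(r_squared(f_spleen, f_ln), 3))
```

with real data, each vector would hold one frequency per germline gene segment, and one r-squared value would be computed per pair of repertoires.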
the mean heavy chain cdr3 (cdrh3) length in humans is 14.8 amino acids, while in the omnirat we observed a mean cdrh3 length that is shorter with a mean length of 12.1 amino acids (fig. 2a) . there are minor differences in the kappa light chain (cdrl3) lengths between species with near identical average lengths of 9.0 and 9.1 for omnirats and humans respectively (fig. s6c ). the frequency of light chains with a cdr3 of 5 amino acids in length is an important consideration when choosing a model animal for vaccination experiments involving the germline targeting immunogen eod-gt8 which is in human clinical trials 3, 14, 19 . this frequency of 5-amino acid cdrl3s was lower in omnirats (0.02%) than in humans (0.56%) i.e. a factor of 28 (fig. s6c ). after observing a tendency for shorter cdrh3 lengths in the omnirat as compared to humans, we wanted to know if the number of n and p nucleotide additions in the heavy chain v-d and d-j junction sites were different. figure 2b ,c shows average v-d and d-j junction nucleotide addition lengths in the omnirat are indeed shorter as compared to humans. nucleotide additions in the v-j junctions of kappa chains are also shorter on average as compared to humans (fig. s6e) . the longest dh gene segments are found in the dh3 family and the longest jh gene segments come from the jh6 gene family. gene segments from these families are important contributors to the generation of unusually long cdrh3s in humans and are consistently found in certain broadly neutralizing antibodies (bnabs) that bind to the human immunodeficiency virus (hiv) envelope (env) glycoprotein, indicating the importance of these rearrangements in hiv vaccine studies 20 . on average, the frequency of antibodies with d3-j6 rearrangements in omnirats is 0.012 with little variation, while in humans the frequency of these antibody species is more variable between subjects with a higher mean of 0.028 (fig. 2d) . 
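the length statistics reported above are simple to reproduce once cdr3 amino acid sequences are available. the sketch below is illustrative only: the helper names and the three toy sequences are hypothetical, and only the two percentages (0.56% and 0.02%) are taken from the text.

```python
def mean_length(cdr3s):
    """Mean CDR3 length in amino acids."""
    return sum(len(s) for s in cdr3s) / len(cdr3s)


def fraction_with_length(cdr3s, length):
    """Fraction of CDR3s that are exactly `length` amino acids long."""
    return sum(1 for s in cdr3s if len(s) == length) / len(cdr3s)


# Toy kappa-chain CDR3s (hypothetical sequences, for illustration only).
toy_cdrl3s = ["QQYEF", "QQSYSTPLT", "QQYNSYPWT"]
print(mean_length(toy_cdrl3s))
print(fraction_with_length(toy_cdrl3s, 5))

# Fold difference in 5-amino acid CDRL3 frequency, from the reported percentages.
human_pct, omnirat_pct = 0.56, 0.02
fold = human_pct / omnirat_pct
print(f"{fold:.1f}")  # approximately 28, the factor quoted above
```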
the preference of omnirats for shorter cdrh3 lengths and dh gene segments can be placed in the context of shorter dh gene lengths in the wild-type rat (rattus norvegicus) as compared to human dh genes (fig. 2e), indicating a possible biologically intrinsic bias. we used igor 21 to infer recombination models for each individual repertoire from 100,000 unmutated sequences allowing for the quantification of differences in features of heavy chain vdj recombination and generated 1,000,000 synthetic sequences per model. cdrh3 length, vd insertion length and dj insertion length distributions from the synthetic sequence data (fig. s4a-c) were found to be very similar to the observed data (fig. 2a-c). [figure legend: columns are antibody repertoires and rows are gene segments. data was scaled by calculating the z-score for each gene (row) and hierarchical clustering (euclidean distance metric) was done. a dendrogram representation of clustering is shown and indicates uniqueness in gene segment usage between the lymph node and spleen repertoires of the omnirat and between the omnirat and human repertoires. red and blue indicate high and low z-scores respectively (legend shown in a), and since it is calculated per gene it represents differences between repertoires and not the relative frequencies of gene segment usage in each repertoire.] kullback-leibler (kl) divergence is a measure of how different two probability distributions are. kl divergence between models (fig. s4d) and model 'events' (fig. s4e) were computed as previously described 13. kl divergence between pairs of omnirat models was found to be lower than kl divergence between pairs of human models for both complete and all event level calculations. the average pairwise omnirat model versus human model complete kl divergence calculation was found to be much greater than that of pairwise inter-animal calculations and more than twice that of pairwise inter-human calculations. 
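for readers unfamiliar with the measure, the kl divergence between two discrete distributions p and q is the sum over outcomes of p(x) · log(p(x) / q(x)). the sketch below is a generic textbook implementation, not the igor model comparison itself: the choice of base-2 logarithms, the function name and the toy insertion-length distributions are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two discrete distributions over the same outcomes.

    Outcomes with p == 0 contribute nothing; q must be > 0 wherever p > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy distributions over insertion lengths 0, 1, 2 (invented values).
model_a = [0.5, 0.3, 0.2]
model_b = [0.4, 0.4, 0.2]

print(kl_divergence(model_a, model_b))
print(kl_divergence(model_b, model_a))  # KL divergence is not symmetric
```

the divergence is zero only when the two distributions are identical, so larger values for a pair of recombination models indicate a larger difference in the corresponding event probabilities.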
"d-gene", "v-gene trim (3')", and "d-gene trim (3')" were among the events computed to have the highest mean kl divergence from pairwise inter-species event level model comparisons. class switch recombination and somatic hypermutation in omnirats. in supplementary fig. s5a, the frequency of antibody isotypes is shown. the human repertoire contains average frequencies of 0.84 and 0.16 for igm and igg respectively as previously published 13, while in the omnirat antibody repertoire we observe mean frequencies of 0.15 and 0.003 for lymph node and spleen igg respectively and means of 0.85 and 0.997 for lymph node and spleen igm respectively. mean numbers of variable gene mutations in igm (fig. s5b), igg (fig. s5c) and kappa (fig. s6d) sequences of the omnirat were about half of those found in the human repertoire. the observed increase in shm of class-switched igg sequences as compared to igm sequences in the omnirat demonstrates the ability of the animal to generate memory b cells. we first examined clonotype (defined as identical vh gene, jh gene and cdrh3 amino acid sequence) diversity of the heavy chain repertoire for each individual animal. all sequences from lymph nodes and spleens were collapsed into unique clonotypes, separately for each tissue and animal. any clonotype found in both tissue compartments must have originated from different b cells, allowing for the measurement of repeatedly observed clonotypes. rarefaction curves for each animal were generated (fig. 3a) and indicate a low frequency of repeatedly observed clonotypes. we estimated diversity using chao2 22, 23 and recon 23, 24 as previously described 13. diversity estimates were similar between the two estimators, (4.8 × 10^6 to 7.4 × 10^6) for chao and (9.4 × 10^6 to 1.9 × 10^7) for recon (fig. 3b). we next estimated heavy chain sequence diversity for each animal (fig. 
3c) and again found that both estimators broadly agreed, giving similar values of (5.0 × 10^7 to 8.1 × 10^7) for chao and (5.4 × 10^7 to 1.0 × 10^8) for recon. previously published estimates of both clonotype and sequence diversity in individual humans 17 only exceed that in the omnirats by a maximum of two orders of magnitude. this is surprising given that the omnirat is more restricted in cdrh3 length. for each combination of two or more animals, we computed the frequency of shared unique heavy chain clonotypes (fig. 4a). there was on average 9.32% of clonotypes shared between each combination of two omnirats. surprisingly, we found that 4.90% of clonotypes were shared between all three of the animals. next, we pooled unique heavy chain clonotypes from all ten human subjects and measured the percentage of clonotypes in each animal that could be found in the total human pool (fig. 4b). we found that (11.9-13.7%) of each omnirat clonotype repertoire and 13.8% of clonotypes combined from all animals could be found in the total human clonotype pool. shared clonotypes have shorter cdrh3 lengths than unshared clonotypes on average (fig. 4c), which is expected given that sequence diversity increases with the number of amino acids, leaving less chance for sharing. vh gene family usage between shared and unshared clonotypes indicates no major differences (fig. 4d). sequence logos for 8 amino acid long (fig. 4e) and 13 amino acid long (fig. 4f) cdrh3s from both shared and unshared fractions were made and indicate broad similarity between the two fractions. we set out to determine commonalities and differences between omnirat and human antibody repertoires to be used as a reference for vaccine studies. our results show that there exists substantial variation in gene usage frequencies and elements of recombination, indicating specific limitations of this animal model for predicting the human immune response. 
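the clonotype sharing analysis summarized above follows directly from the clonotype definition used in this study (identical vh gene, jh gene and cdrh3 amino acid sequence). the sketch below shows the idea on toy records; the dictionary keys (`v_gene`, `j_gene`, `cdr3_aa`), the function names and all sequences are hypothetical and do not reflect the actual output format of the annotation pipeline.

```python
def clonotype(seq):
    """Clonotype key: identical VH gene, JH gene and CDRH3 amino acid sequence."""
    return (seq["v_gene"], seq["j_gene"], seq["cdr3_aa"])


def unique_clonotypes(seqs):
    """Collapse annotated sequences into a set of unique clonotypes."""
    return {clonotype(s) for s in seqs}


def percent_shared(repertoire, reference):
    """Percentage of clonotypes in `repertoire` that also occur in `reference`."""
    return 100.0 * len(repertoire & reference) / len(repertoire)


# Toy annotated sequences (hypothetical records, for illustration only).
omnirat_seqs = [
    {"v_gene": "vh4-34", "j_gene": "jh4", "cdr3_aa": "ARGYSSGWYFDY"},
    {"v_gene": "vh4-34", "j_gene": "jh4", "cdr3_aa": "ARGYSSGWYFDY"},  # duplicate
    {"v_gene": "vh4-59", "j_gene": "jh6", "cdr3_aa": "ARDLGYYYGMDV"},
    {"v_gene": "vh5-1", "j_gene": "jh4", "cdr3_aa": "ARHDYFDY"},
]
human_seqs = [
    {"v_gene": "vh5-1", "j_gene": "jh4", "cdr3_aa": "ARHDYFDY"},
    {"v_gene": "vh4-34", "j_gene": "jh6", "cdr3_aa": "ARGGYYYYGMDV"},
]

omnirat = unique_clonotypes(omnirat_seqs)
human_pool = unique_clonotypes(human_seqs)
print(len(omnirat), percent_shared(omnirat, human_pool))
```

using sets keeps the comparison independent of sequencing depth per clonotype, which matches the use of unique clonotypes in the sharing percentages reported above.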
we found that by performing hierarchical clustering on gene segment usage, repertoires clustered together by both species and tissue. differences in gene segment usage between transgenic animal models and humans, as well as between tissues are expected. for example, multiple human ig loci transgenic rodents are reported to have gene usage profiles that slightly vary from that of humans 25, 26 . furthermore, antibody repertoires from separate human tissues are known to deviate strongly enough to be clustered by hierarchical clustering 27 . in our case, the lymph node repertoire from the omnirat was most likely able to be distinguished from that of the spleen due to the increased presence of antigen-experienced b cells in the latter as shown by somatic hypermutation and class-switched transcript frequencies. this indicates that tissue selection will affect the outcome of an antibody discovery campaign and reinforces evidence for normal b cell development by suggesting the existence of affinity maturation. investigation into cdr3 length distributions revealed that the omnirat prefers shorter cdrh3s as compared to humans. interestingly, the mechanisms of this preference are due to decreased n additions, and a tendency to incorporate shorter dh gene segments. this result has also been seen in multiple other transgenic and wild-type rodents 25, 26, 28, 29 . the specific reasons remain unclear, although the observation that wild-type rat germline d gene segment lengths are shorter suggests intrinsic species-specific mechanisms of selection as well as differences in tdt expression during bone marrow b cell development. we further speculate that another possible reason for intra-species variation can be attributed to distinct prior antigen exposure and divergent gut microbiomes 30, 31 . the diversity of the omnirat heavy chain repertoire was shown to be slightly lower than that in humans. 
our results indicate biased gene usage and decreased junctional diversity are the primary reasons for the resulting repertoire diversity estimate comparisons. we also showed that there is a much higher frequency of 'public clonotypes' or clonotypes shared between members of this species than previously reported in humans 13, 32 . lower sequence diversity combined with identical genetic background and highly similar gene usage are possible reasons for this result. in summary, we have determined specific differences between the omnirat and human antibody repertoires which must be taken into careful consideration when evaluating an antibody response in order to make predictions for human subjects. we have also shown that this animal's antibodies show signs of class switch recombination, somatic hypermutation and large diversity supporting its value for the discovery of monoclonal antibodies to targets that may not be immunogenic in other models. even though a high degree of variation exists, we still found many clonotypes to be shared between the species pools. finally, more studies will need to be done in order to characterize omnirat serum and memory b cell responses to immunogens. next-generation sequencing of omnirat antibody repertoires. total rna from spleens and lymph nodes was extracted (rneasy maxi kit, qiagen) from each unimmunized heavy chain and kappa chain only transgenic rat (omnirat, open monoclonal technology inc., palo alto, ca, usa) and antibody sequences were amplified as previously described 13 except for different primers used during reverse transcription (table s2) . we chose to interrogate the antibody repertoires found in secondary lymphoid organs as opposed to peripheral blood due to the higher number of b cells. 
correct pcr product sizes were verified on an agarose gel (e-gel ex; invitrogen) and quantified with fluorometry (qubit; life technologies), pooled at approximately equimolar concentrations and each sample pool was re-quantified before sequencing on an illumina miseq (miseq reagent kit v3, 600-cycle). all animal experiments were conducted in accordance with the institutional animal care and use committee of scripps research and approved by the institutional research boards of scripps research. (d) distribution of vh gene family usage for unshared clonotypes from the animal pool (black) or clonotypes shared by both species pool (red). (e,f) sequence logos of the cdrh3s encoded by shared and unshared clonotypes of length 8 (e) or 13 (f). head-region amino acid coloring: polar amino acids (gstycqn) are green; basic amino acids (krh) are blue; acidic amino acids (de) are red; and hydrophobic amino acids (avlipwfm) are black. all torso residues are gray. murine antibody responses to cleaved soluble hiv-1 envelope trimers are highly restricted in specificity rational hiv immunogen design to target specific germline b cell receptors priming hiv-1 broadly neutralizing antibody precursors in human ig loci transgenic mice bacterially derived synthetic mimetics of mammalian oligomannose prime antibody responses that neutralize hiv infectivity a repertoire of monoclonal antibodies with human heavy chains from transgenic mice antigen-specific human monoclonal antibodies from mice engineered with human ig heavy and light chain yacs antigen-specific human antibodies from mice comprising four distinct genetic modifications human antibody production in transgenic animals transgenic mouse strains as platforms for the successful discovery and development of human therapeutic monoclonal antibodies mechanisms of central tolerance for b cells high-affinity igg antibodies develop naturally in ig-knockout rats carrying germline human igh/igκ/igλ loci bearing the rat ch region human 
antibody expression in transgenic rats: comparison of chimeric igh loci with human vh, d and jh but bearing different rat c-gene regions commonality despite exceptional diversity in the baseline human antibody repertoire clonify: unseeded antibody lineage assignment from next-generation sequencing data genetic measurement of memory b-cell recall using antibody repertoire sequencing toward a more accurate view of human b-cell repertoire by next-generation sequencing, unbiased repertoire capture and single-molecule barcoding massively scalable genetic analysis of antibody repertoires rapid and focused maturation of a vrc01-class hiv broadly neutralizing antibody lineage involves both binding and accommodation of the n276-glycan hiv-1 vaccines. priming a broadly neutralizing antibody response to hiv-1 using a germline targeting immunogen identification of common features in prototype broadly neutralizing antibodies to hiv envelope v2 apex to facilitate vaccine design high-throughput immune repertoire analysis with igor estimating the population size for capture-recapture data with unequal catchability how many different clonotypes do immune repertoires contain? 
robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery mechanisms that shape human antibody repertoire development in mice transgenic for human ig h and l chain loci tissue-specific expressed antibody variable gene repertoires intrinsic bias and public rearrangements in the human immunoglobulin vλ light chain repertoire immunoglobulin light chain gene rearrangements, receptor editing and the development of a self-tolerant antibody repertoire microbial symbionts regulate the primary ig repertoire b cell superantigens in the human intestinal microbiota high frequency of shared clonotypes in human b cell receptor repertoires moderated estimation of fold change and dispersion for rna-seq data with deseq2. the authors thank all the study subjects for their participation. c.j., d.r.b. and b.b. planned and designed the experiments. c.j. performed experiments. c.j. analyzed data. c.j. wrote the manuscript. all authors contributed to manuscript revisions. the authors declare no competing interests. supplementary information is available for this paper at https://doi.org/10.1038/s41598-020-57764-7. correspondence and requests for materials should be addressed to d.r.b. or b.b. publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. open access: this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. 
the images or other third party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. key: cord-007708-hr4smx24 authors: van kampen, antoine h. c.; moerland, perry d. title: taking bioinformatics to systems medicine date: 2015-08-13 journal: systems medicine doi: 10.1007/978-1-4939-3283-2_2 sha: doc_id: 7708 cord_uid: hr4smx24 systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. in this chapter we discuss how bioinformatics critically contributes to systems medicine. first, we explain the role of bioinformatics in the management and analysis of data. in particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. we discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. 
throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine. systems medicine finds its roots in systems biology, the scientific discipline that aims at a systems-level understanding of, for example, biological networks, cells, organs, organisms, and populations. it generally involves a combination of wet-lab experiments and computational (bioinformatics) approaches. systems medicine extends systems biology by focusing on the application of systems-based approaches to clinically relevant applications in order to improve patient health or the overall well-being of (healthy) individuals [1]. systems medicine is expected to change health care practice in the coming years. it will contribute to new therapeutics through the identification of novel disease genes that provide drug candidates less likely to fail in clinical studies [2, 3]. it is also expected to contribute to fundamental insights into networks perturbed by disease, improved prediction of disease progression, stratification of disease subtypes, personalized treatment selection, and prevention of disease. to enable systems medicine it is necessary to characterize the patient at various levels and, consequently, to collect, integrate, and analyze various types of data including not only clinical (phenotype) and molecular data, but also information about cells (e.g., disease-related alterations in organelle morphology), organs (e.g., lung impedance when studying respiratory disorders such as asthma or chronic obstructive pulmonary disease), and even social networks. the full realization of systems medicine therefore requires the integration and analysis of environmental, genetic, physiological, and molecular factors at different temporal and spatial scales, which currently is very challenging.
it will require large efforts from various research communities to overcome current experimental, computational, and information management related barriers. in this chapter we show how bioinformatics is an essential part of systems medicine and discuss some of the future challenges that need to be solved. to understand the contribution of bioinformatics to systems medicine, it is helpful to consider the traditional role of bioinformatics in biomedical research, which involves basic and applied (translational) research to augment our understanding of (molecular) processes in health and disease. the term "bioinformatics" was first coined by the dutch theoretical biologist paulien hogeweg in 1970 to refer to the study of information processes in biotic systems [4]. soon, the field of bioinformatics expanded and bioinformatics efforts accelerated and matured as the first (whole) genome and protein sequences became available. the significance of bioinformatics further increased with the development of high-throughput experimental technologies that allowed wet-lab researchers to perform large-scale measurements. these include determining whole-genome sequences (and gene variants) and genome-wide gene expression with next-generation sequencing technologies (ngs; see table 1 for abbreviations and web links) [5], measuring gene expression with dna microarrays [6], identifying and quantifying proteins and metabolites with nmr or (lc/gc-)ms [7], measuring epigenetic changes such as methylation and histone modifications [8], and so on. these "omics" technologies are capable of measuring the many molecular building blocks that determine our (patho)physiology.
table 1 abbreviations and websites
genome-wide measurements have not only significantly advanced our fundamental understanding of the molecular biology of health and disease but have also contributed to new (commercial) diagnostic and prognostic tests [9, 10] and the selection and development of (personalized) treatment [11]. nowadays, bioinformatics is therefore defined as "advancing the scientific understanding of living systems through computation" (iscb), or more inclusively as "conceptualizing biology in terms of molecules and applying 'informatics techniques' (derived from disciplines such as applied mathematics, computer science and statistics) to understand and organize the information associated with these molecules, on a large scale" [12]. it is worth noting that solely measuring many molecular components of a biological system does not necessarily result in a deeper understanding of such a system. understanding biological function does indeed require detailed insight into the precise function of these components but, more importantly, it requires a thorough understanding of their static, temporal, and spatial interactions. these interaction networks underlie all (patho)physiological processes, and elucidation of these networks is a major task for bioinformatics and systems medicine. the developments in experimental technologies have led to challenges that require additional expertise and new skills for biomedical researchers:
• information management. modern biomedical research projects typically produce large and complex omics data sets, sometimes in the order of hundreds of gigabytes to terabytes, of which a large part has become available through public databases [13, 14], sometimes even prior to publication (e.g., gtex, icgc, tcga).
this not only contributes to knowledge dissemination but also facilitates reanalysis and meta-analysis of data, evaluation of hypotheses that were not considered by the original research group, and development and evaluation of new bioinformatics methods. the use of existing data can in some cases even make new (expensive) experiments superfluous. alternatively, one can integrate publicly available data with data generated in-house for more comprehensive analyses, or to validate results [15]. in addition, the obligation of making raw data available may prevent fraud and selective reporting. the management (transfer, storage, annotation, and integration) of data and associated meta-data is one of the main and increasing challenges in bioinformatics that needs attention to safeguard the progression of systems medicine.
• data analysis and interpretation. bioinformatics data analysis and interpretation of omics data have become increasingly complex, not only due to the vast volumes and complexity of the data but also as a result of more challenging research questions. bioinformatics covers many types of analyses including nucleotide and protein sequence analysis, elucidation of tertiary protein structures, quality control, pre-processing and statistical analysis of omics data, determination of genotype-phenotype relationships, biomarker identification, evolutionary analysis, analysis of gene regulation, reconstruction of biological networks, text mining of literature and electronic patient records, and analysis of imaging data. in addition, bioinformatics has developed approaches to improve experimental design of omics experiments to ensure that the maximum amount of information can be extracted from the data. many of the methods developed in these areas are of direct relevance for systems medicine as exemplified in this chapter.
clearly, new experimental technologies have to a large extent turned biomedical research into a data- and compute-intensive endeavor.
it has been argued that production of omics data has nowadays become the "easy" part of biomedical research, whereas the real challenges currently comprise information management and bioinformatics analysis. consequently, next to the wet-lab, the computer has become one of the main tools of the biomedical researcher. bioinformatics enables and advances the management and analysis of large omics-based datasets, thereby directly and indirectly contributing to systems medicine in several ways (fig. 1):
3. quality control and pre-processing of omics data. preprocessing typically involves data cleaning (e.g., removal of failed assays) and other steps to obtain quantitative measurements that can be used in downstream data analysis.
4. (statistical) data analysis methods of large and complex omics-based datasets. this includes methods for the integrative analysis of multiple omics data types (subheading 5), and for the elucidation and analysis of biological networks (top-down systems medicine; subheading 6).
systems medicine comprises top-down and bottom-up approaches. the former represents a specific branch of bioinformatics, which distinguishes itself from bottom-up approaches in several ways [3, 19, 20]. top-down approaches use omics data to obtain a holistic view of the components of a biological system and, in general, aim to construct system-wide static functional or physical interaction networks such as gene co-expression networks and protein-protein interaction networks. in contrast, bottom-up approaches aim to develop detailed mechanistic and quantitative mathematical models for sub-systems. these models describe the dynamic and nonlinear behavior of interactions between known components to understand and predict their behavior upon perturbation.
however, in contrast to omics-based top-down approaches, these mechanistic models require information about chemical/physical parameters and reaction stoichiometry, which may not be available and require further (experimental) efforts. both the top-down and bottom-up approaches result in testable hypotheses and new wet-lab or in silico experiments that may lead to clinically relevant findings.
fig. 1 the contribution of bioinformatics (dark grey boxes) to systems medicine (black box). (omics) experiments, patients, and public repositories provide a wide range of data that is used in bioinformatics and systems medicine studies
biomedical research and, consequently, systems medicine are increasingly confronted with the management of continuously growing volumes of molecular and clinical data, results of data analyses and in silico experiments, and mathematical models. due to policies of scientific journals and funding agencies, omics data is often made available to the research community via public databases. in addition, a wide range of databases have been developed, of which more than 1550 are currently listed in the molecular biology database collection [14] providing a rich source of biomedical information. biological repositories do not merely archive data and models but also serve a range of purposes in systems medicine as illustrated below from a few selected examples. the main repositories are hosted and maintained by the major bioinformatics institutes including ebi, ncbi, and sib that make a major part of the raw experimental omics data available through a number of primary databases including genbank [21], geo [22], pride [23], and metabolights [24] for sequence, gene expression, ms-based proteomics, and ms-based metabolomics data, respectively.
in addition, many secondary databases provide information derived from the processing of primary data, for example pathway databases (e.g., reactome [25], kegg [26]), protein sequence databases (e.g., uniprotkb [27]), and many others. pathway databases provide an important resource to construct mathematical models used to study and further refine biological systems [28, 29]. other efforts focus on establishing repositories integrating information from multiple public databases. the integration of pathway databases [30-32], and genome browsers that integrate genetic, omics, and other data with whole-genome sequences [33, 34] are two examples of this. joint initiatives of the bioinformatics and systems biology communities resulted in repositories such as biomodels, which contains mathematical models of biochemical and cellular systems [35], recon 2 that provides a community-driven, consensus "metabolic reconstruction" of human metabolism suitable for computational modelling [36], and seek, which provides a platform designed for the management and exchange of systems biology data and models [37]. another example of a database that may prove to be of value for systems medicine studies is malacards, an integrated and annotated compendium of about 17,000 human diseases [38]. malacards integrates 44 disease sources into disease cards and establishes gene-disease associations through integration with the well-known genecards databases [39, 40]. integration with genecards and cross-references within malacards enables the construction of networks of related diseases revealing previously unknown interconnections among diseases, which may be used to identify drugs for off-label use. another class of repositories are (expert-curated) knowledge bases containing domain knowledge and data, which aim to provide a single point of entry for a specific domain.
contents of these knowledge bases are often based on information extracted (either manually or by text mining) from literature or provided by domain experts [41-43]. finally, databases are used routinely in the analysis, interpretation, and validation of experimental data. for example, the gene ontology (go) provides a controlled vocabulary of terms for describing gene products, and is often used in gene set analysis to evaluate expression patterns of groups of genes instead of those of individual genes [44] and has, for example, been applied to investigate hiv-related cognitive disorders [45] and polycystic kidney disease [46]. several repositories such as mir2disease [47], peroxisomedb [41], and mouse genome informatics (mgi) [43] include associations between genes and disorders, but only provide very limited phenotypic information. phenotype databases are of particular interest to systems medicine. one well-known phenotype repository is the omim database, which primarily describes single-gene (mendelian) disorders [48]. clinvar is another example and provides an archive of reports and evidence of the relationships among medically important human variations found in patient samples and phenotypes [49]. clinvar complements dbsnp (for single-nucleotide polymorphisms) [50] and dbvar (for structural variations) [51], which both provide only minimal phenotypic information. the integration of these phenotype repositories with genetic and other molecular information will be a major aim for bioinformatics in the coming decade enabling, for example, the identification of comorbidities, determination of associations between gene (mutations) and disease, and improvement of disease classifications [52]. it will also advance the definition of the "human phenome," i.e., the set of phenotypes resulting from genetic variation in the human genome.
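the gene set analysis mentioned above for go terms can take several forms. one simple variant is an over-representation test, sketched below in python as a minimal illustration. it asks whether a list of selected genes contains more members of a go gene set than expected by chance, using the hypergeometric distribution. the function name, its parameters, and the toy numbers are assumptions made for this example; they are not taken from the cited studies, which may have used other gene set methods.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value for gene set over-representation.

    n_universe : number of genes measured in the experiment
    n_in_set   : number of those genes annotated with the gene set (e.g., a GO term)
    n_selected : number of genes selected as interesting (e.g., differentially expressed)
    n_overlap  : number of selected genes that belong to the gene set

    Returns P(X >= n_overlap) when n_selected genes are drawn at random
    without replacement from the universe.
    """
    total = comb(n_universe, n_selected)
    largest_possible = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


# Hypothetical numbers: 10 measured genes, 4 carry the annotation,
# 3 genes were selected and all 3 carry the annotation.
p = enrichment_p_value(n_universe=10, n_in_set=4, n_selected=3, n_overlap=3)
print(f"enrichment p-value: {p:.4f}")
```

in practice many gene sets are tested at once, so the resulting p-values would normally be corrected for multiple testing before interpretation.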
to increase the quality and (clinical) utility of the phenotype and variant databases as an essential step towards reducing the burden of human genetic disease, the human variome project coordinates efforts in standardization, system development, and (training) infrastructure for the worldwide collection and sharing of genetic variations that affect human health [53, 54]. to implement and advance systems medicine to the benefit of patients' health, it is crucial to integrate and analyze molecular data together with de-identified individual-level clinical data complementing general phenotype descriptions. patient clinical data refers to a wide variety of data including basic patient information (e.g., age, sex, ethnicity), outcomes of physical examinations, patient history, medical diagnoses, treatments, laboratory tests, pathology reports, medical images, and other clinical outcomes. inclusion of clinical data allows the stratification of patient groups into more homogeneous clinical subgroups. availability of clinical data will increase the power of downstream data analysis and modeling to elucidate molecular mechanisms, and to identify molecular biomarkers that predict disease onset or progression, or which guide treatment selection. in biomedical studies clinical information is generally used as part of patient and sample selection, but some omics studies also use clinical data as part of the bioinformatics analysis (e.g., [9, 55]). however, in general, clinical data is unavailable from public resources or only provided on an aggregated level. although good reasons exist for making clinical data available (subheading 2.2), ethical and legal issues comprising patient and commercial confidentiality, and technical issues are the most immediate challenges [56, 57].
this potentially hampers the development of systems medicine approaches in a clinical setting since sharing and integration of clinical and nonclinical data is considered a basic requirement [1]. biobanks [58] such as bbmri [59] provide a potential source of biological material and associated (clinical) data but these are, generally, not publicly accessible, although permission to access data may be requested from the biobank provider. clinical trials provide another source of clinical data for systems medicine studies, but these are generally owned by a research group or sponsor and not freely available [60] although ongoing discussions may change this in the future ([61] and references therein). although clinical data is not yet available on a large scale, the bioinformatics and medical informatics communities have been very active in establishing repositories that provide clinical data. one example is the database of genotypes and phenotypes (dbgap) [62] developed by the ncbi. study metadata, summary-level (phenotype) data, and documents related to studies are publicly available. access to de-identified individual-level (clinical) data is only granted after approval by an nih data access committee. another example is the cancer genome atlas (tcga), which also provides individual-level molecular and clinical data through its own portal and the cancer genomics hub (cghub). clinical data from tcga is available without any restrictions but part of the lower level sequencing and microarray data can only be obtained through a formal request managed by dbgap. medical patient records provide an even richer source of phenotypic information, and have already been used to stratify patient groups, discover disease relations and comorbidity, and integrate these records with molecular data to obtain a systems-level view of phenotypes (for a review see [63]).
on the one hand, this integration facilitates refinement and analysis of the human phenome to, for example, identify diseases that are clinically uniform but have different underlying molecular mechanisms, or which share a pathogenetic mechanism but with a different genetic cause [64]. on the other hand, using the same data, a phenome-wide association study (phewas) [65] would allow the identification of unrelated phenotypes associated with specific shared genetic variant(s), an effect referred to as pleiotropy. moreover, it makes use of information from medical records generated in routine clinical practice and, consequently, has the potential to strengthen the link between biomedical research and clinical practice [66]. the power of phenome analysis was demonstrated in a study involving 1.5 million patient records, not including genotype information, comprising 161 disorders. in this study it was shown that disease phenotypes form a highly connected network suggesting a shared genetic basis [67]. indeed, later studies that incorporated genetic data resulted in similar findings and confirmed a shared genetic basis for a number of different phenotypes. for example, a recent study identified 63 potentially pleiotropic associations through the analysis of 3144 snps that had previously been implicated by genome-wide association studies (gwas) as mediators of human traits, and 1358 phenotypes derived from patient records of 13,835 individuals [68]. this demonstrates that phenotypic information extracted manually or through text mining from patient records can help to more precisely define (relations between) diseases. another example comprises the text mining of psychiatric patient records to discover disease correlations [52]. here, mapping of disease genes from the omim database to information from medical records resulted in protein networks suspected to be involved in psychiatric diseases.
integrative bioinformatics comprises the integrative (statistical) analysis of multiple omics data types. many studies demonstrated that using a single omics technology to measure a specific molecular level (e.g., dna variation, expression of genes and proteins, metabolite concentrations, epigenetic modifications) already provides a wealth of information that can be used for unraveling molecular mechanisms underlying disease. moreover, single-omics disease signatures which combine multiple (e.g., gene expression) markers have been constructed to differentiate between disease subtypes to support diagnosis and prognosis. however, no single technology can reveal the full complexity and details of molecular networks observed in health and disease due to the many interactions across these levels. a systems medicine strategy should ideally aim to understand the functioning of the different levels as a whole by integrating different types of omics data. this is expected to lead to biomarkers with higher predictive value, and novel disease insights that may help to prevent disease and to develop new therapeutic approaches. integrative bioinformatics can also facilitate the prioritization and characterization of genetic variants associated with complex human diseases and traits identified by gwas in which hundreds of thousands to over a million snps are assayed in a large number of individuals. although such studies lack the statistical power to identify all disease-associated loci [69], they have been instrumental in identifying loci for many common diseases. however, it remains difficult to prioritize the identified variants and to elucidate their effect on downstream pathways ultimately leading to disease [70]. consequently, methods have been developed to prioritize candidate snps based on integration with other (omics) data such as gene expression, dnase hypersensitive sites, histone modifications, and transcription factor-binding sites [71].
the integration of multiple omics data types is far from trivial and various approaches have been proposed [72-74]. one approach is to link different types of omics measurements through common database identifiers. although this may seem straightforward, in practice this is complicated as a result of technical and standardization issues as well as a lack of biological consensus [32, 75-77]. moreover, the integration of data at the level of the central dogma of molecular biology and, for example, metabolite data is even more challenging due to the indirect relationships between genes, transcripts, and proteins on the one hand and metabolites on the other hand, precluding direct links between the database identifiers of these molecules. statistical data integration [72] is a second commonly applied strategy, and various approaches have been applied for the joint analysis of multiple data types (e.g., [78, 79]). one example of statistical data integration is provided by a tcga study that measured various types of omics data to characterize breast cancer [80]. in this study 466 breast cancer samples were subjected to whole-genome and -exome sequencing, and snp arrays to obtain information about somatic mutations, copy number variations, and chromosomal rearrangements. microarrays and rna-seq were used to determine mrna and microrna expression levels, respectively. reverse-phase protein arrays (rppa) and dna methylation arrays were used to obtain data on protein expression levels and dna methylation, respectively. simultaneous statistical analysis of different data types via a "cluster-of-clusters" approach using consensus clustering on a multi-omics data matrix revealed that four major breast cancer subtypes could be identified.
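the idea behind a "cluster-of-clusters" analysis can be illustrated with a strongly simplified python sketch. it assumes that each omics platform has already assigned every sample to a platform-specific cluster, scores each pair of samples by the fraction of platforms in which they share a cluster, and groups samples whose agreement reaches a cut-off. this is not the procedure of the cited study: the platform names, sample names, labels, the 0.6 cut-off, and the single-linkage grouping are all assumptions made for this example, and a real analysis would use a resampling-based consensus clustering method.

```python
from itertools import combinations


def co_clustering_fractions(platform_labels):
    """For every pair of samples, the fraction of platforms that co-cluster them.

    platform_labels maps a platform name to a dict {sample: cluster label}.
    All platforms are assumed to cover the same samples.
    """
    samples = sorted(next(iter(platform_labels.values())))
    fractions = {}
    for s1, s2 in combinations(samples, 2):
        shared = sum(labels[s1] == labels[s2] for labels in platform_labels.values())
        fractions[(s1, s2)] = shared / len(platform_labels)
    return samples, fractions


def integrated_clusters(platform_labels, min_agreement=0.6):
    """Group samples linked by pairs whose agreement is at least min_agreement."""
    samples, fractions = co_clustering_fractions(platform_labels)
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path compression.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (s1, s2), fraction in fractions.items():
        if fraction >= min_agreement:
            parent[find(s1)] = find(s2)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Hypothetical per-platform cluster assignments for four samples.
labels = {
    "mrna": {"s1": "a", "s2": "a", "s3": "b", "s4": "b"},
    "methylation": {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
    "protein": {"s1": "x", "s2": "y", "s3": "y", "s4": "y"},
}
print(integrated_clusters(labels))
```

because the grouping step is single-linkage, one noisy platform can chain otherwise distinct groups together; the cut-off therefore needs to be chosen with care.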
this showed that the intrinsic subtypes (basal, luminal a and b, her2) that had previously been determined using gene expression data only could be largely confirmed in an integrated analysis of a large number of breast tumors. single-level omics data has extensively been used to identify disease-associated biomarkers such as genes, proteins, and metabolites. in fact, these studies led to more than 150,000 papers documenting thousands of claimed biomarkers; however, it is estimated that fewer than 100 of these are currently used in routine clinical practice [81]. integration of multiple omics data types is expected to result in more robust and predictive disease profiles since these better reflect disease biology [82]. further improvement of these profiles may be obtained through the explicit incorporation of interrelationships between various types of measurements such as microrna-mrna target, or gene methylation-microrna (based on a common target gene). this was demonstrated for the prediction of short-term and long-term survival from serous cystadenocarcinoma tcga data [83]. according to the recent casym roadmap: "human disease can be perceived as perturbations of complex, integrated genetic, molecular and cellular networks and such complexity necessitates a new approach." [84]. in this section we discuss how (approximations to) these networks can be constructed from omics data and how these networks can be decomposed into smaller modules. then we discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, lead to predictive diagnostic and prognostic models, and help to further subclassify diseases [55, 85] (fig. 2). network-based approaches will provide medical doctors with molecular level support to make personalized treatment decisions.
in a top-down approach the aim of network reconstruction is to infer the connections between the molecules that constitute a biological network. network models can be created using a variety of mathematical and statistical techniques and data types. early approaches for network inference (also called reverse engineering) used only gene expression data to reconstruct gene networks. here, we discern three types of gene network inference algorithms, based on (1) correlation, (2) information theory, and (3) bayesian networks [86]. co-expression networks are an extension of commonly used clustering techniques, in which genes are connected by edges in a network if the correlation of their gene expression profiles exceeds a certain value. co-expression networks have been shown to connect functionally related genes [87]. note that connections in a co-expression network correspond to either direct (e.g., transcription factor-gene and protein-protein) or indirect (e.g., proteins participating in the same pathway) interactions. in one of the earliest examples of this approach, pair-wise correlations were calculated between gene expression profiles and the level of growth inhibition caused by thousands of tested anticancer agents, for 60 cancer cell lines [88]. removal of associations weaker than a certain threshold value resulted in networks consisting of highly correlated genes and agents, called relevance networks, which led to targeted hypotheses for potential single-gene determinants of chemotherapeutic susceptibility. information-theoretic approaches have been proposed in order to capture nonlinear dependencies assumed to be present in most biological systems and that cannot be captured by correlation-based distance measures. these approaches often use the concept of mutual information, a generalization of the correlation coefficient which quantifies the degree of statistical (in)dependence.
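the difference between correlation-based and information-theoretic edges can be made concrete with a small python sketch. it builds a co-expression network by thresholding the absolute pearson correlation, and it computes a simple plug-in estimate of mutual information after binning the expression values. this is an illustration under stated assumptions, not a method from the cited studies: the gene names, the 0.8 cut-off, the three equal-width bins, and the toy profiles are all invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.8):
    """Connect two genes if |correlation| of their profiles reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def mutual_information(x, y, bins=3):
    """Plug-in mutual information estimate (in bits) after equal-width binning."""

    def discretize(values):
        low, high = min(values), max(values)
        width = ((high - low) / bins) or 1.0  # avoid a zero width for constant profiles
        return [min(int((v - low) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    joint, count_x, count_y = {}, {}, {}
    for a, b in zip(dx, dy):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        count_x[a] = count_x.get(a, 0) + 1
        count_y[b] = count_y.get(b, 0) + 1
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )


# Hypothetical profiles: gene_b depends linearly on gene_a, gene_c quadratically.
base = [-3, -2, -1, 0, 1, 2, 3]
profiles = {
    "gene_a": base,
    "gene_b": [2 * v + 1 for v in base],
    "gene_c": [v * v for v in base],
}
print(coexpression_edges(profiles))
print(mutual_information(profiles["gene_a"], profiles["gene_c"]))
```

in this toy data the quadratic dependence between gene_a and gene_c is invisible to the correlation threshold but still yields a clearly positive mutual information estimate. plug-in estimates are biased for small sample sizes, so real analyses need more samples and a more careful estimator.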
an example of a network inference method that is based on mutual information is aracne, which has been used to reconstruct the human b-cell gene network from a large compendium of human b-cell gene expression profiles [89]. in order to discover regulatory interactions, aracne removes the majority of putative indirect interactions from the initial mutual information-based gene network using a theorem from information theory, the data processing inequality. this led to the identification of myc as a major hub in the b-cell gene network and a number of novel myc target genes, which were experimentally validated. whether information-theoretic approaches are more powerful in general than correlation-based approaches is still a subject of debate [90]. bayesian networks allow the description of statistical dependencies between variables in a generic way [91, 92]. bayesian networks are directed acyclic networks in which the edges of the network represent conditional dependencies; that is, nodes that are not connected represent variables that are conditionally independent of each other. a major bottleneck in the reconstruction of bayesian networks is their computational complexity. moreover, bayesian networks are acyclic and cannot capture feedback loops that characterize many biological networks. when time-series rather than steady-state data is available, dynamic bayesian networks provide a richer framework in which cyclic networks can be reconstructed [93]. gene (co-)expression data only offers a partial view on the full complexity of cellular networks. consequently, networks have also been constructed from other types of high-throughput data. for example, physical protein-protein interactions have been measured on a large scale in different organisms including human, using affinity capture-mass spectrometry or yeast two-hybrid screens, and have been made available in public databases such as biogrid [94].
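the pruning step based on the data processing inequality, described above for aracne, can be sketched in python as follows. the inequality implies that if two genes interact only through a third, their mutual information cannot exceed that of either direct interaction, so in every fully connected triplet the weakest edge is a candidate indirect interaction. this is a minimal sketch of the principle, not the aracne implementation: the function name, the `tolerance` parameter, the gene names, and the mutual information values are assumptions made for the example.

```python
from itertools import combinations


def dpi_prune(mutual_info, tolerance=0.0):
    """Remove the weakest edge of every fully connected gene triplet.

    mutual_info maps frozenset({gene1, gene2}) to a mutual information value.
    Edges are first marked using the original values and then removed together,
    so the result does not depend on the order in which triplets are visited.
    tolerance (hypothetical parameter) requires the weakest edge to be smaller
    than the other two by this relative margin before it is removed.
    """
    genes = sorted({gene for pair in mutual_info for gene in pair})
    marked = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in mutual_info for edge in triangle):
            continue  # the inequality is only applied to closed triplets
        weakest = min(triangle, key=lambda edge: mutual_info[edge])
        others = [mutual_info[edge] for edge in triangle if edge != weakest]
        if mutual_info[weakest] < min(others) * (1.0 - tolerance):
            marked.add(weakest)
    return {edge: mi for edge, mi in mutual_info.items() if edge not in marked}


# Hypothetical chain tf -> target1 -> target2 with an indirect tf-target2 edge.
network = {
    frozenset(("tf", "target1")): 0.9,
    frozenset(("target1", "target2")): 0.8,
    frozenset(("tf", "target2")): 0.4,
}
pruned = dpi_prune(network)
print(sorted(tuple(sorted(edge)) for edge in pruned))
```

the sketch assumes that the mutual information values have already been estimated and filtered for statistical significance, which is a separate step.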
regulatory interactions have been probed using chromatin immunoprecipitation sequencing (chip-seq) experiments, for example by the encode consortium [95]. using probabilistic techniques, heterogeneous types of experimental evidence and prior knowledge have been integrated to construct functional association networks for human [96], mouse [97], and, most comprehensively, more than 1100 organisms in the string database [98]. functional association networks can help predict novel pathway components, generate hypotheses for biological functions for a protein of interest, or identify disease-related genes [97]. prior knowledge required for these approaches is, for example, available in curated biological pathway databases, and via protein associations predicted using text mining based on their co-occurrence in abstracts or even full-text articles. many more integrative network inference methods have been proposed; for a review see [99]. the integration of gene expression data with chip data [100] or transcription factor-binding motif data [101] has been shown to be particularly fruitful for inferring transcriptional regulatory networks. recently, li et al. [102] described the results from a regression-based model that predicts gene expression using encode (chip-seq) and tcga data (mrna expression data complemented with copy number variation, dna methylation, and microrna expression data). this model infers the regulatory activities of expression regulators and their target genes in acute myeloid leukemia samples. eighteen key regulators were identified, whose activities clustered consistently with cytogenetic risk groups. bayesian networks have also been used to integrate multi-omics data. the combination of genotypic and gene expression data is particularly powerful, since dna variations represent naturally occurring perturbations that affect gene expression detected as expression quantitative trait loci (eqtl).
cis-acting eqtls can then be used as constraints in the construction of directed bayesian networks to infer causal relationships between nodes in the network [103]. large multi-omics datasets consisting of hundreds or sometimes even thousands of samples are available for many commonly occurring human diseases, such as most tumor types (tcga), alzheimer's disease [104], and obesity [105]. however, a major bottleneck for the construction of accurate gene networks is that the number of gene networks that are compatible with the experimental data is several orders of magnitude larger still. in other words, top-down network inference is an underdetermined problem with many possible solutions that explain the data equally well, and individual gene-gene interactions are characterized by a high false-positive rate [99]. most network inference methods therefore try to constrain the number of possible solutions by making certain assumptions about the structure of the network. perhaps the most commonly used strategy to harness the complexity of the gene network inference problem is to analyze experimental data in terms of biological modules, that is, sets of genes that have strong interactions and a common function [106]. there is considerable evidence that many biological networks are modular [107]. module-based approaches effectively constrain the number of parameters to estimate and are in general also more robust to the noise that characterizes high-throughput omics measurements. a detailed review of module-based techniques is outside the scope of this chapter (see, for example, [108]), but we would like to mention a few examples of successful and commonly used modular approaches. weighted gene co-expression network analysis (wgcna) decomposes a co-expression network into modules using clustering techniques [109]. modules can be summarized by their module eigengene, a weighted average expression profile of all gene members of a given module.
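to make the notion of a module eigengene concrete, the sketch below computes it as the first principal component of the standardized expression matrix of one module, which is one common way to obtain such a weighted average profile. this is an assumption-laden illustration and not the wgcna package itself: the function name `module_eigengene`, the simulated expression data, and the simulated trait are hypothetical, and every gene is assumed to have non-zero variance.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a module's standardized expression.

    expr : (samples x genes) array restricted to the genes of one module.
    Returns one value per sample (unit-norm vector).
    Hypothetical helper; assumes every gene has non-zero variance.
    """
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a singular vector is arbitrary; orient it so that it
    # follows the average standardized expression of the module.
    if np.dot(eigengene, z.mean(axis=1)) < 0:
        eigengene = -eigengene
    return eigengene


# Simulated module: five genes that all track a hypothetical sample trait.
rng = np.random.default_rng(0)
trait = np.linspace(0.0, 1.0, 20)
weights = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
module_expr = trait[:, None] * weights + rng.normal(scale=0.05, size=(20, 5))

eigengene = module_eigengene(module_expr)
trait_correlation = np.corrcoef(eigengene, trait)[0, 1]
print(round(float(trait_correlation), 3))
```

because all simulated genes follow the trait, the eigengene correlates strongly with it; with real data the same correlation would be computed for every module and every external trait of interest.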
eigengenes can then be correlated with external sample traits to identify modules that are related to these traits. parikshak et al. [110] used wgcna to extract modules from a co-expression network constructed using fetal and early postnatal brain development expression data. next, they established that several of these modules were enriched for genes and rare de novo variants implicated in autism spectrum disorder (asd). moreover, the asd-associated modules are also linked at the transcriptional level, and 17 transcription factors were found to act as putative co-regulators of asd-associated gene modules during neocortical development. wgcna can also be used when multiple omics data types are available. one example of such an approach involved the integration of transcriptomic and proteomic data from a study investigating the response to sars-cov infection in mice [111]. in this study wgcna-based gene and protein co-expression modules were constructed and integrated to obtain module-based disease signatures. interestingly, the authors found several cases of identifier-matched transcripts and proteins that correlated well with the phenotype, but which showed poor correlation or anticorrelation across these two data types. moreover, the highest correlating transcripts and peptides were not the most central ones in the co-expression modules. vice versa, the transcripts and proteins that defined the modules were not those with the highest correlation to the phenotype. at the very least this shows that integration of omics data affects the nature of the disease signatures. identification of active modules is another important integrative modular technique. here, experimental data in the form of molecular profiles is projected onto a biological network, for example a protein-protein interaction network.
active modules are those subnetworks that show the largest change in expression for a subset of conditions and are likely to contain key drivers or regulators of those processes perturbed in the experiment. active modules have, for example, been used to find a subnetwork that is overexpressed in a particularly aggressive lymphoma subtype [112] and to detect significantly mutated pathways [113]. some active module approaches integrate various types of omics data. one example of such an approach is paradigm [114], which translates pathways into factor graphs, a class of models that belongs to the same family of models as bayesian networks, and determines sample-specific pathway activity from multiple functional genomic datasets. paradigm has been used in several tcga projects, for example, in the integrated analysis of 131 urothelial bladder carcinomas [55]. paradigm-based analysis of copy number variations and rna-seq gene expression in combination with a propagation-based network analysis algorithm revealed novel associations between mutations and gene expression levels, which subsequently resulted in the identification of pathways altered in bladder cancer. the identification of activating or inhibiting gene mutations in these pathways suggested new targets for treatment. moreover, this effort clearly showed the benefits of screening patients for the presence of specific mutations to enable personalized treatment strategies. often, published disease signatures cannot be replicated [81] or provide hardly any additional biological insight. here too, (modular) network-based approaches have been proposed to alleviate these problems. a common characteristic of most methods is that the molecular activity of a set of genes is summarized on a per-sample basis. summarized gene set scores are then used as features in prognostic and predictive models.
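the per-sample summarization of gene set activity described above can be sketched as follows. the scoring rule used here, the mean of gene-wise z-scores over the members of each set, is only one simple choice among many and is an assumption of this example rather than the method of any cited study; the function name `gene_set_scores`, the gene names, and the gene sets are hypothetical.

```python
import numpy as np


def gene_set_scores(expr, gene_names, gene_sets):
    """Summarize expression per sample and per gene set.

    expr       : (samples x genes) expression matrix
    gene_names : list of gene identifiers, one per column of expr
    gene_sets  : dict mapping set name -> list of gene identifiers
    Returns a dict mapping set name -> vector with one score per sample.
    Hypothetical helper; assumes every gene has non-zero variance.
    """
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    index = {gene: col for col, gene in enumerate(gene_names)}
    scores = {}
    for name, members in gene_sets.items():
        cols = [index[g] for g in members if g in index]
        if cols:  # sets without any measured gene are skipped
            scores[name] = z[:, cols].mean(axis=1)
    return scores


# Toy data: three samples, three genes (all names are made up).
expr = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 1.0],
])
genes = ["g1", "g2", "g3"]
sets = {
    "set_up": ["g1", "g2"],      # both genes increase across samples
    "set_mixed": ["g1", "g3"],   # opposite trends cancel out
    "set_missing": ["gX"],       # no measured member, so it is skipped
}
scores = gene_set_scores(expr, genes, sets)
print(scores)
```

the resulting score vectors can serve as input features for a prognostic or predictive model in place of the individual gene measurements. the cancellation seen for `set_mixed` shows one known weakness of simple averaging when a set contains genes regulated in opposite directions.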
relevant gene sets can be based on prior knowledge and correspond to canonical pathways, gene ontology categories, or sets of genes sharing common motifs in their promoter regions [115]. gene set scores can also be determined by projecting molecular data onto a biological network and summarizing scores at the level of subnetworks for each individual sample [116]. while promising in principle, it is still a subject of debate whether gene set-based models outperform gene-based ones [117]. the comparative analysis of networks across different species is another commonly used approach to constrain the solution space. patterns conserved across species have been shown to be more likely to be true functional interactions [107] and to harbor useful candidates for human disease genes [118]. many network alignment methods have been developed in the past decade to identify commonalities between networks. these methods in general combine sequence-based and topological constraints to determine the optimal alignment of two (or more) biological networks. network alignment has, for example, been applied to detect conserved patterns of protein interaction in multiple species [107, 119] and to analyze the evolution of co-expression networks between humans and mice [120, 121]. network alignment can also be applied to detect diverged patterns [120] and may thus lead to a better understanding of similarities and differences between animal models and human in health and disease. information from model organisms has also been fruitfully used to identify more robust disease signatures [122-125]. sweet-cordero and co-workers [122] used a gene signature identified in a mouse model of lung adenocarcinoma to uncover an orthologous signature in human lung adenocarcinoma that was not otherwise apparent. bild et al. [123] defined gene expression signatures characterizing several oncogenic pathways of human mammary epithelial cells.
they showed that these signatures predicted pathway activity in mouse and human tumors. predictions of pathway activity correlated well with the sensitivity to drugs targeting those pathways and could thus serve as a guide to targeted therapies. a generic approach, pathprint, for the integration of gene expression data across different platforms and species at the level of pathways, networks, and transcriptionally regulated targets was recently described [126]. the authors used their method to identify four stem cell-related pathways conserved between human and mouse in acute myeloid leukemia, with good prognostic value in four independent clinical studies. we have reviewed a wide array of approaches showing how networks can be used to elucidate integrated genetic, molecular, and cellular networks. however, in general no single approach will be sufficient, and combining different approaches in more complex analysis pipelines will be required. this is fittingly illustrated by the diggit (driver-gene inference by genetical-genomics and information theory) algorithm [127]. in brief, diggit identifies candidate master regulators from an aracne gene co-expression network integrated with copy number variations that affect gene expression. this method combines several previously developed computational approaches and was used to identify causal genetic drivers of human disease in general and glioblastoma, breast cancer, and alzheimer's disease in particular. this enabled identification of klhl9 deletions as upstream activators of two previously established master regulators in a specific subtype of glioblastoma. systems medicine is one of the steps necessary to make improvements in the prevention and treatment of disease through systems approaches that will (a) elucidate (patho)physiologic mechanisms in much greater detail than currently possible, (b) produce more robust and predictive disease signatures, and (c) enable personalized treatment.
in this context, we have shown that bioinformatics has a major role to play. bioinformatics will continue its role in the development, curation, integration, and maintenance of (public) biological and clinical databases to support biomedical research and systems medicine. the bioinformatics community will strengthen its activities in various standardization and curation efforts that have already resulted in minimum reporting guidelines [128], data capture approaches [75], data exchange formats [129], and terminology standards for annotation [130]. one challenge for the future is to remove errors and inconsistencies in data and annotation from databases and prevent new ones from being introduced [32, 76, 131-135]. an equally important challenge is to establish, improve, and integrate resources containing phenotype and clinical information. to achieve this objective it seems reasonable that bioinformatics and health informatics professionals team up [136-138]. traditionally, health informatics professionals have focused on hospital information systems (e.g., patient records, pathology reports, medical images), data exchange standards (e.g., hl7), medical terminology standards (e.g., international classification of disease (icd), snomed), medical image analysis, analysis of clinical data, clinical decision support systems, and so on. while bioinformatics, on the other hand, has mainly focused on molecular data, it shares many approaches and methods with health informatics. integration of these disciplines is therefore expected to benefit systems medicine in various ways [139]. integrative bioinformatics approaches clearly have added value for systems medicine as they provide a better understanding of biological systems, result in more robust disease markers, and prevent the (biological) bias that could arise from using single-omics measurements.
however, such studies, and the scientific community in general, would benefit from improved strategies to disseminate and share data, which typically will be produced at multiple research centers (e.g., https://www.synapse.org; [140]). integrative studies are expected to increasingly facilitate personalized medicine approaches such as demonstrated by chen and coworkers [141]. in their study they presented a 14-month "integrative personal omics profile" (ipop) for a single individual comprising genomic, transcriptomic, proteomic, metabolomic, and autoantibody data. from the whole-genome sequence data an elevated risk for type 2 diabetes (t2d) was detected, and subsequent monitoring of hba1c and glucose levels revealed the onset of t2d, despite the fact that the individual lacked many of the known non-genetic risk factors. subsequent treatment resulted in a gradual return to the normal phenotype. this shows that the genome sequence can be used to determine disease risk in a healthy individual and allows selecting and monitoring specific markers that provide information about the actual disease status. network-based approaches will increasingly be used to determine the genetic causes of human diseases. since the effect of a genetic variation is often tissue or cell-type specific, a large effort is needed in constructing cell-type-specific networks both in health and disease. this can be done using data already available, an approach taken by guan et al. [142]. the authors proposed 107 tissue-specific networks in mouse via their generic approach for constructing functional association networks using low-throughput, highly reliable tissue-specific gene expression information as a constraint. one could also generate new datasets to facilitate the construction of tissue-specific networks. examples of such approaches are tcga and the genotype-tissue expression (gtex) project.
the aim of gtex is to create a data resource for the systematic study of genetic variation and its effect on gene expression in more than 40 human tissues [143]. regardless of how networks are constructed, it will become more and more important to offer a centralized repository where networks from different cell types and diseases can be stored and accessed. nowadays, these networks are difficult to retrieve: they are scattered across supplementary files of the original papers and links to accompanying web pages, or are not available at all. a resource similar to what the systems biology community has created with the biomodels database would be a great leap forward. there have been some initial attempts at building databases of network models, for example the cellcircuits database [123] (http://www.cellcircuits.org) and the causal biological networks (cbn) database of networks related to lung disease [144] (http://causalbionet.com). however, these are only small-scale initiatives and a much larger and coordinated effort is required. another main bottleneck for the successful application of network inference methods is their validation. most network inference methods to date have been applied to one or a few isolated datasets and were validated using some limited follow-up experiments, for example via gene knockdowns, using prior knowledge from databases and literature as a gold standard, or by generating simulated data from a mathematical model of the underlying network [145, 146]. however, the strengths and weaknesses of network inference methods across cell types, diseases, and species have hardly been assessed. notable exceptions are collaborative competitions such as the dialogue on reverse engineering assessment and methods (dream) [147] and industrial methodology for process verification (improver) [146].
these centralized initiatives propose challenges in which individual research groups can participate and to which they can submit their predictions, which can then be independently validated by the challenge organizers. several dream challenges in the area of network inference have been organized, leading to a better insight into the strengths and weaknesses of individual methods [148]. another important contribution of dream is that a crowd-based approach integrating predictions from multiple network inference methods was shown to give good and robust performance across diverse data sets [149]. in the area of systems medicine, too, challenge-based competitions may offer a framework for independent verification of model predictions. systems medicine promises a more personalized medicine that effectively exploits the growing amount of molecular and clinical data available for individual patients. solid bioinformatics approaches are of crucial importance for the success of systems medicine. however, truly delivering on the promises of systems medicine will require an overall change of research approach that transcends the current reductionist approach and results in a tighter integration of clinical, wet-lab, and computational groups adopting a systems-based approach. past, current, and future success of systems medicine will accelerate this change.
the road from systems biology to systems medicine
participatory medicine: a driving force for revolutionizing healthcare
understanding drugs and diseases by systems biology
the roots of bioinformatics in theoretical biology
sequencing technologies - the next generation
exploring the new world of the genome with dna microarrays
spectroscopic and statistical techniques for information recovery in metabonomics and metabolomics
next-generation technologies and data analytical approaches for epigenomics
gene expression profiling predicts clinical outcome of breast cancer
diagnostic tests based on gene expression profile in breast cancer: from background to clinical use
a multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer
what is bioinformatics? a proposed definition and overview of the field
the importance of biological databases in biological discovery
the 2014 nucleic acids research database issue and an updated nar online molecular biology database collection
reuse of public genome-wide gene expression data
experimental design for gene expression microarrays
learning from our gwas mistakes: from experimental design to scientific method
efficient experimental design and analysis strategies for the detection of differential expression using rna-sequencing
impact of yeast systems biology on industrial biotechnology
the nature of systems biology
gene expression omnibus: microarray data storage, submission, retrieval, and analysis
the proteomics identifications (pride) database and associated tools: status in 2013
metabolights--an open-access general-purpose repository for metabolomics studies and associated meta-data
the reactome pathway knowledgebase
data, information, knowledge and principle: back to metabolism in kegg
activities at the universal protein resource (uniprot)
path2models: large-scale generation of computational models from biochemical pathway maps
precise generation of systems biology models from kegg pathways
pathguide: a pathway resource list
pathway commons, a web resource for biological pathway data
consensus and conflict cards for metabolic pathway databases
the ucsc genome browser database: 2014 update
biomodels database: a repository of mathematical models of biological processes
a community-driven global reconstruction of human metabolism
the seek: a platform for sharing data and models in systems biology
malacards: an integrated compendium for diseases and their annotation
genecards version 3: the human gene integrator
in-silico human genomics with genecards
peroxisomedb 2.0: an integrative view of the global peroxisomal metabolome
the mouse age phenome knowledgebase and disease-specific inter-species age mapping
searching the mouse genome informatics (mgi) resources for information on mouse biology from genotype to phenotype
gene-set approach for expression pattern analysis
systems analysis of human brain gene expression: mechanisms for hiv-associated neurocognitive impairment and common pathways with alzheimer's disease
systems biology approach to identify transcriptome reprogramming and candidate microrna targets during the progression of polycystic kidney disease
mir2disease: a manually curated database for microrna deregulation in human disease
a new face and new challenges for online mendelian inheritance in man (omim(r))
clinvar: public archive of relationships among sequence variation and human phenotype
searching ncbi's dbsnp database
dbvar and dgva: public archives for genomic structural variation
using electronic patient records to discover disease correlations and stratify patient cohorts
on not reinventing the wheel
beyond the genomics blueprint: the 4th human variome project meeting
comprehensive molecular characterization of urothelial bladder carcinoma
open clinical trial data for all? a view from regulators
clinical trial data as a public good
biobanking for europe
whose data set is it anyway? sharing raw data from randomized trials
sharing individual participant data from clinical trials: an opinion survey regarding the establishment of a central repository
ncbi's database of genotypes and phenotypes: dbgap
mining electronic health records: towards better research applications and clinical care
phenome connections
phewas: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations
mining the ultimate phenome repository
probing genetic overlap among complex human phenotypes
systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data
finding the missing heritability of complex diseases
systems genetics: from gwas to disease pathways
a review of post-gwas prioritization approaches
when one and one gives more than two: challenges and opportunities of integrative omics
the model organism as a system: integrating 'omics' data sets
principles and methods of integrative genomic analyses in cancer
toward interoperable bioscience data
critical assessment of human metabolic pathway databases: a stepping stone for future integration
the bridgedb framework: standardized access to gene, protein and metabolite identifier mapping services
integration of transcriptomics and metabonomics: improving diagnostics, biomarker identification and phenotyping in ulcerative colitis
a multivariate approach to the integration of multi-omics datasets
comprehensive molecular portraits of human breast tumours
bring on the biomarkers
assessing the clinical utility of cancer genomic and proteomic data across tumor types
incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction
the casym roadmap: implementation of systems medicine across europe
molecular classification of cancer: class discovery and class prediction by gene expression monitoring
how to infer gene networks from expression profiles
coexpression analysis of human genes across many microarray data sets
discovering functional relationships between rna expression and chemotherapeutic susceptibility using relevance networks
reverse engineering of regulatory networks in human b cells
comparison of co-expression measures: mutual information, correlation, and model based indices
using bayesian networks to analyze expression data
probabilistic graphical models: principles and techniques. adaptive computation and machine learning
inferring gene networks from time series microarray data using dynamic bayesian networks
the biogrid interaction database: 2013 update
architecture of the human regulatory network derived from encode data
reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes
a genomewide functional network for the laboratory mouse
string v9.1: protein-protein interaction networks, with increased coverage and integration
advantages and limitations of current network inference methods
computational discovery of gene modules and regulatory networks
a semisupervised method for predicting transcription factor-gene interactions in escherichia coli
regression analysis of combined gene expression regulation in acute myeloid leukemia
integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks
integrated systems approach identifies genetic nodes and networks in late-onset alzheimer's disease
a survey of the genetics of stomach, liver, and adipose gene expression from a morbidly obese cohort
an introduction to systems biology: design principles of biological circuits
from signatures to models: understanding cancer using microarrays
integrative approaches for finding modular structure in biological networks
weighted gene coexpression network analysis: state of the art
integrative functional genomic analyses implicate specific molecular pathways and circuits in autism
multi-omic network signatures of disease
identifying functional modules in protein-protein interaction networks: an integrated exact approach
algorithms for detecting significantly mutated pathways in cancer
inference of patient-specific pathway activities from multi-dimensional cancer genomics data using paradigm
pathway-based personalized analysis of cancer
network-based classification of breast cancer metastasis
current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis
prediction of human disease genes by human-mouse conserved coexpression analysis
a comparison of algorithms for the pairwise alignment of biological networks
cross-species analysis of biological networks by bayesian alignment
graphalignment: bayesian pairwise alignment of biological networks
an oncogenic kras2 expression signature identified by cross-species gene-expression analysis
oncogenic pathway signatures in human cancers as a guide to targeted therapies
interspecies translation of disease networks increases robustness and predictive accuracy
integrated cross-species transcriptional network analysis of metastatic susceptibility
pathprinting: an integrative approach to understand the functional basis of disease
identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks
promoting coherent minimum reporting guidelines for biological and biomedical investigations: the mibbi project
data standards for omics data: the basis of data sharing and reuse
biomedical ontologies: a functional perspective
pdb improvement starts with data deposition
what we do not know about sequence analysis and sequence databases
annotation error in public databases: misannotation of molecular function in enzyme superfamilies
improving the description of metabolic networks: the tca cycle as example
more than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology
biomedical and health informatics in translational medicine
amia board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline
synergy between medical informatics and bioinformatics: facilitating genomic medicine for future health care
elixir: a distributed infrastructure for european biological data
enabling transparent and collaborative computational analysis of 12 tumor types within the cancer genome atlas
personal omics profiling reveals dynamic molecular and medical phenotypes
tissue-specific functional networks for prioritizing phenotype and disease genes
the genotype-tissue expression (gtex) project
on crowd-verification of biological networks
inference and validation of predictive gene networks from biomedical literature and gene expression data
verification of systems biology research in the age of collaborative competition
dialogue on reverse-engineering assessment and methods: the dream of high-throughput pathway inference
revealing strengths and weaknesses of methods for gene network inference
wisdom of crowds for robust gene network inference

we would like to thank dr. aldo jongejan for his comments that improved the text.

key: cord-017932-vmtjc8ct authors: georgiev, vassil st. title: genomic and postgenomic research date: 2009 journal: national institute of allergy and infectious diseases, nih doi: 10.1007/978-1-60327-297-1_25 sha: doc_id: 17932 cord_uid: vmtjc8ct
the word genomics was first coined by t. roderick from the jackson laboratories in 1986 as the name for the new field of science focused on the analysis and comparison of complete genome sequences of organisms and related high-throughput technologies. two basic computational methods are used for genome analysis: gene finding and whole genome comparison (2).
gene finding.
using a computational method that can scan the genome and analyze the statistical features of the sequence is a fast and remarkably accurate way to find the genes in the genomes of prokaryotic organisms (bacteria, archaea) and viruses, compared with the still difficult problem of finding genes in higher eukaryotes. with modern bioinformatics software, gene finding in a bacterial genome yields a highly accurate, rich set of annotations that provides the basis for further research into the functions of those genes. introns are those portions of the dna that lie between two exons and are transcribed into rna, but that do not appear in that rna after maturation and are therefore not expressed (as proteins) during protein synthesis. their absence removes one of the major barriers to computational analysis of the genome sequence, allowing gene finding to identify more than 99% of the genes of most genomes without any human intervention. next, these gene predictions can be further refined by searching for nearby regulatory sites such as ribosome-binding sites, as well as by aligning protein sequences to those of other species. these steps can be automated using freely available software and databases (2). gene finding in single-cell eukaryotes is of intermediate difficulty, with some organisms, such as trypanosoma brucei, having so few introns that a bacterial gene finder is sufficient to find their genes. other eukaryotic organisms (e.g., plasmodium falciparum) have numerous introns and require the use of a special-purpose gene finder, such as glimmerm (3, 4). whole genome comparison. this computational method refers to the problem of aligning the entire deoxyribonucleic acid (dna) sequence of one organism to that of another, with the goal of detecting all similarities as well as rearrangements, insertions, deletions, and polymorphisms (2).
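returning briefly to gene finding in intron-free genomes: the reason the absence of introns simplifies the task can be seen in a minimal sketch that scans for open reading frames (orfs), that is, an atg start codon followed in the same frame by a stop codon. this is only a naive illustration under stated assumptions. real gene finders rely on statistical sequence models and also scan the reverse strand, whereas the hypothetical function `find_orfs`, its `min_codons` parameter, and the toy sequence below cover the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand open reading frames.

    start is the index of the ATG, end is one past the stop codon, so
    sequence[start:end] is the full ORF including the stop codon.
    min_codons counts the codons before the stop (start codon included).
    Hypothetical helper for illustration; not a real gene finder.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy sequence with a single short ORF starting at index 2.
example_sequence = "CCATGAAACCCGGGTAGTT"
example_orfs = find_orfs(example_sequence)
print(example_orfs)
```

in a genome with introns a coding region is interrupted, so a contiguous scan of this kind would miss or truncate genes. that is why organisms with many introns need special-purpose gene finders.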
with the increasing availability of complete genome sequences from multiple, closely related species, such comparisons are providing a powerful tool for genomic analysis. using suffix trees (data structures that contain all of the substrings of a particular sequence and can be built and searched in linear time), this computational task can be accomplished in minimal time and space. because the suffix tree algorithm is both time and space efficient, it is able to align large eukaryotic chromosomes with only slightly greater requirements than those for bacterial genomes (2). bacterial genome annotation. the major goal of bacterial genome annotation is to identify the functions of all genes in a genome as accurately and consistently as possible, initially by using automated annotation methods for preliminary assignment of functions to genes, followed by a second stage of manual curation by teams of scientists. the family enterobacteriaceae encompasses a diverse group of bacteria including many of the most important human pathogens (salmonella, yersinia, klebsiella, shigella), as well as one of the most enduring laboratory research organisms, the nonpathogenic escherichia coli k12. many of these pathogens have been subject to genome sequencing or are under study. genome comparisons among these organisms have revealed the presence of a core set of genes and functions along a generally collinear genomic backbone. however, there are also many regions and points of difference, such as large insertions and deletions (including pathogenicity islands), integrated bacteriophages, small insertions and deletions, point mutations, and chromosomal rearrangements (5). the first genome sequence of escherichia coli k12 (reference strain mg1655) was completed and published in 1997 (6). later, the genomes of two other genotypes of e. coli, the enterohemorrhagic e. coli o157:h7 (ehec; strains edl933 and rimd 0509952-sakai) (7, 8) and the uropathogenic e.
coli (upec; strain cft073) (9), were sequenced and the information published. currently, it is accepted that shigellae are part of the e. coli species complex, and information on the genome of shigella flexneri strain 2a has been published (10). a comparison of all three pathogenic e. coli with the archetypal nonpathogenic e. coli k12 revealed that the genomes were essentially collinear, displaying conservation in both sequence and gene order (5). the genes that were predicted to be encoded within the conserved sequence displayed more than 95% sequence identity and have been termed the core genes. similar observations were made for the shigella flexneri genome, which also shares 3.9 mb of common sequence with e. coli (10). a comparison of the three e. coli genomes revealed that the genes shared by all genomes amounted to 2,996 (9), out of totals of 4,288, about 5,400, and about 5,500 predicted protein-coding sequences for e. coli k12, ehec, and upec, respectively (5). the region encoding these core genes is known as the backbone sequence. it was also apparent from these comparisons that interspersed throughout this backbone sequence were large regions unique to the different genotypes. moreover, several studies had shown that some of these unique loci were present in clinical disease-causing isolates but were apparently absent from their comparatively benign relatives (11). one such well-characterized region is the locus of enterocyte effacement (lee) in the enteropathogenic e. coli (epec). an epec infection results in effacement of the intestinal microvilli and the intimate adherence of bacterial cells to enterocytes. furthermore, epec also subverts the structural integrity of the cell and forces the polymerization of actin, which accumulates below the adhered epec cells, forming cup-like pedestals (12). this is called an attaching and effacing (ae) lesion. subsequently, lee was found in all bacteria known to be able to elicit an ae lesion (5). 
many regions similar to lee in the backbone sequence have been characterized in both gram-negative and gram-positive bacteria (13). this led to the concept of pathogenicity islands (pais) and the formulation of a definition to describe their features (5). typically, pais are inserted adjacent to stable rna genes and have an atypical g+c content. in addition to virulence-related functions, pathogenicity islands often carry genes encoding transposase or integrase-like proteins and are unstable and self-mobilizable (13, 14). it was also noted that pais possess a high proportion of gene fragments or disrupted genes when compared with the backbone regions (15). it is generally accepted that the pathogenic e. coli genotypes have evolved from a much smaller nonpathogenic relative by the acquisition of foreign dna. this laterally acquired dna is credited with conferring on the different genotypes the ability to colonize alternative niches in the host and the ability to cause a range of different disease outcomes (5). although they share some of the features of pais and are considered to be parts of the pais, some genomic loci are unlikely to impinge on pathogenicity. to take account of this, the concept of pais has been extended to include islands or strain-specific loops, which represent discrete genetic loci that are lineage-specific but are as yet not known to be involved in virulence (7, 8). currently, there are more than 2,300 salmonella serovars in two species, s. enterica and s. bongori. all salmonellae are closely related, sharing a median dna identity for the reciprocal best match of between 85% and 95% (16, 17). despite their homogeneity, there are still significant differences in the pathogenesis and host range of the different salmonella serovars. thus, whereas s. enterica subspecies enterica serovar typhi (s. typhi) is only pathogenic to humans causing severe typhoid fever, s. 
typhimurium causes gastroenteritis in humans but also a systemic infection in mice and has a broad host range (16) . like e. coli, the salmonellae are also known to possess pais, known as salmonella pathogenicity islands (spis). it is thought that spis have been acquired laterally. for example, the gene products encoded by spi-1 (18, 19) and spi-2 (20, 21) have been shown to play important roles in the different stages of the infection process. both of these islands possess type iii secretion systems and their associated secreted protein effectors. spi-1 is known to confer on all salmonellae the ability to invade epithelial cells. spi-2 is important in various aspects of the systemic infection, allowing salmonella to spread from the intestinal tissue into the blood and eventually to infect, and survive within, the macrophages of the liver and spleen (22) . spi-3, like lee and pai-1 of upec, is inserted alongside the selc trna gene and carries the gene mgtc, which is required for the intramacrophage survival and growth in the low-magnesium environment thought to be encountered in the phagosome (23) . other salmonella spis encode type iii-secreted effector proteins, chaperone-usher fimbrial operons, vi antigen biosynthetic gene, a type ivb pilus operon, and many other determinants associated with the salmonellae enteropathogenicity (15). although the mobile nature of pais is frequently discussed in the literature, there is little direct experimental evidence to support these observations. one possible explanation for this may be that on integration, the mobility genes of the pais subsequently become degraded, thereby fixing their position (5) . certainly, there is evidence to support this hypothesis, as many proposed pais carry integrase or transposase pseudogenes or remnants. one excellent example of this is the high-pathogenicity island (hpi) first characterized in yersinia (24) . 
the yersinia hpis can be split into two lineages based on the integrity of the phage integrase gene (int) carried in the island: (i) y. enterocolitica biotype 1b and (ii) y. pestis and y. pseudotuberculosis. the y. enterocolitica hpi int gene carries a point mutation, whereas the analogous gene is intact in the y. pestis and y. pseudotuberculosis hpis. the yersinia hpi is a 35- to 43-kb island that possesses genes for the production and uptake of the siderophore yersiniabactin, as well as genes, such as int, thought to be involved in the mobility of the island. hpi-like elements are widely distributed in enterobacteria, including e. coli, klebsiella, enterobacter, and citrobacter spp., and like many prophages, these hpis are found adjacent to asn-trna genes (8). trna genes are common sites for bacteriophage integration into the genome (25). integration at these sites typically involves site-specific recombination between short stretches of identical dna located on the phage (attp) and at the integration site on the bacterial genome (attb). the trna genes represent common sites for the integration of many other pais and bacteriophages, with the selc trna locus being the most heavily used integration site in the enterics (9). integrated bacteriophages, also known as prophages, are also commonly found in bacterial genomes (5). for example, of the unique regions (s loops) of e. coli o157:h7 strain edl933 (ehec), nearly 50% were phage related. in addition to the 18 prophage sequences detected in the genome of ehec strain sakai (8), the genomes of e. coli k12, upec, and s. flexneri have all been shown to carry multiple prophage or prophage-like elements (6, 7, 9, 10). moreover, comparison of the genome sequences of ehec o157:h7 strain edl933 and strain sakai revealed marked variations in the complement and integration sites of the prophages, as well as in internal regions within highly related phages (8, 26). 
in addition to genes essential for their own replication, phages often carry genes that, for example, prevent superinfection by other bacteriophages, such as old and tin (27, 28). however, other genes carried in prophages appear to be of nonphage origin and can encode determinants that enhance the virulence of the bacterial host by a process known as lysogenic conversion (29). in addition to the presence of the lee pai and the ability to elicit ae lesions, another defining characteristic of the enterohemorrhagic e. coli (ehec) is the production of shiga toxins (stx). the shiga toxins represent a family of potent cytotoxins that, on entry into the eukaryotic cell, act as glycosylases by cleaving the 28s ribosomal rna (rrna), thereby inactivating the ribosome and consequently preventing protein synthesis (30). other enteric pathogens such as s. typhi, s. typhimurium, and y. pestis are also known to possess significant numbers of prophages (15, 16, 31). the principal virulence determinants of the salmonellae are the type iii secretion systems, carried by spi-1 and spi-2, and their associated protein effectors (32, 33). a significant number of these type iii secreted effector proteins are present in the genomes of prophages and have a dramatic influence on the ability of their bacterial hosts to cause disease (5).

small insertions and deletions. even though the large pais play a major role in defining the phenotypes of different strains of the enteric bacteria, there are many other differences resulting from small insertions and deletions, which must be taken into account when considering the overall genomic picture of enterobacteriaceae (5). thus, the comparisons between e. coli k12 and e. coli o157:h7 and between s. typhi and s. typhimurium have indicated the existence of many small differences aside from the large pathogenicity islands. 
for example, counting the separate insertion and deletion events in the s. typhi and s. typhimurium comparison shows 145 events of 10 genes or fewer compared with 12 events of 20 genes or more. furthermore, comparison between s. typhi and e. coli revealed 504 events of 10 genes or fewer compared with just 25 events of 20 genes or more. even taking into account that the larger islands contain many more genes per insertion or deletion event, it becomes clear that nearly equivalent numbers of species-specific genes are attributable to insertion or deletion events involving 10 genes or fewer as are due to events involving 20 genes or more. these data lend credence to the assertion that the acquisition and exchange of small islands is important in defining the overall phenotype of the organism (5). in the majority of cases studied to date, there is no evidence to suggest the presence of genes that may allow these small islands to be self-mobile. it is far more likely that small islands of this type are exchanged between members of a species and constitute part of the species gene pool. once acquired by one member of the species, they can be easily exchanged by generalized transduction mechanisms, followed by homologous recombination between the near identical flanking genes to allow integration into the chromosome (5). this sort of mechanism of genetic exchange would also make possible nonorthologous gene replacement, involving the exchange of related genes at identical regions in the backbone. a specific example to illustrate such a possibility is the observed capsular switching of neisseria meningitidis (34) and streptococcus pneumoniae (35, 36), for which different sets of genes responsible for the biosynthesis of different capsular polysaccharides are found at identical regions in the chromosome and flanked by conserved genes. 
the implied mechanism for capsular switching involves replacement of the polysaccharide-specific gene sites by homologous recombination between the chromosome and exogenous dna in the flanking genes (5) . point mutations and pseudogenes. one of the most surprising observations to come from enterobacterial genome research has been the discovery of a large number of pseudogenes. the pseudogenes appeared to be untranslatable due to the presence of stop codons, frameshifts, internal deletions, or insertion sequence (is) element insertions. the presence of pseudogenes seems to run contrary to the general assumption that the bacterial genome is a highly "streamlined" system that does not carry "junk dna" (5). for example, salmonella typhi, the etiologic agent of typhoid fever, is host restricted and appears only capable of infecting a human host, whereas s. typhimurium, which causes a milder disease in humans, has a much broader host range. upon analysis, the genome of s. typhi contained more than 200 pseudogenes (15) , whereas it was predicted that the number of pseudogenes in the genome of s. typhimurium would be around 39 (16) . from this observation, it becomes clear that the pseudogenes in s. typhi were not randomly spread throughout its genome-in fact, they were overrepresented in genes that were unique to s. typhi when compared with e. coli, and many of the pseudogenes in s. typhi have intact counterparts in s. typhimurium that have been shown to be involved in aspects of virulence and host interaction. given this distribution of pseudogenes, it has been suggested that the host specificity of s. typhi may be the result of the loss of its ability to interact with a broader range of hosts caused by functional inactivation of the necessary genes (15) . in contrast with other microorganisms containing multiple pseudogenes, such as mycobacterium leprae (37) , most of the pseudogenes in s. 
typhi were caused by a single mutation, suggesting that they have been inactivated relatively recently. taken together, these observations suggest an evolutionary scenario in which the recent ancestor of s. typhi had changed its niche in a human host, evolving from an ancestor (similar to s. typhimurium) limited to localized infection and invasion around the gut epithelium into one capable of invading the deeper tissues of the human host (5). a similar evolutionary scenario has been suggested for another recently evolved enteric pathogen, yersinia pestis. this bacterium has also recently changed from a gut bacterium (y. pseudotuberculosis), transmitted via the fecal-oral route, to an organism capable of using a flea vector for systemic infection (38, 39). again, this change in niche was accompanied by pseudogene formation, and genes involved in virulence and host interaction are overrepresented in the set of genes inactivated (31). yet another example of such an evolutionary scenario is shigella flexneri 2a, a member of the species e. coli, which is predicted to have more than 250 pseudogenes and is again restricted to the human body (10). all of these organisms demonstrate that enterobacterial evolution has been a process that has involved both gene loss and gene gain, and that the remnants of the genes lost in the evolutionary process can be readily detected (5).

the focus in the postgenomic era is on functional genomics, in which proteomics plays an essential role. the living cell is a dynamic and complex system that cannot be predicted from the genome sequence. whereas the genome will disclose important information on the biology of the organism, it is static and will not reveal information on the expression of a particular gene, on posttranslational modifications, or on how a protein is regulated in a specific biological situation (40). 
thus, whereas the complete genome sequence provides the basis for experimental identification of expressed proteins at the cellular level, very little has been accomplished to identify all expressed and potentially modified proteins. direct investigation of the total content of proteins in a cell is the task of proteomics. the proteome is defined as the complete set of posttranslationally modified and processed proteins in a well-defined biological environment under specific circumstances, such as growth conditions and time of investigation (40, 41). a proteomics study proceeds in two separate steps: separation of the proteins in a sample, followed by identification of the proteins. the common methodology used for separating proteins is two-dimensional polyacrylamide gel electrophoresis (2d page). the principal method for large-scale identification is mass spectrometry (ms), but other identification methods, such as n-terminal sequencing, immunoblotting, overexpression, spot colocalization, and gene knockouts, can also be used. because of its high resolving power, 2d page is currently the best methodology to achieve global visualization of the proteins of a microorganism. in the first dimension, isoelectric focusing is carried out to separate the proteins in a ph gradient according to their isoelectric point (pi). in the second dimension, the proteins are separated according to their molecular weight by sds-page (sodium dodecyl sulfate-page). the resulting gel image presents itself as a pattern of spots in which pi and the relative molecular weight (m r) can be read as in a coordinate system (40). a critical step during the 2d page procedure is the sample preparation, as there is no single method that can be universally applied because different reagents are superior with respect to different samples. to this end, chaotropes such as urea, which act by changing the parameters of the solvent, are used in most 2d page procedures. 
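the first-dimension separation described above depends on each protein's isoelectric point. the sketch below estimates the pi of a peptide from its sequence by summing henderson-hasselbalch charge terms and searching for the ph of zero net charge by bisection. the pka values are illustrative assumptions (published scales differ slightly), the function names are hypothetical, and posttranslational modifications, which shift real spots on a gel, are ignored.

```python
# Illustrative side-chain and terminal pKa values (an assumption for this
# sketch; different published scales give slightly different numbers).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_TERM" else seq.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))   # fraction protonated (+1)
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_TERM" else seq.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))   # fraction deprotonated (-1)
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: pI lies at higher pH
        else:
            high = mid   # negatively charged (or neutral): pI lies lower
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDDGG", "ACDEFGHIK", "KKKKGG"):  # toy sequences
        print(peptide, round(isoelectric_point(peptide), 2))
```

bisection works here because the net charge decreases monotonically as ph rises, so there is exactly one zero crossing between ph 0 and ph 14.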
major problems to overcome in 2d page sample preparation arise because of limited entry into the gel of high-molecular-weight proteins and the presence of highly hydrophobic and/or basic proteins (42, 43). for protein separation, the protein mixture is loaded onto an acrylamide gel strip in which a ph gradient is established. when a high voltage is applied over the strip, the proteins will focus at the ph at which they carry zero net charge. the ph gradient is established during the focusing using either carrier ampholytes in a slab gel (44) or a precast polyacrylamide gel with an immobilized ph gradient (ipg) (45). the latter method is advantageous because of improved reproducibility. samples are preferably applied to ipg dry strips by rehydration. rehydration of dried ipgs under application of a low voltage (10 to 50 v) has significantly improved the recovery, especially of high-molecular-weight proteins. mass spectrometry is the method of choice for identifying proteins in proteomics. the proteins are converted into gas phase ions whose masses can be measured with an accuracy better than 50 ppm (40). two widely used techniques for ionization are matrix-assisted laser desorption ionization (maldi) (46) and electrospray ionization (47). maldi is usually coupled with a tof (time of flight) device for measuring the masses. the ionized peptides are accelerated by an applied electric field, and their time of flight to the detector is used to calculate their mass/charge ratio (40). in electrospray ionization, the peptides are sprayed into the spectrometer (47). ionization is achieved when the charged droplets evaporate. an alternative procedure for measuring masses is the ion trap (48), which selects ions with certain mass/charge ratios by keeping them in sinusoidal motion between two electrodes. 
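the time-of-flight principle and the ppm accuracy figure mentioned above can be made concrete with a short calculation. the sketch below assumes an idealized linear tof instrument: ions start at rest, are accelerated through a potential difference u, and then drift a field-free distance l, so that z·e·u = ½·m·v² and m/z = 2·e·u·t²/l². the accelerating voltage (20 kv), drift length (1 m), and function names are assumptions chosen for illustration; real instruments add reflectrons, delayed extraction, and empirical calibration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per dalton


def flight_time(mz: float, voltage: float, length: float) -> float:
    """Drift time in seconds for an ion of mass/charge `mz` (Da per charge)."""
    # z*e*U = 1/2 * m * v^2  ->  v = sqrt(2*e*U / (m/z))
    velocity = math.sqrt(2.0 * ELEMENTARY_CHARGE * voltage / (mz * DALTON))
    return length / velocity


def mz_from_flight_time(t: float, voltage: float, length: float) -> float:
    """Invert the relation above: m/z = 2*e*U*t^2 / L^2 (in Da per charge)."""
    return 2.0 * ELEMENTARY_CHARGE * voltage * t ** 2 / (length ** 2 * DALTON)


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass measurement error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    voltage, length = 20_000.0, 1.0          # assumed instrument settings
    t = flight_time(1500.0, voltage, length)
    print(f"flight time for m/z 1500: {t * 1e6:.2f} microseconds")
    print(f"recovered m/z: {mz_from_flight_time(t, voltage, length):.4f}")
    print(f"error of 1000.05 vs 1000.00: {ppm_error(1000.05, 1000.0):.1f} ppm")
```

because flight time scales with the square root of m/z, heavier ions arrive later, and a 50 ppm tolerance on a 1,000 da peptide corresponds to a mass window of only ±0.05 da.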
in 1995, the first microbe sequencing project, haemophilus influenzae (a bacterium causing upper respiratory infection), was completed with a speed that stunned scientists (http://www3.niaid.nih.gov/research/topics/pathogen/introduction.htm). encouraged by the success of that initial effort, researchers have continued to sequence an astonishing array of other medically important microorganisms. to this end, niaid has made significant investments in large-scale sequencing projects, including projects to sequence the complete genomes of many pathogens, such as the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera, as well as organisms that are considered agents of bioterrorism. in addition, niaid is collaborating with other funding agencies to sequence larger genomes of protozoan pathogens such as the organism causing malaria. the availability of microbial and human dna sequences opens up new opportunities and allows scientists to perform functional analyses of genes and proteins in whole genomes and cells, as well as to study the host's immune response and an individual's genetic susceptibility to pathogens. when scientists identify microbial genes that play a role in disease, drugs can be designed to block the activities controlled by those genes. because most genes contain the instructions for making proteins, drugs can be designed to inhibit specific proteins or to use those proteins as candidates for vaccine testing. genetic variations can also be used to study the spread of a virulent or drug-resistant form of a pathogen. niaid has launched initiatives to provide comprehensive genomic, proteomic, and bioinformatic resources. these resources, listed below, are available to scientists conducting basic and applied research on a broad array of pathogenic microorganisms (http://www3.niaid.nih.gov/research/topics/pathogen/initiatives.htm):

niaid's microbial sequencing centers (mscs). the niaid's microbial sequencing centers are state-of-the-art high-throughput dna sequencing centers that can sequence genomes of microbes and invertebrate vectors of infectious diseases. genomes that can be sequenced include microorganisms considered agents of bioterrorism and those responsible for emerging and re-emerging infectious diseases.

niaid's pathogen functional genomics resource center (pfgrc). the pathogen functional genomics resource center is a centralized facility that provides scientists, at no cost, with the resources and reagents necessary to conduct functional genomics research on human pathogens and invertebrate vectors. the pfgrc provides scientists with genomic resources and reagents such as microarrays, protein expression clones, genotyping, and bioinformatics services. the pfgrc supports the training of scientists in the latest techniques in functional genomics and emerging genomic technologies.

niaid's proteomics centers. the primary goal of these centers is to characterize the pathogen and/or host cell proteome by identifying proteins associated with the biology of the microorganisms, mechanisms of microbial pathogenesis, innate and adaptive immune responses to infectious agents, and/or non-immune-mediated host responses that contribute to microbial pathogenesis. it is anticipated that the research programs will discover targets for potential candidates for the next generation of vaccines, therapeutics, and diagnostics. this will be accomplished by using existing proteomics technologies, augmenting existing technologies, and creating novel proteomics approaches, as well as performing early-stage validation of these targets.

administrative resource for biodefense proteomic centers (arbpcs). the arbpcs consolidate data generated by each proteomics research center and make it available to the scientific community through a publicly accessible web site. this database (www.proteomicsresource.org) serves as a central information source for reagents and validated protein targets and has recently been populated with the first data released.

niaid's bioinformatics resource centers. the niaid's bioinformatics resource centers will design, develop, maintain, and continuously update multiorganism databases, especially those related to biodefense. organisms of particular interest are the niaid category a to c priority pathogens and those causing emerging and re-emerging diseases. the ultimate goal is to establish databases that will allow scientists to access a large amount of genomic and related data. this will facilitate the identification of potential targets for the development of vaccines, therapeutics, and diagnostics. each contract will include establishing and maintaining an analysis resource that will serve as a companion to the databases to provide, develop, and enhance standard and advanced analytical tools to help researchers access and analyze data.

tb structural genomics consortium. a collaboration of scientists in six countries formed to determine and analyze the structures of about 400 proteins from mycobacterium tuberculosis. the group seeks to optimize the technical and management aspects of high-throughput structure determination and will develop a database of structures and functions. niaid, which is co-funding this project with nigms, anticipates that this information will also lead to the design of new and improved drugs and vaccines for tuberculosis.

structural genomics of pathogenic protozoa consortium. this consortium is aiming to develop new ways to solve protein structures from organisms known as protozoans, many species of which cause deadly diseases such as sleeping sickness, malaria, and chagas' disease.

the national institute of allergy and infectious diseases is providing support to the microbial genome sequencing centers (mscs) at the j. 
craig venter institute [formerly, the institute for genomic research (tigr)] and at the broad institute of the massachusetts institute of technology (mit) and harvard university for the rapid and cost-efficient production of high-quality microbial genome sequences and primary annotations. niaid's mscs (http://www.niaid.nih.gov/dmid/genomes/mscs/) are responding to the priorities of the scientific community and of national and federal agencies for genome sequencing and for filling in sequence gaps, thereby providing genome sequencing data for multiple uses, including understanding the biology of microorganisms, forensic strain identification, and identifying targets for drugs, vaccines, and diagnostics. in addition, the niaid's mscs have developed web sites that provide descriptive information about the sequencing projects and their progress (http://www.broad.mit.edu/seq/msc/ and http://msc.tigr.org/status.shtml). genomes to be sequenced include microorganisms considered to be potential agents of bioterrorism (niaid category a, b, and c), related organisms, clinical isolates, closely related species, invertebrate vectors of infectious diseases, and microorganisms responsible for emerging and re-emerging infectious diseases. in addition, in response to a recommendation from a 2002 niaid-sponsored blue ribbon panel on bioterrorism and its implication for biomedical research to support genomic sequencing of microorganisms considered agents of bioterrorism and related organisms, the mscs will address the institute's need for additional sequencing of such microorganisms and invertebrate vectors of disease and/or those that are responsible for emerging and re-emerging diseases (http://www.niaid.nih.gov/dmid/genomes/mscs/overview.htm). 
the panel's recommendation included careful selection of species, strains, and clinical isolates to generate genomic data for different uses such as identification of strains and targets for diagnostics, vaccines, antimicrobials, and other drug developments. the mscs have the capacity to rapidly and cost-effectively sequence genomic dna and provide preliminary identification of open reading frames and annotation of gene function for a wide variety of microorganisms, including viruses, bacteria, protozoa, parasites, and fungi. sequencing projects will be considered for both complete, finished genome sequencing and other levels of sequence coverage. the choice and justification of complete versus draft sequence is likely to depend on the nature and scope of the proposed project. large-scale prepublication information on genome sequences is a unique research resource for the scientific community, and rapid and unrestricted sharing of microbial genome sequence data is essential for advancing research on infectious agents responsible for human disease. therefore, it is anticipated that prepublication data on genome sequences produced at the niaid microbial sequencing centers will be made freely and publicly available via an appropriate publicly searchable database as rapidly as possible. niaid-supported investigators have completed 131 genome sequencing projects for 105 bacteria, 8 fungi, 15 parasitic protozoa, 2 invertebrate vectors of infectious diseases, and one plant (http://www.niaid.nih.gov/dmid/genomes/mscs/req process.htm). in addition, niaid completed the sequence for 1,467 influenza genomes. in 2006, genome sequencing projects were completed for 22 pathogens as described in section 23.16.2. genome sequencing data are publicly available through web sites such as genbank, and data for the influenza genome sequences were published in 2006. 
furthermore, through the niaid's microbial sequencing centers, the niaid has funded the sequencing, assembly, and annotation of three invertebrate vectors of infectious diseases. in 2006, the final sequence, assembly, and annotation of aedes aegypti were released, as well as the preliminary sequence and assembly of the genomes for ixodes scapularis and culex pipiens; the final results for i. scapularis and c. pipiens will be released in 2007. in 2006, niaid supported nearly 40 large-scale genome sequencing projects for additional strains of viruses, bacteria, fungi, parasites, and invertebrate vectors. new projects included additional strains of borrelia, clostridium, escherichia coli, salmonella, streptococcus pneumoniae, ureaplasma, coccidioides, penicillium marneffei, talaromyces stipitatus, lacazia loboi, histoplasma capsulatum, blastomyces dermatitidis, cryptosporidium muris, and dengue viruses, as well as additional sequencing and annotation of aedes aegypti. in 2004, niaid launched the influenza genome sequencing project (igsp) (http://www.niaid.nih.gov/dmid/genomes/mscs/influenza.htm), which has provided the scientific community with complete genome sequence data for thousands of human and animal influenza viruses. the influenza sequence data have been rapidly placed in the public domain through genbank, an international searchable database, and through the niaid-funded bioinformatics resource center with accompanying data analysis tools. all of the information will enable scientists to further study how influenza viruses evolve, spread, and cause disease and may ultimately lead to improved methods of treatment and prevention. this sequence information is now providing a larger and more representative sample of influenza than was previously publicly available. 
the influenza genome sequencing project has the capacity to sequence more than 200 genomes per month and is a collaborative effort among niaid (including the niaid's division of intramural research), the national center for biotechnology

niaid is continuing its support for the pathogen functional genomics resource center (pfgrc) (http://www.niaid.nih.gov/dmid/genomes/pfgrc/default.htm) at the institute for genomic research (tigr) (currently part of the j. craig venter institute). the pfgrc was established in 2001 to provide and distribute to the broader research community a wide range of genomic resources, reagents, data, and technologies for the functional analysis of microbial pathogens and invertebrate vectors of infectious diseases. in addition, the pfgrc was expanded to provide the research community with the resources and reagents needed to conduct both basic and applied research on microorganisms responsible for emerging and re-emerging infectious diseases and those considered agents of bioterrorism. one of the priorities for the pfgrc has been to provide the scientific community with access to the reagents and the genomic and proteomic data that the pfgrc has generated. a new software tool, called snp filtering tool, was developed to analyze single nucleotide polymorphism (snp) data from affymetrix resequencing arrays. enhancements have been made to other tools for microarray data analysis, including a tool for analyzing slide images. a new layout for the tigr-pfgrc web site (http://pfgrc.tigr.org/) has been developed and launched and is intended to make it easier for the scientific community to access the pfgrc research and development projects, poster presentations, publications, reagents, and their descriptions and data. 
the number of organism-specific microarrays produced and distributed to the scientific community increased to 28. pfgrc has continued to collaborate with the national institute of dental and craniofacial research (nidcr/nih) in producing and distributing five organism-specific microarrays, including arrays for actinobacillus actinomycetemcomitans, fusobacterium nucleatum, porphyromonas gingivalis, streptococcus mutans, and treponema denticola. pfgrc has also developed the methods and pipeline for generating organism-specific clones for protein expression. seven complete clone sets are now available for human severe acute respiratory syndrome coronavirus (sars-cov), bacillus anthracis, yersinia pestis, francisella tularensis, streptococcus pneumoniae, staphylococcus aureus, and mycobacterium tuberculosis. in addition, individual custom clone sets are available for more than 20 organisms upon request. comparative genomic analysis of the available bacillus anthracis sequence data, together with the snps it revealed, was used to develop a new bacterial typing system for screening anthrax strains. this system allowed niaid-funded scientists to define detailed phylogenetic lineages of bacillus anthracis and to identify three major lineages (a, b, c) with the ancestral root located between the a+b and c branches. in addition, a genotyping genechip, which has been developed and validated for bacillus anthracis, will be used to genotype about 300 different strains of bacillus anthracis. pfgrc has developed additional comparative genomic platforms, both for resequencing a bacterial genome on a chip to identify sequence variation among strains and for discovering novel genes. a pilot project has been completed with streptococcus pneumoniae for sequencing different strains using resequencing chip technology.
in collaboration with the department of homeland security (dhs), a resequencing chip has been developed and is now being used to screen a number of francisella tularensis strains to identify snps and genetic polymorphisms. sixteen francisella tularensis strains are being genotyped by using the newly developed resequencing chip. additional collaboration with dhs led to the development of a gene discovery platform aimed at discovering novel genes among different strains of yersinia pestis. to this end, nine strains are being analyzed using this platform to discover novel gene sets. pfgrc is developing proteomics technologies for protein arrays and comparative profiling of microbial proteins. a protein expression platform is under development, and a pilot comparative protein profiling project using staphylococcus aureus has already been completed and published. a protein profiling project using yersinia pestis to compare proteomes in different strains is now under way, complementing ongoing proteomics projects supported by niaid; numerous proteins are currently being identified that differ in abundance under different growth conditions. a new project was added in 2006 for comparative profiling of the proteomes of e. coli and shigella dysenteriae to provide the scientific community with reference data on differential protein expression in animal models versus cultured systems infected with the pathogen.
in 2006, niaid continued to support the population genetics analysis program: immunity to vaccines/infections. a joint project between niaid's division of allergy, immunity, and transplantation (dait) and the division of microbiology and infectious diseases (dmid), this program aims to identify associations between specific genetic variations or polymorphisms in immune response genes and the susceptibility to infection or response to vaccination, with a focus on one or more niaid category a to c pathogens and influenza.
niaid awarded six centers to study the genetic basis for the variable human response to immunization (smallpox, typhoid fever, cholera, and anthrax) and susceptibility to disease (tuberculosis, influenza, encapsulated bacterial diseases, and west nile virus infection). the centers are comparing genetic variance in specific immune response genes as well as more generally associated genetic variance across the whole genome in affected and nonaffected individuals. the physiologic differences associated with these genome variations will also be studied. in 2006, these centers focused on recruiting the samples needed for genotyping. for example, more than 1,100 smallpox-vaccinated individuals and controls were recruited and blood and peripheral blood mononuclear cell (pbmc) samples were obtained for whole genome association studies, which were conducted in 2007. in another example, one of the centers used genome-wide linkage approaches to map, isolate, and validate human host genes that confer susceptibility to influenza infection. nearly 1,000 individuals with susceptibility to influenza and 2,000 control individuals were recruited using an iceland genealogy database. by late 2006, the center had recruited more than 600 individuals and had genotyped more than 500 in this subproject of the study.
during 2006, niaid continued its support of the eight bioinformatics resource centers (brcs) (http://www.niaid.nih.gov/dmid/genomes/brc/default.htm) with the goal of providing the scientific community with a publicly accessible resource that allows easy access to genomic and related data for the niaid category a to c priority pathogens, invertebrate vectors of infectious diseases, and pathogens causing emerging and re-emerging infectious diseases.
the brcs are supported by multidisciplinary teams of scientists who develop new and improved computational tools and interfaces that facilitate the analysis and interpretation of the genomic-related data by the scientific community. in 2006, each publicly accessible brc web site continued to be developed, the user interfaces were improved, and a variety of genomics data types were integrated, including gene expression and proteomics information, host/pathogen interactions, and signaling/metabolic pathways data. a public portal of information, data, and open-source software tools generated by all the brcs is available at http://www.brccentral.org/. in 2006, many genomes of microbial species were sequenced by the niaid's microbial sequencing centers as well as by other national and international sequencing efforts, and the brcs provided either long-term maintenance of the genome sequence data and annotation or the initial annotation for a number of particular microbial genomes. for example, niaid's brc vectorbase collaborated with niaid's mscs and the scientific community to annotate the genome of aedes aegypti and will continue the curation of this genome.
in 2006, niaid continued to support contracts for seven biodefense proteomics research centers (bprcs) to characterize the proteome of niaid category a to c bioweapon agents and to develop and enhance innovative proteomic technologies and apply them to the understanding of the pathogen and/or host cell proteome (http://www.niaid.nih.gov/dmid/genomes/prc/default.htm). these centers conducted a range of proteomics studies covering six category a pathogens, six category b pathogens, and one category c emerging disease organism. data, reagents, and protocols developed in the research centers are released to the web site of the niaid-funded administrative resource for biodefense proteomics research centers (www.proteomicsresource.org) within 2 months of validation.
the administrative resource web site was created to integrate the diverse data generated by the bprcs. examples of progress include the following. in 2005, more than 700 potential targets for vaccines, therapeutics, and diagnostics were generated. in 2006, more than 2,400 potential new pathogen targets for vaccines, therapeutics, and diagnostics were identified, and more than 5,700 new corresponding host targets were generated. in addition: (i) two more sars-cov structures were solved. (ii) ninety-six percent of the orfs for b. anthracis were cloned with 47% sequence validated. (iii) a custom b. anthracis affymetrix genechip was developed. (iv) fifty-three polyclonal sera generated against novel toxoplasma gondii and cryptosporidium parvum proteins were characterized, and accurate time and mass tag databases were populated for salmonella typhi, monkeypox, and vaccinia virus.
niaid staff are participating in two related nih-wide genomic initiatives that focus on examining and identifying genetic variations across the human genome that may be linked to, or may influence, susceptibility to or risk of a common human disease, such as asthma, autoimmunity, cancer, eye diseases, mental illness, and infectious diseases, or the response to a treatment such as a vaccine. the approach is to conduct genome-wide association studies in which a dense set of snps across the human genome is genotyped in a large defined group of control and disease samples, with the hope of identifying an association between a genetic variant in a gene or group of genes and the disease.
niaid has continued to participate in a coordinated federal effort in biodefense genomics and is a major participant in the national inter-agency genomics sciences coordinating committee (nigscc), which includes many federal agencies.
this committee was formed in 2002 to address the most serious gaps in the comprehensive genomic analysis of microorganisms considered agents of bioterrorism. a comprehensive list of microorganisms considered agents of bioterrorism was developed that identifies species, strains, and clinical and environmental isolates that have been sequenced, that are currently being sequenced, and that should be sequenced. in 2003, the committee focused on category a agents and provided the cdc with new technological approaches for sequencing additional smallpox viral strains. affymetrix-based microarray technology for genome sequencing was established, as well as additional bioinformatics expertise for analyzing the genomic sequencing data. in 2004, as a result of this continuing coordination of federal agencies in genome sequencing efforts for biodefense, niaid developed a formal interagency agreement with the department of homeland security (dhs) to perform comparative genomics analysis to characterize biothreat agents at the genetic level and to examine polymorphisms for identifying genetic variations and relatedness within and between species.
niaid continues to participate in the microbe project interagency working group (iwg), which in 2001 developed a coordinated, interagency, 5-year action plan on microbial genomics, including functional genomics and bioinformatics (http://www.ostp.gov/html/microbial/start.htm). in 2003, the microbe project interagency working group developed guidelines for sharing prepublication genomic sequencing data that serve as guiding principles, so that federal agencies have consistent policies for sharing sequencing data with the scientific community and can then implement their own detailed version of the data release plan.
in 2004, the microbe project iwg supported a workshop on "an experimental approach to genome annotation," which was coordinated by the american society for microbiology and discussed issues faced in annotating microbial genome sequences that have been completed or will be completed in the next few years. in 2005, the microbe project iwg developed a strategic plan and implementation steps as an updated action plan for coordinating microbial genomics among federal agencies, and the plan was finalized in 2006.
niaid continues to participate with other federal agencies in coordinating medical diagnostics for biodefense and influenza across the federal government and in facilitating the development of a set of contracts to support advanced development toward the approval of new or improved point-of-care diagnostic tests for the influenza virus and their early manufacturing and commercialization.
niaid continues to participate in the nih roadmap initiatives, including providing lead science officers for one of the national centers for biomedical computation and one of the national technology centers for networks and pathways. seven biomedical computing centers are developing a universal computing infrastructure and creating innovative software programs and other tools that would enable the biomedical community to integrate, analyze, model, simulate, and share data on human health and disease. five technology centers were created in 2004 and 2005 to cooperate in a u.s. national effort to develop new technologies for proteomics and the study of dynamic biological systems.
supramolecular architecture of severe acute respiratory syndrome coronavirus (sars-cov). coronaviruses derive their name from their protruding oligomers of the spike glycoprotein (s), which forms a coronal ridge around the virion.
the understanding of the virion and its organization has previously been limited to x-ray crystallography of homogeneous symmetric virions, whereas coronaviruses are neither homogeneous nor symmetric. in this study, a novel methodology of single-particle image analysis was applied to selected coronavirus features to obtain a detailed model of the oligomeric state and spatial relationships among viral structural proteins. the two-dimensional structures of s, m, and n structural proteins of sars-cov and two other coronaviruses were determined and refined to a resolution of approximately 4 nm. these results demonstrated a higher level of supramolecular organization than was previously known for coronaviruses and provided the first detailed view of the coronavirus ultrastructure. understanding the architecture of the virion is a necessary first step to defining the assembly pathway of sars-cov and may aid in developing new or improved therapeutics (49).
large-scale sequence analysis of avian influenza isolates. avian influenza is a significant global human health threat because of its potential to infect humans and result in a global influenza pandemic. however, very little sequence information for avian influenza virus (aiv) has been in the public domain. a more comprehensive collection of publicly available sequence data for aiv is necessary for research on influenza to understand how flu evolves, spreads, and causes disease, to shed light on the emergence of influenza epidemics and pandemics, and to uncover new targets for drugs, vaccines, and diagnostics. in this study, the investigators released genomic data from the first large-scale sequencing of aiv isolates, doubling the amount of aiv sequence data in the public domain. these sequence data include 2,196 aiv genes and 169 complete genomes from a diverse sample of birds.
the preliminary analysis of these sequences, along with other aiv data from the public domain, revealed new information about aiv, including the identification of a genome sequence that may be a determinant of virulence. this study provides valuable sequencing data to the scientific community and demonstrates how informative large-scale sequence analysis can be in identifying potential markers of disease (50).
genome sequencing project. the analysis of the first 209 full genome sequences from human influenza strains, deposited in genbank through the niaid influenza genome sequencing project, was published in 2006 (51). influenza isolates were chosen in a relatively unbiased manner, allowing a comprehensive look at the influenza virus population circulating within the same geographic region over several seasons, which provided a real picture of the dynamics of influenza virus mutation and evolution. analysis demonstrated that the circulating strains of influenza included alternative minor lineages that could provide genetic variation for the dominant strain. this may allow a novel strain to emerge within a human host and would explain the unexpected emergence of the fujian influenza strain in 2003-2004 that resulted in a vaccine mismatch. these findings demonstrate the usefulness of full genomic sequences for providing new information on influenza viruses and lend further support for the need for large-scale influenza sequencing and the availability of sequence data in the public domain. within the influenza community, public availability of influenza sequence data and sharing of strains has been an important issue. the niaid has been instrumental in promoting the sharing of influenza sequence information, notably by sequencing more than 1,400 complete influenza genome sequences and depositing the sequences in the public domain through genbank as soon as sequencing has been completed.
history of microbial genomics tools for gene finding and whole genome comparison interpolated markov models for eukaryotic gene finding computational gene finding in plants the genomes of pathogenic enterobacteria the complete genome sequence of escherichia coli k-12 genome sequence of enterohemorrhagic escherichia coli o157:h7 complete genome sequence of enterohemorrhagic escherichia coli o157:h7 and genomic comparison with a laboratory strain k-12 extensive mosaic structure revealed by the complete genome sequence of uropathogenic escherichia coli genome sequence of shigella flexneri 2a: insights into pathogenicity through comparison with genomes of escherichia coli k12 and o157 large, unstable inserts in the chromosome affect virulence properties of uropathogenic escherichia coli o6 strain 536 escherichia coli that cause diarrhea: enterotoxigenic, enteropathogenic, enteroinvasive, enterohemorrhagic, and enteroadherent pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution excision of large dna regions termed pathogenicity islands from trna-specific loci in the chromosome of an escherichia coli wild-type pathogen complete genome sequence of multiple drug resistant salmonella enterica serovar typhi ct18 complete genome sequence of salmonella enterica serovar typhimurium lt2 cloning and nucleotide sequence of the salmonella typhimurium lt2 gnd gene and its homology with the corresponding sequence of escherichia coli k12 a 40 kb chromosomal fragment encoding salmonella typhimurium invasion genes is absent from the corresponding region of the escherichia coli k-12 chromosome molecular genetic bases of salmonella entry into host cells identification of a virulence locus encoding a second type iii secretion system in salmonella typhimurium identification of a pathogenicity island required for salmonella survival in host cells pathogenicity islands and host adaptation of salmonella serovars the salmonella selc locus contains a 
pathogenicity island mediating intramacrophage survival the 102-kb unstable region of yersinia pestis comprises a high-pathogenicity island linked to a pigmentation segment which undergoes internal rearrangement transfer rna genes frequently serve as integration sites for prokaryotic genetic elements complete nucleotide sequence of the prophage vt2-sakai carrying the verotoxin 2 genes of the enterohemorrhagic escherichia coli o157:h7 derived from the sakai outbreak a novel mechanism of virus-virus interactions: bacteriophage p2 tin protein inhibits phage t4 dna synthesis by poisoning the t4 single-stranded dna binding protein, go32 the old exonuclease of bacteriophage p2 filamentous phages linked to virulence of vibrio cholerae shiga toxin: purification, structure, and function genome sequence of yersinia pestis, the causative agent of plague salmonella pathogenicity islands encoding type iii secretion systems the salmonella pathogenicity island-1 type iii secretion system capsule switching of neisseria meningitides capsules and cassettes: genetic organization of the capsule locus of streptococcus pneumoniae genetic and molecular characterization of capsular polysaccharide biosynthesis in streptococcus pneumoniae type 3 massive gene decay in the leprosy bacillus yersinia pestis -etiologic agent of plague yersinia pestis, the cause of plague, is a recently emerged clone of yersinia pseudotuberculosis microbial proteomics from proteins to proteomes: large scale protein identification by twodimensional electrophoresis and amino acid analysis membrane proteins and proteomics: un amour impossible? 
two-dimensional electrophoresis of membrane proteins: a current challenge for immobilized ph gradients new developments in isoelectric focusing isoelectric focusing in immobilized ph gradients: principle, methodology and some applications laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons electrospray ionization for mass spectrometry of large biomolecules ion trap mass spectrometry supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy large-scale sequence analysis of avian influenza isolates large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution
key: cord-023928-9a1w174h authors: thomas, neal j.; dahmer, mary k.; quasney, michael w. title: genetic predisposition to critical illness in the pediatric intensive care unit date: 2011-12-16 journal: pediatric critical care study guide doi: 10.1007/978-0-85729-923-9_11 sha: doc_id: 23928 cord_uid: 9a1w174h much progress has been made in the past decade in the understanding of the genetic contribution to the development of human disease in general, and critical care illness specifically. with the mapping of the human genome and on-going mapping of genetic polymorphisms and haplotypes in humans, the field of critical care is now in prime position to study the impact of genetics on common illnesses that affect children who require critical care, to examine how differences of the host defense response lead to variable outcomes in outwardly appearing similar disease states, and to study how genetic differences in response to therapy will help practitioners tailor therapeutic interventions to an individual child’s genetic composition. while we are still years away from true individualized medicine, we are now closer than ever to understanding why two children might respond to the same environmental insult in vastly different ways.
before being able to appreciate the advances in research that have been accomplished in relation to the genetic impact on critical illness in children in recent years, it is important to understand the basics of human genetics, and become familiar with the terminology that is utilized to discuss these remarkable advances. once the genetic basics are clear, discussion can then proceed to genetic associations that have been determined in critical illness in children. the nucleus of all cells holds chromosomes that contain deoxyribonucleic acid (dna), the genetic material that is inherited from parents. dna is responsible for determining the structure of the cell, the function and activity of the cell in response to various stimuli, and the interaction the cell has with other cells and the extracellular environment. the dna molecule consists of two chains of deoxyribonucleotides held together by complementary base pairs.
the deoxyribonucleotides contain the four nucleotide bases, adenine (a), thymine (t), guanine (g), and cytosine (c), and are covalently bound together by phosphodiester bonds linking the 5′ carbon of one deoxyribose group to the 3′ carbon of the next group. the two chains of deoxyribonucleotides are linked by hydrogen bonds between the a's of one strand and the t's of the other. likewise, the g's of one strand are linked by hydrogen bonds to the c's of the complementary strand. these two complementary strands form the dna double helix (fig. 11-1), with one strand running in the 5′ to 3′ direction while the other strand runs in the 3′ to 5′ direction. the order of nucleotide bases is termed the sequence and is read in the 5′ to 3′ direction. the genetic information of an individual is encoded by the precise positioning and order of these base pairs.
fig. 11-1 the four nucleotides of the dna double helix (from national human genome research institute's talking glossary of genetics (http://www.genome.gov/glossary.cfm#s))
the entire dna content of an organism is its genome. every cell of an organism contains two copies of the dna, with the exception of red blood cells, which lack a nucleus and dna, and sperm and egg cells, which contain one copy of the dna. overall, humans have 46 chromosomes, including 22 pairs of autosomal chromosomes and one pair of sex chromosomes. each chromosome is made up of a centromere and two telomeres (ends) (fig. 11-2). the two arms of the chromosome are the short arm (p) and long arm (q). the parts of the genome that contain nucleotide sequences that code for proteins are the genes, and it is estimated that the human genome contains about 20,000-30,000 genes. the structure of genes is very complex and highly variable.
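as a small illustration of the base-pairing rules and the 5′ to 3′ reading convention described above, the following python sketch derives the complementary strand of a short sequence. the function name `reverse_complement` and the example sequence are illustrative assumptions and do not come from the chapter.

```python
# Watson-Crick pairing as described in the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written in its own 5' to 3' direction.

    The input is assumed to be written 5' to 3'. Because the two strands run
    antiparallel, the complement is read in the opposite direction, so the
    input is reversed before each base is swapped for its partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


# Hypothetical six-base sequence, used only for illustration.
print(reverse_complement("ATGGCC"))  # GGCCAT
```

applying the function twice returns the original sequence, which is a quick check that the pairing table is symmetric.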
genes are made up of a variable number of exons, which contain the actual coding sequence for the proteins, and introns, which are noncoding regions that separate the exons. while the function of the introns is unclear, some disease processes have been found to be associated with certain nucleotide variations located in these intron regions (fig. 11-3). genes also have regulatory regions, including promoter sequences that generally reside at the 5′ end of the gene (referred to as upstream) and regions at the 3′ end associated with stability of the mrna. genetic recombination, which is the re-shuffling of genes from generation to generation, is the basis of genetic diversity in sexually reproducing organisms. the analysis of genetic recombination is a useful method of mapping genes in the genome. genetic recombination results in an exchange of genetic material between homologous chromosome pairs. this results in segments of dna being exchanged with the other chromosome of the pair, thereby shuffling the genetic material. the basic principle of linkage analysis, a method used to find disease-causing genes, relies on the genetic recombination frequency between two loci on a single chromosome. this allows the estimation of the relative distance between them, and is crucial for the mapping of genes in the genome. recombination frequencies can be measured by genotyping individuals in a family pedigree. the closer together two loci are on a chromosome, the less likely recombination is to occur between them. if loci are very close, they are said to be linked. the reliability of genetic linkage between loci is determined using the lod score, which is an estimate of whether two loci are likely to lie near each other on a chromosome and are therefore likely to be inherited together.
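the lod score can be made concrete with a short calculation. the sketch below assumes the simplest textbook setting, which the chapter does not spell out: a two-point analysis of phase-known meioses, in which the likelihood at recombination fraction θ is θ^R (1 − θ)^(N − R) for R recombinants among N meioses, and the score is the base-10 logarithm of the ratio of that likelihood to the likelihood under free recombination (θ = 0.5). the function names and the counts used (1 recombinant in 20 meioses) are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (simplifying assumption).

    LOD(theta) = log10( L(theta) / L(0.5) ), with
    L(theta) = theta**R * (1 - theta)**(N - R).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def odds_in_favor(lod: float) -> float:
    """Convert a LOD score back to odds in favor of linkage."""
    return 10.0 ** lod


# Hypothetical pedigree data: 1 recombinant observed among 20 meioses.
observed = lod_score(recombinants=1, meioses=20, theta=0.05)
print(round(observed, 2), observed >= 3.0)
print(odds_in_favor(3.0))  # 1000.0
```

because the score is a base-10 logarithm, each additional unit corresponds to a tenfold increase in the odds, and a recombination fraction of 0.5 (no linkage) gives a score of zero.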
a lod score of 3 or more, which represents odds of 1000:1 or greater in favor of linkage, is used to indicate statistically significant linkage and supports the conclusion that the two loci of interest are close. the development of genetic maps has been very useful in finding genes which may cause human disease. genes may be mapped to a particular location in the genome based on being inherited together with a marker of known map location, and with the assumption of no genetic recombination.
fig. 11-2 the structure of a chromosome, with the nucleus, telomeres, and centromere labeled (from national human genome research institute's talking glossary of genetics (http://www.genome.gov/glossary.cfm#s))
there are a number of polymorphic markers which may be utilized for genetic map construction, including minisatellites, microsatellites, and single nucleotide polymorphisms (snps). linkage disequilibrium (ld), often referred to as allelic association, is a measure of physical association between two alleles and occurs when closely linked alleles are inherited together during many generations. no significant degree of genetic recombination occurs between them, and they continue to be passed along together throughout generations. therefore, knowledge of one marker can be used to study the other. there are many potential benefits of identifying genes/gene variants involved in disease.
these include, but are not limited to, an improved understanding of the disease etiology, insight into the mechanisms of disease pathogenesis, an ability to develop an early disease risk assessment, the potential to discover novel therapeutic drug targets, the ability to estimate the therapeutic response to specific pharmacologic therapies, the possibility of targeted disease prevention strategies to be utilized in high-risk populations based on genetic predisposition, and the movement from the classic symptoms-based disease definition towards a true molecular definition of complex disease processes. genetic mutations are changes that occur in the sequence of dna. mutations can be classified as somatic mutations, which occur in somatic cells and are not commonly passed on to offspring, and germ-line mutations, which occur in the reproductive cells and are passed on to offspring.
fig. 11-3 the structure of a gene, including the exons (coding sequence) and introns (noncoding sequence) (from national human genome research institute's talking glossary of genetics (http://www.genome.gov/glossary.cfm#s))
there are several different types of mutations. translocations are large-scale mutations consisting of the exchange of chromosomal regions between one chromosome and another chromosome. mutations can also consist of single changes in the nucleotide bases and include substitution, deletion, or insertion of nucleotides. insertions and deletions can also involve hundreds of nucleotides.
mutations that occur in the coding regions can have several consequences: they can change the amino acid of the protein at a single site; they can create a premature stop codon, resulting in early termination of translation and, consequently, a truncated protein; or they may have no effect at all if the mutation leads to a nucleotide substitution that does not alter the amino acid. likewise, mutations in noncoding regulatory regions (such as promoters) may also affect the expression of the gene by altering the quantity of mrna transcribed and, hence, the level of the protein. mutations in the intron/exon boundary region may also lead to incorrectly spliced mrnas and result in significantly different proteins or differences in levels of protein products. the sequencing of the human genome has revealed that most genes are polymorphic; that is, there are small differences in the nucleotide sequences. there are estimates that the human genome may contain over 10 million of these types of variations. these differences in the nucleotide sequence are what give rise to our genetic variability; they account for inherited differences in our physical traits and the way we respond to environmental stimuli and medications. while the majority of these nucleotide variations do not cause a disease, some genetic variations may influence the development of certain diseases. the mutations discussed in the preceding paragraph are variations that occur in less than 1% of the population and are, thereby, rare. on the other hand, variations that occur at a frequency greater than 1% in the population are referred to as polymorphisms. if the polymorphism is a change in a single nucleotide, it is referred to as a single nucleotide polymorphism (snp).
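the three coding-region outcomes described above (a changed amino acid, a premature stop codon, or no change) can be sketched with the standard genetic code. this is a minimal illustration rather than a variant-annotation tool: the function name `classify_substitution`, the labels it returns, and the example codons are assumptions introduced here, and the codon table is the standard nuclear code written for the dna coding strand.

```python
# Standard genetic code for the DNA coding strand, bases ordered T, C, A, G.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution within one codon.

    position is 0, 1, or 2. The returned labels are the conventional names
    for the outcomes described in the text.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "silent"      # nucleotide changed, amino acid unchanged
    if after == "*":
        return "nonsense"    # premature stop codon, truncated protein
    if before == "*":
        return "stop-lost"   # original stop codon replaced by an amino acid
    return "missense"        # a different amino acid at this site


print(classify_substitution("GAA", 2, "G"))  # silent   (Glu -> Glu)
print(classify_substitution("GAA", 1, "T"))  # missense (Glu -> Val)
print(classify_substitution("TGG", 2, "A"))  # nonsense (Trp -> stop)
```

the sketch covers substitutions only; insertions and deletions shift the reading frame and would need to be handled on the full coding sequence.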
these more common genetic variations, whether snps or small insertions or deletions of nucleotides, are the ones currently being examined in many studies for associations with susceptibility to and outcome from diseases seen in the intensive care unit setting. copy number variations (cnvs) are stretches of dna of greater than 1 kb that show differences in the expected number of copies of the dna in greater than 1% of the human population. very recently it has become clear that cnvs are also common in human genomes and contribute significantly to human genetic variation. another important concept in genetics is a locus, which refers to the location in the genome of a specific gene or variant. keeping in mind the above discussion of genetic variations, a locus may contain two slightly different sequences for a specific gene. these alternative forms of a gene are termed alleles, or variants. the alleles carried by a specific individual at a genetic locus constitute that person's genotype. an example is the surfactant protein b (sp-b) +1580 site: an individual's genotype at that site may be tt, ct, or cc. individuals are heterozygous if they possess two different alleles at the locus of interest and homozygous if they possess two identical alleles at that locus. but we obviously do not carry only one genetic variation in our genome. a haplotype represents a combination of polymorphic alleles on a single chromosome delineating a pattern that is inherited together and transmitted from parent to offspring. haplotype analysis is a useful tool for disease gene discovery, as investigators may capitalize on the fact that many of the polymorphisms of interest are not transmitted independently of each other, and the presence of one gene variant can tag the presence of another polymorphism on the same chromosome.
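the idea that one variant can "tag" another can be made quantitative. the sketch below computes two standard summaries of linkage disequilibrium, the coefficient d and the squared correlation r², from counts of the four possible haplotypes at two biallelic loci. the function name, the allele labels (a/a' at the first locus, b/b' at the second) and the counts in the usage lines are illustrative assumptions, not data from any study discussed in this chapter.

```python
def linkage_disequilibrium(n_ab, n_ab2, n_a2b, n_a2b2):
    """Return (D, r^2) from counts of the four haplotypes at two loci.

    n_ab   : haplotypes carrying allele a  and allele b
    n_ab2  : haplotypes carrying allele a  and allele b'
    n_a2b  : haplotypes carrying allele a' and allele b
    n_a2b2 : haplotypes carrying allele a' and allele b'
    """
    total = n_ab + n_ab2 + n_a2b + n_a2b2
    p_ab = n_ab / total              # observed frequency of the a-b haplotype
    p_a = (n_ab + n_ab2) / total     # frequency of allele a
    p_b = (n_ab + n_a2b) / total     # frequency of allele b
    d = p_ab - p_a * p_b             # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Alleles always travel together: one variant tags the other perfectly.
    print(linkage_disequilibrium(50, 0, 0, 50))
    # Alleles assort independently: knowing one says nothing about the other.
    print(linkage_disequilibrium(25, 25, 25, 25))
```

an r² close to 1 means that genotyping one variant is nearly as informative as genotyping both, which is the property that tag-snp selection relies on.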
in some cases, haplotype assessment can provide a higher level of specificity, sensitivity, and accuracy in "true" associations with disease risk or severity. by focusing on haplotypes as well as snps, researchers are now able to more accurately study genetic predisposition to various diseases of interest. with the recent report of the international hapmap consortium, and the identification and cataloging of haplotypes now available, the utility of this type of study is brought into focus as an important tool to guide genetic association studies on complex human diseases. genetic polymorphisms, like the rarer mutations, may also influence the quantity of the mrna made if present in a regulatory region, or they may influence the functional activity of the protein product. there has been an explosion of studies attempting to determine if these genetic polymorphisms may account for some of the clinical variability we as clinicians observe at the bedside in the picu. for example, can the difference in disease severity between two children with pneumonia be associated with variations in their genes coding for one of the surfactant proteins?
chapter 11 • genetic predisposition to critical illness
gene expression is the process by which the information contained within genes is used to make proteins (fig. 11-4). this occurs by a combination of two distinct processes: transcription and translation.
transcription is the process by which the genetic information in dna is transcribed into messenger ribonucleic acid (mrna). mrna differs from dna in that it is single-stranded, has a modified sugar backbone (ribose rather than deoxyribose), and contains uracil (u) instead of thymine (t). the process of transcription involves the unwinding of the two complementary strands of dna, the binding of the enzyme rna polymerase to the promoter region of a gene on a single strand of dna, and the synthesis of the mrna molecule by adding ribonucleotides in an order that is complementary to the dna strand. the transcribed mrna thus contains all the genetic information between the transcriptional start and stop sites on the dna, including exons and introns. the non-coding intron sequences are removed by a process referred to as splicing, which connects all the exons together to form the final mrna product. this mrna represents the coding dna sequences for a single gene. (it should be noted that splicing variations have been identified that influence disease processes by altering the mrna sequences that are spliced together and, therefore, the final protein product.) the mrna is then transported to the cytoplasm, and translation occurs, in which the genetic information from mrna is utilized to guide the synthesis of proteins. proteins are composed of amino acids. there are 20 different amino acids in humans, and each is encoded by a set of 3 nucleotides in the mrna. these three nucleotides are called triplets or codons. the corresponding anticodon on the transfer rna (trna) pairs with a codon, presenting its unique amino acid in the process of translational protein synthesis.
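the sequence of steps just described (transcription of a dna template, removal of introns by splicing, and codon-by-codon translation) can be sketched in a few lines of python. everything here is a toy under stated assumptions: the function names are hypothetical, the 16-base template and its exon coordinates are invented for illustration, the intron does not carry real splice-site signals, and the codon table lists only the four codons that occur in the example.

```python
# DNA template base -> complementary RNA base (note A pairs with U in RNA).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table; only the codons used in the example are listed.
CODONS = {"AUG": "Met", "GCU": "Ala", "UGG": "Trp", "UAA": "Stop"}


def transcribe(template_dna):
    """Build the pre-mRNA complementary to a DNA template strand."""
    return "".join(COMPLEMENT[base] for base in template_dna)


def splice(pre_mrna, exons):
    """Join the exon segments, given as (start, end) index pairs; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    template = "TACCGA" + "GGGG" + "ACCATT"   # exon 1 + toy intron + exon 2
    pre_mrna = transcribe(template)           # still contains the intron
    mrna = splice(pre_mrna, [(0, 6), (10, 16)])
    print(pre_mrna, mrna, translate(mrna))
```

changing the exon coordinates passed to `splice` mimics a splicing variant: the same pre-mrna then yields a different mature mrna and, potentially, a different protein.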
since all cells that contain a nucleus carry the full set of genetic information, it is necessary for gene expression to be selective and tightly controlled, in a way that guarantees specific proteins are expressed in specific cells under appropriate conditions. this differential expression of genes ensures that cells develop correctly, can differentiate and function as specialized cells, and can mount various responses to external stimuli. in certain disease states, expression of specific genes may change, thereby providing a clue as to which genes may be important in that disease process. recent advances in technology have provided a valuable tool to evaluate the expression of genes during various diseases, including sepsis. these technologies include dna microarrays, in which the basic approach is as follows: small strands of dna probes representing the genes of interest, for example, tumor necrosis factor alpha (tnf-α), are attached to a solid substrate such as a glass slide or silicon chip (in reality, thousands of probes are applied to the same microarray chip). mrna is then isolated from, for example, a patient without acute lung injury (ali) and one with ali. these two samples are then separately converted to complementary dna (cdna) in a manner which incorporates different fluorophores in the two samples of cdna (e.g., red into all the cdnas from the patient with ali, which would also include cdna made from the mrna coding for tnf-α, and green into all the cdnas from the patient without ali, including the cdna made from the mrna coding for tnf-α). the cdnas from the two patients are then mixed and hybridized to the chip containing the dna probes. thus, if tnf-α is up-regulated in ali and there is significantly more tnf-α cdna in the ali sample than in the non-ali sample, then more red fluorophore-labeled cdna would be present.
when the mixture of cdnas is hybridized to the chip containing the probes, the probe for tnf-α would light up red and represent increased expression of the tnf-α gene in ali. if tnf-α is expressed at a lower concentration in the patient with ali, then when the samples are mixed, more green fluorophore-labeled cdna would be present than red fluorophore-labeled cdna, and the probe for tnf-α on the microarray chip would light up green, indicating decreased expression of the tnf-α gene in ali. finally, if there is no change in the expression of the tnf-α gene, the amounts of red and green fluorophore-labeled cdnas would be equal, and the probe for the tnf-α gene would light up yellow. in this fashion, one can identify specific genes that are differentially expressed in the development of ali. several examples of the use of this technology in the critically ill patient have been published, and these have aided our understanding of the pathophysiology of certain icu-specific diseases. up to this point, we have discussed the structure of dna and the process of getting from the code in the dna to protein. it is the functional aspects of these proteins that give rise to the observed traits, whether it be the color of one's eyes, the rate of metabolism of a drug, or the efficiency with which a protein receptor on a cell surface recognizes a pathogen. the observable characteristics of an individual define that individual's phenotype. this may include common physical and biochemical characteristics, but can also describe a person's disease status (such as in cystic fibrosis). phenotypes caused by mutations in a single gene may show mendelian inheritance patterns. these patterns can be autosomal dominant (where a single copy of the gene causes the phenotype), autosomal recessive (where both copies of the gene are necessary for the phenotype), or sex linked (where the mutation occurs on the x chromosome).
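returning to the two-color microarray readout described earlier in this passage, the red, green, or yellow call for a single probe amounts to comparing two fluorescence intensities. a minimal python sketch follows; the function name, the intensity values and the two-fold cutoff are assumptions chosen for illustration (real analyses normalize the signals and use statistical testing rather than a fixed cutoff).

```python
import math


def call_expression(red_signal, green_signal, fold_threshold=2.0):
    """Classify one probe from its two fluorescence intensities.

    red_signal   : intensity from the labeled cdna of the patient with ali
    green_signal : intensity from the labeled cdna of the patient without ali
    fold_threshold is an illustrative cutoff, not a published standard.
    """
    log_ratio = math.log2(red_signal / green_signal)
    cutoff = math.log2(fold_threshold)
    if log_ratio >= cutoff:
        return "red: increased expression in ali"
    if log_ratio <= -cutoff:
        return "green: decreased expression in ali"
    return "yellow: no change"


if __name__ == "__main__":
    print(call_expression(800.0, 200.0))   # four-fold more red signal
    print(call_expression(200.0, 800.0))   # four-fold more green signal
    print(call_expression(500.0, 500.0))   # equal signals
```

working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why expression ratios are usually reported that way.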
it is crucial to note that mendelian inheritance patterns are only seen for single gene disorders. critical care diseases and syndromes, such as sepsis and acute respiratory distress syndrome (ards), are complex disorders in which genetic predisposition to the development of the disease is due to multiple genes and other factors, such as environmental exposures. the multifaceted gene-gene and gene-environment interactions make the study of these diseases extremely complex. there are many common complex disorders that display obvious familial aggregation of cases, but have no clear mendelian inheritance patterns. the most commonly studied in medicine are cancer, diabetes, hypertension, and obesity, among others. the disease aggregation may be due to complex genetic factors, the interaction of multiple genes on the development of the disease of interest, a host of environmental factors which place the individual at risk for the disease, or, most commonly, a combination of all of the above factors (fig. 11-5). in other words, a person's genetic background may make them prone to a specific disease; the variants responsible are called susceptibility gene variants or susceptibility genes. due to their inherited genetic make-up, the individual is at a higher inborn risk for developing the disease of interest, but only if they are exposed to the environmental stressor that is known to associate with the disease. the susceptibility gene variant will not directly lead to the disease, but will put that person at a higher risk if they are exposed to the environmental risk. for example, individuals may possess a susceptibility gene variant for the development of lung cancer, but this will only lead to the development of cancer if they are subject to a known environmental risk factor, such as smoking.
alternatively, a newborn may possess one or more susceptibility gene variants for the development of bronchopulmonary dysplasia, but if they are not born prematurely and do not require mechanical ventilation in the newborn period (and therefore are not subject to the environmental stressors known to impact the development of this lung disease), they will never develop this disease despite being genetically susceptible. gene variants can also decrease the susceptibility to, or increase the resistance against, a disease. these are protective gene variants. to give an example, there are individuals who smoke their entire adult lives and yet never develop chronic obstructive pulmonary disease. it is likely that these individuals possess protective gene variants against the development of this disease, even in the face of a strong environmental insult. a person may possess both susceptibility and protective genetic variants for the same disease, and the mix of these variants will together determine the overall genetic risk of that person for the disease of interest. this resultant genetic risk also interacts with the environmental risk of the individual, leading to the overall risk of that person developing the disease. in critical care, it is unlikely that one gene will cause the diseases that are treated in the intensive care unit; it is more likely that multiple genes will interact with multiple environmental insults to predispose individuals to disease processes, resulting in an overall risk for an individual patient to develop a certain disease of interest. interestingly, certain populations seem to be immune to certain complex diseases. examples include the australian aborigines and the inuit from greenland, both of which appear to be resistant to the development of type 1 diabetes.
it is plausible that the disease resistance observed in these populations is due to absence of susceptibility gene variants or presence of protective gene variants in the group's gene pool. when it is determined that a complex disease has familial aggregation, it is important to take into account that families may also share environmental or social factors that predispose to the disease of interest, and therefore the entire impact may not be genetic. one example would be radon gas present in a neighborhood leading to an increased incidence of lung cancer in families living in close proximity. while it may be assumed that the genetic impact is responsible for the development of the disease, environmental exposure is the likely source. twin studies and adoption studies are utilized to attempt to determine the relative weight of genetics and environment in the development of a certain disease. genomic medicine, and the concept that an individual's genetic makeup may influence not only the severity of and outcome from their critical illness but also their response to therapies, has begun to make its way into the intensive care unit. this section will highlight studies involving analysis of gene expression and genetic association studies in patients with critical illness. while it may appear that advances have been made in the field, it is important to understand that we are not yet in the age of personalized medicine and much work needs to be done.
[figure] the complex interactions that can occur between genes that may impact a disease process (gene-gene interaction) as well as between the genes of interest and environmental factors (gene-environment interaction).
the expression of specific genes in many cases represents the body's specific and complex response to environmental stimuli, such as in the case of a severe infection, trauma, or cardiopulmonary bypass. examining gene expression may, therefore, provide a clue as to which genes are important in a specific critical illness. many studies have examined gene expression in critically ill patients, but most examine the expression of only a few genes. with the advent of the dna microarray technology discussed above, investigators have begun to explore gene expression patterns in thousands of genes in children with septic shock, using mrna isolated from whole blood, which represents the gene expression response in circulating white blood cells. these studies have compared gene expression in blood samples obtained within 1 day of admission to the picu with that observed in blood of healthy controls, examined longitudinal changes in gene expression in children with septic shock (days 1 and 3), and compared expression in patients with septic shock to expression in children with sepsis or systemic inflammatory response syndrome (sirs). genes that were up-regulated, down-regulated, or unchanged between the groups were examined. the genes that were up-regulated when the septic shock group was compared to healthy controls included genes related to immunity and inflammation, as would be expected. unexpectedly, the study demonstrated that many genes related to zinc biology and zinc homeostasis were down-regulated. the significance of this finding was supported by the observation that children who did not survive septic shock had lower serum levels of zinc and by the demonstration in a murine model that zinc depletion leads to increased mortality from sepsis. in addition, expression of genes involved in t-cell receptor signaling and antigen presentation appeared decreased, suggesting that septic shock may be associated with depression of the adaptive immune system.
interestingly, expression studies of adults with sepsis and septic shock did not identify a down-regulation of genes related to zinc biology, although up-regulation of genes related to immunity and inflammation and down-regulation of genes related to the adaptive immune system were observed. the longitudinal study in children with septic shock demonstrated that, in general, the observed up-regulation and down-regulation of gene expression persisted over time. in addition, the study comparing expression in children with septic shock to those with sepsis or sirs indicated that while there were patterns of expression that were similar in all three groups (such as up-regulation of genes involved in innate immunity), the septic shock group was distinguished by the degree and duration of the response for certain genes. examples include the finding that up-regulated genes involved in the il-10 signaling pathway had a greater signal which persisted for longer in the septic shock patients, and that down-regulation of genes for zinc biology and the adaptive immune system was greater and lasted longer than that seen in the other two groups. in addition to providing insight into the pathophysiology of sepsis and identifying potentially important proteins in early sepsis, the use of this technology may also provide physicians with a unique diagnostic tool. if the gene expression profile of a patient who is in the early stages of sepsis is different from that of a patient who exhibits sirs but does not develop sepsis, then therapies could be initiated earlier, before full-blown sepsis is clinically evident. there is also some evidence that various subclasses of sepsis and septic shock may be identifiable using this technique. dna microarrays have also been used to investigate the expression profile in adults with ali.
mrna for pre-b-cell colony enhancing factor (pbef), a cytokine that is involved in the maturation of b-cell precursors, inhibition of neutrophil apoptosis, and perhaps regulation of endothelial cell calcium-dependent cytoskeletal arrangement, was noted to be significantly increased in adults with ali, a finding that was consistent in both a canine and a mouse model of ali. in addition to the elevated mrna levels, pbef protein in bronchoalveolar lavage fluid was also elevated in adults with ali. with the advent of dna microarray technology, gene expression patterns in thousands of genes can lead to insight into disease pathogenesis, treatment, and outcome. it is also worth mentioning an important study in a canine model of lung injury to highlight the value of gene expression arrays. mechanical ventilation is invariably needed to treat patients with ali, though the use of positive pressure ventilation itself may exacerbate the lung injury. gene expression arrays in a canine model of ventilator-associated lung injury have identified a number of genes that are regulated during ali. many of the genes can be grouped into biological processes known to be important in the pathophysiology of ali; these include inflammation (e.g., il-1β, il-6, il-1ra, mmif), coagulation (tissue factor, pai-1), and chemotaxis/cell motility (myosin light chain kinase, cell chemokine receptor 2). several other genes also appeared to be expressed, including pbef, heat shock protein 70 (hsp 70), and vascular endothelial growth factor (vegf). thus, the use of expression arrays has identified a number of candidate genes that may play important roles in the development of ali. as will be discussed below, genetic variations that influence the activity or level of the protein in several of these candidate genes have been examined in gene association studies in patients with ali.
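a gene association study of this kind usually reduces to comparing how often a variant allele is seen in affected patients and in at-risk controls. the sketch below computes an odds ratio with an approximate 95% confidence interval (woolf's log method) from a 2 × 2 table of allele counts. the function name and all counts are hypothetical and serve only to show the arithmetic; they are not taken from any study cited in this chapter.

```python
import math


def odds_ratio_ci(case_variant, case_other, control_variant, control_other, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method) from allele counts.

    All four counts must be greater than zero for the formula to apply.
    """
    odds_ratio = (case_variant * control_other) / (case_other * control_variant)
    # Standard error of log(odds ratio) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_variant + 1 / case_other + 1 / control_variant + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts: 60 variant / 140 other alleles in patients,
    # 30 variant / 170 other alleles in at-risk controls.
    estimate, lower, upper = odds_ratio_ci(60, 140, 30, 170)
    print(f"odds ratio {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

a confidence interval that excludes 1 suggests an association, but small cohorts give wide intervals, and candidate-gene findings generally need replication in independent populations.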
as described above, a large amount of genetic variability exists throughout our genome. whether these differences influence the susceptibility to or outcome from diseases in the critical care setting is an area receiving a great deal of interest. perhaps the greatest focus of genetic association studies on critical illnesses is in sepsis and ali. the general approach has been to compare the frequencies of polymorphisms in specific candidate genes between a cohort of patients with sepsis or ali and an at-risk cohort without sepsis or ali. this section will review some of these studies. individual variability in the susceptibility to and outcome from sepsis and lung injury has long been observed in critically ill patients. why one child with pneumococcal pneumonia has little consequence of their infection and can be treated as an outpatient while another child develops refractory septic shock and respiratory failure has been attributed to a number of factors. these have included virulence of the pathogen, length of time between onset of symptoms and appropriate treatment, and comorbid conditions. while all these certainly contribute to the severity of disease, a growing body of evidence suggests that genetic variations in the individual patient may also contribute to the severity of and outcome from critical disease. these genetic polymorphisms may not be of any consequence during normal healthy periods, and their importance may only become evident during a severe stressor such as an infection, trauma, cardiopulmonary bypass, or other scenarios seen in the intensive care unit (table 11.1). a strong genetic influence on the outcome from infections was indicated by a family-based study of adoptees.
adoptees with a biological parent who died due to infection before the age of 50 had a relative risk of death due to infection of 5.81 (ci = 2.47-13.7), a higher relative risk than that seen when risk related to early death of a biologic parent due to cardiovascular and cerebrovascular disease (4.52; 1.32-15.4) or cancer (1.19; 0.16-8.99) was examined. thus, an individual's genetic makeup may influence the severity of disease in infection and sepsis. individual variability in the susceptibility to and outcome from critical care diseases has long been observed, and advances in genomic medicine now give an opportunity to understand these differences.
table 11.1
gene | polymorphism (a) | reported association
mbl | variant b, c, d | variants associated with decreased levels and activity and increased risk of infection
fcγriia | h131r | r associated with decreased affinity to igg2 and opsonization and increased risk of infection and septic shock
tlr4 | asp299gly/thr399ile | gly/ile associated with decreased expression, increased risk of sepsis and mortality
cd-14 | −159 c/t | t allele associated with increased levels and susceptibility to sepsis and sepsis-related mortality in adults
md-2 | −1625 c/g | −1625 g allele associated with higher risk of sepsis and multiple organ dysfunction score in chinese adults
tnf-α | −308 g/a, −238 g/a, lt-α +250 g/a | a alleles for each polymorphism are associated with increased tnf-α levels, increased mortality in sepsis and meningococcal disease, increased sepsis in adults with pneumonia, and increased mortality in bacteremia and sepsis
il-6 | −174 g/c | g associated with increased il-6 levels in patients but c associated with increased levels in monocytes from neonates, sepsis in neonates but not adults, and severe sepsis and organ dysfunction in children
il-1ra | variable 86-bp repeat | a2 associated with increased levels of il-1ra and variable results of association studies examining risk of sepsis and mortality
il-10 | −1082 g/a, −819 c/t, −592 c/a | gcc haplotype associated with increased levels and sepsis but not mortality
irak-1 | +1595 t/c | c associated with increased nf-κb translocation and presence of shock and higher 60-day mortality in adults with sepsis
hsp70a1b | −179 c/t, +1267 g/a | −179 c/+1267a associated with decreased hspa1b and tnf-α, and +1267a associated with septic shock in adults with cap
ace | 287 bp i/d | dd associated with increased serum and tissue levels and more severe meningococcal disease in children; no association with sepsis-related mortality in neonates or adults
pai-1 | 4g/5g | 4g associated with increased levels and septic shock in meningococcal disease
protein c | −1641 a/g and −1654 c/t | ac haplotype associated with increased mortality and organ dysfunction in adults with sepsis and with decreased protein c serum level; gc haplotype associated with more severe sepsis in children less than 1 year of age with meningococcemia
fibrinogen-beta | −854 g/a, −455 g/a, +9006 g/a | gaa haplotype associated with higher levels of fibrinogen, lower 28-day mortality and less severe organ dysfunction
abbreviations: mbl mannose-binding lectin, ig immunoglobulin, tlr toll-like receptor, rsv respiratory syncytial virus, md myeloid differentiation, tnf tumor necrosis factor, lt lymphotoxin, il-1ra interleukin 1 receptor antagonist, il-10 interleukin-10, irak-1 interleukin receptor-associated kinase 1, hsp heat shock protein, cap community acquired pneumonia, ace angiotensin converting enzyme, pai plasminogen activator inhibitor.
(a) the terminology used for the various polymorphisms is that most commonly used in the literature and may refer to the nucleotide position, amino acid position, or name of the allele. this table is representative of polymorphisms examined in sepsis but does not include all such polymorphisms.
given the tens of thousands of genes in the human genome and the millions of genetic polymorphisms, on which polymorphisms and in which genes should investigators focus? one approach in choosing the candidate gene is to examine the pathways by which pathogens lead to the clinical symptoms of sepsis. the body's response to infections involves recognition of pathogen-associated products followed by an inflammatory response that involves a large number of cellular proteins. genetic variations that lead to alterations in the amount or functional activity of any of these proteins involved in the recognition of or response to pathogen-associated products may influence the individual's response. examples of the influence of genetic variations in proteins involved in recognition of pathogens on the severity of infections include polymorphisms in the genes coding for mannose binding lectin (mbl), the receptor for fcγ, and toll-like receptor (tlr) 4. the heterotrimeric mbl is involved in binding bacterial surface carbohydrates and the opsonization of bacteria. a helical domain in the tertiary structure of the protein is crucial for formation of the active heterotrimer. three genetic polymorphisms in the gene coding for mbl result in amino acid changes in the helical tails of the protein and result in increased degradation and decreased serum levels of mbl. genetic association studies have demonstrated associations between the 3 mbl genetic polymorphisms and increased susceptibility to infections, hospitalizations due to infections, number of acute respiratory infections, and risk of meningococcal infections in children, and pneumonia and sepsis in neonates.
in adults these polymorphisms have been associated with recurrent respiratory infections, invasive pneumococcal infections and viral coinfections with pneumococcal pneumonia. the family of leukocyte fcγ receptors is also involved in the recognition of bacteria such as streptococcus pneumoniae, haemophilus influenzae type b, and neisseria meningitidis. fcγ receptors bind the fc portion of igg bound to bacteria, thereby facilitating phagocytosis and inducing the inflammatory response. several polymorphisms have been described in the genes coding for the various fcγ receptors that alter their binding affinity to the various subclasses of igg. two such polymorphisms have been described in the genes coding for the fcγriiib receptor and the fcγriia receptor. in the case of the fcγriiib receptor, the genetic polymorphism results in a four amino acid substitution (allotypes fcγriiib-na1 or -na2) in the receptor that alters the opsonization efficiency. in the case of the fcγriia receptor, the genetic polymorphism results in the replacement of a histidine by an arginine in the extracellular domain of the receptor at amino acid position 131 (h131r). the variant fcγriia receptor containing the arginine binds the fc region of igg2 with a lower affinity and results in reduced phagocytosis in vitro compared with the fcγriia receptor containing the histidine. in association studies, a higher frequency of individuals homozygous for the na2 allotype of the fcγriiib receptor or for an arginine at position 131 in the fcγriia receptor was found in patients with severe meningococcal disease or fulminant meningococcal sepsis. the final examples of genetic variation in genes coding for pathogen recognition products influencing the severity of sepsis are the polymorphisms in the gene coding for the tlr4 receptor.
this receptor is a component of a complex that includes cd-14 and myeloid differentiation (md)-2 and that binds lipopolysaccharide (lps), one of the major cell wall components of gram negative bacteria. in addition, tlr4 recognizes the f protein of the respiratory syncytial virus (rsv). two genetic polymorphisms have been identified in the gene coding for tlr4 that result in the change of an aspartic acid to a glycine at amino acid position 299 (asp299gly) and of a threonine to an isoleucine at amino acid position 399 (thr399ile). the gly299ile399 variant form of the receptor appears to be expressed on the cell surface in lower amounts and to result in a lower systemic cytokine response to lps and rsv. genetic association studies have demonstrated an association between the tlr4 gly299ile399 variant and gram negative bacterial infections and septic shock as well as mortality in patients with systemic inflammatory response syndrome. however, a number of studies have also shown conflicting results. these tlr4 variants have also been reported to be associated with susceptibility to and severity of respiratory syncytial virus infections in children. future studies with more participants will be required to determine whether variations in the tlr4 gene are involved in infection and/or severity of disease. thus far, the focus has been on genetic variations in genes coding for proteins involved in pathogen recognition, and in each case, the variation results in an inferior host response resulting in more severe disease. currently it is thought that severe sepsis and septic shock may be the result of an imbalance in the inflammatory response. the mechanism by which this imbalance occurs is thought to be multi-factorial. one possibility that has attracted much interest is that the host may harbor genetic variations in the regulatory regions of genes involved in the response to noxious stimuli, resulting in an imbalance between pro- and anti-inflammatory cytokines.
these variations can result in an over-expression of pro-inflammatory cytokines, such as tnf-α and interleukin (il)-6, or an under-expression of anti-inflammatory cytokines, such as il-10. in either case, the normal inflammatory response is dysregulated. one of the pro-inflammatory genes in which genetic polymorphisms influence expression is tnf-α. as a key pro-inflammatory cytokine, tnf-α is responsible for the activation of the inflammatory response and by itself can produce many of the clinical manifestations of sepsis such as capillary leak, hypotension, and multiple organ dysfunction syndrome. the regulatory region of the gene coding for tnf-α has several polymorphisms that alter transcription of tnf-α, thereby influencing the amount of tnf-α produced. several of these polymorphisms alter nucleotides which make up the recognition sequences of some of the transcription factors that regulate transcription. two of these polymorphisms in particular have been studied. the rarer a allele (tnf-α -308) at a location 308 base pairs upstream from the transcriptional start site results in greater transcription than the more common g allele. a second rare a allele (tnf-α -238) at a location 238 base pairs upstream from the transcriptional start site results in lower transcription than the g allele. in addition, another site located ~3,200 base pairs upstream from the transcriptional start site of the tnf-α gene and located in the gene coding for another cytokine, lymphotoxin (lt)-α (also referred to as the tnfb allele, tnf-β +252, and lt-α +250), also appears to regulate transcription of the tnf-α gene. in genetic association studies, the frequency of the tnf-α -308 a allele has been shown to be higher in children who died from meningococcal infections and adults who died with septic shock compared with controls.
genetic association studies examining the influence of the polymorphic lt-α +250 site have shown a higher frequency of the a allele in adults with pneumonia presenting with the clinical symptoms of sepsis, in adults in post-operative and trauma intensive care units who develop sepsis and who have a higher mortality, and in bacteremic children who exhibit higher serum tnf-α levels and have a higher mortality. however, other genetic association studies examining the tnf-α gene have reported conflicting results. recently a well designed, prospective study examining a number of polymorphisms in the gene for tnf-α (including those described above) in adults with trauma admitted to the icu demonstrated that the a allele of tnf-α -308 was associated with elevated tnf, sepsis syndrome and death in trauma patients both in their initial cohort and a replication cohort. the gene for il-6 (another pro-inflammatory cytokine) also contains variations which multiple studies suggest are associated with the susceptibility to or outcome from sepsis. as mentioned above, the progression to severe sepsis is believed to be an imbalance in the pro- and anti-inflammatory mediators. in addition to polymorphisms that result in increased levels of pro-inflammatory cytokines, examples of polymorphisms that result in lower levels of anti-inflammatory cytokines also exist. il-1 receptor antagonist (il-1ra) is one of the body's mechanisms for keeping the inflammatory reaction in check by binding to the il-1 receptor without activating the signal transduction pathway. the gene coding for il-1ra contains a polymorphic region consisting of a variable number of 86 base-pair tandem repeats.
these different il-1ra alleles have been associated with variable circulating levels of both il-1ra and il-1β (the two genes are located close to one another on chromosome 2), and several association studies have suggested an influence of this variation on a variety of diseases in which inflammation plays an important role, including the susceptibility to sepsis. il-10 is another anti-inflammatory cytokine for which genetic polymorphisms appear to alter transcription levels. a number of studies have demonstrated an association between an increased susceptibility to sepsis and certain il-10 polymorphisms although conflicting results have also been reported. it is important to remember that the cytokines and their receptors mentioned above activate a complex signal transduction pathway composed of dozens of proteins with the end result of a well coordinated cellular response to the noxious stimulus. genetic variation in any of the proteins in the pathway may also influence the final response. recent studies have begun to analyze components of various pathways involved in the development of sepsis. one example is il-1 receptor-associated kinase-1 (irak-1) that plays an important role in the signal transduction pathway initiated by the activation of the il-1 receptor. activation of irak-1 results in increased transcription of a variety of pro-inflammatory genes modulated by nf-κb, a key transcription factor in the inflammatory response. genetic variations in the gene coding for irak-1 have been shown to be associated with elevated nuclear levels of nf-κb as well as the presence of shock and a higher 60-day mortality in patients with sepsis. the association of a variant in irak-1 with severity of septic shock has been independently replicated in a large multi-centered cohort of adult patients with septic shock. interestingly, this study also indicated that age might modify the relationship as this association was stronger for younger patients.
many other proteins involved in these complex signaling pathways have yet to be investigated for the influence of genetic variations on critical illnesses. many of the genes discussed thus far have a role in inflammation, a key component in the pathophysiology of sepsis. loss of homeostatic mechanisms regulating the coagulation/fibrinolytic system also plays an important role in sepsis. plasminogen activator inhibitor 1 (pai-1) inhibits fibrinolysis thereby favoring the formation of microthrombi in the capillaries. the pathophysiology of multiple organ dysfunction syndrome in patients with sepsis is thought to involve, in part, intravascular fibrin deposition. thus, elevated pai-1 activity could contribute to organ failure in sepsis and elevated plasma concentrations have been observed in patients with sepsis. a genetic variation in the gene coding for pai-1 consisting of either the presence of 4 guanines or 5 guanines at a specific location appears to influence the amount of pai-1 produced. individuals homozygous for the 4g genotype (4g/4g) produce more pai-1 than individuals homozygous for the 5g genotype (5g/5g) or individuals that are heterozygous (4g/5g). association studies have demonstrated that children with meningococcal disease who were 4g/4g at this site had an increased risk of death compared to children who were 4g/5g or 5g/5g.
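the 4g/4g, 4g/5g and 5g/5g genotype classes described above are the raw material of an association study. as a minimal sketch, assuming an entirely hypothetical cohort (the counts and the function name `allele_frequencies` are illustrative and not taken from any study cited here), allele frequencies can be derived from genotype counts as follows:

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Return allele frequencies from counts of diploid genotypes.

    genotype_counts maps a genotype string such as "4g/5g" to the
    number of individuals carrying that genotype.
    """
    alleles = Counter()
    for genotype, n_individuals in genotype_counts.items():
        # each individual contributes one copy of each allele in the genotype
        for allele in genotype.split("/"):
            alleles[allele] += n_individuals
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


# hypothetical cohort, for illustration only
cohort = {"4g/4g": 30, "4g/5g": 50, "5g/5g": 20}
freqs = allele_frequencies(cohort)
print(freqs)
```

comparing such frequencies between patients and controls is the first step of the association analyses summarized in this section; the sketch deliberately ignores population stratification and multiple testing.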
more recent studies in both children and adults have demonstrated higher mortality in individuals homozygous for the 4g allele in a number of infectious diseases. since fibrin deposition is thought to play a role in the multiple organ system failure in patients with sepsis, genetic variations that influence the production of fibrin might also influence the severity of disease in patients with sepsis. the production of fibrinogen, the precursor to fibrin, is dependent on the transcription of fibrinogen-beta. several polymorphisms in the promoter region have been associated with higher plasma levels of fibrinogen, and higher levels of fibrinogen have been associated with improved outcomes in sepsis. association studies have demonstrated that the gaa haplotype, consisting of the genotypes at the −854, −455, and +9006 sites, is associated with higher levels of fibrinogen and with decreased mortality and less organ dysfunction. protein c has anticoagulant activity as well as anti-inflammatory and anti-apoptotic effects suggesting that diminished activity of protein c may lead to increased fibrin deposition, inflammation, and apoptosis. genetic polymorphisms located in the promoter region of the gene coding for protein c result in decreased levels. association studies in caucasian and han chinese adults have demonstrated increased mortality and organ dysfunction in adults with sepsis who carry the a allele at the −1641 site and the c allele at the −1654 site. this haplotype has also been reported to be associated with decreased protein c concentration. interestingly, an increased risk of more severe sepsis has been observed in children less than 1 year of age with meningococcal disease who carry the g allele at the −1641 site along with the c allele at the −1654 site. this haplotype was not associated with severe sepsis in older children suggesting a potential developmental difference in these variations on the severity of sepsis.
severe lung injury in both adults and children can be precipitated by a diverse array of causes and is classified as either direct injury when the insult is from the alveolar side of the alveolar/capillary membrane or indirect injury when the insult is from the capillary side. major causes of direct lung injury include pneumonia, aspiration, pulmonary contusion, and inhalation while major causes of indirect injury include sepsis, trauma without pulmonary contusion, cardiopulmonary bypass, and multiple transfusions. despite these various causes, the central pathogenesis of ali involves derangements in multiple biological processes. these include activation of inflammation, loss of coagulation and fibrinolytic homeostasis, disruption of vascular permeability, epithelial and endothelial cell apoptosis as well as proliferation, and derangements in surfactant. some of these processes, notably inflammation and coagulation, play key roles in the pathophysiology of sepsis as well as ali. thus, it is not surprising to find that the candidate genes examined in genetic association studies for ali are in many cases the same as those examined in sepsis (table 11.2). this section will review some of the genetic association studies examining the influence of genetic variations on the development of ali. pulmonary surfactant is synthesized by the type ii alveolar epithelial cells and is required for normal lung function. one of surfactant's primary functions is to lower the surface tension at the alveolar air-liquid interface. surfactant is composed of phospholipids and four proteins, surfactant protein (sp)-a, sp-b, sp-c, and sp-d. knockout models in mice have demonstrated that of these four proteins, only sp-b is absolutely required for post-natal survival.
deficiency in, or impaired activity of sp-b appears responsible for a number of interstitial pulmonary diseases in humans including ards. several genetic variations exist in the genes coding for the surfactant proteins and two will be discussed here: the sp-b +1580 t/c polymorphism and an insertion/deletion polymorphism consisting of dinucleotide (ca) tandem repeats in intron 4. several studies have demonstrated an association between these polymorphisms and the need for mechanical ventilation in children (sp-b +1580 t/c) and mechanical ventilation and ards in adults (sp-b +1580 t/c polymorphism and insertion/deletion of dinucleotide (ca) tandem repeats). the consequences of these variations are not fully known. the sp-b +1580 t/c polymorphism results in an amino acid change in exon 4 in a region of the amino terminal propeptide which is thought to play a role in targeting of sp-b to lamellar bodies. the resulting amino acid change alters glycosylation of sp-b and may affect the level of sp-b by altering its processing or stability. aberrant proteolytic processing of the sp-b product encoded by the c allele is supported by a recent report demonstrating that the c allele is associated with absence of a specific pro-sp-b cleavage product in neonates. the intron 4 dinucleotide repeat length variation polymorphism is associated with incompletely spliced sp-b mrna. interestingly, in caucasians this length variation polymorphism in intron 4 is in linkage disequilibrium with the sp-b +1580 t/c polymorphism; the c allele of rs1130866 is associated with the deletion variants at the intron 4 polymorphic site.
further work is needed not only to define the consequence of these genetic variations on surfactant function but also to further evaluate whether these genetic variations influence the development of ali. another study of interest involves the susceptibility to pneumonia, the leading cause of ali and ards in both children and adults. as discussed above, the 4g/4g genotype in the gene coding for pai-1 is associated with higher levels of pai-1 expression. in a large cohort of adults, those individuals with the 4g allele demonstrated a significantly higher susceptibility to pneumonia. while pai-1 activity inhibits fibrinolysis leading to formation of microthrombi, it also demonstrates anti-inflammatory activity, and in this fashion, may increase the susceptibility to infection. in patients with ali, plasma levels of pai-1 are elevated and protein c levels are diminished. in addition, alveolar levels of pai-1 are elevated suggesting a possible local activation of the fibrinolytic system. recent studies have demonstrated that the 4g allele of pai-1 is associated with increased mortality in patients with severe pneumonia and patients with ards. to date there are no reports indicating association of specific protein c variants with ali. inflammation is one of the hallmarks of ali and as with sepsis, it is thought that one of the central components of ali is an imbalance between pro- and anti-inflammatory cytokines in the lung. the influence of genetic variation in several genes involved with inflammation on the development of ali has been examined. the tnf-α -308 a allele appears to be associated with increased mortality in adults with ards but not with the susceptibility to ards when compared with adults who were at-risk for the development of ards. the lt-α +250 polymorphism that appears to be associated with more severe sepsis did not influence the severity of ards.
macrophage migration inhibitory factor (mif) plays a central role in regulating the inflammatory response by directly increasing tnf-α and il-8 and countering the anti-inflammatory actions of glucocorticoids. mif mrna has been demonstrated in cells from bronchoalveolar lavage of patients with ali and mif concentrations in serum are elevated in patients with ali compared with controls. haplotypes composed of polymorphisms in the 3′ end of the gene coding for mif are associated with the development of ali in both caucasian and african american populations. a number of studies have demonstrated association of specific haplotypes in another gene involved in the regulation of inflammation, the gene for il-6, with the development of ali. whether these haplotypes are associated with elevated levels of il-6 is still unclear. association studies have also been performed examining the genetic variants discussed earlier in the promoter region of the anti-inflammatory cytokine il-10. the −1082 gg genotype results in higher levels of il-10 and is associated with the development of ards in younger adults. furthermore, the adults with ards who carried the gg genotype at this site demonstrated lower mortality and organ failure. the genetic variants in the gene coding for mbl have also been examined for their influence on ali. one of the variants described previously that results in decreased serum levels of mbl, variant b, is associated with the susceptibility to ards and a greater degree of organ dysfunction and higher mortality in patients with ards. while no reports have described the tlr4 polymorphisms specifically in ali or ards, the gly299ile399 variant form of the receptor is associated with an increased risk of severe rsv bronchiolitis and increased risk for hospitalization for rsv in previously healthy infants suggesting a potential role for tlr4 in influencing the severity of lung injury.
signal transduction pathways activated after stimulation of a variety of immune receptors including the tlrs and members of the family of il-1 and tnf receptors result in the upregulation of specific genes involved in the innate and adaptive immune responses. several transcription factors are involved in this process and genetic variation in any of these factors may influence the level of transcription. one such factor is nuclear factor κb (nf-κb) which under non-stimulated conditions is inhibited by the cytoplasmic inhibitor nfκbia. upon activation of cytokine-mediated signal transduction pathways, nfκbia is degraded allowing nf-κb to translocate to the nucleus. a number of polymorphisms located in the promoter region of the gene coding for nfκbia have been described but their functional consequence is unknown. when individual nfκbia promoter polymorphisms were examined to determine if they are associated with the development of ali, none by themselves demonstrated an association. however, the haplotype of −881 g/−826 t/−297 c was found in a higher frequency in adults who developed ards, especially in males with direct lung injury. another transcription factor is nf-e2 related factor 2 (nrf2) which, under nonstressed conditions, is located in the cytoplasm. under oxidative stress nrf2 translocates to the nucleus and results in transcription of several anti-oxidant enzymes. several polymorphisms within the promoter region of the gene coding for nrf2 have been identified that reduce transcription. one such variant, the −617 a allele, is associated with the development of ali in adult trauma patients.
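the nfκbia example shows why haplotypes are analyzed in addition to single sites: no single promoter polymorphism was associated with ali, but the combination −881 g/−826 t/−297 c was. the following is a minimal sketch of counting haplotype carriers, assuming phased haplotypes are already available; the individuals listed and the function name `carrier_frequency` are hypothetical and purely illustrative.

```python
# alleles at the -881, -826 and -297 promoter sites, in that order
RISK_HAPLOTYPE = ("g", "t", "c")


def carrier_frequency(individuals, haplotype=RISK_HAPLOTYPE):
    """Fraction of individuals carrying at least one copy of `haplotype`.

    Each individual is a pair of phased haplotypes, and each haplotype
    is a tuple of alleles at the three sites.
    """
    carriers = sum(1 for pair in individuals if haplotype in pair)
    return carriers / len(individuals)


# hypothetical phased data, for illustration only
cases = [
    (("g", "t", "c"), ("a", "t", "c")),
    (("g", "t", "c"), ("g", "t", "c")),
    (("a", "c", "c"), ("a", "t", "c")),
    (("g", "t", "c"), ("a", "c", "c")),
]
controls = [
    (("a", "t", "c"), ("a", "c", "c")),
    (("g", "t", "c"), ("a", "t", "c")),
    (("a", "c", "t"), ("a", "t", "c")),
    (("g", "c", "c"), ("a", "t", "t")),
]

case_freq = carrier_frequency(cases)
control_freq = carrier_frequency(controls)
print(case_freq, control_freq)
```

in practice phase is usually inferred statistically from unphased genotypes, a step this sketch does not attempt.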
the role of angiotensin converting enzyme (ace) in lung injury has recently attracted much interest. ace is present in pulmonary endothelium and is responsible for converting ati to atii. ace levels are elevated in the bronchoalveolar lavage fluid of adults with ards and higher levels are associated with mortality from ards. the key component is most likely atii, which has apoptotic effects on alveolar epithelial and endothelial cells in vitro. atii receptor antagonists block pneumocyte apoptosis in a model of meconium aspiration. another important component of this system is ace2, a homologue of ace expressed in human lungs, which is a negative regulator of the renin-angiotensin system as well as the probable receptor for the sars virus in humans. lung injury models using knockout mice lacking the ace2 gene have higher atii levels and exaggerated lung injury compared to wild type mice. however, the lung injury is reversed if the ace gene is inactivated in the ace2 knockout mice. this suggests that ace induces lung injury through atii and that ace2 protects against lung injury. indeed, ace inhibitors and atii antagonists appear to decrease the severity of lung injury in animal models, the risk of aspiration pneumonia in some adult populations, and the 30-day mortality in adults with pneumonia. several studies have demonstrated that the d/d genotype of the ace insertion/deletion (i/d) polymorphism appears to be associated with the susceptibility to and outcome from ards. as discussed previously, expression microarrays have been invaluable in identifying other potential mediators involved in the pathophysiology of lung injury. the expression of pre-b cell colony enhancing factor (pbef) was found to be significantly elevated in both animal and human studies of ali using this approach. pbef is a lesser studied cytokine involved in the maturation of b-cells, inhibition of neutrophil apoptosis, and perhaps regulation of the endothelial cell calcium-dependent cytoskeletal arrangement.
two genetic polymorphisms have been identified in the promoter region of the pbef gene, −1001 t/g and −1543 c/t, which appear to influence the development of ali. carriers of the g allele at position −1001 had a 2.75-fold increased risk of ali and the g allele remained an independent risk factor after controlling for several other variables. the t allele at position −1543 was found at a lower frequency in adults with ali. combining these two polymorphisms in a haplotype analysis demonstrated that adults with the −1001 g/−1543 c haplotype had a higher risk of ali (7.7 fold). the consequence of these two polymorphisms remains to be elucidated though the −1543 t allele may result in reduced expression. one final gene to be discussed in this section is the myosin light chain kinase (mlck) gene. three isoforms of the protein exist, and one isoform is a key component in the cytoskeletal arrangement regulating vascular permeability, angiogenesis, endothelial cell apoptosis, and leukocyte diapedesis. several polymorphisms in the gene coding for mlck have recently been identified. analysis was performed not only on the influence of single polymorphisms on the development of ali but also on a number of haplotypes using a sliding window approach. several strong associations between various single nucleotide polymorphisms as well as haplotypes and the risk of ali and sepsis were identified in adults. this included one haplotype, ggt, composed of markers mylk_021, mylk_022, and mylk_011 spanning a region of 846 base pairs between the 5′ untranslated region and the first exon that appeared to be specifically associated with the risk of ali and not sepsis. the functional significance of these haplotypes remains to be determined. specific variants in the mylk gene were also shown to be associated with trauma-induced ali; however, association of specific genetic variants and lung injury was not observed in children or adults with pneumonia.
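fold-increases in risk such as those quoted above are commonly expressed as odds ratios computed from carrier counts in cases and controls. the sketch below shows that calculation with a woolf (log-based) 95% confidence interval; it is an illustration under stated assumptions, not the method of the studies summarized here, and the counts and the function name `odds_ratio` are hypothetical.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio with an approximate 95% confidence interval (Woolf method)."""
    estimate = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    # standard error of log(odds ratio) from the four cell counts
    se_log = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# hypothetical counts: 40 of 100 cases and 20 of 100 controls carry the allele
estimate, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"odds ratio {estimate:.2f} (95% ci {lower:.2f} to {upper:.2f})")
```

an interval that excludes 1 suggests an association, but the estimate is unadjusted; controlling for other variables, as described for the −1001 g allele, requires a regression model.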
two other areas should be mentioned with regard to the influence of genetic variations in the picu. the first is in the area of coagulation. several examples of genetic polymorphisms in genes coding for proteins involved in coagulation and fibrinolysis were discussed above in relation to sepsis and ali. however, these and many other genetic variations that exist in other components of the coagulation cascade could also influence the development of thrombosis including deep venous thrombosis in critically ill patients. thrombosis of central venous catheters is a recurring problem in picus and while certain environmental factors play a role (eg, length of time catheter is in place, size of the patient and vessel), genetic polymorphisms in the patient favoring the formation of thrombi may also play a role. finally, the action of every drug that is used in the picu can potentially be influenced by genetic variation in the patient. whether it be inhaled β2-agonists, or the array of intravenous vasoactive agents, sedatives, muscle relaxants, antibiotics, steroids, etc., all bind to protein receptors and either activate or block specific signal transduction pathways, many bind protein carriers or transporters, and most are metabolized by various protein enzymes. every gene coding for each of these proteins has multiple genetic variations with the potential to influence the levels or activities of these proteins. the area of pharmacogenomics attempts to determine the influence of genetic variations in genes affecting these various aspects of drug action. however, while the list of genetic polymorphisms in genes affecting drug action is growing rapidly, there are few clinical examples of the degree of influence that these genetic variations have on the response to drugs in the picu.
for example, warfarin is the most widely used oral anticoagulant for long-term prophylaxis and treatment of thromboembolic disorders and is used in many children and adults with mechanical valves. the metabolism of and response to warfarin involves several enzymes, two of which exhibit genetic variations that dramatically alter the levels of warfarin. for one of these genes, cyp2c9, the common allele is referred to as cyp2c9*1 and is considered the wild type, while cyp2c9*2 contains a c to t nucleotide change at position 430 in exon 3 and cyp2c9*3 contains an a to c nucleotide change in exon 7. cyp2c9*2 has approximately 80% of the metabolic activity of the wild type cyp2c9*1 while cyp2c9*3 retains only 20% of the wild type activity. by also using genetic variation in a second gene, vitamin k epoxide reductase complex subunit 1 or vkorc1, one can account for more than 50% of the observed dosing variability. current practice in the use of warfarin usually involves starting at an age and weight specific dose and monitoring coagulation studies. however, because of the genetic variations in these two enzymes and perhaps others, different patients take different amounts of time to achieve the appropriate therapeutic dose. knowing the specific genotypes of patients prior to initiating warfarin may allow for more appropriate dose selection, less time to achieve therapeutic levels, and less risk of adverse events. recently, an algorithm using the patient's genotypes at these two sites has been developed that allows for more accurate dosing in some populations. although these algorithms are being developed and tested it should be kept in mind that they do not account for drug-drug interactions. the era of the study of the genetic impact on critical illness in children is present.
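the allele activities quoted above can be turned into a rough illustration of how a cyp2c9 genotype might scale metabolic capacity. the sketch below simply averages the relative activity of the two alleles; this additive model, the lookup table name and the function name `diplotype_activity` are assumptions made for illustration only. it is not a published dosing algorithm, it ignores vkorc1, and it must not be used for clinical dosing.

```python
# relative metabolic activity per allele, using the approximate values in the text
RELATIVE_ACTIVITY = {"*1": 1.0, "*2": 0.8, "*3": 0.2}


def diplotype_activity(allele_a, allele_b):
    """Mean relative activity of two CYP2C9 alleles.

    Assumes, purely for illustration, that the two alleles contribute
    additively and equally to overall enzyme activity.
    """
    return (RELATIVE_ACTIVITY[allele_a] + RELATIVE_ACTIVITY[allele_b]) / 2


for diplotype in [("*1", "*1"), ("*1", "*2"), ("*1", "*3"), ("*3", "*3")]:
    activity = diplotype_activity(*diplotype)
    print("/".join(diplotype), f"{activity:.0%} of wild type activity")
```

even this crude model shows why a single starting dose cannot suit every patient: under the stated assumption, a *3/*3 individual would clear the drug at roughly one fifth of the wild type rate.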
clinicians must be prepared to deal with the growing body of literature related to genetic influence on critical disease development, treatment and outcome, and be able to critically review the literature in order to determine the impact on the patients they are caring for daily. for the results of these representative genetic association studies to take the leap into clinically impacting care, they must meet certain criteria. first and foremost, the phenotype must be well defined; that is, the enrolled patients with ali/ards or sepsis must meet strict and well accepted criteria. second, they must be high quality studies, utilizing highly sensitive and specific methods for genotyping. third, the studies must use a large sample size to assure that no type i or type ii errors are made based simply upon the number of individuals
disease phenotype, preferably by a different group of investigators. finally, the impact of the genetic variant on the protein product must possess biologic plausibility as impacting the development or the outcome of the disease of study. only after all of these criteria are met should clinicians be comfortable moving to the arena of tailoring therapy based on genetic variations.
4. you are caring for two brothers with acute lung injury secondary to smoke inhalation from an apartment fire. these two infants were apparently sleeping in the same crib when they were rescued simultaneously by fire fighters. the first infant was extubated within 48 h of intubation and is doing well with a minimal oxygen requirement. the second has experienced a much more severe lung insult and remains intubated on high frequency oscillatory ventilation. in attempting to understand the difference in their clinical response to the seemingly identical insult, you suspect that it may be related to a polymorphism in one of the genes that codes for surfactant protein b. in considering this possibility, which of the following is true?
a. although plausible, it is unlikely to be associated with a polymorphism because polymorphisms are rare, occurring in less than one percent of the population.
b. it is not plausible because variances in the translation of such a complex protein require differences in an entire haplotype, and not simply a single nucleotide polymorphism.
c. it is plausible because genetic polymorphisms may influence the quantity of mrna transcribed and/or the functional activity of the surfactant protein b.
d. it is unlikely as there are no reports of associations between surfactant protein gene polymorphisms and outcomes from pulmonary disease.
e. it is unlikely because dysfunctional surfactant protein b demonstrates an x-linked pattern of inheritance.
key: cord-016364-80l5mua2 authors: menotti-raymond, marilyn; o’brien, stephen j. title: the domestic cat, felis catus, as a model of hereditary and infectious disease date: 2008 journal: sourcebook of models for biomedical research doi: 10.1007/978-1-59745-285-4_25 sha: doc_id: 16364 cord_uid: 80l5mua2 the domestic cat, currently the most frequent of companion animals, has enjoyed a medical surveillance, as a nonprimate species, second only to the dog. with over 200 hereditary disease pathologies reported in the cat, the clinical and physiological study of these feline hereditary diseases provides a strong comparative medicine opportunity for prevention, diagnostics, and treatment studies in a laboratory setting. causal mutations have been characterized in 19 felid genes, with the largest representation from lysosomal storage enzyme disorders. corrective therapeutic strategies for several disorders have been proposed and examined in the cat, including enzyme replacement, heterologous bone marrow transplantation, and substrate reduction therapy. genomics tools developed in the cat, including the recent completion of the 2-fold whole genome sequence of the cat and genome browser, radiation hybrid map of 1793 integrated coding and microsatellite loci, a 5-cm genetic linkage map, arrayed bac libraries, and flow sorted chromosomes, are providing resources that are being utilized in mapping and characterization of genes of interest.
a recent report of the mapping and characterization of a novel causative gene for feline spinal muscular atrophy marked the first identification of a disease gene purely from positional reasoning. with the development of genomic resources in the cat and the application of complementary comparative tools developed in other species, the domestic cat is emerging as a promising resource of phenotypically defined genetic variation of biomedical significance. additionally, the cat has provided several useful models for infectious disease. these include feline leukemia and feline sarcoma virus, feline coronavirus, and type c retroviruses that interact with cellular oncogenes to induce leukemia, lymphoma, and sarcoma. mankind has held a centuries-long fascination with the cat. the earliest archeological records that have been linked to the domestication of felis catus date to approximately 9500 years ago from cyprus, 1 with recent molecular genetic analyses in our laboratory suggesting a middle eastern origin for domestication (c. driscoll et al., unpublished observations). currently the most numerous of companion animals, numbering close to 90 million in households across the united states (http://www.appma.org/press_industrytrends.asp), the cat enjoys a medical surveillance second only to the dog and humankind. in this chapter we review the promise of the cat as an important model for the advancement of human hereditary and infectious disease and the genomic tools that have been developed for the identification and characterization of genes of interest. for many years we have sought to characterize genetic organization in the domestic cat and to develop genomic resources that establish f. catus as a useful animal model for human hereditary disease analogues, neoplasia, genetic factors associated with host response to infectious disease, and mammalian genome evolution.
2, 3 to identify genes associated with inherited pathologies that mirror inherited human conditions and interesting phenotypes in the domestic cat, we have produced genetic maps of sufficient density to allow linkage or association-based mapping exercises. 4-11 the first genetic map of the cat, a physical map generated from a somatic-cell hybrid panel, demonstrated the cat's high level of conserved synteny with the human genome, which offered much promise for the future application of comparative genomic inference in felid mapping and association exercises. 12 several radiation hybrid (rh) and genetic linkage (gl) maps have since been published. 4-9, 11, 13, 14 although previous versions of the cat gene map, based on somatic cell hybrid and zoo fish analysis, 15, 16 revealed considerable conservation of synteny with the human genome, these maps provided no knowledge of gene order or intrachromosomal genome rearrangement between the two species, information that is critical to applying comparative map inference to gene discovery in gene-poor model systems. radiation hybrid (rh) mapping has emerged as a powerful tool for constructing moderate- to high-density gene maps in vertebrates by obviating the need to identify interspecific polymorphisms critical for the generation of genetic linkage maps. 7 the most recent rh map of the cat 8 includes 1793 markers: 662 coding loci, 335 selected markers derived from the cat 2x whole genome sequence targeted at breakpoints in conserved synteny between human and cat, and 797 short tandem repeat (str) loci. the strategy used in developing the current rh map was to target gaps in the feline-human comparative map, and to provide more definition in breakpoints in regions of conserved synteny between cat and human.
the 1793 markers cover the length of the 18 feline autosomes and the x chromosome at an average spacing of one marker every 1.5 mb (megabase), with fairly uniform marker density. 8 an enhanced comparative map demonstrates that the current map provides 86% and 85% comparative coverage of the human and canine genomes, respectively. 8 ninety-six percent of the 1793 cat markers have identifiable orthologues in the canine and human genome sequences, providing a rich comparative tool, which is critical in linkage mapping exercises for the identification of genes controlling feline phenotypes. figure 25-1 presents a graphic display of each cat chromosome and blocks of conserved syntenic order with the human and canine genomes. 8 one hundred and fifty-two cat-human and 134 cat-dog homologous synteny blocks were identified. alignment of cat, dog, and human chromosomes demonstrated different patterns of chromosomal rearrangement with a marked increase in interchromosomal rearrangements relative to human in the canid lineage (89% of all rearrangements), as opposed to the more frequent intrachromosomal rearrangements in the felid lineage (95% of all rearrangements) since divergence from a common carnivore ancestor 55 my ago. with an average spacing of 1 marker every 1.5 mb in the feline euchromatic sequence, the map provided a solid framework for the chromosomal assignment of feline contigs and scaffolds during assembly of the cat genome, 17 and served as a comparative tool to aid in the identification of genes controlling feline phenotypes. as a complement to the rh map of the cat, a third generation linkage map of 625 strs is currently nearing completion. the map has been generated in a large multigeneration domestic cat pedigree (n = 483 informative meioses).
18 previous first- and second-generation linkage maps of the cat were generated in a multigeneration interspecies pedigree generated between the domestic cat and the asian leopard cat, prionailurus bengalensis, 7 to facilitate the mapping and integration of type i (coding) and type ii (polymorphic str) loci. 7 the current map, which spans all 18 autosomes with single linkage groups, has twice the str density of previous maps, providing a 5-cm resolution. there is also greatly expanded coverage of the x chromosome, with some 75 str loci. marker order between the current generation rh and gl maps is highly concordant. 8 approximately 85% of the strs are mapped in the most current rh map of the cat, 8 which provides reference and integration with type i loci. whereas the third-generation linkage map is composed entirely of str loci, the sequence homology of extended genomic regions adjacent to the str loci in the cat 2x whole genome sequence, 17 to the dog's homologous region, 19 has enabled us to obtain identifiable orthologues in the canine and human genome sequences for over 95% of the strs. thus, practically every str acts as a "virtual" type i locus, with both comparative anchoring and linkage map utility. combined with the cat rh map, these genomic tools provide us with the comparative reference to other mammalian genomes critical for linkage and association mapping. the domestic cat is one of 26 mammalian species endorsed by the national human genome research institute (nhgri) human genome annotation committee for a "light" 2-fold whole genome sequence, largely to capture the pattern of genome variation and divergence that characterizes the mammalian radiations (http://www.hgsc.bcm.tmc.edu/projects/bovine/, http://www.broad.mit.edu/mammals/).
although light genome coverage provides limited sequence representation (∼80%), 20 one of the rationales for these light genome sequences included "enhancing opportunities for research on species providing human medical models." the 2-fold assembly of the domestic cat genome has recently been completed for a female abyssinian cat, "cinnamon," 17 and a 7x whole genome sequencing effort is planned in the near future. a total of 9,161,674 reads were assembled to 817,956 contigs, covering 1.642 gb with an n50 (i.e., half of the sequenced base pairs are in contigs
1 informatics for infectious disease research and control
fig. 1.1 alignment of genomes of three strains of staphylococcus aureus (strain labels in the figure: s.aureus usa300, s.aureus col). dna sequences that find a perfect match are connected with red lines or blocks. blue areas are inversions or transitions and white areas represent indels. the figure was produced using artemis software (the wellcome trust sanger institute, uk)
90% of genome in contigs, (2) average contig length > 5 kb, (3) >90% of a set of conserved genes present, (4) contig n90 length > 5 kb, (5) >90% of bases > 5× read coverage, (6) scaffold n90 length > 20 kb. the information used to annotate genomes comes from three types of analysis: (1) ab initio gene finding programs, which are run on the dna sequence to predict protein coding genes; (2) evidence-based gene calling or translating alignments of the dna sequence to known proteins; and (3) aligning cdnas from the same or related species. gene finding has progressed far beyond the simple identification of open reading frames. the programs aligning cdna and protein sequences to genomic dna can locate the protein coding regions by searching the publicly available databases or by applying machine learning algorithms such as hidden markov models (hmm).
there is a long list of such programs including genemark, morfind, prodigal (prokaryotic dynamic programming genefinding algorithm), argon and glimmer (gene locator and interpolated markov modeller) (delcher et al. 1999; suzek et al. 2001; majoros 2007). they differ in the time required for automated annotation as well as the quality of gene calling (guigo et al. 2006). problems with the accuracy of current gene finders reflect not only the performance of their algorithms but also the quality of the primary resources and the abundance of non-coding dna regions in microbial genomes. genome assembly annotation methods and tools, including new applications for rna genes, have been reviewed in detail elsewhere (stothard and wishart 2006; médigue and moszer 2007; brent 2008; pop and salzberg 2008). recent breakthroughs in high-throughput sequencing technologies have posed new challenges for genome assembly, annotation and analysis. these technologies make it feasible to sequence not only static genomes but also entire transcriptomes expressed under different conditions (shendure and ji 2008). however, they can produce read lengths as short as 35-40 nucleotides, which cannot be analyzed with software developed for sanger data as they are often non-unique, lack neighborhood context and have a different distribution of errors. the task of linking such short reads may be accomplished using a comparative assembly algorithm, in which new sequences are put together by mapping them onto close relatives or the "reference genomes." not surprisingly, the comparative assembly strategy works best when the two species are more than 90% identical. alternatively, when no "reference genome" is available, the new cohort of assembly algorithms based on de bruijn graphs (a way to transform sequence data into a network structure) has risen to the task (chaisson and pevzner 2008; maclean et al. 2009).
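the de bruijn graph idea mentioned above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the algorithm of any particular assembler: the reads are error-free, the toy reads and the helper names `de_bruijn_graph` and `walk_unambiguous_path` are hypothetical, and real tools also handle sequencing errors, repeats and reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # keep one edge per distinct k-mer
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop on a cycle instead of looping forever
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Hypothetical error-free short reads that overlap each other.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)
```

in this toy case every node has a single successor, so the walk spells out one contig; branching nodes, which arise from repeats or errors, are where real assemblers need additional logic.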
strategies and systems that address these new challenges have recently been reviewed elsewhere (pop and salzberg 2008; maclean et al. 2009; ussery et al. 2009). metagenomics, or the sequencing of genomes of complex mixed communities, has emerged at the interface of genomics, microbiology and information technology. this field examines the interplay of hundreds of microbial species present at specific sites of potential infections in space and time (hutchinson 2007; smarr et al. 2009). significantly, metagenomics has extended its focus from environmental microorganisms to microbial communities or "community whole genome sequences" of the human host (field et al. 2006; verberkmoes et al. 2009). most of the 10-100 trillion microorganisms in the human gastrointestinal tract live in the colon (turnbaigh et al. 2007). the genomes of these microbial symbionts have been collectively defined as the microbiome or ecosystem in which the number of microbial genes is estimated to be many fold higher than the number present in the human genome. the human gut microbiome initiative, a logical conceptual extension of the human genome project, aims to discover genomes of at least 100 new intestinal species. this approach has targeted the totality of genes involved in the gut biofilms, the mechanisms of horizontal gene transfer, and the role of the microbial pan-genome (field et al. 2006). the microbiome project aims to address some of the most inspiring and fundamental scientific questions today in order to identify new ways to determine health and predisposition to diseases and define parameters. in addition to conventional strings of nucleotides, large-scale sequencing can provide new types of data reflecting global genome architecture and the properties of pathogens. these data include the size of a genome and its nucleotide composition, the locations of genes and intergenic regions, gc percentage and gene density.
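as a rough illustration of the genome-level descriptors listed above, the following python sketch computes genome size, gc percentage, gene density and the coding fraction. the sequence and gene coordinates are made up for the example, the function names are hypothetical, and gene intervals are assumed to be half-open (start, end) pairs that do not overlap.

```python
def gc_percentage(sequence):
    """Percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gene_density(gene_coords, genome_length):
    """Number of genes per kilobase of genome."""
    return 1000.0 * len(gene_coords) / genome_length


def coding_fraction(gene_coords, genome_length):
    """Fraction of the genome covered by genes (assumes non-overlapping intervals)."""
    covered = sum(end - start for start, end in gene_coords)
    return covered / genome_length


# Hypothetical 2,000 bp genome with two annotated genes.
genome = "ATGC" * 500
genes = [(0, 900), (1000, 1600)]

print(len(genome))                          # genome size in bp
print(gc_percentage(genome))                # gc percentage
print(gene_density(genes, len(genome)))     # genes per kb
print(coding_fraction(genes, len(genome)))  # share of bases inside genes
```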
microbial genomes are compared by the number of particular sets of genes, gene order (synteny) and the presence or absence of important genes. other metrics include gene set properties (the number of two-component system regulatory genes) and nucleotide sequence-based measures (distance between paired two-component system genes and consensus sequence) (whitworth 2008; ussery et al. 2009). these metrics represent a global view of genomes but often have limited biological meaning. thus, "signature" sequences have been suggested as a means of identifying organisms or genes with sequence profiles correlating with the pathogen phenotype or disease outcomes. examples of genome characteristics that are more directly related to biologically important behavior are bacterial iq (a measure of the number of signal transduction proteins as a function of genome size) and extrovertedness (the proportion of signaling proteins predicted to sense external stimuli) (galperin 2005). analyses of genomics data challenge the traditional taxonomy of microbial species. recent projects have focused on producing simple analytical diagnostic tools based on strong taxonomic knowledge collated in dna reference libraries such as the dna barcode of life data system (bold; http://www.boldsystems.org). these types of data enable the acquisition, storage, analysis and publication of dna barcode results, and provide clues about the global distribution of species and their genetic diversity and structure. dna barcoding is based on two postulates: first, that every species is represented by a unique dna barcode (indeed there are 4^650 possible atgc combinations compared to an estimated 10 million species remaining to be discovered (frézal and leblois 2008)), and second, that the genetic variation between species exceeds the variation within species. dna barcoding requires a minimum sequence length of 500 bp and more than three individual sequences per species.
the initial barcode of life framework was based on the sequence of a single universal marker, the cytochrome c oxidase gene, but has evolved since then, giving rise to a flexible description of dna barcoding, a larger range of applications and the broader use of the term "barcode" (frézal and leblois 2008). for example, whole microbial genome barcodes were defined as frequency distributions of periodic dna sequences or k-mers across the whole genome (zhou et al. 2008). it has been postulated that such barcode similarities are proportional to the genomes' phylogenetic closeness and could be utilized in metagenome analyses (zhou et al. 2008). microbial species diversity can also be estimated by the average nucleotide identity (ani) using the list of orthologs and deriving the overall divergence of the core genome by averaging the percentages of identity at the nucleotide level (konstantinidis and tiedje 2005). another approach to measure distances between genomes is based on estimating the proportion of common genes by calculating the ratio of orthologs to the total number of genes of the reference genome. more recently, similar methods such as dna content, blast distance phylogeny and the mum (maximal unique and exact matches) index have been suggested as more sensitive measures for intra-species comparisons (deloger et al. 2009). the true power of large-scale comparative genomic studies lies in their ability to identify and characterize biological trends and rules that explain particular phenomena (field et al. 2006). computational methods have become essential steps in formulating hypotheses about gene functions. the comparative approach has not only yielded fundamental insights into the function and evolution of microbial genomes, but has also led to practical results.
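the genome distance measures described above (k-mer "barcodes", average nucleotide identity and the shared-gene ratio) can be sketched as follows. this is a simplified illustration, not the published procedures: the barcode here is a plain genome-wide k-mer frequency vector, the ortholog identities are assumed to have been computed beforehand by an alignment tool, and all function names and numbers are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_barcode(sequence, k=4):
    """Normalized frequency of every possible k-mer, in a fixed lexicographic order."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers with ambiguous bases are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def barcode_distance(a, b):
    """Euclidean distance between two barcodes; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_nucleotide_identity(ortholog_identities):
    """Mean percent identity over the ortholog pairs shared by two genomes."""
    return sum(ortholog_identities) / len(ortholog_identities)


def shared_gene_ratio(n_orthologs, n_reference_genes):
    """Proportion of reference-genome genes that have an ortholog in the query."""
    return n_orthologs / n_reference_genes


# Hypothetical toy sequences and precomputed ortholog identities.
genome_a = "ATGCGCGATATCGCGCATAT" * 10
genome_b = "ATATATATTTAAATATATTA" * 10
barcode_a = kmer_barcode(genome_a, k=3)
barcode_b = kmer_barcode(genome_b, k=3)

print(barcode_distance(barcode_a, barcode_a))
print(barcode_distance(barcode_a, barcode_b))
print(average_nucleotide_identity([98.0, 96.0, 100.0]))
print(shared_gene_ratio(1800, 2000))
```

using fixed k-mer order for every genome keeps the barcodes directly comparable, which is what allows a simple vector distance to stand in for similarity.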
comparative genomics has allowed the accurate estimation of the structure of genomes and the speed of gene movements, including the role of natural selection versus genetic drift, the origin of the pandemic strains, and the ecology of a pathogen in its natural reservoir (yang et al. 2008a). computational studies identified unexpected relationships between genomic features and ecological niches, demonstrated diversity in the microbial world and helped to reconstruct evolutionary relationships among genomes (binnewies et al. 2006; field et al. 2006). comparisons made between different genomes can also generate new hypotheses for testing, usually relating to the unexpected presence or absence of particular genes with respect to other genomes (whitworth 2008). the studies of three main forces shaping genome evolution (gene loss, gain and change) have been especially fruitful in this respect (burrack et al. 2007; whitworth 2008). discoveries of gene duplication in many bacterial pathogens, resulting in increased numbers of key gene clusters or the expansion of important protein families, have led to the development of new diagnostic methods. for example, one such gene cluster encodes a secreted protein called the early secretory antigenic target 6 (esat6), which was identified as one of the key virulence factors in mycobacterium tuberculosis and was subsequently used in the interferon-gamma release assays for the diagnosis of tuberculosis (pallen and wren 2007; behr 2008). comparative genomics has also revealed that pathogens undergo a process of genome decay or a reduction in the number of biosynthetic pathways, resulting in a dependence on the infected host for certain essential functions. the most surprising snapshots of genome decay have come from relatively recently emerged pathogens that have changed their lifestyles by adopting a simpler host-associated niche.
for example, the genomes of yersinia pestis (parkhill et al. 2001b) and salmonella enterica serovar typhi (parkhill et al. 2001a) contain hundreds of pseudogenes. these findings challenge the traditional view that bacterial genomes never contain "junk" dna and that every gene in a bacterial genome must have a function. instead, every genome should be viewed as a work in progress, burdened with some non-functional "baggage of history" (pallen and wren 2007). as the smallest-scale variation in microbial genomes occurs at the level of single-nucleotide polymorphisms (snps), snp detection has been applied extensively to many pathogens (yao et al. 2008). while snps are generally considered rare, at one per several thousand base pairs, two isolates of m. tuberculosis with genomes of 4 mb each may differ by some 1,000 snps (behr 2008). whole-genome sequencing has proven to be an even more powerful tool to detect snps. it enabled the differentiation of escherichia coli strains that had diverged for as few as 200 generations (shendure and ji 2005) and revealed genomic changes in pathogens in the process of human infection (chen et al. 2006; forst 2006; pallen and wren 2007). in the pre-informatics era, virulence factors were typically identified either by biochemical studies or through genetic screens. informatics has enabled innovative strategies for the recognition of virulence genes through the analysis of genetic signatures (pallen and wren 2007). despite the variety of microbial life styles and associated genomic and metabolic complexity, pathogen genomes share common architectural principles. as a result, computational techniques assist in exploring similarities between virulence factors and other genes with known functions. this association can then be tested using targeted genetic methods such as the inactivation of the putative virulence gene followed by the comparison of phenotypes of the original and modified microorganisms (raskin et al.
2006). a strategy that does not rely on sequence similarity for identifying potential genes is the detection of coding sequences, which is based on gene context "grammars" supplemented with machine learning models (garrido et al. 2008). for example, the functional gene recognition tools genemark and glimmer employ hidden markov models, in which the preceding nucleotide bases are used to predict the next base in a coding region, and the algorithm is trained on a trusted set of sequences. gene coding regions are then identified using probability estimates of the correct coding "grammar" in a region (dougherty et al. 2002). different statistical and machine learning methods for gene prediction have been reviewed elsewhere (majoros 2007). gene-gene interactions specifically associated with a phenotype or a particular disease can be explored with or without prior biological knowledge. several techniques utilizing bayesian networks, pair-wise mutual information and graphical gaussian models have been proposed for this purpose. coupled with biological knowledge, the identification of such phenotype-specific interactions can shed light on the responsible pathways. the complexity of data handling and visualization has led to efforts to develop dedicated comparative genomics resources such as gendb (meyer et al. 2003), cmr, act (table 1.1), xbase and microbes online as well as data management systems such as seed (table 1.2) (chaudhuri et al. 2008). informatics has been instrumental in the change from a static to a dynamic view of the microbial world. in contrast to the static view of genome annotations focused on the gene or protein prediction, the dynamic view places information obtained into a biological context to identify interactions between the genomic components and the reconstruction of regulatory networks (médigue and moszer 2007; sakata and winzeler 2007).
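the markov-model idea described earlier in this passage, in which the preceding bases are used to predict the next base and the model is trained on a trusted set of sequences, can be illustrated with a small python sketch. it is an assumption-laden simplification and not the actual algorithm of genemark or glimmer: it uses a fixed-order markov chain with pseudocounts, a log-odds score against a background model, and tiny made-up training sequences.

```python
import math

BASES = "ACGT"


def train_markov(sequences, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from training sequences."""
    counts = {}
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            if base not in BASES or any(c not in BASES for c in context):
                continue  # skip ambiguous bases
            row = counts.setdefault(context, {b: pseudocount for b in BASES})
            row[base] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: row[b] / total for b in BASES}
    return model


def log_likelihood(seq, model, order=2):
    """Log-probability of a sequence; unseen contexts fall back to a uniform guess."""
    seq = seq.upper()
    uniform = 1.0 / len(BASES)
    total = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        p = probs.get(seq[i], uniform) if probs else uniform
        total += math.log(p)
    return total


def coding_log_odds(seq, coding_model, background_model, order=2):
    """Positive score: the region looks more like the coding training set."""
    return (log_likelihood(seq, coding_model, order)
            - log_likelihood(seq, background_model, order))


# Hypothetical, very small training sets (real tools train on many trusted genes).
coding_model = train_markov(["ATGGCTGCTGCTGCTGCTTAA"])
background_model = train_markov(["ATATATATATATATATATAT"])

print(coding_log_odds("GCTGCTGCTGCT", coding_model, background_model))
print(coding_log_odds("ATATATATAT", coding_model, background_model))
```

a sliding window scored this way gives a crude per-region estimate of how well the local "grammar" matches coding sequence; production gene finders add codon-position-specific models, start and stop signals, and interpolation across model orders.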
under the network vision of the microbial world, microbial chromosomes are not envisaged as strictly defined genotypes gradually changing in time but rather as islands of temporary, relative dynamic stability that form tightly connected (vertically and horizontally) areas of the network (koonin and wolf 2008). the infection cycle should be considered as a whole and the links between growth, virulence, immune evasion and transmission should be assessed (restif 2009). biological interactions vary in their nature and are spatially and temporally heterogeneous. one can abstract the actions of proteins and metabolites by representing genes acting on other genes as a gene network or as genetic regulatory, transcription or expression networks. such networks can be constructed using computationally assigned functional linkages inferred by rosetta stone, operon or similar methods (rachman and kaufmann 2007; harrington et al. 2008), and often point to highly connected and central proteins frequently referred to as "hubs" (wu et al. 2008). biological interaction and communication networks share several commonalities: they are scale free (only a few nodes are highly connected) and are small world networks (highly clustered with short distances between any two nodes) (kann 2008). increasingly, disease pathogenesis and the mechanisms of drug action are viewed from a biological systems perspective (wu et al. 2008). from this perspective, a deeper understanding of infectious diseases may rely on an exhaustive characterization of all potential interactions occurring between proteins encoded by viruses and those expressed in infected cells. thus, the integration of all protein-protein interactions into an infected cellular network, or "infectome," offers a powerful framework for the virtual modeling and analysis of infections (navrati et al. 2009). the terms "interactome" and "phenomics" have been coined in this context (lussier and liu 2007).
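the notions of "hubs" and clustering used above can be made concrete with a short python sketch. it is only a toy example under stated assumptions: the interaction list and protein names (p1 to p5) are invented, the network is treated as undirected, and hubs are simply the nodes with the highest degree.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected interaction network stored as node -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degrees(adjacency):
    """Number of interaction partners per node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def hubs(adjacency, top=1):
    """The `top` most connected nodes (ties broken alphabetically)."""
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return ranked[:top]


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that also interact with each other."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical protein-protein interactions.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"), ("P2", "P3")]
network = build_network(interactions)

print(degrees(network))
print(hubs(network, top=1))
print(clustering_coefficient(network, "P2"))
```

in a scale-free network most nodes would resemble p4 and p5 (one partner each) while a few resemble p1, which is why degree ranking is a common first pass for locating candidate hubs.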
numerous resources have been developed to explore host-pathogen interactions (phi) (table 1.3). specifically, phi-base (winnenburg et al. 2006), phidias (xiang et al. 2007), biohealthbase (squires et al. 2008), pig (driscoll et al. 2009), virusmint (chatr-aryamontri et al. 2009) and virhostnet (navrati et al. 2009) have been suggested to study and visualize pathogen-related pathways.
table 1.3 (entries as extracted) applications: virulence prediction; drug resistance prediction; effect of diseases on gene expression; drug target identification. sources: lengauer et al. 2007; navrati et al. 2009; raman et al. 2008; reddy et al. 2009; squires et al. 2008; stavrinides et al. 2008.
for example, the virhostnet is a knowledge base for the management and analysis of proteome-wide virus-host interaction networks and a resource of manually curated interactions defined for a wide range of viral species (navrati et al. 2009). genomic and proteomic data are often informationally synergistic, allowing for the reconstruction of known pathways from first principles. the combination of these forms of data has been used to identify libraries of recurring motifs, where the mixed semantics of the pattern promises to be more informative than any single data source taken in isolation in building biological networks (michael et al. 2008; stavrinides et al. 2008). systems biology has arisen from various attempts to move away from the reductionist approach, which is hindered by the difficulty of breaking a system into separable and meaningful parts. it encompasses several high-throughput analytic technologies, including genomics, transcriptomics to measure gene expression and its regulation at the level of messenger rna and microrna production, proteomics to measure changes in protein production, and computational biology, which depends on analytic software packages for analyzing, organizing, and interpreting those data (sakata and winzeler 2007).
such an approach treats pathogens and their environments as a series of hierarchical levels or networks from gene products to whole organisms and integrates the time dimension in order to structure knowledge and to determine rules that would allow navigation between levels (lisacek et al. 2006). this approach demands new tools for data management, the integration of which offers the opportunity to correlate multiple lines of evidence and to reduce uncorrelated noise. the major difference between the pre- and post-genomics eras is that one can now potentially account for and keep track of all components at once. however, the gathering of a large collection of data does not guarantee that we can make sense of it or that new knowledge will emerge (collado-vides et al. 2009). the chance for enriching biomedical knowledge can be increased by mixing various streams of data and gaining robustness from the "cross-validation" of the knowledge sources (guyet et al. 2007). public websites like galaxy (http://galaxy.psu.edu) and interpro (http://www.ebi.ac.uk/interpro/) offer integration toolsets for genomics and proteomics analyses. as generating data remains a costly undertaking, computational models have a pivotal role to play in integrative science. they help researchers to illuminate the underlying processes and identify the key questions that need to be addressed experimentally (restif 2009). compared to conventional, small-scale experimental approaches, they give a wider, often more relevant view of host responses to infections or other health insults. these computational models have the capacity to guide and direct wet lab experimental efforts, complementing traditional in vivo, in situ, and in vitro testing with the emerging in silico approach (lengauer et al. 2007; raman et al. 2008). some impressive starts have been made on bacterial models in the form of simulation tools.
for example, the reconstruction of metabolic networks gave birth to the first examples of in silico strains that can be utilized to explore alternative ways of identifying new drug targets (jamshidi and palsson 2007). the end result of these simulations may be the genomic bioengineering of microorganisms based on knowledge of interacting systems and networks of genes and gene products. text mining tools are being created to query the pubmed literature database and to integrate the available genomic and proteomic information to map the genes and their interrelationship with particular networks of a disease (korbel et al. 2005; jelier et al. 2008; rzhetsky et al. 2008; zaremba et al. 2009). an unsupervised, systematic approach for associating genes and phenotypic characteristics (g2p) that combines literature mining with comparative genome analysis has been successfully applied and has uncovered clusters of unsuspected g2p associations (korbel et al. 2005). the phase of history in which biomedical science could be significantly advanced by individual researchers without data sharing has come to a close. the global, collaborative analyses of data and the exchange of the results across social, political and technological boundaries have created the demand for new cyber-infrastructures for research. there has been a major effort, in the form of e-science, to develop technologies to fulfill these demands (craddock et al. 2008). the chance of making a discovery or replicating a finding is greatly increased if there are effective mechanisms for different groups to share data and thereby enlarge the number of samples that are studied. this paradigm has been successful in both human genomics and infectious disease research (e.g., the rapid discovery and identification of emerged pathogens such as the nipah virus and the novel coronavirus that caused the sars epidemic).
post-genomic era solutions such as federated databases and other technologies that enhance connectivity and data retrieval have created a new knowledge environment (birkholtz et al. 2006; thorisson et al. 2009). the level of technical competence required of the users is being reduced by the provision of "off-the-shelf" solutions. for example, the gen2phen project offers "database-in-a-box" installation packages, which include an open-source complete genetic association database system with the option for federation (thorisson et al. 2009). alternative infrastructures for e-science with significant advantages over conventional internet technologies are offered by grid and cloud computing and the semantic web (numann and prusak 2007; craddock et al. 2008). first, grids provide unique access to high performance computing power, distributed applications and sources (see chap. 14 for examples). second, grids increase data storage spaces, and allow data and tools to be shared by geographically dispersed users. however, developing and maintaining grid or cloud architectures remains a complex task and requires further advances in security and privacy models before they can be embraced by diagnostic laboratories (lisacek et al. 2006). tasks that require an e-science approach or global science that is performed in silico are typically computationally intensive and use heterogeneous resources that must be integrated across distributed networks (craddock et al. 2008). increasingly, the genomic, proteomic and metabolomic data have to be integrated with traditional literature in a machine-readable way. typical sets of experimental data yield component lists with quantitative content data and a catalog of interactions and networks. this requires the establishment of a middleware to convert experimental data into a format suitable for manipulation and viewing by end-users.
for example, the generic model organism database project (gmod; http://gmod.org) aims to link experimental data with corresponding contextual meta-data about experimental conditions and protocols in a multi-user, multi-center environment. it offers a collection of open source tools for creating and managing genome-scale biological databases ranging from a small database of genome annotations to a large web-accessible community database. another approach is to trade off the width of integration for more depth with regard to a particular analysis task, and to employ workflow systems such as inforsense (http://www.inforsense.com) or taverna (http://taverna.sf.net). these act as glue layers between various data sources and analysis packages and are also often referred to as pipelines, in silico protocols or e-experiments (turnbaigh et al. 2007) . "pipeline" is mostly used to describe executable workflows, while the other terms are dedicated to abstract workflows (lisacek et al. 2006) . many innovative solutions for the multi-dimensional integration of data produced by experimental laboratories have been introduced by bioinformatics resource centers for biodefense and emerging/re-emerging infectious diseases through regional biodefense centers of excellence (mcneal et al 2007; greene et al. 2007) . sets of task-and domain-specific online query and display tools are being developed to allow the end-user to view data in a number of different formats and to run informative comparisons of data with existing libraries (louie et al. 2007; glassner et al. 2008) . the most striking change in data collection and representation is expressed by the move from flat databases to atlases or collections of interconnected maps (lisacek et al. 2006 ). the uneven content and quality of data and the constant evolution of biomedical knowledge remain the main obstacles to data integration (lisacek et al. 2006 ). 
the quality of data is affected by a number of factors including the accuracy of the mapping algorithms and reference datasets, the standardization of data formats and the level of detail of the experiment description (stead et al. 2008 ). in addition, an increasing number of genomes are being released in "draft" form, before the finishing stage of a sequencing project, with high sequencing error rates (de keersmaesher et al. 2006; médigue and moszer 2007) . recent developments in databases and browsers for genomics have been summarized by schattner (2008) . there is an urgent need for data structures suitable for infectious disease space that can be applied to emerging "omics" data sets. the pathogen information markup language (piml) has also recently been introduced to enhance the interoperability of microbiology datasets for pathogens with epidemic potential (he et al. 2005) by capturing the data elements that describe determinants of pathogen profiles. however, the jury is still out on the question of which data integration architectures are best suited to assembling large scale and highly diverse genomic data. integrating high-throughput techniques with other analytic tools brings a new understanding of infectious processes and introduces an era of personalized strategies for managing infectious diseases. in this way, informatics becomes an irreplaceable platform for the constant cross-fertilization and interplay between focused and genome-wide studies. rapid and standardizable molecular identification systems have emerged during the last decade, with the development of sequence based species identification and sub-typing as the alternative to slow, labor-intensive and underpowered phenotypic techniques. molecular identification usually relies on the detection of a single gene or multiple gene targets, or requires the comparison of whole microbial genomes. 
for example, in the pragmatic world of diagnostic bacteriology, conserved housekeeping genes such as the 16s rrna gene, rpob gene and others have been accepted as reliable targets. they are found in all microorganisms and show enough sequence conservation for accurate alignment as well as enough variation for phylogenetic analyses (christen 2008). furthermore, the 16s rrna gene-based phylogeny is sufficiently congruent with those based on whole genome approaches. sequencing of six to eight genes or loci, as is typically done in multilocus sequence typing analysis, may constitute a reasonable compromise between single gene-based and whole genome-based methods for species diversity studies. to streamline the translation of sequencing-based identification into clinical practice, the concept of the pathogen profile has been introduced. a pathogen profile is a single, multivariate observation or set of observations, comprised of classes of specific attributes (e.g., genome, transcriptome, proteome or metabolome data), which are designed to allow the interrogation of existing or future databases, and the integration of genomics and post-genomics data with clinical observations and patient outcomes. the profile may indicate the probability that a specific marker is associated with a clinically relevant phenotype such as in vivo antimicrobial resistance or high transmissibility. this information allows the classification of strains into "risk groups" for treatment failure or a propensity to cause outbreaks of infections. it is often important to capture the quantitative information about a pathogen in vivo, i.e. viral or bacterial loads and their units of measurement. in contrast to traditional subtyping, which is based on phenotypic characteristics such as serotype, biotype, phage type or antimicrobial susceptibility, genetic profiling describes the phenotypic potential in the nucleic acid sequence.
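as a rough illustration of the pathogen profile concept described above, the following python sketch models a profile as a record that holds classes of attributes, marker-to-phenotype probabilities and a quantitative load with its unit. the class name, field names, marker names and the 0.5 threshold are hypothetical choices made for this example; they are not taken from the chapter or from any existing database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PathogenProfile:
    """Hypothetical container for one pathogen profile (illustrative only)."""

    organism: str
    # Attribute classes, e.g. "genome" or "transcriptome" -> marker -> observed value
    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Probability that a marker is associated with a clinically relevant phenotype
    marker_phenotype_prob: Dict[str, float] = field(default_factory=dict)
    # Quantitative in vivo load and its unit of measurement
    load: Optional[float] = None
    load_unit: Optional[str] = None

    def risk_group(self, threshold: float = 0.5) -> str:
        """Assign a coarse risk group from the highest marker probability.

        The threshold is an assumed cut-off, not a validated clinical value.
        """
        if not self.marker_phenotype_prob:
            return "unknown"
        top = max(self.marker_phenotype_prob.values())
        return "high" if top >= threshold else "low"


# Example with made-up values
profile = PathogenProfile(
    organism="example isolate",
    attributes={"genome": {"marker_a": "present"}},
    marker_phenotype_prob={"marker_a": 0.8},
    load=1.0e5,
    load_unit="copies/ml",
)
print(profile.risk_group())
```

a real profile would draw its probabilities from curated genotype-phenotype data rather than from hard-coded numbers; the sketch only shows how the attribute classes and the risk grouping could sit together in one record.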
a pathogen profile is a synthesis of various markers and clinical end-points, which can be extracted from medical charts that characterize an individual patient's clinical and public health outcomes. the profile may be heuristic, when only a single genetic marker is associated with a specific patient outcome, while more insights can be achieved when attributes from different levels of the biological hierarchy (i.e. gene detection, gene expression, metabolite profiles etc) corroborate and complement each other. machine learning algorithms, such as e-predict (urisman et al. 2005) , are being developed to identify viruses and bacteria present in clinical samples. these profiles are based on the microarray hybridization patterns or dna sequences of pathogens. many computerized evidence-based guidelines and decision support systems (dss) have been designed to improve the effectiveness and efficiency of antibiotic prescribing (samore et al. 2005; buising et al. 2008) . the most frequently utilized are electronic guidelines and protocols, especially for the empirical selection of antibiotics. the majority of dss result in improvement in clinical performance and, in at least half of the published trials, in improved patient outcomes (finch and low 2002; sintchenko et al. 2008a) . the revival of interest in prescribing-decision-support reflects the recent change in emphasis from support for diagnostic decisions towards support for patient management, and the changing focus from systems targeting a broad range of clinical diagnoses to task-and condition-specific decision aids. despite reported successes of individual applications, the safety of electronic prescribing systems in routine practice has recently been identified as an issue of potential concern. bioinformatics assisted prescribing has become a new frontier in reducing the complexities of prescribing combinations of antimicrobials in the era of multidrug resistance. 
the great diversity of mutational patterns contributing to antimicrobial resistance complicates the choice of optimal therapies. a range of bioinformatics tools to predict drug resistance or response to therapy from a genotype have been developed to support clinical decision-making (beerenwinkel et al. 2003; lengauer and singh 2006). these tools use either a statistical approach, in which the inferred model and prediction are treated as regression problems, or machine learning algorithms, in which the model is addressed as a classification problem (sintchenko et al. 2008a). a statistical learning approach to the ranking of therapeutic choices often relies on a direct correlation between the baseline microbial profiles, the therapeutic decision and the patient's response to treatment (e.g., expected reduction in viral load resulting from anti-hiv combination therapy). for example, several susceptibility scores have been used for combination antiretroviral therapy. these take into account specific resistance mutations and add up the activities of individual drugs in the regimen (lengauer and singh 2006). computer-assisted therapy depends on the availability of widely shared databases that can correlate quality-controlled data from genotypic resistance assays and treatment regimens with short- and long-term clinical outcomes. databases such as ardb (liu and pop 2009) capture differences in antimicrobial sensitivities and reflect variation in the amino acid composition of resistant microbes, but simply counting mutations may not be enough to predict functional differences, which affect treatment outcomes. the molecular profiling of pathogens is based on the concept that various pathogens can be associated with different clinical outcomes. it brings together the pathogen and host factors as the pathogenesis and natural history of infection are determined by both the pathogen and human genetic susceptibility.
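the additive susceptibility scores mentioned above for combination antiretroviral therapy can be illustrated with a minimal python sketch. it assumes a rule table that maps each drug to resistance mutations and the fraction of activity each mutation removes; the drug names, mutation labels and penalty values are placeholders invented for this example and do not come from any published scoring system.

```python
from typing import Dict, Iterable, Set

# Hypothetical rule table: drug -> {resistance mutation: fraction of activity lost}
RULES: Dict[str, Dict[str, float]] = {
    "drug_a": {"mut_1": 1.0},
    "drug_b": {"mut_2": 0.5},
    "drug_c": {},
}


def drug_activity(drug: str, mutations: Set[str]) -> float:
    """Return the activity of one drug: 1.0 if fully active, less if mutations apply.

    Drugs missing from the rule table are treated as fully active (an assumption).
    """
    lost = sum(p for m, p in RULES.get(drug, {}).items() if m in mutations)
    return max(0.0, 1.0 - lost)


def susceptibility_score(regimen: Iterable[str], mutations: Set[str]) -> float:
    """Add up the activities of the individual drugs in the regimen."""
    return sum(drug_activity(d, mutations) for d in regimen)


# Genotype carrying two of the placeholder mutations
score = susceptibility_score(["drug_a", "drug_b", "drug_c"], {"mut_1", "mut_2"})
print(score)
```

a higher score suggests that more drugs in the regimen are expected to remain active. as the text notes, simple additive counting of this kind may miss functional interactions between mutations, which is one reason regression and classification models are also used.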
the effectiveness of combining host and pathogen genetics in a single system or "genetics-squared" has been proven in studies of viral infections (persson and vance 2007) . investigations of the impact of host genetics on the susceptibility to hiv infection and the rate of disease progression have mainly used a candidate gene approach to reveal associations with a number of different genes. the genome-wide association studies look at the genetic variation across the human genome in order to uncover factors not previously suspected of influencing infection outcomes. for example, this strategy identified variants of the hiv virus associated with differences in the control of viral load at set points and in disease progression. however, unraveling the interaction between the host and microbial genetic factors requires large clinical trials, reinforcing the role of collaborative networks and data repositories. informatics methods have become critical for data mining to decipher links between genetic variation and disease pathogenesis in order to define markers of disease progression, to guide the optimum use of therapeutics and to refine the drug and vaccine development (mansmann 2005) . a better understanding of the function of genes and other parts of the genome has enabled the reverse engineering approach, which may lead to the characterization and discovery of potential drug targets, vaccine candidates and diagnostic or prognostic markers (davies and flower 2007; yang et al. 2008b) . proteins with essential biological functions present in multiple pathogens could be the best drug targets. once the target genes essential for pathogen survival are identified, their susceptibility to specific compounds derived from large chemical libraries is examined in silico and in vitro (muzzi et al. 2007; biswas et al. 2008 ). 
increases in the use of electronic medical records and the availability of information technology tools have created opportunities for the automation of surveillance and facilitation of surveillance based on either syndromic or disease-specific signals (amadoz and gonzales-candelas 2007; m'ikanatha et al. 2007). the automation of data collection improves the timeliness and completeness of surveillance and allows infection control professionals to focus on interventions (hota et al. 2008; young and stevenson 2008). the comparison of chromosomal sequences allows the identification of the unique genomic signatures of pathogens for the purposes of infection control and "microbial forensics." molecular typing methodologies, in contrast to classical phenotypic methods, allow the discrimination of variations among strains within a species, the elucidation of the route of contamination, the identification of the source of infection as well as the analysis of epidemics. the identification of the natural reservoir and any possible intermediate hosts of pathogens is critical for understanding the transmission modes, designing a long-term disease control strategy, and preventing future reintroduction. bioinformatics assisted biosurveillance addresses the inefficiencies of traditional surveillance, as well as the need for more timely and comprehensive infectious disease monitoring and control. it leverages recent breakthroughs in the rapid, high-throughput molecular profiling of microorganisms and text mining, as well as the growing electronic body of knowledge about the molecular epidemiology of pathogens with epidemic potential. such a framework combines the genetic and geographic data of a pathogen to reconstruct its history and to identify the migration routes through which the strains spread regionally and internationally (cantón 2005; sintchenko et al. 2008b).
computer-based geographic information systems (gis) have offered an efficient way to visualize the dynamics of the transmission of infections, especially in the setting of a community outbreak (mckee et al. 2000; schreiber et al. 2007). another way to track infectious diseases of public health concern is to monitor health-seeking behavior in the form of queries to online search engines used by the general public or health professionals. epidemics of seasonal influenza in areas with a large population of internet users have been successfully detected using google search data and then correlated with visits to a doctor (ginsberg et al. 2009; brownstein et al. 2009). the advent of news aggregators has led to the development of new disease surveillance tools that can continuously mine, categorize, filter, and visualize multilingual online information about epidemics. the global public health intelligence network (gphin), developed almost a decade ago by health canada in collaboration with who, healthmap (http://www.healthmap.org/en) (fig. 1.2) or geosentinel (http://www.istm.org/geosentinel/main.html) among many others are examples of such early warning systems. resources for infection prevention and control on the world wide web have been recently reviewed elsewhere (brownstein et al. 2009; johnson et al. 2009).

the reductionist approach to biomedical research focusing on the study of cells and molecules has peaked with the sequencing of the human genome. however, it is becoming increasingly clear that "taking apart" analyses have reached their limit, and the time has perhaps come for integrative science (an and faeder 2009). developments in informatics have been critical in supporting and engaging with both reductionist and integrative paradigms. on one hand, informatics has equipped comparative genomics with tools to scrutinize genes and explore genetic polymorphisms.
on the other hand, informatics has enabled the generation of integrative and testable hypotheses through the discovery of knowledge in databases and through the study of gene-phenotype connections between a pathogen and its host environment. a variety of data sets can be integrated, including the patient's demographic and clinical presentation, the laboratory results, the pathogen's gene regulation and expression, and metabolic maps with different parameters reflecting the phenotypic behavior of a pathogen and host factors. in early years some skeptics saw informatics-assisted research as a distraction of effort and funding away from traditional hypothesis-driven inquiry. since then, infectious disease informatics has verified its status as a platform for hypothesis generation and testing. new breakthroughs in infectious disease informatics (idi) are the result of cross-pollination between different disciplines that use technologies to gather and disseminate knowledge (fig. 1.3). microbial genome sequence analysis and metagenomics have contributed intriguing new data types and data sources to idi. bioinformatics has brought to idi a range of analytic tools, databases and data standards. conventional health informatics and computer science have provided high performance solutions for data storage, sharing, analysis and visualization as well as clinical terminology libraries, data standards, decision support and technology evaluation frameworks. importantly, the infectious disease informatics community has fed the lessons learnt from the implementation of clinical and public health systems back to the broader audience. as the subsequent chapters of this volume testify, infectious disease informatics is set to lead to the more targeted and effective prevention, diagnosis and treatment of infections through a comprehensive review of the genetic repertoire and metabolic profiles of pathogens.
the post-genomic era offers new opportunities for the efficient discovery of safe and efficacious subunit vaccines by shortcutting the enormous economic burden of the experimental process. our analytical capacity has already become the rate-limiting step in biomedical research. at the same time, it provides an opportunity to apply the engineering paradigm to biomedical research, thereby mandating the development of tools that can dynamically represent a body of current knowledge. however, the simplistic application of brute force computational power to massive reams of biomedical data is unlikely to result in meaningful mechanistic insight. it cannot be overstressed that informatics initiatives should complement "wet laboratory" practices. an iterative loop of discovery and validation between the two methodologies remains the best way forward.

epipath: an information system for the storage and management of molecular epidemiology data from infectious pathogens detailed qualitative dynamic knowledge representation using a bionetgen model of tlr-4 signaling and preconditioning bioinformatics in microbial biotechnology - a mini review geno2pheno: estimating phenotypic drug resistance from hiv-1 genotypes mycobacterium du jour: what's on tomorrow's menu? microb infect ten years of bacterial genome sequencing: comparative-genomics-based discoveries integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
a bioinformatic approach to understanding antibiotic resistance in intracellular bacteria through whole genome analysis steady progress and recent breakthroughs in the accuracy of automated genome annotation digital disease detection -harnessing the web for public health surveillance improving antibiotic prescribing for adults with community acquired pneumonia: does a computerised decision support system achieve more than academic detailing alone?-a time series analysis genomic approaches to understanding bacterial virulence role of the microbiology laboratory in infectious disease surveillance, alert and response artemis and act: viewing, annotating and comparing sequences stored in a relational database short read fragment assembly of bacterial genomes virusmint: a viral protein interaction database xbase2: a comprehensive resource for comparative bacterial genomics vfdb: a reference database for bacterial virulence factors identification of genes subject to positive selection in uropathogenic strains of escherichia coli: a comparative genomics approach identification of pathogens -a bioinformatic point of view bioinformatics resources for the study of gene regulation in bacteria e-science: relieving bottlenecks in largescale genome analyses mauve: multiple alignment of conserved genomic sequence with rearrangements harnessing bioinformatics to discover new vaccines integration of omics data: how well does it work for bacteria? improved microbial gene identification with glimmer a genomic distance based on mum indicates discontinuity between most bacterial species and genera microbial genomics and novel antibiotic discovery: new technology to search for new drugs pig -the pathogen interaction gateway how do we compare hundreds of bacterial genomes? 
a critical assessment of published guidelines and other decision-support systems for the antibiotic treatment of community-acquired respiratory tract infections host-pathogen systems biology four years of dna barcoding: current advances and prospects biosurveillance of emerging biothreats using scalable genotype clustering a census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial iq, extroverts and introverts evaluation of eight different bioinformatics tools to predict viral tropism in different human immunodeficiency virus type 1 subtypes detecting influenza epidemics using search engine query data enteropathogen resource integration center (eric): bioinformatics support for research on biodefense-relevant enterobacteria national institute of allergy and infectious diseases bioinformatics resource centers: new assets for pathogen informatics egasp: the human encode genome annotation assessment project knowledge construction from time series data using a collaborative exploration system predicting biological networks from genomic data piml: the pathogen information markup language informatics and infectious diseases: what is the connection and efficacy of information technology tools for therapy and health care epidemiology dna sequencing: bench to bedside and beyond investigating the metabolic capabilities of mycobacterium tuberculosis h37rv using the in silico strain inj661 and proposing alternative drug targets anni 2.0: a multipurpose text-mining tool for the life sciences resources for infection prevention and control on the world wide web what would you do if you could sequence everything? 
protein interactions and disease: computational approaches to uncover the etiology of diseases analysis of mixed sequencing chromatograms and its application in direct 16s rrna gene sequencing of polymicrobial samples genomic insights that advance the species definition for prokaryotes genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world systematic association of genes to phenotypes by genome and literature mining mega: a biologist-centric software for evolutionary analysis of dna and protein sequences bioinformatics-assisted anti-hiv therapy bioinformatics prediction of hiv coreceptor usage proteome informatics ii: bioinformatics for comparative proteomics ardb -antibiotic resistance genes database data integration and genomic medicine computational approaches to phenotyping: high-througput phenomics infectious disease surveillance application of 'next-generation' sequencing technologies to microbial genetics methods for computational gene prediction genomic profiling: interplay between clinical epidemiology, bioinformatics and biostatistics application of a geographic information system to the tracking and control of an outbreak of shigellosis the national microbial pathogen database resource (nmpdr): a genomic platform based on subsystem annotation annotation, comparison and databases for hundreds of bacterial genomes gendb -an open source genome annotation system for prokaryote genomes building a knowledge base for system pathology the pan-genome: towards a knowledge-based iscovery of novel targets for vaccines and antibacterials virhostnet: a knowledge base for the management and the analysis of proteome-wide virus-host interaction networks knowledge networks in the age of the semantic web bacterial pathogenomics complete genome sequence of a multiple drug resistant salmonella enterica serovar typhi ct18 genome sequence of yersinia pestis, the causative agent of plague genetics-squared: combining host and pathogen genetics in 
the analysis of innate immunity and bacterial virulence bioinformatics challenges of new sequencing technology exploring functional genomics for the development of novel intervention strategies against tuberculosis targettb: a target identification pipeline for mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis bacterial genomics and pathogen evolution tb database: an integrated platform for tuberculosis research evolutionary epidemiology 20 years on: challenges and prospects seeking a new biology through text mining genomics, system biology and drug development for infectious diseases clinical decision support and appropriateness of antimicrobial prescribing nucleotide sequence of bacteriophage x174 dna genomes, browsers and databases dengueinfo: a web portal to dengue information resources next-generation dna sequencing laboratory-guided detection of disease outbreaks: three generations of surveillance systems genomic profiling of pathogens for disease management and surveillance are we measuring the right thing? 
variables that affect the impact of computerized decision support on patient outcomes: a systematic review decision support systems for antibiotic prescribing towards bioinformatics assisted infectious disease control building an optiplante collaboratory to support microbial metagenomics biohealthbase: informatics support in the elucidation of influenza virus host-pathogen interactions and virulence host-pathogen interplay and the evolution of bacterial effectors information quality in proteomics automated bacterial genome analysis and annotation a probabilistic method for identifying start codons in bacterial genomes genome analysis of multiple pathogenic isolates of streptococcus agalactiae: implications for the microbial "pan-genome genotype-phenotype databases: challenges and solutions for the post-genomic era the human microbiome project e-predict: a computational strategy for species identification based on observed dna microarray hybridization patterns computing for comparative microbial genomics: bioinformatics for microbiologists shortgun metaproteomics of the human distal gut flora genomes and knowledge -a questionable relationship phi-base: a new database for pathogen host interactions discovery of virulence factors of pathogenic bacteria phidias: a pathogen-host interaction data integration and analysis system genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research infectious disease in the genomic era bioinformatics databases and tools in virology research: an overview primersnp: a web tool for whole-genome selection of allele-specific and common primers of phylogenetically-related bacterial genomic sequences real-time surveillance and decision support: optimizing infection control and antimicrobial choices at the point of care text-mining of pubmed abstracts by natural language processing to create a public knowledge base on molecular 
mechanisms of bacterial enteropathogens infectious disease informatics and outbreak detection

key: cord-013171-wgn529rc authors: zhong, yi; hu, zhengchao; wu, jingcui; dai, fan; lee, feng; xu, yangping title: stau1 selectively regulates the expression of inflammatory and immune response genes and alternative splicing of the nerve growth factor receptor signaling pathway date: 2020-09-16 journal: oncol rep doi: 10.3892/or.2020.7769 sha: doc_id: 13171 cord_uid: wgn529rc

double-stranded rna-binding protein staufen homolog 1 (stau1) is a highly conserved multifunctional double-stranded rna-binding protein, and is a key factor in neuronal differentiation. rna sequencing was used to compare the overall transcript levels of stau1-overexpressing cells and control cells, and to identify alternative splicing (as) events. it was determined that the high expression of stau1 led to changes in the expression levels of a variety of inflammatory and immune response genes, including ifit2, ifit3, oasl, and ccl2. furthermore, stau1 was revealed to exert a significant regulatory effect on the as of genes related to the ‘nerve growth factor receptor signaling pathway’. this pathway is of significant importance for neuronal survival, differentiation, growth, post-damage repair, and regeneration. in conclusion, overexpression of stau1 was associated with immune response and regulated as of pathways related to neuronal growth and repair. in the present study, the whole transcriptome associated with stau1 expression was analyzed for the first time, which laid a foundation for further understanding the key functions of stau1.

rna binding proteins (rbps) are an important family of proteins related to rna metabolism. they dynamically bind rna to form a variety of complexes, including ribonucleoprotein particles (rnps) (1).
rbps play a significant role in various cellular processes, including rna splicing, polyadenylation, rna editing, rna transport, maintenance of rna stability and degradation, intracellular localization, and translation control (2). recent studies have revealed that rbps are closely associated with muscular dystrophy, neurological disease, cancer, and mendelian disease (3, 4). staufen is a member of the double-stranded rna-binding protein family, which is involved in mrna transport and localization to different subcellular compartments (5). staufen proteins contain several double-stranded rna-binding domains (6). double-stranded rna-binding protein staufen homolog 1 (stau1) plays a crucial role in mrna output, re-localization, translation, and stau1-mediated mrna decay (smd) (7, 8). smd is also involved in developmental processes such as myogenesis and adipogenesis, and possibly in angiogenesis (9). stau1, by binding to the inverted repeat alu elements (iralus) on the 3'utr of mrna, inhibits retention of iralus-containing mrna in the nuclei, thereby enhancing its nuclear export. moreover, stau1 binding to the iralus on the 3'utr of mrna can promote translation of these mrnas by inhibiting the binding of protein kinase r (pkr) (10). in addition, stau1 is also an important mrna transport factor in neurons. after binding to the 3'utr of mrna, stau1 can induce translation-dependent mrna degradation through direct interaction with upf1. an upregulation of transcripts due to depletion of stau1 and upf1 may play a pivotal role in the differentiation process (11). stau1 expression can be used to regulate neuronal differentiation (12). studies have revealed that stau1 plays a substantial role in the immunity against the influenza virus and human immunodeficiency virus type 1 (hiv-1) (13-15). in order to study the possible biological functions of stau1, a stau1-regulated transcriptome in hela cells was obtained.
to determine the gene expression profile and the alternative splicing (as) events in the genome, which are regulated by stau1, high-throughput rna sequencing (rna-seq) was employed. in addition, the related results were validated in hela cells. previous research demonstrated that ptb knockdown converted highly transformed hela cells to neuronal-like cells. the authors extended this analysis to multiple cell types of diverse origin, including human embryonic carcinoma stem cells (nt2), mouse neural progenitor cells (n2a), human retinal epithelial cells (arpe19), and primary mouse embryonic fibroblasts (mefs). upon ptb knockdown, all of these cells exhibited a neuronal-like morphology (16). comparative transcriptome analysis revealed that stau1 can selectively regulate the expression of inflammatory genes, including interferon-induced protein with tetratricopeptide repeats 2 (ifit2), 2'-5'-oligoadenylate synthetase-like protein (oasl), and interferon-induced protein with tetratricopeptide repeats 3 (ifit3). furthermore, it was revealed that as of genes widespread in the 'nerve growth factor receptor signaling pathway', including plekhg2, arhgef11, nr4a1, pdgfb, fgfr4, and ralgds, was regulated by stau1. overexpression of stau1 was closely associated with inflammation. collectively, the present research defined a potential regulatory pattern in which as of inflammatory and immune response genes was regulated by stau1. it indicated that stau1 may be involved in the proliferation and survival of neurons (17), differentiation, cell growth and apoptosis (18), post-damage repair and regeneration, neurite outgrowth and retraction (19), and myelination (20).

cloning and construction of plasmids. ce design 1.04 software (vazyme biotech co., ltd.) was employed to design primers for hot fusion (21).
each primer contained a gene-specific sequence and a pires-hrgfp-1a vector (part no. 240031; agilent technologies, inc.) sequence (17-30 bp). specifically, the sequence before 'atg' in the forward (f)-primer or before 'agc' in the reverse (r)-primer matched the pires-hrgfp-1a vector, and the remainder belonged to the stau1 gene sequence. the primers were as follows: f-primer: agc ccg ggc gga tcc gaa ttcatg aaa ctt gga aaa aaa cca atg t and r-primer: gtc atc ctt gta gtc ctc gag agc acc tcc cac aca cag aca. the pires-hrgfp-1a vector was digested with ecori and xhoi (neb) for ~2-3 h at 37˚c, then subjected to agarose (1.0%) gel electrophoresis and purified using a qiagen spin-column-based kit (qiagen gmbh) according to the manufacturer's instructions. total rna was extracted from hela cells with trizol reagent (ambion) and used to synthesize cdna with an oligo dt primer. this cdna served as the template to amplify the insert fragment of the stau1 gene by polymerase chain reaction (pcr) using the primers above. the vector and insert fragment were combined in a microtube and ligated using the clonexpress ii one step cloning kit (vazyme biotech co., ltd.). the ligation product was transformed into e. coli dh5α, which was then plated on a luria-bertani (lb) agar plate containing 1 µl/ml ampicillin (sigma-aldrich; merck kgaa), followed by overnight incubation at 37˚c. colonies were screened by pcr with universal primers, using dna extracted from dh5α cells as the template and 2x green taq mix (vazyme biotech co., ltd.) as the dna polymerase. the sequences of the universal primers were as follows: forward (f)-primer: aat taa ccc tca cta aag gg and reverse (r)-primer: gtc ctt atc atc gtc gtc tt. pcr procedures were carried out as follows: the samples were first denatured at 95˚c for 5 min, with a further denaturation step at 95˚c for 3 min, followed by 28 cycles of annealing at 55˚c for 30 sec and extension at 72˚c for 1 min.
sanger sequencing (22) was used to verify the insert sequence. cell culture and transfections. human cervical carcinoma cell line, hela (cctcc no. gdc0009) was obtained from the china center for type culture collection. hela cells were cultured at 37˚c with 5% co 2 in dulbecco's modified eagle's medium (dmem; gibco; thermo fisher scientific, inc.) with 10% fetal bovine serum (fbs; ge healthcare), 100 µg/ml streptomycin (hyclone), and 100 u/ml penicillin (hyclone). plasmid (400 ng) (part no. 240031; agilent technologies, inc.) was transfected into hela cells using lipofectamine™ 2000 transfection reagent (cat. no. 11668019; invitrogen; thermo fisher scientific, inc.) according to the manufacturer's protocol. transfected cells were harvested after 48 h for reverse transcription-quantitative pcr (rt-qpcr) and western blot analysis. gene overexpression. stau1 overexpression was determined using glyceraldehyde-3-phosphate dehydrogenase (gapdh) as the control. cdna was synthesized according to the standard instructions followed by real-time quantitative pcr. rt-qpcr was conducted on the bestar sybr green rt-pcr master mix (dbi bioscience). the primers are presented in table si . the concentrations of the transcripts were then normalized to gapdh mrna levels using the 2 -δδcq method (23) . the data were analyzed by student's t-test using graphpad prism software (version 8.0; graphpad software, inc.). cell proliferation assay. cell proliferation was measured by the 3-(4, 5-dimethylthiazol-2-yl)-2, 5-diphenyltetrazolium bromide (mtt) assay. briefly, 1x10 3 cells were seeded in 96-well plates in triplicate. after 48 h of transfection, 5 mg/ml mtt solution was added into the cells. after 4 h of incubation at 37˚c, the mtt solution was removed. the insoluble mtt was dissolved in dmso. absorbance at 490 nm was measured using a benchmark plus microplate reader (bio-rad laboratories, inc.). rna isolation and high-throughput sequencing. 
four samples were prepared, namely, control cells and overexpression (oe)-stau1 cells, with two biological replicates each. hela cells were harvested prior to rna isolation. trizol reagent (ambion; thermo fisher scientific, inc.) was used to isolate total rna, which was then purified by phenol-chloroform and treated with rq1 dnase (promega corporation) to eliminate dna. the quantity and quality of the rna were verified using smartspec plus (bio-rad laboratories, inc.) to measure the absorbance at 260 and 280 nm. agarose (1.5%) gel electrophoresis was employed to assess rna integrity. rna-seq libraries were prepared from 1 µg total rna per sample using the vahts stranded mrna-seq library prep kit (cat. no. nr602; vazyme biotech co., ltd.). after polyadenylated mrnas were purified, they were fragmented and double-strand cdnas were produced. the double-strand cdnas were subjected to end repair, and poly(a) tails were added. they were then ligated to vahts rna adapters (vazyme biotech co., ltd.) and digested by heat-labile uracil-dna glycosylase (udg). before sequencing, the single-strand cdna was amplified, purified, and quantified; its quantity was redetermined by qpcr using agilent 2100 (agilent technologies, inc.), and the library was finally stored at -80˚c. the libraries were prepared according to the manufacturer's protocol and subjected to 150-nt paired-end sequencing on the illumina hiseq x ten system (illumina, inc.). ablife performed the sequencing using the sequencing kit provided by illumina, inc.
data processing and alignment. raw reads containing more than 2 n bases were first discarded. fastx-toolkit (ver. 0.0.13) was then used to trim the adaptors and low-quality bases from the raw sequencing reads, and reads shorter than 16 nt were removed. reads were also filtered for quality (fastq_quality_filter -q 25 -p 80) and against artifact sequences (fastx_artifacts_filter), and collapsed (fastx_collapse).
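The read-level criteria above can be illustrated with a minimal sketch. It is not the fastx-toolkit implementation: the function names are hypothetical, quality strings are assumed to be Phred+33 encoded, and adaptor and low-quality trimming is assumed to have been done already. The `-q 25 -p 80` setting is read here as "at least 80% of bases have a quality score of 25 or higher".

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def passes_filters(seq, quals, max_n=2, min_len=16, min_q=25, min_pct=80):
    """Return True if a trimmed read survives the three filters in the text.

    max_n   : reads with more than this many N bases are discarded
    min_len : reads shorter than this many nt are discarded
    min_q / min_pct : at least min_pct percent of bases must have quality >= min_q
    """
    if seq.upper().count("N") > max_n:
        return False
    if len(seq) < min_len:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return 100.0 * good / len(quals) >= min_pct


# Hypothetical reads for illustration only.
read_ok = ("ACGT" * 5, phred33("I" * 20))                # 20 nt, all Q40
read_low_q = ("ACGT" * 5, phred33("I" * 15 + "#" * 5))   # only 75% of bases >= Q25
print(passes_filters(*read_ok), passes_filters(*read_low_q))
```

Keeping the thresholds as keyword arguments makes it easy to see how the retained read set would change under different cut-offs.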
base quality q30 was used to indicate the proportion of bases with a sequencing error rate <0.1%. rna-seq data presented in this study have been deposited in the gene expression omnibus of ncbi and are accessible through geo series accession number gse136890. clean reads were mapped to the grch38 genome using tophat v.2.0.10 (24), allowing four mismatches. uniquely mapped reads were used for gene read counting and for the calculation of fragments per kilobase of transcript per million fragments mapped (fpkm) (25), and the pearson correlation coefficient (pcc) between samples was calculated for cluster analysis.
analysis of differentially expressed genes (degs). the r bioconductor package edger (26) was utilized to screen the degs. a false-discovery rate (fdr)<0.05 and fold-change (fc)>2 or <0.5 were set as the cut-off points for identifying degs. next, a volcano plot was drawn to reveal the number of degs.
as analysis. by applying the ablas algorithm (27, 28), regulated differential splicing events (rdses) and differential splicing events (dses) were identified. briefly, 10 types of dses were detected by ablas on the basis of splice junctions directly extracted from mapping reads, including exon skipping (es), alternative 5' splice site (a5ss), alternative 3' splice site (a3ss), intron retention (ir), mutually exclusive exons (mxe), mutually exclusive 5'utrs (5pmxe), mutually exclusive 3'utrs (3pmxe), cassette exon, a3ss and es, and a5ss and es. sashimi plots generated with igv tools were used for as analysis. to assess rbp-regulated dses, student's t-test was performed to evaluate the significance of as events. events that were significant at a p-value cut-off of 5% were considered rbp-regulated dses. regulated alternative splicing events (rases) between the samples were defined and quantified using the ablas pipeline. we also analyzed the genes overlapping between the degs and the genes with rases, and a venn diagram was drawn.
validation of as events and degs by rt-qpcr.
in order to validate the rna-seq data, rt-qpcr was performed. the primers used for rt-qpcr are presented in table si. rna was reverse-transcribed into cdna using m-mlv reverse transcriptase (vazyme biotech co., ltd.). rt-qpcr was performed on a stepone real-time pcr system using the sybr-green pcr reagent kit (yeasen biotechnology co., ltd.). pcr procedures were carried out as follows: the samples were first denatured at 95˚c for 10 min, followed by 40 cycles of denaturation at 95˚c for 15 sec and annealing and extension at 60˚c for 1 min. the procedures were repeated three times for all samples. the rna expression levels of all the degs were normalized against those of gapdh. in addition, an rt-qpcr assay was undertaken for dse validation. the primers used for detecting dses are presented in table si. to detect alternative isoforms, a boundary-spanning primer of constitutive and alternative exons was used, as well as an opposing primer in one constitutive exon. the boundary-spanning primer of the alternative exon was designed according to a 'model exon' to detect model splicing, or an 'altered exon' to detect altered splicing.
functional enrichment analysis. to sort out functional categories of degs, the kobas 2.0 server (29) was employed to identify kyoto encyclopedia of genes and genomes (kegg) pathways and gene ontology (go) terms. enrichment was assessed using the hypergeometric test followed by benjamini-hochberg fdr adjustment for p-values. the heatmap was constructed by calculating the pearson correlation coefficient (pcc) of the degs.
overexpression of flag-tagged stau1 promotes cell proliferation. in order to analyze the expression of stau1-expressing vectors and its influence on the proliferation of human hela cells, stau1-transfected cells with a gfp label and a flag label (fused with the target gene) were successfully constructed. in addition, cells transfected with blank controls were considered as the negative control.
after 48 h of transfection, the expression level of gfp was detected to indicate whether the transfection was successful. the cells were harvested and detected by rt-qpcr and western blot assays. the results indicated a significant overexpression after cell transfection with stau1-expressing vectors (fig. 1a) . thereafter, proliferation of stau1-transfected cells was detected by mtt assay. compared with the control group, overexpression of stau1 promoted cell proliferation (fig. 1b) . rna-seq profiling of the transcriptional response to stau1 overexpression. in order to assess stau1-mediated transcriptional regulation in human hela cells, four groups of samples were prepared, namely, control cells and overexpression (oe)-stau1 cells (two biological replicates). total rna extraction was carried out for the 4 groups of samples, and the cdna library was prepared. then, the library was subjected to paired-end sequencing on illumina hiseq x ten to extract high-quality transcriptome data. quality analysis of clean reads indicated that the mean q30 quality score was 93.93% (30) . next, high-quality clean reads were aligned against grch38 human reference genome using tophat2 software (table i) . rna-seq data were analyzed and the expression levels of stau1 were quantified, which further demonstrated overexpression of stau1 (fig. 1c ). correlation analysis was undertaken to determine the variability in the gene expression level between each pair of the samples. moreover, cluster analysis was performed between the samples (fig. 1d ). as revealed in fig. 1d , there was a correlation between stau1 oe cells and control cells; there was also a significant correlation between the biological replicates. with analysis of degs among samples, criteria for significant difference were set to fc≥2 or ≤0.5 and fdr<0.05. a volcano plot was drawn and the results revealed that 801 significant degs were related to stau1 overexpression. 
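The DEG criteria stated above (fc≥2 or ≤0.5 with fdr<0.05) can be sketched as a small classifier. This is an illustration only, not the edger workflow used in the study: the function name, the gene labels, and the fold-change and fdr values below are all hypothetical.

```python
from collections import Counter


def classify_deg(fold_change, fdr, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant).

    fold_change is the linear OE/control ratio, so the downregulation
    cut-off is the reciprocal of fc_cutoff (0.5 for a 2-fold change).
    """
    if fdr >= fdr_cutoff:
        return "ns"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= 1.0 / fc_cutoff:
        return "down"
    return "ns"


# Hypothetical (fold_change, fdr) pairs; not values from the study.
toy_results = {
    "gene_a": (4.0, 0.001),   # large change, significant
    "gene_b": (0.25, 0.010),  # strong decrease, significant
    "gene_c": (3.0, 0.200),   # large change but fails the fdr cut-off
    "gene_d": (1.2, 0.001),   # significant but below the fold-change cut-off
}
labels = {gene: classify_deg(fc, fdr) for gene, (fc, fdr) in toy_results.items()}
print(Counter(labels.values()))
```

Applying both cut-offs together is what separates the coloured points from the grey ones in a volcano plot: a gene must clear the fold-change threshold and the fdr threshold at the same time.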
among them, 423 upregulated genes and 378 downregulated genes were identified (fig. 1e). this indicated that stau1 plays an extensive transcriptional regulatory role in hela cells. a heatmap of the expression levels of degs was plotted. the upregulated genes were represented in the red region of the experimental group, and downregulated ones in the red region of the control group. the results revealed a high level of consistency in stau1-mediated transcription between the two groups of data (fig. 2a). the functions and potential biological roles of these degs were further analyzed. go enrichment and kegg pathway analyses were conducted for the upregulated and downregulated genes separately, and the top 10 go terms were presented. a p<0.05 is generally taken to indicate significant enrichment; thus, this threshold was defined as the cut-off. go functions were divided into three categories, namely, molecular function, biological process, and cellular component. as revealed in fig. 2b, genes regulated by stau1 overexpression were primarily enriched in pathways related to inflammation and immune response. the upregulated genes in the oe-stau1 cells were enriched in 'defense response to virus', 'cytokine-mediated signaling pathways', 'transport', 'signal transduction' and 'synaptic transmission'; the downregulated genes were enriched in 'signal transduction', 'transmembrane transport', 'inflammatory response', and 'innate immune response'. for kegg pathway analysis, the cut-off was also set to p<0.05. the results revealed that the genes were enriched in a variety of pathways related to the immune system, inflammatory response, and nervous system (fig. 2c); the upregulated genes were primarily enriched in 'abc transporters' and 'tnf signaling pathway'.
the downregulated genes were mainly enriched in 'systemic lupus erythematosus', 'rheumatoid arthritis', 'nod-like receptor signaling pathway', 'tnf signaling pathway', 'serotonergic synapse', and 'nf-κb signaling pathway'. then, among all the degs, 28 genes related to cytokine-mediated signaling, inflammatory response, and immune response were selected and presented in the degs-based hierarchical clustering plot (fig. 2d). among them, there were 15 upregulated genes and 13 downregulated genes. herein, 3 upregulated genes (ifit2, oasl, and ifit3) and 1 downregulated gene [chemokine (c-c motif) ligand 2 (ccl2)] were analyzed in detail in terms of coverage and distribution of reads (fig. 2e). the distribution of reads reflected the relative location of genes and the relative read abundance, which further demonstrated differential expression between control cells and stau1 oe cells. therefore, it could be concluded that stau1 selectively regulated genes related to inflammation and immune responses. fig. 3a . these genes were enriched in the cytokine-mediated signaling pathway, inflammatory response or other biological processes in the go database. to verify the reliability of rna sequencing, qpcr was conducted for hela cells. it was revealed that ifit2, oasl, and ifit3 were significantly upregulated, while ccl2 was significantly downregulated (fig. 3b), which was consistent with the results of rna-seq analysis. the results of qpcr of the 3 genes (ifi27, s1pr4, ccl5) in fig. s1 were also consistent with rna-seq analysis (data not shown), with the non-deg cd44 as the control gene.
stau1 regulates the as of genes enriched in the 'nerve growth factor receptor signaling pathway'. regulated as events (rases) of stau1 in human hela cells were further analyzed. for rase analysis, the uniquely mapped reads of every sample aligned to the reference genome were used. the results of exon detection in 4 samples are presented in table ⅱ.
a total of 237,791 detected exons were achieved, accounting for 64.74% of all annotated exons in the reference genome. splice junctions of each sample were then analyzed by using tophat2 software, and 160,308 known splice junctions (known_junction) and 163,225 novel splice junctions (novel_junction) were obtained (table iii) . various rases were statistically analyzed using ablas, and the detection results in each sample are presented in table ⅳ . there were 76,259 detected rases, including 19,746 annotated rases in the genome (known as) and 56,513 non-annotated novel rases (novel as). due to the set-up of biological replicates, a student's t-test was used to compare the variation of as levels of each gene between two samples, and the criterion for significantly different as was set to p≤0.05. as revealed in fig. 4a , a total of 549 rases were detected, which are presented in table ⅴ . among them, the major types of rases included 102 a3ss, 91 a5ss, 78 es, and 63 cassette exons. this indicated that stau1 had a retaining and promoting effect on exons in the entire genome. number and types of other differential rases were as follows: 24 5pmxe, 10 3pmxe, 10 a3ss and es, 16 mxe, and 15 a5ss and es. the aforementioned results indicated that stau1 could regulate as in the genome of hela cells. an integrated analysis was performed for differentially regulated alternatively spliced genes (rasgs) and degs in different samples. there were 2 genes with significant difference in terms of both the expression level and as level (fig. 4b) . similarly, go enrichment and kegg pathway analyses were undertaken on the differential rasgs, and the top 10 terms are presented in fig. 4c and d. go terms and kegg pathways, in which rasgs were enriched are presented in table sii and siii, respectively. 
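The replicate-based significance criterion for rases described above (student's t-test, p≤0.05, two biological replicates per group) can be sketched as follows. This is a hedged illustration rather than the ablas implementation: the function name and the splicing-ratio values are hypothetical. With two replicates per group the test has 2 degrees of freedom, for which the t-distribution has the closed-form cdf F(t) = 1/2 + t / (2·sqrt(t² + 2)), so no external statistics library is needed.

```python
import math
from statistics import mean, variance


def t_test_two_replicates(group_a, group_b):
    """Two-sided Student's t-test (pooled variance) for 2 vs 2 replicates.

    Returns (t, p). Only valid for exactly two values per group, because
    the p-value uses the closed-form t-distribution cdf for df = 2.
    """
    if len(group_a) != 2 or len(group_b) != 2:
        raise ValueError("this sketch assumes exactly two replicates per group")
    df = 2
    pooled_var = (variance(group_a) + variance(group_b)) / 2.0
    se = math.sqrt(pooled_var * (1.0 / 2 + 1.0 / 2))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (mean(group_a) - mean(group_b)) / se
    p = 1.0 - abs(t) / math.sqrt(t * t + df)
    return t, p


# Hypothetical splicing ratios for one event (control vs oe-stau1 replicates).
t_stat, p_value = t_test_two_replicates([0.80, 0.84], [0.40, 0.44])
print(round(t_stat, 2), p_value <= 0.05)
```

With only two replicates per group, an event reaches p≤0.05 only when the between-group difference is large relative to the replicate spread, which is worth keeping in mind when interpreting the number of significant events.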
it was revealed that the genes whose as level was regulated by stau1 were mainly enriched in 'retrograde transport, endosome to golgi', 'muscle cell differentiation', and other reported stau1-related pathways. the go term ranked 16th was 'nerve growth factor receptor signaling pathway' (p≈0.01). based on table sii, 6 key genes were selected for the detection of rases of the following types: es, a5ss, 5pmxe, and ir. according to the results of rna-seq (fig. 5 and fig. s2), the number of reads sequenced for each gene was over 10. then, for each gene, the number of reads for as between stau1 oe cells and control cells was compared, and a significant difference in the as levels was noted. to verify the reliability of the results, qpcr was performed in the hela cells. primers for the qpcr verification are presented in table si. rases detected by qpcr were consistent with those by rna-seq, which demonstrated that stau1 may play a significant regulatory role in the as of the 'nerve growth factor receptor signaling pathway'.
rna-seq based on high-throughput sequencing is currently the most widely used transcriptome sequencing technology, which can promptly extract all the genetic information of the samples. rna-seq has become one of the most representative high-throughput sequence-based techniques due to its high throughput, high accuracy, and cost-effectiveness. it can be used to study the structure and function of genes, identify changes in gene expression, and explore regulated as patterns (34). in the field of life sciences, this method has been used to explore the pathogenesis of diseases, clinical diagnosis, and pharmacological research (35, 36). in the present study, it was revealed that overexpression of stau1 promoted the proliferation of hela cells, which are useful for the study of gene regulation in the central nervous system, while the proliferation of neurons and glial cells in the central nervous system plays an important role in neuropathic pain (37).
upregulation of stau1 caused upregulation or downregulation of numerous genes, including ifit2, ifit3, oasl and ccl2. through functional analysis, changes in the expression levels of these genes may affect signaling pathways, such as 'defense response to virus', 'cytokine-mediated signaling pathway', and 'inflammatory response', which are closely associated with inflammatory immune response. in addition, the as of multiple genes was also regulated by stau1, and the main enriched pathways not only include 'retrograde transport' and 'muscle cell differentiation', but also the 'nerve growth factor receptor signaling pathway'. significant upregulation of ifit2, ifit3, and oasl genes was consistently indicated by rna-seq and qpcr of hela cells. the ifit family performs multiple functions, including antitumor effects and regulation of cell apoptosis and innate immune pathways (38, 39) . it can also inhibit replication of flavivirus and coronavirus (40) . siegfried et al stimulated wild-type bone marrow-derived macrophages (bmdms), ifit2 -/-, and ifnar -/-bmdms with lps, respectively. results of enzyme-linked immunosorbent assay (elisa) test indicated that the mutant bmdms exhibited a significant reduction in the expression levels of tnf-α and interleukin-6 (il-6) than the wild-type bmdms. furthermore, shrna interference targeting ifit2 was performed in raw264.7 macrophages. it was revealed that tnf-α and il-6 were also downregulated, suggesting a pro-inflammatory role of ifit2 (41) . berchtold et al demonstrated that overexpression of ifit2 decreased the secretion of tnf-α in the raw264.7 cells (42). niess et al compared highly metastatic l3.6pl pancreatic tumor cells and lowly metastatic col0357fg pancreatic tumor cells. it was determined that upregulation of ifit3 promoted synthesis and secretion of il-6 (43). 
liu et al revealed that the expression of exogenous ifit3 enhanced the inducing effect of nf-κb on tnf-α, without influencing tnf-α-mediated activation of nf-κb (44) . furthermore, oasl is an interferon-stimulated gene (isg), playing a significant role in the immune response to viruses (45) . activation of oasl can be induced by interferon (ifn). the expression of oasl can further stimulate the production of ifn, thereby forming a positive feedback (46) . ifn-γ has a neuroprotective effect, and significantly promotes secretion of il-6 in astrocytes (47) . inflammatory cytokines (tnf-α and il-6) are important molecules, mediating enhancement of hyperalgesia via increasing glutamic acid-induced excitatory current, thereby promoting the development of pain (48) . excitatory synaptic transmission is mainly regulated by ampa and nmda receptors. inflammatory factors enhance their degree of excitation, promote the release of excitatory mediators, such as glutamic acid and substance p, and participate in the regulation of various pain signaling pathways (49, 50) . therefore, tnf-α and il-6 play a vital role in np. in the present study it was also revealed that ccl2 was markedly downregulated. ccl2, also known as monocyte chemotactic protein-1 (mcp-1), can activate monocytes in the inflammatory state, induce leukocyte migration reaction, regulate t-cell function, and participate in inflammation and immune response (51) . recently, it has been revealed that ccl2 is highly expressed in drg neurons and spinal dorsal horn surface neurons during peripheral nerve injury (52) . ccl2 is released in an activity-dependent manner from the synaptic vesicles in the central nervous system into the spinal cord (52, 53) . the ccl2 expression in the spinal cord is not limited to neurons. after spinal nerve ligation, astrocytes can also upregulate ccl2. 
additionally, the in vitro cultured astrocytes exhibited an upregulation of ccl2 by over 100-fold, which was rapidly released in a jnk-dependent manner (54) . ccl2 secreted by astrocytes acts on ccr2 in the dorsal horn neurons. ccl2 can strengthen the release of glutamic acid from the injured neuronal presynaptic membrane and promote the function of glutamic acid receptors in the postsynaptic membrane. this inhibits gaba-induced inhibitory synaptic transmission, while causing rapid phosphorylation of eukaryotic protein kinase (epk) and activation of nmda receptors. as a result, central sensitization is induced in a direct, rapid and non-transcriptional manner (55) . another study shows that mcp-1 and its receptor ccr2 in primary sensory neurons are involved in maintaining paclitaxel induced peripheral neuropathy (56) . therefore, overexpression or depletion of ccl2 and ccr2 has a direct influence on np. the regulatory role of stau1 overexpression on rase of hela cells was further studied. a total of 549 significantly differential rases were identified, which verified our speculation that stau1 can globally regulate the as events in the genome in hela cells. pathways in which the differential rasgs were enriched have been previously aforementioned. plekhg2 and arhgef11 are both rho guanine nucleotide exchange factors and activators of rho gtpases. a variety of biological effects can be regulated by rho gtpases, such as transmembrane transport, cell migration, adhesion, and proliferation (57) . moreover, rho gtpases can participate in the immune response by regulating the rho/rock signaling pathway (58, 59) . another study demonstrated that plekhg2/ flj 00018 can regulate the morphology of neuro-2a cells, thereby playing a significant role in nerve growth and cell proliferation (60) . arhgef11 is involved in the regulation of axonal growth by regulating the activity of rhoa (61) . 
ralgds is one of the ras effectors and functions as a guanine nucleotide exchange factor for the small g-protein, ral, which regulates membrane trafficking and cytoskeletal remodeling (62) . notably, ralgds has been revealed to promote neuronal differentiation (63) and exert a key regulatory effect on neuronal plasticity and memory formation (64) . ralgds has also been revealed to mediate cytoskeletal remodeling (65) , promote cell proliferation (66) , and facilitate oncogenic transformation (67) . rondaij et al revealed that ralgds overexpression was conducive to promote the exocytosis of endothelial weibel-palade bodies (wpbs) (68) . the proteins encoded by the fgfr1 and fgfr4 genes are all members of the fibroblast growth factor receptor (fgfr) family (69) . they trigger the downstream cascade by binding to fgfrs, thereby playing a substantial role in promoting embryonic growth and development (69) , development of the nervous system (70) , and regulating the metabolism (71) . other biological functions of the proteins are manifested in promoting injury repair (72) , bone formation (73) , and vascular and neural regeneration (74, 75) . fgfr overexpression has also been revealed to be associated with tumor and bone diseases, as well as arthritis (76) (77) (78) . all of the aforementioned genes were subjected to as analysis, and the results were validated by qpcr, which indicated consistency with rna-seq except for the qpcr result for the pdgfb gene that was inconsistent with that of rna-seq. it is already known that pdgfb plays a significant role in the growth and proliferation of vessels and nerves (79) . herein, we further discussed the as events induced by pdgfb and the resultant alterations in gene functions as an example. platelet-derived growth factor (pdgf) is an important factor promoting cell growth. 
it consists of five homotypic or heterotypic dimerized ligands (pdgf-aa, -ab, -bb, -cc, -dd), which are formed via disulfide bonds by polypeptide chains (pdgf-a, pdgf-b, pdgf-c, pdgf-d) encoded by four different genes (80). both of its receptors, pdgfr-α and pdgfr-β, belong to the receptor tyrosine kinase (rtk) family (81). it has been demonstrated that pdgf can stimulate the division and growth of fibroblasts (82), neuroglial cells (83) and smooth muscle cells (84). in particular, pdgf-bb has been revealed to promote neuronal development and differentiation (85), and to play a neurotrophic role as well (86, 87). a number of scholars have demonstrated that pdgf-bb can regulate neuronal proliferation and differentiation by activating the pi3k/akt and erk pathways (88), restore the proliferation and differentiation of damaged neural precursor cells, and reverse neuronal excitotoxicity. pdgfb has been demonstrated to play an important role in neuropathic pain (89, 90), and in the present study it was confirmed that stau1 regulated the alternative splicing of genes enriched in the 'nerve growth factor receptor signaling pathway', including pdgfb, and thus is also associated with neuropathic pain. the present study confirmed that the pdgfb gene undergoes a5ss events. consequently, the normal secretion of pdgf-b protein into the extracellular space, where it binds to the pdgfr-α and -β subunits to fulfill its biological effect, is affected. the qpcr results were consistent with our theory that stau1 promotes the retention of the pdgfb signal peptide, which mediates the neuroprotective mechanism and relieves neuropathic pain. we surmised that an even more complex regulatory mechanism is involved herein, and further studies should be conducted to confirm our findings.
the present study revealed that overexpression of stau1 had a regulatory effect on gene splicing and transcription in hela cells. stau1 could positively regulate the transcription of genes related to inflammation and immune response. this regulatory effect also influenced the expression levels of pro-inflammatory factors and chemotactic factors. moreover, as of genes enriched in the 'nerve growth factor receptor signaling pathway' as well as 'retrograde transport, endosome to golgi', and 'muscle cell differentiation' was regulated by stau1. a recent study demonstrated that several rna binding factors involved in local translation may play a crucial role in pain, including stau1, a double-stranded dsrna binding protein, which is expressed in peripheral sensory neurons and may play a role in axonal mrna transport (91) . therefore, it can be concluded that stau1 may be a novel potential therapeutic target for np. rna-binding proteins in mendelian disease rna-binding proteins: modular design for efficient function defining the rgg/rg motif a brave new world of rna-binding proteins a genome-wide approach identifies distinct but overlapping subsets of cellular mrnas associated with staufen1-and staufen2-containing ribonucleoprotein complexes staufen1 regulates multiple alternative splicing events either positively or negatively in dm1 indicating its role as a disease modifier lncrnas transactivate stau1-mediated mrna decay by duplexing with 3'utrs via alu elements control of somatic tissue differentiation by the long non-coding rna tincr kretz m: tincr, staufen1, and cellular differentiation stau1 binding 3' utr iralus complements nuclear retention to protect cells from pkr-mediated translational shutdown the ejc factor eif4aiii modulates synaptic strength and neuronal protein expression lin28b and mir-142-3p regulate neuronal differentiation by modulating staufen1 expression human staufen1 protein interacts with influenza virus ribonucleoproteins and is required for 
this work is licensed under a creative commons attribution-noncommercial
we deeply thank dr wen chen for his significant technical and scientific comments. the present study was financially supported by the natural science foundation of hubei province, china (grant no. 2011cdc032). this study was partially supported by ablife (experimental project no. abl-7702121). 
the data discussed in this publication are available under geo series accession number gse136890. yz performed the experiments and data analysis and was the main author of the manuscript. zh participated in the experimental design, carried out bioinformatics analysis, and partly contributed to the writing of the manuscript. jw and fd conducted the cell experiments and interpretation of the data. fl and yx designed the project and contributed to the analysis and interpretation of the data and to the revision of the manuscript. all the authors read and approved the submitted final version of the manuscript. not applicable. not applicable. the authors declare that they have no competing interests.
key: cord-015935-r2wd1yfa authors: sokol, deborah k.; lahiri, debomoy k. title: the genetics of autism date: 2011-02-10 journal: international handbook of autism and pervasive developmental disorders doi: 10.1007/978-1-4419-8065-6_6 sha: doc_id: 15935 cord_uid: r2wd1yfa this chapter is written to make the fast-paced, expanding field of the genetics of autism accessible to those practitioners who help children with autism. new genetic knowledge and technology have quickly developed over the past 30 years, particularly within the past decade, and have made many optimistic about our ability to explain autism. these advances include the sequencing of the human genome (lander et al., 2001), the identification of common genetic variants via the hapmap project (international hapmap consortium, 2005), and the development of cost-efficient genotyping and analysis technologies (losh, sullivan, trembath, & piven, 2008). improvement in technology has led to improved visualization of chromosomal abnormality down to the molecular level. the four most common syndromes associated with autism are fragile x syndrome (fxs), tuberous sclerosis (tsc), 15q duplications, and untreated phenylketonuria (pku; costa e silva, 2008). fxs and 15q duplications are discussed within the context of cytogenetics. 
tsc is illustrated within the description of linkage analysis. an affected sibling-pair design in multiplex (having more than one affected member) families. this led to the identification of chromosomal abnormalities in such conditions as neurofibromatosis, tuberous sclerosis, and dyslexia (smith, 2007). improving technology in the 1990s enabled the detection of small genomic alterations of 50-100 kb and the direct visualization of these alterations in uncultured cells via fluorescent in situ hybridization (fish). this technique ushered in the field of molecular genetics (li & andersson, 2009) and allowed the identification of chromosomal microdeletions and duplications in areas of the chromosome where there is already high suspicion that abnormality would exist. fish enables prenatal and cancer genetics screening and has led to the identification of genetic aberrations associated with angelman's and prader willi syndromes. in the past decade, microarray cytogenetics has permitted the study of the entire genome on a single chip with resolution as fine as a few hundred base pairs (li & andersson, 2009). such microarray technology represents a union between molecular genetics and classical cytogenetics. two types of microarray technology are used clinically: comparative genomic hybridization (cgh) and single-nucleotide polymorphism (snp) analysis. cgh directly measures copy number differences between a patient's dna and a normal reference dna spanning known genes, chromosomal regions, or the entire genome. snp analysis, on the other hand, provides identification of a single point mutation via sequencing the gene in areas of suspected abnormality, previously identified via cgh or fish. another offshoot of microarray technology is submicroscopic chromosome copy number variation (cnv) analysis, in which deletions or duplications involving more than 1 kb of dna have been detected in patients with mental retardation, autism, and multiple congenital anomalies. 
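the copy number comparison that cgh performs can be illustrated with a minimal sketch. array cgh results are commonly summarized as the log2 ratio of patient signal to reference signal at each probe; in an idealized diploid genome a single-copy duplication gives log2(3/2), about 0.58, and a single-copy deletion gives log2(1/2) = -1. the function names and the ±0.3 calling thresholds below are assumptions chosen for illustration only and are not taken from the chapter or from any particular platform.

```python
import math

# hypothetical calling thresholds; real pipelines calibrate these per platform
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def log2_ratio(patient_signal, reference_signal):
    """log2 of patient vs. reference hybridization signal for one probe."""
    return math.log2(patient_signal / reference_signal)


def call_copy_number(ratio):
    """label a probe as a gain, a loss, or normal from its log2 ratio."""
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "normal"


# idealized copy numbers: patient copies vs. two reference copies
examples = [("duplication", 3.0, 2.0), ("deletion", 1.0, 2.0), ("no change", 2.0, 2.0)]
for label, patient, reference in examples:
    ratio = log2_ratio(patient, reference)
    print(label, round(ratio, 2), call_copy_number(ratio))
```

real data are noisy, so in practice calls are made on runs of adjacent probes rather than on single probes as in this sketch.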
there are several recent, detailed reviews of the genetics of autism (abrahams & geschwind, 2008; li & andersson, 2009; losh et al., 2008; o'roak & state, 2008) and this chapter summarizes those reviews. the reader is encouraged to first review the overview of gene expression (box 6.1) and essential nomenclature used in genetics (box 6.2). in addition, one must understand basic concepts pertinent to brain development (box 6.3) and to the neurobiology of autism (box 6.4) in order to understand its genetics. for example, there is a compelling rationale that genetically directed mechanisms that regulate the assembly of the brain during embryogenesis, when gone awry, may cause autism (costa e silva, 2008).
box 6.1 overview of dna replication and gene expression *
genes. a gene is a unit of hereditary material located in a specific place (locus) on a chromosome. most genes carry the instructions for producing a specific protein. a protein-coding human gene will have a coding sequence, which contains the information for protein production. the coding sequence may be broken up into several sub-sequences, or exons, which are separated by noncoding dna sequences, or introns. the coding sequence is flanked by a promoter and (usually) a 5′-untranslated region (5′-utr) that occur immediately before the coding sequence and by a 3′-utr and a terminator that occur immediately after the coding sequence. promoters, terminators, utrs, and introns can regulate the activity of a given gene. at a molecular level, chromosomes are composed of long chains of deoxyribonucleic acid (dna) upon which genes are linearly arranged. the dna exists in a double helix and the helical strands are usually wrapped around histone proteins. in addition to dna and histones, chromosomes contain scaffolding proteins that provide structural support and may also participate in gene regulation. dna is a molecule composed of a chain of four different types of nucleotides. 
it is the sequence of these nucleotides that is the genetic information that organisms inherit. dna is double stranded with nucleotides on each strand complementary to each other. each strand acts as a template for creating a new partner strand. this is the physical method for making copies of genes that are inherited. both individual dna nucleotides and dna-associated proteins can be chemically modified to produce epigenetic variation that may be in response to environmental conditions and might or might not be inherited in any individual case. epigenetic variation can significantly alter the expression level of a gene without changing its basic genetic code.
dna replication. during dna replication, a dna strand is produced from component nucleotides using its partner strand as a template. this is done by dna polymerases that "proofread" to ensure accuracy and repair when necessary. despite these safeguards, errors called mutations can occur during the polymerization of this second strand. a mutation is any nucleotide sequence in a dna molecule that does not match the original dna molecule from which it was copied. radiation (such as ultraviolet rays and x-rays), viruses, and errors that occur during meiosis or during dna replication can cause errors in dna molecules. these mutations can have an impact on the phenotype of an organism, especially when they occur within the protein-coding sequence of the gene. kinds of mutations include the following:
frameshift mutation. one or more bases (in a number that is not a multiple of three) are inserted or deleted. this can alter the entire downstream amino acid sequence of a protein or introduce aberrant rna splicing.
deletion. missing dna sequences can range from a single base pair to longer deletions involving many genes in the chromosome.
insertion. additional nucleotides are inserted in the dna sequence and can result in a nonfunctional protein.
duplication (or gene amplification). multiple copies of a complete gene or genes are present, increasing the dosage of genes located within chromosomes.
inversion. the dna sequence of nucleotides is reversed, either among a few bases within one gene or among longer dna sequences containing several genes.
point mutation. also known as a substitution, this is the replacement of a single nucleotide with another nucleotide.
missense mutation. a point mutation where a single nucleotide is changed to cause substitution of a different amino acid.
nonsense mutation. a point mutation that results in a premature stop codon in the transcribed mrna and possibly a nonfunctional protein product.
rna splicing mutation. a point mutation that inserts or deletes an intron/exon border necessary for splicing of heterogeneous nuclear rna to mrna.
silent mutation. a point mutation that does not alter the amino acid encoded by that codon.
gene expression is the process in which genes express their functional effect through the production of proteins which are responsible for most functions of the cell. the dna sequence of a gene is used to produce a specific protein through a ribonucleic acid (rna) intermediate. gene dna is the map which directs the production of the protein, normally following a one-way pathway: dna → (transcription) → hnrna → (splicing) → mrna → (translation) → protein. gene regulation gives the cell control over structure and function and is the basis for cellular differentiation and adaptability of any organism. gene regulation is controlled by noncoding dna sequences, such as the promoter, terminator, or introns, and by epigenetic variation, such as differences in dna methylation and histone acetylation. this entire process takes place within the cell in the vicinity of the nucleus, where dna is housed and hnrna is spliced, and in the cytoplasm, where ribosomes are located along the rough endoplasmic reticulum. some proteins are further processed in the golgi apparatus if they are to be exported from the cell. 
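the three point-mutation outcomes listed in box 6.1 (missense, nonsense, and silent) can be made concrete with a short sketch that compares the amino acid encoded before and after a single-base change. the function name `classify_point_mutation` is hypothetical, the codon table is the standard genetic code written for dna codons, and the reference codon is assumed to code for an amino acid (not a stop codon); this is an illustration rather than a clinical variant-calling method.

```python
# standard genetic code for dna codons, bases ordered t, c, a, g; "*" marks stop
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(ref_codon, alt_codon):
    """classify a single-base substitution within one coding codon."""
    differences = sum(x != y for x, y in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"      # same amino acid is encoded
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # a different amino acid is encoded


print(classify_point_mutation("GAG", "GTG"))  # glutamate -> valine
print(classify_point_mutation("GAG", "TAG"))  # glutamate -> stop
print(classify_point_mutation("GAG", "GAA"))  # glutamate -> glutamate
```

frameshift and splicing mutations are deliberately outside the scope of this sketch because they cannot be judged from a single codon.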
the correspondence between nucleotide and amino acid sequence is known as the genetic code.
rna transcription. in transcription, dna is used as a blueprint for the production of rna. transcription is performed by rna polymerase, which adds one ribonucleotide at a time to a growing rna strand, complementary to the dna nucleotide being transcribed. this product is heterogeneous nuclear rna (hnrna).
rna splicing. in splicing, hnrna is processed within the nucleus to remove introns, which may function to partially regulate transcription, and to join exons, which code for the actual protein product. the product of splicing is mrna.
translation. translation is the production of proteins by decoding the mrna that was produced by transcription and splicing. the mrna sequence is the template that guides the production of a chain of amino acids that form a protein.
decade, this technique enables examination of the whole human genome on a single chip with a resolution as high as a few hundred base pairs. this resolution is at least 10-fold greater than the best prometaphase chromosomal analysis, and it is therefore the most sensitive whole-genome screen for deletions and duplications.
penetrance. the proportion of individuals carrying a particular variation of a gene that also expresses an associated trait. for example, if a mutation has 95% penetrance, then 95% of those with the mutation will develop the syndrome, while 5% will not.
is expressed along a continuous phenotype, whose extreme end may produce the disorder such as hypertension, reading disability, or attention-deficit disorder with hyperactivity (adhd).
recessive genetic disorders. a number of genetic disorders are caused when an individual inherits two recessive alleles for a single gene trait, such as with cystic fibrosis, pku, or albinism. recessive alleles can be located on the x chromosome so that males are more affected than females (hemizygous), such as in fragile x. 
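the transcription, splicing, and translation steps described in box 6.1 can be traced end to end on a toy gene. everything in this sketch is an illustrative assumption: the 22-base "gene", its exon coordinates, the function names, and a codon table deliberately limited to the four codons the example needs. transcription is simplified to copying the coding strand with u in place of t, which yields the same sequence as an rna built complementary to the template strand.

```python
# partial codon table: only the codons used by this toy example ("*" = stop)
PARTIAL_CODON_TABLE = {"AUG": "M", "GCC": "A", "UUU": "F", "UAA": "*"}


def transcribe(coding_strand_dna):
    """dna -> hnrna: same sequence as the coding strand, with u replacing t."""
    return coding_strand_dna.replace("T", "U")


def splice(hnrna, exons):
    """hnrna -> mrna: keep the exon intervals (start, end) and drop introns."""
    return "".join(hnrna[start:end] for start, end in exons)


def translate(mrna):
    """mrna -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# toy gene: exon 1 (bases 0-5), intron (bases 6-15), exon 2 (bases 16-21)
gene = "ATGGCC" + "GTAAGTCCAG" + "TTTTAA"
exons = [(0, 6), (16, 22)]

hnrna = transcribe(gene)
mrna = splice(hnrna, exons)
print(hnrna)
print(mrna)
print(translate(mrna))
```

the sketch ignores utrs, capping, polyadenylation, and start-codon scanning; it only shows how the intermediate products named in the box relate to one another.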
a microarray technique that identifies a single point mutation via sequencing the gene in areas of suspected abnormality as identified via cgh or fish. this looks at a smaller region with greater detail than does cgh.
* solomon, berg, and martin (2008)
during embryologic development, the brain emerges from the neural tube, an early embryonic structure. the anterior neural tube expands rapidly due to cell proliferation and eventually gives rise to the brain. cells stop dividing and differentiate into neurons and glial cells, the main cellular components of the brain. the neurons must migrate to different parts of the developing brain to self-organize into different brain structures. the neurons grow connections (axons and dendrites) which often must span long distances to reach their target cells. growth cones are the highly motile filamentous connections that eventually form more stable synapses between neurons. neurons communicate with other neurons via these synapses. this communication leads to the establishment of functional neural circuits that mediate sensory-motor processing and behavior. associated with this period of synaptogenesis is a brief period of "programmed" cell death, a pruning process in which cells succumb during natural development (mattson & furukawa, 1998). migration of cells from the ventricular zone lining the central canal of the neural tube to the developing cortex next takes place, followed by a more protracted period of pruning (volpe, 2001). one mechanism that allows cells to detach and migrate away from each other appears to be a "molecular switch" regulated by catenin adhesion proteins (dicicco-bloom, 2006). neuroligin, recently described in autism research, is a cell adhesion molecule found in the postsynaptic membrane. also, brain-derived neurotrophic factor (bdnf) is produced by the brain and regulates several functions within the developing synapse, including enhancement of neurotransmitter release. 
activity at the cns synapses occurs over the life span and is thought to underlie learning and memory. the excitatory neurotransmitter glutamate and its receptors, particularly the n-methyl-d-aspartate (nmda) receptor, initiate synaptogenesis through activation of downstream products. at the cns synapse, a nerve terminal is separated from the postsynaptic membrane by a cleft containing specialized extracellular material. localized vesicles are at the active sites and clustered receptors are at the postsynaptic membrane, with glial cells that encapsulate the entire synaptic cleft. there is differentiation of the pre- and postsynaptic membrane following initial contact between two cells. this includes the clustering of receptors, localized upregulation of protein synthesis at the active sites, and continued neuronal pruning through synapse elimination. neurons within the cns receive multiple inputs that must be processed and integrated for successful transfer of information. the multiple inputs physically represent the plasticity of the brain.
macrocephaly has been one of the "most widely replicated biological findings in autism" (mccaffery & deutsch, 2005), affecting up to 20% of all children with autism. several reports have shown increased brain volume (macroencephaly/megalencephaly) in both white and grey matter with results varying as a function of age. mri volumetric studies have shown increased brain volume in younger (total ages 2-21) subjects compared to older (total ages 5-21) subjects (courchesne, carper, & akshoomoff, 2003; sokol & edwards-brown, 2004; sparks et al., 2002). to explain the relevance of brain volume enlargement in autism, it has been proposed that brain "growth without guidance" occurs in young autistic children (courchesne et al., 2001) with premature expansion and overgrowth of neural elements and dendritic connections. 
indeed, brain enlargement appears to develop postnatally, arising before or during the early recognition of autistic behaviors, between 18 months and 3 years (courchesne et al., 2003). proposed mechanisms underlying brain enlargement include overproduction of synapses, failure of synaptic pruning, excessive neurogenesis, gliogenesis, or reduction in cell death (mccaffery & deutsch, 2005). the mechanisms underlying the tremendous growth of the typically developing embryonic brain, compared to the relatively more limited postnatal and adult brain growth, have been implicated in the brain enlargement seen in autism. this leads to the proposal that overstimulation of nuclear receptor-mediated gene transcription may increase neural progenitor cells that generate marked increases in cortical surface area (mccaffery & deutsch, 2005). therefore, the number of neurons in the young cortex is a function of the number of proliferative cells present at the onset of neurogenesis. tumor suppressor genes which control brain growth and migration, when defective, could contribute to macrocephaly or failure in synaptic pruning. this explains the great interest in tumor suppressor genes such as pten as candidate genes for autism. recently, neuronal cell adhesion has been proposed as another mechanism involved in brain overgrowth in autism. cell adhesion suppresses brain growth, while abnormalities in adhesion promote growth or contribute to aberrant growth. it is believed that heterogeneous causes of autism may be associated with alterations in adhesion molecules of the synapse or cytoplasmic molecules associated with synaptic receptors. indeed, recent genetic evidence has found associations between autism susceptibility and other neuronal cell adhesion molecules, such as nlgn1 and astn2 and specific cadherins. 
there is an emerging theory, as recently reviewed in geschwind (2009), that short-range brain connections may be overgrown and longer range brain connections between different brain lobules are reduced in autism. in autism, there has been a noted reduction in the size of the corpus callosum, which connects the two cerebral hemispheres, and reduced connectivity between the frontal and temporal lobes of the brain, locations responsible for language and social judgment. the preservation of normal short-range connections could explain some of the preserved processing functions such as visual perception or attention to detail experienced by many with autism (geschwind, 2009).
three lines of research indirectly attest to the heritability of autism: twin studies, family studies, and the fact that autism affects more boys than girls. in general, heritability appears to be greater when a broader definition of autism (including individuals with cognitive deficits and/or social impairment) instead of the specific dsm-iv criteria for autism is used. in the first twin study of autism (folstein & rutter, 1977), the concordance rate in monozygotic (mz) pairs (36%) was significantly greater than that found in dizygotic (dz) pairs (0%). if the phenotype was expanded to include a cognitive or a language disorder, the concordance rates were 82 and 10%, respectively. two subsequent studies found mz vs. dz concordance rates of 91% vs. 0% (steffenburg et al., 1989) or 60% vs. 5% utilizing the specific phenotype, and 90% vs. 10% using the broader phenotype which included social or cognitive deficits. across twin studies of autism, the difference between mz and dz concordance rates is sizable, averaging roughly 10:1 (pennington, 2009). this ratio is greater than that for other psychiatric disorders such as depression, bipolar disorder, and schizophrenia (between 2:1 and 4:1), indicating a high heritability for autism (pennington, 2009). 
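as a worked illustration of the mz:dz comparison, the ratio can be computed directly from the concordance rates quoted above wherever the dz rate is non-zero (a ratio is undefined when dz concordance is 0%). the dictionary labels and function name are assumptions made for this sketch, and the simple unweighted mean is not claimed to be the method pennington (2009) used to arrive at "roughly 10:1".

```python
# (mz %, dz %) concordance pairs quoted in the text with a non-zero dz rate
reported_rates = {
    "first twin study, expanded phenotype": (82, 10),
    "subsequent study, specific phenotype": (60, 5),
    "subsequent study, broader phenotype": (90, 10),
}


def concordance_ratio(mz_percent, dz_percent):
    """mz:dz concordance ratio; undefined when dz concordance is zero."""
    if dz_percent == 0:
        raise ValueError("ratio is undefined when dz concordance is 0%")
    return mz_percent / dz_percent


ratios = {label: concordance_ratio(mz, dz) for label, (mz, dz) in reported_rates.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}:1")

mean_ratio = sum(ratios.values()) / len(ratios)
print(f"unweighted mean: {mean_ratio:.1f}:1")
```

the three ratios fall between about 8:1 and 12:1, which is in line with the order of magnitude cited in the text and well above the 2:1 to 4:1 range quoted for other psychiatric disorders.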
on the other hand, there was great variability in iq and clinical behaviors in the 16 mz pairs concordant for autism in the study. in other words, there was no more similarity for these traits within mz pairs than that between individuals picked at random from different mz pairs who also had autism. as interpreted by pennington (2009), this finding suggests that although autism is heritable, the genes may not dictate the exact phenotype. nonadditive interaction among genes (epistasis) and nonshared environmental influences likely contribute to these differences in phenotypes. further, the large disparity between the mz and the dz concordance rates has been attributed to epistasis (pennington, 2009). alternatively, rare gene variants causing a common disorder (autism), as described below, could contribute to this large mz/dz discordance. individuals with autism rarely marry and have children so that vertical transmission of the diagnosis from parent to child is rarely observed (pennington, 2009). however, genetic transmission is still possible as parents can transmit genetic risk factors without having the diagnosis themselves. family studies (cited in geschwind & konopka, 2009; pennington, 2009) suggest that the risk of autism is 20-60% higher in siblings compared to the incidence of autism in the general population, and to that of other psychiatric disorders. several studies have shown that a broad autism phenotype is transmitted in families of individuals with autism (piven, 1999; rutter, 2000). for example, first-degree relatives of individuals with autism were shown to be shy, aloof, and have problematic pragmatic language (rutter, 2000). this pattern is consistent with the segregation of sub-threshold traits within these families (abrahams & geschwind, 2008). finally, autism affects more boys than girls (4:1), a finding which has remained constant since kanner's first description of autism in 1943, and despite the increasing incidence of this diagnosis. 
the predominantly male ratio has been attributed to an abnormality on the x chromosome (discussed below), or to sex linkage or genomic imprinting (lintas & persico, 2009; marco & skuse, 2006). sex linkage involves a gene on the x chromosome transmitted from the mother to the son. as the son has only one x chromosome, this gene would be expressed. since the mother's daughter has two copies of the x chromosome, one from her father and one from her mother, the daughter likely would not express the abnormal phenotype. the most well-known example of a sex-linked disorder is hemophilia, whose gene is on the x chromosome. genomic imprinting, on the other hand, is an epigenetic phenomenon wherein chemical modification of dna that does not alter the basic dna sequence, or modification of the dna-associated histone proteins, determines whether the maternal or the paternal copy of a specific gene is expressed. genomic imprinting has been determined for prader willi and angelman's syndromes. finally, increased risk for autism has been identified in the offspring of older fathers (reichenberg et al., 2006). therefore, the gender and age of the parent may confer risk for offspring with autism. the conclusion drawn from indirect evidence is that autism is the most heritable and familial neurodevelopmental disorder (pennington, 2009). with rare exceptions, however, autism does not appear to be the action of a single gene inherited in a strictly mendelian pattern (autosomal dominant, recessive, or x-linked; gupta & state, 2007; o'roak & state, 2008). rather, there are reports of multiple, distinctly rare changes in the genetic code in small subsets of individuals that cause or contribute to autism. there may be multiple gene variants, "a conspiracy of multiple genes" (gupta & state, 2007), that converge, leading to a given phenotype. 
despite the indirect evidence for heritability and recent genetic technological advances, a genetic cause can be attributed to only 10-20% of all cases (reviewed in abrahams & geschwind, 2008), with a recent report suggesting a genetic cause can be uncovered in up to 40% of cases (schaefer & lutz, 2006). further, abrahams and geschwind (2008) state that no single genetic cause accounts for more than 1-2% of cases, similar to what is seen in mental retardation, another condition without a single genetic cause. while numerous studies identifying candidate genes or markers have been reported, very few studies have been replicated (losh et al., 2008). reasons to explain why candidates have not been agreed upon include the initial lack of uniform diagnostic criteria (strict vs. broad definition), limited power, varying methodology (losh et al., 2008), and neglect of epigenetic factors modeling the disorder (lahiri, maloney, & zawia, 2009). further, as with other conditions with a strong heritable component, it appears that different genes may contribute to distinct components of the condition which gives rise to the full disorder through concerted actions (losh et al., 2008; pickles et al., 1995). consequently, it has been said that linkage technology has not "found the autism gene," but rather it demonstrated that more powerful technology is necessary to explain the multiple genes associated with autism (abrahams & geschwind, 2008). no one knows just how heterogeneous the syndrome is likely to be, that is, how many genes or regions of dna (loci) may contribute either within a single individual or among the entire group of affected individuals. some of the chromosomal regions and genes that have been associated with autism are summarized in table 6.1 and will be addressed herein. 
in addition, it is important to distinguish between locus heterogeneity, which refers to a variation at many different genes or loci resulting in a similar phenotype, and allelic heterogeneity, which refers to different variations or mutations at the same locus leading to an identical or overlapping clinical picture. accumulating evidence suggests that both play a role in autism (o'roak & state, 2008). recently, a rare-variant common disease model has been introduced (o'roak & state, 2008). in this model, rare genes explain the common disease of autism. this makes sense in the darwinian tradition that a deleterious change in the human genome leads to reduced fitness and that this would not be likely to propagate within a population. autism fits the rare gene-common disease model as it begins early in life, impairs social interactions, and is associated with mental retardation which impacts reproductive fitness (o'roak & state, 2008). the large difference seen between monozygotic and dizygotic concordance rates is consistent with de novo (rare gene) mutation. it appears that simplex families, with one family member with autism, demonstrate de novo mutations more frequently (sebat et al., 2007), whereas multiplex families demonstrate transmitted variants (bakkaloglu et al., 2008). the search for rare variants has led to the study of families that are genetically isolated, with shared ancestry and prone to consanguinity (morrow et al., 2008; strauss et al., 2006), to locate recessive alleles. finally, rare sequence variations have been detected, thanks to improved technology such as cnv analysis. indeed within the last 5-10 years, 20 bona fide risk genes have been identified due to the improved technology developed after the linkage method (abrahams & geschwind, 2008). several of these risk genes will be discussed in the following sections. abrahams and geschwind (2008) write that two views underlie genetic autism research. 
the first is that rare variants involve an abnormality in a single molecule causing the clinical pathology in autism. mendelian inheritance (autosomal dominant vs. recessive) takes this form. technology compatible with this approach includes cytogenetics (including karyotyping and fish), gene association studies (analysis of genes and protein systems from less complex genetic syndromes similar to autism such as rett and fragile x syndromes), linkage studies (including genome screens in affected sibling pairs), microarray technology, and cnv analysis. the second view involves the study of independent hereditary endophenotypes representing the core features of autism. an endophenotype is an operationally defined behavioral feature such as age of first spoken word. although there is a general consensus regarding endophenotypes commonly seen in autism, the behaviors may not be specific to autism, as delay in language acquisition accompanies many developmental disorders. ideally, endophenotypes represent discrete behaviors which do not overlap with other behaviors from the autism core domains (i.e., language, socialization, and rigid behavior). for example, poor eye contact, representing a deficit with socialization, is construed as being independent of repetitive behaviors, thought to represent the rigid behavior domain. genetic study of these discrete endophenotypes, largely undertaken via linkage studies, narrows the scope of analysis (losh et al., 2008; o'roak & state, 2008). reports of chromosomal abnormalities detected by karyotyping first demonstrated that rare variants may contribute to autism (vorstman et al., 2006). estimates of chromosomal abnormalities in autism range from 6 to 40% (abrahams & geschwind, 2008; marshall et al., 2008; pennington, 2009; schaefer & lutz, 2006), so that genetic workup is now being recommended for all children diagnosed with autism (pennington, 2009).
chromosomal studies have produced a great interest in chromosome 15q11-13, the site of the most frequent chromosomal anomaly seen in autism (reviewed in veenstra-vanderweele & cook, 2004). the maternally derived duplication of this region involves an imprinting mechanism. maternal interstitial duplication or supernumerary inverted duplication of 15q11-13 is seen in 1-3% of patients (cook et al., 1990; schroer et al., 1998). clinical features of chromosomal derangement in this region include mental retardation, motor impairment, seizure disorder, and impairment in communication, in some but not all with autism or attention-deficit hyperactivity disorder (adhd; cook et al., 1997; schroer et al., 1998). the duplication of 15q11-13 seen in some cases of autism is the opposite of deletions from the same region seen in angelman's syndrome (if inherited from the mother) or prader-willi syndrome (if inherited from the father). angelman's syndrome, known as the "happy puppet" phenotype, involves mental retardation, epilepsy, ataxia, lack of speech, predominant laughing and smiling, and a high rate of autism. prader-willi syndrome, associated with mental retardation and hyperphagia, is only occasionally associated with autism. presently, there is ongoing research regarding how duplications and deletions in this gene region lead to an increased risk of autism (veenstra-vanderweele & cook, 2004). this region, however, has not been identified as one of interest in whole genome searches, perhaps due to its rarity (pennington, 2009). still, fish analysis of chromosome 15q11-13 is often performed in evaluation of children with autism. recently, fish is being replaced by cgh due to improved detection of abnormality within the 15q11-13 region.
cytogenetic approaches provided the first evidence for an autism gene 40 years ago when lubs (1969) identified an abnormal or "fragile" site on the long arm of chromosome x in four males with mental retardation, leading to the recognition of fragile x syndrome (fxs). this syndrome, associated with mental retardation and autistic features, is more severely expressed in males. fxs is caused by a deficiency of the fragile x mental retardation protein (fmrp), resulting from little or none of the disease gene fragile x mental retardation 1 (fmr1) mrna. the fmr1 gene mutation consisting of expanded cgg repeats of > 200 at chromosome site xq27.3 is considered the origin of fxs. autism, using the broad definition, has been reported in up to 30% of males with fxs, and fxs can be found in as many as 7-8% of individuals with autism (muhle, trentacoste, & rapin, 2004) . later studies, using dna measures of the fragile x mutation rather than cytogenetics and strict autism criteria, have found a smaller association between fxs and autism (pennington, 2009) . two investigations, however, which studied carefully controlled groups of fxs-negative and fxs-positive males matched on intelligence, found higher rates of the following autistic symptoms in fxs males: gaze avoidance and hand flapping (einfeld, molony, & hall, 1989 ) and stereotypic movements (including hand flapping, rocking, and hitting, scratching, or rubbing their own bodies), echolalia, gaze avoidance, and ritualistic behaviors (maes, fryns, van walleghem, & van den berghe, 1993) . studies of endophenotype behaviors rather than the strict autism criteria are more likely to uncover robust similarities between fxs and autism. exploring similarities between rare sub-groups of patients with a known disorder and those with a more common disorder (autism) provides a window into the shared biology between the disorders. 
fmrp has been shown to interact with multiple transcripts in repressing metabotropic glutamate receptor-5 (mglur 5 ) signaling activity which regulates long-term depression (ltd) associated with synaptic elimination. without fmrp acting as a "brake," mglur-ltd is enhanced (bear, huber, & warren, 2004) . this favors an anabolic state which could contribute to the key features of fxs such as epilepsy, cognitive impairment, loss of motor coordination, and increased density of thin, long, dendritic spines in neurons (bear, 2005) . these observations have invited drug trials of glutamate receptor antagonists to reduce the expression of mglur 5 (dolen et al., 2007) in individuals with fxs and in individuals with autism. other genes contributing to medical conditions such as alzheimer's disease are under investigation to determine if they, too, share biology with autism and fxs. the discovery of fragile x ataxia syndrome (fxtas) provides a precedent to study the relationship of developmental conditions across the life span within and between families. fxtas is an adult-onset ataxia/dementia syndrome found in older family members of individuals with fxs (hagerman, 2002; hagerman et al., 2001) . in contrast to what is found in fxs, fxtas occurs with a rise in fmr1 mrna level (hagerman, 2006) , fewer than 200 repeats (basehore & friez, 2009) , and hypomethylation (berry-kravis, potanas, weinberg, zhou, & goetz, 2005) in the fmr1 cgg section on the x chromosome. this association of two clinically divergent disorders, regulated by the same gene produced in different doses, sets the stage for the investigation of the shared biology between autism, fxs, and other genes related to alzheimer's disease such as amyloid precursor protein (app). app, which is encoded by a gene on chromosome 21, is a large (695-770 amino acid) glycoprotein produced in several central nervous system (cns) cell types including microglia, astrocytes, oligodendrocytes, and neurons. 
after protein processing, mature app is axonally transported and can be released from axon terminals in response to electrical activity. app is believed to play an important role in neuronal maturation and in synaptogenesis as reviewed by lahiri, farlow, greig, giacobini, and schneider (2003). app is of great interest because processing products of app can include the insoluble 40-42-amino acid amyloid β-peptide (aβ-40 and aβ-42, respectively), the principal component of the cerebral plaques associated with memory and cognitive decline in alzheimer's disease (alley et al., 2010). recent research linking app to autism illustrates how association studies enable the generation of new hypotheses about the biology of autism and ultimately advance our understanding of this disorder. recently, we found evidence to support an association between autism and one of the app pathways established for alzheimer's disease. in contrast to the upregulation of the amyloidogenic pathway as seen in alzheimer's disease, we found evidence that there may be an upregulation of the nonamyloidogenic (α-secretase amyloid precursor protein, sappα) pathway in a small sample of children with severe autism associated with self-injurious and aggressive behavior. that is, children with severe autism expressed total sapp (representing the combined amyloidogenic and nonamyloidogenic pathways) at two or more times the levels of children without autism and up to four times the levels of children with mild autism (sokol et al., 2006). overall, there was a trend toward higher sappα within the children with autism. one of the severely autistic children in this study had fxs. high levels of sappα have also been found in a sample of children with autism from an independent laboratory (bailey et al., 2008). high levels of sappα imply an increased α-secretase pathway in autism (anabolic), opposite to what is seen in alzheimer's disease.
this would be consistent with the brain overgrowth hypothesis attributed to autism, in contrast to the brain atrophy seen with the deposition of amyloid plaque in alzheimer's disease. coincidentally, via animal studies, westmark and malter (2007) concluded that fmrp binds to and regulates translation of app mrna through mglur 5 , providing a potential link between neuronal proteins associated with ad and fxs. by way of mglur 5 , fmrp provides the "brake," which if unchecked in conditions such as fxs would favor high levels of app. further, high levels of app have been found in another study of fmr1 knockout mice (d'agata et al., 2002) . these findings have led to enticing speculation that the app gene may influence several neurodevelopmental disorders across the life span. both chromosomal inversions and translocations have been reported near 7q31 in boys with autism. one of these translocation breakpoints identified the deranged gene as ray1/st7 (fam4a1), a putative tumor suppressor gene (vincent et al., 2002) , and work in this region is ongoing. mutations leading to amino acid changes have been found on the wnt2 gene on 7q31 in two families with autism, and one affected parent transmitting the mutation to two affected children (li et al., 2004; wassink, brzustowicz, bartlett, & szatmari, 2004) . the foxp2 gene on 7q31 was found to be disrupted in one family with an autosomal dominant form of specific language impairment (sli) (lai, fisher, hurst, vargha-khadem, & monaco, 2001) , and this was replicated in a larger study of sli (o'brien, zhang, nishimura, tomblin, & murray, 2003) , although initial studies failed to associate this with autism (newbury et al., 2002; wassink et al., 2001) . recently, foxp2 differences in gene expression were found between chimp and human cell cultures . compared to chimps, in human culture, the foxp2 gene affected transcription upregulation in 61 genes and downregulation in 55 genes. 
further, the foxp2 gene was shown to regulate downstream effects involving cerebellar motor function, craniofacial formation, and cartilage and connective tissue formation required for expressive language. as discussed in section "endophenotype," there is a suspected association between the chromosome 7q region and speech abnormality seen in autism. the reln gene maps to chromosome 7q22. this gene encodes a protein that controls intercellular interactions involved in neuronal migration and positioning in brain development. a large polymorphic trinucleotide repeat in the 5'-utr of the reln gene has been implicated in autism in several studies (ashley-koch et al., 2007; persico et al., 2002; zhang et al., 2002). further support for this candidate gene comes from the reln mouse model, which carries a large deletion in reln and shows atypical cortical organization similar to cytoarchitectural pathological anomalies reported in the brains of individuals with autism (bailey et al., 2008). pten is a tumor suppressor gene located on chromosome 10q23. this gene influences cell cycle arrest and apoptosis, or programmed cell death (lintas & persico, 2009). pten inactivation causes excessive growth of dendrites and axons, with an increased number of synapses (kwon et al., 2006). mutations in pten cause cowden's syndrome (macrocephaly, hamartomas, and autism). this gene has been of interest to autism researchers as macrocephaly has been considered to be one of the "most widely replicated biological findings in autism," affecting up to 20% of all children with the condition (mccaffery & deutsch, 2005). butler et al. (2005) examined the pten gene in 18 individuals with autism and macrocephaly. they discovered three with pten missense mutations. others have found pten mutations in macrocephalic patients with autism (herman et al., 2007a, 2007b).
cytogenetic abnormalities have included deletions and duplications involving 2q37, 22q13, 22q11, 17p11, 16p11.2, 18q, xp, and sex chromosome aneuploidies (47,xyy and 45,x/46,xy mosaicism) (costa e silva, 2008; sykes & lamb, 2007). these genetic findings have generated interest in testing the association of a number of candidate genes in these regions via linkage and animal studies. accumulating evidence points to the involvement of three genes (neuroligin, shank3, and neurexin) in the synapse formation of glutamate neurons. glutamate is an excitatory neurotransmitter and aberrant glutamate function has long been suspected to contribute to autism. neuroligins induce presynaptic differentiation in glutamate axons. shank3 encodes a postsynaptic scaffolding protein which regulates the structural organization of dendritic spines in neurons and is a binding partner of neuroligins. neurexin induces glutamate postsynaptic differentiation in contacting dendrites. altogether, these genes appear to contribute to glutamatergic synapse formation. neuronal cell adhesion is important in the development of the nervous system, contributing to axonal guidance, synaptic formation and plasticity, and neuronal-glial interactions (glessner et al., 2009; lien, klezovitch, & vasioukhin, 2006). derangement in cell adhesion could contribute to migrational abnormalities including brain overgrowth. neuroligins are cell-adhesion molecules localized postsynaptically at both glutamatergic (nlgn1, nlgn3, nlgn4, nlgn4y) and γ-aminobutyric acidergic (nlgn2) synapses (lintas & persico, 2009). neuroligins trigger the formation of functional presynaptic structures in contacting axons. as mentioned above, they interact with postsynaptic scaffolding proteins such as shank3 (see below). the nlgn3 and nlgn4 genes are located at chromosomes xq13 and xq22, 33, respectively, and mutations here have been associated with autism (jamain et al., 2003).
extensive genetic screens conducted by several research groups have confirmed the low frequency of neuroligin mutations among individuals with autism (lintas & persico, 2009) . for example, jamain et al. (2003) found a frameshift mutation in nlgn4 and a missense mutation in nlgn3 in two unrelated swedish families, inherited from unaffected mothers. laumonnier et al. (2004) reported a frameshift mutation in nlgn4 in 13 affected male members of a single pedigree. lawson-yuen, saldivar, sommer, and picker (2008) found a deletion of exons 4-6 of nlgn4 in a boy with autism and in his brother with tourette syndrome whose mother showed psychiatric problems and also carried the mutation. neuroligin mutation carriers, however, display a variety of syndromes, such as x-linked mental retardation without autism (laumonnier et al., 2004) , asperger's syndrome, autistic disorder of variable severity, and pdd-nos (yan et al., 2005) . the symptoms may be slow, or abrupt and associated with regression. despite intensive investigation, the low frequency of neuroligin mutations and the lack of similar phenotypes have led to the recommendation that neuroligins should not be included in widespread screens for individuals with nonsyndromic autism (lintas & persico, 2009 ). the shank3 gene is located on chromosome 22q13.3 which encodes for a scaffolding protein found in the postsynaptic density complex of excitatory synapses binding to neuroligins. recently, two studies have reported a correlation between mutations affecting shank3 and an autism phenotype characterized by severe verbal and social deficits (durand et al., 2007; moessner et al., 2007) . of the seven patients reported with the shank3 gene mutations, five were deletions, one was a missense, and another a frameshift mutation. in addition, rare shank3 variations were present in the autism group, but not in the control group. 
these variations were inherited from healthy parents and they were present in unaffected siblings, perhaps demonstrating incomplete penetrance. another interesting observation is that in both studies (durand et al., 2007; moessner et al., 2007), the shank3 deletion was inherited via a paternal balanced translocation. further, in both studies, siblings of the probands with shank3 abnormalities had partial 22q13 trisomy that resulted in attention-deficit hyperactivity disorder (moessner et al., 2007) and asperger's syndrome with early language development (durand et al., 2007). like neuroligin mutations, shank3 mutations are very rare. however, because of the robust genotype-phenotype correlation reported in two studies (durand et al., 2007; moessner et al., 2007), it has been recommended that children with severe language and social impairment obtain shank3 mutation screening (lintas & persico, 2009). presynaptic neurexins influence postsynaptic differentiation in contacting dendrites by interactions with postsynaptic neuroligins. three neurexin genes (nrxn1, nrxn2, and nrxn3) are located on chromosome loci 2q32, 11q13, and 14q24.3-q31.1, respectively. again, neurexin mutations are very rare. for example, two missense mutations were present in 4 of 264 (1.5%) individuals with autism compared to none in 729 controls (feng et al., 2006). however, in this study, missense mutations also occurred in first-degree relatives who displayed heterogeneous phenotypes such as hyperactivity, depression, and learning problems. incomplete penetrance may explain these findings, or autism may be caused by neurexin acting synergistically with other susceptibility genes. neurexin-1α is being intensely studied in mice, as deletion of this molecule resulted in increased repetitive behavior, whereas social behavior was relatively intact (etherton, blaiss, powell, & sudhof, 2009). this implies an animal model for a discrete feature of autistic behavior (endophenotype), as discussed below.
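as a rough illustration of how unusual the neurexin counts quoted above (4 carriers among 264 individuals with autism versus 0 among 729 controls) would be under chance alone, the following minimal sketch computes a one-sided fisher's exact test from the hypergeometric distribution. this is not an analysis reported by feng et al. (2006); the function name and table layout are assumptions made for illustration only.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cases, controls); columns are carriers and
    non-carriers. Returns P(carriers in row 1 >= a) with all margins fixed.
    Hypothetical helper written for illustration only.
    """
    row1 = a + b          # size of first group (cases)
    row2 = c + d          # size of second group (controls)
    carriers = a + c      # total carriers across both groups
    total_tables = comb(row1 + row2, carriers)
    upper = min(row1, carriers)
    favourable = sum(
        comb(row1, k) * comb(row2, carriers - k) for k in range(a, upper + 1)
    )
    return favourable / total_tables


# Counts quoted in the text: 4 of 264 cases, 0 of 729 controls.
p_value = fisher_one_sided(4, 260, 0, 729)
print(f"one-sided p = {p_value:.4f}")
```

a small p-value here would only indicate that the carriers cluster among cases more than expected by chance; it says nothing about penetrance, which the observations in first-degree relatives call into question.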
contactin-associated protein-like 2 (cntnap2) is a member of the neurexin superfamily which involves a recessive frameshift mutation on chromosome 7q35. it is one of the largest genes of the human genome and encodes caspr2, a transmembrane scaffolding protein. cntnap2 has been associated with cortical dysplasia-focal epilepsy in an old order amish community (strauss et al., 2006) . autism was present in up to 67% of these individuals. we recently reported cntnap2 in an amish girl with epilepsy and autism who also showed hepatosplenomegaly (jackman, horn, molleston, & sokol, 2009 ). cntnap2 has been associated with an autism language phenotype in large studies of non-amish individuals (alarcon et al., 2008) . further, stage two of this investigation showed that cntnap2 was expressed in the language centers (frontal and anterior temporal lobes) of fetal brain (alarcon et al., 2008) . it has been recommended that amish children presenting with autism should be tested for cntnap2 gene mutation (strauss, personal communication) . this work again supports a language function associated with chromosome 7. methyl-cpg-binding protein 2 (mecp2) works as a transcriptional repressor within gene-promoting regions involved in chromatin repression. this gene is located on chromosome xq28 and shows mutation in 80% of females with rett syndrome (pervasive developmental disorder, acquired microcephaly, epilepsy, and loss of hand function). this gene has been studied in children with autism, and mecp2 mutations are considered to be rare (0.8-1.3%) in females with autism. interestingly, autism and rett syndrome share some similarities at the phenotypic and pathogenic levels, and both disorders were proposed to result from disruption of postnatal or experience-dependent synaptic plasticity (zoghbi, 2003) . among the rett mutations reported are a frameshift mutation, a nonsense mutation, and additional introns (lintas & persico, 2009 ). 
note that mecp2 has not been associated with idiopathic autism in males, so this gene test is recommended only for females with autism. linkage studies involve determining whether the transmission of a chromosomal segment from one generation to another coincides with the presence of the phenotype of interest (gupta & state, 2007). linkage can be utilized in mendelian inherited conditions and can also be used to study complex conditions such as autism, in which mendelian inheritance is unlikely and there are no hypotheses regarding the specific nature of transmission. linkage studies can be grouped into two types: the conventional study of a chromosomal region of interest in affected sibling pairs from multiplex families and the more recent genome-wide linkage analysis in which every chromosome is evaluated simultaneously. in the sib-pair study method, siblings with autism are evaluated to determine whether they share any regions of the genome more frequently than would be expected by chance. genome-wide association studies compare genetic risk factors in the form of specific genetic markers in cases and controls. these markers are distributed within the entire genome rather than limited to specific gene regions such as in the sib-pair method. this enables a more unbiased ascertainment of regions of interest (losh et al., 2008). several genome-wide scans of individuals with autism have been reported and evidence in favor of linkage has been determined for the majority of chromosomes (gupta & state, 2007). the trouble with these studies is that the evidence has not reached statistical significance and there is a lack of replication for many of these findings. statistical significance is calculated by a logarithm of the odds (lod) score. this score represents the logarithm of the likelihood ratio of observing the data under a model of linkage to observing the data under a model of free recombination (no linkage).
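the lod definition above can be made concrete with a minimal sketch. it assumes the simplest textbook setting, in which each of n phase-known meioses is scored as recombinant or non-recombinant and the linkage model has recombination fraction theta, so that lod(theta) = log10[theta^r (1 - theta)^(n - r) / 0.5^n]. this is not the allele-sharing statistic used in affected sib-pair scans, and the function name and arguments are illustrative assumptions.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (illustrative sketch).

    Compares the likelihood of the data under linkage with recombination
    fraction `theta` against free recombination (theta = 0.5).
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # The data are impossible under this theta (e.g. a recombinant at theta = 0).
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Ten informative meioses with one recombinant, evaluated at several thetas.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

because the score is a base-10 logarithm, a lod of 3 corresponds to odds of 1000 to 1 in favor of linkage over free recombination.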
an lod score of 3.5 in a sib-pair analysis is considered to be a significant linkage (lander & kruglyak, 1995) . suggestive linkage is an lod score of 2.2, and a highly significant lod score is 5.4. for replication studies, the replication threshold is considered to be 1.4 (lander & kruglyak, 1995) . by chance, one would expect to see a suggestive linkage peak once every genome scan or a significant peak once every 20 scans (gupta & state, 2007) . most linkage studies have identified linkage regions reaching the level of suggestive linkage at best (freitag, 2007) . despite large increases in the size of patient cohorts, linkage signals have not increased significantly with sample size (abrahams & geschwind, 2008) . only three loci (2q, 7q, and 17q11-17) have been replicated for nonsyndromic autism and are considered to have genome-wide significance. genomic-wide screens have engendered great interest in chromosome 7q with distinctive peaks involving two distinct regions: 7q21-22 and 7q32-36 (international molecular genetic study of autism consortium-imgsac, 1998; collaborative linkage study of autism, 2001). as chromosome 7q has been discussed in section "cytogenetics: rare mutations," chromosome 2 and then 17q11 will follow the description of how linkage studies led to the discovery of the gene loci for a syndromic form of autism: tuberous sclerosis complex (tsc). the sibling-pair from multiplex family design has been used to study autism within many genetic loci, including those involved in tsc; the genomewide linkage analysis has detected linkage on chromosomes 7q, 2q, 17q11, and novel loci. tuberous sclerosis complex (tsc) is a neurodevelopmental disorder characterized by cognitive delay, epilepsy, and neurocutaneous growths including hamartomas (i.e., tubers) within the central nervous system and other organs. up to 60% of individuals with tsc have autism. 
the first suspicion of a chromosomal abnormality associated with tsc originated from a linkage study of 26 protein markers within 19 multigenerational families affected with tsc (fryer et al., 1987). in eight of the families, the abo blood group gene mapped to chromosome 9q34.3. many groups corroborated these results using larger numbers of families (au, williams, gambello, & northrup, 2004). these linkage studies established the tsc1 gene, later discovered to encode hamartin, a growth suppressor protein. another tsc gene site was discovered via linkage studies on chromosome 16p13.3, known as tsc2. this gene produces the tumor suppressor protein tuberin. further evidence showed that hamartin works together with tuberin in several cell-signaling pathways including a growth and translation regulatory pathway (pi3k/pkb), a cell adhesion/migration/protein transportation pathway [glycogen synthase kinase 3 (gsk3)/β-catenin/focal adhesion kinase/ras-related homolog (rho) pathway], and a cell growth and proliferation pathway [mitogen-activated protein kinase (mapk)]. the tuberin-hamartin complex affects mtor kinase activity (au et al., 2004), promoting tumor growth. this discovery has led to the study of rapamycin, a drug used in organ transplant and cancer treatment, as a therapy for suppressing growth of tumors in tsc (kenerson, aicher, true, & yeung, 2002). other growth-promoting genes such as pten have been linked to the tuberin-hamartin gene complex as contributors to the general overgrowth in tsc. as previously discussed, pten is a gene of interest in autism. buxbaum et al. (2007), together with the seaver autism research center (sars), reported a two-point dominant lod score of 2.25 on chromosome 2q in 35 affected sibling pairs. in a second-stage screening which employed 60 families with autism probands, the strongest linkage was at this same location.
the imgsac study showed a strong linkage to 7q as described above, but also found linkage to 2q (imgsac, 1998), with a more recent study also providing evidence for linkage to 2q (imgsac, 2001b). early recognition of the need for large patient cohorts and of substantial genetic heterogeneity led to the establishment of the autism genetic resource exchange (agre), composed of over 500 families. agre is a publicly available resource of phenotypic data and biomaterials. genomic scan linkage analysis was performed on 109 autistic sibling pairs from 91 agre families together with analysis of those pairs from an independent sample of 345 families from the same agre cohort (cantor et al., 2005). when families with autism were stratified into only those with affected males, there was significant linkage at 17q11-21 in both samples. the lod score for this replicated work was at genome-wide level of significance (lod score > 1.4). one positional candidate gene close to this region is the serotonin transporter gene (slc6a4), which codes for a protein controlling the reuptake of serotonin in the synapse. this was exciting as elevated levels of serotonin have been determined in 25-30% of cases of autism (cook et al., 1990). indeed, the relationship between the slc6a4 site and a repeat polymorphic region in its promoter region (5httlpr) has recently been independently investigated (losh et al., 2008). the chromosome 17q locus, however, has not been uniformly observed in subsequent, large-scale studies (schellenberg et al., 2006), which may be due to phenotypic heterogeneity. a very recent genome-wide linkage mapping study in a sample of 1,031 multiplex autism families (weiss, arking, daly, & chakravarti, 2009) identified significant linkage on chromosome 20p13 and suggestive linkage on chromosome 6q27. in this study, no associations meeting criteria for genome-wide significance were found, suggesting there are not many common loci of moderate to large effect size.
however, replication data revealed a novel snp locus on chromosome 5p15. this region was adjacent to sema5a, a member of the semaphorin axonal guidance protein family which has been shown to be downregulated in transformed b lymphocytes from autism samples (weiss et al., 2009). the authors further demonstrated lower sema5a gene expression in autism brain tissue. this finding is in keeping with the suspected derangement in the migration of cortical neurons during embryogenesis in autism. microarray technology is transforming the identification of chromosome duplications and deletions (gupta & state, 2007). this new technology, known as high-density, oligonucleotide-based array comparative genomic hybridization (acgh), is now available for widespread use. this technique uses patient dna and control dna, each labeled with a fluorescent tag (red or green). equal amounts of dna from patient and control are hybridized to known regions of the human genome pre-arrayed on a slide. if patient and control have equal copy numbers at a given locus, the color turns yellow, representing an equal measure of dna. if the patient has lost (deleted) a locus, only the control color is visualized. if the patient has an extra copy at a locus (duplication), the patient color predominates. the acgh probe may be enriched for known genes or specific chromosomal regions for known syndromes, or distributed evenly across the whole genome. this new technique is now available at all major medical centers and through signature genomic laboratories (www.signaturegenomics.com). single-nucleotide polymorphism (snp) analysis probes thousands of snps and provides data about copy number and genotype (li & andersson, 2009). the genotype can be used to study uniparental disomy (upd) seen in imprinting disorders and consanguinity. this technique is better at focused study of specific gene regions instead of the whole genome.
additionally, snp analysis uses pre-established laboratory standards rather than intraexperimental controls. both snp analysis and acgh can map copy number variation, and it is common for labs to first perform acgh and follow up with snp analysis in specific regions of interest. today, cnvs spanning several thousand nucleotides can be identified, greatly improving upon the sensitivity of conventional cytogenetics. sebat et al. (2007) showed that cnvs were present in 10% of affected individuals from single-incidence autism families (i.e., sporadic cases), contrasting with substantially lower rates observed in controls (1%) and autism cases from multiplex families (3%). this has led to the general expectation that de novo cnvs are more likely to be found in sporadic (and as it turns out, dysmorphic) cases. jacquemont et al. (2006) reported cnv rearrangements in 8 (27.5%) of 29 patients with syndromic autism (including facial dysmorphism, limb or visceral malformations, and growth abnormalities). there was no recurrence or overlap in these variants among the eight children. chromosome 11p12-p13 and neurexins were implicated in a linkage study using snp-cnv in 1,181 families with at least two affected individuals through the autism genome project (szatmari et al., 2007). microarray technology, however, comes with the realization that typically developing individuals have more structural variants than previously imagined (sebat et al., 2007). there is growing concern that cnv irregularities may not be pathological and therefore may not be a cause of autism (tabor & cho, 2007). these authors note that using diagnostic tools prematurely in a clinical context may be "unethical, either because of over-treatment, under-treatment, unwarranted labeling and stigmatization, or a false sense of security." certainly, further studies on large cohorts of children with the same deletion/duplication are necessary to enable clinical application of this technology.
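the two-color acgh readout described above (balanced signal for equal copy number, control color for a deletion, patient color for a duplication) is commonly summarized as a log2 ratio of patient to control signal. the following minimal sketch classifies a single locus on that basis. the function name and the 0.3 cutoff are assumptions chosen for illustration, not values from any laboratory protocol; for orientation, a heterozygous deletion (1 copy vs. 2) gives an ideal log2 ratio of -1 and a single-copy duplication (3 copies vs. 2) gives about 0.58.

```python
import math


def classify_locus(patient_signal: float, control_signal: float,
                   threshold: float = 0.3) -> str:
    """Classify one aCGH probe from two fluorescence intensities.

    `threshold` is an illustrative cutoff on the absolute log2 ratio,
    not a validated laboratory standard.
    """
    if patient_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    log2_ratio = math.log2(patient_signal / control_signal)
    if log2_ratio <= -threshold:
        return "loss"      # control color dominates: deletion in patient
    if log2_ratio >= threshold:
        return "gain"      # patient color dominates: duplication in patient
    return "balanced"      # mixed (yellow) signal: equal copy number


# Idealized copy-number signals: 1 vs 2 copies, 2 vs 2 copies, 3 vs 2 copies.
for patient, control in ((1.0, 2.0), (2.0, 2.0), (3.0, 2.0)):
    print(patient, control, classify_locus(patient, control))
```

in practice calls are made over runs of adjacent probes rather than single loci, which is one reason small or isolated deviations need cautious interpretation.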
due to the phenotypic heterogeneity of autism and the failure to find a specific gene, researchers have narrowed the scope of analysis to purer, operationally defined behaviors/traits. endophenotypes are behavioral, physiological, and/or neuropsychological markers that are present in both affected and unaffected individuals. rather than searching for "autism genes," endophenotype investigations search for smaller groupings of genes that contribute to discrete phenotypes (losh et al., 2008). this approach allows for measurement of "dosage" of a trait and can be applied to affected as well as unaffected individuals. specific language impairment (sli) is defined as the failure of normal development of language without hearing loss, mental retardation, or oral motor, neurological, or psychiatric impairment. it affects approximately 7% of children entering school (tomblin et al., 1997). individuals with sli perform poorly on phonologically based tasks and many go on to develop dyslexia (stothard, snowling, bishop, chipchase, & kaplan, 1998). there appears to be a neurostructural association between sli and autism (herbert et al., 2002, 2004), and there is an associated finding of language delay in relatives of probands with autism (wassink et al., 2004). several investigations have found strong linkage of sli to chromosome 13q21 (bartlett et al., 2002, 2004) and to chromosomes 16q and 19q (sli consortium, 2002, 2004). this finding is noteworthy as it directly overlies the 16q locus linked to autism (wassink et al., 2004). a specific language marker, age of first word, has shown significant linkage to chromosome 7 observed by five separate investigators (losh et al., 2008). the 7q region has been the subject of intense investigation, as described above, and may harbor the loci associated with language in autism. the restricted and repetitive behavior (rrb) endophenotype in children with autism is receiving attention (morgan, wetherby, & barber, 2008).
rrb comprises one prong of the autism clinical triad (language deficit, social deficit, and rrb). cuccaro et al. (2003) identified two factors underlying rrb as measured by the autism diagnostic interview-revised (adi-r): repetitive sensory motor actions and resistance to change. this is of interest as a genetic linkage signal has been reported for resistance to change on chromosome 15. it appears that the analysis of phenotypically homogeneous subtypes may be a powerful tool for mapping candidate genes in complex traits such as autism. the recent marked increase in the incidence and awareness of autism has resulted in an increase in the number of children sent for diagnostic evaluation. the general consensus within the literature is that genetic consultation should be conducted on all persons with a confirmed diagnosis of autism. while referral for genetic consultation is generally preferred, the primary practitioner, pediatrician, and/or pediatric neurologist is often in a position to conduct the initial evaluation. further, clinical geneticists may not be available, particularly in underserved areas of the country, or a timely genetic evaluation may not be possible due to lengthy waiting lists. therefore, the clinician taking care of a child with autism may wish to initiate the genetic evaluation. recently, sequential guidelines for clinical genetics evaluation in autism have been published by schaefer, mendelsohn, and the professional practice and guidelines committee (2008) (see box 6.5). these evidence-based guidelines have evolved from an original retrospective study of patients referred for clinical genetics evaluation between the years 2002 and 2004 at the university of nebraska medical center (schaefer & lutz, 2006). the guidelines are dynamic rather than static and have been updated, for example, as acgh has become more available. the stepwise approach was designed to balance cost with the expected yield of the tests.
further, there is a pyramidal effect so that "earlier tiers have a greater expected diagnostic yield, lower invasiveness of testing, better potential of intervention, and easier overall practicality." summing the diagnostic yields expected for the following investigations (high-resolution chromosome studies, 5%; acgh, beyond that detected by chromosome analysis, 10%; fragile x, 5%; mecp2, 5% (females only); pten, 3% (if head circumference >2.5 sd); other, 10%), it was predicted that use of these guidelines would lead to a diagnosis in 40% of cases (schaefer, mendelsohn, and the professional practice and guidelines committee, 2008). finally, this stepwise approach has met the approval of third-party payers and families (schaefer & lutz, 2006). the evaluation begins with confirmation of the diagnosis of autism by a trained professional using objective criteria and tools, and with sensory screening. this initial step includes confirming the diagnosis of autism using objective (dsm-iv) criteria and/or standardized objective measures such as those discussed in chapter 15. an audiogram is obtained to rule out hearing loss and an electroencephalogram (eeg) is obtained if there is a clinical suspicion of seizures. cognitive testing, when appropriate, can determine mental retardation. finally, verifying the results of the newborn screen can help rule out rubella and pku. initial assessment involves a standard clinical genetic history and physical exam to identify known syndromes or associated conditions. included would be a wood's lamp examination of the skin to help rule out neurocutaneous conditions such as tsc. also, for suspected diagnoses, standard metabolic screening is performed to check for urine mucopolysaccharides and organic acids as well as serum lactate, amino acid, ammonia, and acylcarnitine profile. if not already performed, high-resolution chromosomal analysis and dna testing for fragile x are sent.
if the studies in the first tier are unrevealing, the second tier checks for duplications and deletions by acgh. for patients with pigmentary abnormalities on exam but with a normal leukocyte karyotype, a skin biopsy can be performed to obtain a fibroblast karyotype. for females, mecp2 gene testing is obtained; for children with head circumference greater than 2.5 sd from the mean, pten gene testing is recommended. lower-yield tests are reserved for the third tier: brain magnetic resonance imaging and serum and urine uric acid. further tests are outlined depending on the results (high or low) of the uric acid tests. we would add here that if a child has significant language impairment, a shank3 mutation should be ruled out. finally, new susceptibility loci that can contribute to the autism phenotype are continually identified and catalogued in the online mendelian inheritance in man (omim) database (http://www.ncbi.nlm.nih.gov/sites/entrez). using this search engine, a patient's phenotype including dysmorphic features can be entered into the program to generate a list of possible genetic diagnoses. most clinical geneticists work with genetic counselors who translate the findings into recurrence risks for full siblings. the tiered clinical genetics evaluation should identify two groups of individuals: those with and those without an identifiable etiology for autism. for those without an identifiable etiology, counseling should be provided according to multifactorial inheritance: a 4% recurrence rate if the proband is a girl and 7% if the proband is a boy. if a second child has autism, a reasonable recurrence rate, based upon published reports, is 30%.
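the expected yields quoted for the tiered evaluation, and the counseling figures for cases without an identifiable etiology, can be tallied in a few lines. this is a sketch for checking the arithmetic only: the dictionary keys and the function sibling_recurrence_risk are names invented for the example, and adding the yields assumes the categories do not overlap, which is our simplification and not something the guidelines state.

```python
# expected diagnostic yields (percent) as quoted in the text
expected_yield = {
    "high-resolution chromosome studies": 5,
    "acgh beyond chromosome analysis": 10,
    "fragile x": 5,
    "mecp2 (females only)": 5,
    "pten (head circumference > 2.5 sd)": 3,
    "other": 10,
}

# simple sum, assuming non-overlapping categories (our assumption)
total_yield = sum(expected_yield.values())

def sibling_recurrence_risk(proband_sex, affected_children=1):
    """Return the recurrence figure (percent) quoted in the text
    for families without an identifiable etiology."""
    if affected_children >= 2:
        return 30
    return 4 if proband_sex == "female" else 7
```

the itemized yields add up to 38%, which is in line with the rounded 40% figure quoted from the guidelines.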
advances in autism genetics: on the threshold of a new neurobiology linkage, association, and geneexpression analyses identify cntnap2 as an autism-susceptibility gene memantine lowers amyloid-beta peptide levels in neuronal cultures and in app/ps1 transgenic mice investigation of potential gene-gene interactions between apoe and reln contributing to autism risk molecular genetic basis of tuberous sclerosis complex: from bench to bedside peripheral biomarkers in autism: secreted amyloid precursor protein-alpha as a probable key player in early diagnosis autism as a strongly genetic disorder: evidence from a british twin study molecular cytogenetic analysis and resequencing of contactin associated protein-like 2 in autism spectrum disorders examination of potential overlap in autism and language loci on chromosomes 2, 7, and 13 in two independent samples ascertained for specific language impairment a major susceptibility locus for specific language impairment is located on 13q21 molecular analysis of fragile x syndrome therapeutic implications of the mglur theory of fragile x mental retardation the mglur theory of fragile x mental retardation fragile x-associated tremor/ataxia syndrome in sisters related to x-inactivation subset of individuals with autism spectrum disorders and extreme macrocephaly associated with germline pten tumour suppressor gene mutations mutation screening of the pten gene in patients with autism spectrum disorders and macrocephaly replication of autism linkage: finemapping peak at 17q21 an autosomal genomic screen for autism an autosomal genomic screen for autism autistic children and their first degree relatives: relationships between serotonin and norepinephrine levels and intelligence autism or atypical autism in maternally but not paternally derived proximal 15q duplication autism, a brain developmental disorder: some new pathophysiologic and genetics findings evidence of brain overgrowth in the first year of life in autism unusual brain 
growth patterns in early life in patients with autistic disorder: an mri study factor analysis of restricted and repetitive behaviors in autism using the autism diagnostic interview-r neuron, know thy neighbor correction of fragile x syndrome in mice mutations in the gene encoding the synaptic scaffolding protein shank3 are associated with autism spectrum disorders gene expression profiles in a transgenic animal model of fragile x syndrome autism is not associated with the fragile x syndrome mouse neurexin-1alpha deletion causes correlated electrophysiological and behavioral changes consistent with cognitive impairments high frequency of neurexin 1beta signal peptide structural variants in patients with autism infantile autism: a genetic study of 21 twin pairs the genetics of autistic disorders and its clinical relevance: a review of the literature evidence that the gene for tuberous sclerosis is on chromosome 9 advances in autism neuroscience in the era of functional genomics and systems biology autism genome-wide copy number variation reveals ubiquitin and neuronal genes recent advances in the genetics of autism the physical and behavioral phenotype lessons from fragile x regarding neurobiology, autism, and neurodegeneration intention tremor, parkinsonism, and generalized brain atrophy in male carriers of fragile x abnormal asymmetry in language association cortex in autism localization of white matter volume increase in autism and developmental language disorder increasing knowledge of pten germline mutations: two additional patients with autism and macrocephaly a full genome screen for autism with evidence for linkage to a region on chromosome 7q. 
international molecular genetic study of autism consortium gene associated with seizures, autism, and hepatomegaly in an amish girl array-based comparative genomic hybridisation identifies high frequency of cryptic chromosomal rearrangements in patients with syndromic autism spectrum disorders mutations of the x-linked genes encoding neuroligins nlgn3 and nlgn4 are associated with autism activated mammalian target of rapamycin pathway in the pathogenesis of tuberous sclerosis complex renal tumors human-specific transcriptional regulation of cns development genes by foxp2 pten regulates neuronal arborization and social interaction in mice a critical analysis of new molecular targets and strategies for drug developments in alzheimer's disease the learn model: an epigenetic explanation for idiopathic neurobiological diseases a forkhead-domain gene is mutated in a severe speech and language disorder genetic dissection of complex traits: guidelines for interpreting and reporting linkage results initial sequencing and analysis of the human genome x-linked mental retardation and autism are associated with a mutation in the nlgn4 gene, a member of the neuroligin family familial deletion within nlgn4 associated with autism and tourette syndrome clinical application of microarray-based molecular cytogenetics: an emerging new era of genomic medicine lack of evidence for an association between wnt2 and reln polymorphisms and autism cadherin-catenin proteins in vertebrate development autistic phenotypes and genetic testing: state-of-the-art for the clinical geneticist current developments in the genetics of autism: from phenome to genome a marker x chromosome fragile-x syndrome and autism: a prevalent association or a misinterpreted connection neuropsychiatry of 18q autism-lessons from the x chromosome structural variation of chromosomes in autism spectrum disorder signaling events regulating the neurodevelopmental triad. 
glutamate and secreted forms of beta-amyloid precursor protein as examples macrocephaly and the control of brain growth in autistic disorders genetic evaluation of autism contribution of shank3 mutations to autism spectrum disorder repetitive and stereotyped movements in children with autism spectrum disorders late in the second year of life identifying autism loci and genes by tracing recent shared ancestry the genetics of autism foxp2 is not a major susceptibility gene for autism or specific language impairment association of specific language impairment (sli) to the region of 7q31 autism genetics: strategies, challenges, and opportunities diagnosing learning disorders: a neuropsychological framework serotonin transporter gene promoter variants do not explain the hyperserotonemia in autistic children latent-class analysis of recurrence risks for complex phenotypes with selection and measurement error: a twin and family history study of autism genetic liability for autism: the behavioral expression in relatives advancing paternal age and autism genetic studies of autism: from the 1970s into the millennium diagnostic yield in the clinical genetics evaluation of autism spectrum disorders genetics evaluation for the etiologic diagnosis of autism spectrum disorders clinical genetics evaluation in identifying the etiology of autism spectrum disorders evidence for multiple loci from a genome scan of autism kindreds autism and maternally derived aberrations of chromosome 15q strong association of de novo copy number mutations with autism fine mapping of autistic disorder to chromosome 15q11-q13 by use of phenotypic subtypes a genomewide scan identifies two novel loci involved in specific language impairment highly significant linkage to the sli1 locus in an expanded sample of individuals affected by specific language impairment genes, language development, and language disorders high levels of alzheimer beta-amyloid precursor protein (app) in children with severely 
autistic behavior and aggression neuroimaging in autistic spectrum disorder (asd) brain structural abnormalities in young children with autism spectrum disorder a twin study of autism in denmark language-impaired preschoolers: a followup into adolescence recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2 autism: the quest for the genes mapping autism risk loci using genetic linkage and chromosomal rearrangements ethical implications of array comparative genomic hybridization in complex phenotypes: points to consider in research prevalence of specific language impairment in kindergarten children molecular genetics of autism spectrum disorder the ray1/st7 tumor-suppressor locus on chromosome 7q31 represents a complex multi-transcript system neurology of the newborn identification of novel autism candidate regions through analysis of reported cytogenetic abnormalities associated with autism the search for autism disease genes evidence supporting wnt2 as an autism susceptibility gene a genome-wide linkage and association scan reveals novel loci for autism fmrp mediates mglur5-dependent translation of amyloid precursor protein analysis of the neuroligin 3 and 4 genes in autism and other neuropsychiatric patients reelin gene alleles and susceptibility to autism spectrum disorders postnatal neurodevelopmental disorders: meeting at the synapse? science acknowledgment thanks to bryan maloney, m.s. for his help in editing this manuscript. 
key: cord-002366-t94aufs3 authors: aurrecoechea, cristina; barreto, ana; basenko, evelina y.; brestelli, john; brunk, brian p.; cade, shon; crouch, kathryn; doherty, ryan; falke, dave; fischer, steve; gajria, bindu; harb, omar s.; heiges, mark; hertz-fowler, christiane; hu, sufen; iodice, john; kissinger, jessica c.; lawrence, cris; li, wei; pinney, deborah f.; pulman, jane a.; roos, david s.; shanmugasundram, achchuthan; silva-franco, fatima; steinbiss, sascha; stoeckert, christian j.; spruill, drew; wang, haiming; warrenfeltz, susanne; zheng, jie title: eupathdb: the eukaryotic pathogen genomics database resource date: 2017-01-04 journal: nucleic acids res doi: 10.1093/nar/gkw1105 sha: doc_id: 2366 cord_uid: t94aufs3 the eukaryotic pathogen genomics database resource (eupathdb, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. to facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and apis. all data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. eupathdb is updated with numerous new analysis tools, features, data sets and data types. new tools include go, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. 
new features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a eupathdb galaxy instance for private analyses of a user's data. forthcoming upgrades include user workspaces for private integration of data with existing eupathdb data and improved integration and presentation of host–pathogen interactions. a unique infrastructure and search strategy system distinguish the eukaryotic pathogen database resource (eupathdb, http://eupathdb.org) from other organism databases. the power of eupathdb lies in the ability to query across hundreds of data sets while refining a set of genes, proteins, pathways or organisms of interest. the interface is designed for easy mastery by biological researchers, enabling in silico experiments that interrogate diverse and complex data sets. despite the sophisticated strategy system, browsing gene pages and genomic spans or regions remains a simple and informative task in this innovative and valuable resource. eupathdb facilitates the discovery of meaningful biological relationships between genomic features such as genes or snps by integrating pre-analyzed data with sophisticated data mining, visualization and analysis tools that are designed to be used by wet-bench researchers. organized into 13 free, online databases eupathdb supports over 170 eukaryotic pathogens with genomic sequence and annotation, functional genomics data, host-response data, isolate and population data and comparative genomics. table 1 provides a web address and a link to a list of organisms supported for each database. all databases are built with the same infrastructure and use the strategies web development kit (1) , which provides a graphical interface for building complex search strategies and exploring relationships across data sets and data types ( figure 1 ; strategy http://plasmodb.org/plasmo/im.do?s=7b88206dd42007c8). 
as one of four national institute of allergy and infectious diseases (niaid/nih) funded bioinformatics resource centers (2-6), eupathdb provides data, tools and services to scientific communities researching pathogens in the niaid list of emerging and re-emerging infectious diseases, which includes niaid category a-c priority pathogens and many fungi. additional eupathdb support for the kinetoplastid and fungal research communities is funded by the wellcome trust in collaboration with genedb (7), including support for focused curated annotation. this manuscript describes expanded content, features and tools added since 2013 that increase the data mining and discovery power of eupathdb. over the past 4 years, eupathdb has routinely updated existing databases and added two new databases. we added new data, expanded the range of supported data types, enhanced infrastructure and added new analysis tools. eupathdb resources have been expanded to include fungidb (http://fungidb.org) (8), which supports fungi and oomycetes, and hostdb (http://hostdb.org), for interrogation of host responses to infection. hostdb supports host data obtained during infections by organisms supported by eupathdb's 10 parasite lineage-specific databases. minot et al. (9), for example, infected murine macrophages with 29 toxoplasma gondii strains and collected mixed parasite-host samples for rna sequencing. reads that align to the t. gondii genome are integrated into toxodb whereas hostdb houses those sequencing reads that align to the m. musculus genome. because all eupathdb databases employ the same data analysis pipelines, search strategy system, visualization and analysis tools, the t. gondii and m. musculus data can be compared. for example, one can easily identify parasite genes that are differentially expressed between two t. gondii strains from toxodb as well as host genes that are differentially expressed during infection with the same two strains from hostdb.
enrichment analyses and comparison of these lists offer insights into host-pathogen interactions and responses. eupathdb tools are conceived and designed to reduce analysis barriers, enhance data mining and improve communication within and between the scientific communities we serve. the near-seamless integration of strategy results with tools for functional enrichment analyses and transcript interpretation, as well as our new galaxy workspace and the availability of publicly shared strategies, augments the data mining experience in eupathdb. galaxy workspace. eupathdb sites now include a galaxy-based (10) workspace for large-scale data analyses, e.g. rna-seq read mapping to a reference genome. developed in partnership with globus genomics (11), workspaces offer a private analysis platform with published workflows and pre-loaded annotated genomes for the organisms we support. the workspace is accessed through the 'analyze my experiment' (figure 2a) tab on the home page of any eupathdb resource and can be used to upload your own data (e.g. rna-seq reads), compose and run preconfigured or custom workflows (figure 2b and c), retrieve your results, visualize them in eupathdb (figure 2d), and share workflows and data analysis results with colleagues. explore transcript subsets. transcript subsets occur when a multi-transcript gene has at least one transcript that does not meet the search criteria. for example, signal peptides are short sequences at the n-terminus of secretory proteins and eupathdb predicts signal peptides for all annotated genomes using signalp (12). the predicted signal peptide search returns genes and transcripts with predicted signal peptides. if one transcript of a multi-transcript gene excludes the exon containing the signal peptide, the search returns the gene but not the signal peptide-deficient transcript. searches and strategies that query transcript-specific data (figure 3a; strategy http://plasmodb.org/plasmo/im.do?s=859df329f857438e) are equipped with an explore tool for interrogating or filtering transcript subsets. the explore tool appears in the gene results tab above the table of ids (figure 3c) and offers filters for transcripts based on their inclusion in the result set. filters are applied to the strategy result and update the gene result list. for two-step strategies where both steps query transcript-specific data, the explore tool offers further filters for viewing transcripts that were returned by both searches, either search or neither search.
enrichment analyses. gene ontology, metabolic pathway and word enrichment analyses are available for gene strategy results to aid with their interpretation (figure 3f). these functional analyses apply fisher's exact test to determine over-represented pathways, ontology terms and product description terms. clicking the analyze results tab of any gene strategy result (figure 3e) and selecting an enrichment analysis will open an analysis tab where users are prompted for parameter values. the results of an enrichment analysis are presented in tabular form and include a list of enriched go terms, pathways or product description words and associated data. public strategies. strategies marked as public when saved to a user's profile will also be shared with the community in the 'public strategies' tab of the 'my strategies' interface. users control the availability of the strategy and can remove it at any time. the panel also includes example strategies provided by eupathdb. data sets search tool. each data set integrated into eupathdb is documented with a data set record which contains information about the data including a description, contact information for the investigator that generated the data, literature references, and when available, example graphs and links to searches and genome browser tracks. links to data set records appear on gene pages and on search pages beneath the parameters.
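the over-representation test named in the enrichment analyses paragraph above can be illustrated with a small, self-contained calculation. this is a sketch of a one-sided fisher's exact test written as a hypergeometric upper tail; it is not eupathdb's code, and whether the site uses a one-sided test or applies multiple-testing correction is not stated here. the function name enrichment_p_value and the gene counts are assumptions made for the example.

```python
from math import comb

def enrichment_p_value(hits, result_size, annotated, background):
    """One-sided Fisher's exact test for over-representation.

    hits        genes in the result that carry the term
    result_size genes in the strategy result
    annotated   genes in the background that carry the term
    background  all genes considered
    Returns P(X >= hits) under the hypergeometric distribution.
    """
    max_hits = min(result_size, annotated)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, result_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / comb(background, result_size)

# hypothetical counts: 20 background genes, 5 annotated with a term,
# a 4-gene result of which 3 carry the term
p = enrichment_p_value(hits=3, result_size=4, annotated=5, background=20)
```

for these made-up counts the tail probability is 155/4845, roughly 0.032. a library routine such as scipy.stats.fisher_exact with alternative="greater" computes the same quantity from a 2x2 table.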
a searchable table of all data sets is available from the data summary tab in the gray drop-down menu bar. eupathdb's philosophy is to provide a data mining platform that allows users to ask their own questions in support of hypothesis driven research. the extensive range of data types (genomic, transcriptomic, proteomic, metabolomic, etc.) maintained by eupathdb broadens the user's ability to mine extensively by providing multiple forms of experimental evidence to interrogate. as the omics world expands, eupathdb endeavors to support meaningful data types and has expanded its coverage over the past few years. source brought many genomes from this large and diverse research community. updates to eupathdb's reflow workflow system (2) make it possible to quickly and reliably analyze and load data. thus, over the past 4 years, numerous functional data sets have been loaded. data sets of interest can be located with the data set search tool described above. protein microarray. this new data type offers a measure of host response to infection by revealing pathogen-specific antibodies in host serum or plasma samples. a typical data set includes data from serum samples collected from patients during an infection (or from healthy controls) that were hybridized to arrays spotted with possible pathogen antigens (peptides representing gene products) (13-16). searches that query this data type are classified under immunology and graphs of a pathogen gene's antigenicity for each sample appear on gene pages. the searches employ the filter parameter for selecting samples based on clinical characteristics of patients when configuring the search (17). metabolic pathways. pathways are integrated from metacyc, kegg, trypanocyc and leishcyc (18-22) as networks of enzymatic reactions and substrate/product compounds. genes are mapped to pathways based on ec numbers.
pathway record pages feature a cytoscape image which can be 'painted' with experimental data, e.g. gene expression values or ortholog profiles. for easy transition to functional analysis, gene search results can be converted to pathways using the transform to pathways function in the add step popup, or users can run a pathways enrichment analysis of their gene result to identify pathways that are statistically enriched. compounds. compound records are integrated from the chemical entities of biological interest (chebi) database (23) and associated to genes through metabolic pathway mappings. lists of compounds are returned based on molecular weight or formula, compound id, enzyme ec number and text. lists of genes and metabolic pathways can be transformed into their associated compounds using the transform function. a genome-wide loss of function screen using crispr technology is available in toxodb and provides a measure of a gene's contribution to parasite fitness (24). quantitative proteomics. this new data type provides evidence for differential protein expression from experimental methods such as silac (25, 26). the searches appear under the proteomics, quantitative mass-spec evidence category and return genes based on the fold change in protein expression between samples. gene pages include graphs of these data when available. copy number variation. whole genome resequencing data are used to estimate chromosome and gene copy number in re-sequenced strains (27). the median read depth is set to the organism's ploidy and each chromosome's median read depth is normalized to this value. contigs that are not assigned to chromosomes are excluded from this analysis. gene copy number is similarly calculated using a normalized read depth for each gene. to compare the number of genes in the re-sequenced genome to the reference genome, genes are grouped into clusters that are inferred to have originated by duplication.
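the read-depth normalization described for copy number variation can be sketched as follows. this is one plausible reading of the description and not the eupathdb workflow itself: the genome-wide median depth is taken to represent the organism's ploidy, and each chromosome's median depth is scaled against it. the function name estimate_copy_numbers and the depth values are assumptions made for the example.

```python
from statistics import median

def estimate_copy_numbers(depth_by_chromosome, ploidy=2):
    """Estimate chromosome copy number from per-window read depths.

    The genome-wide median depth is assumed to correspond to the
    organism's ploidy; each chromosome's median depth is scaled
    relative to that value.
    """
    all_depths = [d for depths in depth_by_chromosome.values() for d in depths]
    genome_median = median(all_depths)
    return {
        chrom: ploidy * median(depths) / genome_median
        for chrom, depths in depth_by_chromosome.items()
    }

# hypothetical read depths; chr3 is sequenced about 1.5x deeper
depths = {
    "chr1": [30, 30, 29, 31, 30],
    "chr2": [30, 30, 30, 28, 32],
    "chr3": [45, 45, 45],
}
copies = estimate_copy_numbers(depths, ploidy=2)
```

with these made-up depths, chr1 and chr2 come out at two copies and chr3 at three. the same scaling could be applied per gene by passing gene-level depths in place of chromosome-level ones.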
searches are categorized under genetic variation and return either genes with a certain copy number or genes with different copy numbers between strains. polysomal transcriptomics. rna-sequencing of polysome or ribosome associated transcripts reveals potential translation events. data sets of this data type are available in plasmodb (28, 29) and tritrypdb (30). categorized under transcriptomics, rna seq evidence, the searches against this new data type return genes with differential translation potential (fold change search) or genes within a certain percentile rank within a sample. expression graphs and rna sequencing coverage plots are available statically on gene pages and dynamically in gbrowse. these coverage plots provide evidence for the cds and translational start site usage. metadata. biological sample characteristics such as host clinical parameters for pathogen isolates or blood samples offer valuable information for stratifying samples while configuring searches. eupathdb integrates metadata when available and presents it in the filter parameter interface to take advantage of the rich data type when selecting samples for data mining (see below). the most recent eupathdb release represents significant updates to the underlying data and infrastructure. in addition to refreshing all data to the latest versions, we added workspaces, redesigned our gene pages, incorporated alternative transcripts into gene pages and searches, updated search categories and contemporized the rna sequence analysis workflow. categories. searches, the experimental data sets they query, and genome browser tracks for visualization are now displayed with a common logic across the websites. the categories are based on the embrace data & methods ontology (edam) (31), which relates biological concepts with bioinformatic analyses. the result is a logical, consistent menu structure from home page to gene page to genome browser.
for example, the category names and order in the home page 'search for genes' (figure 1b) are the same as in the 'contents' section of the gene page (figure 4c). eupathdb's extensive record system documents integrated data and analysis results for entities such as genes, genomic sequences, snps, isolates, compounds and metabolic pathways. record pages have a new streamlined look, contain improved navigation tools, and are reorganized to reflect edam-based categories (figure 4). to view the gene page for pf3d7_0905700 (autophagy-related protein 3, putative), which is highlighted in figure 4, go to http://plasmodb.org/plasmo/app/record/gene/pf3d7_0905700. for example, in gene record pages, gene ids and product descriptions are prominently displayed in the upper left corner of the page with other pertinent gene information and links directly below (figure 4a). also at the top of the page are 'shortcuts' (figure 4b), which serve two functions: clicking on the shortcut's magnifying glass icon offers a larger view of the data, while clicking on the image (or its title) navigates to the data within the gene page. 'view in genome browser' links (e.g. above and below the gene models image in figure 4d) accompany data that are also available for dynamic viewing in the genome browser. these links open the genome browser (gbrowse) (32) with the pertinent data track added to the user's current browser session. the collapsible and interactive 'contents' section reflects the new edam-based categories and features a search function for quickly locating a category (figure 4c). the contents section remains stationary and visible while scrolling the gene page data (figure 4d). a section indicator (small blue circle) appears to the left of the category name of the data currently in view. clicking a category name directs the page to that data section. the check boxes to the right of the category names can be used to customize the data display.
data from categories with empty check boxes will be hidden from view. data tables (4e, 4f and within figure 4d ) are collapsible, interactive, contain sortable columns and present transcript-specific information when data can be unambiguously assigned to a transcript. tables with two or more rows include a search function. the transcriptomics (figure 4e) , protein properties and features ( figure 4f ), mass spec -based expression evidence and sequences tables contain expandable rows for retrieving detailed information. each row of the transcriptomics table represents a data set and expanding a row reveals graphs, data tables, and a data set description, as well as coverage plots for rna sequencing data. expansion of the rows in the protein properties and features table reveals the domains, blastp hits and other analysis results pertinent to the transcript's protein product. the mass spec-based expression evidence graphic table shows proteomic evidence associated with each transcript. the sequences table offers genomic, coding, predicted mrna and predicted protein sequences for each transcript. human and mouse genes (hostdb) have extensive alternative transcripts and there is increasing evidence that many eukaryotic pathogen genes have more than one transcript. eupathdb infrastructure was updated to better represent transcript information. transcripts are graphically represented on gene pages and listed in gene page tables when data can be unambiguously assigned to a transcript (figure 4d ). all gene search results now include a transcript id column ( figure 3c ). the results of searches that query transcript-specific data (e.g. predicted signal peptide) contain an explore tool (see tools section of this manuscript) for investigating transcript subsets ( figure 3b ). filtering samples based on metadata. 
sequences from pathogen isolates and data from host clinical blood samples are often accompanied by rich metadata-sample characteristics including host, age, geographic location, disease status and parasitemia. eupathdb's new filter parameter ( figure 5 ) increases the user's power to mine data via display of sample characteristics (metadata) on the interface for selection of samples while configuring a search or multiple sequence alignment. for example, the filter parameter makes it possible to compare the antigenicity of parasite genes between infected children and uninfected children within the same dataset. the filter parameter is available for searches and sequence alignments that access snp, chip-seq and hostresponse data. rna-sequence analysis workflow updated. our pipeline for analyzing and loading rna-sequence data was updated to use standard tools and to accommodate data sets with biological replicates. the new workflow aligns reads with gsnap and calculates fpkm/rpkm with ht-seq (33, 34) . deseq2 is used to determine differential expression for experiments that have appropriate biological replicates (35) . future development efforts at eupathdb will concentrate on expanding private analysis workspaces and better integration and support for host response to pathogen infection. the galaxy toolshed contains many tools for data analysis. we expect to enhance our existing galaxy workspace with new workflows such as alignment of resequencing reads and snp calls or production of multiple sequence alignments and phylogenetic analyses. critical to our expanded workspace will be the ability for users to fully integrate the results of their analyses into eupathdb so that they can query, view, and share their results in the context of the publicly available data in eupathdb. 
a high priority for eupathdb in the coming year is to better represent host responses to pathogen infection and enable users to mine these data to identify genes (or other entities) and relationships of interest. currently, only a few omics data sets are available for host response, but we expect this situation to change rapidly. we will be expanding not only the amount of host data that we load, but also the types of host response data so that we can include highthroughput metabolic and immune profiling and rich descriptions of all study, experiment and sample metadata. we will be loading these rich multi-dimensional studies and we will be implementing a variety of tools and analyses to mine these data at a systems level. the strategies wdk: a graphical search interface and web development kit for functional genomics databases eupathdb: the eukaryotic pathogen database ) patric, the bacterial bioinformatics database and analysis resource vectorbase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases virus pathogen database and analysis resource (vipr): a comprehensive bioinformatics database and analysis resource for the coronavirus research community influenza research database: an integrated bioinformatics resource for influenza research and surveillance. 
influenza other respir viruses genedb-an annotation database for pathogens fungidb: an integrated functional genomics database for fungi admixture and recombination among toxoplasma gondii lineages explain global genome diversity the galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses signalp 4.0: discriminating signal peptides from transmembrane regions submicroscopic and asymptomatic plasmodium falciparum and plasmodium vivax infections are common in western thailand--molecular and serological evidence a prospective analysis of the ab response to plasmodium falciparum before and after a malaria season by protein microarray plasmodium falciparum protein microarray antibody profiles correlate with protection from symptomatic malaria in kenya malaria transmission, infection, and disease at three sites with varied transmission intensity in uganda: implications for malaria control a framework for global collaborative data management for malaria research kegg as a reference resource for gene and protein annotation leishcyc: a biochemical pathways database for leishmania major leishcyc: a guide to building a metabolic pathway database and visualization of metabolomic data the metacyc database of metabolic pathways and enzymes and the biocyc collection of pathway/genome databases trypanocyc: a community-led biochemical pathways database for trypanosoma brucei the chebi reference database and ontology for biologically relevant chemistry: enhancements for 2013 a genome-wide crispr screen in toxoplasma identifies essential apicomplexan genes quantitative proteomics using silac: principles, applications, and developments proteome remodelling during development from blood to insect-form trypanosoma brucei quantified by silac and mass spectrometry chromosome and gene copy number variation allow major structural change between species and 
strains of leishmania genome-wide regulatory dynamics of translation in the plasmodium falciparum asexual blood stages polysome profiling reveals translational control of gene expression in the human malaria parasite plasmodium falciparum extensive stage-regulation of translation revealed by ribosome profiling of trypanosoma brucei edam: an ontology of bioinformatics operations, types of data and identifiers, topics and formats the generic genome browser: a building block for a model organism system database htseq-a python framework to work with high-throughput sequencing data gmap and gsnap for genomic sequence alignment: enhancements to speed, accuracy, and functionality differential expression analysis for sequence count data the authors wish to thank members of the eupathdb research communities for their willingness to share genomicscale data sets, often prior to publication and for numerous comments and suggestions from our scientific advisors and the scientific community at large, which have helped to improve the functionality of eupathdb resources. we also thank past and present staff associated with the eupathdb brc project, and our research laboratory colleagues whose contributions have facilitated the creation and maintenance of this database resource. key: cord-000402-unr44dvp authors: yoo, hyun jung; yoon, sung soo; park, seon yang; lee, eun young; lee, eun bong; kim, ju han; song, yeong wook title: gene expression profile during chondrogenesis in human bone marrow derived mesenchymal stem cells using a cdna microarray date: 2011-06-20 journal: j korean med sci doi: 10.3346/jkms.2011.26.7.851 sha: doc_id: 402 cord_uid: unr44dvp mesenchymal stem cells (mscs) have the capacity to proliferate and differentiate into multiple connective tissue lineages, which include cartilage, bone, and fat. 
cartilage differentiation and chondrocyte maturation are required for normal skeletal development, but the intracellular pathways regulating this process remain largely unclear. this study was designed to identify novel genes that might help clarify the molecular mechanisms of chondrogenesis. chondrogenesis was induced by culturing human bone marrow (bm) derived mscs in micromass pellets in the presence of defined medium for 3, 7, 14 or 21 days. several genes regulated during chondrogenesis were then identified by reverse transcriptase-polymerase chain reaction (rt-pcr). using an abi microarray system, we determined the differential gene expression profiles of differentiated chondrocytes and bm-mscs. normalization of this data resulted in the identification of 1,486 differentially expressed genes. to verify gene expression profiles determined by microarray analysis, the expression levels of 10 genes with high fold changes were confirmed by rt-pcr. gene expression patterns of 9 genes (hrad6b, annexina2, bmp-7, contactin-1, peroxiredoxin-1, heat shock transcription factor-2, synaptotagmin iv, serotonin receptor-7, axl) in rt-pcr were similar to the microarray gene expression patterns. these findings provide novel information concerning genes involved in the chondrogenesis of human bm-mscs. mesenchymal stem cells (mscs) are present in a variety of tissues during human development, and in particular, are prevalent in adult bone marrow (1) . mscs isolated from bone marrow (bm) and expanded in vitro in their undifferentiated phenotype, retain an extensive capacity for multi-lineage differentiation into chondrocytes, adipocytes, osteoblasts, and tenocytes under appropriate environmental cues (2) . the presence of specific, distinct antigens identified by the monoclonal antibodies sh2, sh3, and sh4, on the surfaces of marrow-derived mscs, that are not present on osteocytes and osteoblasts, suggests that these epitopes are developmentally regulated. 
the antigen which bound to sh2 antibody was identified as endoglin (cd105), a receptor for tgf-β3, which potentially plays a role in mediating the chondrogenic differentiation of mscs and in their interactions with hematopoietic cells (3). chondrogenesis, the differentiation of mscs into chondrocytes, is required for skeletal development and maturation, since the cartilage anlage is the model for bone formation. cartilage development thus includes the differentiation of mscs into chondrocytes, followed by their maturation and, eventually, their hypertrophy and death (4). differential gene expression profiling has been widely performed to identify and characterize candidate genes that play potentially important roles in particular biological processes (5). although the amount of information regarding the role of growth factors and cytokines as inducers and mediators of msc differentiation continues to increase, little is known about the gene expression profile of msc chondrogenic differentiation. in this study, we employed abi genechips (representing > 30,000 genes) to identify genes differentially expressed during bm-msc chondrogenesis. the abi system is a recently introduced technology based on nylon-spotted 60-mer oligonucleotides; it uses one oligomer per gene for most genes, chemiluminescence to measure gene expression levels, and fluorescence to grid, normalize and identify microarray features. the abi gene list was compiled from information in public and celera databases (6). this study was designed to identify differential gene expression profiles and novel genes that might be involved in bm-msc chondrogenesis. mononuclear cells from bm aspirates were isolated by ficoll-paque density gradient separation.
bm was placed in a 50 ml syringe containing 5,000 units of preservative-free heparin, diluted 1:1 with phosphate buffered saline (pbs), resuspended in pbs to a final volume of 10 ml, and layered over an equal volume of histopaque-1077 (sigma chemical co., st. louis, mo, usa). after centrifugation at 2,000 rpm for 30 min, mononuclear cells were recovered from the gradient interface, rinsed twice in pbs, adjusted to a concentration of 1.5 × 10^7 cells/10 ml, and seeded onto 100-mm culture plates in dulbecco's modified eagle's medium-low glucose (dmem-lg; 1 g/l glucose, jbi, seoul, korea) containing 1% penicillin-streptomycin (p/s; 10,000 units/ml, gibco/brl, new york, ny, usa) and 10% (v/v) heat-inactivated fetal bovine serum (fbs; hyclone, logan, ut, usa). total numbers of nucleated and viable cells were determined using a hemocytometer and trypan blue (gibco/brl, gaithersburg, md, usa) staining. cells were incubated at 37°c in a humidified 5% co2 atmosphere and allowed to adhere for 72 hr. non-adherent cells were then removed. the medium was changed twice a week. when cells were 80%-90% confluent, adherent cells were trypsinized (0.05% trypsin, gibco/brl) at 37°c for 5 min and replated in 100-mm culture plates. after passage 3, a morphologically homogeneous population of adherent cells was obtained. during this expansion, medium was changed every 4-5 days. mscs that adhered to spot slide bottoms were fixed with -20°c methanol (100%) for 5 min. cells were then rehydrated in pbs for 5 min at room temperature, washed three times with pbs, blocked with 3% bovine serum albumin in pbs, and incubated overnight at 4°c with sh2 (american type culture collection [atcc], rockville, va, usa) as a positive control. primary antibody (sh2) was removed by washing three times with pbs, and cells were then incubated with fluorescein isothiocyanate (fitc)-labeled affinity-purified antibody to mouse igg + igm (h + l) (dinona inc., seoul, korea) for 1 hr at room temperature.
secondary antibodies were removed by washing three times with pbs. coverslips were mounted onto slides with a solution containing 50% pbs and 50% glycerol. labeled cells were observed under an axiovert 200 (zeiss, thornwood, ny, usa). flow cytometry was performed to identify mscs positive for sh2. cells were permeabilized with ice-cold 75% methanol in pbs for 30 min at 4°c, and washed three times. a fitc-conjugated sh2 antibody (dinona), diluted 1:500 in pbs, was then added, and cells were incubated for 1 hr at 4°c. cells were analyzed within 2 hr of staining using a flow cytometer (facscalibur, becton dickinson, bedford, ma, usa). a total of 1 × 10^6 cells were collected for each measurement. negative control samples were stained with an isotype-matched irrelevant mab. to induce chondrogenic differentiation, 200,000 mscs were placed in a 15-ml polypropylene tube and centrifuged at 1,000 rpm for 5 min. pellets were then cultured at 37°c in 5% co2 and 300 μl of serum-free chondrogenic medium consisting of dulbecco's modified eagle medium-high glucose (dmem-hg, jbi) supplemented with 10 ng/ml of transforming growth factor-β3 (tgf-β3, r&d systems, minneapolis, mn, usa), 100 nm dexamethasone (sigma-aldrich, st. louis, mo, usa), 50 μg/ml ascorbate-2-phosphate, 100 μg/ml pyruvate, and 50 mg/ml its + premix (becton dickinson biosciences, bedford, ma, usa; 6.25 μg/ml insulin, 6.25 μg/ml transferrin, 6.25 ng/ml selenious acid, 1.25 mg/ml bovine serum albumin [bsa], and 5.35 mg/ml linolenic acid); the medium was replaced every 2-3 days for 3, 7, 14, or 21 days. total rna was extracted from undifferentiated mscs and from pellets after 3, 7, 14, or 21 days of differentiation using rneasy kits (qiagen, valencia, ca, usa), according to the manufacturer's instructions; 1,200 μl of rlt buffer supplemented with beta-mercaptoethanol (10 μl/ml) was added to the washed cells.
rna integrity was assessed by gel electrophoresis and rt-pcr, and concentrations were determined by measuring absorbance at 260 nm. total rna was processed using the genesys applied biosystem facility (genesys, munster, germany), according to the manufacturer's recommendations. each rna pool (2 μg) was labeled with digoxigenin-utp using the abi chemiluminescent rt-ivt labeling kit v 1.0. double-stranded cdna was prepared from total rna. utp-digoxigenin-labeled complementary rna (crna) was synthesized by in vitro transcription. labeled crna (10 μg) was hybridized to the abi human genome survey microarray v 2.0, which was then incubated with alkaline phosphatase-linked digoxigenin antibody. phosphatase activity was then initiated to produce the chemiluminescent signal. chemiluminescent (crna) and fluorescent (spot background) signals of the crna and standard control spots were then scanned. chemiluminescent detection and image acquisition were performed using an applied biosystems 1700 (ab1700) chemiluminescent microarray analyzer, according to the manufacturer's instructions. using the software supplied with the ab1700 apparatus, the spot chemiluminescent signal was normalized over the fluorescent signal of the same spot to obtain a normalized signal value. for inter-array normalization, global median normalization was applied across all microarrays (7). data analysis and data normalization were performed using the method described by quackenbush (8). for background correction, the mean intensities of areas surrounding spots were subtracted from spot intensities (local area background). data sets were normalized by dividing the mean intensity value of every spot (in duplicate) by the sum of all spot intensities within a sample to eliminate experimental or data acquisition variations. normalized data were used to calculate the gene expression level ratios of different culture stages. a two-fold expression cut-off was applied.
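the last two steps above (scaling each spot by the total intensity of its sample, then applying a two-fold cut-off to the expression ratio) can be sketched as follows. this is a minimal illustration rather than the authors' analysis code: the function names and the dictionary input are hypothetical, and background subtraction and averaging of duplicate spots are assumed to have been done beforehand.

```python
def normalize_by_total(intensities):
    """Divide each spot intensity by the sum of all spot intensities in the sample."""
    total = sum(intensities.values())
    return {gene: value / total for gene, value in intensities.items()}


def two_fold_changes(reference, treated, cutoff=2.0):
    """Return {gene: ratio} for genes whose normalized treated/reference ratio
    is at least `cutoff` (up) or at most 1/cutoff (down)."""
    ref = normalize_by_total(reference)
    trt = normalize_by_total(treated)
    changed = {}
    for gene in ref.keys() & trt.keys():
        # Skip spots with no signal to avoid division by zero.
        if ref[gene] <= 0 or trt[gene] <= 0:
            continue
        ratio = trt[gene] / ref[gene]
        if ratio >= cutoff or ratio <= 1.0 / cutoff:
            changed[gene] = ratio
    return changed


if __name__ == "__main__":
    undifferentiated = {"g1": 10.0, "g2": 10.0, "g3": 80.0}
    day_14 = {"g1": 40.0, "g2": 10.0, "g3": 50.0}
    print(two_fold_changes(undifferentiated, day_14))
```

in this toy input only g1 passes the cut-off (a four-fold increase); g3 falls to 0.625 of its reference level, which is inside the two-fold window and is therefore not reported.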
for hierarchical gene cluster analysis, expression ratios were calculated for all genes as described by eisen et al. (9). first strand cdna was synthesized using reverse transcriptase (rt) and 1 μg of total rna. reactions were conducted in 20 μl of buffer containing 0.5 μl oligo (dt)12-18 primer (gibco/brl, grand island, ny, usa), 50 mm tris-hcl (ph 8.3), 75 mm kcl, 3 mm mgcl2, 40 mm dtt, 0.5 mm deoxynucleotide triphosphate (dntp) mixture (invitrogen, carlsbad, ca, usa), 10 units of rnase inhibitor (gibco/brl), and 200 units of mmlv reverse transcriptase (invitrogen). after incubation at 37°c for 60 min, reactions were stopped by heating at 70°c for 15 min. to remove remaining rna, 1 μl of e. coli rnase h (4 mg/ml) was added to reaction mixtures and incubated at 37°c for 30 min. the cdnas obtained were used as templates for pcr amplification using gene-specific primers for target genes and for glyceraldehyde 3-phosphate dehydrogenase (gapdh). primer sequences are listed in table 1. the in vitro growth pattern of mscs is shown in fig. 1. human bone marrow-derived mscs were cultured and expanded. during the log phase of growth, cells proliferated with a population doubling time of 30 hr, and this growth period was followed by a confluent growth-arrested phase. colonies were examined approximately 7 days after initial plating. a morphologically homogeneous population of 90% confluent fibroblast-like cells was obtained after 2 weeks. the cells were replated into culture dishes and cultured for 2 weeks. the replated cells were used for subsequent experiments. the cultured mscs were positive for sh2 by flow cytometry (fig. 2). mscs were pelleted into micromasses and differentiated in serum-free medium in the presence of tgf-β3 and dexamethasone. immediately after centrifugation, the cells appeared as flattened pellets at the bottom of tubes. one day later, pellets had a thickened lip, and between days 2 and 3, pellets became spherical without any increase in size.
pellets then grew in size and pellet diameters increased about 2-fold by days 7 and 14 (fig. 3a). using normalized microarray data, we identified 1,486 differentially expressed genes (fig. 4a), which included 1,303, 121, 40, 20, and 2 genes exhibiting fold changes of 2 to < 5, 5 to < 10, 10 to < 20, 20 to < 70 and > 70, respectively. to verify gene expression profiles determined by microarray analysis, the expression levels of 10 genes with high fold changes (2-48 fold changes, table 2) were confirmed by rt-pcr. the expression levels of the 10 genes selected (hrad6b, annexin a2, bmp-7, contactin-1, peroxiredoxin-1, heat shock transcription factor-2, synaptotagmin iv, serotonin receptor-7, axl, and il-15) were analyzed by rt-pcr, using total rnas obtained from 5 samples (fig. 4b). the expression levels of 9 genes (hrad6b, annexin a2, bmp-7, contactin-1, peroxiredoxin-1, heat shock transcription factor-2, synaptotagmin iv, serotonin receptor-7, axl) were low in undifferentiated cells and increased in differentiated cells by both rt-pcr and microarray, but the expression pattern of il-15 was different. the expression level of il-15 tended to decrease in the microarray, but increased in rt-pcr (fig. 5). in this study, we determined gene expression profiles in differentiated chondrocytes and bm-mscs. the microarray technology used did not allow quantitative comparisons between the expression levels of different genes, but did allow us to compare fold changes with time and quantify differences in the expression of multiple genes. our results show the sequence of gene expression changes during bm-msc chondrogenesis. microarray data showed that axl, synaptotagmin iv, hrad6b, peroxiredoxin-1, bmp-7, heat shock transcription factor-2, annexin a2, contactin-1 and serotonin receptor-7 expression was maintained in differentiating bm-mscs until day 14.
axl is overexpressed in a number of tumors (10), and il-15 is known to mediate the transactivation and upregulation of axl with subsequent activation of pi3k/akt and upregulation of bcl-2 and bcl-xl (11). on the other hand, synaptotagmin iv is required for the maturation of secretory granules in pc12 cells (12). human homologues of yeast rad6, such as hrad6b, encode ubiquitin-conjugating enzymes, and hrad6b is highly expressed in lung cancer cells. it has been reported that dna repair and uv mutagenesis are defective in the saccharomyces cerevisiae rad6 mutant (13). peroxiredoxin-1 is the most ubiquitously expressed member of the peroxiredoxin family, and is found in the cytoplasm, nucleus, mitochondria, and peroxisomes of many cell types (14). furthermore, recent studies have reported high levels of peroxiredoxin-1 expression in the bovine bladder, seminal vesicles, testes, adrenal gland (15), and in the rat liver, skin, lungs and nervous system (14). the role of peroxiredoxin-1 in cell differentiation and proliferation suggests that it has a possible role in growth and development. recent studies have confirmed that bmp-7 is a strong chemotactic component in cartilage cells produced by mesenchymal stem cells, and that it can promote the secretion of specific extracellular matrix (proteoglycans and collagen type ii) by cartilage cells. bmp-7 can also induce the differentiation of bm-mscs into cartilage cells, and offers greater efficiency in repairing cartilage and subchondral bone defects (16). heat shock transcription factor-2 (hsf-2) has been shown to be a transcriptional regulator of heat shock protein gene expression during the differentiation and development of eukaryotic cells in a tissue-dependent manner (17). hsf-2 plays an important role in fgf-2 stimulated osteoclast formation, and hsf-2 deficiency was found to modulate gene expression in stromal/preosteoblast cells and affect osteoclastogenesis in the bone microenvironment (18).
annexins bind to negatively charged phospholipids in a ca 2+ -dependent manner and have a conserved structure. the human annexin, annexin a2 (alternative names: annexin ii, p36, and lipocortin ii) is expressed abundantly in various human organs, including the placenta, lungs, heart, and liver (19) . at the cellular level, annexin a2 is expressed on endothelial cell surfaces and acts as a co-receptor for plasminogen and tissue plasminogen activator (20) . furthermore, annexins are commonly dysregulated in cancer (21) and annexin a2 is upregulated in a variety of tumors and cancer cell lines (22, 23) . contactin-1 is a cell surface adhesion molecule, which is normally expressed by neurons, oligodendrocytes, and human astrocytic gliomas (24, 25) . previous studies have reported that mscs express il-15, essential hematopoietic growth factor (26, 27) and il-15 is also a potent apoptosis inhibitor and has many immunomodulatory activities (28) . the serotonin receptor 7 is the most recently identified member of the serotonin receptor family and is found in brain, mainly in the hypothalamus, thalamus, hippocampus, and cortex (29) . in the present study, we performed microarray analysis during bm-msc chondrogenesis in vitro. we found that over 1,486 genes were expressed by bm-mscs during chondrogenesis, and we identified genes that were differentially expressed. these data may provide novel information of the genes involved in chondrogenesis of human bm-mscs. multilineage potential of adult human mesenchymal stem cells transgene expression and differentiation of baculovirus-transduced human mesenchymal stem cells cell surface antigens on human marrow-derived mesenchymal cells are detected by monoclonal antibodies gene expression profiling following bmp-2 induction of mesenchymal chondrogenesis in vitro. 
osteoarthritis suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cdna probes and libraries cross platform microarray analysis for robust identification of differentially expressed genes open software development for computational biology and bioinformatics microarray data normalization and transformation cluster analysis and display of genome-wide expression patterns gas6 induces growth, beta-catenin stabilization, and t-cell factor transcriptional activation in contact-inhibited c57 mammary cells a promiscuous liaison between il-15 receptor and axl receptor tyrosine kinase in cell death control synaptotagmin iv is necessary for the maturation of secretory granules in pc12 cells decreased hrad6b expression in lung cancer differential cellular and subcellular localization of heme-binding protein 23/peroxiredoxin i and heme oxygenase-1 in rat liver cloning of bovine peroxiredoxins-gene expression in bovine tissues and amino acid sequence comparison with rat, mouse and primate peroxiredoxins bmp7 induces the differentiation of bone marrow-derived mesenchymal cells into chondrocytes heat shock factor 2 is activated during mouse heart development rank ligand expression in heat shock factor-2 deficient mouse bone marrow stromal/preosteoblast cells differential expression of annexins i, ii and iv in human tissues: an immunohistochemical study specific interaction of tissue-type plasminogen activator (t-pa) with annexin ii on the membrane of pancreatic cancer cells activates plasminogen and promotes invasion in vitro annexin a2 on lung epithelial cell surface is recognized by severe acute respiratory syndrome-associated coronavirus spike domain 2 antibodies tenascin c and annexin ii expression in the process of pancreatic carcinogenesis redox regulation of annexin 2 and its implication for oxidative stress-induced renal carcinogenesis and metastasis faivre-sarrailh c. 
f3/contactin, a neuronal cell adhesion molecule implicated in axogenesis and myelination contactin is expressed in human astrocytic gliomas and mediates repulsive effects phenotypic and functional comparison of cultures of marrow-derived mesenchymal stem cells (mscs) and stromal cells gene expression profile of cytokine and growth factor during differentiation of bone marrow-derived mesenchymal stem cell death deflected: il-15 inhibits tnfalpha-mediated apoptosis in fibroblasts by traf2 recruitment to the il-15r alpha chain functional, molecular and pharmacological advances in 5-ht7 receptor research mesenchymal stem cells (mscs) have the capacity to proliferate and differentiate into multiple connective tissue lineages such as cartilage and bone. in this study, using an abi microarray system, the authors determined the differential gene expression profiles of differentiated chondrocytes and bone marrow (bm)-mscs. normalization of this data resulted in the identification of 1,486 differentially expressed genes. to verify gene expression profiles of microarray, rt-pcr was also performed. gene expression patterns of 9 genes in rt-pcr were similar to the microarray results. these findings provide novel information concerning genes involved in the chondrogenesis of human bm-mscs. key: cord-016200-zfh20im0 authors: saxena, jyoti; rawat, shweta title: edible vaccines date: 2013-10-22 journal: advances in biotechnology doi: 10.1007/978-81-322-1554-7_12 sha: doc_id: 16200 cord_uid: zfh20im0 in recent years edible vaccine emerged as a new concept developed by biotechnologists. edible vaccines are subunit vaccines where the selected genes are introduced into the plants and the transgenic plant is then induced to manufacture the encoded protein. foods under such application include potato, banana, lettuce, corn, soybean, rice, and legumes. they are easy to administer, easy to store and readily acceptable delivery system for different age group patients yet cost effective. 
edible vaccines present exciting possibilities for significantly reducing various diseases such as measles, hepatitis b, cholera, diarrhea, etc., mainly in developing countries. however, various technical and regulatory challenges need to be overcome on the path of this emerging vaccine technology to make edible vaccines more efficient and applicable. this chapter attempts to discuss key aspects of edible vaccines such as host plants, production, mechanism of action, advantages and limitations, applications, and the regulatory issues concerning edible vaccines. a vaccine is a biological preparation intended to produce immunity to a disease by stimulating the production of antibodies. dead or attenuated organisms or purified products derived from them are generally used to produce various vaccines. over the past decade, scientific advances in genetics, molecular biology, and plant biotechnology have improved the understanding of many infectious diseases and led to the development of vaccination programs. the most common method of administering vaccines is by injection, but some are given by mouth or nasal spray. although immunization is the safest method to combat diseases worldwide, there are many constraints regarding its mode of production, distribution, delivery and cost, as well as a lack of sufficient research. hence it is desirable to look for a method of immunization that is effective and powerful yet cost effective, easy to store and distribute, and safe. it should also be readily acceptable to all sociocultural groups around the globe. research underway is dedicated to solving these problems by finding ways to produce edible vaccines in the form of transgenic plants, which have been investigated as an alternative means to produce and deliver vaccines. edible vaccines are called by several alternative names such as food vaccines, oral vaccines, subunit vaccines, and green vaccines. they seem to be a viable alternative, especially for poor and developing countries.
they have emerged as a great boon to medical science, for which biotechnologists deserve the credit. the concept of edible vaccines lies in converting edible food into potential vaccines to prevent infectious diseases. it involves introducing selected genes into plants and then inducing these altered plants to manufacture the encoded proteins. the approach has also found application in the prevention of autoimmune diseases, birth control, cancer therapy, etc. edible vaccines are currently being developed for a number of human and animal diseases. this new technology will hopefully contribute positively to global vaccine programs and have a dramatic impact on health care in developing countries. many people in developing countries do not have access to the vaccines they need, because traditional vaccines are costly, require skilled medical personnel for administration, and are less effective in inducing a mucosal immune response. these needs inspired hiatt et al. (1989), who attempted to produce antibodies in plants that could serve the purpose of passive immunization. the first report of an edible vaccine (a surface protein from streptococcus) in tobacco, at 0.02 % of total leaf protein, appeared in 1990 in the form of a patent application published under the international patent cooperation treaty. dr. charles arntzen conceived the idea of the edible vaccine and worked to realize it (arntzen 1997). in 1992, arntzen and coworkers introduced the concept of transgenic plants as a production and delivery system for subunit vaccines in which edible tissues of transgenic crop plants were used (mor et al. 1998). they found that this concept could overcome the limitations of traditional vaccines, thereby triggering research on edible vaccines. in the 1990s, streptococcus mutans surface protein antigen a was expressed for the first time in tobacco.
the same group also pioneered the field with work on hepatitis b and the heat-labile toxin b subunit in tobacco plants and potato tubers. in the same year, the successful expression of hepatitis b surface antigen (hbsag) in tobacco plants was also achieved (mason et al. 1992). to prove that plant-derived hbsag could stimulate mucosal immune responses via the oral route, potato tubers were used as an expression system and were optimized to increase the accumulation of the protein in plant tubers (richter et al. 2000). parallel to the evaluation of plant-derived hbsag, mason and arntzen explored plant expression of other vaccine candidates, including the labile toxin b subunit (lt-b) of enterotoxigenic escherichia coli (etec) and the capsid protein of norwalk virus. the plant-derived proteins correctly assembled into functional oligomers that could elicit the expected immune responses when given orally to animals (mason et al. 1998). in 1998 a new era opened in vaccine delivery when researchers supported by the national institute of allergy and infectious diseases (niaid) showed for the first time that an edible vaccine can safely generate significant immune responses in people. the report by collaborators from the university of maryland in baltimore, the boyce thompson institute for plant research in ithaca, n.y., and tulane university in new orleans appeared in the may issue of nature medicine. according to the then director of niaid, ''edible vaccines offer exciting possibilities for significantly reducing the burden of diseases like hepatitis and diarrhea, particularly in the developing world where storing and administering vaccines are often major problems.'' mor et al. (1999) also discussed the rapid increase of research in the edible-vaccine field and pointed out that plants could be used to create multicomponent vaccines that can protect against several pathogens at once. this is an aspect of the edible-vaccine approach that further strengthens its impact.
later, in 2003, sala and coworkers reported that proteins produced in these plants induced the mucosal immune response, which was the main aim behind this concept. research into edible vaccines is still at a very early stage, and scientists have a long way to go before they become a major part of immunization programs worldwide. to date, many plant species have been used for vaccine production. the choice of the plant species is important. an edible, palatable plant is necessary if the vaccine is planned for raw consumption. in the case of vaccines for animal use, the plant should preferably be selected from those consumed as a normal component of the animal's diet. some food vehicles are discussed below. the concept of edible vaccines gained impetus after arntzen and coworkers expressed hbsag in tobacco. the first edible vaccine was produced in tobacco in 1990, in which the recombinant protein (a surface protein from streptococcus) made up 0.02 % of the total soluble leaf proteins. it appeared in the form of a patent application published under the international patent cooperation treaty. transgenic tobacco has been successfully engineered for the production of edible vaccines against hepatitis b using the 's' gene of hepatitis b virus (hbv). the optimum level of recombinant protein was obtained in leaves and seeds. since acute watery diarrhea is caused by enterotoxigenic e. coli and vibrio cholerae, which colonize the small intestine and produce one or more enterotoxins, an attempt was made to produce an edible vaccine by expressing the heat-labile enterotoxin b subunit (lt-b) in tobacco. in addition, antibodies against dental caries, expressed in tobacco, are already in preclinical human trials. italian researchers have developed an immunologically active, cost-efficient vaccine against human papilloma viruses (hpv). hpv are the causative agents of cervical cancer and are also involved in skin, head, and neck tumors.
cervical cancer is one of the main causes of cancer-related deaths. genetically modified potatoes are also a viable option and seem to be the desired vector. many of the first edible vaccines were synthesized in potato plants. in a human trial, 10 of the 11 volunteers who ate the transgenic potatoes showed a fourfold rise in serum antibodies at some point after immunization, and 6 of the 11 (55 %) also showed a fourfold rise in intestinal antibodies. the potatoes were well tolerated and no one experienced serious adverse side effects. a potato-based vaccine has also been tested successfully against the norwalk virus, which is spread by contaminated food and water. the virus causes severe abdominal pain and diarrhea. a research team led by william langridge of loma linda university in california has reported that transgenic potatoes engineered with a cholera antigen, ctb, can effectively immunize mice. mice fed transgenic potatoes produce cholera-specific antibodies in their serum and intestine; iga and igg antibodies reach their highest levels after the fourth feeding. in yet another experiment, genetically engineered potatoes containing a hepatitis b vaccine successfully boosted immunity in their first human trials. attempts have also been made to boil the potatoes, as raw potatoes are not very appetizing, but unfortunately the cooking process breaks down about 50 % of the proteins in the vaccine. while some proteins are more tolerant of heat, for most proteins it will be necessary to increase the amount of protein in the engineered foods if they are to be cooked before consumption. tomatoes are an excellent candidate because they are easy to manipulate genetically and new crops can be grown quickly. moreover, they are palatable and can be eaten raw. while tomatoes do not grow well in the regions in which edible vaccines are most needed, the engineered tomatoes can be dried or made into a paste to facilitate their delivery.
anti-malaria edible vaccines in different transgenic tomato plants expressing antigenic type(s) were proposed by chowdhury and bagasara in 2007. they hypothesized that immunizing individuals against 2-3 antigens and against each stage of the life cycle of the multistage parasite would be an efficient, inexpensive, and safe way of vaccination. tomatoes of varying sizes, shapes, and colors carrying different antigens would make the vaccines easily identifiable by lay individuals. tomatoes are considered an ideal candidate for the hiv antigen because, unlike other transgenic plants that carry the protein, they are edible and resistant to thermal processing, which helps to retain their healing capabilities. scientists have claimed that tomatoes could be used as a vaccine against alzheimer's disease. work is in progress to genetically modify the fruit to create an edible vaccine that stimulates the immune system to tackle the disease by attacking the toxic beta-amyloid protein that destroys vital connections between brain cells, causing alzheimer's. researchers have engineered tomato plants (lycopersicon esculentum mill var. uc82b) to express a gene for the glycoprotein (g-protein) that coats the outer surface of the rabies virus. the recombinant constructs contained the g-protein gene from the era strain of rabies virus, including the signal peptide, under the control of the 35s promoter of cauliflower mosaic virus (camv). a common fruit, the banana, is currently being considered as a potential vehicle for vaccines against serious as well as very common diseases. the advantage of bananas is that they can be eaten raw, unlike potatoes or rice, which need to be cooked, and can also be consumed in a pure form. furthermore, children tend to like bananas, and the plants grow well in the tropical areas in which the vaccines are needed the most.
hence, research is leaning toward the use of banana as the vector, since a large number of third-world countries, which would benefit the most from edible vaccines, have tropical climates. on the negative side, a new crop of banana plants takes about 12 months to bear fruit. after fruiting, the plants are cut down and a new crop of vaccine-bearing plants must be planted. researchers have also developed bananas that deliver a vaccine for hbv. the banana vaccine is expected to cost just 2 cents a dose, as compared to $125 for the currently available injectable vaccine. maize has also been used as a vector for various edible vaccines. egyptian scientists have genetically engineered maize plants to produce a protein known as hbsag, which elicits an immune response against the hepatitis b virus and could be used as a vaccine. if human trials are successful, the more than 2 billion people infected with hepatitis b, about 350 million of whom are at high risk of serious illness and death from liver damage and liver cancer, would benefit. research is under way at iowa state university with the aim of allowing pigs and humans to get a flu vaccination simply by eating corn or corn products. it is quite likely that a corn vaccine would work in humans when they eat corn or even corn flakes, corn chips, tortillas, or anything that contains corn. genetically modified maize could also protect chickens against a highly contagious and fatal viral disease affecting most species of birds. mexican researcher octavio guerrero-andrade and his colleagues at the centre for research and advanced studies in guanajuato, central mexico, genetically modified maize to create an edible vaccine against newcastle disease virus (ndv). they inserted a gene from ndv, a major killer of poultry in developing countries, into the maize dna and found antibodies against the virus in chickens that ate the genetically modified maize.
one pig vaccine has also been produced successfully in corn. the us company prodigene is working to genetically modify maize to contain a key protein found on the surface of the monkey form of hiv. according to the us national institutes of health, this development brings an edible, more effective hiv vaccine for people a step closer. transgenic maize expressing the rabies virus glycoprotein (g) of the vnukovo strain has also been produced using the maize ubiquitin promoter fused to the whole coding region of the rabies virus g gene, and a constitutive promoter from camv. maize embryogenic calli were transformed with the above construct by biolistics. regenerated maize plants were recovered and grown in a greenhouse. the amount of g-protein detected in the grains was approximately 1 % of the total soluble plant protein. rice is another potential crop that has been used for developing vaccines. it offers several advantages over traditional vaccines; for example, it does not require refrigeration. in fact, the rice proved just as potent after 18 months of storage at room temperature, and the vaccine did not dissolve when exposed to stomach acids. in one study, predominant t cell epitope peptides derived from japanese cedar pollen allergens were specifically expressed in rice seeds and delivered to the mucosal immune system (mis); the development of an allergic immune response of the allergen-specific th2 cell was suppressed. furthermore, not only were specific ige production and release of histamine from mast cells suppressed, but the inflammatory symptoms of pollinosis, such as sneezing, were also suppressed. these results suggest the feasibility of using an oral immunotherapy agent derived from transgenic plants that accumulate t cell epitope peptides of allergens for allergy treatment. the transfer of genetic material from the microbe responsible for producing cholera toxin into a rice plant has been achieved.
the plants produced the toxin, and when the rice grains were fed to mice they provoked immunity to the diarrhea-causing bacterium. genetically modified spinach has also been considered for the development of edible vaccines. spinach is being investigated as a plant-derived, edible vehicle for anthrax vaccine, as well as a vehicle for the hiv-1 tat protein (a prospective vaccine candidate). in one experiment, a fragment of protective antigen (pa) that represents most of the receptor-binding domain was expressed as a translational fusion with a capsid protein on the outer surface of tobacco mosaic virus, and spinach was inoculated with the recombinant virus. the plant-expressed pa is highly immunogenic in laboratory animals. among other food crops with potential to be developed as edible vaccines, sweet potato, peanuts, lettuce, watermelon, and carrots are the top priorities. the development of plant-based vaccines to protect against many other diseases, such as hiv-1, hepatitis b, rabies, and non-hodgkin's lymphoma, is ongoing throughout the globe using one of these edible plants. the advantages and disadvantages of various plant host systems are given in a table. conventional subunit vaccines are expensive and technology-intensive, need purification, require refrigeration, and produce a poor mucosal response. in contrast, edible vaccines would enhance compliance, especially in children, and because of oral administration would eliminate the need for trained medical personnel. their production is highly efficient and can be easily scaled up. for example, the hepatitis b antigen required to vaccinate the whole of china annually could be grown on a 40-acre plot, and that for all babies in the world each year on just 200 acres of land. they are cheaper, sidestepping demands for purification (a single dose of hepatitis b vaccine would cost approximately 0.43 cents), can be grown locally using standard methods, and do not require capital-intensive pharmaceutical manufacturing facilities.
mass production would also decrease dependence on foreign supply. fear of contamination with animal pathogens, like the agent of mad cow disease, which is a threat in vaccines manufactured from cultured mammalian cells, is eliminated because plant viruses do not infect humans. edible vaccines activate both mucosal and systemic immunity, as they come in contact with the digestive tract lining, which is not possible with conventional subunit vaccines that provide a poor mucosal response. this dual effect of edible vaccines provides first-line defense against pathogens invading through the mucosa, such as mycobacterium tuberculosis and agents causing diarrhea, pneumonia, stds, hiv, etc. the specific advantages are stated below:
1. edible means of administration.
2. no need of medical personnel and syringes.
3. sterile injection conditions are no longer required.
4. economical in mass production by breeding compared to an animal system.
5. easy for administration and transportation.
6. effective maintenance of vaccine activity by controlling the temperature in plant cultivation.
7. therapeutic proteins are free of pathogens and toxins.
8. storage near the site of use.
9. heat stable, thus eliminating the need for refrigeration.
10. antigen protection through bioencapsulation.
11. subunit vaccine (not attenuated vaccine) means improved safety.
12. seroconversion in the presence of maternal antibodies.
13. generation of systemic and mucosal immunity.
14. enhanced compliance (especially in children).
15. delivery of multiple antigens.
16. integration with other vaccine approaches.
17. plant-derived antigens assemble spontaneously into oligomers and into virus-like particles.
18. no serious side effects have been noticed until now.
19. reduced risk of anaphylactic side effects from edible vaccines compared with injection is one benefit reported by bio-medicine.org.
they reported that the edible vaccine carries only part of the allergen, unlike injected preparations, which reduces anaphylactic risk.
20. administration of edible vaccines to mothers to immunize the fetus in utero by transplacental transfer of maternal antibodies, or the infant through breast milk. edible vaccines have a potential role in protecting infants against diseases like group b streptococcus, respiratory syncytial virus (rsv), etc., which is under investigation.
21. edible vaccines would also be suitable against neglected/less common diseases like dengue, hookworm, rabies, etc. they may be integrated with other vaccine approaches, and multiple antigens may also be delivered.
with advancement come many hurdles and problems, and this is true for edible vaccines. for example, one could develop immunotolerance to the vaccine peptide or protein, though little research has been done on this. one of the key goals of the edible-vaccine pioneers was to reduce immunization costs, but many limitations were later reported, as given below:
1. dosage is not consistent from fruit to fruit, plant to plant, lot to lot, and generation to generation.
2. stability of the vaccine in fruit is not known.
3. evaluation of the dosage requirement is tedious.
4. selection of the best plant is difficult.
5. certain foods like potatoes are generally not eaten raw, and cooking the food might weaken the medicine present in it.
6. not convenient for infants, as they might spit it out, eat only a part, or eat it all and throw it up later. concentrating the vaccine into a teaspoon of baby food may be more practical than administering it in a whole fruit.
7. there is always the possibility of side effects due to the interaction between the vaccine and the vehicle.
8. people could ingest too much of the vaccine, which could be toxic, or too little, which could lead to disease outbreaks among populations believed to be immune.
9. a concern with oral vaccines is the degradation of protein components in the stomach due to low ph and gastric enzymes. however, the degradation can be compensated for by repeating the exposure to the antigen until immunological tolerance is accomplished (mason et al. 2002).
10. potential risk of spreading the disease by edible vaccine delivery is a concern of many. potential contamination of the oral delivery system is a possible danger.
foreign proteins in plants accumulate in low amounts (0.01-2 % of total protein) and are less immunogenic; therefore the oral dose far exceeds the intranasal/parenteral dose. for example, the oral hepatitis b dose is 10-100 times the parenteral dose, and 100 g of potato expressing the b subunit of the labile toxin of etec (lt-b) is required in three different doses to be immunogenic. attempts at boosting the amount of antigen often lead to stunted growth of plants and reduced tuber/fruit formation, as too much mrna from the transgene causes gene silencing in the plant genome. techniques to overcome these limitations are given below:
1. optimization of the coding sequence of bacterial/viral genes for expression as plant nuclear genes.
2. expression in plasmids.
3. plant viruses expressing foreign genes.
4. coat-protein fusions.
5. viral-assisted expression in transgenic plants.
6. promoter elements of bean yellow dwarf virus with the reporter genes gus (β-glucuronidase) and green fluorescent protein (gfp), substituted later with target antigen genes.
7. antigen genes may be linked with regulatory elements that switch on the genes more readily, or do so only at selected times (after the plant is nearly fully grown) or only in its edible regions. exposure to some outside activator molecule may also be tried.
edible vaccines are a possible high-volume, low-cost delivery system for third-world countries to fight fatal maladies like aids, hepatitis, and diarrhea.
research by the niaid and the university of maryland showed no significant side effects in a small study using genetically engineered potatoes to produce a toxin of e. coli, a diarrhea-causing bacterium. volunteers reported no serious adverse reactions to the genetically altered potatoes used to deliver the edible vaccine, according to the national institutes of health (nih). the nih reported that 10 of 11 volunteers who ate the raw potato bites developed four times the antibodies against e. coli without obvious side effects. long-term reactions to edible vaccines are yet to be determined, and possible delayed reactions have not yet been discovered. an organized large-scale study is required before edible vaccines are put into large-scale production. creating edible vaccines involves the genetic engineering approach of introducing selected genes into plants and then inducing these altered plants to manufacture the encoded proteins. this process is known as ''transformation'', and the plants altered with new characteristics are called ''transgenic plants''. like conventional subunit vaccines, edible vaccines are composed of antigenic proteins and are devoid of the genes responsible for pathogenicity. thus, they have no way of establishing infection, which assures their safety, especially in immunocompromised patients. the gene that codes for the active antigenic protein is first isolated from the pathogen and incorporated into a suitable ''gene vehicle''. this gene vehicle is integrated into the genome of the plant and allowed to express the corresponding antigen. the plant parts are then fed to animals and humans to elicit an immune response. two main strategies are used for the production of candidate vaccine antigens in plant tissues (fig. 12.1 a, b). 1. stable genomic integration: this is the most popular method used in the published edible vaccine clinical trials to date.
under this method, a genetic line is produced that can be propagated either by vegetative (stem cuttings) or sexual (seeds) reproduction. the stable expression strategy provides an opportunity to introduce more than one gene for possible multicomponent vaccine production. furthermore, the choice of genetic regulatory elements allows organ- and tissue-specific expression of foreign antigens. stable transformation causes the desired gene to be incorporated into either the nucleus or the chloroplast. agrobacterium-mediated gene transfer is used for transforming plants in which the gene is integrated into the nucleus. direct delivery of dna into the tissue can also be applied, biolistics being the most popular method. chloroplast engineering has also gained impetus in the last decade due to its various advantages over nuclear engineering. agrobacterium tumefaciens is a gram-negative soil bacterium that infects wound sites in dicot plants, causing the formation of crown gall tumors. this bacterium is capable of transferring a particular dna segment (t-dna) of the tumor-inducing (ti) plasmid into the nucleus of infected cells, where it is subsequently integrated into the host genome and transcribed. the t-dna usually contains oncogenes and genes for the synthesis of opines, which are excreted by infected crown gall cells and are a food source for the bacterium. during genetic manipulation, the ti plasmid is engineered to carry the desired vaccine gene, and the genes that cause tumor growth in plants are deleted. the transgene is integrated, expressed, and inherited in mendelian fashion. the whole plant can then be regenerated from an individual transformed plant cell. studies have shown that genes are successfully expressed in experimental model plants and that, when given orally to animals, the extract of a transgenic plant containing the antigen induces serum antibodies; such plants can thus be used to produce edible vaccines.
agrobacterium-mediated transformation is at present possible for most species of agronomic interest, including members of the families gramineae and leguminosae. this opens interesting new possibilities for the development of edible vaccines for both human and veterinary use. the second approach for nuclear transformation is based on the microprojectile bombardment method, also known as the gene gun or biolistic method. this method is especially beneficial for plants that cannot be transformed by a. tumefaciens-mediated gene transfer. selected dna sequences are precipitated onto metal microparticles and bombarded with a particle gun at high speed in a partial vacuum against the plant tissue placed within the acceleration path. the microparticles penetrate the cell walls and release the exogenous dna inside the cell, where it is integrated into the nuclear genome. thus, this method effectively introduces dna into the cell. the cells that take up the desired dna are identified through the use of a marker gene (in plants, gus is most commonly used) and then cultured to replicate the gene and possibly cloned. this method has various advantages: (1) thousands of particles are accelerated at the same time, causing multiple hits and transferring genes into many cells simultaneously; (2) since intact cells can be used, the difficulties encountered with the use of protoplasts are circumvented; and (3) the method is universal in its application, so that cell type, size, and shape or the presence/absence of a cell wall do not significantly alter its effectiveness. another important use of the gene gun involves the transformation of organelles such as chloroplasts and yeast mitochondria. the biolistic particle delivery system ''shoots'' suitably prepared dna-coated particles, which penetrate the chloroplast, where the dna integrates with its genome.
chloroplast transformation is an interesting alternative to nuclear transformation that has emerged in the recent past. all plant cells have chloroplasts that capture light energy from the sun to produce chemical energy through a process called photosynthesis. in chloroplast genetic engineering, the recombinant dna plasmid is bound to small gold nanoparticles that are injected into the chloroplasts of the leaf using a gene gun as described above. this device uses high pressure to insert the plasmid-coated particles into the cells. these plasmids contain multiple genes of importance, such as the therapeutic gene, a marker gene (which may or may not be for antibiotic resistance), a gene that enhances the translation of the therapeutic gene, and two targeting sequences that flank the foreign gene. the foreign genes are inserted through homologous recombination via the flanking sequences at a precise and predetermined location in the organelle genome. the gene expression level in plastids is predominantly determined by the promoter and 5′-untranslated regions (5′-utr elements) (gruissem and tonkyn 1993). therefore, suitable 5′-utrs including a ribosomal binding site (rbs) are important elements of plastid expression vectors (eibl et al. 1999). in order to obtain high-level protein accumulation from expression of the transgene, the first requirement is a strong promoter to ensure high levels of mrna. most laboratories use the strong plastid rrna operon (rrn) promoter (prrn). besides the gene gun, peg-mediated transformation and galinstan expansion femtosyringe microinjection techniques are also used for gene delivery into chloroplasts. some of the advantages of chloroplast transformation technology are its low cost, natural gene containment, site-specific insertion, very high level of stable expression, generation of production lines with a competitive timeline, elimination of gene silencing, and high accumulation of the recombinant protein.
precise steps are given below:
step 1: single gene
step 2
step 3: a heteroplasmic diploid plant cell (first round of selection); a homoplasmic diploid plant cell (second round of selection)
step 4: multiple gene expression
step 5: reproductive organs; disintegration of paternal plastids
step 6: maternal inheritance of transgenic traits
the most common entry point for pathogens is the mucosal epithelia lining the gastrointestinal, respiratory, and urino-reproductive tracts, which are collectively the largest immunologically active tissue in the body. the mis is the first line of defense and the most effective site for vaccination against those pathogens, with nasal and oral vaccines being the most effective. the goal of an oral vaccine is to stimulate both mucosal and humoral immunity against pathogens. edible vaccines consist of plant parts that are fed directly, and the tough outer wall of the plant cell acts to protect the antigens against attack by gastric enzymes. an antigen in a food vaccine is taken up by m cells in the intestine and passed to various immune system cells, which then launch a defensive attack as if the antigen were a true infectious agent, not just part of one. the response leaves long-lasting ''memory cells'' able to promptly neutralize the real infectious agent. the whole procedure can be explained in two stages (fig. 12.3 a, b). antibodies and antibody fragments produced against specific antigens are given in table 12.2. there are numerous therapeutic and diagnostic applications of edible vaccines, which are summarized in table 12.3. some of the diseases on which work is ongoing are described below. malaria is a disease of humans transmitted by the bite of an infected mosquito. it remains one of the most significant causes of human morbidity and mortality worldwide. according to who's 2010 world malaria report, there are more than 225 million cases of malaria, killing around 781,000 people.
three antigens are currently being investigated for the development of a plant-based malaria vaccine: merozoite surface protein (msp) 4 and msp 5 from plasmodium falciparum, and msp 4/5 from p. yoelii. wang et al. (2004) demonstrated that oral immunization of mice with recombinant msp 4, msp 4/5, and msp 1, co-administered with ctb as a mucosal adjuvant, induces antibody responses effective against the blood-stage parasite. measles is an infection of the respiratory system caused by a virus. in one experiment, mice fed tobacco expressing mv-h (measles virus haemagglutinin from the edmonston strain) attained antibody titers five times the level considered protective for humans, and secretory iga was found in their feces. a prime-boost strategy combining a parenteral dose with subsequent oral mv-h boosters could induce titers 20 times the human protective level. these titers were significantly greater than with either vaccine administered alone. the mv-h edible vaccine does not cause atypical measles, which may occasionally be seen with the current vaccine. thus, it may prove better for achieving eradication of measles. the success in mice has prompted similar experiments in primates. transgenic rice, lettuce, and baby food against measles are also being developed. when given with ctb (adjuvant), 35-50 g of mv-h lettuce is enough; however, an increased dose would be required if given alone. rabies is a deadly viral infection that is transmitted to humans from animals. tomato plants expressing rabies antigens could induce antibodies in mice. alternatively, tmv may also be used. tomato plants transformed using camv with the glycoprotein (g-protein) gene of rabies virus (era strain) were shown to be immunogenic in animals. hepatitis b is a potentially life-threatening liver infection caused by the hepatitis b virus. it is estimated to have infected 400 million people throughout the globe, making the virus one of the most common human pathogens.
first human trials of a potato-based vaccine against hepatitis b have reported encouraging results. since immunization is the only known method to prevent disease caused by the hepatitis b virus (das 2009), any attempt to reduce infection requires the availability of large quantities of the vaccine antigen, hbsag. the amount of hbsag needed for one dose could be achieved in a single potato. levels of specific antibodies significantly exceeded the protective level of 10 miu/ml in humans. when cloned into camv, the pcmv-s plasmid encoding the hbsag subtype ayw showed higher expression in roots than in leaf tissue of the transgenic potato. however, the expression of hbsag in transgenic potatoes is not sufficient for use as an oral vaccine. further studies are underway to increase the level of hbsag by using different promoters, e.g., the patatin promoter, and different transcription-regulating elements. cholera is an infection of the small intestine that causes a large amount of watery diarrhea. it causes up to 10 million deaths per year in the developing world, primarily among children. studies supported by who have demonstrated the possibility of an effective vaccine for cholera that provides cross-protection against enterotoxigenic e. coli. to address this limitation, plants were transformed with the gene encoding the b subunit of the e. coli heat-labile enterotoxin (lt-b). transgenic potatoes expressing lt-b were found to induce both serum and secretory antibodies when fed to mice; these antibodies were protective in a bacterial toxin assay in vitro. this is the first ''proof of concept'' for the edible vaccine. since people eat only cooked potatoes, the effect of boiling on the properties of ctb expressed in transgenic potatoes was examined.
after boiling for 5 min, over half of the vaccine protein survived in its biologically active form, providing evidence that cooking does not always inactivate edible vaccines. thus, the spectrum of plants for producing edible vaccines may be expanded beyond raw food plants such as fruits. co-expression of mutant cholera toxin subunit (mct-a) and lt-b in crop seeds has been shown to be effective by nasal administration and is extremely practical. the prevalence of diabetes is increasing globally, and india is no exception. more than 100 million people are affected by diabetes worldwide. type-i diabetes, also known as insulin-dependent diabetes mellitus (iddm) or juvenile-onset diabetes, primarily affects children and young adults and accounts for 5-10 % of diagnosed diabetes in north america. research by ma and hein (1995) at the university of western ontario showed that diabetes can be prevented in mice by feeding them plants engineered to produce a diabetes-related protein. the idea is based on 'oral tolerance', in which the autoimmune response is selectively turned off early by teaching the body to tolerate the ''antigenic proteins''. the pancreatic protein glutamic acid decarboxylase (gad67) is linked to the onset of iddm, and when injected into mice it is known to prevent diabetes. the canadian group developed transgenic potato and tobacco plants carrying the gene for gad67 and fed them to nonobese diabetic mice, which develop insulin-dependent diabetes spontaneously. the results were intriguing: only 20 % of the prediabetic mice fed with transgenic plants developed diabetes, while 70 % of the nontreated mice developed the disease. the treated mice also showed increased levels of igg1, an antibody associated with cytokines that suppress harmful immune responses. thus, the antigen produced in plants appears to retain immunogenicity and prevent diabetes in an animal model.
according to the canadian scientists, this is the first proof of principle for the use of edible vaccines in the treatment of autoimmune diseases. human immunodeficiency virus (hiv) is a retrovirus that causes acquired immunodeficiency syndrome (aids), a condition in humans in which progressive failure of the immune system allows life-threatening opportunistic infections and cancers to thrive. in efforts to produce an edible vaccine, initial success in splicing an hiv protein into cpmv has been achieved. two hiv protein genes, with camv as promoter, were successfully injected into tomatoes with a needle, and the expressed protein was demonstrable by polymerase chain reaction (pcr) in different parts of the plant, including the ripe fruit, as well as in second-generation plants. recently, spinach has been successfully inoculated for expression of tat protein cloned into tmv. each gram of spinach leaf tissue was shown to contain up to 300-500 µg of tat antigen. mice fed with this spinach followed by dna vaccinations developed higher antibody titers than the controls, with the levels peaking at 4 weeks post-vaccination. it is still unclear whether edible vaccines would be regulated as food, drugs, or agricultural products, and which vaccine component would be licensed: the antigen itself, the genetically engineered fruit, or the transgenic seeds. they would be subjected to very close scrutiny by the regulatory bodies in order to ensure that they never enter the food supply. this would include greenhouse segregation of medicinal plants from food crops to prevent outcrossing and would necessitate separate storage and processing facilities. although edible vaccines fall under ''genetically modified'' plants, it is hoped that these vaccines will avoid serious controversy, because they are intended to save lives. edible vaccines are future vaccines, and some challenges are yet to be overcome before they can become a reality.
like all products regulated by the food and drug administration, edible vaccines undergo a rigorous review of the laboratory and clinical testing conducted to obtain information on the safety, efficacy, purity, and potency of these products. clinical trials can take place only after satisfactory information has been collected on product quality and nonclinical safety. successful expression of antigens in plants has been demonstrated in the past. the vaccines have also been checked for their efficacy in humans. results from the primary phase of the first-ever human clinical trial of an edible vaccine were published in the journal nature medicine in 1998 (blaine p. friedlander, boyce thompson institute for plant research), which indicated that consumption of servings of raw potatoes resulted in immunity to specific diseases. the human clinical study was conducted under the direction of dr. carol tacket at the center for vaccine development, university of maryland school of medicine in baltimore. in the first phase of human testing, the potatoes eaten by volunteers contained a vaccine against travelers' diarrhea, a common condition resulting from intestinal infection by the bacterium e. coli, which contaminates food or water supplies. the clinical trials were approved in advance by the food and drug administration. encouraged by the results of this study, scientists started exploring the use of this technique for administering other antigens. in 2005, thanavala's group developed a potato vaccine booster for use in conjunction with injected hepatitis b vaccine. it is currently in a phase ii clinical trial, and in phase i for patients who have previously been vaccinated. in 2000, tacket and her teammates studied the human immune response to the norwalk virus capsid protein expressed in potatoes. overall, 95 % (19 out of 20 volunteers) developed some kind of immune response, although the antibody increase in some cases was modest.
in the same year, pogrebnyak's lab developed an effective vaccine against the coronavirus that causes severe acute respiratory syndrome (sars). tomato and tobacco plants were used for high expression of the coronavirus spike protein (s1). first, lyophilized tomato fruit was fed to mice, and boosting was then done with s1 protein expressed in tobacco roots; high igg1 immune responses and significant igg2a and igg2b responses were observed in their sera. research is also moving toward engineering plants to produce a variety of functional monoclonal antibodies (ma et al. 2005). in the first human study of a transgenic plant vaccine designed to induce active immunity, 14 adult volunteers were given either 100 g of transgenic potato, 50 g of transgenic potato, or 50 g of wild-type potato, with the transgenic potatoes containing from 3.7 to 15.7 µg/g of lt-b. the variable dose per gram of potato was due to the tissue specificity of the promoter, so that lt-b was expressed to a different degree in the different tissues of the potatoes. the potatoes in this study were ingested raw; however, subsequent studies have shown that transgenic potatoes expressing the b subunit of cholera toxin could be boiled for 3 min, until the tissue becomes soft, with loss of only about 50 % of the ct-b pentameric gm1-binding form. serologic responses were also detected after vaccination. in total, 10 of the 11 volunteers (91 %) who ingested transgenic potatoes developed igg anti-lt, and in half of them responses occurred after the first dose. six of the 11 (55 %) volunteers developed a fourfold rise in serum iga anti-lt. researchers supported by the niaid have shown for the first time that an edible vaccine can safely trigger significant immune responses in people. the goal of the phase 1 proof-of-concept trial was to demonstrate that an edible vaccine could stimulate an immune response in humans.
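the dose and response figures quoted for the lt-b potato trial can be checked with a few lines of arithmetic. the sketch below is only an illustration: it takes the reported lt-b content (3.7 to 15.7 µg per gram), the two serving sizes (50 g and 100 g) and the responder counts (10 of 11 and 6 of 11) from the text, while the function names are hypothetical and not part of any published analysis.

```python
# worked arithmetic for the trial figures quoted in the text
# (function and constant names are illustrative assumptions)
LTB_UG_PER_G = (3.7, 15.7)  # reported lt-b content range, micrograms per gram of potato


def dose_range_ug(potato_grams, conc_range=LTB_UG_PER_G):
    """Return the (min, max) lt-b dose in micrograms for one serving."""
    low, high = conc_range
    return (potato_grams * low, potato_grams * high)


def response_rate_percent(responders, total):
    """Share of volunteers responding, rounded to the nearest whole percent."""
    return round(100 * responders / total)


if __name__ == "__main__":
    for grams in (50, 100):
        low, high = dose_range_ug(grams)
        print(f"{grams} g serving: {low:.0f}-{high:.0f} ug lt-b")
    print("igg anti-lt responders (%):", response_rate_percent(10, 11))
    print("serum iga fourfold rise (%):", response_rate_percent(6, 11))
```

under these figures a 100 g serving could deliver anywhere between roughly 370 and 1570 µg of lt-b, which illustrates how widely the ingested dose can vary when expression differs between tissues.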
volunteers ate bite-sized pieces of raw potato that had been genetically engineered to produce part of the toxin secreted by e. coli, which causes diarrhea. the trial enrolled 14 healthy adults; 11 were chosen at random to receive the genetically engineered potatoes and 3 received pieces of ordinary potatoes. the investigators periodically collected blood and stool samples from the volunteers to evaluate the vaccine's ability to stimulate both systemic and intestinal immune responses. the potatoes were well tolerated and no one experienced serious adverse side effects. niaid-supported scientists are exploring the use of this technique for administering other antigens. edible vaccines for other intestinal pathogens are already in the pipeline. potatoes and bananas that might protect against norwalk virus, a common cause of diarrhea, and potatoes and tomatoes that might protect against hepatitis b are being developed. thirty million children throughout the world do not receive even the most basic immunizations each year. as a result, at least three million of these children die from diseases that are fully vaccine-preventable. large-scale production of edible vaccines for various diseases might seem a simple solution for vaccinating these children. the first human clinical trials of plant-based vaccines have now been performed, but they bring many challenges, such as optimization of expression levels and stabilization during post-harvest storage. long-term reactions to edible vaccines are yet to be determined, and possible delayed reactions not yet discovered need to be considered. in addition, the oral immunogenicity of edible vaccines can be further improved by the use of specific adjuvants, which can be applied either as a fusion to the candidate gene or as an independent gene. edible vaccines have shown promising application against diseases across both the veterinary and the human spectrum.
these studies point to plant-derived vaccines as a new hope and promise for more immunogenic, more effective, and less expensive vaccination strategies against both respiratory and intestinal mucosal pathogens. research in the field of edible vaccines holds immense potential for the future, and every advancement made in this direction brings the dream of an edible vaccine one step closer. there is hope that in the future edible vaccines will conquer all serious diseases and make the planet beautiful to live in.
edible vaccines
an edible vaccine for malaria using transgenic tomatoes of varying sizes, shapes and colors to carry different antigens
plant derived edible vaccines
in vivo analysis of plastid psba, rbcl and rpl32 utr elements by chloroplast transformation: tobacco plastid gene expression is controlled by modulation of transcript levels and translation efficiency
control mechanisms of plastid gene expression
production of antibodies in transgenic plants
immunotherapeutic potential of antibodies produced in plants
production of biologically active human interleukin-4 in transgenic tomato and potato
transgenic plants as vaccine production systems
expression of hepatitis b surface antigen in transgenic plants
edible vaccine protects mice against escherichia coli heat-labile enterotoxin (lt): potatoes expressing a synthetic lt-b gene
edible plant vaccines: applications for prophylactic and therapeutic molecular medicine
edible vaccines: a concept comes of age
plant biotechnology and in vitro biology in the 21st century
production of hepatitis b surface antigen in transgenic plants for oral immunization
human immune responses to a novel norwalk virus vaccine delivered in transgenic potatoes
immunogenicity in humans of an edible vaccine for hepatitis b
oral immunization with a combination of plasmodium yoelii merozoite surface protein 1 and 4/5 enhances protection against lethal malarial challenge
key: cord-000580-dcid9emx authors: sällman almén,
markus; bringeland, nathalie; fredriksson, robert; schiöth, helgi b. title: the dispanins: a novel gene family of ancient origin that contains 14 human members date: 2012-02-20 journal: plos one doi: 10.1371/journal.pone.0031961 sha: doc_id: 580 cord_uid: dcid9emx the interferon induced transmembrane proteins (ifitm) are a family of transmembrane proteins that is known to inhibit cell invasion of viruses such as hiv-1 and influenza. we show that the ifitm genes are a subfamily in a larger family of transmembrane (tm) proteins that we call dispanins, which refers to a common 2tm structure. we mined the dispanins in 36 eukaryotic species, covering all major eukaryotic groups, and investigated their evolutionary history using bayesian and maximum likelihood approaches to infer a phylogenetic tree. we identified ten human genes that together with the known ifitm genes form the dispanin family. we show that the dispanins first emerged in eukaryotes in a common ancestor of choanoflagellates and metazoa, and that the family later expanded in vertebrates where it forms four subfamilies (a–d). interestingly, we also find that the family is found in several different phyla of bacteria and propose that it was horizontally transferred to eukaryotes from bacteria in the common ancestor of choanoflagellates and metazoa. the bacterial and eukaryotic sequences have a considerably conserved protein structure. in conclusion, we introduce a novel family, the dispanins, together with a nomenclature based on the evolutionary origin. membrane proteins are essential for the ability of all cellular organisms to respond to and interact with their environment. they have therefore attracted considerable research interest and are one of the major groups of drug targets [1]. we have previously estimated that 27% of human genes code for alpha-helical membrane proteins and provided a comprehensive classification based on their function and evolutionary origin [2].
however, the identification and annotation of many membrane-bound protein families is still being revised. in recent years we have worked on the annotation of both g protein-coupled receptors [3, 4] and solute carriers [5], and most of the genes of these large superfamilies now have a clear identity and annotation. there is, however, still much work to be done to clarify the identity, annotation and evolutionary history of several families of membrane-bound proteins. establishing a rigid nomenclature based on evolutionary information and structural features of the predicted proteins facilitates prediction of the functional role of these genes, which have often been studied only in large gene or transcription consortia. in previous studies we have found that membrane proteins with few transmembrane (tm) helices are less studied than others. this is particularly true for 2tm proteins, where more than 70% of the about 700 proteins remained unclassified. interestingly, we found evidence for several uncharacterized homologues of a small group of genes known as the interferon-induced transmembrane protein (ifitm) family. the ifitms constitute a group with four human members (ifitm1-3, 5) that are found in consecutive order on chromosome 11 and have two transmembrane (2tm) helices. the ifitm4 gene is not present in human, but is located in proximity to the other four genes in the mouse genome. the ifitm1-3 proteins were identified 25 years ago as being upregulated by interferons (ifn) [6]. recently they have received considerable attention, as ifitm1-3 were found to prevent infection by a growing list of viruses such as hiv-1, sars, influenza a h1n1, west nile and dengue fever viruses [7, 8, 9, 10]. hence, proteins of the ifitm family mediate part of the antiviral response orchestrated by ifns.
however, the ifitm family is also involved in other processes such as oncogenesis, bone mineralization (ifitm5) and germ cell development (ifitm1 and 3), and ifitm5 has not been identified as interferon-inducible [11, 12, 13, 14]. although the biological roles of the ifitm genes are emerging, no thorough evolutionary analysis has been performed on this group. in this study, we sought to infer the evolutionary history of the human ifitm genes and identify potential homologues. we mined 36 eukaryotic species, covering all major eukaryotic groups, and found that the ifitms form a subfamily in a larger novel family that has ten human members in addition to the four ifitm genes. we propose dispanins as a novel name for this family, which refers to their common 2tm structure. further, we find that the eukaryotic dispanins first appeared before the radiation of metazoa and that they branch out into four subfamilies (a-d). more surprisingly, we also discover that the dispanins are found in a large range of bacteria and in a brown alga. in total, we collected 87 eukaryotic ifitm homologues from h. sapiens (14 genes), m. musculus (17 genes), g. gallus (6 genes), x. tropicalis (13 genes), d. rerio (7 genes), p. marinus (1 gene), c. intestinalis (1 gene), b. floridae (12 genes), s. mansoni (1 gene), s. purpuratus (9 genes), n. vectensis (4 genes) and m. brevicollis (1 gene). no ifitm genes could be detected in any of the remaining 24 analyzed proteomes, which cover all other major eukaryotic groups. the search of the nr database with hmmer returned no hits outside metazoa other than bacteria, choanoflagellates and the brown alga ectocarpus siliculosus (2 genes). nine genes were deemed to be pseudogenes based on annotation and sequence analysis and were removed from further analysis. in addition to the four previously identified human ifitm genes, ten novel human homologous genes were detected.
these ten genes together with the four ifitm genes form a human gene family that we choose to call dispanins based on their common 2tm structure. in uniprot we identified 65 annotated ifitm homologues from complete bacterial proteome sets spread over seven different phyla (see table s1): acidobacteria (2 genes), actinobacteria (43 genes), cyanobacteria (3 genes), tg1 (1 gene), bacteroidetes (2 genes), firmicutes (1 gene) and proteobacteria (13 genes). of these, 46 bacterial sequences from 32 species were included for further analysis. no viral or archaeal genes were annotated as ifitm homologues in uniprot. the phylogeny of the vertebrate dispanins (figure 1) allows the division of the dispanins into four subfamilies a-d that are supported with strong confidence with respect to posterior probabilities (pp) or bootstraps (bs) (pp > 0.75 and bs > 90% for all nodes). we propose a common nomenclature for the dispanins that is based on their subclass and a number (dspa1 etc). the proposed names together with previous gene symbols and accession numbers can be found in table s1. the two dispanin homologs in the brown alga e. siliculosus, which is evolutionarily distant from metazoa, and the single dispanin in the close metazoan relative m. brevicollis are the only non-metazoan eukaryotic dispanins. a blast search shows that the e. siliculosus proteins have a higher similarity to metazoan family members (best hit e-value < 10^-10) than to bacterial dispanins and m. brevicollis. the m. brevicollis dispanin shares a conserved splice site with all metazoan family members, which suggests that the eukaryotic dispanins first emerged in a common ancestor of m. brevicollis and the metazoan lineage. within the metazoan lineage the dispanins have been lost on at least two separate occasions, i.e. in t. adhaerens and in the ecdysozoan lineage (d. melanogaster and c. elegans). the vertebrate dispanins sort into subfamilies a-d (figure 1).
the dspa subfamily has six human genes (dspa1, dspa2a-d and dspa3), of which dspa2a-c correspond to ifitm1-3 and dspa1 to ifitm5. dspa2d (ac023157) and dspa3 (ac068580) are two newly identified genes, closely related to the ifitm family. the phylogenetic tree indicates that the dspa2 genes have undergone independent duplications in h. sapiens and m. musculus, and these were given a species-specific nomenclature, e.g. dspa2a-d in human. the dspa4, dspa5 and dspa6 genes in the phylogenetic tree do not have any clear human orthologs. the dspb subfamily is only found in tetrapoda and contains three human genes called dspb1 (tusc5), dspb2 (tmem233) and dspb3 (prrt2), whereas dspb4 is only present in g. gallus. dspc1 (tmem90a), dspc2 (tmem90b) and dspc3 (tmem91) make up the human dspc subfamily, which is represented in all investigated vertebrates. the dspd1 (prrt1) gene is found in all the vertebrate species except the basal organism p. marinus, whereas dspd2 (al160276) is mammalian specific. five vertebrate genes and all invertebrate genes were excluded from the phylogenetic analysis and instead classified into subfamilies by using a blast approach (table s1). some genes could not unambiguously be classified into the vertebrate subfamilies: c. intestinalis (1 gene), s. mansoni (1 gene), b. floridae (2 genes), s. purpuratus (3 genes), n. vectensis (1 gene) and m. brevicollis (1 gene). by combining the results of the phylogenetic analysis and the blast classification, we created a schematic overview of the organisms' gene repertoire and a schematic picture of the dispanin family's evolutionary history, which suggests that the invertebrate dispanins are more similar to the dspc and d subfamilies than to dspa and b (figure 2). all the members of the human dspa subfamily are located on chromosome 11 except for dspa2d, which resides on chromosome 12. the genes of the other subfamilies are not enriched on any chromosome.
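the blast-based assignment of genes to subfamilies, including the cases that could not be classified unambiguously, can be illustrated with a small sketch. the exact criterion used by the authors is not given in this passage, so everything below is an assumption made for illustration: each query is assigned to the subfamily of its best-scoring hit, and it is called unclassified when the best hit from a different subfamily scores within a hypothetical margin (min_margin) of the top hit.

```python
from collections import defaultdict


def classify_by_best_hit(hits, min_margin=10.0):
    """Assign each query to the subfamily of its best blast hit.

    hits: iterable of (query_id, subfamily, bit_score) tuples.
    min_margin: hypothetical minimum bit-score gap between the best and the
        second-best subfamily; smaller gaps are reported as "unclassified".
    """
    best = defaultdict(dict)  # query -> {subfamily: best bit score seen}
    for query, subfamily, score in hits:
        if score > best[query].get(subfamily, float("-inf")):
            best[query][subfamily] = score

    calls = {}
    for query, per_family in best.items():
        ranked = sorted(per_family.items(), key=lambda kv: kv[1], reverse=True)
        top_family, top_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
        if top_score - runner_up >= min_margin:
            calls[query] = top_family
        else:
            calls[query] = "unclassified"
    return calls


if __name__ == "__main__":
    # invented example scores, not data from the study
    example_hits = [
        ("gene_x", "dspc", 120.0),
        ("gene_x", "dspa", 60.0),
        ("gene_y", "dspc", 80.0),
        ("gene_y", "dspd", 75.0),
    ]
    print(classify_by_best_hit(example_hits))
```

with the invented scores above, gene_x is assigned to dspc, while gene_y is left unclassified because its dspc and dspd hits are too close to separate.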
several features are common to all the eukaryotic dispanin proteins (figure 3). they comprise two transmembrane helices that are predicted to be between 20 and 30 amino acids in length, with the second helix often being slightly longer. the dispanins are rich in both glycosylation and phosphorylation sites, which are predominantly found on the n-terminus. the n-terminus is often long (> 100 amino acids) compared to the c-terminus (< 10 amino acids) and both are always oriented towards the outside of the cell. the dispanins contain several conserved motifs (figures 3 and 4), which are found among both eukaryotes and bacteria. the most conserved patterns are the g-d motif and the a-x(6)-a motif, both situated in the intracellular loop between the transmembrane helices, which is also frequently rich in positive amino acids (k and r). the first helix is the most conserved, with an alanine (a) residue and a double cysteine c-c motif (c-f-c in the dspb family), whereas a glycine (g) residue is the most conserved in the second helix. another highly conserved motif, though only amongst the eukaryotes, is the single aspartic acid (d) on the n-terminus, flanking the first helix. analysis of the exon structure in the protein sequences was made for all eukaryotic dispanins except those of e. siliculosus, where no such information was found. the eukaryotic dispanins have a conserved splice site in the intracellular loop that separates the two transmembrane helices into different exons (figure 4). this site is only missing in the s. mansoni dispanin and the mouse dspa2f genes. the vertebrate proteins of the dspa and dspb subfamilies only have these two exons, whereas the whole dspc subfamily and the dspd1 proteins have an additional exon that codes for their n-terminus. the dspd2 proteins, which are only found in mammals, seem to have lost their n-terminal exon. all the classified b. floridae and s. purpuratus sequences have the corresponding splice site in the n-terminus, whereas the n. vectensis and m.
brevicollis proteins have 3-6 and 7 exons, respectively. we provide evidence that the four ifitm genes together with ten additional human genes, known as tusc5, tmem233, prrt2, tmem90a, tmem90b, tmem91, prrt1, ac023157, al160276 and ac068580, form a novel gene family that we call the dispanins, which refers to the 2tm membrane topology that is common to all identified members. this family is the second largest 2tm family in the human genome, exceeded only by the inwardly rectifying potassium channel family, which has 15 members [2]. apart from the 2tm membrane topology, the dispanins are not homologous to, and do not share domains with, any other 2tm proteins in the human genome, and they constitute a distinct gene family. we have discovered that this family is found in metazoa, the choanoflagellate m. brevicollis and the brown alga e. siliculosus, but not in other eukaryotes. surprisingly, it is widely present in bacteria, where it is found in several different phyla such as actinobacteria, acidobacteria, cyanobacteria, bacteroidetes, firmicutes and proteobacteria. the highest number of bacterial dispanins is detected in actinobacteria and proteobacteria, which diverged around three billion years ago [15]. we find that dispanins in eukaryotes and bacteria have high sequence similarity and share several conserved sequence motifs (figure 4), which is strong evidence for a common evolutionary origin and possibly a functional relationship. as the family is found in several bacterial phyla, we suggest that it first emerged in bacteria and was later introduced into eukaryotes through a horizontal gene transfer event. however, we were not able to construct a stable phylogenetic tree including both bacterial and eukaryotic dispanins. as the eukaryotic dispanins are widespread only in metazoa, it was unexpected to find the family in the evolutionarily distant brown alga e. siliculosus (figure 2). our sequence analysis supports that all metazoan dispanins have their origin in the common ancestor of m.
brevicollis and metazoa, as the choanoflagellate shares a conserved splice site in the intracellular loop with nearly all metazoan dispanins (figure 4). however, the finding of the family in e. siliculosus suggests that the family has undergone two horizontal gene transfer events. as the e. siliculosus dispanins are more similar to metazoan family members than to those of m. brevicollis and bacteria, we propose that the first horizontal gene transfer event was from bacteria to a common ancestor of choanoflagellates and metazoa, followed by a second transfer between metazoa and brown algae. during the course of metazoan evolution the dispanins have expanded and diverged into four distinct subfamilies. however, the family has also been lost in the basal metazoan t. adhaerens and the ecdysozoan lineage, which shows that it is not essential for all metazoan life. although we were unable to create a stable phylogenetic tree that includes both vertebrate and invertebrate sequences, blast searches suggest that the dspc and d subfamilies are the oldest of the vertebrate subfamilies, as the invertebrate sequences have higher resemblance to these two subfamilies (figure 2). moreover, the dspc and d subfamilies form a separate cluster from dspa-b (figure 1). hence, the phylogenetic analysis suggests that the dspa and b subfamilies have their origin close to the radiation of teleosts, although dspb has been lost in d. rerio (figures 1 and 4). the dspc family is found in two to three copies in all vertebrates and is the most widespread family, as the blast classification suggests that it is present in two invertebrate species and e. siliculosus. in mouse, the family members are expressed predominantly in brain tissues (dspc1/tmem90a, dspc2/tmem90b) or ubiquitously (dspc3/tmem91). dspc1 (tmem90a) has been proposed to have a role in striatal functioning and the pathophysiology of huntington's disease and is localized to the golgi apparatus [16].
the dspd family has been lost on several occasions, both in vertebrates and invertebrates (figure 2). the mouse dspd1 (prrt1) gene is ubiquitously expressed, with the highest expression in b-cells according to biogps [17]. however, no previous studies have been performed on the dspd subfamily. the dspb subfamily is found in tetrapoda and has three members in human and mouse. mouse expression profiles from biogps show that dspb3 (prrt2) is exclusively expressed in brain tissues and that dspb1 (tusc5) is expressed in dorsal root ganglia and adipose tissues. in agreement with these expression data, dspb1 (tusc5) has been suggested to be involved in neural regulation of adipocyte differentiation and is regulated by pparγ [18, 19]. the dspa/ifitm subfamily is the most numerous, and the mouse and human genes are all clustered in a consecutive manner on chromosomes seven and eleven, respectively. these regions share conserved synteny (http://cinteny.cchmc.org/) and are flanked by the athl1 and b4galnt4 genes on each side, which is strong evidence that the genes originate from common evolutionary gene duplications. this is supported by the phylogenetic analysis (figure 1) for dspa1 (ifitm5) and dspa3 (ac068580), which have orthologs in all tetrapoda. however, for the dspa2 group (dspa2a-d/ifitm1-3 and ac023157) the phylogeny suggests that m. musculus and h. sapiens have undergone independent expansions of the group. rather than being created by independent gene duplications in the two species, it is possible that these genes are subject to concerted evolution, where paralogous genes within a species are more conserved towards each other than towards orthologs in other species. this phenomenon is most common in tandemly repeated genes, such as the dspa2 group, and is believed to be primarily the result of recombination mechanisms [20]. interestingly, the dspa4a-f genes of x. tropicalis also seem to have undergone an independent expansion.
however, the phylogeny is not strong enough to prove that the dspa4a-f genes are orthologous to the mammalian dspa2 genes (figure 1). the dspa/ifitm subfamily is the best studied and is a multifunctional family whose antiviral properties are best understood [8]. the family is expressed in many mouse tissues, with the highest expression in mast cells, macrophages and osteoblasts according to biogps. we add two novel human members to this subfamily: dspa2d, which is closely related to dspa2c (ifitm3), and dspa3, which forms a distinct cluster (figure 1). both these genes are poorly characterized. microarray data from arrayexpress (www.ebi.ac.uk/arrayexpress/) show that dspa3 is upregulated after exposure of macrophages to interferon-gamma in a study (e-geod-5099) in which dspa2a-c (ifitm1-3) were also induced [21]. the mouse dspa3 gene is, like the other genes of this subfamily, situated on chromosome seven, but is 1.5 mbp away from the dspa (ifitm) cluster where the other genes reside. hence, dspa3 could explain the mild phenotypes and be responsible for the suggested functional redundancy that was found when deleting the whole dspa (ifitm) locus [22]. the dispanin family has several conserved motifs across subfamilies that are also detected in bacteria (figure 4). one of the most prominent is the double cysteine motif (c-c) in the first transmembrane helix. this motif has recently been shown to undergo post-translational modification through s-palmitoylation in dspa2c (ifitm3), which increases hydrophobicity [23]. further, yount and colleagues show that the antiviral activity of dspa2c (ifitm3) is dependent on this modification, which induces clustering of the proteins. as this motif is highly conserved, it is likely that s-palmitoylation is an important regulatory mechanism among the other subfamilies as well. intriguingly, this motif is also found in the bacterial dispanins even though bacterial proteins do not undergo s-palmitoylation.
hence, the cysteine motif may have structural and functional importance beyond s-palmitoylation. in this study, we introduce the dispanin family, of which the ifitm genes constitute a subfamily. in addition to the ifitm genes, we identify 10 novel human dispanins, investigate the family's evolutionary history and suggest that the eukaryotic members are descended from bacteria through a horizontal gene transfer. thus, the expansion and diversification of dispanins in vertebrates may reflect the evolution of a larger functional repertoire, which is supported by the distinct expression profiles of the subfamilies. by identifying homologs to the ifitm genes and establishing the dispanins as a family, together with a detailed and evolutionarily based nomenclature for the vertebrate genes, we provide a foundation for future functional characterization of these genes. figure 3. the protein features and topology of the dispanin subfamilies. the picture shows the membrane topology and sequence features of a representative human member of each subfamily. conserved motifs and residues are shown; those with a sequence identity of more than 90% are framed in black and those with 80-90% sequence similarity are framed in blue. predicted phosphorylation (green) and glycosylation (orange) sites are shown. doi:10.1371/journal.pone.0031961.g003
whole-proteome datasets for the following eukaryotic species were included in the analysis. homo sapiens, mus musculus, gallus gallus, xenopus tropicalis, danio rerio, petromyzon marinus, drosophila melanogaster, caenorhabditis elegans, saccharomyces cerevisiae, schistosoma mansoni, apis mellifera, anopheles gambiae, pediculus humanus, ixodes scapularis, daphnia pulex, oryza sativa, pristionchus pacificus, acyrthosiphon pisum, trypanosoma brucei, leishmania braziliensis and ciona intestinalis were downloaded from ensembl; strongylocentrotus purpuratus was downloaded from spbase (www.spabase.org); branchiostoma floridae, nematostella vectensis, trichoplax adhaerens, phytophthora sojae, thalassiosira pseudonana, naegleria gruberi and monosiga brevicollis were downloaded from the joint genome institute; dictyostelium discoideum was downloaded from dictybase (www.dictybase.org); arabidopsis thaliana was downloaded from tair (http://www.arabidopsis.org/); entamoeba histolytica was downloaded from amoebadb (http://amoebadb.org); paramecium tetraurelia was downloaded from ncbi; tetrahymena thermophila was downloaded from uniprot; trichomonas vaginalis was downloaded from trichdb (http://trichdb.org); giardia lamblia was downloaded from giardiadb (http://giardiadb.org). all proteomes were searched against a local installation of the pfam database (v.23) [24] using hmmer3 [25] and the script pfam_scan.pl, which was obtained from the pfam ftp-site (ftp://ftp.sanger.ac.uk/pub/databases/pfam/tools/), with pfam's default settings. in pfam, the ifitm family is represented by a specific hidden markov model [pfam: pf04505]. all proteins that were assigned to this model in the pfam search were considered to be homologous to the ifitm family and were therefore included for further analysis. the script pfam_scan.pl uses the homology criterion set by the pfam database, which is based on a manually curated gathering threshold for each model.
the gathering threshold for pf04505 is a score of 20.6. the sequence datasets were checked for annotated pseudogenes and transcript variants from the same gene. in the case of multiple transcript variants, the longest sequence was kept. the resulting non-redundant datasets were used for the analysis. the bacterial sequences were obtained by querying uniprot (www.uniprot.org) for the pfam id [pfam: pf04505]. thereafter, the sequence set was downloaded by browsing by taxonomy and restricting it to species with a full proteome set. to ensure that no lineages were missing in the selection of proteomes, the nr protein dataset from ncbi was downloaded. the nr dataset contained 15,322,545 (20-09-11) protein sequences from a wide range of organisms. the dataset was searched against the pf04505 pfam model using hmmer3 with default settings, and sequences with a score above the pfam gathering threshold (20.6) were deemed homologous to ifitm. mafft-einsi was used, with default settings, to create a multiple sequence alignment (msa) for the vertebrate protein sequences [26]. the msas were thereafter examined and refined in jalview 2.5.1 [27], i.e. the sequences were trimmed and only well-conserved, well-aligned regions were kept, which comprised 82 aligned amino acid columns. phylogenetic analysis was performed with a bayesian approach implemented in mrbayes [28]. the following settings for the eukaryote proteins were adjusted: the analysis was run using a gamma-shaped model for the variation of evolutionary rates across sites (rates = gamma) and the mixed option (aamodelpr = mixed) was used to estimate the best amino acid substitution model. we generated 5 000 000 trees and the markov chain monte carlo (mcmc) analysis reached well below a standard deviation of split frequencies of 0.01. every hundredth tree was sampled from the mcmc run and the first 25% of the sampled trees were discarded (burnin = 0.25) to ensure a good sample from the posterior probability distribution.
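the sampling settings above fix how many trees remain for building a consensus. a minimal python sketch of that arithmetic follows; the function name and its parameters are illustrative and are not part of mrbayes, and the count ignores any starting tree that an mcmc run may also record.

```python
def retained_trees(generated: int, sample_every: int, burnin_fraction: float) -> int:
    """Count sampled trees left after discarding the burn-in portion."""
    sampled = generated // sample_every            # every n-th tree is kept
    discarded = int(sampled * burnin_fraction)     # first fraction is burn-in
    return sampled - discarded


if __name__ == "__main__":
    # Settings quoted in the text: 5 000 000 trees, every 100th sampled, 25% burn-in.
    print(retained_trees(5_000_000, 100, 0.25))  # 37500
```

with the quoted settings, 50 000 trees are sampled and 12 500 are discarded as burn-in.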
a consensus tree was built from the remaining 37 500 trees with the mrbayes sumt command using the 50% majority rule method. the sump command was used to ensure that an adequate sample of the posterior probability distribution was reached during the mcmc procedure. to validate the phylogenetic inference with mrbayes, a maximum likelihood method implemented in raxml was used [29]. the combined rapid bootstrapping and search for the best-scoring ml tree option (-f a) in raxml was used to create 1000 bootstraps (-# 1000) using a gamma model of evolutionary rates and the jtt substitution model (-m protgammajtt). the jtt substitution model was identified as the most suitable model in the bayesian analysis and was therefore selected for the maximum likelihood phylogeny. the consensus phylogenetic tree found with mrbayes was drawn in dendroscope 3.0 and the bootstrap support values from raxml were annotated on the corresponding nodes [30]. the phylogenetic tree was used to determine subfamilies by identifying clusters with a high posterior probability and bootstrap support. invertebrate sequences, together with 1 m. musculus, 2 d. rerio and 3 x. tropicalis sequences, were excluded from the phylogenetic analysis as they induced highly unstable topologies. these excluded sequences were categorized into their respective subfamilies by using a blast search against the categorized sequences. the top five hits were examined to classify the invertebrate and other excluded sequences into subfamilies. a sequence was assigned to a subfamily if at least four of the five top hits were from that subfamily. several resources were used to identify the protein sequence features of the dispanins. netphos 2.0 [31] identified potential phosphorylation sites. netnglyc 1.0 and netoglyc 3.1 [32] were used to find possible n- and o-glycosylation sites, respectively. transmembrane helix prediction was made using tmhmm 2.0 [33]. motifs were found manually through jalview 2.5.1 and with the server meme 4.4.0.
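the top-hit rule used to place the excluded sequences can be sketched in a few lines of python. this is only an illustration of the rule as stated in the text, not the authors' script: the function name, the subfamily labels in the usage lines and the choice to return none for unassigned sequences are assumptions, and the hits are assumed to be already sorted best-first and mapped to subfamily labels.

```python
from collections import Counter
from typing import Optional, Sequence


def assign_subfamily(top_hits: Sequence[str],
                     min_agree: int = 4,
                     n_top: int = 5) -> Optional[str]:
    """Return the subfamily shared by at least `min_agree` of the first
    `n_top` hits, or None when no subfamily reaches that level."""
    counts = Counter(top_hits[:n_top])
    if not counts:
        return None
    subfamily, votes = counts.most_common(1)[0]
    return subfamily if votes >= min_agree else None


if __name__ == "__main__":
    # Hypothetical subfamily labels of the five best blast hits for two queries.
    print(assign_subfamily(["A", "A", "A", "A", "B"]))  # A
    print(assign_subfamily(["A", "A", "A", "B", "B"]))  # None
```

a sequence whose top hits are split three to two stays unassigned under this rule, which keeps ambiguous sequences out of the subfamily counts.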
finally, emboss:cons was used to create consensus sequences of the different dispanin families that emerged from the phylogenetic analysis and of the bacterial sequences. the consensus sequences were aligned together with the m. brevicollis and the invertebrate sequences. the resulting msa was viewed and trimmed in jalview and the conserved region around the tm helices was kept. splice sites were detected in the eukaryotic dispanins by studying their annotation in the respective databases and aligning them to their genome using blat at the ucsc genome browser website (http://genome.ucsc.edu). table s1. a record of all identified dispanins together with their accession numbers, nomenclature and species. (xls) trends in the exploitation of novel drug targets mapping the human membrane proteome: a majority of the human membrane proteins can be classified according to function and evolutionary origin the g-protein-coupled receptors in the human genome form five main families. phylogenetic analysis, paralogon groups, and fingerprints structural diversity of g protein-coupled receptors and significance for drug discovery the solute carrier (slc) complement of the human genome: phylogenetic classification reveals four major families transcriptional and posttranscriptional regulation of interferon-induced gene expression in human cells the ifitm proteins inhibit hiv-1 infection the ifitm proteins mediate cellular resistance to influenza a h1n1 virus, west nile virus, and dengue virus identification of five interferon-induced cellular proteins that inhibit west nile virus and dengue virus infections distinct patterns of ifitm-mediated restriction of filoviruses, sars coronavirus, and influenza a virus identification of the ifitm family as a new molecular marker in human colorectal tumors characterization of the osteoblast-specific transmembrane protein ifitm5 and analysis of ifitm5-deficient mice ifitm/mil/fragilis family proteins ifitm1 and ifitm3 play distinct
roles in mouse primordial germ cell homing and repulsion bril: a novel bone-specific modulator of mineralization timetree: a public knowledge-base of divergence times among organisms capucin: a novel striatal marker down-regulated in rodent models of huntington disease biogps: an extensible and customizable portal for querying and organizing gene annotation resources characterization of tusc5, an adipocyte gene co-expressed in peripheral neurons molecular characterization of the tumor suppressor candidate 5 gene: regulation by ppargamma and identification of tusc5 coding variants in lean and obese humans concerted evolution: molecular mechanism and biological implications transcriptional profiling of the human monocyte-to-macrophage differentiation and polarization: new molecules and patterns of gene expression normal germ line establishment in mice carrying a deletion of the ifitm/fragilis gene family cluster palmitoylome profiling reveals s-palmitoylation-dependent antiviral activity of ifitm3 the pfam protein families database a new generation of homology search tools based on probabilistic inference recent developments in the mafft multiple sequence alignment program jalview version 2-a multiple sequence alignment editor and analysis workbench mrbayes: bayesian inference of phylogenetic trees raxml-vi-hpc: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models dendroscope: an interactive viewer for large phylogenetic trees sequence and structure-based prediction of eukaryotic protein phosphorylation sites prediction, conservation analysis, and structural characterization of mammalian mucin-type oglycosylation sites a hidden markov model for predicting transmembrane helices in protein sequences key: cord-005432-mqyvpepo authors: ma, z; mi, z; wilson, a; alber, s; robbins, pd; watkins, s; pitt, b; li, s title: redirecting adenovirus to pulmonary endothelium by cationic liposomes date: 2002-02-22 journal: gene ther doi: 
10.1038/sj.gt.3301636 sha: doc_id: 5432 cord_uid: mqyvpepo somatic gene transfer to the pulmonary endothelium may be a useful strategy for modifying the phenotype of endothelium and/or vascular smooth muscle in disorders such as primary pulmonary hypertension, ards or pulmonary metastatic disease. adenoviral (ad) vectors, although highly efficient in liver gene transfer, have proven to be limited for pulmonary gene transfer with respect to efficiency, in part because of difficulty in assuring significant residence time in the lung and/or paucity of receptors for adenovirus on the endothelium. a recent study has shown that the use of a bispecific antibody to endothelial cells and ad vectors efficiently redirects ad vectors to pulmonary endothelium and improves gene expression in the lung. in this study, we report that pulmonary gene transfer by ad vectors can also be improved significantly via the use of cationic liposomes. preinjection of cationic liposomes followed by adenovirus led to a significant increase in the level of gene expression in the lung. the improvement in pulmonary gene transfer was associated with a decrease in the level of gene expression in the liver. gene expression in the lung lasted for up to 2 weeks. this protocol, together with genetic modification of adenovirus, may prove to be useful for pulmonary gene transfer for the treatment of pulmonary diseases. this method may also be extended to pulmonary gene transfer using other types of viral vectors via vascular route. adenoviral (ad) vectors possess many attributes that favor their use in gene therapy, particularly their high efficiency. 1 one of the key limitations to the use of ad vectors is the restricted tropism of the virus. systemic administration of ad vectors leads to gene expression mainly in the liver. 
2 pulmonary gene transfer by ad vectors via the vascular route has proven to be limited with respect to efficiency, in part because of difficulty in assuring significant residence time in the lung and/or paucity of receptors for adenovirus on the endothelium. two approaches have been proposed to overcome the problem of restricted tropism: one is to genetically modify ad vectors to render them target cell-specific [3] [4] [5] and the other one is to use bispecific conjugates to redirect ad vectors to target cells. [6] [7] [8] [9] several groups have reported selective delivery of ad vectors to various types of cells via genetic modifications of ad vectors. [3] [4] [5] thus far, there is no successful report of targeted delivery of ad vectors to pulmonary endothelium using this approach. the concept of targeted delivery of ad vectors to pulmonary endothelium has been nicely demonstrated in a study by reynolds et al 8 using a bispecific antibody. gene therapy (2002) 9, 176-182. doi: 10.1038/sj/gt/3301636 the complexing of ad vectors with a bispecific antibody to ad vectors and angiotensin-converting enzyme (ace) leads to a significant improvement in the infection of endothelial cells that overexpress ace receptors.
systemic administration of the complexed ad vectors also leads to improvement in pulmonary gene transfer. 8 more importantly, the enhancement in pulmonary gene transfer was associated with a decrease in gene expression in the liver, although the level of gene expression in the liver is still substantially higher than that in the lung. 8 in this study, we have shown that pulmonary gene transfer by ad vectors can also be significantly improved via preinjection of cationic liposomes followed by ad vectors. this protocol, together with genetic modification of adenovirus, may prove to be useful for pulmonary gene transfer for the treatment of pulmonary diseases. this method may also be extended to pulmonary gene transfer using other types of viral vectors via the vascular route. we recently showed that preinjection of cationic liposomes followed by plasmid dna led to efficient gene expression in the lung. 10 to test whether preinjection of cationic liposomes can also enhance ad vector-mediated pulmonary gene transfer, groups of six mice received tail vein injection of various amounts of dotap:cholesterol liposomes followed by injection of adcmvluc. gene expression was assayed 3 days following the injection. when adcmvluc alone was used, the liver had the highest level of gene expression (figure 1b). a low level of gene expression was found in the lung and other organs. there was about a 100- to 1000-fold difference in the level of gene expression between the liver and lung (figure 1b). preinjection of cationic liposomes significantly improved gene expression in the lung. as shown in figure 1a, increasing the amount of cationic liposomes from 0.3 to 0.9 μmol/mouse led to a steady increase in the level of gene expression in the lung. at the dose of 0.9 μmol dotap per mouse, an eight- to 40-fold increase in gene expression was observed in seven different experiments.
a further increase in dotap dose was not associated with a further increase in gene expression in the lung. interestingly, the improvement in gene expression in the lung was associated with a dramatic decrease in gene expression in the liver (figure 1b). several studies have shown that complexation of ad vectors with cationic lipids/polymers improves infection of a number of cell lines in vitro [11] [12] [13] [14] [15] [16] and airway epithelial cells following intratracheal instillation in mice. 12 to examine whether lipid complexation can similarly enhance pulmonary gene transfer by ad vectors via the vascular route, adcmvluc was mixed with various amounts of dotap:cholesterol liposomes and gene expression was assayed 3 days following the injection of the complexed adcmvluc. in contrast to sequential injection, premixing of cationic liposomes with adcmvluc resulted in decreased gene expression in all major organs examined (figure 2). complexation of cationic lipids with adcmvluc had minimal effects on infection of cultured mouse lung endothelial cells (mlec) at low doses and inhibited gene expression at relatively high doses (data not shown). figure 3 shows gene expression in liver and lung 3 days following injection of various amounts of adcmvluc, alone or following preinjection of dotap:cholesterol liposomes (0.9 μmol/mouse). increasing the amount of adcmvluc from 1 to 4 × 10^10 particles/mouse led to a significant increase in gene expression in the lung. further increase in adcmvluc dose was associated with a much smaller increase in gene expression in the lung. preinjection of cationic liposomes improved gene expression in the lung at all adcmvluc doses examined. increasing the virus dose also led to an increase in gene expression in the liver. however, in contrast to the lung, preinjection of cationic liposomes either had no effect or inhibited gene expression in the liver.
figure 4 shows gene expression in lung and liver at days 3, 7 and 14 following sequential injection of dotap:cholesterol liposomes (0.9 μmol/mouse) and adcmvluc (4 × 10^10 particles/mouse). gene expression appears to peak at day 7 and decline gradually thereafter. however, a significant level of expression could still be detected at day 14. a similar pattern was noticed when adcmvluc alone was injected into mice, although the level of gene expression in the lung was lower compared with the sequential protocol (data not shown). having defined the optimal conditions with adcmvluc, the sequential injection protocol was further evaluated using adcmvlacz as a reporter gene. mice received intravenous injections of dotap:cholesterol liposomes followed by adcmvlacz. seventy-two hours after injection, mice were killed and lungs were fixed and stained for β-galactosidase activity using x-gal at 37°c. no blue cells were observed in the lungs of mice treated with a control vector (adcmvluc) (figure 5a). in contrast, there was localized gene expression throughout the distal lung of mice that received adcmvlacz (figure 5b). the primary loci of lacz expression appeared to be the capillary endothelium located within the alveolar septum (figure 5b). no sign of inflammation was noticed in any of the lung sections examined. the cell type of transfected cells was further analyzed by anti-platelet endothelial cell adhesion molecule (pecam) immunofluorescence staining of lung sections of mice receiving cationic liposomes followed by adcmvegfp. pecam is a transmembrane adhesion molecule expressed at high levels on endothelial cells. as shown in figure 6a and b, egfp signal was substantially co-localized with pecam labeling, confirming that endothelial cells were the major cell type transfected. we hypothesized that the improved pulmonary gene transfer via the sequential injection protocol might be due to an increased uptake of ad vectors by the lung compared with i.v.
injection of ad vectors alone. to test this hypothesis, mice received tail vein injections of adcmvluc, alone or following the injection of dotap:cholesterol liposomes. at days 1 and 7 following the injection, mice were killed and lungs were perfused and collected. the amount of luciferase cdna in mouse lungs was evaluated by a semi-quantitative pcr. as shown in figure 7, the amount of luciferase cdna was slightly decreased from day 1 to day 7. however, at both timepoints, greater amounts of luciferase cdna were detected in the lungs of mice that received sequential injections of cationic liposomes and adcmvluc than in the lungs of mice that received adcmvluc alone. recent studies have shown that systemic administration of cationic liposome/dna complexes is associated with an acute proinflammatory cytokine response, 17-20 which can be partially overcome via the sequential protocol. 10 to examine whether a similar cytokine response also we have shown in this study that preinjection of cationic liposomes followed by ad vectors significantly improves gene expression in the lung (figure 1a). cationic liposomes enhance pulmonary gene transfer by ad vectors in a dose-dependent manner. the major cells that are infected appear to be endothelial cells (figures 5 and 6) and gene expression lasts for up to 2 weeks (figure 4). the improvement in pulmonary gene transfer is associated with a decrease in the level of gene expression in the liver (figure 1b). based on cytokine response, this protocol is safe and the liposomes used do not add to the ad vector-related toxicities (figure 8). the sequential injection protocol was initially described by song and colleagues 21 to address the roles of cationic liposomes in cationic lipid-mediated pulmonary gene transfer via the vascular route.
recently, tan et al 10 have extended these studies by showing that pulmonary gene transfer via the sequential injection protocol is even more efficient than that via cationic liposome/dna complexes. moreover, the sequential injection protocol overcomes, at least partially, the cpg-related proinflammatory cytokine response that is associated with systemic administration of cationic liposome/dna complexes. the mechanism of pulmonary gene transfer via the sequential protocol is not clearly understood at present. it might be due to a transient slow-down of pulmonary microcirculation by preinjected liposomes, which allows for efficient interaction of subsequently injected dna with the target cells (endothelial cells). the improved lung gene transfer by ad vectors via the sequential protocol might be due to a similar mechanism. this was supported by the observation that more viral dna was detected in lungs of mice that received sequential injection of cationic liposomes and ad vectors than in lungs of mice that received ad vectors alone (figure 7). the increased pulmonary uptake of ad vectors may play an important role in improving the efficiency of gene expression in the lung. ad vectors are likely to be taken up by endothelial cells in free form. a number of studies have shown that complexation of ad vectors with cationic lipids improves infection of cultured cells, especially those cells that lack the receptors for ad virus. [11] [12] [13] [14] [15] [16] complexation of ad vectors with cationic liposomes has also been shown to enhance the infection of airway epithelial cells upon instillation into mouse trachea. 12 cationic liposomes enhance ad vector-mediated infection mainly by improving the cellular uptake. however, mixing of adcmvluc with dotap:cholesterol liposomes had minimal effect on infection of cultured mouse lung endothelial cells (mlec) despite an increased cellular uptake of ad vectors (data not shown).
at high doses, dotap:cholesterol liposomes inhibited infection of mlec in a dose-dependent manner (data not shown). similarly, mixing of dotap:cholesterol liposomes with adcmvluc led to a decreased level of gene expression in the lung (figure 2 ). these differences might reflect the different cellular barriers in ad-mediated infection in different cell types. mlec might lack an efficient mechanism in mediating the release of ad vectors into cytosol following cellular uptake of cationic lipid-complexed ad vectors. despite an improved cellular uptake, cationic lipids may inhibit the subsequent release of ad vectors into cytosol, thus resulting in an overall decreased level of gene expression. more studies are needed to better understand the mechanism of infection of mlec via ad vectors, alone or complexed with cationic liposomes. several approaches have been proposed to redirect ad vectors to pulmonary endothelium. these include genetic modification to render the vectors endothelium-specific and the use of bispecific conjugate. thus far, there has been no successful report of improving ad-mediated infection of pulmonary endothelium via genetic modification of ad vectors. redirecting ad vectors to pulmonary endothelium has been demonstrated using a bispecific conjugate. 8 our study provides a different approach that is also highly efficient in improving pulmonary gene transfer by ad vectors via the vascular route. the advantages of our method lie in its simplicity. cationic liposomes can be prepared in large quantities in a cost-effective manner. this protocol requires no modification of ad vectors, therefore it can be used for different ad vectors containing different therapeutic genes. this protocol might be combined with other approaches to achieve a synergistic effect in lung targeting. this method may also be extended to pulmonary gene transfer using other types of viral vectors that carry fewer side-effects than ad vectors. 
it should be noted that while the sequential protocol brings about a significant improvement in pulmonary gene transfer and a concomitant decrease in the level of gene expression in the liver, the absolute level of gene expression in the lung is still lower than that in the liver. a similar phenomenon was also observed in the study by reynolds et al 8 using a bispecific antibody. further improvements in lung targeting may require the combination of several different approaches and/or the inclusion of an endothelium-specific promoter. these possibilities are currently under investigation in this laboratory. chemicals 1,2-dioleoyl-3-trimethylammonium-propane (dotap) and dioleoylphosphatidyl-ethanolamine (dope) were purchased from avanti lipids (alabaster, al, usa). cholesterol was obtained from sigma (st louis, mo, usa). luciferase assay kit was from promega (madison, wi, usa). other chemicals were of reagent grade. for in vivo experiments, female cd-1 mice aged 4-6 weeks (16-18 g) were purchased from charles river laboratories (wilmington, ma, usa). the animals were kept at the university of pittsburgh animal facilities. all experiments were conducted under protocols approved by the institutional animal care and use committee. adcmvluc is a recombinant e1-, e3-deleted ad vector expressing firefly luciferase under the control of the cytomegalovirus (cmv) promoter. adcmvlacz is a recombinant e1-, e3-deleted ad vector expressing escherichia coli β-galactosidase (lacz) under the control of the cmv promoter. adcmvegfp is a recombinant e1-, e3-deleted ad vector expressing enhanced green fluorescent protein (egfp). they were propagated in a permissive 293 cell line, purified by centrifugation through cesium chloride gradients and plaque titered on 293 cells by standard techniques. liposomes containing dotap and cholesterol in a 1:1 molar ratio were prepared as follows.
the lipid mixture in chloroform was dried under a stream of nitrogen as a thin layer in a 100-ml round-bottomed flask, which was further desiccated under vacuum for 2 h. the lipid film was hydrated in 5% dextrose in water to give a final concentration of 10 mg dotap/ml. preparation of small unilamellar vesicles by extrusion was performed as follows. the lipid solution was briefly sonicated, followed by incubation at 50°c for 10 min and then sequentially extruded through polycarbonate membranes with the following pore sizes: 1.0, 0.6 and 0.2 μm. the size of liposomes was around 150 nm as measured by dynamic laser scattering using a coulter n4sd particle sizer (hialeah, fl, usa). if not otherwise indicated, the amounts of dotap:cholesterol liposomes and ad vector per mouse were 900 nmol and 4 × 10^10 particles, respectively. all of the dilutions were made in saline. for sequential injection, groups of six mice received first an injection of 150 μl liposome and then 150 μl adenovirus via the tail vein. control mice received adenovirus in 300 μl saline. in a separate experiment, adenovirus was mixed with various amounts of dotap:cholesterol liposomes and the mixtures were incubated at room temperature (rt) for 10 min before injection. assay for luciferase activity at 3 days or the indicated times following injection, mice were killed and organs were homogenized in 1 ml of ice-cold lysis buffer (0.05% triton x-100, 2 mm edta, and 0.1 m tris, ph 7.8) with a tissue tearer (biospec products, bartlesville, ok, usa) for 20 s at high speed. the homogenates were then centrifuged at 14 000 g for 10 min at 4°c. ten μl of the supernatant was analyzed with the luciferase assay system (promega) using an automated lb953 luminometer (berthold, bad wildbad, germany). the protein content of the supernatant was measured with the biorad protein assay system (biorad, hercules, ca, usa). luciferase activity was expressed as relative light units (rlu) per milligram of tissue protein.
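the normalization of luciferase activity to tissue protein can be written out as a short python sketch. it assumes that the protein assay reports a concentration in mg/ml and that the same 10 μl aliquot of supernatant is the basis for the luminometer reading; the function name, parameter names and the example values are hypothetical and do not come from the study.

```python
def rlu_per_mg_protein(rlu: float,
                       protein_mg_per_ml: float,
                       assay_volume_ul: float = 10.0) -> float:
    """Normalize a luminometer reading to the protein in the assayed aliquot."""
    # Protein mass (mg) contained in the assayed volume of supernatant.
    protein_mg = protein_mg_per_ml * assay_volume_ul / 1000.0
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return rlu / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: 50 000 rlu from 10 ul of a 2 mg/ml supernatant.
    print(f"{rlu_per_mg_protein(50_000, 2.0):.3g} rlu/mg protein")
```

expressing activity per milligram of protein makes readings from organs of different size and homogenization yield comparable.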
mice received tail vein injection of dotap:cholesterol liposomes followed by adcmvlacz. three days later, mice were killed and lungs were perfused intravascularly with 2% paraformaldehyde in pbs and inflated with this fixative to near total lung capacity for 1 h at room temperature. after rinsing with cold pbs, the lungs were incubated in x-gal staining solution (invitrogen, carlsbad, ca, usa) overnight at 37°c. the lungs were then embedded in paraffin and thin sections were prepared. the sections were counterstained with eosin and viewed with a nikon light microscope. mice were injected with dotap:cholesterol liposomes followed by adcmvegfp. three days later, mice were killed and lungs were perfused intravascularly with pbs followed by 2% paraformaldehyde in pbs, and inflated with this fixative to near total lung capacity. the lungs were rinsed with cold pbs and immersed in 30% sucrose in pbs at 4°c overnight. the lungs were then quickly frozen in oct with dry ice. five-micrometer lung cryosections were then cut. following three washes in pbs containing 0.5% bovine serum albumin and 0.15% glycine (pbg buffer), sections were incubated in a 1:100 dilution of rat anti-mouse pecam (pharmingen, san diego, ca, usa) for 1 h at rt, washed with pbg three times, and labeled with cy3-labeled goat anti-rat igg (jackson immunoresearch laboratories, west grove, pa, usa) for 1 h at rt. following three further washes with pbg, the sections were stained with hoechst dye 33258 (sigma) for 30 s and mounted in gelvatol (monsanto, st louis, mo, usa). cells were visualized using an olympus provis microscope (olympus, tokyo, japan) using a triple pass (blue/green/red) cube, which allows excitation at 384 nm and collection at 540 nm. images were collected using an optronics magnifier camera (santa barbara, ca, usa) or with a leica tcs nt confocal microscope with a 60× oil immersion objective at 1024 × 1024 pixel resolution.
mice received tail vein injections of adcmvluc, alone or following dotap:cholesterol liposomes. at days 1 and 7 following injection, the mice were killed and the lungs were collected. genomic dna in lungs was extracted using the dneasy tissue kit (qiagen, valencia, ca, usa). six hundred ng of dna was analyzed by the mastercycler gradient pcr (eppendorf, westbury, ny, usa), using primers specific for the luciferase gene that generate a 510-bp fragment. primers were synthesized by mwg-biotech (high point, nc, usa) and their sequences were cgtcacatctcatctacctc (luc-1) and gtatccctggaagatggaag (luc-2), respectively. after an initial denaturation for 5 min at 95°c, 30 cycles of amplification were performed using cycle times of 45 s at 95°c, 45 s at 58°c and 1 min at 72°c. a 5-min extension step at 68°c followed the pcr amplification. amplification products were analyzed by electrophoresis on a 2% agarose gel.

at 4 h following injection, mice were bled from the retroorbital sinuses under anesthesia. the blood was allowed to stay at 4°c for 4 h and then centrifuged at 14 000 g for 10 min at 4°c. serum was collected and stored at -80°c until analyzed. in addition, the lungs of the mice were perfused with pbs intratracheally and bronchoalveolar lavage fluid was then collected. the concentrations of cytokines (tnf-α, il-12 and il-6) in serum or bronchoalveolar lavage fluid were determined with the specific cytokine immunoassay kit (r&d systems, minneapolis, mn, usa).

data were expressed as means ± standard deviations and analyzed by the two-tailed unpaired student's t test using the prism software program (graphpad software, san diego, ca, usa). data were considered significant if p < 0.05 (*) and very significant if p < 0.01 (**).
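the statistical treatment described above (mean ± standard deviation, two-tailed unpaired student's t test, thresholds of 0.05 and 0.01) can be sketched in a few lines. the study used the prism program; the code below is a stand-in written only with the python standard library, all function names are hypothetical, equal group variances are assumed, and the p value is obtained by simpson integration of the t density rather than by whatever routine prism uses.

```python
import math
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of one group."""
    return statistics.mean(values), statistics.stdev(values)


def pooled_t(a, b):
    """Unpaired Student's t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value from the t density, integrated with Simpson's rule.

    steps must be even; math.gamma limits this sketch to small df.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3          # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def stars(p):
    """Map a p value onto the significance marks used in the text."""
    return "**" if p < 0.01 else "*" if p < 0.05 else "ns"
```

with six mice per group, as in the sequential injection experiments, the test has 10 degrees of freedom, so a t statistic of about 2.23 sits at the 0.05 threshold.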
this work was supported by nih grants hl63080 (to s li), hl32154 and gm53789 (to b pitt), and hl66949 and ar62225 (to p robbins).

key: cord-018924-wo42j0ps authors: nettelbeck, dirk m. title: bispecific antibodies and gene therapy date: 2011-07-01 journal: bispecific antibodies doi: 10.1007/978-3-642-20910-9_18 sha: doc_id: 18924 cord_uid: wo42j0ps gene therapy is the transfer of therapeutic genes, via gene transfer vectors, into patients for therapeutic purposes. different gene therapy strategies are being pursued, including long-term gene correction of monogenetic diseases, eradication of tumor cells in cancer patients, or genetic vaccination for infectious diseases. bispecific antibodies and gene therapy are connected in two ways. first, bispecific antibodies are tools of interest for the development of targeted gene transfer vectors. different gene therapy strategies require different vectors, frequently replication-ablated viruses. similar to the role of antibody engineering in antibody therapy, the engineering of gene transfer vectors has become key to the implementation of genetic therapies. 
cytoablative cancer gene therapy and efficient genetic vaccination, for example, depend on vectors that are targeted to cancer cells and antigen-presenting cells, respectively, in order to avoid side effects and vector sequestration. to this end, bispecific antibodies have been engineered as adapters that link the vector to a specific molecule on the targeted cell and at the same time block the interaction with the native virus receptor. different formats of bispecific antibodies and related molecules have been developed and succeeded in re-directing vectors to target cells in vitro and in vivo. these adapters also improved gene therapies in animal models. second, gene transfer is a promising tool for delivery of bispecific antibodies to patients. therefore, vectors can be injected directly into patients for antibody gene transfer, or cells isolated from patients can be genetically modified in vitro and then re-injected for in vivo antibody production. genetic antibody delivery, compared with standard antibody injection, can be advantageous with respect to achieving persistent antibody titers or effective antibody biodistribution in patients. initial studies have shown antibody production and therapeutic activity in animal models, setting the stage for more widespread investigations. moreover, gene therapy can enable novel therapeutic applications for bispecific antibodies by facilitating the delivery of membrane associated or intracellular antibody formats. gene therapy is the transfer of genes into patients' cells for therapeutic purposes ( fig. 18.1 ). gene therapy was originally envisioned as a cure for inherited (monogenetic) diseases by gene correction, i.e., by replacing or complementing the causative mutated gene with a functional copy. in recent decades, however, gene therapy has been intensively investigated for treatment of many diseases by transfer of diverse classes of therapeutic genes from various species (table 18 .1). 
examples are genes encoding pathogen antigens for prevention or treatment of infectious diseases (genetic vaccination); genes encoding agonists or antagonists of vascular growth factors for treatment of cardiovascular diseases; or genes that directly or indirectly mediate tumor cell killing for cancer treatment. gene therapy drugs consist of the therapeutic gene, which defines the mode of therapeutic action, and the gene transfer vector, which needs to facilitate appropriate stability, delivery, and expression of the therapeutic gene ( fig. 18 .1). indeed, major efforts in gene therapy research focus on vector development, since the delivery of therapeutic genes is complex and critically determines treatment efficacy. since the 1990s a multitude of gene therapy clinical trials have been performed with thousands of patients and therapeutic efficacy was demonstrated recently. examples are the restoration of immunity in scid patients, restoration of some degree of vision in childhood blindness or inhibition of neurodegeneration (kohn 2010; roy et al. 2010; cartier et al. 2009 ). however, most gene therapy approaches necessitate improved efficacy or selectivity of gene transfer in order to facilitate successful applications in patients. to ensure proper expression of the therapeutic gene in the patients' cells, a gene therapy vector contains a promoter and a transcription termination/polyadenylation signal ( fig. 18.1 ). further regulatory elements can be exploited, for example, to achieve enhanced (introns) or bicistronic (internal ribosome entry sites, ires) gene expression. importantly, regulatory elements can be exploited for spatial or temporal control of gene expression. examples are inducible or cell type-specific promoters or sequences differentially regulating mrna stability or translation efficiency (goverdhana et al. 2005; dorer and nettelbeck 2009; brown and naldini 2009) . 
to improve stability of the therapeutic dna, these eukaryotic expression cassettes are inserted either into circular plasmids, which might be further packaged by non-viral vectors, or into genomes of replication-deficient viral vectors ( into hematopoietic stem cells for treatment of inherited immunodeficiencies (kohn 2010). in contrast, transient gene transfer is usually sufficient for genetic vaccination or cytoablative cancer therapy. for the latter, however, efficient gene transfer is pivotal and thus vector choice is determined by transduction efficiency. in this regard, conditionally replication-competent viral vectors have been recently engineered allowing for vector spread in tumors and thus amplified gene transfer (parato et al. 2005; cody and douglas 2009). such replication-competent vectors also mediate tumor cell lysis by virus replication, termed oncolysis or virotherapy. hence, from the perspective of the virotherapist, insertion of therapeutic genes into the genome of oncolytic viruses is a strategy to complement oncolysis with gene therapy ("arming" of oncolytic viruses).

fig. 18.1 for gene therapy, a therapeutic gene is delivered into the patient's cell, where the gene product is expressed and mediates therapeutic activity. examples are the complementation of the patient's genetic defects or the killing of cancer cells. for delivery and expression, the therapeutic gene is incorporated into a gene transfer vector containing regulatory sequences (e.g., for transcription start and termination). frequently, the vector is also a means for efficient gene delivery into the patient's cells (e.g., replication-deficient viruses). right panels: bispecific antibodies can be either a tool for targeting gene transfer vectors to specific cell types (1) or gene transfer can be exploited as a tool for antibody therapy by antibody gene transfer and subsequent synthesis of the antibody in the patient (2)

singer and verma (2008)
many gene therapy applications require the restriction of gene transfer to specific cells. this is obvious for cytoablative gene therapy and for replication-competent vectors. also effective genetic vaccination can depend on gene transfer into appropriate immune cells, as antigen expression in the wrong cells can trigger tolerance rather than immunity. consequently, vector targeting is a major challenge for gene therapy research. targeted gene therapy (or viral replication) can be achieved by inserting cell-binding ligands into the gene transfer vector for targeted cell entry, or by post-entry regulation of therapeutic gene expression using appropriate regulatory sequences, as mentioned above. bispecific antibodies and gene therapy are connected in two ways. first, bispecific antibodies have been developed as promising tools for targeting cell entry of gene transfer vectors: as adapter molecules they link the vector to a marker molecule (specifically) expressed on the target cell surface ( fig. 18.1) . second, gene therapy can be an alternative means for delivery of therapeutic antibodies to patients, i.e., by antibody production in the patients' cells (genetic antibody therapy, fig. 18 .1). besides genetic delivery of (established) soluble antibodies, such antibody gene transfer can also facilitate new applications for (bispecific) antibody therapy, for example by expression of membrane-bound or intracellular derivatives. certainly combination therapies of bispecific antibodies and gene therapy can also be envisioned. bispecific antibodies have been exploited in gene therapy as tools to direct viral gene transfer vectors to diseased cells. therefore, an antibody with specificity for a viral surface protein is linked to a second antibody that binds to a cell surface molecule of interest, thus implementing an adapter molecule that binds the gene transfer vector to the target cell (figs. 18.1 and 18.2). 
such modification of virus tropism is required when virus receptor expression is lacking on target cells, preventing gene transfer, or when widespread expression of the native virus receptor on healthy cells leads to adverse side effects and vector sequestration. for the latter, either the viral attachment proteins have been mutated without losing their affinity for the adapter, or the receptor-binding domain of the virus attachment protein is shielded by the adapter. the resulting loss of virus tropism for healthy cells is termed de-targeting. binding of and entry into target cells in both cases is mediated by the target of the cell-binding moiety of the adapter (re-targeting). important advantages of the adapter strategy are (a) it does not require modifications to the virus structure, which might well turn out to be detrimental for vector assembly, stability, or activity; (b) it is flexible as vector binding to any target molecule, to which an antibody can be raised, is possible by exchange of the adapter's cell-binding moiety and (c) once an effective adapter for a specific vector has been generated, it can be used for transfer of any therapeutic gene by corresponding derivatives of this vector. bispecific antibodies as adapters for targeting gene transfer have been most intensively investigated with adenoviral vectors.

fig. 18.2 bispecific antibodies as tools for targeting gene therapy. bispecific adapters binding to the gene transfer vector via one specificity and to a cell surface molecule with the other are used for delivering therapeutic genes to specific cell types. this strategy is of interest to gene therapy in order to ensure targeted therapy and avoid side effects. these bispecific adapters might contain one antibody or antibody fragment (for either vector or cell binding). alternatively, they can be bispecific antibodies: chemical conjugate, diabody or tandem scfv
adenoviruses (ads, mcconnell and imperiale 2004) possess a double-stranded linear dna genome covered by a protein capsid, but not a lipid envelope. the receptor-binding spike of the adenoviral capsid, made of the trimeric fiber protein, is responsible for attachment to host cells by binding to the virus receptor, which is the coxsackie-adenovirus receptor (car) for the mostly used human ad serotype 5 (hadv-5). virus internalization into the host cell is then mediated by a secondary interaction of a different virus capsid protein, the penton base, with cellular integrins. by separating cell binding from entry, this two-step mechanism facilitates a high degree of flexibility for the nature of initial attachment of ad vectors to cells. after entry of the vector into the cell, the virus genome is transferred to the nucleus, where viral genes are expressed from the episomal genome. likewise, therapeutic genes are expressed after transfer of ad vector genomes into patients' cells. therefore, essential viral genes are replaced with the therapeutic gene, rendering the vector replication-deficient. more recently, therapeutic genes have been inserted into replication-competent ads (mcconnell and imperiale 2004). ads represent prominent gene therapy vectors, as they are stable, can be produced at high titers, possess an effective gene transfer machinery, and are only mildly pathogenic (hadv-5 causes common cold symptoms) (mcconnell and imperiale 2004). they have been the most frequently used viral vectors in clinical gene therapy trials (journal of gene medicine clinical trials database). these trials have revealed a favorable safety profile of ad vectors in patients. cancer gene therapy and genetic vaccination are the regimens where ad vectors are widely used. one therapeutic approach in cancer gene therapy is molecular chemotherapy, also termed gene-dependent enzyme prodrug therapy (gdept). 
this strategy is based on transfer of a gene encoding a prodrug-activating enzyme, which activates a harmless prodrug into an effective chemotherapeutic drug (portsmouth et al. 2007 ). the rationale for this strategy is that tumor-restricted prodrug activation should facilitate effective concentrations of the chemotherapeutic drug in the tumor, which cannot be achieved by conventional systemic infusion of the drug due to dose-limiting side effects. gdept and other cytoablative cancer gene therapies depend on tumor-selective gene transfer which is not provided by unmodified hadv-5 or other ad serotypes due to widespread expression of ad receptors also on healthy cells. ads are also frequently used as vectors for genetic vaccination, which is most efficient when the antigen gene is transduced into professional antigen-presenting cells (apcs), which provide the proper signals for activation of immune effector cells. dendritic cells (dcs) are the most effective apcs, but are difficult to transduce. though ads are the most effective gene transfer vectors for dcs, high vector titers are required for efficient dc transduction because of low expression of car. antibodies are attractive binding molecules for targeting gene transfer vectors based on their high affinity, specificity, and the opportunity to generate antibodies with specificity for virtually any cell surface target molecule. three strategies have been pursued for insertion of targeting ligands into viral gene transfer vectors: genetic fusion to viral capsid or envelope proteins, complexing with bispecific adapters, or chemical linkage. major drawbacks of the genetic and chemical strategies are that they are tedious and often interfere with viral functions. moreover, genetic insertion of antibodies into the adenoviral capsid is hampered by the incompatibility of biosynthesis of capsid and antibody molecules. 
ad capsid proteins are synthesized in the cytosol and transferred to the nucleus where viral particle assembly takes place, whereas antibodies are produced via the secretory pathway, which ensures their proper folding. consequently, genetic fusion of antibodies to ad capsid proteins has been inefficient and limited to a few cytosolically stable antibody fragments (hedley et al. 2006; vellinga et al. 2007; poulin et al. 2010) . in contrast, synthesis of adapter molecules can be separated from virus production. moreover, the insertion of cell-binding antibodies into adapter molecules is less tedious than the engineering of a complete new virus genome and resulting adapters can be linked to any ad vector, allowing for better flexibility. for production of adapters, antibody fragments binding to ad capsid proteins, mostly the fiber, and antibodies or antibody fragments binding to cell surface target molecules of interest have been used (fig. 18 .2). they have been linked by chemical conjugation (see also: chap. 3) or by genetic fusion, the latter generating tandem scfvs or scdbs (see also: chap. 5). as an alternative to bispecific antibodies, adapters have been generated by linking virus-binding antibody fragments to cell-binding proteins or peptides, or by linking cell-binding antibody fragments to the soluble adenovirus receptor car. the adapter strategy for targeting cell entry of ad vectors has been pioneered by douglas and co-workers for targeting of folate receptor overexpressing tumor cells (douglas et al. 1996) . to this end, they chemically conjugated folate to the fab fragment of a neutralizing anti-ad fiber monoclonal antibody (mab). a fab fragment was used to avoid agglutination of ad vectors by bivalent antibodies. after complexation to the respective ad vector, the adapter mediated folatedependent transfer of a reporter gene or of cytoablative genetic prodrug activation to target cells. 
the fab fragment alone inhibited adenoviral transduction, which was expected as it was derived from a neutralizing antibody. thus the fab-folate adapter realized targeted gene transfer by both ablating virus binding to the native virus receptor and directing virus attachment to a novel cell surface molecule (fig. 18.1). wickham et al. described a bispecific antibody for directing ad cell binding to integrins. this adapter consisted of the integrin-binding mab chemically linked to a second mab with specificity for a peptide tag, which was engineered into the ad penton base (wickham et al. 1996). the conjugate mediated enhanced adenoviral transduction of human smooth muscle and endothelial cells, which were only modestly transduced by unmodified hadv-5 vectors. subsequently, various bispecific antibody conjugates were reported that consist of a fiber-binding fab fragment covalently linked to a cell-binding antibody or antibody fragment. such bispecific antibody conjugates have been reported to re-direct ad gene transfer to various cell types via binding to different cell surface molecules, including cd40, epcam, tag72, cd70, and ace (tillman et al. 2000; de gruijl et al. 2002; brandao et al. 2003; miller et al. 1998; haisma et al. 1999; israel et al. 2001; reynolds et al. 2000, 2001). these reports confirm the high flexibility of the adapter approach. for example, dcs, as professional antigen-presenting cells, represent targets of interest for gene transfer aiming at genetic vaccination for infectious or malignant diseases. conjugates of α-fiber fab and mabs binding to the dc surface molecule cd40 allowed for efficient ad gene transfer into mouse and human dcs (tillman et al. 2000). with this adapter, improved efficiency and selectivity of ad gene transfer to dcs was also achieved in situ using human skin explants (de gruijl et al. 2002). 
in addition to targeting ad entry, the α-fiber fab/α-cd40 mab adapter triggered dc activation, as required for efficient induction of immune responses, via its cd40-binding activity. accordingly, the adapter increased the efficiency of tumor vaccination with ad vector transduced dcs in an animal model (tillman et al. 2000). adapter targeting of ad vectors to cancer cells was demonstrated in cell culture studies with an egfr-binding α-fiber fab/mab conjugate for squamous cell carcinoma, glioblastoma, and osteosarcoma (miller et al. 1998; blackwell et al. 1999; barnett et al. 2002); with an epcam-binding fab/fab conjugate for various adenocarcinomas (haisma et al. 1999; heideman et al. 2001); with a tag-72-binding α-fiber fab/mab conjugate for ovarian cancer (kelly et al. 2000); and with a cd70-binding α-fiber fab/mab conjugate for b cell lymphomas (israel et al. 2001). as car-expression varies on cancer cells, adapters frequently mediated markedly enhanced transduction of cancer cells. yet another type of antibody-based adapter conjugate has been generated by linking α-fiber fab fragments to basic fibroblast growth factor for targeting of various cancer cells (fgf2, goldman et al. 1997; rogers et al. 1997; rancourt et al. 1998), to a synthetic lung-homing peptide (trepel et al. 2000), or to the hc-fragment of tetanus toxin for targeting neuronal cells (schneider et al. 2000). recombinant bispecific adapter molecules possess attributes that are advantageous for application in vector targeting when compared with chemical conjugates. foremost, they can be produced by standardized procedures of prokaryotic or eukaryotic expression yielding well-defined molecules. both tandem single chain variable fragments (tandem scfvs, see also: chap. 5) and single chain diabodies (scdbs, see also: chap. 5) have been used for targeting ad gene transfer. 
haisma and co-workers demonstrated that ad transduction of glioblastoma and carcinoma cells can be increased by complexing the virus with a recombinant tandem α-fiber/α-egfr scfv (haisma et al. 2000). our group reported in 2001 that a scdb with specificities for the ad fiber and endoglin, which is expressed on proliferative endothelium, facilitated targeted transduction of endothelial cells (nettelbeck et al. 2001). in contrast to the tandem scfv, which was expressed in eukaryotic cells, the scdb was produced in bacteria. ad transduction was also targeted to gastric cancer cells with a tandem scfv adapter binding to epcam (heideman et al. 2002), to dcs with a cd40-binding tandem scfv adapter (brandao et al. 2003), to melanoma cells using a scdb adapter binding the melanoma surface antigen hmwmaa, or to breast cancer cells with either a tandem scfv or a scdb binding to cea (korn et al. 2004). for improved de-targeting, "receptor-blind" ad mutants were combined with tandem scfv or scdb adapters that were derived from α-fiber scfvs that retained binding to mutant fibers (nettelbeck et al. 2004; carette et al. 2007). these ad vectors could not bind car, even when individual fiber molecules were not protected after complexation with adapters. in consequence, this strategy of combined genetic/immunological tropism-modification implements a further increase in selectivity of gene transfer. recombinant antibody-derived adapters for targeting adenoviral transduction were also obtained by fusion of α-fiber scfv to ligand proteins (egf or upar, watkins et al. 1997; harvey et al. 2010) or to ligand peptides (nicklin et al. 2000). alternatively, cell-binding scfvs (α-c-erbb2, α-cd40 or α-fcgri) were fused to monomeric or trimeric soluble car (pereboev et al. 2002; kim et al. 2002; sapinoro et al. 2007). 
such scar-derived adapters offer the advantage of improving affinity to fiber by scar trimerization; however, they naturally cannot bind to "receptorblind" fiber-mutant viruses. these strategies also demonstrated that the adapter, besides targeting gene transfer, might also influence the outcome of gene therapy in different ways: adenoviral gene transfer to dcs by cd40-binding adapters, but not by the fcgri-binding adapter resulted in dc activation, thus influencing the type of immune response (immunization versus tolerization, tillman et al. 1999; sapinoro et al. 2007 ). in vitro studies with adapter molecules, including various bispecific antibodies, have clearly proven that viral cell entry can be re-directed via novel cell surface receptors, thus reprogramming virus tropism. this has been demonstrated in established cell cultures, freshly purified normal and tumor cells and in tissue explants, as for the demonstration of dc-targeted gene transfer in skin explants (de gruijl et al. 2002) . what are possible applications of bispecific antibodies and other antibody-derived gene transfer adapters? first, due to their modular composition and the opportunity to rapidly (in comparison with genetically engineered viruses) produce new adapters by chemical or genetic means, they facilitate the analysis, comparison, and screening of cell surface molecules for their feasibility as targets for viral gene transfer. second, applications of adapters for ex vivo gene therapy are of interest. an example is genetic vaccination of cancer or infectious diseases by ex vivo gene transfer into dcs isolated from patients. gene therapy of inherited diseases by ex vivo gene transfer into hematopoietic (stem) cells is a further application. here, however, retroviral vectors are preferred over ad vectors, as they facilitate stable gene transfer and thus prolonged gene correction or replacement (table 18 .2). 
of note, adapter-targeting of retroviral gene transfer has been demonstrated recently (see below). most gene therapy applications, however, require in vivo gene transfer. for establishing adapters for targeting gene transfer in vivo, extensive studies on the stability, efficiency, and selectivity of adapter-vector complexes after in vivo application are needed. whereas rigorous studies for the evaluation of pharmacologic and therapeutic parameters of adapter-targeted gene transfer are still to be done, initial studies have shown efficacy of adapter-targeting in vivo. in an effort to facilitate gene therapy of pulmonary vascular disease, reynolds and colleagues investigated a fab-mab conjugate adapter that binds angiotensin-converting enzyme (ace) for targeting of ad gene transfer to the lung endothelium in rats (reynolds et al. 2001). by systemic application of adapter-bound or uncomplexed ad vector, it was shown that this adapter increased gene transfer to the lung by more than 20-fold. importantly, gene transfer was directed to endothelial cells. moreover, gene transfer to the liver, the organ responsible for most ad-induced side effects, was reduced more than 80%. hence, this study demonstrated both systemic stability of the adapter-vector complex and adapter-dependent vector de- and re-targeting in vivo. for the fab-fgf2 adapter, several studies in mice showed adapter-dependent reduction of liver transgene expression after systemic injection of ad vectors and reduced toxicity of ad-mediated genetic prodrug activation therapy. furthermore, this adapter increased therapeutic activity of ad-mediated genetic prodrug activation of peritoneal malignancies, when the ad vectors were injected intraperitoneally (rancourt et al. 1998; gu et al. 1999; printz et al. 2000). in vivo stability of adapter-vector complexes has also been demonstrated for recombinant proteins. 
trimeric, but not monomeric scar significantly blocked liver gene transfer by ads after systemic application of the scar-ad vector complex into mice (kim et al. 2002). however, in a different study, a scar-scfv adapter targeting cea also reduced liver transduction by ad vectors after systemic injection of adapter-virus complexes into mice (li et al. 2007). after systemic injection, this adapter also increased adenoviral transduction of cea-positive, but reduced transduction of cea-negative tumors that were grafted to mouse livers. furthermore, a trimeric derivative of the scar-cea adapter mediated improved targeting of adenoviral gene transfer in vitro and in vivo. in combination with transcriptional targeting using the cox-2 promoter, this trimeric adapter increased therapeutic activity and at the same time reduced liver toxicity of genetic prodrug activation therapy with hsv-tk/gcv (li et al. 2009). studies with scar-egf and trimeric scar-mcd40l confirm the re-targeting properties of scar-derived adapters in vivo (liang et al. 2004; huang et al. 2007). in addition to facilitating selective gene transfer, targeting adenoviral cell binding and entry is of interest also for improving oncolytic ads. toward this end, adapter molecules are of interest to re-direct the injected virus to target tumors. to also allow for targeting of progeny viruses of oncolytic ads produced in patients' tumors, genes encoding recombinant bispecific adapters have been inserted into the genome of these viruses. using a tandem scfv with specificity for the ad fiber and egfr, van beusechem and co-workers demonstrated increased viral spread and oncolysis in two- and three-dimensional tumor cell cultures (van beusechem et al. 2003; carette et al. 2007). although most widely investigated for ad vectors, adapters have been also shown to facilitate targeted cell entry of other viruses. 
adeno-associated viruses (aav) are small non-enveloped viruses that are frequently used for diverse gene therapy applications (buning et al. 2008). tropism-modification of aav vectors was achieved with a bispecific fab/fab antibody conjugate. the adapter with specificity for the virus capsid and for integrins facilitated gene transfer into megakaryocytes, which are not permissive to unmodified aav vectors (bartlett et al. 1999). for non-human coronaviruses, enveloped rna viruses that naturally do not enter human cells, infectivity for human cancer cells was established with a bispecific tandem scfv with specificities for a coronavirus surface glycoprotein and egfr (wurdinger et al. 2005a, b). the idea of this approach was to selectively kill tumor cells by lytic virus infection rather than viral gene transfer. similar results were obtained using a recombinant adapter built of soluble coronavirus receptor fused to the egfr-binding scfv (wurdinger et al. 2005a, b). newcastle disease virus, which is in development for viral oncolysis and gene therapy, has been re-targeted using a recombinant adapter built of a virus-binding scfv and il-2 (bian et al. 2005, 2006). retroviruses are enveloped rna viruses, which insert their genome after reverse transcription into the chromosome of infected cells. therefore, retroviral vectors facilitate long-term gene transfer which is especially suitable for gene correction therapy of monogenetic diseases. adapter targeting of retrovirus cell entry was reported for recombinant proteins built of the virus receptor extracellular domain fused to egf, vegf, or an egfr-specific scfv (snitkovsky and young 1998; boerger et al. 1999; snitkovsky et al. 2000, 2001). 
18.3 gene transfer as a tool for antibody therapy: genetic antibody delivery
gene therapy can be exploited for expressing antibodies in patients, which might be advantageous for achieving sustained and/or efficient antibody concentrations and/or a favorable antibody biodistribution by local expression. thus, gene therapy is a tool of interest to overcome rapid antibody clearance or poor access to tumors as reported for antibodies that are injected as proteins. genetic antibody therapy can be implemented by in vivo or ex vivo gene transfer (fig. 18.3), i.e., by direct injection of the gene transfer vector into patients or by gene transfer in cultures of previously isolated cells followed by injection of the resulting genetically engineered cells, respectively. depending on the design of the gene transfer vector, genetic antibody application can be transient or permanent, constitutive or inducible, targeted or ubiquitous. for example, retroviral vectors allow for stable gene transfer, inducible promoters facilitate control of antibody expression, and targeted vectors can direct gene transfer to specific cell types (see sects. 18.1.2 and 18.1.3). therefore, gene therapy possesses high potential and flexibility for implementing improved antibody delivery for specific applications. however, this area of research is still in its infancy and more widespread investigations are warranted. with the advent of recombinant dna technology it became possible to establish novel strategies for antibody production and to engineer antibody properties (for example affinity maturation and humanization), formats (single chain fragments), and fusion proteins (immunotoxins). recombinant antibodies have been frequently produced in bacteria, but gene transfer into eukaryotic cells has also been utilized for in vitro production of immunoglobulins, antibody fragments or antibody fusion proteins.
having established the engineering of recombinant gene constructs for eukaryotic antibody expression, the in vivo production of antibodies also became feasible. examples are the expression of functional recombinant mabs in mice after transfer of genetically engineered cells (noel et al. 1997) or after in vivo gene transfer with an adenoviral or aav vector (noel et al. 2002; jiang et al. 2006; watanabe et al. 2009; lewis et al. 2002; fang et al. 2005, 2007; skaricic et al. 2008; de et al. 2008; ho et al. 2009). toward this end, fang and coworkers optimized antibody production: they expressed the heavy and light chains of the mab at equal amounts from a single open reading frame using a "ribosomal skip" sequence. thereby, serum levels of >1 mg/ml antibody for extended time periods were obtained in mice after injection of a single dose of aav vector. in a subsequent study, the same group demonstrated that by using an inducible promoter, serum antibody levels after in vivo gene transfer can be repeatedly shut off and on (fang et al. 2007). this represents a promising strategy to increase safety and/or facilitate dose adaptation in potential future clinical applications of genetic antibody delivery.
fig. 18.3 gene therapy as a tool for antibody delivery: genetic antibody therapy. for genetic antibody delivery, antibody genes, which can be engineered to match specific purposes, are incorporated into gene transfer vectors. these vectors are either directly injected into patients (in vivo gene therapy) or are used for gene transfer into cells previously purified from a patient followed by re-injection of the engineered cells into the patient (ex vivo gene therapy). the antibodies are produced in the patient from cells genetically modified by in vivo or ex vivo gene transfer. depending on the vector design, antibody production can be transient or prolonged, constitutive or inducible, and show local or systemic activity.
de and co-workers combined genetic delivery of a mab gene by aav and ad vectors to achieve both rapid (ad) and persistent (aav) antibody production. functional expression in vivo was also demonstrated for recombinant antibody fragments or fusion proteins that contain such fragments after adenoviral gene transfer (whittington et al. 1998; arafat et al. 2002; afanasieva et al. 2003; kasuya et al. 2005; liu et al. 2010). the expression of chimeric antigen receptors by t cells and subsequent adoptive t cell therapy is another important application of genetic antibody delivery. álvarez-vallina and team have developed genetic delivery of bispecific antibodies by engineered cells. in 2003, they reported anti-tumor activity for a bispecific diabody expressed in vivo from irradiated, genetically engineered 293t cells (blanco et al. 2003). they produced stably transfected 293t cells secreting a diabody with specificity for both cea and cd3. a second cell line additionally secreted a bivalent cea-specific diabody fused to the extracellular domain of b7-1. after co-injection with cea-positive tumor cells into mice, these genetically engineered cells showed anti-tumor activity compared with co-injection of control 293t cells. subsequent to this proof-of-principle study, the same group engineered a lentiviral gene transfer vector encoding the cea-cd3 diabody (compte et al. 2007). this vector facilitated the transduction of different types of hematopoietic cells that showed prolonged secretion of active diabody in vitro and antitumor activity in vivo. in a follow-up study, the group demonstrated that the implantation of lentivirally transduced endothelial cells into mice also resulted in prolonged production of the cea/cd3 diabody with therapeutic activity (compte et al. 2010). this study aims at a therapeutic regimen that allows for the production of therapeutic antibodies from neovessels that have incorporated ex vivo engineered endothelial cells.
genetic delivery of bispecific antibodies has also been reported for intracellular applications: cell surface localization of two membrane proteins, vegfr2 and tie-2, could be blocked by expression of a corresponding bispecific, tetravalent antibody targeted to the endoplasmic reticulum (jendreyko et al. 2003). this intracellular bispecific antibody showed anti-angiogenic activity in vitro, which was superior to that of monovalent control antibodies. a similar construct with specificity for vegfr2 and tie-2 mediated anti-angiogenic and anti-tumor activity in vivo after adenoviral gene transfer (jendreyko et al. 2005). proof of principle has been demonstrated in several cell culture studies and animal models for both the utility of bispecific antibodies for targeting gene therapies and the feasibility of gene transfer for delivering recombinant bispecific antibodies. based on this fundamental work, bispecific antibody adapters and gene transfer technologies should now be considered for improving therapeutic regimens in gene therapy and antibody therapy, respectively. cooperation between antibody engineers and gene therapists is warranted to further develop bispecific antibodies and gene transfer vectors for this purpose.
single-chain antibody and its derivatives directed against vascular endothelial growth factor: application for antiangiogenic gene therapy effective single chain antibody (scfv) concentrations in vivo via adenoviral vector mediated expression of secretory scfv dual targeting of adenoviral vectors at the levels of transduction and transcription enhances the specificity of gene expression in cancer cells targeted adeno-associated virus vector transduction of nonpermissive cells mediated by a bispecific f(ab'gamma)2 antibody retrovirus vectors: toward the plentivirus? 
selective gene transfer in vitro to tumor cells via recombinant newcastle disease virus in vivo efficacy of systemic tumor targeting of a viral rna vector with oncolytic properties using a bispecific adapter protein retargeting to egfr enhances adenovirus infection efficiency of squamous cell carcinoma induction of human t lymphocyte cytotoxicity and inhibition of tumor growth by tumor-specific diabody-based molecules secreted from gene-modified bystander cells retroviral vectors preloaded with a viral receptor-ligand bridge protein are targeted to specific cell types cd40-targeted adenoviral gene transfer to dendritic cells through the use of a novel bispecific single-chain fv antibody enhances cytotoxic t cell activation targeted gene-delivery strategies for angiostatic cancer treatment exploiting and antagonizing microrna regulation for therapeutic and experimental applications recent developments in adeno-associated virus vector technology a conditionally replicating adenovirus with strict selectivity in killing cells expressing epidermal growth factor receptor hematopoietic stem cell gene therapy with a lentiviral vector in x-linked adrenoleukodystrophy armed replicating adenoviruses for cancer virotherapy inhibition of tumor growth in vivo by in situ secretion of bispecific anti-cea x anti-cd3 diabodies from lentivirally transduced human lymphocytes factory neovessels: engineered human blood vessels secreting therapeutic proteins as a new drug delivery system prolonged maturation and enhanced transduction of dendritic cells migrated from human skin explants after in situ delivery of cd40-targeted adenoviral vectors rapid/sustained anti-anthrax passive immunity mediated by co-administration of ad/aav targeting cancer by transcriptional control in cancer gene therapy and viral oncolysis fifteen years of gene therapy based on chimeric antigen receptors: "are we nearly there yet? 
targeted gene delivery by tropism-modified adenoviral vectors stable antibody expression at therapeutic levels using the 2a peptide an antibody delivery system for regulated expression of therapeutic levels of monoclonal antibodies in vivo targeted gene delivery to kaposi's sarcoma cells via the fibroblast growth factor receptor regulatable gene expression systems for gene therapy applications: progress and future challenges trail gene therapy: from preclinical development to clinical application fibroblast growth factor 2 retargeted adenovirus has redirected cellular tropism: evidence for reduced toxicity and enhanced antitumor activity in mice tumor-specific gene transfer via an adenoviral vector targeted to the pan-carcinoma antigen epcam targeting of adenoviral vectors through a bispecific single-chain antibody retargeted adenoviral cancer gene therapy for tumour cells overexpressing epidermal growth factor receptor or urokinase-type plasminogen activator receptor an adenovirus vector with a chimeric fiber incorporating stabilized single chain antibody achieves targeted gene delivery selective gene delivery toward gastric and esophageal adenocarcinoma cells via epcam-targeted adenoviral vectors selective gene transfer into primary human gastric tumors using epithelial cell adhesion molecule-targeted adenoviral vectors with ablated native tropism growth inhibition of an established a431 xenograft tumor by a full-length anti-egfr antibody following gene delivery by aav enhancement of adenovirus vector entry into cd70-positive b-cell lines by using a bispecific cd70-adenovirus fiber antibody intradiabodies, bispecific, tetravalent antibodies for the simultaneous functional knockout of two cell surface receptors phenotypic knockout of vegf-r2 and tie-2 with an intradiabody reduces tumor growth and angiogenesis in vivo gene therapy using adenovirus-mediated full-length anti-her-2 antibody for her-2 overexpression cancers new aspects in vascular gene therapy 
adenovirus targeting to c-erbb-2 oncoprotein by single-chain antibody fused to trimeric form of adenovirus receptor ectodomain passive immunotherapy for anthrax toxin mediated by an adenovirus expressing an anti-protective antigen single-chain antibody selectivity of tag-72-targeted adenovirus gene transfer to primary ovarian carcinoma cells versus autologous mesothelial cells in vitro targeting adenoviral vectors by using the extracellular domain of the coxsackie-adenovirus receptor: improved potency via trimerization update on gene therapy for immunodeficiencies recombinant bispecific antibodies for the targeting of adenoviruses to cea-expressing tumour cells: a comparative analysis of bacterially expressed single-chain diabody and tandem scfv generation of neutralizing activity against human immunodeficiency virus type 1 in serum by antibody gene transfer adenovirus tumor targeting and hepatic untargeting by a coxsackie/adenovirus receptor ectodomain anticarcinoembryonic antigen bispecific adapter combined transductional untargeting/retargeting and transcriptional restriction enhances adenovirus gene targeting and therapy for hepatic colorectal cancer tumors noninvasive of adenovirus tumor retargeting in living subjects by a soluble adenovirus receptor-epidermal growth factor (scar-egf) fusion protein advances in viral-vector systemic cytokine gene therapy against cancer gene therapy in haemophilia -going for cure? 
biology of adenovirus and its use as a vector for gene therapy differential susceptibility of primary and established human glioma cells to adenovirus infection: targeting via the epidermal growth factor receptor achieves fiber receptor-independent gene transfer targeting of adenovirus to endothelial cells by a bispecific singlechain diabody directed against the adenovirus fiber knob domain and human endoglin (cd105) retargeting of adenoviral infection to melanoma: combining genetic ablation of native tropism with a recombinant bispecific single-chain diabody (scdb) adapter that binds to fiber knob and hmwmaa selective targeting of gene transfer to vascular endothelial cells by use of peptides isolated by phage display in vitro and in vivo secretion of cloned antibodies by genetically modified myogenic cells high in vivo production of a model monoclonal antibody on adenoviral gene transfer recent progress in the battle between oncolytic viruses and tumours coxsackievirus-adenovirus receptor genetically fused to anti-human cd40 scfv enhances adenoviral transduction of dendritic cells suicide genes for cancer therapy retargeting of adenovirus vectors through genetic fusion of a single-chain or single-domain antibody to capsid protein ix fibroblast growth factor 2-retargeted adenoviral vectors exhibit a modified biolocalization pattern and display reduced toxicity relative to native adenoviral vectors basic fibroblast growth factor enhancement of adenovirusmediated delivery of the herpes simplex virus thymidine kinase gene results in augmented therapeutic benefit in a murine model of ovarian cancer a targetable, injectable adenoviral vector for selective gene delivery to pulmonary endothelium in vivo combined transductional and transcriptional targeting improves the specificity of transgene expression in vivo dna vaccines: precision tools for activating effective immunity against cancer use of a novel cross-linking method to modify adenovirus tropism ocular gene 
therapy: an evaluation of recombinant adenoassociated virus-mediated gene therapy interventions for the treatment of ocular disease enhanced transduction of dendritic cells by fcgammari-targeted adenovirus vectors retargeting of adenoviral vectors to neurons using the hc fragment of tetanus toxin applications of lentiviral vectors for shrna delivery and transgenesis genetic delivery of an anti-rsv antibody to protect against pulmonary infection with rsv dendritic cell-based cancer gene therapy cell-specific viral targeting mediated by a soluble retroviral receptor-ligand fusion protein a tva-single-chain antibody fusion protein mediates specific targeting of a subgroup a avian leukosis virus vector to cells expressing a tumor-specific form of epidermal growth factor receptor targeting avian leukosis virus subgroup a vectors by using a tva-vegf bridge protein maturation of dendritic cells accompanies high-efficiency gene transfer by a cd40-targeted adenoviral vector adenoviral vectors targeted to cd40 enhance the efficacy of dendritic cell-based vaccination against human papillomavirus 16-induced tumor cells in a murine model molecular adaptors for vasculartargeted adenoviral gene delivery efficient and selective gene transfer into primary human brain tumors by using single-chain antibody-targeted adenoviral vectors with native tropism abolished conditionally replicative adenovirus expressing a targeting adapter molecule exhibits enhanced oncolytic potency on car-deficient tumors efficient incorporation of a functional hyper-stable single-chain antibody fragment protein-ix fusion in the adenovirus capsid genetic delivery of bevacizumab to suppress vascular endothelial growth factor-induced high-permeability pulmonary edema aavrh.10-mediated genetic delivery of bevacizumab to the pleura to provide local anti-vegf to suppress growth of metastatic lung tumors the 'adenobody' approach to viral targeting: specific and enhanced adenoviral gene delivery gene therapy 
progress and prospects: electroporation and other physical methods recombinant adenoviral delivery for in vivo expression of scfv antibody fusion proteins targeted adenovirus gene transfer to endothelial and smooth muscle cells by using bispecific antibodies breaking the bonds: non-viral vectors become chemically dynamic soluble receptor-mediated targeting of mouse hepatitis coronavirus to the human epidermal growth factor receptor targeting non-human coronaviruses to human cancer cells using a bispecific single-chain antibody
key: cord-003196-fdb6az0v authors: casalino-matsuda, s. marina; wang, naizhen; ruhoff, peder t.; matsuda, hiroaki; nlend, marie c.; nair, aisha; szleifer, igal; beitel, greg j.; sznajder, jacob i.; sporn, peter h. s. title: hypercapnia alters expression of immune response, nucleosome assembly and lipid metabolism genes in differentiated human bronchial epithelial cells date: 2018-09-10 journal: sci rep doi: 10.1038/s41598-018-32008-x sha: doc_id: 3196 cord_uid: fdb6az0v hypercapnia, the elevation of co(2) in blood and tissues, commonly occurs in severe acute and chronic respiratory diseases, and is associated with increased risk of mortality. recent studies have shown that hypercapnia adversely affects innate immunity, host defense, lung edema clearance and cell proliferation. airway epithelial dysfunction is a feature of advanced lung disease, but the effect of hypercapnia on airway epithelium is unknown. thus, in the current study we examined the effect of normoxic hypercapnia (20% co(2) for 24 h) vs normocapnia (5% co(2)), on global gene expression in differentiated normal human airway epithelial cells. gene expression was assessed on affymetrix microarrays, and subjected to gene ontology analysis for biological process and cluster-network representation. we found that hypercapnia downregulated the expression of 183 genes and upregulated 126. 
among these, major gene clusters linked to immune responses and nucleosome assembly were largely downregulated, while lipid metabolism genes were largely upregulated. the overwhelming majority of these genes were not previously known to be regulated by co(2). these changes in gene expression indicate the potential for hypercapnia to impact bronchial epithelial cell function in ways that may contribute to poor clinical outcomes in patients with severe acute or advanced chronic lung diseases.
go biological process-associated gene clusters targeted by hypercapnia. major clusters from hypercapnia-downregulated genes are linked to immune response, nucleosome assembly, cell differentiation, oxidation reduction, and ion and lipid transport (fig. 2). clusters from upregulated genes induced by high co2 (fig. 3) involve biological processes related to lipid metabolism, cholesterol biosynthesis, signal transduction, and transport. a number of these important clusters, labelled a-e in figs 2 and 3, are analyzed in more detail in the following sections. their corresponding gene lists are depicted in figs 4, 5, and supplementary fig. 3.
hypercapnia differentially regulates genes associated with innate immunity and nucleosome assembly. cluster a, represented in fig. 4a, includes hypercapnia-regulated genes involved in signal transduction, immune and inflammatory responses, and leukocyte chemotaxis. notably, tlr4, multiple chemokines (ccl28, cxcl1, cxcl2, cxcl6, and cxcl14) and the il-6 receptor gene (il6r) were all downregulated by elevated co2. on the other hand, the il-1 receptor like 1 gene (il1rl1) was upregulated by hypercapnia. to validate the microarray results related to co2-induced changes in key immunoregulatory genes, expression of cxcl1, cxcl14, ccl28, il6r and tlr4 was also assessed by qpcr. we found that these genes were all downregulated at levels similar to those in the microarray analysis (fig. 4c).
indeed, the degree of co2-induced downregulation of these transcripts assessed by qpcr and microarray was highly correlated (r² = 0.7981). in addition, to determine whether downregulation of a key immunoregulatory transcript by hypercapnia was accompanied by a similar change in protein expression, we assessed expression of tlr4 protein in differentiated nhbe cells. immunofluorescence microscopy (fig. 4d) and immunoblotting (fig. 4e) both showed that exposure to 20% co2 for 24 h decreased nhbe cell expression of tlr4 protein. full-length blots are included in supplementary fig. 4. taken together, these results suggest that hypercapnia would suppress the airway epithelial innate immune response to microbial pathogens and other inflammatory stimuli. next, we analyzed cluster b, which includes hypercapnia-regulated genes that encode proteins involved in nucleosome assembly (fig. 5a). the heat map in fig. 5b shows that hypercapnia downregulates genes encoding multiple family members of the core histones h2a and h2b 27, as well as the nucleosome assembly protein 1-like 1 (nap1l1), which regulates protein complex assembly, chromosome organization and dna metabolism. the only upregulated gene in cluster b is h1f0, encoding histone h1, which is normally expressed in terminally differentiated and slowly dividing cells. to validate the microarray data from cluster b, we performed qpcr for selected transcripts whose expression was significantly altered in the microarray analysis. figure 5c shows that expression of the histone genes hist1h2ac, hist1h2bd, and hist1h2bk was downregulated by hypercapnia as assessed by qpcr, again similar to the microarray results. and others), cell surface receptor signaling (egfr, ifnar1, il6r and others) and apoptosis (bcl2l15, dapl1, sema6a and others). the impact of elevated co2 on expression of these genes would be expected to alter epithelial metabolism and barrier function, as well as innate immune and inflammatory responses.
to our knowledge, the present study is the first to investigate the impact of hypercapnia on global gene expression in airway epithelial cells. of importance, we utilized primary nhbe cells cultured at ali to achieve a differentiated state closely resembling normal human bronchial epithelium. our principal finding was that hypercapnia altered expression of a small number of specific genes (309 out of 20,390 genes assayed, or 1.5%) in differentiated nhbe cells. of these, 183 genes (59%) were downregulated, while 126 (41%) were upregulated. thus, the effects of elevated co2 are highly selective, involving both differential repression and differential activation of specific gene subsets. the overwhelming majority of these genes were not previously known to be regulated by co2. furthermore, gene ontology analysis showed enrichment of hypercapnia-regulated genes involved in a variety of fundamentally important cellular processes. altering expression of genes related to these processes would be expected to impart functional changes in bronchial epithelial cells, which could in turn influence the pathophysiology and outcomes of many respiratory diseases. our data show that hypercapnia alters expression of multiple components of the innate immune system, including downregulation of the il-6 receptor (il6r); the neutrophil chemokines cxcl1, cxcl2 and cxcl6 28; the mucosal-associated chemokines ccl28 and cxcl14 [29] [30] [31]; and, importantly, tlr4. hypercapnia also upregulated cd55 and cd86, which bind virus at the cell surface 32, 33. while hypercapnia downregulated tlr4, it increased the expression of il1rl1, which has been shown to inhibit tlr4 activation 34 defense against multiple respiratory pathogens [36] [37] [38] [39] [40]. interestingly, airway epithelial tlr4 expression was reduced in patients with severe copd as compared to those with less severe copd 41, possibly due to hypercapnia in patients with advanced disease.
reduced expression of immune response genes was also seen in the lungs of newborn mice exposed to moderate hypercapnia (8% co2) for the first two weeks of life 42. while the immune genes downregulated by hypercapnia in the newborn mice differed from those we found in nhbe cells, the mucosal immunity chemokine cxcl14 43 was downregulated in both systems. taken together, these observations indicate that the airway epithelium is an important target for hypercapnic suppression of innate immune gene expression. this, along with the suppressive effects of elevated co2 on macrophage, neutrophil, and alveolar epithelial cell functions [13] [14] [15] [17] [18] [19], likely contributes to the deleterious impact of elevated co2 on lung injury and host defense. another cluster impacted by hypercapnia includes genes related to nucleosome assembly, whose products also have antibacterial properties. the nucleosome consists of 145-147 base-pair segments of dna wrapped around a histone octamer containing one (h3-h4)2 tetramer, two h2a-h2b dimers, and histone chaperones or linkers that facilitate nucleosome assembly 44. regulation of nucleosome assembly following dna replication, dna repair and gene transcription is critical for the maintenance of genome stability and epigenetic information 44. within this gene cluster, hypercapnia downregulated transcripts for the core histones h2a and h2b 27, the histone chaperone nap1l1 45, and the linker histone h1 27. downregulation of histone gene expression can be triggered by dna damage or indirect inhibition of dna synthesis 46 and might lead to alterations of chromatin structure that would influence transcriptional regulation of many genes and even genome stability 47. exchange of core histones with histone variants might also alter the chemical nature and physical properties of the nucleosome, thereby affecting distinct cellular processes 48.
in addition, histones h2a and h2b can also inactivate endotoxin and function as antimicrobial proteins 49, 50. we also found that elevated co2 upregulated nhbe cell expression of cholesterol and fatty acid biosynthesis genes, while downregulating atp-binding cassette (abc) transporters, which promote the efflux of cholesterol and phospholipids from the cell 51. interestingly, enveloped viruses subvert preexisting lipids for viral entry and trafficking and also reprogram lipid synthesis and lipid distribution in lipid rafts to establish an optimal environment for their replication, assembly and egress 52. furthermore, host defense against viral infection involves interferon-mediated downregulation of sterol biosynthesis 53. thus, hypercapnia-induced cholesterol accumulation might contribute to the entry, replication, and shedding of respiratory viruses in the airways. as noted above, in a previous study, we showed that hypercapnia downregulates the tca cycle enzyme idh2, resulting in mitochondrial dysfunction and impaired proliferation of fibroblasts and a549 lung epithelial cells 20. however, in the current study, hypercapnia did not alter idh2 expression in nhbe cells, indicating that co2-mediated regulation of gene expression is cell-type-specific. on the other hand, a number of genes involved in mitochondrial function were regulated by hypercapnia in nhbe cells. among these, upregulated genes included acyl-coa dehydrogenase short/branched chain (acadsb) and acyl-coa synthetase short chain family member 2 (acss2), which encode enzymes involved in fatty acid synthesis and oxidation 54.
genes downregulated by elevated co2 included gamma-butyrobetaine hydroxylase 1 (bbox1), which catalyzes synthesis of l-carnitine, an essential co-factor in beta-oxidation 55; kynurenine 3-monooxygenase (kmo), an outer mitochondrial membrane protein that hydroxylates kynurenine to form 3-hydroxykynurenine in the tryptophan degradation pathway 56; bcl2 interacting protein 3 (bnip3), a bh3 domain protein with pro-apoptotic activity 57; and mitochondrial assembly of ribosomal large subunit 1 (malsu1), an inhibitor of translation at the mitochondrial ribosome 58. the diverse activities of these genes indicate the potential for hypercapnia to disrupt multiple mitochondrial functions in nhbe cells. while the current study does not reveal the molecular mechanism(s) underlying hypercapnia-induced changes in gene transcription, other recent work suggests a path to elucidating components of a putative co2-induced signaling pathway leading to inhibition of innate immune gene expression and impaired host defense. we previously reported that elevated co2 inhibits expression of antimicrobial peptide genes and suppresses antibacterial host defense in drosophila 59, suggesting that the immunosuppressive effect of hypercapnia is evolutionarily conserved. using a genome-wide rnai screen, we identified a small number of genes whose expression is required for co2-induced immunosuppression in drosophila cells, and which are conserved in mammalian systems 60. flies deficient in one of these genes, a zinc finger homeodomain transcription factor known as zfh2, were protected from co2-induced mortality associated with bacterial infection 60. this opens up the opportunity to test whether orthologues of zfh2 and other genes identified in the drosophila screen mediate hypercapnic immunosuppression in mice and ultimately in humans.
alterations in expression of innate immune and other genes in airway epithelial cells may be of central importance in the co2-induced increase in mortality of pseudomonas pneumonia we previously observed in mice 17. in addition, the suppressive effect of elevated co2 on immune gene expression in the airway epithelium, along with similar effects on immune cells, suggests a reason why severe copd and other lung diseases associated with hypercapnia all carry a high risk of pulmonary infection. bacterial and viral infections are a principal cause of acute copd exacerbations [61] [62] [63] [64], which are linked to the need for hospitalization and to mortality 65, 66. thus, co2-induced alterations in airway epithelial gene expression may underlie the increase in mortality associated with hypercapnia in advanced copd, as well as community-acquired pneumonia 9, adenoviral lung infections 10 and cystic fibrosis 11. it is notable in this regard that reducing hypercapnia with noninvasive ventilatory support has been shown to decrease hospital readmissions and mortality in patients with severe copd 67, 68. further investigation of the molecular mechanisms and mediators of co2 effects on gene expression may reveal targets for pharmacologic intervention to prevent hypercapnic immune suppression in patients with advanced respiratory disease. humans without known lung disease were obtained from a commercial source (lonza). the cells were plated on collagen-coated plastic dishes, grown to confluence in begm™ bronchial epithelial cell growth medium (lonza), and passaged after enzyme dissociation with trypsin 69.
cells from passage-3 were seeded onto 24-mm, 0.4 μm pore size, polyester, transwell inserts (corning) at 0.5 × 10^6 cells per insert (4.67 cm²) and cultured in a serum-free medium 70, comprised of a 1:1 mixture of bebm (lonza): dmem (mediatech), supplemented with hydrocortisone (0.5 μg/ml), insulin (5 μg/ml), transferrin (10 μg/ml), epinephrine (0.5 μg/ml), triiodothyronine (6.5 ng/ml), epidermal growth factor (0.5 ng/ml), retinoic acid (50 nm), bovine pituitary extract (0.4%), gentamycin (50 μg/ml), and amphotericin b (50 ng/ml). after the cells reached confluence in submersion culture, the medium above the inserts was removed and the cells were maintained in ali culture for two more weeks, at which point differentiation to a pseudostratified mucociliary epithelium with characteristics of airway epithelium in vivo was established 69, 71. differentiation after ∼2 wk in ali culture was confirmed by the presence of beating cilia and mucus production, as previously described 72. culture of nhbe cells up to the point of full differentiation was carried out in an atmosphere of humidified 5% co2/95% air at 37 °c.
hypercapnia exposure. after differentiation, nhbe cells were cultured in ali for an additional 24 h in humidified 20% co2/21% o2/59% n2 (hypercapnia) or maintained in humidified 5% co2/95% air (5% co2/20% o2/75% n2; normocapnia), as control. the growth medium was pre-saturated with the appropriate co2 concentration for 4 h prior to addition to the cells. the pco2 and ph of the pre-saturated media were measured using a phox plus blood gas analyzer (nova biomedical corp). for the normocapnia- and hypercapnia-equilibrated media, the pco2 values were 44 and 112 mmhg, and the corresponding ph values were 7.4 and 7.1, respectively.
to determine whether hypercapnia induces cytotoxicity, lactate dehydrogenase (ldh) release to the apical and basolateral compartments was assessed using a colorimetric cytotoxicity detection kit (roche) according to the manufacturer's instructions. absorbance at 490 nm was measured using a versamax tunable microplate reader (molecular devices). percent ldh release was calculated as the amount of ldh measured in the basolateral supernatant or apical wash divided by the total amount of ldh in the culture (ldh in cell lysates plus that measured in apical and basolateral compartments) times 100. mini kit (qiagen). quality and quantity of each rna sample were assessed using a 2100 bioanalyzer (agilent). rna was hybridized to genechip® human genome u133 2.0 plus array (affymetrix). a total of 6 chips, each hybridized to crna from a different normocapnic (n = 3) or hypercapnic (n = 3) nhbe cell culture, were used in this study. the u133 2.0 plus arrays contain probes for approximately 56,921 transcripts and variants, including over 45,000 well characterized human genes. fluorescent images were detected in a genechip® scanner 3000 and expression data were extracted using the genechip operating system v 1.2 (affymetrix). assessed by a statistical linear model analysis using the bioconductor package limma 73, 74 (https://www.bioconductor.org/help/faq/), in which an empirical bayes method is used to moderate the standard errors of the estimated log-fold changes of gene expression. the moderated t-statistic p-values derived from the limma analysis were further adjusted for multiple testing by benjamini and hochberg's method 75 to control the false discovery rate (fdr). many genes whose expression signals were below background were defined as "absent". transcripts absent in all samples were filtered out, leaving 54,675 probes corresponding to 20,390 genes in the downstream analysis.
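the multiple-testing step described above can be illustrated with a minimal sketch of the benjamini-hochberg adjustment. this is not the limma implementation and it does not compute moderated t-statistics; it only shows how a list of raw p-values is converted into fdr-adjusted values. the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (a smaller p never gets a larger adjusted value)
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.2f} -> adjusted p = {q:.4f}")
```

a probe would then be retained only if its adjusted value falls below the chosen fdr threshold.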
the lists of differentially expressed genes were obtained using the criteria of fdr <0.05 and a fold-change cutoff of >1.4. differential gene expression in hypercapnia versus normocapnia was depicted in a pie chart, a volcano plot of statistical significance (−log 10 p value) plotted against log 2 fold change, and hierarchical clustering by pearson correlation represented as heat maps generated using heatmapper 76 and gene-e (https://software.broadinstitute.org/gene-e/). over-representation analysis (ora) of gene ontology (go) biological process terms was performed separately for all genes downregulated or upregulated by hypercapnia using the innatedb gene ontology analysis tool 77 , which utilizes a manually-curated knowledgebase of genes, proteins, interactions and signaling pathways involved in mammalian innate immune responses. results from the innatedb analysis were confirmed using genego metacore (thomson reuters), a separately curated database and pathway analysis tool. microarray data have been deposited in the national center for biotechnology information (ncbi) gene expression omnibus (geo; http://www.ncbi.nlm.nih.gov/projects/geo) in compliance with miame standards (accession number gse110362). network ontology analysis. subsequent analysis of global expression changes and ontology network assessment on the differentially selected genes was performed using mathematica® v11.2 (wolfram research, inc., mathematica, version 11.2, champaign, il (2017)). ontology groups were generated using the inbuilt genomedata, matching the annotated genes with pre-defined processes and intracellular functions. two approaches were used for analysis of genome-wide expression changes: unbiased measurements of intra-network gene expression and fold-change ranked segmentation. unbiased intra-network changes were assessed for cellular processes that contained at least five genes in the post-screen data.
mean-fold change, the variance of the fold-change, and pearson correlation of expression were measured for each process. intra-network heterogeneity of relative expression was measured by calculating the standard deviation of the relative expression for genes within any given ontological process. for instance, if a gene was classified as belonging to both "nucleosome assembly" and "signal transduction", it was assigned to both groups and a connection between these processes was indicated. to further understand the impact of hypercapnia-induced differential gene expression, cluster domains of go biological processes containing 5 or more genes and with at least 4 connections were also generated using mathematica ® v11.2. these processes were broadly grouped based on gene function and by their connections. quantitative taqman real-time rt-pcr. total rna was isolated from nhbe cells and first-strand cdna was generated using multiscribe ™ mulv reverse transcriptase (applied biosystems). the first-strand cdna was used to quantitate the mrna levels by taqman real-time pcr system (applied biosystems). the level of expression of eukaryotic translation elongation factor 1 alpha 1 (eef1a1) was used as reference, and fold change of target genes was calculated by the ∆∆ ct method 78 . immunofluorescence staining for tlr4. after exposure to normocapnia (5% co 2 ) or hypercapnia (20% co 2 ) for 24 h, differentiated nhbe cells were fixed with ice-cold 50% acetone/50% methanol for 5 min. cells were blocked in pbs containing 2% bsa and 0.1% triton x-100 then double-stained with 1:200 polyclonal rabbit anti-human tlr4 antibody (h-80, santa cruz biotechnology) followed by 1:200 alexa fluor 555-conjugated goat-anti-rabbit igg (red) (invitrogen), and 1:500 monoclonal mouse anti-human acetylated tubulin antibody (clone 6-11b-1, sigma) followed by 1:200 alexa fluor 488-conjugated goat anti-mouse igg (green) (invitrogen). 
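the ∆∆ct calculation mentioned above for the taqman rt-pcr data can be written out as a short sketch. it assumes roughly 100% amplification efficiency (a doubling per cycle), which is the usual assumption behind the 2^(−∆∆ct) formula; the function name and the ct values in the example are hypothetical and are not data from this study.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    Each Ct is normalised to the reference gene (eef1a1 in the text),
    then the treated sample is compared with the control sample.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    # assumes a doubling of product per cycle (efficiency of about 100%)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # hypothetical Ct values: hypercapnia (treated) versus normocapnia (control)
    print(fold_change_ddct(26.0, 18.0, 24.0, 18.0))
```

a value below 1 indicates lower expression in the treated sample relative to the control, a value above 1 indicates higher expression.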
nuclei were identified by staining with 1 µg/ml hoechst (blue) (sigma). images were obtained using a nikon te200 inverted fluorescence microscope (nikon) equipped with a spot rt monochrome digital camera (diagnostic instruments). all images were captured with the same gain and exposure time using metamorph software. immunoblotting for tlr4. after exposure to normocapnia (5% co 2 ) or hypercapnia (20% co 2 ) for 24 h, differentiated nhbe cells were lysed in ripa buffer (santa cruz biotechnology) supplemented with pmsf, sodium orthovanadate and protease inhibitor cocktail. lysate proteins (30 μg/well) were resolved by sds/page on 4-20% gradient gels and transferred to nitrocellulose (bio-rad laboratories). membranes were probed with polyclonal rabbit anti-human tlr4 (h-80) antibody followed by hrp-conjugated anti-rabbit secondary antibody (pierce). blots were stripped and re-probed with monoclonal mouse anti-human β-actin (abcam) followed by hrp-conjugated anti-mouse secondary antibody (pierce) to confirm equal loading. the signals were detected using the enhanced chemiluminescence supersignal west dura substrate kit (pierce). tlr4/β-actin ratios were assessed using imagej 79 . statistical analysis. data are presented as means ± se. differences between two groups were assessed using student's t test. levene's test was used to analyze the homogeneity of variances. significance was accepted at p < 0.05. deaths: preliminary data for severe hypercapnia in critically ill adult cystic fibrosis patients carbon dioxide and the critically ill-too little of a good thing acute respiratory failure in obstructive lung disease.
long-term survival after treatment in an intensive care unit the prognosis of patients with chronic obstructive pulmonary disease after hospitalization for acute respiratory failure apache ii predicts long-term survival in copd patients admitted to a general medical ward mortality and mortality-related factors after hospitalization for acute exacerbation of copd clinical presentation and predictors of outcome in patients with severe acute exacerbation of chronic obstructive pulmonary disease requiring admission to intensive care unit arterial carbon dioxide tension on admission as a marker of in-hospital mortality in communityacquired pneumonia lower respiratory infections by adenovirus in children. clinical features and risk factors for bronchiolitis obliterans and mortality risk factors for death of patients with cystic fibrosis awaiting lung transplantation severe hypercapnia and outcome of mechanically ventilated patients with moderate or severe acute respiratory distress syndrome elevated co 2 selectively inhibits interleukin-6 and tumor necrosis factor expression and decreases phagocytosis in the macrophage nf-κb links co 2 sensing to innate immunity and inflammation in mammalian cells hypercapnia inhibits autophagy and bacterial killing in human macrophages by increasing expression of bcl-2 and bcl-xl sustained hypercapnic acidosis during pulmonary infection increases bacterial load and worsens lung injury hypercapnia impairs lung neutrophil function and increases mortality in murine pseudomonas pneumonia high co 2 levels impair alveolar epithelial function independently of ph amp-activated protein kinase regulates co 2 -induced alveolar epithelial dysfunction in rats and human cells by promoting na,k-atpase endocytosis elevated co(2) levels cause mitochondrial dysfunction and impair cell proliferation epithelial cells and airway diseases innate immunity in the respiratory epithelium airway epithelial repair, regeneration, and remodeling after injury in 
chronic obstructive pulmonary disease a novel host defense system of airways is defective in cystic fibrosis oxidant-antioxidant balance in acute lung injury respiratory epithelial cells orchestrate pulmonary innate immunity histone exchange, chromatin structure and the regulation of transcription chemokines and chemokine receptors: positioning cells for host defense and immunity the antimicrobial activity of ccl28 is dependent on c-terminal positively-charged amino acids cxcl14 as an emerging immune and inflammatory modulator cxcl14 displays antimicrobial activity against respiratory tract bacteria and contributes to clearance of streptococcus pneumoniae pulmonary infection adenovirus receptors parainfluenza virus 5 upregulates cd55 expression to produce virions with enhanced resistance to complement-mediated neutralization st2 is an inhibitor of interleukin 1 receptor and toll-like receptor 4 signaling and maintains endotoxin tolerance effect of elevated carbon dioxide on bronchial epithelial innate immune receptor response to organic dust from swine confinement barns toll-like receptor 4 expression is required to control chronic mycobacterium tuberculosis infection in mice toll-like receptor 4 plays a protective role in pulmonary tuberculosis in mice toll-like receptor 4 mediates innate immune responses to haemophilus influenzae infection in mouse lung role of toll-like receptor 4 in gram-positive and gram-negative pneumonia in mice pattern recognition receptors tlr4 and cd14 mediate response to respiratory syncytial virus epithelial expression of tlr4 is modulated in copd and by steroids, salmeterol and cigarette smoke effect of carbon dioxide on neonatal mouse lung: a genomic approach histone chaperones in nucleosome assembly and human disease functional characterization of human nucleosome assembly protein 1-like proteins as histone chaperones dna damage induces downregulation of histone gene expression through the g1 checkpoint pathway histone h2a variants 
in nucleosomes and chromatin: more or less stable? histone variants-ancient wrap artists of the epigenome antimicrobial action of histone h2b in escherichia coli: evidence for membrane translocation and dna-binding of a histone h2b fragment after proteolytic cleavage by outer membrane proteinase t endotoxin-neutralizing antimicrobial proteins of the human placenta emerging new paradigms for abcg transporters multifaceted roles for lipids in viral infection host defense against viral infection involves interferon mediated down-regulation of sterol biosynthesis fatty acid synthesis and oxidation in cumulus cells support oocyte maturation in bovine non-enzymatic chemistry enables 2-hydroxyglutarate-mediated activation of 2-oxoglutarate oxygenases kynurenine-3-monooxygenase: a review of structure, mechanism, and inhibitors autophagy capacity and sub-mitochondrial heterogeneity shape bnip3-induced mitophagy regulation of apoptosis c7orf30 is necessary for biogenesis of the large subunit of the mitochondrial ribosome elevated co 2 suppresses specific drosophila innate immune responses and resistance to bacterial infection identification of drosophila zfh2 as a mediator of hypercapnic immune regulation by a genome-wide rna interference screen influenza infection and copd importance of viral and bacterial infections in chronic obstructive pulmonary disease exacerbations predicting chronic obstructive pulmonary disease hospitalizations based on concurrent influenza activity infection in the pathogenesis and course of chronic obstructive pulmonary disease standards for the diagnosis and treatment of patients with copd: a summary of the ats/ ers position paper global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease non-invasive positive pressure ventilation for the treatment of severe stable chronic obstructive pulmonary disease: a prospective, multicentre, randomised, controlled clinical trial. 
the lancet effect of home noninvasive ventilation with oxygen therapy vs oxygen therapy alone on hospital readmission or death after an acute copd exacerbation: a randomized clinical trial mucin gene expression during differentiation of human airway epithelia in vitro. muc4 and muc5b are strongly induced mucociliary differentiation of serially passaged normal human tracheobronchial epithelial cells characterization of mucins from cultured normal human tracheobronchial epithelial cells epidermal growth factor receptor activation by epidermal growth factor mediates oxidant-induced goblet cell metaplasia in human airway epithelium bioconductor: open software development for computational biology and bioinformatics limma powers differential expression analyses for rna-sequencing and microarray studies controlling the false discovery rate: a practical and powerful approach to multiple testing heatmapper: web-enabled heat mapping for all innatedb: systems biology of innate immunity and beyond-recent updates and continuing curation analysis of relative gene expression data using real-time quantitative pcr and the 2(-delta delta c(t)) method nih image to imagej: 25 years of image analysis contributed reagents or analytic tools this work was supported by national institutes of health grants r01 hl107629, r56 hl131745, r01 hl131745 and r01 hl085534; and by a merit review from the department of veterans affairs. 
supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-32008-x. competing interests: the authors declare no competing interests. publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. open access: this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. key: cord-048322-5eqdrd52 authors: aigner, achim title: delivery systems for the direct application of sirnas to induce rna interference (rnai) in vivo date: 2006-05-18 journal: j biomed biotechnol doi: 10.1155/jbb/2006/71659 sha: doc_id: 48322 cord_uid: 5eqdrd52 rna interference (rnai) is a powerful method for specific gene silencing which may also lead to promising novel therapeutic strategies. it is mediated through small interfering rnas (sirnas) which sequence-specifically trigger the cleavage and subsequent degradation of their target mrna. one critical factor is the ability to deliver intact sirnas into target cells/organs in vivo. this review highlights the mechanism of rnai and the guidelines for the design of optimal sirnas.
it gives an overview of studies based on the systemic or local application of naked sirnas or the use of various nonviral sirna delivery systems. one promising avenue is the complexation of sirnas with polyethylenimine (pei), which efficiently stabilizes sirnas and, upon systemic administration, leads to the delivery of the intact sirnas into different organs. the antitumorigenic effects of pei/sirna-mediated in vivo gene-targeting of tumor-relevant proteins in mouse tumor xenograft models are described. altered expression levels of certain genes play a pivotal role in several pathological conditions. for example, in many cancers the upregulation of certain growth factors or growth factor receptors, or the deregulation of intracellular signal transduction pathways, represent key elements in the process of malignant transformation and progression of normal cells towards tumor cells, leading to uncontrolled proliferation and decreased apoptosis. since these processes may result in the direct, autocrine stimulation of the tumor cell itself as well as the paracrine stimulation of other cells, including the stimulation of tumor-angiogenesis, many novel therapeutic strategies focus on the reversal of this effect, that is, the inhibition of these proteins or the downregulation of their expression. likewise, several other diseases have been firmly linked to the (over-)expression of endogenous wildtype or mutated genes. taken together, in addition to strategies based on the inhibition of target proteins, for example, by low molecular weight inhibitors or inhibitory antibodies, this opens an avenue to gene-targeting approaches aiming at decreased expression of the respective gene. the first method to be introduced for the specific inhibition of gene expression was the use of antisense oligonucleotides in the late 1970s [1, 2].
upon their introduction into a cell, antisense oligodeoxynucleotides (odns) are able to hybridize to their target rna, leading to the degradation of the rna-dna hybrid double strands through rnase h, to the inhibition of the translation of the target mrna due to a steric or conformational obstacle for protein translation and/or to the inhibition of correct splicing. in the early 1980s, the discovery of ribozymes, that is, catalytically active rnas which are able to sequence-specifically cleave a target mrna, further expanded gene-targeting strategies [3] [4] [5]. subsequently, both methods were extensively studied and further developed with regard to the optimization of targeting efficacies and antisense-odn/ribozyme delivery strategies in vitro and in vivo. most recently, another naturally occurring biological strategy for gene silencing has been discovered and termed rna interference (rnai). since rnai represents a particularly powerful method for specific gene silencing and is able to provide the relatively easy ablation of the expression of any given target gene, it is now commonly used as a tool in biological and biomedical research. this includes the rnai-mediated targeting in vitro and in vivo for functional studies of various genes whose expression is known to be upregulated as well as the development of novel therapeutic approaches based on gene targeting. rnai is triggered by double-stranded rna molecules, as first described in c elegans by fire et al [6], who then introduced the name rna interference. these findings also explained earlier observations in petunias which turned white rather than purple upon the introduction of the "purple gene" in the form of dsrna [7], and on gene silencing by antisense oligonucleotides as well as by sense oligonucleotides in c elegans [8].
subsequent studies demonstrated that rnai, while described under different names (posttranscriptional gene silencing (ptgs), cosuppression, quelling), is present in most eukaryotic organisms, with the response to dsrna, however, being more complicated in higher organisms. rnai relies on a multistep intracellular pathway which can be roughly divided into two phases, that is, the initiation phase and the effector phase. in the initiation phase, double-stranded rna molecules of endogenous or exogenous origin present in the cell are processed through the cleavage activity of a ribonuclease iii-type protein [9] [10] [11] [12] into short 21-23 nucleotide fragments termed sirnas. these effector sirnas, which contain a symmetric 2 nt overhang at the 3′-end as well as a 5′-phosphate and a 3′-hydroxy group, are then, in the effector phase, incorporated into a nuclease-containing multiprotein complex called risc (rna-induced silencing complex) [13]. several structural and biochemical studies have shed light on the processing of double-stranded rna and the formation of the risc complex (see, eg, [14] for a recent review). through unwinding of the sirna duplex by an rna helicase activity [15], this complex becomes activated, with the single-stranded sirna guiding the risc complex to its complementary target rna. upon the binding of the sirna through hybridization to its target mrna, the risc complex catalyses the endonucleolytic cleavage of the mrna strand within the target site, which, due to the generation of unprotected rna ends, results in the rapid degradation of the mrna molecule. with the risc complex being recovered for further binding and cleavage cycles, the whole process translates into a net reduction of the specific mrna levels and hence into the decreased expression of the corresponding gene. for an overview of the rnai pathway, see figure 1.
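the guiding step described above, in which the single-stranded sirna directs risc to a complementary site on the target mrna, amounts to a reverse-complement search. the following is a minimal sketch under the simplifying assumption of perfect watson-crick complementarity over the full guide length; the function names and the sequences used to try it out are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mrna, guide):
    """Return 0-based start positions in mrna that pair perfectly with guide.

    Both sequences are written 5'->3'; a target site is a stretch of the
    mRNA equal to the reverse complement of the guide strand.
    """
    site = reverse_complement(guide)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    guide = "UUCGAAGUACUCAGCGUAA"  # hypothetical 19-nt guide strand
    mrna = "GGGA" + reverse_complement(guide) + "CCAU"  # toy transcript
    print(find_target_sites(mrna, guide))
```

real target recognition tolerates some mismatches and depends on mrna accessibility, so an exact-match search of this kind only identifies the intended on-target site.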
while from this mechanism it becomes obvious that sirna molecules complementary to the target mrna, and thus able to serve as a guide sequence for the risc complex, play a pivotal role in this process, they need not be derived from long double-stranded precursor molecules. rather, omitting the initiation phase, they can be delivered directly into the target cell (figure 1, upper right arrow). several studies have led to the development of guidelines for the generation of sirnas which are optimal in terms of efficacy and specificity [12, 16]. this includes the initial definition of the preferable length (19-25 bp) combined with a low g/c content in the range between 36% and 52% and the requirement of symmetric 2 nt overhangs at the 3′-end [16] [17] [18]. later studies on synthetic sirna molecules, however, revealed an up to 100-fold higher targeting efficacy in the case of even longer duplexes (25-30 nucleotides) which act as a substrate for dicer and which therefore allow the direct incorporation of the newly produced sirnas into the risc complex [19]. as expected, intramolecular foldback structures which can result from internal repeats or palindrome sequences decrease the number of functional sirna molecules with silencing capability [20]. additional silencing-enhancing criteria include an a in position 3 and a g at position 13 of the sense strand, the absence of a c or g at position 19 and, most importantly, a u in position 10 of the sense strand. since nucleotides 10-11 represent the site of the risc-mediated cleavage of the target mrna, this indicates that risc is comparable to most other endonucleases in preferentially cleaving 3′ of u rather than any other nucleotide [20, 21].
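some of the design rules listed above lend themselves to a simple automated check. the sketch below covers only a subset of them (duplex length, g/c content, a at position 3, u at position 10, and no c or g at position 19 of the sense strand) and treats each rule as a plain yes/no test; the function names and the example sequence are hypothetical, and the check is not a substitute for the published design algorithms.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in ("G", "C") for base in seq) / len(seq)


def base_at(seq, position):
    """Base at a 1-based position, or None if the sequence is too short."""
    return seq[position - 1] if 0 < position <= len(seq) else None


def check_sense_strand(sense):
    """Check a sense strand (5'->3', overhangs excluded) against a few rules."""
    sense = sense.upper()
    return {
        "length_19_to_25": 19 <= len(sense) <= 25,
        "gc_36_to_52_percent": 0.36 <= gc_fraction(sense) <= 0.52,
        "a_at_position_3": base_at(sense, 3) == "A",
        "u_at_position_10": base_at(sense, 10) == "U",
        "no_c_or_g_at_position_19": base_at(sense, 19) in ("A", "U"),
    }


if __name__ == "__main__":
    candidate = "GCAGUCAGCUAGUCAUUAA"  # hypothetical 19-nt sense strand
    for rule, passed in check_sense_strand(candidate).items():
        print(f"{rule}: {'ok' if passed else 'not met'}")
```

candidates passing such a filter would still need experimental screening, since sequences meeting the same criteria can differ in efficacy.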
furthermore, it was shown more generally that the thermodynamic flexibility of the positions 15-19 of the sense strand correlates with the silencing efficacy and that the presence of at least one a/u base pair in this region improves sirna-mediated silencing efficacy due to a decreased internal stability of its 3′-end [20]. still, different sirna sequences may display differing efficacies, which suggests additional, still unknown criteria for optimal sirna selection and emphasizes the influence of target mrna accessibility. in fact, several studies also correlate the sirna efficacy with the mrna secondary structure [18, [22] [23] [24] [25] [26] [27]. in conclusion, apart from the selection criteria defined above, the individual screening of different sirnas for highly efficient and specific duplexes, or the pooling of multiple sirnas, is the most effective approach to increase sirna-mediated targeting efficacy. for the design of effective sirnas, several algorithms on publicly accessible web sites are available (see [28] for review). to reduce the risk of nonspecific ("off-target") effects of the sirnas, a homology search of the targeting sequence against a gene database is necessary and already incorporated in some of these web sites. nevertheless, it has also been shown that sirnas may cross-react with targets of limited sequence similarity when regions of partial sequence identity between the target mrna and the sirna exist. in fact, in some cases regions comprising only 11-15 contiguous nucleotides of sequence identity were sufficient to induce gene silencing [29]. the prediction of these off-target activities is difficult so far. an additional mechanism that may lead to nonspecific effects in vivo relies on the interferon system [30] [31] [32] [33], which is induced when double-stranded rna molecules enter a cell, activating a multi-component signalling complex.
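the observation above that 11-15 contiguous identical nucleotides can be enough for cross-reaction suggests a simple screening step: measure the longest contiguous stretch shared between the sirna target sequence and an unrelated transcript. the sketch below does this with a standard longest-common-substring computation; the threshold of 11 is taken from the range quoted in the text, while the function names and sequences are hypothetical. it is a crude filter and does not predict whether silencing actually occurs.

```python
def longest_shared_run(sirna_target, transcript):
    """Length of the longest contiguous stretch common to both sequences."""
    best = 0
    prev = [0] * (len(transcript) + 1)
    for a in sirna_target:
        curr = [0] * (len(transcript) + 1)
        for j, b in enumerate(transcript, start=1):
            if a == b:
                # extend the run that ended at the previous base of both
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def flag_possible_off_target(sirna_target, transcript, min_run=11):
    """True if the shared contiguous identity reaches min_run nucleotides."""
    return longest_shared_run(sirna_target, transcript) >= min_run


if __name__ == "__main__":
    target = "UUACGCUGAGUACUUCGAA"                 # hypothetical target site
    other = "AAAA" + target[4:16] + "UUUU"         # transcript sharing 12 nt
    print(longest_shared_run(target, other))
    print(flag_possible_off_target(target, other))
```

in practice such a check would be run against a whole transcript database, which is what the homology searches built into the design web sites mentioned above provide.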
this effect is particularly true for long dsrna molecules and essentially prevents them from being used as inducers of rna interference in mammalian systems. the development of synthetic sirnas [10, 12, 33, 34] largely circumvents this problem since they seem to be too small to trigger this response. however, some synthetic sirnas do induce components of the interferon system, which seems to be dependent on their sequence [31, 32, 35] as well as, in the case of in vitro transcribed sirnas, on the 5′ initiating triphosphate [36]. thus, strategies to avoid as far as possible the unwanted interferon response upon application of sirnas in vivo will include a design of sirnas without known interferon-stimulating sequences, the use of the lowest possible sirna dose to still achieve the desired effect and optimized sirna delivery methods. based on the known mechanisms of antisense technology, ribozyme-targeting or rnai, small oligonucleotides or plasmid-based expression vectors can be used to specifically downregulate the expression of a given gene of interest or of pathological relevance in vitro. in principle, this also applies to the in vivo situation, leading to novel, potentially relevant therapeutic approaches. for the delivery of therapeutic nucleic acids, viral vectors have been used which have the advantage of high transfection efficacy due to the inherent ability of viruses to transport genetic material into cells. on the other hand, however, viral systems show a limited loading capacity for genetic material, are rather difficult to produce on a larger scale and, most importantly, pose severe safety risks due to their oncogenic potential and their inflammatory and immunogenic effects, which prevent their repeated administration [37] [38] [39] [40]. in the light of these problems, concerns, and limitations, nonviral systems have emerged as a promising alternative for gene delivery.
main requirements are the protection of their nucleic acid "load" as well as their efficient uptake into the target cells with subsequent release of the dna or rna molecules and, if necessary, their transfer into the nucleus. several strategies can be distinguished, mainly lipofection and polyfection relying on cationic lipids or polymers, respectively (see, eg, [41] [42] [43] ). the efficient protection against enzymatic or nonenzymatic degradation is particularly important for rna molecules including sirnas. in fact, while the therapeutic potential of sirnas for the treatment of various diseases is in principle very promising, limitations of transfer vectors may turn out to be rate-limiting in the development of rnai-based therapeutic strategies. one approach to solve this problem is the use of dna expression plasmids which encode palindromic hairpin loops with the desired sequence. upon transcription and folding of the rna, the double-stranded short hairpin rnas (shrnas) are recognized by dicer and cleaved into the desired sirnas. additionally, an in vitro method has been described recently which is based on the expression of shrnas in e coli and their delivery via bacterial invasion [44]. while all these different dna-based systems offer the advantage of sirna expression with a longer duration and a probably higher level of gene silencing, they still rely on (viral or nonviral) delivery of dna molecules and again raise safety issues in vivo. hence, the direct delivery of sirna molecules, derived from in vitro transcription or chemically synthesized, offers advantages over dna-based strategies and may be preferable for in vivo therapeutic use. in recent years, a large body of studies has been published which describe different strategies for the systemic or local application of sirnas in vivo. tables 1-3 give an overview.
probably the largest number of papers focuses on the use of unmodified sirnas (table 1), whose administration is often performed iv by hydrodynamic transfection (high pressure tail vein injection). while this method is widely used and in some cases led to efficient target gene inhibition in the liver and, to a lesser extent, in lung, spleen, pancreas, and kidney, it may suffer from certain technical and practical limitations, at least in a therapeutic setting, since it relies on the rapid iv injection of a comparably large volume (>= 1 ml/mouse/injection, in theory equivalent to a ∼ 3 l iv bolus injection in man). alternative strategies for the application of naked sirnas include various delivery routes which, however, often provide only local administration or rely on an administration at least close to the target tissue or target organ, thus restricting the number of target organs, which may not be relevant for certain diseases. it should also be noted that several studies described here and below use rather large amounts of sirnas and that upon intravenous injection of sirnas the liver is the primary site of sirna uptake. as an alternative approach for the application of sirnas in vivo, their delivery by liposomes/cationic lipids has been described. for liposome-based sirna formulations, a wide variety of modes of application allowing local or systemic delivery has been used (table 2). finally, several other strategies for local or systemic sirna administration have been explored, including chemical modifications of sirna molecules, electropulsation, polyamine or other basic complexes, atelocollagen, virosomes, and certain protein preparations (table 3). an alternative approach relies on the complexation of unmodified sirna molecules with a cationic polymer, polyethylenimine (pei).
polyethylenimines (peis) are synthetic polymers available in branched or linear forms (figure 2, upper panels) and in a broad range of molecular weights from < 1000 da to > 1000 kd. commercial pei preparations, although labelled with a defined molecular weight, consist of pei molecules with a broad molecular weight distribution [45] [46] [47]. peis possess a high cationic charge density due to a protonatable amino group in every third position [48, 49]. since no quaternary amino groups are present, the cationic charges are generated by protonation of the amino groups and hence are dependent on the ph in the environment (eg, 20% at ph 7.4, see [50] for review). due to its ability to condense and compact dna into complexes, which form small colloidal particles allowing efficient cellular uptake through endocytosis, pei has been introduced as a potent dna transfection reagent in a variety of cell lines and in animals for dna delivery (for review, see [51, 52] and references therein). in fact, in several studies pei has been shown to be able to deliver large dna molecules such as 2.3 mb yeast artificial chromosomes (yacs) [53] as well as plasmids or small oligonucleotides [48, [54] [55] [56] into mammalian cells in vitro and in vivo. the n/p ratio, which indicates the ratio of the nitrogen atoms of pei to dna phosphates in the complex and thus describes the amount of pei used for complex formation independent of its molecular weight, influences the efficiency of dna delivery. a positive net charge of the complexes, resulting from high n/p ratios, inhibits their aggregation due to electrostatic repulsion and improves their solubility in aqueous solutions as well as their interaction with the negatively charged extracellular matrix components and thus their cellular uptake [57].
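the n/p ratio defined above can be turned into a small calculation of how much pei is needed for a given amount of nucleic acid. the sketch assumes about 43 g/mol per pei nitrogen (one ethylenimine repeat unit) and about 330 g/mol per nucleotide phosphate; both values are approximations introduced here for illustration and are not taken from the text, and the function names are hypothetical.

```python
# assumed average masses (g/mol); approximate values, adjust as needed
PEI_MASS_PER_NITROGEN = 43.0             # one -CH2-CH2-NH- repeat unit
NUCLEIC_ACID_MASS_PER_PHOSPHATE = 330.0  # one average nucleotide


def pei_mass_for_np_ratio(nucleic_acid_ug, np_ratio):
    """Micrograms of PEI needed to reach a given N/P ratio."""
    phosphate_nmol = nucleic_acid_ug * 1000.0 / NUCLEIC_ACID_MASS_PER_PHOSPHATE
    nitrogen_nmol = np_ratio * phosphate_nmol
    return nitrogen_nmol * PEI_MASS_PER_NITROGEN / 1000.0


def np_ratio(pei_ug, nucleic_acid_ug):
    """N/P ratio of a complex prepared from the given masses."""
    nitrogen_nmol = pei_ug * 1000.0 / PEI_MASS_PER_NITROGEN
    phosphate_nmol = nucleic_acid_ug * 1000.0 / NUCLEIC_ACID_MASS_PER_PHOSPHATE
    return nitrogen_nmol / phosphate_nmol


if __name__ == "__main__":
    # PEI needed to complex 1 ug of nucleic acid at N/P = 10
    print(round(pei_mass_for_np_ratio(1.0, 10.0), 3))
```

because the calculation counts nitrogen atoms rather than polymer chains, the result is independent of the molecular weight of the pei used, which is the property of the n/p ratio noted in the text.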
additionally, the strong buffer capacity of pei seems to be responsible for the fact that pei-based delivery does not require endosome-disruptive agents for lysosomal escape. this is described by the "proton sponge hypothesis", which postulates enhanced transgene delivery by cationic polymer-dna complexes (polyplexes) containing h+-buffering polyamines due to enhanced endosomal cl− accumulation and osmotic swelling/lysis [48]. the tight condensation of the dna molecules as well as the buffering capacity of pei in certain cellular compartments like endosomes and lysosomes also protects dna from degradation [48, 49, 58, 59]. peis have been successfully used for nonviral gene delivery in vitro and in vivo. while initial publications showed increased transfection efficacies when using high molecular weight peis [45], more recent studies demonstrated the advantages of certain low molecular weight peis [47, 60, 61]. the higher transfection efficacy of low molecular weight peis may be due to a more efficient uptake of the resulting pei/dna complexes, a better intracellular release of the dna and/or lower in vitro cytotoxicity as compared to high molecular weight pei [60-63]. in fact, a decrease in the molecular weight of the pei leads to an increase in complex size, which may be favourable at least for in vitro use [64, 65]. on the other hand, peis with very low molecular weight (< 2 kd) display little or no transfection efficacy even at very high n/p ratios, which may be attributed to the fact that a decrease in the molecular weight of pei translates into an increasingly lower ability to form small complexes [63]. therefore, low molecular weight peis require higher n/p ratios for optimal transfection efficacies than higher molecular weight peis, since higher n/p ratios lead to an increase in compaction with reduced complex sizes and a reduced tendency of the complexes to aggregate due to hydrophobic interactions [61, 63, 64].
nevertheless, while several parameters have been extensively studied, some precise determinants of transfection efficacy remain to be elucidated (see [50, 66] for review). also, the mechanism of the cytotoxic effects of pei complexes is only poorly understood. it may rely on the formation of large aggregates in the range of up to 2 μm which, when formed on the cell surface, impair membrane functions, finally leading to cell necrosis [60]. clearly, there is a trend towards low molecular weight peis as rather nontoxic delivery reagents in vitro and in vivo, which combine high biocompatibility and reduced side effects, thus also allowing larger amounts of pei/dna complexes to be employed without significant cytotoxicity. more recently, the use of polyethylenimines has been extended towards the complexation and delivery of rna molecules, especially small rna molecules like 37 nt all-rna ribozymes [67-69] and sirnas [70] (figure 2). while chemically unmodified rna molecules are very unstable and prone to rapid degradation, pei complexation has been shown to lead to an almost complete protection against enzymatic or nonenzymatic degradation. in fact, pei-complexed sirnas, which were [32p]-labeled for better detection, remain intact in vitro for several hours even in the presence of rnase a or fetal calf serum at 37 °c, while noncomplexed sirnas are rapidly degraded (figure 3(a)). this indicates that sirna molecules are efficiently condensed and thus fully covered and protected by pei. indeed, the analysis of pei/sirna complexes by atomic force microscopy showed the absence of free sirnas or sirna molecule ends and thus confirms these findings regarding an efficient complexation (grzelinski et al, submitted). however, while the complex stability seems to be sufficient for sirna protection with all peis tested (werth et al, in press; aigner et al, unpublished data), several of these complexes do not show any targeting efficacy at all.
in fact, only when using certain polyethylenimines are pei/sirna complexes efficiently delivered into target cells in vitro, where sirnas are released and display bioactivity (figures 1 and 2). in general, and as seen before for pei/dna complexes (see above), the transfection efficacy depends on the pei used, which also indicates that the sirna targeting efficiency depends mainly on the endocytotic uptake of the complex and/or its intracellular decomposition rather than on the in vitro complex stability. good results were obtained with commercially available jetpei [70], while the in vivo jetpei from the same supplier showed only poor sirna delivery efficacies [71]. likewise, a novel low molecular weight pei based on the fractionation of a commercially available polyethylenimine demonstrates high sirna protection and delivery efficacies in vitro (werth et al, in press). under certain conditions, the pei/rna (sirna or ribozyme) complexes retain their physical stability and biological activity also after lyophilization ([72] and werth et al, in press). although pei transfection is only transient, data from our lab show that pei/sirna effects are stable for at least 7 days (urban-klein and aigner, unpublished results). finally, another study has explored the use of sirna nanoplexes comprising pei that is pegylated, with an rgd peptide ligand attached at the distal end of the peg. again, these nanoplexes protect sirnas against serum degradation and show in vitro activity [73]. the ultimate goal is the application of sirnas in vivo, which has been explored in some studies in different mouse models. ge et al showed that pei-complexed sirnas targeting conserved regions of influenza virus genes are able to prevent and treat influenza virus infection in mice. upon iv injection, pei promoted the delivery of sirnas into the lungs where sirna, given either before or after virus infection, reduced influenza virus production [74].
most biological effects of the systemic application of pei-complexed sirnas, however, have been determined in different mouse tumor models and by targeting different proteins which have previously been shown to be tumor-relevant. these include the epidermal growth factor receptor her-2 (c-erbb-2/neu), the growth factor pleiotrophin (ptn), vascular endothelial growth factor (vegf) and its receptor (vegf r2), and the fibroblast growth factor-binding protein fgf-bp. the in vivo administration of pei-complexed, but not of naked, sirnas through ip or subcutaneous injection resulted in the detection of intact sirnas even hours after injection (figure 3(b)). radiolabeled sirna molecules were found in several organs including subcutaneous tumors, muscle, liver, kidney and, to a smaller extent, lung and brain. it is important to note that the sirnas were actually internalized by the tissues, as indicated by the fact that blood was negative for sirnas (figure 3(b)). overexpression of the her-2 receptor has been observed in a wide variety of human cancers and cancer cell lines. since her-2 displays strong cell growth-stimulating and antiapoptotic effects, especially through heterodimer formation with other members of the egfr family, its overexpression has been established as a negative prognostic factor and linked to a more aggressive malignant behaviour of tumors (eg, [75]). consequently, her-2 qualifies as an attractive target molecule for antitumoral treatment strategies including anti-her-2 antibodies, low molecular weight inhibitors, or her-2-specific gene-targeting approaches. in fact, the relevance of her-2 (over-)expression in tumor growth has been established in several in vitro her-2 targeting studies including the use of ribozymes [76, 78, 79] or sirnas [80, 81]. it was demonstrated that her-2 reduction in vitro leads, among other effects, to the inhibition of cell proliferation and increased apoptosis. the systemic treatment of athymic nude mice bearing subcutaneous skov-3 ovarian carcinoma tumor xenografts through ip injection of pei-complexed her-2-specific sirnas led to marked antitumoral effects, as seen by a significant reduction in tumor growth (figure 4) [70]. pei-complexed nonspecific sirnas or her-2-specific naked sirnas had no effects. this was paralleled by the detection of intact her-2-specific sirnas in the tumors of the specific treatment group as early as 30 min after administration and for at least 4 h, and by the downregulation of her-2 on mrna and protein levels [70]. another receptor, vegf r2, was targeted in a study employing self-assembling nanoparticles based on sirnas complexed with pei which is pegylated, with an rgd peptide ligand attached at the distal end of the peg. while the pegylation allows steric stabilization and reduces nonspecific interactions of the complexes, the rgd motif provides tumor selectivity due to its ability to target integrins expressed on activated endothelial cells in the tumor vasculature. upon iv administration into mice bearing subcutaneous n2a neuroblastoma tumor xenografts, a selective tumor uptake and a vegf r2 downregulation were observed, resulting in decreased tumor growth and tumor angiogenesis [73].
figure 2: proposed mechanism of pei-mediated sirna transfer. due to electrostatic interactions, pei is able to complex negatively charged sirnas, leading to a compaction and the formation of small colloidal particles which are endocytosed. the "proton sponge effect" exhibited by pei complexes leads to osmotic swelling and ultimately to the disruption of the endosomes. sirnas are protected from degradation due to their tight condensation in the complex and the buffering capacity of pei. upon their release from the pei-based complex, intact sirnas are incorporated into the risc complex and induce rnai (see figure 1).
the receptor ligand, vegf, is a mitogenic and angiogenic growth factor stimulating tumor growth and angiogenesis in several tumors including prostate carcinoma. thus, it may represent an attractive target molecule for rnai-based gene-targeting strategies, also bearing in mind the double antitumoral effect due to the reduction of tumor cell proliferation as well as tumor angiogenesis. the subcutaneous or intraperitoneal injection of vegf-specific sirnas complexed with a novel pei obtained through fractionation of a commercially available pei (werth et al, in press) resulted in the reduction of tumor growth due to decreased vegf expression levels (höbel and aigner, unpublished results). the same was true for pei/sirna-mediated targeting of fgf-bp (dai and aigner, unpublished results), which has been established previously as "rate-limiting" for tumor growth and angiogenesis in several tumors ([82, 83], see [84] for review). finally, pei/sirna-mediated targeting of pleiotrophin (ptn) exerted strong antitumoral effects. ptn is a secreted growth factor which shows mitogenic, chemotactic, angiogenic and transforming activity [85-93] and which is markedly upregulated in several human tumors including cancer of the breast, testis, prostate, pancreas, and lung as well as in melanomas, meningiomas, neuroblastomas, and glioblastomas. the in vivo treatment of nude mice through systemic subcutaneous or ip application of pei-complexed ptn sirnas led to the delivery of intact sirnas into subcutaneous tumor xenografts and a significant inhibition of tumor growth. likewise, in a clinically more relevant orthotopic mouse glioblastoma model with u87 cells growing intracranially, the injection of pei-complexed ptn sirnas into the cns exerted antitumoral effects. this establishes, also in a complex and relevant orthotopic tumor model, the potential of pei/sirna-mediated ptn gene targeting as a novel therapeutic option in gbm, and further extends the modes of delivery of pei/sirna complexes to intrathecal strategies as employed in the therapy of glioblastomas with antisense oligonucleotides.
figure 3: protection and in vivo delivery of sirnas upon pei complexation [70]. (a) in vitro protection of sirnas against nucleolytic degradation. [32p] end-labeled sirnas, complexed (upper panel) or not complexed (lower panel) with pei, were subjected to treatment with 1% fetal calf serum at 37 °c. at the time points indicated, the samples were analysed by agarose gel electrophoresis, blotting, and autoradiography. the bands represent full-length sirna molecules, indicating that pei complexation leads to the efficient protection of sirnas while noncomplexed sirnas are rapidly degraded. (b, c) in vivo delivery of intact sirnas upon pei complexation. [32p]-labeled sirnas, complexed (+) or not complexed (−) with pei, were injected ip into mice bearing subcutaneous skov-3 ovarian carcinoma cell tumor xenografts, and after 30 min (b) or 4 h (c) total rna from various organ and tissue homogenates was prepared and subjected to agarose gel electrophoresis prior to blotting and autoradiography. the bands represent intact [32p]-labeled sirna molecules, which for several hours are mainly found in tumor and muscle as well as in liver and, time-dependently, in kidney. only small amounts of sirna are detected in the lung and traces in the brain.
figure 4: systemic treatment of mice with pei-complexed her-2-specific sirnas leads to reduced growth of subcutaneous skov-3 tumor xenografts due to decreased her-2 expression [70]. athymic nude mice bearing subcutaneous tumor xenografts were injected ip with 0.6 nmoles her-2-specific naked (open circles) or pei-complexed (closed circles) sirnas 2-3 times per week, and tumor sizes were evaluated daily from the product of the perpendicular diameters of the tumors. mean ± standard error of the mean (sem) is depicted, and student's unpaired t test was used for comparisons between data sets (** p < .03, *** p < .01). differences in tumor growth reach significance at day 5, indicating the antitumoral effects of the pei-complexed her-2-specific sirnas.
only a few years after their discovery, sirnas are catching up with ribozymes and antisense oligonucleotides as efficient tools for gene targeting in vitro and, more recently, also in vivo. this includes the exploration of their potential as therapeutics, which will lead to the development of sirna-based therapeutic strategies. their ultimate success, however, will strongly depend on the development of powerful and feasible sirna delivery strategies which need to address several issues, including the stability/stabilization of sirna molecules while preserving their efficacy and maintaining their gene-silencing activity, an efficient delivery into the target organ(s), as well as a sufficiently long sirna half-life in the organism and particularly in the target organ. thus, sirna delivery strategies must provide sirna protection and transfection efficacy as well as the absence of toxic and nonspecific effects; they must be efficacious also when using small amounts of sirnas and must be applicable in various treatment regimens and in various diseases, even when this requires overcoming biological barriers after administration to reach the target tissue or target organ. the research done on dna-based gene delivery, ribozyme-targeting, and antisense technology will facilitate this process since it already provides a basis of established technologies. this is also true for the complexation of sirnas with polyethylenimine, which may represent a promising avenue for sirna applications in vivo. this may eventually lead to novel therapeutic strategies.
the work of a. aigner is supported by the deutsche forschungsgemeinschaft (ai 24/5-1) and by the deutsche krebshilfe.
the author would like to apologize to the authors whose primary works have not been cited due to length considerations. inhibition of rous sarcoma viral rna translation by a specific oligodeoxyribonucleotide inhibition of rous sarcoma virus replication and cell transformation by a specific oligodeoxynucleotide in vitro splicing of the ribosomal rna precursor of tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence self-splicing rna: autoexcision and autocyclization of the ribosomal rna intervening sequence of tetrahymena the rna moiety of ribonuclease p is the catalytic subunit of the enzyme potent and specific genetic interference by double-stranded rna in caenorhabditis elegans altered gene expression in plants due to trans interactions between homologous genes par-1, a gene required for establishing polarity in c. elegans embryos, encodes a putative ser/thr kinase that is asymmetrically distributed a species of small antisense rna in posttranscriptional gene silencing in plants rnai: doublestranded rna directs the atp-dependent cleavage of mrna at 21 to 23 nucleotide intervals role for a bidentate ribonuclease in the initiation step of rna interference rna interference is mediated by 21-and 22-nucleotide rnas an rnadirected nuclease mediates post-transcriptional gene silencing in drosophila cells structural domains in rnai. 
febs letters atp requirements and small interfering rna structure in the rna interference pathway analysis of gene function in somatic mammalian cells using small interfering rnas functional anatomy of sirnas for mediating efficient rnai in drosophila melanogaster embryo lysate positional effects of short interfering rnas targeting the human coagulation trigger tissue factor synthetic dsrna dicer substrates enhance rnai potency and efficacy rational sirna design for rna interference site specific enzymatic cleavage of rna the efficacy of small interfering rnas targeted to the type 1 insulin-like growth factor receptor (igf1r) is influenced by secondary structure in the igf1r transcript expression of small interfering rnas targeted against hiv-1 rev transcripts in human cells a statistical sampling algorithm for rna secondary structure prediction efficient reduction of target rnas by small interfering rna and rnase h-dependent antisense agents. a comparative analysis effective small interfering rnas and phosphorothioate antisense dnas have different preferences for target sites in the luciferase mr-nas the activity of sirna in mammalian cells is related to structural target accessibility: a comparison with antisense oligonucleotides mammalian rnai: a practical guide expression profiling reveals off-target gene regulation by rnai sequence-specific potent induction of ifn-alpha by short interfering rna in plasmacytoid dendritic cells through tlr7 activation of the interferon system by short-interfering rnas induction of an interferon response by rnai vectors in mammalian cells duplexes of 21-nucleotide rnas mediate rna interference in cultured mammalian cells specific inhibition of gene expression by small double-stranded rnas in invertebrate and vertebrate systems sequence-dependent stimulation of the mammalian innate immune response by synthetic sirna interferon induction by sirnas and ssrnas synthesized by phage polymerase virus treatment questioned after gene 
therapy death molecular basis of the inflammatory response to adenovirus vectors immune responses to adeno-associated virus and its recombinant vectors helper virus induced t cell lymphoma in nonhuman primates after retroviral mediated gene transfer prospects for cationic polymers in gene and oligonucleotide therapy against cancer. advanced drug delivery reviews cationic liposomes for gene delivery: novel cationic lipids and enhancement by proteins and peptides cationic transfection lipids high-throughput screening of effective sirnas from rnai libraries delivered via bacterial invasion size matters: molecular weight affects the efficiency of poly(ethylenimine) as a gene delivery vehicle the influence of polymer structure on the interactions of cationic polymers with dna and morphology of the resulting complexes preparation of a low molecular weight polyethylenimine for efficient cell transfection a versatile vector for gene and oligonucleotide transfer into cells in culture and in vivo: polyethylenimine the proton sponge: a trick to enter cells the viruses did not exploit recent advances in rational gene transfer vector design based on poly(ethylene imine) and its derivatives gene transfer with modified polyethylenimines targeted nucleic acid delivery into tumors: new avenues for cancer therapy transfer of yacs up to 2.3 mb intact into human cells with polyethylenimine a powerful nonviral vector for in vivo gene transfer into the adult mammalian brain: polyethylenimine nonviral gene delivery to the rat kidney with polyethylenimine polyethylenimine-based intravenous delivery of transgenes to mouse lung polyethylenimine-mediated cellular uptake, nucleus trafficking and expression of cytokine plasmid dna polyethylenimine shows properties of interest for cystic fibrosis gene therapy poly(ethylenimine)-mediated transfection: a new paradigm for gene delivery a novel nonviral vector for dna delivery based on low molecular weight, branched polyethylenimine: effect of 
molecular weight on transfection efficiency and cytotoxicity. pharmaceutical research inhibition of monocyte adhesion on brain-derived endothelial cells by nf-kappab decoy/polyethylenimine complexes vitro cytotoxicity testing of polycations: influence of polymer structure on cell viability and hemolysis low-molecularweight polyethylenimine as a non-viral vector for dna delivery: comparison of physicochemical properties, transfection efficiency and in vivo distribution with highmolecular-weight polyethylenimine the size of dna/transferrin-pei complexes is an important factor for gene expression in cultured cells different behavior of branched and linear polyethylenimine for gene delivery in vitro and in vivo polyethylenimine-based non-viral gene delivery systems delivery of unmodified bioactive ribozymes by an rna stabilizing polyethylenimine lmw pei efficiently down regulates gene expression physicochemical and biological characterization of polyethylenimine-graft-poly(ethylene glycol) block copolymers as a delivery system for oligonucleotides and ribozymes. bioconjugate chemistry efficiency of polyethylenimines and polyethylenimine-graft-poly (ethylene glycol) block copolymers to protect oligonucleotides against enzymatic degradation rnai-mediated gene-targeting through systemic application of polyethylenimine (pei)-complexed sirna in vivo lipid-mediated sirna delivery down-regulates exogenous gene expression in the mouse brain at picomolar levels stabilization of oligonucleotide-polyethylenimine complexes by freeze-drying: physicochemical and biological characterization cancer sirna therapy by tumor selective delivery with ligand-targeted sterically stabilized nanoparticle inhibition of influenza virus production in virus-infected mice by rna interference studies of the her-2/neu proto-oncogene in human breast and ovarian cancer her-2/neu is rate-limiting for ovarian cancer growth. 
conditional depletion of her-2/neu by ribozyme targeting ribozyme targeting of her-2 inhibits pancreatic cancer cell growth in vivo adenovirus-mediated ribozyme targeting of her-2/neu inhibits in vivo growth of breast cancer cells adenovirusmediated transduction of ribozymes abrogates her-2/neu and pleiotrophin expression and inhibits tumor cell proliferation small interfering rna (sirna) inhibits the expression of the her2/neu gene, upregulates hla class i and induces apoptosis of her2/neu positive tumor cell lines inhibition of breast and ovarian tumor growth through multiple signaling pathways by using retrovirus-mediated small interfering rna against her-2/neu gene expression a secreted fgf-binding protein can serve as the angiogenic switch in human cancer ribozyme-targeting of a secreted fgf-binding protein (fgf-bp) inhibits proliferation of prostate cancer cells in vitro and in vivo the fibroblast growth factor-binding protein fgf-bp. to appear in the international pleiotrophin stimulates fibroblasts and endothelial and epithelial cells and is expressed in human cancer anti-apoptotic signaling of pleiotrophin through its receptor, anaplastic lymphoma kinase a heparin-binding growth factor secreted from breast cancer cells homologous to a developmentally regulated cytokine a novel 17 kd heparin-binding growth factor (hbgf-8) in bovine uterus: purification and n-terminal amino acid sequence human breast cancer growth inhibited in vivo by a dominant negative pleiotrophin mutant melanoma angiogenesis and metastasis modulated by ribozyme targeting of the secreted growth factor pleiotrophin ribozyme-targeting elucidates a direct role of pleiotrophin in tumor growth wellstein a. 
human trophoblast and choriocarcinoma expression of the growth factor pleiotrophin attributable to germline insertion of an endogenous retrovirus an angiogenic role for the neurokines midkine and pleiotrophin in tumorigenesis caspase 8 small interfering rna prevents acute liver failure in mice small interfering rna inhibits hepatitis b virus replication in mice inhibition of hepatitis b virus replication in vivo by nucleoside analogues and sirna efficient delivery of sirna for inhibition of gene expression in postnatal mice rna interference targeting fas protects mice from fulminant hepatitis lack of interferon response in animals to naked sirnas sequencespecific suppression of mdr1a/1b expression in mice via rna interference caspase-8 and caspase-3 small interfering rna decreases ischemia/reperfusion injury to the liver in mice gene silencing in rat-liver and limb grafts by rapid injection of small interference rna silencing of cxcr4 blocks breast cancer metastasis. cancer research targeting 2a protease by rna interference attenuates coxsackieviral cytopathogenicity and promotes survival in highly susceptible mice systemic sirna-mediated gene silencing: a new journal of biomedicine and biotechnology approach to targeted therapy of cancer epha2: a determinant of malignant cellular behavior and a potential therapeutic target in pancreatic adenocarcinoma rna interference targeting focal adhesion kinase enhances pancreatic adenocarcinoma gemcitabine chemosensitivity small interfering rna targeting fas protects mice against renal ischemia-reperfusion injury protection against lethal influenza virus challenge by rna interference in vivo gene silencing in the endocrine pancreas mediated by short-interfering rna in vivo delivery of small interfering rna targeting brain capillary endothelial cells sirna-mediated inhibition of vascular endothelial growth factor severely limits tumor resistance to antiangiogenic thrombospondin-1 and slows tumor vascularization and growth 
variants of bcl-2 specific sirna for silencing antiapoptotic bcl-2 in pancreatic cancer down-regulation of apoptosis mediators by rnai inhibits axotomy-induced retinal ganglion cell death in vivo in vivo gene silencing (with sirna) of pulmonary expression of mip-2 versus kc results in divergent effects on hemorrhageinduced, neutrophil-mediated septic acute lung injury rna interference in adult mice small interfering rna (sirna) targeting vegf effectively inhibits ocular neovascularization in mouse model effects of treatment with small interfering rna on joint inflammation in mice with collagen-induced arthritis modification of professional antigen-presenting cells with small interfering rna in vivo to enhance cancer vaccine potency inhibition of respiratory viruses by nasally administered sirna small interfering rna targeting heme oxygenase-1 enhances ischemia-reperfusioninduced lung apoptosis using sirna in prophylactic and therapeutic regimens against sars coronavirus in rhesus macaque successful incorporation of short-interfering rna into islet cells by in situ perfusion anti-rhoa and anti-rhoc sirnas inhibit the proliferation and invasiveness of mda-mb-231 breast cancer cells in vitro and in vivo colony-stimulating factor-1 blockade by antisense oligonucleotides and small interfering rnas suppresses growth of human mammary tumor xenografts in mice sirna relieves chronic neuropathic pain exploring rna inteference as a therapeutic strategy for renal disease silencing of fas, but not caspase-8, in lung epithelial cells ameliorates pulmonary apoptosis, inflammation, and neutrophil influx after hemorrhagic shock and sepsis reducing hypothalamic agrp by rna interference increases metabolic rate and decreases body weight without influencing food intake neurochemical and behavioral consequences of widespread gene knockdown in the adult mouse brain by using nonviral rna interference sirna-mediated knockdown of the serotonin transporter in the adult mouse brain sirna 
targeted against amyloid precursor protein impairs synaptic activity in vivo inhibition of ocular angiogenesis by sirna targeting vascular endothelial growth factor pathway genes: therapeutic strategy for herpetic stromal keratitis rna interference targeting transforming growth factor-beta type ii receptor suppresses ocular inflammation and fibrosis. molecular vision antitumor activity of small interfering rna/cationic liposome complex in mouse models of cancer small interfering rna-mediated functional silencing of vasopressin v 2 receptors in the mouse kidney therapeutic epha2 gene targeting in vivo using neutral liposomal small interfering rna delivery sirna-induced caveolin-1 knockdown in mice increases lung vascular permeability via the junctional pathway cationic liposome-mediated delivery of sirnas in adult mice efficient delivery of small interfering rna for inhibition of il-12p40 expression in vivo small interfering rnas directed against beta-catenin inhibit the in vitro and in vivo growth of colon cancer cells gene silencing by systemic delivery of synthetic sirnas in adult mice intravesical administration of small interfering rna targeting plk-1 successfully prevents the growth of bladder cancer in vitro and in vivo suppression of gjb2 expression by rna interference blockage of the macrophage migration inhibitory factor expression by short interference rna inhibited the rejection of an allogeneic tracheal graft widespread lipoplex-mediated gene transfer to vascular endothelial cells and hemangioblasts in the vertebrate embryo systemic delivery of raf-sirna using cationic cardiolipin liposomes silences raf-1 expression and inhibits tumor growth in xenograft model of human prostate cancer novel cationic cardiolipin analogue-based liposome for efficient dna and small interfering rna delivery in vitro and in vivo an efficient intrathecal delivery of small interfering rna to the spinal cord and peripheral neurons comparison of antisense oligonucleotides and 
sir-nas in cell culture and in vivo an sirna-based microbicide protects mice from lethal herpes simplex virus 2 infection therapeutic silencing of an endogenous gene by systemic administration of modified sirnas potent and persistent in vivo anti-hbv activity of chemically modified sir-nas inhibition of gene expression in mice muscle by in vivo electrically mediated sirna delivery small interfering rna targeting raf-1 inhibits tumor growth in vitro and in vivo a small interfering rna targeting vascular endothelial growth factor as cancer therapeutics atelocollagenmediated synthetic small interfering rna delivery for effective gene silencing in vitro and in vivo efficient delivery of small interfering rna to bone-metastatic tumors by using atelocollagen in vivo rad51 sirna delivered by hvj envelope vector enhances the anti-cancer effect of cisplatin antibody mediated in vivo delivery of small interfering rnas via cell-surface receptors silencing heat shock factor 1 by small interfering rna abrogates heat shock-induced cardioprotection against ischemiareperfusion injury in mice reconstituted influenza virus envelopes as an efficient carrier system for cellular delivery of small-interfering rnas key: cord-000012-p56v8wi1 authors: bigot, yves; samain, sylvie; augé-gouillou, corinne; federici, brian a title: molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis date: 2008-09-18 journal: bmc evol biol doi: 10.1186/1471-2148-8-253 sha: doc_id: 12 cord_uid: p56v8wi1 background: female endoparasitic ichneumonid wasps inject virus-like particles into their caterpillar hosts to suppress immunity. these particles are classified as ichnovirus virions and resemble ascovirus virions, which are also transmitted by parasitic wasps and attack caterpillars. ascoviruses replicate dna and produce virions. polydnavirus dna consists of wasp dna replicated by the wasp from its genome, which also directs particle synthesis. 
structural similarities between ascovirus and ichnovirus particles and the biology of their transmission suggest that ichnoviruses evolved from ascoviruses, although molecular evidence for this hypothesis is lacking. results: here we show that a family of unique pox-d5 ntpase proteins in the glypta fumiferanae ichnovirus are related to three diadromus pulchellus ascovirus proteins encoded by orfs 90, 91 and 93. a new alignment technique also shows that two proteins from a related ichnovirus are orthologs of other ascovirus virion proteins. conclusion: our results provide molecular evidence supporting the origin of ichnoviruses from ascoviruses by lateral transfer of ascoviral genes into ichneumonid wasp genomes, perhaps the first example of symbiogenesis between large dna viruses and eukaryotic organisms. we also discuss the limits of this evidence through complementary studies, which revealed that passive lateral transfer of viral genes among polydnaviral, bacterial, and wasp genomes may have occurred repeatedly through an intimate coupling of both recombination and replication of viral genomes during evolution. the impact of passive lateral transfers on evolutionary relationships between polydnaviruses and viruses with large double-stranded genomes is considered in the context of the theory of symbiogenesis. approximately two-thirds of these wasps are endoparasites, meaning that the larval stages develop within the body cavity of their hosts, typically other insects. among the most successful of these endoparasitic wasps are those that use lepidopteran larvae as hosts. owing to the economic importance of these insects and the utility of their wasp parasites as biological control agents, the ability of these parasites to develop within lepidopteran hosts without triggering an intense immune response has been the subject of numerous studies over the past forty years. 
early studies of the mediterranean flour moth, ephestia kuhniella, parasitized by the ichneumonid, venturia canescens, showed that eggs of this species are coated with particles that resemble virions [2] [3] [4] and contain surface proteins that mimic host proteins, thus keeping the eggs and larvae from being recognized as foreign material by their host. these particles lack dna, and thus are not considered virions [5]. with respect to both species number and mechanisms that lead to successful parasitism, endoparasitic wasps are known to inject secretions at oviposition, but only a few lineages use viruses or virus-like particles (vlps) to evade or to suppress host defences. in the family ichneumonidae, for example, four types of host defence suppression mediated by the injection of fluids or suspensions are known that lead to successful parasitism. 1) fluid injected with eggs bypasses host defences without the aid of viruses or vlps [6]. 2) wasps inject a virus that replicates in both the wasp and lepidopteran host. one example is the wasp diadromus pulchellus, which injects an ascovirus, dpav4 [7], into host pupae to circumvent the host defence response. 3) the wasp injects vlps capable of molecular mimicry and/or direct defence suppression. 4) the wasp injects polydnavirus particles that contain genes coding for proteins that interfere with host defence responses. the last mechanism is by far the best-studied type of direct immune suppression by ichneumonid wasps, and occurs in many species belonging to the genera campoletis, hyposoter and tranosema (ichneumonidae, campopleginae), and glypta (ichneumonidae, banchinae) [8]. in these cases, female wasps inject eggs along with ichnovirus particles into their hosts. similarly, in certain lineages of endoparasitic braconid wasps, other types of immunosuppressive particles containing dna occur in the fluid injected along with eggs ([9]; for a review, see [10]). 
once in the host, ichneumonid and braconid particles enter host nuclei and their dna is transcribed, producing proteins that selectively suppress various steps in the host defence response. as a result of this unusual biology, these particles were described as symbiotic viruses belonging to a new viral family, polydnaviridae [10] [11] [12]. since the 1970s, it was assumed that the dna in the polydnavirus particles, as with all other viruses, encoded typical enzymes and proteins for viral replication and virion assembly and structure. however, several recent genomic studies have shown that only a small number of the genes vectored into lepidopteran hosts, less than 2%, have homologs in other viruses. most viral dna is noncoding, except that which codes for wasp proteins involved in suppression of immune pathways, such as phenoloxidase activation and the toll pathways [8, 13, 14]. even before these genomic studies, it was suggested that these particles were more similar to organelles than viruses [15]. the similarities between particle structure and virions of known types of complex dna insect viruses are striking, and suggest these immunosuppressive particles originated by symbiogenesis between viruses and endoparasitic wasps, the same evolutionary process by which mitochondria and plastids originated from symbiotic bacteria [16]. for example, most braconid wasps produce enveloped bacilliform particles classified as bracoviruses, and these resemble baculovirus and nudivirus virions [10, 15]. similarly, ichneumonid wasps produce enveloped spindle-shaped particles classified as ichnoviruses that resemble virions of ascoviruses, viruses lethal to lepidopterans, which, interestingly, are vectored by endoparasitic wasps [15]. 
it must also be noted that ichnoviruses resemble other true virus particles that are structurally very similar to virions of ascoviruses, but which remain unclassified because of the lack of information about their genomes [17] [18] [19] [20] [21]. however, ascoviruses and ichnoviruses display very different genome properties; similar genomic differences occur between bracoviruses and baculoviruses or nudiviruses, suggesting that convergent evolution led to the origin of the different polydnavirus types from at least two different types of viruses. in ascoviruses, the genome consists of a single circular dna molecule ranging from 119 to 180 kbp in size [7]. phylogenetic analyses of several viral genes have revealed that ascoviruses are closely related to iridoviruses [22], and likely evolved from them. in contrast, the genome of ichnoviruses is composed of multiple circular dna molecules (25 to 105) representing a total size of 250 to 300 kbp, all of which are replicated from the wasp chromosomes. the ichnovirus proviral genome is specifically excised and amplified in several segments in the female calyx cells, the only wasp tissue in which ichnovirus virogenesis occurs. after assembly, these particles are secreted into the female genital tract. once injected into the host, the ichnovirus genome does not replicate, and does not lead to the production of a new virus generation. the third characteristic of ichnoviruses is that most of the genes borne by the particles are not related to viral genes. among the 7 annotated ichnovirus gene families, there are four (rep, prrp, n, and trv) for which no homology with known eukaryotic (or prokaryotic) proteins has been detected and for which no function has been proposed. 
among the remaining three (cys, ank and inx), cys-motif proteins have no clear homologs among eukaryotic (or prokaryotic) proteins, although the "cysteine knot" that they form is a folding domain found in many proteins, but not one that is necessarily related to eukaryotic host immune systems [10, 14]. however, some protein domains and their putative functions suggest that they might be related to regulatory components of eukaryotic host defence systems that are not sufficiently elucidated. although the resemblance of the polydnavirus virions to those of conventional insect viruses suggests that the former evolved from the latter, to date no molecular evidence supports this hypothesis. in the case of ascoviruses and ichnoviruses, well-conserved genes found among the three ascoviruses sequenced so far (sfav1a [23], tnav2c [24], and hvav3e [25]) are not found in ichnovirus genomes. as noted above, the principal reason for this is that the genomes of the latter viruses appear to contain mainly wasp genes, not viral genes. this highlights the need for new and alternative types of sequence data obtained from pertinent biological systems. in this regard, dpav4 has features that could provide important insights. indeed, it is the only ascovirus known to replicate in both its wasp and caterpillar hosts. it is transmitted vertically from wasp to caterpillars to suppress the defence response of the latter host, thereby enabling parasite development [26, 27]. moreover, in males and females of d. pulchellus, the dpav4 genome resides in the nuclei of all host cells, providing a possible example of what may have been an intermediate stage in the symbiogenesis that led to the evolutionary origin of ichnoviruses. 
we recently sequenced the dpav4 genome, and a combination of our analysis of this genome and recent data from new types of ichnoviruses, as well as new software programs that elucidate protein relationships based on structural analysis, have enabled us to detect phylogenetic relationships between proteins encoded by open reading frames of dpav4 and the glypta fumiferanae (gfiv) and campoletis sonorensis (csiv) ichnoviruses. in support of the symbiogenesis hypothesis for the origin of ichnoviruses, data and analyses suggest two independent symbiogenic events, in agreement with what was previously proposed [28]. the first led to the ichnoviruses in the banchinae lineage. this hypothesis is based on the occurrence of a gene cluster present in gfiv and dpav4. the second symbiogenic event led to ichnoviruses in the campopleginae wasp lineage. this hypothesis is based on relationships of the major capsid proteins among csiv, ascoviruses and iridoviruses. extending our investigations to proteins encoded by open reading frames of certain ascoviruses and bracoviruses, hosts and bacteria, in the light of recent analyses about the involvement of the replication machinery of virus groups related to ascoviruses in lateral gene transfer [29], we discuss the robustness and the limits of the molecular evidence supporting an ascovirus origin for ichnovirus lineages. the dpav4 genome sequenced by genoscope (france) is 119,334 bp in length. its organization, gene content and evolutionary characteristics will be detailed in a separate publication (manuscript in preparation; additional file 1). however, blast results obtained with several orfs in the dpav4 genome provide evidence that certain ichnovirus orfs have their closest relatives in an ascovirus genome. specifically, we identified a 13-kbp region that contains a cluster of three genes (fig. 1, orf90, 91 and 93; additional files 1 and 2) that have close homologs in a gfiv gene family composed of seven members [28]. 
all contain a domain similar to a conserved domain found in the pox-d5 family of ntpases. to date, this pox-d5 domain has been identified as an ntp-binding domain of about 250 amino acid residues found only in viral proteins encoded by poxvirus, iridovirus, ascovirus and mimivirus genomes. these genes seem to be specific to gfiv, as they are absent in the three sequenced genomes of other ichnoviruses, namely csiv, tranosema rostrales ichnovirus (triv), and hyposoter fugitivus ichnovirus (hfiv). more specifically, in dpav4, orf90 encodes a protein of 925 amino acid residues that is 40% similar from position 140 to 925 to a protein of 972 amino acid residues encoded by the orf1 contained in the segment c20 in the gfiv genome (fig. 2). these two proteins can therefore be considered putative orthologs. the 480 c-terminal residues of this dpav4 protein are also 42% similar to the c-terminal domain of the protein homologs encoded by the orf1 of the d1 and d4 gfiv segments, 36% similar to the n-terminal and the c-terminal domains of the protein encoded by the orfs 184r and 128l of the iridoviruses civ and lcdv, and 30% similar to those encoded by orfs 119, 99 and 78 in the ascovirus genomes of hvav3e, sfav1a and tnav2c, respectively. overall, this indicates that this dpav4 protein is more closely related to that of gfiv than to those found in other ascovirus and iridovirus genomes currently available in databases. orf091 encodes a protein of 161 amino acid residues similar only to the c-terminal domain of three proteins encoded by the orfs 1, 1 and 3, contained, respectively, in gfiv segments d1, d4 and d3. in contrast, orf93 is closer to iridovirus and ascovirus genes than to gfiv genes. 
this protein of 849 amino acid residues is 43% similar over all its length to civ orf184r orthologs in all iridoviral and ascoviral genomes and is only 36% similar over 350 amino acid residues to the c-terminal domain of the gfiv protein homologs encoded by the orf1, 2, 1, 1, 1 and 1 in, respectively, the c20, c21, d1, d2, d3 and d4 segments of this virus. analysis of the genes surrounding the dpav4 orf-90-91-93 cluster confirms that this virus has an ascovirus origin since this region contains orfs that are close homologs of genes in iridovirus and ascovirus genomes. upstream from the orf-90-91-93 cluster, an orf encoding the dna-dependent rna polymerase 1 subunit c is present, which is an ortholog of the iridoviral civ orf176r and the ascoviral sfav1a orf008. downstream from this cluster, there are two genes, absent in known ascoviral genomes, but similar to the iridoviral civ orf115l and civ orf132l. these two genes encode, respectively, a chromosomal replication initiation protein and a zinc finger protein. in between them, a gene encoding a small protein is present that is similar to that encoded by the orf069l of the iridovirus civ, and which corresponds to the ali-like protein also found in entomopoxviruses [30]. since the three dpav4 genes have relatives in all ascovirus and iridovirus genomes sequenced so far, their presence in the dpav4 genome cannot result from a lateral transfer that occurred from an ichnovirus genome related to gfiv to dpav4. thus, as these dpav4 genes are the closest relatives of the pox-d5 gene family present in gfiv identified so far, they could be considered a landmark of the symbiogenic ascovirus origin of the ichnovirus lineage to which this polydnavirus belongs. an alternative explanation is that the presence of dpav4-like genes in the genome of gfiv resulted from a lateral transfer from viral genomes closely related to those of gfiv and dpav4. 
indeed, this might have happened when a glypta wasp was infected by an ancestral virus related to dpav4. nevertheless, the symbiogenic origin of gfiv from ascoviruses is also supported by morphological features of its virions [28], which, aside from similarities in shape, also show reticulations on their surface in negatively stained preparations, a characteristic of the virions of all ascovirus species examined to date [7]. because ascovirus virions and ichnovirus particles display structural similarities, we developed an approach to search for homologs of virion structural proteins in ichnoviruses. these approaches were initiated in 2000 and recently finalized, but some of the conclusions have been published [14]. to date, only two virion proteins from the campoletis sonorensis ichnovirus (csiv) have been characterized [31, 32]. the first is p44 (acc n° aad01199), a structural protein that appears to be located as a layer between the outer envelope and nucleocapsid, and the second, p12, is a capsid protein (acc n° af004367). presently, there are more than one hundred ascoviral or iridoviral mcp sequences in databases. blast searches using these sequences failed to detect any similarities between csiv virion proteins and ascoviral or iridoviral mcps, or any other proteins [33]. to evaluate the possibility that homology between ichnovirus and ascovirus virion proteins may simply not be detectable by conventional blastp searches, we used a different method, wapam (weighted automata pattern matching; [34]). the models were designed on the basis of a previous study [22] demonstrating that mcps encoded by ascovirus, iridovirus, phycodnavirus and asfarvirus genomes are related, and all contain 7 conserved domains separated by hinges of very variable size. we investigated these conserved domains further using hydrophobic cluster analysis (hca, [35]). 
figure 1 map of the 13-kbp region of the dpav4 genome (embl acc. n° cu469068 and cu467486) that contains the gene cluster with direct homologs in the genome of the glypta fumiferanae ichnovirus. 
figure 2 amino acid sequence comparison resulting from a blast search done with the dpav4 orf90 as a query, and the best hit corresponding to the protein encoded by the orf1 of the ichnovirus segment gfv-c20 (subject; genbank acc. n° yp_001029423). 
figure 3 [...] csiv (lanes 1 and 4, typed in black), dpav4 (lanes 2 and 5, typed in blue) and sfav1a (lanes 3 and 6, typed in purple). conserved positions among the amino acid sequence of csiv and those of dpav4 and sfav1a are highlighted in grey. secondary structures in the three sfav1a orf061 orthologs were calculated with the network protein sequence analysis at http://npsa-pbil.ibcp.fr/ and the statistical relevance of the secondary structures was evaluated with psipred at http://bioinf.cs.ucl.ac.uk/psipred/. c, e and h in lanes 4 to 6 indicate, for each amino acid, that it is involved in a coil, β-sheet or α-helix structure, respectively. using default parameters of psipred, upper case letters indicate that the predicted secondary structure is statistically significant in psipred results. significant secondary structures are highlighted in yellow. in (a), the comparisons were limited to three of the seven conserved domains (additional file 3a, b and 3c), namely domains 2, 5 and 7. indeed, classical in silico methods appeared to be inappropriate to predict statistically significant secondary structures in conserved structural proteins rich in β-strands such as iridovirus and ascovirus mcps. in contrast, a complete and coherent domain comparison was obtained by hca profiles (fig. s3b, c). 
analysis revealed that most conservation occurred at the level of hydrophobic residues, as expected for structural proteins (additional file 3a and 3b). the size variability of the hinges between conserved domains and the conservation of hydrophobic residues might explain why blast searches using iridoviral and ascoviral mcp sequences have limited ability to detect mcp orthologs in phycodnavirus and asfarvirus genomes. we designed two syntactic models (see materials and methods), which together were able to specifically align all mcp sequences of the four virus families. importantly, wapam aligned the csiv ichnovirus p44 structural protein with both models. complementary structural and hca analyses confirmed the presence of the seven conserved domains in this csiv structural protein (fig. 3a and additional file 3c). in addition to the above analysis, ten syntactic models were developed using proteins conserved in the three sequenced ascovirus species (sfav1a, tnav2c, and hvav3a) and twelve iridoviruses [36]. none of these [...], developed from small proteins encoded by the dpav4 orf041, sfav1a orf061, hvav3a orf74, and tnav2c orf118 in the ascovirus genomes, and iridovirus civ orf347l and mimivirus miv orf096r genomes, respectively. importantly, these proteins have orthologs in vertebrate iridoviruses, phycodnaviruses, and asfarvirus. in sfav1a, the peptide encoded by orf061 is one of the virion components. in ascoviruses, iridoviruses, phycodnaviruses, and the asfarvirus, they have been annotated as thioredoxins, proteins that play a role in initiating viral infection [37] [38] [39]. database mining with our model revealed four hits with csiv sequences (acc n°. m80623, s47226, af236017, af362508), each an orf homologous to sfav1a orf061. in fact, these sequences correspond to several variants of a single region contained in the b segment of the csiv genome. 
to date, these have not been annotated in the final csiv genome, probably because they overlap a recombination site. hca analyses confirmed that the hydrophobic cores were conserved ( fig. 3b and additional file 3d and 3e). the chromosomal locations of genes encoding these two csiv proteins, i.e., p44 and p12, were also consistent with the symbiogenesis hypothesis. in fact, the orf encoding p44 is not found in proviral dna. it is notable that no orfs encoding orthologs of p44 or other structural proteins such as mcps are found in any of the other three ichnovirus genomes sequenced -triv, gfiv, hfiv [8, 14] . therefore, this indicates that the orthologs of ichnovirus mcps and other virion structural proteins are also probably located in the genomes of these wasps, i.e., not in proviral dna. in contrast to this, we found that the gene encoding the csiv ortholog of sfav1a orf061 is located within the proviral dna. whether ortholog proteins are similarly involved in the triv, gfiv and hfiv biology, their genes are not found in proviral dna, since no matches were detected in their viral genomes. the phylogenetic analysis performed previously on p44 and the sfav1a orf061 orthologs [15] indicated that they have an ancestor close to that of the ascoviruses and iridoviruses. as in the case of genes encoding pox-d5 family of ntpases in all ascoviruses, iridoviruses, and gfiv, genes encoding virion proteins cannot result from a horizontal transfer from a campoplegine or banchine ichnovirus genome to all ascovirus, iridovirus, phycodnaviruses and asfarvirus genomes. as the ascovirus genes encoding the two virion proteins investigated here are the closest relatives of virion proteins in csiv, they can be considered a landmark reflecting the symbiogenic origin of the two ichnovirus lineages from ascoviruses closely related to dpav4. 
in fact, the difficulty encountered in elucidating their sequence relationships can be explained by a combination of the marked transition from ascovirus to ichnovirus, and the significant selection constraints that resulted as the latter virus type evolved from the former. analysis of available ascovirus, iridovirus and ichnovirus genomes provides some of the first molecular support for the hypothesis that ichnoviruses evolved from ascoviruses by symbiogenesis. however, examining genes shared only by ascovirus, iridovirus and ichnovirus genomes likely limits the sources of genes that contributed to the evolution and complexity of these viruses, especially of the role of lateral gene transfer. relevant to this is the recent finding that an important part of the mimivirus and phycodnavirus genomes had a bacterial origin [28]. obviously, this did not lead to the conclusion that these viruses had a bacterial origin. the cytoplasmic environment in which these viruses replicate is rich in bacterial dna because their amoebae and unicellular algae hosts feed on bacteria that they digest in their cytoplasm. thus, it has been proposed [28] that lateral transfers of bacterial dna within these viral genomes were driven by intimate coupling of recombination and viral genome replication. indeed, replication of these viruses is similar to that of bacteriophage t4. this mode of replication has been called recombination-primed replication. it permits integration of dna molecules with sequence homology as short as 12 bp [28, 40]. the replication machinery used by ascoviruses, iridoviruses, mimiviruses, phycodnaviruses, and other nucleocytoplasmic large dna viruses (ncldv) [41, 42] is common to all of them, despite differences in the specifics of replication in each virus family. it can therefore be expected that recombination-primed replication occurred repeatedly during evolution of both these viruses and the genome of their eukaryotic hosts. 
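to give a sense of how little shared sequence a 12-bp homology represents, the following is a minimal python sketch that lists the exact 12-mers common to two dna strings. it is an illustration only, not part of the analyses reported here: the function name `shared_kmers` and both toy sequences are hypothetical, and exact k-mer matching is a simplification of what a recombination machinery would actually recognize.

```python
def shared_kmers(seq_a, seq_b, k=12):
    """Return the set of exact k-mers present in both sequences."""
    a = seq_a.lower()
    b = seq_b.lower()
    # Collect every window of length k from each sequence.
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    # Shared windows are candidate stretches of short homology.
    return kmers_a & kmers_b


# Hypothetical toy sequences, invented for illustration (not real viral or host dna).
viral = "ttgacctagcatcggatcctaagg"
host = "ccatagcatcggatcctttgacaa"

for kmer in sorted(shared_kmers(viral, host)):
    print(kmer)
```

the set-based lookup keeps the sketch short; for genome-scale inputs an indexed or suffix-based search would be a more realistic choice.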
in a eukaryotic cellular environment in which bacteria, chromosomes, ncldv viruses and non-ncldvs (such as baculoviruses) intimately cohabit temporarily or permanently, recombination-primed replication is able to allow reciprocal passive lateral transfers between viral genomes, host chromosomes, and bacterial dna. under these conditions, lateral transfers are considered passive since they just result from the intimate environment and not from an active mechanism dedicated to genetic exchanges. in ascoviruses and iridoviruses, the occurrence of such lateral transfers is supported by blastp searches that detected the presence of orfs whose closest relatives have their origin within eukaryotic genomes (e.g., for dpav4, in additional data 1, orfs 029, 049, 077, 080, 083, 118), bacterial genomes (e.g., for dpav4, in additional data 1, orfs 056, 057, 059, 112, 115 and 119) or viruses belonging to other ncldv and non-ncldv families (e.g., for dpav4, in additional data 1, orfs 007, 037, 062, 068). the transmission of ascoviruses is unusual in that they are poorly infectious per os and appear to be transmitted among lepidopteran hosts by parasite wasp vectors at oviposition [7, 43]. the genome of the ascoviruses can be replicated in the presence of polydnavirus dna either within the reproductive tissues of female wasps or within the body of the parasitized hosts infected by both polydnavirus and ascovirus. consequently, integrated sequences of ascovirus origin can be expected within wasp and polydnavirus genomes. reciprocally, sequences of polydnavirus origin may have been integrated in ascovirus genomes, whatever the wasp origin, ichneumonid or braconid. one gene family related to a bacterial family of n-acetyl-l-glutamate 5-phosphotransferase (acc. n° of the closest bacterial relatives yp_001354925, cam32558, zp_00944224, zp_02006449), identified only within the sfav1a, hvav3e and tnav2c genomes, supports this conclusion. 
it has been found in the genome of a bracovirus, cotesia congregata bracovirus (ccbv [13]; fig. 4). since this gene is absent in the genome of microplitis demolitor bv, a related bracovirus [8], it is difficult to infer the direction of the lateral transfer between the common ancestors of the three ascoviruses and of the wasp c. congregata. however, these data unambiguously indicate that there was at least one lateral transfer for this gene between the common ancestor of ascoviruses and the parasitic wasp. since iridoviruses, like ascoviruses and other virus species [44, 45], are, in some cases, vectored by parasitic wasps, databases were mined using all the available ichnovirus proteins as queries. we found no significant relationships between csiv, hfiv and triv genomes and genomes of their putative closest ncldv and non-ncldv relatives. this indicates that passive lateral gene transfers from virus to eukaryotes that are successfully spread and maintained in ichnovirus genomes remain rare events. one case of such lateral transfer was described in the ccbv genome. in this genome, aside from the presence of cardinal endogenous eukaryotic retrotransposons and polintons that transposed in the chromosomal dna of the proviral form of ccbv [46] [47] [48], two genes encoding acmnpv p94-related proteins, which have their closest relatives among granuloviruses (xcgv), were found. this suggests that ccbv contained at least two cases of lateral transfers between non-ncldv and a bracovirus. our results provide another source of evidence that passive lateral gene transfers have occurred regularly during evolution from bacteria to viruses and eukaryotes, and between viruses and eukaryotes [49] [50] [51] [52]. even if the pox-d5 ntpase genes in the gfiv genome, and the mcp and sfav061-like genes in the csiv genome, indicate that they have an ascovirus origin, they provide only limited evidence supporting an ascovirus origin of ichnoviruses. 
indeed, their sequence conservation and biological characteristics suggest that there were repeated lateral transfers during evolution between ascoviruses and wasp genomes, including the proviral ichnovirus loci. this raises an important issue about the role of lateral transfers during co-evolution of the ncldvs and non-ncldvs, ichnovirus, wasp and parasitized host. indeed, genetic materials of various origins have been exchanged and maintained during co-evolution. this therefore suggests that ichnoviruses might be chimeric entities partly resulting from sev[...] 
symbiogenesis was first proposed as an evolutionary mechanism when it became widely recognized that mitochondria and plastids originated from free-living prokaryotes [7]. the genomes of the endosymbiotic cyanobacteria and proteobacteria, respectively, at the origin of chloroplasts and mitochondria have evolved by reduction of several orders of magnitude to the approximate size of plasmids. concurrently, nuclear genomes have been the recipients of plastid genomes. this relocation of the genes encoding most proteins of the endosymbiotic bacteria to the host nucleus is the ultimate step of this evolutionary process, so-called endosymbiogenesis [7, 53]. recent studies of plants have revealed a constant deluge of dna from organelles to the nucleus since the origin of organelles [54]. this allows the host cell to have genetic control over its organelles, in a relationship that is closer to enslavement or domestication than to a symbiosis or a mutualism in which the organelles would recover benefits from their contribution to the eukaryotic cell's well-being. to date, this deluge of dna is considered to correspond to passive lateral transfers that result from the interactions between the life cycle of the organelle and nuclear replication. numerous cases of symbiogenesis between endocellular bacteria and a wide variety of eukaryotic hosts have been characterized. 
however, recent work has demonstrated that this evolutionary process was not restricted to bacteria. it also occurred between endocellular eukaryotes such as unicellular algae and fungal endophytes in plants [55, 56]. endosymbiogenesis was also proposed as the evolutionary mechanism that allowed some invertebrate viruses with a large double-stranded dna genome related to the nudiviruses and the ascoviruses [22], to have led, respectively, to the origin of bracoviruses and ichnoviruses, which are currently recognized as forming two genera within the family polydnaviridae. although presently there is no definitive evidence ruling out the hypothesis that the resemblance between ichnovirus and ascovirus virions is only an evolutionary convergence, the genomic differences between ascoviruses and ichnoviruses are in good agreement with the symbiogenetic hypothesis. indeed, they match an evolutionary scenario of endosymbiogenesis during which, from a single integration event of a symbiotic virus genome, viral genes were lost and/or translocated from the provirus to other chromosomal regions (fig. 5). in parallel, host genes of interest for the wasp parasitoid were integrated and diversified by selection and gene duplication in the proviral dna. in this scenario, the more ancient the symbiogenesis, the rarer the traces of genes of viral origin in the ichnovirus genome would be. this constitutes a constraint that dramatically limits the possibility of investigating the evolutionary links between ascoviruses and ichnoviruses. results of our analyses demonstrate that the situation is also complicated by the fact that lateral gene transfers unrelated to the origin of ichnoviruses cause important misleading background noise. moreover, the scenario in figure 5 is close to a previously proposed version [57], but is not consistent with results presented here, nor with recently accumulated knowledge on dna transfer from organelles into the nucleus. 
since endocellular environments favour lateral transfers between virus and wasp nucleus, it can be proposed that genes of virus origin that are involved in the ichnovirus biology were passively integrated in one or several loci, step by step over time, alone or through transfers of gene clusters, or even the entire viral genome. since parasitoid wasps are able to vector different viruses [44, 45] , this second scenario opens the exciting possibility that virus genes involved in the ichnovirus biology might correspond to a gene patchwork resulting from transfers from viruses belonging to different ncldv and non-ncldv families. because of the background noise due to lateral gene transfers found in these systems, elucidating the origins of ichnoviruses will be very time-consuming, requiring new accurate experimental approaches to generate more robust evidence. sequencing wasp genomes to identify proteins of viral origin that are components of virions and involved in their assembly may well contribute to our understanding of how ichnoviruses and bracoviruses evolved from other insect dna viruses.
searches for similarities were mainly carried out using the blast programs at two websites, http://www.ncbi.nlm.nih.gov/blast/blast.cgi and http://genoweb.univ-rennes1.fr/serveur-gpo/outils.php3?id_rubrique=47. for dpav4 genes having their origin within eukaryotic, bacterial or virus genomes belonging to ncldv and non-ncldv families, the closest gene was located using the distance trees supplied with each blast search at the ncbi website.
construction of syntactic models: conserved amino acid blocks and positions described previously [15, 22] and with new data sets were verified or determined using meme at http://meme.sdsc.edu/meme/meme.html. in the first step, we used motifs resulting from meme to make mast minings in databases at http://meme.sdsc.edu/meme/mast.html.
since meme motifs depend significantly on the data set used to calculate them, this approach did not enable an exhaustive detection of homologs among ascoviruses, iridoviruses, phycodnaviruses, mimiviruses and asfarviruses, and the detection sensitivity was ultimately very similar to that obtained with blast. to reach our detection objectives, we therefore constructed syntactic models that only included the most conserved positions and their variable spacing using wapam at http://genoweb.univ-rennes1.fr/serveur-gpo/outils_acces.php3?id_syndic=185&lang=en. these models were defined empirically until they allowed an exhaustive detection in refseq-protein and genbank databases of the homologs among ascoviruses, iridoviruses, phycodnaviruses, mimiviruses and asfarviruses. the procedures were repeated until only exact matches with the syntactic model were detected. whatever the results obtained with wapam, they required confirmation with other approaches. here, we used psipred result comparison for regions with scores over 7 and hca analyses for regions having scores lower than 7 with psipred. this simplified the statistical treatment of the results obtained with wapam, since all exact matches have a significance or score of 100%.
figure 5 hypothetical mechanism for the integration and evolution of ascovirus genomes in endoparasitic wasps. schematic representation of the three-step process of symbiogenesis, and dna rearrangements that putatively occurred in the germ line of the wasp ancestors in the banchinae and campopleginae lineages, from the integration of an ascoviral genome to the proviral ichnoviral genome. sequences that originate from the ascovirus are in blue, those of the wasp host and its chromosomes are in pink. genes of ascoviral origin are surrounded by a thin black or white line, depending on their final chromosomal location.
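as a rough illustration of what a syntactic model of the kind described in the methods encodes, the sketch below expresses conserved positions and their variable spacing as a regular expression and keeps only exact matches. it is not wapam and does not reproduce its syntax or scoring; the motif, the helper names (`build_model`, `exact_matches`) and the toy sequences are all hypothetical and serve only to show the principle.

```python
import re


def build_model(elements):
    """Compile a pattern from conserved positions and variable-length gaps.

    Each element is either a string of residues allowed at one conserved
    position (for example "C" or "HD") or a (min_gap, max_gap) tuple giving
    the number of unconstrained residues between two conserved positions.
    """
    parts = []
    for element in elements:
        if isinstance(element, tuple):
            min_gap, max_gap = element
            parts.append(".{%d,%d}" % (min_gap, max_gap))
        else:
            parts.append("[%s]" % element)
    return re.compile("".join(parts))


def exact_matches(model, proteins):
    """Return {name: (start, end)} for sequences that match the model exactly."""
    hits = {}
    for name, sequence in proteins.items():
        match = model.search(sequence)
        if match:
            hits[name] = match.span()
    return hits


# Hypothetical motif (an assumption, not taken from the paper):
# C, then 2-4 residues, then C, then 5-10 residues, then H or D.
MODEL = build_model(["C", (2, 4), "C", (5, 10), "HD"])

# Toy sequences, invented for the example.
TOY_PROTEINS = {
    "toy_hit": "MKCAACGGGGGGHLL",
    "toy_miss": "MKCACGGH",
}

print(exact_matches(MODEL, TOY_PROTEINS))
```

because a sequence either satisfies every conserved position and spacing constraint or is rejected, there is no graded score to threshold, which is in line with the remark that exact matches need no further statistical treatment but still call for independent confirmation.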
two solutions can account for the final chromosomal organisation of the proviral ichnovirus genome, monolocus or multilocus, since this question is not fully understood in either wasp lineage. more complex alternatives to this three-step process might also be proposed and would involve, for example, the complete de novo creation of a mono or multi locus proviral genome from the recruitment by recombination or transposition of ascoviral and host genes located elsewhere in the wasp chromosomes. this model for the chromosomal organization of proviral dna in polydnaviruses is consistent with data recently published [58] . immune surface of eggs of a parasitic insect the resistance of insect parasitoids to the defense reactions of their hosts an insect glycoprotein: a study of the particles responsible for the resistance of a parasitoid's egg to the defence reactions of its insect host role of virus-like particles in parasitoid-host interaction of insects venom from the endoparasitic wasp pimpla hypochondriaca adversely affects the morphology, viability, and immune function of hemocytes from larvae of the tomato moth, lacanobia oleracea characteristics of pathogenic and mutualistic relationships of ascoviruses in field populations of parasitoid wasps polydnavirus genomes reflect their dual roles as mutualists and pathogens particles containing dna associated with the oocyte of an insect parasitoid family polydnaviridae. in virus taxonomy. 
eighth report of the international commitee on taxonomy of viruses edited by: fauquet cm virus in aparasitoid wasp: suppression of the cellular immune response in the parasitoid's host polydnaviridae -a proposed family of insect viruses with segmented, doublestranded, circular dna genomes genome sequence of a polydnavirus: insights into symbiotic virus evolution shared and species-specific features among ichnovirus genomes origin and evolution of polydnaviruses by symbiogenesis of insect dna viruses in endoparasitic wasps symbiosis in cell evolution hyenoptera: formicidae) from brazil the ultrastructure of microorganisms in the tissues of casenaria infesta (hymenoptera: ichneumonidae) apparent replication of an unusual viruslike particle in both parasitoid wasp and its host an unusual virus from the parasitic wasp cotesia melanoscela. virology viruslike particles in the ovaries of microctonus aethiopoides loan (hymenoptera: braconidae), a parasitoid of adult weevils (coleoptera: curculionidae) evidence for the evolution of ascoviruses from iridoviruses genomic sequence of spodoptera frugiperda ascovirus 1a, an enveloped, double-stranded dna insect virus that manipulates apoptosis for viral reproduction sequence and organization of the trichoplusia ni ascovirus 2c (ascoviridae) genome. 
virology sequenceand organization of the heliothis virescens ascovirus genome biological and molecular features of the relationships between diadromus pulchellus ascovirus, a parasitoid hymenopteran wasp (diadromus pulchellus) and its lepidopteran host, acrolepiopsis assectella dpav-4, on thehemocytic encapsulation response and capsule melanization of the leek-moth pupa, acrolepiopsis assectella genomic and morphological features of a banchine polydnavirus: comparison with bracoviruses and ichnoviruses i am what i eat and i eat what i am: acquisition of bacterial genes by giant viruses the genome of melanoplus sanguinipes entomopoxvirus cloning and expression of a gene encoding a campoletis sonorensis polydnavirus structural protein a gene encoding a polydnavirus structural polypeptide is not encapsidated what does structure tell us about virus evolution? cluster of re-configurable nodes for scanning large genomic banks deciphering protein sequence information through hydrophobic cluster analysis (hca): current status and perspectives comparative genomic analysis of the family iridoviridae: reannotating and defining the core set of iridovirus genes the thioredoxin system in retroviral infection and apoptosis mimivirus giant particles incorporate a large fraction of anonymous and unique gene products cell entry by enveloped viruses: redox considerations for hiv and sars-coronavirus genetic recombination of the dna plant virus pbcv-1 in a chlorella alga common origin of four diverse families of large eukaryotic dna viruses evolutionary genomics of nucleo-cytoplasmic large dna viruses effects of the nonoccluded virus of spodoptera frugiperda (lepidoptera: noctuidae) on the development of a parasitoid parasitoid-mediated transmission of an iridescent virus non-poly-dna viruses, their parasitic wasp, and hosts the few virus-like genes of cotesia congragata self-synthesizing dna transposons in eukaryotes marvericks, a novel class of giant transposable elements widespread 
in eukaryotes and related to dna viruses evolution of viruses by acquisition of cellular rna or dna nucleotide sequences and genes: an introduction microbialgenes in the human genome: lateral transfer or gene loss? science are there bugs in our genome? science express genome-wide survey for genes horizontally transferred from cellular organisms to baculoviruses morphogenesis by symbiogenesis endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes a cryptic intracellular green alga in ginkgo biloba: ribosomal dna markers reveal worldwide distribution forest succession suppressed by an introduced plant-fungal symbiosis unfolding the evolutionary story of polydnaviruses structure and evolution of a proviral locus of glyptapanteles indiensis bracovirus this research was funded by grants from the c.n.r.s. (pics n°204343), the genoscope, the a.n.r. project in bioinformatics modulome, the ministère de l'education nationale, de yb is the leader of all aspects of the research on the biology, genomics, and evolution of dpav4. ss coordinated the sequencing, assembly, and sequence quality control of the dpav4 genome. cag participated in the bioinformatics analysis of the dpav4 genome development of the manuscript. baf contributed original concepts regarding the evolutionary origins and role of polydnaviruses in endoparasitoid biology, provided virological expertise to optimize data interpretation, and participated in writing the manuscript. predicted orfs in dpav4 genome. key: cord-007760-it9wach2 authors: jiao, long r.; havlik, roman; nicholls, joanna; jensen, steen lindkaer; habib, nagy a. title: suicide gene therapy in liver tumors date: 2004 journal: suicide gene therapy doi: 10.1385/1-59259-429-8:433 sha: doc_id: 7760 cord_uid: it9wach2 charaterization of a variety of genomic defects in malignant cells (1) has led to attempts to treat cancer by gene therapy. 
gene therapy is a therapeutic approach in which therapeutic nucleic acids are transferred into the affected organs. although the ideal concept would be the replacement of the abnormal gene by a copy of the functional gene, currently there have not been reliable and safe techniques to allow the site-specific integration of dna into the human genome (2). thus, almost all gene therapies are developed by simply transferring the therapeutic gene into somatic cells without replacing the abnormal gene. the goal is to identify and correct genetic abnormalities interfering with the cell cycle and to correct them in all cells. technically, there are two methods amenable for gene transfer: reintroduction of in vitro transferred gene into the body and direct transfer of gene into the target cells in vivo. cancer is a genetic disease characterized by failure to maintain the fidelity of dna because of germ-line and/or somatic gene changes (3).
genes involved in carcinogenesis are often usefully categorized as either oncogenes, which contribute to the development of cancer, or tumor suppressor genes, which suppress the development and maintenance of the cancer phenotype. therefore, gene therapy is developed by targeting these two classes of genes. current strategies undergoing development for gene therapy involve restoring tumor suppressor gene function, downregulating oncogene expression, stimulating immune response, introducing genes that either increase drug sensitivity or confer multidrug resistance, and modulating tumor angiogenesis genetically (4-13). in this chapter, principles and methods of suicide gene therapy are reviewed together with the results of its clinical trials. protocols required for application in human studies are discussed in detail by using ad-tk gene therapy for liver tumors as an example. suicide gene therapy consists of the intracellular delivery of a gene encoding an enzyme that can transform a nontoxic prodrug into a toxic substance (14) . these suicide genes are not present in human cells, so a selective delivery of these genes into cancer cells followed by the administration of a prodrug can lead to a conversion of the drug into cytotoxic substances in the transduced cells. the delivery and transcription of a tumor-specific suicide gene in vivo are crucial for suicide gene therapy to be effective. currently, two strategies are being investigated to selectively transduce tumor cells: tumor-specific vectors and control of suicide gene transcription in tissues or tumors with a tumor-specific promoter. promoters such as carcinoembryonic antigen (cea), or α-fetoprotein (afp) for liver tumors, limit gene expression to cea- or afp-positive cells only. the most commonly used suicide genes are cytosine deaminase (cd) and the herpes simplex virus-thymidine kinase (hsv-tk). acyclovir and ganciclovir are used for the treatment of herpes simplex virus (hsv) infections.
these drugs are phosphorylated to the active form by the enzyme thymidine kinase (tk) encoded by part of the hsv genome. the hsv-tk gene is expressed after herpetic viral tk transcription, leading to activation of the drug and cell death. to achieve expression of hsv-tk that is specific to hepatocellular carcinoma (hcc) cells, an afp promoter has been constructed. the promoter ensures that only cells expressing afp are able to transcribe and express the hsv-tk gene. the first hsv-tk suicide gene system was introduced by moolten (15,16) . under the control of an afp promoter, an adenoviral vector was used to bring the hsv-tk gene into various hcc cell lines (5). following ganciclovir (gcv) administration, cell death occurred in hepatoma cells producing afp, leaving non-afp-producing cell lines unaffected. phosphorylated gcv is thought to cause cell death by the inhibition of dna polymerase and by causing chain termination during dna synthesis in dividing cells, which then leads to apoptosis (16). a bystander effect has been described, both in vitro and in vivo, where neighboring hsv-tk-negative cells have died when in contact with hsv-tk-positive cells after gcv treatment (17). this increases the effectiveness of suicide gene therapy, and only a part of a tumor mass needs to be transfected by a suicide gene for tumor destruction. a glucocorticoid-responsive element exists upstream of the afp gene in hepatoma cells, which explains the increase in afp observed after the addition of dexamethasone into the culture medium of such cells (18). it is possible that the potency of the tk/gcv suicide system could be enhanced by the addition of dexamethasone to a hepg2 cell line (19) . a retrovirus vector (lnaf0.3tk) carrying the hsv-tk gene regulated only by a human afp promoter has also been reported to provide gcv-mediated cytotoxicity in high-afp-producing human hepatoma cells, but not in low-afp-producing cells.
the retrovirus has been further improved so that the hsv-tk gene expression is under the control of a human afp enhancer directly linked to its promoter [lnaf0.3(e+)tk]. the vector also sensitized both low-and intermediate-afp-producing hepatoma cells to gcv treatment, and did not affect cell growth in nonhepatoma cells (20) . in animal models, gcv treatment has led to more pronounced growth inhibition in the lnaf0.3(e+)tk infected cells than in the lnaf0.3tk infected cells (20) . these results indicate that an afp enhancer directly linked to its promoter can further enhance tumoricidal activity in gene therapy for hepatocellular carcinoma. most in vivo studies have used subcutaneously grown human hcc tumor xenografts in mice followed by transfection with an afp-hsv-tk gene in an adenoviral vector. tumor selectivity was confirmed by the regression of huh-7 established tumors in athymic mice, whereas normal tissues remained unaffected (7). the bystander effect enabled tumor regression even when only 10% of the tumor mass expressed hsv-tk (17). a similar therapeutic response has been seen in rats with colorectal liver metastases, where following direct intratumoral injections of hsv-tk-producing packaging cells, a 60-fold reduction in tumor mass was noted following gcv treatment when compared with controls (21). kuriyama et al. described cancer gene therapy with the gcv-hsv-tk system that induced efficient antitumor effects and protective immunity in immunocompetent mice, but not in nude mice. this indicates that a t-cell-mediated immune response may be a critical factor for hsv-tk gene therapy to be successful (22) . unfortunately, severe hepatic dysfunction has been described following adenovirus-mediated transfer of the hsv-tk gene and gcv administration in a rat model of colorectal liver metastases (23). hepatic expression of hsv-tk was demonstrated, both in tumor-bearing and in tumor-free liver tissue. 
the hepatic hsv-tk expression provoked severe liver dysfunction and mortality upon gcv administration, and, in addition, normal, nonmitotic tissues were affected by the adenovirus-mediated hsv-tk transfer and subsequent gcv administration (23). cytosine deaminase is a nontoxic gene present in some fungi and bacteria. the gene plays a role in the conversion of cytosine to uracil. cells containing this gene can convert 5-fluorocytosine (5-fc) into the cytotoxic chemotherapeutic reagent 5-fluorouracil (5-fu). the escherichia coli cd gene is currently being used as a suicide gene so that genetically modified cells "commit suicide" in the presence of 5-fc. the cd gene has been used with an afp promoter to kill afp-positive hcc cell lines in the presence of 5-fc (6). a bystander effect occurred irrespective of cell-to-cell contact with transduced cells. on cell lysis, 5-fu is released into the medium and is thus likely to be responsible for the bystander effect, and, indeed, the 5-fu levels in the medium correlated well with the degree of cytotoxicity (24). afp-positive hcc tumors that have been established subcutaneously in vivo have been shown to regress significantly after adenoviral-mediated insertion of the cd gene (with an afp promoter) and subsequent 5-fc administration, and in one study, nontumor tissue was unaffected (6). sung et al. have recently published the result of a phase i clinical trial using intratumoral injection of escalating doses of adenovirus-mediated suicide gene followed by intravenous gcv at a fixed dose in patients with colorectal liver metastases (13). the aim was to assess the safety and maximal tolerated dosage of adv.rsv-tk. the vector was injected into a metastatic tumor in the liver under local anaesthesia and ultrasound guidance. a total of 16 patients were entered into the trial in five dose-level cohorts of adv.rsv-tk, from 1.0 × 10^10 to 1.0 × 10^13 virus particles per patient.
the response rate was assessed by world health organization (who) criteria with follow-up imaging studies. the assessment of toxicity was carried out according to common toxicity criteria v. 2.0 from the national cancer institute (bethesda, md). one patient was withdrawn from the study because of clinical deterioration from disease and died. stable disease (defined as < 25% change in the size of the tumor measured on computed tomography [ct] or magnetic resonance imaging [mri]) was seen in 11 patients. one patient had a biopsy of the injected tumor at 11 wk following treatment that revealed extensive necrosis of the tumor on histology, whereas five others had a biopsy taken at a later date but showed no evidence of necrosis. adv.rsv-tk dna was not detected in any of these six biopsied specimens. low transient toxicities were present in patients, including grade 1 elevations in serum aminotransferase in three patients, grade 2-3 fevers in five patients, grade 3 thrombocytopenia in one patient, and grade 2 leucopenia in three patients. one patient is alive at 40.5 mo, but the remaining 15 died between 0.2 and 36.5 mo (median: 11.3 mo). the authors concluded that adv.rsv-tk could be safely administered by percutaneous intratumoral injection in patients with hepatic metastases at doses up to 1.0 × 10^13 virus particles per patient and could provide the basis for future clinical trials. however, the trial did not demonstrate any tumor response following intratumoral injection of adv.rsv-tk. a phase i clinical trial using a replication-deficient adenovirus to deliver the cd gene to metastatic colonic cancer of the liver has been initiated (12). the patients are being treated with a direct intratumoral injection of the cd vector in combination with oral 5-fc. to safeguard the development of human gene therapy for clinical application, various countries have now established regulatory bodies for gene therapy to ensure safety and benefit for humankind.
in the united states, federal guidelines for research involving recombinant dna molecules were issued in 1976. the guidelines require that institutions establish an institutional biosafety committee to monitor the use of recombinant dna in the laboratory, in micro-organisms, in animals, and in humans. a number of approvals are required before a proposed human clinical gene therapy trial can proceed and begin patient accrual. in the united kingdom, the gene therapy advisory committee (gtac) together with the medicines control agency (mca) are established to evaluate proposals for human gene therapy. the creation of the european medicines evaluation agency (emea) has standardized approaches across european countries. in the united states, guidelines have been drawn up by the recombinant dna advisory committee (rac) of the national institutes of health to facilitate documentation, review, and discussion on human gene therapy. in addition, each of the vector delivery systems used in human gene transfer trials is considered a biologic and requires the filing of an investigational new drug application for each specific vector.
1. to assess the safety of direct intratumoral injection of ad-tk followed by gcv administration.
2. to assess the efficacy of intratumoral injection of ad-tk followed by gcv administration, and to compare this treatment with standard treatment of percutaneous ethanol injection alone in groups of patients with irresectable hepatocellular carcinoma.
3. to study the biological efficacy, including the efficiency and stability of gene transfer by analysis of tumor tissue following therapy. the clinical evidence of antitumor efficacy will also be noted.
4. to seek to identify dose level by injecting ad-tk at differing dose levels in successive cohorts of patients.
live, wild-type (nonrecombinant) adenoviruses have been used clinically as vaccines for the prophylaxis of adenoviral upper respiratory infections, a disease of low morbidity but high incidence. these vaccines, which were at one time given routinely to military recruits, are well tolerated and are considered nononcogenic. recombinant versions of the adenovirus have entered clinical trials both as injections and as oral vaccines (26). thus far, they appear to be without significant toxicity. recently, clinical trials using recombinant, replication-defective adenoviral vectors for gene therapy have been initiated. in these studies, recombinant adenoviral vectors carrying the gene for cystic fibrosis transmembrane conductance regulator (cftr) are given via intra-airway administration to patients with cystic fibrosis. tursz at the gustave roussy institut, paris, initiated a phase i study to evaluate the feasibility, safety, and clinical effects of the intratumor administration of a replication-deficient recombinant adenovirus containing the marker gene encoding the e. coli enzyme β-galactosidase (ad-β-gal) in untreated patients with advanced lung cancer (27). the first dose level was 10^7, the second was 10^8, and the third was 10^9 pfu (plaque-forming units) (three patients per dose level). all patients received concomitant chemotherapy. β-gal expression (x-gal staining) was observed in three out of six tumor biopsies. the microbiological and immunological follow-up of patients who were carriers of wild-type adenovirus before injection. only viral cultures of bronchoalveolar lavage (bal) specimens taken immediately after ad-β-gal injection were positive in all patients. all body-fluid specimens were positive at polymerase chain reaction (pcr) analysis within the first 10 d after injection, as were blood samples drawn 30 min after injection in three patients at the second dose level. bal samples remained positive at 1 mo in two patients and at 3 mo in one patient after ad-β-gal injection.
no antibody (ab) response to β-gal was noted in patients, but four had a significant rise in their antiadenovirus ab titers. all 363 samples (throat and stools) taken by the 54 medical staff before and after injection of patients were negative for wild-type adenovirus and ad-β-gal. sera tests (cf) in 202 staff were also negative for antiadenovirus ab titers. this study shows that a marker gene can be safely transferred into human tumor cells with a recombinant adenoviral vector.
the study seeks to determine the safety, biological efficacy, and effect of the ad-tk-gcv dose in the locoregional gene therapy of primary malignant tumors of the liver. the study design consists of an open-label, nonrandomized, dose-escalation phase i/ii trial. the ad-tk will be administered by direct intratumoral injection under ct scan or ultrasound control. the study will include sampling on one occasion of normal and malignant tissue from the livers of patients following ad-tk-gcv treatment. this will greatly facilitate assessments of clinical safety and biological efficacy, including efficiency and stability of gene transfer. furthermore, sampling of treated tissues will require minimal additional morbidity for study patients. the replication-deficient adenovirus encoding for hsv-tk (ad-tk) will be administered in 10 cm^3 of normal (0.9%) saline directly into the tumor under ultrasound or ct guidance. dose escalation will occur until the maximum tolerated level or dose level 1 × 10^11 pfu is achieved. thereafter, a further 10 patients will receive the maximum tolerated level. the gcv will be administered intravenously at 5 mg/kg/d, twice a day for 14 d. the first dose will be given 7 d after ad-tk administration. this procedure will be used only in patients over the age of 35 yr in whom conventional treatments have failed or are inapplicable.
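the timing of the gcv course described above can be laid out as a simple schedule. the sketch below is only an illustration under stated assumptions: day 0 is taken to be the day of ad-tk injection, the function name `gcv_schedule` and its parameters are hypothetical, and the amount given per infusion is deliberately left out because the protocol text does not make clear whether 5 mg/kg refers to each infusion or to the daily total.

```python
def gcv_schedule(first_day=7, course_days=14, infusions_per_day=2):
    """Return (study_day, infusion_number) pairs for the gcv course.

    Assumption: study day 0 is the day of the ad-tk injection, so the
    first gcv infusion falls on day 7 and the course runs for 14 days.
    """
    return [
        (day, infusion)
        for day in range(first_day, first_day + course_days)
        for infusion in range(1, infusions_per_day + 1)
    ]


schedule = gcv_schedule()
# Number of infusions, then the first and last (day, infusion) pairs.
print(len(schedule), schedule[0], schedule[-1])
```

under these assumptions the course spans study days 7 to 20 inclusive, with two infusions on each day.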
the risks to the patients are the unforeseen effects of expression of the vector within the tumor, the transmission of other biologically active products with the vector construct, and the clinical risks associated with the percutaneous biopsy of a tumor. the risks seem to be negligible, as 27 patients were treated in the united states with 10^13-pfu doses (100 times more than the maximum proposed dose in this study) and showed no serious side effects. the only abnormalities observed at the 10^13-pfu dose level were low-grade fevers and transient elevation in liver function tests. recently, however, a death was reported in a 17-yr-old man in pennsylvania (usa), following the administration of 10^13 pfu adenovirus. the cause of death was reported to be the result of acute respiratory distress syndrome (ards). to the best of our knowledge, the risk appears to be negligible for doses of adenovirus up to 10^11 pfu. the risk of bleeding after percutaneous biopsy of the tumor is less than 1%. a generally accepted mortality rate in standard textbooks is between 0.1% and 0.01% (28,29) .
patients must fulfill all of the following criteria in order to be eligible for study admission: histological diagnosis of primary liver tumor; at least 35 yr and less than 75 yr of age (women of childbearing potential may be included, but must use a reliable and appropriate contraceptive method, not including abstinence, for at least 1 mo before study start, for the duration of the study, and for 3 mo afterward; results of a negative pregnancy test at study start must be available; postmenopausal women must be amenorrheal for at least 12 mo before the study start; men of childbearing potential should practice a barrier method of contraception for the duration of the study); a life expectancy of at least 3 mo; and adequate performance status (karnofsky score ≥ 70%).
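the numerical inclusion criteria listed above lend themselves to a simple screening check. the sketch below is a minimal illustration, not part of the protocol: the `Candidate` record and the function `unmet_inclusion_criteria` are hypothetical names, and only the criteria stated explicitly in the text (histology, age range, life expectancy, karnofsky score) are encoded. contraception, pregnancy testing, laboratory values, and the exclusion criteria are not covered.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical screening record; field names are illustrative only.
    age_years: float
    primary_liver_tumor_on_histology: bool
    life_expectancy_months: float
    karnofsky_percent: int


def unmet_inclusion_criteria(c):
    """Return the inclusion criteria from the text that the candidate fails."""
    unmet = []
    if not c.primary_liver_tumor_on_histology:
        unmet.append("histological diagnosis of primary liver tumor")
    if not (35 <= c.age_years < 75):
        unmet.append("age at least 35 yr and less than 75 yr")
    if c.life_expectancy_months < 3:
        unmet.append("life expectancy of at least 3 mo")
    if c.karnofsky_percent < 70:
        unmet.append("karnofsky score of at least 70%")
    return unmet


# Invented example candidates.
eligible = Candidate(52, True, 6, 80)
too_old = Candidate(75, True, 6, 80)
print(unmet_inclusion_criteria(eligible))
print(unmet_inclusion_criteria(too_old))
```

returning the list of unmet criteria, instead of a single yes/no answer, keeps the reason for a screening failure visible; a candidate is eligible on these points only when the list is empty.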
the required values for initial laboratory data are as follows:
patients with any of the following will be excluded from study admission: pregnant or lactating women; women with either a positive pregnancy test at screen or baseline, or who have not had a pregnancy test; women of childbearing potential who are not using a reliable and appropriate contraceptive method; postmenopausal women who have been amenorrheal for less than 12 mo; uncontrolled serious bacterial, viral, fungal, or parasitic infection; patients who are human immunodeficiency virus (hiv) positive; systemic corticosteroid therapy or other immunosuppressive therapy administered within the last 3 mo; karnofsky score less than 70%; participation in another investigational therapy study within the last 6 wk; any underlying medical condition that, in the principal investigator's opinion, will make participation in the study hazardous or obscure the interpretation of adverse events.
the following must be performed within 2 wk prior to study admission: complete medical history; physical examination; toxicity evaluation; performance status; height and weight and body surface area; laboratory screening (*eligibility criteria) for full blood count with differential, platelet count*, serum electrolytes (sodium, potassium, chloride, bicarbonate), urea, creatinine*, glucose, uric acid, albumin, liver function tests, including total protein, calcium, phosphorus, magnesium, aspartate transaminase (ast*), alanine transaminase (alt*), total bilirubin*, alkaline phosphatase, lactate dehydrogenase (ldh), pt*, partial thromboplastin time (ptt*); urinalysis; α-fetoprotein; electrocardiogram (12-lead); chest x-ray (pa and lateral views); abdomen and pelvis ct or mri scan.
this should contain a title like "gene therapy of tumors of the liver using ad-tk intratumorally followed by ganciclovir administration: a phase i/ii study." include the following sections and text in the leaflet.
you are being invited to take part in a research study because the cancer in your liver, unfortunately, cannot be removed surgically or treated in any other way. the purpose of the study is to find out which of two treatments may be better for treating your type of liver cancer. the first is a gene therapy treatment that comprises two different drugs, ad-tk (a gene therapy product) that will be given by direct injection into the tumor and ganciclovir (a drug that kills certain types of viruses) that will be injected into a vein in your arm. the second treatment is a treatment that is used commonly for liver cancer, which is the injection of ethanol (a type of alcohol) directly into the tumor. this is a randomized study and so you will only receive one of the treatments described above. before you decide whether or not to take part in this study it is important for you to understand why the research is being done and what it will involve. please take time to read the following information carefully and discuss it with friends and relatives if you wish. ask us if there is anything that is not clear or you would like more information. take time to decide whether or not you wish to take part. thank you for reading this. two weeks before you have one of the treatments, the following procedures will need to be undertaken: 1. you will have various blood samples taken from a vein in your arm. 2. you will have a physical examination. 3. you will be asked about your medical history. 4. you will have a chest x-ray and an electrocardiogram (ecg) of your heart. 5. you will have a special scan of your liver to show the doctors where the tumor is in your liver and so they can measure its size. 
if you agree to take part in the study, you will be allocated to one of two treatment groups:
group 1: treatment of your liver tumor with gene therapy (ad-tk) followed by a 2-wk course of ganciclovir
group 2: treatment of your liver tumor with ethanol injection
ad-tk will be given by injecting it directly into the tumor under ultrasound or ct scan control in the x-ray department. one week after the ad-tk injection, you will be given ganciclovir into a vein in your arm, twice a day, for 2 wk. afterward, you will have to rest for a few hours before going home. you will also have one liver biopsy performed during the period of the study to see if the drugs have affected the cancer. blood samples will be taken on each occasion you come to the clinic. on day 60 (month 2) of your treatment schedule, you will undergo a ct scan or an mri scan to measure the size of your tumor. this will help to tell us whether the treatment has been effective.
you will be given an injection of ethanol directly into the tumor in the liver under ultrasound or ct guidance in the x-ray department. afterward, you will have to rest for a few hours before going home. blood samples will be taken on each occasion you come to the clinic. on day 60 (month 2) of your treatment schedule, you will undergo a ct scan or an mri scan to measure the size of your tumor. this will help to tell us whether the treatment has been effective.
if you decide to take part in this study, your doctor would like to see you every 3 months in the clinic to follow your progress. your doctor would like to track your progress after the study has finished and would also like to keep you informed of any new treatment information about the drugs you had while participating in this study. in order for this to happen, you must tell your doctor if you move. if the treatment has made a difference to the tumor, you will be invited to participate in a further study to receive a further course of treatment.
this study will last for about 60 days (2 months), but the doctors would like to continue to see you every 3 months thereafter. there is a small risk of bleeding from the injection site in the liver after treatment with either the gene therapy drug or the ethanol. also, a small bruise and some soreness may be left for a short time at the spot where the doctors inject the drug through the skin. there is a small chance that the liver may bleed after the liver biopsy. if this happens, you will have to stay in hospital until the bleeding settles. normally, there may be slight pain in your arm when blood is taken. a small bruise may be left for a short time at the spot where the blood was taken. as this is a new treatment, the risks associated with it are largely unknown. however, experiments suggest that the likelihood of serious side effects is extremely small. if you decide to participate and you experience a reaction to the drug, the doctors will provide you with every medical support. we do ask that you or your partner take reliable and appropriate contraceptive precautions for 1 month prior to the treatment and for 3 months afterward to prevent a pregnancy. this is because we do not know the effect the drug might have on an unborn child. it is important for you to know that recently a death occurred in the united states following an adenovirus injection given in the same way in which you will receive it. the patient was suffering from metabolic liver disease and was given a dose far higher than any that you will receive in this study. your doctor may take you out of the study if your disease becomes worse, if new relevant scientific developments occur, for administrative reasons, or if your doctor feels that taking part in this study is no longer in your best interest. you may, of course, wish to withdraw yourself from the study. it is not possible to tell if there will be any personal benefit from taking part in this study.
the information obtained may be used scientifically and might be helpful to others. sometimes during the course of a research project, new information becomes available. if this happens, your doctor will tell you or your legally accepted representative about it and discuss with you whether you want to continue in the study. if you decide to withdraw, your care will not be affected. if you decide to continue, you may be asked to sign an updated consent form. also, on receiving new information, your doctor might consider it to be in your best interests to withdraw you from the study. the reasons will be explained to you and your care will continue. if you consent to take part in the research, your medical records may be inspected for the purposes of analyzing the results. your name, however, will not be disclosed outside of the hospital. any information that leaves the hospital will have your name and address removed so that you cannot be recognized from it. you will not be identified in any report or publication that arises from this study. you may choose not to take part in this study. you may choose to have no further tests and receive supportive care only. your doctor will discuss these choices with you. if you have any questions about taking part in this study or your future participation please contact dr. on . thank you for considering helping us with this important trial.
"gene therapy of tumours of the liver using ad-tk intratumourally followed by ganciclovir administration: a phase i/ii study"
name of researcher:
please initial box
□ i confirm that i have read and understand the information sheet for the above study and have had the opportunity to ask questions.
i understand that my participation is voluntary and that i am free to withdraw at any time without giving any reason without my medical care or legal rights being affected.
□ i understand that sections of any of my medical notes may be looked at by responsible individuals or by individuals from regulatory authorities where it is relevant.
□ i give permission for these individuals to have access to my records.
i agree to take part in the above study.
the ad-tk will be given intratumorally under ultrasound or ct scan guidance. gcv will then be administered intravenously twice per day for 14 d, starting at d 7 after ad-tk injection. dosages will be calculated based on functional units of ad-tk. the dose of ganciclovir will be 5 mg/kg/d. the first three patients enrolled in the study will receive ad-tk at dose level 1. if no dose-limiting toxicity (dlt) is observed after a period of at least 7 d of posttreatment monitoring, the next three patients will receive ad-tk at dose level 2, and so on. if dlt is observed, the appropriate dose escalation decision rule will be followed. the dose level will be escalated to the next level according to the following rules.
dose escalation decision rule (number of patients with dlt in the first group of three at the current dose level):
0 out of 3: enter next three patients at the next dose level.
1 out of 3: enter up to three additional patients at current dose level. if none or one out of the three of the second group experiences dlt, then enter three patients at the next dose level. as soon as two of the second group experience dlt, then the mtd has been reached at the previous (lower) level and dose escalation will stop.
2 out of 3: enter up to three additional patients at current dose level. if none experience dlt, then enter three patients at the next dose level. as soon as any patient of the second group experiences dlt, the mtd has been reached at the previous (lower) level and dose escalation will stop.
3 out of 3: the mtd has been reached at the previous (lower) level and dose escalation will stop.
patients randomized for ethanol injection will be given 50 cm³ of ethanol intratumorally under ultrasound or ct scan guidance.
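the dose escalation decision rule above can be read as a small decision function. the following is a minimal sketch only, not part of the protocol: the function name, the argument names, and the three outcome labels are hypothetical, and the sketch assumes that the second group is always scored as a complete group of up to three patients rather than patient by patient.

```python
from typing import Optional

# Hypothetical outcome labels (not protocol terminology).
ESCALATE = "escalate"  # enter three patients at the next dose level
EXPAND = "expand"      # enter up to three additional patients at the current level
STOP = "stop"          # MTD reached at the previous (lower) level; stop escalation


def escalation_decision(first_group_dlt: int,
                        second_group_dlt: Optional[int] = None) -> str:
    """Apply the dose escalation decision rule to DLT counts at one dose level.

    first_group_dlt:  number of patients with DLT among the first three patients.
    second_group_dlt: number with DLT among the additional patients, or None
                      if the second group has not been treated yet.
    """
    if not 0 <= first_group_dlt <= 3:
        raise ValueError("first_group_dlt must be between 0 and 3")
    if second_group_dlt is not None and not 0 <= second_group_dlt <= 3:
        raise ValueError("second_group_dlt must be between 0 and 3")

    if first_group_dlt == 0:
        return ESCALATE
    if first_group_dlt == 3:
        return STOP
    if second_group_dlt is None:
        return EXPAND

    # 1 of 3 in the first group tolerates at most one DLT in the second group;
    # 2 of 3 in the first group tolerates none.
    tolerated = 1 if first_group_dlt == 1 else 0
    return ESCALATE if second_group_dlt <= tolerated else STOP


if __name__ == "__main__":
    for first, second in [(0, None), (1, None), (1, 1), (1, 2), (2, 0), (2, 1), (3, None)]:
        print(first, second, escalation_decision(first, second))
```

in practice the rule stops "as soon as" the tolerated number of dlts in the second group is exceeded, so a real implementation would be called after each patient rather than once per group.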
once a patient has been enrolled in the study she/he will be randomized to receive either ad-tk-gcv or ethanol. patients will be treated in the radiology department or the clinic on an outpatient basis. immediately following ad-tk application, vital signs (body temperature, respiratory rate, heart rate, blood pressure) will be performed every 15 min during the first hour after the injection. these evaluations will be performed on d 1, 15, 30, and 60 and will include clinical evaluations (complete history, physical examination, toxicity evaluation, performance status, height and weight, body surface area), as well as blood tests (cbc with differential, platelet count, serum electrolytes [sodium, potassium, chloride, bicarbonate], bun, creatinine, glucose, uric acid, albumin, total protein, calcium, phosphorus, and magnesium, ast, alt, total bilirubin, alkaline phosphatase, ldh, pt, ptt), and urinalysis. other studies will also be undertaken that will include pharmacokinetics and immune responses. serum will be stored in case of future investigations. patients will undergo percutaneous tru-cut liver biopsy of the liver tumor and normal liver as guided by ultrasound or ct scan. patients will be considered to be actively on study from the time of study admission until poststudy evaluation on d 60. patients will be considered associated with the study after poststudy evaluation and will undergo regular follow-up evaluations thereafter. the following evaluations will be performed on an outpatient basis on day 60 (poststudy evaluation), then every 3 mo for 1 yr, then annually. these evaluations will be the same as in subheading 3.5.1. abdomen and pelvis ct or mri scan will be performed on d 60. if tumor response is noted, the abdominal scan will be repeated. one of the main objectives of this study is to assess the biological efficacy of ad-tk, including efficiency and stability. the molecular and cellular effects of ad-tk treatment on malignant tissue will be assayed. 
malignant tissue from the ad-tk-treated liver will be obtained with tru-cut biopsy. toxicity will be assessed using the national cancer institute criteria. toxicity will be formally evaluated on d 1, 15, 30, 60 and then every 3 mo for 1 yr, then annually. all toxic events will be managed with full and optimal supportive care, including transfer to the intensive care unit (icu) if appropriate. the terms "adverse event," "adverse experience," and "adverse reaction" include any adverse event whether or not it is considered to be drug related. this includes any side effect, injury, toxicity, or sensitivity reaction. an adverse event is considered serious if any of the following occur: it is fatal or life-threatening; it is severely or permanently disabling; it requires new or prolonged inpatient hospitalization; it involves the exacerbation of a congenital anomaly or the development of cancer; it results in an overdose. an adverse event is considered unexpected if it is not identified in nature, severity, or frequency in the current investigator brochure. patients will be instructed to report any adverse event to the investigators. all adverse events occurring during participation in the study will be documented. all adverse events will be reported to both the local ethical committee and the gtac, with a description of the severity, duration, and outcome of the event, and the investigator's opinion regarding the relationship, if any, between the event and the study treatment. the tumor response to either treatment regimen is one of the primary objectives of this study. observations of antitumor activity will be collected and analyzed. standard criteria will be formally employed to classify the antitumor responses observed in patients with measurable disease in the liver. measurable disease will consist of bidimensionally measurable liver lesions with perpendicular diameters of ≥ 1 cm × ≥ 1 cm. 
patients may withdraw or be removed from the study for any of the following reasons:
• patient's request to withdraw
• patient unwilling or unable to comply with study requirements
• clinical need for concomitant or ancillary therapy not permitted in the study
• any unacceptable treatment-related toxicity precluding further participation in the study
• unrelated intercurrent illness that, in the judgment of the principal investigator, will affect assessments of clinical status to a significant degree
a patient removed from the study prior to any of the scheduled response evaluations will not be considered inevaluable for response.
the study will explore the relationships among pharmacokinetic parameters, toxicity, and biological efficacy. the study will explore the relationship between dose of ad-tk and efficiency of transduction (gene transfer). evaluation of clinical efficacy is one of the primary objectives of this study. treatment effect will be estimated as the proportion of patients with an objective response (complete or partial) following ad-tk-gcv as compared with objective response to alcohol therapy. chi-square tests and logistic regression will be used to analyze which variables are significant predictors of response. the kaplan-meier method will be used to estimate progression-free survival.
it is expected that the construct will not spread to other persons. tursz et al. studied 10 lung cancer patients treated with recombinant adenoviral vectors and found no cross-contamination to the medical and nursing staff (27). in the french study, there was no shedding of the virus beyond the third day, and we intend to keep the patients overnight in separate rooms with barrier nursing. we will analyze the urine and sputum of the medical and nursing staff during this period. in order to minimize the risk of cross-infection to offspring, only patients above the age of 40 will be included.
it is unlikely that cancer patients above this age will remain reproductively active. nevertheless, patients will be warned of the risk and will be advised to take contraceptive measures.
for this type of anticancer therapy, viruses need to be rigidly tumor-cell-specific. dl1520, originally produced by barker and berk in 1987 (30), has the ability to target and destroy tumor cells only, which led to it being termed the "smart bomb" cancer virus (31). after viral internalization, intracellular adenoviral replication augments an administered dose to the level required to kill the tumor host cells only, leaving neighboring normal tissues intact. dl1520 is an adenovirus hybrid of serotypes 2 and 5 with a genome deletion in the e1b region, causing loss of expression of viral 55-kda protein (e1b 55k). e1b 55k has been shown to bind to the mammalian tumor suppressor protein p53 and block p53-mediated transcriptional activation (32). p53 has many functions, including arrest of the g1 phase of cell proliferation via the cyclin-dependent kinase inhibitor p21/waf1/cip1, or apoptosis through induction of genes such as bax (33). studies have shown that dl1520 appears to replicate independently of p53 status in many tumor cell lines (33-36). phase i trials of direct intratumoral injection of dl1520 in more than 22 patients with recurrent head and neck cancer with p53 mutations have already shown necrosis in a significant number of tumors, without evidence of damage to normal tissue (37). habib et al. (11) have reported the results of a phase i and a phase ii clinical study, in which patients with primary and secondary liver tumors were treated with e1b 55-kda deleted dl1520. the adenovirus was given via three different routes: intratumoral, intra-arterial, and intravenous. the study has confirmed that dl1520 was well tolerated when given as either monotherapy or in combination with chemotherapy.
furthermore, ultrastructural examination of tissue showed the presence of adenovirus in cell cytoplasm around the nucleus and revealed two dissimilar end points of cell death after virus infection: a preapoptotic sequence and necrosis (11). reid et al. have recently published their results of a phase i study in patients with colorectal liver metastases by using intra-arterial administration of a replication-selective adenovirus (dl1520) (38). in this study, dl1520 was infused into the hepatic artery at doses of 2 × 10^8 to 2 × 10^12 particles for two cycles (d 1 and 8) with subsequent cycles of dl1520 administered in combination with intravenous 5-fu and leucovorin. they have successfully demonstrated intravascular administration of dl1520 virus and have shown that hepatic artery infusion of the attenuated adenovirus dl1520 was well tolerated at doses resulting in infection, replication, and chemotherapy-associated antitumoral activity.
the molecular genetics of cancer
ideal gene therapy: approaches and horizons
tissue-specific growth suppression and chemosensitivity promotion in human hepatocellular carcinoma cells by retroviral-mediated transfer of the wild-type p53 gene
gene therapy for alpha-fetoprotein-producing human hepatoma cells by adenovirus-mediated transfer of the herpes simplex virus
in vivo gene therapy for alpha-fetoprotein producing hepatocellular carcinoma by adenovirus-mediated transfer of cytosine deaminase gene
retrovirus-mediated gene therapy for human hepatocellular carcinoma transplanted in athymic mice
regional versus systemic delivery of recombinant vaccinia virus as suicide gene therapy for murine liver metastases
wild-type p53 induces apoptosis in hep3b through up-regulation of bax expression
preliminary report: the short-term effects of direct p53 dna injection in primary hepatocellular carcinomas
e1b-deleted adenovirus (dl1520) gene therapy for patients with primary and secondary liver tumours
phase i study of direct administration of a replication deficient adenovirus vector containing e. coli cytosine deaminase gene to metastatic colon carcinoma of the liver in association with the oral administration of the pro-drug 5-fluorocytosine
intratumoral adenovirus-mediated suicide gene transfer for hepatic metastases from colorectal adenocarcinoma: results of a phase i clinical trial
retroviral-mediated gene therapy for the treatment of hepatocellular carcinoma: an innovative approach for cancer therapy
tumour chemosensitivity conferred by inserted herpes thymidine kinase genes: paradigm for a prospective cancer control strategy
curability of tumours bearing herpes thymidine kinase genes transferred by retroviral vectors
the "bystander effect"; tumour regression when a fraction of the tumour mass is genetically modified
transcriptional regulation of alpha-fetoprotein expression by dexamethasone in human hepatoma cells
gene therapy for hepatoma cells using a retrovirus vector carrying herpes simplex virus thymidine kinase gene under the control of human alpha-fetoprotein gene promoter
retrovirus-mediated gene therapy for hepatocellular carcinoma: selective and enhanced suicide gene expression regulated by human alpha-fetoprotein enhancer directly linked to its promoter
regression of established macroscopic liver metastases after in situ transduction of a suicide gene
cancer gene therapy with hsv-tk/gcv system depends on t cell mediated immune responses and causes apoptotic death of tumour cells in vivo
severe hepatic dysfunction after adenovirus-mediated transfer of the herpes simplex virus thymidine kinase gene and ganciclovir administration
bystander effect caused by cytosine deaminase gene and 5-fluorocytosine in vitro is substantially mediated by generated 5-fluorouracil
gene therapy of metastatic colon carcinoma: regression of multiple hepatic metastases by adenoviral expression of bacterial cytosine deaminase
a controlled study of adenoviral-vector-mediated gene transfer in the nasal epithelium of patients with cystic fibrosis
phase i study of a recombinant adenovirus-mediated gene transfer in lung cancer patients
complications following percutaneous liver biopsy
liver biopsy: complications in 1000 inpatients and outpatients
adenovirus proteins from both e1b reading frames are required for the transformation of rodent cells by viral infection and dna transfection
progress of the smart bomb cancer virus
replication of onyx-015, a potential anticancer adenovirus, is independent of p53 status in tumour cells
status does not determine outcome of e1b 55-kilodalton mutant adenovirus lytic infection
-independent and -dependent requirements for e1b-55k in adenovirus type 5 replication
selective and nonselective replication of an e1b-deleted adenovirus in hepatocellular carcinoma
onyx-015, an e1b gene-attenuated adenovirus, causes tumour-specific cytolysis and antitumoural efficacy that can be augmented by standard chemotherapeutic agents
phase ii trial of intratumoural infection with an e1b deleted adenovirus in patients with recurrent refractory head and neck cancer
intra-arterial administration of a replication-selective adenovirus (dl1520) in patients with colorectal carcinoma metastatic to the liver: a phase i trial
we are grateful to the pedersen family charitable foundation for supporting our research endeavors.
key: cord-019050-a9datsoo authors: ambrogi, federico; coradini, danila; bassani, niccolò; boracchi, patrizia; biganzoli, elia m.
title: bioinformatics and nanotechnologies: nanomedicine date: 2014 journal: springer handbook of bio-/neuroinformatics doi: 10.1007/978-3-642-30574-0_32 sha: doc_id: 19050 cord_uid: a9datsoo
in this chapter we focus on the bioinformatics strategies for translating genome-wide expression analyses into clinically useful cancer markers with a specific focus on breast cancer with a perspective on new diagnostic device tools coming from the field of nanobiotechnology and the challenges related to high-throughput data integration, analysis, and assessment from multiple sources. great progress in the development of molecular biology techniques has been seen since the discovery of the structure of deoxyribonucleic acid (dna) and the implementation of a polymerase chain reaction (pcr) method. this started a new era of research on the structure of nucleic acid molecules, the development of new analytical tools, and dna-based analyses that allowed the sequencing of the human genome, the completion of which has led to intensified efforts toward comprehensive analysis of mammalian cell structure and metabolism in order to better understand the mechanisms that regulate normal cell behavior and identify the gene alterations responsible for a broad spectrum of human diseases, such as cancer, diabetes, cardiovascular diseases, neurodegenerative disorders, and others.
technical advances such as the development of molecular cloning, sanger sequencing, pcr, oligonucleotide microarrays and, more recently, a variety of so-called next-generation sequencing (ngs) platforms have revolutionized translational research and in particular cancer research. now, scientists can obtain a genome-wide perspective of cancer gene expression useful to discover novel cancer biomarkers for more accurate diagnosis and prognosis, and monitoring of treatment effectiveness. thus, for instance, microrna expression signatures have been shown to provide a more accurate method of classifying cancer subtypes than transcriptome profiling and allow classification of different stages in tumor progression, opening the field of personalized medicine (in which disease detection, diagnosis, and therapy are tailored to each individual's molecular profile) and predictive medicine (in which genetic and molecular information is used to predict disease development, progression, and clinical outcome). however, since these novel tools generate a tremendous amount of data and since the number of laboratories generating microarray data is rapidly growing, new bioinformatics strategies that promote the maximum utilization of such data, as well as methods for integrating gene ontology annotations with microarray data to improve candidate biomarker selection, are necessary.
in particular, the management and analysis of ngs data requires the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. as a paradigmatic example, a major pathology such as breast cancer can be considered. breast cancer is the most common malignancy in women with a cumulative lifetime risk of developing the disease as high as one in every eight women [32.1]. several factors are associated with this cancer such as genetics, lifestyle, menstrual and reproductive history, and long-term treatment with hormones. until now breast cancer has been hypothesized to develop, following a progression model similar to that described for colon cancer [32.2, 3], through a linear histological progression from adenosis, to ductal/lobular hyperplasia, to atypical ductal/lobular hyperplasia, to in situ carcinoma and finally to invasive cancer, corresponding to increasingly worse patient outcome. molecularly, it has been suggested that this process is accompanied by increasing alterations of the genes that encode for tumor suppressor proteins, nuclear transcription factors, cell cycle regulatory proteins, growth factors, and corresponding receptors, which provide a selective advantage for the outgrowth of mammary epithelial cell clones containing such mutations [32.4]. recent advances in genomic technology have improved our understanding of the genetic events that parallel breast cancer development. in particular, dna microarray-based technology, with the simultaneous evaluation of thousands of genes, has provided researchers with an opportunity to perform comprehensive molecular and genetic profiling of breast cancer, to classify it into clinically relevant subtypes, and to attempt to predict the prognosis or the response to treatment [32.5-8].
unfortunately, the initial enthusiasm for the application of such an approach was tempered by the publication of several studies reporting contradictory results when the same samples were analyzed on different microarray platforms, which raised skepticism regarding the reliability and the reproducibility of this technique [32.9, 10]. in fact, despite the great theoretical potential for improving breast cancer management, the actual performance of predictors built using gene expression is not as good as initially published, and the lists of genes obtained from different studies are highly unstable, resulting in disparate signatures with little overlap in their constituent genes. in addition, the biological role of individual genes in a signature, the equivalence of several signatures, and their relation to conventional prognostic factors are still unclear [32.11]. even more incomplete and confusing is the information obtained when molecular genetics was applied to premalignant lesions; indeed, genome analysis revealed an unexpected morphological complexity of breast cancer, very far from the hypothesized multi-step linear process and suggesting instead a series of stochastic genetic events leading to distinct and divergent pathways towards invasive breast cancer [32.12], the complexity of which limits the application of really effective strategies for prevention and early intervention.
therefore, despite the great body of information about breast cancer biology, improving our knowledge about the puzzling bio-molecular features of neoplastic progression is of paramount importance to better identify the series of events that, in addition to genetic changes, are involved in breast tumor initiation and progression and that enable premalignant cells to reach the six biological endpoints that characterize malignant growth (self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of programmed cell death, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis). to do that, instead of studying the single aspects of tumor biology, such as gene mutation or gene expression profiling, we must apply an investigational approach aimed at integrating the different aspects (molecular, cellular, and supracellular) of breast tumorigenesis. at the molecular level, an increasing body of evidence suggests that gene expression alone is not sufficient to explain protein diversity and that epigenetic changes (i.e., heritable changes in gene expression that occur without changes in nucleotide sequences), such as alteration in dna methylation, chromatin structure changes, and dysregulation of microrna expression, may affect normal cells and predispose them to subsequent genetic changes with important repercussions in gene expression, protein synthesis, and ultimately cellular function [32.13-16]. at the cellular level, evidence indicates that to really understand cell behavior, we must also consider the microenvironment in which cells grow, an environment that recent findings indicate to have a relevant role in promoting and sustaining abnormal cell growth and tumorigenesis [32.17].
this picture is further complicated by the concept that, among the heterogeneous cell population that makes up the tumor, approximately 1% of cells, known as tumor-initiating cells, are most likely derived from normal epithelial precursors (stem/precursor cells) and share with them a number of key properties, including the capacity of self-renewal and the ability to proliferate and differentiate [32.18, 19]. when altered in their response to abnormal inputs from the local microenvironment, these stem/precursor cells can give rise to preneoplastic lesions [32.20]. in fact, similarly to bone marrow-derived stem cells, tissue-specific stem cells show remarkable plasticity within the microenvironment: they can enter a state of quiescence for decades (cell dormancy), but can become highly dynamic once activated by specific microenvironment stimuli from the surrounding stroma and are ultimately transformed into tumor-initiating cells [32.21]. the stroma, in which the mammary gland is embedded, is composed of adipocytes, fibroblasts, blood vessels, and an extracellular matrix in which several cytokines and growth factors are present. while none of these cells are themselves malignant, they may acquire an abnormal phenotype and altered function due to their direct or indirect interaction with epithelial stem/precursor cells. acting as an oncogenic agent, the stroma could provoke tumorigenicity in adjacent epithelial cells, leading to the acquisition of genomic changes, to which epigenetic alterations also contribute, that can accumulate over time and provoke silencing of more than 100 pivotal genes encoding proteins involved in tumor suppression, apoptosis, cell cycle regulation, dna repair, and signal transduction [32.22]. under these conditions, epithelial cells and the stroma co-evolve towards a transformed phenotype following a process that has not yet been worked out [32.23, 24].
many of the soluble factors present in the stroma, essential for normal mammary gland development, have been found to be associated with cancer initiation. this is the case for steroid hormones (estradiol and progesterone), which are physiological regulators of breast development and whose dysregulation may result in preneoplastic and neoplastic lesions [32.25-27]. in fact, through their respective receptors in epithelial cells, estrogens and progesterone may induce the synthesis of local factors that, on the one hand, trigger the activation of the stem/precursor cells and, on the other hand, exert a paracrine effect on endothelial cells, which, in response to the vascular endothelial growth factor, trigger neoangiogenesis activation [32.21]. in addition, estrogens have been found to be implicated in the local modifications of tissue homeostasis associated with a chronic inflammation that may promote epithelial transformation due to the continued production of pro-inflammatory factors that favors generation of a pro-growth environment and fosters cancer development [32.28]; alternatively, transformed epithelial cells would enhance activation of fibroblasts through a vicious circle that supports the hypothesis according to which cancer should be considered as a never-healing wound. last but not least, very recent findings in animal models have clearly indicated that an early event occurring in the activation of estrogen-induced mammary carcinogenesis is represented by the altered expression of some oncogenic micrornas (oncomirs), suggesting a functional link between hormone exposure and epigenomic control [32.29]. concerning the forecasted role of new nanobiotechnology applications, disclosing the bio-molecular events contributing to tumor initiation is, therefore, of paramount importance, and to achieve this goal a convergence of advanced biocomputing tools for cancer biomarker discovery and multiplexed nanoprobes for cancer biomarker profiling is crucial.
this is one of the major tasks currently ongoing in medical research, namely the interaction of nanodevices with cells and tissues in vivo and their delivery to disease sites. biomarkers refer to gene, rna, protein, and mirna expression measures that can be correlated with a biological condition or may be important for prognostic or predictive aims with regard to the clinical outcome. the discovery of biomarkers has a long history in translational research. in more recent years, microarrays have generated a great deal of work, promising the discovery of prognostic and predictive biomarkers able to change medicine as it was known until then. since the beginning, the importance of statistical methods in such a context was evident, starting from the seminal paper of golub, which showed the ability of gene expression to classify tumors [32.30]. although bioinformatics is the leading engine referenced in the biomolecular literature, providing informatics tools to handle massive omic data, the computational core is actually represented by biostatistics methodology aiming at extracting useful summary information. biostatistics cornerstones are represented by large sample and likelihood theories, hypothesis testing, experimental design, and exploratory multivariate techniques, summarized in the genomic era according to class comparison, prediction, and discovery. actually, massive omic data and the idea of personalized medicine require statistical theory to be developed according to new requirements. even in the case of multivariate techniques, the problems usually faced using statistical techniques involved orders of magnitude less data than those encountered with high-throughput technologies, a situation that ngs techniques will easily exacerbate. in class comparison studies there is a predefined grouping of the samples and the interest is in evaluating whether the groups express the transcripts of interest differently.
such studies are generally performed using a transcript-by-transcript analysis, performing thousands of statistical tests and then correcting p-values to account for the desired percentages of false positives and negatives. in fact, the multiple comparison problem is the first concern, as traditional methods for family-wise error control are generally too restrictive when accounting for thousands of tests. the false discovery rate (fdr) was a major breakthrough in such a context. the general concepts underlying fdr are outlined later (fig. 32.1). another topic discussed regards the parametric assumptions underlying most of the statistical tests used. permutation tests were extensively developed to face this issue and are now one of the standard tools available to researchers. jeffery and colleagues [32.32] performed a systematic comparison of 9 different methods for identifying genes differentially expressed across experimental groups, finding that different methods gave rise to very different lists of genes and that sample size and noise level strongly affected the predictive performance of the methods chosen for evaluation. also, an evaluation of the accuracy of fold-change compared to ordinary and moderated t statistics was performed by witten and tibshirani [32.33], who discuss the issues of reproducibility and accuracy of gene lists returned by different methods, claiming that a researcher's decision to use fold-change or a modified t-statistic should be based on biological, rather than statistical, considerations. in this sense, the classical limma-like approach [32.34] has become a de facto standard in the analysis of high-throughput data: gene expression and mirna microarrays, proteomics, and serial analysis of gene expression (sage) generate an incredible amount of data which is routinely analyzed element-wise, without considering the multivariate nature of the problem.
akin to this, non-parametric multivariate analysis of variance (manova) techniques have also been suggested to identify differentially expressed genes in the context of microarrays and qrt-pcr [32.35, 36], with the advantage of not making any distributional assumption on expression data and of being able to circumvent the dimensionality issue related to omic data (number of subjects $\ll$ number of genes). a well-known example of a class comparison study was that of van't veer and colleagues [32.37], in which a panel of genes, a signature, was claimed to be predictive of poor outcome at 5 years for breast cancer patients. in this case a group of patients relapsing at 5 years was compared in terms of gene expression to a group of patients not relapsing within 5 years. in class discovery studies no predefined groups are available and the interest is in finding new groupings, usually called bioprofiles, using the available expression measures. the standard statistical method to perform class discovery is cluster analysis, which has expanded greatly due to gene expression studies. it is worth saying that cluster analysis is a powerful yet tricky method that should be applied taking care of outliers, stability of results, number of suspected profiles, and so on. these aspects are very hard to face with thousands of transcripts to be analyzed. even more subtle is the problem of the interpretation of the clusters obtained in terms of disease profiles and the definition of a rule to define the discovered profiles. alternatively, classical multivariate methods, such as principal components analysis (pca), are gaining relevance for visualization of high-dimensional data (fig. 32). the work of perou and colleagues [32.5] is an important example of class discovery by cluster analysis in a major pathology such as breast cancer. in their work, the authors found genes that distinguished between estrogen positive cancers with luminal characteristics and estrogen negative cancers.
among these two subgroups, one had a basal characterization and the other showed patterns of up-regulation for genes linked to the oncogene erb-b2. repeated application of cluster analysis in different case series resulted in very similar groupings. notwithstanding the above-mentioned issues connected to cluster analysis, one of the major breakthroughs of genomic studies was actually believed to be the definition of genomic signatures/profiles by the repeated application of cluster analysis to different case series without the definition of a formal rule for class assignment of new subjects. profiles may then be correlated with clinical outcome, as was done for breast cancer by van't veer and colleagues [32.37]. now, more than 10 years after this study, it is not yet clear what the real contribution of microarray-based gene expression profiling to breast cancer prognosis is. of all the so-called first-generation signatures, only oncotype dx [32.46], a qrt-pcr based analysis of 21 genes, has reached level ii of evidence to support tumor prognosis and has been included in the national comprehensive cancer network guidelines, whereas the remaining signatures have only obtained level iii of evidence so far [32.47]. reasons for this are, among others, a lack of stability in terms of the genes that the lists are composed of and strong time-dependence, i.e., reduced prognostic value after 5 to 20 years of follow-up. another, and more important, issue for prognostic/prediction studies is connected to the design of the study itself. in fact, a prognostic study should be planned by defining a cohort that will be followed over time, while a case control study may be only suggestive of signatures to be considered.

class comparison in genome-wide studies is one of the most common and challenging applications since the advent of microarray technology. the first study on predictive signatures in breast cancer in 2002 [32.6] was mainly a class comparison study.
from the statistical viewpoint one of the first problems evidenced was the large number of statistical tests performed in such an analysis. in particular, the classical control for false positives, emphasizing the specificity of the screening, appeared from the beginning to be too restrictive, with the cost in false negatives being too high. to understand such an issue, let us suppose we have to compare the gene expression in a group of tumor tissues with that of a group of normal tissues. for each gene a statistical test is performed that controls the probability of saying that a gene is different when in fact it is not (false positive, fp). such an error is called type one error and its level is generally called α and fixed at a 5% level. the problem is that a test at α level is performed for each gene. therefore, if the probability of making a mistake (fp) is 0.05, while the probability of not making a mistake is 0.95 (this is the probability of saying the gene is not differentially expressed when it is not, true negative), when performing, say, 1000 independent tests the probability of not making any mistake is $0.95^{1000}$, which is practically 0. accordingly, the probability of at least one fp is practically 1. how can the specificity of the experiment be controlled? a large number of procedures are available; the simplest and best known is the bonferroni correction. let us see how it works. in particular, if n tests are performed at α level, the probability of not having any false positive is $(1 - \alpha)^n$, therefore the probability of making at least one false positive is $1 - (1 - \alpha)^n$, which can be approximated as $n\alpha$ (for small α). the bonferroni correction originates from this. in fact, if the tests are performed at level $\alpha_b = \alpha/n$, then the probability of having at least one false positive among the genes declared differentially expressed is kept at approximately α. this comes, in fact, at the cost of a large number of false negatives.
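the arithmetic above can be checked numerically. the following is only a minimal sketch, assuming independent tests; the function names `familywise_error_rate` and `bonferroni_level` are illustrative and do not come from the chapter.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests,
    each performed at level alpha: 1 - (1 - alpha)^n."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test level alpha_b = alpha / n used by the Bonferroni correction."""
    return alpha / n_tests


alpha, n_tests = 0.05, 1000

# Without correction, at least one false positive is practically certain.
uncorrected = familywise_error_rate(alpha, n_tests)

# With each test run at alpha / n, the family-wise rate stays just below alpha.
corrected = familywise_error_rate(bonferroni_level(alpha, n_tests), n_tests)

print(f"per-test level after Bonferroni: {bonferroni_level(alpha, n_tests):.5f}")
print(f"family-wise error rate, uncorrected: {uncorrected:.6f}")
print(f"family-wise error rate, Bonferroni:  {corrected:.6f}")
```

with 1000 tests the per-test level drops to 0.00005, which illustrates why so few genes pass the screening and why the price is paid in false negatives.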
in genomic experiments, when thousands of tests are performed, the bonferroni significance level is so low that very few genes can pass the first screening, probably paying too high a cost in terms of genes declared not significantly differentially expressed when actually they have a differential expression. the balance between specificity and sensitivity is a fairly old issue in screening problems, which is exacerbated with high-throughput data analysis. one of the most common approaches applied in such a context is the proposal of benjamini and hochberg [32.50], called the false discovery rate (fdr), which tries to control the proportion of false positives among the genes declared significant. to better understand the novelty of fdr, let us suppose we have m genes to be considered in the high-throughput experiment; n of the m genes are truly not differentially expressed while p are. performing the appropriate statistical test, ns of the m genes are declared not different between the groups under comparison while s are declared significantly different (fig. 32.1). the type one error rate α (fpr) controls the number of fp with respect to n, while using the bonferroni correction the probability that fp is greater than 0 is controlled. the fdr changes perspective and considers the columns of the table instead of the rows. fdr controls the number of fp with respect to s. if, for example, 10 genes are declared differentially expressed with an fdr of 20%, it is expected that 2 of them are false positives. this may allow greater flexibility in managing the screening phase of the analysis (see fig. 32.3 for a graphical representation of results from a class comparison microarray study, with an application of fdr concepts). the problem first solved by benjamini and hochberg was basically how to estimate fdr, and different proposals have appeared since then, for example the q-value of storey [32.51].
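as an illustration of the fdr idea, the step-up procedure usually attributed to benjamini and hochberg can be sketched as follows. this is a hedged sketch, not the chapter's own implementation: the function name `benjamini_hochberg` and the ten p-values are made up for illustration, and the tests are assumed independent.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up procedure: sort the m p-values, find the largest rank k such
    that p_(k) <= (k / m) * q, and reject the k smallest p-values.
    Returns one boolean per input p-value (True = declared significant)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # keep the largest rank satisfying the bound
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Illustrative p-values only (not taken from any study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

fdr_calls = benjamini_hochberg(p_values, q=0.05)
bonferroni_calls = [p <= 0.05 / len(p_values) for p in p_values]

print("declared significant (fdr):       ", sum(fdr_calls))
print("declared significant (bonferroni):", sum(bonferroni_calls))
```

on these illustrative values the fdr procedure declares two genes significant while the bonferroni threshold of 0.005 keeps only one, which is the greater flexibility referred to above.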
fig. 32.3 volcano plot of differential gene expression pattern between experimental groups. on the x-axis the least squares (ls) means (i.e., difference of mean expression on $\log_2$ scale between experimental groups) and on the y-axis the $-\log_{10}$ transformed p-values corrected for multiplicity using the fdr method from benjamini et al. [32.50] are reported. the horizontal red line corresponds to a cut-off for the significance level α at 0.05. points above this threshold represent genes which are actually differentially expressed between experimental groups, and that are to be further investigated

in general, omic and multiplexed diagnostic technologies, with the ability to produce vast amounts of biomolecular data, have vastly outstripped our ability to sensibly deal with this data deluge and extract useful and meaningful information for decision making. the producers of novel biomarkers assume that an integrated bioinformatics and biostatistics infrastructure exists to support the development and evaluation of multiple assays and their translation to the clinic. actually, the best scientific practice for the use of high-throughput data is still to be developed. in this perspective, the existence of advanced computational technologies for bioinformatics is irrelevant along the translational research process unless supporting biostatistical evaluation infrastructures exist to take advantage of developments in any technology. in this sense, a key problem is the fragmentation of quantitative research efforts. the analysis of high dimensional data is mainly conducted by researchers with limited biostatistical experience using standard software without knowledge of the underlying statistical principles of the methodology, thus exposing the results to a wide uncertainty not only due to sample size limitations.
moreover, so far, a large amount of biostatistical methods and software tools supporting bioinformatics analysis of genomic/proteomic data has been provided but reference standardized analysis procedures coping with suitable preprocessing and quality control approaches on raw data coming from omic and multiplex assays are still waiting for development. formal initiatives for the integration of biostatistical research groups with functional genomics and proteomics labs are one of the major challenges in this context. in fact, besides the development of innovative biostatistics and bioinformatics tools, a major key of success lies in the ability to integrate different competencies. such an integration cannot be simply demanded for the development of software, such as the arraytrack initiative, but needs to develop integrated skills assisted by a software platform able to outline the analysis plan. in such a context, different strategies can be adopted from open software, such as r and bioconductor, to commercial ones such as sas/jmp genomics. in a functional dynamic perspective, to the characterization of the bio-profiles of cancer affected patients, is added the complexity related to the prolonged followup of patients with the necessity of the registration of the event-history of possible adverse events (local recurrence and/or metastasis) before death, that may offer useful insight into disease dynamics to identify a subset of patients with worse prognosis and better response to the therapy. this makes it necessary to develop strategies for the integration of clinical and follow-up information with those deriving from genetic and molecular characterizations. the evaluation and benchmarking of new analytical processes for the discovery, development, and clinical validation of new diagnostic/prognostic biomarkers is an extremely important problem especially in a fast growing area such as translational research based on functional genomics/proteomics. 
in fact, the presentation of overoptimistic results based on the unsuited application of biostatistical procedures can mask the true performance of new biomarkers/bioprofiles and create false expectations about their effectiveness. guidelines for omic and cross-omic studies should be defined through the integration of different competencies coming from clinical-translational, bioinformatics, and biostatistics research. this integrated contribution from multidisciplinary research teams will have a major impact on the development of standard procedures that will standardize the results and make research more consistent and accurate according to relevant bioanalytical and clinical targets.

microarray studies have provided insight on global gene expression in cells and tissues, with the expectation of improving prognostic assessment. the identification of genes whose expression levels are associated with recurrence might also help to better discriminate those subjects who are likely to respond to the various tailored systemic treatments. however, microarray experiments raised several questions to the statistical community about the design of the experiments, data acquisition and normalization, and supervised and unsupervised analysis. all these issues are burdened by the fact that typically the number of genes being investigated far exceeds the number of patients. it is well-recognized that too large a number of predictor variables affects the performance of classification models: bellman coined the term curse of dimensionality [32.52], referring to the fact that in the absence of simplifying assumptions, the sample size needed to estimate a function of several variables to a given degree of accuracy (i.e., to get a reasonably low-variance estimate) grows exponentially with the number of variables. to avoid this problem, feature selection and extraction play a crucial role in microarray analysis.
this has led several researchers to find it judicious to filter out genes that do not change their expression level significantly, reducing the complexity of the data and improving the signal to noise ratio. however, the adopted measure of significance in filtering (the implicitly controlled error measure) is often not easy to interpret in terms of the simultaneous testing of thousands of genes. moreover, gene expressions are usually filtered on a per-gene basis, seldom taking into account the correlation between different gene expressions. this filtering approach is commonly used in most current high-throughput experiments whose main objective is to detect differentially expressed genes (active genes) and, therefore, to generate hypotheses rather than to confirm them. all these methods, based on a measure of significance, select genes from a supervised perspective, i.e., accounting for the outcome of interest (the subject status). however, an unsupervised approach might be useful in order to reveal the pattern of associations among different genes, making it possible to single out redundant information.

fig. 32.4 the figure shows the data analysis pipeline developed in many papers dealing with expressions from high throughput experiments

integration and standardization of approaches for assessment of diagnostic and prognostic performance is a key issue. many of the clinical and translational research groups have chosen different approaches for biodata modeling, tailored to specific types of medical data. however, very few proper benchmarking studies of algorithm classes have been performed worldwide and fewer examples of best practice guidelines have been produced. similarly, few studies have closely examined the criteria under which medical decisions are made. the integrating aspects of this theme relate to methods and approaches for inference, diagnosis, prognosis, and general decision making in the presence of heterogeneous and uncertain data.
a further priority is to ensure that research in biomarker analysis is designed and informed from the outset to integrate well with clinical practice (to facilitate widespread clinical acceptance) and that it exploits cross-over between methods and knowledge from different areas (to avoid duplication of efforts to facilitate rapid adoption of good practice in the development of this healthcare technology). reference problems are related to the assessment of improved diagnostic and prognostic tools in the clinical setting, resorting to observational and experimental clinical studies from phase i to phase iv and the integration with studies on therapy efficacy which would involve bioprofile and biopattern analysis. in this perspective, the integration of different omic data is a well-known issue that is receiving increasing attention in biomedical research [32.53, 54], and which questions the capability of researchers to make sense out of a huge amount of data with very different features. since this integration can not only be seen as an it problem, proper biostatistical approaches need to be taken into account that consider the multivariate nature of the problem in the light of exploiting the maximum prior information about the biological patterns underlying the clinical problem. a critical review of microarray studies was performed earlier in a paper by dupuy and simon [32.10], in which a thorough analysis of the major limitations and pitfalls of 90 microarray studies published in 2004 concerning cancer outcome was done (see fig. 32 .4 for a general pipeline for high-throughput experiments). integrated into this review was the attempt to write guidelines for statistical analysis and reporting of gene expression microarray studies. starting from this work, it will be possible to extend the outlined criticisms to a wider range of omic studies, in order to produce updated guidelines useful for biomolecular researchers. 
in the perspective of integrating omic data coming from different technologies, a comparison of microarray data with ngs platforms will be a relevant point [32.55-57]. due to the lack of sufficiently standardized procedures for processing and analyzing ngs data, much attention will be given to the process of data generation and of quality control evaluation. such an integration is crucial because, though the capabilities of ngs platforms mostly outperform those of microarrays, protocols of management and data analysis are typically very time-consuming, thus making them impractical for in-depth analysis of large samples. of note, one of the ultimate goals of biomedical research is to connect diseases to genes that specify their clinical features and to drugs capable of treating them. dna microarrays have been used for investigating genome-wide expression of common diseases, producing a multitude of gene signatures predicting survival, whose accuracy, reproducibility, and clinical relevance have, however, been debated [32.48, 49, 58, 59]. moreover, the regulatory relationships between the signature genes have rarely been investigated, largely limiting their biological understanding. the genes, indeed, never act independently from each other. rather, they form functional connections that coordinate their activity. hence, it is fundamental that in each cell in every life stage, regulatory events take place in order to keep the healthy steady state. any perturbation of a gene network, in fact, has a dramatic effect on our life, leading to disease and even death.

the prefix nano is from the greek word meaning dwarf. nanotechnology refers to the science of materials whose functional organization is on the nanometer scale, that is $10^{-9}$ m. starting from ideas originating in physics in the 1960s and boosted by the need for miniaturization (i.e., speed) in the electronic industry, the field has grown rapidly.
today, nanotechnology is gaining an important place in the medicine of the future. in particular, by using the patho-physiological conditions of diseased and inflamed tissues it is possible to target nanoparticles and with them drugs, genes, and diagnostic tools. moreover, the spatial and/or temporal contiguity of data from ngs and nanobiotech diagnostic approaches imposes the adoption of methods related to signal analysis which are still to be introduced in standard software, being related to statistical functional data analysis methods. therefore, the extension of the multivariate statistical methodologies adopted so far is requested in a functional data context; a problem that has already been met in the analysis of mass spectrometry data from proteomic analyses. nanotechnology-based platforms for the high-throughput, multiplexed detection of proteins and nucleic acids actually promise to bring substantial advances in molecular diagnostics. forecasted applications of nano-diagnostic devices are related to the assessment of the dynamics of cell processes for a deeper knowledge of the ongoing etio-pathological process at the organ, tissue, and even single cell level. ngs is a growing revolution in genomic nanobiotechnologies that parallelized the assay process, integrating reactions at the micro or nano scale on chip surfaces, producing thousands or millions of sequences at once. these technologies are intended to lower the costs of nucleic acid sequencing far beyond that possible with earlier methods. concerning cancer, a key issue is related to the improvement of early detection and prevention through the understanding of the cellular and molecular pathways of carcinogenesis. in such a way it would be possible to identify the conditions that are precursors of cancer before the start of the pathological process, unraveling its molecular origins.
this should represent the next frontiers of bioprofiling to allow the strict monitoring and possible reversal of the neoplastic transformation through personalized preventive strategies. advances in nanobiotechnology enables the visualization of changes in tissues and physiological process with a subcellular real-time spatial resolution. this is a revolution that can be compared to the daguerreotype pictures from current high-throughput multiplex approaches to the digital high resolution of next generation diagnostic devices. enormous challenges remain in managing and analyzing the large amounts of data produced. such evolution is expected to have a strong impact in terms of personalized medical prevention and treatment with considerable effects on society. therefore, the success will be strongly related to the capability of integrating data from multiple sources in a robust and sustainable research perspective, which could enhance the transfer of high-throughput molecular results to novel diagnostic and therapy application. the new framework of nanobiotechnology approaches in biomedical decision support according to improved clinical investigation and diagnostic tools is emerging. there is a general need for guidelines for biostatistics and bioinformatics practice in the clinical translation and evaluation of new biomarkers from cross-omic studies based on hybridization, ngs, and high-throughput multiplexed nanobiotechnology assays. specifically, the major topics concern: bioprofile discovery, outcome analysis in the presence of complex follow-up data, assessment of diagnostic, and prognostic values of new biomarkers/bioprofiles. current molecular diagnostic technologies are not conceived to manage biological heterogeneity in tissue samples, in part because they require homogeneous preparation, leading to a loss of valuable spatial information regarding the cellular environment and tissue morphology. 
the development of nanotechnology has provided new opportunities for integrating morphological and molecular information and for the study of the association between observed molecular and cellular changes with clinical-epidemiological data. concerning specific approaches, bioconjugated quantum dots (qds) [32.60-63] have been used to quantify multiple biomarkers in intact cancer cells and tissue specimens, allowing the integration of traditional histopathology versus molecular profiles for the same tissue [32.64-69]. current interest is focused on the development of nanoparticles with one or multiple functionalities. for example, binary nanoparticles with two functionalities have been developed for molecular imaging and targeted therapy. bioconjugated qds, which have both targeting and imaging functions, can be used for targeted tumor imaging and for molecular profiling applications. nanoparticle material properties can be exploited to elicit clinical advantage for many applications, such as for medical imaging and diagnostic procedures. iron oxide constructs and colloidal gold nanoparticles can provide enhanced contrast for magnetic resonance imaging (mri) and computed tomography (ct) imaging, respectively [32.70, 71]. qds provide a plausible solution to the problems of optical in vivo imaging due to the tunable emission spectra in the near-infrared region, where light can easily penetrate through the body without harm and their inherent ability to resist bleaching [32.72]. for ultrasound imaging, contrast relies on impedance mismatch presented by materials that are more rigid or flexible than the surrounding tissue, such as metals, ceramics, or microbubbles [32.73]. continued advancements of these nano-based contrast agents will allow clinicians to image the tumor environment with enhanced resolution for a deeper understanding of disease progression and tumor location.
additional nanotechnologically-based detection and therapeutic devices have been made possible using photolithography and nucleic acid chemistry [32. highly sensitive biosensors that recognize genetic alterations or detect molecular biomarkers at extremely low concentration levels are crucial for the early detection of diseases and for early stage prognosis and therapy response. nanowires have been used to detect several biomolecular targets such as dna and proteins [32.82, 87]. the identification of dna alterations is crucial to better understand the mechanism of a disease such as cancer and to detect potential genomic markers for diagnosis and prognosis. other studies have reported the development of a three-dimensional gold nanowire platform for the detection of mrna with enhanced sensitivity from cellular and clinical samples. highly sensitive electrochemical sensing systems use peptide nucleic acid probes to directly detect specific mrna molecules without pcr amplification steps [32. cantilever nanosensors have also been used to detect minute amount of protein biomarkers. label-free resonant microcantilever systems have been developed to detect the ng/ml level of alpha-fetoprotein, a potential marker of hepatocarcinoma, providing an opportunity for early disease diagnosis and prognosis [32.95]. nanofabricated and functionalized devices such as nanowires and nanocantilevers are fast, multiplexed, and label-free methods that provide extraordinary potential for the future of personalized medicine. the combination of data from multiple imaging techniques offers many advantages over data collected from a single modality. potential advantages include: improved sensitivity and specificity of disease detection and monitoring, smarter therapy selection based on larger data sets, and faster assessment of treatment efficacy. the successful combination of imaging modalities, however, will be difficult to achieve with multiple contrast agents. 
multimodal contrast agents stand to fill this niche by providing spatial, temporal, and/or functional information that corresponds with anatomic features of interest. there is also great interest in the design of multifunctional nanoparticles, such as those that combine contrast and therapeutic agents. the integration of diagnostics and therapeutics, known as theranostics, is attractive because it allows the imaging of therapeutic delivery, as well as follow-up studies to assess treatment efficacy. finally, a key direction of research is the optimization of biomarker panels via principled biostatistics approaches for the quantitative analysis of molecular profiles for clinical outcome and treatment response prediction. the key issues that will need to be addressed are: (i) a panel of tumor markers will allow more accurate statistical modeling of the disease behavior than relying on single tumor markers; and (ii) the combination of tumor gene expression data and molecular information of the cancer microenvironment is necessary to define aggressive phenotypes of cancer, as well as for determining the response of early stage disease to treatment (chemotherapy, radiation, or surgery). currently, the major tasks in biomedical nanotechnology are (i) to understand how nanoparticles interact with blood, cells, and organs under in vivo physiological conditions and (ii) to overcome one of their inherent limitations, that is, their delivery to diseased sites or organs [32.96-98]. another major challenge is to generate critical studies that can clearly link biomarkers with disease behaviors, such as the rate of tumor progression and different responses to surgery, radiation or drug therapy [32.99]. the current challenge is, therefore, related to the advancement of biostatistics and biocomputing techniques for the analysis of novel high-throughput biomarkers coming from nanotechnology applications. 
current applications involve high-throughput analysis of gene expression data and multiplexed molecular profiling of intact cells and tissue specimens. the advent of fast and low-cost high-throughput diagnostic devices based on ngs approaches appears to be of critical relevance for improving technology transfer to disease prevention and clinical strategies. the development of nanomaterials and nanodevices offers new opportunities to improve molecular diagnosis, increasing our ability to discover and identify minute alterations in dna, rna, proteins, or other biomolecules. the higher sensitivity and selectivity of nanotechnology-based detection methods will permit the recognition of trace amounts of biomarkers, which will open extraordinary opportunities for systems biology analysis and integration to achieve effective early detection of diseases and improved therapeutic outcomes, hence paving the way to individualized medicine. effective personalized medicine depends on the integration of biotechnology, nanotechnology, and informatics. bioinformatics and nanobioinformatics are cohesive forces that will bind these technologies together. nanobioinformatics represents the application of information science and technology for the purpose of research, design, modeling, simulation, communication, collaboration, and development of nano-enabled products for the benefit of mankind. within this framework, a critical role is played by evaluation and benchmarking according to a robust health technology assessment approach; moreover, the development of enhanced data analysis approaches for the integration of multimodal molecular and clinical data should be based on up-to-date and validated biostatistical approaches.
therefore, in the developing nanobiotechnology era, biostatistical support to bioinformatics is essential to avoid repeating the wasted money and suboptimal development of biomarkers and diagnostic disease signatures seen in the past, which were assessed in a limited way from a strict business perspective rather than with regard to social sustainability. concerning the relevance and impact for national health systems, it is forecast that current omic approaches based on nanobiotechnology will contribute to the identification of next-generation diagnostic tests. these tests could focus on primary to advanced disease prevention through the early diagnosis of genetic risk patterns, or of the onset or natural history of the pathological process of multifactorial chronic disease, through the multiplexed assessment of both direct and indirect, intrinsic genetic or environmental causal factors. a further benefit of such a development would be a reduction in the cost of the diagnostic process, since nanobiotechnological approaches seem best suited to point-of-care (poc) diagnostic facilities, which could be disseminated across large territories alongside a reduced number of clinical centers of excellence with reference diagnostic protocols. nanomaterials are providing the small, disposable lab-on-chip tests that are leading this new approach to healthcare. a variety of factors are provoking calls for changes in how diagnosis is managed. the lack of infrastructure in the developing world adds to the inefficiency and cost of many diagnostic procedures done in central labs rather than by a local doctor. in the developed world, an increasingly elderly population is going to exacerbate demand on healthcare, and any time-saving solutions will help deal with this new trend. poc devices aim to reduce the dependence on lab tests and make diagnosis easier, cheaper, and more accessible for countries lacking healthcare infrastructure.
a key role in the overall framework will be played by data analysis under principled biostatistical approaches, to develop suitable guidelines for data quality analysis, the subsequent extraction of relevant information, and the communication of results in an ethical and sustainable perspective for the individual and society. the proper, safe and secure management of personalized data within a robust and shared bioethical reference framework is, indeed, expected to reduce the social costs of inappropriate medicalization through renewed preventive strategies. a strong, biostatistically based health technology assessment phase will be essential to avoid the forecast drawbacks of introducing such a revolution in prevention and medicine. to be relevant for national health services, research on biostatistics and bioinformatics applied to nanobiotechnology should exploit its transversal role across multiple applied translational research projects on biomarker discovery, development, and clinical validation, up to their release for routine diagnostic/prognostic application. objectives that would enable an accelerated framework for translational research through the involvement of quantitative support are listed here:
• technological platforms for developments in the fields of new diagnostic, preventive and therapeutic tools. in the context of preventing and treating diseases, the objectives are to foster academic and industrial collaboration through technological platforms where multidisciplinary approaches using cutting-edge technologies arising from genomic research may contribute to better healthcare and cost reduction through more precise diagnosis, individualized treatment, and more efficient development pathways for new drugs and therapies (such as the selection of new drug candidates), and other novel products of the new technologies.
• patentable products: customized array and multiplex design with internal and external controls for optimized normalization. validation by double-checked expression results for genes or proteins in the customized array and multiplex assays. patenting of validated tailor-made cdna/proteomic arrays that encapsulate gene/protein signatures related to the response to therapy with optimized cost/effectiveness properties.
a robust, multidisciplinary quantitative assessment framework in translational research is a global need, which should characterize any specific laboratory and clinical translation project. however, the quantitative assessment phase is rarely based on efficient cooperation between biologists, biotechnologists, and clinicians on the one hand and biostatisticians with relevant skills in this field on the other. this represents a major limitation to the rapid transferability of basic research results to healthcare. this problem has been solved in pharmacology, from the research and development of new drugs through to their assessment in clinical trials, whereas for diagnostic/prognostic biomarkers this framework is still to be fully defined. this gap wastes resources and leads to malpractice in the use of biomarkers and related bioprofiles for clinical decision making in critical phases of major chronic and acute diseases such as cancer and cardiovascular pathologies.
cancer statistics premalignant and in situ breast disease: biology and clinical implications genetic alteration during colorectal tumor development the hallmarks of cancer molecular portraits of human breast tumours bernards: a gene-expression signature as a predictor of survival in breast cancer foekens: gene expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer 9 maqc consortium: the microarray quality control (maqc) project shows inter-and intraplatform reproducibility of gene expression measurements critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting highthroughput genomic technology in research and clinical management of breast cancer. exploiting the potential of gene expression profiling: is it ready for the clinic? ductal epithelial proliferations of the breast: a biologic continuum? comparative genomic hybridization and highmolecular-weight cytokeratin expression patterns baylin: gene silencing in cancer in association with promoter hypermethylation dna methylation and histone modification regulate silencing of epithelial cell adhesion molecule for tumor invasion and progression histone modifications in transcriptional regulation croce: mi-crorna gene expression deregulation in human breast cancer putting tumours in context prospective identification of tumorigenic breast cancer cells daidone: isolation and in vitro propagation of tumorigenic breast cancer cells with stem/progenitor cell properties stem cells, cancer, and cancer stem cells hare: tumour-stromal interactions in breast cancer: the role of stroma in tumourigenesis know thy neighbor: stromal cells can contribute oncogenic signals tarin: tumor-stromal interactions reciprocally modulate gene expression patterns during carcinogenesis and metastasis fusenig: friends or foes -bipolar effects of the tumour stroma in cancer estrogen carcinogenesis in breast cancer effects of oestrogen on gene 
expression in epithelium and stroma of normal human breast tissue the role of estrogen in the initiation of breast cancer inflammation and cancer estrogen-induced rat breast carcinogenesis is characterized by alterations in dna methylation, histone modifications and aberrant microrna expression molecular classification of cancer: class discovery and class prediction by gene expression monitoring multiple comparisons: bonferroni corrections and false discovery rates culhane: comparison and evaluation of methods for generating differentially expressed gene lists from microarray data a comparison of foldchange and the t-statistic for microarray data analysis linear models and empirical bayes methods for assessing differential expression in microarray experiments robustified manova with applications in detecting differentially expressed genes from oligonucleotide arrays non-parametric manova methods for detecting differentially expressed genes in real-time rt-pcr experiments gene expression profiling predicts clinical outcome of breast cancer using biplots to interpret gene expression in plants botstein: singular value decomposition for genome-wide expression data processing and modeling epithelialto-mesenchymal transition, cell polarity and stemness-associated features in malignant pleural mesothelioma use of biplots and partial least squares regression in microarray data analysis for assessing association between genes involved in different biological pathways statistical issues in the analysis of chip-seq and rna-seq data biganzoli: data mining in cancer research a gene expression database for the molecular pharmacology of cancer systematic variation in gene expression patterns in human cancer cell lines wolmark: a multigene assay to predict recurrence of tamoxifen-treaten, node-negative breast cancer reis-filho: microarrays in the 2010s: the contribution of microarray-based gene expression profiling to breast cancer classification, prognostication and prediction 
boracchi: prediction of cancer outcome with microarrays hill: prediction of cancer outcome with microarrays: a multiple random validation strategy hochberg: controlling the false discovery rate: a practical and powerful approach to multiple testing a direct approach to false discovery rates adaptive control processes: a guided tour the challenges of integrating multi-omic data sets searls: data integration: challenges for drug discovery comparing microarrays and next-generation sequencing technologies for microbial ecology research rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays microrna expression profiling reveals mirna families regulating specific biological pathways in mouse frontal cortex and hippocampus ioannidis: predictive ability of dna microarrays for cancer outcomes and correlates: an empirical assessment gene expression profiling: does it add predictive accuracy to clinical characteristics in cancer prognosis? quantum dot bioconjugates for ultrasensitive nonisotopic detection the use of nanocrystals in biological detection quantum dots for live cells, in vivo imaging, and diagnostics in-vivo molecular and cellular imaging with quantum dots molecular profiling of single cells and tissue specimens with quantum dots molecular profiling of single cancer cells and clinical tissue specimens with semiconductor quantum dots bioconjugated quantum dots for multiplexed and quantitative immunohistochemistry in situ molecular profiling of breast cancer biomarkers with multicolor quantum dots high throughput quantification of protein expression of cancer antigens in tissue microarray using quantum dot nanocrystals emerging use of nanoparticles in diagnosis and treatment of breast cancer superparamagnetic iron oxide contrast agents: physicochemical characteristics and applications in mr imaging colloidal gold nanoparticles as a blood-pool contrast agent for x-ray computed tomography in mice quantum-dot nanocrystals for 
ultrasensitive biological labeling and multicolor optical encoding contrast ultrasound molecular imaging: harnessing the power of bubbles light-directed spatially addressable parallel chemical synthesis bio-barcode-based dna detection with pcr-like sensitivity microchips as controlled drug delivery devices ingber: soft lithography in biology and biochemistry small-scale systems for in vivo drug delivery real-time monitoring of enzyme activity in a mesoporous silicon double layer nanotechnologies for biomolecular detection and medical diagnostics nanosystems biology nanowire nanosensors for highly sensitive and selective detetction of biological and chemical species thundat: cantilever-based optical deflection assay for discrimination of dna singlenucleotide mismatches bioassay of prostatespecific antigen (psa) using microcantilevers micro-and nanocantilever devices and systems for biomolecule detection viral-induced self-assembly of magnetic nanoparticles allows the detection of viral particles in biological media high-content analysis of cancer genome dna alterations label-free biosensing of a gene mutation using a silicon nanowire field-effect transistor solit: therapeutic strategies for targeting braf in human cancer electrical detection of vegfs for cancer diagnoses using anti-vascular endothelial growth factor aptamer-modified si nanowire fets single conducting polymer nanowire chemiresistive label-free immunosensor for cancer biomarker label-free, electrical detection of the sars virus n-protein with nanowire biosensors utilizing antibody mimics as capture probes nanogram per milliliter-level immunologic detection of alpha-fetoprotein with integrated rotating-resonance microcantilevers for early-stage diagnosis of heptocellular carcinoma transport of molecules, particles, and cells in solid tumors delivery of molecular and cellular medicine to solid tumors the next frontier in molecular medicine: delivery of therapeutics biomedical nanotechnology with 
bioinformatics -the promise and current progress key: cord-001060-9g8rwsm1 authors: arruebo, manuel; vilaboa, nuria; sáez-gutierrez, berta; lambea, julio; tres, alejandro; valladares, mónica; gonzález-fernández, áfrica title: assessment of the evolution of cancer treatment therapies date: 2011-08-12 journal: cancers (basel) doi: 10.3390/cancers3033279 sha: doc_id: 1060 cord_uid: 9g8rwsm1 cancer therapy has been characterized throughout history by ups and downs, not only due to the ineffectiveness of treatments and side effects, but also by hope and the reality of complete remission and cure in many cases. within the therapeutic arsenal, alongside surgery in the case of solid tumors, are the antitumor drugs and radiation that have been the treatment of choice in some instances. in recent years, immunotherapy has become an important therapeutic alternative, and is now the first choice in many cases. nanotechnology has recently arrived on the scene, offering nanostructures as new therapeutic alternatives for controlled drug delivery, for combining imaging and treatment, applying hyperthermia, and providing directed target therapy, among others. these therapies can be applied either alone or in combination with other components (antibodies, peptides, folic acid, etc.). in addition, gene therapy is also offering promising new methods for treatment. here, we present a review of the evolution of cancer treatments, starting with chemotherapy, surgery, radiation and immunotherapy, and moving on to the most promising cutting-edge therapies (gene therapy and nanomedicine). we offer an historical point of view that covers the arrival of these therapies to clinical practice and the market, and the promises and challenges they present. chemotherapy, surgery and radiotherapy are the most common types of cancer treatments available nowadays. the history of chemotherapy began in the early 20th century, but its use in treating cancer began in the 1930s. 
the term "chemotherapy" was coined by the german scientist paul ehrlich, who had a particular interest in alkylating agents and who came up with the term to describe the chemical treatment of disease. during the first and second world wars, it was noticed that soldiers exposed to mustard gas experienced decreased levels of leukocytes. this led to the use of nitrogen mustard as the first chemotherapy agent to treat lymphomas, a treatment used by gilman in 1943. in the following years, alkylating drugs such as cyclophosphamide and chlorambucil were synthesized to fight cancer [1, 2]. kilte and farber designed folate antagonists such as aminopterin and amethopterin, leading to the development of methotrexate, which in 1948 achieved leukemia remission in children [3]. elion and hitchings developed 6-thioguanine and 6-mercaptopurine in 1951 for treating leukemia [4, 5]. heidelberger developed a drug for solid tumors, 5-fluorouracil (5-fu), which remains an important chemotherapy agent against colorectal and head and neck cancers [6]. the 1950s saw the design of corticosteroids, along with the establishment of the cancer chemotherapy national service center in 1955, whose purpose was to test cancer drugs. at that time, monotherapy drugs only achieved brief responses in some types of cancers [7]. by 1958, the first cancer to be cured with chemotherapy, choriocarcinoma, was reported [8]. during the 1960s, the main targets were hematologic cancers. better treatments were developed, with alkaloids from vinca and ibenzmethyzin (procarbazine) applied to leukemia and hodgkin's disease [9] [10] [11]. in the 1970s, advanced hodgkin's disease was made curable with chemotherapy using the momp protocol [12, 13], which combined nitrogen mustard with vincristine, methotrexate and prednisone, and the mopp protocol [14, 15], containing procarbazine but no methotrexate.
patients with diffuse large b-cell lymphoma were treated with the same therapy and, in 1975, a cure for advanced diffuse large b-cell lymphoma was reported using protocol c-mopp, which substituted cyclophosphamide for nitrogen mustard [16] . surgery and radiotherapy were the basis for solid tumor treatment into the 1960s. this led to a plateau in curability rates due to uncontrolled micrometastases. there were some promising publications about the use of adjuvant chemotherapy after radiotherapy or surgery in curing patients with advanced cancer. breast cancer was the first type of disease in which positive results with adjuvant therapy were obtained, and also the first example of multimodality treatment, a strategy this manuscript reviews the evolution of oncological treatments available today, together with several immunotherapeutic approaches and nanoscale-based therapeutics including successes, drawbacks and recent progress. the concept of immunotherapy in medicine incorporates the use of components of the immune system, including antibodies (abs), cytokines, and dendritic cells, to treat various illnesses, such as cancer, allergies, and autoimmune and infectious diseases. immunotherapy also includes the use of vaccines for the prevention of allergies and tumors. immunotherapy adds new dimensions to clinical practice, offering much more specificity, higher efficacy, directed therapy, less toxicity, lower secondary effects and better tolerance. although immunotherapy can be used for several illnesses (macular degeneration, autoimmune diseases, etc.), in the case of cancer, the aim of immunotherapy is to kill tumor cells (either directly or indirectly) or to help patients' immune systems destroy tumors. of all the types of anti-tumoral immunotherapy, this review will focus on the use of antibodies, their history, problems and current applications. antibodies (abs) are one of the most important defense mechanisms for vertebrate animals. 
they are produced by b cells, which, after antigen-mediated activation, undergo differentiation to secretory (plasma) cells, thus producing soluble antibodies. antibodies are highly specific, and they recognize and eliminate pathogens and disease antigens, but can be deliberately generated to recognize different target molecules (tumor markers, bacteria, receptors, cytokines, hormones, etc.). thus, abs can be used in many applications, including diagnostic techniques, research and therapy (against infections, tumors, transplants and autoimmune diseases). antibodies were described in 1890 (figure 1) by von behring and kitasato as "anti-toxins" that appeared in the serum of animals after immunization with inactivated toxins (toxoids) [33]. the researchers noted that protection could be transferred to other animals through the use of these antisera, thus beginning what is known as "serum therapy" for treating infectious diseases (diphtheria and tetanus) in humans. soon after, these sera elements were described as "anti-bodies" because they could be directed not only against toxins, but also against a large variety of organisms and compounds (bacteria, proteins, chemicals, etc.). immunotherapy initially began with the use of antisera obtained from animals such as horses and sheep containing, among other things, a mixture of antibodies from the activation of different b cell clones, so-called "polyclonal antibodies" (pabs). in 1926, felton and bailey obtained pure antibodies, but it was not until the 1960s, thanks to the work of porter and edelman (1972 nobel prize winners), that the ab structure became known. after the introduction of abs to therapy, researchers observed that the transferred defense was only temporary (as opposed to vaccination, which induces long-term memory). in addition, it often provoked anaphylactic responses that were occasionally fatal, which greatly reduced the use of abs in human therapy.
however, these problems did not prevent pabs from being used successfully in diagnostic techniques and even in preventive therapies. anti-snake venom, anti-tetanus and anti-rh+ gamma globulins are still being used in clinical practice. figure 1. history of antibodies. in 1890 von behring and kitasato showed that it was possible to generate anti-toxins (against tetanus, diphtheria), and soon after, therapy with antiserum containing antitoxins was used in patients. it took several years to purify the antibodies (1926) and even more to know their structure. in 1975, milstein and köhler developed the first monoclonal antibody, and the generation and application of monoclonal antibodies started (in diagnosis, research and therapy), initiating modern immunology. in the 1980s, the first anti-tumoral monoclonal antibody was tested and molecular biology techniques started to be used to design chimeric and humanized antibodies. later on, transgenic mice carrying human ig genes and other animal models were used to produce fully human antibodies. in 1975, cesar milstein and george köhler (1984 nobel prize winners) succeeded in generating monoclonal antibodies (mabs) by fusing mouse b cells with b cell tumors (myeloma) to create hybrid cells, which were immortal and had the capacity to produce large quantities of a single (monoclonal) antibody [34]. in 1976, genetic studies by susumu tonegawa revealed the basis for the vast diversity of antibodies, identifying the process of somatic recombination in immunoglobulin genes [35]. since the publication of the monoclonal antibody technique, mouse and rat mabs have been used in many laboratories with thousands of applications in various scientific fields, in diagnostic techniques (clinical, food, environmental), research and in therapy (antitumor, autoimmune diseases).
monoclonal antibodies have helped in the discovery of new molecules (such as the identification of more than 300 membrane proteins, grouped under the cd concept or cluster of differentiation), transcription factors, viral, plant and bacterial proteins, phosphorylated compounds involved in death by apoptosis, factors involved in enzymatic cascades and many more. as an example of their usefulness, the current classification of leukemia by the world health organization is based on the presence or absence of membrane molecules recognized by monoclonal antibodies that define leucocyte populations in various stages of differentiation. but one of the greatest achievements with monoclonal antibodies is their use in human therapy. surgery, chemotherapy and radiotherapy are not specifically directed to tumor cells and may also affect healthy tissue. antibodies can provide specificity and lower toxicity, opening new therapeutic possibilities. the first evidence of this potential came in 1982 when a patient suffering from lymphoma responded to treatment using a mouse mab directed specifically against his tumor b lymphocytes [36] . this response rapidly encouraged research into the production of potentially therapeutic abs. however, clinical trials results revealed that many patients receiving this therapy developed an immune response directed against the therapeutic abs, a response known as hama (human anti-mouse antibodies) or hara (human anti-rat antibodies). some even developed anaphylactic reactions, especially after repeated administration. the high immunogenicity of antibodies due to their large size compared to conventional pharmaceutical drugs, and differences in the pattern of glycosylation between murine and human abs, once again led to the cessation of antibody use in therapy. completely human mabs needed to be developed to avoid immune rejection, but their production was much more complex than initially thought. 
in contrast to mouse or rat myeloma cells, human myeloma cells proved difficult to adapt to continuous growth in vitro. researchers tried to resolve this problem by immortalizing b cells using the epstein-barr virus (ebv) [37] and by fusing human b cells with well-established murine myeloma cells (obtaining heterohybridomas) [38]. however, the low production of antibodies in these cells, the instability of heteromyeloma cells and numerous technical problems led to the search for alternative methods for generating human-like mabs in the mid-1980s. one of these methods was the modification of murine mabs through genetic engineering (figure 2). several antibody molecules and some antibody fragments are shown. chimeric (mouse-human) antibodies carry mouse heavy and light variable domains (in yellow), with the rest of the molecule being of human origin (in red). in the case of humanized antibodies, only the hypervariable regions are mouse derived (in yellow). it is possible to generate bi-specific antibody molecules, using different heavy and light chains (each arm will have a different specificity). fab: fragment antigen binding; scfv: single chain fragment variable; vh: variable domain from the heavy chain. chimerization (murine variable domains linked to constant regions of human heavy and light chains), humanization (only hypervariable regions of murine origin), primatization (chimeric structure of human and primate origin) and the design of recombinant antibody fragments, such as fv (variable fragment), fab (antigen binding fragment), scfv (single chain variable fragment) and minibodies (artificial polypeptides with a structure based on the igv domain), are some of the methods that have been used over the last 30 years to reduce antigenicity and maintain the binding affinity and specificity of the original ab. rituximab, a chimeric anti-cd20 mab, was the first mab approved by the fda for antitumor therapy.
however, a year earlier, several mabs conjugated with radioactive elements were approved for in vivo tumor detection. every year since then, several mabs have been approved for therapy in the us and europe, and more than half of them are chimeric or humanized mabs (see table 1). in addition to fully engineered antibodies, antibody fragments also have advantages compared to whole antibodies, especially in terms of the rate of solid tumor penetration. jain [39] determined that an intact igg molecule needed 54 hours to move 1 mm into a solid tumor, whereas a fab fragment reached the same distance in 16 hours. while the expression of chimeric and humanized antibodies was carried out in eukaryotic hosts, such as mammalian or plant cells, bacteria have been the most widely used organism for the production of recombinant antibody fragments [40] [41] [42]. however, despite numerous advantages, such as avoiding animal immunization and hybridoma production, their low cost and easier production [43], antibody fragments have shorter circulating half-lives compared to full-size antibodies, lack glycosylation and lack effector functions due to the absence of an fc region (unless added). thus, antibody therapies using incomplete antibodies have been relegated to those cases where rapid elimination of antibodies from the blood is required and to local therapy (e.g., macular degeneration). modifications such as pegylation of fragments (modification of a molecule by linking of one or more polyethylene glycol chains) [44] to improve circulation half-life, glycosylation and fc region engineering are among the recent approaches used by researchers to overcome these problems [45]. in the mid-1990s, thanks to the development of molecular biology techniques and microinjection and manipulation of embryonic cells, several groups created various transgenic mouse models carrying human ig genes (figure 1).
the introduction of human ig loci in these mice was carried out using various vectors, such as miniloci, yeast and human artificial chromosomes (yacs and hacs, respectively) and p1 vectors. transgenic mice can be immunized with almost any ag (including human tumor cells), and their spleens can be used to obtain hybridomas following the conventional protocol [46] [47] [48] [49]. moreover, mice can produce human abs of intermediate/high affinity because they can introduce mutations in their human ig transgenes through the mechanism of somatic hypermutation. fully human monoclonal antibodies show several advantages in human therapy, which include low or no immunogenicity, better interaction with human effector systems and patterns of glycosylation and a longer half-life in human serum. in recent years, many fully human mabs have been introduced into clinical trials and some of them have been approved by regulatory agencies (table 1). in addition to the use of transgenic mice to generate fully human mabs, other alternatives have been developed, such as the use of immunodeficient mice receiving human hematopoietic tissue, the use of chicken eggs with human ig coding genes inserted into embryonic cells and the generation of transgenic tobacco plants for producing human mabs. moreover, several groups are working on modifications of the basic antibody structure to generate monovalent and multispecific reagents that may have various therapeutic properties and even completely new structures. examples of these new reagents include antibody alternative protein scaffolds based on leucine-rich repeat molecules of lamprey variable lymphocyte receptors (vlrs), libraries of fibronectin domains and designed ankyrin repeat proteins (darpins) [50]. with all these novel antibody formats, immunogenicity, stability and aggregation problems should be carefully considered.
soon after mab generation was reported in 1975, the potential of mabs became clear and many companies showed interest in developing new reagents for diagnosis and designing new equipment, among other contributions. however, when it came to the field of human therapy, pharmaceutical companies did not initially show much interest in the development of monoclonal antibodies, although several research groups were showing promising results in preclinical and clinical studies. the reasons for their reluctance were many: 1. a number of pharmaceutical companies had experience with generating small compounds, most of them chemically synthesized, but not with generating large biological molecules produced by cells. moreover, sophisticated equipment and cell culturing under controlled conditions, with full quality assurance, are necessary for antibody production. 2. there was the perception by pharmaceutical companies that production of mabs was not going to yield sufficient profit. most companies preferred to concentrate their efforts on developing analogues of well-known drugs rather than on new products, while at the same time most clinicians opted for trials using combinations of known agents. this view took years to change. advances in mab engineering helped develop more effective mab drugs with high specificity, improved potency and stability and decreased immunogenicity, which helped change the companies' initial reluctance. 3. in terms of clinical trials, there were concerns about the cost of the trials (around 10 times more expensive today than 30 years ago), the time required for preclinical pharmacology and toxicology studies (which are much more regulated) and the difficulty in conducting early clinical trials. since new drugs can only be tested against advanced and usually heavily pretreated disease, it is unlikely that dramatic responses will occur with these patients. 4.
the requirement for fetal calf serum in hybridoma cell cultures introduced another problem when mad cow disease was identified in the early 1990s. the fda proposed a limit on materials used in some medical products in order to keep them free of the agent thought to be responsible for mad cow disease (also known as bovine spongiform encephalopathy or bse), making it necessary to find alternatives, such as enriched media without serum. since 1988, 228 mabs have entered clinical studies for various diseases, with 56% of those currently in clinical development. some of these mabs are listed in table 1. the first mab approved for cancer therapy was rituximab (rituxan™), a chimeric antibody directed against cd20, for non-hodgkin's lymphomas. since then, many others have reached the market, including those for the treatment of breast cancer (trastuzumab, herceptin®), acute myeloid leukemia (gemtuzumab ozogamicin, mylotarg™), chronic lymphocytic leukemia (alemtuzumab, campath-1h®), colorectal tumor (cetuximab, erbitux™) and several types of cancer (bevacizumab, avastin™). companies such as genentech inc., amgen, bristol-myers-squibb, imclone systems and trion pharma represent only a portion of the pharmaceutical companies involved in the antibody market related to cancer therapy (table 1). new developments have also occurred in the immunoconjugate field and many of them are currently being explored by the pharmaceutical industry. immunoconjugates include antibodies linked to cancer-killing agents such as drugs, cytokines, toxins and radioisotopes. the objective is for the antibody to act as a transporter for the cancer-killing agent, concentrating the agent directly in the cancer cell, with minimal damage to healthy cells. although conjugated antibodies showed toxicity in the past, more recent approaches under development appear to decrease unwanted side effects.
pharmaceutical companies are developing immunoconjugates independently, forming partnerships with specialized players and even acquiring small biotech companies that are focused on the field of immunoconjugates. although the challenge of their potential immunogenicity requires special attention, there are several practical advantages to immunoconjugates over single antibodies. these include lower dosages, which may lead to lower treatment costs and fewer side effects; the reintroduction of antibodies that historically have shown low efficacy in isolation; the possibility of using bacteria or plant cells to produce immunoconjugates rather than using mammalian cell cultures (decreasing costs and complexity) and the large number of potential combinations (antibodies-cancer killing agents) that are possible. the advantages of immunoconjugates over single antibodies make them crucial players in new cancer therapy developments. although many researchers have worked on monoclonal antibodies and cancer (close to 60,000 reports on this subject can be found in pubmed, http://www.ncbi.nlm.nih.gov/pubmed), the therapeutic mab market moved much more slowly than initially expected, due mostly to the problems indicated above. this situation has changed in recent years and mabs are now the largest class of biological therapies under development, representing a multi-billion dollar worldwide market. as reported recently by scolnik [51], the 22 mabs currently marketed in the us have a sales growth rate of 35% compared to less than 8% for small-molecule drugs. oncology and autoimmune diseases are the most successful indications for these drugs, with five mabs having sales in excess of $3b. thanks to basic research, researchers are identifying new biomarkers, which could be potential targets for mabs. there are currently numerous mabs at various developmental stages and it is expected that many of them will be available for clinical use in the near future.
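to put the two sales growth rates quoted above in perspective, the following python sketch estimates how long sales would take to double under each rate. it assumes, purely for illustration, that both rates are annual and remain constant, which the source does not state; the function and constant names are hypothetical.

```python
import math

# Growth rates quoted in the text; treating them as constant annual
# rates is an assumption made only for this illustration.
MAB_GROWTH = 0.35
SMALL_MOLECULE_GROWTH = 0.08


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for sales to double under constant compound growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def relative_sales(annual_growth_rate: float, years: float) -> float:
    """Sales after `years`, relative to a starting value of 1.0."""
    return (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    for label, rate in (("mabs", MAB_GROWTH),
                        ("small molecules", SMALL_MOLECULE_GROWTH)):
        print(f"{label}: doubling time ~{doubling_time_years(rate):.1f} years, "
              f"relative sales after 5 years {relative_sales(rate, 5):.2f}x")
```

under these assumptions, the difference between the two rates corresponds to a doubling time of a little over two years versus roughly nine years.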
chemotherapy, radiation therapy and surgery are the most common types of cancer treatments available today. more recent treatments, which are at various stages of development, include angiogenesis inhibitor therapy, biological therapies (including interferons, interleukins, colony-stimulating factors, monoclonal antibodies, vaccines, gene therapy and nonspecific immunomodulating agents), bone marrow and peripheral blood stem cell transplantation, laser therapy, hyperthermia, photodynamic therapy and targeted cancer therapies [52] . in the last two decades, a large number of nanoscale and nanostructure-based therapeutic and diagnostic agents have been developed, not only for cancer treatment but also for its prevention and diagnosis [53] . targeted cancer, hyperthermia, photodynamic and gene therapies are just some of the cancer treatments that use engineered nanomaterials. these therapies can be used in isolation or in combination with other cancer treatments, thereby taking advantage of their ability to target tumors (actively or passively), to respond to physical or chemical stimulation (internal or external) and to deliver therapeutic genes to the cell nuclei. the main objective of nanomaterials in cancer treatment is to deliver a therapeutic moiety to tumor cells in a controlled manner (depending on the required pharmacokinetic) while minimizing side effects and preventing drug resistance. nanoscale and nanostructured materials may also be used in diagnosis to detect and prevent pathologies as soon as possible, ideally being able to sense cancer cells and associated biomarkers. 
compared to conventional therapies, nanoparticles show six clear advantages in cancer treatment and/or diagnosis: (1) they can be synthesized in specific sizes and with surface characteristics to penetrate tumors by taking advantage of the enhanced permeation and retention effect (epr) (a mechanism known as passive targeting); (2) they can be engineered to target tumor cells by surface functionalization with biomolecules that attach to tumor-specific cell markers (a mechanism known as active targeting); (3) they can be engineered to penetrate cells and physiological barriers (e.g., blood-brain barrier, blood-retinal barrier); (4) they can increase the plasma half-life of carried chemotherapeutic drugs, which are usually highly hydrophobic; (5) they can protect a therapeutic payload from biological degradation; and (6) they can be synthesized as multifunctional platforms for combined imaging and therapeutic applications (theragnostic nanoparticles). examples of various nanostructured materials with potential applications in oncology are shown in figure 3 . the advantages of biocompatible nanomaterials have contributed to their significant expansion in cancer treatment. targeted therapies for oncology are predicted to reach a 30 billion euro global market by 2015 [54] . the total market for nanobiotechnology products reached as high as $19.3 billion in 2010 [55] . table 2 compiles some of the clinically approved nano-based therapeutics for cancer treatment and diagnosis. many other nanoscale or nanostructure-based therapeutic and diagnostic agents are currently in clinical trials at various stages of development. in 2008, zhang et al. [53] reported on 15 clinical trials being conducted for nanoparticle-based therapeutics. a year later, 50 ongoing clinical trials using nanoparticles for cancer were mentioned by bergin [55] , and at present, there are more than 70 clinical trials under development [56] . 
this large number of commercial nano-based therapeutics for use in cancer treatment is also reflected in the exponential increase in scientific publications and patents involving nanomaterials in recent years. figure 4 shows the evolution over the last decade in the number of published scientific papers and issued patents involving nano-based applications developed to fight cancer. the number of papers and patents involving traditional forms of therapy (chemotherapy, radiation therapy and surgery) grew linearly over the last decade. however, the use of the terms "nano-" and "cancer" has shown exponential growth over the past decade, demonstrating a major focus on nano-based tools applied to cancer treatment and diagnosis. recent advances in the use of nanoscale and nanostructure-based therapeutic agents in cancer treatment are reported below. nanoparticles are engineered to achieve cell targeting by using selective moieties (e.g., antibodies and their fragments, carbohydrates, peptides, nucleic acids), which bind to their corresponding antigens, cell surface carbohydrates or over-expressed receptors in tumor cells. the rapid cellular proliferation of these cells is also exploited by coupling the nanoparticles with different biological agents, such as folic acid. the rationale for coupling these carriers with folic acid is that the folate receptor is over-expressed in a broad range of tumor cell types, including solid and hematological malignancies [57]. once the vector has reached the target, the cargo is released into the interior of the cell, and ideally, a signaling marker attached to the vector will aid the physician in visualizing the tumor. such a vector may also be grafted with a moiety (usually peg), which retards recognition by the reticuloendothelial system (res) to increase nanoparticle systemic circulation.
in addition to recognition moieties, carried drugs and signaling elements attached to nanoparticles, numerous authors have also envisioned and designed vectors with additional functionalities, including cell-penetrating moieties, combinations of several drugs, combinations of drugs and genes, prodrugs (which become drugs upon biochemical modification by tumor cells), stimulus-sensitive agents that can be externally triggered and molecules for evaluating therapeutic efficacy. the more functionality added to the vector, the better the chances of reaching the target; however, its chances of being detected by the res also increase. therefore, currently marketed nanoparticles use passive targeting and active targeted nanoparticles are still being developed. examples of active targeted nanoparticles are reviewed elsewhere [58] . targeted nanoparticle fabrication remains a challenge due to the multiple steps involved, which include biomaterial synthesis and assembly, targeting ligand coupling/insertion, drug loading, surface stabilization and final purification, which could cause batch-to-batch variations and, therefore, quality concerns. for this reason, single-step synthesis of targeted nanoparticles by self-assembling pre-functionalized biomaterials provides a simple and scalable manufacturing strategy [59] . mass production is also a serious concern and continuous synthesis procedures are therefore still being sought. when using batch reactors to synthesize nanoparticles, several drawbacks usually appear, including: (1) heterogeneous distribution of reactants and temperatures in the reactor; (2) insufficient mixing; (3) variations in the physicochemical characteristics of products from different batches; (4) their inherent discontinuity; and (5) the numerous post-synthesis purification steps that are usually required. 
in order to overcome these disadvantages, microfluidic reactors (e.g., micromechanized micromixers, capillaries, junctions) have been used in the continuous synthesis of nanoparticles to precisely control reaction temperatures and residence times, thereby rendering nanoparticles with narrow particle-size distributions. other continuous synthesis processes are usually preferred when synthesizing nanoparticles on a large scale (e.g., laser pyrolysis, arc discharge methods). another concern is the adaptive response of the immune system after repeated applications of nanoparticles. immunological memory, created from the primary response to a specific nanoparticle, provides an enhanced response to secondary encounters with the same type of nanoparticle. as an example, the recognition of pegylated liposomes by anti-peg antibodies has been reported to occur between 2 and 4 days after the first administration of peg-liposomes, leading to fast clearance from circulation [54]. finally, one of the last major barriers to achieving the transition of targeted nanoparticle use into clinical practice is the incomplete understanding of the potential toxicological properties of these materials, along with their exact pharmacodynamics and pharmacokinetics. in spite of these hurdles, many research groups are focusing their efforts on solving them. other groups are also directing their efforts towards designing more efficient targeted nanoparticles for cancer treatment in terms of structure, morphology, biocompatibility and surface functionalization. some of those advances will be described later in this document. novel targeted theragnostic nanoparticles have been synthesized and their bi-functionality demonstrated. among them are perfluorocarbon nanoemulsions, which are in clinical trials [60]. quain et al.
[61] coupled pegylated gold nanoparticles to a single-chain variable fragment antibody, which recognized the epidermal growth factor receptor overexpressed in many types of malignant human tumors, and demonstrated the targeting capabilities of these vectors in nude mice bearing human head-and-neck tumors. the nanoparticles were also able to function as tags for spectroscopic detection with surface-enhanced raman spectroscopy. magnetic targeting has also been used as a physical method for targeting and visualizing tumors. effects of magnetic targeting on the extent and selectivity of nanoparticle accumulation in tumors of rats harboring orthotopic 9l-gliosarcomas were analyzed using magnetic resonance imaging (mri) [62] . sun et al. also demonstrated the targeted drug release capabilities of iron oxide nanoparticles conjugated with a drug (methotrexate) and a targeting ligand, chlorotoxin, while monitoring tumor-cell specificity in vivo using mri [63] . weng et al. demonstrated the targeted tumor cell internalization and imaging of multifunctional quantum dot-conjugated immunoliposomes, in vitro and in vivo [64] . in this targeted delivery system, anti-her2 single chain fv fragments were attached to the end of peg chains located on the surface of liposomes. targeting via extracellular activation of the nanocarrier is a promising method for achieving active targeting using physiological stimuli present in the tumor environment. triggering mechanisms that only release the transported cargo of nanocarriers into the tumor environment take advantage of its acidic ph and uncontrolled enzyme production. a complete description of these systems is reported elsewhere [58] . tumor targeting of prodrugs that become active once they reach tumor cells is another novel strategy for avoiding unwanted side effects of the drug, and it allows for the delivery of large doses of drugs. following this approach, dhar et al. 
[65] synthesized pt(iv)-encapsulated prostate-specific membrane antigen targeted nanoparticles of poly(d,l-lactic-co-glycolic acid) (plga)-poly(ethylene glycol) (peg)-functionalized controlled release polymers. after reduction in the interior of the tumor cells, the prodrug becomes cisplatin, which cross-links nuclear dna. photodynamic therapy (pdt) is a technology that uses a photosensitizer that is activated upon exposure to visible or near infrared (nir) light, and transfers energy to molecular oxygen, thereby generating reactive oxygen species (e.g., singlet oxygen, free radicals, peroxides). the subsequent oxidation of lipids, amino acids and proteins induces cell death. a complete review of photosensitizers is reported elsewhere [66]. fda-approved photosensitizers absorb in the visible spectral regions below 700 nm, where light penetrates only a few millimeters into the skin. pdt is therefore limited to treatment of certain types of skin cancer and its effectiveness for other tumors is not yet apparent [66, 67]. pdt is usually performed as an outpatient procedure and may be repeated and used in combination with other therapies, such as surgery, radiation and chemotherapy [52]. photosensitizers are susceptible to photobleaching under light irradiation, and have therefore been loaded within nanoparticles to avoid this drawback. most photosensitizers are also highly hydrophobic, so nanoparticles are being explored as carriers to increase their bioavailability. noble metal nanoparticles have proven very useful as agents in photodynamic therapy due to their enhanced absorption cross sections, which are four to five orders of magnitude larger than those offered by conventional photoabsorbing dyes [68]. silica nanoparticles synthesized in the non-polar core of micelles have been used to entrap the water-insoluble photosensitizing anticancer drug 2-devinyl-2-(1-hexyloxyethyl) pyropheophorbide.
upon nir light irradiation, nanoparticles embedded in hela cells generate singlet oxygen, resulting in a reduction in the percentage of cell survival [69] . many other photosensitizers have been embedded within inorganic nanoparticles for pdt, including meta-tetra(hydroxyphenyl)-chlorin (m-thpc) [70] . a complete review of various nanoparticulate-based carriers for pdt is reported elsewhere [71] . preclinical studies will determine the added translational value of pdt therapies using photosensitizers loaded into these novel nanoparticles prior to their use in clinical settings. hyperthermia, as an anticancer therapy, consists of heating a tumor to inhibit proliferation of cancer cells with the aim of destroying or rendering them more sensitive to the effects of conventional protocols of radiation and chemotherapy. in fact, hyperthermia is currently used as an adjunct therapy to radiotherapy and/or chemotherapy. when cells are heated beyond their normal temperature they can become sensitized to conventional therapeutic agents such as radiation and chemotherapy. when high temperatures are used, typically above 43 °c, the heat causes irreparable damage and results in tumor cell death in a process known as thermal ablation. the success of local thermal ablation consists of destroying the entire tumor mass without damaging adjacent vital structures. this requirement is particularly important for patients with limited reserves of tissue function. hyperthermia treatments make use of microwaves, ultrasounds and radiofrequency, which can be focused and used locally to target the tumor. a significant advantage of thermal technology is that it is minimally invasive. mild heat increases blood flow in the tumor, allowing chemotherapy to exert greater effect on cancer cells. 
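the 43 °c threshold mentioned above is also the reference temperature of a widely used thermal dose metric, cumulative equivalent minutes at 43 °c (cem43), which this review does not define. the python sketch below is a minimal illustration of that metric, assuming the commonly used constants r = 0.5 at or above 43 °c and r = 0.25 below it; the function name and the example temperature trace are hypothetical and do not come from the cited studies.

```python
from typing import Iterable

REFERENCE_TEMP_C = 43.0  # reference temperature of the CEM43 metric


def cem43(temperatures_c: Iterable[float], interval_min: float) -> float:
    """Cumulative equivalent minutes at 43 degrees C.

    temperatures_c: tissue temperatures sampled at a fixed interval.
    interval_min:   duration of each sampling interval, in minutes.
    Uses the commonly assumed constants R = 0.5 (T >= 43) and R = 0.25 (T < 43).
    """
    total = 0.0
    for temp in temperatures_c:
        r = 0.5 if temp >= REFERENCE_TEMP_C else 0.25
        # Each interval contributes interval_min * R ** (43 - T) equivalent minutes.
        total += interval_min * r ** (REFERENCE_TEMP_C - temp)
    return total


if __name__ == "__main__":
    # Hypothetical trace: one reading per minute while a tumor is heated.
    trace = [41.0, 42.0, 43.0, 44.0, 44.0, 43.0]
    print(f"thermal dose: {cem43(trace, interval_min=1.0):.2f} CEM43")
```

in this formulation each degree above 43 °c halves the exposure time needed for the same equivalent dose, which is one way to express why heating uniformity across the tumor matters.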
by depressing the metabolic activity of target cells, heat also reduces the oxygen demand in the tumor and tumor tissue oxygenation increases, which makes hyperthermia one of the most potent radiosensitizers available [72] . results from clinical trials conducted under quality assurance guidelines have shown hyperthermia to be beneficial in the treatment of several types of solid tumors, including breast cancer, melanoma, sarcoma and locally advanced cervical cancer, with reports demonstrating improved overall survival, as compared to patients who only receive radiotherapy or chemotherapy [73] [74] [75] . it is widely accepted that the benefits of hyperthermia will significantly increase with refinements in heating delivery technologies as well as in monitoring strategies that ensure optimal thermal dose coverage, resulting in advanced local tumor control and prolongation of overall survival. integration of hyperthermia with emerging imaging technologies, such as non-invasive mr-based thermometry, will help unveil the full potential of hyperthermia for treating cancer. nanotechnology may offer a window of opportunity to improve heat delivery. for example, highly focused ultrasound energy transfer to deep brain tumors may be difficult to achieve due to the skull's electromagnetic barrier. magnetic fluid hyperthermia (mfh) uses iron oxides as a heating source due to their excellent magnetic properties and good compatibility [76] . depending on the route of administration, magnetically mediated hyperthermia can be classified into two main types: arterial embolization hyperthermia, where arterial supply is used to deliver magnetic particles into the tumor tissue, and direct intratumoral injection hyperthermia. magnetic nanoparticles for hyperthermia settings show the advantage of being able to achieve site-specific tumor targeting through the aid of an external magnetic field. magnetic nanoparticles can also be simultaneously traced using mri. 
these nanoparticles are then selectively heated by application of a high frequency alternating magnetic field. magnetic energy dissipation from the nanoparticles (brown and néel relaxations) induces heating, which produces cell death at temperatures above 43 °c. significant antineoplastic effects of mfh treatment were initially observed in animal models of glioma [77] and prostate cancer [78] . consequently, phase i and ii clinical trials with thermotherapy using magnetic particles have been conducted to treat prostate carcinoma [79] and glioblastoma multiforme [80, 81] . it has been demonstrated that magnetic hyperthermia in conjunction with a reduced radiation dose leads to longer survival following diagnosis of first tumor recurrence compared to conventional therapies in the treatment of recurrent glioblastoma [81] . limiting factors of magnetic hyperthermia have been reported, including patient discomfort at high magnetic field strengths as well as irregular intratumoral heat distribution even upon direct intratumoral injection [82] . magnetoliposomes, i.e., magnetic nanoparticles encapsulated within liposomes, have been designed to achieve active targeting of tumor cells by electrostatic interaction before hyperthermia treatment [83] . other active strategies, including antibody-functionalized magnetoliposomes, have been used in combination with hyperthermia, demonstrating effective targeting and cytotoxic responses when applying alternating magnetic fields in tumor-bearing mouse models [84] . the harnessing of therapeutic effects of nanoparticle-driven hyperthermia will likely take advantage of the feasibility of using these vectors to load drugs or biological agents and trigger their release upon heating, in order to increase tumor control and disease-free survival. the use of magnetic hyperthermia to trigger drug release has also been demonstrated as feasible in combinatorial approaches for cancer treatment. purushotham et al. 
[85] developed magnetic nanoparticles coated with a thermoresponsive polymer poly-n-isopropylacrylamide (pnipam). with these nanoparticles, simultaneous hyperthermia and drug release of therapeutically relevant quantities of doxorubicin at hyperthermia temperatures was achieved in vitro. in vivo targeting of those doxorubicin-loaded nanoparticles injected directly via the main hepatic artery to hepatocellular carcinoma in a rat model was followed by mri examination. nir-absorbing nanoparticles have the advantage of being able to absorb or scatter light, thus producing heat, which increases the temperature in the tissue where the nanoparticles have been embedded. this region of the electromagnetic spectrum is notable for minimal absorption by water and biological chromophores [86] . therefore, nir light is preferable as a trigger in biomedical applications because it has maximal penetration of tissues due to their minimal absorbance at those wavelengths [87] . hemoglobin and water, the major absorbers of visible and infrared light, respectively, have their lowest absorption coefficient in the nir region (around 650-900 nm). nir light has been shown to travel at least 10 cm through breast tissue and 4 cm through skull/brain tissue and deep muscle using microwatt laser sources (fda class 1), while light at higher power levels (fda class 3) has been shown to penetrate through 7 cm of muscle and neonatal skull/brain [86] . the use of sio 2 /au nanoparticles (nanoshells) as nir-absorbing tags is also considered for the photothermal ablation of solid tumors [88] . a pilot study on patients with refractory head and neck cancer is currently being conducted [89] . au/aus sulfide nir-absorbing nanoparticles (35-55 nm) provide higher absorption than nanoshells (98% absorption and 2% scattering for au/aus versus 70% absorption and 30% scattering for sio 2 /au nanoshells) as well as potentially better tumor penetration [90] . 
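the absorption and scattering fractions quoted above for the two particle types can be compared with a short calculation. the python sketch below assumes, as a simplification not made in the source, that both particle types extinguish the same amount of incident nir light and that only the absorbed fraction is converted to heat; the dictionary keys and function names are hypothetical.

```python
# Absorption / scattering fractions of extinguished light, as quoted in the text.
PARTICLES = {
    "au/aus": {"absorption": 0.98, "scattering": 0.02},
    "sio2/au nanoshell": {"absorption": 0.70, "scattering": 0.30},
}


def absorbed_power(extinguished_power_w: float, particle: str) -> float:
    """Power converted to heat, assuming only absorbed light heats the tissue."""
    return extinguished_power_w * PARTICLES[particle]["absorption"]


def heating_ratio(particle_a: str, particle_b: str) -> float:
    """Relative heat output of particle_a versus particle_b at equal extinction."""
    return PARTICLES[particle_a]["absorption"] / PARTICLES[particle_b]["absorption"]


if __name__ == "__main__":
    ratio = heating_ratio("au/aus", "sio2/au nanoshell")
    print(f"au/aus converts about {ratio:.1f}x more extinguished light into heat")
```

under these assumptions the difference in absorption alone corresponds to roughly 40% more heat per unit of extinguished light, independently of any difference in tumor penetration.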
other nanoparticles used in nir include hollow gold nanoparticles, which are smaller than sio 2 /au nanoshells, thus giving them prolonged blood circulation half-life and increased chances of reaching the tumors [91]. maltzahn et al. [92] demonstrated that (peg)-protected gold nanorods exhibit superior spectral bandwidth, higher photothermal heat generation per gram of gold and longer circulation half-life when compared to gold nanoshells, as well as an approximately two-fold higher x-ray absorption than a clinical iodine contrast agent. nir-absorbing nanoparticles have also been functionalized with anti-her2 antibodies to achieve tumor targeting in medulloblastoma cells [93]. hollow gold nanoparticles were loaded with an α-melanocyte-stimulating hormone analog [90], a potent agonist of melanocortin type-1 receptor overexpressed in melanoma, demonstrating selective photothermal ablation of b16/f10 melanoma. nanoshells have been loaded into cells of monocyte lineage, which acted as carriers. once incorporated into human breast tumors in nude mice, the photoinduced cell death of nanoparticle-loaded macrophages was able to induce the death of malignant cells in the tumor's hypoxic microenvironment [94]. current studies are focused on engineering more efficient nir-absorbing nanomaterials and on their functionalization with targeting moieties. compared to currently available non-invasive procedures with capabilities of increasing the temperature of target tumors, the main drawbacks of magnetic and nir-absorbing nanoparticles arise from their necessarily invasive nature as well as from the relatively indiscriminate nature of the tissue damage. due to their efficient intracellular uptake, concerns regarding acute and long-term effects of inorganic nanoparticle accumulation and cytotoxicity are emerging in the biomedical research community [95] [96] [97].
despite the increasing number of newly developed nanoparticles designed for hyperthermia applications, the number of studies addressing their toxicity is low [98]. collected data indicate that size, crystallinity, shape and surface chemistry strongly influence the mechanism of inorganic nanoparticle internalization by cells, their biodistribution, metabolism and potential toxicity, highlighting the great importance of increasing understanding of healthy and tumor cell interactions with nanoparticles. it is expected that ongoing studies will help reconcile conflicting data, which range from reports demonstrating the safety of inorganic particles to those reporting transient or acute in vivo toxicity. gene therapy aims to treat diseases by introducing dna, rna, small interfering rna and antisense oligonucleotides into specific target cells or tissues to restore missing functionality and to eradicate pathogenic dysfunction. the therapeutic gene material is delivered to specific target cells using efficient vectors that aim to sustain stable, regulated gene expression without creating unwanted side effects. viral carriers, organic cationic compounds, recombinant proteins and inorganic nanoparticles are the four kinds of carriers currently being explored for gene delivery applications [99, 100]. all of them show advantages and disadvantages, but none of them fulfills the criteria for an ideal vector. indeed, viruses can be regarded as nanoparticles due to their dimensions, regular geometries and well-characterized surface properties. the most widely used viral vectors for gene transfer include adenoviruses (ad), which are the dominant gene delivery systems in clinical settings, adeno-associated viruses, herpes simplex-1 viruses, retroviruses and lentiviruses [101]. viruses are very efficient carriers; however, some of them have limited dna cargo capacity, can cause immunogenicity and toxicity and their manufacture is rather expensive.
in general, synthetic delivery systems prevent specific immune responses and may carry higher amounts of material, without strict limitations on the size of the genetic drugs. the concept of gene therapy was initially envisioned in the 1970s, but due to the cumbersome nature of the testing required to design and produce effective and safe vectors, gene therapy systems were not fully developed until the early 1980s. the first clinical trials were approved in 1989, and during the 1990s numerous vectors carrying various therapeutic genes were engineered, and their usefulness was tested in preclinical studies. due to a simplistic belief in the straightforward success of gene therapy, many of these viral vectors rapidly moved to clinical settings. although success could be demonstrated in some early clinical studies, even when conducted with far from perfect vectors, serious adverse effects and patient deaths led to rigorous regulation of gene therapy protocols for human use. the evolution of currently successful cancer strategies discussed in sections 1 and 2 also included significant failures and setbacks, which did not restrain investments in chemotherapy and immunological therapies. however, the pharmaceutical industry has not yet developed a single cancer gene therapy product, and so the development of genetic medicines has been left to academic institutions and small biotechnology companies. in addition, the drawbacks of clinical trials for gene therapy led to extended periods of severe cuts in public research funding. the fda has not yet approved a human gene therapy product for sale, although gene-related research is growing rapidly and many clinical trials are ongoing. most of these are in phase i or ii and are aimed at dose determination and toxicity assessment [102] . due to the unknown safety profile of gene vectors, design and approval of human trials were facilitated for life-threatening diseases. 
approximately 1,500 trials have been conducted worldwide since 1989, and more than two-thirds of them were conceived for cancer. due to the complex nature of cancer, the numerous gene therapy approaches for fighting it include strategies for restoring mutant suppressor gene functions, inactivating oncogenes, expressing suicide genes and eliciting protective immune responses [103]. oncolytic viruses have also been engineered to exploit tumor cell characteristics by replicating in these target cells, as a method for improving the dissemination of biological agents in solid tumors [104]. for the delivery of therapeutic genes encoding proteins with cytotoxic or anti-angiogenic actions, transcriptional targeting using regulatable promoters has been explored as a way of restricting transgene expression to an optimal therapeutic window [105]. to date, there are two gene therapy products available on the market for clinical use, both of which have been approved for cancer treatment in china. since 2004 china has been the only country in the world where gene therapy is licensed for practice. these products are adenoviral vectors marketed under the brand names gendicine tm and oncorine tm [106, 107]. gendicine tm is a p53-overexpressing, replication-incompetent ad for the treatment of head and neck squamous cell cancer in combination with radiotherapy. oncorine tm is an e1b-55k-gene-deleted oncolytic ad, similar to the discontinued onyx-015 [107]. a few examples of viruses that have almost reached the market are given below. cerepro ® (sitimagene ceradenovec) is an adenoviral vector containing the herpes simplex virus thymidine kinase gene cdna under the control of a cytomegalovirus promoter, manufactured by ark therapeutics ltd., for the treatment of high-grade glioma with oral ganciclovir [108].
cerepro ® demonstrated significant efficacy in a recent phase iii trial, but a further trial is still required before approval in order to provide a sufficient level of evidence of clinical benefit [109]. similar to gendicine tm, advexin tm (contusugene ladenovec; ingn 201) was developed by introgen therapeutics inc. as a replication-impaired adenoviral vector carrying the p53 tumor suppressor gene under the control of a constitutive viral promoter. numerous human cancers have abnormalities in some of the molecules associated with the p53 pathway, contributing to tumor resistance to a variety of conventional therapeutics. preclinical data have demonstrated increased amounts of p53 wild-type protein after transduction with advexin tm, and phase ii and iii trials were conducted in unresectable recurrent head and neck squamous cell carcinoma [110]. responders to the adenovirus therapy had a characteristic p53 profile, with either low expression of mutated p53 or wild-type p53 inactivated by upregulation of inhibitors. genetic immunotherapy was conceived to deliver immune mediators as an efficient and safe approach that also avoids the need to produce and purify large amounts of recombinant proteins [111]. tnferade tm, developed by genvec [112], is a second-generation adenovirus vector containing e1, e3 and e4 deletions and harboring a tnf-α gene, functionally controlled by the radiation-inducible egr-1 promoter. tnferade tm was successfully tested in multicenter phase ii and iii randomized controlled trials in combination with chemoradiation in patients with locally advanced pancreatic cancer [113]. despite initially encouraging results, genvec stopped the phase iii trial in march 2010, as an interim analysis could not demonstrate relevant evidence of effectiveness. an example of a retroviral vector in cancer gene therapy is rexin-g ®, currently in clinical trials for advanced pancreatic cancer, metastatic breast cancer, osteosarcoma and soft tissue sarcoma [114].
rexin-g ® is a replication-incompetent, collagen-targeted vector, encoding a dominant negative mutant of the human cyclin g1 gene, which makes it lethal to cancer cells [115, 116]. impressive results were obtained in phase i and ii clinical trials, which demonstrated unprecedented tumor control, prolonged survival and clinical remissions in late-stage cancer patients [117]. genomic and proteomic technologies are quickly evolving to detect specific molecular targets in patient tumor samples, fulfilling the promise of a personalized treatment approach. information collected from these emerging technologies will help engineer vectors that carry therapeutic genes tailored to the properties of individual tumors. it is now envisioned that future cancer gene therapies will use a combination of viral and non-viral vectors tailored to meet patient-specific tumor characteristics. consequently, many research groups have focused their efforts on the generation of synthetic carriers that incorporate features that mimic the biological mechanisms of viral gene delivery. the ideal synthetic vector would incorporate a polycationic sequence to condense nucleic acids and a coating to evade the reticuloendothelial system. it would exhibit colloidal stabilization properties to prevent accumulation in the lung capillaries, and would contain specific target-cell entry, endosomal escape and nuclear localization signals. the goal is to synthetically manufacture biodegradable vectors that can be administered systemically to reach micrometastases. these carriers were initially prepared from polymers, lipids and dendrimers [118]. the first non-viral gene therapy trial was conducted in 1991, on patients with advanced melanoma who received intratumor injection of dna-liposome complexes [119]. the results demonstrated for the first time the safety and feasibility of cancer treatment by gene therapy protocols using non-biological carriers.
cationic polymers have demonstrated superior gene transfer properties to those of polymers having anionic or neutral charge at physiological ph. however, most clinical trials have been conducted with carriers classified as safe [120], such as the nonamine polymers polyvinyl pyrrolidone and poly(lactic-co-glycolic acid). allovectin-7 tm, a registered trademark of vical incorporated (san diego, ca, usa), is a promising cancer gene therapy product formulated with a cationic lipid system. allovectin-7 tm contains a bicistronic plasmid encoding human leukocyte antigen-b7 and beta-2 microglobulin. this plasmid allows the immune system to recognize metastatic melanoma lesions as foreign by incorporating an mhc class i complex into the tumor through direct injection, as demonstrated in phase i/ii trials [121]. a phase iii trial is currently being conducted to compare the efficacy of allovectin-7 tm to conventional chemotherapy. encouraging results were also obtained in a recent phase i trial conducted on women with recurrent, chemotherapy-resistant ovarian cancer to assess the safety and tolerability of a plasmid carrying the human interleukin-12 gene formulated with a synthetic lipopolymer, polyethylene glycol-polyethyleneimine-cholesterol [122]. currently, numerous nanostructured systems are being developed and tested in preclinical studies. for example, self-assembled nanoparticles containing sirna, carrier dna, protamine and lipids, including polyethylene glycol and a ligand, anisamide, to target cancer cells were prepared and tested by li et al. [123]. these authors demonstrated the high efficiency of these systems in delivering genetic material to xenograft tumors after intravenous administration in athymic nude mice. folate groups have also been linked to liposomes for sirna delivery, which resulted in significant suppression of xenograft growth in mice [124].
folate-peg-polymeric nanoparticles have also been tested in vivo for suicide gene therapy applications, using ganciclovir as a prodrug [125] . peg-modified gelatin-based nanocarriers have been used in vivo to deliver plasmid dna encoding for the soluble form of the extracellular domain of the vascular endothelial growth factor receptor-1 (vegf-r1 or sflt-1) in antiangiogenic therapy [126] . upon intravenous administration, overexpressed sflt-1 was therapeutically active as shown by suppression of the xenograft tumor growth. nanoparticles also offer the ability to monitor the delivery of genetic material. tan et al. [127] were able to synthesize chitosan-based nanoparticles encapsulating quantum dots coupled to sirna and demonstrate efficient silencing and transfection tracking. finally, inorganic nanoparticles are also under development, which, despite their low synthesis efficiency, have the significant advantage of low toxicity and easy functionalization [100] . for example, magnetic liposomes have also been tested in magnetic hyperthermia settings to induce therapeutic tnf-α expression driven by the promoter of the stress-inducible gadd153 gene [128] . the combined thermal and gene therapy treatment significantly arrested tumor growth in nude mice, which encouraged the refinement of this type of cancer gene therapy, which was then successfully tested in preclinical studies [129] . after more than two decades of cancer gene therapy using biological vectors, preclinical studies yielded excellent results and clinical trials reported satisfactory results in terms of reporting mild or no long-term toxicity. however, a real breakthrough cannot be claimed in clinical therapy. the reasons for the different outcomes of preclinical and clinical trials include the inherent limitations of rodent models, which develop homogeneous tumors arising from clonal cell lines, while tumors found in clinical practice are composed of heterogeneous cell types. 
the therapies described in sections 1 and 2 also confronted similar limitations during their development. the main players in gene therapy, vectors and transgenes, will evolve to achieve the highest possible degree of specificity for targeting cancer cells. nanotechnology has already engineered powerful non-biological carriers of a variety of therapeutic genes that have demonstrated efficacy and safety in preclinical tests. since current knowledge of cancer cell biology is far from complete, ongoing and future clinical trials with these synthetic systems are expected to suffer similar drawbacks in terms of efficacy as those experienced with viral gene therapy systems. as we have seen from other therapies that have already been incorporated into the clinical routine of cancer treatment, the success of cancer gene therapies will be preceded by many failures, which will likely be due more to our technological limitations than to flaws in their general concept.
this review has tried to summarize the history and evolution of the most common types of cancer treatments available today, as well as new therapies under study in recent years. in addition to surgery, chemotherapy, radiation therapy, hyperthermia, photodynamic therapy and immunotherapy, new therapies are now at different stages of development, trying to decrease drug toxicity in healthy tissues and increase efficacy by targeting tumor angiogenesis, by exploring cell and gene therapy, or by using new nanostructures for diagnostic or therapeutic purposes. nanotechnology is offering new products that could be used to target cancer cells, either alone, owing to their intrinsic properties, or in combination with other biomolecules (anti-tumoral drugs, folic acid, albumin, antibodies, aptamers). however, history tells us that the fight against cancer is not an easy task.
many types of cancer are able to resist conventional therapies, and different combinations of drugs and therapies (e.g., surgery together with radiotherapy and chemotherapy) are usually the only way to destroy tumoral cells. this may also be true for the new therapies now arriving in the clinic. many more studies are required, but these new treatments are opening doors to hope for many patients waiting for a successful therapy.
symposium on advances in pharmacology resulting from war research: therapeutic applications of chemical warfare agents nitrogen mustard therapy: use of methyl-bis (β-chloroethyl) amine hydrochloride and tris (β-chloroethyl) amine hydrochloride for hodgkin's disease, lymphosarcoma, leukemia, and certain allied and miscellaneous disorders temporary remissions in acute leukemia in children produced by folic acid antagonist, 4-aminopteroyl-glutamic acid (aminopterin) the chemistry and biochemistry of purine analogs antagonists of nucleic acid derivatives. viii. synergism in combinations of biochemically related antimetabolites fluorinated pyrimidines. a new class of tumor inhibitory compounds acth- and cortisone-induced regression of lymphoid tumors in man: a preliminary report therapy of choriocarcinoma and related trophoblastic tumors with folic acid and purine antagonists the vinca alkaloids: a new class of oncolytic agents a methylhydrazine derivative in hodgkin's disease and other malignant neoplasms. therapeutic and toxic effects studied in 51 patients preliminary clinical studies with ibenzmethyzin intensive combination chemotherapy and x-irradiation in the treatment of hodgkin's disease intensive combination chemotherapy and x-irradiation in hodgkin's disease combination chemotherapy in the treatment of advanced hodgkin's disease combination chemotherapy in the treatment of advanced hodgkin's disease advanced diffuse histiocytic lymphoma, a potentially curable disease.
results with combination chemotherapy combination chemotherapy in disseminated testicular cancer: the indiana university experience cis-diamminedichloroplatinum, vinblastine, and bleomycin combination chemotherapy in disseminated testicular cancer testicular cancer as a model for a curable neoplasm: the richard and linda rosenthal foundation award lecture the protein kinase complement of the human genome tyrosine kinases as targets for cancer therapy history of cancer chemotherapy a method of performing abdomino-perineal excision for carcinoma of the rectum and of the terminal portion of the pelvic colon recent advances in the surgery of the lung and pleura hugh morriston davies: first dissection lobectomy in 1912 lung cancer: new surgical approaches minimally invasive techniques in breast cancer treatment evolution of the management of laryngeal cancer mri-guided laparoscopic and robotic surgery for malignancies intensity-modulated radiation therapy, protons, and the risk of second cancers introduction. 
image-guided and adaptive radiation therapy late baron shibasaburo kitasato continuous cultures of fused cells secreting antibody of predefined specificity evidence for somatic rearrangement of immunoglobulin genes coding for variable and constant regions treatment of b-cell lymphoma with monoclonal anti-idiotype antibody an efficient method to make monoclonal antibodies from memory b cells: potent neutralization of sars coronavirus a human myeloma cell line suitable for the generation of human monoclonal antibodies transport of molecules across tumor vasculature escherichia coli secretion of an active chimeric antibody fragment assembly of a functional immunoglobulin fv fragment in escherichia coli single-chain antigen-binding proteins high cytoplasmic expression in escherichia coli, purification and in vitro refolding of a single chain of fv antibody against the hepatitis b surface antigen pegylation of therapeutic proteins the state of antibody therapy production of antigen-specific human monoclonal antibodies from translocus mice: comparison of mice carrying igh/igκ or igh/igκ/igλ transloci gonzález-fernández, a. the use of transgenic mice for the production of a human monoclonal antibody specific for human cd69 antigen h24: a human monoclonal antibody obtained from mice carrying human ig genes, recognizes human myeloid leukaemia and cd5- non-hodgkin's lymphoma gonzález-fernández, a.
rearrangement of only one human ighv gene is sufficient to generate a wide repertoire of antigen specific antibody responses in transgenic mice design of next-generation protein therapeutics mabs: a business perspective nanoparticles in medicine: therapeutic applications and developments targeting nanoparticles to cancer applications and global markets clinical trials database; search keywords: cancer and nanoparticles; cancer and nano folate-mediated delivery of macromolecular anticancer therapeutic agents extracellularly activated nanocarriers: a new paradigm of tumor targeted drug delivery nanotechnology in drug delivery and tissue engineering: from discovery to applications theragnostics for tumor and plaque angiogenesis with perfluorocarbon nanoemulsions in vivo tumor targeting and spectroscopic detection with surface-enhanced raman nanoparticle tags iron oxide nanoparticles as a drug delivery vehicle for mri monitored magnetic targeting of brain tumors tumor-targeted drug delivery and mri contrast enhancement by chlorotoxin-conjugated iron oxide nanoparticles targeted tumor cell internalization and imaging of multifunctional quantum dot-conjugated immunoliposomes in vitro and in vivo targeted delivery of cisplatin to prostate cancer cells by aptamer functionalized pt(iv) prodrug-plga-peg nanoparticles nanoparticles in photodynamic therapy: an emerging paradigm a systematic review of photodynamic therapy in the treatment of pre-cancerous skin conditions, barrett's oesophagus and cancers of the biliary tract, brain, head and neck, lung, oesophagus and skin plasmonic photothermal therapy (pptt) using gold nanoparticles ceramic-based nanoparticles entrapping water-insoluble photosensitizing anticancer drugs: a novel drug-carrier system for photodynamic therapy the embedding of meta-tetra(hydroxyphenyl)-chlorin into silica nanoparticle platforms for photodynamic therapy and their singlet oxygen production and ph-dependent optical properties nanoparticles as
vehicles for delivery of photodynamic therapy agents mild temperature hyperthermia and radiation therapy: role of tumor vascular thermotolerance and relevant physiological factors today's thermal therapy: not your father's hyperthermia: challenges and opportunities in application of hyperthermia for the 21st century cancer patient present and future technology for simultaneous superficial thermoradiotherapy of breast cancer neoadjuvant chemotherapy alone or with regional hyperthermia for localized high-risk soft-tissue sarcoma: a randomised phase 3 multicentre study multifunctional nanoparticles--properties and prospects for their use in human medicine the effect of thermotherapy using magnetic nanoparticles on rat malignant glioma anticancer effect of hyperthermia on prostate cancer mediated by magnetite cationic liposomes and immune-response induction in transplanted syngeneic rats morbidity and quality of life during thermotherapy using magnetic nanoparticles in locally recurrent prostate cancer: results of a prospective phase i trial intracranial thermotherapy using magnetic nanoparticles combined with external beam radiotherapy: results of a feasibility study on patients with glioblastomamultiforme nanoparticle thermotherapy and external beam radiation therapy for human prostate cancer cells magnetic nanoparticle hyperthermia for prostate cancer intracellular hyperthermia for cancer using magnetite cationic liposomes: an in vivo study anti-cancer effect of hyperthermia on breast cancer by magnetite nanoparticle-loaded anti-her2 immunoliposomes thermoresponsive core-shell magnetic nanoparticles for combined modalities of cancer therapy a clearer vision for in vivo imaging optical-thermal response of laser-irradiated tissue nanoengineering of optical resonances near-infrared-resonant gold/gold sulfide nanoparticles as a photothermal cancer therapeutic agent targeted photothermal ablation of murine melanomas with melanocyte-stimulating hormone analog-conjugated 
hollow gold nanospheres computationally guided photothermal tumor therapy using long-circulating gold nanorod antennas immunonanoshells for targeted photothermal ablation in medulloblastoma and glioma: an in vitro evaluation using human cell lines a cellular trojan horse for delivery of therapeutic nanoparticles into tumors nanotoxicity of iron oxide nanoparticle internalization in growing neurons toxic effects of iron oxide nanoparticles on human umbilical vein endothelial cells acute toxicity of magnetic nanoparticles in mice magnetic nanoparticles for theragnostics non-viral nucleic acid delivery: key challenges and future directions inorganic nanoparticles as carriers for efficient cellular delivery gene therapy: the first approved gene-based medicines, molecular mechanisms and clinical indications deliberate regulation of therapeutic transgenes replicative adenoviruses for cancer therapy regulatable gene expression systems for gene therapy current status of gendicine in china: recombinant human ad-p53 agent for treatment of cancers clinical trials with oncolytic adenovirus in china. curr. 
cancer drug targets a preclinical assessment of the safety and biodistribution of an adenoviral vector containing the herpes simplex virus thymidine kinase gene (cerepro) after intracerebral administration sitimagene ceradenovec: a gene-based drug for the treatment of operable high-grade glioma biomarkers predict p53 gene therapy efficacy in recurrent, squamous cell carcinoma of the head and neck improvement of different vaccine delivery systems for cancer therapy translation of the radio- and chemo-inducible tnferade vector to the treatment of human cancers molecular engineering of matrix-targeted retroviral vectors incorporating a surveillance function inherent in von willebrand factor inhibition of metastatic tumor growth in nude mice by portal vein infusions of matrix-targeted retroviral vectors bearing a cytocidal cyclin g1 construct noteworthy clinical case studies in cancer gene therapy: tumor-targeted rexin-g advances as an efficacious anti-cancer agent gene delivery with synthetic (non viral) carriers direct gene transfer with dna-liposome complexes in melanoma: expression, biologic activity, and lack of toxicity in humans handbook of pharmaceutical excipients a phase 2 study of high-dose allovectin-7 in patients with advanced metastatic melanoma phase-i clinical trial of il-12 plasmid/lipopolymer complexes for the treatment of recurrent ovarian cancer tumor-targeted delivery of sirna by self-assembled nanoparticles folate-linked lipid-based nanoparticles for synthetic sirna delivery in kb tumor xenografts folate-linked nanoparticle-mediated suicide gene therapy in human prostate cancer and nasopharyngeal cancer with herpes simplex virus thymidine kinase antiangiogenic gene therapy with systemically administered sflt-1 plasmid dna in engineered gelatin-based nanovectors quantum-dot based nanoparticles for targeted silencing of her2/neu gene via rna interference heat-inducible tnf-alpha gene therapy combined with hyperthermia using magnetic nanoparticles as a
novel tumor-targeted therapy targeted hyperthermia using magnetite cationic liposomes and an alternating magnetic field in a mouse osteosarcoma model
we greatly appreciate the support from grants mat2009-14695-c04-02 from the ministerio de ciencia e innovación (ministry for science and innovation, spain), fundación mutua madrileña, ibercaja, fundación ramón areces and the 2006 ramón y cajal program. n.v. is supported by program i3sns from the fondo de investigaciones sanitarias (healthcare research fund, spain). m.a. is especially indebted to the graduate students and researchers of ina for providing some of the images in figure 3. we also greatly appreciate the support from the inbiomed (2009/063, xunta de galicia), immunonet (soe1/p1/e014, immunotherapy network, sudoe-feder) and hinamox (7° eu program) projects. we thank josé me, r. esteban and a. esteban for their help in compiling data.
cancers 2011, 3
the authors declare no conflict of interest.
key: cord-011630-lfm34fsw authors: li, yan; shan, yongli; kilaru, gokhul krishna; berto, stefano; wang, guang-zhong; cox, kimberly h; yoo, seung-hee; yang, shuzhang; konopka, genevieve; takahashi, joseph s title: epigenetic inheritance of circadian period in clonal cells date: 2020-05-27 journal: nan doi: 10.7554/elife.54186 sha: doc_id: 11630 cord_uid: lfm34fsw circadian oscillations are generated via transcriptional-translational negative feedback loops. however, individual cells from fibroblast cell lines have heterogeneous rhythms, oscillating independently and with different period lengths. here we showed that heterogeneity in circadian period is heritable and used a multi-omics approach to investigate underlying mechanisms. by examining large-scale phenotype-associated gene expression profiles in hundreds of mouse clonal cell lines, we identified and validated multiple novel candidate genes involved in circadian period determination in the absence of significant genomic variants.
we also discovered differentially co-expressed gene networks that were functionally associated with period length. we further demonstrated that global differential dna methylation bidirectionally regulated these same gene networks. interestingly, we found that depletion of dnmt1 and dnmt3a had opposite effects on circadian period, suggesting non-redundant roles in circadian gene regulation. together, our findings identify novel gene candidates involved in periodicity, and reveal dna methylation as an important regulator of circadian periodicity.
circadian oscillations maintain daily rhythms to control multiple physiological and behavioral processes, including metabolism, cell growth, immune response, and the sleep-wake cycle. disruptions of the circadian clock have been linked with various disease processes and aging (takahashi et al., 2008; kondratova and kondratov, 2012). circadian oscillations display remarkable fidelity in their periodicity even in the absence of environmental cues. this precision of the internal biological clock arises from a complex gene network. in mammals, the core of this network is composed of an autoregulatory transcriptional negative feedback loop involving clock, bmal1, per1/per2, and cry1/cry2, and there are additional feedback loops interlocked with the core (takahashi et al., 2008; mohawk et al., 2012; takahashi, 2017). interestingly, although the cell-autonomous clock is ubiquitous, individual cells often do not maintain a perfect 24 hr circadian period, and within cell populations there are heterogeneous autonomous oscillations with a broad distribution of period length (nagoshi et al., 2004; welsh et al., 2004; leise et al., 2012). the heterogeneity in intrinsic period of hypothalamic suprachiasmatic nucleus (scn) neurons confers important functions of phase lability and phase plasticity (welsh et al., 1995; liu et al., 1997; ko et al., 2010; mohawk et al., 2012).
however, it is still unclear how heterogeneous circadian periodicity is established and maintained under physiological conditions, or how much of this heterogeneity is heritable. the origin of heterogeneity is complex, but may be driven by genetic variation, epigenetic modifications, and/or transcriptional noise (jaenisch and bird, 2003; raser and o'shea, 2005; raj and van oudenaarden, 2008; burrell et al., 2013; kelsey et al., 2017; cavalli and heard, 2019; liu et al., 2019) . we have recently shown that nonheritable noise is the predominant source of intercellular variation in circadian period within clonal cell lines (li et al., 2020) . however, it is still unclear what heritable factors contribute to period variation among different clonal cells. dna methylation has been recognized as a chief contributor to gene expression states, and it is essential for mammalian embryonic development, with genome-wide methylation patterns changing during differentiation (greenberg and bourc'his, 2019) . there are three canonical cytosine-5 dna methyltransferases that catalyze the addition of methylation marks. dnmt3a and 3b, the de novo methyltransferases, set up dna methylation patterns during early development. once established, dnmt1 will copy those patterns onto the daughter strand during dna replication ensuring methylation maintenance (jaenisch and bird, 2003) . dnmt dysfunction has been associated with various diseases, and dnmt-deficient mice exhibit embryonic lethality (greenberg and bourc'his, 2019) . numerous studies have supported the role of dna methylation in gene silencing; however, more recent work suggests that dna methylation can also be involved in transcriptional activation (rinaldi et al., 2016; yin et al., 2017b; harris et al., 2018; lyko, 2018) . interestingly, despite high fidelity in mitotic inheritance, dna methylation is variable across individuals, tissues, and cell types (jaenisch and bird, 2003; jones, 2012; varley et al., 2013) . 
thus, we hypothesized that differential dna methylation could contribute as a heritable factor underlying heterogeneous circadian oscillations in clonal cell lines. here, by examining phenotype-associated high-throughput multi-omics profiles in clonal cell populations, we identified and validated a pool of novel candidate genes regulating circadian period length and uncovered complex gene co-expression networks highly enriched in stress response and metabolic pathways. we next explored the origins of heterogeneous gene expression and found differences in global dna methylation patterns that were associated with both silencing and activation of differentially expressed genes. using gene knockdown studies, we also found that dnmt1 and dnmt3a have opposite effects on period length. together, our findings demonstrate the important role of dna methylation in the regulation of circadian period.
to assess cellular phenotypic heterogeneity, we utilized an immortalized mouse ear fibroblast cell line carrying a per2::lucsv bioluminescence reporter generated from per2::lucsv knockin mice (chen et al., 2012; yoo et al., 2017). we recently showed that these cells express persistent, robust, and cell-autonomous circadian oscillations over a 2-week period. moreover, clonal cell lines generated from the parent culture had period distributions similar to those seen with single cells, indicating that circadian period is a heritable phenotype (figure 1a; li et al., 2020). here, we used the clonal cell lines to address the underlying molecular mechanism for heterogeneous circadian periodicity. to examine the stability of this heritability, twenty clonal cell lines were randomly selected, cultured continuously for 20 passages, and tested for circadian period every five passages. although two-way anova revealed significant effects (p<0.01) of both cell line and passage, there was no significant interaction (p=0.09).
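A two-way ANOVA of this kind splits the total variation in period into cell line, passage, interaction and residual sums of squares. The following is a minimal sketch of that partition, assuming a balanced layout with replicate measurements for every cell line and passage; the function name and the toy periods are illustrative and are not data from the study.

```python
from itertools import product


def variance_partition(data):
    """Percent of total sum of squares per source in a balanced two-way design.

    data[i][j] is a list of replicate period measurements (hours) for
    clonal line i at passage checkpoint j. Every cell must hold the same
    number of replicates (balanced design); this is an assumption of the sketch.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    n = a * b * r
    grand = sum(x for row in data for cell in row for x in cell) / n
    line_mean = [sum(sum(cell) for cell in row) / (b * r) for row in data]
    pass_mean = [sum(sum(data[i][j]) for i in range(a)) / (a * r) for j in range(b)]
    cell_mean = [[sum(cell) / r for cell in row] for row in data]

    ss_line = b * r * sum((m - grand) ** 2 for m in line_mean)
    ss_pass = a * r * sum((m - grand) ** 2 for m in pass_mean)
    ss_inter = r * sum(
        (cell_mean[i][j] - line_mean[i] - pass_mean[j] + grand) ** 2
        for i, j in product(range(a), range(b))
    )
    ss_resid = sum(
        (x - cell_mean[i][j]) ** 2
        for i, j in product(range(a), range(b))
        for x in data[i][j]
    )
    # In a balanced design the four terms add up to the total sum of squares.
    ss_total = ss_line + ss_pass + ss_inter + ss_resid
    return {
        "cell line": 100 * ss_line / ss_total,
        "passage": 100 * ss_pass / ss_total,
        "interaction": 100 * ss_inter / ss_total,
        "residual": 100 * ss_resid / ss_total,
    }


# Toy example: 3 lines x 2 passages x 2 replicates (made-up periods in hours).
toy = [
    [[23.0, 23.2], [23.1, 23.3]],
    [[24.0, 24.2], [24.1, 24.3]],
    [[25.0, 25.2], [25.1, 25.3]],
]
shares = variance_partition(toy)
for source, pct in shares.items():
    print(f"{source}: {pct:.2f}%")
```

In the toy data the lines differ by a full hour while passages differ by 0.1 h, so nearly all of the variation is assigned to cell line. Significance testing (F statistics and p values) is left out of this sketch.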
moreover, cell line was the dominant source of variation (74.70%), while passage only contributed 2.64% of the total variation. multiple comparisons within each clonal cell line across passages identified a significant difference (adjusted p<0.05) for only ~5% of comparisons (11 out of 200), which is consistent with a 5% false-positive rate. these results indicate that circadian period of clonal cell lines is stable and transmissible for at least 20 cell passages (figure 1c). to explore potential underlying mechanisms, we selected two groups of clonal cell lines from the two tails of the period distribution (table 1; five short period (sp) and five long period (lp) clones) (li et al., 2020) and performed rna-seq analysis (figure 2-source data 1). we compared their transcriptomic profiles and identified 5,137 period-correlated differentially expressed (de) genes, with 2,782 genes upregulated and 2,355 genes downregulated in the lp group (figure 2a). to narrow down the target pool further and identify candidate genes more directly responsible for periodicity differences, we selected four additional groups of subclones established from two representative clonal cell lines with different periods: a shorter period subgroup and a longer period subgroup from each of short period clone#33 (ssp and lsp) and long period clone#114 (slp and llp) (figure 2b; li et al., 2020). these subclones and the original 10 clonal cell lines constituted a continuous period spectrum beneficial for identifying period-correlated genes (figure 2c, table 1). we identified 535 additional period-correlated de genes from subclones originating from sp clone#33 and 1,352 additional de genes from subclones originating from lp clone#114 (figure 2d-e, figure 2-source data 1). by comparing the three rna-seq datasets, 67 overlapping de genes were identified (figure 2f).
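The 67 overlapping genes are those that appear in all three period-correlated DE lists. A minimal sketch of that overlap step, assuming each comparison has already been reduced to a set of gene identifiers; the function name and the gene names are placeholders, not genes or code from the study.

```python
def shared_de_genes(*de_sets):
    """Return the genes present in every differential-expression set."""
    if not de_sets:
        return set()
    shared = set(de_sets[0])
    for genes in de_sets[1:]:
        shared &= set(genes)
    return shared


# Placeholder identifiers standing in for the three comparisons:
# SP vs LP clones, clone#33 subclones, clone#114 subclones.
sp_vs_lp = {"gene_a", "gene_b", "gene_c", "gene_d"}
clone33_subclones = {"gene_b", "gene_c", "gene_e"}
clone114_subclones = {"gene_c", "gene_b", "gene_f"}

overlap = shared_de_genes(sp_vs_lp, clone33_subclones, clone114_subclones)
print(sorted(overlap))  # ['gene_b', 'gene_c']
```

A plain intersection ignores the direction of change. A stricter variant would also require a gene to be correlated with period in the same direction in every comparison.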
from these, we selected 14 genes based on the strength of the correlation between their expression and circadian period length from all 88 samples and performed knockdown experiments to validate their function in circadian periodicity. out of 7 positively correlated de genes, knockdown of ak3 and trim3 significantly shortened period, whereas knockdown of cpeb1, lrrfip1, rbfa, and dars lengthened period ( figure 3a -c, figure 3 -source data 1). out of 7 negatively correlated de genes, knockdown of ipo13 and tmem165 significantly lengthened period, whereas slc8a3, jun, med23, and cpa4 knockdown shortened period ( figure 3d -f, figure 3-source data 1). knockdown of two other genes, eif4e2 and rfx5, did not alter period length. we also examined the effect of knockdown of five representative genes in 10 clonal cell lines and found that they all showed the same period alterations as that seen in the parent culture . these results suggest that multiple genes function together to determine circadian period length and that there were no unique (clone-specific) effects on the direction (long or short) of the period changes. since the majority of the de genes identified here have never been reported as having effects on circadian period, these data provide a new pool of candidate genes functioning in circadian periodicity. large-scale gene networks are associated with period heterogeneity because functionally related genes are usually co-expressed (heyer et al., 1999) , we further characterized the period-correlated de genes by examining their co-expression patterns. using weighted gene co-expression network analysis (wgcna), we generated 31 modules from the 10 clonal cell lines rna-seq data ( figure 4a , figure 4 -source data 1). several modules exhibited significant enrichment for period-correlated de genes. 
blue, lightgreen, green and darkred modules were enriched for positively correlated de genes, while salmon, pink, red, and darkgreen modules were enriched for negatively correlated de genes ( figure 4b ). ingenuity pathway analysis (ipa) revealed stress response signaling pathways and metabolic pathways were associated with the period-correlated de genes, suggesting their important roles in circadian periodicity ( figure 4c , figure 4 -source data 1). ipa analysis of the correlated modules also revealed overlapping functional pathways. for example, the blue module is highly enriched for de genes, and is also enriched for the eif2 signaling pathway, which has been recently shown to regulate circadian period, consistent with the predicted elevated translational activity in lp group (pathak et al., 2019; figure 4c ). to validate these results further, we used two different small molecules to activate the eif2 signaling pathway in parent culture and observed significantly shortened period, consistent with what has been previously reported (pathak et al., 2019; figure 4d ). in addition, the darkred module was enriched for the mtor signaling pathway; the green and salmon modules were enriched for the protein ubiquitination pathway; and the pink module was enriched for nrf2-mediated oxidative stress response pathway (figure 4 -source data 1). interestingly, all three of these pathways have been shown to be functional in circadian periodicity, further confirming the functional importance of the co-expressed gene networks (stojkovic et al., 2014; ramanathan et al., 2018; wible et al., 2018) . further analysis of protein-protein interactions (ppi) revealed that co-expressed de genes were also physically interconnected. 
for example, within the blue module there were several different tightly linked clusters, including those enriched for ribosomal rna processing, protein ubiquitination, nucleotide and amino acid metabolism, and mrna splicing, emphasizing the blue module as a transcriptional/translational related gene network ( figure 4e ). taken together, our results suggest that period heterogeneity is regulated by changes in large-scale functional gene co-expression networks. to explore whether there was a genetic basis for heterogeneous gene expression, we performed whole-exome sequencing on sp clone#33 and lp clone#114. interestingly, only four annotated genes carrying unique variants were identified (supplementary file 1), but 2 of them are not expressed (figure 2 -source data 1), and none of them have known circadian functions, suggesting that somatic mutations are unlikely to underlie the heterogeneous period distributions. cell-to-cell variability is also partially heritable via epigenetic modifications such as dna methylation (jaenisch and bird, 2003; jones, 2012) . to assess the contribution of dna methylation in heterogeneous circadian periodicity, we used reduced representation bisulfite sequencing (rrbs) to explore dna methylation profiles and their correlation with the period-correlated transcriptomes. using 1,000 bp tiling windows genome-wide, we identified 16,520 significant differentially methylated regions (dmrs). importantly, none of the core clock genes, even the few that were differentially expressed in the parental lines, had coding mutations or differential dna methylation, except for a small dmr spanning~10 nucleotides located in exon 1 of per1 (table 2) . of the dmrs found, 62% (10,212 dmrs) were up-regulated, whereas 38% (6,308 dmrs) were down-regulated in the sp group ( figure 5a , figure 5 -source data 1). 
6055 genes were annotated as dmr-associated with dmrs falling in either the gene body or 5 kb upstream of the transcription start site (tss), and of these, 1,315 dmr-associated genes overlapped with period-correlated de genes ( figure 5b ). interestingly, for period-correlated de genes associated with dmrs, in addition to negative correlations, we also observed positively correlated dmrs, indicating both repression and enhancement of functional (figure 5c -d) as reported by others (jones, 2012; rinaldi et al., 2016; yin et al., 2017b; harris et al., 2018) . the overall clustering pattern of the methylomes resembled that of the transcriptomes, indicating an important role for global dna methylation in regulating the co-expressed genes ( figure 5-figure supplement 1) . we examined the modules enriched for period-correlated de genes and found that several hub genes were regulated by differential dna methylation. for example, the hub gene of the blue module, htatip2, which exhibited the same expression pattern of the module eigengene ( figure 6a-b) , was hypermethylated at the promoter region and repressed in the sp group ( figure 6c-d) . on the contrary, parvb and rftn1, two hub genes from negatively correlated modules, were hypermethylated and repressed in the lp group ( figure 6a-d) . except for these negative correlations, some genes with hypermethylation in the gene body or enhancer showed enhanced expression levels ( figure 6-figure supplement 1) , supporting recent findings that dna methylation in these regions may activate gene expression (jones, 2012; rinaldi et al., 2016; yin et al., 2017b) . to validate the function of dmr-associated de genes further, we also performed gene knockdown experiments in two different clonal cell lines. 
knockdown of htatip2 and dusp18 in lp clone#114 significantly shortened period, whereas knockdown of rftn1 in sp clone#128 significantly lengthened circadian period ( figure 6e) , consistent with predictions that deficiency of htatip2 shortens circadian period possibly by activating the akt/mtor signaling pathway (yin et al., 2017a ; ramanathan et al., 2018) , and that hypomethylation and upregulated expression of dusp18 lengthens circadian period, possibly by inhibiting the sapk/jnk signaling pathway (wu et al., 2006; chansard et al., 2007; yoshitane et al., 2012) . to assess the role of dna methylation in circadian periodicity further, we manipulated global dna methylation either by knocking down dna methyltransferases or by applying small molecule inhibitors. interestingly, deficiency of dnmt1 significantly shortened period length, whereas knockdown of dnmt3a slightly, but significantly, lengthened period ( figure 7a -b). dnmt1 and dnmt3a knockdown in the ten clonal cell lines showed the same overall results, suggesting that dna methylation affects circadian periodicity in the same way in all clones tested ( figure 7c ). as pharmacological validation, administration of sgi-1027, which selectively induces degradation of the dnmt1 protein (datta et al., 2009) , significantly shortened period, while administration of zebularine, which induces significant reduction of both dnmt1 and dnmt3a (billam et al., 2010; you and park, 2012) , lengthened period ( figure 7d ). drug administration in primary mef cells with per2::lucsv and nih3t3 cells carrying an e2-box-luc reporter also revealed similar results ( figure 7e ). taken together, these findings suggest that different dna methyltransferases contribute to the regulation of circadian periodicity, likely via different mechanisms. using clonal cell analysis, we show that the heterogeneity of single-cell circadian periodicity is heritable and stable for at least 20 cell passages. 
the heritability of circadian period is consistent with an epigenetic mechanism, likely mediated by dna methylation. by analyzing gene expression profiles of multiple clonal cell lines with different circadian periods, we identified groups of differentially expressed genes that were significantly correlated with period length. although a few core clock genes were differentially expressed in parental cultures, there were no significant differences in these genes among subclones, suggesting they are not responsible for the period heterogeneity seen in these homogeneous cell populations. by comparing subclones, we narrowed down the common candidate gene list and further validated that 86% of the novel candidates regulated circadian period using gene knockdown assays. while some of these genes had effects on period length that were aligned with our predictions, others had effects counter to our expectations which were probably masked in the complex gene networks. overall, our results are consistent with the hypothesis that period is determined by the ensemble interactions of many genes that can either shorten or lengthen period individually. importantly, the vast majority of the de genes identified here have never been reported as having effects on circadian period. thus, we have provided a new pool of candidate genes functioning in circadian periodicity. we also provide evidence that the genome-wide dna methylation landscape underlies much of the complex gene networks. multiple hub genes of period-correlated modules were under the regulation of dna methylation, showing remarkable coherence in dna methylation, gene expression, and circadian phenotype. the similar clustering patterns of transcriptomes and methylomes further suggested an important role of dna methylation in shaping circadian period heterogeneity through regulating large-scale gene networks. 
previous studies have linked dna methylation of core clock genes with different diseases (joska et al., 2014; peng et al., 2019) ; however, the results presented here have revealed how global dna methylation can regulate circadian clock function via genomewide changes in gene expression. our whole exome sequencing failed to detect significant coding mutations, further supporting the role of differential dna methylation in establishing circadian heterogeneity. however, we cannot rule out that genetic variation in regulatory regions, or other epigenetic modifications could be involved. additional experiments will help to understand better the full array of underlying mechanisms regulating circadian period. we observed both negatively and positively correlated dmrs in almost equal proportions, indicating both repression and activation of gene expression by dna methylation and supporting the revised view of the functions of dna methylation (greenberg and bourc'his, 2019) . in addition, we found that knockdown of dnmt1 and dnmt3a had opposite effects on circadian period. it is not surprising that dnmt1 knockdown alters period length, since it is the methyltransferase responsible for dna methylation maintenance through mitotic inheritance (jones, 2012) . however, as dnmt3a is responsible for de novo dna methylation, it is less clear how its knockdown affects circadian period. one possibility is that dnmt3a is also involved in transcriptional activation associated with active enhancers (rinaldi et al., 2016; lyko, 2018) . another possibility is that some genes might undergo dynamic demethylation and de novo methylation since both tet2 and tet3 are expressed at comparable levels to dnmt3a in our cellular system (oh et al., 2018; oh et al., 2019 ; figure 2 source data 1). additional studies targeting dnmt1 and dnmt3a may help to explain the functions of different dnmts in circadian regulation. 
in conclusion, our findings have identified a novel pool of candidate genes involved in circadian period regulation, and have revealed the important role of dna methylation underlying circadian period heterogeneity by bidirectionally regulating large-scale gene co-expression networks. our study not only expands the knowledge about circadian clock regulation, but also may benefit epigenetic research by providing multiple candidate genes repressed or activated by dna methylation. embryos. nih3t3 cells stably expressing per2 e-box (e2)-driven luciferase bioluminescence reporter were established by lentivirus transduction followed by blasticidin selection. our cell line stocks have all tested negative for mycoplasma contamination. for authentication of cell lines, as described below, two clonal cell lines, #33 and #114, were sequenced by whole exome sequencing; and a total of 34 clones and subclones were assessed by rna-seq and were found to be valid. to measure luminescence rhythms from 35 mm culture dishes, confluent cells were synchronized with 100 nm dexamethasone for 2 hr, then changed to hepes-buffered recording medium containing 2% fbs (welsh et al., 2004), and loaded into a lumicycle luminometer (actimetrics). the period was analyzed with lumicycle analysis program (actimetrics). all lumicycle period analysis results shown in this paper were averages of ≥3 experiments. baseline-subtracted signals were exported to excel to generate bioluminescence traces. for single-cell imaging, cells were changed to recording medium containing 2% b27 and 1% fbs without dexamethasone synchronization. an inverted microscope (leica dm irb), housed in a heated lucite chamber custom-engineered to fit around the microscope stage (solent scientific, uk) that kept the cells at a constant 36 °C, was mounted on an anti-vibration table (tmc) and equipped with a 10x objective.
a cooled ccd camera with backside illuminated e2v ccd 42-40, 2048 × 2048 pixel, f-mount adapter, −100 °C cooling (series 600, spectral instruments) was used to capture the luminescence signal at 30 min intervals, with 29.6 min exposure duration, for at least 12 days. 8 × 8 binning was used to increase the signal-to-noise ratio. the bioluminescence signal of each single cell, outlined with a region of interest (roi), was tracked using imagej (schindelin et al., 2012; rueden et al., 2017) with the trackmate plugin (tinevez et al., 2017) and analyzed as described previously (li et al., 2020). for exome sequencing, two clonal cell lines #33 and #114 were sequenced representing short period and long period clones, respectively. genomic dna was purified using a chargeswitch gdna mini tissue kit (invitrogen). libraries were made using the sureselect xt reagent kit (agilent) following the manufacturer's instruction. all reads were mapped to mm10 genome assembly. we used haplotypecaller and unifiedgenotyper from gatk to call variants and the results were the union of both callers. snpeff was used to annotate variants. results were further filtered as follows: threshold gq ≥ 20, total counts ≥ 8, and alternate frequency (defined as the ratio of alternates to total counts) ≥ 30%. for rna-seq, cells were collected at two time-points after synchronization: the first peak (t1) and the following trough (t2) based on lumicycle recording. at each time-point, we collected 2 replicates for 10 clonal cell lines and 1 replicate for 24 subclones. rna was isolated using trizol (life technologies), and libraries were prepared as described previously (takahashi et al., 2015). raw reads were tested for quality using fastqc. the resulting reads were mapped to mm10 annotation from ucsc using tophat (trapnell et al., 2009).
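The variant filter described above can be illustrated with a minimal sketch. It assumes that "total counts" means reference-supporting plus alternate-supporting reads and that every threshold is an inclusive lower bound; the `VariantCall` record and `passes_filters` function are hypothetical names for illustration, not part of gatk or snpeff.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-variant record (not a gatk data structure)."""
    gq: int         # genotype quality
    ref_count: int  # reads supporting the reference allele
    alt_count: int  # reads supporting the alternate allele


def passes_filters(call, min_gq=20, min_total=8, min_alt_freq=0.30):
    """Return True if the call meets all three thresholds.

    Assumption: total counts = ref_count + alt_count, and alternate
    frequency = alt_count / total counts.
    """
    total = call.ref_count + call.alt_count
    if total == 0:
        return False
    alt_freq = call.alt_count / total
    return call.gq >= min_gq and total >= min_total and alt_freq >= min_alt_freq
```

A call with gq 30, 5 reference reads and 5 alternate reads passes, whereas the same call with only 2 alternate reads out of 9 fails on alternate frequency.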
the output bam file was then filtered for uniquely mapped reads using samtools (li et al., 2009), and rpkm calculations were performed using analyzerepeats.pl of homer suite (heinz et al., 2010). the average rpkm value for each gene was calculated separately for each of the six groups (sp, lp, ssp, lsp, slp, llp). to identify significant de genes, the list was further filtered based on expression level. only genes for which the maximum average rpkm value among six groups was greater than 0.5 were preserved. differential gene expression analysis was carried out with deseq2 (love et al., 2014) and edger (robinson et al., 2010) using a raw read counts matrix generated with featurecounts tool (liao et al., 2014). genes with fdr < 0.05 were deemed significant. results from both programs were combined to generate a final de gene list. pearson correlation coefficient between circadian period length and gene expression was calculated across all 88 samples (including replicates and different time-points) in excel. p-value was adjusted using benjamini-hochberg (bh) method, and fdr < 0.05 was considered as significant. the overlaps between significant de genes and period-correlated genes were defined as period-correlated de genes. multidimensional scaling (mds) analysis with euclidean distance was performed using edger. ingenuity pathway analysis (qiagen) was used to identify the pathways associated with period-correlated de genes, using all 22,786 expressed genes (average rpkm >0) as a reference set. for dna methylation sequencing, cells were collected at the first peak (t1) after synchronization. each clone included two replicates. dna was purified using a purelink genomic dna mini kit (invitrogen). libraries were made using the premium reduced representation bisulfite sequencing (rrbs) kit (diagenode) following the manufacturer's instruction. raw reads were tested for quality using fastqc and trimmed with trim galore.
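The period-correlation step (pearson correlation followed by benjamini-hochberg adjustment) can be sketched as follows. The authors report doing this in excel; the pure-python functions `pearson_r` and `bh_adjust` below are illustrative stand-ins written for this example, not the authors' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then count as period-correlated when its adjusted p-value falls below 0.05, matching the fdr threshold stated above.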
the trimmed reads were aligned to mm10 using bismark (krueger and andrews, 2011). the cpg reports from bismark methylation extractor were then analyzed using methylkit (akalin et al., 2012). we used default settings to discard bases that had coverage below 10x and/or more than 99.9th percentile of coverage in each sample. differentially methylated regions (dmr) were identified using a tiling window of 1,000 bp and a step size of 1,000 bp comparing the sp and lp groups. clone#44 was excluded for dmr analysis because of the outlying clustering (figure 5-figure supplement 1). overdispersion correction with fisher's exact test was applied. p-value was adjusted with bh method. dmrs with fdr < 0.05 and methylation difference >25% were considered as significant. genes with significant dmrs located either in the gene body or 5 kb upstream of the transcription start site (tss) were considered as dmr-associated genes. principal component analysis (pca) was performed using methylkit. all sequencing was performed by the utsw mcdermott sequencing core facility. weighted gene co-expression network analysis was performed using wgcna package (langfelder and horvath, 2008). only genes for which the maximum average rpkm value among six groups was greater than 0.5 rpkm were used. a soft-threshold power was automatically calculated to achieve approximate scale-free topology (r² > 0.85). networks were constructed with blockwisemodules function with biweight midcorrelation (bicor). we used cortype = bicor, networktype = signed, tomtype = signed, tomdenom = mean, maxblocksize = 16000, mergingthresh = 0.10, mincorekme = 0.4, minkmetostay = 0.5, reassignthreshold = 1e-10, deepsplit = 4, detectcutheight = 0.999, minmodulesize = 100, power = 26. the modules were then determined using the dynamic tree-cutting algorithm. deep split of 4 was used to split the data more aggressively and create more specific modules.
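Because the tiling window and the step size are both 1,000 bp, the windows used for dmr calling do not overlap. The sketch below shows how a cpg position maps to such a window and how the significance rule could be applied. Both function names are hypothetical, and treating the 25% cutoff as an absolute difference (so that hyper- and hypomethylated regions both qualify) is an assumption, not something the methylkit call is shown to do here.

```python
def tile_index(position, window=1000):
    """0-based index of the non-overlapping window holding a 1-based position."""
    return (position - 1) // window


def is_significant_dmr(fdr, meth_diff_percent, max_fdr=0.05, min_diff=25.0):
    """Apply the stated cutoffs: fdr < 0.05 and methylation difference > 25%.

    Assumption: the difference is compared by absolute value.
    """
    return fdr < max_fdr and abs(meth_diff_percent) > min_diff
```

Positions 1 through 1,000 fall in window 0 and position 1,001 opens window 1, so every cpg contributes to exactly one window.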
spearman's rank correlation was used to compute module eigengene-covariate associations. gene set enrichment applied for module-period-correlated de genes was performed using a fisher's exact test in r with the following parameters: alternative = 'greater', conf.level = 0.99. the ppi network was generated using string without textmining, and the minimum required interaction score was 0.7 (szklarczyk et al., 2019). gene knockdown assay shrna sequences were cloned into plko.1-trc vector (gift from david root, addgene plasmid # 10878) (moffat et al., 2006). scramble shrna (5'-cctaaggttaagtcgccctcg-3') was used as control. lentiviruses were produced using hek293t cells as described previously (huang et al., 2012). viruses were harvested twice after transfection, at 48 and 72 hr, to infect fibroblasts. forty-eight hours after first infection, cells were synchronized and loaded for lumicycle analysis. rna was extracted at the first peak after synchronization to check knockdown efficiency via qpcr. average of three reference genes (gapdh, hprt and ywhaz) served as internal control. see supplementary files 2 and 3 for shrna target sequences and primer sequences, respectively. the eif2 signaling pathway activator halofuginone (sigma-aldrich) was dissolved in dmso as 10 mm stock and used at 50 nm. tunicamycin (sigma-aldrich) was dissolved in dmso as 5 mg/ml stock and used at 5 mg/ml. cells were treated for 4 hr and 6 hr, respectively, before loading for lumicycle analysis. dnmt inhibitor sgi-1027 (sigma-aldrich) was dissolved in dmso as 200 mm stock and used at 10 mm. zebularine (sigma-aldrich) was dissolved in water as 200 mm stock and used at 50 mm or 100 mm. the parent culture was continuously treated for up to 60 days and split when necessary. mefs and nih3t3 cells were treated for 3 days.
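The module enrichment test above is a one-sided fisher's exact test, which reduces to an upper-tail hypergeometric probability on a 2x2 table. The sketch below is a pure-python illustration of that calculation; `fisher_greater` is a name invented for this example and is not the r function the authors used.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided p-value for the 2x2 table [[a, b], [c, d]].

    Rows: in module / not in module. Columns: period-correlated de gene /
    other gene. Returns P(X >= a) under the hypergeometric null with
    fixed margins.
    """
    row1 = a + b          # genes in the module
    col1 = a + c          # period-correlated de genes overall
    n = a + b + c + d     # all genes considered
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)
```

For a table in which all three module genes are de genes and none of the three remaining genes are, the tail contains a single term and the p-value is 1/20.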
methylkit: a comprehensive r package for the analysis of genome-wide dna methylation profiles effects of a novel dna methyltransferase inhibitor zebularine on human breast cancer cells the causes and consequences of genetic heterogeneity in cancer evolution advances in epigenetics link genetics to the environment and disease c-jun n-terminal kinase inhibitor sp600125 modulates the period of mammalian circadian rhythms identification of diverse modulators of central and peripheral circadian clocks by high-throughput chemical screening a new class of quinoline-based dna hypomethylating agents reactivates tumor suppressor genes by blocking dna methyltransferase 1 activity and inducing its degradation the diverse roles of dna methylation in mammalian development and disease a dna methylation reader complex that enhances gene transcription simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities exploring expression data: identification and analysis of coexpressed genes crystal structure of the heterodimeric clock:bmal1 transcriptional activator complex biovenn -a web application for the comparison and visualization of biological lists using area-proportional venn diagrams epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals functions of dna methylation: islands, start sites, gene bodies and beyond regulated dna methylation and the circadian clock: implications in cancer single-cell epigenomics: recording the past and predicting the future emergence of noise-induced oscillations in the central circadian pacemaker the circadian clock and pathology of the ageing brain bismark: a flexible aligner and methylation caller for bisulfite-seq applications wgcna: an r package for weighted correlation network analysis persistent cell-autonomous circadian oscillations in fibroblasts revealed by six-week single-cell imaging of per2::luc 
bioluminescence the sequence alignment/map format and samtools noise-driven cellular heterogeneity in circadian periodicity featurecounts: an efficient general purpose program for assigning sequence reads to genomic features cellular construction of a circadian clock: period determination in the suprachiasmatic nuclei multi-omic measurements of heterogeneity in hela cells across laboratories moderated estimation of fold change and dispersion for rna-seq data with deseq2 the dna methyltransferase family: a versatile toolkit for epigenetic regulation a lentiviral rnai library for human and mouse genes applied to an arrayed viral high-content screen central and peripheral circadian clocks in mammals circadian gene expression in individual fibroblasts: cell-autonomous and self-sustained oscillators pass time to daughter cells cytosine modifications exhibit circadian oscillations that are involved in epigenetic diversity and aging circadian oscillations of cytosine modification in humans contribute to epigenetic variability, aging, and complex disease the eif2a kinase gcn2 modulates period and rhythmicity of the circadian clock by translational control of atf4 dna methylation of five core circadian genes jointly contributes to glucose metabolism: a gene-set analysis in monozygotic twins nature, nurture, or chance: stochastic gene expression and its consequences mtor signaling regulates central and peripheral circadian clock function noise in gene expression: origins, consequences, and control dnmt3a and dnmt3b associate with enhancers to regulate human epidermal stem cell homeostasis edger: a bioconductor package for differential expression analysis of digital gene expression data imagej2: imagej for the next generation of scientific image data fiji: an open-source platform for biological-image analysis a central role for ubiquitination within a circadian clock protein modification code string v11: protein-protein association networks with increased coverage, supporting 
functional discovery in genome-wide experimental datasets the genetics of mammalian circadian order and disorder: implications for physiology and disease chip-seq and rna-seq methods to study circadian control of transcription in mammals transcriptional architecture of the mammalian circadian clock trackmate: an open and extensible platform for single-particle tracking tophat: discovering splice junctions with rna-seq qqman: an r package for visualizing gwas results using q-q and manhattan plots dynamic dna methylation across diverse human cell lines and tissues individual neurons dissociated from rat suprachiasmatic nucleus express independently phased circadian firing rhythms bioluminescence imaging of individual fibroblasts reveals persistent, independently phased circadian rhythms of clock gene expression nrf2 regulates core and stabilizing circadian clock loops, coupling redox and timekeeping in mus musculus ggplot2: elegant graphics for data analysis dplyr: a grammar of data manipulation dual specificity phosphotase 18, interacting with sapk, dephosphorylates sapk and inhibits sapk/jnk signal pathway in vivo tip30 regulates lipid metabolism in hepatocellular carcinoma by regulating srebp1 through the akt/mtor signaling pathway impact of cytosine methylation on dna binding specificities of human transcription factors period2 3'-utr and microrna-24 regulate circadian rhythms by repressing period2 protein accumulation jnk regulates the photic response of the mammalian circadian clock zebularine inhibits the growth of hela cervical cancer cells via cell cycle arrest and caspase-dependent apoptosis this research was supported by the howard hughes medical institute. all bioinformatics analyses were carried out on stampede2 cluster of tacc at ut austin. the authors would like to thank all takahashi lab members, dr. carla b green, and dr. 
shin yamazaki for helpful discussions, and the mcdermott bioinformatics lab at ut southwestern medical center for their bioinformatics support. jst is an investigator in the howard hughes medical institute. immortalized mouse ear fibroblast cells from male mice carrying per2::lucsv bioluminescence reporter were maintained in dmem (corning) supplemented with 10% fetal bovine serum (fbs). to generate clonal cell lines, cells were diluted and seeded at a density of~30 cells per 96-well plate with conditioned medium. each well was monitored on a daily basis to make sure only single colonies were picked. 20 clonal cell lines were randomly selected and cultured continuously for 20 generations (3 days/generation) to verify stability of circadian period. primary mouse embryonic fibroblast (mef) cells carrying per2::lucsv bioluminescence reporter were isolated from 13.5 day mouse statistical analysis of single-cell imaging was performed with a python code as described previously (li et al., 2020) . student's t-test and two tailed f-test were performed in excel. p-values were adjusted using benjamini-hochberg (bh) method. two-way anova analysis with multiple comparisons via tukey test was performed using graphpad prism. heatmaps for single-cell imaging analysis and gene expression were generated using mev based on z-score. graphpad prism was used to generate heatmaps for t-test and f-test based on log transformed q-value. volcano plot was generated in r using ggplot2 (wickham, 2016) . venn diagrams were generated using biovenn (hulsen et al., 2008) . manhattan plots were generated in r using qqman (turner, 2014) . quadrant plots were generated using dplyr package in r. funding howard hughes medical institute the funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. 
key: cord-003254-yiqdsf9z authors: schlub, timothy e; buchmann, jan p; holmes, edward c title: a simple method to detect candidate overlapping genes in viruses using single genome sequences date: 2018-08-07 journal: mol biol evol doi: 10.1093/molbev/msy155 sha: doc_id: 3254 cord_uid: yiqdsf9z overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. in addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. we introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. we applied this method to 2548 reference rna virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of rna virus replication. gene overlap occurs when two or more genes share the same region of a nucleotide sequence in a genome. this occurs frequently in viruses, especially those with rna genomes, but has also been observed in bacteria and in eukaryotes including humans (smith et al. 1977; keese and gibbs 1992; veeramachaneni et al. 2004; nakayama et al. 2007 ). 
the high prevalence of gene overlap in viruses has been attributed to two complementary theories: gene "compression" and gene "novelty." compression theory argues that the size of viral genomes is constrained by factors such as high mutation rates and the small capsid structure housing the genetic material. this constrained genome size subsequently exerts selection pressure on genes to overlap to maximize genetic potential (belshaw et al. 2007; chirico et al. 2010). gene novelty theory asserts that the constrained nature of viral genomes, combined with their limited noncoding regions, makes the generation of new genes difficult without major changes in genomic structure or input from the host genome. mutations in current genes that generate a new open reading frame (orf) then allow the generation of new genes within an established older gene in a process called "overprinting" (keese and gibbs 1992; sabath et al. 2012; brandes and linial 2016). these theories are not mutually exclusive and both processes may be operating in virus genomes. overlapping genes may also function as a mechanism for regulating gene expression and reduce the probability of mutation fixation in overlapping areas as the resident genes may have competing selection pressures (krakauer 2000; dreher and miller 2006). due to these evolutionary constraints, overlapping genes frequently encode proteins with accessory functions that play important roles in pathogenicity or spread (rancurel et al. 2009). overlapping genes were first detected following the discovery that the cumulative length of protein sequences in bacteriophage φx174 exceeded the length of the genome (barrell et al. 1976). today, the detection of overlapping genes still largely relies on laboratory methods that isolate, sequence, and align individual proteins to reference genomes (fellner et al. 2015). these and other potential laboratory methods such as ribosome profiling (irigoyen et al.
2016) are costly and time intensive, making large scale screening and identification of overlapping genes expensive. these factors have led to the development of bioinformatics and theoretical methods for the analysis of overlapping genes that rely on genome sequence analyses alone. for example, synonymous sites that exhibit a reduced nucleotide substitution rate are indicative of functional overlapping proteins; because these substitutions affect two proteins they are usually expected to be deleterious and hence are observed at a reduced rate (firth and brown 2005, 2006; jagger et al. 2012).
© the author(s) 2018. published by oxford university press on behalf of the society for molecular biology and evolution. this is an open access article distributed under the terms of the creative commons attribution non-commercial license (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. for commercial re-use, please contact journals.permissions@oup.com
a number of other properties of overlapping genes have been used as effective bioinformatics markers, such as synonymous codon dissimilarity between newly generated overlapping genes and the remainder of the genome (pavesi et al. 1997, 2013), and the restriction that particular codon sequence orders place on alternative reading frames (lebre and gascuel 2017). for example, the reverse complementary nucleotide sequence for two adjacent tyrosines (tat/c and tat/c) will be a/gta and a/gta, which always creates a stop codon (either a taa or tag) after a reading frame shift of 1 nucleotide. although these properties help in the development of bioinformatics techniques to discover unknown overlapping genes, they are restricted by their requirement for multiple genomic sequences or by their poor sensitivity. 
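the tyrosine example can be checked directly. the following python sketch is an illustration only (the helper names are hypothetical and not from the paper): it enumerates the four codon pairs that encode two adjacent tyrosines, takes the reverse complement of each, and reads the codon obtained after a shift of 1 nucleotide.

```python
from itertools import product

COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}
STOP_CODONS = {"taa", "tag", "tga"}


def reverse_complement(seq):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codon_after_one_shift(seq):
    """Codon read on the reverse complement after a shift of 1 nucleotide."""
    return reverse_complement(seq)[1:4]


# Tyrosine is encoded by tat or tac, so two adjacent tyrosines give 4 pairs.
results = {}
for first, second in product(("tat", "tac"), repeat=2):
    results[first + second] = codon_after_one_shift(first + second)

always_stop = all(codon in STOP_CODONS for codon in results.values())
```

in this sketch every pair gives taa or tag in the shifted antisense frame, which is the constraint on alternative reading frames described above.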
with the rapid rise of metagenomics to discover new viruses (bekal et al. 2011; ballinger et al. 2014; shi et al. 2016, 2018), efficient and sensitive approaches for identifying overlapping genes that require genome sequence information alone will be essential. herein, we present a new statistical method for detecting overlapping genes in different reading frames that relies on only a single nucleotide sequence of a gene or genome. we apply this method to a large scale computational screening of all available (linear) rna virus genomes. the method estimates the theoretical expected length of orfs before a stop codon is reached in all reading frames within an established gene. if an orf exists of much greater length than predicted by this expected length, then we surmise that there has been selection against the accumulation of stop codons that shorten the putative orf. we conclude that this constitutes evidence that the orf in question provides functional benefit to the virus. despite its simplicity, we show that this method is a powerful way to detect functional overlapping genes that can be readily applied to large scale computational screening of all known viruses and to viruses newly discovered through metagenomics. our rationale is that overlapping orfs with functional benefit will result in negative selection against nucleotide substitutions that introduce stop codons within that gene. accordingly, the length of orfs (as measured by the distance between bookend stop codons) is likely to be larger when they are functional than what would be expected by random chance alone where stop codons could be introduced without penalty. hence, the defining characteristic of our method for the detection of overlapping genes is identifying orfs larger than expected by chance alone (fig. 1a). we developed three tests for estimating the distribution of expected orf lengths. 
briefly, in the first test, we estimate the expected length of orfs by permuting codon positions in the original reading frame and then measuring orf lengths in other reading frames. this process is repeated to generate an expected distribution of orf lengths (codon permutation test, fig. 1b). in the second test, instead of permuting codon positions, the codon order is unchanged and nucleotide substitutions that would introduce synonymous mutations in the original reading frame are randomly generated (synonymous mutation test, fig. 1c), before measuring orf lengths in the other reading frames. in the third test, referred to as the combined test, the p values for both the codon permutation test and the synonymous mutation test must fall below some cut-off value.
fig. 1. the expected orf length based on codon composition is calculated. orfs longer than expected by random chance are identified. the expected orf length is estimated by one of three tests. for the codon permutation test (b) the codon sequence on the original frame is permuted and orf lengths on alternative reading frames measured for each permutation. for the synonymous mutation test (c), codons that preserve the original amino acid sequence are randomly generated and the length of orfs on alternative reading frames subsequently measured (note that codon replacement is not restricted to the example mutations shown in the figure, all of which occur in the third nucleotide positions, and that codon replacement with the original codon is also possible). the third test requires both the codon permutation test and the synonymous mutation test p values to be below some cut-off value.
to demonstrate the applicability of this method, we first considered andean potato latent virus that contains a known overlapping gene. andean potato latent virus is a positive-sense single-stranded rna virus (family tymoviridae) in which the i870_gp1 gene, that encodes the putative movement protein, is overlapping. i870_gp1 is 665 codons long, is located on frame +2, and is largely contained within the larger (1832 codon) i870_gp2 gene that encodes the enzymes necessary for virus replication (methyltransferase, endopeptidase, helicase, and polymerase) (fig. 2a). we calculated the distribution of expected orf lengths on frames +1 and +2 using the codon permutation test. the distributions of these lengths are shown with the shading in figure 2b and c. the actual orf lengths on frames +1 and +2 on the unpermuted i870_gp2 gene are represented by black dots on top of the theoretical distribution. for frame +1, we observe 37 orfs, all of which have lengths within expected ranges (p = 0.92). a different picture is observed in frame +2. although there are 62 orfs, 61 of which have lengths within the expected range, there is a single orf whose length far exceeds the expected distribution of lengths (p < 0.0001); this is correctly identified as i870_gp1. the synonymous mutation test produces similar results in this example. to explore the possibility of using this method to screen for candidate overlapping orfs, we calculated both the sensitivity and false discovery rate of the codon permutation test, the synonymous mutation test, and a combined test that requires an orf to be larger than expected by both the codon permutation and synonymous mutation test. as there are too few coding regions within a single genome to estimate the sensitivity and false discovery rate with sufficient precision, we estimated the population sensitivity and false positive rate across a subset of viruses (linear rna viruses) known to contain many overlapping genes (see materials and methods section). accordingly, whole genome sequences were downloaded from 2548 reference linear rna viruses available on genbank; this produced a total of 6408 coding regions that were used to estimate the sensitivity and false discovery rate of each test. 
the codon permutation, synonymous mutation and combined tests all rely on detecting overlapping orfs that are larger than expected by random chance. consequently, the sensitivity of these tests will depend on how much of a gene is overlapping (denoted as overlap length). the sensitivity and false discovery rate will also be dependent on the p value cut-offs used to determine if an orf is larger than expected by random chance, with higher p values providing higher sensitivity at a cost of greater false discovery. to understand these dependencies, receiver operating characteristic (roc) curves were generated across a range of p values, and across a range of overlapping gene lengths for all three tests (fig. 3). the number of true overlapping orfs used in this sensitivity set ranged from 958 for overlapping orf lengths greater than 0 nucleotides, to 199 for overlapping orf lengths greater than 300 nucleotides. we find that the three tests (codon permutation, synonymous mutation, and combined) have similar sensitivities for p value cut-offs between 0.001 and 0.10, with the synonymous mutation test generally having the highest sensitivity, followed by the codon permutation test and then the combined test (fig. 3, table 1). however, for all three tests we also find that sensitivities are generally insufficient when the overlapping length is below 50 nucleotides (<17 codons), but improve considerably as the overlapping length increases above 50 nucleotides.
fig. 2. the expected distribution of orf lengths in frame +2 (shaded area) calculated by permutation test, and the actual orf lengths on frame +1 (black dots). the known frameshifted gene, i870_gp1, was clearly identified using the permutation test as its length was much larger than that expected by chance alone (p < 0.0001).
importantly, the three tests show considerable differences in false discovery rates, with the synonymous mutation test showing the highest (worst) rates and the combined test the lowest (best) rates. as the highest and lowest sensitivity tests (synonymous mutation and combined, respectively) are also the tests with the corresponding highest and lowest false discovery rates, we used the standard measure of a diagnostic tool, the area under the curve, to compare which test gave the best sensitivity and false discovery rate combinations. the area under the curve here will lie between 0 and 1, with a value of 0.5 indicating a screening tool of no benefit, and a value of 1 indicating a perfect screening tool with no error. accordingly, we find that the combined test consistently has the best sensitivity and false discovery rate combinations across all minimum overlap lengths, with an area under the curve increasing from 0.59 to 0.89 as the minimum overlap length increases from 50 to 300 nucleotides (table 1). this demonstrates that the combined test is a successful screening tool with both high sensitivity and relatively low false discovery. synplot2 is a commonly used bioinformatic approach to identify overlapping genes by detecting reduced variability at synonymous sites (firth 2014). although powerful, this method is necessarily constrained by the requirement for multiple sequences of sufficient diversity to robustly detect overlapping genes. in contrast, our method requires only a single sequence, and can therefore be applied in many situations where synplot2 could not be used. quantitative comparisons of sensitivity and false discovery between the methods are difficult, as the factors associated with sensitivity in synplot2 (sequence diversity, recombination, and window size) are not present in our method. 
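the interpretation of the area under the curve given above (0.5 for a tool of no benefit, 1 for a perfect tool) can be illustrated with a trapezoidal calculation. this is a minimal python sketch, not the authors' r code: the function name is hypothetical, and it assumes each point is an (error rate, sensitivity) pair obtained at one p value cut-off, with both axes scaled to the range 0 to 1.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) pairs with x in [0, 1]."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A diagonal curve: sensitivity rises only as fast as the error rate.
no_benefit = area_under_curve([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)])

# A curve that reaches full sensitivity with no error.
perfect = area_under_curve([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

the two toy curves are made up and only mark the end points of the scale; the values reported in table 1 come from the real screening data.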
therefore, to make this comparison informative, we apply our method to the synplot2 validation data set (table 1 from firth 2014) and report the results in table 2. this validation consists of 21 gene overlaps with a minimum overlap length of 108 nucleotides. we find that using a p value cut-off of 0.01, the codon permutation method, synonymous mutation method, and combined approach detect 12, 12, and 10 of the gene overlaps, respectively. these results are in agreement with our previous sensitivity estimates. for example, figure 3 shows that the combined approach will have ~50% sensitivity for overlaps of at least 100 nucleotides when a p value cut-off of 0.01 is used. we next screened for previously undiscovered overlapping genes by using the combined test and a p value cut-off of 0.001. this cut-off was chosen as only 9.7% of any discoveries are estimated to be false positives (table 1). we find evidence for 40 undocumented functional overlapping orfs within all reference genomes of linear rna viruses. of these 40 orfs, two had been previously described in synplot2's rna screening in 2014 (firth 2014). although the remaining 38 orfs were not annotated within genbank, they were not necessarily undiscovered, as some existed within the ncbi protein databases. to remove these already discovered or hypothesized overlapping orfs, we performed a protein blast search of the 38 undocumented overlapping orfs and found that nine had previously been discovered but were not annotated within the reference genome, thereby leaving 29 newly discovered functional overlapping orfs from our method (table 3, supplementary materials s2 and s5, supplementary material online). of these newly discovered orfs, we would expect approximately three to be false discoveries. to test if we can detect homologs of the 29 newly discovered overlaps in other species, we aligned their protein sequence against the ncbi nt database using tblastn (supplementary material s4, supplementary material online). 
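the counts in the screening above follow from simple arithmetic. as a worked check that uses only the numbers quoted in the text, and assumes the 9.7% false discovery estimate applies uniformly to the final set:

```latex
\begin{align*}
40 - 2 &= 38 && \text{(orfs not already reported by synplot2)} \\
38 - 9 &= 29 && \text{(orfs not found by the protein blast search)} \\
29 \times 0.097 &\approx 2.8 \approx 3 && \text{(expected false discoveries)}
\end{align*}
```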
we filtered the results to only include alignments with a similarity of at least 90% and where the alignment was at least 90% the length of the orf (supplementary material online).
table 1. sensitivity, false discovery, and area under the curve for each test across a range of p value cut-offs and overlapping lengths.
the 29 discovered orfs ranged from 87 to 708 codons in length, with a median and interquartile range of 195.5 (157-279.2) codons; 13 were transcribed in the same direction (sense, frames +1 and +2) as the original gene with 17 coded in the opposite direction on complementary nucleotides (antisense frames -c0, -c1, and -c2, supplementary material s1, supplementary material online). in addition, 18 of the orfs were located completely within their reference coding region, eight lay on the boundary and four encompassed the entire coding region, suggesting that the reference coding region may lie completely within the larger discovered orf. of these discovered orfs, a number are of particular interest and discussed in more detail below. the largest detected orf was 708 codons long and located within nhumirim virus, a positive-sense flavivirus recently isolated from mosquitoes in brazil (pauvolid-correa et al. 2015). unexpectedly, this orf is coded on a reverse complementary reading frame (-c1), which means that unlike the other proteins in this virus, transcription must occur from a negative-sense rna template. this finding invites further investigation of the potential mechanisms by which transcription of reverse complementary reading frames might occur in positive-sense rna viruses. in addition, the 26th codon in this orf is a methionine (a common start codon) suggesting that a large component of the 708 codons may be transcribed. a 173 codon long orf was detected within the phosphoprotein p coding gene of bovine respirovirus virus 3 (single-stranded negative-sense rna virus, family paramyxoviridae). 
this +2 reading frame orf was particularly interesting because although its protein alignment didn't match any bovine respirovirus virus proteins, it did align with v proteins and rna editing derivatives within the phosphoprotein p gene of other paramyxoviruses (galinski et al. 1992; wells and malur 2008). these derivatives may play an important role in virus replication (durbin et al. 1999), virulence (huang et al. 2003), and/or the disruption of interferon expression (roth et al. 2013), and the discovery is in agreement with the claim that all three reading frames in the p gene of bovine respirovirus are expressed (pelet et al. 1991). the method also detected two new orfs of length 160 and 131 codons in the phosphoprotein of another paramyxovirus, tioman virus. although one of these orfs was in the same sense (+2 reading frame) the other was in the reverse complementary frame (-c2). the three smallest orfs detected were in ligustrum necrotic ringspot virus, a positive-sense virus from the betaflexiviridae, okahandja mammarenavirus, a negative-sense virus from the arenaviridae, and simbu orthobunyavirus, a vector-borne negative-sense virus from the peribunyaviridae. these orfs had lengths 105, 91, and 87 codons, and were in frames +1, +2, and -c2, respectively. interestingly, these were three of the four detected orfs that completely encompass their relatively short reference genes, suggesting that the reference gene may be entirely located within the orfs discovered by this method. we present a simple new method that uses a single genome sequence to detect candidates for overlapping genes. the method assumes that functional orfs are longer than expected by random chance as they experience selective pressure against mutations that introduce stop codons. we quantify this by using three ways to estimate the null distribution for orf lengths within each reading frame of a gene, and use the null to identify those orfs significantly longer than predicted by random chance. 
this approach has a number of advantages over current bioinformatics methods to detect overlapping genes. in addition to being simple and quick, it only requires a single genome sequence. this is in contrast to other bioinformatic methods that require multiple sequences to estimate and compare nucleotide or codon diversity. this feature allows the method to be applied much more broadly, both in metagenomics projects where genomes of new viruses are frequently only present in a single copy (bekal et al. 2011; ballinger et al. 2014; shi et al. 2016), and in screening scenarios such as demonstrated herein. the method is best suited to refine regions of the genome that contain candidates for functional overlapping genes, after which these regions can be further tested for functionality with more resource intensive laboratory methods such as protein isolation, ribosomal profiling (michel et al. 2012; ingolia 2016), and studying the effects of introduced knock out mutations (chung et al. 2008). a second important feature of our method is the relatively high sensitivity to detect overlapping genes, whilst maintaining acceptable false discovery rates. this is best achieved by using the combined test where newly detected orfs must be larger than expected by both the codon permutation and synonymous mutation tests. the combined test is advantageous as true positives are readily detected by both tests, so the constraint of requiring both tests to detect the orf has little impact on the sensitivity. however, the combined test does substantially reduce the false positive rate, as false positives detected by one test are frequently excluded by the other. there is also scope to further reduce false discovery by modifying our method, or by imposing post analysis constraints, for example by calculating orf lengths from start codon to stop codon rather than between two stop codons. 
this was not considered for the screening results here due to variation in alternative start codons among viruses, but would be an important optimization in more targeted screening. one caveat to this method (and other bioinformatics approaches) is that sensitivity depends on the size of overlap, with smaller regions of overlap being more difficult to detect. unlike other methods, however, we explicitly calculated the sensitivity for many lengths of overlap and find that a length of at least 50 nucleotides (17 codons) is required before the method becomes effective. as this length increases to 300 nucleotides (100 codons), the method becomes a very powerful diagnostic tool as measured by an area under the curve equal to 0.89. estimating both the sensitivity and the false discovery rate of an overlapping gene detection method is a strength of this work: although sensitivity can be calculated for other methods, false discovery estimation is often neglected and rarely reported due to a lack of negative controls. when it is reported, it is usually based on estimates of type 1 error rates of p values, rather than comparison to a negative control as we have done here. to demonstrate the utility of our method for overlapping gene screening, we individually analyzed all reference linear rna genomes available on genbank. this provided evidence for 29 undocumented overlapping orfs of which we expect only 3 to be false positives, although all should be verified experimentally. one notable orf identified here is the exceptionally long (708 codons) antisense orf in nhumirim virus, a single-stranded positive-sense rna virus from the family flaviviridae, which suggests that this virus may employ a novel method of transcription and clearly merits further investigation. 
we also identified several undiscovered orfs in the phosphoprotein p within the paramyxoviridae family, a region known to frequently contain overlapping genes in other reading frames due to rna editing. within bovine respirovirus virus 3, the orf codon sequence discovered here aligned with many v proteins of other members of the paramyxoviridae. as the v protein in these viruses also overlaps with the phosphoprotein p protein, this suggests that the v protein also exists in bovine respirovirus virus 3. in other paramyxoviridae, notably tioman virus, we also identified antisense orfs in the phosphoprotein. the detection of 17 antisense orfs is notable. antisense overlaps have been shown to exist in a number of viruses that use dna as a replication intermediate including those in the herpesviridae (ward et al. 1996), rep/orf3 in porcine circovirus 2 (he et al. 2012) and hbz/p12 and hbz/p30 in human t lymphotropic virus 1 (arnold et al. 2006). they are also suspected to occur in many more viruses with dna intermediaries, including a long suspected antisense protein (asp) in hiv-1 (torresilla et al. 2015; cassan et al. 2016). in addition, they have been infrequently suggested to occur in rna viruses that do not use dna intermediates, such as a more than 100 amino acid (aa) overlapping antisense hypothetical protein in rice black streaked dwarf virus (dsrna) (zhang et al. 2001), a 96 aa overlapping antisense hypothetical protein in lymphocytic choriomeningitis mammarenavirus (-ssrna) (salvato et al. 1989), and a possible 167 aa overlapping antisense orf called "neg8" in human influenza a virus (clifford et al. 2009; sabath et al. 2011). our method can be used to investigate these further. 
for example, in the case of neg8 we find that a 167 codon orf in the neg8 reading frame (-c2) is highly statistically unlikely by both the codon permutation and synonymous mutation methods, providing further evidence for a functional benefit of this orf. interestingly, however, a further frameshift of 1 nucleotide (frame -c1) would make orfs of such lengths much more likely (p = 0.02 and 0.03 for codon permutation and synonymous mutation methods respectively), demonstrating the importance of estimating the expected orf lengths on every individual reading frame, rather than just the sense direction. furthermore, when applying our method to hiv-1, we find that the possible antisense orf (asp) is not substantially longer than expected by chance alone (p = 0.06 and p = 0.04 for codon permutation and synonymous mutation methods, respectively) in that reading frame. our results do indicate that antisense orfs are present at levels higher than currently expected. this does not necessarily mean that a transcribed protein is functional, although its presence may be indicative of some functional benefit, such as regulating expression by diverting ribosomes (pelechano and steinmetz 2013; beltran and garcia de herreros 2016). importantly, our method's high detection of antisense orfs is in contrast to other bioinformatic screening methods, which have been shown by computer simulations to have poor sensitivity to antisense orfs. this is because synonymous mutations in frame +0 impact the reverse complementary frame (specifically frame -c2) much less than other reading frames (mir and schober 2014). although this feature would impact the sensitivity of our synonymous mutation test, as it would for all current methods, the codon permutation test will not be impacted by this, and could be used in isolation when specifically screening for antisense orfs. overlapping genes play an important role in viral evolution (simon-loriere et al. 
2013), and are particularly prevalent in rna viruses with small genomes. however, the study of overlapping genes is limited by detection methods that either have high laboratory costs, or require enough sequences to make reliable substitution rate comparisons. our simple, but powerful, permutation and synonymous mutation method requires only a single genome sequence and is computationally quick to run. these properties make it an ideal choice for identifying candidate orfs in screening situations such as metagenomics viral discovery projects, or when applied to large genome databases such as we have done here. whole genome sequences were downloaded for all viruses available from the ncbi ftp site ftp://ftp.ncbi.nlm.nih.gov/genomes/viruses. of the 5757 viral genomes, 2548 were rna viruses with linear genomes, and these were selected for analysis. all 6408 coding regions (annotated with a cds in genbank) were analyzed, excluding 291 regions annotated with a "join" indicating some form of midsequence frameshift in the established gene such as a ribosomal slippage (leaving 6117 coding regions analyzed). the following notation is used to identify the different reading frames (supplementary material s1, supplementary material online): +0 is the original reading frame; +1 or +2 is a frameshift of 1 or 2 nucleotides, respectively (5' to 3' transcription); -0, -1, or -2 is a frameshift of 0, 1 or 2 nucleotides, respectively, after the coding sequence has been reversed (i.e. 3' to 5' transcription); +c0, +c1 or +c2 is a frameshift of 0, 1 or 2 nucleotides, respectively, on the complement of the coding sequence (3' to 5' transcription); and -c0, -c1 or -c2 is a frameshift of 0, 1 or 2 nucleotides, respectively, on the complement and reversed coding sequence (5' to 3' transcription). +0, +1, +2, -c0, -c1, -c2 are considered the only viable reading frames as transcription on these frames occurs in the 5' to 3' direction. 
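the frame notation can be made concrete with a short python sketch that splits a coding sequence into codons on the six frames read in the 5' to 3' direction. this is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, and the shift on the -c frames is counted from the 5' end of the reverse complement, which may differ from the convention in supplementary material s1.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reading_frames(cds):
    """Map frame labels to codon lists for the six 5'->3' frames of a cds."""
    forward = cds.lower()
    # Complement every base, then reverse: the antisense strand read 5'->3'.
    revcomp = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for label, seq in (("+", forward), ("-c", revcomp)):
        for shift in range(3):
            sub = seq[shift:]
            # Keep complete codons only; a trailing partial codon is dropped.
            frames[f"{label}{shift}"] = [
                sub[i:i + 3] for i in range(0, len(sub) - 2, 3)
            ]
    return frames


frames = reading_frames("atgaaatag")  # made-up 9-nucleotide example
```

in this toy input frame +0 reads atg, aaa, tag, frame +1 reads tga, aat, and frame -c0 reads cta, ttt, cat.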
for each coding region and for each viable alternative reading frame (+1, +2, -c0, -c1, -c2) we performed the following analysis (summarized in fig. 1a). first, the length of orfs between the stop codons "tga," "tag," and "taa" on that specific alternative reading frame was calculated. then, 20,000 new coding sequences were created by either randomly permuting the codons in frame +0 (codon permutation test; fig. 1b), or for each amino acid in reading frame +0, randomly choosing a replacement codon (for which the original codon is a possible candidate) that encodes the same amino acid (synonymous mutation test; fig. 1c). for each of these 20,000 new coding sequences, the length of orfs between stop codons in that alternative reading frame was calculated again. the lengths of orfs over all 20,000 randomly generated coding sequences were pooled to calculate a theoretical distribution of the length of orfs on that specific alternative reading frame. for each orf length l in the original unpermuted coding sequence, the probability of observing a length as large or larger by random chance alone is calculated using this theoretical distribution as follows:

$$\Pr(\text{length} \geq l) = 1 - c(l)$$

where c is the empirical cumulative distribution function of the theoretical distribution of lengths calculated by permuting codons in the original coding sequence: that is, c(l) is the probability of sampling an orf length less than l on that specific reading frame. the value 1 - c(l) has an interpretation similar to that of a p value testing whether or not the length l is sampled from the theoretical distribution of lengths calculated earlier. to correct this "p value" for the total number of alternative reading frame orfs, s, in the original coding sequence, an adjustment analogous to a bonferroni adjustment of p values is applied, here correcting for the number of orfs within a reading frame. 
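the procedure above can be sketched in python for a sense alternative frame. this is a simplified illustration, not the authors' r code: all function names are hypothetical, the toy sequence is randomly generated, 200 randomizations stand in for the 20,000 used in the paper, stop-free runs touching either end of the sequence are counted as orfs (an assumption), and the correction for the number of orfs s is left out because its exact form is not given in the text.

```python
import random
from bisect import bisect_left

STOPS = {"taa", "tag", "tga"}
BASES = "tcag"
# Standard genetic code, codons enumerated in t, c, a, g order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def orf_lengths(seq, shift):
    """Lengths (in codons) of stop-free runs in the frame starting at shift."""
    lengths, run = [], 0
    for i in range(shift, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


def permute_codons(codons, rng):
    """Codon permutation test: shuffle the codon order of frame +0."""
    shuffled = list(codons)
    rng.shuffle(shuffled)
    return shuffled


def synonymous_recode(codons, rng):
    """Synonymous mutation test: redraw each codon among its synonyms."""
    return [rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons]


def null_lengths(codons, shift, randomize, n_iter, rng):
    """Pool orf lengths on the shifted frame over n_iter randomized sequences."""
    pooled = []
    for _ in range(n_iter):
        pooled.extend(orf_lengths("".join(randomize(codons, rng)), shift))
    pooled.sort()
    return pooled


def p_value(length, pooled):
    """1 - c(l), where c(l) is the fraction of null lengths strictly below l."""
    if not pooled:
        return 1.0
    return 1.0 - bisect_left(pooled, length) / len(pooled)


# Toy frame +0 sequence of 120 random non-stop codons (made-up data).
rng = random.Random(0)
sense_codons = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")
codons = [rng.choice(sense_codons) for _ in range(120)]
cds = "".join(codons)

longest = max(orf_lengths(cds, 1), default=0)  # longest orf on frame +1
p_perm = p_value(longest, null_lengths(codons, 1, permute_codons, 200, rng))
p_syn = p_value(longest, null_lengths(codons, 1, synonymous_recode, 200, rng))
combined_hit = p_perm < 0.001 and p_syn < 0.001  # combined test
```

the antisense frames could be handled the same way by measuring orf lengths on the reverse complement of each randomized sequence.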
small p values for an orf are interpreted as evidence that the orf in question is larger than expected by random chance alone and therefore provides evidence that there has been negative selection against mutations that introduce stop codons in this orf. from this, we can also infer that the orf is of functional benefit to the virus. the third test, denoted as the "combined test," requires that the p value for both the codon permutation test and the synonymous mutation test be below some cut-off value. the method was only applied to orfs on alternative reading frames that exist totally or partially within the parent orf. when orfs on alternative reading frames extended beyond the parent orf boundaries, their lengths were truncated to the length contained within the parent orf. this analysis was performed using r (version 3.3.2; r core team 2016) and required packages including seqinr.
the sensitivity (true positive rate) is measured as the proportion of known overlapping genes within the downloaded reference genomes that are detected using our method. an orf identified with our method was considered a true positive if it was located on the same reading frame and overlapped with a gene already annotated in the reference genome. as the sensitivity of our method will be dependent on the extent of overlap, we calculated the sensitivity for detecting previously annotated overlapping genes where the minimum nucleotide length of overlap is 1, 10, 20, 50, 100, 150, 200, and 300 nucleotides. the false discovery rate calculation is more complex as the absence of an annotated overlapping gene does not exclude its biological presence so that distinguishing between false positives and new discoveries is not possible without extensive laboratory work. to overcome this, we conservatively estimated the false discovery rate of our tests by using the nonviable 3'-5' reading frames (-0, -1, -2, +c0, +c1, and +c2, supplementary material s1, supplementary material online) as a negative control. 
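the two performance measures can be sketched as follows in python. the sensitivity function follows the definition in the text. the false discovery estimate is only one possible reading of the negative-control idea, since the text does not give the exact formula: it assumes that the number of detections per nonviable frame approximates the number of false detections per viable frame. all names and the example records are hypothetical.

```python
def sensitivity(known_overlaps, min_overlap):
    """Fraction of annotated overlaps of at least min_overlap nt that were detected."""
    eligible = [o for o in known_overlaps if o["overlap_nt"] >= min_overlap]
    if not eligible:
        return float("nan")
    return sum(1 for o in eligible if o["detected"]) / len(eligible)


def estimated_false_discovery(hits_viable, hits_control, n_viable=5, n_control=6):
    """Assumed estimate: per-frame control detections over per-frame viable detections.

    n_viable counts the viable alternative frames (+1, +2, -c0, -c1, -c2) and
    n_control the nonviable frames (-0, -1, -2, +c0, +c1, +c2).
    """
    if hits_viable == 0:
        return float("nan")
    per_frame_control = hits_control / n_control
    per_frame_viable = hits_viable / n_viable
    return min(1.0, per_frame_control / per_frame_viable)


# Made-up records: one annotated overlapping gene per entry.
known = [
    {"overlap_nt": 30, "detected": False},
    {"overlap_nt": 60, "detected": False},
    {"overlap_nt": 120, "detected": True},
    {"overlap_nt": 400, "detected": True},
]
sens_any = sensitivity(known, 1)
sens_100 = sensitivity(known, 100)
fdr = estimated_false_discovery(hits_viable=50, hits_control=6)
```

with these made-up records the sensitivity rises as the minimum overlap length increases, which is the qualitative pattern reported for the real data.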
that is, as detected orfs on these frames cannot be transcribed into proteins and are therefore false positives, they serve as an estimate of what proportion of detected orfs on viable reading frames are similarly not functional. a search for sequences homologous to the 29 newly discovered overlapping orfs was performed by aligning their protein sequences to the ncbi nt database using tblastn (tblastn 2.6.0+). the e-value threshold was set to 0.001 while all other settings were set to their default values. the results were stored in a sqlite3 database (3.24.0). for homologous sequence detection the alignments were filtered to include only alignments with similarity ≥90% and length ≥90% of the corresponding orf sequence. from each filtered alignment, the ncbi accession for the query (orf) and subject were extracted and the corresponding taxid and lineage obtained using ncbi entrez. a python tool taxmax.py was developed (https://gitlab.com/janpb/taxmax.git) to compare the ncbi lineage from each orf and its aligned sequence. the similarity between two lineages is described as a score between 0 and 1. a score of 0 indicates no similarity between lineages while a score of 1 indicates both sequences have the same ncbi lineage. for each alignment the alignment positions were compared to check for orthologous positions. supplementary data are available at molecular biology and evolution online. 
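the alignment filter described in the homology search (similarity of at least 90% and alignment length of at least 90% of the orf) can be expressed as a small predicate. this python sketch is illustrative only: the record fields and function name are hypothetical and do not reflect the schema of the authors' sqlite3 database or the taxmax.py tool.

```python
def passes_filter(alignment, orf_length, min_similarity=90.0, min_coverage=0.90):
    """True if the alignment meets both the similarity and the length threshold."""
    coverage = alignment["alignment_length"] / orf_length
    return alignment["similarity_pct"] >= min_similarity and coverage >= min_coverage


# Made-up alignments for a 200-residue orf.
alignments = [
    {"subject": "hit_a", "similarity_pct": 95.0, "alignment_length": 190},
    {"subject": "hit_b", "similarity_pct": 95.0, "alignment_length": 150},
    {"subject": "hit_c", "similarity_pct": 80.0, "alignment_length": 200},
]
kept = [a["subject"] for a in alignments if passes_filter(a, orf_length=200)]
```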
enhancement of infectivity and persistence in vivo by hbz, a natural antisense coded protein of htlv-1 discovery and evolution of bunyavirids in arctic phantom midges and ancient bunyavirid-like sequences in insect genomes overlapping genes in bacteriophage phix174 discovery and initial analysis of novel viral genomes in the soybean cyst nematode the evolution of genome compression and genomic novelty in rna viruses antisense non-coding rnas and regulation of gene transcription gene overlapping and size constraints in the viral world concomitant emergence of the antisense protein gene of hiv-1 and of the pandemic why genes overlap in viruses an overlapping essential gene in the potyviridae evidence for a novel gene associated with human influenza a viruses translational control in positive strand rna plant viruses mutations in the c, d, and v open reading frames of human parainfluenza virus type 3 attenuate replication in rodents and primates evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting mapping overlapping functional elements embedded within the protein-coding regions of rna viruses detecting overlapping coding sequences with pairwise alignments detecting overlapping coding sequences in virus genomes rna editing in the phosphoprotein gene of the human parainfluenza virus type 3 analysis of putative orf3 gene within porcine circovirus type 2 newcastle disease virus v protein is associated with viral pathogenesis and functions as an alpha interferon antagonist ribosome footprint profiling of translation throughout the genome highresolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling an overlapping proteincoding region in influenza a virus segment 3 modulates the host response origins of genes: "big bang" or continuous creation? 
stability and evolution of overlapping genes the combinatorics of overlapping genes observation of dually decoded regions of the human genome using ribosome profiling data selection pressure in alternative reading frames overlapping of genes in the human genome nhumirim virus, a novel flavivirus isolated from mosquitoes from the pantanal on the informational content of overlapping genes in prokaryotic and eukaryotic viruses viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of deltaretroviruses gene regulation by antisense transcription the p gene of bovine parainfluenza virus 3 expresses all three reading frames from a single mrna editing site r: a language and environment for statistical computing. version 3.3.2. vienna, austria: r foundation for statistical computing overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation deletion of the d domain of the human parainfluenza virus type 3 (hpiv3) pd protein results in decreased viral rna synthesis and beta interferon (ifn-beta) expression is there a twelfth protein-coding gene in the genome of influenza a? 
a selection-based approach to the detection of overlapping genes in closely related sequences evolution of viral proteins originated de novo by overprinting the primary structure of the lymphocytic choriomeningitis virus l gene encodes a putative rna polymerase redefining the invertebrate rna virosphere meta-transcriptomics and the evolutionary biology of rna viruses the effect of gene overlapping on the rate of rna virus evolution dna sequence at the c termini of the overlapping genes a and b in bacteriophage phi x174 reviving an old hiv-1 gene: the hiv-1 antisense protein mammalian overlapping genes: the comparative perspective a novel herpes simplex virus 1 gene, ul43.5, maps antisense to the ul43 gene and encodes a protein which colocalizes in nuclear structures with capsid proteins expression of human parainfluenza virus type 3 pd protein and intracellular localization in virus infected cells package 'ggplot2' for r. elegant graphics for data analysis molecular characterisation of segments 1 to 6 of rice black-streaked dwarf virus from china provides the complete genome simple method to detect candidate overlapping genes this work was supported by an arc australian laureate fellowship awarded to ech (fl170100022). the authors acknowledge the university of sydney hpc service at the university of sydney for providing high performance computing resources that have contributed to the research results reported within this article. we thank an anonymous reviewer for suggesting the synonymous mutation method. key: cord-102219-d3gkfo7s authors: perzel mandell, kira a.; price, amanda j.; wilton, richard; collado-torres, leonardo; tao, ran; eagles, nicholas j.; szalay, alexander s.; hyde, thomas m.; weinberger, daniel r.; kleinman, joel e.; jaffe, andrew e. 
title: characterizing the dynamic and functional dna methylation landscape in the developing human cortex date: 2019-10-30 journal: biorxiv doi: 10.1101/823781 sha: doc_id: 102219 cord_uid: d3gkfo7s dna methylation (dnam) is a key epigenetic regulator of gene expression across development. the developing prenatal brain is a highly dynamic tissue, but our understanding of key drivers of epigenetic variability across development is limited. we therefore assessed genomic methylation at over 39 million sites in the prenatal cortex using whole genome bisulfite sequencing and found loci and regions in which methylation levels are dynamic across development. we saw that dnam at these loci was associated with nearby gene expression and enriched for enhancer chromatin states in prenatal brain tissue. additionally, these loci were enriched for genes associated with psychiatric disorders and genes involved with neurogenesis. we also found autosomal differences in dnam between the sexes during prenatal development, though these have less clear functional consequences. we lastly confirmed that the dynamic methylation at this critical period is specifically cpg methylation, with very low levels of cph methylation. our findings provide detailed insight into prenatal brain development as well as clues to the pathogenesis of psychiatric traits seen later in life. sites change over time and these changes lead to adjustments in gene expression and splicing. these key regions have also been linked to neurodevelopmental disorders such as schizophrenia, in which early dysregulation plays a vital role 2, 3 . dnam is an attractive epigenetic mechanism to study in post-mortem human brain tissue, because it represents an interaction between genetic and environmental effects. external factors such as changes in diet 4 , exposure to cigarette smoking 5 , and exposure to arsenic 6 have been associated with both global and site-specific changes to dnam levels. 
in order to better understand the causes and consequences of deviant dnam patterns in psychiatric disease development, we must first understand the normal landscape. illuminating typical methylation changes in prenatal development will both provide insight into gene expression and molecular pathways active in the postnatal developing brain, and provide a baseline for identification of aberrant dnam in postnatal disease states, as the pathological changes that lead to symptoms of psychiatric disease may precede the onset of illness by several decades 7 . the dorsolateral prefrontal cortex (dlpfc) is a dynamic region of the brain throughout development, essential for motor planning, conceptual organization, and working memory, and is often functionally dysregulated in patients with schizophrenia 8 . previous studies using microarrays to quantify dnam have revealed that there are many dnam changes in the dlpfc around the time of birth 9 , but the dynamics of the prenatal brain are far less characterized. previous work assessing fetal brain dnam has largely used microarrays, and sampled whole brain rather than a specific region 10 . most cells in the developing brain are neuronal 11 , and it has been shown that neuronal dna methylation is especially dynamic in the earliest stages of postnatal life, making the dlpfc a potentially fruitful region for deeper interrogation in prenatal samples 1 . in the present report, we describe the use of whole genome bisulfite sequencing (wgbs) to capture an unbiased map of the dnam landscape, and to characterize both cpg and cph methylation during prenatal brain development. dynamic regions of dnam even in the early developmental context are likely to be associated with psychiatric risk-associated genes, and connection to gene expression data can validate the importance of these genomic regions. 
these important regions will help point to pathways and mechanisms of normal brain development as well as psychiatric and neurodevelopmental conditions. study samples: brain tissue from these second-trimester prenatal samples was obtained via a material transfer agreement with the national institute of child health and human development brain and tissue bank. all specimens were flash-frozen, then brain ph was measured and postmortem interval (pmi, in hours) was calculated for every sample. postmortem tissue homogenates of the dorsolateral prefrontal cortex (dlpfc) were obtained from all subjects. samples were obtained from the developing prefrontal cortex from the dorsolateral convexity of the frontal lobe half-way between the frontal pole and temporal pole, 4 mm lateral to the central sulcus. specimens extended from the surface of the brain to the ventricular zone. there were 10 male and 10 female subjects; 17 subjects were of african american ancestry and 3 were of european ancestry (table s1). data generation: genomic dna was extracted from 100 mg of pulverized dlpfc tissue with the phenol-chloroform method. dna was subjected to bisulfite conversion followed by sequencing library preparation using the truseq dna methylation kit from illumina. lambda dna was spiked in prior to bisulfite conversion to assess the conversion rate, and we used 20% phix to better calibrate illumina base calling on these lower complexity libraries. resulting libraries were pooled and sequenced on an illumina hiseq x ten sequencer with paired-end 150 bp reads (2x150 bp), targeting 90 gb per sample. this corresponds to 30x coverage of the human genome, as extra reads were generated to account for the addition of phix. 
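the relation between the sequencing target and the stated coverage can be checked with a short calculation. the sketch below assumes a round human genome size of 3 gb and treats the 90 gb target as human-derived bases; neither assumption is stated in the text, and the constant and function names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # assumption: round figure for the human genome
TARGET_BASES = 90_000_000_000    # 90 gb per sample, from the text
READ_LENGTH = 150                # paired-end 150 bp reads, from the text


def fold_coverage(bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return bases / genome_size


def read_pairs_needed(bases, read_length):
    """Number of read pairs that together yield the requested bases."""
    return bases // (2 * read_length)


coverage = fold_coverage(TARGET_BASES, GENOME_SIZE_BP)
pairs = read_pairs_needed(TARGET_BASES, READ_LENGTH)
```

under these assumptions the target works out to 30x and 300 million read pairs per sample; a larger assumed genome size would give a slightly lower fold coverage.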
data processing: the raw wgbs data was processed using fastqc to control for quality of reads 12 , trim galore to trim reads and remove adapter content 13 , arioc for alignment to the grch38.p12 genome (obtained from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/000/001/405/gca_000001405.27_grch38.p12/gca_000001405.27_grch38.p12_assembly_structure/primary_assembly/assembled_chromosomes/) 14 , samblaster to remove duplicate alignments 15 , and the bismark methylation extractor to extract methylation data from the sequencing data 16 . we then used the bsseq r/bioconductor package (v1.18) to process and combine the dna methylation proportions across the samples for all further manipulation and analysis 17 . after initial data metrics were calculated, the methylation data was smoothed using bsmooth for modelling. cpgs were filtered to those that had ≥ 3 coverage in all samples, and cphs were filtered to those that had ≥ 3 coverage and non-zero methylation in at least half (≥10) of the samples. comparison to 450k: in order to compare wgbs methylation levels to 450k methylation levels, we used data from the same samples generated with the two methods, and compared their methylation levels at the same sites both graphically ( figure 1d ) and by assessing the mean differences and root-mean-square deviation ( figure 1c ). we then validated our model's findings by applying the same model to the microarray data at overlapping sites; we considered a locus significant at fdr < 0.05 and validated if its association had p < 0.05 in the validating set ( figure s3 ). gene set enrichment: we annotated our data using gencode v. 29 on hg38. we performed gene ontology and gene set enrichment using clusterprofiler (v3.12) 18 with a p-value cutoff of 0.01 and q-value cutoff of 0.04. we used sfari 2.0 19 for an autism spectrum disorder gene set, and a set of clinical gene sets defined by birnbaum et al for other neuropsychiatric and neurodevelopmental disorders 20 . 
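the coverage filters described under data processing can be sketched per site as below. this is one reading of the text, not the bsseq code the authors ran: in particular, the cph rule is interpreted here as requiring both ≥ 3 coverage and non-zero methylation in the same sample, for at least half of the samples. function and argument names are hypothetical.

```python
def keep_cpg(coverage, min_cov=3):
    """Keep a cpg only if every sample has at least min_cov reads."""
    return all(c >= min_cov for c in coverage)


def keep_cph(coverage, methylated, min_cov=3, min_fraction=0.5):
    """Keep a cph if enough samples have both coverage and methylated reads.

    coverage and methylated are per-sample read counts for one site.
    """
    passing = sum(
        1 for c, m in zip(coverage, methylated) if c >= min_cov and m > 0
    )
    return passing >= min_fraction * len(coverage)


# toy per-site counts for four samples, invented for the example
cpg_ok = keep_cpg([3, 5, 4, 9])
cpg_low = keep_cpg([3, 2, 4, 9])
cph_ok = keep_cph([4, 4, 4, 4], [1, 0, 2, 0])
cph_low = keep_cph([4, 2, 4, 4], [1, 1, 0, 0])
```

with 20 samples, `min_fraction=0.5` reproduces the "at least half (≥10)" threshold given in the text.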
enrichment was calculated on a background of genes expressed in our samples, to avoid brain bias. we performed ld score regression as described by and with data from finucane et al 21 . data analysis: cell composition of samples was deconvoluted using flow-sorted dlpfc microarray data from fluorescence-activated nuclear sorted neurons and non-neurons 22 and r package minfi 23 . linear regression modelling was performed accounting for sex, age, and embryonic stem cell content. sex-related cpgs and their surrounding sequence were re-aligned to the genome with bowtie2 to check for homology. association was tested using limma (v3.30) 24,25 to create a linear regression model accounting for sex, age, and embryonic stem cell content. the homogenate rna-seq samples were also part of a larger study of rna-seq data from homogenate dlpfc tissue (brainseq consortium phase i) 26 . we compared a site's methylation or mean methylation (for dmrs) to the expression level from rna-seq using a linear model adjusting for the strongest covariates, including sex, age, and estimated embryonic stem cell (esc) content. section 1: whole genome bisulfite sequencing in the prenatal human cortex we performed whole genome bisulfite sequencing (wgbs) to better characterize the shifting dnam landscape in the developing human dorsolateral prefrontal cortex (dlpfc) in 20 prenatal samples during the second trimester in utero (table s1 ). after data processing and quality control (see methods), we analyzed 28,612,402 cpg and 13,166,520 cph (h=a, t, or c) sites across the epigenome. we first focused on cpg dnam and performed a series of quality control analyses. we quantified read coverage at every cpg, which after processing and alignment resulted in an average of 17.7 reads per cpg ( figure 1a ). most of these sites are methylated (>80% dnam) and a minority are unmethylated (<20% dnam) (table s2) . 
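the per-site linear regression described under data analysis (methylation modelled on age while accounting for sex and estimated esc content) can be illustrated with ordinary least squares. the sketch below is a plain-python stand-in for the limma model fit, which additionally moderates variances; the toy values and the simulated site are invented, and `ols` is a hypothetical helper.

```python
def ols(design, response):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n_coef = len(design[0])
    # build the augmented matrix [X'X | X'y]
    aug = []
    for i in range(n_coef):
        row = [sum(r[i] * r[j] for r in design) for j in range(n_coef)]
        row.append(sum(r[i] * y for r, y in zip(design, response)))
        aug.append(row)
    # gauss-jordan elimination with partial pivoting
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_coef):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n_coef] / aug[i][i] for i in range(n_coef)]


# toy covariates for seven samples (invented): age in pcw, sex, esc fraction
age = [14, 15, 16, 17, 18, 19, 20]
sex = [0, 1, 0, 1, 1, 0, 1]
esc = [0.30, 0.25, 0.28, 0.20, 0.15, 0.18, 0.10]

# design matrix columns: intercept, age, sex, esc
design = [[1.0, a, s, e] for a, s, e in zip(age, sex, esc)]

# simulated methylation proportion gaining 1.8% per week, without noise
methylation = [0.2 + 0.018 * a + 0.05 * s - 0.1 * e
               for a, s, e in zip(age, sex, esc)]

coef = ols(design, methylation)   # coef[1] is the per-week age effect
```

because the simulated site has no noise, the fit recovers the age slope of 0.018 per week; on real data each site would also need a standard error and test statistic for the age term.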
we then performed principal component analysis (pca) on the dnam levels across these high-coverage cpgs and found that the top principal components were most associated with sex and estimated fractions of embryonic stem cells (escs) via deconvolution ( figure 1b ; figure s1 ; see methods), in line with our previous work 9 . we lastly compared the dnam levels from wgbs to levels from illumina 450k microarray measured on 15 of the 20 samples for the subset of cpgs in common to both technologies (n=456305), and found that overall, cpg dnam levels were highly concordant regardless of cpg read coverage ( figure 1c-d ). these analyses together suggested that our dnam data was of high quality and suitable for subsequent differential methylation analysis. to understand the changes in the prenatal epigenetic landscape, we performed linear modeling across all cpg sites. we found that dnam changes were abundant even during this relatively restricted period in prenatal development, with 36,546 cpg sites differentially methylated across the ages of 14-20 post-conception weeks (pcw, at fdr < 0.05, table s3 ). on average, each week of development was associated with a 1.8% change in dnam (iqr: 1.18%-2.33%) at a given cpg, with some sites changing as much as 8.7% per week. these differentially methylated cpgs were not evenly distributed across the genome, with none on the y chromosome, and a small number on the x chromosome (table s4 ). there was also some unevenness on the autosomes, with some chromosomes having up to 2.5x higher frequency of age-associated cpgs than others (table s4 ). 2,743 of these cpgs lie in previously annotated cpg islands, with 5,933 in shores, defined as within 2 kb of island ends. the vast majority (73%) of these cpgs lie within genes, and 2,932 lie within 1 kb of a transcription start site and thus potentially in promoters. this leaves 6.5% in intergenic space. 
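the fdr < 0.05 threshold applied to the per-site age tests can be illustrated with the benjamini-hochberg procedure. the text does not name the fdr method, so benjamini-hochberg is an assumption here, and the p values are toy numbers.

```python
def benjamini_hochberg(p_values):
    """Return benjamini-hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# toy p values for four sites, invented for the example
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```

in this toy case only the first site stays below 0.05 after adjustment, although three raw p values were below 0.05, which is the kind of multiple testing burden the genome-wide analysis has to absorb.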
sites were fairly evenly split in the direction of methylation change by age, with 46% increasing in methylation as the cortex develops. additionally, less than half of these cpg sites are significantly associated with the estimated esc proportion in the sample, suggesting that many of these cpg sites have true prenatal age changes in methylation rather than reflecting effects of maturing cell type composition. we further explored whether these sites could be organized into differentially methylated regions (dmrs), which have been shown to be more functionally relevant than individual cpgs. we therefore used a "bumphunting" technique adapted to wgbs data 17 to identify regions in which dnam levels across adjacent cpgs changed by 1.5% per week (corresponding to 10.5% across our developmental window), and calculated statistical significance using 1000 bootstrap-based permutations 27 . using a conservative cutoff (fwer < 0.2) there were 34 dmrs across prenatal development, though a less conservative cutoff (p < 0.05) identified 3,446 dmrs (table s5, figure s2a for dmr plot). like the cpgs, the dmrs are unevenly distributed throughout the genome, being far less frequent on chrx and variable among the autosomes. the dmrs on average had a width of 1696 bp (iqr: 921-2181); 75% overlapped annotated genes and 12% overlapped annotated cpg islands. like the cpgs, the dmrs were split between hypo- and hypermethylation: in 49% of the dmrs, methylation increased with age. we also found differentially methylated sites and regions between the sexes in these prenatal samples. there were 618,978 significantly differential cpgs by sex and, as expected, the vast majority (95%) of these were on the sex chromosomes. there were still 30,823 significant autosomal cpgs (table s6), and while 993 of them were in regions homologous to the sex chromosomes, the majority (97%) had no homology to chrx or chry. 
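the region-finding step of bump hunting can be sketched as grouping adjacent cpgs whose age slope exceeds the cutoff in the same direction. this is a heavily simplified illustration and not the bsseq or bumphunter implementation: it omits smoothing and the permutation-based significance, and the `max_gap` parameter and toy values are hypothetical. only the 0.015 per week cutoff comes from the text.

```python
def candidate_regions(positions, slopes, cutoff=0.015, max_gap=1000):
    """Group sorted cpg positions into runs with |slope| >= cutoff."""
    regions = []
    current = None
    for pos, slope in zip(positions, slopes):
        if abs(slope) < cutoff:
            # a site below the cutoff ends any open region
            if current is not None:
                regions.append(current)
                current = None
            continue
        sign = 1 if slope > 0 else -1
        if (current is not None and current["sign"] == sign
                and pos - current["end"] <= max_gap):
            current["end"] = pos
            current["n_sites"] += 1
        else:
            if current is not None:
                regions.append(current)
            current = {"start": pos, "end": pos, "sign": sign, "n_sites": 1}
    if current is not None:
        regions.append(current)
    return regions


# toy coordinates and per-week slopes on one chromosome (invented)
positions = [100, 150, 220, 900, 5000, 5050]
slopes = [0.02, 0.03, 0.016, 0.001, -0.02, -0.018]
regions = candidate_regions(positions, slopes)
```

the toy input yields one region gaining methylation (positions 100 to 220) and one losing it (5000 to 5050); in the actual analysis each candidate would then be compared against regions found in permuted data.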
among these, we found conservatively 28 dmrs with a 10% methylation change between sexes (table s7, figure s2b for dmr plot). again we saw that these were not global, but regional changes in methylation, with equal numbers of dmrs being hypo-and hypermethylated in males and females. these data show that there are many differences in the dnam landscape throughout second trimester development, and between sexes prenatally. previously, many studies of brain dna methylation have used the illumina infinium® humanmethylation450 beadchip ("450k") and more recent infinium methylationepic ("850k") microarray technologies. while these platforms can sensitively measure dnam levels without the high coverage/sequencing requirements of wgbs, they assay a limited number of sites. to better identify the tradeoff between breadth (assaying more sites) and depth (assaying a given site more accurately), we compared our wgbs findings to analogous statistics calculated using these arrays on the same dna extractions. first, using the probe coordinates alone, the 450k array does not measure dnam levels at 35,401 (97%) of the significant age-correlated cpgs or 610,578 (98.7%) of the significant sex-correlated cpgs we identified. the 450k dnam data validated 566 (49%) out of the 1,145 age-correlated sites that were covered. performing the same age model on the 450k data, our wgbs data validated 279/423 (66%) of the significant results. the effects of age on methylation were generally directionally concordant between the two datasets ( figure s3 ). though dmrs have much wider spans, the 450k does not cover 10 (30%) and 12 (43%) of dmrs for age and sex, respectively. the newer 850k microarray had almost twice the number of probes, yet we found that it does not measure 95% of our significant sites for age and 98% of sites for sex. additionally, it still failed to capture 24% of age dmrs and 36% of sex dmrs. 
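the array coverage figures for age-associated cpgs can be reproduced from the counts in the text, and the underlying comparison is a set intersection on genomic coordinates. in the sketch below the counts are taken from the text, while the coordinate sets and the `array_coverage` helper are invented for illustration.

```python
def array_coverage(significant_sites, array_sites):
    """Return (number covered, fraction not covered) for a set of sites."""
    covered = significant_sites & array_sites
    missing_fraction = 1 - len(covered) / len(significant_sites)
    return len(covered), missing_fraction


# toy (chromosome, position) coordinates, invented for the example
significant = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chrX", 7)}
array_probes = {("chr1", 200), ("chr3", 9)}
n_covered, missing = array_coverage(significant, array_probes)

# counts reported in the text for age-associated cpgs and the 450k array
age_sites = 36_546
age_not_on_450k = 35_401
age_on_450k = age_sites - age_not_on_450k
pct_missing = round(100 * age_not_on_450k / age_sites)
pct_validated = round(100 * 566 / age_on_450k)
```

the reported counts are internally consistent: 36,546 minus 35,401 leaves the 1,145 covered sites, about 97% of sites are missing from the array, and 566 of 1,145 is about 49%.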
microarrays are potentially missing a great deal of significantly differential sites, suggesting wgbs is best for thorough analysis. section 3: functional characterization of differential sites while differential methylation analysis provides detailed information about changes in the methylation landscape across brain prenatal development and between the sexes, we wanted to better understand the molecular consequences of these changes. we performed gene set enrichment and gene ontology to better understand in which processes the genes containing significant cpgs are involved. the top biological processes associated with genes containing differentially methylated cpgs across age were related to axon development and guidance, and regulation of neuron projection ( figure s4 , table s8 ). these sites were also shown to be enriched for enhancers in fetal brain tissue as well as enriched for transcription start site regions in brain tissue from the epigenome roadmap project 28 , implying that these areas are likely functional in prenatal brain 28 ( figure 3a ). to better assess the putative functionality of the methylation at these cpgs and dmrs, we correlated these dnam associations with nearby expression levels using rna-seq data from the same cortical dissection. among the 36,546 age-related cpgs, 59% were within an rna-seq-measured gene (corresponding to 1,630 unique genes). 7,998 (37%) of these cpgs had methylation levels significantly correlated (p < 0.05) to expression of the gene they lie directly within (661 unique genes, 41%, see methods, figure 3b -c). these expression-correlated cpgs did not have any differential gene ontology from the overall group of genes. these sites were mostly in weak transcription, enhancer, and quiescent chromatin states in fetal brain ( figure s5) 28 . with an expanded window of 5 kb around genes, 24,679 (66%) cpgs were accounted for, and 8,759 (35%) correlated to the nearest gene's expression. 
it is important to note that this only tested sites within actual genes, and only the effect on the same gene, so enhancer, upstream, and trans-acting effects were not accounted for. the correlation to expression was much lower for sex-related cpgs, at 5% and 9% significant correlation depending on inclusion or exclusion of sex chromosomes. the lack of correlation to expression here is unsurprising because there are almost no transcriptomic differences between the sexes in these samples. gene set enrichment on autosomal sex-related cpgs found genes involved with synaptic transmission and signaling, regulation of gtpase activity, and the glutamate receptor signaling pathway (table s9) . age-related dmrs were most enriched for genes related to stem cell proliferation and various cell fate specifications (table s10 ). among the dmrs across age, 21% were significantly correlated with gene expression. the genes represented by this are otx1 , ac246817.1 , cyp2e1 , plekhh3 , and dux4l32 . there was no enrichment for any gene category among the sex-related dmrs, but it is worth noting that many of these genes coded for lncrnas. 25% of sex-related dmrs were correlated to gene expression within their own gene and these genes were linc01606 , linc01166 , spatc1l , and rfpl2 . correlation to gene expression overall was not different between differentially methylated regions and non-differentially methylated regions; the dmrs we found were not enriched for highly correlated regions. these results provide potential starting points for further understanding the dna methylation landscape of the developing brain, allowing us to understand which processes are active during normal and disordered development. while dna methylation occurs exclusively in the cpg context in most somatic cell types, neurons in the human brain uniquely have high levels of dna methylation in other cytosine contexts (predominantly cpa) 29 . 
we therefore investigated the potential role of cph methylation across brain development. only 2.4% (iqr: 2.33-2.49) of sites in a chg context and 1.71% (iqr: 1.61-1.79) of sites in a chh context were methylated. for comparison, around 10% of cph sites are methylated in postnatal neurons, so it is likely as previously suggested 1 that cph methylation accumulates beginning around the time of birth. this contrasted with cpg sites, which were predominantly methylated across the genome (table s2) . because there were so many more cph than cpg sites in the genome, this means that there are actually similar numbers of methylated cpg and cph sites, despite the different proportions. using the same linear model as was used for cpgs on 13,166,520 cph sites, no cphs were significantly associated with age within this time period when accounting for the large multiple testing burden. however, 51 autosomal cphs were significantly associated with sex (at fdr < 0.05, table s11 ). additionally, there were 40 associated cphs on the sex chromosomes, though most of these were on chry, overall very proportionally different from cpg methylation. there was no enrichment for any more specific trinucleotide context in these significant sites over the whole genome. these cphs were less likely to be in or near genes than the cpgs, but of the 17 autosomal cphs that were in or near genes, only 1 was significantly correlated (p < 0.05) with expression of the nearest gene within 10 kb. it is notable that the majority of these cphs are relatively far from a gene. additionally, the effect size of the cph methylation at these significant sites is independent of the effect of cpgm in the surrounding area in most cases. five of the cphs' effect is reduced when accounting for the methylation of the nearest cpg ( figure s6) , showing that these few are dependent on the local cpg landscape, but most function independently. these loci are found both in and outside of genes and their surrounding regions. 
overall, cph methylation does not seem to be very dynamic or functional in prenatal second trimester development, though it is later, in early postnatal life 30 . section 5: links to neuropsychiatric conditions and their (genetic) risk. to understand the clinical implications of our findings, we tested the genes represented by prenatal age-associated cpgs and found that they were enriched for bipolar disorder- (p = 0.035), neurodevelopmental disorder- (p = 0.0011), and schizophrenia- (p = 0.00022) associated genes 20 . age-differential cpgs with correlation to gene expression were not further enriched. the set of age-differential cpg genes was particularly enriched for autism-associated genes (p = 9 × 10^-20, odds ratio = 5.6), with 163 of our 1,630 genes being linked to autism spectrum disorder (asd) 19 . sex-differential cpgs were also enriched for asd-associated genes (p = 6 × 10^-29, odds ratio = 2.9), but not for genes associated with other psychiatric disorders. dmrs represented a greater portion of the genome than cpgs, and thus may have different functional effects. to further understand what phenotypes may be linked to our dmrs, we performed stratified ld score regression 21 . our dmr sets represented a fairly small portion of the genome, but at 5.8 mb, the less stringent age-based dmrs were wide enough to detect enrichment of bmi- and subjective well-being (swb)-associated markers (table s12 ). these dmrs were also shown to be enriched for brain-linked traits overall, compared to non-brain-linked traits. additionally, the dmrs overlap with 15 of 108 schizophrenia loci from psychiatric genetics consortium 1 31 . these results suggest that dynamic methylation states in prenatal development may play a role in many disease and non-disease phenotypes that manifest later in life. 
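the gene-set enrichment statistics reported in this section (a p value and an odds ratio per disorder gene set) can be illustrated with a 2x2 table and a hypergeometric upper tail. the text does not state which test produced the reported values, so this is an assumed stand-in, and the counts below are toy numbers rather than the study's background.

```python
from math import comb


def odds_ratio(both, list_only, set_only, neither):
    """Odds ratio of a 2x2 table of gene-list by gene-set membership."""
    return (both * neither) / (list_only * set_only)


def hypergeom_upper_tail(k, set_size, list_size, background):
    """P(X >= k) when list_size genes are drawn from the background."""
    total = comb(background, list_size)
    hits = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(k, min(set_size, list_size) + 1)
    )
    return hits / total


# toy counts (invented): 20 background genes, 5 in the disorder set,
# 4 in the differential list, 2 of which are in the set
background, set_size, list_size, overlap = 20, 5, 4, 2
p_value = hypergeom_upper_tail(overlap, set_size, list_size, background)
or_value = odds_ratio(
    overlap,
    list_size - overlap,
    set_size - overlap,
    background - set_size - (list_size - overlap),
)
```

as in the text, the choice of background matters: restricting it to genes expressed in the samples changes the `neither` cell and therefore both the odds ratio and the p value.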
here we demonstrate that cpg methylation changes are associated with prenatal cortical development, are likely functionally relevant, and may play a role in developmental psychiatric disorders such as autism spectrum disorder (asd). the shifting dnam landscape during this restricted prenatal time period, the second trimester, likely plays a role in neurogenesis of the developing human brain. following current dogma, changes in dnam at these critical sites presumably lead to altered gene expression which promotes the formation of new neural connections in the cortex. at this point in time, we see that it is cpgs that are the important sites in the dnam landscape, as opposed to cphs. this is in line with previous research which finds that most cphm is established around birth and postnatally 1, 32 , later in neuron differentiation and development 33 . we also see that much of this regulation is occurring in cis, within the site's own gene. we do not currently understand the trans effects of our developmentally dynamic sites, but they may explain the effects of the methylation that does not correlate to its nearest gene's expression. the fact that the age-associated cpg sites we identified are often in enhancer regions in prenatal brain is further evidence that these sites may be regulating from a distance. our findings are in line with previous findings based on a microarray platform 10 , but using wgbs allows us to assess methylation at far more loci than the more commonly used illumina microarrays. we've shown that these microarrays do not even measure the majority of sites that are dynamic in neural development. despite potentially less precision, we also show that wgbs has enough coverage to assess dnam dynamics and that its findings were consistent with deeper-coverage assays. 
additionally, we present novel loci due to our narrower developmental age range, which means that we can detect more subtle changes in the second trimester but also means that we do not replicate all the findings of studies done across wider timespans. confining our samples to just cortex tissue rather than whole brain tissue also allows us to detect more regionally specific changes. most variation in our data likely comes from different cell types in the cortex, which we partially accounted for in our model by adjusting for esc estimated proportions. we believe that most of the cell types in this area at this time are neurons, but there is likely a range of maturities leading to variation in methylation 11 . by accounting for esc fraction in our model, we believe that the age effects we find are truly a result of prenatal age. we also find that there are autosomal differences in dnam between the sexes, though by our assessment they seem less functionally significant. they may be acting more in trans, which makes it more difficult to assess significance, but there are also just generally far fewer transcriptomic differences between sexes than between ages. sex-differential dnam later in life has been linked to psychiatric genes, which has been proposed as a mechanism for sex differences in psychiatric disorders 34 , so perhaps the early changing dnam is laying the groundwork for future effects. like xia et al 34 , we found that asd-associated genes are differentially methylated between the sexes, but this does not readily translate into transcriptomic differences. given the enrichment in our developmentally-associated sites for psychiatric-linked genes, these data support the notion of early neurodevelopmental components for disorders such as schizophrenia 7, 35, 36 . 
genes implicated in bipolar disorder, schizophrenia, and asd are clearly dynamic at this time point in life, so dysregulation at this stage could lead to vulnerabilities to these disorders later in life. our data also imply that bmi and subjective well-being, both non-disease traits, could be linked to neural development at this stage of life. as more data is generated, particularly through genome-scale methods like wgbs, we will be able to establish normal ranges of dnam at all ages, which will undoubtedly provide insight into molecular dynamics in this hard-to-study period and organ, as well as give clues to where deviation from the norm is important. unfortunately, wgbs still does not measure hydroxymethylation, another chemical modification thought to be epigenetically important. our findings are only the beginning of what may be found given the limited sample size of our study, but even here we reveal important processes. there is room for much more characterization of these epigenetic marks, but it is clear that they are worth understanding. dna methylation serves as an exciting potential avenue to understand neural development and psychiatric disorders. there are clear and functional changes in the neuronal dnam landscape over this important window of brain development and between sexes, and further investigation will help elucidate unknown mechanisms in the brain. supplemental data include 6 figures and 12 tables. 
divergent neuronal dna methylation patterns across human cortical development reveal critical periods and a unique role of cph methylation spatio-temporal transcriptome of the human brain epigenetic mechanisms in neurological disease persistent epigenetic differences associated with prenatal exposure to famine in humans tobacco-smoking-related differential dna methylation: 27k discovery and replication long term low-dose arsenic exposure induces loss of dna methylation genetic insights into the neurodevelopmental origins of schizophrenia complexity of prefrontal cortical dysfunction in schizophrenia: more than up or down mapping dna methylation across development, genotype and schizophrenia in the human frontal cortex methylomic trajectories across human fetal brain development a single-cell transcriptomic atlas of human neocortical development during mid-gestation a wrapper around cutadapt and fastqc to consistently apply adapter and quality trimming to fastq files, with extra functionality for rrbs data arioc: gpu-accelerated alignment of short bisulfite-treated reads samblaster: fast duplicate marking and structural variant read extraction bismark: a flexible aligner and methylation caller for bisulfite-seq applications bsmooth: from whole genome bisulfite sequencing reads to differentially methylated regions clusterprofiler: an r package for comparing biological themes among gene clusters sfari gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (asds) prenatal expression patterns of genes associated with neuropsychiatric disorders partitioning heritability by functional annotation using genome-wide association summary statistics flowsorted.dlpfc.450k: illumina humanmethylation data on sorted frontal cortex cell populations (r: bioconductor) minfi: a flexible and comprehensive bioconductor package for the analysis of infinium dna methylation microarrays limma powers differential expression analyses for rna-sequencing and microarray 
studies robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis bump hunting to identify differentially methylated regions in epigenetic epidemiology studies integrative analysis of 111 reference human epigenomes distribution, recognition and regulation of non-cpg methylation in the adult mammalian brain divergent neuronal dna methylation patterns across human cortical development reveal critical periods and a unique role of cph methylation biological insights from 108 schizophrenia-associated genetic loci global epigenomic reconfiguration during mammalian brain development principles governing dna methylation during neuronal lineage and subtype specification sex-differential dna methylation and associated regulation networks in human brain implicated in the sex-biased risks of psychiatric disorders the neurodevelopmental hypothesis of schizophrenia, revisited implications of normal brain development for the pathogenesis of schizophrenia the authors thank the umb brain bank at the department of pediatrics in the university of maryland school of medicine for the tissue provided. this project was supported by the lieber institute for brain development and by nih grants r21mh102791 and r01mh112751. finally, we are indebted to the generosity of the families of the decedents, who donated the brain tissue used in these studies. the authors declare no competing interests. raw and processed nucleic acid sequencing data generated to support the findings of this study are part of the psychencode consortium and the brainseq consortium data releases. specifically, wgbs data have been deposited at www.synapse.org along with the other psychencode data, under the accession code syn5842535. 
the homogenate rna-seq samples were also part of a larger study of rna-seq data from homogenate dlpfc tissue (brainseq consortium phase i) 26 , which was also deposited at www.synapse.org and summarized in http://eqtl.brainseq.org/phase1 . the processed, homogenate rna-seq data for this study have additionally been deposited via globus under the jhpce#brainepi-cellsorted collection at the following location: http://research.libd.org/globus/jhpce_brainepi-cellsorted/ . neun-sorted rna-seq data were originally published as part of phase ii of the brainseq consortium ( http://eqtl.brainseq.org/phase2/ ) and have also been deposited via globus under the jhpce#brainepi-polya collection at the following location: http://research.libd.org/globus/jhpce_brainepi-polya/ . publicly available data reprocessed in support of the conclusions in this work were downloaded from the gene expression omnibus under geo accession gse47966. key: cord-001858-nmi39n6h authors: petriccione, milena; mastrobuoni, francesco; zampella, luigi; scortichini, marco title: reference gene selection for normalization of rt-qpcr gene expression data from actinidia deliciosa leaves infected with pseudomonas syringae pv. actinidiae date: 2015-11-19 journal: sci rep doi: 10.1038/srep16961 sha: doc_id: 1858 cord_uid: nmi39n6h normalization of data, by choosing the appropriate reference genes (rgs), is fundamental for obtaining reliable results in reverse transcription-quantitative pcr (rt-qpcr). in this study, we assessed actinidia deliciosa leaves inoculated with two doses of pseudomonas syringae pv. actinidiae during a period of 13 days for the expression profile of nine candidate rgs. their expression stability was calculated using four algorithms: genorm, normfinder, bestkeeper and the deltact method. glyceraldehyde-3-phosphate dehydrogenase (gapdh) and protein phosphatase 2a (pp2a) were the most stable genes, while β-tubulin and 7s-globulin were the less stable. 
expression analysis of three target genes, chosen for rgs validation, encoding the reactive oxygen species scavenging enzymes ascorbate peroxidase (apx), superoxide dismutase (sod) and catalase (cat) indicated that a combination of stable rgs, such as gapdh and pp2a, can lead to an accurate quantification of the expression levels of such target genes. the apx level varied during the experiment time course and according to the inoculum doses, whereas both sod and cat were down-regulated during the first four days and up-regulated afterwards, irrespective of inoculum dose. these results can be useful for better elucidating the molecular interaction in the a. deliciosa/p. s. pv. actinidiae pathosystem and for rgs selection in bacteria-plant pathosystems. multiplication and growth of p. s. pv. actinidiae in a. deliciosa leaf. during the time course of the experiment, the multiplication and growth of psa cra-fru 8.43 inoculated at 1-2 × 10^3 cfu/ml and 1-2 × 10^7 cfu/ml into a. deliciosa cv. hayward leaves were assessed. when inoculated at the lower dose, the growth of the pathogen within the leaf never exceeded 10^5 cfu/ml and showed a peak of 2-3 × 10^4 cfu/ml at nine dpi; no symptoms were observed in the inoculated leaves. by contrast, when the pathogen was inoculated at 1-2 × 10^7 cfu/ml, it incited the appearance of tiny necrotic spots on many of the inoculated leaves at nine dpi (see supplementary fig. s1 online). in this case, only the green tissue was precisely removed to prepare the samples. selection of candidate reference genes, amplification specificity and efficiency. nine rgs, commonly used as internal controls for expression studies in other pathosystems, were screened in a. deliciosa leaves inoculated with the pandemic psa strain cra-fru 8.43. to determine the specificity of the primer pairs used in this study, melting curve analysis and agarose gel electrophoresis were performed following the rt-qpcr experiment. 
a single peak in the obtained melting curve confirmed the specificity of the amplicon, and no signal was detected in the negative controls for all of the tested rgs (see supplementary fig. s2 online). in addition, a single band of the expected size was detected for each pcr product (see supplementary fig. s3 online). the standard curve method using a pool of all of the cdna samples was performed to calculate the pcr efficiency (e) and the correlation coefficient (r^2) of each primer pair. average e values ranged from 100.7 to 108.2%, with r^2 varying from 0.991 to 0.999 (table 1). the results showed that all of the primer pairs were suitable for rt-qpcr analysis. expression levels of the reference genes. rt-qpcr was used to quantify the mrna levels of nine candidate rgs, and the expression stability was investigated. to determine the expression levels of the candidate rgs, the raw quantification cycle (cq) values were determined. the nine candidate rgs displayed a wide expression range, with cq ranging from 20.23 to 31.31 across all of the tested samples, with mean cq values between 22.06 ± 0.92 and 28.58 ± 0.86 (fig. 1). all of the tested rgs showed a normal distribution in cq values according to the kolmogorov-smirnov test. these genes were clearly distributed into different expression level categories. the results showed that cyp was the most expressed gene with the lowest mean cq (22.06). on the other hand, glo7a was the least expressed gene with the highest mean cq value (28.58). tub showed the most variation in expression level among the evaluated rgs, as shown by its larger whisker caps and boxes compared to the other genes, suggesting its low stability. most of the candidate rgs were highly expressed, with average cq values between 22 and 24 cycles, except sand and glo7a, which showed average cq values at intermediate expression levels (fig. 1). 
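the efficiency (e) and r^2 values reported for each primer pair come from a standard curve. the calculation can be sketched as follows, assuming the commonly used relation e (%) = (10^(−1/slope) − 1) × 100 and an ordinary least-squares fit of cq against log10 template amount. the function name `standard_curve_efficiency` and the dilution data are hypothetical and are not taken from table 1.

```python
from statistics import mean


def standard_curve_efficiency(log10_quantity, cq):
    """Fit cq = slope * log10(quantity) + intercept by least squares.

    Returns (slope, efficiency in %, r squared). The efficiency formula
    E = (10 ** (-1 / slope) - 1) * 100 is an assumption of this sketch.
    """
    x_bar, y_bar = mean(log10_quantity), mean(cq)
    sxx = sum((x - x_bar) ** 2 for x in log10_quantity)
    syy = sum((y - y_bar) ** 2 for y in cq)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_quantity, cq))
    slope = sxy / sxx
    efficiency_pct = (10 ** (-1.0 / slope) - 1.0) * 100.0
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, efficiency_pct, r_squared


# Hypothetical five-point, ten-fold dilution series of a cDNA pool.
log10_quantity = [0, -1, -2, -3, -4]
cq = [18.0, 21.3, 24.6, 27.9, 31.2]

slope, efficiency_pct, r_squared = standard_curve_efficiency(log10_quantity, cq)
print(f"slope={slope:.2f}  E={efficiency_pct:.1f}%  R^2={r_squared:.3f}")
```

under this relation, a slope close to −3.32 corresponds to an efficiency of about 100%, i.e. a doubling of product in each cycle.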
four algorithms (genorm, normfinder, bestkeeper and the deltact method) were used to evaluate the stability of expression of selected rgs. the analyses were performed for three comparison groups considering both low- and high-dose bacterial inocula in the leaves and their combined dataset. in each comparison group, the nine rgs were ranked from the most stable to the least stable. the data obtained from biological replicates were analysed separately to verify that the variation was not due to the treatment, but was intrinsic to the gene itself 40,41 . genorm analysis. nine rgs were ranked in three comparison groups based on their average expression stability (m-value), as shown in tables 2, 3 and 4. all of the tested rgs showed an overall limited variance, with m-values lower than 1.5, which was the default limit (m ≤ 1.5), indicating a high stability level of the analysed genes in our experimental conditions. gapdh, pp2a and ubc were the three most stable genes in this pathosystem, with slight differences in ranking for the three comparison groups. in a. deliciosa leaves inoculated with a low dose of bacterial inoculum, gapdh was the most stable gene (table 2), while in leaves inoculated with a high dose of bacterial inoculum and when all of the sample sets were analysed together, pp2a was the most stable gene (tables 3 and 4). tub was the least stable gene in the three comparison groups (tables 2, 3 and 4). in this study we used the genorm algorithm to find the optimal number of suitable rgs required for proper normalization. in the three comparison groups, genorm analysis revealed that by stepwise calculation the pairwise variation value v2/3 was lower than the threshold value (0.15), suggesting that two rgs could be used for normalization under these conditions (fig. 2). this suggested that the optimal number of rgs for normalization was two and that the addition of a third rg had no significant effect on the normalization of gene expression. 
finally, gapdh and pp2a were identified as the best rgs and selected for normalization by genorm. normfinder analysis. normfinder ranks the rgs according to their stability values under the tested conditions. the results of normfinder analysis were slightly different from those of genorm. however, in the three comparison groups, gapdh emerged as the most stably expressed gene with the lowest stability value. [table 3. average stability values (sv) of the nine candidate reference genes for leaves inoculated with a high dose of pseudomonas syringae pv. actinidiae inoculum.] gapdh and pp2a still occupied the top two positions for stability when we considered the total dataset (table 4) or a. deliciosa leaves inoculated with a high dose of bacterial inoculum (table 3), while in a. deliciosa leaves inoculated with a low dose of bacterial inoculum, gapdh and act were the most stable rgs (table 2). the normfinder results indicated that tub was the least stable rg in the total dataset, confirming our genorm results. bestkeeper analysis. the results of the bestkeeper analysis are reported in tables 2, 3 and 4. in the total dataset, bestkeeper analysis highlighted six rgs characterized by the least overall variation, with sd < 1; sand and eef-1a were the most stable genes, with sd values of 0.69 and 0.76, respectively (p < 0.001) (table 4). in a. deliciosa leaves with a low dose of bacterial inoculum, sand (0.72) was the most stable gene, followed by eef-1a and glo7a, with sd values of 0.81 and 0.95, respectively (table 2). in kiwifruit leaves with a high dose of bacterial inoculum, bestkeeper revealed that only the expression of tub exceeded the stability threshold; cyp and gapdh were considered to be the most stable genes, with sd values of 0.50 and 0.61, respectively (table 3). the results of the deltact method are reported in tables 2, 3 and 4. gapdh was the most stable gene for the three comparison groups. 
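the deltact ranking compares pairs of candidate genes within each sample. a minimal sketch of one common formulation is given below: for every gene pair the per-sample cq difference is taken, its standard deviation across samples is computed, and each gene is scored by the mean of the standard deviations over all of its pairings. this formulation is an assumption of the sketch rather than a detail stated in the text, and the gene labels (`rg_a`, `rg_b`, `rg_c`) and cq values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(cq_by_gene):
    """Score each gene by the mean SD of its pairwise delta-Cq values.

    cq_by_gene maps a gene label to its Cq values, one per sample, with
    samples in the same order for every gene. Lower score = more stable.
    """
    pair_sd = {}
    for gene_1, gene_2 in combinations(cq_by_gene, 2):
        deltas = [a - b for a, b in zip(cq_by_gene[gene_1], cq_by_gene[gene_2])]
        pair_sd[(gene_1, gene_2)] = stdev(deltas)
    scores = {
        gene: mean([sd for pair, sd in pair_sd.items() if gene in pair])
        for gene in cq_by_gene
    }
    # Return genes ordered from most stable to least stable.
    return dict(sorted(scores.items(), key=lambda item: item[1]))


# Hypothetical Cq values for three candidate genes in four samples.
example_cq = {
    "rg_a": [22.0, 22.1, 21.9, 22.0],
    "rg_b": [24.0, 24.1, 23.9, 24.0],
    "rg_c": [25.0, 27.0, 24.0, 28.0],
}

for gene, score in delta_ct_stability(example_cq).items():
    print(f"{gene}: mean SD of delta-Cq = {score:.3f}")
```

in this toy input `rg_a` and `rg_b` vary together, so their delta-cq is almost constant, whereas `rg_c` fluctuates independently and receives the highest (least stable) score.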
for the entire dataset, the results were similar to those of the normfinder and genorm analyses, with gapdh and pp2a as the top two ranked rgs, with a slight difference in the ranking (table 4). tub was the least stable gene in the three comparison groups, as demonstrated by the other statistical algorithms. in this study, to determine the consistency of the ranks of candidate rgs produced by genorm, normfinder, bestkeeper and the deltact method, the pearson correlation coefficient was employed (table 5). the pearson correlations obtained from the calculations were positive and significant for all methods, except bestkeeper. the highest correlation between the rankings produced by two methods was that of genorm and deltact in a. deliciosa leaves inoculated with a high dose of bacterial inoculum (r = 0.958), followed by normfinder vs. deltact in a. deliciosa leaves inoculated with a low dose of bacterial inoculum (r = 0.895) (table 5). for the overall final ranking obtained by the four algorithms, the two top rgs for the total dataset were gapdh and pp2a, while the least stable were glo7a and tub. expression analysis of the target genes for reference gene validation. three genes encoding the reactive oxygen species (ros) scavenging enzymes ascorbate peroxidase (apx), superoxide dismutase (sod) and catalase (cat), induced during the systemic infection of kiwifruit leaves with psa, were chosen to further validate the reliability of the selected rgs for the normalization of rt-qpcr data. in this study, we followed two normalization strategies to determine the expression of these target genes. the first used the best two rgs (gapdh and pp2a) given by the ranking from the four methods (genorm, bestkeeper, normfinder and deltact), and the second used the least stable rgs (tub and glo7a). in a. 
deliciosa leaves inoculated with a high dose of bacterial inoculum, an up-regulation in apx mrna expression was observed during the time course of the experiment, with 2.4- and 4.5-fold changes after 1 and 13 dpi, respectively. instead, when we used a low dose of bacterial inoculum, we observed an accumulation of the apx transcript after 4 dpi with a 3.7-fold change and a gradual decrease from 7 to 13 dpi (fig. 3a). a down-regulation in cat mrna expression during the first 4 dpi was observed, and subsequently, we registered a gradual up-regulation in cat mrna expression in a. deliciosa leaves inoculated with low and high doses of bacterial inoculum. the maximum level of the transcript was reached after 10 dpi, with a 2.2- and 5.7-fold change in infected leaves with high- and low-dose bacterial inocula, respectively (fig. 3b). similarly, the accumulation of the sod transcript reached its maximum average value after 10 dpi, with a 3.6- and 1.6-fold change with low and high bacterial inocula, respectively (fig. 3c). our results confirm that the transcriptional levels of apx, cat and sod are subjected to complex regulation in psa-infected kiwifruit leaves. this information is distorted when we normalize against the least stable genes: the expression levels of apx, cat and sod were then inaccurate, and altered transcriptional profiles were displayed (fig. 4). in research of plant molecular pathology, studies on gene expression patterns are important for understanding the biological processes involved in host-plant interactions. presently, several methods can be applied to study gene expression levels, but rt-qpcr has become the primary quantitative method for the high-throughput and accurate expression profiling of target genes. for rt-qpcr analysis, the requirement of a normalization method against rgs is important to achieve reliable results. 
as suggested by the "minimum information for publication of quantitative real-time pcr experiments" (miqe) guidelines 4 , the use of rgs as internal controls is the most appropriate normalization strategy 7 . ideal rgs should be stably expressed in all cells or tissues and remain stable under different experimental conditions 42 . several studies highlighted that there is neither a universal rg nor a defined number of genes to use, and that the choice and optimal number of rgs should be experimentally determined 4, 27 . many reliable rgs have been determined in plant cells and across different plant species, developmental stages, and biotic and abiotic stresses 43 . however, to the best of our knowledge, few studies have been carried out to assess rgs in bacteria-plant pathosystems 6 . here, we assessed nine rgs for their use as internal controls in gene expression studies of the a. deliciosa response to infection by psa upon leaf infiltration using two different doses of bacterial inoculum. to identify the best rgs, four different statistical algorithms were used. the combined use of genorm, normfinder, bestkeeper and the deltact method to select and validate the best rgs generated substantial discrepancies in the final ranking due to the different mathematical models associated with each algorithm, as confirmed by other studies 18, 44, 45 . as reported in other studies, the most discrepant results in gene stability ranking were obtained with bestkeeper 46 . in the total dataset, pp2a, gapdh and ubc were identified as the top three rgs using genorm, while gapdh, pp2a and act were suggested as the most stable rgs by normfinder and the deltact method. according to bestkeeper, act, gapdh, pp2a and ubc were ranked fifth to eighth, respectively. among all of the tested rgs, tub was ranked as the least stable gene by the four statistical algorithms, and its use as a rg should be avoided in rt-qpcr experiments in this pathosystem. 
to overcome differences in the ranking of rgs, we adopted the geometric mean of the rankings from all four algorithms to obtain a final ranking 47 . as suggested by several studies, the accuracy of rt-qpcr can be improved by using more than one rg 27 . the optimal number of candidate rgs for normalization of rt-qpcr data was evaluated with the genorm software. our results showed a pairwise variation v2/3 value below 0.15, which indicates that a combination of two rgs was sufficient for optimal normalization in the three comparison groups. the final ranking showed that the two top rgs for the total dataset were gapdh and pp2a, which can be used as rgs for rt-qpcr normalization in this pathosystem. gapdh was indicated to be a stable rg in a tomato-virus interaction 15 , in virus-infected mammalian cells 48 and in wheat infected with barley yellow dwarf virus (bydv) 14 , but was the least stable rg in coffea spp. hypocotyls inoculated with colletotrichum kahawae 13 . pp2a was a stable rg in virus-infected leaf tissues of nicotiana benthamiana 48 and in virus-infected arabidopsis thaliana 43 . in our study, ubc was among the four most stable rgs, as demonstrated in coffea arabica leaves inoculated with hemileia vastatrix 49 , but was considered to be the least stable rg in common bean inoculated with colletotrichum lindemuthianum 50 . tub was not confirmed as a stable normalization factor in our conditions, in agreement with our previous proteomic study that showed the variability of this protein in a. chinensis shoot during systemic infection with psa 25 ; however, in other pathosystems, such as puccinia graminis f. sp. tritici-infected wheat, tub was one of the most stable rgs 51 . furthermore, this rg showed highly variable expression levels in closely related cereals, such as wheat, barley and oat infected with bydv; tub was unstable in wheat and reasonably stable in the two other species 14 . 
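the consensus ranking mentioned at the start of this paragraph, a geometric mean over the four algorithms, can be sketched as below. the sketch assumes that each algorithm contributes an integer rank per gene (1 = most stable) and that genes are ordered by the geometric mean of those ranks; the function name `consensus_ranking`, the gene labels and the ranks are hypothetical and do not reproduce tables 2, 3 or 4.

```python
from statistics import geometric_mean


def consensus_ranking(ranks_by_method):
    """Order genes by the geometric mean of their per-method ranks.

    ranks_by_method maps a method name to a dict {gene: rank}, where
    rank 1 is the most stable gene according to that method.
    """
    genes = list(next(iter(ranks_by_method.values())))
    score = {
        gene: geometric_mean([ranks[gene] for ranks in ranks_by_method.values()])
        for gene in genes
    }
    return sorted(genes, key=score.get), score


# Hypothetical ranks; one method disagrees with the other three.
example_ranks = {
    "genorm": {"rg_a": 1, "rg_b": 2, "rg_c": 3},
    "normfinder": {"rg_a": 1, "rg_b": 2, "rg_c": 3},
    "bestkeeper": {"rg_a": 2, "rg_b": 3, "rg_c": 1},
    "deltact": {"rg_a": 1, "rg_b": 2, "rg_c": 3},
}

order, score = consensus_ranking(example_ranks)
for gene in order:
    print(f"{gene}: geometric mean rank = {score[gene]:.2f}")
```

a geometric mean dampens the influence of a single discordant method less than dropping that method would, but more than an arithmetic mean of ranks, which is one reason it is a convenient compromise when the algorithms disagree.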
the sand transcript was ranked lower among rgs in our pathosystem than was found in nicotiana benthamiana and lycopersicon esculentum plants inoculated with viruses 52, 53 . these variations in the expression profiles of rgs in different pathosystems confirm the need for validation of rgs under each specific condition. some rgs can be involved in different metabolic pathways 13 and influenced in a plant tissue-dependent manner during plant-pathogen interactions 15 . the suitability of the selected rgs has been evaluated by analysing the expression levels of three target genes (apx, cat and sod) that encode proteins that are directly involved in ros detoxification, protecting cells from oxidative bursts induced as responses to pathogen invasion 54 . sod catalyses the dismutation of superoxide (o2−) to h2o2, cat dismutates h2o2 to oxygen and water, and apx reduces h2o2 to water by utilizing ascorbate as a specific electron donor 55 . the balance between sod and apx or cat activities in cells is crucial for determining the steady-state level of o2− and h2o2. in our study, the accumulation of apx, cat and sod gene transcripts was strongly influenced by the dose of bacterial inoculum used. indeed, these genes involved in ros detoxification and the oxidative-stress response have a key role for bacterial survival and pathogenesis 55 . the apx up-regulation during a relatively long time course of infection (i.e., 10 days) was also observed upon twig inoculation with the same high dose of psa cra-fru 8.43 in the case of a. chinensis "soreli" 25 . in the same study, however, neither cat nor sod was found to be differentially expressed 10 days after the twig inoculation. irrespective of the inoculum doses, both cat and sod were up-regulated during the first four days of infection, and, subsequently, their level in the leaf tissues declined. interestingly, a similar trend was observed for sod in the phaseolus vulgaris/p. s. pv. 
phaseolicola pathosystem after the inoculation of bean leaves with the same high dose of bacterial inoculum used in the present study (i.e., 1 × 10^7 cfu/ml) 56 . in that study, however, the sod level in the bean primary leaves and in the apoplastic fluid started to decrease 48 and 24 hours after the artificial inoculation. furthermore, in this study, we demonstrated that to correctly quantify apx, cat and sod, it was necessary to choose rgs whose transcript levels were not influenced by bacterial infection, and that the use of inappropriate rgs can markedly change the expression pattern of a given target gene, leading to incorrect results. this is the first study in which a set of candidate rgs was analysed in terms of their expression stability in a. deliciosa leaves infected with psa. four different statistical algorithms showed slight differences in the final ranking of rgs, but by combining and analysing the data together, we demonstrated that two genes, gapdh and pp2a, are the most stably expressed transcripts in all infected kiwifruit leaves. the validation of rgs in our study provides new information that will be useful for a better understanding of the molecular mechanisms implicated in the expression profiles of target genes in the a. deliciosa/p. s. pv. actinidiae pathosystem. it should be considered that ideal rgs can vary with the pathosystem under investigation, and therefore, these genes should be carefully selected for each study in conformity with the miqe guidelines. plant material, p. syringae pv. actinidiae inoculations and experimental design. two-year-old, self-rooted, pot-cultivated a. deliciosa "hayward" plants and the pandemic psa strain cra-fru 8.43 were used in this study 21 . this bacterial strain was originally isolated from a. chinensis leaf spot and further characterized 57, 58 . 
plants were maintained in an aseptic room with 95% relative humidity and natural light, with no further fertilization after their transfer from the nursery. they were watered regularly. inoculation took place in spring (i.e., may). the strain was grown for 48 h on nutrient agar (oxoid) with 3% sucrose added (nsa) at 25 ± 1 °c. subsequently, a low (1-2 × 10^3 cfu/ml) and a high (1-2 × 10^7 cfu/ml) dose of bacterial inoculum, determined using spectrophotometry, were prepared in sterile, distilled water. to avoid wounding, the inoculation was performed by gently spraying the suspensions on the abaxial surface of fully expanded, healthy, young leaves, until the appearance of homogeneous water-soaked areas on the whole leaf lamina. twenty plants per dose were inoculated. artificial inoculations were performed separately, according to the dose. control plants were treated in the same way with sterile, distilled water. after inoculation, plants were maintained separately and were kept for 24 h in a moist chamber (100% humidity), which was required for optimal infection. during the experiment, the multiplication and growth of the pathogen were assessed as previously described 21 . leaves were collected one day post-inoculation and then at intervals of three days for 13 days, immediately frozen in liquid nitrogen and stored at −80 °c until rna isolation. in the same treatment group (inoculated and mock inoculated), each biological replicate was obtained by pooling three leaves from different plants harvested at random. three independent biological replicates were performed for each sample with three technical replicates each. total rna extraction and cdna synthesis. total rna was isolated from a. deliciosa leaves inoculated with psa as well as from control leaves as described by rubio-piña and zapata-perez 59 . residual genomic dna was digested by rnase-free dnase (invitrogen life technologies, carlsbad, ca, usa) according to the manufacturer's instructions. 
the rna concentration was quantified by measuring the absorbance at 260 nm using a jasco v-530 uv/vis spectrophotometer (tokyo, japan). the purity of all of the rna samples was assessed by the absorbance ratios od260/280 and od260/230, while their structural integrity was checked by agarose gel electrophoresis. only high-quality rna with od260/280 and od260/230 > 2 was used for subsequent steps. single-stranded cdna was synthesized from 1 μg of total rna using an iscript™ select cdna synthesis kit and oligo(dt)20 primers (bio-rad, milan, italy), according to the manufacturer's instructions. test. for this study, special attention was paid to a select set of nine candidate rgs (act, eef-1a, pp2a, ubc9, sand, tub, glo7a, cyp and gapdh) to investigate their robustness as internal controls for rt-qpcr in a. deliciosa. these genes belong to different functional and abundance classes to significantly reduce the chance that they are co-regulated. apx, cat and sod were selected as genes of interest. gene-specific primers for sand, tub, glo7a, cyp, gapdh, apx, cat and sod were designed in our laboratory using primer express software version 3 (table 1). the amplification efficiency of each candidate/target gene was determined using a pool representing all of the cdna samples. first, all of the primers were examined by end-point pcr; all of the chosen candidate/target genes were expressed, and specific amplification was confirmed by a single band of appropriate size in a 2% agarose gel after electrophoresis (see supplementary fig. s3 online). in a second step, the pool was used to generate a five-point standard curve based on a ten-fold dilution series. the amplification efficiency (e) and correlation coefficient (r^2) of the primers were calculated from the slope of the standard curve according to the equation 60 : $e\,(\%) = \left(10^{-1/\mathrm{slope}} - 1\right) \times 100$. quantitative real-time pcr (qpcr). 
quantitative real-time pcr was performed using a cfx connect real-time pcr detection system (bio-rad) to analyse the specific expression of each reference/target gene. cdna was amplified in 96-well plates using the ssoadvanced™ sybr® green supermix (bio-rad), 15 ng of cdna and 300 nm specific sense and anti-sense primers in a final volume of 20 μl for each well. thermal cycling was performed, starting with an initial step at 95 °c for 180 s, followed by 40 cycles of denaturation at 95 °c for 10 s and primer-dependent annealing (table 1) for 30 s. each run was completed with a melting curve analysis to confirm the specificity of amplification and lack of primer dimers. determination of reference gene expression stability. data analyses were performed on three groups: a) infected plants with a low dose of bacterial inoculum compared to the mock-inoculated plants dataset (ldi), b) infected plants with a high dose of bacterial inoculum compared to the mock-inoculated plants dataset (hdi), and c) the entire dataset (total). the stability of candidate rgs for the comparison groups was analysed with the following four tools: genorm 27 , normfinder 28 , bestkeeper 29 and the deltact method 30 . the raw cq values were converted into relative quantities and imported into the genorm and normfinder software programs; no transformation of cq values is required for bestkeeper and the deltact method. genorm calculates an expression stability value (m) for each rg and then determines the pairwise variation (v) of each rg with all of the other genes. at the end of the analysis, by stepwise exclusion of the gene with the highest m-value (least stable), this tool allows for the ranking of the tested rgs according to their expression stability. the optimal number of rgs required for normalization was determined by the pairwise variation vn/n+1 (0.15 recommended threshold). 
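the genorm steps described above (relative quantities, m-values, stepwise exclusion and pairwise variation) can be illustrated with a simplified sketch. it is not the genorm software itself: it assumes an amplification factor of 2 when converting cq to relative quantities, takes m as the mean standard deviation of log2 expression ratios against every other gene, and takes vn/n+1 as the standard deviation of the log2 ratio of successive normalization factors (geometric means of the top-ranked genes). all function names, gene labels and cq values are hypothetical.

```python
import math
from statistics import geometric_mean, mean, stdev


def relative_quantities(cq, efficiency=2.0):
    """Convert Cq values to quantities relative to the lowest-Cq sample."""
    best = min(cq)
    return [efficiency ** (best - value) for value in cq]


def m_values(rq):
    """M per gene: mean SD of log2 ratios against each other gene."""
    m = {}
    for gene, q_gene in rq.items():
        sds = []
        for other, q_other in rq.items():
            if other != gene:
                ratios = [math.log2(a / b) for a, b in zip(q_gene, q_other)]
                sds.append(stdev(ratios))
        m[gene] = mean(sds)
    return m


def stepwise_ranking(rq):
    """Repeatedly drop the gene with the highest M; most stable first."""
    remaining, dropped = dict(rq), []
    while len(remaining) > 2:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        dropped.append(worst)
        del remaining[worst]
    return list(remaining) + dropped[::-1]


def pairwise_variation(rq, ranking, n):
    """V(n/n+1): SD of log2(NF_n / NF_n+1) across samples."""
    n_samples = len(rq[ranking[0]])

    def norm_factor(genes):
        return [geometric_mean([rq[g][i] for g in genes]) for i in range(n_samples)]

    nf_n, nf_next = norm_factor(ranking[:n]), norm_factor(ranking[:n + 1])
    return stdev([math.log2(a / b) for a, b in zip(nf_n, nf_next)])


# Hypothetical Cq values for four candidate genes in four samples.
example_cq = {
    "rg_a": [20.0, 20.5, 19.5, 20.2],
    "rg_b": [22.0, 22.5, 21.5, 22.2],
    "rg_c": [24.2, 24.3, 23.6, 24.1],
    "rg_d": [27.0, 25.0, 26.3, 25.6],
}
rq = {gene: relative_quantities(values) for gene, values in example_cq.items()}
ranking = stepwise_ranking(rq)
v_2_3 = pairwise_variation(rq, ranking, 2)
print(ranking, f"V2/3 = {v_2_3:.3f}")
```

in this toy input the two co-varying genes remain after exclusion, and v2/3 falls below the 0.15 threshold, which is the situation in which two rgs are regarded as sufficient. the last two remaining genes cannot be ranked against each other by this procedure.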
normfinder calculates the expression stability value (sv) for each gene, taking into account intra- and inter-group variations of the sample set 28 . a low sv indicates high expression stability of the gene. bestkeeper is an excel-based software tool that selects the best-suited rgs by performing a statistical analysis based on the pearson correlation coefficient (r), standard deviation (sd) and coefficient of variation (cv). only genes with a high r value and a low sd are combined into the bestkeeper index (bki) value using the geometric mean of their cq values. finally, this tool determines the correlation coefficient of each candidate rg with the bki value, along with the probability (p) value. the rg with the highest coefficient of correlation with the bki is considered to be the most stable. the deltact (dct) method compares the relative expression of pairs of rgs within each sample to identify stable rgs 30 . a ranking of the rgs using the four algorithms together was obtained as suggested by velada et al. 47 . correlations among the stability values of rgs obtained with different software were analysed using pearson's correlations (p < 0.05 and p < 0.01). all statistical analyses were performed using spss v. 20.0. to confirm the reliability of the rgs, the relative expression profiles of the apx, cat and sod genes were determined and normalized with the most stable and least stable genes. relative fold changes in gene expression were calculated using the comparative 2^−δδct method and normalized to the corresponding rg levels 29,61 . statistical analysis. data are displayed as mean ± standard deviation. cq values were tested for normality (kolmogorov-smirnov test) prior to analysis. statistical analysis of data was performed by one-way anova followed by the lsd post-hoc test. calculations were performed using spss v. 20.0. 
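the comparative 2^−δδct calculation with two reference genes can be sketched as follows. the sketch assumes an amplification factor of 2 for every gene and normalizes the target cq against the arithmetic mean of the reference cq values (equivalent to the geometric mean of the reference quantities); the function name `fold_change` and all cq values are hypothetical.

```python
from statistics import mean


def fold_change(target_treated, refs_treated, target_control, refs_control):
    """Relative expression by 2 ** -(ddCt), treated versus control.

    target_* are Cq values of the gene of interest; refs_* are lists of
    Cq values of the reference genes measured in the same sample.
    """
    delta_treated = target_treated - mean(refs_treated)
    delta_control = target_control - mean(refs_control)
    return 2.0 ** -(delta_treated - delta_control)


# Stable references: identical Cq in control and inoculated samples.
stable = fold_change(24.0, [22.0, 23.0], 26.0, [22.0, 23.0])

# Unstable references: both shift by one cycle in the inoculated sample,
# which doubles the apparent fold change of the same target data.
unstable = fold_change(24.0, [23.0, 24.0], 26.0, [22.0, 23.0])

print(f"stable references: {stable:.1f}-fold")
print(f"unstable references: {unstable:.1f}-fold")
```

the second call illustrates why reference stability matters: a one-cycle shift in the reference genes alone changes the reported result for the target from 4-fold to 8-fold.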
transcript profiling in host-pathogen interactions relative quantification in real-time pcr proteomics approach combined with biochemical attributes to elucidate compatible and incompatible plant-virus interactions between vigna mungo and mungbean yellow mosaic india virus the miqe guidelines: minimum information for publication of quantitative real-time pcr experiments normalization of qrt-pcr data: the necessity of adopting a systematic, experimental conditions-specific, validation of references genome-wide identification and testing of superior reference genes for transcript normalization in arabidopsis real-time rt-pcr normalisation; strategies and considerations evaluation of reference genes for accurate normalization of gene expression for real time-quantitative pcr in pyrus pyrifolia using different tissue samples and seasonal conditions tracking the best reference genes for rt-qpcr data normalization in filamentous fungi standardization of real-time pcr gene expression data from independent biological replicates transcript profiling of a conifer pathosystem: response of pinus sylvestris root tissues to pathogen (heterobasidion annosum) invasion transcript profiling of poplar leaves upon infection with compatible and incompatible strains of the foliar rust melampsora larici-populina validation of reference genes for normalization of qpcr gene expression data from coffea spp. hypocotyls inoculated with colletotrichum kahawae detection of prune dwarf virus by one-step rt-pcr and its quantitation by real-time pcr evaluation of reference genes for quantitative reverse-transcription polymerase chain reaction normalization in infected tomato plants reference gene selection and validation for the early responses to downy mildew infection in susceptible and resistant vitis vinifera cultivars selection and validation of reference genes for gene expression studies by reverse transcription quantitative pcr in xanthomonas citri subsp. 
citri during infection of citrus sinensis reference genes for accurate transcript normalization in citrus genotypes under different experimental conditions evaluation of reference genes for real-time rt-pcr expression studies in the plant pathogen pectobacterium atrosepticum pseudomonas syringae pv. actinidiae: a re-emerging, multi-faceted, pandemic pathogen pseudomonas syringae pv. actinidiae draft genomes comparison reveal strain-specific features involved in adaptation and virulence to actinidia species pseudomonas syringaepv. actinidiae (psa) isolates from recent bacterial canker of kiwifruit outbreaks belong to the same genetic lineage genomic analysis of the kiwifruit pathogen pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease redefining the global populations of pseudomonas syringae pv. actinidiae based on pathogenic, molecular and phenotypic characteristics proteomic changes in actinidia chinensis shoot during systemic infection with a pandemic pseudomonas syringae pv. actinidiae strain proteomic analysis of the actinidia deliciosa leaf apoplast during biotrophic colonization by pseudomonas syringae pv. 
actinidiae accurate normalization of real-time quantitative rt-pcr data by geometric averaging of multiple internal control genes normalization of real-time quantitative reverse transcription-pcr data: a modelbased variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets determination of stable housekeeping genes, differentially regulated target genes and sample integrity: bestkeeper -excel-based tool using pair-wise correlations selection of housekeeping genes for gene expression studies in human reticulocytes using real-time pcr a rapid transcriptional activation is induced by the dormancy-breaking chemical hydrogen cyanamide in kiwifruit (actinidia deliciosa) buds modified carotenoid cleavage dioxygenase8 expression correlates with altered branching in kiwifruit (actinidia chinensis) metabolic analysis of kiwifruit (actinidia deliciosa) berries from extreme genotypes reveals hallmarks for fruit starch metabolism characterization of two alcohol acyltransferases from kiwifruit (actinidia spp.) reveals distinct substrate preferences identification of suitable reference genes for real-time rt-pcr normalization in the grapevine-downy mildew pathosystem quantitative rt-pcr analysis of differentially expressed genes in quercus suber in response to phytophthora cinnamomi infection technical advance: transcript profiling in rice (oryza sativa l.) 
seedlings using serial analysis of gene expression (sage) selection and validation of reference genes for quantitative gene expression studies by real-time pcr in eggplant (solanum melongena l) reference gene validation for quantitative rt-pcr during biotic and abiotic stresses in vitis vinifera selection of reference genes for expression studies in cicer arietinum l.: analysis of cyp81e3 gene expression against ascochyta rabiei normalisation of real-time rt-pcr gene expression measurements in arabidopsis thaliana exposed to increased metal concentrations analysis of qpcr reference gene stability determination methods and a practical approach for efficiency calculation on a turbot (scophthalmus maximus) gonad dataset identification and validation of reference genes for normalization of transcripts from virus-infected arabidopsis thaliana reference gene selection for quantitative real-time pcr normalization in caragana intermedia under different abiotic stress conditions the choice of reference genes for assessing gene expression in sugarcane under salinity and drought stresses identification of a novel reference gene for apple transcriptional profiling under postharvest conditions reference genes selection and normalization of oxidative stress responsive genes upon different temperature stress conditions in hypericum perforatum l reference gene selection for quantitative real-time pcr analysis in virus infected cells: sars corona virus,yellow fever virus, human herpesvirus-6, camelpox virus and cytomegalovirus infections biphasic haustorial differentiation of coffee rust (hemileia vastatrix race ii) associated with defence responses in resistant and susceptible coffee cultivars validation of reference genes for rt-qpcr normalization in common bean during biotic and abiotic stresses reference gene selection for qpcr gene expression analysis of rust-infected wheat validation of reference genes for gene expression studies in virus-infected nicotiana benthamiana using 
quantitative real-time pcr assessment of reference gene stability influenced by extremely divergent disease symptoms in solanum lycopersicum l reactive oxygen and oxidative stress tolerance in plant pathogenic pseudomonas the antioxidant systems vis à vis reactive oxygen species during plant-pathogen interaction selected reactive oxygen species and antioxidant enzymes in common bean after pseudomonas syringae pv. phaseolicola and botrytis cinerea infection identification of pseudomonas syringae pv. actinidiae as causal agent of bacterial canker of yellow kiwifruit (actinidia chinensis planchon) in central italy molecular and phenotypic features of pseudomonas syringae pv. actinidiae isolated during recent epidemics of bacterial canker of yellow kiwifruit (actinidia chinensis) in central italy isolation of total rna from tissues rich in polyphenols and polysaccharides of mangrove plants guideline to reference gene selection for quantitative real-time pcr analysis of relative gene expression data using real-time quantitative pcr and the 2 −δδct method this work was financed by the regione campania programme under the grant agreement ur.co. fi. (unità di coordinamento e potenziamento delle attività di sorveglianza, ricerca, sperimentazione, monitoraggio e formazione in campo fitosanitario), decreto dirigenziale n°9 del 3 giugno 2014. key: cord-000159-8y8ho2x5 authors: bekaert, michaël; firth, andrew e.; zhang, yan; gladyshev, vadim n.; atkins, john f.; baranov, pavel v. title: recode-2: new design, new search tools, and many more genes date: 2009-09-25 journal: nucleic acids res doi: 10.1093/nar/gkp788 sha: doc_id: 159 cord_uid: 8y8ho2x5 ‘recoding’ is a term used to describe non-standard read-out of the genetic code, and encompasses such phenomena as programmed ribosomal frameshifting, stop codon readthrough, selenocysteine insertion and translational bypassing. 
although only a small proportion of genes utilize recoding in protein synthesis, accurate annotation of ‘recoded’ genes lags far behind annotation of ‘standard’ genes. in order to address this issue, provide a service to researchers in the field, and offer training data for developers of gene-annotation software, we have gathered together known cases of recoding within the recode database. recode-2 is an improved and updated version of the database. it provides access to detailed information on genes known to utilize translational recoding and allows complex search queries, browsing of recoding data and enhanced visualization of annotated sequence elements. at present, the recode-2 database stores information on approximately 1500 genes that are known to utilize recoding in their expression—a factor of approximately three increase over the previous version of the database. recode-2 is available at http://recode.ucc.ie the term 'translational recoding' describes the utilization of non-standard decoding during protein synthesis and encompasses such processes as ribosomal frameshifting, codon redefinition, translational bypassing and stopgo (1) (2) (3) (4) (5) (6) (7) . what is often considered as a decoding error-e.g. a frameshifting error or mistranslation of a particular codon-may occasionally benefit the organism by increasing its fitness and survival. in such instances the propensity for the decoding 'error' may be selected for during evolution, leading to the formation of a particular sequence context that elevates the frequency of the 'error'. to discriminate such cases of programmed decoding 'misbehaviour' from promiscuous translational errors or translational noise, the term recoding is used. the position within an mrna where a recoding event takes place is termed the 'recoding site'. 
sequence elements responsible for increasing the efficiency of recoding events are termed 'recoding stimulatory signals', and a minimal sequence fragment that allows recoding to take place at the natural efficiency (i.e. relative to the level of standard decoding at the recoding site) is termed a 'recoding cassette'. recoding can benefit gene expression in a number of ways. it can regulate gene expression by being part of a sensor for particular cellular conditions. prominent examples include ribosomal frameshifting in bacterial release factor 2 (rf2) and eukaryotic antizyme mrnas. in both instances, ribosomal frameshifting is required for the production of the corresponding active full-length protein products. in the rf2 mrna, the efficiency of frameshifting is negatively regulated by the cellular concentration of its product, rf2, providing an autoregulatory circuit for its biosynthesis (8) (9) (10) . in the antizyme mrna, the efficiency of frameshifting is modulated by cellular levels of polyamines, whose concentration in turn is controlled by antizyme (11, 12) . thus, this mechanism ensures the maintenance of antizyme production at the levels required to support physiologically appropriate concentrations of polyamines. recoding can also be used for the diversification of protein products encoded by a single gene. an illustrative example is in bacterial dnax mrna, where frameshifting allows synthesis of two different protein subunits-sharing the same n-terminal part-from a single open reading frame (orf) in its mrna (13) (14) (15) . a presumed constant ratio of frameshifting in dnax ensures a fixed stoichiometric balance between these two subunits (16) . this balance, then, is independent of the absolute levels of dnax transcription and translational initiation on its mrna. similarly, in many viruses recoding is responsible for setting a ratio between protein products (such as those encoded by gag-pro-pol genes in retroviruses) produced from a single mrna (17) . 
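to make the notion of a recoding site concrete, the following python sketch translates a toy mrna either by standard decoding or with a single +1 frameshift. the sequence is hypothetical and merely modelled on the cuu uga c motif of the bacterial rf2 frameshift site; the function and parameter names are illustrative and are not part of recode-2 or of any annotation program.

```python
from itertools import product

# standard genetic code, codons enumerated in u, c, a, g order
BASES = "ucag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, recoding_site=None, shift=0):
    """Translate an mRNA from position 0 until the first stop codon.

    If recoding_site is given, the reading frame is moved by `shift`
    nucleotides once, when the ribosome reaches that position.
    """
    protein = []
    pos = 0
    while pos + 3 <= len(mrna):
        if pos == recoding_site:
            pos += shift          # e.g. +1: skip one nucleotide
            recoding_site = None  # the shift happens only once
            continue
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":     # stop codon ends translation
            break
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)


# toy mRNA: aug cuu uga ... in frame 0, aug cuu (u) gac uau uaa after a +1 shift
toy_mrna = "augcuuugacuauuaa"
standard_product = translate(toy_mrna)
frameshift_product = translate(toy_mrna, recoding_site=6, shift=1)
```

standard decoding terminates at the in-frame uga and yields the short product ML, whereas the +1 shift at position 6 bypasses the stop and yields MLDY, mirroring how frameshifting gives access to the full-length product.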
recoding also provides rna viruses with a mechanism for the translation of downstream orfs on polycistronic rnas [other mechanisms include leaky scanning, shunting, reinitiation, iress and the production of subgenomic rnas (18) ] and may also be involved in global regulation mechanisms, such as mediating the switch between translation and replication on the same genomic rna (19) . finally, recoding provides a way for the incorporation of non-standard amino acids-e.g. amino acids that share their codons with termination signals (the most prominent example of which is selenocysteine, encoded by uga) (20) (21) (22) . for further information on the diverse variety of recoding functions, see recent reviews (1, 3, 7, 23, 24) . recoding cassettes may be composed of a variety of diverse sequence elements. for example, primary nucleotide sequences may promote re-arrangements of trna molecules relative to their codons in mrna inside the ribosome or affect recognition of trnas or release factors in the ribosomal a-site. on the other hand, many recoding signals act in the form of rna secondary structures, such as simple stem-loops, or more complex pseudoknots, kissing stem-loops and other structures that involve interactions between considerably distant rna regions (19, (25) (26) (27) (28) . trans-acting rna signals affecting ribosomal decoding through complementary interactions with ribosomal rna (29-32), or through the nascent peptide acting within the ribosome exit tunnel (6, 33, 34) , are also known. some recoding events-such as selenocysteine insertion-require the presence of additional specialized machinery such as selenocysteine trnas, selenocysteine-specific translation factors and several other components of the selenocysteine biosynthesis and insertion pathway (20, (35) (36) (37) . recent reviews on stimulatory signals involved in the modulation of recoding events and molecular mechanisms of recoding provide further details (7, 25, 27, 38, 39) . 
despite considerable progress in the development of computational tools for the prediction of protein coding genes in sequenced genomes, the identification and annotation of recoded genes lags far behind. the hurdle lies not so much in the fact that recoded genes do not obey standard rules of genetic readout but, rather, in the considerable diversity of recoded genes and sequence elements responsible for recoding. even among evolutionarily related genes, all utilizing recoding, the diversity of recoding signals can be considerable. an extreme example is when orthologous genes utilize recoding at different stages of gene expression to achieve the same goal. an example is in dnax, where ribosomal frameshifting is employed by enterobacteria, but transcriptional slippage is used in thermus thermophilus (40) . a similar situation occurs in bacterial insertion sequence (is) elements, where a certain group of is elements utilizes transcriptional slippage to produce orfa-orfb fusions, while many other is elements utilize ribosomal frameshifting for the same purpose (41) . the diversity of recoding functions, combined with the wide spectrum of unrelated sequence elements involved in recoding, makes the design of a uniform model of recoding intractable. nonetheless, in recent years, we have witnessed the development of specialized models and computational tools for the identification of particular subsets of recoding cassettes, or tools that are specific to recoding events in particular groups of homologous genes (42) (43) (44) (45) . these developments, at least partially, were facilitated by the availability of a compiled dataset of known recoded genes collected together in the recode database (http://recode.genetics.utah.edu), which was initially launched 9 years ago (46, 47) . 
to facilitate further development of computational tools for the prediction of recoded genes in the ever faster growing body of sequence data, as well as to provide bench researchers with up-to-date information on recoding, an efficient means of recode database population and annotation is now required. in this article, we describe the new incarnation of the database, recode-2. the major advances of recode-2 (hosted at a new location, http://recode.ucc.ie) over previous versions include a new web design allowing enhanced visualization of stimulatory signals, a uniform recodeml format for the annotation of recoded genes, and a significantly larger number of entries, including many recently identified cases, that altogether have more than doubled the size of the database since its last published update. the data are stored in a local postgresql database that is queried by php scripts embedded in the web interface. the schema of the postgresql database is shown in figure 1 . the database stores information on individual genes that utilize recoding, the mechanisms and stimulatory signals involved, and references to the original literature sources that describe the recoding events. in order to facilitate the uniform annotation of recoding events, we have designed an xml-based format for the annotation of recoded genes, recodeml. the document type definition for recodeml is available at the recode-2 web site at http://recode.ucc.ie/dtd. the extensibility of the recodeml format will allow incorporation of new annotation, if required, for newly discovered types of recoding and their associated features. the database handles batch importation of properly designed recodeml entries into the postgresql database, thus facilitating rapid population of the database with new data. the data in the database may be explored in two ways.
they may be browsed by one of the three categories: kingdom (archaea, bacteria, eukaryotes and viruses), organism and type of recoding. the data may also be searched directly by key words that can be inserted into the search field. searches that use regular expressions are allowed. the output of a database search is a list of recode-2 entries in a short format that includes organism name, kingdom, genus, type of recoding event, status of figure 2 shows an example of sequence annotation for the human oaz1 gene, alongside a diagram of a stimulatory rna secondary structure, and the recode-1 logo. unlike recode-1, where all data on recoding events were introduced manually, recode-2 also utilizes automated identification of recoding events by the recently developed computer programs arfa (43) and oaf (44) , that are able to identify and annotate +1 frameshifting events in mrnas of bacterial rf2s and eukaryotic antizyme (oazs), respectively. however, a significant source of recoding events remains to be serendipitous discoveries by experimental studies that sometimes are complemented by more systematic studies of large groups of similar genes (51, 52) . therefore, a large proportion of new data are still populated manually or semi-manually. to ease manual population of recoding events, a special form has been designed that is available in the database upon user registration. user registration needs to be approved by one of the database contributors. the novel data in the database include 249 rf2 mrnas identified by arfa, 152 events identified by oaf, 200 new selenoprotein genes (53) (54) (55) (56) and 200 new viral annotations (57) including the newly discovered frameshift cassettes in potyviruses (58), alphaviruses (59) and the japanese encephalitis group of flaviviruses (60) . the database will expand in accordance with the growth of available sequence information that will be scanned by one of the existing programs for recode annotation. 
we also plan to continue developing tools for the automatic identification of recoding events from nucleotide sequences. as the field grows and the number of recoded genes progressively increases, it becomes harder to extract data from the relevant literature and a number of novel recoded genes may escape the database. therefore, we encourage users and researchers in the field to submit their data directly to the recode-2 database. we are also willing to provide help with the analysis of potential new recoding events. reprogrammed genetic decoding in cellular gene expression programmed translational frameshifting recoding: translational bifurcations in gene expression programmed ribosomal frameshifting goes beyond viruses: organisms from all three kingdoms use frameshifting to regulate gene expression, perhaps signaling a paradigm shift a case for ''stopgo'': reprogramming translation to augment codon meaning of ggn by promoting unconventional termination (stop) after addition of glycine and then allowing continued translation (go) coupling of open reading frames by translational bypassing recoding: expansion of decoding rules enriches gene expression expression of peptide chain release factor 2 requires high-efficiency frameshift the function, structure and regulation of e. 
coli peptide chain release factors release factor 2 frameshifting sites in different bacteria ribosomal frameshifting in decoding antizyme mrnas from yeast and protists to humans: close to 300 cases reveal remarkable diversity despite underlying conservation autoregulatory frameshifting in decoding mammalian ornithine decarboxylase antizyme the gamma subunit of dna polymerase iii holoenzyme of escherichia coli is produced by ribosomal frameshifting translational frameshifting generates the gamma subunit of dna polymerase iii holoenzyme programmed ribosomal frameshifting generates the escherichia coli dna polymerase iii gamma subunit from within the tau subunit reading frame structural probing and mutagenic analysis of the stem-loop required for escherichia coli dnax ribosomal frameshifting: programmed efficiency of 50% programmed ribosomal frameshifting in hiv-1 and the sars-cov alternative translation strategies in plant viruses long-distance rna-rna interactions in plant virus gene expression and replication eukaryotic selenoprotein synthesis: mechanistic insight incorporating new factors and new functions for old factors selenoprotein synthesis: uga does not end the story selenium: its molecular biology and role in human health recoding in bacteriophages and bacterial is elements the role of programmed-1 ribosomal frameshifting in coronavirus propagation frameshifting rna pseudoknots: structure and mechanism structure, stability and function of rna pseudoknots involved in stimulating ribosomal frameshifting rna pseudoknots and the regulation of protein synthesis a -1 ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral rna slippery runs, shifty stops, backward steps, and forward hops: -2, -1, +1, +2, +5, and +6 ribosomal frameshifting upstream stimulators for recoding overriding standard decoding: implications of recoding for ribosome function and enrichment of 
gene expression use of trna suppressors to probe regulation of escherichia coli release factor 2 translational bypassing without peptidyl-trna anticodon scanning of coding gap mrna a nascent peptide is required for ribosomal bypass of the coding gap in bacteriophage t4 gene 60 protein factors mediating selenoprotein synthesis solution structure of secis, the mrna element required for eukaryotic selenocysteine insertion-interaction studies with the secis-binding protein sbp selenocysteine inserting trnas: an overview p-site trna is a crucial initiator of ribosomal frameshifting a new kinetic model reveals the synergistic effect of e-, p-and a-sites on +1 ribosomal frameshifting nonlinearity in genetic decoding: homologous dna replicase genes use alternatives of transcriptional slippage or translational frameshifting transcriptional slippage in bacteria: distribution in sequenced genomes and utilization in is element gene expression knotinframe: prediction of -1 ribosomal frameshift events arfa: a program for annotating bacterial release factor genes, including prediction of programmed ribosomal frameshifting ornithine decarboxylase antizyme finder (oaf): fast and reliable detection of antizymes with frameshifts in mrnas predicting genes expressed via -1 and +1 frameshifts recode: a database of frameshifting, bypassing and codon redefinition utilized for gene expression database resources of the national center for biotechnology information pseudoviewer3: generating planar drawings of large-scale rna structures with pseudoknots sequences that direct significant levels of frameshifting are frequent in coding regions of escherichia coli conserved translational frameshift in dsdna bacteriophage tail assembly genes comparative genomics of trace elements: emerging dynamic view of trace element utilization and function dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine 
residues trends in selenium utilization in marine microbial world revealed through the analysis of the global ocean sampling (gos) project the selenoproteome of clostridium sp. ohilas: characterization of anaerobic bacterial selenoprotein methionine sulfoxide reductase a an extended signal involved in eukaryotic -1 frameshifting operates through modification of the e site trna an overlapping essential gene in the potyviridae discovery of frameshifting in alphavirus 6k resolves a 20-year enigma a conserved predicted pseudoknot in the ns2a-encoding sequence of west nile and japanese encephalitis flaviviruses suggests ns1' may derive from ribosomal frameshifting we would like to express our appreciation to the colleagues who have contributed data for the previous versions of the database. conflict of interest statement. none declared. key: cord-001921-73esrper authors: lin, cheng-yung; chiang, cheng-yi; tsai, huai-jen title: zebrafish and medaka: new model organisms for modern biomedical research date: 2016-01-28 journal: j biomed sci doi: 10.1186/s12929-016-0236-5 sha: doc_id: 1921 cord_uid: 73esrper although they are primitive vertebrates, zebrafish (danio rerio) and medaka (oryzias latipes) have surpassed other animals as the most used model organisms based on their many advantages. studies on gene expression patterns, regulatory cis-elements identification, and gene functions can be facilitated by using zebrafish embryos via a number of techniques, including transgenesis, in vivo transient assay, overexpression by injection of mrnas, knockdown by injection of morpholino oligonucleotides, knockout and gene editing by crispr/cas9 system and mutagenesis. in addition, transgenic lines of model fish harboring a tissue-specific reporter have become a powerful tool for the study of biological sciences, since it is possible to visualize the dynamic expression of a specific gene in the transparent embryos. 
in particular, some transgenic fish lines and mutants display defective phenotypes similar to those of human diseases. therefore, a wide variety of fish models not only shed light on the molecular mechanisms underlying disease pathogenesis in vivo but also provide a living platform for high-throughput screening of drug candidates. interestingly, transgenic model fish lines can also be applied as biosensors to detect environmental pollutants, and even as pet fish to display beautiful fluorescent colors. therefore, transgenic model fish possess a broad spectrum of applications in modern biomedical research, as exemplified in this review. although zebrafish (danio rerio) and medaka (oryzias latipes) are primitive vertebrates, they have several advantages over other model animals. for example, they are fecund and light can control their ovulation. spawning takes place frequently and is not limited to a particular season. microinjection of fertilized eggs is easy and relatively cheap. their embryos are transparent, making it easy to monitor dynamic gene expression in various tissues and organs in vivo without the need to sacrifice the experimental subjects. their genome sizes are approximately 20 to 40 % of the mammalian genome, making them the only vertebrates available for large-scale mutagenesis. their maturation takes only 2-3 months, which makes generating transgenic lines less laborious and less time-consuming. in addition, many routine techniques of molecular biology and genetics, including knock-in, knockdown and knockout, are well developed in the model fish. therefore, zebrafish and medaka are excellent new animal systems for the study of vertebrate-specific biology in vivo. the f0 transgenic line can be established once the exogenous gene has been successfully transferred to the embryos, followed by stable germline transmission of the transgene to the f1 generation.
generally, around 10-20 % of treated embryos achieve germline transmission [1] . it has been reported that a foreign gene flanked with inverted terminal repeats of adeno-associated virus can be used to enhance the ubiquitous expression and stable transmission of the transgene in model fish [2] . meanwhile, transgenesis can be facilitated by using the tol2 transposon derived from medaka [3] . transposase catalyzes transposition of a transgene flanked with the tol2 sequence [4] . the efficiency of tol2-mediated germline transmission can range from 50 to 70 % of injected embryos [4, 5] . cutting-edge techniques have taken the study of fish gene transfer to new horizons, such as the generation of knockout zebrafish by the transcription activator-like effector nuclease (talen) system and by clustered regularly interspaced short palindromic repeats (crispr) combined with crispr-associated protein 9 (cas9) [6, 7] . the talen system involves the dna recognition domain of transcription activator-like effectors (tales) and a nuclease domain for the generation of nicks in dna sequences. the crispr/cas9 system directed by a synthetic single guide rna can induce targeted genetic knockout in zebrafish. the main difference between these two systems lies in their recognition mechanisms. unlike the tales applied in the talen system, the crispr/cas9 system recognizes its target dna fragment through a complementary non-coding rna. the development of the talen and crispr/cas9 systems provides new genome editing approaches for establishing genetic knockout fish lines [8] . the fluorescence protein gene (fpg) has been widely applied as a reporter gene in studies of transgene expression by direct visualization under fluorescent microscopy in vivo [9] .
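the rna-guided recognition used by crispr/cas9, as described above, can be sketched in python as a search for candidate target sites. the sketch relies on two assumptions that are not stated in the text: a 20-nucleotide target sequence and the ngg protospacer-adjacent motif (pam) required by the commonly used streptococcus pyogenes cas9. only the forward strand is scanned, and the function names and the example sequence are hypothetical.

```python
import re


def find_cas9_targets(dna, protospacer_len=20):
    """Return (start, protospacer, pam) for every site followed by an NGG PAM.

    Only the given strand is scanned. The lookahead makes overlapping
    candidate sites visible to finditer.
    """
    dna = dna.lower()
    pattern = re.compile(r"(?=([acgt]{%d})([acgt]gg))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def guide_rna_spacer(protospacer):
    """Spacer of the single guide RNA for a protospacer.

    The spacer has the same sequence as the protospacer (with u for t)
    and therefore base-pairs with the complementary target strand.
    """
    return protospacer.lower().replace("t", "u")


# hypothetical 25-nt fragment: one 20-nt site followed by the PAM "agg"
fragment = "gattacagattacagattacaggtt"
targets = find_cas9_targets(fragment)
spacers = [guide_rna_spacer(protospacer) for _, protospacer, _ in targets]
```

a real guide design would also scan the reverse strand and score off-target matches in the genome; those steps are left out here to keep the recognition principle visible.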
many transgenic model fish lines harbor an fpg driven by various tissue-specific promoters, including the erythroid-specific gata promoter [10] , muscle-specific α-actin promoter [11] , rod-specific rhodopsin promoter [12] , neuron-specific isl-1 promoter [13] , pancreas-specific pdx-1 and insulin promoters [14] , myocardium-specific cmlc2 promoter [15] , liver-specific l-fabp promoter [16] , bone-specific col10a1 promoter [17] , macrophage-specific mfap4 promoter [18] , and germ cell-specific vasa promoter [19] . using the medaka β-actin promoter, tsai's lab generated a transgenic line of medaka displaying green fp ubiquitously throughout the whole fish from the f0 through f2 generations in a mendelian inheritance manner [20] . this is known as the first transgenic line of glowing pet fish, which was reported by science [21] and far eastern economic review [22] and was selected by time as one of "the coolest inventions of 2003" [23] . the dna sequences of the aforementioned promoters, ranging from 0.5 to 6.5 kb, are sufficient to drive the fpg reporter to mimic the tissue-specific expression of the endogenous gene. however, some genes require a longer regulatory dna sequence, such as more than 20 kb, to fully recapitulate the characteristic expression profiles of endogenous genes. in that case, bacterial artificial chromosome (bac) and phage p1-derived artificial chromosome (pac) clones have been commonly used [24] . for example, the zebrafish rag1 gene, flanked with pac dna containing 80 kb at the 5′ upstream and 40 kb at the 3′ downstream, can be expressed specifically in lymphoid cells. instead of using the tedious chi-site dependent approach, jessen et al. reported a two-step method to construct a bac clone [25] . employing this protocol, chen et al. constructed a bac clone containing the upstream 150 kb range of zebrafish myf5 and generated a transgenic line tg(myf5:gfp) [26] .
this transgenic line is able to recapitulate the somite-specific and stage-dependent expression of the endogenous myf5 at an early developmental stage. in summary, all the above transgenic lines should be very useful materials for studying both gene regulation and cell development. zebrafish is particularly useful for studying heart development for the following reasons: (a) zebrafish have a primitive form of the heart, which is completely developed within 48 h post-fertilization (hpf). (b) cardiac development can be easily observed in a transgenic line possessing an fp-tagged heart. (c) zebrafish embryos with a defective cardiovascular system can still keep on growing by acquiring oxygen diffused from water. (d) discovery of genes involved in heart development can be facilitated by a simple haploid mutation method [27]. for example, using the zebrafish jekyll mutant, which has defective heart valves, walsh and stainier discovered that udp-glucose dehydrogenase is required for zebrafish embryos to develop normal cardiac valves [28]. tsai's lab was the first group to generate a transgenic zebrafish line that possesses a gfp-tagged heart [15]. this line was established from zebrafish embryos introduced with an expression construct in which the gfp reporter is driven by an upstream control region of the zebrafish cardiac myosin light chain 2 gene (cmlc2). using this transgenic line, raya et al. found that the notch signaling pathway is activated during the regenerative response [29]. shu et al. reported that na,k-atpase α1b1 and α2 isoforms have distinct roles in the patterning of the zebrafish heart [30]. this transgenic line should also be useful for studying the dynamic movement and cell fate of cardiac primordial cells. for example, forouhar et al. proposed a hydro-impedance pump model for the embryonic heart tubes of zebrafish [31]. a 4d dynamic image of cardiac development has been developed [32]. furthermore, hami et al.
reported that a second heart field is required during cardiac development [33]. more recently, nevis et al. showed that tbx1 functions in proliferation of the second heart field, and that the zebrafish tbx1-null mutant resembles the heart defects in digeorge syndrome [34]. thus, the expression pattern of heart-specific genes could be analyzed based on heart progenitor cells collected from this transgenic line. the analysis of gene or protein expression dynamics at different developmental stages could also be conducted. furthermore, this transgenic fish is a potential platform for detecting chemicals, drugs and environmental pollutants affecting heart development, as detailed in a following section. in vivo transient assay of injected dna fragments in model fish embryos is a simple yet effective way to analyze the function of regulatory cis-elements. for example, myf5, one of the myogenic regulatory factors (mrfs), plays key roles in the specification and differentiation of muscle primordial cells during myogenesis. the expression of myf5 is somite-specific and stage-dependent, and its activation and repression are delicately orchestrated. using in vivo transient assay, chen et al. found that a novel cis-element located at −82/−62 is essential for somite-specific expression of myf5 [35]. lee et al. revealed that this −82/−62 cis-element is specifically bound by forkhead box d3 (foxd3), and proposed that somite development is regulated by the pax3-foxd3-myf5 axis [36]. besides foxd3, foxd5, another protein in the forkhead box family, is necessary for maintaining the anterior-posterior polarity of somite cells in mesenchymal-epithelial transition [37]. the expression of foxd5 is regulated by fgf signaling in anterior presomitic mesoderm (psm), which indicates that fgf-foxd5-mesp signaling takes place in somitogenesis [37]. furthermore, analysis of the loci of adjacent mrf4 and myf5 revealed the complicated regulation mechanism of the mrf genes.
it was also found that the biological function of mrf4 is related to myofibril alignment, motor axon growth, and organization of axonal membrane [38]. the molecular mechanism that underlies the repression of myf5 has also been reported. for example (fig. 1a), a strong repressive element of zebrafish myf5 was found within intron i (+502/+835) [39]. this repressive element is modulated by a novel intronic microrna, termed mir-in300 or mir-3906 [40]. when myf5 transcripts reach the highest level after specification, the accumulated mir-3906 starts to reduce the transcription of myf5 through silencing dickkopf-related protein 3 (dkk3r or dkk3a), a positive factor for the myf5 promoter [41]. itgα6b is a receptor of secretory dkk3a, and the interaction between itgα6b and dkk3a is required to drive the downstream signal transduction that regulates myf5 promoter activity in somites during zebrafish embryogenesis [42]. dkk3a regulates p38a phosphorylation to maintain smad4 stability, which in turn enables the formation of the smad2/3a/4 complex required for the activation of the myf5 promoter [43]. however, when myf5 transcripts are reduced at the later differentiation stage, mir-3906 is able to be transcribed by its own promoter [40] (fig. 1b). furthermore, increased expression of mir-3906 controls the intracellular concentration of ca2+ ([ca2+]i) in fast muscle cells through subtly reducing homer-1b expression. the homeostasis of [ca2+]i is required during differentiation to help maintain normal muscle development [40]. nevertheless, it remains to be investigated how mir-3906 switches its target gene at different developmental stages.
fig. 1 a dkk3a interacts with its receptor itgα6b, resulting in the phosphorylation of p38a and the formation of the smad2/3a/4 complex, which in turn activates the myf5 promoter activity. when myf5 is highly transcribed, the intronic mir-3906 suppresses the transcription of myf5 through silencing dkk3a [39, 41-43]. b at the late stage of muscle development, mir-3906 starts transcription at its own promoter and switches to silence homer-1b to control the homeostasis of intracellular calcium concentration ([ca2+]i) in fast muscle cells [40]. either mir-3906-knockdown or homer-1b-overexpression causes the increase of homer-1b protein, resulting in an enhanced level of [ca2+]i, which in turn disrupts sarcomeric actin filament organization. in contrast, either mir-3906-overexpression or homer-1b-knockdown causes the decrease of homer-1b, resulting in a reduced [ca2+]i and thus a defective muscle phenotype [40]
apart from the regulation of somitogenesis, myf5 is also involved in craniofacial muscle development. the functions of myf5 in cranial muscles and cartilage development are independent of myod, suggesting that myf5 and myod are not redundant. thus, three possible pathways could be associated with the molecular regulation between myf5 and myod: (i) myf5 alone is capable of initiating myogenesis, (ii) myod initiates muscle primordia, which is subdivided from the myf5-positive core, and (iii) myod alone, but not myf5, modulates the development of muscle primordia [44]. furthermore, the six1a gene was found to play an important role in the interaction between myf5 and myod [45]. in cartilage development, myf5 is expressed in the paraxial mesoderm at the gastrulation stage. myf5 plays a role in mesoderm fate determination by maintaining the expression of fgf3/8, which in turn promotes differentiation from neural crest cells to craniofacial cartilage [46]. this research on myf5 not only reveals that it has different functions between craniofacial muscle development and somitogenesis, but it also opens up a new study field for understanding craniofacial muscle development. hinits et al.
reported no phenotype in either myf5-knockdown embryos or myf5-null mutants, suggesting that myf5 is rather redundant in somitogenesis of zebrafish [47]. however, it is hard to reasonably explain why these embryos and mutants are all lethal and cannot grow to adulthood. on the other hand, lin et al. reported an observable defective phenotype in myf5-knockdown embryos, and claimed that the concentration of myf5-mo they used can inhibit maternal myf5 mrna translation [46]. this discrepancy might be attributed to the effectiveness of the mo used or to the different phenotypes between the knockdown embryos and the knockout mutant in this case. the retina-specific expression of the carp rhodopsin gene is controlled by two upstream regulatory dna cis-elements [48]. one is located at −63 to −75, which is the carp neural retina leucine zipper response-like element; the other is located at −46 to −52, which is a carp-specific element crucial to reporter gene expression in medaka retinae. intriguingly, immediate activation of early growth response transcriptional regulator egr1 could result in the incomplete differentiation of retina and lens, leading to microphthalmos [49]. another important factor for ocular development is the adp-ribosylation factor-like 6 interacting protein 1 (arl6ip1). loss of arl6ip1 function leads to the absence of retinal neurons, disorganized retinal layers and smaller optic cups [50]. upon losing arl6ip1, retinal progenitors continued to express cyclin d1, but not shh or p57kip2, suggesting that eye progenitor cells remained at the early progenitor stage, and could not exit the cell cycle to undergo differentiation [51]. additionally, it has been reported that arl6ip1 is essential for specification of neural crest derivatives, but not neural crest induction. tu et al.
found that arl6ip1 mutation causes abnormal neural crest derivative tissues as well as reduced expression of neural crest specifier genes, such as foxd3, snail1b and sox10, indicating that arl6ip1 is involved in specification, but not induction, of neural crest cells [52]. furthermore, they found that arl6ip1 could play an important role in the migration of neural crest cells because, in the arl6ip1-knockdown embryos, crestin- and sox10-expressing neural crest cells failed to migrate ventrally from the neural tube into the trunk. more recently, lin et al. found that ras-related nuclear (ran) protein is conjugated with arl6ip1, and proposed that ran protein associates with arl6ip1 to regulate the development of retinae [53]. to date, no in vivo model system has been established to identify cells in the cns that can specifically respond with regeneration after stresses, and, even if identified, no method is in place to trace these responsive cells and further identify their cell fates during hypoxic regeneration. to address these issues, lee et al. generated a transgenic zebrafish line huorfz, which harbors the upstream open reading frame (uorf) from human ccaat/enhancer-binding protein homologous protein gene (chop), fused with the gfp reporter and driven by a cytomegalovirus promoter [54]. after huorfz embryos were treated with heat-shock or under hypoxia, the gfp signal was exclusively expressed in the cns, resulting from impeding the huorf chop-mediated translation inhibition [54]. interestingly, zeng et al. found that gfp-(+) cells in spinal cord respond to stress, survive after stress and differentiate into neurons during regeneration (chih-wei zeng, yasuhiro kamei and huai-jen tsai, unpublished data). micrornas (mirnas) are endogenous single-stranded rna molecules of 19-30 nucleotides (nt) that repress or activate the translation of their target genes through canonical seed and non-canonical centered mirna binding sites.
the known mechanisms involved in mirna-mediated gene silencing are decay of mrnas and blockage of translation [55-57]. the expression of an estimated 30-50 % of human genes is regulated by mirnas [58, 59]. therefore, to understand gene function in cells or embryos, it is important to know exactly the target gene(s) of a specific mirna at different phases of cells or at particular stages of developing embryos. instead of using a bioinformatic approach, tsai's lab developed the labeled mirna pull-down (lamp) assay system, which is a simple but effective method to search for the candidate target gene(s) of a specific mirna under investigation [60]. the lamp assay system yields fewer false-positive results than a bioinformatic approach. taking advantage of lamp, scientists discovered that mir-3906 silences different target genes at different developmental stages, e.g., at early stage, mir-3906 targets dkk3a [41], while at late stage, it targets homer-1b [40] (fig. 1). in another example (fig. 2), mir-1 and mir-206 are two muscle-specific micrornas sharing the same seed sequences. they are able to modulate the expression of vascular endothelial growth factor aa (vegfaa) and serve as cross-tissue signaling regulators between muscle and vessels. since mir-1 and mir-206 share identical seed sequences, stahlhut et al. demonstrated that they can silence the same target gene, such as vegfaa, and considered them as a single cross-tissue regulator termed mir-1/206 [61]. mir-1/206 reduces the level of vegfaa, resulting in the inhibition of angiogenic signaling [61]. surprisingly, using the lamp assay system, lin et al. reported that the target genes for mir-1 and mir-206 are different [62]. while mir-206 targets vegfaa, mir-1 targets seryl-trna synthetase gene (sars). sars is a negative regulator of vegfaa.
although both mir-1 and mir-206 have identical seed sequences, the sars-3′utrs of zebrafish, human and mouse origins can be recognized only by mir-1 in zebrafish embryos and mammalian cell lines (hek-293t and c2c12), but not by mir-206 [62]. conversely, the vegfaa-3′utr is targeted by mir-206, but not by mir-1. therefore, lin et al. concluded that mir-1 and mir-206 are actually two distinct regulators and play opposing roles in zebrafish angiogenesis. the mir-1/sars/vegfaa pathway promotes embryonic angiogenesis by indirectly controlling vegfaa, while the mir-206/vegfaa pathway plays an anti-angiogenic role by directly reducing vegfaa. interestingly, they also found that the mir-1/sars/vegfaa pathway increasingly affects embryonic angiogenesis at late developmental stages in somitic cells [62]. it remains to be studied how mir-1 increases in abundance at late stage.
fig. 2 mir-1 and mir-206 silence different target genes and play opposing roles in zebrafish angiogenesis. both mir-1 and mir-206 are muscle-specific micrornas and share identical seed sequences. however, they silence different target genes to affect the secreted vegfaa level through different pathways [62]. the mir-1/sars/vegfaa pathway plays a positive role in angiogenesis since sars, a negative factor for vegfaa promoter transcription, is silenced by mir-1, resulting in the increase of vegfaa. however, the mir-206/vegfaa pathway plays a negative role since vegfaa is silenced directly by mir-206. dynamic changes of mir-1 and mir-206 levels are also observed [62]. the mir-1 level gradually increases between 12 and 20 hpf and significantly increases further between 20 and 32 hpf, while the mir-206 level is only slightly changed during this same period. consequently, vegfaa increases greatly from 24 to 30 hpf, which might be responsible for the continuous increase of the mir-1/sars/vegfaa pathway, but not the mir-206/vegfaa pathway. therefore, temporal regulation of the expression of mir-1 and mir-206 with different target genes occurs during embryonic angiogenesis in somitic cells of zebrafish
different from mammals, zebrafish have the ability to regenerate injured parts in the cns. many mirnas have been found in the cns. since mirnas are involved in many aspects of development and homeostatic pathways, they usually play important roles in regeneration [63]. it has been shown that several mirnas have prominent functions in regulating the regeneration process. for example, mir-210 promotes spinal cord repair by enhancing angiogenesis [64], and the mir-15 family represses proliferation in the adult mouse heart [65]. furthermore, the mirnas mir-29b and mir-223 are identified following optic nerve crush. by gene ontology analysis, mir-29b and mir-223 are found to regulate genes, including eva1a, layna, nefmb, ina, si:ch211-51a6.2, smoc1, and sb:cb252. these genes are involved in cell survival or apoptosis, indicating that these two mirnas are potential regulators of optic nerve regeneration [66]. although the main hematopoietic sites in zebrafish differ from those in mammals, both zebrafish and mammals share all major blood cell types that arise from common hematopoietic lineages [67]. moreover, many genes and signaling pathways involved in hematopoiesis are conserved among mammals and zebrafish. for example, scl, one of the first transcription factors expressed in early hematopoietic cells, is evolutionarily conserved. during definitive hematopoiesis, runx1 marks hematopoietic stem cells (hscs) in both mouse and fish. additionally, in differentiated populations, gata1, the erythroid lineage regulator, pu.1 and c/ebp, the myeloid lineage regulators, and ikaros, a mark of the lymphoid population, are in accordance with the hematopoietic hierarchy in zebrafish and mammals [68].
thus, the findings with respect to zebrafish blood development could be applied to mammalian systems. genetic screening in zebrafish has generated many blood-related mutants that help researchers understand hematopoietic genes and their functions [69]. for example, the spadetail mutant carrying a mutated tbx16 exhibits defective mesoderm-derived tissues, including blood. this mutant displays decreased levels of tal1, lmo2, gata2, fli1 and gata1 in the posterior lateral mesoderm, indicating the important role of tbx16 during hemangioblast regulation [70]. chemical screening in zebrafish using biologically active compounds is also a powerful approach to identify factors that regulate hscs. for example, it is well known that prostaglandin (pg) e2 increases the induction of stem cells in the aorta-gonad-mesonephros region of zebrafish, as demonstrated by increased expression of runx1 and cmyb, which, in turn, increases engraftment of murine marrow in experimental transplantation [71]. in human clinical trials, the treatment of cord blood cells with dimethyl pge2 caused an increase in long-term engraftment [72], suggesting that a compound identified in zebrafish could have clinical application in humans. model fish are excellent materials for the study of human diseases because some mutants display phenotypes similar to those of human diseases [73]. in addition, essential genes and their regulation in controlling the development of tissues or organs are highly conserved [74]. for example, tbx5 is a t-box transcription factor responsible for cell-type specification and morphogenesis. the phenotypes of tbx5 mutants are highly similar among mammals and zebrafish. thus, transgenic fish with heart-specific fluorescence could provide a high-throughput screening platform for drugs for cardiovascular disease.
for example, tsai's lab established a transgenic line which could be induced to knock down the expression level of cardiac troponin c at any developmental stage, including embryos, larvae or adult fish. the reduction of troponin c resulted in mimicry of dilated cardiomyopathy and incomplete atrioventricular block in humans. therefore, this transgenic line is expected to make a significant contribution to drug screening and the elucidation of the molecular mechanisms underlying cardiovascular diseases. next, the effect of drugs on embryonic development was also studied. amiodarone, which is a class iii antiarrhythmic agent, is used for the treatment of tachyarrhythmia in humans. however, amiodarone-treated zebrafish embryos were found to exhibit backflow of blood in the heart [75]. subsequent research showed that amiodarone caused failure of cardiac valve formation [75]. specifically, amiodarone induces ectopic expression of similar to versican b (s-vcanb), resulting in repression of egfr/gsk3β/snail signaling, which in turn upregulates cdh5 at the heart field and causes defective cardiac valves [76]. moreover, amiodarone was found to repress metastasis of breast cancer cells by inhibiting the egfr/erk/snail pathway [77], a phenomenon analogous to the inhibitory effects of amiodarone on epithelial-mesenchymal transition (emt) observed in the heart. last but not least, although zebrafish has a two-chambered heart, compared with those of mouse, rat, and rabbit, its heart rate, action potential duration (apd) and electrocardiogram (ecg) morphology are similar to those of humans [78, 79]. additionally, tsai et al. demonstrated that the in vitro ecg recording of zebrafish heart is a simple, efficient and high-throughput assay [80]. thus, zebrafish can serve as a platform for direct testing of drug effects on apd prolongation and prolonged qt interval, which is required by the fda as a precondition for drug approval.
zebrafish have become a popular experimental animal for the study of human cancer [81], in part because the fish homologs of human oncogenes and tumor suppressor genes have been identified, and in part because signaling pathways regulating cancer development are conserved [82-84]. amatruda et al. reported that many zebrafish tumors are histologically similar to human cancers [85]. the zebrafish transgenic line with skin-specific red fluorescence could be applied for skin tumor detection [86]. when the embryos of this line were treated with solutions containing arsenic, the tumors induced on the skin could be easily identified by eye under a fluorescence microscope. therefore, this transgenic line can potentially be used for the study of skin diseases. for example, the common skin cancer melanoma may be screened by the red fluorescence expression in this transgenic line. zebrafish transgenic lines could also be applied to establish models simulating melanoma development. the human oncogenic braf v600e was expressed under the control of the zebrafish melanocyte mitfa promoter to establish a melanoma model [87]. combining skin-specific red fluorescence with mitfa-driven oncogene expression, the melanoma could be easily traced. therefore, transgenic lines and mutants of model fish could provide abundant resources for mechanistic studies and therapeutic research in human diseases. metastasis involves sequential, interlinked and selective steps, including invasion, intravasation, arrest in distant capillaries, extravasation, and colonization [88]. zebrafish is again an alternative organism for in vivo cancer biology studies. in particular, xenotransplantation of human cancer cells into zebrafish embryos serves as an alternative approach for evaluating cancer progression and drug screening [89].
for example, human primary tumor cells labeled with fluorescence have already been implanted in zebrafish liver, and the invasiveness and metastasis of these cells were directly observable and easily traceable [90]. to investigate the mechanism of local cancer cell invasion, human glioblastoma cells labeled with fluorescence were injected into the brain of zebrafish embryos. it was observed that the injected cells aligned along the abluminal surface of brain blood vessels [91]. by grafting a small amount of highly metastatic human breast carcinoma cells onto the pericardial membrane of zebrafish embryos at 48 hpf, tumor cells were observed to move longitudinally along the aorta [92]. similarly, highly metastatic human cancer cells labeled with fluorescence were injected into the pericardium of 48-hpf embryos. afterwards, it was possible to visualize how cancer cells entered the blood circulation and arrested in small vessels in the head and tail [93]. in another example, zebrafish embryos were injected with tumorigenic human glioma stem cells at different stages of metastasis, including beginning, approaching, clustering, invading, migrating, and transmigrating [94]. thus, grafting a small number of labeled tumor cells into transparent zebrafish embryos allows us to dynamically monitor the cancer cells without the interference of immune suppression. apart from its utility in analyzing the mechanisms of tumor dissemination and metastasis, the zebrafish model can also be applied to screen potential anticancer compounds or drugs. in addition, zebrafish feature such advantages as easy gene manipulation, short generation cycle, high reproducibility, low maintenance cost, and efficient plating of embryos [95, 96]. therefore, this small fish is second only to scid and nude mice as a xenograft recipient of cancer cells. leukemia is a cancer related to hematopoiesis. most often, leukemia results from the abnormal increase of white blood cells.
however, some human cancers of bone marrow and blood origins have their parental cells from other blood cell types. the search for efficacious therapies for leukemia is ongoing. interestingly, the developmental processes and genes related to hematopoiesis are similar between zebrafish and humans, making zebrafish a feasible model for the study of leukemia. in addition, gene expression in zebrafish could be conveniently modified by several approaches, e.g., mo-induced gene knockdown, talen- and crispr/cas9-mediated gene knockout, and dna/rna-introduced overexpression [6, 7]. in the study of yeh et al. [97], the zebrafish model was applied to screen for chemical modifiers of aml1-eto, an oncogenic fusion protein prevalent in acute myeloid leukemia (aml). treatment of zebrafish with chemical modifiers of aml1-eto resulted in hematopoietic dysregulation and elicited a malignant phenotype similar to human aml. cyclooxygenase-2 (cox2) is an enzyme causing inflammation and pain. nimesulide is an inhibitor of cox2 and an antagonist to aml1-eto in hematopoietic differentiation. fms-like tyrosine kinase 3 (flt3) is a class iii receptor tyrosine kinase which is normally expressed in human hematopoietic stem and progenitor cells (hspcs) [98]. internal tandem duplication (itd), which may occur at either the juxtamembrane domain (jmd) or the tyrosine kinase domain (tkd) of flt3, is observed in one-third of human aml. zebrafish flt3 shares an overall 32, 35, and 34 % sequence identity with that of human, mouse, and rat, respectively. however, the jmd and the activation loops of tkd are highly conserved, implying that the functions of flt3 signaling are evolutionarily conserved. overexpression of human flt3-itd in zebrafish embryos induces the ectopic expansion of flt3-itd positive myeloid cells. if those embryos are treated with ac220, a potent and relatively selective inhibitor of flt3, flt3-itd myeloid expansion is effectively ameliorated [99].
in another example, isocitrate dehydrogenase (idh) 1 and 2 are involved in the citric acid cycle in intermediary metabolism. idh mutations are found in approximately 30 % of cytogenetically normal aml, suggesting a pathogenetic link in leukemia initiation [100, 101]. injection of either human idh1-r132h or zebrafish idh1-r146h, a mutant corresponding to human idh1-r132h, resulted in increased 2-hydroxyglutarate, which in turn induced the expansion of primitive myelopoiesis [102]. taken together, these reports suggest that the molecular pathways involved in leukemia are conserved between humans and zebrafish. based on the aforementioned experimental evidence, zebrafish can be an exceptional platform for mimicking human myelodysplastic syndromes and establishing an in vivo vertebrate model for drug screening. several liver tumor models have been reported by liver-specific expression of transgenic oncogenes such as kras, xmrk and myc. these transgenic lines of zebrafish usually generate liver tumors with various severity from hepatocellular adenoma (hca) to hepatocellular carcinoma (hcc) [103-105]. these three transgenic liver cancer models have been used to identify differentially expressed genes through rna-sage sequencing. for example, researchers have searched for genes either up- or downregulated among the three tumor models and analyzed the possible signaling pathways. then, correlation between zebrafish liver tumor signatures and the different stages of human hepatocarcinogenesis was determined [106]. high tumor incidence and convenient chemical treatment make this inducible transgenic zebrafish a plausible platform for studying liver tumor progression, regression, and anticancer drug screening. interestingly, zebrafish have become a modern model organism for studying depressive disorders [107-109].
because the physiological (neuroanatomical, neuroendocrine, neurochemical) and genetic characteristics of zebrafish are similar to those of mammals, zebrafish are ideal for high-throughput genetic and chemical genetic screening. furthermore, since behavioral tests of zebrafish for cognitive, approach-avoidance, and social paradigms are available, the identification of depression-like indices in response to physiological, genetic, environmental, and/or psychopharmacological alterations is feasible [110]. actually, zebrafish display highly robust phenotypes of neurobehavioral disorders such as anxiety-like and approach-avoidance behaviors. furthermore, novel behavioral indices can be revealed, including geotaxis via top-bottom vertical movement [111]. zebrafish behavior can also be monitored using automated behavioral tracking software, which enhances efficiency and reduces interrater variance [112]. additionally, zebrafish offer a potential insight into the social aspects of depression [113] and may be suitable for studying the cognitive deficits of depression [114] and its putative etiological pathways [115]. last but not least, zebrafish are highly sensitive to psychotropic drugs, such as antidepressants, anxiolytics, mood stabilizers, and antipsychotics [116-118], serving as an important tool for drug discovery. aromatic hydrocarbons, heavy metals and environmental estrogens are currently being used to test the impact of environmental pollutants on animals [119]. these studies mainly focused on mortality and abnormality rates. however, the developing embryos may have already been damaged in a subtle way that would have precluded direct observation of morphology and detection of mortality.
to overcome this drawback, transgenic fish can be used because they are designed to study (a) whether toxicants cause defective genes during embryogenesis; (b) whether pollutants affect the expression of tissue-specific genes; and (c) whether the impact of pollutants on embryonic development is dosage dependent. pollutants can be directly detected by simply observing the coloration change of cells before or after the pollutants cause morphological damage. therefore, transgenic model fish are promising organisms for use as bioindicators of environmental toxicants and mutagens [120, 121]. in addition, chen and lu reported that environmental xenobiotics can be detected by a transgenic line of medaka carrying a gfp reporter driven by the cytochrome p450 1a promoter (cyp1a-gfp) [122]. furthermore, environmental xenoestrogenic compounds can be specifically detected by a hybrid transgenic line derived from crossing line cyp1a-gfp with line vg-lux, whose lux reporter activity is driven by a vitellogenin promoter [123]. lee et al. reported another zebrafish transgenic line, termed huorfz [54], as described in a previous section. under normal conditions, the translation of the transferred huorf chop-gfp mrna in huorfz embryos is completely suppressed by an inhibitory uorf of human chop mrna (huorf chop). however, when the huorfz embryos were under er stress, such as heat shock, cold shock, hypoxia, metals, alcohol, toxicants or drugs, the downstream gfp became apparent due to the blockage of huorf chop-mediated translation inhibition. therefore, huorfz embryos can be used to study the mechanism of translational inhibition. additionally, huorfz embryos can serve as a living material to monitor the contamination of hazardous pollutants [124]. besides the universal huorfz system, zebrafish could also be indicators for specific pollutants. for example, xu et al.
reported a transgenic zebrafish tg (cyp1a:gfp) which can serve as an in vivo assay for screening xenobiotic compounds, since cyp1a is involved in the aryl hydrocarbon receptor pathway, and can be induced in the presence of dioxins/dioxin-like compounds and polycyclic aromatic hydrocarbons [125]. additional advantages of zebrafish include the small size, abundant number, rapid development and transparent eggs. these features make this model fish more accessible for the studies of molecular toxicology.

it is increasingly clear that the transgenic fish model is a powerful biomaterial for the studies of multiple disciplines, including molecular biology, developmental biology, neurobiology, cancer biology and regenerative medicine. it provides a simple, yet effective, in vivo approach to identify regulatory dna sequences, as well as determine gene function and molecular pathways. more importantly, an increasing number of papers have reported that (a) the defective phenotypes of model fish mutants can phenocopy known human disorders; and (b) drugs have similar effects on zebrafish and mammalian systems. therefore, the transgenic fish model offers a useful platform for high-throughput drug screening in biomedical sciences. additionally, it can serve as an environmental indicator for detecting pollutants in our daily lives. nevertheless, there are several limitations and caveats of this fish model. first, unlike mammals, fish lack heart septation, lungs, mammary glands, prostate glands and limbs, which makes the fish model unsuitable for studies of these tissues and organs. additionally, fish lack a placenta, so fish embryos are directly exposed to the environment (e.g., drugs or pollutants) without any placental barrier. second, fish are poikilothermic and usually maintained below 30°c, which may not be optimal for those mammalian agents adapted for 37°c in evolution.
last, since the zebrafish genome is tetraploid, it is less straightforward to conduct loss-of-function studies for certain genes.

the molecular biology of transgenic fish enhanced expression and stable transmission of transgenes flanked by inverted terminal repeats from adeno-associated virus in zebrafish identification of the tol2 transposase of the medaka fish oryzias latipes that catalyzes excision of a nonautonomous tol2 element in zebrafish danio rerio functional dissection of the tol2 transposable element identified the minimal cis-sequence and a highly repetitive sequence in the subterminal region essential for transposition a transposon-mediated gene trap approach identifies developmentally regulated genes in zebrafish heritable gene targeting in zebrafish using customized talens efficient genome editing in zebrafish using a crispr-cas system crispr/cas9 and talen-mediated knock-in approaches in zebrafish the aequorea victoria green fluorescent protein can be used as a reporter in live zebrafish embryos gata-1 expression pattern can be recapitulated in living transgenic zebrafish using gfp reporter gene high-frequency generation of transgenic zebrafish which reliably express gfp in whole muscles or the whole body by using promoters of zebrafish origin isolation of a zebrafish rod opsin promoter to generate a transgenic zebrafish line expressing enhanced green fluorescent protein in rod photoreceptors visualization of cranial motor neurons in live transgenic zebrafish expressing green fluorescent protein under the control of the islet-1 promoter/enhancer analysis of pancreatic development in living transgenic zebrafish embryos germ-line transmission of a myocardium-specific gfp transgene reveals critical regulatory elements in the cardiac myosin light chain 2 promoter of zebrafish 435-bp liver regulatory sequence in the liver fatty acid binding protein (l-fabp) gene is sufficient to modulate liver regional expression in transgenic zebrafish establishment of
a bone-specific col10a1: gfp transgenic zebrafish the macrophage-specific promoter mfap4 allows live, long-term analysis of macrophage behavior during mycobacterial infection in zebrafish expression of a vas:: egfp transgene in primordial germ cells of the zebrafish uniform gfp-expression in transgenic medaka (oryzias latipes) at the f0 generation random samples: that special glow genetics: fish that glow in taiwan coolest inventions 2003: light and dark-red fish, blue fish and glow-in-dark fish modification of bacterial artificial chromosomes through chi-stimulated homologous recombination and its application in zebrafish transgenesis artificial chromosome transgenesis reveals longdistance negative regulation of rag1 in zebrafish multiple upstream modules regulate zebrafish myf5 expression use of the gal4-uas technique for targeted gene expression in zebrafish udp-glucose dehydrogenase required for cardiac valve formation in zebrafish activation of notch signaling pathway precedes heart regeneration in zebrafish k-atpase is essential for embryonic heart development in the zebrafish the embryonic vertebrate heart tube is a dynamic suction pump four-dimensional cardiac imaging in living embryos via postacquisition synchronization of nongated slice sequences zebrafish cardiac development requires a conserved secondary heart field tbx1 is required for second heart field proliferation in zebrafish novel regulatory sequence − 82/-62 functions as a key element to drive the somite-specificity of zebrafish myf-5 foxd3 mediates zebrafish myf5 expression during early somitogenesis foxd5 mediates anterior-posterior polarity through upstream modulator fgf signaling during zebrafish somitogenesis inactivation of zebrafish mrf4 leads to myofibril misalignment and motor axon growth disorganization novel cis-element in intron 1 represses somite expression of zebrafish myf-5 microrna-3906 regulates fast muscle differentiation through modulating the target gene homer-1b in zebrafish 
embryos novel intronic microrna represses zebrafish myf5 promoter activity through silencing dickkopf-3 gene zebrafish dkk3a protein regulates the activity of myf5 promoter through interaction with membrane receptor integrin α6b dickkopf-3-related gene regulates the expression of zebrafish myf5 gene through phosphorylated p38a-dependent smad4 activity myogenic regulatory factors myf5 and myod function distinctly during craniofacial myogenesis of zebrafish the transcription factor six1a plays an essential role in the craniofacial myogenesis of zebrafish normal function of myf5 during gastrulation is required for pharyngeal arch cartilage development in zebrafish embryos differential requirements for myogenic regulatory factors distinguish medial and lateral somitic, cranial and fin muscle fibre populations retina-specific ciselements and binding nuclear proteins of carp rhodopsin gene egr1 gene knockdown affects embryonic ocular development in zebrafish the embryonic expression patterns and the knockdown phenotypes of zebrafish adp-ribosylation factor-like 6 interacting protein gene arl6ip1 plays a role in proliferation during zebrafish retinogenesis zebrafish arl6ip1 is required for neural crest development during embryogenesis ras-related nuclear protein is required for late developmental stages of retinal cells in zebrafish eyes transgenic zebrafish model to study translational control mediated by upstream open reading frame of human chop gene a parsimonious model for gene regulation by mirnas gene silencing by micrornas: contributions of translational repression and mrna decay regulation of mrna translation and stability by micrornas micrornas: target recognition and regulatory functions microrna target predictions in animals labeled microrna pull-down assay system: an experimental approach for high-throughput identification of micrornatarget mrnas mir-1 and mir-206 regulate angiogenesis by modulating vegfa expression in zebrafish mir-1 and mir-206 target 
different genes to have opposing roles during angiogenesis in zebrafish embryos concise review: new frontiers in microrna-based tissue regeneration administration of microrna-210 promotes spinal cord regeneration in mice regulation of neonatal and adult mammalian heart regeneration by the mir-15 family integrated analyses of zebrafish mirna and mrna expression profiles identify mir-29b and mir-223 as potential regulators of optic nerve regeneration transplantation and in vivo imaging of multilineage engraftment in zebrafish bloodless mutants hematopoiesis: an evolving paradigm for stem cell biology transcriptional regulation of hematopoietic stem cell development in zebrafish mutantspecific gene programs in the zebrafish prostaglandin e2 regulates vertebrate haematopoietic stem cell homeostasis prostaglandin e2 enhances human cord blood stem cell xenotransplants and shows long-term safety in preclinical nonhuman primate transplant models from zebrafish to human: modular medical models the heartstrings mutation in zebrafish causes heart/fin tbx5 deficiency syndrome the toxic effect of amiodarone on valve formation in the developing heart of zebrafish embryos amiodarone induces overexpression of similar to versican b to repress the egfr/gsk3b/snail signaling axis during cardiac valve formation of zebrafish embryos cancer metastasis and egfr signaling is suppressed by amiodarone-induced versican v2 in vivo recording of adult zebrafish electrocardiogram and assessment of drug-induced qt prolongation zebrafish model for human long qt syndrome in-vitro recording of adult zebrafish heart electrocardiogram -a platform for pharmacological testing liver development and cancer formation in zebrafish zebrafish as a cancer model zebrafish modelling of leukaemias catch of the day: zebrafish as a human cancer model zebrafish as a cancer model system a keratin 18 transgenic zebrafish tg(k18(2.9):rfp) treated with inorganic arsenite reveals visible overproliferation of epithelial 
cells braf mutations are sufficient to promote nevi formation and cooperate with p53 in the genesis of melanoma the pathogenesis of cancer metastasis: the 'seed and soil' hypothesis revisited zebrafish xenotransplantation as a tool for in vivo cancer study metastatic behaviour of primary human tumours in a zebrafish xenotransplantation model calpain 2 is required for the invasion of glioblastoma cells in the zebrafish brain microenvironment distinct contributions of angiogenesis and vascular co-option during the initiation of primary microtumors and micrometastases visualizing extravasation dynamics of metastatic tumor cells a novel zebrafish xenotransplantation model for study of glioma stem cell invasion quantitative phenotyping-based in vivo chemical screening in a zebrafish model of leukemia stem cell xenotransplantation zebrafish-based systems pharmacology of cancer metastasis discovering chemical modifiers of oncogene-regulated hematopoietic differentiation stk-1, the human homolog of flk-2/flt-3, is selectively expressed in cd34+ human bone marrow cells and is involved in the proliferation of early progenitor/ stem cells functions of flt3 in zebrafish hematopoiesis and its relevance to human acute myeloid leukemia cancer-associated metabolite 2-hydroxyglutarate accumulates in acute myelogenous leukemia with isocitrate dehydrogenase 1 and 2 mutations regulation of cancer cell metabolism functions of idh1 and its mutation in the regulation of developmental hematopoiesis in zebrafish inducible and repressable oncogene-addicted hepatocellular carcinoma in tet-on xmrk transgenic zebrafish an inducible kras (v12) transgenic zebrafish model for liver tumorigenesis and chemical drug screening a transgenic zebrafish liver tumor model with inducible myc expression reveals conserved myc signatures with mammalian liver tumors xmrk, kras and myc transgenic zebrafish liver cancer models share molecular signatures with subsets of human hepatocellular carcinoma gaining 
translational momentum: more zebrafish models for neuroscience research zebrafish as an emerging model for studying complex brain disorders zebrafish models for translational neuroscience research: from tank to bedside zebrafish models of major depressive disorders threedimensional neurophenotyping of adult zebrafish behavior aquatic blues: modeling depression and antidepressant action in zebrafish social modulation of brain monoamine levels in zebrafish can zebrafish learn spatial tasks? an empirical analysis of place and single cs-us associative learning cognitive dysfunction in depression-pathophysiology and novel targets a larval zebrafish model of bipolar disorder as a screening platform for neuro-therapeutics role of serotonin in zebrafish (danio rerio) anxiety: relationship with serotonin levels and effect of buspirone, way 100635, sb 224289, fluoxetine and para-chlorophenylalanine (pcpa) in two behavioral models an affective disorder in zebrafish with mutation of the glucocorticoid receptor global water pollution and human health transgenic zebrafish for detecting mutations caused by compounds in aquatic environments mutational spectra of benzo [a] pyrene and meiqx in rpsl transgenic zebrafish embryos transgenic fish technology: basic principles and their application in basic and applied research gfp transgenic medaka (oryzias latipes) under the inducible cyp1a promoter provide a sensitive and convenient biological indicator for the presence of tcdd and other persistent organic chemicals zebrafish transgenic line huorfz is an effective living bioindicator for detecting environmental toxicants generation of tg (cyp1a:gfp) transgenic zebrafish for development of a convenient and sensitive in vivo assay for aryl hydrocarbon receptor activity the authors declare that they have no competing interests.authors' contributions hjt conceptualized, organized, charged and revised the content, and hjt, cyl and cyc wrote the manuscript together. 
all authors read and approved the final manuscript.

key: cord-000248-zueoyesj authors: berretta, regina; moscato, pablo title: cancer biomarker discovery: the entropic hallmark date: 2010-08-18 journal: plos one doi: 10.1371/journal.pone.0012262 sha: doc_id: 248 cord_uid: zueoyesj background: it is a commonly accepted belief that cancer cells modify their transcriptional state during the progression of the disease. we propose that the progression of cancer cells towards malignant phenotypes can be efficiently tracked using high-throughput technologies that follow the gradual changes observed in the gene expression profiles by employing shannon's mathematical theory of communication. methods based on information theory can then quantify the divergence of cancer cells' transcriptional profiles from those of normally appearing cells of the originating tissues. the relevance of the proposed methods can be evaluated using microarray datasets available in the public domain but the method is in principle applicable to other high-throughput methods. methodology/principal findings: using melanoma and prostate cancer datasets we illustrate how it is possible to employ shannon entropy and the jensen-shannon divergence to trace the transcriptional changes progression of the disease. we establish how the variations of these two measures correlate with established biomarkers of cancer progression. the information theory measures allow us to identify novel biomarkers for both progressive and relatively more sudden transcriptional changes leading to malignant phenotypes. at the same time, the methodology was able to validate a large number of genes and processes that seem to be implicated in the progression of melanoma and prostate cancer.
conclusions/significance: we thus present a quantitative guiding rule, a new unifying hallmark of cancer: the cancer cell's transcriptome changes lead to measurable observed transitions of normalized shannon entropy values (as measured by high-throughput technologies). at the same time, tumor cells increment their divergence from the normal tissue profile, increasing their disorder via creation of states that we might not directly measure. this unifying hallmark allows us, via the jensen-shannon divergence, to identify the arrow of time of the processes from the gene expression profiles, and helps to map the phenotypical and molecular hallmarks of specific cancer subtypes. the deep mathematical basis of the approach allows us to suggest that this principle is, hopefully, of general applicability for other diseases.

in a seminal review paper published nine years ago, hanahan and weinberg [1] introduced the ''hallmarks of cancer''. they are six essential alterations of cell physiology that generally occur in cancer cells independently of the originating tissue type. they listed: ''self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of the normal programmed-cell mechanisms (apoptosis), limitless replicative potential, sustained angiogenesis, and finally, tissue invasion and metastasis''. more recently, several researchers have advocated including ''stemness'' as the seventh hallmark of cancer cells. this conclusion has been reached from the outcomes of the analysis of high-throughput gene expression datasets [2, 3]. the new role of stemness as a hallmark change of cancer cells is also supported by the observation that histologically poorly differentiated tumors show transcriptional profiles in which there is an overexpression of genes normally enriched in embryonic stem cells.
for example, in breast cancer the activation targets of pluripotency markers like nanog, oct4, sox2 and c-myc have been shown to be overexpressed in poorly differentiated tumors, in marked contrast with their expression in well-differentiated tumors [4]. other authors suggest different hallmarks, with many papers pointing to alternative processes as the primary focus of their research. the difference may stem from the fact that these authors prefer to cite as ''key hallmarks'' physiological changes which occur at a ''lower level'' scale closer to the molecular events. these authors cite, for example, ''mitochondrial dysfunction'' [5, 6] (including, but not limited to ''glucose avidity'' [7] and ''a shift in glucose metabolism from oxidative phosphorylation to glycolysis'' [6, 8], ''altered glycolysis'' [9], ''altered bioenergetic function of mitochondria'' [10]), ''dysregulation of cell cycle and defective genome-integrity checkpoints'' [11], ''aberrant dna methylation'' [12] (''promoter hypermethylation of hallmark cancer genes'' [13] and ''cpg island hypermethylation and global genomic hypomethylation'' [14]), ''shift in cellular metabolism'' [15, 16, 17], ''regional hypoxia'' [18], ''microenvironment acidosis'' [19], ''abnormal microrna regulation'' [20, 21], ''aneuploidy'' and ''chromosome aberrations'' [22, 23, 24, 25, 26], ''disruption of cellular junctions'' [27], ''avoidance of the immune response'' [28], ''pre-existing chronic inflammatory conditions'' [29, 30], ''cancer-related inflammation'' [29], ''disabled autophagy'' [28], ''impaired cellular senescence'' [31], ''altered nf-kappab signalling'' [32], ''altered growth patterns, not altered growth per se'' [33], ''disregulated dna methylation and histone modifications'' [34], ''tissue dedifferentiation'' [35, 36], and ''somatically heritable molecular alterations'' [37]. this research enriches the list of the most important cancer hallmarks.
however, because these physiological changes occur at a ''lower'' molecular level, they are likely related sub-events of the original seven rather than newly discovered ''key hallmarks''. more recently, luo et al. attempted a ''stress-based'' description of some of the hallmarks in terms of ''stresses'' (''dna damage/replication stress, proteotoxic stress, mitotic stress, metabolic stress, and oxidative stress'') [38]. while this is an interesting descriptive grouping, it is still a phenotypical characterization. what is needed is a higher level unifying genotypical characterization, from which individual disregulated processes can be identified in a quantitative way using the existing high-throughput data capture methodologies. it is clear that a unifying hallmark is needed if we aim at quantifying the cell's progression. it is then evident for us that a unifying mathematical formalism is necessary to uncover the cell transcriptome's progression from a normal to a more malignant phenotype. we start our quest assuming an implicit working hypothesis common to many research groups around the world: the macroscopic physiological changes (i.e. hanahan and weinberg's ''hallmarks'') must also correlate with global alterations of the molecular profiles of gene transcription. it is also assumed that the ''hallmark changes'' occur along a certain timeline, but that some of the sub-processes discussed before are concurrent. these processes may start in a slow incremental way with some of the major changes being early events while others (e.g. tissue invasion and metastasis) are likely later processes triggered by new events during cancer progression. the timeline is not explicit and it is also likely that cancer subtypes progress to similar timelines. in some cases the sequence of events is better understood (e.g. some leukaemia subtypes [39]). the elicitation and regulation of molecular events is likely to be an ongoing quest during this century for many types of cancer.
it is not to be assumed that some of the transitions of the transcriptome are gradual. that is a hypothesis that is unnecessary in this study. we envision that the progression of cancer may have ''switches'', with a number of concurrent converging events leading to macroscopic observable changes in the gene expression profile resulting in dramatic variations of expression patterns. for instance, these molecular switches might be characterized not by a single ''oncogene'' but by a large number of genes that have changed their transcriptional state. these abrupt changes may be triggered by the confluence of several non-linear interactions, and are likely to be related to the physiological hallmarks we refer to above. the presence of macroscopic observable changes that are computable from a large number of relatively smaller changes means that it may be possible to find an objective mathematical formalism to infer the turning point at which these radical changes occur. it is then evident that computing the jensen-shannon divergences, the normalized shannon entropy, and the statistical complexity of samples reveals different global transcriptional changes. it is, however, not easy to infer whether these changes would correlate with a gradual progression or with sudden changes. nevertheless, one valid mathematical possibility is that the most important ''hallmark of cancer'', a unifying principle above all, is the existence of a measurable gradual ''progression'' from a well-differentiated gene expression profile (corresponding to a healthy tissue). this would reveal the timeline of a higher level process that is observable and measurable via a change of normalized shannon entropy and an increment of jensen-shannon divergences from the originating tissue type.
if this is the case, by correlating the changes in information theory quantifiers with the expression of the genes we would be able not only to uncover useful biomarkers to track this progression but also to explain the ''hallmarks'' in an ordered timeline. the timeline also yields clinically and translationally important outcomes. such an analytical methodology will naturally produce ''a continuous staging'' of the cancer samples, based on the solid foundations of information theory and on the knowledge of the transcriptional profile of healthy cells as a reference to measure divergences. in addition, as a mathematical methodology, it can be applied to other high-throughput technologies for which a probability distribution function of observed abundances has been computed. with these ideas in mind, we provide a ''transcriptomic-driven'' method revealing important biomarkers for cancer progression and a direction of time in which they are presented. the method, however, is generalizable to other types of high-throughput technologies (e.g. proteomic studies).

we have chosen two types of cancers to study which are almost at the antipodes in terms of progression rates: prostate cancer and melanoma. prostate cancer progresses very slowly. pathological samples are common in autopsies of men as young as 20 years old. by the age of 70 more than 80% of men have these alterations, a fact that already shows a relationship of this cancer type with increasing age. the clinical management of prostate cancer requires the identification of the so-called gleason patterns in the biopsies [40], which after almost fifty years is still ''the sole prostatic carcinoma grading system recommended by the world health organization''. however, undergrading, underdiagnosis, interobserver reproducibility and variable trends in grading have been observed as major problems [41, 42].
melanoma, on the other hand, differs from prostate cancer in its rapid progression [43] and it is considered one of the most aggressive types of cancer. one of melanoma's usual markers of progression and concern (i.e. thickness) is measured in millimetres, which gives a rough idea of how devastatingly fast the disease can spread. we will present our results starting with one prostate cancer dataset, followed by another in melanoma, to come back to the prostate cancer discussion using another highly relevant dataset. this is a departure from the alternative approach in which each disease is discussed in separate sections. however, after considering several possibilities, we are convinced that our approach is the most appropriate to showcase the technique and its power. details on the datasets and methods used are given in the 'materials and methods' section of this paper. we also refer to the original studies and manuscripts associated with the three datasets we analysed. and available at the web address given above). after imputation of missing values, we first calculated the normalized shannon entropy and the mpr-statistical complexity for each sample. the following section explains the context in which our results were generated (refer to the 'materials and methods' section for detail on how our quantities are computed). the normalized shannon entropy measure is widely used in ecosystem modelling to quantify species diversity, where it is acknowledged as having great sensitivity to relative abundances of species in an ecosystem [45]. we utilise the same sensitivity to differentiate samples in cancer datasets. figure 1 shows that the normalized shannon entropy of prostate cancer tumor samples does not differ much from that of normal samples. this is in contrast to lymph node metastasis samples, which appear to have smaller values of normalized shannon entropy.
a mathematical interpretation of this result is that the samples from lymph node metastases have cells that have not only varied their transcriptomic profile, they have also ''peaked'' the distribution of expression values with significant fold increases on a smaller number of probes. this explains the reduction in normalized shannon entropy. we note that there are several mechanisms that can explain a macroscopically observable global reduction of transcription. for instance, this may indicate that a relatively large number of genes have reduced their expression levels by genome damage, changes in gene regulation, or other silencing processes. it is reassuring to observe that the changes of the most prototypical quantitative measure we can draw from information theory, the normalized shannon entropy, correlate well with the transition from normal samples to ones with metastases. however, it is also evident from figure 1 that normal samples do not differentiate much from the tumor group (the normalized shannon entropy values do not differ much). it is then not the number of genes with high expression values, but the change in the distribution of expression levels on the molecular profile, that can provide the other measure that could distinguish these other samples. this must be handled by the other statistical complexity measures to be discussed next. several statistical complexity measures can be defined which aim to clarify our argument. we will first discuss the results of computing the mpr-statistical complexity measure (in the previous figure the y-coordinates correspond to the mpr-statistical complexity values of each sample). the mpr-statistical complexity is proportional to both the normalized shannon entropy associated with the transcription profile and the jensen-shannon divergence between that probability density function and the uniform probability distribution.
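the quantities just described can be sketched in a few lines of python. this is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a sample's expression values are non-negative and are turned into a probability distribution by dividing by their sum, it uses base-2 logarithms, and it omits the normalization constant of the mpr-statistical complexity (the product below is only proportional to it). the function names and the two toy profiles are hypothetical.

```python
import math

def to_distribution(expression):
    """Rescale non-negative abundances so that they sum to 1 (assumed normalization)."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must have a positive sum")
    return [x / total for x in expression]

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def normalized_entropy(p):
    """Entropy divided by its maximum, log2(N), so the value lies in [0, 1]."""
    return shannon_entropy(p) / math.log2(len(p))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: entropy of the mixture minus the mean entropy."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2

def mpr_like_complexity(p):
    """Normalized entropy times the divergence to the uniform distribution.
    The MPR normalization constant is deliberately left out (assumption)."""
    uniform = [1 / len(p)] * len(p)
    return normalized_entropy(p) * jensen_shannon(p, uniform)

# Two hypothetical four-probe profiles: one flat, one dominated by a single probe.
flat = to_distribution([10, 10, 10, 10])
peaked = to_distribution([37, 1, 1, 1])

print(normalized_entropy(flat), normalized_entropy(peaked))
print(mpr_like_complexity(flat), mpr_like_complexity(peaked))
```

in this toy case the ''peaked'' profile has the lower normalized entropy, which mirrors the interpretation given above for the lymph node metastasis samples, while the flat profile has zero divergence from the uniform distribution and therefore zero complexity.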
again, we refer the reader to the 'materials and methods' section for an explanation of how these magnitudes are computed. although the results of using the mpr-statistical complexity might not seem particularly impressive, there are a few reasons why we introduce them at this stage. we want to illustrate a fact that can already be observed when we employ this measure on this dataset. in this dataset, for a given entropy value interval, normal tissue samples tend to have relatively lower mpr-statistical complexity values than tumor and lymph node metastasis samples. this means that both prostate cancer and metastases samples diverge from a ''more uniform'' distribution, indicating that the distribution ''peaks'' in fewer active genes. it also means that, in terms of the jensen-shannon divergence, the transcriptional profile of a normal prostate cell sample is ''closer'' to a uniform distribution than the one that is observed in a prostate cancer cell sample. the reader will readily argue, and with reason, that the transcriptional profile of a normal cell is tissue-specific and that it hardly resembles a uniform distribution of expression values. that is correct and this observation motivates the introduction of two new statistical complexity measures. we generically call these two variants 'm-complexities' (with 'm' standing for ''modified''). they have the same functional form as the mpr-statistical complexity, but instead of computing the jensen-shannon divergence from a uniform probability distribution we compute it against an ad hoc probability distribution function derived from the data. in this sense, these measures are more supervised than the mpr-statistical complexity is. another perspective is that the mpr-statistical complexity is a special case of this measure in which the ad hoc probability distribution function of reference is the equiprobability distribution.
figure 1 (lapointe et al. dataset [44]). metastatic samples have typically lower values of normalized shannon entropy than normal samples and prostate cancer primary tumors. the reduction in normalized shannon entropy indicates that there exists a significant reduction on the expression of a large number of genes, or that the gene profile of metastatic samples has a more ''peaked'' distribution (due to the upregulation of a selected subset of genes). both possibilities just cited are not mutually exclusive. we also note that neither the normalized shannon entropy, nor the mpr-statistical complexity (as a single unsupervised quantifier), can help differentiate between tumor and normal samples, indicating that other information theory quantifiers are required for this discrimination. doi:10.1371/journal.pone.0012262.g001

the relevance of this measure derives from being a general definition that allows accommodating several different reference states. we will use it to measure divergences to the ''initial'' and ''final'' transcriptomic states (two states of reference). taken as computed averages over normal samples, and respectively metastatic ones, these measures will allow tracking the processes of differentiation of a cancer cell from a particular tissue type. for example, using lapointe et al.'s dataset, the m-normal statistical complexity quantifier first requires the computation of the probability distribution function of the average gene expression profile of all normal prostate samples. afterwards, the normalized shannon entropy and the jensen-shannon divergence of any sample profile will be computed using the divergence to that averaged normal distribution. analogously, we compute the m-metastases statistical complexity quantifier by first calculating the average profile of the metastases samples, and then generating the corresponding probability distribution function, finally computing the jensen-shannon divergence with that profile.
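the two m-complexities can be sketched along the same lines. the python code below is a hedged illustration rather than the published procedure: it assumes the reference distribution is obtained by averaging the raw profiles of the reference group probe by probe and then dividing by the sum, and it again leaves out any normalization constant. the helper names and the tiny four-probe ''normal'', ''metastases'' and ''tumor'' profiles are invented purely for demonstration.

```python
import math

def to_distribution(values):
    """Rescale non-negative abundances so that they sum to 1 (assumed normalization)."""
    total = sum(values)
    return [v / total for v in values]

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions of equal length."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2

def average_profile(samples):
    """Probe-by-probe mean over a group of samples (rows are samples)."""
    return [sum(column) / len(samples) for column in zip(*samples)]

def m_complexity(sample, reference_samples):
    """Normalized entropy of the sample times its divergence to the
    distribution of the averaged reference profile (unnormalized sketch)."""
    p = to_distribution(sample)
    reference = to_distribution(average_profile(reference_samples))
    return entropy(p) / math.log2(len(p)) * jensen_shannon(p, reference)

# Hypothetical toy data with four probes per sample.
normal = [[5, 5, 5, 5], [6, 4, 5, 5]]
metastases = [[20, 1, 1, 1], [18, 2, 1, 1]]
tumor = [10, 4, 3, 3]

for name, sample in [("normal", normal[0]), ("tumor", tumor), ("metastasis", metastases[0])]:
    m_normal = m_complexity(sample, normal)
    m_metastases = m_complexity(sample, metastases)
    print(name, round(m_normal, 4), round(m_metastases, 4))
```

plotting the first value against the second for every sample would give a plane of the kind discussed in the text, with each group of reference samples lying close to zero on its own axis.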
we refer to the 'materials and methods' section for details of the calculations. the results can be observed in figure 2. on the x-axis, the lymph node metastases have the largest values of m-normal, indicating a divergence from the normal profile. in addition, the m-metastases values of normal samples tend to be higher than those of most of the metastasis samples (with the exception of only one). figure 2 shows a gradual progression of the sample positions on this plane from a well-differentiated, tissue-type-specific profile, first to a more heterogeneous primary tumor cluster, and finally to an even less differentiated metastatic profile. the result presented in figure 2 shows that the prostate cancer samples, which are not metastases and therefore could have been scattered anywhere on the plane, are clustered in a particular confined area between the two other groups. we understand that there are reasons to be sceptical and to suspect that this result is just a simple consequence of the gene selection process used by lapointe et al. for example, if we assume that the 5,153 probes singled out by lapointe et al. in figure one of ref. [44] (and that constitute our original data) have been selected with a supervised method that tries to distinguish between normal and metastases samples, then the relative position of normal and metastases samples is perhaps something to be expected. however, even under that assumption, what is not expected is the position of all primary tumor prostate cancer samples, linking the normal cluster of samples with the metastases one. note that the definitions of both the m-normal and m-metastases measures do not use any information from the primary tumor prostate cancer samples, so the location of these samples between the normal cluster and the metastases, bridging them naturally, is something to highlight.
together with figure 1, it gives evidence that supports the working hypothesis that a gradual ''progression'' occurs, from the normal tissue-specific profile to the metastasis one. indeed, following our line of argument, figure 2 has even more relevance when we highlight the fact that the 5,153 probes have not been selected with a supervised method. the authors say that the only selection criterion was to single out the 5,153 cdnas whose expression varied most across samples. in the supplementary notes of their paper the authors say: ''we included for subsequent analysis only well measured genes whose expression varied, as determined by (1) signal intensity over background >1.5-fold in both test and reference channels in at least 75% of samples, and (2) 3-fold ratio variation from the mean in at least two samples; 5,153 genes met these criteria.'' as a consequence, figure 2 has been generated without class selection bias, using only the genes that have the most varied expression pattern. we now turn to another aspect of the statistical complexity and entropy analysis. we note that figure 2 shows that the metastases samples have a clear reduction in normalized shannon entropy in comparison with the values observed for the normal samples. at the same time, metastases samples, as expected, have higher m-normal complexity than the normal samples (figure 2). it is then interesting to evaluate the value of the jensen-shannon divergence of these samples and to identify the genes that most correlate with the variations of the jensen-shannon divergence, in order to quantify one of the factors that is related to the statistical complexity changes. we have computed the correlation of the gene expression profile corresponding to each of the 5,123 probes.
for each of the 5,123 probes, we computed both the pearson correlation (x-axis of figure 3) and the spearman correlation (y-axis of figure 3) of each probe profile with the jensen-shannon divergence having as probability distribution of reference that of a metastasis profile (these values are called jsm2-pearson and jsm2-spearman in the accompanying excel file provided). with this data, we have produced figure 3, a scatter plot of the values associated with each probe. in this figure, there are two probes that are immediately recognizable by any cancer researcher, and in particular by those working in prostate cancer: klk3/psa (prostate specific antigen) and fos. the interpretation of these scatter plots is not immediate and needs an introductory explanation. each dot corresponds to one probe of the array. for example, a dot that is very close to the origin of coordinates (0,0) indicates a probe whose pattern of gene expression (across all samples) is not correlated with the jensen-shannon divergence to the average profile of a metastasis pattern. it is, in essence, a probe which is highly uninteresting in this regard. probes that have a high correlation, across all samples, either positive or negative, with the jensen-shannon divergence to the average profile of a metastasis pattern are highly informative. they ''co-express'' with this measure. although we provide in the supplementary material the information corresponding to all probes, we will discuss just a few of them. this will allow the reader to understand these plots and will put our results in perspective with current research in prostate cancer. we particularly highlight the position of klk3/psa, fos and ccl2. to our surprise, we have found what is perhaps the most famous biomarker in prostate cancer, klk3/psa (kallikrein-related peptidase 3), probe g_914588 (correlations of -0.9312 and -0.9000 respectively).
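the two coordinates of each dot in such a scatter plot can be computed as in the following sketch. it assumes that the per-sample jensen-shannon divergences to the metastasis reference are already available; the numbers and function names are hypothetical, and the spearman correlation is implemented here as a pearson correlation of average ranks rather than with any particular statistics package.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical divergence of four samples to the average metastasis profile
# (large for normal-like samples, small for metastasis-like samples).
js_to_metastasis = np.array([0.30, 0.25, 0.10, 0.02])

# Hypothetical probes: one losing and one gaining expression towards metastasis.
probe_lost = np.array([9.0, 7.0, 3.0, 1.0])
probe_gained = np.array([1.0, 2.0, 8.0, 9.0])

dot_lost = (pearson(probe_lost, js_to_metastasis), spearman(probe_lost, js_to_metastasis))
dot_gained = (pearson(probe_gained, js_to_metastasis), spearman(probe_gained, js_to_metastasis))
```

a probe whose expression falls towards the metastasis profile lands in the positive quadrant, and one whose expression rises lands in the negative quadrant, which is where the strongly negative correlations reported for klk3/psa place that probe.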
fos and klk3/psa are the second and the fourth most negatively correlated probes in this ranking of all the genes in the microarray. cdkn2d, foxm1 and brca2 have correlations of the opposite sign. the following is a discussion of a selection of probes (highlighted in figure 3) in the context of prostate cancer. cdkn2d (cyclin-dependent kinase inhibitor 2d, p19, inhibits cdk4). one of the genes that has strong positive correlations is cdkn2d, probe g_145503 (pearson correlation of 0.7543, spearman correlation of 0.6833). a gene that shows a positive correlation with the divergence to a metastasis profile is a gene that has a putative reduced expression in these samples. cdkn2d is a known regulator of cell growth and controls cell cycle g1 progression [46, 47]. loss of cdkn2d in cancer cells is an event which is generally associated with a more malignant phenotype. foxm1. another probe that presents positive correlations is foxm1 (forkhead box m1), probe g_564803 (pearson correlation of 0.7039 and spearman correlation of 0.7500). it has been recently shown that the depletion of foxm1 still allows cells to enter mitosis but they are unable to complete cell division. as a consequence, this leads to mitotic catastrophe or endoreduplication [48]. foxm1 is considered a key regulator of a transcriptional cluster that is essential for proper execution of the mitotic program and the control of chromosomal stability [49]. brca2 (breast cancer 2, early onset). another gene with positive correlations is brca2, probe g_193736 (pearson correlation of 0.8161 and spearman correlation of 0.7333).
while the loss of brca2 function and its consequences in prostate cancer are being reconsidered [50, 51, 52, 53], brca2 is generally regarded as a ''tumor suppressor'', with an established role in maintaining genomic stability via its function in the homologous recombination pathway for double-strand dna repair. our result supports its proposed function. loss of brca2 function is thus a warning sign of the existence of error-prone cell processes. in prostate cancer brca2 has been associated with promotion of invasion through upregulation of mmp9 [54]. brca2 loss of function due to mutations is linked to poor survival in prostate cancer [55] and rare germline mutations have been associated with early onset of prostate cancer [56]. ccl2/mcp-1 (chemokine (c-c motif) ligand 2). bone is one of the most common sites of prostate cancer metastasis; close to 85% of men who die of prostate cancer have bone metastasis [57]. the successful metastatic process to bone follows from the activation of osteoclasts with bone resorption, which in turn leads to the release of different growth factors from the bone matrix [58]. ccl2 has been previously reported as expressed in human bone marrow endothelial cells; ccl2 stimulation promotes prostate cancer cell migration and proliferation [57, 59] and it has been proposed as a paracrine and autocrine factor for invasion and growth of prostate cancer [60]. as a consequence of this central role in the tumor microenvironment, ccl2 is the object of several studies and is included in the list of potential targets for novel therapies [60, 61, 62, 63, 64, 65, 66, 67, 68, 69]. fos (v-fos fbj murine osteosarcoma viral oncogene homolog). a probe for fos (g_811015; correlations of -0.9380 and -0.9500 computed with pearson and spearman, respectively) has a correlation similar to that of klk3/psa.
the high rank of fos was unexpected, but perhaps it is less of a surprise for some experienced researchers in prostate cancer, as its role has been highlighted in the past [70, 71, 72]. amplification of members of the mapk pathway was associated with androgen-independent prostate cancer, and co-expression of raf1, erbb2/her2 and c-fos would lead to this phenotype [73]. we will not discuss in depth here the known relationships between fos, lamin a/c and prostate cancer. we leave this discussion for later, as lamin a/c will also appear in our study of the other prostate cancer dataset studied in this paper. lamin a/c appears as a member of a set of genes with reduced expression for higher grade primary prostate cancer samples (note that the current analysis, which gave fos as a biomarker, is instead based on lymph node metastatic samples). however, we would like to point out a connection that is currently hypothesized between lamin a/c and fos, the gene we have just discussed. ivorra et al. have recently proposed that ''lamin a overexpression causes growth arrest, and ectopic c-fos partially overcomes lamin a/c-induced cell cycle alterations. we propose lamin a/c-mediated c-fos sequestration at the nuclear envelope as a novel mechanism of transcriptional and cell cycle control'' [74]. in addition: ''c-fos accumulation within the extraction-resistant nuclear fraction (ernf) and its interaction with lamin a are reduced and enhanced by gain-of and loss-of erk1/2 activity, respectively.'' [75]. these novel interactions between lmna and fos, their putative role in prostate cancer metastasis and their seemingly different behaviours in prostate cancer lymph node metastases warrant further investigation. sox9 (sry (sex determining region y)-box 9).
this transcription factor has been recently identified as having an important role during embryogenesis and in the early stages of prostate development [76, 77] and in testis determination [78], processes that link sox9 upregulation to cancer development [79]. basal epithelial cells do express sox9 in a normal prostate. while there exists no detectable expression in luminal epithelial cells, sox9 has already been reported as ''expressed in primary prostate cancer in vivo, at a higher frequency in recurrent prostate cancer and in prostate cancer cell lines (lncap, cwr22, pc3, and du145)'' [80]. wang et al., also in [80], add that: ''significantly, down-regulation of sox9 by sirna in prostate cancer cells reduced endogenous ar protein levels, and cell growth indicating that sox9 contributes to ar regulation and decreased cellular proliferation. these results indicate that sox9 in prostate basal cells supports the development and maintenance of the luminal epithelium and that a subset of prostate cancer cells may escape basal cell requirements through sox9 expression.'' an increased value of sox9 expression in advanced prostate cancer has been associated with tumor progression and the epithelial-mesenchymal transition [81]. sox9 expression has been associated with a putative subgroup of prostate cancer [82] linked to lymph-node metastasis (as seems to be the case in this dataset) and has a known role in chondrogenic differentiation processes [83]. klk3/psa (kallikrein-related peptidase 3)/prostate specific antigen. to finalize our initial discussion on this dataset, we address klk3. the high ranking of klk3/psa in our list is perhaps one of the most remarkable retrodictive outcomes of our approach. klk3/psa (also known as prostate specific antigen) is a conspicuous member of our top rank list. it is perhaps the best blood biomarker for prostate cancer screening.
its relevance and popularity as a target of studies is so wide that any serious attempt to survey its role in the prostate cancer literature is unfeasible. a pubmed search using the keyword 'klk3' (and the other alias names of this gene) reveals a total of 11,429 published papers. of course, many of these publications relate to its role in early screening, but in this study we are uncovering its role as a tissue biomarker. our results echo a recent contribution by s. miyano and his collaborators [84] on a massive meta-analysis of microarray datasets. they are also in line with results from clinical studies that indicate that a 5-year psa value is useful for predicting prostate cancer recurrence [85]. certainly the dynamics of psa, now perhaps with fos and sox9 added to the set of biomarkers of interest, warrant further investigation for patient population stratification after initial treatment. the biomarkers discussed in this section warrant further investigation in prediction of lymph-node metastasis and clinical management of prostate cancer [86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109]. we refer the reader to the supplementary material for a complete list of probes and their correlations with the information theory quantifiers. the following sections present the results that we obtained with a melanoma dataset. our aim is to observe whether variations of the normalized shannon entropy and the statistical complexity measures (mpr-complexity and the modified forms m-normal and m-metastases) provide interesting results in a different disease and experimental setting. in this case we have selected a gene expression dataset from haqq et al. [110] containing information on 14,772 cdnas in 37 samples (figure two of [110]). the 37 samples include 3 normal skin, 9 nevi, 6 primary melanoma and 19 melanoma metastases.
this dataset has more phenotypical characteristics for the group of samples. in an initial process of data cleaning, we removed 35 probes which had an unusually high expression value on only a few samples, in some cases on a single one. the dataset we work with, derived from the original contributed by haqq et al., consists of 14,737 probes. first, we computed the normalized shannon entropy and the mpr-statistical complexity for each sample (refer to the 'materials and methods' section for a detailed presentation of these calculations). figure 4 shows the values of these quantifiers for each sample. we first observe an important difference between figure 1 and figure 4. in this melanoma dataset, neither the use of the normalized shannon entropy nor the mpr-complexity helps to discriminate between normal skin, nevi, primary and metastatic melanomas. nevertheless, we decided to present this figure for methodological reasons. we envision that some researchers will calculate the normalized shannon entropy and mpr-complexity using all the probes. we note that in figure one of haqq et al.'s original paper, the whole probe set was previously filtered by selecting those which vary across samples, thus indicating that they may have information about disease subtypes (although the phenotypic types were not biasing the selection). in this case we want to illustrate that both the normalized shannon entropy and the mpr-complexity, when calculated using all the probes, do not give the expected benefits. we will now see the benefits of using the m-complexities. as we did for prostate cancer (see figure 2), we aim at identifying whether the use of the modified forms of the statistical complexity (the m-complexities) could give some insight where the normalized shannon entropy and mpr-complexity measures fail. to compute the m-normal measure, we need to define the average gene expression profile for a normal cell (which we call p ave).
we thus resort to the three normal skin profiles and we produce the average based on these profiles (details for computing the average profiles are given in the 'materials and methods' section). we call m-skin the resulting measure that relies on this profile. analogously, we need to compute a pattern for m-metastasis, and we proceed to calculate the p ave profile averaging over the 19 metastases samples. the result is encouraging, as samples plotted in the (m-skin, m-metastasis)-plane cluster in groups, showing an important m-skin complexity transition between normal skin cells and nevi. most importantly, this method naturally shows that some of the metastatic samples have a large value of m-skin complexity, so we present the results of another experiment, aimed at clarifying this fact. in their original publication, haqq et al. classified the melanoma metastases in two groups according to their molecular profiles: five samples were classified as 'type i' and fourteen as 'type ii' based on a hierarchical clustering approach. since our result reinforced the view that the type ii melanoma metastases are a fairly homogeneous group, we will present the results on the (m-skin, m-metastasis i)-plane. this means that now the p ave profile will not be obtained by averaging over the 19 metastases samples, but instead using only the 14 samples which have been labelled as 'type ii'. as such, we aim at revealing whether type i samples are indeed different in this plane, and whether other clusters are also present. figure 5 presents the results. the first fact worth commenting on is the pronounced gap between normal skin samples and the nevi, primary, and metastatic melanoma samples as revealed by the m-skin measure.
note also that the m-skin measure is based on the average profile of the normal samples, which means that no information about the profiles of metastases is used; yet m-skin reveals that increasing values of this measure may be linked with a 'progression' from nevi to primary and metastatic melanoma profiles. we now introduce another useful technique to identify genes which correlate with the transitions. the challenge is to find genes which are related to the progression towards metastases profiles, even when we recognize that the group of metastasis samples is heterogeneous (containing at least two groups). since the final outcome of figure 4 and figure 5 is that the normalized shannon entropy does not help much in this experimental scenario, we will concentrate only on one of the multiplicative factors of the m-complexities, the jensen-shannon divergence. we compute two p ave profiles, one with the normal skin samples only, and the other with all the metastasis samples (regardless of their type). we will call the two divergences jsm0 and jsm5 respectively. we then compute the spearman correlation of the profile of each gene probe in the array across the 37 samples with both jsm0 and jsm5. we have listed all probes according to the absolute value of the difference of these correlations, i.e. abs. diff. (probe) = |jsm0(probe) - jsm5(probe)|, in decreasing order. the results are provided as haqq-plosone-supfile.xls, in the sheet labelled 'results-correlation'. the rationale is to identify those probes which are highly correlated (either positively or negatively) with the jensen-shannon divergence of the normal tissue profile and that ''reverse signs'' for the divergence of the metastasis profile. for instance, a probe for the tp63 gene (tumor protein p63, keratinocyte transcription factor ket), aa455929, is ranked in the third position.
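the ranking criterion can be sketched as follows, taking the two spearman correlations of each probe as given. the function name is hypothetical. the tp63 correlations are the values quoted in the text; the values for the other two entries are invented for illustration only, so the order printed by this toy example is not the ranking of the supplementary file.

```python
import numpy as np

def rank_by_sign_reversal(probe_ids, corr_jsm0, corr_jsm5):
    """Sort probes by |jsm0(probe) - jsm5(probe)| in decreasing order."""
    diff = np.abs(np.asarray(corr_jsm0, dtype=float) - np.asarray(corr_jsm5, dtype=float))
    order = np.argsort(-diff, kind="stable")
    return [(probe_ids[i], float(diff[i])) for i in order]

probe_ids = ["tp63 (aa455929)", "hypothetical probe a", "hypothetical probe b"]
corr_jsm0 = [-0.63632, 0.55, 0.10]   # first value from the text, others invented
corr_jsm5 = [0.62138, -0.50, 0.12]   # first value from the text, others invented

ranking = rank_by_sign_reversal(probe_ids, corr_jsm0, corr_jsm5)
for probe, abs_diff in ranking:
    print(f"{probe}: {abs_diff:.5f}")
```

a probe with similar correlations to both divergences (such as the third toy entry) falls to the bottom of the list, while probes whose correlations have large magnitude and opposite signs rise to the top.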
the correlation of this tp63 probe with the jensen-shannon divergence of the normal skin type is relatively high and negative (jsm0 = -0.63632) while at the same time it has a positive correlation with the jensen-shannon divergence of the metastasis profile (jsm5 = 0.62138). in the ranking, the first probe that presents the opposite behaviour is one for ada (adenosine deaminase), aa683578. figure 6 helps to understand the relationship of these correlations with expression. not only are these genes well correlated with the divergences, they also seem to be good markers of the progression from one tissue type profile to the metastasis profile. we will now discuss three of these genes in the context of current biological knowledge on melanoma drivers and metastatic progression. we provide many references for one of them, spp1 (secreted phosphoprotein 1 or osteopontin). the discussion of this gene will be left for later, when we will discuss specific oncosystems related to cell proliferation, chemotaxis and responses to external stimulus. figure 7 shows the expression of ada (adenosine deaminase, aa683578) as a function of tp63 (keratinocyte transcription factor ket, aa455929). all normal skin samples, as well as nevi and a couple of primary melanomas, have relatively low values of ada but they express tp63. there is a change of roles in metastatic and some primary melanomas, which have reduced tp63 expression but increased values of expression of ada. as we will later see, these events correlate with other major transcriptional modifications which involve dozens of genes and that we have been able to map thanks to functional genomics bioinformatics tools. the role of spp1 will be discussed in that context after some references to tp63, ada, and plk1 which follow. tp63. the product of this gene [111, 112] belongs to the same protein family as its more famous relative, tp53, a gene that is often mutated in human cancers [113] and highly regarded as a key ''tumor suppressor''.
tp63's product, p63, is a homologous protein to p53, which is considered to be phylogenetically newer [114] and also regarded as an important apoptotic and cell-cycle arrest protein. mice that lack tp53 are born alive with a propensity for developing tumours; mice that lack tp63 do not appear to be tumour prone, although new results partially contradict earlier findings [115]. it appears that the diverse roles of the isoforms of the p63 family reveal a crosstalk with the different isoforms of the p53 family that needs to be systematically investigated [116]. it has recently been shown that p63 is a key regulator of the development of stratified epithelial tissues [113] and that its deletion results in loss of stratified epithelia and of all keratinocytes [117]. melanocytes also express two isoforms of p63 [118], but p63 expression is not reported in 57 out of 59 tumors in a tissue microarray study performed by brinck et al. [119]. it is clear that the role of loss of expression of tp63 in melanoma warrants further investigation. ada (adenosine deaminase) and dpp4/cd26 (dipeptidylpeptidase 4, cd26, adenosine deaminase complexing protein 2). a link between tp63 and ada has already been reported in the literature. ada is a gene involved in cell division and proliferation [120] and it has been suggested to have a regulatory role in dendritic cell innate immune responses [121]. translational modification is also a function of p63. sbisa et al. have proved that ada is a direct target of isoforms of p63, which is an important discovery as ada has two tp53 binding sites, leading to a complex metabolic balance due to the different relationships between this trio and p21, which are yet to be completely elucidated [120, 122]. several studies indicate elevation of adenosine deaminase levels in sera of breast [123], head and neck [124], colorectal [125], acute lymphoblastic leukaemia [126] and laryngeal cancers [127].
we observe a marked increase of expression of a probe for ada with melanoma progression while at the same time we observe a loss of expression of a probe corresponding to dpp4/cd26 (dipeptidylpeptidase 4, cd26, adenosine deaminase complexing protein 2), a membrane-bound, proline-specific serine protease [128] to which tumor suppressor functions have been attributed [129]. it has been previously reported that loss of dpp4 immunostaining helps to discriminate malignant melanomas from deep penetrating nevi, a variant of benign melanocytic nevus [130], and early reports of its absence in metastatic melanomas exist [131, 132]. as deep penetrating nevi can mimic the vertical growth phase of nodular malignant melanoma, and ada could potentially be downregulating dpp4 [133, 134], we believe that the elucidation of the complementary role of these two biomarkers to distinguish these two entities is necessary and also warrants further clinical studies. plk1 (polo-like kinase 1 (drosophila)). another probe for a gene that ranks high as a positive marker of metastasis is plk1, polo-like kinase 1, serine/threonine protein kinase 13 (aa629262). plk1 is a centrosomal kinase [135] which is regarded as being linked to centrosome maturation and spindle assembly [135]. plk1 expression has also been singled out as a biomarker of a ''death-from-cancer'' signature, sharing with others the function of being an activator of mitotic spindle check point proteins. with other proteins it would have a stem cell-like expression profile phenotypically characterized by enabling metastasis with anoikis resistance and dysregulated cell-cycle control [136].

figure 7. scatter plot showing the expression of the probe corresponding to ada (adenosine deaminase), aa683578 (y-axis) and tp63 (tumor protein p63), aa455929 (x-axis). all the samples that have tp63 expression are normal or nevi, with two primary melanomas still preserving tp63 expression but with higher ada. the trend reverses for the rest of the primary melanoma samples and the metastatic ones, which all express ada but not tp63. doi:10.1371/journal.pone.0012262.g007

figure 6 caption (fragment): analogously, we compute the jensen-shannon divergence of each sample with the average metastatic profile and we also compute the correlation of each probe with this measure (y-axis). the position of one probe corresponding to the tp63 gene (tumor protein p63, keratinocyte transcription factor ket), aa455929, is highlighted. the expression of this probe has a relatively high negative correlation with the jensen-shannon divergence of the normal skin type (jsm0-spearman = -0.63632) while at the same time it has a positive correlation with the jensen-shannon divergence of the metastasis profile (jsm5 = 0.62138). the first probe that presents an opposite behaviour is one for ada (adenosine deaminase), aa683578. probes for spp1 (secreted phosphoprotein 1 or osteopontin) and plk1 (polo-like kinase 1 (drosophila)) are also highlighted. while plk1 is currently less recognized as a biomarker in melanoma research, the importance of spp1 in cutaneous pathology [315, 318, 320, 321] and in particular in melanoma [208, 209, 210, 211, 212, 214, 215, 216, 217, 218, 219, 222, 226, 264, 314, 315, 316, 317, 319, 322, 323, 324, 325, 326, 327, 328, 804, 805, 806, 807, 808, 809] is increasing. using a 5-biomarker panel that included spp1, kashani-sabet et al. used tissue microarrays on 693 melanocytic neoplasms to show that spp1 expression contributes significantly to improving the detection of a high percentage of melanomas arising in a nevus, spitz nevi, dysplastic nevi and misdiagnosed lesions [253]. as in the case of prostate cancer (figure 3, in which klk3/psa, prostate specific antigen, was highlighted), our method allows the detection of important biomarkers with a high degree of concordance with current biological understanding of metastatic processes. doi:10.1371/journal.pone.0012262.g006
plk1 inhibition could be a common target for gastric adenocarcinoma [137], bladder cancer [138], colon cancer [139, 140], hepatocellular carcinoma [141], medullary thyroid carcinoma [142], esophageal cancer [143], pancreatic cancer [144] and in some types of non-hodgkin lymphomas [145] and breast cancer [146]. plk1's spearman correlation with the values of the jensen-shannon divergence of samples with the normal skin profile is relatively high (0.5863). plk1 also has a high value of (negative) spearman correlation with the values of the jensen-shannon divergence of samples with the average metastatic profile (-0.44571). in the comparison, it was found that metastatic malignant melanomas expressed plk1 at markedly elevated levels (median, 60.00% vs. 37.98%; p-value<0.000053), concluding that plk1 is a reliable biomarker for patients at high risk of metastases, even when the most important prognostic clinical factor (breslow's maximum thickness of the primary malignant melanoma) indicates the contrary [147]. we consider this an important finding, as plk1 silencing is already part of an integrated oncolytic adenovirus approach currently being studied in mice models of orthotopic gastric carcinoma [148] and has promise due to the lack of a reported measurable immune response of sirna-based therapeutics [149]. another positive note is the lower sensitivity to plk1 depletion of cells with a functional p53 [150, 151]; plk1 depletion can also help to sensitize cells to chemotherapy (as observed in lung cancer [152]). this dependence of aneuploid cancer cells on plk1 expression, particularly in cells with inactivated p53 [153], could be exploited by lentivirus-based rna interference [154].

correlation analysis with jensen-shannon divergences reveals biomarkers for loss of cell adhesion, cell-cell communication, impairment of tight junction mechanisms and dysregulation of epithelial cell polarity.
as discussed before, the probe for ada (adenosine deaminase) is the first that has a different trend. since we put all metastasis samples together in the same group when we calculated the average probability profile (and we have a heterogeneous group), we have in our ranking 58 probes that appear before ada (we refer to the supplementary file haqq-plosone-supfile.xls). an analysis using gather (http://gather.genome.duke.edu/) [155] to interpret the collective influence of the lack of expression of all these genes in the metastasis samples reveals an interesting new perspective. using gene ontology, we found that six of the 44 genes identified by gather are related to epidermis development (cdsn, dsp, evpl, gjb5, krt13, krt5), p-value<0.0001, bayes factor 16, and eight genes are related to cell adhesion (cdsn, cldn1, dsg1, dst, lgals7, lrig3, pcdh21, pkp1), p-value<0.0001, bayes factor 7. ank1 (ankyrin 1, erythrocytic), aa464755, was also singled out by our gene ontology analysis as related to the maintenance of epithelial cell polarity (p-value = 0.002, bayes factor 3). the use of another profiler of genome signatures (g:profiler, [156]) also reinforces the view that many genes that have lost expression are related to 'epidermis development' (col17a1, dsp, evpl, gjb5, krt13, krt5, lce1c, mafg, tgm3) with p-value = 7.78e-11. thirteen are associated with the gene ontology function of cell communication (ank1, cdsn, cldn1, dsg1, dst, gchfr, gjb5, gpr115, lgals7, lrig3, pcdh21, pkp1, ptger3), albeit with a p-value of only 0.02. gchfr is also involved in nitric oxide metabolism.
if we add to the list of 44 genes already recognized by gather the other 77 probes that, after ada in this ranking, also show loss of expression (until we found pdxp (pyridoxal (pyridoxine, vitamin b6) phosphatase)), the evidence is stronger: col7a1, gjb5, klk4, and krt1 are now also in this group (the bayes factor of this association returned by gather is now 21 for the go term 'epidermis development'). 'cell adhesion' now has 13 genes: cdsn, cldn1, col7a1, dsc2, dsg1, dst, jup, lgals7, lrig3, pcdh21, pkp1, slit3 and thbs3 (p-value<0.001, bayes factor 10). these results are considered statistically very relevant as identifiers of a particular process which seems to be undermined by this collective loss of expression. if we put all this information together, we clearly observe a pattern of downregulation of gene expression that is associated with an impairment of epidermis development and the maintenance of its structure (figure 8 and table 1). this is, perhaps, an instantiation of one of the ''extended hallmarks of cancer'' (that of ''tissue dedifferentiation''). this process includes the loss of function of genes that are essential for the maintenance of tight junctions and epithelial cell-cell communication. while loss of epithelial structure is related to these genes, we observe that those that increase expression are associated with other developmental processes, not necessarily concerted in this panel. instead they show a pattern of increasing cell motility, chemotaxis and positive regulation of cell proliferation. we will first discuss the processes related to the loss of adhesion, which could be linked to an increased metastatic potential of these cells. the loss of expression of plakophilin 1, junction plakoglobin, desmoplakin and desmoglein 1 indicates deficiencies in desmosome processes.
in general, this panel is composed of a number of genes that are losing expression during progression and that have gene ontology annotations related to tight junctions, gap junctions, adherens junctions and desmosomes, and an impaired set of processes that link, via intercellular channels and bridges, the cells of the epidermis. mutations in these genes are linked to a number of skin genetic diseases [157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170]. desmosomes are cell-cell adhesive junctions which provide a mechanical coupling between cells. these junctions are found in several epithelial tissues and the decreased assembly of the desmosome has been shown to be a common feature of many epithelial cancers [171, 172]. plakoglobin helps to connect transmembrane elements to the cytoskeleton [173]. plakophilin 1 [174] (pkp1, one of the genes in our panel above) is a desmosomal plaque component [175] that stabilizes desmosomal proteins at the plasma membrane [176, 177] and, with desmoplakin [178], recruits filaments to sites of cell-cell contacts [179]. as a consequence, it has been proposed that the lack of pkp1 increases keratinocyte migration [180] and that loss of pkp1 expression in head and neck squamous cell carcinoma and in esophageal squamous cell carcinoma may contribute to an invasive phenotypic behaviour [171], perhaps as a consequence of the impaired recruitment of desmoplakin. the desmoglein-specific cytoplasmic region (dscr) is the site of caspase cleavage during apoptosis and is a conserved region of yet undefined function and unknown structure, but it specifies the function of the desmoglein family of cell adhesion molecules (of which dsg1 is a member). it has been recently shown that the dscr has a weak interaction with pkp1, plakophilin 1 (ectodermal dysplasia/skin fragility syndrome) and the cytoplasmic domain of desmocollin 1 [181]. plakoglobin is cleaved by caspase 3 during apoptosis [182]. in addition, kami et al.
in ref [181] also report and conclude that: ''desmoglein 1 membrane proximal region also interacts with all four dscr ligands, strongly with plakoglobin and plakophilin and more weakly with desmoplakin and desmocollin 1. thus, the dscr is an intrinsically disordered functional domain with an inducible structure that, along with the membrane proximal region, forms a flexible scaffold for cytoplasmic assembly at the desmosome''. as previously discussed, all these genes progress towards a loss of expression, and they are highly correlated. figure 9 shows the average expression of pkp1/plakophilin 1 (ectodermal dysplasia/ skin fragility syndrome), (nm_000299) and jup, junction plakoglobin, (bx648177) on the x-axis against that of dsp, desmoplakin (nm_004415 hs.519873) on the y-axis. again, we see a clear pattern of progressive reduction of expression from normal skin and nevi (green and yellow, respectively), primary melanomas (in orange) and melanoma metastases (red). joint loss of expression of claudin 1 and members of the aquaporin family are also linked to a transition to a more malignant phenotype we note however, the gene ontology annotation is not the only way that we can make sense of this information. a detailed analysis of that list of 58 genes reveals other proteins involved in tight junction, like aquaporin 3 (aqp3). probes for aqp3 and claudin 1 (cldn1) have reduced expression with the progression of the disease as shown in figure 10 . aqp3 (gill blood group) is a member of the aquaporin family of proteins, and currently is recognized as an 'aquaglyceroporin' [183] of great importance to maintain skin hydration of mammals epidermis [184] . three proteins of this family (aqp1, aqp3, and aqp9) have probes that seem correlated with melanoma progression, all losing their expression in the process of going from normal skin to metastatic melanoma. 
aqp3 water channels have been pointed out as an essential pathway for volume-regulatory water transport in human epithelial cells [185]. aqp3 is also selective for the passage of glycerol and urea and it has been suggested that osmotic stress up-regulates aqp3 gene expression in cultured keratinocytes [186]. aqp3 was found to be the predominant aquaporin in human skin, and increased expression and altered cellular distribution of aqp3 have been found in eczema, thus contributing to water loss [187]. the putative involvement of aquaporins in the progression of melanoma, uncovered by our method, warrants further investigation, as it has been recently shown that another member of this family (aqp8) also facilitates hydrogen peroxide diffusion across membranes [188]. it is suspected that aqp3 has other functions, with a suggestion that it is involved in ultraviolet radiation induced skin dehydration [189]. there is no probe for aqp8 in haqq et al.'s dataset whose trend with progression we could scrutinize, but we note that a novel strategy for drug development for melanoma (i.e. elesclomol) works by inducing apoptosis via a mechanism of elevation of reactive oxygen species (of course, including hydrogen peroxide in cancer cells), thus exploiting the ''achilles heel of cancer metabolism'' [190]. claudin 1, cldn1 [191], a gene which is reported to be ''normally expressed in all the living layers of the epidermis'' [192], in concert with aqp3, is a key component of the tight junction complexes of the epidermis. low cldn1 gene expression was correlated with shorter overall survival in lung adenocarcinoma. overexpression of cldn1 was correlated with suppression of cancer cell migration, invasion and metastasis [193]. hoevel et al.
report that re-expression of cldn1, in breast tumor spheroids, induces apoptosis and they conclude: ''these findings support a potential role of the tight junction protein cldn1 in restricting nutrient and growth factor supplies in breast cancer cells, and they indicate that the loss of the cell membrane localization of the tight junction protein cldn1 in carcinomas may be a crucial step during tumor progression'' [194]. tokes et al. also report that malignant invasive breast tumors are negative for cldn1 [195]. as in breast cancer [196], in which reduced expression correlated with recurrence status, the low expression of cldn1 and other tight junction proteins seems to contribute to cellular detachment.
table 1. gene names and probe accession number of the 27 probes with genes annotated with functions on cell adhesion, cell-cell communication, tight junction mechanisms and epithelial cell polarity shown in the heat map in figure 8.
the complementary set of correlations with the jensen-shannon divergences unveils biomarkers for cell proliferation, chemotaxis, and responses to external stimulus. the use of gene ontology has produced very peculiar results, helping us to link the loss of expression of 44 genes with a significant change in epithelial structure and development. a natural question arises: ''what is the significance of another set, now arbitrarily chosen to be also of the same cardinality (i.e. 44 genes), with the complementary behavioural pattern?'' we have now listed all the probes according to diff(probe) = jsm0(probe) − jsm5(probe) in decreasing order. the results are provided as haqq-plosone-supfile.xls ('results-correlation' sheet). this now gives ada as the first ranked gene.
again using gather [155] on the first 44 genes recognized by the software, and again using gene ontology, we observe as the most important common functions those of cell motility (ccl3, cxcl10, fprl1, sema6a, spp1), p-value = 0.0002, bayes factor 5, and chemotaxis (ccl3, cklfsf7, cxcl10, fprl1, spp1), p-value < 0.0001, bayes factor 7. the genes cxcl10, spp1, and wars, together with another gene that has been annotated as related to positive regulation of mitosis (sch1), have also been annotated as regulators of cell proliferation (p-value = 0.007, bayes factor 2). using the g:profiler software [156], we obtain complementary information. sixteen genes (including spp1, sema6a, lef1 [197], cd230, als2cr2, dkk1, cyfip2, shc1, ankrd7, ifi6, cited1, and mid1) have been associated with the gene ontology term 'developmental process'. spp1 - secreted phosphoprotein 1 (osteopontin). spp1 is one of the most conspicuous melanoma biomarkers [198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222] (see also the references cited in figure 6 and note its eminent position in this scatter plot). in 1990, craig et al. reported that spp1 may work as an autocrine adhesion factor for tumor cells (see also [204, 223, 224]). they observed that ''spp1 mrna, which is barely detectable in normal mouse epidermis, was expressed at moderate-to-high levels in 2 of 3 epidermal papillomas and at consistently high levels in 7 of 7 squamous-cell carcinomas induced by an initiation-promotion regimen'' [225]. the evidence on the role of spp1 as a molecular prognostic biomarker in melanoma is being constantly expanded [226]. activation of spp1 may be an important event that allows the transformed melanocytes to invade the dermis, as proposed by geissinger et al. in 2002 [208]. this causes spp1 to avoid the apoptotic stimulus, one of the ''hallmarks of cancer'', which invasive cells will be receiving from this new tissue.
figure 10. expression of a probe for cldn1 (claudin 1) (y-axis) as a function of a probe for aquaporin 3 (x-axis). other members of the aquaporin family of proteins have a similar behaviour. aqp3, together with cldn1, are key components of the tight junction complexes of the epidermis and their joint loss of expression seems to be related to a transition to a more malignant phenotype. we use the same color coding as figure 9. doi:10.1371/journal.pone.0012262.g010
if we extend the literature-based search so that we now include the first 200 gene probes recognized by gather, then we have 27 gene probes associated with the gene ontology terms 'cell proliferation' (p-value = 0.0002, bayes factor 5) and 'regulation of cell proliferation' (p-value = 0.003, bayes factor 3). however, other partners of plk1 appear and their function in 'mitotic cell cycle' (p-value = 0.0003, bayes factor 5) is increasingly present (in particular, the m phase of the mitotic cell cycle). the details of the gene ontology terms which are significant and the genes associated with them are listed in table 2. the analysis using g:profiler largely coincides with the analysis using gather; however, it retrieves 12 genes associated with the m phase of mitotic cell cycle, namely: aurka and aurkb [227, 228, 229], bub1 [230, 231], cdca5a/sororin/p35 [232], cdc7 [233, 234], chek1 [235], kif23/mklp-1 [227, 236, 237], map9/asap [238, 239], ncapd3, ncapg2 [240], nek6 [241, 242, 243, 244], plk1 [147, 245, 246], pttg1/securin [247], shc1/p66 [248, 249, 250] (discussed in the context of shc4 signalling), and tfdp1/dp-1 [251]. these are a significant finding by g:profiler (p-value = 4.03e-07). we have listed above some of the genes associated with the m phase of mitotic cell cycle, together with references to current research in melanoma and/or to their biological function. we now list other genes which have been associated with the term 'cell proliferation' by gather. these genes are: arpc1b [252], arpc2 (which, together with spp1, is also in the novel 5-biomarker panel of kashani-sabet et al. [253]), bccip (brca2 and cdkn1a-interacting protein)/p21- and cdk-associated protein 1) [254], bst2/bone marrow stromal antigen 2/tetherin [255], ccl3/mip-1alpha [256, 257, 258], cct4, cdca5/sororin [259, 260, 261, 262, 263], cenpf/mitosin [264], cxcl1/chemokine (c-x-c motif) ligand 1 (melanoma growth stimulating activity, alpha) [265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285] (in uveal melanoma see [286]), cxcl10 [256], flt1/vegfr1 [287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299], fth1/ferritin heavy chain [300, 301, 302] (which may indicate a necessary condition for the maintenance of iron sequestration and suppression of reactive oxygen species accumulation [303]), fprl1, lig3/dna ligase 3 [304] (which, together with xpa and ercc5, is associated with dna repair in ionizing radiation studies [305]), mcmdc1, psen2, nrp2/neuropilin 2/vascular endothelial cell growth factor 165 receptor 2 [306, 307, 308], sema6a (a member of the semaphorin family, of increasing importance in cancer research [309, 310, 311] and in particular due to its observed upregulation in undifferentiated embryonic stem cells [312]), slamf1/cd150 (a marker associated with hematopoietic stem cells [313]), spp1/osteopontin (which, together with arpc2, is also in the novel 5-biomarker panel of kashani-sabet et al. [253]) [206, 207, 208, 209, 210, 211, 212, 214, 215, 216, 217, 218, 219, 220, 221, 222, 226, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329], stk6 [230, 330], and wars/tryptophanyl-trna synthetase [331]. figure 11 shows a heat map of discussed gene probes annotated with functions on cell proliferation.
the references provided next to each gene help to related these upregulated genes in the context of current research in melanoma or with the m phase of mitotic cell cycle, showing a high degree of correlation between our results and with published literature. another microarray dataset we have selected to evaluate for the relevance of transitions of normalized shannon entropy and statistical complexity was contributed by true et al. [332] in 2006. the original goal of true et al. was to identify a molecular correlate for gleason patterns 3 and, if possible, the clinically most worrisome patterns 4 and 5. they partially succeeded by linking the expression of only 86 genes with gleason pattern 3 [332] using a standard statistical analysis. in this study, we eliminated sample 02-209c since data was acquired using a different platform and would not be useful for our analysis. the remaining thirty one (31) samples were assayed with the gpl3834 (fhcrc human prostate pedb cdna array v4) platform using 15,488 probes. we also eliminated all the probes with missing values, remaining 13,188 probes. we have first plotted the samples on the (normalized shannon entropy, mpr-statistical complexity) plane ( figure 12 ). it was interesting to observe that there exists a high correlation between the two measures. samples that are entirely composed of gleason pattern 3 tend to have a greater value of normalized shannon entropy than 0.985. we can also identify a cluster of samples that present gleason patterns which are either 4 or 5. note that there seems to be two outliers (02_003e and 03_063) to the generic trend of the other 29 samples. the two outliers are samples that correspond to samples labelled as having gleason 3 patterns and both have unusually low values of normalized shannon entropy that are well below the values of the rest of the group. this raised a suspicion about the true nature of this phenomenon. 
if the labelling is correct, this may indicate a subsampled group of prostate cancer that has gleason 3 pattern characteristics but very low entropy. alternatively, it may indicate an experimental bias for reasons we cannot explain with the available clinical information. in order to clarify the situation, and see if we can declare these two samples as outliers of the other group, we performed another experiment. we have now computed two modified complexities, which we will call m-gleason 3 and m-gleason 5 (figure 13). the names are probably self-explanatory, but a brief reminder follows. to calculate the mpr-complexity, by definition, we have used the equiprobable distribution as our probability distribution of reference (for the computation of the jensen-shannon divergence of the gene expression profile to this distribution). in the case of the m-gleason 3, the probability distribution of reference is obtained by averaging all the probability distributions of the samples that have been labelled as gleason 3 (analogously, we calculated m-gleason 5). samples that have gleason patterns 3 and 5 appear as separate clusters in the (m-gleason 3, m-gleason 5) plane, with the two putative outliers of the general trend far apart (even though they have been used to calculate the average probability distribution function of the gleason 3 pattern). even samples with gleason 4 pattern are located closer to samples of gleason patterns 3 and 5, indicating that, perhaps, there exists a subsampled subtype of prostate cancer or there might be another experimental bias or factor that at present we cannot resolve with the information we have for these samples. consequently, we have decided to eliminate both samples (02-003e and pna_03-063a) from further calculations. with these considerations, we now have a dataset with 13,188 probes and 29 samples for further analysis.
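the construction of the modified complexities described above can be sketched in code. this is a minimal illustration and not the authors' implementation: it assumes that each sample is a vector of non-negative expression values rescaled to sum to one, it uses natural logarithms, and it omits the normalization constant of the mpr-statistical complexity, so only relative comparisons between samples are meaningful. all function names are hypothetical.

```python
import math


def to_distribution(expression):
    """Rescale non-negative expression values so that they sum to one."""
    total = sum(expression)
    return [x / total for x in expression]


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def normalized_entropy(p):
    """Shannon entropy divided by its maximum value, log(number of probes)."""
    return shannon_entropy(p) / math.log(len(p))


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions of equal length."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


def average_distribution(distributions):
    """Probe-wise mean of several samples, e.g. all samples labelled gleason 3."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def modified_complexity(p, reference):
    """Unnormalized complexity: divergence to the reference times normalized entropy.

    Assumption: the normalization constant of the MPR measure is left out.
    """
    return js_divergence(p, reference) * normalized_entropy(p)
```

under these assumptions, evaluating `modified_complexity(p, average_distribution(gleason3_samples))` for every sample `p` would give a coordinate analogous to m-gleason 3, and passing the equiprobable distribution as `reference` gives the unnormalized analogue of the mpr-complexity.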
figure 14 shows the distribution of the samples using the normalized shannon entropy and the mpr-complexity. by definition, the positions of the 29 samples in the plane do not change (this figure is basically ''zooming in'' one region of figure 12 that contains these samples). we note again, however, that the 29 samples seem to be separating in three different clusters. whether we can argue about the existence or not of these gaps in normalized shannon entropy, it is clear that there seems to be a progression as we have seen with lapointe et al's dataset. there is a group of three samples with gleason pattern 3 that seem to have the the largest normalized shannon entropy values. there is also a cluster that only contains samples of either gleason pattern 4 and 5, all with normalized shannon entropy values smaller than 0.985. there is also very little variation (see figure 15 ) of the positions of the 29 samples on the (m-gleason 3, m-gleason 5)-plane, indicating a degree of robustness that the computation of these modified complexities have, even in the presence of some outliers. after observing that figure 14 shows a correlation of gleason pattern score with normalized shannon entropy, we asked ourselves: 'which are the genes that most positively and negatively correlate with the transitions of normalized shannon entropy?' we have plotted spearman versus pearson correlation values of probe expressions to attempt to find those that best correlate, either positively or negatively, with the normalized shannon entropy values of the samples. the results have revealed some of the most relevant biomarkers of progression, and some unexpected newcomers. figure 16 shows the pearson and spearman correlations of all the 13,188 probes in the dataset with the normalized shannon entropy values of the samples. we have highlighted some particular genes that are discussed below. cdkn2c (cyclin-dependent kinase inhibitor 2c (p18, inhibits cdk4). 
when we compute the correlations of the probe expressions with the normalized shannon entropy values of the samples, the gene that has the most negative correlation is cdkn2c (cyclin-dependent kinase inhibitor 2c - p18, inhibits cdk4 - nm_078626), which has been previously associated with the transition from prostatic intraepithelial neoplasia (pin) to prostate cancer [68]
figure 12. samples 02_003e and 03_063 seem to be outliers to this trend, and in the case of 03_063 the sample is not even close to a hypothetical linear fit which seems to be the norm for all the samples. figure 13 will provide further evidence that may indicate whether or not these two samples are outliers to the overall trend. doi:10.1371/journal.pone.0012262.g012
figure 13. we plot the values of two modified statistical complexities, which we will call m-gleason 3 and m-gleason 5. instead of using the equiprobable distribution as our probability distribution of reference (for the computation of the jensen-shannon divergence of the gene expression profile to this distribution), as required for the mpr-statistical complexity calculation, we used a different one. for the m-gleason 3, the probability distribution of reference is obtained by averaging all the probability distributions of the samples that have been labelled as gleason 3 (analogously, we calculated m-gleason 5). this is analogous to our approach in melanoma (figure 5), in which we used normal and metastatic samples as reference sets for a modified statistical complexity. we observe that, even in this case, 02_003e and 03_063 continue to appear as outliers. in addition to this evidence, we have observed that the deletion of these two samples did not significantly alter the identification of biomarkers. doi:10.1371/journal.pone.0012262.g013
with the dedifferentiation process, with preoperative psa levels and the percent of gleason 4 and 5 cancers [338]. amacr, cyclin g2, cdk4 and cdk7.
other probes that also have high negative correlations with the shannon normalized entropy correspond to ccng2 (cyclin g2) cr598707, cdk4 (cyclin-dependent kinase 4), cdk7 (cyclin-dependent kinase 7, tfiih basal transcription factor complex kinase subunit) [339] , and amacr (alpha-methylacyl-coa racemase), an ''obscure metabolic enzyme (that has taken) centre stage'' [340] as judged by the extraordinary convergence to this biomarker in prostate. we believe that our result is an important finding. amacr was not judged of importance according to the methodology used in [332] and it was barely cited in that manuscript. here we present results, from an unifying biological and informational principle, which allows (using ref. [332] 's own data) the identification of the most central current biomarker with a truly compelling body of support in independent studies [ [490, 491] . knockdown of brca1 results in the accumulation of multinucleated cells, indicating that brca1 regulates gene expression of an orderly progression during mitosis [492] , preserving chromosomal stability [490] . brca1 showed decreased expression in a study involving immortalized prostate epithelial cells before and after their conversion to tumorigenicity [493] . lack of brca1 function may impair activation of stat3 [494] . inactivation of tp53 by somatic mutations is also associated to the panel of disruptions which are common for this ''tumor suppressor'' [113] . one possible mechanism for gene silencing is cpg island methylation. rabiau et al.show in [495] that brca1, rassf1, gstp1 and ephb2 promoter methylation is common in prostate biopsy samples. mannicia et al. suggest that the mitochondrial localization of brca1 proteins may be a significant factor in regulating the mitochondrial dna damage [5] . sfpq -(polypyrimidine tract-binding protein-associated splicing factor). 
the most positively correlated gene with the loss of normalized shannon entropy is sfpq/psf (polypyrimidine tract-binding protein-associated splicing factor) (spearman correlation of 0.7902), a multifaceted nuclear factor [496, 497] which is also a putative regulator of growth factor-stimulated gene expression [498]. this is extremely interesting as it has been recently shown that the ar/psf complex interacts with the human psa gene and that psf inhibits ar transcriptional activity [499]. the loss of expression of sfpq and other proteins that together regulate androgen receptor-mediated gene transcription [500] (see also [501, 502]) may indicate that they have a role not only as biomarkers of progression but also of the transition of the disease to androgen independence. in a study of human labor, dong et al. also showed that sfpq acts as a progesterone receptor corepressor, thus putatively contributing to the functional withdrawal of progesterone [503]. we will return to this particular gene later in the 'discussion' section, as new evidence of its role in nuclear organization has been documented. cd40 - (tnfrsf5, b-cell surface antigen cd40). the loss of normalized shannon entropy gives us several markers that indicate a de-differentiation from an epithelial basal phenotype and an increasing loss of control of cell cycle regulation (due to uncoordinated upregulation of cdk4, cdk7, ccng2 with their functional partners). this poses the question: what can we observe while looking at the genes that most positively correlate with the loss of normalized shannon entropy? we observe, second in the ranking of all samples, a probe for cd40 (tnfrsf5, b-cell surface antigen cd40), bx381481, with a spearman correlation of 0.7616.
loss of cd40 expression has been previously reported in prostate cancer and it is the object of a study that attempts to establish dendritic cell gene therapies [504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522]. we will continue discussing cd40 in the following subsection in concert with other genes. another natural question can be asked: what extra information can we obtain by analysing the correlations with the mpr-statistical complexity in this case? as we have discussed before, and as can be appreciated from figure 14, there is a strong correlation between the mpr-statistical complexity and the value of the normalized shannon entropy. it appears that in prostate cancer, as in this gene expression dataset, the reduction of entropy is not the major factor responsible for the increase in mpr-statistical complexity. again, it is perhaps better to now look at one of the multiplicative factors of the statistical complexity measure, the jensen-shannon divergence to the equiprobability distribution, as this is increasing the mpr-complexity. cd40. we present more evidence for the case of cd40 as a biomarker, since a probe for cd40 (bx381481) ranks 6th (the spearman correlation of the probe expression with the jensen-shannon divergence from the equiprobability distribution is −0.5764). cd40 is a member of the tnf receptor superfamily. notably, palmer et al. have reported no cd40 expression in 56 out of 57 archival prostate cancer samples [518]. however, cd40 expression was present in normal prostatic acini, so they proposed that ''invasive prostate cancer is a cd40-negative tumour'' (see the previous results of moghaddami et al. [514]). matching our observations, they proposed that cd40 provides ''insight into progression of cancer from normal epithelium''; our proposed methodology is revealing this fact as well.
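the rank correlations quoted above, between the expression of a probe across samples and a per-sample quantity such as the normalized shannon entropy or the jensen-shannon divergence from the equiprobability distribution, can be sketched as follows. this is a minimal illustration under stated assumptions and not the authors' code: the data layout (a dictionary from probe identifier to a list of per-sample expression values) and the function names are hypothetical, tied values receive the mean of their ranks, and a probe with constant expression would cause a division by zero.

```python
import math


def average_ranks(values):
    """Ranks from 1 to n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_probes(expression, sample_values):
    """Sort probes by Spearman correlation with sample_values, ascending.

    expression: hypothetical dict mapping probe id -> per-sample expression list.
    """
    scores = {probe: spearman(values, sample_values)
              for probe, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

sorting in ascending order places the most negatively correlated probes first, which mirrors the way the ranking positions are described in this section.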
depletion of cd40 in the tumour microenvironment may be central in avoiding the action of the immune system [506], as prostate cancer induces a progressive suppression of the dendritic cell system [520]. it is perhaps a central piece which should be put together in the context of other pieces of information coming from immunotherapy [508, 512, 513, 516] and pharmacological studies [507] that warrant serious investigation towards the design of new and improved clinical studies [508, 517]. cd59 molecule, complement regulatory protein. four probes for protectin [335, 523, 524], cd59, with spearman correlations with the jensen-shannon divergence from the equiprobable distribution ranging from −0.61823 to −0.5089, rank between the 1st and 39th position (when we rank genes according to this correlation in ascending order). cd59 is an interesting gene as ''a comprehensive investigation of cd59 expression in prostate cancer has not been conducted yet'' [524]. as with lmna (which is ranked third and will be discussed later), the rank of cd59/protectin means that the probes for these genes progressively lose expression. cd59 is expressed in the prostatic epithelium [525] and in prostasomes [526], secretory granules which are produced, stored and released by the glandular epithelial cells of the prostate [527]. babiker et al. concluded in [335] that prostasomes (via expression of cd59) contribute to the protection of malignant cells from complement attack. we are now investigating whether the ratio of delta-catenin to cd59 is a more robust biomarker for non-invasive prostate cancer detection, particularly after the results presented in [528]. we also note that cd59 may also be relevant to reveal the heterogeneous nature of prostate cancer. its correlation was good, but is not lower than −0.62, which, in our experience, indicates that we may be dealing with at least two types of tumors in this dataset. indeed, xu et al.
determined cd59 mrna levels by real-time pcr in matched (tumor/normal) microdissected tissues from 26 cases and they found that: ''high rates of cd59 expression were noted in 36% of prostate cancer cases and were significantly associated with tumor pt stage (p = 0.043), gleason grade (p = 0.013) and earlier biochemical (psa) relapse in kaplan-meier analysis (p = 0.0013). on rna level, we found an upregulation in 19.2% (five cases), although the general rate of cd59 transcript was significantly lower in tumor tissue (p = 0.03)'' [524]. they concluded that: ''cd59 protein is strongly expressed in 36% of adenocarcinomas of the prostate and is associated with disease progression and adverse patient prognosis'' [524]. jarvis et al. have previously hypothesized that cd59 expression, in some cancer cells, may help to regulate the immunological response, protecting them from the cytolytic activity of complement [523] (see also [529, 530]). lmna (lamin a/c). the third probe in the ranking corresponds to lmna (lamin a/c), ay528714. mutations in lmna have been linked to 10 different human diseases [531, 532]. lmna, due to its functions, could be involved in important cell fate decisions, as lamins are involved in the organization of the functional state (and position) of interphase chromosomes [531]. lamins are ''scaffolders'' for the function of nuclear processes such as chromatin organization, dna replication, cellular integrity and transcription [532]. as a consequence, lamins are involved in several clinical syndromes [533, 534, 535]. among the recent functions attributed to lmna is that of an intrinsic modulator of ageing within adult stem cells, via a mechanism where lmna acts as a signalling receptor in the nucleus. these observations correspond to pekovic and hutchinson, who observed that dysfunction of lmna leads to inappropriate activation of self-renewal pathways and initiation of stress-induced senescence [536].
in lmna-deficient mouse embryonic fibroblasts (lmna(−/−) mefs), the loss of lmna ''dramatically affects the micromechanical properties of the cytoplasm'', since ''both the elasticity (stretchiness) and the viscosity (propensity of a material to flow) of the cytoplasm in lmna(−/−) mefs are significantly reduced'' [537]. using ballistic intracellular nanorheology to evaluate the micromechanical properties of the cytoplasm of these cells, lee et al. conclude: ''together these results show that both the mechanical properties of the cytoskeleton and cytoskeleton-based processes, including cell motility, coupled mtoc and nucleus dynamics, and cell polarization, depend critically on the integrity of the nuclear lamina, which suggest the existence of a functional mechanical connection between the nucleus and the cytoskeleton. these results also suggest that cell polarization during cell migration requires tight mechanical coupling between mtoc and nucleus, which is mediated by lamin a/c'' [537] (see also [538, 539]). in addition to these very interesting findings, a functional association of lmna and the retinoblastoma protein (prb) exists. nitta et al. have shown that prb needs to be stabilized by lmna for ink4a-mediated cell cycle arrest and that somatic mutations in lmna may also have a role in tumor progression [540]. in mammalian cells, a) lmna colocalizes with c-fos at the nuclear envelope; b) lmna suppresses ap-1 through a direct interaction with c-fos and, in lmna-null cells, perinuclear localization of c-fos is absent (but it is restored when it is overexpressed); and c) lmna-null cells have enhanced proliferation [74]. these results obtained by ivorra et al. give an indication of yet another mechanism of cell cycle and transcriptional control mediated by lmna [74] (see also [541]). lmna has also been proposed as an inhibitor of adipocyte differentiation [542]. hutchingson et al.
have proposed the alias of ''guardian of the soma'' for lamins a and c as they seem to have ''essential functions in protecting cells from physical damage, as well as in maintaining the function of transcription factors required for the differentiation of adult stem cells'' [543] . from our results, we cannot completely establish whether the downregulation of cd40 and cd59 is enough to pinpoint an impaired or abnormal immune response. if we continue the inspection of the list, the first 20 probes give us more supporting evidence. the 20 probes correspond to 13 different genes. five of these 13 genes have gene ontology information annotated as ''defense response'': the above mentioned cd59 and cd40 as well as il4r (interleukin 4 receptor, cr616481), xbp1 (x-box binding protein 1, ak093842) and hla-a (major histocompatibility complex class i hla-a29.1, bu075230). takahashi et al. [544] report an inverse correlation between xbp1 expression and histological differentiation in a series of prostate cancers without hormonal therapy; the expression of xbp1 was localized in epithelial and adenocarcinoma cells of the prostate and the majority of refractory cancer cases exhibited weak xbp1 expression. mst1/stk4 (along with mst2/stk3) acts as an inhibitor of endogenous akt1, a mediator of cell growth and survival [545] . we cannot yet tell what lies behind their joint downregulation, but another interesting common denominator is that 12 out of 13 genes share a regulatory motif for nf-kappab (according to transfac, v$nfkb_q6_01). a putative role for nf-kappab in prostate cancer has been reported based on the observation of the centrality of nfkb in two up- and downregulated networks comparing prostate tumors and healthy tissue [546] and in a larger study by mcdonnel et al. [547] (255 core prostate cancer tissue microarrays from 47 prostatectomy specimens). 
several other researchers are currently investigating different roles of the nfkb family in prostate cancer [548, 549, 550, 551, 552, 553, 554, 555] and it could be a promising target for intervention [555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571 ]. if we include other genes following the ranking order, the first 38 genes in the ranking include 33 that have the regulatory motif v$nfkb_q6_01 (gather reports for this list a p-value of 0.0006). even when we double the list to the probes that correspond to the first 76 different genes recognized by gather, 58 of them have the regulatory motif v$nfkb_q6_01, with p-value = 0.003 (atp6ap2, bcat1, btg2 [572, 573, 574, 575, 576, 577, 578] , c14orf123, c18orf45, ccl2, cd302, cd40 (already discussed), cd59 (already discussed), chi3l1, col16a1, commd6, crabp2, csrp1, ctbp2, ctgf (connective tissue growth factor, [579, 580, 581, 582] ), des, dmn, dnajb1, egf, emp1, fhl2 [583, 584, 585, 586, 587, 588] , gripap1, gstm1 [589, 590] , hbegf, il4r, itga3, itga7, junb [591, 592] , kiaa0152, kiaa1191, kiaa1324, klf6, lamb2, lmna (already discussed), nfatc1, nfkb2, nudc [593] , p4hb, pdk2, pim1, pisd, pxn, rap1b, rnf40, sara1, sec61a1, sgta [594] , slc12a2, srd5a2, stat6 [595, 596] , tacstd2, tbx1, tmed3, vps39, wdfy3, xbp1 [544] , zak). this result indicates that our results support the importance of nfkappa-b and the huge amount of research effort to understand the role of the nfkappa-b activity and its potential as a target for intervention in prostate cancer (file s4). the group of 58 biomarkers contains one of particular interest, stat6. this gene is considered a survival factor in prostate cancer and a key regulator of the genetic transcriptional program responsible for progression [595] . stat6 has been recently linked to hpn as one of the most robust pair of biomarkers for prostate cancer using an integrative approach that linked several microarray datasets [596] . 
analysis using gather of this group reveals that six of these 58 genes are in kegg pathway path:hsa04510, focal adhesion (egf, itga3, itga7, lamb2, pxn, rap1b, p-value < 0.0007), and three of these are in path:hsa04512, ecm-receptor interaction (itga3, itga7, lamb2, p-value < 0.005), while four of these six are also in path:hsa04810, regulation of actin cytoskeleton (egf, itga3, itga7, pxn, p-value < 0.01). lamb2. alterations of the gene profile of lamb2 and cdkn2c/p18(ink4c), a cdk4 inhibitor, have been reported on the transition from prostatic intraepithelial neoplasia (pin) to prostate cancer [597] (see also [333] ). the contribution of the loss of these integrins and the subsequent derived impairment on cell adhesion has been reported in several tumours. ren et al. in [598] report that ''focal or no integrin alpha 7 expression in human prostate cancer and soft tissue leiomyosarcoma was associated with a reduction of metastasis-free survival (for example, for prostate cancer with focal or no expression, 5-year metastasis-free survival was 32%, 95% ci = 24.4% to 40.3%, and for prostate cancer with at least weak expression, it was 85%, 95% ci = 79% to 91%; p-value < .001)''.

''any method involving the notion of entropy, the very existence of which depends on the second law of thermodynamics, will doubtless seem to many far-fetched, and may repel beginners as obscure and difficult of comprehension.'' willard gibbs, graphical methods in the thermodynamics of fluids, (1873)

the changes of the normalized shannon entropy and statistical complexity of the gene expression profile of a cancer cell are associated with the gradual deterioration of genome transcriptional information content due to the modification of its structural and functional integrity during disease progression. 
our results clearly suggest that we can track the cancer cell's progression by following observable changes in the shannon entropy and, in particular, by employing the jensen-shannon divergence of the gene expression profile of a sample to the normal expression profile. we have also shown that if an average expression profile of some state of interest can be properly defined (e.g. distant metastasis), then the jensen-shannon divergence can help us to identify which probes best correlate with these measures, resulting in useful biomarkers. before any thermodynamical consideration could be discussed, we note that there is a clear and objective informational perspective that our study delivers. in this study we have chosen to position ourselves as the 'receivers' of a 'transcriptional message'. in this experimental perspective the tumor tissue is the 'sender' (the source of information) and the high-throughput technology (gene expression microarrays in this case) can be regarded as the transmission medium (providing noise and distortion). as we explain in the 'materials and methods' section, the shannon entropy of a gene expression profile is the average expected surprisal of that profile understood as a message. the normalized shannon entropy makes this surprisal an intensive measure and the correlation of the gene expression patterns across samples with this measure can deliver useful biomarkers to track the progression of transcriptional change. after normalization, we have a measure that does not depend on the number of probes of the high-throughput technology, although it obviously does depend on the type of probes used. we believe that the readers may have already noticed an apparent paradox. while some researchers understand cancer progression as a mechanism that increases entropy, we actually observe a reduction of normalized shannon entropy in this work. 
this means that our normalized average expected surprisal, as receivers of the transcriptional message, is smaller. we must then discuss the physical meaning of thermodynamic entropy, its current use in systems biology and cancer research genetics and the informational measure we use in this paper to clarify these notions in this context. in biomedical research there exists a certain consensus among cancer researchers that genetic instability or ''mutability'' is a major critical force of cancer progression, but it is not the only one to consider. it is clear that the mutational damage of key genes (like tp53, tert, brca1, rb1, etc.) and the collective damage inflicted on key dna repair mechanisms (like nucleotide-excision repair and base-excision repair) combine to accelerate the accumulation of genomic changes. sub-microscopic alterations of the genome accumulate in cancer progression in an irreversible way and ''are compounded by the widespread scrambling of the chromosome structure, and thus the karyotype, found in cells from the great majority of solid tumours'' [599] . in weinberg's own words [599] : ''we learned that this chromosomal chaos also contributes this progression forward''. this ''chromosomal chaos'' [600] or ''cancer as a chromosomal disease'' perspective is viewed by some researchers not as just a side consequence of mutational damage, but as the main core theme to understand a number of unexplained issues in cancer progression. ''in sum, cancer is caused by chromosomal disorganization, which increases karyotypic entropy'' [601] . regarding the cancer types studied in this paper, one particular ''measure of disorder of a system'', aneuploidy, has been observed in poorly-differentiated prostate cancer cells and it is often associated with a more aggressive phenotype [602, 603] and increased psa levels [604, 605] , and correlates with gleason score [606, 607, 608] . 
gene fusions and chromosomal rearrangements are another source of increase in the ''disorder'' of the genome organization and they are increasingly being recognized as major players in prostate cancer progression [609] . the increase in ''karyotypic complexity'' and ''extended aneuploidy and heteroploidy'' may be already enough to develop a malignant melanoma phenotype, as the report of gagos et al. indicates [610] . the observed finding of aneuploidy in melanoma (also including uveal melanoma) is also increasingly important due to a number of different independent observations [247, 611, 612, 613, 614, 615, 616, 617, 618] . it is in this context that the word 'entropy' has been used. the magnitude of the ''chromosomal chaos'' is also evident from comparative genomic hybridization (cgh) studies which show significant variations in the copy number of individual chromosomal segments. 'chaos' is really a very appropriate word to describe what we observe from cgh data. the genomic changes are not distributed uniformly at random. 'chaos' has been described by some researchers as ''a kind of order without any periodicity''. some common changes seem to consistently appear in several independently arising tumours of the same type, and sometimes the researchers suggest common links [619] . our work has addressed, in part, this question: ''can we quantify the chaos observed in the genome from the increasingly available transcriptional data and relate it to tumour progression?'' if no commonalities were observed, we would not have found interesting biomarkers that seem to correlate strongly with the divergences from normal tissue types. we know from our results that these commonalities do occur. we need to go back to basics to explain these evolving concepts and resolve this apparent paradox. the phrase ''karyotypic entropy'' has been used in the past to define what is actually a divergence from the normal chromosome structure and its genomic organization. 
this denomination has been employed by several authors, notably [601] , and it has also been used in at least two other publications [620, 621] . these works have in common the use of this term to refer to a ''disorder'', fuelled by the undergraduate-textbook indoctrination that associates the increase of entropy in natural spontaneous processes with the increase of ''observed disorder'' in the system. we propose that the use of a natural measure of divergence, the jensen-shannon divergence, could be not only a more formal, but also a more appropriate modelling approach. as such, we propose to introduce the term 'karyotypic divergence' or 'karyotypic jensen-shannon divergence' to replace this concept and to avoid a subjective approach. why is it the case that we observe the normalized shannon entropy of the transcriptional profile decreasing with cancer progression when intuitively our average expected surprisal (shannon entropy) should increase with progression? arieh ben-naim in his recent book ''a farewell to entropy: statistical thermodynamics based on information'' [622] comments: ''it is interesting to note that landsberg (1978) not only contended that disorder is an ill-defined concept, but actually made the assertion that 'it is reasonable to expect 'disorder' to be an intensive variable'''. ben-naim also states: ''in my view, it does not make any difference if you refer to information or to disorder, as subjective or objective. what matters is that order and disorder are not well-defined scientific concepts. 
on the other hand, information is a well-defined scientific quantity, as much as a point or a line are scientific in geometry, or mass or charge of a particle are scientific in physics.'' however, in a manuscript entitled ''can entropy and 'order' increase together ?'' landsberg defines (in an attempt to decouple the notions of order and entropy), for a thermodynamical system that can be in n states, the 'disorder' d(n) to be the normalized entropy (which is a function of n) divided by boltzmann's constant [623] . 'disorder' then is an intensive magnitude bounded by 0 and 1, and 'order' is defined as 1-d(n) . while landsberg's decoupling argument between order and entropy [623] may still be controversial in physics, the question is pertinent for our apparent paradox (the question that motivates this subsection). borrowing from the title of his paper we could now state the central question as ''can shannon entropy increase while the normalized shannon entropy decreases?'' the solution of this apparent paradox is a trick of escapology, perhaps also paralleled by what a cancer cell may be experiencing (or ''reacting'' to, in response to increased sources of stress), and it is worth discussing in this context. let h[x] be the shannon entropy for an ensemble x with n different values. we will now assume, and here is the trick, that n is not a constant, but a function of time n(t). let d(x(n(t))) be the normalized shannon entropy. by definition d(x(n(t))) = h(x(n(t)))/log 2 (n(t)). then, just by taking the time derivatives it can be shown that the time variation of d(x(n(t))) can be negative, although the time rate of h[x] can be positive. where k is a constant. the escape from our paradox is ''achieved'' by making explicit the time variability of n(t). landsberg explicitly mentions that biological systems are examples where growth processes increase n(t), and perhaps the increased diversity in the transcriptome of a cancer cell during progression is one such example. 
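the sign argument can be made explicit with the quotient rule. the short derivation below is only a sketch under the definitions given above and is not necessarily the form used by the authors; as assumptions of ours, it abbreviates h(x(n(t))) as H(t), n(t) as N(t) and d(x(n(t))) as D(t), and it takes H and N to be differentiable with N(t) > 1 and H(t) > 0.

```latex
% Definition: D(t) = H(t) / \log_2 N(t)
\begin{align}
\frac{dD}{dt}
  &= \frac{1}{\log_2 N}\,\frac{dH}{dt}
   \;-\; \frac{H}{N \ln 2 \,(\log_2 N)^2}\,\frac{dN}{dt}, \\[4pt]
\frac{dD}{dt} < 0
  &\iff \frac{1}{H}\,\frac{dH}{dt} \;<\; \frac{1}{N \ln N}\,\frac{dN}{dt}
  \qquad \text{(using } \ln 2 \cdot \log_2 N = \ln N\text{)}.
\end{align}
```

in words, the normalized entropy falls whenever the relative growth rate of H is smaller than the relative growth rate of ln N, so a positive dH/dt is compatible with a negative dD/dt provided the number of available values grows fast enough.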
this discussion somehow resolves the apparent disassociations due to language barriers that may exist between the different disciplines (physics, information theory, molecular biology and oncology). a biologist may regard a cancer cell as an entity that, during progression, may ''spread'' its transcriptomic profile, including the generation of a large number of novel molecular species (due to characteristics acquired during its ''devolution'' from the normal type). in our informational perspective, this would be analogous to a situation in which the sender of a message, after some time, decides to increase the size of the alphabet of transmitted symbols. clearly, it is intuitive to think that the receiver would be in a situation of increased shannon entropy. however, if the receiver is not aware of the new symbols (or is not able to detect them) and some of the symbols of the previous alphabet are no longer used, the receiver would now perceive a reduction of normalized shannon entropy, observing an increasing order. we now borrow an illustrative example from landsberg [623] , but we add a twist to this argument for the purpose of illustrating this discussion. suppose we have a sender transmitting only two possible symbols (n = 2), and we will assume that they have the same probability; let's denote this as (1/2, 1/2). then the average expected surprisal (shannon entropy) is h(x) = 1, and the normalized shannon entropy is also equal to one. assume now that our sender starts to transmit using another symbol, so that we now have theoretical probabilities of (0.5, 0.25, 0.25). then n = 3, and the average expected surprisal increases to h(x') = 1.5, while the normalized shannon entropy is now 1.5/log 2 (3) = 0.946... (a reduction). this 'third symbol' could actually represent a new ''molecular species'' or a protein isoform that would not be normally expressed in that tissue type [624] , or even something entirely new, product of a mutational/deletional event. 
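the arithmetic of this example is easy to check numerically. the following python sketch is an illustration of ours, not code from the study: the function names are hypothetical, and the detection-limited case assumes that a receiver who can detect only some symbols sees the true probabilities renormalized over the detectable symbols.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_entropy(probs):
    """Entropy divided by log2(n), where n is the number of symbols."""
    n = len(probs)
    return shannon_entropy(probs) / math.log2(n) if n > 1 else 0.0

def observed(probs, detectable):
    """Assumption: undetected symbols are dropped and the rest renormalized."""
    seen = [probs[i] for i in detectable]
    total = sum(seen)
    return [p / total for p in seen]

two_symbols = [0.5, 0.5]            # sender uses two equiprobable symbols
three_symbols = [0.5, 0.25, 0.25]   # sender adds a third symbol
seen = observed(three_symbols, [0, 1])  # receiver detects only the first two

print(shannon_entropy(two_symbols), normalized_entropy(two_symbols))      # 1.0 1.0
print(shannon_entropy(three_symbols), normalized_entropy(three_symbols))  # 1.5, about 0.946
print(seen, normalized_entropy(seen))  # about (2/3, 1/3), about 0.918
```

the entropy rises from 1 to 1.5 bits when the third symbol appears, while the normalized value falls; with the third symbol undetected, the receiver perceives a still lower value.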
if our hypothetical high-throughput technology can only detect the first two symbols, and following the conventions we established in the 'materials and methods' section, we would be ''observing'' frequencies of (2/3, 1/3) since the other events would not be detected with our equipment. as a consequence, since log 2 (2) = 1, the shannon entropy and the normalized shannon entropy coincide and are both reduced to 0.918296. obviously, we cannot count what we cannot observe. as a consequence, a degenerating transcriptional profile that produces novel molecular species that a particular technology cannot measure, and at the same time reduces some of those that it can, would look increasingly more ordered. we envision that physicists may find here a fertile ground to explore new ideas and attempt novel mathematical formalisms for cancer progression from the realm of finite-state thermodynamics [625] and in particular endoreversible processes [626] and endoreversible thermodynamics [627] . some molecular alterations would then be part of the set of reversible processes that could occur in a cancer cell, while other processes like aneuploidy or gene fusions could be truly ''irreversible genetic switches'' associated with cancer progression [628] . if we assume that the process is slow (i.e. the time required for significant variations of the transcriptome's profile is large in comparison with the time scales of the cell's processes), and following the results of spirkl and ries [626] , it may be possible that a constant entropy production rate exists during cancer progression, leading to hauptmann's ''entropic devolution'' [629] . hauptmann sees a malignant tumour as ''a dissipative structure arising within the thermodynamical open system of the human body'' that starts when ''a localized surplus of energy exists and there is no possibility to export entropy. 
an energetic overload in most malignant cells is indicated by their abnormally high phosphorylation state.'' his perspective, preceded in part by dimitrov [630] , klimek [631, 632] and marinescu and viculetz [633] , might then fit well an endoreversible thermodynamic formalism. hauptmann says in [629] ''i believe that cancer is a special kind of adaptation to energetic overload, characterized by multiplication and mutation of genomic dna (generation of new biomolecules which enhance the probability of survival under harmful conditions), and by chiral alterations (reduction of entropy by entrapping energy) leading to abnormal configurated biomolecules. in this regard the genetic alterations are probably secondary changes. cancer serves to dissipate energy in a type of developmental process but one in which the results are harmful to the whole organism: an entropic devolution.'' this thermodynamical perspective is now worth exploring and we will discuss it in this context. assuming that a cancer cell is in a state of ''energy overload'', without ''the possibility of exporting entropy'', could it lead to some type of ''genetic alterations''? which key mechanisms might be impaired? what consequences is this ''system'' delivering? could this be another hallmark for oncosystems identification? in 1871, in his book called ''theory of heat'', maxwell speculated on the idea of ''a being, who can see the individual molecules'' and who has enough reactive intelligence to open and close a unique small hole existing between two communicating vessels (called 'a' and 'b'). 
an ideal gas filled both vessels, so that starting at uniform temperature the intelligent being could observe the molecules and close and open the hole according to a mission: ''to allow only the swifter molecules to pass from a to b, and only the slower ones pass from b to a.'' the being would thus, ''without expenditure of work raise the temperature of b and lower that of a in contradiction to the second law of thermodynamics.'' the ability of the ''being'' to use observable information about the system to lower the thermodynamical entropy has motivated many articles in physics and fuelled the imagination of many since it was originally introduced by maxwell, and named a ''demon'' by thomson three years later [622] . an excellent collection of articles until 1990 [634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644] was edited by leff and rex [645] . the maxwell ''demon'', far from being ''exorcised'' from physics, still inspires interesting new perspectives [634,635,636,637,638,639,640,641,642,643,644,646,647 ]. in a letter to peter guthrie tait, maxwell writes about the ''demons'': ''is the production of an inequality of temperature their only occupation? no, for less intelligent demons can produce a difference in pressure as well as temperature by merely allowing all particles going in one direction while stopping all those going the other way. this reduces the demon to a valve. as such value him. call him no more a demon but a valve like that of the hydraulic ram, suppose.'' (from [645] , p. 6). maxwell gives again here a sign of his brilliant mind, ''degrading'' the demon to a valve, but also offering an inspiring perspective to oncosystems research. which types of mechanisms exist in biological systems, and particularly in individual cells, to control these differential values in key parameters? 
could changes of key physical parameters for metabolic processes of the cytoplasm and the cell's organelles, like temperature, volume, ph or electrochemical potentials, also be implicated in cancer progression? the influence of temperature may provide an interesting working hypothesis for further research. what are the consequences if cancer cells are a different type of open system which also operates at a different temperature than a normal cell? butler et al. have studied p53 and they argue that at temperatures above 37 degrees centigrade wild-type p53 spontaneously loses dna binding activity. while folding kinetics do not show important changes in a range from 5 to 35 degrees c, the unfolding rates accelerate 10,000-fold. this leads to a somewhat unexpected mechanism of p53 inactivation. it could be the case that a fraction of p53 molecules become trapped in misfolded conformations with each folding-unfolding cycle due to the increased frequency of cycling. the occurrence of misfolded p53 proteins can lead to aggregation and subsequent ubiquitination in the cell, leading to p53 inactivation [648, 649] . if a key ''guardian of the genome integrity'' [650, 651] and its remarkable conformational flexibility [652] is challenged by an increase of temperature [653] , its role in genotoxic damage and adaptive response (like that of the skin to uvb damage [654] ) may be impaired. the same may occur for other members of the dna damage response. an increment in temperature has already been linked to skin carcinogenesis. boukamp et al. report in [655] that ''exposure of immortal human hacat skin keratinocytes (possessing uv-type p53 mutations) to 40 degrees c reproducibly resulted in tumorigenic conversion and tumorigenicity was stably maintained after recultivation of the tumors.'' on the other hand, natural gradients in physical biochemical properties can also be challenged in a cancer cell. 
this in turn results in metabolic processes running under abnormal parametric circumstances. it is well-known that compartmentalization, in biological systems, naturally requires the existence of mechanisms that keep some key state variables relatively constant, or within bounds, for normal operation of the metabolic processes. one example is very illustrative and a case in point. instead of demons, holes, or valves, the cell requires pores in its membranes to allow osmotic regulatory processes, yet it should preclude the conduction of protons. this is a nanotechnological design problem not faced by maxwell, but certainly solved by biological systems without the need of an ''intelligent being'', as maxwell cleverly pointed out to tait in his letter. this discussion brings us to one of the gene families we have already discussed in this paper, the aquaporins [184, 656, 657, 658, 659, 660, 661] . they are considered the primary water channels of cell membranes [662, 663, 664, 665] . the specific functions of each member of this family are now being slowly mapped by several research labs around the world [666] . their clinical role in cancer [667, 668, 669, 670, 671, 672, 673, 674, 675] , obesity [676] , malaria [677, 678] and other diseases is emerging [657, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689] . in [690] , our group observed the downregulation of aqp3 in all melanoma cell lines studied of the nci-60 dataset of ross et al.; this downregulation was also observed for the cns and renal cell lines. aqp3 was relatively upregulated for leukaemia and colon cell lines (we refer the reader to the supplementary material of [690] for details). inhibition of aqp3 in prostate cancer cells was already proposed as a mechanism that increases the sensitivity to cryotherapy treatment [691] . 
the aquaporins are not ''an intelligent being'' in any real sense, yet they are so formidably selective that they could easily parallel the efficiency of maxwell's demon in creating the right conditions for the cell. wu et al. give us some clues on the role of point mutations in aqp1 and how its effective electrostatic proton barrier can be impaired [692] . the detailed mechanistic explanation of this extraordinary selectivity is under intense investigation with a number of techniques, including sophisticated molecular dynamics simulations; for an overview of this field see [665, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708] . one less known feature of aquaporins is that they may channel not only water, but also carbon dioxide and ammonia [709, 710, 711] , glycerol [712] , urea and other small solutes [713] and, very relevant for cancer research, hydrogen peroxide [188] . at least two members of this family have been observed in the inner mitochondrial membrane in different tissues. this in turn may indicate mitochondrial roles for aquaporins in osmotic swelling induced by apoptotic stimuli [714] . could it be possible that we can track cancer progression by looking at some of these ''maxwell demons''? we have seen in figure 10 that aqp3 has a reduced expression with increased progression in our melanoma dataset. cao et al. reported that ultraviolet radiation induced aqp3 down-regulation in human keratinocytes; thus aqp3 has become a strong and plausible link between uv radiation, skin dehydration [186, 715] and photoaging [189] . this may indicate an impaired function in skin hydration [184, 185, 716, 717, 718, 719] . the expression of aqp3, as well as aqp1, aqp5, and aqp9, seems to be correlated with melanoma progression, indicating a common pattern of downregulation from the higher values in normal skin and benign nevi (see figure 17 ). 
does a similar pattern of aquaporin downregulation exist in prostate cancer? wang et al. have looked at the expression and localization of aqp3 in human prostate using cell lines as well as patient samples. they have observed aqp3 mrna ''in both normal and cancerous epithelia of human prostate tissues, but not in the mesenchyme. in the normal epithelia of the prostate, localization was limited to cell membranes, particularly the basolateral membranes. however, the expression of aqp3 protein in the cancer epithelia was not observed on the cell membranes.'' this finding seems to implicate the subcellular localization of aqp3 as a possible indicator of a transition to a more malignant phenotype. lapointe's dataset allows us to see the downregulation of aqp3 and aqp1. a large subgroup of primary prostate tumors has reduced levels of aqp3 and aqp1, as do most of the lymph node metastasis samples [ figure 18 ]. one critique that we are aware we could receive is that the current manuscript presents a novel methodology and an underlying unifying theory based on retrodictions or postdictions. indeed we have shown that the use of the normalized shannon entropy and the information theory quantifiers (the m-complexities and the jensen-shannon divergence) allows us to monitor cancer progression and to identify the best biomarkers that correlate with the transcriptomic changes. our approach works in a retrodiction way in that it looks at data already obtained by other studies, but gives a unifying framework to track cancer progression. for instance, on true et al's dataset, our unifying hallmark of cancer gives not only maoa, which was already identified in the original publication, but also amacr, cd40, cdk4, etc., which are very important biomarkers for prostate cancer. analogously, the identification of klk3/psa in lapointe's dataset is another important retrodiction which shows the power of the method. 
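as a rough illustration of the kind of ranking described here, the python sketch below scores each probe by how strongly its expression correlates, across samples, with the jensen-shannon divergence of each sample from a reference (normal) profile. it is a sketch under assumptions of ours and not the authors' implementation: profiles are turned into distributions by dividing by their sum, the divergence uses equal weights and base-2 logarithms, the correlation is pearson's, and all function names and the toy data are hypothetical.

```python
import math

def to_distribution(expr):
    """Scale non-negative expression values so that they sum to one."""
    total = sum(expr)
    return [x / total for x in expr]

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon(p, q):
    """Equal-weight Jensen-Shannon divergence (bits) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def rank_probes(samples, normal_profile):
    """Rank probe indices by |correlation| of expression with divergence."""
    ref = to_distribution(normal_profile)
    jsd = [jensen_shannon(to_distribution(s), ref) for s in samples]
    scores = [(i, pearson([s[i] for s in samples], jsd))
              for i in range(len(normal_profile))]
    return sorted(scores, key=lambda t: abs(t[1]), reverse=True)

# Toy data (hypothetical): four probes, four samples drifting away from normal.
normal = [10, 10, 10, 10]
samples = [[10, 10, 10, 10], [12, 9, 10, 9], [16, 7, 10, 7], [20, 5, 10, 5]]
ranking = rank_probes(samples, normal)
print(ranking)
```

in this toy case probe 0 rises and probes 1 and 3 fall as the divergence grows, so they rank above the unchanged probe 2; on real data the choice of normalization and correlation measure would need to follow the conventions of the 'materials and methods' section.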
in some sense our approach also works in a postdiction way, as it helps to evaluate the speculation that cancer cells have ''an entropic devolution''. our results show that the variations of normalized shannon entropy and jensen-shannon divergences indeed give measurable changes, and that these changes are related to important biomarkers in the two types of cancer studied in this work. in addition, we remark that we are literally making hundreds, or even thousands, of predictions. the results in the 'supplementary material' provide this information for the detailed scrutiny of our peers. we believe that other probes with gene expression patterns in high correlation with the probes discussed in this paper, and perhaps less studied by immunohistochemistry and other methods in the two cancer types studied here, are worth exploring as a group of biomarkers. these predictions can be tested with further studies on staging and patient stratification. a very recent study by ballal et al. has linked brca1 to telomere length and maintenance and its loss from the telomere in response to dna damage [720] (see also [721] ). we have previously mentioned that brca1 is a conspicuous biomarker arising from the analysis of true et al.'s dataset using our methods. we found this to correlate with a previous study that showed that brca1 has a reduced expression in immortalized prostate epithelial cells before and after their conversion to tumorigenicity [493] . we also mentioned that the knockdown of brca1 leads to an accumulation of multinucleated cells [492] , and that brca1 contributes to preserving chromosomal stability [490] . ballal et al. used telomeric chip assays to detect brca1 at the telomere and reported time-dependent loss of brca1 from the telomere following dna damage. 
due to the role of telomeres in maintaining chromosomal stability [722] and the inverse correlation of telomere length and divergent karyotypes in prostate cancer cell lines [723, 724] (as well as the recognized role of telomere dysfunction in the induction of apoptosis or senescence in vivo [725, 726, 727, 728, 729, 730] , increase of mutation rates [731] , dna fragmentation [732] , and their relation with dna damage signalling [733] ), we checked for other probes of genes involved in telomeric function. from those which we were able to identify in true et al's dataset, we have found a strong positive correlation of the expression of brca1 with terf2/trf2 (telomeric repeat binding factor 2) [734] and a negative correlation with the expression pattern of terf2ip (telomeric repeat binding factor 2, interacting protein) [ figure 19 ]. finally, one particular type of probe has also caught our attention, and we would like to refer to it before concluding this section. with the denomination of 'non-coding rna' we identify those rna molecules which are functional but are not translated into proteins. many microarray chips contain probes that are annotated as 'non-protein coding', indicating that there might be some valuable expression data that we can also mine for information. we note that our method, although employing transcriptomic data, does not limit its application to protein-coding information, and that the combined use of protein-coding and non-protein-coding probe expression would allow a more comprehensive view of the transcriptional state of the cell. among non-protein-coding rnas, micrornas [735] are gaining acceptance as key players in several cancers [736, 737, 738] (including prostate cancer [739, 740] ), but the so-called ''long noncoding rnas'' [741] are also gaining a place in the scenario of cancer biomarkers (see [742] , and [743, 744, 745] ). 
we thus turned our attention to these probes that have been annotated as ''non-protein coding'' and we highlight some of them that have very high correlation values with the normalized shannon entropy in true et al.'s prostate cancer dataset. in particular, the probes for malat1/malat-1 [742, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758] have a very conspicuous position (see figure 20). they are located very close to other protein-coding biomarkers that have also lost expression and have been discussed in this work, like sfpq, cd40, brca1, and tp53 (see figure 16). malat1 has recently been pointed out as a biomarker in primary human lobular breast cancer as a result of an analysis of over 132,000 roche 454 high-confidence deep sequencing reads [749]. an international team, searching among thousands of novel non-coding transcripts of the breast cancer transcriptome, was able to identify more than three hundred reads corresponding to malat1 [749]. this noncoding rna, which was identified in 2003 in non-small cell lung cancer, was shown to be highly expressed (relative to gapdh) in lung, pancreas and prostate, but not in other tissues including muscle, skin, stomach, bone marrow, saliva, thyroid and adrenal glands, uterus and fetal liver [758]. malat-1, also known as neat2, is considered to be ''extraordinarily conserved for a noncoding rna, more so than even xist'' [754]. our results indicate that the reduction of expression of some non-coding rnas, in particular of malat-1 and snora60, with respect to their normal expression in prostate, as well as the upregulation of snhg8 and snhg1, should be monitored as useful biomarkers to track disease progression. we will now address another non-coding rna called neat1 which, like neat2, is also conserved in the mammalian lineage. before we move on to neat1, we will first recall a previous result.
we have noted before the conspicuous position of sfpq/psf (polypyrimidine tract-binding protein-associated splicing factor) in figure 16. the expression of a probe for sfpq has the highest correlation with the values of the normalized shannon entropy. we highlighted before that sfpq/psf is a putative regulator of growth factor-stimulated gene expression [498]. the loss of sfpq expression during the progression of prostate cancer may be an important key to understand this disease or one of its subtypes. we have also mentioned that the ar/psf complex interacts with the psa gene (perhaps the most well-established prostate cancer biomarker) and that sfpq/psf inhibits ar transcriptional activity [499]. kuwahara et al. showed that sfpq, together with nono (non-pou-domain-containing, octamer binding protein) and pspc1 (paraspeckle protein 1 alpha isoform, formerly known as psp1), is expressed in mouse sertoli cells of the testis, where these proteins form complexes that function as coregulators of androgen receptor-mediated transcription [500]. while new research results [759] link sfpq and nono/p54nrb with the rad51 family of proteins (largely regarded as another key protector of chromosome integrity, being involved in homologous recombination dna repair), it is perhaps the co-localization of sfpq and nono in paraspeckles that makes this group also remarkable [760]. paraspeckles [760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777] are a novel nuclear compartment, of approximately 0.2-1 μm in size, discovered in 2002 by fox et al. in dundee, scotland, following the identification of the protein pspc1 (af448795) in the nucleolar proteomics project at lamond's lab, which is described well by fox et al. [777]. three years later, fox, bond and lamond showed that nono and pspc1 form a heterodimer that localizes to paraspeckles in an rna-dependent manner [773].
paraspeckles are dynamic structures, observed in numbers that vary between 10 and 20, that seem to control gene expression via retention of rna in the nucleus [772]. a long noncoding rna called neat1/men epsilon/beta [754, 760, 762, 764, 778], which colocalizes with paraspeckles, seems to be integral to their structure. depletion of neat1 eradicates paraspeckles, and a biochemical analysis by clemson et al. indicates that neat1 binds the paraspeckle proteins sfpq/psf, p54nrb/nono and pspc1. neat1 is also known as tncrna (trophoblast-derived noncoding rna) [754, 779, 780, 781, 782, 783, 784, 785, 786] and probes for tncrna exist in this dataset. we have observed in true et al.'s dataset that there exists a high correlation between the normalized shannon entropy and the expression of sfpq/psf, p54nrb/nono, and tncrna. overall, this implies that the disruption of the function of the paraspeckles is correlated with the increasing signs of deterioration of the normal transcriptomic state of the cells. while a causal relationship still needs to be proved, we admire the mathematical elegance of the normalized shannon entropy of the samples, a global measure of the average expected surprisal of the transcriptome, which in turn has led us to consider the dysfunction of the smallest nuclear body as a putative biomarker of disease progression. the role of sfpq/psf in the control of tumorigenesis is under investigation [787] and the information coming from these studies would need to be integrated with their role, together with p54nrb/nono and tncrna, in paraspeckles if we want to achieve a better understanding of these mechanisms.
in this contribution we have shown that, for the melanoma and prostate cancer datasets studied, the quantitative changes of information theory measures (normalized shannon entropy, jensen-shannon divergence and the novel statistical complexity quantifiers defined here) are in high correlation with gene expression changes of well-established biomarkers associated to cancer progression. in addition, variations of the basic technique (i.e. a modified form of statistical complexity) allow us to better understand the phenotypic changes observed in these samples, which are associated with the progression and the transitions of the gene expression profiles. for instance, in a properly defined statistical complexity vs. entropy plane, on a melanoma dataset first studied in ref. [110], samples appear in well differentiated ''clusters''. these clusters correlate well with the phenotypic characteristics of normal skin, nevi, primary and metastatic melanoma. in this ''complexity vs. entropy'' plane, primary melanoma samples appear ''bridging'' benign nevi and metastatic melanoma samples.
figure 19. the stacked average gene expression of probes corresponding to brca1 and terf2 (telomeric repeat binding factor 2) in true et al.'s prostate cancer dataset. the first group of samples (1 to 9 in green) corresponds to gleason 3 pattern, indicating that most of the samples in this group have no significantly reduced expression of this pair of genes. the second group of columns (10 to 21 in yellow) corresponds to gleason 4 patterns and the last 8 columns (22 to 29 in red) correspond to gleason 5 samples. a very recent study by ballal et al. has linked brca1 to telomere length and maintenance and its loss from the telomere in response to dna damage [720] (see also [721]). there is an increasing trend of downregulation, so it would be interesting to evaluate if indeed this pair of proteins could be an early marker of downregulation useful to evaluate samples with gleason pattern 2, or if it may constitute a biomarker useful to distinguish a prostate cancer subtype. doi:10.1371/journal.pone.0012262.g019
our results may also suggest that the evolution of metastatic melanoma leads to at least two different subtypes. the normalized shannon entropy of a transcriptional sample profile is calculated by associating the measured expression value of a gene with its relative probability of being expressed. we have observed that, in general, the transcriptomes of tumour progressing cells tend to have lower values of normalized shannon entropy than normal ones. given a population of normal cells of a given tissue type, it is then possible to compute useful measures of divergence of cancer cell profiles from the normal average expression profile in terms of information theory quantifiers, namely the normalized shannon entropy (shannon evenness) and the generalized statistical complexity [788, 789, 790]. in addition, our observation of the correlation of the statistical complexity of tumours with their natural progression allows an unprecedented way of finding biomarkers that link with the gradual deterioration of genome integrity. the proposed methodology uncovered, for the first time, evidence of the putative role of impaired centrosome cohesion in melanoma progression. statistical complexity has thus been able to pinpoint otherwise unrecognized biomarkers in concert with existing ones, reinforcing the view that ''chromosomal chaos'' and ''cancer as a chromosomal disease'' can be a useful guiding principle to understand the molecular biology of cancer and uncover the timeline of its progression. this is a powerful method to uncover ''oncosystems'' instead of ''oncogenes''. ''oncosystems'' are highly differentially dysregulated sets of genes that, if linked with the molecular ''hallmarks of cancer'' described in the introduction and with existing databases of putative common functional genomic annotations, can help to understand the biological progression pathways that drive the disease.
on one of the prostate cancer datasets studied (obtained from a previously published study [44]), we observe a gradual pattern of reduction of normalized shannon entropy across three well characterized tissue types: normal prostate, primary prostate tumours and lymph node metastases. on a different dataset on prostate cancer (from ref. [332]), we observe that a group of samples having gleason patterns 4 and 5 (two patterns which are typically associated to an aggressive phenotype) have lower normalized shannon entropy values than a subset of gleason pattern 3 (a pattern which is normally associated to a less aggressive phenotype but which nevertheless is still of clinical concern). however, a group of samples having gleason patterns 3, 4, and 5 is also revealed; this mixed cluster has a mid-range entropy. this is an interesting fact which correlates with the limitations observed in ref. [332]. we note the authors' comment: ''we were unable to identify a cohort of genes that could distinguish between pattern 4 and 5 cancers with sufficiently high accuracy to be useful, suggesting a high degree of similarity between these cancer histologies or substantial molecular heterogeneity in one or both of these groups.'' our results provide a conciliatory middle ground that explains the perceived clinical usefulness of gleason pattern classification, widely used around the world, while at the same time revealing the reason for the difficulties of obtaining a good transcriptional signature for the other two patterns [791].
figure 20. ... figure 16 is that we have now highlighted the position of some probes which have been annotated as corresponding to ''non-coding rnas''. in particular, we highlight those of malat1 (metastasis associated lung adenocarcinoma transcript 1, (non-protein coding)) and snora60 (small nucleolar rna, h/aca box 60), both increasingly downregulated, and those of snhg1 (small nucleolar rna host gene 1 (non-protein coding)) and snhg8 (small nucleolar rna host gene 8 (non-protein coding)). the probes for malat1/malat-1 [742, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758] have a very conspicuous position, which we could judge a priori to be equivalent in relevance to those of the previously discussed sfpq, cd40, brca1, and tp53 (see figure 16). malat1 has recently been pointed out as a biomarker in primary human lobular breast cancer as a result of an analysis of over 132,000 roche 454 high-confidence deep sequencing reads. within the thousands of novel non-coding transcripts of the breast cancer transcriptome, guffanti et al. identified more than three hundred reads corresponding to malat1 [749]. this non-coding rna, first identified in 2003 in non-small cell lung cancer, was shown to be highly expressed (relative to gapdh) in lung, pancreas and prostate, but not in other tissues including muscle, skin, stomach, bone marrow, saliva, thyroid and adrenal glands, uterus and fetal liver (see figure four of ref. [758]). our results indicate that the reduction of expression of some non-coding rnas, in particular of malat-1 and snora60, with respect to their normal expression in prostate, as well as the upregulation of snhg8 and snhg1, should be monitored as useful biomarkers to track disease staging and progression to a more malignant phenotype. interestingly enough, a study published in 2006 by nadiminty et al. has shown that klk3/psa modulates several genes, reporting a 16.5-fold downregulation of malat1 [810]. while these results have been obtained using the human osteosarcoma cell line saos-2, our results indicate that malat1 expression in the normal prostate and in cancer cells could also be considered as a relevant biomarker to be tested in the future. doi:10.1371/journal.pone.0012262.g020
we have seen, through a detailed discussion of several biomarkers in three different datasets, that the variation of the gene expression distributional profile can be characterized via information theory quantifiers. our study also showed that currently established biomarkers of the two diseases studied seem to coincide with the genes that best co-vary with these quantifiers. for instance, amacr, in the second prostate cancer dataset studied, naturally appears as one of the most correlated genes (in both the pearson and the spearman sense) with the pattern of variation of entropy of the samples. together with maoa, which is the highlighted gene in true et al.'s [332] original publication, amacr is now being recognized as one of the best biomarkers in primary prostate cancer, with approximately 180 publications dedicated to it in the past five years. we have also shown that many gene probes that best correlate with the divergence from the normal tissue profile have been identified as useful biomarkers (via other accepted validation methods). this said, the use of other sources of information, like pathway or gene ontology databases, has led us to the identification of other cell processes that may be altered. we have presented a unifying hallmark of cancer: the cancer cell's transcriptome changes its normalized shannon entropy (as measured by high-throughput technologies), while it increments its physical entropy (via creation of states we might not measure with our devices). this hallmark allows us, via the use of the jensen-shannon divergence, to identify the arrow of time of the process, and helps to map the phenotypical and molecular hallmarks of cancer as major converging trends of the transcriptome. the methodology has produced remarkable postdictions and retrodictions that show that it can predictively guide biomarker discovery.
we refer the reader to the original publications for details of methods for data collection, but we highlight here some aspects that are important to understand the data generation process for the purpose of our analysis. samples were obtained from radical prostatectomy surgical procedures. samples were labelled as ''tumors'' if they contained at least 90% of cancerous epithelial cells, and they were considered as ''non-tumor'' if they contained no tumor epithelium and came from the noncancerous region of the prostate. the latter samples were labelled ''normals'', although the authors alert that some may contain dysplasia. in this dataset, lapointe et al. performed gene expression profiling using cdna microarrays containing 26,260 different human genes (unigene clusters). cy5-labeled cdna was prepared from 50 μg of total rna from prostate samples, and cy3-labeled cdna from 1.5 μg of mrna of a common reference pooled from 11 human cell lines (see ref. [792]). the fluorescence ratios were subsequently normalized by mean centering genes for each array, a relatively standard procedure. in addition, to minimize potential print-run-specific bias, lapointe et al. report that ratios were then mean centered for each gene across all arrays according to ref. [793]. we have only used the genes that the authors report in their first figure: 5,153 genes that have been well measured and have significant variation in some of the samples. for the other details of their materials and methods we refer the readers to the supporting notes and the materials and methods section of their original publication [44].
samples were obtained from nevus volunteers and melanoma patients, and only those samples that have more than 90% of tumor cells were profiled. the 20,862 cdnas used (research genetics, huntsville, al) represent 19,740 independent loci.
(unigene build 166). the median ratio values from the experiment were subjected to linear normalization in nomad (which can be accessed at http://derisilab.ucsf.edu), log-transformed (base 2), and filtered for genes where data were present in 80% of experiments, and where the absolute value of at least one measurement was >1.
in this dataset, samples have information of 15,488 spots per array, with a total of 7,700 unique cdnas represented. the samples were obtained from frozen tissue blocks from 29 radical prostatectomies accessioned and selected to represent gleason grades 3, 4, and 5. the samples are ''treatment naïve'', meaning that they were selected so that their gene expression profiles are free of any bias that treatment before prostatectomy could introduce. the frozen sections (8 μm) were cut from optimal cutting temperature medium blocks and immediately fixed in cold 95% ethanol. around 5,000 epithelial cells from both histologically benign glands and cancer glands were separately laser-capture microdissected (lcm). the authors of the study were also very careful to include only one gleason pattern in each laser-captured cancer sample, following a process in which the patterns were assessed independently by two investigators. the matched benign epithelium was captured for each cancer sample, for a total of 121 samples. an important characteristic of this dataset is the normalization procedure. for each spot and in each channel (cy3 and cy5), true et al. subtracted the median background intensity from the median foreground intensity, and subsequently the log ratios of cancer expression to benign expression were computed. these ratios were obtained by first dividing the background-subtracted intensities (prostate cancer/benign) and then taking the logarithm base 2. in the case that the median background intensity was greater than the median foreground intensity, the spot was considered missing.
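the spot-level normalization described for true et al.'s dataset can be sketched as follows. this is only an illustration under stated assumptions: the function name and argument names are hypothetical, the inputs are taken to be the median foreground and background intensities of the cancer and benign channels for a single spot, and a spot whose background equals its foreground is also treated as missing (the text only states the case where the background is greater).

```python
import math


def log2_ratio(cancer_fg, cancer_bg, benign_fg, benign_bg):
    """return log2(cancer / benign) for one spot, or None if the spot is missing.

    all four arguments are median intensities; the names are hypothetical.
    """
    cancer = cancer_fg - cancer_bg   # background-subtracted cancer signal
    benign = benign_fg - benign_bg   # background-subtracted benign signal
    # background above foreground means the spot is missing; equality is
    # treated the same way here (an assumption) because the log is undefined
    if cancer <= 0 or benign <= 0:
        return None
    return math.log2(cancer / benign)


print(log2_ratio(900, 100, 500, 100))   # cancer signal twice the benign one
print(log2_ratio(50, 100, 500, 100))    # background exceeds foreground
```

returning `None` keeps missing spots distinguishable from a genuine log ratio of zero, so a later imputation or filtering step can handle them explicitly.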
we refer to the original publication for the other aspects of imputation, spot quality and filtering, but, as in lapointe et al.'s study, they also filter to keep only informative probes (one of the selection criteria is that expression ratios of benign versus cancer should be at least 1.5-fold in at least half of one of the gleason groups).
shannon entropy. in many circumstances, experimental measurements are associated with the accumulation of individual results which, ultimately, qualitatively and quantitatively characterize our experimental observations. the presence (or absence) of a particular result of an individual experimental measure is called an event. an event which can take one of several possible values is called a random variable. analogously, a random event is an event that can either happen or fail to happen as a result of an experiment. an event is certain if it cannot fail to happen and it is said to be impossible if it can never happen. following andreyev [794], we will define the probability $p(x)$ of an event $x$ as the theoretical frequency of the event $x$ about which the actual frequency of occurrence of the event shows a tendency to fluctuate as the experiment is repeated many times. the shannon information content of an event $x$ (or the surprisal of an event $x$, [795]) is defined as
$$h(x) = \log_2 \frac{1}{p(x)}.$$
following mackay [796], an ensemble $X$ is a triple $(x, A_X, P_X)$, where $x$ is the value of a random variable, which takes on one of a set of possible values, $A_X = \{a_1, a_2, \ldots, a_i, \ldots, a_n\}$, having probabilities $P_X = \{p_1, p_2, \ldots, p_n\}$, with $p(x = a_i) = p_i$, $p_i \geq 0$ and $\sum_{a_i \in A_X} p(x = a_i) = 1$. the shannon entropy of an ensemble $X$ (also known as the uncertainty of $X$), denoted as $H[X]$, is defined to be the average shannon information content. it is the average expected surprisal for an infinitely long series of experiments.
we use the theoretical frequencies to compute this average, and then we have
$$H[X] = \sum_{a_i \in A_X} p(x = a_i) \log_2 \frac{1}{p(x = a_i)}.$$
suppose that we have a fair die: the theoretical frequency of the event 'the die shows a three' is 1/6 (if the die is assumed fair, the theoretical frequency is the same for any number from 1 to 6). in that case a hypothetical experimentalist guessing the outcome will have an average expected surprisal of $H[X] = \log_2 6$. we note the two natural bounds that the entropy can have. the shannon entropy of an ensemble $X$ is always greater than or equal to zero. it can only be zero if $p(x = a_i) = 1$ for exactly one of the $n$ elements of $A_X = \{a_1, a_2, \ldots, a_i, \ldots, a_n\}$. on the other hand, the shannon entropy is maximized in the case that $p(x = a_i) = 1/n$ for all $i$. this is the so-called ''equiprobable distribution'', a uniform probability distribution over the finite set.
transcriptional shannon entropy. let $f_i^{(j)}$ be the expression value of probe $i$ ($i = 1, \ldots, n$) on sample $j$ ($j = 1, \ldots, m$). for each sample $j$ we first normalize the expression values. we interpret them as the theoretical frequency of a single hybridization event. we then define a probability distribution function (pdf) over a finite set as
$$p_i^{(j)} = \frac{f_i^{(j)}}{\sum_{k=1}^{n} f_k^{(j)}}.$$
the uniform (equiprobable) distribution $P_e$ is defined as $p_i = 1/n$ for all $i$. let $H_e = H[P_e] = \log_2 n$ (the shannon entropy of the uniform distribution, computed as defined above); then in this paper we always use the normalized shannon entropy, defined as
$$H[P^{(j)}] = -\frac{\sum_{i=1}^{n} p_i^{(j)} \log_2 p_i^{(j)}}{H_e}, \quad j = 1, \ldots, m.$$
the jensen-shannon divergence and the statistical complexity measures. given a probability distribution function over a discrete finite set, it is then straightforward to calculate its normalized shannon entropy if we have the theoretical frequencies. several measures of ''complexity'' of a probability distribution function have been proposed. in this work we have used statistical complexity measures. all the complexity measures used in this work are the product of a normalized shannon entropy of the probability distribution function and a divergence measure to a reference probability distribution function.
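the normalized shannon entropy of a single sample, as defined above, can be sketched in a few lines. this is a minimal illustration rather than the authors' implementation; the function names are hypothetical, expression values are assumed to be non-negative with a positive sum, and a sample is assumed to have at least two probes. terms with zero probability are skipped, following the usual convention that they contribute nothing to the entropy.

```python
import math


def shannon_entropy(probs):
    """shannon entropy in bits; terms with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_shannon_entropy(expression):
    """normalized shannon entropy of one sample's expression values.

    assumes non-negative values, a positive sum and at least two probes.
    """
    total = sum(expression)
    probs = [f / total for f in expression]   # p_i = f_i / sum_k f_k
    return shannon_entropy(probs) / math.log2(len(expression))


fair_die = [1 / 6] * 6
print(shannon_entropy(fair_die))                      # the fair die example
print(normalized_shannon_entropy([3.0, 3.0, 3.0]))    # equiprobable profile
print(normalized_shannon_entropy([5.0, 0.0, 0.0]))    # one expressed probe
```

the last two calls exercise the two bounds discussed in the text: the equiprobable profile reaches the maximum, and a profile concentrated on a single probe reaches the minimum.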
we follow earlier proposals by lópez-ruiz, mancini and calbet, who first introduced a statistical complexity measure based on such a product in [797]. the lmc-statistical complexity is the product of the normalized shannon entropy, $H[P]$, times the disequilibrium, $Q[P]$, the latter given by the euclidean distance from $P$ to $P_e$, the uniform probability distribution over the ensemble. in this paper we used a later modification, which we refer to as the mpr-statistical complexity [43], which replaces the euclidean distance between $P$ and $P_e$ by the jensen-shannon divergence [788, 798]. the jensen-shannon divergence is linked in physics to the thermodynamic length [799, 800, 801, 802]. we define the mpr-statistical complexity [790] as
$$C^{(MPR)}[P^{(j)}] = Q[P^{(j)}, P_e] \cdot H[P^{(j)}],$$
where $Q[P^{(j)}, P_e] = Q_0 \cdot JS[P^{(j)}, P_e]$, $Q_0$ is a normalization factor, and $JS[P^{(1)}, P^{(2)}]$ is the jensen-shannon divergence between two probability density functions $P^{(1)}$ and $P^{(2)}$, which in turn is defined as
$$JS[P^{(1)}, P^{(2)}] = H\left[\frac{P^{(1)} + P^{(2)}}{2}\right] - \frac{H[P^{(1)}]}{2} - \frac{H[P^{(2)}]}{2}.$$
in this work, in many cases we compute the jensen-shannon divergence of a probability distribution with a reference probability distribution which is not the uniform probability distribution over the ensemble. in general, it is the average over a subset of probability distribution functions which are considered to be either the ''initial'' or ''final'' states of interest. let $P_{ave}$ be such an average; then the m-statistical complexity of a probability distribution function $P^{(j)}$, given a $P_{ave}$ of reference, is given by
$$C^{(M)}[P^{(j)}] = H[P^{(j)}] \cdot JS[P^{(j)}, P_{ave}].$$
an illustrative example. in order to discuss a relatively simple example that can intuitively provide a grasp of the basic mathematical principles of information theory, we present a hypothetical ''gene expression'' dataset involving four samples, each with the expression of five unique probes corresponding to five genes (not necessarily different), as follows in table 3.
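the jensen-shannon divergence and the m-statistical complexity with respect to an average reference profile can be sketched as below. this is an illustration under assumptions that should be stated loudly: the function names are hypothetical, the profiles are made-up numbers rather than data from any of the datasets, and the entropies inside the divergence are taken as unnormalized shannon entropies in bits (the usual textbook form of the jensen-shannon divergence), which may differ by a constant factor from the convention used in the original computations.

```python
import math


def entropy_bits(p):
    """unnormalized shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(p, q):
    """entropy of the mixture minus the mean entropy of p and q."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy_bits(mixture) - entropy_bits(p) / 2 - entropy_bits(q) / 2


def average_profile(profiles):
    """probe-wise mean of several probability vectors (the reference p_ave)."""
    m = len(profiles)
    return [sum(column) / m for column in zip(*profiles)]


def m_complexity(p, p_ave):
    """normalized shannon entropy of p times its divergence from p_ave."""
    h = entropy_bits(p) / math.log2(len(p))
    return h * jensen_shannon(p, p_ave)


# hypothetical reference group and test profile (not data from the paper)
normals = [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2]]
p_ave = average_profile(normals)
tumour = [0.7, 0.1, 0.1, 0.1]
print(m_complexity(tumour, p_ave))
```

using the average of a reference group, instead of the uniform distribution, makes the divergence measure how far a sample has moved from that group, which is the role $P_{ave}$ plays in the text.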
one of the quantifiers that we use in this contribution describes a measure of order for a sample: the normalized shannon entropy, also known as the shannon evenness index [803]. this section focuses on this quantifier's use and importance (refer to the 'materials and methods' section to see how this measure is calculated). in sample 4 all probes have the same expression, therefore it has the highest achievable value of normalized shannon entropy (h = 1). the normalized shannon entropy values for samples 1 and 2 are the same (h = 0.82). sample 3, which tends to be less peaked and has the two most significantly expressed genes with the same value, has a higher value of normalized shannon entropy (h = 0.92) (see figure 21). this simple example shows that the normalized shannon entropy variations of the gene expression profile convey information about global transcriptomic changes; however, this measure alone is not enough to characterize the deviations from normal tissue profiles. for example, assume that sample 1 is the normal profile of a particular tissue type and that sample 3 is the profile of a cancer cell that originated from that tissue type; the variation of normalized shannon entropy can then be related to this malignant change. however, as sample 2 illustrates, normalized shannon entropy is not enough to let us measure the variation from a profile, and at least one other information theory quantifier is needed. we resort to statistical complexity quantifiers, which in turn use the jensen-shannon divergence [798] to provide this complementary dimension [800] (refer to the 'materials and methods' section for a mathematical definition of the jensen-shannon divergence). figure 21 shows how the jensen-shannon divergence helps us to evaluate the variation between profiles. samples 1 and 2, as perhaps intuitively expected, have the largest divergence between them; their jensen-shannon divergence is 0.286636 (js(1,2) = table 3 .
an example dataset to illustrate the principles of shannon entropy and the information theory quantifiers used in this work.
figure 21. ... table 3. sample 4 has the largest attainable value since the expression of all probes is the same. samples 1 and 2, which have the same set of expression values, although in different probes, have the same value of normalized shannon entropy. as a consequence, there is a need for another quantifier of gene expression to address the permutational indistinguishability of these two expression profiles. the jensen-shannon divergence provides a natural alternative (see table 4). doi:10.1371/journal.pone.0012262.g021
table 4. jensen-shannon divergence values using the example introduced in table 3.
let $H[P^{(j)}]$ be the normalized shannon entropy of a transcriptional sample profile; then the mpr-statistical complexity $C^{(MPR)}[P^{(j)}]$ is defined as being proportional to the product of the normalized shannon entropy times the jensen-shannon divergence of the profile with the equiprobable distribution (in the example above the equiprobable distribution is that of sample 4). then we have
$$C^{(MPR)}[P^{(j)}] = Q_0 \cdot JS[P^{(j)}, P_e] \cdot H[P^{(j)}],$$
where $Q_0$ is a normalization factor. once again, we refer to the 'materials and methods' section for the accompanying formal mathematical presentation. as a consequence, we can plot the mpr-statistical complexity of the samples of our example as a function of the normalized shannon entropy, as can be seen in figure 22.
annotated genes. a full list of gene references in this paper, along with their descriptions from ihop (http://www.ihop-net.org/unipub/ihop/), can be found in supplementary material reference file s5.
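returning to the illustrative example, the following sketch shows why the normalized shannon entropy alone cannot separate two profiles that are permutations of each other, and how the mpr-statistical complexity is obtained for each sample. the expression values below are hypothetical and are not those of table 3. the function names are also hypothetical. two further assumptions are made: the divergence uses unnormalized entropies in bits, and the normalization factor $Q_0$ is taken as the inverse of the largest divergence from the equiprobable distribution, which is attained by a profile concentrated on a single probe.

```python
import math


def entropy_bits(p):
    """unnormalized shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalize(expression):
    """turn non-negative expression values into a probability vector."""
    total = sum(expression)
    return [f / total for f in expression]


def normalized_entropy(p):
    return entropy_bits(p) / math.log2(len(p))


def jensen_shannon(p, q):
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy_bits(mixture) - entropy_bits(p) / 2 - entropy_bits(q) / 2


def mpr_complexity(p):
    """q0 * js(p, uniform) * normalized entropy of p."""
    n = len(p)
    uniform = [1 / n] * n
    delta = [1.0] + [0.0] * (n - 1)            # all mass on one probe
    q0 = 1 / jensen_shannon(delta, uniform)     # assumed normalization factor
    return q0 * jensen_shannon(p, uniform) * normalized_entropy(p)


# hypothetical five-probe samples (not the values of table 3)
a = normalize([8, 4, 2, 1, 1])
b = normalize([1, 1, 2, 4, 8])      # same values, assigned to different probes
flat = normalize([1, 1, 1, 1, 1])   # equiprobable profile

print(normalized_entropy(a), normalized_entropy(b))   # indistinguishable
print(jensen_shannon(a, b))                           # clearly separated
print(mpr_complexity(a), mpr_complexity(flat))
```

with this choice of $Q_0$ the complexity lies between 0 and 1 and vanishes at both extremes of the entropy axis, for the equiprobable profile and for a profile concentrated on one probe.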
prostate cancer comparison of annexin ii, p63 and alpha-methylacyl-coa racemase immunoreactivity in prostatic tissue: a tissue microarray study ancillary alpha-methylacyl-coa racemase immunocytochemistry in the diagnosis of adenocarcinoma of the prostate in urinary cytology: a case report alpha-methylacyl-coa racemase (p504s)/34betae12/p63 triple cocktail stain in prostatic adenocarcinoma after hormonal therapy differences between latent and clinical prostate carcinomas: lower cell proliferation activity in latent cases variation of alpha-methylacyl-coa racemase expression in prostate adenocarcinoma cases receiving hormonal therapy phytanic acid, amacr and prostate cancer risk transcriptome analysis of human colon caco-2 cells exposed to sulforaphane expression of alpha-methylacyl-coa racemase in papillary renal cell carcinoma alpha-methylacyl-coa racemase expression is upregulated in gastric adenocarcinoma: a study of 249 cases sensitivity of p504s/alpha-methylacyl-coa racemase (amacr) immunohistochemistry for the detection of prostate carcinoma on stored needle biopsies diagnostic utility of immunohistochemistry in morphologically difficult prostate cancer: review of current literature the prostate-specific g-protein coupled receptors psgr and psgr2 are prostate cancer biomarkers that are complementary to alpha-methylacyl-coa racemase partial atrophy on prostate needle biopsy cores: a morphologic and immunohistochemical study abundant expression of amacr in many distinct tumour types peroxisomal disorders affecting phytanic acid alphaoxidation: a review alpha-methylacyl-coa racemase protein expression is associated with the degree of differentiation in breast cancer using quantitative image analysis analysis of alpha-methylacyl-coa racemase (p504s) expression in high-grade prostatic intraepithelial neoplasia serum levels of phytanic acid are associated with prostate cancer risk detection of alpha-methylacyl-coenzyme a racemase in postradiation prostatic 
adenocarcinoma tumor suppressor activity of glucocorticoid receptor in the prostate detection of alpha-methylacyl-coenzyme-a racemase transcripts in blood and urine samples of prostate cancer patients alphamethylacyl-coa racemase as an androgen-independent growth modifier in prostate cancer peroxisomal branched chain fatty acid beta-oxidation pathway is upregulated in prostate cancer a nonclassic ccaat enhancer element binding protein binding site contributes to alpha-methylacyl-coa racemase expression in prostate cancer sequence variants of alpha-methylacyl-coa racemase are associated with prostate cancer risk how often does alphamethylacyl-coa-racemase contribute to resolving an atypical diagnosis on prostate needle biopsy beyond that provided by basal cell markers? alpha-methylacyl-coa racemase: a novel tumor marker over-expressed in several human cancers and their precursor lesions expression and diagnostic utility of alphamethylacyl-coa-racemase (p504s) in foamy gland and pseudohyperplastic prostate cancer a novel diagnostic test for prostate cancer emerges from the determination of alphamethylacyl-coenzyme a racemase in prostatic secretions deletion hotspots in amacr promoter cpg island are cis-regulatory elements controlling the gene expression in the colon gene expression profiles in the pc-3 human prostate cancer cells induced by nkx3.1 usefulness of cytokeratin 5/6 and amacr applied as double sequential immunostains for diagnostic assessment of problematic prostate specimens conversion of prostate cancer from hormone independency to dependency due to amacr inhibition: involvement of increased ar expression and decreased igf1 expression oct4a is expressed by a subpopulation of prostate neuroendocrine cells a duplex quantitative polymerase chain reaction assay based on quantification of alphamethylacyl-coa racemase transcripts and prostate cancer antigen 3 in urine sediments improved diagnostic accuracy for prostate cancer a continuous assay for 
alpha-methylacylcoenzyme a racemase using circular dichroism group iia phospholipase a as a prognostic marker in prostate cancer: relevance to clinicopathological variables and disease-specific mortality biomarkers for prostate cancer immunohistochemical algorithms in prostate diagnostics: what's new?]. pathologe alpha-methylacyl-coa racemase (amacr) in fine-needle aspiration specimens of prostate lesions biopsy tissue microarray study of ki-67 expression in untreated, localized prostate cancer managed by active surveillance urine markers in monitoring for prostate cancer expression of alpha-methylacyl-coa racemase (p504s) in sebaceous neoplasms expression of prostatic acid phosphatase (psap) in transurethral resection specimens of the prostate is predictive of histopathologic tumor stage in subsequent radical prostatectomies prostate carcinogenesis induced by n-methyl-n-nitrosourea (mnu) in gerbils: histopathological diagnosis and potential invasiveness mediated by extracellular matrix components synthesis and use of isotope-labelled substrates for a mechanistic study on human alpha-methylacyl-coa racemase 1a (amacr; p504s) s-100a1 is a reliable marker in distinguishing nephrogenic adenoma from prostatic adenocarcinoma molecular cloning and preliminary analysis of the human alpha-methylacyl-coa racemase promoter autopsy evaluation of a prostate cancer case treated with brachytherapy loss of heterozygosity (loh), malignancy grade and clonality in microdissected prostate cancer allelic loss and microsatellite instability in prostate cancers in japan tumour suppressor gene mutations in benign prostatic hyperplasia and prostate cancer mutations in brca1 and brca2 and predisposition to prostate cancer brca1a has antitumor activity in tn breast, ovarian and prostate cancers coexpression of the mutated brca1 mrna and p53 mrna and its association in chinese prostate cancer exploiting the achilles heel of cancer: the therapeutic potential of poly(adp-ribose) polymerase 
inhibitors in brca2-defective cancer prostate cancer patients with brca2 mutation face poor survival brca1 mutations and prostate cancer in poland common variation in the brca1 gene and prostate cancer risk brca1 in special populations male brca1 and brca2 mutation carriers: a pilot study investigating medical characteristics of patients participating in a prostate cancer prevention clinic truncating brca1 mutations are uncommon in a cohort of hereditary prostate cancer families with evidence of linkage to 17q markers unravelling the genetics of prostate cancer prostate-cancer screening targets men with brca mutations brca1 and prostate cancer gene mutational rate, gleason score, and bk virus infection in prostate adenocarcinoma: is there a correlation tp53 mutation in prostate needle biopsies-comparison with patients follow-up shared tp53 gene mutation in morphologically and phenotypically distinct concurrent primary small cell neuroendocrine carcinoma and adenocarcinoma of the prostate unique substitution of chek2 and tp53 mutations implicated in primary prostate tumors and cancer cell lines molecular biology in prostate cancer a microdissection approach to detect molecular markers during progression of prostate cancer molecular biology of prostate cancer progression molecular heterogeneity in prostate cancer: can tp53 mutation unravel tumorigenesis abnormalities in primary prostate cancer: single-strand conformation polymorphism analysis of complementary dna in comparison with genomic dna. 
the cooperative prostate network status and prognosis of locally advanced prostatic adenocarcinoma: a study based on rtog 8610 localization of prostate cancer metastasis-suppressor activity on human chromosome 17 correlation of genetic and immunodetection of tp53 mutations in malignant and benign prostate tissues molecular markers for predicting prostate cancer stage and survival role in tumorigenesis of silent mutations in the tp53 gene chromosomal imbalances, loss of heterozygosity, and immunohistochemical expression of tp53, rb1, and pten in intraductal cancer, intraepithelial neoplasia, and invasive adenocarcinoma of the prostate increase of androgen-induced cell death and androgen receptor transactivation by brca1 in prostate cancer cells androgen regulation of the androgen receptor coregulators breast cancer susceptibility gene 1 (brcai) is a coactivator of the androgen receptor common mutations in brca1 and brca2 do not contribute to early prostate cancer in jewish men brca1 in hormonal carcinogenesis: basic and clinical research role of direct interaction in brca1 inhibition of estrogen receptor activity brca1 regulates gene expression for orderly mitotic progression the consequences of chromosomal aneuploidy on gene expression profiles in a cell line model for prostate carcinogenesis constitutive activation of jak-stat3 signaling by brca1 in human prostate cancer cells methylation analysis of brca1, rassf1, gstp1 and ephb2 promoters in prostate biopsies according to different degrees of malignancy psf and p54(nrb)/nono-multi-functional nuclear proteins the psf.p54nrb complex is a novel mnk substrate that binds the mrna for tumor necrosis factor alpha ptb-associated splicing factor regulates growth factor-stimulated gene expression in mammalian cells transcriptional activity of androgen receptor is modulated by two rna splicing factors, psf and p54nrb pspc1, nono, and sfpq are expressed in mouse sertoli cells and may function as coregulators of androgen 
receptor-mediated transcription acts as a transcriptional coactivator for activation function 1 of the human androgen receptor regulation of rna-polymerase-ii-dependent transcription by n-wasp and its nuclearbinding partners identification and characterization of the protein-associated splicing factor as a negative coregulator of the progesterone receptor inhibition of dendritic cell generation and function by serum from prostate cancer patients: correlation with serum-free psa impact of the tumor microenvironment on host infiltrating cells and the efficacy of flt3-ligand combination immunotherapy evaluated in a treatment model of mouse prostate cancer prostate tumor microenvironment alters immune cells and prevents long-term survival in an orthotopic mouse model following flt3-ligand/cd40-ligand immunotherapy the influence of alpha1-antagonist on the expression pattern of tnf receptor family in primary culture of prostate epithelial cells from bph patients adenovirus-mediated cd40 ligand therapy induces tumor cell apoptosis and systemic immunity in the tramp-c2 mouse prostate cancer model avoiding tolerance against prostatic antigens with subdominant peptide epitopes avoiding tolerance against prostatic antigens with subdominant peptide epitopes plasma tissue factor antigen in localized prostate cancer: distribution, clinical significance and correlation with haemostatic activation markers enhanced activation of human dendritic cells by inducible cd40 and toll-like receptor-4 ligation mature dendritic cells induce tumor-specific type 1 regulatory t cells cd40 is not detected on human prostate cancer cells by immunohistologic techniques flt3 ligand expands dendritic cell numbers in normal and malignant murine prostate functional dichotomy in cd40 reciprocally regulates effector t cell functions dendritic cell gene therapy cd40 expression in prostate cancer: a potential diagnostic and therapeutic molecule a population of hla-dr+ immature cells accumulates in the 
blood dendritic cell compartment of patients with different types of cancer cytokine-mediated protection of human dendritic cells from prostate cancerinduced apoptosis is regulated by the bcl-2 family of proteins fas-mediated apoptosis in human prostatic carcinoma cell lines increased function and survival of il-15-transduced human dendritic cells are mediated by up-regulation of il-15ralpha and bcl-2 expression and function of the complement membrane attack complex inhibitor protectin (cd59) in human prostate cancer increased cd59 protein expression predicts a psa relapse in patients after radical prostatectomy differential expression of complement regulatory proteins decay-accelerating factor (cd55), membrane cofactor protein (cd46) and cd59 during human spermatogenesis flow cytometric technique for determination of prostasomal quantity, size and expression of cd10, cd13, cd26 and cd59 in human seminal plasma transfer of prostasomal cd59 to cd59-deficient red blood cells results in protection against complement-mediated hemolysis identification of extracellular delta-catenin accumulation for prostate cancer detection possible immunoprotective and angiogenesis-promoting roles for malignant cell-derived prostasomes: a new paradigm for prostatic cancer? complement resistance of human carcinoma cells depends on membrane regulatory proteins, protein kinases and sialic acid the nuclear lamina. 
both a structural framework and a platform for genome organization the nuclear envelope, a key structure in cellular integrity and gene expression mouse models of the laminopathies laminopathies'': a wide spectrum of human diseases phenomics and lamins: from disease to therapy adult stem cell maintenance and tissue regeneration in the ageing context: the role for a-type lamins as intrinsic modulators of ageing in adult stem cells and their niches nuclear lamin a/c deficiency induces defects in cell mechanics, polarization, and migration lamin a/c deficiency causes defective nuclear mechanics and mechanotransduction decreased mechanical stiffness in lmna2/2 cells is caused by defective nucleo-cytoskeletal integrity: implications for the development of laminopathies stabilization of the retinoblastoma protein by a-type nuclear lamins is required for ink4a-mediated cell cycle arrest a-type lamins regulate retinoblastoma protein function by promoting subnuclear localization and preventing proteasomal degradation nuclear lamin a inhibits adipocyte differentiation: implications for dunnigan-type familial partial lipodystrophy a-type lamins: guardians of the soma? 
downregulation of human x-box binding protein 1 (hxbp-1) expression correlates with tumor progression in human prostate cancers the proapoptotic kinase mst1 and its caspase cleavage products are direct inhibitors of akt1 gene network and canonical pathway analysis in prostate cancer: a microarray study biomarker expression patterns that correlate with high grade features in treatment naive, organ-confined prostate cancer nf-kappab2 processing and p52 nuclear accumulation after androgenic stimulation of lncap prostate cancer cells diverse effects of zinc on nf-kappab and ap-1 transcription factors: implications for prostate cancer progression nuclear factor-kappab nuclear localization is predictive of biochemical recurrence in patients with positive margin prostate cancer id-1 expression promotes cell survival through activation of nf-kappab signalling pathway in prostate cancer cells suppression of hormone-refractory prostate cancer by a novel nuclear factor kappab inhibitor in nude mice mechanisms of constitutive nf-kappab activation in human prostate cancer cells bcl-2 suppresses apoptosis resulting from disruption of the nf-kappa b survival pathway gene expression profiling of human prostate cancer stem cells reveals a proinflammatory phenotype and the importance of extracellular matrix interactions molecular imaging of nf-kappab in prostate tissue after systemic administration of il-1 beta tnf/il-1/nik/nf-kappa b transduction pathway: a comparative study in normal and pathological human prostate (benign hyperplasia and carcinoma) proteasome inhibitors induce apoptosis of prostate cancer cells by inducing nuclear translocation of ikappabalpha targeting the receptor activator of nuclear factor-kappab (rank) ligand in prostate cancer bone metastases pomegranate extract inhibits androgen-independent prostate cancer growth through a nuclear factor-kappab-dependent mechanism the nuclear factor-kappab pathway controls the progression of prostate cancer to 
androgenindependent growth a new prostate cancer therapeutic approach: combination of androgen ablation with cox-2 inhibitor inhibitory effect of snake venom toxin from vipera lebetina turanica on hormone-refractory human prostate cancer cell growth: induction of apoptosis through inactivation of nuclear factor kappab prostate cancer chemoprevention by silibinin: bench to bedside genistein inhibits radiation-induced activation of nf-kappab in prostate cancer cells promoting apoptosis and g2/m cell cycle arrest nf-kappab inhibition increases chemosensitivity to trichostatin a-induced cell death of ki-ras-transformed human prostate epithelial cells involvement of the tnf-alpha autocrine-paracrine loop, via nf-kappab and yy1, in the regulation of tumor cell resistance to fas-induced apoptosis estrogens and antiestrogens as etiological factors and therapeutics for prostate cancer nf-kappab activation upregulates fibroblast growth factor 8 expression in prostate cancer cells anticancer potential of silymarin: from bench to bed side blockage of nf-kappab induces serine 15 phosphorylation of mutant p53 by jnk kinase in prostate cancer cells skp2 enhances polyubiquitination and degradation of tis21/btg2/pc3, tumor suppressor protein, at the downstream of foxm1 triiodothyronine modulates cell proliferation of human prostatic carcinoma cells by downregulation of the b-cell translocation gene 2 b cell translocation gene 2 enhances susceptibility of hela cells to doxorubicin-induced oxidative damage tis21 (/btg2/pc3) as a link between ageing and cancer: cell cycle regulator and endogenous cell death molecule expression of b-cell translocation gene 2 protein in normal human tissues antiproliferative b cell translocation gene 2 protein is down-regulated posttranscriptionally as an early event in prostate carcinogenesis identification of genes associated with stromal hyperplasia and glandular atrophy of the prostate by mrna differential display role of connective tissue growth 
factor in fibronectin synthesis in cultured human prostate stromal cells stromal expression of connective tissue growth factor promotes angiogenesis and prostate cancer tumorigenesis analysis of gene expression during staurosporine-induced neuronal differentiation of human prostate cancer cells profiling molecular targets of tgf-beta1 in prostate fibroblast-to-myofibroblast transdifferentiation expression of androgen receptor coregulatory proteins in prostate cancer and stromal-cell culture models fhl2, a novel tissue-specific coactivator of the androgen receptor four and a half lim domain 2 alters the impact of aryl hydrocarbon receptor on androgen receptor transcriptional activity differently regulated androgen receptor transcriptional complex in prostate cancer compared with normal prostate suppression of foxo1 activity by fhl2 through sirt1-mediated deacetylation the transcriptional coactivator fhl2 transmits rho signals from the cell membrane into the nucleus functional epigenomics identifies genes frequently silenced in prostate cancer expression level and dna methylation status of glutathione-s-transferase genes in normal murine prostate and tramp tumors function of junb in transient amplifying cell senescence and progression of human prostate cancer kai1 promoter activity is dependent on p53, junb and ap2: evidence for a possible mechanism underlying loss of kai1 expression in cancer cells inhibition of prostate tumor growth by overexpression of nudc, a microtubule motorassociated protein control of androgen receptor signaling in prostate cancer by the cochaperone small glutamine rich tetratricopeptide repeat containing protein alpha signal transducer and activator of transcription-6 (stat6) is a constitutively expressed survival factor in human prostate cancer robust prostate cancer marker genes emerge from direct integration of inter-study microarray data molecular features of the transition from prostatic intraepithelial neoplasia (pin) to prostate 
cancer: genome-wide gene-expression profiles of prostate cancers and pins analysis of integrin alpha7 mutations in prostate cancer, liver cancer, glioblastoma multiforme, and leiomyosarcoma the biology of cancer the chromosomal basis of cancer gene panel model predictive of outcome in men at high-risk of systemic progression and death from prostate cancer after radical retropubic prostatectomy aneuploidy and rapid cell proliferation in recurrent prostate cancers with androgen receptor gene amplification prognostic value of dna analysis of prostate adenocarcinoma: correlation to clinicopathologic predictors heterogeneity in prostate cancer: prostate specific antigen (psa) and dna cytophotometry identification of patients with low-risk for aneuploidy: comparative discriminatory models using linear and machinelearning classifiers in prostate cancer prognostic factors in prostate cancer stage b prostate cancer: correlation of dna ploidy analysis with histological and clinical parameters ets gene fusions in prostate cancer unusually stable abnormal karyotype in a highly aggressive melanoma negative for telomerase activity repetitive dna alterations in human skin cancers emerging insights into the molecular pathogenesis of uveal melanoma integrative genomic analysis of aneuploidy in uveal melanoma decreased dna ploidy may constitute a mechanism of the reduced malignant behavior of b16 melanoma in aged mice high frequency of tetraploidy detected in malignant melanoma of japanese patients by fluorescence in situ hybridization analysis of the dna content in the progression of recurrent and metastatic melanomas evaluation of dna ploidy and degree of dna abnormality in benign and malignant melanocytic lesions of the skin using video imaging association of genomic imbalances with resistance to therapeutic drugs in human melanoma cell lines highgrade prostate intraepithelial neoplasia shares cytogenetic alterations with invasive prostate cancer persistent exposure to mycoplasma 
induces malignant transformation of human prostate cells profiling cytogenetic diversity with entropy-based karyotypic analysis a farewell to entropy: statistical thermodynamics based on information world can entropy and ''order'' increase together ? entropy measures quantify global splicing disorders in cancer computing the optimal protocol for finite-time processes in stochastic thermodynamics optimal finite-time endoreversible processes optimal process paths for endoreversible systems altered mode of allelic replication accompanied by aneuploidy in peripheral blood lymphocytes of prostate cancer patients a thermodynamic interpretation of malignancy: do the genes come later? the storage of energy as a cause of malignant transformation: a 7-phase model of carcinogenesis biology of cancer: thermodynamic answers to some questions cervical cancer as a natural phenomenon information of genome sequences and molecular basis of cancer maxwell's demon and smoluchowski's trap door maxwell's demon assisted thermodynamic cycle in superconducting quantum circuits ponderomotive ratchet in a uniform magnetic field protein folding and evolution are driven by the maxwell demon activity of proteins current drive in a ponderomotive potential with sign reversal global and local) fluctuations of phase space contraction in deterministic stationary nonequilibrium the szilard engine revisited: entropy, macroscopic randomness, and symmetry breaking phase transitions hydrodynamic maxwell demon in granular systems evolution of biological complexity adaptation and information in ontogenesis and phylogenesis. 
increase of complexity and efficiency use of glucose 6-phosphate and hexokinase as an atp regenerating system by the ca(2+)-atpase of sarcoplasmic reticulum and submitochondrial particles maxwell's demon anderson pw comment on ''quantitative limits on the ability of a maxwell demon to extract work from heat quantitative limits on the ability of a maxwell demon to extract work from heat folding and misfolding mechanisms of the p53 dna binding domain at physiological temperature kinetic partitioning during folding of the p53 dna binding domain p53: guardian of the genome and policeman of the oncogenes in mitochondria enhances the accuracy of dna synthesis structural biology of the p53 tumour suppressor temperature sensitivity for conformation is an intrinsic property of wild-type p53 adaptive response of the skin to uvb damage: role of the p53 protein tumorigenic conversion of immortal human skin keratinocytes (hacat) by elevated temperature the story of the discovery of aquaporins: convergent evolution of ideas-but who got there first? 
aquaporin water channels: molecular mechanisms for human diseases the aquaporin family of water channels in kidney aquaporin chip: the archetypal molecular water channel abh and colton blood group antigens on aquaporin-1, the human red cell water channel protein water channel proteins: from their discovery in 1985 in cluj-napoca, romania, to the 2003 nobel prize in chemistry structure-function relationships in aquaporins a brief survey of aquaporins and their implications for renal physiology the structural basis of water permeation and proton exclusion in aquaporins concerted action of two cation filters in the aquaporin water channel knock-out models reveal new aquaporin functions involvement of aquaporin-5 in differentiation of human gastric cancer cells aquaporins-new players in cancer biology role of human aquaporin 5 in colorectal carcinogenesis prevention of skin tumorigenesis and impairment of epidermal cell proliferation by targeted aquaporin-3 gene disruption aquaporin-3 facilitates epidermal cell migration and proliferation during wound healing roles of aquaporin-3 in the epidermis expression profile of early lung adenocarcinoma: identification of mrp3 as a molecular marker for early progression expression of aquaporin 3 in the human prostate expression of aquaporin 3 (aqp3) in normal and neoplastic lung tissues role of aquaporin-7 and aquaporin-9 in glycerol metabolism genetic conservation of the gil blood group determining aquaporin 3 gene in african and caucasian populations aquaporin 9 is the major pathway for glycerol uptake by mouse erythrocytes, with implications for malarial virulence ph regulated anion permeability of aquaporin-6 temporal changes in expression of aquaporin3, -4, -5 and -8 in rat brains after permanent focal cerebral ischemia expression of renal aquaporins is down-regulated in children with congenital hydronephrosis aquaporins as potential drug targets for meniere's disease and its related diseases in situ fluorescence 
the authors would like to thank three research associates of our centre (osvaldo rosso, carlos riveros and john marsden) for discussions on
this topic. we thank the first two, osvaldo in particular, for their collaboration on data cleaning and the computation of the entropy and jensen-shannon divergences. we thank marsden for his advice on editing the final draft. pm would also like to acknowledge two stimulating discussions (decades apart) with dr. carlos reigosa and prof. elizabeth blackburn, as well as the invisible hand of prof. yaser abu-mostafa.
key: cord-000492-ec5qzurk authors: devaney, james; contreras, maya; laffey, john g title: clinical review: gene-based therapies for ali/ards: where are we now? date: 2011-06-20 journal: crit care doi: 10.1186/cc10216 sha: doc_id: 492 cord_uid: ec5qzurk acute lung injury (ali) and acute respiratory distress syndrome (ards) confer substantial morbidity and mortality, and have no specific therapy. the accessibility of the distal lung epithelium via the airway route, and the relatively transient nature of ali/ards, suggest that the disease may be amenable to gene-based therapies. ongoing advances in our understanding of the pathophysiology of ali/ards have revealed multiple therapeutic targets for gene-based approaches. strategies to enhance or restore lung epithelial and/or endothelial cell function, to strengthen lung defense mechanisms against injury, to speed clearance of infection and to enhance the repair process following ali/ards have all demonstrated promise in preclinical models. despite three decades of gene therapy research, however, the clinical potential for gene-based approaches to lung diseases including ali/ards remains to be realized. multiple barriers to effective pulmonary gene therapy exist, including the pulmonary architecture, pulmonary defense mechanisms against inhaled particles, the immunogenicity of viral vectors and the poor transfection efficiency of nonviral delivery methods. deficits remain in our knowledge regarding the optimal molecular targets for gene-based approaches.
encouragingly, recent progress in overcoming these barriers offers hope for the successful translation of gene-based approaches for ali/ards to the clinical setting. gene-based therapy involves the insertion of genes or smaller nucleic acid sequences into cells and tissues to replace the function of a defective gene, or to alter the production of a specific gene product, in order to treat a disease. gene therapy can be classified into germline and somatic gene therapies. germline approaches modify the sperm or egg prior to fertilization and confer a stable heritable genetic modification. somatic gene approaches use gene therapy to alter the function of mature cells. commonly used somatic gene therapy strategies include the overexpression of an existing gene and/or the insertion of smaller nucleic acid sequences into cells to alter the production of an existing gene. ali/ards may be suitable for gene-based therapies as it is an acute but relatively transient process [8], requiring short-lived gene expression, obviating the need for repeated therapies and reducing the risk of an adverse immunological response. the distal lung epithelium is selectively accessible via the tracheal route of administration, allowing targeting of the pulmonary epithelium [9]. the pulmonary vasculature is also relatively accessible, as the entire cardiac output must transit this circulation. antibodies that bind antigens selectively expressed on the pulmonary endothelial surface can be complexed to gene vectors to facilitate selective targeting following intravenous administration [10]. it is also possible to use gene-based strategies to target other cells central to the pathogenesis of ali/ards, such as leukocytes and fibroblasts [11]. furthermore, gene-therapy-based approaches offer the potential to selectively target different phases of the injury and repair process. the potential to target specific aspects of the injury and repair processes such as epithelial-mesenchymal transition, fibrosis, fibrinolysis, coagulopathy and oxidative stress with these approaches is also clear. gene therapy requires the delivery of genes or smaller nucleic acid sequences into the cell nucleus using a carrier or vector.
the vector enables the gene to overcome barriers to entry into the cell, and to make its way to the nucleus to be transcribed and translated itself or to modulate transcription and/or translation of other genes. both viral and nonviral vector systems have been developed (table 1). viral vectors are the most effective and efficient way of getting larger nucleic acid sequences, particularly genes, into cells (table 1). the viral genome is modified to remove the parts necessary for viral replication. this segment is then replaced with the gene of interest, termed a transgene, coupled to a promoter that drives its expression. the modified genome is then encapsulated with viral proteins. following delivery to the target site, the virus binds to the host cell, enters the cytoplasm and releases its payload into the nucleus (figure 1). the size of transgene that can be used depends on the capsid size. a number of different viral vectors have been used in preclinical lung injury studies to date. adenoviruses have double-stranded dna genomes, have demonstrated promise in preclinical models [12, 13] and are well tolerated at low to intermediate doses in humans [14, 15]. advantages include their ease of production, the high efficiency with which they can infect the pulmonary epithelium [14, 16] and that they can deliver relatively large transgenes. a disadvantage of adenoviruses is their immunogenicity, particularly in repeated doses [14]. newer adenoviral vectors, in which much of the immunogenicity has been removed, hold promise [17]. while adenovirus-mediated gene transfer in the absence of epithelial damage is relatively inefficient [18], this may be less of a problem in ali/ards, which is characterized by widespread epithelial damage. adeno-associated viruses (aavs) are single-stranded dna parvoviruses that are replication deficient [19]. a substantial proportion of the human population has been exposed to aavs but the clinical effects are unknown.
aav vectors have a good safety profile, and are less immunogenic compared with other viruses, although antibodies do develop against aav capsid proteins that can compromise repeat administration. aav vectors can insert genes at a specific site on chromosome 19. the packaging capacity of the virus is limited to 4.7 kb, restricting the size of the transgene that can be used. aavs are less efficient in transducing cells than adenoviral vectors. successful aav vector gene transfer has been demonstrated in multiple lung cell types including lung progenitor cells, in both normal and naphthalene-induced ali lungs [20]. aav serotypes have specific tissue tropisms, due to different capsid proteins that bind to specific cell membrane receptors. aav-5 [21] and aav-6 [22] exhibit enhanced tropism for the pulmonary epithelium. aavs can transduce nondividing cells and result in long-lived transgene expression. aav vectors have been used in clinical trials in cystic fibrosis patients, underlining their safety profile [23, 24]. lentivirus vectors are rna viruses that can transfect nondividing cells such as mature airway epithelial cells [25]. the virus stably but randomly integrates into the genome and expression is likely to last for the lifetime of the cell (~100 days). the transgene can be transmitted post mitosis, and there is also a risk of tumorigenesis if the transgene integrates near an oncogene. the development of leukemias in children following gene therapy for severe combined immunodeficiency highlights this risk [26, 27]. while lentiviral vectors may be useful to correct a gene deficiency associated with increased risk of ali, the long-lived gene expression of lentiviral delivered genes may be more suitable for chronic diseases than for ali/ards. nonviral delivery systems, while generally less efficient than viral vectors in transfecting the lung epithelium, are increasingly used to deliver smaller dna/rna molecules (table 1).
strategies include the use of dna-lipid and dna-polymer complexes and naked dna/rna oligonucleotides, such as sirna [28], decoy oligonucleotides [29] and plasmid dna [30]. nonviral delivery systems are less immunogenic than viral vector-based approaches, and can be generated in large amounts at relatively low cost. plasmid vectors are composed of closed circles of double-stranded dna. as naked and plasmid dna contain no proteins for attachment to cellular receptors, there is no specific targeting to different cell types and thus it is essential that the dna is placed in close contact with the desired cell type. these limitations make this approach less relevant clinically. in nonviral dna complexes, the therapeutic dna is held within a sphere of lipids, termed a lipoplex, or within a sphere of polymers, such as polyethyleneimine, termed a polyplex. lipoplexes and polyplexes act to protect the dna, facilitate binding to the target cell membrane and also trigger endocytosis of the complex into the cell, thereby enhancing gene expression. these systems can be modified to include a targeting peptide for a specific cell type, such as airway epithelial cells [31]. these complexes efficiently and safely transfect airway epithelial cells [31], and they have demonstrated promise in human studies [32].

table 1.
viral vector-delivered gene therapy
  adenovirus (dsdna genome)
    advantages: relatively easily produced; efficiently transfect lung epithelium [14, 16]; can deliver larger genes; well tolerated in lower doses [1, 3]
    disadvantages: immunogenic [14]
    examples: adenoviral transfer of genes for a surfactant enzyme [49], angiopoietin-1 [51], hsp-70 [52], apolipoprotein a-1 [53], and na+,k+-atpase pump [55] genes attenuate experimental ali; adenoviral delivery of il-10 gene attenuates zymosan ali at low doses, but is harmful at high doses [58]
  adeno-associated virus vectors (ssdna genome)
    advantages: good safety profile; less immunogenic; inherently replication deficient; aav-5 and aav-6 lung epithelial tropism [10, 11]; transduce nondividing cells; long-lived gene expression; used in clinical trials for cf [12, 13]
    disadvantages: limited transgene size; difficult to produce in large quantities
    examples: aav vector gene transfer demonstrated in multiple lung cell types including progenitor cells in both normal lungs and following naphthalene-induced ali [20]
  lentivirus vectors (rna genome)
    advantages: transduce nondividing cells [25]; integrate stably but randomly into the genome
    disadvantages: oncogenesis risk due to integration into genome [26, 27]
    examples: lentiviral transfer of shrna to silence cd36 gene expression suppresses silica-induced lung fibrosis in the rat [35]
nonviral gene-based strategies
  plasmid transfer (closed dsdna circles)
    advantages: easily produced at low cost
    disadvantages: no specific cell targeting; very inefficient
    examples: electroporation-mediated gene transfer of the na+,k+-atpase rescues endotoxin-induced lung injury [60]
  nonviral dna complexes (lipoplexes or polyplexes)
    advantages: complexes protect dna; complexes facilitate cellular targeting [31]
    disadvantages: less efficient than viral vectors
    examples: cationic lipid-mediated transfer of the na+,k+-atpase gene ameliorated high-permeability pulmonary edema [59]; lipoplex-delivered il-10 gene decreased clp-induced ali [61]; systemic cationic polyethylenimine polyplexes incorporating indoleamine-2,3-dioxygenase decreased ischemia-reperfusion ali [62]
  dna and rna oligonucleotides (sirna, shrna, decoy oligonucleotides)
    advantages: easily produced at low cost; smaller molecules that can easily enter cells; target regulation of specific genes
    disadvantages: no specific cell targeting
    examples: specific sirnas reduce inflammation-associated lung injury in humans [33] and in animal models [28, 34]; shrna-based approaches have reduced lung injury in animal models [35, 36]
cell-delivered gene therapy

sirnas are dsrna molecules of 20 to 25 nucleotides that can regulate the expression of specific genes. specific sirnas reduce inflammation-associated lung injury in humans [33] and in animal models [28, 34]. shrna is a single strand of rna that, when introduced into the cell, is reverse transcribed and integrated into the genome, becoming heritable. during subsequent transcription, the sequence generates an oligonucleotide with a tight hairpin turn that is processed into sirna. shrnas have reduced lung injury in animal models [35, 36]. decoy oligonucleotides are double-stranded dna molecules of 20 to 28 nucleotides, which bind to specific transcription factors to reduce expression of targeted genes, and have been successfully used in animal models [37, 38]. an alternative approach is to use systemically delivered cells to deliver genes to the lung. this approach has been used to enhance the therapeutic potential of stem cells such as mesenchymal stem/stromal cells, which demonstrate promise in preclinical ali/ards models [39]. fibroblasts have also been used to successfully deliver genes to the lung to attenuate ali [40]. preliminary data from a clinical trial in pulmonary hypertension show that endothelial progenitor cells overexpressing endothelial nitric oxide synthase (nos3) decrease pulmonary vascular resistance [41], highlighting the potential of cell-delivered gene therapy for ali/ards. nebulization of genetic material into the lung is effective [42], safe and well tolerated [32, 43, 44]. the integrity of aav vectors [9, 43] and adenoviral vectors [44] is maintained post nebulization, as is that of cationic lipid vectors [32] and dna and rna oligonucleotides [45].
a number of gene therapy clinical trials have utilized nebulization to deliver the transgene to the lung [23, 43], but without clear clinical benefit to date [43, 44]. intravascular delivery approaches target the lung endothelium. these approaches have been successfully used in preclinical studies of cell-based gene therapies [39, 40], and also with vectors that incorporate components such as antibodies to target antigens on the lung endothelium [10]. successful gene-based therapies require the delivery of high quantities of the gene or oligonucleotide to the pulmonary epithelial or endothelial surface, and efficient entry of these large and insoluble nucleic acids into the cytoplasm; the nucleic acids then have to move from the cytoplasm into the nucleus and activate transcription of their product. multiple barriers exist that hinder this process, not least the natural defense mechanisms of the lung, and additional difficulties that exist in transducing the acutely injured lung (table 2). limitations regarding delivery technologies and deficiencies in our knowledge regarding the optimal molecular targets also reduce the efficacy of these approaches. the lung has evolved effective barriers to prevent the uptake of any inhaled foreign particles [46]. while advantageous in minimizing the potential for uptake of external genetic material (for example, viral dna), these barriers make it more difficult to use gene-based therapies in the lung. barriers to entry of foreign genetic material into the lung include airway mucus and the epithelial lining fluid, which trap and clear inhaled material. the glycocalyceal barrier hinders contact with the cell membrane, while the tight intercellular epithelial junctions and limited luminal endocytosis further restrict entry of foreign material into the epithelial cells.
transducing the acutely injured lung may be difficult, due to the presence of pulmonary edema, consolidated or collapsed alveoli, and additional extracellular barriers such as mucus. gene-based therapies targeted at the pulmonary epithelium may be less effective where there is extensive denudation of the pulmonary epithelium, as may occur in primary ards. encouragingly, there is some evidence to suggest that ali may not substantially impair viral gene transfer to the alveolar epithelium [47]. the key limitation of nonviral vector approaches has been their lack of efficiency in mediating gene transfer and transgene expression in the airway epithelium. viral vectors are immunogenic, due to the protein coat of the viral vector, and the immune response is related to both vector dose and number of administrations. the potential to limit administration to a single dose in ali/ards may reduce this risk. however, the development of an inflammatory response resulting in death following administration of a first-generation adenoviral vector highlights the risks involved [48]. additional limitations of viral vectors include transgene size, which is limited by the size of the capsid that encloses the viral genes. the therapeutic potential of gene therapy for ali/ards is underlined by a growing body of literature demonstrating efficacy in relevant preclinical models. in considering the clinical implications of these studies, it is important to acknowledge that animal models of ards do not fully replicate the complex pathophysiological changes seen in the clinical setting. this is highlighted by the fact that many pharmacologic strategies demonstrating considerable promise in preclinical studies were later proven ineffective in clinical trials. nevertheless, these studies provide insights into the clinical potential of these strategies.
adenovirus-mediated transfer of a gene that enhances surfactant production improves lung function and confers resistance to pseudomonas aeruginosa infection (figure 2) [49]. adenovirus-delivered superoxide dismutase and catalase genes protected against hyperoxia-induced, but not ischemia-reperfusion-induced, lung injury [50]. more recent studies have demonstrated the therapeutic potential of overexpression of a number of genes, including angiopoietin-1 [51], hsp-70 [52], apolipoprotein a-1 [53], defensin β2 [54] and the na+,k+-atpase pump [55]. in contrast, overexpression of il-1β can directly cause ali [56], while overexpression of suppressor of cytokine signaling-3 worsens immune-complex-induced ali [57]. intriguingly, intratracheal administration of adenoviral vector incorporating il-10, prior to zymosan-induced lung injury, improved survival at a lower dose but was ineffective and even harmful at higher doses [58]. an early murine study demonstrated that cationic lipid-mediated transfer of the na+,k+-atpase gene ameliorated high-permeability pulmonary edema [59]. electroporation-assisted gene transfer of plasmids encoding na+,k+-atpase reverses endotoxin-induced lung injury [60]. the lipoplex-delivered il-10 gene decreased lung and systemic organ injury induced by cecal ligation and puncture in mice [61]. systemically administered cationic polyethylenimine polyplexes incorporating indoleamine-2,3-dioxygenase transduced pulmonary endothelial cells and decreased lung ischemia-reperfusion injury [62]. nf-κb decoy oligonucleotides, incorporated into viral vectors, attenuate systemic sepsis-induced lung injury when administered intravenously (figure 3) [37]. in animal models, both intratracheally [34, 63] and intravenously [29, 64] administered sirnas successfully silence their target genes.
shrna-based approaches have been used to suppress silica-induced lung fibrosis [35] and to ameliorate ischemia-reperfusion-induced lung injury [36]. more recently, aerosolization of sirna that targets respiratory syncytial virus replication was safe and potentially effective in lung transplant recipients with respiratory syncytial virus infection [33], clearly illustrating the therapeutic potential of these approaches for ali/ards. mei and colleagues enhanced the efficacy of mesenchymal stem/stromal cells in endotoxin-induced ali by transducing them to overexpress angiopoietin-1 (figure 4) [39]. mesenchymal stem/stromal cells overexpressing il-10 decreased alveolar infiltration of cd4 and cd8 t cells following lung ischemia-reperfusion injury [65]. bone marrow stem cells expressing keratinocyte growth factor attenuate bleomycin-induced lung injury [66]. nonstem cells can also be used to deliver genes to the injured lung [67]. fibroblasts overexpressing angiopoietin-1 attenuate endotoxin-induced lung injury [40], while fibroblasts overexpressing vascular endothelial growth factor and endothelial nitric oxide synthase can attenuate or even reverse endotoxin-induced ali [68]. advances in the identification of therapeutic targets, improvements in viral and nonviral vector technologies, and regulation of gene-based therapies by temporal and spatial targeting offer the potential to translate the therapeutic promise of gene-based therapies for ali/ards to the clinical setting (table 3). viral vectors remain the focus of intensive research to optimize their efficiency, to minimize their immunogenicity and to enhance their tissue specificity [19, 31, 69, 70]. strategies to develop less immunogenic vectors have focused on modifying the naturally occurring proteins in the viral coat [71].
much research has been devoted to searching for and characterizing both naturally occurring [71] and engineered capsid variants from mammalian species [72]. capsid protein modification has also been used to enhance tissue specificity [70]. envelope protein pseudotyping involves encapsulating the modified genome from one virus, such as simian immunodeficiency virus, with envelope proteins from another virus, such as vesicular stomatitis virus. this encapsulation can enhance the therapeutic potential of viral vectors, by combining the advantages of one viral genome (for example, bigger payload or site-specific integration) with the tissue tropism of another virus. strategies to enhance the effectiveness of the lipoplexes used to deliver plasmids and other dna/rna oligonucleotides involve manipulation of the lipoplex lipid content and the use of targeting peptides. the choice of lipid influences expression efficiency by enhancing release of the genetic material within the target cell [73, 74]. targeting peptides increase transfection efficiency by directing the lipid to a particular cell membrane or cell type [31]. physical methods of plasmid delivery such as electroporation [60] and ultrasound can enhance gene transfer by bringing the plasmid dna into closer proximity with the cell membrane and/or causing temporary disruption of the cell membrane. other physical methods can also be used to increase in vivo gene transfer, including pressurized vascular delivery, laser, magnetic fields and gene gun delivery. these systems enable plasmid-based gene delivery to reach efficiencies close to those achieved with viral vectors. successful gene therapy relies upon being able to target the injury site, and to control the duration and levels of gene expression. modifying the transgene dna to exclude nonmethylated cpg motifs, typical of bacterial dna, decreases the immune response and may increase transgene expression [75, 76].
high-efficiency tissue-specific promoters may improve the efficiency and specificity of transgene expression. lung-specific promoters include surfactant promoters [77] such as the surfactant protein c promoter [78], a ciliated cell-specific promoter foxj1 [79], the cytokeratin 18 promoter [80], and the clara cell 10-kda protein [78]. promoters can also be used to target a specific phase of illness, switching on when required to produce an effect at the optimal time point. a related approach is the development of promoters that allow for transfected genes to be turned on and off. currently, the tetracycline-dependent gene expression vector [81] is the most widely used regulated system as it has a good safety profile. tetracycline is rapidly metabolized and cleared from the body, making it an ideal drug to control gene expression. however, the potential for an activator such as tetracycline to modulate the lung injury should be borne in mind. new-generation transactivators, with no basal activity and increased sensitivity, have now been developed [82]. in an ards context, conditional regulation of gene expression by the combined use of a lung-specific promoter and the tetracycline-dependent gene expression system may be a useful approach [83].
capsid protein modification to reduce immunogenicity [71]
capsid protein modification to enhance tissue specificity [70]
envelope protein pseudotyping
manipulation of lipoplex lipid content to enhance cellular uptake [73, 74]
use of targeting peptides on lipoplexes and polyplexes [31]
strategies to enhance gene transfer; for example, electroporation, ultrasound, gene gun delivery
modifying transgene dna to eliminate bacterial motifs [75, 76]
development of high-efficiency tissue-specific promoters [77] [78] [79] [80]
development of promoters that regulate gene expression [83]
enhanced therapeutic targeting:
nebulization technologies [9]
strategies to target the pulmonary endothelium [10]
improved cellular uptake of vector:
surface active agents to enhance vector spread [84]
reduce ubiquitination of viral capsid proteins [85]
better therapeutic targets:
enhancement or restoration of lung epithelial and/or endothelial cell function [86]
strengthening lung defense mechanisms against injury [87]
speeding clearance of inflammation and infection
enhancement of the repair process following ali/ards [88]

an advantage of gene-based strategies is the ability to target specific cells within an organ; for example, the epithelial cells of the lung. novel nebulization technologies, which facilitate the delivery of large quantities of undamaged vector to the distal lung, demonstrate considerable promise in this regard [9]. alternative approaches to spatial targeting include targeting specific receptors that are plentiful on the target cell to increase transfection efficiency. an interesting development in this regard is the targeting of systemically administered therapies to the pulmonary endothelium using antibodies to proteins expressed preferentially on these cells (figure 5) [10].
in these studies, the antioxidant enzyme catalase was conjugated with antibodies to the adhesion molecule pecam, which is widely expressed on pulmonary endothelial cells, and to a nonspecific igg antibody. the anti-pecam/catalase conjugate, but not the igg/catalase conjugate, bound specifically to the pulmonary endothelium and attenuated hydrogen peroxide injury. specific strategies have been developed to maximize uptake of vector into alveolar epithelial cells. it is possible to enhance lung transgene expression with the use of surface-active agents such as perfluorocarbon, which enhances the spread of vector and mixing within the epithelial lining fluid [84]. agents that reduce ubiquitination of aav capsid proteins following endocytosis, such as tripeptide proteasome inhibitors, dramatically augment (>2,000-fold) aav vector transduction in airway epithelia [85]. ultimately, the success or failure of gene-based therapies for ali/ards is likely to rest on the identification of better gene targets. ongoing advances in our understanding of the pathophysiology of ali/ards continue to reveal novel therapeutic targets for gene-based approaches. promising potential approaches include strategies to enhance or restore lung epithelial and/or endothelial cell function [86], to strengthen lung defense mechanisms against injury [87], to speed clearance of inflammation and infection, and to enhance the repair process following ali/ards [88]. ali/ards may be a particularly suitable disease process for gene-based therapies (table 4). this is supported by increasing evidence from relevant preclinical ards models for the efficacy of gene-based therapies that enhance or restore lung epithelial and/or endothelial cell function, strengthen lung defense mechanisms against injury, speed resolution of inflammation and infection, and enhance the repair process following ali/ards.
despite this promising preclinical evidence, the potential for gene-based approaches to ali/ards in the clinical setting remains to be realized. multiple barriers exist to the successful use of gene-based therapies in the lung, which limit the efficacy of these approaches. future research approaches should focus on overcoming these barriers, by developing more effective and less immunogenic vector delivery systems, developing strategies to focus gene expression on specific injury zones of the lung for defined time periods, and identifying better molecular targets that can take advantage of these potentially very powerful therapeutic approaches.
abbreviations: aav, adeno-associated virus; ali, acute lung injury; ards, acute respiratory distress syndrome; il, interleukin; nf, nuclear factor; shrna, small hairpin rna; sirna, small interfering rna.
the authors declare that they have no competing interests.
epidemiology of acute lung injury
incidence and outcomes of acute lung injury
one-year outcomes in survivors of the acute respiratory distress syndrome
ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome.
the acute respiratory distress syndrome network
pulmonary-artery versus central venous catheter to guide treatment of acute lung injury
prone ventilation reduces mortality in patients with acute respiratory failure and severe hypoxemia: systematic review and meta-analysis
elbourne d: efficacy and economic assessment of conventional ventilatory support versus extracorporeal membrane oxygenation for severe adult respiratory failure (cesar): a multicentre randomised controlled trial
the acute respiratory distress syndrome
optimized aerosol delivery to a mechanically ventilated rodent
pecam-directed delivery of catalase to endothelium protects against pulmonary vascular oxidative stress
adenoviral augmentation of elafin protects the lung against acute injury mediated by activated neutrophils and bacterial infection
aerosol delivery of a β-galactosidase adenoviral vector to the lungs of rodents
adenovirus-mediated persistent cystic fibrosis transmembrane conductance regulator expression in mouse airway epithelium
airway epithelial cftr mrna expression in cystic fibrosis patients after repetitive administration of a recombinant adenovirus
analysis of risk factors for local delivery of low- and intermediate-dose adenovirus gene transfer vectors to individuals with a spectrum of comorbid conditions
modification of nasal epithelial potential differences of individuals with cystic fibrosis consequent to local administration of a normal cftr cdna adenovirus gene transfer vector
a phase i study of adenovirus-mediated transfer of the human cystic fibrosis transmembrane conductance regulator gene to a lung segment of individuals with cystic fibrosis
aerosol and lobar administration of a recombinant adenovirus to individuals with cystic fibrosis. i.
methods, safety, and clinical implications
recent developments in adeno-associated virus vector technology
analysis of adeno-associated virus progenitor cell transduction in mouse lung
adeno-associated virus type 5 (aav5) but not aav2 binds to the apical surfaces of airway epithelia and facilitates gene transfer
adeno-associated virus type 6 (aav6) vectors mediate efficient transduction of airway epithelial cells in mouse lungs compared to that of aav2 vectors
repeated adeno-associated virus serotype 2 aerosol-mediated cystic fibrosis transmembrane regulator gene transfer to the lungs of patients with cystic fibrosis: a multicenter, double-blind, placebo-controlled trial
safety and biological efficacy of an adeno-associated virus vector-cystic fibrosis transmembrane regulator (aav-cftr) in the cystic fibrosis maxillary sinus
lentivirus vectors pseudotyped with filoviral envelope glycoproteins transduce airway epithelia from the apical surface independently of folate receptor alpha
gene therapy of human severe combined immunodeficiency (scid)-x1 disease
cavazzana-calvo m: insertional oncogenesis in 4 patients after retrovirus-mediated gene therapy of scid-x1
rna interference for α-enac inhibits rat lung fluid absorption in vivo
effect of antisense oligonucleotides to nuclear factor-κb on the survival of lps-induced ards in mouse
electroporation-mediated transfer of plasmids to the lung results in reduced tlr9 signaling and inflammation
a receptor-targeted nanocomplex vector system optimized for respiratory gene transfer
cationic lipid-mediated cftr gene transfer to the lungs and nose of patients with cystic fibrosis: a double-blind placebo-controlled trial
rna interference therapy in lung transplant patients infected with respiratory syncytial virus
in vivo gene silencing (with sirna) of pulmonary expression of mip-2 versus kc results in divergent effects on hemorrhage-induced, neutrophil-mediated septic acute lung injury
silencing cd36 gene
expression results in the inhibition of latent-tgf-β1 activation and suppression of silica-induced lung fibrosis in the rat
prevention of lung ischemia-reperfusion injury by short hairpin rna-mediated caspase-3 gene silencing
nuclear factor-κb decoy oligodeoxynucleotides prevent acute lung injury in mice with cecal ligation and puncture-induced sepsis
effects of intratracheal administration of nuclear factor-κb decoy oligodeoxynucleotides on long-term cigarette smoke-induced lung inflammation and pathology in mice
prevention of lps-induced acute lung injury in mice by mesenchymal stem cells overexpressing angiopoietin 1
cell-based angiopoietin-1 gene therapy for acute lung injury
stem cells and cell therapies in lung biology and lung diseases
calculating expected lung deposition of aerosolized administration of aav vector in human clinical studies
repeated aerosolized aav-cftr for treatment of cystic fibrosis: a randomized placebo-controlled phase 2b trial
aerosol and lobar administration of a recombinant adenovirus to individuals with cystic fibrosis. ii.
transfection efficiency in airway epithelium
inhibition of lung tumor growth by complex pulmonary delivery of drugs with oligonucleotides as suppressors of cellular resistance
gene transfer to the lung: lessons learned from more than 2 decades of cf gene therapy
acute lung injury does not impair adenoviral-mediated gene transfer to the alveolar epithelium
fatal systemic inflammatory response syndrome in a ornithine transcarbamylase deficient patient following adenoviral gene transfer
adenoviral gene transfer of a mutant surfactant enzyme ameliorates pseudomonas-induced lung injury
gene therapy for oxidant injury-related diseases: adenovirus-mediated transfer of superoxide dismutase and catalase cdnas protects against hyperoxia but not against ischemia-reperfusion lung injury
angiopoietin-1 increases survival and reduces the development of lung edema induced by endotoxin administration in a murine model of acute lung injury
enhanced expression of 70-kilodalton heat shock protein limits cell division in a sepsis-induced model of acute respiratory distress syndrome
human apoa-i overexpression diminishes lps-induced systemic inflammation and multiple organ damage in mice
protection against pseudomonas aeruginosa pneumonia and sepsis-induced lung injury by overexpression of β-defensin-2 in rats
overexpression of the na-k-atpase α2-subunit improves lung liquid clearance during ventilation-induced lung injury
interleukin-1β causes acute lung injury via αvβ5 and αvβ6 integrin-dependent mechanisms
adenoviral-mediated overexpression of socs3 enhances igg immune complex-induced acute lung injury
dose-dependent improvements in outcome with adenoviral expression of interleukin-10 in a murine model of multisystem organ failure
pretreatment with cationic lipid-mediated transfer of the na+ k+-atpase pump in a mouse model in vivo augments resolution of high permeability pulmonary oedema
electroporation-mediated gene transfer of the na+,k+-atpase rescues
endotoxin-induced lung injury
interleukin-10 gene transfer: prevention of multiple organ injury in a murine cecal ligation and puncture model of sepsis
nonviral gene delivery with indoleamine 2,3-dioxygenase targeting pulmonary endothelium protects against ischemia-reperfusion injury
silencing of fas, but not caspase-8, in lung epithelial cells ameliorates pulmonary apoptosis, inflammation, and neutrophil influx after hemorrhagic shock and sepsis
caveolin-1 sirna increases the pulmonary microvascular and alveolar epithelial permeability in rats
interleukin-10 delivery via mesenchymal stem cells: a novel gene therapy approach to prevent lung ischemia-reperfusion injury
bone marrow stem cells expressing keratinocyte growth factor via an inducible lentivirus protects against bleomycin-induced pulmonary fibrosis
cell-based gene transfer of vascular endothelial growth factor attenuates monocrotaline-induced pulmonary hypertension
microvascular regeneration in established pulmonary hypertension by angiogenic gene transfer
tetracycline-inducible transgene expression mediated by a single aav vector
efficient transfection of non-proliferating human airway epithelial cells with a synthetic vector system
tailoring the aav vector capsid for gene therapy
artificial evolution with adeno-associated viral libraries
analysis and optimization of the cationic lipid component of a lipid/peptide vector formulation for enhanced transfection in vitro and in vivo
stabilized integrin-targeting ternary lpd (lipopolyplex) vectors for gene delivery designed to disassemble within the target cell
cpg-free plasmids confer reduced inflammation and sustained pulmonary gene expression
toll-like receptor expression reveals cpg dna as a unique microbial stimulus for plasmacytoid dendritic cells which synergizes with cd40 ligand to induce high amounts of il-12
targeting type ii and clara cells for adenovirus-mediated gene transfer using the surfactant protein b promoter
development of
lentiviral vectors with regulated respiratory epithelial expression in vivo
expression of cftr from a ciliated cell-specific promoter is ineffective at correcting nasal potential difference in cf mice
a human epithelium-specific vector optimized in rat pneumocytes for lung gene therapy
tight control of gene expression in mammalian cells by tetracycline-responsive promoters
use of a new generation reverse tetracycline transactivator system for quantitative control of conditional gene expression in the murine lung
construction of an rtta2(s)-m2/tts(kid)-based transcription regulatory switch that displays no basal activity, good inducibility, and high responsiveness to doxycycline in mice and non-human primates
adenoviral vector transfection into the pulmonary epithelium after cecal ligation and puncture in rats
ubiquitination of both adeno-associated virus type 2 and 5 capsid proteins affects the transduction efficiency of recombinant vectors
gp130-stat3 regulates epithelial cell migration and is required for repair of the bronchiolar epithelium
spatial and temporal expression of surfactant proteins in hyperoxia-induced neonatal rat lung injury
intrapulmonary tnf gene therapy reverses sepsis-induced suppression of lung antibacterial host defense
clinical review: gene-based therapies for ali/ards: where are we now?
the present work was supported by funding from the health research board
key: cord-014597-66vd2mdu authors: nan title: abstracts from the 25th european society for animal cell technology meeting: cell technologies for innovative therapies: lausanne, switzerland. 14-17 may 2017 date: 2018-03-15 journal: bmc proc doi: 10.1186/s12919-018-0097-x sha: doc_id: 14597 cord_uid: 66vd2mdu nan
a schematic representation of crispr-based synthetic transcription factor technology. b mrna expression levels of protein transport related genes (napg, rab5a and arpc1b).
background
an increasing number of biologics are entering the development pipelines of pharmaceutical companies [1]. today, the preferred production host for therapeutic proteins is the cho cell line. however, one of the major hurdles, especially for the production of non-antibody glycoproteins, is host cell-related proteolytic degradation, which can drastically impact developability and timelines of pipeline projects.
material and methods
spike-in: cho cells were cultivated in a chemically defined culture medium at 36.5°c and 10% co2 in shake-flasks. when the cells reached their maximum viable density, they were removed by centrifugation and the conditioned medium was collected. a model mab was spiked into the conditioned medium and incubated at 37°c with or without protease inhibitors. the amount of proteolytic degradation was analysed by western blot and lc-ms.
transcriptomics: total rna was extracted after 3 days of cell cultivation. rna sequencing libraries were constructed and processed on the hiseq 2000 platform from illumina.
generation of matriptase knockout: cho-k1 cells were transfected with mrna encoding "transcription activator-like effector nucleases" or "zinc finger nucleases" targeting matriptase exon 2. the transfected cells were subsequently sorted into single cells and analysed for frameshift mutations in both alleles via sanger sequencing.
cell cultivation: fed-batch cultivation was performed in 15-ml miniaturized bioreactors (ambr15).
approximately 700 proteases are known in rodents. to reduce the number of candidate proteases, we first showed that incubating a model mab (prone to proteolytic degradation) in conditioned medium of cho-k1 cells resulted in clipping of the mab, demonstrating the involvement of secreted/shedded proteases (fig. 1a). broad spectrum inhibitors of the different protease classes revealed that only serine protease inhibitors prevented clipping.
serine protease inhibitors of higher specificity highlighted the group of "s1a trypsin-like proteases" (fig. 1a). comparison of the proteolytic degradation profile of several therapeutic proteins between cho-k1 and another cho cell line (cho-a) revealed less degradation in cho-a. therefore expression of the involved protease(s) is likely lower in cho-a. gene expression profile analysis of both cell lines showed five secreted/shedded "s1a trypsin-like serine proteases" expressed more than 1.5-fold lower in cho-a versus cho-k1 (fig. 1b). surprisingly, sirna knockdown experiments of these five candidates identified "matriptase" as the major protease involved in degradation of recombinant proteins expressed in cho-k1 cells (fig. 1c upper panel). next, we generated a cho-k1 matriptase knockout (ko) cell line. no proteolytic degradation product was detected when the model mab was spiked into conditioned medium of the ko cell line (fig. 1c lower panel). also, stable expression of the model mab in the ko cell line resulted in no or significantly less clipping (fig. 1e). the protein titer and the cell growth behaviour of the matriptase ko cells were similar to the corresponding wildtype (wt) cells (fig. 1d) as shown by comparative cultivation in the ambr system.
conclusions
one major challenge for the production of recombinant proteins is cho host cell mediated proteolytic degradation, which can negatively impact or even result in termination of projects [2; 3]. using a variety of techniques such as applying protease inhibitors, transcriptomics and sirna mediated knock-down, we were able to identify "matriptase" as the major protease involved in degradation of recombinant proteins expressed in cho-k1 cells. subsequently we generated a matriptase deficient cho cell line. protein candidates of diverse formats, severely degraded in the wt cho-k1 cell line, were not cleaved or were significantly less cleaved in the matriptase ko cell line.
furthermore, cell growth, viability and productivity levels were comparable between the wt and the matriptase ko cell line. in summary, we have generated a superior platform-compatible cho production host cell line with the same favourable productivity properties as the parental host cell line [4; 6], allowing expression of complex glycoproteins prone to clipping.
background
cho cell lines are common hosts for the production of biopharmaceutical proteins. so far, considerable progress has been made in increasing the productivity of cell culture to meet the rapidly growing demand for antibody biopharmaceuticals through increased cell densities and longer culture times. the downside is the increase in process related impurities, bringing new challenges for process and harvest development. among the process related impurities such as host cell proteins (hcps) or dna, the potential impact of lipid production and release during cell culture is still poorly understood due to the complex nature and diversity of this class of molecules. thanks to recent advances in analytical tools, especially mass spectrometry, lipidomics now makes it feasible to study several thousand lipid species, opening up the possibility of understanding and potentially controlling the interactions between high performance bioreactor processes, harvest conditions and purification. in order to analyze and quantify lipids, we developed a three-step method. in a first step, lipids were extracted with methyl tert-butyl ether (mtbe) according to the matyash method [1]. lipids were then separated by liquid chromatography using either a hilic or a reverse-phase column prior to detection and quantification by mass spectrometry. all lipid classes were detected by esi-ms/ms except cholesterol (apci-ms/ms). finally, we applied this method to analyze the lipid content of different cell lines, each expressing a different recombinant protein, during a 14-day fed-batch process.
lipids from cho cells were successfully extracted with a yield between 80% and 95% depending on the lipid class. stable isotope labeled lipids were used as internal standards in order to obtain comparable results between batches. the obtained results (fig. 1) show that for a given cell line, lipid distribution changes over the process. moreover, this distribution may vary significantly depending on the cell line: cl-1 and, to a lesser extent, cl-3 show an accumulation of triglycerides from day 6 to the end of the process, while cl-2 does not seem to follow this trend.
conclusion
interestingly, in some cell lines/experimental conditions, we highlighted an overproduction of triglycerides and cholesterol leading to the accumulation of lipid droplets, known as an energy storage sink. at the metabolic level, these findings suggest a relative overflow of the carbon metabolism. from a process development perspective, these findings can be considered on the one hand as a waste of resources, since the stored energy is not used for protein/biomass biosynthesis, and on the other hand as the root cause of additional process challenges, especially during the harvest and the first capture steps, given the hydrophobic nature of these molecules. implementation of lipidomics analysis enables us to highlight a new type of process variability and to anticipate potential problems for the downstream steps. the application of this methodology on our platform has helped us to design tailor-made solutions (pretreatment selection, filter selection, ...) at the clarification step, which are now implemented in our harvest development platform approach.
matriptase knock-out in cho cells prevents clipping of recombinant proteins. a serine protease inhibitors protect model mab from proteolytic degradation in cho-k1 cell derived conditioned medium. the model mab was incubated in conditioned medium for 0h or 48h at 37°c; subsequently, samples were analyzed by western blot.
broad spectrum serine protease inhibitors (aprotinin, leupeptin) were added during incubation. aprotinin and leupeptin inhibit proteolytic degradation. the intact mab (upper band) and the clipped mab (lower band) are indicated by arrows.
b gene expression profiling of cho-k1 versus cho-a by ngs. shown is the gene expression profile of "secreted/shedded members of the s1a trypsin-like serine protease family" for cho-k1 and cho-a cell lines using next generation sequencing. the gene expression analysis highlights that five proteases were expressed more than 1.5-fold higher in cho-k1 cells (labelled with a red asterisk). the y-axis shows the transcript abundance as rpkm (reads per kilobase of exon model per million mapped reads).
c sirna knock-down identifies matriptase as major clipping protease and cho matriptase ko clone shows no detectable clipping activity. upper figure: sirnas directed against the five protease genes and scrambled (scr.) sirna were transfected and conditioned medium was collected three days after transfection. the model mab was incubated in fresh medium as control (first lane) and in conditioned medium from the sirna transfected cells. samples were analyzed by western blot. only sirna targeting matriptase (st14) showed reduced proteolytic degradation. the intact mab (upper band) and the clipped mab (lower band) are indicated by arrows. lower figure: the model mab was incubated for 48h in conditioned medium collected from wt cho-k1 as well as the matriptase knockout clone. samples were analyzed by western blot. the intact mab (upper band) and the clipped mab (lower band) are indicated by arrows. no proteolytic degradation could be detected in the samples originating from the matriptase ko clone.
d cell growth, viability and productivity in ambr (fed batch with temperature shift). cell growth, viability and volumetric productivity profiles of wt cho-k1 (red circles, n=2) and matriptase ko clone (blue squares, n=1) cultivated in 15-ml ambr.
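as an illustration of the rpkm measure and the 1.5-fold cut-off mentioned for panel b, the following minimal python sketch may help. it assumes the standard rpkm definition (reads per kilobase of exon model per million mapped reads); the function names, gene labels and read counts are hypothetical and are not taken from the study, and the handling of zero expression is one possible convention rather than the authors' method.

```python
import math


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def higher_in_reference(expression, min_fold=1.5):
    """Return genes whose reference RPKM exceeds the comparator by > min_fold.

    `expression` maps a gene label to (reference_rpkm, comparator_rpkm).
    A gene with zero comparator expression but non-zero reference
    expression is treated as infinitely higher (an assumption).
    """
    flagged = []
    for gene, (ref_rpkm, other_rpkm) in expression.items():
        if other_rpkm == 0:
            fold = math.inf if ref_rpkm > 0 else 0.0
        else:
            fold = ref_rpkm / other_rpkm
        if fold > min_fold:
            flagged.append(gene)
    return flagged


if __name__ == "__main__":
    # Hypothetical example: 500 reads on a 2,000 bp exon model,
    # 25 million mapped reads in the library.
    print(rpkm(500, 2000, 25_000_000))
    # Hypothetical RPKM pairs: (reference line, comparator line).
    demo = {"gene_a": (30.0, 10.0), "gene_b": (12.0, 10.0), "gene_c": (5.0, 0.0)}
    print(higher_in_reference(demo))
```

because rpkm already normalizes for transcript length and sequencing depth, the fold comparison here is a plain ratio; a real analysis would normally also consider replicate variability before calling a difference.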
no significant differences were seen between wt and matriptase ko clone regarding cell growth and viability. comparable or slightly higher productivity was detected for the matriptase ko clone compared to the wt.
e significantly reduced proteolytic clipping with the matriptase ko clone. the model mab was stably expressed in cho-k1 (wt) as well as in the cho-k1 matriptase ko clone. samples were analyzed by western blot. the intact mab (upper band) and the clipped mab (lower band) are indicated by arrows. significantly reduced proteolytic degradation could be detected in the samples originating from the matriptase ko clone (3 samples each are shown for wt and ko cells).
the glycosylation of therapeutic proteins is a critical quality attribute (cqa) and needs to be analyzed during cell line and bioprocess development. the current methods for analyzing glycosylation are mainly based on the enzymatic release of glycans. they are tedious and offer only limited throughput, which makes them unsuitable for cell line development work. in this study we evaluated a novel paia assay for measuring intact glycoproteins with capture beads and fluorescence labeled plant lectins to analyze glycans in a high throughput 384-well plate format.
material and methods
analytes: erbitux©, mabthera©, arzerra© and avastin©. two glycoengineered variants of one igg were kindly provided by merck (vevey, switzerland). all analytes were spiked into cho-k1 cell culture supernatant or buffer, diluted 1:1 with a denaturation solution and incubated at 65 or 70°c for 20 minutes to expose the fc glycans. erbitux samples were analyzed under denaturing conditions to detect fab and fc glycosylation and in native conditions for fab glycosylation only. 10 μl of pretreated sample was added to each well of the special 384-well paiaplate, containing labeled lectin and capture beads. the microplate was incubated for 45 minutes at 1800 rpm on an orbital shaker at room temperature and spun down at 500 × g.
the read-out was done on a fluorescence microscope (synentec, elmshorn, germany) in less than five minutes.
results
figure 1a: lectin binding profiles of different iggs. the analysis of different iggs results in lectin binding profiles which show the different degrees of glycosylation. high abundance of a sugar leads to high binding rates of the lectin for the respective sugar. avastin has a very low degree of galactosylation and high mannose species compared to mabthera and arzerra (fig. 1a). only arzerra carries glycans with 2-6 linked sialic acids. these findings are in line with results from the literature [1].
figure 1b: distinction between fc and fab glycosylation in erbitux. without denaturation only the fab glycans are detectable in erbitux. denaturation leads to additional exposure of the fc glycans and thus higher lectin binding rates compared to native erbitux. gna and npl only bind to denatured erbitux, indicating that the high mannose glycans are only present on the fc part. the equal sna binding rates for both conditions confirm that the 2-6 linked sialic acids are almost exclusively found on the fab part. this is in agreement with published data [2].
figure 1c: lectin binding rates correlate with the levels of galactosylation and fucosylation. increasing degrees of glycosylation in the mixtures of the glycan variants from merck lead to higher lectin binding rates for all galactose and fucose markers, indicating that quantitative analysis can be performed with these assays. the cona lectin, which binds to the common core mannose glycan motif, remains at the same level, suggesting that the fc glycans were similarly exposed in all samples.
the results demonstrate that paia assays are capable of quickly detecting differences in glycan patterns of different antibodies. in addition it was shown that glycan variants of the same igg can be analyzed quantitatively. and finally we could confirm the differences in fab and fc glycosylation in erbitux.
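the mixing series behind figure 1c (terminal β-galactose 9 to 55% and core-fucose 8 to 100%, according to the figure caption) can be sketched as a simple linear blend. this is only an illustration: it assumes that the two end values correspond to the two unmixed variants and that the glycosylation level of a mixture scales linearly with the mass fraction of each variant. the function names are hypothetical.

```python
def mixture_level(fraction_high, level_low, level_high):
    """Expected glycosylation level (%) of a two-variant mixture.

    fraction_high is the mass fraction (0..1) of the variant with the
    higher glycosylation level; linear mixing is assumed.
    """
    if not 0.0 <= fraction_high <= 1.0:
        raise ValueError("fraction_high must lie between 0 and 1")
    return level_low + fraction_high * (level_high - level_low)


def fraction_for_target(target_level, level_low, level_high):
    """Mass fraction of the high variant needed to reach target_level (%)."""
    if level_high == level_low:
        raise ValueError("the two variants must differ in glycosylation level")
    fraction = (target_level - level_low) / (level_high - level_low)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("target lies outside the range spanned by the variants")
    return fraction


if __name__ == "__main__":
    # Assumed end points taken from the caption ranges.
    GAL_LOW, GAL_HIGH = 9.0, 55.0    # terminal beta-galactose, %
    FUC_LOW, FUC_HIGH = 8.0, 100.0   # core fucose, %
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        gal = mixture_level(f, GAL_LOW, GAL_HIGH)
        fuc = mixture_level(f, FUC_LOW, FUC_HIGH)
        print(f"fraction {f:.2f}: galactose {gal:.1f}%, fucose {fuc:.1f}%")
```

under these assumptions a single mixing fraction fixes both the galactose and the fucose level at once, so the two markers rise together across the series.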
we believe that bead-based assays with lectins have great potential for monitoring product quality early in the development process.
background
gene- and cell therapy-based medicines are experiencing a resurgence due to the introduction of "next generation" viral transfer vectors, which have demonstrated improved safety and efficacy. adeno-associated viruses (aav) and lentiviruses are very commonly used in therapeutics and are often produced by transient gene expression, using pei-mediated transient transfection in hek-293 or hek-293t cells [1]. the critical raw materials needed for cgmp vector production must be sourced from approved suppliers and should have gone through a rigorous testing program to reduce the risk of introducing adventitious agents into the production process. correspondingly, the pei transfection reagent must also be sourced from a qualified supplier, and have gone through rigorous testing to ensure reliable transfection efficiencies, and hence reproducible virus production yields. here, we present peipro® and peipro®-hq, pei-based transfection reagents suitable for use in process development and in cgmp biomanufacturing, respectively. unlike other commercially available peis, peipro® benefits from extensive research and development in polymer chemistry and formulation for mammalian cell transfection. we further demonstrate that peipro® and peipro®-hq are the reagents of choice for virus production runs in most cell culture systems, hence facilitating the transition from initial optimization during process development up to large-scale therapeutic viral vector production in adherent or suspension cells.
manufacturing process of peipro® and peipro®-hq reagents. peipro® and peipro®-hq are fully synthetic reagents, free of any animal-origin components.
in comparison to peipro®, a more extensive set of quality controls is performed on peipro®-hq to enable its use as a qualified raw material in gmp processes for the manufacturing of clinical batches of therapeutic products.
lentivirus and aav production. irrespective of the cell culture vessel type, transfection using peipro® was performed following our recommendations. as an example, hek-293t (lentivirus) and hek-293 (aav) cells were thawed directly into each medium and passaged every 3 to 4 days before going into a 2 liter benchtop bioreactor. cells were resuspended and cultured for 3 days before transfection with peipro®. hek-293t cells were transfected with a third-generation system (four plasmids) for lentivirus production. hek-293 cells were co-transfected with three plasmids for aav production. lentiviral and aav titers were measured 48 and 72 hours post-transfection (data kindly provided by généthon).
peipro® is the reagent of choice for virus production runs in most adherent and suspension cell culture systems from process development up to large scale clinical-grade virus production. irrespective of the cell culture-based system and production scale, peipro® and peipro®-hq have led to efficient viral vector yields in standard laboratory cell systems, such as flasks, cell factories and roller bottles, as well as in multilayer flasks or fixed-bed culture systems that take into account time and space concerns for the scaling-up process (table 1). for example, high viral vector yields above 10^7 ig/ml and of 10^11-10^12 vg/ml were obtained for lentiviruses and aavs, respectively, in suspension hek-293t and hek-293 cells cultured in one of the commercially available synthetic cell culture media, balancd® hek293 (irvine scientific®).
conclusion
peipro® and its higher quality grade peipro®-hq are pei reagents suitable for efficient and reproducible production of therapeutic viral vectors.
efficient viral vector production yields can be achieved in most cell culture systems, irrespective of the production scale. with appropriate and advanced quality controls, the highest quality grade peipro®-hq is commercially available to accompany academics and biopharmaceutical companies in terms of qualified raw material for their gmp-grade viral vector production needs. b distinction between fc and fab glycosylation in erbitux. erbitux was diluted to a concentration of 200 μg/ml in tris buffer and measured in native and denatured conditions to distinguish fab glycosylation (native erbitux) from fab and fc glycosylation (denatured erbitux). it could be confirmed that sialic acids are almost exclusively present on the fab part of erbitux and that the high mannose glycans are only found in the fc part. c lectin binding rates correlate with the levels of galactosylation and fucosylation. two glycan variants samples of the same igg from merck were mixed in different ratios to yield glycosylation rates of 9 to 55% in terminal β-galactose and 8 to 100% in core-fucose, based on data from 2-ab uplc analysis. the mixtures all contained 0.5 μg igg per well. the measured lectin binding rates for all galactose and fucose markers correlate very well with the respective degree of glycosylation in the mixtures. all measurements were performed in triplicates (irvine scientific®) . hek-293t (lentivirus) and hek-293 (aav) cells were thawed directly into each medium and passaged every 3 to 4 days before going into a 2 liter benchtop bioreactor. cells were seeded and cultured for 3 days before being transfected by peipro® (polyplus). for transfection, four plasmids were used for lentivirus and three plasmids were used for aav. lentiviral and aav titer were measured 48 and 72 hours post transfection (data kindly provided by généthon) table 1 (abstract p-004). peipro®, the reagent of choice for virus production runs in most cell culture systems in both adherent and suspension cells. 
irrespective of the cell culture-based system and production scale, peipro® and peipro®-hq have led to efficient viral vector yields superior to 10 7 ig/ml and 10 9 vg/ml, respectively for lentiviruses and aavs background continuous perfusion process is making a comeback as a competing upstream manufacturing technology for the production of biopharmaceuticals compared to the standard fed batch processes. this is primarily because of cost advantages such as reduced capital cost and increased product yield. the change in status of perfusion process from older perfusion to the new-age perfusion is due to availability of better cell retention devices leading to more efficient processes, improved cell lines, cell culture medium capable of supporting high cell densities and better bioreactor control strategies. in this work, we present the advantages, limitations and challenges of fed batch and perfusion type of processes through case studies. table 1 results the perfusion run yielded 5-fold higher titer compared to fed batch run (fig. 1a) . considering the number of runs that could be executed in a manufacturing facility within the same calendar days, about 1fold increase in product output can be achieved with the perfusion process (fig. 1a) . this difference is attributed to higher ivcc, higher pcd and longevity of cells because of decreased level of toxic metabolite concentrations such as lactate and ammonia. case 2: understanding product retention in perfusion process the new-age perfusion processes utilize hollow fiber filters. this has been observed to cause retention of product within the bioreactor especially towards the end of the production run. 
two types of experiments were conducted to study the factors contributing to product retention:
- spiking studies:
  - role of product titer: product was spiked into chemically defined media
  - role of different cell viability: different broths with varying viability were spiked with the same product titer
- evaluation of different hollow fiber membranes (m1, m2 and m3) on product retention.
from the spiking studies, it was evident that cell debris and poor quality cell broth (lower viability) were the major factors contributing to product retention (fig. 1c) . from the membrane experiments, it was identified that at pilot scale, m1 showed much higher retention from the first perfusion cycle itself, and retention increased to more than 75% towards the end of the batch. however, with the m2 membrane, product retention started only late (after 50% of batch duration) and it remained low (~20-40%). in contrast, this difference was not observed at 1l scale due to the usage of membranes with larger filter area (2-3 fold higher compared to pilot scale). when the filter area per unit volume of perfusate was decreased by half (m2_batch 4) for the pilot scale, even m2 showed a retention profile similar to m1 (fig. 1d ). we presented data to show that the perfusion process gives a 5-fold increase in product yield on a per-batch basis and a 3-fold increase when facility throughput is considered. product retention is a technical challenge that requires optimization (perfusion rates and filter membrane types). we believe that labs that develop processes for biologics can now consider both perfusion and fed-batch based processes, as these technologies can now closely compete with each other. the choice of the process format going forward should depend solely on the requirement for the biologic rather than on the earlier perception that fed-batch is the preferred choice because of its simplicity. 
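the relation between the per-batch gain and the facility-level gain reported above can be sketched in python. this is a minimal sketch, assuming equal working volumes and assuming that facility output scales with the number of runs completed in the same calendar period; the function name and the run counts are hypothetical and are not taken from the abstract.

```python
def facility_fold_increase(per_batch_fold, runs_perfusion, runs_fed_batch):
    """Fold increase in product output over a fixed calendar period.

    per_batch_fold: product per perfusion run divided by product per fed-batch run
    runs_perfusion, runs_fed_batch: runs completed in the same period (assumed values)
    """
    if runs_fed_batch <= 0:
        raise ValueError("runs_fed_batch must be positive")
    return per_batch_fold * runs_perfusion / runs_fed_batch


# Hypothetical schedule: 6 perfusion runs in the time needed for 10 fed-batch runs
print(facility_fold_increase(5.0, 6, 10))
```

under these assumptions, a 5-fold per-batch gain and a 3-fold facility-level gain would correspond to perfusion completing about 0.6 runs for every fed-batch run.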
background chinese hamster ovary (cho) cell culture has been widely used for production of monoclonal antibodies in the pharmaceutical industry. previous studies have shown that the cell specific productivity in cho cells can be increased by glucose limitation [1] . introducing a productivity enhancing effect it is possible that this also affects the quality of the product such as glycosylation or other posttranslational modifications. in this work, we are focusing on the impact of glucose limitation and increased productivity on the product quality of a monoclonal antibody produced in a fed-batch cultivation of cho cells. materials and methods cho cells were cultivated both under limiting and non-limiting nutrient conditions in fed-batch. for fed-batch cultivation the reduced range for glucose concentration was chosen between 0.2 and 0.5 g/ l. reference cultivation was performed between 1.5 and 3.0 g/l. both cultures were fed with similar volumes of a complex nutrient supplement. all cultivations were performed in chemically-defined, animalcomponent free cho growth media (xell ag). viable cell density and viability were determined using the automated cell counting system cedex (roche diagnostics), glucose and lactate concentrations were detected via ysi (ysi life sciences). amino acid were quantified using hplc-fld, vitamins were quantified using reversed phase chromatography coupled to a triple quadrupol mass spectrometer (varian 320, selected reaction monitoring). amounts of igg1 were quantified via protein a hplc, mab purified from another cho cell clone was used as a standard. the analysis of product quality was performed by intact mass analysis using reversed phase chromatography coupled to a microotof-q ii mass spectrometer (bruker daltonik). the cho cell culture cultivated under low nutrient conditions reached a 54% higher viable cell density than the reference culture (fig. 1a) . the product titer was even increased by 109% (fig. 1b) . 
the spent media analysis shows that some amino acids and vitamins were present at presumably limiting concentrations after day 5/6, mostly in the low nutrient level culture (down to 40 to 190 μm for tyr, gln, arg, and asn, below 1 μm for pyridoxine, data not shown). the product quality showed significant changes for the changed feeding strategy ( fig. 1c and d) . as expected, the glycation level decreased from 3% to 1% compared to the reference culture. the truncation level of c-terminal lysine at the heavy chain of the mab increased from 79% to 88%. the glycosylation was also significantly influenced by the low nutrient level (fig. 1e) : the nonfucosylated variants increased from 3% to 6% (fig. 1f) , the degree of galactosylation increased from 31% to 39% (fig. 1g) . cultivation under low nutrient level led to 54% higher viable cell density and a product titer increased by 109% when compared to reference culture grown under non-limiting nutrient conditions. the analysis of product quality reveals 75% less glycation of light chain for cho cells grown under low nutrient conditions (0.7% vs 2.7% in reference culture). the truncation of c-terminal lysine decreased by 10% (from 88% to 79%), the degree of galactosylation increased by 23% (from 31% to 39%, also observed by takuma et al. [2] ) and non-fucosylated glycans increased by 105% (from 2.8% to 5.8%) under low nutrient conditions. the product quality analysis by intact mass proved to be highly robust (average cv for four replicates = 2%). in summary, cultivation with alternative feed led to higher igg product titer and better product quality (glycation unwanted, higher amount of non-fucosylated glycans leads to higher antibody-dependent cellmediated cytotoxicity (adcc), higher amount of galactosylation to higher complement-dependent cytotoxicity (cdc) and adcc [3, 4] ). background we developed an automated, multiwell plate (mwp) based screening system for suspension cell cultures (fig. 
1a) which is now routinely used in cell culture process development. it is characterized by a fully automated workflow with integrated analytical instrumentation. it uses shaken 6-24 well plates as bioreactors which can be run in batch and fed-batch mode with a capacity of up to 768 reactors in parallel [1] [2] [3] . a wide ranging analytical portfolio to monitor cell culture processes and also a cooperation with internal high throughput (ht) analytic groups to characterize product quality are available. in addition the use and the benefits of spectroscopic methods for cell culture automation were shown in the past [4, 5] . automated cell culture systems enable broader screening within a shorter time frame for many applications in upstream process development. the higher degree of parallelization and automation helps to screen for most promising parameters in a shorter time. the use of broad doe screening design allows in addition the identification of parameters that support high titers while keeping high product quality (multiple factors at the same time). the illustration (fig. 1b) shows an example how this combination can speed up process development steps. main applications of the cell culture automation are for example the identification of product quality levers and media or feed optimization. the application of the cell culture automation is shown for two examples. the goal in the first application was to identify levers to reduce trisulfides. by a screening of 39 conditions in parallel (in 4-fold replication, 158 wells in sum) the reduction of trisulfides by 97.5 % (normalized to start level) was possible. in addition the levers for trisulfide reduction were identified. the best and start conditions were verified in bioreactor scale (fig. 1c) . the goal in the second application was to increase product concentration without an impact on product quality. 
by a screening of 54 conditions in parallel (in 4-fold replication, 216 wells in sum) the increase of titer from 1.5 g/l to 3.7 g/l (> factor 2) was possible by media platform change and media optimization. an impact on product quality could not be shown. the best conditions were also verified in bioreactor scale (fig. 1d) . the benefits of using cell culture automation in late stage process development were shown based on two examples of current applications. for this purpose the experimental results of the development work of two late stage projects using the in-house developed automated cell culture system were shown. the first example shows the capability of the automated cell culture system by reducing trisulfides significantly in just one experiment. for the other project the final product concentration could be increased by a factor of 2.5 by a media screening and changing to the in-house media platform. these two examples show the potential of cell culture automation as a routine tool in process development. the cell line development process has become faster and is simultaneously generating more clone- and product-related analytical data. in order to select the best producer cell line, extremely heterogeneous data types need to be systematically compared. the timely availability of all data needed to decide which cell line to pursue has become a bottleneck in the cell line development workflow. to ensure sound decision making, new integrated workflow support and data analysis methods are needed. we have developed a new end-to-end platform for bioprocess development, which includes a cell line development workflow system supporting seeding, selection, passaging, analyzing, cryo-conservation, and processing in (micro-) bioreactors. this platform, genedata bioprocess™, enables partially or fully automated cell line selection and assessment processes, and it increases process efficiency and quality. 
the system tracks the full history of all clones, from initial transfection all the way to their evaluation in bioreactor runs, and combines this information with analytics data on molecules, clones, and product quality. it can directly integrate with all instruments, such as pipetting robots, bioreactors, and bioanalyzers. the system is designed for a wide range of biologic molecules, including antibodies (iggs, novel formats) and other therapeutic proteins (e.g., fusion proteins).
figure caption (automated cell culture system): (a) schematic illustration of the automated cell culture system. only the core system is shown with a robotic plate handler as key device connecting cultivation, processing and analytical parts. (b) illustration of an example how cell culture automation can speed up process development steps. (c) application in the identification of product quality levers. (d) application in titer optimization.
highlighted use cases describe the identification of top producer cell lines, decision making support, bioreactor data management, and full clone history report documentation (fig. 1 ). genedata bioprocess, which was developed in collaboration with top pharmaceutical companies, can flexibly support various (non-linear) workflows and structure the collected information in a way that fosters collaboration across an organization. while increasing throughput is crucial to ensure the timely availability of optimal producer cell lines, high-throughput is only possible when automated processes in the laboratory and the resulting data collection and aggregation can be streamlined. genedata bioprocess helps to establish more productive processes by offering support and integration for automation stations and measurement devices. thanks to the comprehensive workflow support and the possibility to integrate results from cell line stability experiments, product quality assessment, and bioreactor suitability tests, genedata bioprocess provides a unique way to evaluate cell lines. 
comprehensive analysis of all data collected in the process helps to ensure the highest possible quality and minimize the time and resources needed for data analysis and management. integration of bioreactor data analysis and visualization with other parameters measured in cell line development streamlines clone evaluation in micro-bioreactors and supports high-throughput operations. genedata bioprocess comprehensively tracks the full clone history from the origin of the host cell line to the generation of the validated monoclonal producer cell line. for promising clones, the clone history report can be generated with one click. besides supporting cell line development, genedata bioprocess is a comprehensive platform capable of tracking the complete bioprocess development process. in 2012, 14.1 million people suffered from cancer [1] , making it a major concern for our society. since common cancer treatment is limited and not effective for late stage carcinoma, alternative methods are needed to reduce the high mortality rate of cancer patients. one alternative approach is the application of the oncolytic measles virus (omv), because omv has a natural affinity for cancer cells. the major drawback of omv is that an extremely high amount of at least 10^11 tcid50 (50% tissue culture infective dose) is needed per dose [2] . to solve this problem, a high titer process must be established including an efficient downstream processing (dsp). we developed an appropriate upstream processing and are able to produce 10^10-10^11 tcid50 ml^-1 in a bioreactor with 0.5 l working volume [3] . now, we focus on the dsp part. the following study tested the application of charged depth filters for the omv clarification. in contrast to common dsp schemes, a depletion of virus particles or a loss of infectivity is not desired. the aim is a reduction of protein content and dna with minimal loss of infective omv. 
further, we investigated the influence of the cell culture medium on the depth filtration process. to explore the influence of the surrounding cell culture medium on the depth filtration performance, omv was either produced in serum-free medium (vp-sfm) or serum-containing medium (dmem + 10% fcs). the production was done in a str with a working volume of 0.5 l as described in [3] . cells and carriers were separated with an opticap xl 1-module (polygard-cr; 5 μm; merck). for the depth filtration millistak+ ce50 filters (merck) were used. the filter material was autoclaved and rinsed with 25 ml of 20 mm tris-hcl (ph 7.4). the virus suspension was filtered with a load of 50 l m^-2 using a peristaltic pump (ism931c; ismatec) applying a flux of 150 l m^-2 h^-1 (fig. 1 ). samples were collected at the beginning and end of a filtration run. the omv titer (tcid50 ml^-1) of the samples was determined according to kärber and reed [4, 5] . protein content was measured with the pierce bca protein assay kit (thermofisher scientific) according to the manufacturer's instructions. dna was measured by a microtiter assay using quant-it picogreen dsdna reagent (thermofisher scientific) according to the manufacturer's instructions. we found that positively charged depth filters were suitable to clarify omv suspensions. the cell culture medium, in which the omv was produced, influenced the outcome of the depth filtration. a log reduction value (lrv) of 0.87 was determined for omv present in serum-containing medium (scm), whereas the titer of omv in serum-free medium (sfm) was reduced by 1.63 log levels. this indicates that without serum in the surrounding liquid, omv will adsorb to the filter material. however, we must evaluate if the missing serum or other components present in sfm are responsible for this effect. total protein was not relevantly reduced by the clarification using charged depth filters. 
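the log reduction values and filtration settings reported above can be related to titer recovery and run time with a short python sketch. it assumes the usual definition lrv = log10(titer before / titer after) and a constant flux over the whole run; the function names are illustrative and are not part of the abstract.

```python
import math


def log_reduction_value(titer_in, titer_out):
    """LRV = log10(titer before filtration / titer after filtration)."""
    if titer_in <= 0 or titer_out <= 0:
        raise ValueError("titers must be positive")
    return math.log10(titer_in / titer_out)


def recovered_fraction(lrv):
    """Fraction of the infectious titer remaining after a given LRV."""
    return 10.0 ** (-lrv)


def filtration_time_h(load_l_per_m2, flux_l_per_m2_h):
    """Time needed to pass the stated load at a constant flux (assumed)."""
    return load_l_per_m2 / flux_l_per_m2_h


# LRVs reported for serum-containing (scm) and serum-free (sfm) medium
for medium, lrv in (("scm", 0.87), ("sfm", 1.63)):
    print(medium, f"{recovered_fraction(lrv):.1%} of the titer recovered")

# Load of 50 L m^-2 at a flux of 150 L m^-2 h^-1
print(f"{filtration_time_h(50, 150) * 60:.0f} min per run")
```

under these assumptions, an lrv of 0.87 corresponds to roughly 13% of the infectious titer remaining and an lrv of 1.63 to roughly 2%, while a load of 50 l m^-2 at 150 l m^-2 h^-1 takes about 20 minutes.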
for omv present in scm, the residual protein content was slightly lower than for omv present in sfm (table 1 ). in contrast, host cell dna (hcdna) was bound to the filter material. we achieved a 33% reduction of hcdna for an omv suspension in sfm. after clarifying an omv suspension in sfm, the remaining hcdna content was even lower, at only 42%. conclusions charged depth filters are suitable for the first clarification step of omv downstream processing. residual protein could pass the depth filter almost unhindered, whereas the hcdna content was already reduced to 42% at maximum.
figure caption (genedata bioprocess): scheme of the complete cell line development workflow support in genedata bioprocess, showcasing integration of data from diverse measurement instruments, data visualization for decision making support, as well as tracking of full clone history.
however, the omv titer was also reduced by the depth filtration. this undesired effect was stronger for the omv present in sfm. because the agencies require avoiding serum in clinical-grade production processes, this is disadvantageous. nonetheless, because sfm will soon be standard for omv production, further experiments have to be done to prevent the omv reduction during clarification. one option can be to reduce the adsorption strength of the virus to the filter material by the addition of salt. moreover, it is important to establish a standardized protocol for the upstream processing. we determined batch-to-batch variations within the clarification, indicating a strong impact of upstream processing (usp) on the outcome of the dsp. therefore, further studies must investigate the influence of usp parameters, e.g. time of harvest and ph of the harvest solution, on the omv. fed-batch culture is commonly employed to maximize cell and product concentrations in upstream mammalian cell culture processes. typical standard platform processes rely on fixed-volume bolus feeding of concentrated feed supplements at regular intervals. 
however, such static approaches might result in over-or underfeeding. to mimic more closely the dynamics of a fed-batch culture, we developed a dynamic feeding strategy responsive to the actual nutrient needs of a mab-producing recombinant cho cell line. results and discussion improvements made at different steps during fed-batch development are shown in fig. 1 . at step 1, all eight cell boost supplements were added to cdm4ns0 according to a doe approach, and batch cultures were performed. this evaluation allowed us to select only those cell boost supplements that were beneficial to the overall culture performance. non-performing cell boost supplements were removed and not considered further. at step 2, the selected cell boost supplements were added daily to the cultures at different ratios according to a doe approach, and fedbatch cultures were conducted. as expected, daily feed additions to replenish consumed nutrients substantially improved mab and peak cell concentrations as well as viable cumulative cell days (vccd) compared to batch cultivation. further, the results enabled us to fine-tune the feed ratio of selected cell boost supplements. at step 3, we further optimized the best performing feed ratio by investigation of static and dynamic feed protocols. most fed-batch protocols rely on constant feed additions on distinct days. however, these approaches often lead to substantial over-or underfeeding during bioprocessing. to improve such "static" protocols, we investigated three different "dynamic" approaches as shown in table 1 by applying the selected cell boost supplements with the optimized feed ratio. this investigation allowed us to further improve the bioprocess performance. the best performing approaches, constant and retrospective feed, were further investigated in fully automated bioreactors under controlled conditions. in general, constant cultivation parameters in the bioreactor slightly enhanced mab titers compared to shake flask cultivation. 
the retrospective feed strategy yielded 10% higher titers than the constant strategy. overall, the established methodology for fed-batch development allowed us to obtain 2.5× higher mab titers (batch mean: 1.9 g/l vs. fed-batch 4.9 g/l) in a short time and three simple steps. in addition, the product quality was investigated. compared to the legacy fedbatch process, fed-batches that were conducted with the newly selected basal and feed media altered the distribution of charge and glycan variants. the amount of aggregated product was not altered. the established methodology for fed-batch development is a rapid protocol to select well-performing feed supplements and optimize their ratio to the culture requirements. in three steps, mab titers were boosted 2.5x from 1.9 g/l to 4.9 g/l. product glycosylation and charge variants could be influenced by the newly selected basal and feed media compared to a legacy fed-batch process. the amount of aggregated product was not altered. the present study investigates the beneficial effect of spiking hyclone™ actipro™ basal medium with hyclone cell boost™ 7a and cell boost 7b feed supplements on growth and productivity of a recombinant cho cell line. to evaluate the impact of feed-spiking compared with cultivation in basal medium only, the cell line was grown in bioreactors under controlled conditions to determine cellspecific metabolic rates, nutrient consumption, and byproduct accumulation over the process time. transcriptome analysis of the cultivated cells, using microarrays on four consecutive days to investigate differential gene expression, revealed the beneficial effect of feed-spiking compared with cells grown in basal medium. model cell line was a mab-expressing cho dg44 (licensed from cellca gmbh) cultivated either in actipro basal medium only (ge healthcare) or in actipro basal medium feed-spiked with additional supplementation with 7% cell boost 7a and 0.7% cell boost 7b (ge healthcare). 
both cultures were grown in batch mode using dasgip™ cellferm-pro™ stirred-tank bioreactors (eppendorf). the beneficial effect of feed-spiking was analysed by transcriptome analysis using microarray technology (8×60 k design, agilent). both basal and feed-spiked processes lasted for seven days with viabilities above 95% until day 6. on day seven, a sharp decline in viability indicated the end of the batch process (fig. 1a) . in feed-spiked medium, cells initially grew slower but reached almost twice the peak cell concentration (17.6 × 10^6 c/ml) of cells in basal medium only (9.79 × 10^6 c/ml). remarkably, the integral of the viable cell concentration over the total process time (viable cumulative cell days [vccd] ) was similar between both process strategies (fig. 1c) . while mab production plateaued after day 4 in basal medium only (final titer 0.8 g/l), a continuous increase to three-fold higher final titers (2.4 g/l) was observed in feed-spiked medium (fig. 1b) . the higher titers could be attributed to generally higher cell-specific productivities (qp), which remained rather constant (~70 pg/cell/day) in feed-spiked cultures. in basal medium cultures, the qp dropped continuously from 70 to 10 pg/cell/day, by 20% (day 0 to 3), 50% (day 4), and > 90% (day 5 to 7). on average, the qp was 70% higher in feed-spiked cultures (fig. 1d ). transcriptome analysis of differentially expressed genes between cells grown in basal medium or feed-spiked medium was used to identify relevant go terms that indicated a more active proliferative state for feed-spiked cultures (data not shown). the top go terms were significantly related to cell cycle and primary metabolism, cellular division, as well as nucleobase formation or regulation. furthermore, gsea revealed several significantly enriched sets of genes related to gene transcription, dna replication and repair, cell growth and proliferation, as well as inhibition of apoptosis in feed-spiked cultures. 
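the two process metrics used above, viable cumulative cell days (vccd) and cell-specific productivity (qp), can be computed from sampled data as in the following python sketch. it assumes trapezoidal integration for vccd and an interval-averaged qp (titer increase divided by vccd); the sample values are hypothetical and were chosen only to illustrate the units, they are not data from the abstract.

```python
def vccd(times_d, vcd_cells_per_ml):
    """Integral of viable cell density over time (cells * day / mL), trapezoidal rule."""
    if len(times_d) != len(vcd_cells_per_ml):
        raise ValueError("times and cell densities must have the same length")
    total = 0.0
    for i in range(1, len(times_d)):
        dt = times_d[i] - times_d[i - 1]
        total += 0.5 * (vcd_cells_per_ml[i] + vcd_cells_per_ml[i - 1]) * dt
    return total


def mean_qp_pg_per_cell_day(titer_start_g_l, titer_end_g_l, vccd_cells_day_per_ml):
    """Interval-averaged cell-specific productivity in pg/cell/day.

    1 g/L equals 1 mg/mL, i.e. 1e9 pg/mL.
    """
    delta_pg_per_ml = (titer_end_g_l - titer_start_g_l) * 1e9
    return delta_pg_per_ml / vccd_cells_day_per_ml


# Hypothetical samples: days 0, 1, 2 with doubling cell density
days = [0, 1, 2]
cells = [1e6, 2e6, 4e6]
integral = vccd(days, cells)
print(integral, mean_qp_pg_per_cell_day(0.0, 0.315, integral))
```

an interval-averaged qp smooths over day-to-day changes; a per-interval calculation between consecutive samples would be needed to resolve a decline such as the one described for basal medium cultures.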
thus, feed-spiking increased the proliferative activity of cultivated cells. several of the identified genes appear as promising targets for cell line engineering, but have not yet been described in relation to high-producing recombinant cell lines and will need to be evaluated in future studies. feed-spiking of basal medium is a convenient and easy way to considerably increase product concentrations in a simple batch culture. differential gene expression revealed genes that appear important for high cell-specific production rates, and this knowledge can be leveraged into cell line engineering approaches or the design of high producing cho cell media. in the latter case, a maximized supply of high biosimilarity must be demonstrated by physicochemical and functional characterization for approval requirements of phase i and phase iii studies in terms of efficacy, safety and immunogenicity. in this study, rounds of upstream and downstream processes were run to reach the cqa limits of the originator molecule. after conducting many different development strategies, the mirror plot images of the intact deconvoluted mass were found to be identical corresponding to similar levels of glycoforms. the uv chromatogram of reversed phase ultraperformance liquid chromatography (rp-uplc) of tryptic peptide mapping demonstrated that the primary structure of tur01 is identical to the originator as shown in fig. 1a . post-translational modifications (ptms) such as oxidation, deamidation, n-terminal pyroglutamic acid, c-terminal lysine truncation levels were also comparable for two products. the glycosylation site (hc-asn301) was confirmed by peptide mapping analysis and 100% glycan site occupancy was proven for tur01 and originator. the glycosylation pattern for two products were highly similar in terms of major glycans (g0f, g1f, g0f-gn and etc.). man5 level was lower in tur01 compared to the originator product which may not have any clinical effect on the molecule. 
the secondary structure was determined by atr-ftir spectroscopy. absorption bands (amide i and amide ii) overlapped completely and the amounts of α-helix and β-sheet structures were comparable. furthermore, size-exclusion chromatography (sec) analysis revealed that both products have the same level of purity (>99%) and aggregate (<1%) levels. the level of impurities was determined to be below 4% by ce-sds. the capillary isoelectric focusing (cief) experiments showed that the charge variant profiles of the two products are indistinguishable and that the isoelectric point of the main peak is observed at 8.3 for both products. the association/dissociation rate constants and binding affinity for both tur01 and originator were highly similar and the similarity score was calculated to be greater than 99%, as shown in fig. 1b . in this study, state-of-the-art analytical techniques were used to assess the biosimilarity of tur01 to the originator adalimumab. head-to-head comparison data clearly demonstrated that tur01 is highly similar to the originator adalimumab in terms of physicochemical and functional characteristics. based on the analytical similarities, we believe that tur01 will have comparable pk/pd, potency, and efficacy results to the originator adalimumab.
figure caption (feed-spiking study): process performance of basal medium (black) and feed-spiked (red) bioreactor batch cultures: (a) cell concentrations and viability, (b) viable cumulative cell days and specific growth rate, and (c) antibody concentrations and cell-specific productivity. error bars indicate standard deviation from three independent experiments. the black arrows on day 4 indicate the beginning of decreasing cell-specific productivities and lower cell-specific growth rates in basal medium cultures.
the expanded interest in intensified continuous bioprocessing has highlighted the need to develop a small scale model for perfusion cell culture. 
the direction in the industry has been to increase target cell densities to ≥50 × 10^6 vc/ml and decrease perfusion rates to ≤3 vvd. in order to increase the throughput of our perfusion media development capabilities we sought to develop a small scale model of perfusion using the ambr®15 instrument (sartorius, germany). we used a cell settling model modified from the one previously published by kreye et al. to achieve the cell retention necessary to reach perfusion relevant viable cell concentrations [1] . in this work, we will show the application of this small scale model for: (1) identification of specific productivity performance over a steady-state for tested media, (2) identification of cspr min for a specific cell line and medium combination, and (3) confirmation of consistent product quality profiles between the small scale model and benchtop perfusion (data not shown). a chozn® cell line producing an igg1 was evaluated in several proprietary chemically defined media prototypes generated during the development of the catalog excell® advanced hd perfusion medium: "fed batch medium", "early prototype", "mid prototype", "intermediate prototype" and "late prototype" [2] . small scale simulations of perfusion experiments were run in ambr®15. media exchange was performed 3 times per day in equal amounts. agitation, gassing, and liquid handling were stopped for an optimized period of time to allow cells to settle to the bottom. spent media was removed in an amount equal to one third of the daily exchange volume. agitation, gassing, and liquid handling were resumed and fresh media was added back to the vessels. for benchtop perfusion, cells were inoculated in 3l applikon bioreactors (applikon, netherlands). at a concentration of ~6.0 × 10^6 vc/ml, perfusion was initiated using the atf2 (repligen, massachusetts). perfusion rate was limited to 1.2 vvd during steady-state. using the cell settling method described above we have been able to achieve ≥90% cell retention efficiency. 
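the cell-specific perfusion rate (cspr) links the perfusion rate to the viable cell density. a minimal python sketch follows, assuming the common definition cspr = perfusion rate / viable cell density, with the perfusion rate in vessel volumes per day (vvd); the function names are illustrative and are not taken from the abstract.

```python
PL_PER_ML = 1e9  # 1 mL = 1e9 pL


def cspr_pl_per_cell_day(perfusion_rate_vvd, vcd_cells_per_ml):
    """Cell-specific perfusion rate in pL/cell/day.

    perfusion_rate_vvd: mL of medium per mL of culture per day
    vcd_cells_per_ml: viable cell density in cells/mL
    """
    if vcd_cells_per_ml <= 0:
        raise ValueError("viable cell density must be positive")
    return perfusion_rate_vvd / vcd_cells_per_ml * PL_PER_ML


def perfusion_rate_vvd(cspr_pl, vcd_cells_per_ml):
    """Perfusion rate (vvd) needed to hold a given CSPR at a given cell density."""
    return cspr_pl * vcd_cells_per_ml / PL_PER_ML


# Operating point from the results: 30e6 vc/mL held at 1 vvd
print(f"{cspr_pl_per_cell_day(1.0, 30e6):.1f} pL/cell/day")
```

at 1 vvd and 30 × 10^6 vc/ml this gives about 33.3 pl/cell/day, which matches the minimum steady-state cspr reported for the "intermediate prototype".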
all media tested in this work were able to reach and maintain the 30x10 6 vc/ml target cell density at 1vvd (fig. 1) . performance of each media formulation was ranked based on specific productivity (table 1) . using "intermediate prototype", minimum steady-state cspr was determined to be 33.3pl/c/d for this cell line. n-glycan analysis of ambr®15 and bioreactor samples via intact mass spectrometry displayed only slight differences in product quality profile (data not shown). our work has shown a clear distinction between various prototype perfusion media and demonstrated a 50% increase in specific productivity over "fed batch medium" used in perfusion. additionally, we have shown the application to further characterize the process using this model to determine cspr min for a given medium and cell line. the transient process has been successfully operated at 500l in a sartorius biostat single use bioreactor (sub), yielding 0.4kg of crude product from a two-week expression culture (table 1) . successful scale up of the process to 500l creates the potential to supply transiently expressed products to support toxicology studies or even early gmp clinical supply, enabling accelerated biopharmaceutical development project timelines. the scale up from rocking bioreactors (rbr) to sub scale identified some scalability issues. lower specific productivity due to increased cell growth and decreased titres were observed in the sub ( fig. 1 iii & iv). to improve the predictability of scale up, a new process was developed and evaluated in the sub vessels utilising a modified transfection method, which resulted in comparable expression levels and specific productivity between rbr and sub scales. two sets of expression vectors comprising heavy chain and light chain plasmids expressing a human igg1 kappa mab, as previously described [1, 2] were used in the process optimisation study. 
the cell line used for transient expression and the pei mediated transfection method has been described previously [1] . transfected cultures were run under fed batch conditions for 14 days in 22l ge healthcare wave bioreactors (rbr), hyclone sub using 50l and 250l hyclone bioreactor bags (thermo scientific). the transfection process was modified to address the reduced titres and higher viable cell density (vcd) seen in the sub cultures. shake flask cultures were used to assess the standard (a) and modified transfection processes (b and c) (fig. 1 , i & ii). process c was identified as the process to be studied at sub scale, offering the potential to mitigate the high viable cell densities (vcd) observed. scaling up process c to 50l and 250l sub resulted in cultures producing titres exceeding 1g/l with desired cell growth profiles. scale up of process a into sub vessels resulted in decreased productivity compared to the rbr scale. after optimisation, the sub process c yielded increased specific productivities and expression titres comparable to those seen at rbr scale (table 1) . medimmune has successfully completed the first known successful cho transient culture at 500l scale producing > 800mg/l of mab at harvest. process optimisation has subsequently demonstrated reproducible titres at 50l to 250l scale exceeding 1g/l with comparable glycosylation profiles between sub and rbr cultures across scales. comprehensive analysis of the impact of trace elements in media on clone dependent process performance and product quality background state-of-the-art biopharmaceutical processes are accounting concomitantly for process performance and product quality. even though high yielding, robust processes are the cornerstones of any process development, product quality parameters such as structural integrity, charge variances and post-translational modifications are progressively becoming the focus of the developmental work. 
in conjunction with host cell line selection and process performance parameters, media components are crucial for the continued progress in rational modulation of product quality attributes affecting biological activity, immunogenicity, half-life or stability. among media components, trace elements (te) are of particular interest as they play a pivotal role in various cell metabolism pathways. based on a comprehensive doe approach and extensive process performance and product quality evaluation combined with metabolic flux analysis, the impact of several trace elements on the biopharmaceutical process is assessed. in a comprehensive i-optimal doe approach (fig. 1) , the effect of six te at various concentration levels and in various combinations in serum-free media was studied for four different cho-k1 cell lines in an ambr® 15 setup. the process performance parameters, such as cellular growth, productivity, and amino acid and vitamin consumption rates, were scrutinized for each of the conditions. the process performance evaluation was accompanied by extensive product quality analysis including size and charge variants, glycosylation patterns, oxidation and methylation. furthermore, a metabolic flux analysis was performed based on the nitrogen balance. based on extensive analytical data, the obtained response surface model provides a clear insight into the impact of particular te and their combinations on process performance and product quality. the high model quality enables discrimination between clone-dependent and clone-independent effects. an elevation in titer of up to 25% in the best condition clearly shows that even state-of-the-art media can be outperformed by trace element rebalancing. analyzing specific rates in combination with metabolic flux analysis improves the understanding of metabolic restructuring of the cell lines under distinct te levels and combinations. 
modulation of trace elements levels had a tremendous effect on the charge heterogeneity and glycosylation structure of the different proteins. this provides a toolbox for the fine tuning and control of product quality parameters. taken together, the data further paves the way to the rational fine tuning of process performance and product quality attributes. background due to regulatory concerns and economic impact, ensuring product quality and consistency is now one of the main challenge faced by the biopharmaceutical industry. for monoclonal antibodies (mab), glycosylation is one of the most important quality attributes as it impacts on mab structure integrity, and ultimately on both clinical efficacy and safety. many factors affect mab glycosylation and its inherent heterogeneity, including the host cell, the culture medium, the mode of operation and the operating conditions. in this context, the capacity to monitor and control on-line the antibody glycosylation, from early-to late-stage process development, would be of salient interest to reduce the time and cost to market. in order to address this unmet need, we have designed an improved spr biosensor assay to measure the kinetics of interaction between a mab and the extracellular domain of the fcγriiia receptor bound at the biosensor surface [1, 2] . of salient interest, we also demonstrated that various binding kinetic signatures, especially different dissociation kinetics could be correlated with distinct mab glycosylation patterns and with therapeutic efficacies, as deduced from mass spectrometry and a surrogate adcc assay, respectively. in parallel, we have also harnessed a spr biosensor directly to a bioreactor, which permitted the at-line determination of the concentration of antibodies by hybridoma cells during a bioreactor culture. we now plan on combining both approaches to determine on-line the glycosylation profile of the produced mabs. 
our ultimate goal is to design a unique and highly innovative bioprocess control tool that can be readily applied in an industrial bio-manufacturing setting. reducing timelines and costs is a key factor for bio-pharmaceutical industries to accelerate process development and drug delivery to patients. enhancing the throughput of bioprocess development has become increasingly important for the screening and optimization of cell culture processes. this challenge requires high throughput tools. in a previous study [1] , we showed that ambr® 15, a robotically driven mini-bioreactor system developed by tap-sartorius, could be advantageous to accelerate process development. the use of the ambr® 15 system allows us to test a large number of experimental conditions in a single experiment. as a consequence, the number of production samples to be characterized for product quality attributes (pqa) increases as well: the bottleneck has moved from the generation of samples at the production bioreactor step to in-process analysis. for product quality attribute analysis at lab scale, protein purification is generally carried out on >5 ml columns, which is incompatible with the size of ambr® 15 bioreactors. moreover, the applied methods are relatively low throughput. the development of a high binding capacity resin (up to 70 mg/ml) [2] , combined with high performing new cell lines which are able to produce up to 5 g/l of recombinant monoclonal antibodies, allows the development of an efficient, high-throughput (hts) robotic purification method. the use of robotic equipment for small scale purification purposes is a great opportunity for us to tackle this bottleneck, by enabling high-throughput sample purification at smaller scale (200 μl). recombinant monoclonal antibodies were produced by a genetically engineered dihydrofolate reductase (dhfr)-/- dg44 chinese hamster ovary (cho) cell line. 
clarified cell culture fluid (cccf) was obtained from 2 l and 2,000 l bioreactors after three filtration steps. minipurifications were performed on a tecan freedom evo® robot with predictor robocolumns® containing 200 μl mabselect sure® resin. larger scale purifications were executed on an aktaxpress using a hitrap prota column. to assess monoclonal antibody purification at small scale, we first tested the repeatability of the minipurification, purifying the samples 8 times on the same columns and on different columns, focusing on the yield of the purification and the impact on product quality attributes, especially the hmws. then, we compared those results to those obtained with the aktaxpress at larger scale, comparing the yield of the purification and the pqa of the protein-a eluates obtained with both purification systems. finally, we assessed the capability of the robot to perform hts of buffer and purification conditions, evaluating three different buffers at different concentrations and ph values, and also testing different column loading capacities. in this study we established that the tecan can be used as a robust platform for purification at small scale. we observed similar purification yields, intra and inter run. the analysis of the pqa1.a level also showed very high reproducibility, and the ph of the eluate showed strong comparability as well. table 1 shows that the coefficients of variation (cv) of the yield, hmws and eluate ph are low, demonstrating the good reproducibility of the purification. the strong reproducibility obtained between the different purifications showed that the tecan and the aktaxpress are similar in terms of purification performance and pqa (fig. 1a, b) . the tecan is a versatile automated liquid handler allowing the screening of a large number of purification conditions (fig. 1c) and the purification of large numbers of samples when the sample amount is limited. 
the tecan has the potential to purify more than 150 samples/day, reducing timelines and allowing us to deliver faster to the patients. viable cell density monitoring in bioreactor with lensless imaging geoffrey monitoring cell density and viability of mammalian cell culture bioreactors is a necessary task that still presents a number of challenges. the traditional measurement of bioreactor cell count and viability relies on the trypan blue exclusion method once a day. while automatic cell counters have reduced the manual statistical error, sampling the bioreactor remains a contamination risk and hinders process control as the sampled volume becomes significant. lensless imaging technology is a new method for accurately determining cell concentration and viability without staining. this technique directly acquires the light diffraction properties of each individual cell through its hologram image without any objective, lens or focus settings. living and dead cells have distinctive holographic patterns that can be distinguished and precisely counted. we compare cell counts and viability between the reference method and our lensless imaging device, the cytonote counter. measurements are performed once a day on samples from 12 bioreactors, from the inoculation to the end of the culture. we also assessed the repeatability of our method. another lensless imaging prototype is set up as a measurement chamber directly connected to a perfusion bioreactor, continuously receiving the bioreactor broth and therefore reproducing an in situ measurement. with a concentration range up to 40 x 10^6 cells/ml (fig. 
1) and viability range at 75-100%, we obtained a correlation factor of 0.98 between the two compared methods. the large field of view allows the analysis of several thousand cells within a single image, keeping the statistical variability of the measurement as low as 3%. our measurement chamber prototype has demonstrated its capability for continuous viable cell density and viability monitoring. we are now working on designing a steam-sterilizable probe, and we envision lensless imaging becoming the future method of choice for on-line monitoring of suspension cell cultures.
comparison between both purification systems and the ability of the system to be used as a high throughput tool for buffer screening. a purification yield (%). b pqa1.a (normalized). c impact of the ph and buffer concentration on the pqa1.a
lensless imaging technology is capable of accurately and precisely monitoring viable cell density and viability with a combination of significant advantages, from low sample volume, label-free detection, quick measurement and a simple device to the high number of cells analyzed, which leads us to think that it is a good candidate for very small scale bioreactors and high-throughput measurements. its high repeatability is also a key parameter in the effort to narrow batch-to-batch deviations. in addition, we demonstrate that this technique is potentially powerful for in-line and continuous monitoring of a lab bioreactor. we envision lensless imaging becoming the future method of choice for on-line and in-situ monitoring of suspension cells and a perfect tool for process control in fed-batch or perfusion mode in single-use bioreactors or traditional steam-sterilized vessels. it can certainly become the first vcd measurement technique to work from cell line engineering, to process development, pilot scale, and up to manufacturing scale. 
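the two figures of merit quoted for the method comparison (a correlation factor between methods and a statistical variability of repeated measurements) can be illustrated as follows. this is a minimal sketch under stated assumptions: the abstract does not specify which statistic was used, so the pearson correlation coefficient and the coefficient of variation are assumed here, and all function names and data values are hypothetical.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical paired daily counts (1e6 cells/mL): reference method vs. imaging device
reference = [0.5, 1.2, 3.1, 7.8, 15.0, 28.0, 39.0]
imaging = [0.6, 1.1, 3.3, 7.5, 15.6, 27.1, 40.2]
print(pearson_r(reference, imaging))

# Hypothetical repeated measurements of one sample (1e6 cells/mL)
print(cv_percent([9.8, 10.1, 10.3, 9.9, 10.0]))
```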
time-dependent product heterogeneity in mammalian cell fermentation processes background a consistent product quality is a major goal in the production of biotherapeutics, especially recombinant glycoproteins. whereas it is unlikely that the polypeptide chain changes during a production process, posttranslational modifications and protein folding are sensitive to fluctuations in parameters and conditions. here we focus on protein glycosylation as one important indication for product quality [1] . during a batch process conditions change continuously. at the beginning, the supply situation for the cell is excellent, but the secreted material stays a long time in the culture fluid. later during cultivation substrate provision decreases, whereas the exposition time of the protein to the culture fluid is much shorter. altogether this leads to product heterogeneity of the secreted protein during a batch culture. four different cell lines, two of human origin and two cho clones, producing four different recombinant glycoproteins were investigated in this study. together with their respective parental cell line the clones were cultured in three replicates in shakers. supernatant from the cultures were harvested at four time points. the removed culture volume was replaced by culture supernatant of the identically cultured corresponding parental cell line. the product was isolated from the supernatant and the glycans were released. one part of the released glycans was labeled with 2-ab and separated by hilic-fld. the other part of the glycans was permethylated and analyzed by maldi-tof mass spectrometry (fig. 1a) . the investigated proteins were antibody, antithrombin iii from cho clones and α1-antitrypsin, c1-inhibitor from human clones. the antennarity of the glycans is quite stable in all production phases. the degree of core fucosylation is very high in all products. a low fucosylation degree of antibodies may be favorable for a higher adcc performance [2] . 
some of the products showed an antennary fucosylation, which did not seem to change much across the cultivation phases. nevertheless, this might be an issue due to an antigenic impact in the patient. the antennary galactosylation changes noticeably for the antibody and α1-antitrypsin. in both cases the degree is highest in the first phase. an incomplete galactosylation leads to truncated glycans. this inevitably leads to undersialylated antennas, as seen for α1-antitrypsin. the sialylation is highest in the early phases and decreases during cultivation time. sialylation of a therapeutic protein is important for the half-life in the patient. therefore highly sialylated products are desired [3] . in further studies the consistency of the galactosylation and the sialylation will be investigated for fed batch and long term continuous cultures in comparison to batch cultures. due to the feed solution or the fresh media being present during such processes, the supply situation should be excellent for the whole cultivation time. the differences between the maldi-tof and hilic-fld data originate from complex and unresolved chromatograms (fig. 1b , chromatograms not shown). for that reason coupling of hilic-fld and ms is very much recommended. background novel biologics are often selected from a large library of lead candidates in the initial stage of preclinical and clinical developments. for this selection, there is a demand for high-throughput production of recombinant proteins of high quality and in sufficient quantity. transient gene expression offers a rapid approach to the production of numerous recombinant proteins for the initial-stage developments of biologics. mammalian cells are major host cells for transient gene expression, but they have the disadvantages of complicated operations and high cost of culture. insect cells are easy to handle and can be grown to a high cell density in suspension with a serum-free medium. 
insect cells can also produce large amounts of recombinant proteins through post-translational processing and modifications of higher eukaryotes. hence, insect cells have been recognized as an excellent platform for the production of functional recombinant proteins [1, 2] . in the present study, the production of an antibody fab fragment through transient gene expression in lepidopteran insect cells was examined. the dna fragments encoding the heavy chain (hc) and light chain (lc) genes of an fab fragment of mouse anti-bovine rnase a [3] were respectively cloned into the plasmid vector pihaneo, which contained the bombyx mori actin promoter downstream of the b. mori nucleopolyhedrovirus (bmnpv) ie-1 transactivator and the bmnpv hr3 enhancer for high-level expression [4] . trichoplusia ni bti-tn-5b1-4 (high five) cells were co-transfected with the resultant plasmid vectors using linear polyethyleneimine (pei; mw 40,000). before transfection, the plasmids and pei were prepared in 150 mm nacl, ph 7.0 and incubated at room temperature for 5 min. when the transfection efficiency was checked, a plasmid vector encoding the enhanced green fluorescent protein (egfp) gene was also co-transfected. transfected cells were incubated with a serum-free medium in a static or shake-flask culture. culture supernatants were analysed by western blotting and enzyme-linked immunosorbent assay (elisa). the numbers of green fluorescent cells and total cells in culture broth were determined using a flow cytometer. western blot analysis and elisa of culture supernatants showed that transfected high five cells secreted the fab fragment with antigen-binding activity. in static cultures, transfection and culture conditions, such as hc:lc gene ratio, serum-free medium, dna:pei ratio, and dna amount per cell, were successfully optimized by flow cytometry of egfp expression in transfected cells and the yield of the secreted fab fragment measured by elisa. 
the effects of culture temperature and initial cell density were also examined by comparing the cell growth and the production of fab fragments in shake-flask cultures. under optimal conditions (medium, psfm-j1 (wako pure chemical industries, japan); hc:lc gene ratio, 3:7; dna, 5 μg/(10 6 cells); pei, 10 μg/(10 6 cells); initial cell density, 1 x 10 6 cells/cm 3 ; temperature, 24°c), the yield of more than 100 mg/l of fab fragment was achieved in 5 days in a shake-flask culture ( fig. 1) . transfection did not significantly affect the growth of high five cells as compared with untransfected cells. transient gene expression using insect cells may offer a promising approach to high-throughput production of candidate proteins for the development of biologics. the increasing demand for biopharmaceuticals produced in mammalian cells has led industries to increase volumetric productivity of bioprocesses through different strategies [1, 2, 3] . in this context, fedbatch and perfusion cultures have attracted more interest than conventional batch processes. the efficient application of such alternative processes requires the availability of reliable on-line measuring tools for cell density and cell metabolic activity estimation [4] . the comparison of different culture strategies for hek293 cell line producing ifn-γ are presented below: batch, fortified batch and fed-batch. in this context, a new robust feeding strategy based on the monitoring of alkali buffer addition was applied for the estimation of nutrient requirements. this method allows to increase cell density and product titer compared with the other strategies assessed. three different culture strategies were carried out in 2-litre biostat b-dcu ii bioreactor. first, a reference batch and a batch using fortified medium (nutrient enriched medium) were run and assessed in terms of viable cell density (vcd) and product titer, and set as initial references. 
then, a fed-batch was performed applying a feeding strategy based on the estimation of nutrient requirements by monitoring the alkali buffer addition used for the control of ph. results vcd and product titer achieved for the different culture strategies assessed (batch, fortified batch and fed-batch) are presented in table 1. in fortified batch, an increase of 145% in vcd and of 350% in product titer was obtained compared with batch. in the fed-batch culture carried out (fig. 1) , we observed that the alkali buffer addition profile matched the vcd evolution trend. thus, the monitoring of alkali buffer addition was used for estimating the nutrient requirements (i.e. the volume of feeding medium) at any time during the fed-batch phase. the feeding strategy based on alkali buffer addition made it possible to maintain the glucose concentration within a narrow range around the set point (around 20 mm) during the fed-batch phase. as a result, a higher vcd (16.6·10^6 cells/ml) was obtained when compared with both batch references: vcd was enhanced by 241% and 39%, and product titer increased by 381% and 7%, with respect to batch and fortified batch, respectively. the results show that the fed-batch strategy based on alkali buffer addition is a robust on-line monitoring method that makes it possible to optimize the feeding strategy in fed-batch cultures. three different culture strategies have been tested in bioreactor with a hek293 cell line producing ifn-γ. results show that the higher the vcd reached, the higher the product concentration achieved. therefore, from a bioprocess development point of view, it is very interesting to implement strategies with a higher vcd outcome, such as the fed-batch operation mode. in this context, a new robust method for vcd estimation in fed-batch was applied. the alkali buffer addition necessary for maintaining the ph set-point is a reliable, easily measured on-line variable that provides information about by-product formation (mainly lactic acid). 
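the relative improvements reported for the three strategies can be checked for internal consistency, since percentage increases compound multiplicatively. the short sketch below is only an arithmetic illustration (the function name is hypothetical): it combines the increase of fortified batch over batch with the increase of fed-batch over fortified batch and compares the result with the reported increase of fed-batch over batch.

```python
def compound_increase(*percent_increases):
    """Combine successive relative increases (in %) into one overall increase (in %)."""
    factor = 1.0
    for p in percent_increases:
        factor *= 1.0 + p / 100.0
    return (factor - 1.0) * 100.0


# vcd: fortified batch is +145% over batch, fed-batch is +39% over fortified batch
print(compound_increase(145, 39))   # overall increase of fed-batch over batch, in %

# titer: fortified batch is +350% over batch, fed-batch is +7% over fortified batch
print(compound_increase(350, 7))
```

the combined values come out at approximately 240.6% for vcd and 381.5% for titer, in line with the reported increases of 241% and 381% relative to batch.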
the monitoring of this variable can provide information about the cell concentration, activity and metabolism, and can be used to detect changes in the culture. besides that, a relationship between alkali buffer addition and vcd can be established since the former is strongly correlated with cell growth and metabolite consumption/formation. the application of the alkali buffer addition measurement to implement an optimal feeding strategy in fed-batch makes it possible to enhance vcd and product titer compared with batch strategies. a novel approach to high throughput screening for perfusion background perfusion systems for suspended mammalian cells are attracting growing interest in the biomanufacturing industry. continuous manufacturing is growing in the field and is encouraged by health authorities [1, 2, 3] . this work addresses scale down limitations inherent to continuous media exchange and cell retention by using a semi-continuous system. data was generated with a set of different clones that were previously studied in fed-batch mode [4] . materials and methods 4 cho-k1 cell lines expressing the same monoclonal antibody (mab) and derived from the same transfection were used as models. 3.5 l bioreactors (sartorius) were used for fed-batch and perfusion production runs. the perfusion bioreactors were run using an alternating tangential flow filtration device (repligen, xcell™ atf 2 system). the cell biomass was controlled by removing cells through a bleed line, using a biocapacitance probe (hamilton, incyte). the perfusion rate (d) was fixed at one vessel volume a day (vvd^-1). the semi-continuous runs were made in 50 ml shake tubes (tpp, tubespin® bioreactor 50). once a day, the tubes were centrifuged (5 min, 200 g), the supernatant was removed (to mimic a perfusion rate of 1 vvd^-1) and replaced with fresh media, and the cells were re-suspended. the clones' growth potential was preserved across the systems (fig. 1 ). 
clone #3 always reached the highest viable cell density (vcd), followed by clone #1. clone #2 and #4 showed similar growth characteristics. it is interesting to note that in the perfusion bioreactor different patterns in terms of vcd were observed although the cell biomass signals were similar for all 4 runs. this reflects the fact that the capacitance measures the biomass and not the absolute cell count [5] . to estimate the minimum cell specific perfusion rate (cspr min ) in the semi-continuous experiment, the perfusion rate was divided by the maximum viable cell density (vcd max ). this value was compared to the cspr obtained during the 4 th set-point (sp4) of the perfusion runs. as expected, the bleed fraction decreased when the capacitance set-point was increased and went down to 5% or less of the total perfusion rate (data not shown). since the bleed removes the excess biomass, it is an indication of how close to a limitation the system is. therefore, the cspr calculated at sp4 was considered as the minimum cspr. the cspr min obtained in both systems were very close ( table 1 ). the semi-continuous system can therefore be used to identify the cspr min before running a continuous bioreactor, it therefore facilitates the decision making early in the development (to define the target cell density for a defined perfusion rate). the specific productivity (q p ) of the 4 clones was quantified at the maximum vcd (semi-continuous) or at sp4 (perfusion). absolute values are not representative since the cell environment is so different in both systems. nevertheless, a relative ranking proved to be indicative of the respective performances ( table 1 ). the maximum cell growth in fed-batch, semi-continuous and perfusion were also compared, their ranking was always preserved. both indications can be used to assess a performance ranking for different clones. 
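the estimate of the minimum cell specific perfusion rate described above (perfusion rate divided by the maximum viable cell density) can be written as a short calculation. this is a minimal sketch with hypothetical names and illustrative input values, not data from the study; the only added step is the unit conversion from ml to pl (1 ml = 10^9 pl).

```python
def cspr_pl_per_cell_day(perfusion_rate_vvd, vcd_cells_per_ml):
    """Cell specific perfusion rate in pL/cell/day.

    perfusion_rate_vvd: vessel volumes exchanged per day (mL medium per mL culture per day)
    vcd_cells_per_ml:   viable cell density in cells/mL (use the maximum VCD for CSPR min)
    """
    ml_per_cell_day = perfusion_rate_vvd / vcd_cells_per_ml
    return ml_per_cell_day * 1e9  # 1 mL = 1e9 pL


# Illustrative values: 1 vessel volume per day at a maximum VCD of 30e6 cells/mL
print(cspr_pl_per_cell_day(1.0, 30e6))  # about 33.3 pL/cell/day
```

at a fixed perfusion rate, a clone that reaches a higher maximum cell density therefore has a lower cspr min, which is why the value helps define the target cell density for a given perfusion rate.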
the performance of 4 clones was studied in 3 different cultivation systems: fed-batch/perfusion bioreactors and semi-continuous shake tube. the semi-continuous system was able to precisely predict the cspr min , an important process parameter for perfusion. specific productivity and maximum cell density ranking was preserved across the systems, therefore the scale down experiment can be used to assess a performance ranking for perfusion clone screening. modulating antibody galactosylation through cell culture medium for improved function and product quality jenny y. bang, james-kevin y. tan the production of therapeutic antibodies (abs) requires high titers and excellent product quality to ensure efficient manufacturing and potent drug efficacy. glycosylation, or the attachment of sugars to organic molecules, is a critical quality aspect that can significantly alter ab binding, function, and therapeutic effect [1] . galactose is a key sugar of interest due to its significant impact on ab function and the ability to control galactosylation through cell culture medium. herein, irvine scientific assessed the ability of media components to modulate galactose levels on a model therapeutic ab. various media compositions were able to modulate galactosylation levels without compromising cell growth and ab titers. in addition, an in vitro assay was utilized to evaluate the functional ability of abs to bind and activate complement-dependent cytotoxicity (cdc). differences in galactosylation significantly altered the abs' ability to induce cell cytotoxicity. furthermore, design of experiment analysis determined the optimal ratio of supplements to maximize galactosylation. this "optimized supplement" was verified and evaluated against other suppliers' galactosylation supplements in terms of growth, titer, glycan analysis, and ab function. 
the optimized supplement outperformed all other suppliers' supplements and resulted in the best overall cell growth, titer, galactosylation, and ab function. ab against cd20 were grown in balancd® cho growth a and were fed with balancd® cho feed 4 on days 3-7 of the cultures. viable cell density and cell viability were assessed by a beckman coulter vi-cell xr, ab titer was assessed by a pall fortébio qk e , and glycan analysis was assessed by a perkinelmer labchip gxii. for the functional cdc assay, abs were incubated with daudi b lymphoblast cells and normal human complement serum. cell cytotoxicity was assessed with a promega cytotox-glo kit. various supplements were evaluated in fed-batch cultures and resulted in 15-45% ab galactosylation without compromising cell growth and ab titers. design of experiment analysis determined an optimal composition, deemed "optimized supplement," which was evaluated against a panel of galactose-modulating supplements from other suppliers. the optimized supplement resulted in a similar viable cell density (vcd) and cell viability compared to the fed-batch culture control which had no supplements (fig. 1a) . supplements from supplier 1 resulted in similar to half the vcd of the control while supplements from supplier 2 resulted in very low vcd and percent viability. all of the supplements except those from supplier 2 resulted in ab titers similar to the control (fig. 1b) . due to the poor growth and subsequently low titer from supplier 2's supplement, supplier 2 was not further evaluated. the glycan profiles were analyzed and are presented in (fig. 1c ). all the evaluated supplements were able to raise galactosylation; however, only the optimized supplement and the 2x supplier 1 supplement resulted in over 40% galactosylation. the function of the abs was further evaluated in a cdc assay (fig. 1d) . 
abs from the optimized supplement were more effective than the control abs and had a significantly lower half-maximal effective concentration (ec50, 1.19 μg/ml) than the control (1.71 μg/ml). abs from the 2x supplier 1 supplement had a similar ec50 to the control, which may be due to the higher man5% of the abs. an optimized supplement was produced through fed-batch evaluation and design of experiment analysis. the optimized supplement outperformed all supplements from other suppliers and resulted in the best overall cell growth, glycan profile, and functional ab activity (table 1). industry practice for mammalian cell culture media and feed development typically employs a high-throughput screening (hts) platform along with large sets of experiments [1]. modern hts systems often include robotic liquid handlers to replace labor intensive steps. to align with advancements in the field, a semi-automated hts platform was developed to facilitate in-house media and feed development for early stage biologics projects. selecting appropriate instruments and integrating them into a seamless system are the keys to a hts platform. the developed hts platform uses 24 deep well plates (dwps) for culture vessels, the liquid handler of the advanced microscale bioreactor (ambr15) for media/feed formulation preparation in an aseptic environment, nyone cell imager for viability and cell growth analysis, tecan freedom evo's liquid handler for activity assay sample preparation, and cedex bioht for high-throughput metabolite analysis. 24 dwps offer comparable cell growth to shake flasks and a layout compatible with the ambr15, which makes the 24 dwp an ideal candidate for the platform. in addition, the user friendly design of experiments (doe) interface and liquid handler function of the ambr15 expedite the formulation preparation of varying doe conditions [2].
a macro program was written and developed in excel to enable the easy import of doe designs from major statistics software packages, such as jmp and simca, into ambr15. performance qualification of each component was performed prior to implementing the hts platform. comparable cell growth profiles and productivity were achieved between shake flasks and 24 dwps (fig. 1a), indicating a compatible cell culture environment for the cells. cell counts using nyone gave the same cell growth ranking as the traditional count from vi-cell xr (fig. 1b). freedom evo's liquid handler was optimized to produce activity results comparable to manual operation while expediting the sample preparation with improved consistency (fig. 1c). finally, implementing the liquid handler function of ambr15 to support media and feed formulation significantly reduced the labor for each experiment. a summary of the capability comparison between the hts platform and the traditional method is given in table 1. a case study of a complex feed screening with definitive screening design was completed using the semi-automated hts platform. this experiment, containing more than 60 feed formulations in duplicate, was handled by one operator and delivered a 40% improvement in productivity within a 4 week period (data not shown). in addition, implementing the hts platform for this study also resulted in an ~80% reduction in labor while improving the traceability of formulation preparation. a semi-automated hts platform was developed to support media and feeds screening and development for early stage biologics projects. the platform utilizes 24 dwps, nyone cell imager, ambr15, freedom evo liquid handler system, and bioht metabolic analyzer to accelerate the screening process. this screening platform not only improves process throughput, operational precision, and traceability of formulation preparation, but also reduces the labor for the media and feed formulation preparation.
background a perfusion medium requires high concentrations of specific nutrients while balancing other components to support intensified perfusion processes. using a combination of design of experiment (doe), multivariate analysis (mva), and spent media analysis, we developed a catalog "de novo" perfusion medium by working with multiple cho cell lines and proteins. the optimization of the medium in bioreactors using alternating tangential flow (atf) cell retention devices reduced the minimum cell-specific perfusion rate (cspr) from over 80 pl/cell/d to under 35 pl/cell/d for most cell lines while increasing specific productivity during 30 day steady states with stable growth rates, viability, volumetric productivity and product quality. high throughput screening (hts) was performed with seven cell lines, while four were used in bioreactors: cho-s, dg44, and two chozn® gs lines, each producing a different product, including monoclonal antibodies and a fusion protein. for hts experiments, cells were inoculated at 2.0 × 10^6 vc/ml with a 30 ml working volume in 50 ml tpp® tubes and cultured for 7 days in a multitron shaken at 200 rpm, 37°c, 80% rh, and 5% co2. for benchtop perfusion, cells were inoculated at 0.4-2.0 × 10^6 vc/ml in 3l applikon bioreactors (applikon, netherlands) with a 2l working volume. bioreactors were operated at 350 rpm, 37°c, 40% do, and a ph of 6.9 or 7.1±0.05 depending on the cell line. oxygen was supplied through an l-sparger or microsparger as needed, and excell® antifoam (milliporesigma, germany) was added at a maximum rate of 0.25% v/v to control foam. at a cell concentration of ~6.0 × 10^6 vc/ml, perfusion was initiated using the atf2 (repligen, massachusetts), with a bleed set to maintain cell concentrations at 50 or 80 × 10^6 vc/ml. two "de novo" prototype media were developed using doe and mva in hts with tpps and an ambr®15 [1], and one was chosen for further development after comparison with a basal medium enriched with feed in bioreactors.
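the cspr values quoted above can be related to a medium exchange rate with a simple unit conversion. the sketch below is illustrative only and is not taken from the abstract: it assumes the usual definition, cspr = perfusion rate / viable cell density, and the function names are hypothetical.

```python
# Minimal sketch, assuming CSPR = perfusion rate / viable cell density.
# Function names are hypothetical and not part of the original abstract.

PL_PER_ML = 1e9  # 1 mL = 1e9 pL


def perfusion_rate_vvd(cspr_pl_per_cell_day: float, vcd_cells_per_ml: float) -> float:
    """Perfusion rate in vessel volumes per day (vvd) implied by a CSPR and a cell density."""
    # pL/cell/day * cells/mL gives pL of medium per mL of culture per day.
    return cspr_pl_per_cell_day * vcd_cells_per_ml / PL_PER_ML


def cspr_pl_per_cell_day(perfusion_vvd: float, vcd_cells_per_ml: float) -> float:
    """Inverse conversion: CSPR in pL/cell/day from a perfusion rate in vvd."""
    return perfusion_vvd * PL_PER_ML / vcd_cells_per_ml


# At the 50e6 vc/mL setpoint, compare the two CSPR levels quoted in the abstract.
rate_before = perfusion_rate_vvd(80, 50e6)
rate_after = perfusion_rate_vvd(35, 50e6)

# Daily medium demand for the 2 L working volume used in the benchtop bioreactors.
litres_before = rate_before * 2.0
litres_after = rate_after * 2.0
```

under these assumptions, a 2 l culture held at 50 × 10^6 vc/ml needs 8 l of medium per day at 80 pl/cell/d and 3.5 l per day at 35 pl/cell/d.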
eleven components were identified as significant effectors of critical parameters for perfusion processes across evaluated cell lines. doe central composite experiments were run and component concentrations were optimized in the selected prototype. in parallel, amino acid specific consumption rates were calculated from bioreactor spent media samples and used to adjust the concentration of amino acids to target a reduced cspr. increasing the concentrations of specific amino acids resulted in a significant reduction of the minimum cspr across all tested cell lines; for example, the cspr of cho-s was reduced from 60 to 39 pl/cell/day (table 1). however, even at the lower cspr, spent media analysis revealed excess concentration of some amino acids, so specific accumulating amino acids were reduced and components were streamlined for the final medium: excell® advanced hd perfusion medium. using this medium, a cho-s and a chozn® gs cell line producing a fusion protein were cultured at a cspr of less than 40 pl/cell/day with a vcd of 50 × 10^6 vc/ml. metabolic profile, productivity, and product quality were constant over the 30 day steady state. the chozn® gs cell line was also tested at 80 × 10^6 vc/ml with a cspr of 33 pl/cell/day (fig. 1). we have developed a catalog perfusion medium from first principles, ensuring broadness of application by using seven cell lines in scaled-down systems and four in perfusion bioreactors. the final catalog medium showed significant improvements in productivity across all cell lines, with reduced csprs when compared to enriched fed-batch medium or initial prototypes (table 1). there is a rising demand for accelerated process development, increased efficiency and economics for biopharmaceutical production processes. furthermore, increased process understanding has evolved from the process analytical technology (pat) initiative and the quality by design (qbd) methodology.
in contrast to one-factor-at-a-time methods, statistical design of experiment (doe) methods are widely used to develop biopharmaceutical processes. even if high-throughput systems can handle these numbers of experiments in parallel, the heuristic restriction of boundaries and the high number of factors result in stepwise iterations with multiple runs. therefore, the combination of model-based simulations with doe methods (mdoe) for the development of sophisticated cell culture processes is a novel tool for process development [1]. it is used to reduce the number of experiments during doe and the time needed for the development of more knowledge-based cell culture processes. this concept was applied to the optimization of the initial glutamine and glucose concentrations of a cho batch process. a mechanistic model was adapted and modified from [2] and used to describe the dynamics of cell metabolism and antibody production of an il-8 antibody producing cho cell line (see abbreviation of fig. 1 for cultivation details). experiments were simulated and compared to a fully experimental doe. as can be seen from table 1, user defined constraints were chosen to get a stable and reproducible process, with the aim of maximizing the cell density while decreasing lactate and ammonia production. at first, the experimental space was estimated by simulating the responses for broad concentration ranges and calculating the multiple response desirability function (fig. 1a). this results in a small area (turquoise) suggested as experimental space. experiments were planned within these boundaries and responses were either simulated (fig. 1b, 4 cultivations for fitting the model) or compared with the purely experimental responses (fig. 1c, 16 cultivations). optimal concentrations for glutamine and glucose with respect to the constraints are in the lower right corner and similar for both methods (red frame, fig. 1).
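the abstract does not state which form of the multiple response desirability function was used. as an illustration only, the sketch below assumes the common derringer-suich construction: each response is mapped linearly onto a 0 to 1 desirability and the overall score is the geometric mean. all limits and response values are hypothetical and are not the constraints of table 1.

```python
import math

# Minimal sketch of a multiple-response desirability score, assuming the
# Derringer-Suich form with linear ramps. All limits and response values
# below are hypothetical and are not taken from the abstract.


def desirability_maximize(y: float, low: float, high: float) -> float:
    """0 at or below `low`, 1 at or above `high`, linear in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def desirability_minimize(y: float, low: float, high: float) -> float:
    """1 at or below `low`, 0 at or above `high`, linear in between."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return (high - y) / (high - low)


def overall_desirability(values: list[float]) -> float:
    """Geometric mean; any fully undesirable response gives an overall score of 0."""
    if any(v <= 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical simulated responses for one glutamine/glucose combination.
d_cells = desirability_maximize(8.0, low=4.0, high=10.0)    # peak cell density
d_lactate = desirability_minimize(2.0, low=1.0, high=3.0)   # lactate
d_ammonia = desirability_minimize(3.0, low=2.0, high=6.0)   # ammonia
overall = overall_desirability([d_cells, d_lactate, d_ammonia])
```

evaluating such a score on a grid of simulated conditions and keeping the region with high overall desirability is one way to arrive at a restricted experimental space of the kind described above.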
compared with the fully experimental design, mdoe results in a reduction of 75 % in the number of experiments (4 experiments for modelling vs. 16 experiments in experimental doe). the method is intended to optimize cultivation strategies for mammalian cell lines and to evaluate them before experiments have to be performed at laboratory scale. this results in a significant time and cost reduction during process development and process establishment. the strategy is especially intended for use in multi-single-use-devices to speed up process development. [fig. 1 caption, perfusion medium study] at a target cell density of 50 × 10^6 vc/ml, volumetric productivity was stable for a 30 day steady state with excell® advanced hd perfusion medium. shorter steady states were tested at 80 × 10^6 vc/ml. background for the large-scale production of therapeutic glycoproteins, fed-batch culture has been widely used for its operational simplicity and high titer. however, repeated feeding of medium concentrates and/or addition of a base to maintain optimal ph during fed-batch culture leads to an increase in osmolality. the hyperosmolality affects glycosylation in a protein-specific manner. however, the mechanism behind such osmolality-dependent variations in glycosylation in recombinant chinese hamster ovary (rcho) cells remains unclear. in this study, to better understand the effect of hyperosmolality on the glycosylation of a protein produced from rcho cells, we investigated the expression of 52 n-glycosylation-related genes and the n-linked glycan structure in fc-fusion protein-producing rcho cells exposed to hyperosmotic conditions. furthermore, to validate the effect of hyperosmolality on protein glycosylation, we performed hyperosmotic culture supplemented with betaine, an osmoprotectant, and then analyzed the n-linked glycan structure and mrna levels of n-glycan branching/antennary genes.
after three days of hyperosmotic culture, nine genes (ugp, slc35a3, slc35d2, gcs1, manea, mgat2, mgat5b, b4galt3, and b4galt4) were differentially expressed over 1.5-fold of the control, and all these genes were down-regulated. n-linked glycan analysis by anion exchange and hydrophilic interaction hplc showed that the proportion of highly sialylated (di-, tri-, tetra-) and tetra-antennary n-linked glycans was significantly decreased upon hyperosmotic culture. addition of betaine, an osmoprotectant, to the hyperosmotic culture significantly increased the proportion of highly sialylated and tetra-antennary n-linked glycans (p ≤ 0.05), while it increased the expression of the n-glycan branching/antennary genes (mgat2 and mgat4b). thus, decreased expression of the genes with roles in the n-glycan biosynthesis pathway correlated with reduced sialic acid content of fc-fusion protein caused by hyperosmolar conditions. conclusions taken together, the results obtained in this study provide a better understanding of the detrimental effects of hyperosmolality on n-glycosylation, especially sialylation, in rcho cells. the identified genes, particularly mgat2 and mgat4b, are potential targets for engineering in cho cells to overcome the impact of hyperosmolality on glycoprotein sialylation. disruptive cost-effective antibody manufacturing platform based on cutting-edge purification process v. medvedev, m. duyck, t. albano, j. castillo univercells sa, gosselies, belgium correspondence: v. medvedev (v.medvedev@univercells.com) bmc proceedings 2018, 12(suppl 1):p-093 background demand for high-quality monoclonal antibodies is growing exponentially, calling for new production capacities. overcoming current limitations of conventional manufacturing strategies, namely the high capital investment and production cost, can only be achieved through innovative process designs based on the latest technologies. 
this study presents a process design combining fed-batch technology with continuous multi-column capture. an advanced cell culture clarification method was introduced to simplify downstream operations and increase overall cost-effectiveness of the process, for an optimized production of recombinant proteins. this study was performed with cho cells expressing a monoclonal antibody targeted against the coronavirus responsible for the middle east respiratory syndrome (mers), developed by organic vaccines tm and the nih, kindly provided to univercells. upstream process: - fed-batch, 12 days culture at 10l scale with cd-cho chemically defined media and feeds. harvest treatment: - precipitation of impurities in the production bioreactor using organic compounds (<1% v/v) and flocculation by electropositive organics (<0.1% w/v). - acidic ph and physiological conductivity. upstream processing and harvest treatment: the culture reached 0.5 g/l (8 × 10^6 cells/ml; 90% viability), and harvest treatment was found to be very effective in terms of impurities clearance. capture: capture strategies were evaluated from the point of view of simplification of downstream operations, with hcp impurities content monitored as a key performance indicator. - protein a affinity chromatography: advanced harvest clarification enabled major improvements in affinity capture, in terms of eluate purity and reduction of host cell impurities (<35 ppm in all conditions tested; fig. 1). - cation exchange chromatography: cex allows higher capacities (>100 g/l) than protein a, whilst being more affordable (2- to 6-fold cheaper). low residual hcp (<500 ppm) was observed with all cex resins tested.
without harvest treatment and clarification preceding the capture studies (either affinity or cex), results showed a lower binding capacity of the resin, a higher content of hcp in the eluate (up to 2000 ppm), a higher content of hmw species in the elution fraction (up to 3-fold higher) and a significant turbidity of the neutralized eluate. - continuous multicolumn chromatography: further options to increase cost efficiency include using a continuous multicolumn setup (table 1). two models were assessed based on two different static binding capacities (sbc), demonstrating that 4 to 6 columns of 100 ml were able to process a 200 l production in less than 24 h. this method provides a great opportunity for designing simplified and low footprint mabs dsp processes, while maintaining a similar or achieving a superior quality profile compared to standard approaches: - harvest treatments followed by depth filtration proved to be a cost-efficient way to obtain pretreated feed and minimize the burden on downstream operations. - protein a resins benefited from the removal of key contaminants during harvest treatment, while cex proved to be a competitive capture strategy. - switching from batch to continuous multicolumn mode allowed a complete batch to be processed in less than 24 hours, requiring lower media and resin volumes. followed by a single polishing step, such a process set-up strongly supports the reduction of operations required to deliver a high-quality product. analyses of product quality of complex polymeric igm produced by cho cells background immunoglobulin m (igm) antibodies are secreted by b cells as the first defense against invading pathogens during primary immune response. some igm antibodies have already gained orphan drug status, which shows their unique capability in therapy of rare diseases. potential fields for applications are discovered with increasing knowledge about these molecules. it seems that the most active forms are pentameric and hexameric igms.
unfortunately, recombinant production of igms is rather difficult, as the demands of secretion and correct polymer formation result in low expression yields and mixtures of polymers. we established stably producing chinese hamster ovary (cho) dg44 cell lines to analyze cellular and extracellular factors that influence quantity and quality of the produced recombinant polymeric igm in future studies [1]. one quality parameter is polymer distribution, which can be measured directly in cell culture supernatant using densitometric analyses [2]. additionally, we developed a very efficient single-step affinity purification strategy using the poros captureselect igm affinity matrix to analyze pure igms. for more precise measurements of the igm isoform distribution we separated the purified polymers by high performance liquid size exclusion chromatography (sec-hplc). our cho dg44 cell lines grow to peak cell concentrations of 4.5 × 10^6 cells/ml in erlenmeyer flasks and 4.0 × 10^6 cells/ml in bioreactors. similar productivity of approximately 50 mg/l was observed for cells cultivated in both cultivation vessels in a non-optimized batch culture using chemically defined media. analysing how cultivation conditions affect the fraction of polymers may offer clues about the assembly of polymers and the challenges of igm production. we quantified polymeric distribution of igm directly in the supernatant using a densitometric method [2]. cultivated under standard conditions (37°c, ph 7) igm012 is produced as 90% pentamers, whereas igm012_gl only consists of approximately 80% pentamers. the purified igm012_gl was analysed with sec-hplc and contained 81 % pentamer and 19 % dimer, which is comparable to the results achieved with densitometry. the purification of the igm antibodies was quite challenging as the manufacturer recommends acidic elution, which led to aggregation and inefficient elution of our model igms.
therefore, we screened for different elution buffers that prevent denaturation and aggregation. by combining high salt concentrations with moderate ph reduction we optimized elution conditions to 88-99% igm recovery, which corresponded to a five to six fold improvement compared to the manufacturers' conditions. sds-page analysis and sec-hplc showed that our elution strategy resulted in a very pure product after a single chromatographic step. the purification strategy was verified with the igm103, igm104 and igm617. our model igms were produced in a ratio of approximately 4:1 pentameric to dimeric igm, measured concordantly with both analytical methods. process development on igm purification using the poros capture select human igm affinity matrix enabled the recovery of highly pure fractions. through optimization, by combining mild ph and high salt concentrations, the relatively low elution yields were increased by a factor of 5-6. applying densitometry and sec-hplc we will investigate how culture conditions influence polymer formation in future. currently, no small scale (<0.5l) cell culture system is commercially available for high cell density perfusion cultivations to use in high throughput screening studies. to increase throughput for process characterization activities at janssen vaccines and prevention, a shaker flask-based scale down model was developed. though, the control possibilities of shaker flask cultures are technically very limited and different compared to a bioreactor controlled process. in addition, the sensitivity of the shaker flask model should allow the detection of the effects of process parameters on critical quality attributes (cqas) of the vaccine produced at large scale. iterative experiments were performed in shake flasks to evaluate the influence of cultivation parameters such as shaking speed, working volume, co2% in the incubator and daily base additions on cultivation parameters (as cell growth, ph and do). 
in addition, a medium exchange was tested to mimic the perfusion mode used in the bioreactor process. the presens shake flask reader was implemented to allow for ph and do monitoring. the conditions for which the performance as reflected in specific virus titer showed the best fit were selected. at these conditions, a series of parallel shaker flask infections were conducted to demonstrate statistical equivalence of performance parameter and cqas (as cell specific iu titer and vp/iu ratio) between the production scale and reduced scale processes and thus to qualify the shake flask as a scale-down model. a daily medium exchange by centrifugation was implemented and cultivation parameters for shake flasks were identified. based on performance parameter (cell specific vp titer) and the cqas of the vaccine (cell specific iu titer and vp/iu ratio), equivalence between the production-scale and scale down systems was confirmed. the scale down model data fall into the 95% prediction intervals calculated on manufacturing data whereas scale down model data from batch mode experiments (using non optimized cultivation conditions) do not. the shaker flask as a scale down model for the 10l bioreactor perfusion process was qualified. this model is a tool to screen a subset of process parameters at a higher throughput, thereby reducing process characterization timelines. background until today, the market for therapeutic proteins, especially monoclonal antibodies, is gaining more and more importance in the pharmaceutical field. to meet the increasing demand for these products, the industry made tremendous efforts to generate highly efficient production systems. one of the pharmaceutical industry's research focuses is the improvement of the secretion process in eukaryotic cells. 
in mammalian cells, the efficiency of protein transportation strongly depends on the translocation of a nascent protein into the er, which is mostly conducted by the signal peptide (sp) coupled to the n-terminus. through the interchangeability of signal peptides between products and even species, a large variety can be used to enhance protein expression in already existing production systems. materials and methods at first, the influence of four different natural sps (sp(7), sp(8), sp(9) and sp(10)) on the secreted amount of an igg4 model antibody (product a) was compared in fed batches using a cho dg44 host cell line. in the second part, one promising sp candidate showing improved secretion (sp(9)) was identified and the influence of this sp on four additional antibody products, which varied in their expressability from good to moderate or poor, was investigated. in both approaches, the standard sp was implemented for comparative reasons. in the first approach, four signal peptides sp(7), sp(8), sp(9) and sp(10) were screened for their potential to improve the product secretion of cho dg44 cells expressing a model antibody (product a). the results revealed a 2.4-fold increase in average final fed-batch antibody titer of sp(9) when compared to the standard sp approach (standard sp = 0.44 g/l; sp(9) = 1.50 g/l). in the second approach, the enhancing capacity of sp(9) on secretion of four other igg products (named product b to e, table 1) was further evaluated. an improved performance was observed for all products when comparing sp(9) and the standard sp in a fed batch process (fig. 1), with an increase in average final fed-batch titers ranging from 28 to 354 % and up to 290 % in cell-specific productivities.
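the titer figures above mix two conventions, a fold value and percent increases. the short sketch below, using only the two product a titers reported in the abstract, shows how a titer ratio and a percent increase relate; the function names are hypothetical.

```python
# Minimal sketch relating a titer ratio to a percent increase.
# Only the two product A titers come from the abstract; names are hypothetical.


def titer_ratio(new_titer: float, reference_titer: float) -> float:
    """Ratio of the new titer to the reference titer."""
    return new_titer / reference_titer


def percent_increase(new_titer: float, reference_titer: float) -> float:
    """Increase over the reference, expressed as a percentage of the reference."""
    return (new_titer - reference_titer) / reference_titer * 100.0


standard_sp = 0.44  # g/L, standard signal peptide (product A)
sp9 = 1.50          # g/L, sp(9) (product A)

ratio = titer_ratio(sp9, standard_sp)          # about 3.4
increase = percent_increase(sp9, standard_sp)  # about 241 %
```

with these titers the ratio is about 3.4 and the increase over the standard sp is about 241 %, so the reported 2.4-fold figure appears to be consistent with reading it as the relative increase, in line with the percent increases quoted for products b to e.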
taken together, with a positive influence on the final concentrations of all tested products, the results obtained with sp(9) contribute to -signal peptide sp(9) was identified as a promising candidate with an average 2.4-fold titer increase during screening of four signal peptides. -sp(9) was able to improve production titers up to 354 % compared to standard sp. -sp(9) was able to improve cell-specific productivities up to 290 % compared to standard sp. -future usage of sp (9) contributes to the further optimization of sartorius stedim cellca's standard cell line development process. new platform for the integrated analysis of bioreactor online and offline data lukasz gricman 1 , milan ganguly 2 , amanda fitzgerald 3 , hans peter more and more experiments are used to assess bioreactor suitability and stability of clones, to evaluate media composition and other process parameters, and to start upscaling campaigns. this has resulted in a major bottleneck due to the increase in data capturing, processing, aggregation, visualization, and statistical analysis. in addition, the association of the data with the experimental context (e.g., fermentation protocols, media recipes, bioreactor control parameters) is not easily accomplished in high throughput. the data generated in the process must not only be analyzed, but also managed and stored to enable easy tracking and relating to historical records. furthermore, the processes are often developed by global teams interacting in complex enterprise it ecosystems. therefore, new and high performing systems for data capture, processing, and analysis need to be integrated in order to enable storage and correlation of experimental context information and various types of time course analytics data. we have developed genedata bioprocess™, a new enterprise platform for bioprocess development. 
the platform enables automatic capture and visualization of all online and offline data (e.g., ph, o2, metabolic data), auto-calculations and aggregations (e.g., ivcd, qp, consumption rates) and multi-parametric assessment of any type of time-series bioreactor data in the context of experimental protocol data (e.g., process parameters, feeds). genedata bioprocess comes with dedicated interfaces for integrating with relevant laboratory instruments, control systems, statistical analysis software packages and custom enterprise solutions. it enables the modeling and tracking of complex nonlinear workflows and supports decision making in bioprocess development. the data can be analyzed in the context of upstream process development, and also be correlated to other unit operations. automation support assists the ever increasing throughput of bioprocess development operations, and the analysis of experimental data and process parameters across unit operations or even different projects. this overall integration enhances process development workflows. highlighted use cases describe the selection of the best producer clones (fig. 1a) , the identification of optimized media feeding strategies (fig. 1b) , and the comparison of clone performance across different fermentation scales (fig. 1c) . a special focus is on the analysis of data from micro-and bench-top bioreactors (such as the ambr15™ and dasgip™ systems) operated in parallel. these bioreactors allow for increased throughput of clone selection and process optimization studies, which in turn leads to an increase in data generation. genedata bioprocess supports integration with such systems and enables a comparison of data regardless of the instrument provider or scale. automated bioreactor data analysis allows development groups to take advantage of even richer datasets and, as data management is built-in to the system, the data can be easily tracked and associated to historical records. 
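as an illustration of the kind of derived quantities mentioned above (ivcd, qp), the sketch below computes the integral of viable cell density by the trapezoidal rule and an average specific productivity from it. it is a generic calculation with hypothetical data and function names, not a description of how genedata bioprocess implements these aggregations.

```python
# Minimal sketch of two common derived bioprocess quantities.
# Generic textbook calculations with hypothetical data; not the platform's implementation.


def ivcd(times_d: list[float], vcd_e6_per_ml: list[float]) -> float:
    """Integral of viable cell density (1e6 cells * day / mL), trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times_d)):
        dt = times_d[i] - times_d[i - 1]
        total += 0.5 * (vcd_e6_per_ml[i] + vcd_e6_per_ml[i - 1]) * dt
    return total


def specific_productivity(titer_start_mg_l: float, titer_end_mg_l: float, ivcd_value: float) -> float:
    """Average qP in pg/cell/day.

    With titer in mg/L (= 1e6 pg/mL) and IVCD in 1e6 cells*day/mL,
    the factors of 1e6 cancel and the quotient is directly in pg/cell/day.
    """
    return (titer_end_mg_l - titer_start_mg_l) / ivcd_value


# Hypothetical time course: days 0, 1, 2 with VCD of 1, 2, 4 (1e6 cells/mL).
days = [0.0, 1.0, 2.0]
vcd = [1.0, 2.0, 4.0]
ivcd_value = ivcd(days, vcd)                        # 1.5 + 3.0 = 4.5
qp = specific_productivity(0.0, 90.0, ivcd_value)   # 90 / 4.5 = 20 pg/cell/day
```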
another focus is on cross-reactor scale comparisons. data coming from different bioreactor scales can be easily imported into the platform and analyzed to establish the best conditions for upscaling. genedata bioprocess enables the correlation of process parameters (e.g., fermentation protocols, media recipes, bioreactor control parameters), with key performance indicators of the processes (e.g., titer, qp) and the product quality attributes (e.g., aggregation, glycosylation profiles). finally, bioreactor time course data can be tracked together with clone analytics and product quality parameters, which makes the platform uniquely able to support end-to-end biopharma development. upstream bioprocesses are at particular risk of contamination from adventitious agents. the typical 0.1 μm filters used at this step protect bioreactors from bacteria and mycoplasma but offer no protection from viral contaminations. a new polyethersulfone (pes) upstream virus filter, viresolve® barrier, has demonstrated high levels of microorganism retention -full retention for bacteria and mycoplasma (>8.0 lrv -log reduction value) and~5 lrv for small viruses, such as parvoviruses. it also has improved flow and capacity as compared to virus removal filters designed for monoclonal antibody purification. given the small pore size of virus retentive filters, implementing a virus filter upstream of the bioreactor raises the question of whether critical cell culture media components are removed. therefore, it is important to evaluate the cell culture performance and protein quality attributes using virus-filtered media to ensure that filtration does not negatively impact the process. materials and methods ex-cell® cho media and corresponding feeds were processed through either viresolve® barrier filters or 0.22 μm filters (control). 
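the retention figures quoted for the filter are log reduction values. a minimal sketch of the standard definition, lrv = log10(feed titer / filtrate titer), is given below; the titers used are hypothetical and do not come from the abstract.

```python
import math

# Minimal sketch of the log reduction value (LRV) used to report filter retention.
# The challenge titers below are hypothetical illustration values.


def log_reduction_value(feed_titer: float, filtrate_titer: float) -> float:
    """LRV = log10(feed titer / filtrate titer); both titers in the same units."""
    return math.log10(feed_titer / filtrate_titer)


def fraction_passing(lrv: float) -> float:
    """Fraction of the challenge organism that passes the filter at a given LRV."""
    return 10.0 ** (-lrv)


# Hypothetical virus challenge: 1e8 infectious units/mL in the feed, 1e3 in the filtrate.
lrv_virus = log_reduction_value(1e8, 1e3)  # 5 logs
passing = fraction_passing(lrv_virus)       # 1 in 100,000
```

read this way, an lrv of about 5 for small viruses means that roughly one particle in 100,000 passes the filter.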
media composition post-filtration was evaluated by high performance liquid chromatography (hplc), inductively coupled plasma/ optical emission spectrometry (icp-oes), and nuclear magnetic resonance (nmr). recombinant cho cells were cultured in fed batch culture. cell density and viability were measured by vi-cell tm cell viability analyzer while metabolites were analyzed by bioprofile® flex analyzer. shake flasks and bioreactors were utilized to verify that surfactants, such as poloxamer, (which are essential for shear protection in stirred tank bioreactors and can be difficult to filter) have not been removed during filtration. monoclonal antibody titer was quantitated by protein a hplc. characterization of the antibody product quality was assessed via weak cation-exchange chromatography (charge heterogeneity), size exclusion chromatography (aggregate profile), and 2-ab fluorescent labeling with np-uplc (glycan species). media and feed compositions were unaffected by filtration through the virus barrier filter. no significant differences in concentrations were observed with icp-oes (trace metals) or hplc (amino acids and water soluble vitamins). nmr showed no change in the organic composition of the media including poloxamer. the aromatic region with vitamin and amino acid signals is shown (fig. 1a) . cell cultures showed no differences in cell growth or titer, in either shake flasks or bioreactors (fig. 1b) . cell viability was unaffected, metabolite levels were within limits, and titer was consistent. the protein quality of the secreted antibodies showed no differences in the glycosylation pattern (fig. 1c) , amount of aggregates or charge variants. the risk of virus contamination in the bioreactor remains a concern for biotherapeutic manufacturers as there is no universal technology that provides a reliable, cost effective solution for virus removal that can be applied to all components of cell culture media. 
this study evaluated the viresolve® barrier filter, which provides an efficient and easy way to protect bioprocesses from adventitious virus contamination. study results demonstrated that media and feed compositions, cell culture performance, and product quality were unaffected by filtration through the viresolve® barrier filter. implementation of viresolve® barrier filters provides efficient filtration performance, high virus retention, and minimal cell culture impact and offers a viable option to improve the overall virus risk mitigation strategy for the manufacture of biotherapeutics.
figure caption (panels b and c, genedata bioprocess): b tracking of process conditions together with online and offline performance analytics. the system allows tracked parameters to be defined flexibly and optimal process conditions to be selected. c comparison of process performance across different reactor scales. the open architecture makes genedata bioprocess a provider-agnostic system that can aggregate and compare data regardless of provider.
background bi- and multi-specific antibodies, antibody-cytokine fusion proteins, non-immunoglobulin scaffolds, chimeric antigen receptors (cars), engineered t-cell receptors (tcrs) and tcr-based bispecific constructs can provide significant advantages for use in cancer immunotherapy. however, as highly engineered molecules they pose new challenges in design, engineering, cloning, expression, purification, and analytics. we have thus implemented an infrastructure that addresses these challenges and enables the industrialization of these various novel therapeutic platforms. in close collaboration with leading biopharmaceutical companies, we implemented a workflow, data management and analysis support system, genedata biologics™, enabling the automated design, screening, and expression of large panels of therapeutic candidates using these novel technologies. we have also built tools for developability and manufacturability assessments of these complex molecules.
we have ensured that there is a seamless integration of all data generated and that functionalities such as bulk protein and vector generation using our in silico cloning engine, configurable library of template vectors and cloning strategies, fully annotated in silico protein molecules and dna constructs, and dna synthesis verification support, can be used for the newest protein formats and molecule topologies. we implemented data structures and data handling systems, which mirror how these complex next-generation biologics molecules and cell lines are being designed, screened, and analyzed. the result successfully addresses workflows for tcr optimization and engineering. we exemplified this with the generation and evaluation of a panel of engineered tcrs with an alpha chain cdr3 randomization and successfully supported the analysis and selection of beneficial mutations. the system also successfully supported workflows for the design and generation of a panel of tcr-based bispecifics (tcr coupled with anti-cd3) using automated molecule registration and in silico cloning tools and subsequent capture of expression, purification, and functional and analytical characterization data. on the car-t cell front, the system is able to provide traceability of the work from antibody generation, optimization, car engineering (e.g., attachment to the scfv with cd3-zeta and co-stimulatory domains to mimic the natural tcr complex) to the engineering of the t-cell. the genedata biologics platform successfully enabled automation, increased data integrity and traceability during research and development work, and will contribute towards the industrialization of these very exciting novel approaches for cancer immunotherapy. optimal selection of therapeutic antibodies and production cell lines by assessment of critical quality attributes and developability background the increasing cost of bringing a new drug to the market has put significant pressure on biopharma organizations. 
to increase efficiency in r&d processes and reduce costs, organizations need to evaluate potential drug candidates earlier in the r&d process, eliminate those with undesirable characteristics, and focus on the most promising candidates. after designing and thorough testing of successful candidates, efficient production of new biological entities in mammalian cell lines is necessary. the main goal here is to find a suitable cell line and optimal upstream and downstream processing conditions that not only lead to a satisfactory product yield, but also to a product with the desired biochemical properties. the evaluation of production cell lines, processes, and product quality attributes is performed earlier and in higher throughput for an increasing number of drug candidates. in addition, new methods in molecular and cell biology (e.g., novel genome engineering approaches such as crispr/cas9), in analytics [e.g., process analytical techniques (pats)], in process miniaturization, and in automation promise to make process development more efficient. however, the management and analysis of the increasing amount of experimental data during candidate selection and cell line and process development has become a bottleneck. in addition, quality-compromising steps in biopharma organizations can negatively impact the cost of goods and substantially prolong the drug candidate's time to market. therefore, systems for integrated management and analysis of wellstructured and curated data that comprehensively integrate molecule and sample information, manufacturing process parameters, and process and product quality attributes are needed. critical quality attribute (cqa) assessment should be enabled along the whole bioprocess development workflow, including cell line development, upstream and downstream process development, as well as analytical and formulation development. 
we have developed a comprehensive platform, genedata bioprocess™, which supports drug candidate developability and manufacturability assessment and bioprocess development. the platform captures and structures the cell line and process parameters together with analytical data for cell lines, processes, and protein products. the protein analytical data being tracked include biological data (such as bioactivity and immunogenicity) and physicochemical properties. these properties include glycosylation, chemical liabilities (such as deamidation and oxidation), aggregation, stability under different conditions (low ph, low and high temperature), solubility, and impurities. genedata bioprocess™ simplifies and streamlines laborious manual processes and provides tools for molecule, clone and process selection. furthermore, the platform allows for seamless integration with laboratory instruments, statistical software packages, and custom solutions. here, we present use cases showing how to identify and annotate liability sites prone to chemical modifications (fig. 1a) and how to monitor cqas of molecules so that developability can be assessed more efficiently. we show how the analytical data generated in the course of a developability assessment are compiled to select the best drug candidate (fig. 1b). implemented traffic-light systems indicate where molecules harbor issues, as in the case of the antibody tpp-86, which is compromised by low temperature and repeated freeze-thaw operations. the same assessment views can also be applied to batches and cell lines. the underlying data can be visualized graphically. as an example, we show glycan types of products obtained from different cell line clones generated in a cell line development campaign for the molecule tpp-86 (fig. 1c). even though the selected clone cli-35 meets the glycosylation criteria (e.g., <13% afucosylation, <40% galactosylation, <2% sialylation), the produced molecule harbors some stability issues as mentioned above. therefore, more attempts would be needed either in formulation or in reengineering of the complementarity determining regions (cdrs) in order to provide a developable tpp-86-like drug candidate.
figure caption (genedata biologics™): next-generation biologics molecules are composed of a number of specific subdomains. each type of molecule is composed of a specific set of domains, which must be mirrored in the registration and further research and development workflow. molecule registration and hit-selection using data from a number of assays is shown here using the example of car-t cells. the image is a screenshot from the genedata biologics™ software.
background environmental process variables are often used as tools to optimize the performance of mammalian cell cultures to achieve higher cell densities and high specific productivities of r-proteins (qp). the manipulation of culture temperature in the range of mild hypothermia (mh) (35-30°c) [1, 2], as well as different glucose availability scenarios [3, 4], has been shown to improve productivity in different cell lines. however, the manipulation of these variables individually or together has a concomitant effect on the rate at which cells grow, masking the net response exhibited by the cells. in order to identify the effects of these variables, we have taken advantage of chemostat culture. chemostat cultures were performed at two dilution rates (d; 0.010 or 0.018 h^-1), two temperatures (33 or 37°c) and three feed glucose concentrations (20, 30 or 40 mm). the response was analysed considering r-protein production, cell growth and key metabolites.
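The value of the chemostat design described here is that, at steady state, the specific growth rate is fixed by the dilution rate, so specific rates follow directly from measured concentrations. The sketch below uses the standard steady-state chemostat mass balances and neglects cell death; it is an illustration, not the authors' calculation. The function name, argument names, units and example values are all assumptions.

```python
def chemostat_specific_rates(d, x, s_in, s, p):
    """Steady-state chemostat balances (cell death neglected).

    d    : dilution rate, 1/h
    x    : viable cell density, 10^9 cells/L (equal to 10^6 cells/mL)
    s_in : substrate concentration in the feed, mmol/L
    s    : residual substrate concentration in the vessel, mmol/L
    p    : product concentration in the vessel, mg/L
    """
    if x <= 0:
        raise ValueError("cell density must be positive")
    mu = d                      # growth balances washout at steady state
    q_s = d * (s_in - s) / x    # mmol per 10^9 cells per h
    q_p = d * p / x             # mg per 10^9 cells per h
    return mu, q_s, q_p


# Hypothetical steady state, not data from the abstract.
mu, q_glc, q_p = chemostat_specific_rates(d=0.018, x=1.5, s_in=20.0, s=5.0, p=10.0)
```

Because the growth rate is set by the dilution rate, changes in the specific productivity between conditions can be attributed to temperature or glucose rather than to a shift in growth rate.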
r-tpa protein concentration was determined by immunoassay (trinilize tpa kit); cells were counted using a hemocytometer and cell viability was determined by the trypan blue exclusion method (t8154, sigma, usa); glucose, lactate and glutamate were determined by enzymatic assay using a ysi biochemical analyser (yellow spring instruments). statistical analysis of the results was performed by anova (design-expert 7 for windows). a decrease in cell density was observed in response to an increase in feed glucose concentration, regardless of the temperature or specific growth rate (in this case μ = d) evaluated. the maximum cell densities were reached at 20 mm, achieving 1.65 and 1.50 × 10^6 cells/ml at 37/33°c and 0.018 h^-1, and 1.10 and 1.33 × 10^6 cells/ml at 37/33°c and 0.010 h^-1, respectively (fig. 1a). the increase in glucose concentration from 20 to 40 mm resulted in a qp increase of 3 and 3.3 fold at 33°c/0.018 h^-1 and 37°c/0.018 h^-1, respectively. a lower increase of 2.4 and 1.8 fold was reached at 33°c/0.010 h^-1 and 37°c/0.010 h^-1, respectively (fig. 1b). the highest qp values were reached at 37°c and 0.010 h^-1. however, a positive effect of mh was not observed, in contrast to that observed in batch culture [1, 2, 3]. this behaviour suggests that low μ is a main factor in the increased r-protein production of batch cultures exposed to mh conditions. the specific consumption rate of glucose was significantly increased by the glucose increase from 20 to 40 mm and reduced by mh (fig. 1c). at 0.010 h^-1 the specific production rate of lactate (qlac) was increased by the glucose increase, independent of the culture temperature used. at 33°c/0.018 h^-1, by contrast, qlac decreased with increasing glucose concentration, and at 37°c/0.018 h^-1 a maximum consumption was observed at 30 mm glucose (fig. 1d). the lactate-glucose yield (fig. 1e) did not show relevant changes at 0.010 h^-1, while at 0.018 h^-1 this yield showed a more efficient utilization of glucose as glucose concentration was increased. however, this last behaviour was not reflected in an increase of r-protein production. the concentration of glucose has the greatest impact on the behaviour of the culture, and its increase positively affects protein productivity. mh did not improve the protein productivity of cho cells producing tpa under the different conditions evaluated; a low dilution rate and a high glucose concentration positively affected protein productivity and the metabolism exhibited by the cells.
background mammalian cell cultures are the most commonly used bioprocess for the production of therapeutic recombinant proteins such as monoclonal antibodies (mabs). facing the increasing demand for these biopharmaceuticals, the fda has initiated the process analytical technology (pat) framework in order to encourage pharmaceutical industries to use innovative technologies to monitor the critical process parameters (cpps) in real time and to ensure the final product quality [1]. one of the most important cpps for cell culture bioprocesses is the specific growth rate (μ), which is a direct indicator of cellular physiological state. indeed, μ is sensitive to culture conditions and its value decreases when cells are in an environment unfavourable for growth [2], which may greatly influence mab production and quality. however, to date, the online monitoring of μ remains a great challenge for mammalian cell culture bioprocesses. igg-producing cho cells were cultured in 2 l stirred bioreactors equipped with an in situ dielectric spectroscopy probe (hamilton). operating conditions were fixed at 90 rpm, 50% of air saturation, ph 7.2 and 37°c. the permittivity of the cell culture was measured every 12 min, which allowed the vcd to be calculated in real time by using a previously established linear correlation.
then, a model for the online estimation of μ was developed based on vcd prediction and cell mass balance equations. several signal noise filters and various calculation methods were evaluated to improve model stability. cell cultures were performed in both batch and feed-harvest modes. feed-harvest cultures consisted of sequential renewals of 2/3 of the culture medium volume following different strategies. this study proposed an innovative methodology based on dielectric spectroscopy to monitor the cellular physiological state in real time by estimating the specific growth rate (μ) of cells online. the model for online estimation of μ was developed from cultures in batch mode and was validated by comparing the online estimated μ with the experimental values calculated at the end of the culture. with this model, the moment when μ started to decrease significantly, which indicated that cells were no longer in the exponential growth phase, was identified as the critical moment. to demonstrate the value of online estimation of μ, the developed model was applied to a feed-harvest culture in which the medium renewals were performed at the critical moments indicated by the model. this culture was then compared with a traditional feed-harvest culture in which medium renewals were performed following offline measurements of glucose and glutamine. we found that the online strategy maintained the value of μ by renewing the medium at the right time, whereas μ varied considerably with the offline strategy. moreover, by using the online estimation of μ, the glycosylation of igg was kept at a high level (about 95%) throughout the whole culture. however, for the culture using the offline strategy, the glycosylation level decreased progressively and was only about 75% at the end of the culture (fig. 1). a model for the online estimation of μ was developed using dielectric spectroscopy, which made it possible to monitor the physiological state of cells in cell culture bioprocesses.
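As a rough illustration of how μ can be derived from an online vcd signal, the sketch below applies the batch cell mass balance dX/dt = μX, so that μ is the slope of ln(X) over time, after smoothing the signal with a trailing moving average. The abstract does not specify which noise filters or calculation methods were retained, so the filter, the window and lag sizes, and all function names here are assumptions rather than the authors' model.

```python
import math


def moving_average(values, window):
    """Trailing moving average; only full windows are returned."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def estimate_mu(times_h, vcd, window=5, lag=5):
    """Estimate mu (1/h) from a vcd series using d ln(X)/dt.

    times_h : sampling times in hours (e.g. one point every 0.2 h)
    vcd     : viable cell densities predicted from permittivity
    window  : number of points in the smoothing window (assumed)
    lag     : number of points spanned by each slope estimate (assumed)
    """
    smooth = moving_average(vcd, window)
    t = times_h[window - 1:]          # time stamp of each smoothed point
    return [
        (math.log(smooth[i]) - math.log(smooth[i - lag])) / (t[i] - t[i - lag])
        for i in range(lag, len(smooth))
    ]


# Synthetic check: noise-free exponential growth at 0.03 1/h.
times = [0.2 * k for k in range(30)]
signal = [1.0e6 * math.exp(0.03 * tk) for tk in times]
mu_series = estimate_mu(times, signal)
```

A sustained drop in the estimated series would correspond to the critical moment described in the text; the threshold for a significant decrease would have to be chosen for the process at hand.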
implementation of this model in feed-harvest cell culture led to better mab glycosylation, which demonstrates clearly the potential of this methodology in mab production bioprocesses. background monoclonal antibodies are normally synthesised from transfected mammalian cells as heterogeneous mixtures of glycoforms [1] . however, clinical efficacy may depend upon single glycoforms which have been difficult to isolate [2] . we have now developed an efficient method for generating single glycoforms by solid phase re-modelling which is superior to previous methods because it allows a sequential series of enzymatic changes without the need for intermediate purification of the antibody. solidphase binding exposes the antibody glycans to enable easier access of the transforming glycosylation enzyme. the antibodies subjected to modification were a chimeric human/ camelid monoclonal antibody (eg2), a humanized monoclonal antibody (il8), a full size chimeric antibody (cetuximab) and polyclonal antibodies obtained from pooled human serum. the antibodies were bound to a protein a column using conditions typical of mab purification (fig. 1 ). after washing out non-bound impurities by a neutral ph buffer, each antibody was subjected to enzymatic modification directed to a targeted glycan profile ( table 1 ). the antibodies were then eluted with a low ph buffer and neutralized. the glycan profiles were analysed following glycan removal with pngase f, labelling with 2-aminobenzamide and separation on a hilic-hplc column [3] . prior to enzymatic modification glycan analysis of all 4 antibodies showed variable galactosylation and sialylation typical of human abs. this included a distribution of fg0, fg1, fg2, fs1 and fs2 with galactosylation indices ranging from 0.22 for il8 to 0.64 for eg2. there was minimal sialylation in il8 but up to 11% in eg2. glycan modifications were made as each antibody was held on a protein a column in accordance with procedures shown in table 1 . 
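The galactosylation indices quoted above summarise a glycan distribution as a single number. The abstract does not state the formula used, so the sketch below assumes a commonly used definition in which monogalactosylated glycans count as half and digalactosylated glycans count fully, relative to the sum of the three classes. The function name and the example peak areas are hypothetical.

```python
def galactosylation_index(g0, g1, g2):
    """Assumed definition: GI = (0.5 * G1 + G2) / (G0 + G1 + G2).

    g0, g1, g2 : relative peak areas of agalactosylated, monogalactosylated
                 and digalactosylated glycans (any consistent unit).
    """
    total = g0 + g1 + g2
    if total <= 0:
        raise ValueError("glycan peak areas must sum to a positive value")
    return (0.5 * g1 + g2) / total


# Hypothetical profile, not data from the abstract.
gi_example = galactosylation_index(g0=50.0, g1=40.0, g2=10.0)
```

Under this definition the index runs from 0 for a fully agalactosylated profile to 1 for a fully digalactosylated one.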
agalactosylated glycans were enriched by treatment with the single addition of galactosidase and neuraminidase. this resulted in 83-95% of agalactosylated structures in the mabs and 65% in the polyclonal antibody. galactosylated antibodies (>95% yield) were produced by a single-stage reaction involving sialidase and galactosyltransferase with udp-gal. breakdown of the glycans to a trimannosyl core was accomplished by treatment of the agalactosylated structures with hexosaminidase. this produced a yield of 76-80% of the fm3 structure with a small remainder of fa1. sialylated antibodies (>95%) were produced by a two-stage reaction involving sialidase, galactosyltransferase and finally treatment with 2,6 sialyltransferase in the presence of cmp-nana. the latter reaction produced equimolar quantities of monosialylated and disialylated cetuximab and polyclonal antibodies. the results suggest that for human antibodies (150 kda) there may be a limitation for sialylation given the steric constraints between the two ch2 domains of the dimeric structure. the ability to sialylate the smaller camelid antibody (80 kda) was greater, resulting in a high (>90%) level of disialylated glycans. this suggests that the steric constraints for glycosylation may be lower in the smaller antibody. these sialylated antibodies have significant potential clinical importance for their anti-inflammatory activities. we have modified the glycans of antibodies following immobilization on an affinity ligand column. this allows enzymatic transformation in a solid state that has a distinct advantage over the equivalent transformation in solution because the enzymes and buffers can be washed out on completion of the modification, leaving the antibody still attached to the affinity ligand. this enables repeated rounds of an enzymatic reaction or sequential reaction steps without the need for intermediate antibody purification.
the antibody can eventually be removed from the column by application of an elution buffer once all desired glycan modifications have been made. since affinity ligand purification of antibodies is performed routinely as an initial step of purification after cell culture, the glycan modification can easily be incorporated into this process. the enrichment of the resulting antibody for a targeted glycoform can enhance the potential therapeutic efficacy, as it is known that specific glycoforms are required for certain biological effects.
[1, 2]. this is mainly because microvesicles can be enriched in or deprived of specific proteins, based on their functional purpose and their cellular origin. recently, microvesicles purified from the supernatant of t24 bladder cancer cells were reported to be enriched in bcl-2 and cyclin d1 (anti-apoptotic proteins), but deprived of bax and caspase-3 (pro-apoptotic proteins), contributing towards immunity against programmed cell death [2]. however, the impact of microvesicles on cho-based bioprocesses has not been evaluated yet. therefore, in this investigation, we aimed to evaluate their impact on cell growth and recombinant protein production from cho cells. materials and methods cho-k1 cells were grown in chemically-defined protein-free culture medium (life technologies-1835273) in shake flasks (gx-00125p). the different fractions of spent media (microvesicles and microvesicle-free spent media) were collected using an ultracentrifugation method [1, 3]. the quality of the different fractions was ensured using western blotting for the exosomal marker cd63 (sc-15363) and a coomassie stained gel as loading control (fig. 1a). to evaluate the impact on cell growth, cells were seeded with the microvesicle and microvesicle-free fractions collected from the log phase of culture, and cell counts were performed by vicell using the trypan blue dye exclusion method.
to evaluate the impact on productivity, cell-free supernatant collected from the stationary phase of microvesicle-treated human igg secreting cho cultures, with the respective control, was evaluated using elisa (ab100547). in each experiment, microvesicles were collected from a volume of routine maintenance culture media equal to 10% of the working volume used for microvesicle supplementation. microvesicle-supplemented cultures had a shorter lag phase and achieved a 1.2 fold higher maximum cell density (1.46 × 10^6 viable cells/ml) compared to the untreated standard culture (1.21 × 10^6 viable cells/ml), and cell density remained higher for the remaining period of batch culture (fig. 1b). however, the microvesicle-free fraction did not have a significant impact on growth. the viability of microvesicle-supplemented cultures, similar to that of cultures supplemented with microvesicle-free media, was also higher compared to the standard culture, suggesting potential use of microvesicles for regulating cho growth in production cultures. this could possibly be because microvesicles have already been reported to be enriched in cell growth/death-regulating proteins, hence facilitating cell growth [2, 3]. we have also observed an abundance of cell cycle regulators including cyclin d1 in the microvesicle fraction compared to microvesicle-free spent media in our laboratory (data not shown); however, further investigations are required to prove the hypothesis. the overall productivity of human igg secreting cho cells was also observed to increase by ~4 fold following supplementation of microvesicles to the culture without significantly affecting per-cell productivity. since microvesicle supplementation facilitates cell growth, an increased number of viable producer cells in the culture could be expected to be the basis of the observed increase in the overall productivity of the culture [2, 3].
further work is ongoing to explore in depth the potential of microvesicles for improving recombinant protein production from cho cells. the data indicate that microvesicles secreted from cho cells can improve cell growth and hence recombinant protein production in culture. therefore, strategies need to be developed for sterile isolation of cho microvesicles from routine maintenance cultures and their supplementation into the production culture for improving the performance of cho-based production processes.
the glycosylation profile of a recombinant protein is one of the most important attributes when defining product quality. producing a protein with desired characteristics requires the ability to modify and target specific glycosylation profiles. traditionally, the approach to modifying the glycosylation profile of a protein involves supplementing a culture with components that can improve galactosylation. experimentation using this supplemental approach resulted in a dramatic increase in terminal galactosylation, but lacked the ability to easily and repeatedly target specific glycosylation profiles. using novel and proprietary technology, we have developed a feed (glycantune™) and a unique feeding process that will maximize growth and titer while being able to modulate glycan profiles. this new feed can be added as a standalone process that can result in a significant shift from g0f to g1f and g2f (maximum galactosylation). using a unique fed-batch process, glycantune can also be used with a standard feed to dial in targeted glycosylation profiles. through process development, we created a method where a transition point is used to switch from a standard feed to a glycan modulating feed. the timing of the transition point will determine the specificity of the glycan profile.
n-linked glycans were digested with pngase f and quantified using 100 pmol maltohexaose/maltopentaose internal standards labeled with 8-aminopyrene-1,3,6-trisulfonic acid (apts) as described by laroy et al [1] or the user guide for the glycan labeling and analysis kit (glycanassure™ user guide, thermo fisher scientific). all ce separations were performed using the applied biosystems™ 3500xl. the timing of the transition from efc+ to gtc+ made it possible to target specific glycosylation profiles, modulating g0f from 75% down to 32% while increasing g1f and g2f (fig. 1). transitioning to gtc+ early in culture resulted in a greater shift from g0f to g1f and g2f. transitioning midway or late in culture resulted in a greater proportion of g0f compared to g1f and g2f. supplementation-based approaches using glycosylation modulating media components to modify and target specific glycosylation profiles proved to be difficult. these approaches were able to increase terminal galactosylation (g1f and g2f), but lacked the ability to fine-tune glycan profiles. this could result in numerous rounds of titration experiments to target specific glycan profiles that would likely remain inconsistent between cell lines, culture media and feeds, and process scale. the development of a unique process made it possible to predictably target specific glycosylation profiles. transition from standard feeding to glycantune allowed for precise targeting of glycan profiles. transition to glycantune early in culture resulted in an increased shift from g0f to g1f and g2f. a transition late in culture resulted in increased g0f and decreased g1f and g2f.
growth performance during precultures and batch curves in plain shaking flasks did not show any differences among the tested surfactants or lots thereof, and cell densities reached 10-12 × 10^6 cells/ml (fig. 1a and b).
experiments with hek 293-f cells at elevated power input in baffled shaking flasks revealed distinct differences between pluronic® f-68, f-127 and kolliphor® p188, with f-127 showing the best performance. peak viable cell densities reached with lots a and b of pluronic® f-68 and f-127 were comparable to those in plain shaking flasks, while those for kolliphor® p188 and lots c and d of pluronic® f-68 were significantly lower. peak viable cell densities ranged from 2 to 12 × 10^6 cells/ml (fig. 1c). similar transient transfection efficiency and mean fluorescence of transfected cells, independent of the applied surfactant and lot thereof, indicated no major impact of the respective poloxamer (fig. 1d). interestingly, experiments using fluorescein-labelled pluronic® showed a time-dependent uptake into hek cells. visual tracking revealed an endocytic uptake of poloxamers by the cells (>10-fold increase in signal after 96 h) and its co-localisation with the cell membrane and lysosomes. sec analyses (fig. 1e) showed differences between the tested poloxamers. in particular, the tested lots of pluronic® f-68 revealed notable deviations in the low molecular weight fraction (peak 2, fig. 1e) compared to the other poloxamers. cultures subjected to varying levels of shear stress showed distinct growth differences depending on the poloxamer used. while experiments in plain shake flasks did not show any differences in growth, cultivations under elevated shear stress in baffled shake flasks resulted in lower peak viable cell densities with kolliphor® p188 and some pluronic® f-68 lots. it remains unclear whether this can be explained by different membrane protective activities alone, or if other mechanisms, occurring during and after cellular uptake, contribute to this effect. especially for the tested lots of pluronic® f-68, sec of surfactants showed differences in the low molecular weight fraction.
this fraction mainly represents polyethylene oxide (peo) (revealed by nmr), which is likely to be a remnant from synthesis. these observations indicate that the use of different poloxamers and lots thereof should be carefully evaluated, especially under elevated shear stress. further experiments will focus on investigating distinct sec fractions of poloxamers.
overcoming (fig. 1). aurintricarboxylic acid (endonuclease inhibitor; enhancer used in e.g. salivary gland transfection) and polyvinylpyrrolidone (polymer; beneficial in electroporation) were both found to negatively impact pei-mediated transfection of cho cells, while another tested polymer enhanced growth as well as transfection efficiency. the use of a strong chelator led to a high transfection efficiency, but impaired cell growth. based on the results of the independent substance tests, the medium formulation was modified by the addition of a weak chelator and further components including vitamins. different osmolalities between 280 mosmol/kg and 340 mosmol/kg were tested for the final formulation, but no major impact was seen on either transfection efficiency or viability 2 days post-transfection. the final cho tf medium formulation supported high cell growth of the finally tested cho cell lines 2 and 3, with peak viable cell densities above 10 × 10^6 cells/ml in batch cultivations with an overall cultivation time of 7-8 days (fig. 1). further improvements of the process might be achieved by adapting the protocol, as the results shown are based on a simple precomplexing of dna-pei. moreover, product yields could potentially be increased by using feeds, temperature shifts or commonly used enhancers (e.g. valproic acid).
scaling of a cell culture process is an essential part of its development. in a typical approach, scaling [1] is performed by keeping a (critical) process parameter constant throughout the complete bioreactor range.
this can lead to non-beneficial results either on the high or the low end of the range. for instance, the specific power input [p/v] of 30 w/m 3 might result in a good agitation in production scale whereas it leads to a nonturbulent mixing behavior in process development scale. to overcome this issue a new approach for an easy scaling procedure was developed. this "utility function" approach for agitation scaling is based on individual functions with a value-based mapping independent of bioreactor scale. process insight information (established either from doe process investigation or existing experience with a process platform) is directly formalized into a set of mappings which transform bioprocess values into perceived benefits (0 to 1). at each bioreactor scale, parameters (e.g. stirring and gassing) are then chosen to maximize the product of resultant utility functions. the model cho fed-batch process in this trial comprised a cho dg44 cell line that was transfected to produce a humanized antibody igg1. a chemically defined media system was used. the process, including cell line, medium and feeding strategy was designed and developed by sartorius stedim cellca. the aim of the gassing scale-up was to achieve similar cell densities when the addition of pure oxygen starts. for all flexsafe str® bags oxygen was sparged via the micro sparger part of the combi sparger. all other systems used a ring sparger with holes face up. the initial air flow rate was set to an oxygen transfer rate (k l a) of 8 1/h at the corresponding agitation rate and volume. all process engineering characterization parameters were determined according to dechema guidelines [2] . with the use of the utility functions the discrete agitation rate was determined (table 1) . the utility functions led to discrete agitation rates where not only homogeneous mixing but also a turbulent flow pattern and a suitable specific power input was guaranteed. 
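The utility function idea described above can be sketched in a few lines: each process value is mapped to a perceived benefit between 0 and 1, and the setting with the largest product of benefits is selected at each scale. The abstract does not give the shapes of the mappings or the candidate settings, so the trapezoid shape, every threshold, the parameter names and the candidate values below are hypothetical.

```python
from math import prod


def trapezoid(x, a, b, c, d):
    """Benefit of 0 outside (a, d), 1 inside [b, c], linear on the flanks."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x > c:
        return (d - x) / (d - c)
    return 1.0


def total_utility(values, utilities):
    """Product of the individual benefits for one candidate setting."""
    return prod(fn(values[name]) for name, fn in utilities.items())


def choose_setting(candidates, utilities):
    """Return the candidate whose combined benefit is highest."""
    return max(candidates, key=lambda c: total_utility(c["values"], utilities))


# Hypothetical mappings: specific power input (W/m^3) and Reynolds number.
utilities = {
    "power_w_m3": lambda p: trapezoid(p, 5.0, 20.0, 40.0, 80.0),
    "reynolds": lambda re: trapezoid(re, 2000.0, 10000.0, 1.0e9, 2.0e9),
}

# Hypothetical candidate agitation rates for one bioreactor scale.
candidates = [
    {"rpm": 100, "values": {"power_w_m3": 8.0, "reynolds": 6000.0}},
    {"rpm": 150, "values": {"power_w_m3": 30.0, "reynolds": 12000.0}},
    {"rpm": 250, "values": {"power_w_m3": 70.0, "reynolds": 20000.0}},
]

best = choose_setting(candidates, utilities)
```

Using a product rather than a sum means that any single benefit of zero (for example non-turbulent mixing) rules a setting out, regardless of how well it scores on the other criteria.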
the initial gassing rate of air supplied enough oxygen for 5 x 10 6 cells/ml in all bioreactors. due to the used scaling methods the growth patterns in all bioreactor scales were comparable. peak viable cell densities (vcd) of 20 -26 x 10 6 cells/ml were achieved and viability at the point of harvest was above 80 % in all scales. the final product concentration was in an acceptable range of 2.9 -3.6 g/l. product quality attributes show comparability over the complete bioreactor range (fig. 1) . the harvest criteria of 12 days gave a combination of viability and product concentration that made it easy to process the cell broth during cell removal and other downstream steps. the process implementation of the cho production system -expressing mab 2 was successfully performed with the use of utility functions. cell growth, productivity and product quality is comparable over the complete bioreactor range. background endoplasmic reticulum (er), the central part of the secretory pathways in eukaryotic cells, is responsible for controlling the quality of secreted and resident proteins through the regulation of protein translocation, protein folding, and early post-translational modifications [1] . a number of physiological conditions such as oxidative stress, hypoglycemia, acidosis, and thermal instability can disturb the er functions, which triggers er stress [2] . prolonged er stress induces apoptotic cell death [3] . oxidative stress that naturally accumulates in the er as a result of mitochondrial energy metabolism and protein synthesis can disturb the er function [4] . because er has a responsibility on the protein synthesis and quality control of the secreted proteins, er homeostasis has to be well maintained. when h 2 o 2 , an oxidative stress inducer, was added to recombinant chinese hamster ovary (rcho) cell cultures, it reduced cell growth, monoclonal antibody (mab) production, and galactosylated form of mab in a dose-dependent manner. 
antioxidants can reduce the oxidative stress level and suppress apoptotic cell death by scavenging oxygen free radicals, inhibiting the chain reaction of oxidation, and detoxifying peroxide [5] . however, despite the importance of mass production of mabs, the effect of antioxidants on the production and quality of mabs in rcho cell cultures has not been fully characterized. to find a more effective antioxidant in rcho cell cultures, six different antioxidants including baicalein, which have been used widely in mammalian cell cultures, were evaluated as chemical supplements with two different rcho cell lines producing the same mab in 6-well plates. then, batch and fed-batch cultures were performed in shake flasks with the supplementation of baicalein, which showed the best effect on culture performance among the 6 antioxidants. the reactive oxygen species (ros) and er stress levels were measured to study the effect of baicalein on mab production and quality. among these antioxidants, baicalein showed the best mab production performance. addition of baicalein significantly reduced the expression level of bip and chop along with a reduced ros level, suggesting that oxidative stress accumulated in the cells can be relieved using baicalein. as a result, addition of baicalein in batch cultures resulted in a 1.7- to 1.8-fold increase in the maximum mab concentration (mmc), while maintaining the galactosylation of mab (fig. 1 and table 1). likewise, addition of baicalein in fed-batch culture resulted in a 1.6-fold increase in the mmc while maintaining the galactosylation of mab. oxidative stress negatively affected the production and galactosylation of mab in rcho cell cultures. among the various antioxidants tested in this study, baicalein showed the best mab production performance in both batch and fed-batch cultures of rcho cells. baicalein addition significantly enhanced mab production while maintaining galactosylated forms of mab. 
thus, baicalein is an effective antioxidant for use in rcho cell cultures for improved mab production. background the production of many biopharmaceuticals (e.g. antibodies & proteins for diagnostic and therapeutic purposes) requires the cultivation of mammalian cell lines, which is demanding with respect to various aspects such as complex cell metabolism, variabilities in cell behavior, scale dependencies, influences of changes in cultivation conditions, medium composition etc. although an increasing number of measurement parameters is available, only a part of them is routinely utilized in industrial cell culture processes and their corresponding seed trains. nevertheless, the data base grows, statistical investigation of data gains importance and process data are more easily accessible in the context of industry 4.0. cell cultivation has to consider these complex requirements, e.g. for fed-batch control and seed train design. furthermore, cultivation strategies have to be adapted to new products, cell lines and clones as well as to different production plants when transferring processes. one approach to encounter the variabilities and to include actual information from the process and from data analysis is adaptive model-assisted control [1] . two software tools enabling adaptive model-assisted control applying unstructured, unsegregated models have been developed and implemented using matlab © , winers and fortran, one tool for fedbatch control and another one for seed train simulation and optimization. one key element of adaptive model-assisted control is the underlying process model. in order to provide an adaptive character, model parameters should be easily identifiable from routine cultivation data, which is available during seed train and fed-batch without additional sophisticated measurements. therefore, the usage of unstructured, unsegregated models is recommended. 
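to make the notion of an unstructured, unsegregated model concrete, the following is a minimal sketch of a batch model with monod-type growth on a single substrate, integrated with an explicit euler scheme. it is a deliberately reduced illustration and not the 13-parameter model or the software tools referred to in this abstract; the function names, the single-substrate simplification and all parameter values are assumptions.

```python
def monod_rate(mu_max, k_s, c):
    """Monod-type specific growth rate for substrate concentration c."""
    if c <= 0.0:
        return 0.0
    return mu_max * c / (k_s + c)

def simulate_batch(x0, glc0, mu_max, k_s, mu_d, y_xs, t_end, dt=0.1):
    """Explicit Euler integration of viable cell density and glucose.

    x0    : initial viable cell density (e.g. 1e6 cells/ml)
    glc0  : initial glucose concentration (e.g. mM)
    mu_d  : constant cell-specific death rate (1/h), an assumed simplification
    y_xs  : yield of cells on glucose (assumed constant)
    Returns a list of (time, viable cell density, glucose) tuples.
    """
    x, glc = x0, glc0
    trajectory = [(0.0, x, glc)]
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        mu = monod_rate(mu_max, k_s, glc)
        dx = (mu - mu_d) * x          # growth minus death
        dglc = -(mu / y_xs) * x       # growth-coupled glucose uptake
        x = max(x + dt * dx, 0.0)
        glc = max(glc + dt * dglc, 0.0)
        trajectory.append(((i + 1) * dt, x, glc))
    return trajectory

if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    traj = simulate_batch(x0=0.3, glc0=25.0, mu_max=0.04, k_s=1.0,
                          mu_d=0.002, y_xs=0.5, t_end=72.0)
    t, x, glc = traj[-1]
    print(f"t = {t:.0f} h, viable cells = {x:.2f}, glucose = {glc:.2f}")
```

in an adaptive setting, parameters such as `mu_max` or `y_xs` would be re-estimated from routine cultivation data as it becomes available, and the simulation rerun with the updated values.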
a) example of an unstructured, unsegregated cell culture model (for adaptive model-assisted control) one example, describing cell growth, cell death, uptake of substrates and production of metabolites via a first order system of ordinary differential equations and monod-type kinetics, is shown in table 1 . this mathematical model includes 13 cell specific model parameters [2] . ii) b) open-loop control sequence for seed train simulation and optimization [3] : using model, a priori identified model parameters and starting concentration values, the temporal concentration courses can be predicted for the first scale. subsequently, points in time for passaging and starting values for the next scale can be computed by adding a passaging strategy, seed train conditions and medium concentrations. prediction for the following scales can be obtained iteratively. integrating feedback from the process in terms of cultivation data enables increasing prediction accuracy and responding to possible changes in cell behaviour. process design and optimization, e.g. regarding seed train and fed-batch, is realized by adaptive model-assisted software tools using unstructured, unsegregated models. they enable feedback from the process via routine cultivation data and allow adaptation to diverse circumstances such as different cell lines, products, cultivation conditions, plant configurations etc. ) in polyelectrolyte capsules. significant advantages, such as great mechanical stability, good biocompatibility and good mass transfer properties characterized these capsules based on sodium cellulose sulfate/poly(diallyldimethyl) ammonium chloride (scs/pdadmac) [1, 2] . here, we present the possibility to cultivate human t cells, freshly isolated from blood, to high densities in similar semipermeable polyelectrolyte microcapsules within less than 10 days. 
cells were encapsulated in semipermeable scs/pdadmac polyelectrolyte microcapsules or confined in 1.5% alginate/poly-l-lysine (pll) beads, a standard approach for cell immobilization. the permeability of the microcapsules was estimated using dextran-based molecular weight standards (10 and 20 kda) and vitamin b12 (1.6 kda). gentle digestion with endocellulase allows an easy release of the cells from the capsules. cell growth, cytokine production and phenotype were measured in non-encapsulated and encapsulated cells grown under standard culture conditions. moreover, we analyzed the interplay between the secreted cytokines and the scs within the capsules and its putative influence on cell growth. cells mixed in the cellulose sulfate solution under physiological conditions can be safely trapped within a liquid core during capsule formation. encapsulated cells reached cell densities of up to 40 × 10^6 cells ml capsule^-1, whereas cells confined in alginate/pll beads and non-encapsulated ones reached 11.3 × 10^6 cells ml bead^-1 and 2.4 × 10^6 cells ml^-1, respectively. one major advantage of these polyelectrolyte microcapsules (<1 mm) is the low mwco (<10 kda) (fig. 1a-b) . this restricted permeability allows for a conditioning of the capsule core by autocrine factors, which in turn permits the use of basal cell culture medium instead of expensive t cell specialized media, and hence does not necessitate high amounts of rhil-2 and reduces the cultivation costs. moreover, co-encapsulation of rhil-2 had a beneficial effect on the growth kinetics in most cases (fig. 1c) . some evidence is presented that the scs used to form the polyelectrolyte microcapsules specifically adsorbs il-2 (table 1), a cytokine which provides an essential signal for t-cell proliferation and differentiation [3] . 
therefore, we postulate that the scs used for encapsulation has biomimetic properties, creating an artificial extracellular matrix mimicking heparin sulfate, which in turn positively affects t cell proliferation via trans-presentation of il-2 (fig. 1d) [4] . primary t lymphocytes can be expanded under appropriate conditions outside the body. in the body, t cells grow/expand in specific environments where the cells are tightly packed, leading to multiple cell-cell contacts and manifold interactions with the extracellular matrix. ex vivo suspension cultures of diluted cells cannot provide such a microenvironment. in the microcapsule-based cultivation system presented, the cells are suspended in a viscous scs solution. the low molecular weight cut-off of the surrounding polyelectrolyte membrane ensures that typical signaling molecules produced by the cells are retained, which facilitates the "conditioning" of the cellular microenvironment, while nutrients and metabolites can pass. expensive additives, such as interleukin-2 (il-2), can be co-encapsulated. expansion then no longer requires specialized t-cell media. moreover, the scs seems to have biomimetic properties, representing an artificial extracellular matrix mimicking heparin sulfate. we consider that the described method may be an appropriate alternative to expand t cells while creating a local microenvironment mimicking in vivo conditions. -175) . 
equations of balances and kinetics of an employed process model, including $x_v$ viable cell density, $x_t$ total cell density, $\mu$ cell-specific growth rate, $\mu_d$ cell-specific death rate, $t$ time, $k_s$ and $k$ monod kinetic constant and monod constant for uptake, $k_{lys}$ cell lysis constant, $q$ cell-specific uptake rate or production rate, respectively, $y$ kinetic production constant, $c$ concentration, glc glucose, gln glutamine, lac lactate, amm ammonia, $f$ feed rate, $v$ volume. balances with fed-batch terms; kinetics; lactate uptake: if $c_{glc} \geq 0.5\ \text{mM}$: $q_{lac,uptake} = 0$; if $c_{glc} < 0.5\ \text{mM}$: $q_{lac,uptake} = q_{lac,uptake,max}$; $q_{amm} = y_{amm/gln} \cdot q_{gln}$

background digital manufacturing (dm) is heightening the productivity and robustness of existing processes and facilities. it also enables the efficient development of previously unmanageable products or processes and provides the basis for a wave of innovations. dm is a resident and on-line source of continuous optimization of process performance. it relies upon the comprehensive, real-time interfacing of both human and machine sourced information through one centralized system. more than legacy distributed control system (dcs) and supervisory control and data acquisition (scada), it is an integral interconnection of real-time access to divergent sources of information. as such, it can promise deep analysis and predictions leading to shortened product cycles and advanced process control. this comprehensive analysis is extending beyond operations performance data from the production floor to data driving such activities as raw materials security of supply (sos) and business continuity management systems (bcms). digital biomanufacturing (db) can be viewed as yet another, larger, embodiment of digital biotechnology. db is similar to digital manufacturing in that it promotes innovations in the manufacturing of biologicals by using such things as computer aided design, manufacture, verification and deep process analysis using software sensors (fig. 1) . 
however, the fact that there are living components (cells) involved in the processes gives a distinctly different flavor to the systems employed. it is desirable to use a distinct term here to distinguish it because, as in the terms bioproduction and biopharmacology, db addresses many unique aspects of biologically-based activities. the reasons why the biotech and biopharma industry lags behind other sectors such as the automotive sector regarding the transformation to digital manufacturing are (i) the complexity and dissipative nature of biological systems, (ii) distributed heterogeneous data and (iii) limited at-line or on-line data sources. however, the costs of genomic sequencing, omics data generation, and computing resources are decreasing rapidly, and at the same time process analytical technologies, computational power and predictive modeling as well as data management infrastructures are greatly improving (table 1) . by removing roadblocks that used to limit approaches, these changes have paved the way to transforming the bioeconomy into an industry that is based on digital knowledge. such new and optimized manufacturing technologies as continuous biomanufacturing and 3d bioprinting can actually demand the interfacing of many sources of information, deep data analysis including software sensors for metabolic fluxes, and model-based predictions of digital biomanufacturing. the application of predictive models for bioprocess optimization greatly improves established platforms and finally leads to a massively increased mechanistic process understanding. four essential benefits result from the increased bioprocess understanding, development, and control of db. first, personnel are relieved of many manual and repetitive tasks. second, strategic planning and operational efficiency are improved. 

table footnote: prior to elisa, the various proteins were incubated at 37°c in scs prepared as for encapsulation. as control, the scs was replaced by pbs. shown are mean values ± sd, n = 3.
third, we see real-time optimization of end-to-end manufacturing based on such high-value criteria as projected product quality and profitability. fourth, it enables previously unmanageable operations and creates innovative solutions.

monitoring between-batch behavior of real-time adjusted cell-culture parameters
xavier lories, jean-françois michiels
arlenda, mont-saint-guibert, 1435, belgium
correspondence: xavier lories (xavier.lories@arlenda.com)
bmc proceedings 2018, 12(suppl 1):p-192

background cell-culture parameters (ccp), such as ph, may be continuously measured online and subject to real-time automated adjustment (e.g. automated addition of a base to prevent the ph from dropping too low). this is an efficient method to maintain the parameter within specified limits. however, this type of control constrains the variability within the predefined limits and does not provide any information on the between-batch variability of the process. online measurements of ccp provide time-dependent curves presenting one or more transitions. different types of transition can be observed:
- the process can shift from a state in which adjustment is needed to keep the ccp in range to a state in which it is not. typically, the ccp drifts away from a limit.
- the process shifts from a state in which adjustment is not needed to one in which it is. for instance, a drifting ccp reaches the lower or upper limit of the accepted range.
the timepoints at which those transitions take place are here called changepoints. those are aspects of the process and, as such, should be controlled. in cases with multiple changepoints, the approach allows the early termination of runs showing a very early or very late first changepoint. the identification of the changepoint positions is based on simple rules rather than complex statistical modeling to keep the identification methodology simple. once the changepoints are identified, a multivariate bayesian model is fitted to the appropriately transformed data. 
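as an illustration of what a simple rule-based changepoint identification could look like, the sketch below flags a sample as "at a limit" when the ccp lies within a tolerance of the lower or upper control limit, and reports a changepoint when that state flips and persists for a minimum number of samples. the abstract does not spell out its rules, so the rule itself, the function name `find_changepoints` and the parameters `tol` and `min_run` are assumptions made for this example.

```python
def find_changepoints(times, values, lower, upper, tol=0.02, min_run=3):
    """Rule-based detection of transitions between 'free' and 'adjusted' states.

    A sample counts as 'adjusted' when the parameter sits within `tol` of the
    lower or upper limit (i.e. the controller is holding it there). A state
    change is accepted only if it persists for `min_run` consecutive samples.
    Returns a list of (time, kind), kind being 'to_adjusted' or 'to_free'.
    """
    if not values:
        return []
    at_limit = [v <= lower + tol or v >= upper - tol for v in values]
    changepoints = []
    current = at_limit[0]
    for i in range(1, len(at_limit)):
        if at_limit[i] != current:
            run = at_limit[i:i + min_run]
            # Accept the transition only if the new state is sustained.
            if len(run) == min_run and all(s == at_limit[i] for s in run):
                kind = "to_adjusted" if at_limit[i] else "to_free"
                changepoints.append((times[i], kind))
                current = at_limit[i]
    return changepoints

if __name__ == "__main__":
    # Hypothetical pH profile: drifts down to the lower limit, is held there,
    # then drifts back up (a 2-changepoint case).
    ph = [7.20, 7.15, 7.10, 7.05, 7.00, 7.00, 7.00, 7.00, 7.05, 7.10, 7.12, 7.15]
    hours = list(range(len(ph)))
    print(find_changepoints(hours, ph, lower=7.0, upper=7.4))
```

the changepoint times extracted this way (one vector per batch) would then be the quantities passed on, after any transformation, to the multivariate model.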
prediction regions are obtained and used as control limits [1] . results obtained for a 2-changepoint case are shown in fig. 1 . points on the right-hand graph represent new batches. the red triangle represents a failed batch. it appears that the control strategy fails to identify the failed batch. two reasons can be considered:
- the limits of the prediction region have been established based on 9 points; such a small sample size is likely to be insufficient for the definition of such a control chart.
- the tested batches were produced out of set point. a control chart should be used on a stable process, run under the same conditions, in order to be really relevant.
this work was based on available historical data, which is never an ideal situation. the suggested strategy offers a simple approach to the monitoring of between-batch behavior for cell culture. once the limits have been defined, the approach is quite straightforward and usable by non-statisticians. however, such a strategy, as any other of this type, must be based on a sufficient number of batches for the definition of the control limits in order to have a good estimation of the batch-to-batch variability.

fig. 1 (abstract p-190) . intelligent software applications support digital biomanufacturing process development and control.
• databases using data collected online, at-line, and offline from bioprocesses operating worldwide.
• process data are used to generate metabolic network models that represent a specific host cell line in a bioprocess.
• model-based computational simulations improve process understanding and reduce experimental efforts for media design, clone selection, and metabolic engineering.
• automated data import and processing allow for a streamlined and standardized metabolic process analysis.
• identification of critical metabolic parameters is used for proactive steering and control of production processes

background rabies is a zoonotic viral disease with a mortality close to 100% [1] . 
as there is not an efficacious treatment available, post-exposure vaccination is recommended for individuals in contact with the virus. on the other hand, the most common source of virus transmission is saliva of infected animals, mostly dogs, whereby mass vaccination of pets is the most cost-effective way to reduce human infections. in this context, availability of both human and veterinary vaccines is critical [2, 3] . our group had previously developed an effective vlp-based rabies vaccine candidate produced in high density hek293 cell cultures with serum free medium (sfm) [4, 5] . one of the aims in vaccine production process is the achievement of a good productivity with a low cost per dose, mainly in the case of vaccines for animal use in which case the sfm is one of the principal expenses. in this work, we show the adaptation of the producer clone to a non-expensive in-house developed culture medium, in order to reduce the global cost of the process and therefore the price per dose. experimental approach first, we compared a direct and a sequential adaptation protocol of our hek293 rv-vlps producer clone, from 100% of the commercial sfm (ex-cell293, safc) to a new formulation with only 50% of the sfm and a minimum essential medium (p2g), developed in our laboratory specifically for rv-vlps production. this new formulation was called rvpm (rabies vaccine production medium). the specific productivity of rv-vlps in culture supernatants was measured by sandwich elisa, using the 6 th international standard for rabies vaccine that quantify the glycoprotein content (nibsc, expressed in elisa units per ml). further, we evaluated both media for the production of the rabies vaccine, using stirred tank bioreactors operated in continuous mode (biostat qplus, sartorius). the production of the rv-vlps was daily evaluated by elisa and the obtained harvests analysed by the nih potency test for rabies vaccine. 
after the adaptation process, suspension cultures without aggregates or clumps were obtained, with the same specific growth rate. a lower maximum cell density with the rvpm was reached, achieving 5x10 6 cells.ml -1 , compared with the sfm that reach cell densities between 8 and 9x10 6 cells.ml -1 in batch mode. the specific rv-vlps productivity per cell was maintained, obtaining values of 0.88 and 0.90 eu.10 6 cells -1 .day -1 for the clone being cultured in sfm and rvpm, respectively. taking into account that this producer clone can be changed directly from one medium to the other without lag phase or cell damage, and that in rvpm the maximum cell density reached was lower, this medium was proposed to be analysed in high cell density in perfusion mode for a continuous culture in bioreactor. therefore, we performed two cultures in parallel to compare the efficacy of each media formulation in perfusion. as shown in fig. 1 , we obtained very similar culture performances in both bioreactors; 14.4 eu.ml -1 and 16.1 eu.ml -1 of rv-vlps for the commercial sfm and rvpm, respectively. after that, the harvests were evaluated by the nih potency test obtaining a rabies vaccine potency of 1.2 iu.ml -1 for both cultures (being 1 iu.ml -1 the minimum potency required for animal vaccine). thus, the results obtained represent an interesting advance in the optimization of this vaccine production process since the use of this new medium formulation represents a reduction of 40% of the total cost which will be reflected in a considerable reduction of the price of the vaccine dose. background vaccines are one of the most powerful and effective health inventions ever developedproviding tremendous economic and societal value; yet several factors hinder comprehensive immunization coverage. traditional methods of biologics production, based on stainless steel bioreactors, allow pharmaceutical companies to achieve economy of scale, but are limited by high capital expenditures. 
such approaches stifle manufacturing innovation and lack long-term cost-effectiveness and sustainability. current innovations can cut biologics' production costs to revolutionize the mainstream use of biologic treatments, focusing on developing fast, potent and cost-effective vaccine production. univercells' mission to make biologics affordable to all initiated a paradigm shift, targeting an innovative single-use manufacturing platform incorporating bioprocess into continuous operations. univercells employs process intensification, using high volumetric productivity bioreactors; and unit steps integration, coupling usp and dsp into continuous operations. the objective is a down-scaled high-productivity process for a cost-effective manufacturing solution. the resulting micro-facilities are easily-deployable in developing countries, breaking entry barriers to biomanufacturing (fig. 1) . manufacturing and distribution advancements, from centralized to distributed, foresee affordable treatments' obtainability via supplying local populations with local production units. -bench-scale fixed-bed bioreactor; -carriers made of 100% pure non-woven hydrophilized pet fibers; -vero cells grown in serum-free and serum containing media; -attenuated polio strains; -cell nuclei on carriers counted by the crystal violet method; -polio virus production estimated by elisa assay (d-antigen content). cultivation of vero cells in medium with serum and in serum-free medium, was carried out in bench-scale compact fixed-bed bioreactors, to determine which culture conditions result in the highest growth rate, the highest cell biomass by carriers and virus production. cells were inoculated at 0.05x10 6 cells/cm 2 and infected during the mid-exponential phase, following a complete media exchange. viral infection took place in serum-free media. in-line clarification and purification is targeted to be performed in only a few steps (maximum one of two) without intermediary diafiltration. 
in such configuration, we measured that vero cells can reach a cell density of 300-350x10 3 cells/cm 2 with pdl/day of 1.0-1.2 in serum-containing media. this new facility is expected to manufacture any type of viral vaccine at a very low cost and could be deployed at the site of the manufacturer in emerging countries, killing the two birds of cost of manufacturing and distribution with one stone. the presentation will feature the description of the engineering development, but also the preliminary results of cell growth, infections, and product quality, as well as a description of the cogs calculation. univercells developed a disruptive polio vaccine manufacturing technology exceeding expectations when compared to traditional methodsachieving a superior result via its all-in-one solution of a simple, scalable, and fully-disposable vaccine production platform resulting in long-term cost-effectiveness, flexibility and sustainability: -all upstream, downstream and inactivation steps take place within a closed system with all the equipment contained in a low footprint isolatorcreating a confined area for polio virus handling that facilitates the deployment of micro-facilities. -this leads to a dramatic reduction in capital investment, time required for development and increases production capacity. -in conclusion, this is a simple and elegant solution for the industrial production of human vaccines at a low cost in micro-facilities, making polio vaccines available to all. comparison of media formulations for the vaccine production in 1 l stirred tank bioreactors operated in continuous mode. both cultures were performed in parallel using the corresponding medium for the perfusion. a feeding was performed with the commercial sfm. b the first two days of perfusion feeding was performed with sfm until the cell density reach 10 7 cells.ml -1 and, after that, the bioreactor was fed using the rvpm formulation. 
(↓) on day number 10, 20% of the reactor volume was punctually bled while maintaining the working volume

background vectored vaccines based on modified vaccinia virus ankara (mva) are reported to stably maintain large transgenes, and to be safe, immunogenic and tolerant to pre-existing immunity. mva is usually produced on primary chicken embryo fibroblasts but continuous cell lines are being investigated as more versatile substrates. we have previously reported development of a continuous suspension cell line (cr.pix) derived from the muscovy duck and an efficient production process for mva in chemically defined media [1, 2] . this process allowed isolation of a hitherto undescribed genotype (mva-cr19) that induced fewer syncytia in adherent cultures and replicated to higher infectious titers in the extracellular volume of suspension cultures [3] . replication of mva-cr19 remained restricted predominantly to avian cells, an important property of mva vectors. homologous recombination in cr.pix cells was used to generate viruses with various expression cassettes in deletion site iii [4] and combinations of the differentiating point mutations of mva-cr19 in a backbone of wildtype virus. all recombinant viruses were plaque-purified. successful introduction of the mutations was confirmed by sequencing and specifically designed restriction fragment length polymorphisms (rflps). viruses were analyzed by serial passaging, diagnostic pcrs across deletion sites [4] , replication kinetics, plaque phenotype and electron microscopy. the genome was further investigated by anchored pcr and long pcr. efficiency of spread of recombinant viruses (fig. 1a) could be mapped to a point mutation in one of the genes, a34r. however, although mva-cr19 carries mutations in three structural proteins we detected no obvious differences to wildtype by electron microscopy (fig. 1b) . the replacement of the left viral telomere by the right counterpart was the most surprising result of our new study (fig. 
1c) . this extensive rearrangement affects 15 % of the viral genome and has also increased the area of complementarity between the two telomeres. the recombination site was precisely located and shown via analysis of earlier and subsequent passages to be a stable property of mva-cr19. various viruses, including those with larger dual (dsred1 and gfp) expression cassettes, were serially passaged at least 20-fold. although the genotype of mva-cr19 is advantageous for replication, all genomic and genetic markers of wildtype and mva-cr19 were stably maintained in all passages of the recombinant viruses, independent of wildtype or mva-cr19 backbone. we confirmed our previous results that suggested that mva-cr19 replicates efficiently in single-cell suspensions and were able to connect this property with the d86y mutation in a34, a structural protein on the surface of the virions. mva-cr19 was also found to differ from wildtype mva by a recombination between left and right viral telomere. due to this event, several genes encoded at the left terminus have been deleted whereas the gene dosis of those originally encoded only at the right terminus may have increased. we do not currently know how much the various point mutations and changes in genomic structure combine to explain the improved replication of mva-cr19. as several of the affected genes have been reported in the literature to impact interaction of mva with the host we would expect that in vivo studies may reveal additional novel properties of mva-cr19. an extremely important distinction between our earlier study [3] and this one concerns the source of the viruses. here, we investigated plaque-purified viruses and confirm the high genetic and genomic stability of mva. 
different expression cassettes inserted into deletion site iii, all diagnostic rflps and pcrs over various sites of the genome and within the viral telomeres remained unchanged throughout at least 20 serial passages, independent of whether recombinant viruses with wildtype or cr19-derived backbones were characterized.

fig. 1 (abstract p-225) . a: one hallmark of mva-cr19 is a significantly reduced tendency to induce syncytia and an increased dispersion of plaques in cr.pix cell monolayers. this property appears to be supported by the mutation in a34r. b: electron microscopy reveals no obvious differences between the novel genotype and wildtype.

background transient gene expression systems using polyethylenimine (pei) are considered to be fast, flexible and cost-efficient for recombinant protein production [1] . transfection efficiency depends on different factors; one of them is the type of media. production media support cell growth and protein production but not high transfection efficiency (te) mediated by pei [2] . therefore, media were selected for transfection followed by feeding of production media [3] to improve te and protein production. two different transfection strategies are compared: conventional transfection, in which a polyplex of plasmid dna (pdna) and pei is prepared before transfection, and in situ transfection, in which both are added directly to the cell suspension and the polyplex forms spontaneously [4] . cells were seeded 24 hr in chomacs cd media before transfection. at the transfection time point, an equal amount of cells was resuspended in each media type. transfection was applied either in situ or conventionally (polyplex prepared in 100 μl of 150 mm nacl and incubated for 20 min.); media addition was performed 5 hours post-transfection (hpt). media types and transfection conditions are listed in table 1 . 
the media screen showed the highest transfection efficiency, around 50% transfected cells, in opti-mem medium, although this was accompanied by low cell growth and viability. to improve the transfection efficiency, basic parameters including cell density and pdna and pei concentrations were varied; higher transfection efficiency was reached by reducing the media volume or, correspondingly, increasing the cell density and the pei and pdna concentrations for transfection. further optimization showed that transfecting cho-k1 cells in opti-mem (transfection medium) for 5 hours, followed by addition of cho-macs cd (production medium), further enhanced transfection, cell count and cell viability. the transfection efficiency (te) increased up to 85 ± 2.6%, coinciding with increases in viable cell concentration (vcc), in comparison to transfection and cultivation in opti-mem medium alone (fig. 1a) . both the conventional and the insitu method successfully transfected cho-k1 cells to a similarly high te, as shown in the fluorescence microscope images of fig. 1b . insitu transfection is preferable for suspension cell transfection because it reduces the number of handling steps (one step) compared to the conventional way (two steps). insitu transfection avoids the optimization of the incubation period required to prepare the transfection polyplex, but requires a higher amount of pdna and pei than the conventional way, as shown in table 1 . in order to deal with the growing demand for large quantities of therapeutic proteins in a timely fashion, expression systems are being optimized to reduce the time needed to generate stable clones as well as to increase the levels of protein secretion. this can be achieved by a combination of expression cassette optimization, cell engineering and selection process. we have previously developed the cumate gene-switch, which is a very efficient expression system for protein production [1] .
we have shown that the cumate-inducible promoter (cr5) was the strongest promoter we had tested so far in chinese hamster ovary (cho) cells. with this promoter, we were able to generate stable cho pools capable of producing high levels of a fc fusion protein (900 mg/l), outperforming by 3 to 4 fold those generated with cmv5 and hybrid ef1α-htlv constitutive promoters. besides the strength of the cr5 promoter, we demonstrated that the ability to control both the time and the level of expression during pool generation and maintenance gave a real advantage to the inducible expression system. indeed, we observed that keeping the expression off during selection enabled the generation of pools with superior productivity compared with the pools whose expression was maintained on. moreover, preliminary results suggest that keeping recombinant protein expression down increases the frequency of high producer clones [2] . knowing that one of the main bottlenecks of the successful bioprocessing of recombinant proteins using cho cells is the rapid isolation of a high producer, our data suggest that the cumate gene-switch system could be a valuable platform for the generation of stable clones. the main regulatory authorities and organizations demand proof of monoclonality for biotechnological producer cells. with increasing pressure to shorten timelines and to improve drug safety, technologically advanced methods have to be established to ensure that production cell lines are derived from a single progenitor cell. sartorius stedim cellca's single cell cloning approach is based on one round of fluorescence-activated cell sorting (facs) using becton dickinson (bd) facsariatm fusion cell sorter combined with photodocumentation by synentec cellavista microscopic imaging system. for the approach, critical process parameters such as different cell lines, viability and cell aggregation levels were investigated separately to assess their contribution to the probability of monoclonality. 
immediately after single cell cloning into 384-well plates (1 cell/well) the plates were centrifuged, followed by imaging using the cellavista (day 0). further cellavista images were taken on day 1, day 2 and on one day between day 5 and 7. outgrowth was defined at day 14. eight cell lines expressing different recombinant products were investigated to calculate the probability p(d) of having ≥ 2 cells/well after facs sorting, the apparent probability p(i) of having ghostcells (cells that are out-of-focus and, thus, are not visible during initial microscopic imaging), and the apparent probability p(k) of having ghostcells that outgrow the 384-well stage (fig. 1 ). using these results, the probability of obtaining a monoclonal cell by using sartorius stedim cellca's single cell cloning approach was determined (table 1) . conservative examination: p(monoclonal, conservative) = 1 - (p(d) x p(i)); realistic examination: p(monoclonal, realistic) = 1 - (p(d) x p(k)). cell pools with low viability can theoretically impact the probability of monoclonality by e.g. diminishing microscopic imaging quality (cell debris). therefore, pool cell line 1 with very low viability (≥36 %) was used to demonstrate that the probability of monoclonality is still 99.9 % in case of low viability on the day of sorting: p(monoclonal, conservative) = 1 - (p(d) x p(i)) = 99.9 %; p(monoclonal, realistic) = 1 - (p(d) x p(k)) = 99.9 %. furthermore, cell pools with high aggregation levels can theoretically impact the probability of monoclonality, because cells sticking together during facs sorting increase the probability p(d) of having ≥ 2 cells/droplet.
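the two estimates defined above reduce to one line of arithmetic each. the following is a minimal sketch of that calculation, assuming only the formula 1 - (p(d) x p(i or k)) given in the text; the function name and the input values are hypothetical and are not data from the study.

```python
def p_monoclonal(p_doublet: float, p_ghost: float) -> float:
    """Return 1 - p_doublet * p_ghost, the form used in the text above.

    p_doublet -- P(D): probability of >= 2 cells per well after sorting
    p_ghost   -- P(I) for the conservative estimate (ghost cell present),
                 or P(K) for the realistic one (ghost cell that grows out)
    """
    for name, value in (("p_doublet", p_doublet), ("p_ghost", p_ghost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 1.0 - p_doublet * p_ghost


# Hypothetical inputs, chosen only to show the arithmetic.
P_D = 0.02  # P(D)
P_I = 0.03  # P(I)
P_K = 0.01  # P(K)

conservative = p_monoclonal(P_D, P_I)
realistic = p_monoclonal(P_D, P_K)
print(f"conservative: {conservative:.4%}")
print(f"realistic:    {realistic:.4%}")
```

because a ghost cell that grows out must first be present, p(k) cannot exceed p(i), so the realistic estimate is never lower than the conservative one.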
therefore, pool cell line 8 with high aggregation levels (≥11.1 %) was used to demonstrate that the probability of monoclonality is still ≥ 99.9 % in case of highly aggregated cell pools on the day of sorting: p(monoclonal, conservative) = 1 - (p(d) x p(i)) > 99.9 %; p(monoclonal, realistic) = 1 - (p(d) x p(k)) > 99.9 %. conclusions in summary, there is no obvious correlation between protein product type and the determined probabilities for monoclonality. furthermore, pools with a viability as low as 36 % and pools with an aggregation level as high as 11.1 % can be used for scc resulting in acceptable probabilities of monoclonality. background ich guidance [1] requires that any cell line used to produce biopharmaceuticals originates from a single progenitor cell. recently, there has been increased scrutiny of the method(s) used to achieve this requirement. here, we review the suitability of the legacy capillary aided cell cloning (cacc) method in light of this changing landscape of expectations. the cacc method is based on the 'spotting' technique [2] and relies on independent visual confirmation by two scientists of the presence of a single cell in a 1 μl droplet. this method achieves a high probability of monoclonality in one cloning round. although the method has since been replaced by facs single cell deposition for routine use, it remains a viable cloning method.
-performed by trained scientists
-dilute culture to 1500 ± 500 cells/ml with ≤2% doublets
-draw cell suspension into pipette tip by capillary action; tap tip against the centre of the base of each well of a 48 well plate
-size of resulting droplet = ~1 μl (fig. 1a )
-two scientists independently view all wells using a microscope (initially use 40x magnification with the entire rim of the droplet visible within the field-of-view; next, examine particles using 100x or 200x magnification to confirm they are cells) and individually record the number of cells present in each well's droplet (fig. 1b to d)
-exclude droplet from further analysis if full visualisation is hindered (fig. 1e to h)
-add growth medium, and incubate plates. record all wells containing colonies; only progress colonies from wells that both scientists agree contain only one cell
-data analysis:
 -each scientist's observations categorised as: 0 cells, 1 cell or >1 cell
 -observed outcome for each well: growth or no growth
 -probability of monoclonality estimated from data using a statistical model
comparison with limiting dilution cloning (ldc)
increased accuracy of p(monoclonality) with cacc
-ldc weakness: no visualisation after seeding (to check that both well seeding and subsequent growth of colonies are well described by the poisson distribution), potentially overestimating p(monoclonality)
-addressed by cacc: visual examination, with colonies arising from wells seeded with 1 cell distinguished from those seeded with >1 cell
-visualisation step further strengthened by: using controls for exclusion of wells; measuring errors based on the presence or absence of colonies in wells where two scientists independently reported 0 cells; and formally analysing the data using a suitable statistical model
decreased time and resource requirements with cacc
-high p(monoclonality) possible in a single round, as each well is examined individually with only those containing a single cell progressed, and because the error rate for incorrect scoring is considered to be low
potential sources of error: two scientists miss a cell; one cell sitting on top of another and the two thus appearing as one.
an experiment was performed to estimate error frequency [3] .
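the ldc comparison above rests on the assumption that well seeding follows a poisson distribution. as a minimal sketch of what that assumption implies, the function below computes the probability that an occupied well received exactly one cell, λ·e^(-λ) / (1 - e^(-λ)), for a mean seeding density λ in cells per well. the function names and the example densities are hypothetical; the sketch ignores cloning efficiency and scoring errors, and it is not the statistical model used by the authors.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k cells in a well at mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def p_single_given_occupied(lam: float) -> float:
    """P(exactly 1 cell | at least 1 cell) under Poisson seeding.

    Idealised: every seeded cell is assumed able to form a colony.
    """
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))


# Hypothetical seeding densities (cells per well), for illustration only.
for lam in (0.1, 0.5, 1.0):
    print(f"lambda={lam}: {p_single_given_occupied(lam):.3f}")
```

lower seeding densities raise this probability but leave most wells empty, which is the trade-off that direct visualisation of each well is meant to avoid.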
conclusion -scientists miss a cell infrequently (in the range 0.4% to 1.3%, [3] ) -error frequency does not invalidate use of direct observation methods for cell cloning -single cell seen by both scientists is highly likely to be monoclonal -during method development, strategies established to control potential sources of error ( table 1 ) use of a contemporaneous visualisation approach, a strict control strategy, and a suitable statistical model (which takes into account potential errors) results in: -the cacc method being at least as robust as the ldc method -the cacc method being a reliable, single-step method for cloning to achieve a high p(monoclonality) background vector design is a key step in cell line development for the expression of therapeutic biologics. it is essential that the vector design results in high, stable expression of the encoded protein. other considerations include ease of cloning, stability for propagation in e. coli as well as in the mammalian host cell line, and ease of sequence amplification for verification of vector construction and for detection of insertion site and copy number in stably expressing cells. for these reasons, use of the same promoters and polya tails in dual cassette vectors, as is common for expression of the heavy and light chains of monoclonal antibodies, can be problematic. in order to minimize sequence similarities between the two expression cassettes, we have modified the promoters, introns, and polya tails of the light chain and heavy chain expression cassettes in the dual expression vector commonly used for the expression of therapeutic antibodies in the chozn® gs -/cell line development platform. gene synthesis and vector construction of igg1 and fluorophore-expressing vectors was done by atum. vectors were transfected into chozn® gs -/cells via electroporation. analysis of gfp and rfp expression was achieved using a macsquant instrument. 
selection and generation of stable pools and single cell clones from transfections with igg1-encoding vectors were performed as described in the chozn® platform technical bulletin. titer analysis was performed in static culture (96 well plate), in a 7 day tpp assay and in a 14 day fed batch assay using a qk fortebio. initial screening experiments identified a lead vector, #39, and a vector, #37, which produced very low titers and relatively few minipools expressing detectable levels of igg1. analysis of gfp and rfp expression from the modified vectors indicated relatively high expression from the rfp/hc expression cassette of vector #37. a stronger promoter resulting in overabundance of hc, known to be toxic to cells, provides a possible explanation for the poor results with this vector. interestingly, swapping the positions of the lc and hc in #37 resulted in a vector, #77, that outperformed the initially identified lead vector (fig. 1 ). the same change was made to vector #39 without any resulting improvement in titers (vector #78, fig. 1 ). interestingly, vector #39 had a smaller difference in relative promoter strengths, based on the mean channel fluorescence ratio of gfp to rfp, suggesting that overabundance of hc was not an impediment to igg1 expression from #39. poor titers were also seen with a modified version of vector #39 (vector #79, fig. 1 ) in which the glutamine synthetase selection cassette was in the reverse orientation. this second screen identified vector #77 as the lead vector design (fig. 1) . a full comparative study of vector #77 and the control vector was performed, culminating in the generation and comparison of single-cell clones from each. these studies have demonstrated the equivalence of these vectors in terms of igg1 titer.
this work has resulted in the identification and characterization of a dual expression vector with minimized similarity between the two expression cassettes, easing the cloning, propagation and analysis of vector integration in stable cell lines while maintaining the high, stable expression of the encoded protein of the original vector design. background traditional cell line engineering strategies mainly include an antibiotics resistance selection. in this process, cells are transfected with the goi (gene of interest) together with an antibiotics resistance gene and those cells are selected that survive treatment with the respective antibiotic [1] . although the gene responsible for the survival of the cell is transfected together with the goi, resistance is not necessarily linked to high goi expression. thus, a significant proportion of resistant cells may not express the goi at all, necessitating the search for alternative, more closely linked selection systems. sirnas (silencing inducing rnas) are short, noncoding rnas that can bind to complementary mrna and inhibit their translation. this function has been used in many approaches to silence the expression of certain genes [2] . with their short length, sirnas can be hidden in introns (non-translating regions) of genes, making it possible to couple the expression of a sirna to a gene. this way a cell produces a correlating amount of sirna when transcribing the gene, without adding any further translational burden on the cell. the co-expression of the sirna can be used as a selective marker by one of the following methods: (1) knock-down of a suicide gene to enable a cell's survival after suicide gene mrna transfection, (2) down-regulation of a surface marker which is used in macs (magnetic cell separation) to filter out wanted or unwanted cells, and (3) inhibition of a fluorophore marker for selection using facs without product specific antibodies. 
for sirna based cell selection systems, sirnas replace the commonly used antibiotics resistance as a marker. cells that produce goi will also produce the sirna that protects the cell from a suicide gene. the selection protein (suicide genes, fluorophores, surface markers, etc.) is transfected as mrna and is only expressed during selection. the general process is outlined in fig. 1. (a) the traditional antibiotics resistance marker is replaced by an sirna, which is cotranscribed with the goi. unlike in antibiotic resistance, the marker here is not a protein, reducing the translational burden and providing more resources for goi production [3] . transfection with the suicide gene proved to be 100% lethal within 2 days, with no outgrowth over two weeks. protection by expression of the sirna was shown to be efficient. currently a comparison of stable cell line development programs based on sirna selection and neomycin selection is ongoing. conclusions the novel selection system should speed up cell line development, as the system kills rapidly and directly selects for cells transcribing the product gene on a high level. we expect to see more high producers earlier in the process, which will allow for an easier and faster selection in the following steps. sirna based selection offers great opportunities. by directly selecting based on goi transcription and not a proxy marker, we expect more relevant cells on a pool level. in addition, the elimination of an antibiotics resistance allows more cellular resources for goi production. the system offers multiple ways of application, either by enriching wanted, or depleting unwanted cells. background single-cell cloning is an essential step used in the upstream development of transformed cell lines for therapeutic protein production. 
while single-cell clones are typically used to ensure product consistency, such low cell density cultures present a survival challenge; cells grow more slowly or may even not survive at low densities in protein-free media, costing the industry time and money and limiting the pool of candidate colonies for choice of production clones [1, 2] . to address this problem, we aimed to develop a highly efficient serum-free medium suitable for optimising single-cell cloning efficiency by studying a range of conditioned media (cm) samples isolated from different chinese hamster ovary (cho) cell lines. materials and methods cho-s, dg44 and cho-k1 were adapted to cho-s sfm-ii (gibco) medium for a minimum of three passages. conditioned media was then collected when the cultures reached a cell density of 1x10 6 cells/ml (typically day 2-day 3 depending on the growth profiles of each cell line and whether they grew in suspension or attached conditions). samples were then centrifuged twice to remove cell pellet/debris and stored at -20°c. the ability of conditioned media to support cho colony formation was then assayed using 96-well plates, seeding the cells at low cell density (1-10cells/well) by diluting down cho cultures in media/conditioned media. after incubation at 37°c for 10 days, cloning efficiency was assayed using a standard xtt assay. initial screening of the nine cm samples was performed using cho-k1 cells due to their widespread use in industrial antibody production. successful media candidates were subsequently screened using additional cho cell lines. table 1) . the k1-sfmii-cm product improved cell cloning efficiency for dg44 cells (avg. increase>1.5-fold) and cho-s cells (avg. increase>3-fold) ( fig. 1) and also the adherent cho-k1 cell line growing in atcc +5%fbs. the ability of conditioned media to support cho growth in limiting-dilution conditions (1, 6 and 10 cells/ml) was investigated. 
from a range of nine conditioned media samples, four compelling products have been identified which improve low-cell density growth of cho-k1 cells compared to sfm-ii control media. we feel that these early-stage conditioned media products may increase cloning efficiencies during upstream cho cell line development, resulting in financial savings for industry and increasing the possibilities of identifying particularly high-performing transformed clones. the main rate-limiting step in the upstream stages of protein biomanufacture is the isolation of stable, high producing cell clones. ubiquitous chromatin opening elements (ucoe®s) consist of at least one promoter region with an associated methylation-free cpg island from housekeeping genes; they possess a dominant chromatin opening capability and thus confer stable transgene expression. ucoe®-viral promoter (e.g. cmv) based plasmid vectors markedly reduce the time it takes to isolate high, stably producing cell clones. although some ucoe®-viral promoter combinations have been tested, they have not been thoroughly evaluated in chinese hamster ovary (cho) cells. plasmid vectors containing combinations of either the human hnrpa2b1-cbx3 ucoe® (a2ucoe®) or murine rps3 ucoe® linked to different viral promoters (hcmv, gpcmv, sffv) driving expression of an egfp reporter gene were functionally analysed by stable transfection into cho-k1 cells; expression was analysed by flow cytometry, and qpcr was used to determine vector copy number. the results at 21 days post-transfection and selection clearly indicate that the rps3 ucoe®-gpcmv and -hcmv combinations give the highest transgene expression, as shown in fig. 1 . the a2ucoe®-hcmv/gpcmv constructs were the next most efficacious but 2-fold lower than the rps3 ucoe® vectors. the sffv promoter linked with either of the two ucoe®s was the least effective, with expression levels 17-fold lower than the rps3-cmv constructs.
the rps3 ucoe®-gpcmv/hcmv constructs are now being further modified to include elements that will provide optimal post-transcriptional pre-mrna processing (splicing, polyadenylation, transcription termination, mrna stability) thereby maximising stable cytoplasmic transgene mrna levels and protein production. in the last 20 years, growing number of innovator biologics and biosimilars have formed a competitive environment, where speed and efficiency of generating robust and highly productive cell lines needs to be improved continuously. through various advances, especially in media development and process optimization, product titers as high as 10 g/l were achieved in the pharmaceutical industry (kim et al., 2012) for standard products such as monoclonal antibodies. nevertheless, other proteins e.g. bispecific antibodies, fc-fusion proteins or fab-related products are difficult-to-express (dte) in chinese hamster ovary (cho) and may result in delays or even in termination of the cell line development process. we developed a new robust pool generation approach (cld 2.0) addressing both, easy-and difficult-to-express molecules, while reducing timelines down to 5 months (cld standard = 6 months), improving reliability of cell line development as well as clearly increasing obtained titers. in order to create stable cell lines, we transfected our cho dg44 host cells by electroporation. cells processed using the standard approach were cultivated in selective medium or medium containing additional 2.5 nm methotrexate (mtx) for three weeks. after an amplification step with 30 nm mtx for three weeks, stable individual cell pools were expanded and clones were generated by facs-sorting. clones were analyzed for growth performance and product concentration in fed-batch studies. in our new cld 2.0 approach, we increased mtx concentrations (2.5 nm, 5 nm and 10 nm mtx) during the first selection phase of three weeks. afterwards we omitted the 30 nm mtx amplification step. 
thereby, pool generation finished four weeks earlier than in the standard approach. to evaluate the stability of cell clones derived from mini pools (mps) generated according to the cld 2.0 approach, stability studies were performed for eight weeks, including stability fed batches at t=2 weeks and t=8 weeks. altogether, three different proteins of interest with six cell clones each were tested. we adapted our cell line development process by increasing the initial selection pressure during the first selection phase, thereby allowing the omission of the 30 nm mtx amplification step. we observed that the capacity for amplification varied between products. cell lines with a protein titer ranging from >1 g/l to 1.5 g/l (dte) in shake flask fed-batch proved more susceptible to increased initial mtx levels and were thus not amplifiable with 30 nm mtx. in contrast, cell lines with a high protein titer (>1.5 g/l) adapted to 30 nm mtx easily and were amplifiable. final shake flask fed-batch data with cld 2.0 clones of high-expressing products showed titers comparable to clones from the standard approach. cld 2.0 clone titers for dte proteins showed on average a 2.0-fold increase compared to clones generated with the standard approach. titers of top producing clones were in a range of 1.8 g/l to 2.7 g/l (fig. 1) . furthermore, stability data of cld 2.0 cell clones from different dte products showed a stable specific productivity within a range of +/-15 % over eight weeks of cultivation. fed-batch titers at t=2 weeks and t=8 weeks were within a normal range of +/-20% of the standard 30 nm projects. our results demonstrate that cld 2.0 is a robust and reliable process for standard products (mab) and dte proteins. with our new process, we were able to increase the titer of difficult-to-express proteins up to 200%. by omitting the amplification step (30 nm mtx), 96 % of generated clones were stable over eight weeks of cultivation.
additionally, using the cld 2.0 approach, the timeline from dna to rcb was reduced to 5 months. background cho cells have become the most popular platform for production of therapeutic proteins [1] . however, the generation of high-producer cells is a time-consuming and labor-intensive process that requires the screening of a large number of cells to obtain a clone of high titer and stability. since the expression titer and stability of a clone are highly dependent on the site of integration, we demonstrate a new cell line development strategy that uses ngs to identify integration sites and crispr/cas9 to generate high-producing cell lines by targeted integration [1, 2] . to identify the high expression sites in cho cells, we employed ngs to analyze the integration sites of a high producing cell line (titer > 3 g/l). paired-end reads with one read mapped to the vector and the other read mapped to the cho reference genome were extracted to identify the integration sites. to test the expression activity of the integration sites, we employed crispr/cas9 to specifically integrate the antibody gene into the cho genome for expression. our data showed 4 integration sites in the high producing cell line. among the 4 integration sites, the is1 integration site was tested by crispr/cas9 for targeted integration of the antibody gene for expression. the is1-targeted cell pool presented a higher expression titer than cell pools generated by targeted integration into other integration sites (fig. 1a) . the single cell clones derived from the is1-targeted cell pool had a low copy number of the goi (fig. 1b) . after normalization by copy number, the single cell clones derived from the is1-targeted cell pool showed a high titer per copy (123-583 mg/l/copy) (fig. 1c) . this study demonstrated the generation of high-producing cell lines by crispr/cas9-mediated targeted integration. this approach will cost less time and labor than the traditional method.
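the read-pair criterion described above (one mate on the vector, the other on the host genome) and the per-copy normalization can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' pipeline: the `ReadPair` record, the reference name "vector", the 1 kb binning and both function names are hypothetical, and a real analysis would read the mate positions from an alignment file rather than from hand-built records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReadPair:
    """Mapping result of one paired-end fragment (hypothetical record)."""
    ref1: Optional[str]  # reference hit by mate 1, None if unmapped
    pos1: int
    ref2: Optional[str]  # reference hit by mate 2, None if unmapped
    pos2: int


def candidate_sites(pairs: Iterable[ReadPair],
                    vector_ref: str = "vector",
                    bin_size: int = 1000) -> Counter:
    """Count vector/genome chimeric pairs per genomic bin."""
    counts: Counter = Counter()
    for pair in pairs:
        if pair.ref1 is None or pair.ref2 is None:
            continue  # need both mates mapped
        mate1_on_vector = pair.ref1 == vector_ref
        mate2_on_vector = pair.ref2 == vector_ref
        if mate1_on_vector == mate2_on_vector:
            continue  # both on vector or both on genome: not informative
        if mate1_on_vector:
            ref, pos = pair.ref2, pair.pos2
        else:
            ref, pos = pair.ref1, pair.pos1
        counts[(ref, pos // bin_size * bin_size)] += 1
    return counts


def titer_per_copy(titer_mg_per_l: float, copies: int) -> float:
    """Normalize a titer by the integrated copy number of the goi."""
    if copies <= 0:
        raise ValueError("copy number must be positive")
    return titer_mg_per_l / copies
```

bins supported by many chimeric pairs are candidate integration sites; the exact junction would still need to be confirmed, for example from reads that span the vector-genome boundary.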
the active integration site will serve as a platform like a cassette player for therapeutic antibody production. background cho, hek and sp2/0 are the dominant host cells for biologics drug production. achieving high level of recombinant protein production by these cell lines still remains a challenge. in order to understand the potential roles of lipids in protein production, secretion, vesicular transport and energy metabolism, we coupled high-throughput transcriptomics and lipidomics technologies. quantitative lipidomics is an emerging 'omics technology which can help us understand the physiological limitations of each cell line. the two types of major lipid groups in cells are non-polar and polar lipids. polar lipids such as glycerophospholipids (pls) include phosphatidylethanolamine (pe), phosphatidylcholine (pc), phosphatidylinositol (pi), phosphatidylserine (ps), phosphatidylglycerol (pg), and phosphatidic acid (pa). in this study; we integrated two dimensional high performance thin layer chromatography (2d-hptlc) and mass spectrometry (ms) lipid analysis of sp2/0, cho, and hek cell lines to understand the major differences in the lipid content of these hosts. bligh-dyer method was used to extract the lipids and extracts were analyzed by hp-tlc and ms. the polar lipids were separated into different categories by 2-d hp-tlc using a chcl 3 -meoh-h 2 o (71:25:2.5, v/v/v) solvent system in the first dimension and a chcl 3 -meoh-acetic acid-h 2 o (76:9:12:2, v/v/v/v) solvent system in the second dimension. non-polar lipids were separated by 1-d hptlc using hexane-diethyl ether-acetic acid. 2,7-dichlorofluorescein dye was used to visualize both polar and non-polar lipids. further detailed analysis was performed on a qqq mass spectrometer (thermo tsq vantage, san jose, ca) using negative-ion and positive-ion esi modes as well as negative-ion esi mode in the presence of lithium hydroxide. 
in this study, quantitative lipidomics was coupled with transcriptomics to further understand the physiological pathways of hek, cho-m and sp2/0 cells. initial hp-tlc analysis indicated that major lipids in these industrial cell lines were pe and pc. other polar lipids such as pi, ps, pg, pa, and sm were lower compared to pc and pe in exponential and stationary phases of each cell line. figure 1 represents 2d hp-tlc results of hek with the relative quantitation of polar lipids. in order to investigate the lipid subgroups, shotgun ms analysis was conducted for both exponential and stationary growth phases of the three cell lines. ms analysis indicated that lysophosphatidylethanolamine (lpe) and lyso-phosphatidylcholine (lpc) amounts were 4 -10 fold and 2-4 fold higher in hek cells compared to sp2/0 and cho cell lines. sphingomyelin (sm) was another lipid subgroup that was shown to have a major difference between sp2/0 and other mammalian cell lines. sm was 30-65 fold lower in sp2/0 cell line compared to cho and hek. to understand these metabolic differences, transcriptomics analysis using illumina highseq and gene expression omnibus was conducted on these mammalian cells. the kyoto encyclopedia of genes and genomes (kegg) database was used to map the transcriptomics data to the lipid synthetic pathways. transcriptomics data mapping to kegg pathways demonstrated that differences in lpe and lpc pathways correlate with the expression profiles of secretory phospholipase a2 (spla2), lysophospholipid acyltransferase (lpeat), lysophosphatidylcholine acyltransferase (lpcat), and lysophospholipase (lypla) [1] . the hp-tlc and lc/ms findings demonstrated that high levels of lpe and lpc existed in the hek cell line and low levels of sm were observed in the sp2/0 cell line. 
coupling lipidomics with transcriptomics provides us with an improved understanding of the physiological differences across sp2/0, cho, and hek cell lines that could be used to guide cell engineering efforts with the goal of increasing the recombinant protein expression capabilities of these three cell lines. biopharmaceuticals are a class of biological macromolecules that include antibodies and antibody derivatives, generally produced from cultured mammalian cell lines via secretion directly into the media. manufacturing at medimmune requires the generation of chinese hamster ovary (cho) clonal cell lines capable of producing the biopharmaceutical product at commercially relevant quantities with optimal product quality. the isolation of cell clones based on random single cell deposition via fluorescence activated cell sorting (facs) provides a heterogeneous panel of expressers. we hypothesize that the application of facs to provide an additional sorting step based on desirable cell attributes that correlate with productivity, product quality or cell growth attributes could lead to the isolation of higher producing cell lines with enhanced product quality attributes. a panel of 20 cell lines expressing a model recombinant monoclonal antibody were characterised in terms of growth, productivity, intracellular recombinant protein and mrna amounts. assays were also developed to investigate cell attributes using the commercially available imagestream instrument, an imaging flow cytometer, which enables the investigation of cellular characteristics that correlate with cell productivity at the single cell level. characterisation revealed the cell lines exhibited a range of values for productivity, growth, and intracellular (ic) antibody mrna and protein expression, ideal for further imagestream characterisation. 
western blot and qrt-pcr analysis demonstrated that final titre correlated with both ic heavy chain (hc) protein and mrna amounts (pearson correlation coefficient (pcc) = 0.70 and pcc = 0.80, respectively). to assess productivity at the single cell level, assays multiplexing ic hc protein and mrna with cell attributes were therefore developed. initial assay development focusing on hc mrna and protein amounts has revealed interesting results; four cell lines displayed two distinct populations, one producing the antibody and another nonexpressing population. the ratio of these populations varied amongst the cell lines. images obtained from the imagestream have shown the cellular localization and expression of hc and lc message and protein (fig. 1) . for both message and protein, hc and lc colocalize in the cell. whether there is any relationship between ic hc protein and cell attributes at the single cell level was then also investigated, as well as correlations with cell culture parameters at the population level. at the population level, correlations were found between titre and ic hc protein and mrna (pcc = 0.84 and pcc = 0.79, respectively) confirming the data obtained by western blot and qrt-pcr analysis. a panel of 20 cell lines has been characterised at the population level and show a wide range of antibody expression profiles at both the mrna and protein level. in parallel, assays have been developed for the imagestream to measure hc and lc message and protein amounts at the single cell level. protein and message quantification with the imagestream are consistent with more traditional approaches, such as western blots and qrt-pcr, that operate at the population level. the developed assays are now being used to investigate single cell productivity attributes and for the isolation of more productive clones. background productivity and stability are key factors for the selection of cell line in protein drugs production. 
a large number of target gene copies integrated into the cell genome can lead to unstable production. therefore, cells with few copies of the target gene integrated at high-yield sites would be ideal production cells for manufacturing. the transposon system is known to control the integrated copy number of the target gene and to generate high-yield producing cells, so it could be a good approach for generating stable, high-yield producing cell lines that carry few copies of the target gene. we set out to develop a platform for generating high-yield producing cell lines carrying 1-2 copies of the integrated target gene using the transposon system. two cho cell lines, cho-s and dxb11, were used. cells were co-transfected with transposon and target gene expression plasmids. after drug selection, the cell pool with the highest productivity per target gene copy was taken forward to single cell cloning. the productivity and copy number of cell clones were determined, and the stability of cell clones was analysed after culture for about 60 generations. in the stable pools of cho-s and dxb11 cells, the productivities per integrated target gene copy were about 11-13 mg/l/copy and 68-75 mg/l/copy in a batch culture, respectively. after single cell cloning, the integrated copy numbers in most cell clones were less than three copies per cell. in cho-s and dxb11 cell clones, the productivities per integrated target gene copy were 20-60 mg/l/copy and 60-150 mg/l/copy in a batch culture, respectively. the productivity per integrated target gene in cell clones developed by the transposon system was much higher than that in cell clones developed by random integration (fig. 1a and b). to evaluate the productivity stability of cell clones developed by the transposon system, ten cell clones were analysed at generations 0, 30, 60, and 100.
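the per-copy productivities quoted above (mg/l/copy) are a batch titre divided by the integrated copy number. a minimal sketch of that normalisation follows; the clone names, titres and copy numbers are hypothetical values chosen for illustration and are not data from the abstract.

```python
def productivity_per_copy(titre_mg_per_l, copies_per_cell):
    """Return batch titre (mg/L) divided by integrated transgene copies per cell."""
    if copies_per_cell <= 0:
        raise ValueError("copy number must be positive")
    return titre_mg_per_l / copies_per_cell


# Hypothetical clones: (name, batch titre in mg/L, integrated copies per cell).
clones = [("clone_a", 120.0, 2), ("clone_b", 150.0, 1), ("clone_c", 90.0, 3)]

# Rank clones by per-copy productivity, highest first.
ranked = sorted(
    clones,
    key=lambda c: productivity_per_copy(c[1], c[2]),
    reverse=True,
)

for name, titre, copies in ranked:
    print(f"{name}: {productivity_per_copy(titre, copies):.1f} mg/L/copy")
```

ranking by the normalised value rather than by raw titre favours clones whose few integrated copies sit at productive sites, which is the selection criterion the platform describes.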
of interest, about 80% of cell clones were stable at generation 60 but had lost productivity by generation 100 (fig. 1c), implying that most cell clones could maintain stability for 2 months. using the optimized conditions of the transposon system to develop stable gene expression cells, the productivity per integrated target gene was higher than with random integration. these results suggest that our platform can develop high-yield producing cells with 1-2 copies of the integrated target antibody gene and can be applied to identify high-yield integration sites. background mammalian cells show an inefficient metabolism characterized by high glucose uptake and the production of large amounts of lactate, a widely known growth-inhibiting by-product [1]. recently, we have observed a different glucose-lactate metabolism in some cell lines. while some cell lines are unable to metabolize lactate, others can co-metabolize glucose and lactate under certain culture conditions, even during the exponential growth phase [2]. these metabolic differences between mammalian cell lines (cho, hek293 and hybridoma) have been studied by means of flux balance analysis (fba). three different cell lines were cultured in a 2-liter bioreactor: cho-s, hek293sf and hybridoma kb-26.5. for the fba, two adapted genome-scale metabolic models were used: a reconstruction of mus musculus for cho and hybridoma [3], and a reconstruction of the human metabolic model (recon 2) for hek293 [4]. in cultures where ph was not controlled, two different metabolic phases were observed for cho and hek293 cells. during the first phase both cell lines produced large amounts of lactate as a consequence of the high glucose consumption rates. interestingly, when ph dropped below 6.8, due to lactic acid secretion and accumulation, a second metabolic phase was identified, in which concomitant consumption of glucose and lactate was observed even during the exponential growth phase.
conversely, hybridoma cells were unable to co-consume lactate and glucose even under non-controlled ph conditions. therefore, the hybridoma physiological data used for the fba corresponded only to phase 1 of ph-controlled cultures. a summary of the main cell growth and metabolic parameters obtained from the different experiments is presented in table 1. fba shows (fig. 1, for hek293 cell culture) that lactate is produced in phase 1 because pyruvate has to be converted to lactate to fulfill nadh regeneration in the cytoplasm, and only a small amount of pyruvate can be transported into the tca cycle through acetyl-coa. cell metabolism in phase 1 is highly inefficient, as the majority of the carbon source is used neither for the generation of energy nor for biomass. in phase 2, in which mitochondrial ldh was considered, tca fluxes could be maintained as in phase 1 at the maximal rate encountered; hence, the energy available for cells to grow was similar in both phases, giving a similar growth rate. two different glucose and lactate metabolism behaviors have been observed in cho and hek293 cultures depending on the culture conditions: phase 1) glucose consumption and lactate production, and phase 2) simultaneous glucose and lactate consumption. in contrast, only phase 1 was observed in hybridoma cultures even when ph was not controlled. fba showed that tca fluxes in phase 1 and phase 2 were similar, giving similar cell growth rates, but the glucose uptake rate was much lower in phase 2 due to lactate co-consumption. some authors hypothesize that cells metabolize extracellular lactate as a strategy for ph detoxification [2]. glucose and lactate co-metabolization resulted in a better-balanced cell metabolism, as can be seen from the calculated metabolic fluxes, with minor effects on cell growth.
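the phase 1 argument (lactate secretion is forced by cytosolic nadh regeneration, leaving little pyruvate for the tca cycle) can be illustrated with a hand-solved steady-state balance. this is a toy sketch and not the genome-scale fba used in the study: it assumes the textbook glycolytic yield of 2 pyruvate and 2 nadh per glucose, and it lumps all non-ldh nadh reoxidation into a single hypothetical "shuttle" with a fixed capacity. the function name, the capacity parameter and the flux values are illustrative assumptions.

```python
def phase1_balance(glucose_uptake, shuttle_capacity):
    """Toy steady-state balance of cytosolic pyruvate and NADH (arbitrary flux units).

    Assumptions (not from the abstract):
      * glycolysis yields 2 pyruvate and 2 NADH per glucose;
      * cytosolic NADH is reoxidised either by LDH (pyruvate -> lactate)
        or by a lumped mitochondrial shuttle with a fixed capacity;
      * biomass and all other drains on pyruvate are ignored.
    """
    if glucose_uptake <= 0:
        raise ValueError("glucose uptake must be positive")
    if shuttle_capacity < 0:
        raise ValueError("shuttle capacity cannot be negative")

    pyruvate_made = 2.0 * glucose_uptake
    nadh_made = 2.0 * glucose_uptake

    # NADH balance: whatever the shuttle cannot reoxidise must go through LDH.
    via_shuttle = min(shuttle_capacity, nadh_made)
    lactate_out = nadh_made - via_shuttle

    # Pyruvate balance: pyruvate not reduced to lactate is left for the TCA cycle.
    pyruvate_to_tca = pyruvate_made - lactate_out

    return {
        "lactate_out": lactate_out,
        "pyruvate_to_tca": pyruvate_to_tca,
        "fraction_to_lactate": lactate_out / pyruvate_made,
    }


# Illustrative values only: high glucose uptake, limited shuttle capacity.
result = phase1_balance(glucose_uptake=1.0, shuttle_capacity=0.5)
print(result)
```

in this simplified picture the pyruvate flux into the tca cycle equals the shuttle flux, so raising glucose uptake beyond the shuttle capacity only increases lactate secretion, which mirrors the inefficiency described for phase 1.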
the observation of glucose and lactate co-consumption, together with its deeper study and characterization, could open the door to novel culturing strategies aimed at increasing bioprocess productivity. background transient protein expression in mammalian cell lines has gained increasing relevance as it enables fast and flexible production of high-quality eukaryotic protein. considerable efforts have thus been made to overcome existing limiting aspects of transient gene expression systems, in terms of cell lines, cell culture-based systems, and protein production in a cost-effective manner. milligram amounts of protein per liter can be produced within several days, allowing a significant shortening of the bioproduction process in comparison to protein production from stable clones. to ensure the robustness of the process, it is essential to have a reliable and easy-to-use transfection method. to address the need for a reliable transfection reagent, we developed peipro®, the only commercially available pei optimised for mid to large-scale transient protein production during process development. peipro® is a non-polydisperse and fully-characterised polymer that has become the gold standard pei due to its reliability and reproducibility in high dna delivery efficiency and in the ensuing high protein production yields. here, we present experimental data showing the benefits of using peipro® for protein production in comparison to other peis. we further demonstrate the compatibility of peipro® with recombinant protein production in the most commonly used chemically-defined media. materials and methods suspension hek-293 and cho cells were cultured in shaker flasks in various synthetic media, as listed in table 1. hek-293 and cho cells were resuspended at 1×10^6 cells/ml of serum-free medium on the day before transfection.
cells were transfected with 0.5-1 mg of plasmid dna encoding the luciferase reporter gene using peipro®, pei "max" and l-pei 25 kda (polysciences, warrington, pa) resuspended at 1 mg/ml according to the manufacturer's recommendation. protein expression of the luciferase reporter gene was assayed 48 hours post-transfection by affinity chromatography using protein g (hplc). comparison of peipro® to other commercially available peis was achieved by transfecting suspension hek-293 and cho cells with plasmid dna encoding the luciferase reporter gene. luciferase production yields obtained in hek-293 and cho cells were at least 5-fold and 10-fold higher, respectively, when using a similar amount of peipro® in comparison to the other peis (fig. 1). furthermore, peipro® was the only pei that led to similar luciferase production yields when decreasing the amount of plasmid dna per liter of cell culture. conversely, at least 1 mg of plasmid dna and 4-fold more of pei "max" and l-pei 25 kda were needed to obtain a similar luciferase expression range in both hek-293 and cho cells. we further assessed the compatibility and versatility of peipro® by measuring protein production yields obtained in the most commonly used animal-free synthetic media. as shown in fig. 2, peipro® leads to high protein production yields in several commercially available media formulations for hek-293 and cho cell lines. peipro® is the only fully characterised pei transfection reagent that is suitable for reliable and reproducible recombinant protein production, irrespective of the scale of production and of the type of adherent and suspension cell culture system. fig. 1 (abstract p-274). peipro® requires less reagent and a similar or lower amount of dna compared to other peis. suspension hek-293 and cho cells were seeded at 1×10^6 cells/ml in serum-free medium and transfected with peipro®, pei "max" and l-pei 25 kda (polysciences, warrington, pa) resuspended at 1 mg/ml.
luciferase expression was assayed 48 h after transfection using a conventional luciferase assay. fig. 2 (abstract p-274). peipro® is optimized for transfection of hek-293 and cho cells in several specific synthetic culture media. suspension hek-293 and cho cells were seeded following the recommended protocol in serum-free media and transfected with peipro® using the standard conditions. igg3-fc production was assayed 48 h after transfection using protein g affinity quantification (hplc). monoclonal antibodies (mabs), which are widely used in anticancer therapies, are mainly produced by mammalian cell lines. mab conjugation to biological molecules to enhance their antitumor activity offers a powerful new tool for anticancer therapies. we have assessed the production of the commercially approved anti-her2 therapeutic antibody trastuzumab (tzmb) [1] and also its fusion with interferon-α2b (ifnα2b). two cloning strategies, transfecting cho-s and hek293 cell lines either with two bicistronic plasmids or with a single tricistronic plasmid, have been assessed. the in vitro efficacy of both antibodies has been tested and compared side by side. tzmb heavy and light chains were cloned in two bicistronic plasmids (pirespuro3 and piresneo3, clontech) and in a tricistronic plasmid derived from pirespuro3. ifnα2b was spliced to the tzmb heavy chain by overlap extension pcr and the resulting tzmb-ifnα2b fusion protein was also cloned in the expression vectors in the same way as non-modified tzmb. selected cell pools were cultured in 125 ml shake flasks containing sfmtransfx supplemented with 10% v/v of cell boost 5 (hyclone), 4 mm of glutamax (gibco) and 2 μg/ml of puromycin, and also with 700 μg/ml neomycin in the case of the cells transfected with piresneo3. cells were cultivated in the same conditions as described elsewhere [2]. purified products (using protein a chromatography (hitrap mabselect sure, äkta avant 150)) were quantified by both elisa and sds-page.
an antigen binding test was performed in the sk-br-3 breast cancer cell line by means of flow cytometry analysis. the biological activity of the different candidates was tested with an mtt assay. both tzmb and the fusion protein tzmb-ifnα2b have been successfully expressed in cho-s and hek293, whose use for heterologous protein expression has been optimized in prior work [3]. the tricistronic strategy was the most efficient, showing a 3.5-fold increase in productivity with respect to the bicistronic double-transfection for tzmb in cho-s cells and a 5-fold increase in hek293 cells (fig. 1a). in the case of tzmb-ifnα2b, the tricistronic strategy also achieved higher productivities than the bicistronic one (fig. 1b). regarding the differences in specific productivity between the two cell lines tested, hek293 emerged as the best production host candidate, for both strategies (tricistronic and bicistronic) and for both proteins, showing a 1.5-fold increase in productivity with respect to cho-s cells for tzmb using the tricistronic strategy. tzmb and tzmb-ifnα2b were analysed in terms of their antigen binding capacity, and both were found to bind efficiently to her2+ sk-br-3 cells (fig. 1c). thus, the antibody affinity for the her2 antigen was not affected by fusion to ifnα2b. finally, the antiproliferative activity of tzmb and tzmb-ifnα2b was assessed on the same sk-br-3 cells. at a concentration of 500 nm of tzmb, and after a 72-hour incubation, sk-br-3 cells showed 83% growth with respect to the untreated control. however, no antiproliferative effect was observed for tzmb-ifnα2b (fig. 1d). the tricistronic strategy provides higher productivity yields in hek293 and cho-s cell lines for both recombinant proteins (trastuzumab and tzmb-ifnα2b). regarding which cell line is the best production host candidate, hek293 achieved higher productivity than cho-s cells for the two proteins tested.
all constructs preserved binding affinity for the antigen: trastuzumab and tzmb-ifnα2b both bind efficiently to the her2 antigen present on sk-br-3 cells. finally, tzmb-ifnα2b does not present an improved antiproliferative effect with respect to trastuzumab when compared by means of an in vitro assay. fig. 1 (abstract p-276). expression of trastuzumab (a) and trastuzumab-ifnα2b (b) from the bicistronic strategy (bc) and the tricistronic strategy (tri) with cho-s and hek293 cells. relative specific productivity units are used for comparing the different strategies. (c) antigen binding analysis of trastuzumab and trastuzumab-ifnα2b. (d) antiproliferative activity of trastuzumab and trastuzumab-ifnα2b on sk-br-3 cells. the genetic engineering of patient-specific t cells with lentiviral vectors (lvv) expressing chimeric antigen receptors (car) for late phase clinical trials requires the large-scale manufacture of high-titer vector stocks. the state-of-the-art production of lvv is based on 10- to 40-layer cell factories transiently transfected in the presence of serum. this manufacturing process is severely limited by its labor intensity, open-system handling operations, its requirements for significant incubator space, plus costs and patient risk due to the presence of serum. to circumvent these limitations, this study aims to develop a stable and serum-free process to produce lvv with pei-mediated transfection. in addition, this study also focuses on the development of a production system using not only a gfp marker but also a therapeutically relevant transgene (cd20-car) [1]. therefore, three different cell lines (hek 293, 293t, 293ft) were investigated concerning their lvv productivity and their growth behavior in the in-house serum-free medium transmacs. as part of this, design of experiments was used to investigate the optimal conditions for pei/dna transfection.
furthermore, this statistical approach was used to find an ideal ratio between the 3rd generation plasmids (transfer plasmid cd20-car or gfp, envelope plasmid, packaging plasmids). in addition, different enhancers (sodium butyrate, lithium acetate, caffeine, trichostatin a, cholesterol, hydroxyurea, valproic acid) were investigated concerning their effects on productivity, comparing hek cultures producing lvv encoding the gfp marker or cd20-car. concerning productivity and growth behavior, hek 293t was the favored cell line for our serum-free lv manufacturing process. in addition, an additive screen revealed that sodium butyrate alone had the most promising effect on both gfp-lvv and cd20-car-lvv production. after pei/dna titration, we were finally able to increase lvv productivity by lowering the pei/dna amount at higher cell densities relative to our standard transfection protocol. furthermore, the titration for the optimal plasmid ratio revealed that, for large transfer constructs, higher amounts of transfer plasmid are required than for smaller constructs to achieve high productivity (fig. 1). the outcome of these experiments enabled the development of a robust hek293t-based process to produce clinically relevant lvv under serum-free conditions. furthermore, it provides insight into how therapeutic genes and the expression of the transgene can influence cell productivity. led to a vast increase in productivity, cho cells yield less than other expression systems like yeast or bacteria [1]. to improve yields and find beneficial bioprocess phenotypes, genetic engineering plays an essential role in recent research. the mir-23 cluster with its genomic paralogues (mir-23a and mir-23b) was first identified as differentially expressed during temperature shift, suggesting a role in proliferation and productivity [2]. the common approach to deplete mirnas is the use of a sponge decoy, which requires the introduction of reporter genes.
as an alternative, this work aims to knock down mirna expression using the recently developed crispr/cas9 system, which does not require a reporter transcript. this system consists of two main components: the single guide rna (sgrna) and an endonuclease (cas9) which induces double strand breaks (dsbs). these dsbs can result in insertions or deletions (indels) of base pairs which can disrupt mirna function and processing [3]. a cho-k1 cell line stably expressing an igg was used for knockdown experiments. sgrnas were designed to target the seed region of each mirna member and stable mixed populations were generated (fig. 1a). total rna from each mixed population was reverse transcribed into cdna using mirna-specific stem-loop primers. the expression was quantified by rt-qpcr. to further analyse the range of indels, the mir-23a and mir-23b clusters were amplified by a standard high-fidelity pcr. amplicons were cloned into the pcr™-topo® vector and positive clones were analysed by sanger sequencing. cell growth was monitored using viacount™ viability stain on a guava™ benchtop flow cytometer. productivity was assessed by elisa. student's t-test was used for statistical analysis. mirna expression was significantly reduced in mixed populations. a knockdown of up to 95% was achieved for mir-23a, mir-23b and mir-24. the knockdown in mir-27a and mir-27b expression was considerably less, between 70 and 90% (n=3, * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001) (fig. 1b). furthermore, various sizes of indels were generated by targeting the seed region. smaller indels (+1/+2/-1/-2 bp) seemed to be more common, but larger deletions were detected as well (fig. 1c). knockdown of mir-23a, mir-23b and mir-27b increased viability in late stages of the culture. depletion of mir-27a reduced growth significantly, whereas knockdown of mir-24 increased proliferation and boosted igg titers (table 1).
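the abstract reports knockdown percentages from rt-qpcr but does not state how relative expression was calculated. a common choice is the comparative ct (2^-ΔΔct) method, sketched here under that assumption and under the further assumption of 100% primer efficiency. the ct values and the use of a single reference transcript are hypothetical and are not data from the study.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target in knockdown vs. control cells (2^-ddCt).

    Assumes 100% amplification efficiency, i.e. one cycle equals a 2-fold change.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # target normalised to reference, knockdown
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # target normalised to reference, control
    delta_delta = delta_kd - delta_ctrl
    return 2.0 ** (-delta_delta)


def knockdown_percent(rel_expr):
    """Convert relative expression (1.0 = unchanged) into percent knockdown."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values: the target miRNA appears 4 cycles later after editing,
# while the reference transcript is unchanged.
rel = relative_expression(
    ct_target_kd=28.0, ct_ref_kd=18.0,
    ct_target_ctrl=24.0, ct_ref_ctrl=18.0,
)
print(f"relative expression: {rel:.4f}")
print(f"knockdown: {knockdown_percent(rel):.2f}%")
```

under these assumptions a shift of about four cycles corresponds to a knockdown in the range reported for mir-23a, mir-23b and mir-24, while a shift of two to three cycles corresponds to the 70-90% range reported for mir-27a and mir-27b.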
in this work, we have shown that crispr/cas9 can be successfully applied as a tool to knock down mirna expression in cho cells. the data were generated using mixed pools, and it remains to be established whether both alleles can be successfully targeted, e.g. using next-generation sequencing of individual clones. background chinese hamster ovary (cho) cells are the most widely used host cell line for the production of therapeutic antibodies. pre- and post-translational modifications and optimization of culture methods have contributed to increased productivity, resulting in very high titres [1, 2]. however, it has been pointed out that the intracellular secretion process is a bottleneck in the production of therapeutic antibodies [3]. in addition, the details of the process of secretion of humanized recombinant antibodies from cho cells have not been well investigated. in this study, we thus analysed the detailed process of secretion of therapeutic antibodies using cho cell lines that have already been established as high producers, with the aim of obtaining information for the more rational and efficient establishment of high-producer cells. we performed 1) a chase assay, 2) immunofluorescent microscopy observation, and 3) size exclusion chromatography (sec) analysis to investigate the duration of secretion, the bottleneck position, and the formation of recombinant igg, respectively. high-producer cho cells expressing humanized igg1 [4] and igg3 were used. for the chase assay, cells were cultivated in shake flasks with serum-free medium containing 50 μg/ml cycloheximide (chx) to stop nascent peptide synthesis. the amounts of igg both remaining in the cell and secreted into the medium at each time point were measured by quantitative western blotting. for immunofluorescent microscopy observation, cells were cultivated on coverslips with chx for 4 h.
immunofluorescent staining against the recombinant igg, endoplasmic reticulum (er), and golgi apparatus was performed after chemical fixation. for sec, cells cultured with chx were re-suspended in a buffer containing tritonx-100 and injected into a column. the amount of igg in each fraction was measured by quantitative western blotting. the amount of igg3 in the supernatant increased until 4-6 h after the inhibition of protein synthesis by chx; however, it hardly changed thereafter (fig. 1, upper panel) . at this point in time, however, around 40% of igg still remained in the cells (fig. 1, lower panel) , meaning that all of the synthesized igg could not be secreted into the medium and remained in the cells for several hours. this result was almost the same as that of studies using igg1-expressing cells [5, 6] . the localization of igg in the cells was checked before and after the addition of chx, with the results showing that igg1 remained in the er and was hardly seen in the golgi apparatus [5] [6] [7] ; igg did not seem to be efficiently transported to the golgi apparatus. the sec experiment showed that most of the igg1 remaining in the cell seemed to form full-sized antibodies [5, 6] , but it could not be secreted despite this. the high-producer cells could not secrete all of the synthesized igg, and around 40% of igg remained in the cells for several hours. this incomplete secretion is a common phenomenon among cho cells producing different types of recombinant igg. the igg could not be transported from the er to the golgi despite its formation of fullsized antibodies. solving this bottleneck in the transportation of igg from the er to the golgi and/or achieving more efficient glycosylation of igg after the formation of full-sized antibodies might be the next target to improve productivity. background humanized monoclonal antibodies (mabs) are among the most promising drugs, but defined strategies for their modification are still not available. 
our work deals with humanization of murine mab2/ 3h6. the superhumanization approach leads to a loss of binding affinity which was partially restored by a single human-to-mouse backmutation (t98hr). [1] this residue was selected by synergistic combination of sequence analyses of antibody framework regions and structural information using novel in silico simulations. for structural stabilization, a conglomeration of tyrosine residues surrounding t98hr was identified, the so called "tyrosine cage". [2] analysis of the "tyrosine cage" was done by alanine scanning mutations with a double mutation variant t98hr + y27ha (bm09) and a triple mutation variant t98hr + y27ha + y32ha (bm10). in a recent series of experiments we tried to enhance binding affinity by three new variants with backmutations in the variable light chain (vl). originating from t98hr, residues in the vl were selected based on their spatial proximity to the cdr3 loop of the variable heavy chain. affinity improvement of t98hr was evaluated by vl-double backmutation variants t98hr + f46ll (su01) and t98hr + q49ls (su02) and a triple backmutation variant t98hr + f46ll + q49ls (su03). all five variants were expressed transiently in hek293-6e cells and binding affinities were investigated in two individual settings with bio-layer interferometry. in the first approach concentrated cell culture supernatants were directly applied and mabs were captured on protein a tips, blocked with 3d6scfv-fc and the association and dissociation of 2f5 igg was measured. for the second approach, the culture supernatants were purified and the affinity was determined with streptavidin biosensors. first, biotinylated 2f5 igg was bound and then the association/dissociation of the purified 3h6 variants was measured. affinity evaluation of concentrated culture supernatants with protein a sensor tips showed a decrease of binding affinity of bm09 and a loss of binding of bm10. 
the protein a measurement showed an increased binding strength of su01, su02 and su03 compared to su3h6 and bm07. su01 and su03 showed a higher binding affinity than su02. these results were confirmed with purified variants by the streptavidin bio-assay (fig. 1). alanine scanning of the tyrosine cage demonstrated a reduction of binding affinity (bm09) and a severe loss of binding (bm10), indicating that the tyrosine cage plays an important role in supporting a correct cdr loop conformation. further affinity improvement of the single mutation variant t98hr could be achieved via mutations in the vl. this demonstrates the underestimated role of the vl in the interaction with its binding partner. although cho cells are a major expression system for production of recombinant biopharmaceuticals, the molecular and cellular background characterizing a high producer is largely unknown. it has been observed that in producer cell lines important signaling pathways such as akt signaling are altered in characteristic ways. thus, analyzing the corresponding signaling events should lead to the identification of key elements characterizing high producer cells. to investigate this, our emphasis lies on the phosphorylation status of the proteins involved, which act as reversible switches in all signaling pathways. we aimed to establish a workflow for cho-specific phosphoproteomics and focused on igf signaling, as cell culture media are often supplemented with this growth factor. two producer cell lines and the corresponding parental cells were cultivated in a stable isotope labeling with amino acids in cell culture (silac) experiment, followed by quantitative ms phosphoproteomic analysis including cho-specific data evaluation. the chosen cho cell lines were cultivated in triplicate in silac media containing isotopically-labeled lysine/arginine (hlys/harg) and in parallel in identical standard media (llys/larg, tcx10d, xell).
cell density, viability, metabolism and cell cycle distribution were monitored during 50 ml batch culture for 7-8 days. at day 3.25 igf was added to the hlys/harg cultures. 5 min later a part of the cells was harvested. for ms analysis, igf-treated (hlys/harg) cultures and control (llys/larg) cultures were combined. the subsequent ms sample preparation workflow included digestion of whole protein lysate and phosphopeptide enrichment via tio2 beads. nanolc-esi-orbitrap ms (q exactive plus, thermo fisher scientific) of phosphopeptides was executed with subsequent identification and quantification in maxquant [1]. in addition to silac quantification of h/l ratios for investigation of igf effects, the acquired data were also used to perform label-free quantification (lfq) in maxquant [1] for comparison of cell lines. statistical significance was calculated via t-test (p<0.05) or anova (permutation-based fdr<0.05) in perseus [2]. results igf effects on growth and production the igf treatment resulted in prolonged viability for all cell lines. however, an increased vcd was only observed for producer cell line 1, resulting in an enhanced integral of vcd (ivcd). for the parental cells, growth was inhibited by igf, although s-phase cells were enriched at least temporarily (fig. 1a). regarding antibody production, igf led to a decreased qp and product titer, concomitantly with an increase in s-phase cells (fig. 1a). this inverse correlation of proliferation and cell specific productivity is known from different productivity enhancing molecules, like butyrate [3]. ms investigation of signaling events the phosphoproteomic experiment resulted in the identification of 10,485 class i phosphorylation sites. statistical evaluation of phosphopeptide abundances in perseus revealed 144 significant differences between the cell lines and led to producer vs. parental classifications (fig. 1b).
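the silac readout used for the igf effect is a heavy/light (h/l) intensity ratio per phosphosite, usually handled on a log2 scale. the sketch below illustrates the idea with a one-sample t statistic against a log2 ratio of zero; whether perseus was run with a one-sample or two-sample test is not stated in the abstract, so the choice of test, the function names and the intensity values are all assumptions made for illustration.

```python
import math
import statistics


def log2_hl_ratios(heavy, light):
    """Per-replicate log2(H/L) ratios for one phosphosite."""
    if len(heavy) != len(light):
        raise ValueError("heavy and light must have the same number of replicates")
    return [math.log2(h / l) for h, l in zip(heavy, light)]


def one_sample_t(values, mu=0.0):
    """t statistic for the mean of `values` against `mu` (no p-value computed)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sem = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / sem


# Hypothetical intensities for one phosphosite in three replicates:
# heavy channel = igf-treated, light channel = untreated control.
heavy = [400.0, 800.0, 400.0]
light = [100.0, 100.0, 100.0]

ratios = log2_hl_ratios(heavy, light)
t_stat = one_sample_t(ratios)
print("log2 H/L per replicate:", ratios)
print("mean log2 H/L:", statistics.mean(ratios))
print("t statistic vs. 0:", t_stat)
```

working on the log2 scale makes up- and down-regulation symmetric around zero, which is why a ratio of 1 (no change) becomes the null value of the test.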
the quantitative evaluation via silac yielded about 2,408 quantifiable phosphosites in at least 6 biological replicates. rapid phosphorylation changes after growth factor treatment indicated signaling towards protein synthesis, cell cycle and regulation of the actin cytoskeleton, amongst others. for 201 phosphosites, significantly different h/l ratios were calculated between the two groups (parental vs. producer); four of them are listed in table 1. the workflow to study phosphorylation states revealed differences between the related cell lines and gave insights into signal transduction in response to igf. on the one hand, igf treatment resulted in a fast and widespread upregulation of phosphorylation sites within akt and mapk signaling. on the other hand, a different phosphorylation status for producer compared to parental cell lines uncovered distinctions in biological processes like rna and dna binding and regulation of the cytoskeleton. in sum, our successfully established phosphoproteomic approach allows the detection of important signaling key players in cho cells that can subsequently be targeted through cell engineering or small molecule treatment. to improve antibody production in the cho cell expression system, it seems useful to up- or downregulate the expression of genes involved in antibody folding, secretion, and cell metabolism. many cell engineering approaches, including gene introduction, knockout and knockdown, have been employed to enhance recombinant antibody production [1]. however, identifying production enhancer genes is the rate-limiting step for cho cell engineering, because the conventional method requires a series of experiments including genomic integration of the tested genes, selection of stable cell clones and cell culture experiments on several clones. in this study, we propose an approach for rapid evaluation of production enhancer genes based on an episomal expression system.
plasmid vector carrying the epstein-barr virus (ebv) encoded nuclear antigen 1 (ebna1) was transfected into cho cell line producing igg1 antibody. after g418 selection and single colony isolation, ebna1 expression was checked with capillary electrophoresis system wes (proteinsimple). ebv ebna1-antibody (1eb12) was used for detection as the primary antibody. the expression vector for the gene of interest was prepared by inserting 1508 bp of an orip dna sequence into a plasmid vector carrying cag promoter, resulting in the potc vector. pei max (polysciences, inc.) and balancd transfectory cho (irvine scientific) were used for the transfection. the number of viable cells and gfp-positive cells were counted using countess ii fl automated cell counter (thermo fisher scientific). the transfected cells were cultured in cellstar cellreactor tubes. the tubes were incubated in a climo-shaker isf1-x (kuhner). antibody production was measured using biolayer interferometry with an octet qk system (fortebio). we constructed four cho cell lines stably expressing ebna1, termed igg1-eb01 to eb04. in capillary electrophoresis analysis, we observed a clear peak corresponding to the ebna1 expression in all four cell lines. we tested the transfection efficiency by potc-gfp plasmids. in the best transfection condition, pei/dna ratio of 1/1, igg1-eb01 cell showed the highest gfp-positive cell number (1.07×10 7 cell/ml) and transfection efficiency (95%) among the four cell lines. therefore, igg1-eb01 cell lines were selected for further study. after the transfection, the number of gfp-positive cells continued to increase even after the passage (fig. 1) , suggesting that the potc-gfp plasmid was stably retained and replicated by ebna1/orip system in igg1-eb01 cell lines. in preliminary experiments, we introduced three genes, mdh2, gss and gclm, into igg1-eb01 cell lines. 
cotransfection of these three genes led to an increase in igg1 production from 287±18 mg/l (control) to 334±21 mg/l at day 8 (p<0.05, t-test, n=3). this result suggests that these three genes work as production enhancer genes. conventional methods based on stable cells take up to 6 months to determine whether the gene of interest is beneficial for recombinant igg1 production. in contrast, identification of production enhancer genes is achievable within 10 days by our proposed method based on ebna1/orip system. the proposed method makes it possible to evaluate production enhancer genes in a rapid manner. the proposed method is a promising approach to identify genes enhancing recombinant antibody production. background 2g unic™ (2gun) technology comprises a set of protected genetic elements that improve protein production by acting on transcription as well as on translation. the elements can either be inserted into existing (platform) vectors or be provided as complete ready-to-use vectors. the technology can be used in stable and in transient transfection to boost protein production for product development and is being applied in cld for pharmaceutical proteins. in combination with antibiotic selection or dhfr selection, 2gun technology routinely results in 2-3 fold increase in expression of client antibodies or fusion proteins, both in pools and after clonal selection. previously, we have successfully combined 2gun technology with glutamine synthetase (gs) selection and the cho gs null cells of horizon discovery, resulting in clonal cell lines producing > 6 g/l of a biosimilar mab in fed-batch assay. here we present data on the successful application of the 2gun technology for the enhanced expression of a large (>300 kda) human heterotrimeric glycoprotein, a renowned difficult-to-express (dte) protein. 
all expression vectors comprised a hcmv promoter and bgh polyadenylation sequence in the expression cassettes for the gene of interest, and a selection marker gene with sv40 promoter and sv40 polyadenylation sequence. 2g unic™ vectors also contained genetic elements (2g unic™ technology, proteonic). cho gs null cells (horizon discovery) were transfected in duplicate with reference or 2gun expression vectors and selected in media lacking glutamine and containing the appropriate antibiotics. the bulk pools were seeded at equal viable cell density after obtaining maximum viability and cultured for 9 days without feeding (batch). expression of the target protein in cell culture supernatants of stable bulk pools was measured by elisa. the three protein subunit genes were expressed from vectors with different selection markers. in the reference constructs (without 2gun), the α, β, and γ chains were expressed from vectors with marker genes for zeocin, blasticidin, and gs, respectively. a similar 3 vector combination was also generated with 2gun elements integrated in each vector. in addition, 2 vectors with 2 subunits (γ-α and α-γ), each with a separate 2gun element, promoter and polyadenylation signal, were generated with a gs marker gene. cho gs -/cells were transfected with the 4 appropriate vector combinations in equimolar ratios and selected in bulk in medium lacking glutamine and 1 or 2 antibiotics. the 2-vector transfected cell pools recovered first, due to the presence of only 1 antibiotic in the medium (fig. 1a) . the pools transfected with three 2gun vectors recovered to maximum viability just a few days after the 2-vector 2gun pools. recovery of the reference pools took up to a week longer than the 2gun pools. production of each pool was assessed in a batch production run in shaker flasks. all 2-vector 2gun pools which recovered first produced titers around 0.1 g/l, which is almost 10-fold higher as compared to the production by reference pools (fig. 1b) . 
the highest titers of 0.5 g/l were obtained in the 3-vector 2gun pools. these data show that the 2g unic™ genetic elements can be successfully used to obtain a significant increase in the titer of difficult-to-express proteins. similar results have been obtained with other dte proteins, including fc-fusion proteins and bi-specific antibodies (not shown). the expression of a large, glycosylated, multimeric difficult-to-express protein can be increased more than ten-fold in cho gs pools by application of 2g unic™ genetic elements. the highest expression is obtained using a separate vector for each subunit. characterization of antibody-producing cho cells with chromosome aneuploidy noriko yamano 1,2 , sho tanaka background chinese hamster ovary (cho) cells are commonly used as host cells to produce biopharmaceuticals. however, the number of chromosomes in cho cells varies. previously, dg44-sc20 and dg44-sc39 cell lines with modal chromosome numbers of 20 and 39 were isolated from parental cho-dg44 cells, from which igg3-expressing cell lines named igg3-sc20 and igg3-sc39 were established, respectively. the igg3-sc39 cell pool showed a higher specific igg3 production rate than the igg3-sc20 cell pool [1]. even though all of the igg3-sc20 clones and half of the igg3-sc39 clones contained the same number of vector integration sites (a single integration site), the igg3-sc39 cell clones produced more igg3 following the culture of single-cell clones than any of the igg3-sc20 clones [1]. in this study, we performed transcriptome analysis to investigate the characteristics of high-producer cells with chromosome aneuploidy. transcriptome analyses using amplified fragment length polymorphism (aflp)-based high-coverage expression profiling (hicep) and de novo mrna-seq were performed on dg44-sc20, dg44-sc39, igg3-sc20 and igg3-sc39.
to compare cell lines with different numbers of chromosomes, transcriptome data from mrna-seq were adjusted for cell number using rna reference materials (nmij crm 6204-a; national institute of advanced industrial science and technology) mixed at equal amounts per cell. pathways related to differentially expressed genes were searched using keymolnet (km data). high-chromosome-number cho cells showed larger cell diameters, as determined by vi-cell (beckman coulter) measurement. the predicted volume ratios, based on these diameters, are 2.24 (dg44-sc39:dg44-sc20) and 1.59 (igg3-sc39:igg3-sc20). the levels of β-actin and the products of most other genes that were detected by mrna-seq differed by approximately 20% in the comparison between sc39 and sc20 (sc39 > sc20). based on the analysis of gene expression levels per cell volume, approximately 90% of detected genes showed lower expression in both dg44-sc39 and igg3-sc39 compared with the levels in dg44-sc20 and igg3-sc20, respectively. in addition, the number of genes whose expression level was decreased in igg3-sc39 compared with that in dg44-sc39 was larger than those showing the opposite pattern. the results of the comparisons between igg3-sc20 and igg3-sc39 indicate that differentially expressed genes were mainly related to cell growth (e.g. myc, smad), apoptosis (e.g. caspase), lipid metabolism (e.g. srebp, pparγ) and epigenetic histone modification (e.g. brca, hat) pathways. the mrna levels of myc, smad, caspase, brca and hat related genes were lower in igg3-sc39, while those of srebp and pparγ related genes were higher in igg3-sc39. the effects of these pathways on antibody production should be examined in future. in this study, we found that high-chromosome-number cho cells have lower amounts of mrna relative to their volume. a reduction per unit volume in the expression of genes that are required for survival might generate additional energy for recombinant protein production in high-chromosome-number cells. 
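as a minimal sketch of the volume argument above, the python snippet below assumes spherical cells, so that volume scales with the cube of the diameter, and reads the reported difference of approximately 20% as a per-cell expression ratio of 1.2 (sc39 over sc20). both points are assumptions made for illustration, and all function names are hypothetical rather than part of the original analysis.

```python
import math


def sphere_volume(diameter: float) -> float:
    """Volume of a sphere with the given diameter (length unit cubed)."""
    return math.pi * diameter ** 3 / 6.0


def volume_ratio(diameter_a: float, diameter_b: float) -> float:
    """Ratio of cell volumes, assuming both cells are spheres."""
    return sphere_volume(diameter_a) / sphere_volume(diameter_b)


def diameter_ratio_from_volume_ratio(v_ratio: float) -> float:
    """Invert the cubic scaling: diameter ratio implied by a volume ratio."""
    return v_ratio ** (1.0 / 3.0)


def per_volume_expression_ratio(per_cell_ratio: float, v_ratio: float) -> float:
    """Expression per unit volume equals expression per cell divided by volume."""
    return per_cell_ratio / v_ratio


if __name__ == "__main__":
    # Volume ratios are the values reported in the text;
    # the per-cell expression ratio of 1.2 is an illustrative assumption.
    reported = (("dg44-sc39:dg44-sc20", 2.24), ("igg3-sc39:igg3-sc20", 1.59))
    for label, v in reported:
        d = diameter_ratio_from_volume_ratio(v)
        e = per_volume_expression_ratio(1.2, v)
        print(f"{label}: diameter ratio {d:.2f}, per-volume expression {e:.2f}")
```

under these assumptions, a per-cell ratio of 1.2 combined with a volume ratio of 2.24 corresponds to a per-volume expression ratio of about 0.54, which is in line with the statement that high-chromosome-number cells carry less mrna relative to their volume.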
from an evolutionary perspective, an increased set of chromosomes underlies rapid evolutionary adaptation. although there are issues to be considered, such as stability, there may also be advantages to using high-chromosome-number aneuploid cho cells as production hosts for recombinant proteins. background human growth factors have an enormous therapeutic potential. among them, the bone morphogenetic protein-2 (bmp-2) can induce de novo bone formation, endowing the protein with a high therapeutic potential. however, finding a suitable recombinant production system for such a protein still remains a challenge. recombinant expression of hbmp2 was investigated in transiently transfected hek-293 cells and in stable clones established in cho-k1 cells, cultivated in excell and pro-cho5 medium, respectively. protein stability and interaction of hbmp2 with the producer cells were investigated in vitro using commercially available rhbmp2. in addition, we investigated a cell-free protein synthesis system harboring translocationally active microsomal structures, and hence having the potential to perform post-translational modifications, as an alternative production method. we showed that growth rates and viabilities of the rhbmp2-producing cells were similar to those of the parent cell line, while entry into the death phase was delayed in the case of the recombinant cells. the maximum rhbmp2 concentration detected in the culture supernatant was low for stable clones but could be greatly improved by combining the hek-293 transient expression system with batch reactor cultivation, which reflects a better compatibility of the codon usage in the human cells (table 1). hbmp2 protein is sensitive to slightly acidic ph and, to a lesser extent, to proteases (fig. 1a) and binds to both producer cell lines (fig. 1b); all of this could incidentally contribute to the low product titers. cell-free protein synthesis has been proposed as an alternative for "difficult-to-express" proteins.
since native hbmp2 is glycosylated, a cell-free system based on eukaryotic cell lysates is required for its production. cho cell lysates were chosen, since they had previously been established as the most productive eukaryotic system in our hands [1], while concomitantly enabling a direct comparison to the production of hbmp2 in stable clones established in cho-k1. the ability to perform post-translational modifications is a major advantage of eukaryotic systems. the cho lysates prepared by the protocol used here have previously been shown to contain significant amounts of endogenous microsomes derived from the endoplasmic reticulum during lysis [2]. to enforce translocation of the target protein into the microsomal structures, a melittin signal peptide was fused to the hbmp2 cdna. the glycosylation of the protein was assessed by enzymatic treatment (pngase, endoh) and confirmed using 14c-mannose for the de novo protein synthesis. upon cell-free protein synthesis, the hbmp2 yield was 100-fold higher than the best one obtained in the hek-293 cells. the difference becomes even more dramatic when productivities are considered (table 1), i.e. the fact that maximum product titers are reached within 3 h in the cell-free system compared to 120 h in the cell-based ones. this demonstrates that the cell-free expression system is more suitable than mammalian cell expression for the production of glycosylated human bmp2 (table 1) [3]. human growth factors are complex molecules, which makes their production in mammalian cells desirable. however, low product titers caused by a variety of both cell- and process-related effects may hinder the development of highly productive processes. in such cases, cell-free protein production using cho cell lysates containing endogenous microsomes for posttranslational processing may eventually present an attractive alternative.
this holds in particular because these lysates can be used under tightly controlled conditions, assuring a higher degree of reproducibility than, e.g., transient transfection systems. cell-free systems are known to circumvent typical bottlenecks of cell-based ones, e.g. metabolic regulation and cell maintenance mechanisms. in consequence, the production of a recombinant protein is inhibited neither by its accumulation nor by any interaction with the cells, e.g. through the activation of inhibitory signaling pathways. core. preliminary studies showed that the corresponding polyplexes, but also some of the cells that came into contact with them, became magnetic and were manageable by magnetic fields [1] [2] [3]. here, we present a characterization of the influence of structure and composition on the function of these polymers using a library of highly homogeneous, paramagnetic nano-stars with varied arm lengths and densities [4]. the paramagnetic nano-star library was synthesized by coating maghemite nanoparticles (γ-fe2o3) with a thin silica shell functionalized with an atom transfer radical polymerization (atrp) initiator. pdmaema arms were grown from the core particles via atrp. in one case, the pdmaema arms were end-capped with pdegma blocks produced during a second atrp step. all nano-stars were characterized by size exclusion chromatography and thermogravimetry to calculate the number and length of the pdmaema arms. the core diameter was determined by transmission electron microscopy and dynamic light scattering (dls). the different variants (table 1) were analyzed for their ability to complex pdna (pegfp-n1) using various physicochemical methods (dls, zeta sizer). transfection efficiency and cytotoxicity in cho-k1 cells were determined by flow cytometry. transfected cells were placed in a magnetic field and the influence of the polymer architecture on the magnetic separation was investigated.
nonparametric spearman analysis was used to correlate arm length and arm density with the magnetic properties of the cells and the transfection efficiency. based on the hydrodynamic radii of the polyplexes, the investigated nano-stars could be divided into three subgroups (table 1). nano-stars with middle, but also high, arm density formed smaller polyplexes with hydrodynamic radii ≤ 300 nm, a size that is considered suitable for endocytosis and transfection. transfection efficiencies and cytotoxicities varied systematically with the nano-star architecture, with viability showing a more pronounced dependency on the characteristics of the transfection agent than the transfection efficiency itself. the arm density was particularly important, with values of approximately 0.06 arms/nm² yielding the best results (fig. 1a). end-capping the polycation arms with pdegma significantly improved the serum compatibility (fig. 1b). the gene delivery potential of a given nano-star and its ability to render the cells magnetic did not correlate. nevertheless, compared to the non-separated cells, egfp-expressing cells were consistently more frequent in the magnetic cell fraction, while the non-magnetic fraction was slightly depleted. when the egfp-expressing cells were further divided into low, middle and high producers, a statistically significant shift towards the high producers was observed in the magnetic cell fraction (fig. 1c). a nonparametric spearman correlation analysis was used to statistically evaluate possible links between the molecular characteristics of the nano-stars, the physicochemical properties of the corresponding polyplexes, the transfection conditions, and the cellular reactions. the resulting correlogram is shown in fig. 1d. transfection agents with magnetic properties enlarge the toolbox for studying non-viral gene delivery, since cellular magnetism is added as a new parameter.
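the spearman correlation analysis mentioned above can be illustrated with a short, dependency-free python sketch: values are converted to ranks (ties receive the average rank) and the pearson correlation of the ranks is computed. the data points below are invented purely for illustration and are not measurements from this study; the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical example values (not data from the abstract).
    arm_density = [0.02, 0.04, 0.06, 0.08, 0.10]
    viability = [92.0, 88.0, 85.0, 70.0, 61.0]
    print(f"rho = {spearman_rho(arm_density, viability):.2f}")
```

computing such a coefficient for every pair of variables yields the matrix that a correlogram visualizes. in practice, a library routine such as `scipy.stats.spearmanr` would normally be preferred, since it also reports a p-value.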
this allows, inter alia, a distinction between mere cellular interaction and actual uptake, which is otherwise difficult. viability showed a much more pronounced dependency on the characteristics of the transfection agent/polyplex than the transfection efficiency itself, which should be taken into account during method optimization. end-capping the polycationic pdmaema arms with pdegma blocks improved the compatibility of the polycationic nano-stars with serum components. in future, optimized, blood-compatible nano-stars that can be retained or directed by magnetic fields could become options for non-viral gene delivery in vivo. the increasing demand for monoclonal antibodies has created a need to increase the productivity of current industrial cell lines. in our earlier study [1], we showed that treatment with the er-stress inducer tunicamycin significantly increased the titers and productivity in recombinant cho cell lines, with a simultaneous upregulation of many genes from the unfolded protein response (upr) pathway. however, the loss in cell viability prevented a sustained increase in titers. in the current study, we explore the effect of varying tunicamycin concentrations and treatment times, so as to modulate the increase in protein folding capacity while preventing induction of apoptosis. anti-rhesus igg-secreting cho cells [2] were cultured in sf-cdm in 125 ml shake flasks. the cells were treated with varying concentrations (30-500 ng/ml) of tunicamycin in a batch culture. further, the effect of treatment with tunicamycin for short periods of time (24 h) was also evaluated. igg titers and mrna expression levels were quantified using elisa and qrt-pcr (illumina), respectively. results cho cells were treated with different concentrations of tunicamycin and cultured in a batch for 8 days (referred to as continuous treatment/cte). figure 1a presents the maximum vcd and % drop in viability under treatment.
a dose-dependent inhibitory effect was observed on growth and viability of cells in cte-cultures, with minimal inhibition at lower concentrations. in contrast, igg titers (fig. 1b) were higher in treated cultures than in the control in the initial phase of the cultures at all concentrations of tunicamycin. the per-cell productivity (fig. 1c) also showed a significant increase relative to the control at all concentrations of tunicamycin. however, the increased productivity due to tunicamycin was not sustained and levels became similar to the control after day 3 (data not shown). to prevent loss of viability due to tunicamycin, the effect of short-term treatment (ste) with tunicamycin was explored. cells treated with tunicamycin for 24 hours were harvested (corresponding to day 2 of cte-cultures) and inoculated in fresh media. the ste-cultures showed improved viability and higher maximum vcd compared to cte-cultures (fig. 1a). the fold increase in igg titers was not sustained beyond day 1-2 in ste-cultures (fig. 1d), but a significant increase in productivity was seen in the initial phase (fig. 1e). further, the cells were adapted over 25 continuous generations under 30 ng/ml tunicamycin. the adapted cells had an overall 1.3-fold higher productivity compared to the control (fig. 1f) in a batch culture. to understand the molecular basis of the increase in productivity, the mrna expression level of key genes was determined. xbp1s is a transcription factor involved in the activation of chaperones (like grp78, calreticulin) and apoptotic genes (such as chop). a significant increase in the levels of calreticulin was seen on treatment with tunicamycin (fig. 1g). both xbp1s and grp78 were marginally induced when treated with 30 ng/ml of tunicamycin in both cte- and ste-cultures (fig. 1h), and significantly up-regulated when treated with 500 ng/ml of tunicamycin. the chop mrna levels also increased with increasing tunicamycin concentrations, with levels in ste-cultures lower than in cte-cultures (fig.
1h) . the results suggest that upr induction may be important to increase productivity in these cte/ste-cultures. note that, tunicamycin had no effect on the expression levels of igg heavy-chain, thus eliminating the involvement of igghc-mrna in increasing productivity (fig. 1i) . tunicamycin induced er-stress increased productivity in the initial phase of the culture and enhanced upr-mediated folding capacity can be attributed as one of the reasons for it. at lower concentrations of tunicamycin, a fine balance between optimum upr induction and apoptosis can be achieved, as seen in 30ng/ml tunicamycin ste-cultures. in summary, this study demonstrates an alternate approach to enhance productivity of current industrial cell lines. background chinese hamster ovary (cho) cells have been widely used for the large-scale production of biopharmaceuticals [1] . to construct antibody-producing cho cells, exogenous genes encoding antibodies are usually integrated into unspecified regions of chromosomes (random integration). however, the chromatin structure differs depending on the location of the chromosomal region, which affects the expression level of the gene of interest [2] . recently, gene-targeting methods that enable site-specific integration of expression vectors have been developed. however, the regions that are most efficient for exogenous gene expression have not been clarified. we previously constructed a cho genomic bacterial artificial chromosome (bac) library generated from the recombinant cho-dg44 cell line. it was expected to cover the entire cho genome five times. the 20 chromosomes in cho-dg44 cells were aligned in decreasing order of size and assigned letters from a to t [3] . three hundred and four bac clones were mapped to every chromosome of cho-dg44. among the karyotypes of cho-dg44, cho-k1 and primary chinese hamster cells, chromosomes a and b are considered as the sole paired chromosomes corresponding to chromosome 1 in primary chinese hamster cells. 
hence, chromosomes a and b are considered to be stable [4]. in this study, we constructed antibody-producing cells by using a gene-targeting method focused on the stable chromosomes. a gene map of chromosome 1 was constructed by combining the bac-fluorescence in situ hybridization (fish)-based chromosome physical map and sequence data of mapped bac clones. the sequences of bac clones were searched by blast against the ncbi and chogenome.org databases. three different regions on chromosomes a and b were selected as target sites based on cho genomic bac library sequences. cho-k1 cells were stably transfected by lipofection. the target sequences were cleaved using the clustered regularly interspaced short palindromic repeats (crispr)/crispr-associated protein 9 (cas9) system and humanized igg1 genes were integrated by non-homologous end joining. transfection without using the crispr/cas9 system was also performed. these cell pools were cultivated for six days in serum-supplemented medium, and their levels of antibody productivity were evaluated by elisa. copy number analysis was also performed using real-time pcr. results and discussion construction of gene map of chromosome 1: eighty-three bac clones were mapped onto chromosomes a and b (each clone contained 100-150 kb of the cho genome sequence). as a result of annotating the 83 bac clone sequences, 91 genes were mapped on chromosome 1. investigation of the differences in productivity among antibody-producing cells that were constructed by chromosome 1 targeting and/or random integration: cell growth was not affected by the gene targeting site. the specific production rates of antibody-producing cell pools constructed by gene targeting of chromosome 1 were higher than those of the cell pool constructed by random integration.
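the abstract does not state how the specific production rates were calculated. one common convention, shown here only as an assumed sketch, divides the increase in titer by the integral of viable cell density (ivcd) over the culture period, with the integral approximated by the trapezoidal rule. the function names, units and example numbers below are illustrative assumptions, not values from this study.

```python
def integral_viable_cell_density(times_day, vcd_cells_per_ml):
    """Trapezoidal integral of viable cell density, in cell*day/ml."""
    ivcd = 0.0
    for k in range(1, len(times_day)):
        dt = times_day[k] - times_day[k - 1]
        ivcd += 0.5 * (vcd_cells_per_ml[k] + vcd_cells_per_ml[k - 1]) * dt
    return ivcd


def specific_production_rate(times_day, vcd_cells_per_ml, titer_mg_per_l):
    """Average specific production rate in pg/cell/day.

    1 mg/l equals 1e6 pg/ml, so the titer increase is converted to pg/ml
    before dividing by the ivcd in cell*day/ml.
    """
    ivcd = integral_viable_cell_density(times_day, vcd_cells_per_ml)
    delta_pg_per_ml = (titer_mg_per_l[-1] - titer_mg_per_l[0]) * 1e6
    return delta_pg_per_ml / ivcd


if __name__ == "__main__":
    # Hypothetical six-day culture (not data from the abstract).
    days = [0, 2, 4, 6]
    vcd = [0.3e6, 1.0e6, 2.5e6, 3.0e6]
    titer = [0.0, 5.0, 20.0, 45.0]
    qp = specific_production_rate(days, vcd, titer)
    print(f"qP = {qp:.2f} pg/cell/day")
```

because the rate is normalized to the integrated cell density, two pools with the same final titer but different growth can differ in specific production rate, which is why this quantity is compared alongside cell growth.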
all cell pools constructed by gene targeting showed lower copy numbers of heavy chain and light chain in genomic dna than those in the cell pool constructed by random integration, despite showing high productivity. our results indicate that high productivity of the cells constructed by gene targeting of chromosome 1 does not depend on the increase of the antibody copy number, and that the environments around these target regions are suitable for exogenous gene expression. the approach of using gene targeting to chromosome 1 may be promising for constructing antibodyproducing cells. retroviral vectors have been widely used as gene delivery tools in various biotechnology fields. however, the random integration feature of retroviral vectors seems to cause problems such as insertional mutagenesis and gene silencing. we previously demonstrated cre-mediated retroviral transgene insertion into a pre-determined site of the founder cells using integrasedefective retroviral vectors (idrvs), where a cre expression plasmid was transfected into the cells prior to retroviral transduction [1] . recently, we reported novel hybrid idrvs (cre-idrvs) incorporating bioactive cre recombinase protein, and validated site-specific gene integration of an scfv-fc antibody expression unit into the chinese hamster ovary (cho) cell genome [2] . we also developed an accumulative site-specific gene integration system, which enables repeated integration of multiple transgenes into a pre-determined locus of the cell genome [3] . here, we attempted repeated integration of transgenes using cre-idrvs. a viral vector plasmid (pqmscv/hd[scfv-fc]) encoding reporter genes and an scfv-fc expression unit flanked with wild-type and mutant loxps was constructed for the production of idrvs. cre-idrvs were produced described previously [2] . results and discussion figure 1a shows a schematic drawing of each round of targeted transgene integration using cre-idrvs harboring an scfv-fc expression unit. (fig. 
1b) . genomic dna extracted from the cells was subjected to pcr using specific primer pairs α and β, and γ and δ, to confirm site-specific integration. dna fragments of the expected sizes were amplified in each cell clone (fig. 1c). these results indicate that site-specific repeated integration was achieved using cre-idrvs. in contrast, scfv-fc productivity in cho/hd[scfv-fc]×2 cells was slightly decreased compared with that of cho/ne[scfv-fc]×1 (data not shown). although the reason remains unclear, repeat-induced gene silencing might occur due to the tandem repeat structure of the expression units. we previously reported improved recombinant antibody production using a production enhancer element [4]. such a cis-regulatory element might be a feasible approach to enhance the productivity. we demonstrated site-specific repeated transgene integration into a pre-determined chromosomal locus using cre-idrvs for the production of an scfv-fc antibody. although the role of lipids in the cell was long reduced to cell membrane formation, it is now understood that lipids also play a role in energy metabolism, vesicular transport, membrane structure, dynamics and signaling. however, the exact mechanism by which compositional complexity affects cell homeostasis remains unclear. thanks to recent advances in mass spectrometry, it is now possible to study a wide range of lipids, providing a better understanding of lipid homeostasis in high-performance cell culture processes. the purpose of this work was to develop a robust lipidomics method for mammalian cell cultures in three steps: extraction, separation and detection (fig. 1). both the matyash [1] and folch [2] extraction methods were performed on our cells to reach the highest yield. two separation techniques were also tested: hydrophilic interaction liquid chromatography (hilic) and reverse phase chromatography.
finally lipid classes' identification was achieved by tandem mass spectrometry analysis thanks to structure-specific fragmentation ions. the yield obtained with matyash extraction method was higher than with folch method for each lipid class tested. besides, matyash method presents also the advantage to be less toxic and suitable for high throughput analysis since the organic layer is above the aqueous layer. lipids separation by hilic is based on their polar head. since lipid classes are defined by polar head, the lipids are eluted class by class, making their identification easier. the separation of lipids by reverse phase was correct but the method is longer and we observed a massive carryover of triglycerides on the column. finally each lipid class was screened in ms/ms parent ion mode. target daughter ion was set according to the lipid class structure and fragmentation pattern. this detection technique enabled the identification of 50 different lipids. to ensure the absolute quantification of the detected lipids and to guarantee comparable results between batches labeled internal standard were added prior to extraction. this method was optimized in a stepwise process to ensure a sensitive and selective measurement of the lipids. lipids were extracted by matyash method, separated by hilic and detected by tandem mass spectrometry. this method is suitable for both in process sample lipid analysis providing information on the cell lipid content, and for harvest samples, enabling to follow the lipid release during the different harvest steps. this non-targeted lipidomic quantitation method will enable us to better control lipid synthesis during biopharmaceutical fed batch production through clone selection, metabolomics studies and harvest development. background human mesenchymal stem/stromal cells (hmsc) can easily be isolated from e.g. 
bone marrow, fat tissue or umbilical cord blood and are therefore a central player in regenerative medicine, gene therapy and cell therapy [1] [2] [3]. the necessary gene shuttle is mainly provided by viruses associated with diseases, like retrovirus or adenovirus [4] [5] [6] [7]. these potentially pathogenic viruses demand high safety standards. also, they are prone to genomic alterations and there is the possibility of virus inactivation triggered by pre-existing immunity in the patient [8] [9] [10]. in this context, the autographa californica multicapsid nucleopolyhedrovirus (acmnpv) is a safe alternative. virus replication is host-specific for insects [11], but it has been known since the mid-90s that a temporary transduction of mammalian cells is possible [12]. some modifications of the virus increased its applicability in stem cells. pseudotyping the virus with the vesicular stomatitis virus glycoprotein (vsv-g) led to an expansion of the range of transducible cells [13, 14] and the integration of the woodchuck hepatitis virus post-transcriptional regulatory element (wpre) prolonged the recombinant protein expression [15, 16]. for achieving a baculovirus-induced differentiation of hmscs, the promoter and the expression strength of the recombinant protein are crucial factors. still, there are few comparative promoter studies [17, 18]. however, a successful virus uptake is the prerequisite for a successful protein expression. we therefore investigated factors significantly influencing the transduction process by applying design of experiments (see fig. 1a). the experimental design comprises a two-level factorial screening, set up using design expert v9. for the transduction, 60,000 cells/cm² were seeded in 24-well plates with dmem + 10% fcs and incubated overnight at 37°c, 8% co2 and humidified atmosphere.
the recombinant baculovirus using an integrated ef1α promoter to control gfp expression, described elsewhere [18], was diluted to the respective concentrations in the different surrounding fluids. after discarding the cultivation medium of the hmsc-tert, 1 ml of virus-containing solution was added to the cells. the subsequent incubation was varied in duration before the virus solution was replaced with growth medium, followed by incubation overnight. 24 h post transduction (hpt), the cells were washed with pbs, trypsinized with 100 μl trypsin/edta and incubated for 5 min at 37°c. trypsinization was then stopped by applying 100 μl soybean trypsin inhibitor and the cells were analyzed using flow cytometry. as shown in fig. 1a, the virus concentration and incubation time exert the highest influence on the transduction efficiency. a higher concentration of viral particles and a longer incubation of cells with virus increase the probability of contact between cells and virus particles. additionally, the surrounding fluid can have a negative impact on the transduction. this is due to the interaction of medium components with the baculovirus. therefore, pbs containing ca2+ & mg2+ is recommended as surrounding fluid for transduction experiments. in fig. 1b, the transduction conditions resulting in the highest percentage of gfp+ cells are displayed: 150 virus particles per cell (ppc) and an incubation time of 5 h with hmsc-tert. the experiments show that especially the virus concentration and the incubation time of cells with virus influence the transduction efficiency. based on the results of the screening, further optimization of the transduction conditions will be done using a face-centered central composite design with pbs containing ca2+ & mg2+ as surrounding fluid and at an incubation temperature of 37°c.
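the two design types named above can be sketched in a few lines of python. the snippet enumerates a two-level full factorial design in coded units (-1/+1) and a face-centered central composite design, i.e. the factorial corners plus axial points on the faces of the design cube (coded distance 1) and a center point. the factor names and the low/high settings used for decoding are hypothetical placeholders, since the abstract does not list the levels that were actually used; design expert itself is not involved in this sketch.

```python
from itertools import product


def full_factorial_two_level(factors):
    """All combinations of the coded levels -1 and +1 for the given factors."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def face_centered_ccd(factors, n_center=1):
    """Face-centered central composite design in coded units.

    Consists of the two-level factorial corners, axial points at -1 and +1
    on each factor axis (all other factors at 0), and n_center center points.
    """
    runs = full_factorial_two_level(factors)
    for name in factors:
        for level in (-1, 1):
            point = {f: 0 for f in factors}
            point[name] = level
            runs.append(point)
    runs.extend({f: 0 for f in factors} for _ in range(n_center))
    return runs


def decode(coded, low, high):
    """Map a coded level in [-1, 1] to the real setting between low and high."""
    return low + (coded + 1) * (high - low) / 2.0


if __name__ == "__main__":
    # Hypothetical factor names and ranges, chosen only for illustration.
    ranges = {"virus_ppc": (50.0, 150.0), "incubation_h": (1.0, 5.0)}
    for run in face_centered_ccd(list(ranges)):
        real = {f: decode(c, *ranges[f]) for f, c in run.items()}
        print(run, real)
```

for k factors, the screening design has 2^k runs, while the face-centered design has 2^k + 2k runs plus the chosen number of center points. the three levels per factor in the latter allow curvature to be estimated, which is the usual reason for following a screening step with such a design.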
background breast cancer is the second leading cause of cancer-related deaths among women worldwide, and the triple-negative subtype (tnbc) represents a clinical challenge because it is associated with high mortality and has no effective therapies [1], [2]. accordingly, there is an urgent need to design new and more effective drugs to treat breast cancer. notch signaling is an evolutionarily conserved cell-to-cell communication pathway that is crucial during embryonic and breast development and for tissue homeostasis. this pathway is often hyper-activated by overexpression of notch receptors and/or their ligands in several types of cancer, such as breast cancer (tnbc included), where it contributes to development, progression and drug resistance [3], [4], [5]. our aim is to generate a function-blocking antibody against the notch ligand delta-like-1 (dll1) with therapeutic efficacy against breast cancer. materials and methods dna encoding the full-length extracellular domain of human dll1 (dll1-ecd) and a truncated version containing the minimal binding region to the notch receptor (dll1-egf3) was cloned into pfuse-fc1-igg1 and expressed in hek293e6 cells. recombinant proteins were purified from culture media by protein-a affinity and size exclusion chromatography. the human scfv phage display tomlinson i+j library was used to select specific scfv against peptides targeting dll1 binding regions to notch. the binding ability and specificity of the selected scfv clones were evaluated by scfv-on-phage elisa. our strategy allowed us to obtain 20 mg of pure (>95%) and stable dll1-ecd-fc, as confirmed by sds-page and thermofluor assay. the dll1-egf3-fc yield was very low, and buffer screenings are ongoing to optimize protein stability. functional studies performed in human breast cancer mcf7 cells showed that both ligands are biologically active, as they increased the expression of the notch-dependent genes hes-1, hey-l and hey-1.
recombinant dll1 and peptides were used to select for monoclonal antibodies by phage display. after three rounds of panning with dll1 peptides, we identified 13 positive scfv clones, 2 of which showed high affinity for dll1-ecd-fc. we are currently performing more phage display selections to increase the number of positive clones. scfv with higher affinities will be reformatted into iggs, and their ability to inhibit the notch pathway will be evaluated. the anti-oncogenic effects of anti-dll1 iggs will be assessed in breast cancer cells in viability/apoptosis, proliferation, migration and invasion assays. an anti-dll1 igg with therapeutic efficacy against breast cancer would demonstrate that targeting dll1 could be one of the key factors for successfully treating breast cancer.

recombinant adeno-associated virus (raav) approaches have an outstanding reputation in gene therapy and are being evaluated for cancer therapy [1]. advantages include long-term gene expression, targeting of dividing and non-dividing cells, and low immunogenicity. established raav production utilizes triple transfection of adherent hek 293 cells, which hardly meets product yield requirements for clinical applications. we transferred the aav production system to hek 293-f suspension cells. this process is scalable and uses serum-free media, which streamlines downstream procedures. after optimization of transfection efficiencies and shaker cultivations, we produced titers of 1×10^5 viral genomes per cell in a 2 l bioreactor. the suspension-adapted hek-freestyle 293-f cell line was used for the experiments in chemically defined, animal component-free media (hek-tf, hek-gm (xell ag), freestyle f17 (thermo fisher scientific)). samples for viable cell density and viability were taken daily and analyzed using an automated cell counting system (cedex, roche diagnostics). transient transfection of 3×10^6 cells/ml was carried out with polyethylenimine max in a 1:4 dna-pei ratio (w/w) with 2 μg dna.
three plasmids (pgoi, prepcap, phelper) were applied in a molar 1:1:1 ratio (fig. 1a). pretests were performed in orbital shaking tube spin bioreactors. for scale-up, batch processes were carried out in 125 ml shake flasks as well as in 2 l stirred bioreactors at 30% air saturation and ph 7.1. transfection efficiencies and raav production were quantified by flow cytometry using a goi coding for a fluorescent protein and by qpcr of genomic copies, respectively. by optimizing the dna amount for transfection of 293-f cells, more than 90% of the cells were reproducibly transfected. batch cultivations in shaker flasks revealed that raav were produced in the first 24-96 h after transfection. figure 1b shows viable cell densities and viabilities in relation to the genomic titer. genomic titers were determined from raw cell extracts, and up to 10^9 copies/ml were reproducibly achieved. a decrease in viability marked the decline in genomic copies per ml, showing that a prolongation of the process, e.g. by addition of a feed, would probably not increase yield. in a first scale-up, the raav production was transferred to a 2 l bioreactor (fig. 1c). transfection efficiencies in bioreactors of up to 55% were comparable to those obtained in a simultaneous shaker flask experiment. transfection efficiencies were lower than in prior experiments due to the controlled conditions in the bioreactor. nonetheless, the titer of up to 1×10^5 genomic copies per cell was elevated compared to that of shaker flasks. first experiments with 293-f cells in hek tf medium showed promising results for transferring raav production from the adherent system to suspension. after improvement of transfections by the adjustment of dna amounts in small-scale experiments, aav production was analyzed in shaker flasks. the batch process showed an expected increase in cell density with low variability between biological replicates (fig. 1b).
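the triple-transfection set-up described in this abstract (three plasmids in a molar 1:1:1 ratio, dna and pei in a 1:4 w/w ratio) implies a simple mass calculation, sketched below. the plasmid sizes are hypothetical placeholders, since the abstract does not report them, and the sketch treats the stated 2 μg as the total dna per transfection, which is also an assumption. for double-stranded plasmids, equimolar amounts mean that each plasmid's mass share is proportional to its length in base pairs.

```python
# Hypothetical plasmid sizes in base pairs; the abstract does not report them.
PLASMID_SIZES_BP = {"pgoi": 5000, "prepcap": 7000, "phelper": 12000}


def equimolar_masses(total_dna_ug, sizes_bp):
    """Split a total DNA mass so that all plasmids are present in equal molar amounts.

    For double-stranded DNA, mass per mole scales with length, so each
    plasmid's mass share equals its share of the summed base pairs.
    """
    total_bp = sum(sizes_bp.values())
    return {name: total_dna_ug * bp / total_bp for name, bp in sizes_bp.items()}


def pei_mass_ug(total_dna_ug, pei_per_dna=4.0):
    """PEI mass for a given DNA mass at a DNA:PEI ratio of 1:pei_per_dna (w/w)."""
    return total_dna_ug * pei_per_dna


masses = equimolar_masses(2.0, PLASMID_SIZES_BP)  # assumes 2 ug total DNA
pei = pei_mass_ug(2.0)

for name, mass in masses.items():
    print(f"{name}: {mass:.3f} ug")
print(f"pei: {pei:.1f} ug")
```

with real plasmid sizes substituted for the placeholders, the same function gives the per-plasmid dna amounts for any total dna mass tested during optimization.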
the genomic titer increased in line with the viable cell density until day four, when a sudden drop began. this observation was made for aav productions in hek-tf, hek-gm and freestyle f17 medium. for optimal yields, we assume that a slight decrease in viability marks the point in time for harvest. based on the optimized protocols, a batch process in a 2 l bioreactor was carried out. interestingly, the bioreactor cultivation resulted in lower overall viable cell densities but in higher genomic copies per cell compared to shaker flasks (fig. 1c). these results are comparable to already published data for suspension cells [2]. subsequent optimization of the bioreactor protocol will lead to a further increase in raav yield.

genethon and pall have collaborated to assess pall's single-use icellis fixed-bed bioreactor for viral vector production. clinical use of gene therapies to treat formerly incurable genetic diseases is advancing rapidly. viral vectors are an important tool for introducing genes into target cells. many gene therapies have been developed using adherent cells in 2-dimensional flatware or roller bottles, but using these technologies to reach commercial-scale production represents a significant challenge. the icellis bioreactor enables large-scale viral vector production by providing a 3-dimensional matrix for cell growth in a compact configuration (fig. 1). up to 500 m^2 of surface area is available in a compact bioreactor measuring 88 mm in diameter in a total volume of 75 l with ph, do and temperature control. a key feature of the icellis bioreactor is that it scales by increasing the diameter of the fixed bed while keeping the height constant, with no change in aspect ratios. the height of the fixed bed can be varied (2, 4 and 10 cm), as can the density of carrier packing (96 g/l or 144 g/l). the icellis system comes in two formats, the icellis nano bioreactor (0.53-4.0 m^2) and the icellis 500 bioreactor (66-500 m^2).
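because the icellis formats are specified by growth surface area, a first-pass scale-up estimate reduces to multiplying a per-area yield by the available area. the sketch below is a back-of-the-envelope illustration only. the per-area yield is a parameter, and the value used is the average yield this abstract reports in its results (4×10^13 vg/m^2). that this yield would hold unchanged at the 500 m^2 scale is an assumption made here for illustration, as the abstract reports maintained productivity only between 0.8 and 4.0 m^2.

```python
CM2_PER_M2 = 10_000  # 1 m^2 = 10,000 cm^2


def scale_factor(area_from_m2, area_to_m2):
    """Ratio of growth surface areas between two fixed-bed formats."""
    return area_to_m2 / area_from_m2


def per_cm2_to_per_m2(value_per_cm2):
    """Convert a per-cm^2 quantity to a per-m^2 quantity (unit conversion only)."""
    return value_per_cm2 * CM2_PER_M2


def projected_total(area_m2, yield_per_m2):
    """Total yield if the per-area yield is assumed constant across scales."""
    return area_m2 * yield_per_m2


YIELD_VG_PER_M2 = 4e13  # average yield reported in the abstract's results

nano_to_500 = scale_factor(4.0, 500.0)  # largest nano format to largest 500 format
total_nano = projected_total(4.0, YIELD_VG_PER_M2)
total_500 = projected_total(500.0, YIELD_VG_PER_M2)  # extrapolation, not a reported result

print(f"surface scale factor: {nano_to_500:.0f}x")
print(f"projected total at 4.0 m^2: {total_nano:.2e} vg")
print(f"projected total at 500 m^2: {total_500:.2e} vg")
```

the unit helper is included because the abstract quotes yields both per cm^2 and per m^2. it converts area units only and does not relate viral particles (vp) to vector genomes (vg), which are different measures.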
fig. 1 (abstract p-349). (a) schematic overview of raav production in hek293 cells with the triple-transfection system. (b) viable cell densities (vcd), viabilities and genomic copies per ml (gc) of a raav production with 293-f batch cultivations in shaker flasks. genomic copies per ml refer to the titer determined in 1 ml culture volume. error bars represent biological and technical duplicate measurements of samples. (c) viable cell densities and genomic copies per cell of a raav production with 293-f batch cultivation in a 2 l bioreactor. for reasons of comparability between shaker and bioreactor data, genomic copies are given per cell. error bars represent technical duplicate measurements of samples.

processes developed in the bench-top icellis nano bioreactor can be directly transferred to the corresponding icellis 500 system. the icellis nano bioreactor provides an efficient platform for process optimization. the genethon raav-8 process was transferred to a 0.8 m^2 icellis nano bioreactor (2 cm bed height, 144 g/l density) using freestyle media. the initial icellis nano process was established as (1) seed on day 1, (2) transfect on day 5, (3) harvest on day 8, and yielded <1×10^9 vp/cm^2 (n=3). media exchange, cell density at transfection, pdna/cell ratio and lysis method were then changed to determine the effect on productivity. the modified process was then scaled from the 0.8 m^2 to the 4.0 m^2 (10 cm bed height, 144 g/l density) icellis nano bioreactor.
- media: a media exchange at 5 hours post transfection, with dmem substituted for freestyle medium, resulted in an 8x increase in specific productivity.
- cell density at transfection: cells were seeded at 6,000 cells/cm^2 and reached 200,000 cells/cm^2 at day 5, which was determined to be the optimal cell density for transfection.
- pdna/cell ratio: reducing pdna by 50% had no significant effect on productivity.
- lysis: use of triton x-100 at 0.5% with 100 mm nacl at ph 8 resulted in >100% virus recovery compared to sampled carriers.
- scaling: specific productivity was maintained as the system was scaled from 0.8 m^2 to 4.0 m^2.
- overall, an average yield of 4×10^13 vg/m^2 was achieved.
the icellis technology is being adopted widely for viral vector production. a process can easily be transferred to the icellis nano bioreactor and, once in place, can be optimized to provide significant productivity increases and cost savings such as reduced pdna. the icellis nano bioreactor is an efficient bench-top system, the results of which can be readily scaled to the icellis 500 system.

background the tissuse multi-organ-chip (moc) platform contributes to the ongoing advancement in systemic substance testing in vitro. current in vitro and animal tests for drug development fail to emulate the systemic organ complexity of the human body and therefore often do not accurately predict drug toxicity. cardiotoxicity in particular is one of the main reasons why new compounds fail in clinical trials. therefore, we aimed to establish an autologous dynamic multi-organ device integrating cardiomyocytes for substance testing. generic 2d monolayer and 3d suspension differentiation protocols for ipsc-derived cardiomyocytes were established. beating cardiomyocytes were first seen on day 8 in monolayer as well as in spheroid culture. the cardiomyocyte cultures showed up to 64% cardiac troponin t-positive cells and 44% myosin heavy chain-positive cells by flow cytometry (fig. 1g, h). myosin ii heavy chain, α-actinin, myosin 9/10, myosin 11 and caldesmon expression was shown by immunohistochemistry (fig. 1a-d). because no lactate-based enrichment of cardiomyocytes was performed, cardiac fibroblasts are also present in the spheroids, as shown by vimentin staining. those cardiac fibroblasts lead to a physiological, heterogeneous cell population similar to that of the human heart.
beating spheroids were cultivated for 7 days under dynamic culture conditions in the multi-organ-chip. the integrated on-chip micropump provides physiological-like pulsatile circulation at a microliter scale and leads to a better nutrient and oxygen supply. the next significant step is to combine multiple autologous 3d organ equivalents in our multi-organ-chip using ipsc differentiation technology. differentiating all cell types from one ipsc donor is crucial to overcome source and rejection problems. combining our multi-organ-chip platform with ipsc differentiation technology will eventually lead to a personalized system for drug and substance testing.

lab as a service - automated cell-based assays
lena schober, moriz walter, andrea traube
laboratory automation and biomanufacturing engineering, fraunhofer ipa, stuttgart, germany
correspondence: lena schober (lena.schober@ipa.fraunhofer.de)
bmc proceedings 2018, 12(suppl 1):p-365

background the use of cell-based assays in the pharmaceutical industry and in academic research is a growing trend and a driving force for reducing drug development costs. academic research is gaining information about intracellular targets and functional mechanisms through a variety of assays. these benefits can be used in preclinical studies, and costly late-stage drug failures may be reduced by the use of cell-based assays. automated systems are also in great demand and will change the testing of substances and research activities. nevertheless, many barriers currently limit the successful application of automated systems in this field. the lack of flexibility and the need for skilled computer scientists and engineers are just the two main aspects named by experts. our strong background in automated cell culture technologies and the expertise gained in several projects let us rethink the overall process chain and overcome established principles.
a new service-oriented platform for the execution of commonly used cell-based assays will be introduced. the main idea is to give access to automated infrastructure to academic research groups or spin-offs that cannot afford the special infrastructure.

it is now known that the development of inhibitory antibodies by hemophilia patients is closely related to immunogenic epitopes present in the coagulation factors. these proteins are produced in hamster cells [1-4], which introduce a post-translational modification profile that differs from the human profile. patients with high-titer/high-responding inhibitors must be treated with bypassing agents that can achieve hemostasis. activated factor vii (fviia) is an attractive candidate for hemostasis, independent of fviii/fix, making this coagulation factor an alternative for hemophilia patients with inhibitory antibodies. however, recombinant factor vii is produced in bhk-21 cells (baby hamster kidney cells) and, like the other coagulation factors, it may contain immunogenic epitopes [5-7]. in this context, it becomes extremely important to produce recombinant proteins with complex post-translational modifications in a cell line not yet used [8-10]. we have been using the sk-hep-1 human cell line for the production of recombinant fvii. to generate the recombinant cell line, we used a bicistronic lentiviral vector, 1054-gfp, containing a fvii gene and the gfp selection marker gene. a master cell bank and a working cell bank were generated under gmp conditions. rfvii was analyzed by elisa, western blot, gene expression quantification and biological activity using the prothrombin time (pt) assay. rfvii was purified by affinity chromatography using a viiselect (ge) column. after purification, the rfvii was formulated and freeze-dried for use in in vivo experiments.
under static conditions, sk-hep-1 cells showed stable fvii production over a period of 6 months, with an average of 8.03 iu/ml of fvii, 83% cell viability and 77% of cells expressing the gfp gene. after purification with the viiselect column, a recovery of 65% of the protein with a purity of 95% was obtained (fig. 1). this purified recombinant fvii is being used in in vivo experiments to determine the pharmacokinetic parameters and to evaluate the post-translational modification profile. in conclusion, this study reports the use of the sk-hep-1 cell line for high-level production of recombinant factor vii. these cells have proven to be effective in the production of recombinant protein and can be used as a new platform for the production of recombinant proteins.

fig. 1 (abstract p-366). (a) determination of protein yield of egfr (epidermal growth factor receptor) synthesized in a cho cecf system. analysis of egfr protein yield obtained in various batches of cecf-formatted reactions. cecf synthesis was performed in the presence of 14c-leucine for radiolabeling of target proteins. radiolabeled proteins were precipitated using tca, followed by scintillation measurement. (b) detection of radiolabeled egfr by autoradiography. a no-template control (ntc) containing no egfr dna template was prepared.

background the emergence of stem cell-based regenerative medicine has recently led to the need for a sustained production of such cells [1]. hence, new bioreactors and carriers were designed for cell expansion. however, to meet this increasing demand, improvement of both the quality and the quantity of stem cells remains necessary. soft biocompatible microcarriers mimicking the extracellular matrix in terms of structure and stiffness should be of valuable utility, as substrate stiffness strongly influences in vitro stem cell fate and differentiation [2, 3].
our expertise in the field of microbead design using jetcutting technology [4] enabled us to engineer alginate beads of approximately 200 μm with various g/m monomer ratios. we used a jetcutter (genialab gmbh) with a 100 μm nozzle at a maximum speed of 12,000 rpm. alginate solutions with concentrations of 2% to 4% were gelled in a solution of 2% cacl2 in 50% etoh. alginates with an estimated viscosity (at 1%) of 30 to 720 mpa·s were tested. a further surface treatment with gelatine (0.1%, 1%) and poly-l-lysine (0.1%) was carried out to achieve optimal anchoring of human adipose-derived mesenchymal stem cells (atcc-psc-500-011) in mesenpro rs medium (gibco). jetcutter technology allowed us to obtain alginate microcarriers with good size homogeneity around 200 μm and a sphericity comparable to commercial carriers (table 1). the best adhesion of human adipose-derived mesenchymal stem cells was obtained on alginate carriers coated with 0.1% gelatine (fig. 1). we observed limited apoptosis, and the stemness of the human adipose-derived mesenchymal stem cells was conserved after 14 days in culture (data not shown).

cellular bioassays developed with functionally immortalized cell lines
aileen bleisch 1, aleksandra velkova 3, tom wahlicht 2, dagmar wirth 2, tobias may 1
1 inscreenex gmbh, braunschweig, germany; 2 msys, helmholtz centre for infection research, braunschweig, germany; 3 greiner bio-one gmbh, frickenhausen, germany
bmc proceedings 2018, 12(suppl 1):p-381

background a major challenge of current research is the limited availability of physiologically relevant cells [1]. this hinders the development of relevant cellular bioassays that are robust, reproducible and scalable. to overcome current limitations, we developed an immortalization strategy allowing the efficient and reproducible establishment of novel cell lines showing an in vivo-like phenotype.
the main feature of our ci-screen technology is the ability to combine the advantage of cell lines (the unlimited cell supply) with the advantage of primary cells (the physiological relevance). using this technology we have immortalized, amongst others, a human osteoblast cell line (ci-huob) [2]. in the present study, the in vivo-like phenotype and functionality of the novel ci-huob was examined. to this end, ci-huob cells were used to develop a 3d cell culture model using the magnetic 3d bioprinting technology (nano3d biosciences, houston, tx, usa) [3]. the ci-huob cell line was recently described and was cultivated in huob maintenance medium (inscreenex, germany). for spheroid creation, ci-huobs were grown in a monolayer and magnetized by adding a magnetic nanoparticle assembly (nanoshuttle, ns, nano3d biosciences, houston, tx, usa) at a concentration of 4 μl ns/cm^2 growth area. after an overnight incubation, magnetized ci-huobs were detached and seeded into cellstar® cell-repellent 96-well plates (greiner bio-one, frickenhausen, germany). with the help of mild magnetic forces, cells were printed into spheroids within 2 h. the spheroids consisted of 1,000-50,000 cells and were cultured for a period of up to 50 days. cell viability was analyzed by propidium iodide (pi) and calcein am staining. to improve spheroid functionality, spheroids were cultivated with huob differentiation medium (inscreenex, germany). "mini bone" tissue functionality, and thus mineralization, was analyzed by alkaline phosphatase staining (alkaline phosphatase activity) and alizarin red s staining (ca2+ deposits). the combination of ci-huob cells with the magnetic 3d bioprinting technology enabled the establishment of reproducible and consistent 3d spheroids. single spheroids per well were formed independent of the number of cells (1,000-50,000 cells) (fig. 1a). the spheroids formed were stable for a culture period of up to 50 days (fig. 1b).
neither cell death nor cell proliferation was observed in the bioprinted spheroids, as indicated by the stable size of the spheroids throughout the cultivation (fig. 1c). after treatment with a differentiation stimulus, the 3d bioprinted spheroids became fully functional "mini bones". this was highlighted by the alkaline phosphatase activity and the ca2+ deposits within the 3d bioprinted spheroids (fig. 1d,e). taken together, these results demonstrate that the functional immortalization technology provides physiologically relevant cells in sufficient numbers and that the magnetic 3d bioprinting technology enables fast, consistent cell aggregation and the formation of stable, uniform spheroids. importantly, these immortalized cells are capable of differentiating when a suitable stimulus is provided. for differentiation into mini bones, 3d spheroid cultivation and additional stimulation by small molecules are required. the combination of physiologically relevant cell systems with three-dimensional culturing will help to generate in vitro test systems that closely resemble the in vivo physiology and thereby support future drug discovery approaches.

fig. 1 (abstract p-381). characterization of spheroid "mini bones". (a) different numbers (1,000-50,000 cells) of ci-huob cells were printed into spheroids. (b) 20,000 ci-huobs were printed into spheroids and cultivated for the indicated time points. (c) for analyzing spheroid sizes, pictures were taken and quantified by imagej. (d, e) 20,000 ci-huob cells were printed into spheroids and cultivated with (huob differentiation medium) or without a differentiation stimulus for two weeks.
afterwards, bioprinted spheroids were sectioned by a cryo microtome and (d) stained for ca2+ deposits (alizarin red s) or (e) stained for alkaline phosphatase activity.
cells were transfected with np@(pdmaema 1037 ) 46 (n/p 15), separated 24 h post transfection (t = 0) by magnetically-assisted cell sorting and placed into separated cultures. the bars represent the overall transfection efficiency. distribution of low (light green), middle (green), and high (dark green) producers within the egfp-expressing cell fraction. data represent one experiment carried out in duplicate, with random experimental error shown.
d correlogram between the molecular characteristics of the nano-stars (core diameter, arm density, arm length, number of monomeric units per nano-star), the physicochemical properties of the corresponding polyplexes (hydrodynamic radius, zeta potential), the transfection conditions (n/p ratio, amount of polymer), and the cellular reactions (transfection efficiency, magnetism, viability)
authors thankfully acknowledge the biotechnology and biological sciences research council for funding this research work. sns thanks esact 2017 for providing her with the opportunity to present her work at the meeting. we would like to thank moritz frei for his support for the generation of the ngs transcriptomics data. many thanks to valentine chevallier for her precious advices, to stefanos grammatikos for his support and to the whole upstream process sciences team. we thank david bruehlmann and thomas vuillemin from merck (vevey, switzerland) for providing the igg glycan variants and the 2-ab-uplc glycan data. 
polyplus-transfection would like to thank généthon for kindly providing data. acknowledgements: asmita mukerji, reetesh pm, sasi kumar k, pilot plant team. acknowledgment to cedric allier from cea leti, grenoble, france. we would like to thank a. schemel and a. ehrlich for technical assistance. the authors would like to mention that this research was supported by the fi-dgr (2017) from the spanish government and that the project was led by prof. jordi joan cairó badillo. the authors want to thank the biotech process sciences team at merck in corsier-sur-vevey for their support and also the members of the morbidelli group at eth zürich for their input and collaboration. the authors would like to thank dr. benjamin youn in manufacturing science and technology (msat) at biomarin for his help with coding the excel macro program for ambr15, and dr. donald l. traul from tap biosystems (now part of the sartorius stedim biotech group) for his assistance with ambr15 operations. thanks to the bioreactor team of dustin davis, amer al-lozi, and jana mahadevan. we organic vaccines tm and the nih, who kindly provided to univercells. we thank polymun scientific immunbiologische forschung gmbh for providing the antibodies igm103, igm104 and igm617 as a kind gift. allison kurz and gian andrea signorell, genedata ag, basel, switzerland. high glucose concentration and low specific cell growth rate improve specific r-tpa productivity in chemostat culture of cho. acknowledgements: r. boraston for photographs in fig. 1. we acknowledge atum for their contributions to vector design and construction. austrian bmwfw, bmvit, sfg, standortagentur tirol, government of lower austria and business agency vienna through the austrian ffg-comet-k2. this research is supported by sfi grant 13/ia/1841. 
financial support of the austrian science fund (fwf; grant number p 25056) is gratefully acknowledged. we would like to thank the australian institute for bioengineering and nanotechnology, university of queensland-brisbane, australia (aibn) for providing the cho clones. this research is partially supported by the developing key technologies for discovering and manufacturing pharmaceuticals used for next-generation treatments and diagnoses both from meti and amed, japan. this work was supported in part by grants for developing key technologies for discovering and manufacturing pharmaceuticals used for next-generation treatment and diagnoses, both from the ministry of economy, trade and industry (meti), japan and from the japan agency for medical research and developments (amed). many thanks to stefanos grammatikos for his support and to the whole upstream process sciences team. the authors thank the hessen state ministry of higher education, research and the arts within the hessen initiative for scientific and economic excellence (loewe program) for the financial support. the authors thank xell ag, bielefeld, for providing hek serum-free media (hek gm and hek tf) and for fruitful discussions. the authors would like to thank dana wenzel for cho lysate preparation (fraunhofer izi, potsdam-golm, germany). this work is supported by the european regional development fund (efre) and the german ministry of education and research (bmbf, no. 031b0078a). the authors acknowledge são paulo research foundation - fapesp (2015/19017-6), centro de pesquisa, inovação e difusão (cepid), and national institute of science and technology in stem cell and cell therapy - inctc for financial support. this work was supported by grants from the niedersächsisches ministerium für wissenschaft und kultur (80029155) and the german ministry for economic affairs and energy (igf 16153 n). 
the infrastructure, which was built up modularly, consists of automated liquid handling robots, plate and tube handling robots, incubators, a refrigerator and analysis systems such as an imaging system. the aim is to address the need for reproducible and reliable results and to offer access to a highly controlled and automated environment. with the help of a web-based configurator, assays can be selected and parameterized easily. after the order process, test items can be shipped to the lab, where the assays are executed on the fully automated platform. capturing in-process data as well as environmental conditions yields a complete data set and comprehensive results. as soon as results become available during the process, they can be viewed and analysed in a secure cloud. the service can be used for single experiments in low-throughput applications and therefore benefits labs that cannot afford automated infrastructure or the staff to maintain such platforms. extensive monitoring and data capture during the run lead to a gapless data trail and allow detailed analysis of the results. automated processing increases reproducibility and directly reduces costs and time. the centralized service, paired with specific know-how, allows processes to be scaled up at any time. the web-based interface guides the user flexibly, and online ordering gives 24/7 access to the infrastructure, leading to fast and reliable generation of results. furthermore, secure interaction with additional services, e.g. other specific data analysis tools, is possible. this dynamic access to automation offers high flexibility for low-throughput experiments and will advance high-quality research and early-stage drug development. 
development of alternative animal cell technology platforms: cho based cell-free protein synthesis systems for the production of "difficult-to-express" proteins
lena thoring 1, 2
background: nowadays, animal cell technologies are commonly used for a broad range of medical and pharmaceutical applications. one main topic of these technologies is the production of proteins for therapeutic purposes. these in vivo production processes are often time consuming and limited in the production of so-called "difficult-to-express" proteins, including the pharmaceutically relevant class of membrane proteins. to overcome these issues, novel cell-free protein synthesis platforms were developed based on the industrial workhorse, cho cells [1]. cell lysates provide the basis for this technology: they include all components of the translational machinery and enable protein production within a few hours. microsomal structures present in cho cell lysates enable posttranslational modification of target proteins and insertion of membrane proteins into lipid bilayers. in this study a cell-free protein synthesis platform was developed based on a combination of cho cell lysates and a continuous exchange reaction format. the continuous exchange reactor consists of a two-chamber system, a reaction chamber and a feeding chamber, separated by a semipermeable membrane. due to concentration gradients, energy components can diffuse into the reaction chamber, while inhibitory byproducts are continuously removed. different classes of proteins were selected to evaluate the quality of the cho cecf system, including a transmembrane receptor, a single chain variable fragment and an ion channel. cell-free protein synthesis was performed in the presence of 14c-leucine for radiolabeling of synthesized proteins. protein yield was quantified by tca precipitation of radiolabeled proteins followed by scintillation measurement, and molecular mass was determined by autoradiography. 
posttranslational modifications and activities of proteins were assessed by kinase assays, elisa, endoglycosidase treatment and electrophysiological measurements. the results showed protein yields of up to around 1 g/l, with correct molecular weights confirmed by autoradiography. analysis of productivity across different lysate batches, using production of the membrane protein egfr, revealed only minimal batch-to-batch variation (fig. 1a). posttranslational modifications of proteins, including phosphorylation and glycosylation, were detected using western blot and autoradiography (fig. 1b). confocal laser scanning microscopy of membrane-embedded eyfp fusion proteins detected the proteins in the microsomal fraction of the cho cell lysate. the single chain variable fragments produced showed binding specificity in elisa experiments. the activity of synthesized ion channels was confirmed by electrophysiological measurements that detected single-channel activity. a cell-free system based on cho cell lysates for high-yield production of proteins was developed that provides a platform for efficient production of "difficult-to-express" proteins. the combination of a cho lysate based cell-free system and a continuous exchange reaction format yields a highly efficient production system for various classes of "difficult-to-express" proteins. this approach opens up a fast and cost-effective process pipeline for the production of "difficult-to-express" proteins and shows high potential for industrial applications including screening technologies, protein structure determination and just-in-time protein production processes. 
key: cord-001541-5d64esp4 authors: walker, peter j.; firth, cadhla; widen, steven g.; blasdell, kim r.; guzman, hilda; wood, thomas g.; paradkar, prasad n.; holmes, edward c.; tesh, robert b.; vasilakis, nikos title: evolution of genome size and complexity in the rhabdoviridae date: 2015-02-13 journal: plos pathog doi: 10.1371/journal.ppat.1004664 sha: doc_id: 1541 cord_uid: 5d64esp4 rna viruses exhibit substantial structural, ecological and genomic diversity. however, genome size in rna viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. all but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. we show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive orfs within the major structural protein genes, and the insertion and loss of additional orfs in each gene junction in a clade-specific manner. changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new orfs were observed increased in the 3’ to 5’ direction along the genome. we also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving turbs-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. 
we conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded rna genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the rhabdoviridae.
rna viruses are among the most structurally and ecologically diverse of all life forms [1]. their genomes may consist of positive (+) sense, negative (-) sense or ambi-sense single-stranded (ss) rna, or double-stranded (ds) rna, and may take the form of a single or multiple segments that are packaged in single or multiple particles. rna viruses also employ a plethora of strategies for replication and gene expression, and encode a vast array of structural and nonstructural proteins, many of which are unique and have multiple, highly specialized functions [2]. despite their diversity, rna virus genomes are ubiquitously small, averaging only 10 kb, and with a maximum size of ~32 kb for some members of the order nidovirales [3, 4]. this size limitation has been linked to high mutation rates (a mean rate of ~1 mutation/genome/replication) due to replication with an error-prone rna-dependent rna polymerase that lacks proofreading capability [5, 6]. high error rates are thought to limit genome sizes because, as size increases, the number of deleterious mutations also increases to levels beyond which reproduction of the fittest variant cannot be guaranteed [7, 8]. due to this fundamental evolutionary constraint, rna viruses have employed various mechanisms of genome compression, such as the use of alternative or overlapping open reading frames (orfs) and the evolution of multiple functions for individual proteins [4, 7, 9]. for some rna viruses, increases in genome size have been associated with increases in the size of replicative proteins [10] and the presence of helicase and proof-reading exonuclease domains [3, 11-13]. 
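the relationship between genome length and mutational load described above can be illustrated with a minimal sketch. it assumes, purely for illustration, that mutations arise independently at a constant per-site rate, so that the number of mutations per replicated genome is poisson-distributed. the per-site rate of 1e-4 is a hypothetical value, chosen only because it gives about one mutation per replication for a 10 kb genome, in line with the mean rate quoted in the text; it is not a value reported by the authors.

```python
import math


def expected_mutations(genome_length_nt, per_site_rate):
    """Expected number of mutations per genome per replication (U = mu * L)."""
    return genome_length_nt * per_site_rate


def prob_error_free(genome_length_nt, per_site_rate):
    """Probability that a copy carries no mutation, assuming Poisson-distributed errors."""
    return math.exp(-expected_mutations(genome_length_nt, per_site_rate))


# Hypothetical per-site mutation rate (an assumption of this sketch).
MU = 1e-4

# 10 kb: average RNA virus genome; 16 kb and 32 kb: larger genomes for comparison.
for length in (10_000, 16_000, 32_000):
    u = expected_mutations(length, MU)
    p0 = prob_error_free(length, MU)
    print(f"{length:>6} nt: U = {u:.2f}, P(error-free copy) = {p0:.3f}")
```

under these assumptions the fraction of error-free copies falls exponentially with genome length, which is one simple way to picture why high error rates are thought to constrain genome size.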
however, the mechanisms and evolutionary context that would favour increased genome size and complexity, given constraints on replication efficiency, are currently unknown [3, 4]. the rhabdoviridae is one of the most ecologically diverse families of rna viruses. rhabdoviruses have been identified in a very wide range of plants and animals, including mammals, birds, reptiles, and fish, with many transmitted by arthropod vectors [14, 15]. the family includes rabies virus (rabv), which causes over 25,000 human deaths annually [16], vesicular stomatitis indiana virus (vsiv), which has served as an important model for the study of many aspects of mammalian virus replication and virus-host interactions, and many other important pathogens of humans, livestock, farmed aquatic animals and food crops. the nonsegmented (-) ssrna rhabdovirus genome is packaged within a characteristic bullet- or rod-shaped particle comprising five structural proteins: the nucleoprotein (n), polymerase-associated phosphoprotein (p), matrix protein (m), glycoprotein (g) and rna-dependent rna polymerase (l) [17]. the genome features partially complementary, untranslated leader (l) and trailer (t) sequences and five orfs arranged in the order 3'-n-p-m-g-l-5'. each orf is flanked by relatively conserved transcription initiation (ti) and transcription termination/polyadenylation (ttp) sequences which orchestrate expression of the five corresponding capped and polyadenylated mrnas [17]. rhabdovirus genomes may also contain additional orfs encoding putative proteins, which are mostly of unknown function. these may occur as alternative or overlapping orfs within the major structural protein genes or as independent orfs flanked by ti or ttp sequences in the regions between the structural protein genes [15], some of which appear to have arisen by gene duplication [15, 18-22]. 
here we undertake the first large-scale analysis of the evolution of genome size and complexity in a family of (-) ssrna viruses. we demonstrate that remarkable changes in genome size and complexity have occurred in rhabdoviruses in a clade-specific manner, primarily by extension and insertion of additional transcriptional units in the structural protein gene junctions, followed by occasional losses. we also show that rhabdoviruses have evolved a large number of accessory proteins and that the use of non-canonical gene expression strategies appears to be common, particularly amongst vector-borne rhabdoviruses. our data set comprised the complete or near-complete genome sequences of 99 animal rhabdoviruses, including 45 viruses isolated from various vertebrates and arthropods for which we determined the sequences de novo (s1 table). incomplete genomes lacked only the extreme terminal sequences. all rhabdovirus genomes contained the five canonical structural protein genes (n, p, m, g and l); however, there was remarkable diversity in the number and location of other long orfs. across the data set, we identified 179 additional orfs ≥180 nt in length, of which 142 shared no detectable protein sequence similarity with any other protein in our data set or with those in public databases (s2 table). these additional orfs were located either within the structural protein genes or in additional transcriptional units located in regions between these genes (fig. 1). the additional transcriptional units were annotated by using relatively conserved ti and ttp motifs. the core ti sequence (uugu) was conserved, with some minor variations (cugu, uugc, uuga, ucgu, ugau) employed in some viruses. the ttp motif g[u]7 was also conserved, with the variation a[u]7 occurring only in several genes of one virus (chov). due to the large number and diversity of additional orfs, we adopted a standard nomenclature that does not necessarily reflect structural homology. 
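the use of the ti and ttp motifs described above to locate candidate transcriptional unit boundaries can be sketched as a simple pattern search. this is only an illustration, not the annotation procedure used by the authors: the function names and the toy sequence are hypothetical, and the sequence is assumed to be written in the same orientation as the motifs quoted in the text. because the ti core is only four nucleotides long, chance matches are frequent, so hits would need to be judged in their genomic context.

```python
import re

# Motifs as quoted in the text; the input is assumed to be written in the
# same orientation as these motifs (an assumption of this sketch).
TTP_PATTERN = re.compile(r"[GA]U{7}")  # G[U]7, plus the A[U]7 variant
TI_CORES = ("UUGU", "CUGU", "UUGC", "UUGA", "UCGU", "UGAU")


def _as_rna(seq):
    """Upper-case the sequence and convert T to U so DNA-style input also works."""
    return seq.upper().replace("T", "U")


def find_ttp(seq):
    """Return 0-based start positions of G[U]7 / A[U]7 motifs."""
    return [m.start() for m in TTP_PATTERN.finditer(_as_rna(seq))]


def find_ti(seq):
    """Return sorted (position, motif) pairs for every TI core occurrence."""
    rna = _as_rna(seq)
    hits = []
    for core in TI_CORES:
        start = rna.find(core)
        while start != -1:
            hits.append((start, core))
            start = rna.find(core, start + 1)
    return sorted(hits)


# Toy sequence, made up for illustration only.
toy = "ACCGUUUUUUUCAUUGUCAGG"
print(find_ttp(toy))
print(find_ti(toy))
```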
unless previously assigned a distinctive name (e.g., befv gns, α1, α2, β and γ proteins), all orfs ≥180 nt were assigned names according to the following rules: i) each additional transcriptional unit was designated u (unknown) followed by a number as they appeared in order in the genome presented in positive polarity (i.e., u1, u2, u3, etc); ii) the first orf within each transcriptional unit was assigned the same designation as the transcriptional unit; and iii) each subsequent orf within any transcriptional unit (alternative, overlapping or consecutive) was designated by letter (i.e., u1x, u1y, u1z) (s2 table). alternative orfs are defined here as those which occur in a different frame within another longer orf; overlapping orfs are alternative orfs which extend beyond the end of the primary orf; and consecutive orfs are those which do not overlap but follow consecutively within the same transcriptional unit. the arbitrary cut-off of 180 nt (60 aa) was selected on the basis that two small basic proteins of 55 and 65 amino acids (c and c') have been shown to be expressed from an alternative orf within the vsiv p gene [23, 24]. these are the smallest known rhabdovirus proteins. to determine the evolutionary history of the rhabdoviruses studied here, we inferred a phylogenetic tree using conserved regions of the l protein of all 99 viruses in our data set as well as the recently described north creek virus (norcv) [25, 26] (fig. 2). all but two of these 100 rhabdoviruses (norcv and mouv) clustered into 17 well-supported monophyletic groups (bootstrap proportion [bsp] ≥85); however, many of the deeper nodes were unresolved throughout the phylogeny. 
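the 180 nt cut-off and the naming rules set out above can be sketched as follows. this is a hedged illustration rather than the authors' pipeline: `find_orfs` and `name_orfs` are hypothetical helpers, only the three forward frames of a positive-polarity sequence are scanned, orfs are assumed to start at atg, and orf length is counted from the start codon up to (not including) the stop codon. the text does not say how an orf beyond the third additional one in a unit would be lettered, so the sketch raises an error in that case.

```python
STOPS = {"TAA", "TAG", "TGA"}
MIN_ORF_NT = 180  # cut-off used in the text (60 aa)


def find_orfs(seq, min_len=MIN_ORF_NT):
    """Return sorted (start, end, frame) tuples for ORFs in the three forward frames.

    In each frame an ORF runs from the first ATG following the previous stop
    (or the start of the frame) up to, but not including, the next in-frame
    stop codon. Coordinates are 0-based, end exclusive. ORFs without a stop
    codon before the end of the sequence are ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i - start >= min_len:
                    orfs.append((start, i, frame))
                start = None
    return sorted(orfs)


def name_orfs(orfs_per_unit):
    """Name ORFs following rules i) to iii): u1, u1x, u1y, u1z, u2, ...

    orfs_per_unit[k] is the number of ORFs in the (k+1)-th additional
    transcriptional unit, in genome order (positive polarity).
    """
    suffixes = ["", "x", "y", "z"]
    names = []
    for unit_index, n_orfs in enumerate(orfs_per_unit, start=1):
        if n_orfs > len(suffixes):
            raise ValueError("naming beyond the 'z' suffix is not described in the text")
        names.append([f"u{unit_index}{suffixes[j]}" for j in range(n_orfs)])
    return names


# Toy examples, made up for illustration: 60 and 59 codons including the ATG.
at_cutoff = "ATG" + "GCT" * 59 + "TAA"
below_cutoff = "ATG" + "GCT" * 58 + "TAA"
print(find_orfs(at_cutoff))
print(find_orfs(below_cutoff))
print(name_orfs([1, 3]))
```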
eight of the well-supported clades corresponded to the eight established genera (lyssavirus, vesiculovirus, perhabdovirus, sigmavirus, ephemerovirus, tibrovirus, tupavirus and sprivivirus) and we assigned a further seven clades as proposed new genera (almendravirus, bahiavirus, curiovirus, hapavirus, ledantevirus, sawgravirus and sripuvirus). the taxonomic assignment of the two remaining clades was considered to be ambiguous (s1 table). for simplicity of expression we refer here to all as 'genera', whether existing or proposed, but we recognise that taxonomic proposals require consideration and ratification by the international committee on taxonomy of viruses (ictv). although the analysis was limited by the availability of single isolates of most viruses, apparent structure by geographic location or reservoir host was not observed in the phylogeny. however, multiple genera appeared to be primarily associated with bats (i.e., ledanteviruses, lyssaviruses), fish (i.e., perhabdoviruses, spriviviruses) or ungulates (i.e., ephemeroviruses, tibroviruses, vesiculoviruses). vector-borne rhabdoviruses were present in 12 of the 17 groups, dominating the dimarhabdovirus supergroup, but were largely absent from clades associated with bats (lyssavirus), flies (sigmavirus) and fish (perhabdovirus, sprivivirus) (fig. 2). the exception to this trend was the tupavirus clade, which comprised viruses that have not yet been associated with a vector species, and for which little is known about their ecology or distribution. each of the seven newly proposed rhabdovirus genera formed an independent, well-supported monophyletic group in the l protein phylogeny (bsp ≥85), and comprised viruses with similar genome organization (fig. 1; fig. 2). in several instances, viruses clustered closely with other members of a genus, yet we considered them to be unassigned species due to major differences in genomic architecture (see below). 
for example, the newly proposed genus curiovirus comprises a monophyletic group of four viruses isolated from biting midges (culicoides sp.), sandflies (lutzomyia spp.) and mosquitoes (coquillettidia and trichoprosopon spp.) from the forests of south america and the caribbean (s1 table). the genomes of curv, irirv, rbuv and itav all have one or more orfs located between the m and g genes, and the g and l genes. in contrast, the closely related aruv and inhv lack additional genes between the m and g genes and for this reason we have excluded them from the genus curiovirus at this time. we also recognize the previous suggestion that curv and itav should be assigned to a new genus for which the name bracorhabdovirus (brazilian amazonian culicoides rhabdoviruses) was proposed [27]. however, our analysis clearly indicates that this monophyletic group has a broader host range and geographic distribution than this regionally-derived name suggests. five of the novel viruses (comprising four putative new species) identified in this study were assigned to established genera. two of these, koolv and yatv, clustered within the existing ephemerovirus clade (bsp ≥85) and possessed the characteristic genome organization of ephemeroviruses, including a non-structural glycoprotein gene (g ns) followed by a viroporin (α1) and several other small proteins (fig. 1; fig. 2). [figure legend fragment: newly proposed genera are indicated by a † symbol. cytorhabdovirus, novirhabdovirus and nucleorhabdovirus outgroup sequences were excluded from the tree as they were too divergent to establish a reliable rooting. the tree is therefore rooted arbitrarily on one of two basal clades (genera almendravirus and bahiavirus) that comprise viruses isolated from mosquitoes.] similarly, two novel viruses isolated from biting midges (culicoides insignis), swbv and bav, clustered within the genus tibrovirus (bsp ≥85) and exhibited the conserved n-p-m-u1-u2-g-u3-l genome organisation (fig. 1; fig. 2; s1 table).
swbv was assigned as a new species (sweetwater branch virus), but bav is closely related to tibv and may be regarded as the same species (tibrogargan virus). finally, a novel tupavirus (klav) identified from two species of vole (microtus and clethrionomys spp.), clustered with the tupv and durv clade in the l protein phylogeny ( fig. 2 ; s1 table) . a more detailed rationale for the assignment of viruses to existing and proposed new genera is provided as supplementary text. we identified a 48.5% variation in genome size from the smallest genome (fukv, ledantevirus; 10,863 nt) to the largest in our data set (koolv, ephemerovirus; 16,133 nt). all genomes, including those for which extreme terminal sequences were unresolved, appeared to fall within this range. variations in genome size were associated with: i) variation in the length of intergenic regions (igrs) between transcriptional units; ii) variation in the length of 3' and 5' untranslated regions (utrs) within individual transcriptional units; iii) the presence of additional transcriptional units containing long orfs; and iv) the presence of overlapping or consecutive long orfs within individual transcriptional units. an examination of genome size across the phylogeny revealed a general trend towards larger genomes in the lower third of the tree, which is comprised of the hapaviruses, curioviruses, tibroviruses and ephemeroviruses, as well as several unassigned viruses (s1 fig.) . although this may indicate that an enhanced capacity for genome expansion is a property specific to this group, variation in genome size can also be observed between viruses in the majority of genera in the data set. several clade-specific patterns were evident when the lengths of the transcriptional units and igrs were compared within and between rhabdovirus genera (table 1) . 
ledantevirus genomes were smallest on average (1.75 × the length of the l) whereas ephemeroviruses genomes were the largest (2.37 × the length of the l, table 1 ). interestingly, although substantial variation in the length of gene junctions was observed in several genera (including ephemeroviruses and lyssaviruses), most variation in genome size occurred as the result of the presence of new, non-canonical orfs in the regions between the structural protein genes (table 1) . although new orfs were observed in each igr across the phylogeny (n-p, p-m, m-g and g-l) their location was primarily restricted to a single igr within each genus. for example, while hapavirus genome expansion occurred primarily in the p-m junction, genome expansion in the ephemeroviruses occurred at the g-l junction and tibrovirus and curiovirus genomes contained additional orfs primarily in the m-g junction (table 1 ). this suggests that once a new orf arises at a particular gene junction within a lineage, further expansion is more likely to continue at the same gene junction, rather than begin anew elsewhere in the genome. whilst the genome architecture in some viruses was highly compact, others featured long stretches of sequence with non-ascribed function that occurred primarily as 5'utrs and 3'utrs within transcriptional units (fig. 3) . the proportion of untranslated sequences within or between transcriptional units ranged from 0.5% (fukv; 58 nt) to 10.6% (wcbv; 1290 nt) and did not correlate with genome size. furthermore, although all lyssaviruses (such as wcbv) featured a high proportion of untranslated sequences (primarily evident as a very long 3'utr in the g gene), there was no consistent association between the proportion of untranslated sequences and genus assignment (fig. 3 ). for example, in the genus hapavirus, the proportion of untranslated sequences in the two largest genomes varied from 1.1% (ngav) to 6.4% (ljv). 
similarly, in the genus ephemerovirus the proportion of untranslated sequences varied from 1.2% in the smallest genome (yatv) to 9.6% in the largest genome (koolv). the presence of long stretches of untranslated sequence, which occurred primarily within transcriptional units, suggests these regions may be functional. however, it is unclear at this time why they are present in some rhabdoviruses and not in others. gene duplication. previous studies have provided evidence of gene duplication in the rhabdoviridae, involving the g and g ns genes [18, 21] and the β and γ genes [22] in the ephemeroviruses, and the u1, u2 and u3 genes in the hapaviruses flav and wonv [15, 19, 20] . to identify further examples of gene duplication, we conducted a blast analysis of all proteins in our database (e-value <1e-3) and used clustalx alignments to confirm sequence similarity. by this analysis, orfs located between the p and m genes of most hapaviruses encode proteins which share detectable sequence similarity. this family of homologous p-m intergenic region proteins (pmips) includes the u1, u2 and u3 proteins of ljv, wonv, pcv, orv, ljav, manv, mqov, flav, hpv, kamv and mosv (s2 fig. and s3 fig.) , as well as the u1x proteins of manv and glov which are encoded in orfs overlapping their respective u1 orfs (s4 fig.) . although pairwise alignments provide clear evidence for homology, the hapavirus pmips share generally low levels of sequence identity and no universally conserved motifs, indicating considerable structural and functional divergence from their ancestral homolog. proteins encoded in the p-m region in other hapaviruses (i.e., joiv u1, ngav u1, u1x and ngav u2) failed to display significant similarity with the pmips or evidence of gene duplication but this may be due to further structural divergence. 
additional evidence of gene duplication included the u2 and u3 proteins of joiv (encoded in orfs located between the g and l genes), and the n-terminal regions of the p proteins and the upstream u1 accessory proteins of the sripuviruses chov and smv, each of which share significant sequence similarity (s5 fig.). these data suggest that the u1 protein of the sripuviruses originated from a duplication of the p gene, with the downstream copy of the gene retaining the parental function. similarly, in the curioviruses there is extensive amino acid sequence similarity between the u3 proteins of curv and irirv and the n-terminal region of the g proteins, suggesting evolution of u3 through partial duplication of the g gene, which lies immediately downstream. putative accessory genes were found to be abundant and varied greatly in number and location in each genome (fig. 1). a complete list of orfs >180 nt is annotated in s2 table. in most cases, homology searches detected no significant amino acid sequence identity with entries in genbank. however, various rhabdovirus accessory gene families were identified based on amino acid sequence identity in our custom blast searches, or common structural characteristics. viroporins. viroporins are small hydrophobic proteins that oligomerize in host cell membranes to form hydrophilic pores, disrupting various cellular processes and promoting virus replication [28]. orfs encoding viroporin-like proteins were found in more than one-third of the rhabdoviruses in the data set, either as overlapping or consecutive orfs within the g gene, or in additional transcriptional units following the g (or g ns) gene (fig. 1). orfs encoding putative viroporins were evident in the genomes of all ephemeroviruses, tibroviruses, hapaviruses, bahiaviruses, almendraviruses and curioviruses, as well as the unassigned species aruv and inhv (fig. 4). several of these proteins have been identified previously [19, 22, 29-35].
like the befv α1 protein for which viroporin activity has been confirmed experimentally, these proteins have the structural characteristics of class ia viroporins, including a central transmembrane domain and a highly basic c-terminal domain. however, although located in similar positions in the genomes, they are generally too divergent in sequence to establish orthology [22, 36]. other small transmembrane proteins. small proteins with a predicted central transmembrane domain but lacking other characteristics of class ia viroporins were identified in several other rhabdoviruses (s6 fig.; s2 table). transmembrane proteins with an n-terminal ectodomain are encoded in the gx orf of sripuviruses and the u3 orf of one curiovirus (rbuv). however, in other curioviruses (curv and irirv), transmembrane proteins are encoded in the u2 orf and are predicted to have the reverse membrane topology to the rbuv u3 protein. sequence alignments further suggest these proteins are not orthologous. there is also a small double-membrane spanning protein with a predicted short ectodomain loop encoded in an alternative orf in the fukv m gene that is not present in other ledanteviruses. other small hydrophobic (sh) proteins. small highly hydrophobic proteins (6.8-10.8 kd) lacking predicted transmembrane domains are encoded in all tupaviruses (as independent transcriptional units following the m gene) and sripuviruses (as overlapping orfs within the m gene) (s7 fig.; s2 table). all have similar hydropathy profiles with a highly hydrophilic n-terminal domain extending to the centre of the sequence, but sequence identity indicative of orthology is restricted to closely-related viruses. several of these sh proteins have been identified previously but their function remains unknown [37-40]. large class i transmembrane glycoproteins. all ephemeroviruses encode a class i transmembrane glycoprotein (g ns) in the orf following the g gene [18, 21, 30, 31].
ngav (assigned to the proposed new genus hapavirus) also encodes a g ns protein with similar structural characteristics [35]. however, as we found no evidence to support recombination between ngav and any ephemerovirus, the ngav g ns gene is likely to have arisen by an independent duplication event of the upstream g gene with which it shares amino acid sequence identity. orf u1 immediately following the mcov g gene (genus hapavirus) also encodes a large class i transmembrane glycoprotein but lacks the set of conserved cysteine residues that are characteristic of g and g ns proteins, and our homology searches failed to identify similarity with any known protein (s8 fig.). other genus-specific accessory gene families. orthologous sets of accessory genes occur in genus-specific patterns in each of the structural protein gene junctions (fig. 1; s2 table). in addition to the hapavirus pmip genes, these include genes in the n-p junction of sripuviruses chov and smv (u1 proteins), the m-g junction of curioviruses (u1 and u1x proteins) and tibroviruses (u1 and u2 proteins), and the g-l junction of curioviruses (u3x proteins) and ephemeroviruses (α2, β, γ and δ proteins) (s9 fig. to s11 fig.). some of these orthologous gene sets have been described previously [15]. most encode proteins without remarkable structural characteristics and of unknown function (s2 table). several general architectural patterns in the arrangement of orfs were evident, implicating several mechanisms of non-canonical gene expression. non-canonical expression mechanisms are used commonly in other families of rna viruses to increase genome complexity without significantly increasing genome size [41]. the patterns we observed in this data set were associated with consecutive, overlapping or alternative orfs within individual transcriptional units. consecutive orfs and turbs motifs.
consecutive long orfs with termination and initiation codons that are either overlapping (e.g., uaaug) or separated by a short stretch of nucleotides were common in several groups of rhabdoviruses (fig. 5). as previously observed for flav, this 'stop-start' arrangement is commonly preceded by a 'termination upstream ribosome-binding site' (turbs), which contains a short sequence motif that is complementary to the loop region of helix 26 of 18s ribosomal rna [19, 41]. the turbs may also contain flanking anti-complementary sequence motifs that are predicted to form a stem-loop structure. this arrangement was found in the m transcriptional unit in the sripuviruses, the g transcriptional unit of several hapaviruses (flav, hpv, manv, mqov, kamv, mosv and glov) and the transcriptional unit between the p and m genes of glov. the 'stop-start' arrangement also occurs in the transcriptional unit between the g and l genes of aruv, allowing expression of the u2 orf, but in this case the turbs appears to be further upstream of the stop-start site. finally, the α gene transcriptional unit in most ephemeroviruses contains consecutive orfs encoding a viroporin (α1) and a second protein of unknown function (α2). in kotv, a turbs is evident upstream of the stop-start site but in other ephemeroviruses the turbs appears to be more cryptic. overlapping orfs and ribosomal frame-shift (rfs) sites. overlapping orfs are common in rhabdovirus genomes and represent a second common architectural arrangement requiring non-canonical gene expression. overlapping orfs occur within the n transcriptional unit (wonv, orv, pcv, mcov, manv), the g transcriptional unit (wonv, orv, pcv, bgv, harv) or within additional transcriptional units between the p and m genes (manv, ngav) or the m and g genes (curv, irirv, rbuv).
expression of the second orfs in these arrangements would require either internal initiation in an alternative reading frame or another mechanism such as rna editing or a ribosomal frame-shift (rfs) to extend the first orf. use of alternative initiation codons has been reported in the m and p genes of vsv and the p gene of rabv, and rna editing has been described in the p gene of paramyxoviruses [23, 42-45]. although not described previously in mononegaviruses, potential rfs sites were identified in some of these rhabdovirus gene overlap regions, featuring the 'slippery' sequence motifs uaruuuuuuca (bgv, harv, msv) or ccnuuuuuuga (wonv, orv, pcv) followed by a predicted stem-loop structure (s12 fig.). these sequence motifs and associated stem-loop structures most closely resemble the -1 rfs that allows expression of gag-pol in hiv-1 and other lentiviruses [41, 46]. alternative orfs and leaky ribosomal scanning. the third architectural arrangement involves the use of alternative orfs within a longer orf. this arrangement was described previously in vsiv, in which two small basic proteins of 55 and 65 amino acids (c and c') are expressed from an alternative orf within the p gene [23, 24]. on this basis, we scanned the rhabdovirus genome data set for alternative orfs of various size ranges and observed that the frequency varied from ~2.3/genome for orfs in the range of 90-150 nt (30-50 amino acids) to ~8.6/genome for the range 150-210 nt (50-70 amino acids) (fig. 6). alternative orfs ≥60 amino acids occurred in each of the structural protein genes (n, p, m, g and l) and in the additional transcriptional units between the p and m genes. they were most common in the p and least common in the m genes. as observed in other viruses, expression of these alternative orfs could occur by leaky ribosomal scanning, allowing initiation of translation by a proportion of ribosomes on the alternative start codon [41].
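the scan for alternative orfs described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the procedure used in the study (which relied on geneious): the function names `find_orfs` and `alternative_orfs` are hypothetical, sequences are taken in positive (mrna) sense, an orf is assumed to run from the first aug after a stop codon to the next in-frame stop, and orf length is counted in coding nucleotides excluding the stop codon (so that 180 nt corresponds to 60 amino acids).

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_nt=90):
    """Return sorted (start, end) pairs of orfs in the three forward frames.

    start is the index of the A of ATG, end is the index of the first base
    of the stop codon, so end - start is the coding length in nucleotides.
    """
    seq = seq.upper().replace("U", "T")  # accept rna or dna letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop
            elif codon in STOPS:
                if start is not None and i - start >= min_nt:
                    orfs.append((start, i))
                start = None
    return sorted(orfs)


def alternative_orfs(seq, primary, min_nt=90):
    """Orfs nested inside the primary orf but read in a different frame."""
    p_start, p_end = primary
    return [
        (s, e)
        for s, e in find_orfs(seq, min_nt)
        if (s - p_start) % 3 != 0 and p_start <= s and e <= p_end
    ]


if __name__ == "__main__":
    # synthetic sequence (invented for this example): a frame-0 orf that
    # contains a 93-nt orf in the +1 frame
    demo = "ATG" + "C" + "ATG" + "GCC" * 30 + "TGA" + "CC" + "TAA"
    print(find_orfs(demo))
    print(alternative_orfs(demo, primary=(0, 102)))
```

counting the returned orfs per genome within a chosen length window (for example 90-150 nt) would give frequencies of the kind reported in fig. 6, although the exact counting rules used there are not stated in the text.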
although it is not known which (if any) of these alternative orfs are expressed, several factors are likely to be important in determining the probability and level of expression: i) the kozak contexts of the first and alternative initiation codons; ii) the length of the alternative orf (longer orfs are less likely to occur by chance); iii) the location of the alternative orf (distally located orfs are less likely to be expressed in long transcripts); and iv) the expression level of the transcript (l gene transcripts are likely to be the least abundant). for example, short orfs with initiation codons in poor kozak context at the distal end of the l gene are not likely to be expressed at significant levels, if at all. however, in some cases, closely related viruses were found to contain alternative orfs at the same genome location, with initiation codons in good context and encoding predicted polypeptides with high levels of sequence identity (s2 table). such arrangements occurred in the n genes of hpv and flav, the p genes of manv and mqov, the u2 and m genes of kamv and mosv, and near the start of the g genes of the sripuviruses (niav, sriv, chov and smv); these proteins are considered very likely to be both expressed and functional. [fig. 6: doi:10.1371/journal.ppat.1004664.g006] we have conducted a detailed analysis of the structural organisation and genome evolution of a family of negative-sense rna viruses, the rhabdoviridae. previous studies have surveyed known rhabdoviruses for biological and genomic diversity, revealed phylogenetic relationships, and considered factors that may have determined their rates of evolution [14, 15, 47, 48]. in this study, we greatly expanded the repertoire of rhabdovirus genome sequences, which demonstrate extensive variation in genome size and complexity, allowing the assignment of seven proposed new genera.
we also identified patterns of accessory gene evolution and expression, and showed that changes in rhabdovirus genome length and composition have occurred throughout the evolutionary history of the family, primarily through the generation and loss of new transcriptional units. this observation is especially striking given the obvious constraints on viral genome size [7] . the most remarkable aspect of this analysis is the number and variety of additional orfs identified in rhabdovirus genomes, which provides a very different perspective of the family and its evolution than had been obtained from studies of the traditional prototype members (vsiv and rabv). as many of these orfs occur as additional transcriptional units complete with conserved transcriptional control sequences, there is a high likelihood that they would be expressed in infected cells. expression of orfs located in additional transcriptional units has been demonstrated previously for several ephemeroviruses and for the hapavirus wonv [18, 21, 30, 31, 36, 49] . others occur as either alternative or overlapping orfs. further studies are required to determine which of these orfs may be expressed, but we suggest that expression is likely when both the encoded amino acid sequence and the translational context are conserved in related species. notably, very few of the additional orfs detected in this analysis encode proteins with identifiable sequence similarity to other known proteins. sequence similarity, when detected, occurred only between closely related viruses assigned to a genus and, although some accessory protein families were identified, these were more commonly related by shared structural characteristics, such as charged or transmembrane domains, than by sequence. this has been observed previously for so-called orphan ('orfan') proteins in other viruses and bacteria. 
it has been suggested that the uniqueness of orphan proteins, or their restriction to a single species or genus, is the result of creation de novo, rather than by recombination or lateral gene transfer, and that they play an 'accessory' role in viral pathogenicity or transmission instead of having functions in virion structure or replication [50] [51] [52] . it has also been observed that many orphan proteins are predicted to be highly disordered in structure or, when ordered, structural resolution has revealed unique folds [50] . as such, future determination of the biological activities of the plethora of novel proteins identified here will require functional studies that may well provide important insights into aspects of infection and immunity as well as fundamental cellular processes and pathways. substantial variation in genome size and complexity was also observed in many rhabdovirus genera, suggesting that the length of the genome is not heavily constrained in all members of the family. indeed, the presence of new orfs and/or very long stretches of non-coding sequence within or between transcriptional units was noted frequently. previous observations have demonstrated that foreign genes of up to~6 kb can be inserted into the vsiv genome without significant disruption to viral replication in vitro [53, 54] . expanded vsiv genomes were morphologically similar but proportionally longer than wild-type viruses, suggesting that the unique morphology of the rhabdovirus particle may more readily accommodate genome expansion than other virion structures. a significant body of evidence suggests that genome size in rna viruses is likely to be constrained by low replication fidelity [7, 8] , and a relationship between genome size and error rate has been observed in a diverse array of organisms [55] . 
however, if the genome sizes of rhabdoviruses are constrained by selective pressures other than (or in addition to) those imposed by the background mutation rate, genome expansion may not require a concomitant reduction in polymerase error rates. as the mutation rate of rhabdoviruses has only been determined experimentally for vsiv thus far (~6 × 10^-6 subs/nucleotide/replication), it is impossible to assess whether the increases in genome size observed here have been associated with concomitant reductions in mutation rate [48]. it is also striking that while some rhabdovirus genomes appear to have undergone major changes in length and complexity, others contain only the 3' and 5' promoter regions and five canonical transcriptional units with minimal 5' and 3'utrs. this suggests that the acquisition and loss of new genes and intergenic regions may be a regular feature of rhabdovirus evolution. previous studies of rna viruses have concluded that constraints on genome size imposed by polymerase error have led to various strategies to minimize genome size while increasing functional complexity, such as gene overlaps and protein multi-functionality [9, 56]. given these size constraints, it is unclear why long non-coding regions would arise both within and between transcriptional units and be maintained throughout the evolution of some rhabdovirus genera. it has been known for many years that a long 3'-utr of unknown function (ψ region) in the g gene of rabv is unnecessary for efficient replication in cell culture or in mice, but may play a role in neuroinvasion [57-59]. indeed, the retention of similar ψ regions in all lyssaviruses and the existence of long utrs and igrs in other rhabdoviruses suggest that they must provide some fitness advantage in vivo, such as stabilising rna secondary structure, serving as a source of, or targets for, micro rnas, or attenuating transcription of downstream genes to achieve the most effective balance of gene expression.
indeed, an analysis of patterns and rates of sequence evolution in the rhabdoviridae and other families in the mononegavirales revealed that, although non-coding regions are less conserved than those that encode proteins, their evolutionary rates are associated with relative genomic position, suggesting that they impact on gene expression [60] . additional orfs and non-coding sequences occurred at all junctions of the canonical structural protein genes (i.e., n-p, p-m, m-g, and g-l), although there was variation in both the frequency of insertion and the extent of expansion. notably, insertions at the n-p junction are rare, with a single additional orf present in the closely related sripuviruses chov and smv, and short overlapping orfs present within the n gene transcriptional unit in some hapaviruses. it has been reported previously in a study of vsiv recombinants that only the n-p gene junction was refractory to the stable expression of an inserted transcriptional unit, and resulted in a virus with significantly reduced replication efficiency [61] . in contrast, transcriptional units inserted at other gene junctions were stably expressed, maintained through repeated passages and had no effect on replication efficiency. as the insertion of additional transcriptional units attenuates expression levels of all downstream genes, this may be associated with the importance of maintaining precise control of n and p protein ratios in infected cells to ensure efficient switching between the transcription and replication modes of the ribonucleoprotein complex [62, 63] . the relationships, locations and contexts of additional orfs in various viruses lead us to propose a general model for rhabdovirus genome plasticity, which can account for both gains and losses in genome size and complexity (fig. 7) . 
in each of these viruses, small orfs of various lengths occur within most transcriptional units; and although only those ≥180 nt have been catalogued here, there are numerous other smaller orfs throughout most genomes. it is reasonable to assume that, although the polypeptides encoded in many of these orfs may not be expressed at all during infection, some may be expressed through leaky ribosomal scanning. these are likely to represent a rich genetic resource for the evolution of new functional genes in rna viruses [4], triggering the rapid evolution of highly specialised functions. concurrently, the evolution of a suitable kozak context, turbs motifs and ribosomal frame-shift sites would allow optimal expression within the parental transcriptional unit. ultimately, these new orfs may become uncoupled from the parental gene through gene (sequence) duplication [18]. as observed previously, this process would allow unconstrained evolution of the new orf and loss of the redundant copy of the parental orf [4, 64]. alternatively, new genes may also evolve independently of existing orfs. in some rhabdoviruses in our data set, very long non-coding regions (up to 749 nt) were present either within or between transcriptional units that could serve as a resource to spawn genes de novo in the absence of the evolutionary constraints imposed on alternative or overlapping orfs. this is most likely to occur when orfs are present in transcribed non-coding regions (utrs) such as the ψ region of wcbv in which, uniquely amongst lyssaviruses, an orf of 180 nt has been identified [65]. the creation of new genes de novo in non-transcribed igrs, such as those present in the g-l gene junctions of ljv, kotv and koolv, almost certainly would require prior or simultaneous evolution of new or modified transcriptional control sequences to allow their expression. we recognise that other mechanisms of genome expansion are also possible.
in central american isolates of vsiv, for example, imprecise reiterative insertions of up to 300 nt in the 5'-utr of the g-gene (variations of 3'-uuuuuaa-5') have been attributed to non-templated extension by polymerase stutter at the ttp sequence [66, 67]. although homologous recombination appears to be very rare in mononegaviruses [68], and we found no evidence of lateral gene transfer, we cannot exclude their involvement in rhabdovirus genome expansion. it is also evident that although there is an overall trend toward an expansion of genome size and complexity in the rhabdoviruses, gene loss is also likely to have occurred periodically throughout the evolution of the family. for example, the ephemerovirus γ proteins appear to have been lost in arv and obov, and the hapavirus pmips are entirely absent only from mcov (fig. 1). although our data suggest that gene gain is a more frequent process than gene loss, we acknowledge that, if loss is very frequent, we might not be able to observe it given the available data. this may be resolved in the future with the acquisition of significantly more genomes sampled more closely in time. indeed, as defective-interfering particles are known to occur commonly in rhabdoviruses, a mechanism for purging redundant sequences appears to be readily available [69-71]. nevertheless, it is evident that a remarkable capacity for genomic plasticity through the gain and loss of accessory functions has been a central theme of rhabdovirus evolution. although our analysis was limited to the rhabdoviridae, similar mechanisms of genome expansion appear to occur in other families of non-segmented (-) ssrna viruses (mononegavirales). for example, amongst the paramyxoviridae genome length varies by 46.5% from human metapneumovirus (13,113 nt) to beilong virus (19,212 nt), and paramyxoviruses also contain novel accessory genes in transcriptional units inserted at various gene junctions [72].
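the genome size variation figures quoted in the text can be reproduced with a short calculation. the sketch below assumes that variation is expressed as the increase from the smallest to the largest genome relative to the smallest; this definition is inferred from the reported numbers rather than stated by the authors, and the function name `size_variation_pct` is hypothetical.

```python
def size_variation_pct(smallest_nt, largest_nt):
    """Percentage increase from the smallest to the largest genome."""
    return 100.0 * (largest_nt - smallest_nt) / smallest_nt


if __name__ == "__main__":
    # genome lengths (nt) as quoted in the text
    rhabdo = size_variation_pct(10863, 16133)    # fukv to koolv
    paramyxo = size_variation_pct(13113, 19212)  # human metapneumovirus to beilong virus
    print(f"rhabdoviridae: {rhabdo:.1f}%")
    print(f"paramyxoviridae: {paramyxo:.1f}%")
```

under this definition the two ranges come out at 48.5% and 46.5%, matching the values given for the rhabdoviruses in this data set and for the paramyxoviridae.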
the apparent propensity for genome expansion in mononegaviruses may be due to their discontinuous transcription strategy which generates multiple viral mrnas. sequence insertions within and between the individual transcriptional units of mononegaviruses are less likely to disrupt gene expression than in (+) ssrna viruses in which the genome commonly encodes a single polyprotein which is processed post-translationally. finally, this study has also provided an important advance in rhabdovirus taxonomy, allowing the assignment of six new species to existing genera and the assignment of 37 species to seven proposed new genera as well as the identification of six new unassigned species. there are currently no formal criteria for genus demarcation in rhabdoviruses. a system of genetic classification (demarc) that allows demarcation of viral taxa based on pairwise evolutionary distances has been proposed and, for picornaviruses, was shown to be comparable to expert-based taxonomic classification [73, 74]. however, the application of this approach to the rhabdoviridae would likely require a larger set of sequenced genomes at lower taxonomic levels [75], and would be compromised by extensive rate variation among lineages (as this leads to biases in genetic distance measurements). in the taxonomy of higher organisms, to be descriptively useful, a genus should be monophyletic, reasonably compact, and ecologically, morphologically, or biogeographically distinct [76]. our assignment of new genera in the rhabdoviridae has been based primarily on the identification of well-supported monophyletic groups using unambiguously aligned regions of the l gene, together with a consideration of common features of genome organisation and known aspects of viral ecology.
genome organisation has proven here to be a useful taxonomic marker as similar arrangements of accessory genes and other conserved elements of genome architecture appear to be the result of significant evolutionary events that provide resolution between the family and species levels. for some of the new genera, host and/or vector associations have also been relatively informative but in many cases, only single isolates of a species are available and little else is known of their ecology. it is likely that the proposed assignments of viruses to genera and the placement of the proposed unassigned species will evolve into a more complete taxonomic description as more viruses are discovered and as ecological data accumulate. details of the viruses included in this study, including taxonomic status, sources and dates of isolation, and genbank accession numbers of genome sequences are given in s1 table. all but three viruses sequenced in this study were obtained from the world reference center for emerging viruses and arboviruses (wrceva), located at the university of texas medical branch, galveston. of the remaining viruses, fukv and koolv were obtained from the collection held at the csiro australian animal health laboratory, geelong, and joiv was obtained from the qimr collection held at the queensland university of technology, brisbane, and kindly provided by dr john aaskov. viruses sequenced in this study were prepared as described previously [37]. with the exception of hpv, itav, curv, glov, inhv, nmv, mebv, yatv, ldv, garv, cntv, irirv, rbuv, barv, ljav, keuv, mcov, smv, chov, pcv and bav, which were sequenced directly from infected suckling mouse brain, viruses were sequenced from viral preparations grown in bhk-bsr, c6/36 or vero cell monolayers. sequencing was performed using either the illumina hiseq or miseq platforms. viral rna was fragmented by incubation at 94°c for 8 min in 19.5 µl of fragmentation buffer (illumina 15016648).
a sequencing library was prepared from the sample rna using an illumina truseq rna v2 kit following the manufacturer's protocol. samples were sequenced using the 2 × 50 paired-end protocol. reads in fastq format were quality-filtered and any adapter sequences were removed using trimmomatic software [77]. the de novo assembly program abyss [78] was used to assemble the reads into contigs using several different sets of reads and k values from 20 to 40. the longest contigs were selected and reads were mapped back to the contigs using bowtie 2 [79] and visualized with the integrative genomics viewer [80] to verify that the assembled contigs were correct. total reads ranged from 0.5 to 12 million and the percentage of reads mapping to the virus genome in each sample ranged from 0.2% to 33%. details are available upon request. assembly of full genome sequences was performed as previously described [37] and predicted orfs >30 amino acids in length were identified across each genome using geneious 7.0.6 (biomatters ltd). for each non-canonical orf >60 amino acids in length, we sought to identify putative homologues by first comparing the protein sequence to the complete non-redundant protein sequence database available on genbank using the blastp and psi-blast search algorithms, as well as to the uniprot20 database using the hidden markov model alignment-based algorithm hhblits [81]. for these searches, we investigated all matches with an e-value <1. we then created a custom protein database containing all orfs >60 amino acids in length from our data set (648 proteins) and performed a custom blast search to identify homologues within this data set. here, an e-value of <1e-3 was considered a significant match. amino acid sequence alignments containing all putative matches to each orf were then created using clustal x and evidence of structural and sequence similarity was investigated by visual inspection.
structural predictions for proteins were conducted using compute pi/mw, signalp, tmhmm, tmpred, netnes and netnglyc available through the expasy bioinformatics resource portal (http://www.expasy.org/). to quantify the location and extent of variation in genome size in our data set, we compared the average length of each genomic region within and between rhabdovirus genera. for all viruses, we normalized the length of each gene region (from the ti to ttp sequences, inclusively) and intergenic region by dividing by the length of the corresponding l gene, which varied least across the data set (coefficients of variation: n = 0.06, p = 0.12, m = 0.09, g = 0.13, l = 0.01). as there was substantial variability in the proportion of the 5' and 3' utrs that were included in the sequence data set, we considered each genome to begin at the first ti sequence and end at the final ttp sequence for this analysis. to infer evolutionary relationships among animal rhabdoviruses, we compiled sequences of the l (rna-dependent rna polymerase) protein, as this was the most highly conserved protein across the data set. we initially attempted to root the tree using a standard outgroup method. members of the rhabdovirus genera that infect plants (i.e., cytorhabdovirus and nucleorhabdovirus) were excluded as their sequences were highly divergent. we therefore utilized four members of the genus novirhabdovirus (infectious haematopoietic necrosis virus adb93801; viral hemorrhagic septicaemia virus bah57327; hirame rhabdovirus aco87999; and snakehead rhabdovirus np050585) as outgroups.
unfortunately, these novirhabdovirus sequences were also far too divergent (>>1 amino acid change per site under multiple amino acid substitution models; results available on request) to establish a reliable rooting for our data set, as three different basal groups were identified using different models of amino acid substitution, although overall tree topologies were similar among substitution models (results available on request). in addition, the use of the novirhabdoviruses as outgroups resulted in excessive numbers of residues being removed following gblocks pruning (see below). based on the observation that most known rhabdoviruses are either insect viruses or replicate in insect vectors, it has been reasonably argued that plant and animal rhabdoviruses may have origins in insects [82]. we therefore selected the rooting scheme that best fit this theory. to this end, we chose one of the two basal clades from the novirhabdovirus-rooted tree, comprising viruses isolated from mosquitoes (i.e., the almendraviruses), as the most divergent group. we then repeated the phylogenetic analysis (procedure described below) excluding the novirhabdoviruses and rooting it on the almendraviruses. importantly, the choice of outgroup did not influence relationships either between or within the major clades demonstrating strong bootstrap support (bsp 85). the alignment used for the final tree inference (i.e., excluding the novirhabdoviruses) was comprised of amino acid sequences aligned using the muscle program [83], with ambiguously aligned regions removed using the gblocks program with default parameters [84]. this resulted in a final sequence alignment of 100 taxa, 1007 amino acid residues in length. the phylogenetic relationships among these sequences were determined using the maximum likelihood (ml) method available in phyml 3.0 [85] employing the wag+g model of amino acid substitution and subtree pruning and regrafting (spr) branch-swapping.
the phylogenetic robustness of each node was determined using 1,000 bootstrap replicates and nearest-neighbour branch-swapping. fig. (a-d) . amino acid sequence alignments of small accessory proteins encoded in the genomes of ephemeroviruses. (pdf) s10 fig. (a-d) . amino acid sequence alignments of the u1, u1x proteins, u3x and u4x proteins of the curioviruses, and of the rbuv u2 protein with the itav u1 protein. (pdf) s11 fig. (a, b) . amino acid sequence alignments of the u1 and u2 proteins of tibroviruses. (pdf) s12 fig. (a-e) . analysis of the potential ribosomal frame-shift sites in the sequence overlap regions of curioviruses and some hapaviruses. (pdf) s1 table. rhabdoviruses for which genome sequences have been used in this study. the evolution and emergence of rna viruses virus taxonomy. classification and nomenclature of viruses. ninth report of the international committee on taxonomy of viruses the footprint of genome architecture in the largest genome expansion in rna viruses the evolution of genome compression and genomic novelty in rna viruses lack of evidence for proofreading mechanisms associated with an rna virus polymerase rates of spontaneous mutation error thresholds and the constraints to rna virus evolution selforganization of matter and the evolution of biological macromolecules why genes overlap in viruses pacing a small cage: mutation and rna viruses discovery of the first insect nidovirus, a missing evolutionary link in the emergence of the largest rna virus genomes unique and conserved features of genome and proteome of sars-coronavirus, an early split-off from the coronavirus group 2 lineage nidovirales: evolving the largest rna virus genome the rhabdoviruses: biodiversity, phylogenetics, and evolution rhabdovirus accessory genes global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the global burden of disease study virus taxonomy, ninth report of the international committee 
on taxonomy of viruses adelaide river rhabdovirus expresses consecutive glycoprotein genes as polycistronic mrnas: new evidence of gene duplication as an evolutionary process gene duplication and phylogeography of north american members of the hart park serogroup of avian rhabdoviruses gene duplication is infrequent in the recent evolutionary history of rna viruses the genome of bovine ephemeral fever rhabdovirus contains two related glycoprotein genes genome organization and transcription strategy in the complex gns-l intergenic region of bovine ephemeral fever rhabdovirus a small highly basic protein is encoded in overlapping frame within the p gene of vesicular stomatitis virus identification of a set of proteins (c' and c) encoded by the bicistronic p gene of the indiana serotype of vesicular stomatitis virus and analysis of their effect on transcription by the viral rna polymerase sunguru virus: a novel virus in the family rhabdoviridae isolated from a chicken in north-western uganda enhanced arbovirus surveillance with deep sequencing: identification of novel rhabdoviruses and bunyaviruses in australian mosquitoes characterization of two new rhabdoviruses isolated from midges (culicoides spp) in the brazilian amazon: proposed members of a new genus viroporins: structure and biological functions arboretum and puerto almendras viruses: two novel rhabdoviruses isolated from mosquitoes in peru kotonkan and obodhiang viruses: african ephemeroviruses with large and complex genomes malakal virus from africa and kimberley virus from australia are geographic variants of a widely distributed ephemerovirus complex genome organization in the gns-l intergenic region of adelaide river rhabdovirus tibrogargan and coastal plains rhabdoviruses: genomic characterisation, evolution of novel genes and seroprevalence in australian livestock genomic characterisation of wongabel virus reveals novel genes within the rhabdoviridae ngaingan virus, a macropod-associated rhabdovirus, 
contains a second glycoprotein gene and seven novel open reading frames bovine ephemeral fever rhabdovirus α1 protein has viroporin-like properties and binds importin β1 and importin 7 niakha virus: a novel member of the family rhabdoviridae isolated from phlebotomine sandflies in senegal characterization of the tupaia rhabdovirus genome reveals a long open reading frame overlapping with p and a novel gene encoding a small hydrophobic protein genetic characterization of k13965, a strain of oak vale virus from western australia characterization of durham virus, a novel rhabdovirus that encodes both a c and sh protein non-canonical translation in rna viruses translation initiation at alternate in-frame aug codons in the rabies virus phosphoprotein mrna is mediated by a ribosomal leaky scanning mechanism identification of two additional translation products from the matrix (m) gene that contribute to vesicular stomatitis virus cytopathology accessory genes of the paramyxoviridae, a large family of nonsegmented negative-strand rna viruses, as a focus of active investigation by reverse genetics internal initiation of translation on the vesicular stomatitis virus phosphoprotein mrna yields a second protein the where, what and how of ribosomal frameshifting in retroviral protein synthesis phylogenetic relationships among rhabdoviruses inferred using the l polymerase gene phylogenetic relationships of seven previously unclassified viruses within the family rhabdoviridae using partial nucleoprotein gene sequences wongabel rhabdovirus accessory protein u3 targets the swi/snf chromatin remodelling complex overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation orphans as taxonomically restricted and ecologically important genes finding families for genomic orfans genetically modified vsv(nj) vector is capable of accommodating a large foreign gene insert and allows high level gene expression expression of human 
immunodeficiency virus type 1 gag protein precursor and envelope proteins from a vesicular stomatitis virus recombinant: highlevel production of virus-like particles containing hiv envelope extremely high mutation rate of a hammerhead viroid the effect of gene overlapping on the rate of rna virus evolution rabies virus glycoprotein gene contains a long 3' noncoding region which lacks pseudogene properties infection characteristics of rabies virus variants with deletion or insertion in the pseudogene sequence identification of viral genomic elements responsible for rabies virus neuroinvasiveness level of gene expression is a major determinant of protein evolution in the viral order mononegavirales adding genes to the rna genome of vesicular stomatitis virus: positional effects on stability of expression the transcription complex of vesicular stomatitis virus transcription and replication of nonsegmented negative-strand rna viruses origins of genes: "big bang" or continuous creation? complete genomes of aravan, khujand, irkut and west caucasian bat viruses, with special attention to the polymerase gene and non-coding regions polymerase errors accumulating during natural evolution of the glycoprotein gene of vesicular stomatitis virus indiana serotype isolates full-length genome analysis of natural isolates of vesicular stomatitis virus (indiana 1 serotype) from north, central and south america phylogenetic analysis reveals a low rate of homologous recombination in negative-sense rna viruses defective interfering viruses the origins of defective interfering particles of the negative-strand rna viruses origin and replication of defective interfering particles beilong virus, a novel paramyxovirus with the largest genome of non-segmented negative-stranded rna viruses partitioning the genetic diversity of a virus family: approach and evaluation through a case study of picornaviruses toward genetics-based virus taxonomy: comparative analysis of a genetics-based 
classification and the taxonomy of picornaviruses genetics-based classification of filoviruses calls for expanded sampling of genomic sequences phylogeny of titmice (paridae): ii. species relationships based on sequences of the mitochondrial cytochrome-b gene trimmomatic: a flexible trimmer for illumina sequence data abyss: a parallel assembler for short read sequence data fast gapped-read alignment with bowtie 2 integrative genomics viewer hhblits: lightning-fast iterative protein sequence searching by hmm-hmm alignment plant and animal rhabdovirus host range: a bug's view muscle: a multiple sequence alignment method with reduced time and space complexity improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0 key: cord-003044-9uqa39j9 authors: cervera, héctor; ambrós, silvia; bernet, guillermo p; rodrigo, guillermo; elena, santiago f title: viral fitness correlates with the magnitude and direction of the perturbation induced in the host’s transcriptome: the tobacco etch potyvirus—tobacco case study date: 2018-03-19 journal: mol biol evol doi: 10.1093/molbev/msy038 sha: doc_id: 3044 cord_uid: 9uqa39j9 determining the fitness of viral genotypes has become a standard practice in virology as it is essential to evaluate their evolutionary potential. darwinian fitness, defined as the advantage of a given genotype with respect to a reference one, is a complex property that captures, in a single figure, differences in performance at every stage of viral infection. to what extent does viral fitness result from specific molecular interactions with host factors and regulatory networks during infection? can we identify host genes in functional classes whose expression depends on viral fitness? 
here, we compared the transcriptomes of tobacco plants infected with seven genotypes of tobacco etch potyvirus that differ in fitness. we found that the larger the fitness differences among genotypes, the more dissimilar the transcriptomic profiles are. consistently, two different mutations, one in the viral rna polymerase and another in the viral suppressor of rna silencing, resulted in significantly similar gene expression profiles. moreover, we identified host genes whose expression showed a significant correlation, positive or negative, with the virus' fitness. differentially expressed genes which were positively correlated with viral fitness activate hormone- and rna silencing-mediated pathways of plant defense. in contrast, those that were negatively correlated with fitness affect metabolism, reducing growth, and development. overall, these results reveal the high information content of viral fitness and suggest its potential use to predict differences in genomic profiles of infected hosts. fitness is a complex parameter often used by evolutionary biologists and ecologists to quantitatively describe the reproductive ability and evolutionary potential of an organism in a particular environment (linnen and hoekstra 2009; orr 2009). despite this apparently simple definition, measuring fitness is difficult and most studies only measure one or more fitness components (e.g., survival to maturity, fecundity, number of mates, or number of offspring produced) as proxies to total fitness (linnen and hoekstra 2009; orr 2009). in the field of virology, it has become standard to measure fitness by growth-competition experiments in mixed infections with a reference strain (holland et al. 1991; wargo and kurath 2012).
with this experimental setup, fitness is just the relative ability of a viral strain to produce stable infectious progeny in a given host (cell type, organ, individual, or species) when resources have to be shared with a competitor (domingo and holland 1997). regardless of its limitations, this approach provides a metric for ranking viral strains according to their performance in a particular environment/host. such a fitness measure has been pivotal for quantitatively understanding many virus evolution processes: the effect of genetic bottlenecks and accumulation of deleterious mutations (chao 1990; duarte et al. 1992; de la iglesia and elena 2007), the rates and dynamics of adaptive evolution into novel hosts, the pleiotropic cost of host range expansion (novella, clarke, et al. 1995; turner and elena 2000; lalić et al. 2011), the cost of genome complexity (pesko et al. 2015; willemsen et al. 2016), the cost of antiviral escape mutations (novella et al. 2005; westerhout et al. 2005; martínez-picado and martínez 2008), the topography of adaptive fitness landscapes (da silva and wyatt 2014; lalić and elena 2015; cervera et al. 2016), and the role of robustness in virus evolution (codoñer et al. 2006; sanjuán et al. 2007; novella et al. 2013). but differences in viral fitness should also matter in genome-wide studies seeking to understand the mode of action of the viruses (i.e., the precise way they interact with their hosts). it has been argued that an integrative systems biology approach to viral pathogenesis would result in a better understanding of pathogenesis and in the identification of common targets for different viruses, therefore serving as a guide to a more rational design of therapeutic drugs (tan et al. 2007; viswanathan and früh 2007; bailer and haas 2009; barabási et al. 2011; friedel and haas 2011; elena and rodrigo 2012; finzer 2017).
pioneering studies have ignored the high genetic variability of viruses in fitness and in mode of action. experimental evidence supports that even single nucleotide substitutions have significant effects on viral fitness regardless of whether they are synonymous or nonsynonymous, or they affect coding or noncoding genomic regions (sanjuán et al. 2004; carrasco, de la iglesia, et al. 2007; domingo-calap et al. 2009; peris et al. 2010; acevedo et al. 2014; bernet and elena 2015; visher et al. 2016). a common trend among all these studies is that, whenever fitness is evaluated in the standard host, the distribution of mutational effects is highly skewed toward deleterious effects, with a large fraction of mutations being lethal. furthermore, increasing evidence suggests that the distribution of fitness effects increases as the genetic divergence among two hosts (e.g., a tested host and the natural one) increases (lalić et al. 2011; vale et al. 2012; lalić and elena 2013; cervera et al. 2016). together, all these observations suggest that the fitness of a given viral genotype depends not only on its own genetic background but also on the host where fitness is evaluated. arguably, differences in viral fitness reflect differences in the virus-host interaction. as cellular parasites, viruses need to utilize all sorts of cellular factors and resources, reprogram gene expression patterns to their own benefit, and block and interfere with cellular defenses. all these processes take place in the host's complex network of intertwined interactions and regulations. interacting in suboptimal ways with any of the elements of the host network may have profound effects on the progression of a successful infection and therefore on viral fitness; inefficient interactions may result in attenuated or even abortive infections. little is known about how viral fitness informs about the underlying changes occurring in host gene expression and protein function at a genome-wide scale.
in this work, we have investigated the potential association between viral fitness and host transcriptional regulation upon infection as a first step in this direction. we have characterized the transcriptomic profiles of nicotiana tabacum l. var xanthi nn plants inoculated with a collection of genotypes of tobacco etch virus (tev; genus potyvirus, family potyviridae) that differ in their fitness in this natural host. analyses of gene expression data allowed us to characterize differential gene expression upon infection with different tev genotypes, as well as to identify sets of candidate genes whose expression correlated positively or negatively with the magnitude of tev fitness. differences in expression for representative genes from these two categories were experimentally validated by an alternative method.
differences in viral fitness and host symptomatology
figure 1 shows relevant information about the seven tev genotypes used for this study. the mutant genotypes differ from the wild-type (wt) genotype in a rather limited number of nonsynonymous mutations (1 or 2). however, their fitness values and the severity of symptoms induced differ widely. in this study, the fitness of each mutant genotype was estimated as the ratio of malthusian growth rates of the mutant and the wt (see materials and methods for details). significant differences existed among the fitness values of the seven selected genotypes (fig. 1b). figure 1c illustrates the differences in symptoms induced by each one of the seven genotypes. symptoms ranged from the asymptomatic infection or local chlorotic spots characteristic of mutant as13, through the mild etching of mutant cla11, to the severe etching induced by the wt and the other mutants. no correlation exists between virus fitness and symptoms, a finding previously reported for this experimental system (carrasco, de la iglesia, et al. 2007).
differences in viral fitness are associated with differences in the magnitude of the change in the host transcriptome
first, we sought to test whether differences in tev fitness might be associated with differences in the gene expression profiles of infected plants. we hypothesized that viral fitness results from a particular interaction between virus and host factors and assumed that the outcome of infection of a wt virus in its natural host results from an optimal (from the virus perspective) modulation of the host's gene expression profile. as viral fitness is reduced, interactions are less optimal and, consequently, the gene expression profile of the plant will be increasingly different from that resulting from the infection with the wt virus. to test this hypothesis, we infected n. tabacum plants with each one of the seven tev genotypes described earlier. symptomatic tissues were collected eight days postinoculation (dpi) for all mutants except for the very low fitness mutant as13, for which tissues were collected 15 dpi because of the delay in symptom appearance and severity (fig. 1c). total rnas were extracted, normalized, and used to hybridize n. tabacum gene expression 4 × 44 k microarrays (agilent). slides were handled as described in the materials and methods section; intensity signals were normalized using tools in babelomics (alonso et al. 2015). normalized expression data are contained in supplementary file s1, supplementary material online. figure 2a shows the clustering (unweighted average distance method; upgma) of average expression data for those genes that significantly changed expression (±2-fold) among plants infected with the seven viral genotypes (1-way anovas with false discovery rate (fdr) correction; overall p < 0.05) relative to the mock-inoculated plants.
regarding individual genes, two major clusters can be distinguished, one corresponding to the overexpression of genes related to stress response and a second one corresponding to the underexpression of genes involved in metabolism and plant development. to further explore the similarity in the perturbation induced by each viral genotype in the plants' transcriptome, we computed all pairwise pearson product-moment correlation coefficients (r) between the mean expression values for all genes in the microarray. these correlations were used as a measure of similarity to build a upgma dendrogram. the rationale for this analysis is as follows: the more correlated two expression profiles are, the more similar the effects induced in infected plants. when comparing expression profiles from a pair of infected plants, a high correlation may indicate that the genes that changed expression relative to the mock-inoculated plants are exactly the same in both samples, showing a similar expression pattern. conversely, if genes with differential expression do not match in the two profiles being compared, then the correlation will be low. figure 2b shows both the heat-map of the correlation coefficients and the resulting dendrogram. three clusters result from this analysis (fig. 2b). the first cluster is constituted by the three viral genotypes with the highest fitness values, that is, wt, pc55, and pc48. genotypes of intermediate fitness cla11, pc95, and cla2 constitute a second cluster. finally, plants infected with as13 show the most dissimilar gene expression profile. the heat-map is shown with viral genotypes ordered according to the upgma clustering. correlations decrease as the distance in the cladogram increases. within clusters, r > 0.85, whereas between clusters the correlations ranged between 0.65 < r < 0.75, except for plants infected with as13, whose similarity with other infected plants was r < 0.65.
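the correlation-based clustering described above can be sketched in a few lines of python. this is only an illustrative sketch under stated assumptions, not the pipeline used in the study: the genotype names and expression values in `toy_profiles` are made up, the functions `pearson` and `upgma` are hypothetical helpers written for this example, and the conversion of similarity to distance as 1 - r is an assumption (the text only states that the correlations were used as the similarity measure).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of named profiles.

    Distance between two profiles is taken as 1 - r (an assumption).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of profile names.
    """
    clusters = [(name,) for name in profiles]
    # Distance between every pair of individual profiles.
    d = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
         for a, b in combinations(profiles, 2)}

    def avg_dist(c1, c2):
        # UPGMA: unweighted mean of all leaf-to-leaf distances between clusters.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters and record the step.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Made-up mean expression values (e.g. log2 fold change vs. mock) for five genes.
toy_profiles = {
    "high_a": [2.0, 1.5, -1.0, -2.0, 0.5],
    "high_b": [2.1, 1.4, -1.1, -1.9, 0.6],
    "mid_a": [1.0, 2.0, -0.2, -1.0, 1.5],
    "low_a": [-0.5, 0.3, 0.8, 0.1, -1.0],
}

merges = upgma(toy_profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

in this toy example the two most correlated profiles are joined first and the least correlated profile is joined last, which mirrors the reading of the dendrogram given in the text: the earlier two genotypes merge, the more similar the perturbation they induce.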
next, to further investigate the similarity between expression profiles of plants infected with different tev genotypes, we performed a principal components (pc) analysis of all the gene expression data. the percentage of total observed variance explained by the first three components was ~93% (the first pc itself explained 81%). figure 2c shows the distribution of values in the space defined by the three first principal components. results are equivalent to those obtained with the two previous clustering methods, where genotypes are classified into three groups. wt, pc55, and pc48 are closer in the space and characterized by positive values of the first pc but negative values of the second and third pcs. cla11, pc95, and cla2 form a second group, with positive values of the first and second pcs but negative values of the third. as before, the effect of as13 on the host transcriptome is clearly different, and has negative values of the first and second pcs but positive values of the third.
fig. 2. (a) only genes that show significant differences among all infections (one-way anova with fdr, adjusted p < 0.05) are included in the heat-map. hierarchical clustering of genes done with upgma by using the correlations between all pairs of mean profiles as distance metric. genes down-regulated (in blue) mainly correspond to metabolic and developmental processes, whereas genes up-regulated (in red) mainly correspond to stress responses. (b) tev genotypes clustered (upgma) according to the similarity of the mean expression profiles of plants infected with each one of them. the heat-map represents the value of the pearson's correlation coefficient between pairs of mean profiles. (c) representation of the three major principal components from the data shown in panel (b). the three first pcs explain up to 93% of the total observed variance. lines link each genotype with the centroid of the 3d space. the arrow represents a putative trajectory of increasing viral fitness. (d) association between viral fitness and the magnitude of the perturbation (vs. mock-inoculated control plants) both relative to wt (p = 0.005). (e) association between viral fitness and the distance of each genotype to the wt (p = 0.003), from the dendrogram shown in panel (b).
interestingly, figure 2c shows that genotypes are located in this principal component space following a trajectory of increasing fitness values (indicated by the arrow in fig. 2c). along this trajectory, pcs switch sign in different directions. this transition suggests that the over- or under-expression of a set of genes is associated with particular levels of viral fitness: low fitness as13 is characterized by a positive third pc and a negative first pc while high fitness viruses are characterized by the opposite sign. that is, over- or underexpressed genes do not accumulate progressively as viral fitness changes. these genes will be evaluated in the following sections. following from our working hypothesis, if the wt virus has evolved to optimize its interaction with the host, it is logical that small departures in viral fitness will be associated with small deviations between the transcriptomes of plants infected with the wt virus and with viruses whose fitness is close to the wt. conversely, the less similar the fitness of the wt and mutant viruses, the more dissimilar the transcriptional profiles of infected plants would be. to test this prediction, we have explored the following: 1) the correlation between the similarity of transcriptional profiles of plants infected with the wt tev and with each mutant (again using pearson's r) and fitness and 2) the correlation between the distance from wt in the cladogram shown in figure 2b and fitness. the results of these analyses are shown in figure 2d and e.
as expected, both correlations were significant (r = 0.826, p = 0.005 and r = −0.857, p = 0.003, respectively; in both cases 5 df) and of the expected sign.
viral fitness and perturbation of host's transcriptomes . doi:10.1093/molbev/msy038 mbe
plants. the number of down-expressed degs ranges between 531 (for as13) and 2,809 (for cla11), while in the case of up-expressed degs the range is slightly narrower: from 781 (as13) to 2,696 (cla11). figure 3b illustrates the number of degs in common between all pairs of transcriptomes from infected plants. the heat-map shows a pattern of modularity, with three well-defined modules. the first module contains the three viruses with highest fitness (wt, pc48, and pc55), the second module contains the three viruses with intermediate fitness (pc95, cla11, and cla2), and the very low fitness genotype as13 is the only member of the third module. the number of shared degs within each of these modules is > 75% of the total. the number of shared degs between modules drops below 60%. next, following the same rationale as the previous section, we sought to determine whether the number of degs also depends on the difference in fitness from wt. in this case, we hypothesized that the overlap in the lists of degs must be similar for wt and viruses of equivalent fitness (e.g., pc48 or pc55), whereas the magnitude of the overlap between deg lists would decrease as differences in fitness became larger. figure 3c shows the counts of degs that are differentially expressed (for up- and down-expressed genes) between wt and the other six viral genotypes. as expected, pc48, pc55, and pc95 alter the same genes, though in a different magnitude (as shown in the previous section). the number of genes that are not in common with wt increases from cla11 (502) to cla2 (969) and as13 (2,001). of particular interest is the similarity between pc95, a mutant of the replicase nib gene, and cla11, a mutant of the vsr hc-pro gene.
these two mutations led to close fitness values (fig. 3a), but also resulted in significantly similar gene expression profiles (fig. 3b). at first sight, one may argue that their impact on transcriptomic profiles should be different since these mutations affect virus proteins that are functionally unrelated. however, our results suggest that the effects on the overall virus-host interaction of each mutant are canalized in the same way. this clearly shows that viral fitness is highly informative about the virus-host interaction. lists of genes are difficult to interpret and functional analyses provide a good tool to cluster genes into groups with related functions. to this end, we performed an analysis of enriched functional categories (go terms) for each viral genotype. figure 4 illustrates the way that plants infected with each one of the seven tev genotypes differ in the functional categories significantly overrepresented relative to the mock-inoculated plants. figure 4 shows the go terms ordered from the highest to the lowest viral fitness. the upper plane shows the functional categories altered in plants infected with wt tev, with metabolic process (go: 0008152) containing the largest number and photosynthesis (go: 0015979) the smallest. regulation of response to biotic stimulus (go: 0002831), defense response (go: 0006952), immune system process (go: 0002376), protein modification process (go: 0036211), hormone-mediated signaling (go: 0009755), and cell death (go: 0008219) are all enriched in up-expressed degs, while photosynthesis, lipid metabolic process (go: 0006629), and regulation of nitrogen compound metabolic process (go: 0051171) are categories significantly enriched in down-expressed degs. categories such as metabolic process, unspecific response to stress (go: 0006950), response to stimulus (go: 0050896), response to abiotic stimulus (go: 0009628), localization (go: 0051641), or transport (go: 0008150) are enriched in both types of degs. overall, genes involved in different aspects of plant defense pathways and response to infection are up-expressed, whereas genes involved in metabolism and photosynthesis are down-expressed.
fig. 4. functional analysis associated to differential gene expression. artwork of meaningful biological processes (in a plane). categories that are overrepresented in the two lists of degs for each tev genotype (up- and down-regulated), either in one of them or in both, are indicated with different colors. in red, we represent categories that are significantly enriched by up-expressed degs, in blue categories that are significantly enriched by down-expressed degs, and in pink categories enriched in both types of degs; the surface of each circle is proportional to the number of degs included in each category. enrichments were evaluated by fisher's exact tests with fdr (adjusted p < 0.05). the different planes are organized according to viral fitness. we considered infected versus mock-inoculated control plants (for interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).
the second fitness plane corresponds to genotypes pc48 and pc55, both of mild effect and carrying mutations in the ci gene. the most remarkable difference between these two genotypes and the rest of genotypes is the significant enrichment in up-expressed degs related to signal transduction (go: 0007165) and regulation of gene expression (go: 0010468). drifting down in the virus fitness scale, the next plane in figure 4 corresponds to genotype pc95, which shows a similar distribution of gos as the wt except for a lack of enrichment in up-expressed degs in the regulation of response to biotic stress category. the next plane corresponds to genotypes cla2 and cla11, hyposuppressors of moderate fitness carrying point mutations in the hc-pro gene.
plants infected with these two viral genotypes differ from plants infected with the wt virus in three main functional categories: the loss of significant enrichment in the hormone-mediated signaling category, and a significant enrichment in up-expressed degs in the localization and transport categories. finally, the bottom plane in the fitness scale corresponds to genotype as13 (fig. 4), which has a very weak vsr activity, very low fitness and induces no symptoms or very mild symptoms. these differences in fitness and severity of symptoms translate directly into the enrichment of the different functional categories. compared with wt, metabolic process is enriched with down-expressed degs while nonspecific response to stress is now much more enriched in up-expressed degs, and response to stimulus, protein modification process, localization, and transport are not enriched in any particular type of degs. moreover, no significant enrichment in the hormone-mediated signaling module was found in plants infected with as13, or for the other hc-pro mutant genotypes cla2 and cla11. taken together, the results shown in the previous sections suggest that the transcriptomic response of plants to infection varies with the fitness of the virus being inoculated. this observation motivated us to identify genes whose expression significantly correlates with viral fitness; that is, systematic changes in virus fitness are associated with an increase or decrease in the expression level of a particular gene. this is a correlation analysis and as such does not assume a functional dependence between viral fitness and the expression of individual host genes. yet, it may provide a list of candidate genes to be considered as determinants of viral fitness.
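a gene-by-gene screen of this kind can be sketched with a rank correlation, the statistic the analysis uses. the python below is an illustrative sketch under stated assumptions: the helper names are invented here, tied values receive average ranks, and the fitness and expression values are hypothetical rather than taken from the data set.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical example: one gene measured in plants infected with seven genotypes.
fitness = [0.45, 0.78, 0.82, 0.88, 0.95, 0.97, 1.00]
expression = z_scores([5.1, 6.0, 5.8, 6.9, 6.7, 7.4, 7.9])
print(f"r_s = {spearman_rs(fitness, expression):.3f}")
```

because ranks are unchanged by the z-score transformation, standardizing the expression values does not alter the coefficient. with seven genotypes and no ties, $r_s = 1 - 6\sum d^2/336$, so only a discrete set of values is possible (for instance 0.893 for $\sum d^2 = 6$ and 0.857 for $\sum d^2 = 8$).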
we computed a nonparametric spearman's correlation coefficient between viral fitness and the normalized degree of expression (z-score) for each one of the previously characterized degs.
correlation plots between host gene expression and viral fitness for those genes that significantly vary across all viral infections (one-way anova with fdr, adjusted p < 0.05), and that exhibit a significant positive (upper panel; red dots) or negative (lower panel; blue dots) trend (spearman's correlation test, p < 0.05). expression data represented as z-scores. (b) pie charts of biological and molecular functions. on the top, for genes whose expression increases with tev fitness (red dots in panel a); on the bottom, for genes whose expression decreases with fitness (blue dots in panel a). (for interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
next, we sought to explore which functional categories and molecular functions, if any, were enriched among these two subsets of degs. results are shown in figure 5b and functional annotations are all reported in the supplementary file s4, supplementary material online. there are significant differences in the distribution of positively and negatively correlated degs into different functional categories (fig. 5b, left column; homogeneity test: $\chi^2 = 29.225$, 6 df, p < 0.001), although the difference in magnitude is moderate (cramér's v = 0.304). among degs whose expression is positively correlated with viral fitness, biological regulation (go: 0065008) and developmental processes (go: 0032502) are strongly enriched compared with the negatively correlated degs. by contrast, negatively correlated degs contribute disproportionately more than positively correlated ones to the categories response to stimulus, localization, metabolic processes, and cell death.
focusing on molecular functions (fig. 5b, right column), a significant difference also exists among degs whose expression is positively and negatively correlated with viral fitness (homogeneity test: $\chi^2 = 36.720$, 6 df, p < 0.001), with the magnitude of the difference in the moderate-to-large range (cramér's v = 0.341). on the one hand, among positively correlated degs, nucleic acid binding (go: 0003676) shows the largest departure from negatively correlated ones. on the other hand, catalytic activity (go: 0003824) and transporter activity (go: 0005215) are the two molecular functions that appear to be enriched among negatively correlated degs. together, these results suggest that positively correlated degs play a role in the transcriptional regulation of host defenses. by contrast, degs with negative correlation between expression and tev fitness participate more in catalytic and transport activities than genes with positive correlation, suggesting a redirection of resources by the host that is not independent of viral fitness. normalized expression data used in figure 5 were estimated from changes in spot intensity in the n. tabacum gene expression 4 × 44 k microarrays (agilent). to validate these results with rt-qpcr, we selected four positively correlated and five negatively correlated degs that cover the entire range of observed significant spearman's correlation coefficients (supplementary file s4, supplementary material online). they represent different biological functions and are expressed at different developmental stages and under different environmental situations (see below).
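the effect sizes quoted for the two homogeneity tests follow the standard definition of cramér's v, $v = \sqrt{\chi^2 / (n \cdot \min(r - 1, c - 1))}$, where n is the total count and r and c are the numbers of rows and columns of the contingency table. the python sketch below illustrates the calculation; the function names are hypothetical, the counts are invented for illustration, and the 2 × 7 layout (positively vs. negatively correlated degs across seven categories) is an assumption inferred from the 6 df reported.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic, degrees of freedom, and total count.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical counts: rows are positively and negatively correlated genes,
# columns are seven functional categories.
counts = [
    [30, 22, 18, 25, 20, 12, 9],
    [14, 10, 35, 40, 45, 22, 14],
]
chi2, df, n = chi_square_homogeneity(counts)
print(f"chi2 = {chi2:.3f}, df = {df}, n = {n}, V = {cramers_v(chi2, n, 2, 7):.3f}")
```

under the same 2 × 7 assumption, the reported pairs of values (29.225 with 0.304, and 36.720 with 0.341) both imply a total of roughly 316 classified genes; this is an inference from the published statistics and not a number stated in the text.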
the four positively correlated degs selected were (ordered according to the observed $r_s$ values): dicer-like 2 gene (dcl2; $r_s = 0.893$), the gene encoding the vq motif-containing protein 29 (vq29; $r_s = 0.893$), the gene encoding the gast1 protein homolog 1 (gasa1; $r_s = 0.857$), and a gene encoding a member of the lipase/lipoxygenase plat/lh2 family (plat1; $r_s = 0.786$). dcl2 is involved in defense response to viruses, maintenance of dna methylation and production of ta-sirnas involved in rna interference (parent et al. 2015). vq29 is a negative transcriptional regulator of light-mediated inhibition of hypocotyl elongation that likely promotes the transcriptional activation of phytochrome interacting factor 1 (pif1) during early seedling development, and participates in the jasmonic acid-mediated (ja) plant basal defense, as the vq proteins interact with wrky transcription factors. gasa1 encodes a gibberellin- and brassinosteroid-regulated protein possibly involved in cell elongation (bouquin et al. 2001), also reported to be involved in resistance to abiotic stress through ros signaling (o'brien et al. 2012). plat1 encodes a lipase/lipoxygenase that promotes abiotic stress tolerance (hyun et al. 2015), is a positive regulator of plant growth, and regulates the abiotic-biotic stress cross-talk. the negatively correlated degs selected for validation are the adenosine kinase 2 gene (adk2; $r_s = -0.857$), the gene encoding the small 3b chain of the rubisco (rbcs3b; $r_s = -0.857$), the agamous-like 20 gene (agl20; $r_s = -0.786$), the factor of dna methylation 1 gene (fdm1; $r_s = -0.786$), and the granule-bound starch synthase 1 gene (gbss1; $r_s = -0.821$). adk2 encodes an adenosine kinase involved in adenosine metabolism, including the homeostasis of cytokinins (schoor et al. 2011), controls methyl cycle flux in an s-adenosyl methionine-dependent manner and plays a role in rna silencing by methylation.
rbcs3b is involved in carbon fixation during photosynthesis and in yielding sufficient rubisco content (zhan et al. 2014). agl20 is a dna-binding mads-box transcription activator modulating the expression of homeotic genes involved in flower development and maintenance of inflorescence meristem identity, transitions between vegetative stages of plant development and in tolerance to cold (lee et al. 2000). fdm1 is an sgs3-like protein that acts in rna-directed dna methylation, participating in the rna silencing defense pathway (xie et al. 2012). gbss1 is involved in glucan biosynthesis and responsible for amylose synthesis, which is essential for plant growth and other developmental processes (denyer et al. 1996). rt-qpcr-based relative expression data were calculated using the $\Delta\Delta C_T$ method normalized by each one of the two reference genes and then averaged (see materials and methods). to make expression data from microarray readings and rt-qpcr readily comparable, they were both transformed into z-scores. figure 6a shows the comparison of the two expression measures for the four degs with positive correlation with tev fitness and figure 6b for the five degs with negative correlation with tev fitness. two different plots are presented for each gene. in all cases, the left plot illustrates the relationship between the expression z-scores obtained with the microarray method (x-axis) and with the rt-qpcr method (y-axis) for each one of the seven tev genotypes; the solid lines indicate the best linear fit between these two data sets. in this representation, a regression line of slope 1 is expected if both quantification methods provide identical z-scores. in all nine cases, both expression z-scores are highly and significantly correlated; pearson's r values ranged from 0.696 (vq29) to 0.970 (gasa1) (in all cases 5 df, 1-tailed p ≤ 0.041). if a more stringent holm-bonferroni correction of the overall significance level is taken, then vq29 would not remain significant.
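the holm-bonferroni procedure mentioned above is a step-down correction: p-values are sorted in ascending order and the k-th smallest is compared with $\alpha/(m - k + 1)$, stopping at the first failure. a minimal python sketch follows; the function name is hypothetical and the p-values are invented for illustration, since the individual p-values of the nine validation genes are not listed in the text.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, in the input order, that is True where the
    null hypothesis is rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # the smallest p-value is tested at alpha / m, the next at alpha / (m - 1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for nine tests (NOT the values from the study).
example = [0.0002, 0.001, 0.003, 0.004, 0.006, 0.009, 0.02, 0.03, 0.041]
print(holm_bonferroni(example))
```

the design choice here is to stop at the first non-rejection, which is what distinguishes holm's method from applying each threshold independently.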
for each gene, the right plot shows both expression z-scores as a function of tev fitness; solid lines represent the best linear fit between normalized expressions and tev fitness. in this representation, the more overlap between the two regression lines, the better the agreement between both quantitative methods. vq29 and adk2 show the largest departure between both regression lines, though even in these extreme cases, the difference was not large enough to be significant in a nonparametric wilcoxon's signed ranks test (p ≥ 0.499 in all nine cases) or in a student's t-test for the comparison of regression coefficients (p ≥ 0.285). thus, we conclude that, at least for the sample of genes analyzed, the observed correlations between host's gene expression and viral fitness are consistent for both experimental methods used to evaluate levels of gene expression. we further delineated a picture of virus-plant interaction reflected in precise alterations of transcriptomic profiles and regulatory networks. plant-virus interactions result from the confrontation of two players with opposed strategies and interests. from the plant perspective, activation of basal defenses, immunity, hormone-regulated pathways, and rna-silencing (some of which are not virus-specific) will result in an immediate benefit to control virus replication and spread. we found that some plant defense responses are expressed upon infection regardless of the fitness of the virus, whereas other defenses are induced progressively as viral fitness increases. consistent with the first mode, we observed the activation of the genes eds1 and pad4, components of r gene-mediated disease resistance with homology to lipases, in every infection (fig. 7a). these are master regulators of plant defenses that connect pathogen signals with salicylic acid signaling (cui et al. 2017).
salicylic acid is involved in resistance to a broad spectrum of pathogens, and in particular viruses (alamillo et al. 2006; conti et al. 2017). consistent with the second mode, we observed the activation of many genes involved in defenses in proportion to tev fitness (fig. 7a). examples include the dcl2 and ago1 genes (key for the rna silencing response), genes modulating resistance to pathogens such as the subtilisin-like protease (sbt1.9), and genes expressing proteins involved in hormone-regulated defenses such as gasa1 and vq29, brassinosteroids (e.g., brassinosteroid enhanced expression 2 bee2, brassinosteroid-signaling kinases 2 and 7, and the bri1-associated receptor kinase bak1), ethylene response factors (e.g., erf1b, erf71, or cytokinin response factor 1 crf1), and members of the abscisic acid perception pathway (e.g., pyl4-rcar10, a regulatory component of the abscisic acid receptor family). likewise, genes involved in methylation-mediated stress responses, such as adk2, fdm1 or the methionine adenosyltransferase mat3, reduce their expression as virus replication becomes more efficient, thus resulting in less methylation and increased expression of genes that participate in apoptosis and posttranscriptional gene silencing (schoor et al. 2011). in this way, the overexpression of genes that modulate histone acetylation or chromatin organization, such as the histone acetyltransferase hac1 and the chromatin remodeling factor r17 (chr17), would regulate differentiation, apoptosis, transcriptional activation, or ethylene response just as viral fitness increases. however, these activations have a cost, mainly in terms of resources that can be invested into secondary metabolism and development. consistent with this idea is the fact that many genes participating in metabolic processes (e.g., cysc1, a cysteine synthase) are highly repressed upon infection (fig. 7a).
there are also central genes for the plant metabolism whose repression correlates with viral fitness, such as gbss1, photosystem components, or assembly factors (e.g., lhcb1.3 and hcf136), rubisco subunits and atpases, catalases, transketolases, nucleotide and phosphate transporters, synthases involved in flavonoid, isoprenoid, ascorbate, or tryptophan biosynthesis, and gapdh (fig. 6). diverting host cell resources and reprograming the metabolic machinery to support rna metabolism and atp production is a general strategy both of plant and animal viruses (tang et al. 2005; tiwari et al. 2017). tev achieves this reprogramming by altering the expression of a series of genes to its own benefit. for example, we found the expression of genes involved in actin cytoskeleton organization such as adf4 and pfn3 to be negatively correlated with tev fitness. the profilin pfn3 is an actin-binding protein and adf4 participates in the depolymerization of actin filaments that results from microbial-associated molecular patterns being recognized by the corresponding pattern-recognition receptors (henty-ridilla et al. 2014). therefore, by downregulating this function, longer and more stable actin filaments are produced that virions can use to move around the cell from the er-associated replication factories to plasmodesmata. another example is the repression observed for the ubp1b gene, a negative regulator of potyvirus translation, which would allow for a more optimal virus accumulation (hafrén et al. 2015). genes involved in nonsense-mediated decay (nmd) defenses (wachter and hartmann 2014), such as the atp-dependent rna helicase upf1, also show reduction in levels of expression.
another group of proteins that show alteration during viral infection are those involved in protein degradation, via ubiquitination and downstream into the proteasome pathway (e.g., ubiquitin-protein ligase 1, upl1; ubiquitin-conjugating enzyme 2, ubc2; ubiquitin e2 variant 1b, mmz2; ein3-binding f box protein 1, ebf) or via autophagy (e.g., the atp-driven chaperone cdc48c and the plant autophagy adaptor nbr1). moreover, tev activates in a fitness-dependent manner the expression of genes rh8, an rna helicase, and pcap1, a membrane-associated cation-binding protein, also required by potyviruses for cell-to-cell movement (vijayapalani et al. 2012) (fig. 7a), and of a diversity of transcription factors including global (e.g., gra2, gte8), sequence-specific (e.g., sacl1 and spl12), gata/nac family members (e.g., gata1, nac083, nac029), bzip g-box binding factors (e.g., gbf1 and bzip63), and factors involved in homeotic gene expression (e.g., agl20 and homeobox-1).
fig. 6. continued measured by microarray and in blue by rt-qpcr) and viral fitness. the solid lines represent linear models; the closer the slopes of both lines, the more similarity between microarray and rt-qpcr expression data. (a) genes whose expression increases with viral fitness (cases from fig. 5a, red dots). (b) genes whose expression decreases with viral fitness (cases from fig. 5b, blue dots). bidimensional error bars represent ±1 sd. (for interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
we also found genes related to genome integrity (e.g., the cohesins syn2 and smc3, and the chromatin protein spt2), dna replication and nucleosome assembly, alternative splicing (e.g., sf1, a nuclear splicing factor homolog), chromatin transition (e.g., spt16, a histone chaperone involved in transcription elongation from rnapolii promoters and regulation of chromatin transitions; or the histone acetyltransferase hac1, a coactivator of gene transcription with a major role in controlling flowering time and also essential for resistance to bacterial infections), and cell division (e.g., the mitotic cohesin rad21 and the cyclin-dependent protein kinase cych1). however, not all host factors recruited by the virus present alterations in their expression. according to our data, the translation initiation factor eif4e, known to be exploited by tev for its own translation (robaglia and caranta 2006), was found to be unperturbed (fig. 7a) while eif3a and eif4g expression is positively correlated with tev fitness. in essence, there are genes that are significantly altered (up or down) upon infection irrespective of the ability of the virus to replicate, genes whose expression correlates with this ability (positively or negatively), and genes that remain unaltered. nevertheless, this picture of virus-plant interaction may be biased by the limited number of viral genotypes analyzed in this work. three out of six genotypes correspond to hc-pro mutants. as hc-pro is a multifunctional protein, it is not surprising that different fitness levels can be reached by introducing mutations in different functional domains. but, certainly, more mutants should be analyzed in future work to provide a comprehensive picture and avoid bias toward certain virus proteins. in addition, here we focused on transcriptional regulation, but other interlinked networks exist in the cell (e.g., metabolism and protein-protein interactions).
to provide an insight into these other networks, we constructed the interactome (fig. 7b) of hc-pro with the host proteins known to interact with this virus protein (revers and garcía 2015). we then contextualized our gene expression data over multiple tev infections. many of the cellular functions in which hc-pro participates (protein degradation, translation, redox processes, and cation signaling) are not regulated transcriptionally upon infection (or regulated marginally). presumably, the virus exploits these processes for its own benefit (mainly to enhance replication and movement within a cell), and the normal expression of the corresponding genes is sufficient for such subversion. by contrast, rna silencing and methylation are functions involved in defense against pathogens that are quantitatively regulated, as a sort of control strategy exerted by the plant, as long as they are needed, that is, according to viral fitness. biological systems and processes can be analyzed and modeled at every scale of complexity. it is expected that components of each level of complexity may contribute to determine the behavior of processes at other levels. the complexity at the molecular level (i.e., the lowest level of biological organization) is astonishing both in terms of possible elements (genes, functional rnas, proteins, and metabolites) and of interactions among them (barabási et al. 2011). thus, if the components at lower scales of complexity, presumed to be more accessible experimentally, are informative enough about the underlying processes, they become excellent proxies to understand biological systems. in the case of a disease (in plants or animals), the symptoms exhibited by the organism have been traditionally used as macroscopic indicators of what occurred within the organism. this allows diagnosing diseases without the need to perform further analyses.
however, symptoms are generally uncoupled from the magnitude of the perturbation at the molecular level in the host (with respect to a healthy state) (barabási et al. 2011; finzer 2017). this is particularly true in the case of a virus-induced disease, a paradigmatic example of a system-wide perturbation (tan et al. 2007; viswanathan and früh 2007; bailer and haas 2009; friedel and haas 2011; elena and rodrigo 2012). here, we have studied for the first time the use of viral fitness as an indicator of the molecular changes occurring in the host upon infection. after all, the progress of a viral infection depends on the fitness of the virus mutant swarm. classically, viral fitness has been evaluated by means of parameters describing the absolute growth and accumulation, by competition experiments (holland et al. 1991; wargo and kurath 2012), or even by correlating it with the development of host's symptoms (carrasco, de la iglesia, et al. 2007; wargo and kurath 2012). we focused on the infections exerted by different genotypes of a given virus in the same host. fitness differences among genotypes are due to several causes. first, they may be a direct consequence of the effect of mutations on viral proteins, perhaps even resulting in altered folding, and thus jeopardizing their functions. second, in the case of mutations affecting regulatory regions (e.g., rna stems and loops), the effect may be due to altered structural configurations that impede the binding of the virus's own proteins or of cellular factors. plenty of examples illustrate the effect of mutations via these two mechanisms (bernet and elena 2015). a third, more intriguing, yet poorly explored possibility is that mutated viral components (i.e., rnas and proteins) may interact in nonoptimal ways with the complex network of genetic and biochemical interactions of the cell as a whole.
interacting in nonoptimal ways with any of the elements of the host regulatory and biochemical networks may have profound effects on the progression of a successful infection and therefore on viral fitness. in this work, we considered mutations affecting the ci protein (with rna helicase, atpase, and membrane activities), the viral replicase nib, and the hc-pro protein (vsr, protease, and helper-component during transmission by aphids). our results point out that fitness, irrespective of what type of mutation is introduced, is a good indicator of how a given mutant reprograms gene expression patterns, to its own benefit or as a consequence of cellular defenses (e.g., fig. 2c). despite the interest of this hypothesis, none of the early studies tackled the relationship between genotype and fitness of the virus and transcriptomic profiles of the host in a systematic manner, but rather focused on comparing two viral genotypes. evolution experiments simulating the spillover of tev from its natural host n. tabacum into a novel, poorly susceptible one, arabidopsis thaliana, have shown that adaptation of tev to the novel host (i.e., concomitant to large increases in fitness) was associated with a profound change in the way the ancestral and evolved viruses interacted with the plant's transcriptome, with genes involved in the response to biotic stresses, including signal transduction and innate immunity pathways, being significantly underexpressed in plants infected with the evolved virus relative to plants infected with the ancestral one (agudelo-romero et al. 2008). further evolution experiments into different ecotypes of a. thaliana that differed in their susceptibility to infection illustrated a pattern of adaptive radiation in which viruses were better adapted to their local host ecotype than to any alternative one, but with viruses evolved in more restrictive ecotypes being more generalist than viruses evolved in the more permissive ones (hillung et al. 2014).
interestingly, these differences in fitness had a parallelism with differences in the transcriptomic profiles of plants from different ecotypes; the more generalist viruses altered similar genes in every ecotype, whereas the more specialist viruses altered different genes in different ecotypes (hillung et al. 2016). similarly, a. thaliana plants infected either with a mild or a virulent isolate of turnip mosaic potyvirus also showed profound differences in the genes and functional categories altered (sánchez et al. 2015). in this case, the more virulent strain mainly altered stress responses and transport functions compared with the mild one (sánchez et al. 2015). in a recent study, the transcriptomic alterations induced in nicotiana benthamiana plants infected either with a wt tobacco vein banding mosaic potyvirus or a genotype deficient in the vsr were compared (geng et al. 2017). both transcriptomes differed in many aspects, including repression of photosynthesis-related genes, genes involved in the rna silencing pathway, the jasmonic acid signaling pathway, and the auxin signaling transduction (geng et al. 2017). altogether, the results reported in this study illustrate the complex interaction between viruses and their native host plants, and how the outcome of this interaction, in terms of viral replication and accumulation, correlates with the expression of host genes (fig. 7a). our observation that viral fitness correlates positively or negatively with the expression of certain genes is of particular interest. by simply measuring the fitness of the virus infecting a given host, we may predict the whole genomic profile of the host cell to characterize its state (molecular impact of infection). moreover, by specifically targeting host genes that are essential for high fitness virus variants but not for milder ones, we may prevent the spreading of the former variants, whereas still allowing mild variants to replicate and, perhaps, act as attenuated vaccines that enhance the antiviral response of the plant.
the infectious clone pmtev contains a full copy of the genome of a wt tev strain isolated from tobacco (fig. 1a; genbank accession dq986288) (bedoya and daròs 2010). six tev mutant genotypes were constructed by site-directed mutagenesis starting from template plasmid pmtev as described in torres-barceló et al. (2008) (mutants as13, cla2, and cla11) and in carrasco, de la iglesia, et al. (2007) (mutants pc48, pc55, and pc95). figure 1a shows the characteristics of the seven genotypes used in the study. the pmtev-derived plasmids contain a unique bglii restriction site. after linearization with bglii, each plasmid was transcribed with mmessage mmachine sp6 kit (ambion), following the manufacturer's instructions, to obtain infectious 5′-capped rnas. transcripts were precipitated (1.5 volumes of diethyl pyrocarbonate [depc]-treated water, 1.5 volumes of 7.5 m licl, 50 mm edta), collected, and resuspended in depc-treated water (carrasco, daròs, et al. 2007). rna integrity and quantity were assessed by gel electrophoresis. in addition, each transcript was confirmed by sequencing of a ca. 800-bp fragment surrounding the mutation site as described elsewhere (lalić et al. 2010). in short, reverse transcription (rt) was performed using m-mulv reverse transcriptase (thermo scientific) and a reverse primer outside the region of interest to be pcr-amplified for sequencing. pcr was then performed with phusion dna polymerase (thermo scientific) and appropriate sets of primers for each transcript. sequencing was performed at ibmcp sequencing service. templates were labelled with big dyes v3.1 and resolved in an abi 3130 xl machine (life technologies). nicotiana tabacum l. cv.
xanthi nn plants were used for production of virus particles of each of the seven genotypes (fig. 1a). the 5'-capped rna transcripts were mixed with a 1:10 volume of inoculation buffer (0.5 m k2hpo4, 100 mg/ml carborundum). batches of 8-week-old n. tabacum plants were inoculated with ~5 μg of rna of each viral genotype by abrasion of the third true leaf. inoculations were done in two experimental blocks, the first one including as13, cla2, cla11, pc95, and their controls, and the second one including pc48, pc55, and their corresponding controls. all plants were at similar growth stages. afterward, plants were maintained in a biosafety level-2 greenhouse chamber at 25 °c under a 16-h natural sunlight (supplemented with 400 w high-pressure sodium lamps as needed to ensure a minimum light intensity of par 50 μmol/m²/s) and 8-h dark photoperiod. all infected plants showed symptoms 5-8 dpi, except the as13-infected plants, which remained asymptomatic and only showed erratic chlorotic spots. at 8 dpi, virus-infected leaves and apexes from each plant were collected individually in plastic bags (after removing the inoculated leaf), with the exception of the as13-infected plants, which were collected at 15 dpi. next, plant tissue was frozen with liquid n2, homogenized using a mixer mill mm 400 (retsch), and aliquoted in 1.5 ml tubes (100 mg each). these aliquots of tev-infected tissue were stored at −80 °c. rna extraction from 100 mg of fresh tissue per plant was performed using the agilent plant rna isolation mini kit (agilent technologies) following the manufacturer's instructions. the concentration of total plant rna extract was adjusted to 50 ng/μl for each sample. each rna sample was resequenced at this stage to ensure the constancy of the genotypes as described earlier. viral loads were measured by absolute real-time rt-quantitative pcr (rt-qpcr), using standard curves (lalić et al. 2010).
standard curves were constructed using ten serial dilutions of the wt tev genome, synthesized in vitro as described earlier, in total plant rna obtained from healthy tobacco plants treated like all other plants in the experiment. quantification amplifications were done in a 20-μl volume, using a gotaq 1-step rt-qpcr system (promega) following the manufacturer's instructions. the forward (q-tev-f 5'-ttggtcttgatggcaacgtg-3') and reverse (q-tev-r 5'-tgtgccgttcagtgtcttcct-3') primers were chosen to amplify a 71-nt fragment in the 3' end of the tev genome and would only amplify complete genomes (lalić et al. 2011). amplifications were done using an abi stepone plus real-time pcr system (applied biosystems) and the following cycling conditions: the rt phase consisted of 15 min at 37 °c and 10 min at 95 °c; the pcr phase consisted of 40 cycles of 10 s at 95 °c, 34 s at 60 °c, and 30 s at 72 °c; and the final phase consisted of 15 s at 95 °c, 1 min at 60 °c, and 15 s at 95 °c. amplifications were performed in a 96-well plate containing the corresponding standard curve. three technical replicates per infected plant were done. quantification results were examined using stepone software version 2.2.2 (applied biosystems). total rna was extracted and virus accumulation quantified by rt-qpcr as described earlier and detailed previously (lalić et al. 2010). virus accumulation (expressed as genomes/ng of total rna) was quantified 8 dpi for all genotypes with the exception of as13, which was quantified 15 dpi. these sampling times ensure that viral populations were at a quasi-stationary plateau in n. tabacum (carrasco, daròs, et al. 2007). these values were then used to compute the fitness of the mutant genotypes relative to that of the wt genotype using the expression $W = (R_t/R_0)^{1/t}$, where $R_0$ and $R_t$ are the ratios of accumulations estimated for the mutant and wt viruses at inoculation and after $t$ days of growth, respectively (carrasco, de la iglesia, et al. 2007).
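the relative fitness expression above can be sketched in a few lines of python. this is only an illustration of the arithmetic: the function name, argument names, and the example accumulation values are hypothetical and do not come from the study.

```python
def relative_fitness(mutant_0, wt_0, mutant_t, wt_t, t_days):
    """Relative fitness W = (R_t / R_0) ** (1 / t).

    R_0 and R_t are the mutant-to-wt accumulation ratios at inoculation
    and after t_days of growth. All argument names are illustrative.
    """
    if min(mutant_0, wt_0, mutant_t, wt_t) <= 0 or t_days <= 0:
        raise ValueError("accumulations and time must be positive")
    r_0 = mutant_0 / wt_0   # ratio at inoculation
    r_t = mutant_t / wt_t   # ratio after t days
    return (r_t / r_0) ** (1.0 / t_days)


# Hypothetical example: equal inocula, and after 8 days the mutant has
# accumulated 256-fold less than the wt, so W = (1/256) ** (1/8) = 0.5.
w_example = relative_fitness(1.0, 1.0, 1.0, 256.0, 8)
```

a value of w below 1 indicates that the mutant accumulates more slowly than the wt per day of growth; w equal to 1 indicates no difference.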
fitness ($W$) data were fitted to a generalized linear model with a normal distribution and an identity link function. the model incorporates two random factors, the tev genotype ($G$) and the replicate plants ($P$), with the second nested within the first: $W_{ijk} = \mu + G_i + P(G)_{ij} + \varepsilon_{ijk}$, where $\mu$ is the grand mean value and $\varepsilon_{ijk}$ is the error associated with individual measure $k$ (estimated from the three technical replicates of the rt-qpcr reaction). the statistical significance of each factor was evaluated using a likelihood ratio test that asymptotically follows a $\chi^2$ probability distribution. this statistical analysis was performed with ibm spss version 23.

total rna was isolated as described earlier and its integrity assessed using a bioanalyzer 2100 (agilent) before and after hybridization. the rna samples were hybridized onto a genotypic designed n. tabacum gene expression 4 × 44k microarray (amadid: 021113), which contained 43,803 probes (60-mer oligonucleotides) and was used in a one-color experimental design according to minimum information about a microarray experiment guidelines (brazma et al. 2001). three biological replicates for each of the six tev mutant genotypes, four replicates for the wt tev, plus four mock-inoculated negative control plants were analyzed. sample rnas (200 ng) were amplified and labeled with the low input quick amp labeling kit (agilent). the one-color spike-in kit (agilent) was used to assess the labeling and hybridization efficiencies. hybridization and slide washing were performed with the gene expression hybridization kit (agilent) and gene expression wash buffers (agilent) as detailed in the manufacturer's instructions. after washing and drying, slides were scanned with a genepix 4000b (axon) microarray scanner at 5 μm resolution. image files were extracted with the feature extraction software version 9.5.1 (agilent).
microarray hybridizations were performed at ibmcp genomics service. interarray analyses were performed with tools implemented in the babelomics 5 webserver (alonso et al. 2015). first, all agilent files were uploaded together to standardize the expression-related signals using quantile normalization (bolstad et al. 2003). this process resulted in a matrix of normalized expression with genes in rows and samples (tev genotypes, controls, and their replicates) in columns, provided as supplementary file s1, supplementary material online. to compare the expression profiles of two tev genotypes, the expression level corresponding to mock-inoculated plants (control) was first subtracted. second, differential expression analysis was carried out by comparing two different samples, including replicates (against mock-inoculated or wt tev-infected plants), using the limma test (smyth 2004) with fdr control according to benjamini and hochberg (1995) (adjusted p < 0.05). an additional criterion of at least 2-fold change in mean expression, that is $|\log_2(\text{fold change})| > 1$, was imposed to discard genes presenting minimal increases or decreases. lists of degs, up- or down-regulated, are provided in supplementary file s2, supplementary material online. third, one-way anova tests were performed to identify genes that vary across all conditions (with fdr as above, adjusted p < 0.05). to identify the genes shown in figure 2a, the test was done over all samples, including the control. by contrast, to identify the genes shown in figure 5a, the test was done over all samples corresponding to infections with distinct tev genotypes. an additional criterion of significant spearman's correlation between mean fitness and mean expression (p < 0.05) was imposed in this latter case. lists of genes whose expression correlates with viral fitness, either positively or negatively, are provided in supplementary file s4, supplementary material online.
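the two selection criteria for degs described above (benjamini-hochberg adjusted p < 0.05 and an absolute log2 fold change above 1) can be sketched as follows. this is a minimal illustration, not the babelomics or limma implementation: it assumes the raw p-values have already been computed by some test, and the function names and thresholds-as-parameters are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, log2_fold_changes,
                alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes with adjusted p < alpha and |log2 fold change| > cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, adjusted, log2_fold_changes)
            if q < alpha and abs(lfc) > min_abs_log2fc]


# Hypothetical toy input: four genes with made-up p-values and fold changes.
toy_degs = select_degs(["a", "b", "c", "d"],
                       [0.01, 0.04, 0.03, 0.005],
                       [2.0, 0.5, -1.5, 1.0])
```

in this toy input all four genes pass the fdr threshold, but only genes "a" and "c" exceed the 2-fold change cutoff, since the inequality is strict.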
the similarity between expression profiles of plants infected with different tev genotypes was evaluated with a principal components (pc) analysis with matlab version r2014b (mathworks) with default parameters (singular value decomposition). the annotation of the individual probes in the agilent's tobacco microarray (files 021113_d_aa_20130122.txt and 021113_d_genelist_20130122.txt provided by agilent) was updated by blasting the oligo sequence file (021113_d_fasta_20130122.txt) against the most recent version of the n. tabacum mrna database (ntab-bx_awok-ss_basma.mrna.annot.fna) available at the sol genomics network (fernández-pozo et al. 2015). sequences that did not return a significant blast hit were removed from the output. a total of 40,430 annotated probes were generated. in 2,673 cases, more than one probe pointed to the same n. tabacum gene (e.g., probes a_95_p217927 and a_95_p046006 were both complementary to gene eb438730), and in those cases the target appeared twice in the output. each one of the hits could be associated with an alternatively spliced mature mrna in the sol genomics network database. we then proceeded to generate the list of n. tabacum orthologous genes in the a. thaliana genome. to do so, we used blast against the tair version 10 database of a. thaliana cdnas (lamesch et al. 2012), with a cutoff e-value < $10^{-4}$. the resulting mapping between n. tabacum and a. thaliana orthologues is listed in supplementary file s3, supplementary material online. the determination of the gene ontology (go) categories overrepresented within the lists of degs was carried out in the agrigo webserver by using fisher's exact test (with fdr-adjusted p < 0.05 according to the benjamini and yekutieli [2001] criterion). for the graphical representation, we constructed a plane involving the most relevant categories, depicted as circles with size proportional to the total number of host genes belonging to that category (in log scale).
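the go overrepresentation test mentioned above can be illustrated with a one-sided fisher's exact test computed from the hypergeometric distribution. this is a sketch under stated assumptions: agrigo's exact settings are not given in the text, so the one-sided form, the function name, and the toy counts below are all hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for overrepresentation.

    N: annotated genes in the background
    K: background genes belonging to the GO category
    n: genes in the DEG list
    k: DEGs belonging to the GO category
    Returns P(X >= k) under the hypergeometric distribution.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical toy counts: 10 background genes, 5 in the category,
# and a 5-gene DEG list in which all 5 belong to the category.
p_toy = fisher_enrichment_p(5, 5, 5, 10)  # 1 / comb(10, 5) = 1/252
```

the p-values obtained for all tested categories would then be corrected for multiple testing, for which the text names the benjamini and yekutieli criterion.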
in addition, with the lists of genes whose expression correlates with viral fitness, we generated pie charts for 1) biological function and 2) molecular function in the panther webserver (mi et al. 2017).

total rna was extracted from 100 mg of fresh tissue of plants infected with each one of the seven tev genotypes as described earlier. the concentration of total plant rna was adjusted to 50 ng/μl. nine candidate genes were selected to validate the effect of each tev genotype on expression. specific primers were designed for each gene that amplified the mature version of their corresponding mrnas. primers were designed using oligo primer analysis software version 7 (www.oligo.net). gene expression was quantified by rt-qpcr relative to the expression of two housekeeping genes (schmidt and delaney 2010). the first housekeeping gene encodes the l25 ribosomal protein (genbank accession l18908). forward ntl25-f (5'-cccctcaccacagagtctgc-3') and reverse ntl25-r (5'-aagggtgttgttgtcctcaatctt-3') primers were chosen to amplify a 51-nt long fragment. the second housekeeping gene encodes the elongation factor 1a (genbank accession af120093). for this second gene, forward ntef1a-f (5'-tgagatgcaccacgaagctc-3') and reverse ntef1a-r (5'-ccaacattgtcaccaggaagtg-3') primers were chosen to produce a 51-nt long amplicon. amplifications were done in a 20-μl volume, using the gotaq 1-step rt-qpcr system (promega) following the manufacturer's instructions. the forward and reverse primers for each target gene were chosen to amplify 68-137 nt fragments in the corresponding tobacco mature mrna.
amplifications were done using an abi stepone plus real-time pcr system (applied biosystems) and the following cycling conditions: the rt phase consisted of 15 min at 37 °c and 10 min at 95 °c; the pcr phase consisted of 40 cycles of 10 s at 95 °c, 34 s at 60 °c, and 30 s at 72 °c; and the final phase consisted of 15 s at 95 °c, 1 min at 60 °c, and 15 s at 95 °c. amplifications were performed individually for each target gene (with the corresponding set of primers) in a 96-well plate containing three biological replicates and two technical replicates per infected plant. in addition, each plate incorporates the two housekeeping genes. since each plate served for the quantification of a single mature mrna together with the two housekeeping reference genes, a baseline value of 0.1056, resulting from averaging the threshold baselines of all plates analyzed, was used as default threshold. quantification results were examined using the stepone version 2.2.2 software (applied biosystems). details on the primers used for amplifications, the size of the amplicons, the genbank identification ids, and rt-qpcr threshold crossing ($C_t$) values for the nine degs and the corresponding internal reference genes from the same samples are all reported in supplementary file s5, supplementary material online.

the microarray data that support the findings of this study have been deposited at ncbi geo with accession number gse99838. processed data are presented in the supplementary material online. all other relevant data are available from the corresponding author on request. supplementary data are available at labarchives under doi: 10.6070/h4np22xx.
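for the rt-qpcr validation described above, expression of each target gene is quantified relative to two housekeeping genes. the text does not state which formula was applied to the threshold-crossing values, so the sketch below assumes the common 2^(−Δct) approach with 100% amplification efficiency and the mean ct of the two reference genes as normalizer. the function names and ct values are hypothetical.

```python
def relative_expression(ct_target, ct_references):
    """2 ** -(Ct_target - mean Ct of reference genes).

    Assumes 100% amplification efficiency (a doubling per cycle);
    this is an illustrative assumption, not the study's stated method.
    """
    if not ct_references:
        raise ValueError("at least one reference Ct is required")
    ref_mean = sum(ct_references) / len(ct_references)
    return 2.0 ** -(ct_target - ref_mean)


def fold_change(ct_target_inf, ct_refs_inf, ct_target_mock, ct_refs_mock):
    """Expression in infected tissue relative to mock-inoculated tissue."""
    return (relative_expression(ct_target_inf, ct_refs_inf)
            / relative_expression(ct_target_mock, ct_refs_mock))


# Hypothetical Ct values: the target crosses the threshold 2 cycles
# earlier in infected tissue, with unchanged reference genes.
fc_example = fold_change(22.0, [20.0, 22.0], 24.0, [20.0, 22.0])
```

averaging ct values of the two reference genes is equivalent to taking the geometric mean of their quantities, which is why a simple mean is used here.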
mutational and fitness landscapes of an rna virus revealed through population sequencing virus adaptation by manipulation of host's gene expression salicylic acid-mediated and rnasilencing defense mechanisms cooperate in the restriction of systemic spread of plum pox virus in tobacco babelomics 5.0: functional interpretation for new generations of genomic data connecting viral with cellular interactomes network medicine: a networkbased approach to human disease stability of tobacco etch virus infectious clones in plasmid vectors controlling the false discovery rate: a practical and powerful approach to multiple testing the control of the false discovery rate in multiple testing under dependency distribution of mutational fitness effects and of epistasis in the 5' untranslated region of a plant rna virus a comparison of normalization methods for high density oligonucleotide array data base don variance and bias control of specific gene expression by gibberellin and brassinosteroid minimum information about a microarray experiment (miame). 
toward standards for microarray data a real-time rt-pcr assay for quantifying the fitness of tobacco etch virus in competition experiments distribution of fitness and virulence effects caused by single-nucleotide substitutions in tobacco etch virus effect of host species on topography of the fitness landscape for a plant rna virus fitness of rna virus decreased by muller's ratchet the fittest versus the flattest: experimental confirmation of the quasispecies effect with subviral pathogens viral fitness and perturbation of host's transcriptomes modulation of host plant immunity by tobamovirus proteins a core function of eds1 and pad4 is to protect the salicylic acid defense sector in arabidopsis immunity fitness valleys constrain hiv-1's adaptation to its secondary chemokine coreceptor fitness declines in tobacco etch virus upon serial bottleneck transfers the elongation of amylose and amylopectin chains in isolated starch granules rna virus mutations and fitness for survival sanju an r. 2009. 
the fitness effects of random mutations in single-stranded dna and rna bacteriophages rapid fitness losses in mammalian rna virus clones due to muller's ratchet towards an integrated molecular model of plant-virus interactions the sol genomics network (sgn) -from genotype to phenotype to breeding how we become ill virus-host interactomes and global models of virus-infected cells transcriptomic changes in nicotiana benthamiana plants inoculated with the wild-type or an attenuated mutant of tobacco vein banding mosaic virus formation of potato virus ainduced rna granules and viral translation are interrelated processes required for optimal virus accumulation acting depolymerization factor 4 regulates actin dynamics during innate immune signaling in arabidopsis experimental evolution of an emerging plant virus in host genotypes that differ in their susceptibility to infection the transcriptomics of an experimentally evolved plant-virus interaction quantitation of relative fitness and great adaptability of clonal populations of rna viruses the arabidopsis plat domain protein 1 promotes abiotic stress tolerance and growth in tobacco adaptation of tobacco etch potyvirus to a susceptible ecotype of arabidopsis thaliana capacitates it for systemic infection of resistant ecotypes effect of host species on the distribution of mutational fitness effects for an rna virus epistasis between mutations is host-dependent for an rna virus the impact of higher-order epistasis in the withinhost fitness of a positive-sense plant rna virus the arabidopsis information resource (tair): improved gene annotation and new tools the agamous-like 20 mads domain protein integrates floral inductive pathways in arabidopsis arabidopsis vq motif-containing protein 29 represses seedling deetiolation by interacting with phytochrome-interacting factor 1 measuring natural selection on genotypes and phenotypes in the wild hiv-1 reverse transcriptase inhibitor resistance mutations and fitness: a view from 
the clinic and ex vivo panther version 11: expanded annotation data from gene ontology and reactome pathways, and data analysis tool enhancements extreme fitness differences in mammalian and insect hosts after continuous replication of vesicular stomatitis virus in sandfly cells exponential increases of rna virus fitness during large populations transmissions adaptability costs in immune escape variants of vesicular stomatitis virus congruent evolution of fitness and genetic robustness in vesicular stomatitis virus a peroxidase-dependent apoplastic oxidative burst in cultured arabidopsis cells functions in mamp-elicited defense fitness and its role in evolutionary genetics respective contributions of arabidopsis dcl2 and dcl4 to rna silencing distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage f1 genome rearrangements affects rna virus adaptability on prostate cancer cells molecular biology of potyviruses translation initiation factors: a weak link in plant rna virus infection a meta-analysis reveals the commonalities and differences in arabidopsis thaliana response to different viral pathogens viral strain-specific differential alterations in arabidopsis developmental patterns selection for robustness in mutagenized rna viruses the distribution of fitness effects caused by single-nucleotide substitutions in an rna virus stable internal reference genes for normalization of real-time rt-pcr in tobacco (nicotiana tabacum) during development and abiotic stress adenosine kinase contributes to cytokinin interconversion in arabidopsis linear models and empirical bayes methods for assessing differential expression in microarray experiments systems biology and the host response to viral infection comparative host gene transcription by microarray analysis early after infection of the huh7 cell line with severe acute respiratory syndrome coronavirus and human coronavirus 229e agrigo v2.0: a go analysis toolkit for the agricultural 
community, 2017 update zika virus infection reprograms global transcription of host cells to allow sustained infection from hypo-to hypersuppression: effect of amino acid substitutions on the rnasilencing suppressor activity of the tobacco etch potyvirus hc-pro cost of host radiation in an rna virus the distribution of mutational fitness effects of phage /x174 on different hosts interaction of the trans-frame potyvirus protein p3n-pipo with host protein pcap1 facilitates potyvirus movement the mutational robustness of influenza a virus viral proteomics: global evaluation of viruses and their interaction with the host nmd: nonsense-mediated defense viral fitness: definitions, measurements, and current insights hiv-1 can escape from rna interference by evolving an alternative structure in its rna genome predicting the stability of homologous gene duplications in a plant rna virus the dna-and rna-binding protein factor of dna methylation 1 requires xh domain-mediated complex formation for its function in rna-directed methylation cosupression of rbcs3b in arabidopsis leads to severe photoinhibition caused by ros accumulation viral fitness and perturbation of host's transcriptomes we thank francisca de la iglesia and paula agudo for excellent technical assistance, the evolsysvir lab members for help, comments and discussions, rachel whitaker for english proofreading, and lorena latorre (ibmcp genomics service) and javier forment (ibmcp bioinformatics service) for their assistance. this research was supported by grants from spain's agencia estatal de investigaci on-feder (bfu2012-30805 and bfu2015-65037-p to s.f.e. and bfu2015-66894-p to g.r.) and generalitat valenciana (prometeoii/2014/021). 
key: cord-017752-ofzm3x3a authors: nan title: theories of carcinogenesis date: 2007 journal: molecular mechanisms of cancer doi: 10.1007/978-1-4020-6016-8_1 sha: doc_id: 17752 cord_uid: ofzm3x3a the oldest description of human cancer, referring to eight cases of tumors of the breast, was found in the egyptian edwin smith papyrus, written around 3000–1500 bc. the oldest specimens of human cancers were detected in the remains of a female skull dating back to the bronze age (1900–1600 bc), and in fossilized bones of ancient egypt. the mummified skeletal remains of peruvian incas, dating about 2,400 years ago, contained lesions suggestive of malignant melanoma. the term “cancer” goes back to hippocrates (460–370 bc), who named a group of diseases καρκινος and καρκινομα, the ancient greek word for crab. it is a metaphor for the hard center and spiny projections of the tumors he studied. cancer is the latin word for crab and its use has been traced back to galen (ad 129–199). a snapshot of theories of carcinogenesis, devised in the course of the last two centuries, reflects the progress of insight from the cellular level via biochemistry to an understanding of damaging influences and oncogenes, and to a more holistic approach in the regulatory theory. it shows the relative success of reductionism as well as the current need to put the insights of various research endeavors into broader paradigmatic contexts. in 1665, robert hooke described walled cavities in his microscopic examination of cork and called them cells. in 1805, lorenz oken conceptualized a cell-based theory of life, arguing that plants and animals are assemblages of tiny living infusoria. this notion was later popularized and refined by matthias schleiden and theodor schwann [schleiden 1838; schwann 1839; nurse 2000]. in 1841, robert remak (1815-1865) described the phenomenon of cell division in chick embryos and in muscle development.
between 1850 and 1855, he extended these observations to embryonic development and proposed that tumor cells arose by cell formation from existing specific tissues [remak 1852, 1855]. like giovanni morgagni, who had performed the first autopsy in 1761 and had correlated illness to macroscopic pathology, rudolf virchow (1821-1902) correlated illness to microscopic pathology. after initial skepticism, virchow acknowledged remak's evidence for cell division. in 1858, he gave a series of 20 lectures to a group of physicians at the institute of pathology in berlin, in which he summarized his experience in microscopic anatomy of tissues with special attention to those deviating from the healthy condition [virchow 1858]. according to virchow's dictum "omnis cellula e cellula", cells of diseased tissues are derived from normal tissues, implying that malfunction begets disease (significantly, virchow had been a student of müller's, who had demonstrated in 1838 that cancer is made up of cells, not lymph; but he was of the opinion that cancer cells arose from interstitial budding elements, blastema, not from normal cells). hence, tumors are derived from cells that divide faster than they should. the average human body experiences around $10^{16}$ cell divisions in a lifetime. with an individual's risk of contracting cancer being about 10%, malignant transformation occurs in about 1 out of $10^{17}$ cell divisions [weinberg 1998]. the mechanistic underpinning for this process was defined by the identification of key regulators of the cell division cycle by leland h. hartwell, r. timothy hunt, and paul m. nurse.

the analysis of transformation has been guided substantially by the technical accomplishment to expand cells in culture. tissue culture was developed in the early years of the 20th century [harrison 1907; burrows 1910]. warren lewis cultured rodent cancers [lewis 1936].
in 1951 at the johns hopkins hospital, george gey established human cancer cell culture from the cervical adenocarcinoma of the 30-year-old, black henrietta lacks [gey et al. 1952]. although the resulting hela cells are among the cornerstones of cancer research, their high rate of proliferation caused a risk for cross-contamination of other cultures by them. this led to the establishment of cell-typing techniques on the biochemical [gartler 1967] and genetic [nelson-rees et al. 1973] levels. generally, the human tumor cells that grow permanently in culture are a selected group of very aggressive cancers. almost all of the continuous cell lines are derived from high-grade, high-stage cancers. programmed cell death (apoptosis, greek: falling off of tree leaves) [kerr et al. 1972] may be invoked by many organisms as a control mechanism to prevent unrestricted growth. research during the 1960s through 1970s in the worm caenorhabditis elegans identified ced-3 and ced-4 as essential genes for programmed cell death, while ced-9 was found to be a negative regulator of apoptosis. the 2002 nobel prize in physiology or medicine was awarded for these observations to sydney brenner, h. robert horvitz, and john e. sulston. the first mammalian homolog for ced-9 was described as bcl-2, a gene that is involved in b-cell lymphomata [negrini et al. 1987; vaux et al. 1988]. bcl-2 transfected b-lymphocytes are resistant to apoptosis, which is typically induced by interleukin-3 withdrawal. for the first time, it was demonstrated that the pathway to tumorigenesis depends not only on the ability to escape growth control but also on the ability to prevent cell death [hockenbery et al. 1990]. according to the biochemical theory of cancer, a key process that governs cell proliferation goes awry and causes transformation. various aspects of metabolism may be affected in a manner that could lead to cancer.
consequently, before the discovery of oncogenes, a large variety of theories was debated, which incriminated the malfunction of diverse biochemical processes as causative for malignant transformation. during tumor progression, the enzymatic composition of the affected cells is simplified (described as the theory of convergence in cancer), so that various cancers resemble one another more than they resemble their tissue of origin [greenstein 1954 ]. as one possible underlying reason, the biochemist otto von warburg [von warburg 1930] had suggested that the oxidative metabolism in cancer cells is replaced by glycolysis and that the excessive proliferation of cancer cells reflects their ability to metabolize independently of oxygen. later, it was found that the limiting substrates for tumor growth are oxygen and glucose. hence, anaerobic glycolysis is not the cause, but the consequence of the accelerated growth, which cannot be satisfied by the reorganization of the micro-vasculature [vaupel et al. 1976 ]. however, in a remarkable reversal toward supporting the warburg model, a 2005 publication showed that in cells engineered to become cancerous glycolytic conversion started early and expanded as the cells became more malignant [ramanathan et al. 2005 ]. this rekindled the discussion of bioenergetics in cancer cells. others attributed the simplified enzyme patterns of cancerous cells to a regression of the tumor tissues to early embryonal stages of development. highly malignant cells tend to resemble fetal tissues more than their adult normal counterparts do. the idea of derepressive dedifferentiation in carcinogenesis found support in the occurrence of onco-fetal proteins during the disease. the expression of these genes should be repressed in differentiated tissues, but this repression is reversed in tumors. the description of tumor tissue in histopathologic analysis as dedifferentiated is derived from this concept. 
the alternative model of "oncogeny as partially blocked ontogeny" suggested that cancer is the result of a series of alterations in the genes and their gene expression, which prevent a stem cell from completing all the steps necessary for terminal differentiation, suggesting that the target cell for carcinogenesis is the pluripotent stem cell [potter 1978 ]. the protein deletion theory, an extension of the dedifferentiation theory, is an epigenetic model of cancer. based on the observation that a carcinogenic aminoazo dye covalently bound liver proteins in animals undergoing early carcinogenesis, whereas little or no dye binding occurred to the proteins of tumors induced by this dye, miller and miller [1947] proposed the deletion hypothesis. they suggested that carcinogenesis resulted from permanent alterations or loss of proteins that are essential for the control of growth. thus, carcinogens eliminate specific enzymes from the affected cells by binding covalently to water-soluble basic proteins (h 2 proteins according to electrophoresis nomenclature). this causes the elimination (deletion) of these proteins from the cells. cancer originates because the water-soluble basic proteins contain several growth inhibitory components. therefore, the initial step in carcinogenesis is the inactivation of endogenous inhibition. transformation can be associated with refraction to exogenous inhibitors of cell cycle progression. potter [1964] suggested that the proteins lost during carcinogenesis may be involved in the feedback control of enzyme systems required for cell division, and he proposed the feedback deletion theory. in this model, repressors crucial to the regulation of genes involved in cell proliferation are lost or inactivated by the action of oncogenic agents on the cell, either by interacting with dna to block repressor gene transcription or by reacting directly with repressor proteins and inactivating them. 
it was thought that experimental evidence, in which the fusion of cancerous cells with nontransformed cells resulted in the absence of transformation, supported the epigenetic theory. later, this phenomenon was attributed to the functional dominance of tumor suppressor genes. the demonstration of the presence of an ordered biochemical imbalance, linked to transformation and progression, in cancer cells led to the molecular correlation concept. weber [1977] stated that the biochemical dysregulations underlying neoplasia could be identified by elucidating the pattern of gene expression as revealed in the activity, concentration, and isozyme aspects of key enzymes and their linking with neoplastic transformation and progression. key enzymes are involved in the regulation of rate and direction of the flux of competing synthetic and catabolic pathways and are most likely affected in the malignant process. a number of enzyme activities found to be altered in malignant cells are those involved in nucleic acid synthesis and catabolism. in general, the key enzymes in the de novo pathways and salvage pathways of purine and pyrimidine biosynthesis are increased and the opposing catabolic enzymes are decreased during malignant transformation and tumor progression. these findings and concepts were further developed by the analysis of gene expression profiles and identification of gene expression signatures in cancer cells some 20 years later. the stigma that cancer equals death, originating in the experiences of hippocrates, galen, and celsus, was attached to the disease for centuries. it led to the long-respected dictum that doctors should not inform their patients of the diagnosis to avoid agony. in view of progress in surgery, which allowed the removal of some tumors, the american cancer society was formed in 1913 to educate the public about the warning symptoms of cancer and to reduce their fatalistic fears. 
the increased public health awareness was helpful whenever carcinogenic mechanisms were identified and the need for lifestyle changes was publicized. the insight that malignancy may be caused by the influence of damaging agents forms the basis of the noxious theory of carcinogenesis. among the influences that may cause cancer are chemicals, radiation, and viruses. chemical carcinogenesis. in 1775, chemical carcinogenesis was observed by the english surgeon sir percival pott, who related the cause of scrotal skin cancer in a number of his patients to a common history of occupational exposure to large amounts of coal soot as chimney sweepers when they were boys. the connection between soot and cancer was confirmed in 1915 by the first controlled experimental induction of cancer in laboratory animals by katsusaburo yamagiwa. the experiment established chemical carcinogenesis, and specifically occupational exposure, as one possible cause for malignant growths. an unrelated form of occupational exposure was documented in the mid-19th century in silver miners from st. joachimsthal, bohemia (today czech republic). silver had been extracted there since the mid-16th century and was manufactured into the joachimsthaler silver coins that were predecessors of the german currency "thaler" and later the american currency "dollar." these miners had a high incidence of lung cancer, which was otherwise extremely rare at that time. the cause was traced to their occupational exposure (table 1.3.a). archeological evidence suggests that the mayans smoked tobacco leaves as early as the 1st century bc. it was not until 1761 that john hill published a treatise warning of unusual tumors of the nose following the use of tobacco snuff. by 1949, ernst wynder had conducted a survey of 684 lung cancers, which indicated a substantially elevated risk in smokers compared to nonsmokers. it was followed 6 months later by a similar analysis, authored by richard doll. 
about 188 years after the publication by john hill, a connection between lifestyle choices and cancer risk was established. during the following years of the 20th century, chemical carcinogenesis by tobacco products became a major cause for an increasing incidence of lung cancers (table 1.3.b). in italy, bernardino ramazzini associated breast cancer with reproductive factors. he reported in 1713 the virtual absence of cervical cancer and relatively high incidence of breast cancer in nuns and suggested that this was in some way related to their celibate lifestyle. the key observations by pott, hill, and ramazzini laid the foundation for the field of cancer epidemiology. this area of research was given another foundation between 1930 and 1932, when fisher, haldane, and wright established the principles of population genetics. in the united states, the first hospital registry for cancer was established in 1926 at yale-new haven hospital in connecticut. in 1935 and 1946, the first central cancer registries were initiated in connecticut and california. in 1941, the united states national cancer institute published a survey of 696 chemical compounds, 169 of which were found to be carcinogenic in animals. during the 1960s, environmental movements became prominent in most of the western societies. rachel carson believed that the long-term ecological effects of synthetic chemical pesticides were not being researched adequately. her book "silent spring" pointed to the pathogenic potential of environmental toxins, and the concept of carcinogens entered popular consciousness. in 1964, rachel carson succumbed to cancer at the age of 56. the national cancer act of 1971 (declared "war on cancer" by president richard nixon) mandated the collection, analysis, and dissemination of all data useful in the prevention, diagnosis, and treatment of cancer. 
it resulted in the establishment of the national cancer program, under which the surveillance, epidemiology, and end results (seer) program was developed in 1973. over the years, the susceptibility to various cancers has been associated with nutritional habits. doll and peto [1981] estimated that 35% of cancer deaths in the united states were attributable to dietary factors. the western european diet is rich in meat and correlates with a high incidence of colon cancer. nasopharyngeal cancer is among the most widespread tumors in southeast asia, possibly supported by the ingestion of salted fish. esophageal cancer typically occurs in conjunction with alcoholism. the growing health consciousness in the late years of the 20th century, combined with insights into the potential carcinogenic properties of reactive oxygen intermediates, prompted multiple studies into the cancer preventive capacities of antioxidants as nutritional supplements. it was soon found that while the intake of some foods can increase the risk for specific malignancies, others, such as retinoids, can act in a chemopreventive fashion [sporn et al. 1976] (figure 1.3). from their studies of oral cancer, slaughter, southwick, and smejkal derived the concept of carcinogenesis as a process of field cancerization (field carcinogenesis, condemned mucosa syndrome). the repeated exposure of a region's entire tissue area to carcinogenic insult increases the risk for developing multiple independent premalignant and malignant foci in that tissue [slaughter et al. 1953].

table 1.3.a. occupational cancers. certain occupations are associated with high levels of exposure to specific carcinogenic influences. these agents cause dna damage through physical or chemical effects. accordingly, the types of cancers induced by these carcinogens have a higher than normal incidence among exposed workers.
increasingly, molecular mechanisms have been identified to link certain toxins to specific cancers. in 1975, bruce ames at the university of california in berkeley developed a test for the mutagenicity of chemical compounds, which was used to confirm that carcinogens are mutagens. further mechanistic insight was gained with the demonstration that aflatoxin causes the mutation g249t in p53, which is associated with hepatoma [bressac et al. 1991] . ultraviolet (uv) light induces pyrimidine dimers, which cause mutations in p53 that lead to skin cancer [brash et al. 1991; pierceall et al. 1991] . the double-edged sword of mutagens became evident when their possible benefit in the treatment of neoplasias was discovered. mustard gas had been used as a chemical warfare agent during world war i and was studied further in world war ii. in 1917, krumbhaar, a captain in the us medical corps, noted the development of profound leukopenia in individuals who survived a gas attack for several days [krumbhaar 1919 ]. following up on this observation, a group of the us office of scientific research and development (osrd) at yale medical school secretly studied the effects of nitrogen mustard on lymphomata. there, lindskog successfully treated a radioresistant lymphosarcoma that compressed the patient's trachea with the injection of nitrogen mustard in december 1942. none of this was made public until 1946. during a military operation in world war ii, allied ships in bari harbor, italy, were sunk in an air assault (2 december 1943). at the center of the destruction was the vessel john harvey, laden with ammunition, supplies, and 2,000 mustard gas bombs. a large number of military personnel were accidentally exposed to mustard gas and were later found to have abnormally low white blood cell counts. it was reasoned that an agent, which damaged the rapidly growing white blood cells, might have a similar effect on cancer. cornelius p. 
rhoads served as chief of the medical division of the us army's chemical warfare unit during world war ii. based on his experience in the bari incident, he investigated mustard gas as a tumor-killing agent. this presaged classical chemotherapy [rhoads 1946]. soon, the pharmacologists louis goodman and alfred gilman, recruited by the us department of defense to investigate potential therapeutic applications of chemical warfare agents, observed that exposure to mustard gas caused profound lymphoid and myeloid suppression, suggesting its utility for the treatment of lymphomata [goodman et al. 1946]. sidney farber of boston recognized that folic acid stimulated the proliferation of leukemia cells. in one of the first examples of rational drug design, he collaborated with lederle laboratories to devise folate analogs. he demonstrated that aminopterin produced remission in acute leukemia in children because it blocked a critical chemical reaction needed for dna reduplication [farber et al. 1948]. aminopterin was the predecessor of methotrexate (developed by lederle laboratories in 1948), which in 1956 became the first compound to cure a metastatic cancer, when it was used by roy hertz and min chiu li to treat a case of choriocarcinoma. from 1942, research by george hitchings and gertrude elion at the burroughs wellcome corporation had corroborated that it was possible to treat cancer with chemical compounds. using one of them, 6-mercaptopurine, joseph n. burchenal achieved a high percentage of complete remissions in childhood leukemias. due to these early successes, the us congress created a national cancer chemotherapy service center (nccsc) at the national cancer institute in 1955. in 1965, cisplatin was discovered by barnett rosenberg, who explored the effects of electric fields on the growth of bacteria. he observed that the bacteria unexpectedly ceased to divide due to the exposure to an electrolysis product of the platinum electrodes. 
the discovery soon initiated studies into the effects of platinum compounds on cell division. this drug was later pivotal in the cure of testicular cancer. the often adverse effects of these agents were diminished when it was realized that they could be effectively used in combination [frei et al. 1958; frei et al. 1965]. this approach followed the strategy of antibiotic therapy for tuberculosis, which used combinations of drugs with different mechanisms of action. frei, freireich, and holland hypothesized that cancer cells would be less likely to mutate and develop drug resistance to the drug combination (table 1.3.c). the coalescence of efforts to eliminate compounds with intrinsic mutagenic potential from cancer therapy with increasing insights into the molecular pathways associated with growth signals led to the development of small molecule inhibitors, including sti571 (gleevec) [druker and lydon 2000] and zd1839 (iressa).

table 1.3.c. categories of conventional anticancer drugs. chemotherapy is the use of chemical substances to treat cancer. the group of classical anticancer agents comprises cytotoxic drugs that interfere with cell proliferation through various mechanisms.

röntgen, experimenting with electrical discharges in vacuum tubes (crookes tubes), identified penetrating radiation that also produced fluorescence, and named it x-rays ("x" symbolizing the unknown). he died from leukemia after years of working with these newly discovered rays. in 1896, henri becquerel observed that penetrating radiation was given off by uranium. marie curie (born maria sklodowska, 1867-1934) discovered the element radium, as well as methods for separating radium from radioactive residues in sufficient quantities to analyze its therapeutic properties. after a lifetime of research into radioactivity, marie curie succumbed to pre-leukemia. the hazards of exposure to ionizing radiation were soon recognized. 
acute skin reactions were observed in many individuals working with the recently invented x-ray generators. in the early years of the 20th century, these researchers were frequently affected by skin cancers and leukemias. by 1902, a case of radiation-induced cancer was reported, arising in an ulcerated area of the skin. within a few years, a large number of such skin cancers had been observed, and the first report of leukemia in five radiation workers appeared in 1911. the french physician jean bergonie developed the law of radiosensitivity. he died in 1925 from cancer caused by his research with x-rays. in 1927, hermann j. müller recognized that ionizing radiation, already known to be carcinogenic, is also mutagenic [müller 1927]. x-rays break the sugar-phosphate backbone of dna. radiation damage may be exerted by directly and indirectly ionizing radiation. photons and neutrons are not charged and are indirectly ionizing. radiation of charged particles (α-rays, electron rays including β-rays, proton rays) bears a higher risk for cellular damage, including transforming events. the atomic bombs that exploded over hiroshima and nagasaki caused dramatic increases in the incidence of leukemias during the ensuing decades. by the 1950s, researchers at the sloan-kettering institute in new york city became alarmed over thyroid cancers that were diagnosed in adolescents who had received radiation treatment of their thymus glands in childhood. later reports began to document that thyroid cancers could develop about 20 years following childhood radiation therapy. nevertheless, the use of radiation to fight cancer was under study early on. the work by maud menten, simon flexner, and j.v. jobling at the rockefeller institute led to the publication of the monograph "influence of radium bromide on a carcinomatous tumor of the white rat" in 1910. 
tumor viruses were detected at the turn of the 20th century with the cell-free transmission of human warts [ciuffo 1907] and of chicken leukemia [ellermann and bang 1908]. in 1911, peyton rous isolated a highly oncogenic retrovirus (rous sarcoma virus) from a chicken sarcoma [rous 1911]. in 1932, shope and hurst demonstrated that papillomavirus had oncogenic activity in rabbits. in the early 1940s, clarence cook little argued that viruses had caused breast cancer in a strain of laboratory mice. these groundbreaking results had been met with skepticism, because transmissibility in chickens and tumorigenesis in rabbits were not seen as applicable to human disease. the doubts were dispelled in the 1950s, by the demonstration that a tumor induced by rous sarcoma virus (rsv) could produce infected tumor cells [rubin 1955]. in conjunction with the observation that murine leukemia viruses are transmissible to newborn animals [gross 1950], it initiated two decades of intense research into animal viruses, including many retroviruses with tumorigenic properties in animals. in 1964, the epstein-barr virus (ebv) was observed by electron microscopy in cultured cells from burkitt lymphoma [epstein et al. 1964]. studies of rsv led to the identification of the first oncogene, v-src, in the 1970s [martin 1970; brugge and erikson 1977] and its subsequent sequencing [czernilofsky et al. 1980]. in general, the infection of cells with an oncogenic dna virus may result either in productive lytic infection with cell death and the release of newly formed virus particles or in cell transformation to the neoplastic state with little or no virus production, but with the integration of viral genetic information into the cell dna. the genomes of herpes viruses are double-stranded linear dna molecules with sizes in the range of 140-170 kb. 
the initiation of transformation by oncogenic herpes viruses appears to depend on specific genes, although no single t antigens (tumor antigens) have been identified. ebv was discovered in 1964 by epstein, achong, and barr in a biopsy from burkitt lymphoma. it is a γ-1 herpes virus infecting all human populations, with a prevalence of over 90% in adults. infection results in the establishment of a lifelong carrier state, characterized by the persistence of antibodies to several viral gene products and the secretion of infectious virus in the saliva, which is also the usual vehicle of transmission. the epstein-barr virus, which is the agent of infectious mononucleosis, is causative for burkitt lymphoma (described by english surgeon denis burkitt in uganda in 1958) in africa and sporadic cases elsewhere, for b-cell lymphomata in acquired immunodeficiency syndrome (aids), as well as for nasopharyngeal carcinoma with high prevalence in china. viral dna and various ebv antigens are detectable in the affected tumor cells. a herpes virus designated hhv type 8 (kshv, kaposi sarcoma-related herpes virus) has been implicated in aids-associated kaposi sarcoma [chang et al. 1994], the most common malignant tumor in aids, and also in rare sporadic kaposi sarcomata unrelated to aids. the herpes simplex virus type 2 (hsv-2) may be involved in the pathogenesis of cervical cancer. originally known as serum hepatitis, hepatitis b has only been recognized as such since 1947. it has caused epidemics in parts of asia and africa. hepatitis b is recognized as endemic in china and various other parts of asia. hepatitis b viruses (hbv) specifically infect liver cells. chronic infection with hbv may have a causal role in primary hepatocellular carcinoma, which is one of the most common forms of cancer in asia. viral dna is integrated into the tumor cells in some of these cases. 
in 1963, baruch blumberg and harvey alter reported the discovery of the hepatitis b surface antigen (aa, hbsag, australia antigen), and a specific antibody binding to it. in 1970, dane visualized the hepatitis b virion. these discoveries paved the way for the development of a vaccine. the genomes of the papova family members polyomavirus and sv40 are double-stranded circular dna molecules with sizes of about 5 kb. they contain two main groups of genes that are associated with early and late events in the replication cycle. the early genes are transcribed soon after infection of a cell and their encoded proteins participate in viral dna synthesis but are not structural components of the virions. the late genes encode proteins of the viral coat and capsid. in productive lytic infection, early proteins are formed transiently before the structural proteins are assembled into viral particles. when stable transformation takes place, viral dna is integrated into the cellular chromosomal dna and some of the early proteins are persistently synthesized, but viral particles are not produced. approximately 60-120 distinct types of human papilloma viruses (hpv) have been identified, which infect epithelial cells. while several forms cause benign tumors, such as warts, some types of sexually transmitted hpv are associated with precursor lesions to squamous carcinoma of the uterine cervix. in 1983, harald zur hausen and colleagues isolated hpv16 from a human cervical cancer specimen. hpv types 16 and 18 ("high risk hpv"), followed by hpv types 45 and 31, may cause invasive cervical carcinoma or anorectal cancers. hpv dna is extrachromosomal in the precursor lesions and infectious virus is produced. viral dna is frequently integrated into the cancer cells, but additional agents or factors may be involved at various stages of the progression to invasive carcinoma. cell transformation by hpv results from the expression of two early genes, e6 and e7. 
e6 binds to p53, while e7 binds to rb, in both cases resulting in the degradation of their targets in the ubiquitin-proteasome pathway. acting together, e6 and e7 are sufficient to induce transformation in the absence of mutations in cell regulatory proteins. in 2006, a vaccine against high risk hpv strains came on the market. while there is no evidence that sv40 can induce human tumors or that sv40 dna is present in human tumor cells, it has been a valuable model in cancer research. the early proteins found in tumors induced by polyomavirus and sv40 are termed t (tumor) antigens. polyomavirus produces large, middle, and small t antigens, of which the middle t antigen (55 kd) is necessary for transformation. this early protein is bound to the plasma membrane of transformed cells and activates signal transduction pathways that promote cell cycle progression. the two early proteins of sv40, large t (94 kd) and small t (17 kd), are formed from the same reading frame by alternative splicing. the large t antigen is located in the nucleus of infected cells and maintains the transformed state. distinct domains of large t bind to p53 and rb, inhibiting their function. because large t inhibits both proteins, expression of only the sv40 large t protein is sufficient to induce the transformation of certain cells. most adenoviruses only cause acute upper respiratory tract infections. adenoviruses were discovered in adenoid tissue in 1953 by rowe. their genomes are double-stranded linear dna molecules with sizes of about 35-40 kb. in cells transformed by oncogenic adenoviruses, a region of the genome encoding early gene products, including the e1a and e1b oncoproteins, is transcribed. these transforming proteins inactivate the rb and p53 tumor suppressors, with e1a binding to rb and e1b binding to p53. oncogenic rna viruses. hübner and todaro postulated the existence of retroviral oncogenes [hübner and todaro 1969]. 
among the many families of rna viruses, only members of the retrovirus and flavivirus families are capable of transforming cells and inducing tumors. the genomes of retroviruses are single-stranded rna molecules with a size range of 3-9 kb. all retroviruses contain a reverse transcriptase [baltimore 1970; temin and mizutani 1970], and their reduplication requires the synthesis of a double-stranded dna intermediate of the rna genome. some of the virally determined dna becomes integrated into the host dna as a provirus. typically, there are three retroviral genes that encode proteins necessary for viral reduplication, but do not contribute to transformation:
- the gag gene encodes internal structural proteins of the virus.
- the pol gene encodes the reverse transcriptase.
- the env gene encodes the envelope glycoproteins.
two unique types of human retroviruses, human t-cell leukemia viruses (htlv) types 1 and 2, take part in the etiology of leukemias [ruscetti et al. 1977; mier and gallo 1980; poiesz et al. 1980]. human t-cell leukemia virus type 1 (htlv-1), the first human retrovirus to be isolated and characterized, may be the causative agent of a relatively rare form of t-cell lymphoma that occurs mainly in japan and the caribbean islands. htlv-2 can cause hairy t-cell leukemia [kalyanaraman et al. 1982]. all the known rna-containing tumor viruses are classified as retroviruses, with the exception of the hepatitis c virus (hcv), which resembles a flavivirus. in 1989, daniel bradley provided chiron with non-a/non-b hepatitis serum from chimpanzees. there, michael houghton and colleagues discovered a single virus and changed the name to hcv. the virus was then cloned from infectious sera of patients with posttransfusion hepatitis. hepatitis c may lead to chronic liver disease and cirrhosis, which is a predisposing factor for liver cancer. 
the encounter with a family, in which many members developed breast or liver cancer, led pierre paul broca to hypothesize, in 1866, that an inherited abnormality within the affected tissue caused the tumor development [broca 1866]. theodor boveri (1862-1915) then proposed that defects in chromosomes lead to malignancy [boveri 1914]. he hypothesized that malignant tumors might be the result of a certain abnormal condition of the chromosomes, which may arise from multipolar mitosis. the main concepts of boveri's theory are:
- the problem of tumors is a cellular problem
- typically, every tumor arises from a single cell
- the primordial cells of tumors contain, as a result of an abnormal process, definite and wrongly combined chromatin contents
- chromosome abnormalities are the cause of the tendency toward rapid cell proliferation, which is passed on to all descendants of the primordial cell.
in the 1950s, sajiro makino in japan, theodore hauschka in the united states, and albert levan in sweden observed that virtually all tumor cell lines have chromosomal aberrations. the discovery of the philadelphia chromosome in chronic myeloid leukemia [nowell and hungerford 1960] later provided experimental evidence for boveri's theories. it supported the hypothesis that damage to the chromosomes induced carcinogenesis. aneuploidy, typically with elevated dna content, is a frequent marker of cancerous cells. providing more functional insight, the first description of a translocation was reported in 1973 by janet d. rowley [rowley 1973]. although the philadelphia chromosome was among the first translocations to be discovered, the genes involved in the translocation that causes burkitt lymphoma were the first to be molecularly characterized. in 1982, carlo croce and bob gallo showed that the myc proto-oncogene on chromosome 8 is affected by the translocation. 
simultaneously, phil leder's group demonstrated that myc is translocated into the 5′ region of the immunoglobulin heavy chain (igh) gene [dalla-favera et al. 1982; taub et al. 1982]. cancers represent a large category of somatic cell genetic diseases [mckusick 1985]. the term "somatic mutation" was first applied to cancer by ernest tyzzer, who observed that tumors sequentially transplanted into mice developed a continuous broadening of host specificity among recipients from various inbred strains [tyzzer 1916]. by the 1970s, tyzzer's model had received a molecular underpinning and cancer was understood as a disease of genetic alterations. tumor initiation and progression occur through the accumulation of changes that begin when a single normal cell sustains permanent genetic damage. the resulting dysregulation of gene function is responsible for the clonal expansion of a population of somatic cells that ultimately becomes dominant. progress in the understanding of dna and genes has been a major determining factor for progress in cancer research. in 1869, johann friedrich miescher had identified a weakly acidic substance of unknown function in the nuclei of human white blood cells, which later became known as deoxyribonucleic acid, or dna. the term gene (derived from the greek γένος = origin), attributed to johannsen, first appeared in 1909 as an abstract concept to explain the hereditary basis of traits. oswald avery, colin macleod, and maclyn mccarty showed in 1944 that dna constitutes the genetic material. in 1953, james watson and francis crick deduced the double helical structure of dna from x-ray diffraction data, generated by rosalind franklin. in 1961, sidney brenner and francis crick established that groups of three nucleotide bases, or codons, are used to specify individual amino acids. the genetic code of nucleotide triplets was worked out in final detail in 1966, mainly through work by marshall nirenberg and heinrich matthaei. 
this paved the way for the molecular analysis of gene damage. one of the most important approaches for biotechnology is the cloning of genes inserted into plasmids. it was initiated through discussions between stanley cohen and herb boyer at a conference in hawaii, and by march 1973 the feasibility of their new method was demonstrated. pcr was invented by kary b. mullis in spring of 1983. these techniques allowed for the large availability and easy manipulation of cancer related genes. in 1977, frederick sanger at the medical research council in cambridge, uk and walter gilbert at harvard university in boston, usa independently devised methods for sequencing dna, which were further developed by leroy hood at the california institute of technology, who invented an automated dna sequencer in 1985. in 1990, the human genome project was launched to obtain the complete blueprint of human dna, planned for 2005. in 1998, the competition by a private enterprise, led by craig venter, accelerated the process, so that both groups presented a draft sequence of the genome by june 2000. the genetic analysis of cancer experienced additional support from the technical accomplishment to manipulate individual genes in vivo. in 1982, a team led by richard palmiter and ralph brinster generated the first transgenic mouse. this was achieved through pronuclear microinjection of genetic material into the nuclei of fertilized eggs. from 1987 through 1989, teams led by martin evans, oliver smithies, and mario capecchi created knockout mice by selectively disabling a specific target gene in embryonic stem cells. rna tumor viruses can cause normal cells to adopt the characteristics of rapid uncontrolled growth that are typical of many tumors. the discovery of the human proto-oncogene src by dominique stéhelin, harold varmus, michael bishop, and peter vogt [stéhelin et al. 1976] confirmed that viral oncogenes are derived from related genes of host cells. 
their analysis implied that the cellular src sequence is involved in the normal regulation of growth. it also suggested that tumors could arise independently of viruses as a result of mutations in their related cellular genes. in 1982, three publications in the journal nature independently identified a point mutation in the proto-oncogene ras as a defect associated with bladder cancer [chang et al. 1982; parada et al. 1982; mcbride et al. 1982]. these discoveries revealed that a cellular transforming gene involved in human bladder and lung tumors was homologous to the transforming viral ras gene [parada et al. 1982; der et al. 1982], and that an activating point mutation affected the identical codon in all cases. thus, it became apparent that the same cellular proto-oncogenes could be affected by viruses, by chemical carcinogens, or by nonviral somatic mutations, which brought together various previously independent lines of research. the observation that the growth of murine tumor cells in vivo could be suppressed by fusion of the tumor cells with nontransformed cells provided evidence that the ability of cells to form a tumor is a recessive trait [ephrussi et al. 1969]. knudson [1971] carried out an epidemiological study of retinoblastoma development in children. he postulated that "two hits" are required for the complete inactivation of a tumor suppressor gene. the gene p53 was discovered independently by linzer and levine [1979] and by lane and crawford [1979] as a cellular protein that binds to the viral oncoprotein of sv40. initially suspected as a cellular oncogene, due to mutations that act as dominant negative forms, the identification of loss of heterozygosity and loss of function mutations of p53 confirmed its actual role as a tumor suppressor [baker et al. 1990]. 
after this clarification, p53 became known as the guardian of the genome, because it protects from the consequences of genetic damage by inhibiting cell division or inducing cell death. in 1983, loss of heterozygosity analysis was used to map the tumor suppressor gene rb, which was then cloned in 1986 [friend et al. 1986]. oxidative metabolism inevitably leads to dna damage. this may occur by direct oxidation of bases, by induction of dna strand breaks, or by mediation of frameshift mutations in microsatellite dna. each cell (of an estimated $10^{14}$ in the human body) loses more than $10^{4}$ bases (out of a total of $6 \times 10^{9}$ nucleotides) per day from the spontaneous breakdown of dna at body temperature, mostly through the damage by reactive oxygen species. a similar number of lesions is generated by spontaneous depurination, resulting in miscoding by the residual apurinic site [loeb 2001]. the deamination of 5-methylcytosine to thymine is among the most frequent causes for point mutations. it accounts for more than 20% of all base mutations that give rise to genetic disease [krawczak et al. 1998]. it has been estimated that 5-methylcytosine deaminates at a rate of $5.8 \times 10^{-17}\ \mathrm{s}^{-1}$ at each cpg site (cytosine and guanine separated by a phosphate) [shen et al. 1994], which corresponds to about four residues per cell per day. mutation frequencies of the hypoxanthine phosphoribosyl transferase (hprt) gene, a commonly used marker for mutation frequency, in normal adult epithelial cells reach approximately $1.3 \times 10^{-4}$ [martin et al. 1996]. the reduplication of dna during cell division introduces the possibility of errors at an estimated rate of $1.4 \times 10^{-10}$ nucleotides/cell/division. loeb and colleagues [loeb et al. 1974] realized that it would be unlikely for tumor cells to acquire the number of mutations presumably needed for full transformation during the lifetime of the host and postulated the existence of mutator genes. 
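the orders of magnitude quoted above can be checked with a few lines of arithmetic. the following is a minimal sketch that uses only the figures given in the text; the function and constant names are illustrative, and the replication error rate is read here, as an assumption, as errors per nucleotide per cell division.

```python
# Order-of-magnitude arithmetic using only the figures quoted in the text.
# All names below are illustrative and are not taken from the source.

CELLS_IN_BODY = 1e14                # estimated number of cells in the human body
BASES_LOST_PER_CELL_PER_DAY = 1e4   # lower bound ("more than 10^4 bases")
NUCLEOTIDES_PER_CELL = 6e9          # total nucleotides per cell

# Assumption: the quoted rate is interpreted as errors per nucleotide
# per cell division.
REPLICATION_ERRORS_PER_NT_PER_DIVISION = 1.4e-10


def fraction_of_genome_lost_per_day() -> float:
    """Fraction of a cell's nucleotides lost per day to spontaneous breakdown."""
    return BASES_LOST_PER_CELL_PER_DAY / NUCLEOTIDES_PER_CELL


def body_wide_bases_lost_per_day() -> float:
    """Total bases lost per day summed over all cells of the body."""
    return CELLS_IN_BODY * BASES_LOST_PER_CELL_PER_DAY


def expected_replication_errors_per_division() -> float:
    """Expected number of replication errors per genome copy."""
    return REPLICATION_ERRORS_PER_NT_PER_DIVISION * NUCLEOTIDES_PER_CELL


if __name__ == "__main__":
    print(f"fraction of genome lost per cell per day: {fraction_of_genome_lost_per_day():.2e}")
    print(f"bases lost per day in the whole body:     {body_wide_bases_lost_per_day():.2e}")
    print(f"replication errors per cell division:     {expected_replication_errors_per_division():.2f}")
```

under these assumptions, replication contributes somewhat less than one error per genome copy, which illustrates why loeb and colleagues considered the normal error rate too low to account for the number of mutations presumed necessary for full transformation.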
much later, the study of hereditary non-polyposis colorectal cancer led to the discovery of defective dna repair genes [ionov et al. 1993; thibodeau et al. 1993; parsons et al. 1993]. any mutation of cancer associated genes can be handed on to following generations and predispose the affected cells to malignant transformation in the case of additional dna damage. the formation of cancer has been termed "clonal evolution" to describe how certain mutations enable cells to copy their damaged dna and divide under conditions that cause normal cells to stop replicating. the repetition of this process allows cells to accumulate cancerous mutations [cavenee and white 1995]. in 1949, berenblum and shubik [1949] concluded that carcinogenesis is at least a two-stage process. five years later, armitage and doll [1954] inferred from their analysis of age and cancer incidence a 6-7 step process. in 1983, newbold and overell observed that an activated ras gene failed to transform normal fibroblasts unless they were first immortalized [newbold and overell 1983]. this led to the hypothesis that ras activation was only one step in a number of mutations necessary in the pathway to malignancy. the concept of multiple somatic mutations as the underlying mechanism of carcinogenesis was further advanced by a multistep carcinogenesis model, conceived by foulds [1957] and refined by fearon and vogelstein [1990]. it also gave rise to the recognition of chromosome instability and microsatellite instability as two distinct pathogenetic mechanisms of carcinogenesis. the technical achievements of differential display, serial analysis of gene expression (sage) [velculescu et al. 1995; zhang et al. 1997], and dna microarrays [schena et al. 1995; derisi et al. 1996] further advanced these concepts to the definition of transformation on the basis of aberrant gene expression profiles [kononen et al. 1998; golub et al. 1999].
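the inference of a 6-7 step process from age-incidence data, mentioned above for armitage and doll, can be illustrated with a short sketch. it assumes the commonly used simplified form of the multistage model, in which incidence rises as a power of age, incidence ∝ age^(k−1) for k rate-limiting stages, so that the slope of log incidence against log age is k − 1. the function names, the ages, and the scale constant are hypothetical, and the incidence values are synthetic, generated from the model itself rather than taken from any registry.

```python
import math


def multistage_incidence(age, stages, scale=1.0):
    """Simplified multistage model: incidence proportional to age**(stages - 1)."""
    return scale * age ** (stages - 1)


def estimate_stages(ages, incidences):
    """Estimate the number of stages from the log-log slope (least squares)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(i) for i in incidences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope + 1  # slope = stages - 1 under the power-law assumption


# Synthetic data generated from the model itself (not real registry data).
ages = [30, 40, 50, 60, 70]
incidences = [multistage_incidence(a, stages=7, scale=1e-12) for a in ages]

estimated = estimate_stages(ages, incidences)
print(f"estimated number of stages: {estimated:.2f}")
```

on real data the fit would be noisy and the power law only approximate; the sketch merely shows how a steep rise of incidence with age translates into an estimated number of steps.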
in addition to chromosome integrity and dna sequence fidelity, the regulation of the chromatin structure is an important determinant in transformation. dna methylation is a covalent modification of the c5 position in cytosine. this methylation pattern is stably maintained at cpg dinucleotides by a family of dna methyl transferases that recognize hemi-methylated cpg dinucleotides after dna replication. dna hypo-methylation was identified as a characteristic of cancer cells in 1983 [feinberg and vogelstein 1983] . in 1964, vincent allfrey had realized that histones were often chemically modified by acetylation, which caused them to relax their binding to dna [allfrey et al. 1964 ]. this implied the possibility of a role for histones in cancer [roth 1965 ]. in 1974, robert kornberg proposed that chromatin was quite structured, consisting of repeated units of about 200 base pairs of dna wrapped around 2-4 distinct histones (later called nucleosomes) [kornberg 1974; kornberg and thomas 1974] . the importance of acetylation for the regulation of gene expression and gene silencing was, however, realized only many years later. in 1998, methylation and phosphorylation of histones were observed by several investigators to contribute similarly [bestor 1998 ]. today, various enzymes that modify histones are known to contribute to transformation [horiuchi et al. 1981 ]. beyond the development of cancer research from explanations on a cellular level to a molecular genetic level, there has been a development of dynamic models of carcinogenesis. winge introduced the concept of selective cellular proliferation, realizing that selection must operate on a genotypically mixed population of proliferating cells as inevitably as it acts on a genotypically mixed population of reproducing organisms [winge 1930 ]. macfarlane burnet conceptualized the clonal selection theory for immunity and applied it to cancer. 
it suggests that tumorigenesis represents the development of a clone of cells with the capacity to multiply excessively in the context of its relationships within the body [burnet 1959]. in the 1960s, feedback control in biological systems was described by francois jacob and jacques monod. cellular metabolism and proliferation are regulated by spatiotemporal circuits of mutual feedback control. they include extracellular and intracellular signals, rate limiting steps, and checkpoint controls. cancer development has also been described with the algorithms of ecology [michelson et al. 1987; maley et al. 2006] and game theory [tomlinson 1997]. the regulatory theory contends that cancer is not a morphologic entity, but an aberrant regulatory process among individual cells, their microenvironment, and the entire host. genetically identical cells and organisms exhibit substantial diversity, even when they have identical histories of environmental exposure. variation in gene expression, based in part on the stochastic nature of biochemical reactions, may contribute to this phenotypic variability [raser and o'shea 2005]. genetic changes underlying growth control, senescence, invasion, and stromal-parenchymal interactions are part of a continuum of carcinogenesis that affects interrelated pathways. in malignant cells, the normal balance between the number of cells completing the cell cycle and the number of cells dying is changed. likewise, the balance of adhesive versus migratory surface molecules on malignant cells is shifted in favor of the motility enhancing receptors. full transformation has two basic requirements:
- genetic instability of the cell to drive tumor progression
- selective advantage of the cell to allow for clonal expansion [cairns 1975; nowell 1976].
the genetic instability of tumor cells is reflected in the heterogeneity within individual tumors and among tumors of the same type.
it is based either on chromosome instability, leading to aneuploidy, or on defective dna repair, leading to microsatellite instability and gene mutations. genomic destabilization is an early event in tumor development. the mean number of alterations in a cell that turns carcinomatous may amount to about 11,000 [stoler et al. 1999 ]. waves of clonal expansion give rise to daughter cells that have the growth advantage typical of cancer. clonal selection drives this process. tumors are clonal insofar as they are derived from the same stem cell precursor. genetic instability generates a collection of coexisting subclones, each with the potential for future changes in the face of selective pressures [cahill et al. 1999 ]. the relative importance of selective advantage versus genetic instability in tumor initiation and progression is still subject to debate. studies of cell senescence have led to a research focus on population dynamics, selection, and evolution. hayflick recognized that there is a finite number of possible population doublings by nontransformed differentiated cells [hayflick and moorehead 1961] . after a limited number of divisions, a state of crisis is reached, in which most cells die. a few cells may be altered in a fashion that conveys a selective advantage, which allows them to grow out and dominate the population. these cells are selected and form an expanding population with potentially precancerous characteristics. the demonstration that htlv-1 immortalizes normal t-lymphocytes [popovic et al. 1983] led to additional investigations, which confirmed that tumor viruses can frequently immortalize human host cells. the shortening of the chromosome ends, telomeres [szostak and blackburn 1982; moyzis et al. 1988] , is an integral part of replicative senescence. the enzyme telomerase [mckay and cooke 1992; chong et al. 1995] replenishes the chromosome ends and can prevent this shortening. 
its activity is present in most cancer cells, but not typically in nontransformed differentiated cells [hastie et al. 1990].
the first cancer hospital was founded in the 18th century in reims, france. french gynecologist joseph claude anthelme récamier (1774-1852) described the invasion of the bloodstream by cancer cells, coining the word "metastasis." in the 1850s, pierre paul broca (1824-1880) and karl von rokitansky (1804-1878), independently of each other, observed the venous spread of cancer. theories of metastasis formation have traditionally been based on concepts of population dynamics. in 1889, english surgeon stephen paget (1855-1926) described the propensity of various types of cancer to form metastases in specific organs. he stated that "the distribution of the secondary growth is not a matter of chance" and proposed that these patterns were due to the dependence of the "seeds" (the cancer cells) on the "congenial soil" (the target organ for metastasis) [paget 1889]. this notion was challenged by american pathologist james ewing (1866-1943), who suggested that circulatory patterns between a primary tumor and specific secondary organs were sufficient to account for most of the targeted metastasis [ewing 1928]. this view was in turn qualified by leonard weiss, who demonstrated that the number of metastases in specific target organs, derived from certain tumors, could not be accounted for solely by blood flow patterns [weiss 1992]. the first evidence that metastasis formation depends on intrinsic characteristics of the tumor cells came from experiments by isaiah fidler [fidler 1975], who generated sublines with increasing invasive potential by serial passage of a melanoma cell line through mice. soon, somatic cell fusion and microcell mediated chromosomal transfer suggested that the ability to disseminate was under positive and negative genetic control [ramshaw et al. 1983; sidebottom and clark 1983; layton and franks 1986].
these observations placed ensuing research activities into metastasis on a deterministic footing. the secretion of proteases by tumor cells [turpeenniemi-hujanen et al. 1985; matrisian et al. 1986 ] was recognized as one factor causing invasiveness. homing receptors were identified on the cell surface, which are necessary and sufficient to mediate metastasis formation by specific tumors [günthert et al. 1991] . in conjunction with the finding of metastasis suppressor genes [steeg et al. 1988; alvarez et al. 1990 ], the detection of metastasis genes has corroborated the existence of genetic programs intrinsic in the tumor cells, which regulate invasiveness. these observations have led to the development of a genetic theory of metastasis formation, according to which metastasis genes are developmentally nonessential genes that physiologically contribute to inflammation, wound healing, and stress-induced angiogenesis. their dysregulation in cancer occurs on the level of aberrant expression and splicing [weber and ashkar 2000] . tissue-specific molecular markers (addressins) were identified in 1988 [streeter et al. 1988 ], which implied the possibility that circulating cells could recognize target organs. this was corroborated by the identification of the contribution by chemokines and their cognate receptors to tumor dissemination [mueller et al. 2001 ]. in the evolution of research progress from a reductionist to a comprehensive understanding of cancer, interactions between the host and the cancer cells have recently received increasing attention. mintz and illmensee [1975] had demonstrated that the injection of undifferentiated embryonal carcinoma cells into mouse blastocysts suppressed their inherent tumorigenicity and led to the contribution by these cells to a variety of functional tissues. 
around the same time, the michigan radiologist john wolfe recognized that women with dense breasts had an elevated risk of contracting breast cancer, implying a role for the stromal architecture. in 1990, it was realized that the tissue environment had a dramatic effect on the potential by tumors to metastasize [nakajima et al. 1990 ]. tumorigenic prostatic stroma and nontumorigenic prostatic epithelium can interact to induce the development of carcinosarcoma [chung et al. 1988 ]. the concept that the stroma plays important roles in carcinogenesis has since been developed by mina bissell [bissell and radisky 2001] , judy campisi [krtolica et al. 2001] , and donald ingber [huang and ingber 1999] . early work in experimental carcinogenesis had shown vascularization and hyperemia around tumor transplants [ide et al. 1939; coman and sheldon 1946] and similarities were seen between the vascular reactions to tumors and to tissue damage [algire et al. 1945] . cancer researchers became interested in angiogenesis factors in 1968, when the first hints emerged that tumors might release such substances to foster their own progression. two groups, one led by melvin greenblatt in california with phillipe shubik in chicago, and another by robert l. ehrmann and mogens knoth in boston, showed that burgeoning tumors can release a substance that induces existing blood vessels to grow into them [rijhsinghani et al. 1968; ehrmann and knoth 1968] . such vascularization promotes tumor growth because it ensures a sufficient supply of oxygen and nutrients. folkman [1971; folkman et al. 1971 ] recognized the important role of blood vessels in the growth of cancerous tumors. after more than a decade of research, mediators of angiogenesis that are secreted by some tumors were identified [senger et al. 1983; shing et al. 1984 ]. the inhibition of vegf (vascular endothelial growth factor)-induced angiogenesis was shown to suppress tumor growth [kim et al. 1993 ]. 
today, a monoclonal antibody to vegf is used in the treatment of some cancers. these investigations also led to the discovery of naturally secreted compounds that curtail the growth of new tumors [taylor and folkman 1982; o'reilly et al. 1994; o'reilly et al. 1997].
in the 17th and 18th centuries, some believed that cancer was contagious. in fact, the first cancer hospital in france was forced to move from the city in 1779 because of the fear that cancer could spread throughout the city. more than a century later, the potentially protective role of the immune system against transformed cells was recognized. in the 1890s, new york surgeon william b. coley found a record of a young patient with round cell sarcoma on the neck, who had been listed as an utterly hopeless patient when he developed a severe infection of erysipelas. he survived the infection and his tumor went into remission. based on this case, coley devised a killed vaccine of streptococcus pyogenes (the cause of erysipelas) with serratia marcescens. after a few years of its use, he reported having successfully treated some sarcoma patients with the application of his bacterial toxins (coley's toxin) [coley 1893, 1896]. after coley's death, his daughter helen coley nauts reviewed his records, published several reviews of his work, and founded the cancer research institute, which promotes immune therapies for cancer. in 1909, paul ehrlich carried out immunizations in animals with tumor cells and suggested that tumors occur at high frequency in humans, but are kept under control by the immune system [ehrlich 1909]. further developments in tumor immunology have led to models of selection and evolution of cancer cells. macfarlane burnet coined the term immunosurveillance in 1967 [burnet 1967]. in this conceptual framework, the host immune system constantly screens cells for signs of transformation and eliminates those that pose a threat to the body's integrity.
the growth of a tumor reflects an escape from immunosurveillance. cancer cells that can evade the immune system, whether by down-regulation of antigen presenting or co-stimulatory molecules or by expression of immunosuppressive cell surface molecules or cytokines, will grow out and form tumors. three distinct theories were developed to interpret the nature of tumor recognition by the immune system.
- lewis thomas described homograft rejection as a primary defense against neoplasia [thomas 1959].
- according to concepts by burnet, which are based on self/non-self discrimination, the immune system is active early in antitumor protection. the early surveillance mechanisms shape the tumor's immunological phenotype [burnet 1967]. this was supported by the description of tumor specific antigens [old and boyse 1964]. tumors mostly express self antigens, which may account for the incomplete protection from transformation by the immune system.
- the alternative proposal of the danger theory [matzinger 1994] implies that the immune system is activated only at later stages of carcinogenesis. during the early stages, tumor cells appear immunologically as healthy growing cells that do not send out danger signals, because they neither express microbial immune recognition patterns nor release distress signals to alarm the cells of the innate immune system [fuchs and matzinger 1996]. in advanced growth, hypoxia and tissue damage induce stress responses, which activate the immune system.
the possibility of directing the immune system to fight cancer cells in virtually any location within the body with minimal side effects has attracted increasing research efforts. the high specificity and high binding affinity of antibodies made them attractive as potential anticancer agents.
for a long time, however, they were difficult to isolate in large quantities. the fusion of antibody producing cells with myeloma cells into hybridomas, accomplished by cesar milstein and georges koehler in the early 1970s, changed that. yet, biotechnology had to advance far enough to humanize such antibodies before they became successful in therapy. in 1997, the us food and drug administration (fda) approved rituxan, a monoclonal antibody to cd20 (developed by idec pharmaceuticals), to treat non-hodgkin lymphoma [mclaughlin et al. 1998]. the process also led to the development of herceptin, an antibody that targets the receptor erbb2 (her-2/neu), which is overexpressed on the surface of about 30% of breast cancers; this work was spearheaded by dennis slamon. because antitumor immunity is predominantly cellular immunity, other research has been directed toward turning t-lymphocytes against tumors. steven a. rosenberg focused his efforts to generate antitumor vaccines on tumor associated antigens. in a similar approach, martin kast studied the development of peptide-based vaccines. glenn dranoff demonstrated the high effectiveness of irradiated tumor cells transfected with the cytokine gm-csf in inducing antitumor immunity [dranoff et al. 1993].
over time it became clear, on the other hand, that the immune system could also impact negatively on cancer risk in the context of chronic inflammation. in 1876, robert koch and louis pasteur had shown independently of each other that microorganisms can cause disease. in the 1980s, barry j. marshall and j. robin warren demonstrated that gastric ulcers were caused by bacteria they called helicobacter pylori. infection results in widespread inflammation that predisposes to stomach cancer. inflammation in the stomach mucosa is also a risk factor for malt (mucosa-associated lymphoid tissue) lymphoma, a lymphatic neoplasm in the stomach.
over the decades, the roles of hormones in carcinogenesis have received increasing attention.
the observation by bernardino ramazzini in 1713 of a virtual absence of cervical cancer and relatively high incidence of breast cancer in nuns was an important step toward identifying and understanding the importance of hormonal factors, such as those associated with pregnancy, in modifying cancer risk. in 1878, thomas beatson discovered that the breasts of rabbits stopped producing milk after he removed the ovaries. he suggested to the edinburgh medico-chirurgical society in 1896: "this fact (. . .) pointed to one organ holding control over the secretion of another and separate organ." beatson found that oophorectomy often resulted in the improvement of breast cancer patients and inferred the stimulating effect of a female ovarian hormone on breast cancer, before the hormone itself was discovered [beatson 1896 ]. allen and doisy [1923] identified an ovarian hormone they referred to as "estrus stimulating principle," later called estrogen. from the late 1950s to the 1970s elwood jensen demonstrated that such hormones do not undergo redox modifications to become activated. instead, they bind to a receptor protein within their target cells [jensen and jacobson 1962] . this hormone/receptor complex then travels to the cell nucleus, where it regulates gene expression. the first nonsteroidal antiestrogen to be reported in the literature, mer25, was described by lerner and coworkers in 1958 [lerner et al. 1958 ] as an agent that had no other hormonal or antihormonal properties. the drug failed in clinical trial because the large doses required caused serious central nervous system side effects. tamoxifen, first discovered in 1962, is a nonsteroidal antiestrogen that serves a dual role as breast cancer treatment and preventive. it was approved for the treatment of advanced breast cancer by the us fda in 1977. 
awareness of the androgen dependence of prostate tissue can be traced back to the scottish surgeon john hunter, who observed in 1786 that castrated bulls had small prostates. in 1941, charles brenton huggins (1901-1997), a urologist at the university of chicago, with his students clarence v. hodges and william wallace scott, published three papers that demonstrated the relationship between the endocrine system and the normal functioning of the prostate gland. in the 1940s, charles huggins also reported a dramatic regression of metastatic prostate cancer following removal of the testes [huggins and hodges 1941]. later, drugs that blocked male hormones were found to be effective treatments for prostate cancer. androgen ablation with gonadotropin releasing hormone agonists (gnrh-as) in prostate cancer patients was first reported in 1982 [tolis et al. 1982]. in 1988, the androgen receptor was cloned. iatrogenic causes for cancer predisposition were incriminated by a study published in 1971, which documented an association between clear-cell adenocarcinoma of the vagina and in utero exposure to diethylstilbestrol [herbst et al. 1971] (dodds and associates had characterized diethylstilbestrol as an extremely potent estrogen [dodds et al. 1938]; it had been prescribed for close to 30 years to prevent certain complications of pregnancy and as a treatment for advanced breast cancer in postmenopausal women). in july 2002, the women's health initiative study was stopped after more breast cancers and heart problems occurred among women taking estrogen-progestin pills. in 2006, multiple clinical studies showed that breast cancer rates in the united states dropped in 2003, following a drastic reduction in the use of hormone replacement therapy. some of the numbers came from the national cancer institute's surveillance database, which uses cancer registries around the country to project national incidence and death rates.
vascular reactions of normal and malignant tumors in vivo: i. vascular reactions of mice to wounds and to normal and neoplastic transplants an ovarian hormone: preliminary report on its localization, extraction and partial purification and action in test animals acetylation and methylation of histones and their possible role in the regulation of rna synthesis inhibition of collagenolytic activity and metastasis of tumor cells by a recombinant human tissue inhibitor of metalloproteinases the age distribution of cancer and a multi-stage theory of carcinogenesis suppression of human colorectal carcinoma cell growth by wild-type p53 rna-dependent dna polymerase in virions of rna tumour viruses on the treatment of inoperable cases of carcinoma of the mamma: suggestions for a new method of treatment with illustrative cases an experimental study of the initiating stage of carcinogenesis, and re-examination of the somatic cell mutation theory of cancer gene silencing. methylation meets acetylation putting tumors in context zur frage der entstehung maligner tumoren a role for sunlight in skin cancer: uv-induced p53 mutations in squamous cell carcinoma selective g to t mutations of p53 gene in hepatocellular carcinoma from southern africa traité des tumeurs, 2 vols identification of a transformationspecific antigen induced by avian sarcoma virus immunological aspects of malignant disease the clonal selection theory of acquired immunity the cultivation of tissues of the chick embryo outside the body genetic instability and darwinian selection in tumours mutation selection and the natural history of cancer the genetic basis of cancer molecular cloning of human and rat complementary dna encoding androgen receptors tumorigenic transformation of mammalian cells induced by a normal human gene homologous to the oncogene of harvey murine sarcoma virus identification of herpes-like dna sequences in aids-associated kaposi's sarcoma a human telomeric protein prostatic carcinogenesis 
evoked by cellular interaction innesto positivo con filtrado di verrucae volgare further observations upon the treatment of malignant tumours with the toxins of erysipelas and bacillus prodigiosus with a report of 160 cases the treatment of malignant tumors by repeated inoculations of erysipelas, with a report of ten original cases the significance of hyperemia around tumor implants nucleotide sequence of an avian sarcoma virus oncogene (src) and proposed amino acid sequence for gene product human c-myc onc gene is located on the region of chromosome 8 that is translocated in burkitt lymphoma cells transforming genes of human bladder and lung carcinoma cell lines are homologous to the ras genes of harvey and kirsten sarcoma viruses use of cdna microarray to analyze gene expression patterns in human cancer oestrogenic activity of esters of diethylstilboestrol the causes of cancer: quantitative estimates of avoidable risks of cancer in the united states today vaccination with irradiated tumor cells engineered to secrete murine granulocyte-macrophage colony-stimulating factor stimulates potent, specific, and long-lasting anti-tumor immunity lessons learned from the development of an abl tyrosine kinase inhibitor for chronic myelogenous leukemia ueber den jetzigen stand der karzinomforschung transfilter stimulation of vasoproliferation in the hamster cheek pouch. studied by light and electron microscopy experimentelle leukämie bei hühnern malignancy of somatic cell hybrids virus particles in cultured lymphoblasts from burkitt's lymphoma. 
the lancet i neoplastic diseases: a treatise on tumors temporary remissions in acute leukemia in children produced by folic antagonist, 4-aminopteroylglutamic acid (aminopterin) a genetic model for colorectal tumorigenesis hypomethylation distinguishes genes of some human cancers from their normal counterparts biological behavior of malignant melanoma cells correlated to their survival in vivo tumor angiogenesis: therapeutic implications isolation of a tumour factor responsible for angiogenesis tumor progression a comparative study of two regimens of combination chemotherapy in acute leukemia the effectiveness of combinations of antileukemic agents in inducing and maintaining remission in children with acute leukemia a human dna segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma is cancer dangerous to the immune system? genetic markers as tracers in cell culture tissue culture studies of the proliferative capacity of cervical carcinoma and normal epithelium molecular classification of cancer: class discovery and class prediction by gene expression monitoring nitrogen mustard therapy. use of methyl-bis(beta-chloroethyl)amine hydrochloride and tris(beta-chloroethyl)amine hydrochloride for hodgkin's disease, lymphosarcoma, leukemia, and certain allied and miscellaneous disorders biochemistry of cancer susceptibility of newborn mice of an otherwise apparently 'resistant' strain to inoculation with leukemia a new variant of glycoprotein cd44 confers metastatic potential to rat carcinoma cells observations on the living developing nerve fiber telomere reduction in human colorectal carcinoma and with ageing the limited in vitro lifetime of human diploid cell strains adenocarcinoma of the vagina. 
key: cord-015684-q10sx1dm authors: cacabelos, ramón title: pharmacogenomic biomarkers in neuropsychiatry: the path to personalized medicine in mental disorders date: 2009 journal: the handbook of neuropsychiatric biomarkers, endophenotypes and genes doi: 10.1007/978-90-481-2298-1_1 sha: doc_id: 15684 cord_uid: q10sx1dm neuropsychiatric disorders and dementia represent a major cause of disability and high cost in developed societies. most disorders of the central nervous system (cns) share some common features, such as a genomic background in which hundreds of genes might be involved, genome-environment interactions, complex pathogenic pathways, poor therapeutic outcomes, and chronic disability. recent advances in genomic medicine can contribute to accelerate our understanding on the pathogenesis of cns disorders, improve diagnostic accuracy with the introduction of novel biomarkers, and personalize therapeutics with the incorporation of pharmacogenetic and pharmacogenomic procedures to drug development and clinical practice.
the pharmacological treatment of cns disorders, in general, accounts for 10–20% of direct costs, and less than 30–40% of the patients are moderate responders to conventional drugs, some of which may cause important adverse drug reactions (adrs). pharmacogenetic and pharmacogenomic factors may account for 60–90% of the variability in drug disposition and pharmacodynamics. approximately 60–80% of cns drugs are metabolized via enzymes of the cyp gene superfamily; 18% of neuroleptics are major substrates of cyp1a2 enzymes, 40% of cyp2d6, and 23% of cyp3a4; 24% of antidepressants are major substrates of cyp1a2 enzymes, 5% of cyp2b6, 38% of cyp2c19, 85% of cyp2d6, and 38% of cyp3a4; 7% of benzodiazepines are major substrates of cyp2c19 enzymes, 20% of cyp2d6, and 95% of cyp3a4. about 10–20% of caucasians are carriers of defective cyp2d6 polymorphic variants that alter the metabolism of many psychotropic agents. another 100 genes participate in the efficacy and safety of psychotropic drugs. the incorporation of pharmacogenetic/pharmacogenomic protocols into cns research and clinical practice can foster therapeutics optimization by helping to develop cost-effective pharmaceuticals and improving drug efficacy and safety. to achieve this goal several measures have to be taken, including: (a) educate physicians and the public on the use of genetic/genomic screening in daily clinical practice; (b) standardize genetic testing for major categories of drugs; (c) validate pharmacogenetic and pharmacogenomic procedures according to drug category and pathology; (d) regulate ethical, social, and economic issues; and (e) incorporate pharmacogenetic and pharmacogenomic procedures into both drugs in development and drugs on the market to optimize therapeutics. central nervous system (cns) disorders are the third leading health problem in developed countries, representing 10-15% of deaths, after cardiovascular disorders (25-30%) and cancer (20-25%).
approximately 127 million europeans suffer from brain disorders. the total annual cost of brain disorders in europe is about €386 billion, with €135 billion of direct medical expenditures (€78 billion, inpatients; €45 billion, outpatients; €13 billion, pharmacological treatment), €179 billion of indirect costs (lost workdays, productivity loss, permanent disability), and €72 billion of direct non-medical costs. mental disorders represent €240 billion (62% of the total cost, excluding dementia), followed by neurological diseases (€84 billion, 22%). 1 senile dementia is becoming a major health problem in developed countries, and the primary cause of disability in the elderly. alzheimer's disease (ad) is the most frequent form of dementia (50-70%), followed by vascular dementia (30-40%), and mixed dementia (15-20%). these prevalent forms of age-related neurodegeneration affect more than 25 million people at present, and probably more than 75 million people will be at risk in the next 20-25 years worldwide. the prevalence of dementia increases exponentially from approximately 1% at 60-65 years of age to more than 30-35% in people older than 80 years. it is very likely that in those patients older than 75-80 years of age most cases of dementia are mixed in nature (degenerative + vascular), whereas pure ad cases are very rare after 80 years of age. the average annual cost per person with dementia ranges from €10,000 to €40,000, depending upon disease stage and country, with a lifetime cost per patient of more than €150,000. in some countries, approximately 80% of the global costs of dementia (direct + indirect costs) are assumed by the patients and/or their families.
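as a sanity check on the european cost figures quoted above, the breakdown can be reproduced with a few lines of arithmetic. the sketch below is only illustrative: the variable names and the `share` helper are assumptions introduced here, and the only data used are the amounts (in billions of euros) stated in the text.

```python
# amounts in billions of euros, copied from the figures quoted in the text
total = 386
direct_medical = 135
indirect = 179
direct_non_medical = 72

# the three top-level components add up to the reported total
assert direct_medical + indirect + direct_non_medical == total

# sub-items of direct medical expenditure as quoted in the text;
# they sum to 136 rather than 135, presumably because each figure is rounded
direct_medical_parts = {
    "inpatients": 78,
    "outpatients": 45,
    "pharmacological treatment": 13,
}
parts_sum = sum(direct_medical_parts.values())


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# shares of the total cost attributed to each disease group
mental_share = share(240, total)
neuro_share = share(84, total)

# pharmacological treatment as a share of direct medical expenditure
drug_share_of_direct = share(13, direct_medical)

print(parts_sum, mental_share, neuro_share, drug_share_of_direct)
```

the computed shares agree with the 62% and 22% quoted for mental and neurological disorders, and pharmacological treatment comes out at roughly one tenth of direct medical expenditure, which sits at the lower end of the 10-20% range given for drug costs.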
about 10-20% of the costs in dementia are attributed to pharmacological treatment, including anti-dementia drugs, psychotropics (antidepressants, neuroleptics, anxiolytics), and other drugs currently prescribed in the elderly (antiparkinsonians, anticonvulsants, vasoactive compounds, anti-inflammatory drugs, etc). in addition, during the past 20 years more than 300 drugs have been partially or totally developed for ad, with the subsequent costs for the pharmaceutical industry, and only 5 drugs with moderate-to-poor efficacy and questionable cost-effectiveness have been approved in developed countries. [2] [3] [4] the lack of accurate diagnostic markers for early prediction and the lack of an effective therapy for cns disorders are the two most important obstacles to diagnosing these disorders efficiently and halting disease progression. the pharmacological treatment of cns disorders, in general, accounts for 10-20% of direct costs, and less than 30-40% of the patients are moderate responders to conventional drugs, some of which may cause important adverse drug reactions (adrs). in the case of dementia, less than 20% of the patients can benefit from current drugs (donepezil, rivastigmine, galantamine, memantine), with doubtful cost-effectiveness. the pathogenic mechanisms of most cns disorders (e.g., psychosis, depression, anxiety, alzheimer's disease, parkinson's disease, huntington's disease, multiple sclerosis, etc) are poorly understood. this circumstance makes it difficult to implement a molecular intervention to neutralize causative factors. in fact, more than 80% of the 25,000 genes integrating the human genome are expressed in the cns at different periods of the life span, and only a few neurotransmitters (e.g., noradrenaline, dopamine, acetylcholine, gaba, histamine, and less than ten neuropeptides) are the actual targets of conventional psychopharmacology.
common features in cns disorders include the following: (a) polygenic/complex disorders in which genomic and environmental factors are involved; (b) deterioration of higher activities of the cns; (c) multifactorial dysfunctions in several brain circuits; and (d) accumulation of toxic proteins in the nervous tissue in cases of neurodegeneration. for instance, the neuropathological hallmark of alzheimer's disease (ad) (amyloid deposition in senile plaques, neurofibrillary tangle formation, and neuronal loss) is but the phenotypic expression of a pathogenic process in which more than 200 genes and their products are potentially involved. drug metabolism, and the mechanisms underlying drug efficacy and safety, are also genetically regulated complex traits in which hundreds of genes cooperatively participate. structural and functional genomics studies demonstrate that genomic factors, probably induced by environmental factors, cerebrovascular dysfunction, and epigenetic phenomena, might be responsible for pathogenic events leading to premature neuronal dysfunction and/or death. pharmacogenetic and pharmacogenomic factors may account for 60-90% of the variability in drug disposition and pharmacodynamics. about 10-20% of caucasians are carriers of defective cyp2d6 polymorphic variants that alter the metabolism of many psychotropic agents. the incorporation of pharmacogenetic/pharmacogenomic protocols into cns research and clinical practice can foster therapeutics optimization by helping to develop cost-effective pharmaceuticals and improving drug efficacy and safety. [5] [6] [7]
extensive molecular genetics studies carried out in the past 2 decades have demonstrated that most cns disorders are multifactorial, polygenic/complex disorders in which hundreds of genes distributed across the human genome might be involved (tables 40.1-40.3). 8, 9 for example, 255 genes have been associated with dementia (table 40.1), 205 with schizophrenia (table 40.2), 106 with depression (table 40.3), 107 with anxiety, 103 with stroke, 385 with different types of ataxia, 155 with epilepsy, 83 with meningioma, 105 with glioblastoma, 27 with astrocytoma, 73 with parkinson's disease, and more than 30 genes with cerebrovascular disorders. 8, 10 many of these genetic associations could not be replicated in different settings and different populations due to many complex (methodological, technological) factors. 8, 11, 12 furthermore, the same genomic defect can give rise to apparently diverse phenotypes, and different genomic defects can converge in an apparently common phenotype, which increases the complexity of genomic studies (e.g., patient recruitment, pure controls, concomitant pathology, epigenetic factors, environmental factors). several candidate genes for schizophrenia may also be associated with bipolar disorder, including g72, disc1, nrg1, rgs4, ncam1, dao, grm3, grm4, grin2b, mlc1, syngr1, and slc12a6. genes associated with bipolar disorder include trpm2 (21q22.3), gpr50 (xq28), citron (12q24), chp1.5 (18p11.2), gchi (14q22-24), mlc1 (22q13), gabra5 (15q11-q13), bcr (22q11), cux2, flj32356 (12q23-q24), and napg (18p11). 9 another paradigmatic example of heterogeneity and complexity is dementia, one of the most heterogeneous disorders of the cns. the genetic defects identified in ad during the past 25 years can be classified into three main categories: (a) mendelian or mutational defects in genes directly linked to ad, including (i) 32 mutations in the amyloid beta (aβ; abp) precursor protein (app) gene (21q21); (ii) 165 mutations in the presenilin 1 (ps1) gene (14q24.3); and (iii) 12 mutations in the presenilin 2 (ps2) gene (1q31-q42) 8, 10, 13 (table 40.1). (b) multiple polymorphic variants of risk characterized in more than 200 different genes distributed across the human genome can increase neuronal vulnerability to premature death 8 (table 40.1).
among these genes of susceptibility, the apolipoprotein e (apoe) gene (19q13.2) is the most prevalent as a risk factor for ad, especially in those subjects harbouring the apoe-4 allele, whereas carriers of the apoe-2 allele might be protected against dementia. 8 apoe-related pathogenic mechanisms are also associated with brain aging and with the neuropathological hallmarks of ad. 8 (c) diverse mutations located in mitochondrial dna (mtdna) through heteroplasmic transmission can influence aging and oxidative stress conditions, conferring phenotypic heterogeneity. 8, 14, 15 it is also likely that defective functions of genes associated with longevity may influence premature neuronal survival, since neurons are potential pacemakers defining life span in mammals. 8 all these genetic factors may interact in still unknown genetic networks leading to a cascade of pathogenic events characterized by abnormal protein processing and misfolding with subsequent accumulation of abnormal proteins (conformational changes), ubiquitin-proteasome system dysfunction, excitotoxic reactions, oxidative and nitrosative stress, mitochondrial injury, synaptic failure, altered metal homeostasis, dysfunction of axonal and dendritic transport, and chaperone misoperation 8,16-20 (fig. 40.1). these pathogenic events may exert an additive effect, converging in final pathways leading to premature neuronal death. some of these mechanisms are common to several neurodegenerative disorders which differ depending upon the gene(s) affected and the involvement of specific genetic networks, together with cerebrovascular factors, epigenetic factors (dna methylation) and environmental conditions (nutrition, toxicity, social factors, etc). 8, [16] [17] [18] [19] [20] [21] [22] the higher the number of genes involved in ad pathogenesis, the earlier the onset of the disease, the faster its clinical course, and the poorer its therapeutic outcome.
8, [16] [17] [18] [19] [20] high throughput microarray gene expression profiling is an effective approach for the identification of candidate genes and associated molecular pathways implicated in a wide variety of biological processes or disease states. the cellular complexity of the cns (with 10^3 different cell types) and synapses (with each of the 10^11 neurons in the brain having around 10^3-10^4 synapses with a complex multiprotein structure integrated by 10^3 different proteins) requires a very powerful technology for gene expression profiling, which is still in the very early stages and is not devoid of technical obstacles and limitations. 23 transcripts of 16,896 genes have been measured in different cns regions. each region possesses its own unique transcriptome fingerprint that is independent of age, gender and energy intake. less than 10% of genes are affected by age, diet or gender, with most of these changes occurring between middle and old age. gender and energy restriction have robust influences on the hippocampal transcriptome of middle-aged animals. prominent functional groups of age- and energy-sensitive genes are those encoding proteins involved in dna damage responses, mitochondrial and proteasome functions, cell fate determination and synaptic vesicle trafficking. the systematic transcriptome dataset provides a window into mechanisms of neuropathogenesis and cns vulnerability. 24 with the advent of modern genomic technologies, new loci have been associated with different neuropsychiatric disorders, and novel pathogenic mechanisms have been postulated. cryptic chromosome imbalances are increasingly acknowledged as a cause for mental retardation and learning disability. with subtelomeric screening, nine chromosomal anomalies and submicroscopic deletions of 1pter, 2qter, 4pter, 5qter and 9qter have been identified in patients with mental retardation.
25 increased dna fragmentation was observed in non-gabaergic neurons in bipolar disorder, suggesting that non-gabaergic cells may be selectively vulnerable to oxidative stress and apoptosis in patients with bipolar disorder. 26 [17] [18] [19] [20] with laser microdissection, rna amplification, and array hybridization, expression of more than 1,000 genes was detected in ca1 and ca3 hippocampal neurons under normoxic conditions. the comparison of each region under normoxic and ischemic conditions revealed more than 5,000 ischemia-regulated genes for each individual cell type. 27 microarray technology has helped to elucidate gene expression profiles and potential pathogenic mechanisms in many other cns disorders including schizophrenia and bipolar disorder, [28] [29] [30] speech and language disorders, 31 parkinson's disease, 32, 33 huntington's disease, 34 prion disease, 35 drug addiction, 36,37 alcoholism, 38 brain trauma, 39 epilepsy, [40] [41] [42] cockayne syndrome, 43 rett syndrome, 44 friedreich ataxia, 45 neuronal ceroid lipofuscinosis, 46 multiple sclerosis, 47 amyotrophic lateral sclerosis, 48 acute pneumococcal meningitis, 49 and the role of lipids in brain injury, psychiatric disorders, and neurodegenerative diseases. [50] [51] [52] interactions between genomic factors and environmental factors have been proposed as important contributors to brain neuropathology. in schizophrenia, neurodevelopmental disturbances, neurotoxins and perinatal infections, myelin and oligodendrocyte abnormalities, and synaptic dysfunctions have been suggested as pathophysiological factors. individual genotoxicants can induce distinct gene expression signatures. exposure of the brain to environmental agents during critical periods of neuronal development can alter neuronal viability and differentiation, global gene expression, stress and immune response, and signal transduction.
53 the binomial genome-neurotoxicants effect can be documented in cases of drug abuse or alcohol dependence. functional gene expression differences between inbred alcohol-preferring and nonpreferring rats suggest the presence of powerful genomic influences on alcohol dependence. 54 alcohol dependence and associated cognitive impairment may result from neuroadaptations to chronic alcohol consumption involving changes in expression of multiple genes. it has been suggested that cycles of alcohol intoxication/withdrawal, which may initially activate nuclear factor-kappa b (nf-κb), when repeated over years downregulate p65 (rela) mrna expression and nf-κb and p50 homodimer dna-binding. downregulation of the dominant p50 homodimer, a potent inhibitor of gene transcription, apparently results in depression of κb-regulated genes. alterations in expression of p50 homodimer/nf-κb regulated genes may contribute to neuroplastic adaptation underlying alcoholism. 55 gene expression profiling of the nucleus accumbens of cocaine abusers suggests a dysregulation of myelin. 56 humans who abused cocaine, cannabis and/or phencyclidine share a decrease in transcription of calmodulin-related genes and increased transcription related to lipid/cholesterol and golgi/er function. 57 another important issue in the pathogenesis and therapeutics of cns disorders is the role of micrornas (mirnas). mirnas are small (22 nucleotide), endogenous noncoding rna molecules that posttranscriptionally regulate expression of protein-coding genes. computational predictions estimate that the vertebrate genomes may contain up to 1,000 mirna genes. mirnas are generated from long primary transcripts that are processed in multiple steps to cytoplasmic 22 nucleotide mature mirnas.
the mature mirna is incorporated into the mirna-induced silencing complex (mirisc), which guides it to target sequences located in 3′ utrs where, through incomplete base-pairing, it induces mrna destabilization or translational repression of the target genes. an inventory of mirna expression profiles from 13 regions of the mouse cns has been reported. 58 this inventory of cns mirna profiles provides an important step toward further elucidation of mirna function and mirna-related gene regulatory networks in the mammalian cns. 58 the introduction of novel procedures into an integral genomic medicine protocol for cns disorders is an imperative requirement in drug development and in clinical practice to improve diagnostic accuracy and to optimize therapeutics. this kind of protocol should integrate the following components: (i) clinical history, (ii) laboratory tests, (iii) neuropsychological assessment, (iv) cardiovascular evaluation, (v) conventional x-ray technology, (vi) structural neuroimaging, (vii) functional neuroimaging, (viii) computerized brain electrophysiology, (ix) cerebrovascular evaluation, (x) structural genomics, (xi) functional genomics, (xii) pharmacogenetics, (xiii) pharmacogenomics, (xiv) nutrigenetics, (xv) nutrigenomics, (xvi) bioinformatics for data management, and (xvii) artificial intelligence procedures for diagnostic assignments and probabilistic therapeutic options (table 40.4). 2, 8, [16] [17] [18] [19] [20] [21] [22] 59, 60 all these procedures, under personalized strategies adapted to the complexity of each case, are essential to depict a clinical profile based on specific biomarkers correlating with individual genomic profiles. functional genomics studies have demonstrated the influence of many genes on cns pathogenesis and phenotype expression (tables 40.1-40.3).
taking ad as an example, it has been demonstrated that mutations in the app, ps1, ps2, and mapt genes give rise to well-characterized differential neuropathological and clinical phenotypes of dementia. 8 the analysis of genotype-phenotype correlations has also revealed that the presence of the apoe-4 allele in ad, in conjunction with other genes, influences disease onset, brain atrophy, cerebrovascular perfusion, blood pressure, β-amyloid deposition, apoe secretion, lipid metabolism, brain bioelectrical activity, cognition, apoptosis, and treatment outcome. 8, [16] [17] [18] [19] [20] [21] [22] 61 the characterization of phenotypic profiles according to age, cognitive performance (mmse and adas-cog score), serum apoe levels, serum lipid levels including cholesterol (cho), hdl-cho, ldl-cho, vldl-cho, and triglyceride (tg) levels, as well as serum nitric oxide (no), β-amyloid, and histamine levels, reveals sex-related differences in 25% of the biological parameters and almost no differences (0.24%) when patients are classified as apoe-4(−) and apoe-4(+) carriers, probably indicating that gender-related factors may influence these parametric variables more powerfully than the presence or absence of the apoe-4 allele. in contrast, when patients are classified according to their apoe genotype, dramatic differences emerge among apoe genotypes (>45%), with a clear biological disadvantage in apoe-4/4 carriers who exhibit (i) earlier age of onset, (ii) low apoe levels, (iii) high cho and ldl-cho levels, and (iv) low no, β-amyloid, and histamine levels in blood. 8, [16] [17] [18] [19] [20] [21] [22] 61 these phenotypic differences are less pronounced when ad patients are classified according to their ps1 (15.6%) or ace genotypes (23.52%), reflecting a weak impact of ps1- and ace-related genotypes on the phenotypic expression of biological markers in ad.
ps1-related genotypes appear to influence age of onset, blood histamine levels and cerebrovascular hemodynamics, as reflected by significant changes in systolic (sv), diastolic (dv), and mean velocities (mv) in the left middle cerebral arteries (mca). 19 ace-related phenotypes seem to be more influential than ps1 genotypes in defining biological phenotypes, such as age of onset, cognitive performance, hdl-cho levels, ace and no levels, and brain blood flow mv in mca. however, when apoe and ps1 genotypes are integrated in bigenic clusters and the resulting bigenic genotypes are differentiated according to their corresponding phenotypes, an almost logarithmic increase in the expression of differential phenotypes is observed (61.46% variation), indicating the existence of a synergistic effect of the bigenic (apoe + ps1) cluster on the expression of biological markers, apparently unrelated to app/ps1 mutations, since none of the patients included in the sample were carriers of either app or ps1 mutations. 19, 61, 62 these examples illustrate the potential additive effects of ad-related genes on the phenotypic expression of biological markers. furthermore, the analysis of genotype-phenotype correlations with a monogenic or bigenic approach documents a modest genotype-related variation in serum amyloid-β (abp) levels, suggesting that peripheral levels of abp are of relative value as predictors of disease stage or as markers of disease progression and/or treatment-related disease-modifying effects. 19, 61, 62 the peripheral levels of abp in serum exhibit an apoe-dependent pattern according to which both apoe-4(+) and apoe-2(+) carriers tend to show higher abp levels than apoe-4(−) or apoe-3 carriers 19,61-63 (fig. 40). the lowest concentration of serum histamine is systematically present in apoe-2(+) and apoe-4(+) carriers, and the highest levels of histamine are seen in apoe-3(+) carriers (fig. 40.3).
central and peripheral histaminergic mechanisms may regulate cerebrovascular function in ad, which is significantly altered in apoe-4/4 carriers. 19, [61] [62] [63] [64] [65] [66] these observations can lead to the conclusion that the simple quantification of biochemical markers in fluids or tissues of ad patients with the aim of identifying pathogenic mechanisms and/or monitoring therapeutic effects, when it is not accompanied by differential genotyping for sample homogenization, is of very poor value. differential patterns of apoe-, ps1-, ps2-, and trigenic (apoe + ps1 + ps2) cluster-related lymphocyte apoptosis have been detected in ad. fas receptor expression is significantly increased in ad, especially in apoe-4 carriers where lymphocyte apoptosis is more relevant. 19, 67 it has been demonstrated that brain activity slowing correlates with progressive gds staging in dementia 8, 16, [18] [19] [20] (fig. 40.4). in the general population, subjects harbouring the apoe-4/4 genotype exhibit a premature slowing in brain mapping activity represented by increased slow delta and theta activities as compared with other apoe genotypes. in patients with ad, slow activity predominates in apoe-4 carriers with similar gds stage 8,16,18-20 (fig. 40.4). ad patients harbouring the apoe-4/4 genotype also exhibit a dramatically different brain optical topography map reflecting a genotype-specific differential pattern of neocortical oxygenation as well as a poorer activation of cortical neurons in response to somatosensory stimuli (fig. 40.5). all these examples of genotype-phenotype correlations, as a gross approach to functional genomics, illustrate the importance of genotype-related differences in ad and their impact on phenotype expression. 8, [16] [17] [18] [19] [20] [21] [22] 62, 63 similar protocols are applied to schizophrenia, depression, anxiety and other neuropsychiatric disorders.
most biological parameters, potentially modifiable by monogenic genotypes and/or polygenic cluster profiles, can be used in clinical trials for monitoring efficacy outcomes. these parametric variables also show a genotype-dependent profile in different types of dementia (e.g., ad vs. vascular dementia). for instance, striking differences have been found between ad and vascular dementia in structural and functional genomics studies. 8, [16] [17] [18] [19] [20] [21] [22] 62, 63 our understanding of the pathophysiology of cns disorders has advanced dramatically in the last 30 years, especially in terms of their molecular pathogenesis and genetics. drug treatment of cns disorders has also made remarkable strides, with the introduction of many new drugs for the treatment of schizophrenia, depression, anxiety, epilepsy, parkinson's disease, and alzheimer's disease, among many other quantitatively and qualitatively important neuropsychiatric disorders. improvement in terms of clinical outcome, however, has fallen short of expectations, with up to one third of the patients continuing to experience clinical relapse or unacceptable medication-related side effects in spite of efforts to identify optimal treatment regimes with one or more drugs. 68 potential reasons to explain this historical setback might be that: (a) the molecular pathology of most cns disorders is still poorly understood; (b) drug targets are inappropriate, not fitting into the real etiology of the disease; (c) most treatments are symptomatic, but not anti-pathogenic; (d) the genetic component of most cns disorders is poorly defined; and (e) the understanding of genome-drug interactions is very limited.
with the advent of recent knowledge on the human genome 69,70 and the identification and characterization of many genes associated with cns disorders, 8, 19 as well as novel data regarding cyp family genes and other genes whose enzymatic products are responsible for drug metabolism in the liver (e.g., nats, abcbs/mdrs, tpmt), it has been convincingly postulated that the incorporation of pharmacogenetic and pharmacogenomic procedures (fig. 40.6) in drug development might bring about substantial benefits in terms of therapeutics optimization in cns disorders and in many other complex disorders, assuming that genetic factors are determinant for both neuronal dysregulation (and/or neuronal death) 8,16-22 and drug metabolism. [71] [72] [73]
fig. 40.6 efficacy and safety issues associated with pharmacogenetics and pharmacogenomics (adapted from r. cacabelos 19, 20)
however, this field is still in its infancy, and the incorporation of pharmacogenomic strategies into drug development and pharmacological screening in cns disorders is not an easy task. the natural course of technical events to achieve efficient goals in pharmacogenetics and pharmacogenomics includes the following steps: (a) genetic testing of mutant genes and/or polymorphic variants of risk; (b) genomic screening, and understanding of transcriptomic, proteomic, and metabolomic networks; (c) functional genomics studies and genotype-phenotype correlation analysis; and (d) pharmacogenetics and pharmacogenomics developments, addressing drug safety and efficacy, respectively. 8, [16] [17] [18] [19] [20] [21] [22] [74] [75] [76] [77] with pharmacogenetics we can understand how genomic factors associated with genes encoding enzymes responsible for drug metabolism regulate pharmacokinetics and pharmacodynamics (mostly safety issues).
[78] [79] [80] with pharmacogenomics we can differentiate the specific disease-modifying effects of drugs (efficacy issues) acting on pathogenic mechanisms directly linked to genes whose mutations determine the disease phenotype. [16] [17] [18] [19] [20] [21] [22] [74] [75] [76] [77] the capacity of drugs to reverse the effects of the activation of pathogenic cascades (phenotype expression) regulated by networking genes basically deals with efficacy issues. at present, the terms pharmacogenetics and pharmacogenomics are often used interchangeably to refer to studies of the contribution of inheritance to variation in the drug response phenotype 73 ; however, for historical and didactic reasons (until a more suitable and universal definition can be established) it would be preferable to reserve the term pharmacogenetics for the discipline dealing with genetic factors associated with drug metabolism and safety issues, whereas pharmacogenomics would refer to the reciprocal influence of drugs and genomic factors on pathogenetic cascades and disease-associated gene expression (efficacy issues). [18] [19] [20] [21] [22] [74] [75] [76] [77] the application of these procedures to cns disorders is a very difficult task, since most neuropsychiatric diseases are complex disorders in which hundreds of genes might be involved 8, [16] [17] [18] [19] [20] [21] [22] [74] [75] [76] [77] (tables 40.1-40.3). in addition, it is very unlikely that a single drug will be able to reverse the multifactorial mechanisms associated with neuronal dysfunction in most cns processes with a complex phenotype affecting mood, personality, behaviour, cognition, and functioning. this heterogeneous clinical picture usually requires the utilization of different drugs administered simultaneously. this is particularly important in the elderly population.
in fact, the average number of drugs taken by patients with dementia ranges from six to more than ten per day depending upon their physical and mental conditions. nursing home residents receive, on average, seven to eight medications each month, and more than 30% of residents have monthly drug regimes of nine or more medications, including (in descending order) analgesics, antipyretics, gastrointestinal agents, electrolytic and caloric preparations, central nervous system (cns) agents, anti-infective agents, and cardiovascular agents. 81 in population-based studies more than 35% of patients older than 85 years are moderate or chronic antidepressant users. 82 polypharmacy, drug-drug interactions, adverse reactions, and non-compliance are substantial therapeutic problems in the pharmacological management of elderly patients, 83 adding further complications and costs to the patients and their caregivers. in 2000-2001, 23.0-36.5% of elderly individuals received at least 1 of 33 potentially inappropriate medications in ten health maintenance organizations (hmos) of the usa. 84 although drug effect is a complex phenotype that depends on many factors, it is estimated that genetics accounts for 20-95% of variability in drug disposition and pharmacodynamics. 79 under these circumstances, therapeutics optimization is a major goal in neuropsychiatric disorders and in the elderly population, and novel pharmacogenetic and pharmacogenomic procedures may help in this endeavour. 
[16] [17] [18] [19] [20] [21] [22] [74] [75] [76] [77] the pharmacogenomic outcome depends upon many different determinant factors including (i) genomic profile (family history, ethnic background, disease-related genotype, pharmacogenetic genotype, pharmacogenomic genotype, nutrigenetic genotype, nutrigenomic genotype), (ii) disease phenotype (age at onset, disease severity, clinical symptoms), (iii) concomitant pathology, (iv) genotype-phenotype correlations, (v) nutritional conditions, (vi) age and gender, (vii) pharmacological profile of the drugs, (viii) drug-drug interactions, (ix) gene expression profile, (x) transcriptomic cascade, (xi) proteomic profile, and (xii) metabolomic networking (fig. 40.7). the dissection and further integration of all these factors is of paramount importance for the assessment of the pharmacogenomic outcome in terms of safety and efficacy (figs. 40.8 and 40.9). more than 80% of psychotropic drugs (table 40.5) are metabolized by enzymes known to be genetically variable, including: (a) esterases: butyrylcholinesterase, paraoxonase/arylesterase; (b) transferases: n-acetyltransferase, sulfotransferase, thiol methyltransferase, thiopurine methyltransferase, catechol-o-methyltransferase, glutathione-s-transferases, udp-glucuronosyltransferases, glucosyltransferase, histamine methyltransferase; (c) reductases: nadph:quinone oxidoreductase, glucose-6-phosphate dehydrogenase; (d) oxidases: alcohol dehydrogenase, aldehyde dehydrogenase, monoamine oxidase b, catalase, superoxide dismutase, trimethylamine n-oxidase, dihydropyrimidine dehydrogenase; and (e) cytochrome p450 enzymes, such as cyp1a1, cyp2a6, cyp2c8, cyp2c9, cyp2c19, cyp2d6, cyp2e1, cyp3a5 (table 40.6) and many others. 19, 20 polymorphic variants in these genes can induce alterations in drug metabolism, modifying the efficacy and safety of the prescribed drugs.
85 drug metabolism includes phase i reactions (i.e., oxidation, reduction, hydrolysis) and phase ii conjugation reactions (i.e., acetylation, glucuronidation, sulfation, methylation). 80 the principal enzymes with polymorphic variants involved in phase i reactions are the following: cyp3a4/5/7, cyp2e1, cyp2d6, cyp2c19, cyp2c9, cyp2c8, cyp2b6, cyp2a6, cyp1b1, cyp1a1/2, epoxide hydrolase, esterases, nqo1 (nadph-quinone oxidoreductase), dpd (dihydropyrimidine dehydrogenase), adh (alcohol dehydrogenase), and aldh (aldehyde dehydrogenase). major enzymes involved in phase ii reactions include the following: ugts (uridine 5′-diphospho-glucuronosyltransferases), tpmt (thiopurine methyltransferase), comt (catechol-o-methyltransferase), hmt (histamine methyltransferase), sts (sulfotransferases), gst-a (glutathione s-transferase a), gst-p, gst-t, gst-m, nat2 (n-acetyl transferase), nat1, and others. 86 polymorphisms in genes associated with phase ii metabolism enzymes, such as gstm1, gstt1, nat2 and tpmt, are well understood, and information is also emerging on other gst polymorphisms and on polymorphisms in the udp-glucuronosyltransferases and sulfotransferases. the typical paradigm for the pharmacogenetics of phase i drug metabolism is represented by the cytochrome p-450 enzymes, a superfamily of microsomal drug-metabolizing enzymes. p450 enzymes comprise a superfamily of heme-thiolate proteins widely distributed in bacteria, fungi, plants and animals. the p450 enzymes are encoded by genes of the cyp superfamily (table 40.6) and act as terminal oxidases in multicomponent electron transfer chains which are called p450-containing monooxygenase systems. some of the enzymatic products of the cyp gene superfamily can share substrates, inhibitors and inducers whereas others are quite specific for their substrates and interacting drugs.
[18] [19] [20] [71] [72] [73] [78] [79] [80] there are more than 200 p450 genes identified in different species. saito et al 87 provided a catalogue of 680 variants among eight cyp450 genes, nine esterase genes, and two other genes in the japanese population. the microsomal, membrane-associated, p450 isoforms cyp3a4, cyp2d6, cyp2c9, cyp2c19, cyp2e1, and cyp1a2 are responsible for the oxidative metabolism of more than 90% of marketed drugs. about 60-80% of the psychotropic agents currently used for the treatment of neuropsychiatric disorders are metabolized via enzymes of the cyp family, especially cyp1a2, cyp2b6, cyp2c8/9, cyp2c19, cyp2d6 and cyp3a4 (table 40.5). cyp3a4 metabolizes more drug molecules than all other isoforms together. most of these polymorphisms exhibit geographic and ethnic differences. [88] [89] [90] [91] [92] [93] [94] these differences influence drug metabolism in different ethnic groups in which drug dosage should be adjusted according to their enzymatic capacity, differentiating normal or extensive metabolizers (ems), poor metabolizers (pms) and ultrarapid metabolizers (ums). most drugs act as substrates, inhibitors or inducers of cyp enzymes. enzyme induction enables some xenobiotics to accelerate their own biotransformation (auto-induction) or the biotransformation and elimination of other drugs. a number of p450 enzymes in human liver are inducible. induction of the majority of p450 enzymes occurs by increase in the rate of gene transcription and involves ligand-activated transcription factors, aryl hydrocarbon receptor, constitutive androstane receptor (car), and pregnane x receptor (pxr). 93, 95 in general, binding of the appropriate ligand to the receptor initiates the induction process that cascades through a dimerization of the receptors, their translocation to the nucleus and binding to specific regions in the promoters of cyps.
95 cyps are also expressed in the cns, and a complete characterization of constitutive and induced cyps in brain is essential for understanding the role of these enzymes in neurobiological functions and in age-related and xenobiotic-induced neurotoxicity. 96 assuming that the human genome contains about 20,000-30,000 genes, at the present time only 0.31% of commercial drugs have been assigned to corresponding genes whose gene products might be involved in pharmacokinetic and pharmacodynamic activities of a given drug; and only 4% of the human genes have been assigned to a particular drug metabolic pathway. supposing a theoretical number of 100,000 chemicals in current use worldwide, and assuming that practically all human genes can interact with drugs taken by human beings, each gene in the human genome should be involved in the metabolism and/or biopharmacological effect of 30-40 drugs; however, assuming that most xenobiotic substances in contact with our organism can influence genomic function, it might be possible that for 1,000,000 xenobiotics in daily contact with humans, an average of 350-500 xenobiotics have to be assigned to each one of the genes potentially involved in drug metabolism and/or xenobiotics processing. to fulfil this task a single gene has to possess the capacity of metabolizing many different xenobiotic substances and at the same time many different genes have to cooperate in orchestrated networks to metabolize a particular drug or xenobiotic under sequential biotransformation steps (figs. 40.7 and 40.8). numerous chemicals increase the metabolic capability of organisms by their ability to activate genes encoding various xenochemical-metabolizing enzymes, such as cyps, transferases and transporters. many natural and artificial substances induce the hepatic cyp subfamilies in humans, and these inductions might lead to clinically important drug-drug interactions.
some of the key cellular receptors that mediate such inductions have been recently identified, including nuclear receptors, such as the constitutive androstane receptor (car, nr1i3), the retinoid x receptor (rxr, nr2b1), the pregnane x receptor (pxr, nr1i2), and the vitamin d receptor (vdr, nr1i1) and steroid receptors such as the glucocorticoid receptor (gr, nr3c1). 97 there is a wide promiscuity of these receptors in the induction of cyps in response to xenobiotics. indeed, this adaptive system acts as an effective network where receptors share partners, ligands, dna response elements and target genes, influencing their mutual relative expression. 97,98 the most important enzymes of the p450 cytochrome family in drug metabolism, in decreasing order, are cyp3a4, cyp2d6, cyp2c9, cyp2c19, and cyp2a6. [85] [86] [87] 94, 99, 100 the predominant allelic variants in the cyp2a6 gene are cyp2a6*2 (leu160his) and cyp2a6del. the cyp2a6*2 mutation inactivates the enzyme and is present in 1-3% of caucasians. the cyp2a6del mutation results in no enzyme activity and is present in 1% of caucasians and 15% of asians. [18] [19] [20] 86 the most frequent mutations in the cyp2c9 gene are cyp2c9*2 (arg144cys), with reduced affinity for p450 in 8-13% of caucasians, and cyp2c9*3 (ile359leu), with alterations in the specificity for the substrate in 6-9% of caucasians and 2-3% of asians. [18] [19] [20] 86 the most prevalent polymorphic variants in the cyp2c19 gene are cyp2c19*2, with an aberrant splicing site resulting in enzyme inactivation in 13% of caucasians, 23-32% of asians, 13% of africans, and 14-15% of ethiopians and saudi arabians, and cyp2c19*3, a premature stop codon resulting in an inactive enzyme present in 6-10% of asians, and almost absent in caucasians. [18] [19] [20] 86, 101 the most important mutations in the cyp2d6 gene are the following: cyp2d6*2xn, cyp2d6*4, cyp2d6*5, cyp2d6*10 and cyp2d6*17.
[18] [19] [20] 96, 102 the cyp2d6*2xn mutation gives rise to a gene duplication or multiplication resulting in an increased enzyme activity which appears in 1-5% of the caucasian population, 0-2% of asians, 2% of africans, and 10-16% of ethiopians. the defective splicing caused by the cyp2d6*4 mutation inactivates the enzyme and is present in 12-21% of caucasians. the deletion in cyp2d6*5 abolishes enzyme activity and shows a frequency of 2-7% in caucasians, 1% in asians, 2% in africans, and 1-3% in ethiopians. the polymorphism cyp2d6*10 causes pro34ser and ser486thr mutations with unstable enzyme activity in 1-2% of caucasians, 6% of asians, 4% of africans, and 1-3% of ethiopians. the cyp2d6*17 variant causes thr107ile and arg296cys substitutions which produce a reduced affinity for substrates in 51% of asians, 6% of africans, and 3-9% of ethiopians, and is practically absent in caucasians. [18] [19] [20] 86, 96, 102 the cyp2d6 enzyme, encoded by a gene that maps on 22q13.1-13.2, catalyses the oxidative metabolism of more than 100 clinically important and commonly prescribed drugs such as cholinesterase inhibitors, antidepressants, neuroleptics, opioids, some β-blockers, class i antiarrhythmics, analgesics and many other drug categories, acting as substrates, inhibitors or inducers with which most psychotropics may potentially interact (table 40.5), which may lead to adrs. [18] [19] [20] 86, 96, 103 the cyp2d6 locus is highly polymorphic, with more than 100 different cyp2d6 alleles identified in the general population showing deficient (poor metabolizers, pm), normal (extensive metabolizers, em) or increased enzymatic activity (ultra-rapid metabolizers, um). 100,104 most individuals (>80%) are ems; however, remarkable interethnic differences exist in the frequency of the pm and um phenotypes among different societies all over the world.
[18] [19] [20] 89, [91] [92] [93] [94] 102 on average, approximately 6.28% of the world population belongs to the pm category. europeans (7.86%), polynesians (7.27%), and africans (6.73%) exhibit the highest rate of pms, whereas orientals (0.94%) show the lowest rate. the frequency of pms among middle eastern populations, asians, and americans is in the range of 2-3%. [16] [17] [18] [19] [20] 94 cyp2d6 gene duplications are relatively infrequent among northern europeans, but in east africa the frequency of alleles with duplication of cyp2d6 is as high as 29%. 73 the most frequent cyp2d6 alleles in the european population are the following: cyp2d6*1 (wild-type) (normal), cyp2d6*2 (2850c>t) (normal), cyp2d6*3 (2549a>del) (inactive), cyp2d6*4 (1846g>a) (inactive), cyp2d6*5 (gene deletion) (inactive), cyp2d6*6 (1707t>del) (inactive), cyp2d6*7 (2935a>c) (inactive), cyp2d6*8 (1758g>t) (inactive), cyp2d6*9 (2613-2615 delaga) (partially active), cyp2d6*10 (100c>t) (partially active), cyp2d6*11 (883g>c) (inactive), cyp2d6*12 (124g>a) (inactive), cyp2d6*17 (1023c>t) (partially active), and cyp2d6 gene duplications (with increased or decreased enzymatic activity depending upon the alleles involved). [16] [17] [18] [19] [20] [104] [105] [106] in the spanish population, where the mixture of ancestral cultures has occurred for centuries, the distribution of the cyp2d6 genotypes differentiates 4 major categories of cyp2d6-related metabolizer types: (i) extensive metabolizers (em) (*1/*1, *1/*10); (ii) intermediate metabolizers (im) (*1/*3, *1/*4, *1/*5, *1/*6, *1/*7, *10/*10, *4/*10, *6/*10, *7/*10); (iii) poor metabolizers (pm) (*4/*4, *5/*5); and (iv) ultra-rapid metabolizers (um) (*1xn/*1, *1xn/*4, dupl). in this sample we have found 51.61% ems, 32.26% ims, 9.03% pms, and 7.10% ums.
20, [74] [75] [76] [77] the distribution of all major genotypes is the following: *1/*1, 47.10%; *1/*10, 4.52%; *1/*3, 1.95%; *1/*4, 17.42%; *1/*5, 3.87%; *1/*6, 2.58%; *1/*7, 0.65%; *10/*10, 1.30%; *4/*10, 3.23%; *6/*10, 0.65%; *7/*10, 0.65%; *4/*4, 8.37%; *5/*5, 0.65%; *1xn/*1, 4.52%; *1xn/*4, 1.95%; and dupl, 0.65%. 20, [74] [75] [76] [77] in some instances, there is association of cyp2d6 variants of risk with genes potentially involved in the pathogenesis of specific cns disorders. when comparing ad cases with controls, we observed that ems are more prevalent in ad (*1/*1, 49.42%; *1/*10, 8.04%) (total ad-ems: 57.47%) than in controls (*1/*1, 44.12%; *1/*10, 0%) (total c-ems: 44.12%). in contrast, ims are more frequent in controls (41.18%) than in ad (25.29%), especially the *1/*4 (c: 23.53%; ad: 12.64%) and *4/*10 genotypes (c: 5.88%; ad: 1.15%). the frequency of pms was similar in ad (9.20%) and controls (8.82%), and ums were more frequent among ad cases (8.04%) than in controls (5.88%). 20, 74, 75, 77 we have also investigated the association of cyp2d6 genotypes with ad-related genes, such as app, mapt, apoe, ps1, ps2, a2m, ace, agt, fos, and prnp variants. 20, 74, 75, 77 no app or mapt mutations have been found in ad cases. homozygous apoe-2/2 (12.56%) and apoe-4/4 (12.50%) accumulate in ums, and apoe-4/4 cases were also more frequent in pms (6.66%) than in ems (3.95%) or ims (0%). ps1-1/1 genotypes were more frequent in ems (45%), whereas ps1-1/2 genotypes were over-represented in ims (63.16%) and ums (60%). the presence of the ps1-2/2 genotype was especially high in pms (38.46%) and ums (20%). a mutation in the ps2 gene exon 5 (ps2e5+) was markedly present in ums (66.67%). about 100% of ums were a2m-v100i-a/a, and the a2m-v100i-g/g genotype was absent in pms and ums. the a2m-i/i genotype was absent in ums, and 100% of ums were a2m-i/d and ace-d/d.
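the grouping of cyp2d6 genotypes into metabolizer categories described above can be sketched as a simple lookup followed by a sum. the following is a minimal illustration, assuming only the genotype-to-category assignments and the percentages quoted in the text; the names `GENOTYPE_FREQ`, `PHENOTYPE_OF` and `phenotype_frequencies` are hypothetical and not part of the original study. summing the rounded genotype percentages reproduces the reported category frequencies only to within rounding error.

```python
# Genotype frequencies (%) for the Spanish sample, as quoted in the text.
GENOTYPE_FREQ = {
    "*1/*1": 47.10, "*1/*10": 4.52,
    "*1/*3": 1.95, "*1/*4": 17.42, "*1/*5": 3.87, "*1/*6": 2.58,
    "*1/*7": 0.65, "*10/*10": 1.30, "*4/*10": 3.23, "*6/*10": 0.65,
    "*7/*10": 0.65,
    "*4/*4": 8.37, "*5/*5": 0.65,
    "*1xN/*1": 4.52, "*1xN/*4": 1.95, "dupl": 0.65,
}

# Genotype -> metabolizer category, following the four groups listed in the text.
PHENOTYPE_OF = {
    **{g: "EM" for g in ("*1/*1", "*1/*10")},
    **{g: "IM" for g in ("*1/*3", "*1/*4", "*1/*5", "*1/*6", "*1/*7",
                         "*10/*10", "*4/*10", "*6/*10", "*7/*10")},
    **{g: "PM" for g in ("*4/*4", "*5/*5")},
    **{g: "UM" for g in ("*1xN/*1", "*1xN/*4", "dupl")},
}


def phenotype_frequencies(genotype_freq):
    """Sum genotype percentages within each metabolizer category."""
    totals = {"EM": 0.0, "IM": 0.0, "PM": 0.0, "UM": 0.0}
    for genotype, pct in genotype_freq.items():
        totals[PHENOTYPE_OF[genotype]] += pct
    return {category: round(total, 2) for category, total in totals.items()}


if __name__ == "__main__":
    for category, pct in phenotype_frequencies(GENOTYPE_FREQ).items():
        print(f"{category}: {pct:.2f}%")
```

the small discrepancies between these sums and the reported 51.61%, 32.26%, 9.03% and 7.10% are consistent with each genotype percentage having been rounded independently.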
homozygous mutations in the fos gene (b/b) were only present in ums, as well. agt-t235t cases were absent in pms, and the agt-m174m genotype appeared in 100% of pms. likewise, the prnp-m129m variant was present in 100% of pms and ums. 20, 74, 75, 77 these association studies clearly show that in pms and ums there is an accumulation of ad-related polymorphic variants of risk which might be responsible for the defective therapeutic responses currently seen in these ad clusters. 20, [74] [75] [76] [77] it appears that different cyp2d6 variants, expressing ems, ims, pms, and ums, influence to some extent several biochemical parameters, liver function, and vascular hemodynamic parameters which might affect drug efficacy and safety. blood glucose levels are found elevated in ems (*1/*1 vs. *4/*10, p < 0.05) and in some ims (*4/*10 vs. *1xn/*4, p < 0.05), whereas other ims (*1/*5 vs. *4/*4, p < 0.05) tend to show lower levels of glucose compared with pms (*4/*4) or ums (*1xn/*4) (table 40.7). the highest levels of total-cholesterol are detected in the ems with the cyp2d6*1/*10 genotype (vs. *1/*1, *1/*4 and *1xn/*1, p < 0.05). the same pattern has been observed with regard to ldl-cholesterol levels, which are significantly higher in the em-*1/*10. in general, both total cholesterol levels and ldl-cholesterol levels are higher in ems (with a significant difference between *1/*1 and *1/*10), intermediate levels are seen in ims, and much lower levels in pms and ums; and the opposite occurs with hdl-cholesterol levels, which on average appear much lower in ems than in ims, pms, and ums, with the highest levels detected in *1/*3 and *1xn/*4 (table 40.8). the levels of triglycerides are very variable among different cyp2d6 polymorphisms, with the highest levels present in ims (*4/*10 vs. *4/*5 and *1xn/*1, p < 0.02).
these data clearly indicate that lipid metabolism can be influenced by cyp2d6 variants or that specific phenotypes determined by multiple lipid-related genomic clusters are necessary to confer the character of ems and ims. another possibility might be that some lipid metabolism genotypes interact with cyp2d6-related enzyme products, thereby defining the pheno-genotype of pms and ums. no significant changes in blood pressure values have been found among cyp2d6 genotypes; however, important differences became apparent in brain cerebrovascular hemodynamics (table 40.9). in general terms, the best [...] ldl-cholesterol levels, than in ims (*4/*10, p < 0.05); and diastolic velocities (dv) also tend to be much lower in *1/*10 and especially in pms (*4/*4) and ums (*1xn/*4), whereas the best dv is measured in *1/*5 ims. more striking are the results of both the pulsatility index (pi = (sv-dv)/mv) and resistance index (ri = (sv-dv)/sv), which are worse in ims and pms than in ems and ums (table 40.9). these data taken together seem to indicate that cyp2d6-related ad pms exhibit a poorer cerebrovascular function which might affect drug penetration in the brain with the consequent therapeutic implications. [16] [17] [18] [19] [20] [74] [75] [76] [77] some conventional anti-dementia drugs (tacrine, donepezil, galantamine) are metabolized via cyp-related enzymes, especially cyp2d6, cyp3a4, and cyp1a2, and polymorphic variants of the cyp2d6 gene can affect the liver metabolism, safety and efficacy of some cholinesterase inhibitors. 107, 108 in order to elucidate whether or not cyp2d6-related variants may influence transaminase activity, we have studied the association of got, gpt, and ggt activity with the most prevalent cyp2d6 genotypes in ad (table 40.10). globally, ums and pms tend to show the highest got activity and ims the lowest. significant differences appear among different im-related genotypes.
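the two hemodynamic indices defined earlier in this passage, pi = (sv-dv)/mv and ri = (sv-dv)/sv, can be computed directly from the measured velocities. the sketch below is only an illustration of those two formulas; it assumes that sv, dv and mv denote systolic, diastolic and mean flow velocity in the same units, and the function names and example velocities are hypothetical, not values from table 40.9.

```python
def pulsatility_index(sv: float, dv: float, mv: float) -> float:
    """PI = (sv - dv) / mv, with sv, dv, mv in the same velocity units."""
    if mv <= 0:
        raise ValueError("mean velocity must be positive")
    return (sv - dv) / mv


def resistance_index(sv: float, dv: float) -> float:
    """RI = (sv - dv) / sv, with sv and dv in the same velocity units."""
    if sv <= 0:
        raise ValueError("systolic velocity must be positive")
    return (sv - dv) / sv


if __name__ == "__main__":
    # Illustrative velocities only (cm/s); not data from the study.
    sv, dv, mv = 60.0, 20.0, 35.0
    print(f"PI = {pulsatility_index(sv, dv, mv):.2f}")
    print(f"RI = {resistance_index(sv, dv):.2f}")
```

both indices are dimensionless, so genotype groups can be compared even when absolute velocities differ; higher values correspond to the "worse" results mentioned in the text.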
the *10/*10 genotype exhibited the lowest got activity with marked differences as compared to ums (p < 0.05 vs. *1xn/*1; p < 0.05 vs. *1xn/*4). gpt activity was significantly higher in pms (*4/*4) than in ems (*1/*10, p < 0.05) or ims (*1/*4, *1/*5, p < 0.05). the lowest gpt activity was found in ems and ims. striking differences have been found in ggt activity between pms (*4/*4), which showed the highest levels, and ems (*1/*1, p < 0.05; *1/*10, p < 0.05), ims (*1/*5, p < 0.05), or ums (*1xn/*1, p < 0.01) (table 40.10). interestingly enough, the *10/*10 genotype, with the lowest values of got and gpt, exhibited the second highest levels of ggt after *4/*4, probably indicating that cyp2d6-related enzymes differentially regulate drug metabolism and transaminase activity in the liver. these results are also clear in demonstrating the direct effect of cyp2d6 variants on transaminase activity 20,77,109 (table 40.10).
table 40.10 (fragment)
[...] (2) 16.28 ± 7.40 (11) 18.14 ± 6.79 (17)
intermediate metabolizers
*1/*3 22.33 ± 1.52 (3, 4) 24.66 ± 10.59 22.00 ± 8.71
*1/*4 21.76 ± 3.57 (5, 6) 21.88 ± 8.40 32.23 ± 25.53
*1/*5 18.33 ± 2.33 (7, 8) 16.16 ± 5.60 (12, 13) 18.50 ± 6.47 [...]
no clinical trials have been performed to date to elucidate the influence of cyp2d6 variants on the therapeutic outcome in ad in response to cholinesterase inhibitors or other anti-dementia drugs. to overcome this lack of pharmacogenetic information, we have performed the first prospective study in ad patients who received a combination therapy with (a) an endogenous nucleotide and choline donor, cdp-choline (500 mg/day), (b) a nootropic substance, piracetam (1,600 mg/day), (c) a vasoactive compound, 1,6 dimethyl 8β-(5-bromonicotinoyl-oxymethyl)-10α-methoxyergoline (nicergoline) [...] (fig. 40.10). among ems, ad patients harbouring the *1/*10 genotype responded better than patients with the *1/*1 genotype.
the best responders among ims were the *1/*3, *1/*6 and *1/*5 genotypes, whereas the *1/*4, *10/*10, and *4/*10 genotypes were poor responders. among pms and ums, the poorest responders were carriers of the *4/*4 and *1xn/*1 genotypes, respectively. 20, 77, 109
fig. 40.10 cyp2d6-related therapeutic response to a multifactorial treatment in alzheimer's disease over a 1-year period (adapted from r. cacabelos 77, 109). patients received a combination therapy for 1 year, and cognitive function (mmse score) was assessed at baseline (b) and after 1, 3, 6, 9, and 12 months of treatment.
from all these data we can conclude the following: (i) the most frequent cyp2d6 variants in the spanish population are the *1/*1 (47.10%), *1/*4 (17.42%), *4/*4 (8.37%), *1/*10 (4.52%) and *1xn/*1 (4.52%), accounting for more than 80% of the population; (ii) the frequency of ems, ims, pms, and ums is about 51.61%, 32.26%, 9.03%, and 7.10%, respectively; (iii) ems are more prevalent in ad (57.47%) than in controls (44.12%); ims are more frequent in controls (41.18%) than in ad (25.29%), especially the *1/*4 (c: 23.53%; ad: 12.64%) and *4/*10 genotypes (c: 5.88%; ad: 1.15%); the frequency of pms is similar in ad (9.20%) and controls (8.82%); and ums are more frequent among ad cases (8.04%) than in controls (5.88%); (iv) there is an accumulation of ad-related genes of risk in pms and ums; (v) pms and ums tend to show higher transaminase activities than ems and ims; (vi) ems and ims are the best responders, and pms and ums are the worst responders to a combination therapy with cholinesterase inhibitors, neuroprotectants, and vasoactive substances; and (vii) the pharmacogenetic response in ad appears to be dependent upon the networking activity of genes involved in drug metabolism and genes involved in ad pathogenesis.
[16] [17] [18] [19] [20] [74] [75] [76] [77] 109, 110 taking into consideration the available data, it might be inferred that at least 15% of the ad population may exhibit an abnormal metabolism of cholinesterase inhibitors and/or other drugs which undergo oxidation via cyp2d6-related enzymes. approximately 50% of this population cluster would show an ultrarapid metabolism, requiring higher doses of cholinesterase inhibitors to reach a therapeutic threshold, whereas the other 50% of the cluster would exhibit a poor metabolism, displaying potential adverse events at low doses. if we take into account that approximately 60-70% of therapeutic outcomes depend upon pharmacogenomic criteria (e.g., pathogenic mechanisms associated with ad-related genes), it can be postulated that pharmacogenetic and pharmacogenomic factors are responsible for 75-85% of the therapeutic response (efficacy) in ad patients treated with conventional drugs. [16] [17] [18] [19] [20] [74] [75] [76] [77] 109, 110 of particular interest are the potential interactions of cholinesterase inhibitors with other drugs of current use in patients with ad, such as antidepressants, neuroleptics, antiarrhythmics, analgesics, and antiemetics which are metabolized by the cytochrome p450 cyp2d6 enzyme. 111 although most studies predict the safety of donepezil 112 and galantamine, 107 as the two principal cholinesterase inhibitors metabolized by cyp2d6-related enzymes, 113, 114 no pharmacogenetic studies have been performed so far on an individual basis to personalize the treatment, and most studies reporting safety issues are the result of pooling together pharmacological and clinical information obtained with routine procedures.
103, [115] [116] [117] in certain cases, genetic polymorphism in the expression of cyp2d6 is not expected to affect the pharmacodynamics of some cholinesterase inhibitors because major metabolic pathways are glucuronidation, o-demethylation, n-demethylation, n-oxidation, and epimerization. however, excretion rates are substantially different in ems and pms. for instance, in ems, urinary metabolites resulting from o-demethylation of galantamine represent 33.2% of the dose compared with 5.2% in pms, which show correspondingly higher urinary excretion of unchanged galantamine and its n-oxide. 118 therefore, there are still many unanswered questions regarding the metabolism of cholinesterase inhibitors and their interaction with other drugs (potentially leading to adrs) which require pharmacogenetic elucidation. it is also worth mentioning that dose titration (a common practice in ad patients treated with cholinesterase inhibitors; e.g., tacrine, donepezil) is an unwise strategy, since approximately 30-60% of drug failure or lack of therapeutic efficacy (and/or adr manifestation) is not a matter of drug dosage but a problem of poor metabolizing capacity in pms. additionally, inappropriate drug use is one of the risk factors for adverse drug reactions (adrs) in the elderly. the prevalence of use of potentially inappropriate medications in patients older than 65 years of age admitted to a general medical or geriatric ward ranges from 16% to 20%, 119 and these numbers may double in ambulatory patients. overall, the most prevalent inappropriate drugs currently prescribed to the elderly are amiodarone, long-acting benzodiazepines and anticholinergic antispasmodics; however, the list of drugs with potential risk also includes antidepressants, antihistaminics, nsaids, amphetamines, laxatives, clonidine, indomethacin, and several neuroleptics, 119 most of which are processed via cyp2d6 and cyp3a5 enzymes.
120 therefore, pre-treatment cyp screening might be of great help to rationalize and optimize therapeutics in the elderly, by avoiding medications of risk in pms and ums. there are substantial differences between individuals in the effects of psychotropic drugs in the treatment of neuropsychiatric disorders. pharmacogenetic studies of psychotropic drug response have focused on determining the relationship between variation in specific candidate genes and the positive and adverse effects of drug treatment. 121 more than 200 different genes are potentially involved in the metabolism of psychotropic drugs influencing pharmacokinetics and pharmacodynamics. of all genes affecting drug metabolism, efficacy and safety, the cyp gene family is the most relevant since more than 60% of cns drugs are metabolized by cytochrome p450 enzymes. [122] [123] [124] approximately 18% of neuroleptics are major substrates of cyp1a2 enzymes, 40% of cyp2d6, and 23% of cyp3a4; 24% of antidepressants are major substrates of cyp1a2 enzymes, 5% of cyp2b6, 38% of cyp2c19, 85% of cyp2d6, and 38% of cyp3a4; 7% of benzodiazepines are major substrates of cyp2c19 enzymes, 20% of cyp2d6, and 95% of cyp3a4 (table 40.5). approximately 80% of patients with resistant depression, 60% of patients non-responsive to neuroleptics, and 50-70% of patients with paradoxical responses to benzodiazepines are carriers of mutant variants of the cyp2d6, cyp2c9 and cyp3a4 genes, falling within the categories of poor or ultra-rapid metabolizers. other genes influencing psychotropic drug activity include the following: abcb1 [...] [128] [129] [130] [131] [132] [133] [134] [135] [136] [137] [138] (table 40.5). historically, the vast majority of pharmacogenetic studies of cns disorders have been addressed to evaluate the impact of cytochrome p450 enzymes on drug metabolism.
[125] [126] [127] furthermore, conventional targets for psychotropic drugs were the neurotransmitters dopamine, serotonin, noradrenaline, gaba, ion channels, acetylcholine and their respective biosynthetic and catalyzing enzymes, receptors and transporters 121 ; however, in the past few years many different genes have been associated with both pathogenesis and pharmacogenomics of neuropsychiatric disorders. some of these genes and their products constitute potential targets for future treatments. new developments in genomics, including whole genome genotyping approaches and comprehensive information on genomic variation across populations, coupled with large-scale clinical trials in which dna collection is routine, now provide the impetus for a next generation of pharmacogenetic studies and identification of novel candidate drugs. [139] [140] [141] cyclic nucleotide phosphodiesterases (pdes) are a family of enzymes that degrade camp and cgmp. intracellular cyclic nucleotide levels increase in response to neurotransmitters and are down-regulated through hydrolysis catalyzed by pdes, which are therefore candidate therapeutic targets. camp is a second messenger involved in learning, memory, and mood, and cgmp modulates brain processes that are controlled by the nitric oxide (no)/cgmp pathway. the analysis of snps in 21 genes of this superfamily revealed that polymorphisms in pde9a and pde11a are associated with major depressive disorder. in addition, remission on antidepressants was associated with polymorphisms in pde1a and pde11a. according to these results, it has been postulated that pde11a (haplotype gaacc) has a role in the pathogenesis of major depression. 142 another example is the purinergic receptor gene p2rx(7), located in a major linkage hotspot for schizophrenia and bipolar disorder (12q21-33), which has been associated with bipolar disorder, but nine functionally characterized variants of p2rx(7) did not show association with schizophrenia.
143 the possible role of a tag snp (the 1359g/a polymorphism) of the gene encoding the cannabinoid receptor type 1 (cnr1) has been investigated in schizophrenics treated with atypical antipsychotics. no difference in the 1359g/a polymorphism was observed between patients and control subjects, and no relationships were noted between this polymorphism and any clinical parameter considered as a potential intermediate factor; however, the frequency of the g allele was significantly higher among non-responders than among responsive patients, suggesting that the g allele of the cnr1 gene could be a pharmacogenetic rather than a vulnerability factor for schizophrenics. 144 synaptic dysfunction is a potential pathogenic factor in schizophrenia. cholesterol is an essential component of myelin and has proved important for synapse formation and lipid raft function. it has been demonstrated that the antipsychotic drugs clozapine and haloperidol stimulate lipogenic gene expression in glioma cells in culture through activation of the sterol regulatory element-binding protein (srebp) transcription factors. recently, the action of chlorpromazine, haloperidol, clozapine, olanzapine, risperidone and ziprasidone on srebp and srebp-controlled gene expression (acetyl-coa acetyltransferase 2, acetoacetyl-coa thiolase, acat2; 3-hydroxy-3-methylglutaryl-coa reductase, hmgcr; 3-hydroxy-3-methylglutaryl-coa synthase 1, hmgcs1; farnesyl diphosphate synthase, fdps; sterol-c5-desaturase like, sc5dl; 7-dehydrocholesterol reductase, dhcr7; low density lipoprotein receptor, ldlr; fatty acid synthase, fasn; stearoyl-coa desaturase, delta-9-desaturase, scd1) has been investigated in different cns human cell lines, demonstrating that antipsychotic-induced activation of lipogenesis is most prominent in glial cells and that this mechanism could be relevant for the therapeutic efficacy of some antipsychotic drugs. 145 rgs2 (regulator of g-protein signaling 2) modulates dopamine receptor signal transduction.
functional variants of this gene (rgs2 rs4606 c/g) may influence susceptibility to extrapyramidal symptoms induced by antipsychotic drugs. this snp is located in the 3′-regulatory region of the gene, and is known to influence rgs2 mrna levels and protein expression. 146 furthermore, rgs4 (regulator of g protein signaling 4) genotypes predict both the severity of baseline symptoms and relative responsiveness to antipsychotic medication. 147 tardive dyskinesia is characterized by involuntary movements predominantly in the orofacial region and develops in approximately 20% of patients during long-term treatment with typical antipsychotics. polymorphic variants of cyp1a2, cyp2d6, and drd3 genes have been associated with tardive dyskinesia in schizophrenics. 148, 149 in contrast, the haplotype t-4b-glu of the endothelial nitric oxide synthase (nos3) gene (-786t > c in the promoter region, 27-bp variable number of tandem repeats in intron 4, glu298asp in exon 7) might represent a protective haplotype against tardive dyskinesia after long-term antipsychotic treatment. 150 the t102c variant in the serotonin 2a receptor (htr2a) and the ser9gly variant in the dopamine d3 receptor (drd3) were associated with the response to risperidone in exacerbated schizophrenia. patients with t/t in the htr2a gene show less clinical improvement than those with t/c or c/c. the c allele is more frequent in responders. when combinations of both polymorphisms are considered, patients who have t/t in the htr2a gene and carry ser/ser or ser/gly in the drd3 gene have a higher propensity to non-responsiveness compared to other subjects, suggesting that the htr2a t102c variant could be a potential indicator of clinical improvement after risperidone treatment.
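the genotype combination described above for risperidone can be written as a simple flag. this is a minimal sketch of the association as stated in the text, not a validated clinical rule; the function name `risperidone_nonresponse_flag` and the genotype string format ("T/T", "Ser/Gly") are assumptions made for illustration.

```python
def risperidone_nonresponse_flag(htr2a_t102c, drd3_ser9gly):
    """Return True for the combination the text links to poorer response.

    htr2a_t102c  : genotype string such as "T/T", "T/C" or "C/C"
    drd3_ser9gly : genotype string such as "Ser/Ser", "Ser/Gly" or "Gly/Gly"
    The flag is raised for HTR2A T/T together with at least one DRD3 Ser allele.
    """
    htr2a_alleles = {a.strip().upper() for a in htr2a_t102c.split("/")}
    drd3_alleles = {a.strip().capitalize() for a in drd3_ser9gly.split("/")}
    return htr2a_alleles == {"T"} and "Ser" in drd3_alleles


for htr2a in ("T/T", "T/C", "C/C"):
    for drd3 in ("Ser/Ser", "Ser/Gly", "Gly/Gly"):
        print(htr2a, drd3, risperidone_nonresponse_flag(htr2a, drd3))
```

only the t/t plus ser/ser and t/t plus ser/gly combinations are flagged, matching the two groups named in the text.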
151 there is a significant relationship between a promoter region polymorphism in the serotonin transporter gene and antidepressant response, and there are associations between candidate neurotransmitter receptor genes and second-generation antipsychotic drug response. 121 polymorphic variants of several serotonin receptor subtypes seem to be involved in the efficacy and symptomatic response of schizophrenic patients to atypical antipsychotics. for instance, the −1019 c/g polymorphism of the htr1a receptor gene is associated with negative symptom response to risperidone in schizophrenics. 152 interaction between comt and notch4 genotypes may also predict the treatment response to typical neuroleptics in patients with schizophrenia. 153 the efficacy of iloperidone in patients with schizophrenia has been associated with homozygosity for the rs1800169 g/g genotype of the ciliary neurotrophic factor (cntf) gene. 154 dopamine receptor interacting proteins (drips) are pivotally involved in regulating dopamine receptor signal transduction. two snps in the dopamine receptor interacting protein gene, nef3, which encodes the drip neurofilament-medium (nf-m), were associated with early response (rs1457266, rs1379357). a 5-snp haplotype spanning nef3 was over-represented in early responders. since nef3 is primarily associated with dopamine d1 receptor function, it is likely that both genes cooperate in eliciting genotype-specific antipsychotic response. 155 the improvement in the positive and negative syndrome scale (panss) positive subscore was found to be significantly greater in patients homozygous for the a1287 allele of the slc6a2 (solute carrier family 6 (noradrenaline transporter), member 2) gene, and smaller in patients homozygous for the c-182 allele of the slc6a2 gene, suggesting that these polymorphisms of the noradrenaline transporter gene are specifically involved in the variation of positive symptoms in schizophrenia.
156 weight gain is a problem commonly found in patients treated with neuroleptics, tricyclic antidepressants, and some antiepileptics (e.g., valproic acid). the adipocyte-derived hormone, leptin, has been associated with body weight and energy homeostasis, and abnormal regulation of leptin could play a role in weight gain induced by antipsychotics. the leptin gene promoter variant g2548a was associated with clozapine-induced weight gain in chinese patients with chronic schizophrenia. 157 likewise, studies in caucasians suggest that genetic vulnerability in the leptin gene (−2548g/a) and leptin receptor (q223r) may predispose some individuals to excessive weight gain from increased exposure to olanzapine. 158, 159 the development of selective type 5 metabotropic glutamate receptor (mglu5) antagonists, such as 2-methyl-6-(phenylethynyl)-pyridine (mpep) and 3-[(2-methyl-1,3-thiazol-4-yl)ethynyl]-pyridine (mtep), has demonstrated the potential involvement of these receptors in several cns disorders including depression, anxiety, epilepsy, parkinson's disease, drug addiction, and alcoholism. treatment with mpep and mtep can induce gene expression related to atp synthesis, hydrolase activity, and signaling pathways associated with mitogen-activated protein kinase (mapk) in the frontal cortex, which constitutes another potential therapeutic target in some neuropsychiatric disorders. 160 a new marker (rs1954787) in the grik4 gene, which codes for the kainic acid-type glutamate receptor ka1, has been associated with response to the antidepressant citalopram, suggesting that the glutamate system plays a role in modulating response to selective serotonin reuptake inhibitors (ssris). 161 glycogen synthase kinase-3β (gsk3b) activity is increased in the brain of patients with major depressive disorders. inhibition of gsk3b is thought to be a key feature in the therapeutic mechanism of antidepressants.
four polymorphisms of the gsk3b gene [rs334555 (−50 t > c); rs13321783 (ivs7 + 9227 a > g); rs2319398 (ivs + 11660 g > t); rs6808874 (ivs + 4251 t > a)] have been genotyped in chinese patients with major depression. gsk3b tagt carriers showed poorer response to antidepressants. 162 lithium has been used for over 40 years as an effective prophylactic agent in bipolar disorder. response to lithium treatment seems to be, at least in part, genetically determined. it has been suggested that lithium exerts an effect on signal transduction pathways, such as the cyclic adenosine monophosphate (camp) pathway. association studies in patients with bipolar disorders revealed that creb1-1h snp (g/a change at 2q32.3-q34) and creb1-7h (t/c change) may be associated with bipolar disorder and lithium response. 163 dna oligonucleotide microarrays have been used to evaluate gene expression in the substantia nigra of patients with parkinson's disease (pd). sporadic pd is characterized by progressive death of dopaminergic neurons within the substantia nigra, where cell death is not uniform. the lateral tier of the substantia nigra (snl) degenerates earlier and more severely than the more medial nigral component (snm). genes expressed more highly in the pd snl included the cell death gene, p53 effector related to pmp22, the tnfr gene, tnfr superfamily, member 21, and the mitochondrial complex i gene, nadh dehydrogenase (ubiquinone) 1-beta subcomplex, 3, 12 kda (ndufβ3). genes that were more highly expressed in pd snm included the dopamine cell signaling gene, cyclic adenosine monophosphate-regulated phosphoprotein, 21 kda, the activated macrophage gene, stabilin 1, and two glutathione peroxidase (gpx) genes, gpx1 and gpx3. 
this gene expression profile reveals that there is increased expression of genes encoding pro-inflammatory cytokines and subunits of the mitochondrial electron transport chain in glial cells, and that there is a decreased expression of several glutathione-related genes in the snl, suggesting a molecular basis for pathoclisis. 164 these findings may help to open new therapeutic avenues in pd, where glial cells might represent potential targets to halt disease progression. pharmacological inhibition of cyclin-dependent kinase 5 (cdk5) protects neurons under distinct stressful conditions. in ad and amyotrophic lateral sclerosis, deregulation of cdk5 causes hyperphosphorylation of tau and neurofilament proteins, respectively, leading to neuronal cell death. by two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionisation-time of flight (maldi-tof) mass spectrometry, several phosphoproteins that are modulated by cdk5 inhibitors have been identified. these phosphoproteins include syndapin i, which is involved in vesicle recycling, and dynein light intermediate chain 2, which represents a regulatory subunit of the dynein protein complex, confirming the role of cdk5 in synaptic signaling and axonal transport. other phosphoproteins detected are cofilin and collapsin response mediator protein, involved in neuronal survival and/or neurite outgrowth. selective cdk5 inhibitors can also block mitochondrial translocation of pro-apoptotic cofilin. phosphoproteome and transcriptome analyses of neurons indicate that cdk5 inhibitors promote both neuronal survival and neurite outgrowth. 165 these compounds might represent novel therapeutic alternatives in neurodegenerative disorders.
despite the promising results obtained with structural and functional genomic procedures to identify associations with disease pathogenesis and potential drug targets in cns disorders, it must be kept in mind that allelic mrna expression is affected by genetic and epigenetic events, both with the potential to modulate neurotransmitter tone in the cns. 166 epigenetics is the study of how the environment can affect the genome of the individual during its development as well as the development of its descendants, all without changing the dna sequence, but inducing modifications in gene expression through dna methylation-demethylation or through modification of histones by processes of methylation, deacetylation, and phosphorylation. 167 cumulative experiences throughout life history interact with genetic predispositions to shape the individual's behaviour. 167 epigenetic phenomena cannot be neglected in the pathogenesis and pharmacogenomics of cns disorders. studies in cancer research have demonstrated the antineoplastic effects of the dna methylation inhibitor hydralazine and of the histone deacetylase inhibitor valproic acid, the latter in current use in epilepsy. 168 novel effects of some pleiotropic drugs with activity on the cns have to be explored to understand their mechanisms of action in full and to adjust their dosages for new indications. both hyper- and hypo-dna methylation changes of the regulatory regions play critical roles in defining the altered functionality of genes (mb-comt, maoa, dat1, th, drd1, drd2, reln, bdnf) in major psychiatric disorders, such as schizophrenia and bipolar disorder. 169 this complexity requires a multifactorial approach to overcome the hurdles that cns drug development faces at the present time. 170 polymorphic variants in the apoe gene (19q13.2) are associated with risk (apoe-4 allele) or protection (apoe-2 allele) for ad.
8, [18] [19] [20] for many years, alterations in apoe and defects in the apoe gene have been associated with dysfunctions in lipid metabolism, cardiovascular disease, and atherosclerosis. during the past 25 years a large number of studies have clearly documented the role of apoe-4 as a risk factor for ad, and the accumulation of the apoe-4 allele has been reported as a risk factor for other forms of dementia and cns disorders. 8, [18] [19] [20] apoe-4 may influence ad pathology by interacting with app metabolism and abp accumulation, enhancing hyperphosphorylation of tau protein and nft formation, reducing choline acetyltransferase activity, increasing oxidative processes, modifying inflammation-related neuroimmunotrophic activity and glial activation, altering lipid metabolism, lipid transport and membrane biosynthesis in sprouting and synaptic remodelling, and inducing neuronal apoptosis. 8, [18] [19] [20] different apoe genotypes confer specific phenotypic profiles to ad patients. some of these profiles may add risk or benefit when the patients are treated with conventional drugs, and in many instances the clinical phenotype demands the administration of additional drugs which increase the complexity of therapeutic protocols.
from studies designed to define apoe-related ad phenotypes, 8, [16] [17] [18] [19] [20] [21] [22] 62, 63, [74] [75] [76] [77] 109, 110 several confirmed conclusions can be drawn: (i) the age-at-onset is 5-10 years earlier in approximately 80% of ad cases harbouring the apoe-4/4 genotype; (ii) the serum levels of apoe are the lowest in apoe-4/4, intermediate in apoe-3/3 and apoe-3/4, and highest in apoe-2/3 and apoe-2/4; (iii) serum cholesterol levels are higher in apoe-4/4 than in the other genotypes; (iv) hdl-cholesterol levels tend to be lower in apoe-3 homozygotes than in apoe-4 allele carriers; (v) ldl-cholesterol levels are systematically higher in apoe-4/4 than in any other genotype; (vi) triglyceride levels are significantly lower in apoe-4/4; (vii) nitric oxide levels are slightly lower in apoe-4/4; (viii) serum abp levels do not differ between apoe-4/4 and the other most frequent genotypes (apoe-3/3, apoe-3/4); (ix) blood histamine levels are dramatically reduced in apoe-4/4 as compared with the other genotypes; (x) brain atrophy is markedly increased in apoe-4/4 > apoe-3/4 > apoe-3/3; (xi) brain mapping activity shows a significant increase in slow wave activity in apoe-4/4 from early stages of the disease (fig. 40.4); (xii) brain hemodynamics, as reflected by reduced brain blood flow velocity and increased pulsatility and resistance indices, is significantly worse in apoe-4/4 (and in apoe-4 carriers, in general, as compared with apoe-3 carriers); (xiii) lymphocyte apoptosis is markedly enhanced in apoe-4 carriers; (xiv) cognitive deterioration is faster in apoe-4/4 patients than in carriers of any other apoe genotype; (xv) occasionally, in approximately 3-8% of the ad cases, some dementia-related metabolic dysfunctions (e.g., iron, folic acid, vitamin b12 deficiencies) accumulate in apoe-4 carriers more than in apoe-3 carriers; (xvi) some behavioral disturbances (bizarre behaviors, psychotic symptoms), alterations in circadian rhythm patterns (e.g., sleep disorders), and mood disorders (anxiety, depression) are slightly more frequent in apoe-4 carriers; (xvii) aortic and systemic atherosclerosis is also more frequent in apoe-4 carriers; (xviii) liver metabolism and transaminase activity also differ in apoe-4/4 with respect to other genotypes; (xix) blood pressure (hypertension) and other cardiovascular risk factors also accumulate in apoe-4; and (xx) apoe-4/4 carriers are the poorest responders to conventional drugs (fig. 40.11). these 20 major phenotypic features clearly illustrate the biological disadvantage of apoe-4 homozygotes and the potential consequences that these patients may experience when they receive pharmacological treatment. 2, 4, 8, [16] [17] [18] [19] [20] [21] [22] 62, 63, [74] [75] [76] [77] 109, 110
fig. 40.11 apoe-related cognitive performance in patients with alzheimer's disease treated with a combination therapy for 1 year (adapted from r. cacabelos 77, 109). patients received a combination therapy for 1 year, and cognitive function (mmse score) was assessed at baseline (b) and after 1, 3, 6, 9, and 12 months of treatment.
several studies indicate that the presence of the apoe-4 allele differentially affects the quality and magnitude of drug responsiveness in ad patients treated with cholinergic enhancers, neuroprotective compounds or combination therapies; however, controversial results are frequently found due to methodological problems, study design, and patient recruitment in clinical trials. from these studies we can conclude the following: (i) multifactorial treatments combining neuroprotectants, endogenous nucleotides, nootropic agents, vasoactive substances, cholinesterase inhibitors, and nmda antagonists associated with metabolic supplementation on an individual basis adapted to the phenotype of the patient may be useful to improve cognition and slow down disease progression in ad. (ii) in our personal experience the best results have been obtained combining (a) cdp-choline with piracetam and metabolic supplementation, (b) cdp-choline with piracetam and anapsos, (c) cdp-choline with piracetam and cholinesterase inhibitors (donepezil, rivastigmine), (d) cdp-choline with memantine, and (e) cdp-choline, piracetam and nicergoline. (iii) some of these combination therapies have proven to be effective, improving cognition during the first 9 months of treatment, and not showing apparent side-effects. (iv) the therapeutic response in ad seems to be genotype-specific under different pharmacogenomic conditions. (v) in monogenic-related studies, patients with the apoe-2/3 and apoe-3/4 genotypes are the best responders, and apoe-4/4 carriers are the worst responders (fig. 40.11). (vi) ps1- and ps2-related genotypes do not appear to influence the therapeutic response in ad as independent genomic entities; however, app, ps1, and ps2 mutations may drastically modify the therapeutic response to conventional drugs. (vii) in trigenic-related studies the best responders are those patients carrying the 331222-, 341122-, 341222-, and 441112-genomic clusters.
(viii) a genetic defect in exon 5 of the ps2 gene seems to exert a negative effect on cognition, conferring on ps2+ carriers in trigenic clusters the condition of poor responders to combination therapy. (ix) the worst responders in all genomic clusters are patients with the 441122+ genotype. (x) the apoe-4/4 genotype seems to accelerate neurodegeneration, anticipating the onset of the disease by 5-10 years; and, in general, apoe-4/4 carriers show a faster disease progression and a poorer therapeutic response to all available treatments than any other polymorphic variant. (xi) pharmacogenomic studies using trigenic, tetragenic or polygenic clusters as a harmonization procedure to reduce genomic heterogeneity are very useful to widen the therapeutic scope of limited pharmacological resources. [4] [5] [6] [16] [17] [18] [19] [20] [21] [22] 62, 63, [74] [75] [76] [77] 109, 110 apoe influences liver function and cyp2d6-related enzymes, probably via regulation of hepatic lipid metabolism. 20, 42, [74] [75] [76] [77] it has been observed that apoe may influence liver function and drug metabolism by modifying hepatic steatosis and transaminase activity. there is a clear correlation between apoe-related tg levels and got, gpt, and ggt activities in ad. 20, [74] [75] [76] [77] 171 both plasma tg levels and transaminase activity are significantly lower in ad patients harbouring the apoe-4/4 genotype, probably indicating (a) that low tg levels protect against liver steatosis, and (b) that the presence of the apoe-4 allele influences tg levels, liver steatosis, and transaminase activity. consequently, it is very likely that apoe influences drug metabolism in the liver through different mechanisms, including interactions with enzymes such as transaminases and/or cytochrome p450-related enzymes encoded in genes of the cyp superfamily.
20, [74] [75] [76] [77] 109, 171 when apoe and cyp2d6 genotypes are integrated in bigenic clusters and the apoe + cyp2d6-related therapeutic response to a combination therapy is analyzed in ad patients after 1 year of treatment, it becomes clear that the presence of the apoe-4/4 genotype is able to convert pure cyp2d6*1/*1 ems into full pms (fig. 40.12), indicating the existence of a powerful influence of the apoe-4 homozygous genotype on the drug metabolizing capacity of pure cyp2d6-ems. 20, 74, 75, 109 behavioral disturbances and mood disorders are intrinsic components of dementia associated with memory disorders. 60, [172] [173] [174] the appearance of anxiety, depression, psychotic symptoms, verbal and physical aggressiveness, agitation, wandering and sleep disorders complicates the clinical picture of dementia and adds important problems to the therapeutics of ad and to the daily management of patients. under these conditions, psychotropic drugs (antidepressants, anxiolytics, hypnotics, and neuroleptics) are required, and most of these substances contribute to the deterioration of cognition and psychomotor functions. apoe-related polymorphic variants have been associated with mood disorders 175, 176 and panic disorder. 177 gender, age, dementia severity, apoe-4, and general medical health appear to influence the occurrence of individual neuropsychiatric symptoms in dementia, and medical comorbidity increases the risk of agitation, irritability, disinhibition, and aberrant motor behavior. 178 a positive association between apoe-4 and neuropsychiatric symptoms 179 and depressive symptoms in ad has been reported, 180 especially in women. 181 in other studies, no association of apoe-4 with behavioral dyscontrol (euphoria, disinhibition, aberrant motor behavior, and sleep and appetite disturbances), psychosis (delusions and hallucinations), mood (depression, anxiety, and apathy), and agitation (aggression and irritability) could be found.
182 some authors did not find an association of apoe-4 with major depression in ad 183, 184 or in patients with major depression in a community of older adults, 185 but an apparent protective effect of apoe-2 on depressive symptoms was detected. 186 others, in contrast, found that apoe-4 was associated with an earlier age-of-onset, but not cognitive functioning, in late-life depression. 187 apoe−/− mice without human apoe or with apoe-4, but not apoe-3, show increased measures of anxiety. 188 differences in anxiety-related behavior have been observed between apoe-deficient c57bl/6 and wild-type c57bl/6 mice, suggesting that apoe variants may affect emotional state. 189 histamine h3 autoreceptor antagonists increase anxiety measures in wild-type, but not apoe−/−, mice, and apoe-deficient mice show higher sensitivity to the anxiety-reducing effects of the h1 receptor antagonist mepyramine than wild-type mice, suggesting a role of h3 autoreceptor-mediated signaling in anxiety-like symptoms in this ad-related animal model. 190 in humans, apoe-4 carriers with deep white matter hyperintensities in mri show association with depressive symptoms and vascular depression. 191 reduced caudate nucleus volumes and genetic determinants of homocysteine metabolism accumulate in patients with psychomotor slowing and cognitive deficits, 192 and older depressed subjects have persisting cognitive impairments associated with hippocampal volume reduction. 193, 194 depressive symptoms are also associated with stroke and atherogenic lipid profile. 195 some multifactorial treatments addressing neuroprotection have been shown to be effective in reducing anxiety progressively from the first to the twelfth month of treatment.
109 the anxiety rate declined from a baseline hrs-a score of 10.90 ± 5.69 to 9.07 ± 4.03 (p < 0.0000000001) at 1 month, 9.01 ± 4.38 (p < 0.000006) at 3 months, 8.90 ± 4.47 (p < 0.005) at 6 months, 7.98 ± 3.72 (p < 0.00002) at 9 months, and 8.56 ± 4.72 (p < 0.01) at 12 months of treatment (r = −0.82, a coef.: 10.57, b coef.: −0.43). 109 similarly striking results were found in depression, suggesting that improvement in mood conditions can contribute to stabilize cognitive function or that neuroprotection (with the consequent stabilization or improvement in mental performance) can enhance emotional equilibrium. 20, 74, 75, 109 at baseline, all apoe variants showed similar anxiety and depression rates, except the apoe-4/4 carriers, who showed significantly lower rates of anxiety and depression (figs. 40.13 and 40.14). remarkable changes in anxiety were found among different apoe genotypes (fig. 40.13). practically all apoe variants responded with a significant diminution of anxiogenic symptoms, except patients with the apoe-4/4 genotype, who only showed a slight improvement. the best responders were apoe-2/4 > apoe-2/3 > apoe-3/3 > apoe-3/4 carriers (fig. 40.13). the modest anxiolytic effect seen in apoe-4/4 patients might be due to the very low anxiety rate observed at baseline. concerning depression, all apoe genotypes improved their depressive symptoms with treatment except those with the apoe-4/4 genotype, who worsened over the treatment period, especially after 9 months (fig. 40.14). the best responders were patients with apoe-2/4 > apoe-2/3 > apoe-3/3 > apoe-3/4, and the worst responders were patients harbouring the apoe-4/4 genotype 20, 74, 75, 109 (fig. 40.14).
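the regression coefficients quoted above for the anxiety trend (r = −0.82, a ≈ 10.57, b = −0.43) can be checked from the six reported mean hrs-a scores. the sketch below assumes that the fit was made on the group means against the visit index (1 to 6) rather than against elapsed months; this is an assumption, not something the source states, although it does reproduce the reported values closely.

```python
from math import sqrt

# Mean HRS-A scores reported in the text at baseline and 1, 3, 6, 9, 12 months.
mean_scores = [10.90, 9.07, 9.01, 8.90, 7.98, 8.56]
# Assumption: the regressor is the visit index (1..6), not elapsed months.
visits = list(range(1, len(mean_scores) + 1))


def linear_fit(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


a, b, r = linear_fit(visits, mean_scores)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.2f}")
```

regressing on elapsed months (0, 1, 3, 6, 9, 12) instead gives a much shallower slope (about −0.15), which is why the visit index is assumed here.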
the optimization of cns therapeutics requires the establishment of new postulates regarding (a) the costs of medicines, (b) the assessment of protocols for multifactorial treatment in chronic disorders, (c) the implementation of novel therapeutics addressing causative factors, and (d) the setting-up of pharmacogenetic/pharmacogenomic strategies for drug development. 20, [74] [75] [76] [77] 109 the cost of medicines is a very important issue in many countries because of (i) the growth of the aging population (>5% disability), (ii) the fact that neuropsychiatric and demented patients (>5% of the population) belong to an unproductive sector with low income, and (iii) the high cost of health care systems and new health technologies in developed countries. despite the effort of the pharmaceutical industry to demonstrate the benefits and cost-effectiveness of available drugs, the general impression in the medical community and in some governments is that some psychotropics and most anti-dementia drugs present in the market are not cost-effective. 20, [74] [75] [76] [77] 109 conventional drugs for neuropsychiatric disorders are relatively simple compounds with unreasonable prices. some new products are not superior to conventional antidepressants, neuroleptics, and anxiolytics. there is an urgent need to assess the costs of new trials with pharmacogenetic and pharmacogenomic strategies, and to implement pharmacogenetic procedures to predict drug-related adverse events. 20, 74, 75, 109 cost-effectiveness analysis has been the most commonly applied framework for evaluating pharmacogenetics. pharmacogenetic testing is potentially relevant to large populations that incur high costs. for instance, the drugs most commonly metabolized by cyp2d6 account for 189 million prescriptions and us$12.8 billion annually in expenditures in the us, which represent 5-10% of total utilization and expenditures for outpatient prescription drugs.
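the cyp2d6 figures above can be turned into a rough per-prescription cost and an implied size for the whole outpatient market. this is a back-of-the-envelope sketch using only the numbers quoted in the text; it assumes that the 5-10% share applies to expenditures as stated, and the variable names are illustrative.

```python
# Figures quoted in the text for drugs commonly metabolized by CYP2D6 (US, annual).
prescriptions = 189e6        # number of prescriptions
expenditure_usd = 12.8e9     # expenditures in US dollars
share_low, share_high = 0.05, 0.10  # stated share of outpatient totals

# Average expenditure per CYP2D6-metabolized prescription.
cost_per_prescription = expenditure_usd / prescriptions

# Total outpatient drug spending implied if 12.8 billion is 10% or 5% of it.
implied_total_min = expenditure_usd / share_high
implied_total_max = expenditure_usd / share_low

print(f"average cost per prescription: ${cost_per_prescription:.2f}")
print(f"implied outpatient total: ${implied_total_min / 1e9:.0f}-"
      f"{implied_total_max / 1e9:.0f} billion")
```

under these assumptions the average comes to roughly us$68 per prescription, and the implied outpatient total lies between about us$128 and us$256 billion per year.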
196 pharmacogenomics offers great potential to improve patients' health in a cost-effective manner; however, pharmacogenetics/pharmacogenomics will not be applied to all drugs available in the market, and careful evaluations should be done on a case-by-case basis prior to investing resources in r&d of pharmacogenomic-based therapeutics and making reimbursement decisions. 197 in performing pharmacogenomic studies in cns disorders, it is necessary to rethink the therapeutic expectations of novel drugs, redesign the protocols for drug clinical trials, and incorporate biological markers as assessable parameters of efficacy and prevention. in addition to the characterization of genomic profiles, phenotypic profiling of responders and non-responders to conventional drugs is also important (and currently neglected). brain imaging techniques, computerized electrophysiology, and optical topography in combination with genotyping of polygenic clusters can help in the differentiation of responders and non-responders. the early identification of predictive risks requires genomic screening and molecular diagnosis, and individualized preventive programs will only be achieved when pharmacogenomic/pharmacogenetic protocols are incorporated into the clinical armamentarium with powerful bioinformatics support. [18] [19] [20] 74, 75, 109 an important issue in ad therapeutics is that antidementia drugs should be effective in covering the clinical spectrum of dementia symptoms represented by memory deficits, behavioural changes, and functional decline. it is difficult (or impossible) for a single drug to fulfil these criteria.
a potential solution to this problem is the implementation of cost-effective, multifactorial (combination) treatments integrating several drugs, taking into consideration that traditional neuroleptics and novel antipsychotics (and many other psychotropics) deteriorate both cognitive and psychomotor functions in the elderly and may also increase the risk of stroke. 198 few studies with combination treatments have been reported and most of them are poorly designed. we also have to realize that the vast majority of dementia cases in people older than 75-80 years are of a mixed type, in which the cerebrovascular component associated with neurodegeneration cannot be therapeutically neglected. in most cases of dementia, the multifactorial (combination) therapy appears to be the most effective strategy. [18] [19] [20] [74] [75] [76] [77] 109 the combination of several drugs (neuroprotectants, vasoactive substances, acheis, metabolic supplementation) increases the direct costs (e.g., medication) by 5-10%, but in turn, annual global costs are reduced by approximately 18-20% and the average survival rate increases about 30% (from 8 to 12 years post-diagnosis). there are major concerns regarding the validity of clinical trials in patients with severe dementia. despite the questionable experience with memantine, 199 similar strategies have been used to demonstrate the utility of donepezil in severe ad. 200 studies of this kind have some important pitfalls, including (a) short duration (<1 year), (b) institutionalized patients, (c) patients receiving many different types of drugs, (d) non-evaluated drug-drug interactions, (e) side-effects (e.g., hallucinations, gastrointestinal disorders) that may require the administration of additional medication, (f) lack of biological parameters demonstrating actual benefits, and (g) no cost-effectiveness assessment, among many other possible technical criticisms.
[18] [19] [20] 109, 201 some of these methodological (and costly) problems might be overcome with the introduction of pharmacogenetic/pharmacogenomic strategies to identify good responders who might obtain some benefit by taking expensive medications. major impact factors associated with drug efficacy and safety include the following: (i) the mechanisms of action of drugs, (ii) drug-specific adverse reactions, (iii) drug-drug interactions, (iv) nutritional factors, (v) vascular factors, (vi) social factors, and (vii) genomic factors (nutrigenetics, nutrigenomics, pharmacogenetics, pharmacogenomics). among genomic factors, nutrigenetics/nutrigenomics and pharmacogenetics/pharmacogenomics account for more than 80% of efficacy-safety outcomes in current therapeutics. [18] [19] [20] 74, 75, 77, 109 some authors consider that priority areas for pharmacogenetic research are to predict serious adverse reactions (adrs) and to establish variation in efficacy. 202 both requirements are necessary in cns disorders to cope with efficacy and safety issues associated with both current cns drugs and new drugs. 121, 138 since drug response is a complex trait, genome-wide approaches (oligonucleotide microarrays, proteomic profiling) may provide new insights into drug metabolism and drug response. genome-wide family-based association studies, using single snps or haplotypes, can identify associations with genome-wide significance.
203, 204 to achieve a mature discipline of pharmacogenetics and pharmacogenomics in cns disorders and dementia it would be convenient to accelerate the following processes: (a) educate physicians and the public on the use of genetic/genomic screening in daily clinical practice; (b) standardize genetic testing for major categories of drugs; (c) validate pharmacogenetic and pharmacogenomic procedures according to drug category and pathology; (d) regulate ethical, social, and economic issues; and (e) incorporate pharmacogenetic and pharmacogenomic procedures into both drugs in development and drugs in the market to optimize therapeutics. [18] [19] [20] [21] [22] [74] [75] [76] [77] 109, 205
costs of disorders of the brain in europe. executive summary a conceptual introduction to geriatric neuroscience the clinical and cost-effectiveness of donepezil, rivastigmine, galantamine and memantine for alzheimer's disease pharmacological treatment of alzheimer disease: from psychotropic drugs and cholinesterase inhibitors to pharmacogenomics pharmacogenomics in alzheimer's disease pharmacogenomics for the treatment of dementia pharmacogenetics and drug development: the path to safer and more effective drugs molecular genetics of alzheimer's disease and aging molecular genetics of bipolar disorder and depression confounding in genetic association studies and its solutions molecular genetics of schizophrenia: a critical review lifespan and mitochondrial control of neurodegeneration high aggregate burden of somatic mtdna point mutations in aging and alzheimer's disease brain the application of functional genomics to alzheimer's disease pharmacogenomics and therapeutic prospects in alzheimer's disease pharmacogenomics, nutrigenomics and therapeutic optimization in alzheimer's disease pharmacogenomics, nutrigenomics and future therapeutics in alzheimer's disease pharmacogenomics in alzheimer's disease cerebrovascular risk factors in alzheimer's disease: brain hemodynamics and
pharmacogenomic implications phenotypic profi les and functional genomics in dementia with a vascular component high throughput protein expression screening in the nervous system -needs and limitations gene expression atlas of the mouse central nervous system: impact and interactions of age, energy intake and gender subtelomeric study of 132 patients with mental retardation reveals 9 chromosomal anomalies and contributes to the delineation of submicroscopic deletions of 1pter, 2qter, 4pter, 5qter and 9qter dna fragmentation is increased in non-gabaergic neurons in bipolar disorder but not in schizophrenia the functional genome of ca1 and ca3 neurons under native conditions and in response to ischemia fibroblast and lymphoblast gene expression profi les in schizophrenia : are non-neural cells informative ? runs of homozygosity reveal highly penetrant recessive loci in schizophrenia the use of microarrays to characterize neuropsychiatric disorders: post-mortem studies of substance abuse and schizophrenia high-throughput analysis of promoter occupancy reveals direct neuronal targets of foxp2, a gene mutated in speech and language disorders microarray analysis of oxidative stress regulated genes in mesencephalic dopaminergic neuronal cells: relevance to oxidative damage in parkinson's disease towards a pathway defi nition of parkinson's disease: a complex disorder with links to cancer, diabetes and infl ammation analysis of potential transcriptomic biomarkers for huntington's disease in peripheral blood comprehensive transcriptional profi ling of prion infection in mouse models reveals networks of responsive genes transcriptional profi ling in the human prefrontal cortex : evidence for two activational states with cocaine abuse gene expression profi ling in the brains of human cocaine abusers ethanol and brain damage genomic responses in rat cerebral cortex after traumatic injury transcriptional profi ling in human epilepsy: expression array and single cell real-time 
qrt-pcr analysis reveal distinct cellular gene regulation molecular profi ling of temporal lobe epilepsy: comparison of data from human tissue and animal models the antiepileptic drug levetiracetam selectively modifi es kindling-induced alterations in gene expression in the temporal lobe of rats increased apoptosis, p53 up-regulation, and cerebellar neuronal degeneration in repair-defi cient cockayne syndrome mice inhibitors of differentiation (id1, id2, id3 and id4) genes are neuronal targets of mecp2 that are elevated in rett síndrome hdac inhibitors correct frataxin defi ciency in a friedreich ataxia mouse model gene expression profi ling in a mouse model of infantile neuronal lipofuscinosis reveals upregulation of immediate early genes and mediators of the infl ammatory response multiple sclerosis as a generalized cns disease -comparative microarray analysis of normal appearing white matter and lesions in secondary progressive ms pathways and genes differentially expressed in the motor cortex of patients with sporadic amyotrophic lateral sclerosis gene expression in cortex and hippocampus during acute pneumococcal meningitis role of lipids in brain injury and diseases expression profi le analysis of neurodegenerative disease: advances in specifi city and resolution methodological considerations regarding single-cell gene expression profi ling for brain injury genotoxicants target distinct molecular networks in neonatal neurons functional gene expression differences between inbred alcohol-preferring and -non-prerats in fi ve brain regions neuroadaptations in human chronic alcoholics: tion of the nf-κb system gene expression profi le of the nucleus accumbens of human cocaine abusers : evidence for dysregulation of myelin transcriptional changes common to human cocaine, cannabis and phencyclidine abuse microrna expression in the adult mouse central nervous system a pharmacogenomic approach to alzheimer's disease clinical psychiatry. 
jobe th genomics and phenotypic profi les in dementia: implications for pharmacological treatment a functional genomics approach to the analysis of biological markers in alzheimer disease genomic characterization of alzheimer's disease and genotype-related phenotypic analysis of biological markers in dementia the histamine-cytokine network in alzheimer disease: etiopathogenic and pharmacogenomic implications histamine function in brain disorders histamine in alzheimer's disease pathogenesis: biochemistry and functional genomics characterization of cytokine production, screening of lymphocyte subset patterns and in vitro apoptosis in healthy and alzheimer's disease individuals international human genome sequencing consortium. finishing the euchromatic sequence of the human genome implications of the human genome for understanding human biology and medicine pharmacogenetics and pharmacogenomics. in: emery and rimoin's principles and practice of medical genetics. 4th edn pharmacogenomics: the inherited basis for interindividual differences in drug response pharmacogenetics and pharmacogenomics: development, science, and translation pharmacogenomics and therapeutic prospect in dementia pharmacogenetic basis for therapeutic optimization in alzheimer's disease infl uence of pharmacogenetic factors on alzheimer's disease therapeutics pharmacogenetic aspects of therapy with cholinesterase inhibitors: the role of cyp2d6 in alzheimer's disease pharmacogenetics from pharmacogenetics and ecogenetics to pharmacogenomics inheritance and drug response pharmacogenomics-drug disposition, drug targets, and side effects national estimates of medication use in nursing homes medicare current benefi ciary survey and the 1996 medical expenditure survey antidepressant drugs prescribing among elderly subjects: a population-based study potentially inappropriate medication use among elderly home care patients in europe potentially inappropriate medication use by elderly persons in u.s. 
health maintenance organizations the metabolic & molecular bases of inherited disease catalog of 680 variants among eight cytochrome p450 (cyp) genes: nine esterase genes, and two other genes in the japanese population dna sequence variations in a 3.7-kb noncoding sequence 5-prime of the cyp1a2 gene: implications for human population history and natural selection pm frequencies of major cyps in asians and caucasians polymorphisms of drug-metabolizing enzymes cyp2c9, cyp2c19, cyp2d6, cyp1a1, nat2 and of p-glycoprotein in a russian population cyp2c9 allelic variants: ethnic distribution and functional signifi cance identifi cation and functional characterization of a new cyp2c9 variant (cyp2c9 * 5 1 ) expressed among african americans molecular basis of ethnic differences in drug disposition and response isolation, sequence and genotyping of the drug metabolizer cyp2d6 gene in the colombian population effects of prototypical microsomal enzyme inducers on cytochrome p450 expression in cultured human hepatocytes cytochrome p450 in the brain: a review the expression of cyp2b6, cyp2c9 and cyp2a4 genes: a tangle of networks of nuclear and steroid receptors regulation of cytochrome p450 (cyp) genes by nuclear receptors the cyp2c19 enzyme polymorphism cytochrome p450 2d6 variants in a caucasian population: allele frequencies and phenotypic consequences clinically signifi cant drug interactions with cholinesterase inhibitors: a guide for neurologists assessment of the predictive power of genotypes for the in-vivo catalytic function of cyp2d6 in a german population ten percent of north spanish individuals carry duplicated or triplicated cyp2d6 genes associated with ultrarapid metabolism of debrisoquine clinical pharmacokinetics of galantamine impact of the cyp2d6 polymorphism on steady-state plasma concentrations and clinical outcome of donepezil in alzheimer's disease patients molecular pathology and pharmacogenomics in alzheimer's disease: polygenic-related effects of 
multifactorial treatments on cognition, anxiety, and depression pharmacogenomic studies with a combination therapy in alzheimer's disease interethnic differences in genetic polymorphisms of cyp2d6 in the u.s. population: clinical implications donepezil use in alzheimer disease effects of cholinergic markers in rat brain and blood after short and prolonged administration of donepezil the o-demethylation of the antidementia drug galantamine is catalyzed by cytochrome p450 2d6 cholinesterase inhibitors in the treatment of alzheimer's disease: a comparison of tolerability and pharmacology galantamine pharmacokinetics, safety, and tolerability profi les are similar in healthy caucasian and japanese subjects pharmacokinetics and drug interactions of cholinesterase inhibitors administered in alzheimer's disease the metabolism and excretion of galantamine in rats, dogs, and humans prevalence of potentially inappropriate medication use in elderly patients. comparison between general medicine and geriatric wards phargkb update: ii. cyp3a5, cytochrome p450, family 3, subfamily a, polypeptide 5 genomics and the future of pharmacotherapy in psychiatry cyp2d6 polymorphism: implications for antipsychotic drug response, schizophrenia and personality traits cytochrome p450 polymorphisms and response to antipsychotic therapy infl uence of cytochrome p450 polymorphisms on drug therapies: pharmacogenetic, pharmacoepigenetic and clinical aspects pharmaco-genomics handbook. 2nd edn. lexi-comp drug information handbook with international trade names index. 17th edn. lexi-comp drug information handbook for psychiatry. 6th edn. 
lexi-comp pharmacogenomics in schizophrenia: the quest for individualized therapy the role of 5-ht2c receptor polymorphisms in the pharmacogenetics of antipsychotic drug treatment dopamine d4 receptor gene exon iii polymorphism and interindividual variation in response to clozapine association between multidrug resistance 1 (mdr1) gene polymorphism and therapeutic response to bromperidol in schizophrenic patients: a preliminary study genetic susceptibility to tardive dyskinesia among schizophrenia subjects: iv. role of dopaminergic pathway gene polymorphisms drd2 promoter region variation as a predictor of sustained response to antipsychotic medication in fi rst-episode schizophrenic patients the relationship between p-glycoprotein (pgp) polymorphisms and response to olanzapine treatment in schizophrenia polymorphisms of the abcb1 gene are associated with the therapeutic response to risperidone in chinese schizophrenia patients systematic investigation of genetic variability in 111 human genes -implications for studying variable drug response pharmacogenetics of psychotropic drug response individualizing antipsychotic drug therapy in schizophrenia: the promise of pharmacogenetics pharmacogenetics and schizophrenia pharmacogenetics and pharmacogenomics of schizophrenia: a review of last decade of research phosphodiesterase genes are associated with susceptibility to major depression and antidepressant treatment response variation in the purinergic p2rx(7) receptor gene and schizophrenia the cnr1 gene as a pharmacogenetic factor for antipsychotics rather than a susceptibility gene for schizophrenia druginduced activation of srep-controlled lipogenic gene expression in cns-related cell lines: marked differences between various antipsychotic drugs further evidence for association of the rgs2 gene with antipsychoticinduced parkinsonism: protective role of a functional polymorphism in the 3′-untranslated region ethnic stratifi cation of the association of rgs4 variants 
with antipsychotic treatment response in schizophrenia pharmacogenetic assessment of antipsychotic-induced movement disorders: contribution of the dopamine d3 receptor and cytochrome p450 1a2 genes association between cyp2d6 genotype and tardive dyskinesia in korean schizophrenics haplotype analysis of endothelial nitric oxide synthase (nos3) genetic variants and tardive dyskinesia in patients with schizophrenia could htr2a t102c and drd3 ser9gly predict clinical improvement in patients with acutely exacerbated schizophrenia? results from treatment responses to risperidone in a naturalistic setting the -1019 c/g polymorphism of the 5-ht1a receptor gene is associated with negative symptom response to risperidone treatment in schizophrenia patients interaction between notch4 and catechol-o-methyltransferase genotypes in schizophrenia patients with poor response to typical neuroleptics effect of a ciliary neurotrophic factor polymorphism on schizophrenia symptom improvement in an iloperidone clinical trial association of the dopamine receptor interacting protein gene, nef3, with early response to antipsychotic medication pharmacogenetic study of atypical antipsychotic drug response : involvement of the norepinephrine transporter gene association of clozapineinduced weight gain with a polymorphism in the leptin promoter region in patients with chronic schizophrenia in a chinese population leptin and leptin receptor gene polymorphisms and increases in body mass index (bmi) from olanzapine treatment in persons with schizophrenia polymorphisms of the 5-ht2c receptor and leptin genes are associated with antipsychotic drug-induced weight gain in caucasian subjects with a fi rst-episode psicosis transcriptional profi ling of the rat frontal cortex following administration of the mglu5 antagonists mpep and mtep association of grik4 with outcome of antidepressant treatment in the star * d cohort glycogen synthase kinase-3beta gene is associated with antidepressant treatment 
response in chinese major depressive disorder lithium response and genetic variation in the creb family of genes the medial and lateral substantia nigra in parkinson's disease: mrna profi les associated with higher brain tissue vulnerability phosphoproteome and transcriptome analysis of the neuronal response to a cdk5 inhibitor allelic mrna expression of x-linked monoamine oxidase a (maoa) in human brain : dissection of epigenetic and genetic factors epigenetics and its implications for behavioural neuroendocrinology antineoplastic effects of the dna methylation inhibitor hydralazine and the histone deacetylase inhibitor valproic acid in cancer cell lines epigenetic alterations of the dopaminergic system in major psychiatric disorders central nervous system drug development: an integrative biomarker approach toward individualized medicine pleiotropic effects of apoe in dementia: infl uence on functional genomics and pharmacogenetics. in: advances in alzheimer's and parkinson's disease. insights, progress, and perspectives apoe-related dementia symptoms: frequency and progression apoe-related frequency of cognitive and noncognitive symptoms in dementia behavioral changes associated with different apolipoprotein e genotypes in dementia polymorphisms in the angiotensin-converting enzyme gene are associated with unipolar depression, ace activity and hypercortisolism apolipoprotein e gene polymorphism in early and late onset bipolar patients angiotensinrelated genes in patients with panic disorder risk factors for neuropsychiatric symptoms in dementia : the cache county study apolipoprotein e genotype infl uences presence and severity of delusions and aggressive behavior in alzheimer disease alzheimer genes and proteins, and measures of cognition and depression in older men depression in alzheimer's disease might be associated with apolipoprotein e epsilon 4 allele frequency in women but not in men four components describe behavioral symptoms in 1,120 individuals with 
late-onset alzheimer's disease association analysis of apolipoprotein e genotype and risk of depressive symptoms in alzheimer's disease behavioural pathology in alzheimer's disease with special reference to apolipoprotein e genotype apolipoprotein e genotype and major depression in a community of older adults. the cache county study protective effect of the apoε2 allele in major depressive disorder in taiwanese apoe is associated with age-of-onset, but not cognitive functioning, in late-life depression apoe isoforms and measures of anxiety in probable ad patients and apoe-/-mice differences in anxietyrelated behaviour between apolipoprotein e-defi cient c57bl/6 and wild type c57bl/6 mice role of h3-receptor-mediated signaling in anxiety and cognition in wild-type and apoe-/-mice relationship of deep white matter hyperintensities and apolipoprotein e genotype to depressive symptoms in older adults without clinical depression caudate nucleus volumes and genetic determinants of homocysteine metabolism in the prediction of psychomotor speed in older persons with depression a longitudinal study of hippocampal volume, cortisol levels, and cognition in older depressed subjects reduced hippocapal volumes and memory loss in patients with early-and late-onset depression vascular/risk and late-life depression in a korean community population measuring the value of pharmacogenomics assessing the cost-effectiveness of pharmacogenomics pharmacological treatment of neuropsychiatric symptoms of dementia. 
a review of the evidence memantine in moderate-to-severe alzheimer's disease donepezil in patients with severe alzheimer's disease: double-blind, parallel-group, placebo-controlled study donepezil for severe alzheimer's disease priorities and standards in pharmacogenetic research genomic screening and replication using the same data set in family-based association testing pharmacogenomics and individualized drug therapy ethical considerations in the use of dna for the diagnosis of disease
key: cord-023055-ntbvmssh authors: nan title: immunogenicity date: 2004-02-19 journal: j cell biochem doi: 10.1002/jcb.240410506 sha: doc_id: 23055 cord_uid: ntbvmssh nan
ia molecules with respect to their roles as peptide receptors and target structures for tcr interaction. particular attention has been paid to distinguishing between local and distant effects of amino acid substitutions on ia function and to determining which residues interact with peptide antigen and which (if any) with the tcr. this experimental approach has led to the identification of several regions of the polymorphic amino-terminal domains of the α and β chains as playing critical roles in chain-chain association and quaternary ia conformation. the α1 and β1 putative helical regions have been found to have distinct degrees of structural lability, with the α1 helix showing much greater susceptibility to conformational change due to allelic variation in other regions of the molecule. allelically polymorphic residues in the α1 and β1 domains have been shown to play important roles in the activity of the assembly/folding control regions, and hence, analysis of local binding roles of specific residues in ia molecules must take this additional effect of substitutions at these positions into account. by controlling for large scale conformational effects, individual residues in the β chain have been assigned to desetopic (peptide interaction) and histotopic (tcr interaction) roles.
in the cytochrome c model, a putative peptide binding "pocket" involving residues from both the postulated β1 α-helix and also the β-strand floor has been defined, residues controlling both the extent of binding and the orientation of the bound peptide have been identified, and at least one residue with tcr interaction potential without obvious peptide binding properties has been localized. combining these data with those of other investigators leads us to propose a general model of class ii mhc structure-function relationships.
we have shown previously that memory b cells transferred into k-allotype distinct congenic rats in the absence of any priming antigen are deleted from the adoptive host within a matter of weeks (half-life of 1-2 weeks). in contrast, co-injection of antigen with the cells facilitates their survival and the maintenance of a donor response for periods in excess of one year. in the experiments reported here we ask if the persistence of t cell memory is also dependent on antigen. 10. carrier (klh) primed t cells were transferred in the presence or absence of antigen into irradiated, k-allotype distinct adoptive hosts. at various times after transfer these rats were injected with 2x10' hapten-carrier (dnp-klh) primed b cells together with 50 μg of soluble dnp-klh. this limiting number of b cells makes a secondary type response only if carrier-specific memory t cells survive in the adoptive host. we found that already at 6 weeks following transfer without antigen, no memory t cell help was available for these b cells. in contrast, t cells transferred together with 10 μg klh provided help for secondary type donor responses at 6 and 12 weeks after transfer. we conclude that long-term memory at both the t and b cell levels does not reside with small, very long-lived, resting cells but with active clones that are maintained by small amounts of antigen that may persist for long periods.
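the half-life quoted above lends itself to a quick back-of-the-envelope check. the sketch below assumes simple first-order (exponential) loss of transferred cells, which the abstract does not state; the function name `fraction_remaining` is hypothetical, and the 1-2 week half-life is the one reported for memory b cells, applied to the 6- and 12-week time points purely for illustration.

```python
def fraction_remaining(weeks: float, half_life_weeks: float) -> float:
    """Fraction of transferred cells left after `weeks`.

    Assumption (not from the source): loss follows simple exponential
    decay with a constant half-life.
    """
    if half_life_weeks <= 0:
        raise ValueError("half_life_weeks must be positive")
    return 0.5 ** (weeks / half_life_weeks)


if __name__ == "__main__":
    # Half-life range quoted for memory B cells transferred without antigen.
    for half_life in (1.0, 2.0):
        # Time points at which T cell help was assayed.
        for weeks in (6, 12):
            frac = fraction_remaining(weeks, half_life)
            print(f"half-life {half_life:g} wk, {weeks} wk after transfer: "
                  f"{frac:.4%} remaining")
```

under this assumption, roughly 1.6% (1-week half-life) to 12.5% (2-week half-life) of the transferred cells would remain at 6 weeks.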
once antigen is lost from lymphoid tissues both t and b cell memory wanes within a relatively short time.
t cells recognize antigen in the form of short peptides associated with class i or class ii mhc molecules. each mhc molecule has the ability to bind a large number of peptides, and peptides with unrelated sequences can compete for binding to the same mhc molecule, in vivo as well as in vitro. in vivo competition strictly correlates with the capacity of the competitor peptide to bind to the mhc molecule presenting the antigenic peptide, and its extent depends on the molar ratio between antigen and competitor. competition among different peptides derived by processing of hen egg-white lysozyme (hel) appears to exert a major influence on the immunodominance of antigenic determinants recognized by t cells. thus, the hel peptides 1-18 and 25-43 are both generated by hel processing and are both able to bind to the i-e molecule, but only 1-18 becomes immunodominant because it has the ability to compete in vivo with other hel peptides, such as 25-43, for the available sites on the i-e molecule. however, two immunodominant t cell epitopes, such as those in hel peptides 51-66 and 112-129, both interacting with i-ak molecules, do not compete with each other when injected together at equimolar concentrations. such a coexistence is anticipated between peptides that bind with relatively high affinity to the presenting molecule and thus both have the chance to occupy a number of binding sites sufficient for t cell activation.
r v s e iii xen ic tr lantation. in v i m investigation using monoclonal antibodies r e 4 e -t e x e y skin graft survival on mice was significantly pr:lrd l g anti-antihdy trea-t but not a b anti-antibody: w i d e r the same animals but anti-cd4 antibody did prolong minor a n t w -d i allqrdts. in v i m studies r m that primary proliferation a n f z . 2 prcdwtion & -f i cells in response to monkey stimulators was weak compared to allogeneic responses.
secondary responses to xenogeneic stimulation were strong after in v i m priming but required the presence of responder nc's. assays for c y g t s t cell effectors in mice which had rejected monkey skin revealed few such cells. these results suggest that widely d i a t e xenogeneic processing and presentation, since xenogeneic antigens require that such presentation be in association with the.= antigens of responder apc's, the xenogeneic grafts have a functional similarity to aff2 leted allografts.
showed that fetal renal and fetal and postnatal testis allografts survived longer than corresponding adult tissue in non-immunosuppressed outbred rat hosts. the current study asks whether the difference in survival between renal and testicular grafts and between grafts of different ages is related to differential tissue expression of class i and class ii mrna transcripts or surface antigens, and if these patterns change with transplantation. congeneic mice we found that prolonged survival of c57bl/6 fetal renal (n=42; p<0.008) and fetal (n=14; p<0.05) and postnatal (n=8; p<0.05) testis mouse allografts transplanted beneath the renal capsule of adult recipient b10.a mice and this survival correlates inversely with the expression of class i and class ii mrna (northern analysis) and proteins (immunohistochemistry) and that both protein and mrna increased throughout ontogeny for both the testis and kidney. after transplantation there was a marked induction of mhc mrna transcripts for both testis (n=207) and kidney (n=320). implanted fetal kidney tissue that survives, however,
failed to express detectable mhc protein, indicating that some post-transcriptional modification in this tissue occurs to afford it protection from rejection. implanted testis showed induction of both mrna and protein well above its much lower baseline, indicating that its regulation, in contrast to the kidney, may be transcriptional. thus the fetus may lower the mhc burden as a strategy to escape rejection either by post-transcriptional modification of protein expression as in the kidney or by transcriptional modification of mrna as in the testis.
culture of thymus tissue in 2-deoxyguanosine (2dgua) is thought to reduce tissue immunogenicity by selectively depleting highly immunogenic thymic immigrants of bone marrow origin. in the mouse, 2dgua treated thymus tissue survival is markedly enhanced compared to untreated tissue when transplanted under the kidney capsule of allogeneic recipients. these experiments were repeated in the rat. as expected, strain da neonatal thymus tissue was rejected when transplanted under the kidney capsule of normal allogeneic strain pvg rats. surprisingly, acute rejection occurred even when the tissue was cultured for 14 days in 4 mm 2dgua (3x the effective dose in mice). by in vitro criteria this dose was very effective in destroying thymocytes. to test whether residual marrow derived cells that escaped 2dgua treatment were responsible for inducing rejection we "parked" the 2dgua treated da tissue in t cell depleted pvg rats. our working hypothesis was that the few remaining donor derived cells of marrow origin would be overgrown by host type cells. when 2dgua-treated da thymus tissue was transplanted into t cell depleted pvg recipients graft rejection did not occur.
however, da 2dgua treated thymus tissue, parked for as long as 200 days in t cell depleted pvg rats, was acutely rejected when retransplanted into normal pvg recipients. we interpret these results to suggest that rat thymic epithelium devoid of marrow derived cells is innately immunogenic.
c 104 corinne amiel, violaine gugrin, thierry may, philippe canton, gilbert c faure, laboratoire d'immunologie and maladies infectieuses, chu de nancy, faculté de médecine, 54500 vandoeuvre les nancy, france. lfa1 is a dimeric membrane molecule composed of a specific alpha chain (cd11a) and a beta chain (cd18) common to three members of the lfa family. lfa1 is physiologically expressed on all white blood cells, while other molecules of the lfa family (with cd11b and cd11c alpha chains) are restricted to cells of myeloid lineage. a defective expression of lfa1 has been described in some congenital immune deficiencies and in aids. we investigated the lfa1 defect on peripheral blood lymphocytes from 100 hiv+ patients. three different monoclonal antibodies were used, respectively directed to chain-specific epitopes of cd11a (spvl7, sanbio) and cd18 (iot18, immunotech) and to a conformational epitope involving both chains (iot16, immunotech). cell suspensions were stained in indirect immunofluorescence and a flow cytometer (epics profile, coultronics) was used to assess the percentages of stained cells, the fluorescence intensity and the shape of fluorescent peaks. our data suggest that lfa1 expression is impaired in hiv+ patients both through the quantitative expression of each chain and through conformational alterations.
the adhesion molecule lfa-1 is known to be important in antigen presentation. we have previously shown that both monocyte and t cell lfa-1 play a role in the interaction between these two cells (eji d; 943, 1987).
antibody to icam-1 (known to act as a ligand for lfa-1) also inhibits antigen presentation, although icam-1 is not thought to be expressed on resting t cells (eji 18: 35, 1988). we have looked at the expression of icam-1 on t cells after incubation with 12 cytokines and found that only il-2 consistently effects an increase in both the percentage of icam-1 positive cells and the level of expression. in addition we have found that a proportion of resting t cells express very low levels of icam-1. double labelling experiments have shown that these cells are part of the memory t cell population as defined by antibodies to uchl1, lfa-3 and lfa-1, and furthermore that icam-1 negative cells are unable to respond to antigens such as ppd and flu but are able to respond to pha. this suggests that icam-1 represents an additional marker on the memory t cell population which more precisely defines the subset able to respond to recall antigens. icam-1 expression on t cells, anne-marie buckle and nancy hogg, macrophage lab. icrf, lincolns inn fields, london, wc2a 3px, u.k. immunization, francis r. carbone and michael j. bevan, department of immunology, research institute of scripps clinic, la jolla, ca 92037. ctl recognize peptide forms of processed foreign antigens in association with class i molecules of the mhc and are usually directed against endogenously synthesized "cellular antigens" such as those expressed by virus-infected cells. in vitro studies have shown that small exogenous peptides can directly associate with class i molecules on the cell surface and mimic the target complex derived by intracellular processing and presentation. we have recently generated ova-specific, h-2kb-restricted ctl by immunizing c57bl/6 mice with a syngeneic tumor line transfected with the ova cdna. the ctl recognize the ova transfectant eg7-ova and the synthetic peptide ova but fail to recognize the native protein. 
we reasoned that, given the potential for direct peptide/class i association observed in vitro, ova258-276 may induce ctl after in vivo priming. however, we found that this is not the case. ova258-276 and peptides of increasing length, which are all able to form the target complex in vitro, are inefficient at priming ova-specific ctl responses following intravenous injection. this is also true for both native and denatured ova. in contrast to these results, the synthetic peptide ova229-276, corresponding to a peptide in a partial tryptic digestion of ova, can efficiently prime c57bl/6 mice in vivo following intravenous injection. this peptide elicits ctl which appear identical to those derived from animals immunized with syngeneic cells producing ova endogenously. it is now well established that human t lymphocytes can be activated via the t cell specific cd2 antigen. in order to determine if a factor(s) other than the single cd2 polypeptide is involved in cd2 mediated signal transduction, we have stably transfected murine l cells with the human cd2 cdna. we report that such transfectants expressed high levels of cd2 at the cell surface, formed sheep erythrocyte rosettes and expressed the three cd2 epitopes previously defined on human t lymphocytes, including the "activation associated" t113 epitope. the latter observation unequivocally demonstrates that expression of the t113 epitope, in contrast to a previous report, is entirely independent of t cell specific factors. combinations of cd2 mabs that are potent stimulators of human t cells, however, failed to elicit either an increase in the concentration of intracellular free calcium or augmented [3h]-thymidine incorporation in the transfectants. 
these results provide both formal identification of the cd2 cdna and clearly demonstrate that the single cd2 polypeptide expressed in a heterologous cell system devoid of t cell specific factors cannot alone transduce intracellular signals in response to stimulatory combinations of cd2 mabs. the results are therefore consistent with the notion that the functional cd2 antigen expressed in human t lymphocytes requires the association of another, as yet undefined, factor(s). this conclusion was based on several lines of evidence including the observation that mabs specific for the class i a3 domain of either h-zl8 or l 2 d b interfered with the generation of cd8-dependent (low substitution at position 227 in the a3 domain are not lysed by cd8-dependent primary ctl but are lysed by secondary cd8-independent (high affinity) ctl generated in the presence of antibody to the a3 domain. populations of ctl. we have isolated and characterized a c d w , cd4-da-specific ctl line. this line is cd8-independent and is capable of lysing the addition, we are currently generating clones from primary $-specific ctl cultures to obtain cd8-dependent (low affinity) ctl. directed mutagenesis are being tested with the cd8-dependent and cd8-independent clones to define additional residues important for cd8 recognition. the comparison of ctl clones with different cd8 dependencies will allow us to more precisely define the role of cd8 in t cell recognition. percoll from the buffy coat of one unit of blood. these cells (=400x10 ) are introduced into a curame 5000 elutriation centrifuge (rotor speed of 3000 rpm; loading flow of 10 ml/min). nine fractions can be obtained, the first three containing >90% lymphocytes; fraction 4 (3000 rpm-18 ml/min) and fraction 5 (2900 rpm-18 ml/min) contain both lymphocytes and monocytes and the next three fractions contain >90% monocytes; the final fraction (rotor off) contains monocytes + granulocytes. 
cells from each fraction (5x10 /well) are incubated for five days with tetanus toxoid (1.5 lf/well) and an enriched population of t cells (5x10 /well). quadruplicate samples are then pulsed for 16 hours with 3h methyl thymidine. maximum apc activity is found in fractions 4 and 5, representing 4 to 7% of the mononuclear cells. apc activity for these two fractions can be further purified by selective absorption of the cells onto gelatin coated surfaces that have been preincubated with plasma. the non-adherent lymphocytes are removed after two hours. after overnight incubation spontaneously released cells (1-5x10 ) can be harvested which have a higher apc activity than cells isolated by elutriation alone. these methods are now highly reproducible in our laboratory, so we can now begin to characterize and study these cells. the male-specific h-y antigen has been shown to behave as a minor histocompatibility antigen in man and mouse. in transplantation, male tissue may trigger the clonal expansion of h-y reactive mhc restricted effector cells of female origin. although male epidermal cells (ec) can induce an anti-h-y t cell response in female mice, so far in vitro techniques have failed to identify the cell-defined h-y antigen on murine ec (1). here we developed a 51cr release assay to use human cultured keratinocytes (k) as target cells for hla-a2 specific and hla-a2 restricted h-y specific t cell clones. hla-a2+ but not hla-a2- k were lysed by anti-hla-a2 ctls in a dose dependent manner. low but detectable levels of anti-h-y killing were found against hla-a2+ male k but not against hla-a2- male or hla-a2+ female k. both levels of alloreactive and h-y specific lysis were dramatically enhanced after exposure of k to ifn gamma. 
these results strongly suggest that human male skin cells are directly susceptible to h-y directed t cell killing through the expression of functional h-y/hla complexes on their cell surface. in view of these findings, together with our recent studies on the expression of h-y ctl determinants on human hematopoietic progenitor cells (2), the role of h-y as a target structure for cell mediated immunity in clinical transplantation should be seriously taken into account. 1. steinmuller d. and burlingham w.j. transplantation 1984,37,1,22. 2. voogt p.j., goulmy e., fibbe w.e., et al. j. clin. invest. sept.1988. c 116 diphtheria toxoid (dt) presentation by hla dr7 transfected murine fibroblasts bismuth, laboratory of cellular and tissular immunology, chu pitié salpêtrière, paris, france and veterans medical center, iowa city, usa. l transfectants expressing single type of human mhc class ii molecules produced by dna conjugate formation has been studied with cloned t cell lines and a b cell hybridoma and with t cells and b cells from normal mice. resting t cells and b cells do not form appreciable numbers of conjugates but conjugates are formed between t cells stimulated with alloantigen for four days and b cells activated by 24 hour culture with lps. irrelevant lymphocytes do not affect the rate of specific conjugate formation in suspensions of cells agitated by gentle rocking but impair conjugate formation when cells are allowed to settle in round bottom tubes. 
in further experiments, it was shown that the conditions for the induction of lymphokine secretion by the t cell were not identical to the conditions for conjugate formation. the significance of these and other observations for the interaction of t cells and b cells in vivo will be discussed. it is known that human peripheral blood dendritic cells are potent stimulators of the primary mixed leukocyte reaction (mlr) and that this reaction occurs in multicellular dendritic cell-cd4+ t cell clusters [cellular immunology 111, 183-195 (1988)]. dendritic cells are able to contact, cluster, and retain allogeneic t cells and induce these alloreactive cells to proliferate and divide. using purified dendritic cell populations labeled with a vital fluorescent dye, we show that only dendritic cells efficiently form stable clusters. labeled monocytes and b cells do not form clusters with t cells. when labeled monocytes and unlabeled dendritic cells are used to stimulate t cells, unlabeled clusters form. labeled monocytes do not move into the clusters until the third day of the mlr. significant levels of il-2 and a-ifn appear in the culture supernatant by the first or second day. blast transformation by the second day of the mlr as demonstrated by giemsa staining of cluster cytopreps. the distribution of certain adhesion molecules within clusters has also been studied by immunoperoxidase staining. c1m microbiology and immunology, emory university, atlanta, ga 30322. immunization of sjl/j mice with myelin basic protein (mbp) induces the t cell-mediated autoimmune central nervous system disease, experimental allergic encephalomyelitis. response against a dominant epitope (residues 89-101) leads to disease. lymph node t cells from mbp-immune mice react against several epitopes in addition to 89-101, indicating that the i-as molecule is able to form immunogenic complexes with several mbp peptides. 
the question asked in these studies was whether subdominant epitopes from the same molecule would compete with the dominant epitope for binding sites on the i-as molecule. to address this question two t cell clones, one specific for 89-101 (sp4.2) and a second specific for a second epitope present in peptide 89-170 (sp4.7), were tested for responsiveness when cultured with the dominant epitope alone or with mixtures of peptides containing dominant and subdominant epitopes. reactivity of sp4.2 against peptide 89-101 was inhibited by peptides 1-37 and 43-88. reactivity of sp4.7 against peptide 89-170 was not inhibited by peptide 89-101 although peptides 1-37 and 43-88 were inhibitory. controls indicated that the inhibition of reactivity was not due to toxicity at high concentrations of peptides. these findings imply that subdominant epitopes are able to compete with dominant epitopes of mbp for binding sites on i-as molecules. linda r. gooding, frances c. rawle, david i. kusher, william s. m. wold+ and barbara knowles*. department of microbiology and immunology, emory university school of medicine, atlanta, ga 30322, +institute for molecular virology, st. louis university school of medicine, st. louis, mo 63110 and *the wistar institute, philadelphia, pa 19104. in several virus systems early non-structural proteins localized predominantly in the nucleus of infected cells are major target antigens for cytotoxic t lymphocytes (ctl). whether early synthesis or nuclear localization are important factors in immunodominance is not known. we have recently developed a system for studying the ctl response to human group c adenoviruses in mice. by using both transfected targets and virus deletion mutants we have shown that, response to wild type ad5. there are two e1a transcripts, 12s and 13s, 
which both encode major early nuclear antigens differing by a 46 amino acid insertion: both antigens are recognized equally well by ctl. the e3 encoded 19k glycoprotein (gp19k) of ad5 binds to mhc class i antigens in the endoplasmic reticulum, preventing their translocation to the cell surface and strongly inhibiting lysis by ad5 specific ctl. however, the presence of gp19k in the priming virus does not affect the specificity of the ctl generated for e1a, so the immunodominance of this protein cannot be due to the fact that it is the only major protein synthesized before gp19k in the course of infection. using virus deletion mutants we are investigating whether ctl specific for other ad5 antigens can be induced in the absence of e1a, and whether e1a is also the dominant antigen recognized in mice of other mhc haplotypes. respond to antigens present on non-replicative virions. in contrast, we have obtained balb/c i-e-restricted t hybridomas specific for the neuraminidase (na) glycoprotein of a/pr8 influenza which recognize infectious, but not non-replicative virus, closely resembling recognition requirements observed for most class i mhc-restricted responses to influenza. recognition correlated with the de novo synthesis of viral na within antigen-presenting cells, but did not depend strictly upon the amount of na present in cultures, since high na concentrations could be achieved by addition of non-replicative virus without being stimulatory for na-specific t cells. recognition of a neo-antigen was ruled out, since, in high concentration, na isolated from purified virions, even if reduced and alkylated, was recognized by the t hybridoma clone. isolated na was recognized when added to pre-fixed apc, suggesting that this form of antigen was able to bypass the usual processing pathway of exogenous proteins. 
this suggests that endogenously-synthesized antigen may use different pathways to achieve class ii-associated presentation. t lymphocyte activation is a complex event which is influenced by a variety of distinct cell surface molecules. in order to determine the role of individual molecules in the activation process, we have developed an efficient methodology for generating cell variants in which expression of molecules is selectively inhibited by expression of anti-sense rna from an epstein-barr virus episomal replicon. in a previous study, we reported that marked inhibition of cd8 cell surface expression could be achieved in a human t cell clone using this approach. we have now extended this strategy to another t cell surface molecule, cd2, as a first step towards ascertaining its role in t cell activation. to this end, we synthesized a &-mer oligonucleotide corresponding to a sequence in the 5' end of the coding region of human cd2 and inserted it in an anti-sense orientation into this replicon. this a-cd2/rep3 construct was electroporated into jurkat cells. analysis of stable a-cd2/rep3 transfectants by immunofluorescence staining and flow cytometry demonstrated complete and selective inhibition of cd2 expression. in contrast to the nontransfected parent, this cd2- variant demonstrated a partial loss in its ability to form conjugates and to secrete interleukin 2 when stimulated with anti-cd2 monoclonal antibodies. however, stimulation of the cd2- variant with a23187 and pma did result in interleukin 2 secretion. several observations suggest that cd8 functions not only as an adhesion molecule recognising mhc class i on the adjacent cells but also potentiates the transducing capacity of the tcr/cd3 complex. comparison of the mouse ly 2 protein sequence with the homologous rat ox8 and human t8 sequences revealed most highly conserved regions in the membrane and cytoplasmic part of the molecule. 
the conservation of the transmembrane and cytoplasmic sequences in different species may be significant for the function of the cd8 molecule. in order to initiate the functional dissection of the cd8 molecule we constructed mutations in different parts of the molecule. by transfecting the a and b chain genes donated by a cd8 dependent cytotoxic t cell clone (kb5 c20) into the mhc class ii restricted and cd4 t cell hybridoma do-11.10 we were able to reconstitute the ability to respond to k only if the transfer was done with the ly 2 molecules (gabert et al., 1987. cell, 50, 545-554). in this system surface expression of mutated and non mutated ly-2 molecules was checked by facs analysis and the molecular size of the proteins was analysed by immunoprecipitation with the anti-ly-2 monoclonal antibody 19/lj8. finally functional effects of the mutations were investigated in response towards the k alloantigen. we have simulated graft versus host and host versus graft reactivity in vitro by studying primary anti-minor h responses in a limiting dilution culture system. the ability of bmm and peripheral blood mononuclear cells (pbm) to stimulate and respond in this system was compared by estimating the number of proliferating cells. in gvh-direction the combination of donor-bmm (d-bmm) and host-pbm (h-pbm) was 2 to 15 times more effective in stimulating proliferation than any other combination; the same applied to the combination h-pbm/d-bmm in hvg-direction. using these combinations the median frequency of proliferating cells in gvh-direction was 1/5300 (range 1/59c-1/18100) in 10 pairs, in hvg-direction (7 pairs) 1/2330 (range 1/115-1/6100). 85% of the proliferating cells had the phenotype of mature t-cells. using the same combination of responder/stimulator cells we have also estimated the number of cytotoxic cells specific for the hla-identical target cell. 
in gvh-direction the median estimate (n=8) was 1/10300 (range 1/660-1/45700), in hvg-direction (n=5) 1/2250 (range 1/400-1/6500). by split well analysis similar or higher frequencies of cytotoxic cells with specificity for nk-targets were detected (gvhr: 1/5750, hvgr: 1/3620). it was however possible to identify a significant number of minor h-specific clones by segregation analysis; their specificity could be confirmed after clonal expansion. the clones had the phenotype of typical cytotoxic t-cells. the relevance of the two cytotoxic subpopulations described above to clinical events such as gvhd, graft rejection and relapse needs to be clarified. molecular cloning of murine icam-1, k.j. horley, b. baker, and f. takei, terry pathology, university of british columbia, vancouver, b.c., canada. we have previously reported a novel cell surface antigen expressed on activated and proliferating murine lymphocytes. the antigen, termed mala-2, is absent or present at low densities on thymocytes, lymph node cells, and fibroblast cell lines, indicating it is not a universal proliferation antigen. some cells of the spleen and bone marrow express mala-2 at a high density, possibly representing in vivo proliferation in these tissues. apparent molecular weight of 95-100 kd under both reducing and nonreducing conditions, and is susceptible to endo f digestion. the monoclonal antibody yn1/1.7, which reacts with this antigen, profoundly inhibits mlr. a λgt10 cdna library was constructed from ns-1 cells that express a high level of mala-2, and screened with synthetic oligonucleotides, resulting in the isolation of a full length cdna clone (-3.2 kb). the cdna sequence has high homology with the human icam-1 sequence, indicating that mala-2 may be the murine homologue of this characterized protein. hines, trudeau institute, inc., p.o. 
box 59, saranac lake, ny 12983 a tumor cell line, et-5, has been derived from an apparent fibrosarcoma that arose in a c57bl/6 male mouse. antigens. mice that have rejected et-5 become immune to these minor h antigens, judged by accelerated skin graft rejection, and this immunity can be transferred to immunodeficient mice with lymphoid cells. however, spleen cells from mice that have rejected according to the widely accepted view, cd2 (t11, sheep erythrocyte receptor) is the first t cell-specific antigen to appear on differentiating thymocytes during ontogeny. it follows that cd2 should be expressed on all immature and mature t cells. using two-color cytofluorometry i have here identified subsets of cd2-cd3+ t cells both in fetal human thymus or spleen and in adult peripheral blood. cd2-cd3+ t cells constitute 1-25% of fetal thymocytes and 0.1-0.8% of peripheral blood t cells. il-2-dependent long-term clones of cd2-cd3+ cells do not react with a panel of monoclonal antibodies (mab) directed against the t111, t112 or t113 epitopes of cd2 and do not transcribe cd2 mrna. fetal tissue-derived clones react with the tigammaa mab and thus express a functional tcr gamma chain, while cd2-cd3+ clones from peripheral blood are bma031+ and express a full-length 1.3 kb tcr c s .ria. the clones established here are currently being characterized with respect to functional capacities. i conclude that expression of cd2 is not an absolute prerequisite for the expression of the cd3/tcr molecular complex on human t cells. if they are added after 24 hours. these interactions are bidirectional, since both cd11a and cd18, and their ligand icam-1, are expressed on the presenting cells as well as the t cells. however, all such early adhesion related events are not bidirectional since anti-cd2 and anti-lfa-3, 
which are expressed differentially on t cells and presenting cells respectively, are also effective as inhibitors. in contrast, anti class ii mhc antibodies, anti cd4 and anti cd25 antibodies do not inhibit clustering but do inhibit proliferation, and this is seen irrespective of when the antibodies are added into the assay. our findings suggest that there are two mechanisms involved in dendritic cell-t cell interaction, firstly an immediate cell-cell adhesion step and later a secondary signal transduction process possibly mediated via cytokines. the qualitative differences between dendritic cell and b cell induced immunogenicity may thus lie in either of these two steps. king, department of pathology, the bland-sutton institute, university college and middlesex school immunogenicity c130 cultured tissue is capable of stimulating an immune response when transplanted syngeneically, robert j. ketchum and orion d. hegre, dept. of cell biology and neuroanatomy, university of minnesota, minneapolis mn 55455. neonatal rat islets derived by culture-isolation have been shown to be free of mhc class ii+ cells, and are immunologically silent when transplanted to either syngeneic or allogeneic hosts. allogeneic transplantation of cultured neonatal non-islet pancreatic tissue, which is known to contain class ii+ cells, results in rapid allograft rejection. unexpectedly, syngeneic transplantation of cultured non-islet ductal tissue also resulted in mononuclear immune cell (mnc) infiltration of the graft in 86% of grafts examined. highly purified syngeneic islets and ductal elements grafted syngeneically at remote sites display an immune response in the ductal element graft, while the islet graft is free of any immune cell infiltrate. 
this syngeneic immune response does not result from the use of xenogeneic serum in the medium, since cultures carried out using syngeneic rat serum supplemented medium yielded identical results. uncultured neonatal pancreatic tissue grafted syngeneically does not result in mnc infiltrate, indicating this response is not to developmental antigen. this immune response to a syngeneic stimulus correlates with the presence of class ii+ (antigen presenting) cells. in grafts free of class ii+ cells (culture-isolated islet grafts) no immune response to syngeneic stimulus was observed, while a response was present when syngeneic ductal elements, known to include class ii+ cells, were grafted. this indicates a need for cells capable of antigen presentation to stimulate this syngeneic response, and suggests that either a modified self antigen or a normally sequestered antigen is being presented. this syngeneic immune response demonstrates many of the same characteristics of, and may be analogous to, the in vitro syngeneic, or autologous, mixed lymphocyte reactions. c 132 the presence of "self" mhc class ii (hla-dr) antigens determines whether blood transfusions immunise or suppress. el lagaaij, a termijtelen, e goulmy, & jj van rood, leiden university hospital, the netherlands. blood transfusions can immunise the recipient, as well as induce prolonged allograft survival. it is not known why some transfusions immunise the recipient whereas others induce immune suppression. we investigated whether certain mhc compatibilities or differences between recipient and transfusion donor and organ donor are required to induce the beneficial "transfusion effect" in man. we studied graft survival and blood transfusion induced changes in cellular and humoral immunity in 4 different patient groups. the patients received a single blood transfusion from a randomly chosen donor. 
we found in all 4 groups that to induce a beneficial "transfusion effect" compatibility for at least 1 hla-dr antigen between recipient and transfusion donor is required. if the transfusion and recipient are mismatched for both hla-dr antigens, the recipient is immunised, resulting in an increased antibody production (p=0.001), an increased cytotoxicity (cml) (p=0.005), an increased mixed lymphocyte reaction (mlr) (p=0.005) and a decreased graft survival (p=0.003). after a beneficial (hla-dr sharing) transfusion, the in vitro tests remain unchanged or decrease. graft survival increases with the number of shared antigens between transfusion donor and organ donor (p=0.02), suggesting that a donor specific suppression is induced. recent experiments have revealed a direct interaction between the cd4 molecule and hla-dr antigen. to address the nature of this interaction we have used a xenogeneic system in which a human cd4 cdna was expressed in the murine cd4- and cd8-negative hybridoma 3dt52.5.8. the tcr of 3dt52.5.8 recognizes the murine class i molecule dd. a class ii expressing dd-positive cell line was obtained by cotransfection of the human class ii cdnas together with the murine dd gene into the murine fibroblast line dap3. coculture of 3dt52.5.8 and dap3 expressing dp-dd resulted in a 20 fold increase in il-2 production and in rosette formation only when both cd4 and dp were present on the responding t hybridoma and the presenting cell, respectively. we are using this system to map regions of the cd4 molecule that interact with the class ii mhc ag. the cd4 molecule has also been shown to be the receptor for the human immunodeficiency virus (hiv) via the gp120 molecule. since gp120 and class ii both interact with cd4, we have used our functional assay to verify if gp120 exerts an inhibitory function on cd4 class ii interaction. 
recombinant gp120 inhibits the functional interaction and rosette formation in a concentration dependent fashion with maximal inhibition at about 10 pg/ml. this inhibition is specific since it can be reversed by recombinant soluble cd4. the fact that recombinant gp120 can inhibit the functional interaction between cd4 and its physiological ligand (class ii ags) suggests that the use of gp120 in a vaccine against hiv infection could alter the immune response of such individuals. this work was supported by src, mrc and nci. t lymphocytes discern self from non-self molecules through the interaction of their antigen-specific receptors and proteins encoded by the mhc. although the nature of this association is not well-defined, a model has been proposed whereby the v-segments of the t cell receptor interact with residues along the 2 alpha helices of the class i antigen (davis et al.; 334:395, 1988). we have recently shown that ctl generated against the class i molecule q10d crossreact on several unrelated murine class i antigens containing the shared q10d residues at amino acid positions 152, 155, and 156 (mann et al.; j x 168:307, 1988). these residues contributed by the a-2 domain occur in the alpha helical portion of the class i molecule and amino acids 152 and 155 could interact directly with the t cell receptor. to further characterize the role of these amino acids, we are in the process of determining whether insertion of these 3 residues by site-directed mutagenesis into a human class i molecule will allow for the antigen's recognition by anti-q10 ctl. here the t cell repertoire becomes restricted, so that foreign antigen can be recognized only when associated with the mhc products of the host, and mature t cells are tolerized to self antigens, a process which also seems to be mhc-restricted. 
thus, t cells should be non-reactive to self antigens when they are associated with mhc products present on the tolerance-inducing thymic cells, whereas they may still react to the same self antigens when associated with different mhc products. to examine mhc-restricted tolerance in vivo, a model system must have: a) self antigen in the context of one mhc haplotype, and b) tolerance to both that and a second mhc haplotype. chimeras were prepared by aggregation of preimplantation embryos of two strains of mice, c57bl/6 (b6) and balb/c. the thymus of such chimeras should be composed of two distinct and completely intermixed populations of cells, one from each parental strain (isozyme analysis indicates no detectable fusion of cells). thus, t cells maturing in the chimeric thymus should be exposed to and tolerized to minor histocompatibility antigens (mhas) of one parental strain only in association with the mhc of that strain. for example, mice might be expected to express b6 mhas only with h-2b (the b6 mhc). however, our chimeras were fully tolerant to f1 skin grafts, which have "hybrid" combinations of mhas and mhc (e.g., b6 mhas with h-2d). these results are most consistent with either, a) "wholesale" antigen processing and presentation of all mhas by the tolerizing thymic cells, and/or, b) functional sharing of mhc products between the parental thymic cell populations. many of the events critical to the maturation of t lymphocytes occur in the thymus. in the case of t-cell responses against viruses such response defects are associated with a marked increase in disease susceptibility as illustrated by class i mhc controlled susceptibility to lethal pneumonia induction by sendai virus. certain class i or class ii mhc determined tc response defects (four out of six tested by us) can be restored by immunization in vivo and/or restimulation in vitro with dc. dc are the most effective apc. 
their superior apc capacity is due to 1) a very high absolute number of class i and class ii mhc molecules, and 2) a low degree of sialylation of mhc and other surface molecules, reducing negative charge and facilitating access of the t-cell receptor to the mhc groove presenting the antigenic peptide and/or improved clustering with t cells. the more effective antigen presentation by dc allows a more prominent role for a cd4+ t cell independent pathway of cd8+ tc activation. it is postulated that the more effective direct triggering of cd8+ tc precursors lowers the threshold for il-2 production by cd8+ cells, reducing the requirement for il-2 production by the cd4+ cells. failure of dc to overcome certain mhc-linked specific tc response defects probably reflects complete failure of any foreign peptide derived from the processed antigen to interact efficiently with the mhc or a true tc repertoire defect. donald pious, department of pediatrics, university of washington, seattle, wa 98195. mhc class ii molecules bind immunogenic peptides derived from soluble antigen and the complex is recognized by specific t cells. we have isolated eight independent mutant b-lcl clones which are altered in their ability to present antigen. in standard proliferation assays using four different soluble protein antigens, the mutants are unable to stimulate the majority of t cell clones restricted to hla dr or dp. although unable to present whole hepatitis b surface antigen (hbsag), they effectively present a hbsag peptide to a dp-restricted t cell clone. the fact that both dr and dp restricted antigen presentation is abnormal in these mutants made it likely that the class ii structural genes are unaltered. this hypothesis is supported by the finding that dna sequences from the dr genes of one mutant are normal. however, two observations indicate that the mature class ii dimers expressed by the mutants are structurally altered. 
first, binding to the mutants with two polymorphic anti-dr antibodies and one anti-dp antibody is reduced, although the level of cell surface class ii expression is normal. second, the class ii dimers from the mutants dissociate into monomers under in vitro conditions (sds-page) which preserve dimers in the progenitor line. together, these functional and structural data suggest that the mutants are defective in a molecule that either associates with or post-translationally modifies class ii molecules and is required for the physiologic formation of an mhc/antigen complex. they do not function as restriction elements presenting foreign antigens to t-cells. to investigate the nature of this functional defect we have constructed 3 different recombinant class i genes using dna segments from the b10 q9 (qa-2) or h-2db genes. structural protein encoded by the recombinant genes is derived entirely from the q9 gene whereas the cis-acting transcriptional regulatory elements or the dna segment encoding the membrane anchoring domain is derived from the h-2db gene. the recombinant genes were introduced into fertilised cba/ca embryos by microinjection and transgenic lines were established. to date we have established 13 transgenic lines. we have shown that the q9 (qa-2) antigen encoded by the recombinant genes behaves as a major transplantation antigen in skin grafts and provokes strong secondary cytotoxic t-cell responses in grafted animals irrespective of the tissue distribution or mode of membrane anchorage of the q9 antigen. at present, we are investigating whether the q9 antigens encoded by the recombinant genes are able to present influenza virus or mouse minor histocompatibility antigens to t-cells during immune responses and hence whether they can function as restriction elements. is a distinctive system because it permits an analysis of the activation requirements for antigen specific, resting t cells. 
been isolated following culture with anti-ig-sepharose and compared to dendritic cells as stimulators of cd4+ t cells in the mlr. ii mhc products and independently stimulated the 1° mlr and the production of several t derived lymphokines, including il-2 and il-4. however, the relative potencies of dendritic cells and anti-ig blasts as 1° mlr stimulators varied in a strain dependent fashion. only anti-ig blasts could stimulate across an mls barrier, being at least 100 times more active in stimulating mls-mismatched, mhc-matched t cells, relative to syngeneic t cells. in contrast, dendritic cells were 10-30 times more potent than the anti-ig blasts when stimulating across an mhc barrier and were likewise more effective in binding mhc-disparate t cells to form the clusters in which the mlr was generated. dendritic cell-t cell clustering was resistant to anti-lfa-1 mab, while b blast-t cell clustering was totally blocked. thus, anti-ig b lymphoblasts and dendritic cells, two cell types which differ markedly in phenotype, also differ in efficiency and mechanism for initiating responses in allogeneic t cells. as the lymphocyte function antigen 3 (lfa-3) and the intercellular adhesion molecule (icam). we interpret this as an increase in the membrane expression of these structures following incubation. the increase is blocked by the translation inhibitor, cycloheximide, implying that protein synthesis is involved. helper t cell responses to soluble globular proteins require processing of the protein by ia-expressing antigen presenting cells (apc). antigen is internalized into acidic vesicles, proteolyzed, and peptides containing t cell antigenic determinants are transported to the apc surface where they are recognized by the antigen-specific t cell in conjunction with ia. 
most ia-expressing cells are competent apc; however, only b cells have antigen-specific receptors on their surface, allowing bound antigen to be processed and presented at 1/lw the antigen concentration required by nonspecific apc. little is known about b cell antigen processing function during differentiation, or if ig-mediated apc function is altered at different maturational stages, thus allowing regulation of b cell-helper t cell interactions. neonatal acquisition of apc function was examined in mice ages day 1 to day 15. splenic cells from d1 to d10 mice process and present pigeon cytochrome c, pc, at 0-20% of adult levels. by d15 neonatal spleen cells acquire the ability to process and present soluble pc at 3 5 4 of adult levels. the ability to internalize antigen through ig receptors was determined using an antigen-antibody conjugate, p$-&(ab')z. neonatal spleen cells acquire the ability to process antigen through ig simultaneously with the ability to process soluble antigen. lack of processing by neonatal spleen cells prior to d10 is not attributable to insufficient levels of surface ia, since d3 neonate spleen cells are able to activate t cell hybrids to 50% of adult levels when provided with pc 81-104, containing the t cell determinant. d10-d15 neonate presentation of pc 81-104 is indistinguishable from adult levels. b cell maturation into memory b cells was identified by the loss of the j11d differentiation marker. splenic j11d-lo b cells increase from 540% following immunization and return to nonimmune levels after 4 weeks. during antigen-induced b cell maturation, j11d-lo b cells are indistinguishable from splenic b cells in the ability to present antigen introduced into the processing pathway either pinocytotically or via surface ig. pc-antibody conjugates specific for mouse f(ab')2 i n , igd, or igg are presented equally well by both splenic and j11d-lo b cells. 
thus, acquisition of b cell processing function appears to be developmentally regulated and may play an important role in b cell tolerance mechanisms. once b cells have acquired the ability to process antigen this function is maintained and is not regulated during maturation into memory b cells. we are currently investigating the role of ig isotype during neonatal acquisition of antigen processing. (supported by nih grants ai-18939, ai-12001, and ai-2317) as the preliminary studies suggested that carrier sc (apc) for ts vs. tcs activation might be distinct, studies were done to directly address this possibility by assessing the ability of s3 coupled to various cell populations to activate ts and tcs. the results indicated that ts activation required that s3 be coupled to plastic adherent cells which bear both i-a and i-j determinants. these cells are nonadherent to anti-ig and nonfunctional in cyclophosphamide (cy) treated mice. in contrast, activation of tcs required coupling of s3 to plastic non-adherent and anti-ig adherent cells. these cells are functional in cy treated mice and bear the b cell markers j11d and i-a but not i-j. thus s3-specific ts are activated by i-a+ i-j+ adherent cells (presumably macrophages) whereas tcs are activated when antigen is presented by b cells. (nih grant ca25054.) interleukin-2 activated killer (iak) lymphocytes (also known as lak cells), which destroy a broader spectrum of tumors in vitro than nk cells, have been used successfully in an adoptive immunotherapy protocol for the treatment of patients with a variety of advanced cancers. the cell surface molecule(s) on tumor cells that are involved in specific binding to iak cells and in programming iak cells for cytolysis (iak acceptor molecules) have not been characterized. 
in order to identify such acceptor molecules a crude membrane digest of the lung carcinoma cell line a549 was biotinylated and adsorbed to iak cells or to unstimulated human peripheral blood lymphocytes (upbl) (each from the same person). proteins from the washed solubilized cells were separated by page, western blotted and probed with streptavidin-alkaline phosphatase. several experiments demonstrated that different tumor membrane proteins bound to iak cells compared to upbl. the unstimulated cells bound one tumor membrane protein (about 40 kd) not found on the iak-adsorbed blot. the iak cells bound three tumor proteins (approximately 30, 46 and 50 kd) not found on the upbl-adsorbed blot. three other proteins (about 35, 44 and 55 kd) were found to adhere equally well to iak cells and upbl. utilizing a streptavidin affinity column, solubilized tumor membrane proteins that bound to iak cells could be separated from solubilized iak membrane proteins. the isolated tumor membrane proteins that adsorbed to iak cells inhibited iak mediated lysis of a549 tumor cells by >85%. these studies suggest that specific cellular adsorption techniques may be useful in isolating and characterizing tumor membrane proteins involved in interactions unique to cytolytic lymphocyte-tumor cell target binding and lysis. activation of human t lymphocytes occurs via the t cell receptor-cd3 complex but can also be induced through the non-antigen-specific cd2 molecule. selected combinations of mabs or the soluble cd2 ligand, namely lfa-3 and a unique anti-cd2 mab (cd2.1), induce human t cell activation. cd4 is an accessory molecule implicated in the activation of human t lymphocytes. this molecule may exert this function by increasing intercellular avidity through binding to mhc class ii molecules and/or by transmitting intracellular signals. we have investigated the action of mabs directed against different epitopes on the cd4 molecules in the activation of human t cells via the cd2 pathway. 
we show that anti-cd4 mabs inhibit cd2 induced t cell proliferation in an epitope-dependent fashion. this inhibition does not appear to be linked to the lower cd2 mediated [ca2+] response induced by anti-cd4 mabs, since the [ca2+] response is equally affected by anti-cd4 mabs whether or not they inhibited t cell proliferation. in conclusion, the partial inhibition of the cd2 induced [ca2+] response of t cells by various anti-cd4 mabs suggests that: 1) this inhibition does not totally account for the inhibitory effect of anti-cd4 mabs, and 2) the proliferation induced by anti-cd2 mabs may not be completely ascribed to the [ca2+] response of t cells. rosenberg, and alfred singer, experimental immunology branch, nci, nih, bethesda, md 20892. we have devised a model to study the in vivo generation of suppressor cells by using mice congenic at qa-1, a class i-like molecule encoded to the right of h-2d. we found that b6 mice did not reject qa-1 disparate tail skin grafts (tsg) unless a second graft with additional helper determinants was also present. however, animals engrafted with a qa-1 graft alone, without any source of additional help, failed to reject their qa-1 graft, and were unable to reject them even upon the subsequent addition of exogenous help. thus, exposure to qa-1 disparate grafts, in the absence of additional help, either led to qa-1 specific tolerance or suppression. mice failing to reject qa-1 allografts revealed the presence of qa-1 specific suppressor cells that inhibited the in vivo activation of antigen specific effector cells capable of rejecting qa-1 bearing allografts. experiments using t cell subpopulations should allow for further characterization of these qa-1 specific suppressor cells. in the thymus, the t-cell receptor genes are rearranged, the t cells learn to recognize their own major histocompatibility complex (mhc), and they learn to respond to foreign mhc. 
these events seem to be linked to the interaction between t-cell precursors and the stromal cells of the thymus. thus increasing evidence points to an essential role for the thymus epithelial cells (te cells) in development of at least mhc class ii recognition by the t cells. to be able to study the importance of te cells in t cell maturation, we have developed a method for growing murine te cells in serum-free medium with well defined constituents. the medium allows for growth of te cells without concomitant growth of bone marrow derived cells such as macrophages and fibroblasts. data obtained by em and immunocytochemistry showing the epithelial nature of the cultured cells, as well as autoradiographic data on the growth pattern, and characterization of te cell supernatants will be provided in addition to results obtained from co-culture of te cells and t-cell precursors (cd4-cd8- thymocytes). immunogenicity c 152 branch, national cancer institute, bethesda, md 20892. the effector limb mediating skin allograft rejection is highly antigen specific, rejecting cells that express allogeneic mhc antigens while sparing those which fail to express allogeneic mhc determinants. disparate skin grafts are completely rejected in spite of the fact that only a small percentage of the cells within the graft express ia antigens. thus, it is possible that mhc class ii disparate grafts are rejected by a mechanism that does not assess the expression of mhc determinants on each cell. we assessed the specificity of the rejection of ia disparate grafts by using allophenic skin grafts in an adoptive transfer system and concluded that skin graft rejection across an mhc class ii disparity required recognition of allo-ia determinants expressed by every cell in the graft. therefore, we reasoned that mhc class ii antigens must be induced on these ia negative populations. indeed, injection of mice with gamma interferon dramatically induced ia antigens on previously negative keratinocytes. 
we next tested whether the induction of allogeneic ia determinants on keratinocytes was necessary for graft rejection by engrafting parental strain mice with skin from f1→parent bone marrow chimeras. such grafts failed to be rejected, in spite of the specific rejection of the allogeneic langerhans cells, indicating that the failure of keratinocytes to express allogeneic class ii determinants leads to graft preservation. in conclusion, mhc class ii disparate skin allografts are rejected in a highly antigen specific fashion, secondary to the induction of mhc class ii antigens on skin cells that fail to constitutively express them. to address whether the extensive polymorphism characteristic of class i molecules influences cd8 binding, we have screened a panel of transfectants expressing individual class i mhc alleles. of 18 alleles tested, only aw68.1 did not bind. all other molecules did bind, including a2.1 and aw69, which differ by 13 and 7 amino acids respectively from aw68.1. position 245 in the alpha 3 domain was identified by site-directed mutagenesis as the critical residue differing between a2.1 and aw68.1 which determines binding. a mutant aw68.1 molecule containing alanine at position 245 bound cd8, while a mutant of a2.1 with valine at 245 did not. alanine is found at position 245 of all human and murine class i molecules sequenced to date except aw68.1 and aw68.2, which have valine at that position. bulk cultures of a2-allospecific ctl were also sensitive to this substitution, and preferentially recognized both molecules with alanine at 245. this study shows that aw68.1 differs from other class i molecules in its capacity to bind cd8, and raises the possibility that aw68.1 may not function as a restriction element as effectively as other class i alleles. 
t cell hybridomas derived from h-2d lnc recognized s and pres2 antigens in an i-ad restricted way, while t cell hybridomas from h-2k lnc manifested a specificity for either pres2 in association with i-ak or for s in association with i-!&. the activation of these hybridomas by antigen and antigen presenting cells (apc), as measured by il-2 secretion, was found to be sensitive to prostaglandins and could be completely inhibited by anti-lfa-1 monoclonal antibodies. different apc populations were tested for their capacity to present spres2 particles to these t cell hybridomas. various macrophage-like populations such as resident, con a induced, and thioglycolate induced peritoneal exudate cells as well as splenic adherent cells were found to present the spres2 antigen efficiently. in contrast, b cells and ia+ b cell lines (ta3, m12.4) could not function as accessory cells in the spres2 specific stimulation of these t cell hybridomas. the inability of these cells to present this antigen was not due to inhibitory effects since these cells did not inhibit the presentation capacity of other potent apc's. furthermore, addition of apc's of a different haplotype could not complement for the defective presentation of spres2 by b cells and b cell lines, indicating that mhc independent accessory factors are not implicated in this process. hence it is clear that macrophage-like apc's and b cells differ in their capacity to process and present spres2 antigens. since spres2 is a very stable particle composed of lipids and proteins it is conceivable that such antigen requires a strong degradation and such processing might occur in certain macrophage-like apc's but not in b cells. recombinant human insulin biosynthetically labeled with k n d "s at several amino acids was used as an antigen and was exposed for varying lengths of time to ta3 mouse b cell apc. 
subcellular fractionation and hplc chromatography permitted several of the processed peptides distributed throughout the insulin molecule to be monitored. many insulin peptides localized to both the extracellular (8 peptides) and intracellular (4 peptides) compartments of ta3 cells were detected. membrane-associated peptides is in progress. many of the peptides processed by ta3 apc in situ co-elute with those obtained upon digestion in vitro by the insulin-specific insulin degrading enzyme (ide). for the processing of 51-labelled human insulin suggest that insulin may be processed in b cell apc into immunogenic peptides by an enzyme(s) present on the plasma membrane, intracellularly and extracellularly. (supported by mrc and cda). we investigated the pathway of antigen processing in situ in b cell apc. murine ctl clones specific for hla-a2 were generated with the human cell line jy. four of five ctl clones were found to lyse a2k transfected murine cells more effectively than a2 transfectants. anti-cd8 specific mab inhibited t 3 lysis by these four clones, and this inhibition was more pronounced for a2k. one clone, which lysed a2 and a2k transfectants equivalently, was shown to be insensitive to anti-cd8 antibody inhibition. these findings indicate that a2-specific murine ctl clones possess greater avidity for murine target cells expressing the a2k hybrid molecule relative to those expressing the a2 molecule. this implies that a cd8 interaction with the same molecule seen by the t cell receptor is important for target cell recognition. hla class ii antigens are highly polymorphic cell surface proteins involved in initiation and regulation of the immune response. allelic sequence variation primarily affects the structure of the first external domains of the alpha and beta component chains. here we provide evidence for other types of allelic polymorphism for these genes. 
the sequences of two cdna clones corresponding to the hla-dqb mrnas from an hla homozygous cell line exhibit both alternative splicing and readthrough of polyadenylation. furthermore, the alternative splicing event is associated with only a subset of hla-dqb alleles, while the polyadenylation site readthrough is found in a larger subset. this suggests that polymorphic cis-acting elements within the hla-dqb gene control both processing steps. proteins, presumably encoded by the alternatively spliced mrnas lacking transmembrane exons, are immunoprecipitated with a monomorphic monoclonal antibody directed against hla-dq. these proteins are found in supernatants of cultured cell lines for which secretion is predicted, but not in those of cell lines which do not contain the alternatively spliced mrnas. such secretion of class ii allelic products could profoundly affect interactions between effector and target cells in an immune response. departments of biology and chemistry and the cancer center, q-063, university of california at san diego, la jolla, ca 92093. we are studying the murine icam-1 gene and the effects of icam-1 on antigen recognition. using the human icam-1 cdna, we have isolated cdna and genomic clones encoding the murine homologue. the murine icam-1 gene is a single-copy gene that consists of multiple exons spanning 25 kb of dna and encodes a 2.5 kb mrna that is expressed at high levels in a wide variety of different cell types. sequence analysis indicates that murine icam-1 is 50% and 60% homologous to human icam-1 on the protein and dna levels, respectively, and is a member of the immunoglobulin gene superfamily, consisting of several v-like domains linked tandemly. we are also studying the effects of icam-1 on antigen recognition. the t-cell clone 14d m s f 1988). this response is blocked with the anti lfa-1 antibody fd441.8 when antigen is presented by b10.a(2r) spleen cells but not when antigen is presented by dcek, a fibroblast transfected with i-ek. 
we are currently transfecting the murine icam-1 cdna into dcek in order to determine if we can enhance the 14d response and to determine if the enhancement is lfa-1 dependent. with an alpha beta tcr that recognizes moth. the t cell differentiation antigen, cd4, is expressed by mhc class ii restricted t lymphocytes. the association between cd4 expression and restriction by mhc class ii products has led to the hypothesis that cd4 may interact with monomorphic determinants of mhc class ii products. a large body of experimental evidence suggests that cd4 interaction with mhc class ii molecules leads to an increase in the binding avidity of t cell-stimulator cell interactions. a direct test for a functional cd4-mhc class ii interaction in t cell activation requires a separate evaluation of cd4-ia interactions from tcr-ag/ia recognition. however, a separate evaluation proves difficult since the t cell receptor and cd4 may interact with the same mhc class ii molecule. in this report, we use a t cell activation protocol where tcr-ag/ia recognition is replaced by tcr complex-anti-cd3 antibody interactions. using this activation protocol, we have analyzed the effects of monoclonal anti-mhc class ii antibodies on the activation of a cd4 hybridoma in the absence of its tcr restricting mhc class ii molecule (ie ) but in the presence of unrelated mhc class ii molecules (ie ,ia ). the data obtained clearly indicate a functional role for cd4-mhc class ii interactions in t cell triggering. we have targeted hen egg lysozyme (hel) to murine b cells using heterocrosslinked antibodies which specifically bind to surface igd or different mhc molecules. occurred more quickly with targeting to igd than to mhc structures as assessed by fixation and pronase stripping experiments. hel was internalized quickly into acidic compartments when targeted to igd but was detected much later when targeted to mhc molecules, as assessed by shifts of fluorescent signal of internalized fitc-hel. 
however, the data indicate that not all endocytosed hel entered low ph (<5.5) compartments. degraded hel was released from b cells following endocytosis of 125-i-hel. this release was detected earlier with targeting to igd than to mhc structures. interestingly, the total amount of internal 125-i-hel decreased with time after endocytosis via igd, but the internal 125-i-hel was almost entirely whole undegraded hel at all times following endocytosis. these data and those of chloroquine and leupeptin inhibition studies indicate differences in the fate of antigen entering b cells via igd or mhc structures, and support the notion of a neutral ph storage compartment for antigen endocytosed via surface igd on normal splenic b cells. internalization and presentation of hel to hybridoma t cells laboratories, department of surgery, university of iowa, iowa city, ia 52242. target cell lysis by cd8+ ctl is a highly specific phenomenon in vitro, as we have confirmed repeatedly in reverse labelling tests by showing that admixed "third party" target cells are not lysed in the presence of specific ctl-mediated cytolysis. however, when mixtures of ctl and their specific targets are inoculated into the skin of hosts syngeneic to the ctl, host cells at the site of inoculation are destroyed, often to an extent that results in grossly observable, full-thickness necrotic lesions. we have evoked these "innocent bystander" reactions in mice with ctl directed against single and multiple non-h-2 antigens and tnp hapten and influenza a virus-specific antigens. thus, the ability to trigger bystander tissue destruction appears to be a general characteristic of ctl-target cell interaction in vivo. our current evidence suggests that host inflammatory cells recruited and activated by factors stimulated by ctl-target cell recognition actually mediate the tissue destruction. 
these ctl-initiated bystander reactions may be the basis of the non-specific tissue destruction that contributes to allograft rejection and that is observed in many serious virus infections and in intense dth reactions and contact dermatitis. rong h a lin. basel institute for immunology, basel, switzerland. we have investigated the basis for immunity or tolerance to a mouse serum protein, the fifth component of complement (c5). in c5 deficient mice this protein is absent from serum and as a consequence they are not tolerized to c5. c5 deficient dca generate uy bearing t cells which recognize c5 in the context of class ii. in contrast, c5 sufficient mice in which c5 protein is continuously produced do not mount t cell responses against c5. we have tested if this self protein is processed and presented with class ii in normal mice and can be recognized by c5 specific t cells in the absence of exogenously added antigen. all class ii bearing cells from c5 sufficient mice activated c5 specific t cell clones without additional antigen. presentation was not a consequence of c5 secretion by macrophages in culture but was shown to be derived from endogenously generated c5/class ii complexes. thus this self protein is efficiently presented in vivo and available for tolerance induction. although c5 deficient dca cannot secrete c5 they still synthesize a precursor molecule, pro-c5, in accumulating evidence from a number of models suggests that unique subsets of antigen-presenting cells are responsible for the induction of specific t cell-mediated responses. we have previously described an age-dependent maturational defect in the ability of the sjl strain of mice to activate dth-inducer t cells to a wide variety of antigenic stimuli. 
none of the other 14 strains tested exhibited a similar defect and all other accessory cell dependent responses were unaffected in the dth unresponsive sjl. we have also shown that the adoptive transfer of macrophages from older dth responsive sjl or other dth responsive las strains can overcome this defect in dth responsiveness. we have recently found that a subpopulation with the mac-1+, mac-3+ and mac-2- surface phenotype is able to transfer responsiveness. facs analysis indicates that the mac-3 phenotype is expressed on less than 20% of macrophages. titrations of the mac-3+ cells isolated by facs indicate that adoptive transfer of only 100 mac-3+ cells can overcome the defect in dth responsiveness. by contrast, transfer of 10 mac-3- or mac-2+ cells was unable to overcome the defect. our data suggest that the induction of cd4+ antigen specific dth-inducer t cells is mediated by a phenotypically unique small subset of macrophage accessory cells. in our studies, we have examined the effect of administering fab' fragments of anti-l3t4 moab (fab'-gk1.5) on the inhibition of humoral immunity. treatment of klh-primed mice with 0.5 mg fab'-gk1.5 depleted l3t4+ cells from lymph node tissue while leaving other lymphocyte subpopulations intact. after injection of klh in complete freund's adjuvant, these t,depleted mice were unable to produce anti-klh antibodies. long-lasting unresponsiveness against klh (12 weeks) was observed despite the apparent regeneration of the t, population of the lymph node. the results obtained using either fab' or intact gk1.5 antibody were comparable and suggest that a transient depletion of t, does not account entirely for the long-term humoral unresponsiveness. the aim of this study was to gain a more detailed insight into the molecular aspects of antigen processing during the immune response. 
as a first approach, endosomal vesicles were isolated from bovine alveolar macrophages and their proteolytic activity with respect to a model protein antigen, sperm whale myoglobin (mb), was characterized. during the first stage of digestion of mb by the endosomes, a limited number of fragments were preferentially released from the antigen. we have isolated and identified these fragments. the digestion of myoglobin is completely prevented by pepstatin, a specific inhibitor of aspartic proteinases, and only marginally by other proteinase inhibitors. when mb fragments preferentially released upon digestion with purified bovine cathepsin d, an aspartic proteinase abundant in macrophages, were identified, almost all coincided with the fragments released by the endosomes. to define in more detail the selectivity of cathepsin d under the mild conditions applied, other protein antigens were similarly treated with the enzyme and the peptides released were identified. the location of the preferential cleavage sites, when related to known t-cell epitopes, suggests a dominant role for cathepsin d in the processing of protein antigens to yield fragments for presentation to t-cells. possibly, the observed selectivity of the enzyme may account for the structural similarities among t-cell epitopes, noted by others. actively acquired tolerance in mice to the antigens of the mhc (h-2) is induced by exposure of the animals to allogeneic lymphocytes within 24 hours of birth. actively acquired tolerance to the mhc in humans (hla) cannot be studied in the same way. however, we have evidence for the existence of actively acquired tolerance in humans in a study of 26 highly sensitized patients waiting for a renal allograft. they had developed complement dependent antibodies to the hla antigens of almost all unrelated caucasoid donors. the sera of these highly sensitized patients were tested against a panel of lymphocytes that were mismatched for only one hla class i antigen. 
we found for these 73 patients hla class i antigens that, although different from those present in the recipient, did not lead to a positive crossmatch. we called such antigens "permissible mismatches" and show that they often included those hla antigens of the patient's mother that the patient had not inherited (noninherited maternal antigens; nima). in 15 of the 26 patients, the permissible class i mismatches included the nimas. the noninherited paternal antigens (nipas) were analyzed as a control; only two of the 25 nipas tested were acceptable mismatches, which emphasized the preferential nonresponsiveness to nima. recent experiments indicate that what holds true for antibody formation also holds true for t cell activation. of hla class i and hla class ii allospecific cd8-positive ctl clones. monoclonal antibodies (mcab) directed against the cd8 structure were only found to inhibit antigen-specific cytotoxicity of a series of class i allospecific cd8-positive ctl clones and not of a class ii allospecific cd8-positive ctl clone. however, cytotoxicity induced by cd3 mcab (used at suboptimal concentrations) or cd2 mcabs in both types of ctl clone was blocked by cd8 mcabs. the absence of cd8 mcab blocking of antigen-specific cytotoxicity of the class ii-specific cd8-positive ctl clone may be explained by assuming that it results from a triggering signal which is too strong to be overcome by the down-regulatory signal of the cd8 antigen. these combined findings clearly suggest a functional involvement of cd8 not only in tcr/cd3 activation, but also in tcr/cd3 controlled alternative activation routes, such as the cd2 activation pathway. moreover, they show that even an hla class ii allospecific cd8-positive ctl clone expresses a functionally active cd8 antigen.
the absence of hla class i expression on the target cells (daudi cells) used in the experiments described indicates that the cd8 antigen does not act solely in an adhesion-like fashion, but also exhibits a more general regulatory function in t-cell activation. this regulatory role of cd8 may be explained by assuming the induction of a threshold for activation, which is triggered after binding of cd8 mcab or binding to its natural ligand, hla class i. in our view, cd8-mediated regulation of t-cell activation could therefore prevent non-specific triggering of cytotoxicity by interactions of insufficient affinity. *this study was supported by a grant from the dutch kidney foundation. the cd4 t-cell surface antigen is felt to have the dual function of stabilizing the interaction of the t-lymphocyte with the antigen presenting cell (apc) as well as transducing an independent signal that can potentiate the activation related alterations generated through the t-cell receptor. we have found that upon antibody-mediated cross-linking of the cd4 molecules of cloned murine t-lymphocytes there is a time and temperature dependent decrease in the abundance of the lymphocyte-specific tyrosine kinase p56lck. this co-modulation is specific for cd4 and p56lck since cross-linking of other t-cell surface antigens (cd3, t200, thy1.2) does not result in detectable alterations in the abundance of the lck protein and since cd4 cross-linking does not induce any alteration in the abundance of p60*, another src-related tyrosine kinase highly expressed in t-cells. such data suggest that cd4 and the internal membrane lck protein are in close proximity within the cell. further analysis has revealed that significant amounts of lck can be immunoprecipitated by anti-cd4 antibodies. in addition, cd4 can be specifically precipitated by anti-lck antibodies. our data imply that cd4 and p56lck are physically associated in cd4+ t-lymphocytes.
the findings that cd4 is associated with the lck protein in either murine or human t-cells and that cd8 is also complexed to p56lck in cd8+ t-cells suggest that the lck tyrosine kinase is involved in the function of the cd4 and cd8 accessory molecules. these apc do not appear to present processed klsa determinants. in light of these findings, of apparent interest is the issue of which cell types are responsible for hlsa-specific t cell tolerance induction. studies in mice treated from birth with anti-μ antibodies suggest an important but perhaps not exclusive role for b cells in this process. we are currently pursuing the identity of other cell types which may be involved. in addition, immunogenicity c173 university of texas southwestern medical center at dallas, dallas, texas 75235 q10 is a soluble class i-like major histocompatibility antigen produced specifically by the liver. previously, it has been shown that mice possessing soluble q10 can generate anti-q10 cytotoxic t lymphocytes (ctl), suggesting that this soluble molecule does not function as a tolerogen. we have recently constructed c3h transgenic animals which express an exon shuffled q10 (α1, α2)/ld (α3, tm) molecule. this q10/ld molecule is expressed specifically in the liver on hepatocytes but not on nonparenchymal liver cells, spleen, thymus, kidney or brain. the expression of q10/ld in the transgenic hepatocytes is equivalent to ld expression on balb/c hepatocytes, suggesting the animals are expressing physiologic levels of the transgene. the presence of membrane bound q10/ld in c3h animals has not caused anti-q10 ctl precursors to be deleted, however, because primary in vitro ctl assays show these transgenic animals can specifically lyse q10/ld targets. histopathologic examination of the livers of these animals does not show extensive lymphocytic infiltration or inflammation. in addition, serum levels of alanine aminotransferase,
aspartate aminotransferase, and alkaline phosphatase are also normal, confirming that these animals do not show overt signs of liver rejection. results are compatible with at least two different pathways of antigen handling, a pathway for degradation of antigen, and a "processing" pathway for antigen presentation. my+ l. monocytogenes appear to interfere with the processing pathway, either by inhibiting production of antigenic material that can associate with ia or by inhibiting putative intracellular event(s) involving the binding of ia to processed antigen and transport of complexes to the cell surface. the immunogenicity and antigenicity of synthetic peptides (sp) derived from the sequences of a streptococcal antigen were investigated in macaque monkeys. immunization with the free peptides of 17 and 21 residues failed to elicit serum antibodies or t cell responses. however, both serum antibodies and lymphocyte responses were elicited by immunization with the sp linked to tetanus toxoid (tt) as a carrier. indeed, sp17-tt and sp21-tt elicited serum antibodies and proliferative responses of lymphocytes, not only to the sp but also to the native streptococcal antigen. recall of sp17-tt or sp21-tt immunized monkeys with suboptimal doses of the native streptococcal antigen resulted in a significant increase in antibodies, both to the sp and native antigen, confirming that the two sp share antigenic epitopes with the native antigen. the b and t cell epitopes were then determined: the b cell epitope resides in residues 8-13, whereas the t cell epitope overlaps it and consists of residues 7-15. the t cell epitope has an amino-terminal leucine and carboxy-terminal glycine and alanine added to residues 8-13 of the b cell epitope. in spite of the b and t cell epitopes being expressed in sp17 (residues 1-15), the monomer failed to induce serum antibodies without a carrier.
however, immunization with dimers of peptide-linked or disulphide-linked residues 1-15, without a carrier, elicited both serum antibodies and proliferative responses of lymphocytes. the results suggest that the monomeric sp17 is not immunogenic, whereas the dimeric peptide elicits both antibodies and t cell responses. the minimal t cell-b cell structure required for immunogenicity is now being determined. immunogenicity section a departments of immunology and rheumatology, mayo clinic, rochester, mn 55905. susceptibility to collagen induced arthritis (cia) in mice maps to the i-a loci in h-2q mice. however, swr (h-2q) mice are cia resistant, suggesting a role of non-mhc genes. we have recently shown gene complementation between h-2q from swr and tcr v genes from several non-susceptible strains. cia has been induced in b10, c3h.a and a backcrosses with swr with similar high incidences: 63, 71 and 67% respectively. c57l shares a similar background with b10 and is h-2b, but has the same v tcr mutation as swr. backcrosses showed a very low incidence (17%) of cia, and the arthritis observed was of a much milder and transient nature. the invariant chain associated with hla class ii molecules is a 31-33 kd glycoprotein implicated in antigen processing and in the assembly and intracellular transport of class ii molecules. class ii molecules and invariant chain are expressed primarily by b lymphocytes and antigen-presenting cells such as macrophages and can be induced by interferon-γ in a variety of cell types. to define sequences involved in the human invariant chain gene regulation, 790 bp 5' to the initiation of transcription were subcloned upstream of the cat gene. transfection into invariant chain-producing cell lines and non-producing cell lines demonstrated that this 5' region displayed tissue specificity and responsiveness to interferon-γ.
deletion mutants were constructed to ascertain the functional properties of specific regions of the invariant chain upstream regulatory regions. these deletion mutants have led to the identification of 3 putative regulatory regions: 394 to 239, 239 to 216, and 216 to 165 bp 5' to the cap site of the invariant chain gene. deletion of any one of these 3 regions results in decreased cat activity. protein-dna interactions of these sequences have been characterized by mobility gel shift assay and dnase i footprinting. two regions have been identified that exhibit cell type dependent binding of nuclear proteins. two color flow cytometry was used to characterize the surface phenotypes of human bronchoalveolar lymphocytes (n=32). the cd4/cd8 ratio was highly variable (0.3-6.6, mean=2.1). a high proportion of the t cells expressed hla-dr (9-38%, mean=21%), indicative of t cell activation. however, detectable levels of the il-2 receptor were expressed on <3% of the cells. cd45r was absent from cd4 cells in most preparations (0-10%, mean=3%), suggesting that the cells are inducers of ig synthesis. uchl1, a marker of memory cells, was present on 68-100% of lung t cells. uchl1+ cd45r- lung lymphocytes responded poorly to pha and cona but did respond to il-2 in the presence of accessory cells. together these data suggest that lung lymphocytes are recently activated memory cells. il-2 induced lung t cell lines were also characterized for antigen expression and lak activity. high lak activity was obtained in preparations containing a high proportion of cd8 cells. these cultures appeared to be suicidal. in contrast, lines with a high proportion of cd4+ cells had low or absent lak activity but proliferated in the presence of il-2 for at least 3 months, expressing a cd2 cd45r- phenotype. this abstract is a proposed presentation and does not necessarily reflect epa policy.
the polymorphic second exons of the hla-dpα and dpβ genes have been specifically amplified in vitro by the polymerase chain reaction (pcr) method, using the thermostable dna polymerase of thermus aquaticus. sequence analysis of m13 clones containing the amplified dp sequences from a panel of thirty-four dp typed cell lines revealed only the two previously characterized alleles for dpα. fourteen allelic variants were defined for dpβ; eight of these are associated with the t-cell defined dpw1-6 types; two subtypes were found for both dpw2 and dpw4. six additional dp alleles which were previously typed in the t cell assay as blanks were also identified. based on this sequence information, non-isotopic sequence specific oligonucleotide probes have been developed and used to type a margarita betz, dominic dordai, brian e. lacy, and barbara s. fox. department of medicine, university of maryland school of medicine, baltimore md 21201. murine type 2 helper t cells (th2) secrete interleukin 4 (il 4) in response to antigen. despite the likely importance of these cells, little is known about their priming and expansion in vivo. we have demonstrated il 4 production in response to a cytochrome c peptide following t cell expansion in vitro. this antigen has not previously been shown to induce th2 cells. b10.a mice were primed with a peptide fragment of pigeon cytochrome c in cfa. lymph node cells were restimulated in vitro with antigen for 5-7 days, ficolled and rested for 3 days without antigen. cells were then tested by limiting dilution for the presence of antigen-specific il 4 producing cells. il 4 was detected using the il 4 sensitive cell line ct4s (provided by dr. w. e. paul, nih). the specificity of the response was confirmed by blocking with the anti-il 4 antibody 11b11. following in vitro restimulation of the primed lymphocytes with antigen, il 4 production was detectable from as few as 10^3 cells per well.
il 4 secretion was antigen dependent and required both in vivo priming and restimulation in order to be detected. it is not clear why primed lymph node cells, placed in limiting dilution culture directly after removal from the animal, failed to secrete detectable amounts of il 4 in response to antigen. suppression is an unlikely mechanism as fresh primed lymph node cells were unable to inhibit il 4 production by restimulated cells. we are now investigating the factors that may regulate the development of il 4 producing t cells. mark r. boothby, ellen gravallese, hsiou-chi liou, and laurie h. glimcher, department of cancer biology, harvard school of public health, boston, ma 02115. regulated pattern, and normally expression is limited to certain cell types such as b cells and macrophages. the differentiation of b cells to plasma cells is accompanied by the loss of class ii mhc expression. these genes also respond to external stimuli such as the cytokine il-4, which increases b cell ia. a region of the aα mhc gene activated expression of a cat reporter gene in a b lymphoma cell line but not in a myeloma cell line. a nuclear protein that bound to two sites within this region was found. this binding activity was present in spleens that lack t cells and in b cell lines, but it was absent from all three myeloma cell lines tested. il-4 treatment of normal and athymic mouse spleen cells greatly increased the binding of this nuclear protein to its aα target sites, concomitant with increased aα transcription; thus, b cells contain a sequence-specific binding activity regulated both by il-4 and by differentiation. immunogenicity c210 peter van den elsen3. 1department of immunology, the netherlands cancer institute, amsterdam, 2department of immunology, erasmus university, rotterdam, 3department of immunohaematology, academic hospital, leiden, the netherlands.
human tcr γδ occurs in disulphide-linked (type 1) or non-disulphide-linked (type 2) forms, dependent on the use of the cγ1 or cγ2 gene segment. the cγ2 gene segment can contain a duplication or triplication of exon 2, which gives rise to different protein forms (types 2bc or 2abc). it is not known whether functional differences exist between these receptor types. protein chemical analysis of type 1 and type 2bc receptors on functional human t cell clones derived from peripheral blood (pb) has indicated that not only the γ chains, but also the δ chains have a molecular mass and charge which set apart type 1 and type 2bc receptors. two sets of fifteen clones were randomly generated from pb of two normal donors after selection with the anti-tcr γδ-1 mab, which recognizes all receptor types. dna rearrangement and mrna expression analysis of γ and δ genes allowed us to map the specificity of the anti-tcr γδ mabs δtcs-1 and tiγa to the vδ1 and vγ9 gene segments respectively. subsequently it could be concluded from the analysis of random clones that the majority of type 1 receptors use vγ9, while this preference seems absent in type 2 receptors. the great majority of type 1 receptors do not use vδ1, while the majority of type 2 receptors do. this was confirmed by fluorescence analysis of pbl of a large panel of normal donors. we conclude that vγ and vδ gene segments in functional tcr γδ in pb are used in non-random combination and that their expression is correlated with rearrangement of the γ gene to cγ1 or cγ2. we have previously shown that polymorphic residues in the nh2-terminal half of the β1 domain (amino acids 1-48; hypervariable regions 1 and 2 [βhv1 and 2]) determine with which allelic or isotypic α chain a particular β chain can achieve efficient cell surface heterodimer expression. this result might be understood in terms of the current model for ia structure which predicts that αhv1 would lie adjacent to this region.
therefore, to examine the role of αhv1 residues in controlling heterodimer expression, a mutant aαd cdna was created in which the codon for amino acid 11 was mutated to code for the aαk residue at this position. in addition, recombinant aαd and aαk cdnas, in which the segments encoding the three α hypervariable regions were exchanged between the two alleles, were used to study the contributions of other α chain polymorphisms to this process. interestingly, the polymorphic residues in αhv2 are predicted to lie in a region of the aα chain α-helix which is adjacent to the βhv4 region of the β chain α-helix. allelic substitutions in this latter region of aβd have been shown to similarly affect surface ia heterodimer expression. taken together, these results suggest that there are at least two spatially separate areas in which the α and β chains interact and that these interactions are affected by polymorphic residues in both areas, contributing to the efficiency of heterodimer expression and, most likely, ia quaternary conformation. the aim of this project is to identify contact residues of the t cell receptor (tcr) with antigen and/or mhc class ii molecules. as a model system, a vβ17-containing tcr has been chosen since the majority of vβ17+ t cell hybrids react with ie molecules of the k, s, d, and b haplotypes. t cell hybrids have been made which have a dual reactivity: they are vβ17+ and recognize ie molecules but also show reactivity towards a known antigen, namely chicken ovalbumin (ova). one such hybrid has been mutagenized with ethyl methane sulfonate (ems). mutants were selected on the basis of their survival after stimulation by either antigen or ie. it was expected that mutations in all the different kinds of genes involved in t cell recognition and t cell activation would be found. mutants obtained fall into two major groups: 1) loss variants of tcr α or β chains, t3 or l3t4; 2) mutants with point mutations in one of these genes.
we are currently analyzing the mutants biochemically and functionally in order to identify the particular gene affected. point mutations in the α and β genes of tcr mutants will be localized using the polymerase chain reaction in combination with dideoxy sequencing. activation of t lymphocytes requires the intracellular fragmentation of foreign antigens and their presentation by class i or class ii major histocompatibility complex glycoproteins. the direct binding of peptides to class ii molecules has been shown in a number of experimental systems and its specificity compared to that of t cell activation. in contrast, direct binding of peptides to class i molecules has been difficult to detect, although peptide sensitization experiments and the crystallographic structure of hla-a2 persuasively argue for its occurrence and importance. in this study, we demonstrate specific binding to hla-a2 of an influenza matrix peptide (flu-m1 residues 56-68) that has previously been shown to act as a target for certain hla-a2 restricted influenza-specific cytotoxic t lymphocytes. we estimate that less than 0.3% of the purified hla-a2 molecules were able to bind the added peptide. we and others have shown that allorecognition by cytolytic t lymphocytes (ctl) is analogous to t cell recognition of foreign antigens in that both can occur via presentation of antigenic peptides by products of the major histocompatibility complex. we have used peptides corresponding to the alpha1 helix of selected hla molecules to analyze t cell recognition of this polymorphic region. the alpha1 helices of hla-b44 and -b13 are identical, and show a high degree of homology with those of hla-bw58, -b47, and -b27. peripheral blood lymphocytes from 5 normal donors were stimulated in vitro with targets expressing hla-b44 to derive allospecific ctl lines and clones. in some individuals, the allospecific response was almost totally directed against the alpha1 helix.
the ability of peptides corresponding to the alpha1 helix of these hla molecules to inhibit and induce lysis as well as to modify other assays of t cell activation will be discussed. diseases and *national institute of child health and human development, bethesda, md 20892. classical transplantation antigens are constitutively expressed on cells of all tissues except brain. transcription is regulated by the interaction of nuclear factors with 5' flanking regions that include the class i regulatory element (cre). previously, the cre has been divided into 3 regions on the basis of nuclear factor binding. several studies have implicated the nuclear protein (ri) which binds to the inverted repeat (tggggattcccca) of region i as necessary for gene transcription. although region i is identical in all sequenced mouse k, d and l genes, it is not conserved in qa region genes. a comparison of the cre from h-2ld with that of q10, a qa region gene expressed only in the liver and fetal yolk sac, shows that there are two changes within the inverted repeat sequence (tgaggactcccca). these differences disrupt the dyad symmetry. another nucleotide difference between h-2ld and q10 falls within region ii of the cre. however, q10 can bind to the nuclear factor (rii) that binds to region ii of the h-2ld cre, whereas q10 region i cannot bind to the nuclear factor (ri) that binds to the region i inverted repeat. to test whether the differences in region i contribute to the restricted tissue expression of q10, we have used site-directed in vitro mutagenesis to make the inverted repeat of q10 region i like that of the classical class i genes. a change at either base enhances transcription as measured in a transient transfection system. either change also allows binding of the nuclear factor that binds to the classical region i sequence. thus, alterations in the cre region i contribute to the limited tissue expression of q10. the presence of disrupted cre region i in other qa region genes likely contributes to their tissue restricted expression.
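the notion of dyad symmetry used in the cre region i abstract above can be made concrete with a short sketch. it is illustrative only and is not the authors' analysis: the two 13-base strings are an assumed reading of the partly illegible printed sequences, and the function names `dyad_mismatches` and `substitutions` are hypothetical helpers introduced here.

```python
# minimal sketch: quantify dyad symmetry of a short dna element.
# the sequences below are an ASSUMED reading of the garbled source text.

COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def dyad_mismatches(seq):
    """Count arm positions whose base is not complementary to the base
    at the mirrored position. For odd lengths the central base is ignored,
    since a single base cannot pair with itself."""
    seq = seq.lower()
    n = len(seq)
    return sum(
        1 for i in range(n // 2)
        if COMPLEMENT[seq[i]] != seq[n - 1 - i]
    )


def substitutions(a, b):
    """List (position, base_in_a, base_in_b) where two equal-length
    sequences differ (0-based positions)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(a.lower(), b.lower()))
        if x != y
    ]


classical_region_i = "tggggattcccca"  # assumed classical class i inverted repeat
q10_region_i = "tgaggactcccca"        # assumed q10 counterpart

print(dyad_mismatches(classical_region_i))  # arms fully complementary
print(dyad_mismatches(q10_region_i))        # one arm pair disrupted
print(substitutions(classical_region_i, q10_region_i))
```

under this assumed reading the two elements differ at two positions, one in an arm of the repeat and one at the central base; an arm-pairing count registers only the first, so it should be taken as a rough descriptor of symmetry rather than a predictor of factor binding.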
molecular analysis of t cell receptor structure/function in sperm whale myoglobin specific t cell clones. jayne s. danska, alexandra m. livingstone, toshi isihara and c. garrison fathman, stanford university medical school, stanford, ca. we have undertaken structural characterization of the t cell receptors (tcr) utilized by a well defined panel of murine dba/2 t cell clones that recognize epitopes within the 110-120 peptide of sperm whale myoglobin (spwmb) presented by i-ad or i-ed. only 2 of 14 independent clones show alloreactivity for 10 mhc haplotypes. using the polymerase chain reaction (pcr) and dna sequencing of the tcr α and β chains from matched sets of clones bearing either mhc restriction or epitope specificity in common, we are addressing the structural relationship between tcr and mhc/antigen for this model system. among 6 i-ed restricted t cell clones reactive with spwmb 110-120, all have highly homologous tcr β chains associated with a minimum of three different tcr α chains, some of which are derived from novel v gene families. to further characterize the specificity of these clones we are generating substituted peptides to identify residues within the epitope important for interaction with tcr or restricting mhc molecule. functional verification of the relationship between given tcr primary sequences and recognition capability will be addressed by transfer of the α and/or β chain cdnas created by pcr amplification into t-cell hybridomas expressing endogenous tcr genes of known sequence and specificity. with mhc fine specificity. the differential impact of substitutions with the n-terminal and c-terminal portions of the apl domain is consistent with models of auap structure in which the n-terminus interacts with peptide while the c-terminus interacts with both peptide and the tcr. or aauapu-restricted t cell clones. the antigens tested were l-tyrosine-p- immunogenicity c 218 expression of the q4p gene, patricia m. day, katherine e.
lapan and jeffrey a. frelinger, department of microbiology and immunology, university of north carolina at chapel hill, chapel hill, nc 27599. generally, the transcription of class i genes from the qa region is limited to tissues of hematopoietic origin. previous work in our lab demonstrated widespread transcription of the q4 gene in the b10.p mouse, with high levels of mrna found in liver, lung, lymph node, spleen, testes and thymus. less rna was present in muscle and brain tissues. however, it is not known which individual cell types within these tissues are responsible for the transcription of the q4 gene. we raised polyclonal antisera against a synthetic peptide, derived from the predicted amino acid sequence of the q4p transmembrane region. we selected this region since it is the most locus specific. this antiserum immunoprecipitates a class i-sized protein. a monoclonal antibody, directed against the same peptide, has also been produced. sv40-transformed h-2p fibroblasts show an abundance of q4 message. surprisingly, indirect immunofluorescent staining with the monoclonal antibody reveals a cytoplasmic localization of the protein with a perinuclear concentration. different patterns have been observed in examination of the h-2b embryonal carcinoma cell lines 402ax and pcc4. q4-specific antibodies allow us to identify the cell types which express the q4 gene product. the application of in situ hybridization techniques will correlate the cellular site of mrna synthesis and protein detected by antibodies. understanding the pattern of expression of the q4 gene is the first step in determining the so far elusive function of these mhc genes. and il2r mrna after mitogenic stimulation. antibodies against cd45r, but not against cd45 common determinants, synergise with suboptimal doses of mitogen to induce il2 and il2r mrna expression, suggesting that cd45r molecules are operative in transmembrane signalling in immature thymocytes.
there is also an indication from northerns using cd45 probes that cd45p180 mrna is not induced in activated cd3-4-8- thymocytes as it is in mature t cells. these results support the idea that cd45r+ molecules are essential for generating signals required for cell survival within the productive intrathymic lineage. we examined a panel of th1 and th2 t cell clones for the ability to induce antibody synthesis in a mishell-dutton culture system under cognate b-t cell conditions: our findings indicate that both th1 and th2 t cells are heterogeneous, i.e., some but not all th1 and some but not all th2 clones have the capacity to induce antibody under these conditions. we examined the effect of γ-irradiation or mitomycin-c pretreatment of our th1 and th2 clones on their ability to induce antibody synthesis. asano et al. (j. immunol 138:667) have reported that th2 clones are exceedingly sensitive to γ-irradiation, with doses as low as 500 rads abrogating the ability of th2 clones to induce antibody synthesis. we found that while the helper activity of th2 clones was very radiation sensitive, helper activity in th1 clones was very radiation resistant. th1 clones given 2000 rads of irradiation were as effective as unirradiated clones in inducing anti-tnp plaque forming cells (pfc). moreover, when used at higher t cell/b cell ratios in culture, irradiated th1 clones were more effective than unirradiated clones in inducing antibody synthesis. the effect of γ-irradiation on th2 clones was not simply due to inhibition of proliferation, since mitomycin-c pretreatment of the clones had little effect on helper activity. conservation, alexander l. dent, pamela j. fink and stephen m. hedrick, department of biology, university of california, san diego, ca 92093. the β chain gene of the murine t cell receptor has been shown previously to have an alternatively spliced form of message.
this message contains a novel exon, termed cβ0, which is inserted between the vdj and constant region exons. we have studied expression of the cβ0 exon at the mrna level by rnase protection. we have found that about 1% or less of β messages in normal t cell clones contain the cβ0 exon, whereas β messages in the thymus contain the exon at 10-20 fold higher levels. to address the importance of the cβ0 exon in the immune system, we have undertaken a phylogenetic approach. by cloning and sequencing the rat analogue of cβ0, we have found that while the rat exon is very similar to the mouse exon, both donor and acceptor rna splice signals are defective in the rat cβ0 gene. this implies that rat cβ0 cannot be spliced into rat β messages. furthermore, we have sequenced the analogous region to mouse cβ0 in the human β chain locus, and have found no stretch of sequence remotely homologous to cβ0. because cβ0 is not conserved evolutionarily, we believe that the cβ0 gene element does not serve an important function in the immune system of most vertebrates. 1988). we found that one amino acid substitution at a vdjβ junctional region position found to be highly conserved in pigeon cytochrome c-specific tcrs results in a change in antigen fine specificity, while another change abolishes all detectable responses characteristic of the d6 tcr. we will present the results of mutagenesis/transfection analyses of two other pigeon cytochrome c-specific tcrs. the murine ctl response to human class i molecules is 1-2 orders of magnitude lower than the response to murine alloantigens, due to structural differences between human and murine homologs. we investigated whether this discrepancy could be overcome by exposure of developing t cells to human class i molecules in transgenic c57bl/6 mice expressing hla-a2.1. ! c222 zhe bdixujiar basis of .
lbve digiustn ard ed p d l u b 2 r , divof such mice expressed hla-a2.1 in spleen, bone marrow and thymus at levels similar to those of endogenous h-2 molecules. however, the frequency of ctl specific for other human alloantigens remained similar to that of normal mice. the frequency of hla-a2.1 restricted influenza specific ctl was 1-2 orders of magnitude less than the frequency of h-2 restricted ctl. these results indicate that the poor response of murine ctl to human class i antigens is not determined by selection in the thymus, but by species-specific constraints on the interaction of mhc antigens with t-cell recognition structures. while the mice are tolerant to hla-a2.1 expressed on murine cells, they still respond to hla-a2.1 expressed on human cells. the epitopes defined by such clones are present on hla-a2.1 positive human cells derived from several different tissues. such epitopes are not dependent upon the species of β2m associated with the class i molecule, nor upon the structure of the attached carbohydrate. the results suggest that one or more highly conserved normal human proteins contribute to the formation of such epitopes, and provide an explanation for the failure of ctl raised against class i molecules on human cells to recognize the same molecules expressed on murine transfectants. this suggests that normal endogenously expressed molecules may also be important in the formation of epitopes on class i antigens recognized by allospecific ctl. we previously demonstrated that several subclones derived from a cd3+, cd4-/cd8- t-cell line have undergone secondary rearrangements at the t-cell receptor (tcr) α locus while maintaining its original tcrβ and igh d-j rearrangements (marolleau et al., in press). these secondary rearrangements result in the joining of germline vα and jα gene segments which replace the pre-existing vα-jα complexes of the parental t-cell line.
in an effort to examine the molecular mechanism responsible for these vα-jα gene replacements, the structures of tcrα cdnas prepared from both the parental and subcloned t-cell lines were determined. in addition, northern blot and southern blot analyses were performed on both the parental and subcloned t-cell lines using a panel of vα and jα probes. our results indicate that: 1) secondary rearrangements result in both productive and non-productive vα-jα joins, 2) the mechanism whereby secondary rearrangements occur is a deletion event that involves germline vα genes 5' to the pre-existing vα-jα complex joining to jα
the class ii major histocompatibility complex (mhc) antigens are a family of integral membrane proteins whose expression is tissue-specific and developmentally regulated. a pair of consensus sequences, x and y, separated by an interspace element, is found upstream of all class ii genes. deletion of each of these sequences eliminates expression of class ii genes in vitro or in transgenic mice (1-3). furthermore, the absence of a specific binding protein for the hla-drα x box in patients with severe combined immunodeficiency disease whose cells lack class ii suggests a critical role for these proteins in class ii gene transcription (4). we report the cloning of a λgt11 cdna encoding a dna binding protein (human x-box binding protein, hxbp-1) which, like the proteins in whole nuclear extract, recognizes both the x box and interspace elements of the human drα and murine aα genes. the hxbp-1 cdna hybridizes to two rna species, 2.2 kb and 1.8 kb in human, that are expressed in both class ii positive and class ii negative cells. hxbp-1 does not cross-hybridize to two murine aα x box binding cdnas recently isolated in our laboratory which also recognize the drα and aα x boxes.
these observations provide evidence for the existence of multiple x box binding proteins which recognize a common or overlapping motif. chromosome mapping studies demonstrate that hxbp-1 arises from a multi-gene family, two of whose members map to human chromosomes 5 and 22. taken together, these data suggest a high degree of complexity in the transcriptional control of the class ii gene family.
france. in an attempt to analyze positive or negative in vivo regulation of clonal expansion of cytolytic t lymphocytes (ctl), we immunized b10.br mice with the kb-specific ctl clone kb5-c20, and we tested whether t cells obtained from such mice would influence the in vitro development of the ctl clone kb5-c20. a clone-specific helper effect has been observed, which is mediated by cd4+ splenic cells from immunized mice. control immunizations of b10.br mice with ti negative variants of suggest that this growth regulation involves the recognition of the ti of kb5-c20. the precise nature of the antigen recognized on the ctl clone and the possible involvement of ti determinants with or without mhc products are now under investigation.
we have shown that thy-1+ dendritic cells present in the epidermis of mice (dec) express cd3 associated vγ3 and vδ1 gene products. we have produced a monoclonal antibody directed against vγ3 and found that a wave of cells appearing at the earliest stages of fetal thymic development express vγ3. phenotypic and functional analysis of vγ3+ cells in the early fetal thymus indicates that they have characteristics in common with the vγ3+ dec. both populations express high levels of ly-1 and are ly-6c+. neither expresses cd4 or cd8. interestingly, the vγ3+ fetal cells express elevated levels of il-2 receptor, indicating that they may have been activated. functional analysis demonstrated that, unlike other fetal thymocytes, the vγ3+ cells can be stimulated to produce lymphokines and lyse a panel of target cells which are also lysed by the adult thy-1+ dec.
these results raise the intriguing possibility that the first receptor-bearing component of the t cell system to appear in ontogeny might give rise to the thy-1+ dec.
murine tcr gamma genes: distinct spatial restriction of v-gene segment usage, adrian hayday*, susan kyes*, simon carding#, charles a. janeway,
critical to an understanding of the function of cells bearing the gamma-delta t cell receptor will be an understanding of when and where such cells function. in order to investigate this, we have used a variety of techniques (in situ hybridisation, cdna cloning, and pcr) to examine areas of tcr gamma delta gene expression. one conspicuous site of expression is the intestinal epithelium, which is by contrast almost devoid of tcr alpha beta expression. interestingly, the v gene segment usage in this location is quite specific and is different to the specificity that we have found in the spleen and in the thymus, and that others have found in the skin. this specificity suggests in turn that expression of the restricting elements recognised by gamma delta may also be spatially restricted. extensive analysis of junctional diversity can provide information on the diversity of antigen recognised by gamma delta. the basis for selective expression of v gene segments may in part lie in different requirements of the v-gamma gene promoters. to examine this, the transcriptional capabilities of the various gamma gene promoters in t cells are being compared by linkage to the chloramphenicol acetyl transferase gene.
biochemistry, university of wisconsin-madison, madison, wi 53706. murine strain a sublines a/j and a/wysnj have a genetic polymorphism that regulates serum immunoglobulin responses to several protein antigens. strain a/j secondary igg1 responses to bovine gamma globulin, ovalbumin, hemocyanin and galactosidase were 50-, 10-, i - and 4-fold greater, respectively, than a/wysnj responses. subline a/hej is a low responding strain like a/wysnj.
analysis of h-2 class i and class ii molecules provided no evidence for a breeding error to account for the genetic polymorphism. instead, an important immune response gene outside h-2 may have been heterozygous when the sublines diverged, and the polymorphism resulted from segregation and differential allele fixation. a mutation subsequent to subline divergence is also a possible source of the polymorphism, but is less likely. the high responder phenotype inheritance pattern in (a/wysnj x a/j)f1, f2, and backcross mice was consistent with segregation of a single, recessive gene. we named this locus l a for the strain a sublines that define it; strain a/j represents the k a h allele and strains a/wysnj and a/hej represent the allele. the secondary igg2a responses were also affected.
hayes, keith d. hanson, faye nashold, and david j. miller, department of
several different proteolytic digests of denatured seb have been tested for their ability to stimulate t cell hybrids to produce il-2. a tryptic digest that retains activity has been fractionated by hplc and the stimulatory component is being analyzed. examination of the amino acid sequence of seb and the proteolytic cleavage sites has led us to synthesize several peptides for analysis. these peptides, and their analogues, will be tested for their function in vivo and in vitro.
to understand the interactions involved in the formation of peptide-mhc complexes, an assay has been developed to detect dr specific binding of peptide analogues of t cell determinants to cell surfaces. ebv transformed b cell lines (bcls) were incubated with biotinylated peptide followed by fitc conjugated streptavidin, and then analysed by flow cytometry. a panel of bcls homozygous for different dr types bound analogues of peptide 307-319 from influenza virus haemagglutinin (previously shown to be a helper t cell determinant restricted through dr1) to varying degrees, whereas no binding was observed to the dr- bcl rj225.
binding could be specifically inhibited by the natural unbiotinylated t cell determinant or other dr1 restricted determinants. competition by a range of peptides revealed quantitative differences in their ability to bind dr1. the assay is currently being used to generate a detailed model of the complex formed between ha 307-319 and dr1.
and trp for leu 156. and hla-a2.3 differs from hla-a2.1 by the substitutions of thr for ala 149, glu for val 152 and trp for leu 156. residues 9 and 95 in the β-sheet of the molecule, and residues 149, 152, and 156 in the α-helix are thought to interact with bound peptide or the tcr. to evaluate the role of these residues on ctl-defined epitopes, two genes were constructed that encoded novel molecules which differ from hla-a2.1 only at residues 9, 43, and 95, or at residue 156. the effect of α-helix substitutions on serologic and ctl-defined epitopes that varied between hla-a2.1 and hla-a2.3 was evaluated by constructing genes that encoded the individual differences at residues 149, 152, and 156, as well as additional non-naturally occurring substitutions at these same positions. hla-a2.1 specific ctl were found that were: (1) insensitive to substitutions at either residues 9, 43, and 95, or residue 156, but were lost when all four positions were changed; (2) dependent upon residues 9, 43, and 95, but not residue 156; (3) dependent upon residue 156, but not residues 9, 43, and 95; and (4) dependent upon residues 9, 43, 95, and residue 156. further epitope mapping with the α-helix mutants demonstrated that a substitution at residue 152 often destroys an epitope not affected by substitution at residue 156. even conservative substitutions at position 152 were more disruptive than nonconservative changes at residue 156. residue 149, while important in defining an mab epitope, had no effect on any ctl epitopes.
these results indicate that spatially separate residues in the α-helix and β-sheet of the molecule can contribute to the epitope recognized by a given ctl. furthermore, considerable complexity must exist in the spectrum of t cell receptors utilized to recognize hla-a2, as 28 ctl clones exhibited 21 distinct fine specificity patterns.
to follow the evolution of these class i types, to discern the chief selective pressures on its members and thus indicate the probable functional properties of the antigens. a cosmid library was screened for class i genes. 91 clones were mapped and could be grouped into 17 clusters of contiguous dna spanning 1,264 kb. by hybridisation studies, 61 class i genes/gene fragments could be distinguished. transfection analysis revealed that 10 genes could be expressed as cell surface antigens: two genes, in a block of duplicated dna, encoded serologically defined rt1.c products; the other 8 genes gave rise to novel class i antigens detected by the xeno-antibody ox18. using region specific probes, we could detect clear rat homologues of the mouse qa and h-2 genes; however, there were only two rat genes with limited homology to the mouse tla genes. the analysis showed extensive remodelling of the class i region in the evolutionary gap between rat and mouse.
while the immunological role of t cells bearing the αβ t cell receptor (tcr) has been well characterized, much less is known about the function of t cells bearing the γδ tcr. we investigated the role of tcr γδ cells in the immune response to complete freund's adjuvant (cfa). after immunizing mice with cfa, we observed a greater than 26-fold increase in the number of tcr γδ cells present in lymph nodes draining the sites of immunization, compared to a 3-4-fold increase in the number of tcr αβ cells.
there were at least three different species of γδ tcr's expressed on these cells in the draining lymph nodes, including two protein products derived from the rearrangements of cγ1 and cp, and one product derived from cγ4. 37% of tcr γδ cells from immunized lymph nodes expressed the il-2 receptor in vivo, and these cells constituted roughly 50% of the proliferative response of total lymph node t cells to il-2. tse et al. (j. immunol., vol. 125, p. 491, 1980) have demonstrated that at least three cell types are involved in the t cell proliferative response to antigen, including an antigen-specific t cell, an antigen-presenting cell, and a t cell that is found in unprimed lymph nodes or spleen, which has been termed the recruitable cell. we have utilized their approach of analyzing the slope of log cell number-log response curves to examine whether tcr γδ cells can function as "recruitable" cells. we found that tcr γδ cells as well as tcr αβ cells can function as recruitable cells in this system. these data suggest that tcr γδ cells can participate in the immune response without being specific for the antigen.
analysis of the membrane associated phosphoprotein profiles of b cells harvested from cultures of resting cells exposed to il-4 for 18-24 hrs reveals the presence of a phosphoprotein with an mr in the range 75-80,000. destroy the autoradiographic signal from this phosphoprotein suggesting that it is phosphorylated upon tyrosine residues. appearance of this molecule, and lps also apparently fails to result in the presence of a 75kd structure in the phosphoprotein profiles. anti-il-4 antibody, 11b11, in the cultures prevents the appearance of the 75kd phosphoprotein.
the genes for tnf-α and tnf-β are tandemly arranged on mouse chromosome 17, with only 1.1 kb separating the 3' end of the tnf-β mrna from the 5' end of the tnf-α mrna. yet, the two genes are independently regulated.
in vitro transcription and nuclear run-on experiments indicate that the two genes are transcribed from independent promoters. in macrophages, which express tnf-α but not tnf-β, only the tnf-α promoter is active. in t lymphocytes, which can synthesize both proteins, both promoters are active. activation of either cell type results in a moderate (up to 10-fold) increase in the level of transcription, while mrna levels increase more than 100-fold under the same conditions. interestingly, the tnf-β gene is transcribed 10-fold less than the tnf-α gene in t lymphocytes, although the corresponding mrna is more abundant. these results indicate that the accumulation of both tnf-α and tnf-β mrna after cell activation and their relative steady state levels are controlled mostly at a post-transcriptional step. actinomycin d chase experiments reveal that tnf-α mrna stability in macrophages is not significantly altered after activation by lps, and therefore that stabilization alone cannot account for the observed accumulation of tnf-α mrna. in order to examine more closely which elements are required for the regulation of tnf-α and tnf-β mrna abundance, we constructed hybrid genes combining putative control regions of tnf-α and tnf-β with known constitutive control elements. results obtained from the transfection of these hybrid genes into various cell types indicate that elements located both 5' and 3' of the coding sequence are required for the proper regulation of tnf-α and tnf-β mrna abundance.
celiac disease is characterized by small intestinal mucosal injury and malabsorption. disease is activated when a genetically susceptible host ingests wheat gliadin or similar proteins (i.e., prolamins) in rye and barley. d region specificities -dr3 and -dqw2. class ii d-region haplotype associated with celiac disease is extended and also includes genes in the hla-dp subregion.
chain gene with those encoding dr3 and dqw2 may indicate that the hla haplotype associated with celiac disease exhibits an unusual degree of linkage disequilibrium or, alternatively, that disease susceptibility involves the gene products of more than one hla locus. to characterize possible hla structural variants unique to celiac disease, the polymorphic second exons of the expressed dr, dq and dp genes were amplified from genomic dna of celiac disease patients, and their nucleotide sequences determined. our studies indicate the presence of a unique constellation of d region genes associated with the celiac haplotype, and exclude the presence of a disease specific dr, dq or dp structural gene variant in this disease. disease susceptibility is strongly associated with the hla class ii we recently determined that the hla
this same population of t cells contains a high frequency 0 1 % ) of cells which will respond to a given allogeneic mhc protein, or to differences at two other genetic loci termed mls, in conjunction with mhc. we have transferred the α and β chain genes from a pigeon cytochrome c/ek specific, alloreactive, and mlsc specific murine t cell clone into an unrelated host t cell. we demonstrate that the genes encoding a single αβ receptor chain pair can transfer the recognition of self mhc molecules complexed with fragments of antigen, allogeneic mhc molecules, and an mlsc (mls-2) encoded determinant. in this case the transfer of antigen specificity and alloreactivity requires a specific αβ receptor chain combination, whereas mlsc reactivity can be transferred with the β chain alone into a recipient expressing a randomly selected α chain. site directed mutagenesis of the jα region has also been performed in an attempt to identify sites involved in the alloreactivity of this t cell clone. in addition, we demonstrate that a single amino acid change in the v-j junction of the αβ receptor can alter mhc restriction as well as antigen fine specificity.
department of genetics, washington university school of medicine, st. louis, mo 63110. the s49 tumor sublines are variants isolated from a single parent balb/c tumor which demonstrate locus-specific shut-off of their kd, dd and ld genes. four phenotypically different sublines were characterized at the dna and rna level. southern blot analysis indicated that no major chromosomal deletions have occurred, and treatment of the sublines with 5-azacytidine had no effect on class i expression. between loci are unlikely. none of the repressed class i antigens could be induced with interferon even though the expressed antigens were fully inducible. northern blot analysis revealed message only for the expressed antigens, showing that the repression mechanism is acting at the transcriptional level. rnase protection analysis confirmed this result and demonstrated that the transcriptional repression is exquisitely specific for the kd, dd and ld genes, as other "class i-like" messages are present in all the cell lines. expressing class i antigens from both fusion partners, but the negative class i antigens originating from the s49 partner were not expressed.
lymphokine gene expression was examined in a panel of 116 short-term murine t lymphocyte clones derived by single-cell micromanipulation from allogeneic mixed leukocyte cultures. about 30% of clonable t cells, including both cd4+cd8- and cd4-cd8+ cells, could be expanded for assay at an average of 22 days after cloning. following stimulation with concanavalin a or anti-cd3 antibody, all clones secreted detectable granulocyte-macrophage colony stimulating factor (gm-csf), interleukin-2 (il-2) and il-3, but cd4+ clones on average secreted higher levels of each lymphokine than cd8+ clones. clones (85%-96%) expressed detectable gm-csf, interferon-γ and il-3 mrna and 11% expressed il-4 mrna.
when the frequencies of co-expression of any pair of lymphokine mrnas were determined, all were found to correspond to the values predicted for random assortment of the individual frequencies. for example, among 13 il-4-positive clones, 11 also transcribed interferon-γ, giving the frequency of double-positive clones expected for random association (9.6% 10.8%). expression of the four lymphokine genes therefore segregated independently among the clones and did not allow the division of t cells into subsets with distinct patterns of lymphokine synthesis.
greater than 20-fold in the adult liver cell line, 2 to 3 fold in the macrophage cell line and just slightly in l-cells. we have subcloned the region 5' to the li gene which contains sequences that may be important in regulating expression of the li gene. this region includes a 15-mer (cctagaaacaagtga) which occurs 5' to many ifn-γ regulated genes. current research has been directed towards identifying and comparing proteins from nuclear extracts prepared from control and ifn-γ treated cells which bind to this region (-260 to -1 1). these data indicate the li molecule may be expressed in cells not known to be directly involved in the immune response.
although there has been considerable interest in the recently identified gamma, delta t cell receptor, relatively little is known as to its function. during our studies of the human immune response to autologous b cell lymphomas, we generated cytotoxic t lymphocytes (ctl) specific for tumor idiotype. these ctl lysed only autologous tumor cells and none of a large panel of other autologous and allogeneic cells. inhibitable by anti-idiotypic and anti-immunoglobulin antibodies but not by a panel of classical anti-mhc antibodies. phenotypic analyses showed that these ctl were cd3+, cd4-, cd8-, and express the delta, and presumably gamma, t cell receptor.
such ctl can be used to gain new insights into the function of the gamma, delta t cell receptor and t cell recognition of immunoglobulin, and may prove clinically useful in adoptive immunotherapy. tumor lysis was
for ebv-induced antigens. furthermore, lcl variant .221, which does not express any hla-a, -b, or -c determinants, is killed by cultures primed to lcl-.180. antibody blocking experiments suggested that this killing was mediated by t cells, and was not restricted by known class i antigens. depletion of leu 19 positive cells from the effector population did not eliminate cytotoxicity on lcl-.221. cold-target blocking studies further suggested that the class ii-nonexpressing lcl-.180 and the class i-nonexpressing lcl-.221 share residual determinant(s) other than hla class i or class ii that can restrict cytotoxic t cell responses to ebv-induced antigens.
national jewish center for immunology and respiratory medicine, denver, co 80206. it is uncertain to what extent lymphokines can be differentially produced by activated primary t cell populations. to determine if il4 and ifn-γ were differentially regulated in uncloned human t cells from adults (ad) and neonates (nt), these mrnas were measured by in situ hybridization after maximal stimulation by ionomycin and pma. il4 mrna was detected in 1.2% of total (tl), 3.5% of cd4+, 10% of cd4+ cd45r-, and 0.1% of cd8+ ad t cells, but in none of the tl, cd4+, or cd8+ nt t cell populations (virtually all nt t cells were cd45r+). in contrast, ifn-γ mrna was found in 39.42% of tl, 34.36% of cd4+, 51% of cd4+ cd45r-, and 58% of cd8+ ad t cells, but only 2.3% of tl, 2% of cd4+, and 4% of cd8+ nt t cells. these results agreed with other estimates of il4 and ifn-γ production based on ria of cell culture supernatants, rna blotting, and gene transcription assays. in contrast to il4 and ifn-γ, il2 was expressed in similar amounts by ad and nt t cell fractions, as well as the ad cd4+ cd45r- and cd45r+ subsets.
thus, the capacity for increased il4 and ifn-γ production by ad t cells appears attributable, in large part, to the postnatal acquisition of the cd45r- subset (putative memory t cell population). however, additional mechanisms exist which act transcriptionally to limit il4 production by both neonatal and adult t cells. such selective expression may be important for restricting the potentially pleiotropic effects of certain lymphokines to appropriate responder cells.
we observed significant inhibition (>70% at 800 ng/ml) of the presentation of wova and of ova 323-339 by the anti-aβ 57-78 peptide mab's. exhibited significant inhibition. 61 peptide mab's. after incubation with antigen +/- mab, indicate that the inhibition occurs at the level of antigen presentation. dg11, was also observed for the anti-β chain peptide mab's and to a lesser extent by the anti-α chain peptide mab's. peptide sequences are capable of interfering with antigen presentation, in vitro. supported by nih grant ai-14764. the ovalbumin (ova) i-ad restricted t cell hybridoma, do11.10, was used to the anti-i-ad mab, mkd6, also much less inhibition was observed with the anti-% 43-experiments with glutaraldehyde fixation of the b1d.p cells before or inhibition of i-ad allorecognition by the t cell hybridoma. these results indicate that mab's generated against class ii
röllinghoff, institute for clinical microbiology, university of erlangen-nürnberg, 8520 erlangen, f.r.g. and the *institute for clinical immunology and rheumatology, university of erlangen-nürnberg, 8520 erlangen, f.r.g. recently we have shown that cloned l. major-specific l1/1 t-helper cells of type 2 (th2 cells), when stimulated with antigen, are able to induce polyclonal b-cell proliferation (1). we here present evidence demonstrating that this process is dependent on a direct cell-cell interaction between t- and b-cells, which in the effector phase, i.e.
during stimulation of the b-cells by activated t-cells, can be mediated by a mechanism other than cognate interaction. this conclusion is derived from experiments in which highly purified resting b-cells were polyclonally stimulated by l1/1 t-cells triggered by an anti-t3 monoclonal antibody, in the absence of antigen. the triggering process was independent of the presence of the fc part of the antibody and occurred in cultures devoid of macrophages. thus, the well established cognate recognition does not appear to be the only way of b-cell induction by t-helper cells of type 2.
studies show that a proportion of the peripheral blood cd3+ t lymphocytes do not express cd4 or cd8 and are called double negative t cells. they normally have a γδ tcr. however, another population of double negative t cells exists that expresses the αβ heterodimer. we have purified and expanded such a population isolated from the peripheral blood of a healthy individual and studied its phenotypical and functional characteristics. the cells are cd3+ cd4- cd8-, positive for wt31 and negative for the nk markers. they express α and β mrna but lack γ mrna. from surface iodinated cells, monoclonal βf1 precipitated two closely running bands (46 & 48 kd). functional studies demonstrate that they proliferate to anti-cd3 and pha; this response was blocked by cyclosporin a. there was no nk lysis but anti-cd3 induced lysis of target cells. the cells responded to il-2 and il-4 as previously shown for other t cells, but also to il-3, a lymphokine thought to affect mainly stem cells and not previously shown to stimulate growth of mature cells. long term growth of these cells was also maintained by these cytokines.
roberto biassoni, silvano ferrini, rafick p. sekaly, and eric o. long, laboratory of immunogenetics, national institute of allergy and infectious diseases, nih, bethesda, md 20892, and istituto nazionale per la ricerca sul cancro, 16132 genova, italy.
cd3- cells grown in vitro in the presence of il-2 acquire the ability to lyse a wide variety of tumor cells in an mhc-unrestricted manner. we have previously shown that cd3-16+ clones expressed the cd3 epsilon gene but no functional transcript from cd3 gamma, cd3 delta, tcr alpha, tcr beta and tcr gamma genes. this result suggested that these cd3-16+ cells represented an early stage in t cell differentiation. to test for expression of the tcr delta gene in these cells, rna from a panel of cd3-16+ clones and from three highly enriched populations was hybridized with several dna fragments of the delta locus. abundant transcripts were detected with a c delta probe and a j delta 1 probe in 6 out of 8 clones and in all three populations. at least four different transcripts were present with sizes similar to those found in cd3+ tcr gamma-delta+ cells. however, the tcr delta transcripts in cd3-16+ cells are most likely derived from unrearranged genes because no rearrangement could be detected in dna from an enriched population using a j delta 1 probe, and because these transcripts hybridized to a dna fragment corresponding to the unrearranged genomic sequence 5'-upstream of j delta 1. expression of unrearranged tcr delta genes in cd3- cells provides further evidence that these cells belong to the t cell lineage.
functional capabilities and by differential release of either il2 or il4 upon activation. we have produced a new monoclonal antibody to cd45 which has allowed us to separate normal murine cd4+ cells into two populations based on the density of expression of the cd45 epitope. the separated populations seem to be analogous to subsets found in cloned t cell lines. cd4+ t cells with a high density of cell surface cd45 produce, after polyclonal activation, il2 and mrna encoding ifn-γ and il2. this population does not produce il4 or il4 mrna. the cd45 low density population, on the other hand, transcribes mrna for il4 and secretes il4 protein.
data will be presented to demonstrate that the two subsets of normal cd4+ cells also differ in their proliferative response to mitogenic stimuli and to exogenously added growth factors.
the substitution of v to l at 95 was the only change that could be discriminated by 2 of 9 allospecific ctl lines, suggesting that those 2 ctl lines recognize a2.1 plus a peptide whose presentation and/or binding is affected by the v to l substitution in the floor of the peptide binding site. in contrast, the l to w substitution at 156 (but not the other 2 substitutions) abolished the ability of the a2 molecule to present the viral peptide to 24 out of 25 peptide-specific a2.1-restricted ctl lines, suggesting that this substitution alters the presentation of the influenza matrix peptide but does not inhibit the ability of the peptide to bind to the a2 molecule.
although γδ tcrs have a great potential for diversity, it remains to be determined whether this potential is realized in terms of expressed γδ tcrs. preliminary studies in several laboratories have indicated that γδ tcrs expressed in early thymocytes and adult epithelial tissues are more restricted in diversity compared to adult tcr-expressing thymocytes. we have derived a panel of cloned dendritic epidermal t cell (detc) lines and hybridomas that express at least three types of γδ receptors -c 8, cy26 and c n . immunoprecipitation, northern and southern blot analyses, and sequence analyses of λgt10 cloned cdna or polymerase chain reaction (pcr) amplified cdna segments have been used to analyze in detail the extent of diversity in the expressed γ and δ chains and whether restricted pairing of γ and δ chains occurs. our results indicate that for this panel of cloned cell lines γ and δ pairing is nonrandom and that variability in certain types of receptors appears to be restricted.
however, we have observed significant δ chain diversity in these cells that is obtained by the use of multiple v-regions, and n-region and junctional diversity. we are investigating whether the observed γ and δ chain pairing and pattern of δ chain diversity are present in other γδ tcr bearing cells or whether they are only characteristic of detc.
activation of ctl precursors from murine unprimed spleen cells with ril-2 or ril-4 results in distinct lytic spectra, depending on which lymphokine is present. we have used allo-stimulation in limiting dilution analysis with subsequent testing on an allo-specific target (a20) and an mhc-deficient, non-specific target (r1e). in the presence of ril-4 exclusively allo-specific ctl are generated, while ril-2 supports approximately equal numbers of precursors that kill a20 and r1e targets. dose response analysis of ril-2-supported killing activity indicates that the lytic spectrum is independent of the amount of ril-2 used, and therefore this il-2 effect is intrinsic in its activity on unprimed spleen cells. mixing experiments indicate that ril-4 can partially override the effect of il-2 on the generation of non-specific killer cells. split well analysis and cold target inhibition experiments are in progress to ascertain the actual proportion of specific killer cells which can be generated with ril-2. we are also testing the ability of cofactors, such as il-1 and il-6, to optimize the response of il-4 generated ctl. we conclude that il-4, not il-2, must be used when ctl are generated from unprimed spleen cells in mice.
transcripts in γ/δ tcr populations. interestingly, these same v genes, as well as a further crosshybridizing v gene previously designated vα7.2, are expressed by peripheral αβ tcr cells as 1.6kb tcrα transcripts. these data suggest that b2a2-dn th represent a developmentally unique subset in which both vδ and vγ segments are non-randomly expressed.
furthermore they indicate that there is considerable overlap between the vα and vδ gene repertoires. indianapolis, in 46285. in order to detect the small amounts of lymphokines generated in vivo following antigen stimulation, we developed a co-culture system which allows for detection of il-2/4, il-3/csf and tnf from ln cells stimulated in vivo with picryl chloride (pcl). utilizing this system in combination with facs analysis and receptor binding studies, we examined the production of these lymphokines in primary and secondary immune responses. during a primary immune response, the production of il-2 was not readily detectable on d1, peaked on d3 and was gone by d5. at no time were we able to demonstrate the presence of il-4. alternatively, the presence of il-3/csf and tnf was easily detected on d1, but also peaked on d3. in comparison to primary responses, secondary immunization led to at least two alterations. (1) peak production of all lymphokines shifted towards d1. (2) although most lymphokines did not demonstrate increases in the amount produced per cell, the amount of lymphokine generated per ln was vastly increased due to an increased number of cells. utilizing single and dual color facs analysis we also examined the ln cells for alterations in t cell subpopulations. during the course of the primary response: (1) the percentage of thy-1+ and l3t4+ cells decreased until d3 and then began to recover, (2) the percentage of thy-1+, t4-, t8- cells peaked at the time of greatest lymphokine production (i.e., d3) and (3) the il-2 receptor was expressed solely on thy-1+ cells, was detected on both t4+ and t8+ subsets and peaked on d3. most of these alterations also occurred during the secondary response, but their time course was shifted so that maximal effects occurred earlier (e.g., d1). finally, 
the maximal binding of radiolabeled il-2 by the ln cells following both primary and secondary sensitization correlated with the expression of the il-2r as detected by facs analysis. in addition, binding of radiolabeled il-4 demonstrated similar patterns except for the detection of significant binding on d1. these results demonstrate that (1) an ordered time course of lymphokine production occurs in vivo following exposure to antigen and (2) the secondary immune response to pcl is characterized by an accelerated tempo of lymphokine production, rather than an increased level of lymphokine production per cell. activation as direct g-protein activation by alf4 pi-hydrolysis using phorbol diester stimulation of pkc restores the inhibitable phenotype and the ability to upregulate c-fos. even more interesting, sig-linked ca2+ responses by vs2.12-c1.2 are equivalent to those observed in the wildtype wehi-231. results suggest that contrary to current thought, sig-generated signals may not be coupled to ca2+ fluxes via inositol phospholipid hydrolysis. thus, vs2.12-c1.2 is a new and powerful tool with which to analyze signalling through sig at the molecular level. unlike the wildtype, crosslinking of sigm on vs2.12-c1.2 did the signaling defect in vs2.12-c1.2 appears to be proximal to phospholipase c triggers pi-hydrolysis and bypassing these latter immunogenicity c 302 analysis of t cell receptor γ chains from adult cd4-,cd8- thymocytes mark w. moore, i. nicholas crispe and michael j. bevan, department of immunology, research institute of scripps clinic, la jolla, ca 92037. the role of tcr γ genes in t cell development has not been determined. to extend our understanding of the repertoire of tcr γ expression, we prepared a cdna library from cd4-,cd8- adult balb/c thymocytes and cloned and sequenced 15 tcr γ genes from this cdna library. we found that 2 clones were transcripts of the unrearranged cγ gene and that 3 clones terminated in the jγ region. 
nine of the remaining clones were vγ1.2-jγ cγ genes and five of these were in frame. only one clone corresponded to cγ1 and was vγ-jγ cγ joined in frame. sds-page analysis of the γ-chain proteins from the surface of both balb/c and c57bl/6 adult cd4-,cd8- thymocytes did not detect the 32,000 mw vγ cγ protein, but did detect the 35,000 mw vγ cγ protein. these results suggest that despite the abundance of full-length, functionally joined, vγ cγ transcripts in the thymocyte subset, the protein product is not expressed on the cell surface as the predicted 32,000 mw γ protein. finally, our analysis of the v-j joining of the γ genes reveals both flexibility at the v-j junction and extensive n-region nucleotide addition that lead to diversity of the predicted protein sequence. il5 in response to the same stimuli. the identification of these two subsets of cd4+ helper cells is mostly based on studies performed with long-term cultured t cell lines and it is not clear whether these two subsets exist in vivo and represent distinct lineages of t cells. in particular, the frequency, tissue distribution and ontogeny of cells capable of secreting il4 in vivo are not known. these studies have been hampered by the fact that freshly isolated t cells from unprimed animals failed to secrete detectable amounts of il4 and il5 when stimulated in vitro by lectins or alloantigens, whereas il2 is readily detectable in these same cultures. data presented here indicate that freshly isolated t cells from unprimed animals can be induced to produce il4 in a receptor-dependent, antigen-independent manner upon stimulation by anti-cd3 antibodies. our results also show that only cd4+ and not cd8+ cells can be induced to secrete il4 and that cross-linking of the receptor is required for optimal activity. we believe that this approach will be useful in identifying in vivo cells precommitted to the th2 pathway and in studying their ontogeny, activation requirements and tissue distribution. hlb, brussels, belgium. 
we have studied the murine tcr repertoire against the c-terminus of cytochrome c in association with certain alleles of the mhc class ii molecule, eαkeβk (iek) and eαkeβb (ieb). for mice possessing these alleles, the majority of responsive t cells utilize one member of the variable vα11 gene family in conjunction with a limited set of vβ genes. as an extension of these studies, we have examined ie specific, alloreactive hybridomas derived from ie non-expressing (eαb) cytochrome c non-responder mice to determine their usage of vα and vβ genes. an rnase protection assay showed that fourteen utilized the same vα11 gene segment used by the majority of cytochrome c specific, ie restricted t cells and eight utilized a closely related vα11 gene that also is associated with this antigen response. element most commonly used by cytochrome c-specific t cells was not found among the alloreactive hybridomas tested, vβ genes less frequently used in the cytochrome response were expressed by seven of the 22 alloreactive hybridomas whose vα segments were defined by rnase protection. determining recognition of ie molecules both in mhc-restricted, antigen specific immune responses and in alloreactive responses. the t-helper cells of seven mouse strains, representing 5 class ii haplotypes (ias, iaq, iab, iakiek, iadied) were responsive to immunization and restimulation with parent peptide. the ied determinant was shown to be a presenting element by monoclonal antibody blocking and by use of l-cell transfectants as apcs to purified t cells and to t cell hybridomas. a series of overlapping synthetic peptides identified two minimal t-cell sites within the parent peptide: mice expressing ia and ie responded to a fragment at the n-terminus of the parent peptide (site 1) while mice expressing only ia responded to a distinct but overlapping fragment at the c-terminus (site 2). these minimal sites identified in vitro could be used to immunize mice in vivo in an mhc-restricted manner. 
the human δ tcr locus is strategically located within the α tcr complex between the cluster of vα/vδ region and the jα segments. tea, which can be spliced to cα in pre-t cells, separates δ from the jα segments. pulse field gel mapping as well as molecular cloning link diversity (dδ), jδ, cδ and tea within 35 kb. considerable δ tcr diversity is generated despite the predominant use of one vδ and jδ segment. dδ1 and dδ2 are 9 and 13 bp long, frequently recombine as dδ1/dδ2, and reveal exonucleolytic trimming with extensive "n" segment addition. specialized 5' and 3' δ deleting elements, δrec and ψjα, separate the δ locus from the α locus. cells with δrec/ψjα recombinations comprise most δ deletion events although δrec recombines with 2 other major acceptor sites in fetal and post-neonatal thymic dna. the 5' δ deleting element (δrec) is evolutionarily conserved in the mouse and functional comparisons are underway. delete the δ locus may prove to be the pivotal event establishing separate γδ and αβ lineages. to study the mechanism of t-cell tolerance, transgenic mice were generated that expressed the mlsa reactive t-cell receptor (tcr) β-chain vβ8.1 on ~90% of peripheral t-cells. in transgenic mice bearing mlsa, the numbers of high tcr expressing thymocytes and of thy 1.2+ peripheral t-cells were reduced. the cd4/cd8 ratio of peripheral t-cells was decreased fourfold compared to negative littermates. both mlsa and mlsb tcr β-transgenic mice were able to mount a t-cell dependent antibody response against viral antigens whereas the capacity to generate alloreactive and virus-specific cytotoxic t-cells was impaired in tcr β-transgenic mlsa, but not in transgenic mlsb mice. rna analysis and immunofluorescence with tcr vβ-specific mab further revealed that the expression of endogenous tcr β-genes in these mice was suppressed. 
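several abstracts in this collection estimate precursor frequencies by limiting dilution analysis. the sketch below is a minimal illustration of how such a frequency can be computed, assuming the standard single-hit poisson model, in which the fraction of negative wells at a dose of n cells is exp(-f·n). it is not taken from any of the abstracts: the function name, the least-squares fit through the origin, and the example doses and well fractions are all hypothetical.

```python
import math


def estimate_precursor_frequency(cells_per_well, fraction_negative):
    """Estimate precursor frequency f from limiting dilution data.

    Assumes the single-hit Poisson model F0(n) = exp(-f * n), so that
    -ln(F0) is linear in the cell dose n with slope f. The slope is
    fitted by least squares through the origin.
    """
    # Keep only informative doses: the dose must be positive, and the log
    # is undefined when no wells are negative (F0 == 0).
    points = [
        (n, -math.log(f0))
        for n, f0 in zip(cells_per_well, fraction_negative)
        if n > 0 and 0.0 < f0 <= 1.0
    ]
    if not points:
        raise ValueError("need at least one dose with 0 < fraction_negative <= 1")
    numerator = sum(n * y for n, y in points)
    denominator = sum(n * n for n, _ in points)
    return numerator / denominator


# Hypothetical data: responder cells per well and the fraction of wells
# scored negative at each dose.
doses = [1000, 3000, 10000]
negatives = [0.90, 0.74, 0.37]
frequency = estimate_precursor_frequency(doses, negatives)
print(f"estimated frequency: 1 in {1 / frequency:.0f} cells")
```

under this model, the dose at which about 37% of wells are negative contains on average one precursor per well. a maximum likelihood or chi-square minimization fit is often preferred in practice; the least-squares slope is used here only to keep the example short.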
tolerogen-reactive lymphocytes, as measured in the mlr, in spite of their long-term acceptance of a skin graft bearing the tolerated antigens. lymphokine production by mlr+ tolerant lymphocytes is different from that of syngeneic normal lymphocytes. normal lymphocytes produce only il-2 in primary response to tolerogen, while tolerant lymphocytes produce il-2 and il-4. using limiting dilution analysis, we have estimated the frequencies of pil-2 and pil-4 (precursor) cells in these cultures. after primary in vitro stimulation, normal responders have a low but measurable frequency of pil-4 cells, while tolerant responders have a much higher pil-4 frequency. however, following subsequent in vitro restimulations, the pil-4 frequency of normal responders rises and begins to approach that of the tolerant responders, such that the two populations are indistinguishable based on pil-4 frequencies following the third round of in vitro stimulation. these data suggest that the high frequency of il-4 producers (presumably th2 cells) among the tolerant lymphocytes unexpectedly resembles a "primed" state, rather than an "unprimed" state as in nontolerant responders (where th1 cells dominate the early response). the existence of "primed" t cells in phenotypically tolerant animals raises the possibility that precocious activation of th2 (by neonatal exposure to tolerogen?) suppresses the later emergence of th1, which would be expected to contain the cells responsible for graft rejection. a large number of cd4+ t-cell clones, obtained from peripheral blood t lymphocytes by direct limiting dilution, allowed us to address the question whether functional heterogeneity exists within the human cd4+ t-cell subset. six out of 12 cd4+ clones were able to lyse daudi or p815 cells in the presence of anti-cd3 antibodies. the remaining 6 cd4+ t-cell clones tested did not acquire this cytotoxic capacity during a culture period of 20 weeks. in the absence of anti-cd3 mab, no lytic activity against daudi, 
p815 and k562 target cells was observed under normal culture conditions. these two types of cd4+ t cells showed high reactivity with anti-cdw29 (4b4) mab and no reactivity with anti-cd45r (2h4) mab. the cd4+ clones without anti-cd3 mediated cytotoxic activities (th2) consistently showed a higher expression level of cd28 antigens. th1 cd4+ clones did produce il-2, ifn-gamma and tnf-alpha/beta, whereas the th2 t-cell clones produced minimal amounts of il-2, ifn-gamma and tnf-alpha/beta in response to anti-cd3 mab and pma. not all cd4+ clones released il-4, but there was no correlation with cytotoxic activity. moreover, as compared to the th1 cd4+ clones, th2 cd4+ clones proliferated moderately in response to anti-cd3 mab. however, anti-cd3 mab induced proliferation of only the th2 cd4+ t-cell clones was enhanced by anti-cd28 mab. both cd4+ subsets provided help for polyclonal b-cell activation with anti-cd3 mab. our data suggest that the human cd4+ subset, in analogy to the murine system, comprises two functionally distinct t-cell subpopulations. in the mouse, when ia antigens are isolated immunochemically, the predominant species isolated are the isotypic matched pairs, aαaβ and eαeβ. however, when ia αβ dimer expression is studied using an l cell transfection model, it is found that the isotype-mismatched dimer aβdeα is readily expressed at the cell surface. these results suggest that differences in assembly and/or transport of different ia pairs may be most readily visualized in a competitive environment where multiple distinct ia chains are available. to investigate this possibility, the relative efficiency of inter- and intra-isotypic dimer formation and expression was evaluated using a sequential l cell transfection system. l cells already expressing an αβ dimer on the cell surface (aβdeα or aβdaαd) were supertransfected with a third ia gene (aαd or eα, respectively). synthesis of this second α or β protein led to competition for the unique partner chain. 
individual clones were scored for cell surface expression of the distinct dimers (e.g., aβdeα vs eβdeα or aβdaαd vs aβdeα) using facs analysis with chain specific monoclonal antibodies. in addition, each species of mrna was quantitated by northern blot hybridization using locus-specific probes. our results indicate that in the h-2d haplotype, isotype-matched dimers are expressed with 3-4x the efficiency of isotype-mismatched dimers. this result suggests that, regardless of the cell type studied, if each of the four murine ia genes is expressed at equivalent levels, intraisotypic dimers will be expressed to the virtual exclusion of the interisotypic dimers. however, if chain synthesis asymmetry occurs, the isotype mismatched pairs may be expressed at immunologically relevant levels. differential we have identified a series of discrete stages among the cd3- double negatives which seem to form a sequence, with tcr gene rearrangement and rna expression gradually progressing, but with potential for expansion and repopulation of irradiated thymuses diminishing along the series. on this pathway cd3 must be expressed late or after the acquisition of cd4 and cd8. cell cycle analysis shows the highest rates of cell division to be among the hsa+ il-2r- pgp-1- population which probably precedes the transition to cd4+cd8+ and tcr expression. thus it seems unlikely that tcr/antigen interactions play a role in cellular events occurring among the double negative cells which lead on to mainstream t-cell development. egr-1 is a murine early growth factor inducible gene which encodes a protein with zinc fingers. its expression was investigated in murine b-lymphocytes stimulated through their antigen receptor (sig) with anti-receptor antibodies (anti-ig). rapid (by 15 minutes) upregulation of egr-1 mrna expression was observed at doses of anti-ig sufficient to drive the majority of g0 cells into cell cycle. 
agonists and inhibitors of protein kinase c (pkc) showed that expression was coupled to the pkc component of receptor immunoglobulin transmembrane signalling. interestingly, signalling through sig on the murine b lymphoma wehi-231 did not upregulate egr-1 expression even though similar signalling pathways are associated with this receptor in these cells. southern analysis showed that egr-1 is not deleted or translocated in this cell line. importantly, cell growth and proliferation of wehi-231 is inhibited by anti-ig stimulation, suggesting a relationship between egr-1 expression and differential processing of receptor ig signals. this notion is further supported by the finding that murine b lymphomas whose proliferation is not inhibited by anti-ig showed receptor immunoglobulin coupled egr-1 expression. regulation of expression of a class i mhc transgene. dinah s. the expression of the transgene product. the pattern of expression of the transgene parallels that observed in situ, indicating that regulatory elements necessary for normal patterns of expression are contained within the injected 9 kb dna segment, and that trans acting factors involved in its regulation function between species. included among these elements are those specifying preferential expression in b cells relative to t cells. in vivo treatment of transgenic mice with α/β-interferon results in increased expression of the transgene in a number of tissues. the response parallels that observed for the endogenous h-2kb, but differs markedly from qa-2. analysis of the chromatin structure of the transgene reveals a single constitutive dnase i hypersensitive site present in both spleen and thymus, which is not altered by interferon. both novel negative and positive regulatory elements have been identified in the 5' flanking region of the transgene. the negative regulatory element reduced the activity of both the homologous class i promoter and a heterologous viral promoter. 
in vivo competition experiments indicated that the functions of the positive and negative elements are mediated by distinct cellular trans-acting factors. the negative regulatory element requires the presence of a positive regulatory element to function. this interaction between elements represents a novel mechanism for regulating gene expression. mcdevitt. department of microbiology and immunology, stanford university school of medicine, stanford, ca 94305. published data show that encephalitogenic h-2u murine t cell clones with specificity for the n-terminal eleven amino acid peptide of myelin basic protein display a restricted fine specificity when tested on substituted analogs of the native peptide. for example, substitution of alanine at certain positions in the peptide totally abolishes the response of each clone (acha-orbea et al., 1988, cell 54:263). recent experiments also have shown that the ability of some peptide analogs to bind to h-2u i-a gene products does not always correlate with their ability to stimulate the t cell clones (see accompanying abstract by david c. wraith and hugh o. mcdevitt). this suggests that h-2u mice may lack a t cell repertoire capable of recognizing these peptides complexed to h-2u i-a gene products. to test this possibility, h-2u mice were immunized with a panel of peptide analogs, as well as the native peptide. the in vitro t cell proliferative response to each of the peptides then was measured. the results show that in vivo immunogenicity of the peptide analogs also does not strictly correlate with their capacity to stimulate the t cell clones. in this way, the polyclonal t cell repertoire of h-2u mice for the myelin basic protein peptide analogs was examined, and could be compared with the i-a binding characteristics of the peptides. terms of antigen and mhc recognition. this response involves a limited repertoire of t cells which crossreact on species variants of the antigen. 
in addition, t cells specific for the antigen in association with syngeneic mhc can recognize antigen on similar allogeneic mhc molecules. the grouping of clones by functional phenotypes defined by these crossreactivities allowed us to correlate tcr gene usage with either antigen or mhc recognition. some of the pigeon cytochrome c-specific clones within one functional phenotype use receptors that differ by as few as two amino acid residues. other clones express very different tcrs but exhibit similarities in antigen/mhc recognition. the effect of these tcr differences on recognition was assessed using a panel of antigen analogs with single amino acid substitutions presented on different mhc molecules. each clone exhibited a unique pattern of response to the antigen analog panel, even clones with very similar receptors. also, each residue in the antigenic region of the peptide was critical for interaction with at least one t cell receptor. therefore, the antigen must either be a linear molecule with each residue available to interact with the tcr or be able to assume several conformations to interact with mhc and the tcr. immunogenicity c323 thy-1+ cd3+ ly-5(b220)+ cd4- cd8- tcr γ-δ+ helper cells. anne i. we have found that these cells can be preferentially stimulated to proliferate when cocultured with the b lymphoma, ch12. one to 2% of nylon wool non-adherent, ia-, j11d-, and cd8- lymph node cells from normal unimmunized mice have the phenotype thy-1+, cd3+, cd4-, and cd8-. these cells proliferate when co-cultured with a syngeneic surface ig+ lymphoma, ch12, even in the absence of any added antigen, mitogen, or fetal calf serum. prior to stimulation we find that approximately 30% of thy 1.2+ cd3+ cd4- cd8- cells express the marker ly-5(b220); however, after culture with ch12 the majority of cells with this phenotype express the marker ly-5(b220). after ch12 dependent proliferation the ly-5(b220)+ t cells are able to provide help for secretion of ig by fresh ch12 b cells. 
surface labelling and precipitation of t cell receptor molecules reveals that most of the thy-1+ cd3+ ly-5(b220)+ cd4- cd8- cells express tcr(γ-δ). furthermore, cd3 precipitation shows that as many as four different γ-δ heterodimers are utilized within the entire responding population. this suggests that a heterogeneous population of double negative tcr γ-δ cells is involved in the response to ch12. college of medicine at east tennessee state university, johnson city, tn 37614 interferon-producing (th1) and interleukin 4 (il4) producing (th2) clones were assayed for their ability to directly induce cytostatic activity in macrophages generated from splenic myeloid precursors (mφ-c). in the presence, but not in the absence, of antigen, th1 clones activated the mφ-c to inhibit the growth of p815 tumor cells in vitro. th2 clones were not able to activate such effector activity in the mφ-c. effectively present antigen to the th2 clones as evidenced by the proliferation of th2 cells cultured with antigen in the presence, but not in the absence, of mφ-c. therefore, although both th1 and th2 were activated by cognate interaction with antigen presenting (ba) or nippostrongylus brasiliensis (nb). spleen cells from these mice were cloned at limiting dilution with alloantigen stimulation, and every two weeks, lk production in response to con a was measured. clones derived from, and stimulated with, cells from unimmunized mice initially tended to secrete low lk levels, with few clearly defined th1 or th2 clones. by 56 days after cloning, some clones had acquired th1 or th2 patterns. cfa, ba and nb-immunized mice gave rise to clones that were mostly th1 or th2 even at early times. cfa and ba immunizations induced almost exclusively th1 clones, whereas nb induced more th2 clones. 
these results are consistent with a model in which resting, previously unstimulated t cells produce low amounts of lks, and progress through stage(s) where they secrete both th1 and th2 lks before finally differentiating into th1 and th2 cells. the results with cfa, ba and nb-primed mice suggest that this process occurs in vivo as well as in vitro. strains as carriers of melioidosis antigens to the immune system, deja tanphaichitra, mahidol university, p.o. box 4-217, bangkok 10400, thailand. the attenuated gale mutant salmonella typhi strain, ty21a, served as the recipient in a conjugal dna transfer experiment. the donor strain was a pseudomonas pseudomallei mu107. conjugal dna transfer was obtained by the mating procedure on an appropriate blood agar medium. the resulting clones were repurified by restreaking on the medium and were examined serologically. one selected strain was found to have the serological characteristics of the recipient s. typhi ty21a strain and also expressed the pseudomonas pseudomallei antigen. in this study it appears that pseudomonas pseudomallei antigen synthesis in the s. typhi transconjugant strain is due to the presence of the pseudomonas pseudomallei plasmid. a group of subjects who received four doses of this bivalent vaccine strain developed antibodies against pseudomonas pseudomallei in up to 70%. as pseudomonas pseudomallei, an intracellular pathogen, produces a characteristic antigen that is probably plasmid coded, we considered that the gale salmonella typhi ty21a oral vaccine strain, highly effective against typhoid fever, might be modified so as to be protective also against melioidosis due to pseudomonas pseudomallei. terminal deoxynucleotidyl transferase (tdt) is a lymphoid-specific nuclear enzyme present in early lymphocytes. 
to investigate the regulation of tdt gene expression, pre-b and pre-t cells were treated with phorbol 12-myristate 13-acetate (pma) or three analogs, and tdt steady-state mrna levels were determined by northern blot analysis. treatment of early lymphocytes with pma results in a rapid and reversible decline in steady-state tdt mrna levels within six hours. this rapid decline can be blocked by pretreatment of the cells with a protein kinase c inhibitor, implicating protein kinase c activation in the decline of tdt mrna. nuclear run-off studies demonstrate that tdt transcription is rapidly down-regulated within 45 minutes after pma treatment, indicating that this regulation occurs mainly at the level of transcription. furthermore, cycloheximide blocks the decline in tdt mrna, showing that new protein synthesis is required for transcriptional inactivation. the nucleoprotein gene from the influenza virus a/nt/60/68 was stably cloned into the attenuated aroa- strain of salmonella typhimurium sl3261. nucleoprotein purified from pnp -3261 was tested for the ability to generate virus-specific immunity. immunization with recombinant derived nucleoprotein induced immunity to all type a influenza tested but not against type b viruses. cd4 helper t cells were primed but no evidence was found for priming of class i restricted ctl. mice immunized with recombinant nucleoprotein were protected against a subsequent challenge of influenza virus. the information obtained from the study of the immunity and protection generated by the purified recombinant protein was then used to design experiments to investigate the possibility of using the attenuated salmonella vector to deliver the nucleoprotein molecule to the immune system by the parenteral or enteral routes. we characterized the extrachromosomal circular dnas in 19-day-fetal and 4-week-old murine thymocytes and 8-week-old murine splenocytes. a population of circular dnas was cloned into the λgt11 phage vector. 
we screened ca. 10 dna clones by plaque hybridization with all four kinds of tcr gene probes derived from jα1, vα10, dβ1, dβ2, jγ1, jδ1 and jδ2 loci. out of 10,000 dna clones from fetal and 4-week-old thymocytes, 30 hybridized with tcr α probes and 5 hybridized with tcr β probes. positive clones with tcr γ and δ probes were 3 to 7 in the fetal thymocyte-derived library, but few in the 4-week-old thymocyte library. of 7 fetal tcr δ clones analyzed, 6 clones had dd or vd reciprocal joints and 1 clone had a vd or dd coding joint. relative frequencies of circular dna clones for four different tcr genes are consistent with the order of the expression of the genes in t cell development. of 10,000 dna cl signalling could be studied. transmembrane signalling was measured by the ability to translocate pkc from the cytoplasm to the nucleus after surface i-a was bound by α or β chain specific monoclonal antibody. removing either 6 or 12 amino acids from the α chain cytoplasmic (cy) domain did not affect the ability of these i-a molecules to translocate pkc to the nucleus. normal splenic b cells were rendered non-responsive to subsequent challenge with lps, as measured by a decreased ability to generate antibody forming cells (afc), by incubation overnight (18-24 hours) with 10 ug/ml anti-ig. both intact and f(ab)'2 anti-ig, as well as monoclonal anti-igm (bet2 and b-7-6), were able to induce b cell non-responsiveness to subsequent lps challenge, suggesting that sig/fcr interactions are not necessary in the induction of lps non-responsiveness. in contrast, induction of nonresponsiveness to subsequent challenge with fitc-brucella abortus required intact anti-ig. the ability of mitogenic anti-ig (rab f(ab)'2 or b-7-6 northern blot analysis and bioassay data were used to analyze 9 separate lymphokines as well as the il-2 receptor (murine tac). 
northern blot comparison of fresh and primed t4-enriched rna revealed that primed t cells produced 10-fold more lymphokine than the fresh t cells. the only lymphokine that showed equal amounts of mrna for both fresh and primed t cells was il-2. a time course of fresh and primed t4+ cell lymphokine production was also analyzed. the primed cells produced a short burst of lymphokine mrna that peaked between 7.5 and 13 hr after con a stimulation and declined after 18 hr. the fresh t cells produced a longer burst of lymphokine mrna that peaked 18-44 hr after stimulation. the il-2 receptor (il-2r) mrna time course from activated primed cells showed different kinetics than lymphokine mrna. this suggested that molecular regulation of the il-2r might be different than lymphokine regulation. to further examine molecular regulation in the primed t cells, polysome profiles were evaluated for lymphokines, il-2r, and other cellular genes. the recently developed method of gene amplification by the polymerase chain reaction (pcr) has proven to be particularly suited for the analysis of t cell receptor (tcr) genes. we adopted existing methods for the preparation of cytoplasmic rna from as little as 1000 cells and used this material as template for first strand cdna synthesis. pcr amplification of this cdna, using v- and c-specific oligonucleotide primers, yielded enough material to produce single-stranded dna in a second pcr which could then be sequenced without cloning. in case of unknown v-usage, the pcr was employed for screening for v-beta elements by sequential reactions with different v-beta specific primers. we have used this method to reinvestigate the h-2b restricted cytotoxic t cell response to tnp in c57bl/6 mice. beta chain sequences of 26 ctl clones obtained by direct cloning of immune spleen cells were compared to sequences of 11 clones obtained by cloning of individual short-term in vitro ctl lines. 
it was found that a) in vitro bulk-stimulations reduced the heterogeneity of the beta-chain responses to tnp, b) similarities between different tcr-beta-chains concentrated on the usage of certain jβ-elements (jβ2.6, 2.5, 2.1) rather than v-region or n/d-region sequences, and c) the majority of jβ2.6 containing beta-chains was associated with alpha-chains expressing v-segments of the vα10 family. these expression of genes which encode the t cell antigen receptor is central to the generation of the t cell repertoire. our laboratory has been investigating genes for both the alpha and beta chains of this receptor in inbred strains of rattus norvegicus (the laboratory rat), a species in which several autoimmune disease models have been developed, and which is used extensively in transplantation studies. using genomic southern blots and mouse probes specific for five different vα subfamilies, we have estimated the size of the vα repertoire in ten inbred strains of rat. results show a significant increase in the size of one subfamily and suggest increases in two others in all ten strains. the rat vα1 subfamily has about twice as many members as the mouse, while the vα2 and vα5 subfamilies, depending upon the enzyme used, show a similar duplication. the vα6 and vα9 subfamilies have a comparable number of members in both species. these data are most easily explained by a single duplication event in the rat involving at least one and perhaps three subfamilies, but not encompassing the entire vα locus. this implies that the vα1 subfamily (perhaps together with vα2 and vα5 subfamilies) is regionally clustered and not interspersed with either the vα6 or vα9 subfamily. based on restriction fragment length polymorphisms, we find evidence for six distinct vα haplotypes in the ten strains tested. we have also cloned eight unique germline vα1 gene segments. one of these has been sequenced, and has a coding region 87% identical to the most closely related mouse vα1 sequence. 
this degree of relatedness is similar to rat/mouse vg homologues, which share 8548% nucleotide sequence similarity. we are using these clones to generate single copy probes from flanking regions to further map the va1 locus. current approaches to mhc-peptide binding studies require large quantities of highly purified mhc protein and/or the use of sophisticated detection apparatus. in order to simplify detection of peptide-mhc interactions we have investigated the use of photosensitive crosslinkers. two reagents have been successfully tested. a benzophenone derivative of peptide 1-16 from rat myelin basic protein (rmbp) was only effective after the introduction of a glycine spacer residue between peptide and crosslinker. an azido-nitro-benzoyl derivative of peptide 7.4, a heteroclitic analog of rmbp 1-11 (1), had a high affinity and bound specifically to the peptide binding site. the 7.4 photoaffinity probe has been used to test the binding properties of other analogues of rmbp 1-11 and is currently being used to define the (a) kinetics, (b) ph dependence and (c) temperature dependence of the binding event. this particular photoaffinity conjugate retains both the mhc binding and biological properties of the original peptide and is helping us to define the roles of "determinant" versus "t cell repertoire" selection in the mhc-linked autoimmune response to mbp. the antigen-specific t cell repertoire is diverse in its ability to recognize a wide universe of foreign antigens. this t cell repertoire is composed of a set of clones each of which is specific for a given foreign antigen. therefore the precursor frequency of t cells specific for any given foreign antigen is extremely low. however, two prominent exceptions to this general rule exist, and these are the t cells present at high precursor frequency which are specific for foreign mhc products or for the products of the minor lymphocyte stimulatory (mls) genes in the mouse. 
the present studies were undertaken in order to examine factors involved in t cell repertoire formation by assessing the relationship between the t cell repertoire for conventional foreign antigens and that for mls products. studies indicate a striking degree of overlap between the set of t cells specific for pigeon cytochrome c and the set of t cells specific for mlsc gene products. demonstrate that the basis for this overlap lies in the predominant expression of one tcr vb gene, vb3, by those t cells which recognize mlsc. involvement of specific tcr alpha-beta dimers in recognition of mlsc and further suggest that t cell reactivity to these gene products may play an important role in establishing the t cell repertoire for foreign antigens. conclude that, rather than destruction of some essential apc structure, ecdi fixation prevents the apc from actively responding during the encounter with the t cell. this results in a failure to express new structures (probably located on the apc plasma membrane) that appear to be essential for stimulating t cell proliferation. these structures are distinct from ia or il1. the induction of these structures during t-apc interaction occurs in six hours, requires protein synthesis, and can be elicited by il1, il4 or lps, but not ifn-gamma. in the absence of these induced structures, the apc stimulates a partial t cell response, il4 release, but the t cells fail to proliferate. these induced structures on the apc may be either adhesion molecules that stabilize the t-apc interaction, or they may provide additional stimuli to the t cell. were not common to the three strains of mice (balb/c, regions 1, 3, 5, 6 and 7; c3h/he, regions 2, 3, 5, 6, 6', 7 and 7'; and c57bl/6, regions 2, 4, 5, 6 and 7). immunisation with type ii collagen (cii) leads to development of arthritis in mice with certain mhc haplotypes and is associated with an immune response against cii. 
we have been studying the t-cell response in the arthritis susceptible strain dba/1 (h-2q). analysing the proliferative response in cultures of lymph node cells from immunised mice as well as t-cell lines and clones established from such cultures it was found that 1/ the t-cell response after immunisation with heterologous cii was preferentially directed against foreign determinants on the cii molecule with little or no crossreactivity against autologous cii. 2/ both the primary response and the reactivity of established lines and clones were directed against the cb11 fragment of the cii molecule, using cb11 fragments prepared from chick, bovine or rat cii. 3/ pepsin present in cii preparations after using pepsin digestion for solubilisation of the collagen is strongly immunogenic even in very small amounts and it was therefore necessary to use cii prepared from lathyritic cartilage without pepsin digestion for immunisation. in contrast to the pattern in lymph node cultures from immunised mice we found that when culturing spleen cells from unimmunised mice there was a t-cell response against collagen that was preferentially directed against autologous cii. since we earlier have found that autologous cii may induce an immune response and also arthritis in dba/1 mice we conclude that there exist t-cells capable of reacting with autologous collagen and inducing an immune response as well as arthritis but that these cells are under regulation so that they cannot readily be activated into proliferation but may be induced to perform certain effector functions. tested. the characterization of these two cd6 rdsas will be presented. analysis of hla polymorphism using sequence specific oligonucleotide probe hybridization to amplified dna, lee ann baxter-lowe, jay b. hunter, and jack gorski, the blood center of southeastern wisconsin, milwaukee, wisconsin 53233. hla polymorphism plays a key role in antigen:mhc interaction. 
the polymorphism of the first domain encoding exon of the hla-dr beta chain has been studied by in vitro dna amplification and use of sequence specific oligonucleotide probe hybridization (ssoph) to detect polymorphic sequences. a 230 bp segment of genomic dna was amplified and hybridized with synthetic oligonucleotide probes (12-19 bases) under conditions that detect single base pair mismatches. identification of these mismatches can be used to predict micropolymorphism in the protein products, including single amino acid changes. haplotype specific patterns of oligonucleotide probe hybridization were defined for a panel of homozygous typing cells. analysis of family data demonstrated the expected inheritance patterns. most known serological specificities are encoded by multiple allelic forms of dr beta chains and ssoph can identify these differences. this was exemplified by detection of unique ssoph profiles for subtypes of dr4, drw6 and drw52 alleles. this procedure was also used for analysis of hla-dr polymorphism in large numbers of heterozygous individuals, including an hla-deficient scid patient. the ssoph data were correlated with serological specificities and will be useful for delineation of hla restriction in allo- and autoimmunity. different cell membrane receptors have been shown to be involved in human t lymphocyte activation induced by either monoclonal antibodies or mitogenic lectins. these t cell surface molecules can be divided into two categories: a) the t cell antigen receptor (tcr) associated with the non-polymorphic cd3 antigen, and b) t cell differentiation molecules not linked to cd3/ti such as cd2 (t11) and tp44 (9.3). monoclonal antibodies directed against these t cell surface structures triggered different t lymphocyte functions: mitogenesis, il-2 receptor expression, il-2 secretion. 
our knowledge about early events involved in t cell membrane activation is not complete, especially regarding the transduction mechanism mediated by gtp-binding proteins; nevertheless, numerous authors have demonstrated that cd3/ti complex triggering induces the activation of phospholipase c, leading to the phosphoinositide cascade associated with an increase of free cytoplasmic calcium ions. in the present report, we show that different cell-activating molecules (con a, pha and pma) can trigger oxygen free radical liberation when incubated with the human jurkat tumor t cell line. since membrane oxidative metabolism has been shown to be related to the stimulation of phospholipase a2, and to be the final consequence of a membrane nadph oxidase, this could represent a previously undescribed pathway of t lymphocyte activation. high affinity monoclonal antibodies (mab), specific for staphylococcal nuclease (nase), were produced and characterized. competitive inhibition assays were conducted resulting in a series of complementation groups that define eight overlapping epitopes. it is estimated that these epitopes account for 70% or more of the accessible surface of nase. mutagenesis of the coding sequences for nase was carried out to produce a series of variant molecules (each differing from wild-type nase and from each other by a single amino acid) that will enable mapping of nase epitopes, determination of residues involved in antibody binding, and the contribution of various physical and chemical factors to affinity and fine specificity. screening some of these mutants with the panel of mab enabled us to map several nonoverlapping epitopes and to further subdivide some of the mab complementation groups. oligonucleotide-directed mismatch mutagenesis has been done on codons encoding the original amino acid residue and other surface residues in its immediate vicinity. 
determination of enzyme activity and structural analysis by cd spectropolarimetry of several of the mutant proteins suggest that any structural changes that may occur are local and not global. supported by grants ai20745, l32ca09109 and s07rr05431 from the national institutes of health. activation of t cell proliferation is believed to occur via the hydrolysis of inositol phospholipids, which, through the second messengers inositol-1,4,5-trisphosphate and diacylglycerol (dag), promotes the elevation of intracellular calcium levels and activation of protein kinase c (pkc), respectively. the role of pkc in t cell activation was investigated by comparing the effects of stimulation by 12-o-tetradecanoyl phorbol acetate (tpa), and the dag, oleoylacetyl glycerol (oag), on a >99% pure population of t cells cultured in rpmi 1640 medium containing 10% autologous serum. treatment with either tpa or oag caused down-regulation of the t cell receptor, a consequence of its phosphorylation, but only tpa, in syner[...] interleukin 2 receptor (il2-r) expression and, subsequently, proliferation. immunohistochemical staining with antisera specific for the pkc subspecies alpha, beta-i, beta-ii and gamma shows that resting t cells express the alpha, beta-i and beta-ii pkc subspecies, which are diffusely distributed throughout the cell. after 20 minutes treatment with either oag or tpa all three subspecies are redistributed to a focal area within the cell. the redistribution is transient in oag stimulated cells, where the pkc distribution is similar to that in untreated cells after 1 hour of treatment. in tpa stimulated cells, however, the pkc redistribution is prolonged, becoming more marked until mitosis occurs after 48-72 hours of treatment. these results suggest that transient intracellular redistribution of pkc causes phosphorylation and down-regulation of the t cell receptor, but that prolonged redistribution is required for t cell proliferation. 
sm is a nucleoprotein complex associated with small rna molecules in eukaryotic cells. the spontaneous generation of anti-sm antibodies is specific for patients with systemic lupus erythematosus (sle) and develops in 25% of mrl mice. the response has been shown to be t-cell dependent in mrl/lpr mice. t-cells specific for sm are found only in mrl (h-2k) mice and mice bearing h-2s and h-2f haplotypes (which do not develop anti-sm antibodies). we are currently working to define the variable regions of the t-cell receptor genes used in the sm response. a series of t cell hybridomas from mrl mice has been generated and is being screened for sm positivity. a technique has been designed to amplify specific alpha and beta chain tcr genes using the polymerase chain reaction, allowing for a more rapid sequence analysis. it is also our intention to locate the sm specific epitopes of the t-cell hybridomas. d. bloom, p.l. cohen, and s.h. clarke, department medical institute and experimental immunology branch, nci, nih, bethesda, md 20892. to determine whether prior activation history affects t cell receptor mediated activation of t cell clones, the murine type i helper clone ae7 was maintained in tissue culture by stimulation every ten days with either (1) antigen (cytochrome c), irradiated h-2k spleen cells, and il-2 or (2) il-2 alone. ae7 cells grown with antigen and antigen presenting cells (ae7-ag) proliferated and produced t cell growth factor activity (tcgf) in their culture supernatants following stimulation with immobilized anti-t3 antibody. the tcgf activity was shown by bioassay using indicator cell lines and specific blocking antibodies to be almost entirely due to gm-csf with little or no il-2 activity detectable. detectable il-2 mrna levels. (ae7-il2) displayed substantially greater anti-t3 induced proliferation than did ae7-ag cells. in contrast to ae7-ag cells, ae7-il2 cells produced large quantities of il-2 in response to anti-t3 stimulation. furthermore, 
one cycle of stimulation of clone ae7-ag with il-2 in the absence of antigen and irradiated spleen cells was sufficient to cause this clone to produce substantial amounts of il-2 upon subsequent anti-t3 stimulation. these data suggest that t cell receptor mediated stimulation of t cell clones by specific antigen and antigen presenting cells inhibits subsequent anti-t3 induced il-2 production. t cell proliferative responses and serum antibody levels of myasthenic patients to several synthetic peptides representing different epitopes of the human achr were examined. we detected significant differences in the humoral and cellular responses of mg patients compared to healthy controls to peptides of the human achr alpha-subunit with sequences p195-212, p257-269 and p310-327. proliferative responses of lymphocytes from myasthenic patients to p195-212 and to p257-260 correlated significantly with hla-dr5 and with hla-dr3, respectively. in order to investigate further the immune responsiveness to selected sequences of the human achr, t cell lines and clones specific for peptides p195-212 and p259-271 were established from lymph nodes of c3h.sw mice. the recognition specificities of these lines were tested by examining crossreactivity to a series of shortened and/or extended peptides of the above sequences. deletions of amino acids in positions 211 and 212 (211=p, 212=l) resulted in a decrease of the peptides' stimulatory activity on the p195-212 specific t cell line, whereas deletion up to position 200 on the n-terminal end had no effect on the triggering potential of the peptides. similar results were obtained when deleting residues 270 and 271 (270=v, 271=p) in stimulation assays of the p250-271 specific t cell line. help in determining important t cell epitopes on the human achr. the role of guanine nucleotide binding regulatory proteins (g proteins) in the regulation of phosphorylation of the gamma subunit of the cd3 antigen has been examined. 
cd3 gamma chain phosphorylation in isolated t cell microsomes or permeabilised t cells was stimulated by the g protein activator, guanosine 5'-o-thiotriphosphate (gtp-gamma-s), but other nucleotides such as camp or gdp-beta-s were ineffective. dependent. these data are consistent with the involvement of a g protein in the signalling mechanisms that regulate the phosphorylation of the cd3 gamma chain. the regulatory effects of calcium and gtp-gamma-s were compared in normal peripheral blood derived t cells and jurkat cells. there were differences regarding g protein regulation of cd3 gamma chain phosphorylation in normal t cells and jurkat cells, and current models explaining these differences will be described. expression of the gamma-delta t cell receptor has been thought to first occur in a population of thymocytes shortly after their precursors populate the thymus between 11 and 13 days of gestation. in the course of our studies investigating the ontogeny of t cell receptor expression in the mouse embryo we have identified an extrathymic site of gamma-delta expression in a population of cells present at distinct times of gestation. evidence will be presented demonstrating two periods of activity of the murine gamma locus in the developing embryo. are colonizing the thymus from the liver and the gene segment usage detected is different to that first expressed in cells of the developing thymus. around the time of birth) involves the functional rearrangement and expression of a gamma gene segment corresponding to the initial functional rearrangements detected earlier in gestation in the thymus, which can occur independently of thymic influence. demonstrate a new site of gamma-delta receptor expression in the liver of newborn mice that can occur in the absence of any thymic influence. the primary (in vivo) response of c57bl/6 animals to the class i antigen qa-1 is a helper (th) dependent event as indicated by the requirement for copriming with a distinct antigen capable of activating helper cells. 
in contrast, the secondary (in vitro) response to qa-1 demonstrates no need for costimulation with the helper antigen. in attempts to more closely examine the helper requirements for activation of primed ctlp, we have observed that depletion of l3t4 cells from spleens of qa-1 primed mice abrogates the in vitro generation of anti-qa-1 effectors. the response is restored by the addition of concanavalin a induced supernatant (cas) or by the addition of syngeneic but not qa-1 allogeneic l3t4 cells. indeed, even in the presence of cas, l3t4 cells expressing the qa-1 alloantigen specifically suppress the activation of anti-qa-1 ctl in a manner reminiscent of that seen with lyt-2 veto cells. although the mechanism whereby l3t4 cells exert suppression is unclear, we have determined that ctlp are susceptible to veto only within approximately the first 48 hours of culture, after which they resist suppression. results from further studies of the nature of suppression and the l3t4 veto cell will be presented. group ii proteins induce ige ab responses in -90% of mite allergic patients. murine mab and human igg and ige ab. unrelated. crossreactive epitopes on gpi and gpii allergens from different mite species. in contrast, igg ab in balb/c mice immunized with 10ug specific" epitopes and <1% was anti-u i (a gpi homologue, with -80% amino acid sequence homology to i). four non-overlapping epitopes were defined by mab, with one species specific immunodominant site on each gpi allergen. cross-reactive gpi epitope and this mab could inhibit human ige ab binding by -40%. specificity of the murine anti-gpi response was not h-2 restricted, but could be altered by immunizing balb/c mice with lower ag doses (1ug) in alum or b. pertussis. using these regimes, up to 52% of the murine igg ab responses were gpi cross-reactive. responses to gpii allergens appear to be strain dependent. unresponsive to gpii. however, balb/b, a/j, cba, c3h & c57bl/6 all produce gpii cross reactive igg ab. 
antigenic sites on gpi allergens are conformational, whereas those on gpii may be sequential. known to affect ige expression in mice may also affect the epitope specificity of igg ab. we have compared the b cell epitopes on these allergens using panels of however, ag binding ria on 73 sera showed that human igg and ige ab recognize the gpi and gpii allergens are antigenically i in cfa was directed against "species murine ab balb/c are completely thermal denaturation and reduction and alkylation expts suggest that the results with the gpi allergens suggest that immunization regimes which are c425 this report demonstrates for the exclusive recovery of 72-84-specific t cell clones c 426 further molecular analysis should identify and characterize achr reactive autoimmune clones and/or suppressor cells. complex, mogens h. claesson, peter brams and steen dissing, lab. exp. hemat. immunol., dept. med. anatomy a, and dept. general phys the ly-6 alloantigens represent a family of phosphatidylinositol anchored proteins that function as accessory molecules in the process of t lymphocyte activation. the expression of these alloantigens is often induced on t and b lymphocytes after activation by mitogens or antigens. previous studies have shown that the induction of ly-6 alloantigens in t cells is at least in part due to the action of ifn-a/b or ifn-gamma. in the present study, we have demonstrated that ifn-gamma also induced ly-6 molecules on b lymphocytes and bone marrow cells. furthermore, we now show that tnf also participates in the induction of at least one of the ly-6 proteins, ly-6a/e. tnf was found to synergize with ifn-gamma to induce ly-6a/e expression in thymocytes, t lymphocytes, and bone marrow cells, but not b cells. 
for t lymphocytes, the synergistic induction of ly-6a/e by tnf was restricted to cells from the ly-6.1 haplotype whereas ifn-gamma was sufficient to fully induce ly-6a/e expression in cells from the ly-6.2 haplotype. this result is consistent with the notion that there is more complex regulation of the ly-6a/e molecules in t cells obtained from the ly-6.1 haplotype. for t cells from balb/c (ly-6.1) mice, ly-6a/e, but not ly-6c, molecules were induced by ifn-gamma and tnf. furthermore, when compared to ly-6a/e, the regulation of mhc class i molecules in these t cells by tnf was minimal. the induction of ly-6a/e molecules on balb/c t cells resulted in an enhanced capacity to activate these cells through the ly-6 t cell activation pathway. one transformed t cell line, 5.1.2, was also identified whose ly-6a/e molecules were synergistically induced by ifn-gamma and tnf. optimal expression of ly-6a/e molecules on 5.1.2 cells required continuous culture of this cell line with these two cytokines and resulted in the detection of optimal levels of cytoplasmic ly-6a/e mrna by northern blot analysis. this latter result suggests that ifn-gamma and tnf regulate ly-6a/e at the level of transcription and/or mrna stabilization. ut southwestern medical center, dallas, texas 75235. an igm anti-cd3 mab (38.1) was found to modulate cell surface cd3 on highly purified human t cells within 5 hours in the absence of a secondary antibody or accessory cells. inhibition could be overcome with accessory cells or il2. the inhibitory effects of 38.1 could be mimicked by briefly pulsing cells with the calcium ionophore, ionomycin. 38.1 or ionomycin pulsed cells were inhibited in their subsequent capacity to respond to pha even when exposures were carried out in the presence of egta to prevent increases in [ca2+]i from extracellular sources. inhibition was not the result of an inability to respond to pha by increasing [ca2+]i. 
moreover, the newly expressed cd3 molecules were capable of generating increases in [ca2+]i after reacting the cells with anti-cd3 + a cross-linking secondary antibody. these studies demonstrate that a state of nonresponsiveness in resting t cells can be induced by modulating cd3 with an anti-cd3 mab in the absence of co-stimulatory signals. a brief increase in [ca2+]i resulting from the mobilization of intracellular calcium stores appears to be sufficient to induce this state of t cell nonresponsiveness. cd3. laurie s. davis, mary c. wacholtz, and peter e. lipsky, dept. of internal medicine, lmmunogenicity c a central lab. blood transf. service, lab. of exp. and clin. immunology of the univ. of amsterdam, amsterdam, the netherlands. monoclonal antibodies (mab) directed against the human cd3 molecular complex induce a strong proliferation of t cells when immobilized on microtiter wells. this activation system, which was shown to be independent of accessory cells, accessory-cell derived factors or lfa-1 mediated intercellular adhesion (1), allows one to study the requirements for t-cell proliferation and differentiation in a well defined manner. il-2 and ifn-gamma but no il-4 could be detected in culture supernatants of coated anti-cd3 stimulated t cells. the addition of ril-1 or ril-2 had only a moderate effect on t-cell proliferation, whereas helper activity for ig production was strongly enhanced in the presence of these factors. in this system differentiation of precursors to cytotoxic t lymphocytes (ctl), as measured in anti-cd3-mediated cytotoxicity, could be demonstrated within 2 days after initiation of the activation. allospecificity of the induced effector ctl was demonstrated using a panel of hla class-i p815-transfectants. in this system the regulatory role of the cd28 molecule in t-cell activation and differentiation was studied. 
addition of anti-cd28 mab to t cells stimulated with coated anti-cd3 mab enhanced il-2 production, proliferation as well as ig production. interestingly, pctl differentiation was also enhanced by anti-cd28 mab. this system seems valuable for the analysis of requirements for differentiation of human t cell subsets. 1. van noesel et al., nature 333, 850-852, 1988. analysis of the requirements for human t-cell differentiation, rolien de jong, vivienne rebel, gijs van seventer, miranda brouwer, frank miedema, rene van lier, the newly described t cell receptor (tcr) delta locus is located inside the tcr alpha locus between va and ja. despite this unique situation, a highly efficient regulatory mechanism results in the complete independence of these two loci. we have recently described, in humans, a site specific recombination which joins a 5' deleting element (delta-rec) to the 5' end of the ja's (psi-ja), resulting in the deletion of the tcr-delta locus in t lymphocytes expressing the alpha/beta tcr. rearrangements of the tcr as well as immunoglobulin genes are mediated by a unique recombination machinery and therefore, the specificity of these rearrangements is thought to be the result of a differential accessibility of the dna involved in the recombination process. as a consequence (and/or cause) of the opening of a segment of dna, the region involved is first transcribed as a sterile transcript prior to the rearrangement. in that regard, we have found that the 2 kb of dna upstream of psi-ja are actively transcribed ("t early alpha" transcript, tea) early during fetal development. the presence of the tea transcript presumably reflects the opening of the tea sequence prior to the tcr-delta deletional rearrangement. in order to better understand the mechanisms involved in the dna accessibility model, we started to look for dna-binding proteins which might play a role in the opening or blocking of the tea sequence. 
by the technique of "gel shift assay" we found such a negative regulatory protein in the nuclear extract from a non-lymphoid cell. the binding activity appeared to be specific as it was competed out by an excess of unlabeled autologous dna and not by an excess of irrelevant dna. further studies are now in progress to determine first whether the presence of this binding activity can be correlated with a "closed" configuration of the tea region and second the precise location of the dna binding region. cohen, laboratory of chemical biology, niddk, national institutes of health, bethesda, md 20892. mabs retard lymphoproliferation as well as autoimmunity. interestingly, so does the administration of a mab to l3t4, thus suggesting that the t helper subset, which is not part of the unusual expanding population, is required for initiating the pathology in these animals. as a means of characterizing the expanding population of abnormal cells as well as the phenotypically mature (l3t4+) cells that may be associated with them, i have generated a series of t cell hybridomas from the enlarged lymph nodes and spleen of mrl/lpr mice. in parallel, i have derived a series of control (non-lpr/lpr) hybridomas from mrl/lpr x balb/c f1 animals (which show no sign of pathology), and a series from mrl, mice (which have a delayed onset of autoimmunity without lymphadenopathy). very few hybridomas (20-30) were obtained in the non-lpr derived fusions. when i con a stimulated the lymphocytes from non-lpr mice prior to fusion, however, many more hybridomas were obtained (120-220). this is in contrast to the fusion efficiency obtained from lpr/lpr mice, which did not require in vitro lymphocyte stimulation to obtain a comparable number of hybrids. this result suggests that the mrl/lpr lymphocytes are activated in situ. in addition, while less than 15% of the lymphoid mass is comprised of t helper (l3t4+) cells, over 50% of the hybrids are l3t4+. 
the fact that a disproportionate number of t helper cells are rescued by fusion suggests that the cells activated in situ may be autoreactive t helper cells. currently i am characterizing these t helper cells for their lymphokine production, t cell receptor gene usage and auto-specificity and will compare them to the hybridomas obtained from non-diseased animals. goodnow, s. gilfillan, h-j. garchon, j. erikson and m. davis. stanford university, stanford, ca 94305. we have made transgenic mice bearing gene constructs encoding the t cell receptor alpha and beta chains from a cytochrome c-reactive t cell hybridoma. despite a lack of tissue-specificity in mrna expression, cell surface expression of transgene-encoded protein was limited to t cells, presumably because both chains require cd3 proteins in order to assemble on the cell surface. in mice carrying only the alpha chain construct, the transgene was expressed in the thymus as early as day 15 of fetal life, 1-2 days before endogenous alpha chain mrna. the first detectable cell surface expression of a transgene was on 25% of day 15 fetal thymocytes. this vast increase in alpha-beta-bearing cells in fetal thymus was due to pairing of transgenic alpha chains with endogenous beta chains, of which a substantial number are normally rearranged by day 15 of fetal life. the balance between alpha-beta-expressing t cell subpopulations was grossly disturbed in these mice, the most marked abnormality being an increase in the number of l3t4-lyt-2- cells both in thymus and in peripheral lymphoid organs. it therefore appears that premature expression of surface alpha-beta t cell receptor may disturb t cell differentiation pathways by allowing t cells to leave the thymus without expressing l3t4 or lyt-2. mice carrying the beta construct showed no increase in surface expression of t cell receptor in fetal life, since endogenous alpha chain rearrangement was limiting. 
in mice carrying either the alpha or beta chain transgenes, the number and surface phenotype of t cells expressing gamma-delta t cell receptors was unaffected in early fetal life, suggesting that the alpha-beta and gamma-delta t cell lineages diverge before the rearrangement and expression of the appropriate subset of t cell receptor genes. department of microbiology and immunology, institute and the university of pennsylvania, philadelphia, pa 19104. the thymic stroma plays a major role in initiating the colonization, organization and differentiation of precursor stem cells into functionally mature t cells. a variety of cell types including thymic nurse cells, cortical and medullary epithelial cells, nonepithelial dendritic cells, and macrophages combine to form the thymic stroma. the differential role of such cells in thymic development is unclear. we have isolated a number of morphologically distinct stromal cell lines from the thymuses of sv40 transgenic mice. several of the cell lines are of epithelial origin, while others have features consistent with non-epithelial "dendritic" cells. we have focused on one of these cell lines, bearing the phenotype of a cortical epithelial cell, for its ability to support the growth and differentiation of stem cells from the fetal liver and fetal thymus, and cloned pre-t cells obtained from adult mice. the cortical epithelial cell line produces factors that induce the dramatic proliferation of fetal liver and thymic stem cells. in addition, fetal liver cells cocultured with this cell line are induced to rearrange and express their t cell receptor (tcr) genes. a cloned pre-t cell line is also induced to rearrange its tcr genes in response to signals mediated by this cell line. guérin, marie c. béné, corinne amiel, nadia coniglio, jacques leclère, laboratoire d'immunologie and clinique endocrinologique, chu de nancy and faculté de médecine, 54500 vandoeuvre les nancy, france. 
the lfa1 molecule, an adhesin of the lfa family involved in cell-cell interactions, is physiologically expressed on all white blood cells. it is absent in some congenital immune deficiencies (cid), and is expressed on a decreased number of peripheral blood lymphocytes (pbl) in aids. we investigated its presence on pbl from 60 patients with auto-immune disorders of the thyroid. a monoclonal antibody (iot16, immunotech) directed to a conformational epitope involving both chains of lfa1 was used in indirect immunofluorescence on pbl from blood drawn at a similar time in all patients. a calibrated flow cytometer (epics profile, coultronics) was used to measure the percentage and numbers of positive cells, as well as the mean fluorescence (mf) and shape of the fluorescent peak. data were correlated with clinical and therapeutic information and with other pbl features such as the cd4/cd8 ratio. the percentage of lfa1+ cells was significantly decreased in patients with graves' disease, hypothyroidism and hyperthyroidism. the mf was lower and the shape of the fluorescent peak seldom displayed the bimodal characteristic noted in controls. these data suggest the participation of the altered expression of lfa1 in the pathogenesis or evolution of auto-immune diseases.
postulated that excess hla class ii expression, commonly found in active human autoimmune diseases, maintains the activation of autoreactive t cells, which in turn produce mediators which maintain class ii expression. this hypothesis has been tested in many ways in thyroiditis. critically, autoreactive t cells are found in thyroid autoimmune tissues which are restimulated by thyroid follicular cells.
more recently we have been exploring the specificity of the autoantigen-reactive t cells in hashimoto's thyroiditis, where thyroglobulin-specific clones have been found, in contrast to graves' disease, where thyrocyte-recognizing clones do not react with thyroglobulin. in rheumatoid arthritis, collagen type ii clones have been found persistently in the activated (il-2r+) t cell pool over several years in the same patient. to verify that antigen presentation is involved in rheumatoid arthritis (ra), a disease in which, unlike thyroiditis, the nature of the major antigen-presenting cell (apc) is unknown, the effect of anti-class ii antibodies, at concentrations which block activation of t cells, on the synthesis of hla-dr mrna was evaluated. the inhibitory effect supports a critical role of an as yet unknown apc in maintaining the chronicity of ra.
conversely, all the clones were unable to respond to a substitution at 91 (tyr to asp). nase mutant proteins were constructed with the same single amino acid substitution, and t cell responses to peptides and mutants were compared. preliminary evidence suggests that the mutant proteins, like the peptides, substituted at residue 88 and 91, will not induce t cell clone responsiveness. these data suggest that the overall structure of the protein will not compensate for the loss of a particular amino acid which is necessary for t cell recognition.
medicine, baltimore, md 21201. to explore the variables important in t cell priming, an adjuvant-free immunization regimen was developed. b10.a mice were primed subcutaneously with syngeneic spleen cells that had been pulsed with high concentrations (100 μm) of the peptide 81-104, a cnbr cleavage fragment of pigeon cytochrome c. the t cell response was assessed using a sensitive limiting dilution assay that measures lymphokine production with the ctl-l cell line.
the precursor frequency of antigen-specific cells in the draining lymph nodes of mice primed with antigen-pulsed spleen (aps) was 1 in 4000, indistinguishable from the frequency of 1 in 3400 found in mice primed in the footpads with 10 nmol of 81-104 in complete freund's adjuvant (cfa) (data are given as geometric means, n=5, s.e.m. = ×/÷ 1.7 and 1.3, respectively). despite the apparent similarity in the t cell compartment of mice primed using these different regimens, antibody induction was strikingly different. mice primed with 81-104 in cfa developed serum igm and igg responses against the peptide, with antibody detectable in an elisa assay at a 1:3000 dilution. mice primed with 81-104/aps, however, produced no detectable anti-peptide antibodies. maximal t cell clonal expansion therefore appears to be possible in the absence of antigen-specific b cells. these data argue against the hypothesis that antigen-specific b cells play an obligate role in t cell proliferation in vivo. the reasons for the lack of antibody induction are currently under investigation.
cell receptor (tcr) complex of jurkat cells. the coprecipitation of these peptides with tcr requires treatment with monoclonal antibodies (mabs) directed against tcr (c305 or r140) prior to cell lysis and immunoprecipitation. treatment of jurkat cells with mabs directed against cd2 (9-1 or 9.6) or hla (w632) does not induce the association of these peptides with tcr. the signal-transduction mutant cell lines, j.cam1 and j.cam2, have previously been described (1,2). these cell lines, derived from jurkat, fail to activate the inositol-phospholipid second-messenger pathway in response to anti-tcr mabs. treatment with mab c305 induces the association of the 35 and 39 kd peptides with tcr in j.cam2 cells but not in j.cam1. j.cam1 modulates tcr normally in response to anti-tcr mab treatment (1).
hence, these observations suggest that the two peptides are involved in the signal-transduction pathway of the t cell receptor complex rather than receptor internalization.
sle is an autoimmune disorder associated with several different hla class ii antigens. we studied a large sle patient population by sequencing of the pcr-amplified first domain of the dqb and dqa chains and by sequence-specific oligonucleotide probes to further define these associations. shared dqb sequences at amino acid positions 26=leu, 30=tyr, and 57=asp may predispose some individuals with hla dr1, 2, 4, or 6 to develop sle. a novel dqb sequence found in two drw6 dqw1 sle patients shares these amino acids in the dqb hypervariable regions. the association of the dr2 dqw1.azh gene was greatly increased in the sle patients with lupus renal disease. the hla association may be directly due to structural aspects of the hla genes.
when parent → f1 chimeras are prepared with supralethal irradiation (1300 rad + 900 rad), the donor-derived cd4+ cells differentiating in the chimeras show partial tolerance to host-type h-2 determinants, despite the apparent absence of host-type apc. donor-derived cd4+ cells give only low proliferative responses to host-type apc in primary mixed-lymphocyte reactions (mlr); furthermore, in i-e- → i-e+ combinations, the donor cd4+ cells show ≈70% deletion of cd4+ cells expressing i-e-reactive vβ11 t cell receptor molecules. tolerance measured by these two parameters applies not only to lymph node (ln) cd4+ cells but also to cd4+ cells recovered from the thymus. this finding implies that tolerance is induced intrathymically, presumably through contact with a non-marrow-derived component of the thymus, e.g. epithelial cells. in support of this possibility, thymectomized a → (a × b)f1 chimeras given strain a marrow cells and a strain a thymus graft (irradiated) show no detectable tolerance to host-type strain b determinants: the strain a cd4+ cells differentiating in these chimeras give strong mlr to strain b and do not show deletion of vβ11+ cells.
irbp (interphotoreceptor retinoid-binding protein) is a glycoprotein of 1264 residues (bovine) which localizes in the retina and pineal gland and induces inflammatory changes in these organs (eau and eap, respectively) in immunized animals. the experimental disease is considered a model for certain uveitic conditions in man. we have recently shown that irbp-derived synthetic peptides can also induce eau/eap in lewis rats. the present study compared two such peptides, "r4" (residues 1158-1180) and "r14" (1169-1191). peptide r14 was found to be immunodominant, as shown by its being recognized by lymph node cells (lnc) or line cells sensitized against whole irbp. in contrast, peptide r4 was not recognized by the whole irbp-specific lymphocytes and is considered nondominant. in addition, lnc sensitized to r14, but not to r4, responded to intact irbp. r14 was superior to r4 in producing eau/eap and cellular immunity (minimal doses: 0.06 vs 67 μg/rat). on the other hand, the two peptides were comparable in their capacity to stimulate presensitized lymphocytes. moreover, lnc sensitized against r4 were similar to those sensitized against r14 in their capacity to adoptively transfer eau/eap to naive recipients. this study thus provides a unique system in which both immunodominant and nondominant peptides produce autoimmune disease and can be compared for their immunological features.
the age-related diminution in immune responsiveness has been shown to result from increased regulatory mechanisms and not from a paucity of immunological recruitment (aging: immunology and infectious disease 1, 47, 1988).
we present evidence based on "libraries" of monoclonal antibodies (mabs) obtained from young and aged donors that there occurs with aging an increase in autoimmunity which is possibly the result of the accumulation of lifelong "original antigenic sins". the resultant increased connectivity of the immune system is represented by mabs obtained from aged donors which are multiply anti-self cross-reactive. furthermore, increased connectivity is supported by the evidence that anti-2,4,6-trinitrophenyl mabs are ad8 positive and 2d3 positive, as determined by inhibition studies using mab anti-idiotypic reagents. analysis of the vh and vk region genes utilized by these mabs indicates a nonrandom gene usage. lifelong stochastic immunological events lead to a pattern of cross-reactivities and non-random usage of vh genes. these immunological events lead to the emergence of the patterns which are partially elucidated by the data presented. these patterns mimic those seen early in ontogeny, but indicate a possible convergence to an ever-increasing connectance of the idiotypic repertoire expression. in other words, life-long immunological experiences contribute to a down-regulation resulting in both paucity of primary immune responses and an increase in autoimmunity, which are both the earmarks of immunity in aging. (supported by usphs grants ag-04042 to eag and al 18316 to cab)
immunogenicity c449 stimulation of these cell lines in suspension with saturating levels of mab okt3 produces total and fractional inositol phosphate accumulation linearly related to receptor number (r > 0.9). this technique also allows an approximation of the minimal number of receptors which must be engaged for second messenger generation in this system, which we estimate as 6.5 × 10² receptors per cell.
or terminate t cell activation.
since this molecule plays an important role in human t cell development, we sought to identify the murine homologue of cd28 in order to determine its expression on murine t cells and its role in activation. we have used a human cd28 cdna clone to isolate a full-length cdna encoding the murine equivalent of cd28 from an el4 t cell lymphoma library. this clone shows similar domain organization and a high degree of homology to the human cd28 molecule. the murine cdna clone has been used to examine mrna expression of cd28 in normal and activated murine t cells, and in various t cell tumors. peptides generated from the translated sequence will be used to produce antisera to correlate the surface expression of cd28 with mrna expression, and to biochemically and functionally characterize this molecule.
pat happ and ed palmer, basic sciences division, dept. of pediatrics, national jewish center for immunology and respiratory medicine, denver, co 80206. we have attempted to determine the frequency of rearrangement and expression of the individual α and β chain v gene segments that make up an unselected, untolerized t cell repertoire. in order to do this we generated over 500 t cell hybridomas from freshly isolated thymocytes of newborn c57bl/10 mice and subjected rna from these hybrids to northern dot blot analysis using 11 vα, 16 vβ, cγ and cδ probes. comparison of the expressed repertoire of vβ gene segments in this newborn thymocyte population with similar data previously generated from an adult peripheral t cell population reveals two vβ genes, vβ12 and vβ15, whose expression is decreased in the periphery, possibly due to the effects of tolerance. two additional vβ gene segments were expressed more frequently in the peripheral population than in the newborn thymus, vβ5 (4.5 times higher in the periphery) and vβ10 (2 times higher).
it is possible that these represent two instances of positive selection of t cells which is determined primarily by the receptor's vβ gene segment. vα gene segments were expressed in only 15% of newborn thymocyte hybridomas (compared to 58% expressing vβ), and determination of vα rearrangement frequencies was complicated by the unexpectedly large number (67%) of hybrids expressing cδ mrna. further examination revealed that several vα gene probes were actually detecting rearrangements to cδ. the most notable of these was vα7, which accounted for approximately 34% of the expressed vα repertoire but was rearranged exclusively to cδ.
barbara bergman, brenda bradley, kevin lafferty and mary portas. barbara davis center for childhood diabetes, u. colo. health sci. ctr., denver, co 80262. we have produced a panel of islet-specific t cell clones by culturing lymphoid cells obtained from non-obese diabetic (nod) mice in the presence of nod islet cell antigen and antigen-presenting cells (apc). these clones were selected for the panel on the basis of (a) their antigen-specific reactivity to islet cells and apc in an in vitro proliferation assay and (b) their ability to mediate islet graft rejection in vivo in a tissue-specific manner. we have further characterized these lines for cell surface phenotype, il-2 production, and proliferative response to non-nod islet antigen. all of the clones tested to date are of the cd4 phenotype and make il-2 in response to islet antigen and nod apc. nearly all of the clones we have tested also make good proliferative responses to islet cell antigen obtained from mouse strains other than the nod or to a mouse beta cell tumor line. preliminary results indicate that at least one of these clones can lead to islet cell damage in a disease transfer experiment in which the cloned t cells are injected into a non-diabetic nod f1 recipient.
we are currently carrying out tests to further characterize lymphokine production by these cloned cell lines, to analyze differences in antigen recognition and mhc restriction requirements among the clones, and to determine their effectiveness in mediating the disease process in nondiabetic animals.
in an attempt to identify the epitopes on class i molecules recognized by alloreactive cytotoxic t lymphocytes (ctl), we have examined kb-specific ctl for their recognition of synthetic peptides with sequences derived from the native kb molecule. consecutive overlapping peptides from the kb molecule were tested for their capacity to inhibit kb-specific ctl clones in their recognition of cells expressing the native kb molecule. in these studies inhibition by peptide was found to be an extremely rare event, although one peptide (kb 111-122) did inhibit recognition by a particular ctl clone (clone 13). in a separate set of experiments it was observed that clone 13 could recognize kb 111-122 when presented by h-2 class i molecules. as clone 13 was of h-2 origin, this finding led us to conclude that inhibition may be due to class i-restricted recognition of the kb peptide on the surface of the ctl clone, peptide and native kb for the t cell receptor. we present evidence in favor of this conclusion.
the pepscan method is used for the systematic identification of sequential b cell epitopes in protein molecules (geysen, meloen, barteling. pnas 81: 3998; 1984). it was designed for the synthesis and subsequent testing for antibody binding of large numbers of overlapping peptides directly on their solid supports. the mhc-dependent presentation required for t cell recognition seemed prohibitive for the use of pepscan to identify t cell epitopes. however, we have now shown that by a novel modification the peptides can be recovered from their solid supports and used in t cell assays.
holmdahl, department of medical and physiological chemistry, box 575, uppsala university, s-75123 uppsala, sweden.
both autoreactive t cells and autoantibodies play an important role in the pathogenesis of type ii collagen (cii) induced arthritis in mice. we have earlier reported that only strains with h-2q, h-2w3, h-2w17 and h-2r were responders to autologous mouse cii and only these strains developed arthritis after immunization with autologous or heterologous cii. however, heterologous cii induced a more acute and severe disease and a more pronounced autoantibody response. these findings indicate that 1) the ability to mount an immune response against autologous cii is a prerequisite for susceptibility to collagen arthritis and 2) a crossreactive autoantibody response after immunization with heterologous cii may further enhance development of arthritis. we have now studied activation of autoreactive b cells after primary immunization of dba/1 mice with rat cii. in hybridoma collections obtained 9-11 days after immunization, 30-80% of the hybridomas produced igg reactive with autologous cii, 10-15% produced multispecific igm and a significant number produced igg rheumatoid factors. the anti-cii antibodies recognized at least 5 different epitopes on the cii molecule and originated from many different vh and v kappa gene families. furthermore, none of the 6 anti-cii hybridomas investigated expressed cd5 rna message. we therefore suggest that the primary anti-cii autoantibody response involves activation of memory b cells. these memory b cells have most likely been activated earlier by cii-autoreactive t cells. in these aspects the origin of the anti-cii autoantibody response is principally different from the origin of "natural" autoantibodies.
t cell receptors (tcr) recognize antigen in association with self mhc molecules, usually following processing to smaller peptides. the t cell repertoire to an antigen, therefore, reflects not only the ability of a given mhc molecule to interact with antigen, but also the effects of initial repertoire selection by self mhc.
we have been analyzing the tcr repertoire specific for beef insulin (bi) in balb/c mice (h-2d), which are high responders to the antigen. these studies revealed that vβ8.3 is dominantly used in the tcrs specific for bi/a, and our preliminary data suggest that the vβ8.3 chain may be involved in mhc restriction. we have now obtained several t cell hybridomas specific for bi from (balb/c × a/j)f1 animals. a/j mice (h-2a) are low responders to bi while the f1 mice are high responders. most of the balb/c × a/j hybridomas were restricted to the hin the balb/c hybridomas. interestingly, the analysis of v gene usage demonstrated that vβ8.3 was not used in the balb/c × a/j hybridomas. the relevance of these results to the development of the tcr repertoire in different mouse strains will be discussed. this work was supported by the mrc of canada.
ctl specific for the a2k molecule were generated from normal splenocytes by in vitro culture with a2k-bearing stimulators. these ctl have been shown to lyse transfected targets expressing hla-a2 regardless of their murine haplotype, and they specifically kill a2-bearing human target cells. furthermore, the effector function of these ctl can be inhibited with an hla-a2-specific monoclonal antibody. thus, the transgene product functions correctly as a tolerogen and is recognized directly as a class i antigen. although transgenic mice have been shown to be tolerant to a2k expressed by murine cells, transgenic ctl specific for hla-a2 on the surface of human cells have been generated. these
the major virulence factor of m. pneumoniae was shown to be a 168 kda protein which is located in the tip structure membranes of these cells. besides its adhesin function, this protein is also involved in the first massive humoral and cellular responses of the human host during the acute phase of upper respiratory tract infections and interstitial pneumonia.
intranasal inoculation of guinea pigs with the isolated 168 kda protein led to lympho-histiocyte infiltrations around bronchi and small vessels of the lungs, which are characteristic infiltrations after an infection with live m. pneumoniae cells. furthermore, one peptide (17 amino acids long), which was synthesized according to the amino acid sequence of the adhesin, showed a proliferative activity on in vitro cultivated t-cells of bronchial washings, whereas synthetic peptides with the sequences of the direct neighbourhood showed no in vitro activity. most interestingly, this t-cell proliferative activity is located on a surface loop of this protein which is also responsible for the adhesin function.
christopher a. smith, gwyn t. williams, rosetta kingston and john j.t. owen. department of anatomy, university of birmingham, medical school, vincent drive, birmingham b15 2tj, uk. rearrangement of t-cell receptor α and β chain gene segments during t-cell development results in a diverse array of receptor specificities. to avoid auto-immune responses, cells that have generated self-reactive receptors are thought to be eliminated or inactivated, to produce self tolerance. recent studies have provided compelling evidence that clonal deletion of immature receptor-bearing cells within the thymus makes an important contribution to this process, although the mechanisms involved are not understood. we have now obtained evidence that engaging the cd3/t-cell receptor complex of immature mouse thymocytes with anti-cd3 antibodies added to thymus organ cultures induces dna degradation and cell death through the endogenous pathway of apoptosis. this is in marked contrast to the activation of mature t-cells by the same anti-cd3 preparation and is specific to the extent that apoptosis is not induced by either anti-cd4 or anti-thy-1. in addition, calcium ionophore (ionomycin) also causes apoptosis when added to organ cultures, suggesting a role for changes in intracellular ca2+ levels in the signalling pathway leading to the induction of apoptosis. thus activation of the process of apoptosis in immature cells binding self antigens may be the mechanism responsible for the selective deletion of cells that could generate an auto-reactive response if allowed to mature.
anderson cancer center, houston, tx 77030. immunization of patients with bcg and irradiated tumor cells induces specific delayed-type hypersensitivity (dth) to tumor cells and not to normal colon cells. since igg antibodies may require t-cell help, we wished to characterize the igg-defined tumor-associated autoantigens (taaa) of human crc so as to define a subset of the t-cell repertoire for crc. western blots of detergent extracts of 73 primary and metastatic human colorectal carcinomas and paired normal tissues were probed with autologous igg. nine taaa were recognized by 20% or more of the sera: 74, 72, 58, 52, 45, 41, 38, 29, and 26 kda. these taaa may be normal colon differentiation antigens, since they were present in extracts of normal colon. autoantibodies to the 41 kda antigen are more frequently present in patients with metastases (79%) than in those with primary tumors (47%).
prl-8 population. results indicate that antigen receptors on both immature and mature positive t cells transduce signals via calcium mobilization; however, the magnitude of influx of extracellular ca2+ which follows binding of anti-receptor antibody differs between these populations. specifically, immature cells show a much reduced ca2+ influx response compared to mature cells. we show here that although ligation of tcr αβ has different consequences with regard to ca2+ mobilization in mature and immature cells, no such difference is seen following ligation of the receptor's transducer, cd3.
the results suggest that the signalling cascade leading to the influx of extracellular ca2+ is intact when cd3 is ligated, but is incomplete when tcr αβ, the physiological ligand, is ligated. in addition, ligation of cd4 or cd8 on immature t cells induces an influx of extracellular ca2+ comparable to that seen in mature t cells. a clonal population has been isolated from immature thymocytes which has the characteristic signal transduction properties of the bulk of immature thymocytes. these findings suggest that "signal" transfer from tcr αβ to cd3 may be inefficient in cd4+8+ cells. structural analysis of the tcr αβ/cd3 complex in immature and mature t cells is in progress.
flood and alan friedman, dept. of pathology, yale university school of medicine, new haven, ct, 06510. protective immunity to the ultraviolet (uv) light-induced sarcoma 1591-re is directed toward a single tumor-specific transplantation antigen expressed by the 1591-re tumor cells, termed the a antigen. a progressive variant line of 1591-re, termed 1591-pro4, lacks only the expression of this a antigen. immunization of mice with 1591-re tumor cells haptenated with trinitrophenyl (tnp-1591-re) leads to the subsequent rejection of tnp-haptenated progressive tumors and to increased delayed-type hypersensitivity and ctl responses to tnp in normal, immunocompetent syngeneic mice. however, little or no humoral immunity to tnp is seen in animals injected with tnp-1591-re tumor cells. pro4 did not exhibit tnp-specific tumor protective or cell-mediated immunity, but rather exhibited tolerance to subsequent immunizations with more immunogenic forms of tnp. biochemical and molecular genetic studies have revealed the a antigen to exist on the cell surface as a complex of 3 class i mhc-like molecules. transfection studies with dna encoding each of these molecules into 1591-pro4 reveal that the expression of one, and only one, of these three molecules mediates this increased immunity.
these experiments suggest that an mhc-like antigen expressed on the 1591-re sarcoma acts as a natural adjuvant to increase cell-mediated but not humoral immunity to linked antigens. the mechanism of this increased immunity is discussed.
efforts to immunize cattle against economically important gastrointestinal nematodes showed that immunity is manifested by: 1) a response that reduces the fecundity of established and subsequently acquired worms, and/or 2) a reduction in the number of worms developing upon challenge infection. however, the ability of individuals to mount such immunity is highly variable. extending these studies to naturally infected populations indicates that there is a great difference in the number of eggs excreted by individual young calves on pasture. to delineate whether these differences were the result of host genetics and to begin to elucidate the mechanisms of resistance to parasite infection, a genetically defined cattle herd was assessed for parasite levels by determining fecal eggs per gram. three years of sampling of the calves during their first grazing season indicates that: 1) certain individuals in the herd will consistently excrete high or low numbers of parasite eggs, 2) the high or low phenotype is significantly controlled by the genetic make-up of the calf, and 3) the high or low phenotype is highly heritable (heritability -40). susceptibility is currently under investigation. calves have been determined and mhc class ii typing is currently in progress. information is being used to assess the role of the bovine mhc in controlling immunoresponsiveness to parasite antigens.
we have been studying the differential effects of il 1 + il 2 versus il 4 on the growth and differentiation of cd4-, cd8- thymocytes. culture of highly purified cd4-, cd8- thymocytes with il 1 (4 u/ml) + il 2 (100 u/ml) resulted in marked proliferation and increased cell size without change from the cd4-, cd8- phenotype.
culture with il 1 or il 2 alone did not cause proliferation. a substantial contribution to the proliferation was secondary il 4 release: addition of anti-il 4 (11b11) blocking antibody inhibited proliferation induced by culture with il 1 + il 2. il 4 mrna was demonstrated by northern blot analyses after 48 and 72 hours of culture with il 1 + il 2, whereas none was detected at culture initiation and very little was present at 24 hours. effects of il 1 + il 2 on il 4 transcription rate will also be reported. despite a marked inhibition of proliferation with anti-il 4, there was no effect on expansion of cd3+ cells following culture with il 1 + il 2 (increase from 10% to 60% in 72 hours). thus, in this system, il 4 enhances proliferation of progenitor thymocytes but does not contribute to induction of t cell receptor.
complex. these complexes can be used to immunize mice in the absence of protein carriers or adjuvants, thus facilitating the study of the immune response to a small chemically defined antigen. use of this technology has allowed us to identify two t helper cell epitopes in conserved regions of hiv gp 160 not previously identified by computer algorithms, defined by amino acids 485-518 and 585-615. immunization with these peptides in peptide-phospholipid complexes results in the production of i* and i* antibodies, which cross-react with cloned fragments of the whole protein. using this technology we have begun to characterize the immune response to individual peptide antigens. the response of h-2k mice to amino acids 494-518 of gp 160 of hiv has been analyzed. the optimal dose of a peptide containing both b and t cell epitopes was found to be 15-30 μg, depending on the route of administration. im immunization required less antigen for optimum antibody response than did ip. ment for an ant-response. additional variables, such as phospholipid composition and method of cross-linking, have been studied and will be .
we believe that the use of this peptide-phospholipid complex technology will be significant both for studying the immune response to single epitopes and for vaccine development.
based on assays in which t cell proliferation was induced via oxidative mitogenesis and exposure to mhc alloantigens, it has been reported that langerhans cells (lc) isolated from normal mouse skin acquire maximum capacity to activate t cells only after 72 hours of culture. we have studied lc from balb/c mouse skins for their capacity to present ovalbumin (ova) and ia alloantigens to unprimed t cells and to antigen-specific t cell hybridomas. the data reveal that both fresh and cultured lc presented ova and alloantigens with equal efficiency to previously primed responders (and with 10-fold greater efficiency than spleen cells or the b cell lymphoma a20.1-11). by contrast, only cultured lc displayed the capacity to present antigen to unprimed t cells. we propose that the antigen-presenting potentials of freshly prepared and cultured langerhans cells, respectively, reflect the in vivo functional properties of intraepidermal lc and of lc that have picked up antigen in the epidermis and migrated via the dermis to the regional lymph node. if so, these data suggest that resident epidermal lc are fully prepared to present cutaneous antigens to memory/effector t cells (efferent limb), whereas resident lc must leave the influence of the epidermis in order to develop the capacity to meet the more stringent conditions required for antigen presentation to unprimed t cells (afferent limb).
we have investigated the structural restrictions placed on residues contained within a minimal t cell determinant, using the balb/c class ii restricted t cell response to the site 1 determinant of the influenza hemagglutinin molecule as a model system.
to delineate which of the residues comprising the site 1 determinant are involved in interaction with the t cell receptor, we have determined the response of a large panel of site 1 specific t cell hybridomas to a collection of peptide analogs differing by single conservative or non-conservative substitutions at 9 positions. the fine specificity patterns of the t cell panel are extremely diverse; t cells varied in both the location and number of residues within the antigenic peptide that affected recognition. our results implicate at least 6 out of 9 residues within the antigenic peptide as being involved in interaction with the t cell receptor. this result suggests that peptides comprising the site 1 determinant do not form alpha helical structures when in association with mhc molecules.

rubella-specific isotype and igg subclass responses were evaluated using elisa techniques in 44 rubella hai seronegative adult females undergoing rubella immunization (ra 27/3 strain). responses were evaluated prior to immunization and at 1, 2, 3, 4, 5, 6, 12 and 24 wks post-immunization. pre-immunization sera showed detectable levels of rubella-specific antibody in the igg class (17/44), the iga class (32/44) and in one or more of the igg subclasses (22/44). post-immunization, 11 subjects failed to develop igm class responses by the hai (sdg) technique while 43/44 developed igm antibody by elisa techniques. iga responses were detected at low levels in all vaccinees beginning at 3-4 wks and declining by 24 wks post-immunization. antibody was detected in the igg1 and igg3 subclasses by 2-3 wks post-vaccine, with sustained igg1 levels but a significant decline in igg3 levels noted between 6 and 24 wks post-immunization. no seroconversion was noted in the igg2 subclass, although 9 individuals had detectable pre-immunization igg2 rubella antibody present. igg4 levels were detected in all vaccinees post-vaccine with a delayed and progressive rise over the study period.
subsequent correlation was then performed between rubella-specific antibody responses and the presence or absence of adverse joint reactions occurring in association with rubella vaccine administration. all individuals produced detectable

c 624 t lymphocyte responses to varicella zoster virus. anthony hayward, abbas vafai, roger giller & eileen villanueba. departments of pediatrics and microbiology, university of colorado school of medicine, denver co 80262 & university of iowa school of medicine, iowa city ia

the proliferative response of blood lymphocytes from varicella zoster virus (vzv)-immune donors to live vzv, extracted vzv antigens or purified glycoproteins is predominantly by cd4+, hla-d restricted t cells, but little is known of the specificities of the responder cells. we restimulated t cells cloned by limiting dilution from vzv-stimulated cultures with purified vzv glycoproteins gpi, gpii and gpiii and found that t cell clones with specificity for each of these mediated both help for antibody responses and hla-dr restricted vzv-specific cytotoxicity. 5 polypeptides of 10 to 14 amino acids length corresponding to predicted amphipathic sequences in the primary structures of gp i, gp ii and gp iv were synthesised. proliferative responses were observed to 3 of these peptides (one from each glycoprotein) with responder cell frequencies in the 1:lo blood t cells range. the gp i peptide additionally defined an epitope recognised by serum antibody.

an immunomodulatory approach to treating hsv-1 corneal disease, hendricks rl, departments of ophthalmology, and microbiology/immunology, university of illinois school of medicine, chicago, il 60612

herpes simplex virus type i (hsv-1) corneal infections are a leading cause of blindness worldwide.
we and others have demonstrated that the cellular immune response to hsv-1 contributes to the elimination of virus from the cornea, but in doing so causes the tissue destruction that is responsible for the blinding complications of the disease. we have demonstrated that specifically suppressing the cytotoxic t lymphocyte (ctl) response to hsv-1 renders mice resistant to corneal disease following topical corneal hsv-1 infection. in agreement with this observation was our recent finding that in vivo depletion of l3t4+ cells (t helper/inducer, and most dth effector cells) neither reduced susceptibility to corneal disease, nor increased susceptibility to disseminated disease. the corneal lesions in l3t4 depleted mice contained numerous lyt-2 (t suppressor/cytotoxic) cells, and no l3t4 cells. the l3t4 depleted mice exhibited normal hsv-specific ctl precursor frequencies. experiments designed to determine the effect of in vivo lyt-2 depletion on susceptibility to corneal disease are in progress. our goal is to identify cellular immune responses to hsv-1 that maximize protection while minimizing immunopathology in the cornea, and to identify hsv-1 epitopes that preferentially activate those responses. supported

we found that affinity purified antibodies to bsa, klh and diphtheria toxoid all contain a substantial amount of specific anti-idiotypic activity. against bsa react with mouse anti-bsa antibodies, which suggests that we are dealing with internal image antibodies. 9

mrl-lpr/lpr mice develop spontaneous autoimmunity. we found that these mice make anti-anti-(self h-2) antibodies prior to making appreciable amounts of pathological autoantibodies such as anti-dna, anti-rnp.sm, and rheumatoid factor. the anti-anti-self antibodies are detected using an inhibition of antibody mediated cytotoxicity assay, which also detects anti-anti-(self h-2) in ordinary allogeneic anti-sera.
the antibodies are not rheumatoid factors, although the animals do make rheumatoid factors later in the development of the disease. the anti-anti-self activity is fully developed at 2 months, when the other autoantibodies are typically barely detectable. we conclude that anti-anti-self antibodies could play an important role in the etiology of the disease.

c 627 feedback regulation of il-1 synthesis in monocytes by t cell products: dual effect of il-4. mikko hurme, tessa palkama and marja sihvola, department of bacteriology and immunology, university of helsinki, sf-00290, helsinki, finland.

il-1 production of human monocytes/macrophages is regulated by several cytokines, some of which are themselves able to activate the il-1 production (e.g. tnf and il-2) while others (e.g. ifn-γ) modulate the production activated by other signals. we have now examined the effect of il-4 on il-1 synthesis. il-4 alone did not induce any il-1 bioactivity or il-1α or il-1β mrna expression in freshly isolated peripheral blood adherent cells. in contrast, il-4 effectively suppressed the lps induced il-1 production. this suppression took place without any decrease in the steady-state levels of il-1α and il-1β mrna, suggesting that this downregulatory effect is posttranscriptional. monocytes/macrophages are known to rapidly lose their ability to produce il-1 when cultivated in vitro. if ifn-γ is present in the culture fluid, the cells remain capable of producing il-1. as ifn-γ and il-4 have been reported to have similar "priming" effects on macrophages (e.g. increasing the tumoricidal capacity and mhc class ii antigen expression), we cultivated monocytes for 24 h in the presence of either ifn-γ or il-4, and after washing the cells they were stimulated with lps. il-1 activity could be detected both in the ifn-γ and il-4 preincubated cultures (but not in the cultures preincubated with medium alone).
these data suggest that il-4 can also display a similar upregulatory function in il-1 production as ifn-γ.

galveston, tx 77550

development of immunity to members of the spotted fever group of rickettsiae is a t-cell dependent response. we have used t-cell hybridomas and cloned t-cell lines from immune animals and convalescent humans to identify the rickettsial antigens that induce antigen-responsive t-cells. in these studies we found that the 155 kda antigen of rickettsia rickettsii, the causative agent of rocky mountain spotted fever, is one of the immunodominant t-cell antigens. t-cells from immune animals and humans responded in culture to a recombinant 155 kda antigen. both sources of t-cells were of the t-helper type (l3t4+ and cd4+ respectively) and produced il-2 and interferon. it was found that soluble antigenic material of r. rickettsii obtained by extraction with hypotonic buffer maximally stimulated the t-cell lines. this material was enriched for the high molecular weight polypeptides of 155 kda and 120 kda. also. ethirwwi ' will induce a long-lived immunity against infection with r. rickettsii. infected guinea pigs develop a minimally cross-reactive antibody response to r. rickettsii. in contrast, a strong cross-reactive t-cell proliferative response is produced. studies are in progress to determine the nature of the common protective antigen of r.

humans infected with the parasitic nematode ascaris lumbricoides vary considerably in antibody responsiveness to a 14 kda component of the parasite. this molecule is secreted by the parasite, and is also abundant internally. this heterogeneous reactivity has been modelled in laboratory rodents, and the antibody response to it is h-2- and rt1-restricted in mice and rats, respectively. using inbred and congenic animals, only mice of h-2' and rats of rt1u were, so far, found to be responders, and this restriction only operated in the context of infection.
the specificity of the ige response in these animals was assayed by passive cutaneous anaphylaxis, and in an ige-specific elisa assay. the data show that the above mhc restriction also applied to the specificity of the reaginic antibody response, although animals of all mhc haplotypes responded to other ascaris allergens. amino acid analysis of the 14 kda component equates it to a previously identified "allergen a" of the parasite, and we now have its sequence available. these findings have implications for the genetic control of allergic responses in general, and, in particular, for the hypersensitivity responses which are such a feature of infections with parasitic nematodes. there are also implications for the generation of hypersensitivity responses by recombinant vaccines involving certain parasite antigens.

the cns. immunohistochemical analysis of both frozen sections prepared from the brains of animals immunized in this manner and of highly enriched glial cell subpopulation cultures for viral gp70 expression indicated that oligodendrocytes and possibly a subset of astrocytes were the targets of this infection. further, microscopic analysis of frozen sections failed to reveal any overt signs of gross pathologic changes associated with the viral infection. we have been able to demonstrate the presence of virus specific antibody in the serum of these mice as well as virus specific cytolytic t cells in the peripheral lymphoid organs. experiments are currently underway to determine whether the lack of pathology associated with wb91 infection, in light of the previously shown virus specific immune responses in these mice, is due to a failure of antigen presentation within the cns or some other form of immunoregulatory phenomenon.

the t lymphocyte proliferative response to pigeon cytochrome c in b10.a mice is restricted to the eβk:eαk ia molecule and specific for the c-terminal determinant comprised of residues 93-104.
b10.a(3r) and b10.a(5r) mice are nonresponders to pigeon cytochrome c. nonetheless, the t cell repertoire of b10.a(3r) or (5r) contains some t cell clones capable of recognizing and proliferating to pigeon cytochrome c when presented by b10.a antigen-presenting cells (apc). therefore, one would expect to stimulate such clones in allogeneic bone marrow chimeras of the type b10.a + b10.a(3r) or (5r) b10.a apcs and a b10.a(3r) or (5r) t cell repertoire, respectively. ,en(isea9c22rzave were primed with pigeon cytochrome cytochrome c, they showed a good antigen specific proliferative response in vitro. surprisingly, however, if pigeon cytochrome c was used for priming, no response was detected, even at priming doses as high as 400 nmol per mouse. 81-104 could only be achieved by treating the allochimeras with an anti-cd8 monoclonal antibody in vivo during the priming step. clones specific for purified protein derivative (ppd) in the same chimera. thus the regulation which involves cd8 positive cells is antigen specific. transfer of pigeon cytochrome 81-104 primed lymph node cells from the chimera into naive b10.a mice prevented priming of the recipient for a t cell proliferative response to pigeon 81-104, but not priming to the moth synthetic fragment. chimeras of an antigen-specific suppression mechanism involving cd8 positive cells.

faculty of medicine, kyoto university, kyoto 606, and department of oncology, nagasaki university school of medicine, nagasaki 852, japan.

sera from b6 mice immunized with a syngeneic ctl specific for fbl-3 tumor of b6 origin blocked the cytotoxic activity of only the immunizing ctl clone. therefore, a monoclonal antibody (mab) n9-127 was produced by fusion of the b6 spleen cells immune to a syngeneic fbl-3-specific ctl clone (no. 8). the specificity of the mab n9-127 was confirmed by immunoprecipitation, blocking of cytolytic activity, stimulation of proliferation, and induction of tcr-mediated nonspecific cytolysis of the ctl clone no.
8. in some b6 mice, 3-13% of the anti-fbl-3 mltc cells were positive for this n9-127-defined idiotype, and formed a well demarcated population upon examination by flow cytometry. even in mice in which no such population was observed, some ctl clones established by limiting dilution culture were also positive for this idiotype (10 out of 89 clones from 3 mice). the cytotoxic activities of these ctl clones were blocked by n9-127, which in turn induced nonspecific cytolysis in a redirected assay. however, no positive cells were detected in non-cultured normal or fbl-3-immune spleen and lymph node cells. this indicates the presence of a cross-reactive (dominant) idiotype in the b6 anti-fbl-3 cytotoxic t cell responses and may provide a potent tool for analyzing the idiotype-mediated regulation of anti-tumor immune responses.

slade and sylvie gillard, max-planck-institut für immunbiologie, d-7800 freiburg, federal republic of germany

t cells play an essential role in the protective immune response to malaria and are associated with some of the pathological consequences of the disease. however, the nature of their responses and the antigens to which they respond are not well defined. we have developed a limiting dilution assay system in which specific t cell responses to malaria antigens can be monitored at the clonal level. it is possible to determine the nature of the responding t cell by the growth factors they secrete and by their ability to act as helper cells for the antibody response to malaria antigens. our data suggest that the t cell response changes during the primary infection and in hyperimmune animals. one to two weeks after initiation of a blood stage infection the major cd4+ t cell which proliferates in response to parasite antigens secretes il-2 and ifn-γ but is not an efficient helper cell for antibody responses.
in contrast, later in infection and in immune animals there is an effective helper cell response and many of these cells are distinct from those secreting ifn-γ and il-2. we are currently investigating whether these cells retain these phenotypes when grown in long-term in vitro culture and whether defined antigens of the erythrocytic parasite elicit different t cell responses.

we have localized linear neutralization epitopes on the coronaviruses ibv, mhv, fipv and tgev. the results can be summarized as follows: 1. linear epitopes of the spike proteins (1162-1452 residues) could be mapped to a resolution of a single residue by expression of gene fragments in the prokaryotic pex plasmids and/or pepscan peptide synthesis. 2. the length of the epitopes varied from 4 to at least 20 amino acid residues. we present evidence that the larger epitopes, although conformation-independent according to operational criteria, are nevertheless discontinuous. 3. in ibv, we localized several overlapping but different epitopes within an immunodominant region of 30 residues. this region is recognized by all polyclonal antisera tested. we propose that its immunodominance is a consequence of its structure and function and does not depend on antigen presentation or idiotypic networks.

an immune response against the mouse testis-specific antigen ldh-c4 reduces fertility by 70 percent in female baboons. an immune reaction to human ldh-c4 would be expected to be more effective in primates. since the human testis enzyme is not readily available in large quantities, recombinant dna technologies were used to create a source of human ldh-c4. antibodies to mouse ldh-c4 were used to screen a λgt11 human testis cdna expression library. a full length human ldh-c clone was identified and sequenced, and the ldh-c cdna was engineered for expression in e. coli.
the 5' and 3' untranslated sequences were removed by restriction enzyme digestion, and synthetic linkers were added adjacent to the start and stop codons of translation. the modified cdna was subcloned into the prokaryotic expression vector pkk223-3 and introduced into w3110 laciq cells. cells were grown to mid-log phase and induced with iptg for positive regulation of the strong hybrid tac promoter. induced cells overexpressed the 35 kd subunit, which spontaneously formed the enzymatically active 140 kd tetramer. human ldh-c4 was purified 196-fold from liter cultures of cells by two step affinity chromatography to a specific activity of 90 i.u./mg. the 20 n-terminal amino acids sequenced were identical to those predicted from the nucleic acid sequence. antibodies to synthetic peptide epitopes of human ldh-c4 cross-reacted with the enzyme produced in e. coli. two mg of human ldh-c4 were expressed per liter of bacterial cells. the purified protein is now available for immunogenicity and fertility studies.

it is now generally accepted that the principal effector mechanism in the host's defence against leishmaniasis is gamma-interferon (ifn-γ), which activates infected macrophages to eliminate intracellular parasites. mice by prior sublethal whole body irradiation or treatment with anti-igm or anti-cd4 antibody. protection can also be induced by repeated intravenous or intraperitoneal immunisation with killed parasites or purified antigens. ly immunised mice produce little or no il-3 or il-4 but substantially elevated levels of ifn-γ when stimulated with leishmanial antigens in vitro. lymphoid cells from balb/c mice with progressive disease can inhibit the maf (macrophage activating factor) and leishmanicidal activities of the culture supernatant of lymphoid cells from mice recovered from l. major infection. maf appears to be ifn-γ, whereas the maf inhibiting factors are il-3 and il-4.
system can be reproduced with recombinant ifn-γ, il-3 and il-4, and the maf inhibiting activity of the suppressive supernatant can be reversed by specific anti-il-3 and anti-il-4 antibodies. the disease by influencing the ability of macrophages to kill the intracellular parasite.

the development of efficacious vaccines against malaria requires an understanding of the mechanisms involved in protective immunity. previous studies with plasmodium berghei demonstrated that sporozoite immunity is dependent upon antibody responses specific for the repeat region of the circumsporozoite (cs) protein and cell mediated mechanisms involving cd8+ t cells. in this study we analyzed the splenic t cell repertoire directed against epitopes on the cs protein of p. berghei and determined whether sporozoite-immune cd4+ and cd8+ t cells respond to shared or distinct epitopes. sporozoite-immune spleen cells and cd4+ and cd8+ enriched t cell populations of balb/c (h-2d), c3h (h-2k), and c57bl/6 (h-2b) mouse strains were cultured in the presence of irradiated sporozoites or synthetic peptides representing 70% of the complete cs protein. surprisingly, none of the cultures proliferated to any of the peptides tested, although proliferative responses to sporozoites were observed in unfractionated spleens and cd4+ t cell populations. cd8+ t cells did not respond to any of the antigens tested, even in the presence of exogenously added il-2. titration of cd8+ cells into proliferating cd4+ cell cultures did not suppress the anti-sporozoite response. the lack of anti-peptide reactivity contrasts with uniform responses to sporozoites and may be the result of the context in which cs antigens are presented to t cells. functional analysis of accessory splenic b cells and macrophages revealed that while the anti-sporozoite proliferative responses were not affected by the removal of macrophages, sporozoite-primed b cells were essential for the responses.
these data suggest that the cs protein on sporozoites is not processed extensively by macrophages to yield many potential t cell epitopes, but instead is presented by immunodominant b cells that restrict responses to a limited number of t cell clones.

infection of mice with hsv-1 induces a brisk ctl response which is necessary for the subsequent resolution of the primary infection and is also required for optimum protection against reinfection. current studies have demonstrated that relatively few of the viral antigens tested to date (7 viral envelope glycoproteins or 2 nonstructural nuclear proteins) are recognized by hsv-1 immune ctl populations generated in several different strains of mice (h2 haplotypes h2b, h2d, or h2k). this failure of hsv specific ctl to recognize the cloned gene products in in vitro assays was demonstrable at the clonal level and could not be attributed to a peculiarity of the recombinant vaccinia constructs used, because studies with adenovirus vectors or transfected l cell constructs yielded the same results. surprisingly, despite their inability to be recognized by hsv specific ctl in vitro, when used to immunize mice several of the vaccinia virus constructs would induce memory ctl populations capable of lysing hsv-1 infected autologous cells. for example, hsv-1 glycoprotein c (gc) was recognized by h2b restricted but not h2k restricted hsv specific ctl. however, immunization of either haplotype of mice with a vaccinia gc recombinant induced ctl populations which, upon in vitro restimulation with hsv-1, would lyse histocompatible cells infected with hsv-1. this demonstrates that despite the presence of suitable epitopes (intrinsic factors), the context of the immunogen (extrinsic factors) will also influence its ability to induce ctl. the results of further studies into the nature of these extrinsic factors will be presented and discussed with relevance to future sub-unit vaccine design. work supported by public health service grants ai 14981 and ai 24471 from the national institute of allergy and infectious diseases.

we have investigated the structural basis for antigen mimicry by anti-idiotypic internal image antibodies. two mouse monoclonal antibodies (mabs) that bear internal images of a well-defined protein epitope, i.e., the rabbit immunoglobulin (ig) a1 allotype, were produced and the variable region sequences were determined by rna primer extension sequencing. the results showed that the mab light chains did not contain any allotype-related residues; however, both heavy chain v regions contained a unique sequence homologous to the nominal antigen but in opposite orientation. this reversed sequence was expressed within cdr2 of both mabs. synthetic peptides corresponding to the putative antigenic regions of rabbit ig and the mab internal images, respectively, were tested for the ability to mimic the a1-like determinant. although the homologous residues were presented in opposite orientations, both peptides completely inhibited, at similar concentrations, the binding of rabbit ig to anti-a1 antibody. a paired thr and glu was necessary for expression of the a1 epitope, as revealed by conservative substitutions in the peptide sequence. computer-generated, energy-minimized models of rabbit ig and the mabs revealed that the critical a1 residue side chain placements could be almost superimposable in either context. thus, it appears that an antigenic epitope can be determined solely by

md 20892.

proliferation of murine type i cd4+ t cell clones requires simultaneous occupancy of the t cell antigen receptor and delivery of an accessory cell-derived costimulatory signal. in contrast, isolated t cell receptor occupancy induces the cell into a state of reduced proliferative responsiveness to antigen. based on the observation that pkc-activating phorbol esters can at times substitute for the presence of accessory cells in t cell proliferative responses
to mitogens or anti-cd3 monoclonal antibodies, we investigated the requirement for accessory cells in the antigen- and con a-induced hydrolysis of p m and activation of pkc. the presence of normal accessory cells was found to be unnecessary for the development of pkc-dependent phosphorylations, and the addition of normal accessory cells had no effect on the activity of pkc. cell il-2 synthesis and proliferation presents a paradox. we have studied the effects of treatment with a calcium ionophore and phorbol ester on t cells and find that increased [ca2+]i and pkc activation are in fact insufficient biochemical second messengers in the induction of proliferation. while proliferation was induced at high t cell density in response to these stimuli, incubation of t cells at decreased cell density demonstrated markedly reduced proliferation, and single t cells failed to divide. this suggested that cellular interactions were required in the response. addition of either il-2 or normal accessory cells allowed proliferation at low density, consistent with a requirement for an accessory cell-derived costimulatory signal in the induction of il-2 synthesis, even in the proliferative response to ionomycin and pma. this result underscores the importance of an accessory cell-derived costimulatory signal, acting independently of t cell receptor-mediated increases in [ca2+]i and pkc activation, in the induction of t cell proliferation.

we describe experiments designed to determine the molecular requirements for recognition by fluorescein-specific ctlps and ctls derived both from naive and from immunized mice. we

the production of prostaglandin e, a major immunosuppressor secreted by macrophages, was inhibited by the addition of 0.1 m indomethacin to cultures of monocytes harvested from patients suffering from pulmonary tuberculosis and from an equal number of normal controls.
the il-1 activity was estimated in the supernatants of these cultures by their ability to induce proliferation of mouse thymocytes. it was found that the supernatants from cultures with indomethacin showed a greater il-1 activity than the ones without it (44%, p 0.001). this indicates the possibility of pge exerting a negative feedback control over il-1 production. the defective cell mediated immunity in patients with pulmonary tuberculosis may be explained through the inhibition of il-1 production by pge, whose enhanced production was reported in our earlier studies. the results and our hypothesis on the autoregulation of il-1 production will be presented and discussed.

in variant viruses which differ from the parental virus (gv) at specific epitopes recognized by monoclonal antibodies directed against the env gene product, gp70. biological clones isolated from gv express the gv phenotype, suggesting that the loss of specific epitopes is the result of selective de novo processes in the immunocompetent host. additionally, inoculation of adult mice with a biological clone expressing the gv phenotype also results in similar variant viruses. however, inoculation of gv into neonatal or nonlethally irradiated mice results in a population of viruses expressing only the gv phenotype, suggesting that the emergence of antigenic variants may be influenced by neutralizing antibodies and/or cellular host res sds-page analysis of immunoprecipitates of sg~s~~,elled lysates of fibroblasts infected with clones expressing gv or variant phenotypes shows a size difference of the gp70 precursor. additionally, the recognition of a neutralizing epitope (e-55) associated with gp70 by mab55 is dependent on the appropriate native conformation of the epitope, which appears to require glycosylation for expression.
experiments are in progress to further examine the immunogenetic basis for the generation of these variants and to determine the molecular changes in the virus genome responsible for changes in epitope expression.

investigated the capacity of murine splenic t cells depleted of accessory cells (ac) to proliferate in response to stimulation by con a, αcd3 ab and activated t cells. the depletion procedures consisted of carbonyl iron treatment, 2x "panning" on anti-ig coated flasks, 2x anti-ia cytotoxic treatments and percoll gradient purification of small resting t ia- cells. the appropriate concentration of con a (10 ng/ml) and plastic-bound (pb) αcd3 ig or its f(ab)'2 fragments induce proliferation, il2 r expression and il2 (but not il4) secretion in t ia- cells cultured for 48h at 5x10^5 cells/well. responsiveness of t ia- cells to con a and αcd3 in low density cultures (5x10^4 cells/well) is restored by the addition of irradiated th2 cloned cells but not th1, splenic cells or ril1 + ril2 + ril4. likewise, responsiveness to non activating doses of con a (1 ng/ml) or soluble αcd3 is restored by the addition of irradiated th2 costimulatory cells. these experiments demonstrate that the ability of t cells to proliferate in the absence of ac is critically dependent on t-t interactions. t cell subsets prepared by either negative (l3t4- and ly2- cells) or positive selection proliferate in response to pb αcd3 ig. although the proliferative responses of both l3t4- and ly2- cells are maximal at 48h, the l3t4- cells require 10x more pb αcd3 ig for maximal stimulation and their responses decline much faster than those of ly2- cells. in addition, l3t4- cells are not stimulated by pb αcd3 f(ab)'2 fragments and their responses to αcd3 ig are inhibitable by anti fcr as well as anti-lfa abs. responses of both l3t4- and ly2- cells are inhibitable by αil2 and αil2 r abs but not by αl3t4, αly2 or αil4.
these experiments document interesting differences in the triggering requirements of l3t4- and ly2- cells. supported by nih grants po1 ca 19266, t32 gm-07183 and gm 07281.

the 19k glycoprotein encoded in the e3 region of ad2 and ad5 (gp19k) binds to class i mhc antigens in the endoplasmic reticulum and prevents their translocation to the cell surface. this has been proposed as a mechanism by which virus infected cells can avoid recognition by the host cytotoxic t lymphocyte (ctl) response. we have shown that gp19k can inhibit target cell lysis by adenovirus specific ctl, but the effectiveness of this inhibition varies greatly between different mouse strains. this is due in part to differences in the affinity of gp19k for different mhc class i molecules, but this cannot account for all the variation observed.

it has been shown that cd4+8+ immature thymocytes fail to secrete il-2 or express il-2 receptors in response to activation signals. furthermore, they cannot induce il-2 gene transcription. several tumor lines have now been characterized which have a cd4+8+ phenotype and fail to secrete il-2 or express il-2 receptors in response to stimulation with ionomycin plus pma. these cells also do not express il-2 mrna after stimulation, as determined by northern blotting and rnase protection. to determine the molecular mechanism for this lack of transcriptional activity, nuclear extracts were analysed for the presence of the dna binding factor nfat-1. this nuclear factor is present only in ac

we have undertaken an mhc analysis using the polymerase chain reaction (pcr) and dot blot analysis of the amplified dna of lyme arthritis patients with allele specific oligonucleotide (aso) probes.
genomic dna for the first domain of the dq beta chain and of the dr/pi from 20 patients with lyme arthritis has been amplified and we are analyzing the distribution of dridrr and w alleles in this population to test the hypothesis that the mhc class ii genes might be involved in presentation of selected spirochete epitopes whose recognition by t lymphocytes leads to lyme arthritis. most inbred strains of mice do not respond to porcine insulin (pins). experiments were conducted to elucidate the mechanism of the non-responsiveness in h-2k mice: 1) purification of cd-4+ t cells from pins-immune b10.mbr mice revealed pins-specific t helper (th) cells, 2) these pins-specific th cells could be activated by i-ak and i-ek expressing l929-fibroblasts. therefore, both i-ak and i-ek molecules can present pins in an immunogenic manner and activate pins-specific th cells. by means of different cell-fractionation procedures, it was found that antigen-specific t suppressor (ts) cells regulated the pins immune response. these ts cells were of the fcr-, cd-4-, cd-8+, thy-1+ phenotype, and they were present in normal mice. we believe that these experiments indicate that antigen-specific ts cells exist and are important regulators of immune and autoimmune responses. the possibility of functional inactivation of cd4+ clones by ts-cells was investigated. m. leprae-responsive cd4+ clones were preincubated with ts cd8+ clones, apc and antigen for 1 ; hours, after which the cd8+ cells were removed from culture. the cd4+ clones were then restimulated with m. leprae and apc. cd4+ clones incubated with cd8+ cells and antigen were unresponsive to restimulation by antigen, although they were not killed and could respond well to il-2. addition of il-2 in the pre- or post-incubation culture neither prevented the induction of unresponsiveness nor reversed it. earlier models of tolerance have suggested that receptor occupancy in the absence of second signals induces tolerance in b- and t-cells.
we would suggest that in the presence of ts-cells, a second signal may be negated leading to th-cell unresponsiveness. university of texas southwestern medical school, dallas, tx 75235 graft versus host disease (gvhd) poses a serious threat to the survival of patients with bone marrow transplants. the state of immunosuppression established in gvhd results in a variety of immunological abnormalities at the humoral and/or cellular level. we have developed a murine model of chronic gvhd across a minor histocompatibility (mh) barrier. in this model, immunosuppression develops. spleen cells from mice undergoing this type of gvhd are unable to respond to the polyclonal activators lipopolysaccharide and concanavalin a. however, the response against the b cell leukemia bcl1 remains intact. the protective immune response against bcl1 is directed towards the mh antigen h-40 and is mediated by cytotoxic t lymphocytes. thus, the specific t cell response against a mh antigen can occur in the presence of chronic gvhd despite the absence of a polyclonal b and t cell response. lymphocytic choriomeningitis virus (lcmv), a member of the arenavirus family, has a bisegmented rna genome which encodes at least three polypeptides. the smaller rna segment encodes two virus structural proteins, the glycoprotein (gp) and the nucleoprotein (np). upon infection of mice with lcmv a cytotoxic t cell immune response directed against these proteins is measurable in vitro and in vivo. it can be demonstrated that depending on the haplotype of the mice, one or the other protein may play a major part in the immune response. in order to define the immunogenic epitope(s) of the nucleoprotein which are recognized specifically by the t cell receptors of cytotoxic t cells, stepwise 3' truncated gene fragments encoding the nucleoprotein were cloned and expressed in vaccinia virus.
with these recombinant vaccinia viruses, protection experiments in mice against lcmv infection were performed in parallel with in vitro studies, namely specific recognition of target cells expressing truncated fragments of the nucleoprotein by lcmv primed spleen cells. cdna clones encoding the mouse and human t cell il-1 receptors have been isolated and expressed in mammalian cells. the recombinant receptor binds il-1 indistinguishably from the natural il-1 receptor, and is functional in signal transduction. deletion of the cytoplasmic portion of the receptor abolishes its signal transduction abilities. sequence and secondary structure analysis suggest that the cytoplasmic segment of the il-1 receptor binds a nucleotide. experiments designed to test this hypothesis and to examine the mode of signal transduction will be presented. also to be discussed are the mechanism of triggering of the receptor by il-1, and the nature of il-1 receptors expressed in other cell types such as b cells. it became evident that these subsets reflect different stages of helper t cell maturation before and after activation. therefore, these t cell subsets have been designated as naive t cells (cd45r/2h4+, cdw29[4b4]-) and memory t cells (cd45r/2h4-, cdw29[4b4]+). we analysed the expression of these antigens in dermal lymphohistiocytic infiltrates from different benign skin diseases and cutaneous t cell lymphomas (chronic contact dermatitis (n=14), parapsoriasis en plaques (n=5), lymphomatoid papulosis (n=4), mycosis fungoides (n=15), sezary's syndrome (n=2), pleomorphic t cell lymphoma (n=6) and high grade t cell lymphomas (n=4)). in almost all cutaneous t cell infiltrates memory t cells were preferentially found, whereas in the peripheral blood both subsets are equally distributed. this implies that t cells infiltrating the skin have already had contact with their respective antigen.
where the switch from naive to memory t cells takes place cannot be answered by our findings, as we have investigated rather longstanding skin diseases. however, these memory t cells, which can be activated more easily, make diseased skin more effective in the amplification of an immune response. organs and after several days of stimulation with antigen or mitogen and lymphokines. we find that fresh th synthesize and secrete il2, ifng, il3 and gmcsf but very little il4 or il5 within 24-48 hours. this pattern resembles the pattern of lymphokines secreted by th1 cell lines. the th responsible for this secretion are cd4 positive t cells which are long-lived since they disappear very slowly following adult thymectomy. they are also sensitive to the in vivo administration of ats (antithymocyte serum) and they express high levels of pgp-1. the kinetics of lymphokine secretion and the phenotype of the cells support the hypothesis that lymphokine secretion from fresh lymphoid cells comes from a population of memory cells. in contrast we find that we can also stimulate a separate, ats-resistant population to become lymphokine-secreting cells after four days of in vitro priming. these primed cultures rapidly synthesize and secrete large amounts of il4 and il5 in addition to ifng, il3 and gmcsf (a phenotype which could be a combination of both th1 and th2 helpers), when they are restimulated with ag or mitogen. the cells which are responsible are cd4 positive and have a shorter lifespan since they decline considerably after adult thymectomy. we suggest that the lymphokine secreting cells detected after priming come from a population(s) of helper t cell precursors which have differentiated to become effectors. this generation of effectors requires lymphokines, especially il-4 and/or il2, and apcs.
thus the development of helper th appears to follow a similar pathway as that of cells of the b cell lineage developing into ab-secreting cells and cd8 positive t cells which develop into cytotoxic effectors from precursors. we have studied the secretion of lymphokines by helper t cells freshly obtained from lymphoid interleukin-2 production by t cells has been shown to be required for both humoral as well as cell-mediated immune responses. thus, il-2 production was measured in syphilitic rabbits as a function of their immune response. maximal il-2 production induced by con a at 10-14 days post-infection was only 1/2 that observed for uninfected rabbits and this correlated well with a decrease in t cell proliferation (< 35% that of normal rabbits) upon stimulation with con a. this decrease in il-2 production in infected rabbits was restored upon removal of most of the adherent cells. furthermore, the il-2 production by 10-14 day infected spleens was restored above normal levels upon addition of indomethacin. this decrease in il-2 levels was not due to an increase in the ability of infected spleen cells to adsorb il-2. finally, studies assessing il-2 production at various times post-infection indicated that at 4 days post-infection il-2 levels were higher than normal; however, as early as 10-14 days after infection il-2 levels decreased below normal levels and continued to be depressed as late as 30 days post-infection. these results may explain why all organisms are not eliminated during primary infection with t. pallidum and why secondary and tertiary phases of the disease may develop. has been shown that especially the antigenic presentation of the fusion protein is important for eliciting a functional immune response. to study the immunologic properties of the f protein, we expressed the f gene in e. coli as a β-galactosidase-f fusion protein after insertion in a pex vector.
we constructed deletion mutants with fragments generated with restriction enzymes and with the polymerase chain reaction method. using a panel of monoclonal antibodies a rough epitope mapping has been performed. two areas were found on the protein: two monoclonal antibodies react with one area and four monoclonal antibodies react with the other. both areas were found in f1, the c-terminal part of the protein. the pepscan method was used to fine map the epitopes of the monoclonal antibodies reacting with the second area on the primary sequence. in at least one viral system, cd8+ effector cells can be induced in animals lacking cd4+ t cells. since cd8+ effector cells are important in immunity to malaria sporozoites, we wished to know if they, too, could be generated without help from cd4+ cells. we depleted balb/c mice of their cd4+ t cells by injection of an anti-cd4 monoclonal antibody, and then tried to immunize them with irradiated plasmodium yoelii sporozoites. when challenged with infectious sporozoites, these mice were not protected against malaria infection. although they did not make antibodies to sporozoites, passive transfer of hyperimmune serum into these animals still did not protect them against sporozoite infection. cd8+ t cells from these animals functioned normally in in vitro assays against tnp labelled targets. it appears that, unlike viral systems, the generation of cd8+ effectors in malaria requires cd4+ helper cells. thus both cd4 and cd8 epitopes should be included in any synthetic vaccine against malaria sporozoites. univ. pennsylvania, philadelphia, pa 19104 migrate from fetal liver or bone marrow, rearrange t cell receptor (tcr) genes, express tcr, undergo thymic selection and finally emerge as mature single positive t lymphocytes. most studies of thymic t cell development have been performed by using polyclonal populations of t lymphocytes, which has made the interpretation of the results complicated.
cells (c9 clone) from nude mice by culturing nylon wool non-adherent cd4-cd8- spleen and lymph node cells in the presence of wehi3 supernatant and con a supernatant. the clone was thy-1-cd3-cd4-cd8-il2r(il2 receptor)- and the cells have been maintained for more than 16 months without changing phenotype. when the c9 clone was stimulated with il4, il1/il2, il1/il6, gm-csf, the cells were induced to express thy-1, tcr and il2r proteins. however, culture of the cells with gm-csf/il2 did not induce the expression of these molecules. southern blotting of the dna isolated from the gm-csf/il2 culture suggested that they have undergone partial dβ1-jβ1 rearrangement. the cultured cells were then recloned twice by limiting dilution. the cloned cells were again shown to be induced to express the cd3 complex by stimulation with il4 enriched medium. therefore, we have established a system in which to induce differentiation of a cloned pre-t cell line into tcr+ cells in vitro. the human lymphocyte differentiation antigen cd8 is encoded by a single gene which gives rise to a 32 kda glycoprotein expressed on the cell surface as a dimer, and in higher molecular weight forms. we demonstrate that the mrna is alternatively spliced such that an exon encoding a transmembrane domain is deleted. that is secreted and exists primarily as a monomer. messenger rna corresponding to both forms is present in peripheral blood lymphocytes (pbl), con a activated pbl and three cd8+ t cell lines, with the membrane form being the major species. ratio of mrna for membrane cd8 (mcd8) and secreted cd8 (scd8) exist. in addition, the splicing pattern we observe differs from the pattern found for the mouse cd8 gene. this mrna is also alternatively spliced, but an exon encoding a cytoplasmic region is deleted, giving rise to a cell surface molecule which differs in its cytoplasmic tail from the protein encoded by the longer mrna. neither protein is secreted.
this is one of the first examples of a different splicing pattern between two homologous mouse and human genes giving rise to very different proteins. this represents one mechanism of generating diversity during speciation. cd4+ t cells in the rat can be divided into two non-overlapping subsets by their reactivity with the monoclonal antibody mrc ox-22, which binds some of the high molecular weight forms of the cd45 antigen. recent work, to be described, has shown that the two subsets represent different stages of t cell maturation, with distinct t cell functions. the lymphokine repertoire of the memory t cell pool will be discussed with reference to the antigenic environment from which the cells are obtained. pancreatic islet allografts, gill, ronald g. and lafferty, kevin j., barbara davis center for childhood diabetes, univ. colo. health sciences center, 4200 e. 9th ave, box b-140, denver, co. 80262 we studied the cellular interactions required for the rejection of cultured mhc class i-disparate islet allografts. this model was suitable for studying t-t collaboration in that islet allograft immunity is cd4 dependent but rejection of the cultured islet graft is mediated by the cd8 cell. recipient c57bl/6 (b6) mice were grafted with mhc class i-disparate b6.c-h-2bm1 (bm1) islets beneath the renal capsule. islet grafts were pretreated for 7 days in 95% oxygen culture to reduce immunogenicity. thirty days after grafting, recipient mice were immunized with 10^6 live spleen cells from the strains indicated below. rejection of the established graft was not triggered by challenge with donor-type bm1 spleen cells, indicating that the mhc class i stimulus was insufficient to initiate allograft immunity. further, immunization with a mixture of 10^6 bm1 and 10^6 mhc class ii-disparate b6.c-h-2bm12 (bm12) spleen cells failed to trigger host immunity. however, challenge with 10^6 (bm1 x bm12)f1 spleen cells triggered acute rejection of the established bm1 islet grafts.
the requirement for linked presentation/recognition of class i and class ii alloantigens to trigger allograft immunity indicates that the antigen-presenting cell (apc) plays an essential role for t-t collaboration in vivo. mature hla-dr complexes are purported to spontaneously internalize from and recycle to the plasma membrane of b but not t lymphocytes. using a neuraminidase protection assay, we have radiolabeled surface class ii antigens on intact cells and cultured these cells under conditions which permit or prevent endocytosis; subsequently, surface glycoproteins on viable cells were desialylated and class ii molecules were analyzed by immunoprecipitation and two-dimensional gel electrophoresis. a panel of human b lymphoblastoid cell lines and activated tonsillar b cell blasts failed to exhibit any internalization of class ii complexes; control transferrin receptor molecules were endocytosed as ascertained by insensitivity to neuraminidase digestion. class ii+ pha blasts and sezary cells of the t lineage were also deficient in detectable hla-dr internalization. results did not vary regardless of the time allowed for efficient endocytosis (125i or 35s-methionine), or the addition of anti-class ii monoclonal antibody during the chase period for endocytosis. therefore, within the limits of sensitivity of this assay, class ii complexes do not appear to be internalized, either spontaneously or when crosslinked by antibody. recycling represent a dynamic pathway for regulating surface expression of class ii antigens or a means of associating with and presenting foreign antigenic peptides. supported in part by usphs grants #5 t32 ca09058-14 and #5 r01 ai23081-03. in previous studies of antigen-specific t cell responses in two distinct models of autoimmune tubulointerstitial nephritis (tin), vicia villosa lectin binding (vv+) t cells have been shown to be necessary for effector t cell expression and mediation of tin.
in anti-tubular basement membrane disease, antigen-specific vv+ t cells direct the phenotypic selection of cd8+ nephritogenic t cells in susceptible mouse strains (j. immunol. 141, nov. 1, 1988). this function is mediated by an antigen-binding, i-js+ soluble protein factor. current studies investigate the role of the t cell glycoprotein which binds vv lectin in mediating vv+ t cell function. using the previously described effector t cell induction assay, we found that n-acetyl-d-galactosamine (galnac) (at 25 mm but not 2.5 mm) inhibits vv+ t cell function and cd8+ effector t cell selection. when cd8+ effector t cell differentiation occurs in the presence of soluble factors derived from antigen-primed vv+ t cells, galnac is not inhibitory. these studies suggested that soluble galnac may competitively bind to a soluble protein which stimulates vv+ t cells, in part by binding to the vv lectin receptor, to synthesize and/or secrete their biologically active soluble factor. as an additional test of this hypothesis, we prepared detergent solubilized membranes from vv+ t cells and purified vv lectin binding proteins by affinity chromatography. like galnac, these membrane derived lectin binding proteins also inhibit vv+ t cell function and cd8+ effector t cell selection. inhibition by soluble 'lectin receptors' is dose dependent and is demonstrable with lectin binding glycoproteins derived from 15-20 x 10^6 cells, in an assay utilizing 10 x 10^6 vv+ cells. we are now further characterizing the lectin receptor and its endogenous ligand. elementary bodies and outer membranes of chlamydia trachomatis produce a high-titered igg response in rabbits and mice as measured by elisa and microscopic immunofluorescence assays. western blot analysis of total elementary body protein identifies a 40kd major outer membrane protein (momp) as the predominant antigen.
to identify the chemical structure of the epitope, purified momp was subjected to chemical and enzymatic fragmentation and the resulting peptides were purified by hplc and assayed for immunoreactivity. an immunoreactive 6kd cyanogen bromide peptide was amino-terminal sequenced and a series of overlapping synthetic peptides were synthesized and assayed for immunoreactivity. sequential single amino acid deletions at both the nh2 and cooh termini allowed us to identify the precise epitope as a 12 amino acid peptide spanning residues 291-302 of momp. two amino acid substitutions at positions 293 (phe-gly) and 300 (pro-gly) completely eliminated antibody binding. the 12-amino acid synthetic peptide is a potent immunogen producing high-titered antibody responses that are specific for the momp molecule. analysis of an independently derived mutant harboring the same defect has shown that this trans-acting gene is not required for transport of class ii molecules. class i heavy chain is synthesized in this cell line and associates with b2m. transport of the class i appears to be blocked in the er or cis-golgi as the majority of the class i glycoproteins are not processed to the endo h resistant form. the ability of the cell to significantly increase expression of surface class i when the incubation temperature is lowered from 37°c to 22°c suggests that this gene may function to stabilize a particular conformation of the protein. consistent with this is the increased sensitivity of class i molecules in the mutant as compared to the parent to degradation when cell lysates are incubated at elevated temperatures. the inability to immunoprecipitate class i antigens in the mutant is possibly due to the action of endogenous proteases present in these lysates. two complementing approaches are being employed to isolate this gene and further analyze its role in class i biosynthesis.
the first involves inactivation of the trans-acting gene by insertion of a retroviral vector and subsequent pcr amplification of regions flanking the vector. in another approach a cdna library will be introduced into the mutant cell line and the cdna will be reisolated from cells reexpressing surface class i. houston, tx. we have induced a panel of highly immunogenic (imm+) variants of the murine fibrosarcoma mca-f using 1-methyl-3-nitro-1-nitrosoguanidine (mnng), 5-aza-2'-deoxycytidine (5-azacdr), and uv radiation. these tumors grew in immunosuppressed mice, but were completely rejected by normal syngeneic hosts. mice that had rejected large numbers of imm+ also developed a strong, tumor-specific immunity to the parental tumor. immunization with low numbers of imm+ engendered only variant-specific immunity. the frequency of imm+ variant generation was similar for the three different induction protocols (64% to 92%), suggesting that generation of imm+ was more closely related to the cell line used than to the inducing agent. however, the strength of the imm+ phenotype was related to the agent used, since mnng induced clones had the strongest immunogenicities and uv-b the weakest. the strong neoantigens expressed by mnng induced imm+ were variant-specific, while uv and 5-azacdr induced clones displayed significant cross-reactivities not attributable to the parental tumor antigen. increased or inappropriate expression of class-i mhc antigens did not correlate with the imm+ phenotype. we investigated the phenotypes of the spleen cells mediating tumor rejection using the local adoptive transfer assay (lata). variant-specific immunity to mnng, 5-azacdr and uv induced imm+ were all mediated by thy1.2+, l3t4+, lyt2.1- t cells. after immunization with high numbers of imm+ to engender both anti-imm+ and anti-parental immunity,
both cd4+ and cd8+ effectors rejected the imm+ in lata, while only the c w + t cells could transfer resistance to the parent. immunity to the parental tumor antigen engendered by the imm+ suggested associative recognition of the parental and neoantigens together on the cell surface. this hypothesis was supported by failure of imm+ to protect against an antigenically distinct tumor (mca-d) admixed with it, either at the time of immunization or at challenge. fusion of the imm+ variant with mca-d yielded a unique, hybrid parental tumor antigen that was associatively recognized with the original imm+ neoantigen, demonstrating the importance of antigen coexpression. grant rr-5511-23. we have recently demonstrated and reported that substitution of anionic side chain carboxylic groups with aminoethylamide groups on protein antigens exhibits a pattern of enhanced immunogenicity both in vivo and in vitro. this enhanced immunogenicity was also observed in low responder strains of mice and we investigated the mechanism by which it is achieved. we examined antigen processing and presentation of native (nbsa) and modified bsa (mbsa) to t helper cells isolated from c57/bl low responder mice. a greatly reduced amount of mbsa than nbsa was required to activate both nbsa and mbsa primed th. proliferation of nbsa and mbsa primed t cells increased in proportion to the amount of time of exposure of the antigen presenting cells (apc) to nbsa, peaking at 8h. conversely, apc required less than 30 min exposure to mbsa to achieve optimal activation, indicating rapid uptake of mbsa. paraformaldehyde fixed apc recognized mbsa without a lag phase processing, indicating that this event also occurred quite rapidly. apc processed nbsa was presented to primed t cells more effectively than the soluble antigen as shown by the increased rate of t cell proliferation. in contrast, mbsa was equally well presented to th cells by apc as in soluble unprocessed form.
our data demonstrate that the reduced response in low responders is greatly enhanced by a modified antigen which is rapidly taken up and processed by apc. b cells which bear surface immunoglobulin (sig) receptors specific for a particular antigen are able to present fragments of that antigen very efficiently to t cells. this is due, in part, to the high affinity of the receptor, which facilitates antigen binding at low concentrations. using tnp-abc and specific antigen, we have demonstrated that the tnp-abc process antigen very effectively. we have compared specific antigen with its polyclonal analog, anti-ig, and demonstrated differences in the kinetics of degradation of anti-ig and tnp-antigens by tnp-abc. both antigen and anti-ig bound by tnp-abc are degraded into small fragments which are released into the supernatant. however, the following differences have been found: 1) the rate of release of small fragments of tnp-antigen parallels the rate at which these cells become able to directly conjugate with t cells (a measure of antigen presentation), reaching a plateau between 4 and 6 hours. in contrast, the degradation of anti-ig and release of fragments continues for 12 hours. 2) analysis of initial kinetics demonstrated that release of fragments of tnp-antigen begins 15 minutes after binding; there is no significant release of anti-ig fragments until about 30 minutes. 3) in contrast to anti-ig, where there is significant accumulation of degradation intermediates within the cells, there is very little intracellular accumulation of intermediate-size fragments of tnp-antigen. thus, we propose that the processing of antigen bound via specific sig may involve a specific intracellular pathway and that intracellular routing may be determined either by the degree of cross-linking of sig induced by antigen vs anti-ig or the mode of interaction of the various ligands with sig.
alt*, departments of biochemistry (*) and medicine (+), college of physicians and surgeons of columbia university, new york, new york 10032. we have recently analysed the structure of the γ/δ t cell receptor (tcr) expressed by the normal human thymocyte clone cii. cii expresses a cγ2 constant region that is a polymorphic form lacking a copy of an internal exon; the sequence of this constant region accounts for the size of the γ chain and noncovalent linkage of γ and δ chains in the cii tcr. in order to elucidate its role, this γ/δ tcr will be reconstituted in immortalized t-cell lines. in addition, the productively rearranged human γ/δ receptor will be transgenically introduced into mice in order to assess the effect of the complete receptor on the development of t cells. the humoral immune response to human immunodeficiency virus has been shown to contain antibodies which act to mediate the uptake of virus through fc receptor mediated mechanisms. it is therefore possible that vaccination with the entire envelope polypeptide may present immunologic determinants that enhance infection. one means by which to generate an immune response to hiv that possesses neutralizing activity in the absence of infection enhancing activity is to generate anti-idiotypic abs that bear the internal image of neutralizing human antibodies directed against hiv. we affinity purified human antibodies from hiv+ patients on a viral lysate column. we have produced 15 monoclonal anti-idiotypic antibodies directed against these ab1's. two of these monoclonals were shown to be ag inhibitable by their ability to inhibit the binding of polyclonal human antisera to hiv viral lysate on ortho hiv ab test wells. one monoclonal, 0 b 0, when coupled to klh and used to immunize mice, produced an ab3 that bound to viral lysate in an elisa assay. an affinity column containing rbs was used to purify an ab1 that was shown to bind to p24 and p17 by western blot analysis.
these data suggest that 8br may be a potential vaccine candidate. we have recently described a transgenic mouse model which co-expresses the tcr α and β chains from the 2c cell line (recognized by the 1b2 anti-clonotype). t cells bearing the transgenic clonotype are positively selected by elements of the h-2b mhc for expression on cd8+ cells. thus in the periphery of h-2b animals 40-80% of the t cells are 1b2+/cd8+. the same peripheral expression is observed when the transgenes are expressed in f1 animals bearing a "neutral" mhc haplotype (eg. h-2b'*). however, when the transgene is expressed in f1 animals which also express the h-2ld gene product, negative selection occurs by clonal deletion. however, this deletion is functional rather than structural as the 1b2 clonotype is present on 10-40% of peripheral t cells. these cells are unusual in that they express neither of the characteristic peripheral molecules cd4 or cd8. the absence of cd8 expression on the 1b2+ cells appears to allow these potentially self-reactive clones to exist without evidence of autoimmunity. the original clone as well as 1b2+/cd8+ cells from h-2b animals are strongly inhibited by anti-cd8 reagents. in an effort to understand the process of negative selection and self-tolerance we have examined the capacity of these cells to be activated directly by the anti-clonotype rather than antigen (h-2ld). the results demonstrate that the clonotype is fully functional on these double negative cells, indicating a normal maturation in the thymus. further examination of their surface phenotype also supports the conclusion that these are fully mature cells which are phenotypically distinct from double negative cells which exist in the thymus of h-2b animals. of immunohematology, azl, leiden, the netherlands; 2praxis biologics, rochester, new york, usa and 'university of southampton, uk.
immunity to disease caused by neisseria meningitidis is associated with the presence of bactericidal and opsonic antibodies to the capsular polysaccharide (cps), lipopolysaccharide and to outer membrane proteins (omps). the cps of group a and c meningococci are proven efficacious vaccines, although the immunogenicity in infants is poor and the immunity is of short duration. the combination with t-helper epitopes will certainly improve the immunogenic properties of these t-independent (ti,) antigens. the group b cps is poorly immunogenic in humans probably because of tolerance due to structural similarity to host glycopeptides and/or glycolipids. we have focused our research onto the class 1 omps which show limited heterogeneity amongst meningococci. murine monoclonal antibodies to these proteins are highly bactericidal in vitro and will be used to map b-cell epitopes. t-epitopes have been identified by theoretical prediction of immunodominant sites by analysis of the amino acid sequence of the omp followed by their solid phase synthesis and subsequent testing for polyclonal activation of t-lymphocytes obtained from hla-typed volunteers immunized with the omp. in addition human t-cell clones are generated with omps and maintained with antigen, ebv-transformed b-cells, fresh feeders and ril2. the clones are tested for antigen specificity, in vitro helper function, mhc restriction element, expression of surface markers and recognition of common meningococcal t-cell epitopes. c 642 demonstration of p-azobenzene-arsonate-l-tyrosine (aba-tyr) specific t cells in low responder h-2b mice by il-1 supported t cell proliferation. previous studies have shown that h-2b mice immunized with aba-tyr fail to produce aba specific delayed-type hypersensitivity and show little or no t cell proliferation in vitro to aba-tyr. these observations suggest that h-2b mice are deficient in th1 cells that respond to aba-tyr.
by contrast, immunization of h-2b mice with tnp conjugates of aba-tyr revealed good cognate help, suggesting that these mice do possess aba-tyr specific th2 cells and that such cells are not revealed in conventional lymphoproliferative assays. because such assays are widely used to evaluate ir gene control and to map t cell epitopes, the databases generated from such studies may seriously under represent the total number of responder phenotypes and t cell epitopes. because of this concern, we established culture conditions that will support aba-tyr specific t cell proliferation in h-2b mice. in these studies, c57bl/6j mice were immunized s.c. with aba-tyr and 7 to 14 days later the draining lymph nodes were cultured with varying doses of aba-tyr or with varying doses of aba-tyr and varying doses of recombinant il-1 alpha (ril-1α), a known costimulator of th2 cells. culture with aba-tyr alone produced no proliferation. by contrast, culture with aba-tyr and ril-1α revealed t cell proliferation that titrated with the dose of aba-tyr and the dose of ril-1α. specific for conalbumin presented on syngeneic antigen presenting cells and dependent on il-1 for its proliferation, was used as an indicator cell for the ability of neonatal murine spleen cells to present antigen and produce il-1 and il-2. the antigen presenting capacity of neonatal spleen cells is low. during antigen presentation there is an augmentation of il-1 and il-2 production by the antigen presenting spleen cell population. however, neonatal spleen cells do not respond as well as adult cells. the low levels of il-1 can not be attributed to a low potential for producing il-1 since neonatal cells produce high levels of il-1 after induction by a crude il-1 inducer factor (il-1-if). this impairment leads to a decreased stimulus of the t-helper cell to produce inducer factors which leads to low levels of il-1 and il-2 production by the neonatal cells during antigen presentation. 
no suppressor mechanisms responsible for the low interleukin production were detected. human or murine class i genomic dna was transfected into a b-lymphoblastoid x t-lymphoblastoid hybrid cell line. this fusion hybrid has lost both t cell derived copies of chromosome six and contains deletions spanning the class ii region on both copies of chromosome six derived from the b-cell parent. previous data have described a transacting factor within this region that is responsible for class i antigen expression. hla-bw58 and b7 glycoproteins, although synthesized, were not transported to the plasma membrane in the hybrid. were surface expressed. these data suggest a fundamental difference between human and mouse histocompatibility antigens in their requirements for intracellular transport. the role of glycans in this transport dichotomy is currently under investigation. in addition, hybrid human-murine genes are being used to identify regions of the class i molecules involved in this transport phenomenon. we probed the means by which the antigen presenting cell (apc) handles the antigen to produce peptides that bind to mhc-molecules. we propose the existence of a new type of internal image in which immunoglobulin v-region peptides, formed by processing, imitate peptides from conventional antigens. we refer to such denatured internal images as residue internal images, since they are associated with the residue of peptides remaining after processing. in some cases, residue internal images may be actual sequence images, i.e., the v-region sequence may be identical to the conventional antigen sequence. 
class i h-2ka-restricted cytolytic t lymphocytes (ctl) are directed against two immunodominant sites on the a/jap/305/57 influenza hemagglutinin (ha) that can be mimicked by synthetic oligopeptides spanning residues 202-221 in the ha1 and 523-545 in the hydrophobic, transmembrane region. analysis of the fine specificity of ha1-specific ctl clones demonstrated that these ctl clones can be subdivided into at least two groups based on their patterns of recognition of closely related influenza h2n2 field strains and a monoclonal antibody derived variant of a/guiyang/4/57. using a series of nested synthetic peptides spanning the 202-221 region, the minimal amino acid residues necessary for recognition by the two groups of ctl clones were defined and found to consist of two separate but overlapping sites. sequence comparison of the ha of the a/jap/305/57, the influenza field isolates and the monoclonal antibody derived variant has identified two amino acids, asn at position 207 and gly at position 215, that are critical for t cell recognition. thus, amino acid substitutions induced either by antigenic drift or by monoclonal antibody selection can affect class i ctl recognition. pretreatment with a n t m e s reactive with class ii mhc anti-has previously been reported to be successful in exov~~kj antigen -w i r q dendritic cells (m) from rodent tissue grafts. we have extended these exper-to inta? whole organ grafts. ilia pnmxses also demonstrated a prolonged survival (17 ± 2 days) compared to controls (11 ± 1 days) (p < 0.01) when v l a n t e d into streptozotocin-treated da recipients. antigenic variation in the haemagglutinin (ha) of influenza a viruses frequently introduces new oligosaccharide attachment sites (asn-x-ser/thr) and carbohydrate addition prevents antibody recognition by steric hindrance. 
acid substitution in mutant viruses of the h3n2 subtype (ha1 63 asp+asn), that introduces an n-glycosylation site (hn6gcys6&thr65), abrogates antibody and cd4+ t recognition. infected with x31 virus recognise a synthetic peptide corresponding to antigenic site e, ha1 56-76, and are sensitive to a single substitution (ha1 63 asp-basn) in mutant viruses. virus infected target cells, thereby confirming that carbohydrate addition prevents cd4' t cell recognition. here ve show that an amino cell i-ad restricted, ha specific t cell clones f r m balblc mice-reviously recognition of mutant viruses is restored however by tunicamycin-treatment of key: cord-003898-y6zpvw84 authors: tan, kai sen; andiappan, anand kumar; lee, bernett; yan, yan; liu, jing; tang, see aik; lum, josephine; he, ting ting; ong, yew kwang; thong, mark; lim, hui fang; choi, hyung won; rotzschke, olaf; chow, vincent t; wang, de yun title: rna sequencing of h3n2 influenza virus-infected human nasal epithelial cells from multiple subjects reveals molecular pathways associated with tissue injury and complications date: 2019-08-27 journal: cells doi: 10.3390/cells8090986 sha: doc_id: 3898 cord_uid: y6zpvw84 the human nasal epithelium is the primary site of exposure to influenza virus, the initiator of host responses to influenza and the resultant pathologies. influenza virus may cause serious respiratory infection resulting in major complications, as well as severe impairment of the airways. here, we elucidated the global transcriptomic changes during h3n2 infection of human nasal epithelial cells from multiple individuals. using rna sequencing, we characterized the differentially-expressed genes and pathways associated with changes occurring at the nasal epithelium following infection. we used in vitro differentiated human nasal epithelial cell culture model derived from seven different donors who had no concurrent history of viral infections. 
statistical analysis highlighted strong transcriptomic signatures significantly associated with 24 and 48 h after infection, but not at the earlier 8-h time point. in particular, we found that the influenza infection induced in the nasal epithelium early and altered responses in interferon gamma signaling, b-cell signaling, apoptosis, necrosis, smooth muscle proliferation, and metabolic alterations. these molecular events initiated at the infected nasal epithelium may potentially adversely impact the airway, and thus the genes we identified could serve as potential diagnostic biomarkers or therapeutic targets for influenza infection and associated disease management. the global burden of inter-pandemic influenza is high. it is estimated to affect 1 billion people annually, with 3-5 million severe cases requiring hospitalization or intensive care treatment, resulting in approximately 0.5 million deaths [1] . worryingly, drug-resistant influenza strains are emerging at a rapid rate that will severely hamper the ability of our healthcare systems to contain influenza outbreaks [2] . therefore, alternative strategies are needed against severe influenza infections during both seasonal and pandemic influenza outbreaks. the normal human airway epithelium is a pseudo-stratified layer of ciliated and non-ciliated columnar cells, goblet cells, club cells, and basal cells [3] . the airway epithelium protects against airway infection via efficient mucociliary clearance (mcc), the production of inflammatory mediators and chemokines against viruses, and the recruitment of immune cells [4] . when the influenza virus breaches the defense of the human airway epithelium, it causes a myriad of innate responses by the infected host in response to viral invasion [5, 6] . among these changes are critical factors that can determine disease severity, and which may lead to the development of diagnostic, prognostic prediction markers, or anti-influenza therapies [7] [8] [9] [10] . 
however, few studies have hitherto been performed in relevant models [11] and human models of influenza are not feasible due to potential severity of the infection. therefore, the mechanistic study of viral-induced airway changes using relevant models can lead to better understanding of the development of severe complications. additionally, we need greater clarity on the different immune responses in view of the rising prevalence of chronic diseases such as diabetes mellitus and asthma. patients with these disorders are especially susceptible to severe influenza complications compared to healthy subjects [12] . thus, the establishment of a baseline response against influenza infection of "healthy" tissue is beneficial to facilitate future comparative studies to better manage influenza in patients with co-morbidities. although the study of host responses in influenza infection is not new, current in vitro cell lines cannot accurately represent human airway infection due to the lack of key mucociliary features [13] . hence, we have previously developed an air-liquid interface (ali) human airway epithelial cell culture that is able to sustain influenza infection [5, 6] . we have also further compared the transcriptomic responses of our infected human nasal epithelial cells (hnecs) with 15 other in vitro and in vivo influenza infection transcriptomic studies [6] . the comparison revealed that at their peak responses against influenza, the differential transcriptome signature in hnecs was highly similar to the signatures from other influenza infection models [6] . interestingly, compared to the homogenous cell lines tested, our heterogenous hnec model exhibited a more comparable response to the clinical influenza studies, indicating that most responses were initiated at the nasal epithelium [6] . 
therefore, in this study, we aim to further utilize the hnec model as a physiologically relevant in vitro model to clarify the nasal epithelial responses against influenza h3n2 infection, which would then facilitate the identification of the key host factors that are significant for future studies. to establish host factors that are significantly altered in the nasal epithelium as a reference of early innate responses against influenza, the dynamic expression of the genes needs to be clearly elucidated. while there are many studies that utilize microarray analysis to identify the host responses against influenza, the limitation of the microarray is its inability to determine the full extent of gene changes due to its hybridization-based protocol [14] . the aim of this study was to utilize rna sequencing (rnaseq) technology to not only reveal the hnec responses (from multiple individuals) against influenza infection, but also to identify those genes with high magnitude changes to serve as potential reference markers of the innate responses of influenza infection. given that rnaseq functions by reading virtually all the rnas present in the samples tested, we can also discern the magnitude of each rna change and mark them as the canonical responses. in addition, as rnaseq is not constrained by probe usage as in microarrays, they are therefore more reliable in detecting novel interactions during influenza infections of hnecs. hence, rnaseq analysis will further augment the transcriptomic data established previously by microarray analysis. the augmented baseline can then be applied to future clinical studies and practice against influenza infection, especially for comparison against patients with other underlying co-morbidities that may be affected by more severe disease. 
approval to conduct this study was obtained from the national healthcare group domain-specific board of singapore (dsrb ref: d/11/228) and the institutional review board of the national university of singapore (irb ref: . written consent was obtained from donors prior to the collection of the tissue biopsies. at the time of collection, all subjects were free of symptoms of urti. the medical backgrounds of the subjects are summarized in table s1. the hnespcs were isolated and enriched from the tissue biopsies according to a previously standardized protocol [5, 15], which normalized the hnespcs to a baseline state that differentiates into hnecs resembling healthy tissues if they pass the quality control checks for their differentiation [5]. following enrichment, the hnespcs were expanded further and subjected to ali culture in transwells for in vitro differentiation, also according to the previous protocol [5, 15]. briefly, primary cells were subjected to isolation for selection of hnespcs, which were enriched and expanded with dulbecco's modified eagle medium: nutrient mixture f-12 (dmem/f12) (gibco-invitrogen, carlsbad, ca, usa) containing 10 ng/ml of human epithelial growth factor (egf, gibco-invitrogen, carlsbad, ca, usa), 5 µg/ml of insulin (sigma, st. louis, mo, usa), 0.1 nm of cholera toxin (sigma, st. louis, mo, usa), 0.5 µg/ml of hydrocortisone (sigma, st. louis, mo, usa), 2 nm/ml of 3,3′,5-triiodo-l-thyronine (t3) (sigma, st. louis, mo, usa), 10 µl/ml of an n-2 supplement (gibco-invitrogen) and 100 iu/ml of antibiotic-antimycotic (gibco-invitrogen, carlsbad, ca, usa). the expanded hnespcs were then transferred onto 12-well 0.4 µm transwell inserts (corning, corning, ny, usa). once confluent, growth medium was discarded and 700 µl of pneumacult™-ali medium with inducer supplements (stemcell technologies inc., vancouver, canada) was added to the basal chamber to establish ali conditions. 
the cells were cultured in ali culture for 4 weeks, with media change every 2-3 days. after 3-4 weeks of differentiation, hnecs from a total of seven donors were then subjected to influenza h3n2 virus infection. the influenza a strain used in this study is of the h3n2 subtype (a/aichi/2/1968) (atcc, manassas, va, usa). the virus was propagated using embryonated egg culture and used for all the infections in the hnecs. prior to infection, fully differentiated hnecs were washed with 1× dpbs and infected with the h3n2 influenza virus at a multiplicity of infection (moi) of 0.1 and incubated for 1 h at 35 °c. after the 1 h incubation, the viral inoculum was removed and the hnecs were returned to incubation at 35 °c. the control hnecs were harvested for apical wash and rna prior to the infection at 0 h post-infection (hpi). the infected hnecs were then harvested for the apical wash and rna following 8, 24, and 48 hpi incubation at 35 °c. at each infection time point, 150 µl of 1× dpbs was added and incubated in the apical chamber for 10 min at 35 °c to recover progeny viruses as the apical wash. the plaque assay for viral quantification was performed using overnight mdck cultures (atcc, manassas, va, usa) at 85-95% confluence in 24-well plates. the mdck cells were incubated with 100 µl of serial dilutions (from 10^−1 to 10^−6) of virus from apical washes at 35 °c for 1 h, where plates were rocked every 15 min to ensure equal viral distribution. after incubation, the inocula were removed and replaced with 1 ml of avicel (fmc biopolymer, philadelphia, pa, usa) overlay, and incubated at 35 °c for 65-72 h. after incubation, the avicel overlay was removed, and cells were fixed with 4% formaldehyde in 1× pbs for 1 h. formaldehyde was then removed, and cells were washed with 1× pbs prior to staining with 1% crystal violet for 15 min before washing the stain away. 
the plaque-forming units (pfu) were calculated as follows: number of plaques × dilution factor = number of pfu per 100 µl. at each time point after the collection of apical wash, the hnecs were lysed using rna lysis buffer. total rna was then extracted from the lysate using mirvana mirna isolation kit (life technologies, grand island, ny, usa). the extracted total rna was first subjected to nanodrop analysis to ensure the rna quality, before being submitted for rnaseq analysis. then, 500 ng from the remaining rna was subjected to cdna synthesis using maxima first-strand cdna synthesis kit (thermoscientific, pittsburgh pa, usa). after this, qpcr analysis was performed to evaluate the transcriptional levels of host response genes selected based on previous microarray analysis using pre-designed primers (sigma aldrich). each qpcr reaction was performed in duplicate using gotaq-qpcr master mix kit (promega, san luis obispo, ca, usa), and relative gene expression was calculated using the comparative method of 2^−∆∆ct normalized to the housekeeping gene pgk1. relative gene expression levels were presented as median values and interquartile ranges, while statistical significance was determined using the wilcoxon signed-rank test. all human rnas were analyzed on the agilent bioanalyzer (agilent, santa clara, ca, usa) or the perkin elmer labchip gx system (perkin elmer, waltham, ma, usa) for quality assessment, with rna integrity number (rin) or rna quality score ranging from 6.8 to 9.7 and a median of 9.0. cdna libraries were prepared using 2 ng of total rna and 1 µl of a 1:50,000 dilution of ercc rna spike in controls (ambion ® thermo fisher scientific, waltham, ma, usa) using smartseq v2 protocol [16], except for the following modifications: fastq files were mapped to the human genome build hg38 using star. gene counts were computed using featurecounts (part of the subread package) using annotations from gencode version 26. 
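the two calculations described above (plaque count to pfu, and the comparative 2^−∆∆ct method) can be sketched as follows. this is a minimal illustration rather than the authors' analysis script: the function names, the optional rescaling to pfu per ml, and the example ct values are all assumptions introduced here.

```python
def pfu_per_inoculum(plaques: int, dilution: float) -> float:
    """Number of plaques x dilution factor = PFU per inoculum volume.

    `dilution` is the fraction plated (e.g. 1e-3 for a 10^-3 dilution),
    so the dilution factor is its reciprocal.
    """
    return plaques * (1.0 / dilution)


def pfu_per_ml(plaques: int, dilution: float, inoculum_ul: float = 100.0) -> float:
    """Rescale to PFU per ml (an assumed convenience step, not in the text)."""
    return pfu_per_inoculum(plaques, dilution) * (1000.0 / inoculum_ul)


def relative_expression(ct_target_infected: float, ct_ref_infected: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Comparative 2^-ddCt method, normalized to a reference gene.

    In the study the reference gene is PGK1; the Ct values passed in
    here are hypothetical.
    """
    delta_infected = ct_target_infected - ct_ref_infected
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_infected - delta_control)


if __name__ == "__main__":
    # Hypothetical example: 25 plaques counted at the 10^-3 dilution.
    print(pfu_per_inoculum(25, 1e-3))
    # Hypothetical Ct values: the target gene amplifies 3 cycles earlier after infection.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

a three-cycle drop in ∆ct corresponds to an eight-fold increase in relative expression, which is why small ct differences translate into large fold changes.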
differential gene expression analysis was performed using edger in a paired fashion under r version 3.3.3. multiple testing correction was done using the method of benjamini and hochberg and p-values (false discovery rate; fdr) less than 0.05 were deemed to be significant. geneset enrichment analysis using data from gene ontology (go) was performed using the bioconductor package topgo, while the analysis using reactome pathway was performed using the bioconductor package reactomepa. both analyses were run in r version 3.3.3 using multiple testing-corrected significant differentially-expressed genes. tnf-α, tnf-β, vegf, eotaxin/ccl11, and pdgf-aa. samples and standards were incubated with fluorescent-coded magnetic beads which had been pre-coated with the respective capture antibodies. after an overnight incubation at 4 °c, plates were washed twice. biotinylated detection antibodies were incubated with the complex for 1 h, and streptavidin-pe was then added and incubated for another 30 min. plates were washed twice again, then beads were re-suspended with sheath fluid before acquiring on the flexmap ® 3d (luminex) using xponent ® 4.0 (luminex) acquisition software. data analysis was done on bio-plex manager™ 6.1.1 (bio-rad). standard curves were generated with a 5-pl (5-parameter logistic) algorithm, reporting values for both mfi and concentration data. results were then expressed as mean fold change compared with uninfected control, and p-values (fdr) of less than 0.05 were considered significant. prior to analysis, the responses of all seven hnecs donors following influenza infection were plotted on a principal component analysis (pca) plot. the pca plot indicated a degree of variability in the responses between donors and time points (figure 1). nonetheless, the responses were clustered tightly enough following infection to signify their consistency of infection for further transcriptomic analysis-similar to those observed in our previous study [6]. 
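the benjamini and hochberg correction named in the methods above can be sketched in a few lines. this is a generic illustration of the procedure, not the edger code path used in the study; the function name and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order.

    Each raw p-value is multiplied by m / rank (rank 1 = smallest p), and a
    running minimum taken from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    fdr = benjamini_hochberg(raw)
    # Genes would be called significant where the adjusted value is < 0.05.
    print([round(q, 4) for q in fdr])
```

the adjusted values, rather than the raw p-values, are what the 0.05 significance cutoff is applied to.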
significant gene expression changes (fdr < 0.05) of infected hnecs were detected as early as 8 hpi, and further increased at 24 and 48 hpi (table 1; figure 2a). also, the number of genes decreased in a linear fashion as the fold change in expression increased, as seen in the 10x fold change genes indicated in figure 2b, where about 10% of the significantly altered genes remained. at 8 hpi, there were 31 upregulated genes and 13 downregulated genes. 
the major upregulated genes were the antiviral sensors and early response genes such as ifns, ifits, and ifis. interestingly enough, interferon lambda (ifnλ) gene ifnls was the earliest response interferon of infected hnecs, as opposed to interferons alpha or beta, at 8 hpi. at later time points, the number of gene expression changes increased substantially, with upregulation of 704 and 1080 genes, and downregulation of 217 and 758 genes at 24 and 48 hpi, respectively. there was augmented expression of antiviral effectors and inflammatory genes at both time points. ifnl remained the interferon gene with highest expression at both time points, while a marked elevation of cytokines such as cxcl10 and cxcl11 was also observed. considering downregulated genes, proliferative and transcriptomic functions appeared to be suppressed, with diminished expression of genes such as fmo2, klk12, and fosb. genes associated with metabolism, cell cycle, and dna repair were further suppressed following infection at 24 and 48 hpi. tables s2-s4 list the complete set of significant gene expression changes, arranged according to their fold change (log2 fc). in addition, we have also verified that the genes showing major expression changes by rnaseq generally concurred with rt-qpcr analyses. of the 10 genes tested by qpcr at 48 hpi, all of them showed the same directional changes in expression as observed by rnaseq. hence, seven of these genes showed a p-value of <0.05 (il4i1, ifnl1(il29), cxcl10, tnfsf10, ifi6, ccl24, and cyp26a1), one gene had a p-value of <0.1 (ctgf), while only two genes were not statistically significant (tgfa and ano5) (figure s1). we then further compared the transcriptomic alterations in the hnecs over time, following influenza h3n2 infection. the number of gene expression changes mirrored the viral titer changes, which peaked at 48 hpi, and were consistent between donors (figure 3a). approximately two thirds of genes at 8 and 24 hpi overlapped with other time points, while about one third of genes at 48 hpi overlapped (figure 3b). 
the overlapping genes displayed similar directional consistency at the significant time points. in addition, congruent with the consistent viral titer with most gene expression changes at 48 hpi, we also noted the most consistent alterations in expression of genes across donors. this is highlighted in figure 3c, which portrays the heatmaps of the top 10 genes with the smallest p-value, together with their direction and magnitude of change. based on these analyses, we proposed that 48 hpi represents the optimal time point for the subsequent pathway analysis to ascertain influenza-specific pathway changes. we then further subjected the significant gene changes to gene set enrichment using both go and reactome databases. at time points 8, 24, and 48 hpi, there were 3, 41, and 30 significant (adjusted p-value < 0.05) go biological processes (table s5) and 3, 93, and 74 significant (adjusted p-value < 0.05) reactome pathways (table s6), respectively. 
at the early time of 8 hpi, interferon-mediated antiviral responses were elevated as expected. at 48 hpi, the pathways appeared to be more stabilized and consistent for both go and reactome analyses, despite displaying more gene expression changes at this time point. responses to influenza virus skewing towards type i immunity were predominant in the go analysis. the expected interferon-mediated functions by the epithelium validated the authenticity of our model, where we found enriched type i interferon (go) and rig-i (reactome) pathways with upregulation of nearly all significant gene members (data not shown). besides the interferon and antiviral pathways, we identified several functions of interest initiated by the nasal epithelium that may contribute to the pathology and pathogenesis of influenza. at 48 hpi, go pathway enrichment analysis revealed that the nasal epithelium was actively involved in initial ifnγ signaling (go:0060333), despite not directly producing ifnγ. we also observed enriched function in apoptosis and necroptosis (go:0008637 and go:0060544), immune evasion (go:0045824), and other pathways that may lead to complication events such as smooth muscle proliferation (go:0048661) and response to fatty acid (go:0070542) (table 2). for the reactome pathway analysis, we selected pathways that were enriched with more than 30 significant genes present in the enriched pathway, and these were generally in agreement with the go analysis (table 3). 
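as a rough illustration of what an enrichment p-value of this kind measures, the sketch below computes a one-sided hypergeometric over-representation test for a single gene set. it is a generic textbook calculation and is not claimed to reproduce the algorithms inside topgo or reactomepa; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, annotated: int,
                       selected: int, overlap: int) -> float:
    """P(X >= overlap) under the hypergeometric distribution.

    universe  : number of genes tested
    annotated : genes in the universe that belong to the pathway or term
    selected  : significant differentially expressed genes
    overlap   : significant genes that belong to the pathway or term
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, selected)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes tested, 150 in the pathway,
    # 1,800 significant genes, 40 of which fall in the pathway.
    print(enrichment_p_value(20000, 150, 1800, 40))
```

the resulting p-values would still need multiple testing correction across all gene sets before applying the adjusted p-value < 0.05 cutoff used in the text.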
in addition to ifnγ signaling (877300) and apoptosis (109581), it also revealed changes in epithelial-initiated b cell receptor signaling (983705 and 1168372) and amino acid metabolism (71291) following influenza infection. it is noteworthy that these pathways were initiated at the epithelial level without the participation of immune cells, thus highlighting the relevant genes of interest for future studies. given that rnaseq analysis facilitates more accurate expression changes following infection compared to hybridization technology such as microarray, we conducted further analysis on the levels of gene expression changes to enable more stringent and accurate transcriptomic analyses for future studies. by comparing these results to a previous study that identified 11 influenza-specific signatures, we verified that these 11 genes were all expressed in infected nasal epithelium later at 24 hpi, but not at 8 hpi. furthermore, at both 24 and 48 hpi, all but one of the 11 gene signatures exhibited elevated expression of >2.5-fold change (>1.33 log 2 fc) compared to uninfected control hnecs (table 4 ). when we applied the higher fold change cutoff, the number of significant genes decreased by approximately 50% (figure 4a) , which was also congruent with the linear association observed earlier. therefore, future studies on early transcriptional alterations could consider adopting the 2.5-fold change in expression as a more stringent threshold, which may be more feasible, especially for large transcriptomic studies that yield large numbers of data points. in addition, when compared to the previous microarray study on a similar hnec model [6] , both rnaseq and microarray shared a high degree of overlap, with about one third and half of total genes from rnaseq and microarray overlapping, respectively ( figure 4b ). 
the overlap was generally observed in genes with highly altered expression, such as cxcl10, cxcl11, and rsad2, which were changed to a similar magnitude in both rnaseq and microarray (table s7). when we compared the 11 influenza signature genes, rnaseq revealed a more consistent increase in magnitude, i.e., at 48 hpi, the magnitude of the gene change was generally higher than that of the microarray (table 5). in addition, rnaseq was also able to detect novel genes with expression changes of high magnitude that were generally higher than those found by microarray only (193 genes versus 22 genes with elevated expression greater than 2.5-fold). genes such as heatr9, pdcd1, il4i1, art3, and kcnh7 were altered to a higher magnitude than the 2.5-fold threshold. hence, rnaseq-based transcriptomic analysis may augment transcriptomic findings to identify novel gene responses against influenza in the future.
after deriving the transcriptomes by rnaseq, we then further investigated whether the changes in expression of genes resulted in alterations in secretory cytokines and chemokines early in the infection of hnecs. initially, we detected significant reductions in multiple cytokines at 8 hpi, with the exception of il-15 which was increased ( figure s2 ). this may reflect the initial immune suppression during influenza infection. however, at 24 and 48 hpi, less significant changes were observed, i.e., only increase in tnf-a and decrease in mdc and pdgf-aa were noted at 24 hpi. this was followed by increase in ip-10 (cxcl10) and tgf-a and decrease in pdgf-aa seen at 48 hpi. this analysis highlights changes in ip-10, tgf-a, and pdgf-aa to be significant early responses in secretory cytokines/chemokines following influenza infection. our study has identified epithelium-initiated host responses which are found to be involved in both innate and adaptive responses. the finding is significant as we can now focus on the primary point of contact of influenza-the nasal epithelium in the study of early host responses for identifying host factors that can be utilized for diagnostic and therapeutic purposes [7] . in addition, our study also showed that it is important for reference databases to use relevant human models like the hnecs model, which contains the mucociliary component of the airways, in order to provide closely representative host responses. while there exists a high number of microarray studies that showed the host responses using similar hnec models, there are only a small number of equivalent rnaseq studies. compared to microarrays, rnaseq analysis can provide a more comprehensive picture of the transcriptomic landscape, and is not limited by the hybrid library variant and concentrations [14] . hence, in order to derive accurate magnitude of gene expression changes, we performed an rnaseq analysis of h3n2 infection using the hnecs model. 
h3n2 influenza virus was selected, given that it is a major circulating subtype over long periods of time. in addition, relatively lower efficacy of vaccines against this subtype prompted us to study its interactions with the primary host target to elucidate the immune responses and association with adaptive immunity [17, 18] . this model has been previously evaluated to be a highly clinically-relevant model that can facilitate controlled infection of nasal cells from multiple individuals. in addition, we have also previously shown-by microarray analysis-that the nasal epithelium is responsible for the initiation of host responses following influenza infection [6] . this renders the hnecs to serve as a valuable tool to analyze transcriptomics from different individuals infected under the same conditions to ensure consistent and relevant responses in humans. once the magnitude of gene expression changes was considered, several interesting findings emerged. firstly, the infected hnecs were observed with strong activation of antiviral genes and early inflammatory genes leading to type i immune responses. a large number of gene expression changes were of magnitude of over 100-fold difference (log 2 7 to log 2 9 fold change). most of the genes with high-magnitude expression changes were verified by qpcr, with statistical significance congruent with the rnaseq analysis. secondly, despite the absence of immune cells, the infected hnecs were able to generate strong type i responses that may likely aid the recruitment of cytotoxic cells to clear the infected cells. thirdly, in early responses of the hnecs, ifnλ genes, which represent type iii interferons, were more strongly induced than the more frequently observed type i interferons (ifnα and ifnβ), while type ii interferons were not produced by hnecs, in agreement with previous studies [5, 19] . 
the induction of type iii interferons may reflect an important event within the hnecs where ifnλ, the initial responders against the infection, may be more beneficial in the antiviral response [20, 21] . moreover, we also observed notable suppression of expression of certain genes following influenza infection, including suppression of proliferation and dna repair genes, which may contribute to the pathology and pathogenesis of influenza [22] . finally, rnaseq also unraveled expression changes of certain newly-discovered genes in response to influenza infection of the upper airway cells. genes such as heatr9 [23] , il4i1 [24] , tnfsf13b (baff) [25] , and pdcd1 (pd-1) [26] are recently implicated in influenza pathogenesis and mucosal defense, thereby signifying the role of the nasal epithelium against influenza infection. furthermore, rnaseq identified altered expression of art3 and kcnh7 genes that were not previously detected in influenza transcriptomes. these findings hence further reiterate the value of rnaseq in enhancing data on influenza transcriptomes for reference in future studies. via pathway enrichment analysis, we have identified known antiviral pathways to validate the hnecs responses against influenza. in addition, we have also documented the potential pathways initiated by the nasal epithelium that may contribute to influenza pathogenesis as represented by the gene expression changes listed in tables 2 and 3 . by analyses using literature-inferred go and reactome databases, we have demonstrated that the nasal epithelium can play a role in the main antiviral signaling, i.e., ifnγ responses despite not being a direct producer of ifnγ. the pathway enrichment indicated that hnecs may serve as important regulators of type ii interferons. even though the effects of ifnγ are vital to the robust clearance of influenza viruses [27] , there are reports of unregulated ifnγ being a contributor to inflammatory damage [28, 29] . 
therefore, the over-production of ifnγ response factors such as icam1 and cd44 may contribute to inflammatory damage of the epithelium. hence, production of factors such as stat1 [28] by the hnecs is also crucial in ensuring appropriate regulation of ifnγ-mediated expression of influenza response genes to modulate inflammation and to minimize damage. the primary contact of influenza virus with the nasal epithelium may subsequently lead to damage to the airway epithelium as well. this is apparent with the clear enrichment of the pathways of apoptosis, mitochondrial apoptotic processes, and necroptosis that contribute to cell death and mechanical barrier loss during infection [5, 30]. genes such as ifi6, bak1, casp8, tnfsf10, and fas suggest active apoptotic cell death that not only destroys cells in the epithelial barrier, but may also serve to propagate the virus and to perpetuate the damage [31] [32] [33]. furthermore, during virus infection, aberrant regulation of apoptosis may also lead to further injury to the epithelium and surrounding tissues [34]. on the other hand, necroptosis pathways have also been observed to be enriched in influenza-infected hnecs. compared to apoptosis, the study of necroptosis in influenza infection is relatively new with contradicting findings [34]. ripk3/necroptosis studies appear to generate contradictory results as to whether necroptosis protects against or is detrimental during influenza infection [35, 36]. hence, its increased expression during infection of hnecs warrants further investigation on its role in influenza-induced damage. in addition, we also noted enrichment of b-cell signaling pathways by the infected hnecs which may be vital for b-cell responses during the adaptive immune response [37]. we noted that most genes enriched in the b-cell pathways were related to antigen recognition such as proteasome subunits (psme2, psmb9, psma6, etc.) and b-cell receptor-associated genes such as dapp1 and card11 [38].
however, changes in expression of certain growth factors (including ereg and fgfs) following influenza infection may lead to complications involving airway remodeling and recruitment [39, 40] . further, the effects of the growth factors were further confirmed by the enrichment of pathways related to the proliferation of smooth muscle cells also induced by the infection. changes to airway smooth muscle cells are usually implicated in airway remodeling [41] [42] [43] , and may also contribute to post-influenza complications. hence, the genes found in this study may be crucial for elucidating the nasal-initiated responses that may contribute to the pathology and pathogenesis of influenza infection of the airways. finally, another interesting pathway that may contribute to epithelial damage is the negative regulation of innate immune responses. these genes may serve as proviral factors and aid in immune evasion. for example, adar is a proviral factor that works in synergy with influenza ns1 to enhance viral replication [44] . trafd1 is a negative regulator of toll-like receptor signaling which is upregulated in influenza-infected hnecs [45] . dhx58 is a negative regulator of rig-i/mda5 signaling pathway [46] . ceacam1 is involved in regulation of liver inflammation [47] and its expression appears to exert antiviral effects on influenza virus [48] . nmi binds to influenza virus ns1 and inhibits irf7-mediated interferon signaling [49, 50] . therefore, aberrant expression of genes in this signaling pathway may directly contribute to immune evasion of influenza, culminating in viral propagation and increased epithelial damage. 
we summarized the identified pathways (listed in tables 2 and 3) that alluded to immune evasion (negative regulation of innate immune responses), antigen processing (metabolism of amino acids and derivatives), and immunomodulation (interferon gamma signaling, b cell receptor signaling, and response to fatty acid) that may contribute to severity of influenza. there was evidence of direct pathway enrichment of potential influenza evasion strategies and/or immunomodulation with accompanying transcriptomic changes. the genes in the pathway may be analyzed for their immunomodulatory activity and whether their expression is beneficial to the virus (immune evasion) or the host (preventing cytokine storm). in addition, the infected hnecs also revealed modified responses associated with fatty acid, with many lipid signaling molecules such as leukotrienes that mediate antiviral responses and subsequent inflammation of the airway [51, 52]. such modified responses may also determine the protection afforded in the airway and the severity of airway inflammation and damage. in addition, the modification may also affect the lower airway responses to inflammatory mediators; hence, the changes in these pathways may also suggest a potential mechanistic link to the pathogenesis of viral-induced exacerbation of chronic inflammatory diseases. lastly, we also noted enrichment of pathways related to amino acid metabolism, which is important in antigen processing and proteasomal degradation of foreign protein. the changes in these genes at the hnecs, the target site of influenza infection, may determine the effectiveness of antiviral responses mounted and may therefore influence disease severity. in addition, we also compared our rnaseq analysis against previously reported influenza-specific signatures in order to improve future transcriptomic analysis [53].
in vitro transcriptomic analysis yields a large number of differentially-expressed genes that would require additional criteria to identify functionally significant genes. by means of this comparison, we discovered that almost all influenza-specific signatures exhibited differences in expression of above 2.5-fold. hence, we propose applying fold change of >2.5 as a threshold for future in vitro transcriptomic systems analyses, in order to increase the stringency in detecting functionally significant gene changes. finally, we observed that, unlike the transcriptome, there were notably fewer cytokines that were readily secreted during the acute phase of infection. expression of cytokines was reduced at 8 hpi, except for il-15, which interestingly is implicated in influenza-induced acute lung injury [54] . even fewer cytokines showed altered expression at later time points. among them, only tgf-a, ip-10 (cxcl10), and pdgf-aa were significantly altered at 24 and 48 hpi. these may be significant markers that can be detected in the secretion of influenza-infected mucosal surface that may influence the severity of influenza. ip-10 is a well-established ifnγ response gene, and serves as a useful marker for response against influenza [5, 6] . tgf-a represents an important factor involved in the secretion of il-8 in response to influenza, and may determine the early appropriate innate responses to prevent severe disease [55] . on the other hand, it is also involved in pulmonary fibrosis as a ligand of epidermal growth factor receptor (egfr) and may contribute to complications in the lower airway [56] . pdgf-aa was found to be elevated in the cerebrospinal fluid of influenza-associated encephalopathy [57] , but was consistently reduced in hnec secretory fluid, thus warranting further investigation into its role in the infected nasal mucosa. 
the establishment of a reference transcriptome based on early responses of the human nasal epithelium model serves a key role in research on critical host factors involved in influenza. as the primary host contact with the virus, not only are immune responses against influenza important, but also the alterations in non-immune functions such as metabolism, cell content, and cell cycle, which may contribute to disease severity. in terms of translational potential, the model system identified gene expression changes of significant magnitude and pathways that impact responses against influenza and its severity. these genes may represent novel targets for future diagnostic and therapeutic development. under controlled conditions, the hnecs clinically establish the baseline for "normal" innate immune responses of the nasal epithelium against influenza viral infection. such a baseline can be particularly crucial when studying the changes in innate immune responses against influenza, especially in patients with underlying chronic diseases who may have aberrant airway responses against influenza. their antiviral responses may differ from "normal" subjects, and this study thus provides the basis for comparing the differential responses that culminate in more severe infections in patients with co-morbidities such as diabetes and chronic airway inflammatory diseases. such comparative clinical studies can potentially enhance the management of influenza viral infection in patients with chronic diseases. in conclusion, rnaseq technology allowed us to accurately quantify the magnitude of gene expression changes, as well as the relevant enriched pathways during h3n2 influenza virus infection of hnecs, which can serve as a baseline for future clinical studies. 
the establishment of this baseline under controlled condition elucidated the antiviral innate response by the infected nasal epithelium, and highlighted the molecular factors and abnormalities in the upper airway that may contribute to influenza severity. furthermore, this study also culminated in the identification of novel gene signatures and host factors that may be harnessed for future research to develop influenza diagnostic markers and therapeutic targets. supplementary materials: the following are available online at http://www.mdpi.com/2073-4409/8/9/986/s1, figure s1 . real-time quantitative pcr validation of genes selected from rnaseq analyses. pcr data are expressed as log2 fold change using median and interquartile range. statistical significance was determined using wilcoxon signed-rank test. * p < 0.05, # p < 0.1; figure s2 . luminex assay of secreted cytokines/chemokines in the apical supernatant. luminex data are expressed as mean fold change from uninfected control. statistical significance was determined using fdr. * p < 0.05; table s1 . information of seven donors of hnecs; table s6 . significant enriched pathways based on reactome pathway database analysis; table s7 . list of significant genes with altered expression at 48 hpi of influenza h3n2 infection of hnecs analyzed by rnaseq and microarray [6] . 
oseltamivir resistance-disabling our influenza defenses role of il-13ralpha2 in modulating il-13-induced muc5ac and ciliary changes in healthy and crswnp mucosa epithelial damage and response human nasal epithelial cells derived from multiple individuals exhibit differential responses to h3n2 influenza virus infection in vitro comparative transcriptomic and metagenomic analyses of influenza virus-infected nasal epithelial cells from multiple individuals reveal specific nasal-initiated signatures systems-biology approaches to discover anti-viral effectors of the human innate immune response uncovering the global host cell requirements for influenza virus replication via rnai screening. microbes infect cellular networks involved in the influenza virus life cycle cd151, a novel host factor of nuclear export signaling in influenza virus infection propagation of respiratory viruses in human airway epithelia reveals persistent virus-specific signatures distinction between rhinovirus-induced acute asthma and asthma-augmented influenza infection a novel three-dimensional cell culture method enhances antiviral drug screening in primary human cells comparing bioinformatic gene expression profiling methods: microarray and rna-seq the use of nasal epithelial stem/progenitor cells to produce functioning ciliated cells in vitro full-length rna-seq from single cells using smart-seq2 predicting clinical severity based on substitutions near epitope a of influenza a/h3n2 effectiveness of seasonal influenza vaccinations against laboratory-confirmed influenza-associated infections among singapore military personnel in 2010-2013. 
influenza other respir in vitro model of fully differentiated human nasal epithelial cells infected with rhinovirus reveals epithelium-initiated immune responses ifnlambda is a potent anti-influenza therapeutic without the inflammatory side effects of ifnalpha treatment interferon-lambda mediates non-redundant front-line antiviral protection against influenza virus infection without compromising host fitness influenza infection induces host dna damage and dynamic dna damage responses during tissue regeneration heatr9 is upregulated during influenza virus infection in lung alveolar epithelial cells the il4i1 enzyme: a new player in the immunosuppressive tumor microenvironment. cells cigarette smoke inhibits baff expression and mucosal immunoglobulin a responses in the lung during influenza virus infection highly pathological influenza a virus infection is associated with augmented expression of pd-1 by functionally compromised virus-specific cd8+ t cells new fronts emerge in the influenza cytokine storm inflammatory impact of ifn-gamma in cd8+ t cell-mediated lung injury is mediated by both stat1-dependent and -independent pathways production of interferon-gamma by influenza hemagglutinin-specific cd8 effector t cells influences the development of pulmonary immunopathology h3n2 influenza virus infection enhances oncostatin m expression in human nasal epithelium nf-kappab-dependent induction of tumor necrosis factor fas/fasl is crucial for efficient influenza virus propagation nucleoprotein of influenza a virus negatively impacts antiapoptotic protein api5 to enhance e2f1-dependent apoptosis and virus replication influenza a virus enhances its propagation through the modulation of annexin-a1 dependent endosomal trafficking and apoptosis programmed cell death in the pathogenesis of influenza dai senses influenza a virus genomic rna and activates ripk3-dependent cell death zbp1/dai is an innate sensor of influenza virus triggering the nlrp3 inflammasome and programmed 
cell death pathways the multifaceted b cell response to influenza virus influenza virus-induced type i interferon leads to polyclonal b-cell activation but does not break down b-cell tolerance neutrophils induce smooth muscle hyperplasia via neutrophil elastase-induced fgf-2 in a mouse model of asthma with mixed inflammation respiratory syncytial virus infection provokes airway remodelling in allergen-exposed mice in absence of prior allergen sensitization cd151, a laminin receptor showing increased expression in asthma, contributes to airway hyperresponsiveness through calcium signaling regulation of human airway smooth muscle cell migration and relevance to asthma airway smooth muscle in asthma: phenotype plasticity and function. pulm pharm the interactomes of influenza virus ns1 and ns2 proteins identify new host factors and provide insights for adar1 playing a supportive role in virus replication yoshimura, a. fln29, a novel interferon-and lps-inducible gene acting as a negative regulator of toll-like receptor signaling rna-and virus-independent inhibition of antiviral signaling by rna helicase lgp2 ceacam1 in liver injury, metabolic and immune regulation ceacam1-mediated inhibition of virus production subcellular proteomic analysis of human host cells infected with h3n2 swine influenza virus negative regulation of nmi on virus-triggered type i ifn production by targeting irf7 mast cells and influenza a virus: association with allergic responses and beyond leukotriene b4 enhances nod2-dependent innate response against influenza virus infection multi-cohort analysis identifies conserved transcriptional signatures across multiple respiratory viruses interleukin-15 is critical in the pathogenesis of influenza a virus-induced acute lung injury influenza induces il-8 and gm-csf secretion by human alveolar epithelial cells through hgf/c-met and tgf-alpha/egfr signaling overactive epidermal growth factor receptor signaling leads to increased fibrosis after severe 
acute respiratory syndrome coronavirus infection vascular endothelial growth factor (vegf) and platelet-derived growth factor (pdgf) levels in the cerebrospinal fluid of children with influenza-associated encephalopathy this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license we thank the surgeons and staff in the department of otolaryngology, national university hospital, singapore. we thank h.h. ong and t.t. he for the subject selection and recording. we thank m.c. phoon and s.h. lau for technical assistance in viral experiments. the authors would like to acknowledge the staff of the immunomonitoring platform at sign. the authors declare no conflicts of interest. key: cord-016293-pyb00pt5 authors: newell-mcgloughlin, martina; re, edward title: the flowering of the age of biotechnology 1990–2000 date: 2006 journal: the evolution of biotechnology doi: 10.1007/1-4020-5149-2_4 sha: doc_id: 16293 cord_uid: pyb00pt5 nan the significance of developing genetic and physical maps of the genome, and the importance of comparing the human genome with those of other species. it also suggested a preliminary focus on improving current technology. at the request of the u.s. congress, the office of technology assessment (ota) also studied the issue, and issued a document in 1987 -within days of the nrc report -that was similarly supportive. the ota report discussed, in addition to scientific issues, social and ethical implications of a genome program together with problems of managing funding, negotiating policy and coordinating research efforts. prompted by advisers at a 1988 meeting in reston, virginia, james wyngaarden, then director of the national institutes of health (nih) , decided that the agency should be a major player in the hgp, effectively seizing the lead from doe. 
the start of the joint effort was in may 1990 (with an "official" start in october) when a 5-year plan detailing the goals of the u.s. human genome project was presented to members of congressional appropriations committees in mid-february. this document, co-authored by doe and nih and titled "understanding our genetic inheritance, the u.s. human genome project: the first five years," examined the then current state of genome science. the plan also set forth complementary approaches of the two agencies for attaining scientific goals and presented plans for administering the research agenda; it described collaboration between u.s. and international agencies and presented budget projections for the project. according to the document, "a centrally coordinated project, focused on specific objectives, is believed to be the most efficient and least expensive way" to obtain the 3-billion base pair map of the human genome. in the course of the project, especially in the early years, the plan stated that "much new technology will be developed that will facilitate biomedical and a broad range of biological research, bring down the cost of many experiments (mapping and sequencing), and finding applications in numerous other fields." the plan built upon the 1988 reports of the office of technology assessment and the national research council on mapping and sequencing the human genome. "in the intervening two years," the document said, "improvements in technology for almost every aspect of genomics research have taken place. as a result, more specific goals can now be set for the project." the document describes objectives in the following areas: mapping and sequencing the human genome and the genomes of model organisms; data collection and distribution; ethical, legal, and social considerations; research training; technology development; and technology transfer. these goals were to be reviewed each year and updated as further advances occurred in the underlying technologies.
they identified the overall budget needs to be the same as those identified by ota and nrc, namely about $200 million per year for approximately 15 years. this came to $3 billion over the entire period of the project. considering that in july 1990 the dna databases contained only seven sequences greater than 0.1 mb, this was a major leap of faith. this approach was a major departure from the single-investigator-based, gene-of-interest focus that research had taken hitherto, and it sparked much controversy both before and after its inception. critics questioned the usefulness of genomic sequencing; they objected to the high cost and suggested it might divert funds from other, more focused, basic research. the prime argument to support the latter position was that there appeared to be far fewer genes than accounted for by the mass of dna, which would suggest that the major part of the sequencing effort would be of long stretches of base pairs with no known function, the so-called "junk dna." and that was in the days when the number of genes was presumed to be 80-100,000. if, at that stage, the estimated number had been guessed to be closer to the actual estimate of 35-40,000 (later reduced to 20-25,000), this would have made the task seem even more foolhardy and less worthwhile to some. however, the ever-powerful incentive of new diagnostics and treatments for human disease beyond what could be gleaned from the gene-by-gene approach, and the rapidly evolving technologies, especially that of automated sequencing, made it both an attractive and plausible aim. charles cantor (1990), a principal scientist for the department of energy's genome project, contended that doe and nih were cooperating effectively to develop organizational structures and scientific priorities that would keep the project on schedule and within its budget. he noted that there would be small short-term costs to traditional biology, but that the long-term benefits would be immeasurable.
genome projects were also discussed and developed in other countries and sequencing efforts began in japan, france, italy, the united kingdom, and canada. even as the soviet union collapsed, a genome project survived as part of the russian science program. the scale of the venture and the manageable prospect for pooling data via computer made sequencing the human genome a truly international initiative. in an effort to include developing countries in the project unesco assembled an advisory committee in 1988 to examine unesco's role in facilitating international dialogue and cooperation. a privately-funded human genome organization (hugo) had been founded in 1988 to coordinate international efforts and serve as a clearinghouse for data. in that same year the european commission (ec) introduced a proposal entitled the "predictive medicine programme." a few ec countries, notably germany and denmark, claimed the proposal lacked ethical sensitivity; objections to the possible eugenic implications of the program were especially strong in germany (dickson 1989) . the initial proposal was dropped but later modified and adopted in 1990 as the "human genome analysis programme" (dickman and aldhous 1991) . this program committed substantial resources to the study of ethical issues. the need for an organization to coordinate these multiple international efforts quickly became apparent. thus the human genome organization (hugo), which has been called the "u.n. for the human genome," was born in the spring of 1988. composed of a founding council of scientists from seventeen countries, hugo's goal was to encourage international collaboration through coordination of research, exchange of data and research techniques, training, and debates on the implications of the projects (bodmer 1991) . 
in august 1990 nih began large-scale sequencing trials on four model organisms: the parasitic, cell-wall lacking pathogenic microbe mycoplasma capricolum, the prokaryotic microbial lab rat escherichia coli, the simplest animal caenorhabditis elegans, and the eukaryotic microbial lab rat saccharomyces cerevisiae. each research group agreed to sequence 3 megabases (mb) at 75 cents a base within 3 years. a sub-living organism was actually fully sequenced: the complete sequence of that genome, the human cytomegalovirus (hcmv) genome, was 0.23 mb. that year also saw the casting of the first salvo in the protracted debate on "ownership" of genetic information, beginning with the more tangible question of ownership of cells. and, as with the debates of the early eighties, which were to be revisited later in the nineties, the respondent was the university of california. moore v. regents of the university of california was the first case in the united states to address the issue of who owns the rights to an individual's cells. diagnosed with leukemia, john moore had blood and bone marrow withdrawn for medical tests. suspicious of repeated requests to give samples because he had already been cured, moore discovered that his doctors had patented a cell line derived from his cells, and so he sued. the california supreme court found that moore's doctor did not obtain proper informed consent, but it also found that moore could not claim property rights over his body. the quest for the holy grail of the human genome was both inspired by the rapidly evolving technologies for mapping and sequencing and subsequently spurred on the development of ever more efficient tools and techniques. advances in analytical tools, automation, and chemistries as well as computational power and algorithms revolutionized the ability to generate and analyze immense amounts of dna sequence and genotype information.
in addition to leading to the determination of the complete sequences of a variety of microorganisms and a rapidly increasing number of model organisms, these technologies have provided insights into the repertoire of genes that are required for life, and their allelic diversity as well as their organization in the genome. but back in 1990 many of these were still nascent technologies. the technologies required to achieve this end could be broadly divided into three categories: equipment, techniques, and computational analysis. these are not truly discrete divisions and there was much overlap in their influence on each other. as noted, lloyd smith, michael and tim hunkapiller, and leroy hood conceived the automated sequencer and applied biosystems inc. brought it to market in june 1986. there is not much doubt that when applied biosystems inc. put it on the market, what had been a dream became decidedly closer to an achievable reality. in automating sanger's chain termination sequencing system, hood modified both the chemistry and the data-gathering processes. in the sequencing reaction itself, he replaced radioactive labels, which were unstable, posed a health hazard, and required separate gels for each of the four bases, with chemistry that used fluorescent dyes of a different color for each of the four dna bases. this system of "color-coding" eliminated the need to run several reactions in overlapping gels. the fluorescent labels also addressed another issue that contributed to one of the major concerns of sequencing: data gathering. hood integrated laser and computer technology, eliminating the tedious process of information-gathering by hand. as the fragments of dna passed a laser beam on their way through the gel the fluorescent labels were stimulated to emit light. 
the emitted light was transmitted by a lens, and the intensity and spectral characteristics of the fluorescence were measured by a photomultiplier tube and converted to a digital format that could be read directly into a computer. during the next thirteen years, the machine was constantly improved, and by 1999 a fully automated instrument could sequence up to 150,000,000 base pairs per year. in 1990 three groups came up with a variation on this approach. they developed what is termed capillary electrophoresis: one team was led by lloyd smith (luckey, 1990), the second by barry karger, and the third by norman dovichi. in 1997 molecular dynamics introduced the megabace, a capillary sequencing machine. and, not to be outdone, the following year, 1998, the original of the species came up with the abi prism 3700 sequencing machine. the 3700 is also a capillary-based machine designed to run about eight sets of 96 sequence reactions per day. on the biology side, one of the biggest challenges was the construction of a physical map to be compiled from many diverse sources and approaches in such a way as to ensure continuity of physical mapping data over long stretches of dna. the development of dna sequence tagged sites (stss) to correlate diverse types of dna clones aided this standardization of the mapping component by providing mappers with a common language and a system of landmarks for all the libraries from such varied sources as cosmids, yeast artificial chromosomes (yacs) and other rdna clones. this way each mapped element (individual clone, contig, or sequenced region) would be defined by a unique sts. a crude map of the entire genome, showing the order and spacing of stss, could then be constructed. determining the order and spacing of the unique identifier sequences composing an sts map was made possible by the development of mullis' polymerase chain reaction (pcr), which allows rapid production of multiple copies of a specific dna fragment, for example, an sts fragment. 
sequence information generated in this way could be recalled easily and, once reported to a database, would be available to other investigators. with the sts sequence stored electronically, there would be no need to obtain a probe or any other reagents from the original investigator. no longer would it be necessary to exchange and store hundreds of thousands of clones for full-scale sequencing of the human genome, a significant saving of money, effort, and time. by providing a common language and landmarks for mapping, stss allowed genetic and physical maps to be cross-referenced. with a refinement on this technique to go after actual genes, sydney brenner proposed sequencing human cdnas to provide rapid access to the genes, stating that 'one obvious way of finding at least a large part of the important [fraction] of the human genome is to look at the sequences of the messenger rna's of expressed genes' (brenner, 1990). the following year the man who was to play a pivotal role on the world stage that became the human genome project suggested a way to implement brenner's approach. that player, nih biologist j. craig venter, announced a strategy to find expressed genes using expressed sequence tags (ests) (adams, 1991). each of these so-called ests represents a unique stretch of dna within a coding region of a gene, which, as brenner suggested, would be useful for identifying full-length genes and as a landmark for mapping. using this approach, projects were begun to mark gene sites on chromosome maps as sites of mrna expression. to help with this a more efficient method of handling large chunks of sequences was needed and two approaches were developed. yeast artificial chromosomes, which were developed by david burke, maynard olson, and george carle, increased insert size 10-fold (david t. burke et al., 1987). caltech's second major contribution to the genome project was developed by melvin simon and hiroaki shizuya. 
their approach to handling large dna segments was to develop "bacterial artificial chromosomes" (bacs), which basically allow bacteria to replicate chunks greater than 100,000 base pairs in length. this efficient production of more stable, large-insert bacs made the latter an even more attractive option, as they had greater flexibility than yacs. in 1994 in a collaboration that presages the snp consortium, washington university, st louis mo, was funded by the pharmaceutical company merck and the national cancer institute to provide sequence from those ests. more than half a million ests were submitted during the project (murr l et al., 1996) . on the analysis side was the major challenge to manage and mine the vast amount of dna sequence data being generated. a rate-limiting step was the need to develop semi-intelligent algorithms to achieve this herculean task. this is where the discipline of bioinformatics came into play. it had been evolving as a discipline since margaret oakley dayhoff used her knowledge of chemistry, mathematics, biology and computer science to develop this entirely new field in the early sixties. she is in fact credited today as a founder of the field of bioinformatics in which biology, computer science, and information technology merge into a single discipline. the ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. there are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information. 
paralleling the rapid and very public ascent of recombinant dna technology during the previous two decades, the analytic and management tools of the discipline that was to become bioinformatics evolved at a more subdued but equally impressive pace. some of the key developments included tools such as the needleman-wunsch algorithm for sequence comparison, which appeared as early as 1970, even before recombinant dna technology had been demonstrated; the smith-waterman algorithm for sequence alignment (1981); the fastp algorithm (1985) and the fasta algorithm for sequence comparison by pearson and lipman in 1988; and perl (practical extraction and report language), released by larry wall in 1987. on the data management side several databases with ever more effective storage and mining capabilities were developed over the same period. the first bioinformatic/biological databases were constructed a few years after the first protein sequences began to become available. the first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues. nearly a decade later, the first nucleic acid sequence was reported, that of yeast alanine trna with 77 bases. just one year later, dayhoff gathered all the available sequence data to create the first bioinformatic database. one of the first dedicated databases was the brookhaven protein databank, whose collection consisted of ten x-ray crystallographic protein structures (acta. cryst. b, 1973). the year 1982 saw the creation of the genetics computer group (gcg) as a part of the university of wisconsin biotechnology center. the group's primary and much used product was the wisconsin suite of molecular biology tools. it was spun off as a private company in 1989. the swiss-prot database made its debut in 1986 in europe at the department of medical biochemistry of the university of geneva and the european molecular biology laboratory (embl). the first dedicated "bioinformatics" company intelligenetics, inc. 
was founded in california in 1980. their primary product was the intelligenetics suite of programs for dna and protein sequence analysis. the first unified federal effort, the national center for biotechnology information (ncbi) was created at nih/nlm in 1988 and it was to play a crucial part in coordinating public databases, developing software tools for analyzing genome data, and disseminating information. and on the other side of the atlantic, oxford molecular group, ltd. (omg) was founded in oxford, uk by anthony marchington, david ricketts, james hiddleston, anthony rees, and w. graham richards. their primary focus was on rational drug design and their products such as anaconda, asp, and chameleon obviously reflected this as they were applied in molecular modeling, and protein design engineering. within two years ncbi were making their mark when david lipman, eugene myers, and colleagues at the ncbi published the basic local alignment search tool blast algorithm for aligning sequences (altschul et al., 1990) . it is used to compare a novel sequence with those contained in nucleotide and protein databases by aligning the novel sequence with previously characterized genes. the emphasis of this tool is to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of this novel sequence. regions of similarity detected via this type of alignment tool can be either local, where the region of similarity is based in one location, or global, where regions of similarity can be detected across otherwise unrelated genetic code. the fundamental unit of blast algorithm output is the high-scoring segment pair (hsp). an hsp consists of two sequence fragments of arbitrary but equal length whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score. 
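the hsp definition above can be made concrete with a small sketch. the python below is not the blast heuristic itself (blast seeds on short words before extending, and uses statistically derived cutoffs); it is a simplified illustration, assuming a score of +5 for a match, -4 for a mismatch and an arbitrary cutoff of 20, that scans every diagonal of two sequences for its best-scoring ungapped segment pair and keeps those whose score meets the cutoff. the function names, parameters and example sequences are hypothetical and chosen only for illustration.

```python
def best_segment_on_diagonal(a, b, offset, match=5, mismatch=-4):
    """Best-scoring ungapped segment pair on one diagonal (j - i == offset).

    Returns (score, start_in_a, start_in_b, length). Both fragments have the
    same length because the alignment is ungapped.
    """
    i, j = max(0, -offset), max(0, offset)
    best = (0, i, j, 0)
    run_score, run_i, run_j = 0, i, j
    while i < len(a) and j < len(b):
        if run_score <= 0:
            # A non-positive prefix can never help: restart the segment here.
            run_score, run_i, run_j = 0, i, j
        run_score += match if a[i] == b[j] else mismatch
        if run_score > best[0]:
            best = (run_score, run_i, run_j, i - run_i + 1)
        i += 1
        j += 1
    return best


def find_hsps(a, b, cutoff=20, match=5, mismatch=-4):
    """Collect, per diagonal, the locally maximal segment pair scoring >= cutoff."""
    hsps = []
    for offset in range(-(len(a) - 1), len(b)):
        seg = best_segment_on_diagonal(a, b, offset, match, mismatch)
        if seg[3] > 0 and seg[0] >= cutoff:
            hsps.append(seg)
    # Highest-scoring pairs first.
    return sorted(hsps, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    query = "TTGACCGTAAGG"
    subject = "CCGACCGTACCA"
    for score, qi, si, n in find_hsps(query, subject):
        print(score, query[qi:qi + n], subject[si:si + n])
```

keeping only one segment per diagonal is a deliberate simplification; a fuller treatment would report every locally maximal segment and would allow gaps in a later extension step.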
blast has been refined and modified over the years, the two principal variants presently in use being the ncbi blast and wu-blast (wu signifying washington university). the same year that blast was launched two other bioinformatics companies were launched. one was informax in bethesda, md, whose products addressed sequence analysis, database and data management, searching, publication graphics, clone construction, mapping and primer design. the second, molecular applications group in california, was to play a bigger part on the proteomics end (michael levitt and chris lee). their primary products were look and segmod, which are used for molecular modeling and protein design. the following year, 1991, the human chromosome mapping data repository, genome data base (gdb), was established. on a more global level, the development of computational capabilities in general and the internet in particular was also to play a considerable part in the sharing of data and access to databases that made the rapid forward momentum of the hgp possible. also in 1991 edward uberbacher of oak ridge national laboratory in tennessee developed grail, the first of many gene-finding programs. in 1992 the first two "genomics" companies made their appearance. incyte pharmaceuticals, a genomics company headquartered in palo alto, california, was formed and myriad genetics, inc. was founded in utah. incyte's stated goal was to lead in the discovery of major common human disease genes and their related pathways. the company discovered and sequenced, with its academic collaborators (originally synteni from pat brown's lab at stanford), a number of important genes including brca1 and brca2, with mary claire king, epidemiologist at uc-berkeley, the genes linked to breast cancer in families with a high degree of incidence before age 45. by 1992 a low-resolution genetic linkage map of the entire human genome was published and u.s. 
and french teams completed genetic maps of both mouse and man. the mouse with an average marker spacing of 4.3 cm as determined by eric lander and colleagues at whitehead and the human, with an average marker spacing of 5 cm by jean weissenbach and colleagues at ceph (centre d'etude du polymorphisme humaine). the latter institute was the subject of a rather scathing book by paul rabinow (1999) based on what they did with this genome map. in 1993, an american biotechnology company, millennium pharmaceuticals, and the ceph, developed plans for a collaborative effort to discover diabetes genes. the results of this collaboration could have been medically significant and financially lucrative. the two parties had agreed that ceph would supply millennium with germplasm collected from a large coterie of french families, and millennium would supply funding and expertise in new technologies to accelerate the identification of the genes, terms to which the french government had agreed. but in early 1994, just as the collaboration was to begin, the french government cried halt! the government explained that the ceph could not be permitted to give the americans that most precious of substances for which there was no precedent in law -french dna. rabinow's book discusses the tangled relations and conceptions such as, can a country be said to have its own genetic material, the first but hardly the last franco-american disavowal of détente (paul rabinow, 1999) . the latest facilities such as the joint genome institute (jgi), walnut creek, ca are now able to sequence up to 10mb per day which makes it possible to sequence whole microbial genomes within a day. technologies currently under development will probably increase this capacity yet further through massively parallel sequencing and/or microfluidic processing making it possible to sequence multiple genotypes from several species. nineteen ninety-two saw one of the first shakeups in the progress of the hgp. 
that was the year that the first major outsider entered the race when britain's wellcome trust plunked down $95 million to join the hgp. this caused a mere ripple while the principal shake-ups occurred stateside. much of the debate, and subsequently the direction all the way through the hgp process, was shaped by the personalities involved. as noted, the application of one of the innovative techniques, namely ests, to do an end run on patenting introduced one of those major players to the fray, craig venter. venter, the high school dropout who reached the age of majority in the killing fields of vietnam, was to play a pivotal role in a more "civilized" but no less combative field of human endeavor. he came onto the world stage through his initial work on ests while at the national institute of neurological disorders and stroke (ninds) from 1984 to 1992. he noted in an interview with the scientist magazine in 1995 that there was a degree of ambivalence at ninds about his venturing into the field of genomics: while they liked the prestige of hosting one of the leaders and innovators in this newly emerging field, they were concerned about him moving outside the ninds purview of the human brain and nervous system. ultimately, while he professed to like the security and service infrastructure this institute afforded him, that same system became too restrictive for his interests and talent. he wanted the whole canvas of human-gene expression to be his universe, not just what was confined to the central nervous system. he was becoming more interested in taking a whole genome approach to understanding the overall structure of genomes and genome evolution, which was much broader than the mission of ninds. he noted, with some irony, in later years that the then current nih director harold varmus had wished in hindsight that nih had pushed to do a similar database in the public domain; clearly, in venter's opinion, varmus was in need of a refresher course in history! 
bernadine healy, nih director from 1991 to 1993, was one of the few in a leadership role who saw the technical and fiscal promise of venter's work and, as for all good administrators, it also presented her with an opportunity to resolve a thorny "personnel" issue. she appointed him head of an ad hoc committee on an intramural genome program at nih, putting the head of the hgp (that other larger than life personality jim watson) on notice that he was not the sole arbiter of the direction of the human genome project. however, venter very soon established himself as an equally non-conformist character, and did so with the tacit consent of his erstwhile benefactor. he initially assumed the mantle of a non-conformist through guilt by association rather than direct actions when it was revealed that nih was filing patent applications on thousands of these partial genes based on his ests, catalyzing the first hgp fight at a congressional hearing. nih's move was widely criticized by the scientific community because, at the time, the function of genes associated with the partial sequences was unknown. critics charged that patent protection for the gene segments would forestall future research on them. the patent office eventually rejected the patents, but the applications sparked an international controversy over patenting genes whose functions were still unknown. interestingly enough, despite nih's reliance on the est/cdna technique, venter, who was now clearly venturing outside the ninds mandated rubric, could not obtain government funding to expand his research, prompting him to leave nih in 1992. he moved on to become president and director of the institute for genomic research (tigr), a nonprofit research center based in gaithersburg, md. at the same time william haseltine formed a sister company, human genome sciences (hgs), to commercialize tigr products. venter continued est work at tigr, but also began thinking about sequencing entire genomes. 
again, he came up with a faster method: whole genome shotgun sequencing. he applied for an nih grant to use the method on haemophilus influenzae, but started the project before the funding decision was returned. when the genome was nearly complete, nih rejected his proposal, saying the method would not work. in a triumphal flurry in late may 1995, and with a metaphorical nose-thumbing at his recently rejected "unworkable" grant, venter announced that tigr and collaborators had fully sequenced the first free-living organism, haemophilus influenzae. in november 1994, controversy surrounding venter's research escalated. access restrictions associated with a cdna database developed by tigr and its rockville, md.-based biotech associate, human genome sciences (hgs) inc., including hgs's right to preview papers on resulting discoveries and to have first options to license products, prompted merck and co. inc. to fund a rival database project. in that year also britain "officially" entered the hgp race when the wellcome trust plunked down $95 million (as mentioned earlier). the following year hgs was involved in yet another patenting debacle forced by the rapid march of technology into uncharted patent law territory. on june 5, 1995 hgs applied for a patent on a gene that produces a "receptor" protein that was later called ccr5. at that time hgs had no idea that ccr5 was an hiv receptor. in december 1995, u.s. researcher robert gallo, the co-discoverer of hiv, and colleagues found three chemicals that inhibit the aids virus, but they did not know how the chemicals worked. in february 1996, edward berger at the nih discovered that gallo's inhibitors work in late-stage aids by blocking a receptor on the surface of t-cells. in june of that year, in a period of just 10 days, five groups of scientists published papers saying ccr5 is the receptor for virtually all strains of hiv. 
in january 2000, schering-plough researchers told a san francisco aids conference that they had discovered new inhibitors. they knew that merck researchers had made similar discoveries. as a significant valentine in 2000, the u.s. patent and trademark office (uspto) granted hgs a patent on the gene that makes ccr5 and on techniques for producing ccr5 artificially. the decision sent hgs stock flying and dismayed researchers. it also caused the uspto to revise its definition of a "patentable" drug target. in the meantime haseltine's partner in rewriting patenting history, venter, turned his focus to the human genome. he left tigr and started the for-profit company celera, a division of pe biosystems, the company that at times, thanks to hood and hunkapiller, led the world in the production of sequencing machines. using these machines, and the world's largest civilian supercomputer, venter finished assembling the human genome in just three years. following the debacle with the then nih director bernadine healy over patenting the partial genes that resulted from est analysis, another major personality-driven event occurred in that same year. watson strongly opposed the idea of patenting gene fragments, fearing that it would discourage research, and commented that "the automated sequencing machines 'could be run by monkeys.' " (nature june 29, 2000) with this dismissal watson resigned his nih nchgr post in 1992 to devote his full-time effort to directing cold spring harbor laboratory. his replacement was of a rather more pragmatic, less flamboyant nature. while venter might be described as an idiosyncratic shogun of the shotgun, francis collins was once described as the king arthur of the holy grail that is the human genome project. collins became the director of the national human genome research institute in 1993. 
he was considered the right man for the job following his 1989 success (along with lap-chee tsui) in identifying the gene for the cystic fibrosis transmembrane conductance regulator (cftr) chloride channel that, when mutated, can lead to the onset of cystic fibrosis. although now indelibly connected with the ne plus ultra topic in biology, like many great innovators in this field before him, francis collins had little interest in biology as he grew up on a farm in the shenandoah valley of virginia. from his childhood he seemed destined to be at the center of drama: his father was professor of dramatic arts at mary baldwin college, and the early stage management of his career was performed on a stage he built on the farm. while the physical and mathematical sciences held appeal for him, being possessed of a highly logical mind, collins found the format in which biology was taught in the high school of his day mind-numbingly boring, filled with dissections and rote memorization. he found the contemplation of the infinite outcomes of dividing by zero (done deliberately rather than by accident as in einstein's case) far more appealing than contemplating the innards of a frog. that biology could be gloriously logical only became clear to collins when, in 1970, he entered yale with a degree in chemistry from the university of virginia and was first exposed to the nascent field of molecular biology. anecdotally it was the tome what is life?, penned by the theoretical physicist father of molecular biology, erwin schrodinger, while exiled in dublin and based on lectures he gave at trinity college dublin in 1943, that was the catalyst for his conversion. like schrodinger he wanted to do something more obviously meaningful (for less than hardcore physicists at least!) than theoretical physics, so he went to medical school at unc-chapel hill after completing his chemistry doctorate at yale, and returned to the site of his road to damascus for post-doctoral study in the application of his newfound interest in human genetics. 
during this sojourn at yale, collins began working on developing novel tools to search the genome for genes that cause human disease. he continued this work, which he dubbed "positional cloning," after moving to the university of michigan as a professor in 1984. he placed himself on the genetic map when he succeeded in using this method to put the gene that causes cystic fibrosis on the physical map. while a less colorful, in-your-face character than venter, he has his own personality quirks; for example, he pastes a new sticker onto the back of his motorcycle helmet every time he finds a new disease gene. one imagines that particular piece of real estate is getting rather crowded. interestingly, it was not these four-hundred-pound us gorillas who proposed the eventually prescient timeline for a working draft but two from the old power base. in meetings in the us in 1994, john sulston and bob waterston proposed to produce a 'draft' sequence of the human genome by 2000, a full five years ahead of schedule. while agreed by most to be feasible, it meant a rethinking of strategy and involved focusing resources on larger centers and emphasizing sequence acquisition. just as important, it asserted the value of draft quality sequence to biomedical research. discussion started with the british based wellcome trust as possible sponsors (marshall e. 1995). by 1995 a rough draft of the human genome map was produced showing the locations of more than 30,000 genes. the map was produced using yeast artificial chromosomes, and some chromosomes, notably the littlest, 22, were mapped in finer detail. these maps marked an important step toward clone-based sequencing. the importance was illustrated in the devotion of an entire edition of the journal nature to the subject (nature 377: 175-379 1995). the duel between the public and private face of the hgp progressed apace over the next five years. 
following release of the mapping data some level of international agreement was decided on sequence data release and databases. they agreed on the release of sequence data, specifically, that primary genomic sequence should be in the public domain to encourage research and development to maximize its benefit to society. also that it be rapidly released on a daily basis with assemblies of greater than 1 kb and that the finished annotated sequence should be submitted immediately to the public databases. in 1996 an international consortium completed the sequence of the genome of the workhorse yeast saccharomyces cerevisiae. data had been released as the individual chromosomes were completed. the saccharomyces genome database (sgd) was created to curate this information. the project collects information and maintains a database of the molecular biology of s. cerevisiae. this database includes a variety of genomic and biological information and is maintained and updated by sgd curators. the sgd also maintains the s. cerevisiae gene name registry, a complete list of all gene names used in s. cerevisiae. in 1997 a new more powerful diagnostic tool termed snps (single nucleotide polymorphisms) was developed. snps are changes in single letters in our dna code that can act as markers in the dna landscape. some snps are associated closely with susceptibility to genetic disease, our response to drugs or our ability to remove toxins. the snp consortium although designated a limited company is a nonprofit foundation organized for the purpose of providing public genomic data. it is a collaborative effort between pharmaceutical companies and the wellcome trust with the idea of making available widely accepted, high-quality, extensive, and publicly accessible snp map. its mission was to develop up to 300,000 snps distributed evenly throughout the human genome and to make the information related to these snps available to the public without intellectual property restrictions. 
the project started in april 1999 and was anticipated to continue until the end of 2001. in the end, many more snps, about 1.5 million in total, were discovered than originally planned. in june 1998 the complete genome sequence of mycobacterium tuberculosis was published by teams from the uk, france, us and denmark. the abi prism 3700 sequencing machine, a capillary-based machine designed to run about eight sets of 96 sequence reactions per day, also reached the market that year. that same year the genome sequence of the first multicellular organism, c. elegans, was completed. c. elegans has a genome of about 100 mb and, as noted, is a primitive animal model organism used in a range of biological disciplines. by november 1999 the human genome draft sequence reached 1000 mb and the first complete human chromosome was sequenced. this first was reached on the east side of the atlantic by the hgp team led by the sanger centre, producing a finished sequence for chromosome 22, which is about 34 million base-pairs and includes at least 550 genes. according to anecdotal evidence, when visiting his namesake centre, sanger asked: "what does this machine do then?" "dideoxy sequencing" came the reply, to which fred retorted: "haven't they come up with anything better yet?" as will be elaborated in the final chapter, the real highlight of 2000 was production of a 'working draft' sequence of the human genome, which was announced simultaneously in the us and the uk. in a joint event, celera genomics announced completion of their 'first assembly' of the genome. in a remarkable special issue, nature included a 60-page article by the human genome project partners, studies of mapping and variation, as well as analysis of the sequence by experts in different areas of biology. science published the article by celera on their assembly of hgp and celera data as well as analyses of the use of the sequence. 
however to demonstrate the sensitivity of the market place to presidential utterances the joint appearances by bill clinton and tony blair touting this major milestone turned into a major cold shower when clinton's reassurance of access of the people to their genetic information caused a precipitous drop in celera's share value overnight. clinton's assurance that, "the effort to decipher the human genome will be the scientific breakthrough of the century -perhaps of all time. we have a profound responsibility to ensure that the life-saving benefits of any cutting-edge research are available to all human beings." (president bill clinton, wednesday, march 14, 2000) stands in sharp contrast to the statement from venter's colleague that " any company that wants to be in the business of using genes, proteins, or antibodies as drugs has a very high probability of running afoul of our patents. from a commercial point of view, they are severely constrained -and far more than they realize." (william a. haseltine, chairman and ceo, human genome sciences). the huge sell-off in stocks ended weeks of biotech buying in which those same stocks soared to unprecedented highs. by the next day, however, the genomic company spin doctors began to recover ground in a brilliant move which turned the clinton announcement into a public relations coup. all major genomics companies issued press releases applauding president clinton's announcement. the real news they argued, was that "for the first time a president strongly affirmed the importance of gene based patents." and the same bill haseltine of human genome sciences positively gushed as he happily pointed out that he "could begin his next annual report with the [president's] monumental statement, and quote today as a monumental day." as distinguished harvard biologist richard lewontin notes: "no prominent molecular biologist of my acquaintance is without a financial stake in the biotechnology business. 
as a result, serious conflicts of interest have emerged in universities and in government service" (lewontin, 2000). away from the spin doctors, perhaps eric lander best summed up the herculean effort when he opined that for him "the human genome project has been the ultimate fulfilment: the chance to share common purpose with hundreds of wonderful colleagues towards a goal larger than ourselves. in the long run, the human genome project's greatest impact might not be the three billion nucleotides of the human chromosomes, but its model of scientific community." (ridley, 2000)

6. gene therapy

the year 1990 also marked the passing of another milestone that was intimately connected to one of the fundamental drivers of the hgp. the california hereditary disorders act came into force and with it one of the potential solutions for human hereditary disorders. w. french anderson in the usa reported the first successful application of gene therapy in humans. this first successful gene therapy for a human disease was achieved for severe combined immune deficiency (scid) by introducing the missing gene, adenosine deaminase (ada), into the peripheral lymphocytes of a 4-year-old girl and returning the modified lymphocytes to her. although the results are difficult to interpret because of the concurrent use of polyethylene glycol-conjugated ada, commonly referred to as pegylated ada (pgla), in all patients, strong evidence for in vivo efficacy was demonstrated. ada-modified t cells persisted in vivo for up to three years and were associated with increases in t-cell number and ada enzyme levels. t cells derived from transduced pgla were progressively replaced by marrow-derived t cells, confirming successful gene transfer into long-lived progenitor cells. ashanthi desilva, the girl who received the first credible gene therapy, continues to do well more than a decade later.
cynthia cutshall, the second child to receive gene therapy for the same disorder as desilva, also continues to do well. within 10 years (by january 2000), more than 350 gene therapy protocols had been approved in the us and worldwide, researchers launched more than 400 clinical trials to test gene therapy against a wide array of illnesses. surprisingly, a disease not typically heading the charts of heritable disorders, cancer has dominated the research. in 1994 cancer patients were treated with the tumor necrosis factor gene, a natural tumor fighting protein which worked to a limited extent. even more surprisingly, after the initial flurry of success little has worked. gene therapy, the promising miracle of 1990 failed to deliver on its early promise over the decade. apart from those examples, there are many diseases whose molecular pathology is, or soon will be, well understood, but for which no satisfactory treatments have yet been developed. at the beginning of the nineties it appeared that gene therapy did offer new opportunities to treat these disorders both by restoring gene functions that have been lost through mutation and by introducing genes that can inhibit the replication of infectious agents, render cells resistant to cytotoxic drugs, or cause the elimination of aberrant cells. from this "genomic" viewpoint genes could be said to be viewed as medicines, and their development as therapeutics should embrace the issues facing the development of small-molecule and protein therapeutics such as bioavailability, specificity, toxicity, potency, and the ability to be manufactured at large scale in a cost-effective manner. of course for such a radical approach certain basal level criteria needed to be established for selecting disease candidates for human gene therapy. 
these include, such factors as the disease is an incurable, life-threatening disease; organ, tissue, and cell types affected by the disease have been identified; the normal counterpart of the defective gene has been isolated and cloned; either the normal gene can be introduced into a substantial subfraction of the cells from the affected tissue, or the introduction of the gene into the available target tissue, such as bone marrow, will somehow alter the disease process in the tissue affected by the disease; the gene can be expressed adequately (it will direct the production of enough normal protein to make a difference); and techniques are available to verify the safety of the procedure. an ideal gene therapeutic should, therefore, be stably formulated at room temperature and amenable to administration either as an injectable or aerosol or by oral delivery in liquid or capsule form. the therapeutic should also be suitable for repeat therapy, and when delivered, it should neither generate an immune response nor be destroyed by tissue-scavenging mechanisms. when delivered to the target cell, the therapeutic gene should then be transported to the nucleus, where it should be maintained as a stable plasmid or chromosomal integrant, and be expressed in a predictable, controlled fashion at the desired potency in a cell-specific or tissue-specific manner. in addition to the ada gene transfer in children with severe combined immunodeficiency syndrome, a gene-marking study of epstein-barr virus-specific cytotoxic t cells, and trials of gene-modified t cells expressing suicide or viral resistance genes in patients infected with hiv were studied in the early nineties. additional strategies for t-cell gene therapy which were pursued later in the decade involve the engineering of novel t-cell receptors that impart antigen specificity for virally infected or malignant cells. 
issues which are still not resolved include nuclear transport, integration, regulated gene expression and immune surveillance. this knowledge, when finally understood and applied to the design of delivery vehicles of either viral or non-viral origin, will assist in the realization of gene therapeutics as safe and beneficial medicines that are suited to the routine management of human health. scientists are also working on using gene therapy to generate antibodies directly inside cells to block the production of harmful viruses such as hiv or even cancer-inducing proteins. there is a specific connection with francis collins, as his motivation for pursuing the hgp was his pursuit of defective genes beginning with the cystic fibrosis gene. this gene, called the cf transmembrane conductance regulator, codes for an ion channel protein that regulates salts in the lung tissue. the faulty gene prevents cells from excreting salt properly, causing a thick sticky mucus to build up and destroy lung tissue. scientists have spliced copies of the normal genes into disabled adenoviruses that target lung tissues and have used bronchoscopes to deliver them to the lungs. the procedure worked well in animal studies; however, clinical trials in humans were not an unmitigated success. because the cells lining the lungs are continuously being replaced, the effect is not permanent and the treatment must be repeated. studies are underway to develop gene therapy techniques to replace other faulty genes: for example, to replace the genes responsible for factor viii and factor ix production, whose malfunctioning causes hemophilia a and b respectively, and to alleviate the effects of the faulty gene in dopamine production that results in parkinson's disease. apart from technical challenges, such a radical therapy also engenders ethical debate. many persons who voice concerns about somatic-cell gene therapy use a "slippery slope" argument. it sounds good in theory, but where does one draw the line?
there are many issues yet to be resolved in this field of thorny ethics "good" and "bad" uses of the gene modification, difficulty of following patients in long-term clinical research and such. many gene therapy candidates are children who are too young to understand the ramifications of this treatment: conflict of interest -pits individuals' reproductive liberties and privacy interests against the interests of insurance companies or society. one issue that is unlikely to ever gain acceptance is germline therapy, the removal of deleterious genes from the population. issues of justice and resource allocation also have been raised: in a time of strain on our health care system, can we afford such expensive therapy? who should receive gene therapy? if it is made available only to those who can afford it, then a number of civil rights groups claim that the distribution of desirable biological traits among different socioeconomic and ethnic groups would become badly skewed adding a new and disturbing layer of discriminatory behavior. indeed a major setback occurred before the end of the decade in 1999. jesse gelsinger was the first person to die from gene therapy, on september 17, 1999, and his death created another unprecedented situation when his family sued not only the research team involved in the experiment (u penn), the company genovo inc., but also the ethicist who offered moral advice on the controversial project. this inclusion of the ethicist as a defendant alongside the scientists and school was a surprising legal move that puts this specialty on notice, as will no doubt be the case with other evolving technologies such as stem cells and therapeutic cloning, that its members could be vulnerable to litigation over the philosophical guidance they provide to researchers. 
the penn group principal investigator james wilson approached ethicist arthur caplan about their plans to test the safety of a genetically engineered virus on babies with a deadly form of the liver disorder, ornithine transcarbamylase deficiency. the disorder allows poisonous levels of ammonia to build up in the blood system. caplan steered the researchers away from sick infants, arguing that desperate parents could not provide true informed consent. he said it would be better to experiment on adults with a less lethal form of the disease who were relatively healthy. gelsinger fell into that category. although he had suffered serious bouts of ammonia buildup, he was doing well on a special drug and diet regimen. the decision to use relatively healthy adults was controversial because risky, unproven experimental protocols generally use very ill people who have exhausted more traditional treatments, so have little to lose. in this case, the virus used to deliver the genes was known to cause liver damage, so some scientists were concerned it might trigger an ammonia crisis in the adults. wilson underestimated the risk of the experiment, omitted the disclosure about possible liver damage in earlier volunteers in the experiment and failed to mention the deaths of monkeys given a similar treatment during pre-clinical studies. a food and drug administration investigation after gelsinger's death found numerous regulatory violations by wilson's team, including the failure to stop the experiment and inform the fda after four successive volunteers suffered serious liver damage prior to the teen's treatment. in addition, the fda said gelsinger did not qualify for the experiment, because his blood ammonia levels were too high just before he underwent the infusion of genetic material. the fda suspended all human gene experiments by wilson and the university of penn subsequently restricting him solely to animal studies. 
a follow-up fda investigation subsequently alleged he improperly tested the experimental treatment on animals. financial conflicts of interest also surrounded james wilson, who stood to personally profit from the experiment through genovo, his biotechnology company. the lawsuit was settled out of court for undisclosed terms in november 2000. the fda also suspended gene therapy trials at st. elizabeth's medical center in boston, a major teaching affiliate of tufts university school of medicine, which sought to use gene therapy to reverse heart disease, because scientists there failed to follow protocols and may have contributed to at least one patient death. in addition, the fda temporarily suspended two liver cancer studies sponsored by the schering-plough corporation because of technical similarities to the university of pennsylvania study. some research groups voluntarily suspended gene therapy studies, including two experiments sponsored by the cystic fibrosis foundation and studies at beth israel deaconess medical center in boston aimed at hemophilia. the scientists paused to make sure they learned from the mistakes. the nineties also saw the development of another "high-throughput" breakthrough, a derivative of the other high tech revolution, namely dna chips. in 1991 biochips were developed for commercial use under the guidance of affymetrix. dna chips or microarrays represent a "massively parallel" genomic technology. they facilitate high throughput analysis of thousands of genes simultaneously, and are thus potentially very powerful tools for gaining insight into the complexities of higher organisms including analysis of gene expression, detecting genetic variation, making new gene discoveries, fingerprinting strains and developing new diagnostic tools. these technologies permit scientists to conduct large scale surveys of gene expression in organisms, thus adding to our knowledge of how they develop over time or respond to various environmental stimuli.
these techniques are especially useful in gaining an integrated view of how multiple genes are expressed in a coordinated manner. these dna chips have broad commercial applications and are now used in many areas of basic and clinical research including the detection of drug resistance mutations in infectious organisms, direct dna sequence comparison of large segments of the human genome, the monitoring of multiple human genes for disease associated mutations, the quantitative and parallel measurement of mrna expression for thousands of human genes, and the physical and genetic mapping of genomes. however the initial technologies, or more accurately the algorithms used to extract information, were far from robust and reproducible. the erstwhile serial entrepreneur, al zaffaroni (the rebel who in 1968 founded alza when syntex ignored his interest in developing new ways to deliver drugs) founded yet another company, affymetrix, under the stewardship of stephen fodor, which was subject to much criticism for providing final extracted data and not allowing access to raw data. as with other personalities of this high-throughput era, seattle-bred steve fodor was also somewhat of a polymath, having contributed to two major technologies, microarrays and combinatorial chemistry; the former has delivered on its promise while the latter, like gene therapy, is still in a somewhat extended gestation. and despite the limitations of being an industrial scientist he has had a rather prolific portfolio of publications. his seminal manuscripts describing this work have been published in all the journals of note, science, nature and pnas, and he was recognized in 1992 by the aaas with the newcomb-cleveland award for an outstanding paper published in science. fodor began his industrial career in yet another zaffaroni firm. in 1989 he was recruited to the affymax research institute in palo alto where he spearheaded the effort to develop high-density arrays of biological compounds.
his initial interest was in the broad area of what came to be called combinatorial chemistry. of the techniques developed, one approach permitted high resolution chemical synthesis in a light-directed, spatially-defined format. in the days before positive selection vectors, a researcher might have screened thousands of clones by hand with an oligonucleotide probe just to find one elusive insert. fodor's (and his successors') dna array technology reverses that approach. instead of screening an array of unknowns with a defined probe -a cloned gene, pcr product, or synthetic oligonucleotide -each position or "probe cell" in the array is occupied by a defined dna fragment, and the array is probed with the unknown sample. fodor used his chemistry and biophysics background to develop very dense arrays of these biomolecules by combining photolithographic methods with traditional chemical techniques. the typical array may contain all possible oligonucleotides of a given length (8-mers, for example) that occur as a "window" which is tracked along a dna sequence. it might contain longer oligonucleotides designed from all the open reading frames identified from a complete genome sequence. or it might contain cdnas -of known or unknown sequence -or pcr products. of course it is one thing to produce data; it is quite another to extract information from it in a meaningful manner. fodor's group also developed techniques to read these arrays, employing fluorescent labeling methods and confocal laser scanning to measure each individual binding event on the surface of the chip with extraordinary sensitivity and precision. this general platform of microarray based analysis coupled to confocal laser scanning has become the standard in industry and academia for large-scale genomics studies. in 1993, fodor co-founded affymetrix where the chip technology has been used to synthesize many varieties of high density oligonucleotide arrays containing hundreds of thousands of dna probes.
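the "window" idea described above can be illustrated with a short python sketch. this is only an illustration under stated assumptions, not the procedure used by fodor's group or affymetrix: the function names (all_kmers, windows, occupied_probe_cells) are hypothetical, and a probe is matched to the identical k-mer rather than to its hybridizing complement. it shows that an array of every possible 8-mer needs 4^8 = 65,536 probe cells, and how a sample sequence maps onto those cells as a window of fixed width is tracked along it.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Every possible oligonucleotide of length k (4**k sequences for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def windows(sequence, k):
    """Slide a window of width k along the sequence, one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def occupied_probe_cells(sequence, k):
    """Indices of the array positions whose k-mer occurs in the sequence.

    Simplification (assumption): a probe is matched to the identical k-mer;
    a real chip would hybridize to the complementary strand.
    """
    cell_of = {kmer: i for i, kmer in enumerate(all_kmers(k))}
    return sorted({cell_of[w] for w in windows(sequence, k)})
```

for example, windows("ACGTAC", 4) yields the three overlapping 4-mers of the sequence, and len(all_kmers(8)) gives the number of probe cells a complete 8-mer array would need.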
in 2001, steve fodor founded perlegen, inc., a new venture that applied the chip technology towards uncovering the basic patterns of human diversity. his company's stated goals are to analyze more than one million genetic variations in clinical trial participants to explain and predict the efficacy and adverse effect profiles of prescription drugs. in addition, perlegen also applies this expertise to discovering genetic variants associated with disease in order to pave the way for new therapeutics and diagnostics. fodor's former company diversified into plant applications by developing a chip of the archetypal model of plant systems arabidopsis and supplied pioneer hi bred with custom dna chips for monitoring maize gene expression. they (affymetrix) have established programs where academic scientists can use company facilities at a reduced price and set up 'user centers' at selected universities. a related but less complex technology called 'spotted' dna chips involves precisely spotting very small droplets of genomic or cdna clones or pcr samples on a microscope slide. the process uses a robotic device with a print head bearing fine "repeatograph" tips that work like fountain pens to draw up dna samples from a 96-well plate and spot tiny amounts on a slide. up to 10,000 individual clones can be spotted in a dense array within one square centimeter on a glass slide. after hybridization with a fluorescent target mrna, signals are detected by a custom scanner. this is the basis of the systems used by molecular dynamics and incyte (who acquired this technology when it took over synteni). in 1997, incyte was looking to gather more data for its library and perform experiments for corporate subscribers. the company considered buying affymetrix genechips but opted instead to purchase the smaller synteni, which had sprung out of pat brown's stanford array effort. synteni's contact printing technology resulted in dense -and cheaper -arrays. 
though incyte used the chips only internally, affymetrix sued, claiming synteni/incyte was infringing on its chip density patents. the suit argued that dense biochips -regardless of whether they use photolithography -cannot be made without a license from affymetrix! and in a litigious conga line typical of this hi-tech era, incyte countersued and for good measure also filed against genetic database competitor gene logic for infringing incyte's patents on database building. meanwhile, hyseq sued affymetrix, claiming infringement of nucleotide hybridization patents obtained by its cso. affymetrix, in turn, filed a countersuit, claiming hyseq infringed the spotted array patents. hyseq then reached back and found an additional hybridization patent it claimed that affymetrix had infringed. and so on into the next millennium! in part to avoid all of this, another california company, nanogen, inc., took a different approach to single nucleotide polymorphism discrimination technology. in an article in the april 2000 edition of nature biotechnology, entitled "single nucleotide polymorphic discrimination by an electronic dot blot assay on semiconductor microchips," nanogen describes the use of microchips to identify variants of the mannose binding protein gene that differ from one another by only a single dna base. the mannose binding protein (mbp) is a key component of the innate immune system in children who have not yet developed immunity to a variety of pathogens. to date, four distinct variants (alleles) of this gene have been identified, all differing by only a single nucleotide of dna. mbp was selected for this study because of its potential clinical relevance and its genetic complexity. the samples were assembled at the nci laboratory in conjunction with the national institutes of health and transferred to nanogen for analysis. however, from a high throughput perspective there is a question mark over microarrays.
mark benjamin, senior director of business development at rosetta inpharmatics (kirkland, wa), is skeptical about the long-term prospects for standard dna arrays in high-throughput screening, as the first steps require exposing cells and then isolating rna, which is something that is very hard to do in a high-throughput format. another drawback is that most of the useful targets are likely to be unknown (particularly in the agricultural sciences where genome sequencing is still in its infancy), and dna arrays that are currently available test only for previously sequenced genes. indeed, some argue that current dna arrays may not be sufficiently sensitive to detect the low expression levels of genes encoding targets of particular interest. and the added complication of the companies' reluctance to provide "raw data" means that derived data sets may be created with less than optimum algorithms, thereby irretrievably losing potentially valuable information from the starting material. reverse engineering is a possible approach, but this is laborious and time consuming and, being prohibited by many contracts, may arouse the interest of the ever-vigilant corporate lawyers. over the course of the nineties, outgrowths of functional genomics have been termed proteomics and metabolomics, which are the global studies of gene expression at the protein and metabolite levels respectively. the study of the integration of information flow within an organism is emerging as the field of systems biology. in the area of proteomics, the methods for global analysis of protein profiles and cataloging protein-protein interactions on a genome-wide scale are technically more difficult but improving rapidly, especially for microbes. these approaches generate vast amounts of quantitative data. the amount of expression data becoming available in the public and private sectors is already increasing exponentially.
gene and protein expression data rapidly dwarfed the dna sequence data and is considerably more difficult to manage and exploit. in microbes, the small sizes of the genomes and the ease of handling microbial cultures, will enable high throughput, targeted deletion of every gene in a genome, individually and in combinations. this is already available on a moderate throughput scale in model microbes such as e. coli and yeast. combining targeted gene deletions and modifications with genome-wide assay of mrna and protein levels will enable intricate inter-dependencies among genes to be unraveled. simultaneous measurement of many metabolites, particularly in microbes, is beginning to allow the comprehensive modeling and regulation of fluxes through interdependent pathways. metabolomics can be defined as the quantitative measurement of all low molecular weight metabolites in an organism's cells at a specified time under specific environmental conditions. combining information from metabolomics, proteomics and genomics will help us to obtain an integrated understanding of cell biology. the next hierarchical level of phenotype considers how the proteome within and among cells cooperates to produce the biochemistry and physiology of individual cells and organisms. several authors have tentatively offered "physiomics" as a descriptor for this approach. the final hierarchical levels of phenotype include anatomy and function for cells and whole organisms. the term "phenomics" has been applied to this level of study and unquestionably the more well known omics namely economics, has application across all those fields. and, coming slightly out of left field this time, the spectre of eugenics needless to say was raised in the omics era. 
in the year 1992 american and british scientists unveiled a technique which has come to be known as pre-implantation genetic diagnosis (pid) for testing embryos in vitro for genetic abnormalities such as cystic fibrosis, hemophilia, and down's syndrome (wald, 1992) . this might be seen by most as a step forward, but it led ethicist david s. king (1999) to decry pid as a technology that could exacerbate the eugenic features of prenatal testing and make possible an expanded form of free-market eugenics. he further argues that due to social pressures and eugenic attitudes held by clinical geneticists in most countries, it results in eugenic outcomes even though no state coercion is involved and that, as abortion is not involved, and multiple embryos are available, pid is radically more effective as a tool of genetic selection. the first regulatory approval of a recombinant dna technology in the u.s. food supply was not a plant but an industrial enzyme that has become the hallmark of food biotechnology success. enzymes were important agents in food production long before modern biotechnology was developed. they were used, for instance, in the clotting of milk to prepare cheese, the production of bread and the production of alcoholic beverages. nowadays, enzymes are indispensable to modern food processing technology and have a great variety of functions. they are used in almost all areas of food production including grain processing, milk products, beer, juices, wine, sugar and meat. chymosin, known also as rennin, is a proteolytic enzyme whose role in digestion is to curdle or coagulate milk in the stomach, efficiently converting liquid milk to a semisolid like cottage cheese, allowing it to be retained for longer periods in a neonate's stomach. the dairy industry takes advantage of this property to conduct the first step in cheese production. chy-max™, an artificially produced form of the chymosin enzyme for cheese-making, was approved in 1990. 
in some instances they replace less acceptable "older" technology, for example the enzyme chymosin. unlike crops, industrial enzymes have had a relatively easy passage to acceptance for a number of reasons. as noted, they are part of the processing system and theoretically do not appear in the final product. today about 90% of the hard cheese in the us and uk is made using chymosin from genetically modified microbes. it is easier to purify, more active (95% as compared to 5%) and less expensive to produce (microbes are more prolific, more productive and cheaper to keep than calves). like all enzymes it is required only in very small quantities, and because it is a relatively unstable protein it breaks down as the cheese matures. indeed, if the enzyme remained active for too long it would adversely affect the development of the cheese, as it would degrade the milk proteins to too great a degree. such enzymes have gained the support of vegetarian organizations and of some religious authorities. for plants the nineties was the era of the first widespread commercialization of what came to be known, in often deprecating and literally inaccurate terms, as gmos (genetically modified organisms). when the nineties dawned, dicotyledonous plants were relatively easily transformed with agrobacterium tumefaciens, but many economically important plants, including the cereals, remained inaccessible for genetic manipulation because of a lack of effective transformation techniques. in 1990 this changed with a technology that overcame this limitation. michael fromm, a molecular biologist at the plant gene expression center, reported the stable transformation of corn using a high-speed gene gun. the method, known as biolistics, uses a "particle gun" to shoot metal particles coated with dna into cells. initially a gunpowder charge, subsequently replaced by helium gas, was used to accelerate the particles in the gun.
there is minimal disruption of tissue and the success rate has been extremely high for applications in several plant species. the technology rights are now owned by dupont. in 1990 some of the first field trials of the crops that would dominate the second half of the nineties began, including bt corn (with the bacillus thuringiensis cry protein discussed in chapter three). in 1992 the fda declared that genetically engineered foods are "not inherently dangerous" and do not require special regulation. since 1992, researchers have pinpointed and cloned several of the genes that make selected plants resistant to certain bacterial and fungal infections; some of these genes have been successfully inserted into crop plants that lack them. many more infection-resistant crops are expected in the near future, as scientists find more plant genes in nature that make plants resistant to pests. plant genes, however, are just a portion of the arsenal; microorganisms other than bt also are being mined for genes that could help plants fend off invaders that cause crop damage. the major milestone of the decade in crop biotechnology was approval of the first bioengineered crop plant in 1994. it represented a double first: not just the first approved food crop but also the first commercial validation of a technology which was to be surpassed later in the decade. that technology, antisense technology, works because nucleic acids have a natural affinity for each other. when a gene coding for the target in the genome is introduced in the opposite orientation, the reverse rna strand anneals and effectively blocks expression of the enzyme. this technology was patented by calgene for plant applications and was the technology behind the famous flavr savr tomatoes. the first success for antisense in medicine was in 1998 when the u.s.
food and drug administration gave the go-ahead to the cytomegalovirus (cmv) inhibitor fomivirsen, a phosphorothioate antiviral for the aids-related condition cmv retinitis, making it the first drug belonging to isis, and the first antisense drug ever, to be approved. another technology, although not apparent at the time, was behind the second approval and also the first, and to date only, successful commercial biotech application in a tree fruit. the former was a virus-resistant squash, the latter the papaya ringspot virus-resistant papaya. both owed their existence as much to historic experience as modern technology. genetically engineered virus-resistant strains of squash and cantaloupe, for example, would never have made it to farmers' fields if plant breeders in the 1930s had not noticed that plants infected with a mild strain of a virus do not succumb to more destructive strains of the same virus. that finding led plant pathologist roger beachy, then at washington university in saint louis, to wonder exactly how such "cross-protection" worked -did part of the virus prompt it? in collaboration with researchers at monsanto, beachy used an a. tumefaciens vector to insert into tomato plants a gene that produces one of the proteins that makes up the protein coat of the tobacco mosaic virus. he then inoculated these plants with the virus and was pleased to discover, as reported in 1986, that the vast majority of plants did not succumb to the virus. eight years later, in 1994, virus-resistant squash seeds created with beachy's method reached the market, to be followed soon by bioengineered virus-resistant seeds for cantaloupes, potatoes, and papayas. (breeders had already created virus-resistant tomato seeds by using traditional techniques.) and the method of protection still remained a mystery when the first approvals were given in 1994 and 1996. gene silencing was perceived initially as an unpredictable and inconvenient side effect of introducing transgenes into plants.
it now seems that it is the consequence of accidentally triggering the plant's adaptive defense mechanism against viruses and transposable elements. this recently discovered mechanism, although mechanistically different, has a number of parallels with the immune system of mammals. how this system worked was not elucidated until later in the decade by a researcher who was seeking a very different holy grail - the black rose! rick jorgensen, at that time at dna plant technologies in oakland, ca and subsequently of the university of california davis, attempted to overexpress the chalcone synthase gene by introducing a modified copy under a strong promoter. surprisingly, he obtained white flowers, and many strange variegated purple and white variations in between. this was the first demonstration of what has come to be known as post-transcriptional gene silencing (ptgs). while initially it was considered a strange phenomenon limited to petunias and a few other plant species, it is now one of the hottest topics in molecular biology. rna interference (rnai) in animals and basal eukaryotes, quelling in fungi, and ptgs in plants are examples of a broad family of phenomena collectively called rna silencing (hannon 2002; plasterk 2002). in addition to its occurrence in these species, it has roles in viral defense (as demonstrated by beachy) and transposon silencing mechanisms, among other things. perhaps most exciting, however, is the emerging use of ptgs and, in particular, rnai - ptgs initiated by the introduction of double-stranded rna (dsrna) - as a tool to knock out expression of specific genes in a variety of organisms. nineteen ninety-one also heralded yet another first. the february 1, 1991 issue of science reported the patenting of "molecular scissors": the nobel-prize winning discovery of enzymatic rna, or "ribozymes," by thomas cech of the university of colorado. it was noted that the u.s.
patent and trademark office had awarded an "unusually broad" patent for ribozymes. the patent is u.s. patent no. 4,987,071, claim 1 of which reads as follows: "an enzymatic rna molecule not naturally occurring in nature having an endonuclease activity independent of any protein, said endonuclease activity being specific for a nucleotide sequence defining a cleavage site comprising single-stranded rna in a separate rna molecule, and causing cleavage at said cleavage site by a transesterification reaction." although enzymes made of protein are the dominant form of biocatalyst in modern cells, there are at least eight natural rna enzymes, or ribozymes, that catalyze fundamental biological processes. one of these was yet another discovery by plant virologists: the hairpin ribozyme, discovered by george bruening at uc davis. the self-cleaving structure was originally called a paperclip by the bruening laboratory, which discovered the reactions. as mentioned in chapter 3, it is believed that these ribozymes might be the remnants of an ancient form of life that was guided entirely by rna. because a ribozyme is a catalytic rna molecule capable of cleaving itself and other target rnas, it can be useful as a control system for turning off genes or targeting viruses. the possibility of designing ribozymes to cleave any specific target rna has rendered them valuable tools in both basic research and therapeutic applications. in the therapeutics area, they have been exploited to target viral rnas in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders. most notably, several ribozyme gene therapy protocols for hiv patients are already in phase 1 trials. more recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. however, targeting ribozymes to the cellular compartment containing their target rnas has proved a challenge.
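the idea of directing a ribozyme at a chosen target rna can be illustrated with a small sketch. it rests on assumptions that are not in the text above: that cleavage occurs immediately 3' of a "guc" triplet (a rule commonly cited for hammerhead-type ribozymes, not for every ribozyme class), and that the recognition arms are simply the reverse complements of the sequences flanking that triplet. the function names, the arm length of six nucleotides and the toy target sequence are all hypothetical, and real designs must also consider target secondary structure and which triplet nucleotides pair with the ribozyme.

```python
# Watson-Crick pairing for RNA, lowercase to match the text.
COMPLEMENT = {"a": "u", "u": "a", "g": "c", "c": "g"}


def reverse_complement(rna):
    """Return the antisense (reverse-complement) strand of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.lower()))


def candidate_sites(target, triplet="guc", arm=6):
    """List candidate cleavage sites with simplified recognition arms.

    Assumption: cleavage happens right after `triplet`, and each arm is the
    reverse complement of `arm` nucleotides flanking the triplet.
    Sites too close to either end of the target are skipped.
    """
    target = target.lower()
    sites = []
    pos = target.find(triplet)
    while pos != -1:
        cut = pos + len(triplet)
        upstream = target[pos - arm:pos] if pos >= arm else ""
        downstream = target[cut:cut + arm]
        if len(upstream) == arm and len(downstream) == arm:
            sites.append({
                "cut_after": cut,  # number of target nucleotides 5' of the cut
                "arm_pairing_upstream": reverse_complement(upstream),
                "arm_pairing_downstream": reverse_complement(downstream),
            })
        pos = target.find(triplet, pos + 1)
    return sites


if __name__ == "__main__":
    toy_target = "aaaaaa" + "guc" + "cccccc"  # hypothetical sequence
    for site in candidate_sites(toy_target):
        print(site)
```

the same reverse-complement step is what underlies antisense approaches in general: the introduced strand is chosen so that it base-pairs with the target rna.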
at the other bookend of the decade, in 2000, samarsky et al. reported that a family of small rnas in the nucleolus (snornas) can readily transport ribozymes into this subcellular organelle. in addition to the already extensive panoply of rna entities, yet another has potential for mischief. viroids are small, single-stranded, circular rnas containing 246-463 nucleotides arranged in a rod-like secondary structure and are the smallest pathogenic agents yet described. the smallest viroid-like rna characterized to date is the satellite rna of rice yellow mottle sobemovirus (rymv), at 220 nucleotides. in comparison, the genome of the smallest known viruses capable of causing an infection by themselves, the single-stranded circular dna of circoviruses, is around 2 kilobases in size. the first viroid to be identified was the potato spindle tuber viroid (pstvd). some 33 species have been identified to date. unlike the many satellite or defective interfering rnas associated with plant viruses, viroids replicate autonomously on inoculation of a susceptible host. the absence of a protein capsid and of detectable messenger rna activity implies that the information necessary for replication and pathogenesis resides within the unusual structure of the viroid genome. the replication mechanism actually involves interaction with rna polymerase ii, an enzyme normally associated with synthesis of messenger rna, and "rolling circle" synthesis of new rna. some viroids have ribozyme activity, which allows self-cleavage and ligation of unit-size genomes from larger replication intermediates. it has been proposed that viroids are "escaped introns". viroids are usually transmitted by seed or pollen. infected plants can show distorted growth. from its earliest years, biotechnology attracted interest outside scientific circles. initially the main focus of public interest was on the safety of recombinant dna technology, and on the possible risks of creating uncontrollable and harmful novel organisms (berg, 1975).
the debate on the deliberate release of genetically modified organisms, and on consumer products containing or comprising them, followed some years later (nas, 1987). it is interesting to note that within the broad ethical tableau of potential issues within the science and products of biotechnology, the seemingly innocuous field of plant modification has been one of the major players of the 1990s. the success of agricultural biotechnology is heavily dependent on its acceptance by the public, and the regulatory framework in which the industry operates is also influenced by public opinion. as the focus for molecular biology research shifted from the basic pursuit of knowledge to the pursuit of lucrative applications, once again, as in the previous two decades, the specter of risk arose as the potential of new products and applications had to be evaluated outside the confines of a laboratory. however, the specter now became far more global as the implications of commercial applications brought not just worker safety into the loop but also the environment, agricultural and industrial products and the safety and well-being of all living things. beyond "deliberate" release, the rac guidelines were not designed to address these issues, so the matter moved into the realm of the federal agencies whose regulatory authority could be interpreted to cover biotechnology issues. this adaptation of oversight is very much a dynamic process as the various agencies wrestle with the task of applying existing regulations and developing new ones for oversight of this technology in transition. as the decade progressed, focus shifted from basic biotic stress resistance to more complex modifications. the next generation of plants will focus on value-added traits in which valuable genes and metabolites will be identified and isolated, with some of the latter compounds being produced in mass quantities for niche markets.
two of the more promising markets are nutraceuticals, or so-called "functional foods", and plants developed as bioreactors for the production of valuable proteins and compounds, a field known as plant molecular farming. developing plants with improved quality traits involves overcoming a variety of technical challenges inherent to metabolic engineering programs. both traditional plant breeding and biotechnology techniques are needed to produce plants carrying the desired quality traits. continuing improvements in molecular and genomic technologies are contributing to the acceleration of product development in this space. by the end of the decade, in 1999, applying nutritional genomics, della penna (1999) isolated a gene that converts the lower activity precursors to the highest activity vitamin e compound, alpha-tocopherol. with this technology, the vitamin e content of arabidopsis seed oil has been increased nearly 10-fold and progress has been made to move the technology to crops such as soybean, maize, and canola. this has also been done for folates in rice. omega-3 fatty acids play a significant role in human health. eicosapentaenoic acid (epa) and docosahexaenoic acid (dha), which are present in the retina of the eye and cerebral cortex of the brain, respectively, are some of the best documented from a clinical perspective. it is believed that epa and dha play an important role in the regulation of inflammatory immune reactions and blood pressure, treatment of conditions such as cardiovascular disease and cystic fibrosis, brain development in utero, and, in early postnatal life, the development of cognitive function. they are mainly found in fish oil and the supply is limited. by the end of the decade ursin (2000) had succeeded in engineering canola to produce these fatty acids. from a global perspective another value-added development had far greater impact both technologically and socio-economically.
a team led by ingo potrykus (1999) engineered rice to produce pro-vitamin a, which is an essential micronutrient. widespread dietary deficiency of this vitamin in rice-eating asian countries, which predisposes children to diseases such as blindness and measles, has tragic consequences. improved vitamin a nutrition would alleviate serious health problems and, according to unicef, could also prevent up to two million infant deaths due to vitamin a deficiency. adoption of the next stage of gm crops may proceed more slowly, as the market confronts issues of how to determine price, share value, and adjust marketing and handling to accommodate specialized end-use characteristics. furthermore, competition from existing products will not evaporate. challenges that have accompanied gm crops with improved agronomic traits, such as the stalled regulatory processes in europe, will also affect adoption of nutritionally improved gm products. beyond all of this, credible scientific research is still needed to confirm the benefits of any particular food or component. for functional foods to deliver their potential public health benefits, consumers must have a clear understanding of, and a strong confidence level in, the scientific criteria that are used to document health effects and claims. because these decisions will require an understanding of plant biochemistry, mammalian physiology, and food chemistry, strong interdisciplinary collaborations will be needed among plant scientists, nutritionists, and food scientists to ensure a safe and healthful food supply. in addition to being a source of nutrition, plants have been a valuable wellspring of therapeutics for centuries. 
during the nineties, however, intensive research has focused on expanding this source through rdna biotechnology and essentially using plants and animals as living factories for the commercial production of vaccines, therapeutics and other valuable products such as industrial enzymes and biosynthetic feedstocks. possibilities in the medical field include a wide variety of compounds, ranging from edible vaccine antigens against hepatitis b and norwalk viruses (arntzen, 1997) and pseudomonas aeruginosa and staphylococcus aureus to vaccines against cancer and diabetes, enzymes, hormones, cytokines, interleukins, plasma proteins, and human alpha-1-antitrypsin. thus, plant cells are capable of expressing a large variety of recombinant proteins and protein complexes. therapeutics produced in this way are termed plant made pharmaceuticals (pmps), and non-therapeutics are termed plant made industrial products (pmips) (newell-mcgloughlin, 2006). the first reported results of successful human clinical trials with transgenic plant-derived pharmaceuticals were published in 1998. they were an edible vaccine against e. coli-induced diarrhea and a secretory monoclonal antibody directed against streptococcus mutans, for preventative immunotherapy to reduce incidence of dental caries. haq et al. (1995) reported the expression in potato plants of a vaccine against e. coli enterotoxin (etec) that provided an immune response against the toxin in mice. human clinical trials suggest that oral vaccination against either of the closely related enterotoxins of vibrio cholerae and e. coli induces production of antibodies that can neutralize the respective toxins by preventing them from binding to gut cells. similar results were found for norwalk virus oral vaccines in potatoes. for developing countries, the intention is to deliver them in bananas or tomatoes (newell-mcgloughlin, 2006).
plants are also faster, cheaper, more convenient and more efficient than the principal eukaryotic production system, namely chinese hamster ovary (cho) cells, for the production of pharmaceuticals. hundreds of acres of protein-containing seeds could inexpensively double the production of a cho bioreactor factory. in addition, proteins can be expressed at the highest levels in the harvestable seed, and plant-made proteins and enzymes formulated in seeds have been found to be extremely stable, reducing storage and shipping costs. pharming may also enable research on drugs that cannot currently be produced. for example, croptech in blacksburg, va., is investigating a protein that seems to be a very effective anticancer agent. the problem is that this protein is difficult to produce in mammalian cell culture systems as it inhibits cell growth. this should not be a problem in plants. furthermore, production size is flexible and easily adjustable to the needs of changing markets. making pharmaceuticals from plants is also a sustainable process, because the plants and crops used as raw materials are renewable. the system also has the potential to address problems associated with provision of vaccines to people in developing countries. products from these alternative sources do not require a so-called "cold chain" for refrigerated transport and storage. those being developed for oral delivery obviate the need for needles and aseptic conditions, which often are a problem in those areas. apart from those specific applications where the plant system is optimum, there are many other advantages to using plant production. many new pharmaceuticals based on recombinant proteins will receive regulatory approval from the united states food and drug administration (fda) in the next few years. as these therapeutics make their way through clinical trials and evaluation, the pharmaceutical industry faces a production capacity challenge.
pharmaceutical discovery companies are exploring plant-based production to overcome capacity limitations, enable production of complex therapeutic proteins, and fully realize the commercial potential of their biopharmaceuticals (newell-mcgloughlin, 2006). nineteen ninety also marked a major milestone in the animal biotech world when herman made his appearance on the world's stage. since palmiter's mouse, transgenic technology has been applied to several species including agricultural species such as sheep, cattle, goats, pigs, rabbits, poultry, and fish. herman was the first transgenic bovine, created by genpharm international, inc., in a laboratory in the netherlands at the early embryo stage. scientists microinjected recently fertilized eggs with the gene coding for human lactoferrin. the scientists then cultured the cells in vitro to the embryo stage and transferred them to recipient cattle. lactoferrin, an iron-containing anti-bacterial protein, is essential for infant growth. since cow's milk doesn't contain lactoferrin, infants must be fed from other sources that are rich in iron - formula or mother's milk (newell-mcgloughlin, 2001). as herman was male, he could not provide the source himself; that would require the production of daughters, which was not necessarily a straightforward process. the dutch parliament's permission was required. in 1992 it finally approved a measure that permitted the world's first genetically engineered bull to reproduce. the leiden-based gene pharming proceeded to artificially inseminate 60 cows with herman's sperm, with a promise that the protein, lactoferrin, would be the first in a new generation of inexpensive, high-tech drugs derived from cows' milk to treat complex diseases like aids and cancer. herman became the father of at least eight female calves in 1994, and each one inherited the gene for lactoferrin production.
while their birth was initially greeted as a scientific advancement that could have far-reaching effects for children in developing nations, the levels of expression were too low to be commercially viable. by 2002, herman, who likes to listen to rap music to relax, had sired 55 calves and outlived them all. his offspring were all killed and destroyed after the end of the experiment, in line with dutch health legislation. herman was also slated for the abattoir, but the dutch public - proud of making history with herman - rose up in protest, especially after a television program screened footage showing the amiable bull licking a kitten. herman won a bill of clemency from parliament. however, instead of retirement on a comfortable bed of straw, listening to rap music, herman was pressed into service again. he now stars at a permanent biotech exhibit in naturalis, a natural history museum in the dutch city of leiden. after his death, he will be stuffed and remain in the museum in perpetuity (a fate similar to what awaited an even more famous mammalian first born later in the decade). the applications for transgenic animal research fall broadly into two distinct areas, namely medical and agricultural applications. the recent focus on developing animals as bioreactors to produce valuable proteins in their milk can be catalogued under both areas. underlying each of these, of course, is a more fundamental application, namely the use of these techniques as tools to ascertain the molecular and physiological bases of gene expression and animal development. this understanding can then lead to the creation of techniques to modify development pathways. in 1992 a european decision with rather more far-reaching implications than herman's sex life was made. the first european patent on a transgenic animal was issued for a transgenic mouse sensitive to carcinogens - harvard's "oncomouse".
the oncomouse patent application was refused in europe in 1989 due primarily to an established ban on animal patenting. the application was revised to make narrower claims, and the patent was granted in 1992. this has since been repeatedly challenged, primarily by groups objecting to the judgement that benefits to humans outweigh the suffering of the animal. currently, the patent applicant is awaiting protestors' responses to a series of possible modifications to the application. predictions are that agreement will not likely be forthcoming and that the legal wrangling will continue into the future. bringing animals into the field of controversy starting to swirl around gmos and preceding the latter's commercialization, was the approval by the fda of bovine somatotropin (bst) for increased milk production in dairy cows. the fda's center for veterinary medicine (cvm) regulates the manufacture and distribution of food additives and drugs that will be given to animals. biotechnology products are a growing proportion of the animal health products and feed components regulated by the cvm. the center requires that food products from treated animals must be shown to be safe for human consumption. applicants must show that the drug is effective and safe for the animal and that its manufacture will not affect the environment. they must also conduct geographically dispersed clinical trials under an investigational new animal drug application with the fda through which the agency controls the use of the unapproved compound in food animals. unlike within the eu, possible economic and social issues cannot be taken into consideration by the fda in the premarket drug approval process. under these considerations the safety and efficacy of rbst was determined. 
it was also determined that special labeling for milk derived from cows that had been treated with rbst is not required under fda food labeling laws because the use of rbst does not affect the quality or the composition of the milk. work with fish proceeded apace throughout the decade. gene transfer techniques have been applied to a large number of aquatic organisms, both vertebrates and invertebrates. gene transfer experiments have targeted a wide variety of applications, including the study of gene structure and function, aquaculture production, and use in fisheries management programs. because fish have high fecundity, large eggs, and do not require reimplantation of embryos, transgenic fish prove attractive model systems in which to study gene expression. transgenic zebrafish have found utility in studies of embryogenesis, with expression of transgenes marking cell lineages or providing the basis for study of promoter or structural gene function. although not as widely used as zebrafish, transgenic medaka and goldfish have been used for studies of promoter function. this body of research indicates that transgenic fish provide useful models of gene expression, reliably modeling that in "higher" vertebrates. perhaps the largest number of gene transfer experiments address the goal of genetic improvement for aquaculture production purposes. the principal area of research has focused on growth performance, and initial transgenic growth hormone (gh) fish models have demonstrated accelerated and beneficial phenotypes. dna microinjection methods have propelled the many studies reported and have been most effective due to the relative ease of working with fish embryos. bob devlin's group in vancouver has demonstrated extraordinary growth rates in coho salmon which were transformed with a growth hormone gene from sockeye salmon. the transgenics achieve up to eleven times the size of their littermates within six months, reaching maturity in about half the time.
interestingly, this dramatic effect is only observed in feeding pens, where the transgenics' ferocious appetites demand constant feeding. if the fish are left to their own devices and must forage for themselves, they appear to be out-competed by their smarter siblings. however, most studies, such as those involving transgenic atlantic salmon and channel catfish, report growth rate enhancement on the order of 30-60%. in addition to the species mentioned, gh genes also have been transferred into striped bass, tilapia, rainbow trout, gilthead sea bream, common carp, bluntnose bream, loach, and other fishes. shellfish also are subject to gene transfer toward the goal of intensifying aquaculture production. growth of abalone expressing an introduced gh gene is being evaluated; accelerated growth would prove a boon for culture of the slow-growing mollusk. a marker gene was introduced successfully into giant prawn, demonstrating feasibility of gene transfer in crustaceans, and opening the possibility of work involving genes affecting economically important traits. in the ornamental fish sector of aquaculture, ongoing work addresses the development of fish with unique coloring or patterning. a number of companies have been founded to pursue commercialization of transgenics for aquaculture. as most aquaculture species mature at 2-3 years of age, most transgenic lines are still in development and have yet to be tested for performance under culture conditions. extending earlier research that identified methylfarnesoate (mf) as a juvenile hormone in crustaceans and determined its role in reproduction, researchers at the university of connecticut have developed technology to synchronize shrimp egg production and to increase the number and quality of eggs produced. females injected with mf are stimulated to produce eggs ready for fertilization. the procedure produces 180 percent more eggs than the traditional crude method of removing the eyestalk gland.
this will increase aquaculture efficiency. a number of experiments utilize gene transfer to develop genetic lines of potential utility in fisheries management. transfer of gh genes into northern pike, walleye, and largemouth bass is aimed at improving the growth rate of sport fishes. gene transfer has been posed as an option for reducing losses of rainbow trout to whirling disease, although suitable candidate genes have yet to be identified. richard winn of the university of georgia is developing transgenic killifish and medaka, which carry the bacteriophage phi x 174 as a target for mutation detection, as biomonitors for environmental mutagens. development of transgenic lines for fisheries management applications generally is at an early stage, often at the founder or f1 generation. broad application of transgenic aquatic organisms in aquaculture and fisheries management will depend on showing that particular gmos can be used in the environment both effectively and safely. although our base of knowledge for assessing ecological and genetic safety of aquatic gmos currently is limited, some early studies supported by the usda biotechnology risk assessment program have yielded results. data from outdoor pond-based studies on transgenic catfish reported by rex dunham of auburn university show that transgenic and non-transgenic individuals interbreed freely, that survival and growth of transgenics in unfed ponds were equal to or less than those of non-transgenics, and that predator avoidance is not affected by expression of the transgene. however, unquestionably the seminal event for animal biotech in the nineties was ian wilmut's landmark work using nuclear transfer technology to generate the lambs morag and megan, reported in 1996 (from embryonic cell nuclei), and the truly ground-breaking work of creating dolly from an adult somatic cell nucleus, reported in february 1997 (wilmut, 1997).
wilmut and his colleagues at the roslin institute demonstrated for the first time with the birth of dolly the sheep that the nucleus of an adult somatic cell can be transferred to an enucleated egg to create cloned offspring. it had been assumed for some time that only embryonic cells could be used as the cellular source for nuclear transfer. this assumption was shattered with the birth of dolly. this example of cloning an animal using the nucleus of an adult cell was significant because it demonstrated the ability of egg cell cytoplasm to "reprogram" an adult nucleus. when cells differentiate, that is, develop from primitive embryonic cells to functionally defined adult cells, they lose the ability to express most genes and can only express those genes necessary for the cell's differentiated function. for example, skin cells only express genes necessary for skin function, and brain cells only express genes necessary for brain function. the procedure that produced dolly demonstrated that egg cytoplasm is capable of reprogramming an adult differentiated cell (which is only expressing genes related to the function of that cell type). this reprogramming enables the differentiated cell nucleus to once again express all the genes required for the full embryonic development of the adult animal. since dolly was cloned, similar techniques have been used to clone a veritable zoo of vertebrates including mice, cattle, rabbits, mules, horses, fish, cats and dogs from donor cells obtained from adult animals. these spectacular examples of cloning normal animals from fully differentiated adult cells demonstrate the universality of nuclear reprogramming, although the next decade called some of these assumptions into question. this technology supports the production of genetically identical and genetically modified animals. thus, the successful "cloning" of dolly has captured the imagination of researchers around the world.
this technological breakthrough should play a significant role in the development of new procedures for genetic engineering in a number of mammalian species. it should be noted that nuclear cloning, with nuclei obtained from either mammalian stem cells or differentiated "adult" cells, is an especially important development for transgenic animal research. as the decade reached its end the clones began arriving rapidly with specific advances made by a japanese group who used cumulus cells rather than fibroblasts to clone calves. they found that the percentage of cultured, reconstructed eggs that developed into blastocysts was 49% for cumulus cells and 23% for oviductal cells. these rates are higher than the 12% previously reported for transfer of nuclei from bovine fetal fibroblasts. following on the heels of dolly, polly and molly became the first genetically engineered transgenic sheep produced through nuclear transfer technology. polly and molly were engineered to produce human factor ix (for hemophiliacs) by transfer of nuclei from transfected fetal fibroblasts. until then germline competent transgenics had only been produced in mammalian species, other than mice, using dna microinjection. researchers at the university of massachusetts and advanced cell technology (worcester, ma) teamed up to produce genetically identical calves utilizing a strategy similar to that used to produce transgenic sheep. in contrast to the sheep cloning experiment, the bovine experiment involved the transfer of nuclei from an actively dividing population of cells. previous results from the sheep experiments suggested that induction of quiescence by serum starvation was required to reprogram the donor nuclei for successful nuclear transfer. the current bovine experiments indicate that this step may not be necessary. 
typically about 500 embryos needed to be microinjected to obtain one transgenic cow, whereas nuclear transfer produced three transgenic calves from 276 reconstructed embryos. this efficiency is comparable to the previous sheep research where six transgenic lambs were produced from 425 reconstructed embryos. the ability to select for genetically modified cells in culture prior to nuclear transfer opens up the possibility of applying the powerful gene targeting techniques that have been developed for mice. one of the limitations of using primary cells, however, is their limited lifespan in culture. primary cell cultures such as the fetal fibroblasts can only undergo about 30 population doublings before they senesce. this limited lifespan would preclude the ability to perform multiple rounds of selection. to overcome this problem of cell senescence, these researchers showed that fibroblast lifespan could be prolonged by nuclear transfer. a fetus, which was developed by nuclear transfer from genetically modified cells, could in turn be used to establish a second generation of fetal fibroblasts. these fetal cells would then be capable of undergoing another 30 population doublings, which would provide sufficient time for selection of a second genetic modification. as noted, there is still some uncertainty over whether quiescent cells are required for successful nuclear transfer. induction into quiescence was originally thought to be necessary for successful nuclear reprogramming of the donor nucleus. however, cloned calves have been previously produced using non-quiescent fetal cells. furthermore, transfer of nuclei from sertoli and neuronal cells, which do not normally divide in adults, did not produce a liveborn mouse; whereas nuclei transferred from actively dividing cumulus cells did produce cloned mice. 
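the efficiency comparison and the population-doubling limit quoted above can be checked with a few lines of arithmetic. the sketch below uses only the figures given in the text (one transgenic cow per roughly 500 microinjected embryos, three calves from 276 reconstructed embryos, six lambs from 425, and about 30 population doublings); the function names and the assumption of ideal doubling from a single starting cell are illustrative only.

```python
# Figures quoted in the text: (transgenic animals obtained, embryos handled).
REPORTS = {
    "microinjection, cattle": (1, 500),
    "nuclear transfer, cattle": (3, 276),
    "nuclear transfer, sheep": (6, 425),
}


def efficiency_percent(animals, embryos):
    """Transgenic animals obtained per 100 embryos handled."""
    return 100.0 * animals / embryos


def cells_after(doublings, start_cells=1):
    """Cell count after a number of population doublings.

    Assumes ideal doubling with no cell loss, which real cultures
    do not achieve; this gives an upper bound.
    """
    return start_cells * 2 ** doublings


if __name__ == "__main__":
    for label, (animals, embryos) in REPORTS.items():
        print(f"{label}: {efficiency_percent(animals, embryos):.2f}%")
    print(f"cells after 30 doublings from one cell: {cells_after(30):,}")
```

on these numbers, nuclear transfer in cattle is roughly five times as efficient per embryo as microinjection, and a second round of 30 doublings from a re-derived fetal line restores the same headroom for a further selection step.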
the fetuses used for establishing fetal cell lines in a tufts goat study were generated by mating nontransgenic females to a transgenic male containing a human antithrombin (at) iii transgene. this at transgene directs high level expression of human at into milk of lactating transgenic females. as expected, all three offspring derived from female fetal cells were females. one of these cloned goats was hormonally induced to lactate. this goat secreted 3.7-5.8 grams per liter of at in her milk. this level of at expression was comparable to that detected in the milk of transgenic goats from the same line obtained by natural breeding. the successful secretion of at in milk was a key result because it showed that a cloned animal could still synthesize and secrete a foreign protein at the expected level. it will be interesting to see if all three cloned goats secrete human at at the identical level. if so, then the goal of creating a herd of identical transgenic animals, which secrete identical levels of an important pharmaceutical, would become a reality. no longer would variable production levels exist in subsequent generations due to genetically similar but not identical animals. this homogeneity would greatly aid in the production and processing of a uniform product. as nuclear transfer technology continues to be refined and applied to other species, it may eventually replace microinjection as the method of choice for generating transgenic livestock. nuclear transfer has a number of advantages: 1) nuclear transfer is more efficient than microinjection at producing a transgenic animal, 2) the fate of the integrated foreign dna can be examined prior to production of the transgenic animal, 3) the sex of the transgenic animal can be predetermined, and 4) the problem of mosaicism in first generation transgenic animals can be eliminated. dna microinjection has not been a very efficient mechanism to produce transgenic mammals. 
however, in november 1998, a team of wisconsin researchers reported a nearly 100% efficient method for generating transgenic cattle. the established method of cattle transgenesis involves injecting dna into the pronuclei of a fertilized egg or zygote. in contrast, the wisconsin team injected a replication-defective retroviral vector into the perivitelline space of an unfertilized oocyte. the perivitelline space is the region between the oocyte membrane and the protective coating surrounding the oocyte known as the zona pellucida. in addition to es (embryonic stem) cells, other sources of donor nuclei for nuclear transfer, such as embryonic cell lines, primordial germ cells, or spermatogonia, might be used to produce offspring. the utility of es cells or related methodologies to provide efficient and targeted in vivo genetic manipulations offers the prospect of profoundly useful animal models for biomedical, biological and agricultural applications. the road to such success has been most challenging, but recent developments in this field are extremely encouraging. with the may 1999 announcement of geron buying out ian wilmut's company roslin biomed, they declared it the dawn of a new era in biomedical research. geron's technologies for deriving transplantable cells from human pluripotent stem cells (hpscs) and extending their replicative capacity with telomerase were combined with the roslin institute nuclear transfer technology, the technology that produced dolly the cloned sheep. the goal was to produce transplantable, tissue-matched cells that provide extended therapeutic benefits without triggering immune rejection. such cells could be used to treat numerous major chronic degenerative diseases and conditions such as heart disease, stroke, parkinson's disease, alzheimer's disease, spinal cord injury, diabetes, osteoarthritis, bone marrow failure and burns. the stem cell is a unique and essential cell type found in animals. 
many kinds of stem cells are found in the body, with some more differentiated, or committed, to a particular function than others. in other words, when stem cells divide, some of the progeny mature into cells of a specific type (heart, muscle, blood, or brain cells), while others remain stem cells, ready to repair some of the everyday wear and tear undergone by our bodies. these stem cells are capable of continually reproducing themselves and serve to renew tissue throughout an individual's life. for example, they continually regenerate the lining of the gut, revitalize skin, and produce a whole range of blood cells. although the term "stem cell" commonly is used to refer to the cells within the adult organism that renew tissue (e.g., hematopoietic stem cells, a type of cell found in the blood), the most fundamental and extraordinary of the stem cells are found in the early-stage embryo. these embryonic stem (es) cells, unlike the more differentiated adult stem cells or other cell types, retain the special ability to develop into nearly any cell type. embryonic germ (eg) cells, which originate from the primordial reproductive cells of the developing fetus, have properties similar to es cells. it is the potentially unique versatility of the es and eg cells derived, respectively, from the early-stage embryo and cadaveric fetal tissue that presents such unusual scientific and therapeutic promise. indeed, scientists have long recognized the possibility of using such cells to generate more specialized cells or tissue, which could allow the generation of new cells to be used to treat injuries or diseases, such as alzheimer's disease, parkinson's disease, heart disease, and kidney failure. 
likewise, scientists regard these cells as an important, perhaps essential, means for understanding the earliest stages of human development and as an important tool in the development of life-saving drugs and cell-replacement therapies to treat disorders caused by early cell death or impairment. geron corporation and its collaborators at the university of wisconsin-madison (dr. james a. thomson) and johns hopkins university (dr. john d. gearhart) announced in november 1998 the first successful derivation of hpscs from two sources: (i) human embryonic stem (hes) cells derived from in vitro fertilized blastocysts (thomson 1998) and (ii) human embryonic germ (heg) cells derived from fetal material obtained from medically terminated pregnancies (shamblott et al. 1998). although derived from different sources by different laboratory processes, these two cell types share certain characteristics and are referred to collectively as human pluripotent stem cells (hpscs). because hes cells have been more thoroughly studied, the characteristics of hpscs most closely describe the known properties of hes cells. stem cells represent a tremendous scientific advancement in two ways: first, as a tool to study developmental and cell biology; and second, as the starting point for therapies to develop medications to treat some of the most deadly diseases. the derivation of stem cells is fundamental to scientific research in understanding basic cellular and embryonic development. observing the development of stem cells as they differentiate into a number of cell types will enable scientists to better understand cellular processes and ways to repair cells when they malfunction. it also holds great potential to yield revolutionary treatments by transplanting new tissue to treat heart disease, atherosclerosis, blood disorders, diabetes, parkinson's, alzheimer's, stroke, spinal cord injuries, rheumatoid arthritis, and many other diseases. 
by using stem cells, scientists may be able to grow human skin cells to treat wounds and burns. and, it will aid the understanding of fertility disorders. many patient and scientific organizations recognize the vast potential of stem cell research. another possible therapeutic technique is the generation of "customized" stem cells. a researcher or doctor might need to develop a special cell line that contains the dna of a person living with a disease. by using a technique called "somatic cell nuclear transfer" the researcher can transfer a nucleus from the patient into an enucleated human egg cell. this reformed cell can then be activated to form a blastocyst from which customized stem cell lines can be derived to treat the individual from whom the nucleus was extracted. by using the individual's own dna, the stem cell line would be fully compatible and not be rejected by the person when the stem cells are transferred back to that person for the treatment. preliminary research is occurring on other approaches to produce pluripotent human es cells without the need to use human oocytes. human oocytes may not be available in quantities that would meet the needs of millions of potential patients. however, no peer-reviewed papers have yet appeared from which to judge whether animal oocytes could be used to manufacture "customized" human es cells and whether they can be developed on a realistic timescale. additional approaches under consideration include early experimental studies on the use of cytoplasmic-like media that might allow a viable approach in laboratory cultures. on a much longer timeline, it may be possible to use sophisticated genetic modification techniques to eliminate the major histocompatibility complexes and other cell-surface antigens from foreign cells to prepare master stem cell lines with less likelihood of rejection. 
this could lead to the development of a bank of universal donor cells or multiple types of compatible donor cells of invaluable benefit to treat all patients. however, the human immune system is sensitive to many minor histocompatibility complexes and immunosuppressive therapy carries life-threatening complications. stem cells also show great potential to aid research and development of new drugs and biologics. now, stem cells can serve as a source for normal human differentiated cells to be used for drug screening and testing, drug toxicology studies and to identify new drug targets. the ability to evaluate drug toxicity in human cell lines grown from stem cells could significantly reduce the need to test a drug's safety in animal models. there are other sources of stem cells, including stem cells that are found in blood. recent reports note the possible isolation of stem cells for the brain from the lining of the spinal cord. other reports indicate that some stem cells that were thought to have differentiated into one type of cell can also become other types of cells, in particular brain stem cells with the potential to become blood cells. however, since these reports reflect very early cellular research about which little is known, we should continue to pursue basic research on all types of stem cells. some religious leaders will advocate that researchers should only use certain types of stem cells. however, because human embryonic stem cells hold the potential to differentiate into any type of cell in the human body, no avenue of research should be foreclosed. rather, we must find ways to facilitate the pursuit of all research using stem cells while addressing the ethical concerns that may be raised. another seminal and intimately related event at the end of the nineties occurred in madison wisconsin. 
up until november of 1998, isolating es cells in mammals other than mice proved elusive, but in a milestone paper in the november 5, 1998 issue of science, james a. thomson (1998), a developmental biologist at uw-madison, reported the first successful isolation, derivation and maintenance of a culture of human embryonic stem cells (hes cells). it is interesting to note that this leap was made from mouse to man. as thomson himself put it, these cells are different from all other human stem cells isolated to date and, as the source of all cell types, they hold great promise for use in transplantation medicine, drug discovery and development, and the study of human developmental biology. the new century is rapidly exploiting this vision. when steve fodor was asked in 2003 "how do you really take the human genome sequence and transform it into knowledge?" he answered that, from affymetrix's perspective, it is a technology development task. he sees the colloquially named affychips as being the equivalent of a cd-rom of the genome. they take information from the genome and write it down. the company has come a long way from the early days of venter's ests and less than robust algorithms as described earlier. one surprising fact unearthed by the newer, more sophisticated generation of chips is that 30 to 35 percent of the non-repetitive dna is being expressed, whereas accepted knowledge was that only 1.5 to 2 percent of the genome would be expressed. since much of that sequence has no protein-coding capacity, it is most likely coding for regulatory functions. in a parallel to astrophysics, this is often referred to in common parlance as the "dark matter of the genome" and, like dark matter, for many it is the most exciting and challenging aspect of uncovering the occult genome. it could be, and most probably is, involved in regulatory functions, networks, or development. and like physical dark matter it may change our whole concept of what exactly a gene is or is not! 
since beadle and tatum's circumspect view of the protein world (one gene, one enzyme) no longer holds true, this adds a layer of complexity to organizing chip design. depending on which sequences are present in a particular transcript, you can, theoretically, design a set of probes to uniquely distinguish that variant. at the dna level itself there is much potential for looking at variants, whether expressed or not, at a very basic level as a diagnostic system, but ultimately the real paydirt is the information that can be gained from looking at the consequence of non-coding sequence variation on the transcriptome itself. fine-tuning when this matters, and when it is irrelevant as a predictive model, falls under the auspices of the affymetrix spin-off perlegen. perlegen came into being in late 2000 to accelerate the development of high-resolution, whole genome scanning. and they have stuck to that purity of purpose. to paraphrase dragnet's sergeant joe friday, they focus on the facts of dna, just the dna. perlegen owes its true genesis to the desire of one of its cofounders, brad margus, to use dna chips to help understand the dynamics underlying genetic diseases. margus' two sons have the rare disease "ataxia telangiectasia" (a-t). a-t is a progressive, neurodegenerative childhood disease that affects the brain and other body systems. the first signs of the disease, which include delayed development of motor skills, poor balance, and slurred speech, usually occur during the first decade of life. telangiectasias (tiny, red "spider" veins), which appear in the corners of the eyes or on the surface of the ears and cheeks, are characteristic of the disease, but are not always present. many individuals with a-t have a weakened immune system, making them susceptible to recurrent respiratory infections. about 20% of those with a-t develop cancer, most frequently acute lymphocytic leukemia or lymphoma, suggesting that the sentinel competence of the immune system is compromised. 
having a focus so close to home is a powerful driver for any scientist. his co-founder david cox is a polymath pediatrician whose training in the latter informs his application of the former in the development of patient-centered tools. from that perspective, perlegen's stated mission is to collaborate with partners to rescue or improve drugs and to uncover the genetic bases of diseases. they have created a whole genome association approach that enables them to genotype millions of unique snps in thousands of cases and controls in a timeframe of months rather than years. as mentioned previously, snp (single nucleotide polymorphism) markers are preferred over microsatellite markers for association studies because of their abundance along the human genome, their low mutation rate, and their accessibility to high-throughput genotyping. since most diseases, and indeed responses to drug interventions, are the products of multiple genetic and environmental factors, it is a challenge to develop discriminating diagnostics and, even more so, targeted therapeutics. because mutations involved in complex diseases act probabilistically (that is, the clinical outcome depends on many factors in addition to variation in the sequence of a single gene), the effect of any specific mutation is smaller. thus, such effects can only be revealed by searching for variants that differ in frequency among large numbers of patients and controls drawn from the general population. analysis of these snp patterns provides a powerful tool to help achieve this goal. although most bi-allelic snps are rare, it has been estimated that just over 5 million common snps, each with a frequency of between 10 and 50%, account for the bulk of the dna sequence difference between humans. such snps are present in the human genome once every 600 base pairs or so. 
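the spacing figure at the end of the paragraph above follows directly from the snp count. a minimal check, assuming a round haploid genome size of 3 billion base pairs (this figure is an assumption brought in for illustration; only the 5 million snp count and the 600 bp spacing come from the text):

```python
GENOME_BP = 3_000_000_000   # assumption: round figure for the haploid human genome
COMMON_SNPS = 5_000_000     # "just over 5 million" common snps, from the text


def mean_spacing_bp(genome_bp: int, n_snps: int) -> float:
    """Average distance between markers if they were spread evenly."""
    return genome_bp / n_snps


spacing = mean_spacing_bp(GENOME_BP, COMMON_SNPS)
print(f"one common snp every {spacing:.0f} bp on average")
```

real snps are not evenly spaced, so this is an average density rather than a statement about any particular region.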
as is to be expected from linkage disequilibrium studies, alleles making up blocks of such snps in close physical proximity are often correlated, resulting in reduced genetic variability and defining a limited number of "snp haplotypes," each of which reflects descent from a single, ancient ancestral chromosome. in 2001 cox's group, using high level scanning with some old-fashioned somatic cell genetics, constructed the snp map of chromosome 21. the surprising findings were blocks of limited haplotype diversity in which more than 80% of a global human sample can typically be characterized by only three common haplotypes (interestingly enough, the prevalence of each haplotype in the examined population was in the ratio 50:25:12.5). from this the conclusion could be drawn that by comparing the frequency of genetic variants in unrelated cases and controls, genetic association studies could potentially identify specific haplotypes in the human genome that play important roles in disease, without need of knowledge of the history or source of the underlying sequence, a hypothesis they subsequently went on to prove. following cox et al.'s pioneering work on "blocking" chromosome 21 into characteristic haplotypes, tien chen came to visit him from the university of southern california, and following the visit his group developed discriminating algorithms which took advantage of the fact that the haplotype block structure can be decomposed into large blocks with high linkage disequilibrium and relatively limited haplotype diversity, separated by short regions of low disequilibrium. one of the practical implications of this observation is, as suggested by cox, that only a small fraction of all the snps, which they refer to as "tag" snps, need be chosen for mapping genes responsible for complex human diseases, which can significantly reduce genotyping effort without much loss of power. 
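the idea of tag snps can be illustrated with a toy block. the sketch below is a simple greedy heuristic written for this illustration, not the algorithm developed by the groups mentioned above: it represents each haplotype as a string of 0/1 alleles (a hypothetical encoding) and repeatedly picks the snp position that separates the most haplotype pairs not yet told apart.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Choose SNP positions that distinguish every pair of distinct haplotypes.

    haplotypes: list of equal-length strings of '0'/'1' alleles (toy encoding).
    Returns a sorted list of 0-based SNP positions. Greedy, so the result is
    small but not guaranteed to be the true minimum.
    """
    distinct = sorted(set(haplotypes))
    if not distinct:
        return []
    n_sites = len(distinct[0])
    # every pair of distinct haplotypes starts out unresolved
    unresolved = set(combinations(range(len(distinct)), 2))
    tags = []
    while unresolved:
        def resolved_by(site):
            return {(i, j) for (i, j) in unresolved
                    if distinct[i][site] != distinct[j][site]}
        # pick the site that separates the most unresolved pairs
        best = max(range(n_sites), key=lambda s: len(resolved_by(s)))
        tags.append(best)
        unresolved -= resolved_by(best)
    return sorted(tags)


# toy block: 8 snps, three common haplotypes
block = ["00000000", "11110000", "00001111"]
tags = greedy_tag_snps(block)
print(f"{len(tags)} tag snps out of {len(block[0])}: positions {tags}")
```

in this toy block two of the eight snps are enough to identify which of the three haplotypes a chromosome carries, which is the sense in which tagging reduces genotyping effort.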
they developed algorithms to partition haplotypes into blocks with the minimum number of tag snps for an entire chromosome. in 2005 they reported that they had developed an optimized suite of programs to analyze these block linkage disequilibrium patterns and to select the minimum number of tag snps for the given criteria. in addition, the updated suite allows haplotype data and genotype data from unrelated individuals and from general pedigrees to be analyzed. using an approach similar to richard michelmore's bulk segregant analysis in plants of more than a decade previously, perlegen subsequently made use of these snp haplotype and statistical probability tools to estimate total genetic variability of a particular complex trait coded for by many genes, with any single gene accounting for no more than a few percent of the overall variability of the trait. cox's group has determined that fewer than 1000 total individuals provide adequate power to identify genes accounting for only a few percent of the overall genetic variability of a complex trait, even using the very stringent significance levels required when testing large numbers of dna variants. from this it is possible to identify the set of major genetic risk factors contributing to the variability of a complex disease and/or treatment response. so, while a single genetic risk factor is not a good predictor of treatment outcome, the sum of a large fraction of risk factors contributing to a treatment response or common disease can be used to optimize personalized treatments without requiring knowledge of the underlying mechanisms of the disease. they feel that a saturating level of coverage is required to produce repeatable prediction of response to medication or predisposition to disease and that taking shortcuts will for the most part lead to incomplete, clinically irrelevant results. in 2005 hinds et al. in science describe even more dramatic progress. 
they describe a publicly available, genome-wide data set of 1.58 million common single-nucleotide polymorphisms (snps) that have been accurately genotyped in each of 71 people from three population samples. a second public data set of more than 1 million snps typed in each of 270 people has been generated by the international haplotype map (hapmap) project. these two public data sets, combined with multiple new technologies for rapid and inexpensive snp genotyping, are paving the way for comprehensive association studies involving common human genetic variations. perlegen basically is taking to the next level fodor's stated reason for the creation of affymetrix, the belief that understanding the correlation between genetic variability and its role in health and disease would be the next step in the genomics revolution. the other interesting aspect of this level of coverage is, of course, that the notion of discrete identifiable groups based on ethnicity, centers of origin and such breaks down and a spectrum of variation arises across all populations, which makes the perlegen chip, at one level, a true unifier of humanity but at another adds a whole layer of complexity for hmos! at the turn of the century, this personalized chip approach to medicine received some validation at a simpler level, in a disease area closely related to the one to which one fifth of a-t patients ultimately succumb, when researchers at the whitehead institute used dna chips to distinguish different forms of leukemia based on patterns of gene expression in different populations of cells. moving cancer diagnosis away from visually based systems to such molecular based systems is a major goal of the national cancer institute. in the study, scientists used a dna chip to examine gene activity in bone marrow samples from patients with two different types of acute leukemia: acute myeloid leukemia (aml) and acute lymphoblastic leukemia (all). 
then, using an algorithm developed at the whitehead, they identified signature patterns that could distinguish the two types. when they cross-checked the diagnoses made by the chip against known differences in the two types of leukemia, they found that the chip method could automatically make the distinction between aml and all without previous knowledge of these classes. taking it to a level beyond where perlegen are initially aiming, eric lander, leader of the study, said that mapping not only what is in the genome, but also what the things in the genome do, is the real secret to comprehending and ultimately curing cancer and other diseases. chips gained recognition on the world stage in 2003 when they played a key role in the search for the cause of severe acute respiratory syndrome (sars) and probably won a macarthur genius award for their creator. ucsf assistant professor joseph derisi, already famous in the scientific community as the wunderkind originator of the online diy chip maker in pat brown's lab at stanford, built a gene microarray containing all known completely sequenced viruses (12,000 of them) and, using a robot arm that he also customized, in a three day period used it to classify a pathogen isolated from sars patients as a novel coronavirus. when a whole galaxy of dots lit up across the spectrum of known vertebrate coronaviruses, derisi knew this was a new variant. interestingly, the sequence had the hottest signal with avian infectious bronchitis virus. his work subsequently led epidemiologists to target the masked palm civet, a tree-dwelling animal with a weasel-like face and a catlike body, as the probable primary host. the role that derisi's team at ucsf played in identifying a coronavirus as a suspected cause of sars came to the attention of the national media when cdc director dr. julie gerberding recognized joe in a march 24, 2003 press conference, and again in 2004 when joe was honored with one of the coveted macarthur genius awards. 
this and other tools arising from information gathered from the human genome sequence and complementary discoveries in cell and molecular biology, new tools such as gene-expression profiling, and proteomics analysis are converging to finally show that rapid robust diagnostics and "rational" drug design have a future in disease research. the fight against another virus, one that puts sars deaths in perspective, benefitted from rational drug design at the turn of the century. influenza, or flu, is an acute respiratory infection caused by a variety of influenza viruses. each year, up to 40 million americans develop the flu, with an average of about 150,000 being hospitalized and 10,000 to 40,000 people dying from influenza and its complications. the use of current influenza treatments has been limited due to a lack of activity against all influenza strains, adverse side effects, and rapid development of viral resistance. influenza costs the united states an annual $14.6 billion in physician visits, lost productivity and lost wages. and lest we still dismiss it as a nuisance, we do well to remember that the "spanish" influenza pandemic killed over 20 million people in 1918 and 1919, making it the worst infectious pandemic in history, beating out even the more notorious black death of the middle ages. this fear has been rekindled as the dreaded h5n1 (h for haemagglutinin and n for neuraminidase, as described below) strain of bird flu has the potential to mutate and recognise homo sapiens as a desirable host. since rna viruses are notoriously faulty in their replication, this accelerated evolutionary process gives them a distinct advantage when adapting to new environments and therefore finding more amenable hosts. although inactivated influenza vaccines are available, their efficacy is suboptimal, partly because of their limited ability to elicit local iga and cytotoxic t cell responses. the choices of treatments and preventions for influenza hold much more promise in this millennium. 
clinical trials of cold-adapted live influenza vaccines now under way suggest that such vaccines are optimally attenuated, so that they will not cause influenza symptoms but will still induce protective immunity. aviron (mountain view, ca), biochem pharma (laval, quebec, canada), merck (whitehouse station, nj), chiron (emeryville, ca), and cortecs (london), all had influenza vaccines in the clinic at the turn of the century, with some of them given intra-nasally or orally. meanwhile, the team of gilead sciences (foster city, ca) and hoffmann-la roche (basel, switzerland) and also glaxowellcome (london) in 2000 put on the market neuraminidase inhibitors that block the replication of the influenza virus. gilead was one of the first biotechnology companies to come out with an anti-flu therapeutic. tamiflu™ (oseltamivir phosphate) was the first flu pill from this new class of drugs called neuraminidase inhibitors (ni) that are designed to be active against all common strains of the influenza virus. neuraminidase inhibitors block viral replication by targeting a site on one of the two main surface structures of the influenza virus, preventing the virus from infecting new cells. neuraminidase is found protruding from the surface of the two main types of influenza virus, type a and type b. it enables newly formed viral particles to travel from one cell to another in the body. tamiflu is designed to prevent all common strains of the influenza virus from replicating. the replication process is what contributes to the worsening of symptoms in a person infected with the influenza virus. by inactivating neuraminidase, viral replication is stopped, halting the influenza virus in its tracks. in marked contrast to the usual protracted process of clinical trials for new therapeutics, the road from conception to application for tamiflu was remarkably expeditious. 
in 1996, gilead and hoffmann-la roche entered into a collaborative agreement to develop and market therapies that treat and prevent viral influenza. as gilead's worldwide development and marketing partner, roche led the final development of tamiflu. in april 1999, 26 months after the first patient was dosed in clinical trials, roche and gilead announced the submission of a new drug application to the u.s. food and drug administration (fda) for the treatment of influenza. additionally, roche filed a marketing authorisation application (maa) in the european union under the centralized procedure in early may 1999. six months later, in october 1999, gilead and roche announced that the fda approved tamiflu for the treatment of influenza a and b in adults. these accelerated efforts allowed tamiflu to reach the u.s. market in time for the 1999-2000 flu season. one of gilead's studies showed an increase in efficacy from 60% when the vaccine was used alone to 92% when the vaccine was used in conjunction with a neuraminidase inhibitor. outside of the u.s., tamiflu also has been approved for the treatment of influenza a and b in argentina, brazil, canada, mexico, peru and switzerland. regulatory review of the tamiflu maa by european authorities is ongoing. with the h5n1 birdflu strain's relentless march (or rather flight) across asia, in 2006 through eastern europe to a french farmyard, an unwelcome stowaway on a winged migration, and with no vaccine in sight, tamiflu, although untested for this species, is seen as the last line of defense; it is now being hoarded, and its patented production rights fought over like an alchemist's formula. tamiflu's main competitor, zanamivir, marketed as relenza™, was one of a group of molecules developed by glaxowellcome and academic collaborators using structure-based drug design methods targeted, like tamiflu, at a region of the neuraminidase surface glycoprotein of influenza viruses that is highly conserved from strain to strain. 
glaxo filed for marketing approval for relenza in europe and canada. the food and drug administration's accelerated drug-approval timetable began to show results by 2001: its evaluation of novartis's gleevec took just three months compared with the standard 10-12 months. another factor in improving biotherapeutic fortunes in the new century was the staggering profits of early successes. in 2003, $1.9 billion of the $3.3 billion in revenue collected by genentech in south san francisco came from oncology products, mostly the monoclonal antibody-based drugs rituxan, used to treat non-hodgkin's lymphoma, and herceptin for breast cancer. in fact, two of the first cancer drugs to use the new tools for 'rational' design, herceptin and gleevec (a small-molecule chemotherapeutic for some forms of leukemia), are proving successful, and others such as avastin (an anti-vascular endothelial growth factor) for colon cancer and erbitux are already following in their footsteps. gleevec led the way in exploiting signal-transduction pathways to treat cancer, as it blocks a mutant form of tyrosine kinase (produced by the philadelphia translocation, recognized in the 1960s) that can help to trigger out-of-control cell division. about 25% of biotech companies raising venture capital during the third quarter of 2003 listed cancer as their primary focus, according to online newsletter venturereporter. by 2002, according to the pharmaceutical research and manufacturers of america, 402 medicines were in development for cancer, up from 215 in 1996. another new avenue in cancer research is to combine drugs. wyeth's mylotarg, for instance, links an antibody to a chemotherapeutic, and homes in on cd33 receptors on acute myeloid leukemia cells. expertise in biochemistry, cell biology and immunology is required to develop such a drug. 
this trend has created some bright spots in cancer research and development, even though drug discovery in general has been adversely affected by mergers, a few high-profile failures and a shaky us economy in the early 2000s. as the millennium approached, observers as diverse as microsoft's bill gates and president bill clinton predicted the 21st century would be the "biology century". by 1999 the many programs and initiatives underway at major research institutions and leading companies were already giving shape to this assertion. these initiatives have ushered in a new era of biological research anticipated to generate technological changes of the magnitude associated with the industrial revolution and the computer-based information revolution.
key: cord-009664-kb9fnbgy authors: nan title: oral presentations date: 2014-12-24 journal: clin microbiol infect doi: 10.1111/j.1469-0691.2009.02857.x sha: doc_id: 9664 cord_uid: kb9fnbgy primary immunodeficiency diseases are a heterogeneous group of disorders, caused by inherited defects in the immune system, and characterised by a wide spectrum of clinical manifestations, particularly an increased susceptibility to infections and a predisposition to autoimmune diseases and malignancies. recurrent infections or infection with unusual organisms are the most common presentation of primary immunodeficiency diseases. although recurrent respiratory tract infections and gastrointestinal manifestations are the most common features of these diseases, especially in predominantly antibody deficiencies and combined immunodeficiencies, other organs can be involved as well. recurrent cutaneous abscesses with unusual organisms or deep abscesses may represent infections associated with immunodeficiencies, particularly phagocyte defects. meningococcal infections could be associated with complement deficiencies. meanwhile other bacterial infections, mainly streptococcus pneumoniae and staphylococcus aureus, as well as infections with viruses, fungi and parasites are also common in several primary immunodeficiency diseases. autoimmune diseases such as idiopathic thrombocytopenic purpura, autoimmune haemolytic anaemia, systemic lupus erythematosus, juvenile arthritis, sclerosing cholangitis, and vasculitis are common in primary immunodeficiency diseases.
whilst some syndromic immunodeficiencies (e.g., wiskott aldrich syndrome, di george syndrome) have a strong association with autoimmunity, there is a group of disorders (e.g., alps, apeced, ipex) in which the autoimmune manifestations are typically the first and most significant findings. malignancies are also common in some primary immunodeficiency diseases (e.g., cvid, alps, xlp, and dna repair defects). other manifestations such as dysmorphic features, associated anomalies, skeletal dysplasia, and oculocutaneous hypopigmentation can be unique characteristics of some cases with primary immunodeficiency diseases. the clinical manifestations of these diseases are often helpful in guiding the appropriate evaluation of the patients. prompt and precise diagnostic laboratory evaluation should be performed in patients with such features, because early diagnosis and successful management of these patients prevent irreparable organ system damage and improve the prognosis. immunodeficiency specialists from all over europe have composed a multistage diagnostic protocol that is based on their expert opinion, in order to increase the awareness of pid among doctors working in different fields. the protocol starts from the clinical presentation of the patient; immunological skills are not needed for its use. a list of relevant symptoms and signs from the history and physical examination that should alert any physician to potential pid is given. these are grouped together to form eight typical clinical presentations of pid: recurrent ent and airway infections; failure to thrive from early infancy; recurrent pyogenic infections; unusual infections or unusually severe course of infections; recurrent infections with the same type of pathogen; autoimmune or chronic inflammatory disease, or lymphoproliferation; characteristic combinations of clinical features in eponymous syndromes; and angioneurotic edema.
these presentations lead the user towards different algorithms, which in fact represent the traditional division into antibody, complement, lymphocyte, and phagocyte deficiencies, respectively. the algorithms each are comprised of several steps. this multistage design allows cost-effective screening for pid within the large pool of potential cases in all hospitals in the early phases, while more expensive tests are reserved for definitive classification in collaboration with an immunologist at a later stage. g. schmid°(geneva, ch) in 1986, articles suggesting that male circumcision (mc) decreased the risk of hiv infection appeared. over the next 15 years, studies of two epidemiologic types − ecologic and observational − increasingly supported this contention. ecologic studies showed strong correlations between prevalences of mc and hiv, e.g., tribes with low prevalences of mc had high prevalences of hiv infection. observational cross-sectional studies showed that uncircumcised men had higher rates of hiv than circumcised men. observational cohort studies confirmed these weaker study design findings. a systematic review of observational studies in 2000 found a relative risk (rr) of 0.42 (95% ci, 0.34−0.54), a 58% protective effect. in 2005 and 2007, results from three randomised controlled trials, all from sub-saharan africa, were reported. results were consistent, and the pooled rr of 0.42 (95% ci, 0.31−0.57) was identical to that of the observational studies. the protective effect in the three trials, found at about 21−24 months' follow-up, has been extended in one trial to a protective effect of 64% at 42 months of follow-up. who and unaids have strongly endorsed mc as an effective hiv prevention strategy in generalised hiv epidemics where mc is uncommon. what about europe? mc is uncommon with an adult male prevalence of <20%. hiv incidence is low enough that mc for hiv prevention purposes is unlikely to have much impact. 
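the arithmetic linking a relative risk to the quoted protective effect can be sketched as follows. this is a minimal illustration and not part of the original abstract: the function name `protective_effect` is hypothetical, and it only applies protective effect = (1 − rr) × 100 to the point estimate and confidence limits reported for the observational studies (rr 0.42; 95% ci, 0.34−0.54).

```python
def protective_effect(rr: float) -> float:
    """Return the protective effect in percent implied by a relative risk (rr)."""
    if rr < 0:
        raise ValueError("relative risk cannot be negative")
    return (1.0 - rr) * 100.0


# values quoted in the abstract for the systematic review of observational studies
rr_point, rr_ci_low, rr_ci_high = 0.42, 0.34, 0.54

point = round(protective_effect(rr_point))
# the upper rr limit gives the lower bound of the protective effect, and vice versa
effect_low = round(protective_effect(rr_ci_high))
effect_high = round(protective_effect(rr_ci_low))

print(f"protective effect: {point}% (range {effect_low}% to {effect_high}%)")
```

note that the interval flips when converting: the largest relative risk corresponds to the smallest protective effect.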
no public health authority recommends routine neonatal circumcision. increasingly, however, data are showing benefits of mc in addition to hiv prevention. lessened risk of urinary tract infection in infants (rr 0.13, 95% ci 0.08−0.20) and lifetime avoidance of phimosis and associated conditions occur when mc is performed neonatally. other benefits occur in males circumcised at any age. mc protects against acquiring sexually transmitted infections characterised by genital ulcers-syphilis, chancroid and herpes-and possibly trichomoniasis. circumcised men may be less likely to acquire hpv and are more likely to clear the infection. through the protective effect against hpv, mc halves risk of penile cancer (rr 0.52, 95% ci 0.33−0.82) and partners of circumcised men are at lessened risk of cervical cancer. other issues must be considered in making public health decisions about mc. cultural objections may occur, but mc in the developing world is readily accepted in non-circumcising societies. studies of sexual pleasure and function have found no relationship to circumcision status. mc may be advised for subgroups, even if not for the entire population. and, surgical risk and cost must be considered. while many sub-saharan african countries are scaling up mc services to prevent hiv infection, public health agencies in many industrialised countries are reconsidering mc policies-the outcomes of both efforts are being followed with interest. acute otitis media (aom) is generally considered a bacterial infection that is treated with antibiotics. however, despite extensive use of broadspectrum antibiotics for this condition, the clinical response to the treatment is often poor. this fact, together with vast clinical experience connecting aom with viral respiratory infections, has prompted research into the role of viruses in aom. 
to date, ample evidence from studies ranging from animal experiments to large clinical trials supports a crucial role for respiratory viruses in the aetiology and pathogenesis of aom. in most cases, viral infection of the upper respiratory mucosa initiates the whole cascade of events that finally leads to the development of aom as a complication. the pathogenesis of aom involves a complex interplay between viruses, bacteria, and the host's inflammatory response. recent studies indicate that with sensitive techniques viruses can be found in the middle-ear fluid in most children with aom, either alone or together with bacteria. viruses appear to enhance the inflammatory process in the middle ear, and they may profoundly impair the resolution of otitis media. it is important to understand, however, that our increasing knowledge of the importance of viruses in the etiopathogenesis of aom does not diminish the central role of bacteria in aom. therefore, while viruses may explain many of the problems encountered in treating aom, the ultimate decision on whether or not to treat aom with antibiotics cannot be based solely on the degree of viral involvement in aom. the non-judicious use of antibiotics has led to an epidemic of antimicrobial resistance. acute otitis media (aom) is the most common indication for use of antibiotics in children in the united states (us). despite available evidence that supports a wait-and-see approach, most us physicians immediately prescribe antibiotics for the treatment of aom. the american academy of pediatrics published a guideline in 2004 that addressed the diagnosis and treatment of aom. this guideline recommends the use of observation as a potential strategy for the treatment of aom. the key components of this published guideline will be discussed, as well as the evidence and rationale that support the use of observation as an initial strategy to treat aom.
otitis media (om) is the most common bacterial infection in children aged <5 years for which antibiotic treatment is prescribed worldwide. although most of the time this entity resolves spontaneously it is associated with morbidity, family dysfunction, antibiotic use and burden on the medical system. efforts to reduce the burden of om by vaccination have not been extremely rewarding, but some progress has been made. the first obvious step would be to reduce viral infections leading secondarily to om. in the modern era, the only viral vaccine with proven effect on aom is the influenza virus vaccine. both the inactivated and the live virus showed some effect, but since influenza virus has only a limited season yearly the effect on the overall om rate is far from being remarkable. haemophilus influenzae (hi) b vaccine did not reduce om since most hi causing om are nontypable (nthi) and not hib. the newly developed pneumococcal conjugate vaccines (pcvs) have all been shown to reduce >50% of the om caused by the serotypes included in the vaccines, but some replacement with serotypes not included in the vaccines and non pneumococcal organisms was demonstrated to reduce the overall effect of pneumococcal vaccines. the effect of pcv on the reduction of recurrent om, om with effusion, the need for ventilation tubes and frequent visits for aom has been suggested, and the real impact is still being studied. aiming with pcv at those with established recurrent om has proved disappointing. pcvs can reduce om caused by antibiotic-resistant s. pneumoniae but the continued overuse of antibiotics is responsible for the increase in antibiotic resistance in non-vaccine serotypes. a newly developed pcv with an outer membrane protein for hi (pnpd) is suggested to reduce also om caused by hi, but confirmation studies are needed. the expansion of the 7 serotypes included in the current licensed pcv to 10 or more serotypes may add to the prevention of om in the near future. 
in the next decade, om will continue to be an important disease in children. however, we can expect it to be modified in terms of bacteriologic aetiologies, antibiotic resistance and hopefully short and long term consequences. v. korten°(istanbul, tr) infectious consequences of an earthquake mainly involve several types of communicable diseases and crush related infections. water-borne and food-borne illnesses often result from the disruption of the public water and sewage systems and contamination of the water supply. overcrowding, poor hygiene and sanitation in temporary shelters also may be factors. the types of infectious diseases seen are associated with the epidemiology of communicable diseases in the area where the earthquake occurred. the most common outbreaks associated with earthquakes are gastroenteritis, infectious hepatitis and pulmonary infections. in unvaccinated populations, there are reports of increased measles. tetanus can be seen in populations where vaccination coverage levels are low. the risk for diarrhoeal disease outbreaks following earthquakes is higher in developing countries than in industrialised countries. an outbreak of acute watery diarrhoea involving >750 cases occurred in a camp after the 2005 earthquake in pakistan. acute respiratory infections, hepatitis e clusters and measles (>400 clinical cases in the 6 months) also occurred among the displaced victims after the same earthquake. contamination of drinking water led to an outbreak of rotavirus after the 2005 earthquake in kashmir, india. an unusual outbreak of coccidioidomycosis associated with exposure to increased levels of airborne dust occurred after the 1994 southern california earthquake. persons who have been trapped by rubble for several hours or days may develop compartment syndromes requiring fasciotomy or amputation. infectious complications were common in renal victims of the 1999 marmara earthquake in turkey and were associated with increased mortality when complicated by sepsis.
of 639 renal victims, 223 (34.9%) had infectious complications, mainly sepsis and wound infections. most of the infections were nosocomial in origin and caused by gram-negative aerobic bacteria and staphylococcus spp. multivariate analysis of the risk-factors for nosocomial infections revealed a significant association with fasciotomy and length of hospital stay in a back up university hospital. the most frequent pathogens isolated from pus and/or wound cultures in 2008 wenchuan earthquake survivors were s. aureus, e. coli, a. baumannii, e. cloacae, and p. aeruginosa. disaster-preparedness plans, focused on trauma and mass casualty management and also on health needs of the surviving affected populations, may decrease the health impact of earthquakes. s16 infections in the disaster setting: famine. experience from darfour, sudan clinic malnutrition is a known risk factor for id worldwide. sub-saharan africa and india are at higher risk due to vegetarian habits or an absolute absence of animal meat proteins, resulting in depletion of micronutrients (zinc, iron, selenium) that are needed for recovery from postmalarial anaemia. in addition, depletion of proteins results in hypoimmunoglobulinaemia and in a delayed response to many bacterial pathogens causing id in the tropics (pneumococci, salmonella, etc.). a third problem is the absence of fat-soluble vitamins, resulting in delayed phagocytic activity. therefore protein-calorie malnutrition results in significantly adverse outcomes in hiv, tb (diarrhoea, pneumonia), the major killers of children under five. st.
elizabeth university tropical programme runs 4 antimalnutrition centres: 1 in sudan, darfour, 2 in kenya among upcountry refugees from major conflict areas (sudan − turrana border) and 1 in uganda, trying to rehabilitate malnourished children under 5 and to help them combat the diseases responsible for 12.5 million deaths a year in children under 5: malaria (1.2 mil), tb (1.1 mil), hiv (2.0 mil), pneumonia (7.5 mil) and diarrhoea (0.5 mil child deaths approximately a year). h. giamarellou°(athens, gr) for the last six years greece has faced a large number of infections, mainly in the intensive care units (icu), due to carbapenem-resistant klebsiella pneumoniae. the proportion of imipenem-resistant k. pneumoniae has increased from less than 1% in 2001, to 23% in isolates from hospital wards and to 53% in isolates from icus in 2008. likewise, in 2002, these strains were identified in only three hospitals, whereas now they are isolated in at least 32 of the 40 hospitals participating in the greek surveillance system. until 2007 this situation was due to the spread of the blavim-1 cassette among the rapidly evolving multiresistant plasmids and multiresistant or even panresistant strains of mainly k. pneumoniae and also other enterobacterial species. however, the fact that most strains display mic values below or near the clsi resistance breakpoint creates diagnostic and therapeutic problems, and possibly obstructs the assessment of the real incidence of these strains. as of 2007, the emergence of kpc-producing k. pneumoniae has been noted in icus of some greek hospitals and has now spread to most hospitals throughout the country, creating a countrywide outbreak in 2008. in attikon university hospital we recently described the icu outbreak of kpc-producing k. pneumoniae. twenty-nine patients (admitted from february to december 2008) were colonised, mainly in the gi tract. fifteen patients were male (52%) and the median apache ii was 19.
patients already had long hospital stays preceding icu admission, with a median of 25 (17−40) days. in twenty-two of these patients (76%) kpc-producing k. pneumoniae colonisation was definitely icu-acquired while in 7 (24%) acquisition in other wards or other hospitals was hypothesized. five of these patients are still hospitalised in the icu and, of the remaining 24, 11 died (icu mortality 46%). ten of the 29 colonised patients were clinically infected. fifteen infections were documented, mostly bsi (11/15), followed by vap (2/15) and ssi (2/15). only 1 patient died from this infection (1/15, 6.7%). an evidence-based consensus on the therapeutic strategy for these infections has been reached by keelpno and the greek ministry of health, which proposed the use of high dose meropenem (6−8 g/day) combined with an active aminoglycoside or colistin for strains with an mic of up to 4 mg/ml, whereas for strains with a higher mic the use of carbapenems is contraindicated and active alternatives (monotherapy with tigecycline, colistin, or an aminoglycoside or aztreonam-based combinations) could be used. antibiotic stewardship is of great importance in such a dismal situation but stringent adherence to infection control measures is probably of even greater importance for the effective containment of these pandrug-resistant strains. the presentation of clostridium difficile infection (cdi) varies from mild diarrhoea to a potentially fatal pseudomembranous colitis. the recent emergence of types 027 and 078 of c. difficile has been associated with increased virulence. c. difficile takes advantage of disruption of the normal intestinal flora as caused by antibiotic therapy. the antibiotic class and the antimicrobial resistance pattern of c. difficile influence the development of disease. in the netherlands, significantly more patients with cdi due to type 027 used fluoroquinolones (or, 2.88; 95% ci, 1.01−8.20) compared with those who were infected with other pcr ribotypes.
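for readers unfamiliar with how an odds ratio and its 95% confidence interval of the kind quoted above are obtained from a 2×2 exposure table, a minimal sketch follows. it is an assumption-laden illustration: the abstract does not say which method the authors used, the log-based (woolf) interval here is one common choice, the function name `odds_ratio_ci` is hypothetical, and the counts are invented for the example rather than taken from the study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI (Woolf / log method) for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # standard error of log(odds ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# hypothetical counts, for illustration only (not data from the abstract)
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"or {or_value:.2f} (95% ci {ci_low:.2f} to {ci_high:.2f})")
```

an interval whose lower limit stays above 1, as in the reported 1.01−8.20, is what supports calling the association statistically significant.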
similar to type 027 cdi, patients infected with type 078 also more frequently received fluoroquinolone therapy (or, 2.17; . the risk of developing cdi due to type 027 was particularly high in persons receiving a combination of cephalosporin and fluoroquinolone (or 57.5, ). this association was also strongly dependent on the duration of therapy. the use of clindamycin was found to be a protective factor. however, the recent detection of clindamycin-resistant c. difficile type 027 strains in other european countries is an important and worrying development. since the association of cdi with fluoroquinolones has only been investigated at patient level, a study was performed to investigate the relationship between cdi incidence and the preceding use of different antibiotic classes at hospital level in the netherlands. comparisons were made between hospitals where type 027 caused an epidemic, hospitals where only isolated cases of type 027 were observed and hospitals where no outbreak of cdi or type 027 was encountered. in the pre-epidemic period, the total use of antibiotics was comparable between affected and unaffected hospitals. higher use of second-generation cephalosporins, macrolides and all other studied antibiotics was independently associated with a small increase in cdi incidence, but the effect was too small to predict which hospitals might be more prone to 027-associated outbreaks. despite the fact that the netherlands is known for its restrictive and conservative use of antibiotics, outbreaks of cdi due to new emerging types have been recognized. this is probably associated with the use of antibiotics at patient level and hospital department level rather than the use of antibiotics at the level of the healthcare institute. m. peiffer, j. bulitta, h.a. haeberle, m. kinzig-schippers, m. rodamer, v. jakob, b. nohé, f. sörgel, w.a.
krueger°(trier, de; albany, us; tubingen, nuremberg, constance, de) piperacillin-tazobactam (pip-tazo) is a broad spectrum antibiotic, used for treatment of severe infections such as ventilator-associated pneumonia (vap). the effectiveness of beta-lactams is best predicted by the duration of free drug concentrations above the minimal inhibitory concentration (t > mic) of infecting pathogens [1]. animal experiments suggest that a t > mic of more than 50% should be reached. continuous infusion (ci) of pip-tazo may enhance the therapeutic performance, but there are few data on pharmacokinetic/-dynamic (pk/pd) parameters when ci is used in critically ill patients. objectives: the aim of our study was to determine concentrations of pip-tazo in plasma and broncho-alveolar epithelial lining fluid (elf) at steady state during ci. based on these results, the penetration ratio (plasma/elf) and pk/pd parameters for pip-tazo are derived. methods: after approval by the ethics committee, 16 mechanically ventilated critically ill patients were enrolled during treatment in 3 intensive care units. each patient received a loading dose of 4 g/0.5 g of pip-tazo, followed by ci of 12 g/1.5 g over 24 h. at steady state (67.8 ± 39.5 h after loading dose), a total of 30 blood samples were drawn and bronchoalveolar lavage (bal) was simultaneously performed in 8 cases (1 sample discarded for technical reasons). samples were stored at −80°c until analysis by liquid chromatography coupled with mass-spectrometry (lc-ms). elf concentrations were calculated from bal samples using the ratio urea(plasma):urea(bal) as dilution factor. results: plasma concentrations of pip and tazo (n = 30 in 16 pts.) amounted to 15.38 ± 8.89 mg/ml and 1.31 ± 0.95 mg/ml, respectively. elf levels (n = 7) were 56.63 ± 27.24 mg/ml and 5.95 ± 3.74 mg/ml. elf levels were 368 ± 236% and 587 ± 584% of corresponding plasma levels (n = 7) for pip and tazo, respectively. the ratio pip:tazo was 11.74:1 in plasma, and 9.52:1 in elf.
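the urea dilution step described in the methods can be sketched as follows. this is a minimal illustration under stated assumptions rather than the authors' analysis code: the function names `elf_concentration` and `elf_percent_of_plasma` are hypothetical, the calculation assumes that urea equilibrates freely between plasma and elf so that urea(plasma):urea(bal) gives the factor by which the lavage fluid diluted the elf, and the numeric inputs are invented for the example.

```python
def elf_concentration(drug_bal: float, urea_plasma: float, urea_bal: float) -> float:
    """Estimate the drug concentration in ELF from a BAL sample via urea dilution."""
    if urea_bal <= 0:
        raise ValueError("urea concentration in bal must be positive")
    dilution_factor = urea_plasma / urea_bal  # how strongly lavage diluted the elf
    return drug_bal * dilution_factor


def elf_percent_of_plasma(drug_elf: float, drug_plasma: float) -> float:
    """Express the ELF concentration as a percentage of the plasma concentration."""
    if drug_plasma <= 0:
        raise ValueError("plasma concentration must be positive")
    return 100.0 * drug_elf / drug_plasma


# hypothetical measurements, for illustration only (same units within each pair)
drug_elf = elf_concentration(drug_bal=0.5, urea_plasma=5.0, urea_bal=0.05)
percent = elf_percent_of_plasma(drug_elf, drug_plasma=12.5)
print(f"elf concentration: {drug_elf:.1f}, i.e. {percent:.0f}% of plasma")
```

because the same dilution factor applies to every analyte in a given bal sample, the pip:tazo ratio in elf does not depend on the urea measurement, only the absolute concentrations do.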
conclusions: using advanced analytical techniques, elf concentrations were higher compared to traditional bolus administration [2]. ci yielded steady state plasma concentrations in excess of mics of susceptible bacteria (<8 mg/ml, according to eucast) in 76.6% of measurements, and elf levels exceeded 8 mg/ml in all cases. taken together, our data provide further arguments for ci being the preferred mode of administration for pip-tazo in critically ill patients with suspected vap. objectives: staphylococcus aureus is a potentially pathogenic microorganism and a causative agent of ~25% of infections in intensive care patients. an optimal empiric choice for the treatment of these infections will result in a reduction in morbidity and mortality. therefore, it is essential to provide the clinician with resistance data of the bacterial population to be treated. to optimise the empiric choice and to monitor the emergence of microbial resistance, a national surveillance program of the swab was started in the netherlands in 1996. this study describes the results of the resistance development of s. aureus from icus of 14 hospitals all over the netherlands over a ten year period. methods: in the first 6 months of each year, the participating hospitals collected clinical isolates from, among others, blood and respiratory samples. in total 943 isolates were collected: 250 from 3 hospitals in the north, 187 from 2 in the east, 229 from five in the west and 280 from four in the south. the antimicrobial susceptibility was determined by a broth microdilution method according to the clsi guidelines. results: an increase in resistance to ciprofloxacin was observed from 4% until 2002 to 14% in 2005, which dropped again to 7% in 2006. the resistance to moxifloxacin was rather constant over time, i.e. 2%; only in 2003 was 8% resistance found. resistance to clarithromycin increased to 10% in 2003, but decreased in 2006 to 6%, the level before 2003.
resistance to penicillin, clindamycin and tetracycline fluctuated over time at ~75%, 4−8% and 2−10% respectively. during the study period seven methicillin resistant s. aureus were isolated; no resistance to vancomycin, teicoplanin and linezolid was observed. resistance to gentamicin and rifampicin was found sporadically. regional differences were observed for ciprofloxacin, being the highest in the western and southern part and tetracycline being the lowest in the northern part. conclusion: during the 10 year study period only an increase in resistance to ciprofloxacin was observed. the data presented justify the empiric choice of flucloxacillin (with rifampicin or gentamicin depending on the indication) in case of an infection in icu patients probably caused by s. aureus. j.j. lu°, p.r. hsueh, s.y. lee (taichung, taipei, tw) objectives: to investigate the prevalence of visa in hospitalised patients with mrsa infections or colonisations at a teaching hospital in taiwan and to evaluate the possible clonal spread of visa in the hospital. methods: from september 2001 to august 2002, 1500 consecutive mrsa isolates were collected from various clinical specimens of 637 patients hospitalised at a teaching hospital in taiwan. minimum inhibitory concentrations (mics) of vancomycin for all mrsa isolates were determined by the broth microdilution method in accordance with clsi guidelines. molecular characteristics and antimicrobial susceptibilities of visa isolates were investigated and pulsed-field gel electrophoresis was used to evaluate the clonality of the isolates. results: among the 1500 mrsa isolates, 43 (2.9%) were visa. of the 43 visa isolates, 35 had vancomycin mics of 4 microgram/ml and 8 had vancomycin mics of 8 microgram/ml. all isolates were inhibited by tigecycline at 0.5 microgram/ml, linezolid at 1 microgram/ml, and ceftobiprole at 2 microgram/ml. five (11.6%) isolates had reduced susceptibility to daptomycin (mics of 1−2 microgram/ml).
six of the 43 visa isolates had decreased susceptibility to autolysis in 0.05% triton x-100. the 43 visa isolates were recovered from 21 patients; 13 of these patients had received glycopeptide treatment prior to the isolation of visa. five (23.8%) patients died despite vancomycin therapy. all 43 visa isolates carried sccmec type iii and agr group i but were negative for pvl gene (luks-lukf). none of the enterococcal van genes were detected in the 43 visa isolates. results of pfge analysis revealed that one major clone of visa isolates (90.5%, clone a exhibiting sccmec type iii, agr group i, and absence of pvl gene) had disseminated in the hospital. conclusion: this retrospective study demonstrated that clonal dissemination of visa had occurred in the hospital. rapid and correct detection of visa and proper use of antibiotics are the most effective approaches for preventing its emergence and spread. x. zheng°, c. qi, a. o'leary, m. arrieta, s. shulman (chicago, us) objectives: vancomycin remains one of the major options for treating methicillin-resistant s. aureus (mrsa) related infections. some but not all studies have shown an increase in prevalence of mrsa isolates with elevated vancomycin mic values among recent clinical isolates, so called "mic creep". although still within the susceptible range, higher mics may be associated with increased chance of treatment failure. because of the conflicting reports and lack of published data from paediatric patients, we sought to assess possible mic change over time and to compare results generated by using different methodologies including etest, agar dilution, and broth microdilution (microscan) methods. 
methods: we studied 318 mrsa isolates predominantly community acquired including all blood and normally sterile site isolates collected in our large children's hospital in 2000/2001, 2003, 2005, and 2007 molecular bacteriology o41 genome sequence of a virulent, methicillin-sensitive staphylococcus aureus clinical isolate that encodes the panton-valentine leukocidin toxin l. faraj, l.a.s. snyder, n.j. loman, d.p. turner, m.j. pallen, d. ala'aldeen, r. james°(nottingham, birmingham, uk) objective: to determine the genome sequence of a virulent meticillinsensitive staphylococcus aureus (mssa) clinical isolate sanot01. methods: roche 454 sequencing determined the genome sequence of the clinical isolate at 12 times coverage. newbler sequence assembly (roche) generated 10 scaffolds that were annotated using gendb and compared with other s. aureus genome sequences. results: an 11-year-old asian girl presented with fever and a 1-week history of knee pain following a trivial fall. an mr scan revealed a large subperiosteal abscess around the upper tibia secondary to metaphyseal osteomyelitis. a pvl-positive, mssa was isolated from blood cultures and pus. the child deteriorated, required repeated debridement and developed septic shock. further investigation revealed aortic valve endocarditis with an aortic root abscess. whole genome sequencing revealed that sanot01 is the first sequence of an st30 s. aureus isolate to be determined. sanot01 is agr type iii and carries three coding regions that are not found in any other s. aureus genome sequences. amongst the unique genes present in these regions is a dihydrofolate reductase gene (dfrg) which is present in addition to the usual dfrb gene. downstream of the orfx gene, a 6.5 kb remnant of sccmec type ivc was found. this sequence has only previously been found in the mrsa252 genome sequence where it is located between the orfx and sccmec type ii sequences. mrsa252 is unique in sharing 14 genome regions with s. 
aureus strain rf122, a causative agent of contagious bovine mastitis. all but one of these 14 genome regions are also present in sanot01. conclusions: comparison of the genome sequence of sanot01 and the closely related mrsa252 ha-mrsa (emrsa-16) isolate reveals new insights into the evolution of both ca-mrsa and ha-mrsa isolates and the link to s. aureus rf122. pvl-encoding mssa strains can be significant pathogens but are not currently under mandatory surveillance in the uk. as the cost of whole genome sequencing falls further it will become feasible to use this technology to monitor the evolution of both mssa and mrsa in healthcare settings and reveal clinically relevant information that will help to improve patient outcomes. objectives: ca-mrsa often produce panton-valentine leukocidin (pvl), a leukocidin encoded by two co-transcribed genes located on lysogenised phages. five pvl-encoding phages have been described in s. aureus: phipvl, phi108pvl, phislt, phisa2mw and phisa2958. single nucleotide polymorphisms (snps) in the pvl genes tend to vary with lineage and may have structural and functional implications. we examined a selection of pvl-positive ca-mrsa reported in our hospital to determine whether sequence variation and the pvl-encoding phage vary with lineage. methods: twenty-two pvl-positive isolates were chosen to reflect mlst clonal complexes identified in our hospital: cc1, 5, 8, 59, 80, 88 and 154. isolates were characterised by antimicrobial resistance profile, sccmec and spa type, pulsed-field gel electrophoresis (pfge) profile and multilocus sequence typing (mlst); an oligonucleotide array (clondiag arraytube) was used to detect a range of toxin and antimicrobial resistance genes. primers were designed to amplify and sequence the luksf-pv genes. the pvl-encoding phage was characterised using a recently described pcr-based assay (ma et al. j clin microbiol 2008; 40:3246−58). 
results: snps were identified at seven positions in the luksf-pv genes and the snp profile varied with lineage. three of the snps were coding mutations, which may have structural and functional implications. cc1 and cc80 isolates were both found to carry phisa2mw. the pvl-encoding phage was not definitively identified in the other lineages, although the cc59 isolates carried a phisa2958-like phage and the cc8, cc80 and cc154 isolates carried elongated head-type phages. one of the cc1 isolates had an unexpected snp pattern compared with other cc1 isolates; this isolate also carried a novel or variant phage. conclusion: pvl gene sequence and the pvl-encoding phage vary with lineage in pvl-positive ca-mrsa isolates. this suggests that certain lineages are susceptible to infection or lysogeny with certain phage types. although ca-mrsa commonly carry pvl genes, some strains do not; it is possible that some pvl-negative types are resistant to infection with pvl-encoding phage, perhaps via restriction modification systems. crucially, our findings suggest the pvl genes have co-evolved with their phage and are not freely transmitted between different phages. further work is required to characterise the pvl-encoding phage in other isolates and to investigate whether the pvl sequence variants result in biological differences. objectives: community-associated mrsa (ca-mrsa) of many different mlst clonal complexes (ccs) can harbour lysogenised bacteriophage dna (prophage) encoding panton-valentine leukocidin (pvl). five pvl phages (phipvl, phislt, phisa2mw, phi108pvl, and phisa2958) have been reported to date. we sought to determine the distribution of chromosomally integrated copies of these lysogenised pvl-phages amongst dominant clones of pvl mrsa in england and wales. methods: seventy isolates of previously characterised pvl-mrsa were analysed by pcrs developed by ma et al. (jcm, 2008) to identify and discriminate between the five known pvl phages. 
to maximise any underlying diversity, representatives of each cc were selected based upon their spa, staphylococcal cassette chromosome mec (sccmec), toxin gene and pulsed-field gel electrophoresis (pfge) profiles. these included isolates of internationally disseminated pvl-mrsa lineages ccs 8, 30 and 80 which resemble the usa300, south west pacific (swp) and european clones, respectively. in addition we analysed pvl-mrsa from ccs 1, 5, 22, 59, 88 and st93. results: all seven cc80 isolates, which included representatives of the european clone, possessed an elongated-head-type phage and were positive by the pcr specific for the phisa2mw phage. one of the cc30 isolates possessed a phi108pvl phage, four swp representatives had elongated head type phages, whilst the remaining four cc30 isolates harboured an icosahedral-head-type phage. one cc30 was positive for both head shapes. the 12 cc8 (including representatives of usa300), eight cc1, six cc88 isolates and the st93 isolate were all positive for elongated-head-type phage. nine cc5 isolates were non-typeable for phage head shape and specific phage pcrs. three of four cc59 isolates, harboured a phisa2958-like phage of an unknown head type and the other cc59 isolate was non-typeable. all 14 cc22 isolates possessed an icosahedral-head-type phage, 13 were positive for the phipvl phage type and one possessed phi108pvl type. we have determined the pvl phages present in a diverse panel of distinct pvl-mrsa clones and found considerable inter-lineage variation in the pvl prophage present. there was also evidence of intra lineage variation in some major ccs such as ccs 22, 30 and 59. together with variation in mlst cc and sccmec, these data suggest pvl-mrsa have evolved on multiple occasions, sometimes within the same lineage. 
o44 transcriptional profiling of klebsiella pneumoniae genes controlled by the transcription factor, rama objectives: rama is an arac/xyls family transcriptional activator where over expression is associated with a multidrug resistance phenotype. in both multidrug resistant klebsiella and salmonella isolates, the rama gene has been associated with increase in expression of the acrab efflux pump. in salmonella it has been shown that a deletion of the rama locus prevents the emergence of multidrug resistant mutants. therefore in order to understand the role of this key regulator in the emergence and development of antibiotic resistance, transcriptomic analyses of its regulon were undertaken in k. pneumoniae. methods: rna was extracted from a combination of isogenic mutants and clinical isolates using the qiagen or ribopure kits. rna integrity was assessed using nanodrop and agilent nanochip systems. the rna was transcribed into double stranded cdna prior to labelling with cy3. the cdna was hybridised to the nimblegen expression array platform designed from the k. pneumoniae mgh 78578 genome. results: approximately 50 genes were found to be affected by rama expression, of which twenty (involved in metabolism, physiology, transcription, drug efflux, protection responses and the cell envelope) were confirmed by rt-pcr. the rama protein appears to affect drug efflux operons not previously shown to be associated with multidrug resistance and or affected by similar proteins such as mara. comparative transcriptome analyses of different k. pneumoniae clinical isolates overexpressing rama showed that variations exist in the levels of expression of the drug efflux genes. of note genes shown to be directly regulated by rama have a marbox-like sequence within the promoter sequences. conclusion: in this study, the transcriptome of the regulatory protein, rama, was determined in the pathogen k. pneumoniae. 
drug efflux proteins not previously associated with rama overexpression were found to be directly affected. the rama regulon overlaps with the mara and soxs regulons in e. coli and salmonella but is directly associated with regulating the expression of a subset of genes via a marbox sequence. interestingly, variations in the levels of the expression of the regulon genes were found in the different rama overexpressing strains. m. eshoo°, c. crowder, h. li, h. matthews, s. meng, s. sefers, r. sampath, c. stratton, d. ecker, y.w. tang (carlsbad, nashville, us) objectives: the potential for fatal outcome from tick-borne human infections such as ehrlichiosis emphasizes the need for rapid diagnosis. we developed and validated an ibis t5000 assay (ibis biosciences, inc., carlsbad, ca) that can detect and identify a wide range of tick-borne pathogens from clinical samples. methods: a multi-locus assay was used that employs 16 broad-range pcr primer pairs targeting all known bacterial tick-borne pathogen families. electrospray ionisation mass spectrometry of the pcr amplicons was used to determine their base composition. these base composition signatures were subsequently used to identify the organisms found in the samples. the assay was developed using field collected ticks and a wide range of clinical sample types and has been shown to be sensitive to the stochastic limits of pcr. results: whole blood (198), cerebrospinal fluid (20) and plasma (1) samples, which were originally submitted for ehrlichia species detection by a colorimetric microtiter plate pcr (pcr-eia), were collected consecutively from january 5 to august 1, 2008 at vanderbilt university hospital. among the total 219 specimens, pcr-eia detected ehrlichia species in 40, a positive rate of 18.3%. the ibis system detected ehrlichia in 38 of the 40 pcr-eia-positive samples and in 1 of the 179 pcr-eia-negative specimens, giving sensitivity and specificity of 95.0% and 99.4%, respectively. 
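the sensitivity and specificity quoted above follow directly from the reported counts if pcr-eia is taken as the reference method. a minimal sketch of that arithmetic is given below; the function name is hypothetical, and the predictive values it also returns are derived here for illustration only and are not reported in the abstract.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions
    from the four cells of a 2x2 table (index test vs. reference)."""
    return {
        "sensitivity": tp / (tp + fn),  # reference positives that were detected
        "specificity": tn / (tn + fp),  # reference negatives that tested negative
        "ppv": tp / (tp + fp),          # derived here, not reported in the abstract
        "npv": tn / (tn + fn),          # derived here, not reported in the abstract
    }

# Counts taken from the abstract, assuming PCR-EIA is the reference:
# 38 of 40 PCR-EIA positives detected, 1 of 179 PCR-EIA negatives detected.
result = diagnostic_accuracy(tp=38, fp=1, fn=2, tn=178)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

with these counts the sensitivity is 38/40 and the specificity is 178/179, which round to the 95.0% and 99.4% stated in the results.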
the ibis system further characterised the 38 dual-positive ehrlichia specimens to the species level (e. chaffeensis, 35; e. ewingii, 3) with 100% agreement with the identification by pcr-eia using additional species-specific probes. in addition we demonstrated the detection of borrelia burgdorferi from the blood and skin of a patient with lyme disease. conclusions: we demonstrate broad-range detection of tick-borne pathogens in a single assay using skin, whole blood, plasma and csf. in addition to ehrlichia, the ibis system detected 4 rickettsia rickettsii positive specimens, which were confirmed by serology and clinical findings. the ibis t5000 system, which can be completed within five hours from specimen processing to result reporting, provides rapid and accurate detection and identification of a broad range of pathogens causing tick-borne human infections. r. sampath°, l. blyn, r. ranken, c. massire, t. hall, m. eshoo, r. lovari, h. matthews, d. toleno, r. housley, s. hofstadler, d. ecker (carlsbad, us) objective: to investigate the use of a novel platform-based approach for rapid characterisation of hai organisms. pathogens that cause healthcare-associated infections (hais) pose an ongoing and increasing challenge to hospitals, both in the clinical treatment and in the prevention of the cross-transmission of these problematic pathogens. here we describe the utility of a pcr electrospray ionization mass spectrometry (pcr/esi-ms) detection platform as an innovative, rapid approach for detection and complete characterisation of important hai pathogens. methods: we have developed pcr/esi-ms based methods to rapidly identify and characterise mrsa, vre, c. difficile (nap-1 strain), p. aeruginosa and a. baumannii. each target organism can be analyzed using an independent 8-well assay that can be run on the same platform and can provide species and strain id, virulence factors, antibiotic resistance and genotyping as appropriate. 
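the idea of identifying an organism from the base composition of an amplicon, as used by the pcr/esi-ms assays in these abstracts, can be illustrated with a toy lookup. this is a simplified sketch under stated assumptions: the real platform infers base counts from measured amplicon mass, whereas here the counts are taken directly from a sequence string, and the reference sequences, organism labels and function names are all hypothetical.

```python
from collections import Counter

def base_composition(seq):
    """Return the (A, C, G, T) counts of a DNA sequence.
    Base order within the sequence is deliberately ignored."""
    counts = Counter(seq.upper())
    return tuple(counts.get(base, 0) for base in "ACGT")

# Hypothetical reference amplicons; real assays use curated databases
# and several primer pairs per organism.
REFERENCE = {
    "organism_a": "ACGTTGCAAGGCTA",
    "organism_b": "ACGGTGCAAGGCCA",
}

# Map each base-composition signature to the organism it represents.
SIGNATURES = {base_composition(s): name for name, s in REFERENCE.items()}

def identify(seq):
    """Look up an amplicon by its base composition alone."""
    return SIGNATURES.get(base_composition(seq), "no match")

print(base_composition(REFERENCE["organism_a"]))
print(identify("AAAACCCGGGGTTT"))
print(identify("TTTT"))
```

because the signature ignores base order, two different amplicons with the same counts are indistinguishable at a single locus, which is one reason a multi-locus design with several primer pairs is useful.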
validation studies were performed using 100-300 retrospective, well-characterised clinical isolates for each organism. this was followed by a prospective study for one of the 5 organisms, mrsa, that included screening of 557 clinical specimens (nares swab) from patients who were admitted to a medical unit with a high prevalence of mrsa clinical infections. results: for each of the five hai organisms, pcr/esi-ms species identifications were compared to gold standard testing results from the clinical microbiology laboratory and showed 100% concordance. for s. aureus, p. aeruginosa and a. baumannii, molecular genotyping by pcr/esi-ms was compared to pulse field gel electrophoresis (pfge) clusters and showed >95% concordance. characterisation of virulence and/or drug resistance was performed for mrsa, vre and c. difficile and showed 90−95% correct detection compared to existing testing methods. analysis of clinical specimens for mrsa showed that of the 557 swabs, 95 (15%) contained mrsa, either singly or as a dual infection with cons, 33 (5%) were mssa and 358 (58%) contained meca+ coagulase negative staphylococcus (mr-cons). comparison to gold standard analysis showed 100% sensitivity for mrsa detection with 96.8% specificity, 84% ppv and 100%npv. the pcr/esi-ms technology is a high throughput assay system useful for infection control and for epidemiological studies. it is capable of simultaneous identification of hai organisms while detecting presence of key phenotypic markers and genotypic strain characterisation. m. reijans°, j. ossel, j. keijdener, g. simons (maastricht, nl) objective: molecular diagnostics play an increasingly important role in the detection of infectious agents in cerebrospinal fluids. however, the growing list of targets and the relatively small sample volumes are challenges that demand an improved molecular diagnostic approach. 
the meningofinder is a multifinder assay allowing the simultaneous detection of 7 viruses and 1 internal control in 1 reaction. until now, the analysis of multifinder assays was based on size-fractionation, identifying each multifinder probe due to its specific length. here we present an alternative approach allowing realtime detection of eight meningofinder probes in a single tube. the realtime detection enables a faster analysis, less handling and lowers the risk of contamination. method: the meningofinder assay is a multifinder assay which detects herpes simplex virus 1 and 2 (hsv1−2), human parechovirus (hpev), cytomegalovirus (cmv), epstein-barr virus (ebv), enterovirus (ev) and varicella-zoster virus (vzv) plus an internal control in a single reaction. each meningofinder probe can be distinguished based upon the specific length of each probe by size-fractionation using gel or capillary electrophoresis. we developed an alternative detection method using fluorescently labelled probes which allow specific identification of 8 multifinder probes in a realtime pcr machine. results: a large number of qcmd samples (n = 44), several enterovirus types (n = 27) and characterised clinical samples (n = 66) were analyzed using the meningofinder. all meningofinder reactions were analyzed by capillary electrophoresis and by fluorescently labelled probes in a realtime pcr machine. the results of the meningofinder showed a very good correlation with the expected results (>95%). furthermore, the results of both meningofinder analyses showed a high degree of correlation. the realtime detection of the meningofinder probes decreases the analysis time and post pcr handling dramatically. we developed a new assay for the realtime detection of 8 meningofinder probes. the realtime analysis showed a very good correlation with the conventional capillary electrophoresis analysis. 
in addition, the realtime detection reduced contamination risk and patient results became available more quickly. the combination of multifinder technology combined with realtime detection shows great potential in fast and easy multiparameter screening of clinical samples for infectious pathogens. in-house naats were applied to nucleic acid extracts obtained by own in-house methodology in each centre. results: sensitivities for the detection of the respiratory viruses were 40% for commercial mx naat, 86% for in-house mw naat, and 90% for mono in-house naat. the viral load was low each time false-negative results were obtained. false positive results were obtained by all methods used, resulting in specificities ranging from 88%-97%. for the atypical bacteria, the 2 multiplex naats failed to detect low l. pneumophila positive samples and low m. pneumoniae positive sample resulting in sensitivities of 25% and 75% compared to 100% in the inhouse mono naats. the commercial mx naat also failed to detect strong positive samples. no false positive results were obtained for the atypical bacteria. revisiting phage therapy against problematic pathogens s61 how the past feeds the future: from d'herelle to modern phagotherapy the increasing antibiotic resistance problem boosts the interest in alternative treatments for infections. a prominent example for this is the so-called phagotherapy. it makes use of bacterial viruses − bacteriophages − as drugs against bacterial agents. these bacteriophages are isolated from nature, characterised and then tested against the bacterial strains that are targeted. in theory, this approach has several advantages. for instance, bacteriophages infect, as a rule, their bacterial prey very specifically. therefore, they do not harm the commensal bacteria of the patient. additionally, if a bacterial strain becomes resistant against a certain bacteriophage strain, evolution will provide for new and active bacteriophage strains. 
in practice, phagotherapy has been used for a long time. one of the two discoverers of bacteriophages, félix d'herelle, was already an ardent advocate of this method. in fact, he was the first to use bacteriophages against infections, in 1919, against bacterial diarrhoea (shigella spp.). after that, phagotherapy was used to quite some extent in europe, the us and other parts of the world until penicillin entered the market in the 1940s. in some parts of the former soviet union and the eastern bloc, the method has been utilised until today. now, several companies and university researchers are developing bacteriophages for therapeutic purposes again. historical documents related to phagotherapy and oral history reveal a fascinating past. bacteriophages were employed against a wide variety of bacterial diseases at a time when there were virtually no other anti-infectives. for example, in india, millions of cholera patients were treated with bacteriophages in the 1930s. anti-cholera phages were also poured into drinking wells as prophylactics. bacterial viruses were also utilised by the german and soviet armies in the second world war. the history of phagotherapy makes for more than an exciting story, however. analysis of the old literature helps identify important factors for success and failure. this is especially relevant for a field which holds promise but which has had limited funds at its disposal in the past few years and which, therefore, has been making rather slow progress. additionally, examination of the strategy used for phagotherapy in the soviet union and poland also contributes to a better application of this method today. the discovery of bacteriophages, particularly their ability to replicate and lyse pathogenic bacteria, may have been among the most important milestones in the history of biomedical sciences. 
in the pre-antibiotic era of the early 20th century, phage therapy was becoming a powerful weapon against infectious diseases of bacterial aetiology. unfortunately, phage treatment and research was largely forgotten in the western world as antibiotics became widely available. nowadays, the rapid propagation of multi-drug resistant bacterial strains is leading to renewed interest in phage therapy. in contrast to its decline in the west, phage therapy remained a standard part of the healthcare systems in eastern europe and the ussr during the second half of the 20th century. phage preparations were used for diagnostic, therapeutic and prophylactic purposes to combat various bacterial infections. the eliava institute of bacteriophages, microbiology and virology (tbilisi, georgia) is perhaps the most famous institution in the world focused on the study of bacteriophages, particularly the isolation and selection of phages active against various bacterial pathogens. phages have been isolated against bacterial strains received from all over the former ussr and socialist east european countries; consequently, a huge collection of phages and pathogenic bacterial strains has been constructed at the institute. thousands of people were treated with individual phages and phage mixtures during the soviet era. the preparations developed in tbilisi have been studied through extensive preclinical and clinical trials. however, little of this information has ever been published and even when details are available, the trial reports do not meet internationally approved regulations and standards. bacteriophages have a number of advantages in comparison to antibiotics. phage therapy as an alternative approach for treatment of infections has become an evident and promising remedy. today, many people from various parts of the world express their willingness to take phage treatment against different infections, including those that are caused by antibiotic-resistant bacterial pathogens. 
the eliava institute has elaborated new, phage-based products and technological schemes for their production. strong collaboration with the medical community in the design of clinical trials according to international standards is absolutely critical to supporting the broader implementation of phage therapy. an australian male aged 57 years died from an intracerebral haemorrhage ten days after he returned from a trip to rural yugoslavia. his kidneys and liver were donated to three female recipients aged 44 years (kidney), 63 years (kidney), and 64 years (liver). four to five weeks after the organ donation, all three recipients died. all had febrile illnesses with altered mental status. subsequent testing of post-mortem tissues from the recipients identified a novel arenavirus, which was related to lymphocytic choriomeningitis virus (lcmv). this viral detection process involved the use of high-throughput sequencing techniques to identify novel microbial rna sequences. confirmatory testing was performed using the techniques of reverse transcriptase-polymerase chain reaction, immunohistochemical analysis for arenavirus antigens, and immunofluorescent testing for igg and igm antibodies. the clinical features in these four patients as well as other similar problems with transplant-related illness from classic lcmv will be discussed, as well as details of the laboratory identification of this new virus, and implications for organ transplantation protocols in future. successful management of invasive fungal infections depends on timely and correct treatment. over the last decades a number of new tests have become available which have improved the diagnostic options. in contrast to the scenario for bacterial infections, acquired resistance in fungi is rare and thus species identification is a valuable tool guiding choice of treatment. therefore, microscopy and culture is still a cornerstone in diagnosis, but culture and identification are time consuming (approx. 
1−5 and 1−3 days, respectively). the sensitivity and speed of microscopy have been improved by the use of fluorescent brighteners such as calcofluor white or blankophor. but only with the recent development of pna probes specific for a number of the candida spp. has species identification become possible directly from a positive blood culture before subculture on agar media. chromogenic agars allow a presumptive identification of several candida spp. and facilitate the recognition of yeast isolates in samples containing several yeasts or yeast and bacteria in combination. the use of such plates has been shown to lead to a better identification of mixed cultures in a recent nordic eqa scheme including more than 50 laboratories. rapid species identification of the most important candida spp. is possible in the routine laboratory using easy commercially available kits. thus, a species identification of c. albicans, c. dubliniensis and c. krusei can be obtained within minutes using latex agglutination kits (bichro-dubli, krusei-color; fumouze diagnostics) and c. glabrata can be rapidly identified due to its high amounts of preformed intracellular trehalase enzyme (glabrata rtt; fumouze diagnostics). finally, pna probes and fluorescence microscopy can also be used for a same day identification of a range of the clinically relevant candida spp. (advandx). susceptibility testing is possible using etest and the results are comparable with those obtained by reference methodologies in head to head comparisons. however, recent data from eqa distributions suggest that detection of isolates with acquired resistance causes many laboratories difficulties. this illustrates that a critical number of isolates should be tested per technician per week and quality control strains should be included on a regular basis. 
in conclusion, a number of new diagnostic tests have become available over the last decade and the diagnostic laboratories are encouraged to take advantage of these new options. 19th eccmid, oral presentations since the introduction of newer antifungals with different in vitro spectra, the aetiology of invasive fungal infections (ifi) has become a major diagnostic issue as a prerequisite for a guided antifungal therapy. while molecular methods, such as pcr and sequencing for the diagnosis of ifi have been evaluated from specimens such as blood and bronchoalveolar lavage fluid for some years, they have been less studied for biopsies. characteristics inherent to these molecular methods, e.g. sensitivity, specificity and short turnaround time makes them promising as adjuncts to conventional diagnostic tests, e.g. culture and histopathology from organ biopsies. studies using tissue from animal models of mould infections suggest that pcr might be more sensitive than culture and allows for a better species identification than histopathology. however, most of these studies used assays detecting only a small range of agents or even single organisms. while this may increase the sensitivity of the assays and reduces the likelihood of contaminations it limits the usefulness in the clinical setting, given the broad range of potential fungal pathogens. studies using fresh clinical samples suggest that the detection and identification of a wide range of fungi is possible using broad range assays in combination with sequencing or by combining more specific pcr assays. further studies are needed to optimise dna extraction, define the best molecular targets and the best method for amplicon detection. the prevention of contaminations due to ubiquitous fungi and unspecific amplifications are a major problem, especially when using broad range assays. in contrast, fish probes may potentially be more specific than pcr due to the visualisation of fungal elements in tissue. 
in contrast to pcr, they appear to work well with formalin fixed specimens. species identification might be more challenging than by pcr and sequencing. direct comparisons between fish and pcr are needed to characterise the pros and cons of each method in determining the aetiology of ifi. molecular tissue diagnosis has the potential to evolve into a useful method to describe the aetiology of ifi even in culture negative samples. results might be obtained fast enough to guide the antifungal therapy in patients with ifi that progresses despite empiric antifungal therapy. in these patients, the risk associated with invasive tissue sampling might be outweighed by potential benefits of a guided antifungal therapy. the two groups of carbapenemases (serine carbapenemases and metallo-beta-lactamases (mbls)) can be encoded by genes that can be carried on plasmids. the serine carbapenemases are distinctly either class a or oxa (class d); the latter being mainly associated with acinetobacter spp. the dominant mbl subgroups, vim and imp, have genes that are reportedly carried on plasmids and chromosomes. recent evidence has shown that the majority of blavim-2 genes, even those initially reported, are indeed plasmid mediated, which probably accounts for their rapid dissemination. blavim-1 genes have been recently shown to be carried on incn and incw plasmids. the "brazilian" mbl gene, blaspm-1, is exclusively chromosomally encoded. the mbls sim-1 and aim-1 are both chromosomally encoded whereas gim-1 is encoded from a plasmid of approx. 48 kb. the recently described blakmh-1 gene is also carried on a plasmid (200 kb). thus far, only two mbl-positive plasmid sequences are available: those carrying blaimp-8 and blavim-7. the former carries other resistance genes and is approx. 302 kb (inchi2), whereas the latter is a small plasmid (24 kb) and shows similarities with incp plasmids. oxa carbapenemase genes have been shown to be both plasmid and chromosomally mediated. 
thus far, the blaoxa-23 and blaoxa-24/40 clusters can be both plasmid and chromosomal and have mainly been found in acinetobacter spp. the blaoxa-48 and blaoxa-58 clusters have been found in k. pneumoniae and acinetobacter spp., respectively, and both are plasmid mediated. blaoxa-48 and blaoxa-58 have been shown to be carried on 70 kb and 28-100 kb plasmids, respectively. a blaoxa-58 plasmid has been recently sequenced and shown to carry two different replicases. the class a carbapenemase genes, blakpc, blaimi-2 and blages are all carried on plasmids. blakpc is found mainly in k. pneumoniae and carried on plasmids that vary in size from 12 to 95 kb and mostly possess the origin of replication incn. however, kpc-2 has recently been described in a pseudomonas as being chromosomally mediated. blaimi-2 is exclusive to the usa and carried on a 66 kb plasmid although blaimi-1 is chromosomal. the blages genes have been found in p. aeruginosa and enterobacteriaceae, of which ges-2, 4, 5 and 6 have been shown to be plasmid mediated although little else is known. this lecture will provide a synopsis, discuss the evolution of resistance due to plasmids and briefly predict what we may face in the 21st century with respect to carbapenemase resistance. nosocomial infections caused by multidrug-resistant pathogens, especially gram-negative bacilli, have become a serious clinical concern in every healthcare setting worldwide. as well as carbapenem-hydrolysing metallo-β-lactamases, ctx-m-type β-lactamases, and quinolone-resistance genetic determinants such as qnr, aac(6')-ib-cr, and qepa, plasmid-mediated novel molecular mechanisms such as rmta, rmtb, rmtc, rmtd, arma, and npma responsible for pan-resistance to aminoglycosides have recently been identified in pseudomonas aeruginosa, acinetobacter spp., serratia marcescens, escherichia coli, klebsiella pneumoniae, proteus mirabilis etc. 
since 2003, and these enzymes indeed methylate 1405g or 1408a at the a-site of the bacterial 16s rrna, as found in aminoglycoside-producing actinomycetes. these plasmid-mediated 16s rrna methylases are speculated to have originated from nonpathogenic environmental microbes that produce aminoglycosides or similar compounds, so it is likely that several new enzymes will be identified in both clinical and livestock farming environments. because rmtb and arma have spread widely in asia, europe, america and australia via various pathogenic gram-negative bacilli, we should pay special attention to the further spread of such hazardous microbes. in my talk, i would like to give an outline of newly identified molecular mechanisms that confer pan-resistance to aminoglycosides in pathogenic microbes isolated from both human and veterinary environments.
acquired resistance to quinolones mainly results from chromosomal mutations responsible for modification(s) of dna gyrase and topoisomerase iv, and for a decrease of drug accumulation in bacteria due to decreased permeability and/or overexpression of efflux systems. plasmid-mediated quinolone resistance (pmqr) was first reported in 1998 from the usa, and two other mechanisms have been identified to date. the first pmqr determinants, qnr proteins, belong to the family of pentapeptide repeat proteins. five determinants have been identified: qnra, qnrb, qnrc, qnrd, and qnrs with 6, 20, 1, 1, and 3 different variants, respectively. they may act by binding directly to both dna gyrase and topoisomerase iv, thereby protecting them from quinolone inhibition. they confer resistance to nalidixic acid and reduced susceptibility to fluoroquinolones (fqs), but may facilitate recovery of mutants with higher levels of resistance.
the overall prevalence of qnra, qnrb, and qnrs determinants generally ranges from 1 to 5%, and they have been identified worldwide mostly in esbl-producing enterobacterial isolates. the origins of the qnra and qnrs genes were identified as shewanella algae and vibrio splendidus, respectively. the second type of pmqr determinant, aac(6')-ib-cr, is a variant of the aminoglycoside acetyltransferase aac(6')-ib, which confers resistance to kanamycin, tobramycin and amikacin. this variant possesses two substitutions (trp102arg and asp179tyr) that are sufficient for acetylation of ciprofloxacin and norfloxacin, with a 2-to-4-fold mic increase. the overall prevalence of aac(6')-ib-cr may range from 0.4 to up to 34%, and it has been reported mainly in escherichia coli and klebsiella pneumoniae. the third type of pmqr determinant, qepa, has been identified in two e. coli clinical isolates from japan and belgium. the qepa gene encodes a 14-transmembrane-segment putative efflux pump belonging to the major facilitator superfamily. this protein confers decreased susceptibility to hydrophilic fqs (e.g. norfloxacin, ciprofloxacin and enrofloxacin) with an 8-to-32-fold mic increase. the two epidemiological surveys for qepa may indicate its low prevalence (<1%). the natural reservoir of qepa remains unknown but might be an actinomycetal species. the discovery of three main mechanisms of pmqr within the last ten years is striking. it may reflect the emergence of novel mechanisms of resistance but also a deeper investigation of resistance mechanisms in clinical isolates.
emerging infections: can we cope with them?
a. kühn°, c. schulze, h. ranisch, p. kutzer, h. nattermann, r. grunow (berlin, frankfurt-oder, de) objective: little is known about the prevalence of francisella tularensis in humans and animals in germany.
interestingly, the pathogen emerged recently when several marmosets (callithrix jacchus) died from tularaemia and a group of hunters became infected in areas of western germany. to find out more about the distribution of the pathogen in eastern germany as well, we investigated the seroprevalence of tularaemia among foxes (vulpes vulpes) and raccoon dogs (nyctereutes procyonoides) in the area of brandenburg (around berlin). methods: sera of animals (n = 351 and n = 32, respectively) from the years 2007 and 2008 were tested for f. tularensis lps antibodies in an indirect elisa, and suspicious samples were confirmed by western blot for lps ladder recognition using protein g-pod conjugate. furthermore, we investigated the serum samples by a competitive elisa using a peroxidase-conjugated anti-lps monoclonal antibody. results: from the serum collection, 31 foxes (8.8%) and 3 raccoon dogs (9.4%) tested positive for specific f. tularensis antibodies. the geographical distribution showed hot spots within the investigated region. our results indicate a higher seroprevalence of tularaemia in wildlife in eastern regions of germany than assumed. since the reported human cases for the last decade seem to be underestimated, the real prevalence of the pathogen is unknown. the high number of tularaemia antibody-positive foxes and raccoon dogs indicates that this zoonosis is present in wildlife in eastern germany. however, the impact of transmission of zoonotic pathogens from wildlife to domestic animals and humans is not yet well studied.
in conclusion, the obtained data will contribute to creating an up-to-date strategy for more efficient control of the two rickettsial zoonoses.
objective: helicobacter pylori is established as the primary cause of gastritis and peptic ulceration in humans. in a minority of patients with upper gastrointestinal symptoms long tightly coiled spiral bacteria, clearly distinct from h. pylori, and provisionally named as "h.
heilmannii", can be observed in gastric biopsies. our objective was to isolate and identify the spiral organism resembling "h. heilmannii" from the gastric mucosa of a finnish patient presenting with severe dyspeptic symptoms. methods: we used two different selective media for the isolation of the bacteria from gastric biopsy samples before and after treatment of the patient with a 7-day course of lansoprazole, tetracycline and metronidazole. the isolates were characterised by testing for urease and catalase activity, light and electron microscopy, and sequencing of the partial 16s rrna and ureab genes. single-enzyme aflp was used to analyse the genetic diversity among the isolates. results: growth of long spiral organisms was obtained from 7 out of 8 antrum and all 8 corpus biopsies before treatment and from all three antrum biopsies after treatment of the patient. the partial 16s rrna gene sequence showed high sequence similarity with other gastric helicobacter species. the partial ureab gene showed high sequence similarity with h. bizzozeronii and was clearly distinct from other gastric helicobacter species. aflp indicated that the isolates belonged to the same clone; however, some minor genetic diversity was observed among the isolates.
results: b. pseudomallei was primarily found in close proximity to streams and in grass-rich areas but was also correlated with environmentally disturbed soil, such as that caused by the presence of animals, farming or irrigation. prediction maps are currently being verified by sampling predicted b. pseudomallei "hot-" and "cold-spots". the figure shows a prediction map for rural darwin, with red areas indicating high probability for the presence of b. pseudomallei. this study contributes to the elucidation of the environmental distribution of b. pseudomallei in endemic tropical australia and to the clarification of environmental factors influencing its occurrence. it also raises concerns that b. pseudomallei is spreading due to changes in land management.
o82 concurrent multi-serotypic dengue infections in various body fluids
w. kulwichit°, s. krajiw, d. chansinghakul, g. suwanpimolkul, o. prommalikit, p. suandork, j. pupaibool, k. arunyingmongkol, c. pancharoen, u. thisyakorn (bangkok, th) objectives: dengue virus infection is one of the rapidly spreading emerging diseases worldwide. the virus is divided into 4 distinct serotypes with limited cross-protective immunity; therefore, one can be reinfected with different serotypes. while each episode is usually caused by a single serotype, an individual can occasionally be infected by multiple serotypes concurrently. our group has previously detected dengue virus in urine and oral specimens of some patients. in this study, we sought to determine the characteristics of multi-serotype infections when analysing beyond the patients' blood compartments. methods: during 2003-2007, [...] and adult patients suspected of dengue infection were enrolled. plasma, peripheral blood mononuclear cells (pbmc), urine pellets, buccal brushes, and saliva were collected during and after the febrile episode. only specimens from patients with both positive dengue serology and pan-dengue-specific rt-pcr were included. serotype-specific rt-pcr was then performed on these specimens from each patient. results: 95 patients met the above criteria. serotyping was successful in 85 patients. den-4 was the most common serotype, accounting for half of the cases. 20 of these 85 (23.5%) demonstrated multi-serotypic infections when combining data from all specimen types in each individual. serotyping using single, conventional serum/plasma specimens, however, would have detected only half of these cases. the phenomenon of concurrent multi-serotypic infections was present in all examined specimen types, including urine pellets, buccal brushes, and saliva. the most frequent combinations were den-1 + den-4 and den-2 + den-4 (5 cases each).
two patients were simultaneously infected by serotypes 1, 2, and 4 and one by serotypes 1, 3, and 4. there was no demonstrable significant difference in clinical severity between single- and multi-serotypic infections. conclusion: in a dengue-hyperendemic country with simultaneous circulation of all four serotypes, concurrent multi-serotypic infections are more common than previously demonstrated by traditional serotyping of single serum/plasma specimens. this may be explained by the limited sensitivity of the detection method or by the biological behaviour of the virus. our findings have implications for potentially more accurate epidemiologic studies in the future, and for further exploratory investigations regarding dengue virus in various secretions and excretions.
o83 emerging concepts about the evolutionary history of hantaviruses
h.j. kang, s.n. bennett, l. sumibcay, s. arai, a.g. hope, j.a. cook, j.w. song, r. yanagihara° (honolulu, albuquerque, us; tokyo, jp; seoul, kr) objective: the recent discovery of genetically distinct hantaviruses in shrews (family soricidae), captured in widely separated geographic regions, challenges the conventional view that rodents are the principal and progenitor reservoir hosts of hantaviruses, and raises the possibility that other soricomorphs, notably moles (family talpidae), harbour hantaviruses. methods: using oligonucleotide primers based on conserved genomic regions of rodent- and soricid-borne hantaviruses, rna extracts from tissues of the japanese shrew mole (urotrichus talpoides), american shrew mole (neurotrichus gibbsii) and european common mole (talpa europaea) were analyzed for hantavirus sequences by rt-pcr. newfound s-, m- and l-segment sequences were aligned using clustal w and were analyzed phylogenetically by the maximum-likelihood and markov chain monte carlo tree-sampling methods, with the gtr+i+g model of evolution.
results: novel hantavirus genomes, designated asama virus (asav), oxbow virus (oxbv) and nova virus (nvav), were detected in tissues of urotrichus talpoides, neurotrichus gibbsii and talpa europaea, respectively. sequence and phylogenetic analyses indicated that asav and oxbv were related to hantaviruses harboured by soricine shrews in eurasia and north america, respectively. by contrast, phylogenetic analyses of full-length s- and l-segment sequences showed that nvav formed a unique clade, clearly distinct and evolutionarily distant from all other hantaviruses. despite the high degree of sequence divergence at the nucleotide and amino acid levels, the secondary structures of the nucleocapsid proteins, as well as the l-segment motifs, of the mole-associated hantaviruses were well conserved. conclusions: while cross-species transmission has influenced the course of hantavirus evolution, such host-switching events alone do not satisfactorily explain the co-existence and distribution of genetically distinct hantaviruses among species in two taxonomic orders of small mammals spanning four continents. when viewed within the context of molecular phylogeny and zoogeography, the close association between distinct hantavirus clades and specific subfamilies of rodents, shrews and moles is likely the result of alternating and variable periodic codivergence at certain taxonomic levels through evolutionary time. thus, the primeval hantavirus might have arisen from an insect-borne virus, with ancestral soricomorphs, rather than rodents, serving as the original mammalian hosts.
from south-eastern france
m. kaba, b. davoust, j.l. marié, m. barthet, m. henry, c. tamalet, j.m. rolain, d. raoult, p. colson° (marseille, toulon, fr) objectives: autochthonous hepatitis e is currently considered an emerging disease in industrialised countries and several studies suggest that hepatitis e is a zoonosis, especially in pigs, boars and deer.
we aimed to study whether hepatitis e virus (hev) is commonly present in domestic pigs in southern france, and to determine the relationship between hev sequences detected in pigs and those described in human hepatitis e cases. methods: serum and stool samples were collected from 207 three- or six-month-old pigs from different regions of southern france. 107 six-month-old pigs were from a slaughterhouse, and 100 three-month-old pigs were from a pig farm. swine igg anti-hev antibody testing was performed using a commercial elisa kit for clinical diagnosis with minor modifications. swine hev rna detection was conducted by real-time pcr and amplification/sequencing assays using in-house protocols targeting the 5' orf2 region of the hev genome. results: 40% of pigs were seropositive, and 65% of three-month-old pigs were hev rna-positive, whereas none of the six-month-old pigs were hev rna-positive. hev rna was significantly more frequently detected in stools than in serum (65% versus 22%; p < 0.001). phylogenetic analysis showed that swine hev sequences belonged to genotype 3f or 3e and formed two clusters within which sequences showed high nucleotide homology (>97%). these clusters were correlated with the geographical origin of pigs as well as with their distribution into pens and buildings in the pig farm where samples were collected. swine hev sequences from the present study were genetically close to hev sequences found in humans or swine in europe, although no strong phylogenetic link could be observed either with these latter sequences or with those from human hepatitis e cases diagnosed in the laboratory. conclusion: our data indicate that three-month-old farm pigs from southern france might represent a potential source of contamination to humans, and they underscore the great potential of hev to cause epizootic infections in populations of farm pigs.
o85 clostridium difficile: changing epidemiology trends, 2000-2007
objectives: clostridium difficile infection (cdi) has become a growing concern world-wide with an increased reported incidence and an increase in the associated financial burden. our aim therefore was to review trends in cdi occurring from 2000-2007 inclusive. methods: all patients admitted to lothian university hospitals division (luhd) tested for c. difficile toxins a+b by eia were included. retrospective analysis of prospectively collected data was performed. the number of occupied bed days was provided by nhs-lothian statistics department. the most recent published costs associated with cdi were used to estimate potential costs to lothian nhs trust. results: 50,590 faecal samples were tested for c. difficile toxins from 2000-2007 inclusive; of these 7301 samples were positive. overall cdi was identified in 15.2 cases/10000 patient days and 5.8 cases/1000 inpatient hospital admissions. the incidence of identified cdi rose from 3.6 cases/10000 patient days in 2000 to 14.8 cases/10000 patient days in 2007. incidence also increased with age from 3.3 cases/10000 patient days in the 0−20 years age group to 18.1 cases/10000 patient days in the 61−80 years age group. renal medicine and intensive care had the highest incidences of identified cdi, with greater than 57 cases/10000 patient days each, followed by infectious diseases and gastrointestinal medicine, whose rates were 47.5 and 42.6 cases/10000 patient days respectively. medicine of the elderly in comparison had an incidence of 19.5 cases/10000 patient days. of note, 10% of all patients were transferred through a minimum of two specialties during the period in which they remained positive for c. difficile toxins. estimated costs over the study period for toxin testing alone were in the region of £126,500 and the minimal potential hospitalisation costs of patients with cdi were in the region of £20,000,000.
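the rates quoted in these results are simple ratios scaled to a common denominator. as a minimal sketch (not the authors' analysis), the helpers below compute a sample positivity percentage and a rate per 10000 patient days. the function names are illustrative, and because the abstract does not report patient-day denominators, the second call uses hypothetical numbers purely to show the arithmetic.

```python
def positivity_percent(positive: int, tested: int) -> float:
    """Percentage of tested samples that were positive."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return positive * 100 / tested


def rate_per(cases: int, denominator: float, per: int = 10_000) -> float:
    """Cases per `per` units of the denominator (for example patient days)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return cases * per / denominator


# Figures taken from the abstract: 7301 positive out of 50,590 samples tested.
sample_positivity = positivity_percent(7301, 50590)
print(f"toxin-positive samples: {sample_positivity:.1f}%")

# HYPOTHETICAL numbers (not from the abstract): 30 cases over 20,000 patient days.
example_rate = rate_per(30, 20_000)
print(f"example rate: {example_rate:.1f} cases/10000 patient days")
```

note that the positivity figure counts samples, not patients, so it is not interchangeable with the case-based incidence rates reported above.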
conclusion: the incidence of patients identified with cdi has risen markedly and, not surprisingly, the incidence has also been noted to increase with age. medicine of the elderly, however, had a much lower incidence than several other specialties, and therefore risk assessment of cdi development and containment should now also be targeted within other specialties. with 10% of identified cdi patients transferred through different specialties and the significant financial burden cdi imposes on healthcare institutions, judicious application of infection control measures remains an important factor to prevent cdi spread.
isolates of this strain were pvl negative, but positive for enterotoxin a (sea) and, in most cases, also for seb, sek and seq. a fifth strain was the "taiwan clone", st59/952-mrsa-v (wa mrsa-9 and -52), which also comprised two closely related sequence types. this strain carried an sccmec element of type v(t) or vii as well as pvl and, usually, seb, sek and seq. it was the most common cc59 strain in wa. the sixth strain differed from the "taiwan clone" in the presence of an sccmec type v element and in the absence of pvl. the differentiation of this clonal complex into various different strains indicates a rapid evolution and spread of sccmec elements, and the diagnostic microarray technology allows one to distinguish beyond mlst level and hence to accurately trace outbreaks and spread of these strains.
(a) sample taker 12 has daily contact with poultry and is excluded from analysis. (b) sample taker 5 reported no contact with livestock elsewhere than in this study at that moment (spa-types of sample taker 5 and farm are not corresponding). (c) sample taker 6 tested mrsa-negative in following tests. (d) sample taker 9 was not tested again.
complete data sets (samples taken before, directly after and 24 hours after a visit) were collected on 141 visits by 29 sample takers visiting 50 farms. on 28 farms mrsa was collected from pigs or stable dust (56%).
these farms were visited 78 times by 23 different sample takers. one sample taker (#12) was positive for mrsa before visiting a farm and was removed from the following analysis. fifteen of the 78 (19%) visits to mrsa-positive farms resulted in acquisition of mrsa and 11/23 (48%) sample takers acquired mrsa at least once after visiting a positive farm. of these 11 positive sample takers, 2 acquired mrsa twice and 1 sample taker acquired mrsa three times after separate visits. of the 15 acquisitions of mrsa, 13 were negative after 24 hours. the spa-types of mrsa isolates found on the farms and sample takers were broadly comparable. on the 32 negative farms, none of the 60 visits resulted in mrsa acquisition. for further information see the table. discussion: mrsa-cc398 was acquired by 48% of the sample takers after occupational exposure in this study. however, in 11 of the 13 cases the strain was not recovered the next day; acquisition was therefore of short duration, posing a limited threat to human health. some persons seemed to be more prone to acquiring mrsa during their work. the sample size of this study was too small to draw final conclusions concerning this inter-personal variation. this requires a more extensive study.
objectives: community-associated mrsa is an increasing problem and an association with food animal contact has been made in some regions. this has led to concerns about the potential role of food in mrsa transmission. the objective of this study was to evaluate the prevalence of mrsa colonisation of retail pork in canada. methods: pork chops, ground pork and pork shoulders were purchased at retail outlets in four canadian provinces in conjunction with the canadian integrated program for antimicrobial resistance surveillance. both direct inoculation of meat into enrichment broth and rinsing of meat in broth were performed for pork chops and shoulders, followed by inoculation onto chromogenic agar.
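prevalence estimates of the kind this study reports (for example 31 mrsa-positive samples out of 402) are usually accompanied by a 95% confidence interval. the abstract does not say which interval method was used, so the following is only a sketch under the assumption of a wilson score interval; the function name is illustrative. by our arithmetic it gives roughly 5.5% to 10.7% for 31/402, which appears consistent with the reported interval.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract's results: 31 positive samples out of 402.
low, high = wilson_interval(31, 402)
print(f"prevalence {31 / 402:.1%} (95% CI {low:.1%} to {high:.1%})")
```

the wilson interval is chosen here because it behaves better than the simple normal approximation when the proportion is small, as it is for these sample counts; other methods (for example exact binomial limits) would give slightly different bounds.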
ground pork was tested only using the direct method. mrsa isolates were typed by pfge and spa typing. real-time pcr was used to detect panton-valentine leukocidin genes. results: mrsa was isolated from 31/402 (7.7%, 95% ci 5.5−10.7%) samples. there was a significant difference between provinces (p < 0.001) but no difference between different products, with mrsa isolated from 23/296 (7.7%) pork chops, 7/94 (7.4%) ground pork and 1/12 (8.3%) pork shoulders (p = 0.99). 21/403 (5.2%) samples were positive using direct culture while mrsa was isolated from 15/355 (4.2%) samples tested using the rinse method. nine samples were positive on direct culture but negative using the rinse method, while 10 others were positive only with the rinse method and only 5 were positive with both methods. seven samples (ground pork) that were positive on direct culture were not tested using the rinse method. 3 main clones were present. the most common (40% of isolates) was a group of 3 related spa types (t064, t008 and a new related type) that were classified as canadian epidemic mrsa-5 by pfge, an st8 human epidemic clone that has been associated with horses. pfge-non-typable spa t034 isolates were, not surprisingly, common, accounting for 30% of isolates. the 3rd main group consisted of 3 related spa types (t002, t045 and a new type) that were cmrsa-2 (usa100), an st5 clone that is common in humans in canada, and also accounted for 30% of isolates. the clinical relevance of mrsa contamination of pork is currently unclear. it is possible that contact with contaminated food could be a mode of mrsa transmission in the community, although further study of the prevalence of contamination, the amount of mrsa in contaminated samples, sources of contamination and implications for human health is required.
o95 prevalence of the novel trimethoprim resistance gene dfrk among german staphylococcal isolates of the bft-germvet monitoring study
k. kadlec°, s.
schwarz (neustadt-mariensee, de) objectives: very recently a novel trimethoprim resistance gene, dfrk, was identified on a tet(l)-harbouring plasmid in a porcine mrsa isolate from the bft-germvet monitoring study. this study included in total 248 independent coagulase-positive and coagulase-variable staphylococci collected between 2004 and 2006 all over germany: 46 isolates from infections of the urinary-genital tract of pigs, 44 isolates from skin infections of pigs, 57 isolates from respiratory tract infections of dogs/cats, and 101 isolates from infections of skin/ear/mouth of dogs/cats. in this study, we investigated the prevalence and the plasmid location of the dfrk gene among these isolates. methods: pcr primers were designed and a pcr with subsequent restriction analysis of the pcr product was established to detect dfrk. isolates with positive results were tested for a plasmid location of dfrk by transfer experiments and dfrk-carrying plasmids were further analysed. the trimethoprim resistance gene dfrk was detected in another 10 isolates. all isolates were from pigs: 9 from skin infections and the remaining 1 from a urinary-genital tract infection. they comprised six staphylococcus hyicus subsp. hyicus isolates, 3 s. aureus isolates (2 mrsa and 1 mssa) and 1 s. pseudintermedius isolate. all these isolates harboured plasmids. in 7 isolates (4 s. hyicus, 2 mrsa and the single s. pseudintermedius), the plasmid location of dfrk was confirmed by protoplast transformation with subsequent susceptibility testing and pcr analysis of the transformants. in all 7 cases, the plasmids harbouring dfrk also carried a tet(l) tetracycline resistance gene. the results of a combined pcr assay with primers from tet(l) and dfrk confirmed that the dfrk gene was always located immediately downstream of the tet(l) gene.
further analysis of these dfrk- and tet(l)-harbouring plasmids showed that they varied in size between 6 and 40 kb and that similar-sized plasmids differed in their ecorv and hindiii restriction patterns. the novel trimethoprim resistance gene dfrk occurred in 11 (12.2%) of the 90 porcine staphylococcal isolates from the bft-germvet study. in 8 (72.7%) of the 11 isolates, it was located on structurally diverse plasmids, but always in close proximity to a tet(l) gene. the linkage of the dfrk and tet(l) genes allows the maintenance and coselection of such plasmids under selective pressure by either tetracyclines or trimethoprim, both of which are widely used in veterinary medicine.
(table). the isolates were resistant to ciprofloxacin, clindamycin, erythromycin and gentamicin but susceptible to vancomycin. only one se was methicillin-susceptible and two isolates were quinupristin/dalfopristin non-susceptible. all strains were clonally related and clustered into three subtypes (a, a1 and a2). the cfr gene was detected in a linezolid non-susceptible strain (mic, 64 mg/l), which was recovered from a 57-year-old male who underwent liver transplantation. plasmid analysis identified six plasmid bands ranging from ca. 1.5 to 154 kb in the cfr-carrying strain. hybridisation signals were observed from the 154-kb plasmid band as well as from a chromosomal band after i-ceui digestion. mutations in the 23s rrna, l4 or l22 were not detected. cfr increased the linezolid mic value 8- to 16-fold. this report highlights the ability of se to acquire linezolid resistance. the potential mobility of cfr, combined with the clonal tendency for dissemination among staphylococcus spp., represents a serious threat to several potent gram-positive-active agents, including oxazolidinones. active surveillance combined with effective infection control and molecular studies seems prudent to minimise the spread of these resistance mechanisms.
the objective is to get a glimpse of the potential impact of infectious diseases on music, with regard to the composer's or performing musician's own disease, living conditions or other relevant elements which might have affected the end result, the music we enjoy today. as music is an art of senses, full of drama, despair, realities of life − or just the opposite, blissful ignorance of those realities, full of romance, beauty, and delicacy − various forms of music were researched, paying special attention to infections which potentially have played a significant role in the birth of that particular piece or performance. the entire research process was subjective, biased, and emotional, but done wholeheartedly. it aimed at taking into account not only the personal life of a composer or performing musician, but also the historical context in which the music was born. musical examples, served to the audience along with the essential background data, will show the extent to which infections have impacted music. regarding the aetiology of those infections, bacterial, viral and parasitic agents are well represented. in addition, many epochs in history have played their role. sometimes, the connections are surprising, even dramatic. if listened to with a tender ear, music quite often turns out to be affected also by infectious diseases. as physicians we should realise the strength with which some people are driven by this demonic, divine − but altogether beautiful force: music.
the prevalence of antibiotic resistance has been increasing in asian countries in recent years. this problem has most likely arisen due to a combination of inadequate infection control practices, particularly in hospital settings, and the widespread misuse of antibiotics in hospital and community settings. factors that lead to antibiotic misuse include inappropriate antibiotic prescription due to a lack of clinical, microbiological and/or imaging data in many clinical settings in the asian region.
a lack of separation of prescribing and dispensing by medical practitioners, as practised in many countries in asia, as well as the easy availability of over-the-counter medications also contribute to antibiotic misuse. optimal control of antibiotic use can only be achieved through a multipronged approach that includes better education of the public and medical practitioners on rational use of antibiotics, a review of the health system structure, as well as better control of over-the-counter sales of antibiotics. upgrading of microbiology and other laboratories and radiological facilities that will enhance the accuracy of clinical diagnosis is also urgently needed in most developing countries to keep pace with the complexities of managing patients in this new era and to minimise the widespread practice of inappropriate antibiotic use.
examination of the csf for microorganisms, wbc and differential counts, and concentrations of glucose and protein is the primary investigation to diagnose meningitis. however, this csf examination may not always be conclusive, and it can be difficult to distinguish bacterial from viral meningitis. therefore, improvement in the diagnostic sensitivity and specificity for bacterial meningitis and development of rapid tests for a bacterial aetiology are still needed. this presentation gives a review of the strengths and weaknesses of several analyses and methods to reveal the microbiological agent (i.e. csf microscopy and culture, antigen or antibody detection, molecular methods to detect dna or rna) and the use of several mediators of the host immune response for diagnostic and prognostic purposes.
bacterial meningitis is a medical emergency that requires a multidisciplinary approach. a diagnosis of bacterial meningitis is often considered, but the disease can be difficult to recognize. recommendations for antimicrobial therapy are changing as a result of the emergence of antimicrobial resistance.
in this lecture, current concepts of the initial approach to the treatment of adults with bacterial meningitis will be summarised. the management of the critically ill patient with bacterial meningitis poses important dilemmas. controversial areas (e.g., prehospital admission antibiotics) will be reviewed and relevant literature will be discussed in the framework of current treatment guidelines, highlighting new developments in adjunctive dexamethasone therapy.
acute bacterial meningitis (abm), especially when caused by infection with streptococcus pneumoniae, still has an unacceptably poor prognosis with a mortality of 10−30%. bacterial infection of the meninges causes one of the most powerful inflammatory reactions known in medicine. as early as 50 years ago, this inflammatory reaction was suggested to contribute substantially to brain damage. this concept underlies the use of anti-inflammatory agents as adjunctive therapy in abm. of all adjunctive treatments in abm, only corticosteroids have been properly evaluated in clinical trials. on the basis of these trials, corticosteroids are recommended in patients with haemophilus influenzae type b and pneumococcal meningitis (pm). however, adjunctive corticosteroid therapy has several weaknesses such as a narrow treatment window and borderline effects on neurologic sequelae. thus, there is still the need for additional or alternative adjuvants in the therapy of abm. experimental studies using animal models (predominantly of pm) have provided insight into the pathogenic mechanisms underlying brain injury in abm. it is now clear that the autodestructive inflammatory reaction is initiated by the interaction of bacterial components with host pattern recognition receptors (prr) like toll-like receptors (tlr). prr signaling results in the activation of transcription factors like nf-kb which up-regulate the production of proinflammatory cytokines.
cytokines like il-1b are also potent triggers of nf-kb activation and therefore can exaggerate the inflammatory reaction (via positive feedback loops). as a consequence, great numbers of neutrophils are recruited to the meninges. activated neutrophils release many potentially cytotoxic agents including oxidants and matrix metalloproteinases that can cause collateral damage to brain tissue. in addition to the inflammatory response, direct bacterial cytotoxicity has been identified as a contributor to tissue damage in abm. thus, experimental studies point at four different targets of adjunctive therapy, namely interference with (i) the induction of inflammation (e.g., tlr blockade), (ii) the exaggeration of inflammation (e.g., il-1 antagonism), and (iii+iv) the generation of cytotoxic factors (either of host or bacterial origin, e.g., scavenging of oxidants). this presentation will give an overview of the pathophysiology of abm (with special emphasis on pm) and highlight promising targets for adjunctive therapy in abm, as deduced from experimental studies. a clinician's approach to managing difficult infections s120 acute post-surgical prosthetic joint infection optimal management of prosthetic joint infections (pji) remains undefined. important issues such as when the implant can be retained (conservative strategy), the optimal duration of antimicrobial therapy (at) or the role of rifampin are still matters of controversy. in spite of a number of reports, the literature appears confusing. among the limitations of the literature we must emphasize: 1) different criteria to classify pji; 2) different criteria to select for conservative strategy (cs); 3) no description of the initial population from which patients were selected for cs; 4) very different at (from 4 weeks to chronic suppressive therapy); 5) low numbers of patients or short follow-up; 6) absence of clinical trials. it is not so surprising that the rates of cs success have varied from 0 to almost 100%. 
the most useful classification to approach pji was proposed by tsukayama (1996). in his series 25 out of 35 patients with early pji managed by a cs (debridement, exchange of polyethylene and implant retention) were cured after 4 weeks of at. the spanish group for the study of pji was constituted in 2003 within the spanish network for the study of infectious pathology (reipi), a publicly funded initiative. data from 139 consecutive cases of early pji attended in 10 hospitals were recorded in an online database. 117 cases managed with cs could be analysed (mean follow-up of 2 years). sixty-seven patients (57.3%) were cured after a mean of 81 days of at. in 35 (29.9%) the infection was not controlled (or relapsed) after a mean of 84 days of at, and the implant had to be removed. in another 15 patients (12.8%) the implant was not removed, but suppressive at was given because of suspected ongoing infection. results were significantly worse in one hospital. no other factors were statistically significant, but there was a trend towards worse results for infections caused by mrsa (p = 0.06). time from the appearance of symptoms to debridement was shorter in successfully treated cases (median, 7 days) than in failures (median, 10 days); p = 0.08. good functional results were obtained in patients with successful cs. in summary, a substantial proportion of early pji can be managed with cs and a definite (non-suppressive) at. it is difficult to identify patients at higher risk for failure, although mrsa aetiology and longer time until debridement seem to predict failures. different outcomes in some centres suggest that surgical technique could be an important factor for failure. more than 3 million cardiac pacing systems are implanted worldwide and the estimated rate of infections after implantation of permanent endocardial leads is 1% to 2%, but varies between 0.1 and 20%. 
pacemaker infections correspond to different clinical situations, ranging from localised infection in the device pocket or of the pacemaker leads to systemic infection associated with bacteraemia and lead-associated endocarditis. the latter represents 10 to 25% of all cases of pacemaker infections. the severity of pacemaker-related infective endocarditis is reflected in a mortality of 10 to 20%. risk factors for infection of implanted pacemakers include fever within 24 h before implantation, temporary pacing before implantation and early re-interventions (haematoma, lead dislodgment). in contrast, an inverse correlation is observed between development of infection and antibiotic prophylaxis and implantation of a new system. data to guide therapy in patients with pacemaker infection are limited and the most appropriate management remains to be determined. according to different series, staphylococci accounted for 60 to >90% of the responsible organisms. coagulase-negative staphylococci (cns) are reported as the predominant pathogens, followed by staphylococcus aureus. biofilm production, responsible for bacterial survival, and the emergence of methicillin resistance in s. aureus and cns have complicated the management of pacemaker infections. this implies that empiric treatment of suspected pacemaker infection should cover staphylococci including methicillin-resistant strains. streptococci, corynebacterium spp, propionibacterium acnes, gram-negative bacilli and candida spp can cause occasional infections. the optimal therapy combines complete device extraction (percutaneous ablation or surgical removal during extracorporeal circulation) and a prolonged course of antibiotics, in particular in case of multiresistant bacteria. leaving the device intact is associated with increased mortality and risk of relapsing or persistent infections. 
in the absence of prospective studies, the duration of antibiotic treatment remains to be determined but 1 month has been shown not to be associated with an increased incidence of relapse. a shorter course of treatment (2 weeks) has been proposed in case of vegetations strictly localised to leads without affecting cardiac valves. antibiotic therapy alone should be reserved for highly selected patients. infection remains the most critical complication of ventriculoperitoneal shunt placement with an incidence of 2.2−39%. factors such as the age of the patient, the aetiology of hydrocephalus, the type of shunt implanted, and the surgeon's experience have been found to be associated with increased risk of infection. children are more likely than adults to acquire shunt infection. the possible reasons are longer hospital stay, higher skin bacterial concentrations, immature immune systems, or more adherent strains of bacteria. staphylococci, as skin commensals, are the main causative organisms. nevertheless, in recent years a change in the epidemiology of microorganisms was observed with an increase of gram-negative bacteria. appropriate systemic antibiotics according to the antimicrobial susceptibility testing and surgical removal of the shunt with temporary external cerebrospinal fluid drainage and shunt replacement following the eradication of the infection are the cornerstone of the treatment of cerebrospinal fluid shunt infections. good compliance with infection control practices, insertion of the catheter under aseptic techniques and short-term perioperative antimicrobial prophylaxis in order to prevent the emergence of drug-resistant subpopulations are important steps in the prevention of shunt infections. o125 influenza in adults admitted to canadian hospitals: data from two seasons a. mcgeer, d. gravel, g. taylor°, c. weir, c. frenette, j. vayalumkal, a. wong, d. moore, s. michaud, b. 
amihod (toronto, ottawa, edmonton, montreal, saskatoon, sherbrooke, ca) objective: seasonal influenza (flu) remains a cause of substantial morbidity and mortality. antiviral treatment should be considered for all hospitalised patients with influenza. to better understand the epidemiology and burden of illness within the hospital sector in canada and the current use of antiviral therapy, we carried out a multihospital survey of virologically confirmed flu in hospitalised adults. methods: cnisp is a network of largely teaching hospitals across canada that collaborates to collect data on infections in hospitalised patients. during two consecutive years (2006/2007 and 2007/2008) hospitals within cnisp identified inpatients >16 years who had virologically confirmed flu. case patient charts were reviewed to capture demographic and clinical data and to determine whether flu was community (ca) or hospital acquired (ha). cases were reviewed at 30 days to determine outcomes. deaths at 30 days were reviewed to determine whether flu was a main or contributing cause. results: fifteen (06/07) and 11 (07/08) hospitals were recruited from the cnisp network. 532 virologically confirmed cases of flu were found, 182 in 06/07 (95% flu a) and 358 in 07/08 (56% flu a). mean patient age was 67 years, 52% were male. there was documentation of patient vaccination that season in 29%. incidence of ca flu was 11/10,000 admissions in 06/07 (range by hospital 2 − 23) and 27 in 07/08 (1 − 47). admitting diagnoses in ca cases were: pneumonia or influenza 48%, exacerbation of copd 20%, sepsis or fever not otherwise specified 9%, cardiac diagnoses 7%, other diagnoses 16%. 24% of cases were ha, range by hospital 3.9 − 5.4/100,000 patient days. 68% of patients were managed with droplet and contact isolation practices, an n-95 mask was used in 19%. 29% of ca cases but 75% of ha cases received antiviral therapy p < 0.01, almost entirely oseltamivir. 
9% of cases were admitted to an icu; 30-day mortality was 8% with 2.6% attributed to influenza. conclusion: there is considerable season-season and hospital-hospital variation in flu in patients in canadian hospitals. hospitalised patients with ca flu present with a wide spectrum of clinical diagnoses; nearly a quarter of all cases were ha. few ca cases but most ha cases were treated with antiviral drugs. attributable 30 day mortality was 2.6%. v. papastamopoulos, e. kakalou°, t. panagiotopoulos, j. baraboutis, m. samarkos, a. skoutelis (athens, gr) objectives: our study sought to describe influenza vaccination coverage among adults in greece for the season 2007/08. methods: we conducted a random-sampling, telephone based household survey among adult individuals in greece. for this purpose a sample of 1104 adults representative of the basic demographic, social and geographical characteristics of the overall greek population according to the latest national survey, was used. two target groups were determined for analysis: persons >65 years of age and persons with chronic conditions such as respiratory and heart conditions (other than hypertension), diabetes mellitus and other conditions. results: the influenza vaccination rate for the season 2007/08 among the adult population in greece was: 16% for the overall adult population (19.5% for men, 12.7% for women), 48.1% for people >65 years of age, 31% for persons with chronic illness (32.5% for persons with respiratory illness, 50.2% for persons with heart conditions, 35% for persons with diabetes mellitus). a high proportion of the overall population (81%), reaching 88% among persons with chronic conditions, reported having had any type of contact with the national health system or a private physician within the last three years. among them only 20.1% had been recommended to get vaccinated. 
among the ones recommended any vaccination, 80.5% of persons with respiratory illness, 100% of persons with diabetes mellitus and 89.1% of persons with heart conditions had been recommended to get the influenza vaccine. conclusions: available data show unacceptably low levels of influenza vaccination coverage among vulnerable groups such as the population over 65 years of age and people living with chronic illness. influenza vaccination is the only preventive measure reducing influenza morbidity and mortality and its use has proven cost-effective among high risk groups. it is also the main vaccine recommended by physicians. however the overall rate of physicians' recommendation of vaccination is very low. dynamic efforts are thus needed to design and implement strategies and policies that have demonstrated their rigorous effectiveness in enhancing influenza vaccination coverage rates. conclusions: nasopharyngeal sampling with flocked swabs is well tolerated and suitable to be used in an outpatient setting. implementation of real-time mono and multiplex naats results in a significant improvement of the rate in diagnosing lrti. hrv account for the majority of viral lrti in primary care followed by influenza and coronaviruses but also rsv and hmpv are prevalent in an adult population. in this study, 19 polyomaviruses were detected of which 10 were involved in a double infection. methods: observational analysis of a prospective cohort of 1041 nonseverely immunosuppressed adults with pp requiring hospitalisation (1995−2008). of them, 556 were diagnosed by urinary antigen and/or 650 were diagnosed by culture. overall, 86% of pneumococcal strains were available for serotyping (quellung) and 58% for pfge (smal) and/or mlst. 
the diagnosis of septic shock was based on a systolic blood pressure <90 mmhg and peripheral hypoperfusion with clinical or bacteriologic evidence of uncontrolled infection. results: a total of 114 (11%) patients with pp had septic shock at presentation. patients with shock were younger (61 vs 66 yrs; p = 0.003), were more frequently current smokers (45% vs 28%; p = 0.002), had received more commonly corticosteroid therapy (13% vs 6%; p = 0.015), and were more frequently classified into high-risk psi classes (81% vs 60%; p < 0.001) than those who did not have this complication. they were also less likely to have received prior influenza vaccine (31% vs 48%; p = 0.007) and had more frequently bacteraemia (41% vs 30%; p = 0.014). no significant differences were found in rates of penicillin-(2% vs 2%) and erythromycin-resistance (16% vs 12%). serotype 3 was more commonly associated with shock (40% vs 24%; p = 0.007), whereas serotype 1 was rarely associated with this complication (2% vs 9%; p = 0.041). no significant differences were found regarding genotypes: st2603 (26% vs 16%), netherlands-ser8-st53 (10% vs 3%), netherlands-ser3-st180 (10% vs 8%), spain-ser9v-st156 (10% vs 12%). patients with shock required more frequently mechanical ventilation (38% vs 4%; p < 0.001), and had longer los (19 vs 10 days; p < 0.001). early (10% vs 1%; p < 0.001) and overall case-fatality rates (25% vs 5%; p < 0.001) were higher in patients with shock. conclusions: pp presenting with septic shock is still associated with a poor outcome. it occurs mainly in current smokers, patients receiving corticosteroids, and in those infections caused by serotype 3. prior influenza vaccination and pp caused by serotype 1 are associated with a lower risk of shock. 
o131 high long-term mortality rate after initial recovery from severe community-acquired pneumonia background: despite the presence of antibiotics and vaccination strategies against pneumococci, community-acquired pneumonia (cap) is still a major cause of mortality in developed countries. however, it is unclear how an episode of cap influences long-term survival after initial recovery. therefore, we determined mortality up to 5 years after discharge in patients hospitalised because of an episode of severe cap in a non-intensive care setting. methods: in 5 hospitals in the netherlands, patients (pts) with severe cap (psi class iv and v without need for treatment in icu) were prospectively followed for 28 days and mortality up to 5 years after discharge was determined using the dutch municipal public records database. we used cox regression analysis to examine predictors for mortality. results: compared to strategy 2, strategy 1 resulted in slightly higher costs (chf 8,748 vs. 8,981) but fewer infections (0.008 vs. 0.006) during patients' mean length-of-stay, producing an incremental cost-effectiveness ratio (icer) of chf 83,303 per mrsa infection avoided. strategy 3 was dominated by strategies 1 and 2 (both more costly and less effective). sensitivity analyses suggest that prevalence of colonisation on admission is a stronger predictor of cost-effectiveness than the costs of infection or rapid screening, the probability of cross-transmission, or the incremental costs of isolation and contact precautions. increasing the relatively low on-admission prevalence at our centre by 20% lowers the icer to chf 60,973 per infection avoided. in contrast, increasing the cost of each infection, the cost of rapid screening, or the risk of cross-transmission by 20% only marginally affects the icer. 
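the icer arithmetic described above can be sketched in a few lines. this is a minimal illustration, not the authors' decision model: the function name and arguments are hypothetical, and it assumes that the higher cost (chf 8,981) and the lower infection probability (0.006) both belong to strategy 1. with the rounded figures printed in the abstract the ratio comes out near chf 116,500 rather than the reported chf 83,303, presumably because the published infection probabilities are rounded.

```python
import math

def icer(cost_new, cost_ref, infections_new, infections_ref):
    """Incremental cost per infection avoided for a 'new' strategy
    compared with a reference strategy (hypothetical helper)."""
    delta_cost = cost_new - cost_ref
    infections_avoided = infections_ref - infections_new
    if infections_avoided <= 0:
        # No infections are avoided, so no cost-per-infection-avoided
        # ratio is reported; a strategy that also costs more is dominated.
        return None
    return delta_cost / infections_avoided

# Rounded per-patient values from the abstract, assigned to strategies
# under the assumption stated in the lead-in.
value = icer(cost_new=8981, cost_ref=8748,
             infections_new=0.006, infections_ref=0.008)
print(f"ICER: CHF {value:,.0f} per infection avoided")

# A strategy that is both more costly and less effective yields no ratio.
print(icer(cost_new=9500, cost_ref=8748,
           infections_new=0.009, infections_ref=0.008))
```

returning no ratio for a dominated strategy mirrors how strategy 3 is handled in the results: it is excluded from the comparison and no icer is quoted for it.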
conclusion: this analysis suggests that compared to risk factor identification and pre-emptive isolation, universal rapid screening upon surgical admission is not strongly cost-effective at our centre. however, local epidemiology plays an important role. in particular, settings with higher prevalence of colonisation on admission may find universal rapid screening more cost-effective. of note, no screening is undesirable, as costs and infections would be higher. results: admission and weekly screening coupled with patient isolation was found to dramatically reduce the number of mrsa acquisitions. the largest reductions were obtained with pcr technology, followed by chromogenic agar. the differences, however, were surprisingly small, and all screening technologies achieved reductions in mrsa acquisition of close to 80% compared with the no-intervention scenario. nonetheless, chromogenic and pcr-based systems were able to decrease the number of unisolated mrsa-bed-days by approximately 15 and 35% respectively. conclusions: the small differences in the ability of the screening technologies to reduce mrsa acquisition reflect both a relatively low estimated isolation efficacy and the observed highly skewed distribution of icu-stays, and may provide some important insights into the reasons for recent disappointing trial results. in particular, the skewed length of stay distribution means that most mrsa-bed days are accounted for by relatively long-stay patients for whom rapid detection will make the least difference. key sources of uncertainty were found to be isolation effectiveness and attributable mortality due to mrsa infections, both of which are difficult to accurately estimate with currently available data. the model results allow us to quantify the expected value of reducing these key uncertainties, and help to provide a rational basis for setting future research priorities. 
objectives: we have shown that there is substantial colonisation of mrsa among nursing home residents and staff with our recently conducted point prevalence study in 45 nursing homes which revealed an overall prevalence rate of 24% in residents and 7.6% in staff.1 the aim of this study was, therefore, to test the effectiveness of an intervention in nursing homes which sought to improve standards of infection control as a means of reducing mrsa prevalence. methods: a cluster randomised controlled trial (crct) involving 32 nursing homes, with each home representing the unit of analysis, was performed. the study ran for 12 months with data collected at baseline, 3, 6 and 12 months. nasal swabs were taken at baseline from consenting residents and staff in all homes prior to randomisation with an audit of infection control procedures also undertaken. following collection of these baseline data, nursing homes were allocated to the intervention or control arm (1:1). intervention home staff were trained in infection control, specifically hand hygiene, catheter care, barrier approaches such as use of gloves, aprons and masks, and decontamination of equipment and the environment with usual practice continuing in control homes. after each data collection timepoint, feedback was given to the intervention homes in terms of their performance and further education and training provided as required. the primary outcome was the prevalence of mrsa in intervention homes compared to control sites. results: preliminary analysis of the data has revealed no significant change in the prevalence of mrsa in the intervention and control homes, taking account of the clustering, over the one-year intervention period [risk ratio 0.83; 95% confidence intervals (ci) 0.53−1.29]. 
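for readers unfamiliar with the reported statistic, a risk ratio and its 95% confidence interval can be sketched with the standard log-transformation method. the counts below are hypothetical (the abstract does not report them), the function name is an illustration only, and this simple version ignores the clustering by nursing home that the trial analysis accounted for, so it would generally give a narrower interval than a cluster-adjusted analysis.

```python
import math

def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald interval on the
    log scale. Requires at least one event in each group."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 20 of 100 colonised in intervention homes,
# 25 of 100 colonised in control homes.
rr, lower, upper = risk_ratio_ci(20, 100, 25, 100)
print(f"risk ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

an interval that spans 1, as in the trial result quoted above, is compatible with no difference in prevalence between the two arms.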
however, there was an improvement in infection control audit scores in the intervention homes, with a mean score in control homes at 12 months of 64.4% compared with 81.7% in the intervention sites; these scores were significantly different (paired t-test, p < 0.0001). the results suggest that infection control education and training as implemented in this study was not sufficient to affect mrsa prevalence. therefore, a more detailed education and training package either alone or in combination with mrsa decolonisation of staff and residents, may be required to reduce mrsa prevalence within this unique environment. objectives: in a response to the rapid global increase in the nosocomial prevalence of multi-resistant micro-organisms, infection control measures, such as patient isolation, are increasingly used. it is unknown how these measures influence the quality of life (qol) of patients during short-term isolation, and this was determined in a prospective matched cohort study. methods: all adult patients needing isolation in a single-patient room between 11/06 and 03/07 in the umc utrecht were eligible and included 24−48 hours after start of isolation (after giving informed consent and being able to fulfil study requirements). for each index patient we identified two control patients, admitted to the same wards at the same time, yet not subjected to any isolation measure. anxiety and depression and qol were assessed using the hospital anxiety and depression scale (hads) and visual analogue scale (eq-5d-vas) in all patients. opinions on and experiences with isolation were measured in isolated patients by means of a self-developed 'isolation evaluation questionnaire'. results: 42 isolated patients and 84 controls were included, with comparable baseline characteristics (age, sex, nationality, level of education, length of hospital stay and severity of underlying disease and co-morbidity (using the cumulative illness rating scale)). 
reasons for isolation were clostridium difficile-associated disease (n = 17, 40%), high risk for mrsa carriage (n = 12, 29%), or resistant gram-negative bacteria (n = 7, 17%). mean scores of questionnaires are presented in table 1. in univariate analysis only duration of isolation of 48 hours (compared to 24 hours) was associated with a reduced quality of life (vas 57.7 compared to 68.7, p = 0.02). on a visual analogue score of opposite terms isolation measures were rated with means of 87.5, 83.3 and 70.8 for safety, usefulness and quietness, respectively. conclusion: short-term isolation (up to 48 hours) is not associated with anxiousness or depression, but with positive feelings about safety, usefulness and quietness.
table 1. index patients (n = 42), mean (sd): 4.7 (3.5), 5.3 (3.5), 9.9 (6.0), 62.3 (15.5); control patients (n = 84), mean (sd): 5.4 (3.7), 5.2 (3.6), 10.6 (6.
objectives: there is a lack of data about the impact of healthcare associated infection (hai) on the experience of individual patients. this information is essential to empower health organisations to understand, prioritise, develop and implement solutions that will minimise risks to patients. this study explored comparable narratives from patients who had experienced a staphylococcus aureus blood stream infection with patients who had not. we conducted qualitative semi-structured interviews with eighteen adults who had previously been an in-patient in an acute teaching hospital in scotland. nine patients had had a laboratory diagnosed staphylococcus aureus blood stream infection and nine had no blood stream infection. all patients were interviewed for 20−40 minutes. the interviewer asked patients about their thoughts around hai, what concerns they had or still do, what measures they took to safeguard themselves from hai and how their experience impacted on their confidence in the nhs. probing questions were then asked depending on the responses given to the initial questions. 
all interviews were recorded, transcribed and analysed thematically. results: analysis of transcribed interviews is ongoing. preliminary analysis showed that all patients had positive and negative comments about infection prevention and control practice in the hospital. specific concerns included poor communication, poor cleanliness, awareness of patient boarding, lack of facilities, staff shortages and multi-tasking. some patients who had experienced bacteraemia said they had not been informed about the infection. those who had been informed were not given clear information about treatment or subsequent results. most patients were not specifically told what they or their family should do to safeguard them from infection and little or no written information about hai was provided. most patients are worried about hai on future admissions. the concerns of patients were not fundamentally different if they did or did not experience blood stream infection. the patients' reported experiences show that they have a broad awareness of systems issues that may increase risk of infection. consequently we need to involve patients in the design and evaluation of systems change and information that will improve patient experience. improving the safety and reliability of the system will have direct benefits for all patients in the hospital, not just the ones at risk of hai. analysis of surgical specialties separately revealed a significant reduction of mortality in cardiothoracic surgery patients who had been treated with mup-chx (2.3% (5/218) vs. 6.5% (11/170), p = 0.040, figure). in other surgical specialties no significant difference was found. conclusion: peri-operative application of mup-chx in nasal carriers of s. aureus undergoing cardiothoracic surgery results in a threefold reduction of mortality after one year. o142 a lot done, more to do − a survey of teaching about healthcare-associated infections in uk and irish medical schools h. humphreys°, d. o'brien, j. richards, k. 
walton, g. phillips (dublin, ie; norwich, newcastle-upon-tyne, dundee, uk) objectives: patient safety and the prevention of healthcare-associated infections (hcai) are increasingly important health issues. medical doctors have traditionally been poor in complying with preventative measures to minimise hcai such as hand hygiene. we surveyed medical schools in the uk and ireland to assess what is being taught and assessed in this area. methods: a questionnaire was drafted, piloted and then subsequently forwarded to the heads of medical schools as well as to known contact professionals with an interest in hcai in 38 medical schools. the questionnaire surveyed topics covered in the curricula, the modalities used to assess knowledge and practice, the usefulness of various teaching methods and materials, e.g. lectures, and what education resources were available. results: replies were received from 31 (82%) medical schools; two supplied data on their undergraduate and postgraduate courses. only 18 (60%) covered hcai as a quality and safety issue but over 90% covered prevalence, recognised risk factors, transmission, and preventative measures. 24 (80%) medical schools assessed competence in undertaking aseptic techniques and the disposal of sharps and mcqs were the most common (87%) means of assessment. case scenarios, resource materials and clinical skills stations were used in educating students in 26 (87%), 22 (73%) and 22 (73%) medical schools respectively. 25 (83%) medical schools would be willing to share educational resources on hcai with other medical schools. conclusions: medical schools in the uk and ireland include hcai in their curricula but its importance as a safety and quality issue needs to be further emphasized. there is potential for agreeing a core curriculum on hcai and for sharing teaching resources such as videos and e-learning material. 
objectives: noroviruses are the most common cause of outbreaks of gastroenteritis in uk national health service hospitals, leading to ward closure costing as much as £115 million per annum. using a detailed data set on norovirus outbreaks from three hospital systems in the south west of england, we estimated (1) the relative importance of introduction of norovirus from the community and within the hospital and (2) the cost effectiveness of ward closure at different time points during an outbreak. methods: using regression models we examined the association between number of new outbreaks in a hospital and community levels of activity and number of outbreaks currently occurring in other wards within the hospital. we examined the effect of different ward types (admission, general and long stay units) and whether the ward was open or closed to new admissions on a given day. we then undertook an analysis of the cost (and cost-effectiveness) of unit closure by developing a dynamic transmission model taking into account that ward closure may reduce norovirus transmission within and between wards. the stochastic simulation model was based on the actual characteristics of an acute hospital and the norovirus transmission parameters quantified in the statistical analysis. we measured the costs and benefits of closing affected wards at 1, 3 and 5 days after the onset of symptoms in the first case. results: community level of norovirus infection had a significant effect on the occurrence of new outbreaks as did outbreaks in admission and general medical units. the cost of closing wards to new admissions varied between £0.5 million and £0.9 million depending on the assumed effectiveness of closure in curtailing transmission. the cost of lost bed days, compared with staff illness, accounted for around 90% of the total cost of closure. 
although the total number of cases tends to fall with rapid ward closure (by around 50% compared with no closure), the actual cost of control is similar regardless of when the closure is performed. we have developed a modelling framework to assess the effectiveness and cost-effectiveness of strategies to control norovirus outbreaks in hospital settings. ward closure is effective at preventing cases but since closure itself is an expensive intervention, it may not always be cost-effective. other prevalent ribotypes were 001 (25%) and 106 (36%). 76% of the 027 isolates originated from 5 hospitals located in 2 healthboard areas. the remaining 18 isolates of ribotype 027 originated from 11 hospitals across scotland. in vitro 96% of 027 isolates were resistant to clindamycin with a mic range of 8−24 mg/l, mic50 of 12 mg/l and mic90 of 16 mg/l. furthermore 100% of the 027 isolates were highly resistant to erythromycin (mic50 256 mg/l, mic90 256 mg/l), and to levofloxacin and moxifloxacin (mic50 32 mg/l, mic90 32 mg/l for both), while 65% of these isolates were resistant to cefotaxime (mic50 64 mg/l, mic90 96 mg/l). all 027 isolates were susceptible to metronidazole, vancomycin, meropenem and piperacillin-tazobactam. high frequencies of clindamycin, erythromycin, levofloxacin, moxifloxacin and cefotaxime resistance were also found among isolates of ribotype 001 (90−99%) and 106 (94−100%). conclusion: until 2008 c. difficile ribotype 027 was only reported infrequently in scotland. in 2008, reports of ribotype 027 became more frequent and clusters were detected in 5 hospitals. the majority (96%) of ribotype 027 isolates were resistant to clindamycin. three other european countries have previously reported clindamycin resistance in pcr ribotype 027, albeit with a higher mic90 of >256 mg/l. 
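the mic50 and mic90 values quoted above are the lowest concentrations that inhibit at least 50% and 90% of the tested isolates. a minimal sketch of that calculation follows; the mic values in the example are hypothetical, chosen only to fall within the 8−24 mg/l range reported for clindamycin, and the function name is an illustration rather than part of any laboratory software.

```python
def mic_percentile(mics, percent):
    """Lowest MIC at which at least `percent` % of isolates are inhibited.

    `mics` holds one MIC value (mg/l) per isolate; `percent` is an
    integer from 1 to 100 (50 for MIC50, 90 for MIC90).
    """
    if not mics:
        raise ValueError("at least one MIC value is required")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(mics)
    # Rank of the isolate that completes the requested share,
    # i.e. ceil(n * percent / 100) computed with integer arithmetic.
    rank = (len(ordered) * percent + 99) // 100
    return ordered[rank - 1]

# Hypothetical clindamycin MICs (mg/l) for ten isolates.
example_mics = [8, 12, 12, 12, 12, 16, 16, 16, 16, 24]
print("MIC50:", mic_percentile(example_mics, 50), "mg/l")
print("MIC90:", mic_percentile(example_mics, 90), "mg/l")
```

because both statistics are order statistics, a single highly resistant isolate raises the upper end of the mic range without necessarily moving the mic90.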
objectives: to analyze trends in mortality due to clostridium difficile enterocolitis and to describe the most affected groups in order to better understand the changing epidemiology of clostridium difficile. methods: we reviewed mortality data from the flanders and brussels regions in belgium (about 7 million inhabitants). we selected those records in which icd-10 code a04.7 (enterocolitis due to clostridium difficile) appeared as the underlying cause of death on the death certificate. age- and sex-specific mortality rates were calculated for the period 1998-2006. direct standardisation was performed using the european standard population and 95% confidence intervals were calculated. stata 10 ® and excel ® were used as statistical software. objectives: toxigenic clostridium difficile is an enteric pathogen typical of the hospital environment, but community-acquired cases have also been reported. however, relatively few attempts have been made to clarify the role of soil or water as a source of c. difficile infection. in november-december 2007, the drinking water distribution system in the town of nokia, finland was massively contaminated with treated sewage effluent, resulting in a large gastroenteritis outbreak. the aim of the present study was to evaluate whether contaminated water in this outbreak was also a potential source of c. difficile infection. a sample from the contaminated tap water and a treated sewage effluent sample were collected as soon as possible after the massive faecal contamination of the drinking water distribution system had occurred. c. difficile was isolated from heat-treated water samples by filtering 100 ml, 10 ml and 1 ml volumes of water and placing the membranes on selective ccey agar plates, which were incubated anaerobically for 3 d. stool samples from patients who fell ill during the epidemic were examined for enteric pathogens, including c. difficile. all potential c.
difficile colonies were subcultured on ccfa agar plates and toxin-positive isolates were identified by pcr. pcr ribotyping was performed according to the protocol of the anaerobe reference unit in cardiff, uk, using the cardiff-ecdc culture collection as a set of reference strains. after gel electrophoresis, the band patterns were analyzed using the bionumerics software. results: altogether 22 c. difficile isolates were found in water samples. twelve isolates were toxin-positive; 5 isolates were from contaminated tap water and 7 isolates from treated sewage effluent, the latter being the contamination source. among the tap water and sewage effluent isolates, 4 and 5 distinct pcr ribotype profiles were identified, respectively. the 9 human faecal c. difficile isolates detected were divided into 4 distinct pcr ribotype profiles. none of the profiles were identical with that of the hypervirulent pcr ribotype 027. two isolates, one from tap water and another from a patient, had an indistinguishable pcr ribotype profile. conclusion: our observation implies that c. difficile contamination of a tap water distribution system had occurred. waterborne transmission of toxigenic c. difficile and subsequent c. difficile infection seems possible. objectives: an accurate and rapid method is needed for typing of toxigenic clostridium difficile. a commercial automated repetitive pcr system (rep-pcr; diversilab ® , biomérieux inc., st louis, usa) utilises amplification and subsequent automated electrophoretic separation of the repetitive extragenic palindromic sequences of c. difficile. our aim was to evaluate the performance of this rep-pcr method for genotyping of c. difficile isolates and to compare it to pcr ribotyping. in addition, the correlation between the rep-pcr and the virulence gene profiles of c. difficile strains was studied. methods: a total of 195 toxin-positive c. difficile isolates were studied. 
we included consecutive isolates from two laboratories in finland, including strains of the hypervirulent c. difficile ribotype 027. in addition, selected c. difficile strains with >18 bp deletions in their tcdc genes were analyzed. the dna was extracted and the rep-pcr performed according to the manufacturer's instructions. the amplification products of rep-pcr were detected and analyzed using the diversilab system. further analysis was performed with the web-based software accompanying the system. the usefulness of the library construction option of the diversilab system for isolate comparison was tested. the virulence genes (tcda, tcdb, cdta, cdtb and tcdc) were analyzed by conventional pcr, and whole-gene sequencing of tcdc was performed for isolates with deletions >18 bp. pcr ribotyping was performed using the protocol of the anaerobe reference unit in cardiff, uk. the correlation between the rep-pcr profile and the ribotype was excellent. all major ribotype groups clustered in their own rep-pcr groups. interestingly, subgroups could be found with rep-pcr within the two most prevalent ribotypes, 001 and 027. the automated rep-pcr proved to be reproducible; the results from separate dna isolations and pcr runs/microfluidic electrophoresis, as well as the results obtained by different members of the laboratory personnel, were comparable. the rep-pcr profiles and pcr ribotypes also correlated with the virulence gene profiles. conclusion: this automated rep-pcr represents an effective and reproducible method for the genetic characterisation of c. difficile strains in clinical laboratories with molecular biology facilities. the constructed c. difficile library allows the relatedness of c. difficile strains and their fingerprints to be compared over time. objectives: clostridium difficile infection (cdi) is a serious diarrhoeal illness associated with high morbidity and mortality.
currently available treatments (oral vancomycin or metronidazole) usually produce good resolution of diarrhoea but are associated with a 20% to 30% incidence of recurrence. opt-80, the first in a new class of macrocyclic antibiotics, is bactericidal via unique inhibition of rna polymerase. this phase 3, non-inferiority clinical trial was conducted at more than 100 sites in north america and compared the efficacy and safety of opt-80 and vancomycin in treating cdi. methods: eligible patients were adults with acute cdi symptoms and a positive stool toxin test. patients received oral opt-80 (200 mg twice daily) or oral vancomycin (125 mg 4 times daily) for 10 days. the primary end point was clinical cure (resolution of symptoms and no further need for cdi therapy 2 days after stopping study drug). the secondary end point was cdi recurrence (diarrhoea and positive stool toxin test within 4 weeks after treatment). global cure was defined as a clinical cure with no recurrence. results: 629 patients were enrolled and 87% were evaluable. in the per protocol (pp) population (n = 548), mean age was 61.3±17.1 years and 44.0% of patients were male. equivalent rates of clinical cure were observed with opt-80 (92%) and vancomycin (90%) in the pp analysis; similar outcomes were observed in a modified intent-to-treat (mitt) analysis. significantly fewer patients treated with opt-80 (13%) than with vancomycin (24%) experienced recurrence in the pp analysis (p = 0.004) and in the mitt analysis (15% vs 25%; p = 0.005). significantly more opt-80-treated patients than vancomycin-treated patients achieved global cure in the pp analysis (78% vs 67%; p = 0.006) and in the mitt analysis (75% vs 64%; p = 0.006). opt-80 was well tolerated, with an adverse event profile similar to that of vancomycin. in this study, the largest comparative trial of a new antimicrobial agent versus vancomycin for the treatment of cdi, clinical cure rates after treatment with opt-80 or vancomycin were equivalent.
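To put the per-protocol recurrence rates reported above (13% with opt-80, 24% with vancomycin) on a common scale, the sketch below derives the absolute risk reduction, relative risk and number needed to treat. These derived figures are not reported in the abstract; they are computed here from the rounded percentages only, so they are approximate and carry no confidence interval. The function name `risk_summary` is hypothetical.

```python
def risk_summary(risk_treatment, risk_control):
    """Return (absolute risk reduction, relative risk, number needed to treat)."""
    if not 0 < risk_control <= 1 or not 0 <= risk_treatment <= 1:
        raise ValueError("risks must be proportions, with a non-zero control risk")
    arr = risk_control - risk_treatment
    if arr == 0:
        raise ValueError("no risk difference, so NNT is undefined")
    rr = risk_treatment / risk_control
    nnt = 1 / arr
    return arr, rr, nnt


# Per-protocol recurrence proportions as rounded in the abstract.
arr, rr, nnt = risk_summary(risk_treatment=0.13, risk_control=0.24)
print(f"ARR = {arr:.2f}, RR = {rr:.2f}, NNT = {nnt:.1f}")
```

Working from patient counts rather than rounded percentages would give slightly different values, which is why the output should be read as indicative only.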
however, opt-80 was associated with a significantly lower recurrence rate and a higher global cure rate than vancomycin. opt-80 is an oral, non-absorbed agent that has a convenient (twice daily) dosing schedule and low risk of adverse events. opt-80 represents a potential new treatment option for cdi that is associated with a lower recurrence rate than currently available treatments. results: sequence analysis (sa) revealed that locus a is absent in type 078 and that some mismatches are present in the primer annealing sites for loci b, c and g. lowering the annealing temperature and increasing the magnesium chloride concentration for loci b, c and g resolved the low yield of pcr products. applying the mlva to 54 type 078 strains revealed that 42 (80%) strains, encompassing isolates from human (n = 42) and porcine (n = 11) origin, are genetically related, with a summed tandem repeat difference (strd) ≤10. three clonal complexes (cc, defined by strd ≤2) were recognized; one cc contained both human (n = 4) and porcine (n = 3) strains. the optimised mlva identified 3 genetically related clusters and 6 cc among the 67 isolates from e and ni. ccs contain isolates from more than one hospital and, for several clusters, isolates from both e and ni. 2 isolates obtained from ni 8 years earlier were part of one large cc. the optimised mlva can distinguish and/or group type 078 strains from distinct settings. type 078 strains of human and animal origin are genetically related. the clustering of some isolates from distinct settings is consistent with community sources for type 078. the last 2 observations suggest zoonotic transmission. objectives: this paper updates our assessment of the contribution that community-associated clostridium difficile infection (cdi), as reported to the english mandatory surveillance scheme since 2007, makes to both the acute and community sectors of the national health service (nhs) in england.
methods: nhs acute trusts (hospital groups) in england are required to report, via a web-enabled reporting system, all c. difficile toxin-positive diarrhoeal specimens processed by their laboratories, whether the patients were in hospital or in the community at the time of onset of the illness or when the specimen was taken. positive specimens from the same patient within 28 days are not reported. reported cases in patients under 2 years of age were omitted from this analysis. enhanced surveillance data on cdi (including information on date of admission, patient location prior to testing, sex, age and patient category) have been collected through a web-enabled reporting system since april 2007. risk factor information is completed on a voluntary basis. results: more than 75,000 cases of cdi in patients aged >2 years were reported; specimens for 23% of these cases were taken in non-acute settings, of which 74% were taken by a general practitioner. a further 17% of specimens were taken on presentation or <2 days after admission into an acute trust. approximately 32% of all cases had at least one risk factor field completed. more than 19,000 cases reported risk factor information on episode category; 23% of these cases were community associated and 77% were hospital acquired. the information reported suggests that only 3% of the community-associated cases were from patients with continued infection or relapsed episodes of cdi, compared with 8% of the hospital-acquired cases. conclusions: 23% of the c. difficile specimens reported by acute trusts were diagnosed in a community setting. published studies suggest that 12-15% of these might be expected to have been acquired during a hospital stay within the previous month (i.e. were community onset hospital acquired cases). future work is required to investigate whether there are differences in the epidemiology, risk factors e.g.
antibiotic exposure and outcome of patients with community onset disease. o152 clostridium difficile-associated disease: a newly notifiable disease in ireland m. skally, f. roche, d. o'flanagan, p. mckeown, f. fitzpatrick (dublin, ie) new cases of clostridium difficile-associated disease (cdad) became notifiable in ireland on 4th may 2008. the main objective of this new notification process was to provide a national overview of the epidemiology and burden of cdad. this paper reviews the first six months of preliminary data notified. methods: the interim case definitions for new and recurrent cdad cases proposed by the european society for clinical microbiology and infectious diseases (escmid) study group for c. difficile were employed. this report reviews the weekly events of cdad extracted from the computerised infectious disease reporting (cidr) system in january 2009. census of population 2006 figures were used as denominator data in the calculation of incidence rates. the results presented represent 34 weeks of data submitted. results: there were 1581 new cdad cases notified on cidr between 4th may 2008 and 27th december 2008, representing a crude incidence rate (cir) of 37.3 cases/100,000 population (the estimated annual cir is 57.0 cases/100,000). all cases were laboratory confirmed. there was a higher occurrence of cases in females. the male:female ratio for the period was 1:1.6. in 0.4% of cases the sex was unknown. 71.4% of cases were in the over-65 years age category. the preliminary data submitted on cidr indicate that 63.0% of cases were hospital inpatients and 8.9% of cases were either gp patients or outpatients. the origin of 28.1% of samples is unknown. there was large variation between the 8 public health regions (table 1). the incidence of cdad in ireland is prominent in older age groups and in healthcare settings. what is more remarkable is the regional variation of cases reported.
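The crude incidence rate and its annualised estimate quoted in the O152 results can be reproduced from the notified case count and the observation period. The sketch below is illustrative only: the abstract says census 2006 figures were the denominator but does not give the number, so the national total of 4,239,848 used here is an assumption, and the helper names are hypothetical.

```python
def crude_incidence_rate(cases, population, per=100_000):
    """Cases per `per` population over the observation period."""
    return cases / population * per


def annualised_rate(rate, weeks_observed, weeks_per_year=52):
    """Scale a rate observed over part of a year to a 52-week estimate."""
    return rate * weeks_per_year / weeks_observed


CASES = 1581                       # new cdad cases notified (from the abstract)
WEEKS_OBSERVED = 34                # weeks of data submitted (from the abstract)
ASSUMED_POPULATION = 4_239_848     # assumption: Ireland census 2006 national total

cir = crude_incidence_rate(CASES, ASSUMED_POPULATION)
annual_cir = annualised_rate(cir, WEEKS_OBSERVED)
print(f"CIR = {cir:.1f} per 100,000; annualised = {annual_cir:.1f} per 100,000")
```

Under that assumed denominator the two values agree, to one decimal place, with the 37.3 and 57.0 cases per 100,000 reported in the abstract. The annualisation is a simple linear scaling and takes no account of seasonality, which the authors note cannot yet be assessed.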
the incidence varies from 9.1 per 100,000 in the north east to 52.4 per 100,000 in the west. the seasonal trend cannot be discerned at present because of late and batch notifications from institutions. o153 clostridium difficile-associated diarrhoea in immunosuppressed patients with cancer objective: to assess the epidemiology, clinical features and outcome of clostridium difficile (cd) associated diarrhoea in immunosuppressed patients with cancer. methods: review of all episodes of cd associated diarrhoea documented in adults with cancer and haematopoietic stem cell recipients (2000-2008). microbiologic diagnosis included cd isolation from stool samples, direct detection of cd toxin, and testing for cytotoxin production by the isolated strain. we documented a significant increase of cd associated diarrhoea, from 0.34/1000 admissions in 2000 to 4.05/1000 admissions in 2008 (p < 0.01). there were 56 episodes in 54 patients. thirty-one patients were male (55%) with a mean age of 52 years (± 16). forty-three (77%) patients had an underlying haematological disease and 13 had a solid tumour; 41 (73%) had received previous chemotherapy, 14 (25%) were stem cell transplant recipients (3 presenting with gvhd) and 17 (30%) were neutropenic (<500). in the previous month 52 patients (93%) had received one or more antibiotics (cephalosporins 63.5%, glycopeptides 40%, carbapenems 38.5%, betalactam + betalactam inhibitors 29%, quinolones 19%). fever >38ºc (71%) and abdominal pain (44%) were the most frequent manifestations, and the diarrhoea was haemorrhagic in 8% of the cases. most patients (77%) were treated with metronidazole (median 11 days), and the antibiotic therapy was discontinued in 56%. in 5 patients who had recovered from neutropenia, the diarrhoea resolved just by discontinuing the antibiotic therapy. no patient developed toxic megacolon or needed surgery. three patients (5.5%) had relapses.
overall mortality (<30 days) was 22% (12 patients). the incidence of cd associated diarrhoea in cancer patients has increased significantly in recent years. it is related with important morbidity and mortality. better strategies to improve its prevention and treatment are needed. s154 linking research to the clinic: how laboratory findings relate to management of invasive candida infections the role of the research laboratory in the management of invasive candida infections goes beyond routinely available tests for identification of candida species and susceptibility testing of antifungal agents. cutting-edge molecular epidemiology technologies have been used to type isolates of candida species based on their dna sequences. multilocus sequence typing schemes have been designed for c. albicans, c. dubliniensis, c. glabrata, c. krusei and c. tropicalis. multi-locus sequence typing can be used to investigate possible hospital outbreaks of infection (finding widely different strain types within a unit indicates no outbreak, although the converse is not true). for c. albicans, typing multiple isolates from the same patient has shown that people tend to harbour as commensals a mixture of closely related but different strain types, which may provide for selection of the most appropriate type for invasion of a particular tissue or in response to antifungal treatment. strains in c. albicans clade 1, the largest group of related strain types, have a higher proportion of isolates resistant to flucytosine than other clades, and they all share a common resistance mechanism. research on mechanisms of resistance of candida species to many types of antifungal has progressed to the point that some investigators are looking to design dna chips that could be used both for identification and for susceptibility testing of a candida isolate. much research effort goes into detailed study of host-fungus crosstalk in experimental candida infections. 
animal models of infection have been greatly refined and the latest research shows how early release of chemokines that attract neutrophils into infected tissues contributes to the immunopathology of candida infection. this rapid, innate immune response also emphasizes the need for antifungal intervention at the earliest possible stage to provide the best chance for successful treatment of a disseminated candida infection, a finding now supported by clinical data as well as experimental models. translation of the latest research advances into practical diagnostic tests and new therapeutic approaches for candida infections always takes a long time (typically years), and not all research results find clinical applications. however, the level of effort invested in basic candida research ensures support for steady progress in diagnosis and management. the echinocandins are semi-synthetic lipopeptides that are increasingly used for the prevention and treatment of invasive fungal infections. understanding the pharmacokinetic and pharmacodynamic (pk/pd) characteristics of these compounds is critical for their optimal clinical use. the echinocandins have potent in vitro activity against candida spp., although c. parapsilosis is less susceptible than other candida species. the molecular mechanisms of resistance in candida species, which relate to amino acid substitutions in 'hot spots' within the fks1 gene, are becoming well characterised. susceptibility breakpoints for all three clinically available compounds have been determined recently by the clinical and laboratory standards institute, with a 'susceptible-only' breakpoint of ≤2 mg/l suggested. the pk/pd of the echinocandins have been determined in experimental models of disseminated candidiasis, and of both disseminated and pulmonary invasive aspergillosis.
these studies suggest that the echinocandins: (1) display concentration-dependent antifungal killing (or effect); (2) are extensively distributed into peripheral tissues, where they exhibit prolonged mean residence times at the site of infection; (3) are fungicidal against candida spp. and induce dose-dependent morphological changes in aspergillus spp.; and (4) result in a diminished propensity for angioinvasion by aspergillus spp. recent evidence also suggests that the echinocandins have important immunomodulatory properties, which may contribute significantly to their observed antifungal effect. pk/pd modelling and laboratory animal-to-human bridging techniques have been used to identify safe and effective dosages for the echinocandins for relatively uncommon clinical syndromes such as neonatal haematogenous candida meningoencephalitis. these techniques are an efficient method of identifying effective regimens for humans that can be expedited for study in clinical trials. pk/pd modelling techniques can and should be used to address outstanding clinical queries in relation to these compounds, including optimal dosages, decision-support analysis for the setting of in vitro antifungal susceptibility breakpoints and the clinical relevance of inherent or acquired reduced antifungal susceptibility. s156 invasive candidiasis: which antifungal treatment for which patient? management of patients with invasive candidiasis represents a complex issue owing to the heterogeneity of patients in whom these infections occur. established risk factors for invasive candidiasis, which include total parenteral nutrition, multiple organ failure and candida colonisation, are common to many types of patients that are treated within the critical care setting. furthermore, the severity of the underlying condition in these patients necessitates swift antifungal treatment to ensure optimal outcomes. 
an additional factor for consideration when treating candida infections is the changing epidemiology of candida species; potentially fluconazole-resistant species such as c. glabrata and c. krusei are becoming more common, particularly in patients with prior fluconazole exposure. a range of antifungal agents is available with in vitro activity against candida species. however, not all of these agents are suitable options for the clinical management of invasive candidiasis because of the overall complexity of both infection and underlying condition. for example, the position of the polyenes, particularly amphotericin b deoxycholate, is becoming less tenable as the risk of renal complications is increasingly regarded as unacceptable in patients that are likely to have or be at risk of multiple organ failure. furthermore, because of the increasing prevalence of fluconazole-resistant species, recent guidelines no longer recommend the use of azoles as first-line treatment for invasive candidiasis except in special cases, focusing instead on the echinocandin agents. there is now a wealth of clinical data available for the echinocandins. micafungin, for example, has been assessed in invasive candidiasis in clinical trials that included a wide variety of underlying conditions and patterns of infection, including neutropenic patients and those with deep infections such as peritonitis. furthermore, micafungin is the most extensively evaluated of the echinocandins in paediatric patients, having been tested both in children up to the age of 16 years and in premature infants and neonates. optimal management of patients with invasive candidiasis depends on a strategy that takes into account the complex nature of the disease. judicious selection of antifungal treatment should be accompanied by consideration of non-drug-related factors that improve survival, such as careful assessment of intravenous catheters and their potential involvement in candida infections. 
patients with invasive candidiasis often have underlying conditions that are severe illnesses in themselves. these range from neutropenia during cancer chemotherapy to the multi-organ failure of intensive care unit patients. against this background of severe underlying illness, it can be difficult to appreciate the success or otherwise of treatment strategies for candida infections. in the last decade, major advances have been made in antifungal therapy with the introduction of 1. echinocandins; 2. extended-spectrum azoles; and 3. lipid formulations of amphotericin b. robust clinical studies for their successful use in candidaemia have been published. however, it is important to translate these studies into practical strategies for the care of individual patients. in this presentation, individual cases will be used to provide insights into the successes and failures of these antifungal classes for the management of invasive candidiasis. specific interest will be focused on the use of fluconazole versus the echinocandins. these micafungin-based cases will be supported by insights from the evidence-based literature combined with practical experiences at the bedside. the factors to be considered are: 1. spectrum of activity; 2. drug toxicity; 3. drug interactions; 4. drug resistance; 5. pharmacology; 6. diagnosis; 7. site of infection; 8. use of biomarkers/cultures in treatment strategies; and 9. costs. it is important to realise that large clinical trials exclude many patients with invasive candidiasis. therefore, with the use of individual cases, it is possible to provide further insights into the clinical use of these outstanding antifungal agents. patient management: the era of rapid diagnostic results (symposium organised by cepheid) s161 will community mrsa and clostridium difficile change infection control in hospitals? 
infections caused by methicillin-resistant staphylococcus aureus (mrsa), vancomycin-resistant enterococci, and clostridium difficile are inter-related in healthcare institutions. the emergence of epidemic mrsa and c. difficile strains has placed a greater burden on infection control systems in healthcare facilities, which often must increase surveillance and change disinfection strategies to halt the transmission of these pathogens in hospitals. ironically, the usa300 mrsa strain arose in the community but is now being transmitted frequently in healthcare settings, while the epidemic nap1/bi/027 c. difficile strain was originally a healthcare-associated pathogen and is now causing considerable morbidity in community settings. to successfully slow the spread of these pathogens, infection control must work closely with both the laboratory and pharmacy services to ensure that these organisms are detected rapidly and that the selective pressure that maintains the organisms in the institution is reduced. clearly, bundles of interventions, rather than single approaches, are necessary to contain the spread of these organisms in hospitals. the continued influx of patients with community-acquired mrsa and c. difficile infections into healthcare institutions is a challenge for infection control practitioners that will clearly increase in the future. the food-borne pathogen l. monocytogenes, discovered by murray in 1926, is responsible for a severe infection with various clinical features (gastroenteritis, meningitis, meningoencephalitis and materno-foetal infections) and a high mortality rate (30%). the disease is due to the ability of listeria to cross three host barriers during infection: the intestinal barrier, the placental barrier and the blood-brain barrier. it is also due to the capacity of listeria to survive in macrophages and to enter non-phagocytic cells, such as epithelial cells.
recovery from infection and protection against reinfection are due to a t-cell response, which explains why listeria has for many years been a model in immunology. nearly three decades of molecular biology and cell biology approaches, coupled with genetic and post-genomic studies, have placed listeria among the best models in infection biology. in-depth studies of the mechanism of entry into cells have helped to unravel how listeria crosses the intestinal and placental barriers. unsuspected concepts in cell biology were discovered. post-genomic studies have recently unveiled the listeria transcriptional landscape during the switch from saprophytism to virulence. the talk will give an overview highlighting recent results in the framework of well-established data. the last several decades of research in medical mycology have offered great insights into fungal cell biology, epidemiology, phylogenetics and the cells and molecules involved in the pathogenesis of fungal disease. a legitimate question is to what extent our extensive advances in comprehension of the biology of fungal pathogens have contributed to improvements in diagnosis and treatment. to what extent do patients benefit from translation of basic research into tools for clinical management? and the equally valid question: to what extent does biological science benefit from the study of fungi that are opportunistic pathogens? the speaker will examine some of these questions from the perspective of long experience in the field and the curmudgeonly attitude that develops with age. objectives: the incidence of invasive meningococcal disease (imd) has been reported in the czech republic since 1943. in response to the emergence of a new hypervirulent clonal complex, cc11, nationwide enhanced surveillance of invasive meningococcal disease was implemented by the national reference laboratory for meningococcal infections (nrl) in 1993. the case definition is consistent with the ecdc guidelines.
culture and pcr are used for confirmation of cases. notification is compulsory and is performed by local epidemiologists. strains of neisseria meningitidis isolated from imd cases are referred by the field laboratories to the nrl to be characterised by serogrouping, pora and feta sequencing (http://neisseria.org/nm/typing/) and multilocus sequence typing (mlst) (http://pubmlst.org/neisseria/). in the nrl, the epidemiological database is matched against that of strains to avoid duplicate reporting in the final enhanced surveillance database. results: despite the stable trend in imd incidence (0.8/100 000) since 2005, the case fatality rate was high (11.8%) in 2007. the disease was caused mainly by serogroup b meningococci (67.4%) in 2007, followed by serogroups c (20.9%) and y (9.3%). the most frequent clonal complexes were cc18, cc41/44 and cc32 (typical for serogroup b) and cc11 (typical for serogroup c). the highest age-specific morbidity rates were observed in the lowest age groups, i.e. 0-11 months and 1-4 years (11.4/100 000 and 4.5/100 000, respectively), and were associated with a high prevalence of serogroup b. the case fatality rate was highest in infants under 1 year of age (38.5%). the incidence of imd caused by serogroup c is currently low and there is no indication for mass vaccination with menc conjugate vaccine. a menb vaccine is needed for infants, but the sero/subtype coverage by the currently developed porin-based vaccines is low for czech meningococcal isolates (maximum 56.8% for the nine-valent meningococcal pora vaccine). methods: the vaccination programme incorporates a dedicated vaccine clinic with a multidisciplinary team, including a nurse, a data manager and a pharmacist specifically appointed to the unit.
additional interventions to improve vaccine uptake and outcome have included the use of sms texting to announce the annual availability of influenza vaccine and to improve adherence to completion of hepatitis b vaccination, educational programmes, changes in guidelines (e.g. varicella vaccination) and the creation of a vaccine passport. we reviewed vaccination clinic activity in the cohort of 1,700 hiv positive patients since the introduction of a dedicated vaccine service. results: there has been a large increase in the uptake of vaccinations since the introduction of this service. the varicella vaccination uptake increased from 8 (2007) to 43 (2008) owing to the targeted vaccine programme (see graphic; legend reads left to right). conclusion: the strategies implemented increased the uptake of recommended vaccinations in our hiv population. these included the appointment of a dedicated health professional team, use of it supports, education of staff and patients and development of a vaccine passport. we developed the vaccine passport to help with patient education and awareness, and it will serve as a record of vaccine administration for physicians off site. in the latter year, following the guideline change, we have targeted our varicella non-immune population. the next intervention planned is to assess all late entrants to our healthcare system to determine the need for catch-up vaccines, including mmr. results: column purified recombinant protein sspb1 was found to be a good antigen for both groups of animals used for immunisation. antibodies against recombinant sspb1, tested by opsonophagocytosis, were found to enhance phagocytosis of 4 gbs strains belonging to different serotypes by, on average, 5.5 times relative to control. the effect against gas strains was less pronounced (2.5 times) but still statistically significant. antibodies were also capable of interfering with the adherence of gbs strains carrying sspb1, relative to the strain without the protein.
adherence of the strain with sspb1 towards different cell lines was dramatically higher, which proves the function of the protein as an adhesin. in the passive protection test, virulent gbs or gas strains introduced intranasally into mice were eliminated from the lungs of the animals 20 times faster when anti-sspb1 serum was used than in the control. in the experiments with active protection, sspb1 immunised animals were found to be significantly better protected against gbs and gas infection. (table 1). similar results were obtained in the analysis of factors associated with 90-day mortality. conclusion: these data suggest that outcomes of both community-onset and nosocomial bloodstream infections due to s. aureus may be improved by an expert consultation service. the factors most critical for better outcomes and modifiable in time by id specialist consultation remain to be determined and may be explored as process of care quality indicators. objective: worldwide, the present tuberculosis epidemic is characterised by an alarming emergence in drug resistance. given the limited therapeutic options in mdr (and especially xdr) tuberculosis, there is a need to define the resistance levels and mechanisms present in clinical isolates categorised as drug resistant on the basis of critical concentration testing, so as to facilitate rapid therapeutic decisions. methods: we determined quantitative resistance levels of drug resistant isolates of mycobacterium tuberculosis sampled in switzerland over the past 3 years. resistance-conferring genetic alterations were identified by probe assays and pcr-mediated gene sequencing. results: rifampicin resistant isolates unanimously showed a high-level resistant phenotype (>50 mg/l) associated with mutations in rpob.
in contrast, a significant fraction of clinical tb isolates categorised as isoniazid resistant on the basis of critical concentration testing showed a low-level resistant phenotype (mostly mutations in inha); heterogeneous phenotypic resistance levels were associated with mutations in katg. one third of streptomycin resistant clinical isolates had a low-level resistance phenotype (<10 mg/l). ethambutol resistance occurred mostly in mdr strains and was linked to alterations in embb, but resistance never exceeded 25 mg/l. our data indicate that some first line agents may be considered as a therapeutic option despite in vitro resistance at the critical concentration. diagnostic mycobacteriology would benefit from standardised measures of quantitative drug susceptibility testing, in particular for those drugs where significant variations in phenotypic resistance levels are found in clinical isolates, e.g. isoniazid, ethambutol and streptomycin. introduction: recent advances in the diagnostics of varicella zoster virus (vzv) infections have changed the perception of this virus as a cns pathogen. a real-time pcr method amplifying a 70 nt segment of the vzv gb region gave 0.5 log improved sensitivity over conventional pcr and was employed for routine diagnosis of vzv dna in samples of cerebrospinal fluid (csf). in addition, a new elisa method for detection of antibodies in the csf to glycoprotein e was developed, using a mammalian cell expression system for optimal glycosylation of the antigen. these methods were utilised for studies of vzv-induced cns infections. in a retrospective study, almost all patients had a reactivated vzv infection, but only 60% showed skin lesions. the following diagnoses were made: acute aseptic meningitis (aam), n = 34; encephalitis, n = 22; meningoencephalitis, n = 6; cranial nerve affections, n = 20; encephalopathy, n = 5; and cerebrovascular disease, n = 6.
in 66 patients in whom vzv dna levels were determined, significantly higher viral loads were found in those with aam and encephalitis compared to patients with cranial nerve affection (including ramsay hunt syndrome). of the 50% (n = 50) who had a follow-up, 50% (n = 25) had neurological complications after 3 months. sixty-two percent had a ct/mri scan of the brain performed and 46% of these had pathological findings. vzv encephalitis showed a broader disease spectrum as compared with herpes simplex encephalitis (hse), as will be presented. detection of intrathecal synthesis of vzv ge antibodies was positive in the vzv encephalitis patients, as well as in some of the hse patients, arguing for a previously suggested role for vzv as a co-pathogen at least in some cases of the latter disease. vzv vasculitis was a more common finding (6% of all cases) than expected from the literature of case reports. mr findings showed that middle and posterior cerebral arteries were targeted. surprisingly, despite substantial vzv dna loads in the csf of these patients, investigated serum samples were pcr negative. thus, vzv might be suggested to be neuronally transported to the arterial walls rather than haematogenously spread. conclusions: vzv is a serious and underestimated cause of cns infection. a substantial number of the patients presented with serious neurological symptoms and sequelae, and pathological findings on ct/mri of the brain were abundant, especially in patients with encephalitis and vasculitis. pk/pd controversies for the clinician. s190 pk/pd and azoles. the triazoles have revolutionised the treatment of invasive and allergic fungal diseases. fluconazole, itraconazole, voriconazole and posaconazole are available for clinical use. isavuconazole and ravuconazole are in development. the triazoles have broad spectrum antifungal activity.
the pharmacokinetics and pharmacodynamics (pk-pd) of the triazoles have been extensively investigated in murine models of disseminated candidiasis. the pd parameter that optimally links drug exposure with the observed antifungal effect is the ratio of the area under the concentration-time curve (auc) to mic (auc:mic). there is increasing information on the magnitude of the auc:mic that is required for optimal antifungal effect. pk-pd principles have been used to define in vitro susceptibility breakpoints. the triazoles are fungistatic against candida spp. their mode of action against aspergillus spp. is less well defined, although they clearly exhibit dose-dependent decrement in fungal burden in laboratory animal models of invasive pulmonary aspergillosis. the triazoles accumulate in tissues and this is important for an understanding of their antifungal effect. in humans, the triazoles are characterised by complicated pharmacokinetic properties. both itraconazole and voriconazole exhibit nonlinear pharmacokinetics. the triazoles all exhibit clinically relevant exposure-response relationships. recent work from our laboratory suggests that itraconazole exhibits clinically relevant concentration-toxicity relationships. higher concentrations of voriconazole are associated with a progressively higher probability of hepatotoxicity, photopsia and central nervous system toxicity. because of the significant pharmacokinetic variability and clinically relevant drug exposure-response relationships, therapeutic drug monitoring (tdm) is frequently used. a strong argument can be made for the routine monitoring of itraconazole and voriconazole. there may also be grounds to consider monitoring posaconazole levels. tdm should be considered for all patients receiving triazoles who have refractory disease. furthermore, tdm should be considered when compliance, drug interactions and variable pharmacokinetics result in uncertainty about resultant drug exposures.
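as an illustration of the auc:mic index described above, the following is a minimal sketch and not the authors' method. it assumes the auc is approximated with the linear trapezoidal rule over sampled concentrations; the function names, sampling times, concentrations and mic value are all hypothetical.

```python
from typing import Sequence


def auc_trapezoid(times_h: Sequence[float], conc_mg_per_l: Sequence[float]) -> float:
    """Approximate the area under a concentration-time curve (mg*h/L)
    with the linear trapezoidal rule (an assumed, simple estimator)."""
    if len(times_h) != len(conc_mg_per_l) or len(times_h) < 2:
        raise ValueError("need at least two matching time/concentration points")
    auc = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # Area of one trapezoid between consecutive samples.
        auc += dt * (conc_mg_per_l[i] + conc_mg_per_l[i - 1]) / 2.0
    return auc


def auc_mic_ratio(times_h: Sequence[float], conc_mg_per_l: Sequence[float],
                  mic_mg_per_l: float) -> float:
    """Return the AUC:MIC ratio for one dosing interval."""
    if mic_mg_per_l <= 0:
        raise ValueError("MIC must be positive")
    return auc_trapezoid(times_h, conc_mg_per_l) / mic_mg_per_l


# Hypothetical 24 h profile (not data from the abstract).
example_times = [0.0, 2.0, 4.0, 8.0, 12.0, 24.0]
example_conc = [0.0, 4.0, 3.0, 2.0, 1.0, 0.5]
example_ratio = auc_mic_ratio(example_times, example_conc, mic_mg_per_l=0.5)
```

with these made-up values the trapezoidal auc is 36 mg·h/l, so the ratio against an mic of 0.5 mg/l is 72. the target magnitude of auc:mic for a given drug and organism has to come from the pk-pd literature; none is implied here.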
an understanding of the pk-pd relationships of the triazoles has been instrumental in optimising their clinical efficacy. innate immunity. s192 the inflammasomes: danger sensing complexes triggering innate immunity. the nod-like receptors (nlr) are a family of intracellular sensors of microbial motifs and 'danger signals' that have emerged as being crucial components of the innate immune responses and inflammation. several nlrs (nalps and ipaf) form a caspase-1-activating multiprotein complex, termed inflammasome, that processes proinflammatory cytokines including il-1beta. amongst the various inflammasomes, the nalp3 inflammasome is particularly qualified to sense a plethora of diverse molecules, ranging from bacterial muramyldipeptide to monosodium urate crystals. the important role of the nalp3 inflammasome is emphasized by the identification of mutations in the nalp3 gene that are associated with a susceptibility to inflammatory disorders. these and other issues related to the inflammasome will be presented. it is now 20 years since charles janeway hypothesized the existence of clonally derived pattern recognition receptors and pointed to the importance of these in initial responses to bacterial and viral infections. janeway's hypothesis has been validated by the discovery of three groups of prrs. first are the toll-like receptors, which detect microbial lipids and non-self nucleic acids at the cell surface and in intracellular compartments. in addition, cytoplasmic sensors of bacteria (nods) and of viral nucleic acids (rigs) have also been characterised. as well as being critical for responses to infections, these prrs also underlie a large burden of autoimmune and inflammatory disease in the human population and are thus important targets for therapy. in my talk i will describe the molecular mechanisms by which these conserved pathogen associated molecules are recognized by the tlrs, with particular reference to lipopolysaccharide and single stranded viral rnas.
i will also present new results which show how receptor activation is coupled to downstream signal transduction and in particular the role played by oligomeric signaling platforms assembled from adaptors and other signaling molecules involved in the pathway. i will discuss the potential for structural analysis to be used in the rational design of new drugs. this session proposes a critical review of the most salient recently published papers in the field with a special focus on control of multi drug-resistant organisms, prevention of infections in the intensive care unit, surgery etc. and highlights the need for validity/scope assessment. it also emphasizes the importance of prioritising information published in the abundant literature available so as to be able to summarise and understand the potential changes in clinical practice, and identify unresolved issues and areas of possible future clinical research. tourism is europe's face to the world. it is also a major source of revenue, employment and productivity. each year over 450 million arrivals are recorded into the continent, and of those, approximately 4 million are from latin america. returning travelers are even more numerous and more often associated with disease transmission into europe. within countries of the european continent, imported cases of environmental and zoonotic illnesses such as cholera, dengue, malaria, viral haemorrhagic fevers and west nile virus infections are a rare but established fact. diseases imported from latin america with the potential for autochthonous transmission (chikungunya, malaria, yellow fever) and/or high infectivity (viral haemorrhagic fevers) will be described in detail and the possibility of european outbreaks from latin american countries will be discussed. cutaneous leishmaniasis (cl) is a worldwide disease, endemic in 88 countries, that has shown an increasing incidence over the last two decades.
so far, pentavalent antimony compounds have been considered the treatment of choice, with cure rates close to 85%. however, the high efficacy of these drugs is counteracted by their adverse events. recently, in vitro and in vivo studies have shown that nitric oxide (no) plays a key role in the eradication of the leishmania parasite. objective: to determine whether a nitric oxide donor patch (developed by electrospinning technique) is as effective as meglumine antimoniate in the treatment of cl while causing fewer adverse events. methods: a double-blind, randomised, placebo-controlled clinical trial was conducted with 178 patients diagnosed with cl in santander, colombia, south america. the patients were randomly assigned to two groups. for 20 days, group 1 simultaneously received meglumine antimoniate and placebo nitric oxide patches, while group 2 received active nitric oxide patches and placebo meglumine antimoniate. biochemical determinations (aspartate aminotransferase, alanine aminotransferase, creatinine and pancreatic amylase) were measured at the beginning and at the end of the treatment. follow-up was performed 21, 45 and 90 days after the beginning of the treatment. results: the study included 69 (38.77%) women and 109 (61.23%) men. the average age in group 1 was 30.80±14.23 years, while in group 2 it was 27.88±13.79 years. clinical and demographic data were similar in the two groups. after the follow-up period, the complete clinical healing of group 1 was 94.81% versus 37.14% for group 2 (p = 0.0001). treatment with nitric oxide patches generated both a lower frequency of non-serious adverse events (fever, anorexia, myalgia, arthralgia, headache) and a reduced variation in biochemistry determinations (asat 26 the treatment with nitric oxide patches resulted in a lower percentage of complete clinical response compared with meglumine antimoniate.
despite their inferior effectiveness, the safety, the lower frequency of adverse events, the ease of administration (topical) and the low cost of the patches justify their evaluation in further population-based studies, especially in populations such as those in colombia, where the serious adverse events due to glucantime have increased dramatically. objectives: trichinellosis is a zoonotic disease which has never been reported in taiwan and is rarely linked to consumption of reptiles. we investigated the first documented outbreak of trichinellosis in taiwan consisting of 8 patients who became acutely ill after eating at the same restaurant in may 2008. we conducted a retrospective cohort study by interviewing the patients and persons who ate together with them. a case was defined as illness in an attendee who had fever (>38.0ºc) or myalgia 4 weeks after the festivals and was seropositive to trichinella antigen using an enzyme-linked immunoassay and immunohistochemical staining. an environmental study of the soft-shelled turtle farm was performed. results: of the 23 attendees, 8 persons met the case definition (attack rate = 35%). the most common presenting symptoms were myalgia (88%), fever (88%), and periorbital swelling (38%). all 8 patients sought medical care; five were hospitalised. of the 7 patients who underwent blood tests, all had moderate eosinophilia. all 8 patients' serum samples were strongly reactive to trichinella excretory-secretory antigen. the only food item significantly associated with illness was the raw soft-shelled turtle meat (relative risk undefined; p = 0.005). on trace-back to the farm, histological examination of soft-shelled turtles was negative for trichinella species. the most likely cause of this outbreak was consumption of raw soft-shelled turtle served in the festivals. this investigation indicates taiwan is not free of trichinellosis. prevention and control programs of trichinellosis should be established.
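the attack rate and the "undefined" relative risk reported above can be illustrated with a short sketch. this is a hedged example, not the investigators' analysis: the abstract gives 8 cases among 23 attendees but not the exposure split, so the exposed and unexposed counts below are hypothetical, as are the function names. one common way a relative risk becomes undefined is when no cases occur among the unexposed, which makes the denominator zero.

```python
from typing import Optional


def attack_rate(cases: int, persons_at_risk: int) -> float:
    """Proportion of persons at risk who became cases."""
    if persons_at_risk <= 0:
        raise ValueError("persons_at_risk must be positive")
    return cases / persons_at_risk


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> Optional[float]:
    """Risk in the exposed divided by risk in the unexposed.

    Returns None when the ratio is undefined, i.e. when the risk in the
    unexposed group is zero.
    """
    risk_exposed = attack_rate(exposed_cases, exposed_total)
    risk_unexposed = attack_rate(unexposed_cases, unexposed_total)
    if risk_unexposed == 0:
        return None
    return risk_exposed / risk_unexposed


# Figures from the abstract: 8 cases among 23 attendees.
overall_percent = round(100 * attack_rate(8, 23))

# Hypothetical exposure split (NOT reported in the abstract): suppose 10
# attendees ate the raw turtle meat and all 8 cases were among them.
rr_turtle = relative_risk(exposed_cases=8, exposed_total=10,
                          unexposed_cases=0, unexposed_total=13)
```

here `overall_percent` matches the reported 35%, and `rr_turtle` is `None` because no cases occurred in the hypothetical unexposed group. in such a situation an exact test on the 2×2 table can still give a p value even though the ratio itself cannot be computed.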
the public should be aware of the risk of acquiring trichinellosis from consumption of raw soft-shelled turtle. objective: to develop and evaluate a modified, rapid giemsa staining procedure for detection of malaria parasites in blood smears. a disadvantage of the rapid commercially available staining methods is that they require highly experienced technicians for interpretation of results because the interpretation can be difficult. for this reason, many laboratories use the giemsa stain. shorter giemsa staining times have been reported previously; however, to our knowledge, the effect of 5 and 10 minute staining in different giemsa dilutions has not been evaluated. the stock solution of giemsa stain (merck, darmstadt, germany) was used in different dilutions (1:10 and 1:5) and incubated for different lengths of time (10 min and 5 min). the staining effect was compared to our standard giemsa stain (1:40, 45 min). sensitivity was determined by examining smears of p. falciparum from fresh and edta blood. the level of parasitaemia was followed in two patients admitted to our hospital with p. falciparum parasitaemias of 21.5% and 28.8% (see table; patient a and b) by examination of blood smears taken at different time points after initiation of therapy. these samples were used to evaluate the different giemsa dilutions and staining times. smears were read by three independent observers (a clinical microbiologist, a laboratory technician specialised in parasitology, and a resident in clinical microbiology). the table shows the results of the three staining methods on blood from two patients from ghana with high parasitaemias on admission and during follow-up. all smears were equally easy to read and yielded parasite counts within internationally accepted ranges of variation (see united kingdom national external quality assessment service).
conclusion: staining blood smears for detection of plasmodium falciparum parasites with a 1:5 dilution of giemsa stain for five minutes provides easy to read slides and results comparable to those obtained with the standard giemsa staining. the advantage of the rapid method is the shorter turnaround time; the disadvantage is the larger amount of stain used. objectives: diarrhoeal diseases are common in developed and developing countries and are major causes of morbidity and mortality worldwide. the need to differentially diagnose protozoan parasites versus other gastrointestinal (gi) aetiologies is well recognized. the most common gi protozoan parasites infecting humans worldwide are considered to be entamoeba histolytica, giardia lamblia, blastocystis hominis, dientamoeba fragilis and cryptosporidium spp. laboratory detection of these parasites relies on microscopic analysis of stool samples and water concentrates, as well as enzyme immunoassay (eia) tests. microscopic examination usually results in underdetection of gi parasites, while use of eia is often not cost-effective. methods: savyon diagnostics is currently engaged in developing an approach aiming to address the unmet needs and the current limitations in this field. this approach includes 3 major aspects: (1) the ability to detect a panel of all the above 5 organisms in one test kit, (2) the possibility to perform the diagnosis in two steps − first, simultaneous detection of these organisms without distinguishing between the different species for screening of large numbers of specimens, and second, distinctive detection of the specific aetiology in the specimens found positive, and (3) the ability to apply eia diagnosis in formalin-preserved specimens for all the mentioned parasites. results: polyclonal antibodies were produced in-house based on native antigen extracts, recombinant antigens and synthetic peptides.
the resulting inventory of antibodies enabled finding the optimal combination that provided the desired performance parameters for separate detection of each of the parasites in fresh, frozen or formalin preserved faeces specimens. the analytical limit of detection and the performance in characterised clinical specimens were comparable to microscopy or to reference eia, when available. the results show unique detection of e. histolytica in formalin-preserved specimens, which is comparable to detection in fresh specimens. furthermore, we demonstrate simultaneous detection of the parasites without compromising performance characteristics in fresh or preserved specimens. the presented work is a paradigm of an innovative approach, expected to advance the diagnosis of protozoan parasites in gi patients, thus enabling appropriate and cost-effective diagnosis and treatment. objectives: systemic administration of certain facultative anaerobic bacteria to mice bearing solid tumours leads to accumulation in tumours compared to normal target organs, like spleen and liver, and to retardation of tumour growth. salmonella enterica serovar typhimurium (s. typhimurium) as well as escherichia coli nissle 1917 (ecn) are such bacteria. preliminary experiments showed that such bacteria that exhibit the ability to form biofilms in vitro might also do so in tumours. in the present study this was systematically investigated. methods: biofilm formation of bacteria was detected on low-salt biofilm plates. additionally, salmonella- or e. coli-infected ct26 tumours of balb/c mice that were left untreated or were treated with anti-gr1 to deplete neutrophilic granulocytes were removed two days post infection, fixed and prepared for electron microscope analysis. the expression of different genes which are probably involved in the biofilm formation was tested via real-time pcr. results: when examined after colonising tumours s.
typhimurium sl7207 and sl1344 as well as ecn are almost exclusively found extracellularly although they are able to invade the ct26 cells in vitro. interestingly, as in vitro, all three bacteria form biofilms to various extents when residing in the tumours. this was followed in more detail for s. typhimurium sl7207. biofilms were not formed by sl7207 when neutrophils had been removed by antibodies. in addition, when arda, a central switch for biofilm formation in salmonella, had been deleted, no biofilms could be found. importantly, bacteria could now be found intracellularly, most likely in neutrophilic granulocytes. conclusion: the formation of biofilms by facultative anaerobic bacteria when residing in solid tumours is a novel and surprising finding. when neutrophils were removed, no biofilms were formed, while uptake into neutrophils was allowed when the ability of the bacteria to form biofilms was blocked. hence, it appears that the bacteria use biofilm formation as a defence system against the immune system of the host. objectives: rama is an arac/xyls family transcriptional activator found in klebsiella pneumoniae, salmonella spp. and enterobacter spp., the overexpression of which is associated with an mdr phenotype. recently a tetr-like gene that lies upstream of rama, known as ramr, has been identified as a repressor of rama. k. pneumoniae kp342 is a diazotrophic endophyte strain which has been reported to exhibit notable resistance to antibiotics. despite its mdr phenotype, kp342 has been shown to exhibit attenuated pathogenicity in mouse models in comparison to clinical k. pneumoniae strains. the aims of this study were to: determine the levels of rama expression and establish its role in kp342's mdr phenotype; determine the effect of ramr complementation on rama expression and antibiotic susceptibility. methods: genome and sequence analysis performed in k. pneumoniae strain kp342 demonstrated a 96 bp deletion within the ramr gene.
cloning and complementation with full size wild type ramr was performed in kp342 (hereafter referred to as kp342/ramr). rt-pcr was used to assess levels of gene expression, which were subsequently quantified using bio-rad quantity one software. mic testing was performed against chloramphenicol (cm), norfloxacin (nor) and tetracycline (tet) according to bsac guidelines. biofilm formation was measured using a modified protocol of o'toole and kolter. results: kp342 containing the mutated ramr gene (96 bp deletion) was shown to overexpress rama and the putative outer membrane protein roma. complementation of the ramr gene resulted in the repression of both rama and roma transcription by 3−4 fold. interestingly, the ramr complemented strain demonstrated increased biofilm formation (up to 9-fold increase) over a 72 hour period in both lb and m9 medium after static growth at 37ºc. mics of the tested antibiotics were reduced up to 16-fold in kp342/ramr compared to the ramr mutated kp342. conclusions: this result demonstrates that ramr acts as a repressor of both rama and the putative outer membrane protein roma, thereby increasing the strain's susceptibility to antibiotics. however, the restoration of a functional ramr in kp342 also increases biofilm formation significantly, suggesting that ramr plays a role in the regulation of biofilm formation genes and possibly bacterial virulence. rifampicin showed the highest activity on biofilm matrix and bacteria in sa and pa biofilms. results also indicated that biofilm viable mass was more susceptible to treatment than the biofilm matrix, which is mainly responsible for biofilm persistence. further research should specifically focus on compounds destroying matrix and which can be used as an adjunct to antibiotic therapy. objectives: staphylococcus epidermidis is a common cause of foreign-body infections (fbi) because of its ability to form biofilms. biofilms are very resistant to antibiotics.
active and passive immunisation against biofilm-associated bacterial antigens may be an alternative. we studied the effect of immunisation against the lpxtg protein sesc in s. epidermidis biofilms in vitro and in vivo. we previously reported that sesc is present in all s. epidermidis strains tested. sesc is mainly expressed during the early and late fbi and at a higher level in sessile cells than in planktonic cells. methods: we used rabbit polyclonal anti-sesc-iggs (4 mg/ml) to study biofilm inhibition in vitro and in vivo in our rat model (50 mg igg per rat) on 1-day old biofilms. we also vaccinated rats twice with sesc according to standard protocols. serum samples taken at day 0 and 2 weeks after the 1st and 2nd immunisation were tested by elisa and showed an increase in anti-sesc antibody levels. s. epidermidis strains 10b and 1457 are biofilm forming strains and have been described before. for in vitro experiments, s. epidermidis 10b or 1457 were mixed with anti-sesc-iggs and incubated for 2 hours at 4ºc. subsequently 10^6 cells were added to each well. after 24 h at 37ºc biofilms were washed and stained with crystal violet and od595 was measured. for in vivo experiments, catheter fragments were pre-incubated with s. epidermidis 10b and implanted subcutaneously in each rat. after explantation, the average number of cfu was determined after 24 hrs. results: our data show that rabbit anti-sesc-iggs inhibit in vitro biofilm formation by s. epidermidis strains 10b and 1457 by 74% and 65%, respectively (n = 9). in the in vivo rat model, rabbit anti-sesc-iggs reduced the bacteria in a 1-day old biofilm 60-fold (n = 18). active immunisation with recombinant sesc led to a 10-fold reduction of cfu compared to control rats in 1-day old biofilms (n = 10). after 3 days, the reduction in biofilm-associated bacteria in the immunised rats was 15-fold (n = 10) (fig. 1). conclusion: sesc represents a promising target for prevention of s. epidermidis biofilm formation.
the higher effect of passive immunisation compared with active immunisation is probably due to the subcutaneous injection of anti-sesc-iggs at the place of catheter insertion. objectives: staphylococcus epidermidis has emerged as a pathogen associated with infections of implanted medical devices impeding their long-term use. characteristics of s. epidermidis that allow persistence of infection are the ability of bacteria to adhere to surfaces in multilayered cell clusters, followed by the production of a mucoid substance more commonly known as slime, encoded by the ica operon. the adherent bacteria and slime are collectively known as biofilm. the coupled effects of specific chemical terminal surface groups and flow conditions on slime production and biofilm formation by s. epidermidis were investigated in correlation to the expression of two genes of the ica operon. methods: reference control strains (atcc35984, slime-positive and atcc12228, slime-negative), and two clinical strains isolated from different hospitalised patients (one ica-positive/slime-positive and one ica-positive/slime-negative) were tested. bacteria grown in bhi medium were suspended in physiological saline at a concentration of ~3×10^9 cells/ml. hydroxyl (oh)-terminated (hydrophilic) and methyl (ch3)-terminated (hydrophobic) glass surfaces were used as substrates in a parallel plate flow chamber. bacterial adhesion was examined under two flow rates: 2 ml/min and 20 ml/min for two and four hours. total rna from both planktonic (p) and adherent (a) bacteria, after detachment with trypsin, was isolated by the trizol method. reverse transcription followed by relative real-time pcr (rrt-pcr) towards a 207 bp part of 23s rrna gene, allowed the detection of expression levels of icaa and icad. adherent bacteria were investigated with scanning electron and confocal laser microscopes.
results: higher expression levels of both icaa and icad genes onto glass and especially methyl-terminated glass surfaces were calculated by rrt-pcr, under higher flow rate in two hours by the reference and the clinical slime-positive strains. these results correlate well with adherent bacterial cell counts and images taken by both microscopes. the ica-positive slime-negative clinical strain showed lower expression levels of ica genes, less adherent ability and pia production on glass surfaces, as observed by microscopes. higher flow rate enhances the expression level of both ica genes, with a peak in two hours. hydrophobic biomaterial surfaces seem to play a crucial role in initial adherence, increasing ica gene expression and pia synthesis. consenting men and women with dfi (predefined by clinical signs and symptoms) caused by mrsa were potentially eligible, including those associated with bacteraemia. patients with initial osteomyelitis were excluded. patients could receive l 600 mg bid either iv or po. primary end points were cure or improvement rates (c+i) and microbiologic eradication (me) at 60 days after the beginning of l. secondary end points were c+i on days 5 and 30 after the beginning of treatment and hospital discharge day, need of amputation, duration of therapy and mortality rates. all the adverse events were collected. results: 70 patients were enrolled. the men:women ratio was 2.1. the age of patients was 63.2±13 years and the average period from the diagnosis of diabetes was 16.5±9.7 years. associated bacteraemia was present in 27.1% of patients included. primary end points: c+i 60 days after the beginning of l was achieved in 91.4% of patients and me was obtained in 84.3% of patients. secondary end points: c+i on day 5, hospital discharge day and day 30 after the beginning of treatment were 70%, 84.3% and 88.6%, respectively. only 8 patients needed a minor amputation.
the primary and secondary end points in the subgroup of bacteraemic episodes were not statistically different from those previously described. the mean duration of therapy was 29.5±18.4 days. global mortality was 4.3%. only one episode of polyneuropathy was reported. neither thrombocytopenia nor lactic acidosis was found. conclusions: l achieved excellent c+i even at first evaluation visit in documented dfi caused by mrsa. l also showed high me rates. although patients received prolonged periods of treatment, l was a safe drug. objectives: azithromycin microspheres formulation (azm) was developed to enable a higher dosage of 2 g to be administered as a single oral dose without decreasing the safety profile. this study compared azm with moxifloxacin (mox), aiming to confirm the efficacy and safety of azm in acute exacerbations of chronic bronchitis (aecb). methods: this prospective, multicentre, randomised, double-blind, double dummy study compared azm 2 g single dose with mox 400 mg once daily for 5 days. it enrolled aecb patients 50 years old and above, with anthonisen type 1 exacerbations, and with at least 2 exacerbations of aecb in the past 12 months. subjects were to have a history of smoking of at least 20 pack-years and documented forced expiratory volume in 1 second (fev1) less than 80% of predicted. they were followed up for up to 9 months. results: a total of 396 patients were treated (198 in each of the treatment groups). the distribution of age and mean fev1 were similar for the 2 treatment groups. pathogens were isolated from 62.9% of the patients (61.1% of patients on azm and 62.9% of patients on mox). the clinical success (signs and symptoms related to the acute infection had returned to the subject's normal baseline level, or clinical improvement was such that no additional antibiotics were deemed necessary) rate for the per protocol population at test of cure (toc) at day 12−19 was 93.0% for azm and 94.2% for mox group (95% ci −5.8, 3.9).
bacterial eradication rate (bacteriologic per protocol population) at toc was 96.0% for the azm group and 96.7% for the mox group (95% ci −4.5, 3.3). although the study population had a history of at least 2 exacerbations in the past 12 months, less than half of the subjects experienced a recurrence during the follow-up, and there was no statistically significant treatment difference in time to first occurrence of aecb. both treatments were well tolerated. the incidence of treatment-related adverse events was low, being reported by 17% of subjects receiving azm and 12% of subjects receiving mox. most aes were mild or moderate in severity. the most common aes were gastrointestinal disorders, being reported by 14% of subjects receiving azm and 8% of subjects receiving mox. conclusions: a single oral dose of azm was as effective as a 5-day course of mox in the treatment of aecb and was well tolerated. objectives: the optimal duration of a gentamicin-containing regimen for therapy of human brucellosis has not been clearly determined. methods: this randomised clinical study was conducted to compare the efficacy of gentamicin 5 mg/day for 5 days plus doxycycline 100 mg twice daily for eight weeks (gd group) versus streptomycin 1 g im for 2 weeks plus the same dose of doxycycline for 45 days (sd group). all cases were followed for one year after cessation of therapy. the efficacy of both regimens (failure of therapy or relapse) was compared. results: seventy-nine patients with a mean age of 35±14.5 years and 75 cases with a mean age of 36.7±13.9 years were treated with the gd or sd regimen, respectively. the clinical manifestations in these two treated groups were similar. failure of therapy was seen in one patient in gd group and in 2 cases in sd group ( objectives: to study the efficacy of telavancin (tlv), an investigational bactericidal lipoglycopeptide, for the treatment of complicated skin and soft tissue infections (cssti) caused by presumed or confirmed gram-positive organisms.
methods: atlas 1 and atlas 2 were methodologically identical, double-blind, randomised, multinational, phase 3 studies. adult men and women presenting with cssti including major abscess were randomised 1:1 to tlv 10 mg/kg intravenous (iv) q24 h or vancomycin (van) 1 g iv q12 h for 7 to 14 days. test-of-cure (toc) visit was conducted 7 to 14 days after end of study treatment. the all-treated population (at) included patients with confirmed diagnosis of cssti who received 1 dose of study medication. this analysis examined the baseline characteristics and cure rates at toc for patients with major abscess in the combined atlas at population. results: in the pooled at population of atlas, 772 patients presented with major abscess. more than 60% of these patients required hospitalisation. the baseline lesion surface area exceeded 5 cm 2 in 98% of the cases, while 65% of the patients presented with lesions exceeding 50 cm 2 (table 1) . elevated white blood cell counts were found in more than 40% of the cases (table 1) . nearly all patients required surgical drainage, with approximately 2/3 performed prior to the first dose of study medication. very few patients required a surgical procedure more than 4 days after the start of study medication. clinical cure rates at toc are presented in table 1 . overall, adverse events in the at population were similar between the treatment groups with regard to type and severity. conclusion: telavancin administered once daily was non-inferior to vancomycin for the treatment of major abscess. objectives: b. fragilis and related species, members of the normal bowel flora, are the most widely isolated anaerobic bacteria from different infections. 
to follow the development and spread of resistance among these strains is difficult, as antibiotic susceptibility testing of clinically relevant anaerobes is carried out less and less frequently in routine laboratories in europe because clinicians treat many presumed anaerobic infections empirically. to follow the changes in the antibiotic resistance of bacteroides strains, three europe-wide studies were organised during the past twenty years. the evaluation of the results of these studies may show changes in the resistance to different antianaerobic drugs. only clinical isolates of bacteroides strains belonging to different species, and no normal flora members, were collected from different countries throughout europe during these studies. the agar dilution method was used for the antibiotic susceptibility determination. actual breakpoints accepted by nccls (clsi) and eucast were used. molecular genetic investigations were carried out to detect resistance mechanisms. since the first study, chromosomally mediated beta-lactamase production and tetracycline resistance have been the most prevalent among bacteroides strains in europe. clindamycin resistance in bacteroides is mediated by a macrolide-lincomycin-streptogramin (mls) mechanism and its frequency differs between countries in europe. resistance to beta-lactam-beta-lactamase inhibitor combinations was studied using amoxicillin-clavulanic acid and/or piperacillin-tazobactam. an increase in resistance to both combinations was observed throughout the years. the same is true for cefoxitin, and in the third study several hetero-resistant isolates were found. the occurrence and spread of resistance to imipenem and metronidazole among bacteroides strains are of special clinical importance. the presence of the cfia gene is much more prevalent than the expression of imipenem resistance; however, the spread of the cfia gene among species other than b. fragilis is still very rare.
the molecular genetic methods looking for the resistance genes among strains with elevated mics against these antibiotics prove that resistance breakpoints should be reconsidered. the resistance to moxifloxacin shows great differences between countries. the lowest resistance rate was observed in the case of tigecycline. many factors may affect the response to treatment, such as site of infection, surgical procedures, severity of the illness, patient status, presence of other pathogens (mixed infection) and pk/pd parameters of the antibacterial drugs. thus, correlation between treatment failure and antibiotic resistance among anaerobes remains difficult to assess. the main discrepancies came from intra-abdominal infections, and a worrisome disjunction between surgeon and microbiologist opinions emerged in the 1990s. however, patients in whom primary therapy failed had more resistant strains compared with patients in whom therapy succeeded. in contrast, many failures may be due to the lack of isolation of anaerobes from clinical samples! during anaerobic bacteraemia, salonen et al. demonstrated that mortality increased dramatically from 17% for initially effective treatment to 55% when an ineffective treatment was started. new mechanisms of resistance and a global increase in resistance to many antibiotics among anaerobes may nowadays lead to a different answer. clindamycin vs. penicillin studies for the treatment of lung infections pointed out the failure due to beta-lactamase production among gram-negative anaerobes. we found many reports of failure after clindamycin treatment in osteomyelitis, septic arthritis and brain abscess in the presence of clindamycin-resistant anaerobes (bacteroides fragilis group and prevotella), probably because when resistance occurs, clindamycin mics are high.
similarly, the lack of coverage of an undetected resistant anaerobe allows the selection of an anaerobic strain resistant to the treatment chosen against the associated aerobes, such as imipenem-resistant eggerthella lenta or metronidazole-resistant strains of prevotella or bacteroides fragilis. the latter failures may give an opportunity to set up a new metronidazole breakpoint for resistance (mic > 4 mg/l). the main problem is the difficulty of detecting some heterogeneously resistant strains, which requires a prolonged incubation period on agar medium. this kind of situation is probably the most suitable to correlate bacterial antibiotic resistance with the failure of antibiotic treatment. methicillin-resistant s. aureus isolates causing community-acquired infections (ca-mrsa) in children are a major problem in several areas around the world. ca-mrsa are associated with both skin and soft tissue infections and invasive infections. recurrent soft tissue infections and infections within the family caused by ca-mrsa isolates are common. ca-mrsa isolates containing genes coding for pvl have been associated with serious staphylococcal pneumonia as well as osteomyelitis complicated by subperiosteal abscesses or venous thromboses. in addition to vancomycin, ca-mrsa generally are susceptible to clindamycin and trimethoprim-sulfamethoxazole (tmp-smx). treatment of superficial skin and soft tissue infections involves surgical drainage of abscesses followed by an oral agent such as tmp-smx or clindamycin. minocycline or doxycycline is a consideration for children >8 years old. empiric vancomycin is typically administered for more serious and invasive infections such as osteomyelitis, septic arthritis, serious head and neck infections or suspected staphylococcal pneumonia. clindamycin is efficacious in treating invasive ca-mrsa infections caused by susceptible organisms. linezolid or daptomycin is another option in selected circumstances.
mri is the optimal imaging modality for assessing children with ca-mrsa osteomyelitis. aggressive surgical drainage of subperiosteal abscesses or sites of pyomyositis is recommended. venous thrombosis is increasingly recognized as a complication of ca-mrsa osteomyelitis. anti-coagulation until the thrombus has resolved is recommended. the optimal approach to prevention of recurrent ca-mrsa infections is unclear but a strategy that includes emphasizing personal hygiene, plus/minus antimicrobial soaps, mupirocin to the nose or "bleach baths" is frequently suggested.

s226 understanding the pathogenesis of group a streptococcal disease: the bedside-to-bench approach

invasive group a streptococcal (gas) infection presents itself in a range of guises, most notoriously necrotising fasciitis and the streptococcal toxic shock syndrome. as gas is a human pathogen, gas pathogenesis research should ideally be shaped by clinical questions arising from either epidemiological or case-based investigation of human disease. in the mid-1990s, large epidemiological studies pointed to a central role for specific t cell-stimulating superantigens in the aetiology of streptococcal toxic shock. this sparked a series of clinical and laboratory investigations that demonstrated production of superantigens during infection which were indeed capable of triggering massive t cell activation in patients but were unlikely, alone, to account for all the features observed in toxic shock. genomic, clinical and laboratory-based investigations have identified novel and highly potent superantigens that appear to directly contribute to sepsis pathogenesis and, together, may constitute targets for adjunctive treatments in invasive disease. epidemiological, clinical, and laboratory studies have highlighted a role for blunt trauma in the aetiology of at least a quarter of cases of gas necrotising fasciitis.
one of the most striking findings on examination of tissues from patients suffering with necrotising fasciitis is the failure of neutrophils to migrate to the focus of infection. investigation of patients with invasive gas infection led to the discovery that gas produces an enzyme that can cleave and inactivate human chemokines and study of patients with bacteraemia has highlighted a likely role for the causal enzyme spycep in disease pathogenesis; this bacterial surface enzyme has also shown promise as a potential vaccine antigen. notwithstanding a potential role for individual virulence factors in disease causation, clinical studies have demonstrated that gas bacteria may persist at the site of infection despite high concentrations of bactericidal antibiotics, and this has been borne out by experimental studies; the reasons behind such persistence are unclear but may include internalisation of gas by immune cells, formation of biofilm, and antibiotic penetration of necrotic tissues. the persistence of viable bacteria in such cases is not widely recognized and deserves focused consideration in the research laboratory. genome-wide analysis of microbial pathogens and molecular pathogenesis processes has become an area of considerable activity in the last 10 years. these studies have been made possible by several advances, including completion of the human genome sequence, publication of genome sequences for many human pathogens, development of microarray technology and high-throughput proteomics, and maturation of bioinformatics. despite these advances, relatively little effort has been expended in the bacterial pathogenesis arena to develop and use integrated research platforms in a systems biology approach to enhance our understanding of disease processes. we have exploited an integrated genome-wide research platform to gain new knowledge about how the human bacterial pathogen group a streptococcus causes disease. 
results of these studies have provided many new avenues for basic pathogenesis research and translational research focused on development of an efficacious human vaccine and novel therapeutics. new data stemming from use of a systems biology approach to provide new data about group a streptococcus pathogenesis will be presented. streptococcal toxic shock syndrome and necrotising fasciitis caused by group a streptococcus are rapidly progressive invasive diseases that are associated with significant morbidity and mortality, ranging from 30−80% despite prompt antibiotic therapy and surgical debridement. s. pyogenes is known to primarily cause disease by activating and modulating host immune responses. the exotoxins with superantigenic activities have been demonstrated to be crucial triggers of excessive inflammatory responses and consequently systemic toxicity, organ dysfunction, tissue necrosis and shock. another important virulence determinant is the m-protein, which is classically known for its antiphagocytic properties, and lately, was shown to trigger pro-inflammatory responses as well as induction of vascular leakage and shock. this likely represents important mechanisms contributing to the rapid development of shock and systemic toxicity in patients with severe invasive group a streptococcal infections. the understanding of these infections as hyperinflammatory diseases highlighted the potential of immunotherapy to improve outcome. one such strategy includes the administration of intravenous polyspecific immunoglobulin (ivig) as adjunctive therapy. the mechanistic actions of ivig in this setting are believed to include opsonisation of the bacteria, neutralisation of the superantigens and suppression of the pro-inflammatory responses. there is growing evidence to support the use of ivig in patients with streptococcal toxic shock syndrome. 
these studies include one observational cohort study based on canadian patients identified through active surveillance of invasive group a streptococcal infections, and one european multicentre placebo-controlled trial. however, the question remains whether ivig is also efficacious for the severe streptococcal deep tissue infections. an observational study of seven patients with severe streptococcal deep tissue infections suggested that the use of high-dose ivig in patients with severe gas soft tissue infections may allow an initial non-operative or minimally invasive approach, which can limit the need to perform immediate wide debridements and amputations in unstable patients. the fact that seven patients with severe group a streptococcal infections survived with this approach definitely warrants further studies on the use of ivig in these severe infections.

hepatitis

o229 prevalence and outcome of pregnancy in chronic hepatitis c virus infection

i. julkunen°, a. sariola, m. sillanpää, k. melen, p. koskela, p. finne, a.l. järvenpää, s. riikonen, h.m. surcel (helsinki, oulu, fi)

objectives: in the western countries the incidence of hepatitis c virus (hcv) infection has been increasing steadily, especially among young adults. it is thus likely that an increasing prevalence of hcv infection is also found in pregnant women. methods: to assess the frequency of hcv infection in the metropolitan area of helsinki, selective anti-hcv antibody testing was carried out for pregnant women during the years 1991-1999. in addition, hcv prevalence was analysed in serum specimens collected from pregnant mothers during the years 1985-2005. results: altogether 145 anti-hcv positive mothers were identified among 44680 mothers. the frequency of anti-hcv positivity rose from 0.13% in 1991 to 0.43−0.53% in 1997-1999.
in the early 1990s only 20% of mothers knew about their seropositivity, whereas by the end of the follow-up period almost 70% of mothers already knew about their hcv infection before the pregnancy. intravenous drug abuse was the major risk factor (71% of cases) for contracting the disease. in 90% of the mothers chronic hcv infection was well under control and in this population the mean serum alanine aminotransferase (alt) values decreased towards the end of the pregnancy. however, 10% of anti-hcv ab positive mothers developed intrahepatic cholestasis (odds ratio 16.4) as characterised by itching and elevated serum bile acid levels. the corresponding value in the control pregnancies was only 0.7%. anti-hcv ab positive mothers were younger, delivered earlier and gave birth to babies with lower birth weight as compared to control deliveries. to have a more comprehensive view of the problem of hcv infection during pregnancy, randomly selected serum specimens from the finnish maternity cohort were tested. 2000-5000 serum specimens were tested in selected cohorts (1985, 1990, 1995, 2000 and 2005). in 1985 the nationwide prevalence was 0.19% and it rose steadily to 0.50% in 2005. in the metropolitan area of helsinki the prevalence was higher, being 0.68% and 0.70% in 1997 and 2002, respectively. conclusion: our study indicates that there is an increasing problem of hcv infection in pregnant women in finland. although most women cope well with their disease during pregnancy, there is a subpopulation of mothers who develop cholestasis and their liver status should thus be followed up carefully. testing of all mothers for serum anti-hcv antibodies is recommended. objectives: the viral genome of hepatitis c virus constitutes a 9.6kb single-stranded positive-sense rna which encodes altogether 11 viral proteins.
in order to study the humoral immune responses against different hcv proteins in patients suffering from chronic hcv infection, we produced three structural (c, e1 and e2) and six nonstructural proteins (ns2, ns3, ns4a, ns4b, ns5a and ns5b) in sf9 insect cells by using the baculovirus expression system. the recombinant hcv proteins were purified and used in western blot analysis to determine antibody responses against individual hcv protein in 68 hcv rna and antibody positive human sera that were obtained from patients suffering from genotype 1, 2, 3 or 4 infection. results: these sera were also analysed with inno-lia score test for hcv antibodies against core, ns3, ns4ab and ns5a, and the results were similar to our western blot method. based on our western blot analyses we found that the major viral antigens were the core, ns4b, ns3 and ns5a proteins and they were recognized in 97%, 86%, 68% and 53% of patient sera, respectively. there were no major genotype specific differences in antibody responses to individual hcv proteins. a common feature within the studied sera was that all except two sera recognized the core protein in high titers, whereas none of the sera recognized ns2 protein and only three sera (from genotype 3) recognised ns5b. the data shows significant variation in the specificity in humoral immunity in chronic hcv patients. anti-hcv antibody pattern also remains very stable within one individual. alt and ast levels were tested in all subjects. the presence of hbv-dna was determined quantitatively in plasma samples of hd patients with anti-hbc alone (hbsag negative, anti-hbs negative and anti-hbc positive) by real-time pcr using the artus hbv rg pcr kit on the rotor-gene 3000 real-time thermal cycler. results: of 289 patients enrolled in this study, 18 subjects (6.2%, 95% ci, 3.5%-8.9%) had anti-hbc alone. hbv-dna was detectable in 9 of 18 hd patients (50%, 95% ci, 27%-73%) with anti-hbc alone. 
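the two intervals quoted above can be approximated with a normal-approximation (wald) interval for a proportion. the abstract does not state which method the authors used, so the choice of method, the function name `wald_ci` and the default z value below are assumptions made for illustration only. a minimal sketch:

```python
import math

def wald_ci(events, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Returns (proportion, lower, upper), with the bounds clipped to [0, 1].
    The method is an assumption; the abstract does not name the one used.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Counts quoted in the results above.
for label, events, total in [
    ("anti-hbc alone", 18, 289),
    ("hbv-dna detectable among anti-hbc alone", 9, 18),
]:
    p, lo, hi = wald_ci(events, total)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lo:.1f}%-{100 * hi:.1f}%)")
```

with these counts the sketch gives intervals close to the reported ones. for a denominator as small as 18, an exact or wilson interval would usually be preferred over the wald approximation.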
plasma hbv-dna load was less than 50 iu/ml in all of these patients. our study showed that detection of anti-hbc alone could reflect unrecognized occult hbv infection in hd patients. the majority of these infections are associated with low viral loads. were included in the study. all the subjects had never been exposed to antiretroviral therapy. genotypic resistance testing was performed at the time of diagnosis with a sequence-based assay (trugene hiv-1 genotyping test) targeted at the protease region (codons 1 to 99) and rt region (codons 40 to 247) of the hiv-1 genome. results: 21 of 218 patients (9.63%) harboured a virus with at least one mutation associated with phenotypic resistance; 1/218 with mutations associated with resistance to nucleoside reverse-transcriptase inhibitors (nrtis), 17/218 to non-nucleoside reverse-transcriptase inhibitors (nnrtis) and 3/218 to protease inhibitors (pi). resistance to nrtis was associated with the key mutation m184v, while resistance to nnrtis was associated with y181c and k103n mutations. among mutations to pi, major resistance mutations l90m and d30n were found in three patients, whereas there was a high prevalence of accessory pi resistance mutations at positions 10, 20, 36 and 63. conclusion: our data estimate the prevalence of primary resistance and mutation patterns among naive hiv patients, underlining the importance of genotypic resistance testing in hiv patients before starting treatment, especially when nnrtis would be included in the initial antiretroviral therapy. objectives: few data are available on the genetic mechanisms of protease inhibitor (pi) resistance in non-b hiv-1, and pi resistance-associated mutations (rams) are commonly observed in pi-naive patients with subtype a/e infection. this study aimed to compare pi-rams between pi-naive and -experienced patients. methods: genotypic resistance testing was conducted among a cohort of hiv-1 infected patients who had virologic failure.
patients were categorised into 2 groups: pi-naive and pi-experienced. we focused on pi-rams previously described by ias-usa 2008. results: we studied 137 patients (mean age, 41.8 years; 64% male). median cd4 cell count and hiv-1 rna at virologic failure were 169 cells/cu.mm. and 14100 copies/ml, respectively. 85% of patients were infected with subtype a/e; the others had subtype b (12%), ab (2%), and c (1%). there were 75 patients in the pi-naive group and 62 patients in the pi-experienced group. the clinical characteristics of the 2 groups were similar (p > 0.05) except for the duration of antiretroviral therapy, which was shorter in the pi-naive group (31.5 vs. 46.8 months, p = 0.028). the percentage of patients who had primary pi-rams was 1% in the pi-naive and 19% in the pi-experienced group (p = 0.001). the most common primary pi-rams in the latter group were v82a (10%) and i54v (7%). the percentage of patients with secondary pi-rams in the corresponding groups was 99% and 98%, respectively (p = 1.000). the median number of secondary pi-rams was also similar between the 2 groups (p = 0.244). the most common secondary pi-rams in both groups were m36i (91%), h69k (34%), l89m (30%), i13v (26%), l63p (25%), l10i we also defined a "silent score" (ss) and a "resistance score" (rs) as the number of synonymous mutations and of resistance mutations (in the second sequence in comparison with the first one) divided by number of days between the two tests, respectively. (12); pts with drms in non-b-st (%) were 7 (23.3), 6 (14), 5 (7) and 3 (4.2). a significant increase of non-b-st (p = 0.021) and a significant decrease in drms (p < 0.001) were observed. crf02_ag was the prevalent non-b st (44%). 35.3% of non-b st pts were italians. among b-st, drms predicted a reduced susceptibility to one drug class in 23, 17, 15 and 14 cases in the different periods; to two drug classes in 4, 6, 5 and 8; to three classes in 3, 2, 0 and 0.
in non-b-st, a reduced susceptibility to one drug class was found in 6, 6, 4 and 0 cases; to two drug classes in 1, 0, 0 and 2; to three drug classes in 0, 0, 1 and 1, respectively. among pts with one or two classes of resistance, a decrease in the percentage of protease inhibitor-related drms and a persistence of non-nucleoside rt inhibitor-related drms, mainly 103n and 190a, were observed. methods: from hiv+ persons with a history of, or an acute episode of opc, oral fungal burden was evaluated bi-weekly and buccal mucosa tissue was collected bimonthly for a period of one year. tissue was evaluated for the presence of cd8+ t cells and e-cadherin by immunohistochemistry or flow cytometry. objectives: to define the secular trends in the epidemiology of candidaemia in queensland, australia (population, 4.1 million) over a 10-year period. methods: all episodes of candidaemia within queensland public hospitals from 1999-2008 were identified from laboratory information systems. data on species identification, antifungal susceptibility, demographics, and hospital ward of diagnosis, and denominator data (hospital admissions, accrued patient-days (pt-days) and fluconazole usage) were collected. results: over the 10-year period, 1137 unique episodes (100% case ascertainment) were identified from 42 healthcare facilities (8 tertiary, 2 paediatric, 11 secondary and 21 smaller hospitals). the median patient age was 56.4 years. the overall incidence-density was 0.45/10000 pt-days, highest in paediatric (1.28/10000 pt-days) and tertiary hospitals (0.62/10000 pt-days). over the 10 years, the incidence-density increased 3.2-fold in tertiary hospitals and 6.6-fold in secondary hospitals (both p < 0.0001 for trend), but not in paediatric or smaller hospitals. the incidence-density in icus (5.2/10000 pt-days) was 10-fold higher than in non-icu wards, but did not significantly increase over the study period.
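incidence-density as used above is the number of episodes divided by accrued patient-days, scaled to 10000 pt-days, and a fold change is the ratio of two such rates. the abstract does not report the patient-day denominators, so the figures in the sketch below are hypothetical and the function names are illustrative only:

```python
def incidence_density(episodes, patient_days, per=10_000):
    """Episodes per `per` patient-days (default: per 10000 pt-days)."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return episodes / patient_days * per

def fold_change(rate_start, rate_end):
    """Ratio of the later rate to the earlier rate."""
    if rate_start <= 0:
        raise ValueError("rate_start must be positive")
    return rate_end / rate_start

# Hypothetical denominators, NOT taken from the abstract.
first_year = incidence_density(episodes=20, patient_days=1_000_000)
last_year = incidence_density(episodes=64, patient_days=1_000_000)
print(f"first year: {first_year:.2f}/10000 pt-days")
print(f"last year: {last_year:.2f}/10000 pt-days")
print(f"fold change: {fold_change(first_year, last_year):.1f}")
```

using patient-days rather than admissions as the denominator keeps the rates comparable between wards with different lengths of stay.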
the relative proportion of episodes occurring in adult general medical/surgical (ie non-oncology/non-icu) wards significantly increased (p < 0.001), accounting for 62% of episodes at the end of the 10-year period, whereas that occurring in paediatric and adult oncology wards decreased (p < 0.001 and p = 0.07 respectively). overall, c. albicans accounted for 44%, c. parapsilosis 27% and c. glabrata 13%. although the incidence-density of all species increased over the study period, the relative proportion caused by c. albicans decreased (p = 0.007) and c. parapsilosis increased (p = 0.01). despite significantly increased fluconazole usage (from 19.7 to 30.6 ddd/1000 pt-days, p < 0.0001), the relative proportion caused by c. glabrata/c. krusei did not change (p = 0.5). the overall incidence of candidaemia has increased almost 400% in queensland public hospitals over the last 10 years. the relative proportion of episodes occurring among general medical/surgical patients and caused by c. parapsilosis has increased. candidaemia is an increasing problem the epidemiology of which continues to evolve. it is increasingly affecting patients outside traditional risk groups. conclusions: this surveillance study and pharmaco-economic modelling has proved immensely beneficial in setting up inhouse processing, improved tat, reduced costs of outsourcing and subsequent use of expensive antifungals. reduction in mortality has been noted but is not statistically significant. c. albicans was the commonest isolate; fluconazole resistance is minimal and associated mortality is lower than reported from europe. many pts received systemic prophylaxis (72%); itraconazole and fluconazole were used in 68 and 33 pts respectively. no differences emerged between empirical vs pre-emptive therapy and none of the drugs resulted to significantly influence outcome. 
in 66% of pts the initial empirical/pre-emptive drug remained unchanged after ia diagnosis, while in 16% clinicians shifted to a combined treatment. conclusion: this study allowed us to analyse multiple factors potentially influencing outcome. we confirmed that aml phase and neutropenia influence ia outcome. the present data confirm the perception that in recent years the application of a correct and timely diagnostic work-up and the availability of more efficacious and less toxic drugs (i.e. voriconazole, liposomal amphotericin b, caspofungin) have modified the course of ia. however, none of the new drugs emerged as the most efficacious in our series. even combined treatment did not confer any advantage in survival analysis. (<3% each). the first line therapy was monotherapy with voriconazole (49%), caspofungin (14%) or lipid formulations of amb (9%), or a combination of antifungal drugs (20%). the mortality rate at day 90 was 41% when first line therapy included voriconazole compared to 60% when it did not (p < 0.001). conclusion: comprehensive collections of cases based on systematic reporting and description of cases using a dedicated network of hospitals in selected regions and stringent definition criteria applied by trained clinicians and microbiologists are useful to describe ia, to assess its burden and secular trends, and to identify potential changes in diagnostic and therapeutic procedures. this network will expand to other regions in the near future, and data will help to assess the impact of new management strategies such as prophylaxis with posaconazole and the impact of modification of new diagnostic criteria as recently proposed (clin infect dis, 2008), and to identify new populations at risk for ia. nosocomial aspergillosis represents a serious threat for severely immunocompromised patients and outbreaks have been attributed to airborne sources. the role of hospital-independent sources of fungal spread, e.g. private homes or business suites, is not known.
we investigated the relationship between fungal exposure prior to hospitalisation and the ensuing onset of invasive mould infections (imi) in patients at risk. patients admitted to the department of haematology and oncology or to the department of transplant surgery of the innsbruck medical university received a structured questionnaire regarding their fungal exposure prior to hospitalisation. questions inquired about heavy fungal exposures up to five days prior to hospitalisation. 234 patients were enrolled in this study; 19% were smokers, 22% suffered from an airborne allergy, 62% lived in old buildings, 73% lived in rural areas, and 82% and 92% were exposed to any outdoor or indoor fungus source, respectively. poor housing conditions and other fungus exposures were associated with the onset of community-acquired imi only in patients with acute myelogenous leukaemia (p < 0.01). aml patients were more at risk for imi when smoking cigarettes (p < 0.05), living in the countryside (p < 0.05), having two or more fungus exposures (p < 0.05) and suffering from allergy to dust, pollen and/or moulds (p < 0.05). a similar trend was seen for lung transplant recipients receiving extensive immunosuppressive agents to treat allograft rejection. overall, 88% of imi were community-acquired cases. hospital-independent fungal sources highlight risk factors for imi in severely immunocompromised patients, and the rate of community-acquired imi is increasing. an analysis of an individual patient's risk factors for fungal infection, and of the type of fungus to which they are most susceptible, indicates the preventative strategies that are likely to be successful. to the icu-mhs with aspergillus spp detected in significant amounts in clinical samples. the underlying conditions of the patients were heart transplantation (n = 5), major heart surgery (n = 4), and other (n = 2). eight (72.7%) patients developed proven/probable ia (4 with lung infection, 2 with mediastinitis, 1 with disseminated ia, and 1 with prostate involvement).
the mortality of patients with ia was 87.5%. the icu-mhs is divided into 3 areas, one of which is equipped with hepa filters. only 1 case of ia occurred in the protected area. we measured the fungal conidia levels in the air of each of the 3 areas (508 samples analyzed) monthly. a total of 172 strains of a. fumigatus (110 clinical strains from 10 patients and 62 environmental strains) were genotyped using microsatellites (de valk et al, jcm 2005) . the mean airborne conidia levels (6 months) before and after the outbreak were, respectively, 5.6 (0−15) cfu/m3 and 1.8 (0−10) cfu/m3. no cases of ia occurred during these periods. however, all cases of ia were linked to 4 peaks of abnormally high airborne conidia levels (65, 70, 200 and 500 cfu/m3). a. fumigatus was involved in 7 cases of ia; 1 patient was infected by non-fumigatus aspergillus (not further genotyped). in 4 patients (1 mediastinitis, 2 pulmonary ia and 1 colonisation), we demonstrated similar genotypes in the air and in clinical samples. patient 1 was located in the protected area and had a unique genotype. patient 2 had two different clusters of genotypes: one cluster was similar to that of patient 3 and the other was also found in patient 4 and in the air. the genotype present in patients 2 and 4 was also detected in the air during a 6-month period. conclusions: epidemiologic and molecular typing suggests that there is a causal relationship between aspergillus causing ia and those present in the air. our finding also supports the need for hepa filtration in icu-mhs. j. guinea is contracted by fis (cm05/00171). sensitivity, specificity, positive predictive value (ppv) and negative predictive value (npv) were calculated in reference to proven and probable cases of ia. reasons for performing bronchoscopy on patients were also recorded. the protocol received approval by the local ethic committee. 
results: from the 117 samples studied, 5 (4.3%) were classified as proven, 6 (5.1%) as probable, and 35 (29.9%) as possible cases of aspergillosis. twelve samples (10.3%) represented colonisation, and 59 bal samples were obtained during routine surveillance. pulmonary aspergillosis was the main clinical presentation of ia (63.6%). using roc analysis, the best cut-off for galactomannan testing in bal was defined as 1.5 (sensitivity 90.9%, specificity 90.6%, ppv 48% and npv 99.1%). median bal gm index for the group of patients with proven/probable aspergillosis and for 'negative cases' were 3.3 and 0.5, respectively (p < 0.001). overall mortality was 20% (n = 12). the odds for death for patients diagnosed with ia were 11.8, in comparison to patients who did not have this infection (95% ci 2.9−48.4). conclusion: gm testing in the bal added to the diagnosis of ia in lung transplant recipients. in order to avoid false-positive results, a higher test cut-off should be applied to bal samples, in comparison to sera. increasing the cut-off to 1.5 resulted in a very high npv, with an associated sensitivity of >90%. objectives: 1) determine the performance characteristics of the galactomannan (gm) assay in broncho-alveolar lavage (bal) in haematology-oncology patients; 2) evaluate the prognostic value of the gm assay in this particular population. methods: the platelia gm eia assay (bio-rad) was performed on all bal specimens obtained from haematology-oncology patients at our institution between march 2005 and april 2008, in addition to routine laboratory stains and cultures. all results were reported to physicians. we conducted chart reviews to classify cases as proven, probable, possible or without invasive pulmonary aspergillosis (ipa) according to the revised definitions of invasive fungal disease from the eortc/msg consensus group. for performance characteristics, proven and probable cases were considered as ipa; possible cases were considered as without ipa. 
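the classification rule above (proven and probable cases counted as ipa, possible cases counted as without ipa) reduces each bal specimen to one cell of a 2x2 table. the following is a minimal sketch, not the authors' stata analysis, of how the performance characteristics and the mortality odds ratio reported in the results of this abstract can be recomputed. the 2x2 counts are reconstructed here from the reported totals (173 specimens, 47 gm-positive, 12 ipa cases, no missed ipa case; 12/39 and 13/106 deaths), the function names are hypothetical, and the woolf (log) method for the confidence interval is an assumption, since the abstract does not state which method was used.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% CI.

    a, b: exposed with / without the event
    c, d: unexposed with / without the event
    The Woolf method is an assumption, not the abstract's stated method.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Reconstructed specimen-level table: 12 IPA specimens, all GM-positive;
# 35 of 47 positives were false positives; 173 - 47 = 126 true negatives.
metrics = diagnostic_metrics(tp=12, fp=35, fn=0, tn=126)

# Patient-level mortality table: 12 of 39 GM-positive patients died,
# 13 of 106 GM-negative patients died.
or_value, or_low, or_high = odds_ratio_ci(a=12, b=39 - 12, c=13, d=106 - 13)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.1f}%")
    print(f"odds ratio: {or_value:.1f} (95% ci {or_low:.1f}-{or_high:.1f})")
```

under these assumptions the sketch agrees with the reported figures to the stated precision. note that the performance characteristics are computed per specimen, whereas the mortality comparison is per patient.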
the result of bal gm was not considered as a criterion to classify cases in order to avoid incorporation bias. in patients with >1 positive (gm index >0.5) specimen, only the first one was considered for the analysis. mortality was calculated at 60 days following the first bal procurement. data were analyzed with stata 8.0. results: there were 173 bal samples from 145 patients, including 101 haematopoietic stem cell transplant (hsct) recipients. we found 5 proven, 7 probable and 35 possible cases of ipa (total of 12 ipa cases; 6.9%). gm on bal was positive in 47 (27.2%) specimens. the sensitivity and specificity of the gm assay in bal were 100% and 78.3%, respectively. positive predictive and negative predictive values were 25.5% and 100%, respectively. false-positive results were found in 21 patients without ipa and in 14 with possible ipa. an index value >0.5 was significantly associated with 60-day mortality risk: 12/39 patients with a positive gm died within 60 days after bal compared to 13/106 with a negative gm (or = 3.2, 95%ci 1.3−7.8; p = 0.01). this association was even stronger when restricted to hsct recipients (or = 4.6, 95%ci 1.5−13.6; p = 0.006). the clinical utility of the gm assay in bal mainly lies in its negative predictive value, identifying patients at low risk of ipa. this test also carries a prognostic value in predicting patients at higher risk of mortality. (see table below). no significant differences were found between pneumocystis colonisation and copd status evaluated by fev-1%. likewise, no significant differences were found with respect to age, sex or lymphocyte and leucocyte blood counts. background: infliximab, a monoclonal antibody targeting tumour necrosis factor alpha (tnf-a), is indicated for the treatment of rheumatoid arthritis (ra) and other autoimmune diseases. however, its use has been associated with opportunistic infections, including pneumocystis jirovecii pneumonia (pcp). moreover, p.
jirovecii has been observed colonising humans with several disorders. objectives: to obtain information about p. jirovecii colonisation among patients with rheumatologic disease treated with infliximab. this information could be useful for assessing new strategies in the prevention of pcp in patients at risk. methods: 62 consecutive patients treated with infliximab for rheumatic disorders were included in the study. oropharyngeal wash (ow) samples were collected for p. jirovecii detection. clinical and demographic data were collected (sex, age, rheumatologic diagnosis, duration of infliximab use, concomitant use of other drugs for rheumatologic treatment, use of any other anti-tnf-a agent, use of anti-pc drugs in the last six months, smoking, and diagnosis of chronic pulmonary respiratory disease). p. jirovecii colonisation was identified in ow samples by pcr at the mtlsu-rrna gene, with primers paz102-x and paz102-y. we adapted a previously described method to a real-time pcr setting, using a lightcycler 1.5 (roche, germany). individuals in whom the presence of p. jirovecii was detected in two independent assays in the absence of respiratory symptoms or radiological findings suggestive of pcp were considered to be colonised. results: clinical and demographic data for 62 patients treated with infliximab are presented in table 1. objectives: most research on human bocavirus, a recently found respiratory pathogen, has been done by molecular biology (polymerase chain reaction, pcr). the results have been ambiguous because the virus has often been found in co-infection with other viruses, and also in clinically healthy subjects. it has been proposed that, for bocavirus, antigen detection could better indicate the aetiology than qualitative nucleic acid detection. we have developed a rapid antigen detection test for the virus. the one-step test for bocavirus vp2 antigen is based on a separation-free two-photon excitation fluorometry (arcdia tpx assay technique).
the assay protocol is simple; the swab sample is dissolved in sample buffer, and the solution is dispensed (20 µl) onto a 384-well microtitre plate (containing the reagents in dry form) for incubation and automated quantitative measurement. the immunoassay applies microspheres as solid-phase carriers of purified bocavirus-specific polyclonal antibodies. the virus antigens concentrate onto the solid phase, which is probed in real time with fluorescently labelled antibody reagents. strong positive samples are reportable in 15 minutes, while low positive and negative samples are reported in 2 hours. the performance of the method was studied with recombinant human bocavirus-like particles (vp2), and purified respiratory pathogens (group a streptococci, streptococcus pneumoniae, and influenza a and b, respiratory syncytial, metapneumo, adeno, and parainfluenza 1−3 viruses). results: analytical detection sensitivity of the method (lowest limit of detection, 0-control + 3sds) was 3 ng/ml, dynamic concentration range was three orders of magnitude, and intra-assay imprecision was 5−10%. cross-reactions with the other respiratory pathogens were not found. the new method enables rapid detection of bocavirus antigens. the new test is very easy to perform in comparison to standard elisas. the analytical sensitivity of the method is expected to allow analysis of clinical samples. the sensitivity of the antigen detection test could be significantly increased (10-100 fold) by the use of monoclonal antibodies. our future objectives include increasing the detection sensitivity, and analysis of clinical samples in order to study the correlation of antigen detection and the clinical aetiology. life-year for patients who survived. all analyses were performed using treeage software (2008). results: the overall mortality rates for empiric vancomycin (v) and semi-synthetic penicillin (ssp) were 30% and 35%, respectively, as opposed to 24% for those receiving the rapid mrsa pcr testing.
these mortality rates were similar in both the eu and us subsets. furthermore, the number needed to test in order to save one life was 20 and 11 for empiric v and ssp, respectively. using sensitivity analysis the prevalence of mrsa was varied from 5% to 80% and yielded an absolute mortality difference favouring the pcr testing group of 10% and 2%, respectively as compared to empiric v and 1% and 18% compared to empiric ssp. in eu the c/e for empiric v and ssp treated patients was €873 and €949, respectively as compared to €807 for rapid pcr testing. in the us the c/e for empiric v was $1,049 as compared to $971 for rapid pcr testing. using sensitivity analysis the prevalence of mrsa was varied from 5% to 80% and yielded favourable c/e in both the eu and us for rapid pcr testing regardless of the empiric treatment regimen. conclusion: rapid mrsa pcr testing using the xpert mrsa/sa blood culture pcr assay appears to improve mortality rates and is cost effective in the eu and us across a wide range of mrsa prevalence rates. background: rapid detection of gastro-intestinal carriage of glycopeptide-resistant enterococci (gre) from screening cultures is crucial for an efficient control of their spread. we assessed 4 media − 2 chromogenic, chromid, (biomérieux), and chromagar (chromagar microbiology), and 2 selective, vre selective (oxoid) and eccv (bd) − for their ability to detect gre using well-characterised isolates and stool samples from hospitalised patients at high risk of gre colonisation. methods: twenty-five isolates consisting of 13 gre. faecalis/faecium carrying various van genes and 12 non-vre at concentrations of 10 6 -10 1 cfu/ml and 10 6 cfu/ml, respectively, and 37 stool samples were randomised and spiral plated on all media and scored by 5 blinded investigators for characteristic colonies after 24 hrs incubation. standard confirmatory tests were done on 1 putative gre colony or on 1 characteristically coloured colony each for e. 
faecalis/faecium from the selective and chromogenic media, respectively. detection of van genes, and ddl or soda based speciation was done on pcr-sequencing. mean sensitivity (sen) and specificity (spec), and confidence intervals (cis) were estimated for each medium by a logistic regression model using a penalised likelihood approach based on the reader response for the stool samples and isolates, and additionally on confirmation test results for the stool samples, both at the aggregated (gre detected) and penalised level (correct species-colony colour correlation). results: chromagar showed the highest sen based on reader response at the aggregated and penalised level for both stool samples and isolates (table) . using confirmation test results at the aggregated level, sen for eccv was highest while the two chromogenic media showed a decrease in sen by at least 11% in comparison to the values obtained based on reader response. sens for the 2 chromogenic media were even lower (<70%) based on confirmation test results at the penalised level. eccv and chromid showed the highest specs with both reader response (stool samples) and confirmation test results at the aggregated level, and chromid also at the penalised level, with narrow cis indicating a high precision of this parameter estimate. for isolates, specs were highest for chromagar at both levels. conclusions: chromagar showed the best overall performance considering both sen and spec estimates. eccv performed well as a selective medium for gre detection from stool samples. objectives: metallo-beta-lactamases (mbls) expressed from pseudomonas are able to confer resistance to all beta-lactams with the exception of aztreonam. however, enterobacteriaceae possessing mbls exhibit moderate cephalosporin and low carbapenem mics and thus are often underestimated. herein, we describe data from new etest prototypes specifically designed to detect this problematic resistance mechanism. 
methods: 82 mbl-positive (vim or imp derivatives) enterobacteriaceae clinical isolates from 8 countries and 27 randomly selected enterobacteriaceae negative controls (including the atcc type strains) were tested against the 4 different etest mbl prototypes. beta-lactam substrates used were imipenem (ip), meropenem (mp), ceftazidime (tz) and cefotaxime (ct) with or without the inhibitors dipicolinic acid (dpa) and edta. the etest standard procedure for gram negative aerobes was used and a reduction of beta-lactam mic by equal to or greater than 3 dilutions by edta or dpa was interpreted as positive for mbl. presence of esbls was tested using the etest ct/ctl, tz/tzl and cefepime (pm)/pml strips. ampc production was detected using the etest cefoxitin (fx)/fxi and cefotetan (cn)/cni strips. of the 784 select specimens that were negative for gbs, 345 grew turquoise-blue colonies, but the majority that required further work to rule out gbs grew after 48 hours. two strains of gbs that were missed grew as white colonies on select, and even at 48 h, did not exhibit the characteristic turquoise-blue colour. conclusion: ssb enrichment followed by select subculture was extremely sensitive (99.2%) and superior to cna/ssb for detection of gbs from genital specimens. however, non-gbs organisms can produce turquoise-blue colonies on select and further work must be performed to rule out the presence of gbs. objectives: screening for chlamydia trachomatis (ct) specific antibodies is valuable in investigating recurrent cause of miscarriage, pelvic inflammatory disease and tubal damage following repeated episodes of pelvic inflammatory disease. immunofluorescence (if) is considered the gold standard for detection of ct antibodies. 
the present study aims to compare the performance of 4 other commercial tests for the detection of serum igg antibodies specific for ct: two ct igg pelisa both using major outer membrane protein (momp; ["momp-medac", ct-igg-pelisa; medac, wedel, germany and "momp-ruwag", ct pelisa; ruwag, bettlach, switzerland), one ct hsp-60 igg pelisa ("hsp60-medac", chsp60-igg-pelisa; medac, wedel, germany), and a new automated epifluorescence immunoassay ("inodiag", "must chlamydiae; inodiag, signes, france). methods: a total of 405 patients with (n = 251) and without (n = 154) miscarriages were tested by all 5 serological tests described above. sensitivity and specificity were calculated using if as gold standard. a second standard, defining true positive or negative samples as sera respectively positive and negative in all 4 others tests, was also used (see table) . objectives: participation in diagnostic microbiology internal and external quality control (qc) processes is good laboratory practice, an essential component of a quality management system and compulsory in some european countries. currently, there is no qc scheme for diagnostic oral microbiology. the aim of this study was to collate information on current qc needs, and processes undertaken in diagnostic oral microbiology laboratories. method: an on-line questionnaire was devised to ascertain interest in participating in an oral microbiology qc scheme and sent to oral microbiology diagnostic laboratories. the laboratories were identified from participants attending the european oral microbiology workshop in helsinki, 2008. following this, a pilot round of qc samples was distributed to all interested laboratories. results: we identified 12 individuals that worked in diagnostic oral microbiology laboratories and received 7 (58%) positive responses. of these 7 laboratories (representing 6 european countries) 71% did not participate in either internal or external qc. 
each laboratory processed on average a total of 4135 samples annually. 86% of participants were in favour of a european-wide oral microbiology qc scheme. the preferred frequency for receiving external qc specimens was once every 3−4 months. the most preferred specimen types were periodontal pocket and oral pus specimens (both 29%), followed by oral mucosal swabs and caries activity tests. all participating laboratories were willing to share and harmonise their specimen processing and interpretation standard operating procedures. the pilot round specimen was a periodontal pocket sample. six laboratories reported their findings in the specified time. the predominant pathogens (aggregatibacter actinomycetemcomitans, porphyromonas gingivalis) were identified by 5 of 6 laboratories. in addition to conventional culture, one laboratory used pcr. 5 laboratories performed antibacterial sensitivity testing, primarily by disc diffusion. conclusions: this is the first attempt at a standardised european-wide approach to diagnostic oral microbiology. the findings from this feasibility study have indicated that a qc scheme for oral microbiology is of interest and have raised a number of pointers for subsequent rounds of specimens. further work to improve the quality, and to standardise the methodology and the interpretation of diagnostic oral microbiology at the european level, is on-going. objectives: since severe sepsis with acute organ dysfunction can be fatal within hours, it is customary to start empirical broad-spectrum antimicrobial therapy in all patients hospitalised for a suspicion of systemic inflammatory response syndrome. however, increased use of broad-spectrum antimicrobials over the years has contributed to the emergence of drug resistant strains of bacteria. in particular, drug resistance among gram-positive bacteria, the leading cause of sepsis, is now a serious problem.
the objective of this preliminary study was to develop a method for distinguishing between gram− and gram+ bacterial infection. methods: in this prospective study, leukocyte and neutrophil counts, crp, esr, and quantitative flow cytometric analysis of neutrophil complement receptors 1 (cr1/cd35) and 3 (cr3/cd11b) were obtained from 289 hospitalised febrile patients, of whom 89 had bacterial and 38 viral infection. the patient data were compared to 60 healthy controls. results: in gram− infection (n = 21) the average amount of cd11b on neutrophils was significantly higher than in gram+ infection (n = 22). in contrast, serum crp level was significantly higher in gram+ than in gram− infection. other measured parameters did not differ significantly between gram+ and gram− infections. we derived a crp/cd11b ratio by dividing the serum crp value by the amount of cd11b on neutrophils. thirteen (76%) out of 17 patients with gram+ sepsis had a crp/cd11b ratio above the cutoff value of 3.1 (figure 1). of these 13 patients, 9 (70%) were diagnosed with streptococcus pneumoniae, 2 with staphylococcus aureus, 1 with enterococcus faecalis, and 1 with both streptococcus intermedius and streptococcus oralis. corresponding percentages in patients with local gram+ infection, gram− infection, clinical pneumonia, other clinical infection, and viral infection were 20%, 14%, 30%, 15%, and 0%, respectively. conclusion: the detection of gram+ sepsis is possible by combining neutrophil cd11b data and serum crp level. the crp/cd11b ratio displayed 76% sensitivity and 80% specificity for detection of gram+ sepsis. the proposed crp/cd11b ratio test could, for its part, assist physicians in deciding on appropriate antibiotic treatment in patients with severe bacterial infection. a bacterial biofilm is a structured consortium of bacteria cells surrounded by a self-produced polymer matrix.
biofilms may be monospecies or polyspecies biofilms. biofilm-growing bacteria give rise to chronic infections, which persist in spite of therapy and in spite of the host's immune and inflammatory responses. biofilm infections are characterised by persisting pathology and immune response (in contrast to colonisation). bacterial biofilms use both biofilm-specific (b) and conventional (planktonic) resistance mechanisms (p) when they are exposed to antibiotics. the following resistance mechanisms have been described in bacterial biofilms: 1. stationary phase physiology (b), low oxygen tension (b) and slow growth (b), especially inside biofilms, whereas the surface of biofilms is more similar to planktonic growth. 2. penetration barriers (b), binding to the polymer matrix (b). 3. mutations, hypermutators (b, p). 4. chromosomal beta-lactamase is upregulated (b, p). 5. antibiotic tolerance/adaptive resistance (b). 6. efflux pumps (b, p). 7. alginate production (b). 8. high cell density and quorum sensing (b, p). 9. pbp 3 − sos response ? (b). the knowledge of these resistance mechanisms can, however, be used to design new therapeutic approaches, especially as regards quorum sensing inhibitors. we consider two factors that contribute to treatment failure in the absence of inherited resistance: the density of the population being treated and the physiological state of the bacteria. we also explore how these factors might contribute to the evolution of inherited resistance during the course of treatment. we conclude with a computer- and chemostat-assisted consideration of the potential clinical implications of these density and physiology effects and make suggestions for treatment protocols to deal with them. using in vitro cultures of staphylococcus aureus atcc25923 or the clinical isolate ps80 and antibiotics of six different classes, we determined the functional relationship between the inoculum density and the efficacy of the antibiotics.
as measured by the rates and extent of kill and/or the minimum inhibitory concentration (mic), the efficacy of all of these antibiotics declined with increases in the density of bacteria, albeit to different extents. for daptomycin and vancomycin, much of this density effect can be attributed to bacteria-associated declines in the effective concentration of the antibiotic in the medium. for gentamicin, vancomycin, ciprofloxacin and oxacillin, our bioassays failed to reveal significant reductions in their effective concentration in the medium. the effects of the physiological state of s. aureus on the efficacy of these antibiotics were examined for bacteria from cultures in "stationary phase" for different times and from chemostats run at different generation times. these experiments are currently under way but by the time of the symposium we will have the full (and true) story. it is, however, clear that the efficacy of all of these antibiotics declines with the time a culture has spent in stationary phase (its "age"). moreover, even slowly dividing cultures from chemostats are more susceptible to antibiotic-mediated killing than early stationary phase batch cultures. the efficacy in killing non-growing bacteria varies among the bactericidal antibiotics examined. to ascertain the potential clinical implications of these density and physiological effects, we use both computer and in vitro simulations of antibiotic treatment. the results of these simulations provide compelling support for the proposition that antibiotic treatment regimes, including those designed to prevent the ascent of resistance, should take into account the anticipated density and physiological state of the target population of susceptible bacteria. an increasing number of neurotropic viral infections have played an important role around the world over the last decade. the list includes west nile virus, nipah and hendra virus (both paramyxoviruses), as well as chikungunya virus, which suddenly emerged.
furthermore, the relation between jc virus and progressive multifocal leukoencephalopathy (pml) in patients with multiple sclerosis treated with a new immunosuppressive drug has drawn our attention. the development and implementation of molecular amplification methods has helped us to detect these viruses more efficiently. these technologies are now used routinely in a large number of laboratories to enable the detection of more commonly known neurotropic viruses, like hsv, vzv and the neurotropic picornaviruses like enterovirus and parechovirus. the pitfalls of these molecular methods have generally been solved by implementing regular quality control testing schemes, such as those organised by qcmd (quality control of molecular diagnostics), and by the introduction of internal controls during the whole diagnostic process. finally, with the ability to quantify the amount of nucleic acid present in csf, more information on the pathogenesis of these viral infections, as well as a significant tool to monitor the antiviral effect of treatment options for these viruses, has become available. to as a rare disease in europe restricted to some endemic foci. however, current data suggest that the incidence of ae has significantly increased, and the disease is spreading to the north, west, and east. ae has become an emerging disease in the baltic countries. thus, human infections with e. multilocularis have arrived in the "centre" of europe. ae is a life-threatening disease, and is characterised by a tumour-like lesion in the liver. the larva can infiltrate the surrounding tissues and metastasize to distant organs. in an attempt to classify the large variety of anatomical findings in ae, the pnm-classification system was developed and serves as a benchmark for standardised evaluation of diagnostic and therapeutic measures.
modern imaging techniques, such as ultrasound, ct or mri and pet/ct, have contributed not only to a much better description of the lesions, but also to a judgment upon the activity of the metacestode. the differential diagnosis of ae ranges from haemangioma-like lesions of the liver to cancer. diagnostic skills are limited, which is the reason for frequent misdiagnosis in geographic areas where ae is rather unknown. continuous treatment with benzimidazoles is the backbone of a lifelong management of ae. however, radical resection is the procedure of choice and should always be strived for. ae is still a rare disease in europe, but where it occurs, it is often diagnosed too late. patients are misdiagnosed for months and years before receiving the correct treatment. at that late stage the disease has progressed, and radical cure of the liver lesion(s) is no longer possible. recent reports provided hints for an accelerated larval growth of echinococcus spp. in the immunodeficient host. careful monitoring of patients receiving immune-modifying drugs is warranted. the modern clinical management and long-term parasitostatic treatment with benzimidazoles are highly effective. thus, a higher alertness for the "tumours from the centre" would improve the prognosis of this hepatic disease resembling liver cancer. the percutaneous treatment of liver hydatid cysts was considered to be contraindicated due to two main potential risks: anaphylactic shock and abdominal dissemination of the disease. since the first percutaneously treated case was published, several series of successful percutaneous treatment of hydatid cysts of the liver and of other abdominal organs, peritoneum, thorax, soft tissue and orbital cavity have appeared in the literature. percutaneous treatment of hydatid liver disease is an effective and safe procedure with its unique advantages (e.g., shorter hospital stay, low complication rate).
today, the percutaneous approach has an important role in the treatment of hydatid cysts not only in the liver but also in other organs and tissues. therefore it must be the first treatment option whenever it is indicated. in europe, dirofilaria immitis and dirofilaria repens are responsible for autochthonous filariases in dogs. adult d. immitis worms, located in the heart, kill dogs, and d. repens is often found in subcutaneous nodules in dogs and cats. the microfilariae are present in the blood of these animals. dirofilariasis is due to the transmission of microfilariae by the bites of certain mosquitoes (aedes, culex, anopheles, mansonia, psorophora and taeniorhynchus). usually non-pathogenic to humans, these parasites are particularly present around the mediterranean basin. d. immitis is very rare in humans in europe; it is sometimes found in a pulmonary nodule, and a heart location has not been described. d. repens is more frequent and emerging in humans. usually, only one larva develops, producing an immature adult worm inside a subcutaneous nodule. ultrasound examination may suggest the parasitic origin of the lesion before extraction and parasitological diagnosis of the worm. more often, a fortuitous diagnosis is made on histological examination. very rarely, an adult worm may mature and produce systemic diffusion of microfilariae. dirofilariasis due to d. repens can present problems in diagnosis and treatment. an ocular and subconjunctival location of the worm and a subcutaneous nodule enclosing an immature adult are the commonest clinical forms. exceptional pulmonary locations are described. the subcutaneous locations described are: skull, cheek, breast, inguinal area, buttocks, arms and legs. cases of testicular location with painful symptoms have been observed. blood hypereosinophilia has only exceptionally been observed in humans. the disease is treated surgically, by excision, without chemotherapy.
while the majority of esbls isolated in clinically relevant gram negative bacteria (gnb) (mostly enterobacteriaceae, p. aeruginosa, a. baumannii) are tem-, shv- or ctx-m-types, a few others have been reported (sfo, bes, bel, tla, ges, per, veb-types, and some oxa-esbls). laboratory detection of esbl-producers is important to avoid clinical failure due to inappropriate antimicrobial therapy and to prevent nosocomial outbreaks. selective culture media (macconkey and drigalski agar supplemented with cefotaxime and/or ceftazidime) have been proposed for detection of gnb resistant to expanded-spectrum cephalosporins (esc). media using chromogenic substrates and selective antibiotics have been developed recently for the detection and presumptive identification of esbl-producing enterobacteriaceae directly from clinical specimens. detection of esbls based only on susceptibility testing is not easy due to the variety of beta-lactamases and their variable expression of beta-lactam resistance. commercially available esbl detection methods yield at most 90% accurate esbl identification, since some esbl-producers may appear susceptible to some escs. therefore, any organism showing reduced susceptibility to esc should be investigated using esbl confirmatory tests. these tests should be able to discriminate between esbl-producers and those with other mechanisms conferring esc resistance. these phenotypic tests (double-disk synergy test, esbl etest, and the combination disk method) are based on clavulanate inhibition and esc susceptibility testing. they often need slight modifications: reducing the distance between the disks of esc and clavulanate, using cefepime (not hydrolysed by ampcs), using cloxacillin-containing plates (cloxacillin inhibits ampc), or double inhibition by edta and clavulanate (masking metallo-enzymes). enzymatic tests have also been proposed for identification of esbl-producers.
several pcr-based techniques (end-point or real-time) have been developed for use on clinical samples or on colonies. several esbl genes have been detected using pcr coupled to pyrosequencing, inverse hybridisation, dhplc or fluorescent probes. these techniques, although more specific, require technical knowledge and special equipment, are costly, and detect only known genes, regardless of their expression. detection of esbl-producers remains a challenge for the microbiology laboratory, and one should be aware that esbl screening media are now available. resistance to antimicrobial agents has become common in many bacterial species, particularly those that cause human infections. the rapid detection of resistant organisms directly in clinical samples by real-time pcr coupled with molecular beacons, or of potentially resistant bacteria and yeast in blood culture bottles by peptide nucleic acid-fluorescence in situ hybridisation (pna-fish), is already having a positive impact on antimicrobial therapy. the direct detection of mycobacterium tuberculosis in sputum in approximately 2 hours, with concomitant detection of mutations in rpob indicating rifampin resistance (as a surrogate for multidrug resistance), will likely improve the outcomes for tuberculosis patients in many developing and developed countries in the near future. several molecular technologies, including microarrays, bacterial tag encoded flx amplicon pyrosequencing (btefap), and ultra deep sequencing, have not yet transitioned to clinical laboratories but will likely provide even greater information about antimicrobial resistance, not just in a single species but in a whole community of microorganisms. complex wounds, like diabetic foot ulcers, containing multiple resistance genotypes are amenable to analysis by btefap. 
the implementation of these technologies in the clinical laboratory will be expensive, but the potential to dramatically improve therapeutic outcomes, especially for life-threatening diseases, is unprecedented. objective: to determine the appropriateness of antimicrobial therapy (amt) in 11 dutch hospitals. method: data were obtained from a prevalence survey performed within the dutch surveillance network for nosocomial infections (prezies). amt administered on the day of the survey was registered. antiviral and antifungal drugs, tuberculostatics, cements containing amt and prophylaxis administered in the operating theatre were excluded. the appropriateness of amt was assessed according to a standardised algorithm based on the local antimicrobial prescription guidelines. each patient was classified as appropriate use, inappropriate use or insufficient information. figure: relative risk of inappropriate (ia) use of amt against largest hospital (hospital c). results: a total of 3,546 patients were included, of whom 1,075 (30%, range per centre (rpc): 23−37%) received amt. in the latter group, amt was considered appropriate in 70% (rpc: 57−84%), inappropriate in 17% (rpc: 3−32%) and was not judged because of insufficient information in 13% (rpc: 1−30%). there was considerable variation in inappropriate use among the participating centres (figure). in univariate analysis, older age, the use of quinolones, being on the urology ward and the presence of a suprapubic catheter were significantly associated with inappropriate use. admission to the icu and the presence of an intravascular catheter were significantly associated with appropriate use. in multivariate analysis, the presence of a suprapubic catheter, being on the urology ward and the use of quinolones were determinants of inappropriate use. this study showed large differences in overall use and appropriateness of use of amt between hospitals. 
based on these results it is possible to define targets for intervention to improve the prudent use of amt. the high fraction of patients with insufficient information in several centres may have influenced the analyses and should be addressed in future studies. m. struelens°, s. metz-gercek, r. mechtler, f. buyle, a. lechner, h. mittermayer, f. allerberger, w. kern objectives: the eu-project antibiotic strategy international (abs) qi team developed process qis for auditing the performance of key treatment and prophylactic practices. an international network of pilot hospitals tested these tools for feasibility, reliability and sensitivity to improvement. methods: qis included: 1. surgical prophylaxis (indication, drug choice, timing and duration of administration); 2. management of community-acquired pneumonia (cap) (blood culture and legionella antigen tests and drug choice for empirical treatment); 3. management of s. aureus bacteraemia (echocardiography, iv catheter removal and duration of therapy); and 4. iv-po switch for bio-available antibiotics. a minimum of 40 consecutive cases per centre and qi were retrospectively reviewed from clinical, laboratory and administrative records and assessed for data availability, inter-observer reliability, data collection workload and performance score. results: a total of 1240 patients were evaluated in 11 acute care hospitals from 5 countries, with a range of 80 to 500 cases and 2 to 9 centres per indicator. seven centres had already implemented antibiotic quality improvement and audit programmes. availability of data was >85% of cases and ranged between 87% (catheter removal in s. aureus bacteraemia) and 100% (diagnostic tests for cap). 13/14 indicators were found to be reliable with kappa 0.60 (good to excellent agreement). the workload per case ranged from a median time of 16 (cap) to 35 min (iv-po switch). 
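as an illustration of the inter-observer reliability statistic reported above, the following minimal python sketch computes cohen's kappa for two reviewers who rate the same cases. it is an assumption-laden example, not the abs team's procedure: the abstract does not state which kappa variant was used, and the function name, the binary coding (1 = indicator met, 0 = not met) and the example ratings are hypothetical and not taken from the audit data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases on which both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented ratings for ten cases (1 = indicator met, 0 = not met).
reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reviewer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

in this invented example the two reviewers agree on 8 of 10 cases while chance agreement is 0.5, so kappa corrects the raw 80% agreement downwards; this is why kappa rather than raw agreement is usually quoted for reliability.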
the intention-to-treat qi scores showed high levels of adherence to the surgical prophylaxis qi bundle, with median values of 81 to 97% for hip prosthesis and 65 to 92% for colo-rectal surgery. for cap management, diagnostic testing appeared sub-optimal (<56% compliance with idsa guidelines). for s. aureus bacteraemia management, indicator results ranged from 60 to 65%. for use of bio-available antibiotics, a median of 45% of iv administrations were avoidable. there were marked differences in scores between centres for all qis. conclusions: the abs qis are reliable and broadly applicable tools for auditing antibiotic treatment and prophylactic practices. inter-hospital variation in adherence to recommended practice indicates substantial potential for improvement with different local priorities. these qis can be recommended for assessing the effect of quality of care interventions at either local or multi-centre level. d.j. noimark°, e. charani, s. smith, b. cooper, i. balakrishnan, s.p. stone (london, uk) introduction: reduction of clostridium difficile infection (cdi), which often follows use of third generation cephalosporins, is a national priority. over a three-year period, antibiotic policies were reviewed and changed in an elderly medicine department according to local sensitivities of common pathogens and levels of cdi. a laminated pocket-sized card describing antibiotic policies was given to all doctors in the department on induction, with instructions not to depart from these without microbiologists' approval. this prospective controlled interrupted time series examines whether this intervention increased compliance with antibiotic policy and decreased cdi incidence. methods: the department's "narrow-spectrum, no cephalosporin" antibiotic policy was changed on 1st august 2006 to replace trimethoprim with cephradine (1st generation cephalosporin) as empiric treatment for urinary tract infection, reflecting local escherichia coli sensitivities. 
in october 2007, all cephalosporins and quinolones were removed from the policy as cdi levels had increased. notional 7 day antibiotic usage was calculated from prospective pharmacy generated data with aspirin, calcium, bisphosphonate & laxative prescription use as a non-antibiotic control, and analysed by segmented regression with a robust variance estimator. cdi rates were prospectively collected separately & analysed by a poisson regression model. results: an immediate response to change in antibiotic guidelines was observed (figure) . from august 06-sep 07 there was a highly significant increase in cephalosporins (85-100% of which was cephradine alone) (p < 0.001), a significant fall in trimethoprim (p < 0.004) and a significant increasing trend in cdi ( no tools existed to assess the readiness of public hospitals to receive this technology, and therefore guide resource allocation to facilitate implementation. aim: to assess the readiness of victorian public hospitals to introduce electronic antimicrobial stewardship. method: literature on readiness for change, organisational culture and information technology acceptance were reviewed. group interviews with project teams at site initiation meetings, one on one interviews with project officers at subsequent meetings, and observation where appropriate were all used to determine potential barriers and enablers. this information was recorded using a 'readiness assessment tool' and analysed to identify a number of key domains. to triangulate the data, questionnaires were distributed to project officers asking them to assess their sites' readiness to implement the system. results: a novel 'readiness assessment tool' was developed. it covered the domains of technical readiness, skills readiness, process readiness, administrative support readiness, resource readiness and hospital organisational characteristics. 
assessments at several hospitals highlighted a variety of issues at different sites and allowed early efforts to address these. a formative readiness assessment can be used to identify systematic problems that might facilitate or hinder uptake of electronic antimicrobial stewardship and to inform the adopters of potential resources required. [1] buising, k, thursky, k, robertson, m, black, j, street, a, richards, m & brown, g (2008) . electronic antibiotic stewardship-reduced consumption of broad-spectrum antibiotics using a computerised antimicrobial approval system in a hospital setting. j antimicrob chemother. w.v. kern°, m. steib-bauert, a. pritzkow, g. peyerl-hoffmann, h. von baum, u. frank, m. dettenkofer, c. schneider, k. de with, h. bertz (freiburg, ulm, de) objectives: fluoroquinolone prophylaxis (fqpx) may reduce morbidity and mortality in cancer patients (pts) with neutropenia, but the development of fluoroquinolone resistance (fqr) in escherichia coli and other target organisms limits its usefulness. we evaluated changes in the incidence density of gram-negative bloodstream infection (gnb) and in the in vitro fqr rates after the introduction of fqpx (with levofloxacin) as a standard of care for pts with high risk neutropenia in a university hospital. methods: we collected individual data for 357 pts admitted during baseline and during the first months following the intervention to assess clinical outcomes. individual pt data were compared with aggregate data (3-month periods). aggregate data analysis (unit-wide antibiotic consumption, gnb and numbers of in vitro fqr bloodstream isolates) was continued for a total of eight 3-month periods for both the haematology-oncology service and for general internal medicine. the new policy was introduced in the second half of the year 2005 when unit-wide baseline fqr of e. 
coli and of coagulase-negative staphylococcal (cons) bloodstream isolates had been 15% and 80% in the haematology-oncology unit, and 8% and 60% in general internal medicine, respectively. the individual pt data analysis revealed that pts not given fqpx had a much higher incidence of gnb than those given fqpx ( -2007) . the monthly use of iv and oral quin was calculated based on data from the pharmacy department. statistical analyses were performed using segmented linear regression analysis. bayesian model averaging was used to account for model uncertainty. results: before the interventions the use of quin (both iv and total) was stable. the best fitting models indicated that the first intervention was associated with a stepwise reduction in iv use of 71 prescribed daily doses (pdd) (95% ci: 47, 95 (p < 0.001)). there was also an indication of smaller reduction in iv use associated with intervention 4, but only the intervention 1 effect was robust to model uncertainty. the overall use of quin was also significantly reduced (figure) with a large stepwise reduction of 107 pdd (95% ci: 58, 156) associated with intervention 2. this study showed that the hospital-wide use of quin can be significantly improved (and decreased) by an active policy consisting of multiple interventions. marwick°, j. broomhall, c. mccowan, s. gonzalez-mcquire, k. akhras, s. merchant, p. davey (dundee, high wycombe, uk; raritan, us) aim and objectives: to describe the antibiotic treatment and outcomes stratified by severity in a representative sample of adult patients aged 18 or older who were treated in hospital for skin and soft tissue infections. inadequate. we also judged that 43% of patients received unnecessarily broad spectrum therapy. conclusions: ssti is common and is associated with significant mortality. however, choice of empirical therapy is not evidence based, with significant under treatment of high risk patients. 
ab were mostly (16/17) prescribed by gps and delivered by public (n = 14) or hospital pharmacies (n = 3). surveillance of ab use in nhs was organised in only 4 ms. in 3 countries a nh specific pharmaceutical formulary was available. prescription profiles by prescriber were available in 5 countries. other quality improvement initiatives in nhs such as regular training of prescribers, promoting microbiological sampling, collection of antimicrobial resistance profiles or pharmacist advice on ab prescription were scarce. guidelines for ab treatment of most frequent infections were available in many countries but were focussing on ambulatory care and did not consider the specific nh situation. only in 1 country the presence of an infection control practitioner was compulsory and partnership with hospital infection control teams was legally imposed in 3 ms. conclusion: important structural, functional and regulatory nh differences exist between eu countries. specific tools to improve infection prevention and ab therapy in nhs should take into account these differences. a european nh network was created in the framework of the esac nh subproject, which will organise point prevalence surveys on ab use in 2009. c. escherichia coli in south-western finland j. jalava°, o. meurman, h. marttila, a. hakanen, m. lindgren, k. rantakokko-jalava (turku, fi) objectives: extended-spectrum betalactamases (esbls), especially enzymes of the ctx-m group, are spreading rapidly in europe. enterobacteriaceae with reduced susceptibility to third generation cephalosporins and a positive esbl confirmatory test are also increasing in southwest finland. the purpose of this work was to study the resistance genetics of these esbl-positive enterobacteriaceae. methods: the study comprises a total of 271 clinical enterobacteriaceae strains isolated from both inpatient and outpatient specimens. 
all enterobacteriaceae strains that were esbl confirmatory test positive between january 2004 and december 2008 were included in this study (263 escherichia coli, 8 klebsiella pneumoniae, one isolate per patient). of these strains, 225 (83%) were urine isolates. resistance determinations were done using the disk diffusion method (clsi) or vitek 2, and esbl confirmations by the double disk method using cefotaxime and ceftazidime with and without clavulanate. thus far, 219 strains (those collected by the end of june 2008) have been analysed for the presence of the most important esbl genes (tem, shv and ctx-m) using pcr and pyrosequencing as described before (haanpera et al. aac, 52:2632; 2008). results: in 2004 only 10 esbl-positive strains were found. all of them harboured a ctx-m-type esbl gene. since then, the number of esbl-producing enterobacteriaceae strains has increased significantly, being tenfold higher in 2008 than in 2004 (figure). a large majority, 197 (90%) of the 219 strains analysed thus far, had a ctx-m-type esbl gene. most of those (79%) belonged to the ctx-m-1 group according to the pyrosequencing results. the ctx-m-9 group was the next most common, with 20% of the ctx-m genes belonging to this group. only two strains with a ctx-m group 2 enzyme were found. conclusions: enterobacteriaceae strains which produce esbl are increasing rapidly in southwest finland. this is especially true of e. coli strains isolated from urine. towards the end of the study period, the esbl enzymes were almost exclusively ctx-m, the ctx-m-1 group being the most common. further research is needed to characterise the genetic elements that carry these esbl genes. esbl strains and the proportion of ctx-m genes in 2004-2008. (2000) (2001) (2002) (2003) (2004) (2005) (2006) in france (n = 6), spain (n = 4), portugal (n = 6), uk (n = 11), kuwait (n = 2), canada (n = 13) and china (n = 10), including hong kong (n = 3) were studied. 
clonality was established by pfge and phylogenetic groups of ec and kp were determined as reported. susceptibility testing (clsi), blactx-m-14 transferability and location (i-ceu-i/s1 nuclease) were investigated. plasmid analysis included determination of inc group (pcr-replicon typing, hybridisation, sequencing) and comparison of rflp patterns. association of blactx-m-14 with isecp1, isecp1-is10 or iscr1 was established by pcr and sequencing. we identified 42 pfge types among 52 isolates: 38/47 ec, 3/4 kp and 1/1 cf. distribution among phylogroups was as follows: i) ec: a (n = 7), b1 (n = 3), b2 (n = 5) and d (n = 23), and ii) kp: kpi (n = 2) and kpii (n = 1). resistance to tetracycline (76%), nalidixic acid (74%), streptomycin (67%), sulfonamides (67%), ciprofloxacin (60%) and trimethoprim (43%) was common. were spreading horizontally in our hospitals and, here, we characterised the plasmids responsible in the major k. pneumoniae strains identified during the survey. methods: plasmids from representative k. pneumoniae strains with ctx-m-15 enzyme were extracted by alkaline lysis and compared by apai, psti and ecori restriction analysis. they were transferred into e. coli dh5α by electroporation. transformants were selected on cefotaxime-containing agar and were screened by pcr for beta-lactamase genes, the aminoglycoside resistance genes aac(6')-ib and aac3-iib, and the plasmid-mediated quinolone resistance genes qnra/b/s. results: twelve isolates were characterised, representing 5 major strains (a-d, and f) found in the most-affected hospitals. restriction analysis divided their plasmids into several groups. representatives of strain a (n = 4) had essentially the same plasmid (group 1), as did the two representatives of strain d (group 2a). one strain f isolate had a plasmid (group 2b) very similar to plasmid 2a from strain d, indicating possible horizontal transfer. 
plasmids of group 3 were retrieved from representatives of strains b and c, again indicating probable transfer. plasmids from three other strains differed substantially from each other and from plasmids 1, 2a, 2b and 3. nevertheless, on all plasmids, blactx-m genes were linked to an upstream isecp1 element, known to be involved in their mobilisation. all encoded multi-resistance: all but one group 1 and one ungrouped plasmid carried aac(6')-ib; blaoxa-1 and aac(3)-iia were detected on all except group 1 plasmids; blatem was found on group 1, 2b, one group 3 and two ungrouped plasmids. blashv and qnra/b/s genes were not detected. the considerable diversity of plasmids encoding ctx-m-15 enzyme in major slovenian k. pneumoniae strains suggested only limited transfer, even when multiple strains were present in the same hospital. there was evidence of plasmid transfer between strains b and c, and possibly between strains d and f, although these plasmids were not strictly identical. analysis of resistance genes encoded by the plasmids revealed diversity, with groupings coinciding largely with those based on restriction profiles. a. ingold, g. borthagaray, a.k. merkier, d. centrón, h. bello, c.m. márquez° (montevideo, uy; buenos aires, ar; concepción, cl) objectives: to examine the genetic context of the class 1 integron harbouring blactx-m-2 in fifteen nosocomial k. pneumoniae isolates from south america in order to enhance the understanding of the spread of antibiotic resistance in the region. methods: dna was extracted using the axyprep™ bacterial genomic dna miniprep kit. the cassette array was analysed using primers hs458/hs459, which target adjacent conserved regions. the surroundings were examined using two pcr primer pairs, hs817/hs818 and hs825/hs826, to amplify the initial (iri) and terminal (irt) inverted repeat boundaries, respectively. 
the primer pair hs825/hs911 was used whenever a negative result was obtained with hs825/hs826. all pcr products were purified and sequenced, and the data were analysed with the ncbi blast tool. the sequence obtained with primers hs817/hs818 revealed the presence of three different transposon backbones at the iri end. the tn5036-like module and the tn21-like module were present in 4 isolates, the tn1696-like module was present in 7 isolates. no amplicons were obtained with primers hs825/hs826, which amplify a tn21-like insertion. two uruguayan isolates with a tn5036 boundary at the iri end were tested with hs825/hs911, which target a tn5036-like backbone, and one generated a product consistent with a tn5036-like mer region. uruguayan isolates carried a single aada1 cassette (4/5) and the other one contained a dfra17-aada5 array, while the four argentinian isolates carried the combination aaca4-aada1-orfd. analysis of the arrays of the chilean isolates is in progress. conclusions: among the extended-spectrum beta-lactamases, the cefotaximases constitute a rapidly growing cluster of enzymes that have disseminated geographically. there is a high frequency of isolation of ctx-m-2-producing k. pneumoniae associated with a class 1 integron in the region. although the presence of iscr1 linked to blactx-m-2 is common in k. pneumoniae isolates, this study provides new and relevant information on the sequence context at the iri. here we report on the cassette array diversity and the diversity of elements in which the class 1 integrons are embedded. different integrons/transposons carrying the blactx-m-2 gene seem to be circulating and different regional patterns could be emerging. this study highlights the ability of different genetic elements to act cooperatively to spread and rearrange antibiotic resistance. l. vinué, a. garcía-fernández, d. fortini, p. poeta, m.a. moreno, c. torres, a. 
carattoli° (logroño, es; rome, it; vila real, pt; madrid, es) objectives: ctx-m enzymes are frequently detected in europe. in particular, ctx-m-1 and ctx-m-32-producing strains have been recovered from both humans and farm animals in spain, italy, greece, and portugal, suggesting the existence of community reservoirs for these enzymes. the aim of this study was to compare escherichia coli strains and plasmids harbouring blactx-m-1 and blactx-m-32 genes isolated from humans and animals. methods: four ctx-m-1 and eight ctx-m-32 epidemiologically unrelated e. coli producers from sick or healthy animals (pig, dog, cow and chickens) and from humans (urine, blood and faecal samples) were analysed by xbai-pfge, plasmid transferability, pcr-based replicon typing, plasmid restriction analysis and southern blot hybridisation. all isolates were from spain except the dog isolate, which was from portugal. the genetic context of the blactx-m genes was previously investigated for all the strains. results: three ctx-m-32 strains (one from a healthy chicken and two from hospitalised patients) showed the same pfge pattern. a chromosomal localisation of the blactx-m-32 gene was suspected in these strains. the five remaining ctx-m-32 producers carried the blactx-m-32 gene on plasmids belonging to the incn (4 strains) or untypable groups (1 strain). two incn plasmids showed identical pvuii restriction patterns: one was identified in a strain from a healthy chicken and one was from a hospitalised human patient; these two strains were isolated in 2002 and 2004, respectively, and showed different pfge patterns. ctx-m-1 producers (three from animals and one from a healthy human) did not show clonality by pfge and the blactx-m-1 gene was always located on plasmids, three belonging to the incn and one to the inci1 groups. two of the incn plasmids carrying the blactx-m-1 gene showed highly related restriction patterns: one was from a healthy dog and one from a healthy human. 
conclusion: this study demonstrated the presence of clonal e. coli ctx-m-32 producers in animal and human sources and also detected epidemic incn plasmids disseminating among unrelated isolates from humans and animals, clearly suggesting a potential animal reservoir for the blactx-m-1/32 genes. o309 characterisation of bladim-1, a novel integron-located metallo-beta-lactamase gene from a pseudomonas stutzeri clinical isolate in the netherlands l. poirel°, j. rodriguez-martinez, n. al naiemi, y. debets-ossenkopp, p. nordmann (k.-bicetre, fr; amsterdam, nl) objectives: characterisation of the mechanism involved in the uncommon resistance to carbapenems observed from a pseudomonas stutzeri isolate recovered from a patient hospitalised in the netherlands with a chronic tibia osteomyelitis. that strain was resistant to ticarcillin, piperacillin-tazobactam, imipenem and meropenem, of intermediate susceptibility to ceftazidime and cefepime, and susceptible to aztreonam. methods: screening for metallo-beta-lactamase (mbl) production was performed using the e-test method with a strip combining imipenem and edta. shotgun cloning was performed with xbai-digested dna of p. stutzeri and pbk-cmv cloning vector. selection was performed on amoxicillin and kanamycin-containing plates. results: e. coli top10 (pdim-1) recombinant strains were obtained, displaying resistance to penicillins and ceftazidime, reduced susceptibility to cefepime, imipenem and meropenem, and full susceptibility to aztreonam. sequence analysis identified a novel ambler class b betalactamase dim-1 for "dutch imipenemase" (pi 6.1) weakly related to all other mbls. dim-1 shared 52% amino acid identity with the most closely related mbl gim-1, and 45 and 30% identity with the imp and vim subgroups, respectively. dim-1 hydrolyzes very efficiently imipenem and meropenem, expanded-spectrum cephalosporins, but spares aztreonam. 
the bladim-1 gene was in the form of a gene cassette located at the first position in a class 1 integron, but the 59be of that gene cassette was truncated, giving rise to a fusion with an aadb gene cassette encoding an aminoglycoside adenylyltransferase. the third and last gene cassette corresponded to the qach cassette encoding resistance to disinfectants. conclusion: a novel mbl gene was identified in p. stutzeri, further underlining (i) the diversity of acquired mbl genes, especially among non-fermenters, (ii) that pseudomonas sp. may be a reservoir of these genes and (iii) the possibility of spread of important resistance determinants in the northern part of europe. isolates in greece p. giakkoupi, o. pappa, m. polemis, a. bakosi, a. vatopoulos° (athens, gr) objectives: metallo-beta-lactamases of the vim family are the main mechanism of carbapenem resistance in p. aeruginosa in greece. in this preliminary report we attempted to survey the subtypes of vim beta-lactamase currently prevailing in p. aeruginosa clinical isolates in greek hospitals, the genetic relatedness of the respective isolates, as well as the genetic environment of the blavim gene. methods: fifteen mbl-producing and epidemiologically unrelated p. aeruginosa clinical isolates were collected in september 2006 from fifteen different hospitals around greece. mbl production was initially identified by an edta synergy test. identification of the blavim gene, as well as mapping of the integrons carrying the blavim cassette, were performed by pcr and sequencing of the products. the o serotypes of the isolates were determined by a slide agglutination test using p. aeruginosa antisera (biorad). molecular typing was performed by pulsed-field gel electrophoresis of spei-restricted genomic dna. results: the blavim-2 gene was detected in nine isolates, blavim-4 in five and blavim-1 in only one isolate. 
the blavim-2 cassette of all nine isolates was located on the 1600 bp variable region of a class i integron, preceded by the aaca29 gene cassette. the blavim-4 cassette of all five isolates was the first cassette of the 3200 bp variable region of a class i integron, followed by the aaca4 and blapse-1 gene cassettes. blavim-1 was the only cassette of a class i integron. vim-2 producers belonged to o8, o11 and o12 serotypes, whereas four isolates were non-typeable. vim-4 producers belonged to the same three serotypes, whereas only one was non-typeable. the vim-1 producer belonged to the o12 serotype. the nine vim-2-producing p. aeruginosa isolates revealed a great degree of variability in pfge molecular typing, belonging to seven types. in contrast, the five vim-4-producing p. aeruginosa isolates displayed higher genetic similarity and fell into one major type with 85% homology, which also included the vim-1-producing isolate. there was no correlation between the results of serotyping and molecular typing. conclusions: mbl production in p. aeruginosa in greece seems to be mainly due to specific class i integrons harbouring either blavim-2 or blavim-4 genes. genetic variability was higher among bacteria carrying vim-2 beta-lactamase, a fact indicating wider intraclonar spread of the respective integron. j.m. rodriguez-martinez, l. poirel°, p. nordmann (k.-bicetre, fr) objectives: extended-spectrum beta-lactamases of ampc-type (esacs) contributing to reduced susceptibility to imipenem have been recently reported from enterobacteriaceae. the aim of the study was to evaluate the putative role of natural ampc-type beta-lactamases of p. aeruginosa in a similar resistance profile. methods: thirty-two non-repetitive p. aeruginosa clinical isolates recovered in our hospital in 2007 were included. they were selected on the basis of intermediate susceptibility or resistance to ceftazidime and intermediate susceptibility or resistance to imipenem. 
mics were determined by agar dilution and e-test techniques. the level of expression of the ampc beta-lactamases was evaluated by measuring specific activities. pcr, sequencing, and cloning were used to characterise the different bla(ampc) genes. identified esacs were purified and their km and kcat values for beta-lactams determined by spectrophotometry. results: using plates containing cloxacillin (an ampc beta-lactamase inhibitor), the susceptibility to ceftazidime was restored for 25 out of 32 isolates, suggesting overproduction of the ampc. in addition, in the presence of cloxacillin, reduced mic values were also observed with ceftazidime, cefepime and imipenem for 21 out of those 25 isolates. cloning and sequencing identified 10 distinct ampc β-lactamase variants among the 32 isolates. recombinant plasmids expressing the ampcs were transformed into a reference p. aeruginosa strain, and reduced susceptibility to cefepime and imipenem was observed only with recombinant p. aeruginosa strains expressing ampc beta-lactamases that had an arginine residue at position 105. the catalytic efficiencies (kcat/km) of the ampc variants possessing this arginine residue were increased against oxyiminocephalosporins and imipenem. in addition, in-vitro assays demonstrated that those ampc variants constituted a favourable background for selection of an additional degree of carbapenem resistance. conclusions: some ampcs of p. aeruginosa possessing extended activity toward carbapenems may contribute to carbapenem resistance. background: most oxa-type esbls are oxa-10, oxa-2 or oxa-1 derivatives. they display very low homology, between 20% and 30%. oxa-type esbls are divided into five groups according to their homology by frederic bert et al. 
group 1 includes oxa-5, oxa-7, oxa-10 and its derivatives; group 2 includes oxa-2, oxa-3, oxa-15 and oxa-20; group 3 includes oxa-1, oxa-4, oxa-30 and oxa-31; group 4 is named after oxa-9; group 5 only includes a single enzyme called lcr-1. oxa-type esbls have been reported worldwide since the first report in 1987, for example in turkey, france and england, but there are few reports from china. objective: to investigate the prevalence and genotype distribution of oxa-type extended-spectrum beta-lactamases (esbls) in clinical pseudomonas aeruginosa strains isolated from xiangya hospital of central south university in changsha city, hunan province, china. methods: ninety-seven non-repetitive clinical isolates of p. aeruginosa were collected between october 2006 and january 2007 from the hospital. they were screened for oxa-type esbl production by polymerase chain reaction (pcr) with five pairs of primers specific for blaoxa genes. the oxa-type esbl genes were then amplified by pcr with specific primers. the purified and amplified products were sequenced to confirm the genotype of the oxa-type esbls. results: the sequences of the three oxa-type esbl pcr products were compared with the genbank database, and no completely identical nucleotide or amino acid sequences were found. they represented two novel oxa-type esbls, named blaoxa-128 and blaoxa-129, which have been registered in the genbank database under accession numbers eu573214 and eu573215, respectively. conclusions: infections caused by p. aeruginosa producing oxa-type esbls have occurred in xiangya hospital of central south university. two novel oxa-type esbls in p. aeruginosa strains were discovered in our study and named blaoxa-128 and blaoxa-129, respectively.

pneumonia is one of the most common nosocomial infections and is associated with high mortality. 
in the last 15 years, gram-positive bacterial pathogens have risen in prevalence as a cause of hospital-acquired pneumonia (hap), including that occurring during mechanical ventilation (ventilator-associated pneumonia; vap). in particular, staphylococcus aureus is a major cause of hap, including vap. the rise of multidrug-resistant infections is a source of concern, with methicillin-resistant s. aureus (mrsa) accounting for >40% of s. aureus isolates in some european hospitals. this symposium will take the format of a question-and-answer roundtable session in which experts will answer questions and initiate discussion surrounding emerging concerns and appropriate therapeutic strategies in nosocomial pneumonia, including that caused by multidrug-resistant gram-positive pathogens. recently, shifts in the susceptibility of s. aureus to established therapeutic agents for nosocomial pneumonia have added to the challenge of selecting appropriate empiric therapy. in patients with suspected multidrug-resistant infections or those who are mechanically ventilated, prompt initiation of therapy, often before the pathogen has been confirmed, is critical. vancomycin is the gold-standard treatment for multidrug-resistant infections and resistance has been remarkably slow to emerge. however, clinical reports in europe of 'mic creep' and the emergence of vancomycin-intermediate s. aureus (visa), hvisa and linezolid-resistant mrsa have presented new clinical dilemmas. elevated vancomycin mics are linked to treatment failure and increased mortality. hence, while vancomycin remains a useful therapeutic tool, treatment decisions present an increasing challenge, especially in groups of patients in whom rapid eradication of infection with appropriate agents is critical. telavancin is a novel lipoglycopeptide under investigation for treatment of nosocomial pneumonia. a number of key features suggest telavancin as a potentially attractive option for nosocomial pneumonia. 
telavancin has a unique dual mechanism of action that disrupts both bacterial cell wall biosynthesis and cell membrane integrity. the agent is rapidly bactericidal against a broad range of clinically relevant gram-positive bacteria, including mrsa. two pivotal phase iii studies have demonstrated telavancin efficacy equivalent to vancomycin in hap, including vap, also in seriously ill patient subgroups and in pneumonia caused by mrsa.

hantaviruses are enveloped rna viruses, each carried primarily by rodents or insectivores of specific host species. they have coevolved with the hosts, in which they cause almost asymptomatic and persistent infections. in humans some hantaviruses cause disease: haemorrhagic fever with renal syndrome (hfrs) in eurasia. in europe puumala (puuv) from bank voles and saaremaa (saav) from field mice cause mild hfrs, and dobrava (dobv) from yellow-necked mice causes severe hfrs. in asia hfrs is caused mainly by hantaan and seoul viruses. in the americas some viruses cause hantavirus cardiopulmonary syndrome: sin nombre, andes and other viruses carried by sigmodontine rodents, not found in eurasia. in addition, in europe the common vole carries tula virus and rats carry seoul virus. however, they have not been definitely associated with disease in europe, although both can infect humans. we discuss the epidemiology, molecular genetics, detection of infection in carrier hosts and humans (including rt-pcr and 5-min serological tests), functions of hantaviral proteins, risk factors for humans to catch hantavirus infection (including smoking) and disease (including risk and protective hla haplotypes), role and mapping of epitopes of cytotoxic t-cells, mechanisms of hantavirus-induced apoptosis, newly discovered clinical features (including hypophyseal haemorrhages in puuv infection), and long-term consequences and pathogenesis of hfrs (endothelial permeability, thrombocytopenia, tnf-alpha and il-6). 
puuv occurs widely in europe except in the far north and mediterranean regions, saav in northern, eastern and central europe and dobv mainly in the balkans. the epidemiological patterns differ: in western and central europe hfrs epidemics follow mast years with increased oak and beech seed production promoting rodent breeding. in the north, hantavirus infections and hfrs epidemics occur in 3−4 year cycles, driven by prey-predator interactions. the infections and hfrs are on the increase in europe, partly because of better diagnostics and partly perhaps due to environmental changes. in several european countries hantavirus infections are notifiable and in some countries (e.g. belgium, finland, france, germany, scandinavian countries, slovenia) their epidemiology is relatively well studied. in large areas of europe, however, hantavirus infections and hfrs have not been studied systematically and they are still heavily under-diagnosed.

mrsa screening − will we ever agree?

s330 mrsa: universal screening!

the successful control of any outbreak or epidemic relies on detection of those harbouring the pathogen (infected and colonised persons) combined with eliminating spread to new individuals. the approach to containment and reduction of the global mrsa pandemic is now being discussed. a challenge for this infection is that most persons harbouring mrsa do not exhibit signs of disease and thus in order to detect all potential spreaders of this organism some surveillance must be done. the required level of detection (surveillance through screening) is not known and likely varies with the prevalence of colonisation and disease. for a given mrsa prevalence, the factor that seems most crucial in reducing spread is the percentage of potential isolation days captured. 
the operational processes that highly influence this are 1) the sensitivity of screening detection (including sites tested and laboratory methods used), 2) the speed at which results of newly detected positive patients are reported from the laboratory (assuming pre-emptive isolation is not employed), and 3) the selection of patient populations who are to undergo screening. laboratory testing has a major impact on detecting mrsa-colonised patients, with real-time pcr having a sensitivity of 98% and a possible 2 hour reporting time compared to direct chromogenic agar cultures with a sensitivity of 80% and >24 hour reporting and enriched chromogenic agar testing with a sensitivity of 90% and >48 hour reporting (am j clin pathol, 2009); both reduced sensitivity and prolonged reporting time negatively affect the success of timely mrsa isolation. we have shown that capturing 33% of mrsa isolation days in a modest mrsa prevalence setting (9 infections/10,000 patient days) with a high sensitivity test having a >24 hour result reporting time did not reduce hospital-wide mrsa disease (ann int med 148:209, 2008). others have demonstrated that surveillance in an icu with similar mrsa prevalence, again with a high sensitivity test having 1 day result reporting, did not reduce icu disease until pre-emptive isolation was initiated (crit care 10: r25, 2006). finally, we demonstrated that universal admission surveillance and decolonisation capturing 85% of possible mrsa isolation days had a dramatic impact, reducing all in-hospital infections from mrsa by 70%. future research in this area should focus on better defining those patients that benefit from mrsa screening and the role of decolonisation in these programs.

clostridium difficile infection (cdi) is a toxin-mediated intestinal disease and extraintestinal manifestations are exceptional. 
clinical outcomes can range from asymptomatic colonisation to mild diarrhoea and more severe disease characterised by inflammatory lesions and pseudomembranes in the colon, toxic megacolon or bowel perforation, sepsis, shock, and death. the main clinical symptoms, secretory diarrhoea and inflammation of colonic mucosa, can be in great part explained by the actions of two large protein toxins, toxin a (tcda) and toxin b (tcdb). both toxins are cytotoxic, destroy the intestinal epithelium and decrease colonic barrier function by disruption of the actin cytoskeleton and tight junctions, resulting in a decreased transepithelial resistance that allows fluid accumulation. in addition, c. difficile toxins also cause release of various inflammatory mediators which affect enteric nerves and sensory neurons and promote inflammatory cells, adding to the fluid secretion, inflammation and transmigration of neutrophils. some experimental evidence also points to possible extraintestinal action of c. difficile toxin b. in zebrafish embryos tcdb caused damage and oedema in cardiac tissue and in hamsters the same toxin caused lung damage. only recently have efficient systems been developed to genetically manipulate c. difficile. comparisons of knock-out mutants producing only one of the two toxins have shown that tcdb-positive-only mutants retain the ability to kill hamsters, whereas tcda-positive-only mutants were not virulent for hamsters. these results are in concordance with epidemiological findings that naturally occurring a-b+ strains still cause the entire spectrum of cdi, but are not in concordance with effects observed after intragastric challenge of hamsters with purified toxins tcda and tcdb. the role of the third toxin produced by c. difficile, binary toxin cdt, in the development of human disease is not well understood. 
cdt was shown to have an enterotoxic effect in the rabbit ileal loop assay, but natural strains producing cdt and neither tcda nor tcdb colonised animals without being lethal in hamsters. comparative genomic analysis will most likely reveal additional factors involved in pathogenesis and in increased virulence (including cell surface layer proteins, sporulation characteristics and antibiotic resistance). additionally, the role of the host immune response in cdi has just started to be better understood.

since 2002, there has been an escalation in rates of clostridium difficile infection (cdi) with epidemic c. difficile (pcr ribotype 027/north american pulsed-field type 1 [nap1]) responsible for outbreaks of severe infection in north america and europe. while fluoroquinolone resistance and over-use are thought to be driving the epidemic, the ageing population and improved case ascertainment are contributing to the dramatic increase in cases. other factors may also be important, such as the increase in prescription of proton pump inhibitors. in the netherlands, since 2005, there has been an increase in prevalence of human cdi with ribotype 078 strains usually found in animals. these infections were in a younger population and more frequently community acquired. there was alarm when it was reported that 20% of retail beef samples in canada contained c. difficile. the figure is higher in the usa where more than 40% of packaged meats (beef, pork and turkey) from 3 arizona stores contained c. difficile. most animal isolates of c. difficile produce binary toxin, and both pigs and cattle harbour pcr ribotype 078, a strain that, like ribotype 027, also produces more toxins a and b, and binary toxin. in the eastern part of the netherlands where >90% of pig farms are located, >20% of human isolates are now ribotype 078, and human and pig strains of c. difficile are highly genetically related. 
it has been suggested that the overlap between the location of pig farms in the netherlands and the occurrence of human ribotype 078 infections involves a common source. that source is likely to be the environment. the upsurge in cdi has prompted diagnostic companies to try to either improve current tests or develop new ones. laboratory diagnostic methods can be divided into 3 groups: traditional faecal cytotoxin detection (with or without culture), enzyme immunoassays (eias) and molecular methods. faecal cytotoxin detection is specific but lacks sensitivity; culture is sensitive but lacks specificity. new eias should find a niche in medium-sized laboratories. current in-house pcr methods have the potential for great sensitivity and specificity but have been available only in larger laboratories. new commercially available platforms will make this methodology more accessible to smaller laboratories. whatever method is chosen, it is necessary for the laboratory to have as fast a turn-around time as possible, particularly in an outbreak situation.

d. lévy-bruhl° (saint-maurice, fr) in 2005, the advisory board on immunisation (abi) was asked to make recommendations to the ministry of health on whether to include the first hpv vaccine, soon to be licensed, in the french immunisation schedule. 
the main elements considered in the establishment of the benefit-risk balance of routine hpv vaccination were:

on the benefit side:
- the very significant potentially preventable burden of diseases;
- the very high efficacy of the vaccine against persistent hpv 16/18 infections in naive subjects;
- the expected additional impact on other hpv 16/18 related lesions and cancers;
- the fact that vaccination, by preventing the pre-cancerous lesions, has the advantage over screening of reducing the cost and anxiety related to their detection and management;
- the available data in favour of a satisfactory safety profile;
- the benefit of vaccination for the women not covered by the opportunistic screening program.

on the "risk" side:
- the high cost of vaccination;
- the unknown duration of protection;
- the need for continuation of screening, even for vaccinated women;
- the fact that the majority of residual cervical cancers could be prevented by the organisation of the screening program;
- the risk of a decrease in compliance to screening for vaccinated women;
- the low benefit if vaccinated and screened women were the same.

a cost-effectiveness analysis, carried out on a multi-cohort markov model, showed that, over a 70-year period, the impact of vaccinating 80% of 14-year-old girls or of organising the screening were comparable (reduction of cancer deaths close to 20%). however, the cost-effectiveness ratio of the vaccination was higher than that of the screening organisation, 45,200 and 22,700 € per life year saved, respectively (at a 3% discount rate). on the basis of the economic analysis, the screening organisation was therefore the first priority. however, if both interventions were implemented, the overall reduction in cervical cancer deaths was estimated at 32%. the cost-effectiveness of the addition of vaccination on top of the organisation of the screening appeared acceptable (55,000 € per life year saved). 
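the ratios quoted above are costs per life year saved, with a 3% discount rate. purely as an illustration, the following python sketch shows one common way such a ratio can be computed from yearly streams of extra costs and life years gained. the function names and every input number are hypothetical; they are not taken from the markov model used by the abi, whose details are not given here.

```python
# illustrative sketch only: all names and numbers below are hypothetical.

def discounted_total(values_by_year, rate=0.03):
    """Present value of a yearly stream; year 0 is left undiscounted."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(values_by_year))


def cost_per_life_year_saved(extra_costs, life_years_gained, rate=0.03):
    """Discounted extra cost divided by discounted life years gained.

    Both arguments are yearly streams measured against the same comparator
    (for example, vaccination plus screening versus screening alone).
    """
    cost = discounted_total(extra_costs, rate)
    gain = discounted_total(life_years_gained, rate)
    if gain <= 0:
        raise ValueError("no life years gained: the ratio is undefined")
    return cost / gain


if __name__ == "__main__":
    # hypothetical programme: costs fall in the first three years,
    # life years are gained much later (years 30 to 39)
    extra_costs = [1_000_000.0] * 3 + [0.0] * 37
    life_years = [0.0] * 30 + [20.0] * 10
    ratio = cost_per_life_year_saved(extra_costs, life_years)
    print(f"cost per life year saved: {ratio:,.0f}")
```

because the health gains of vaccination arrive decades after the costs, the discount rate weighs heavily on the ratio, which is why the rate is stated alongside the figures.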
based on those results, the abi issued in march 2007 a recommendation to include the hpv vaccination in the immunisation schedule for 14-year-old girls, together with a catch-up for women aged 15 to 23 years who had not started their sexual life more than one year earlier. the vaccine cost has been reimbursed since july 2007.

clinical microbiology − is outsourcing the way to go?

s338 the (r)evolution of clinical microbiology in europe − is it good or bad?

laboratory medicine in general and clinical microbiology in particular is presently subject to rapid (r)evolution. are we aware? are we in command? do we know where we are going? should we oppose or cooperate? do we have a choice? do we recognise a driving force other than money? is it good, bad or just plain necessary? and are we gaining or losing? it is not one evolutionary process; it is several parallel processes with varying emphasis in different areas. there are at least four distinctive major trends over the last 15 years: the gradual formation of bigger and bigger units (concentration), the amalgamation of many different laboratory services into one (laboratory medicine), accreditation and an explosion of professional proficiencies and backgrounds of staff in microbiological laboratories. personally i have withstood the first two and with pleasure succumbed to the latter two. a recent 5th trend, outsourcing microbiology services to large private consortiums, is splitting clinical microbiology into a purely analytical high-throughput money-saving activity, often leaving the consultative, clinical part of microbiology and health care infection control adrift. what is driving the evolution? not only cost-saving but also our inability to recruit medically trained microbiologists, the need to broaden the knowledge base of microbiology laboratories, automation, the development of new techniques and apparatus common to many laboratory disciplines, computerised medicine, political trendiness, power struggles, and much more. 
there is much to be gained by both concentration and amalgamation but much to be lost as well, and many consider the heart and soul of clinical microbiology at risk. over a period of years, rational high-throughput production has won over consultation and personalised microbiology. that may be fine for the production of negative hiv-antibody/antigen analysis as for the screening of blood donors, but certainly not for the bacteriological cultures taken in conjunction with a hip replacement, nor when it comes to understanding and advising on the intricacies of antimicrobial resistance development. in other cases "outsourcing" and/or "amalgamation" mean that blood cultures are sent to x-town, cmv-antibodies to y-town and everything else to z-ville. when that happens clinical microbiology is lost. there are several instances where concentration, amalgamation and/or outsourcing of clinical microbiological services, alone or with other services, have meant that the tie between clinical microbiology and infection control has been severed and that many hospitals, both small and large, have lost the personalised service so necessary to control outbreaks of multi-resistant bacteria and other health care related infections. a good service requires a strong, knowledgeable and enthusiastic champion. a service which encompasses too many branches of laboratory medicine cannot be expected to champion each and every one with equal strength and fervour. and when outsourced to "big companies", there is no "clinical", only "microbiology". in 2008 "medical microbiology" broke out from "laboratory medicine" in uems. we are now striving towards a strong "medical microbiology" service in europe. it will have many facets, much strength, some weakness, great opportunities, but many threats. escmid certainly intends to help shape microbiology in europe. 
the optimal organisation of microbiology laboratories in european metropolises is an evolutionary task, driven by the evolution in laboratory tasks, laboratory technologies, communication technologies, regulations and financial issues. in the past five to ten years, medical and societal demand for a more rapid and refined detection and identification of pathogens and antimicrobial resistance determinants coincided with the expansion of internet-based and remote tools for communication, an unprecedented revolution in laboratory technologies and new financial constraints. the concentration of laboratory workforces into a single laboratory is one way to address these apparently contradictory issues. the tertiary medical school hospital system in marseille, a 2-million metropolitan area in france, comprises four hospitals for a total of 3,500 beds. the system once had four microbiology laboratories, which have been progressively merged into a single laboratory performing 600,000 acts per year and dealing with bacteriology, virology and environmental microbiology and hygiene. the medical staff comprises 17, the engineering staff 11, the technical staff 88 and the support staff 13 persons, for a total of 129 persons. this organisation made it possible to reduce labour time for routine microbiology, to develop prospective and sophisticated time-consuming diagnostic methods and to develop advanced diagnostic methods such as molecular methods (real-time pcr-based tests, sequencing, and mass spectrometry identification) and new generation serology. new, sophisticated technologies such as automated serology and mass spectrometry were corner-stones on which to base the constant diminution of routine labour time and the development of time-consuming tasks such as the isolation of fastidious organisms. these evolutions paralleled the exponential increase in the ratio of engineers in the laboratory. 
this paradigm allowed for the constitution of large collections of biological specimens for retrospective analyses, the specialisation of every medical senior in one particular field of internationally recognised expertise and the increase in knowledge output in terms of peer-reviewed papers, patents and grants. implementation of point-of-care testing in the emergency department, in permanent internet-based connection with the central laboratory, was the last, but not least, evolution of this system.

when tuberculosis epidemiology is seen in a global perspective, and the millennium development goals are considered, it is clear that two regions of the world, africa and europe, are severely behind in the control of the disease. in africa, especially sub-saharan africa, the tb problem is closely related to the endemic hiv/aids situation. in europe, especially the eastern part and in parts of the former soviet union, the main obstacle to effective tb control is related to drug-resistant forms of m. tuberculosis. the prevalence of the most severe forms of resistance, mdr- and xdr-tb, is so high that it makes control efforts both extremely complicated and very expensive. unfortunately, increasing levels of drug-resistant tb are today also seen in many african countries, and hiv infection is spreading in eastern europe. during the last ten-year period new tools, based on molecular fingerprinting of m. tuberculosis strains, have been increasingly adapted to study tb transmission. with such molecular methods to characterise clinical isolates of m. tuberculosis it is now possible to study the spread of individual strains of the bacteria in detail. the laboratory tools used, rflp, miru/vntr, spoligotyping and others, will be presented and their use exemplified. how molecular epidemiology contributed to the detection and characterisation of a major outbreak of drug-resistant tb in the stockholm area will be discussed. 
molecular characterisation of clinical isolates from different parts of the world has led to an increased recognition of the differences between different families of m. tuberculosis strains. to further describe and understand the role of these differences in the clinical field as well as for tb epidemiology is an ongoing and interesting field of research. an increased understanding of how tb is transmitted will hopefully help in the efforts to control this global health threat both on the local level and in a global perspective.

living in the era of increasing tuberculosis drug resistance, the importance of making an early and accurate diagnosis with drug sensitivities has never been greater. the epidemiology of tuberculosis defines the extent of latent disease and the proportion which becomes active. accurate diagnosis is vital if patients are to be treated in a timely manner and to reduce the amount of time infectious individuals go untreated in the community disseminating disease. in many areas of the world, dots programmes are at the forefront of tuberculosis control. however, as a diagnostic this currently relies on sputum smear microscopy, which is known to miss 50% of cases of tuberculosis and provides no data on drug sensitivity. the second major issue around tb is the lack of worldwide diagnostic facilities. there is a need for a simple, low-cost, easily implemented diagnostic test. this talk will briefly consider the issues around the diagnosis of latent and active disease, which are quite distinct. the focus will be on the diagnosis of active infection. in particular, the use of the mods (microscopic observation drug-susceptibility) assay in diagnosis of tuberculosis will be discussed. the potential for using this in resource-poor countries will be reviewed as well as the way sophisticated technology may be harnessed to improve reporting and allow translation to all parts of the world. 
the important issue of how to distinguish patients with latent and active disease will also be considered. key issues and principles in diagnosis both now and in the future will be reviewed. in terms of treatment, there are 2 main issues. the first is that even short-course therapy is prolonged, lasting a minimum of 6 months, leading to issues of compliance. this may result in drug resistance. the massive rise of multi-drug resistant tuberculosis to approximately 500,000 cases world-wide, with around 50 countries reporting extensively drug-resistant disease, means that the need for new approaches to therapy is urgent. the second part of this talk will review different approaches to using current anti-mycobacterial drugs, the emergence of a small number of new drugs such as the diarylquinolines and entirely novel approaches to control and treat tuberculosis.

there has been great success and also many threats in the field of infectious diseases during the previous year. antimicrobial resistance, especially increasing carbapenem resistance among aerobic gram-negative rods and xdr mycobacterium tuberculosis strains, is already a big threat in some countries and will probably spread to many other areas all over the world in the future. we will need new drugs for these indications, but unfortunately very few promising new drugs seem to be in the pipeline at the moment. the virulent clostridium difficile 027 strain is spreading rapidly to many new countries; in finland, for example, it killed many times more people than mrsa and esbl strains in 2008. however, it is possible to stop its spread, but this needs new thinking in antibiotic use policy and infection control policy in hospitals. clostridium difficile 027 infection has a high relapse rate after metronidazole or vancomycin therapy, but an experimental "stool exchange treatment" is a promising therapy although controlled studies are needed to prove this assumption. 
an interesting research area during the last years has been the role of infections in the etiopathogenesis of chronic diseases like cancer, atherosclerosis, cardiovascular diseases and many autoimmune diseases. we can fight against many cancers like liver cancer and cervix cancer with virus vaccines and gastric cancer with antimicrobial drugs. also the high incidence of malignant tumours seems to decrease during haart treatment in hiv patients. the role of infections in the etiopathogenesis of cardiovascular diseases and atherosclerosis is complex. it is obvious that infections play a role in the etiopathogenesis of atherosclerosis, stroke and myocardial infarction. undirected routine antimicrobial treatment is not recommended for these patients, but there seem to be subgroups of patients with various cardiovascular diseases who may benefit from antimicrobial treatment. recent studies seem to suggest that there are hla types which protect against or make people susceptible to coronary heart disease. the hla type hla-b*35 seems to be a risk factor for coronary heart disease but it is also a risk factor for chronic chlamydia pneumoniae infection. the feared pandemic due to h5n1 influenza a did not appear during the past year, and the world is now much better prepared to meet the next pandemic, which hopefully will not come during the next year.

ø. samuelsen°, c. giske, u. naseer, s. tofteland, d.h. skutlaberg, a. onken, r. hjetland, a. sundsfjord (tromsø, no; stockholm, se; kristiansand, bergen, oslo, førde, no) objectives: the worldwide dissemination of kpc-producing multidrug-resistant enterobacteriaceae is worrisome. the first kpc-producing klebsiella pneumoniae in norway was isolated in late 2007 from a patient after hospitalisation in greece. throughout the following year seven additional kpc-producing k. pneumoniae isolates have been detected in clinical samples from six new patients. 
the aim of this study was to perform molecular characterisation of the strains and examine their epidemiological relatedness. materials and methods: antimicrobial susceptibility was examined by etest. molecular characterisation was performed by mlst, pfge and sequencing of the blakpc genetic structure. plasmid analysis was carried out by pfge of s1 nuclease-digested total dna and southern blot hybridisation using a blakpc probe. relevant epidemiological data were collected retrospectively. results: eight kpc-producing clinical isolates of k. pneumoniae have been identified from seven patients in two different regions of norway from the following specimens: blood culture (n = 3), urine (n = 2), expectorate (n = 1), perineal swab (n = 1) and wound secretion (n = 1). two blood culture isolates with clonally related but different pfge profiles were observed in one patient. the detection of kpc-producing k. pneumoniae isolates in norwegian patients was associated with import in four cases after hospitalisation in greece. two patients had been hospitalised at the same hospital in greece. isolation of a kpc-producing isolate in a fifth patient was epidemiologically linked to one of these imported cases and was a case of nosocomial transmission in norway. for the remaining two cases no risk factors were identified with respect to recent hospitalisation or travel abroad. molecular analysis of six isolates has shown genetically related pfge patterns and a common sequence type (st258). st258 has been associated with dissemination of ctx-m-15 in hungary. the blakpc gene was localised in tn4401 on a ~97 kb plasmid. the two most recent isolates are currently undergoing similar analysis. conclusion: the first seven cases of kpc-producing k. pneumoniae in norway are associated with hospitalisation abroad, nosocomial transmission in norway, or urinary tract infections in outpatients without obvious risk factors. 
the clonal relationship between isolates underlines the existence of a biologically fit genetic lineage of kpc-producing k. pneumoniae with an epidemic potential.

objectives: two recent publications have reported the isolation of kpc-producing k. pneumoniae from infections in two patients, one in france and one in sweden, who originally had been hospitalised in greece. since this resistance mechanism had not been identified before in this country, the purpose of this report was to confirm the presence of kpc-producing k. pneumoniae in greece, to assess the extent of its spread and to study the genetic relatedness of the respective bacterial strains and the transferability of the blakpc-harbouring plasmids. methods: for a three-month period (february to april 2008) 40 hospitals participating in the greek system for surveillance of antibiotic resistance (www.mednet.gr/whonet) were asked to look for possible kpc producers among k. pneumoniae isolates displaying reduced susceptibility to imipenem (equal to or higher than 1 mg/l), a positive hodge test for the presence of carbapenemase and a negative edta synergy test for the presence of metalloenzymes. the presence of the blakpc gene in these strains was confirmed by pcr and sequencing. mics of carbapenems were determined by etest. conjugation experiments were carried out both in broth and on agar. the possible absence of ompk36 porin was detected by pcr. molecular typing was performed by pulsed-field gel electrophoresis of xbai-restricted genomic dna. results: ninety-two k. pneumoniae clinical isolates (one per patient) from 13 hospitals all over greece were found to harbour the blakpc-2 gene. although colonies present in the inhibition zone made the exact determination of imipenem mic difficult, the absence of ompk36 porin was always associated with an imipenem mic higher than 32 mg/l. all isolates exhibited resistance to all other drug classes except colistin, tetracycline and tigecycline. 
pfge analysis revealed that 85 isolates from 12 hospitals displayed more than 95% similarity and were classified into one pulsotype, whereas the remaining seven isolates belonged to four different pulsotypes. the blakpc-2 gene could not be transferred by conjugation from strains belonging to the main pulsotype. however, it was transferred from strains belonging to three out of the four remaining pulsotypes. conclusion: production of kpc-2 beta-lactamase seems to be a new emerging resistance mechanism in klebsiella pneumoniae in greece. the possible clonal spread of the blakpc-2 gene makes the implementation of infection control practices in the affected hospitals an urgent need. i. galani, m. souli, e. papadomichelakis, f. panayea, n. mitchell, a. antoniadou, g. poulakou, f. kontopidou, h. giamarellou° (athens, gr) background: until now, carbapenem resistance among klebsiella pneumoniae (kp) clinical isolates in greek hospitals has been attributed to the dissemination of vim-1 metallo-beta-lactamase. we describe the first outbreak of kpc-producing kp in greece; the first to occur outside the usa or israel. setting: 21-bed icu of attikon university hospital, athens. methods: kp isolates with an imipenem mic > 1 mg/l and a negative edta-imipenem disk synergy test were subjected to a boronic acid disk test, pcr for a kpc gene with specific primers, and sequencing. records from patients colonised or infected with a kpc-producing kp were retrospectively reviewed for clinical and epidemiological data. environmental cultures for kpc-producers were performed. clinical isolates were submitted to molecular typing using pfge. results: from february to november 2008, 552 kp were isolated from 95 patients, 132 (23.9%) of which were boronic acid positive and produced kpc-2. most of them (126/132, 95.5%) were isolated from august onwards. a total of 24 patients were identified as colonised or infected by a kpc producer, which in 22 of them belonged to the same genetic clone.
the source was faeces (73), bronchial secretions (26), blood (7), cvc tip (5), urine (15), pus (4) and throat (2). among patients whose medical records were available, median age was 74, apache ii score 21, length of preceding hospital stay 28 days and total length of stay 50 days; immunosuppression was identified in one patient and crude mortality was 71%. the kpc-producing kp was more frequently icu acquired, whereas in a minority of patients it was already present on icu admission. seventy percent of the patients had previously received a carbapenem for a median of 15 days. environmental colonisation was not identified. ten (7.6%) of the kpc-producers from 8 (33.3%) patients were identified as the cause of an infection: bacteraemia (7), ventilator-associated pneumonia (2) and surgical site infection (1). these isolates exhibited the following mic90 values (mg/l): imipenem, >8; meropenem, >8; gentamicin, 4; ciprofloxacin, >2; fosfomycin, >128; colistin, 0.5 and tigecycline, 4. most patients were successfully treated with a colistin-containing combination, mostly with a beta-lactam. there was no attributable mortality. isolates from the same bacterial species were typed by pfge or automated ribotyping. the kpc-encoding gene was fully sequenced. plasmid preparations and i-ceu digestion of total dna were resolved in agarose gels, blotted and hybridised with a blakpc probe. the blakpc-carrying element (tn4401) was amplified with various primer pairs, digested with eag i and sequenced. results: 30 strains each carried kpc-2 and kpc-3. one e. cloacae carried kpc-4. 13 k. oxytoca were kpc-2-producers and 2 s. marcescens harboured blakpc-3, all from usa. great genetic diversity was observed among the isolates (41 different types). one clone of 10 e. cloacae was detected in new york state (2006-2007). small clusters of 2 and 3 strains were detected among e. coli, e. cloacae and k. oxytoca. plasmids were present in all but 3 isolates. persistence of clones throughout the years was not observed.
in 35 isolates the kpc-encoding gene was located in high molecular weight plasmids (>54 kb). blakpc was located in the chromosome of 11 strains (e. cloacae, e. coli and k. oxytoca) and the location of this gene could not be determined in 15 strains. small plasmids were present in several strains, but did not harbour blakpc. tn4401 carried blakpc in 46 isolates, and the transposon element was conserved. this structure was not detected in 12 strains. conclusions: kpc-encoding genes were most often located in tn4401 among several enterobacteriaceae species collected in usa and israel. this blakpc-carrying element was located in plasmids and on the chromosome. this study highlights the importance of tn4401 in the dissemination of blakpc genes in several genetically diverse bacterial species. blakpc was not associated with tn4401 in only 12 of 61 strains. these strains are under further investigation. objective: to evaluate the carbapenem resistance mechanism in a raoultella planticola bacteraemia isolate recovered from a patient hospitalised in ohio, usa. methods: species identification was performed by vitek 2 and confirmed by 16s rrna sequencing. susceptibility testing used clsi broth microdilution method. blakpc was amplified and sequenced. the blakpc genetic element (tn4401) was amplified and sequenced. plasmid extractions and conjugation experiments were carried out and the isolate was screened for esbl-encoding genes, qnr and qepa. a 83 year old female patient was admitted to a hospital located in akron with a diagnosis of cap in may/2008. sputum, paracentesis and blood cultures were negative. urine culture grew e. coli and patient received courses of moxifloxacin, ceftriaxone, azithromycin and meropenem. the patient was discharged and returned after three weeks with respiratory problems. tracheal aspirate grew a multidrug resistant a. baumannii and the blood culture grew the enteric-like gramnegative bacillus. the isolate was identified as r. 
planticola by the vitek 2, which was confirmed by 16s sequencing. the r. planticola strain demonstrated resistance against most beta-lactams, including carbapenems. screening for kpc-encoding genes was positive and this strain carried blakpc-2. fluoroquinolone and aminoglycoside mic values were elevated. the kpc-2-encoding gene was located in tn4401, but conjugation experiments failed. esbl and qnr/qepa genes were not detected. conclusions: kpc serine-carbapenemases have been detected in several gram-negative species commonly isolated from clinical specimens. kpc genes are embedded in a transposon-like structure usually harboured in conjugative plasmids carrying multiple antimicrobial resistance mechanisms. this is the first report of kpc-producing r. planticola, an environmental organism related to klebsiella spp. the similarity between these organisms could facilitate the transfer of genetic material. kpc-producing isolates appear to be prevalent among different enterobacteriaceae species in usa hospitals and were detected in an isolate of apparent environmental origin. objectives: it has long been known that not all individuals with a specific disease present with the same clinical manifestations, nor do they have identical prognoses or responses to treatments. it has become clear that variations in the human genome are likely to have an impact on these aspects. tank-binding kinase 1 (tbk1) is a central molecule in the induction of, among others, the type i interferon response to pathogens. our goals for this study were 1) to investigate the frequency of single nucleotide polymorphisms (snps) in the promoter and coding region of tbk1 in a dutch caucasian population and 2) to search for potential associations between these snps and bloodstream infections. methods: whole blood samples or samples of positive blood cultures were collected and after genomic dna was isolated, pcr and sequencing were performed for snp identification.
functional studies included promoter activity measurements using a luciferase assay as well as electrophoretic mobility shift assays (emsa) to study binding of the transcription factor usf1 to the wt and mutant promoter. snp incidences were studied in a case control study. results: in samples from dutch caucasian healthy volunteers, 4 snps were found with allele frequencies higher than 5% whereas 6 other known snps had frequencies lower than 5% in our cohort. two snps (rs89208169 and rs89208163) located in the promoter region were studied in a larger cohort of 350 anonymised patients from the maastricht university medical center with either gram-positive or gram-negative blood cultures. we found that the prevalence of rs89208169 was significantly increased in patients with positive blood cultures in comparison with those with negative blood cultures or healthy volunteers. further investigation of this snp showed that it is located just outside a usf1-binding site. measuring the promoter activity using luciferase assays, the mutant promoter exhibited a decreased activity of <35%. this observation was confirmed by emsa which showed that recombinant usf1 protein had a reduced binding affinity to the mutant promoter. conclusions: snp rs89208169 in the promoter region of tbk1 has a significant association with gram-positive infections. our results demonstrate that this is likely due to a decreased expression of tbk1 due to reduced binding of the transcription factor usf1 to the mutant promoter. our results support recent findings that tbk1 plays also an important role in the host response to gram-positive infections. objective: lymphocyte apoptosis has been recognized as an important factor contributing to both the onset of sepsis post infection and to the progression into septic shock. 
animal data suggest that prevention of lymphocyte apoptosis by caspase inhibition stabilises the immune system, improves bacterial clearance and decreases mortality in experimental sepsis. the present study evaluated the potential of vx-166, a novel broad caspase inhibitor, as a therapy for sepsis. methods and results: initial characterisation of vx-166 in a number of enzymatic and cellular assays clearly demonstrated that the compound is a broad caspase inhibitor with potent anti-apoptotic activity in vitro. in vivo, vx-166 was tested in a murine model of endotoxic shock and a clinically relevant model of peritonitis. in the endotoxic shock model, male cd-1 mice (n = 28 per group) were administered lps (20 mg/kg iv) and survival was monitored for 96 h. vx-166 administered by repeat iv bolus (0, 4, 8 and 12 h post-lps) significantly improved survival in a dose-dependent fashion (p < 0.0001). in the rat peritonitis model, adult male sprague-dawley rats (n = 12 per group) underwent caecal ligation and puncture (clp) and survival was monitored over 10d. continuous administration of vx-166 by mini-osmotic pump (0.9 mg/kg/h) immediately following surgery significantly improved survival (p < 0.01) from 38% in the control group to 88% in the compound-treated group. mode of action studies in the rat clp model confirmed that vx-166 reduced thymic atrophy and lymphocyte apoptosis (p < 0.01), supporting the anti-apoptotic activity of the compound in vivo. in addition, vx-166 reduced plasma endotoxin levels (p < 0.05), strongly suggesting an improved clearance of bacteria from the bloodstream. most importantly, we demonstrated that vx-166 fully retained its efficacy when dosed 3 hours after insult (p < 0.01) by improving survival to 92% versus 42% in control animals, further highlighting the potential of anti-apoptotic therapy in sepsis. 
overall these data demonstrate that vx-166 inhibits lymphocyte apoptosis, improves the clearance of bacterial endotoxin and improves survival in experimental sepsis. importantly vx-166 improves survival in the clp model when dosed post insult, and therefore represents significant progress in the development of therapeutically viable broad caspase inhibitors for the treatment of this disease. v. vankerckhoven°, s. van voorden, n. hens, h. goossens, g. molenberghs, e. wiertz (wilrijk, be; leiden, nl; hasselt, be) objectives: toll-like receptors function as key regulators of both innate and adaptive immunity. lactobacilli modulate the immune system in different ways. the aim of this study was to examine toll-like receptor (tlr2, tlr2/6 and tlr4) signalling induced by clinical and probiotic lactobacillus strains. methods: a total of 45 lactobacillus strains (19 l. paracasei and 26 l. rhamnosus) of different origin (22 probiotic, 2 faecal, and 21 clinical) were tested for tlr2, tlr2 in combination with tlr6, and tlr4. tlr signalling was measured as relative il-8 promotor activation in transfected human embryonic kidney (hek) 293 cells. il-8 concentrations were measured using an enzyme-linked immunosorbent assay. heat-killed listeria monocytogenes (hklm) was used as positive control in all assays, whereas pam3, pam2, and lps were used as positive controls for, respectively, tlr2, tlr2/6, and tlr4. all assays were performed at least in duplicate. linear mixed model analyses and stepwise model selection were used to identify the statistically significant effects. random effects were used to account for heterogeneity across and homogeneity within isolates. p < 0.05 was considered statistically significant. results: hek-tlr2 and hek-tlr2/6, but not hek-tlr4, cells released il-8 upon stimulation with uv-inactivated lactobacilli, which was enhanced by co-transfection with cd-14. 
interestingly, the production of il-8 was shown to be variable for the different lactobacillus isolates. although similar results were seen for all isolates for tlr2 and tlr2/6, il-8 production was significantly higher for tlr2 (8.4 log pg/ml) compared to tlr2/6 (6.05 log pg/ml) (p < 0.0001). no significant differences in il-8 production were seen between clinical and probiotic isolates. however, l. rhamnosus isolates induced a significantly higher il-8 production compared to l. paracasei isolates in both cell lines, 7.88 and 6.84 log pg/ml, respectively (p = 0.0004). intra-isolate correlation was found significant (p < 0.0022). conclusions: our study shows that lactobacilli activate both tlr2 and tlr2 in combination with tlr6. our results also indicate that heterodimerisation of tlr2 with tlr6 does not lead to an improved recognition of lactobacilli. furthermore, taking intra-isolate correlation into consideration proved to be important. finally, our results suggest that differences in immunomodulation by lactobacilli may be related to differential signalling through tlrs, including tlr2 and tlr2/6. m.c. gagliardi, v. sargentini, r. teloni, m.e. remoli, g. federico, m. videtta, g. de libero, e. coccia, r. nisini°(rome, it; basel, ch) objective: to gain insights into the mechanisms used by mycobacterium tuberculosis and bacillus calmette guérin to cause human monocytes differentiation into cd1 negative dendritic cells (my-modc), unable to present lipid antigens to specific t cells. methods: human monocytes infected or not with mycobacteria were induced to differentiate into dc with gm-csf and il-4 in the presence or absence of p38 or erk specific inhibitors. kinases activation was detected by western blot using antibodies specific for phosphorylated and non phosphorylated isoforms. differentiation of monocytes into dc and the cd1a, cd1b and cd1c expression was evaluated by flow cytometry and by real time pcr at different time points from infection. 
functional expression of cd1 molecules was assessed by recognition of lipid antigens by cd1 restricted t cell clones. results: we show that mycobacteria trigger phosphorylation of erk and p38 mitogen-activated protein kinase in human monocytes as well as of activating transcription factor (atf)-2. mycobacteria-infected monocytes treated with a specific p38 inhibitor, but not with a specific erk inhibitor become insensitive to mycobacterial subversion and differentiate into cd1 positive my-modc, which are fully capable of presenting lipid antigens. data indicate that phosphorylation of p38 is directly involved in cd1 inhibition. conclusions: we propose p38 signaling as a pathway exploited by mycobacteria to affect cd1 expression, thus representing a novel target of possible pharmacological intervention in the treatment of mycobacterial infections. s. ebert°, s. ribes, r. nau, u. michel (gottingen, de) objective: activin a (act a) is a multifunctional cytokine with roles in the immune system and the inflammatory response. act a levels are elevated in the cerebrospinal fluid of patients with meningitis. microglial cells, the major constituents of innate immunity within the brain, express toll-like receptors (tlrs) recognising exogenous and endogenous ligands. upon stimulation with tlr agonists, primary mouse microglial cells become activated and release nitric oxide (no), cytokines, and also act a, suggesting that they are a source of elevated conclusions: pre-treatment with act a enhances no release from microglial cells activated by agonists of the principal tlrs involved in the recognition of bacteria. these findings provide further evidence for a role of act a in the innate immune response and suggest that act a acts as an pro-inflammatory modulator during infection and inflammatory processes in the cns. 
insertion sequences (is) are genetic tools that can mediate expression of previously silent genes or be responsible for the overexpression of certain genes (in each case by providing promoter sequences). in addition to affecting gene transcription levels, is elements also play a very important role in gene acquisition/mobilisation. an is is usually made of two inverted-repeat sequences (irs) bracketing a gene encoding the transposase, whose activity enables this entity to replicate and target another sequence. the is-related mechanisms at the origin of antibiotic resistance gene acquisition are diverse, including composite transposition, rolling-circle transposition and one-ended transposition. is elements may also be involved in gene acquisition by mediating co-integration processes or recombination events, as hypothesised for is26 in relation to blashv extended-spectrum beta-lactamase (esbl) genes originating from the chromosome of klebsiella pneumoniae. the blactx-m esbl genes, known to be extremely widespread worldwide, are encoded on plasmids and have been found in association with isecp1 (acting by one-ended transposition) or iscr1 (acting by rolling-circle transposition). in that case, is elements have played a role in the mobilisation of these genes from the chromosome of kluyvera spp., the blactx-m progenitors, and then in their expression. also, genes encoding acquired ampc beta-lactamases of the blaacc, bladha and blacmy types are mostly found in association with iscr1 or isecp1. sometimes antibiotic resistance genes are mobilised by composite transposons, which are made of two copies of a given is bracketing the mobilised fragment. in acinetobacter baumannii, the worldwide disseminated blaoxa-23 carbapenemase gene is part of a composite transposon structure made of two copies of isaba1, forming transposon tn2006, which had mobilised a chromosomal fragment from acinetobacter radioresistens that actually corresponds to the progenitor of blaoxa-23.
another possibility is the formation of a composite transposon structure bracketed by two different is elements (sharing similar irs), as observed with the blaper-1 esbl gene in pseudomonas aeruginosa. this diversity of is elements at the origin of mobilisation/acquisition of antibiotic resistance genes is therefore responsible for the very efficient dissemination of many of them. s362 resistance islands: their role in the accumulation and spread of antimicrobial resistance genes. historically, multi-antibiotic resistance in many bacterial species was largely attributed to the acquisition of resistance (r)-plasmids encoding one or more resistance determinants. however, over the last decade the r-plasmid paradigm has begun to be challenged. 'resistance islands' comprising large, chromosomally-integrated spans of alien dna harbouring multiple antibiotic resistance genes have been identified in the major hospital pathogens methicillin-resistant staphylococcus aureus (mrsa) and multi-resistant acinetobacter baumannii, and the food- and water-borne diarrhoeal pathogens shigella, salmonella and vibrio cholerae. in addition, comparative genomics analysis of the archetypal haemophilus influenzae conjugative resistance element that had spread worldwide revealed that it belonged to a large syntenic family of integrative islands, members of which could be found in at least 15 other beta- and gamma-proteobacteria. with the exception of the a. baumannii island, these elements can be described as classic self-excising, -circularising and -integrative elements. all three functions are mediated by short island-flanking direct repeats and cognate integrase proteins encoded by the islands. in 2006 fournier et al. described an 86 kb a. baumannii island (abar1) which harboured 45 resistance genes packaged within a highly mosaic, integron-rich element that had almost certainly evolved via recombination, transposition and integron-mediated cassette capture from an 'empty' ancestral prototype.
abar1 probably represents a new class of resistance island as it exhibits several features reminiscent of complex nested transposons, suggesting a distinct functional nature. however, despite the widespread distribution of resistance and genomic islands, only a minority are known to code for part or all of the conjugative machinery necessary for their dissemination; others have been mobilised by helper plasmids or bacteriophages. regardless, data on the mechanisms of mobilisation of the vast majority of similar nonresistance islands remain sparse. importantly, resistance islands may not consist merely of packages of resistance genes. on the contrary, these diverse and frequently hybrid entities could potentially confer upon their hosts other advantageous traits relating to host-pathogen interaction, virulence, survival in the environment and/or transmissibility, truly justifying the label 'selfish islands' and further explaining their evolutionary success. due to the availability of new techniques, genome sequencing of bacteria has become fast and inexpensive. furthermore, recent methods using paired-end reads located several kb apart in the genome ease the assembly process, even when no reference sequence is available. in the reasonably near future, it should be possible to obtain the fully assembled sequence of a bacterial isolate overnight. the new sequencing techniques generate enormous amounts of genomic data and, thereby, a need for new tools. these should be able to quickly analyze genomes and point to zones of interest, prompting further analysis on a reduced number of regions or genes, such as genomic islands. pathogenicity islands, a subset of genomic islands, carry genes such as toxin or resistance genes and have the particularity of being mobile, i.e. they may transfer to other species or strains. thereby, they confer on their new hosts a more resistant or infectious phenotype, making this phenomenon particularly important to study.
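as a rough illustration of how compositional scanning for horizontally acquired regions can work, the following python sketch computes the t vs. a skew in sliding windows and flags windows that deviate from the genome-wide median. the skew formula (t - a)/(t + a), the window size, the step, the threshold and all function names are assumptions made for illustration; they are not taken from the methods summarised in this abstract.

```python
from statistics import median


def ta_skew(seq):
    """Return (T - A) / (T + A) for a DNA string; 0.0 if it has no A or T."""
    seq = seq.upper()
    t = seq.count("T")
    a = seq.count("A")
    return 0.0 if t + a == 0 else (t - a) / (t + a)


def windowed_skew(genome, window, step):
    """Compute the TA skew for each full-length sliding window."""
    return [
        (start, ta_skew(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


def atypical_windows(genome, window=1000, step=500, threshold=0.2):
    """Flag windows whose skew differs from the genome-wide median.

    The threshold is an arbitrary illustrative value, not a published cut-off.
    """
    skews = windowed_skew(genome, window, step)
    if not skews:
        return []
    baseline = median(s for _, s in skews)
    return [(start, s) for start, s in skews if abs(s - baseline) > threshold]


if __name__ == "__main__":
    # Synthetic example: a compositionally balanced "host" sequence with a
    # T-rich block inserted at position 5000 to mimic an acquired region.
    host = "ATGC" * 2500
    island = "TTTGC" * 200
    genome = host[:5000] + island + host[5000:]
    for start, skew in atypical_windows(genome):
        print(start, round(skew, 3))
```

on real genomes the baseline skew also changes sign around the replication origin and terminus, so a single genome-wide median is a simplification; a local baseline per replichore would be a more careful choice.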
nucleotide composition is fairly homogeneous within a bacterial genome. in general, horizontally transferred regions can be spotted due to their particular nucleotide content, because they tend to retain the composition of their original host rather than adopting that of their new host. to draw an analogy with languages, genomes speak dialects, and just as one would easily spot a paragraph in finnish in an english text without knowing finnish, one can spot transferred genomic and pathogenicity islands in a given genome. several techniques relying on various compositional aspects and on different algorithmic methods have been recently developed to detect pathogenicity islands in bacterial genomes. even very simple measures of the genome composition, such as the variation in t vs. a bias (ta skew), can lead to the identification of all known prophages in streptococcus pyogenes. it can even trigger the discovery of a putative ancient genomic island carrying a large number of genes related to pathogenicity in all strains of that species. in conclusion, with the rise of fast and inexpensive genome sequencing, new quick and simple methods are being developed. they take advantage of the homogeneous nucleotide composition of bacterial genomes to uncover mobile genetic elements carrying genes involved in pathogenicity. in the past 10 years, significant progress has been achieved in the management of chronic hepatitis b with the successive development of six potent antiviral medications (lamivudine, adefovir dipivoxil, pegylated interferon alpha, entecavir, telbivudine and tenofovir). however, the clinical results of antiviral therapy have been limited by the emergence of antiviral drug resistance, especially with the first generation of nucleoside analogs (lamivudine, adefovir and telbivudine).
furthermore, the unique mechanism of viral genome replication and persistence within infected cells is responsible for viral persistence even after prolonged therapy with the newer antivirals (entecavir and tenofovir). this is the major reason why life-long treatment is envisaged in the majority of patients, which may expose them to a long-term risk of developing resistance. the use of in vitro phenotypic assays has been crucial for characterising newly identified resistant mutants and determining their cross-resistance profile. the results made it possible to understand the different mechanisms of viral resistance to lamivudine and adefovir, the mechanism of primary failure of adefovir therapy and the unique mechanism of entecavir resistance, and to characterise the emergence of multi-drug-resistant strains in patients receiving sequential antiviral therapy. the cross-resistance profile of the main resistant mutants was determined, which made it possible to provide recommendations to clinicians for treatment adaptation based on molecular virology data. the understanding of the development of hbv drug resistance has significantly improved the management of antiviral resistance and the design of better treatment strategies to prevent resistance. the current standard of care relies on treatment initiation with antivirals combining a strong antiviral potency and a high barrier to resistance. precise virologic monitoring is required to measure antiviral efficacy and to diagnose partial response or viral breakthrough at an early stage. this allows antiviral treatment to be adapted, preferably using an add-on strategy with a drug having a complementary cross-resistance profile. this strategy has been shown to be efficient in controlling viral replication and preventing liver disease progression in the majority of patients. treatment of chronic hepatitis b virus (hbv) infection is aimed at suppressing viral replication to the lowest possible level.
in many prospective clinical trials it has been shown that a sustained hbv dna response was correlated with serologic, histologic, or biochemical responses. despite the recent progress in hepatitis b antiviral treatment, it has been shown that antiviral drug resistance is inevitable against many of the nucleoside analogs. the emergence of antiviral-resistant strains of hbv leads to viral and subsequently biochemical breakthrough and may lead to disease progression and increased mortality. most of the data on the clinical impact of antiviral-resistant hbv come from studies of lamivudine therapy. there are limited data on other hbv antiviral drugs such as adefovir. several studies have shown that treatment of hbeag-negative chronic hepatitis b with lamivudine effectively suppresses hbv replication and results in biochemical remission and histologic improvement in more than two thirds of patients. however, relapse has occurred in the majority of hbeag-negative patients after the cessation of therapy. several studies support the occurrence of severe hepatic flares and liver failure after the emergence of lamivudine resistance. several studies in which liver biopsies were taken demonstrated that histological improvement was reduced in patients experiencing lamivudine resistance. the clinical outcome for patients with antiviral resistance is related to their age, the severity of the underlying liver disease and the severity of the hepatic flares. on the other hand, a different study found that long-term lamivudine treatment was associated with a reduced chance of developing cirrhosis and hcc in patients without advanced disease; although resistant mutants reduced the benefits of lamivudine therapy, the outcome of these patients was still better than that of untreated patients.
results of several clinical trials have shown that the addition or substitution of newer antiviral agents can restore suppression of viral replication, normalise liver function and reverse histological progression in patients with antiviral resistance. consequently, well-tolerated, potent therapies that offer a strong genetic barrier against the development of resistance are desirable, since antiviral resistance and poor adherence are key risk factors for treatment failure and subsequent reversal of clinical improvement. resistance of enteric fever-causing and non-typhoid salmonella serovars to agents traditionally used to treat these infections shows extensive geographical variation. decreased susceptibility to ciprofloxacin is rapidly increasing all over the world, with target alteration and increased efflux being the most important mechanisms behind it. infections with such strains often result in extended hospitalisation or even in therapeutic failures. furthermore, it is likely that moderately increased mic values facilitate the development of strains with a higher level of resistance, i.e. a pattern described at various locations. screening methods based on quinolone sensitivity testing may fail to identify decreased fluoroquinolone susceptibility in both typhoid and non-typhoid salmonella. plasmid-mediated quinolone resistance genes are detected increasingly all around the world, although neither the frequency nor the variety of genes identified has approached that seen in some other members of enterobacteriaceae. gatifloxacin and azithromycin are alternative treatment options for invasive and systemic infections caused by strains with decreased susceptibility to ciprofloxacin. in some parts of the world resistance to extended-spectrum cephalosporins has reached an incidence that may have therapeutic implications, particularly when initial, empiric treatment of invasive infections is concerned.
resistance is due to plasmid-encoded ampc-type beta-lactamases (particularly cmy-2), and most often to esbls, of which ctx-m types are the most frequently encountered. carbapenem resistance is still rare among salmonella isolates, although it does occur. the recent description of a non-typhoid salmonella strain with the blaimp-4 gene co-located on a class-1 integron with several other resistance determinants on a conjugative plasmid is of particular concern. campylobacters exhibit natural resistance to a variety of antimicrobials. the drugs of choice used to be fluoroquinolones or macrolides. however, the current incidence of ciprofloxacin resistance has already made the former drugs obsolete or seriously limited their use in several parts of the world. with the exception of a few locations, the incidence of macrolide resistance is still relatively low and is seen more frequently in c. coli than in c. jejuni. however, strains exhibiting resistance against both groups of drugs have been emerging, particularly in south-east asia. neisseria meningitidis, the meningococcus, is a major cause of meningitis and septicaemia worldwide, while neisseria gonorrhoeae, the gonococcus, is responsible for one of the most widespread sexually transmitted diseases. the behaviour of these two species towards antibiotics is very different: resistance in n. gonorrhoeae is now widespread, occurring as both chromosomally and plasmid-mediated resistance to a variety of drugs, whereas, besides resistance to sulphonamides, n. meningitidis remains largely susceptible to antibiotics used both for therapy and prophylaxis. however, as in the gonococcus, the resistance to antibiotics of n. meningitidis is also evolving, as documented by the ever higher frequency of strains with intermediate resistance to penicillin in many countries.
transformation has apparently provided both species with a mechanism by which they can increase resistance to penicillin by replacing part of their pena gene, which encodes pbp2, with part of the pena gene of related species that fortuitously produces forms of pbp2 less susceptible to the antibiotic. n. meningitidis is still at this step, whereas n. gonorrhoeae has also acquired a mutation in the pona gene that encodes pbp1, a mutation in porin ib, increased expression of an efflux pump and the tem-1 beta-lactamase plasmid. the emergence and spread of gonococci fully resistant to penicillin since the second half of the 1980s led to the recommended use of fluoroquinolones as primary therapy. however, this class of antibiotics rapidly became ineffective, mainly in asia, due to the emergence of mutations in gyra and parc, which block the activity of the quinolones on gyrase and topoisomerase iv. since 2006, cdc has no longer recommended their use for treatment of gonococcal diseases. fortunately, quinolone-resistant meningococci, due to mutations in gyra, are still rare, but even these few cases are of great concern given the epidemic potential of this pathogen and the required prophylaxis of contacts. some meningococci have also been shown to be resistant to rifampicin, the other antibiotic frequently used for this purpose, again because of mutations, in this case in the rpob gene coding for the beta-subunit of the meningococcal rna polymerase. the molecular epidemiological identification of clonal clusters of both neisseria species with distinct resistance profiles is required to monitor ongoing trends that may pose problems in both therapy and prophylaxis.

l. brookes-howell°, c. butler, k. hood, l. cooper, h.
goossens (cardiff, uk; antwerp, be) introduction: grace is a european network of excellence established to focus on antibiotic use for community-acquired lower respiratory tract infection (lrti) and antimicrobial resistance across europe. grace-02, the second study to begin within grace, is a large qualitative study that explores the attitudes of clinicians and patients to antibiotic use for lrti and antibiotic resistance. aims: this presentation will focus on clinicians' accounts of the factors that contribute to variation in management of lrti and patient views on when antibiotics are necessary. methods: semi-structured interviews with 81 clinicians and 121 patients were conducted in primary care networks in nine european countries. interviews were audio-recorded, transcribed and, where necessary, translated into english for analysis. themes were identified, organised and compared using a framework approach. results: analysis of clinician interviews shows that, beside clinical findings, factors which influence the management decision for patients can be divided into two main areas. firstly, within each european network there is a group of country specific factors imposed by the system in which consultations take place. these factors include: near patient test usage, self-medication, patients' finances and lack of consistent, local prescribing guidelines. secondly, there is a group of factors, similar across all networks, that relate to personal characteristics of certain groups of clinicians. these include clinicians' professional ethos, self-belief in decision making and attitude towards the doctorpatient relationship. analysis of patient interviews shows that beliefs about antibiotic use tend to draw on clinical factors, namely the severity of specific symptoms (fever and/or coughing). 
many patients also implied that a period of waiting or alternative action is required before antibiotics are used, to establish whether the immune system would fight the infection or whether non-antibiotic management was effective before turning to antibiotics. discussion and conclusion: with a greater understanding of the factors that contribute to the decision to prescribe, we discuss ideas to enhance appropriate prescribing. this analysis highlights the need for interventions to be sensitive to factors relating to the systems in which different european networks operate, to target the individual characteristics of specific groups of clinicians and to build on the clinical beliefs already held by patients.

o377 pre-treatment with low-dose endotoxin prolongs survival from experimental lethal endotoxic shock

k. kopanakis, i. tzepi, e.j. giamarellos-bourboulis°, a. macheras (athens, gr)

objective: clinical trials of immunointervention with anti-endotoxin antibodies in patients with severe sepsis have failed to disclose survival benefit. these failures led us to the assumption that the opposite approach, a low endotoxin stimulus, may result in low-level immunoparalysis and subsequent survival benefit. this approach was tested in an experimental setting. methods: a total of 36 male c57b6 mice were studied, divided into two groups: group a, stimulated with the ip injection of sodium saline followed after one day by the ip injection of 30 mg/kg of lipopolysaccharide (lps) of escherichia coli o155:h5; and group b, stimulated with the ip injection of 3 mg/kg of lps of the same isolate followed after one day by the ip injection of 30 mg/kg lps. lps was diluted in sodium saline and the volume of each injection was 0.2 ml. survival was recorded at six-hour intervals. results: survival of group b was considerably prolonged compared with group a (log-rank: 5.435, p: 0.020), as shown in figure 1.
thirteen mice of group a died (72.2%) compared with seven mice of group b (38.9%, p: 0.044 between group). conclusions: administration of low doses of lps prolongs survival after lethal endotoxic shock. this approach opens a promising novel pathway for immunointervention in sepsis. fragilis isolates with an mxf mic of 2 mg/ml (n = 5), 4 mg/ml (n = 20) and 8 mg/ml (n = 8), which were virulent in the mgp model, were used to determine the efficacy of mxf. for the mgp model, pouches were created by injecting 5 ml of air and 0.5 ml of 0.1% croton oil in olive oil under the skin of the back. on day 3, the air was withdrawn and replaced by 1 ml soft agar. on day 5, a bacterial suspension was injected into the pouch. infected mice (n = 6 mice/group) were treated with mxf 100 mg/kg iv, b.i.d. for 2 days. this dose simulates the auc of the human 400 mg once-daily mxf iv dosage. efficacy was assessed by the reduction in colony forming units (cfus) in pouch exudates 48 hours post-infection compared with the untreated infection control. results: in the mgp model, mxf, 100 mg/kg b.i.d., displayed good efficacy in term of cfu reduction against all used strains in this study. there were no non-responders in terms of cfu reductions. conclusion: the loss of atle had no impact on the mics of cloxacillin and vancomycin. conversely, the mutant atle(−) strain was less susceptible to bactericidal activity of both antibiotics, supporting the implication of atle in the tolerance of s. epidermidis to cell wall active antibiotics. the loss of atle did not alter the virulence of s. epidermidis in the mouse peritonitis model, whereas it decreased virulence in previously published experiments using an intravenous catheter infection model. therefore, the mouse peritonitis model was suited to compare antibiotics efficacy against atle(+) and atle(−) strains. 
our results showed that the loss of atle did not significantly alter the activity of cloxacillin and vancomycin in the mouse peritonitis model. this study shows that the loss of atle results in decreased susceptibility to the bactericidal activity of cell wall-active antibiotics, with no apparent impact on the activity of these antibiotics in the mouse peritonitis model.

in infant rat pneumococcal meningitis, ceftriaxone plus daptomycin versus ceftriaxone attenuates brain damage and hearing loss while ceftriaxone plus rifampicin versus ceftriaxone does not

d. grandgirard, m. burri, k. oberson, a. bühlmann, f. simon, s.l. leib° (berne, ch)

objectives: lytic antibiotics for therapy of bacterial meningitis (bm) increase the release of pro-inflammatory bacterial compounds which, in turn, induce inflammation. exacerbation of the inflammatory response in cerebrospinal fluid (csf) contributes to the development of neurological sequelae in survivors of bm. daptomycin, a non-lytic antibiotic acting on gram-positive bacteria, has been shown to decrease inflammation and brain injury vs. ceftriaxone in experimental pneumococcal meningitis. with a view to clinical application in the empiric therapy of paediatric bacterial meningitis, we investigated whether therapies combining daptomycin or rifampicin with ceftriaxone are beneficial when compared to ceftriaxone monotherapy in infant rat pneumococcal meningitis. methods: eleven-day-old wistar rats were infected by intracisternal injection of s. pneumoniae and animals were treated with daptomycin (10 mg/kg, s.c., daily) plus ceftriaxone (100 mg/kg, s.c., bid), rifampicin (20 mg/kg, i.p., bid) plus ceftriaxone, or ceftriaxone alone. csf was sampled at 6 h and 22 h after the initiation of therapy and assessed for concentrations of chemokines and cytokines (mcp-1, mip-1alpha, il-1beta, il-6, il-10, il-18 and tnf-alpha). a subset of animals was sacrificed 40 h post infection (h pi) and brain damage was quantified by histomorphometry.
the remaining animals were treated for 3 d and were tested for hearing loss, by assessing the auditory brainstem response (abr) at 3 weeks after infection. results: compared to ceftriaxone alone, daptomycin plus ceftriaxone significantly (p < 0.04) lowered csf concentrations of mcp-1, mip-1alpha and il-6 at 6 h and mip-1a and il-1b at 22 h after initiation of therapy, led to significantly (p < 0.01) less apoptosis assessed at 40 h pi, and significantly (p < 0.01) improved hearing capacity. while rifampicin plus ceftriaxone also led to lower csf inflammation (p < 0.02 for il-6 at 6 h), apoptosis and hearing loss were not significantly different from the ceftriaxone group. conclusion: compared to ceftriaxone monotherapy, daptomycin plus ceftriaxone lowers the level of pro-inflammatory mediators in the csf and reduces hippocampal apoptosis and hearing loss in infant rat pneumococcal meningitis. d. croisier-bertin°, l. piroth, p.e. charles, d. biek, y. ge, p. chavanet (dijon, fr; alameda, us) objectives: ceftaroline (cpt) is a novel, parenteral, broad-spectrum cephalosporin exhibiting bactericidal activity against gram-positive organisms, including methicillin-resistant s. aureus (mrsa) and multidrug-resistant s. pneumoniae, as well as common gram-negative pathogens. the efficacy of simulated human dosing with cpt or ceftriaxone (cro) was evaluated in a rabbit model of penicillin-resistant pneumococcal pneumonia. methods: 3 s. pneumoniae strains were used to induce pneumonia in rabbits: pssp, pisp, and prsp. mics (mg/l) were 0.06/0.015, 1/0.125, and 4/0.25 for cro and cpt, respectively. the animals were randomised to no treatment (controls), intravenous (iv) cpt human equivalent (he) dosage (600 mg/12 h), iv cro he dosage (1 g/24 h), or intramuscular (im) cpt (5 or 20 mg/kg) for prsp-infected rabbits. serum levels were measured by microbiological assay and pk data were obtained. evaluation of efficacy was based on bacterial counts in lungs and spleen (per gram tissue). 
results: 5−7 animals/group were tested. for iv cpt/iv cro, mean auc0−24 was 155/938 mg.h/l, cmax was 20/158 mg/l and cmin was 1.3/6 mg/l, respectively. bacterial counts in target tissues are listed in the table. iv cpt and iv cro were highly efficacious against pssp and pisp. iv and im cpt were superior to iv cro against prsp, with a quasi-sterilisation of lungs and spleen. combined results from the iv and im studies indicated that %t > mic values for cpt of 30% and 45% were associated with 50% and 100% bacterial count reductions, respectively. in this rabbit model of penicillin-resistant pneumococcal pneumonia, cpt administered iv (with he dosing) or im was more effective against prsp than iv cro.

r. endermann°, d. hoepker, k. merfort, m. glenschek-sieberth (wuppertal, de)

objective: moxifloxacin (mxf) is approved in the usa and other countries for the treatment of complicated intra-abdominal infections (ciais). we compared the efficacy of mxf with piperacillin/tazobactam (pip/taz), a commonly used treatment for ciais, in three different models: ( . c. clp model: survival over 10 days was significantly higher in the mxf group than in the pip/taz group (p < 0.0001). conclusions: using humanised dosages, mxf had greater antimicrobial activity and provided higher survival rates than pip/taz in three different models for ciai.

m. nairz, i. theurl, a. schroll, m. theurl, s. mair, t. sonnweber, g. fritsche, r. bellmann-weiler, g. weiss° (innsbruck, at)

mutations in hfe predispose to hereditary haemochromatosis type i, a frequent genetic disorder characterised by progressive parenchymal iron deposition and eventual organ failure. since hfe mutations are associated with reduced iron levels within mononuclear phagocytes, we hypothesized that hfe deficiency may be beneficial in infections with intramacrophage pathogens.
using hfe+/+, hfe+/− and hfe−/− mice in a model of typhoid fever, we found that animals lacking one or both hfe alleles are protected from systemic infection with salmonella typhimurium, displaying prolonged survival and improved bacterial control. this increased resistance can be attributed to an enhanced production of the siderophore-binding peptide lipocalin 2 and the reduced availability of iron for salmonella engulfed by hfe-deficient macrophages. this effect is mediated via stimulation of lipocalin 2-dependent iron export from infected cells, since hfe−/− macrophages concurrently knocked out for lipocalin 2 are unable to efficiently control the infection or to withhold iron from intracellular salmonella. correspondingly, infection of hfe+/+ and hfe−/− mice with siderophore-deficient salmonella abolishes the protection conferred by the hfe defect. thus, by inducing the formation of the iron-capturing peptide lipocalin 2, the hfe mutation confers a genetically determined immunological advantage against infections with intracellular pathogens such as salmonella.

i. koutelidakis, a. kotsaki, p.d. carrer, k. louis, a. savva, e.j. giamarellos-bourboulis° (thessaloniki, athens, gr)

objective: the majority of clinical trials of immunointervention in severe sepsis have failed to disclose survival benefit. a likely explanation may be administration of therapy when immunoparalysis of the septic host supervenes. in an attempt to reverse immunoparalysis, injection of mononuclear cells was attempted in experimental sepsis by multidrug-resistant pseudomonas aeruginosa (mdrpa). methods: peripheral blood mononuclear cells (pbmcs) were isolated from five healthy human volunteers after gradient centrifugation over ficoll and diluted in rpmi. for bacterial challenge, 1×10^7/kg of one live or heat-killed mdrpa isolate from one patient with severe sepsis was injected intraperitoneally.
a total of 72 male c57b6 mice were studied, divided into four groups: group a (n = 26), pre-treated with rpmi and challenged after one hour with live isolate; group b (n = 26), pre-treated with 5×10^7 pbmcs/kg and challenged after one hour with live isolate; group c (n = 10), pre-treated with rpmi and challenged after one hour with heat-killed isolate; group d (n = 10), pre-treated with 5×10^7 pbmcs/kg and challenged after one hour with heat-killed isolate. survival was recorded for 20 mice of groups a and b and for all mice of groups c and d. six mice of groups a and b were sacrificed six hours after challenge. blood was collected from the lower vena cava, and tnf-alpha and il-6 were estimated in serum by an enzyme immunoassay. bacterial growth in liver and lung was assessed at the same time. results: median survival of group a was 24 hours and of group b 88 hours (log-rank: 4.524, p: 0.033). nineteen animals of group a died (95%) compared with eight animals of group b (40%, p: 0.038). four animals of group c died (40%) compared with none of group d (0%, log-rank: 4.274, p: 0.03). median serum tnf-alpha of groups a and b at sacrifice was 31 and 184 pg/ml, respectively (p: 0.048). respective values for il-6 were 2084 and 2231 pg/ml (p: ns); for liver bacterial cells 3.63 and 4.99 log10 cfu/g (p: ns); and for lung bacterial cells 2.56 and 4.22 log10 cfu/g (p: ns). conclusions: allogeneic transplantation with pbmcs prolonged survival in experimental sepsis by mdrpa. its mechanism of action was related to a) blockade of cell wall structures, as shown by survival experiments with heat-killed isolate; and b) reversal of immunoparalysis, as evidenced by the increase of serum tnf-alpha. this approach creates a promising novel perspective for immunointervention in sepsis.

a. marangoni°, c. nanni, m. donati, r. aldini, d. di pierro, s. trespidi, s. accardo, s. fanti, r.
cevenini (bologna, it) objectives: chlamydia trachomatis is one of the world's major causes of sexually transmitted diseases of the cervix and urethra and it is a major agent of pelvic inflammatory disease. genital tract infection of female mice with chlamydia muridarum closely mimics acute genital tract infection in women. aim of this study was to assess the predictivity of 68ga-chloride small animal positron emission tomography ( o387 inadequate statistical power of published comparative cohort studies on ventilator-associated pneumonia to detect mortality differences between the compared groups m. falagas°, v. kouranos, a. michalopoulos, s. rodopoulou, a. athanasoulia, d. karageorgopoulos (athens, gr) objective: comparative cohort studies are often conducted to identify novel therapeutic strategies or prognostic factors for ventilator-associated pneumonia (vap). we aimed to evaluate the statistical power of such studies to provide statistically and clinically significant conclusions. methods: we searched in pubmed and scopus for comparative cohort studies evaluating the mortality of patients with vap. we calculated for each of the included studies the statistical power to detect the observed difference in mortality between the compared groups (observed power), as well as 3 expected, clinically relevant, effect sizes (expected power). we identified 39 (20 prospective) comparative cohort studies on vap as eligible for inclusion in this analysis. the median observed power of these studies was 17.9% [interquartile range (iqr), 9.8−52.4%]. the median expected power was 10.0% (iqr, 7.2−13.6%) for a risk ratio for mortality of 0.85 between the compared groups; 14.7% (iqr, 10.6−21.8%) for a risk ratio of 0.80; and 7.9% (iqr, 6.3−10.2%) for a reduction in mortality from 30% to 25%. all expected power measures were significantly lower than the observed power. 
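the expected-power scenario of a mortality reduction from 30% to 25% can be illustrated with a rough sketch. the code below uses a textbook normal approximation for a two-sided comparison of two proportions; this is an assumption made for illustration and not necessarily the method used in the analysis, and the group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with an unpooled standard error; an illustrative
    assumption, not the calculation reported in the abstract.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    shift = abs(p1 - p2) / se
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical per-group sample sizes; mortality 30% vs 25%.
    for n in (50, 100, 500, 1250):
        power = two_proportion_power(0.30, 0.25, n, n)
        print(f"n per group = {n:5d}  approximate power = {power:.3f}")
```

under this approximation, power rises only slowly with group size for a five-percentage-point difference, which is in line with the low expected power reported for typical cohort sizes.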
the statistical power of most cohort studies to detect the observed difference in mortality between compared groups of patients with vap is low. the power is even lower when expected, clinically relevant differences in mortality are considered. for a wiser utilisation of resources allocated to research, we favour conducting cohort studies with larger sample sizes so that potential differences between the compared groups are more likely to be shown.

objective: to clarify issues regarding the frequency, prevention, outcome, and treatment of patients with ventilator-associated tracheobronchitis (vat), which is a lower respiratory tract infection involving the tracheobronchial tree while sparing the lung parenchyma. methods: we performed a systematic review and meta-analysis of relevant available data, gathered through searches of pubmed, scopus, and reference lists, without time restrictions. a conservative random effects model was used to calculate pooled odds ratios (or) and 95% confidence intervals (ci). results: out of the 564 initially retrieved articles, 30 papers were included. frequency of vat was 10.2%. selective digestive decontamination proved an effective preventive strategy against vat. presence, as opposed to absence, of vat was not associated with higher mortality (or: 1.18, 95% ci 0.90−1.53). administration of systemic antimicrobials (with or without inhaled ones), as opposed to placebo or no treatment, in patients with vat was not associated with lower mortality (or: 0.56, 95% ci 0.27−1.14). most of the studies providing relevant data noted that administration of antimicrobial agents, as opposed to placebo or no treatment, in patients with vat was associated with more ventilator-free days and lower frequency of subsequent pneumonia, but without shorter length of intensive care unit stay or shorter duration of mechanical ventilation.
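as an illustration of how a random effects model yields a pooled odds ratio with a 95% confidence interval, the sketch below uses the dersimonian-laird estimator. the abstract does not name the estimator, so this choice is an assumption, and the 2×2 counts in the usage example are invented for illustration only.

```python
from math import exp, log, sqrt


def pooled_or_random_effects(tables):
    """Pool odds ratios with a DerSimonian-Laird random effects model.

    tables: list of (a, b, c, d) counts per study, where a/b are deaths and
    survivors in one group and c/d are deaths and survivors in the other.
    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    effects, variances = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            # Continuity correction for studies with an empty cell.
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        effects.append(log(a * d / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance tau^2, truncated at zero.
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / scale) if scale > 0 else 0.0

    # Random effects weights and pooled log odds ratio.
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return exp(mu), exp(mu - 1.96 * se), exp(mu + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical studies: (deaths, survivors) with vat, then without vat.
    studies = [(12, 38, 10, 40), (20, 60, 18, 72), (7, 23, 9, 31)]
    pooled, lower, upper = pooled_or_random_effects(studies)
    print(f"pooled or = {pooled:.2f} (95% ci {lower:.2f} to {upper:.2f})")
```

a confidence interval that spans 1, as in the two mortality comparisons reported above, means the pooled estimate is compatible with no difference between groups.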
conclusions: approximately one tenth of mechanically ventilated patients suffer from vat; an infection potentially prevented by the implementation of selective digestive decontamination. antimicrobial treatment of patients with vat may protect against the development of subsequent ventilator-associated pneumonia. degranulation. subsequently, allergen specific ige to chlorhexidine was demonstrated and skin prick/intradermal testing was positive to chlorhexidine, confirming the diagnosis of chlorhexidine-precipitated anaphylaxis in each patient. a detailed review of the case-notes revealed that each patient had manifest evidence of minor cutaneous reactions to pre-operative chlorhexidine use that had not been ascribed to chlorhexidine at the time. discussion: fda issued a public health notice [1998] following 1st description of anaphylaxis to chlorhexidine coated central venous catheter. a recent case cluster has also been reported from another cardiac centre in the uk [3-cases over a 9-month period]. references to be presented. it is interesting that these reports of chlorhexidine anaphylaxis have all occurred in patients undergoing cardiac surgery. these patients receive multiple exposures to chlorhexidine during their pre-operative investigations and preparation. this has increased recently as a result of the drive to reduce the incidence of hospital-acquired infections. we wish to postulate that these patients have been sensitised by repeated topical exposure to chlorhexidine and have exhibited anaphylaxis when this allergen was presented to the patient in the form of the chlorhexidine coated central venous catheter. type i strains of helicobacter pylori possess the cag pathogenicity island to deliver virulence factors. cag is a specialised type iv secretion machinery that is activated during infection and comprises 31 genes originated from a distant event of horizontal transfer. 
after translocation, the effector protein caga is phosphorylated on tyrosine residues restricted to a previously identified repeated sequence called d1. this sequence is located in the c-terminal half of the protein and contains the five amino acid motif epiya, which is amplified by duplications in a large fraction of clinical isolates. tyrosine phosphorylation of caga is essential for the activation process that leads to dramatic changes in the morphology of cells growing in culture. in addition, we observed that two members of the src kinase family, c-src and lyn, account for most of the caga-specific kinase activity in ags cell lysates. translocated caga interacts with the zo-1 and jam host-cell proteins, causing disruption of the apical junctional complex. transfection of the caga gene into polarised epithelial cells induces disruption of cell-to-cell contacts and altered morphology. strikingly, caga-expressing cells become migratory and invasive, penetrating into collagen gel. the study of different portions of the molecule revealed the presence of two distinct functional domains, both of which are necessary to induce abnormal cell differentiation through interactions with host cell morphogens. cell polarity and invasion have been suggested to contribute to both early and late stages of cancer formation. these results suggest a mechanism by which caga may act at the early stage of tumorigenic progression, causing loss of cell polarity, increased cell motility and invasiveness of epithelial cells.

after a period of 50 years of silence, a disease with an unpronounceable name, "chikungunya" (chik), has recently become a medical reality and reached the public throughout the world.

conclusion: low mhla-dr expression after septic shock independently predicts ni. this promising biomarker may be of major interest in identifying patients at increased ni risk who could benefit from targeted and tailored therapy aimed at restoring immune functions.
pneumonia, the leading infectious cause of death in the us, kills more people annually than aids, tuberculosis, meningitis and endocarditis combined. from a wide range of observational studies of communityacquired pneumonia (cap), only half of the cases had an aetiologic agent identified. streptococcus pneumoniae was consistently the predominant bacterial aetiology. this lecture will primarily focus on the innate immune response to pneumococcal pneumonia. toll-like receptors (tlrs) are key molecules that recognize pathogen associated molecular patterns (pamps) and induce an inflammatory response. pneumolysin, an intracellular toxin found in all s. pneumoniae clinical isolates, is an important virulence factor of the pneumococcus that is recognized by tlr4. although tlr2 is considered the most important receptor for gram-positive bacteria, tlr2 does not play a decisive role in host defence against s. pneumoniae pneumonia; likely, pneumolysin-induced tlr4 signalling can compensate for tlr2 deficiency during respiratory tract infection with s. pneumoniae. besides tlr2 and tlr4, tlr9 contributes to an effective host defence against s. pneumoniae in the airways. the importance of tlr signaling for host defence against pneumococcal pneumonia is illustrated by the fact that mice lacking the common tlr adaptor protein myd88 are highly susceptible to this infection. activation of tlrs results in the production of proinflammatory cytokines. there is ample evidence that underlines the importance of tumour necrosis factor (tnf) and interleukin (il)-1 in host defence in bacterial pneumonia: in a murine s. pneumoniae pneumonia model, treatment with a neutralising anti-tnf mab strongly impaired antibacterial defence. in addition, il-1a receptor type i deficient mice infected with s. pneumoniae displayed an increased bacterial outgrowth. 
of considerable interest, treating il-1 receptor deficient mice with a neutralising anti-tnf antibody made them extremely susceptible to pneumococcal pneumonia. infection of the lower airways by s. pneumoniae is associated with complex interaction between the pathogen (e.g. cell wall components, pneumolysin) and the host (e.g. tlrs, cytokines). these interactions play a crucial role in the outcome of this clinically important infection. severe bacterial pneumonia remains uncommon unless specific conditions exist that tip the balance between the host and pathogen in favour of the microorganism. such conditions include: persons at the extremes of age; exposure to especially virulent organisms; patients with concomitant illness impairing pulmonary clearance mechanisms; and immunocompromised hosts. pathogens overcome an array of innate and acquired host defences to successfully invade the host. the known virulence traits of three common respiratory pathogens (streptococcus pneumoniae, staphylococcus aureus, and pseudomonas aeruginosa) will be briefly reviewed. the capsular polysaccharide of pneumococci is the major anti-phagocytic virulence trait but many other factors contribute to disease pathogenesis including the critically important exotoxin known as pneumolysin, bacteriocins, adherence factors, choline binding proteins, lipoteichoic acid, iron, manganese and magnesium transporters, pili, competence and biofilm capacity, and virulence genes that promote invasion and impair clearance once the organism has entered the blood stream. s. aureus is notorious for the numerous a/b type toxins, cytotoxins, and superantigens it generates during the course of invasion. staphylococci deploy a complex series of quorum sensing signals that coordinate adhesin and invasion genes within biofilms or between planktonic organisms and likely contribute to the success of this pathogen. p. 
aeruginosa produces an array of extracellular exotoxins and cytotoxins delivered by type iii secretion systems. these include elastase, phospholipases c, a series of apoptotic and anti-phagocytic exotoxins, along with an alginate capsule and an unusual and variable lps structure that participate in microbial invasion. the pathogen expresses at least three interacting, quorum sensing systems to coordinate virulence and biofilm formation. a detailed understanding of these virulence factors is now providing therapeutic options to control these respiratory pathogens. surface expressed and extracellular toxins of pneumococci have been selected as new vaccine targets. inhibitory peptides and small molecule inhibitors of quorum sensing and biofilm formation are under investigation for staphylococcal and p. aeruginosa infections. these innovative and non-antibiotic treatment strategies are gaining greater importance as progressive antibiotic resistance threatens the management of these severe bacterial infections in the future. brucellosis, possibly the commonest zoonotic infection worldwide, has troubled humans since antiquity. recent years have seen the expansion of the animal reservoir of the disease to a wide spectrum of wildlife species, extending to marine mammals, and the recognition of novel brucella species. furthermore, animal and human disease has re-emerged in numerous countries which were brucellosis-free, and currently the most important endemic foci include near east and central asia. complex socioeconomic and political factors may be incriminated for these alterations in endemicity. the complex mechanisms by which brucella evades immune response and survives intracellularly are progressively clarified. 
novel diagnostic techniques such as real-time pcr may shed light on the life cycle of brucella inside the human host; preliminary studies have indicated that the pathogen may persist in latent form for years after apparent clinical cure, in asymptomatic individuals. treatment principles have not evolved significantly. the expert guidelines issued recently under the name of "ioannina recommendations" support the need for a six-week combined treatment that includes traditional antibacterials and is modified accordingly in serious complications such as spondylitis and central nervous system involvement. the road to the development of a vaccine for humans still seems long.

anthrax is an ancient disease that was relatively forgotten in the western world until 2001, when spores were mailed in the usa, causing five deaths. currently, human anthrax is seen most commonly in agricultural regions of the world where anthrax in animals is prevalent, including countries of the middle east, africa, central asia and south america. it is also an endemic disease in turkey. human cases may occur in an agricultural or an industrial environment. the infection is an occupational hazard of workers who process hides, hair, bone and bone products, and wool, and of veterinarians and agricultural workers who handle infected animals. the main routes of transmission are contact with or ingestion of contaminated meal, or inhalation of bacillus anthracis spores.

leptospirosis is a very old disease that has been known for more than a hundred years and possibly even longer, since the time of hippocrates. it remains a major cause of illness in many tropical and subtropical countries and thus in travellers. it has also been identified as a zoonosis in europe and north america. it is a disease that can surprise us because the clinical presentations are not always typical. in recent years, pulmonary and other atypical presentations have been more widely recognised.
there is no effective vaccine but chemoprophylaxis is effective in selected populations. prompt recognition and early institution of appropriate treatment as with most other infectious diseases appear to be critical in ensuring a good outcome for our patients. there are interesting new developments in diagnostics and molecular epidemiology but clearly there are many challenges remaining in this field. objectives: the spread of carbapenemase genes within gram negative bacteria is of great cause for concern. in 2008, the first report of a blaoxa-58 gene outwith acinetobacter baumannii was reported in acinetobacter genospecies 3. we had also identified a genospecies 3 isolate encoding a blaoxa-58-like gene, and the aim of this study was to examine the genetic environment of the gene to investigate the mobilisation between species. methods: restriction analysis of rrna was used to confirm identity to the species level. susceptibility to imipenem and meropenem was determined through the plate doubling dilution method. screening by pcr for blaoxa-51-like, blaoxa-23-like, blaoxa-40-like and blaoxa-58-like genes was carried out. analysis of the genetic environment surrounding the blaoxa-58-like gene was conducted by sequencing inverse pcr products and gene-walking fragments. the structure of the surrounding sequence was confirmed using internal primers, which were also used to screen other blaoxa-58-like positive isolates in our collection. results: restriction analysis confirmed the isolate belonged to acinetobacter genospecies 3. the isolate showed reduced susceptibility to imipenem and meropenem with mics of 2 mg/l for both antibiotics. the isolate was negative for a blaoxa-51-like, blaoxa-23-like or blaoxa-40-like gene, but positive for a blaoxa-58-like gene. analysis of the genetic environment of the blaoxa-58-like gene revealed the gene was within a novel genetic structure. 
upstream of the blaoxa-58-like gene was the left-hand end of an isaba3 element, interrupted by an isaba125 element. the elements contained putative promoter sequences. downstream was an arac1 and a lyse gene, followed by a sequence similar to the re27 element described previously. following this was a complex region containing the right-hand end of an isaba3 tnpa gene, interrupted by an incomplete tnpa gene with 99% similarity to isaba3, itself interrupted by an isaba125 sequence. this region was followed by a second blaoxa-58-like gene. all other blaoxa-58-like positive isolates in our collection were negative for isaba125 upstream of blaoxa-58. this study is the first to report multiple copies of a blaoxa-58-like gene in an acinetobacter genospecies 3 isolate, and has identified a novel structure containing two blaoxa-58-like genes and two isaba125 sequences. the isaba125 elements may be responsible for the duplication of the blaoxa-58-like gene. objective: acinetobacter baumannii is an important nosocomial pathogen with wide intrinsic resistance. however, due to the dissemination of acquired resistance mechanisms, such as extended-spectrum beta-lactamase (esbl) and metallo-beta-lactamase (mbl) production, multidrug resistant strains have been isolated more often. per-1 was first detected in turkey and was found to be widespread among acinetobacter spp. and p. aeruginosa. since then, per-1 has been discovered in other countries, most recently in northern italy and in korea. in this study, the presence of per-1 type esbl was investigated by pcr in ceftazidime-resistant a. baumannii strains isolated from bloodstream infections, and the clonal relatedness of all per-1 producing a. baumannii strains was investigated by random amplified polymorphic dna (rapd) and pulsed field gel electrophoresis (pfge). methods: a. baumannii strains isolated from bloodstream infections were included in this study. the isolates were identified as a.
baumannii by conventional methods and phoenix 100 bd automated system system (becton dickinson diagnostic systems, sparks). ceftazidime resistance was determined by e-test. per-1 genes were screened by these clusters encode: (i) resistance genes and transporters plausibly involved in drug efflux (30 transporters of the mfs, dmt, abc, rnd, mop and acr3 families were unique of drug resistant strains and absent in the susceptible sdf strain); (ii) pili and fimbriae systems related to biofilm formation and motility; (iii) haemolysin-and haemagglutininrelated proteins differently distributed among the four genomes, (iv) iron uptake and other metabolic genes. conclusion: genome comparison identified unique features of a. baumannii epidemic clones and provided novel insights into the genetic basis of multidrug resistance and pathogenesis in this species. this study may contribute to understand the concept of infection, invasiveness and colonisation in the emergent pathogen a. baumannii. hard to swallow − emerging and re-emerging issues in food-borne infection (symposium arranged with efwisg) s460 mrsa in food products: cause for concern or case for complacency? in 2003 first, a switch from intravenous-to oral medication (01-2006); second, education programs for interns/residents and physicians and the release of a new antimicrobial formulary (05-2006); third, a restriction note was printed on all laboratory rapports (10-2006) and fourth, active monitoring and giving feedback on prescriptions (01-2007). susceptibility patterns for e. coli including ciprofloxacin, cefuroxim, ceftazidim, co-trimoxazole and tobramycin from hospitalised patients were analyzed starting in 2004. statistical analyses were performed using segmented poisson regression models to look at effect of interventions on resistance (both sudden stepwise changes and changes in trends). bayesian model averaging was used to account for model uncertainty. 
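a segmented poisson regression of the kind described above separates a sudden step at the start of an intervention from a change in the underlying trend. the sketch below shows only the structure of the expected monthly count under one intervention; it is not the authors' fitted model, and the function name, the month index and every coefficient value are hypothetical choices made for illustration.

```python
from math import exp

def segmented_log_rate(month, start, b0, b_trend, b_step, b_slope):
    """Log of the expected count in a given month for one intervention.

    b0      : baseline log rate at month 0 (hypothetical value)
    b_trend : pre-intervention change in log rate per month
    b_step  : sudden stepwise change once the intervention has started
    b_slope : additional change in trend per month after the start
    """
    after = 1.0 if month >= start else 0.0   # step indicator
    since = max(0.0, month - start)          # months elapsed since the start
    return b0 + b_trend * month + b_step * after + b_slope * since

# Hypothetical coefficients: rising baseline, then a drop and a flatter trend.
coef = dict(b0=3.0, b_trend=0.02, b_step=-0.3, b_slope=-0.01)

for m in (10, 12, 14):
    expected = exp(segmented_log_rate(m, start=12, **coef))
    print(f"month {m}: expected resistant isolates = {expected:.1f}")
```

in a real analysis the coefficients would be estimated from the monthly counts, with one step term and one slope term per intervention, and competing specifications could then be compared or averaged.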
results: before the start of the interventions the resistance rate was increasing by an average of 2.6% per year. the interventions resulted in a significant reduction of quin use from an average of 550 prescribed daily doses (pdd) to 350 pdd per month. in the best fitting poisson model for the resistance data, a significant stepwise decrease was found to be associated with interventions 2 and 4. however, there was substantial uncertainty in the model choice, and after accounting for this there was no conclusive evidence in support of any particular intervention, although there was evidence that at least one of the interventions was associated with the observed reduction in resistance. there were no stepwise decreases or decreasing trends in resistance rates to other antimicrobials during the study period. conclusion: many mds prescribe antibiotics often and believe their practice may have an effect on antibiotic resistance. results indicate that mds value information, interventions and surveillance in order to support responsible use of antibiotics. there is an ongoing effort in germany to address these findings at the national level, e.g. by establishing a surveillance system for antibiotic resistance and antibiotic usage. (table). ir for pn, er and tt were always higher in children (ch) than in adults (ad). significant differences were found for pn (1995), er (1997, 2004, 2006, 2007, 2008) and tt (2004, 2006). generally, cp-ir was higher in ad than in ch. ir was lower in the north (n) than in the south (s). significant differences: pn (2005, 2006), er (2003, 2004, 2005), tt (2005). both n and s showed a decreasing ir tendency: pn= n (12.1−8.1), s (18.8−13.2); cp= n (11.6−5.9), s (18.9−5.9); tt= n (27.4−21.5), s (35.9−23.5). er increased in the n (20.9−29.7). total outpatient antibiotic use (did) decreased from 26.2 (1999) to 22.7 (2004) and increased to 24.2 (2006). did for pn and fq increased, mls stabilised and tt decreased.
conclusions: since 2001-2003 an ir decrease was noted for pn, cp and tt. er-ir increased further over the years. the decrease paralleled the start of public campaigns on antibiotic use. ir rates remain higher in ch than in ad. the n/s difference became less marked. objectives: parachlamydia acanthamoebae is a newly recognized member of the order chlamydiales. growing evidence suggests that this bacterium may have a pathogenic role in humans, causing respiratory diseases. it has also been recently identified as an agent of bovine abortion and may be a cause of miscarriage in women. in contrast, little is known about the pathogenic role of rhabdochlamydia crassificans, another related member of the chlamydiales. molecular diagnostic tools are useful to detect these obligate intracellular bacteria because of their inability to grow on conventional culture media. the aim of this work was (i) to develop a real-time pcr for the diagnosis of rhabdochlamydia infection and (ii) to study respiratory secretions of newborns for the presence of parachlamydia and rhabdochlamydia dna. methods: a new quantitative real-time taqman pcr (q-pcr) to be used on abi prism 7900 was developed. the q-pcr was then blindly applied to 41 consecutive respiratory samples (endotracheal or nasopharyngeal secretions) taken from 29 critically-ill newborns admitted to the neonatology ward of our university hospital. these samples were also tested using a previously developed parachlamydia-specific pcr. results: most newborns (28/29) were premature (median gestational age: 28.6 weeks; range: 24.6−41.2). initial respiratory distress syndrome was present in 86% of them. positive pcr results were obtained in 12/29 (41%) patients (8 parachlamydia, 3 rhabdochlamydia, 1 both species) at a median of 17.5 days (range: 2-230) after birth.
when compared to the control group (17 patients with negative pcr), these 12 newborns had a significantly worse primary adaptation and a higher incidence of resuscitation maneuvers at birth (table). duration of noninvasive mechanical ventilation and stay in neonatology ward were also significantly longer. a fatal outcome was observed in 3 infected cases, as compared to no death in controls (p = 0.06). gestational age at birth as well as the incidence of pulmonary or systemic infections did not differ between cases and controls. conclusion: a high prevalence of parachlamydia and rhabdochlamydia dna was observed in respiratory secretions of premature critically-ill newborns. the presence of dna of these microorganisms was associated with a worse primary adaptation, a more severe respiratory distress syndrome and a trend towards a higher mortality. their pathogenic role should be further investigated. the genus kingella consists of 3 species, k. kingae, k. oralis and k. denitrificans. all are gram negative, sometimes difficult to stain, rod shaped bacteria that are normal respiratory and genitourinary flora. they are slow-growing and fastidious. although improved recovery was shown when using fan or peds-f blood culture bottles, the majority of these infections remain undetected, especially in pre-treated patients. we report the use of real time polymerase chain reaction (rt-pcr) assays for detection of k. kingae and s. aureus in paediatric osteoarticular infections. methods: 116 synovial fluid samples from 97 patients between 1 month and 17 years of age were collected over 19 months (03/2006 to 10/2007). the samples were from 54 knees, 39 hips, 9 ankles, 6 elbows, 4 shoulders, 2 wrists and 2 femur abscesses. after automated dna/rna extraction, specimens were subjected to a 4-hour pathogen-specific rt-pcr. samples were inoculated onto sheep blood and chocolate agar as well as a peds-f bottle.
final species identification and antimicrobial susceptibilities were determined by phoenix (tm). results: 45 patients (56 specimens) had positive culture and/or rt-pcr, resulting in an overall positivity rate of 46%. s. aureus was the predominant pathogen, accounting for 31 specimens of 23 patients (12 mrsa, 11 mssa). 37% of positive specimens (18 patients) were due to k. kingae (n = 21). among children 0−2 years (n = 35), k. kingae was the predominant pathogen, accounting for 16 positive patients (46%), followed by mssa in 4 patients (11%). the positivity rate for this age group was 57%. only 2 children >2 years (5 and 9 years) were positive for k. kingae. mrsa was the predominant pathogen in 6−12 year olds, and mssa was evenly distributed among children 3−17 years old. culture detected only 5 of 21 specimens positive for k. kingae and 25 of 31 positive for s. aureus. 4 other pathogens were detected by culture only. the use of these molecular assays enhances detection of organisms, especially for k. kingae (19% vs. 5% for culture). additionally, faster identification (tat 4 hrs) allows for rapid targeted therapy. this improvement in tat could lead to shorter hospital stays in about 54% of cases. results: genotyping revealed a high degree of diversity, indicative of a panmictic bacterial population. further, there was no association between genotype and colonisation frequency, or year of isolation. pcr screening for virulence genes revealed an incidence of 98% for uspa1, 81% for hag, 82% for uspa2 and 18% for uspa2h. no significant difference was observed in the prevalence of virulence-associated genes between isolates originating from children who were colonised only once and children colonised on all 3 occasions (p = 1). pcr-rflp analysis of uspa1, hag and uspa2 showed many gene variants, with no association between pcr-rflp patterns and colonisation frequency, or year of isolation. conclusion: even in relatively localised geographical settings, the genotypic diversity of m.
catarrhalis isolates colonising children is large, with no yearly pattern of genotype predominance. children serially colonised with m. catarrhalis isolates appear to clear a particular genotype only to become subsequently colonised with a different genotype. the incidence of virulence genes in this relatively localised study group is remarkably similar to that reported in global m. catarrhalis isolates, possibly indicating that similar selection pressure exists for m. catarrhalis at both the local and global level. virulence gene variation appears to be high, even in this relatively restricted geographical group. these results could have consequences for vaccines designed against virulence genes. a. naessens°, i. foulon, a. casteels, w. foulon (brussels, be) objectives: to evaluate the epidemiology of cytomegalovirus in pregnancy and to evaluate the risk for delivering a child with congenital cmv (ccmv). methods: between 1996-2006, 11825 unselected mother-infant pairs were included. in the mother a serological screening was performed consisting in the detection of cmv igg and igm antibodies at the first prenatal visit and at birth. in the neonate cmv urine culture was performed to diagnose congenital infection. when a pregnant woman was found to have a second trimester spontaneous abortion or a death in utero, an investigation for possible congenital cmv infection was carried out. results: serological screening at the first prenatal visit showed no immunity in 4701 women, evidence of past infection (igg positive igm negative) in 6877 women (58.2%) and in 250 women (2.0%) both igg and igm antibodies were detected. after investigation of stored and follow up samples from these 250 patients, 14 could be classified as having a primary cmv infection during pregnancy, 99 patients had previous immunity before the current pregnancy and from 137 patients the type of the maternal cmv infection could not be determined. 
follow-up serology of the 4701 women without immunity revealed a seroconversion in 58 of them (1.2%). a total of 61 (0.52%) congenital infections (ccmv) were diagnosed. the incidence of ccmv among the different groups of women is summarised in the table. conclusion: ccmv infection occurs in 0.52% of our population of pregnant women. ccmv was considered to be due to a primary maternal cmv infection in 54% of the infants and to a recurrent maternal cmv infection in 33%; in 13% the type of maternal infection could not be determined. the risk for a seronegative pregnant woman of acquiring cmv during pregnancy is 1.2%. the transmission risk after a maternal primary infection is 45%. women with prior immunity have a very low risk (0.20%) for ccmv; this risk increases to 3% when igm are found in women with known prior immunity. the risk for women with undetermined infectious status in early pregnancy to give birth to a congenitally infected neonate is 5.8%. this report provides the first data on rotavirus epidemiology and disease burden in norway. further studies are needed to assess the economic impact of rotavirus disease and the cost-effectiveness of vaccination to inform decisions on introduction of rotavirus vaccines into the national program of childhood immunisation. pseudomonas aeruginosa may colonise the lungs of cystic fibrosis patients over years but may also cause acute infections in mechanically ventilated patients and immuno-compromised hosts within a matter of days. despite aggressive antibiotic treatments the organism is rarely eradicated. instead p. aeruginosa adapts to its host environment by developing resistance mechanisms and changing its lifestyle and virulence properties. focusing on mechanically ventilated patients, we will detail the dynamics of resistance emergence and persistence of p. aeruginosa lung populations during antibiotic therapy. we further discuss how p.
aeruginosa populations evolve naturally in the absence of any antimicrobial treatment within the lungs of intubated patients by changing their virulence properties. the relevance of these findings both with respect to concepts of social evolution and the development of novel anti-infective strategies will be highlighted. the genome of p. aeruginosa encodes many potential efflux systems. however, only a few of them appear to play a significant role in antibiotic resistance. in this respect, the mex (for multiple efflux) systems are of particular interest because of their ability to extrude a wide range of antimicrobials. these polyspecific machineries result from the assembly of (i) a drug/proton antiporter, (ii) a periplasmic adaptor protein, and (iii) an outer membrane gated channel. it is now well established that the constitutive expression of the tripartite pump mexab-oprm provides p. aeruginosa with a relatively high intrinsic resistance to quinolones, β-lactams (except imipenem), tetracyclines, macrolides, chloramphenicol, trimethoprim, and novobiocin. this protective mechanism is potentiated by the poor permeability of the outer membrane and the activity of another pump, mexxy/oprm, whose expression is induced by substrates targeting the ribosome (e.g., tetracyclines, macrolides, aminoglycosides). accumulating reports indicate that multidrug resistant mutants upregulating one or both of these systems are quite common in the clinical setting. such mutants, which are readily selected by sub-optimal treatments with fluoroquinolones, β-lactams or aminoglycosides, tend to accumulate various resistance mechanisms without losing the wildtype pathogenicity of p. aeruginosa. whether the low resistance levels (mic × 2- to 8-fold) conferred by efflux may promote second-step mutants with altered drug targets (gyra, gyrb, parc) or derepressed ampc β-lactamase has not been confirmed in vitro.
in the specific context of cystic fibrosis (cf), a recent study from our laboratory showed that the mexxy/oprm pump can be responsible for much higher resistance levels to aminoglycosides (64-to 128-fold). this increased efficacy of the system partially results from adaptive mutations in the mexy gene. in contrast, subpopulations deficient in mexab-oprm tend to emerge during long-term colonisation of cf airways. while easily selected in vitro on selective media, mutants overexpressing other mex systems (mexcd-oprj, mexef-oprn, mexghi-opmd, mexjk/oprm, mexvw/oprm) have been rarely described in cf and non-cf patients. some data support the notion that up-regulation of mexcd-oprj or mexef-oprn might be detrimental to the virulence of p. aeruginosa. in conclusion, therapeutic strategies based on efflux inhibitors should target the mexab-oprm and the mexxy/oprm systems in priority. european aspects of malaria s478 rapid diagnostic tests for malaria: twenty years to convince . . . prompt diagnosis and treatment of malaria are critical factors in reducing morbidity and mortality. microscopy has long been the gold standard for malaria diagnosis, but the newer rapid diagnostic tests (rdts) now offer considerable advantages, especially so in endemic countries. after close to twenty years of development and operational research, the diagnostic performance of rdts is now established in all settings. meta-analyses have clearly demonstrated equivalence of rdts over expert microscopy to detect parasites, and clear superiority over routine microscopy. actually, one of the major reasons that have delayed successful implementation of rdt in endemic areas was the use of poor quality microscopy that has impeded reliable measurement of sensitivity and specificity and undermined confidence of health workers in rdts. other factors were poor product performance, inadequate methods to determine the quality of products and a lack of emphasis and capacity to deal with these issues. 
for the potential of rdts to be realised, it is crucial that high-quality products that perform reliably and accurately under field conditions are made available and that quality assurance is performed on all steps of the procedure. in achieving this goal, the shift from symptom-based diagnosis to parasite-based management of malaria can bring significant improvement for the management of fever in endemic areas. for travellers returning to temperate climates with fever, rdts also have the potential to improve diagnostic procedures, especially in hospitals where reliable microscopy is not available out of hours. in patients with no danger sign or significant thrombopenia, a negative rdt is sufficient to exclude malaria and allows waiting 12−24 hours for performing or reading the microscopy slide. rdts should be repeated every 12−24 hours for three consecutive days if fever persists and in the absence of alternative diagnosis. rdts represent a revolution in the fight against malaria and will tremendously help to manage patients with fever appropriately, especially when malaria is declining and hence other causes of fever are increasing. the ambitious deployment that is foreseen in the coming years in africa through large grants from the global fund should contribute to achieving the millennium goals. fever is the key symptom of malaria among returning travellers (97%). headache, chills, myalgia, sweating and lack of a focus are frequently recorded, but non-specific. nausea and vomiting are often seen in children. the differential diagnosis from other infections, mainly of viral origin, is made more difficult because (dry) cough and (mild) diarrhoea are often present. laboratory findings (thrombocytopenia, low or normal leucocyte count) can be helpful in the assessment of mild to moderate malaria. clinical signs and symptoms, e.g.
fever, may be mitigated in semiimmune patients (visiting friends and relatives, foreign visitors) seen in non-endemic countries who represent the majority of cases diagnosed in industrialised countries. caution is warranted in assessing such patients as many of them may no longer be exposed to malaria in their countries of origin, thus no longer partially protected and also at risk of suffering from severe complications. up to 10% of all imported malaria cases may be severe, presenting with jaundice, impaired consciousness to coma, acute renal failure, and, in the course of events, acute respiratory failure. delay in diagnosis and start of treatment is partly responsible for fatality rates of 1% and more in some countries. if you don't look for them, you won't find them: anaerobes revisited s481 anaerobic microbiota of the mouth − friend or foe? anaerobes form a major part of the commensal microbiota in the digestive tract where they constitute an integral component of the function on mucosal surfaces. in the mouth, teeth create a unique, non-shedding environment for bacteria to attach and to form biofilms. there is an age-related succession order of species in bacterial colonisation of the mouth, and once established, individual anaerobic species tend to remain as members of the oral microbiota. the agerelated pattern of the colonisation of anaerobic bacteria is partly connected with the development (or loss) of the dentition. interactions between different bacteria residing in the same microenvironment influence the composition of the microbiota − or the development of pathologic conditions. although commensal bacteria are regarded beneficial to the host, some anaerobic members of the oral microbiota contain characteristics potentially detrimental for the health status of an individual. 
molecular means of characterisation have resulted in increased knowledge about the "normal" microbiota of the mouth and in detection of new species and genera as well as phylotypes, which can be associated with infectious situations in the mouth. oral infections are multifactorial and polymicrobial in nature, and their aetiologic organisms originate mainly from the oral resident microbiota. the involvement of anaerobes is most obvious in infections of root canals, periodontal tissues, and tissues surrounding erupting wisdom teeth where typical anaerobic findings are gram-negative rods. in addition, gram-positive anaerobic cocci and non-spore-forming gram-positive anaerobic rods are common in odontogenic infections. on some occasions, anaerobes of localised dentoalveolar infections can spread to adjacent tissues and even to the bloodstream, resulting in severe complications in extraoral sites. interestingly, a relatively limited number of anaerobic species are involved in clinically severe infections, however, microbial findings seem to vary depending on geography. concomitant with the increase in the number of immunosuppressed patients, the number of opportunistic infections caused by commensal anaerobes may increase. identification to the species level will help to establish associations between individual anaerobic species and specific disease states. studies on the bacteriology of diabetic foot infections (dfi) have yielded varied and often contradictory results. the role of anaerobes is particularly unclear, often because the type and severity of the infection is poorly defined, recent antibiotic therapy is unknown, and specimen collection and culture techniques are inadequate. when optimal collection, transport, and culture techniques are used, multiple organisms including aerobes and anaerobes are usually recovered from severe dfi. 
interactions within these polymicrobial soups lead to production of virulence factors, such as haemolysins, proteases, collagenases, and short chain fatty acids, which promote inflammation, impede healing and contribute to the chronicity of the infection. to better define the bacteriology of diabetic foot infections, we analyzed our data from a large prospective u.s. multicentre trial of patients with moderate to severe infection that required initial parenteral antibiotic therapy and used optimal post-debridement sample collection, transport and culture procedures. of the 427 culture-positive specimens (of 454 total), only 16.2% were pure cultures while 30.4% yielded 5 or more organisms. a total of 462 anaerobes (range 0−9, average 2.3, per specimen) were recovered from 49% of patients, with gram-positive cocci (gpc) accounting for 45.5% of all anaerobic strains. s485 is culture still the gold standard, really? tremendous technological advances are made in culture-independent methods of detection and identification of human bacterial pathogens, such as pcr or hybridisation of their genomic dna. yet, time honoured pastorian bacterial culture in liquid and solid nutritive media still remains the gold standard for the laboratory diagnosis of a majority of bacterial infections. this unusual robustness of a 19th century technology stems from its unmatched operational characteristics: 1. broad range of detected agents, depending on adequate combination of media/incubation conditions; 2. unlimited source of clonal population for individual isolate, allowing versatile characterisation of antibiotic susceptibility and/or pathogenic factor production and/or epidemiological subtyping; 3. possibility of storage/bio-banking of cells for complementary clinical testing, research and diseases surveillance collections; 4. 
proof of pathogenic role of agent at the time of viable cell isolation from the site of infection, in contrast to false-positive results with molecular tests (tissue translocation or persistence of bacterial dna, soluble antigen, ...). major drawbacks of bacteriological culture include long turn-around time, cost and labour/skill intensity. these are partly alleviated by new technologies, including automated processing, physical/chemical growth detection and rapid molecular fingerprinting (maldi-tof, raman spectrometry, 16s rdna snp detection). it is likely that the next decade will see a complete redefinition of the place of direct detection methods and culture-based confirmation methods in clinical bacteriology, enabling a rejuvenation rather than elimination of culture as a daily diagnostic tool. the advent of real-time pcr proved instrumental to the successful implementation of molecular methods in routine clinical microbiology laboratories. automated nucleic acid extraction platforms can now be coupled to robotic handling for large-scale detection and quantification purposes, mostly in virology. i will review here the attempts at implementing home-brew and commercial nucleic-acid based detection methods directly from blood samples and highlight hopes and pitfalls. i will then expand on two promising nucleic acid amplification methods: lamp (loop mediated isothermal amplification) and a protein-free method called dnazyme. these isothermal amplification methods share several strengths: robustness across highly diversified physico-chemical conditions, versatility in assay development and minimal requirements (if any) for sample preparation. they will definitely compete against current real-time pcr assays and might become a novel standard, due to lower costs and improved performance. the ribosomal rna (rrna) approach to microbial evolution and ecology has become an integral part of microbiology.
rapidly growing databases exist that encompass besides the 16s rrna sequences of almost all validly described bacteria and archaea also numerous 16s rrna sequences of so far uncultivated microbes, directly retrieved from the environment by pcr or metagenomics. based on the patchy evolutionary conservation of rrna genes oligonucleotide probes can be designed in a directed way with specificities ranging from species up to large evolutionary entities like phyla or even domains. when such probes are labeled with fluorescent dyes or the enzyme horseradish peroxidase they can be used to identify single microbial cells by fluorescence in situ hybridisation (fish) directly in complex environmental samples. an update on recent applications and methodological improvements will be given which includes the identification of small bacterial cells by catalyzed reporter deposition (card)-fish. with optimised methods and proper controls fish yields exact cell numbers and spatial distributions for defined bacterial populations also in highly complex mixed microbial communities. r. amann & b.m. fuchs (2008) nature reviews microbiology 6:339-348. quick and reliable species identification of microorganisms is of great importance in medical microbiology. several bacterial and fungal species can be identified only using laborious and time-consuming methods. furthermore, in many cases misidentification occurs due to e.g. limited biochemical reactivity, different morphotypes or limited information in reference panels. in this talk, matrix-assisted laser desorption/ionisation time-of-flight (maldi-tof) mass spectrometry will be presented as a method for species identification. this technology applies protein pattern matching based on mass spectrometry. during the identification process, a mass pattern is generated for each organism. 
the subsequent comparison of this pattern with a database comprising reference patterns derived from well-characterised reference strains leads to species identification. as examples, the identification of various nonfermenting bacterial strains isolated from clinical specimens in comparison to partial 16s rdna sequencing will be shown. moreover, speed, accuracy in comparison to other methods, and inter- and intra-laboratory reproducibility of maldi-tof ms-based species identification will be discussed. o489 trends in invasive streptococcus pneumoniae serogroup 1 sequence types in belgium t. goegebuer, k. van pelt, j. verhaegen, j. van eldere° (leuven, be) objectives: s. pneumoniae serogroup 1 (sg1) isolates frequently cause invasive pneumococcal disease, particularly in children. from 2003 onwards a marked increase in sg1 isolates was observed; overall prevalence increased from 8.2% (1998-2002) to 13.6% (2003-2006). we determined the sequence types (st) in sg1 isolates in order to better understand trends in sg1 resistance and spread. methods: as national reference centre, we receive all invasive isolates from more than 100 of 182 laboratories in belgium. 124 randomly chosen sg1 isolates from all ages from 1998 to 2006 were analysed via multi-locus sequence typing (mlst) as described by enright & spratt (microbiol. 1998; 144: 3049−60). we also included data on strain characteristics and patient characteristics. results: 10 different sequence types (st) were identified: st350 (n = 66), st306 (n = 24), st304 (n = 13), st227 (n = 10), st228 (n = 5), st2915 (n = 2), st305 (n = 2), st612 (n = 1), and st217 (n = 1). mutations usually increase the mic slightly, but enhance the probability of further mutations. efflux pumps like pmra reduce antibiotic concentrations in the bacterial cell, enabling longer survival. we hypothesised that efflux positive bacteria are more likely to develop resistance than efflux negative bacteria.
the following questions were addressed: 1. do the efflux pump inhibitors reserpine and verapamil reduce the mutation frequency? 2. do fluoroquinolone-susceptible efflux positive pneumococci exhibit higher parc or gyra qrdr mutation frequencies than efflux negative isolates? 3. does efflux phenotype impose a fitness cost? methods: matched efflux positive and negative pneumococcal isolates with identical or similar genotype according to multi-locus sequence typing, collected by the german community acquired pneumonia network capnetz, were analysed (n = 17). strains tigr4 and r6 were included as efflux negative controls. ciprofloxacin (cip) mics and efflux phenotype were measured by the agar dilution method; for efflux detection, reserpine (10 mg/l) was added, and a fourfold decrease in mic was considered efflux positive. mutation frequencies were determined by plating bacterial suspensions onto agar with and without cip. after incubation, colonies were counted and the ratio of cfu/ml with cip to cfu/ml without cip yielded the mutation frequency. likewise, the mutation frequency was determined after adding different concentrations of verapamil (10, 25, 50, 100, 500 mg/l) or reserpine (0.01, 0.1, 1, 5, 10 mg/l). biological fitness was calculated as the maximum slope of growth curves recorded in a microtitre plate reader. results: 1) even at low concentrations, reserpine clearly reduced the mutation frequency of efflux positive and, to a lesser extent, efflux negative pneumococci when exposed to cip (figure 1); verapamil exhibited this effect only at high concentrations. 2) efflux positive isolates produced mutants more frequently (8/9) than efflux negative isolates (2/10) (p = 0.005, fisher's exact test). 3) efflux phenotype had no measurable impact on biological fitness. conclusion: a positive efflux phenotype increases the qrdr mutation frequencies in the presence of fluoroquinolones and this effect can be inhibited by very low concentrations of reserpine.
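the two calculations described in this abstract can be illustrated with a minimal sketch. it assumes that the mutation frequency is the cfu/ml on cip-containing agar divided by the cfu/ml on cip-free agar, and that the reported p value comes from a two-sided fisher's exact test on the 2x2 table of isolates with and without mutants (8/9 versus 2/10). the function names are hypothetical and are not taken from the abstract.

```python
from math import comb


def mutation_frequency(cfu_per_ml_with_cip, cfu_per_ml_without_cip):
    """Resistant colony count divided by total viable count (assumed definition)."""
    return cfu_per_ml_with_cip / cfu_per_ml_without_cip


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k first-column counts in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Counts from the abstract: rows are efflux positive / efflux negative isolates,
# columns are isolates with / without mutants.
p_value = fisher_exact_two_sided(8, 1, 2, 8)
print(f"two-sided p = {p_value:.4f}")
```

with these counts the sketch gives a two-sided p of about 0.0055, consistent with the reported p = 0.005.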
as a matter of concern, efflux is not associated with decreased biological fitness. background: use of fluoroquinolones (fq) has been associated with increasing fq resistance in s. pneumoniae. because respiratory fqs (levofloxacin (levo) and moxifloxacin (moxi)) are first line therapy for serious respiratory infections, increasing fq resistance (fqr) in sp is a concern. levo targets parc, and moxi targets gyra, which may permit differentiation of degree of selective pressure. we examined fq use, and changes in the prevalence of fqr and qrdr mutations in canadian isolates of sp. methods: cbsn is a canadian collaborative network of microbiology laboratories that has performed surveillance for antibiotic resistance in sp since 1988. antimicrobial susceptibility testing is performed in a central lab to clsi standards. we sequenced qrdr regions of all fqr isolates and a stratified sample of fq susceptible isolates. population fq use was obtained from ims canada. results: from 1995 to 2007, fq use increased from 53 to 97 rx/1000pop/yr; levo use from 0 to 10 rx/1000pop/yr, and moxi use from 0 to 17 rx/1000pop/yr. 31081 isolates were available for testing. levo r rates increased from 0 in 1993 to 1.8% in 2002 then remained stable until 2008 (1.6% in 2008). moxi r rates increased to 0.6% in 2004, then stabilised (0.7% in 2008). the prevalence of parc only mutations has not increased significantly in the last decade (see table). the prevalence of isolates with both parc and gyra mutations increased until 2002, but had decreased by 2008. the first gyra only mutant was detected in 2000; the prevalence of gyra only mutants since then has increased, but remains very low (7/2044, 0.34% in 2007). conclusion: despite increasing use of respiratory fqs, fqr in pneumococci is very low and not increasing in canada. the prevalence of isolates with parc mutations is decreasing.
isolates with mutations in gyra alone remain extremely rare, suggesting that moxi exerts minimal selective pressure for resistance. in streptococci, two well-characterised macrolide resistance mechanisms have been described: target modification and active drug efflux. target site modification is mediated by the erm genes, erm(b), erm(a) and erm(c), which confer the mlsb phenotype. target modification by mutations in 23s rrna as well as mutations in the l4 and l22 ribosomal proteins has also been reported. expression of mef(a) genes activates an efflux mechanism responsible for m-type resistance. we characterised a clinical isolate of s. agalactiae, mb56gbs022, exhibiting the mlsb phenotype and tetracycline resistance. in this study, we determined the resistance genes, their association, and their localisation and mobility by conjugation. methods: the macrolide and tetracycline resistance genes were confirmed by pcr. the association between macrolide and tetracycline genes was investigated by long-pcr and sequencing. conjugation experiments were performed by filter matings. the genetic localisation of resistance genes was determined by i-ceui endonuclease digestion followed by pfge and southern blot. the hybridisation study was performed using three specific probes for the 16s and 23s rrna genes, and for the erm(b) and tet(o) genes. results: s. agalactiae mb56gbs022 carried erm(b) and tet(o) genes on the same amplicon of 7 kb in size. the nucleotide sequence of the entire product was identical to that of the 11-kb peoc01 from pediococcus acidilactici, which contains four orfs, of which orf2 and orf3 encode a putative resolvase and topoisomerase type i, respectively. the i-ceui method, which readily distinguishes between plasmid and chromosomal localisations because i-ceui cuts only chromosomal dna, revealed that the resistance genes were located on a plasmid. all attempts to transfer the erm(b)-tet(o) structure by conjugation from s. agalactiae mb56gbs022 to e. faecalis og1ss as recipient failed.
conclusion: our results show the first case of an association between erm(b) and tet(o) genes on a unique mosaic structure in s. agalactiae, probably on a plasmid, as demonstrated by the i-ceui assay. further studies are ongoing to characterise the entire genetic element carrying the resistance genes. o499 improving influenza pre-analytic collection systems: alternative collection systems to inactivate, preserve, or extract influenza for rapid testing s. castriciano°, k. luinstra, m. ackerman, a. petrich, m. smieja (brescia, it; hamilton, ca) objectives: in this study, 3 alternative influenza sample collection systems were evaluated for potential use in a pandemic situation. the objectives were to develop: 1) a non-temperature dependent swab collection and transport system that inactivates influenza virus infectivity but preserves cell morphology and nucleic acid (na) for the detection of suspected influenza infections and/or 2) a system compatible with direct na testing without the need for purification prior to detection by a rapid real-time rt-pcr. methods: flocked nasopharyngeal swabs (nps) collected in utm (u) were compared to nps collected in a cymol (c), m-swab (m) or dry (d) flocked swab collection system (copan, italia). cymol is an alcohol-based medium that preserves cells for dfa testing. the m-swab contains 600 ul of medium and 150 ul of glass beads, and requires no na purification step. shell vial culture was used to assess influenza virus inactivation after 30 minutes exposure to the collection media. a mock-infected influenza a virus sample was absorbed to duplicate swabs then placed into the 4 collection systems. the infected collection media were held at rt for 30 minutes and then inoculated in duplicate into shell vial culture and stained after 48 hours. influenza a stability and na recovery after mock infection of each collection system was assessed after 1, 7, 14 and 21 days (d) at 4ºc, −20ºc, room temperature (rt) and 37ºc.
aliquots of infected collection media were extracted by easymag and 5 ul of purified na tested by a quantitative influenza a rt-pcr on the roche lightcycler. m-swab collected samples were also tested directly or after boiling, without na purification. results: shell vial culture found that influenza a virus was inactivated after 30 minutes exposure to the c medium but not when exposed to the u and m media. influenza a was detected by dfa from the u and c cell smears. quantitation of influenza a rna was constant after 1, 7 and 14 d in u, c, m and d collection systems at −20, 4ºc and rt. the quantity of rna recovered declined significantly after 14 d at 37ºc in all 4 collection systems. m with boiling yielded data comparable to the easymag extraction. the copan cymol medium inactivates influenza infectivity, preserves cells and stabilises rna up to 14 days at −20, 4ºc and rt. cymol medium is a potential alternative for safe sample collection during a pandemic influenza situation. the m-swab presents a rapid testing alternative. luminex respiratory viral panel in respiratory specimens from children r. selvarangan°, s. selvaraju, d. baker, k. estes, l. hays, d. abel, s. hiraki (kansas city, us) objective: luminex respiratory viral panel (rvp) is a multiplex pcr capable of detecting and differentiating twelve different respiratory viruses and their subtypes; influenza a (flu a) (subtypes h1 and h3), influenza b (flu b), respiratory syncytial virus (rsv) (subtypes a and b), adenovirus (adv), parainfluenza 1 (piv 1), parainfluenza 2 (piv 2), parainfluenza 3 (piv 3), human metapneumovirus (hmpv) and rhinovirus (rhv). the aim of this study was to evaluate the analytical performance characteristics of rvp assay and to evaluate its ability to detect respiratory viruses from nasopharyngeal aspirates obtained from children. 
method: analytical sensitivity, specificity, accuracy and precision of the luminex rvp assay were determined using control viral stocks and respiratory specimens previously tested by rmix shell vial culture. result: luminex rvp assay reliably detected atcc viral stocks of flu a, flu b, rsvb, rhv and piv3 in the range of 10e-2 to 10e-4 tcid50/ml. no cross reactivity was noted with cmv, hsv, hhv6, ebv, vzv, piv4, coronavirus 229e and oc43. among 146 respiratory specimens previously characterised by culture, 138 specimens were accurately detected, giving an overall accuracy of 95%. the median coefficient of variation in mean fluorescent index values of signals from replicate analyses of influenza a, b and rsv was 9% (7% to 25%). the 146 clinical specimens tested by rvp assay included 109 culture positive and 37 culture negative specimens. respiratory viruses isolated from the culture positive specimens included the following: 19 adv, 11 flu a, 10 flu b, 19 rsv, 9 piv1, 6 piv2, 5 piv3, 19 hmpv and 14 rhv. rvp assay detected all of the respiratory viruses except one each of rsv, piv1 and piv2 virus, with overall sensitivity ranging from 88% to 100% for the different respiratory viral groups. among the 37 culture negative specimens, 20 respiratory viruses were detected by rvp, of which 15 were subsequently confirmed by repeat analyses. conclusion: luminex rvp assay is a highly sensitive and specific test useful in the detection of commonly encountered respiratory viruses in respiratory specimens. the addition of rvp assay to the viral testing algorithm of respiratory infections in children provides rapid results, improves diagnostic yield and may result in decreased antibiotic usage, reduced diagnostic testing and reduced hospital stay. m. savvala, i. daniil, i. berberidou, a. koutsibiri, a. stambolidi, m. papachristodoulou, n. spanakis, d. petropoulou°, a.
tsakris (athens, gr) objective: in developed countries, viruses, particularly noroviruses, are recognized as the leading cause of acute gastroenteritis. we determined the aetiology, prevalence and seasonal distribution of viral gastrointestinal infections in hospitalised patients with acute diarrhoea. methods: during one-year period (november 2007-november 2008), a total of 201 faecal specimens were collected from 165 children, 21 premature neonates and 15 adults who were hospitalised with symptoms and signs of acute gastroenteritis. stool samples were tested for the presence of rotavirus, adenovirus, astrovirus and norovirus. rotavirus, adenovirus and astrovirus antigen detection was performed by chromatographic immunoassays (rotavirus and adenovirus, vikia ® -biomerieux, france; h&r astrovirus-vegal farmaceutica, spain). noroviruses were detected by an enzyme immunoassay (ridascreen ® rbiopharm, germany) and confirmed by reverse transcription-pcr. data were analyzed for seasonality of infection and possible transmission mode. the overall incidence of viral identification in acute diarrhoeal stool was 24% (48 of 201 patients). fifty one viral antigens were detected one patient with positive antigen detection is suffering from a disease of unclear aetiology. so, an association of replication of cihhv-6 with the disease might be considered. in contrast, the other patient did not show any symptoms at the time of antigen detection. this patient shows a special mode of acquisition of cihhv-6 (by bmt) possibly resulting in differences in the immunological priming and response. in addition, in the latter patient cihhv-6 is restricted to blood cells. two other patients did not show antigen expression. so, it is unclear how the transcription and translation of viral genes is influenced? furthermore, is there a pathophysiological impact of viral replication in individuals with cihhv-6? 
objectives: several case studies have reported on meningo-encephalitis caused by a primary epstein-barr virus (ebv) infection. we aimed to investigate the viral loads, and the inflammatory characteristics of this thus far poorly defined disease entity. we evaluated all cases from 2003-2008, in which an ebv polymerase chain reaction test (pcr) was requested on a cerebro spinal fluid (csf) sample. primary infection was defined as a clinical presentation with sore throat/pharyngitis/malaise in combination with lymphocytosis, and detectable heterophile antibodies or positive ebv igm antibodies. patients with proven neuroborreliosis served as control group. leukocyte response and ebv viral loads in csf, and serum were compared between primary ebv and neuroborreliosis cases. results: we identified six cases with a primary ebv infection (median age: 22, male: 4) with neurological symptoms ranging from meningeal signs to encephalitis. these were compared to 14 patients with neuroborreliosis (median age: 27, male: 6). in four out of six patients with a primary ebv infection with neurological symptoms ebv dna was detected in csf and in serum, whereas all neuroborreliosis cases were ebv pcr negative in both compartments. viral loads were lower in csf as compared to serum. in blood, leukocytes, lymphocyte, and monocyte counts were significantly increased as compared to the neuroborreliosis cases (see table 1 ). specific for vp7 and vp4 genes, using pools of g and p type specific primers. all strains (niv/brv/68, niv/brv/79, and niv/brv/86) were not typeable for the vp4 and vp7 genes. after purification by "qiaquick gel extraction kit" (qiagen, germany), the vp4, vp6, vp7, and nsp4 first amplicons of the borv-a strains were subjected to sequence analysis with automated sequencer abi 3130 xl dna analyzer (applied biosystems, usa). phylogenetic analysis was performed using mega version 4.0. objectives: dengue is a flavivirus and is among the most widely-spread viral diseases. 
our previous report demonstrates existence of live dengue virus in blood and urine even in the convalescent postfebrile period. in some cases, excretion in the patient's urine can be detected as late as 28 days after the onset of illness. this goes along with the model of west nile virus, another type of flavivirus, which can be excreted in the urine for months after acute infection in both animal studies and human case report. here we report a pilot study to address a magnitude of such findings. methods: between april 2007 and october 2008, paediatric and adult febrile patients suspected of dengue infection were enrolled. diagnosis of dengue was based on standard specific serology on paired sera. patients with negative serology served as controls. blood and urine specimens were collected at several time points. whole blood was separated into plasma and peripheral blood mononuclear cells (pbmc). these have been aliquoted and used for earlier studies and some stored in freezers. available plasma, pbmc, and urine were processed and inoculated into aedes aegypti. surviving mosquitoes at 14 days after inoculation were employed for viral detection by dengue-specific rt-pcr. indirect fluorescence antibody (ifa) staining of mosquito heads was performed on all positive rt-pcr specimens, except for the one from pbmc (awaiting ifa result). results: 5 and 45 cases of primary and secondary infections, respectively, and 4 negative controls were included. these translated into 55 and 59 early and late dengue specimens, and 6 and 4 early and late negative-control counterparts, respectively. dengue virus were isolated in some blood and urine specimens as late as 46 days after the onset of illness. no virus was isolated from control specimens. all but 5 positive rt-pcr specimens also demonstrated positive ifa. 4 out of 5 negatives were from early-phase specimens. conclusion: our study demonstrates prolonged survival of dengue virus after clinical recovery. 
this finding has pathologic and epidemiologic significance, adding a potential role of urine in the transmission of the disease. spread of the virus to humans might occur through infectious urine with help from arthropod vectors. this research could provide new insights into our understanding of the pathogenesis of denv infection.

isolation of dengue virus from blood and urine specimens during early (days 1−7 after onset of illness) and late (days 8−46) phases of infection (specimens with dengue isolated/total specimens for mosquito inoculation)

specimen         early phase      late phase
plasma           16/25 (64%)      0/13 (0%)
pbmc             not performed    1/2 (50%)
urine            8/29 (28%)       12/44 (27%)
all specimens    24/54 (44%)      13/59 (22%)

dna copies) 226 (<50-1461) † ; n = 2
10% for all three antibiotics (p < 0.05 in each case). cft resistant isolates in rectal samples mainly included enterobacteriaceae other than escherichia coli and klebsiella spp, whereas tob and cip resistant isolates mainly included e. coli. conclusion: sod and sdd have marked effects on the bacterial ecology in an icu with a rapid and persistent increase in resistance after intervention. antibiotic resistance remains a major concern associated with these infection control measures. o391 throwing caution to the winds? three cases of anaphylaxis to chlorhexidine coated central venous catheters from a regional cardiac centre in northwestern england this blaoxa-58-producing clone showed resistance to several β-lactams (including imipenem), susceptibility to ceftazidime, netilmicin and minocycline, and variable susceptibility to meropenem, cefepime, and aztreonam. mics for colistin and tigecycline ranged from >16 mg/l and from 0.25−4 mg/l, respectively. all oxa-58-producing isolates presented the isaba3 downstream of the blaoxa-58 gene. hybridisation assays revealed a plasmidic location for the blaoxa-58 gene, on a plasmid of ca. 90 kb. plasmid sequencing showed an isaba3-like element truncated at the 3' end upstream of the blaoxa-58 gene, a fact that may explain the observed negative carbapenemase-production bioassay. conclusion: blaoxa-58-carrying a. baumannii is, apparently, more ancient than initially imagined. although undetected from 2004 onwards, the fact that it possessed a non-expressible gene, due to alterations in the promoter region, suggests that this information might have been incorporated from a still unidentified source. twenty-seven (45%) were male. isolates were recovered from respiratory secretions (33 isolates, 55.0%), blood (11, 18.3%), urine (7, 11.7%), catheter (5, 8.3%) and other secretions (4, 6.7%). only 24 (40.0%) of 60 patients received appropriate antimicrobial therapy either with polymyxin b (79.2%), ampicillin-sulbactam (12.5%) or tigecycline (8.3%).
overall 30-day mortality of patients with crab was 50%. mortality rates were 3.2 per 1000-patient/day. these rates were significantly higher among patients who had not received appropriate therapy (1.2 per 1000-patient/day) compared with those who had received it (0.3 per 1000-patient/day; p = 0.001; figure 1). in the cox regression model only receiving appropriate treatment (hazard ratio [hr] 3.29; 95% confidence interval [ci] 1.35−8.02; p = 0.009) was independently associated with 30-day mortality. positive blood culture for crab remained in the final model (hr 1.85; 95% ci 0.86−4.00; p = 0.12). all 25 isolates submitted to pcr were positive for blaoxa-23. all these isolates were susceptible to polymyxin b and tigecycline. conclusion: high 30-day mortality occurred in this icu outbreak. many patients did not receive appropriate therapy, which significantly increased mortality. other clinical risk factors for mortality in this outbreak are currently under investigation. acinetobacter baumannii in norwegian strain collections reveal major discrepancies to phenotypic identification and the presence of carbapenemase-producing clonal lineages baumannii isolates the per-1 gene was identified in 18 (23%). the similarity of the bands was calculated according to the dice similarity coefficient and all per-1 positive isolates were found to be clonally related. conclusion: in our study the prevalence of per-1 was lower than in previous studies, but the high ceftazidime resistance rates among these isolates may indicate the presence of other beta-lactamases. dna analysis by pfge and rapd revealed an outbreak caused by a unique clone. the detection of clonally related isolates in different services may be due to earlier treatment of these patients in the same services, which may explain the spread of per-1 positive strains. o442 resistance genomic islands related to abar1 are common in acinetobacter baumannii strains belonging to european clone i l. krizova°, m.
maixnerova, l. dijkshoorn, a. nemec (prague, cz; leiden, nl) objective: acinetobacter baumannii strains belonging to european (eu) clone i are commonly resistant to multiple antimicrobial agents. a number of resistance genes were recently detected on an 86-kb genomic resistance island (abar1) inserted in the atpase gene of eu clone i strain aye. the aim of this study was to assess the presence of abar1-related structures in epidemiologically unrelated strains of eu clone i. methods: the study set included 25 multi-drug resistant (mdr) strains of eu clone i collected in 19 european countries in 1978-2004 and 10 genotypically unique, fully susceptible strains. using pcr, all strains were investigated for the presence of the atpase gene and for nine genes found to be associated with abar1. furthermore, the strains were tested for the disruption of the atpase gene using pcr primers directed against the 3' and 5' ends of this gene. strains with the disrupted gene were investigated for the presence and structure of the atpase gene-abar1 connecting regions using pcr mapping and rflp. pcr primers were derived from the known sequence of strain aye. results: all strains were positive for the atpase gene. the 10 susceptible strains had an intact atpase gene whereas all mdr strains failed to produce the expected amplicon in the atpase disruption test. all eu clone i strains yielded positive results for the atpase gene-abar1 connecting regions, the structure of which corresponded to those of aye. these findings suggest the presence of atpase-integrated elements in clone i strains, the integration of which had invariably taken place at the same locus site. none of the abar1-associated resistance genes were found in any of the susceptible strains.
in contrast, the mdr strains harboured the following abar1-associated genes (% positive strains): aacc1 (21), aada1 (21), aadb (4), apha1 (21), stra (3), mera (20), teta (18), cat (23), and the gene encoding heavy metal detoxification protein (25). individual mdr strains carried from one to nine abar1-associated genes in 11 different combinations. there was a good correlation between the content of resistance genes and resistance phenotypes. conclusion: genetic structures related to abar1 are common in strains belonging to eu clone i. the heterogeneity of resistance patterns in this clone is likely to result from the variations in the content of abar1-related structures. supported by grant 310/08/1747 of the grant agency of the czech republic. objectives: to study the differences in mutation frequency and evaluate the possible correlations between drug resistance development and mutation rate in acinetobacter baumannii (ab). the mutation frequency (mf) of rifampicin (rif) resistance was used as a surrogate measure of differences in mutation rate and for detection of the presence of mutator phenotype. 10- and 100-fold higher when larvae were infected with atcc17978 and sdf, respectively. thus, the sdf genome was used as reference genome to identify functions acquired by pathogenic strains with a possible role in antibiotic resistance and pathogenicity. sixty-two clusters, corresponding to almost 870 cdss, were identified in the acicu and aye genomes (and partially in atcc17978) that were absent in sdf. this study found that targeted interventions that reduce the use of quin were associated with a decrease of the quin resistance rate in e. coli. e. velasco°, w. espelage, i. noll, a. barger, t. eckmanns (berlin, de) objectives: growing populations of older and immunocompromised patients, changes in epidemiology and unchecked use of antibiotics can lead to a rise in consumption as well as resistance to certain treatments.
medical doctors (mds) often have an important role alongside contributing factors. we conducted a national survey of mds in germany on their behaviours and expectations for intervention. we aimed to assess md behaviours with and influences on antibiotic prescribing and the potential for related interventions that address antibiotic resistance. methods: a representative sample comprised 10,610 mds with differing practice specialties, from both stationary and ambulatory settings (respectively: 36% and 0% internists, 0% and 54% general practitioners, 32% and 11% surgery, 3% and 4% ear/nose/throat, 10% and 10% paediatrics, 5% and 3% urology, 9% and 13% gynaecology, 2% and 4% dermatology, 3% and <1% other) in 15 federal states. we developed study questions to capture baseline information on mds and their practice with antibiotics. questions also focused on selected influences that may affect behaviour in practice. other questions solicited opinions about interventions that may improve practice. mailed questionnaires were distributed to participants via state medical associations. results: among survey respondents (n = 3,613; response rate = 34%), 66% reported that they prescribe antibiotics daily, and 90% indicated they do so at least weekly. of all surveyed mds, 60% reported that they think their own prescribing practice has an influence on antibiotic resistance in their region. of all mds, 83% found it "important" to continually improve use of antibiotics through industry-independent experts providing consultation, audits and feedback. of all mds, 96% found it "important" to have provision of regional coverage of antibiotic resistance with appropriate feedback for practicing mds, and 82% found it "important" to have provision of antibiotic regulations of prescriptions with appropriate feedback for practicing mds. (results in table 1.)
-a not all results shown and remaining percentages are as follows: a closed three category scale was used for options "yes", "no", "do not know". a closed four category scale was used with options "very important", "important", "less important" and "not important". a closed five category scale was used for options "daily", "weekly", "monthly", "seldom" and "never". a closed five category scale was used for options "strongly agree", "agree", "neutral", "disagree" and "strongly disagree". objectives: to investigate the mlsb and tetracycline resistance and the emm gene distribution among invasive streptococcus pyogenes (gas) strains. methods: between january 2006 and december 2008, a total of 991 strains responsible for invasive infections in adult patients were sent to the french national reference center for streptococci for study. antibiotic susceptibility testing was done by the disk diffusion method according to the ca-sfm guidelines. mics were determined by the e-test method. emm sequencing was done according to the cdc protocol. detection of the macrolide and tetracycline resistance genes erm(b), erm(tr), mef(a), tet(m), tet(o), tet(k), and tet(l) was performed by pcr. results: among the 991 invasive streptococcus pyogenes strains, more than ten different emm-types were identified. the most frequent emm sequence types were emm1, emm28 and emm89. a total of 80 strains (8%) were resistant to erythromycin. erythromycin resistance prevalence decreased during the three-year period (12.2% in 2006, 7.6% in 2007, 5.5% in 2008). 69 had an mlsb constitutive (65 strains) or inducible (4 strains) phenotype due to the erm(b) or erm(tr) resistance gene. 11 with the m phenotype and the mef(a) gene were susceptible to clindamycin. among the 132 (13.3%) tetracycline resistant isolates, tet(m), tet(o) and tet(l) genes were detected in 86, 27 and 19 strains, respectively. tetracycline resistance prevalence also decreased during the three-year period (16.8% in 2006, 14% in 2007, 8.7% in 2008).
conclusion: most of the invasive french gas isolates remained erythromycin and tetracycline susceptible during the three years. nonetheless, the resistance rates tended to decrease slightly. taking into account the resistance trends helps to guide the therapy for penicillin-allergic patients. objectives: during a survey on antimicrobial susceptibility in beta-haemolytic group c and g streptococci (gcgs) from portugal, a macrolide resistance rate higher than previously reported in other european countries was found (22%) among s. dysgalactiae subsp. equisimilis isolates. to gain further insights into the resistance mechanisms involved and the clonal structure of the resistant population, we undertook the phenotypic and molecular characterisation of macrolide resistant s. dysgalactiae subsp. equisimilis isolates and compared it with the susceptible population. methods: antimicrobial susceptibility testing and macrolide resistance phenotype were determined by disk diffusion. all the macrolide-resistant isolates were further characterised by mic testing and genotype determination by pcr. a combination of emm typing and pulsed-field gel electrophoresis (pfge) was used to type the population and the simpson's index of diversity (sid) with 95% confidence intervals was calculated as previously described. results: a total of 69 isolates were resistant to erythromycin (mic range, 4 to >256 ug/ml). the vast majority of isolates presented an mlsb phenotype (n = 64) and carried the erm(a) gene (n = 55), while the mef-encoded m-phenotype was expressed by only 5 isolates. among resistant isolates, 13 distinct emm types were found distributed among 10 pfge clusters that overlapped with the main clusters detected in the susceptible population. the emm types stg480, stg6, stg485 and stg2078 accounted for approximately two thirds of the resistant isolates.
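the simpson's index of diversity used for the emm and pfge typing data can be sketched as follows. the abstract does not state which formulas were used ("as previously described"), so this sketch assumes the standard unbiased index, D = 1 − Σ n_j(n_j − 1) / (N(N − 1)), and a commonly used large-sample approximation for the 95% confidence interval, D ± 2σ with σ² = (4/N)[Σ π_j³ − (Σ π_j²)²]. the function names and the example counts are hypothetical and are not data from the study.

```python
from math import sqrt


def simpson_diversity(counts):
    """Simpson's index of diversity for a list of isolates-per-type counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two isolates are required")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def simpson_ci95(counts):
    """Approximate 95% CI (D +/- 2 sigma); the variance formula is an assumption."""
    n = sum(counts)
    fracs = [c / n for c in counts]
    sum_sq = sum(f ** 2 for f in fracs)
    sum_cu = sum(f ** 3 for f in fracs)
    variance = (4.0 / n) * (sum_cu - sum_sq ** 2)
    half_width = 2.0 * sqrt(max(variance, 0.0))
    d = simpson_diversity(counts)
    # Clip to the valid range of the index.
    return max(0.0, d - half_width), min(1.0, d + half_width)


# Hypothetical numbers of isolates per emm type (illustration only).
example_counts = [8, 8, 3, 1]
d = simpson_diversity(example_counts)
low, high = simpson_ci95(example_counts)
print(f"sid = {d:.3f} (95% ci {low:.3f}-{high:.3f})")
```

overlapping confidence intervals for two populations would be read as no demonstrable difference in diversity, which is the kind of comparison the abstract reports between resistant isolates and the overall population.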
pfge did not always separate macrolide-resistant from susceptible isolates, nor erm(b) and mef(a) isolates from the prevailing erm(a) isolates. the sids of emm and pfge calculated for resistant isolates were not statistically different from those of the overall population. the two most prominent mls-resistant lineages were one with stg480/erm(a) isolates (n = 8) and stg485/mef(a) (n = 3), and another including stg2078/erm(a) (n = 8). conclusion: although most of the resistant isolates presented an mlsb phenotype and carried an erm(a) gene, molecular typing revealed extensive diversity in both emm types and pfge clones. macrolide resistance had a polyclonal origin, with resistance emerging among most susceptible clones. monitoring of macrolide resistance patterns in s. dysgalactiae subsp. equisimilis is essential, as it is increasingly recognised as an important human pathogen.
a.s. simões°, r. sá-leão, s. nunes, n. frazão, a. tavares, h. de lencastre (oeiras, pt) while performing pneumococcal nasopharyngeal colonisation surveillance studies among children attending day care centres (dcc) in portugal, we observed that the rate of strains with penicillin mic 2 ug/ml more than tripled from 1.8% in 2006 to 6% in 2007 (p = 0.002). the aim of this study was to characterise the 20 isolates recovered in 2007 which had a mic to penicillin 2 ug/ml. methods: pneumococci were isolated and identified on the basis of selective growth on gentamicin blood agar plates, optochin susceptibility, colony morphology, and alpha-haemolysis. susceptibility to antimicrobial agents was tested according to the clsi recommendations and definitions. strains were serotyped by the quellung reaction and/or multiplex pcr using specific primers for each serotype. pulsed-field gel electrophoresis (pfge), after restriction of the total dna with smai, was performed to compare genetic backgrounds. 
results: sixteen of the 20 isolates belonged to serotype 14, three were serotype 19a and one was of serotype 15a. strains of serotype 14 were also resistant to sulfamethoxazole-trimethoprim and belonged to a single pfge cluster identified as clone spain 9v st156. the penicillin-resistant serotype 14 strains were isolated in two dcc, from nine children vaccinated with the 7-valent pneumococcal conjugate vaccine (pcv7), four non-vaccinated children and three children with unknown vaccination status. five of these carriers had received antibiotics recently. in these two dcc the overall proportion of children vaccinated with pcv7 was 64%; 27% of the children had received antibiotics within the previous month and 16% had received three or more courses of antibiotics in the last six months. since the introduction of pcv7 in portugal, in june 2001, the proportion of penicillin-resistant pneumococci recovered from colonisation has been stable (ca. 2%). the sudden increase in the levels of penicillin resistance observed in the 2007 surveillance study was found to be largely due to the dissemination of the clone spain 9v st156 serotype 14 variant in two dcc with high consumption of antibiotics. the observations suggest a combination of high antibiotic selective pressure and transmission rates resulting in an outbreak-like situation, with a penicillin-resistant vaccine-type clone being disseminated among children in day care despite use of pcv7.
background: besides target mutation, active efflux is another common resistance mechanism to fluoroquinolones (fq) in s. pneumoniae. two main efflux systems have been described so far, namely pmra (member of the mfs superfamily) and the two abc transporters pata/patb. we have studied the inducibility of pmra, pata and patb gene expression when bacteria are exposed to subinhibitory concentrations of fq. 
we used a wild-type sensitive strain (atcc49619), two clinical strains resistant to fq (sp13 and sp295), and two efflux mutants (sp334 and sp335; selected in vitro after exposure to ciprofloxacin [jac 2007, 60:965-972]). mics were determined according to clsi. induction was obtained by growing bacteria in todd-hewitt broth supplemented with half the mic of each fq (cip, nor, lvx, mxf, gmf) for 4 h at 37°c in a 5% co2 atmosphere. expression levels of the pmra, pata and patb genes were determined by real-time pcr. reversibility of induction was tested by re-cultivating bacteria for 4 h in drug-free medium. results: antimicrobial susceptibilities for cip and mxf and gene expression at basal level and after exposure to these fq are shown in the
in women with single infection, the most common hpv types were hpv-6 and hpv-16, followed by hpv-51, hpv-31, hpv-53 and hpv-66, whereas in women with multiple infections hpv-6 was the most commonly detected type, followed by hpv-66, hpv-31, hpv-16 and hpv-51. a different distribution of hpv types and a higher rate of multiple infections were observed in young vs. older women, suggesting the existence of a natural selection of hpvs which preserve a better fitness. high-risk hpvs were detected in all high-grade cervical intraepithelial lesions, with hpv-16, hpv-18, hpv-31, and hpv-51 as the most frequent types. however, hr-hpv types were also detected in a high proportion of women with a negative pap test as well as in women with a negative cervical biopsy, suggesting the need to improve the accuracy of available cervical cancer screening tests. the results of this study, which provide information on the epidemiology of hpv infection and type distribution in women from south italy, should be taken into consideration in the implementation of local vaccination programs. 
objectives: chromosomal integration of the hhv-6 genome (cihhv-6) into the human genome occurs in 1−2% of healthy individuals and leads to persistently high levels of hhv-6 pcr copy numbers in blood and tissue. consequently, this may be interpreted as persistent active hhv-6 infection. although hhv-6 mrna has been detected in a few individuals with cihhv-6, there has been no evidence of replication of viral particles up to now. viral cultures have shown negative results. so, cihhv-6 is thought not to be linked to any disease. methods: we performed hhv-6 antigen detection in pbmcs of 4 individuals with fish-proven cihhv-6 by means of antibodies directed against hhv-6 variants a and b (indirect immunoperoxidase staining). results: in 2 unrelated female adolescents (both with cihhv-6 variant a) we detected hhv-6 antigen. one patient has suffered from recurrent parotitis for 5 years and from hypoimmunoglobulinaemia. the other patient (15a) was treated with allogeneic bone marrow transplantation (bmt) for acute myeloid leukaemia (aml) and acquired cihhv-6 from the healthy donor. so, cihhv-6 is only found in blood cells. in the latter patient only symptoms attributable to the post-bmt course have been observed (prolonged mixed haematological chimerism, protracted mucositis, transient hypertension and transient neuropathy). at the time of antigen detection, 5 years after bmt, the patient was clinically well. in 2 individuals (a girl after fatal myocarditis and her healthy father, both with variant b) no hhv-6 antigen was detected. discussion: up to now cihhv-6 has been considered not to cause any disease. for the first time we show the expression of hhv-6 antigen, which indicates the replication of viral particles. this might have a pathophysiological impact.
sixty-seven percent of cases with ebv meningo-encephalitis have detectable viral dna amounts in csf and serum, whereas neuroborreliosis patients do not. 
cases with primary ebv meningoencephalitis have increased systemic leukocytosis, with higher lymphocyte and monocyte levels compared to neuroborreliosis patients.
o506 incidence of post-herpetic neuralgia in treated and untreated patients with herpes zoster followed for 1 year in an italian prospective cohort: preliminary results
g. parruti°, f. sozio, c. rebuzzi, m. tontodonati, e. polilli, a. agostinone, a. manna, f. di masi, a. consorte, g. congedo, l. cosentino, d. d'antonio, l. pippa, l. manzoli, c. granchelli (pescara, chieti, it)
objectives: a large prospective cohort of patients with herpes zoster (hz) was enrolled between may 2006 and june 2008 in pescara, italy, with a planned 1-year follow-up after clinically and/or molecularly assessed diagnosis. the aim of the study was to evaluate predictors of prolonged acute course and/or incidence of post-herpetic neuralgia (phn). methods: data from all enrolled patients were collected by a network of 51 general practitioners. suspected cases and patients with intense acute pain were referred to our institution for immediate evaluation. clinical and demographic information was mandatory at baseline, as were photographs of enrolled patients. for uncertain cases, varicella-zoster virus (vzv) antibodies and vzv dna pcr on plasma and/or vesicular eluates (whenever available) were performed. follow-up data were collected at outpatient control visits or by phone calls at 1, 3, 6 and 12 months after onset of hz. phn was diagnosed when pain persisted or relapsed at least one month after complete clearing of dermatomeric lesions. adverse events other than pain were classified according to the who grading scale and reported if 2. all statistical calculations were performed with the stata 8.0 software package. results: 523 patients were enrolled, 306 (58.5%) females, with a mean age of 57.7 years, 1-year follow-up data being now available for 489. 
hz was localised to the thorax in 45.4% and the head in 20.7%; pain in the acute phase was reported as intense or very intense by 127 (25.97%) patients; 54 (11.04%) patients were referred for molecular diagnosis as clinically uncertain, 37 (68.5%) being confirmed as vzv-related cases. forty-eight (9.82%) patients were not prescribed any antiviral drug at diagnosis by referring physicians, in spite of extensive support in the study plan. during follow-up, 163 (33.3%) patients reported any type of adverse event (at a mean of 91.2±74.8 days), including 93 (19.02%) patients reporting phn. phn was significantly more frequent in untreated vs treated patients (37.5% vs 17.0%, p = 0.001), as were total adverse events (54.2% vs 31.1%, p = 0.001). untreated patients did not significantly differ from those treated by age (56.3% vs 57.7%, p = 0.66) and sex (females vs males 11.9% vs 6.9%, p = 0.056), whereas they complained of more intense pain (15.0% vs 8.0%, p = 0.024) at presentation. conclusion: our study confirms the importance of early diagnosis and prompt antiviral treatment at the onset of hz in order to minimise the risk of phn.
methods: faecal specimens (n = 78) from apparently healthy and diarrheic calves (aged <1 year) were collected per-rectally and investigated for detection of group a rotavirus by antigen capture elisa (generic assay, germany). elisa-positive specimens (n = 3) were investigated further for molecular characterisation. genotyping of borv-a strains was carried out on dsrna extracted from 10% pbs faecal suspensions by a nested and/or heminested rt-pcr
key: cord-266617-z8uecyl6 authors: pavesi, angelo title: asymmetric evolution in viral overlapping genes is a source of selective protein adaptation date: 2019-04-03 journal: virology doi: 10.1016/j.virol.2019.03.017 sha: doc_id: 266617 cord_uid: z8uecyl6
overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is constrained by each other. 
overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). by sequence analysis of 75 pairs of homologous viral overlapping genes, i evaluated their accordance with one or the other model. analysis of nucleotide and amino acid sequences revealed that half of overlaps undergo asymmetric evolution, as the protein from one frame shows a number of substitutions significantly higher than that of the protein from the other frame. interestingly, the most variable protein (often known to interact with the host proteins) appeared to be encoded by the de novo frame in all cases examined. these findings suggest that overlapping genes, besides increasing the coding ability of viruses, are also a source of selective protein adaptation. many viruses produce novel genes inside pre-existing genes by overprinting of a de novo frame onto an ancestral frame (atkins et al., 1979; keese and gibbs, 1992; rancurel et al., 2009; sabath et al., 2012). the high prevalence of overlapping genes in viruses has been attributed to the advantage of maximizing the gene information content of small viral genomes (miyata and yasunaga, 1978; lamb and horvath, 1991; pavesi et al., 1997). in detail, the gene-compression hypothesis states that the size of the viral capsid imposes a biophysical limit on the size of the viral genome, thus making overprinting the most adequate strategy to gain new function (chirico et al., 2010). alternatively, the gene novelty hypothesis argues that the birth of overlapping genes is driven by selection pressures favoring evolutionary innovation (brandes and linial, 2016). 
this hypothesis is supported by the finding that overlaps, thought for a long time to be restricted to viruses, also occur in the large genomes of prokaryotic (delaye et al., 2008; fellner et al., 2015) and eukaryotic organisms (szklarczyk et al., 2007; bergeron et al., 2013; vanderperre et al., 2013). a particularly interesting feature of overlapping genes is that they represent an intriguing example of adaptive conflict. indeed, they simultaneously encode two proteins whose freedom to change is constrained by each other (sander and schulz, 1979; krakauer, 2000; peleg et al., 2004; allison et al., 2016), which would be expected to reduce the adaptive ability of the virus (simon-loriere et al., 2013). we would expect, in principle, that overlapping genes are subjected to strong evolutionary constraints, as a single nucleotide substitution can impair two proteins (see the codon position "21" in fig. 1). a typical example of "constrained evolution" is that occurring in hepatitis b virus (hbv), whose short genome (3.2 kb) contains a high percentage (50%) of overlapping coding regions (mizokami et al., 1997; zhang et al., 2010). however, overlapping genes can also show a less conservative pattern of change, because of a high rate of non-synonymous substitutions in one frame (positive adaptive selection) with concurrent dominance of synonymous substitutions in the other (negative purifying selection). examples of positive selection concern the overlapping genes that encode the tat and vpr proteins of simian immunodeficiency virus (hughes et al., 2001), the p19 and p22 proteins of the tombusvirus family of plant viruses (allison et al., 2016), and the orf2 and orf5 proteins of trichodysplasia spinulosa-associated polyomavirus (kazem et al., 2016). we can hypothesize for overlapping genes a first evolutionary model in which the two proteins they encode are subjected to similar selection pressures. 
when selection is strong, both proteins (or protein regions) are highly conserved (e.g. the rnase domain of polymerase and the amino-terminal half of the x protein in hbv; see fig. 4 in mizokami et al., 1997). when selection is not too strong, both proteins can vary considerably (e.g. the spacer domain of polymerase and the pres1/s2 domain of the surface protein in hbv; see fig. 4 in mizokami et al., 1997). this model is named "symmetric evolution", because the number of amino acid substitutions of one protein is expected not to differ significantly from that of the other. it corresponds to the "shared model" of fernandes et al. (2016). alternatively, we can hypothesize for overlapping genes an evolutionary model in which the two proteins they encode are subjected to significantly different selection pressures. support for this model, which implies adaptive selection on one frame and purifying selection on the other, was provided both by viral (hughes et al., 2001; fujii et al., 2001; guyader and ducray, 2002; stamenković et al., 2016) and mammalian overlapping genes (szklarczyk et al., 2007). this model is named "asymmetric evolution", because the number of amino acid substitutions of one protein is expected to be significantly different from that of the other. it corresponds to the "segregated model" of fernandes et al. (2016). we recently assembled a dataset of 80 viral overlapping genes whose expression is experimentally proven (pavesi et al., 2018), with the aim of providing a useful benchmark for systematic studies. a first analysis of the dataset revealed that overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition (pavesi et al., 2018). we also found that the vast majority of the 80 overlaps of the dataset have one or more homologs, suggesting further comparative studies. 
in the present study, i investigated the evolution of viral overlapping genes by sequence analysis of 75 pairs of homologs. the first aim of the study was to determine which of the two evolutionary models described above is the prevailing one. the second aim was to identify the type of nucleotide substitution that significantly affects the pattern of symmetric/asymmetric evolution. finally, the third aim was to assess whether the most variable protein (in the case of asymmetric evolution) is that encoded by the ancestral or the de novo frame.
2.1. selection criteria for homologous overlapping genes
i first extracted from the dataset of 80 experimentally proven overlapping genes (s1 dataset from pavesi et al., 2018) the amino acid sequence of the two proteins encoded by each overlap. for each protein, i searched for homologs against the non-redundant protein sequences ncbi database using blastp (altschul et al., 1997). when blastp did not detect any homolog i used tblastn, which compared the protein query sequence against the nucleotide collection ncbi database translated in all reading frames. i used tblastn because the amino acid sequence of the protein encoded by one of the two overlapping frames (usually that discovered more recently) may not be reported in many viral genomes present in the ncbi database (pavesi et al., 2018). the selection of homologous overlapping genes was based on three criteria. the first was an equal length of the homolog. it was met in the great majority of cases (72 out of 80). in the remaining cases, the homolog was only slightly shorter than the query sequence. the exception was the overlap capsid protein/assembly activating protein (aap) of adeno-associated virus-2, whose homolog encodes an aap 9 amino acids shorter in the amino-terminal region and 26 amino acids shorter in the carboxy-terminal region. 
the second criterion was a homolog yielding, for both the encoded proteins, an alignment with no insertion/deletion (indel) or with a minimal number of indels. in the latter case, i imposed the rule that indel(s) must be located at the same amino acid position in the alignments of the two pairs of proteins (see for example the overlap polymerase/2b protein of spinach latent virus, which is the first overlap in supplementary file s1). by imposing this rule, i could align the two homologous nucleotide sequences in full accordance with the corresponding protein sequences. the alignment of protein sequences was carried out with clustal omega (sievers and higgins, 2014). the third criterion concerned the cases in which i found multiple homologs meeting the two criteria described above. in these cases, i selected the most distantly related homolog, with the aim of covering the largest evolutionary space. only one homolog was selected for each overlapping gene because a larger sample of homologs can be collected for only a few overlaps, mainly those occurring in virus species that are human pathogens (e.g. influenza and hepatitis viruses or sars and ebola viruses). the search for homologs yielded a dataset of 80 pairs of homologous overlapping genes (supplementary file s1). thirty-seven homologs came from a different virus species, in accordance with the ictv taxonomy (king et al., 2018) (https://talk.ictvonline.org/taxonomy/). the mean nucleotide identity between overlaps and homologs was 70.7%, with a standard deviation (sd) of 9.4%. the remaining 43 homologs came from isolates belonging to the same virus species. in this case, the mean nucleotide identity between overlaps and homologs was 89.6% (sd = 7.1%). 
for each pair of homologous overlapping genes, the supplementary file s1 contains the following information: i) the nucleotide sequence of the upstream frame and that of the homolog; ii) the amino acid sequence of the protein encoded by the upstream frame (up1) and that of the protein encoded by the homolog (up2); iii) the nucleotide sequence of the downstream frame (shifted by one nucleotide 3' with respect to the upstream frame) and that of the homolog; iv) the amino acid sequence of the protein encoded by the downstream frame (down1) and that of the protein encoded by the homolog (down2); v) the alignment of up1 with up2 and the percent amino acid identity; vi) the alignment of down1 with down2 and the percent amino acid identity; vii) the chi-square analysis, which compared by a 2 x 2 contingency table the number of the amino acid identities and differences in the up1-up2 alignment with that in the down1-down2 alignment (cut-off of significance = 3.84; 1 degree of freedom; p < 0.05).
fig. 1. orientation of overlapping genes, with the downstream frame having a shift of one nucleotide 3′ with respect to the upstream frame. there are 3 types of codon position (cp): cp13 (bold character), in which the first position of the upstream frame overlaps the third position of the downstream frame; cp21 (underlined character), in which the second position of the upstream frame overlaps the first position of the downstream frame; cp32 (italic character), in which the third position of the upstream frame overlaps the second position of the downstream frame. based on the genetic code, a nucleotide substitution at first codon position causes an amino acid change in 95.4% of cases, at second codon position in 100% of cases, and at third codon position in 28.4% of cases. thus, nucleotide substitutions at the codon positions "13" and "32" are usually non-synonymous in one frame and synonymous in the other. nucleotide substitutions at the codon position "21" are almost all non-synonymous in both frames.
virology 532 (2019) 39-47
3.2. half of overlapping genes evolve in accordance with the asymmetric model
i carried out a preliminary analysis using student's t-test for paired data. for each pair of homologous overlaps, i counted the number of amino acid identities between up1 and up2 and that between down1 and down2. i then calculated the absolute value of the difference between them. the null hypothesis was a mean difference between paired observations close to zero, indicating that overlapping genes evolve in accordance with the symmetric model. the null hypothesis was rejected (student's t = 5.91; 79 degrees of freedom; p = 10^-5), indicating that overlapping genes can also evolve in accordance with the alternative asymmetric model. in order to identify which and how many overlapping genes undergo symmetric or asymmetric evolution, i then compared the amino acid diversity between up1 and up2 to that between down1 and down2. i used the contingency-table chi-square test (snedecor and cochran, 1967) with a cut-off value of 3.84 for 1 degree of freedom (p < 0.05). i classified a pair of homologous overlaps as a case of symmetric evolution if the number of amino acid substitutions in the up1-up2 alignment did not significantly differ from that in the down1-down2 alignment (chi-square < 3.84). an example is given by the overlap ns1 protein/ns2 protein from dendrolimus punctatus densovirus. for the ns1 protein, i found 73 identities and 86 differences when compared to the homolog from hordeum marinum itera-like densovirus. for the ns2 protein, i found 71 identities and 88 differences, yielding a chi-square value (0.01) well below the cut-off of significance. 
alternatively, i classified a pair of homologous overlaps as a case of asymmetric evolution if the number of amino acid substitutions in the up1-up2 alignment was significantly different from that in the down1-down2 alignment (chi-square > 3.84). an example is given by the overlap movement protein/replicase from turnip yellow mosaic virus. for the movement protein, i found 302 identities and 323 differences when compared to the homolog from watercress white vein virus. for replicase, i found 454 identities and 171 differences, yielding a chi-square value (76.3) well above the cut-off of significance. the chi-square test was highly sensitive. for example, i found that the overlap capsid protein/p31 protein from maize chlorotic mottle virus undergoes asymmetric evolution, in spite of an extremely high nucleotide identity with the homolog (96.7%). indeed, the number of amino acid differences between p31 and homolog (12 out of 149 sites) was significantly higher than that between capsid and homolog (2 out of 149 sites) (chi-square = 6.07; p = 0.01). based on this finding, i set the upper limit of sensitivity of the chi-square test to a nucleotide identity between overlap and homolog of 97%. this filter limited the analysis to 75 (out of 80) pairs of homologous overlaps. overall, i found that 38 overlapping genes evolve in accordance with the asymmetric model (significantly different selection pressures on the two proteins). the highest chi-square value (113.8) concerned the overlap from apple stem grooving virus, which encodes the 36kd movement protein and the polyprotein linker-domain. indeed, the amino acid diversity between linker-domain and homolog (39%; 125 differences and 195 identities) was ten-fold higher than that between movement protein and homolog (4%; 13 differences and 307 identities). i found that the remaining 37 overlapping genes evolve in accordance with the symmetric model (similar selection pressures on the two proteins). 
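the contingency-table test applied above can be sketched in a few lines of python. this is a minimal illustration rather than the author's own procedure: the names `chi_square_2x2` and `classify_overlap` are hypothetical, and the use of the yates continuity correction is an assumption, adopted here because it reproduces the chi-square values quoted in the text (0.01, 76.3, 6.07 and 113.8).

```python
CUTOFF = 3.84  # chi-square cut-off for 1 degree of freedom (p < 0.05)


def chi_square_2x2(id_up, diff_up, id_down, diff_down, yates=True):
    """Chi-square statistic (1 d.f.) for a 2 x 2 contingency table.

    Rows are the two alignments (up1-up2 and down1-down2); columns are
    the numbers of amino acid identities and differences.
    The Yates continuity correction is an assumption of this sketch.
    """
    n = id_up + diff_up + id_down + diff_down
    row_up, row_down = id_up + diff_up, id_down + diff_down
    col_id, col_diff = id_up + id_down, diff_up + diff_down
    denom = row_up * row_down * col_id * col_diff
    if denom == 0:
        return 0.0  # a margin is empty: no evidence of any difference
    delta = abs(id_up * diff_down - diff_up * id_down)
    if yates:
        delta = max(0.0, delta - n / 2)
    return n * delta ** 2 / denom


def classify_overlap(id_up, diff_up, id_down, diff_down):
    """Label a pair of homologous overlaps as symmetric or asymmetric."""
    chi2 = chi_square_2x2(id_up, diff_up, id_down, diff_down)
    return "asymmetric" if chi2 > CUTOFF else "symmetric"


# counts quoted in the text (identities, differences for each protein)
print(round(chi_square_2x2(73, 86, 71, 88), 2))     # ns1/ns2, densovirus
print(round(chi_square_2x2(302, 323, 454, 171), 1)) # movement protein/replicase
print(classify_overlap(147, 2, 137, 12))            # capsid/p31, maize chlorotic mottle virus
```

because the statistic grows with the number of aligned sites, long proteins can reach significance with modest differences in identity, which is why the text later normalizes the chi-square value by protein length.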
the occurrence of similar selection pressures can yield two highly conserved proteins. for example, analysis of the overlap 3a protein/3b protein from human sars coronavirus revealed that the amino acid diversity between 3a and homolog is remarkably low (5.3%; 6 differences and 108 identities), as well as that between 3b and homolog (8.8%; 10 differences and 104 identities). however, the occurrence of similar selection pressures can also yield two proteins with a remarkably less conserved pattern of change. this is the case of the overlap from spinach latent virus, which encodes the zinc-finger domain of polymerase and the 2b protein. sequence analysis revealed that the amino acid diversity between zinc-finger domain and homolog is considerably high (47%; 47 differences and 54 identities), as well as that between 2b and homolog (44%; 44 differences and 57 identities). the analysis of amino acid diversity in the 75 pairs of homologous overlapping genes is summarized in fig. 2. it shows, for each overlap, the percent amino acid (aa) identity of the two encoded proteins with those encoded by the homolog. the subset of the 37 overlapping genes under symmetric evolution (fig. 2a) contains 31 overlaps in which both proteins have high conservation (aa identity > 50%), 5 overlaps in which both proteins have poor conservation (aa identity < 50%) and 1 overlap with a protein having an aa identity above 50% and the other below 50%. the subset of the 38 overlapping genes under asymmetric evolution (fig. 2b) contains 24 overlaps in which both proteins have high conservation (aa identity > 50%), 1 overlap in which both proteins have poor conservation (aa identity < 50%) and 13 overlaps with a protein having an aa identity above 50% and the other below 50%. finally, a list of the 75 overlapping genes, classified in accordance with the symmetric or asymmetric model (37 and 38 cases, respectively), is given in supplementary table s1.
3.3. validation of the model of symmetric/asymmetric evolution by analysis of the pattern of nucleotide substitutions in homologous overlapping genes
in accordance with wei and zhang (2014), i first classified the nucleotide sites of each overlapping gene into four categories depending on the impact of potential mutations on the two encoded proteins. the four categories are referred to as nn, sn, ns, and ss sites, respectively, where n stands for non-synonymous change and s stands for synonymous change. that is, if all potential mutations at a site cause non-synonymous change in both proteins, it is an nn site, and so on. i then classified the nucleotide substitutions occurring in the homolog into four categories: nn, sn, ns, and ss. using the contingency-table chi-square test, i compared the number of sn and ns sites in each overlapping gene with the number of sn and ns substitutions in the homolog. under symmetric evolution, i would expect a chi-square value below the cut-off of significance (3.84; 1 degree of freedom), that is, a full concordance between the number of sn and ns sites and that of sn and ns substitutions. for example, in the overlap orf4/orf5 from barley yellow striate mosaic virus i counted 49 sn sites and 51 ns sites. in the homolog from maize yellow striate virus, i classified 23 nucleotide substitutions into the sn category and 28 substitutions into the ns category. the chi-square test yielded a value (0.08) well below the cut-off of significance. under asymmetric evolution, i would expect a chi-square value above the cut-off of significance, that is, a significant discordance between the number of sn and ns sites and that of sn and ns substitutions. for example, the overlap capsid protein/ns4 protein from bluetongue virus (serotype 10) has 56 sn sites and 46 ns sites. the homolog from bluetongue virus (serotype 16) has 5 nucleotide substitutions belonging to the sn category and 29 substitutions to the ns category. 
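the labelling of a single nucleotide substitution as nn, sn, ns or ss can be sketched as follows. this is a simplified illustration under stated assumptions, not the procedure of wei and zhang (2014): the first letter is taken to refer to the upstream frame and the second to the downstream frame (the text does not state the order), each substitution is evaluated on its own against the reference sequence, the standard genetic code is used, and the names `effect` and `classify_substitution` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect(ref, alt, start, i):
    """Return 'S' if replacing ref[i] by alt keeps the amino acid encoded
    by the codon beginning at `start`, otherwise 'N'."""
    old = ref[start:start + 3]
    new = old[:i - start] + alt + old[i - start + 1:]
    return "S" if CODE[old] == CODE[new] else "N"


def classify_substitution(ref, i, alt):
    """Classify a substitution at 0-based index i of the overlap.

    The upstream frame starts at index 0 and the downstream frame at
    index 1 (shifted by one nucleotide 3'). Returns None when the site
    is not covered by a complete codon in both frames.
    """
    up_start = i - i % 3
    down_start = i - (i - 1) % 3
    if down_start < 1 or up_start + 3 > len(ref) or down_start + 3 > len(ref):
        return None
    return effect(ref, alt, up_start, i) + effect(ref, alt, down_start, i)


# toy reference: upstream codons GCA CTG GCA, downstream codons CAC TGG
ref = "GCACTGGCA"
print(classify_substitution(ref, 5, "C"))  # third upstream / second downstream position
print(classify_substitution(ref, 3, "T"))  # first upstream / third downstream position
print(classify_substitution(ref, 4, "C"))  # second upstream / first downstream position
```

tallying the sn and ns labels over all substitutions between an overlap and its homolog gives the counts that enter the contingency table described in the text.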
the chi-square test yielded a value (15.07) well above the cut-off of significance. the analysis of the pattern of nucleotide substitutions in the 75 pairs of homologous overlaps revealed 39 and 36 cases of symmetric and asymmetric evolution, respectively (supplementary table s2). this result was in accordance with that obtained previously (from analysis of the amino acid diversity, see supplementary table s1) in 87% of cases (65 out of 75). overall, i found a total of 33 overlaps under symmetric evolution (they are marked with a single asterisk in supplementary table s2a) and a total of 32 overlaps under asymmetric evolution (they are marked with a double asterisk in supplementary table s2b). a list of the 32 overlapping genes under asymmetric evolution is given in table 1. these findings were not affected by the fact that some homologs came from a different virus species, while others came from an isolate within the same virus species. under symmetric evolution, i found 14 and 19 overlaps with the homolog within and between species, respectively. under asymmetric evolution, i found 18 and 14 overlaps with the homolog within and between species, respectively. finally, a further validation of the model of symmetric/asymmetric evolution was provided by a correlation test between the chi-square value from analysis of amino acid substitutions and the distribution of nucleotide substitutions at the codon positions "32" and "13" (fig. 1). given the orientation of overlapping genes in our dataset (fig. 1), a substitution at the codon position "32" (cp32) is usually synonymous in the upstream frame and always non-synonymous in the downstream frame, while a substitution at the codon position "13" is almost always non-synonymous in the upstream frame and usually synonymous in the downstream frame. 
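the contrasting behaviour of the codon positions "32" and "13" follows directly from the genetic code, and can be checked with a short python sketch. the convention used here is an assumption: only sense codons of the standard genetic code are considered and changes that create a stop codon are ignored, because this convention reproduces the percentages quoted in the legend of fig. 1 (95.4%, 100% and 28.4%). the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def nonsynonymous_percent(pos):
    """Percentage of single-nucleotide changes at codon position `pos`
    (0, 1 or 2) that alter the encoded amino acid.

    Stop codons are skipped both as starting codons and as outcomes
    (an assumption of this sketch).
    """
    total = changed = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[pos]:
                continue
            new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa == "*":
                continue
            total += 1
            changed += new_aa != aa
    return 100.0 * changed / total


def codon_position_label(i):
    """Label (cp13, cp21 or cp32) of the 0-based index i of an overlap
    whose downstream frame is shifted by one nucleotide 3'."""
    up = i % 3 + 1          # codon position in the upstream frame
    down = (i - 1) % 3 + 1  # codon position in the downstream frame
    return f"cp{up}{down}"


for pos in range(3):
    print(pos + 1, round(nonsynonymous_percent(pos), 1))
print([codon_position_label(i) for i in range(3)])
```

a cp32 site is therefore a third position in the upstream frame (mostly synonymous) and a second position in the downstream frame (always non-synonymous), while a cp13 site shows the opposite pattern.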
under symmetric evolution, the number of substitutions at the codon position "32" is expected to be close to that at the codon position "13", yielding a similar distribution of the amino acid substitutions in the two pairs of homologous proteins. under asymmetric evolution, the number of substitutions at the codon position "32" is expected to be significantly higher (or lower) than that at the codon position "13", yielding a different distribution of the amino acid substitutions in the two pairs of homologous proteins. by comparing the upstream frame of each overlap with that of the homolog, i calculated the absolute value (abs) of the difference between the percent frequency (%f) of substitutions at the codon position "32" (%f.cp32) and that at the codon position "13" (%f.cp13). i then carried out a correlation test between abs(%f.cp32 - %f.cp13) and the chi-square value from analysis of amino acid substitutions. as the chi-square value depends on the sample size (here the length of the proteins encoded by the overlap), i normalized the chi-square value in accordance with cohen's rule (cohen, 1988). the normalized value is the square root of the ratio between the chi-square value and the overall length of the two proteins encoded by the overlap (e.g. the highest chi-square value, 113.83, was converted into the highest normalized chi-square value, 0.42). i found a significantly positive correlation between abs(%f.cp32 - %f.cp13) and the normalized chi-square value (r = 0.88; student's t = 14.36; one-tailed p < 0.00001; 63 degrees of freedom) (fig. 3). as expected, this result indicates that asymmetric evolution is significantly affected by an unbalanced distribution of the nucleotide substitutions at the codon positions "32" and "13". to answer the question, i investigated the genealogy of the 32 overlapping genes under asymmetric evolution.
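the normalization and the correlation test described above can be illustrated as follows. this is a rough sketch under stated assumptions: the function names are hypothetical, and the combined protein length of 645 residues is not given in the text but is back-calculated here so that 113.83 maps to roughly 0.42. the t statistic uses the standard relation t = r * sqrt(df / (1 - r^2)); with the rounded r = 0.88 it gives about 14.7, close to the reported 14.36, which presumably was computed from the unrounded coefficient.

```python
import math


def normalized_chi_square(chi2, total_length):
    """Square root of chi-square divided by the combined length of the two proteins."""
    return math.sqrt(chi2 / total_length)


def t_from_r(r, df):
    """Student's t statistic for a Pearson correlation r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))


# 645 is a hypothetical combined length, chosen only to illustrate the reported example.
w = normalized_chi_square(113.83, 645)

# 63 degrees of freedom correspond to the 65 overlaps minus 2.
t = t_from_r(0.88, 63)

print(round(w, 2))  # 0.42
print(round(t, 1))  # 14.7
```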
identifying which gene is ancestral and which one is de novo (the genealogy of the overlap) can be done by examining their phylogenetic distribution, under the assumption that the gene with the most restricted distribution is the de novo one (rancurel et al., 2009). this approach yielded a set of 34 overlapping genes with a reliably predicted genealogy (see table 1 in sabath et al., 2012 and table 1 in pavesi et al., 2013). this set included 16 out of the 32 overlaps under asymmetric evolution. another approach to infer the genealogy of overlapping genes is the codon-usage method. it is based on the assumption that the ancestral gene, which has co-evolved over a long period of time with the other viral genes, has a distribution of synonymous codons significantly closer to that of the viral genome than the de novo gene (keese and gibbs, 1992; sabath et al., 2012; pavesi et al., 2013; willis and masel, 2018). due to the shortness of most overlapping genes, the method has been improved, with the aim of evaluating the correlation between the codon-usage patterns of overlapping and non-overlapping genes with a minimal loss of information (pavesi, 2015). using the improved version of the codon-usage method (pavesi, 2015), i could predict the genealogy of 18 out of the 32 overlapping genes under asymmetric evolution. in 11 cases, the prediction by codon usage was concordant with that established by the phylogenetic method. in the remaining 7 cases, the prediction was provided only by the codon-usage method (supplementary table s3). the overlap p130/p104 of providence virus is notable, as the ancestral frame p104 was acquired from another viral genome by distant horizontal gene transfer (pavesi et al., 2013), which makes the codon usage an unreliable predictor of the genealogy. the prediction yielded by phylogenetics is supported by the finding that p104, unlike p130, has a wide phylogenetic distribution (pavesi et al., 2013). overall, i collected a set of 23 overlapping genes, all under asymmetric evolution and with known genealogy (15 overlaps with a shift of the de novo frame of one nucleotide 3′ with respect to the ancestral frame and 8 overlaps with a shift of two nucleotides 3′). interestingly, i found that in all cases the most variable protein is that encoded by the de novo gene (table 2).
fig. 2. analysis of the amino acid diversity in the 75 pairs of homologous overlapping genes. each pair of columns shows: i) the percent amino acid identity between the protein encoded by the upstream frame of the overlap and that encoded by the homolog (dark column); ii) the percent amino acid identity between the protein encoded by the downstream frame of the overlap (shifted by one nucleotide 3′ with respect to the upstream frame) and that encoded by the homolog (gray column). the horizontal line separates well-conserved homologous pairs (aa identity > 50%) from not well-conserved homologous pairs (aa identity < 50%). (a) subset of the 37 overlapping genes under symmetric evolution. (b) subset of the 38 overlapping genes under asymmetric evolution. the numbering of overlapping genes is in accordance with that given in supplementary table s1. the underlined numbers indicate the overlaps in which the pattern of symmetric evolution (4 cases out of 37) or that of asymmetric evolution (6 cases out of 38) was not confirmed by chi-square analysis of the nucleotide diversity.
table 1. list of the 32 overlapping genes evolving in accordance with the asymmetric model.
3.5. symmetric and asymmetric evolution in the same overlap: the case of the overlap polymerase/large envelope protein of hepatitis b virus (hbv)
chi-square analysis indicated that the overlap polymerase/large envelope protein of hbv evolves in accordance with the symmetric model (supplementary tables s1 and s2).
on the other hand, theoretical and experimental studies (pavesi, 2015; lauber et al., 2017) demonstrated that this long overlap (1167 nt) is subjected to modular evolution, as the spacer domain of polymerase and the s domain of the large envelope protein originated de novo by overprinting. thus, the overlap can be subdivided into two regions: a 5′ region (480 nt), in which the spacer domain of polymerase (de novo gene product) overlaps the pre-s domain of envelope (ancestral gene product), and a 3' region (687 nt), in which the reverse transcriptase domain of polymerase (ancestral gene product) overlaps the s domain of envelope (de novo gene product). i carried out a chi-square analysis of the 2 regions of the overlap independently, under the hypothesis that they may have been subject to different evolutionary pressures. this analysis revealed that the 5' region of the overlap undergoes asymmetric evolution, because the amino acid diversity of the spacer domain (33.7%; 54 differences and 106 identities) is significantly higher than that of the pre-s domain (19.4%; 31 differences and 129 identities) (chi-square = 7.75; p = 0.005). asymmetric evolution was confirmed by analysis of the pattern of nucleotide substitutions (chi-square = 10.13; p = 0.001). in addition, chi-square analysis revealed that the 3' region of the overlap undergoes symmetric evolution, as the amino acid diversity of the reverse transcriptase domain (7.4%; 17 differences and 212 identities) does not significantly differ from that of the s domain (11.8%; 27 differences and 202 identities) (chi-square = 2.04; p = 0.15). symmetric evolution was confirmed by analysis of the pattern of nucleotide substitutions (chi-square = 1.61; p = 0.20). with the aim to further validate these findings, i carried out a further analysis using, as homolog, the most distantly related overlap of woolly monkey hbv (79.9% of nucleotide identity). 
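the region-by-region comparison of amino acid diversity reported above for hbv can be reproduced with a short script. this is an illustrative sketch only: the helper names are hypothetical, the yates continuity correction is an assumption (it is not named in the text, but it yields the reported chi-square values of 7.75 and 2.04), and the p-value uses the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_diversity(diff1, ident1, diff2, ident2):
    """2x2 chi-square (1 df, assumed Yates correction) on differences vs identities."""
    n = diff1 + ident1 + diff2 + ident2
    num = max(0.0, abs(diff1 * ident2 - ident1 * diff2) - n / 2)
    denom = (diff1 + ident1) * (diff2 + ident2) * (diff1 + diff2) * (ident1 + ident2)
    return n * num ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# (differences, identities) for the two proteins in each region, taken from the text.
regions = {
    "5' region: spacer domain vs pre-s domain": (54, 106, 31, 129),
    "3' region: reverse transcriptase vs s domain": (17, 212, 27, 202),
}

results = {}
for name, counts in regions.items():
    chi2 = chi_square_diversity(*counts)
    results[name] = (chi2, p_value_1df(chi2))
    print(f"{name}: chi-square = {chi2:.2f}, p = {results[name][1]:.3f}")
```

the first region exceeds the 3.84 cut-off (asymmetric evolution), while the second does not (symmetric evolution), in line with the modular pattern described in the text.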
again, chi-square analysis of the amino acid and nucleotide diversity revealed asymmetric evolution in the 5′ region and symmetric evolution in the 3' region. details of both analyses are reported in the supplementary file s2. finally, the finding that the spacer domain of polymerase (de novo gene product) is significantly more variable than the pre-s domain (ancestral gene product) confirms that the most variable protein, under asymmetric evolution, is usually that encoded by the de novo gene. several researchers have developed methods for estimating the strength of selection pressure on overlapping genes (pedersen and jensen, 2001; sabath et al., 2008; de groot et al., 2008; mir and schober, 2014; wei and zhang, 2014) . all methods evaluate, in both overlapping frames, the ratio of non-synonymous nucleotide substitutions to synonymous nucleotide substitutions (dn/ds) by correctly taking into account the problem of the interdependence between sequences imposed by the overlap. the aim is to assess if there is neutral evolution or positive selection in one frame (dn/ds higher than 1) and purifying selection (strong constraints) in the other frame (dn/ds lower than 1). however, the only method having an accessible implementation is that by sabath et al. (2008) . yet, the method has some limitations, as it restricts the analysis to the homologous overlaps in which the two encoded proteins have both an amino acid diversity smaller than 50% or greater than 5%. in the dataset examined here (see the first 75 pairs of homologous overlaps in supplementary file s1), these limitations would have considerably reduced the size of the sample from 75 to 43 pairs of homologous overlaps. i thus chose an approach focused, at first instance, on the evaluation of the amino acid diversity of homologous overlapping proteins, which is the final result of the complex pattern of the interdependent nucleotide substitutions that occur in dual-coding regions. 
unlike previous studies, limited to a few virus species (sabath et al., 2012; zaaijer et al., 2007; liang et al., 2010; shukla and hilgenfeld, 2015; brayne et al., 2017), i examined a large dataset of 75 overlaps from 59 virus species. a possible limitation of the study concerns the selection criteria for homologous overlapping genes. in particular, the first two stringent criteria (an equal length of the homolog and an alignment with a minimal number of indels) led to exclusion, for some overlaps, of highly divergent homologs. an example is given by the overlap p3n-pipo/polyprotein of turnip mosaic virus, in which the length of the p3n-pipo protein is quite variable among the different potyvirus species, ranging from 60 to 115 amino acids (hillung et al., 2013). thus, the dataset used in this study likely underestimates the sequence diversity of overlapping genes, as it was created mainly to ensure a high quality in the homologous relationship. the finding that 32 out of 65 overlapping genes (table 1) undergo asymmetric evolution is striking, as is the finding that the most variable protein is encoded by the de novo gene in all cases examined (table 3). in particular, i would point out the overlap orf3/orf4 from tobacco bushy top virus, which encodes two proteins entirely nested within each other. this peculiar arrangement is similar to that of the overlap p19/p22 from tomato bushy stunt virus, in which the de novo p19 protein shows a previously unknown structural fold and a previously unknown mechanism of binding to small interfering rnas (vargason et al., 2003; baulcombe and molnár, 2004; scholthof, 2006). i believe that structural or functional studies on the de novo orf3 protein from tobacco bushy top virus could reveal new interesting features.
fig. 3. correlation between the normalized chi-square value (from analysis of amino acid substitutions) and the absolute value (abs) of the difference between the percent frequency (%f) of nucleotide substitutions at the codon position "32" (%f.cp32) and that at the codon position "13" (%f.cp13). empty circles indicate the 33 overlapping genes under symmetric evolution. black circles indicate the 32 overlapping genes under asymmetric evolution.
table 2. list of the 23 overlapping genes with known genealogy and evolving in accordance with the asymmetric model.
in addition, i would point out the overlap polymerase (pb1 subunit)/pb1-f2 protein of human influenza a virus. it shows, when compared to the homolog from duck, a sixteen-fold increase of substitutions at the codon position "32" (89.2%) with respect to the codon position "13" (5.4%). this yields only 3 amino acid differences between the two pb1 subunits and as many as 35 differences between the two pb1-f2 proteins. interestingly, the de novo pb1-f2 protein has been shown to largely contribute to viral pathogenicity by a pleiotropic effect (chen et al., 2001; varga et al., 2011; yoshizumi et al., 2014). several other de novo proteins under asymmetric evolution are known to play a role in viral pathogenicity. eight de novo proteins (arfp, vp5, l*, x, vf1, pb1-f2, p6, and nss) act as suppressors or antagonists of the interferon response by the host (park et al., 2016; lauksund et al., 2015; sorgeelos et al., 2013; wensman et al., 2013; mcfadden et al., 2011; varga et al., 2011; garcía-rosado et al., 2008; jääskeläinen et al., 2007). four de novo proteins (p19, p69, ac4, and movement protein) act as suppressors of rna silencing (silhavy et al., 2002; chen et al., 2004; chellappan et al., 2005; yaegashi et al., 2008). two de novo proteins (apoptin and pb1-f2) act as apoptosis factors (noteborn et al., 1994; chen et al., 2001). finally, the de novo protein pa-x has the ability to selectively degrade the host rna-polymerase ii transcripts (khaperskyy et al., 2016).
however, another possible limitation of the study depends on the fact that the subset of overlapping genes evolving asymmetrically and with known genealogy (23 overlaps) is too small to conclude that the de novo protein is always the preferred target of selection. furthermore, overlapping genes are subjected to a variety of selection pressures that are independent of the orientation of the overlapping frames relative to one another. thus, it is hypothetically possible that an ancestral protein may be significantly more variable than a de novo protein under peculiar selective constraints. despite this limitation, our findings suggest that the birth of new overlapping genes, besides increasing the coding ability of small viral genomes (chirico et al., 2010), is also a valuable source of selective protein adaptation. none. positive selection or free to vary? assessing the functional significance of sequence change using molecular dynamics gapped blast and psi-blast: a new generation of protein database search programs binding of mammalian ribosomes to ms2 phage rna reveals an overlapping gene encoding a lysis function crystal structure of p19-a universal suppressor of rna silencing an out-of-frame overlapping reading frame in the ataxin-1 coding sequence encodes a novel ataxin-1 interacting protein gene overlapping and size constraints in the viral world genotype specific evolution of hepatitis e virus microrna-binding viral protein interferes with arabidopsis development a novel influenza a virus mitochondrial protein that induces cell death viral virulence protein suppresses rna silencing-mediated defense but upregulates the role of microrna in host gene expression why genes overlap in viruses statistical power and analysis for the behavioral sciences investigating selection in viruses: a statistical alignment approach the origin of a novel gene through overprinting in escherichia coli evidence for the recent origin of a bacterial protein-coding, overlapping orphan
gene by evolutionary overprinting functional segregation of overlapping genes in hiv conserved and non-conserved regions in the sendai virus genome: evolution of a gene possessing overlapping reading frames molecular and functional characterization of two infectious salmon anaemia virus (isav) proteins with type i interferon antagonizing activity sequence analysis of potato leafroll virus isolates reveals genetic stability, major evolutionary events and differential selection pressure between overlapping reading frame products intra-specific variability and biological relevance of p3n-pipo protein length in potyviruses simultaneous positive and purifying selection on overlapping reading frames of the tat and vpr genes of simian immunodeficiency virus tula and puumala hantavirus nss orfs are functional and the products inhibit activation of the interferon-beta promoter limited variation during circulation of a polyomavirus in the human population involves the coco-va toggling site of middle and alternative t-antigen(s) origins of genes: "big bang" or continuous creation? 
selective degradation of host rna polymerase ii transcripts by influenza a virus pa-x host shutoff protein changes to taxonomy and the international code of virus classification and nomenclature ratified by the international committee on taxonomy of viruses stability and evolution of overlapping genes diversity of coding strategies in influenza viruses deciphering the origin and evolution of hepatitis b viruses by means of a family of non-enveloped fish viruses infectious pancreatic necrosis virus proteins vp2, vp3, vp4 and vp5 antagonize ifna1 promoter activation while vp1 induces ifna1 selection characterization on overlapping reading frame of multiple-protein-encoding p gene in newcastle disease virus norovirus regulation of the innate immune response and apoptosis occurs via the product of the alternative open reading frame 4 selection pressure in alternative reading frames evolution of overlapping genes constrained evolution with respect to gene overlap of hepatitis b virus a single chicken anemia virus protein induces apoptosis hepatitis c virus frameshift/ alternate reading frame protein suppresses interferon responses mediated by pattern recognition receptor retinoic-acid-inducible gene-i on the informational content of overlapping genes in prokaryotic and eukaryotic viruses viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of deltaretroviruses different patterns of codon usage in the overlapping polymerase and surface genes of hepatitis b virus suggest a de novo origin by modular evolution overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes a dependent-rates model and an mcmc-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames overlapping messages and survivability overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein 
creation a method for the simultaneous estimation of selection intensities in overlapping genes evolution of viral proteins originated de novo by overprinting degeneracy of the information contained in amino acid sequences: evidence from overlaid genes the tombusvirus-encoded p19: from irrelevance to elegance acquisition of new protein domains by coronaviruses: analysis of overlapping genes coding for proteins n and 9b in sars coronavirus clustal omega, accurate alignment of very large numbers of sequences a viral protein suppresses rna silencing and binds silencing-generated, 21-to 25-nucleotide double-stranded rnas the effect of gene overlapping on the rate of rna virus evolution statistical methods. iowa state university press evasion of antiviral innate immunity by theiler's virus l* protein through direct inhibition of rnase l substitution rate and natural selection in parvovirus b19 rapid asymmetric evolution of a dual-coding tumor suppressor ink4a/arf locus contradicts its function direct detection of alternative open reading frames translation products in human significantly expands the proteome the influenza virus protein pb1-f2 inhibits the induction of type i interferon at the level of the mavs adaptor protein size selective recognition of sirna by an rna silencing suppressor a simple method for estimating the strength of natural selection on overlapping genes the x proteins of bornaviruses interfere with type i interferon signalling gene birth contributes to structural disorder encoded by overlapping genes inhibition of long-distance movement of rna silencing signals in nicotiana benthamiana by apple chlorotic leaf spot virus 50 kda movement protein influenza a virus protein pb1-f2 translocates into mitochondria via tom40 channels and impairs innate immunity independent evolution of overlapping polymerase and surface protein genes of hepatitis b virus evolutionary selection associated with the multi-function of overlapping genes in the hepatitis b virus 
the author is grateful to alessio peracchi (university of parma) and alberto vianelli (university of insubria) for helpful suggestions. special thanks to xinzhu wei (university of michigan) for valuable comments and suggestions and to gianmarco del vecchio for preparing the figures. the author thanks the anonymous referees and the editor alexander e. gorbalenya for their helpful feedback and suggestions. the study was financed by the miur (ministero dell'università e della ricerca). supplementary data to this article can be found online at https://doi.org/10.1016/j.virol.2019.03.017. key: cord-016095-jop2rx61 authors: vignais, pierre v.; vignais, paulette m. title: challenges for experimentation on living beings at the dawn of the 21(st) century date: 2010-06-08 journal: discovering life, manufacturing life doi: 10.1007/978-90-481-3767-1_5 sha: doc_id: 16095 cord_uid: jop2rx61 “we can talk endlessly about moral progress, about social progress, about poetic progress, about progress made in happiness; nevertheless, there is a type of progress that defies any discussion, and that is scientific progress, as soon as we judge it within the hierarchy of knowledge, from a specifically intellectual point of view.” introduction of new exploratory methods such as biocomputing or bioinformatics and high-throughput screening, which involves the simultaneous processing of hundreds or even thousands of samples. this approach is in contrast with traditional biology, in which the research strategy is based upon the observation of effects obtained as a function of experimental parameters that are modified one by one. another aspect of modern times is that, with the irresistible trend in genetic manipulation towards a focus on human beings, certain areas of fundamental research are finding themselves locked into philosophical dilemmas that are a matter for ethical and sociocultural consideration, and the subjects of fierce debate.
instead of setting out to discover unknown mechanisms by analyzing effects that are dependent on specific causes, with some uncertainty as to the possible success of the enterprise being undertaken, which is the foundation stone of the bernardian paradigm of the experimental method, many current research projects give themselves achievable and programmable objectives that depend upon the means available to them: sequencing of genomes with a view to comparing them, recognition of sequence similarities in proteins coded for by genes belonging to different species, with the aim of putting together phylogenetic trees, synthesis of interesting proteins in transgenic animals and plants, analysis of the three-dimensional structure of proteins, in order to find sites that are likely to fix medicinal substances, and synthesis of molecular species able to recognize pathogenic targets. the facilities that are called into play include instruments that are often sophisticated, the performance of which, in terms of miniaturization, computerization and robotization, is far beyond that of apparatus that was in use a few decades ago. these facilities, applied to research into living beings, have entered the framework of a methodology that has been given the label biotechnology. proceeding handin-hand with applications that have become more and more meaningful in the domains of medicine, pharmacology, agronomy and animal husbandry, the biotechnological process has come to the fore as a new paradigm for the experimental method as applied to living beings. in addition to new discoveries, the driving forces behind biotechnologies are related to economic imperatives as well as the interest and support they receive from the political powers-that-be. the academic spirit that presides over fundamental science gives way to the entrepreneurial spirit that implements a rational programming of facilities and an efficient organization of scientific collaborations. 
as an example, the sequencing of the human genome, which includes three billion nucleotide base pairs, required the coordination of several dozen scientific teams around the world and the matching of several tens of thousands of results. research on dna provides a typical illustration of the way in which research has become divided, over the last few decades, between an approach and an interest that had previously been purely academic, and the increasing role of technology, which can be justified by the results that arise in the life of society at large, but which, because of these results, also gives rise to questions concerning how wellfounded some of these results are, particularly in the health domain. the experimental method, which had been confined to the laboratory, is now a matter for public debate. before it won acclaim, dna, which was isolated under the name of nuclein by johann friedrich miescher (1844 -1895), at the end of the 19 th century, had to undergo a series of structural evaluation tests that were spread out over the first five decades of the 20 th century. an overall conclusion then came to the fore. dna is a polydeoxyribonucleotide that carries four cyclic bases, adenine, thymine, cytosine and guanine. each base is involved in the structure of a mononucleotide where it is itself associated with a sugar, deoxyribose, which is associated with a phosphate residue. dna was compared to a ladder, the rungs of which (mononucleotides) were linked by ester bonds between an acid group of a phosphate residue of a nucleotide and the free hydroxyl group of the deoxyribose of the following nucleotide. research committed them to this path. it was the curiosity of each, a new way of considering old problems, that led a few men and women to solve the problems of heredity." 
the middle of the 20th century saw an accumulation of experimental evidence showing that dna carries genetic information, and because of this, that it controls the transmission of hereditary characteristics: the proof provided in 1944 by oswald avery (1877-1955), colin macleod (1909-1972) and maclyn mccarty (1911-2005) of the transforming power of dna in pneumococcus, the highlighting by alfred hershey (1908-1997) and martha chase (1927-2003), in 1952, of the role played by bacteriophage dna as an infectious agent for bacteria, the revelation by erwin chargaff at the beginning of the 1950s of the equivalence of molar concentrations of adenine (a) and thymine (t), on the one hand, and of cytosine (c) and guanine (g), on the other hand, in dnas arising from a multitude of sources, animal, plant and microbial, thus suggesting a complementary pairing of adenine and thymine, and cytosine and guanine. based on the pairing of a/t and c/g bases, the model of the double helix structure of dna, formulated in 1953 by james watson and francis crick, made it possible to understand the identical synthesis of double strands of dna by replication during cell division (figure iv.1) and, as a consequence, the conservation of hereditary characteristics in descendants. afterwards, it was found that the information contained in the dna base sequence determines the amino acid sequence in proteins. then the roles played by messenger rna and transfer rnas were elucidated, the former acting as a carrier of information between dna and the proteins being synthesized and the latter acting as double-headed adaptors, able to recognize nucleotide triplets (codons) in messenger rna and to specifically fix amino acids in order to position them on the ribosomes, the final result being the synthesis of a protein chain. in 1966, the genetic code was deciphered.
the veil of mystery that had covered the mechanism of the synthesis of proteins was lifted, and the decisive role played by nucleic acids in this synthesis was shown. later on, there were a few adjustments. although, in bacteria, proteins are coded for by a continuous sequence of nucleotide triplets in dna, in the 1970s the surprising discovery was made that in eukaryotic organisms, genes are discontinuous and made up of coding dna sequences (exons) interrupted by non-coding sequences (introns). from the end of the 1950s, françois jacob and jacques monod had postulated the existence of a dual determinism for protein synthesis and shown that, next to structural genes expressed as proteins, there are regulatory genes able to control the expression of the structural genes. the importance of the differential regulation of gene expression in cell differentiation in higher organisms was quickly recognized. from this point on it was possible to explain why a particular species of protein is more specifically expressed in a given tissue and another species of protein is more particularly expressed in another tissue, each type of tissue finding its specificity in its molecular components. this fantastic framework of knowledge, which was built up over a couple of decades, has been used as the foundation stone for the so-called central dogma of molecular biology, which explains the transcription of dna sequences into messenger rnas and the translation of messenger rnas into proteins and which, with only a few variants, is the same throughout the living world (figure iv.1) 1 . 1 the fascinating history of molecular biology was well described by james d. watson in molecular biology of the gene (3 rd edition 1976). there are now many works concerning this subject and how it has progressed, including two well-documented books in french: the secrets of the gene by françois gros (1986) and histoire de la biologie moléculaire by michel morange (1994) . 
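the flow of information summarized by the central dogma (replication by base pairing, transcription of dna into messenger rna, translation of codons into amino acids) can be illustrated with a toy python example. this is a deliberately simplified sketch: the dna sequence is invented, the codon table is partial and covers only the five amino acids abbreviated in the legend of figure iv.1 plus the stop codons, and intron splicing is not modelled, so it corresponds to the unsplit, bacterial case.

```python
# Watson-Crick pairing rules: a with t, g with c.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Partial genetic code (standard assignments) for Met, His, Tyr, Gly, Phe and stop.
CODONS = {
    "AUG": "Met",
    "CAU": "His", "CAC": "His",
    "UAU": "Tyr", "UAC": "Tyr",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UUU": "Phe", "UUC": "Phe",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def complement_strand(strand):
    """Base-by-base partner of a dna strand (written in the opposite 5'-3' polarity)."""
    return "".join(COMPLEMENT[base] for base in strand)


def transcribe(coding_strand):
    """Messenger rna has the sequence of the coding strand, with u in place of t."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Read codons from the first base until a stop codon (or the end) is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]  # KeyError for codons outside the partial table
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


coding = "ATGCATTATGGTTTTTAA"          # invented example sequence
template = complement_strand(coding)   # TACGTAATACCAAAAATT
mrna = transcribe(coding)              # AUGCAUUAUGGUUUUUAA
print("-".join(translate(mrna)))       # Met-His-Tyr-Gly-Phe
```

applying `complement_strand` twice returns the original strand, which is the property that makes each strand a template for an identical daughter molecule.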
new strand a -double helix structure of dna and simplified representation of its self-replication. each strand of the parent molecule of dna acts as a matrix for the synthesis of a daughter molecule of complementary dna, in conformity with the rules of pairing: adenine (a) with thymine (t) and guanine (g) with cytosine (c). the double strands that appear are identical to each other and identical to the parent dna molecule. b -transcription of dna into messenger rna and its translation into an amino acid chain. diagram of gene expression in a eukaryotic cell. one of the strands of dna (the coding strand) has coding sequences (exons) and non-coding sequences (introns). it is said that the gene is split. the transcription of the exons, accompanied by their splicing leads to the formation of a messenger rna, the codons (nucleotide triplets) of which are translated into amino acids that are linked to each other by covalent bonds in order to form a protein chain. in prokaryotic cells (bacteria), the genes do not contain introns and are not split. a: adenine; c: cytosine; g: guanine; t: thymine; u: uracil. met: methionine; his: histidine; tyr: tyrosine; gly: glycine; phe: phenylalanine. interlinked with the epic rise of molecular biology, there was a succession of technical innovations that led to the synthesis of dna by chemical or enzymatic means, and to its being cleaved at specific locations, with the pieces that were obtained being joined together again 2 . in 1962, in geneva, werner arber (b. 1929) 3 and daisy dussoix highlighted the restriction phenomenon, which involves the degradation of bacteriophage dna by a recipient bacterium. they discovered that an extract of e. coli has a restriction activity, and that this activity is of an enzymatic nature, caused by a nuclease that breaks the phosphodiester bonds in dna. in 1970, the americans hamilton smith (b. 1931) and kent wilcox 4 purified the first restriction enzyme from a strain of haemophilus influenzae. 
in 1971, daniel nathans (1928-1999) and kathleen danna 5 (b. 1945) at johns hopkins university in baltimore (usa) drew up the first restriction map, that of the circular dna of the monkey sv40 virus, using a restriction enzyme that was named hindiii and following the sequential appearance of shorter and shorter fragments during the partial digestion of the dna. in the following years, dozens of restriction enzymes were isolated, all of them endowed with a surprising specificity with respect to particular base sequences in dna (figure iv.2). these enzymes were to be indispensable tools in genetic recombination experiments. the transformation of rna back into dna was observed by howard temin (1934-1994) and s. mizutani 6 , in experiments on the rous sarcoma virus, an rna virus that, when it proliferates in host cells, is able to synthesize a dna that is complementary to its rna. the enzyme responsible, reverse transcriptase, was purified by both h. temin and david baltimore (b. 1938) 7 . starting with a given messenger rna, it then became possible to work back to the dna, i.e., to the gene, by a simple enzymatic reverse transcription operation. dna that has been synthesized in this way is called complementary dna (cdna). in eukaryotic organisms, reverse transcription has proved to be all the more useful as a technique in that all cdna is coding, unlike the situation in vivo in which the genes are divided up into portions that are coding (exons) and portions that are non-coding (introns).
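to make the idea of a restriction digest concrete, here is a small sketch that locates the recognition sites of an enzyme on a circular dna and lists the fragment lengths expected after complete digestion. the recognition sequence used (aagctt, cut after the first a) is the one commonly given for hindiii; the 60-base "plasmid" is entirely made up and has nothing to do with the real sv40 map.

```python
def find_sites(circular_seq: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` on a circular sequence."""
    n = len(circular_seq)
    # Append the first len(site)-1 bases so that sites spanning the origin are found.
    extended = circular_seq + circular_seq[:len(site) - 1]
    return [i for i in range(n) if extended.startswith(site, i)]


def fragment_lengths(circular_seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion of a circular molecule.

    `cut_offset` is the position of the cut inside the recognition site
    (1 for A^AGCTT).
    """
    n = len(circular_seq)
    cuts = sorted((p + cut_offset) % n for p in find_sites(circular_seq, site))
    if not cuts:
        return [n]  # the circle stays intact
    # Distance from each cut to the next one, wrapping around the circle.
    # A single cut gives a distance of 0, which means one linear piece of length n.
    return [(cuts[(i + 1) % len(cuts)] - c) % n or n for i, c in enumerate(cuts)]


# Hypothetical 60-base circular molecule carrying three AAGCTT sites.
plasmid = "AAGCTT" + "G" * 14 + "AAGCTT" + "C" * 24 + "AAGCTT" + "G" * 4

print(find_sites(plasmid, "AAGCTT"))
print(fragment_lengths(plasmid, "AAGCTT", cut_offset=1))
```

a partial digestion, as used for the first restriction map, would also yield the sums of adjacent fragments; comparing those intermediate sizes is what allows the fragments to be placed in order around the circle.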
the ability to cleave dna and to join together the fragments obtained in a deliberately chosen order, or, in other words, to manufacture previously unseen dna sequences by making new combinations, led to the dawning of recombinant dna technology and caused scientists to come to the sudden realization that the pandora's box that contains the secrets of life had been opened, that uncontrollable catastrophes might arise from this, and that there was a potential danger of causing tumorigenic viruses to reproduce in commensal bacteria such as the enterobacterium escherichia coli. in 1975, around a hundred molecular biologists gathered together at the asilomar conference center 8 near monterey in california, in order to discuss the dangers of the new dna technology. they proposed strict regulation to govern genetic manipulation. time and experience have shown that the risks being run were very low. in 1977, dna sequencing methods were published. one of them made use of chemical techniques 9 , the other made use of enzymatic techniques 10 . applications were not slow in appearing. from 1977, the team led by frederick sanger (b. 1918) in cambridge (uk) determined the first sequence of a genome, that of bacteriophage phix174, which is 5 375 nucleotides long. this was the beginning of an audacious adventure, the apparently senseless challenge being met with unbelievable rapidity thanks to the innovative methods of bioengineering, resulting, during the first years of the 21 st century, in the sequencing of the human genome. analysis of the human dna sequence involved the participation of two rival groups, one of them being academic, coordinated by francis collins (b. 1950) , and bringing together dozens of laboratories around the world, and the other being a private californian company directed by craig venter (b. 1946) . 
at the beginning of the 1980s, when everyone was persuaded that the rnas could be placed into three well defined categories, messenger rnas, transfer rnas and ribosomal rnas, it was with great surprise that it was learned that there were rnas that have catalytic properties (see thomas cech 11 [b. 1947] ). these rnas, called ribozymes, have, in keeping with enzymatic proteins, structured catalytic sites that are able to catalyze rna or dna cleavage or ligation reactions. recently, engineering techniques have been used to obtain artificial ribozymes that have been found to be able to catalyze reactions as varied as oxidations or the synthesis of peptides and nucleotides, thus opening up wide-ranging possibilities of applications in molecular therapeutics, and, in addition, reinforcing the famous theory of the "world of rna" at the beginning of the appearance of life on earth 12 . another discovery of the 1980s was the role of methylation of dna bases, cytosine and adenine, and its deregulation in a certain number of pathologies: fragile x syndrome, scapulohumeral dystrophy, certain forms of cancer 13 … in the past decade or so, basic proteins known as histones that are associated with the nuclear dna of eukaryotes in the form of a complex called chromatin and which had previously been assigned a structural role, have now acquired the status of functional partners. thanks to specific modifications of certain amino acids (acetylation, methylation, phosphorylation), histones control the state of condensation of the chromatin and the efficacy of transcription of dna contained in the chromatin, to such an extent that we now speak of the "histone code". the development of our understanding of histones is a good illustration of the complexification of a concept, the dna code, into an entity that comes closer to living reality, the dna code in partnership with the histone code. 
there has also been the discovery of interfering micrornas, small polymers made up of around twenty nucleotide units, the role of which is to control protein synthesis (chapter iv-1.2.2). methylation of dna, structural modifications of the chromatin histones, and blocking of transcriptional activity by interfering micrornas are a few of the major areas of research in a scientific domain that is in full expansion, epigenetics, which could be said to have "pipped the science of genetics at the post," and which explains the plasticity of the functions of living beings. with the arrival of restriction enzymes and reverse transcription, the foundation stones of genetic engineering have now been laid, and are ready to be used, this being all the easier in that synthetic chemistry is now able to manufacture dna chains that are several hundred nucleotides long, and progress in robotics and computing techniques has made it possible for chemists to avoid carrying out tedious routine tasks by using fully programmable machines. the hope that it would be possible to experiment on living beings by means of the manipulation of dna became a reality when the american researchers paul in the first work carried out on the expression of foreign genes, the use of plasmids as vectors was preferred, particularly that of plasmid pbr322, because of its considerable replication capacity. in 1978, a first success was obtained by herbert boyer and his co-workers, with the expression in the bacterium e. coli of the gene for somatostatin, a peptide hormone comprising fourteen amino acids that negatively regulates the secretion of growth hormone. because of its small size, the somatostatin gene was synthesized by chemical means. the expression of somatostatin in e. coli was verified using immunological and physiological criteria, thus demonstrating the validity of the procedure that was used. the following year, human insulin was produced in e. coli.
fairly soon, yeast was substituted for this bacterium because, as a eukaryotic organism, it has enzyme systems that are able to carry out chemical finishing operations on neosynthesized proteins that bacteria are unable to do, for example the formation of disulfide bridges in insulin. genetic recombination is used in order to cause bacteria to manufacture a foreign protein of animal or plant origin. this involves the insertion of the fragment of animal or plant dna that codes for this foreign protein into a plasmid. the plasmid, a small ring of bacterial dna, acts as a vector for the foreign dna. in order to carry out insertion, plasmid dna is cleaved by an appropriate restriction enzyme. the foreign dna is obtained by reverse transcription from a useful messenger rna. its duplication is catalyzed by a dna polymerase. the s1 nuclease makes it possible to break the covalent bond between two strands of dna. in the following step, a terminal transferase is used to add four nucleotides for which the base is a cytosine (c) to each of the two dna strands. the same lengthening operation is carried out on the bacterial plasmid, but, in this case, the addition involves a sequence of four nucleotides for which the base is a guanine (g) (complementary to the cytosine c). the bacterial plasmid is hybridized in vitro with foreign animal or plant dna and then introduced into the bacterium which, using its own machinery, perfects the junction between the integrated dna and the plasmid dna. a very small volume of dna (2 × 10⁻⁹ ml) is injected under the microscope into eukaryotic cells (hela cells in inset) using a micropipette with a very fine end that pierces the cell membrane. the swelling of the cells at the moment of injection can be seen (inset). supported by these successes, genetic engineering started to come to the fore as an application-oriented discipline.
levels of performance that would never have been imagined half a century before were achieved, such as the production of growth hormone, interferons, blood coagulation factors and vaccines. in the final decades of the 20th century, phenotype transformations using genetic modifications that had previously been carried out in bacteria and yeasts were successfully attempted in animals and plants. it was observed that a mutated dna integrated into a plasmid and introduced into a fertilized mouse egg (by micromanipulation) modifies the mouse's genetic inheritance, which affects first the embryo and then the adult mouse with phenotype modifications. such mice, which are said to be transgenic because of the stable integration of a foreign dna into their genome, are now widely used as animal models in studies that aim to understand the mechanisms involved in high-incidence human pathologies such as cancer, diabetes, and rheumatoid conditions. in 1982, two american researchers 16 , ralph brinster (b. 1932) and richard palmiter (b. 1942) carried out a spectacular transgenesis experiment in mice. using microinjection, they introduced the growth hormone gene (obtained from the rat) into oocytes of mice from a "little" germ line. once the transgenic mice had reached adulthood, they were giants. at present, the transgenesis technique is being applied both to the animal kingdom and to the plant kingdom. in the 1980s, kary mullis 17 (b. 1944) perfected an ingenious technique, the polymerase chain reaction (pcr), which makes it possible to produce several tens of thousands of copies of a fragment of dna. using this technique, it is possible to detect traces of a fragment of dna of a given sequence down to an attomolar concentration, i.e., 10⁻¹⁸ molar, a billion billion times lower than a molar concentration.
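the orders of magnitude quoted for pcr can be checked with a few lines of arithmetic. the sketch below assumes an idealized reaction in which every cycle doubles the template (an efficiency of 1.0, which real reactions only approach) and uses avogadro's number to convert a concentration into a count of molecules; the function names are illustrative only.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized PCR: each cycle multiplies the template by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, start_copies: int = 1, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least `target_copies`."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))


def molecules(concentration_molar: float, volume_litres: float) -> float:
    """Number of molecules contained in a given volume at a given molarity."""
    return concentration_molar * volume_litres * AVOGADRO


print(cycles_needed(30_000))     # cycles for "several tens of thousands" of copies
print(copies_after(15))          # copies obtained from one template after 15 cycles
print(molecules(1e-18, 1e-3))    # molecules in 1 ml of an attomolar solution
```

under these assumptions about fifteen doublings are enough to pass thirty thousand copies, and one millilitre of an attomolar solution holds only some six hundred molecules, which shows why amplification is needed before such traces can be detected.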
by the end of the 20th century, genetic engineering had become well-established and widespread, thanks to a mastery of techniques involving the manipulation of dna such as the accurate cleavage of a gene into fragments using commercially available restriction enzymes, the covalent assembly of two fragments of dna by ligases, the automated chemical synthesis of fragments of dna of more than one hundred nucleotides, the manufacture of a complementary dna (cdna) from a messenger rna by using a reverse transcriptase, and automated dna sequencing. given this particularly well-equipped toolbox, the molecular biologist is now able to manipulate dna, that is to say, the chemical material that contains the information that is central to the functioning of living structures (microorganisms, plant and animal organisms), and thus to modify, at will, the genotype of these structures that the selective pressure of evolution has previously favored. genomics has produced enormous quantities of data that are stored in databanks. automated procedures have been invented to make the information contained in these data intelligible, these procedures forming the basis for a new discipline, biocomputing, or bioinformatics, which develops programs, or algorithm-based strategies, that are able to solve specific problems, one of which is the annotation of genomes, i.e., the identification of coding and non-coding sequences. while the annotation of the prokaryotic genomes is relatively easy, because of the absence of introns, that of the eukaryotic genomes is considerably more difficult because of the alternating exons and introns and the small proportion of coding exons (less than 2% in the case of the human genome). this explains why, at the time of writing, several hundred genomes of prokaryotes (around 300 at the beginning of 2006) have been sequenced, as opposed to only a few dozen genomes of eukaryotes.
annotation was carried out manually at first, but it has become automated and it is now possible to analyze thousands of items of genomic data. the comparison of nucleotide base sequences in dnas and of amino acids in proteins of different origins involves biocomputing. the identification of similar or identical regions that provide information about functional similarities and phylogenetic proximity involves the use of alignment methods. one of these methods, which is in current use, is called blast (basic local alignment search tool). comparison of protein sequences has been particularly instructive in the science of evolution. it has highlighted evolutionary processes in the phylogenesis of proteins and linked these processes to precise functions. it has been possible to deduce that, over time, different families of proteins with similar functions appeared independently and evolved along different routes. this is the case for membrane proteins whose polypeptide chain crosses the thickness of the membrane six or twelve times; thus, the mitochondrial membrane proteins that transport metabolites are formed by triplication of an element with two transmembrane segments, while proteins located in other membranes of the cell are derived from duplication of an element with three transmembrane segments. from this academic context arose the study of paleogenetics, a new discipline that compares dna sequences extracted from fossils and amplified by pcr with dna sequences of current species. in addition to being of immense interest to fundamental biology, genetic bioengineering has led to innumerable industrial applications making use of genetically modified microorganisms that are able to synthesize molecules with a high added value that can also be used in xenobiotic depollution operations.
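the alignment methods mentioned above can be illustrated with a compact example. blast itself relies on a fast seed-and-extend heuristic that is too long to reproduce here; the sketch below instead implements the smith-waterman dynamic programming algorithm, the exact local alignment method that such heuristics approximate. the scoring values (match +2, mismatch -1, gap -2) and the two short sequences are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTACGTTT", "GGACGTACGGG"))
```

the same routine works on protein sequences if the flat match/mismatch scores are replaced by a substitution matrix, which is the usual practice for comparing proteins of different origins.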
so much data has already been deposited, equivalent to the sequencing of more than one hundred billion nucleotides, that it is inevitable that there have been errors, some of which might prove prejudicial for future use (comparison of sequences, screening of drugs…). nevertheless, the ever-increasing numbers of genome sequencing projects for animal, plant and microbe species show the interest that is shown in understanding the genetic information present in different types of cells, in order to be able to exploit their potential. dna chips appeared in the last decade of the 20 th century, and came to the fore as part of a new technical revolution, the "high throughput" revolution ( figure iv.5 ). an article written by the group headed by ronald davis and patrick brown (b. 1954 ) at the university of stanford 18 gives a precise description of the hybridization technique used in dna chips. thus, around a hundred short dna strands, corresponding to portions of genes of the plant arabidopsis thaliana, commonly known as mouse-ear cress, a small plant in the brassicaceae (formerly cruciferae) family, are synthesized. a robot is used to deposit microquantities of these dnas in solution in a dot pattern on a small glass slide coated with poly-l-lysine, thus comprising a "chip" on which the covalently fixed dnas act as probes for specific molecules. a later step involves both the use of reverse transcription to produce complementary dnas (cdnas) from messenger rnas arising from the expression of genes in the same plant and the labeling of these cdnas with fluorescent ligands for use in screening. once these fluorescent cdnas have been denatured, i.e., after separation into single strands, they are brought into contact with the dna chip. the unhybridized molecules are removed by washing. a -the term dna chip corresponds to a small, chemically-treated glass (or sometimes silicon) plate on which a robot has deposited dna strands of known sequence in a pre-determined order. 
b -the dna chip may be used for different types of diagnostic procedures. in the differential diagnosis experiment represented here, messenger rnas are prepared from two cell samples that have been treated in parallel, the control sample (normal cell) and the experimental sample (pathological cell). these messenger rnas are reverse transcribed into complementary dnas (cdnas) by means of a reverse transcriptase. each of these two types of cdna, corresponding to the two types of messenger rna, is labeled by a chemical reaction with a specific fluorescent ligand (cy3, which emits at 568 nm and cy5, which emits at 667 nm). they are then hybridized with chip dna strands. after hybridization and then washing, the fluorescence emitted at 568 nm and 667 nm under laser irradiation are analyzed using an appropriate detection system. the differential expression of the genes in the control cell sample and the experimental sample (shown by a color difference) can thus be analyzed. hybridization between the fluorescent dnas called targets and the complementary nucleotide probes fixed to the dna chip is detected by means of an automated fluorescence detection system. at the beginning of the 1990s, stephen fodor 19 and his group, who were working at the affymax research institute (palo alto, california), developed an ingenious procedure for microphotolithography that led to the synthesis of a network of a thousand peptides on chemically pre-treated glass microscope slides. the resolution of the network was shown by epifluorescence microscopy after fixation of specific antibodies labeled with fluorescent probes. soon after this, microphotolithography was used for the manufacture of dna molecule networks on solid supports. 
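the differential analysis described in the legend above amounts to comparing, spot by spot, the fluorescence measured in the two channels. a common way to express this is the base-2 logarithm of the ratio of the two intensities. the sketch below assumes that the control sample is labeled with cy3 and the experimental sample with cy5 (the text does not say which is which), uses invented intensities for three hypothetical genes, and applies a very crude normalization that equalizes the total signal of the two channels.

```python
import math


def log_ratios(cy3: dict[str, float], cy5: dict[str, float]) -> dict[str, float]:
    """log2(experimental / control) for each spot.

    The Cy5 channel is first rescaled so that both channels have the same total
    intensity (a simple global normalization, assumed here for illustration).
    """
    scale = sum(cy3.values()) / sum(cy5.values())
    return {gene: math.log2(cy5[gene] * scale / cy3[gene]) for gene in cy3}


def classify(ratio: float, threshold: float = 1.0) -> str:
    """Call a gene up or down when expression changes at least two-fold."""
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical fluorescence intensities (arbitrary units).
control = {"geneA": 500.0, "geneB": 2000.0, "geneC": 1500.0}       # Cy3 channel
experimental = {"geneA": 2000.0, "geneB": 500.0, "geneC": 1500.0}  # Cy5 channel

ratios = log_ratios(control, experimental)
for gene, value in ratios.items():
    print(gene, value, classify(value))
```

a log ratio of +2 corresponds to a four-fold stronger signal in the experimental sample and -2 to a four-fold weaker one, which is what the color difference on the scanned chip conveys visually.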
from then on, two competing techniques for the preparation of dna chips became well-established, either the depositing on a solid support of cdna obtained by gene amplification (technique used by davis and brown), or the synthesis in situ of oligonucleotides carried out directly on a solid support (technique used by the american affymetrix company, arising from affymax). one considerable advantage of dna chip technology is that it provides information on the level of transcription of thousands of genes into messenger rnas (mrnas), in a simultaneous manner and in a relatively short lapse of time. experiments that previously required weeks, months or even years to be completed can now be carried out in a matter of hours. we therefore have a sort of instantaneous, precise, freeze-frame picture of the state of a cell at a given moment, with a great number of parameters explored in a semi-quantitative manner. dna chips are a typical example of the application of high throughput technology to the study of living beings. the panoply of mrnas produced by the transcription of dna is called the transcriptome. the method by which transcriptomes are obtained is called transcriptomics. it should be added here that there is not necessarily a correlation between the abundance of a mrna, evaluated on a dna chip, and the functionality of the corresponding protein, which depends on multiple factors that particularly involve post-translational modifications: phosphorylation, glycosylation, hydroxylation, and so on. at the end of the 1990s, dna chips were being used extensively in the research programs of many biology laboratories. 
they are used in a variety of domains: human pathology, to differentiate between forms of cancer linked to multiform mutations; microbiology, to identify pathogenic germs; comparative genomics, to look at model eukaryotic organisms that have in common a certain number of genes; or even in population genetics, to detect polymorphisms linked to a change in a single base in a dna sequence. as a complement to the dna chip technique, the fish (fluorescent in situ hybridization) method holds a key position in the study of cytogenetics. this method is based on hybridization between fluorescent nucleic acid probes of known nucleotide sequence and complementary motifs located in the dna of the chromosomes. it allows the detection of chromosomal modifications with a gain or loss of genetic material, such as those that are found in certain tumors. it is used in prenatal diagnostics to diagnose such modifications. protein chips, which are used to characterize the reactivity of proteins with respect to specific ligands, are another example of the application of high throughput technology. dozens of proteins of different types (antibodies, for example), as well as derivatives of nucleic acids or even molecules capable of being ligands of proteins that might arise from combinatorial chemistry (chapter iv-3.3), are arranged in a network on small glass plates that are chemically treated to act as hooks to entrap specific proteins present in a tissue extract or serum (figure iv.6a). this procedure, which is essentially analytical in nature, is complemented by a functional study in which proteins that have been isolated in their native form, i.e., those that are capable of expressing the same functions that they possess in vivo, are deposited on a glass microplate. this type of biochip makes it possible to analyze the reactivity of the proteins that are fixed to it with respect to a multitude of targets (proteins, nucleic acids or pharmaceutical substances) (figure iv.6b).
antigens peptides a -biochip for the identification of proteins. different types of ligands, antibodies, antigens, dna or rna, small molecules with a high affinity and specificity, are deposited on a reactive surface. these biochips can be used to determine the level of expression of proteins and the type of proteins expressed in cell extracts. they can be used for clinical diagnostics. b -biochip for the functional study of proteins. native proteins or peptides are arranged in micronetworks on a reactive medium. biochips produced in this way are used to analyze the activities of proteins and their affinities as a function of posttranslational modifications. they are useful for identifying drug targets. analytical proteomics, which was still in its infancy in the last decades of the 20 th century (chapter iii-6.2.3) has become a vigorous discipline. the association of liquid nanochromatography and of mass spectrometry allows the identification of peptides obtained by the trypsin hydrolysis of samples of proteins of around one picomole in size. applied to fundamental and pathological cell biology, the aim of analytical proteomics is not only to decipher the list of proteins on the scale of the cell, but also to highlight variations in the abundance of synthesized proteins as a function of environmental conditions. it also aims to determine the posttranslational modifications that are undergone by the proteins inside the cells, which, for a large part, control the specificity of their operation. in parallel with the study of proteomics, the study of peptidomics, or the study of all of the peptides (peptidome) present in animal and plant cells and in the fluids that bathe these cells, has developed. for example, several hundred different peptides have been found in the cerebrospinal fluid. 
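the trypsin hydrolysis step mentioned above can be simulated before any measurement is made, which is how the peptides observed by mass spectrometry are matched to candidate proteins. the sketch below applies the cleavage rule usually quoted for trypsin (cut after lysine k or arginine r, except when the next residue is proline p); the twelve-residue sequence is invented, and the `missed_cleavages` option is an assumption added to mimic incomplete digestion.

```python
def trypsin_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico tryptic digest of a protein given in one-letter code."""
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        # Cleave on the C-terminal side of K or R, unless followed by P.
        if residue in "KR" and following != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal peptide without K/R at its end

    # Optionally add longer peptides in which 1..n cleavage sites were missed.
    peptides = list(pieces)
    for missed in range(1, missed_cleavages + 1):
        for k in range(len(pieces) - missed):
            peptides.append("".join(pieces[k:k + missed + 1]))
    return peptides


toy_protein = "MAGKRPLVRSTK"  # hypothetical sequence
print(trypsin_digest(toy_protein))
print(trypsin_digest(toy_protein, missed_cleavages=1))
```

in an actual identification workflow, the mass of each predicted peptide would then be computed and compared with the masses measured by the spectrometer.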
structural proteomics, which deals with the three-dimensional structure of proteins, has also become a domain in which the activity has been increasingly dominated by high-throughput techniques. this is partly due to the fact that the pharmaceutical industry has given considerable, sustained attention to the understanding of the structure of proteins that could play the role of therapeutic targets. this is the case for protein kinases that catalyze, via atp, the phosphorylation of endocellular proteins, of proteases involved in hydrolysis reactions, or even of cell surface receptors that are able to combine with ligands such as hormones. the use of automated crystallization systems has become common in structural biology. from the classical technique of the hanging drop, in plates comprising 24 wells, we have moved on to plates with 96, 384 and, recently, 1536 wells. obviously, this increase in scale requires the use of an automated system that includes a robot in charge of transferring microaliquots of the protein solution into the wells and adding to these wells media that differ in their ph, ionic strength and molecular composition. the crystallization process is followed by an automated microscopic examination coupled with video photography. making use of the recent development of genomics and of proteomics, and a detailed inventory of the structures and functions of the different protein species of living beings, contemporary biology is now able to sketch out a scheme of molecular systematics, including a classification into phyla, families, and classes that echoes those of the zoological and botanical systematics of the 17th and 18th centuries. however, modern systematics does not tell us how protein macromolecules interact within dynamic networks. there is still an enormous amount of work to be done in order to achieve an understanding of the meaning of the dialogue between macromolecules in a normal or pathological cell context.
this work will require a detailed analysis of metabolic pathways and of how they are controlled, and their evaluation in kinetic and thermodynamic terms. it will be accompanied by modeling (chapter iv-4). there is no doubt that it will be successful. making use of subtle differences in the qualitative and quantitative expression of genes, it will become possible to understand the molecular principles that modulate differences in morphology and in function between neighboring animal, plant or microbial species. the science of evolution should benefit from this. in medicine, the forecasting of predispositions for certain diseases should be made easier (chapter iv-3.2), opening up the perspective of prevention strategies. using recombinant dna technology, metabolic engineering applied to microorganisms and plants should make it possible to improve the production of molecules that are of economic interest or can be used for drugs. the diversity of bacteria is amazing, much greater than might be supposed by looking at the number of bacterial species identified by culturing on appropriate media. in fact, the number of bacterial species that can be cultivated only represents 1% of the total number of existing bacterial species on the surface of the earth. there are two major reasons for this: we do not know the appropriate conditions for culturing these bacteria; a certain number of environmental bacteria live in symbiosis, acting as commensal organisms that benefit from the products secreted by other organisms. nevertheless, the study of the bacterial genome, without any clonal culture, has been carried out, and comprises a branch of genomics known as metagenomics. instead of looking at an isolated, well-identified bacterial species, in order to analyze the sequence of its dna, as has been done traditionally, researchers look at a heterogeneous bacterial sample from which the dna is extracted, amplified, and then sequenced by high throughput methods. 
computer processing of the data provides information about individual germs. craig venter, who had already gained notoriety with the sequencing of the human genome, recently applied "metagenomic" procedures to the study of the sequence of the "metagenome" of the bacterial species of the sargasso sea 20 . he came up with nucleotide sequences corresponding to approximately 1 million kilobases of non-redundant nucleotides, attributable to more than two thousand different genomes. the challenge to be met by metagenomics is to connect a function to its phylogenetic source and to extend this information to specific species within a bacterial community. group 21 . in human biology, a metagenomics approach has been applied to the study of the population of bacteriophages present in the intestinal flora. approximately 1 200 genotypes have been identified, a number that greatly exceeds the 400 bacterial species of this flora 22 . this result leads us to think that the luxuriant community of bacteriophages which cohabits with that of the intestinal bacteria may influence the diversity of the latter by selective bacterial lysis and also by promoting the exchange of genes between bacteria. a rapid overview of the history of the exploration of genomic dna over the last fifty years shows the rapidity with which a traditional experimental paradigm can move thanks to modern computing and robotics procedures. in less than twenty years, we have moved from the manual sequencing of dna that was developed at the end of the 1970s to automated high throughput sequencing. at the turn of the 21 st century, the sequencing of communities of genomes (metagenomics) has been substituted for the sequencing of individual genomes. dna and protein chips have become objects of everyday use in fundamental and applied biology. transgenesis is widely practiced. 
dna, a molecule that remained mysterious for a long time after it was discovered, delivered some of its secrets during the second half of the 20th century. the purpose of the first experiments on dna was to understand how dna, holder of the genetic code, transmitted its message. after having questioned dna, researchers moved on to manipulating it. the current aim is to use oligonucleotides to build nanoscale constructions with original and, if possible, useful, properties. in addition, the recently acquired ability to interfere with the expression of the genome in living cells, by means of small rna molecules, allows the programmed manipulation of the genome. another challenge, the extending of the coding power of the genetic code, now appears to be achievable. from the fact that each of the strands of the double helix overhangs in one direction, and in the other the strand with which it is paired, thus leaving a few bases free (figure iv.7). if two strands of dna with sticky ends are brought into contact, when the bases of these ends are complementary, a branched structure will appear spontaneously. using this principle as a basis, cube-shaped nanometric constructions that make it possible to encage molecules of interest have been built. the opening of the cage by appropriate devices liberates the encaged molecules, which can act as substrates in specific reactions. the cutting of a double strand of dna using restriction enzymes able to create fragments with cohesive ends (a) has been used to "build" an artificial construction (b), which, in this case, is a cube (c), but which could be an object of a different geometrical type. a dna nanomachine that is capable of movement is becoming a reality.
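the rule that decides whether two sticky ends will join is simply base complementarity read in opposite directions. as a minimal sketch, the function below tests whether two single-stranded overhangs, both written 5' to 3', can pair over their whole length; the overhang sequences are arbitrary examples, and real designs must also take length, temperature and unwanted secondary structures into account.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that pairs with `seq`, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True when two 5'->3' overhangs are fully complementary (antiparallel pairing)."""
    return overhang_a == reverse_complement(overhang_b)


# A palindromic overhang pairs with another copy of itself.
print(can_anneal("AGCT", "AGCT"))

# A designed, non-palindromic pair: each end joins only its intended partner.
print(can_anneal("GCATC", "GATGC"))
print(can_anneal("GCATC", "GCATC"))
```

giving each edge of a construction its own non-palindromic pair of overhangs is what makes the assembly programmable: an end can only find the single partner that was designed for it.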
one dna nanomachine, which is admittedly still rudimentary, has been put together based on the structural difference that exists between b-dna, the classical double helix that twists to the right, and z-dna, a double helix that twists to the left. a propensity to adopt the z-form is triggered when there is an alternating sequence of cytosine (c) and guanine (g) (cg sequence) in the dna. the experiment illustrated in figure iv.8 makes use of a duplex formed of b-double helices. the dna nanomachine constructed by n.c. seeman comprised a duplex of double strands of dna. one of the double strands, of the classical b-form of dna (right-hand twist), has been cleaved in such a way as to fix fluorescent molecular probes onto the cleavage zones. facing this cleavage zone, a short nucleotide sequence, in which the cytosine (c) guanine (g) motif is repeated, can be found in the other dna double strand, which is also of b-form. the addition of cobaltihexammine induces the transition of the cg segment from a right-hand twist (b-dna) to a left-hand twist (z-dna), which leads to a rotation of this segment and to a rotation of the assembly, which can be detected using fret (fluorescence resonance energy transfer) spectroscopy. one of the double helices has a short cg segment. facing the cg segment, the other double helix is interrupted, and its ends, where the interruption is, carry fluorescent molecular probes. the simple fact of adding a cationic substance such as cobaltihexammine, which neutralizes the negative charges of the phosphate groups, triggers a conformational transition, with the cg segment taking the z-form, causing a rotational movement of the assembly that is detected by the movement of the probes. 
there is no doubt that the use of dna strands in order to build nanomolecular constructions that are capable of programmed movement marks the beginning of an adventure that we may imagine will be rich in outlets for domains such as computer technology, nanomechanics and even the life sciences. in addition, the discovery that dna conducts electrical current gives rise to dreams of a revolutionary technology in which dna may be used in the design of electrical circuits, in competition with classical electronics 24.
interfering rnas are non-coding rnas of around twenty nucleotides that control gene expression at post-transcriptional level. as with many discoveries, that of rna interference was the result of serendipity. it began during the 1980s with observations made by two american research groups, that of victor ambros 25, now at dartmouth medical school, hanover, and that of gary ruvkun 26 (b. 1951) at boston's massachusetts general hospital, that a gene named lin-4, which is involved in the post-embryonic development of the nematode c. elegans, did not code for a protein, but for a small rna that played an antisense role. this odd discovery was supported and made more explicit a few years later by the research groups of andrew fire (b. 1959) at baltimore's carnegie institute and craig c. mello (b. 1960) at the university of massachusetts in worcester 27. in order to block the production of certain proteins in the nematode c. elegans, the researchers used synthetic antisense rnas. the control involved the use of sense rnas according to a classical protocol. unexpectedly, protein synthesis was blocked in both cases, suggesting that a contaminant was present in the sense and antisense rna preparations. this contaminant was identified as a double strand rna (dsrna), that is, an rna that is folded back on itself in a "hairpin" loop because of the pairing of complementary bases (adenine with uracil and guanine with cytosine).
in order to verify the mechanism by which the translation of messenger rnas into proteins is silenced, the nematode c. elegans was injected with a synthetic dsrna, part of the sequence of which was complementary to that of the gene unc-22, known to code for a protein involved in muscular contraction. within a few hours, the worm was making disordered movements, suggesting that the dsrna interferes with the production of proteins in the process of muscular contraction. the mechanism of action of dsrna was quickly unraveled: dsrna gives rise to two single strand rnas after cleavage by a specific enzymatic mechanism. one of the single strand rnas (sirna, small interfering rna) pairs, by base complementarity, with a short sequence of messenger rna transcribed from the gene unc-22. the result is a blockage of the translation of messenger rna into a protein, followed by the destruction of messenger rna. this phenomenon was named rna interference (figure iv.9).
figure iv.9: the dicer cleavage enzyme, which has a ribonuclease activity, cuts the double strand rna into two strands. in the presence of the risc (rna-induced silencing complex) protein complex, one of the rna strands finds a complementary nucleotide sequence in a messenger rna (mrna) and associates itself with this rna, making it unable to be translated into protein.
we now know that eukaryotic cells from animals and plants produce and host interfering rnas that are said to be "natural" 28. natural interfering rnas, of around twenty nucleotides, are called micrornas (mirnas). although a few details differ between the modes of formation and action of natural interfering rnas and those of synthetic interfering rnas, in particular the fact that messenger rnas are not destroyed by mirnas but blocked in their translation, the effect of negative regulation on the production of specific proteins comes to the same thing (figure iv.9). there is a far from negligible number of genes that code for mirnas.
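the pairing step of rna interference described above, in which one strand of the interfering rna finds a complementary sequence in a messenger rna, can be sketched in python as a search for the reverse complement of the guide strand. the sequences below are invented for illustration and are much shorter than the twenty or so nucleotides of a real interfering rna; the function names are hypothetical, and the sketch assumes perfect complementarity over the whole guide.

```python
# Watson-Crick pairing rules for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the strand that would pair, antiparallel, with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))

def find_target_sites(guide, mrna):
    """Return the 0-based positions in `mrna` at which the guide strand
    can pair along its full length (perfect match is an assumption)."""
    site = reverse_complement(guide)
    mrna = mrna.upper()
    hits = []
    start = mrna.find(site)
    while start != -1:
        hits.append(start)
        start = mrna.find(site, start + 1)
    return hits

# Toy example: an invented mRNA and an 8-nucleotide guide strand.
mrna = "AUGGCUACGUUAGCCGAUAA"
guide = "GCUAACGU"
print(find_target_sites(guide, mrna))
```

an empty list means that the guide has no fully complementary site in the messenger rna and, in this simplified picture, would not silence it.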
already, several hundred mirnas have been identified in the genomes of plants and animals. the amount of interest that they arouse, and the feverishness of the research being carried out on them, are in keeping with the major mechanisms that they control: embryogenesis, hematopoiesis, neuronal differentiation, etc. given an understanding of the genome sequence in man, the rat and the mouse, trials have begun that aim to achieve an understanding of how the expression of mammal genes of known sequence might be manipulated by the interplay of interfering rnas (chapter iv-3.1). the treatment of viral infections such as aids or hepatitis b, which are worrying public health problems, could benefit from this new technology. it appears that interfering rnas have much more to give to us in the near future than they have taught us up until now. the deciphering of the genetic code in the middle of the 1960s was the end of a first step in elucidating the mechanism by which a sequence of nucleotides in dna is translated into a sequence of amino acids in a protein (chapter iv-1.1). during the years that followed, the subtleties of the transcription of dna into messenger rna and of the translation of messenger rna into protein via transfer rnas were explored in hundreds of laboratories around the world. particular attention was paid to the understanding of how a given amino acid is activated and bound to a transfer rna (trna) after being picked up by an aminoacyl-trna synthetase. nevertheless, the idea remained of a code in which triplets of purine and pyrimidine bases of messenger rnas are translated into natural amino acids. recently, methods have been developed that give more flexibility to the action of the aminoacyl-trna synthetases, or, in other terms, relax their specificity 29 . synthetases that have been manipulated in this way are able to recognize non-natural amino acids and to incorporate them into proteins by working together with the ribosomal machinery. 
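the idea of extending the code can be made concrete with a small python sketch in which a codon table is copied and one codon is reassigned to a non-natural amino acid. two assumptions are made here that the text does not state: the reassigned codon is the uag stop codon, a common choice in practice, and the non-natural amino acid is written with the placeholder letter "X". the table is only a small excerpt of the standard code and the messenger rna is invented.

```python
# Excerpt of the standard genetic code (mRNA codon -> one-letter amino acid).
# "*" marks a stop codon. Only the codons used in the example are listed.
STANDARD = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAG": "*", "UAA": "*",
}

# Expanded table: UAG now encodes a non-natural amino acid, written "X".
# Choosing UAG and the symbol "X" are assumptions made for illustration.
EXPANDED = dict(STANDARD, UAG="X")

def translate(mrna, table):
    """Translate codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = "AUGUUUUAGGGCAAAUAA"  # invented message with an internal UAG
print(translate(mrna, STANDARD))  # translation stops at UAG
print(translate(mrna, EXPANDED))  # UAG is read through as "X"
```

with the standard table the message is cut short at the internal uag, whereas with the expanded table the same message yields a longer protein that contains the non-natural residue.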
it is in this way that, at the time of writing, around thirty non-natural amino acids, obtained by insertion of different types of chemical residue (photoactivable, fluorescent or radioactive residues capable of acting as probes for structural and functional analyses) (figure iv.10) have been incorporated into protein structures. with such an innovation, an unexpected field of exploration has opened up to research in domains as far apart as pharmacology and the science of evolution, giving rise to burning questions: could such non-natural proteins have therapeutic properties? could they give a selective advantage to the organisms that host them? with the addition of non-natural amino acids to the genetic code and the demonstration that proteins containing such amino acids can function in living cells, in sum, with the transgression of the potentialities of the natural genetic code, the experimental method appears to challenge the order of living beings. the triumph of genetic engineering via the study of dna is not unique to biology. many other sectors are undergoing changes in their type of experimental approach, dictated by the technosciences and making use of computer sciences, robotics and high-throughput screening. however, given the many questions that its operation continues to raise, and its central position at the heart of scientific ethics, the study of dna remains a typical example of the way in which the experimental life sciences and the techniques that underlie them are evolving nowadays.
"i perfectly agree that when physiology is sufficiently advanced, the physiologist will be able to make new animals or plants, as the chemist produces substances that have potential, but do not exist in the natural order of things."
more than a century after claude bernard predicted a genetically manipulated world, it has come to pass.
the molecular biologist, having original, high-performance methods for "tinkering" with dna, has moved on to the application and use of his technical expertise for utilitarian ends. during the 1970s, with transgenesis, research on bacteria (chapter iv-1.1.2) opened up a new biological domain, that of genetically modified organisms (gmos). results led to predictions that it would be possible to transfer a fragment of dna corresponding to a gene of a certain species into the genome of another species and have this foreign gene express itself as a protein in the host cell. in 1983, the successful transgenesis of a gene for resistance to an antibiotic, kanamycin, in tobacco plants, signaled the beginning of the technology of the first plant-type genetically modified organisms, also called gmps (genetically modified plants). in 1996, the birth of dolly the ewe ushered in the era of reproductive cloning in mammals, i.e., the identical reproduction of an already-existing organism. an additional step was taken with the first tests on the differentiation of embryonic stem cells towards different types of lines that are characteristic of well-defined tissues, such as nerve tissue, cardiac tissue or the hepatic parenchyma, thus opening up promising perspectives in regenerative medicine. the frontiers of the experimental method continue to be pushed back to the limit of what is feasible and sometimes into the realm of fiction, as in immunology, for example, with the idea of xenotransplantation, using "humanized" animal organs.
given the universality of the genetic code, any gene that is introduced into the genome of a plant, whether that gene is of animal, plant or microbial origin, is able to replicate itself and be expressed as a specific protein. thus plant gmos or genetically modified plants (gmps) are able to express specific foreign proteins from another plant, a bacterial microorganism or an animal organism.
in the 1990s, a short time after fundamental research had revealed the feasibility of plant transgenesis, the first transgenic plant, the flavr savr tomato, was marketed in the usa. since then, numerous other plant gmos have been cultivated on a large scale and become available on the world market, including corn, soya, rice, cotton and the poplar. one of the desired aims is to produce modified plants that are able to resist destruction by the herbicides that are commonly used to eliminate weeds, while another is to prevent predation by harmful insects. in the first case, transgenesis involves the insertion of a herbicide-resistant gene, and in the second case, the inserted gene codes for an insecticidal toxin. recently, plant gmos that produce proteins with a therapeutic effect have appeared, ranging from antibiotic peptides to antibodies or proteins as unexpected as human hemoglobin. current projects aim to create plants that are resistant to adverse conditions such as the dryness of arid climate zones. the preferred procedure for producing a plant gmo is to use a bacterium, agrobacterium tumefaciens, a microorganism that is able to insert fragments of its own dna into plant cells (figure iv.11). the useful gene that we wish to transfer into the plant may be a gene for resistance to a pesticide such as glyphosate, marketed under the name of roundup, phosphinothricin (basta) or glufosinate (liberty).
figure iv.11: the ti plasmid is isolated and cut using a restriction enzyme. a foreign gene, known as a gene of interest, is inserted into the t-dna of the plasmid. the plasmid is reintegrated into a. tumefaciens. during infection of a plant cell by a. tumefaciens the t-dna carrying the gene of interest is inserted into one of the chromosomes of this cell. a plant is generated from a modified clone. all of its cells carry the foreign gene.
in the case of the fight against insect predators, the useful gene is carried by a fragment of dna contained in the genome of the bacterium bacillus thuringiensis. this gene, called bt, expresses a toxin responsible for the insecticidal capability of b. thuringiensis. a current application involves the protection of bt corn with respect to the corn-borer, a devastating insect whose caterpillars are particularly destructive. another, more direct, gene transfer method, known as biolistics, involves bombarding plant cells with tungsten microbeads covered with modified dna. with the implementation of large-surface-area experimental fields and the first marketing of gm soya, in 1996, the question of whether or not the advantages achieved with respect to crop yields are counter-balanced by risks for the environment and for consumers came to the fore. food risks could arise from the toxicity or allergenic power of artificially synthesized proteins. at the time of writing, this question remains unanswered, due to the lack of epidemiological studies carried out rationally over several years. when the first creations of gmos took place, the transfer of the gene of interest was carried out by means of the co-transfer of an antibiotic resistance gene. the transformed cells were selected according to the criterion of their resistance to this antibiotic, which involved a risk of dissemination of the resistance gene. this selection technique has been abandoned. in practice, it is difficult to evaluate the theoretical ecological risk of wild plants being invaded by genes that have been inserted artificially into gmos. as a precaution, zones used for experimentation of plant gmos are now surrounded by refuge zones, i.e., fields in which the same species of plants, in non-gmo form, are cultivated.
there has been a much fiercer and completely legitimate debate concerning the presence of the terminator gene in seed from the first gmos marketed by the monsanto company in the usa. the terminator gene blocked germination of the seed from the cultivated plant, so it was necessary for the farmer to buy more seed from the company each season, thus creating a state of dependency. this technique is no longer in use, but the fact remains that most transgenic seed is patented, and therefore farmers who use such seed are dependent on the companies that possess this genetic know-how. the culture of plant gmos has spread around the world, covering more than a billion hectares of our planet, more than half of which are in the united states of america. this type of culture is used on a large scale for soya and in a less extensive way for corn, rape seed and cotton, but there are many other applications of plant transgenesis. among the countries that are actively involved we may mention argentina, brazil, canada and china, and more recently india, paraguay and south africa. while the policies of these countries are based on the fact that gmo products do not differ fundamentally from non-gmo products with respect to checks carried out a posteriori, and that there is thus no reason to prohibit them, european policy has taken refuge behind a principle of precaution, and it remains basically restrictive. although the moratorium on the culture and marketing of plant gmos that was put in place in 1999 was lifted in 2004, mandatory labeling for any consumable product containing more than 0.9% gmo remains dissuasive. the united states of america has refused to use such labeling. the worries that are aroused by plant transgenesis, which are often exacerbated by the diktats of ecology groups, must be analyzed in a reasoned manner.
common sense and lucid thought dictate that the debate should be situated within a scientific perspective in which the main role is played by the experimental method in long-term applications. simple reflection leads us to think that with time parasites and self-propagating plants will develop a resistance to the most drastic treatments, as was the case for bacteria confronted with antibiotics. the perspective of an acquisition of uncontrolled resistances, which gives rise to so much passionate debate, is, in fact, only the first stage of a technology with promising applications. the mastery of plant transgenesis that was acquired through the first experimentations should, in fact, allow the emergence of plant gmos that are assigned to the production of molecules with therapeutic effects (drugs, vaccines, human proteins, vitamins…). in this domain, there have already been creations that include golden rice, which carries β-carotene, the precursor of vitamin a, banana plants that express a vaccine against hepatitis b and tobacco that produces human lactotransferrin and hemoglobin. if we just look at the production of golden rice as a palliative for vitamin a deficiency, it should be remembered that, in certain countries of our planet, this deficiency affects people's sight and is a frequent cause of blindness, that it generates problems with development and the immune response to infections, that it affects more than a hundred million children around the world and that it is responsible for the death of three million of them each year. if these plants are considered to be a material of choice for the production of proteins with a therapeutic effect, this is partly due to the yield of such crops over large surface areas, and also partly due to the low risk of transmission of viral pathogens to man, because of the species barrier, a risk that is less negligible when animal productions are involved. 
genetically modified plants are also potential factories for the manufacture of chemical products with an industrial impact, for example lubricants, perfumes and aromas. given the unpredictable outlets that plant gmos may have in human medicine and the different domains of the economy, plant gmo technology should be considered in a manner that is free of any pressure or passion, and, as far as the political authorities are concerned, it should be subject to appropriate measures to surround and protect certain strategic experiments. when looking at the worries being expressed by european society, it should be remembered that the genetic inheritance of plants has never ceased changing, not only in the most natural of manners, over millions of years, particularly with the mobility of transposable elements located in the genome, but also artificially, at the hands of farmers from ancient times onwards, with their methods of hybridization and selection. the nervousness of european authorities, showing an ignorance of basic scientific ideas, with the pretext of a principle of precaution, and sometimes political compromises that are exemplified by fluctuating and contradictory positions, runs the risk, in the short term, of causing their countries to lag disadvantageously behind the united states of america, which holds the majority of plant biotechnology patents.
the principle of gene therapy is simple: the introduction of an appropriate gene into the cells of a patient who carries a mutation can correct the phenotypical consequences of this mutation, or, in other terms, cure the disease affecting the patient, or at least slow down its evolution. the technical difficulty involved in gene therapy is that of finding an appropriate vehicle or vector for the transfer of the gene and addressing it to an appropriate location in the genome of the host cell. the most commonly used vectors in human gene therapy are viral.
a certain number of criteria are necessary for a transfer to be efficacious, including a high concentration of viral particles carrying the gene to be transferred (more than a billion viral particles per milliliter) and a good capability on the part of the foreign gene to be integrated into the host's genome. the patient's immune response remains a major worry in the use of viral vectors: at cell level it often leads to a proliferation of cytotoxic lymphocytes and, especially at humoral level, to the synthesis of antibodies directed against the viral proteins. in order to minimize its immune response, the genetic material of the viral vectors is modified. for ethical reasons, gene therapy is currently only applied to somatic cells, germinal gene therapy being rejected. somatic gene therapy has been experimented in the treatment of hereditary illnesses linked to hematopoiesis. one of the technical reasons for this choice is easy access to the progenitor cells of the bone marrow, with the aim of transfection. it was with this in mind that mouse gene therapy models were developed a few years ago. the sickle cell mouse is one of these models. human drepanocytosis (sickle cell anemia) is a serious disease that is caused by a mutation in the β protein chain of normal human hemoglobin a. the molecules of sickle cell hemoglobin s tend to aggregate and form fibers that obstruct the blood capillaries of the microcirculation. somatic gene therapy has been applied to these sickle cell mice. this involves an autograft of bone marrow hematopoietic cells transfected with a retrovirus hosting the gene coding for the β subunit of normal hemoglobin. encouraging results have shown the validity of this approach. in 2000, a gene therapy protocol that had been applied with success to man was described by the group of alain fischer (b. 1949) and marina cavazzana-calvo at the necker hospital in paris 30 (science, vol. 288, pp. 669-672). 
the purpose of this therapy was to bring about a long term remission in the case of an immune disease known as scid-x1 (severe combined immunodeficiency linked to a mutation on the x chromosome). because of their susceptibility to microbial and viral infections, babies who are affected can only survive in sterile rooms. they are known as bubble babies. in this illness, the hematopoietic progenitor cells of the bone marrow are unable to differentiate into t and nk (natural killer) lymphocytes because of a mutation that affects a cytokine receptor. previous experiments carried out on model mice show that scid can be corrected by in vivo transfer of the cytokine receptor gene into hematopoietic progenitors. the transfer of the gene of interest paired with a retroviral vector was carried out first in march 1999, in two babies, one of them eleven months and the other two months old. progenitor cells from their own bone marrow, cultured and modified genetically, were injected into them. these were therefore autografts, without any risk of immune rejection. a remission of symptoms over a period of nearly a year, shown by the almost normal behavior of the babies' immune cells, encouraged the application of the same therapy to other babies. in total, ten babies were given this therapy. the enthusiasm that greeted the successes that were recorded was nevertheless tempered by the fact that in the spring of 2002, and again in the following year, a child who had undergone the gene therapy developed a leukemia characterized by an anarchical proliferation of lymphocytes, necessitating chemotherapy. these two occurrences were explained by the random character of the insertion of the gene of interest into the patients' genomes: insertion into a site close to a proto-oncogene had led to activation of this proto-oncogene and the proliferation of the lymphocytes.
while the trial carried out at the necker hospital gave rise to great hopes, it nevertheless showed that there is still a long way to go before we achieve a targeted transfection of genes so that no undesirable consequences follow. here we have a typical example of the limits of an experimental method that is based on an in-depth technological know-how, but also on a still imperfect understanding of the complex arcana of the mechanisms that regulate the positioning and interaction of genes in the chromosomes of eukaryotic cells. this example highlights a harrowing ethical dilemma: should we not treat a patient whose illness is likely to be fatal, or attempt a therapy that may save the patient, without having any formal assurance of its success? an experimental medicine that has the power to modify the human organism via its genetic material is now able to take over from the experimental method that up until now operated on animals and plants. we can easily understand, given the progress that has already been accomplished and that which is to come in the domain of gene therapy, that the temptation will be great, in the future, to consider manipulations of the human germ cell genome as being licit, insofar as such manipulations make it possible to eradicate a handicapping defect in our descendants. at present, the idea of any attack on the germinal genetic inheritance has been rejected unconditionally on the basis of ethical considerations. nevertheless, the history of science shows that prohibitions that were once considered to be untouchable end up being contravened. this was the case for abortion. in a text entitled why genetic engineering should continue its battle 31, james watson writes of his confusion when faced with a choice that is likely to become more and more insistent over the years: "dare we be entrusted with improving on the results of several million years of darwinian natural selection?
or do the human germ cells represent on the contrary rubicons that geneticists will never dare to cross?"
a mastery of the differentiation of stem cells and of cloning are two essential weapons in the biotechnological arsenal, the use of which for utilitarian ends, particularly in human medicine, gives rise to hope and disquiet, agreement and disapproval. at the beginning of the 1960s, experiments carried out by the canadian biologists ernest mcculloch (b. 1926) and james till (b. 1931) attracted attention to the particular properties of cells in the bone marrow, the stem cells, which would subsequently be found in other tissues 32. the experimental protocol is simple. bone marrow cells from a mouse are injected into another mouse that has previously been irradiated in order to destroy its stem cells. the injected cells go to the spleen where they divide and form colonies that take the form of nodules of different sizes. the researchers realized that the cells of these nodules present differences in their potential for renewal, which is more or less rapid. they reinjected the nodule cells into mice from a second batch. the reinjected cells showed themselves capable of multiplying and generating several types of blood line. these observations suggest the presence in the nodules of progenitor cells that have a strong potential for self-renewal and self-differentiation. in the following years, these observations were confirmed and explained by two characteristic criteria of stem cells: self-renewal and differentiation into multiple cell lines with specific characteristics. from this point on, it was possible to understand the enigma of the amputated hydra in the experiments carried out by trembley, two centuries beforehand (chapter ii-3.4.2). we now understand why, like the hydra, organisms like the flatworm, the salamander, the starfish and the zebrafish are able to recreate an amputated or damaged part of their bodies.
the hydra mobilizes stem cells that it has preserved since its birth. in the case of the salamander, regeneration involves the reprogramming of cells that have already been differentiated. like all stem cells, embryonic stem cells (or es cells) are able to self-renew and differentiate into the different types of known adult cell line, giving rise to different types of cell such as neurons, cardiac cells that are able to contract, or hepatocytes (figure iv.12). this potential has led to the hope that es cells could be used in regenerative medicine.
figure iv.12: in a fertilized egg that has developed to the blastocyst stage, it is possible to distinguish a cell mass (inner cell mass, icm) which protrudes inside the blastocyst. the icm cells are removed and placed on a mat of irradiated (and thus unable to divide) fibroblasts that provide them with a support and nutrients (steps 1 and 2) so that they can proliferate. the stem cells arising from the icm cells, placed in a medium that has been specifically conditioned to provide cytokines and other biomolecules, are able to differentiate into various cell types (step 3).
at what stage of embryo development is it possible to remove es cells for experimental purposes? after fertilization by a sperm cell, the ovum undergoes a series of divisions that give rise to a microstructure, the blastocyst, the cells of which are called blastomeres. each isolated blastomere remains capable of producing an entire organism, fetus and placenta, by division and differentiation. at this stage, blastomeres are totipotent. five days after fertilization, the embryo has the form of a hollow sphere. an external layer of cells, the trophoectoderm, surrounds a cavity, the blastocele, inside which a small mass of cells, the inner cell mass, protrudes. from the beginning of the implantation of the blastocyst in the uterus, the trophoectoderm evolves to form the placenta.
the cells of the inner cell mass take part in the process of differentiation that generates all of the tissues of the future adult organism. these are called embryonic stem cells (es cells). es cells are said to be pluripotent. isolated, they have lost their ability to give rise to a complete individual, but they have maintained the possibility of differentiating, according to their environment, into any of the two hundred cell types that make up animal tissues. during their division, es cells evolve from a stage of being pluripotent to a stage of being unipotent, passing through a stage of multipotency beyond a hundred cells. a state of multipotency characterizes cells that give rise to a restricted number of cell lines in the tissues in which they nest. this is the case for the hematopoietic stem cells of the bone marrow that form the red blood cells and the white blood cells. the term unipotent refers to the progenitors, which give rise to a single type of cell, for example the hepatocyte of the liver or the cardiomyocyte of the heart. when es cells are cultivated for 4 to 7 days in a conventional nutritive medium, they multiply and aggregate. if the culture medium is supplemented with certain biomolecules such as insulin, retinoic acid, transferrin or fibronectin, the differentiation of the es cells is oriented towards cells of different types, such as neurons, glial cells or muscle cells. there are many publications about experiments concerning the grafting of differentiated stem cells in the mouse or the rat. for example, neuron precursors derived from the spinal cord or the brain are grafted into rats whose spinal cords have been injured. five weeks after the grafts are carried out, the transplanted cells have filled the area of the injury and differentiated into oligodendrocytes, astrocytes and neurons. what is more, after about twelve weeks, locomotive function has been partially restored 33.
other experiments involving the grafting of differentiated stem cells have been carried out on rats in which the dopaminergic neurons of the "substantia nigra" of the brain that secrete the neurotransmitter dopamine have been selectively destroyed by injection of 6-hydroxydopamine. the problems found in the rat as a result of this neuronal degeneration mimic those found in man in patients suffering from parkinson's disease. dopaminergic neurons obtained by the differentiation of mouse es cells are grafted into the striatum of each of these rats, a region of the brain whose neurons communicate with those of the substantia nigra and play a fundamental role in the control of movement. this results in a significant improvement in the motor deficit, coupled with the establishment of functional synapses between the injected neurons and those of the host 34,35. a recent publication bringing together the results of two french research teams, that of michel pucéat (b. 1961) in montpellier and that of philippe menasché (b. 1950) in paris, provides interesting information about how mouse embryonic stem cells, grafted into sheep cardiac tissue where an infarct has been artificially induced, are able to colonize the infarct zone and regenerate cardiac contraction in a functional manner. moving from the mouse to the sheep constitutes a considerable species leap, and the absence of any immune rejection leads us to say that embryonic stem cells have an "immune privilege" 36. the use of es cells in regenerative medicine necessarily requires that their differentiation be regulated in an exhaustive manner into well-defined pathways, in order to produce homogeneous cell lines with a view to implanting them in damaged tissues. in fact, contamination with non-differentiated es cells is likely to cause tumors (teratomas) over the long term.
the mastering of the use of es cell culture and differentiation, as well as of cloning, in such a way as to overcome problems of histocompatibility, is still in its infancy. for a long time, the mouse was the preferred animal model for experimental studies on the differentiation of es cells. in 1981, the first es cells from mouse blastocysts were isolated and successfully cultured by two groups of researchers in great britain and the usa. it was only in 1998 that human embryonic stem cells (hes) were isolated for the first time 37 and held in culture, on a nutritive layer of irradiated mouse fibroblasts. this delay with respect to the ability to culture animal es cells can be explained by the fact that the molecular machinery that activates replication and cell differentiation programs is not completely identical in man and the mouse 38 . for example, a cytokine called lif (leukemia inhibitory factor), which is indispensable for the renewal of es cells in an undifferentiated state in the mouse, has no effect on human es cells. there are several other differences concerning the control of proliferation and differentiation in human and murine es cells by growth factors. briefly, the conclusions obtained from experiments carried out on mouse es cells cannot be transposed directly to human es cells. while there is a highly promising future for the use of es cells, this future is littered with obstacles, and rigorous checks and balances need to be put in place. nevertheless, research on such cells is essential if we wish to move on to a regenerative medicine that aims to be a new frontier in the art of healing. after specific differentiation, hes cells could provide unlimited quantities of the tissues needed to replace damaged tissues responsible for handicapping illnesses (dopaminergic neurons in parkinson's disease, cardiomyocytes in myocardial infarction, pancreatic islets of langerhans cells in diabetes, fibroblasts in skin grafts, chondrocytes in rheumatoid arthritis).
in addition, metabolic analysis of hes cells carrying defective genes whose phenotypical expression is known in human pathology should improve our understanding of the perturbed mechanisms, and could lead to pharmacological advances. as well as the technical difficulties involved, which have not yet been adequately overcome, the handling of hes cells is subject to much ethical debate in many countries, with those who object to it holding to their prejudices, which are linked to religious or cultural traditions. this is the case in france, where, nevertheless, a few timid dispensations had begun to appear at the time of writing. in contrast, in great britain, the law authorizes the isolation of hes cells for therapeutic purposes, using embryos of less than one hundred cells, produced by in vitro fertilization, and surplus to requirements. the british response to the burning question of whether an isolated hes cell may be considered as a potential human embryo is clearly "no", for, in order to be able to develop in utero, such hes cells would need to have the placental progenitor cells. an alternative to the use of es cells is to make use of adult stem cells. however, the proliferation capacity of adult stem cells is considerably lower than that of their embryonic homologues. the hematopoietic stem cell is the paradigm of the adult stem cell. it can differentiate into all known types of blood cells. in the last decade of the 20th century, several publications concerning the plasticity of the adult stem cell awakened a hope that these cells could transform the treatment of degenerative illnesses. certain of these publications stated that adult bone marrow stem cells, implanted into different types of tissues, differentiate into hepatocytes, cardiomyocytes or neurons, depending on the specific environment.
careful re-examination of the techniques used revealed that, in certain cases, interpretation of the results as showing cell transdifferentiation was an erroneous one, and that the fusion of the bone marrow stem cells with cells from other tissues was a more plausible explanation. in any case, while not ignoring the use of adult stem cells, experimentation on hes cells remains a judicious choice, given our current state of understanding. in france, the 2004 law application decree that was issued on the 7 th of february 2006, revising the restrictive bioethical standards of 1994, opens up the possibility of using human embryonic stem cells for scientific purposes, with certain ethical reserves being maintained. one of the obstacles to the stabilization over time of a stem cell graft in a receiver involves the phenomenon of rejection for reasons of histocompatibility. considered to be foreign by the receiver (host), grafted stem cells coming from a donor are rejected. this obstacle could be overcome by using the technique of cloning. based on experiments on several animal species, it is now accepted that the transfer of the nucleus of an adult somatic cell from a host into an enucleated oocyte makes it possible to obtain from this oocyte, which is once again nucleated, and which is the equivalent of a zygote and able to divide, es cells whose genome is identical to that of the host. because of this, the es cells are immunologically compatible with the tissues of the host. in man, such cells could be directed by differentiation towards stable cell lines creating well-defined tissues and organs (liver, muscle…) that could be used in regenerative medicine. this is the principle of therapeutic cloning. in march 2004, korean veterinary researcher woo suk hwang (b. 
1953) and his co-workers, who were recognized experts in animal cloning, announced in the american journal science 39 that they had succeeded for the first time in obtaining around thirty human blastocysts by cloning, i.e., by the transfer of nuclei of somatic cells into enucleated ova. this first experiment involved autologous cloning (somatic cell nuclei and enucleated ova taken from the same woman). hwang and his team used 176 ova, and the yield from the experiment was close to that obtained at that time for the cloning of mammals. using the inner cell mass of one of the blastocysts, they isolated a line of embryonic stem cells able to maintain a normal karyotype after several dozen divisions. this publication, which appeared in a highly prestigious scientific journal, triggered an enthusiasm in the media that was in keeping with the spectacular nature of the team's exploit, tempered here and there by a few comments that were mainly linked to questions of medical ethics. in 2005, there were numerous other articles by the same team on the same subject, reinforcing the first results with a heterologous cloning technique (somatic cell nuclei and enucleated ova taken from different people), thus giving rise to great hopes that the era of regenerative medicine was near. at the beginning of 2006, professor hwang's retraction of all his work, and a public confession of a spectacular fraud, were even more dramatic, offering certain media an occasion for a disproportionate level of fury against therapeutic cloning. however, despite such rear-guard actions, it is obvious that one day these technical difficulties will be overcome. human cloning to obtain stem cells for therapeutic purposes is bound to be part of the future. once this aim has been achieved, it will be spoken of as the outcome of a long story. the adventure of animal reproductive cloning began in 1960.
in developmental biology 40 , two american researchers, robert briggs (1911-1983) and thomas king (1921-2000), described experiments involving the transfer of cell nuclei of embryos from a frog (rana pipiens), at the blastula and gastrula stages, into enucleated eggs of the same species. a high percentage of the clones obtained in this way were able to reach the tadpole stage when the transferred nuclei came from the early blastula stage, but only mediocre success was achieved when the nuclei came from the later gastrula stage. these experiments emphasized both the totipotency of the embryo somatic cells and the equivalency of the somatic cell nucleus and the nucleus of the fertilized egg in cell division and differentiation. briggs and king's publication did not arouse any particular interest. it is true that the 1960s were dominated by the saga of molecular biology, which would reach its culmination in the deciphering of the genetic code. from the 1980s onward, the first attempts to clone mammals (rat, mouse, pig) began. moving from the amphibian egg, which was a millimeter wide, to a mammal egg that was one hundred times smaller, presented a technical difficulty that would be overcome by a technique of cell-to-cell electrofusion. cloned embryos were thus obtained by nuclear transfer and then implanted into the uterus of a surrogate female. however, in all cases, the nucleus came from embryo cells. in february 1997, the announcement made by ian wilmut (b. 1944), keith h. campbell (b. 1954) and their collaborators at edinburgh's roslin institute 41 of the birth of the cloned lamb dolly had an immediate effect in the media. in fact, this was not only the cloning of a higher mammal, but, above all, the cloning by insertion of an adult somatic cell, in this case a mammary tissue cell, into an enucleated oocyte.
this went far further than the experiment carried out by briggs and king, which essentially involved the transfer of embryo cell nuclei into enucleated frog eggs. the trick that gave wilmut and campbell their success was to bring the cells providing the nuclei to a quiescent state corresponding to the interphase stage of the cell cycle, by impoverishing their culture medium, before electrofusion with enucleated oocytes. although we should be aware that 434 attempts were made before a positive result was achieved, this does not make it any less astonishing that the nucleus of a cell in its adult state, i.e., completely differentiated, was able to behave as if it were totipotent. despite being committed to a program of differentiation that is considered to be more-or-less irreversible, and which will give it a specific identity, the nucleus of an adult cell can be reprogrammed and become totipotent. since dolly, many other mammals have been cloned from nuclei of adult cells: mice, cows, goats, pigs, rabbits, cats, dogs, rats and horses. as far as ethical discussion about cloning is concerned (chapter iv.5), it is essential to note that the demarcation line between reproductive cloning and therapeutic cloning is situated where decisions are made concerning the destiny of the cloned blastocyst (figure iv.13). the transfer of the nucleus of a somatic cell (liver, epidermis, muscle) containing 2n chromosomes into an enucleated oocyte gives rise to an egg (2n chromosomes) that is able to divide and to produce a blastocyst. the cells of the blastocyst inner cell mass (icm) can be used as stem cells that can differentiate into different types of cell line (therapeutic cloning). on the other hand, if the whole blastocyst is implanted into a uterus, it will produce an embryo which, after birth, will grow into an adult animal (reproductive cloning).
reproductive cloning and therapeutic cloning therefore differ because of the fact that in reproductive cloning, the whole blastocyst is used, while in therapeutic cloning, only certain cells, corresponding to the inner cell mass (icm) of the blastocyst, are used. the structural and functional identity of the cells of a given tissue in an adult organism involves a basic mechanism: while each cell has the same set of genes, only some of the genes are expressed as proteins and the genes that are expressed differ according to the tissue involved. the key to the mechanism responsible is in the epigenetic type chemical modifications of cell dna, for example methylations, which repress the expression of certain genes without altering the expression of others. these modifications of the dna, which control cellular specificity (muscle, liver, brain…) are not very reversible, but, in certain circumstances, they can become so. this is what happens from time to time when the nucleus of an adult cell is inserted into an enucleated oocyte. we are thus able to assume that in the molecular arsenal of the oocyte cytoplasm there are substances that can cancel the epigenetic modifications of the dna present in the nucleus of an adult somatic cell and recreate a state of pluripotency in this nucleus, or, in other words, provoke the reprogramming of the somatic cell nucleus. in the long term, it is to be hoped that biochemical technology will be able to find and purify the molecules responsible for the nuclear reprogramming of somatic cells. the use of human oocytes for the purpose of therapeutic cloning is still subject to severe criticism. certain groups wish it to be prohibited, because of a fear of a drift towards reproductive cloning. to obviate this risk, the idea has been to make use not of human oocytes but of those of animals, transferring the nuclei of human somatic cells into them. 
even supposing that the technical difficulties involved could be overcome, the cells that would result, a sort of man-animal chimera, would also be the subject of an ethical debate, even if the purpose of this type of cloning were to be solely therapeutic. some japanese researchers 42 have succeeded in creating mice according to a parthenogenetic process that involves adding the nucleus of an oocyte that is haploid (1n chromosomes) to another haploid oocyte, the result being the equivalent of a fertilized egg (2n chromosomes). this exploit is achieved by the invalidation of one of the genes (h19) involved in the control of the parental imprint. it is known that sexual reproduction in mammals involves a phenomenon called the parental imprint, which, by means of the methylation of dna and perhaps also of histones, allows the expressing or silencing of certain genes in male and female gametes. a single copy of a given gene, originating either from the oocyte or from the sperm cell, is therefore expressed, while the other is inactive. in the japanese experiment, if the mouse h19 gene had not been invalidated, the result would have been an anarchical development of the responsible genes involved in the parental imprint with overexpression in the case of some and an absence of expression in the case of others. these disturbances would have been incompatible with the viability of the embryo. however limited its application might be, the manipulation of the germinal genome poses the problem of the mechanism by which the parental imprint intervenes in the viability of the egg, a parameter that at the time of writing is still not completely understood, but is being actively explored. in boston, massachusetts, in 1954, a kidney was transplanted from a healthy boy into his twin brother, who was suffering from a fatal renal anomaly. the success of this graft ushered in the era of transplantations of such organs as the heart, liver and kidney in man. 
in order to try to prevent the rejection of grafts, caused by an immune incompatibility between the recipient and the graft from the donor, different immunosuppressive treatments were tried, one by one, involving corticosteroids or cytostatic agents such as 6-mercaptopurine. in the 1980s, a decisive step forward was made with the fortuitous discovery of the powerful immunosuppressive effect of cyclosporin a, a cyclic polypeptide isolated from the mold tolypocladium inflatum. each year, human organ transplants make it possible to save many lives. however, for some time now, organ transplantation has been suffering from a shortage of donors. one alternative to the homograft is the grafting of animal organs, or the xenograft, and, at the dawn of the 21st century, this type of graft has entered an active, promising phase, with the creation of pigs that have been partially "humanized" and are thus, as a consequence, immunocompatible. for reasons of genetic and physiological similarity, the first choice for such grafts was to use apes or monkeys. however, this idea was quickly abandoned, for several reasons: a non-negligible risk of viral infection due to the phylogenetic kinship of human and simian species; a slow growth rate; a low reproduction rate and, finally, laws that protect primates. these disadvantages are not found, or are at least minimized, in the pig: the risk of a viral infection passing from the pig to man should be low because of the species barrier (but nevertheless, it ought to be evaluated), the pig growth rate is relatively rapid, pig litters are large and pig organs are of a size close to those of man. hyperacute rejection of grafts is the critical obstacle that must be overcome before it is possible even to envisage the feasibility of xenotransplantation.
hyperacute rejection is caused by the presence in man of natural antibodies (xenoantibodies) that accumulate throughout a lifetime and are directed against antigenic motifs carried by the products of the digestion of food or dust that is breathed in. xenoantibodies are mobilized when a xenograft occurs, and when they combine with xenoantigens brought by the graft this activates immune proteins such as the complement proteins. the catastrophic effect of this xenoantibody/xenoantigen combination is a vascular thrombosis followed by necrosis and rejection of the graft. the pig xenoantigen that is considered to be the one mainly responsible for the phenomenon of rejection in man is a sugar molecule, galactose α-1,3-galactose, located on the plasma membrane of endothelial cells. synthesis of this molecule requires the enzyme α-1,3-galactosyltransferase, which is present in most mammals, but absent in man and the primates. this enzyme disappeared in man around twenty million years ago, following a double mutation of the gene. in 2002, cloning by nuclear transfer, associated with the invalidation of the gene coding for galactosyltransferase, made it possible to create pigs without galactose α-1,3-galactose 43 . this performance shows that the xenotransplantation objective, although it can only be envisaged over the long term, is not based on false hopes. plant gmos, gene therapy, embryonic stem cells, therapeutic cloning, and xenotransplantation are a few of the many examples that show how far experimentation on living beings has progressed in just a few decades, from inquiries into the operating mechanism of an organ or a cell, in the interests of pure understanding, to a programmed process, planned with an objective in mind, the chances of success of this objective being analyzed and counted in terms of impact and cost-effectiveness. 
during the renaissance, ecclesiastical authorities, worried by the libertarian forces that were assailing them, applied the brakes to audacious questioning of dogma such as the circumterrestrial revolution of the sun that had, since ancient times, placed man at the center of the universe. nowadays, civil authorities, conscious of the potential but also of the possible misuses of genetic manipulation, insist on having the right to oversee such procedures. in truth, since the 19 th century, governments have been interesting themselves in research on living beings and encouraging it, as long as its applications have allowed improvements in human health. this has been the case for vaccinations against infectious diseases or for prevention of microbial infections by means of aseptic or antiseptic methods. with the breakthroughs made in genetic manipulation at the end of the 20 th century, it was more than just the results of experiments on living beings that attracted the attention of the political authorities, it was, above all, the manner in which the experimental method, with all its hazards, made use of living material, sometimes of human origin, in order to unlock mysteries. conscious of the social impact of emerging discoveries that are subject to considerable media coverage and are sensationalized in both the written and audiovisual media, the state, with the help of researchers and philosophers, has laid down a code of bioethics, applied through strict or even restrictive legislation. it remains to be seen whether the rules of this code will continue to be an inviolable absolute or will be modified according to the evolution of the moral codes and the cultures of nations. university teaching and the education of a society must now take into account not only the content of successive discoveries, but also the fallout of these discoveries, insofar as they concern man, and even the ethical justification of the methods that have allowed these discoveries. 
in "remaking" living beings according to imposed norms, and in scheduling, in a certain fashion, the manufacture of life according to new codes, certain questions move from the "how" to the "why", i.e., from the scientific domain that is accessible to human thought to the metaphysical sphere, with its problems of the limit of what is surmountable and tolerable in terms of ethics. in his birth of predictive medicine, jacques ruffié (1921 -2004) reminds us that medicine has evolved through three stages over the course of time: curative medicine, which has been practiced since ancient times and is still being practiced; preventive medicine, which is more recent, and is designed to prevent people from falling ill, either by vaccinating them, in the case of infectious diseases, or by recommending an appropriate diet and medication in the case of metabolic disorders such as diabetes or arterial hypertension that have been detected by means of systematic examination; and, finally, predictive medicine, a branch of medicine that is still in its early phases, and which is based on modern technology and is able to predict situations of risk because of anomalies detected in the genetic inheritance or because of exposure to environments that are reputed to be dangerous (for example, carcinogenic smoke, asbestos). about one and a half centuries ago, the publication of the introduction to the study of experimental medicine (1865) provided proof, based on scientific arguments, that the time had come to transfer the experience that had been acquired through the experimental method practiced on animal models to the ill person. after claude bernard, attentive to the progress made by ideas and techniques in the physical and chemical sciences, and making use of its own advances in the understanding of the living cell, both normal and pathological, experimental medicine was to live through a development that was without precedent in the history of humanity. 
to understand the causes of epidemics, nutritional deficiencies, metabolic deviations of hereditary origin and degenerative illnesses, and then to translate these causes into cellular and molecular terms: this was the process undertaken by medicine once it began to use the experimental method. in fact, for several decades, from the beginning of the 19th century, medicine had already undergone some major revisions of outdated practices and had inaugurated a new era in diagnosis. for example, the differential diagnosis of pulmonary ailments became possible because of the invention of the stethoscope by rené laennec (1781-1826) and the practice of auscultation and percussion by joseph skoda (1805-1881), the uncontested master of the vienna school. in france, pierre louis (1787-1872) used statistical methods to evaluate the efficacy of different treatments. armand trousseau (1801-1867), a pupil of pierre bretonneau (1787-1862), wrote a famous treatise on the hôtel-dieu medical clinic in paris. in great britain, chronic nephritis, with its identifying symptoms, was described by richard bright (1789-1858), paralysis agitans by james parkinson (1755-1824) and addison's disease, which affects the adrenal glands, by thomas addison (1793-1860). during the 19th century, many other famous names signaled the arrival of a medicine that was resolutely anatomoclinical in nature, in line with bernardian doctrine. "experimental medicine is thus a medicine that claims to understand the laws of the organism in sickness and in health, in such a way that it not only predicts phenomena, but also in such a way that it can regulate and modify them, within certain limits." in the introduction to the study of experimental medicine, claude bernard stigmatizes the relics of an empirical medicine that was still being practiced in his day and was forgetful of rationalism.
the terms that he uses are without leniency: "i have often heard doctors who, when asked the reason for a diagnosis, reply that they don't know how they recognize such a case, but it is obvious, or who, when asked why they administer certain remedies, reply that they don't really know how to put it exactly, and that anyway they are not required to give a reason, because it is their medical tact and their intuition that guides them. it is easy to understand that doctors who reason in this way are denying science. what is more, it is impossible to be too forceful in rising up against such ideas, which are bad not only because they stifle any scientific seed in the young, but also, above all, because they favor laziness, ignorance and charlatanism." in order to evaluate the meaning of these words, it should be remembered that in claude bernard's time, the medical profession was far from considering the microscope as a useful instrument for the study of cell structures and that the cause and effect relationship between bacterial germs and infections was still to be shown. with the development of increasingly effective instruments for exploration, and of methods for microanalyses concerning a wide range of blood and humoral constants, throughout the 20 th century, medicine, which was once empirical, has now become scientific. claude bernard's dream, experimental medicine, is now operative. this medicine is no longer content simply to determine the cause of an illness and to locate the affected organ, which was the major objective of clinical medicine, but it seeks to detect the mechanisms of pathological processes by means of histological and physicochemical explorations. this medicine is no longer willing to passively monitor the evolution of an infectious disease. after having identified the responsible germ, it tries to target this germ with the chemical weapon that is able to selectively destroy it. 
this medicine is no longer content simply to find remedies; it aims to understand their mode of action. it sets itself the goal of meeting challenges such as finding the genetic cause of degenerative illnesses or of cancers and developing appropriate therapies. it is supported by statistical data. when a new drug is introduced, the results are now evaluated by the double blind method: neither the patients (treated and non-treated) nor the investigators are aware of who has been given the drug and who has been given a placebo. in the surgical domain, audacious techniques have also led to considerable progress, particularly in neurosurgery and in cardiovascular surgery. thanks to robotics and to computer technology, remote surgery or telesurgery has become practicable, although until not that long ago, it was only to be found in fiction. faced with emerging problems in public health, the task undertaken by experimental medicine is immense. in the middle of the 20th century, the spectacular recovery from high-incidence infectious diseases such as pneumococcal pneumonia, meningococcal meningitis or acute forms of tuberculosis, which was brought about by antibiotics, gave rise to the idea that medicine had won a battle against the microbial world and that, from then on, it would be able to control the evolution of infectious diseases and to offer rational treatments. the gradual appearance of microbial resistance to antibiotics has brought an end to this euphoric era. penicillin, for example, which was put on the market at the end of the 1940s, was active against practically all strains of staphylococcus aureus. sixty years later, more than 90% of the strains of this same microbe are resistant to penicillin. the incidence of nosocomial infections, which are contracted in health care facilities, never ceases to rise.
at present, around 10% of the hospitalizations that take place are complicated by the patient developing a nosocomial infection. equally worrying are the re-emergence of diseases that were once considered to be under control, such as tuberculosis or poliomyelitis in africa, and the emergence of new diseases such as aids, whose virus, hiv (human immunodeficiency virus), which was identified at the beginning of the 1980s, has generated a pandemic that has spread throughout the planet. infectious diseases are currently responsible for more than a quarter of human deaths. the koch bacillus responsible for tuberculosis and the pneumococcus kill three to four million people a year, around the world. in 2004, hiv killed more than three million people, and more than forty million people are infected. one person is infected every 30 seconds. in viral diseases, the role of vectors (insects, various animals) as well as the notions of contagiousness and aggressivity have been emphasized. we have only to remember the dreadful contagiousness and aggressivity of the spanish flu virus (influenza virus a(h1n1)) which, in 1918-1919, killed more human beings around the world than the first world war that preceded it. in contrast, the sars (severe acute respiratory syndrome) epidemic of 2003, the vector of which was doubtless the civet, a small carnivore raised in china and desired for its meat, was rapidly contained because of its low contagiousness and also because of the isolation measures that were taken. human behavior is not without its effect on the emergence of viral diseases. the growth in intercontinental travel and human migration, as well as intensive deforestation in africa and south america, which bring virus vectors into contact with man, are factors in the emergence of viral diseases that risk being explosive and devastating. in this context, the history of the ebola virus and of the marburg virus, which cause violent hemorrhaging, is edifying.
in 1967, in the german town of marburg, an epidemic of unknown origin broke out, the illness manifesting itself with brutal suddenness by vomiting, diarrhea, a high fever and an increased tendency to bleed. this pathology, which was contained rapidly by means of drastic isolation measures, was found to be of viral origin. the pathogen concerned was a filovirus (filiform virus). a brief enquiry showed that the origin of the epidemic was contact between technicians of a pharmaceutical company and monkeys that had been imported from uganda and that were carrying the virus. in 1976, two other epidemics, characterized by severe and often fatal hemorrhagic fevers, were reported in the sudan and in zaire (now the democratic republic of the congo). here again, the illness was caused by a filovirus, the ebola virus. at the time of writing, only public health organizations, including the nih (national institutes of health) in the usa, have attempted to set up vaccination and therapeutic strategies. research on these dangerous viruses requires high security installations that are particularly costly, so that private companies are reluctant to invest in work that is only targeted on poor regions and which concerns epidemics that have so far been contained successfully, although one day the ebola and the marburg viruses could quite well escape their african niches. experimental medicine must also take up the colossal challenge of the five thousand hereditary diseases that are currently listed, the most handicapping of which are myopathies and neuropathies. given the means that are available to the contemporary clinician in order to assign each of these diseases to a genetic defect, one can only be amazed by the mass of information about them that has accumulated over a century, since the first diagnosis of a hereditary disease, alcaptonuria, which was made in 1902 by archibald garrod (1857-1936), a doctor at london's st bartholomew's hospital.
alcaptonuria is a non-serious genetic flaw that can be detected easily by a blackening of the urine. it is the result of a blockage caused by the mutation of an enzyme involved in the catabolism of an amino acid, tyrosine, this blockage leading to the accumulation of homogentisic acid, the polymerization of which gives rise to a brownish color. the patient examined by garrod was a young boy. investigation of the family history revealed that transmission of the flaw was correlated to cross-cousin marriages and followed mendel's laws for recessive traits. garrod demonstrated other hereditary-type anomalies, cystinuria, porphyria and pentosuria. in 1909, these observations were published in a work that became a classic: inborn errors of metabolism. in 1956, the specific molecular defect of a metabolic anomaly linked to a mutation was identified for the first time by the german-born british biochemist vernon ingram. this was the hemoglobin defect responsible for drepanocytosis or sickle cell anemia: a glutamic acid in the β chain is replaced by a valine. the consequence of this simple change is a modification of the structure of the hemoglobin, leading to a sickle-shaped deformation of the red blood cells, the increased fragility of these cells and also a tendency towards cell lysis. this discovery made use of the electrophoresis and chromatography techniques that had just been introduced in biochemistry (chapter iii-6.2.2): such a discovery would not have been possible without these techniques. because of the progress made in molecular biology, the nosological framework of hereditary diseases has been greatly enriched over the last twenty years. for example, at present, more than one hundred hereditary-type myopathies have been identified by accurately locating molecular lesions in the genomic dna and characterizing the structural and functional modifications of the mutated proteins. certain health problems present real challenges for experimental research. 
this is the case for the spongiform encephalopathy caused by a prion (proteinaceous infectious particle) , which has all the more impact on the imagination because its etiology remains a mystery. it is also the case for degenerescence of the central nervous system correlated with aging, alzheimer's disease being a striking example, although, as far as familial forms of this illness are concerned, i.e., those of the hereditary type, it has been possible to link the invasion of the brain by a so-called amyloid peptide, which accumulates in plaques, on the one hand, and, on the other hand, the absence, due to a mutation, of an enzyme, a peptidase, which normally degrades the amyloid peptide. contemporary scientific medicine sometimes acquires a revolutionary aspect. here again, as with other disciplines involved in the study of living beings, it arises from discoveries resulting from the principle of serendipity (chapter iii-2.2.3). this was the case when, in january 1987, a team in grenoble, france 44 , led by the neurosurgeon alim-louis benabid (b. 1942) and the neurologist pierre pollack (b. 1950) discovered by accident that in patients affected by parkinson's disease a beneficial effect was achieved by deep, high-frequency electrical stimulation of the brain. the three major symptoms of parkinson's disease are muscular rigidity, a tremor when at rest and a slowing down of the execution of movements. in the 1960s, the swedish team of arvid carlsson (b. 1923) , who won the nobel prize for physiology and medicine, demonstrated a relationship between the parkinson syndrome and a deficit in the secretion of a neurotransmitter, dopamine. a group of neurons that is limited to half a million (of the 100 billion contained in the brain) produces this neurotransmitter in a small structure located in the midbrain, called the substantia nigra. 
the neurons of the substantia nigra have elongations that interact with different nerve formations (called nuclei) including the subthalamic nucleus. in 1990, bergman et al. 45 published an article that describes a curious relationship between a provoked lesion of the subthalamic nucleus and the disappearance of the signs of parkinson's disease in a monkey which had been made parkinsonian by chemical treatment. this publication led the team in grenoble to target their electrical stimulation on the subthalamic nucleus. this was completely successful. this electrical stimulation procedure, which is now well-codified, involves using stereotactic neurosurgical techniques, controlled by magnetic resonance imaging, to implant an electrode into the subthalamic nucleus. the electrode is connected to a generator that is implanted under the patient's clavicle. the generator sends brief electrical impulses of frequencies from 100 to 200 hz. under the effect of this stimulation, the characteristic symptoms of the illness, particularly the static tremor and bradykinesis, regress in a spectacular manner. the mechanism by which this stimulation acts is not yet understood. no doubt this has to do with complex phenomena involving the inhibition of certain neuron relays near to the substantia nigra, which remain to be deciphered. here we have a typical case of a progression from an experimental fact, discovered by accident, towards the analysis of its cause. from the point of view of the experimental method, it is interesting to make parallels between this discovery by serendipitous means of the beneficial role of electrical stimulation of the midbrain in parkinson's disease and cartesian style programmed research that aims to graft into the brain of parkinson's disease sufferers embryonic stem cells differentiated into dopaminergic neurons 46 . 
civilian society and its armed force, the political authorities, have understood that experimental science has the tools, the method and the thought processes necessary to develop strategies for prevention and healing. immune cells of this person and induces the synthesis of immune proteins. finally, it seems relatively certain that hopes concerning gene therapy for hereditary diseases will be fulfilled before too long (chapter iv-2.3.1). one of the traits that is characteristic of the period we live in, and which arises partly from the economic stakes involved, is the shortening of the time that elapses between a discovery being made and the application of that discovery. for example, interfering rnas, which were discovered in the 1990s (chapter iv-1.2.2), are already the subject of therapeutic investigation. more than a hundred biopharmaceutical companies around the world are using them with a view to producing drugs 47. in mice, a certain number of synthetic interfering rnas have proved their efficacy in silencing genes which, following mutation, have acquired carcinogenic potential. however, the use of interfering rnas as therapeutic agents requires them to be stabilized, because they are fragile molecules. the group headed by achim aigner 48 (b. 1965), at the school of medicine in marburg, germany, managed to stabilize a synthetic interfering rna by complexing it with polyethyleneimine, and this interfering rna is able to block the expression of a receptor involved in carcinogenesis (the c-erbb2/neu (her-2) receptor). used in mice, such a drug appears promising. despite the undeniable progress that has been made, experimental medicine is still some way from finding solutions to some of the enigmas that it meets along the way, and which underline the complexity of living beings.
some time ago, it was thought that after having invalidated a gene coding for a protein that is indispensable to a function, we would discover the secret of a cause-and-effect relationship. experimental practice has shown that, generally, this is far from being the case. another example of the complex relationships that exist in living beings is the interference of the mental and the organic. one experiment that suggests this interference was carried out on mice who had acquired a form of pathology similar to huntington's chorea, by transgenesis. mice from the same line were separated into two batches, one acting as a control, and the other being subjected to daily mental stimulation, including memorization tests. unexpectedly, the appearance of symptoms was noticeably slowed down in mice who had been subjected to mental gymnastics 49 , as if the brain, by intentionally mobilizing its neuron activity, was able to secrete substances able to alleviate its own defects. in short, by means of possible retroactive mechanisms that are called upon by the mind, the brain appears to act as actor and spectator. at the turn of the 21 st century, experimental medicine was being nourished by techniques inherited from experimental physics, chemistry, and even mathematics and computer technology, in the same way as the other sciences of living beings. the progress made in medical imaging techniques has been particularly impressive since the time, at the end of the 19 th century, when the x-rays discovered by wilhelm röntgen made it possible to view the structure of the human skeleton. the saga of x-radiation continued through the 20 th century (chapter iii-2.6.1). for the last few decades, new imaging techniques have come to the fore. they have spread rapidly, and been refined. 
ultrasound imaging, which is based on the principle of the reflection of ultrasound waves off of different kinds of surfaces, has become an everyday technique for viewing blood flow in blood vessels and the heart. however, it is mainly in the study of the brain that medical imaging has benefited from technical advances in the domains of physics and computer technology, and it has been innovative in assigning cognitive activities to well-identified anatomical structures. this functional neuroanatomy makes it possible, in a non-trauma-inducing manner, to monitor and locate the operation of neuron networks with great temporal and spatial precision, during various cognitive tasks such as reading and the written or oral expression of thought. the middle of the 20 th century saw the gradual development of two methods for exploring zones of cerebral activity, electroencephalography and magnetoencephalography. at present, these techniques are being taken over by mri (magnetic resonance imaging) ( figure iv.14) . the principle of mri is based on the detection of hydrogen nuclei and their differentiation according to their environment. functional mri leads to the location of the areas of the brain that are active during calculation exercises, the perception of sounds, language and objects, and memorization, with a resolution of just a few millimeters. its power of exploration is such that it has been possible to analyze the brain response, in sleeping or awake babies who are only three months old, to auditory stimuli from language that either makes sense or does not make sense 50 . the response, located in the left hemisphere and the prefrontal cortex, leads to the conclusion that, from the first months of life, there are zones of the brain that are potentially active before the first attempts at language appear. 
both in france (cea-saclay and the frederic joliot hospital at orsay) and abroad, recent mri performance has encouraged projects concerning the manufacture of instruments able to produce magnetic fields of around ten teslas, which allows an unequaled definition in the identification of areas of the brain assigned to specific cognitive functions and in the highly accurate determination of the location of pathological lesions. a technique that is complementary to mri is positron emission tomography (pet). this generally uses water labeled with oxygen 15 ( 15 o), a radioactive isotope of natural oxygen that has a very short lifetime (123 s), produced extemporaneously in a cyclotron by bombardment of an 14 n target with protons. the radiolabeled water is injected into the blood flow of the patient. it is found in greater concentration in the zones that are the most irrigated by blood capillaries. the positrons that it emits collide with the surrounding electrons and give rise to photons that can be detected by the appropriate apparatus. affected by a stimulus (whether this stimulus results from talking, writing or listening), the blood irrigation of the zones of the brain that have been specifically excited increases noticeably. the location of the positron emission provides information about the location of these zones. within a few dozen minutes, it is possible to locate a highly vascularized cerebral tumor. pet can use molecules other than water, such as organic molecules labeled with positron-emitting atoms, ( 18 f) fluorine (half life 110 min) and ( 11 c) carbon (half life 20 min). around twenty years ago, in canada, an analogue of l-dopa, the precursor of dopamine in the brain, 18 f-6-l-fluorodopa, was synthesized, and was found to be an excellent probe for determining the capture capability of the endings of the dopaminergic neurons in the striatum. in patients suffering from parkinson's disease, this capture capability is noticeably reduced. 
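the short half-lives quoted above explain why these tracers must be produced close to where they are used. the following is a minimal sketch in python, assuming simple exponential decay, N(t) = N0 · 2^(-t/T), and using only the half-lives given in the text (123 s for 15 o, 20 min for 11 c, 110 min for 18 f). the function name and the dictionary keys are illustrative choices, not part of any pet software.

```python
# Half-lives quoted in the text, converted to seconds.
HALF_LIFE_S = {
    "O-15": 123.0,        # oxygen 15
    "C-11": 20.0 * 60.0,  # carbon 11
    "F-18": 110.0 * 60.0, # fluorine 18
}


def fraction_remaining(isotope: str, elapsed_s: float) -> float:
    """Fraction of the initial activity left after elapsed_s seconds.

    Assumes pure exponential decay: N(t) / N0 = 2 ** (-t / half_life).
    """
    if elapsed_s < 0:
        raise ValueError("elapsed time must be non-negative")
    return 0.5 ** (elapsed_s / HALF_LIFE_S[isotope])


if __name__ == "__main__":
    # Compare how much of each tracer survives a 10-minute delay.
    for isotope in HALF_LIFE_S:
        left = fraction_remaining(isotope, 10 * 60)
        print(f"{isotope}: {left:.1%} of the initial activity after 10 min")
```

under these assumptions, almost all of the 15 o activity is gone after ten minutes, whereas most of the 18 f activity remains, which is consistent with the text's remark that labeled water is prepared extemporaneously.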
at present, pet involving 18 f-6-l-fluorodopa is being used to evaluate the survival of dopaminergic cells grafted into the striata of parkinson's disease sufferers 51,52 . nowadays, brain imaging techniques can be used to explore the electromagnetic anomalies of neurological or neuropsychiatric illnesses such as huntington's chorea, the different forms of alzheimer's disease or even autism, the genetic origin of which is in the process of being deciphered. a bridge has now been built between the molecular defects identified by genetics and the electromagnetic anomalies that result, analyzed by functional cerebral imaging. it was not so long ago that descartes considered that human thought was unconnected to a material support (chapter ii-3.4.3). we are not far from the era when broca located the language area in a specific zone of the brain after the autopsy of an aphasic patient (chapter iii-3.1), thus opening the door to another scientific domain, neuropsychology, which had previously only been the subject of speculation. the consequences, from the societal point of view, were far from being insignificant. thus autism, which was once suspected of being caused by errors in the mother's behavior with respect to her child, has been shown to be a disturbance in the development of the fetal nervous system, in the temporooccipital region. while the neurosciences occupy a preponderant position in the medicine of the beginning of the 21 st century, because of the development of techniques that aim to analyze even the functions of thought, emerging methodologies of another order, such as gene therapy (chapter iv-2.2), are in the process of completely modifying our ways of treating and curing a range of previously incurable human diseases, from incapacitating immune disorders to cardiovascular diseases and cancer. "it is in the domain of thought about the future that man is singled out. we are beings who have an imagination. 
not content to live in the present, to profit from past experience, we remain haunted by a future that we are conscious of constantly entering. this obsession with the future has been a powerful driving force in cultural evolution. we seek to predict in order to avoid the worst and to better prepare for our tomorrows." by predicting potential dangers in subjects who are in good health, predictive medicine aims to provide the means of avoiding these dangers. these dangers can be intrinsic in nature, being written, for example, into a certain genome dna sequence, or they can be extrinsic in nature, linked to an unsuspectedly deleterious environment. in each generation, mutations occur, certain of which can lead to so-called genetic diseases; between 3 and 4% of newborns are affected. besides these spontaneous mutations, there are also mutations arising from the genetic inheritance of the parents. the purpose of genetic counseling is to warn parents when the existence of a potentially serious genetic flaw is suspected. the highlighting of genes that give a predisposition for cancer (proto-oncogenes) is a convincing illustration of the power of predictive medicine. this involves genes that control the synthesis of growth factors, the activity of which is essential to embryogenesis and to the repair of damaged tissue. while they are normally subject to strict control by anti-oncogenes, proto-oncogenes are able to become active in an anarchical manner, under different influences, and to transform themselves into cancer-generating oncogenes. recently, mutations have been found in two genes, brca1 and brca2, these mutations giving a predisposition for cancers of the breast and of the ovary. thanks to genetic exploration, it will soon be possible to predict whether a cancer of the breast will have a rapid progression leading to uncontrollable metastases or a slow progression. 
depending on the case, patients will be subject to heavy chemotherapy or to a less aggressive treatment. in this context, targeted therapy with monoclonal antibodies is a source of great hope. while genetic inheritance has a role in cancer, the environment plays a not insignificant role as well. this is the case, for example, in lung cancer sufferers who smoke tobacco, cancer of the oesophagus in those who drink alcohol and job-related cancers in those working in factories producing colorants or materials derived from asbestos or tars. cardiovascular diseases are the primary cause of death in the more developed countries, involving either a myocardial infarction or a stroke. many risk factors for these diseases are known, i.e., metabolic deviations affecting cholesterol or the blood serum proteins involved in the transport of lipids. these metabolic anomalies result in a syndrome known as atherosclerosis, which is characterized anatomically by the deposit of fats in the form of plaques in the arteries. while genetic factors are at the origin of these metabolic problems, the latter are clearly amplified by an inappropriate diet. the role of predictive medicine is to recognize the genes that are responsible, to warn individuals of the risks they are running and to advise them about the types of lifestyle and diet that do not increase these risks. being able to predict, predictive medicine should be able to prevent by means of targeted drugs. within this context, it gives rise to reflection upon polymorphism linked to variation in a single nucleotide in the dna of the genome of an individual. known as snp (single nucleotide polymorphism), this polymorphism has proved to be a very useful auxiliary in molecular medicine. hundreds of thousands of snps are present in the human genome and several tens of thousands in genes coding for proteins. where they are located differs according to ethnic backgrounds.
among these snps, some appear to be linked to certain pathologies, such as certain forms of cancer or degenerative illnesses such as alzheimer's disease. in addition, in a small number of patients, the location of certain snps has been connected with previously-inexplicable drug incompatibilities. in line with these observations, pharmacogenomics, a branch of pharmacology that deals directly with genome sequence data, is trying to evaluate the impact of "snp variants" on the efficacy and toxicity of drugs and to understand the genetic bases that explain the differences that are observed in the responses of different individuals to the same medication 53 . rather than using a standard drug that is not very efficacious or causes adverse side effects, it might be possible, depending on the genetic profile of the patient, to use a drug that is more appropriate to his or her genetic map. it is doubtless not just a fantasy to imagine that, in 20 or 30 years' time, a patient visiting the doctor will be offered a genetic map thanks to cells taken from the buccal mucosa. finding snp variants that are known to be responsible for drug incompatibilities in such a map will make a targeted prescription possible. it will allow the detection of genes for susceptibility to an illness, at the same time uncovering targets for new drugs. pharmacogenomics, which is still called new pharmacogenetics, contrasts with old pharmacogenetics in which, having found an adverse clinical response to a certain therapy, an attempt was made to identify the protein target of the incriminated drug, and then to go back to the gene coding for this protein, and to look for the mutation responsible for the aberrant response to the drug. the existence of customized predictive medicine, which would read the destinies of individuals in their genes, would not be without its consequences in the life of a citizen. 
by registering each citizen with a genetic map, matched with a named identity card, predictive medicine might begin to take on the aspect of a janus, with his beneficent face warning subjects of potential risks of metabolic problems, and guiding them towards the actions to be taken to lower the risks, but also with his evil face delivering each individual's intimate details to the indiscreet inquisitiveness of investigators who are operating towards their own ends (insurance companies, employers…). no less worrying would be the sly but predictable transformation of the individuality of the repaired or even doped human being within a system of imposed, docilely-accepted assistance. in the 19th and 20th centuries, the methodology for biological experimentation underwent a revolution caused by the progress made in the domain of chemistry, both in analytical chemistry, with the deciphering of increasingly complex molecular structures, and in synthetic chemistry, with the large-scale production of tens of thousands of new molecules. the effects of these molecules, which might eventually be used as drugs, were tested directly on animals. it was thus that in 1910 the german chemist paul ehrlich discovered salvarsan, a derivative of arsenic, which was active against a type of treponeme, the agent of syphilis. this was the result of a systematic analysis of the effect of synthetic products, aromatic derivatives of arsenic acid, on syphilis in rabbits. salvarsan was the 606th derivative that was tested, and this is why it was called 606 for a long time before it was given the name salvarsan. sometimes, lucky chance reveals surprising and unexpected properties in synthetic molecules. this was the case for chlorpromazine, which was initially used as an antihistamine. it was luck that led to its antipsychotic activity being discovered in 1950. a new era opened up in psychiatry with the arrival of synthetic neuroleptics like chlorpromazine.
a new chemical science known as combinatory chemistry, which dates from the 1990s, has aroused an increasing amount of interest in pharmacology. this involves making two or more species of organic molecules that carry reactive functional residues react in solution or in the solid phase in such a way as to synthesize, by means of all possible combinations, a number of final and intermediary products that is situated in the hundreds or even the thousands, and which makes up a chemical library or drug library. we can directly test all of the products formed on a sample of eukaryotic cells, in order to verify their effects (for example the inhibition of an anarchical proliferation of cancerous cells), or on microorganisms in order to evaluate an antibiotic capability. we can also proceed straight away with the fractioning of the reaction products and the testing of each of the fractions. if the response is positive, fractioning is continued until the molecular species responsible for the desired effect is obtained. other evaluation parameters for this molecule, such as its absorption, its toxicity and its metabolic future (distribution in the organs, chemical modifications and excretion) are then explored, first in cells, and then in animals (rats, mice), thus comprising pre-clinical tests. these screening operations, which are said to be high-throughput, require automation and robotization aided by powerful computer technology. each year, pharmaceutical companies screen tens of thousands of different molecules on hundreds of targets. complementary to combinatory chemistry, in silico chemistry works by molecular modeling and uses computer programs for the rational design of new drugs that are able to fix onto specific protein targets.
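the combinatorial principle described above, in which every building block of one kind is reacted with every building block of the others, can be sketched in a few lines of python. this is only an illustration of how the size of a chemical library grows: the building-block names (A1, B1, C1, ...) are hypothetical placeholders, and no real chemistry or reactivity rule is modeled.

```python
from itertools import product


def enumerate_library(*building_block_sets):
    """Return every combination taking one building block from each set.

    Each product is represented simply by the joined names of its parts.
    """
    return ["-".join(combo) for combo in product(*building_block_sets)]


# Hypothetical building blocks; the names carry no chemical meaning.
amines = ["A1", "A2", "A3"]
acids = ["B1", "B2"]
aldehydes = ["C1", "C2", "C3", "C4"]

library = enumerate_library(amines, acids, aldehydes)

if __name__ == "__main__":
    print(len(library), "virtual products, for example:", library[:3])
```

the library size is the product of the sizes of the building-block sets (here 3 × 2 × 4), so three sets of ten reagents each would already give a thousand products, which matches the orders of magnitude quoted in the text.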
the purpose of molecular modeling is to provide a virtual follow-up to modifications in the reactivity of a given drug molecule as a function of the modifications imposed on its structure, for example, the addition of residues that differ according to their electrophilic or hydrophilic properties, or according to the length of their side-chain. provided there is a chemical library and we know the three-dimensional structure of a macromolecule, for example an enzyme, as well as the nature of the residues that define its active site, we can hope to select and chemically modify a substance that is able to recognize the active site of this enzyme and to make an almost perfect ligand out of it which is able to efficiently block the operation of the target enzyme. this method, which is based on computer-aided chemistry, is called "structure-based drug design", and has had some notable successes. it has made it possible to develop an inhibitor capable of blocking a protease involved in the replication of the aids virus. however, both in combinatory chemistry and in molecular modeling, the successes that have been achieved remain modest in number compared to the means that have been deployed to achieve them. in terms of statistics, out of ten thousand molecules that are recognized as being efficacious for a given target in vitro, around one hundred are chosen for preclinical trials on animals, around ten are chosen for clinical trials in man and only one will come out as a drug. the financial and economic effect is far from being negligible. it has even become a preoccupation in a system where merciless competition is the rule. in addition to synthetic chemistry, preparative chemistry, which is based on the isolation of natural molecules, is now the subject of renewed interest, due to the introduction of high-throughput techniques.
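the attrition figures quoted above (ten thousand molecules active in vitro, about one hundred tested in animals, about ten tested in man, one drug) can be turned into stage-by-stage ratios. the short python sketch below uses only these round numbers from the text; the stage labels and function name are illustrative, and the results are orders of magnitude rather than measured rates.

```python
# Round figures quoted in the text for each stage of drug development.
FUNNEL = [
    ("active in vitro", 10_000),
    ("trials in animals", 100),
    ("trials in man", 10),
    ("marketed drug", 1),
]


def stage_ratios(funnel):
    """Return (stage name, fraction kept from the previous stage) pairs."""
    return [
        (later_name, later_count / earlier_count)
        for (_, earlier_count), (later_name, later_count) in zip(funnel, funnel[1:])
    ]


def overall_yield(funnel):
    """Fraction of the initial molecules that reach the final stage."""
    return funnel[-1][1] / funnel[0][1]


if __name__ == "__main__":
    for name, ratio in stage_ratios(FUNNEL):
        print(f"{name}: {ratio:.0%} of the previous stage")
    print(f"overall: 1 in {round(1 / overall_yield(FUNNEL)):,}")
```

with these figures the steepest loss occurs at the very first selection, where roughly one molecule in a hundred goes forward, while each later stage keeps about one candidate in ten.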
high-throughput screening, which is an essential tool in combinatory chemistry, is also carried out to ensure the systematic detection and isolation of natural substances having interesting pharmacological activities such as antibiotic activities or anti-cancer activities, based on marine animals, microscopic fungi, prokaryotic organisms and various plants. for example, among the substances that have been isolated recently are cribrostatin, a specific cytostatic of melanoma cells, from a marine sponge, mannopeptimycin, a bacterial antibiotic from the actinobacterium streptomyces hygroscopicus, and a whole set of alkaloids with a cytostatic activity with respect to human tumor cells from an exotic plant of the genus daphniphyllum. the molecular diversity of the living world is such that the reserves of natural products having pharmacological activities are far from being exhausted. so far, only a small percentage of the microbial species populating the earth have been listed. the depths of the oceans harbor many unknown species. thousands of insect species remain to be discovered in the canopies of tropical rainforests. exploration of the plant kingdom is far from being complete. the listing of natural molecules having a therapeutic activity has only just begun. the hunt promises to be a fruitful one, all the more so because the high-throughput screening methods that can now be used greatly increase the efficiency of the search. high-throughput screening, applied to natural molecules, has overturned the methodological procedures that were in use until recently, which progressed through logical steps, using relatively simple artisanal analytical methods, from observation, often resulting from serendipity, to the isolation of the active substance.
thus, in the 19th century, using inherited traditional knowledge that a decoction of cinchona officinalis bark calms malaria crises, pierre joseph pelletier (1788-1842) and joseph bienaimé caventou (1795-1877) decided to isolate the active substance of this bark. from the raw extract, they purified an alkaloid, quinine, which proved to be the anti-malarial substance they were looking for. more recently, the starting point of florey and chain's isolation of penicillin from the microscopic fungus penicillium notatum was the fortuitous observation made by fleming that this penicillium secretes an antibiotic factor (chapter iii-2.2.5). there are many examples in which serendipity has been the principal factor involved in the discovery of a drug, and this will no doubt continue to be the case. the appearance of a lucky chance, after all, is not incompatible with high-throughput practices. also, it is not impossible that in the future the discovery of new natural substances will be combined with the use of combinatory chemistry, with the aim of manufacturing, from these substances, derivatives having a much greater power of action and specificity 54. to sum up, the experimental method has caused contemporary medicine to take a giant leap forward, with the discovery of increasingly high-performance functional exploration techniques, the development of therapies using molecules that are already present in nature or are manufactured by synthesis, and an increasingly advanced understanding of molecular mechanisms.
the basic idea of the pioneers of molecular biology was that the function of a macromolecule depended on its structure.
thus, perutz's elucidation of the tetrameric three-dimensional structure of hemoglobin, and of its modifications depending on the degree of oxygenation, shed a considerable amount of light on the cooperative mechanism of the transition from the hemoglobin state to the oxyhemoglobin state (chapter iii-6.2.1). in the same way, an understanding of the structure of many enzymes, receptors and transporters of metabolites has shed light on their mechanisms. in a parallel manner to the exploration of the structures and functions of proteins, that of genomes has made remarkable progress. the subtle entanglements of genomics and proteomics that have become accessible to the experimental method are the order of the day. one major challenge for post-genomics is to understand how proteins, expressed by genes, interact with one another to generate functions that characterize cellular specificity. even more ambitious are attempts to understand the operation of organs or even of living organisms, based on mechanisms that are implemented at molecular level. these attempts lead straight to an integrated biology, that is, a biology that aims to understand the overall functioning of living beings. taking as its purpose the access to emerging functions, resulting from interactions between macromolecules, integrated biology first tries to invent methods that make it possible to detect these interactions. strengthened by the information obtained, it tries to integrate this information with a mathematized language into modules that attempt to simulate living beings. 
from the simplistic procedures of the middle of the 20th century, which were justifiable within the reductionist context of this period, and which involved considering each species of protein as an autonomous functional entity, we have moved on to the idea that the different species of protein that inhabit a cell have a dialogue with one another, and that they may move from one endocellular organelle to another, depending on post-translational modifications (for example, phosphorylations) that change their conformation and, at the same time, their reactivity and their behavior. thus, an enzyme protein is not only defined by its catalytic performance with respect to a given substrate, but also by its place in a metabolic network where it interacts in a dynamic and transitory manner with a multitude of other protein species (figure iv.15a). the concept of cell signaling has also evolved. instead of considering that a cell membrane receptor, activated by fixation of an extracellular ligand (a hormone, for example), addresses the information received to an endocellular effector protein via a linear cascade of individual proteins, it has come to be postulated that communication between an activated receptor and its effector is mediated by proteins organized into interactive networks (figure iv.15b). this machinery provides more flexibility in the addressing of messages to effectors.
figure iv.15. a: the diagram on the right shows that, besides its catalytic function, protein a interacts with other proteins in the cell. b: case of the transduction of a signal that is external to the cell (a hormone, for example). the diagram on the left refers to the classical idea of signaling from a receptor r according to a linear cascade of protein-protein interactions inside the cell, leading to an effector z. the diagram on the right shows that the signal is spread through proteins organized into interactive networks.
another subject to be considered is the density of macromolecules of all types, such as proteins, nucleic acids, lipids and polysaccharides, contained by a microorganism or a eukaryotic cell, which reaches values of 300 to 500 g/liter, denoting a semi-solid state or a considerable degree of compacting. however, for technical reasons, kinetic studies carried out in vitro on isolated enzymes have been carried out with solutions that are 1 000 or 100 000 times more dilute. conscious of this difference in scale between information obtained from in vitro studies and the in vivo reality, the biology of today is trying to re-evaluate molecular dynamics within the context of a cell. this is why we are seeing the birth of an integrated (or integrative) biology of functions, which, using modeling procedures, aims to achieve an understanding of the spatiotemporal dynamics of the interactive components inside cells. this holistic conception of biological systems ("systems biology") has been made possible by progress in technological expertise in domains as varied as biochemistry, molecular biology, physical optics, electronics, nanomechanics, physical and mathematical modeling and computer technology. it is a necessary complement to the classical experimental method based on bernardian determinism which, in order to connect an effect with a cause, explores living beings in a manner that is often monoparametric and is inevitably reductionist. this signals a change in paradigm in the experimental approach to living beings. a particularly effective investigative method used to explore the dialogue between proteins is the double-hybrid method.

figure (principle of the double-hybrid method): by genetic construction, two proteins, p1 and p2, whose interaction is to be tested, are expressed in the form of fusion proteins in yeast. protein p1 is fused with the dna-binding domain of gal4 (gal4-bd), a protein that regulates the transcription of the β-galactosidase gene. protein p2 is fused with the activation domain of gal4 (gal4-ad). insofar as p1 interacts with p2 (b), the gal4 transcription regulation activity is re-established, which is verified by the transcription of the reporter gene. if the opposite occurs, i.e., in the absence of any interaction between the two domains of gal4 (a), the reporter gene is not transcribed.

the principle of this method is based on the modular nature of numerous transcription factors in eukaryotes. these factors contain both a dna-binding domain that includes a specific dna-binding site and a transcription activation domain that starts up the machinery for transcribing dna into messenger rna. these two domains can be dissociated and then re-associated in a functional manner, by forming hybrids with interacting proteins. a first protein, p1, is fused with the dna-binding domain of a transcription factor by genetic manipulation, and a second protein, p2, is fused with the activation domain of the same transcription factor. if p1 is able to interact with p2, the transcription factor is reconstituted and the reporter gene upon which it depends can be expressed. the trapping technique, which is complementary to the double-hybrid system, was developed to make it possible to identify a set of interactive proteins within a cell. a protein that is included in this set (protein of interest) is fused by genetic engineering techniques to a short polyhistidine chain (called a tag). using this tag, the protein of interest is fixed to a solid medium containing nickel ions, a material that is reactive with respect to the polyhistidine chain. in the presence of a soluble cell extract, the protein of interest binds the cognate proteins of this extract, making it possible to retrieve a complex whose components, corresponding to interactive proteins, can be resolved by denaturing gel electrophoresis and then characterized (figure iv.17).
the techniques described above are backed up by cell imaging techniques that make use of confocal microscopy, which is more directly in keeping with living reality. the optical performance level of confocal microscopes has improved lately, with the arrival of biphotonic and multiphotonic lasers that illuminate precise points of the cell. as we have seen previously (chapter iii-6.2.1), it is possible to create a protein chimera made up of a protein of interest fused with a fluorescent protein, in this case gfp (green fluorescent protein), inside a cell. there are currently several variants of gfp that are able to emit fluorescent light at different wavelengths. this has allowed the development of a technique known as fret (fluorescence resonance energy transfer ) which explores the interaction between two fluorescent proteins. in practice, two gfp variants that have neighboring emission spectra are fused, by genetic engineering inside the cell, to two proteins of interest, p 1 and p 2 , that are suspected of being interactive. if this is the case, the fluorochromes that they carry are sufficiently close that the result is a modification in the intensity of the emission fluorescence of the donor fluorochrome (decrease) and of the receiver fluorochrome (increase), which is readily detectable. all of these studies, taken together, have given rise to the idea that endocellular proteins are organized into networks, that these networks are interactive and that their location in defined compartments of the cell is dependent on epigenetic events such as phosphorylations. two attributes can be found in integrated systems: firstly, the presence of modules, i.e., interactive motifs, which, like the pieces of a jigsaw puzzle, fit together to produce a complex, coherent structure, and, secondly, the emergence of functional properties due to the newly created interactions. 
figure iv.17: the protein of interest p is expressed in the form of a protein fused to a protein "tag" t that is able to bind to a solid support with a specific affinity. the assembly is brought into contact with a cell extract. certain proteins of this extract, a, b and c, which are able to interact with the protein p, become fixed to the latter. in a second step, the tag t is freed from its attachment to protein p by means of a specific cleavage enzyme. the pabc complex that is recovered from the solid medium in soluble form is subjected to polyacrylamide gel electrophoresis, in order to separate and identify its components.

given an analytical description of the basic building blocks that are used to construct living systems, and an understanding of their modes of association in defined circumstances, it is normal to try to reconstruct, in their entirety, mechanisms that show the functioning of these systems. this idea was first applied to the yeast saccharomyces cerevisiae for different reasons, such as cell homogeneity (in principle, and, in any case, statistically speaking, all yeast cells have the same genome and the same proteome), an in-depth understanding of the genome and the proteome and the presence of a vast directory of well-characterized mutants. the use of techniques for the detection of interactions between proteins has revealed the existence of a potential dialogue of unexpected richness between a multitude of proteins (figure iv.18) in the yeast cell. it is necessary to reflect upon this evidence, which leads to the postulate that, for a given protein, there are mechanisms that restrict and select the many partners able to react with it at a precise moment. faced with a situation in which chance has the upper hand, leading to uncontrollable anarchy, it is necessary to have regulation, which is underpinned by darwinian logic.

figure iv.18: example of an interaction network involving the yeast sup45 prion protein. the line of dashes refers to experimental data concerning the interaction of sup45 with another protein, sup35. the lines in bold refer to interactions taken from experimental data, while the fine lines refer to predictions, particularly phylogenetic ones.

this logic arises from a choice of the most efficient reaction path, which is first of all dictated by the rate constants involved in the association and dissociation of molecular partners, without, however, neglecting any stochastic events that may arise. chemical modifications of proteins participate in this regulation, such as phosphorylation, glycosylation and acylation. the result at cell level is a coherent channeling of the information that is carried from a molecular signal. thus, the binding of a hormone to a receptor induces a series of modifications to the intracellular proteins that channel information towards an effector terminal, for example an enzyme responsible for the production of a metabolite with a strategic function. how can the sum of the scattered experimental data that we have concerning the catalytic capabilities of a multitude of enzymes of cellular origin be integrated into the operation of a cell? how can we envisage the gene-enzyme relationship according to current evidence concerning the complexity of the genetic message? biocomputing, or bioinformatics, a science that emerged towards the end of the 20th century, proposes to try to answer these questions. at the turn of the 21st century, with the development of increasingly powerful computer microprocessors that are able to carry out complex operations with amazing rapidity, the hope arose that it would be possible to simulate processes as varied as the regulation of the cell cycle, molecular flow in metabolic pathways and the reception of molecular signals, for example from hormones by living cells, as well as the transmission of the messages that result. the dream of an in silico virtual biology has become achievable.
the first mathematical theory of simple enzyme reaction kinetics was put forward approximately a century ago, by victor henri (1872-1940). born in marseilles to russian parents, victor henri studied in saint petersburg and then spent time at the universities of göttingen and leipzig before becoming established in paris. having an eclectic mind, studying both psychology and physicochemistry, he had the wonderful intuition that enzyme catalysis arises from a specific mechanism, different from that implemented in a chemical reaction. the study carried out by henri concerned the cleavage of sucrose (table sugar) into fructose and glucose by the action of an enzyme called invertase (sucrase). the term invertase was used because during the reaction there was a change in the rotatory power of the sucrose solution, shown by a polarimeter. analysis of the reaction suggested that an enzyme-substrate complex is formed, which breaks down to regenerate the enzyme and liberate the product of the reaction. this analysis gave rise to an equation for the rate of the enzyme reaction as a function of substrate concentration. henri published the results of his experiments both in his thesis, which he presented to the paris faculty of sciences in 1902, and in two articles that appeared in the reports of the academy of sciences 56,57. in 1913, in biochemische zeitschrift (vol. 49, pp. 333-369), leonor michaelis and maud menten (1879-1960) confirmed the results of victor henri and formulated an equation that became a classic, describing the rate of formation of a product from a substrate in enzyme catalysis. since the period of these first studies, the concepts involved in enzyme kinetics have evolved considerably. the first metabolic pathway to be deciphered was that of the degradation of glucose (glycolysis), either into ethanol in yeast, or into lactic acid in muscle tissue.
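the rate equation referred to above is usually written v = vmax·[s] / (km + [s]). as a minimal sketch, the python function below evaluates this law for a few substrate concentrations. the function name and the numerical values of vmax and km are hypothetical and serve only to show the saturation behavior; they are not taken from henri's invertase measurements.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Return the initial rate v = v_max * [S] / (k_m + [S])."""
    if substrate < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("substrate and v_max must be >= 0, and k_m must be > 0")
    return v_max * substrate / (k_m + substrate)


if __name__ == "__main__":
    V_MAX = 1.0  # hypothetical maximal rate (arbitrary units)
    K_M = 2.0    # hypothetical Michaelis constant (same units as [S])
    for s in (0.0, 0.5, 2.0, 20.0, 200.0):
        v = michaelis_menten_rate(s, V_MAX, K_M)
        print(f"[S] = {s:6.1f}  ->  v = {v:.3f}")
```

when [s] equals km the rate is half of vmax, and at large [s] the rate approaches vmax, which is the saturation that distinguishes enzyme catalysis from a simple first-order chemical reaction.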
after this, researchers became aware that the activity of enzymes could be modulated as a function of covalent modifications of amino acid residues of their protein chain (phosphorylation, dephosphorylation, acylation…). metabolic flow analysis led to the idea of the limiting reaction. in the 1970s, the idea that there is a single limiting reaction in a chain or a cycle of reactions gave way to the idea that metabolic control is distributed over several reactions, and that each reaction has its own, more or less intense, control strength. another complexity factor came to light with the discovery of allostery 58. allosteric enzymes have the particularity that they can reversibly bind, at a site that is different from the active site (the allosteric site), molecules that are often the terminal products of a chain of reactions. the consequence of this is a conformational modification of the structure of the enzyme that has repercussions on the geometry of the active site and modifies its reactivity with respect to the substrate. in the 1980s, faced with the complexity of the tangle of listed metabolic and signaling networks, attempts were made to use mathematical modeling to show the progress of the traffic of molecules inside a cell in relatively simple metabolic pathways such as glycolysis. in the modeling procedure, the concentrations of the different molecular species are considered to be variables whose variations over time depend on their rate of production and their rate of disappearance, which leads to a set of coupled differential equations. with this procedure, we entered the domain of virtual biology. thanks to the creation of increasingly powerful software, the aim of virtual biology is to simulate signaling and metabolic pathways. in the longer term, the aim is to understand the molecular and cellular processes that direct embryo development, or to test the effects of drugs of known target on the metabolic behavior of the cell.
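the modeling procedure described above, in which each concentration changes at a rate equal to production minus disappearance, can be sketched for a toy two-step pathway s -> i -> p. this is only an illustration under stated assumptions: both steps are assumed to follow michaelis-menten kinetics, the parameter values in DEFAULT_PARAMS are hypothetical, and a fixed-step fourth-order runge-kutta integrator is used in place of the dedicated simulation software the text alludes to.

```python
def mm_rate(conc, v_max, k_m):
    """Michaelis-Menten rate for one enzymatic step."""
    return v_max * conc / (k_m + conc)


# Hypothetical kinetic parameters for the two enzymes of the toy pathway.
DEFAULT_PARAMS = {"v1_max": 1.0, "k1_m": 0.5, "v2_max": 0.8, "k2_m": 0.3}


def derivatives(state, params):
    """d[S]/dt, d[I]/dt, d[P]/dt = production rate minus disappearance rate."""
    s, i, _p = state
    v1 = mm_rate(s, params["v1_max"], params["k1_m"])  # S -> I
    v2 = mm_rate(i, params["v2_max"], params["k2_m"])  # I -> P
    return (-v1, v1 - v2, v2)


def rk4_step(state, params, dt):
    """Advance the coupled equations by one Runge-Kutta (order 4) step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = derivatives(state, params)
    k2 = derivatives(shifted(state, k1, dt / 2), params)
    k3 = derivatives(shifted(state, k2, dt / 2), params)
    k4 = derivatives(shifted(state, k3, dt), params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, params, dt=0.01, steps=5000):
    """Integrate the pathway and return the final (S, I, P) concentrations."""
    for _ in range(steps):
        state = rk4_step(state, params, dt)
    return state


if __name__ == "__main__":
    s, i, p = simulate((10.0, 0.0, 0.0), DEFAULT_PARAMS)
    print(f"S = {s:.4f}, I = {i:.4f}, P = {p:.4f}")
```

because the three rates sum to zero, the total s + i + p stays constant during the simulation, which provides a simple check on the integration.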
metabolic engineering (which is already well developed) comprises two types of models, stoichiometric models and kinetic models. stoichiometric models describe metabolic networks in the stationary state, based on analytical data. kinetic models combine stoichiometric information and that concerning the catalytic capabilities of the enzymes in a metabolic network. in canada, the cyber-cell project plans to model the overall functioning of the machinery which, in the bacterium, includes its metabolism and its proliferation. the aim of the afcs (alliance for cellular signaling), which was launched in the usa, is to understand how signaling occurs in cells such as the b lymphocyte, the macrophage or the cardiac cell in response to different types of stress. the techniques that are used range from identification of all signaling network proteins to the evaluation of the flow of circulating information and to the integration of the data acquired into theoretical models. the european nerve synapse project makes use of similar procedures, with its long-term hope of linking the functioning of nerve cells with the cognitive and behavioral functions of living beings. this is a sizable challenge. in fact, there is far from being a real consensus concerning the principle of a demarcation between, on the one hand, cognitive functions such as language or memory, which are located in precise zones of the brain, and which could be reduced to physicochemical processes, and, on the other hand, forms of reflective thought that are expressed through the creative imagination or by judgements concerning ethics or esthetics, the notion of personal responsibility, or even pictorial, architectural or musical beauty. should we see the human soul as the programmer of a superb computer that never ceases to develop from the embryonic state onwards, like john eccles (1903-1997) and others, or should we admit, like jean-pierre changeux (b. 1936), stanislas dehaene (b. 1965), daniel dennett (b.
1942) and others, that thought is not transcendent, and that it is intrinsically dependent on the brain, which is considered to be a neurochemical system, and thus look for the secret of the individuation of the human being in brain information storage mechanisms with retrocontrol loops associated with subtle neuron architectures, or, in short, refer to a sort of turing machine? whatever the case, in this domain, as in others, simple animal models are used in order to identify elementary processes that are able to explain easily-tested functions such as the memory in anatomical and physiological terms. this is the case for the sea slug or sea hare (chapter iii-6.1) which, despite its rudimentary cognitive capabilities, provides information that can be used to reconstruct higher cognitive functions, present in the brains of mammals. it is clear that the cognitive sciences have reached a stage in which they are emerging from their infancy (chapter iv-3.2). now, ingenious computing methods and a basis for reflection that has spread beyond the confines of psychology and philosophy, are available to them. they have set themselves the goal of producing an artificial intelligence, using ultrarapid computers as well as software that is able to model the operation of neural networks and to come close to the performance of human intelligence in terms of the power of their reactivity and their memorization. at present, many other biological systems are being subjected to multiparametric exploration, with the aim of producing models. this is the case, for example, with the program of the differentiation of certain white blood cells, the neutrophils (chapter iii-2.2.4), from precursors located in the bone marrow, a differentiation that leads to the emergence of functions such as phagocytosis that are implemented in the fight against microbial infections 60 . 
in a domain that is closer to mechanical science, hydrodynamics, the digital simulation of the cardiovascular system has already made it possible to represent the physical phenomena associated with the propagation of a wave in deformable arteries during a cardiac contraction, in the form of equations, with a good approximation 61 . in short, from a monoparametric approach that often began as being essentially and necessarily reductionist, the experimental method, applied to living beings, has become a "globalized", or synthetic, multiparametric approach, the aim of which is to understand the dynamics of molecular interactions in defined biological systems. making use of data obtained, the hope is to use mathematical processing to simulate the overall functioning of a cell, organ or organism. this new paradigm of the experimental method ("systems biology" 62 ) is not limited to a simple accumulation of observations concerning a given biological system and their abstraction in mathematical form. the originality of this approach is that it formulates predictions of changes in the behavior of a system as a function of the manipulation of parameters such as substrate concentration, the presence of inhibitors, and so on. the mathematical processing of experimental data, with a view to learning about the functioning of complex systems by modeling, is supported by the technosciences, particularly biocomputing. it is linked not only to the enormous sum of accumulated knowledge concerning the structures and functions of living beings in the post-genomic era, and to the notion that the life of a cell depends on multiple networks of molecular interactions and thousands of enzyme reactions located in its different organelles, but also to the information that comes to it from its environment. a first type of modeling is based on observations made or experiments carried out on an easy-to-study model system. laws are drawn up from this. 
this so-called "bottom-up" (or synthetic) procedure, which proceeds from the simple to the complex, makes it necessary to have a set of very precise biochemical data. this great precision is all the more imperative in that any deviation, even a minimal one, in the integration of an experimental result can generate a mathematical model that is apparently plausible but which is unconnected to the living reality. the reverse, "top-down" (analytic) procedure proceeds from the overall operation of an organ and its theoretical analysis towards the specific mechanisms of its components. it takes into account the functioning of complex integrated systems such as the nerve and endocrine systems, immune and reproduction systems and the system controlling homeostasis, descending in stages towards the cellular, molecular and genetic levels. in the end, an understanding of living beings involves the management of an amazing capital of experimental data. this makes it necessary to consider all of the genes (genome), all of the transcripts coding for the proteins (transcriptome), all non-coding rnas (rnaome), all proteins expressed in a particular cell type (proteome) and all metabolites (metabolome) as a function of the enzyme catalyzers expressed and the energetics that underlie the catalyzed reactions, and, finally, to connect upstream events (mutations of genes and interference of messenger rnas, chemical modifications to amino acid residues in the proteins) to phenotypical modifications on the scale of the whole organism (phenome) (figure iv.19) . the goal of integrated or integrative biology ("systems biology") is therefore to put living beings into equations, that is, to represent them in virtual systems for which the behavior, accessible by means of calculation, can be predicted as a function of modifying parameters. 
in addition to the possibilities that are opened up in terms of a deeper understanding of physiological mechanisms, such virtual systems could be used for the design of new drugs or for the manufacture of economically valuable biomolecules.

figure iv.19: the diagram illustrates the different levels of complexity in the pathway that goes from all the genes together (genome) to all of the expressed characteristics (phenome) in the living being, passing via coding rnas (transcriptome) and non-coding rnas (non-coding rnaome), all the proteins (proteome), all addressing systems in the cell compartments (localisome) and all of the metabolic pathways (metabolome).

at a scientific meeting held in sheffield, england, in january of 2005 63, with the theme systems biology: will it work?, an argumentative discussion of the advantages as well as the disadvantages of an integrated, mathematized biology was useful in that it included a reminder that most of the parameters used in "systems biology" come from studies that are carried out in vitro on purified enzymes, and that it is not sufficient to know the value of the michaelian parameters (vmax and km) in order to reach biological reality. in fact, in vivo, many enzymes undergo variations in activity that are hard to control, due to allosteric-type regulation or to contact between enzymes, since several enzymes of a metabolic pathway are able to interact to form a metabolon. however, by compacting several enzymes that catalyze contiguous reactions in a metabolic pathway, a metabolon considerably increases the catalytic efficiency of this pathway. another element of uncertainty arises from the protein density of the cell medium, and also from the fact that covalent modifications of enzymes can introduce a change in endocellular location (nucleus, organelles of the cytoplasm…).
nevertheless, an approximative approach could limit itself to dealing with biological systems in modular terms, i.e., to considering them as being made up of a number of black boxes, each black box containing a series of reactions being processed mathematically together, with an input and an output. there is still a long way to go if we place ourselves on the cellular scale, but the end of the pathway seems even further away if we envisage the organism as a whole, taking into account the remote interactions between organs involving the interplay of chemical mediators. the brain plays a critical role in the dialogue between different organs, and in the regulation of the energy equilibrium in higher animals. this equilibrium can be disturbed by fasting or intense, prolonged muscular activity, or by an overabundant diet. the corrective response comes from a deep region of the brain, the hypothalamus, via the secretion of different types of peptides, some of which stimulate the appetite and others of which suppress it 64 . while taking into account the multitude of parameters that affect the complexity of living beings on an individual level, the theoretical approach to the study of cell function by modeling has the advantage that it produces predictions and provides information about the validity of conclusions and of theories based on experiments that are old and accepted in the absence of contradictory elements. this was the case for the theory that stated that the state of activation of a gene is determined only by the presence in its environment of transcription factors. recent studies concerning the level of gene transcription in isolated cells have shown that there are probabilistic-type factors which mean that a given gene in a given cell can be activated at any moment. a review which came out in 2005 65 sums up this subject. in this review, the authors use modeling to analyze the behavior of cells in the process of differentiation during embryogenesis. 
their darwinian model, which associates contingency and selectivity, competes advantageously with the determinist (or instructive) model, based on an all-or-nothing logic, that has been implicitly accepted up until now. the darwinian model takes into account the occurrence of stochastic events at gene expression level, events that are partially linked to the structural modifications to the chromatin that depend on covalent modifications of an epigenetic nature (phosphorylation, methylation…). by basing itself on the existence of mutational fluctuations that arise by chance, associated with a self-regulation of gene expression, the model that is obtained shows that during embryogenesis a cell has a choice either to differentiate into another cell type or to remain in its initial state. differentiated cells stabilize their own phenotype and, in their surroundings, stimulate the proliferation of foreign cell phenotypes. a harmonious equilibrium between these two processes is the necessary condition for the setting up of the steps that lead to the arrangement of different cell types during organogenesis, which take place in an apparently inescapable order, in the absence of disturbances. a break in this equilibrium leads to an anarchical cell proliferation. generally speaking, from the point of view of experimental science, the lesson that can be drawn from current modeling experiments is that the bernardian determinism that has prevailed as the essential foundation stone of the methodology applied to the study of living beings may find itself being requalified by the taking into account of stochastic phenomena. this is the case when the number of reacting molecules is low and the probability of stochastic events is non-negligible. the modeling of such systems necessitates having recourse to a complex mathematical formalism.
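the text does not name a particular formalism for the low-copy-number regime. one widely used choice is gillespie's stochastic simulation algorithm, and the python sketch below applies it, purely as an illustration, to the simplest possible case: a molecule (for example a transcript) produced at a constant rate and degraded at a rate proportional to its copy number. the function name, the rate constants and the random seed are hypothetical.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Simulate one stochastic trajectory of a birth-death process.

    k_prod: production propensity (events per unit time)
    k_deg:  degradation rate constant per molecule
    Returns the list of event times and the copy number after each event.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of producing one molecule
        a_deg = k_deg * n        # propensity of degrading one molecule
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # nothing can happen any more
        # waiting time to the next event is exponentially distributed
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # choose which event fires, in proportion to its propensity
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    times, counts = gillespie_birth_death(10.0, 1.0, 0, 50.0, rng)
    print(f"{len(times) - 1} events, final copy number = {counts[-1]}")
    print(f"deterministic steady state would be {10.0 / 1.0:.1f}")
```

with these rates the corresponding differential equation settles at k_prod / k_deg molecules, whereas each stochastic trajectory keeps fluctuating around that value; the fluctuations become relatively more important as the copy number decreases.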
it remains true that determinist models for simulation of the dynamics of living beings, represented by classical differential equations, are more-or-less valid when the number of reacting molecules involved is high and the reactions supposedly take place in a homogeneous medium. should "systems biology" be regarded as a resurgence of a physiology that has been somewhat neglected over the last few decades, but has been reinvigorated by a salutary hybridization of biologists and model-makers? in any case, this is the intention of the "physiome" project 66 which has recently been launched on an international scale. it is also doubtless due to this state of mind that a trend which had gone out of fashion, involving the simulation of the performance of living beings by very elaborate concrete models, robots, is being reborn. an immense distance has been covered in just over two centuries, since the time when vaucanson presented automata in the forms of human figures, moved by ingenious springs and cogs, and giving the illusion that their movements were controlled by an intelligence, to a marveling public (chapter ii-6.4). in the last decades of the 20 th century, considerable progress was made in the understanding of the operation of the nervous system and in the development of technologies in which miniaturized electronics have come to the aid of already high-performance micromechanics. the brain being considered as an information processing machine, the aim is to understand the logic of this information machine by means of simulations on computers and, based on the results obtained, to construct robots whose electrical circuits take their inspiration from the operation of animal neurons. these robots are called biorobots or animats. 
insects have been chosen as a reference for the construction of such creatures because of the relative simplicity of their nervous systems: several hundred thousand neurons, in comparison with the billions of neurons present in mammals (100 billion in man). the fly's system of vision has been favored as a subject of study because it makes it possible to record the electrical response of neurons that can be identified one by one. in the middle of the 1980s, in france, this inspired the pioneering work in biorobotics carried out by nicolas franceschini (b. 1942) and his team 67,68 (figure iv.20). their objective was to study how an animal can avoid obstacles by means of its ocular perception and its movement-detecting neurons, the operation of which the team had just analyzed using microelectrodes and a microscope-telescope specially built for the purpose. the fly's compound eye has 3 000 elementary units or ommatidia, each carrying eight light receptor neurons. the electrical signals emitted by these neurons in response to captured light (at most a few dozen millivolts) are sent to subjacent neurons that are organized into three levels that correspond to the optic ganglia called the "lamina", "medulla" and "lobula". the lobula is a strategic decoding center which, because of the small number of neurons contained in it (sixty), has been the subject of in-depth electrophysiological investigation. each of the sixty neurons of the lobula operates as a signal integrator. the neurons of the lobula send their messages to motor neurons involved in the contraction of small muscles that control the guidance and stabilization of the fly's flight. based on an exhaustive study of the neuron wiring of the fly's eye, franceschini and his colleagues were able to reconstruct a facetted artificial eye that can retranscribe the light signals received optoelectronically.
this artificial eye, the electronic components of which correspond to around one hundred movement detectors in the fly, was incorporated into the head of a robot. the recorded light signals were transmitted to the moving components of the robot. figure iv.20 illustrates the neuromimetic biorobofly constructed according to this principle. completely autonomous because of its on-board power supply, this robot was able to move around at high speed (50 cm/s) in a cluttered area, avoiding the obstacles.

figure iv.20:
a - head of the blowfly, calliphora, seen from the front, showing the two compound eyes with their multifacetted array. each eye hides 40,000 photoreceptors that drive various image processors based on a few hundred thousand neurons.
b - "elementary motion detector" (emd) neuron and its evolution over fifteen years: on the left, first generation (1989), using surface mounted device (smd) technology, compared to a one franc coin from that period; on the right, the 2003 version of the highly-miniaturized hybrid (analog + digital) emd circuit (mass 0.8 grams), compared with a one euro coin.
c - autonomous vehicle (12 kg) able to move around in a field of obstacles that it does not know about in advance. its vision is based on a genuine compound eye, whose circuits are inspired by those of the fly. it includes a network of 114 "motion detecting neurons", transcribed electronically according to the principle analyzed in the fly's eye by means of microelectrodes and a specially-constructed microscope-telescope. this network is arranged around a ring that is about thirty centimeters in diameter. the recently-constructed roboflies, oscar and octave, only weigh around one hundred grams.
d - routing of the electronic components (resistors, capacitors, diodes and amplifiers that operate in their thousands) soldered onto the six-layer printed circuit-board that provides the connection between the sensors and the steering motor on board the autonomous mobile robot shown in (c).
this first "terrestrial" robofly, which was completed in 1991, was followed by several much lighter brothers and sisters: fania, oscar and octave are aerial roboflies 69 . constructed in 1999, oscar is a tethered robot that weighs around one hundred grams. it is equipped with an eye that reproduces the retinal microscanning of the fly's eye discovered by franceschini. oscar is able to rotate around a vertical axis because of its two diametrically opposed propellers and can thus orient its view towards an object. if this object moves, oscar follows it with its eye, up to an angular speed comparable to the tracking speed of the human eye. produced in 2003, octave is another aerial robofly that is able not only to take off and follow the relief of the terrain, but also to land automatically and to react sensibly to a headwind in a turbulent atmosphere. on board, it has an electronic visuomotor self-regulation system, the operation of which is based on the signal processing operations that, in the insect, carry out the automatic pilot functions 70 . the age of biorobotics, in which robots take their inspiration from animals, has only just begun 71 . if specimens are still so rare, this is because behaviors for which we have a good understanding of the underlying neuronal bases are also rare. at the time of writing, a robot rat named psikharpax, with artificial muscles and a vision system that enables it to perceive objects in three-dimensional space, is being developed at the university of paris vi. almost in the realm of science fiction, we find hybrid robots that combine the living and the non-living. this is the case for the hybrid robot produced by japanese researchers, based on the silkworm moth. control of the nervous system of this insect is spread throughout its body. if its head is cut off, it continues to fly, which gave rise to the idea of replacing the head with an electronic transistor system 72 .
using a remote measurement device, it was possible to explore certain behavioral aspects of the insect. although the construction of hybrid robots may raise ethical objections, such technology is capable of giving rise to spectacular applications in the domain of prostheses. the neurological prostheses of the future will nevertheless require that a contact be made between living neurons and electronic chips that are able to compensate for inadequate processing of the physiological signal. such a contact was produced recently in a german laboratory directed by peter fromherz 73 (b. 1942). a small network of snail neurons, chosen because of their large size, was cultured on the surface of a silicon chip. a signal emitted at one location of the chip could be transmitted to another location via the synaptic connection between two neurons 73 (figure iv.21). on the molecular scale, mitochondrial atpase or atp synthase, with a size of around ten nanometers (chapter iii-6.2.1), was used recently for the manufacture of a biorobot that made its mark in the media as the smallest known rotating molecular motor. this membrane-bound enzyme catalyzes the reversible reaction atp + h2o ⇌ adp + pi. the enzyme therefore has a double function: hydrolysis and synthesis. for this reason it is called atpase or atp synthase depending on the physiological context in which it is involved. in mitochondria that oxidize metabolites, the enzyme operates as an atp synthase. it catalyzes the synthesis of atp coupled to oxidation reactions. in the absence of respiration or of oxidizable substrates, the enzyme operates as an atpase; it catalyzes the hydrolysis of atp. for ease of language, the enzyme will be designated here by the term atpase.
it should be remembered that mitochondrial atpase includes two sectors, a hydrophobic sector, fo, characterized as a proton channel located inside the mitochondrial membrane, and a hydrophilic sector, f1, carrying catalytic subunits that are arranged as if they were on a turret (see figure iii.18c). fo contains two main parts of the atpase motor, i.e., a rotor comprising an assembly of around ten so-called "c" subunits and a stator that corresponds to the "a" subunit. the "c" subunit assembly is attached to the "γ" subunit of the catalytic sector f1, which thus functions as a rotor. in 1961, the british biochemist peter mitchell (1920-1992) showed that oxidative phosphorylation in the mitochondria is associated with a transmembrane transfer of protons. the mechanism involved is said to be chemiosmotic. the most important experiment involved an almost serendipitous observation, carried out with a simple ph meter. when a current of oxygen was passed through a suspension of mitochondria in an unbuffered saline medium, in the absence of adp and phosphate, an instantaneous acidification of the extramitochondrial medium occurred, shown by means of the ph meter electrode immersed in this medium. it was concluded that the sudden switch from anaerobiosis to aerobiosis, i.e., the start-up of respiration, is correlated with an ejection of protons from the mitochondrial matrix to the extramitochondrial medium. afterwards, this fact was linked with several others, the whole leading to the formulation of the chemiosmotic theory. briefly, mitochondrial respiration generates a vectorial movement of protons from the interior to the exterior of the mitochondrion. because of this, a proton concentration difference is established on either side of the mitochondrial membrane. the electrochemical potential that is created in this way is used by the mitochondrial atpase in order to synthesize atp from adp and mineral phosphate.
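the proton gradient described above has two components, an electrical one and a ph one, which are conventionally combined into the proton-motive force, Δp = Δψ - (2.303 rt/f) Δph. the short python sketch below evaluates this standard relation. the function name, the sign convention and the numerical inputs (150 mv and half a ph unit) are illustrative assumptions and are not values taken from the text.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def proton_motive_force_mv(delta_psi_mv, delta_ph, temp_k=298.15):
    """Proton-motive force in millivolts (hypothetical helper, for illustration).

    Assumed sign convention:
      delta_psi_mv = psi_outside - psi_inside (positive when the outside is positive)
      delta_ph     = pH_outside - pH_inside   (negative when the outside is more acidic)
    """
    # 2.303 RT/F expressed in mV per pH unit (about 59 mV at 25 degrees C)
    z_mv = 1000.0 * math.log(10) * R * temp_k / F
    return delta_psi_mv - z_mv * delta_ph

# illustrative inputs only: 150 mV membrane potential, outside 0.5 pH unit more acidic
pmf = proton_motive_force_mv(delta_psi_mv=150.0, delta_ph=-0.5)
print(round(pmf, 1))
```

with this convention, both an outside-positive membrane potential and an acidified exterior add to the force that drives protons back through the fo sector.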
this process involves two correlated events: return movement of protons towards the inside of the mitochondrion across the fo sector of the atpase; rotation of the assembly of c subunits and the γ subunit that is interdependent with it. we have therefore moved from electrical to mechanical energy. during its rotational movement, the γ subunit establishes contacts with the three catalytic subunits of the f1 sector, in succession. one after the other, each of the three catalytic subunits in contact with the γ subunit undergoes a change in the conformation of its active site, which is at the origin of the synthesis of atp. in the absence of mitochondrial respiration, the reverse process occurs. the atp is hydrolyzed into adp and mineral phosphate, and the energy released at each of the three catalytic subunits is used to rotate the γ subunit in the reverse direction to that which accompanies the synthesis of atp. the existence of a rotational movement of mitochondrial atpase, which had been suggested on the basis of biochemical arguments 74 and of structural data 75 , was authenticated by masasuke yoshida and his co-workers in japan in 1997, thanks to an imaging technique 76 . in a first step, the molecular system was simplified by being limited to the catalytic f1 sector of the enzyme. a methodological trick was employed: genetic engineering was used to modify the α and β subunits of this sector by fixing polyhistidine chains to them. because of the strong affinity between polyhistidine and nickel ions, the f1 sector α and β subunits were immobilized on a medium covered with nickel ions (carried by an organic molecule). an actin filament labeled with a fluorescent ligand was attached to the end of the f1 sector γ subunit. this assembly made it possible, under a fluorescence microscope, to visualize a rotational movement of the actin arm carried by the γ subunit, triggered by the addition of atp and by its hydrolysis into adp and mineral phosphate.
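the rotary mechanism outlined above invites a simple bookkeeping exercise. the sketch below assumes that one full turn of the rotor brings the γ subunit past each of the three catalytic subunits once (three atp per revolution) and that each c subunit carries one proton per turn. the second assumption, the function names and the ring sizes tried are illustrative and are not stated in the text, which only says that the ring has around ten c subunits.

```python
from fractions import Fraction

CATALYTIC_SITES = 3  # three catalytic subunits contacted per turn of the gamma subunit

def protons_per_atp(c_subunits):
    """Protons crossing Fo per ATP made, assuming one proton per c subunit per turn."""
    return Fraction(c_subunits, CATALYTIC_SITES)

def step_angle_degrees():
    """Rotation of the gamma subunit between successive catalytic subunits."""
    return Fraction(360, CATALYTIC_SITES)

# hypothetical ring sizes bracketing the "around ten" c subunits quoted in the text
for ring_size in (8, 10, 12):
    ratio = protons_per_atp(ring_size)
    print(ring_size, "c subunits ->", float(ratio), "protons per atp")
```

under these assumptions the cost of one atp in protons depends only on the size of the c ring, which is one way of seeing why the rotor and the catalytic turret can be treated as two coupled but distinct parts of the motor.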
a similar rotational movement of the γ subunit carrying a metal microbar was observed by an american research group 77 . remarkably, in 2004, after having fixed a magnetic microbead onto the γ subunit, the japanese researchers 78 demonstrated synthesis of atp from adp and mineral phosphate by rotating the γ subunit by means of rotation of the magnetic bead, induced by magnets. thus, the experimental coupling of a mechanical force and a chemical synthesis was demonstrated. in 2005, the japanese research team 79 succeeded in photographing the rotational movement of the enzyme powered by atp under the microscope, this time looking at the entire atpase complex, f1fo. after having attached a gold microbead onto the "c" subunits of the fo sector, to act as a probe, the researchers were able to confirm that the rotational movement of these subunits depended on the hydrolysis of atp into adp and phosphate (figure iv.22). the whole of the mitochondrial atpase (atp synthase) does, in fact, function as a molecular rotational motor powered by a proton flow, rather like an industrial rotational motor powered by a fossil fuel or electricity. the analogy is a striking one; the γ subunit of the enzyme corresponds to the motor driveshaft and the "c" subunits correspond to the motor itself. because of its association with non-living structures, for example metal bars or gold beads, which are carried along in the rotational movement of the enzyme, it is possible to speak of molecular biorobots. this domain, in which nanomachines use macromolecules from the living world, has only just opened up, but its future is full of promise. the story of scientific progress made with respect to the mechanisms of oxidative phosphorylation via the functioning of mitochondrial atpase, from the time of mitchell's experiment with the ph meter until the time of the manufacture of yoshida's biorobots, is an exemplary one.
it is typical of the way in which a mode of thought evolves over time, from a primary discovery resulting from serendipity or an experiment "to see what happens", leading to the proposal of the existence of a mechanism, to a carefully programmed project which, because of its technical inventiveness, shows the validity of the proposed mechanism, and, in addition, demonstrates its future utilitarian value. nowadays, certain biotechnologists dream of being able to "synthesize life" 80 in terms of cells that are able to imitate the performance of living cells. the concept of the "lab-in-a-cell" is coming to the fore 81,82 . nevertheless, it would be necessary to design an artificial cell that is an authentic replica of a living cell, and which benefits from all the attributes of a living cell. this is not achievable at the moment. thus, the current aim of nanobiotechnology is limited to planning the construction of artificial cells that are relatively simple both in composition and in function, for example, a microvesicle bounded by a lipid membrane, containing a system of protein synthesis expressed from a short sequence of dna, as well as a system of atp synthesis able to supply the energy necessary for this protein synthesis.
figure iv.22 - demonstration of a rotational movement of f1fo mitochondrial atpase (atp synthase), induced by atp. atpase or atp synthase (reversible catalysis enzyme that hydrolyzes or synthesizes atp) has two sectors (see figure iii.18). the membrane sector, fo, comprises an assembly of a dozen so-called c (rotor) subunits and an a (stator) subunit. the other, extra-membrane sector, f1, is catalytic. it comprises three β catalytic subunits and three α non-catalytic subunits arranged in a ring, in alternating order. at the center of the ring is the γ subunit which is attached to the c subunits of the fo sector. subunits δ, ε and b stabilize the whole of the molecular complex.
in the experiment illustrated in this figure, subunits α and β of the f1 sector of the enzyme have been genetically modified to include polyhistidine chains (his-tag, artificial ligand). due to the interaction of these chains with nickel ions (linked to an organic molecule) covering a solid support medium, the α and β subunits are immobilized. in addition, a gold microbead is fixed onto the ring of the fo sector c subunits by means of a chemical device (streptavidin molecule, artificial ligand). following the addition of atp, rotation of the microbead attached to the ring of fo sector c subunits is observed by microscopy on a black background. this rotation is dependent on (and at the same time an indicator of) the rotation of the c subunits, itself led by the rotation of the γ subunit in contact with the catalytic β subunits. note the ejection of protons. when the enzyme functions as atp synthase, the proton movement takes place in the opposite direction.
"progress in biology is possibly mainly tributary to the drawing up of concepts or principles […]. in the process of elaborating concepts, which marks scientific progress in biology, there is sometimes a crucial step, when we realise that a more-or-less technical term that we had previously considered to cover a given concept, in fact covers a mixture of two (or more) concepts."
ernst mayr, translated from a french translation entitled history of biology. diversity, evolution and heredity - 1989
on the margins of the modeling and the difference in mathematized systems that comprise theoretical biology, particularly in silico biology, concepts are mental representations, often image-filled and idealized ones, of fundamental mechanisms that are deduced on the basis of experimental results.
from the imaginary domain of the probable, they extrapolate constructions of the mind that are in phase with the facts and experimental data, within a reflective projection that gives them their meaning and makes it possible to make certain predictions. there are premonitory concepts. this was the case for the concept of the reflex arc that associates movement with sensation. this concept was already present in the ideas of descartes (chapter ii-3.4), but it took more than a century before the theory of the existence of a reflex arc was supported by bell and magendie's demonstration of the existence of relay centers for sensory and motor nerves in the spinal cord (chapter iii-1). there have been premonitory concepts that, while they were demolished at the time they were first proposed, were shown to be completely accurate a few decades later. in the middle of the 19th century, the german pathologist jacob henle (1809-1885) needed a healthy dose of imagination and audacity in order to oppose the theory of the "miasma", a theory that was taught as a dogma, with a new theory that not only explained the spread of contagious diseases by microscopic beings, but also formulated the criteria for validating this theory: isolation of the pathogenic agent and its development in culture away from the diseased organism, then reproduction of the original pathology after injection into a model animal of the pathogenic agent that has been isolated, characterized and multiplied in culture. thirty years would go by before the formulation of koch's postulates, based on experimental evidence (chapter iii-4). we may ask ourselves whether or not history is currently repeating itself in the case of spongiform encephalopathies that affect humans and animals, for which, according to the thesis of stanley prusiner 83 (b. 1942), the prion, as an infectious protein, is responsible.
other evocative concepts hold the keys that open doors to domains that are unknown, but are potentially rich in information. it is thus that the double helix dna structure proposed by crick and watson, based on the complementarity of adenine-thymine and cytosine-guanine bases (chapter iv-1.1.1), led to the concept of dna replication with reconstruction of a double strand that is identical to the original double strand. the concept of dna replication spurred on matthew meselson (b. 1930) and franklin stahl (b. 1929) to develop an experimental protocol based on the labeling of the dna nucleotide bases of the enterobacterium e. coli with a heavy isotope of nitrogen, 15 n, and on the differentiation of monocatenary dna strands in the process of synthesis by measurement of their density, as analyzed by centrifugation in cesium chloride gradients. in the same vein, jacob and monod's discovery of regulatory genes (chapter iv-1.1.1) gave rise to the concept of the operon which, in the bacterium, defines a genetic unit comprising structural genes and regulatory genes. the concept of the regulation of gene expression, extended to higher eukaryotes, makes it possible to explain the phenomenon of differentiation in cells with specific activities (muscle cells, nerve cells, epithelial cells…) by the silencing of certain genes and the activation of others. within the framework of bioenergetics, the chemiosmotic theory put forward by mitchell, in order to explain the coupling of mitochondrial respiration with atp synthesis (chapter iv-4.3), gave rise to consideration of the concepts of transmembrane transport of metabolites and of vectorial metabolism. some generalizing concepts that carry a unifying virtue within them are known. one such is the concept of compartmentation. the cell is no longer considered to be a bag of enzymes, as used to be the case. 
it is now considered to be a compartmented structure in which each type of compartment corresponds to a type of organelle delimited by a membrane and characterized by specific functions. thus, because of the genetic material that is present in it, the nucleus of the cell holds the information necessary for the manufacture of proteins. the mitochondria, which are called cell power plants, are in charge of oxidizing the products of cell catabolism and using the resulting energy for the synthesis of atp from adp and mineral phosphate. the lysosomes are the garbage collectors of the cell. among the functions carried out by peroxisomes is the partial breakdown of very long chain fatty acids. the endoplasmic reticulum and the golgi apparatus are involved in the maturation and the secretion of proteins. the ribosomes represent the machinery upon which messenger rnas are displayed in order to be decoded into proteins. a sign of the extreme sophistication of this setup is that the membranes of the endocellular compartments are not sealed common walls. they contain proteins that act as selective transporters of metabolites or highly specific ion channels, allowing the exchange of messages throughout the cell. thus, each organelle, informed of the condition of the others, is able to adjust its own activity to ensure the greatest harmony of the whole. this conditioned compartmentation at cell level may be compared to the socialization of human communities. while endocellular organelles are compartments delimited by membranes, there are non-membrane-bound compartments in the cell, such as protein complexes in which two, three or even more proteins are closely linked. often, these are enzymes that catalyze reactions that are contiguous in a metabolic pathway. being compacted into a complex known as a metabolon results in an increased efficiency of the flow of metabolites by facilitating the channeling of this flow. 
concepts evolve, often adjusting their representations according to accumulated knowledge. a good example of this is the evolution of the concept of the gene since its formulation at the beginning of the 20th century. the term "genetics" was created in 1906 by the english naturalist william bateson (1861-1926). the term "gene" was introduced three years later by the dane wilhelm johannsen. this term designated, in an intentionally vague manner, a principle located in the chromosomes of the fertilized egg that was supposed to have an influence on the phenotype of the progeny. during the same period, the term "locus" appeared out of the experiments carried out by the american thomas hunt morgan on the drosophila, a locus being defined as a region of a chromosome which, when altered by a mutation, leads to a modification of the phenotype of the living organism. based on cross-breeding experiments carried out on hundreds of drosophila mutants, morgan and his co-workers drew up the first genetic maps. by chance, the salivary glands of the drosophila have a particular characteristic; the nuclei of their cells contain giant chromosomes called polytene chromosomes, which result from the association of a hundred replicate copies of chromosomes that, after staining, are visible under the optical microscope. on these chromosomes, it is possible to distinguish colored bands separated by clear bands. it was observed that specific mutations had specific effects on the arrangement and number of these bands. the material contained in the bands was therefore the site of mutations. in the middle of the 1930s, the listing of more than 3 500 bands made it possible to construct a cytological map that was already highly detailed. the concept of the gene, the material basis of inheritance, took root.
the sporadic mutagenic effect of x-radiation in the drosophila, which was shown by the geneticist and biophysicist hermann müller (1890-1967), led the austrian physicist erwin schrödinger to inquire into the sporadic event which, at the level of a target of a few dozen atoms, determines a mutation. he postulated that the target is located in the chromatin of the chromosomes, organized as an aperiodic crystal. the chemical nature of this target was identified with dna, following bacterial transformation experiments (avery, macleod and mccarty, 1944) and experiments concerning bacterial infection by the bacteriophage (hershey and chase, 1952) (chapter iv-1.1.1). this is how the idea that gene = dna was born. the simple and reassuring idea that one gene → one enzyme, which was deduced from mutation experiments carried out by beadle and tatum on the mold neurospora crassa (chapter iii-6.1), had only a limited lifetime. a first stumbling block appeared when it was shown that the activity of a gene, and in consequence its contribution to the phenotype, depends on nucleic elements outside the gene. the definition of the term "gene" was then extended to include promoting and regulatory sequences. in the case of the lac operon of escherichia coli, these sequences are located just upstream of the site where transcription begins. however, in eukaryotes, a regulatory sequence may be distant from the gene that must be transcribed and sometimes it may be involved in the regulation of several genes (chapter iv-1.1.1). in the 1970s, the idea of the existence of the mosaic gene in eukaryotes appeared. a gene was now thought of as an assembly of several exons that, in the chromosome, are originally separated by introns. the alternative splicing of these pieces of genes gives rise to numerous possibilities for reconstitution, i.e., many messages coding for many different proteins.
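the combinatorial potential of mosaic genes can be made concrete with a toy count. the sketch below assumes a deliberately simplified model, not taken from the text, in which exon order is preserved, the first and last exons are always kept, and each internal exon is independently retained or skipped. the exon names and the function name are hypothetical.

```python
from itertools import combinations

def splice_variants(exons):
    """Enumerate transcripts under a simplified cassette-exon model.

    Assumptions (illustrative only): exon order is preserved, the first and
    last exons are always included, and every internal exon is optional.
    """
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    variants = []
    for k in range(len(internal) + 1):
        # combinations() keeps the original left-to-right exon order
        for kept in combinations(internal, k):
            variants.append((first, *kept, last))
    return variants

gene = ["e1", "e2", "e3", "e4", "e5"]  # hypothetical five-exon gene
variants = splice_variants(gene)
print(len(variants))  # 2 ** 3 = 8 possible messages from one gene
for v in variants:
    print("-".join(v))
```

even in this restricted model the number of possible messages doubles with every additional optional exon, which is one reason the formula one gene → one enzyme could not survive the discovery of mosaic genes.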
thus, however useful the concept of the gene has been with respect to its ability to generate discussion and to provoke experimentation concerning the molecular machinery responsible for the transmission of the hereditary characteristics, we can see that the term itself has not ceased to be the subject of readjustments, since the time it was first formulated. certain concepts are matched with metaphors. while some metaphorical concepts, particularly those that make use of images designed to grab the imagination, and to be easy to understand, tend to take liberties with the realities of living beings, they can also shed light on unsuspected mechanisms in sectors that have been neglected. metaphorical concepts are not a current fashion. it should be remembered that in his passions of the soul (1649), descartes, when asked "how limbs can be moved by objects of the senses and by the mind without the help of the soul," responds that this takes place "in the same way as the movement of a watch is produced only by the force of its spring and the arrangement of its cogs." later on, with lavoisier's clear vision of the vital role of oxygen, and his comparison of respiration with combustion, the concept of the chemistry of life, combined with that of bioenergetics, came to the fore, and was at the heart of studies on the metabolism. chemical reactions that liberate and absorb heat were substituted for the cogs of cartesian mechanics. the second half of the 20 th century saw the birth and development of the concept of the program, a concept with computer technology connotations, which was destined to explain the phenomena of inheritance. this concept began to fill out from the moment when it became certain that, in its nucleotide sequence, dna contains the necessary information for the construction of the protein material of cells. for a certain period of time, the passion for molecular genetics eclipsed the interest that had previously been given to metabolic chemistry. 
the power of a metaphorical concept may be measured by the effect it has in pushing scientific research in particular directions, with the consequences this has for society. thus, during the 17th and 18th centuries, the study of human pathology was impregnated with a strong iatromechanical current. physiological chemistry and its corollary, pathological chemistry, which emerged as disciplines in their own right in the 19th century and achieved full expansion in the 20th century, are our inheritance from lavoisier and the concept of […]
discussion about concepts necessarily leads to a brief discussion of scientific semantics, as shown by the few examples given in the previous pages. as we have just seen, the word gene that was put forward by johannsen around one century ago did not have the same meaning at that time as it has now, a meaning that still remains fluid. the gmo, an acronym meaning genetically modified organism, which has been the subject of vehement diatribes over the last few years, becomes much less of an object of passion if it is considered within the context of evolution. after all, for the last two to three billion years, living organisms have been genetically modified constantly by spontaneous mutations, which is why the human beings that we are today are able to discuss them! the term cloning is another example of a semantic misunderstanding that leads to inaccurate interpretation and arouses the passions. the primary meaning of the term cloning is the multiplication and the identical reproduction of a living cell. the simplest and most unambiguous example is that of bacterial cloning, in which a bacterial cell multiplying in a nutritive medium produces millions of cells that are identical to the original cell. the term animal reproductive cloning does not carry exactly the same semantic weight.
it should be remembered that, in eukaryotes, the preliminary act of the cloning procedure involves the injection of the nucleus (with 2n chromosomes) from a somatic cell into an enucleated oocyte (chapter iv-2.3.2; see also chapter iv-6.1). the somatic cell nucleus, by providing its genetic equipment, gives the being that will develop in the uterus a phenotype that is practically identical to that of the somatic cell donor, but nevertheless not completely identical, as the cytoplasm of the enucleated ovum, with its mitochondria, provides a small but non-negligible fraction of genes, the mitochondrial genes. as for therapeutic cloning (in the absence of uterine implantation), this is used for the manufacture of differentiated cells that may be grafted into the individual who has donated the original somatic cell, with no immune-related rejection occurring. this is non-reproductive cloning. the passionate argument that has arisen because the term cloning is bandied about in an ill-considered fashion illustrates the confusion that can result from a lack of precision in the use of certain terms with a high level of media impact.
"all the major problems of the relations between society and science lie in the same area. when the scientist is told that he must be more responsible for his effects on society, it is the applications of science that are referred to […]. no government has the right to decide on the truth of scientific principles, nor to prescribe in any way the character of the questions investigated."
the progress of science is linked to that of civilization. it is in keeping with the state of mind, the beliefs, the lifestyle and the thought patterns of societies. in ancient greece, where manual work was considered to be servile, science remained essentially theoretical, confined to logic and dialectics, and strongly attached to questions of philosophy.
the birth of experimental science in the 16th and 17th centuries went hand-in-hand with the rehabilitation of manual work. the technical side dominates in modern biology, which seeks to solve problems concerning the "how", rather than to address philosophical problems concerning the "why". as ian hacking (b. 1936) says in representing and intervening (1983), nowadays engineering, and not theorizing, is the greatest proof of scientific realism, which leads to the minimization of philosophical thought. in a skeptical biochemist (1992), the polish-born american biochemist joseph fruton (1912-2007) emphasizes the contrast between the 19th century and the first half of the 20th century, when eminent scientists were still interested in the ideas of the professional philosophers of the history of the sciences concerning the progress of experimental research and, in contrast, the end of the 20th century, when philosophy and the experimental sciences pretended to ignore one another. this is doubtless partly because the history of biology has become the history of biotechnologies to such an extent that, according to some, the objects being explored are so familiar that they are now part of the life of society. in pandora's hope (1999), bruno latour (b. 1947) considers that the current confrontation between subject and object, in which the researcher-subject explores the structure and function of the object, is being transformed into a human-nonhuman dialogue, in which the nonhuman-object becomes "socialized". taking yeast as an example, latour writes that it has been "working for millennia in the brewing industry, but now it works in a network of thirty laboratories where its genome is mapped, humanized and socialized like a code, a book, or a program of action that is compatible with our ways of coding, counting and reading […]. non-humans have become automatons, admittedly without rights, but much more complex than material entities."
latour visualizes the human-nonhuman associations in the form of collectives that are organized into strata that implement the technical, the political, the social, the ethical, the ecological… the technosciences correspond to one of these strata, the sociotechnical stratum that is directly linked to the stratum of political ecology. in the same spirit, the belgian philosopher gilbert hottois (b. 1946), in his philosophies of the sciences, philosophies of techniques (2004) remarks that "laboratories produce things that go off to live their lives in society and in nature." thus, bacteria, yeasts or genetically modified plants are able to produce drugs such as insulin, growth hormone and vaccines for human medicine. these drugs become part of and indispensable to life in society. they are evaluated according to their market value by the companies that patent, manufacture and sell them, and according to the comfort they bring to the patients to whom they are administered. the financial management that results from their consumption becomes a worry for those responsible for public health, while their manufacture by specialized companies generates industrial activity and economic growth which may be measured according to how fashionable they are and how they sell. for a long time, society, while benefiting from scientific progress, remained indifferent to the experimental method, that is to say, the way in which knowledge progresses. in the last decades of the 20 th century, society became aware, via information concerning the occasionally demonized exploits of genetic engineering, that science can "take liberties" with the human being. populations were well informed about the effects that genetic engineering could have on the mortality rates of pathologies such as cancer and diabetes or on degenerative illnesses of the nervous system, and about the closeness of possible solutions. however, they were also warned about the risks to which science was exposing humankind. 
remembering certain tragic episodes concerning hiv-contaminated blood transfusions, growth hormone and mad cow disease, and certain cassandra-like predictions, such as a catastrophic epidemic of spongiform encephalopathy that has happily yet to appear, society shows reservations when the media inform its members of new feats of modern technology. political authorities, for their part, afraid of potential problems, tend to follow the principle of precaution, which in fact hides a fear of risk. however, evaluating risk involves not being afraid of it but understanding it in a lucid and courageous fashion. informed by the media, which often use sensationalism, the citizen is increasingly calling into question whether certain practices involving the biosciences, such as cloning, or certain mercantile transactions such as the taking out of patents concerning gene sequences, or even experimentation on live animals, are well-founded.

"the problem of experimentation on man is no longer a simple problem of technique. it is a problem of value. from the moment that biology concerns man no longer simply as a problem, but as instrumental to the search for solutions concerning him, the question arises of deciding whether the price of knowledge is such that the subject of the knowledge is able to consent to become the object of his or her own knowledge. we have no difficulty here in recognizing the still open debate concerning man as a means or an end; an object or a person. this is to say that human biology does not contain in and of itself the answer to questions concerning its nature and its significance." (knowledge of life, 1965)

written at a time when people were far from imagining how molecular biology was going to expand, the prophetic words of georges canguilhem (1904-1995) have maintained their philosophical validity.
manipulation of the human embryo, whether this involves its creation by cloning or the modification of its genetic inheritance, obviously leads to the need to consider the societal, religious and political points that arise from the domain of bioethics and are a reflection of the period in which we are living. until recently, advances made in biology left moralists indifferent. this ceased to be the case when scientific experimentation began to look at the human embryo with a view to utilitarian ends in the health domain. the specter of cloning was brandished without any clear distinction being made between reproductive cloning and therapeutic cloning. biology became demonized. however, as biologist pierre chambon (b. 1931) said in an interview in the french journal biofutur: "in absolute terms, biology is unable to tell us whether the cloning of a human being is moral or immoral, it simply tells us whether it is biologically possible." the birth of dolly the sheep, announced in 1997 (chapter iv-2.3.2), triggered a virulent debate because now that the cloning of an animal had become possible, that of a human being became conceivable. the media sensationalized this debate all the more in that it was exacerbated by debate concerning gmos (chapter iv-2.1). the dolly affair became a problem of society. up until then, the biosciences had been happy just to try and understand the mechanisms that explained the functions of living beings, but now, with the advent of gmos and cloning, it became obvious that a forbidden barrier had been crossed and that man had the power not only to transform but also to invent himself. faced with this desacralisation of nature, the need arose for some philosophical reflection. this was given the name of bioethics, which is the title of the book bioethics, a bridge to the future, written by the american biologist van rensselaer potter (1911-2001) in 1971.
the term bioethics covers philosophical considerations that range from the biosphere to the human person. bioethics tries to give a wider meaning to the moral codes which, in human societies, depend on ancestral traditions. it aims to prescribe that which is desirable according to the kantian maxim of the categorical imperative. in his what is bioethics? (2004), the belgian philosopher gilbert hottois reminds us that bioethics stands above traditional morals, the latter being a set of norms that are most often spontaneously respected as good habits, without any critical reflection being involved, while bioethics, on the other hand, arises out of critical thought, analysis, discussion and the evaluation of established mores. over the last few years, the problems that are targeted by bioethics have moved towards today's burning issues. human cloning is an example. while allegations of the transcendence of man in nature may lead to human reproductive cloning being considered as a crime, strictly scientific considerations lead to an emphasis on the lack of responsibility shown by a few zealots, given the hazards involved in cloning in animals, such as the need to use a large number of oocytes in order to achieve success in cloning, the very low viability of the cloned embryos and the development of serious functional anomalies in the clones that survive. even supposing that scientific progress will one day overcome these difficulties, human reproductive cloning will come up against an insurmountable obstacle, the cloned subject's fear of finding that he or she is identical to the relative from whom his or her genetic inheritance comes. after all, the notion of manipulation of the human ovule with the aim of serial reproduction has often haunted science fiction stories.
in brave new world (1932), aldous huxley (1894-1963) gives an apocalyptic vision of the budding of human eggs that produce hundreds of identical twins which are conditioned into classes and subclasses while being raised in jars, depending on the quality of the nutritive substances they are given. in the artificial uterus (2005), henri atlan (b. 1931) predicts that the raising of human fetuses in jars could well become an alternative to uterine gestation in a distant future. let it be understood that human reproductive cloning, which is no longer part of the domain of science fiction, as it has become feasible, must be considered as being reprehensible because it goes beyond the limits of reason, and is a denial of human transcendence. man as subject cannot be considered as an object. the problem of therapeutic cloning is quite different, although it leads to reticence and prohibition because the demarcation between therapeutic and reproductive cloning depends mainly on whether a cloned embryo is implanted in a uterus. while, at the time of writing, therapeutic cloning has been prohibited in france, germany and other countries, it is tolerated in great britain. in the usa, the prohibition only applies to publicly-financed research, while each state has its own legislation, which is relatively flexible. the objective of therapeutic cloning is to provide patients with tissues that arise from their own selves, and are therefore immunocompatible and able to be grafted without there being any risk of rejection (chapter iv-2.3.2). it is based on the removal of somatic cells from the subject to receive the graft and the transfer of the nuclei of these cells into enucleated oocytes. the stem cells that are obtained after the first divisions are stimulated using appropriate growth factors. depending on the factor used, the stem cells differentiate to form a type of tissue (hepatic, muscular, nerve…) that can be used as a graft.
such a procedure may be envisaged for patients who have suffered a serious, invalidating trauma, for example section of the spinal cord. a graft of immunocompatible nerve cells might make it possible to re-establish nerve continuity. a similar type of therapy has been considered for parkinson's disease, the cause of which is a degeneration of certain cells of the encephalon (chapter iv-2.3.1). given the hopes that are raised by the possibility of such therapies, and the fact that, after all, such therapeutic cloning is the equivalent of an autograft, even if the ways in which the graft is obtained are slightly tortuous, the demonization and rejection of such practices should be reconsidered, calmly and coolly. another option for therapeutic cloning is the correction of mutations identified in the mitochondrial genome of a woman wishing to have children. it is, in fact, the mother's ovum that provides the fertilized egg with its complement of mitochondria that are indispensable for its viability. the manipulation involves inserting the nucleus of a fertilized ovum from the mother, obtained by artificial insemination, into an enucleated oocyte taken from a woman who is not suffering from the mitochondrial defect. the cytoplasm of the enucleated oocyte provides the stock of functional mitochondria that are indispensable to normal cell function in the future embryo. in the domain of bioethics, in which rational objectivity comes up against deliberately technophobic religious and cultural considerations, it is useful to remember certain legal and legislative paradoxes. thus, in france, after having been considered to be a criminal act that was subjected to severe repression by the law up until 1975, the right to have an abortion before the end of the third month of pregnancy became not only authorized but also protected by law.
it is interesting to note that in the 13th century, thomas aquinas, a doctor of the church, had acknowledged that a fetus only becomes "animated" by the implantation of the soul by holy will in the third month after fertilization. another subject to be considered is pre-implantation genetic diagnosis (pgd), in which human embryos that have been fertilized in vitro are sorted in order to find those that are without defects, a practice which is on the verge of being a deviation in the direction of eugenics. nevertheless, pgd is the basis of a practice that is either already legalized or is in the process of being so in several european countries, the creation of so-called designer babies. a typical example is that of a designer baby arising from an embryo whose immune profile is matched to that of an older sibling who is suffering from leukemia. in this case, there is good reason to hope that a graft of immunocompatible blood cells from the designer baby into the sibling who is suffering from leukemia will save the latter from death. out of the disharmony of opinions arising from cultural tradition, religious conviction or simply scientific pragmatism, the american biologist and philosopher h. tristram engelhardt (b. 1941), in the foundations of bioethics (1996), proposes a lay bioethics that is based upon the principle of permission. lay bioethics advocates tolerance while admitting that this tolerance in no way prevents anyone from taking up a personal position; it means that each human being has a moral sensitivity as well as the ability to reason and to choose within a defined limit of non-harmfulness and of justice. the individual is free to modify his or her destiny, or to manipulate his or her nature by genetic interventions because, adds engelhardt, "there is no lay moral foundation to prohibit such an intervention."
when a researcher or the research organization that the researcher belongs to files for a patent for an invention with a patent office, it is necessary to demonstrate the novel and utilitarian nature of this invention. if a patent is accepted, this gives the person or body that filed it the exclusive right to make use of the invention over a pre-determined period of time, generally 20 years, which is a means of protection, or, if desired, to allow others to make use of the invention by issuing a license to do so. in the domain of living beings, there has sometimes been confusion between invention and discovery. in 1991, craig venter, known for his participation in the sequencing of the human genome, filed a demand for a patent covering the sequences of 2,700 fragments of complementary dna (cdna) called ests (expressed sequence tags) that are obtained by reverse transcription from human brain messenger rnas, in the name of the nih (national institutes of health) at bethesda (usa). the patent specified that ests could be used as probes to characterize genes that are potentially involved in neurological ailments. the resulting outcry led the nih to withdraw its patent demand. in fact, the patenting of living beings has a long history that goes back to the patent that was filed in 1895 in france by louis pasteur, and then in 1873 in the usa, for the use, in brewing, of a yeast culture that was free from pathogenic bacteria. from this historical perspective, the case of ananda chakrabarty (b. 1938) set a legal precedent. in 1972, chakrabarty filed a demand with the us patent office for a patent relating to a pseudomonas type bacterium which, by genetic modification, had acquired the ability to digest crude oil. his demand was refused.
after an appeal and many legal battles, the united states supreme court overturned the patent demand refusal, the basis of the judgement being that any modified microorganism is a product of human ingenuity and has a specific name, characteristics and use. thus, from 1980 onwards, the arrival of an era of patents derived from genetic engineering was indicative of how this discipline was growing. in december of that year, stanley cohen and herbert boyer, acting on behalf of the university of stanford, patented a nucleic chimera comprising a recombinant dna carried by a vector. in 1982, a patent concerning the growth hormone gene was awarded to the university of san francisco. in 1984, the university of california at berkeley obtained a patent for the human insulin gene. in 1985, the american company pioneer hi-bred succeeded in patenting a variety of corn in which genetic modification has led to an increased synthesis of tryptophan, an amino acid that is indispensable for animal feed. in 1988, the genentech company acquired a patent for the gene coding for human gamma interferon. this was followed in japan by a patent for the gene coding for beta interferon. in the same year harvard university patented the oncomouse, a transgenic mouse whose susceptibility to cancer is greatly increased. after this, several species of transgenic animals were patented for utilitarian purposes, such as the production of human alpha-1-antitrypsin taken from the milk of transgenic goats and used for the treatment of cystic fibrosis. the frenetic patenting of living beings has reached the domain of natural products arising from the plant world in tropical regions, the immensely varied essences arising from these plants being full of pharmacological potential. the potential for producing drugs of a considerable commercial value from such plants is very high. here we return to the problem of the patenting of genetically modified, cultivatable plants (gmps) (chapter iv-2.1). 
thus, the experimental method, the principle of which is to acquire pure knowledge, finds itself led astray in its applications. whatever the motives that are given, particularly for manipulations that give rise to the manufacture of marketable products, the patenting of genomes for mercantile ends shows the regrettable, but unfortunately inevitable, direction in which the very spirit of a science, molecular biology, which half a century ago wished to be at the heart of an understanding of living beings, has drifted. the suffering of animals that are being experimented upon gives rise to a moral problem. the end of the 19 th century saw large-scale demonstrations against vivisection and repeated demands for it to be abolished. today, there is renewed vigor in the call for the abolition of vivisection, without any real coherent basis. this desire to stop experimentation on animals ignores the imperatives of contemporary medicine, which must meet the challenge of pathologies whose increasing incidence is worrying, such as cardiovascular diseases, diabetes, cancer, and the degenerative illnesses that are linked with aging or are of genetic origin. it is true that animal experimentation inevitably leads to questions. are the stakes involved in a particular experiment, in terms of the acquisition of new knowledge, worth the suffering of an animal used in that experiment? is it not necessary to ensure that the experimental protocol is well-documented, that it is not redundant, or even that it has been the subject of previous studies carried out on cells in culture? it is easy to see the size of the methodological chasm that separates contemporary physiology from that of the time of claude bernard, when cell culture techniques were not yet being used, when the main instrument used was the scalpel and the researcher, using his or her imagination and creativity, had to develop specific protocols that were able to validate or refute a working hypothesis. 
each period in history operates in its own way according to its moral laws and its technical capabilities. the bloody operations carried out by magendie and by claude bernard in the 19th century, which were tolerated at this time despite criticisms from antivivisectionists, would not be permitted today. nevertheless, it is true that the physiologists of the 19th century, by means of the results of their experiments, wove a tapestry of new knowledge on which contemporary biologists were going to work and without which the level of understanding of modern science would be much lower than it is. animal experimentation remains indispensable in many areas of physiological investigation, in genomics, in toxicology and in pharmacology. it is a precondition for clinical trials of any new drug, being used to test for the drug's efficacy, its metabolism and any toxicity. however, not all data arising from animal experimentation can be extrapolated to man. the margin of uncertainty can be reduced by means of comparative trials on several animal species. because of their phylogenetic proximity to man, primates may seem to be the solution for experimentation prior to the application of a drug in man. this was the case for the development of a vaccine against hepatitis b. it has been proposed that the grafting of stem cells in man should be preceded by experimentation in apes, in order to ensure the absence of tumorization over the long term. however, the researcher is confronted with a dilemma: should he or she ensure the safety of man with respect to possible deleterious effects or respond to ethical demands that recognize the very great genomic similarities between man and the chimpanzee? a consideration of cloning, patenting and animal experimentation practices illustrates the excesses of the experimental method in domains where political authorities consider themselves able to legislate.
administrative decisions, often made in the absence of any dialogue with scientific authorities, can have serious consequences. thus, given the pretext of strict obedience to the principles of bioethics, which are a matter of tradition, and while certainly respectable, are nevertheless arguable, and also given the pretext of a sickly and unconsidered fear of the risk involved in certain experimental practices, and the absence of an intelligent evaluation of this risk, research, which until recently took place in a motivating atmosphere of liberty, may, over the long term, be weighed down with a highly prejudicial handicap and a limitless sense of discouragement. in the 17 th and 18 th centuries, experimental research, which was still in an emergent phase, was mainly artisanal, and in the hands of rare scholars. it took form during the 19 th century in the west, particularly actively in germany, and became operational in the 20 th century, under the aegis of governmental authorities, with the creation of institutes, the programmed recruitment of researchers and the allocation of renewable budgets. modern science, based on the principles of the experimental method, came to the fore much later in the east than in the west. the globalization of knowledge has meant that at present experimental science, in all domains, including that of the life sciences, has spread throughout the world, with even those countries that had become relatively backward in these domains because of their isolation catching up rapidly. nevertheless, it is true that the progress of the experimental sciences in the usa and in the united kingdom has been distinguished by pragmatic management of these countries' science policies, based on the excellence and the high degree of autonomy of their universities and research institutes with respect to recruitment and choice of subjects of study. 
the efficacy of this policy in the life sciences may be judged by the number of researchers who have won nobel prizes since the second world war (at the time of writing, more than 80 in the usa and twenty or so in great britain as opposed to only 4 in france). in france, research on living beings is carried out in the laboratories of universities, in institutes connected with higher education and in laboratories that are run by large organizations such as the national scientific research center (cnrs), the national institute of health and medical research (inserm), the national institute of agronomic research (inra), the atomic energy commission (cea), the national institute of research in computer processing and automation (inria), the national center for space studies (cnes) and the french institute of research on the seas and oceans (ifremer). equivalent bodies exist in countries other than france, some of them being institutes that are dedicated solely to research, and some being university laboratories that associate research and teaching. at the beginning of the 20 th century, the function of researcher was most often associated with that of a professor occupying a chair at a university, surrounded by a few assistants, the professor directing the research work in his area of specialization. now, within a period of a few decades, the status of researcher has been modified greatly. today we talk of research careers classified according to level of expertise and technicality. management, or the supervision of career paths and the control of financing, is carried out by an administration that is itself highly hierarchical. the scientific process has undergone a metamorphosis, shown by changes in the behavior of researchers not only within the institutions in which they work but also in their relationships with the media, the political sphere and society. the teaching of the life sciences needs to take this into account. 
"long ago, there was a time when scientists recounted the exact circumstances of their discoveries, without shame, even when their recital showed up the fragility of their forecasts or an indecent collaboration on the part of every bit of luck. such times are past, and the researchers of today often like to make us believe that they only find what they are looking for. the thousands of pages of pasteur's lab books provide an opportune reminder to us (and to program-makers or impatient users) that it is just as difficult to ask a question as to answer it, that a scientific discovery often occurs after a long, winding path, that rather than following the fashion, it is preferable to follow one's ideas, particularly if they are good ones, and are in advance of the fashion." (jean jacques, "molecular dissymmetry", in pasteur, workbooks of a scholar, 1995)

current technological progress, the accumulation of scientific knowledge, the institutionalization of public research and many other factors are disrupting a ritual of the experimental process that had survived until the middle of the 20th century, and even beyond. the experimental life sciences of the 21st century will necessarily see themselves remodeled with respect to their objectives and procedures. faced as it is by increasingly tough international competition, the scientific community is also subject to restrictions in terms of operation and prospects. an organization into small teams of a few researchers gathered around a boss, working in friendly interaction, is increasingly giving way to large groupings that sometimes seem like consortiums. focused on research subjects that are deemed to be "cost-effective", these superstructures are encouraged, or even imposed, in the sadly illusory hope that they will lead to greater efficacy.
the person in charge of such large groups is taken up with everyday management tasks and by maintaining good relations with the administrative bodies on which his or her organization's survival depends. he or she may become distanced from the experimentation and forget the intellectual motivations that in the past caused his or her competence to be recognized. it should be emphasized that the secret of future successes lies in situations where young researchers are in direct contact with their bosses, and where friendly interaction with a known master teaches the apprentice researcher how to learn, how to think and how to experiment in a critical fashion. preoccupied by the rapid expansion of the scientific population, accompanied by the creation of laboratories whose operation necessarily requires financing, often on a large scale, political authorities, giving way to the requirements of media-fed public opinion, are interfering more and more, via administrative relays, in the control of the objectives of experimental research. short-term objectives, considered to be "visible", are favored. a priori, the viability of a project is judged according to the scientific context of the period and its impact on society, insofar as the project looks at health problems with a high degree of media coverage (cancer, degenerative illnesses, viral infections…) and often in agreement with a consensus that avoids going against the orthodoxy of the moment. this leads to a rigid management of projects that are financed and controlled according to objectives that have been fixed in advance, and that are all the more easily accepted by state authorities when they are somewhat fantastic in character. however, fundamental research proceeds from a playful activity, and for this reason, its efficacy is dependent on the passion of the researcher for the problem that he or she is studying. 
in contrast to what is believed by the narrow-minded, the effectiveness of a researcher in terms of discoveries depends upon the liberty that is given to this researcher, assuming, of course, that this liberty is underpinned by criteria of confidence such as the researcher's scientific past, his or her motivation, and judgements made concerning the researcher by impartial peers. it should not be forgotten that the determination of the three-dimensional structure of hemoglobin by max perutz (chapter iii-6.2.1) took around twenty years of solitary, uninterrupted and untiring labor. the theoretical and technical tricks that led to this success helped to open up the domain of the structures of giant macromolecules, several dozen kilodaltons in size, which no-one had dared study before. anyone who uses the experimental method realizes that while fundamental research must be organized, it cannot be scheduled. such a person knows that the pathways to discovery are convoluted, and that an inexplicable observation that appears unexpectedly during an experiment can sometimes, if the researcher is sufficiently perspicacious, be the beginning of an adventure that leads to a discovery. it was to just such a convoluted path that the belgian biologist christian de duve (b. 1917) alluded in his speech when he received the nobel prize for medicine and physiology in 1974. after working at the university of saint louis in the usa, de duve, who had taken up a post at the university of louvain, belgium, decided to look at a research theme that had received a great deal of media coverage, diabetes and insulin. it was while operating on one of the subcellular fractions obtained from ground rat's liver, and analyzing certain of the enzyme activities of these fractions, that he was surprised to find, in one of them, enriched with mitochondria, a phosphatase activity that, paradoxically, increased with time, while the enzyme activities specific to the mitochondria declined. 
this was an activity belonging to organelles that were contaminating the mitochondria. dropping all research on diabetes, de duve set out to identify and characterize these unknown organelles. he discovered that they were involved in the breakdown (lysis) of molecules that are undesired by the cell and, for this reason, he called them lysosomes. the discovery of lysosomes helped to open a new chapter in cell biology and to attribute a molecular cause to diseases with serious prognoses whose etiology had remained a mystery up until then. these diseases were given the label lysosomal diseases. these diseases result from the absence of a lysosomal enzyme that is responsible for the breakdown of a given metabolite. the accumulation of this non-broken-down metabolite in the lysosomes leads to cell malfunction, which causes the lysosomal disease. as de duve said jokingly, if he had carefully followed the experimental process laid down in his diabetes research project, and if he had not given way to the temptation of "playing hooky" or "playing truant" he would never have mounted the podium in stockholm. in the same way, henri-gèry hers (1923-2008), a cell pathologist at the internationally renowned louvain school, remarked in an article published in the review médecine/sciences: "i believe we would obtain maximum value for the money devoted to research if we were willing to distribute it to those who have been shown to be productive, according to their needs, and without asking them for a program." hers concluded, in a tone that was deliberately playful, but thought-provoking, "such a simple system would lead to unemployment for a large number of administrators, which is why i suspect that it will never be adopted."
research has its own set of ethics, driven by anticonformity and the creative imagination, capable of shaking up firmly-anchored ways of thinking and established hierarchies, and of leaving the researcher the freedom to express him or herself and to experiment off the beaten paths. as eccles says in evolution of the brain and creation of the conscience, it is important to distinguish between intelligence and imagination. intelligence is measured according to the rapidity and depth of understanding and clearness of expression. it may be measured and even given a numerical value. the same is not true for the imagination, a more subtle, unmeasurable phenomenon that cannot be learned. the imagination is one of the levers that is able to lift the boulder that hides scientific truth. the imagination is the ultimate weapon of research, which shakes up the knowledge acquired by the intelligence. nevertheless, the imagination must be tempered by a good critical sense that is able to perceive potential sources of artifacts, both in sophisticated instruments that act as so many black boxes from which already manufactured information emerges and in genetic or chemical cell exploration methods whose specificity must be carefully checked. the benefits that can sometimes be gained from prospective research that is far from dogma that is rooted in sterilizing tradition, the way in which knowledge progresses, most often by moving away from any orthodoxy, the way discoveries appear unexpectedly on the fringes of carefully put together projects, all of these points are matters for reflection for those in power in the worlds of politics, economics and industry. publication is an essential tool for communicating scientific knowledge, and is the judgement criterion for committees in charge of evaluating the creativity of a researcher. 
in order to have meaning, a publication must provide information that is sufficiently innovative with respect to parallel work carried out in other laboratories. here again, media coverage has quietly infiltrated the scene. its role is all the more perverse in that the rating of a publication is estimated according to its impact index, or, roughly speaking, the renown of the scientific journal in which it is published. curiously, it has happened that articles that would later be considered to be of primary importance have been rejected by highly prestigious journals, simply because the facts mentioned in the article and the conclusions made have not coincided with the orthodox opinions of the period and the traditionalist spirit of the journal's editorial committee. this was the case for an article which the biochemist hans krebs (1900-1981) submitted to the british journal nature in 1937. in this article krebs described a series of experiments showing that an endocellular metabolite, pyruvate, product of glycolysis, is completely degraded during a cycle of enzyme reactions. this degradation cycle would later be recognized as the central pivot of the intermediate metabolism. called upon to judge revolutionary scientific considerations, and unable to perceive their importance, nature's editorial committee rejected the article. krebs then sent his article to a journal with a relatively restricted audience, enzymologia. it was accepted and published in the two months that followed. the importance of the concept that was put forward in the article ensured that its author gained international recognition, leading to his winning the nobel prize for physiology and medicine in 1953. for the researcher, publication is a way of making his or her work known. it is also the way in which the researcher learns about the work of others.
while the rhythm at which publications in the life sciences appeared increased slightly in the first half of the 20th century, the second half of that century saw a great acceleration in this rhythm, leading to a difficult-to-manage proliferation of reviews and books. it has been estimated that in the last thirty years the volume of publications in the biological domain has increased five-fold; in the preceding twenty years it had already doubled. this accumulation of publications makes it harder for the researcher to judge the quality of the huge mass of published articles, even in the highly targeted domains that are within his or her area of expertise. the researcher, therefore, will deliberately choose a particular article according to the prestige of the journal in which it is published, which is not an infallible criterion of quality. in addition, any judgement concerning the pertinence of a scientific article necessitates a dissection of the subtleties of the methodology, the soundness of the experimental protocol and the validity of the results, by means of a careful examination of tables of results and graphs, and, finally, the logic of the discussion. this restrictive yet absolutely necessary requirement limits the number of articles that are likely to be screened. however, this is not the worst fault of publication today. there is another problem that is much more worrying. many documentation centers have reacted to this inflation in the scientific press by equipping themselves with computing facilities that are able to find, in data banks, articles that have been selected on the basis of a key word index, and to display them on screens.
while acknowledging that this constitutes an inescapable change in the transmission of scientific know-how, it should be recognized that in browsing through the pages of a high-quality scientific review, it is possible to come across an article containing an innovative idea or a useful technique, an advantage that is less available when using the on-line system of scientific publication that is most prevalent nowadays. mention should also be made of the requirement to publish frequently and within short time frames, for reasons of competitiveness, when aspiring to obtain jobs or promotions, or even just to obtain recognition; this requirement is another factor that is prejudicial to fundamental research. it is the cause of worrying excesses, such as experiments that are hastily published and non-reproducible, or even the falsification of experimental results, occasionally within a context of considerable media coverage. although such practices, which are the exception rather than the rule, are rapidly detected and condemned in a scientific culture where information circulates freely, the publicity that they attract, which reaches society at large via the media, leads to an overall discrediting of experimental research. at present, one of the most noticeable trends in scientific publication is that of collectivism. while, in the 19th century, scientific articles were usually published in the name of a single author, occasionally two authors, and very rarely more than two, nowadays publications are often co-authored by several people, and when the work involves the analysis of structures, or the sequencing of genomes, several dozen researchers may be co-authors. from being the work of individuals, research has become collective.
in domains whose complexity requires a wide selection of techniques that may range from physics to genetics, the hybridization of specific areas of expertise is certainly indispensable, and this requires the collaboration on a particular project of researchers who are sometimes physically remote from one another. the downside for the researcher, particularly one who is young, is that this requires him or her to abandon individuality and creativity. both collectivism and inflation in scientific publication are facts that are an integral part of contemporary science, facts which reflect an irreversible trend that it would be difficult to obviate. over the last few years, scientific publication has been subject to a type of restraint, in that certain "sensitive" data in the domain of molecular biology might be used for the manufacture of biological weapons in a form of terrorism known as bioterrorism. thus, the means of synthesizing de novo viruses (influenza virus, poliomyelitis) and the possibility of modifying their tropism by "directed molecular evolution" (change from a sexual tropism to a respiratory tropism for the aids virus) have been the subject of publications in prestigious journals. given sufficient means, terrorist pharmacists could well make use of such data in order to carry out malicious actions with catastrophic consequences 85 . in order to please a public that is eager for progress and the sensational, politicians favor, by means of targeted financing, the types of organization that appeal to their sensibilities, such as the technological platforms. 
while recognizing that such platforms are now an integral part of the landscape of research on living beings, and that they must therefore be taken into account, and while acknowledging that projects which implement the latest technologies in different domains need to be federated, it is nonetheless vital not to underestimate the potential creativity of small groups of researchers, a point that was expressed by one of the greatest of contemporary biologists, arthur kornberg (1918-2007), winner of the nobel prize in physiology or medicine, in a speech given in 1997: "as i view the steady growth of collective science and big science, the greatest danger i see is a dampening of individual creativity and reversion to the old politics - the inevitable local politics that infects every group and institution." however, conscious of the metamorphosis that is occurring in the experimental method, and faced with a particularly inventive and all-conquering technology, fundamental research in the life sciences must come to terms with it. a century ago, fundamental research and technological research interacted all the more directly because they were both in their infancy. this is no longer the case. management of the ever-increasing amount of knowledge in the life sciences, and the degree of sophistication achieved by bioengineering techniques and instruments, are widening a gap that makes dialogue increasingly laborious. however, dialogue appears to be a guarantee of future progress. the solution can only come from an increase in cross-disciplinarity, which should begin with university teaching and the establishment of a recruitment policy that advocates the cohabitation of talents from different educational backgrounds in the same laboratory.
fortified by such hybrid expertise, while maintaining its share of originality and liberty in the choice of problems to be studied, fundamental research on living beings can only be enriched by a marriage of reason with biotechnology. convinced of the necessity for such a marriage, stanley fields, the inventor of the double hybrid method (chapter iv-4.1), in an article entitled "the interplay of biology and technology" (proceedings of the national academy of sciences, usa, 2001, vol. 98, pp. 10051-10054), concludes: "it is at the interfaces of biology and other sciences that many of the future discoveries will be made, at the interfaces of biology and engineering that these discoveries will come to be exploited, and at the interfaces of biology and ethics and law that their consequences for society will be decided." the desired dialogue between biology and technology also implies the breaking down of barriers that too often isolate fundamental research and so-called applied research, and the facilitating of consistent interaction between the discoveries made in the academic institutions and their application for utilitarian ends in private companies. this is where the twin demons of money and power raise their heads. already, at the turn of the 1980s, a. bartlett giamatti 86 (1938-1989), who was then president of yale university in the usa, commenting on american university policies, spoke of a "ballet of antagonisms" between, on the one hand, commercial companies that are interested in the rapid cost-effectiveness of any new therapeutic advance and, on the other hand, non-profit university laboratories. recently, james j. duderstadt 87 (b. 1973), emeritus president of the university of michigan, argued that the university is a "counter-hierarchical" organism.
in fact, its members are free to carry out the research that pleases them and to think in the ways that they wish to think, in any case within an academic norm that considers itself as being free from the constraints dictated by private interest groups. until recently, such behavior was considered as a sort of ethic which arose out of the university conscience and dignity. the crumbling away of this ethic in the final decades of the 20th century coincided with the rise of biotechnologies and the large-scale filing, by researchers in the public sector, of patents relating to molecular genetics techniques that could be applied to the manipulation of living beings. the intrusion of the american private sector into public research laboratories, in the form of collaborations with transfer of "sensitive" information from the public to the private, has become such a worrying problem that drastic control measures have had to be taken. within this context, the american federal government, in february 2005, issued a certain number of prohibitions targeting the national institutes of health (nih) of bethesda, particularly with respect to the remuneration of researchers for services rendered to industry 88. these stands call for thought concerning the place that fundamental research currently holds in universities. without arguing against the efficacy of major research institutes, it is nevertheless necessary to remember the part played by the university in this domain. the university is not only the place where knowledge is transmitted, both as it is now, in its current state of advancement, and as it has been; it is also the place where knowledge must be created by fundamental research.
for the last few decades, under pressure from state policies, and also as a function of an improvement in social status, the world of the university has opened up to a wider public, leading to an influx of students that is sometimes so enormous that the task of teaching them has become overwhelming. because of this, the share of their time that university researchers can, in practice, devote to their research tasks has shrunk. this situation is highly prejudicial to the mission to innovate, which should be a priority. it is, in fact, during their university studies that the thought patterns of young students are forged by contact with teachers who not only instruct them, but also educate them by inspiring in them a motivation and an enthusiasm that gives rise to hope. how could this be true if the teaching faculty did not itself participate in scientific creation? "what can teaching, ex cathedra, do to guide the researcher? nothing, obviously. the researcher is trained in the laboratory. and the first stroke of genius on the part of a future researcher is to find a good boss. such a find will open up the royal road to success. the road will be opened -but the researcher must travel along it. a researcher may be taught many things. he or she can become familiar with techniques and with equipment. she or he can be assigned a problem to resolve. however, what is essential for the researcher is to know how to understand relationships between phenomena that seem unrelated, and to be able to progress from the particular to the general. a boss may develop such qualities in a gifted young researcher, but intuition is a gift; it cannot be taught." 
while the bernardian style experimental method, based on a working hypothesis aroused by an observation, followed by implementation of an experimental protocol, is still extant in the life sciences, and while "serendipity" is still the origin of great discoveries, "big science" , underpinned by sophisticated biocomputing or bioinformatics procedures, is intruding more and more, while genomics and proteomics are not far behind. the methods and instruments developed by the biotechnosciences have led to profound modifications in the ways that the structures and functions of living beings are investigated. for example, by varying multiple parameters in dna chips or protein chips, at the same time, the experimenter is able to ask questions that lead to grouped all-or-nothing answers (chapter iv-1.1.3). in combinatory chemistry, screening makes it possible to detect a molecule that is active for a given pathology from among a multitude of molecules (chapter iv-3.4). the mathematical simulation of metabolic networks or of signaling chains is already well under way (chapter iv-4.2). given this new technological outlook and the hope that it can provide rapid solutions to health problems subject to considerable media coverage, the teaching of biology in universities must not be limited to a description of current advances, no matter how brilliant and promising they may be. this teaching should return to its origins, be a reminder of history, and should not hesitate to use examples to illustrate how a major discovery can arise from a long period of wandering in the wilderness. in practical terms, while being conscious of the extraordinary complexity of living nature, and carefully avoiding the dangers of simplification, it is important to remember that the reductionist method was a necessary path to an understanding of the integrated, modelized biology that is emerging nowadays. 
at present, certain people call reductionism naive, but this is only the case insofar as we have faith in recent advances in integrated biology 89 . with this in mind, it should be noted that the deciphering of the protein synthesis mechanism in prokaryotic microorganisms (chapter iv-1.1.1) was, along with the discovery of the genetic code, a jumping-off point for an inventory of similar, but noticeably more sophisticated, mechanisms in eukaryotic organisms. the reductionist "one gene, one enzyme" dogma, formulated on the basis of beadle and tatum's experiments on the mold neurospora crassa (chapter iii-6.1) was a necessary prerequisite to a considerably more elaborate understanding of the relationship between the genotype and the phenotype. the way in which the nucleic acid and protein units in the tobacco mosaic virus spontaneously organize themselves (chapter iii-7.3) acted as a basis for thought concerning the self-organization of macromolecular complexes in the cell. these few examples underline the fact that it is difficult to comprehend the scientific research process if we only refer to experiments carried out in the present, and if we do not have a clear idea not only of the way in which hypotheses, even false ones, were once formulated, but also of the way in which experimental work, which may have led to failures, was once carried out, or, in brief, if we do not look back at the past. let us add that it is occasionally good for us to show some humility when we take the trouble to examine the past. 
thus, the processes involved in the phagocytosis of bacteria by innate immune cells (neutrophils, macrophages), which are today studied in the greatest detail with particularly refined technical facilities, had already been perceived more than a century ago by metchnikoff, and even analyzed, admittedly with the clumsy means at his disposal, but with such accuracy that none of the conclusions formulated at that time have yet been disproved (chapters iii-2.2.4 and iii-6.2.5). the experimental method applied to the life sciences, the history of its birth and of its development, the way in which it is regarded by political and societal authorities, and, finally, the dependencies that are developing at present between the technosciences, human medicine and the different branches of the economic sector, all of these aspects should be covered by university teaching that includes not only the pure sciences, but also the human, political and economic sciences, as well as philosophy. the student should not be saturated with book-learning, but he or she should be taught to reason, to imagine and to criticize, not to accumulate knowledge in an indigestible catalogue, but to ask questions about the way in which certain, carefully chosen, items of knowledge have been acquired, and not to deliberately accept science in its current state without knowing what it was like in the past. he or she should understand what pathways of thought led to dogmas that were established and taught as truths being refuted, and favor experimentation, with its risks and questions, rather than well-smoothed, abstract theoretical presentations without rough edges. these should be the principles of teaching that is designed to open up young minds to creativity. 
in anglo-saxon countries, the worlds of industry and research that welcome the graduate manage to communicate with one another, but these worlds ignore one another in france, or at least remain reserved, a situation which is prejudicial from the economic point of view. if we look at the pharmaceutical industry in particular, we see that only half a century ago the pharmacopeia was limited to plant extracts or active agents isolated from these plants, with antibiotics quietly beginning to make their appearance. in the last decades of the 20th century, a great technological leap forward was made, with completely new methods in bioengineering, combinatory chemistry, and the finding of therapeutic targets in macromolecules, and this created a hiatus that severely handicapped countries that were unprepared for it. france, with its biological fundamental research training that is out of phase with that of the anglo-saxon countries, fell behind, and continues to be behind, a situation that is prejudicial for its economy. the remedy for this does not lie in incantatory speeches. it requires a voluntarist policy for the management of experimental research. generally speaking, the fact that the major engineering schools in france, which recruit the scientific intellectual elite, students being chosen by competitive exams that select for intelligence rather than imagination, are unable to impose upon their students an end-of-course thesis that would authenticate their engineering degree, should not be tolerated. in contrast to other countries, in france only a small percentage of engineers have received doctoral training or had to present a thesis before entering their careers. the french dual system of major engineering schools and universities, which, a century ago, made sense for the economy of that period, has become completely obsolete, and deserves a courageous revision.
"there is a question, much older than modern science, which has never ceased haunting certain men of science: that of the conclusions that the existence of science and the contents of scientific theories can lead to concerning the relationships that humankind has with the natural world. such conclusions cannot be imposed by science as is, but they are an integral part of the metamorphosis of this science."
the new alliance. metamorphosis of science, 1986 (2nd edition)
in the 1950s-1960s, the hybridization of the techniques of genetics, biochemistry and biophysics gave birth to molecular biology. with the resolution of the double helix structure of dna, the demonstration of its replication, the elucidation of the mode of expression of its nucleotide sequence as a sequence of amino acids in proteins and finally the deciphering of the genetic code, biology underwent a revolution of an amplitude similar to that which, at the end of the 19th century, saw a blossoming of the seeds of cell biology. the last decades of the 20th century represented the utilitarian era of molecular biology. the introduction of genetic engineering into biological experimentation dates to the beginning of the 1970s. it was at this time that techniques were developed that made it possible to transfer a fragment of genomic dna from one species into the genome of another species. genetic engineering now fills a predominant position in the life sciences, supported by increasingly effective biocomputing or bioinformatics techniques. it is easy to understand that expertise and a high degree of knowledge about fundamental research are necessary in order to be able to master or even invent the genetic engineering techniques that are indispensable if we are going to produce biomolecules with a therapeutic impact, such as those that are currently being used in the pharmaceutical domain: insulin, growth hormone, blood coagulation factors, vaccines, etc.
the engineering sciences that make up the greater part of contemporary biotechnology have now come to the fore in many domains of the life sciences. it is thus that a modernistic and original way of investigating nature has come into being. a multiparametric model, in which biocomputing or bioinformatics and high-throughput screening reign, is added to, or even substituted for, the bernardian model for the experimental method, based on observation, an a priori hypothesis, and experimentation to verify this hypothesis by varying a single parameter at a time. the aim of this globalized approach is to integrate the multiple reactions that take place almost simultaneously in different locations of a cell into a coherent whole, to rationalize the interpretation of the dialogue that operates between the different endocellular organelles, and finally to discover how the exchanges of information between cells in an organ and between organs in multicellular organisms are set up. we are therefore witness to the emergence of an integrated biology that has been labeled "systems biology". its long-term objective is to model the functioning of living beings and to theorize them. its development is encouraged by the perspective of consequences that could revolutionize certain sectors of the human economy and of public health. today, concrete, mechanical models, in the form of biorobots and hybrid robots, and, very recently, molecular motors are added to abstract models that are based on the logic of mathematics and algorithms, ushering in the era of nanobiomachines. becoming more utilitarian, the life sciences are imperceptibly detaching themselves from traditional philosophical concepts that try to explain the modes of reasoning of the researcher, or even to impose a framework for thought that is likely to orient his or her way of doing research. 
looking at genetic inheritance, contemporary experimentation has shown that at all levels of the tree of nature, including man, this inheritance can be modified. aware of his or her ability to influence the functioning and the destiny of living beings, the researcher is confronted with the dilemma of a desire for knowledge versus a questioning of the use to which discoveries may be put. there has never been such a real divorce between the world of phenomena that are understood by the experimenter and the world of noumena whose intelligibility is foreign to our senses. there has never been such a wide gap between the biotechnosciences, whose possibilities are coming to be seen as limitless, and a reflective analysis of thought, which wanders between freedom of action and prohibition. as society becomes aware of the potential applications of discoveries made concerning living beings, problems of bioethics, particularly those involving reproduction, have become problems of public interest. cloning and the production of stem cells are subjects that give rise to diatribes and passions. in the near future, genotyping, which is the result of progress in pharmacogenetics, could usher in a new form of customized medicine. elsewhere, the cognitive sciences that are bringing together philosophy and psychology in the domains of computer technology and artificial intelligence, and which are tackling the processes of thought, the creative imagination and memory, will no doubt be the subject of the considerable questioning concerning research on living beings with which the experimental method will be confronted in the 21st century. when faced with the way in which biotechnologies have erupted into the life of society, the mind travels back to the allegorical illustration that embellishes francis bacon's novum organum (see figure ii.19), showing vessels returning from unknown lands, loaded with precious cargoes and returning to port having sailed past the pillars of hercules.
at present, the challenge has been partially met, but a great deal remains to be done. innumerable cargoes have already reached port, but what will be the destiny of this precious merchandise? after all, the seeds of the idea of technoscience were already in place in the 17th century, in the philosophy of francis bacon and robert boyle (chapter ii-6). bacon recommended that the governments of the time promote experimental science by the creation of laboratories equipped with high-performance instruments and libraries, by the organization of researchers into teams and by appropriate financing. the utilitarian ends of scientific research were underlined. boyle imagined a situation in which laboratories were open to society and researchers were able to accept criticism. given innovations that upset tradition, protests arose. the pneumatic machine or vacuum pump was the subject of the famous dispute between boyle and the philosopher hobbes (chapter ii-6.2). hobbes criticized the validity of boyle's conclusions, drawn from experiments that he described as doubtful. pursuing this line of argument, he came to see in the discoveries of experimental science a possible threat to the power of governments and the hierarchical layout of society. such overcautious opposition to the pursuit of knowledge is in no way anecdotal; it is still a reality, with the uprooting of genetically modified plants and the veto that has been placed in certain areas on stem cell research. this type of opposition is also shown when pressures or even vetoes are in operation that take into account more the opportunism of the moment than an in-depth understanding of science and of its history and that forget that freedom of the mind is a guarantee of its creativity, because, just as in the world of arts and letters, the world of scientific research is situated outside those norms that can be modulated by state decrees. the creativity of the researcher cannot be manufactured on demand.
where it exists, it still needs to be detected and encouraged.
key: cord-103465-6udhvl9n authors: schierding, william; horsfield, julia; o’sullivan, justin title: low tolerance for transcriptional variation at cohesin genes is accompanied by functional links to disease-relevant pathways date: 2020-04-13 journal: biorxiv doi: 10.1101/2020.04.11.037358 sha: doc_id: 103465 cord_uid: 6udhvl9n variants in dna regulatory elements can alter the regulation of distant genes through spatial-regulatory connections. in humans, these spatial-regulatory connections are largely set during early development, when the cohesin complex plays an essential role in genome organisation and cell division. a full complement of the cohesin complex and its regulators is important for normal development, since heterozygous mutations in genes encoding these components are often sufficient to produce a disease phenotype. the implication that genes encoding the cohesin complex and cohesin regulators must be tightly controlled and resistant to variability in expression has not yet been formally tested. here, we identify spatial-regulatory connections with potential to regulate expression of cohesin loci, including linking their expression to that of other genes. connections that centre on the cohesin ring subunits (mitotic: smc1a, smc3, stag1, stag2, rad21/rad21-as; meiotic: smc1b, stag3, rec8, rad21l1), cohesin-ring support genes (nipbl, mau2, wapl, pds5a and pds5b), and ctcf provide evidence of coordinated regulation that has little tolerance for perturbation.
we identified transcriptional changes across a set of genes co-regulated with the cohesin loci that include biological pathways such as extracellular matrix production and proteasome-mediated protein degradation. remarkably, many of the genes that are co-regulated with cohesin loci are themselves intolerant to loss-of-function. the results highlight the importance of robust regulation of cohesin genes, indicating novel pathways that may be important in the human cohesinopathy disorders. the cohesin complex has multiple essential roles during cell division in mitosis and meiosis, genome organisation, dna damage repair, and gene expression 1 . mutations in genes that encode members of the cohesin complex, or its regulators, cause developmental diseases known as the 'cohesinopathies' when present in the germline 2 ; or contribute to the development of cancer in somatic cells [3] [4] [5] . remarkably, cohesin mutations are almost always heterozygous, and result in depletion of the amount of functional cohesin without eliminating it altogether. complete loss of cohesin is not tolerated in healthy individuals 2 . thus, cohesin is haploinsufficient such that normal tissue development and homeostasis requires that the concentrations of cohesin and its regulatory factors remain tightly regulated. the human mitotic cohesin ring contains four integral subunits: two structural maintenance proteins (smc1a, smc3), one stromalin heat-repeat domain subunit (stag1 or stag2), and one kleisin subunit (rad21) 6 . mutation of stag2 has been linked to at least four tumour types (e.g. ewing sarcoma, glioblastoma and melanoma 7 , and bladder carcinomas) 4 . strikingly, mutations in cohesin components are especially prevalent in acute myeloid leukaemia [8] [9] [10] . in meiotic cohesin, smc1a is replaced by smc1b; stag1/2 by stag3; and rad21 by rec8 or rad21l1 1 . 
mutations in meiotic cohesin subunits are associated with infertility in men 11 , chromosome segregation errors and primary ovarian insufficiency in women 12 . cohesin is loaded onto dna by the scc2/scc4 complex (encoded by the nipbl and mau2 genes, respectively) 1 . mutations in the cohesin loading factor nipbl are associated with >65% cases of cornelia de lange syndrome (cdls). remarkably, features associated with cdls are observed with less than 30% depletion in nipbl protein levels 13 . the release of cohesin from dna is achieved by wapl, which opens up the interface connecting the smc3 and rad21 subunits. the pds5a/pds5b cohesin associated subunits affect this process by contacting cohesin to either maintain (with stag1 and stag2) or remove (with wapl) the ring from dna 1 . spatial organization and compaction of chromosomes in the nucleus involves non-random folding of dna on different scales. the genome is segregated into active a compartments and inactive b compartments 14 , inside which further organisation occurs into topologically associating domains (tads) interspersed with genomic regions with fewer interactions 14 . cohesin participates in genome organisation by mediating 'loop extrusion' of dna to form loops that anchor tad boundaries. at tad boundaries, cohesin colocalizes with the ccctc binding factor (ctcf) to form chromatin loops between convergent ctcf binding sites. fine-scale genomic interactions include chromatin loops that mediate promoter-enhancer contacts. notably, the time-and tissue-specific formation of the fine-scale loops also requires cohesin. the spatial organization of the genome is particularly dynamic and susceptible to disruption during development. for example, changes to tad boundaries are associated with developmental disorders 15 . furthermore, disruption of tad boundaries by cohesin knockdown can lead to ectopic enhancer-promoter interactions that result in changes in gene expression 16 . 
rewiring of the patterns of coarse- and fine-scale chromatin interactions also contributes to cancer development 17, 18 , including the generation of oncogenic chromosomal translocations 19, 20 . disease-associated gwas variants in non-coding dna likely act through spatially organized hubs of regulatory control elements, each component of which contributes a small amount to the observed phenotype(s), as predicted by the omnigenic hypothesis 21, 22 . non-coding mutations at cohesin and cohesin-associated factors were found by genome-wide association studies (gwas-attributed variants) to track with multiple phenotypes 23 . however, the impact of genetic variants located within cohesin and its associated genes has not yet been investigated with respect to phenotype development. we hypothesised that cohesin-associated pathologies can be affected by subtle, combinatorial changes in the regulation of cohesin genes caused by common genetic variants within control elements. here, we link the 3d structure of the genome with eqtl data to determine if gwas variants attributed to cohesin genes affect their transcription. we test cohesin gene-associated gwas variants for regulatory connections beyond the cohesin genes (gene enrichment and regulatory hubs). we also identify all variants within each gene locus that had a previously determined cis-eqtl (gtex catalog) to the cohesin gene (eqtl-attributed variant list). as with the gwas variants, we tested these variants for the presence of spatial-regulatory relationships involving genes outside of the locus. only a few of these eqtl-attributed variants are currently implicated in disease pathways, but their regulatory relationships with cohesin suggest that they may be significant for cohesinopathy disorders.
results 140 genetic variants with regulatory potential are associated with cohesin loci mitotic cohesin genes (smc1a, smc3, stag1, stag2, and rad21), meiotic cohesin genes (smc1b, stag3, rec8, and rad21l1), cohesin support genes (wapl, nipbl, pds5a, pds5b, and mau2) and ctcf were investigated to determine if they contain non-coding genetic variants (snps) that make contact in 3d with genes and therefore could directly affect gene expression (gwas-attributed and eqtl-attributed; table 1, table s1 ). a total of 106 gwas-attributed genetic variants associated with disease were identified (methods) that mapped to a cohesin gene (49 gwas; 50 snps) or cohesin-associated (49 gwas; 56 snps) gene (table s1 ). twelve snps (blue , table s1 ) are not listed in the current gtex snp dictionary (gtex v8), while fourteen snps have virtually no variation in the gtex per-tissue analysis (i.e. minor allele frequency [maf] < 0.05; green, table s1 ) and so were discarded. 80 snps passed all filters and were subsequently analysed using codes3d 21 (gwas-attributed list; table 1 ). within the gtex catalogue, 187 eqtl-attributed variants associate with regulation of the cohesin gene set (table s1 ). these variants associate with modified expression levels of cohesin genes in otherwise healthy individuals. fifty-five of these variants had a maf<0.05 (green , table s1 ) and were filtered out of the eqtl-attributed set prior to codes3d analysis (132 variants passed maf filter; eqtl-attributed; table 1 ). only three variants were shared between the gwas-attributed and eqtlattributed variant lists, resulting in a total of 290 cohesin-associated variants (gwas-and eqtl-attributed combined; table 1 ), but only 209 variants pass all filters (gwas-and eqtl-attributed combined; black, table s1 ). codes3d integrates data on the 3-dimensional organisation of the genome (captured by hi-c) and transcriptome (eqtl) associations across multiple tissue types (table s2 and s3). 
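the variant accounting above follows from simple set arithmetic. a minimal python sketch, using only the counts reported in the text, is given below; the function and variable names are illustrative, and the assumption that all three shared variants survive filtering is inferred from the reported totals rather than stated explicitly.

```python
def union_size(gwas: int, eqtl: int, shared: int) -> int:
    """Size of the union of two variant lists that have `shared` variants in common."""
    return gwas + eqtl - shared


# Counts reported in the text.
gwas_total, eqtl_total, shared = 106, 187, 3
gwas_not_in_gtex = 12   # gwas-attributed snps absent from the gtex v8 snp dictionary
gwas_low_maf = 14       # gwas-attributed snps with maf < 0.05
eqtl_low_maf = 55       # eqtl-attributed variants with maf < 0.05

# Variants remaining in each list after filtering.
gwas_kept = gwas_total - gwas_not_in_gtex - gwas_low_maf
eqtl_kept = eqtl_total - eqtl_low_maf

# Combined lists before and after filtering.
# Assumption: all three shared variants pass the filters.
all_variants = union_size(gwas_total, eqtl_total, shared)
kept_variants = union_size(gwas_kept, eqtl_kept, shared)

print(gwas_kept, eqtl_kept, all_variants, kept_variants)
```

under these assumptions the sketch reproduces the 80 and 132 filtered variants per list, the 290 combined variants, and the 209 variants that pass all filters.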
we used the codes3d algorithm to assign the 209 variants to hubs of regulatory/functional impacts by examining their potential to regulate other genes. of the 209 variants, four had zero significant eqtls, leaving 205 variants with significant eqtls (128 eqtl-attributed, 80 gwas-attributed, and 3 overlaps; table s2 and s3). however, many of the 205 variants were not attributed the gwas-or eqtl-attributed cohesin gene in the locus. of the 205 variants with eqtls, 106/187 (56.7%) eqtl-attributed variants and 37/106 (34.9%) gwas-attributed variants had a physical (hi-c detected) connection and significant eqtl with their attributed cohesin gene (140 total, 3 overlaps; table 1 ). strikingly, most of the 106 variants attributed by gwas studies to cohesin genes were not confirmed by spatial connection, with only 34% being cis-eqtls for the attributed cohesin gene (37 of 106, 34.9%). after the codes3d analysis, six of the cohesin genes have no gwas-attributed snps with a regulatory connection (stag2, nipbl, ctcf, smc3, rec8, and rad21-as1). therefore, the majority of gwas variants tested in proximity to cohesin loci have regulatory effects elsewhere in the genome. remarkably, despite five snps being attributed to nipbl by gwas, none of these were attributed to regulation of nipbl by our spatial eqtl analysis. of the cohesin or cohesin-associated genes with any gwas-attributed variants with cis-eqtls (rad21, rad21l1, smc1a, smc1b, stag1, stag3, mau2, pds5a, pds5b, and wapl), only the stag1, mau2, and pds5b loci contain more than two variants with confirmed cis-eqtls. therefore, even those loci with confirmed variant-gene gwas attributions have very few variants with evidence of cis-eqtls. to further characterize the potential for the 209 cohesin-associated variants to alter gene regulation, we analysed histone marks, dnase accessibility, and protein binding motifs (haploreg v 4.1) at each location (table s4) 24 . 
most variants reside within accessible chromatin (dnase: 61.3%) and almost all (93.1%) have at least one of three features that are consistent with putative regulatory activity (promoter histone marks, 82.8%; enhancer histone marks, 45.1%; protein binding sites, 86.8%). intriguingly, haploreg motif prediction identified 16 of the 209 variants (7 different loci: mau2, pds5b, rec8, smc1b, stag3, rad21l1, stag1) as residing within protein binding domains associated with cohesin-related dna interactions (i.e. rad21, smc3, and ctcf). therefore, most of the variants lie in regions associated with chromatin marks that highlight putative regulatory capabilities. codes3d predicted 140 out of 209 variants to have significant regulatory activity. we compared this to alternative functional variant prediction methods. the deepsea algorithm, which predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence, identified 28 of the 209 variants as having functional significance (<0.05, table s5 ). predictsnp2, which estimates noncoding variant classification (deleterious or neutral) from five separate prediction tools (cadd, dann, fat, fun, gwava tools), identified 9 of 209 variants as deleterious (table s6) . therefore, only 31/209 variants have putative functional significance predicted by these tools (28 deepsea, 9 predictsnp2, 6 overlaps). these 31 variants have support from multiple methods, suggesting a potential for higher regulatory effects; the contrast with the haploreg chromatin marks and gtex measured eqtls possibly indicates a heavy weighting against false positives in these prediction methods. in summary, gwas-attributed snps are enriched for chromatin marks (regulatory potential).
however, fewer than half the snps in proximity to the cohesin and cohesin-associated genes physically connected with the cohesin genes they are predicted to regulate, suggesting that cohesin genes are not the direct targets of these regulatory variants. pathway enrichment implicates coordinated regulation of cohesin with essential cell cycle genes codes3d identified 140 variants as being physically connected to, and associated with the expression levels of 310 genes (243 genes from eqtl-attributed variants, 141 from gwas-attributed variants, and 74 overlap) across 6,795 significant tissue-specific regulatory connections (fdr p<0.05). physical connections comprised 6,570 fine-scale connections (cis, <1mb from the variant), 42 coarsescale connections (trans-intrachromosomal, >1mb), and 183 connections on a different chromosome (trans-interchromosomal) (fig 1; tables s2, s3). of note, there is one cohesin-tocohesin regulatory connection: rs111444407, a gwas-associated variant (bipolar disorder) located within the mau2 locus has a significant trans-interchromosomal eqtl with rec8. the gene overlaps between the gwas-and eqtl-attributed analyses are also intriguing. for example, variants in the gwas-attributed and eqtl-attributed lists, each from a different chromosomal location (rec8 and pds5b), modify tcf7l1 expression. as tcf7l1 is part of the wnt pathway and is highly expressed in ovaries in the gtex catalogue, it is notable that this gene is regulated from variants in the rec8 locus (meiosis-specific cohesin). their co-regulatory relationships exemplify the systems of genomeregulatory hubs, with a total 74 genes overlapping the gwas-and eqtl-attributed analyses. the 31 significant variants highlighted by the deepsea and predictsnp2 analyses functionally connected to 107 (34.5%) of the genes identified by codes3d. 
thus, while deepsea and predictsnp2 assigned functionality to just 31/205 snps (15.1%), these variants represent 34.5% of the codes3dpredicted modulatory connections. therefore, deepsea and predictsnp2 successfully selected for variants with highly enriched regulatory functions. we used g:profiler to assess the functional enrichment of the gwas-and eqtl-attributed genes ( table s7 ). the 310 genes are enriched for pathways that support cohesin function, as we currently understand it, within the nucleus. for example, functional enrichment in sister chromatid gene ontology categories includes five non-cohesin genes from our analysis: ctnnb1, ppp2r1a, chmp4a, cul3, and dis3l2. moreover, cohesin's meiosis-specific role (smc1b, stag3, rec8) is enriched by two trans connections revealing regulation of meiosis-related genes (itpr2, ppp2r1a; kegg pathway hsa04114). collectively, these results suggest that expression of cohesin genes is coordinated with other genes that are involved in cell cycle control. meiosis-specific cohesin genes are functionally connected to kif6 and a germ cell pathway there are 108 genes functionally connected to variants within the meiosis-specific cohesin loci smc1b, stag3, rec8, and rad21l1 (table s8a) . we identified 14 significant enrichment terms using g:profiler (table s8b) , including the gene ontology "male germ cell nucleus" pathway. the gene ontology "male germ cell nucleus" pathway contains kif6, stag3, and rec8 (trans-interchromosomal connection from stag3 locus to kif6). within gtex, kif6 is highly expressed in brain and testis. the kif6 gene region was previously identified as significantly associated with a gwas of hypospadias, a birth defect presenting with a urethral opening located on the ventral side of the penis instead of at the tip of the glans. 
of particular relevance to the cohesinopathies, in which affected individuals present with cognitive defects, a mutation in human kif6 caused neurodevelopmental defects and intellectual disability 25 . variation in kif6 expression has been associated with epilepsy 26 , another notable cohesinopathy phenotype 27 . it has also been proposed that kif6 is a conserved regulator of neurological development 25 . previous findings of an association between kif6 and heart disease are not supported by gtex (kif6 is lowly expressed in gtex heart tissue) and kif6 knockout mice have no heart phenotype 28 , suggesting that the heart-associated kif6 variants might somehow affect the expression or function of other genes. we also identified an enrichment for e2f7 transcription factor binding sites within our genes ( table s8b ). the e2f7 transcription factor modulates embryonic development and cell cycle 29, 30 , with a role in cancer development 31 . of note, the e2f7 gene is loss-of-function intolerant (pli=.993), consistent with its crucial role in the cell cycle and development. collectively, our results suggest that the 301 genes are enriched for cell cycle-regulated genes, including e2f targets, and that these genes might be indirect targets of cancer drug treatments modifying e2f7 transcription factor activity. the cohesin gene regulatory network is intolerant to loss-of-function mutations a subset of human genes, whose activity is crucial to survival, are intolerant to loss-of-function (lofintolerant) mutations 32 . the gnomad catalog lists 19,704 genes: 3,063 lof-intolerant, 16,134 loftolerant, and 507 undetermined (15.5% lof-intolerance, defined as pli ≥ 0.9) 32 . all cohesin and cohesin-associated genes (except the meiosis-only rad21l1, smc1b, stag3, and rec8) are lofintolerant (table 2) . we hypothesized that genes functionally connecting to variants in cohesin genes would also be enriched for loss-of-function intolerance. 
consistent with our hypothesis, 79 of the 310 genes (25.4%) we identified are lof-intolerant (185 are lof tolerant and 46 have undetermined pli; table s9 ). stratification of the eqtl genes based on the distance between the variant and gene (cis versus trans-acting eqtl gene lists) identified a marked increase in lof-intolerance for genes regulated by trans-acting eqtls (fig 2; fig s1) .this was especially pronounced for regulatory interactions that occurred between chromosomes (fig 2; fig s1) . there was a significant correlation between distance and pli on the same chromosome (r=-0.03, p<0.05). as the cohesin genes are intolerant to even small perturbations in gene expression, we hypothesized that the lof-intolerant genes that were regulated by elements within the cohesin genes would similarly only be tolerant to small allelic fold changes in gene expression. therefore, we tested for a correlation between pli and the allelic fold change (log2 afc) associated with the eqtl. we observed that pli is significantly correlated with afc (r=-0.07, p<0.01; fig 3) . ignoring the direction of the change in expression, by using the absolute value of log2 afc, we identified an even stronger negative correlation between pli and |afc| (r=-0.31, p<0.01; fig 4) . collectively, these results are consistent with the hypothesis that long distance (especially inter-chromosomal) regulatory connections exhibit greater tissue specificity and disease associations 22,33 because they are enriched for lof-intolerant gene sets. thus, the inter-chromosomal regulatory connections potentially highlight novel disease pathways associated with the known cohesinopathies. pathway analysis identifies cohesin gene connections with extracellular matrix production and the proteasome we hypothesized that genes connected to regulatory variants within the cohesin loci might be contributing to disease-related phenotypes. 
a g:profiler transfac analysis identified significant enrichment for target sequences for the egr1 transcription factor (table s7) . egr1 transcription is regulated by stress and growth factor pathways; its binding to dna is modulated by redox state, and its transcriptional targets include genes involved in extracellular matrix production 34 . g:profiler enrichment also identified 9 genes as part of the ubiquitin mediated proteolysis pathway (birc3, birc6, cul2, ube2f, ube2k, ube4b, ubr5, cul3, xiap; tables s7 [red] and s10). the proteasome pathway tags unwanted proteins to be degraded and defects in proteolysis have a causal role in a variety of cancers. notably, 8 of the 9 (88.9%) genes identified in the ubiquitin-mediated proteolysis pathway are lof-intolerant. we hypothesized that the 310 genes we identified would have pharmacokinetic interactions with cancer treatments. notably, we identified cyp3a5 as being regulated by a cis interaction with a variant 441,888 bp away (intronic, cnpy4). cyp3a5 encodes a protein within the cytochrome p450 family. defects in p450 are known to alter cancer treatment outcomes (drug metabolism, kegg hsa00982). additionally, we identified a g:profiler enrichment for the e2f7 transcription factor within our egenes. e2f7 is up-regulated in response to treatment with doxorubicin or etoposide, topoisomerase 2 blockers 31 . indeed, many of the 310 genes from the codes3d analysis are targeted by drugs (table s11) . by contrast, for the cohesin genes, the drug-gene interaction database only lists known drug interactions for stag2 and stag3 35 . notably, consistent with our earlier observations, the drug-gene targets we identified include two genes targeted by topoisomerase blockers (e.g. papola, and xiap) 35 , both of which are lof-intolerant genes. 
through the use of the "contextualize developmental snps using 3d information" (codes3d) algorithm 21 , we have leveraged physical proximity (hi-c) and gene regulatory changes (eqtls) to reveal how variation in putative enhancers can alter the regulation of cohesin genes and modifier genes. our analysis has identified 140 eqtls that link 310 genes associated with the mitotic cohesin ring genes (smc1a, smc3, stag1, stag2, and rad21/rad21-as), meiosis-specific cohesin ring genes (smc1b, stag3, rec8, and rad21l1), cohesin-associated support proteins (wapl, nipbl, pds5a, pds5b, and mau2) and ctcf. collectively, these results form an atlas of functional connections from cohesin genes to proximal and distal genes, some of which reside on different chromosomes to the regulatory elements. these results agree with previous findings that spatial eqtls mark hubs of activity across a multi-morbidity atlas 21 . only 35% of variant-gene mappings in the gwas catalogue were supported by spatial cis-eqtls. therefore, although several gwas-associated disease snps have been linked with cohesin, only a minority have supporting evidence that these variants actually regulate the cohesin genes. as such, our results call into question the validity of many of the previous associations between non-coding genetic variants and the cohesin genes that have been made in the gwas catalogue. previous reports have suggested that cis-eqtl gene sets are depleted of lof-intolerant genes when compared to similarly sized sets of non-eqtl genes 36 . in this context, our finding that the 310 genes regulated by variants in cohesin loci are enriched for lof-intolerant genes is notable. the apparent bias in existing studies can be explained by eqtl studies being predominantly focused on nearby (cis) variant-gene transcriptional connections and hi-c studies focusing on local changes in tad structure, due to technical limitations in the current analysis pipelines.
we revealed that the greater the distance separating the eqtl and target gene, the more likely the target gene was to be lof-intolerant (i.e. over 30% of trans-interchromosomal interactions involved lof-intolerant genes). this finding is consistent with studies that show that long distance connections exhibit greater disease-and tissue-specificity 22, 33, 37 . the demonstration that the cohesin genes are also lof-intolerant agrees with their recognised haploinsufficiency in human developmental disease 2, 13 . notably, the genes that were enriched within pathways of pathological importance (e.g. 8 of the 9 genes in the ubiquitin-mediated proteolysis pathway gene set) were more likely to be lof-intolerant. this is consistent with previous findings that eqtl-identified genes are enriched for genome-wide disease heritability, and that the subset of eqtl genes with lof-intolerance shows even larger enrichments for genome-wide disease heritability 38 . the codes3d method identified eqtl links between cohesin genes and other loci not related to cohesin. two pathways emerged from this analysis. firstly, spatial eqtl connections with cohesin genes identified an enrichment for genes that are regulated by zinc finger transcription factors including egr1 and znf880. egr1 positively regulates extracellular matrix (ecm) production. interestingly, we recently observed widespread dysregulation of extracellular matrix genes upon deletion of cohesin genes in leukaemia cell lines 39 . this supports the idea that regulation of the cohesin complex is tightly associated with ecm production. additional support for this is derived from the observation that cohesin subunit smc3 exists in the form of an extracellular chondroitin sulfate proteoglycan known as bamacan 40 . additionally, in asthma, smc3 upregulation significantly affected ecm components 41 . the ecm facet of cohesin biology is relatively under-explored and is worthy of further investigation.
secondly, spatial eqtl connections with cohesin genes identified an enrichment for genes that encode effectors of the proteasome pathway. the stability of many cohesin proteins is regulated by the proteasome pathway 1,42,43 , but aside from this, genetic interactions between cohesin genes and proteasome pathway genes remain unexplored. in conclusion, many studies of mutations focus on the impact of coding-region variation, relying on natural knockouts (especially missense and loss of function variants) to identify gene function. our analysis highlights what those studies might be missing: sets of co-ordinated genes important to disease but largely intolerant to lof mutation in healthy individuals. we identified a novel set of genes which are regulated by elements within the cohesin genes. we found that many of the pathways and transcription factor binding sites enriched within these genes were relevant to disease pathways in development and cancer. moreover, drug-gene interactions further reinforce the importance of these connections to cancer drug treatments and in particular topoisomerase-targeting drugs. as such, our results support recent reports of the importance of long-distance regulation as a key driver of phenotype development 33 . methods a large number of gwas studies have mapped phenotypic variation to cohesin ring gene loci we searched the gwas catalogue for snps mapped or attributed to mitotic cohesin ring genes (smc1a, smc3, stag1, stag2, and rad21/rad21-as), meiosis-specific cohesin ring genes (smc1b, stag3, rec8, and rad21l1), cohesin-associated support proteins (wapl, nipbl, pds5a, pds5b, and mau2), and ctcf from gwas studies covering a large assortment of altered phenotypes and pathologies across most tissues in the body (gwas-attributed). genomic positions of snps were obtained from dbsnp for human reference hg38.
beyond variants with association to disease, we searched the gtex catalogue for cis-regulatory variants (variants within 1mb) that modify the expression of either cohesin ring genes (smc1a, smc1b, smc3, stag1, stag2, stag3, rad21, rec8, and rad21l1), cohesin support genes (wapl, nipbl, pds5a, pds5b, and mau2), or ctcf in one or more of 44 tissues across the human body (eqtl-attributed). unlike gwas variants, these variants have no inherent association to a phenotype (except the overlaps), as gtex contains individuals that were relatively healthy prior to mortality. thus, these variants explain variation in gene expression in a normal, mostly older cohort. genomic positions of snps were obtained from dbsnp for human reference hg38. for all gwas-attributed and eqtl-attributed variants, spatial regulatory connections were identified through genes whose transcript levels depend on the identity of the snp through both spatial interaction (hi-c data) plus expression data (expression quantitative trait locus [eqtl]; gtex v8 45 ) using the codes3d algorithm (https://github.com/genome3d/codes3d-v1) 21, 46 . spatial-eqtl association p-values were adjusted using the benjamini-hochberg procedure, and associations with adjusted p-values < 0.05 were deemed spatial eqtl-egene pairs. variants not found in the gtex catalogue or variants with a minor allele frequency below 5% were filtered out due to the sample size of gtex at each tissue. to identify snp locations in the hi-c data, reference libraries of all possible hi-c fragment locations were identified through digital digestion of the hg38 human reference genome with the same restriction enzyme employed in preparing the hi-c libraries (i.e. mboi, hindiii). digestion files contained all possible fragments, from which a snp library was created, containing all genome fragments containing a snp.
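the digital digestion and snp-to-fragment lookup described above can be illustrated with a short python sketch. this is not the codes3d implementation: the function names, the 0-based half-open coordinates, and the toy sequence are assumptions made for illustration. the recognition sites and cut positions used for mboi (cuts immediately before gatc) and hindiii (cuts between the two leading a residues of aagctt) are the standard ones for these enzymes.

```python
import bisect
import re

# Recognition site and top-strand cut offset for the two enzymes named in the text.
ENZYMES = {"mboi": ("GATC", 0), "hindiii": ("AAGCTT", 1)}


def digest(sequence, enzyme):
    """Return all restriction fragments as (start, end) pairs, 0-based half-open."""
    site, offset = ENZYMES[enzyme.lower()]
    seq = sequence.upper()
    # A zero-width lookahead reports every occurrence of the recognition site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = sorted(set([0] + cuts + [len(seq)]))
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def fragment_containing(fragments, position):
    """Return the fragment that contains a 0-based position, or None if out of range."""
    starts = [start for start, _ in fragments]
    idx = bisect.bisect_right(starts, position) - 1
    if idx >= 0 and fragments[idx][0] <= position < fragments[idx][1]:
        return fragments[idx]
    return None


# Toy sequence (hypothetical) with one mboi site and one hindiii site.
toy = "AAGATCTTTAAGCTTGG"
mboi_fragments = digest(toy, "mboi")
hindiii_fragments = digest(toy, "hindiii")
snp_fragment = fragment_containing(mboi_fragments, 5)  # hypothetical snp at position 5
print(mboi_fragments, hindiii_fragments, snp_fragment)
```

in a real analysis the same lookup would be run per chromosome against the hg38 digestion files, and only fragments that contain at least one snp would be retained for the snp library.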
next, all snp-containing fragments were queried against the hi-c databases to find distal fragments of dna which spatially connect to the snp-fragment. if the distal fragment contained the coding region of a gene, a snp-gene spatial connection was confirmed. there was no binning or padding around restriction fragments to obtain gene overlap. to limit technological challenges, gene transcripts for both the spatial and eqtl analyses used the gencode transcript model. spatial connections were identified from previously generated hi-c libraries of various origins (supp table 12 ): 1) cell lines gm12878, hmec, huvec, imr90, k562, kbm7, hela, nhek, and hesc (geo accession numbers gse63525, gse43070, and gse35156); 2) tissue-specific data from encode sourced from the adrenal gland, bladder, dorsolateral prefrontal cortex, hippocampus, lung, ovary, pancreas, psoas muscle, right ventricle, small bowel, and spleen (gse87112); and 3) tissues of neural origin from the cortical and germinal plate neurons (gse77565), cerebellar astrocytes, brain vascular pericytes, brain microvascular endothelial cells, sk-n-mc, and spinal cord astrocytes (gse105194, gse105513, gse105544, gse105914, gse105957), and neuronal progenitor cells (gse52457). the human transcriptome consists of genes with varying levels of redundancy and critical function, resulting in some genes being intolerant to loss-of-function (lof-intolerant) mutation. this subset of the human transcriptome is posited to also be more intolerant to regulatory perturbation. the gnomad catalog 32 lists 19,704 genes and their likelihood of being intolerant to loss-of-function mutations (pli), resulting in 3,063 lof-intolerant, 16,134 lof-tolerant, and 507 undetermined (15.5% lof-intolerance, defined as pli ≥ 0.9) 32 . we tested all cohesin, cohesin-associated genes, and those from our analysis (gwas-and eqtl-attributed) for lof-intolerance, comparing our cis and trans-acting eqtl gene lists for enrichment for lof-intolerance.
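the pli grouping and the comparison against the gnomad background can be sketched in python as follows. the pli ≥ 0.9 threshold and all counts are taken from the text; treating a missing pli value as "undetermined" and using a one-sided exact binomial test are assumptions made for illustration, since the statistical test used for the enrichment comparison is not named here.

```python
from math import comb

PLI_THRESHOLD = 0.9  # lof-intolerance cut-off stated in the text


def classify_pli(pli):
    """Group a gene by its pli score; None stands for an undetermined score (assumption)."""
    if pli is None:
        return "undetermined"
    return "lof-intolerant" if pli >= PLI_THRESHOLD else "lof-tolerant"


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Background rate from the gnomad counts quoted in the text.
background_rate = 3063 / 19704

# Observed lof-intolerant genes among the 310 genes reported in the results.
observed, total = 79, 310
observed_rate = observed / total

# One-sided test of enrichment over the background rate (illustrative choice of test).
p_value = binomial_upper_tail(observed, total, background_rate)
print(f"background={background_rate:.3f} observed={observed_rate:.3f} p={p_value:.2e}")
```

the same two functions can be applied separately to the cis, trans-intrachromosomal and trans-interchromosomal gene lists to compare their proportions of lof-intolerant genes.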
as pli is bimodal and non-normally distributed, we tested both pli raw values as well as pli grouping (tolerant vs intolerant) for correlation between eqtl effect size (log2 allelic fold change, afc) and intolerance to disruption (pli). we considered both afc and its absolute value (direction of effect ignored), as it has been suggested that the eqtl effect direction is determined by how you define the minor allele within the population, not the actual molecular impact of the eqtl on the cohesin connection. this analysis highlights the significance of long-distance gene regulation on otherwise mutationally-constrained (lof-intolerant) genes. all genes from the gwas-attributed and eqtl-attributed analyses were then annotated for significant biological and functional enrichment using g:profiler 47 , which includes the kyoto encyclopedia of genes and genomes (kegg) pathway database (https://www.kegg.jp/kegg/pathway.html) for pathways and transfac for transcription factor binding enrichment. finally, we identified drugs that target the genes and related mechanisms through the drug gene interaction database (dgidb) 35 . to predict the most phenotypically causal variants within the variant set, we compared variants from the codes3d analysis with several tools which leverage deep learning-based algorithmic frameworks to classify functional relevance from dna markers including identified chromatin marks (enhancer marks, etc). we used deepsea 48 to predict the chromatin effects of variants to prioritize regulatory variants and predictsnp2 49 to summarise estimates of noncoding variants for classification (deleterious or neutral). deepsea predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence (transcription factors binding, dnase i sensitivities, and histone marks) across multiple cell types. predictsnp2 predictions are a consensus score from across five separate prediction tools for variant prioritization: cadd 1. 
tables table 1 the eqtl-attributed variant list consists of 187 variants, but after filtering results in 106 variants with significant spatial eqtls. the gwas-attributed variant list consists of 106 variants, but after filtering results in 37 variants with significant spatial eqtls. overall, the 290 attributed variants results in 140 variants with significant spatial eqtls. table 2 . pli of cohesin genes shows largely mutation intolerance the main set of genes comprising cohesin and cohesin-support are all loss-of-function intolerant (pli > 0.9). however, the subset of cohesin genes specific to only meiosis are not loss-of-function intolerant. the eqtl-attributed variant list consists of 187 variants, but after filtering results in 106 variants with significant spatial eqtls. the gwas-attributed variant list consists of 106 variants, but after filtering results in 37 variants with significant spatial eqtls. overall, the 290 attributed variants results in 140 variants with significant spatial eqtls. red: variants overlapping each set; green: variants filtered from codes3d for minor allele frequency (maf) < 0.05; blue: variants not in the gtex variant dictionary (no variant in gtex). codes3d scans the gwas-attributed variant list (106 variants) for variants with physical proximity to genes that is supported by allele-specific gene expression changes (eqtls). this analysis found 2188 variant-gene-tissues connections, involving 37 gwas-attributed variants. codes3d scans the eqtl-attributed variant list (187 variants) for variants with physical proximity to genes that is supported by allele-specific gene expression changes (eqtls). this analysis found 4607 variant-gene-tissues connections, involving 106 eqtl-attributed variants. we analysed the 209 genetic variants we identified for patterns of histone marks, dnase accessibility, and protein binding motifs in haploreg v 4.1. most of the variants have marks of accessible chromatin (dnase: 61.3%). 
in addition, almost all of the variants (93.1%) have at least one of the three: promoter histone marks (82.8%), enhancer histone marks (45.1%), proteins binding site (86.8%). additionally, 16 of the variants modify protein binding or motif predictions in rad21 (12), smc3 (4), and/or ctcf (10). green: protein or motif lists including a cohesin gene. supplementary table 5 . deepsea estimated effect of the cis-associated and gwasassociated variants deepsea predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence (transcription factors binding, dnase i sensitivities, and histone marks) across multiple cell types. this results in 28 variants identified as functionally significant. cohesin: its roles and mechanisms diverse developmental disorders fromthe one ring: distinct molecular pathways underlie the cohesinopathies gene regulation by cohesin in cancer: is the ring an unexpected party to proliferation? cohesin in cancer: chromosome segregation and beyond cohesin mutations in human cancer the cohesin release factor wapl restricts chromatin loop extension mutational inactivation of stag2 causes aneuploidy in human cancer. science (80-. 
) towards a better understanding of cohesin mutations in aml cohesin mutations in myeloid malignancies made simple cohesin mutations in myeloid malignancies: underlying mechanisms mutations in the stromal antigen 3 (stag3) gene cause male infertility due to meiotic arrest meiotic kinetochores fragment into multiple lobes upon cohesin loss in aging eggs cornelia de lange syndrome: from molecular diagnosis to therapeutic approach a 3d map of the human genome at kilobase resolution reveals principles of chromatin looping long-range interactions between topologically associating domains shape the four-dimensional genome during differentiation cohesin facilitates zygotic genome activation in zebrafish proteogenomics and hi-c reveal transcriptional dysregulation in high hyperdiploid childhood acute lymphoblastic leukemia epigenetic reprogramming at estrogen-receptor binding sites alters 3d chromatin landscape in endocrine-resistant breast cancer spatial chromosome folding and active transcription drive dna fragility and formation of oncogenic mll translocations topoisomerase ii-induced chromosome breakage and translocation is determined by chromosome architecture and transcriptional activity chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities trans effects on gene expression can drive omnigenic inheritance the nhgri gwas catalog, a curated resource of snp-trait associations haploreg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants mutations in kinesin family member 6 reveal specific role in ependymal cell ciliogenesis and human neurological development genetic variants in incident sudep cases from a community-based prospective cohort with epilepsy epilepsy in patients with cornelia de lange syndrome: a clinical series no evidence for cardiac dysfunction in kif6 mutant mice atypical e2f repressors and activators coordinate placental 
this work was supported by a royal society of new zealand marsden grant to
jh and jos (16-uoo-072), and ws was supported by the same grant. the drug-gene interaction database only lists known drug interactions for stag2 and stag3, and only for cancer treatments. however, dgidb identifies a large number of drug-gene interactions with the 310 genes identified by codes3d. notably, the drug-gene interaction analysis identifies two topoisomerase targets: camptothecin (stag2 interaction, a topoisomerase-i inhibitor used in cancer treatments) and etoposide (papola and xiap interactions, which inhibits the topoisomerase ii enzyme). supplementary table 12. hi-c datasets used in this study spatial connections were identified from previously generated hi-c libraries of various origins: 1) cell lines gm12878, hmec, huvec, imr90, k562, kbm7, hela, nhek, and hesc (geo accession numbers gse63525, gse43070, and gse35156); 2) tissue-specific data from encode sourced from the adrenal gland, bladder, dorsolateral prefrontal cortex, hippocampus, lung, ovary, pancreas, psoas muscle, right ventricle, small bowel, and spleen (gse87112); and 3) tissues of neural origin from the cortical and germinal plate neurons (gse77565), cerebellar astrocytes, brain vascular pericytes, brain microvascular endothelial cells, sk-n-mc, and spinal cord astrocytes (gse105194, gse105513, gse105544, gse105914, gse105957), and neuronal progenitor cells (gse52457). the 310 genes identified by codes3d were searched for enrichment in various biological processes through g:profiler, identifying 54 significantly enriched processes. when removing the cohesin genes from the enrichment analysis, 4 processes remain significant (red), including 2 ubiquitin gene ontologies. supplementary table 8. gene list and g:profiler enrichment for meiosis-specific cohesin genes (rad21l1, rec8, smc1b, stag3; g:profiler 2019) (a) there are 108 genes functionally connected to variants within the meiosis-specific cohesin loci smc1b, stag3, rec8, and rad21l1.
(b) we identified 14 significantly enriched processes, including the gene ontology "male germ cell nucleus" pathway and transfac targets for the e2f7 transcription factor, which link to meiosis- and cell-cycle-specific mechanisms. the gnomad database identified lof-intolerant genes (intolerance defined as pli ≥ 0.9). here we show the pli for the 310 genes identified by spatial eqtls in our gwas- and eqtl-attributed variant analysis. overall, 185 are lof tolerant, 79 of the 310 genes (25.4%) are lof-intolerant, and 46 lack pli. supplementary table 10. highlights of genes identified within the ubiquitin mediated kegg pathway within the g:profiler analysis, two ubiquitin-mediated kegg pathways were significant even when removing cohesin from the analysis. here we highlight the pathways and their associated genes, eqtl distance, variant-attributed locus, and source of variant-attribution. the kegg disease pathways identified above are largely identifying lof-intolerant genes (8 of 9 [88.9%] kegg ubiquitin-mediated proteolysis genes). supplementary table 11. drug-gene interactions with eqtl-attributed and gwas-attributed eqtl genes (dgidb v 3.0)
key: cord-014462-11ggaqf1 authors: nan title: abstracts of the papers presented in the xix national conference of indian virological society, “recent trends in viral disease problems and management”, on 18–20 march, 2010, at s.v. university, tirupati, andhra pradesh date: 2011-04-21 journal: indian j virol doi: 10.1007/s13337-011-0027-2 sha: doc_id: 14462 cord_uid: 11ggaqf1 nan patients showed rashes on face, hand and foot. ev detection was carried out in vesicular fluid, stool, serum and throat swab specimens by rt-pcr of the 5′ ncr gene. serotyping was carried out by using rt-pcr of the viral protein vp1/2a junction region followed by sequencing and phylogenetic analysis using the neighbor-joining algorithm and kimura 2-parameter model of mega-4 software.
overall ev positivity detected in hfmd patients from kerala, tamil nadu, west bengal and orissa states was found to be 51.6%, 66.6%, 62.5% and 71.4% respectively. typing of vp1 gene sequences indicated the presence of ca-6, ev-71, echo-9 strains in kerala and ca-16 in west bengal, orissa and tamil nadu. phylogenetic analysis indicated ca-6, ev-71, echo-9 strains showed 94.8-95.7% and 95-94.4% homology with japanese, australian and french strains. however, ca-16 strains were closer to malaysian strains with 91.2-95.6% nucleotide homology. the present study documents the association of multiple types of evs, i.e., ca-6, ev-71, echo-9 and ca-16 strains, contributing as prime viral pathogens in hfmd epidemics in the reported regions, with new emergence of a circulating ca-6 strain in kerala, india. tasgaon september 2010. sera were collected from 162 suspected hepatitis cases and their contacts and tested for anti hev igm/igg antibodies (elisa) and liver enzymes like alanine aminotransferase (alt). anti hev igm antibodies were detected in 45.7% (74/162) of the suspected cases. the overall attack rate was 0.7%. the male to female ratio was 2:1. the majority (60.4%) of the cases were in the age group 20-40 years and recovered without any clinical complications. weekly distribution of cases showed that the majority (79.4%, 116/146) of cases occurred between the 2nd and 3rd week of june. dark urine (97.5%), jaundice (93.5%), fatigue (35.9%), abdominal pain (32.6%), anorexia (29.4%), vomiting (26.5%), fever (22.8%), giddiness (14.3%), diarrhoea (12.6%) and arthralgia (3.7%) were the prominent symptoms. sera collected from 73 antenatal cases (ancs) showed anti hev igm antibody in 3. affected pregnant women had a normal outcome. the death of a 32-year-old male hepatitis e case, who had cirrhosis of the liver with oesophageal varices, was reported during the outbreak period. sanitary survey revealed that water pipelines were laid down in close proximity to the sewerage system, and water posts were without taps.
these are the likely sources of faecal contamination of water supplies. among 17 water samples collected from various places, 5 were found to be unfit for drinking based on the routine bacteriological tests conducted at state public health laboratory, pune. no case occurred after the pipelines were repaired. this typical outbreak of hepatitis e re-emphasizes the need for proper water supply/sewage disposal pipelines and adequate maintenance measures. jayanthi shastri, nilima vaidya, sandhya sawant, umesh aigal department of molecular biology, kasturba hospital for infectious diseases, mumbai, india dengue and dengue haemorrhagic fever are amongst the most important challenges in tropical diseases due to their expanding geographic distribution, increasing outbreak frequency, hyperendemicity and evolution of virulence. the global prevalence of dengue has grown dramatically in recent decades. who estimates 50-100 million cases of dengue virus infections worldwide every year resulting in 250,000 to 500,000 cases of dhf and 24,000 deaths each year. public health laboratories require rapid diagnosis of dengue outbreaks for application of measures such as vector control. laboratory diagnosis of dengue virus infection can be made by the detection of specific virus, viral antigen, genomic sequence and/or antibodies. currently, the 3 basic methods used by laboratories for diagnosis of dengue virus infection are virus isolation and characterisation, detection of genomic sequence by nucleic acid amplification technology assay and detection of dengue virus specific antibodies/antigen. molecular diagnosis based on reverse transcription (rt)-pcr, such as one-step or nested pcr, nucleic acid sequence based amplification (nasba), or real time rt-pcr, has gradually replaced the virus isolation method as the new standard for the detection of dengue virus in acute phase serum samples.
several pcr protocols for detection have been described that vary in the extraction method, genomic location of primers, specificity, sensitivity and the methods to determine the products and the serotype. pcr-based dengue tests, due to the specificity of amplification, enable a definitive diagnosis and serotyping of the virus. in addition, dna sequencing of the amplification product enables the virus to be genotyped, providing important information on the sources of infection. more recently, tests have incorporated fluorogenic probes, so-called taqman technology, for the specific real time detection of dengue 1-4 amplicons. product is detected by a specific oligodeoxynucleotide probe that is labelled with 6-carboxyfluorescein (fam). first, this technology offers the advantage of being both rapid and potentially quantitative. second, the detection of product by hybridisation of fluorochrome labelled probes increases specificity. third, as the product is detected without the need to open the reaction tube, the risk of contamination by product carry over is minimised. the advantages of speed, contamination minimisation and reduced turnaround time justify application of this assay over the currently used nested pcr assay. during the period january 2007 to october 2009, molecular laboratory received 900 samples from patients presenting with acute onset fever for dengue .6%) samples were tested positive by this method. the disease peaks in the monsoon season with a percentage of 17.5%. rapid tests, igm and igg capture elisa are popularly used tests for diagnosis of dengue infection. their utility is limited to diagnosing dengue in convalescence (8-14 days). specificity is also compromised due to infections with flaviviruses: japanese encephalitis and chikungunya. dengue ns1 ag elisa with its cost effectiveness, specificity and sensitivity should be considered as the test of choice for diagnosing dengue in the acute phase of illness in the developing countries.
molecular diagnosis enables confirmatory diagnosis of dengue in the acute phase of the illness and is suitable for further typing methods. assistant general manager and r&d coordinator, division of quality control and r&d, bharat immunologicals and biologicals corporation ltd., village chola, bulandshahr, up vaccine development in india, though slow to start, has progressed by leaps and bounds in the past 60 years. india was once dependent on imported vaccines, but it is now not only self-sufficient in the production of vaccines conforming to international standards but also a major supplier of them to unicef. the role of drug authorities is to enhance public health by assuring the availability of safe and effective vaccines, allergenic extracts, and other related products (indian j. virol. (september 2010) 21(suppl. 1):a1-a58). vaccine development is tightly regulated by a hierarchy of regulatory bodies. guidelines provided by the indian council of medical research (icmr) set the rules of conduct for clinical trials from phase i to iv studies as well as studies on combination vaccines. these guidelines address ethical issues that arise during a vaccine study. a network of adverse drug reaction (adr) monitoring centers along with the adverse events following immunization (aefi) monitoring program provide the machinery for vaccine pharmacovigilance. genetic modifications have been used to develop effective and cheaper vaccines through recombinant technology. to ensure the safety of consumers, producers, experimental animals and the environment, governments all over the world are following regulatory mechanisms and guidelines for genetically modified products. as with other industrializing countries undergoing rapid shifts, india clearly recognizes the need to restructure its regulatory system so that its biopharmaceutical industry can compete in international markets.
genetic engineering approval council (geac), recombinant dna advisory committee (rdac), review committee on genetic manipulation (rcgm), institutional biosafety committees (ibsc) are responsible for development, commitment for parameters and commercialization of recombinant vaccines. to centralize and coordinate the whole system, the government has undertaken to form two agencies to administer the regulations governing the development of recombinant pharmaceutical products, including vaccines. the first is the creation of the national biotechnology regulatory authority (nbra), under the department of biotechnology (dbt), as part of india's long-term biotech sector development strategy. the second major initiative will affect the entire indian pharmaceutical industry. this is the replacement of most state, district, and central drug regulatory agencies with a single, central, fda-style agency, the central drug authority (cda). the cda is expected to have separate, semi-autonomous departments for regulation, enforcement, legal, and consumer affairs; biotechnology products; pharmacovigilance and drugs safety; medical devices and diagnostics; imports; quality control; and traditional indian medicines. it will set up offices throughout india and will be paid for by inspection, registration, and license fees. its enforcement powers will be strengthened by a new law increasing the criminal penalties for illegal clinical trials. in the manufacturing area, though, the country has been tightening the rules and enforcement. an amendment to the regulations, "schedule m" of the drug and cosmetics act, now specifies the good manufacturing practice (gmp) requirements for factory premises and materials. these requirements were modeled after us fda regulations, to improve regulatory coordination between indian and us regulators. india has realized the importance of regulations in the pharmaceutical field, especially for vaccines, but it will take several years to implement them.
india has coordinated some of its regulatory functions with western organizations. the us pharmacopoeia established an office in hyderabad in 2007. a representative of the indian pharmaceutical lobby also recently has expressed openness to an expansion of the fda's oversight of indian manufacturing. as india expands its global drug and biologicals production, the us and europe, as the world's largest drug importers, will likely expand their regulatory support in the development of the country's regulatory systems. rapid diagnosis of japanese encephalitis virus (jev) infections is important for timely clinical management and epidemiological control in areas where multiple flaviviruses are endemic. however, the speed and accuracy of diagnosis must be balanced against test cost and availability, especially in developing countries. an antigen capture enzyme-linked immunosorbent assay (elisa) for detection of circulating jev specific nonstructural protein 1 (ns1) was developed by using monoclonal antibodies (mabs) specific to recombinant ns1. the applicability of this jev ns1 antigen capture elisa for early clinical diagnosis was evaluated with 200 acute phase serum/cerebrospinal fluid (csf) specimens collected from different epidemics during 2007-2009. jev ns1 antigen was detected in circulation from day 1 to 18. the sensitivity and specificity of jev ns1 detection in serum/csf specimens with reference to reverse transcriptase pcr were 82% and 98.9%, respectively. no cross-reactions with any of the other closely related members of the genus flavivirus (dengue, west nile, yellow fever and saint louis encephalitis (sle) viruses) were observed when tested with either clinical specimens or virus cultures. these findings suggested that the reported jev specific mab-based ns1 antigen capture elisa will be a rapid and reliable tool for early confirmatory diagnosis as well as surveillance of je infections in developing countries.
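the sensitivity and specificity reported above follow the usual definitions against a reference test (here reverse transcriptase pcr). a minimal python sketch of that calculation is given below; the function name and the counts passed to it are hypothetical and chosen only for illustration, because the abstract reports the percentages and the total of 200 specimens but not the underlying 2x2 table.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity) of an index test versus a reference test.

    tp / fn: reference-positive specimens that the index test calls positive / negative
    tn / fp: reference-negative specimens that the index test calls negative / positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative specimen")
    sensitivity = tp / (tp + fn)  # fraction of reference positives detected
    specificity = tn / (tn + fp)  # fraction of reference negatives correctly excluded
    return sensitivity, specificity


# hypothetical counts for illustration only; they are NOT taken from the abstract
sens, spec = diagnostic_accuracy(tp=82, fn=18, tn=99, fp=1)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

the same function can be applied to any index test and reference test once the four cell counts are known.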
manmohan parida the recent emergence of a novel human influenza a virus (h1n1) poses a serious global health threat. the h1n1 virus has caused a considerable number of deaths within a short duration since its emergence. a two-step single tube accelerated rapid real-time and quantitative swine flu virus specific h1 rtlamp assay is reported by targeting the h1 gene of the novel h1n1 hybrid virus. the feasibility of swine flu h1 rtlamp for clinical diagnosis was validated with a panel of 239 suspected throat wash samples comprising 116 confirmed positive and 123 confirmed negative cases of the ongoing epidemic. the comparative evaluation of the h1 specific rtlamp assay with real-time rt-pcr demonstrated exceptionally higher sensitivity by picking up all the 116 h1n1 positive and 36 additional positive cases amongst the negatives that were sequence confirmed as h1n1. none of the real-time rtpcr positive samples were missed by the rtlamp system. the comparative study revealed that rtlamp was 100-fold more sensitive than rtpcr with a detection limit of 1 copy. these findings suggested that the rtlamp assay is a valuable tool for rapid, real-time detection as well as quantification of h1n1 virus in acute phase throat swab samples without requiring any sophisticated equipment. because of its recurrent nature. despite considerable progress in understanding of the virus at cellular and molecular levels, the proper management of the disease in its different stages is still a dilemma, particularly whether to use antivirals or steroids or both. the risk of using steroids with its attendant complications has to be weighed against the risk of progression of the disease if the use of steroids is avoided. this dilemma can be reduced to a considerable extent if basic principles of virology and pathogenesis are kept in mind. this article reviews current concepts of virological and clinical aspects of hsv keratitis to enable a broad understanding of the disease process.
several influential host factors are recognized, including the fact that hsk is more common in men than in women. the ability of hsv to establish latent infection in sensory neurons and possibly the cornea has been observed, but this knowledge has not yet been used to prevent the disease. acknowledging these limitations may further stimulate application of laboratory knowledge in coping with hsk, which continues to present a major challenge in terms of management. mvo-10 study on effect of human bhsp90 in immunity of hcv core protein and hbv hbsag there are more than 500 million individuals with hepatitis b and c in the world. in spite of vaccination in different areas, there are several reports about patients who had been vaccinated before. also, there is no efficient vaccine against hepatitis c, and one of the important problems in vaccine projects is the development of effective and suitable adjuvants for human vaccines. in the present research we applied human bhsp90 protein as adjuvant and chaperone. this protein was injected into balb/c mice as adjuvant together with recombinant proteins of hcv core and hbv hbsag. then the humoral and cellular immune systems of the mice were studied. core and hbsag genes were cloned into the petduet-1 vector, and the thermal vector pgp1-2 was used for human heat shock protein 90 expression. different combinations of these three proteins were injected into mice and we evaluated the total igg and igg2a of mouse sera after a week. two weeks after booster injection, we studied the proliferation and cytokine secretion of spleen, inguinal and popliteal lymph node lymphocytes under in vitro and ex vivo conditions. the core/hbsag + hsp and core + hbsag + hsp complexes induced total igg and igg2a secretion. spleen lymphocyte proliferation increased in line with the serum igg2a level, which remained constant at the second bleeding, with a significant difference from complexes with freund's adjuvant.
at first, il-4 and il-5 cytokines increased; the subsequent decrease of il-4 indicated no hypersensitivity. the chaperone effect of hsp90 on the structure of core and hbsag proteins was studied by cd and fluorometer. it could fold the proteins after heating and unfolding. hepatitis b virus (hbv) infection is a vaccine-preventable global public health problem. all commercially available vaccines contain one or more of the recombinant hepatitis b envelope protein or surface antigen (hbsag). measurement of the antigen responsible for immunogenicity of the vaccine is central to quality assessment. the problems associated with the use of a polyclonal antibody in an assay, with regard to its poorly defined nature and batch-to-batch variation, have been mitigated by the use of mabs as described in this paper. the initial capture of hbsag by the mab could orientate it such that the same antibody could bind to it as a detection antibody after labeling without steric hindrance. the development of an immuno-capture elisa (ic-elisa) to measure the hbsag content using a monoclonal antibody (mab) specific to determinant "a" of hbsag in the experimental vaccine formulations is discussed. murine mabs developed against hbsag, subtype adw2, were found to cross-react with the other subtypes, viz. ad and ay, too. the mabs have been characterized, following which one mab, hbs06, was chosen for developing an ic-elisa format for the quantification of the hbsag in the final algel adsorbed vaccines. the unadsorbed hbsag was used to establish the standard curve of hbsag/a. the elisa had a sensitivity of 10 ng/ml of hbsag. the recovery rate of hbsag/a was found to be around 70% in the vaccines treated to desorb the antigen from algel. twenty-seven experimental batches of monovalent hepatitis b vaccines were analyzed for the hbsag content, both by ic-elisa and a commercial kit (axsym kit, abbott laboratories, usa).
the statistical analysis of ic-elisa results indicated that an experimental equation f(x) = 0.0062(x) + 0.184, could precisely estimate the amount of hbsag in the adsorbed vaccines. the amounts of hbsag recovered from the adsorbed vaccines as estimated by the ic-elisa format had a good correlation with the estimates derived from a commercial kit, which is being used by several vaccine manufacturers in india for the quality control of vaccine antigen. the varying amounts of vaccine antigens that could be recovered seemed to depend on the quality of the hbsag and the methods of hbsag adsorption to the alum gel during vaccine manufacture. epidemiology of the spread of h1n1 virus. children of school-going age have become victims of this deadly virus, as evident from the reporting data generated in the past few weeks. the mortality rate has also increased slightly. the disease spreads in a wave pattern and presently the world is passing through the second wave of the pandemic, with more severity in young and otherwise healthy people and with a predilection for the lungs leading to viral pneumonia and respiratory failure. the pandemic has now gained hold in the developing world, affecting it more severely as millions of people live under deprived conditions with multiple health problems and little access to basic health care. current data about the pandemic from developed countries need to be very closely watched in relation to shift in virus subtype, shift of the highest death rate to younger populations, successive pandemic waves, higher transmissibility than seasonal influenza, and demographic differences etc. presently the world appears to be better prepared. vaccines are available on the market in many countries. vaccine trials are also actively under way in the indian population. effective antivirals are available.
although h1n1 diagnostic centers have until now worked with cdc/who-recommended h1n1-specific primers and probes with taqman chemistry by real-time pcr, efforts to develop indigenous diagnostics, vaccines and chemoprophylaxis are under way to better combat this highly infectious virus. were positive for rotavirus infection by either page or elisa methods. the available data highlight the importance of rotavirus as a cause of diarrhea in children, which is severe enough to deserve specialized care. the observed proportion of 25.5% of all diarrhea cases being associated with rotavirus falls within the range of values reported by other workers. the reported positivity varies from 10.5 to 70.7%. in our study a complete concordance of elisa and page results was observed in 194 (97%) of the 200 tested specimens. this finding closely correlates with the findings of other authors who found 96.7-97.14% concordance between elisa and page methods. some authors found the rna-page method to be as sensitive and rapid as elisa for detecting rotavirus in stool samples from cases of diarrhea, while others proposed that elisa is more sensitive than the page method, which was found to be 100% specific. the remaining 6 (3%) samples showed conflicting results. in a lone sample, the od value of the elisa test was 0.195; as this value was almost at the cutoff level, the positivity of this sample by elisa is doubtful. the negative result for the same sample by the page method is difficult to explain; possible causes include the presence of many empty virus particles, a low concentration of viral rna in the fecal specimen, or insufficient extraction of viral rna. on the other hand, 5 of the samples which gave positive results by the page method were negative by the elisa test. these 5 samples had a typical 4-2-3-2 rna pattern.
the reason for their being elisa negative thus remains unexplained; however, blocking factors or the presence of inhibitory substances in stools might have been responsible. samples containing predominantly complete particles can also give false negative results, since the group antigen is not exposed. earlier studies have also reported page to be the most sensitive technique, although some are of the view that it is a laborious procedure. however, the page system used in this study was very simple to perform and the results were available on the same day. the main requirement was trained personnel and proper standardization of the technique. most reports state that the greatest advantages of the page and silver stain method are its lack of ambiguity and the fact that it provides information about viral electropherotypes. the modified page system was thus found to be reliable, simple and rapid; no expensive reagents were required. locally available reagents from hi media were used. the cost of the chemicals for page per specimen was rs. 24 approximately as compared to rs. 110 per test by confirmatory elisa. a locally produced slab gel electrophoresis system with power pack was the only equipment required. this method could be used for the routine diagnosis of rotavirus infection in the laboratory. vaccine, rapid diagnosis plays an important role in early management of patients. in this study a qc-rt-pcr assay was developed to quantify chikungunya virus rna by targeting the conserved region of the e1 gene. a competitor molecule containing an internal insertion was generated, which provided a stringent control of the quantification process. the introduction of 10-fold serially diluted competitor in each reaction was further used to determine sensitivity. the applicability of this assay for quantification of chikungunya virus rna was evaluated with human clinical samples and the results were compared with real-time quantitative rt-pcr.
the sensitivity of this assay was estimated to be 100 rna copies per reaction with a dynamic detection range of 10^2 to 10^10 copies. specificity was confirmed using closely related alphaviruses and flaviviruses. the comparison of qc-rt-pcr results with real-time rt-pcr revealed 100% concordance. these findings demonstrated that the reported assay is a convenient, sensitive and accurate method and is potentially useful for clinical diagnosis due to simultaneous detection and quantification of chikungunya virus in acute-phase serum samples. in india, measles vaccine was introduced as part of the expanded programme of immunization in 1985. measles, mumps and rubella (mmr) vaccine is still not part of the national immunization schedule of india. the indian association of paediatrics (iap) recommends measles vaccine at 9 months of age and mmr vaccine at 15-18 months. however, in a recent policy update, the iap committee on immunisation opined that there is a need for a second dose of mmr vaccine for providing adequate immunity against mmr. the aim of the present study was to assess the extent of sero-protection against mmr at 4-6 years of age in children who have received one dose of mmr between 12 and 24 months of age. an attempt has also been made to assess the sero-response to the second dose of mmr vaccine in 4-6 years old children. a total of 106 consecutive children between the ages of 4-6 years who had received mmr vaccine between 12 and 24 months of age and were attending the immunization clinic of gtb hospital, delhi were enrolled. the vaccination status, anthropometry and physical examination findings were recorded. three ml of venous sample was again withdrawn for estimation of post vaccination antibody titre. it was observed that 20.39%, 87.38% and 75.73% of children were seroprotected against measles, mumps and rubella, respectively, 2.5-4.5 years after receiving the first dose of mmr vaccine.
seroprotection rose to 72.62%, 100% and 100% for measles, mumps and rubella, respectively, 4-6 weeks after receiving the second dose of mmr vaccine. geometric mean concentration of antibody also rose significantly in all three diseases. in view of the low seroprevalence of mmr, and hence high susceptibility to infection, in 4-6 year old children who have already received mmr vaccine, there is a need to boost the immune responses against these three diseases by giving a second dose of mmr vaccine. baseline information on the epidemiology of viral agents causing stis and the types of risk behaviour of affected persons are essential for any meaningful targeted intervention. the present study documents the pattern of viral stis in patients attending a tertiary care hospital, correlating the syndromic approach and the laboratory investigations to determine the aetiology. three hundred consecutive patients attending the sti clinic were diagnosed and categorized according to the syndromic approach of the who along with detailed history and demographic data. the majority of the patients were men (53.12%) with a mean age of 24 years. men received education up to middle school. half of the female subjects were illiterate. sixty percent of the patients were married and among these, 19% were regular condom users. first sexual contact at or before 18 years of age was more common in men (31% vs. 22.22% in women). promiscuity was more common among male patients who had contact with csw. genital herpes was the commonest viral sti (86/300) followed by genital wart (60/300). concomitant infection with more than one virus was seen in 35% of patients. hiv was prevalent in 10.3% of sti patients. hepatitis b, hepatitis c, herpes simplex type 1 and molluscum contagiosum were the other viral agents seen in sti clinic attendees at our centre.
this disease is currently prevalent in more than 100 countries worldwide. annually, 50-100 million people are infected with dengue virus, of which 2.5-5 lakh cases are dengue hemorrhagic fever (dhf) and dengue shock syndrome (dss), the serious forms of dengue virus infection; these may cause 25,000 deaths annually worldwide, and approximately 3 million children have been hospitalized over the past 3 decades. the disease is characterized by sudden onset of high fever with severe headache, pain in the back and limbs, lymphadenopathy, maculo-papular rash over the skin and retro-bulbar pain. early diagnosis can be established by simple and rapid detection of igg/igm antibodies in the blood samples of patients using the bi-directional immunoassay system, supporting management and control to reduce morbidity and mortality. details will be presented. myocarditis and dilated cardiomyopathy (dcm) are common causes of morbidity and mortality both in children and adults. the most common viruses involved in myocarditis are coxsackievirus b or adenovirus. recently, the coxsackievirus and adenovirus receptor (car), a common receptor for coxsackieviruses b3, b4 and adenoviruses 2, 5, has been identified. increased expression of car has been reported in patients with dcm, suggesting utilization of car by these viruses for cell entry. the present study was designed to study the expression of car in myocardial tissue of patients with dcm. formalin fixed myocardial tissues were obtained from autopsy cases. a total of 26 cases of dcm and 20 controls, which included non-cardiac disease (group-a) and cardiac disease other than dcm (group-b), were included in the study. expression of car was studied by immunohistochemical staining of myocardial tissue using car specific rabbit polyclonal antibody and biotin conjugated secondary antibody. the tissue sections were considered positive when >25% of the cells showed brown color staining by immunohistochemistry (ihc). 
the car positivity in dcm cases was found to be 96% (25/ 26) as compared to 30% in control group a and 40% in control group b respectively. the car positivity was significantly higher in the test group as compared to both the control groups. further car positivity in all the cellular types (myocytes, endothelial cells and interstitial cells) was found significantly higher in test group as compared to both the control groups. the expression of car was significantly higher in myocytes as compared to both endothelial and interstitial cells in all the groups. however, no significant difference was observed in car positivity between endothelial and interstitial cells. the present study highlights the increased expression of car in dcm cases with further significance of car expression in myocytes and endothelial cells. this may help further in understanding the tropism of viruses or cellular susceptibility, which in turn will help in appropriate diagnostic and therapeutic approach in management of viral myocarditis and dcm cases. food security and safety vary widely around the world, and reaching these goals is one of the major challenges, raising public concern for the wellbeing of mankind, in particular. industrialized production and processing as well as improper environmental protection have clearly shown severe limitations such as worldwide contamination of the food chain and water. contaminated water and food during the processes of production, processing and handling are essentially responsible for food and water borne viral infections/diseases. the cases of viral food borne outbreaks are on the rise, creating a threat to human health. recent researches indicate that epidemiological studies are meager to focus the frequently contaminated foods and food borne viral diseases. 
the current paper presents the etiology of select food borne viral diseases, probable reasons for the non-availability of appropriate methods to detect the viruses responsible for the diseases, routes of water and food borne transmission of enteric viral infections, currently available methods of detection of select viruses and bio safety measures to prevent food borne viral infections. dietary/nutritional management in food borne viral diseases is crucial to control weakness and gastro enteric intolerance due to disease condition and antibiotic therapy. it will principally improve food intake, resulting in better nutritional status leading to optimum immune response. food borne viruses mainly belong to the rotaviruses, enteropathogenic viruses, astroviruses, adenoviruses and caliciviruses, and cause acute gastroenteritis (ag), which is an important health problem. the frequency of rotavirus as a cause of sporadic cases of ag ranges between 17.3% and 37.4%. astroviruses cause ag with a frequency ranging between 2% and 26%; outbreaks have been described in schools and kindergartens, but also in adults and the elderly. the frequency of identification of adenoviruses 40 and 41 as causes of sporadic ag in non-immunosuppressed children ranges between 0.7% and 31.5%, although there is probably underreporting because the sensitivity of conventional techniques is low. caliciviruses are separated phylogenetically into two genera: norovirus and sapovirus. norovirus is frequently associated with food- and water-borne outbreaks of ag. it is estimated that 40% of cases of ag due to norovirus are food borne. in sweden and some regions of the united states, norovirus is the leading cause of outbreaks of food borne diseases. sapovirus outbreaks due to person-to-person and food borne transmission affecting both children and adults have recently been reported in countries such as canada and japan. 
it has been predicted that the importance of diarrhoeal disease, mainly due to contaminated food and water, as a cause of death will decline worldwide. evidence for such a downward trend is limited. this prediction presumes that improvements in the production and retail of microbiologically safe food will be sustained in the developed world and, moreover, will be rolled out to those countries of the developing world increasingly producing food for a global market. sustaining food safety standards will depend on constant vigilance maintained by monitoring and surveillance but, with the rising importance of other food-related issues, such as food security, obesity and climate change, competition for resources in the future to enable this may be fierce. in addition the pathogen populations relevant to food safety are not static. food is an excellent vehicle by which many pathogens (bacteria, viruses/prions and parasites) can reach an appropriate colonization site in a new host. although food production practices change, the well-recognized food-borne pathogens, such as salmonella spp. and escherichia coli, seem able to evolve to exploit novel opportunities, for example fresh produce and even generate new public health challenges, for example antimicrobial resistance. in addition, previously unknown food-borne pathogens, many of which are zoonotic, are constantly emerging. awareness and surveillance of viral food-borne pathogens is generally poor but emphasis is placed on norovirus, hepatitis a, rotaviruses and newly emerging viruses such as sars. it is clear that one overall challenge is the generation and maintenance of constructive dialogue and collaboration between public health, veterinary and food safety experts, bringing together multidisciplinary skills and multi-pathogen expertise. such collaboration is essential to monitor changing trends in the well-recognized diseases and detect emerging pathogens. 
it is also necessary to understand the multiple interactions between these pathogens and their environments during transmission along the food chain in order to develop effective prevention and control strategies. to analyse the effectiveness of these sirnas targeting the rabies virus l gene, bhk-21 cells expressing sirnas in shrna form were produced by transduction of cells with radv-l. the transduced bhk-21 cells expressing sirna were infected with rabies virus pv-11 strain. there was reduction in rabies virus multiplication as analysed by reduction in fluorescent foci forming unit (ffu) count by 51.85% (70 ffu in bhk-21 cells expressing sirna-l compared to 135 ffu in bhk-21 cells expressing negative sirna). the expression of l gene mrna was reduced 16.11-fold in rabies virus infected radv-l transduced cells compared to radv-neg transduced cells (negative control) as detected using real-time pcr. after analyzing the effectiveness of radv-l in vitro, its effectiveness was also evaluated in vivo in mice after virulent rabies challenge. the mice were inoculated with 10^7 plaque forming units (pfu) of radv-l in the masseter muscle (i/m route) and challenged with 15 ld50 of rabies virus challenge virus standard (cvs) strain. the results indicated 50% protection with improved median survival from 7 to 11 days compared with the group of mice treated with radv-neg. the results of this study indicated that sirnas targeting the rabies virus polymerase (l) gene delivered through an adenoviral vector inhibited rabies virus multiplication in vitro and in vivo. and 4 were successfully produced and purified from the infected spodoptera frugiperda (sf-9) cells using these recombinant baculoviruses. the morphology of the vlps was validated by electron microscopy in comparison to the authentic bt virions. the vlps produced here were stable and were highly immunogenic, with an intact outer layer which is rapidly lost during normal infection of btv. 
these btv-vlps elicited long lasting protective immunity in vaccinated sheep against virulent virus challenge. with the use of btv-vlps it was also possible to differentiate the infected and vaccinated animals (diva). vlp-based btv vaccine has potential advantages with regard to controlling the spread of btv with multiple serotypes. it is possible to produce milligram quantities of correctly folded and processed protein complexes using this baculovirus expression system and hence it is a more promising system for producing new generation vaccines, like vlp subunit vaccines, against any viral disease on a large scale. peste des petits ruminants (ppr), goatpox and orf are oie notifiable diseases of small ruminants, especially goats and sheep. these diseases are economically important in enzootic countries like india, where they cause significant losses and are major constraints on productivity. considering the geographical distribution of ppr, goat pox and orf infections and the prevalence of mixed infection, in the present study the safety and potency of an experimental triple vaccine comprising attenuated strains of thermostable ppr virus (pprv jhansi, p-50) grown at 40°c, high passaged goat poxvirus (gtpv uttarkashi, p100) and attenuated orf virus (orfv mukteswar, p51) were evaluated in sub-himalayan local hill goats. goats simultaneously immunized with 1 ml of vaccine consisting of either 10^3 tcid50 or 10^5 tcid50 of each of pprv, gtpv and orfv were monitored for clinical and serological responses for a period of 3-4 weeks post-immunization (pi) and post challenge (pc). specific immune responses, i.e. antibodies directed to pprv, gtpv and orfv, could be demonstrated by ppr competitive elisa kit, capripox indirect elisa and snt, respectively, following immunization. all the immunized animals resisted infection when challenged with virulent strains of either gtpv or pprv or orfv on day 28 pi, while in-contact control animals developed characteristic signs of the respective disease. 
further, ppr viral antigen could be detected by using ppr sandwich elisa kit in the excretions (nasal, ocular and oral swab materials) of unvaccinated control animals after challenge but not from any of the immunized goats. the triple vaccine was found safe at a dose as high as 10^5 tcid50 and induced a protective immune response even at a lower dose (10^2 tcid50) in goats, which was evident from sero-conversion as well as challenge studies. the study indicated that these viruses are compatible and did not interfere with each other in eliciting immune response, supporting the feasibility of using this triple vaccine to combat these infections simultaneously. toll-like receptors (tlrs), primary sensors of molecules of microbial origin, play a crucial role in innate immunity. to date 13 mammalian tlrs have been identified, while there is no information available on tlrs of yak. this study is part of a world bank funded icar project. yak, named bos grunniens for its distinctive vocalization and relationship with cattle, is a natural inhabitant of extremely cold environments. when these animals come down to lower altitude grazing land adjacent to villages, they become susceptible to the diseases of cattle, buffalo etc. thus, the present study was undertaken for genetic characterization and evolutionary lineage analysis of yak tlrs. we worked on the tlr7 gene, which plays an important role in recognition of ssrna viruses. total rna was extracted from mitogen stimulated pbmcs of yak. the rt-pcr conditions were standardized for full length amplification of the tlr7 gene using specific self designed primers. the expected amplicon of 3559 bp was obtained. it was cloned in pgemt-easy vector followed by transformation in e. coli top10 strain. the recombinant clones were screened and picked up for plasmid isolation, and release of tlr7 was confirmed by restriction digestion. the cloned tlr7 product was sequenced and analyzed for the nucleotide and deduced amino acid sequences, and 3d structure analysis. 
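as an illustration of how a cloned sequence can be compared with sequences from other species, the following minimal sketch computes percent identity between two pre-aligned sequences. the function name, the gap-handling rule (columns with a gap in either sequence are skipped) and the toy sequences are assumptions made for illustration; the abstract does not state which software or identity definition was used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns.

    Assumes the two sequences are already aligned to equal length,
    with '-' marking gaps. Columns where either sequence has a gap
    are excluded from the comparison (one common convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return 100.0 * matches / compared


# Toy example with hypothetical 8-base fragments (not real tlr7 sequence).
print(percent_identity("ATGGCTTA", "ATGGCTCA"))
```

other conventions count gapped columns as mismatches, which lowers the reported identity, so the definition should be stated alongside any percentage.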
the results revealed that yak shows more than 98% sequence homology with other bos indicus breeds and bos taurus breeds. however, identity was less than 88% with other animal species (equine, murine, feline, canine etc.). the evolutionary lineage findings cluster yak more closely with bovine species. point mutations revealed changes at 25 nucleotide positions with corresponding amino acid change at 15 positions. smart analysis of yak protein domain architecture revealed toll-interleukin i receptor (tir), leucine rich repeats (lrr) and signal peptide region. the variations in yak mainly lie in the lrr region. homology modeling revealed horse shoe shaped structure with 5 alpha helix. the additional alpha helix present in bos indicus was not detected in yak. the present study shows existence of genetic variability in tlr7 gene of yak, in particular the lrr region, which plays an important role in the pathogen recognition and the evolutionary lineage analyses shows its closeness with other bovine species. a.p. aquaculture and fisheries, tirupati in this new millennium, aquatic animal health management strategies in asia expanded and adjusted to the current disease problems faced by the aquaculture sector. this presentation will briefly discuss some of the most serious trans-boundary pathogens affecting asian aquaculture including a newly emerging disease and highlight recent regional and national efforts on responsible health management for mitigating the risks associated with aquatic animal movement. a regional approach is fundamental since many countries share common social, economic, industrial, environmental, biological and geographical characteristics. 
capacity and awareness building on aquatic animal epidemiology, science-based risk analysis for aquatic animal transfers, surveillance and disease reporting, disease zoning and establishment of aquatic animal health information systems to support development of national disease control programs and emergency response to disease outbreaks are needed. molecular diagnostics with emphasis towards standardization and harmonization, inter-calibration exercises and quality assurance in laboratories, accreditation program and utilization of regional resource centres on aquatic animal health will also be needed. whilst most of these strategies are directed in support of government policies, implementation will require pro-active involvement, effective cooperation and strategic networking between governments, farmers, researchers, scientists, development and aid agencies, and relevant private sector stakeholders at all levels. their contributions are essential to the health management process. generally, aquaculture plays an important role in economy as harvests from natural waters have declined or, at best, remained static in most countries. fish and shrimp, the main aquaculture product sources, have gained the most attention. many factors can cause losses in yields of fish products and infectious disease in fish and shrimp is the biggest threat to the fishery industry. shrimp and fish aquaculture has grown rapidly over several decades to become a major global industry that serves the increasing consumer demand for seafood and has contributed significantly to socio-economic development in many poor coastal communities. however, the ecological disturbances and changes in patterns of trade associated with the development of shrimp and fish farming have presented many of the pre-conditions for the emergence and spread of disease. 
shrimp and fish are displaced from their natural environments, provided artificial or alternative feeds, stocked in high density, exposed to stress through changes in water quality and are transported nationally and internationally, either live or as frozen product. these practices have provided opportunities for increased pathogenicity of existing infections, exposure to new pathogens, and the rapid transmission and trans boundary spread of disease. not surprisingly, a succession of new viral diseases has devastated the production and livelihoods of farmers and their sustaining communities. this review examines the major viral pathogens of farmed shrimp and fish, the likely reasons for their emergence and spread, and the consequences for the structure and operation of the shrimp farming industry. in addition, this review discusses the health management strategies that have been introduced to combat the major pathogens and the reasons that disease continues to have an impact, particularly on poor, smallholder farmers in asia. btv isolates from the same geographic region have been termed as 'topotypes' and initial observation on segment 3 nucleotide sequences identified a correlation between topotypes and genetic information. later topotyping was proposed based on segment 10, on the premise that the encoding protein ns3, which is involved in virus egress from insect cells, would lead to evolutionary fitness in parallel with the geographic distribution of the different culicoides species. further studies attempted to extend this to nucleotide sequence homology in segments 7 and 10, but failed to identify clear cut correlations or any evidence for positive selection. for example, south african isolates were found not to cluster into separate african lineage. in this study, we carried out a more extensive analysis of segment 10 sequences. our analysis showed no segregation of isolates into topographically distinct groups. 
instead we observed topological clustering of the clades, and we attribute this to a genetic bottleneck resulting in genetic drift and founder effect, leading to a homogeneous gene pool in a geographical area. we hypothesize that when a new virus enters a geographical area where local btv strains are already circulating, the new genes/segments would enter into a bigger gene pool. consequently, the newer incursions into a heavily endemic area tend to get diluted and disappear from the population because the rate of drift is inversely proportional to the population size, unless they are positively selected. use of live attenuated vaccine in israel, europe, south africa and usa also led to a more homogeneous population similar to the vaccine strains due to continuous infusion of the vaccine type genes into the gene pool. we conclude that restriction of specific strains to certain geographical areas could generate uniquely imprinted genotypes which would not only indicate origin but also predict movement of viral strains to new areas. vvo-10 viral diseases of zoonotic importance: indian context k. prabhudas pd-admas, ivri, campus, hebbal, bangalore 24 zoonoses are generally defined as animal diseases that are transmissible to humans. they continue to represent an important health hazard in most parts of the world, where they cause considerable expenditure and losses for the health and agricultural sectors. the emergence of these zoonotic diseases is very distinct, hence their prevention and control will require unique strategies, apart from traditional approaches. such strategies require rebuilding a cadre of trained professionals of several medical and biologic sciences. the article discusses virus infections that have significant zoonotic implications for india. buffalopox is a contagious viral disease affecting milch buffaloes and, rarely, cows, with a morbidity rate up to 80% in the affected herd. 
although the disease is not responsible for high mortality, it adversely affects the productivity of the animals, resulting in large economic losses. furthermore, the disease has zoonotic implications, as outbreaks are frequently associated with human infections, particularly in the milkers. the causative agent, buffalopox virus (bpxv), is closely related to vaccinia virus. the outbreaks of febrile rash illness among humans and buffaloes were investigated in the villages of districts solapur and kolhapur of western maharashtra. clinico-epidemiological investigations of humans and buffaloes were carried out and representative clinical samples were collected from both. the samples included vesicular fluid, scab, and blood. laboratory investigations for buffalopox virus (bpxv) were done by pcr on blood samples, scabs and vesicular fluid. in vitro virus isolation attempts were carried out by using vero e-6 cells. negative staining electron microscopy was also employed for detection of virus particles. a total of 166 human cases with pox lesions on hand and other body parts from village kasegaon, district solapur and 185 cases from 20 different villages of kolhapur district were reported. besides pox lesions, patients had fever, malaise, pain at the site of lesion and axillary and inguinal lymphadenopathy. in kasegaon village, the attack rate in human cases was 6.6% and in buffaloes 41.9% (231/551), whereas in the kolhapur area the attack rate in buffaloes was 11.75% (2633/22398). bpxv was confirmed in blood, vesicular fluid and scab specimens from human cases and in scab specimens from buffalo by the polymerase chain reaction (pcr) method. bpxv was also isolated from 3 different clinical specimens and further identified by pcr and electron microscopy. 
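the buffalo attack rates quoted above follow directly from the reported case counts and populations at risk. the short sketch below reproduces that arithmetic; the function and variable names are illustrative, and only the counts given in the abstract are used (the human population at risk is not reported, so the human rate is not recomputed).

```python
def attack_rate(cases: int, population_at_risk: int) -> float:
    """Attack rate as a percentage: cases / population at risk * 100."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= cases <= population_at_risk:
        raise ValueError("cases must lie between 0 and the population at risk")
    return 100.0 * cases / population_at_risk


# Counts as reported in the abstract for buffaloes.
buffalo_kasegaon = attack_rate(231, 551)      # reported as 41.9%
buffalo_kolhapur = attack_rate(2633, 22398)   # reported as 11.75%

print(f"kasegaon buffaloes: {buffalo_kasegaon:.2f}%")
print(f"kolhapur buffaloes: {buffalo_kolhapur:.2f}%")
```

the computed values agree with the published percentages to within rounding.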
clinical manifestation of the disease in buffaloes from solapur district was as reported earlier, with common pox lesions on teats and udders, whereas the buffaloes from kolhapur district had lesions on hairless parts of the ears and on the eyelids with purulent discharge. bpxv from human and buffalo cases showed similarity. vaccines have been made against several diseases and used for controlling the afflictions. however, a few of them were not effective in successfully controlling the disease. the reasons for the failure are many, the major ones being that either the pathogen is not completely cleared from the vaccinated animal or it reemerges after changing its antigenic structure, thus making the vaccination programme less effective. in addition, with the emergence of newer diseases such as hiv, the development of suitable vaccines has become a challenging task. this is especially true in the case of viral diseases. these challenges have warned researchers that "protection by vaccination is not that simple and straightforward an approach", and a lot needs to be understood in terms of host virus interaction and the role of the environment in perpetuating the disease. so the immediate step that was considered was environmental safety by way of using non-infectious materials as vaccines. with the understanding that has been developed in molecular immunology and molecular biology, and with the availability of molecular tools developed through recombinant dna technology, the field of vaccinology has changed dramatically to emerge as modern vaccinology. this presentation deals with the modern approaches that are being used to produce effective vaccines in the case of foot and mouth disease of cloven footed animals. a similar approach may be worked out for other viral diseases also. 
despite the availability of an inactivated vaccine that is noted to provide solid immunity against the disease over a short period of time, the search is on for an ideal vaccine, the criteria for which are: safety for the environment, ease of preparation, no requirement for a cold chain for storage, longer lasting immunity, economic viability and the ability to clear the virus in case of persistent infection. the advent of recombinant dna technology, together with the information available on the molecular biology of viruses, has enabled the design of newer vaccines that can induce strong cellular and humoral responses. the underlying principle in the present vaccine development strategy the world over is that the virus antigen gene has to be expressed in the tissue and the vaccine backbone has to trigger the immune system to elicit the desired immune response. the bangalore campus of ivri has been vigorously pursuing research to develop ideal vaccines for foot and mouth disease, keeping the above principle in mind to achieve the previously mentioned criteria. the approaches selected aim to ensure that the virus antigen/s replicate transiently in the host. the self replicating vaccines that have been developed are pox virus vectored vaccines, alpha virus replicase based vaccines and fmdv vectored vaccines. the approach and the results obtained so far will be discussed. the silkworm, bombyx mori, is affected by various diseases caused by viruses, viz. nuclear polyhedrosis (bmnpv), densonucleosis (bmdnv) and infectious flacherie (bmifv). silkworm viral diseases form major constraints for silk cocoon production in all the sericultural countries. the losses due to silkworm diseases are estimated at about 20-40%, and among them viral diseases are the most common. in sericulture, prophylactic measures play a vital role in the management of silkworm diseases. 
these include disinfection of the silkworm rearing house and appliances, rearing area, rearing surroundings, silkworm egg and body, and rearing bed, together with maintenance of general hygiene and personnel hygiene. all these activities are generally carried out as rituals by using general disinfectants, often with partial success. recent trends in complete management of silkworm diseases include development of silkworm hybrids evolved from disease resistant/tolerant breeds, effective eco- and user-friendly disinfectants, anti-microbial feed-supplements and use of transgenic silkworms. a biotechnological breakthrough in this regard is the rna interference (rnai) approach involving dsrna mediated nuclear polyhedrosis management, which is presently pursued by apssrdi, hindupur in collaboration with the centre for dna fingerprinting and diagnostics (cdfd), hyderabad. nadu and karnataka. the disease appears to be more severe in rural flocks than on organized farms. our investigations revealed morbidity, mortality and case fatality rates of 9.34%, 2.69% and 28.84% among rural flocks and 6.22%, 0.47% and 7.63% among organised farms, respectively. higher morbidity and mortality in rural areas may be due to stress factors like poor nutrition, parasitic burden, fatigue due to long walks and non-availability of veterinary aid. kulkarni et al. (1992) also reported severe bt outbreaks in rural areas of maharashtra with overall morbidity, mortality and case fatality of 32%, 8% and 25% respectively. all the south indian sheep breeds were found to be susceptible and the clinical form of the disease is evident in all of them, though saravanabava (1992) reported variations in susceptibility among the indigenous sheep. trichy black and ramnad white sheep were found to be more susceptible than the vambur and mecheri sheep of tamil nadu. prevalence of bluetongue in sheep, goat and cattle appears to be high in the region. 
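the morbidity, mortality and case fatality rates reported above for rural and organised farms are linked: when morbidity and mortality share the same population denominator, case fatality (deaths among affected animals) equals mortality divided by morbidity. the minimal sketch below checks the published figures on that assumption; the function name is illustrative, and small differences are expected because the published percentages are rounded.

```python
def case_fatality_from_rates(morbidity_pct: float, mortality_pct: float) -> float:
    """Case fatality (%) derived from morbidity (%) and mortality (%).

    Assumes both rates were computed over the same population at risk,
    so that case fatality = deaths / cases = mortality / morbidity.
    """
    if morbidity_pct <= 0:
        raise ValueError("morbidity must be positive")
    return 100.0 * mortality_pct / morbidity_pct


rural = case_fatality_from_rates(9.34, 2.69)       # reported: 28.84%
organised = case_fatality_from_rates(6.22, 0.47)   # reported: 7.63%
kulkarni = case_fatality_from_rates(32.0, 8.0)     # reported: 25%

for label, value in [("rural", rural), ("organised", organised), ("kulkarni et al.", kulkarni)]:
    print(f"{label}: {value:.2f}%")
```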
serological surveys conducted in andhra pradesh during 1991 revealed the prevalence of btv antibodies in sheep (47.5%), goats (43.56%), cattle (33%) and buffaloes (20%). similar high prevalence of btv antibodies in sheep and goats was also reported from the other states in the region. clinical disease has not been recorded in kerala though btv antibodies were recorded in sheep (13.76%) and goats (7.10%) (ravi sankar 2003). culicoides are the known biological vectors of btv. not all culicoides species are capable of transmitting btv. the occurrence of the disease is related to the presence of competent vectors in the area. jain et al. (1988) established the involvement of culicoides in transmitting btv by isolating the virus from culicoides in haryana, the north indian state. c. imicola and c. oxystoma were found to be prevalent in andhra pradesh and tamil nadu. narladakar et al. (1993) reported the presence of c. schultzei, c. perigrinus and c. octoni in the marathwada region of maharashtra. culicoid vectors are significantly affected by the climate, and annual variations in the climate are reflected in the outcome of the disease. the monsoon season (june to dec), with the temperature ranging from 21.2 to 35.6°c, appears to be a favourable period for the multiplication of culicoides. the maximum number of outbreaks was recorded during the north east monsoon period (oct-dec), followed by the south west monsoon period (june to sep) in the region. however, details on the distribution of the competent vectors, their feeding habits and their dynamics in the region are lacking. multiple btv serotypes were found to be circulating in the region (kulkarni and kulkarni 1984; janakiraman et al. 1991; mehrotra et al. 1996). a total of 10 serotypes, viz. 1-4, 8, 9, 15, 16, 18 and 23, were identified based on the virus isolations. sreenivasulu et al. (1999) isolated btv serotype 2 from an outbreak of bt in native sheep of andhra pradesh. 
btv serotypes 9, 15 and 21 were also isolated from the outbreaks that occurred in andhra pradesh. some of the isolates need to be serotyped. deshmukh and gujar (1999) isolated btv type 1 from maharashtra. following is the summary of the distribution of btv serotypes in this region. the clinical picture of bt in native sheep appears to be slightly different, the major difference being that swelling of lips and face was less conspicuous. mucocutaneous borders appeared to be very sensitive to touch and bled easily upon handling. the classical signs of cyanosis of the tongue and reddening of the coronary band are not common features of the disease in native sheep. the disease was also confirmed by virus isolation and identification. clinical disease has not been reported in cattle, buffaloes and goats in spite of high seroprevalence. in conclusion, bt is established in native sheep and causes severe economic losses to the farmers. the disease is concentrated in the southern peninsula of the country. the disease is seasonal and is associated with rainfall. multiple serotypes appear to be circulating in this region. the btv serotypes were virulent in nature, as evident from the severe outbreaks. s. janardana reddy*, d. c. reddy department of fishery science and aquaculture, sri venkateswara university, tirupati 517 502 in less than three decades, the penaeid shrimp culture industries of the world developed from their experimental beginnings into major industries providing hundreds of thousands of jobs, billions of u.s. dollars in revenue, and augmentation of the world's food supply with a high value crop. concomitant with the growth of the shrimp culture industry has been the recognition of the ever increasing importance of disease, especially that caused by infectious agents. in india viral diseases have become an important limiting factor for the growth of the shrimp aquaculture industry. 
although more than 30 different viral pathogens have been identified in different species of shrimp world wide, only a few viruses have identified which are causing disease problems in cultured tiger shrimps in india, east coast of andhra pradesh, in particular. diagnostic methods for these pathogens include the traditional methods of morphological pathology (direct light microscopy, histopathology, and transmission electron microscopy), enhancement and bioassay methods, traditional microbiology, and the application of serological methods. while tissue culture is considered to be a standard tool in medical and veterinary diagnostic labs, it has never been developed as a useable, routine diagnostic tool for shrimp pathogens. the need for rapid, sensitive diagnostic methods led to the application of modern biotechnology to penaeid shrimp disease. the industry now has modern diagnostic genomic probes with nonradioactive labels for viral pathogens like infectious hypodermal and hematopoietic necrosis (ihhnv), hepatopancreatic virus (hpv), taura syndrome virus (tsv), white spot syndrome virus (wssv), monodon baculo virus (mbv), and bp. highly sensitive detection methods for some pathogens that employ dna amplification methods based on the polymerase chain reaction (pcr) now exist, and more pcr methods are being developed for additional agents. these advanced molecular methods promise to provide badly needed diagnostic and research tools to an industry reeling from catastrophic epizootics and which must become poised to go on with the next phase of its development as an industry that must be better able to understand and manage disease. within this field, shrimp immunology is a key element in establishing strategies for the control of diseases in shrimp aquaculture. research needs to be directed towards the development of assays to evaluate and monitor the immune state of shrimp. 
the establishment of regular immune checkups will permit the detection of shrimp immunodeficiencies and will also help to monitor and improve environmental quality. for this, immune effectors must first be identified and characterised. in the end, however, the assumption may be made that the sustainability of aquaculture will depend on the selection of disease-resistant shrimp, i.e. on developing research in immunology and genetics at the same time. the development of strategies for prophylaxis and control of shrimp diseases could be aided by the establishment of a collaborative network to contribute to progress in basic knowledge of penaeid immunity. however, to improve efficiency, it appears essential also to open this network to complementary research areas related to shrimp pathology, physiology, genetics and environment.

bluetongue is an important viral disease of sheep causing severe economic losses to the farmers. the lack of an effective vaccine is the major impediment in controlling the disease. multiple serotypes were found to be circulating in the state. attempts are being made to develop a vaccine employing the available serotypes to control the disease. hence, it is essential to determine the antigenic relationship among the serotypes in order to identify the candidate vaccine strains to be incorporated in the vaccine. a reciprocal cross neutralization test was employed to find the r% values between btv-2, -9 and -15, which indicate the extent of antigenic relationship between the serotypes. the r% value between btv-2 and btv-9 was recorded as 2.8. r% values of 3.53 and 2.8 were observed between btv-2 and -15 and between btv-9 and -15, respectively. the r% values recorded in the present study revealed a weak antigenic relationship between the btv serotypes. the extent of antigenic relationship between the btv serotypes was also determined by multiple sequence alignment of the nucleotide and amino acid sequences of the reference btv serotypes 2, 9 and 15.
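the abstract does not state how the r% values were calculated. a common convention for reciprocal cross neutralization data is the archetti and horsfall relationship, r = sqrt(r1 x r2), where each r is the ratio of a heterologous to a homologous neutralization titre and r% = 100 x r. the sketch below assumes that convention; the function name and the titres in the usage line are hypothetical and are not data from the study.

```python
import math


def r_percent(homologous_a, heterologous_a, homologous_b, heterologous_b):
    """Antigenic relationship (r%) between two viruses A and B.

    homologous_a   : titre of antiserum A against virus A
    heterologous_a : titre of antiserum A against virus B
    homologous_b   : titre of antiserum B against virus B
    heterologous_b : titre of antiserum B against virus A

    Assumes the Archetti and Horsfall convention r = sqrt(r1 * r2);
    the source abstract does not confirm that this formula was used.
    """
    r1 = heterologous_a / homologous_a
    r2 = heterologous_b / homologous_b
    return 100.0 * math.sqrt(r1 * r2)


# Hypothetical reciprocal titres, chosen only to illustrate the arithmetic.
example = r_percent(1024, 32, 512, 16)
print(f"r% = {example:.3f}")
```

under this convention, two antigenically identical viruses give r% = 100, and values of only a few percent, like those reported here, correspond to a weak relationship.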
the sequence analysis of the vp2 gene revealed a homology of 47-53% and 29-41% at the nucleotide and amino acid levels respectively. r% values obtained using reciprocal cross neutralization test with the btv-2, 9 and 15 serotypes isolated in native sheep of andhra pradesh and the genomic analysis of the reference serotypes of btv-2, 9 and 15 revealed very weak antigenic relationship and were highly divergent. diseases especially those by viral pathogens cause greater economic losses in most horticultural crop species throughout the world as compared to agricultural crops. non-genetic methods of management of these diseases include quarantine measures, eradication of infected plants and weed hosts, crop rotation, use of certified virus-free seed or planting stock and use of pesticides to control insect vector populations implicated in transmission of viruses. however, none of these measures is likely to provide an enduring solution against these diseases especially those caused by viruses due sometimes to the huge expenditure involved, but mostly to the questionable effectiveness and reliability of those methods. as key control pesticides are getting increasingly abandoned, development of alternative methods to control diseases has been a felt-need in the recent past. though breeding for disease resistance generally provides a reliable security in a long run, introgression of host plant resistance did not materialise in most important crops. non-availability of an appropriate source of resistance in inter-fertile relatives, linkage to undesirable traits, or often times polygenic nature of such sources of resistance are the stumbling blocks in breeding programs. the limitations of conventional breeding and routine cultural practices prompted the need for the development of other approaches of virus control that could be fully incorporated into traditional methods. 
in this perspective, the concept of pathogen-derived resistance offers an attractive strategy to evolve newer methods of virus management, by transforming crop plants with nucleotide sequences derived from the pathogen's genome. an increasing number of molecular characterisation of plant virus genomes and the stable transformation of a number of horticultural crop species have in fact opened an avenue for molecular breeding against virus pathogens. successful field-testing of genetically modified crop cultivars renders proof of their supremacy over existing cultivars. it also contributes to demonstrate their capability with regard to environmental safety with a view to winning over public concern and scepticism. in general, the eventual commercialisation transgenic lines expressing virus resistance will rely upon a host of factors including their field performance, genetic stability, public acceptance and the resolution of environmental concerns and patent related issues. as such, elaborate field trials and allied studies are now required to adapt genetically engineered horticultural crops expressing virus resistance for their implementation into practical agriculture. a few examples from current research at tnau, in india or elsewhere will be discussed in this presentation. virology unit, division of plant pathology, iari, new delhi 12 in recent times there has been greater emphasis on vegetatively propagated crops in india to help diversify the indian agriculture. fruit, flower, spice and plantation crops are important vegetatively propagated horticultural crops, which have become a driving force for economic development in several parts of india. however, most of the vegetatively propagated crops are threatened by biotic stress caused by plant pathogens in general and plant viruses in particular. plant viruses produce specific and non specific symptoms and in some cases no symptoms are produced. 
correct identification and diagnosis of viral diseases is first step in the management of any disease including viral diseases. there have been two major breakthroughs in virus diagnostics during last four decades. the first one was serological assay using monoclonal or polyclonal antibodies in enzyme linked immunosorbent assay (elisa) and the other one was the use of in vitro amplification of dna in polymerase chain reaction (pcr). a significant development in serological assays has been its simplification in form of user's friendly quick strip/dip stick method. the one-step lateral-flow (lf) tests have been developed for the on-site detection and identification of several plant viruses. rapid advancement in virus genome characterization has led to the development of novel approaches of nucleic acid based diagnostics which include conventional pcr, real time pcr, multiplex pcr, micro/macro arrays and biochips. pcr protocols already exist for many plant viruses of citrus, banana, apple, papaya, vegetables, ornamental and spice crops. a further advancement has led to development of realtime pcr assay which is relatively easy but requires training for diagnosticians. in real-time pcr assays, results can be available within 20 min. the nucleic acid template preparation in pcr has been simplified. membrane based dna template protocol and co-isolation of nucleic acid template preparation are novel approaches in pcr detection of virus and virus like pathogens. since many of the horticultural crops are often infected by more than one virus, their individual detection by pcr is not only expensive but also time consuming. therefore, multiplex pcr has been developed where in genome of more than one virus could be amplified and detected in the same reaction mixture. development of nucleic acid based chip is now one of the fastest and recent growing areas in the field of pathogen detection. 
these nucleic acid based chips have been named as dna/rna chips, biochips, genechips, biosensors or dna arrays. when it comes to applications of microarray technology for plant viruses, it is not too difficult to see the value of a method that could potentially detect a whole range of viruses using a single test. however, microarrays are unlikely to become the only method in use in a diagnostic laboratory. processing of germplasm including transgenic planting material imported for research purposes into the country. during the last two decades, a total of 49,923 samples of wheat including transgenics were imported from cimmyt (mexico), icarda (syria) and many other countries. these were grown in post-entry quarantine nursery each year at nbpgr, new delhi and the transgenic samples were grown in national containment facility of level-4 (cl-4) since its inception to ensure that no viable biological material/pollen/pathogen enters or leaves the facility during quarantine processing of transgenics. in addition, post-entry quarantine inspections of the transgenic wheat grown by indenters are also undertaken by nbpgr quarantine scientists. virus-induced gene silencing (vigs) is a technique in which viral genomes are used, usually after appropriate modifications, for transient gene silencing in plants. the mechanism behind vigs is the phenomenon called rna-interference (rnai), which is widespread in many organisms and is believed to be form of inherent defence system against intracellular pathogens, such as viruses and transposons. double-stranded rna or rna containing strong secondary structures, commonly produced during viral infections, are believed to cause triggering of rnai, which employs a battery of proteins and nucleoprotein complexes to identify and degrade specific viral transcripts. 
in vigs, viral genomes that do not cause severe symptoms but can accumulate and spread efficiently in the host plant are used as vectors, into which a host gene is cloned and introduced into the plant. upon replication, the viral vector triggers an rnai response in the host plant, which also targets the host gene, leading to its silencing and, subsequently, to a silenced phenotype that reveals gene function in vivo. vigs has been used extensively to study gene functions in dicot plants, such as tobacco, tomato, pea, soybean, etc., using vectors derived from

reference genes are commonly used as the endogenous normalisation measure for the relative quantification of target genes. the expression of seven potential reference genes was evaluated in tissues of 180 healthy, physiologically stressed and barley yellow dwarf virus (bydv) infected cereal plants. these genes were tested by rt-qpcr and ranked according to the stability of their expression using three different methods (two-way anova, genorm and normfinder tools). in most cases, the expression of the genes did not depend on the abiotic stress conditions or on the virus infections. all the genes showed significant differences in expression among plant species. glyceraldehyde-3-phosphate dehydrogenase (gapdh), beta-tubulin (tubb) and 18s ribosomal rna (18s rrna) always ranked as the three most stable genes. on the other hand, elongation factor-1 alpha (ef1a), eukaryotic initiation factor 4a (eif4a) and 28s ribosomal rna (28s rrna) for barley and oat samples, and beta-tubulin (tubb) for wheat samples, were consistently ranked as the least reliable controls. the bydv titre was determined in two oat varieties by rt-qpcr using three different quantification approaches.
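relative quantification against several reference genes is commonly done by dividing the target quantity by the geometric mean of the reference quantities. the following is a minimal sketch of that calculation, assuming ct values as input and a fixed amplification efficiency of 2 (perfect doubling per cycle). the function names and the ct values are hypothetical; the abstract does not give the authors' actual calculation or data.

```python
import math


def relative_quantity(ct, efficiency=2.0):
    """Convert a Ct value to a relative quantity (assumes fixed efficiency)."""
    return efficiency ** (-ct)


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalised_quantity(target_ct, reference_cts, efficiency=2.0):
    """Target quantity divided by the geometric mean of the reference quantities."""
    factor = geometric_mean(
        [relative_quantity(ct, efficiency) for ct in reference_cts]
    )
    return relative_quantity(target_ct, efficiency) / factor


# Hypothetical Ct values for one sample: a viral target and four reference genes.
reference_cts = [18.0, 20.0, 22.0, 20.0]
print(normalised_quantity(18.0, reference_cts))
```

the geometric mean is preferred over the arithmetic mean here because reference genes can differ in abundance by orders of magnitude, and the geometric mean keeps one highly expressed gene from dominating the normalisation factor.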
statistically, there were no significant differences between the absolute and the relative quantification, or between quantification using gapdh + tubb + tuba + 18s rrna and ef1a + eif4a + 28s rrna. the geometric average of gapdh, 18s rrna, tuba and tubb is suitable for normalisation of bydv quantification in barley and oat tissues. for wheat samples, a combination of gapdh, 18s rrna, tubb, eif4a and ef1a is recommended.

department of microbiology, yogi vemana university, vemanapuram, kadapa 516 003

large scale production and import of propagative material poses a potential risk of introducing several destructive pathogens, particularly viruses and mycoplasma-like organisms, into our country. this demands adequate quarantine safeguards, such as growing the material in an approved post-entry quarantine facility for a specific period so as to facilitate virus detection, thereby curtailing the risk. such facilities, when coupled with propagation by tissue culture, will ensure virus-free propagative plant material. the requirement for a nationwide network of post-entry quarantine facilities working in close collaboration with crop institutions is strongly emphasized when considering the import of high-risk plant genera for agricultural development. the present paper discusses virus diseases of quarantine importance affecting ornamental and fruit plants such as chrysanthemum, dahlia, dianthus, rosabengalensis, cattleya, cymbidium, dendrobium, lilium, citrus, vitis etc. the paper also discusses immunodiagnostic methods of detection and methods of obtaining virus-free propagative material.

rice tungro occurs as epidemics in regular cycles and has been reported in the last 50 years from all the major rice growing regions of india, being especially prevalent in the southern and eastern states. the development of durable resistant varieties is crucial for the management of the disease.
molecular breeding, involving the use of dna markers linked to the resistant gene(s) for selection, can overcome the difficulties encountered in conventional resistant breeding programs. for successful marker-assisted selection (mas), the identification of closely linked markers through the process of gene tagging and mapping is a prerequisite. attempts have been initiated for identification of tungro resistance genes through molecular mapping and their introgression into the target varieties using marker-assisted selection at drr, hyderabad. the inheritance of resistance to rice tungro virus disease was studied in seven resistant rice cultivars with field evaluation at hot spot locations. the microsatellite markers linked to rice tungro resistance in utri merah was studied and found that resistance genes were linked to rm 336 on chromosome 7. through molecular mapping two qtl were identified controlling rtv resistance on chromosomes 7 and 2 in 'utri rajapan' explaining 40.8% and 21.6% of the phenotypic variance. in variety 'vikramarya', another two qtl for rtv resistance were detected on chromosomes 7 and 1 explaining 18.7% and 16.4% of the phenotypic variance. the closely linked markers identified in this study flanking the gene of interest through mapping will improve the efficiency and precision of introgression programs in marker assisted breeding for rtv resistance. functional characterization of these qtl for rtv resistance is under progress. there is only a limited pool of natural virus resistance in cassava against cassava mosaic geminiviruses and cassava brown streak ipomovirus hence the development of transgenic resistance in this significant crop might present an option. 
rna mediated resistance through the expression of inverted-repeat dsrna sequences derived from the virus genome, and the modification of plant microrna to produce antiviral artificial microrna (amirna), are strategies that have recently been proven very effective for the induction of virus resistance (immunity) against a number of rna viruses. rna interference strategies against geminiviruses have never resulted in immunity of the transgenic plants. however, the results suggest that viral mrnas are targets of rna silencing and that the success of the strategy depends on the relevance of the target gene in the systemic spread of the virus. we have generated a number of rna silencing constructs to induce resistance against cbsv and the indian cassava mosaic viruses icmv and slcmv. due to the serious problems inherent in the transformation of cassava and subsequent resistance screening, these constructs were tested for efficiency by either transient or transgenic expression in n. benthamiana. complete immunity against cbsv was reached in transgenic n. benthamiana using inverted repeat or amirna constructs. when different species of cbsv were used for resistance screening, immunity was broken, which showed the minimum context for broad spectrum resistance. similarly, highly specific resistance was reached with the expression of amirna. in contrast, virus resistance against icmv/slcmv using single amirna constructs was not successful. results from the experiments to generate virus resistance against cbsv and icmv/slcmv will be shown; methods to evaluate the efficiency of rnai gene constructs by transient gene expression in n. benthamiana and strategies to develop efficient resistance against rna and dna viruses in cassava will be discussed.

bitter gourd (momordica charantia l.), which is also called bitter melon, balsam apple and balsam pear, belongs to the family cucurbitaceae.
it is an important traditional vegetable of nutritive and medicinal value that is cultivated in tropical and sub-tropical asia, but is considered as a weed host reservoir for viruses in jamaica. viral disease-like symptoms were observed occurring naturally on the crops of bitter gourd grown in the fields of northern india during 2007-2009. an incidence of 78.5% of diseased plants was recorded which showed chlorotic spots and mosaic ranging from mild mottling to green blisters along with leaf smalling, leaf and fruit deformations, bud necrosis and stunted growth whereas 20.2% plants exhibited leaf curling alone or in combination with mosaic-type disease. a reduction of 34.5% in fruit yield was recorded in mosaic-like disease which could be attributed to lesser fruit setting due to bud necrosis, smaller fruit size and stunted plant growth. such plants produced deformed, notched, irregularly shaped fruits wherein pre-mature yellowing and necrosis on the anterior and posteriors ends made 22.4% fruits unfit for marketability. the dwindling yield and production of unmarketable fruits posed a major constraint for profitable cultivation of this economically important crop, thus warranting for studies on etiology and management of these diseases. the mosaic-like disease was transmitted to healthy seedlings of bitter gourd at 2-leaves stage by sap inoculation as well as by aphid viz., myzus persicae sulz. and aphis gossypii glov. initially studies were carried out to optimize protocols for efficient plant regeneration and agrobacterium-mediated transformation for nagpur sweet orange, which is a popular and elite citrus cultivar in india. organogenesis was induced in etiolated epicotyl explants of one-month-old axenically raised polyembryonic seedlings by culturing them in mt medium supplemented with 30 g/l sucrose with varying concentrations of plant hormones. 
it was found that bap at 1 mg/l without auxin was best for efficient shoot regeneration in citrus using epicotyl explants. a 100% regeneration frequency was obtained, and multiple shoots formed from both cut ends of all the explants. an average of 8.24 well-differentiated shoots per explant was obtained, all of which rooted normally under the influence of 1 mg/l iba. this improved regeneration protocol was utilized in standardizing agrobacterium-mediated transformation of citrus using a. tumefaciens strain eha 105, containing the binary plasmid pcambia 2301 that harbors the gus reporter gene and the npt-ii plant selection marker gene. infecting one-month-old epicotyl explants with overnight-grown agrobacterium (a600 0.6-0.8) for 15 min and co-culturing them for 3 days was found to be optimum for transformation, as assessed on the basis of pcr analysis and gus activity displayed by the stem and leaf sections of putative transgenics. overall transformation frequency ranged from 38 to 48%. the current study focuses on the generation of citrus transgenics for ctv resistance using a. tumefaciens strain eha 105 containing the binary plasmid pbinar, harboring a portion of the coat protein gene of ctv and the npt-ii gene, employing the standardized protocols. several putative transgenic shoots were recovered on selection medium, and they are being utilized for molecular analyses and for testing resistance against ctv. work is also in progress on the generation of citrus transformants using rnai constructs harboring the ctv cp and p23 genes, singly and in conjunction. our lab was also involved in developing rice transgenics for resistance against rice tungro disease, which is one of the most important and widespread virus diseases of rice in south and southeast asia, causing an annual estimated loss in crop yield of

economic losses worth millions of rupees are caused by these diseases annually.
virus diseases are frequently less conspicuous than those caused by other plant pathogens and last for much longer. this is especially true for perennial crops and those that are vegetatively propagated. one further problem with attempting to assess losses due to various diseases on a global basis is that most of the data are from small comparative trials rather than wide scale comprehensive surveys; even the small trials do not necessarily give data that can be used for more global estimates of losses. this is for several reasons, including: (1) variation in losses by a particular crop from year to year; (2) variation from region to region and climatic zone to climatic zone; (3) differences in loss assessment methodologies; (4) identification of the viral etiology of the disease; (5) variation in the definition of the term 'losses' and (6)

chilli is the major vegetable and spice crop grown in the thar desert areas of rajasthan. leaf curl disease (chlcd) is one of the major constraints in chilli cultivation faced by farmers and causes yield losses of up to 100%. a survey was conducted in the major chilli growing areas of the thar desert (bikaner, nagur, jodhpur and jalore districts of rajasthan) during november 2009 to understand the present status of leaf curl disease in chilli. among the four districts surveyed for chlcd, the maximum disease incidence was recorded in jodhpur district (up to 98%), followed by jalore district (up to 88%). no relation was found between disease incidence and variety. the major varieties grown in these areas are mehsana, rch (mandoria), haripur raipur, mathania and local cultivars. the number of whiteflies was also counted on the top, middle and bottom leaves of chilli grown in these areas. the average number of whiteflies per plant ranged from 0.0 to 4.0. the highest number of whiteflies (4.0) was recorded in jodhpur district and the lowest (1.8) in jalore district.
total dna was extracted from three leaf curl infected samples from each district and tested for the presence of begomovirus using coat protein (cp) and dna-b specific primers. all the samples were positive for cp and dna-b amplifications by pcr. the cloning and sequencing of selected cp gene and dna-b fragments are in progress. the preliminary investigations shows that the leaf curl disease of chilli is widespread in the arid region of rajasthan and may be caused by begomovirus associated with satellite dna-b. bittergourd (momordica charantia) is an important vegetable crop of kerala. the crop is affected by several diseases of which mosaic is a prominent one. a field experiment was conducted to evaluate the efficacy of potentised resistance inducing substances (ris) viz., mosaic affected bittergourd plant tissue, ash of mosaic affected bittergourd plant tissue, plumbago and salicylic acid for control of bittergourd mosaic in march 2008. ris were applied as drench and foliar spray at three potency levels twice, before flowering of the crop. the experimental crop was grown as per the package of practice recommendations in split plot design with five replications per treatment. the disease incidence, disease severity and yield of the crop were recorded. the result of the experiment shows that spraying was more effective than drenching of treatments for reducing mosaic incidence and severity. among treatments, infected plant extract at 19 potency was the most effective one for reducing mosaic incidence and it showed the maximum incubation period and minimum disease severity. the spray application of treatments produced significantly higher yield than drenching. among the treatments, ash of infected plant at 19 and 309 potency and infected plant extract at 69 potency were on par and produced comparatively higher yield. 
elephant foot yam (amorphophallus paeoniifolius), colocasia (colocasia esculenta) and tannia (xanthosoma sagittifolium) are the major edible aroids cultivated in india. elephant foot yam cultivation is gaining importance due to its high production potential, nutritional and medicinal values and good economic returns. all these aroids are vegetatively propagated, and viral diseases are spreading through planting materials. ctcri has the mandate of producing healthy planting materials of these edible aroids. accurate diagnosis and identification of the virus is essential for the production of healthy planting material and effective management of the disease. though occurrences of viral diseases on edible aroids in india were known in the 1960s, not much attention was given to detection and identification of the virus involved. in elephant foot yam, 5-30% mosaic incidence was observed, with varying symptoms of mosaic, puckering, filiformy etc. in colocasia and tannia, 5-10% incidence was noticed. rt-pcr amplification with potyvirus group specific primers and subsequent cloning and sequencing of the amplified product confirmed the association of dasheen mosaic virus (dsmv) with all three edible aroids cultivated in india. the full length coat protein (cp) gene of dsmv infecting elephant foot yam was cloned in the pgem-t vector and sequenced. sequence analysis revealed that the cp gene of dsmv consisted of 942 nucleotides and the 3′ utr of 260 nucleotides. blast and phylogenetic analysis showed the highest similarity, 89%, with dsmv isolate af048981, reported from the usa. the deduced amino acid sequence of the cp had 92.0-98.0% identity with other dsmv isolates. blast analysis of the partial cp gene sequences from colocasia and tannia also confirmed that the virus involved is dsmv.
rt-pcr analysis of large number of samples from all the three crops confirmed that the potyvirus group specific primers (mj1 and mj2) are good for rapid detection of dsmv in these crops. dsmv specific biotinylated cdna and digoxigenin labelled crna probes were also prepared and dsmv in elephant foot yam was detected through nucleic acid spot hybridization. yellow leaf disease (yld) caused by sugarcane yellow leaf virus (scylv) is a recently recorded disease in india and is found wide spread throughout country. in popular varieties, the disease incidence varied from 0 to 75.0% and attained epidemic levels under field conditions. detailed studies on the impact of yld on sugarcane revealed that the virus infection significantly reduces various cane growth parameters, cane yield and juice quality. sequence comparisons of the coat protein (cp) and movement protein (mp) of 22 scylv isolates from india and database sequences showed a significant variation between indian isolates and the database sequences both at nt and aa level in the cp/mp coding regions. the significant variation in our isolates with the database isolates, even in the least variable region of the scylv genome showed that the population existing in india is different from rest of the world. further, comparison of partial sequences encoding for orf 1 and 2 revealed that yld in sugarcane in india is caused at least by three genotypes viz., cub, ind and bra-per, of which a majority of the samples were found infected with cuban genotype (cub). the genotype ind was identified as a new genotype and this was found to have significant variation with the reported genotypes. we have identified specific primers from cp region of the virus and optimized rt-pcr conditions to diagnose the virus. this assay has been found efficient in detecting the virus in asymptomatic plants and tissue culture derived seedlings. 
meristem culture has been demonstrated to eliminate the virus from infected planting materials, and this technique needs to be adopted to supply disease-free planting materials for effective management of the disease. studies are also in progress to identify yld-resistant sources in sugarcane germplasm to initiate breeding for yld resistance in sugarcane.

mycoviruses are viruses that infect fungi. they have been identified in all major fungal families. in the present scenario, mycoviruses are an important means of biocontrol of plant fungal pathogens. most identified fungal viruses have double stranded rna genomes, often with more than one dsrna present per virus particle, and are spherical in shape. these viruses are mostly vesicle bound, whereas other viruses have protein coats. to be a true mycovirus, a virus must demonstrate an ability to be transmitted, in other words to infect other healthy fungi through anastomosis and spores. mycoviruses lead 'secret lives' and can reduce the ability of their fungal hosts to cause disease in plants. this property is known as hypovirulence, a term used to describe reduced virulence found in strains of pathogens. the phenomenon was first observed in cryphonectria (endothia) parasitica (chestnut blight fungus) on european castanea sativa in italy, where naturally occurring hypovirulent strains were able to reduce the effect of virulent ones. these slower growing hypovirulent strains of c. parasitica contain a single cytoplasmic element of double-stranded rna (dsrna), similar to that found in mycoviruses, that was transmitted by anastomosis in compatible strains through natural virulent populations of c. parasitica. hypovirulence has also been reported in many other fungal plant pathogens, including rhizoctonia solani, gaeumannomyces graminis var. tritici, ophiostoma ulmi, sclerotinia homoeocarpa, diaporthe ambigua, alternaria alternata and fusarium sp.
hypovirulence has attracted attention owing to the importance of fungal diseases in agriculture and the limited strategies that are available for the control of these diseases. it reduces the use of toxic fungicides, which also affect plant growth. the symptoms caused by mycoviruses are reduction in growth, reduction in pigmentation and sporulation, excessive sectoring and aerial mycelial collapse. these are the consequences of alterations in complex physiological and biochemical processes involving the interaction between host and virus.

cassava (manihot esculenta crantz.) is the major tuber crop in peninsular india. it is grown on an area of 2.4 lakh hectares, with an annual production of 6.7 million tonnes, both for direct consumption and for the starch grain (sago) producing industries, mainly in the southern states of tamil nadu, kerala and andhra pradesh (fao 2005). in tamil nadu, cassava is primarily produced for the sago industries and is considered an industrial crop rather than a food crop, so resource-rich farmers cultivate cassava as an irrigated crop on their fertile land, while poor farmers raise the crop under rainfed conditions. in south india there is also a practice of intercropping important vegetable crops such as tomato, brinjal, legumes and gourds in cassava fields, since all these crops are of short duration and are money spinners for the farmers. unfortunately, the major production constraint in these vegetable crops, including cassava, is the geminiviruses belonging to the family of

in recent years there has been growing concern regarding the standard of scientific research in india. the strengths, weaknesses, opportunities and threats (swot) analysis of indian scientific research reviewed the progress of science during the last six decades.
although the 'strengths' were highlighted in good measure, it was the list of 'weaknesses' that called for attention to upgrade the standard of research and the 'opportunities' that provide scope for overall scientific growth. a comparison between india and other countries in terms of research papers published revealed that india's contribution to science has come down enormously. what ails indian science? should we compare the growth of indian science with that of developed countries? what criteria should be adopted to judge the quality and standard of scientific research? how do we motivate the scientists to improve their scientific output? how do indian journals perform in maintaining quality? this paper critically analyses the scientific journals around the world, based on the scores allotted by the national academy of agricultural sciences (naas) in 2003 and 2007 for 1460 and 1608 journals respectively. in general, the indian journals performed poorly irrespective of discipline, with only 25-30% of a high standard. the paper dealt with the reasons for low impact factor, the anomalies in the allotment of scores across the wide spectrum of journals and the disadvantages the scientists face with the scoring system. a case study was presented of an institute with over 50 scientists whose publications were analyzed to discuss the merits and demerits of the system. the performance of the journals published by prestigious academies, societies and councils was also projected. the paper concluded with the need for enhancing the image of the country through research publications in high standard journals and the role of various scientific bodies with short and long term measures. poster session herpes simplex virus (hsv) keratitis is a leading cause of corneal blindness throughout the world.
the infection can be diagnosed by clinical manifestations but in case of atypical ocular cases, laboratory diagnosis is more helpful in timely management of disease. collection of corneal scrapings in all cases of stromal and epithelial keratitis may not be possible, but collecting tear fluid is a convenient procedure causing less discomfort to the patients. therefore, the present study was intended to evaluate the suitability of tear specimens for detecting hsv by polymerase chain reaction (pcr) and immunofluorescence (ifa). tear fluid and corneal scrapings were collected from 134 patients of suspected herpetic keratitis. hsv-1 antigen was detected by ifa using rabbit anti-hsv antibodies. pcr was performed to amplify 111 bp region of thymidine kinase (tk) coding gene and 144 bp region from dna polymerase coding gene of hsv. out of 134 patients hsv antigen was detected in 25 (18.65%) of corneal scrapings and 15 (11.19%) of tear specimens and in 12 (8.95%) patients from both the specimens. hsv gene could be amplified in 44 (32.83%) of corneal scrapings and 16 (11.94%) of tear fluids and in 13 (9.71%) patients from both the specimens. although, corneal scraping seemed to be marginally superior material for detection of hsv, tear fluid may also serve as an appropriate alternative clinical specimen, due to ease of collection and least discomfort to the patients. in either cases pcr detected higher number of hsv cases than ifa. therefore if and when feasible, both ifa and pcr should be used simultaneously on each specimen to obtain best results. cytokines play a key role in the regulation of immune responses. in hepatitis c virus infection (hcv), the production of inappropriate cytokine levels appears to contribute to viral persistence and to affect response to therapy. il-6 is produced by a variety of cells including t cells, phagocytes and fibroblast. 
cytokine genes are polymorphic at specific sites, and certain mutations located within coding/regulatory regions have been shown to affect the overall expression and secretion of cytokines in patients with hcv infection. to correlate the serum levels and polymorphism of il-6 gene in chronic hepatitis c patients and healthy controls. forty patients positive for hcv rna attending the medicine out patient department and wards of lok nayak hospital, new delhi as well as forty healthy controls were enrolled for the study. the serum level of il-6 was detected by using elisa. genomic dna was extracted from whole blood of hcv infected patients and healthy controls by using accuprep genomic dna extraction kit according to manufacture's instruction. the genotyping of il-6 promoter (-174 variant) was carried out by pcr and direct sequencing using the method of patricia woo et al. 1998. the serum level of il-6 was significantly down regulated in hcv infected chronic patients as compared to the healthy controls. genotyping of -174 promoter variant of il-6 was performed by pcr and direct sequencing. il-6 polymorphism in the g/g, g/c and c/c allele was non significant when compared to hcv patients and healthy controls. the il-6 serum levels were significant among hcv infected patients when compared to healthy controls. the polymorphism in the promoter region of il-6 (-174) was found nonsignificantly associated in hcv patients compared to healthy controls. in conclusion, the present study suggests that the host il-6 polymorphism alone may not play a significant role in the outcome of hcv infection. acute gastroenteritis (age) is a global health problem and has been associated with multiple etiological agents, which include bacteria, protozoa and viruses. viral gastroenteritis is considered as the second most common illness in children after upper respiratory tract infection. 
among enteric viruses, rota, noro, enteric adeno, astro and enterovirus are found to be associated with gastroenteritis. although an association of enteric viruses has been established in children hospitalized for age, no such data are available for children hospitalized for reasons other than enteric infections. to determine the prevalence of enteric viruses circulating in hospitalized children. fecal samples, n = 292 (177 symptomatic and 115 asymptomatic for age), were collected from children <5 years of age from three different hospitals across the city of pune from june 2008 to feb. 2009. detection of group a rotavirus was carried out by using antigen capture elisa. rt-pcr and pcr were carried out for the detection of norovirus, enterovirus, astrovirus and enteric adenovirus using primers targeted to the rdrp gene, the 5′ ncr, the conserved serine protease gene and the hexon gene, respectively. out of 177 fecal samples tested for enteric viruses in age cases, the prevalence of rota, entero, noro, enteric adeno and astrovirus was 33.3% (59), 14.7% (26), 6.2% (11), 2.8% (5) and 1.1% (2) respectively. however, the presence of these viruses in the asymptomatic cases (n = 115) was detected at 7.8% (9), 5.2% (6), 7.8% (9), 0.86% (1) and 1.7% (2) levels respectively. mixed infections of enterovirus and rotavirus were found in both symptomatic, 1.6% (3), and asymptomatic cases, 0.8% (1). however, mixed infection of enterovirus with adenovirus was found only in asymptomatic cases, 0.8% (1). no marked difference was observed in the seasonal pattern of all viruses in the patients with or without gastroenteritis. the findings of this study document the highest circulation of rotaviruses in patients symptomatic and asymptomatic for age. the entero and noroviruses remain the second most important enteric viruses in these patients.
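the prevalence figures reported for the gastroenteritis study above are simple proportions of positive samples over the number tested in each group. a minimal sketch of that arithmetic follows; the counts and denominators are taken from the abstract, while the variable and function names are illustrative assumptions and not part of the original study.

```python
# Counts of positive fecal samples per virus, as reported in the abstract.
# Names (SYMPTOMATIC_N, prevalence, ...) are hypothetical, chosen for illustration.
SYMPTOMATIC_N = 177
ASYMPTOMATIC_N = 115

symptomatic = {
    "rotavirus": 59,
    "enterovirus": 26,
    "norovirus": 11,
    "enteric adenovirus": 5,
    "astrovirus": 2,
}
asymptomatic = {
    "rotavirus": 9,
    "enterovirus": 6,
    "norovirus": 9,
    "enteric adenovirus": 1,
    "astrovirus": 2,
}


def prevalence(counts, total):
    """Return the percentage of positive samples per virus, to one decimal place."""
    return {virus: round(100 * n / total, 1) for virus, n in counts.items()}


symptomatic_pct = prevalence(symptomatic, SYMPTOMATIC_N)
asymptomatic_pct = prevalence(asymptomatic, ASYMPTOMATIC_N)

for virus in symptomatic:
    print(f"{virus}: {symptomatic_pct[virus]}% symptomatic, "
          f"{asymptomatic_pct[virus]}% asymptomatic")
```

rounding to one decimal place reproduces the reported symptomatic values; small differences in the second decimal elsewhere in the abstract appear to come from truncation rather than rounding.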
influenza in humans is a major public health concern and the understanding of its evolution in the light of its ''antigenic drift'' helps prediction of epidemics and update of yearly influenza vaccine. to antigenically characterize influenza a (h3n2) isolates and study antigenic drift during 1990 to 2009 in pune city. patients with influenza like illness were identified using a strict case definition from dispensaries located in different areas in pune and clinical samples (ns/ts) were collected after obtaining informed consent. these clinical samples were processed in vivo (in fertile eggs) and in vitro ( overall, an additional 35 (39.7%) positive cases of dengue could be detected when ns1 antigen assay was also used in the study. highest ns1 antigen positivity was encountered among the samples collected on the 3rd day of fever whereas mac elisa for anti igm antibody was positive after 4th day and gradually there was an increase in the positivity towards the convalescent phase of the disease. the results of this study indicate that ns1 antigen based elisa test can be an useful tool to detect the dengue virus infection in patients during the early acute phase of disease since appearance of igm antibodies usually occur after fifth day of the infection. concurrent use of both diagnostic assays namely ns1 antigen as well as mac elisa will improve the overall detection of dengue infection. early detection of acute dengue virus infection is crucial to provide timely information for the management of patients. human parvovirus b19, a member of the parvoviridae family, is a pathogen associated with a wide variety of diseases. most commonly, it causes childhood rash erythema infectiosum, but in some cases more serious symptoms such as persistent arthropathy, critical failures of red cell production causing transient aplastic crisis, this infection in pregnancy causes hydrops fetalis and myocarditis. 
traditional immunosuppressive therapy being unsuccessful, anti-viral therapy might be worthy of consideration. functional annotation would provide role of viral proteome in its survival and pathogenic mechanisms. svmprot functional family annotations of vp2 protein had deciphered its zincbinding, coat protein, outer membrane, chlorophyll biosynthesis, dna repair and calcium-binding nature. vp2 protein is having a key role in viral assembly of b19 virus and being non-homologous to human proteome, it was identified as an attractive molecular target for structure based drug discovery. the vp2 protein crystal structure was energy minimized using charmm. a structure based virtual screening method was applied using ligandfit to identify potential inhibitors of vp2 protein from chembank database and ten potential human parvovirus b19 vp2 inhibitors were proposed. prism 310 genetic analyzer. the drafting of the sequences was performed using bioedit software and were submitted in genbank. for phylogenetic interpretation denv representing the full extent of genetic diversity in denv-1, denv-2 and denv-3 were collected from genbank. neighbor joining algorithm was implemented with bootstrap value of 10,000 replicates for phylogenetic inference using mega 4.0.2. the genomic region 134 to 644 (c-prm gene junction) of denv were amplified directly from patient serum. twelve of 72 samples were positive for dengue viral rna. of these 4 were dengue type 1, 1 was dengue type 2 and 7 were dengue type 3. for molecular epidemiological survey and genotyping of the sequences more than 100 sequences from different geographical areas including sequences form previously reported north indian isolates were compared with our present data set. the critical analysis of the sequences revealed: 4 dengue type 1 sequences were clustered within sub-type 2 of genotype iii and all the 7 sequences of den-3 clustered along with genotype iii. 
thus, among the dengue types 1, 2 and 3 currently circulating in north india, dengue type 3, genotype iii, being the predominant one followed by, genotype iii of dengue type 1. although there is no specific treatment or vaccine available currently, the confirmative rapid diagnosis based on detection of viral nucleic acid or igm antibodies in serum, an indication of recent infection, helps in epidemiological monitoring, symptomatic treatment of patients and determining prognosis. serological detection of anti-cgv igm antibodies was performed using rapid immuno-chromatographic assay (rica) and igm-antibody capture enzyme linked immunosorbant assay (mac-elisa). eighty convalescent sera were tested by rica and 60 of them were found positive for anti-cgv igm antibodies. twenty-five anti-cgv igm antibody rica positive sera were further assayed using mac-elisa. more sera from the patients are currently being tested to compare the sensitivity of these two serological assays in anti-cgv igm antibody based early serological diagnosis of cgv infection and the findings will be presented. thus the present study was designed to evaluate the utility of multiplex pcr (mpcr) for simultaneous and rapid detection of dengue and chikungunya viral infections. seventy-two acute phase blood samples from clinically suspected dengue cases were subjected for dengue and chikungunya uniplex pcr using dengue genus specific primers and e gene specific primers for chikungunya virus as well as multiplex pcr was developed for simultaneous detection of dengue and chikungunya infection. standard strains of dengue and chikungunya virus were used as controls. 13 of the 72 clinically suspected dengue samples were found to be positive for dengue viral rna by dengue uniplex pcr as well as dengue chikungunya mpcr whereas none of the samples were positive for chikungunya virus infection by both uniplex chikungunya pcr and dengue chikungunya mpcr. 
the result of dengue and chikungunya uniplex pcr was found to be 100% concordant with dengue chikungunya multiplex pcr. dengue chikungunya multiplex pcr was found to be a potential rapid test to detect dengue and chikungunya viral infections simultaneously in clinical samples. sheetal malhotra, neelam marwaha, karan saluja, ratti ram sharma department of transfusion medicine, pgimer, chandigarh 160012 transmission through blood and blood products can be reduced to a great extent by efficient and reliable testing of the blood. the newer fourth generation elisa assays simultaneously detect antibodies against hiv-1 and 2 and the presence of p24 antigen and thus shorten the window period to about 14 days, as compared to 22 days with third generation elisa. to compare the hiv seroprevalance among blood donors using fourth generation elisa (antigen-antibody) versus third generation elisa (antibody) assay. this was a prospective study involving 5100 blood donors of which 3400 were voluntary donors (1700 being students and 1700 being non students) and 1700 were replacement donors. sex workers are one of the core group for transmission of sti/hiv and as a ''bridge group'' to the general population. accordingly, highest priority is given to this group in targeted intervention for prevention of hiv/aids. here we are describing one such female sex worker who was harbouring 5 concomitant sti including 4 viral sti. a 25 year old female sex worker was brought to the sti clinic of a tertiary care hospital by ngo with complaint of genital discharge for 3 days. on per speculum examination, cervix was slightly erythematous, tender with mucopurulent discharge. there was no vaginal discharge or ulcer in anogenital area. however, there was a wart at lateral wall of vagina. as per naco syndromic management guideline, treatment was given for n. gonorrhoeae, c. trachomatis and hpv. 
cervical swab was taken and subjected to various microbiological investigation for the detection of sti viz n. gonorrhoeae, c. trachomatis, t. pallidum, candida spp., t.vaginalis, hsv-1, hsv-2, hiv, hbv, hcv, hpv and m. contagiosum. saline wet mount showed pus cells, but no yeast cells or trophozoite of trichomonas vaginalis. gram stained smear showed more than four polymorphonuclear leucocytes in the absence of gramnegative intracellular diplococci and a presumptive diagnosis of non gonococcal urethritis was made. no organism was isolated on any culture media after appropriate incubation. cervical swab was negative for antigen of c. trachomatis. serum was tested positive for hbv, hcv, hsv-2 and t. pallidum though it was seronegative for hiv. in the present case, the female sex worker was harbouring four viral sti viz hsv-2, hbv, hcv and hpv alongwith t. pallidum. however clinically she was diagnosed and treated accurately only for genital wart while cervical discharge due to hsv-2 was misdiagnosed. it is necessary to try to test alternative approaches such as periodic presumptive therapy of viral sti, because this will not only boost up the efforts of sti control in the target group but also help in hiv control. alternatively, regular clinical and laboratory screening for viral sti may be tried. densonucleosis viruses (dnv) belong to parvoviridae family. they are the etiological agents of insect's disease known as densonucleosis, which leads to death or loss of vital functions of the infected insect. densonucleosis virus of mosquitoes has generated lot of scientific interests because of its tremendous potential in biological control and its application as a transducing vector. earlier, we have reported the isolation and characterization of a dnv from aedes aegypti mosquitoes and its prevalence among different ae. aegypti populations from india. 
there are reports suggesting that when aedes albopictus mosquitoes co-infected with dengue-2 and dnv, the multiplication of den-2 is suppressed. the present study focus on the effect of coinfection of ae. aegypti mosquitoes with dnv and chikungunya virus (chik). the first instar mosquito larvae were infected with dnv and the emerging dnv infected females were then infected with chikv by oral feeding. thus obtained chik infected female mosquitoes were analyzed by real time pcr for both dnv and chikv on alternate days post-infection, up to the 14th day. the data showed no significant difference in the multiplication of either of the viruses after co-infection. results suggest that chikv neither stimulates the replication of dnv nor is its own replication suppressed due to co-infection. this study forms an initial step in understanding the role played by such endogenous viruses on the vector dynamics. chandipura virus pathogenesis is manifested as encephalitis in young children with a very high mortality rate. this damage could be due to direct replication of the virus in brain parenchymal tissue or immune system mediated. this study aims at elucidating the role of brain infiltrating lymphocytes in pathogenesis using mice as the model system. mice were inoculated intracerebrally with the virus and the perfused brain tissue was used to isolate the lymphocytes. control mice were inoculated with an equal amount of media. in order to standardize the procedure for isolation of lymphocytes from brain tissue, splenocytes were processed to isolate the lymphocytes using histopaque density gradient method. methods to isolate lymphocytes from brain tissue as described by earlier workers were tested for the ease and efficiency of procedure using known suspension of lymphocytes from spleen. percoll density gradient method provided optimum yield of lymphocytes with an ease of handling. 
in this method, the brain cell suspension, used to prepare 30% percoll, is layered over 70% percoll prepared using media in a 1:2 ratio. density gradient centrifugation is carried out at 900 × g for 20 min at 15°c to obtain the lymphocyte layer at the interface. leishman staining was performed to analyze the morphological characteristics of isolated lymphocytes. normal lymphocytes showed a dark blue stained nucleus. some bigger sized cells with a diffused nucleus, characteristic of atypical lymphocytes, were observed and some of the cells were surrounded by hair-like structures. phenotypic characterization was carried out using flow cytometry. the presence of cd4+, cd8+ and cd19+ cells was observed. the percentages of cd8+, cd4+ and cd19+ cells were found to be 7.60%, 35.14% and 34.32% respectively in the lymphocytes isolated from the infected animal and 5.65%, 30.27% and 3.13% respectively from the control animal. hence, cd19+ cells showed maximum infiltration after infection. (santosh et al. 2008; pradeep et al. 2008). in the present study chikv suspected blood samples were collected and the acute phase samples were subjected to rt-pcr for the presence of virus specific rna by using the primer pair dvrchk-f/dvrchk-r as described by us earlier (naresh kumar et al. 2007). the convalescent phase samples were screened for chikv specific antibodies by using the sd bioline chikungunya igm rapid test. six sets of primers were designed to amplify the complete nsp4 and complete structural genes of chikungunya virus. the products were further gel purified and cloned in ptz57r/t vector, and the recombinant clones were sequenced and submitted to genbank. the complete nsp4 gene and structural genes were compared with other available sequences in genbank. sequence analysis results will be presented. the present study discusses these aspects in detail. some of these phages (viz. v953, v954) showed plaques at 42°c but not at 37°c. thus they seem to be lysogenic.
for propagating and increasing the titre of all the above isolates, various previously described methods were attempted, but none of these methods were satisfactory. but when siliconized glassware and plastic-ware were used, propagation was successful. we showed that siliconization of glassware and plastic-ware was essential for the propagation of our mycobacteriophage isolates v951, v952, v953, v954 and v955. also, phage dilution medium (pdm) as described by chaterjee et al. (2000) was found to be effective for picking out of the plaques made by the phages. in this way, the phage isolates were propagated up to p 3 . the various passages of the phage isolates v951, v952, v953, v954 and v955 (i.e. original, p 1 , p 2 and p 3 ) were stored at -80°c. the four major routes of transmission are unsafe sex, contaminated needles, transmission from an infected mother to her baby at birth (vertical transmission) and breast milk. screening of blood products for hiv has largely eliminated transmission through blood transfusions or infected blood products in the developed world. in 2008, globally, about 2 million people died of aids, 33.4 million were living with hiv and 2.7 million people were newly infected with the virus. hiv infections and aids deaths are unevenly distributed geographically and the nature of the epidemics vary by region. more than 90% of people with hiv are living in the developing world. there is growing recognition that the virus does not discriminate by age, race, gender, ethnicity, socioeconomic status-everyone is susceptible. however, certain groups are at particular risk of hiv, including men who have sex with men (msm), injecting drug users (idus), and commercial sex workers (csws). the present study indicates the prevalence of hiv infection among the people residing in the northern region of india predominantly among the foothills of the himalayas. 
the study was carried out on the patients visiting herbertpur christian hospital (a unit of emmanuel hospital association) under the integrated counselling and testing centre scheme at the respective hospital during the 2009-2010. the study indicates the screening of people groups residing in the respective area through community health schemes. the diagnosis of the hiv infection is done by three types of assays namely. the tridot method which is the rapid method of diagnosis followed by the. hiv coombs test which involves the dot immunoassay principle. the third assay is the enzyme linked immunosorbent assay (elisa). the number of patients screened during the period of september 2009 to march 2010 is 635 which include patients coming from four different states namely haryana uttarakhand uttarpradesh and himachal pradesh. the number of people who were tested positive are 8 and the number of people who were tested negative are 627. the people tested positive are sent to the higher centre for other confirmatory tests such as pcr and western blot analysis. these patients are sent for treatment and prophylaxis at a respective recognised centre in dehradun. the present study determines a consistent community hiv screening and treatment approach through diagnostics counselling and awareness programmes. classical swine fever (csf) also known as hog cholera is a highly contagious and fatal disease of swine. csf became rapidly a major issue of pig industries. it still causes important economical losses worldwide. it is considered as a major health problem of swines in india. during the month of august to october 2009 there was an outbreak of classical swine fever in bihar. from three districts darbhanga, patna and supol, total 36 numbers of different infected tissue samples like kidney, spleen and lymphnode were collected from the dead morbid/pigs. 
total rna was isolated from a 20% homogenate of infected tissues in sterile pbs by tri-reagent (sigma, usa) according to the manufacturer's instructions and cdna was prepared by using a commercially available kit. the cdna was stored frozen at -20°c until used. for the molecular detection of classical swine fever virus, specific nested pcr amplification of e2 and the 5′ ntr was done along with ns5b and erns amplification. primarily these samples were found positive with these primers. further confirmation by sequencing was done by cloning of these pcr products in pgem-t easy vector. e2 and 5′ ntr sequences were considered for phylogenetic analysis along with 20 complete available sequences of csfv. nucleotide sequence alignments were carried out using the clustalw program (dnastar) and phylogenetic tree analysis (dnastar) showed that the 5′ ntr has close proximity with a taiwan strain (accession no. ay568569) and e2 shows close proximity with the chinese isolate csfv-39 (accession no. af407339). peste des petits ruminants (ppr) and sheeppox are oie notifiable diseases of small ruminants, especially sheep and goats. both diseases are economically important in enzootic countries like india and are major constraints on the productivity of animals. considering the geographical distribution of both ppr and sheeppox infections and the prevalence of mixed infection, in the present study the safety and potency of an experimental dual vaccine comprising attenuated strains of thermostable ppr virus (pprv-revati, p-50) grown at 40°c and attenuated sheeppox virus (sppv-srinagar, p40) was evaluated in local non-descript sheep. experimental animals were divided into four groups of six animals each, which received 100 doses (10^5 tcid50), 1 dose (10^3 tcid50) or 1/10th dose of vaccine, or normal saline as control, in 1 ml volume subcutaneously, respectively. serum samples were collected on days 0, 7, 14, 21 and 28 post vaccination.
sheep simultaneously immunized with 1 ml of vaccine consisting of either 100 or 1 doses of each of pprv and sppv were monitored for clinical and serological responses for a period of 3-4 weeks post-immunization (pi) and post challenge (pc). specific immune responses, i.e., antibodies directed to both pprv and sppv, could be demonstrated by ppr competitive elisa kit and capripox indirect elisa, respectively, following immunization. all the immunized animals resisted infection when challenged with a virulent strain of sppv (srinagar isolate at p-6) at 28 dpi, while in-contact control animals developed characteristic signs of sheeppox. the challenge of the sheep against ppr was not carried out; however, the antibody titre after immunization, determined by snt and elisa, indicated a protective titre, as per an earlier report on goats. the dual vaccine was found safe at the higher dose and induced a protective immune response even at the lower dose (10^2 tcid50) in sheep, which was evident from sero-conversion as well as the challenge study with sppv. the study indicated that both the viruses are compatible and did not interfere with each other in eliciting an immune response, supporting the feasibility of using this dual vaccine in combating both infections simultaneously. goatpox is one of the highly contagious, oie notifiable and economically important viral diseases of goats. the disease is caused by goatpox virus (gtpv), which is classified in the genus capripoxvirus in the family poxviridae. the disease incurs severe economic losses in terms of high morbidity in adults and heavy mortality in young kids and is a major constraint in goat farming in india. considering the enzootic nature and economic impact of the disease, it is important to control the infection by developing an effective vaccine. recently, a vero cell based live attenuated goatpox vaccine using the gtpv uttarkashi isolate (p60) has been developed in the authors' laboratory and evaluated in goats.
the vaccine was found safe, potent and immunogenic experimentally and also in field trials. the vaccine has been evaluated at large scale in different regions of the country and found suitable for mass vaccination. however, the longevity of potency was not evaluated. therefore, long term potency trials were conducted for a period of 4 years with annual challenge using virulent goatpox virus and sero-monitoring. a sufficient number of hill goats were vaccinated with 1 dose of vaccine (10^2.5 tcid50/ml) and monitored for clinical and serological response. every year, a set of vaccinated (n = 5) and control (n = 2) animals was challenged with a virulent strain (2 × 10^7.0 srd50/ml, gtpv mukteswar). sera of pre- and post-challenge (14 dpc) animals including controls were collected and monitored for serological response in the form of specific antibody production by snt and indirect elisa. all the vaccinated animals were protected on challenge, whereas all unvaccinated controls developed infection. the same was reflected in sero-monitoring of the collected sera. so the developed live attenuated goatpox vaccine was found safe, immunogenic and potent for a period of 4 years after immunization and suitable for mass scale vaccination for the control and eradication of goatpox, along with suitable diagnostic tool(s), in a goatpox enzootic country like india. rotavirus infection in avian species varies from subclinical infections to outbreaks of diarrhea. the economic significance of rotaviral enteritis to the poultry industry has not yet been defined, but by analogy to the situation in mammals, it is likely to be significant. unlike the extensive studies performed on rotavirus infection in humans and animals, limited studies have been carried out to determine the extent of exposure of poultry birds to rotaviruses. to determine the prevalence of avian rotavirus antibodies in commercial broiler chickens.
a total of 120 chicken serum samples were collected from the lairage of a poultry slaughter house to which birds from four different broiler farms in and around pune city were supplied. the serum samples were tested by an igg antibody capture elisa wherein purified chicken rotavirus ch2 was used as coating antigen. sera from specific pathogen free (spf) chicks (n = 20) served as negative control in the test. the cut off was calculated as mean negative control + 3 sd (standard deviation). s/co (mean sample od450/cut off) values above 1 (1.113-4.445) in 60% (72/120) of serum samples indicated positivity for rotavirus antibodies. the result of the study indicates exposure of the birds to avian rotavirus or a similar agent that is circulating in pune city. bluetongue has become established in south india, causing regular outbreaks in sheep. btv serotypes 2, 9, 15 and 21 were isolated from native sheep of andhra pradesh. the other serotypes circulating in the state need to be identified. however, the major constraint is serotype identification. to overcome the difficulties of traditional serotyping methods (neutralization tests), nucleic acid based tests are being tried. rt-pcr for serotyping was standardized using primers specific to the vp2 gene of btv-2, 9 and 15 serotypes. rt-pcr resulted in a 653 bp product for btv-2 and a 1241 bp product for btv-9, as defined by the specific primers. however, non-specific amplification at two different sites, i.e. 700 bp and 1500 bp, was noticed for btv-15. specificity of rt-pcr was evaluated. btv-2 and btv-9 specific primers could amplify only btv-2 and btv-9 respectively, whereas btv-15 type specific primers amplified not only btv-15 but also btv-2 and btv-9. nucleic acid sequence data obtained from the btv-2 pcr product and btv-9 cloned products were specific to the vp2 gene of btv-2 and btv-9 respectively.
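for the rotavirus antibody elisa described earlier in this passage, the cut-off rule (mean negative-control od plus three standard deviations) and the s/co ratio can be sketched as follows. this is a minimal illustration under stated assumptions: the od readings are hypothetical values invented for the example, the function names are not from the study, and the use of the sample standard deviation (rather than the population form) is an assumption, since the abstract does not say which was used.

```python
from statistics import mean, stdev


def elisa_cutoff(negative_ods, k=3.0):
    """Cut-off = mean of negative-control ODs + k * sample standard deviation.

    Assumption: sample SD (n - 1 denominator); the abstract does not specify.
    """
    return mean(negative_ods) + k * stdev(negative_ods)


def s_co(sample_od, cutoff):
    """Signal-to-cut-off ratio for one sample OD reading."""
    return sample_od / cutoff


def classify(sample_ods, cutoff):
    """Flag a sample as antibody positive when its S/CO value exceeds 1."""
    return [s_co(od, cutoff) > 1.0 for od in sample_ods]


# Hypothetical OD450 readings, for illustration only (not data from the study).
negatives = [0.10, 0.12, 0.11, 0.09, 0.13]
samples = [0.08, 0.25, 0.60]

cutoff = elisa_cutoff(negatives)
flags = classify(samples, cutoff)
print(f"cut-off OD: {cutoff:.3f}")
print(flags)
```

with the study's own readings, the fraction of `True` flags would correspond to the reported seropositivity (72 of 120 sera).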
however, the 700 and 1500 bp products of btv-15 were identical to the vp4 gene of btv-2, 8, 10, 11, 13 and 18 and the vp1 gene of btv-2, 8 and 10 respectively, indicating non-specific amplification with the btv-15 primers.
foot and mouth disease is the most contagious and economically important disease of cloven-footed animals. the disease is controlled by regular vaccination using vaccine produced from virus grown in cell culture. the strain used for vaccine production is selected from field isolates based on adaptability and growth kinetics in bhk21 cells and on antigen coverage. however, field viruses need to be passaged several times to adapt to tissue culture. passage of field viruses in tissue culture may result in the development of mutants whose genetic makeup differs from that of the field samples; also, some field strains may fail to adapt or may grow poorly in tissue culture, so the efficiency of the vaccine is affected. structural proteins of fmdv carry the sequences that determine serotype specificity and immunogenicity. thus one may replace the gene coding for the structural proteins in the full-length cdna copy of a vaccine strain that has been adapted to tissue culture with the poly-structural protein gene (p1) of a field strain, so that the chimeric virus gets the serotype specificity of the field strain while retaining the other characteristics needed for a vaccine virus. we have made a replication-competent fmdv asia 1 full-length genome and cloned it under t7 and cmv promoters separately in plasmid vectors. bamhi sites were created for inserting the p1-2a gene of other field strains. the p1-2a of the type 'o' vaccine strain was amplified directly from cattle tongue material and cloned in a plasmid vector, and its specificity was studied by sequence analysis and gene expression. we have introduced the 'o' p1-2a gene into the full-length construct devoid of the asia 1 structural protein gene, p1-2a.
the in vitro transcribed rna in the case of the t7-promoter construct, and plasmid dna in the case of the cmv-promoter construct, were transfected into bhk21 cells. after passaging, the virus obtained was studied for its specificity. this approach may be used not only for rapid selection of vaccine strains but also as a repository of the cdna copy of the virus.
the p1 is composed of 1a, 1b, 1c and 1d (vp4, vp2, vp3 and vp1, respectively), of which vp1 is the most immunogenic, and a subunit vaccine produced with vp1 alone was able to induce high levels of neutralising antibodies. to control the disease in india, a polyvalent vaccine consisting of inactivated virus of all three serotypes is in use. however, conventional vaccines have several drawbacks, including safety and temperature sensitivity. hence, as an alternative, subunit vaccines consisting of vp1 protein have been tried. however, these showed limited success because antigenic variation in field viruses allows escape from neutralization by antibodies generated against a single cloned protein. hence the present study was undertaken with the objective of including all the neutralizing epitopes present in the three serotypes by linking vp1 (1d) genes to produce a polyvalent protein for use as a poly-subunit vaccine. in this study we constructed a cassette by linking the genes of the three serotypes 'o' (622 bp), 'a' (640 bp) and 'asia 1' (622 bp). these genes were cloned individually in the commercial pbsk vector and confirmed by sequence analysis before linking in the pcdna vector. the linked gene construct was sub-cloned in the pet32 expression vector. expression of the protein from the pet vector was induced with iptg and analysed by sodium dodecyl sulphate polyacrylamide gel electrophoresis (sds-page). a fusion protein of 72 kda was observed in page gels.
since the protein contains 6 his residues from the vector at the n-terminal end, affinity purification was carried out using nickel nitrilotriacetic acid (ni-nta) agarose matrix. the immunoreactivity of the purified protein was assayed by western blot with anti-fmdv type 'o' and 'asia 1' specific sera. the protein may be used as a subunit vaccine.
silkworm diseases caused by viruses, bacteria, fungi and protozoans are major constraints on silk cocoon production in all sericultural countries; among these, the silkworm viral diseases nuclear polyhedrosis and infectious flacherie, caused by bmnpv and bmifv, cause severe crop loss. the traditional disease management strategies include prophylactic measures and the use of disease-free silkworm eggs. the prophylactic measures include disinfection of the silkworm rearing house and appliances, the egg surface, the silkworm bed and the rearing surroundings. the disinfectants presently used in sericulture are either formaldehyde or chlorine based products, but these chemicals are neither eco- nor user-friendly. awareness of the health hazards caused by formaldehyde and the environmental pollution caused by cl2 necessitated the development of eco- and user-friendly disinfectant products for use in sericulture. these include alternative disinfectant products developed using biodegradable chemicals and plant based ingredients by apssrdi, hindupur and the central silk board for the management of silkworm diseases in india. the ideal disinfectant for sericulture would be one that can inactivate silkworm pathogens of diverse origin and is economical. the paper discusses the disadvantages of hcho and cl2 based disinfectants and the advantages of eco- and user-friendly disinfectants for the management of silkworm diseases, especially those caused by viruses.
the baculovirus expression vector system (bevs) is widely used for the production of high levels of properly post-translationally modified, biologically active and functional recombinant proteins and has facilitated basic biomedical research on protein structure, function, drug discovery and the roles of various proteins in disease. bevs is based on the introduction of a foreign gene into a genome region that is nonessential for viral replication, via homologous recombination with a transfer vector containing the target gene. the resulting recombinant baculovirus lacks one of the nonessential genes (polh, v-cath, chia etc.), which is replaced with a foreign gene encoding a heterologous protein that can be expressed in cultured insect cells and insect larvae. the insect cell-bev system is widely used to produce recombinant proteins. bevs also eliminates concerns regarding pathogens that could potentially be transmitted to humans, as it is non-infectious to vertebrate animals. these features make the silkworm system an ideal expression and delivery package for producing proteins of medicinal importance. the efficiency, low cost and large-scale production of proteins using bevs represent a breakthrough technology that is facilitating high-throughput proteomic studies. bevs has become a core technology for cloning and expression of genes for the study of protein structure, processing and function; production of biochemical reagents; study of the regulation of gene expression; commercial exploration, development and production of vaccines, therapeutics and diagnostics; drug discovery research; and exploration and development of safer, more selective and environmentally compatible biopesticides. the use of silkworm larvae and pupae as bioreactors, with recombinant bmnpv producing foreign proteins, extends the uses of silkworms. owing to its large size and high protein synthesis ability, as well as its suitability for mass culture, the silkworm is considered a good candidate for producing recombinant proteins.
wssbv is the causative agent of a disease that has recently caused high shrimp mortalities and severe damage to shrimp culture. wssbv has been found across different penaeid shrimp species. in order to develop an effective diagnostic tool, a wssbv genomic library was constructed by cloning wssbv genomic dna extracted from purified virions. in the present study, wssv disease-free samples (confirmed by pcr analysis) were collected from hatcheries in different areas of guntur and prakasam districts and analysed to study the effect of physical parameters such as temperature, ph, salinity and turbidity on the prevalence of the disease. surface water temperature fluctuated between 19 and 30.2°c in diseased ponds and between 25.2 and 34.5°c in healthy ponds. these results show a definite influence of temperature on the prevalence of wssv.
the present-day strategy in vaccine development is to include a marker facility that helps in distinguishing the antibody response due to vaccination from that due to infection in vaccinated animals. such information is relevant for effective disease control programmes, especially when using inactivated virus vaccines like that for foot and mouth disease (fmd). the antibodies generated in the animals only through vaccination are the measure of vaccine efficacy and safety. presently, inactivated fmd virus (fmdv) vaccines are used to control the disease in endemic countries like india. the quality assurance of the vaccine depends on its efficacy in generating protective antibody without causing subclinical disease due to improper inactivation. since the protective antibody response in vaccinated animals cannot be distinguished from that of infected animals, one needs to assay the antibody response against non-structural proteins (nsps), and the vaccine must be free of contaminating nsps.
production of vaccine free of nsps requires the cumbersome method of virus purification which adds to the cost of the vaccine. alternatively one may develop a positive marker vaccine by including a foreign protein or epitope which is not expected to be present in the vaccine and the antibodies generated against which helps in detecting the vaccine related response. here we report a molecular approach by which we introduced a immuno-dominant epitope of green fluorescent protein (gfp) into the structural protein gene of foot and mouth disease virus vaccine strain asia 1 (63/72). our laboratory has produced a mini-genome of fmdv asia 1 that lacks structural protein gene (p1-2a) coding for all the structural proteins (vp1-4) of fmdv asia 1 as a vector (pcfl dasia 1). the p1-2a of the asia 1 vaccine strain was cloned separately into a plasmid vector and by successive pcr mutagenesis and cloning we have introduced nucleotide sequence corresponding to 9 amino acid epitope of gfp into p1-2a gene. gfp epitope was inserted by replacement at n-terminal region of vp-2 which is not immunogenic. the modified p1-2a was expressed in e. coli and studied. the modified p1-2a gene with gfp epitope was inserted into the pcfl dasia 1 to get full length replication competent cdna cloned under cmv promoter in pcdna (pcflasiagfp). this can be used to produce synthetic virus with gfp epitope that can generate antibodies not only against neutralizing epitopes but also against gfp epitope. presence of antibody against gfp epitope in the vaccinated animal will reveal vaccine efficacy. elisa against gfp can be used as a companion test not only for safety evaluation but also for quick evaluation of efficacy. further absence of nsp antibodies in the serum may reveal the quality of the vaccine in respect of safety. self replicating dna vaccines are developed to achieve robust immune response through enhanced antigen production and gamma interferon expression in vaccinated animals. 
since self-replicating dna vaccines induce gamma interferon expression, which helps in viral clearance, such vaccines are expected to be useful for curing even carrier and persistently infected animals. understanding the events that help elicit both arms of the immune response in vaccinated animals is necessary to understand the effectiveness of the vaccine. the work presented here deals with the immunological evaluation of a sindbis virus replicase based dna vaccine carrying linked fmdv vp1 genes in vaccinated guinea pigs. we constructed a self-replicating dna vaccine vector and, downstream of a subgenomic promoter, inserted a secretory signal followed by the linked vp1 genes of 3 fmdv serotypes (o-a-asia 1) with a glycine and proline bridge in between. guinea pigs were vaccinated with the construct, and samples taken at 28 days post vaccination were evaluated for cellular response by studying cd8 levels, by mtt, and by cytokine profiles in real time assays. the humoral response was evaluated by studying cd4 levels in whole blood by facs analysis and serum antibody levels by snt and elisa. the animals were challenged with 100 gp infective doses of fmdv type 'o' virus and lesions were scored. further, the replicative efficiency of the challenge virus was studied by 3ab elisa. the results showed that all the assays except antibodies against 3ab protein had a positive correlation with protection. as expected, the titre of antibodies against 3ab protein was lower, indicating that challenge virus replication was inhibited in the vaccinated animals. the limited studies conducted by us showed that the self-replicating vaccine has the potential to emerge as a potent vaccine for fmd.
ganjam virus (ganjv) belongs to the genus nairovirus (family bunyaviridae). these viruses cause diseases in livestock. ganjv has been isolated from different animal hosts and tick vectors in india.
the genus nairovirus includes a total of 34 tick-borne viruses, classified into 7 serogroups. the important serogroups are crimean congo hemorrhagic fever (cchf) and nairobi sheep disease (nsd). the main members of the nsd group are nsd and dugbe viruses. their genome consists of three segments of single-stranded rna, viz. s, m and l, that encode the viral nucleocapsid protein, the viral glycoproteins g1 and g2, and the viral polymerase respectively. ganjv is very closely related to nsdv. nsdv is found in east and central africa and causes very high morbidity and mortality in livestock. the present study involves phylogenetic comparison of ganjv isolates from india with other nairoviruses based on the complete n gene. it will help to understand the kind of nucleotide (nt) and amino acid (aa) changes that have occurred in ganjv strains from different geographical areas. eight strains of ganjv isolated at niv during 1954-2002 from different parts of india were used in this study. virus stocks were prepared in the vero e6 cell line and used as the source of viral rna. the n gene was amplified either as a complete gene in one reaction or in fragments whenever necessary. the sequences thus obtained were analyzed, annotated to get a consensus sequence, and aligned against the sequence of the prototype strain of ganjv and other representative nairoviruses. the nt sequences were converted to aa sequences and analysis was done at both the nucleotide and amino acid levels. phylogenetic trees based on the nt and aa sequences were constructed and compared with other nairoviruses (cchf, dugv, hazv, kupv and nsdv) for which complete s segment sequences were available in the genbank database (ncbi). the phylogenetic data at both the nt and aa levels showed that all the strains of ganjv form a monophyletic lineage with nsdv. cchfv and hazv together formed another clade, whereas dugv and kupv made a separate branch in the tree.
the different ganjv strains showed 9-10% difference from nsdv at the nucleotide level and 3-4% difference at the amino acid level. hazv showed 37-38% difference at the nt level and 37% difference at the aa level from ganjv as well as nsdv. the present data suggest that ganjv and nsdv are minor variants of the same virus.
diarrhoeal syndrome is one of the major concerns of the livestock industry. most diarrheic cases in animals go unnoticed and limited attention is paid to viral etiology. the presence of large amounts of fecal matter in the animal shed acts as a source of infection for calves via drinking water, feed or contaminated soil. keeping this in view, an investigation was planned to detect the association of rotaviruses with diarrhea in dairy calves and to observe the genomic diversity among the circulating viruses in the tarai area of uttarakhand. a total of 63 diarrheic fecal samples collected from the instructional dairy farm, nagla, pantnagar, uttarakhand were screened during the study. samples were collected from both cow calves and buffalo calves of 0-3 months of age. for the diagnosis of rotavirus, all the fecal samples were subjected to rna electrophoresis after nucleic acid extraction. viral genome segments were visualized by silver staining. out of the total 63 samples tested, seven were found positive in rna-page, showing the typical 11-segment genome migration pattern of bovine rotavirus. in the given samples the prevalence of bovine rotavirus was 11.32% and 10% in cow and buffalo calves, respectively. on the basis of the migration patterns in rna-page, group a rotaviruses were identified by the typical 4:2:3:2 pattern. variation in the mobility of various genome segments among isolates of bovine rotavirus was observed during the study, which may be indicative of the emergence of mutants among the circulating isolates. a vp6 gene based group a specific rt-pcr was standardized, and all the isolates in this area were confirmed to be of group a type.
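the prevalence figures reported above can be checked with a short calculation. the python sketch below is illustrative only: the abstract gives the total (7 positive of 63) and the two percentages, but not the per-species counts, so the split of 6/53 cow calves and 1/10 buffalo calves is an inference that happens to reproduce 11.32% and 10%, not a figure stated by the authors.

```python
def prevalence(positive, tested):
    """Prevalence as a percentage of tested samples."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * positive / tested


# Stated in the abstract: 7 positive out of 63 samples overall.
overall = prevalence(7, 63)

# Inferred (not stated) per-species counts consistent with the
# reported 11.32% and 10%: 6 + 1 = 7 positives, 53 + 10 = 63 samples.
cow = prevalence(6, 53)
buffalo = prevalence(1, 10)

print(f"overall: {overall:.2f}%")
print(f"cow calves: {cow:.2f}%")
print(f"buffalo calves: {buffalo:.2f}%")
```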
work is in progress to genotype the bovine rotaviruses of this region based on the vp7 and vp4 genes. this study emphasizes the need to explore the prevalence of bovine group a rotaviruses in different places of uttarakhand and to characterize them genetically, which could help in the selection of control strategies for rotavirus infections.
foot-and-mouth disease (fmd) is endemic in india, causing enormous economic loss to animal keepers and a trade embargo by fmd-free countries on livestock and animal products. rapid diagnosis of fmd is of immense importance in the prevention and control of the disease. fmd is initially diagnosed clinically and confirmed by laboratory tests. virus isolation in cell culture and sandwich elisa for antigen detection are commonly practiced in laboratories. virus isolation, though very sensitive, can be slow, and the analytical sensitivity of the elisa is lower and it cannot be used with certain sample types. the use of molecular techniques in the diagnostic laboratory has greatly increased the speed, specificity and sensitivity of fmd diagnostic tests. molecular techniques like rt-pcr, pcr-elisa and dot hybridization can be used with more success for detecting carrier animals and animals harboring sub-clinical infection, and can be applied to a wide range of clinical sample types. these techniques can be used as genus and serotype specific tests, including detection of particular lineages/genotypes within a serotype. multiplex pcr has been used to differentiate serotypes of fmdv, and the technique is sensitive, experimentally simpler, cost effective and less time consuming. the assay can be used for serotyping elisa-negative samples. the molecular techniques not only help in diagnosis but are also useful for epidemiological studies. lineage differentiating rt-pcr has been useful in identifying different lineages of serotype asia 1 (lineage b, c and d) before proceeding with sequencing of the 1d region.
similarly, genotype differentiating rt-pcr has been developed and used to differentiate two genotypes of serotype a (genotype vi and vii). these assays have the potential to be applied to clinical samples directly, thereby saving much of the time needed for sample processing and nucleotide sequencing. the recent development of real time rt-pcr methodology has allowed the diagnostic potential of molecular assays to be realised. advances in real time pcr technology have made it possible to combine several assays within a single tube, work on which is in progress in our laboratory. integration of these assays onto automated high throughput platforms provides diagnostic laboratories with the capability to test large numbers of samples. microarray technology has provided greater screening capabilities for pathogen detection. the microarray allows the addition of a large number of oligonucleotide probes for the identification of mutant pathogens and also for subtype determination. the combined properties of high sensitivity and specificity, low contamination risk and speed have made real time pcr and microarray technology a highly attractive alternative to conventional methods for increasing the percentage of outbreaks confirmed and analyzed and for tracing the origin of the fmd virus responsible for outbreaks.
dna vaccines are expected to elicit both humoral and cellular responses, the cellular response being long lasting. however, the approach has several limitations, such as poor stability of dna, poor expression and risk of integration. poor expression becomes the major limitation in the case of fmd, as fmdv proteins are poor immunogens. also, dna vaccine vectors carrying only eukaryotic promoters elicit a strong cmi response and a weak humoral response. the methodology to achieve a humoral response involves the expression and secretion of the expressed protein, so that antigen presenting cells are able to process the antigen and produce a humoral response.
in case of fmd humoral response is as important as cellular response. the present project aims at addressing these issues; achieving higher expression and getting the protein secreted out by constructing self replicating gene vaccines for fmd and studying their efficacy. the vector for humoral immune response contains eef1 promoter, sindbis virus polymerase gene and secretory and anchoring signals. the integrity of the vectors was confirmed by sequence analysis. the linked polyvalent protein genes of fmdv serotype a, o and asia 1 were cloned into the vectors and the presence of the insert was confirmed by restriction enzyme digestion. the functionality of the constructed dna vaccine vector (pvac self rep 990) was assayed by transfecting the dna into bhk 21 cell monolayer and studying the 35 s labeled proteins in immuno-precipitation assays. the studies showed high level of expression in case of constructed vector as compared to infected virus for the specific protein. the secretion of the expressed protein was assayed by immuno-fluorescence assay and found to be positive. encouraged with these studies the preliminary studies were conducted on vaccine efficacy studies in guinea pig model. the immunized guinea pigs showed high antibody titres by snt and elisa, as compared to conventional dna vaccines (pup3cd) even at 1/10th of the dose. this approach of constructing self replicating dna vaccine for humoral response is the first report. genetically engineered microorganisms are important sources of industrial and medicinal proteins. over the past decade, plant host system has been investigated as potential host system for expressing proteins of therapeutic and diagnostic use. however concerns regarding the stability and environmental safety need to be addressed. chloroplast engineering is expected to resolve some of these issues since, plastids/chloroplasts are inherited maternally and are not disseminated through pollen. 
this makes plastid transformation a valuable tool for creating transgenics, besides offering biological containment. foot and mouth disease (fmd) of cloven-footed animals is a major concern the world over. fmd is the most feared viral disease of cloven-footed animals, causing heavy losses to the livestock industry. the disease is enzootic in many parts of the world including asia. the conventional vaccines for fmd have several limitations, including safety, temperature sensitivity and duration of immunity. attempts have been made to overcome these limitations using recombinant dna technology. amongst the newer vaccines, edible vaccines are cost effective and easy to administer. since the stability of the gene of interest is the major concern in plant transgenics, marker genes are used for regular selection. detection methods based on the available marker proteins, such as β-glucuronidase (gus) protein or antibiotic selection, are cumbersome and cost intensive. however, selection based on herbicide resistance is much simpler. hence in the present study, the 5-enolpyruvylshikimate-3-phosphate synthase (epsp) gene was used as a marker along with the immunogen gene of fmdv. epsp is the key enzyme in the shikimate biosynthesis pathway necessary for aromatic amino acid production.
in order to investigate the mechanism of long-term immunity and the protective immunity induced by cationic plg microparticle coated dna vaccination, we constructed an expression plasmid containing the foot-and-mouth disease virus (fmdv) 1d gene of serotype a. intramuscular vaccination of guinea pigs with the microparticle coated plasmid dna induced a strong antibody response, neutralizing antibodies and a cell mediated immune response, which lasted 1 year. we further analyzed the persistence and expression of the 1d gene by polymerase chain reaction, reverse transcriptase polymerase chain reaction and quantitative pcr.
the results showed that the 1d gene was present and expressed in the muscle cells up to 1 year post vaccination. furthermore, guinea pigs vaccinated with microparticle coated plasmid dna were protected against challenge with fmdv. therefore, microparticle coated plasmid dna vaccination does induce protective immunity and long-term humoral and cellular immune responses against fmdv, which could be maintained by persistent expression of the 1d gene in muscle cells.
foot and mouth disease virus (fmdv) causes a highly contagious viral disease of cloven-hoofed animals, which has a considerable socioeconomic impact on the countries affected. interleukin-18 (il-18) enhances the il-12 driven th1 immune response that is important in immunity against intracellular pathogens. the multiple roles of il-18 in many physiological and pathological processes have generated a great deal of interest in recent years. antiviral effects of il-18 have been reported. we evaluated the effects of il-18 on the replication of fmdv in vitro in bhk-21 cells. the bovine il-18 mature protein coding sequence was amplified from bovine pbmc and cloned into the prokaryotic expression vector pet32a. the expressed protein was purified and its specificity was confirmed by immunoblotting. bhk-21 cells were treated with the purified il-18 protein (2 µg/ml) 4 h prior to fmdv infection. cell culture supernatants collected at 24 h post infection were subjected to elisa and virus titration assay. rna extracted from the cells was subjected to real time pcr for viral rna quantification. a 2 log reduction in fmd virus titer was observed in il-18 treated cells compared to untreated cells, whereas virus antigen quantified by elisa showed a 60-fold reduction. a 69-fold reduction in fmd viral rna copy number was observed in il-18 treated cells compared to untreated cells, as measured by qpcr.
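the il-18 results above mix two scales: a log10 titer reduction and fold reductions. the python sketch below puts them on a common footing; the function names are ours, and the comparison is purely arithmetic and says nothing about the differing sensitivities of the three assays.

```python
import math


def log10_reduction_to_fold(log_reduction):
    """Convert a log10 reduction into a fold reduction (2 log10 -> 100-fold)."""
    return 10.0 ** log_reduction


def fold_to_log10_reduction(fold):
    """Convert a fold reduction into a log10 reduction."""
    return math.log10(fold)


def percent_reduction(fold):
    """Percentage of the untreated level removed by a given fold reduction."""
    return 100.0 * (1.0 - 1.0 / fold)


# Values reported in the abstract.
readouts = {
    "virus titer (2 log10)": log10_reduction_to_fold(2),
    "antigen by elisa (60-fold)": 60.0,
    "viral rna by qpcr (69-fold)": 69.0,
}

for name, fold in readouts.items():
    print(
        f"{name}: {fold:.0f}-fold, "
        f"{fold_to_log10_reduction(fold):.2f} log10, "
        f"{percent_reduction(fold):.2f}% reduction"
    )
```

on this scale the 60-fold and 69-fold reductions correspond to roughly 1.8 log10, close to the 2 log10 drop in infectious titer.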
the current study demonstrated the potent antiviral activity of il-18 against fmdv through inhibition of viral replication. these results further suggest that il-18 has potential as a molecular adjuvant in fmd vaccine development and in the development of therapeutics for fmd.
foot and mouth disease is the most contagious viral disease of farm animals. control of the disease in animals is by vaccination and slaughter of infected animals. the conventional oil adjuvant vaccine has its own limitations. as an alternative, genetic vaccines, in which dna encoding the viral antigen is delivered, may be a promising approach. naked dna vaccine has limitations such as poor uptake of dna by cells and, more importantly, by the nucleus. as a result, delivery of dna through calcium phosphate nanoparticles was attempted. the calcium phosphate nanoparticle is a potential delivery agent that has been shown to enhance the immune response. nanoparticles entrapping the fmdv p1-3cd 'o' vaccine gene construct in pcdna3.1+ were prepared using different molarities of calcium chloride and disodium hydrogen orthophosphate. the nanoparticles entrapping fmdv p1-3cd 'o' and naked dna were administered to guinea pigs by intramuscular injection to study the mrna expression of the antigen by rt-pcr. animals were sacrificed at defined times to collect different organs, and total rna was extracted. each time, blood was collected to analyse fmdv specific serum antibodies. dna vaccines delivered through calcium phosphate produced transcripts in the injected muscle up to 240 days, whereas naked dna produced transcripts up to 120 days. animals given the naked dna vaccine showed serum antibody titres till 60 days, whereas nanoparticle injected animals showed serum antibody till 120 days. serum neutralization titres of 1.5 were observed with calcium phosphate dna vaccines at about 28-150 days, whereas naked dna sn titers were observed for a shorter period of 30-90 days.
the study clearly showed calcium phosphate nanoparticle entrapping fmdv vaccine dna may be a better delivery system for dna vaccines as it confirms availability of the antigen and persistence of antibody for longer duration than naked dna. capripox is highly infectious, contagious, and oie notifiable disease of small ruminants, caused by sheeppox and goatpox viruses which are members of capripoxvirus genus of the family poxviridae. in the present study, we analyzed the partial gene sequences of p32 protein, an immunogenic envelope protein of capripox viruses (capv) to assess the genetic relationship among different sheep pox and goat pox virus isolates from different geographical areas of the country. product of this gene has been shown to be important in attachment of capv to host cell surface receptors during viral entry and host immune response. the following virus isolates have been used in the analysis: gtpv-uttarkashi, p60, vaccine virus; gtpv mukteswar, p10, challenge virus; gtpv (akola), gtpv bareilly/00, gtpv ladakh/01 and gtpv sambalpur/82, field isolates and sppv srinagar, p40; sppv ranipet, p50; sppv-rf, p50, vaccine viruses and sppv makdhoom/07, sppv cirg/08, sppv pune/08, sppv bareilly, sppv 183/03 and sppv 125/02, field isolates. in this study, all virus isolates were confirmed by pcr amplification and analysed in pcr-restriction fragment length polymorphism (pcr-rflp) using ecori enzyme to confirm their specificity. further, the amplicons were cloned and sequenced commercially. nucleotide and the deduced amino acid (aa) sequences were compared with published sequences of the members of the genus capripox virus. sequence analysis of partial 172 bp sequence has shown high sequence identity among all indian sppv and gtpv isolates at both nt and aa levels. it revealed a 99.4-100% and 98.2 for gtpv field isolates where as, 100% for sppv field isolates at both the nt and aa levels. 
in general, capv isolates in this study showed 98.3-98.8 and 96.5% homology between gtpv and sppv at nt and aa levels, as reported earlier. further, the analysis revealed a unique change of g120a in all gtpv isolates, resulting in the formation of a drai site in place of ecori and the possible development of a restriction enzyme specific pcr-rflp for differentiation of sppv and gtpv in field isolates.
orf or contagious ecthyma is considered as non-contagious, proliferative disease and is caused by orf virus of the genus parapox virus of the family poxviridae. it is reported most commonly in sheep and goats and is also a zoonotic agent. camels are also infected by orf virus, which is reported in camel rearing countries as a mixed infection with camel pox, the latter being caused by an orthopox virus. in india, there are few reports of orf virus infection in camels, identified by clinical signs and pcr. in this study, we identified orf virus in clinical samples from a suspected case of sporadic infection in camels by serological and molecular techniques. viral dna isolated from processed scabs was used initially in a nested polymerase chain reaction as a diagnostic pcr, which successfully amplified a 235 bp fragment that was also sequenced to check the fidelity of the product. after confirming the infection by pcr, some of the structural and non-structural genes were amplified for sequence analysis. out of the five genes characterized, the most important one selected for sequence and phylogenetic analysis was the b2l gene, which is homologous to the major envelope protein p37k of vaccinia virus. the full open reading frame of 1137 bp of orf b2l was amplified by pcr, cloned and sequenced commercially. nucleotide and deduced amino acid sequences of b2l were compared with other published sequences of members of the genus parapox virus.
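the g120a change described above for gtpv isolates, which replaces an ecori site with a drai site, can be illustrated with a simple site scan. the python sketch below uses the standard recognition sequences (ecori: gaattc, drai: tttaaa), but the 15-base sequence is hypothetical and chosen only to show one context in which a single g to a substitution has this effect; it is not the p32 sequence from the study.

```python
# Standard recognition sequences for the two enzymes.
SITES = {"EcoRI": "GAATTC", "DraI": "TTTAAA"}


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def substitute(seq, pos, base):
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical fragment: a TTT run immediately upstream of an EcoRI site.
wild_type = "ACGTTTGAATTCGGA"
mutant = substitute(wild_type, 6, "A")  # G -> A at the first base of GAATTC

for label, seq in (("wild type", wild_type), ("mutant", mutant)):
    hits = {name: find_sites(seq, site) for name, site in SITES.items()}
    print(label, seq, hits)
```

in a pcr-rflp assay built on such a difference, digestion of the amplicon with one enzyme would cut only one of the two virus types, which is the basis for the differentiation proposed in the abstract.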
sequence analysis showed maximum percent identities of 94.8 and 95 (indian orf virus isolates); 94.7 and 94.5 (other orf isolates); 98.8 and 98.7 (orf-camel/jodhpur/08); 85 and 82.8 (bovine papular stomatitis virus); and 97.4 and 97.6 (pseudo cowpox virus) at the nt and aa levels, respectively. phylogenetic analysis of the isolate was also performed using the neighbour-joining method in the mega 4 program to determine the phylogenetic relatedness of the virus; it revealed that the isolate grouped with the jodhpur isolate and was closely related to pseudo cowpox virus. further analysis of other potential genes is warranted to confirm pseudo cowpox virus as the causative agent of contagious ecthyma in camels. chikungunya, an arboviral disease, is transmitted through the bite of an infected aedes mosquito. it causes a self-limited febrile illness along with arthralgia and myalgia. in some cases neurological and severe hemorrhagic manifestations have been observed. chikv epidemics have been reported in africa, india and south east asian countries, and during the current outbreak imported cases of chikv have been encountered in most european countries. the causative agent belongs to the genus alphavirus, family togaviridae. human beings serve as the chikungunya virus reservoir host during epidemic periods. outside these periods the main reservoirs are monkeys, rodents, birds and other unidentified vertebrates. antibodies to chikv have been detected in domestic animals. in the present study we surveyed madanapalli, palamaner, b. kotta kota and tirupati and collected a total of 67 rodent samples, 75 bovine samples, 20 sheep samples and 15 canine samples. total rna was isolated from all these samples and subjected to rt-pcr using the primer pair dvrchk-f/dvrchk-r, which amplifies a 330 bp e1 gene product specific to chikungunya virus (naresh kumar et al. 2007).
all the serum samples were further screened for chikv-specific igm antibodies using commercially available ctk biotech strips. none of the samples was positive for either chikv-specific rna or chikv-specific igm antibodies. more samples from domestic animals as well as rodents are being screened to study their possible role, if any, in the maintenance of chikv in nature and during inter-epidemic periods. the present study discusses these aspects in detail. petunia hybrida is widely used as an experimental host plant for begomovirus identification and characterization. hitherto, natural infection of petunia by begomovirus has not been reported in india. recently, petunia hybrida grown in and around ludhiana was found to show typical symptoms caused by begomovirus. the symptoms include severe reduction in leaf size, downward curling and distortion of leaves. severely infected plants became bushy and stunted and produced no flowers. total genomic dna was extracted by the ctab method from plants showing begomovirus symptoms. the presence of virus was confirmed using degenerate primers designed to identify all begomoviruses prevailing in the world. to identify the strain associated with the disease, the positive samples along with a healthy control were tested with strain-specific primers for the tomato leaf curl viruses so far reported in india, i.e. tomato leaf curl new delhi virus, tomato leaf curl palampur virus, tomato leaf curl bangalore virus, tomato leaf curl karnataka virus and tomato leaf curl gujarat virus. among these, only the tomato leaf curl new delhi virus-specific primer gave the desired amplicon of ~1180 bp. hence, it is confirmed that the leaf curl disease of petunia hybrida is associated with tomato leaf curl new delhi virus. this disease of petunia can become a severe production constraint in coming years.
over the last 2 years (2008 and 2009), some varieties of brinjal grown in the rainy season showed typical leaf curl symptoms. the symptoms include upward curling of the leaves, cupping, vein thickening, reduction in leaf size and distortion of leaves. severely infected plants remain stunted and bushy and become unproductive or produce only a few fruits. the disease was experimentally transmitted from naturally infected brinjal to healthy seedlings by whiteflies (bemisia tabaci) and by grafting, but not by mechanical or aphid transmission. to detect the associated begomovirus, total genomic dna was extracted from plants showing disease symptoms. the presence of virus was confirmed by pcr using the begomovirus genus-specific primers designed by deng et al., wyatt and brown, and rojas et al. these degenerate primers gave the expected product sizes of ~530, ~575 and ~1280 bp, respectively. the core coat protein (cp) gene and dna-b were also amplified from the samples using specific primers. to identify the strain associated with the leaf curl disease, the dna was tested by pcr with primers for the different indian tomato leaf curl virus strains, i.e. tomato leaf curl new delhi virus, tomato leaf curl palampur virus, tomato leaf curl bangalore virus, tomato leaf curl karnataka virus and tomato leaf curl gujarat virus. among these, only the tomato leaf curl new delhi virus primer gave the desired product size of ~1180 bp. therefore, it was confirmed that leaf curl disease of brinjal is caused by tomato leaf curl new delhi virus in association with satellite β-dna. to identify the strain associated with the disease, all samples were further tested by pcr with the specific primers designed to amplify all the tomato leaf curl virus strains so far reported from india, i.e. tomato leaf curl new delhi virus, tomato leaf curl palampur virus, tomato leaf curl bangalore virus, tomato leaf curl karnataka virus and tomato leaf curl gujarat virus.
among these, only the tomato leaf curl palampur virus-specific primer gave the expected product size of ~900 bp. this shows the association of tomato leaf curl palampur virus with leaf curl disease of calendula and marigold. thus, calendula and marigold can act as reservoirs for tomato leaf curl palampur virus, and the virus may cause a severe constraint in the production of these important ornamental plants. groundnut bud necrosis virus (gbnv) belongs to serogroup iv of the genus tospovirus in the family bunyaviridae and infects several economically important crops all over india. the nucleocapsid protein (np) encoded by the small rna of gbnv encapsidates the viral rnas. apart from this structural role, the np has also been implicated in replication, transcription, maturation and cell-to-cell movement. with a view to studying its structure and function, the np of a gbnv tomato isolate from karnataka was overexpressed in e. coli and purified by ni-nta chromatography. the purified np was present as a ribonucleoprotein complex and as a heterogeneous mixture containing monomers, tetramers and higher-order multimers. to determine the regions involved in oligomerization and nucleic acid binding, a mutational approach was taken. n- and c-terminal deletion clones (n20np, n40np, c15np and c37np) were generated, overexpressed in e. coli and purified by a procedure identical to that used for the wild-type protein. initial studies on oligomeric status suggested that, in addition to the n- and c-terminal regions, there may be additional regions or residues that contribute to multimerization of np. the amount of rna bound to the truncated proteins was reduced in n20np, n40np and c15np. interestingly, removal of 37 amino acid residues (a natively unfolded region) from the c terminus resulted in complete loss of nucleic acid binding, suggesting that the rna-binding domain is located in the c-terminal region of np.
further, np was observed to be phosphorylated in in vitro kinase assays by a kinase present in the soluble fraction of tobacco plant sap. both atp and gtp were utilized as phosphoryl donors and Mn2+ was the preferred metal ion, which suggests that np might be phosphorylated by a ck2-like protein kinase. phosphorylation studies with n- and c-terminal truncated proteins revealed that the site of phosphorylation lies within amino acid residues 40-239. by mass spectrometric analysis of the protein, threonine-84 and serine-202 were identified as possible phosphorylation sites. a naturally occurring virus isolate infecting gherkin (cucumis anguria l.), causing symptoms of mosaic, leaf distortion and dark green islands in the lamina, was identified in export cultivars of gherkin grown in commercial fields of kuppam rural, chittoor district, andhra pradesh. the virus infection was highly prevalent in the fields and caused considerable economic damage to the crop through yield losses and reduced quality of fruits meant for export. symptoms on infected fruit included blistering and malformation. virus-infected leaf samples were collected, and initial host range tests conducted with different cucurbit species showed that the host range included propagation hosts such as cucumis anguria (gherkin), cucumis sativus, cucurbita pepo, cucumis melo, lagenaria vulgaris and momordica charantia, and a local assay host, chenopodium amaranticolor. the host range was restricted to cucurbit species and chenopodium. the virus was maintained for further studies on cucurbita pepo by sap (mechanical) inoculation. the virus induced mosaic and vein-clearing symptoms on pumpkin. electron microscopy of leaf dip preparations from symptomatic pumpkin leaves, stained with 2% uranyl acetate, revealed long flexuous filamentous particles measuring 750 × 12 nm.
the virus reacted positively with polyclonal antisera to papaya ringspot virus-w, potato virus y and tobacco etch virus, and reacted strongly with the polyclonal antiserum to zucchini yellow mosaic virus, in direct antigen coated-enzyme linked immunosorbent assay (dac-elisa). because of the very strong reaction with the zucchini yellow mosaic virus antiserum, we amplified the partial nib and cp genes of the virus along with the 3′ utr using two primers, zy2 (5′-gctccatacatagctgagacagc-3′) and zy3 (5′-taggctttttgcaaacggagtctaatc-3′). total rna from infected gherkin leaves was isolated using trizol ls reagent (sigma). rt-pcr was performed to obtain an amplicon of ~1.2 kbp, which was cloned into the fermentas ptz57r/t vector and sequenced at mwg biotech, bangalore. sequence analysis revealed that the virus was an isolate of zucchini yellow mosaic virus showing 98% homology to the zucchini yellow mosaic virus strain b genome (ay188994) and the zucchini yellow mosaic virus nat genome (ef062582), both strains reported from israel. the sequence from the present study was submitted to genbank (gq482976). the results raise the suspicion that the virus could have been introduced into india through an infected source brought in by the commercial israel-based companies, owing to poor quarantine regulations, as gherkin cultivation in these regions is chiefly supported, purchased, exported and marketed by these private companies based in israel. this is the first report on molecular characterisation of zucchini yellow mosaic virus infecting cucumis anguria (gherkin) from india. they also exhibited synergism with other viruses, which was region specific. fifty percent of the total symptomatic plant population was found to be positive only for carlavirus, while the remainder showed mixed infection of carlavirus with tospovirus in some regions and with cmv in others.
infection by carlavirus alone, without association with tospo, cmv, poty or tobamo viruses, was also observed in some fields at an incidence of up to 10-20%. avijit tarafdar, raju ghosh, k. k. biswas plant virology unit, division of plant pathology, indian agricultural research institute, new delhi 110012 citrus tristeza virus (ctv), a brown citrus aphid (toxoptera citricidus)-transmitted closterovirus of the family closteroviridae, is one of the major limiting factors in citrus cultivation worldwide. ctv is the longest known plant virus, having flexuous particles 2000 × 11 nm in size. the ctv genome is a positive-sense ssrna of about 20 kb containing 13 open reading frames (orfs) encoding 17 proteins. several biological as well as genetic variants of ctv have been reported in all citrus-growing countries. ctv has caused the decline and death of millions of citrus trees worldwide. in india, ctv is a century-old problem and has killed an estimated one million citrus trees to date. at the molecular and genetic level, ctv isolates from india have not been fully characterized, and their genetic diversity and sequence divergence are not fully established. further, evidence of recombination and the causes of evolution of ctv variants in india have not been studied to date. therefore, in the present study, an effort was made to characterize several indian ctv isolates at the genetic level, examine their genetic diversity, identify recombination events and analyze the evolution of divergent ctv. a total of 73 ctv isolates from different regions of india (35 from darjeeling hills, five from bangalore, 15 from delhi and 18 from vidarbha) were included in the genetic study. two genomic regions of ctv, i.e., the entire cp gene (cpg) (672 nt) and a fragment of the 5′ orf1a (orf1a) (404 nt), were amplified, cloned and sequenced, and the nucleotide sequences were analyzed.
based on cpg, the indian isolates shared 88-99% nucleotide identity among themselves, and based on orf1a they shared 82-99% identity. incongruence of phylogenetic relationships was observed, as sequence analysis generated five phylogenetic clades based on cpg and eight clades based on orf1a, suggesting that recombination events have occurred between the sequences of indian ctv isolates. thus, to identify potential recombination events and determine the parental sequences in ctv isolates, six recombination-detecting algorithms, namely rdp, geneconv, bootscan, maxchi, chimera and siscan, were used. of the 73 indian ctv isolates, the cpg of 18 and the orf1a of 47 showed recombination events, suggesting that orf1a is more prone to rna recombination than cpg. these findings indicate that the high degree of genetic diversity and the incongruent relationships of indian ctv isolates are due to genetic recombination, which may be an important factor driving the evolution of ctv variants in india; this was also supported by a splitstree decomposition analysis. b. v. bhaskara reddy, y. sivaprasad, k. rekha rani, k. raja reddy department of plant pathology, regional agricultural research station, acharya n.g. ranga agricultural university, tirupati, andhra pradesh sunflower (helianthus annuus l.) is one of the most important oilseed crops in the world, ranking third in area after soybean and groundnut. sunflower necrosis disease (snd) is characterized by necrosis of leaves, necrotic streaks on petioles, stem and floral parts, and stunted growth. the causal agent of the disease has been identified as tobacco streak virus (tsv), which belongs to the genus ilarvirus of the family bromoviridae. suspected tsv-infected sunflower samples collected from chittoor district in andhra pradesh tested positive for tsv by dac-elisa. total rna was extracted from sunflower using the rneasy isolation kit (qiagen).
the tsv coat protein (cp), movement protein (mp) and replicase (rep) genes were amplified by rt-pcr with specific primers, cloned in the ptz57r/t vector, sequenced and deposited in genbank (gu355899, gu355900 and gu371445). the cloned cp gene was 717 bp and codes for 239 amino acids. cp gene sequence analysis revealed that tsv-tpt infecting sunflower has 98-100% homology at the nucleotide level with the soybean, tagietus-tpt and okra-tn isolates and 93-99% homology at the amino acid level. the movement protein gene was 615 bp and codes for 205 amino acids. mp gene sequence analysis showed 94-97% homology at the nucleotide level and 92-95% at the amino acid level. chilli (capsicum annuum), an important commercial vegetable and spice crop of himachal pradesh, is affected by several viral diseases; cucumo, tospo, poty and gemini viruses are the most common genera. however, these viruses have not been clearly identified or fully characterized, which is needed first to formulate a management strategy. therefore, in the present study, an effort was made to identify and characterize the important viruses causing diseases in chilli. several farms in the major chilli-growing areas of bilaspur and kangra districts in himachal pradesh were surveyed and infected plant samples were collected randomly. virus infection in these samples was detected by das-elisa using antisera to cucumber mosaic virus (cmv) and potyvirus (group specific) and by slot-blot hybridization (sbh) using cmv, iris severe mosaic poty virus (ismv), tomato spotted wilt tospo virus (tswv) and chilli leaf curl gemini virus (clcuv). based on das-elisa and sbh, disease incidence was estimated to range from 18.2 to 21.8% for cmv and from 3.5 to 5.4% for potyvirus. to detect tospo and geminivirus in infected chilli, the sbh test was carried out. infected samples showing maximum virus titer in both the das-elisa and sbh tests were further confirmed by pcr using specific primers.
amplicons of the desired sizes, ~540 bp, ~800 bp, ~570 bp and ~460 bp for cmv, poty, gemini and tospo viruses, respectively, were obtained. as the present study clearly indicated that cmv is the major virus infecting chilli in the hilly region of himachal pradesh, two isolates of cmv were characterized at the genetic level. the amplified products (~540 bp) of cmv isolates palampur 1 and palampur 2 were cloned in the pgemt cloning vector and sequenced, and the sequences were submitted to the ncbi database (palampur 1: acc-fm209497 and palampur 2: acc-fm209498). the sequences were then analyzed and compared with other sequences available in the database. sequence analysis showed that the present cmv isolates shared 99% nucleotide identity with each other and are closely related to the australian cmv isolate cmv-ly (acc-af198103), with 98% nucleotide identity. in phylogenetic tree analysis, the indian cmv isolates formed the same cluster as cmv-ly. as cmv-ly is known to belong to cmv subgroup ii, it is concluded that the cmvs of this hilly region of himachal pradesh belong to subgroup ii. chilli is essentially a crop of the tropics and grows better in hotter regions. chilli (capsicum annuum), a member of the family solanaceae, is a vegetable and spice crop of immense commercial importance. the pungency of pepper is due to an alkaloid known as capsaicin, and peppers are characterized as sweet, hot or mild depending on capsaicin content. the present investigation was conducted to identify highly resistant cultivars of capsicum annuum against cmv and tylcv among ten chilli cultivars under the agroclimatic conditions of aligarh. the highest percentages of infection (70 and 80) were observed in hc-201 and kalyanpur type-1, which showed a positive reaction to cmv by elisa. no symptoms were recorded in bc-16, lca-235 and jca-154, which showed a negative reaction to cmv by elisa.
bc-16 and lca-235 also showed a negative reaction to tylcv by elisa and were symptomless. maximum infection (70 and 80) was registered in the hc-201 and c 8 cultivars. thus, bc-16, lca-235 and jca-154 proved to be highly resistant varieties against cmv and tylcv and may be used in breeding programmes against viruses. cotton leaf curl virus belongs to the family geminiviridae, genus begomovirus. the members of this family contain circular single-stranded dna molecules as their genomes. there are two kinds of begomoviruses: bipartite viruses, with genomes consisting of two dna molecules designated dna-a and dna-b, and monopartite viruses, which contain only dna-a and no dna-b. in monopartite viruses, dna-a is accompanied by a small circular dna molecule called dna-β, which is essential for the development of typical disease symptoms. cotton leaf curl virus is a monopartite virus and causes cotton leaf curl disease, which has emerged as a major disease of cotton in the indian subcontinent. the non-structural protein ac4 of cotton leaf curl kokhran virus-dabawali isolate (clcukv-dab) was cloned into the pgex5x2 vector and overexpressed in bl21(de3)plyss e. coli cells. the overexpressed gst-ac4 protein was purified by glutathione sepharose chromatography. the purified gst-ac4 protein was found to possess atpase activity. the optimum temperature and ph for the activity were 37°c and 7.4, respectively. the atpase activity was inhibited in the presence of edta, showing that it is dependent on divalent metal ions. the activity was supported by magnesium, manganese and zinc ions but inhibited in the presence of calcium ions. it was also inhibited by the non-hydrolyzable atp analogue adenosine-β,γ-imido triphosphate and in the presence of other nucleotides such as ctp and gtp. the Km and Vmax of the reaction with atp as the substrate were 1.54 mM and 95.2 nmol/min/mg of protein, respectively. the enzyme could also utilize gtp as a substrate.
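the two kinetic constants reported for the atpase activity can be read through the michaelis-menten equation, v = Vmax·[S]/(Km + [S]). the python sketch below is only an illustration: it assumes simple michaelis-menten behaviour and that the reported constant of 1.54 is in millimolar, and the function name and the example atp concentrations are hypothetical.

```python
def michaelis_menten_rate(substrate_mM: float,
                          vmax: float = 95.2,
                          km_mM: float = 1.54) -> float:
    """Initial rate (nmol/min/mg protein) at a given ATP concentration (mM).

    Defaults are the values reported in the abstract; simple
    Michaelis-Menten kinetics is an assumption of this sketch.
    """
    if substrate_mM < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# At [ATP] = Km the rate is half of Vmax by definition.
print(round(michaelis_menten_rate(1.54), 1))  # 47.6

# Hypothetical ATP concentrations showing the approach to saturation.
for atp_mM in (0.5, 5.0, 20.0):
    print(atp_mM, round(michaelis_menten_rate(atp_mM), 1))
```

with these constants the enzyme runs at half its maximal rate, about 47.6 nmol/min/mg, when atp is present at 1.54 mM.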
that ac4 is specifically an ntpase and not a general phosphatase is shown by the finding that it does not hydrolyze p-nitrophenyl phosphate to yield a yellow colour, while a similar reaction carried out in parallel with alkaline phosphatase readily yields the colour. it has been suggested earlier that ac4 may be involved in cell-to-cell movement of the virus (rojas et al. 2001). it is possible that, through its ability to hydrolyze atp, ac4 serves to power viral movement in the plant. thirteen sugarcane yellow leaf virus (scylv) isolates causing yellow midrib and irregular yellow spot patterns, from six states of india, were characterized by rt-pcr assays. scylv-615f and scylv-615r were used as the forward and reverse primers, and the amplified products were cloned and sequenced. comparative coat protein sequence analysis showed that all the indian scylv isolates clustered into two major groups, confirming the existence of two strains of scylv affecting sugarcane crops in india. in a separate experiment, members of both phylogenetic groups were found to be transmitted by the sugarcane aphid, melanaphis sacchari, from infected to healthy sugarcane, suggesting secondary spread in nature. the symptoms produced by the virus causing cotton mosaic disease differed slightly between sap inoculation and natural field conditions. under natural field conditions it produced clear chlorosis on the major leaves of plants, whereas in sap-inoculated plants veinal chlorosis and mosaic symptoms were common. under field conditions infected plants grow erect and form fewer bolls. no effect was found on seed shape or seed size. the initial symptoms produced on cotton leaves after inoculation were striking. local lesions were observed in the second week after inoculation and then changed to chlorotic, and in some cases necrotic, symptoms.
plants affected at an early stage show less lateral branch development and hence reduced yield. naturally infected field plants showing clear symptoms are also difficult to identify at later stages of plant growth, because the symptoms disappear with time. the virus is very easily sap transmissible. it is transmitted by thrips palmi and thrips tabaci in a persistent manner. no seed transmission was observed. the virus showed the same physical properties as it shows in stem necrosis of peanut or sunflower necrosis disease: thermal inactivation point (tip) 55-60°c, dilution end point (dep) 10^-2 to 10^-3 and longevity in vitro (liv) 5 h. nineteen different host plants of the virus were identified, belonging to five families, viz. malvaceae, chenopodiaceae, compositae, leguminaceae and solanaceae. however, they produced the same types of symptoms as in most of the hosts tested before. in elisa, the virus tested positive only with antiserum to tsv of cowpea and cotton and negative with antiserum to pbnv of cowpea and cotton, which clearly ruled out the presence of pbnv in cotton producing these kinds of symptoms; the elisa results clearly show that tsv antiserum of cowpea gives positive results with clear chlorotic symptoms. a powerful approach to functional genomics, and an alternative to the massive generation of transgenic plants, is the use of the recently described virus-induced gene silencing (vigs) process, which allows viral vectors to knock out the function of a gene of interest. vigs is based on a silencing mechanism that regulates gene expression by the specific degradation of rna. as a tool for reverse genetics, vigs has many advantages over other common ways of studying gene function because of the ability of viruses to replicate and move systemically within a plant.
vigs can generate a phenocopy of a mutant without the difficulties of traditional mutagenesis methods. geminiviruses, with their small dna genomes and ease of inoculation through agrobacterium, are excellent candidates for vigs vector development. as a first step, the geminivirus bhendi yellow vein mosaic virus, characterized in our lab (jose and usha, virology 305: 310-317, 2003), was chosen. the satellite β dna associated with this virus has a single open reading frame (βc1). βc1 is essential for symptom development but not for replication. therefore, βc1 was replaced by a multiple cloning site harbouring sali, xbai, bamhi, bsrgi and xhoi sites, initially in a cloning vector and then in the binary vector containing the partial tandem repeat of the β dna. in place of the βc1 orf, the plant phytoene desaturase gene was cloned and the resulting construct was used for agroinfiltration along with the partial tandem repeat clone of the begomovirus (dna a component). chilli (capsicum annuum l.) plants exhibiting prominent begomovirus-like symptoms, such as leaf curl, vein swelling, shortening of petioles, crowding of leaves and stunting, were collected from roorkee, uttarakhand and dhaulpur, rajasthan, india. total genomic dna was isolated from naturally infected chilli samples and pcr was carried out with primers specific to the coat protein gene (located on dna-a). as expected for these primers, ~800 bp dna fragments were amplified from the infected chilli samples. to determine whether the virus isolates are bipartite, primers specific to the nuclear shuttle protein gene (located on dna-b) were employed, which also gave positive amplification of ~850 bp dna bands from all the samples that tested positive for coat protein.
to ascertain the association of a dna-β component with the virus isolates, a set of dna-β-specific primers was used, which gave positive amplification of full-length (~1.3 kb) dna bands from the chilli samples collected from roorkee, uttarakhand; however, bands of multiple sizes were obtained from the samples collected from dhaulpur, rajasthan. these findings confirmed that both virus isolates under study are bipartite begomoviruses associated with a dna-β satellite. sequencing of the pcr products is in progress and the analysis will be discussed. groundnut bud necrosis virus (gbnv), belonging to the genus tospovirus, a unique member of the family bunyaviridae, infects several economically important crops. the virus has three genomic ssrna segments, namely s (ambisense), m (ambisense) and l (negative sense). the s rna codes for the nucleoprotein (np) and the non-structural protein (nss) from the viral complementary and viral strands, respectively. many viral non-structural proteins, such as ns3 of hepatitis c virus, yellow fever virus and dengue virus, sv40 large t antigen and the cytoplasmic inclusion protein of tamarillo mosaic potyvirus, are known to exhibit rna/dna-stimulated ntpase, dntpase and helicase activity. nss of gbnv does not have sequence similarity with any of the above-mentioned viral rna/dna helicases but has an ntp-binding domain. however, it has been implicated as a suppressor of gene silencing in vivo. with a view to elucidating the mechanism by which nss could act as a suppressor of gene silencing and examining the other potential roles of nss in the life cycle of the virus, the gbnv (to) nss was overexpressed in e. coli and purified by ni-nta chromatography. in vitro studies with the purified rnss suggest that it exhibits rna-stimulated ntpase activity. many of the proteins that possess rna/dna-stimulated ntpase and datpase activity have also been shown to have atp-dependent nucleic acid unwinding activity.
it was therefore of interest to examine whether nss has nucleic acid unwinding activity. the helicase assays revealed that nss has dna/rna helicase activity. the helicase activity of nss was absolutely dependent on atp and Mg2+ ions. nss could unwind dsdna substrates with either a 5′ overhang or a 3′ overhang. mutation of the crucial lysine in walker motif a (k189) severely affected the unwinding activity, whereas mutation of the aspartate residue in walker motif b (d159) resulted in only a 20% loss of activity. in this regard, rnss is a unique enzyme that lacks the canonical helicase motifs but can catalyze dsdna/dsrna unwinding in an atp- and Mg2+-dependent manner. the rnss might act as a suppressor of gene silencing by unwinding dsrna, the substrate for dicer. in addition to being a suppressor of ptgs, nss may also regulate viral replication and transcription by modulating the secondary structure of the viral genome. this new finding on nss might pave the way for further studies on its role in viral replication and transcription. yellow vein mosaic disease of pumpkin (cucurbita moschata) poses a serious threat to the cultivation of this crop in india. the disease was found to be associated with whitefly-transmitted bipartite begomoviruses, which were detected in a varanasi field using polymerase chain reaction (pcr) with primers designed from the conserved coat protein region of begomovirus sequences in the ncbi database. all plant samples showing symptoms were infected with begomovirus. the virus species were provisionally identified by sequencing ~750 bp of the viral coat protein gene (av1). ageratum conyzoides is commonly known as billygoat-weed, chick weed, goatweed and whiteweed. in india it is popularly known as bill goat weed. it is an annual herbaceous plant with a long history of traditional medicinal use in several countries and is reputed to possess varied medicinal properties, including the treatment of wounds and burns.
in cameroon and congo, it is used traditionally to treat fever, rheumatism, headache and colic. during a survey in and around gorakhpur in 2009, ageratum plants were found with symptoms of leaf curling, mosaic mottling and leaf yellowing. the infected leaf samples were processed by pcr assays for virus identification and association. total dna was extracted and pcr was performed with begomovirus-specific primers (tlcv-cp). a ~800 bp band was consistently amplified, as seen on 1% agarose. the pcr products were directly sequenced and the sequence was submitted to genbank under accession no. gq412352. blast search analysis showed the highest similarity, 98%, with ageratum enation virus. vernonia cinerea leaves with yellow vein symptoms were collected around crop fields in madurai. a 550 bp product, amplified from total dna extracted from symptomatic leaves with degenerate primers designed to amplify part of the av1 gene of the begomoviral dna a component, was cloned and sequenced. based on these sequences, specific primers were designed and the full-length dna a of 2745 nucleotides, with the typical genome organization of begomoviral dna a, was obtained and submitted to the embl database (acc no: am182232). sequence comparison with other begomoviruses revealed the closest identity (83%) with emilia yellow vein virus from china and less than 80% with all other known begomoviruses. the international committee on taxonomy of viruses (ictv) has therefore recognized vernonia yellow vein virus (vyvv) as a distinct begomovirus species. conventional pcr could not amplify dna b or dna β from the infected tissue. however, the β dna (1364 bp) associated with the disease was obtained (acc no: fn435836) by the rolling circle amplification-restriction fragment length polymorphism (rca-rflp) method using phi29 dna polymerase.
sequence analysis shows that the dna β of vyvv has the highest identity (81%) with the dna β of ageratum leaf curl disease and 58-77% with the dna β associated with other begomoviruses. infectious clones of vyvv dna a and dna β were made as dimers using the products of rca-rflp. these infectious clones will be used for agroinfection of vernonia and the results will be discussed. this is the first report of the molecular characterization of vernonia yellow vein virus (vyvv) from vernonia cinerea in india.

production of the bulb and seed crops of onion (allium cepa l.) is hampered by onion yellow dwarf virus (oydv) and iris yellow spot virus (iysv), with an incidence of 83.22% and 89.97% in the bulb crop and 90.65% and 89.58% in the seed crop, respectively, in the popularly grown cv. hisar-2. four symptom-based variants of oydv, designated grade a, b, c and d, produced varied types of symptoms in the onion crop, incurring heavy losses in bulb and seed production. iysv caused tiny hay-coloured spots of different shapes and sizes on leaves and scapes, which later coalesced and led to drying and lodging of scapes. in the bulb crop, the plant height, bulb weight and bulb size were 37.7 cm, 75.5 g and 24.2 cm² in plants infected with oydv; 39.6 cm, 79.7 g and 25.5 cm² in plants infected with iysv; and 35.1 cm, 68.4 g and 22.1 cm² under combined infection, compared with 40.6 cm, 88.4 g and 27.6 cm², respectively, in healthy plants. in plants infected with oydv grade a, the plant height was the lowest (90.33 cm) whereas the number of umbels was the highest (9.20 umbels/plant), but the other yield parameters, viz. weight/umbel (2.32 g), number of seeds/umbel (209), seed weight/umbel (0.64 g) and seed yield/plant (5.88 g), were the lowest recorded. the smallest reductions in plant height (100.26 cm), weight/umbel (6.72 g), number of seeds/umbel (633), seed weight/umbel (2.36 g) and seed yield/plant (11.90 g) were recorded in oydv grade d.
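the bulb-crop means quoted above can be restated as percent reductions relative to healthy plants. the following is a minimal sketch in python, not part of the original abstract: the function and dictionary names are illustrative assumptions, and the only data used are the means reported in the text.

```python
# Percent reduction of a trait in infected plants relative to healthy controls.
# The numbers are the bulb-crop means quoted in the abstract; all names here
# (percent_reduction, reduction_table, HEALTHY, INFECTED) are illustrative.

def percent_reduction(healthy: float, infected: float) -> float:
    """Return the reduction of `infected` relative to `healthy`, in percent."""
    if healthy <= 0:
        raise ValueError("healthy reference value must be positive")
    return (healthy - infected) / healthy * 100.0


# Healthy-plant means: plant height, bulb weight, bulb size.
HEALTHY = {"plant height (cm)": 40.6, "bulb weight (g)": 88.4, "bulb size (cm2)": 27.6}

# Means for each infection type, in the same trait order as HEALTHY.
INFECTED = {
    "oydv": {"plant height (cm)": 37.7, "bulb weight (g)": 75.5, "bulb size (cm2)": 24.2},
    "iysv": {"plant height (cm)": 39.6, "bulb weight (g)": 79.7, "bulb size (cm2)": 25.5},
    "oydv + iysv": {"plant height (cm)": 35.1, "bulb weight (g)": 68.4, "bulb size (cm2)": 22.1},
}


def reduction_table(healthy: dict, infected: dict) -> dict:
    """Build {infection: {trait: percent reduction rounded to one decimal}}."""
    return {
        infection: {
            trait: round(percent_reduction(healthy[trait], value), 1)
            for trait, value in traits.items()
        }
        for infection, traits in infected.items()
    }


if __name__ == "__main__":
    for infection, traits in reduction_table(HEALTHY, INFECTED).items():
        for trait, reduction in traits.items():
            print(f"{infection:12s} {trait:18s} {reduction:5.1f}% lower than healthy")
```

expressed this way, the combined infection shows the largest loss for every bulb trait, which matches the ordering of the raw means in the text.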
the plant height was 98.84 cm with 5.10 umbels per plant, 4.24 g weight/umbel, 428 seeds/umbel, 1.25 g seed weight/umbel and 6.37 g seed yield/plant in iysv-infected plants. the plant height (96.26 cm), umbels/plant (5.97), weight/umbel (4.60 g), number of seeds/umbel (432), seed weight/umbel (1.42 g) and seed yield/plant (7.82 g) were found to be the lowest under combined infection with oydv and iysv, in comparison with the higher values in healthy controls (104.50 cm, 4.90, 7.84 g, 677, 2.60 g and 12.74 g, respectively). the smallest reduction in test weight, germination and seed vigour index (3.06 g, 75.68% and 926) was found with oydv grade a infection, whereas these were 2.92 g, 70.42% and 788 in iysv-infected plants and 2.62 g, 70.4% and 776 under combined infection with oydv and iysv, in comparison with 3.84 g, 88.67% and 1276 in healthy plants. the maximum hampering of seed vigour parameters was recorded due to iysv infection. lodging of scapes caused by this disease was responsible for heavy losses in seed production and seed quality.

cotton leaf curl disease is one of the major threats to cotton cultivation in northern india. a survey conducted during 2009 found disease incidence ranging from 70 to 90% in bhatinda, abohar, fazilka, sri ganganagar and hanumanghar. in order to study genetic variability in the virus, twelve clcuv isolates were partially characterized (700 bp common region, full-length av2 gene and partial sequences of the ac1 and av1 genes). full-length characterization of representative isolates from bhatinda, abohar, fazilka, sri ganganagar and hanumanghar is in progress. partial sequence analysis revealed that the clcuv isolates collected during the 2009 cropping season are closely related to cotton leaf curl burewala virus from pakistan, and the results were discussed.

pratibha singh, h. s.
savithri department of biochemistry, indian institute of science, bangalore

tospoviruses, belonging to the family bunyaviridae, infect economically important plants such as groundnut, tomato and watermelon. they have a tripartite genome, with l, m and s segments of rna, in pseudo-circular (panhandle) form. the viral genomes encode four structural proteins (l, n, g1 and g2) in the antisense orientation and two non-structural proteins, nss and nsm, in the sense orientation. nsm is the only protein unique to the tospoviruses, the plant-infecting members of the family bunyaviridae, and hence is proposed to be important for cell-to-cell movement. groundnut bud necrosis virus (gbnv), a member of the genus tospovirus, is the most prevalent virus infecting several species of leguminosae and solanaceae plants in india. total rna was isolated from gbnv-infected tomato leaves and rt-pcr was performed using appropriate primers to amplify the nsm gene. the pcr product was cloned in the pgex5x2 vector. the recombinant nsm clone was transformed into bl21 (de3) e. coli cells and over-expressed by induction with 0.3 mm iptg. sds-page analysis of induced and uninduced fractions revealed the presence of an overexpressed protein of the expected size. the soluble gst-nsm was purified by gsh sepharose affinity chromatography. purified gst-nsm was shown to interact with in vitro transcribed rna by electrophoretic mobility shift assay. further, nsm was shown to interact with the virus-encoded proteins np and nss using elisa and the yeast two-hybrid system. nsm was also shown to be phosphorylated in vitro by the pellet fraction of plant sap. thus the recombinant gbnv nsm possesses the characteristic features of a movement protein, such as nucleic acid binding, interaction with the nucleocapsid protein and the ability to undergo post-translational modification.

solanum melongena, commonly called eggplant, is one of the most important vegetable crops in the world.
it is cultivated widely in the tropical and subtropical regions. several viruses, such as cucumber mosaic cucumovirus (cmv), potato virus y (pvy), potato virus x (pvx) and tobacco ring spot virus (trsv), infect eggplant under natural conditions. in india, the major crop loss due to cmv infection in brinjal is 57% (fao stat-2008). in the present study, infected leaf samples were collected from local fields of ramapuram, chandamama palli, chandragiri, madanapalli, yadhamari and durgasamudram villages in and around tirupati and were tested for cmv infection by dac-elisa with cmv antisera. the positive samples were then inoculated onto raised brinjal seedlings of selected varieties by mechanical sap inoculation. different varieties of brinjal, namely mullabadhine, ankhur, ravya, mattigulla, casper and easter egg, were used for monitoring susceptibility to cmv infection. mosaic symptoms were observed 2 weeks after inoculation in all varieties of brinjal except mullabadhina. among the susceptible varieties, ankhur was selected to study induced biochemical changes in chlorophylls, carbohydrates, proteins, nucleic acids and polyphenol oxidases in cmv-infected brinjal leaves. in the infected leaves, a considerable reduction in chlorophyll and starch and an increase in total proteins, sugars, rna and polyphenol oxidases were observed when compared with healthy leaves. the amounts of total starch, protein and dna decreased to about 25, 136 and 645 µg/g, respectively, in infected leaves, whereas sugars (75 µg/g), rna content (754 µg/g) and polyphenol oxidase activity increased as compared with healthy leaves. the above results suggest that the concentrations of chlorophyll, proteins, nucleic acids and carbohydrates and the polyphenol oxidase activity in brinjal leaves are altered by cucumber mosaic cucumovirus infection.
leaf analysis is a widely accepted diagnostic tool for assessing the nutritional status of vegetables. the present study deals with these aspects in detail.

total rna and dna were isolated from infected leaf samples. rt-pcr assays were performed using sugarcane yellow leaf virus (scylv) specific primers (scylv-615f and scylv-615r). scylv infection was detected in all the collected samples, which showed an amplicon of the expected size (~610 bp) in rt-pcr. in another experiment using nested pcr with the phytoplasma universal primer pairs p1/p7 and fu3/ru5, a phytoplasma-characteristic 1.2 kb rdna pcr product was amplified from the dna of all infected samples but not from the healthy sugarcane plants tested. dna extracts from plants with yellow midrib and leaf yellows produced products of 1250 bp, which gave typical phytoplasma profiles when digested with haeiii and hhai. no pcr amplification was obtained using dna from symptomless plants. our results suggest that the midrib yellowing and leaf yellows symptoms on sugarcane varieties in the uttar pradesh and uttarakhand states of india are mainly caused by mixed infection with scylv and scylp. the affected clumps showed a reduction in stalk height as compared with healthy fields.

thirty-one sugarcane mosaic isolates belonging to sugarcane mosaic virus (scmv) and sugarcane streak mosaic virus (scsmv) were collected from china and india and confirmed by indirect elisa and rt-pcr amplification with scmv- and scsmv-specific primers. the amplicons (0.8 kb) from the coat protein (cp) coding region were cloned, sequenced and compared with each other as well as with the sequences in genbank of 15 scmv isolates from sugarcane (australia, usa, china, brazil, mexico and south africa) and maize (australia, china, iran) and one scsmv isolate from sugarcane (india).
maximum likelihood and maximum parsimony analyses robustly supported two major monophyletic groups that were correlated with the host of origin: the scmv subgroup, which included 18 isolates from china and only 13 isolates from india, and the scsmv subgroup, which contained all isolates from india. maize dwarf mosaic virus (mdmv) and johnsongrass mosaic virus (jgmv) were not detected in any of the samples tested. a strong correlation was observed between the sugarcane groups and the geographical origin of the scmv isolates. the 11 millable sugarcane samples from china contained a virus tentatively described as sorghum mosaic virus (srmv). three isolates from nine chewing canes in fujian, yunnan and guizhou provinces of china also contained srmv, and the other 12 samples, including five isolates from india, were found to be infected with scmv. no srmv infection has been detected in sugarcane mosaic samples from india. sequence comparisons and phylogenetic analysis indicated that srmv can be considered the most common and prevalent potyvirus infecting sugarcane in china, whereas in india sugarcane streak mosaic virus is the dominant cause of mosaic symptoms on sugarcane.

a dig-labeled dna probe complementary to the coat protein (cp) region of a sunflower isolate of tobacco streak virus (tsv) was designed for the sensitive and broad-spectrum detection of isolates of tsv, the most devastating virus in india. dot-blot and tissue-print hybridizations with the digoxigenin-labeled probe were performed for tsv detection at the field level. here, dot-blot hybridization was used to test a large number of tsv isolates with a single probe and to compare sensitivity across different sample extraction methods. the probe, prepared from the conserved cp region of a sunflower pcr amplicon, hybridized with tsv field isolates from gherkin, pumpkin, sunflower, marigold and globe amaranth samples because the cp region is highly conserved with little variability.
sensitivity decreased from total nucleic acid preparations to partially purified and crude extract preparations. in particular, tissue-blot hybridization is as simple and reliable as dot-blot hybridization but requires no sample processing. because sample preparation is minimal, tissue-print hybridization could be an important component of tsv management programs. thus, the above non-radioactive labeled probe techniques can facilitate the screening of samples during tsv outbreaks and in quarantine services.

savita patil, rupali sawant*, k. banerjee virology group, agharkar research institute, macs, g.g. agarkar road, pune 411 004

two mycobacterium smegmatis strains (ari lab nos. v842 and v946) were employed for the isolation of mycobacteriophages from soil and sewage samples. mycobacteriophages against m. smegmatis strain v842 were isolated from soil samples collected from an area surrounding the tuberculosis (tb) ward, naidu hospital, pune. these were numbered v942, v943 and v944 and were isolated using the washed-cell preparation method. the bacteriophages against the other m. smegmatis strain, v946, were isolated from soil samples collected from around the tb ward, sassoon hospital, pune. some of these phages (viz. v953, v954) showed plaques at 42°c but not at 37°c; thus they seem to be lysogenic. for propagating and increasing the titre of all the above isolates, various previously described methods were attempted, but none was satisfactory. propagation was successful, however, when siliconized glassware and plastic-ware were used. we showed that siliconization of glassware and plastic-ware was essential for the propagation of our mycobacteriophage isolates v951, v952, v953, v954 and v955. also, the phage dilution medium (pdm) described by chaterjee et al. (2000) was found to be effective for picking the plaques made by the phages. in this way, the phage isolates were propagated up to p3.
the various passages of the phage isolates v951, v952, v953, v954 and v955 (i.e. original, p1, p2 and p3) were stored at -80°c.

pvp-29 effect on pigments due to geminivirus infection on cowpea (vigna unguiculata)

shail pande*, naveen pandey, k. shukla mahatma gandhi p. g. college gorakhpur, d.d.u. gorakhpur university, gorakhpur

geminiviruses are one of the most important groups of viruses causing economic losses in the tropics. the symptom produced is yellowing of leaves, which directly affects the pigments of diseased plants and in turn affects their productivity and yield. cowpea (vigna unguiculata) is an important crop cultivated throughout india for its green pods, which are used as vegetables, and its seeds, which are used as a pulse. cowpea is affected by many viruses, among which geminiviruses are some of the most important. in the present study, total chlorophyll content was measured in leaves of diseased and healthy cowpea plants using arnon's method. carotenoids were measured using ikan's method. chlorophyll content was lower in diseased plants than in healthy plants, and similar results were found for carotenoids. geminivirus infection therefore lowers the chlorophyll and carotenoid content of diseased plants, which reduces the yield of diseased cowpea plants.

shweta sharma 1 , amrita banerjee 2 , j. tarafdar 2 , r. rabindran 3 , indranil dasgupta 1 * 1 department of plant molecular biology, university of delhi, south campus, new delhi; 2 bidhan chandra krishi vishwavidayalaya, kalyani, nadia, west bengal 741235; 3 tamil nadu agricultural university, coimbatore, tamil nadu 641003

rice tungro disease is an important disease of rice in south and southeast asia, caused by joint infection with two viruses: rice tungro spherical virus (rtsv) and rice tungro bacilliform virus (rtbv). the complex of rtbv and rtsv is transmitted by an insect vector, the green leafhopper (glh).
previously we reported the complete genomic sequences of two geographically distinct isolates of rtbv, rtbv-wb (west bengal) and rtbv-ap (andhra pradesh), collected from the field in the mid-1990s. both sequences showed high homology all along the genome but diverged from the previously reported southeast asian isolate, rtbv-phil (philippines). to check whether a period of a decade has resulted in variability in the genomic sequences of different isolates of rtbv in india, we cloned and sequenced the complete genome of rtbv from two geographically distinct regions of india, west bengal and kanyakumari, collected from the field in 2008. the complete nucleotide sequence of the dna fragments covering the whole genome of rtbv was determined using the universal primers m13f and m13r and by primer walking, without any ambiguities remaining. the nucleotide sequences of overlapping clones were assembled and analyzed using the dna analysis software generunner and the blastn program of ncbi. homology searches at the nucleotide and amino acid levels were performed using the blastn and blastp programs of ncbi, respectively. multiple sequence alignments were performed using clustal w software. the sequence analysis showed that both of the recently obtained complete genomic sequences of rtbv, from west bengal and kanyakumari, had very high homology (at both the nucleotide and amino acid levels) all along the genome with the two previously reported rtbv isolates from india, rtbv-wb (west bengal) and rtbv-ap (andhra pradesh). as observed earlier, both sequences diverged significantly from the southeast asian isolates. this suggests that, despite the spatial and temporal difference (a gap of approximately 10 years) between the two previously reported rtbv isolates and the recently obtained ones, there is very little sequence variability between them.
this further strengthens the earlier reports that the rtbv genomes in india are highly conserved. homology search at the nucleotide level using the blastn program with the previously existing rtbv isolates revealed a very high identity of 99% with the rtbv west bengal isolate and 95% with the rtbv andhra pradesh isolate. this further strengthens the earlier reports that there is not much genetic variability in the rtbv genomes in the indian subcontinent.

complete genomic rna sequences of two geographically distinct isolates of rice tungro spherical virus (rtsv), a member of the genus waikavirus, family sequiviridae, were determined from india. of the two previously reported sequences, the indian isolates were closer to the resistance-breaking strain rtsv-[vt6] than to rtsv-[phila]. between them, the indian sequences showed nucleotide as well as amino acid identities of 96%. moderate homology was observed between the leader peptide and a putative helper component protein involved in insect transmission of maize chlorotic dwarf virus, a closely related waikavirus, indicating a possible transmission-related function. unlike rice tungro bacilliform virus, which causes rice tungro disease jointly with rtsv and differs significantly between isolates from india and the philippines, rtsv genomes were observed to be much more conserved between isolates from the two countries.

rice tungro spherical virus (rtsv) and rice tungro bacilliform virus (rtbv) are believed to be the joint causative agents of the devastating tungro disease of rice prevalent in south and southeast asia [11]. rice tungro disease has become the major cause of production losses in rice during the last three decades in several rice-growing states of india. here we report, for the first time, the complete sequence analysis of two geographically distinct indian isolates of rtsv.
we analyze the deduced protein sequences and their phylogenetic relationship with the two complete rtsv sequences from the philippines as well as with other members of the family sequiviridae. we provide molecular evidence that the indian isolates of rtsv are closely related to those from the philippines. we had earlier reported that rtbv isolates from india and the philippines differ significantly from each other [18]. this study was undertaken to see whether rtsv isolates from india show a similar difference from those reported from the philippines.

frequent outbreaks of tungro have been reported near kanyakumari in the last 2-3 years. the present work was undertaken to clone and sequence the full-length rtbv and rtsv genomes from infected rice plants collected from this region and to analyze the similarity of their genetic material to that of the existing indian isolates of rtbv and rtsv. a 1.1 kb dna fragment encoding the reverse transcriptase gene of the rtbv genome was amplified, cloned in a t/a vector and sequenced commercially. homology search at the nucleotide level using the blastn program with the previously existing rtbv isolates revealed a very high identity of 99% with the rtbv west bengal isolate and 95% with the rtbv andhra pradesh isolate. this further strengthens the earlier reports that there is not much genetic variability in the rtbv genomes in the indian subcontinent. similarly, the cp3 region of rtsv was amplified by rt-pcr and cloned in a t/a vector.

recently, rice tungro disease has been reported from the kanyakumari district of tamil nadu. it is important to determine the genetic nature of this isolate in order to develop resistance strategies. it is thus necessary to clone and characterize the viruses from kanyakumari and to determine the mechanism of virus resistance in transgenic lines.

rice tungro disease is an important viral disease of rice.
rice tungro is caused by infection with two viruses: rice tungro bacilliform virus (rtbv) and rice tungro spherical virus (rtsv). rtsv is a plant picornavirus with a 12 kb single-stranded rna genome. it belongs to the genus waikavirus in the family sequiviridae and is necessary for transmission of the two viruses by the leafhopper vector nephotettix virescens. rtsv rna is translated to form a large polyprotein, which is then self-cleaved to form the viral proteins, including the three coat proteins, the replicase and the protease. studies have been conducted on rtsv from the philippines. information on rtsv from india is essential, both to check whether different geographical conditions, such as those present in india, select for genotypically variable strains and to design a transgenic resistance strategy. the objective of this study was to clone rtsv isolates from india and to compare the genetic diversity of the indian isolates with other southeast asian isolates and with each other. a further aim was to develop a strategy to impair the attack of the virus complex on rice. to achieve this, the complete genomes of two isolates from india were cloned by amplifying different genes by rt-pcr and cloning them in ta vectors, followed by sequencing. subsequently, constructs containing cp1-3, antisense replicase, sense replicase and double-stranded replicase were cloned in a plant transformation vector. these constructs were used to transform the aromatic indian rice variety pusa basmati (pb1). pcr analysis of the resulting plants was done to check for stable integration of the insert in the transgenics.

jatropha (jatropha curcas), of the family euphorbiaceae, is being grown in india as a major commercial fuel (bio-diesel) crop. jatropha is cultivated in 200 districts of 19 potential states of india. unfortunately, the cultivation of jatropha is limited by a severe mosaic disease.
recently, a severe mosaic disease with significant incidence was observed in 2006-2009 on j. curcas grown in experimental plots of nbri and on j. gossypifolia, a weed growing along roadsides around lucknow and kathaupahadi, madhya pradesh. the symptoms consisted of severe mosaic, blistering, leaf distortion and stunting of the whole plant, with no fruit/seed production in severely affected plants. the symptomatology and the whitefly population observed on the plants suggested begomovirus infection. to detect begomovirus infection, total dna was extracted from leaf samples of infected jatropha plants and polymerase chain reaction (pcr) was performed using three sets of begomovirus genus-specific primers (cpit-i/cpit-t, paliv 1978/paric 496 and paliv 722/palic 1960). amplicons of the expected sizes (~800 bp, 1.2 kb and 1.2 kb) were obtained, which confirmed begomovirus infection. further, to identify the begomovirus(es) and investigate any genetic diversity among them, the ~1.2 kb amplicons were cloned and sequenced. the sequence data were deposited in the genbank database under accession nos. gq847545 and fj346232 (from j. curcas) and eu727086 and fj177030 (from j. gossypifolia). in blast analysis, gq847545 and fj346232 shared the highest sequence identity (95%) with each other and 84-88% with sri lankan cassava mosaic virus (aj579307, aj607394, aj890225, aj890229 and aj890224) and indian cassava mosaic virus from india (ay738105), and were therefore designated as two strains of jatropha mosaic india virus-lucknow. blast analysis of eu727086 showed a maximum of 93% similarity with croton yellow vein mosaic virus (aj507777), 82% with tomato leaf curl new delhi virus (dq629102) and 79-80% with papaya leaf curl virus (aj436992 and y15934); it was therefore identified as a strain of croton yellow vein mosaic virus.
blast analysis of the virus isolate fj177030 showed the highest identity (83%) with tomato leaf curl virus-bangalore ii (tolcv-b ii, u38239) and 81-82% with tomato leaf curl karnataka virus (tolckv; ay754812, fj514798); it was therefore considered a new begomovirus species, ''jatropha yellow mosaic india virus''. phylogenetic analysis of gq847545 and fj346232 (from j. curcas) and eu727086 and fj177030 (from j. gossypifolia) was performed along with selected begomovirus isolates that showed >90% sequence identity in blast analysis. the isolate eu727086 showed the closest relationship with croton yellow vein mosaic virus, while fj177030 showed separate clustering of all the four begomovirus from jatropha species. in the phylogenetic analysis these isolates formed three separate clusters and were therefore considered three distinct begomoviruses. the above data clearly show that some genetic diversity exists among the begomoviruses infecting jatropha species in india.

bitter gourd (momordica charantia l.), of the family cucurbitaceae and also known as bitter melon, is extensively cultivated in the north-eastern region of uttar pradesh, india. it is regarded as one of the world's major vegetable crops and has great economic importance. a severe yellow mosaic disease of bitter gourd with significant incidence was observed during a survey of different locations in eastern up, india, in 2007. a whitefly (bemisia tabaci) population was also observed in the vicinity. the characteristic disease symptoms and the whitefly population indicated the possibility of begomovirus infection. total dna was isolated from infected as well as healthy leaf samples. two primer pairs (tlcv-cp and roja's primer) were used, yielding ~800 bp amplicons with tlcv-cp in 3/3 samples and ~1.3 kb amplicons with roja's primer in 3/4 samples. for further identification of the begomovirus, the pcr amplicons were cloned and sequenced (genbank accession no.
eu439260 and eu888908, respectively). blastn search analysis of eu439260 indicated 95-99% identity with several isolates of tomato leaf curl new delhi virus (tolcndv). phylogenetic analysis also showed the closest relationship of the isolate (eu439260) with tolcndv isolates. based on the highest sequence identity and close relationship with tolcndv, the virus isolated from bitter gourd was considered an isolate of tomato leaf curl new delhi virus. in blastn search analysis, the eu888908 isolate shared the highest identity (97-99%) with pepper leaf curl bangladesh virus (peplcbv) isolates. phylogenetic analysis of the virus isolate with selected begomovirus isolates revealed the closest relationship with peplcbv. these results confirmed the association of peplcbv with bitter gourd. the study revealed the variability of viruses on bitter gourd in eastern up, india.

a groundnut isolate of tobacco streak virus was characterized biologically using six cultivars (jl24, tmv2, k6, k7, k9) and one pre-release culture (k1271), with seedlings 7-84 days old, under glasshouse conditions. clear differences were observed among the cultivars tested regarding incubation period, percent seedling wilt and time to death of seedlings. k-7 was the least susceptible of all the cultivars tested and supported the lowest virus titer (a405 nm: 0.11-1.23). both localized symptoms (necrotic lesions on leaves, veinal necrosis, leaf yellowing, wilting) and systemic symptoms (petiole necrosis, necrotic lesions on young leaves, death of top growing buds not only on the main stem but also on all primaries (side shoots), followed by stem necrosis, stunted growth, axillary shoot proliferation with small leaves showing general chlorosis, peg necrosis, pod necrosis, pod size reduction and wilting of plants) were observed in all cultivars tested. biological differentiation of tsv and gbnv was made by sap inoculation of the two viruses separately on the susceptible groundnut cultivar jl24 under glasshouse conditions.
certain similarities and differences were observed between these viruses infecting groundnut. seed infection by tsv ranged from 18.9 to 28.9% in seeds collected from naturally infected and sap-inoculated groundnut cultivars/pre-releases (jl24, tmv2, k-6, k-7, k-9 and k-1271) belonging to the spanish and virginia types. tsv was detected in both the pod shell and the seed testa of pod samples produced by sap inoculation under glasshouse conditions. however, seed transmission of tsv was not observed in groundnut. the coat protein (cp) genes of three groundnut tsv isolates (gn-ap-1-00; gn-ap2-04; gn-ap3-07) were sequenced; all three isolates contained a single open reading frame (orf) of 717 nucleotides that could potentially code for 238 amino acids (aa). the cp genes of tsv isolates originating from different hosts shared a high degree of sequence identity at both the nucleotide (97.6-100%) and amino acid (95.7-100%) levels.

tones grown in an area of 3.83.430 ha (fao stat2007). in india, papaya is grown on nearly 80,000 ha with an annual production of 7,00,000 tones (fao stat 2007) and occupies fourth place in the world. the crop is severely affected by a number of viruses, of which papaya ring spot virus (prsv-p) is the most important. the detection of virus infection in plants has traditionally involved either bioassay on indexing plants or immunological methods (hill 1981, torrence and jones 1981). the use of nucleic acid probes has improved the detection of viruses and its sensitivity. the most common non-radioactive probes are biotinylated probes, which are very specific and sensitive. papaya ring spot virus (prsv-p) is a positive-sense ssrna virus belonging to the genus potyvirus, family potyviridae, and is transmitted by aphids. the prsv-p coat protein gene region was used as template cdna for probe preparation. dot-blot hybridization with the biotin-labeled probe was performed for prsv-p detection.
the clarified sap of healthy and infected plants was serially diluted, spotted onto a nitrocellulose membrane and hybridized to the biotin-labeled probe. biotin-labeled rnas are employed as probes, with subsequent detection based on streptavidin-alkaline phosphatase conjugates. the biotin-labeled probe was found to be more sensitive for viral detection than enzyme-linked immunosorbent assay (elisa).

in recent years, tospovirus has been causing devastating damage to the yield of vegetables in india. it infects economically important crops, viz. tomato, chilli, peppers, groundnut, watermelon and various legumes. it is now emerging as a severe disease in brinjal as well. in order to monitor the natural occurrence and distribution of tospovirus in vegetables, surveys were conducted in the predominant brinjal-growing areas of gujarat, karnataka, maharashtra and andhra pradesh during 2008-2010; incidence ranged from 5 to 10%, 0 to 80%, 1 to 40% and 0 to 55.78%, respectively. samples collected from different places in india were found positive for pbnv in direct antigen coating-enzyme linked immunosorbent assay (dac-elisa). pbnv-infected brinjal plants showed mosaic mottling of leaves with leaf distortion, longitudinal streaks on the stem and necrotic rings on leaves and fruits. early infection led to severe stunting and abnormal fruiting. the biological and molecular characteristics of the pbnv brinjal isolates were compared with those of other isolates and the results are discussed.

for identification of the virus causing mosaic symptoms on soybean, various host plants were tested. plant species belonging to different families, viz. caricaceae, gramineae, leguminosae, malvaceae and solanaceae, were tested. the virus produced symptoms on diagnostic plant species such as chenopodium album, c. quinoa, helianthus annuus, phaseolus vulgaris and vigna unguiculata.
among the tested families, the leguminosae that were hosts of the virus included arachis hypogaea. the virus causing mosaic symptoms in soybean is inactivated between 50 and 55°c and between dilutions of 10^-4 and 10^-5. all the inoculated plants of the assay host showed symptoms at 50°c but not at 55°c. similarly, local lesions were produced at 10^-4 but not at 10^-5. the virus in crude sap was infectious up to 72 h but not at 96 h at room temperature. however, the percentage infectivity decreased progressively as the sap aged at room temperature. on the basis of reactions on diagnostic hosts pvp-38 identification and characterization of potyvirus infected chilli (capsicum annuum l.) the virus under study caused mild mosaic and severe mottling symptoms in leaves of infected plants. the dilution end point (dep) of the virus was found to be 10^-3 to 10^-4, longevity in vitro (liv) 1-3 days at room temperature (25°c), and thermal inactivation point (tip) 50-55°c. electron microscopy of purified virus preparation revealed the presence of flexuous particles 780 nm long and 14 nm wide with characteristic cytoplasmic inclusions: pinwheels and scrolls. the virus was transmitted by sap and by the aphid myzus persicae. the host range study revealed that the host species were restricted to the families chenopodiaceae and solanaceae. on the basis of the above characteristics, the virus under study was identified as a potyvirus associated with mild mosaic and severe mottling symptoms in capsicum. phytoplasma causing grassy shoot disease and sugarcane yellow leaf viruses are important pathogens of sugarcane. these pathogens are causing severe losses in sugarcane productivity. with a view to producing virus and phytoplasma free planting material of sugarcane, experiments were undertaken using infected varieties of sugarcane growing at the farms of sugarcane research institute. 
apical meristems measuring about 2 mm in length were dissected out, surface sterilized and cultured on agar gelled murashige and skoog's (ms) medium containing growth regulators for shoot induction. the established shoot cultures were multiplied through repeated subcultures on fresh media at 10-12 day intervals. elimination of gsd and scylv was confirmed through molecular analysis of regenerated plants using specific primers of scylv and gsd. results revealed that the apical meristem culture technique is effective in eliminating pathogens like scylv and phytoplasma (gsd) from the infected clones. this is probably the first report on elimination of grassy shoot disease in sugarcane through meristem culture. isolates of papaya ringspot virus (prsv), which causes the most widespread and devastating disease in papaya, originating from different geographical regions in south india were collected and maintained on the natural host papaya. the entire coat protein (cp) gene of papaya ringspot virus-p biotype (prsv-p) was amplified by reverse transcription-polymerase chain reaction (rt-pcr). the amplicon was inserted into pgem-t vector by t-a cloning method, sequenced and sub cloned into a bacterial expression vector prset-a using directional cloning strategy. the prsv coat protein was over expressed as a fusion protein in e. coli. sds-page revealed that the cp was expressed as a ~40 kda protein. the recombinant coat protein (rcp) fused with a 6× his-tag was purified from e. coli using ni-nta resin. the antigenicity of the fusion protein was determined by western blot analysis using antibodies raised against purified prsv. the purified rcp was used as an antigen to produce high titer prsv specific polyclonal antiserum. the resulting antiserum was used to develop an immunocapture reverse transcription-polymerase chain reaction (ic-rt-pcr) assay, and its sensitivity was compared with that of elisa based assays for detection of prsv isolates. 
ic-rt-pcr was shown to be the most sensitive test followed by dot-blot immunobinding assay (dbia) and plate trapped elisa. key: cord-028721-x6f26ahr authors: nistal, manuel; paniagua, ricardo title: non-neoplastic diseases of the testis date: 2020-06-22 journal: urologic surgical pathology doi: 10.1016/b978-0-323-01970-5.50014-2 sha: doc_id: 28721 cord_uid: x6f26ahr testicular biopsy; infertility and chromosomal anomalies; other syndromes associated with hypergonadotropic hypogonadism; secondary idiopathic hypogonadism; hypogonadism secondary to endocrine gland dysfunction; infertility secondary to physical and chemical agents; infertility in patients with spinal cord injury; orchitis; histiocytosis with testicular involvement. non-neoplastic diseases of the testis manuel nistal, ricardo paniagua chapter 12 embryology and anatomy of the testis embryology sexual differentiation is the result of complex genetic and endocrine mechanisms that are closely associated with the development of both the genitourinary system and the adrenal glands. formation of the bipotential gonad and, subsequently, of the ovaries and testes, depends on gene expression in both sex and autosomal chromosomes. testes secrete steroid and peptidic hormones that are necessary for the development of inner and outer male genitalia. these hormonal actions are mediated by specific receptors that are transcriptional regulators. alteration of these genetic events leads to sexual dimorphism involving the inner and outer genitalia, and can also hinder the development of other organs. 1 chromosomal gender is established at fecundation with formation of an egg with either a 46xy (male) or a 46xx (female) karyotype. each chromosomal constitution initiates a cascade of genetic events leading to the development of female (ovaries) or male (testes) gonads (gonadal gender). 
hormonal secretions from the ovaries or testes are essential for the development of external genitalia (phenotypic gender). the relationship between the individual and the environment determines the social gender. there are multiple genes involved in the formation of the undifferentiated gonad. the two most important for the proper formation of the bipotential gonad are wt1 (wilms' tumor gene) and nr5a1 (fig. 12-1). wt1 contains 10 exons located on chromosome 11p13, with two alternative splicing loci in introns 5 and 9. intron 9 splicing can lead to the inclusion or exclusion of three amino acids (kts: lysine, threonine and serine), giving rise to kts+ or kts− isoforms. an adequate kts+/kts− balance is crucial for normal expression of the gene. translation of this gene may generate up to 24 isoforms with several zinc-finger domains. this gene is expressed mainly in the kidneys and gonads, and mediates the transition from stroma to epithelium and morphogenetic differentiation (it inhibits genes that encode proliferative factors and activates those that enhance epithelial differentiation). wt1 gene anomalies lead to a wide variety of phenotypes; deletions are associated with minimal genitourinary alterations and predisposition to develop wilms' tumor. [2] [3] [4] missense heterozygous mutations give rise to denys-drash syndrome (complete or partial 46xy gonadal dysgenesis, renal disease of early onset with diffuse mesangial sclerosis, and wilms' tumor (omim 19408)). 5 loss of the kts+ isoform accounts for frasier's syndrome (46xy gonadal dysgenesis, renal disease of late onset and absence of wilms' tumor (omim 136680)). 6 the nr5a1 gene product is termed sf-1 (steroidogenic factor 1). the gene has seven exons in chromosome 9q33.3, and is expressed in the urogenital ridge that forms the gonads and adrenal glands. sf-1 promotes the expression of the anti-müllerian hormone (amh) and binds regulatory elements upstream of the amh gene. 
sf-1 is first detected in the developing sertoli cells of sex cords, but later is mainly localized in leydig cells. 7 a heterozygous deletion causes a female phenotype in patients with 46xy, adrenal failure during the first weeks of extrauterine life, persistence of normal müllerian structures, and gonads consisting of poorly differentiated tubules embedded in abundant connective tissue. these patients do not respond to hcg stimulation. 8 in 46xx patients, ovarian development is not modified by sf-1 mutations, and they present with adrenal failure only. 9 lim-1 is another gene involved in the formation of the bipotential gonad and kidneys. it was recently identified in mice that bore homozygous deletions and presented alterations in both organs. 10 fgf-9 (fibroblastic growth factor 9) has also been related to gonadal development. both gonosomal and autosomal genes mediate the progression of the bipotential gonad toward testicular differentiation. the signal is triggered by the sry gene on the distal portion of the short arm of the y chromosome (sex-determining region of the y chromosome; yp11.3), also called tdf (testis determining factor gene). 11 this gene stimulates the differentiation of sertoli cell precursors and germ cells, is responsible for the production of the anti-müllerian hormone, 12 and regulates other genes of the downstream cascade. these are either activated or inhibited by other genes in such a way that dozens of genes are involved in testicular differentiation. 13 the sry gene contains a single exon that encodes a 204 amino acid protein whose central part (79 amino acids) encodes a dna-binding domain termed hmg (high mobility group). immunohistochemical studies have demonstrated expression of the sry gene in the nuclei of both sertoli cells and germ cells, 14 suggesting that this gene acts in somatic cells of genital ridge and germ cells. sry works with the amh promoter gene and also regulates steroidogenic hormone expression. 
15 sry mutations produce pure gonadal dysgenesis (swyer's syndrome) or true hermaphroditism; the karyotype of patients with the male phenotype lacking y chromosome is either 46xx sry+ (80%) or 46xx sry− (20%), and all have male external genitalia, testes, azoospermia and no müllerian structures. some 46xx sry− patients have sox-9 duplication. 16 following discovery of the sry gene, the knowledge about genes involved in gonadal formation advanced experimentally with knockout mice and the study of human syndromes. now, there are numerous reported genes (including sox-8, sox-9, dax-1, lhx-9, lim-1 and dmrt-1) that encode associated transcription factors. sox-8 and sox-9 (sry box 8 and 9, or sry hmg-box gene 9) are related autosomal genes. sox-9 is on chromosome 17q24.3-q25.1 and is expressed after sry expression in the same cell type (the pre-sertoli cell). 17 this gene is also essential for the development of the cartilaginous extracellular matrix. in the mouse gonad, sox-9 inactivation inhibits testicular development or sertoli cell marker expression, and the gonad acquires an ovarian pattern. 18 sox-9 haploinsufficiency (loss of a functional allele) causes camptomelic dysplasia (a syndrome characterized by abnormal formation of cartilage) and a 46xy constitution with female phenotype, 17, 19 whereas sox-9 duplication results in 46xx patients with male phenotype. 20 sox-8 is another cofactor in amh regulation and acts by protein-protein interaction with sf-1. experimental models show that sox-9 dysfunction results in replacement by sox-8 expression via a feedback mechanism. 21 the dax-1 (dosage-sensitive sex-reversal, adrenal hyperplasia, x-linked) gene is involved in the development of testes, ovaries, and adrenal glands. dax-1, on the x chromosome, is expressed during ovarian formation and inhibited by sry during testicular formation. duplication of the dax-1 region in xp21 results in 46xy gonadal dysgenesis. 
22, 23 conversely, dax-1 mutations decrease gene expression, resulting in absence of adrenal cortex and hypogonadotropic hypogonadism; 10 testicular determination is normal. deletions in chromosomes 9p 24 and 10q 25 are associated with the female phenotype in 46xy individuals. chromosome 9p deletions are also associated with facial malformations, premature closure of the frontal suture, hydronephrosis, and delayed development. deletions of two genes (dmrt1 and dmrt2) on chromosome 9p24.3 may be found in 46xy females. terminal deletions in chromosome 10q are associated with genital malformations, multiple phenotypic anomalies, and mental retardation. in the fourth week of gestation, the urogenital ridges appear as two parallel prominences along the posterior abdominal wall. these give rise to two important pairs of structures: the genital ridges arising from the medial prominences, and the mesonephric ridges from the lateral prominences. the genital ridges are the first primordium of the gonad and stand out as a pair of prominences about the midline. in 30-32-day embryos, each genital ridge is lateral to the aorta and medial to the mesonephric duct (fig. 12-2). the celomic epithelium forming the genital ridges grows as cordlike structures to create the primary sex cords. immediately beneath the celomic epithelium there are several mesonephric ductuli and glomeruli (fig. 12-3). the origin of the gonadal blastema results from the junction of two cell types: epithelial cells from the celomic epithelium and mesenchymal cells from the mesonephric region, 26,27 although experimental data are conflicting. one of the earliest effects of sry expression is induction of mesonephric cell migration toward the genital ridge. 28, 29 histochemical studies revealed that an early event is also disruption of the celomic epithelium basal lamina, permitting the migration of these epithelial cells inside the gonad. 
if chromosomal constitution is xy, these cells give rise to sertoli cells. 30 cells derived from the celomic epithelium are recognized by their pale cytoplasm, large size, and ovoid euchromatic nucleus. the cells of mesonephric origin are darker and have a mesenchymal pattern. initially, the genital ridges are devoid of germ cells. in the third week, primordial germ cells appear in the extraembryonal mesoderm lining the posterior wall of the yolk sac near the allantoic evagination. they are ovoid, measuring 12-14 µm in diameter, and are easily detected histochemically by a high content of alkaline phosphatase. the nuclei are spherical and possess one or two prominent central nucleoli. the cytoplasm contains mitochondria with tubular cristae, lysosomes, microfilaments, lipid inclusions, numerous ribosomes, and abundant glycogen granules. attracted by chemotactic factors, the primordial germ cells migrate along the mesenchyma of the mesentery and reach the genital ridge by 32-35 days. the seminiferous cords arise from the gonadal blastema. 31,32 many germ cells reach the seminiferous cords, but some degenerate during migration. the seminiferous cords are delimited from the stroma by a basement membrane 33 and lose their connection to the celomic epithelium, which reduces its depth to one or two cell layers only. the intercordal mesenchyma, composed chiefly of cells that migrated from the mesonephric stroma, differentiate later into myoid cells, leydig cells, fibroblasts, and blood vessels. 
34 up to the sixth week, the gonads appear similar, although the incipient testes have more numerous blood vessels, more abundant stroma, 35 and a higher total dna content, suggesting more rapid growth. sertoli cells arise from somatic sex cord cells. these cells differentiate at the end of the seventh week from the somatic cells in the cords, develop adherent junctions between them and a basal lamina on the other cord surface, and begin to express amh. 36 in the eighth week, leydig cells differentiate from the intercordal gonadal blastema, 37 and immunohistochemical detection of 3β-hsd is apparently the first step in this process. leydig cell development peaks during the 18th week, and numbers subsequently decrease progressively. 38 
the rete testis originates from mesonephric remnants of sex cords that are in continuity with the seminiferous cords. the connection between the testis and the mesonephros becomes progressively thinner (fig. 12-4). the testis has a round transversal section, and remains located between two suspensory ligaments: the cranial and the caudal, the latter of which gives rise to the gubernaculum. the development of the urogenital tract begins at the stage of the undifferentiated gonad, with the appearance of two different pairs of ducts: the wolffian and the müllerian. the wolffian ducts are formed in the mesonephros in the third week of gestation, when the cranial region of the segmented intermediate mesoderm gives rise to 10 pairs of tubules (the nephric tubules) that are metamerically arranged. these tubules form the pronephros. on each side of the body, the tubules converge to form a longitudinal duct that opens in the celomic cavity. in the fourth week, the pronephros disappears and is replaced by another tubular system (derived from the intermediate mesoderm, which is not segmented) that forms the mesonephros. the medial ends of the mesonephric tubules do not open to the celomic cavity but are connected to glomeruli at one end and the wolffian duct at the other. at the end of the second month of gestation, the mesonephros is replaced by the metanephros or definitive kidney. however, in the male, the most caudal mesonephric tubules and the wolffian duct persist. the former give rise to the ductuli efferentes, and the latter forms the ductus epididymidis, the ductus deferens, the seminal vesicle, and the ejaculatory duct. 
both müllerian ducts originate from a longitudinal invagination of the celomic epithelium in the anterolateral aspect of the genital ridge. the cranial end of each duct is a funnel that opens in the celomic cavity. each duct runs parallel and lateral to the respective wolffian duct and, as they pass caudally, the müllerian duct crosses over the wolffian duct and lies medial to it. finally, the two müllerian ducts fuse into the uterovaginal duct. this elongates caudally up to the posterior aspect of the urogenital sinus, forming the müllerian tubercle. the wolffian ducts terminate at either side of this tubercle. the remaining structures of the male genital system are derived from the urogenital sinus. epithelium with endodermal origin forms the prostate, the urethra, and the bulbourethral and periurethral glands. the primitive urogenital sinus derives from the cloaca, a structure that appears at the end of the first month and which consists of a dilation of the terminal portion of the primitive posterior intestine. the cloaca is closed by the cloacal membrane. in the third week, mesenchyma proliferates in the outer aspect of the cloacal membrane to form the cloacal folds and the cloacal eminence. in the sixth week, the cloacal folds enlarge to form the genital (or urethral) tubercle. external to the genital folds, another mesenchymal thickening develops into the genital prominences or genital swellings. in the fifth week, a septum forms, dividing the cloaca into two compartments. the anterior compartment is the primitive urogenital sinus that is covered by the urogenital membrane. the posterior compartment is the anorectal canal, covered by the anal membrane. 
the primitive urogenital sinus then divides into two new compartments: superior and inferior. the superior compartment is the vesicourethral canal that later forms the urinary bladder and the urethra. the inferior compartment is the definitive urogenital sinus that will develop later according to the gender. the development of the male genital system is directly influenced by the action of multiple hormones, including anti-müllerian hormone (amh), dihydrotestosterone (derived from testosterone), and the pituitary hormones follicle-stimulating hormone (fsh) and luteinizing hormone (lh). amh (müllerian inhibitory substance; mis), 39 secreted by the sertoli cells, is a glycoprotein polymer consisting of two identical 72 kda subunits linked by disulfide bonds. [40] [41] [42] it belongs to the tgf-β family and is synthesized as a 560 amino-acid precursor protein with proteolytic cleavage at 109 amino acids from the c terminal. cleavage is necessary to activate the hormone. amh is encoded by a 2.75 kb gene that comprises five exons and is located on the p13.2 region of chromosome 19. amh is secreted by somatic cells only in both sexes: sertoli cells in males and granulosa cells in females. it is detected by 6-7 weeks of gonadal development (8-9 weeks of gestation), probably as soon as germ cells make contact with pre-sertoli cells, a week before the müllerian ducts lose their responsiveness. 43, 44 amh is at high concentration in the second trimester, but drops precipitously in the third trimester. 45 levels rise again during the first year after birth, are detectable during infancy and childhood, and finally drop definitively to undetectable levels at the onset of puberty. the secreted amount of amh is inversely correlated to the degree of sertoli cell maturation. amh regulation is incompletely understood. 
its expression is controlled by steroidogenic factor 1 (sf-1), also called ad4bp, 46 which is an orphan nuclear receptor that acts as a transcriptional regulator of all steroidogenic genes. amh regulates sry expression, which in sertoli cells is detected immediately before amh expression. 47 during puberty, amh is negatively regulated by androgens. 48 amh acts on the testis, genital tract, and extragenital structures. it causes involution of the ipsilateral müllerian duct. action begins at the caudal testicular pole and progresses rapidly. in adults, remnants of the müllerian ducts include the appendix testis at the cranial end and the prostatic utricle (verumontanum) at the caudal end. amh also stimulates development of the tunica albuginea, formed by insertion of mesenchyma between the celomic epithelium and primordial sex cords. this mesenchyma is also the origin of collagenized connective tissue, with deposition of collagen fibers in several layers that parallel the testicular surface. 49 amh also hinders the entry of spermatogonia in meiosis. 50 the best-known function of amh in the extragonadal system is the maturation of fetal lungs. 51 testosterone is synthesized by the leydig cells. these first appear among the sex cords in the eighth week of gestation, and their number increases to 48 million per pair of testes by the 16th week, 52 occupying about 50% of the testicular volume (fig. 12-6). the relative number of leydig cells decreases from the 16th to the 24th week, owing to rapid enlargement of the testis during this period. however, the absolute number of leydig cells remains constant. from the 24th week to birth, the number of leydig cells decreases to 18 million per pair of testes. testosterone synthesis begins after the 56th day of gestation. testosterone secretion is regulated by hcg and lh concentrations. 
hcg peaks between weeks 11 and 17 and drops markedly thereafter; hcg-dependent testosterone is the most important determinant of genital differentiation. wolffian duct differentiation occurs only as a response to the testosterone secreted by the ipsilateral testis. this secretion stimulates differentiation of the ductus epididymidis, ductus deferens, and seminal vesicle. anomalies in androgen synthesis lead to incomplete masculinization and cryptorchidism. dihydrotestosterone (dht) derives from testosterone by the action of the enzyme 5α-reductase and is responsible for differentiation of the prostate and the development of the external genitalia, male urethra, penis and scrotum. it induces fusion of the labioscrotal folds in the middle plane to form the scrotum and the middle scrotal raphe. the urethral folds become fused to form the penile urethra. the genital tubercle enlarges to form the glans penis. an ectodermal invagination of the glans tip forms the terminal portion of the urethra. the urogenital sinus gives rise to the urinary bladder, prostatic urethra, and prostate. 53 the initial effects of dht (labioscrotal fusion) occur on approximately day 70; the urethral groove is closed on about day 74; and the external genitalia are completely developed by week 20. the actions of these hormones occur at precise moments in development. failure in the amount or timing of secretion or in the responsiveness of target tissues causes most of the malformations found in intersex conditions. 52 fsh and lh both play an important role in the last months of gestation. lh appears in the fetal circulation during the 10th week and peaks by the 18th, decreasing progressively and slowly thereafter until birth. lh chiefly regulates androgen production during the second half of fetal life. fsh is an essential mitogen for sertoli cells that reach the highest mitotic ratio at the end of fetal life (fig. 12-7). 
54, 55 testicular descent testicular descent is the result of hormonal and mechanical actions that are not fully understood. three steps are recognized: nephric displacement, transabdominal descent, and inguinal descent. in nephric displacement, the gonad detaches from the metanephros in the seventh week of gestation. transabdominal descent occurs in the 12th week and consists of the displacement of the testis towards the deep inguinal ring. inguinal descent occurs between the seventh month and birth. 56 clinically, the term testicular descent often refers only to this last step, in which the testis passes from the abdominal cavity to the scrotum. testicular descent is directed by the gubernaculum testis, a structure that appears in the sixth week as an elongate condensation of mesenchymal cells extending from the genital ridge to the presumptive inguinal region. 57, 58 at this level in the abdominal wall, the gubernaculum cells persist as a simple mesenchyma while the remaining abdominal wall cells differentiate into muscle. these mesenchymal cells give rise to the inguinal canal. thus, the testis lies on a continuous column of mesenchyma limited by the cranial testicular ligament in the upper pole and by the plica gubernaculi that joins the testis to the future scrotal region in the inferior pole. the periphery of this mesenchymal tissue is invaded by the processus vaginalis, which develops from a peritoneal pouch that grows into this mesenchyma. once the inguinal canal and the plica gubernaculi are formed, development slows. in the seventh month the processus vaginalis undergoes active growth, the cremasteric muscle develops from the mesenchyma outside the processus vaginalis, and the distal end of the gubernaculum enlarges markedly. gubernacular enlargement occurs from the 16th to the 24th weeks of gestation period and is caused by hyperplasia, hypertrophy, and the absorption of a great volume of water by the glycosaminoglycans of the matrix. 
59 the tissue is reminiscent of wharton's jelly of the umbilical cord. by this time, the testis-epididymis complex is pear-shaped and its largest component is the gubernaculum. the testis and epididymis slide through the inguinal canal behind the gubernaculum. simultaneously, development of the processus vaginalis is completed and the gubernaculum begins to shorten, the epididymis develops further, and the testicular blood vessels and vas deferens lengthen. 60 testicular descent is a complex process integrating several essential factors, including normal function of the hypothalamo-pituitary-testicular axis, normal development of abdominal musculature, gubernaculum and the processus vaginalis, 61, 62 and a testis with normal endocrine function. the critical role of normal hormonal function is supported by clinical and experimental observations: destruction of the hypophysis in laboratory animals impedes testicular descent; anencephalic fetuses usually have undescended testes; many cryptorchid patients have transitory neonatal hypogonadotropic hypogonadism; and some undescended testes descend after treatment with human chorionic gonadotropin or gonadotropin-releasing hormone. adequate intra-abdominal pressure is another requisite. 63, 64 in the prune-belly syndrome, bilateral cryptorchidism is associated with urologic malformations and absence of the abdominal wall musculature. in a variant of this syndrome, termed pseudo-prune-belly syndrome, there is a positive correlation between the development of the abdominal wall musculature and testicular descent. development of the processus vaginalis also plays a critical role in testicular descent. this structure grows within the gubernaculum; if it is partially replaced by fibrous tissue, the testis will follow other directions in its descent and end in an ectopic location. 
if fibrous tissue completely replaces the gubernaculum, the processus vaginalis and cremasteric muscle fail to develop fully, and descent of the testis is mechanically blocked. 62 the hormonal requirements for testicular descent are not clear. 65 the most important factor in transabdominal descent is the androgen-independent peptide insulin-like factor 3 (insl-3), a member of the relaxin-insulin family that is produced by fetal leydig cells. this peptide stimulates gubernaculum cells to initiate gubernaculum swelling, a necessary step for the initiation of testicular descent. 66 mutations in the insl-3 gene or its receptors lgr-8 (leucine-rich repeat-containing g protein-coupled receptor 8) or great (g protein-coupled receptor affecting testicular descent) interfere with transabdominal descent and cause cryptorchidism. 67, 68 amh and androgens are also involved in the gubernaculum swelling reaction; androgens also facilitate regression of the cranial suspensory ligament. uncertainty exists regarding the mechanism of inguinoscrotal descent and its hormonal control. androgens and the genitofemoral nerve are two factors strongly implicated in these processes. the role of androgens on the gubernaculum is very limited, because this structure has neither muscular cells 69 nor androgen receptors at the time of testicular descent. androgenic effects are explained by the hypothesis of the genitofemoral nerve. 70 androgens appear to act on the nucleus of the genitofemoral nerve in the spinal cord rather than directly on the gubernacula, producing masculinization of the neurons that form this nucleus 71 (these neurons are much more numerous in males than in females) and secreting great amounts of calcitonin gene-related peptide (cgrp). in rats, cgrp causes rapid rhythmic contractions of the gubernaculum and it has been suggested that the gubernaculum might have embryonic cardiac muscle cells. 
however, it is also possible that cgrp acts on the cremasteric muscle that develops within the gubernaculum and is innervated by the genitofemoral nerve. this hypothesis is supported by the observation of neurogenic atrophy of this muscle in cryptorchid patients. 72 other factors involved in testicular descent are estrogens and epidermal growth factor (egf). during the first trimester of gestation, mothers of cryptorchid infants have free estradiol serum concentrations that are significantly higher than those of controls. 73 experimental studies have shown that estradiol diminishes gubernacular swelling and stabilizes müllerian ducts. it has been proposed that estradiol inhibits the cell proliferation that causes gubernaculum swelling. 74, 75 egf may facilitate testicular descent through the placental-gonadal axis. maternal egf levels increase just before fetal masculinization occurs. 76 the placenta has an elevated concentration of egf receptors, and placental stimulation by egf might stimulate hcg production, which may also stimulate fetal leydig cells to produce androgens; hypothetically, these and/or other factors may determine testicular descent. after birth, the gubernaculum and processus vaginalis regress. the gubernaculum is replaced by fibrous tissue that forms the scrotal ligament. the cephalic segment of the processus vaginalis atrophies after testicular descent. an exaggerated resorption of the processus vaginalis with pulling up of the testis may induce a testis that had descended normally to ascend, resulting in cryptorchidism. 77 from birth to puberty the testis is a dynamic structure, an important consideration in interpreting biopsies from children. all testicular components undergo waves of proliferation and differentiation prior to puberty. 78 three waves of germ cell proliferation occur: during the neonatal period, infancy, and puberty. 79 the last gives rise to complete spermatogenesis. 
there are also three waves of leydig cell proliferation (fetal, neonatal, and pubertal); the last corresponds to the pubertal wave of germ cell proliferation.
the testis at birth
the newborn testis has a volume of about 0.57 ml 80 and is covered by a thin tunica albuginea from which the intratesticular septa arise. these divide the testis into lobules containing the seminiferous tubules and testicular interstitium (fig. 12-8). the seminiferous tubules measure 60-65 µm in diameter, with no apparent lumina, and are filled with sertoli cells and germ cells. sertoli cells are the most abundant, with 26-28 cells per tubular cross-section (fig. 12-9). 81 they form a pseudostratified cellular layer and have elongated to oval nuclei with darker chromatin than that of mature sertoli cells, as well as one or two small peripheral nucleoli. the cytoplasm contains abundant rough endoplasmic reticulum, several golgi complexes and numerous vimentin filaments, and expresses inhibin b (fig. 12-10). no specialized intercellular junctions appear between sertoli cells, but desmosome-like junctions are present between sertoli cells and germ cells. 82 two types of germ cell are present at birth: gonocytes and spermatogonia. gonocytes are usually located near the center of the tubules, with voluminous nuclei and large central nucleoli. 82 gonocyte migration is probably facilitated by cell adhesion molecules such as p-cadherin, which is expressed by sertoli cells of immature testes. 83 spermatogonia are mainly located on the basal lamina, and possess smaller nuclei and less cytoplasm than gonocytes; the nucleoli are peripheral and very small. at birth, most spermatogonia correspond to the adult type a (see discussion on the adult testis below) (fig. 12-11). the testicular interstitium contains fetal leydig cells that resemble adult leydig cells but lack reinke's crystalloids (fig. 12-12). 84, 85 additionally, mast cells, macrophages, and hematopoietic cells are present. 
86 the first wave of testicular development occurs during the neonatal period and involves germ cells and leydig cells. these changes are caused by a significant increase in secretion of both fsh and lh during the third postnatal month. 87-89 testicular weight and volume increase. lh stimulates the leydig cells to produce testosterone, 90,91 which stimulates the transformation of gonocytes to spermatogonia of the ad type (fig. 12-13). afterwards, some of these divide to form ap spermatogonia (see discussion on the adult testis below). six months after birth, gonocytes are absent, coinciding with the loss of fetal germ cell markers (placental alkaline phosphatase and c-kit). paraganglia are often observed in epididymides and spermatic cords from newborns. this is not surprising, as paraganglia are the main source of catecholamine before birth (fig. 12-14). 92
the testis in infancy
from the sixth month to approximately the second half of the third year of life, the testis is in a resting period; this quiescence is broken by the second wave of germ cell proliferation. 93 the number of ap spermatogonia increases, and b spermatogonia (derived from ap spermatogonia) appear. in some normal testes at this age, meiotic primary spermatocytes and round spermatids are observed (fig. 12-15). this spermatogenic attempt fails and many degenerate germ cells may be present. 94, 95 the testis continues to produce amh (by sertoli cells) 96 and inhibin b. 97 amh modulates the number and function of leydig cells by regulating differentiation of their mesenchymal precursors and the expression of steroidogenic enzymes. 98 inhibin b plays a role in fsh inactivation during infancy. the cause of this second wave of germ cell proliferation is unknown; there is no elevation of fsh or lh serum concentrations between 6 months and 10 years of life. after the sixth year, there is a slight increase in adrenal androgens, but testicular testosterone levels increase only after the 10th year. 99, 100 by the third year, most leydig cells have degenerated: from a peak of about 18 million at birth, only 60 000 remain by the age of 6 years. at this age, testosterone levels are similar to those of girls, 99 and most androgens are of adrenal origin. at about 9 years of age, the third and definitive wave of spermatogenesis begins, 101 coinciding with a significant elevation of lh. this is followed by additional increases in the level of this hormone between 13 and 15 years of age. lh induces fibroblast-like leydig cell precursors to differentiate into mature leydig cells. 102 by the end of puberty, the population of leydig cells per testis has risen to about 786 million. 103 leydig cells secrete androgens, which, together with the rise in fsh between 11 and 14 years of age, cause sertoli cell maturation, germ cell development, and the appearance of tubular lumina (fig. 12-16), 103 increasing the size of the testes between the ages of 11.5 and 12.5 years. 104 at 13.5 years, before the testis reaches adult size, spermatozoa are present, secondary sex characteristics are completely developed, and the epiphyses close. 105 testicular biopsy in children is useful in evaluating those with ambiguous genitalia, those with a history of leukemia or lymphoma whose testes have rapidly enlarged, and those with precocious testicular maturation of unknown cause. in other situations, the value of testicular biopsy is less established. for example, the value of biopsy of cryptorchid testes during orchidopexy is controversial. evaluation of biopsies of the prepubertal testis should involve the assessment of several features, including tunica albuginea thickness, mean tubular diameter, and the number of germ cells, sertoli cells, and leydig cells. 
the most frequent anomalies of the tunica albuginea include thin, poorly collagenized tunica albuginea with abnormal tubules typical of testicular dysgenesis (see the section on male pseudohermaphrodites with müllerian remnants, below); well-collagenized tunica albuginea containing ectopic seminiferous tubules, a frequent finding in cryptorchidism; and poorly collagenized tunica albuginea containing ovocytes characteristic of true hermaphroditic ovotestes. the mean tubular diameter is an excellent indicator of the development of the seminiferous epithelium. in the prepubertal testis, tubular diameter depends principally on the sertoli cells and thus indicates whether they are adequately stimulated by fsh. tubular diameter varies throughout childhood, being smallest at the end of the third year of life, slowly enlarging up to 9 years of age, and rapidly enlarging thereafter up to 15 years (fig. 12-17). the most frequent abnormality in the prepubertal testis is a low mean tubular diameter. this is seen in undescended testes as well as in hypogonadotropic or hypergonadotropic hypogonadism. in the latter, the lesion results from anomalous sertoli cell responsiveness to fsh. 106 there are three levels of severity of low tubular diameter: slight tubular hypoplasia (up to 10% reduction in relation to the diameter normal for the age); marked tubular hypoplasia (from 10% to 30% reduction); and severe tubular hypoplasia (more than 30% reduction). many testicular biopsies show malformed seminiferous tubules that vary from straight or branched tubules up to ring-shaped. these are megatubules formed by either tight spiral or bell-shaped tubules. the presence of these malformations suggests the child will be infertile in adulthood. diffuse increase in mean tubular diameter may be unilateral or bilateral. unilateral increase is found in monorchidism (compensatory testicular hypertrophy) and some testes that are contralateral to cryptorchid testes. 
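the three severity levels of low tubular diameter given above amount to a threshold rule on the percentage reduction relative to the age-normal diameter. a minimal sketch follows, assuming the caller supplies the age-normal reference value (no reference table is given here); the function name, the "none" category for a non-reduced diameter, and the handling of the exact 10% and 30% boundaries are illustrative assumptions, not part of the source.

```python
def tubular_hypoplasia_grade(measured_um: float, normal_for_age_um: float) -> str:
    """Grade low mean tubular diameter by percentage reduction from the age norm.

    Thresholds follow the text: up to 10% slight, 10-30% marked, >30% severe.
    Assumption: exact boundary values (10%, 30%) fall into the milder grade.
    """
    if normal_for_age_um <= 0:
        raise ValueError("normal_for_age_um must be positive")
    # Percentage reduction relative to the diameter normal for the age.
    reduction_pct = 100.0 * (normal_for_age_um - measured_um) / normal_for_age_um
    if reduction_pct <= 0:
        return "none"  # assumption: no reduction means no hypoplasia grade
    if reduction_pct <= 10:
        return "slight"
    if reduction_pct <= 30:
        return "marked"
    return "severe"


# Hypothetical measurements against a hypothetical 60 µm age norm.
for measured in (57.0, 45.0, 30.0):
    print(measured, tubular_hypoplasia_grade(measured, 60.0))
```

the reference diameter is left as a parameter because it changes with age; any real use would need an age-matched norm from a validated source.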
most frequently, diffuse enlargement occurs with benign idiopathic macroorchidism or macroorchidism associated with fragile x chromosome, familial testotoxicosis, hypothyroidism, or different forms of precocious puberty. focal increases in mean tubular diameter are usually associated with precocious maturation of the seminiferous epithelium layers, and occur at the periphery of some sertoli cell and leydig cell tumors. germ cells can be counted in two ways: calculation of the number of cells per tubular cross-section, or determination of the tubular fertility index. the former counts the number of germ cells in a light microscopic field and divides this by the number of cross-sectioned tubules in the same field. in the first 6 months of postnatal life the normal testis has two germ cells per cross-sectioned tubule. this number drops to 1.5 at the end of the first year and to 0.5 at the end of the third year. the number of germ cells increases to 1.8 cells at the age of 3-4 years, which coincides with the appearance of spermatocytes in some tubules. the tubular fertility index reflects the percentage of tubular sections containing germ cells. in newborns, 68% of tubular sections contain at least one germ cell. from birth to 3 years this decreases to 50%, followed by a progressive increase to 100% at puberty. 93 if the numbers of gonocytes and spermatogonia are calculated separately, it is possible to determine when the transformation of gonocytes to spermatogonia occurs. the most accurate measure is calculation of total germ cell numbers per testis. this is more difficult because it requires morphometric assessment of intratubular volume and careful clinical measurement of the three axes of the testis. 
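the two counting methods described above can be illustrated with a short sketch. it assumes the observer has recorded one germ cell count per cross-sectioned tubule in a microscopic field; the function names and the example counts are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def germ_cells_per_tubule(counts: Sequence[int]) -> float:
    """Mean number of germ cells per cross-sectioned tubule in one field."""
    if not counts:
        raise ValueError("at least one tubular cross-section is required")
    return sum(counts) / len(counts)


def tubular_fertility_index(counts: Sequence[int]) -> float:
    """Percentage of tubular sections containing at least one germ cell."""
    if not counts:
        raise ValueError("at least one tubular cross-section is required")
    with_germ_cells = sum(1 for c in counts if c > 0)
    return 100.0 * with_germ_cells / len(counts)


# Hypothetical field: germ cell count in each of eight tubular cross-sections.
field = [2, 0, 3, 1, 0, 0, 4, 0]
print(germ_cells_per_tubule(field))    # 10 germ cells over 8 tubules
print(tubular_fertility_index(field))  # 4 of 8 tubules contain germ cells
```

the two measures answer different questions: the first is sensitive to how many germ cells each tubule holds, while the second only records whether a tubule holds any, so a biopsy can score normally on one and low on the other.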
congenital decrease of germ cells occurs in numerous conditions, including trisomies 13, 18, and 21, some forms of primary hypogonadism such as klinefelter's syndrome, anencephaly, many cryptorchid testes, and in patients with posterior urethral valves and severe obstruction of the urinary ducts. 107 an increased number of germ cells may be seen at the periphery of germ cell tumor, gonadal-stromal tumor, and paratesticular sarcoma. at the periphery of leydig cell tumor, seminiferous tubular cellular maturation may be complete. three levels of severity of germinal hypoplasia are recognized: slight (tubular fertility index >50), marked (tubular fertility index between 50 and 30), and severe (tubular fertility index <30) (fig. 12-17). marked and severe germinal hypoplasia is usually associated with marked or severe tubular hypoplasia, in most cases resulting from tubular dysgenesis. it also is useful to determine whether the seminiferous tubules devoid of germ cells are randomly distributed. if they are grouped, they probably belong to the same lobule or group of lobules that never will develop normally. other germ cells observed are multinucleate or hypertrophied spermatogonia and gonocyte-like cells; these latter may require immunohistochemical studies to exclude intratubular germ cell neoplasia. the number of sertoli cells per tubular cross-section varies during childhood as a result of slow proliferation from 4 years to 12 years 108 and the redistribution of sertoli cells as the seminiferous tubules become longer and broader. the pseudostratified cellular pattern characteristic of sertoli cells at birth changes slowly to a columnar pattern at puberty. testicular biopsies may reveal hypoplasia or hyperplasia of sertoli cells; hyperplasia is usually pronounced and a sign of tubular dysgenesis, often detected during the first year of life or the beginning of puberty. 
109 some biopsies reveal one or several tubular sections containing sertoli cells with eosinophilic and granular cytoplasm that is positive for cd68 and α1-antitrypsin. these oncocytic changes are the result of lysosomal accumulation. 110 calculation of leydig cell numbers during childhood is difficult because at this age the population is scant. 102 semi-thin sections or immunohistochemistry to detect testosterone-containing cells may be helpful. 111 selection of the appropriate denominator to express the leydig cell population is another problem. the most frequent measures are leydig cell number per tubular section, per unit area, or total number per testis. 104 low numbers of leydig cells are observed in undescended testes, hypogonadotropic hypogonadism, some variants of male pseudohermaphroditism caused by a defect in the lh receptor, and in anencephalic fetuses. high numbers of leydig cells occur in congenital leydig cell hyperplasia, 112 triploid fetuses, 113 variants of precocious puberty, several syndromes such as leprechaunism and beckwith-wiedemann syndrome, and in most male pseudohermaphroditisms. an apparent increase in loose connective tissue is found in patients with marked tubular hypoplasia; in addition, disordered thick fusiform cell bundles are seen in patients with androgen insensitivity. other alterations include the presence of excessively developed lymphatic vessels (lymphangiectasis), focal hematopoiesis, leukemic infiltration, and the presence of cells similar to those of the adrenal cortex (tumors of the adrenogenital syndrome).
embryology and anatomy of the testis
the adult testis is an egg-shaped organ that hangs in the scrotum from the spermatic cord, the retroepididymal surface, and the scrotal ligament. mean weight in caucasian men is 21.6 ± 0.4 g for the right testis and 20 ± 0.4 g for the left. mean testicular diameter is 4.6 cm (range, 3.6-5.5 cm) for the longest axis and 2.6 cm (range, 2.1-3.2 cm) for the shortest. 
[114] [115] [116] [117] testicular volume varies from 15 to 25 ml. the tunica albuginea and interlobular septa make up the connective tissue framework of the testis. the tunica albuginea consists of three connective tissue layers and an outer surface covered by mesothelium. from the outer to the inner layers, the amount of collagen fibers decreases while the number of cells increases. the fibers and cells in the two outermost layers form planes parallel to the testicular surface; cell types include fibroblasts, myofibroblasts, and mast cells. myofibroblasts are more numerous in the posterior portion of the testis. the thickness of the tunica albuginea increases with age from 400-450 µm in young men to more than 900 µm in elderly men. 118 it acts as a semipermeable membrane that produces the fluid of the vaginal cavity. the presence of many contractile cells showing high concentrations of gmp suggests that the tunica albuginea undergoes impulses of contraction and relaxation. these cells might regulate testicular size 119 and favor the transport of spermatozoa into the epididymis. 120 the innermost layer, the tunica vasculosa, consists of loose connective tissue containing blood and lymphatic vessels. the interlobular septa consist of fibrous connective tissue with blood vessels supplying the testicular parenchyma. the interlobular septa divide the testis into approximately 250 pyramidal lobules with their bases at the tunica albuginea and vertices at the mediastinum testis. each lobule contains two to four seminiferous tubules and numerous leydig cells. 121 adult seminiferous tubules are 180-200 µm in diameter and 30-80 cm long. the total combined length of the seminiferous tubules is about 540 m (range, 299-981 m). 122 they are highly convoluted and tightly packed within the lobules. the seminiferous tubules comprise about 80% of testicular volume. the tubular lining of germ cells and sertoli cells is surrounded by a lamina propria (tunica propria) (fig. 
12-18). sertoli cells are columnar cells that extend from the basal lamina to the tubular lumen, with 10-12 cells per cross-sectioned tubule. they are easily identified by their nuclear characteristics. the nucleus is located near the basal lamina and has a triangular shape with indented outline, pale chromatin, and a large central nucleolus (fig. 12-19). charcot-böttcher's crystals and lipid droplets often are visible in the cytoplasm. [123] [124] [125] [126] ultrastructurally, sertoli cells have characteristic nucleoli, plasma membranes, and cytoplasmic components. the nucleolus has a tripartite structure with a round fibrillar center, a compact granular portion, and a three-dimensional net composed of intermingled fibrillar and granular portions. [127] [128] [129] the plasma membrane has two types of intercellular junction which develop at puberty: junctions between adjacent sertoli cells, and junctions between sertoli cells and germ cells. 130 the inter-sertoli cell junctions are tight-junction complexes. the adjacent cytoplasm has numerous actin filaments and parallel-arranged cisternae of smooth endoplasmic reticulum. in adjacent plasma membranes there are adhesion molecules, including connexin-43. between the plasma membrane and the adjacent endoplasmic reticulum cisterna there are many molecules, including those required for actin filament anchorage, vinculin, zonula occludens-1, plakoglobin, and radixin. the inter-sertoli cell junctions are the morphologic basis for the blood-testis barrier and divide the seminiferous epithelium into two compartments: the basal compartment (which contains spermatogonia and newly formed primary spermatocytes) and the adluminal compartment (which contains meiotic primary spermatocytes, secondary spermatocytes and spermatids). these junctions permit each compartment to have its own microenvironment for spermatogenic development. 
[131] [132] [133] the sertoli cell-germ cell junctions persist from the primary spermatocyte stage through spermatozoon release. these junctions are desmosomes and gap-type junctions. the adhesion among sertoli cells and germ cells is mediated by n-cadherin. these junctions have also occasionally been observed between spermatogonia. 134 sertoli cell cytoplasm contains abundant smooth endoplasmic reticulum, elongated mitochondria, annulate lamellae, lysosomes, residual bodies, glycogen granules, microtubules, vimentin filaments around the nucleus, 135 actin filaments in both inter-sertoli cell junctions and ectoplasmic specializations that surround germ cells, 136 lipid droplets in amounts that vary with the seminiferous tubular cycle, 137 charcot-böttcher crystals (structures several micrometers long, formed of multiple parallel laminae of protein), and scant rough endoplasmic reticulum and ribosomes. 138 the number of sertoli cells decreases with age, from about 250 million per testis in young men to 125 million in men over 50 years. 139, 140 there is a positive correlation between the number of sertoli cells and daily sperm production. 141 sertoli cells are the target of fsh 142, 143 and androgen action (fig. 12-21). 144 in adulthood, they produce testicular fluid through an active transport mechanism, and synthesize multiple products to ensure the nutrition, proliferation and maturation of germ cells, to stimulate other cells such as leydig cells and peritubular cells, 145 and to contribute to hormonal regulation (inhibin secretion) (table 12-1). the transport of small molecules (<600-700 da) such as pyruvate, lactate, and probably choline from the sertoli cell to germ cells occurs through gap junctions. 
large or small soluble molecules are transported by proteins that are synthesized by the sertoli cell; these include androgen-binding protein, transferrin, ceruloplasmin, sulfated glycoproteins, α2-macroglobulin, and γ-glutamyl transpeptidase. 146 activin and inhibin are sertoli cell-secreted proteins that induce the proliferation and differentiation of germ cells. whereas activin stimulates fsh production and, subsequently, spermatogonial proliferation, inhibin b inhibits fsh secretion, and is an important marker of spermatogenesis. 147 other sertoli cell secretions are interleukins, mainly il-1, 148 and growth factors such as transforming growth factor-β (tgf-β), insulin-like growth factors 1 and 2 (igf-1 and igf-2), and seminiferous growth factor (sgf) or stem cell factor (scf). some of these growth factors, such as tgf-α, tgf-β, and igf-1, are involved in the regulation of leydig cell function. other secreted substances include clusterin, the steroid 3-α-4-pregnen-20-one (3hp), and prostaglandin d synthase (table 12-2). sertoli cells are also involved in migration of differentiating germ cells towards the tubular lumen. this movement leads to a continuous remodeling of the plasma membrane and requires synthesis of several proteases, including urokinase, tissue-type plasminogen activator, cyclic protein 2, collagenase iv, other metalloproteins, and several antiproteases, such as cystatin c, tissue inhibitor of metalloproteinase type 2, and α2-macroglobulin. 149 the sertoli cell also regulates germ cell apoptosis by the production of fas-ligand, which binds to the fas-ligand receptor (apo-1, cd95) in germ cell plasma membranes. in addition, sertoli cells possess receptors for several factors such as the nerve growth factor (ngf) produced by spermatocytes and young spermatids, emphasizing the complexity of the sertoli cell-germ cell relationship. 
sertoli cells also produce some steroid hormones (estradiol and testosterone) and several components of the seminiferous tubule wall, including laminin, type iv collagen, and heparan sulfate-rich proteoglycans. the germ cells of the adult testis include spermatogonia, primary and secondary spermatocytes, and spermatids.
spermatogonia
there are two types of spermatogonia: a and b. type a are about 12 µm in diameter, rest on the basal lamina, and are surrounded by the cytoplasm of the adjacent sertoli cells. the nuclei of type a spermatogonia are spherical, contain several peripheral nucleoli, and have four different patterns: ad (dark), ap (pale), al (long), and ac (cloudy). 150, 151 the cytoplasm of these spermatogonia contains a moderate number of ribosomes, small ovoid mitochondria joined by electron-dense bars, and lubarsch's crystals. these are several micrometers long and are composed of numerous 8-15 nm parallel filaments intermingled with ribosome-like granules. ad spermatogonia are thought to be stem cells in spermatogenesis. some of them replicate dna and, during replication, acquire the al pattern. afterwards, they divide to make another ad (maintaining the stem cell reservoir) and an ap spermatogonium. during replication, ap spermatogonia become ac and then divide to form two type b spermatogonia. [152] [153] [154] type b spermatogonia are the most numerous, and their contact with the basal lamina is less extensive than that of
ultrastructural studies show coarse chromatin masses in which synaptonemal complexes and sex pairs may be present. the nucleolus acquires a peculiar appearance, with segregation of the fibrillar and granular portions. associated with the nucleolus is the round body that contains proteins but no nucleic acids. 128 in the pachytene spermatocyte, homologous chromosomes are completely paired, and on electron microscopy the chromatin masses appear larger and less numerous than in the zygotene spermatocyte. 
in the diplotene spermatocyte, paired homologous chromosomes begin to separate and remain joined by the points of interchange (chiasmata); neither synaptonemal complexes nor sex pairs are observed. the diakinesis spermatocyte shows maximal chromosome shortening and the chiasmata begin to resolve by displacement towards the chromosomal ends. the nuclear envelope and the nucleolus disintegrate. the spermatocyte completes the other phases of the first meiotic division (metaphase, anaphase and telophase), forming two secondary spermatocytes; the first meiotic division lasts 24 days. 156 secondary spermatocytes are haploid cells, smaller than primary spermatocytes, and show coarse chromatin granules and abundant rough endoplasmic reticulum cisternae. 157 these cells rapidly undergo the second meiotic division and within 8 hours give rise to two spermatids. the newly formed spermatids differ from secondary spermatocytes, having smaller nuclei with homogeneously distributed chromatin.
spermiogenesis
the transformation of spermatids into spermatozoa is called spermiogenesis. during this process pronounced changes occur in the nucleus and cytoplasm. 158 the nucleus becomes progressively darker and elongated. 159 the cytoplasm develops the acrosome and flagellum, 160 the mitochondria cluster around the first portion of the spermatozoon tail, and the remaining cytoplasm is phagocytosed by sertoli cells. 161, 162 by electron microscopy, there are four transient stages of spermatid development: golgi, cap, acrosome, and maturation. these correspond to those defined by light microscopy of nuclear morphology: sa, sb, sb1, sb2, sc, sd1 and sd2. 163, 164 these phases may be grouped as early (or round) spermatids that comprise the stages with round nuclei (sa and sb), and as late (or elongated) spermatids that comprise the stages with elongated nuclei (sc and sd). mature spermatids (sd2) are the spermatozoa that are released into the tubular lumen (spermiation). 
all the germ cells derived from the same stem cell remain interconnected by cytoplasmic bridges that ensure synchronous maturation during the spermatogenic process. 165
cycle of the seminiferous epithelium
at first glance, the arrangement of the germ cells in the seminiferous tubules appears disorderly. however, closer study reveals that these cells are grouped into six successive associations, designated i-vi. in contrast to other mammals, in humans the volume occupied by each association is small, so that several associations may be observed in the same tubular cross-section. stereological studies have shown that the successive associations are organized helically along the length of the seminiferous tubule. 126, [165] [166] [167] each association persists for a specific number of days (i, 4.8 days; ii, 3.1 days; iii, 1 day; iv, 1.2 days; v, 5 days; and vi, 0.8 days), and each successively transforms into the following one. finally, at the end of association vi, the cycle is repeated; the spermatogenic process requires 4.6 cycles. 168 because each cycle lasts 15.9 days, the transformation of spermatogonium into spermatozoon takes 74 days (fig. 12-22). the succession of different associations probably depends on cyclic sertoli cell activity. cyclic changes in the mitochondria, rough endoplasmic reticulum, golgi complex, lysosomes, and lipid droplets have been reported. [169] [170] [171] this cyclic activity is probably regulated by germ cell signals. 172 the yield of human spermatogenesis is lower than that of most mammalian species, including primates, with maximal cell degeneration occurring at the end of meiosis. 173
the six different germ cell associations of the seminiferous tubules and the sequence of spermatogenesis. completion of spermatogenesis requires more than four cycles and lasts for approximately 74 days. each association is indicated by roman numerals with its corresponding duration. ad: dark type of a spermatogonia; ap: pale type of a spermatogonia; b: b spermatogonia; i: interphase primary spermatocyte; l: leptotene primary spermatocyte; z: zygotene primary spermatocyte; p: pachytene primary spermatocyte; ii: secondary spermatocyte (only in stage vi). sa, sb1, sb2, sc, sd1, and sd2 represent the progressive stages of spermatid differentiation into spermatozoa.
the seminiferous tubule is surrounded by a 6 µm thick lamina propria (tunica propria) consisting of a basement membrane, myofibroblasts, fibroblasts, collagen and elastic fibers, and extracellular matrix. 174, 175 the basement membrane measures 100-200 nm in thickness, and displays three layers: lamina lucida (beneath the sertoli cells), lamina densa (basal lamina), and lamina reticularis (a discontinuous layer containing fibers). the basal lamina contains laminin, type iv collagen, entactin (nidogen), and heparan sulfate. 176 external to the basal lamina there are five to seven layers of flattened, elongated peritubular cells that have important secretory functions (table 12-3). 177 the cells forming the three to five innermost layers are myofibroblasts containing numerous actin, myosin, and desmin filaments. these cells play an important role in the rhythmic tubular contractions that propel spermatozoa toward the rete testis. 178, 179 the two outermost cell layers consist of fibroblasts without desmin filaments, and with less actin and myosin than the myofibroblasts. collagen fibers are present among the peritubular cells and are abundant between the basal lamina and the peritubular cells. elastic fibers are located mainly at the periphery of peritubular cells. because elastic fibers appear at puberty, their absence in adults is a sign of tubular immaturity or dysgenesis. 180 the extracellular matrix contains proteoglycans and fibronectin. 
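the timing figures quoted earlier for the cycle of the seminiferous epithelium (six association durations, 4.6 cycles per spermatogenic process) can be checked with a few lines of arithmetic. the sketch below is illustrative only: the variable and function names are assumptions, and the durations are copied from the text. the product of 4.6 cycles and 15.9 days comes to roughly 73 days, which is close to the 74 days stated in the text; the small difference presumably reflects rounding of the published values.

```python
# Duration in days of each germ cell association, as quoted in the text.
ASSOCIATION_DAYS = {
    "I": 4.8,
    "II": 3.1,
    "III": 1.0,
    "IV": 1.2,
    "V": 5.0,
    "VI": 0.8,
}

# Number of cycles needed to complete spermatogenesis, as quoted in the text.
CYCLES_PER_SPERMATOGENESIS = 4.6


def cycle_length_days() -> float:
    """One cycle is the sum of the six association durations."""
    return sum(ASSOCIATION_DAYS.values())


def spermatogenesis_days() -> float:
    """Total duration: number of cycles times the length of one cycle."""
    return CYCLES_PER_SPERMATOGENESIS * cycle_length_days()


print(round(cycle_length_days(), 1))     # length of one cycle in days
print(round(spermatogenesis_days(), 1))  # approximate total duration in days
```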
in addition, the tubular wall contains capillaries and leydig cells. these are very similar to the interstitial leydig cells and are named peritubular leydig cells. the most important functions of myofibroblasts are contraction of seminiferous tubules and control of sertoli cells. 181 myofibroblasts have α and β adrenergic and muscarinic receptors. 182 contractility depends on several factors produced in the testis (endothelin-1, vasopressin, oxytocin, and tgf-β) and prostaglandins. relaxation can be facilitated by the no/cgmp system because myofibroblasts are also able to synthesize nitric oxide. sertoli cell control by myofibroblasts is facilitated by the production of p-mod-s, which activates aromatase activity, inhibin production, and the secretion of androgen-binding protein and transferrin. the interstitium between the seminiferous tubules contains leydig cells, macrophages, neuron-like cells, mast cells, blood vessels, lymphatic vessels, and nerves, accounting for 12-20% of testicular volume. 183 the most numerous connective tissue cells are fibroblasts and myofibroblasts. the former are also known as interstitial dendritic cells or cd34-positive stromal cells. they form a network around the seminiferous tubules and leydig cells, and also form the outermost layers of the tubular wall. 184 this distribution begins in fetal life. some of these cells are in contact with typical macrophages, so it has been suggested that they might be involved in immune surveillance. myofibroblasts, in addition to their presence in the inner layer of the tubular wall, are numerous in the tunica albuginea. leydig cells are distributed singly or in clusters, and form about 3.8% of testicular volume. most are in the testicular interstitium, although they may also be found in the tubular tunica propria, mediastinum testis, tunica albuginea, epididymis, and spermatic cord. extratesticular leydig cells are usually seen within or near nerve trunks. 
[185] [186] [187] leydig cells have spherical eccentric nuclei with one or two eccentric nucleoli and prominent nuclear lamina. the cytoplasm is abundant, eosinophilic, and contains lipid droplets and lipofuscin granules (residual bodies) (fig. 12-23). reinke's crystalloids are found only in the leydig cells of adults and, although it was believed that these crystals were present exclusively in humans, they have also been observed in the wild bush rat. reinke's crystalloids are up to 20 µm long and 2-3 µm wide, consisting of a complicated meshwork of 5 nm filaments with a trigonal lattice arrangement. depending on the plane of section, three basic aspects of this lattice can be discerned. frequently, the crystalloids display pale lines, considered to be potential planes of cleavage. the filaments are grouped into 19 nm-wide hexagons visible on cross-section. in some areas there are aggregates of electron-dense, rod-shaped structures. some leydig cells contain other types of paracrystalline inclusion, the most common of which consists of multiple parallel-folded laminae. 188 leydig cells contain abundant well-developed smooth endoplasmic reticulum, pleomorphic mitochondria with tubular cristae, lysosomes, and peroxisomes. leydig cells react with antibodies to s100 protein and neuron-specific enolase. 189 leydig cells show immunoreactivity for lh receptors, 3-β-hydroxysteroid dehydrogenase (3-β-hsd), relaxin-like factor, 190 inhibin, and ghrelin. 191 relaxin-like factor, also known as insulin-like factor 3 (insl-3), is a peptide that is involved in testicular descent and can be found in serum. its concentration is a marker of the leydig cell functional status. as occurs with testosterone, insl-3 production is associated with that of lh. 192 leydig cells immunoreact with calretinin, a 29 kda calcium-binding protein that has a buffering effect to avoid abnormal increases in intracellular calcium. 
193 calretinin is a more sensitive marker than inhibin, albeit less specific (fig. 12-24). 194 leydig cells also contain vegf and its two receptors (flt-1 and kdr), and endothelin and its two receptors (α and β). vegf and endothelin are involved in paracrine and autocrine control of leydig cells. leydig cells near seminiferous tubules show immunoreactivity for glial fibrillary acidic protein (gfap) 195 (fig. 12-24). the demonstration of several substances that are characteristic of nerve cells, such as substance p and neurofilament triplet proteins (nf-l, nf-m and nf-h), and the ultrastructural observation of microtubules, intermediate filaments, and clear and dense core vesicles, qualify leydig cells for inclusion within the family of the diffuse endocrine system or paraneurons. 196, 197 leydig cells of the adult testis originate from fibroblastic precursor cells at puberty under lh stimulation. 198 experimental studies in rats have shown that adult leydig cells differentiate from peritubular cells (myofibroblasts and blood capillary pericytes). precursor leydig cells are reminiscent of neural stem cells because they express nestin and eventually acquire properties of neurons and glial cells. 199 the human testis contains about 200 million leydig cells. this number decreases with age: the testes of 60-year-old men contain about half as many as those of 20-year-old men. [202] [203] mitotic figures are seen occasionally in normal leydig cells. 204 leydig cells are the target cell of lh, in response to which they produce testosterone and other androgens necessary for the maintenance of spermatogenesis and many structures of the male genital tract, as well as other tissues such as bone, muscle, and skin. [205] [206] [207] [208] testosterone acts on the sertoli cells, either directly 209 or via the p-mod-s factor secreted by the myofibroblasts in the tunica propria. 
[210] [211] [212] leydig cells also secrete numerous non-steroidal factors, including oxytocin, which acts on myofibroblasts and stimulates seminiferous tubule contraction; β endorphin, which inhibits sertoli cell proliferation and function; egf, which regulates spermatogenesis; and other factors with less known actions, such as angiotensin, pro-opiomelanocortin, and α-melanotropic stimulating hormone (table 12-4). together with sertoli cells, peritubular cells, and endothelial cells, leydig cells produce nitric oxide, which has a relaxing effect on smooth muscle. 213 leydig cells are associated with cholinergic and adrenergic nerve fibers. 186 varicosities containing synaptic vesicles in the proximity of leydig cells and nerve endings in direct contact with leydig cells have been reported, although the functional significance of this innervation is unknown. 214, 215 macrophages, neuron-like cells, and mast cells macrophages are a normal component of the testis 216-218 and can be classified into two groups: resident and activated. resident macrophages are an essential cell type of the testicular interstitium (about 25% of interstitial cells in mouse testis). 219 in young adult men, there is one macrophage per 10-15 leydig cells, and this number increases with age. macrophages are closely related to leydig cells and play a role in proliferation and differentiation of leydig cell fibroblastic precursors. 220 interaction between macrophages and leydig cells is an example of paracrine function. in the rat, testicular macrophages produce 25-hydroxycholesterol (25-hc) and express 25-hydroxylase, which transforms cholesterol into 25-hc. 221, 222 activated macrophages produce interleukins 1 and 6 (il-1 and il-6), tumor necrosis factor-α (tnf-α), and transforming growth factor-α (tgf-α). immunohistochemical techniques have demonstrated neuron-like cells in the testicular interstitium. 
223 these cells are an important source of intratesticular catecholamines, which appear to be increased in some disorders such as the sertoli cell-only syndrome and hypospermatogenesis. mast cells are a normal component of the testicular interstitium, where they are often found near blood vessels. their number increases in several diseases. 224 the testis is supplied by the testicular artery, which arises from the abdominal aorta. in the spermatic cord, the testicular artery gives rise to two or three branches that obliquely penetrate the tunica albuginea testis and to multiple branches that run along the intralobular septa of the testis. 225 these centripetal arteries lead to the mediastinum testis. along their course, the centripetal arteries give off branches that abruptly reverse direction; these are called centrifugal arteries. at puberty, both the centripetal and the centrifugal arteries develop a pronounced spiral architecture. 226, 227 the centrifugal arteries develop additional branches in the testicular interstitium, giving rise to arterioles and capillaries that form intertubular plexuses, some of which are apposed to the tunica propria. 228, 229 capillaries are of the continuous type, except for the seminiferous tubule capillaries, which are partially fenestrated, 230 and their endothelial cells are similar to those of brain capillaries, with scant pinocytosis, intercellular junctions of the fascia adherens type, and low permeability. the mediastinum testis is poorly vascularized. the inner two-thirds of the testicular parenchyma is drained by veins that follow the interlobular septa to the mediastinum testis (centripetal veins). the outer third is drained by veins that lead to the tunica albuginea (centrifugal veins). both centripetal and centrifugal veins join to form the pampiniform plexus, which drains the testis via the spermatic cord. 
lymphatic vessels are poorly developed in the testis and limited to the tunica vasculosa and interlobular septa, 231 where they accompany arterioles and venules. prelymphatic vessels have been reported in the interstitium and probably drain interstitial fluid into the true interlobular lymphatic vessels. efferent innervation of the testis is mainly supplied by neurons of the pelvic ganglia, where contralateral and bilateral neural connections occur. postganglionic nerve fibers enter the testis via the pelvic nerves, extend throughout the tunica vasculosa, and follow the interlobular septa to reach the interstitium. these nerve fibers end in the wall of arterioles, the wall of seminiferous tubules, and the leydig cells. 232 adrenergic nerve fibers innervate the tunica albuginea and the blood vessels of the tunica vasculosa. 233 peptidergic nerve endings are uncommon. afferent nerve endings form corpuscles similar to those of meissner and pacini in the tunica albuginea. the rete testis is a network of channels and cavities that connects the seminiferous tubules with the ductuli efferentes. differences in the configuration and size of channels and cavities distinguish three portions of the rete testis: septal (intralobular), composed of the tubuli recti; mediastinal, composed of a network of interconnected channels; and extratesticular, composed of dilated cavities (up to 3 mm in diameter) termed the bullae retis. the tubuli recti are short tubules (0.5-1 mm long) that connect the seminiferous tubules to the mediastinal rete, although some seminiferous tubules may connect directly to the mediastinal rete, principally those in the central region of the testis. the tubuli recti are lined by cuboidal epithelium. there are approximately 1500 tubuli recti (or their analogous seminiferous tubule segments). 
the tubuli recti in the cranial, central, and anterior testis are perpendicular to the mediastinal rete testis channel into which they drain, and those in the caudal testicular region are parallel to their respective channels. the transitional segments between the seminiferous tubules and the tubuli recti are formed by modified sertoli cells. 234 the epithelium of the mediastinal rete testis consists of flattened cells interspersed with small areas of columnar cells. both cell types have single centrally located cilia and numerous microvilli on their free surfaces, and contain keratin and vimentin filaments. 235 there are interdigitations between adjacent cells. the epithelium rests on a basal lamina, surrounded by a layer of myofibroblasts and a more peripheral layer of fibroblasts and collagen and elastic fibers. the rete channels and cavities are traversed by the chordae rete, columns from 15 µm to 100 µm long and from 5 µm to 40 µm wide, arranged obliquely to the long axis of the cavity. the chordae consist of fibrous connective tissue with fibroblasts and are covered by flattened epithelium; the widest contain capillaries. the rete testis probably has the following functions: damping differences in pressure between the seminiferous tubules and ductuli efferentes; reabsorption of protein and potassium from tubular fluid; and, occasionally, phagocytosis of spermatozoa. anorchidism refers to the absence of one (monorchidism) or both testes (testicular regression syndrome). monorchidism is estimated to occur in about 4.5% of cryptorchid testes, 236 40% of the testes that are impalpable in physical examination, 237 or 1 in 5000 males. bilateral anorchidism occurs in approximately 1 in 20 000 males. 238 monorchidism the hormonal pattern in prepubertal patients with monorchidism does not differ from that of normal children, whereas children lacking both testes have elevated levels of gonadotropins and fail to respond to stimulation with hcg. 
[238] [239] [240] although the hcg stimulation test is often positive in children with bilateral cryptorchidism, it is negative in some children with bilateral intra-abdominal cryptorchidism, and this further complicates the differential diagnosis between anorchidism and cryptorchidism. 241 for unknown reasons, the left testis is more frequently absent (68.7%) than the right. in such cases the contralateral scrotal testis undergoes compensatory hypertrophy and its volume increases to more than 2 ml. 242 compensatory hypertrophy has also been reported in association with abdominal cryptorchid testis. 243 the absence of testicular parenchyma should be confirmed before diagnosing monorchidism. at exploration, the finding of a vas deferens ending near or in a hypoplastic epididymis is not sufficient for the diagnosis of monorchidism. the only acceptable finding is blind-ending spermatic vessels. if inguinoscrotal exploration fails to identify these vessels, intra-abdominal exploration is required to rule out an undescended testis and avoid the development of a testicular tumor. 224 all remnants found at exploration should be removed. 245 testicular regression syndrome testicular regression syndrome refers to a variety of conditions, including agonadism, anorchidism, testicular agenesis, rudimentary testes, hypoplastic testes, and embryonal testicular dysgenesis. 246 these syndromes share a complete absence or involution of both testes 247 but differ in the time of testicular disappearance during development. the most frequently observed are swyer's syndrome (see discussion on gonadal dysgenesis below), true agonadism, rudimentary testes, bilateral anorchidism, vanishing testes syndrome, and leydig cell-only syndrome (table 12-5). 
true agonadism (46xy gonadal agenesis syndrome) patients with true agonadism have ambiguous external genitalia, fusion of the labia, and a short vagina, reflecting very early testicular regression (between the eighth and 12th weeks of embryonal development). the internal genitalia consist of a uterus and two uterine tubes, although both müllerian and wolffian derivatives may be absent. no gonads (not even in an ectopic location) are found. patients are phenotypically girls, and the male gender may be discovered only at the time of referral for other symptoms. 248 both sporadic and familial cases with associated extragenital anomalies have been reported. in some cases the cause is a heterozygous mutation of wt1. 249 in most familial cases inheritance is either autosomal recessive or x-linked, and the cause seems to be either unknown anomalies in the wt1 gene or known anomalies in other genes involved in development. 250 an sry molecular defect has never been observed. 251 agonadism may be associated with several syndromes, including those of pagod (hypoplasia of lungs and pulmonary artery, agonadism, omphalocele/diaphragmatic defect, dextrocardia), 252 kennerknecht, 253 seckel, 254 and charge. 255 rudimentary testes syndrome patients with rudimentary testes have a normal male phenotype. müllerian remnants are absent and wolffian derivatives usually are found. the testes are cryptorchid and very small, less than 0.5 cm long. seminiferous tubules are few (fig. 12-25). the testicular regression occurs between the 14th and 20th weeks of gestation. this syndrome has been reported in several members of the same family, 256 suggesting genetic transmission, but this is not a constant feature. 257, 258 congenital bilateral anorchidism congenital bilateral anorchidism occurs in 1 in 20 000 newborns. 
the patients have male external genitalia, but the internal genitalia consist only of normal wolffian derivatives without müllerian derivatives, suggesting that the testes were present and functionally active up to approximately the 20th week of gestation. patients have male external genitalia with hypoplasia of both the scrotum and penis. the karyotype is normal male. the disorder may be associated with other malformations, such as anal atresia, rectourethral and rectovaginal fistula, and urinary exstrophy. patients diagnosed at adulthood have male phenotype, androgen insufficiency symptoms, and elevated levels of both fsh and lh. 259, 260 familial incidence in some cases suggests sry gene mutation, but this has not been confirmed. 261, 262 vanishing testes syndrome this term refers to the disappearance of one or both testes between the last months of intrauterine life and the beginning of puberty. [263] [264] [265] as testicular regression occurs after the seventh month, exploration finds the vas deferens in the inguinal canal or high in the scrotum; it may be accompanied by the epididymis and, less frequently, by testicular remnants consisting of small groups of seminiferous tubules (fig. 12-26). patients lacking both testes develop hypergonadotropic hypogonadism after puberty, with gynecomastia, infantile phallus, hypoplastic scrotum, and impalpable prostate. the condition is usually secondary to a perinatal scrotal torsion, 266 although rarely there is a genetic cause. 267, 268 leydig cell-only syndrome patients with leydig cell-only syndrome have agonadism without eunuchoidism and a normal male phenotype, although meticulous surgical exploration fails to find testicular remnants. study of serial sections from the spermatic cord reveals clusters of leydig cells. 
269 detection of testosterone in spermatic vein blood indicates that these ectopic leydig cells are functionally active and synthesize testosterone in amounts sufficient to induce a rudimentary male phenotype but insufficient to support the complete development of secondary sex characteristics. the morphology of spermatic cord remnants is similar in monorchidism and testicular regression syndrome occurring after the 20th week of gestation. [270] [271] [272] grossly, a small, firm mass is found at the end of the cord (fig. 12-27). histologic examination reveals vas deferens, epididymis, or small groups of seminiferous tubules in 69-83% of cases. 273 vas deferens is the most constant finding (79%), followed by epididymis (36%) and seminiferous tubules (5-13%). the spermatic vessels are abnormally small in 83% of cases. 245, 274 areas of dystrophic calcification, hemosiderin deposition, and giant cell reaction may be found within the mass in place of the testis. other findings include arterial and venous vessels (88%), fat (44%), and nerves that may resemble traumatic neuroma (56%). the minimal requirement to diagnose vanishing testis is to find either a vascularized fibrous nodule with calcification or hemosiderin, or a fibrous nodule with cord elements. 275 it has been proposed that removal of the testicular nubbin in this syndrome may not be required because the percentage of seminiferous tubules is very low and germ cells are rarely present, and thus the probability of a tumor is minimal. 276, 277 the general recommendation is scrotal exploration as a first step, reserving laparoscopy for cases in which the atrophic remnant either cannot be identified during scrotal exploration or has a patent vaginal process. 266 the histologic findings suggest that most cases of unilateral and bilateral anorchidism are produced during the fetal period after the testis has inhibited the müllerian ducts and induced differentiation of wolffian duct derivatives. 
two hypotheses account for the disappearance of the testes: primary anomaly of the gonad; and atrophy secondary to a vascular lesion such as thrombosis or intrauterine torsion. the presence of macrophages with hemosiderin and dystrophic calcification supports the latter. absence of one testis may be associated with malformations of the urogenital system, such as absence of the kidney, cystic seminal vesicles, and ipsilateral renal dysgenesis. 278, 279 the clinical term microorchidism refers to diverse conditions (klinefelter's syndrome, hypogonadotropic hypogonadism, rudimentary testes syndrome, bilateral cryptorchidism, etc.) that share small testicular size. 280, 281 a peculiar case is presented by some patients with kenny-caffey syndrome: short stature, cortical thickening and medullary stenosis of long bones, delayed closure of anterior fontanelles, hypoparathyroidism, and several ocular alterations. fsh serum levels are elevated, but only in some cases, whereas lh and testosterone are normal. adult testes are small, with seminiferous tubules showing complete but diminished spermatogenesis. leydig cells are hyperplastic. unlike patients with the rudimentary testes syndrome, microorchidism patients have a normal-sized penis and no epididymal or prostatic atrophy. 282 polyorchidism is a rare condition, with approximately 100 reported cases. 283, 284 it was first described in a postmortem study in 1880, 285 and the first case treated surgically and confirmed histologically was reported in 1895. 286 although three testes are the most common, 287 four testes have been reported in six patients, [288] [289] [290] [291] [292] and five in one case but without histologic confirmation. 293 age of diagnosis varies from newborn to 74 years, with a mean of 17 years. 
testicular duplication is usually an incidental finding during surgery for inguinal hernia, cryptorchidism, or testicular torsion, but has also been detected in patients with infertility or unexplained fertility after bilateral vasectomy. 294 the extra testis is often intrascrotal (75%) and less frequently inguinal (20%), abdominal, 295 or retroperitoneal (5%). 296, 297 duplication is three times more frequent on the left than on the right. 298 high-resolution ultrasound is the appropriate diagnostic technique. 284, 299 testicular maldescent (40%), inguinal hernia (30%), hydrocele, varicocele, and contralateral cryptorchidism are the most frequently associated anomalies. [300] [301] [302] testicular torsion (13%) 303 and testicular cancer (5.4%) are occasional complications. although the extra testis may be histologically normal, 304-306 usually it is not, 300, 307 and displays lesions such as sertoli cell-only tubules, hypospermatogenesis, or maturation arrest. the lack of spermatogenesis has been attributed to the anomalous location of the testis and the absence of communication between the testis and excretory ducts. 308 the embryologic origin of polyorchidism remains uncertain, and several mechanisms have been proposed to account for the variety of findings in different cases (fig. 12-28). the clinical differential diagnosis of polyorchidism includes most pathologic conditions that enlarge the scrotum and spermatic cords: spermatocele, hydrocele, cysts and tumors of the spermatic cord, crossed testicular ectopia, adrenal cortical ectopia, and splenogonadal fusion. orchidectomy used to be the treatment of choice for all atrophic and non-scrotal testes. today, most surgeons undertake fixation of the testis to the scrotal pouch and the re-creation of a 'simple testis' if the anatomical condition permits and malignancy has been excluded. this treatment may allow spermatogenesis as well as additional psychologic and cosmetic benefits. 
313 intrascrotal rhabdomyosarcoma, testicular teratoma, and seminoma have been reported in patients with polyorchidism. 314, 315 macro-orchidism may be unilateral or bilateral and may be associated with chromosomal anomalies or endocrine alterations. an increase in the testicular parenchyma occurs in several conditions, 316 including congenital leydig cell hyperplasia, compensatory hypertrophy, benign idiopathic macroorchidism, bilateral megalotestes with low gonadotropins, fragile x chromosome, and the testicular hypertrophy observed in juvenile hypothyroidism. congenital leydig cell hyperplasia is uncommon and may be diffuse or nodular. the diagnosis of diffuse leydig cell hyperplasia requires quantification of leydig cells by morphometry, using normal newborn testes as controls (fig. 12-29). nodular leydig cell hyperplasia is characterized by the presence of non-encapsulated leydig cell nodules in the mediastinum testis, adjacent testicular parenchyma, and connective tissue among the ductuli efferentes (fig. 12-30). the differential diagnosis of nodular leydig cell hyperplasia includes intratesticular adrenal rests and bilateral leydig cell tumor. except for patients with adrenogenital syndrome, intratesticular adrenal rests are rare. these rests are encapsulated, with the exception of the adrenogenital tumors, and consist of radially arranged cells with vesicular nuclei and small nucleoli displacing the rete testis or seminiferous tubules. leydig cell tumors may be bilateral, poorly circumscribed, and surrounded by testicular parenchyma, features that make them difficult to distinguish from leydig cell hyperplasia. however, leydig cell tumors are rarely congenital, whereas those occurring at infancy often induce precocious maturation of the adjacent seminiferous tubules and early macrogenitosomia. leydig cell hyperplasia is caused by large quantities of hcg entering the fetal circulation. 
diabetic mothers, particularly those with hypertension, may develop hyperplacentosis; the resulting edema in the placental villi alters the vascular permeability and allows the passage of hcg to the fetus. congenital leydig cell hyperplasia decreases rapidly during the first months of postnatal life, after maternal human chorionic gonadotropin is gone. combined diffuse and nodular leydig cell hyperplasia occurs in several malformative syndromes, such as beckwith-wiedemann, leprechaunism, triploid fetuses, fetuses with rh isoimmunization, 317 and in several complications of pregnancy. compensatory hypertrophy has been observed in monorchidism, 318 cryptorchidism 319 (fig. 12-31), varicocele, 320 and after testicular injury. hypertrophy persists and may increase during childhood and puberty, but ceases thereafter; the hypertrophied testis then becomes normal or remains slightly enlarged. 321, 322 the degree of hypertrophy is determined by three factors: the volume of the remaining testicular parenchyma, the age at which the injury occurred, and the functional ability of the descended testis. 323 compensatory hypertrophy results from an alteration in the hypophyseal hormonal feedback mechanism, followed by an increase in secretion of fsh, evidence that the contralateral testis is normal. in monorchidism, the testis is initially normal. 237 when a 50% reduction of testicular mass occurs (probably before birth), the endocrine feedback changes and the resulting secretion of fsh (before or immediately after birth) causes accelerated growth of the contralateral testis. in cryptorchidism, the reduction in testicular mass is less severe than in monorchidism, and the scrotal testis may also be abnormal, inducing a lesser compensatory hypertrophy. compensatory hypertrophy develops between birth and 3 years of age, and the testis may reach a volume twice normal when the other testis is absent. 
243 some prepubertal and pubertal patients have pronounced unilateral 324 or bilateral [325] [326] [327] testicular hypertrophy in the absence of other pathologic findings. this probably results from hormonal receptivity in the testicular parenchyma. morphometric studies have shown that the testicular enlargement is chiefly due to an increase in the length of the seminiferous tubules, although increases in tubular diameter and sertoli cell numbers have also been observed. elevated fsh serum levels, reported in some cases, or hyperactive fsh receptors might be the cause of the excessive sertoli cell proliferation and the lengthening and thickening of seminiferous tubules. [328] [329] [330] in addition, leydig cell hyperplasia and deficient spermatogenesis are frequent findings in adult life. as the development of the two testes may be asynchronous during puberty, some unilateral macroorchidisms may represent cases in which these differences are unusually exaggerated. about 2% of adults with fertility problems have enlarged testes, with volumes over 25 ml, and low levels of fsh, lh, testosterone, prolactin, and estradiol. 331 despite the important hormonal changes, sperm concentrations and total numbers of spermatozoa are higher than normal. low fsh levels may be attributable to increased inhibin secretion because the number of sertoli cells is elevated in these testes, but no explanation for the reduction in the other hormone levels has been found. fragile x chromosome is the best-known form of inherited mental retardation, with an incidence of 1 in 1500 males and 1 in 2500 females. 332 in addition to facial dysmorphia (large ears, prognathism, high forehead, and arched palate), macroorchidism (martin-bell syndrome) is often an associated finding. [333] [334] [335] [336] [337] the impaired gene (fmr1 gene) is mapped to xq27.3, which is genetically fragile. 
the gene alteration is due to a lengthening of a trinucleotide cgg repeat that results in fmr1 gene silencing. if the cgg sequence is repeated fewer than 200 times, the disorder is considered a premutation and males show no symptoms; if the number of repetitions exceeds 200, the mutation is complete and all show the disorder. [338] [339] [340] in men with this syndrome, the average testicular volume is more than 70 ml (four times greater than normal). the penis also is larger than normal, and both anomalies are apparent in infancy. the scrotum is also enlarged and prematurely pigmented. this precocious genital development is difficult to explain because the hypothalamopituitary axis is normal, but it may be caused by increased sensitivity to stimulation by fsh. 341 testicular biopsies from adults may be normal or show interstitial edema and hypospermatogenesis (fig. 12-32). usually, there is normal testicular parenchyma with focal reduced spermatogenesis and sertoli cell hyperplasia (fig. 12-33) or tubules containing only immature sertoli cells. morphometry indicates that testicular enlargement is chiefly the result of lengthening of seminiferous tubules. 328 the low number of spermatids is attributed to atrophy caused by compression of the seminiferous epithelium by marked increase in intratubular fluid. 342 meiotic anomalies have been excluded. 343 the fragile x syndrome is second in frequency only to down's syndrome as a cause of mental retardation. [344] [345] [346] however, this chromosomal anomaly is not always associated with mental retardation or macroorchidism, and there are men with fragile x syndrome who are otherwise normal. 347 the terms 'fragile x-negative martin-bell syndrome' or 'x-linked mental retardation-macro-orchidism' (mrmo or xlmr+mo) refer to patients who have the martin-bell syndrome phenotype but do not present the fragile x site. the gene responsible for this disorder is mapped to xq12-q21. 348 
353, 358, 359 the etiopathogenesis has been explained by three hypotheses: an increase in gonadotropin secretion caused by trh stimulation of gonadotropic cells; 360, 361 a direct tsh effect on the testis due to the structural similarity between tsh receptors and fsh receptors present in the testis; 362 and a lack of steroid hormones that are required for testicular maturation (in their absence, sertoli cell proliferation is excessive, giving rise to testicular enlargement). [363] [364] [365] [366] precocious puberty precocious puberty is defined by onset of secondary sex characteristics at a chronologic age that is below the mean middle age for the population. for practical purposes, this is considered to be before 8 years of age in girls and 9 years in boys. the incidence is estimated at between 1 in 5000 and 1 in 10 000, with a female:male ratio higher than 20:1. in boys, the first symptom is rapid testicular enlargement followed by growth of pubic and axillary hair, enlargement of the penis, and acceleration of skeletal growth. 367 according to hypothalamopituitary-gonadal axis function, precocious puberty can be classified into three groups: central or gonadotropin-dependent, which results from the activation of this axis; peripheral or gonadotropin-independent, mediated by sex steroid hormones secreted by the testis or adrenal glands; and a mixed group that first appears as peripheral precocious puberty and thereafter, because of the secondary response of the hypothalamus, becomes gonadotropin dependent. other possible causes of precocious puberty are hypoprolactinemia, pituitary tumor, and alteration of testicular steroid metabolism. central precocious puberty (cpp) central precocious puberty, also known as true precocious puberty, is isosexual. it is the most common form of precocious puberty in girls and accounts for more than 50% of cases in boys. the age of presentation is between 4 and 10 years. 
368 the cause is only known in 60% of cases; most are related to lesions in the central nervous system, whereas the others are usually idiopathic. lesions in the central nervous system that cause cpp share alterations of specific areas, including the posterior hypothalamus (median eminence and tuber cinereum), mammillary bodies, the bottom of the third ventricle, or the pineal gland. 369 testicular hypertrophy appears associated with fsh-secreting pituitary adenoma, 349 hyperprolactinemia, hypoprolactinemia, and hypothyroidism. 350, 351 the most frequent association of testicular hypertrophy is with hypothyroidism. children with hypothyroidism often show testicular enlargement without virilization. 350 about 80% have macroorchidism, 352 most have elevated fsh levels, and half have increased lh levels. 353, 354 testosterone levels are normal during infancy. the response of fsh and lh to gnrh is altered and no pulsatile lh release occurs (fig. 12-34). 355 testicular biopsies before puberty show an accelerated development of the testis with pubertal maturation of seminiferous tubules but not leydig cells. testicular biopsies in untreated adults show tubular and interstitial hyalinization with few leydig cells. 356, 357 testicular size in this type of macroorchidism diminishes as soon as the substitutive therapy starts.
• hereditary diseases such as neurofibromatosis and tuberous sclerosis. children with type i neurofibromatosis often also have optic pathway tumors.
• cerebral irradiation, as occurs in hypothalamopituitary selective irradiation, 378 prophylactic irradiation in children with acute lymphoblastic leukemia, 379 and irradiation of cerebral tumor that is far from the hypothalamopituitary region. 
the diagnosis of central precocious puberty is easy if the hormonal findings show elevated gonadotropin levels (both basal values and in response to gnrh), associated with high testosterone levels and an increase in either lh/fsh ratio or in lh and fsh values after stimulation with gnrh agonists. however, in some cases it is necessary to measure nocturnal lh secretion to find secretion pulses before a dynamic test can reveal the pubertal pattern. knowledge of the etiology in males has improved with the use of ct and mri. 380, 381 one of the most important contributions of these techniques is the finding of a high number of hamartomas in children with precocious puberty. [382] [383] [384] these lesions, also known as gangliocytomas, consist of abnormally located neurons and glial cells. lesions are usually multiple, small, and located on the hypothalamus between the anterior part of the mammillary body and the posterior part of the tuber cinereum. these neurons contain lhrh-positive neurosecretory granules, suggesting that this hormone can be released into the blood draining the hypophyseal portal system and reach the gonadotropic cells. 385 precocious puberty owing to cerebral tumors usually occurs with advanced stage of the tumor, preceded by cerebral symptoms such as hydrocephaly, papillary edema, or psychic alterations. the same occurs when precocious puberty results from cerebral inflammation or cerebral malformation. although pineal gland tumor is rare in children, 30% produce precocious puberty, principally in boys. this tumor is usually a teratoma or non-parenchymatous tumor that destroys the pineal gland, hindering its antigonadotropic action and initiating puberty. 386 in contrast, pinealocyte-derived tumor secretes great amounts of melatonin that delay the onset of puberty.
peripheral precocious puberty (ppp)
peripheral precocious puberty is also known as precocious pseudopuberty.
it may be caused by a primary testicular disorder, a lesion in other endocrine glands, or hormonal treatment. primary testicular disorders causing precocious pseudopuberty include familial testotoxicosis, functioning testicular tumor, excessive aromatase activity, or leydig cell hyperplasia with focal spermatogenesis. the principal secondary anomalies include adrenal cortical anomaly (congenital adrenal hyperplasia, virilizing tumor of the adrenal, and nelson's syndrome), and lesion secondary to hcg-secreting tumor (hepatoblastoma accounts for half of precocious pseudopuberty cases, and testicular germ cell tumor and the tumors of the retroperitoneum, mediastinum, and pineal gland are responsible for the other half of cases). 387
familial testotoxicosis: gonadotropin-independent precocious puberty (gipp) or familial male-limited precocious puberty (fmpp)
familial testotoxicosis is a form of male sexual precocity characterized by early differentiation of leydig cells and the initiation of spermatogenesis in the absence of stimulation by pituitary gonadotropin. this is a primary testicular abnormality with autosomal dominant inheritance. 388, 389 ultrastructural studies confirm an adult leydig cell pattern and complete spermatogenesis, although many spermatids are abnormal. 390 the cause of familial testotoxicosis is a constitutive activating mutation of the lh receptor gene. 391 this gene comprises 11 exons and has been mapped to 2p21. hormonal measurements show elevated serum levels of testosterone, and low levels of dehydroepiandrosterone sulfate, androstenedione, 17-hydroxyprogesterone, gonadotropin-releasing hormone (gnrh), and lh, as well as absence of a pulsatile pattern. in addition, serum levels of inhibin b appear elevated before the normal age of onset of puberty. 392 in some patients, a mutation in the lh receptor induces leydig cell adenoma.
393
precocious puberty secondary to functioning testicular tumor
a syndrome of precocious puberty can be the result of different tumors, including leydig cell tumor, sex cord tumor, adrenal cortex virilizing carcinoma, and extratesticular hcg-secreting germ cell tumor. leydig cell tumor may cause precocious puberty. the testis is enlarged owing to tumor growth and maturation of the seminiferous tubules adjacent to the tumor; such maturation results from androgen secretion by tumor cells (fig. 12-35). in most cases, the contralateral testis is not enlarged. 394, 395 sex cord tumor with annular tubules and large cell calcifying sertoli cell tumor may give rise to precocious pseudopuberty that is isosexual (development of musculature and axillary and pubic hair) and heterosexual (gynecomastia). this precocious testicular maturation and the development of the tumor itself cause testicular enlargement. it has been suggested that tumor cells stimulate leydig cells to produce androgens that are aromatized to estrogens by the tumor cells themselves, thus accounting for the clinical symptoms. these tumors are frequently observed in peutz-jeghers syndrome 396,397 and carney's complex. 398 most infants with adrenal cortex virilizing tumors have small testes, but some cases of testicular hypertrophy have also been observed. 399 testicular development in these cases is attributed to adrenal androgenic action on seminiferous tubules. 400 in untreated (or maltreated) congenital adrenal hyperplasia, both testes can be enlarged because they contain growing masses of adrenal cortex-like cells. 401 a similar condition is observed in nelson's syndrome. testicular enlargement is modest in paraneoplastic precocious pseudopuberty secondary to hepatoblastoma 402 or extratesticular hcg-secreting germ cell tumor, although nodular or diffuse precocious maturation has been occasionally reported.
403
precocious pseudopuberty secondary to excessive aromatase activity
biosynthesis of c18 estrogens from c19 androgens occurs by three consecutive oxidative reactions that are catalyzed by an enzymatic complex known as estrogen synthetase or aromatase. 404 this complex has two components: p450 arom (a product from the cyp19 gene located on 15q21.1), 405 which binds the c19 substrate and catalyzes the insertion of oxygen in c19 to form c18 estrogens; and nadph-cytochrome p450 reductase, a ubiquitous flavoprotein that conveys reducing equivalents to any form of cytochrome p450 it meets. aromatase is in the endoplasmic reticulum of estrogen-synthesizing cells and expressed in placenta, ovarian granulosa, sertoli cells, leydig cells, adipose tissue, and several central nervous system regions, including the hypothalamus, amygdala, and hippocampus. excessive aromatase causes excessive conversion of androgens to estrogen, 406 and is a heterogeneous genetic disorder with an autosomal dominant inheritance. the disorder leads to heterosexual precocious pseudopuberty with gynecomastia in males, and to isosexual precocity and macromastia in females. ultimately, patient stature is short because of the potent ability of androgens to accelerate epiphyseal closure. most males are fertile and have normal libido. 407 generally, the inhibitory estrogenic effect on testicular function is less than that observed with estrogen-producing tumors or in patients treated with exogenous estrogens. excessive aromatase caused by p450 mutation induces alterations in both males and females. in females lacking estrogens owing to desmolase deficiency, excessive aromatase leads to pseudohermaphroditism and progressive virilization at puberty; conversely, pubertal development is normal in males. in children, fsh and lh levels and gonadotropin response to gnrh are normal, suggesting that the role of estrogens in pituitary regulation is weak during infancy.
408 in both genders, epiphyseal closure is delayed and a eunuchoid habitus results. adult males have small testes, severe oligozoospermia, and complete asthenozoospermia; fsh and lh levels are high, testosterone levels are normal, and serum estrogen levels are very low. all patients with excessive aromatase have short stature, with continuing linear growth into adulthood, unfused epiphyses, osteoporosis, bilateral genu valgum, and eunuchoid proportions. the testes show macroorchidism with normal testicular consistency in some cases, 409 and are small with severe oligozoospermia and 100% immotile spermatozoa in other cases. 410 a syndrome similar to that of excessive aromatase production is found in patients with estrogen resistance caused by disruptive mutations of the er gene. these patients show macroorchidism, elevated testosterone levels, and increased levels of fsh, lh, estradiol, and estrone. 411
precocious pseudopuberty secondary to leydig cell hyperplasia with focal spermatogenesis
this entity can present with clinical symptoms similar to those of a functioning leydig cell tumor; this is a precocious pseudopuberty with ipsilateral testicular enlargement. 412 the testes contain hypertrophic leydig cell nests in association with normal spermatogenesis. no tumoral mass is seen. leydig cells do not contain reinke's crystalloids and do not compress the seminiferous tubules. there is a clear delimitation between tubules with spermatogenesis and infantile immature tubules. the differential diagnosis between this entity and leydig cell tumor with precocious pseudopuberty is based on the histological pattern. open excisional testicular biopsy is recommended; if there is leydig cell tumor, or the diagnosis by frozen section is not conclusive, removal is advisable. 413 there are no data to suggest that this hyperplasia might develop into leydig cell tumor.
mixed precocious puberty
the best known form is the mccune-albright syndrome (mas), characterized by the association of 'coffee and milk' pigmentary lesions in the skin, bone lesions (polyostotic fibrous dysplasia), enlarged testes, prepubertal size of the penis, and absence of pubic and axillary hair. although testicular enlargement is usually bilateral, unilateral macroorchidism may be the first symptom. 414 an interesting finding is that the onset of testicular maturation is induced by the testis itself, which produces steroid secretion due to autonomous hyperfunction of sertoli cells without evidence of leydig cell involvement. 415 this secretion causes early maturation of the hypothalamopituitary-testicular axis and, subsequently, true precocious puberty. 416 serum levels of testosterone are low, but those of inhibin b and amh are abnormally increased. this syndrome is caused by mutations that activate the gnas-1 gene, which encodes the α subunit of the trimeric g-protein. because mutations are lethal in utero, those subjects producing amh bear a mosaic chromosomal constitution for this deficiency.
a testis is ectopic when it is in a location outside the normal path of descent. unlike cryptorchid testes, ectopic testes are nearly normal in size and are accompanied by a spermatic cord that is normal or even longer than normal, and by a normal scrotum. 417 testicular ectopia is classified according to location; [418] [419] [420] [421] [422] in decreasing order of frequency, the major types are:
• interstitial or inguinal superficial ectopia. this is the most frequent form and may be confused with inguinal cryptorchidism. after passing through the outer genital opening, the testis ascends to the anterosuperior iliac spine and remains on the aponeurosis of the major oblique muscle. these testes often are more nearly normal histologically than are cryptorchid testes.
• femoral or crural ectopia.
after passing through the inguinal canal, the testis lodges in the high crural cone in scarpa's triangle.
• perineal. the testis is located between the raphe and the genitocrural fold.
• transverse or crossed ectopia. both testes descend through the same inguinal canal and lodge in the same scrotal pouch. each possesses its own vascular supply, epididymis, and vas deferens. in addition, there is ipsilateral hernia. [423] [424] [425] [426] [427] [428] [429] [430] between 20% and 40% of patients with this ectopia have persistent müllerian duct syndrome [431] [432] and show a high incidence of testicular germ cell tumor. 433
• pubopenile ectopia. the ectopic testis is on the back of the penis near the symphysis pubis. 434
• pelvic ectopia. the testis is in the pelvis, usually in the depth of douglas' cul-de-sac.
• other unusual testicular ectopias include retroumbilical, craniolateral to the inner inguinal opening between the outer and inner oblique muscles, and subumbilical. 435
rarely, the testis and its spermatic cord may protrude through a defect in the scrotal skin, a condition called testicular exstrophy. 436 the term testicular dislocation refers to testes that secondarily disappear from the scrotum and lodge around the superficial inguinal ring, within the inguinal ring, or inside the abdominal cavity as a result of testicular trauma. the formation of canalicular and intra-abdominal dislocation requires the presence of previous inguinal hernia. 437
testicular fusion is a rare anomaly characterized by fusion of the testes to form a single structure, usually in the midline. each has its own epididymis and vas deferens. this anomaly is often associated with other malformations, such as fusion of the adrenal glands or horseshoe kidney.
cystic dysplasia of the testis is a congenital lesion characterized by cystic transformation of an excessively developed rete testis that may extend to the tunica albuginea of the opposite pole.
438 to date, fewer than 40 cases have been reported. 439, 440 the seminiferous tubules may be dilated and atrophic; this is more evident after puberty. ultrasound images are characteristic. 441, 442 cysts arise in the septal and mediastinal rete testis (fig. 12-36); they are interconnected and contain acellular, eosinophilic, periodic acid-schiff-positive material. they are lined by cuboidal cells that resemble those of the normal rete testis. [443] [444] [445] the connective tissue between the cysts is scant and histologically similar to the interstitial connective tissue. there may be small groups of cysts limited to the region of the mediastinum testis, or cysts extending throughout the entire testis. in extensive cases, residual seminiferous tubules occupy only a small crescent beneath the tunica albuginea and the testis is grossly spongy. cystic dysplasia occurs in normally descended and cryptorchid testes in children and adults, and may affect one or both testes. 446 in adults, the residual parenchyma often shows complete tubular sclerosis or hypospermatogenesis with intratubular accumulation of spermatozoa and leydig cell pseudohyperplasia. in most cases the epididymis is altered. 447 the head of the epididymis is small and contains few ductuli efferentes with irregular, usually dilated lumina. the ductus epididymidis is dilated, has an atrophic epithelium, and thick connective tissue replaces the muscular layer (fig. 12-37). testicular cystic dysplasia is frequently associated with severe anomalies of the urinary system. renal agenesis, [446] [447] [448] [449] renal dysplasia, 446 hydroureter, and urethral stenosis 450 have been reported ipsilateral to cystic dysplasia. the clinical differential diagnosis should consider all cystic testicular lesions impairing prepubertal testes, including epidermoid cyst, cystic teratoma, juvenile granulosa cell tumor, testicular lymphangiectasis, and simple cyst of the testis.
451 the presence of ipsilateral renal anomalies during ultrasound exploration provides an important diagnostic clue. 452 previously, orchidectomy was the treatment of choice, but testis-sparing surgery 453 is now recommended. 454, 455 the etiology and pathogenesis of cystic dysplasia are uncertain. given that the rete testis is a mesonephric derivative and most of the associated renal malformations are apparently caused by failure in the induction of renal blastema by the mesonephros, cystic dysplasia is considered to be the result of an abnormal mesonephros. during childhood, the normal rete testis has no lumina, and these form during puberty. the adult rete testis is a conduit for the passage of tubular fluid and spermatozoa and also actively reabsorbs part of this fluid while adding ions, proteins and steroids to it. malfunction of the rete testis cells may cause the formation of excessive fluid of abnormal composition, resulting in a condition morphologically similar to cystic dysplasia of the rete testis induced in fowl by sodium intoxication or the administration of the salt-retaining hormone deoxycorticosterone acetate.
gonadoblastoid testicular dysplasia refers to an abnormally differentiated testicular parenchyma beneath the tunica albuginea. 456 the anomaly consists of large tubular or nodular structures within a dense stroma, reminiscent of ovarian stroma (fig. 12-38). each structure is composed of three cell types: cells with vesicular nuclei and vacuolated cytoplasm; cells with hyperchromatic nuclei; and germ cell-like cells. the former two types are arranged at the periphery, forming a pseudostratified epithelium. cells of the third type resemble fetal spermatogonia and are fewer in number. these structures contain eosinophilic, periodic acid-schiff-positive material, similar to call-exner bodies (fig. 12-39). there may be continuity between these structures and normal seminiferous tubules.
the differential diagnosis includes conditions showing anomalous seminiferous tubules at the gonadal periphery, including testicular dysgenesis and gonadoblastoma. testicular dysgenesis also presents tubular or cord-like structures, but these are differentiated (some form true seminiferous tubules) and may also be present within a poorly collagenized tunica albuginea; patients with testicular dysgenesis are male pseudohermaphrodites with müllerian remnants. gonadoblastoma usually appears in a streak gonad or dysgenetic gonad and contains granulosa-sertoli cells and germ cells that are similar to those of dysgerminoma or seminoma; these cells are absent in gonadoblastoid testicular dysplasia. several cases with this disorder have been reported in patients with walker-warburg syndrome. 457, 458
this disorder refers to the presence, in an adult testis, of one or several foci of infantile (immature) seminiferous tubules. each group of tubules appears well delimited but unencapsulated. nodule size varies from microscopic to 5 mm. on section, each nodule is distinguished by its whitish color. sertoli cell nodule is found in most adult cryptorchid testes, regardless of when the testes descended. it is also present in 22% of normal scrotal testes in some series, 459 and is an occasional finding in males with idiopathic infertility. the seminiferous tubules have a prepubertal diameter and may be anastomotic. the epithelium is columnar or pseudostratified, devoid of lumina, and usually consists only of sertoli cells (fig. 12-40). the cells have elongated hyperchromatic nuclei with one or several peripherally placed small nucleoli. 459 the interstitium varies from scant to well collagenized. leydig cells are usually absent in these areas and, if present, their numbers are low. study of serial sections reveals continuity between some of these tubules and normal tubules. sertoli cell nodule changes with advancing age.
the sertoli cells produce large amounts of basal lamina that protrudes inside the hypoplastic tubules. in transverse and oblique sections, these protrusions might be misinterpreted as intratubular accumulations of basal lamina material (fig. 12-41). this material can undergo calcification to form microliths. immunohistochemical study reveals two basic components of the basal lamina (collagen iv and laminin), confirming its extracellular origin; the protrusions consist mainly of laminin, whereas collagen iv delimits the outer profile of the seminiferous tubules. thus, while the amount of collagen iv is uniform around the tubules, the depth of laminin varies within the same tubule.
tubular hypoplasia is assumed to be a primary testicular lesion, and refers to the presence of seminiferous tubules that are unable to undergo pubertal development despite receiving the same hormonal stimuli as adjacent normal tubules. this dysgenesis includes immature sertoli cell pattern, low inhibin secretion, absence of androgen receptors, 460 and lack of maturation of peritubular myoid cells that fail to synthesize elastic fibers. the presence of hypoplastic zones in a testicular biopsy is an adverse prognostic sign for fertility. the differential diagnosis includes tubular hamartoma in androgen insensitivity syndrome, sex cord tumor with annular tubules, and mixed atrophy of the testis. tubular hamartoma in androgen insensitivity syndrome is multiple, similar to the hypoplastic zones of tubular hypoplasia; however, the sertoli-like cells of hamartoma have spherical nuclei (rather than elongated nuclei) and form a cuboidal epithelium, and the hamartoma contains numerous leydig cells among the tubules (see androgen insensitivity syndrome, below).
sex cord tumor with annular tubules may present with multiple foci of intratubular neoplasia, similar in distribution to that of hypoplastic zones; however, sex cord tumor appears in undescended testes, and in patients with peutz-jeghers syndrome, and consists of cuboidal or spherical cells that express cytokeratins that are not expressed in hypoplastic tubules. it is possible that hypoplastic tubules contain some germ cells that may be spermatogonia or gonocytes. there are scant spermatogonia that fail to display signs of maturation or proliferation. also, some of the tubules contain intratubular undifferentiated germ cell neoplasia that usually also appears in the adjacent, non-hypoplastic seminiferous tubules. the histologic picture is similar to that of gonadoblastoma, but such a tumor can be easily excluded because it arises in malformed gonads (gonadal dysgenesis and testicular dysgenesis) characteristic of intersex stages, unlike patients with tubular hypoplasia.
congenital testicular lymphangiectasis is characterized by abnormal and excessive development of lymphatic vessels in the tunica albuginea, mediastinum testis, interlobular septa, and testicular interstitium. [461] [462] [463] ultrastructurally these dilated vessels are similar to normal lymphatic capillaries, although some are markedly dilated and the testicular interstitium is slightly edematous (fig. 12-42). testicular lymphangiectasis occurs in both cryptorchid and scrotal testes; in one of the latter cases, the patient had noonan's syndrome. the disease does not seem to affect the seminiferous tubules, and low numbers of spermatogonia and reduced tubular diameters are observed only in cryptorchid testes. the epididymis and spermatic cord are not affected, and congenital testicular lymphangiectasis is not associated with pulmonary, intestinal, or systemic lymphangiectasis.
during fetal life, lymphatic vessels are visible only immediately beneath the tunica albuginea and in the interlobular septa. 464 during childhood, the number and size of the septal lymphatic vessels decrease; 465 by adulthood they are inconspicuous. 466 in lymphangiectasis, the septal lymphatic vessels are large and often massively dilated. testicular lymphangiectasis occurs only in the childhood testis, suggesting that these dilated vessels undergo involution at puberty or that pubertal development of the seminiferous tubules masks the lymphangiectasis. one exceptional case of epididymal lymphangiectasis, with dilated epididymal blood vessels, was reported in a 59-year-old man. 467 the vessels distort the architecture of the ductuli efferentes, which in turn become irregularly dilated by mechanical compression.
other hamartomas of the testis include hamartoma of the rete testis and smooth muscle hamartoma. hamartoma of the rete testis is a disordered proliferation of tubular structures in a loose connective tissue. 468 cystic transformation of the rete testis associated with proliferation of smooth muscle cells and abundant myxoid stroma was reported in a 26-year-old man. 469 smooth muscle hamartoma is located in the inferior testicular pole, the cauda of the epididymis, and the proximal segment of the vas deferens (fig. 12-43), 470 and is similar to that reported in the digestive and respiratory tracts. 471, 472 smooth muscle hyperplasia also occurs in the androgen insensitivity syndrome, forming nodules up to 1 cm in diameter. the muscular proliferation is located in the lower testicular pole, and involves the tunica albuginea and adjacent soft tissues.
this infrequent finding has been observed in newborns and consists of gonadal blastema in otherwise normal testes.
the blastema is located in the vicinity of the upper testicular pole, near the implantation of the caput of the epididymis, displays a crescent shape, and extends throughout the depth of the tunica albuginea and the adjacent testicular parenchyma. the blastema consists of epithelial cords of cells or solid masses in continuity with the mesothelium (fig. 12-44). these cells are intermingled with others that are larger, with pale cytoplasm, vesicular nuclei, and prominent nucleoli. the blastematous epithelial cells display immunoreactivity for vimentin, laminin, type iv collagen, and cytokeratin; the expression of the latter in the most superficial cells is similar to that of mesothelial cells and decreases in intensity in the deeper cells. this suggests that these may be pre-sertoli cells. the cord-like structures are delimited by laminin and type iv collagen. the second, larger cell type is immunoreactive for placenta-like alkaline phosphatase (plap) on the surface, suggesting that it is related to the gonocyte. leydig cells have not been observed among the cords of gonadal blastema. the differential diagnosis of gonadal blastema ectopia is with ovotestes. the small size of the gonocytes distinguishes them from oocytes, which are several times larger. in addition, no intersex condition is observed.
the presence of seminiferous tubules within the tunica albuginea is rare and usually an incidental histologic finding. 473 ectopic tubules are present in approximately 0.8% of pediatric autopsies and 0.3% of adult autopsies. the lower incidence in adults may be explained by proportionally less sampling. the lesion ranges from microscopic size to a few millimeters in diameter, and may be visible as minute bulges in which multiple small vesicles protrude through a thin tunica albuginea. 474 histologically there are groups of seminiferous tubules in the tunica albuginea, sometimes accompanied by leydig cells. in children, the ectopic tubules appear normal ( fig.
12-45 ), whereas in adults they are usually slightly dilated, although some may be hyalinized. serial sections reveal continuity with the intraparenchymatous seminiferous tubules. ectopia of the seminiferous tubule is probably congenital, although it has been found in elderly men. 475 it does not appear to be the result of trauma. the malformation probably arises in the sixth week of gestation, when the primordial sex cords have formed and are branching toward the gonadal surface, and the developing testis is covered by only one to three layers of celomic epithelium. later, the tunica albuginea forms around the sex cords and under the celomic epithelium. failure of insertion of the tunica albuginea between the sex cords and celomic epithelium may entrap seminiferous tubules. ectopia differs from testicular dysgenesis, a distinctive form of male pseudohermaphroditism with müllerian remnants. numerous features, characteristic of ectopic seminiferous tubules, distinguish it from other conditions, including normal thickness and collagenization of the tunica albuginea, absence of interstitial tissue resembling ovarian stroma (characteristic of testicular dysgenesis), and clear delimitation of the tunica albuginea and testicular parenchyma (see discussion on male pseudohermaphroditism with müllerian remnants, below). in a unique case, there were multiple clusters of seminiferous tubules in the wall of a hernia sac that accompanied an undescended testis removed from an adult man. the ectopic tubules were not surrounded by tunica albuginea and were similar to those in cryptorchid testicular parenchyma with only dysgenetic sertoli cells.
leydig cells occur normally in the testicular interstitium (interstitial leydig cells) and in the wall of the seminiferous tubules (peritubular leydig cells). however, clusters of leydig cells are often observed in other locations in the testis, or in the epididymis or spermatic cord.
476 ectopic leydig cells may be found in the interlobular septa, 477-479 rete testis, tunica albuginea, [480] [481] [482] or within hyalinized seminiferous tubules. 478, [483] [484] [485] intratubular leydig cells are found only in tubules with advanced atrophy and marked thickening of the tunica propria, including the tubules in adult cryptorchid testes, those of men with klinefelter's syndrome, and in some other primary hypogonadisms (fig. 12-46). immunohistochemical studies suggest that the endocrine function of these leydig cells is low. 486 several theories have been offered to account for these ectopic cells, including in situ differentiation, migration from the testicular interstitium, and trapping of peritubular leydig cells in the tunica propria during its thickening. 487 leydig cells are commonly found in the epididymis 487 and spermatic cord; 488,489 26 of 64 autopsies had such foci. 490 extratesticular leydig cells usually form small groups within or adjacent to nerves (fig. 12-47). 477, 490 the occurrence of ectopic leydig cells in the albuginea, epididymis, or spermatic cord may account for the rare cases of leydig cell tumor in these paratesticular structures. ectopic leydig cells should not be misinterpreted as tumor cells (infiltration or metastasis) when malignancy of a testicular leydig cell tumor is suspected.
other rare forms of ectopia are found within and outside the testis. intratesticular ectopia includes adrenal cortical ectopia, osseous and adipose tissue heterotopia, and ectopia of the ductus epididymidis. extratesticular ectopia includes splenic ectopia (splenogonadal fusion), hepatic ectopia (hepatotes
adrenal cortical ectopia may be important in two conditions that develop tumoral masses: adrenogenital syndrome and nelson's syndrome. tumors in adrenogenital syndrome appear in 8.2% of patients with congenital adrenal hyperplasia, appearing as bilateral testicular masses of synchronous growth.
these tumors consist of well delimited but non-encapsulated yellow nodules, several centimeters long, composed of large microvacuolated cells. the cause seems to be prolonged stimulation by elevated acth secretion. the differential diagnosis includes leydig cell tumor. the diagnosis of tumors in adrenogenital syndrome is supported by a family or personal history of salt-losing syndrome or hypertension, demonstration of 11β-hydroxysteroids (a specific marker for adrenal cortex) in spermatic vein blood, or a rapid positive response of tumor to corticoid treatment. nelson's syndrome occurs in patients who, after adrenalectomy for treatment of cushing's syndrome, develop an acth-secreting pituitary adenoma. these patients may develop testicular tumor growth similar to that in adrenogenital syndrome. most nelson's syndrome tumors do not respond to dexamethasone treatment.
cartilaginous heterotopia may be found in the caput of the epididymis and has been attributed to metaplasia of metanephric rests. osseous heterotopia (testicular osteoma) is a metaplasia occurring in areas of the testicular parenchyma with fibrosis or ischemia. 491 adipose metaplasia is frequent in undescended testis, elderly men, and those with cowden's syndrome (fig. 12-48). 492 groups of tubular formations that resemble the epididymis have been reported inside the testicular parenchyma in testes with marked tubular atrophy, and probably represent a rare form of metaplasia. 493
testicular descent is not always complete at birth, and about 3.2% of full-term newborns have incompletely descended testes. most of these descend within 3 months, and only 0.8% of infants have incompletely descended testes 12 months after birth. spontaneous testicular descent is exceptional after the first year. in recent decades, a significant increase in the incidence of cryptorchidism has been detected. 494 only 5% of patients with impalpable testes are actually devoid of testes.
other causes include true cryptorchidism, testicular ectopia, and retractile testes. true cryptorchidism includes abdominal, inguinal, and high scrotal testes that cannot be moved to the scrotum. ectopic testes are those located outside the normal path of testicular descent; the most frequent site is the superficial inguinal pouch. other rare locations of ectopia include the abdominal wall, the upper thigh, the perineum, and the base of the penis. retractile testes may be moved to the scrotum at exploration and account for about one third of undescended testes. patients with true cryptorchidism account for about 25% of cases of empty scrotum. these testes most frequently are found in the inguinal canal or upper scrotum; arrest within the abdomen is less frequent. cryptorchidism is slightly more frequent on the right than the left, and is bilateral in approximately 18% of cases. there is a family history of cryptorchidism in 14% of cases. 495 the cryptorchid testis is usually smaller than the contralateral one, and this difference is often discernible at 6 months of age. 496 one-third of cryptorchid testes are soft. several conditions predict a high risk of cryptorchidism, including increased maternal age, maternal obesity, pregnancy toxemia, bleeding during late pregnancy, smoking, tallness, a history of subfertility, cesarean birth, low birth weight, preterm birth, twin birth, hypospadias 497 and other congenital malformations, and birth from september to november or in may and june. 498, 499 of these associations, low birth weight seems to be the most important. 500
there are two types of cryptorchidism: congenital and acquired. congenital cryptorchidism is caused by anomalies in anatomic development or hormonal mechanisms involved in testicular descent (described above).
impalpable undescended testes are infrequent because the transabdominal phase follows the simple mechanism of relative movement of the testis, whereas displacement of the ovary is more complex. 501 conversely, palpable undescended testes are more frequent because the second phase of testicular descent is more complex. unilateral cryptorchidism may be caused by androgen failure, which leads to either an ipsilateral lesion in the development of genitofemoral nerve neurons or a defect in cgrp release that hinders normal migration of the gubernaculum.
a normally descended testis may become cryptorchid and may even come to lie in the abdominal cavity. two categories of acquired undescended testis have been described. the first is the postoperative trapped testis, 502 a normally descended testis that leaves the scrotal pouch after surgery for an inguinal hernia or hydrocele. [503] [504] [505] this iatrogenic cryptorchidism occurs in 1.2% of children after herniotomy. adherence of the testis or the cremasteric muscle to the surgical incision causes testicular ascent when the incision heals and undergoes retraction. the second is spontaneous ascent from unknown causes. various mechanisms have been proposed, including inability of the spermatic blood vessels to grow adequately, 506 anomalous insertion of the gubernaculum, 507 failure in reabsorption of the vaginal process 508, 509 and failure in postnatal elongation of the spermatic cord. 510, 511 the spermatic cord measures 4-5 cm at birth and reaches 8-10 cm at 10 years of age. this growth does not occur if the peritoneal-vaginal duct has become a fibrous remnant. the cause might be a defect in postnatal cgrp release by the genitofemoral nerve. 501, 512, 513
pathogenesis
the most frequent findings in congenital and acquired cryptorchidism at infancy are decreased germ cell numbers and diminished tubular diameter.
514, 515 there are multiple causes of testicular maldescent, including anatomical anomalies of the gubernaculum testis, hormonal dysfunction (hypogonadotropic hypogonadism), mechanical impairment (insufficient intra-abdominal pressure, short spermatic cord, underdeveloped processus vaginalis), dysgenesis (primary anomaly of the testis), and heredity. most cryptorchidism appears to be caused by either a deficit of fetal androgens or an excess of maternal estrogens. androgen insufficiency seems to be slight and transient because anomalies other than hypoplasia of the epididymis are not seen. elevated maternal estrogen levels could diminish fsh secretion by the fetal pituitary, inducing low müllerian-inhibiting hormone production that would hinder testicular descent. 516 three mechanisms seem to be involved in the process:
• primary testicular anomaly. cryptorchid testes may bear an anomalous germ cell population, as suggested many years ago. 517 more than 40% of cryptorchid patients have a marked decrease in the tubular fertility index, 518
in the normal testis there is transient formation of spermatocytes at 4-5 years of age. this meiotic attempt is probably an androgenic event that does not occur in cryptorchid testes and is consistent with the characteristic low numbers of spermatogonia in the prepubertal age. 523
prepubertal testes
undescended testes are usually smaller than the contralateral ones. this difference is already significant at 6 months of age. 524, 525 although there have been a number of biopsy studies in the first years of life, there is no agreement about the severity of damage or the time of its onset. 523, 526, 527 based on the tubular fertility index (tfi) and mean tubular diameter (mtd), most testicular biopsies from cryptorchid testes of children can be classified into one of three groups:
• type i (testes with slight alterations).
the tubular fertility index is higher than 50, and the mean tubular diameter is normal or slightly (<10%) decreased. approximately 31% of cryptorchid testes are in this group (fig. 12-49).
• type ii (testes with marked germinal hypoplasia). tubular fertility index is between 30 and 50, and mean tubular diameter is 10-30% lower than normal. the spermatogonia are distributed irregularly and most are in tubular sections that are grouped in the same testicular lobule. these testes comprise approximately 29% of cryptorchid testes (fig. 12-50). 528
• type iii (testes with severe germinal hypoplasia). tubular fertility index is less than 30, and mean tubular diameter is more than 30% lower than normal. many of the spermatogonia are giant with dark nuclei (fig. 12-51). these testes often contain ring-shaped tubules, megatubules (with or without eosinophilic bodies or microliths) (fig. 12-52), and focal granular changes in the sertoli cells (fig. 12-53). the testicular interstitium is wide and edematous. these comprise about 40% of cryptorchid testes.
about 8% of testes with type i lesions show many multinucleated spermatogonia (with three or more nuclei) (fig. 12-54). 529 the seminiferous tubules of testes with types ii or iii lesions have a thickened lamina propria during childhood and, at puberty, sertoli cell hyperplasia. 526 patients with bilateral cryptorchidism have a higher incidence of type ii and iii lesions than those with unilateral cryptorchidism. type i lesions are comparable to those seen in experimental cryptorchidism: normal testes in which lesions were induced by increased temperature. 527 testes with type ii or iii lesions bear variable degrees of dysgenesis that, in addition to germ cells, involve sertoli cells, peritubular myofibroblasts, and leydig cells. the dysgenesis of these other cell types is evident only after puberty. in about 25% of cases the contralateral scrotal testis also has histologic lesions of variable severity. this finding supports the hypothesis of a bilateral defect in many cases of unilateral cryptorchidism. microdeletions in the long arm of the y chromosome are present in 27% of patients with corrected unilateral cryptorchidism who present with azoospermia or severe oligospermia. 530 these findings are similar to those observed in patients with azoospermia or severe idiopathic oligospermia. unilateral cryptorchidism with a normal contralateral testis could be due to an end-organ failure. 531 in cryptorchidism secondary to spontaneous ascent, lesions are similar to those of congenital cryptorchidism, whereas in cryptorchidism secondary to herniotomy, germ cell depletion is slight 532 and becomes important only after 5 years of age. 533
adult testes
most pubertal and adult cryptorchid testes have anomalies in all testicular structures. the seminiferous tubules have decreased diameters and deficient spermatogenesis. in decreasing order of frequency, the most common germ cell lesions are tubules with sertoli cell and spermatogonia-only pattern; tubules with sertoli cells (dysgenetic) only; tubular hyalinization; and mixed atrophy. the lamina propria has scant elastic fibers and increased collagen fibers. 534 sertoli cells are present in increased numbers and do not mature normally except in tubules with germ cells (fig. 12-55). 528, 535 often, groups of tubules containing only sertoli cells with a prepubertal pattern (very small diameter and total absence of maturation) are present and are considered hypoplastic, dysgenetic, or hamartomatous. areas of apparent leydig cell hyperplasia are frequent, and many of these cells contain vacuolated lipid-laden cytoplasm (fig. 12-56). the rete testis is hypoplastic in most cases and is lined by columnar epithelium with rare areas of flattened cells. cystic dilation is common, and adenomatous hyperplasia has been found in some cases.
near the rete testis, the testicular parenchyma frequently contains metaplastic fat. in some cryptorchid testes, several tubular segments are destroyed by inflammation that probably has an autoimmune cause (focal orchitis). 536 epididymal tubules are poorly developed and peritubular tissue is immature. blood flow is associated with testicular histology. for example, testicular volume, histologic pattern, and testicular artery resistive index are lower in undescended testes than in controls, and testicular artery resistive index is inversely proportional to testicular histology score in undescended testes. 537 there is also an apparent correlation between testicular size, spermiogram, and hormone levels. assuming that a significant reduction in testicular size (>12 ml) is only observed in 9.3% of cases, and that serum levels of fsh, lh, and testosterone are normal, an inverse correlation is seen between fsh and testicular volume, sperm concentration, sperm motility, and normally shaped sperm. in addition, there is a direct relation between testicular volume and sperm concentration, sperm motility, and normally shaped sperm. these findings indicate the cause of tubular impairment in young men operated on in childhood for cryptorchidism. 538
obstructed testes are located in the superficial inguinal pouch (denis-browne pouch) and are considered ectopic by some authors and cryptorchid by others. 539, 540 histologic studies reveal that most obstructed testes bear the same lesions as true cryptorchid testes. type i lesions are observed in half, type ii in more than one-third, and the remainder show type iii lesions. the higher proportion of type i lesions suggests a better prognosis than in true cryptorchid testes.
some authors assume that retractile testes are normal and exclude them from studies of cryptorchidism. 541, 542 however, these testes may present important lesions and many consider them to be a form of cryptorchidism.
[543] [544] [545] retractile testes may not always be movable to the lower scrotum (70-75 mm from the pubic tubercle) and in 50% of cases are smaller than scrotal testes. approximately 50% of retractile testes remain high after age 6 years, when cremasteric activity declines. 546 retractile testes have a 32% risk of becoming ascending or acquired undescended testes. the risk is higher in boys younger than 7 years, or when the spermatic cord is tight or inelastic. 547 during childhood, tubular diameter and tubular fertility index decrease. 544 adults with retractile testes that descended spontaneously but late may be fertile 548 or infertile. 549 usually there is germ cell atrophy that varies in severity from lobule to lobule. 544 regular examination of retractile testes is advisable during childhood and, if complete testicular descent does not occur, orchidopexy is indicated.
most cryptorchid patients have a patent processus vaginalis, and 65-75% have a hernia sac, although most hernias are not clinically visible. urologic anomalies are present in 10.5% of patients, the most frequent being hypospadias, complete duplication of the urinary tract, non-obstructive ureteral dilatation, kidney malrotation, and posterior urethral valves. cryptorchidism is more frequent in patients with microcephaly, myelomeningocele, bifid spine, omphalocele, gastroschisis, micropenis, and imperforate anus. cryptorchidism may appear isolated or associated with congenital anomalies, endocrine dysfunction, chromosomal disorders, or intersex conditions. thus, cryptorchidism is found in the kallmann, prader-willi, klinefelter, noonan, smith-lemli-opitz, aarskog-scott, rubinstein-taybi, prune belly, and caudal regression syndromes, anomalies of the androgen receptor, absence of anti-müllerian hormone, charge association, and trisomies 13, 18, and 21.
sperm excretory duct anomalies occur in 9-36% of cryptorchid patients, 550, 551 and are classified into three types: 552
• ductal fusion anomalies (25% of cases). these consist of anomalous fusion of the caput of the epididymis to the testis or segmental atresia of the epididymis and vas deferens. this is chiefly associated with intra-abdominal or high scrotal cryptorchid testes.
• ductal suspension anomalies (59% of cases). the caput of the epididymis is attached to the testis, whereas the corpus and the cauda of the epididymis are separated from the testis by a mesentery. a variant consists of an excessively long cauda of the epididymis that descends along the inguinal duct to the scrotum.
• anomalies associated with absent or vanishing testes (16% of cases).
cryptorchidism is part of the testicular dysgenesis syndrome. this consists of abnormal testicular development that predisposes to cryptorchidism, hypospadias, spermatogenetic alterations, and testicular cancer. the association of these disorders with cryptorchidism has been corroborated by numerous clinical, epidemiological and genetic studies. the least severe form of this syndrome is a defect in spermatogenesis; the most severe is testicular cancer. a constellation of histologic lesions is common in the testes of men with testicular dysgenesis; these lesions include sertoli cell-only pattern, mixed atrophy, hypoplastic tubules (sertoli cell nodules), microlithiasis, malformed tubules, granular changes in sertoli cells, nodular leydig cell hyperplasia, and intratubular germ cell neoplasia. it is assumed that there is a prenatal development of the lesions as a result of several genetic, environmental, or endocrine disruptor factors that would interfere with the estrogen/androgen ratio. [553] [554] [555] [556]
the main complications of cryptorchidism are testicular cancer, infertility, testicular torsion, and psychological problems.
approximately 0.8% of 1-year-old males have cryptorchidism, and about 10% of testicular cancer patients had cryptorchidism. the risk of testicular cancer in cryptorchid males is four to 10 times higher than that of the general population. testes with an elevated number of multinucleated spermatogonia seem to have a higher risk of cancer in adulthood. 557 about 5% of biopsies in children contain cells similar to those seen in undifferentiated intratubular germ cell neoplasia, and these cells may evolve toward germ cell tumor (fig. 12-57). 558 the most frequent tumor in undescended testes is seminoma. 559, 560 regardless of timing, orchidopexy does not reduce the risk of cancer, although it facilitates early detection as the intrascrotal testis is palpable. one in five testicular tumors arises in properly descended testes contralateral to cryptorchid testes, suggesting that there is a primary bilateral testicular anomaly in cryptorchidism. intra-abdominal testes also have a higher incidence of tumors. 560
infertility
infertility is the most frequent problem caused by cryptorchidism. in a series of patients with infertility, nearly 9% had cryptorchidism. 561 infertility is influenced by several factors, including bilaterality, number of germ cells, location and size of the testis, and age at time of orchidopexy. the most important risk factors are bilaterality and germ cell number. only 16% 562 to 25% 563 of men with bilateral cryptorchidism have normal sperm counts (20 million/ml or more). the highest sperm counts occur with testes in the superficial inguinal pouch. patients with bilaterally impalpable testes are usually azoospermic. 563 fertility rates in unilateral cryptorchidism vary from 25% to 81%. 564 the number of germ cells per cross-sectioned tubule is the most important prognostic factor.
patients with no increase in inhibin b during the postoperative period usually have a low number of spermatogonia per cross-sectioned tubule and a low tubular fertility index. in unilateral cryptorchidism, fertility depends on the number of spermatogonia in the contralateral testis. however, if the number of germ cells per cross-sectioned tubule in the cryptorchid testis is lower than 1% of normal, the risk of infertility is 33%. in bilateral cryptorchidism the risk of infertility rises from 75% to 100% when one or both testes have less than 1% of germ cells per cross-sectioned tubule. neither the preoperative location of the testis in patients with unilateral cryptorchidism nor the small size of the testis at the time of orchidopexy is relevant for fertility. [565] [566] [567] an important fertility factor is the permeability of sperm excretory ducts. the age at orchidopexy may also influence fertility, although this has not been proven. in patients over 4 years of age orchidopexy does not enhance fertility. 568, 569
benefit of testicular biopsy in patients with cryptorchidism
testicular biopsies of infantile testes at orchidopexy are useful for determining baseline germ cell status and whether surgery should be completed with hormonal treatment. 570 however, even if biopsy supplies important data, it is not considered a routine procedure. even in the best cases, when the number of spermatogonia is nearly normal, spermatogenesis may never occur owing to deficient spermatogonium development during childhood or failure of spermatogenesis at puberty; and if complete spermatogenesis does occur, it may be associated with obstruction of sperm excretory ducts. in childhood, the chance of a biopsy finding an occult cancer or precancer is low because intratubular germ cell neoplasia is not diffusely distributed throughout the testis. testicular biopsy is recommended in patients with intra-abdominal testes, abnormal external genitalia, or abnormal karyotype.
571 the situation is different in adults because intratubular germ cell neoplasia is present in 2-3% of cases and is diffuse. 572, 573 when intratubular germ cell neoplasia is detected in a child, further examination of the testis and rebiopsy after puberty are recommended. 574 in adults, if intratubular germ cell neoplasia is unilateral, orchidectomy should be performed, but if it is bilateral, radiation may be used to eradicate the neoplasia while maintaining leydig cell function. 575
testicular microlithiasis (tm) is characterized by the presence of numerous calcifications diffusely distributed throughout the testicular parenchyma. the number and size of the calcifications are often great enough to be detected radiographically or by ultrasound. 576 isolated microliths have been reported in undescended testes, prepubertal klinefelter's syndrome, male pseudohermaphroditism, and otherwise normal children and patients studied for other diseases. 577 in adults, microliths are frequently observed in cryptorchid and ex-cryptorchid testes, 578 seminiferous tubules located at the periphery of germ cell tumor, 579 infertile patients, [580] [581] [582] and in some patients complaining of orchialgia 583, 584 or testicular asymmetry. 585 testicular microlithiasis occurs in 0.3% of cryptorchid testes and is slightly more common in prepubertal than adult testes. in adults, it usually is diagnosed when men seek help for infertility, pain, or testicular asymmetry. 581 microlithiasis has been observed in 1.4-2% of testicular echographies of different disorders. 586, 587 in infertile patients the incidence is slightly higher. microlithiasis is present in 35% of testes with a malignant tumor. 588 ultrasound studies reveal two types of microlithiasis: classic tm, in which the number of microliths is five or more; and limited tm, when there are fewer than five microliths (fig. 12-58).
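the ultrasound distinction between classic and limited tm described above reduces to a single count threshold, which can be sketched as follows. this is an illustrative helper only, not a clinical tool: the function name `classify_tm`, the string labels, and the handling of a zero count are assumptions added here, and the text does not state the unit over which microliths are counted (per image, per testis), so the caller is assumed to supply a count in whatever unit the examiner used.

```python
def classify_tm(microlith_count: int) -> str:
    """Label an ultrasound finding by microlith count.

    Hypothetical helper: the threshold (five or more = classic,
    fewer than five = limited) comes from the text; everything
    else, including the "none" label, is an assumption.
    """
    if microlith_count < 0:
        raise ValueError("microlith_count must be non-negative")
    if microlith_count == 0:
        # The text only defines the two TM types; zero is treated as no TM.
        return "none"
    # Five or more microliths: classic TM; one to four: limited TM.
    return "classic" if microlith_count >= 5 else "limited"


if __name__ == "__main__":
    for count in (0, 3, 5, 12):
        print(count, classify_tm(count))
```

the threshold is kept in one place so that a different counting convention would only require changing the comparison.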
the incidences of tm in these studies are lower than 1% in infants, 5.6% in the general population aged between 18 and 35 years (bilateral in 66% of patients showing microliths), 589 0.68-4.1% in patients with other disorders, 586,587,590-592 from 4.6% 593 to 20% 594 in subfertile patients, 9.52% in ex-cryptorchid testes, 595 and more than 30% in adult testes with germ cell tumors. 588,596-599 several cases of testicular microlithiasis have also been observed in infant testes with germ cell tumor or gonadal stroma tumor. 600, 601 the incidence is higher in whites than in blacks. pain is the most common clinical symptom in patients without a palpable testicular mass, and has been attributed to dilation of seminiferous tubules secondary to obstruction by microliths. microliths are composed of hydroxyapatite, according to x-ray diffraction studies 602 and raman spectroscopy. 603 in the prepubertal testis, microliths are surrounded by a double layer of sertoli cells and measure up to 300 µm in diameter. when they are very large, the seminiferous epithelium may be destroyed and the microlith is surrounded by peritubular cells (fig. 12-59). testes with microliths have subnormal mean tubular diameters and tubular fertility index. 604 in adult testes with microliths there is incomplete spermatogenesis. some seminiferous tubules with microliths are cystically dilated (fig. 12-60). microliths arise as extratubular eosinophilic bodies that mineralize and pass into the tubular lumina. 605 microlithiasis may be a disorder of the tunica propria. also, testicular microlithiasis is occasionally associated with pulmonary microlithiasis and with calcifications in the parasympathetic nervous system. 606, 607 the association of microlithiasis and testicular cancer is controversial.
608, 609 although the development of testicular cancer has been observed in several patients whose testicular microlithiasis had been previously diagnosed by ultrasound studies, [610] [611] [612] [613] [614] it is also thought that patients with testicular microlithiasis not associated with another disorder do not require any follow-up. 615 when microlithiasis is associated with infertility the incidence of cancer varies according to the unilaterality or bilaterality of microlithiasis: 594 subfertile patients with unilateral microlithiasis show no intratubular germ cell neoplasia, whereas this is present in 20% of those with bilateral microlithiasis. the risk of malignancy is higher in classic than in limited tm. 616 the nexus between microlithiasis and cancer does not seem to be the predisposition of one disorder towards the other but rather the predisposition of both to develop in abnormal testes. this may also explain the association between microlithiasis and infertility. yearly ultrasound examination, perhaps with testicular biopsy, is recommended in those with testicular microlithiasis associated with cryptorchidism, infertility, atrophic testes, or contralateral testis bearing germ cell tumor. 617 microlithiasis also occurs in the rete testis or sperm excretory ducts. epididymal rupture and extravasation of microliths into the interductal tissue may cause a histiocytic reaction resembling malakoplakia (fig. 12-61). the disorder is asymptomatic and not associated with testicular cancer. 618
gonadal dysgenesis refers to disorders characterized by amenorrhea and streak gonads in phenotypically female patients. in adults, streak gonads are elongated masses of fibrous tissue resembling ovarian stroma (fig. 12-62). they may contain hilar cells and rete or epithelial cords with variable degrees of maturation, and may result from failure in gonad formation, failure of gonadal differentiation to ovary, or failure of gonadal differentiation to testis.
some streak gonads contain a few ovocytes or primordial follicles, but all germ cells disappear at puberty. patients with streak gonads have a hypoplastic uterus and fallopian tubes. four types of gonadal dysgenesis have been described: 46xy pure, 46xx pure, 45x0, and mixed.
46xy gonadal dysgenesis (swyer's syndrome) is characterized by female phenotype, absence of turnerian stigmata, and female external genitalia, sometimes with fused labia majora, a hypertrophic clitoris, and hypospadias. the breasts develop at puberty. sexual infantilism persists in adulthood, and eunuchoidism and amenorrhea appear. these patients have elevated serum gonadotropin levels and low serum estradiol. there are two types of 46xy gonadal dysgenesis: complete and incomplete. patients with the complete type have female external genitalia and classic streak gonads, although cases with ovarian tissue have been reported. the cause is unknown in about 80% of cases, 619 and is due to alterations in the sry gene in the remainder (a mutation in 10-15% of cases, and a sry deletion as a result of an aberrant x/y interchange in 10-15%). 620 the consequence of this failure is very early gonadal alteration (sixth to eighth week of gestation). with the subsequent absence of müllerian inhibiting factor, testosterone, and dihydrotestosterone, a female phenotype develops. patients with incomplete 46xy gonadal dysgenesis have ambiguous external genitalia and variable degrees of development of the müllerian and wolffian structures. although they have streak gonads, testicular development is usually observed. this gonadal dysgenesis does not seem to be caused by sry alterations. 621 these findings suggest that in the first type ovarian differentiation was canceled, and that in the second type testicular differentiation failed. the first is similar to the gonad of 45x0 turner's syndrome, whereas the second resembles the gonad of mixed gonadal dysgenesis.
622 the clitoromegaly may be caused by androgens secreted by hyperplastic leydig cells in the streak gonad. some patients with 46xy gonadal dysgenesis present with extragonadal anomalies and multiple syndromes, including camptomelic dysplasia and renal disorder, 623 myotonic dystrophy and terminal renal disease, 624 without 631 facial anomalies or short stature, 632 renal insufficiency and wilms' tumor (denys-drash syndrome), the combination of cleft palate, micrognathia, kyphosis, scoliosis, and clubfoot (gardner-silengo-wachtel syndrome or genitopalatocardiac syndrome), 633 multiple pterygium syndrome, 634 graves' disease, 635, 636 and congenital universalis alopecia, microcephaly, cutis marmorata, and short stature. 637, 638 most cases are sporadic, 639 although the syndrome has been reported in several members of the same family, [640] [641] [642] [643] and several forms of inheritance (x-linked, autosomal recessive, and male-limited autosomal dominant) have been proposed. 644 in addition to infertility, patients with 46xy gonadal dysgenesis have a high risk of germ cell tumor. this risk is about 5% in the first decade of life, and 25-30% overall, [645] [646] [647] [648] and, thus, prophylactic gonadectomy is recommended.
patients with 46xx gonadal dysgenesis have normal stature, female phenotype, well-developed external genitalia, and hypoplastic ovaries rather than streak gonads (fig. 12-65). the anomaly is usually detected when patients present with primary amenorrhea or infertility. this syndrome may be sporadic or familial, and it may be linked to recessive autosomal inheritance. 649, 650 patients have no predisposition to gonadal neoplasia. associated somatic anomalies such as neurosensory hearing loss (perrault's syndrome) are rare. some familial cases have shown a balanced translocation of the x chromosome (from the long arm to the short arm) 651, 652 or between chromosomes 1 and 11.
653 because the development of ovarian follicles requires fsh, mutations have been sought in the fshr gene. mutations have been detected in familial cases and also in unrelated patients, 654, 655 whereas other patients have shown no mutations in this gene. 656 the incidence of tumors in these patients is very low, and the most common is dysgerminoma. [657] [658] [659]
45x0 gonadal dysgenesis is one of the most common chromosomal anomalies (from 1/2500 to 1/5000 in female newborns), 660 although 99% of zygotes with this karyotype are aborted in the first stages of embryonal development. 661 patients with 45x0 gonadal dysgenesis have characteristic stigmata of turner's syndrome, including short stature, pterygium colli, lymphedema, and cardiac malformations. the external genitalia are female and infantile; the gonads are typical streak gonads. today, turner's syndrome is defined by the combination of physical features and the complete or partial absence of one of the two x chromosomes, frequently associated with mosaicism. turnerian stigmata may be classified into four groups: 662 skeletal anomalies such as cubitus valgus, shortening of the fourth metacarpal and madelung's deformity characteristic of leri-weill dyschondrosteosis; soft tissue anomalies such as webbed neck, low posterior hair line, and puffy hands and feet; visceral anomalies such as aortic coarctation, horseshoe kidney, polycystic kidney, urethral stenosis and vesicourethral reflux; and miscellaneous anomalies such as nevus pigmentosus. 663 during embryonic life, these gonads show normal germ cell numbers up to the third month, when germ cell proliferation ceases. 665, 666 ovogenesis stops in meiosis i, usually before the pachytene stage. the cause seems to be generalized meiotic pairing errors with the start of an apoptotic mechanism to avoid the formation of abnormal gametes. 667 massive apoptosis of ovocytes occurs between the 15th and the 20th weeks.
668 surviving germ cells disappear throughout fetal life, and their numbers at birth are usually low (fig. 12-66). 669 patients with mosaicism have fewer anomalies than pure 45x0 individuals; 12% have menstruation (compared to 3% of pure 45x0 patients), and 18% have breast development (compared to 5% of pure 45x0 patients). in 10-20% of 45x0 patients the sry gene is demonstrable by in-situ hybridization. it has been proposed that patients with sry expression should undergo gonadectomy, because this gene is also a marker of gonadoblastoma. 670 these patients may develop gonadoblastoma, dysgerminoma, and mixed germ cell tumor. 670, 671
mixed gonadal dysgenesis is characterized by the presence of a streak gonad and a contralateral testis (often cryptorchid) or streak testis (see discussion on male pseudohermaphroditism with müllerian remnants, below).
true hermaphroditism is a disorder of gonadal differentiation characterized by the presence in the same individual of both testicular and ovarian tissue. this condition is rare and usually difficult to diagnose, so only 25% of male hermaphrodites are diagnosed before age 20. 672 failure to recognize this disorder may lead to surgical intervention for hernia repair or orchidopexy. most hermaphrodites raised as males display symptoms for the first time at puberty because of breast development 673 (95% of hermaphrodites have some degree of gynecomastia), periodic hematuria 674 (if they have a uterus ending in the urinary tract), or cryptorchidism. 675 hermaphrodites raised as females initially present with irregular menstruation or clitoromegaly. true hermaphroditism should be suspected in all children with ambiguous sex characteristics (fig. 12-67). the gonads of these patients are ovotestes, ovaries, or testes, with all possible combinations.
676 true hermaphroditism can be (1) unilateral, if there are both testicular and ovarian tissues (forming one ovotestis or two separated gonads) on one side, and a testis or an ovary on the other side; if there is no gonadal tissue on this latter side, unilateral hermaphroditism is incomplete; (2) bilateral, if testicular and ovarian tissues are present on both sides of the body; and (3) alternate, if there is a testis on one side, and an ovary on the other side. ovotestis is the most frequent gonadal type in true hermaphroditism. it is more frequent on the right side and is located in the abdomen (50% of cases), labioscrotal folds, inguinal canal, or the external inguinal ring. the ovotestis has a bilobated or ovoid shape (fig. 12-68). in the bilobated ovotestis the testis and ovary are connected by a pedicle, whereas in the ovoid ovotestis the ovarian tissue forms a crescent capping the testicular parenchyma. the proportion of ovary to testis varies widely (fig. 12-69). at adulthood, the ovarian follicles mature and corpora lutea or corpora albicantia may be seen. the seminiferous tubules rarely develop complete spermatogenesis. the interstitium usually contains leydig cells. ovotestis is associated with a fallopian tube in 65% of cases, and with a vas deferens in the remainder. if the patient has ovotestis/ovary, a completely developed uterus is present. if the patient has bilateral ovotestis (13%), uterine agenesis is frequent (fig. 12-70). 677 the testis of hermaphrodites is most often on the right side (60%) and is located anywhere from the abdomen to the scrotum. these testes have low tubular fertility indices during childhood. after puberty, the seminiferous tubules remain small, often containing only dysgenetic sertoli cells, similar to the tubules of cryptorchid testes. incomplete spermatogenesis has been reported, but complete spermatogenesis is exceptional. 
the ovary of hermaphrodites is most frequently on the left side (63%) and usually is hypoplastic with few primordial follicles. however, in occasional patients the ovary is histologically and functionally normal. the most frequent karyotype is 46xx (60%), followed by several mosaicisms (33%) which, in decreasing order of frequency, are 46xx/46xy, 46xy/47xxy, 45x0/46xy, 46xx/47xxy. the 46xy karyotype is the least common (7%). there is variation in the incidence of some karyotypes around the world. mosaicism is found in 40.5% of european cases, but in only 21% of north american cases. conversely, most african true hermaphrodites (97%) have 46xx karyotype. the karyotype 46xy is rare and its frequency is similar in europe, asia, and north america. 678, 679 most cases are sporadic, and families with several affected members also have 46xx males. this finding suggests that both genetic anomalies are alternative forms of a single genetic defect. 680 the following mechanisms 681, 682 have been proposed to explain the occurrence of testicular parenchyma in 46xx true hermaphroditism: a hidden mosaicism with a cell line having a y chromosome; transfer of a y chromosome fragment (including the sry gene) to the x chromosome; autosomal mutation of variable penetrance; and x-linked mutation coupled with rare x inactivation or x mutation that permits testicular differentiation in the absence of sry. some 46xx hermaphrodites with sry-negative leukocytes are positive for this gene in dna from the testicular parenchyma in the ovotestis. 683 over 22 pregnancies in true hermaphrodites have been reported, 684 in contrast to the exceptional cases of paternity. ovules may arise from the ovotestes or the ovary. management of true hermaphroditism depends on the patient's age at the time of diagnosis, the nature and location of the gonads, and the developmental stage of the external genitalia. 
although bilateral castration may be justified in order to avoid the risk of neoplasia, gonadal preservation may be desirable until adulthood. in this case, if the patient is raised as a girl, puberty will occur spontaneously and there is a small chance of fertility. 685 however, the high risk of malignancy (estimated at 4.6%) should be taken into account. the most frequent tumors are gonadoblastoma, dysgerminoma, and yolk sac tumor. 676 the risk of cancer may be reduced if some precautions are taken, including removal of the testis if it has not descended and surveillance of the residual gonad with periodic ultrasound studies, especially in cases of chromosomal mosaicisms. normal male development requires adequate differentiation of the testes in the fetal period, synthesis and secretion of testicular hormones, and proper response of target organs to these hormones. anti-müllerian hormone produced by sertoli cells inhibits the development of müllerian derivatives that would otherwise form the uterus and fallopian tubes. testosterone produced by leydig cells stimulates differentiation of the wolffian ducts into male genital ducts. the conversion of testosterone into dihydrotestosterone by the enzyme 5α-reductase ensures the development of male external genitalia. alterations in these processes may cause male pseudohermaphroditism. androgen synthesis deficiencies: these autosomal recessive syndromes are characterized by an error in testosterone synthesis that results in incomplete or absent virilization. cholesterol is the source for the synthesis of androgens, estrogens, and other steroid hormones through multiple steps. first, the steroidogenic acute regulatory protein (star) transfers cholesterol into mitochondria; star gene mutations cause congenital lipoid adrenal hyperplasia. 
second, within mitochondria, the cholesterol side-chain cleavage enzyme p450scc transforms cholesterol into pregnenolone; a disorder in this enzyme is rare because it is highly lethal in embryonic life. third, pregnenolone undergoes 17α-hydroxylation by microsomal p450c17; deficiency in 17α-hydroxylase causes female sexual infantilism and hypertension. fourth, 17-oh-pregnenolone is converted into dhea by 17,20-lyase activity of p450c17. the ratio of 17,20-lyase to 17α-hydroxylase activity of p450c17 determines the ratio of c21 to c19 steroids produced. the ratio is regulated by at least three factors, including the electron-donating protein p450 oxidoreductase (por), cytochrome b5, and serine phosphorylation of p450c17. mutations in por are present in the antley-bixler skeletal dysplasia syndrome as well as a variant of polycystic ovarian syndrome. figure 12-71 shows the enzymes involved in the abovementioned steps. the enzyme 3β-hydroxysteroid dehydrogenase transforms dhea to androstenedione, and the enzymatic complex called aromatase transforms androstenedione into estrone and testosterone into estradiol. in some patients cholesterol synthesis is also impaired, and congenital adrenal hyperplasia is superimposed on androgen deficiency. deficient testosterone synthesis may result from abnormalities in the enzymes involved in pregnenolone formation (congenital lipoid adrenal hyperplasia) or in 3β-hydroxysteroid dehydrogenase, 17α-hydroxylase, 17,20-desmolase, and 17β-hydroxysteroid dehydrogenase (fig. 12-71). congenital lipoid adrenal hyperplasia is the most severe form of congenital adrenal hyperplasia. 686 the disorder is characterized by a deficit in steroid hormone synthesis in the adrenal cortex and gonads, producing a female phenotype with severe salt-loss syndrome. conversion of cholesterol to pregnenolone requires the enzymes 20α-hydroxylase, 20,22-desmolase, and 22α-hydroxylase. 
failure of any of these leads to deficits in cortisol, aldosterone, and testosterone. 687 the enzymatic defect is usually caused by a deficit in the steroidogenic acute regulatory (star) protein; in other cases, the deficit is in p450scc. the mitochondrial protein star promotes cholesterol transfer from the outer to inner mitochondrial membrane, where cholesterol serves as a substrate for p450scc and initiates steroidogenesis. more than 35 different mutations in the star gene have been identified. 690 as a result, cholesterol is not converted to pregnenolone, which is required for the synthesis of mineralocorticoids, glucocorticoids, and sex hormones. the disorder is rare in most countries, but is common in japan, korea, and the arabian countries. 691, 692 patients usually present with salt-losing crisis in the first 2 months of life. 693, 694 in most cases, males have female or ambiguous external genitalia and a blind-sac vagina, hypoplastic wolffian derivatives, absence of müllerian structures, and cryptorchidism. 695 the adrenals usually appear enlarged and contain lipid accumulations, 696, 697 but these diminish with age and the adrenals shrink. in the testes, lipid accumulations may be present or absent in leydig cells 686, 696, [698] [699] [700] [701] [702] or sertoli cells. 703 an 8-year-old child had partially hyalinized seminiferous tubules with sertoli cell-only pattern. 704 the testes of pubertal patients are usually normal for age. 701, 702 intratubular germ cell neoplasia has been reported in one case. 705 most patients die from adrenal insufficiency. survivors have female phenotype 704 and require the administration of glucocorticoids, mineralocorticoids, and gonadal steroids. 703 3β-hydroxysteroid dehydrogenase deficiency: patients with this defect have two main problems: salt-loss syndrome produced by reduced aldosterone secretion, and incomplete virilization. 706 at puberty, virilization increases and gynecomastia develops. 
707, 708 the enzyme 3bhsd catalyzes the conversion of Δ5-3β-hydroxysteroids such as pregnenolone, 17-hydroxypregnenolone, and dehydroepiandrosterone into the respective Δ4-3-ketosteroids progesterone, 17-hydroxyprogesterone, and androstenedione. 709 there are two 3bhsd genes located on the p11-p13 region of chromosome 1. the type i gene is expressed in the placenta, kidney, and skin, whereas the type ii 3bhsd gene is expressed only in the gonads and adrenal glands. complete absence of the 3bhsd gene is lethal; therefore, most reported cases have only partial 3bhsd deficits. [710] [711] [712] it is assumed that these deficits account for 10% of cases of congenital adrenal hyperplasia. the classic form of salt-losing 3bhsd deficit is diagnosed in the first months of life because of insufficient aldosterone synthesis and subsequent loss of salt. the other 3bhsd deficit, without salt loss, is due to mutations in the type ii 3bhsd gene 708 and its diagnosis may be delayed until puberty. severe forms of 3bhsd deficiency are associated with deficits in aldosterone, cortisol, and estradiol. symptoms may vary widely, as enzymatic activity in the adrenal gland is not the same as in the testis. most patients show salt loss and adrenal crisis; they have incomplete masculinization and may develop spontaneous puberty and gynecomastia. 706, 707, 713 patients with mild forms have normal genitalia and normal mineralocorticoid levels. some patients have only hypospadias 714 or micropenis. 715 the testes are smaller and softer than normal. 17α-hydroxylase deficiency: deficits in the enzymes 17α-hydroxylase and 17,20-lyase are caused by mutations of the cyp17 gene that encodes cytochrome p450c17. 716 the cyp17 gene is located on chromosome 10q24-q25, 717 and 50 different mutations have been described. 718 p450c17 catalyzes the 17α-hydroxylation of pregnenolone to 17-oh-pregnenolone and of progesterone to 17α-oh-progesterone. 
this enzyme also catalyzes 17,20-lyase activity, transforming 17-oh-pregnenolone to dhea. the classic form of 17α-hydroxylase deficit is caused by severe deficiencies in cyp17; less severe defects give rise to the isolated 17,20-lyase deficit. 17α-hydroxylase deficit impairs the synthesis of both cortisol and testosterone. 719 low cortisol levels stimulate acth secretion, causing hypersecretion of aldosterone precursors and the development of hypokalemic hypertension and, in males, male pseudohermaphroditism. 720 patients usually have hypospadias and develop gynecomastia at puberty. 721 the enzyme 17,20-desmolase cleaves the side chain of 17-hydroxypregnenolone and 17-hydroxyprogesterone to form dehydroepiandrosterone and androstenedione, respectively. varying degrees of 17,20-desmolase deficiency are seen, resulting in varied development of external genitalia, ranging from female phenotype to virilization with microphallus, bifid scrotum, and perineal hypospadias. in childhood, the testes contain reduced numbers of spermatogonia (figs 12-72, 12-73). 722, 723 the cause may be mutations in one of the genetic loci encoding p450c17, the flavoprotein oxidoreductase (or), or b5. 724 17β-hydroxysteroid dehydrogenase deficiency: this enzyme transforms androstenedione into testosterone and also converts estrone into estradiol. the enzymatic defects are sex-linked. most patients have female phenotype at birth and are raised as girls, but at puberty undergo virilization. 725 one or both testes may be cryptorchid or located in the labia majora. normal spermatogenesis has never been observed. the most common testicular patterns are hypoplasia or absence of germ cells and leydig cell hyperplasia. 726 the germ cell injury was initially attributed to cryptorchidism, but it is now thought to be a primary testicular lesion because even very young patients lack germ cells. 727 this deficit is due to mutations in the hsd17b3 gene located on 9q22. 
728, 729 leydig cell hypoplasia: this variant of male pseudohermaphroditism is defined by insufficient testosterone secretion 422 and the following characteristics: predominance of female external genitalia; absence of male secondary sex characteristics at puberty; absence of uterus and fallopian tubes and the presence of epididymis and vas deferens; 46xy karyotype; lack of response to human chorionic gonadotropin stimulation; absence of an enzymatic defect in testosterone synthesis; and small undescended testes that are gray and mucous on section. [730] [731] [732] [733] age at diagnosis varies from 4 months to 35 years. the syndrome may be sporadic or familial. 734, 735 the best-known cause of leydig cell hypoplasia is inactivating mutation of the lh receptor in these cells. [736] [737] [738] during fetal life, there is an inadequate response to placental hcg initially and to pituitary lh subsequently. phenotypes vary widely according to the presence of complete or partial loss of receptor function. these changes range from male pseudohermaphroditism with female external genitalia (type i of leydig cell hypoplasia) to male phenotype with micropenis, hypospadias, pubertal delay, and primary hypogonadism (type ii of leydig cell hypoplasia). in type i hypoplasia, the testes contain small seminiferous tubules with sertoli cells, spermatogonia, and thickened basement membranes. leydig cells are rare or absent, in contrast to leydig cell hyperplasia seen in other types of male pseudohermaphroditism, such as those arising from defects in androgen synthesis or androgen action on peripheral tissues. 739, 740 leydig cell hypoplasia accounts for low serum testosterone levels, lack of virilization, and lack of spermatogenesis. the absence of müllerian derivatives suggests a normal function of sertoli cells, which synthesize müllerian inhibiting factor. 
in type ii hypoplasia, adult testes show maturation arrest of spermatogonia and a few incompletely differentiated leydig cells. 741, 742 resistance to androgen stimulation is the cause of several syndromes with phenotypes varying from complete testicular feminization 743 to normal male. 744, 745 these syndromes are caused by partial or complete lack of response of the target organs to androgens 746 due to the absence, diminution, or impairment of androgen receptors or postreceptor anomaly. 740 the gene for the androgen receptor is located on the x chromosome (xq11-q12), and x-linked transmission occurs in two-thirds of cases. the karyotype is usually 46xy, but 47xxy and several mosaicisms have been observed. 747 these syndromes affect 1:20 000 to 1:40 000 newborns. the diverse phenotypes associated with androgen insensitivity may be classified as: complete androgen insensitivity syndrome (cais) or testicular feminization syndrome; partial androgen insensitivity syndrome (pais) or partial testicular feminization syndrome, which includes the syndromes of lubs, gilbert-dreyfus, reifenstein, and rosewater; and mild androgen insensitivity syndrome (mais), infertile men with mild androgen insensitivity, and kennedy's disease. complete androgen insensitivity syndrome (complete testicular feminization syndrome): this form of male pseudohermaphroditism is characterized by female phenotype with testes. complete testicular feminization syndrome is rarely diagnosed during childhood except in patients who present with hernia, inguinal tumor, or with a family history of pseudohermaphroditism. primary amenorrhea is the principal presentation in adults. the testes may be in the abdomen, inguinal canal, or labia majora, and during the first year of life may be normal histologically except for reduced tubular diameter and low tubular fertility index. 
after the first year, decreased germ cell numbers become evident and the few remaining spermatogonia are concentrated in clusters of seminiferous tubules. the testicular interstitium contains numerous spindle cells arranged in bundles, and during the first year of life has leydig cells with abundant eosinophilic or vacuolated cytoplasm. at puberty, patients have female external genitalia, a short blind-ended vagina, feminine breast development, and scarce pubic and axillary hair. serum testosterone is at the normal male level and lh is markedly increased. in adults, the testes vary in size from small to large, are tan-brown, and contain small seminiferous tubules without lumina which usually contain only sertoli cells. 748, 749 in one-third of patients both sertoli cells and spermatogonia are present. 750 ultrastructurally, sertoli cells lack charcot-böttcher crystals and annulated lamellae; inter-sertoli cell specialized junctions are not well developed, and in cryofracture studies the arrangement of membrane particles has an immature pattern. 751 leydig cells are abundant, but few contain reinke's crystalloids. often, there are areas resembling ovarian stroma in the testicular interstitium. in about two-thirds of cases the testes contain grossly visible white nodules that stand out from the surrounding testicular parenchyma (figs 12-74, 12-75). histologically, the nodules consist of clusters of small seminiferous tubules with immature sertoli cells, hyalinized lamina propria, numerous leydig cells, and an absence of elastic fibers (fig. 12-76). these have been referred to as sertoli-leydig cell hamartoma. about 25% of testes have sertoli cell adenoma, sometimes very large, consisting of tubules resembling infantile testis but lacking in germ cells and peritubular myofibroblasts. no leydig cells are present between the tubules (figs 12-77, 12-78). 
752 other benign tumors include sertoli cell tumor (large cell calcifying sertoli cell tumor and sex cord tumor with annular tubules), leydig cell tumor, leiomyoma, and fibroma. 746 approximately 60% of cases have small cystic structures closely apposed to the testes, and about 80% of patients have thick bundles of smooth muscle fibers resembling myometrium near the testis. true myometrium has been demonstrated in only one case. hypoplastic fallopian tubes are present in about one-third of cases. in about 70% of patients the epididymis and vas deferens are rudimentary; the only explanation for this is residual activity of the mutated androgen receptor. 753 approximately 10% of testes from patients with testicular feminization syndrome develop cancer. the frequency increases with age, but tumors rarely appear before puberty. these tumors include intratubular germ cell neoplasia (fig. 12-79), 749 several types of germ cell tumor, 750, 754 and sex cord tumor. 441 thus, the gonads should be removed immediately after puberty. 755 partial androgen insensitivity syndrome (partial testicular feminization syndrome): the phenotype of patients with partial testicular feminization varies from normal female to normal male. the disorder includes four classic syndromes: lubs' syndrome, 756 characterized by partial fusion of labioscrotal folds, a definitive introitus, clitoromegaly, pubic and axillary hair, and poor breast development; 757 gilbert-dreyfus syndrome, characterized by progressively greater male phenotypic features that include small phallus, hypospadias, incomplete development of wolffian derivatives, and gynecomastia; 758 reifenstein's syndrome, characterized by hypospadias, weak or absent virilization, testicular atrophy, gynecomastia, azoospermia, and infertility; 759 and rosewater-gwinup-hamwi syndrome, characterized by infertile men whose only abnormal feature is gynecomastia. 
760 mild androgen insensitivity syndrome: spermatogenesis requires high levels of intratesticular testosterone. a minor form of androgen insensitivity may be observed in some patients with male phenotype who present with infertility. 761 the frequency of androgen resistance among azoospermic and oligozoospermic men is estimated at about 19% 762 or lower. 763, 764 some patients have lost exon 4 765 or mutated exons 6 764 or 7. 766 kennedy's disease: kennedy's disease (spinal and bulbar muscular atrophy, sbma) is an x-linked recessive disorder of the adult male 767, 768 characterized by loss of motor neurons in the spinal cord and brain stem and associated with less important loss of sensory neurons and atrophy caused by skeletal muscle denervation. 767, 769 disease onset around 20 years of age includes muscular weakness, cramps, and fasciculations. 770 in most cases the male reproductive system is impaired. [770] [771] [772] the testes may be normal in the initial stages of the disease, and many patients are fertile; however, with progression, there is onset of secondary testicular atrophy and gynecomastia. testosterone levels are decreased in some cases. the disease results from mutations in the first exon of the androgen receptor (ar) gene. 773 the sbma gene, located on xq11-12, has expansion of a repetitive cag sequence in exon a. the number of cag repeats is 21 (range, 17-26) in control men and more than 40 in men with kennedy's disease. 768, [774] [775] [776] [777] 5α-reductase deficiency is a variant of male pseudohermaphroditism caused by a lack of the enzyme 5α-reductase with failure of conversion of testosterone to dihydrotestosterone. 778 in patients with the 46xy karyotype there are two isoenzymes: isoenzyme 1 is encoded by the gene srd5a1, located on 5p15, and isoenzyme 2 is encoded by the gene srd5a2 on 2p23. most reported cases result from defects in srd5a2. 779 many mutations in different exons have been reported. 
[780] [781] [782] [783] [784] during childhood, patients have a clitoriform penis, bifid scrotum, urogenital sinus, and testes in the inguinal canal or labioscrotal folds (fig. 12-80). müllerian derivatives are absent. at puberty they acquire the male phenotype, with development of the penis and scrotum. adults have erections, ejaculations, and normal libido, scant body hair and a thin beard, a very small prostate, and lack of temporal hairline recession (male pattern baldness). serum levels of fsh, lh, and testosterone are increased, but dihydrotestosterone is decreased. 785, 786 the disorder is autosomal recessive and has been observed in many consanguineous families from the dominican republic. 787 this group of male pseudohermaphrodites is characterized by the presence of müllerian derivatives and unilateral or bilateral testicular dysgenesis. these two features depend on anti-müllerian hormone gene mutations and end-organ insensitivity. [788] [789] [790] [791] in normal development, anti-müllerian hormone is responsible for inhibition of the ipsilateral müllerian ducts and collagenization of the tunica albuginea. patients with deficient secretion of this hormone may also have androgen deficiency. three variants of defective müllerian duct regression have been reported: mixed gonadal dysgenesis, dysgenetic male pseudohermaphroditism, and persistent müllerian duct syndrome. mixed gonadal dysgenesis (asymmetric gonadal differentiation) is characterized by the presence of a testis on one side of the body and a streak gonad on the other. 792 if the gonads are intra-abdominal, the labioscrotal folds may appear as either normal labia or empty scrotal sacs (fig. 12-81). in the former, the syndrome cannot be recognized in the newborn unless a peniform clitoris is present. if the gonad is descended, it is usually a testis. 
müllerian derivatives such as fallopian tubes are usually associated with streak gonad (95% of cases), but may also be associated with testicular tissue (74%). ipsilateral to the testis there is one epididymis and one vas deferens. on the contralateral side, no gonad or a streak gonad and a fallopian tube are present. a hypoplastic uterus and a poorly developed vagina are frequent findings. this syndrome accounts for about 15% of intersex conditions. some patients are raised as males, although their external genitalia are usually ambiguous as a result of fetal virilization. the penis is clitoriform, and the urethra opens in the perineum. most have cryptorchid testes and are raised as girls, becoming virilized at puberty. infertility is a common symptom. 793 the etiology is heterogeneous: 794 one-third of patients have turnerian features, in accordance with the presence of the 45x0/46xy karyotype in more than 50% of patients. other observed karyotypes are 46xy and 45x0/47xyy. approximately 81% of patients have one y chromosome. mutation in the sry gene has not been found. 795 the testes can show two different patterns: testicular dysgenesis and streak testis. testicular dysgenesis is characterized by a tunica albuginea that varies in width and is reminiscent of ovarian stroma by the storiform distribution of cells and fibers; there are also malformed seminiferous tubules (fig. 12-82) that are small, usually lack lumina, and contain only immature sertoli cells. in adults, spermatogenesis has been observed occasionally. the testicular interstitium contains increased numbers of leydig cells. streak testes are complex gonads in which testicular dysgenesis is associated with a fibrous streak. most of the gonad consists of a testis showing the characteristic lesions of testicular dysgenesis. in a pole of the gonad, or in continuity with it, there is a fibrous streak whose structure may correspond to any of the varieties mentioned above (fig. 12-83). 
this peculiar gonad can also be observed in some dysgenetic male pseudohermaphrodites as well as in the persistent müllerian duct syndrome. in these cases, the streak contains no ovocytes. light microscopy indicates a wide spectrum of testicular lesions, ranging from those of patients with 46xy pure gonadal dysgenesis to true hermaphroditism. differentiation of the ovocyte-containing streak testis and ovotestis remains controversial. 796, 797 the testes in mixed gonadal dysgenesis are incapable of müllerian duct inhibition and allow complete differentiation of wolffian derivatives, virilization of external genitalia, and, in most cases, testicular descent. the risk of germ cell neoplasia reaches 50% in the third decade of life, usually beginning with gonadoblastoma. the testes should be removed after puberty. dysgenetic male pseudohermaphroditism is a disorder of sexual differentiation characterized by bilateral dysgenetic testes or streak testis, persistent müllerian structures, and cryptorchidism. this syndrome is considered a variant of mixed gonadal dysgenesis (fig. 12-84). 791, 798 the karyotype may be 46xy or 45x0/46xy, and turnerian stigmata may be present. the uterus and fallopian tubes are present and both are usually hypoplastic (fig. 12-85). 799 the testes show lesions characteristic of testicular dysgenesis, with few germ cells during childhood (fig. 12-85). 799 in adults, spermatogenesis is poorly developed and the testicular interstitium shows leydig cell hyperplasia. about 25% of patients develop gonadoblastoma. 800 persistent müllerian duct syndrome has many names, including male with uterus, tubular hermaphroditism, persistent oviduct syndrome, and hernia uteri inguinalis. 801 it is a rare form of pseudohermaphroditism, with müllerian derivatives in an otherwise phenotypically normal male, and is the most characteristic form of isolated anti-müllerian hormone deficiency. the molecular basis of this syndrome is heterogeneous. 
three hypotheses have been proposed, including a defect in anti-müllerian hormone synthesis, caused by mutation in the anti-müllerian hormone gene (45% of cases); resistance of target organs to this hormone, caused by mutation in the receptor ii for this hormone (39% of cases); and failure in the action of this hormone immediately before the eighth week of gestation (16% of cases). 802 although the external genitalia are male, one (35% of cases) or both testes (75% of cases) are cryptorchid. the syndrome usually also includes inguinal hernia contralateral to the undescended testis, with a uterus and fallopian tubes within the hernia sac (figs 12-86, 12-87). 803 several cases with transverse testicular ectopia and persistent müllerian duct structures have been reported. 804, 805 patients usually have inguinal hernia, but others have cryptorchidism, infertility, 806 and testicular tumor. 807 in childhood, the testes have a low tubular fertility index and decreased tubular diameter. in adults, the tunica albuginea is variably thickened, contains connective tissue resembling ovarian stroma, and may contain tubular structures, alterations typical of testicular dysgenesis. the seminiferous tubules are usually atrophic and hyalinized. tubules with reduced spermatogenesis or patterns suggesting mixed atrophy (seminiferous tubules with spermatogenesis intermingled with sertoli cell-only tubules) have also been reported. the leydig cells appear hyperplastic. azoospermia or oligozoospermia are common, and paternity is exceptional. 808 the syndrome is sporadic or familial, with autosomal recessive or x-linked inheritance. 809, 810 these patients have a higher risk of testicular tumor than that attributed to cryptorchidism, 811 and all types of germ cell tumor have been observed. 
812, 813 other forms of male pseudohermaphroditism: of the dysmorphic syndromes associated with incomplete virilization of external genitalia, the best-known are those of rsh, denys-drash, wagr, opitz, camptomelic dysplasia, atr-x, gardner-silengo-wachtel, meckel, branchioskeletal-genital, down's, and other trisomies. rsh (smith-lemli-opitz) syndrome is an autosomal recessive malformation syndrome caused by mutations in the gene encoding 7-dehydrocholesterol reductase (dhcr7), responsible for the synthesis of cholesterol from its immediate precursor 7-dehydrocholesterol. [814] [815] [816] the disorder is common in europe and rare in other countries. 817 the most severe form is lethal before birth. fetuses show postaxial oligodactyly (instead of polydactyly) and sometimes severe hydrops. 818 non-lethal forms are characterized after birth by severe growth failure; a semi-obtunded state; absence of psychomotor development; microcephaly; congenital cataracts; peculiar facies; broad anteriorly rugose alveolar ridges with cleft palate, edema of the nape of the neck, and unilobulate lungs; male pseudohermaphroditism or female external genitalia in 46,xy patients; postaxial polydactyly of the hands and feet; congenital heart defects; and renal anomalies. 819 hepatic and renal insufficiencies are frequent. 820 the less severe forms in the male have genital anomalies (70%) varying from normal genitalia to severe hypospadias with or without cryptorchidism, and numerous small anomalies whose collection characterizes the syndrome. most patients also show mental retardation and severe behavioral problems. 821 the dhcr7 gene maps to chromosome 11q12-13. its product is a microsomal, membrane-bound protein. many different missense, nonsense, and splice-site mutations as well as duplications and deletions have been reported. 
[822] [823] [824] [825] [826] [827] prenatal diagnosis is possible by combining ultrasound and cytogenetic studies with a biochemical analysis in the second trimester in pregnant women who have low or undetectable levels of conjugated estriol. 828 in denys-drash syndrome, male pseudohermaphroditism is associated with nephroblastoma and renal insufficiency. 829 the pseudohermaphroditism is usually either mixed gonadal dysgenesis, dysgenetic male pseudohermaphroditism, 46,xy pure gonadal dysgenesis, or true hermaphroditism. 830 the most common nephropathy is diffuse mesangial sclerosis. 831 most patients have mutations in the wt-1 gene, 832 which is expressed in the genital ridge in the sixth week of gestation and gives rise to either streak gonads or testicular dysgenesis, but, if a delay in testicular determination occurs, normal testes are formed. 833 the term wagr syndrome refers to wilms' tumor, aniridia, genital anomalies, and mental retardation. prevalence is estimated at between 0.75% and 2% of wilms' tumor patients. the syndrome is related to denys-drash syndrome and to frasier syndrome (a variety of 46,xy gonadal dysgenesis). 834, 835 all have in common mutations in the wt-1 gene located on chromosome 11 (11p13). the wt-1 product is a transcription factor expressed in different tissues that participates in embryogenesis and cell differentiation. mutations lead to the production of an anomalous protein that causes alterations in renal function, gonadal anomalies, and the loss of tumor suppressor function. six allelic variants have been described: isolated wilms' tumor, mesothelioma, isolated diffuse mesangial sclerosis, denys-drash syndrome, frasier syndrome, and wagr syndrome.
frasier syndrome is caused by mutations in the donor splice site of intron 9, with the subsequent loss of the +kts isoform (the patient has an imbalance in kts isoforms), whereas large deletions or loss of genetic material that comprises the wt-1 gene and other contiguous genes (pax6 or an) lead to the wagr syndrome. 836, 837 patients with opitz's syndrome are mainly boys with hypertelorism and, in the severe forms, unilateral or bilateral cleft lip, laryngeal cleft, severe dysphagia with more or less life-threatening aspiration, hypospadias and, occasionally, imperforate anus. the most important internal anomalies are those in the tracheobronchial tree, cardiovascular system (defects in cardiac septation), and gallbladder, with an underlying defect of the developing embryonal ventral midline. the syndrome is genetically heterogeneous and consists of two entities that were described as different in the past: ados, or autosomal dominant opitz syndrome or g syndrome, 838 with a mutated gene that maps to 22q11.2; and xlos, or x-linked opitz syndrome or bbb syndrome, 839 with a mutated gene that maps to xp22.3. 840, 841 camptomelic dysplasia is an autosomal dominant syndrome with multiple osseous malformations. patients have a 46,xy karyotype and external genitalia that are ambiguous or female. gonadal histology varies from testes to dysgenetic ovaries with primary follicles. the cause is haploinsufficiency of sox9, located on 17q. 842 the incidence of gonadoblastoma is low. atr-x syndrome is characterized by mild α-thalassemia, mental retardation, facial dysmorphism, and hypospadias. 843, 844 the disorder is x-linked, and is caused by mutation in the atr-x gene (synonyms xnp, xh2). 845
testicular biopsy to diagnose infertility began in the 1940s, 846, 847 and most of the diagnostic terms used today were created then.
848 these terms are usually descriptive and, except for a few (normal testes, sertoli cell-only tubules, tubular hyalinization, for example), do not specify the degree of tubular abnormality that is evaluated by each pathologist subjectively. the terms maturation arrest and hypospermatogenesis have been applied to biopsies in more than 50% of cases of infertility, 849-851 but the criteria for these vary widely among pathologists. two forms of maturation arrest have been described: spermatogenic arrest, and spermatocytic arrest, or its equivalent, meiotic arrest. true spermatogenic arrest is rare because germ cell maturation usually does not arrest at the level of a defined germ cell type. 852 to avoid confusion the term irregular hypospermatogenesis has been proposed 853 for testicular biopsies with decreased numbers of germ cells, subclassified as slight, moderate, or severe. however, this diagnosis is of little help to clinicians. the reported frequency of spermatocytic (meiotic) arrest in infertile men varies from 12% 854 to 32.1% 855 and is present in one or both testes of about 18% of oligozoospermic or azoospermic patients. 856 if observed in only one testis, the contralateral testis may show histologic changes ranging from normal spermatogenesis to hyalinized tubules. disorganization of the seminiferous tubular cell layers is another frequent diagnosis in testicular biopsies, 848,857,858 but this term is rejected by many pathologists. actual disorganization of the seminiferous tubular cells is unlikely and has not been demonstrated in ultrastructural studies. in most cases, the apparent disorganization is an artifact induced by handling or fixation. 859, 860 the term tubular blockage was introduced by meinhard and co-workers 858 for testes with at least 50% of seminiferous tubules devoid of a central lumen and showing spatial disorganization of germ cells.
this morphology was found in 28% of testicular biopsies from infertile men, mainly those with obstructive azoospermia. 861 although this appearance can result from improper fixation, 862 the accumulation of sertoli cells and immature germ cells in the centers of tubules suggests a specific lesion, a variant of germ cell sloughing. diagnostic confusion decreased the interest and trust of urologists and andrologists in the study of testicular biopsies. subsequent studies attempted to correlate semen spermatozoa concentration with testicular size and biochemical findings such as serum levels of fsh, and testicular biopsies were undertaken in only a limited number of oligozoospermic and azoospermic patients. 859, 862, 863 however, these studies were also discouraging because fsh was found to correlate poorly with numbers of spermatozoa in the semen but better with numbers of spermatogonia in the seminiferous tubules, 864 and normal numbers of spermatozoa can be produced by relatively small testes whereas some large testes have no spermatogenesis. in recent years, serum levels of inhibin b have been shown to have a positive correlation with spermatozoon numbers and serum fsh level. 865, 866 the development of morphometry caused a resurgence of interest in biopsies. many semiquantitative 853,867-869 and quantitative [870] [871] [872] [873] [874] [875] studies were carried out. the greatest achievements of these studies were enhancement of the reproducibility of results and better evaluation of the reversibility of lesions. morphometry emerged as the best method to objectively evaluate the seminiferous tubular cells. 876 the scoring method of johnsen, 868 estimation of the germ cell/sertoli cell ratio for each germ cell type, 871 and calculation of germ cell number per unit length of seminiferous tubules 870 are reliable and useful.
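as an illustration of the arithmetic behind the germ cell/sertoli cell ratio, the following minimal python sketch averages counts over cross-sectioned tubules and divides each germ cell mean by the sertoli cell mean. the function names, the cell-type labels, and the counts are hypothetical and chosen only for illustration; they are not reference values and do not reproduce the cited authors' protocols.

```python
from statistics import mean


def mean_counts_per_tubule(tubules):
    """Average each cell type over the cross-sectioned tubules that were counted.

    `tubules` is a list of dicts, one per tubule, mapping a cell-type label
    (hypothetical names such as "spermatogonia" or "sertoli") to its count.
    """
    if not tubules:
        raise ValueError("at least one tubule is required")
    cell_types = tubules[0].keys()
    return {cell: mean(t[cell] for t in tubules) for cell in cell_types}


def germ_to_sertoli_ratios(mean_counts, sertoli_key="sertoli"):
    """Divide the mean count of every germ cell type by the mean Sertoli cell count."""
    sertoli = mean_counts[sertoli_key]
    if sertoli == 0:
        raise ValueError("no Sertoli cells counted; the ratio is undefined")
    return {
        cell: count / sertoli
        for cell, count in mean_counts.items()
        if cell != sertoli_key
    }


# Illustrative counts only (not reference values) for three tubules.
example_tubules = [
    {"spermatogonia": 20, "primary_spermatocytes": 30, "sertoli": 10},
    {"spermatogonia": 22, "primary_spermatocytes": 28, "sertoli": 12},
    {"spermatogonia": 18, "primary_spermatocytes": 32, "sertoli": 8},
]
example_ratios = germ_to_sertoli_ratios(mean_counts_per_tubule(example_tubules))
```

expressing each germ cell type relative to sertoli cells, rather than as a raw count, makes the index less dependent on how many tubules happen to be sampled, which is one plausible reason such ratios improve reproducibility.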
several methods are available to evaluate the leydig cell population, including the mean number of cells per seminiferous tubule and per cell cluster; the mean number of leydig cell clusters per seminiferous tubule; the ratio of leydig cell area to seminiferous tubule area; 877 and the ratio of leydig cells to sertoli cells. 878 these methods have shown that the appearance of leydig cell hyperplasia described in many conditions is false, and that true leydig cell hyperplasia is extremely rare. optimal interpretation of testicular biopsies depends on the surgical technique by which the tissue sample is taken, the care and delicacy with which the specimen is manipulated, and proper fixation and processing of the tissue. the size of the biopsy should not be greater than a grain of rice: that is, no diameter should be greater than 3 mm. this amounts to about 0.12% of testicular volume (normal volume is approximately 20 ml). the biopsy should be bilateral because in more than 28% of patients the findings differ between the testes. at the time of biopsy, the testicular axes should be measured as the basis of quantitative studies. the tissue should be taken opposite to the rete testis through a 4-5 mm incision in the tunica albuginea. the parenchyma herniates through the incision and can be carefully snipped off. if only light microscopy is to be performed, the specimen should be fixed in bouin's fluid for 24 hours. if electron microscopy is indicated, a small biopsy fragment should be fixed in glutaraldehyde-osmium tetroxide or a similar fixative. to perform meiotic studies, testicular biopsy should be processed by air-drying or surface-spreading methods. the examination of testicular biopsies includes qualitative and quantitative evaluation of the testis and correlation between the biopsy and spermiogram. light microscopy immediately reveals whether the lesion is focal or diffuse.
if focal, the percentage of tubules showing each lesion (sertoli cell-only, hyalinization, tubular hypoplasia, etc.) should be calculated. it is useful to evaluate elastic fibers with a special stain because this highlights groups of small tubules that may be missed with hematoxylin and eosin. a minimum of 30 cross-sectioned tubules should be studied (this is usually possible when five or six histological sections are available). the diameter of each tubule should be measured, and the number of spermatogonia, primary spermatocytes, young spermatids (also called round spermatids or sa + sb spermatids), mature spermatids (also called elongated or sc + sd spermatids), sertoli cells, and, in some cases, peritubular cells counted. the presence of tubular diverticula, 879
the most frequently observed lesions are sertoli cell-only tubules, tubular hyalinization, alterations in spermatogenesis in either the adluminal or the basal compartments of seminiferous tubules, and mixed tubular atrophy. sertoli cell-only syndrome includes all azoospermias in which the seminiferous epithelium consists only of sertoli cells. to better understand this syndrome, it is necessary to consider the morphological and functional changes induced in the sertoli cell by hypophyseal gonadotropin secretion during puberty. during childhood, sertoli cells are pseudostratified and their nuclei are dark, small, and round or elongated, with regular outlines and one or two small peripherally placed nucleoli. the cytoplasm lacks specialized organelles. 881 adult sertoli cells have characteristically pale, triangular nuclei with irregular, indented outlines. the nucleoli are large and have tripartite structures. the cytoplasm contains abundant smooth endoplasmic reticulum and specialized structures, including annulate lamellae, charcot-böttcher crystals, and specialized junctional complexes with other sertoli cells.
the pubertal increase in length and width of the seminiferous tubules replaces the infantile pseudostratified pattern with a simple columnar distribution. five variants of the sertoli cell-only syndrome are identified by sertoli cell morphology, the degree of development of the seminiferous tubules, and the presence or absence of interstitial lesions. 882 these variants are designated by the appearance of the predominant sertoli cell population: immature sertoli cells, dysgenetic sertoli cells, adult sertoli cells, involuting sertoli cells, and dedifferentiated sertoli cells (fig. 12-88). each type is associated with other tubular and interstitial alterations (table 12-7). the most frequent types of sertoli cell-only syndrome in infertility patients are dysgenetic sertoli cells, adult sertoli cells, and involuting sertoli cells. the clinical manifestations are similar, including normal external genitalia, well-developed secondary male characteristics, azoospermia, elevated serum fsh level, normal or elevated serum lh level, and normal or slightly low testosterone. these clinical and histologic features were long thought to constitute a single syndrome, del castillo's syndrome, but recent ultrastructural, histochemical, immunohistochemical, and cytogenetic studies have shown that this results from a variety of syndromes that may have primary or secondary causes (table 12-7). [883] [884] [885] [886] [887] some patients with the adult or dysgenetic sertoli cell-only syndrome variants have a few spermatozoa in their spermiograms. this discrepancy between oligozoospermia and the biopsy histology is caused by the presence of some seminiferous tubules with complete spermatogenesis elsewhere in the testicular parenchyma.
sertoli cell-only syndrome with immature sertoli cells
sertoli cells in adult testes with this variant of sertoli cell-only syndrome have an immature prepubertal appearance with pseudostratification.
the number of cells per cross-sectioned tubule is greater than normal. other tubular and interstitial features suggest immaturity, including small tubular diameters (<80 µm), tubules lacking central lumina, thin lamina propria lacking elastic fibers, and interstitium lacking mature leydig cells. [888] [889] [890] this syndrome is caused by a deficiency of both fsh and lh which begins in childhood and is responsible for the lack of maturation of the sertoli cells, tubular walls, and interstitium. subsequently, there is no renewal or differentiation of germ cells, and these eventually disappear. when these patients are treated with hormones, the biopsy may show some degree of spermatogenesis or thickening and hyalinization of the tubular basement membrane.
sertoli cell-only syndrome with dysgenetic sertoli cells
dysgenetic sertoli cells begin pubertal differentiation but variably deviate from normal maturation, so that the morphology of dysgenetic sertoli cells differs among tubules and even among sertoli cells within the same tubule. nuclei usually have both mature features (pale chromatin and a centrally located, tripartite nucleolus) and features of immaturity (ovoid or round shape; regular outline; and small, dense chromatin granules) (fig. 12-89). 891 in addition to vimentin, sertoli cells immunoexpress anti-müllerian hormone (amh) 892 and cytokeratin 18. 893 immunoreaction to these two substances is assumed to be a sign of immaturity, as under normal conditions it is not detected after puberty. other signs of immaturity are poor development of the hematotesticular barrier 894 and the absence of tubular lumina. tubular lumina are very small or absent in most dysgenetic sertoli cell-containing tubules, because the ability to produce testicular fluid is greatly reduced. sertoli cell numbers per cross-sectioned tubule are very high, and mean tubular diameter is lower than 120 µm.
the tubular walls have few elastic fibers, 534 and most tubules show a variable degree of tunica propria hyalinization. completely hyalinized tubules are frequent. the testicular interstitium contains a variable number of leydig cells (normal, decreased, or apparently increased), many of which are pleomorphic with abundant paracrystalline inclusions. 895, 896 most patients have normal or slightly subnormal testosterone level and elevated levels of fsh and lh. this syndrome can be observed in men with cryptorchid testes, at the periphery of germ cell tumors, in men with idiopathic infertility, 897 and in men with y chromosome anomalies. 898
sertoli cell-only syndrome with mature sertoli cells
in this variant, most sertoli cells appear mature but are present in increased numbers (14 ± 0.8 per cross-sectioned tubule). the seminiferous tubules have small diameters, but are still larger than in the two variants described above, and central lumina are visible. the cytoplasm contains abundant vacuoles that communicate with the tubular lumina (fig. 12-90). the lateral cell surfaces have many unfolding and extensive specialized junctions with other sertoli cells (from the basement membrane to the apical cytoplasmic portion). lipid droplets, usually derived from phagocytosis of spermatid tubulobulbar complexes and dead germ cells, are scant. 884 vimentin filaments are abundant in the basal and perinuclear cytoplasm. 899 the lamina propria is normal or slightly thickened. leydig cells are normal. serum testosterone level is normal or nearly normal, and fsh and lh levels are elevated. [900] [901] [902] this syndrome is probably caused by failure of migration of primordial germ cells from the primitive yolk sac to the gonadal ridge. 903 this failure may be due to a deletion in the azfa region in yq11 904 or a mutation in the genes that encode c-kit or its ligand (stem cell factor), responsible for migration, proliferation, and survival of germ cells.
testes with this variant of sertoli cell-only syndrome have numerous changes. sertoli cell nuclei may have lobulated shapes with irregular outlines, coarse chromatin granules, and inconspicuous nucleoli. seminiferous tubules have central lumina, decreased diameters, and variable thickening of the basement membrane (fig. 12-91). elastic fibers are present in normal or diminished amounts. leydig cells are variably involuted. this syndrome may be a primary disorder or secondary to irradiation or cytotoxic therapy, such as cancer chemotherapy or treatment for nephrotic syndrome. 905 it is not usually possible to determine the etiology from the biopsy findings alone. changes in the tubular walls are more pronounced in patients with a history of cyclophosphamide treatment, combination chemotherapy, or radiotherapy. the testicular interstitium may be fibrotic in patients treated with cis-platinum or cyclophosphamide. 906 some syndromes with involuting sertoli cells, mainly those associated with decreased number of elastic fibers, probably express a primary testicular anomaly with involuting and dysgenetic sertoli cells within the same tubule. the presence of immature-appearing sertoli cells in the tubular wall is thickened and contains elastic fibers, increased amounts of collagen fibers, and elevated numbers of peritubular cells as a result of tubular shortening. mean tubular diameter is markedly decreased to less than 90 µm. the testicular interstitium contains few leydig cells, and these appear dedifferentiated or contain an increased amount of lipofuscin. this variant has been observed in surgical specimens from patients receiving androgen deprivation therapy for prostatic cancer, estrogen treatment for transsexuality, and cancer chemotherapy with cis-platinum. there is a correlation between the degree of sertoli cell dedifferentiation and the dose and timing of treatment with estrogens or anti-androgens.
brief treatment induces germ cell loss and inconspicuous sertoli cell changes; long-term treatment causes pronounced sertoli cell changes, including initial nuclear rounding followed by nuclear elongation and the development of dark chromatin masses. 907 eventually, the nuclei come to resemble those of infantile sertoli cells, including pseudostratification. at the same time, the tubules become hyalinized and peritubular cells increase whereas leydig cells disappear. 908, 909 estrogens act on the pituitary by inhibiting lh secretion, and on leydig cells. 910 the action of gonadotropin-releasing hormone agonist analogs is only on the pituitary, whereas cis-platinum acts only on the testis. the most common causes of tubular hyalinization include dysgenetic hyalinization, hormonal deficit, ischemia, obstruction, inflammation, and physical or chemical agents. the differential diagnosis is given in table 12-8.
dysgenetic hyalinization
dysgenetic hyalinization is a diffuse lesion in which most tubules are uniformly hyalinized (fig. 12-92). tubules lack seminiferous tubular cells and have a reduced number of peritubular cells. the few preserved tubules usually contain only sertoli cells, although rarely a few with spermatogenesis are present. dysgenetic hyalinization is seen in klinefelter's syndrome, testes that remain cryptorchid through puberty, and some hypergonadotropic hypogonadisms associated with myopathy. focal lesions are seen in mixed atrophy of the testis. tubular hyalinization is pronounced in klinefelter's syndrome, and from infancy the seminiferous tubules are small, containing reduced numbers of sertoli cells and few or no spermatogonia. at puberty, the dysgenetic sertoli cells fail to mature and soon disappear. the tubules collapse, giving the appearance of phantom tubules. 911 peritubular cells fail to differentiate and their number is low.
912 they form a discontinuous ring around the hyalinized tubules and are incapable of synthesizing elastic fibers and other components of the lamina propria. dysgenesis also involves the interstitium: leydig cells exhibit a characteristic adenomatous pattern, although their total number is decreased. the morphology of the leydig cell is not uniform, and there are shrunken, normal, and large forms. most contain reduced amounts of lipofuscin granules and lipid droplets. reinke's crystalloids are uncommon, and paracrystalline inclusions are abundant. 896 in spite of the hyperplastic adenomatous appearance of the leydig cells, testosterone secretion is markedly decreased, and the resulting hypogonadism is the most important clinical feature of klinefelter's syndrome. tubular hyalinization in the cryptorchid testis is also dysgenetic. however, in contrast to the atrophic collapse seen in klinefelter's syndrome, cross-sections of the hyalinized tubules in cryptorchidism are targetoid. this results from the arrangement of the peritubular cells into two layers, suggesting an atrophic process that has evolved over a longer period than in klinefelter's syndrome, or a lower degree of dysgenesis. 913 elastic fibers are diminished. 534 in the interstitium leydig cells appear hyperplastic, forming large aggregates, although their absolute numbers are decreased. leydig cell pleomorphism is less intense than in klinefelter's syndrome. many leydig cells have abundant vacuolated cytoplasm. whereas tubular hyalinization in klinefelter's syndrome is secondary to the effect of pubertal gonadotropin secretion on dysgenetic tubules, tubular hyalinization in cryptorchidism probably results from the effect of increased temperature on the dysgenetic tubules.
however, other mechanisms are also involved in cryptorchid tubular hyalinization, including obstruction of sperm excretory ducts (anomalies in these ducts are frequent in cryptorchidism) and ischemia (principally in testes that could only be incompletely descended by surgery).
hyalinization caused by hormonal deficit
hormonal deficit causes diffuse tubular hyalinization, although the tubules may be recognized for a time as cellular cords surrounded by hyaline material. sertoli cells, a few spermatogonia, and rare primary spermatocytes may be identified in these cords. when hyalinization is complete, only the elastic fibers in the lamina propria indicate the structure of the previously normal adult testis. peritubular myofibroblasts decrease in number and form a ring at the periphery of the lamina propria. leydig cells disappear as hyalinization progresses, and the few that remain have pyknotic nuclei and shrunken cytoplasm with abundant lipofuscin granules. this process manifests clinically as postpubertal hypogonadotropic hypogonadism and is usually caused by a lesion in or near the pituitary, such as pituitary adenoma, craniopharyngioma, and trauma to the cranial base or sella turcica (see discussion on hypogonadotropic hypogonadism in this chapter).
ischemic hyalinization
ischemic atrophy is usually caused by torsion of the spermatic cord, vascular injury during inguinal surgery, 914 polyarteritis nodosa, and severe arteriosclerosis. 915 except for cases caused by torsion of the cord, these patients usually are not referred to infertility clinics. torsion of the spermatic cord often is not listed as a cause in large series of infertile patients. however, follow-up of men with torsion reveals marked alteration in their spermiograms.
several hypotheses have been offered to explain the low number of sperm produced by the contralateral normal testis; the most promising include response to the release of antigens by the ischemic testis, and primary lesions of the contralateral testis 916 (see discussion on testicular torsion in this chapter). testicular anoxia caused by torsion rapidly produces severe lesions that are irreversible without adequate treatment. eight hours after torsion, there is intense hemorrhagic infarction of the seminiferous tubular cells. chronic anoxia leads to tubular hyalinization and loss of leydig cells (fig. 12-93). testicular atrophy secondary to inguinal hernia surgery occurs in 0.03-0.5% of patients at the first repair, and in 0.8-5% after surgery for recurrent hernia. atrophy is most fre
postobstructive hyalinization
obstruction of sperm excretory ducts may cause atrophy of seminiferous tubules. in order to produce tubular hyalinization, the obstruction must be close to the testis because the ductuli efferentes in the caput epididymis absorb about 90% of tubular fluid and protect the testis from excessive intratubular pressure. obstructive tubular hyalinization is usually focal and secondary to varicocele and other disorders involving dilation of the channels of the rete testis. these may be congenital, as in epididymis-testis dissociation, or acquired, as in rete testis dilation secondary to epididymal atrophy caused by arteritis, arteriosclerosis, or androgen insufficiency. obstructive tubular hyalinization also occurs in the seminiferous tubules at the periphery of the testis in patients who have had orchitis. 917 obstructive hyalinization has a mosaic distribution: lobules of completely hyalinized tubules are intermingled with lobules of normal tubules (fig. 12-94). the diameter of the hyalinized tubules is not as small as in other causes of hyalinization, and the tubules occasionally contain sertoli cells.
in the centers of many of the tubules there is a small lumen or vacuole, the latter in the cytoplasm of a residual sertoli cell. 918 the lamina propria is thick and contains hypertrophic peritubular cells and abundant extracellular material. finally, the peritubular cells dedifferentiate and only fibroblasts remain. 919 the interstitium contains a normal number of leydig cells forming small clusters, some of which are among hyalinized tubules. this is not seen in other patterns such as ischemic hyalinization. in addition, dilated veins with eccentrically hyalinized walls can be seen in testes associated with varicocele. this lobular pattern of tubular atrophy causes a peculiar ultrasound image which has been described as a striated pattern. 920, 921
postinflammatory hyalinization
many infections of the testis cause irreversible lesions in the seminiferous tubules. in bacterial infection the epididymis is usually involved, resulting in obstructive azoospermia. in viral infection the testis is often affected, even without symptoms. two types of viral orchitis often cause infertility: mumps orchitis and coxsackie b orchitis. tubular atrophy caused by viral infection has a mosaic topography in which hyalinized and normal tubules are intermingled. in fully hyalinized tubules, the only recognizable cells are peritubular cells that form an incomplete, peripheral ring around the hyalinized material. the presence of elastic fibers in these tubules distinguishes this from dysgenetic hyalinization. leydig cells form clusters of variable size, but their total number is normal. in bacterial infection the pattern of tubular hyalinization is variable. tubular atrophy of unknown etiology may be caused by an autoimmune response.
this appears to occur in hypogonadism associated with disorders in other endocrine glands, such as addison's disease associated with gonadal insufficiency; adrenal-thyroid-gonadal insufficiency; and the association of diabetes, hypogonadism, adrenal insufficiency, and hypothyroidism. the testicular lesions are morphologically similar to those seen in the seminiferous tubules at the periphery of germ cell tumor and in testes with burn-out germinal cancer. in the initial stages of hyalinization associated with germ cell neoplasm, the tubules are small, contain intratubular germ cell neoplasia and dedifferentiating sertoli cells, and the lamina propria is infiltrated by macrophages, lymphocytes, and plasma cells. in the final stages, the intratubular cells have degenerated, the inflammation has disappeared, and the seminiferous tubules are replaced by areas of hypocellular or acellular fibrosis (fig. 12-95). it should be noted that autoimmune hyalinization is not the most common type of hyalinization associated with testicular tumors: the obstructive, ischemic, and dysgenetic variants are more common. radiation and a wide variety of chemicals cause tubular hyalinization. lengthy cancer chemotherapy combined with radiotherapy invariably causes hyalinization. children's testes are more sensitive to radiation than those of adults. radiation for testicular leukemia frequently causes tubular hyalinization. in addition, radiation induces dense interstitial fibrosis and loss of peritubular cells, obscuring the borders between the interstitium and the tubules. this makes the tubules hard to see in hematoxylin-eosin-stained sections. leydig cells are atrophic and decreased in number. ischemia secondary to radiation-induced vascular injury also contributes to hyalinization.
in tubular hyalinization associated with cancer chemotherapy, in addition to the direct toxicity of drugs on seminiferous tubular cells (see discussion on sertoli cell-only syndrome with involuting sertoli cells in this chapter), nutritional deficiencies cause hypogonadotropic hypogonadism. 922, 923 histophysiological studies have distinguished two compartments in the seminiferous tubules: basal and adluminal. the blood-testis barrier separates these, and each contains different cell types with diverse hormonal and nutritional requirements. on this basis, lesions may be classified as involving only the adluminal compartment or both the basal and the adluminal compartments. the following discussion of spermatogenic lesions uses this new concept of tubular pathophysiology, conserving as much as possible of the classic terminology. this category includes all infertile testes with normal numbers of spermatogonia per cross-sectioned tubule, normal or decreased numbers of spermatocytes and young spermatids, and variable numbers of adult spermatids. a descriptive term for this disorder is immature germ cell sloughing. a few immature germ cells are normally found in the lumina of the seminiferous tubules, 923 a finding that correlates with their presence in the ejaculates of fertile men. 924 when these cells make up more than 4% of the cells in the ejaculate, it is abnormal and the result of premature sloughing of spermatids and, in some cases, of spermatocytes. 925, 926 some authors have attempted to establish a correlation between the number of sloughed immature germ cells and the severity of lesions of the seminiferous tubules using light 927 and electron 928 microscopy. lesions in the adluminal compartment are classified according to the most abundant type of germ cell whose maturation is arrested and which then sloughs: young spermatids, late primary spermatocytes, or early primary spermatocytes (fig. 12-96).
young spermatid sloughing
young spermatid sloughing is present when the ratio of elongated (sc + sd) spermatids to round (sa + sb) spermatids is lower than normal. the implication of this pattern is that many round spermatids are incapable of further differentiation and are sloughed (fig. 12-97).
late primary spermatocyte sloughing
in this condition, spermatogenesis develops normally up to the level of interphase primary spermatocytes, and these are present in normal numbers. afterwards, these spermatocytes degenerate without achieving meiosis and slough into the tubular lumen. all types of spermatid are greatly reduced in number. when biopsies of these testes are not properly fixed, the seminiferous tubules may have a target-like appearance, with numerous cells in the lumen. this appearance sometimes has been referred to as tubular blockage. another descriptive term, spermatogenic arrest, also has been applied to this morphology. the latter term is inadequate in most cases, because some spermatids are present, and the number of primary spermatocytes is usually not increased as would occur if the transformation of spermatocyte into spermatid were blocked (fig. 12-98). late spermatocyte sloughing is a more accurate term for this condition and is preferred. primary spermatocyte sloughing occurs at the pachytene or diplotene stage of meiosis.
early primary spermatocyte sloughing
this lesion is characterized by the presence of a normal number of spermatogonia and decreased numbers of primary spermatocytes (fig. 12-99). the seminiferous tubules may contain a few spermatids. the term early primary spermatocyte sloughing does not necessarily imply an early meiotic lesion, which is quite rare. 856, 926 rather, it refers to the sloughing of newly formed spermatocytes. the sertoli cells may show vacuolation of the apical cytoplasm as an expression of germ cell loss.
this lesion is more severe than that in testes with late primary spermatocyte sloughing, and is considered to result from failure of the sertoli cells to maintain the adluminal compartment. etiology. the mechanisms causing adluminal compartment lesions can be classified into obstructive and nonobstructive. obstruction is present in more than 70% of cases, and is characterized by variability of involvement among lobules and the presence of at least two of the following abnormalities: enlargement of tubular diameter and lumen, with remarkable differences among lobules; sertoli cells with adherent germ cells protruding into the lumen, giving an indented outline; intense apical vacuolation of sertoli cell cytoplasm; accumulation of spermatozoa in the lumen of some tubules; or a number of sc + sd spermatids higher than that of sa + sb spermatids (see testicular lesions resulting from obstruction of sperm excretory ducts). 929 the three levels of severity of adluminal compartment lesions, emphasized by the terms young spermatid sloughing, late primary spermatocyte sloughing, and early primary spermatocyte sloughing, depend on the degree (total or partial) of obstruction and the level of sperm excretory duct obstruction: the nearer the obstruction is to the testis, the greater the severity. obstruction may be extratesticular (epididymis, vas deferens, and ejaculatory ducts) or intratesticular (rete testis or any level of the seminiferous tubule length). the most frequent causes of extratesticular excretory duct obstruction are vasectomy, inflammation (epididymitis, prostatitis), mucoviscidosis (congenital bilateral absence of vas deferens), and testis-epididymis dissociation. rete testis obstruction. varicocele is the most frequent cause of obstruction of the rete testis. more than 50% of testes with varicocele patients also often have spermatozoa with characteristically elongated heads with thin bases.
930 initially, abnormalities are confined to the testis ipsilateral to the varicocele, but eventually both testes are affected, although abnormalities are more severe in the ipsilateral testis. elevated pressure in the pampiniform plexus is transmitted to the veins within the testes, principally to the centripetal veins that cross the testicular mediastinum and drain most of the testicular parenchyma (fig. 12-100). 931 the dilated centripetal veins compress the intratesticular sperm excretory ducts, explaining the mosaic distribution of the tubular lesions. 932 seminiferous tubule obstruction. obstruction at the level of the seminiferous tubules can be dysgenetic or post-orchitic. a dysgenetic cause may be suspected in specimens with a mosaic distribution of lesions and seminiferous tubules with small diameters, thickened lamina propria, and an unusual seminiferous tubular cell layer consisting of cuboidal sertoli cells and spermatozoa that clog the lumina (fig. 12-101). the diagnosis is confirmed if study of serial sections demonstrates continuity between these tubules and those with conserved spermatogenesis. the structure of seminiferous tubules has been observed with scanning microscopy at such points of continuity. 858, 933 tubular stenosis appears to be due to a primary anomaly of sertoli cells and peritubular cells. post-orchitic obstruction should be suspected in cases of tubular atrophy with a mosaic pattern without dysgenetic tubules or varicocele. some patients have a history of orchitis associated with parotitis; 934 in others the only findings are oligozoospermia and small testes. testicular biopsy, sampling only the testicular periphery, reveals only the consequences of obstruction, lesions similar to those observed with varicocele. however, some postinflammatory changes should also be present, including hyalinized tubules, dilated tubules lined by cuboidal sertoli cells, or complete spermatogenesis.
occasionally, there is modest perivascular or peritubular inflammation and angiectasis. 935, 936 about 30% of testes with lesions in the adluminal compartment have no obstruction, and most have primary anomalies of germ cells. this claim is supported by the following: pronounced decrease of a germ cell type when the preceding type is greatly increased in number; normal correlation between the number of mature spermatids in biopsy and number of spermatozoa in the spermiogram; and the presence of numerous malformed germ cells in the adluminal compartment. the decrease in the number of a germ cell type may be so marked that spermatogenesis is arrested, with subsequent azoospermia. in some cases, maturation arrest is only partial and results in severe oligozoospermia. this maturation arrest is observed mainly in primary spermatocytes and young spermatids. primary spermatocyte sloughing may also be due to meiotic anomalies (fig. 12-102). the observation of increased numbers of spermatocytes arrested in preleptotene-leptotene 926 or, more frequently, pachytene 856 suggests the diagnosis. the lesion is always bilateral. spermatocytes arrested in pachytene are usually increased in size and later degenerate. in addition, some spermatids have large, diploid, spherical, hyperchromatic nuclei. the anomaly does not always affect all spermatocytes, and then a higher number of spermatids are produced. 856 young spermatid sloughing not associated with obstruction may be due to either meiotic anomalies or defective spermiogenesis. the former gives rise to many multinucleate, polyploid, hyperchromatic young spermatids. in the latter, young spermatids are incapable of transforming into mature spermatids, and only round spermatids appear in the ejaculate. lesions in the basal and adluminal compartments of seminiferous tubules are the most frequent histological findings in testicular biopsies from infertile men.
these testes may be classified into two major subgroups: hypospermatogenesis and spermatogonial maturation arrest (fig. 12-103). hypospermatogenesis: types and etiology. hypospermatogenesis is defined as a reduced number of spermatogonia and primary spermatocytes, with primary spermatocytes outnumbering the spermatogonia. most seminiferous tubules contain few spermatids. about 8% of patients with hypospermatogenesis have focal tubular hyalinization. 937 two variants of hypospermatogenesis have been quantitatively distinguished: pure hypospermatogenesis, and hypospermatogenesis associated with sloughing of primary spermatocytes. pure hypospermatogenesis is defined as a proportionate decrease in the number of all types of germ cell. the number of spermatogonia per cross-sectioned tubule is less than 17 and usually more than 10. the number of primary spermatocytes is equal to or higher than that of spermatogonia. the number of round spermatids is higher than that of primary spermatocytes, and the number of elongated spermatids is similar to that of spermatogonia (fig. 12-104). hypospermatogenesis associated with primary spermatocyte sloughing is characterized by two features: low numbers of spermatogonia and primary spermatocytes (with spermatocytes more numerous than spermatogonia), and degeneration and sloughing of many primary spermatocytes. the remaining spermatocytes give rise to the few spermatids observed in the tubules (fig. 12-105). etiology of hypospermatogenesis. hypospermatogenesis may result from hormonal dysfunction, congenital germ cell deficiency, sertoli cell dysfunction, leydig cell dysfunction, androgen insensitivity, exposure to chemical or physical agents, and vascular malfunction. hormonal dysregulation. although complete spermatogenesis may be observed in men with low levels of fsh and lh, the production of a normal number of spermatozoa requires normal gonadotropin levels.
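the quantitative definition of pure hypospermatogenesis given above can be sketched as a simple check on mean germ cell counts per cross-sectioned tubule. this is a minimal illustration, not a diagnostic tool: the container `TubuleCounts`, the function name, and the parameter `similarity_tolerance` are hypothetical, and because the text does not quantify how "similar" the elongated spermatid and spermatogonia counts must be, the relative tolerance used here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TubuleCounts:
    """Mean germ cell counts per cross-sectioned tubule (hypothetical container)."""
    spermatogonia: float
    primary_spermatocytes: float
    round_spermatids: float
    elongated_spermatids: float


def matches_pure_hypospermatogenesis(
    counts: TubuleCounts, similarity_tolerance: float = 0.25
) -> bool:
    """Check the four relations stated in the text for pure hypospermatogenesis."""
    # Fewer than 17 spermatogonia per cross-sectioned tubule (threshold from the text).
    reduced_spermatogonia = counts.spermatogonia < 17
    # Primary spermatocytes equal to or higher than spermatogonia.
    spermatocytes_ok = counts.primary_spermatocytes >= counts.spermatogonia
    # Round spermatids higher than primary spermatocytes.
    round_ok = counts.round_spermatids > counts.primary_spermatocytes
    # "Similar" is not quantified in the text; this relative tolerance is an assumption.
    elongated_ok = (
        abs(counts.elongated_spermatids - counts.spermatogonia)
        <= similarity_tolerance * counts.spermatogonia
    )
    return reduced_spermatogonia and spermatocytes_ok and round_ok and elongated_ok


example = TubuleCounts(
    spermatogonia=12,
    primary_spermatocytes=14,
    round_spermatids=18,
    elongated_spermatids=11,
)
print(matches_pure_hypospermatogenesis(example))
```

the check deliberately omits the "usually more than 10" lower bound, since the text presents it as typical rather than required.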
hypospermatogenesis has been reported in patients with abnormal pulsatile secretion of fsh and lh, 938 low gonadotropin secretion, 939 biologically inactive gonadotropins, mutation in the gonadotropin β subunit, 940 inactivating mutation of the fsh receptor gene, 941 hyperprolactinemia, and adrenal and thyroid dysfunction (see discussion on hypogonadisms secondary to endocrine gland dysfunction in this chapter). congenital germ cell deficiency. biopsy of cryptorchid patients after orchidopexy reveals that spermatogonia proliferation is decreased and germ cell development is insufficient in adulthood even if the number of spermatogonia was normal in infancy. it is likely that this poorly understood primary anomaly of germ cells is present in some cases of hypospermatogenesis. sertoli cell dysfunction. for many years, primary germ cell deficiency was considered the most common cause of hypospermatogenesis; today, it is known that sertoli cell failure is the cause of many cases of germ cell deficiency. this conclusion is based on several findings. sertoli cells in many infertile patients are markedly abnormal, with an increase in the number of glycogen granules 942 and acid phosphatase activity; 884 a decrease in the number of lipid droplets; and alterations in the cytoskeleton, 943 the nucleus, 944 and cytoplasmic organelles. 945 in some cases sertoli cells have abnormal maturation, with elongated nuclei containing coarse clumped chromatin instead of triangular-shaped nuclei with pale chromatin. anomalies in sertoli cell fsh receptors may be present in idiopathic oligozoospermia associated with elevated levels of fsh. 946 serum inhibin b concentration may be used as a marker to estimate sertoli cell function. 947 leydig cell dysfunction. testosterone synthesis by leydig cells is necessary for normal spermatogenesis, 948 and abnormal leydig cell function is a frequent finding in idiopathic oligozoospermia.
949-951 leydig cell dysfunction should be suspected when the cells appear diffusely hyperplastic. patients have elevated serum lh level with depletion of rapid-release testosterone, revealing a lack of early response of leydig cells to gonadotropin-releasing hormone stimulation. the ratio of testosterone to lh in the plasma indicates the degree of leydig cell dysfunction. a decreased ratio with normal testosterone level suggests compensated dysfunction. patients with a ratio of less than 1:5 and otherwise normal parameters may have complete spermatogenesis. 951 androgen insensitivity. some patients with severe oligozoospermia or azoospermia have a defect in androgen receptor responsiveness, similar to that in reifenstein's syndrome. 952-954 the abnormality may arise from a genetic defect in the eight exons that code for this receptor, mapped to xq11-12, 955 or from post-translational errors. 956, 957 this defect is also referred to as infertile male syndrome and mild androgen insensitivity, and the patients have male phenotype with somatic features of slight androgen deficit. 958 histologically, the testis is similar to that observed with leydig cell dysfunction or mixed atrophy, although the mechanism causing the leydig cell hyperplasia is quite different (fig. 12-106). peripheral resistance to testosterone action alters regulation of the hypothalamohypophyseal-testicular axis, and lh and testosterone levels are elevated. androgen insensitivity causes between 10% 959 and 40% 960 of all cases of severe oligozoospermia or azoospermia. in such cases spermatogenesis improves with the administration of tamoxifen citrate, 960 clomiphene citrate, or androgen therapy. 961, 962 calculation of the index of androgen insensitivity can be helpful: plasma lh (miu/ml) × plasma testosterone levels (ng/ml). in patients with androgen insensitivity, the index is higher than 200 (normal is about 102). physical and chemical agents.
the number of chemicals implicated in infertility increases daily. a detailed history is invaluable in evaluating these patients. the same is true of physical agents such as prolonged exposure to heat, ionizing radiation, or microwave radiation. 963 etiology of hypospermatogenesis associated with primary spermatocyte sloughing. most testes with primary spermatocyte sloughing have varicocele, and this is commonly associated with infertility. 964-967 varicocele is found in 15% of the general population, and is present in 30-40% of infertile men. the mechanism by which varicocele affects fertility is unknown. clinical varicocele may occur without a testicular lesion (or only phlebectasis), and subclinical varicocele may be associated with severe spermatogenic lesions. increased testicular temperature 968, 969 and compression of intratesticular sperm excretory ducts by dilated veins 932 are the most plausible mechanisms. in other cases, primary spermatocyte sloughing results from anomalies of primary spermatocytes and spermatids, suggesting a meiotic anomaly. finally, in some patients the cause may be the presence of involuting sertoli cells. spermatogonial maturation arrest. spermatogonial maturation arrest is a disorder defined by the presence of fewer than 17 spermatogonia per cross-sectioned tubule and even fewer primary spermatocytes. spermatids are usually absent. there have been attempts to correlate the etiology of spermatogonial maturation arrest with the sertoli cell type present. 970 immature sertoli cells are characteristic of hypogonadotropic hypogonadism and some syndromes with androgen insensitivity (fig. 12-107). mature sertoli cells, if their presence is unilateral, are observed in varicocele, epididymitis, and ipsilateral testicular trauma, but if they appear in both testes the etiology is unknown.
involuting sertoli cells are usually present bilaterally; some cases are idiopathic, whereas others are associated with a history of alcoholism or chemotherapy. dedifferentiated sertoli cells are found in spermatogonial maturation arrest caused by gonadotropin inhibition in treatment with estrogen, gonadotropin-releasing hormone agonist, or anti-androgen. 971 mixed atrophy is a descriptive term for the coexistence, in the same testis, of tubules containing only sertoli cells and tubules with complete or incomplete spermatogenesis. 972 this disorder includes patchy failure of spermatogenesis and partial del castillo's syndrome. the extent of sertoli cell-only tubules varies widely. tubules with spermatogenesis may be normal or partially atrophic. tubular hyalinization is occasionally seen (fig. 12-108). mixed atrophy is more common than suggested by the literature, and many cases are included under other diagnoses, such as 'hypospermatogenesis with a severe germ cell depletion in such a way that some sertoli cell-only tubules are seen,' 859 and 'sertoli cell-only syndromes with focal spermatogenesis.' 973 serial sections from testes with mixed atrophy reveal that the two different types of tubule are grouped according to their histologic pattern, suggesting that the distribution is by testicular lobules. in cases of mixed atrophy, the percentage of tubules with spermatogenesis, the degree of spermatogenic development in the tubules, and the type of sertoli cell present should be reported. correlation of the first two with the spermiogram gives an indication of prognosis, and the sertoli cell type identifies the nature (primary or secondary) of the lesion.
974 mixed atrophy (probably primary) is observed in idiopathic infertility, cryptorchidism (even if orchidopexy was done in infancy, in both the cryptorchid and the contralateral descended testis), retractile testes, macroorchidism, intravaginal torsion of the spermatic cord (in both twisted and contralateral testis), and chromosomal anomalies such as down's syndrome, 47/xyy karyotype, 46/xx karyotype, giant y chromosome, klinefelter's syndrome with chromosomal mosaicism, partial androgen insensitivity, and some male pseudohermaphrodites. secondary mixed atrophy may be seen in patients undergoing chemotherapy or corticoid therapy, 975 or in those with a history of viral orchitis. in addition to anomalies in the seminiferous tubules, examination of the biopsy should include a description of the morphology of the germ cells. giant spermatogonia are a normal component of the seminiferous epithelium. these cells may be altered spermatogonia in the s or g2 phases of the cell cycle. they rest on the basal lamina and have pale cytoplasm and an ovoid nucleus measuring at least 13 µm in diameter. the frequency of these cells in normal and infertile men is about 0.65 cells per 50 cross-sectioned tubules, although their number is usually higher in mixed atrophy. these cells should not be mistaken for intratubular germ cell neoplasia; they are also present in normal numbers in tubules at the periphery of germ cell tumors (fig. 12-109). 976 multinucleate spermatogonia are a common finding in cryptorchid testes that were surgically corrected, infertile patients, and old men. nuclei of both ad and ap spermatogonial types may be seen within the same cell. normally, spermatogonia are present only in the transition zone between the seminiferous tubule basal layer and the tubuli recti.
dislocated spermatogonia have been found throughout the testis in old age, 977 in infertile patients with a variety of lesions, after long-term estrogen therapy, 978 and in seminiferous tubules with intratubular germ cell neoplasia. 979 megalospermatocytes are large primary spermatocytes arrested in the leptotene stage (fig. 12-110) 980 that exhibit asynapsis of chromosomes. 981 joined by cytoplasmic bridges, they form small groups. these cells may be clones of synchronously degenerating spermatocytes. 982 they are frequently found in elderly men and are a non-specific finding in infertile patients. the presence of spermatids with multiple nuclei (from 2 to 86) is frequent in old age. 983 similar cells with fewer nuclei have also been reported in infertility due to cryptorchidism, 984 hyperprolactinemia, and idiopathic infertility (fig. 12-111). there are at least four teratozoospermic syndromes that may be easily identified by testicular biopsy, although in most the diagnosis previously relied on morphologic study of the spermiogram: (1) round-headed spermatids (characteristic of spermatozoa lacking acrosomes) (fig. 12-112), (2) sc + sd spermatids with a very elongated head (characteristic of varicocele) (fig. 12-113), (3) macrocephalic sc + sd spermatids whose dna content suggests an anomaly in the first meiotic division, and (4) sc + sd spermatids with voluminous eosinophilic cytoplasmic droplets (syndrome of spermatozoa with short thick flagella 985 or fibrous sheath dysplasia). in some patients, sa + sb spermatids rest in these initial phases of spermiogenesis and eventually become sloughed in the tubular lumina. 986 in other testes there are macrocephalic sc and sd spermatids with anomalous dna content, suggesting an anomaly in the first meiotic division. ultrastructural study of spermatozoa is sometimes necessary to determine the cause of male infertility.
a number of morphologically abnormal spermatozoa are present in all semen samples, including those from fertile men, but abnormal spermatozoa are very numerous in infertile patients. ultrastructural study is advised in all cases of asthenozoospermia, in teratozoospermia when the number of spermatozoa showing the same morphological anomaly is high, and in cases with apparently normal spermatozoa that fail to fertilize in vitro. 987 the classification of ultrastructural anomalies in spermatozoa is based on light microscopy findings 988 of lesions in the head and tail. anomalies of the spermatozoal head. these are defined by changes in the shape of the head, and usually involve both the nucleus and the acrosome. some anomalies, such as pear-shaped, candle-shaped, or egg-shaped heads, 989, 990 are regarded as minor variants of normal. more significant abnormalities are the elongated, microcephalic, macrocephalic, and crater-defect forms. the most frequent abnormal head shape is elongated with a narrow base (tapered head spermatozoa). this anomaly is frequently associated with varicocele. 991 microcephalic spermatozoa have spherical (globozoospermia) or irregularly shaped heads. the former have spherical nuclei with poorly condensed chromatin and lack acrosomes, postacrosomal sheaths, and a nuclear ring (fig. 12-114). most cases are sporadic, but this lesion was also reported in two pairs of infertile brothers. 992, 993 microcephalic spermatozoa with irregularly shaped heads have small and irregularly shaped acrosomes that usually are not in contact with the nucleus. this anomaly may be congenital, as in aarskog-scott syndrome, 994 or secondary to heat exposure or hashish smoking. in both types of microcephaly, loss of connection between the acrosomal vesicle and the spermatozoal head is attributed to a deficiency in basic proteins of the sperm perinuclear theca that promotes nuclear envelope organization and adhesion of the acrosomal vesicle.
995 acrosin is reduced or absent in spermatozoa lacking acrosomes and those with small acrosomes. 996 motility may be normal. the occurrence of aneuploidy 997 and disomy of sex chromosomes 998, 999 in some cases should be evaluated before performing intracytoplasmic sperm injection (icsi). the cause of round-headed spermatozoa might be the lack of the golgi-associated protein known in male mice as golgi-associated pdz- and coiled-coil motif-containing protein (gopc). this protein is principally localized in the trans-golgi region in round spermatids, and its loss produces globozoospermia. the primary defect consists of an inability of acrosomal vesicles to fuse to each other to create the acrosome. 1000 macrocephalic spermatozoa (macronuclear spermatozoa) have enlarged, irregularly shaped heads and deficient chromatin condensation. there are two types (multiple tails 1001, 1002 and aflagellate), both of which have abnormal dna content (many are tetraploid), suggesting a meiotic anomaly. 1003, 1004 irregularly shaped spermatozoa are characterized by irregularity in the shape of the nucleus or acrosome. 1005 in the crater defect syndrome, there is invagination of the nuclear envelope into which the acrosome penetrates. the tail is morphologically normal, and motility is only slightly reduced. in spermatozoa with spoon-shaped nuclei, the defect is probably genetic. other anomalies include double-headed spermatozoa with two nuclei sharing a single acrosome. 1006 anomalies in the spermatozoal tail. spermatozoal tail anomalies are classified as generalized anomalies of the tail or anomalies in defined tail components, such as the connecting piece, the axoneme, or the periaxonemal structure. 1007 generalized anomalies in the tail. cytoplasmic remnants. the presence of cytoplasmic droplets is normal during spermiogenesis.
an elevated number of spermatozoa with cytoplasmic droplets in semen is associated with premature sloughing of spermatozoa, as occurs in varicocele, and should not be misinterpreted as spermatozoa with excess residual cytoplasm. 1008 these spermatozoa are very often abnormal and the residual cytoplasm may be located around the intermediate piece or surrounding the head. these spermatozoa also have other flagellar anomalies. bent tail. a bend in the tail may occur at the level of the connecting piece or the intermediate piece. in bends of the connecting piece, the tail is laterally implanted and forms an angle with a nucleus that displays a thin base. bends of the intermediate piece are associated with cytoplasmic droplets, malposition of mitochondria, and loss of the parallel arrangement of the dense outer fibers. coiled tail. spermatozoa with a coiled tail are a frequent finding in centrifuged semen, but they may also be a true abnormality. these spermatozoa have a perinuclear cytoplasmic remnant containing a flagellum that is coiled around the nucleus and along the middle or principal pieces (fig. 12-115). this is frequently associated with abnormalities of the periaxonemal structures. tail stump (short-tail spermatozoa). the presence of many spermatozoa with short, thick tails in semen represents a well-defined teratozoospermic syndrome. 1009 ultrastructural examination reveals hypertrophy and hyperplasia of the fibrous sheath, 1010 hence this syndrome has also been termed 'fibrous sheath dysplasia.' 1011 additional axonemal malformations, including absence of the central pair of microtubules (fig. 12-116) 1012 and, less frequently, lack of dynein arms, are observed in 50% of cases. about 24% of patients have respiratory disease, such as rhinosinusitis, bronchitis, and bronchiectasis from an early age.
similar findings have been reported in the cilia of the upper respiratory tract, and thus a relationship between fibrous sheath dysplasia and immotile cilia syndrome has been assumed. clinical presentation may be sporadic or familial. the cause of fibrous sheath dysplasia and the subsequent lack of motility in these spermatozoa is probably related to the occurrence of deletions in akap3 and akap4 genes, as well as the absence of akap4 protein in the fibrous sheath. 1013 multiple tails. the presence of more than two tails is associated with macrocephalic spermatozoa. 1014 sperm tail agenesis. teratozoospermia with 100% sperm tail agenesis has been reported in patients with a high degree of consanguinity. these spermatozoa also have defects in chromatin condensation and residual cytoplasmic droplets. 1015 anomalies of the connecting piece. anomalies of the connecting piece are classified as acephalic spermatozoa, deficient organization of the connecting piece, and separation between the head and the tail. acephalic spermatozoa are known as 'pin-headed,' although they lack a true head; the small cephalic knob-like thickening is actually a cytoplasmic droplet with a variable degree of mitochondrial organization giving rise to a variable degree of motility. 1016 this anomaly is due to an early failure in spermiogenesis. it may be familial in some cases. 1017, 1018 spermatozoa with deficient organization of the connecting piece have narrowing at this level, with loss of alignment of the head and flagellum axes. spermatozoa with a separated head and flagellum, known as decapitated and decaudated spermatozoa, are also the result of an anomaly in spermiogenesis, but the separation between heads and tails can occur during spermiation or at any level of the sperm excretory ducts. 1019, 1020 anomalies in axoneme. abnormalities of the axoneme are classified as numerical anomalies, microtubular ectopia, and immotile cilia syndrome.
the most common numerical anomalies are the absence of one or both microtubules of the central pair and complete lack of the axoneme. spermatozoa lacking the central microtubule pair also lack the central sheath and are immotile, although they are normal by light microscopy. familial cases have been reported. 1021 this anomaly may be associated with ciliary dyskinesia. 1022 immotile cilia syndrome (primary ciliary dyskinesia) 1023 refers to patients having low mucociliary clearance associated with otitis, sinusitis, bronchitis, bronchiectasis, and immotile spermatozoa. most patients have the same defect in the axoneme and cilia of the respiratory mucosa. the frequency of this syndrome is estimated at between 1 in 20 000 and 1 in 60 000 men. clinical symptoms consist of reduced clearance of ciliary mucus in the airway, with onset at infancy. in order to prevent the later development of bronchiectasis, ultrastructural study of the respiratory mucosa is advisable if other disorders have been excluded, including cystic fibrosis, allergy and other immune disorders, α1-antitrypsin deficiency, and cardiovascular and metabolic diseases. 1024 the most frequent anomalies of this syndrome are the absence of microtubule doublets and peripheral junctions, the central microtubule pair, the outer dynein arms, the central junctions, the two dynein arms, and the inner dynein arm plus the peripheral junctions (fig. 12-117). spermatozoa lacking the two dynein arms or the peripheral junctions are immotile. reduced motility is seen in spermatozoa with only one dynein arm. kartagener's syndrome is a variant of the immotile cilia syndrome characterized by the classic triad of situs inversus, bronchiectasis, and chronic sinusitis. the syndrome has autosomal recessive inheritance 1025 and is found in 20-25% of patients with situs inversus.
1026 anomalies in periaxonemal structures. periaxonemal abnormalities include mitochondrial sheath defects, 1027 malposition of the annulus, alteration in number, shape, or length of the outer dense fibers, and absence, thickening, or disruption of the fibrous sheath. 1011, 1028 many cases of asthenozoospermia, present in 30% of infertile men, may be attributable to deficient mitochondrial function, possibly caused by mutations in their dna. 1029 abnormalities of the dense fibers are associated with deficient motility. abnormalities of the fibrous sheath include, in addition to the abovementioned dysplasia of the fibrous sheath, absence of the fibrous sheath, and redundant fibrous sheath material associated with a deficit or lack of mitochondria. 1030 the three defects are probably inherited. the incidence of intratubular germ cell neoplasia (igcn) in infertile patients is 0.4% in england, 1031 0.7% in spain, 1032 0.73% in germany, 1033 and 1.1% in denmark. 1034 a higher risk occurs in patients with severe oligozoospermia (fewer than 10 million spermatozoa per milliliter), azoospermia associated with unilaterally or bilaterally diminished testicular volume, 1035 a history of testicular maldescent, 1036, 1037 or unilateral testicular cancer. 1038 the cells of igcn are located in seminiferous tubules with decreased tubular diameter and lacking spermatogenesis. these cells are large and have pale cytoplasm and large and irregularly outlined nuclei, with one or several prominent nucleoli. they stain intensely with periodic acid-schiff and express placenta-like alkaline phosphatase, c-kit, and the cell adhesion molecule cd44. 1039 a reduction in the number or absence of leydig cells is infrequent in infertility, and only occurs in hypogonadotropic hypogonadism secondary to lh deficit and in patients with biologically inactive lh.
leydig cell hyperplasia is very common, 1040 and has been observed in klinefelter's syndrome, cryptorchidism, male pseudohermaphroditism, minor androgen insensitivity, infertility secondary to leydig cell dysfunction, varicocele, after treatment with 5α-reductase inhibitors or non-steroidal anti-androgens, and in some elderly men. such hyperplasia may give rise to hypoechoic or hyperechoic images that may be misdiagnosed as tumor. 1041 there is a close relationship between testicular dysfunction and elevated mast cell numbers in the testis. an increase in interstitial and peritubular mast cells occasionally occurs in infertile patients. 1042, 1043 this increase is higher than that observed in inflammatory or neoplastic processes. 1044 daily administration of ketotifen, an antihistamine-like drug with a mast cell-stabilizing effect, significantly improves the spermiogram parameters in some patients. 1045 for effective therapy, it is important to know whether or not the azoospermia or oligozoospermia is the result of obstruction. 863, 1046 obstructive azoospermia and oligozoospermia. azoospermia caused by obstruction is usually easily diagnosed, but this determination is more difficult with oligozoospermia. obstruction of the ductal system should be suspected when there are more than 20 mature spermatids (sc + sd) per cross-sectioned tubule and fewer than 10 million spermatozoa in the spermiogram (fig. 12-118). 1047, 1048 obstructive azoospermia is implicated in 7.4-14.3% of cases of male infertility. obstruction is classified as proximal, distal, and mixed, according to the distance from the testis to the point of obstruction in the ductal system. proximal obstruction. obstruction is considered proximal when the lesion lies between the seminiferous tubules and the distal end of the ampulla of the vas deferens. epididymal obstruction, principally of the caput-corpus transition zone, accounts for 66% of cases.
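the biopsy-spermiogram rule for suspecting ductal obstruction stated above (more than 20 mature spermatids per cross-sectioned tubule together with fewer than 10 million spermatozoa in the spermiogram) can be written as a two-condition check. this is only a sketch of the stated thresholds; the function and parameter names are hypothetical, and the sperm count is taken in millions exactly as the text reports it, without assuming whether it is per milliliter or per ejaculate.

```python
def obstruction_suspected(
    mature_spermatids_per_tubule: float, sperm_count_millions: float
) -> bool:
    """Return True when both thresholds quoted in the text are met.

    mature_spermatids_per_tubule: mean number of mature (sc + sd) spermatids
        per cross-sectioned tubule in the biopsy.
    sperm_count_millions: spermatozoa in the spermiogram, in millions
        (the unit basis is not specified in the text).
    """
    # Biopsy shows preserved production of mature spermatids.
    preserved_production = mature_spermatids_per_tubule > 20
    # Spermiogram shows disproportionately few spermatozoa.
    low_output = sperm_count_millions < 10
    return preserved_production and low_output


print(obstruction_suspected(25, 3))
print(obstruction_suspected(25, 40))
```

the rule only flags a discordance between testicular production and ejaculated output; locating the obstruction still depends on the clinical and imaging findings described in the surrounding text.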
rarely, there is a defective connection between the rete testis and epididymal ductuli efferentes. because the seminal vesicles are normal, men with proximal obstruction have a normal volume of semen (the testicular contribution to semen is about 5% of the total volume). when obstruction is in the cauda of the epididymis, epididymal markers, including carnitine, glycerophosphorylcholine, and α-glucosidase, are low. 1049 the nearer the obstruction is to the caput of the epididymis, the higher the level of these markers.
distal obstruction
distal obstruction is located between the ampulla of the vas deferens and the junction of the ejaculatory ducts and urethra. these patients present with sacral, perineal, or scrotal pain on ejaculation. rectal examination often reveals enlarged seminal vesicles. the semen is low in volume and consists of watery fluid that fails to coagulate. seminal vesicle secretions are lacking. the concentration of prostatic secretions, such as acid phosphatase and citric acid, is increased owing to the lack of semen dilution. vasography may help in diagnosis, as higher segments fail to fill. 1050 transrectal ultrasonography is the most accurate imaging modality for the diagnosis of ejaculatory duct obstruction. needle aspiration of seminal vesicle fluid may show spermatozoa that have entered the seminal vesicles by reflux.
mixed obstruction
mixed obstruction refers to lack of patency of the vas deferens or the epididymis combined with alterations in the ejaculatory ducts or seminal vesicles (low ejaculate volume and absence of fructose). the most frequent cause is mucoviscidosis. one-third of patients with congenital bilateral absence of the vas deferens have agenesis or hypoplasia of the seminal vesicles. the cause of epididymal obstruction in patients with anomalies of the prostate-vesiculo-deferential junctions is difficult to determine.
etiology of obstructive azoospermia
obstructive azoospermia may be caused by congenital or acquired lesions.
congenital azoospermia
the most frequent anomalies associated with congenital azoospermia are testis-epididymis dissociation, epididymal malformation in cryptorchidism, bilateral absence of the vas deferens, congenital unilateral absence of the vas deferens associated with pathology of the contralateral testis or its sperm excretory ducts, seminal vesicle agenesis, and ejaculatory duct obstruction (table 12-9).
agenesis of all mesonephric duct derivatives. agenesis of all mesonephric duct derivatives is a rare disorder that gives rise to varied anatomical anomalies, depending on the stage of embryonic development at which the mesonephric duct derivatives disappear. if failure occurs before the fourth week, the ipsilateral kidney and ureter are absent, although the testis may be present, or there may be other renal anomalies. if failure occurs in the fourth week, and the ureteral bud is already formed, the ureter and kidney may develop normally. if failure occurs between the fourth and the 13th weeks, there is a variable constellation of anomalies that most frequently include normal development of the testis and globus major and hypoplasia of the other excretory duct segments, or agenesis of an excretory duct segment (epididymis, vas deferens, or seminal vesicle).
epididymal anomalies. the most frequent epididymal anomalies are absence of the epididymis, testis-epididymis dissociation, defective connection of the vas deferens and epididymis, epididymal cyst, and anatomical abnormalities of the epididymis. complete absence of the epididymis is frequent in monorchidism and anorchidism. the epididymis is replaced by a small mass of cellular connective tissue with abundant blood vessels at the blind end of the vas deferens. partial absence of the epididymis is more frequent than complete absence. absence of the corpus of the epididymis gives rise to a characteristic malformation called bilobated epididymis.
this varies from simple strangulation to complete separation of the caput and cauda. these anomalies are often associated with absence of the vas deferens. testis-epididymis dissociation is found in 1% of cases of obstructive azoospermia and is usually associated with cryptorchidism. defects in connection between the ductuli efferentes and the ductus epididymidis are rarely complete. in the incomplete form, some of the five to 30 ductuli efferentes in the epididymis are short and end blindly.
epididymal cysts usually arise from blind-ended ductuli efferentes and contain spermatozoa. these spermatoceles retain their epithelial lining, although it becomes atrophic (fig. 12-119). spermatozoa may be obtained from these cysts. some epididymal cysts arise from embryonic remnants, do not contain spermatozoa, and are lined by columnar or pseudostratified epithelium. wolffian cyst, unlike müllerian cyst, shows cd10 immunoreactivity at the apical border of the epithelial cells. 1051 a cyst lined by clear cells, with or without papillae, raises concern for von hippel-lindau disease. 1052 a large epididymal cyst requires removal and must be excised with great care to avoid damaging the ductuli efferentes and causing obstruction. epididymal cyst is present in about 5% of males, and the incidence is high (21%) in those exposed to diethylstilbestrol during gestation. the incidence of epididymal cyst in those with hepatorenal polycystosis is similar to that in the general population. 1053
anomalies in epididymal configuration, altering its shape and location, are frequent in men with cryptorchidism and uncommon with descended testes. the most common malformations are elongated epididymis, angulated epididymis, and free epididymis. elongated epididymis is found in approximately 68% of undescended testes. the length of the epididymis may be several times that of the testis, and, in abdominal or inguinal cryptorchidism, the epididymis extends several centimeters below the testis.
angulated epididymis is characterized by a long epididymis that has a sharp bend in the corpus with or without stenosis. with free epididymis, all or part of the epididymis is unattached to the testis. the most common variant is epididymis with free cauda.
vas deferens anomalies. the most frequent anomalies of the vas deferens are congenital absence, segmental aplasia, ectopia, duplication, diverticula, and crossed dystopia. 1054 congenital absence is defined as unilateral or bilateral absence of either the whole vas deferens or only a segment. azoospermia occurs with bilateral absence. the frequency of this malformation varies among populations. at autopsy, the prevalence is 0.5%, but the clinical incidence is 1-1.3% in infertile men 1055 and 10-25% in patients with obstructive azoospermia. unilateral complete absence is three times more frequent than bilateral absence, and absence of only a segment is even more frequent. the affected segment may be absent or reduced to a fibrous cord. absence of the vas deferens may be associated with other malformations of the sperm excretory ducts or urinary system. the most frequent malformations of the excretory ducts are absence of the ejaculatory ducts (33% of cases) and, less frequently, absence of the seminal vesicles. about 71% of patients with bilateral absence of the vas deferens have partial aplasia of the epididymis. the most frequent malformations of the urinary system are absence of the ipsilateral kidney and other renal anomalies. complete or partial absence of the vas deferens occurs frequently in patients with cystic fibrosis. persistent mesonephric duct consists of the ureter joined to the vas deferens, forming a single duct that opens in an ectopic orifice between the trigone and the verumontanum. this malformation may be associated with cystic transformation or absence of the seminal vesicle. the kidney may be normal or dysplastic.
anomalies of seminal vesicle and ejaculatory duct. the most frequent anomalies are agenesis of the seminal vesicles or ejaculatory ducts, cyst of the seminal vesicle, and ectopic opening of the ureter into the seminal vesicle. the last is the most common and often is associated with ipsilateral renal dysplasia.
acquired azoospermia
inflammation and trauma are the main causes of acquired azoospermia. epididymitis is a frequent cause; chlamydia trachomatis 1056, 1057 and escherichia coli are the most common infectious causes in developed countries. 1058 infections with neisseria gonorrhoeae and mycobacteria are also implicated, and non-specific epididymitis is important. 1059 apart from elective vasectomy, the most frequent traumatic causes of azoospermia are surgical accidents during herniorrhaphy in children, 1060 orchidopexy, varicocelectomy, hydrocelectomy, deferentography, 1061 and removal of epididymal cyst. obstructive azoospermia may also result from blockage of the ejaculatory ducts following transurethral resection, or as a result of chronic urethral catheterization.
lesions of the testis and epididymis may result from obstructed sperm excretory ducts, depending on the location, origin (congenital or acquired), and duration of the obstruction.
location of obstruction
obstruction at the level of the ampulla of the vas deferens, seminal vesicles, or ejaculatory ducts does not usually cause significant lesions in the testis or epididymis. more proximal obstruction at the level of the vas deferens, epididymis, or testis-epididymis junction usually causes severe lesions in both the sperm excretory ducts and the testicular parenchyma. obstruction of the vas deferens causes increased pressure within the ductus epididymidis. as a result, epididymal lumina dilate, the epithelium atrophies, and fluid containing few spermatozoa and some spermiophages accumulates in the lumen (fig. 12-120). the most dilated epididymal segment is the caput.
the ductuli efferentes often become cystically dilated and filled with spermatozoa and macrophages. as a result of reabsorption and lysosomal degradation of this protein-rich fluid, the epithelium accumulates lipofuscin granules or acquires apical eosinophilic granules (paneth cell-like change). 1062 rupture of the vas deferens gives rise to microgranulomas and ceroid granuloma (fig. 12-121). macrophages and lymphocytes are often present in the intertubular connective tissue. 1063 the most frequent testicular lesions in proximal obstruction involve the adluminal compartment, and are the result of the negative effect of hydrostatic pressure on the seminiferous tubular cell layers and, in particular, on the sertoli cell (figs 12-122 to 12-124).
etiology of obstruction
obstruction secondary to congenital absence of the vas deferens usually causes little testicular injury, mainly dilation of the seminiferous tubules and an increase in the number of mature (sc + sd) spermatids. 1064 lesions resulting from vasectomy are more important. increased intraluminal pressure in the epididymis 1065 may give rise to pain (late post-vasectomy syndrome). 1066 testicular lesions depend on the surgical technique used: they are slight if the proximal end of the vas deferens is not ligated or if a sperm granuloma forms at the site of vasectomy. the spermatogenic rhythm in the testis is slower than before vasectomy, and lesions characteristic of testicular obstruction develop, including thickening of the lamina propria and fibrosis of the interstitium. 1067, 1068 in testicular obstruction secondary to herniorrhaphy in infancy, testicular lesions are mild. testicular lesions may be important if the epididymis is damaged by hydrocelectomy, and consist mainly of primary spermatocyte sloughing. in addition to these lesions, hyalinized tubules may be observed when obstruction is caused by inflammation. in acquired obstruction the testicular lesions worsen with time.
obstruction in the caput of the epididymis leads to disappearance of all germ cells in the adluminal compartment of seminiferous tubules. the tubules become dilated and sertoli cells appear vacuolated. testicular alterations after vasectomy may not be related to the duration of the obstruction but rather to the initial injury, and may disappear with time as the intraluminal pressure decreases. 1069 however, if a significant amount of time has elapsed after vasectomy, the possibility of attaining a normal spermiogram with vasovasostomy is very low. vasal patency is restored in most cases of reanastomosis, but paternity rates are markedly lower (25-51%) 1069 than normal (85%). 1070
some azoospermic patients have testicular biopsy with minimal histologic abnormality or minor tubular dilation without detectable excretory duct obstruction. these findings are characteristic of two main conditions: young's syndrome, and alterations in spermatozoal transport.
young's syndrome
young's syndrome is defined by the following constellation of findings: azoospermia, sinusitis, bronchitis or bronchiectasis, and normal spermatozoal flagella. 1071 the incidence is probably higher than that recorded in the literature, and young's syndrome should be suspected in all patients with obstructive azoospermia without a history of epididymitis or scrotal trauma. these patients have a lesion at the junction of the caput and corpus of the epididymis that gives the epididymis a characteristic gross appearance. the caput of the epididymis is distended, the ductuli efferentes contain yellowish fluid and numerous spermatozoa, and the remaining epididymal segments are normal. the ductus epididymidis is blocked by thick fluid. 1072 young's syndrome should be distinguished from other causes of infertility also associated with chronic sinusitis and pulmonary infections, including ciliary dyskinesia and cystic fibrosis.
ciliary dyskinesia consists of morphological, biochemical, and functional alterations in cilia and flagella, and includes several diseases such as the immotile cilia syndrome, kartagener's syndrome, and miscellaneous syndromes characterized by imperfectly defined abnormalities of cilia and flagella. 1073 in young's syndrome, sinusitis and pulmonary infections develop in childhood and stabilize or improve in adolescence; in other conditions, the pulmonary damage increases with age and the cilia and flagella are ultrastructurally abnormal. 1074
alterations in spermatozoon transport
normally, spermatozoa detach from the sertoli cells and are transported through the intratesticular and extratesticular excretory ducts, where they are stored, mainly in the cauda of the epididymis, and finally released from the corpus by ejaculation or eliminated by phagocytosis. only about 50% of spermatozoa are ejaculated. whereas the release of spermatozoa from the corpus is intermittent, their transport through the sperm excretory ducts is continuous. transport is accomplished by the myofibroblasts in the wall of the seminiferous tubules and ductuli efferentes and the smooth muscle cells in the wall of the ductus epididymidis and vas deferens. these cells cause peristaltic contraction, propelling spermatozoa along the length of the epididymis in a mean of 12 days (range, 1-21 days).
the walls of the seminiferous tubules and extratesticular excretory ducts are under hormonal and neural control. the myofibroblasts in the seminiferous tubules have oxytocinic, α1- and β-adrenergic, and muscarinic receptors. unmyelinated nerve fibers penetrate the tubular lamina propria, pass among the myofibroblasts, and end near the sertoli cells. 1075 along their length these nerve fibers have varicosities containing sympathetic vesicles. the ductus epididymidis is innervated by sympathetic adrenergic nerve fibers that end among the smooth muscle cells.
several hormones, including oxytocin, endothelin-1, vasopressin, and prostaglandins, act on the musculature of the ductus epididymidis. the peristaltic contractions begin in the caput and propagate toward the cauda. the frequency and amplitude of contractions vary from region to region, being higher in frequency near the caput and of maximal amplitude in the initial portion of the cauda. the progressive increase in amplitude parallels the progressive increase in thickness of the muscular wall and the requirement for greater force to propel the fluid as it becomes progressively more viscous with a higher concentration of spermatozoa. the distal portion of the cauda is usually at rest because it is the main reservoir of spermatozoa between ejaculations. several times daily, vigorous contractions of the distal cauda impel the spermatozoa from the cauda toward the vas deferens. 1076 several drugs that favor contraction of the muscular wall (α1-blocking agents and f2α prostaglandins) have been successfully used in the treatment of alterations in spermatozoon transport. 1077
knowledge of the incidence of chromosomal abnormalities in male infertility has progressed in parallel with advances in technology: karyotypic studies in peripheral blood, meiotic and chromosomal studies of testicular biopsies, analysis of chromosomes in spermatozoa, and analysis of dna in blood and spermatozoa for the detection of y chromosome deletions. 1078 the incidence of chromosomal anomalies in infertile men is 2.2-6.6%, whereas in the general population it is lower than 0.5%. the frequency of chromosomal abnormalities increases as the number of spermatozoa in the ejaculate decreases. 1079
klinefelter's syndrome
genetic and clinical aspects
klinefelter's syndrome is characterized by an abnormal number of x chromosomes and primary gonadal insufficiency.
the original description was of a man with eunuchoidism, gynecomastia, small testes, mental retardation, and elevated level of serum gonadotropins. 1080 the frequency of this syndrome varies according to the population studied: 1 in 1000 to 1 in 1400 surviving newborns; 1 in 100 patients in mental institutions; 3.4 in 100 infertile men; and 11% of patients who are azoospermic. 1081 in 80% of cases, the karyotype is 47xxy. the remaining 20% have chromosomal mosaicism with at least two x chromosomes. the most common are xy/xxy, xy/xxxy, xx/xxy, xxy/xx/xy, xy/xo/xxy, xx/xxy/xxxy, and xxxy/xxxxy. the 47xxy lesion is due to non-disjunction in sex chromosome migration during the first or second meiotic division of the spermatocyte or oocyte, or during the first mitotic division of the zygote. 1082 study of the xg antigen in blood revealed that the extra x chromosome is from the mother in 73% of cases. advanced maternal age increases the incidence of children with the 47xxy karyotype.
in 47xxy patients, the most common clinical findings are: 1083
• eunuchoid phenotype with increased stature. the increased height is due to a disproportionate lengthening of the lower extremities. the ratio of span to height is less than 1.
• incomplete virilization. this is variable and ranges from normal development to absence of secondary sex characteristics.
• gynecomastia, usually bilateral, present in 50% of patients.
• mental retardation.
other commonly associated conditions include chronic bronchitis; varicose veins; cervical rib; kyphosis; scoliosis or pectus excavatum; and a high incidence of hypothalamic, hypophyseal, thyroid, and pancreatic dysfunction. 1084 the external genitalia usually are normally developed. the testes are usually less than 2.5 cm long, although in some cases of chromosomal mosaicism they are of normal size. 1085 the incidence of cryptorchidism is low in 47xxy patients but increased in mosaicism.
1086 supernumerary x-chromosome material is associated with a reduction of gray matter in the left temporal lobe, a finding correlated with verbal and language deficits. 1087
histologically, the testes show the classic picture of tubular dysgenesis with small hyalinized seminiferous tubules lacking elastic fibers and pseudoadenomatous clustering of leydig cells (figs 12-125, 12-126). 1080 most biopsies show some tubules with a few sertoli cells. 1088 these cells may be dysgenetic (pseudostratified distribution of nuclei that are dark and elongate and contain small peripherally placed nucleoli in tubules without apparent lumina). sex chromatin may only be observed in dysgenetic sertoli cells. 1089 this suggests that either there is testicular mosaicism of the x chromosome, or that both x chromosomes are heterochromatinized. in mosaicism, sertoli cell-only tubules may be more numerous than hyalinized ones.
the reduced testicular volume gives an appearance of leydig cell hyperplasia, 1090 although quantitative studies have shown that the total number of leydig cells is lower than normal. 1091 many of the leydig cells are pleomorphic and some are multivacuolated. immature fibroblast-like leydig cells may be present. the abnormally differentiated leydig cells have nuclei with coarse masses of dense chromatin, deep unfolding of the nuclear envelope, multiple paracrystalline inclusions instead of reinke's crystalloids, multilayered concentric cisternae of smooth endoplasmic reticulum, large masses of microfilaments, and scant lipid droplets. 1092 sex chromatin is apparent in 40-70% of leydig cells. leydig cell function is insufficient and androgen levels are less than 50% of normal. basal fsh and lh are markedly increased. 1084, 1093, 1094 in a few patients the testicular damage is less severe, with some tubules showing spermatogenesis and less prominence of leydig cells. 1095 exceptionally, complete spermatogenesis and even paternity have been reported.
1096 the xy/xxy karyotype is the most frequent variant of klinefelter's syndrome with chromosomal mosaicism. in this condition, the clinical abnormalities may be attenuated. gynecomastia is present in 33% of cases, compared to a frequency of 55% in men with the 47xxy karyotype. azoospermia is found in 50% of cases (93% in xxy men). the testes are larger and spermatogenesis is more developed than in men with xxy (fig. 12-127).
patients with the 47xxy karyotype who have spermatozoa in seminiferous tubules are bearers of 46xy spermatogonia and also of 47xxy spermatogonia, whereas those who have no spermatozoa have 47xxy spermatogonia only. the 47xxy spermatogonia may give rise to some spermatozoa with a 23x or 23y chromosomal complement, elevated numbers of both 24xy and 24xx spermatozoa, and a high frequency of spermatozoa with disomy 21; this could be an important risk for gonosomy 1097 and also for trisomy 21. 1098 genetic counseling is advisable in patients seeking intracytoplasmic sperm injection therapy. genetic diagnosis before implantation of the zygote or prenatal diagnosis has been recommended, except for parents who assume the risk of gonosomy.
the incidence of the 48xxyy karyotype is estimated to be 0.04 per 1000 live births. [1099] [1100] [1101] [1102] this karyotype may be associated with aggressive character, antisocial behavior, more severe mental retardation, and a higher frequency of congenital malformations than the 47xxy karyotype. men with the 48xxyy karyotype also have characteristic dermatoglyphics with an increase in arches, a decrease in total finger ridge count, and ulnar triradii associated with changes in the hypothenar region. 1103 concentric lamellae of smooth endoplasmic reticulum in leydig cells are a characteristic finding (fig. 12-128). 1104
men with the 48xxxy or 49xxxyy karyotype often have skeletal malformations, principally radioulnar synostosis, and cryptorchidism.
1105 in addition to the characteristic symptoms of 47xxy klinefelter's syndrome, 1106 men with the 49xxxxy karyotype have other abnormalities, including severe mental retardation, hypoplasia of external genitalia, cardiac malformations, radioulnar synostosis, microcephaly, and a high arched palate. 1107
association of klinefelter's syndrome with malignancy
patients with klinefelter's syndrome have a higher incidence of malignancy than the general population. the association was first discovered with breast carcinoma, 1108 which had an incidence 20 times greater than in the general male population, 1109 and is related to hormonal stimulation. 1110 although testicular germ cell tumor is rare in these patients, 1111 extragonadal germ cell tumor is 30-40 times more frequent than in the general population. most occur in the mediastinum (about 71%) and are less frequent in the pineal gland, central nervous system, and retroperitoneum. the most frequent types are teratoma and choriocarcinoma; embryonal carcinoma and seminoma are rare. [1112] [1113] [1114] the extragonadal origin of germ cell tumors has been attributed to abnormal germ cell migration from the yolk sac. the high incidence has been attributed to elevated hormone levels and chromosomal anomaly. 1115 in a patient with the xy/xxy chromosomal mosaic and bronchogenic carcinoma, cultured xxy fibroblasts transformed three times more frequently when exposed to sv40 virus than did fibroblasts from normal men. 1116 other tumors reported in patients with klinefelter's syndrome (lymphoma, leukemia, bronchogenic carcinoma, urothelial carcinoma of the bladder, adrenal carcinoma, prostatic adenocarcinoma, testicular leydig cell tumor, and epidermoid cyst) do not appear to have a higher incidence than in the general population.
[1117] [1118] [1119] [1120]
occurrence of klinefelter's syndrome in childhood
early identification of this syndrome is possible with systematic cytogenetic study of newborns with positive sex chromatin or mental retardation. 1121 several clinical symptoms suggest klinefelter's syndrome. initial symptoms include decreased muscle tone, delayed speech, and poor language skills with an increased incidence of reading difficulties and dyslexia. 1122 later, there may be recognition of mental retardation, 1123 psychiatric problems, excessive stature for age, disproportionately long legs, micropenis, and small testes. [1124] [1125] [1126] [1127] androgen deficiency is an early finding. 1128
testicular biopsy reveals scant or absent germ cells. quantitative studies indicate that the number of germ cells in 47xxy fetuses is significantly lower than in normal 46xy fetuses. the seminiferous tubules have reduced diameter, particularly those devoid of germ cells. the number of sertoli cells per cross-sectioned tubule is reduced. megatubules, ring-shaped tubules, and intratubular eosinophilic bodies are common (fig. 12-129). in some cases of klinefelter's syndrome associated with down's syndrome, tubular hyalinization is observed in childhood. 1129 the interstitium is wide and contains few leydig cell precursors. if one testis is undescended, its histology does not differ from that of the contralateral testis. the testicular pattern remains constant through childhood. 1130 at puberty, before maturation of the tunica propria occurs, the seminiferous tubules rapidly hyalinize and leydig cell precursors differentiate into leydig cells. 1131
association of klinefelter's syndrome with precocious puberty
although precocious puberty is not a characteristic finding in klinefelter's syndrome, karyotyping in older boys with mental retardation, gynecomastia, small testes, and precocious puberty is advisable.
in most cases, the cause of precocious puberty is an hcg-secreting germ cell tumor in the mediastinum. 1132 infrequently, precocious puberty is idiopathic, and only in isolated cases is there a hamartoma in the third ventricle. 1133
klinefelter's syndrome is often associated with pituitary disorders such as panhypopituitarism 1134 or incomplete hypopituitarism. 1135 deficits in fsh, 1136 lh, 1137 or both 1138, 1139 have been reported. the cause of this association is unknown, and diverse etiologies such as trauma, immunologic disorders, and genetic deficiencies have been postulated. alternatively, it may be due to exhaustion of pituitary gonadotropin-secreting cells after years of gonadotropin-releasing hormone stimulation. 1135 in patients deficient in both gonadotropins, testicular biopsy shows diffuse tubular hyalinization and a marked reduction in or absence of leydig cells. the histological picture is similar to that of hypogonadotropic hypogonadism occurring after puberty, except for the presence of isolated tubules containing only dysgenetic sertoli cells and absence of elastic fibers in the hyalinized tubular wall (fig. 12-130). 1139 the biopsy of patients with a deficit only in fsh is similar to that of the dysgenetic sertoli cell variant of the sertoli cell-only syndrome, although some hyalinized tubules are present. the testicular biopsy of patients deficient only in lh resembles that of men with classic 47xxy klinefelter's syndrome.
46xx males
the 46xx karyotype may be present in three phenotypes: male phenotype, including normal external genitalia; male pseudohermaphrodites, with a variable degree of ambiguity in external genitalia, ranging from hypospadias to micropenis; and true male hermaphrodites.
46xx males with male phenotype and normal external genitalia
men with the 46xx karyotype having male phenotype and normal external genitalia have clinical features similar to those of klinefelter's syndrome, including small testes, small or normal penis, azoospermia, gynecomastia, and minimal development of secondary sex characteristics. however, these men have harmonious body proportions, normal or slightly low stature, and normal intelligence. 1140 the incidence of 46xx males varies from 1:10 000 to 1:25 000 live births, accounting for about 0.2% of infertile men. 1141, 1142 males with 46xx karyotype have hypergonadotropic hypogonadism with elevated serum levels of fsh and, to a lesser degree, elevated lh, with normal or slightly decreased testosterone. familial cases have been reported. 1143
during childhood, biopsy of 46xx males reveals decreased numbers of germ cells. 1144, 1145 biopsies from adults show one of three patterns: histology similar to that of 47xxy men, including diffuse tubular hyalinization with prominent leydig cells; 1146 sertoli cell-only tubules; 1147, 1148 and both patterns intermingled with less prominent leydig cells. the last is the most frequent (fig. 12-131). ultrastructural studies reveal an increase in intermediate filaments, absence of annulate lamellae in sertoli cells, 1149 absence of reinke's crystalloids, and abundance of intracytoplasmic and intranuclear paracrystalline inclusions in leydig cells. 1147
46xx males with ambiguous external genitalia
some patients with the 46xx karyotype have ambiguous external genitalia or hypospadias and are assumed to have a variation of male pseudohermaphroditism. 1150 these males, together with true hermaphrodites, may be found in the same family, suggesting that both disorders are different manifestations of the same genetic defect.
the origin of 46xx males may be difficult to determine.
however, as testicular differentiation requires genes located on the y chromosome, 46xx males have been classified by cytogenetics as those having the sry gene, those lacking the sry gene, and those with xx/xy mosaicism. males with the sry gene comprise 80% of 46xx males. 1151 it is likely that this occurs when the genetic material from the short arm of the y chromosome is translocated to the x chromosome. 1152 during paternal meiosis, the homologous pseudoautosomal regions of chromosomes x and y interchange the terminal portions of their short arms, giving rise to an x chromosome with the sry gene but lacking the azoospermia factor. [1153] [1154] [1155] [1156] [1157] [1158] alternatively, the sry region may be inserted in an autosome. 1159 most 46xx patients who are sry positive have a normal male phenotype. about 10% of 46xx males are sry negative and most have ambiguous genitalia. some patients have a normal male phenotype 1161 and only infertility. 1162 although sry is assumed to be the most important regulatory factor of testicular determination, these patients may have mutation of one of the downstream non-y testis-determining genes. [1163] [1164] [1165] [1166] about 10% of 46xx males have xx/xy mosaicism or another karyotype containing a y chromosome. in these cases, detection of the specific dna sequences of the y chromosome may be difficult because this chromosome may be present only in some tissues and in a small number of cells. 1160
47xyy syndrome
the 47xyy syndrome was first described in 1961 in the father of a girl with down's syndrome. 1167 the only clinical findings were excessive height and pustular acne. study of other cases suggests that these men are predisposed to a psychopathic personality and antisocial behavior, although most have a normal personality and are socially adapted. the incidence of 47xyy patients is estimated to be 0.01% of the general population, 0.7-0.9% of men in prison, and 1.8% of sexual homicide criminals.
1168 the extra y chromosome originates from non-disjunction during the paternal second meiotic division. in the past decade, many cases have been diagnosed prenatally. from birth, the patients have weight, stature and cephalic circumference above mean values and a higher risk for delayed language and/or motor development. about 50% of children have psychological and psychiatric problems such as autism; although their intelligence is normal, many patients are referred to special education programs. 1169 as adults, they have normal external genitalia and secondary sex characteristics. fertility is reduced, 1170 although many have been fathers. usually, testicular biopsy reveals mixed atrophy characterized by tubules with spermatogenesis associated with sertoli cell-only tubules (fig. 12-132). 1171, 1172 those tubules with spermatogenesis may show normal spermatogenesis or have lesions in the adluminal or basal compartments. in these tubules, many xyy spermatocytes degenerate during meiosis. about 64% of pachytene cells have three sex chromosomes. 1173 the number of normal spermatozoa in the ejaculate is low. there is a high incidence of both yy and xy spermatozoa and disomy 18. the variability in germ cell development is apparently due to elimination of germ cells that could not pair their sex chromosomes during the first or second meiotic divisions 1174 or, later, during the round spermatid stage. 1175 spermatocytes that succeed in forming trivalent chromosomes are initially viable. 1176 the ultimate trivalent chromosome segregation yields aneuploid and euploid cells in equal numbers. sertoli cell-only tubules are attributed to either spermatogonial damage by substances released from degenerated spermatocytes 1177 or absence of testicular colonization by primordial germ cells. these men have normal serum levels of testosterone and lh. the latter may be slightly increased in 47xyy men with severe spermatogenic alterations.
1178 47xyy men with mosaicism (47xyy/46xy) have a higher risk of fathering children with hyperdiploid chromosomal constitution, and spermatozoa should be studied genetically to evaluate the risk of intracytoplasmic sperm injection. 1179 men with three and four y chromosomes have been reported. men with the 48xyyy karyotype are tall and have normal male phenotype, slight mental retardation, azoospermia and, during childhood, frequent infections of the upper respiratory tract. 1180 testicular biopsy shows sertoli cell-only tubules, severe hyalinization of tubular basement membrane, and diffuse leydig cell hyperplasia. the chromosomal complement of parents can be normal. 1181 men with 49xyyyy also have no significant phenotypic abnormalities (except for cases of chromosomal mosaicism). slight mental retardation, infertility, and antisocial behavior are the most significant clinical findings. 1182 rarely patients have facial dysmorphism and various skeletal abnormalities. 1183 the y chromosome is essential for gender determination and spermatogenesis, and abnormalities often lead to infertility. the relationship between y chromosome abnormalities and infertility is best understood in azoospermic men with alterations in yq11, the distal region of the euchromatic part of the long y arm, the location of a male fertility gene complex called azoospermia factor. infertility may result from deletion of any of four subregions in which the azoospermia factor has been divided (azfa, azfb, azfc and azfd). 1184, 1185 the best-known y chromosome genes involved in spermatogenesis are rbm, daz, dffry, cdy, smcy, and zfy. six different partial deletions of this region have been found in azoospermic patients (table 12

monocentric deleted yq chromosome

partial deletion of the distal portion of the yq11 euchromatic region is associated with azoospermia owing to loss of the azoospermia factor.
these men have normal external genitalia except for small testes, 1187 normal testosterone and lh serum levels, and increased fsh serum level. the most frequent histological finding is sertoli cell-only pattern, although many other patterns have been reported. 1188 the number of leydig cells is normal or increased. these findings suggest that the azoospermia factor is required for early spermatogenesis. 1189 if the breakpoint of yq11 is proximal to the centromere, patients are short because the gene that controls stature is close to that for the azoospermia factor. 1190

dicentric yq isochromosomes

sterility is frequent in men with dicentric yq isochromosomes. 1191 this anomaly is usually associated with a 45x cell line. the proportion of this line varies between patients and between cell types (fibroblasts or lymphocytes). when the point of breakage and fusion of the two y chromosomes is in the distal region yq11, and the second centromere is inactivated, the y isochromosome is normal in size but does not stain with quinacrine, and thus is called non-fluorescent y chromosome (ynf). as the breakpoint is in the yq11 region, the azoospermia factor function is altered. development of external genitalia varies from ambiguous to normal, and is probably related to the extent of xo present. 1192 testicular biopsies are similar to those of men with monocentric deleted yq chromosomes (fig. 12-133). 1193, 1194

ring y chromosome

men with ring y chromosomes have normal male phenotype, azoospermia, and, in some cases, short stature. most have mosaic karyotype with a 45x line. in some cases, testicular biopsy resembles that of men with monocentric deleted yq chromosome, but in others there is premeiotic arrest of spermatocyte maturation. 1195 this is attributed to difficulty in pairing the x and y chromosomes during meiosis. many patients have deletion of some azf regions.
1196, 1197

y/y translocation chromosome

patients with this anomaly have small soft testes and primary spermatocyte maturation arrest owing to defective pairing of the x and y chromosomes. the karyotype may be mosaic with a 45x line. 1198

translocation of y chromosome to x chromosome

most frequently this translocation is cytogenetically undetectable, and patients present with infertility and are found to have 46xx karyotype. 1199 the phenotype is similar to that of men with klinefelter's syndrome except for shorter stature, absence of mental retardation, and smaller teeth. testicular biopsy shows sertoli cell-only pattern. men with cytogenetically detectable translocations have short stature, small testes, tubular hyalinization, and prominent clustered leydig cells similar to klinefelter's syndrome.

autosomal translocation of y chromosome

translocation of the distal heterochromatic portion of the y chromosome to the short arm of an acrocentric chromosome occurs occasionally. the most frequent are translocations to chromosomes 5, 18, 13, 15, and 22. the fertility of these men depends on the point of breakage. 1200, 1201 if this occurs in the yq12 heterochromatic region, the patient has a male phenotype and is fertile. if the point of breakage is in the yq11 region, the patient is infertile and has small testes. seminiferous tubules may show only sertoli cells, spermatogenetic arrest in early stages of meiosis, or an infantile pattern. 1202, 1203

interstitial microdeletion in yq11

yq11 microdeletion is the most frequent congenital cause of infertility. the frequency of y chromosome microdeletion in infertile patients varies widely (1-35%). 1204 in azoospermic men, the frequency is between 18% 1205 and 37%. 1206 in oligozoospermic males the incidence drops. most microdeletions are in the azfc subregion. 1207, 1208 testicular biopsy shows only sertoli cells, maturation arrest, or mixed atrophy.
there is no exact correlation between the site of azf subregion alteration and the histological pattern, 1209 but most microdeletions in azfa are associated with azoospermia, most microdeletions in azfb are associated with maturation arrest, and most microdeletions of azfc are associated with spermatid maturation arrest or mixed testicular atrophy. partial deletion of azfc has a mild effect on fertility. 1210 external genitalia in 46xy patients with duplication of distal xp vary from male, ambiguous, to female, and gonadal dysgenesis is frequent. if the patient has male genitalia, these are usually hypoplastic with hypogonadotropic hypogonadism and, frequently, multiple congenital anomalies and mental retardation. 1211 males with translocation of the x chromosome to an autosome may have disturbed spermatogenesis with subfertility or infertility. 1212, 1213 47xxx males show mental retardation, gynecomastia, normal stature, hypoplastic scrotum, a well-configured but small penis, small testes, and poorly developed pubic hair. serum testosterone levels are very low. seminiferous tubules appear severely hyalinized. 47xxx males result from an abnormal x-y interchange during paternal meiosis and x-x non-disjunction during maternal meiosis. 1214 there have been many reports on the relationship between autosomal anomalies and infertility, although the causes are not fully understood because the same anomaly is associated with infertility in some patients but not in others. robertsonian translocations are found in 0.7% of infertile men (8.5 times higher than in the normal population) and are more frequent in oligozoospermic than in azoospermic men. the most frequent translocations are 13;14 and 14;21. the incidence of reciprocal translocations in infertile patients is 0.5% (0.1% in the general population) and increases to 0.8% in patients with azoospermia or severe oligozoospermia.
1215 the most frequent in infertile men are 11;22 and 17;21. paracentric and pericentric inversions (except for the pericentric inversion of the heterochromatic region in chromosome 9) are eight times greater in infertile patients (0.16%) than in the general population. the highest risk for infertility occurs in the pericentric inversion of chromosome 1. 1216, 1217 the most common testicular lesions in men with autosomal anomalies are spermatogonial maturation arrest, primary spermatocyte sloughing sometimes associated with hypospermatogenesis, and sertoli cell-only pattern. 1218 the only autosomal anomaly with prolonged survival is down's syndrome. in addition to trisomy of chromosome 21 and the characteristic appearance, patients with down's syndrome usually have cryptorchidism, small testes, hypoplasia of the penis and scrotum, and hypospadias. 1219 adults have oligozoospermia or azoospermia secondary to primary testicular deficiency. levels of fsh and lh are elevated, but testosterone is normal or slightly diminished. 1220 isolated cases of paternity have been reported. 1221, 1222 in utero, there is marked delay in germ cell development. 1223 histologic studies of prepubertal testes at autopsy reveal decreased tubular diameter and tubular fertility index. eosinophilic bodies or microliths may be present in some tubules (fig. 12-134). adult testes have deficient spermatogenesis and mixed atrophy, with some tubules showing complete spermatogenesis and others containing sertoli cell-only pattern. 1224

fig. 12-134 prepubertal testis in down's syndrome. there are megatubules, ring-shaped tubules, and small tubules. germ cell number is very low in all these tubules. eosinophilic bodies or microliths are present in some tubules.

hypergonadotropic hypogonadism is found in several myopathies (myotonic dystrophy and progressive muscular dystrophy) and dermopathies (bloom's, rothmund-thomson, werner's, cockayne's, and tay's syndromes), with testicular histology that resembles that of klinefelter's syndrome. hypogonadism is also observed in noonan's syndrome, cerebellar ataxia (with milder testicular lesions), and a miscellaneous group of syndromes with variable histological findings. 1225 myotonic dystrophy accounts for approximately 30% of men with muscular disorders, and about 80% have testicular atrophy. the estimated incidence is 1 in 8000 live births. the abnormality involves the distal muscles of the extremities. in addition, patients may have premature baldness, posterior subcapsular cataracts, cardiac conduction defects, impotence, gynecomastia (rarely), and dementia (at later stages). myotonic dystrophy is an autosomal dominant inherited disease with variable penetrance. two loci are associated with the disease phenotype: dm1 in 19q13.3, and dm2 in chromosome 3. mutation in dm1 results in a serine/threonine protein kinase deficiency that causes expansion of a ctg repeat (from 50 to several hundred repeats) located on the 3′-untranslated region of the dystrophia myotonica-protein kinase gene. the number of repeats is positively correlated to severity of the disease and negatively correlated to age of clinical onset. [1226] [1227] [1228] dm2 is caused by a mutation in 3q21.3 of the znf9 gene and accounts for cctg-repeat expansion (from 75 to 11 000 repeats) in intron 1 of this gene. the common clinical symptoms are due to gain of function of rna mechanism in cug and ccug repeats altering cellular function, including alternative splicing of various genes. 1229 the severity of the disease increases in the successive generations. 1230 the number of ctg repeats is not associated with male subfertility. 1231 hypogonadism is hypergonadotropic in most cases and is not related to the number of ctg repeats. 1232 testicular lesions probably begin late because 65% of patients are fathers.
testicular biopsy shows different degrees of severity, ranging from nearly normal to fully hyalinized seminiferous tubules, with the number of leydig cells varying from increased to decreased. in some patients the hypogonadism is hypogonadotropic, and the testes show an infantile pattern. infertility may be the first symptom of myotonic dystrophy. 1233 progressive muscular dystrophy is a multisystemic x-linked disease. it is usually associated with gonadal atrophy caused by a defective locus in chromosome 19. patients rarely live more than 20 years. the incidence is approximately 1 in 4000 live births. in both duchenne and becker forms the cause is a defect in the dystrophin gene. 1234, 1235 bloom's, rothmund-thomson, and werner's syndromes are caused by a homozygous defect in human recq helicases in chromosome 15. of the five members of this gene family (recq1, blm, wrn, recq4, and recq5), three produce autosomal recessive inherited diseases. mutations of blm have been identified in patients with bloom's syndrome, wrn has been shown to be mutated in werner's syndrome, and mutations of recq4 have been associated with rothmund-thomson syndrome. 1236, 1237 despite the close genetic origin of the three syndromes, symptoms are very different. bloom's syndrome is characterized by short stature, narrow face with prominent nose, facial 'patchy' skin color changes that become more marked with sunlight exposure, and increased susceptibility to respiratory diseases, cancer and leukemia. severe oligozoospermia and azoospermia are common. leydig cell function is conserved. 1238 rothmund-thomson syndrome presents with poikiloderma, juvenile cataracts, sparse hair, short stature, skeletal defects, dystrophic teeth and nails, and hypogonadism. these patients are predisposed to cancer and osteogenic sarcoma.
1239 werner's syndrome (progeria) is characterized by short stature, prematurely graying hair, baldness, cataracts, atrophy and calcification of muscle and fat, wrinkling of the skin, keratosis, osteoporosis, telangiectasis, atheroma, diabetes mellitus, gynecomastia, and hypergonadotropic hypogonadism. the lifespan of fibroblasts and other cells is shortened in this syndrome. the mutation is in the recq3 helicase gene. cockayne's syndrome is a rare autosomal recessive neurodegenerative disorder. signs and symptoms include infantile failure to thrive, short stature, poorly developed trunk, premature aging, neurological alterations, retinitis pigmentosa, optic atrophy, cataract, deafness, microcephaly, micrognathia, photosensitivity, delayed eruption of primary teeth, congenital absence of some permanent teeth, partial macrodontia, atrophy of the alveolar process and caries, limited articular movements in elbows, knees, and fingers, 1240 abnormally small eccrine glands, 1241 and hypergonadotropic hypogonadism. it may be caused by mutations in either of two genes, ckn1 (ercc8) and ercc6, located respectively on chromosomes 5 and 10, giving rise to two variants of cockayne's syndrome: cs-a, secondary to an ercc8 mutation, and cs-b, with ercc6 mutation. cs-b patients have hypersensitivity to ultraviolet light secondary to a dna repair defect. 1242 tay's syndrome (trichothiodystrophy) has two presentations: ibids (ichthyosis, brittle hair, impaired intelligence, decreased fertility, short stature) and pibids (photosensitivity, ichthyosis, brittle hair, impaired intelligence, decreased fertility, short stature). in both forms, patients have decreased fertility. one case of hypergonadotropic hypogonadism has been reported. 1243 noonan's syndrome is characterized by multiple malformations reminiscent of turner's syndrome, including short stature, pterygium colli, and cubitus valgus, although there is normal male karyotype.
the disease has an incidence of 1 in 1000 to 1 in 2500 live births and autosomal dominant inheritance, with sporadic occurrence in about 50% of cases. a locus for dominant forms has been mapped to 12q24.1. 1244 mutation in ptpn11 (protein-tyrosine phosphatase, nonreceptor-type 11) accounts for half of cases, although similar germline mutations also cause leopard's syndrome and certain pediatric hematopoietic malignancies. 1245 cryptorchidism is present in about 70% of cases and is usually bilateral. during childhood, testicular biopsy shows a low tubular fertility index. puberty is often delayed, and, at adulthood, hypogonadotropic or hypergonadotropic hypogonadism occurs. ultrastructural studies reveal morphologic anomalies in germ cells. 1246 although spermatogenesis is generally impaired, some patients have been fertile (fig. 12-135). cerebellar atrophy may be associated with hypogonadism. patients are infertile and have moderate ataxia without endocrine disorder. infertility is due to morphological abnormalities of spermatozoa caused by decreased expression of map2 (the most important microtubule-associated protein), and a defect in erythroid ankyrin. 1247, 1248 many other syndromes also present with primary hypogonadism. the best known are alström's, weinstein's, börjeson-forssman-lehmann, marinesco-sjögren, richards-rundle, robinow's, and silver-russell syndromes. hypogonadotropic hypogonadism or hypogonadism of hypothalamo-hypophyseal origin is classified according to whether the hypothalamo-hypophyseal failure occurs before or after puberty. eunuchoidism, present only in the former group, is the basis of the distinction. the most frequent types of hypogonadism caused by hypothalamo-hypophyseal failure are those caused by a deficit of gonadotropin-releasing hormone, bioinactive fsh and lh, deficit in growth hormone, those associated with prader-willi syndrome, and laurence-moon-rozabal-bardet-biedl syndrome.
the onset and maintenance of the hypothalamo-hypophyseal-gonadal axis is due to pulsatile gonadotropin-releasing hormone (gnrh) secretion by neurons of the arcuate nucleus of the hypothalamus, with release into the pituitary portal system and subsequent stimulation of gonadotropin-releasing hormone receptors on the surface of gonadotropin-secreting cells. the gnrh gene is located on 4q13. 1249 patients with gnrh deficit have partial or complete absence of gnrh-induced pulsatile lh secretion, and normalization of pituitary and gonadal secretions after exogenous gnrh administration. imaging studies of the hypothalamo-hypophyseal region are normal. clinical symptoms vary with age at presentation (congenital or acquired) and severity (complete or partial deficit). clinical presentations include delayed puberty, idiopathic hypogonadotropic hypogonadism (isolated gonadotropin deficit), kallmann's syndrome, isolated fsh deficit, and isolated lh deficit (fertile eunuch syndrome). constitutional delayed puberty is assumed to be a minor form of gnrh deficit, 1250 and is characterized by delayed sexual maturation in otherwise healthy males. patients are short and usually have a family history of delayed puberty. puberty usually begins at 13-14 years of age and progresses over 2 years. if a 14-year-old boy has not begun pubertal changes (testicular enlargement, growth in height, and development of secondary sex characteristics), delayed puberty should be suspected. 1251 simple pubertal delay that is overcome naturally in a short time without treatment must be distinguished from hypogonadotropic hypogonadism. the latter should be suspected when any of the following symptoms are present in the patient or his family: a midline defect, anosmia, or pubic hair without testicular development. hormone assays may also assist in diagnosis. if a patient between 16 and 18 years old has prepubertal gonadotropin levels, he probably has hypogonadotropic hypogonadism.
a variant of hypogonadotropic hypogonadism, isolated gonadotropin deficit is characterized by defects in the synthesis or release of fsh and lh; other hypophyseal functions are normal. patients have eunuchoid phenotype, with small testes and penis, scanty body hair and beard, a high-pitched voice, and poorly developed muscles. presentation may be sporadic, autosomal dominant, autosomal recessive, or x-linked. the cause might be a mutation in the gnrh receptor gene. 1252 patients have very low levels of fsh, lh, testosterone, and estrogen. clomiphene citrate treatment fails to stimulate hormonal secretion. 1253 pulsatile administration of gnrh is useful to promote both androgen production and spermatogenesis. the lh-leydig cell-testosterone axis is normal in most cases, but normalization of the fsh-sertoli cell-inhibin axis is not achieved in all cases. basal inhibin levels higher than 60 pg/ml and absence of cryptorchidism are favorable predictive factors for the acquisition of normal testicular size and acceptable spermatogenesis. 1254 testicular biopsy reveals an immature pattern. the seminiferous tubules have neither lumina nor elastic fibers (fig. 12-136). sertoli cells are immature, and no differentiated leydig cells are seen. spermatogonia are rare. in some patients the pattern is similar to that of sertoli cell-only testes with immature sertoli cells. 1255 hypogonadism associated with anosmia is also known as maestre de san juan 1256 or kallmann's 1257 syndrome. associated abnormalities include olfactory bulb agenesis, cryptorchidism, mental retardation, color blindness, facial asymmetry, nerve deafness, epilepsy, shortening of the fourth metacarpal, tarsal navicular fibrous dysplasia, familial cerebellar ataxia, diabetes mellitus, hyperlipidemia, gynecomastia, cleft lip, maxilla, or palate, unilateral renal aplasia, and cardiovascular abnormalities. the syndrome may be x-linked or autosomal.
the gene for the x-linked form is mapped to xp22.3 and may have different mutations (termed kal-x, kalig-1, and admlx), complete deletion, and point mutations. this gene encodes the protein anosmin-1, which is similar to other nerve cell adhesion molecules and is involved in axonal growth and development. kal protein, secreted by mitral cells, permits the passage of olfactory neurons into the olfactory bulbs and is lacking in kallmann's syndrome. this failure also inhibits migration of neuroblasts from the olfactory epithelium to the hypothalamus to form gnrh-secreting neurons. 1259 the autosomal dominant presentation (occurring in 10% of cases) is due to loss of function of fibroblast growth factor receptor 1 (fgfr1). 1260 interaction between kal1 and fgfr1 is required for neuronal migration. 1261 patients are classified into two groups according to the partial or complete absence of gnrh. partial absence of gnrh is diagnosed by the presence of spontaneous pulses of lh, fsh, and testosterone during a 24-hour period. complete absence is diagnosed by the absence of spontaneous pulses of lh, fsh, and testosterone during a 24-hour period. these patients show an increase in fsh only after gnrh administration. 1262 testes are histologically infantile; the tubules have a small diameter, lack lumina, and contain immature sertoli cells and isolated spermatogonia. 1263 the interstitium is wide and consists of acellular connective tissue with no recognizable leydig cell precursors (fig. 12-137). 1264 autopsy studies in patients with anosmia and hypogonadism reveal agenesis of the olfactory bulbs that may be partial or complete and unilateral or bilateral, together with an apparently normal hypophysis and normal or hypoplastic hypothalamus.
this syndrome is the least severe form of holoprosencephaly-hypopituitarism complex, a spectrum of developmental anomalies associated with impaired midline cleavage of the embryonic forebrain, aplasia of the olfactory bulbs and tracts, and midline dysplasia of the face. testicular seminoma has been reported in a patient with anosmia with hypogonadotropic hypogonadism. 1265 isolated fsh deficiency is a rare syndrome characterized by azoospermia or oligozoospermia in normally virilized patients with normal sexual potency. serum levels of lh and testosterone are normal, but fsh levels are very low or undetectable. the clomiphene stimulation test gives variable results. the gnrh test induces a normal response only of lh. mutations in the fsh-β gene are exceptional. 1266, 1267 testicular biopsy shows maturation arrest at the spermatocyte level, hypospermatogenesis, or partial sertoli cell-only pattern. 1268 gonadotropin treatment increases spermatozoal numbers in most cases, and fertility may be induced. isolated lh deficiency, also known as pasqualini's or fertile eunuch syndrome, 1269, 1270 is characterized by hypogonadism secondary to lh deficit with preservation of spermatogenesis. patients have eunuchoid habitus, small testes, decreased libido, female distribution of pubic hair, and a high-pitched voice. other frequent findings include gynecomastia, anosmia, ocular lesions, and pituitary tumor. 1271 fsh level is normal, but lh and testosterone levels are very low. mutations in the lh-β subunit gene 1272 and the gnrh receptor have been reported. 1273 the clomiphene test is usually negative, and gnrh stimulation increases lh and, to a lesser degree, fsh. testicular biopsy shows seminiferous tubules with normal or slightly decreased diameters and complete spermatogenesis; however, the number of all germ cell types is below normal. leydig cells are rare or absent (fig. 12-138).
maintenance of spermatogenesis in the absence of leydig cells and serum testosterone can only be explained by assuming that testosterone is secreted in amounts sufficient for spermatogenesis but too low to be detected in the blood. in addition to adequate hypothalamic function, spermatogenesis requires that fsh and lh are biologically active. lh is a heterodimer, composed of two subunits: α (common to fsh and lh) and β (specific for lh). the genes for the β subunit are on 19q13.32. if both alleles are mutated for this subunit, the lh produced is biologically inactive although it may be detectable in standard hormone assay. homozygous patients have elevated serum level of lh and low testosterone levels, lack of puberty, and infantile testes. heterozygous patients are only infertile. 1274 patients with mutation in the β subunit of the fsh gene are oligozoospermic or azoospermic. 1275 activating and inactivating mutations of gonadotropin receptor genes have been reported. activating mutation of the lh/human chorionic gonadotropin receptor gene causes familial precocious puberty (see discussion on familial testotoxicosis, below). inactivating mutation of this gene causes male pseudohermaphroditism (see discussion on leydig cell hypoplasia in this chapter). inactivating mutation of the fsh receptor gene produces only mild spermatogenetic lesions, emphasizing the relative value assumed for fsh in spermatogenesis. activating mutation of this gene gives rise to spermatogenesis even in the absence of pituitary function. patients with isolated growth hormone deficit and those with resistance to growth hormone action may have delayed puberty and hypogonadotropic hypogonadism. 1276 some patients with spermatogenetic maturation arrest or idiopathic oligozoospermia have a relative deficit of growth hormone. this hormone probably acts on the testis by stimulating local secretion of insulin-like growth factor-1, which cooperates with testosterone.
prader-willi syndrome is characterized by hypogonadism, obesity, muscular hypotonia, mental and physical retardation, and acromicria. 1277 other frequent findings include strabismus and non-insulin-dependent diabetes mellitus. the incidence is estimated at between 1 in 12 000 and 1 in 15 000 newborns in 25 000 live births, and is higher in males. patients have low serum levels of lh, testosterone, estradiol, and inhibin b, and high levels of fsh. these hormonal findings suggest the occurrence of a mixed form of central (low lh) and peripheral (low inhibin b and high fsh) hypogonadism. 1278 the penis and testes are hypoplastic, and cryptorchidism is present in about 70% of cases (bilateral in 45% of cases) (fig. 12-139). 1279 during infancy and childhood, the testes have reduced tubular diameters; adults have an infantile pattern. 1280 this syndrome is caused by an anomaly of chromosome 15, usually in the 15q11-13 band. other chromosomal anomalies include robertsonian translocations, reciprocal translocations, small supernumerary metacentric chromosomes, and partial deletion of the long arm of chromosome 15. laurence-moon-rozabal-bardet-biedl syndrome is a pleiotropic disorder characterized by obesity, infantilism, short stature, diabetes insipidus, mental retardation, retinitis pigmentosa, polydactyly, and syndactyly. it is more frequent in males than in females. men with this syndrome are infertile, and about 74% show hypogonadism. the testes are prepubertal, the scrotum is hypoplastic or bifid, and the penis is small. cryptorchidism is found in 42% of males, and is bilateral in 28%. at least 11 genes responsible for this syndrome have been cloned, and it is probable that additional genes are involved. the function of the products of these genes is to mediate and regulate microtubule-based transport processes.
1281, 1282

hypogonadotropic hypogonadism associated with dermatologic diseases

several dermatopathies are associated with hypogonadotropic hypogonadism, including ichthyosis and johnson's neuroectodermic syndrome. most cases of ichthyosis associated with hypogonadism are x-linked. about 15% of these patients have cryptorchidism, small testes, micropenis, and high risk of testicular cancer. the cause is a defective microsomal enzyme, steroid sulfatase, causing the accumulation of cholesterol sulfate that hinders sloughing of the cornified layer of the epidermis. the gene responsible for this enzyme is mapped to xp22.3. some of these patients also have anosmia or hyposmia owing to involvement of the neighboring genes, causing a contiguous gene defect. 1283 johnson-mcmillin neuroectodermic syndrome is a rare autosomal dominant disorder characterized by alopecia, hypogonadotropic hypogonadism, anosmia or hyposmia, deafness, prominent ears, microtia and/or atresia of the external auditory meatus, and a pronounced tendency to dental caries. 1284 hypogonadism associated with ataxia is rare. most patients are the offspring of a consanguineous marriage. inheritance is autosomal recessive. the most frequent syndromes are louis-bar's syndrome (ataxia-telangiectasia) and friedreich's ataxia. ataxia-telangiectasia is the most common inherited ataxia and is characterized by cerebellar ataxia that starts in infancy and develops progressively; mucocutaneous telangiectasis; anomalies of the immune system that cause pulmonary infection; hypersensitivity to ionizing radiation owing to impairment of dna repair; and a high risk of lymphoid neoplasia. the gene responsible is on 11q22-q23.1. 1285 this ataxia results from inactivation of the a-t mutated (atm) kinase, a critical protein kinase that regulates the response to dna double-strand breaks by selective phosphorylation of a variety of substrates.
1286 friedreich's ataxia is a neurodegenerative disorder characterized by degeneration of dorsal root ganglia and spinocerebellar tracts. hypertrophic myocardiopathy is also observed in many of these patients. the incidence is estimated at 1 in 40 000 children. it is caused by defects in the gene encoding frataxin, a protein required for vesicular traffic in the cell and synaptic transmission. 1287 about 95% of patients are homozygous for an unstable trinucleotide (gaa) expansion in intron 18 of stm7 on 9q13. the normal gene has up to 35 or 40 triplet repeats, whereas patients with this ataxia carry 70 to more than 1000 gaa triplets. 1288 the normal gene has seven to 22 gaa repeats, whereas the mutated gene has over 120 repeats. the extent of the expanded allele is directly proportional to the severity of disease, early onset of disease, and development of cardiac abnormalities. other ataxias associated with hypogonadism are kearns-sayre, boucher-neuhauser, and gordon-holmes syndromes. hypogonadotropic hypogonadism may also be present in carpenter's, biemond's, fraser's, and moebius' syndromes, and in patients with mental retardation. maintenance of spermatogenesis requires the harmonious cooperation of several endocrine glands and proper functioning of other tissues. symptomatic endocrinopathy is present in only 1.7% of infertile men, but over 9% of infertile patients have abnormalities in their endocrine studies. 1289 hypogonadism may be present in disorders involving the hypothalamus-hypophysis, thyroid, adrenals, pancreas, liver, kidney, and gastrointestinal tract, and may be associated with aids, chronic anemia, obesity, lysosomal and peroxisomal diseases, and neoplasia. hypogonadotropic hypogonadism can also be found in some (usually women) who perform rigorous sports (long-distance runners, swimmers, dancers, and rhythmic gymnasts). 
1290 hypopituitarism hypogonadism may result from destruction of the hypothalamus or hypophysis by primary or secondary hypothalamic tumor; granulomatous disease (fig. 12-140); fracture of the cranial base; radiotherapy for malignancy of the nasopharynx, central nervous system, or the eye orbit; pituitary adenoma and cyst; aneurysm of the internal carotid artery; and chronic and nutritional disease. many of these processes cause panhypopituitarism with varied symptoms. 1291 clinical manifestations of hypogonadism in patients with pituitary lesions vary according to time of onset (childhood or after puberty). in prepubertal hypopituitarism the testes retain an infantile appearance into adulthood, and there is rarely proliferation of spermatogonia and development of primary spermatocytes. biopsy shows variable hyalinization of tubules. in postpubertal hypopituitarism the appearance ranges from complete spermatogenesis to tubular hyalinization (fig. 12-141). the presence of elastic fibers in tubular walls indicates that pubertal maturation occurred before the development of hypopituitarism. leydig cells have pyknotic nuclei and retracted cytoplasm with abundant lipofuscin. in some patients, recovery of spermatogenesis occurs after administration of human chorionic gonadotropin. 1292 there are cases in which pituitary adenoma secretes both fsh and lh, inducing testosterone hypersecretion and an elevated sperm count. 1293 fsh-secreting pituitary adenoma associated with large testes and increased serum inhibin concentration has been reported. 1294 hyperprolactinemia prolactin inhibits gnrh secretion and hence fsh and lh secretion. in addition, prolactin has a direct inhibitory effect on androgens in target tissues. in men, hyperprolactinemia causes impairment of spermatogenesis, impotence, loss of libido, and depressed serum testosterone. 1295 some patients seek treatment because of oligozoospermia and infertility. 
hyperprolactinemia is also associated with dysfunction of prolactin receptors. 1296 spermiograms usually show oligozoospermia and an elevated level of fructose, 1297 although not all males with hyperprolactinemia have subnormal testicular function. 1298 testicular biopsy reveals variable testicular atrophy. the most frequent lesion is in the tubular adluminal compartment, with degenerative changes in the apical cytoplasm of sertoli cells, sloughing of young spermatids, 1297 and increased lipid droplets in leydig cells. 1299 in boys, two different conditions associated with abnormal prolactin secretion have been reported: hyperprolactinemia, testicular enlargement, and primary hypothyroidism; and prolactin deficiency, obesity, and enlarged testes. infertility caused by thyroid gland malfunction is rare but reversible. it accounts for about 0.5% of male infertility. testicular function is impaired more by hypo- than by hyperthyroidism. patients with hyperthyroidism may have gynecomastia, impotence, and infertility. serum levels of fsh and lh are normal or increased, with elevated sex hormone-binding globulin, increased testosterone concentration, reduced non-sex hormone-binding globulin-bound testosterone, and little or no change in free testosterone. 1300, 1301 in graves' disease there is a pronounced inhibition of gonadal steroidogenesis. 1302 in patients with hyperthyroidism, spermatozoa may be normal or reduced in number, and in both cases progressive motility is low. prepubertal hypothyroidism may impair testicular function by causing precocious or delayed puberty. in delayed puberty, hypothyroidism leads to hypogonadotropic hypogonadism, with testes showing incomplete maturation arrest and, in severe myxedematous hypothyroidism, hydrocele. 1303 in experimental hypothyroidism, testicular enlargement is frequently associated with increased spermatid production. 
1304 primary hypothyroidism in adults causes hypergonadotropic, hypogonadotropic, or normogonadotropic hypogonadism, 1305 but testicular function is rarely impaired and patients are usually fertile. 1306 the cause of testicular damage is decreased gonadotropins or hyperprolactinemia. 1307 children with hypothyroidism usually have precocious pseudopuberty. 1308 about 11% of infertile patients reportedly have subclinical adrenal dysfunction, but the true incidence is probably lower. adrenal disorders most frequently associated with infertility are adrenal hypoplasia, adrenal hyperplasia, and adrenal carcinoma. congenital adrenal hypoplasia with hypogonadotropic hypogonadism is an x-linked recessive disorder that gives rise to adrenal insufficiency in the first months of life. in later presentations, patients have cryptorchidism and delayed puberty. 1309 the responsible gene, dax1 on xp21, is expressed in the adrenals, testes, pituitary, and hypothalamus. the resulting hypogonadism may be either pure or mixed (hypophyseal and testicular). in the last case, hypogonadism is partial. 1310 testicular biopsy from one adult with adrenal hypoplasia showed an apparent primary lesion, including tubules with dysgenetic sertoli cells and others with spermatogonial maturation arrest in association with hypertrophy and hyperplasia of leydig cells. 1311 infertility is frequent in patients with minor forms of congenital adrenal hyperplasia. those with deficiency of 21-hydroxylase 1312 or 11β-hydroxylase usually have complete spermatogenesis but with reduced numbers of all germ cells. the characteristic histologic finding is decreased numbers of leydig cells. [1313] [1314] [1315] [1316] in untreated patients, the testes become enlarged by 'tumors' of the adrenogenital syndrome that consist of cells similar to adrenal cortical cells (fig. 12-142). 
[1316] [1317] [1318] [1319] adrenal cortical carcinoma adrenal carcinoma is often associated with excessive secretion of several hormones, causing hyperaldosteronism, cushing's syndrome, virilization, or feminization. virilizing tumors in infancy have their own characteristics, which differ from those of the same tumors in adults: the infantile form may be associated with other disorders, such as hemihypertrophy and beckwith-wiedemann syndrome, may be included in the spectrum of 'families with cancer predisposition' (mutations in the p53 gene), and produces precocious pseudopuberty. in adults, adrenal carcinoma may cause marked spermatogenic depletion owing to the conversion of large amounts of dehydroepiandrosterone produced by the tumor into estrogen. feminizing tumor in infancy causes gynecomastia and pubic hair development. 1320 in adults, feminizing tumor presents more striking clinical characteristics, including progressive loss of secondary sex characteristics and feminization due to elevated estrogen. testicular atrophy results from the inhibitory effect of estrogen on pituitary gonadotropins. similar symptoms may be observed in patients with prostatic carcinoma treated with estrogens (fig. 12-143) and in other conditions with excessive estrogen production, such as sertoli cell or leydig cell tumor. patients with cushing's syndrome or diseases that require long-term corticoid therapy, such as ulcerative colitis, rheumatoid arthritis, or asthma, have reversible reduction of fertility. the explanation is that most testicular receptors for corticoids are in leydig cells, and thus glucocorticoids are powerful inhibitors of testosterone synthesis. alterations in the carbohydrate, lipid, and protein metabolism characteristic of diabetes mellitus involve the genital system, although most diabetic patients are fertile. gonadal impairment depends on the type of diabetes and the time of disease onset (infancy and childhood, puberty, or adulthood). 
1321, 1322 testicular lesions in newborns of diabetic mothers are discussed in the section on congenital anomalies of the testis. 317 puberty may be delayed in diabetic patients, although the cause is unknown. other gonadal alterations appear at puberty, and diabetic men who have not been adequately treated may be infertile and have sexual dysfunction. serum levels of fsh, lh, and testosterone are decreased. 1323 spermiograms reveal low numbers and poor motility of spermatozoa. 1324 prolactin levels are increased and testosterone levels low or near normal. the seminiferous tubules have reduced diameters, thickening of the lamina propria, and alterations in the adluminal compartment. these consist of degenerative changes in the sertoli cell apical cytoplasm and sloughing of immature germ cells. the major lesion is in the interstitial connective tissue and leydig cells. small interstitial blood vessels show diabetic microangiopathy characterized by enlargement and duplication of the basal lamina, pericyte degeneration, and endothelial cell alterations. the number of fibroblasts and the amount of collagen and ground substance in the interstitial connective tissue are increased. 1325 leydig cells are decreased in number and show increased amounts of lipid droplets and lysosomes, accounting for the reduced function of these cells. the tubular lesions are attributed to low serum testosterone, probably owing to deficient leydig cell stimulation by insulin (or a decrease in insulin-dependent fsh) and abnormal carbohydrate metabolism of sertoli cells. sexual dysfunction is present in more than half of patients and consists of impotence, decreased libido, disorders of intercourse, and retrograde ejaculation. the causes of impotence are multiple, including microangiopathy and macroangiopathy, hormonal deficiencies, psychological factors, and autonomic neuropathy affecting the parasympathetic system. 
neuropathy is probably chiefly responsible for erectile failure in diabetic men. 1326 alterations in sperm excretory ducts may be associated with diabetes. the most frequent are enlarged seminal vesicles and calcification of both seminal vesicles and vasa deferentia. calcifications are found in the muscular layers and display a concentric arrangement (fig. 12-144). 1327 although cystic fibrosis (mucoviscidosis) was recognized as a disease prior to 1940, its effects on the male genital system were not recognized until the 1970s. this may be explained by improvements in medical care during childhood, allowing the survival of many patients to adulthood, and the recognition of cystic fibrosis in patients who had been diagnosed with chronic bronchitis and hepatic or digestive dysfunction. in the us, cystic fibrosis is the most lethal congenital disease, with a prevalence of 1 in 2500 children, and a carrier status of 1 in 25 white men. 1328 lesions in sperm excretory ducts involve (in decreasing order of frequency) the vas deferens (congenital bilateral absence, unilateral absence), ejaculatory ducts (bilateral obstruction), epididymis (diffuse or segmental hypoplasia), and seminal vesicles (incomplete development). thus, it appears that most patients with cystic fibrosis have infertility due to obstruction. 1329, 1330 histologic studies in children, even at an early age, reveal that the vas deferens and ductus epididymis are absent or reduced to small ductuli with reduced or absent lumina and thin, poorly muscular walls (fig. 12-145). the testes are normal during childhood, but show hypospermatogenesis and spermatid malformations by adulthood. the spermiogram is characteristic of obstructive azoospermia, with acid ph, decreased semen volume and fructose concentration, and increased citric acid and acid phosphatase. 1331 the disease is a genetic disorder with autosomal recessive inheritance. 
the impaired gene (cystic fibrosis gene) is on chromosome 7 (7q31), 1332 and encodes a protein termed cystic fibrosis transmembrane conductance regulator (cftr). alterations in this protein cause cystic fibrosis. although more than 800 mutations of this gene have been identified, 1333 the most frequent mutation in caucasians is δf508, responsible for 70% of cases. congenital bilateral obstructive azoospermia secondary to bilateral absence of the vas deferens, even in the absence of other symptoms, is often a forme fruste of cystic fibrosis. 1334 before initiating treatment for infertility, the possibility that the patient is a carrier of the cystic fibrosis gene should be evaluated. 1335 malformation of the genital system plays the most important role in infertility in cystic fibrosis. 1336 the lesions begin in the 10th week of gestation, when the wolffian duct forms the sperm excretory ducts. 1337 variable penetrance of the cystic fibrosis gene accounts for the diversity of malformations affecting different regions of the male genital system. the liver has a primary role in metabolism, detoxification, and excretion of sex steroid hormones. chronic hepatic failure damages the hypothalamo-hypophyseal-testicular axis, and subsequently all related endocrine glands. hypogonadism is frequent in the final stages of severe chronic liver diseases, including alcoholism, non-alcoholic liver disease, and hemochromatosis. the association of testicular atrophy with gynecomastia and hepatic cirrhosis is well known and is referred to as silvestrini-corda syndrome. 1338, 1339 alcohol has a direct toxic effect on leydig cells. acute alcoholic intoxication suppresses serum testosterone in non-alcoholic male volunteers and laboratory animals. chronic alcohol ingestion, even in the absence of cirrhosis, causes hypogonadism, with symptoms of leydig cell failure, including testicular atrophy, infertility, decreased libido, impotence, and reduced size of the prostate and seminal vesicles. 
1340 chronic alcoholic patients with cirrhosis also have symptoms of hyperestrogenism, including gynecomastia, female escutcheon, and female fat distribution pattern. most chronic alcoholic men, with or without cirrhosis, have significant testicular lesions. the seminiferous tubules have reduced diameters, thickened lamina propria, and decreased or absent germ cells. leydig cells are reduced in number and contain abundant lipofuscin granules (fig. 12-146). the epididymis becomes atrophic, mainly in the ductuli efferentes, owing to androgen deprivation. the epithelium of the rete testis becomes cuboidal or columnar due to estrogens. the spermiogram correlates with the variability of histologic findings, usually showing a marked reduction in the number and motility of spermatozoa and an increase in the percentage of morphologically abnormal spermatozoa. 1341, 1342 about 20% of patients initially have an increase in serum testosterone; with advanced disease, testosterone level decreases. the initial increase is due to an elevation in sex hormone-binding globulin concentration and reduced testosterone metabolism by the liver. 1343 serum estrogen level also increases owing to increased conversion of testosterone into estrogen in peripheral adipose and muscular tissue. 1344 non-alcoholic liver disease impairs gonadal function according to the severity of the disease. 1345 patients have decreased levels of total and biologically active free testosterone. hormonal alterations are not as severe as in alcoholic patients, emphasizing the direct action of alcohol on leydig cells. in α1-antitrypsin deficiency testicular function and fertility are conserved; only in advanced stages of the disease do minor biochemical alterations occur. 1346 in alagille's syndrome (intrahepatic biliary duct hypoplasia), hypogonadism is associated with cholestasis, frequent vertebral, cardiac, and facial malformations, and mental retardation. 
hypogonadism is manifested by small testes, delayed puberty, and, in adults, lack of germ cell development. hereditary hemochromatosis is the most frequent genetic disease in the northern hemisphere and results from excessive iron absorption and accumulation in multiple tissues and organs, leading to cirrhosis, diabetes, hypogonadism, and arthralgia. four types of hereditary hemochromatosis have been reported. 1347 type 1, the most frequent, is caused by mutation in the hfe gene (c282y), leading to increased intestinal absorption of iron, supersaturation of iron deposits, and damage in multiple organs. the type 1 hereditary hemochromatosis gene (hfe) is located on the short arm of chromosome 6; 1348,1349 the mutation is present in 85-100% of hemochromatosis patients with northern european ancestry, and the protein product is mainly expressed in the epithelium of lieberkühn crypts. this protein interacts with the transferrin receptor, reducing its affinity for iron-bound transferrin; therefore, hfe becomes a negative regulator of transferrin-bound iron uptake. type 2 is a juvenile form that is expressed before the age of 30 years in both sexes, and is associated with severe cardiomyopathy and hypogonadism. 1350 the type 2 hemochromatosis locus is on chromosome 1q21, but this gene has not yet been isolated. 1351, 1352 type 3 is on chromosome 7q22, impairs transferrin receptor 2, and its consequences are similar to those of the type 1 defect. type 4 is autosomal dominant, on 2q32, and affects the basolateral iron carrier ferroportin 1, resulting in iron deposition in macrophages. types 1, 2, and 3 have autosomal recessive inheritance and show a similar distribution pattern of iron deposits. in these three types, alteration of gonadal function has also been reported. iron homeostasis depends on many genes that act in a coordinated manner, and their exact function is not well known. 
it is assumed that normal individuals absorb 1-2 mg/day of iron, whereas homozygous patients with hereditary hemochromatosis absorb up to 3-4 mg/day. once iron deposits become saturated (cells of liver, pancreas, hypophysis, heart, adrenals, and gastric mucosa), the toxic effects of iron cause dysfunction of the liver (cirrhosis and cancer in 5-10% of patients), the pancreas (diabetes in 80% of patients), the heart (myocardiopathy), the musculoskeletal system (arthritis), and the hypophysis (hypogonadism) (fig. 12-147). hypogonadism may be the first sign of disease when it starts in adult life. 1353 with age, hypogonadism becomes hypogonadotropic, with low serum levels of testosterone, lh, and fsh in more than 40% of patients, 1354 except if early treatment is initiated. 1355 the most frequent findings are testicular atrophy with diminished tubular diameter, tubular wall thickening, a progressive decrease in spermatogenesis, and increased lipofuscin granules in leydig cells. the cause of these testicular disorders might be preferential deposition of iron in gonadotropic cells. 1356 iron deposits are not observed in the testis. hypogonadism decreases after aggressive therapy. 1357 polycystic renal disease in adults is an autosomal dominant disorder that appears with a frequency of 1 in 1000 in the general population. patients with this disease comprise 10% of end-stage renal failure cases. 1358 infertility is common, even before the beginning of renal insufficiency. oligoteratozoospermia and necrospermia are frequent findings. 1359, 1360 serum levels of fsh, lh, prolactin, testosterone, and estradiol remain normal for a long time before the onset of renal insufficiency. the causes of spermiogram alterations have been related to partial obstruction of ejaculatory ducts (based on finding cystic dilations in seminal vesicles in 60% of patients) or seminal vesicle cyst. 
1361 the incidence of these two disorders in patients with polycystic renal disease is very high compared to andrological patients without this disease (5.2%). 1362 chronic renal insufficiency is associated with disturbed endocrine function in the pituitary, thyroid, parathyroids, and testes. the associated sexual dysfunction consists of erectile impotence, diminution of libido and semen volume, oligozoospermia or azoospermia, and infertility. in children, skeletal development and puberty are delayed. 1363 hormonal studies reveal elevated levels of fsh, lh, and prolactin, but testosterone levels are low. 1364 testicular biopsy shows seminiferous tubules with reduced diameters and reduced or absent germ cells (fig. 12-148). 1365, 1366 the interstitium contains a normal number of leydig cells and increased numbers of macrophages. additionally, patients with chronic renal insufficiency due to glomerulonephritis have thickening of the tubular lamina propria and decreased number of leydig cells. patients with end-stage renal disease who undergo dialysis show calcifications in several organs and tissues, including the male genital system (epididymis, tunica albuginea, and cavernous tissue) in 87% of cases, and, in isolated cases, calcification of the testicular parenchyma and microlithiasis. 1367 elevated serum levels of phosphorus, increased calcium-phosphorus product, severe hyperparathyroidism secondary to other disorders, older age, and prolonged time on dialysis contribute to this disorder. uremic calcification is a cell-mediated process in which elevated levels of tgf, vitamin k-dependent proteins such as osteocalcin and atherocalcin, and defects in calcium-regulatory proteins such as fetuin are implicated. 1368 when these patients are dialyzed, accumulations of urate and oxalate crystals are deposited in the rete testis and ductuli efferentes. these crystals are deposited beneath the epithelium and often sloughed into the lumen. 
reactive changes in the rete testis, including cystic transformation, are frequent (see disorders of the rete testis). 1369 the cause of gonadal dysfunction is unclear and probably involves several factors, including impaired testicular steroidogenesis, 1370 reduced clearance of pituitary hormones, 1371 and secretory defects of the pituitary and hypothalamus. 1372 dialysis does not improve testicular function. the response to renal transplantation is not immediate and is related to the glomerular filtration rate. patients with rates lower than 50 ml/min develop atrophy of the seminiferous tubular cells. 1370 hypogonadism is a frequent finding in men with celiac disease, and results in clinical symptoms in 5-10% of untreated patients. celiac disease causes infertility in some cases. spermiograms show reduced motility and numerous morphologic anomalies in spermatozoa. hormonal studies show elevated serum fsh levels in more than 25% of men with celiac disease. lh also is increased in more than 50% of these men. the response of fsh and lh to gnrh stimulation is excessive. the cause of this pituitary derangement is unknown. sperm anomalies are not always corrected by a gluten-free diet. studies in patients with ulcerative colitis and regional enteritis reveal a low sperm count, impaired motility, and ultrastructural alterations, including nuclear pleomorphism and chromatin malcondensation and decondensation. zinc deficit may be responsible for these alterations in crohn's disease. 1373 the alterations apparently are related to the extent of the intestinal lesions and the severity of symptoms. 1374 patients with ulcerative colitis treated with salazopyrine, 1375 mesalazine 1376 or fasalazine 1377 present with significant impairment of spermatogenesis and subfertility. spermiogram parameters improve when treatment ceases. 
more than 17% of hiv-infected men have hypogonadism, 1378 which can be observed even in those whose viral replication is under control and who show normal numbers of cd4 lymphocytes. patients frequently develop 'early andropause,' marked by dysregulation of the hypothalamo-pituitary-testicular axis. 1379 hypogonadism is more frequent in hiv-infected men with wasting syndrome, and therefore these patients should undergo screening for hypogonadism and, if necessary, physiologic androgen replacement therapy. [1380] [1381] [1382] [1383] the incidence of hypogonadism in males with aids is estimated to be 50%. 1384, 1385 according to autopsy studies this increases to 100% in the 3-24 months prior to death. 1386 histological studies reveal that 28% have complete but quantitatively abnormal spermatogenesis, and the remainder have spermatocytic arrest or sertoli cell-only pattern. patients with chronic anemia requiring multiple transfusions develop iron deposits in the pituitary and polyglandular insufficiency, with atrophy of the thyroid, adrenals, and testes (fig. 12-149). the most frequent conditions are β-thalassemia and sickle cell anemia (see fig. 12-119). β-thalassemia is an autosomal recessive disease with three types: thalassemia trait (heterozygous β-thalassemia), intermediate thalassemia, and major β-thalassemia. the cause is mutation in the β-globin gene resulting in ineffective erythropoiesis, hemolysis, and anemia. nearly 20% of patients with major thalassemia have delayed puberty, [1387] [1388] [1389] and 69% have hypogonadotropic hypogonadism. 1390 gonadal dysfunction persists in most patients after healing of the thalassemia. 1391 sickle cell anemia is an autosomal recessive disorder with a constellation of findings resulting from abnormal synthesis of hemoglobin, with over 90% of hemoglobin being type s. most patients have hypogonadotropic hypogonadism. 
1392 the majority of people in developed countries are currently overweight, and the incidence of obesity seems to be increasing. infertility is frequently associated with obesity. very obese males have increased levels of serum estradiol and decreased levels of free testosterone and inhibin b. 1393 testosterone reduction is not followed by a compensatory increase in gonadotropins, resulting in hypogonadotropic hypogonadism. 1394, 1395 testicular abnormalities begin with the adluminal compartment and later involve the basal compartment; there are also leydig cell atrophy, cuboidal metaplasia of the rete testis, and epididymal atrophy. there are three types of autoimmune polyglandular insufficiency syndrome. type i is defined by the presence of at least two of three characteristic features: addison's disease, hypoparathyroidism, and chronic mucocutaneous candidiasis. the aire gene (autoimmune regulator), responsible for type i disease, is on 21q22.3, 1396,1397 and the disorder is autosomal recessive. hypergonadotropic hypogonadism is frequent. 1398 patients with type i syndrome have antibodies against many autoantigens (intracellular enzymes), including the p450 side-chain cleavage enzyme, 17α-hydroxylase 1399, 1400 and 21-hydroxylase, glutamic acid decarboxylase 65, aromatic l-amino acid decarboxylase, tyrosine phosphatase-like protein ia-2, tryptophan hydroxylase (tph), tyrosine hydroxylase, and cytochrome p450 1a2. 1401 type ii autoimmune polyglandular syndrome is characterized by the presence of diabetes mellitus, hyperthyroidism, hashimoto's thyroiditis, addison's disease, vitiligo, alopecia, pernicious anemia, and hypogonadism (listed in decreasing order of frequency). type iii syndrome includes thyroiditis, diabetes mellitus, pernicious anemia, and vitiligo or alopecia. about 14% of patients have hypogonadism owing to autoimmune destruction of the testis or pituitary gonadotropin-secreting cells (fig. 12-150). 
1402, 1403 there are at least four diseases caused by metabolic deposits in lysosomes or peroxisomes associated with testicular alterations, including fabry's disease, adrenoleukodystrophy, wolman's disease, and cystinosis. fabry's disease is an x-linked metabolic disorder characterized by intralysosomal deposits of globotriaosylceramide (gb3) owing to α-galactosidase deficiency. clinical symptoms begin with painful neuropathy and progressive renal, cardiovascular, and cerebrovascular dysfunction. all endocrine glands may accumulate gb3 as a result of well-developed vasculature and low rate of cell proliferation. 1404 testes and sperm excretory ducts are always damaged. some alterations, including those of endothelial cells, smooth muscle cells, and fibroblasts, are non-specific; others, such as those of myofibroblasts, leydig cells, and epididymal epithelium, are specific (figs 12-151, 12-152). spermatogenesis is deficient. 1405 enzyme replacement therapy with recombinant human α-galactosidase eliminates existing glycosphingolipid deposits and blocks new ones, and is thus recommended for implementation as soon as possible after diagnosis. [1406] [1407] [1408] adrenoleukodystrophy (adrenal testicular myeloneuropathy) this disorder is caused by mutation in the adrenoleukodystrophy gene on xq28. 1409 mutation at this site produces three peroxisomal diseases: adrenoleukodystrophy, adrenomyeloneuropathy, and addison's disease. adrenoleukodystrophy is characterized by progressive demyelinization of the central nervous system, usually in children and young adults, often with adrenal insufficiency and testicular failure. peroxisomal β-oxidation is deficient and, as a result, very long-chain fatty acids accumulate inside peroxisomes in many tissues, causing the signs and symptoms of the disease. 1410, 1411 adrenomyeloneuropathy begins at a later age (about 30 years) with progressive paraparesis, peripheral neuropathy, and adrenal cortical failure. 
males usually have gonadal dysfunction with oligozoospermia or azoospermia and hypergonadotropic hypogonadism. 1412 testicular atrophy develops slowly, the seminiferous epithelium disappears, and leydig cells contain characteristic cytoplasmic lamellar inclusions, with similar inclusions in adrenal cortical cells and cerebral cells. 1413 wolman's disease is a rare inherited lysosomal disease characterized by a deficit in acid lipase/cholesteryl ester hydrolase. the genetic mutation has been mapped to 10q23.2-q23.3. 1414 complete enzymatic deficiency (wolman's disease) causes death in infancy as a result of the accumulation of cholesterol esters and triglycerides in numerous organs such as the liver, adrenal cortex, and intestines. 1415 partial deficiency is known as cholesteryl ester storage disease, and the testis accumulates triglycerides and cholesterol in leydig cells and, to a lesser degree, in interstitial macrophages. delayed disruption of spermatogenesis by this storage disease probably accounts for the frequent lack of fertility problems in men with this disease. 1416 early treatment of children with wolman's disease by transplantation of umbilical cord blood-derived stem cells may successfully restore acid lipase level in some. 1417 cystinosis is an autosomal recessive metabolopathy characterized by alterations in cystine transport from the lysosomes to the cytosol that result in intralysosomal accumulation of cystine. there are several genes responsible, all on chromosome 17p13. cystine storage occurs in all body tissues. deposits in the renal parenchyma cause the main complication of cystinosis, namely renal insufficiency (nephropathic cystinosis). patients also develop hypergonadotropic hypogonadism. testicular involvement may be massive, with interstitial macrophages filled with cystine crystals that are visible by polarized light. 
1418 niemann-pick disease consists of a heterogeneous group of inherited recessive autosomal diseases characterized by deposition of lipids in macrophages and other tissues. there are four reported types (a, b, c, d). the most common, type a, results from excessive storage of sphingomyelin owing to a mutation in the acid sphingomyelinase gene, which encodes a lysosomal hydrolase and is located in the 11p15.1-4 region. 1419 interstitial macrophages in the testes have wide eosinophilic, granular cytoplasm. ultrastructural studies reveal a large number of lysosomes filled with laminate bodies. physical and chemical agents may impair testicular function by direct action on the pituitary, the testis, or the sperm excretory ducts. in the pituitary, damage to gonadotropic cells may be caused by estrogen. in the testes, gonadotoxic agents may at first impair a single cell type, but global dysfunction occurs later. for example, there is direct toxicity to sertoli cells by phthalates used as plasticizers, nitroaromatic compounds that are intermediates in the production of dyes and explosives, and γ-diketones used as solvents. direct toxicity on spermatogenesis is seen with ionizing radiation. many drugs that impair epididymal fluid or spermatozoon transport damage sperm excretory ducts, with subsequent loss of fertility. 1420 the relationship between infertility or subfertility and certain professions or exposures to environmental agents is well known.
1421 adverse effects of the following agents on spermatogenesis have been demonstrated: organic solvents such as chlorinated solvents, aromatic solvents and varnishes, degreasers, thinners, and adhesives; carbon disulfide; pesticides such as ddt, linuron, and polychlorinated biphenyls; 1422 heavy metals such as lead, cadmium, mercury, and copper; industrial wastes such as dioxins and ethylene dibromide; phthalates and polyvinyl chloride; oral contraceptives; exposure to radiation or high temperature; and recreational drugs and doping. there is also a long list of potentially harmful agents that disrupt testicular function. 1423 carbon disulfide is used as a solvent in the production of rayon. continuous exposure is toxic to the nervous system, and causes a decrease in spermatogenesis and libido and an increase in fsh and lh serum levels. 1424, 1425
dibromochloropropane
dibromochloropropane is used as a soil fumigant to control nematodes. lengthy exposure causes oligozoospermia, azoospermia, increased fsh and lh levels, and y-chromosome non-disjunction. 1426 of the two natural forms of lead, organic and inorganic, the inorganic form is more dangerous. exposure to inorganic lead by workers in smelting, battery, and stained-glass plants causes direct spermatogenic damage. 1427 patients have asthenospermia, teratozoospermia, and oligozoospermia. 1428, 1429 oral contraceptive manufacture workers in pharmaceutical plants using synthetic estrogens and progestins develop hyperestrogenism with gynecomastia, decreased libido, and impotence. 1430 neonatal exposure of males to diethylstilbestrol may induce cryptorchidism, testicular hypoplasia, epididymal cyst, and severe anomalies in semen production.
1431 there is increasing evidence to suggest that estrogen-like effects are produced by a variety of naturally occurring estrogens (so-called phytoestrogens) and numerous synthetic compounds such as phthalates, 1432 pesticides, 1433 and polychlorinated biphenyls. 1434 the principal routes of contact with potential endocrine-disrupting compounds are dietary ingestion of milk, fish, meat, fruits and vegetables, and environmental exposure. 1435 the increasing incidence of cryptorchidism, hypospadias, testicular cancer, and poor semen quality may be related to the negative influence of environmental factors on the testis during fetal life. the term 'testicular dysgenesis syndrome' has been proposed to designate this constellation of putative syndromes. 1436 estrogen exposure in utero may disrupt development of the testes and the entire male reproductive tract. estrogen may hinder fsh secretion by the fetal pituitary, and also interfere with subsequent sertoli cell proliferation, and hence the secretion of amh required for the regression of müllerian ducts. persistence of müllerian derivatives is associated with lack of testicular descent. changes in amh secretion may also account for altered germ cell proliferation during fetal life. exposure to high concentrations of estrogen might compromise testosterone production as well as masculinization of external genitalia (hypospadias) and inguinal descent of the testis (cryptorchidism). abnormal development of sertoli cells and low germ cell numbers could cause diminished spermatozoon production and infertility. 1437 marijuana decreases sperm density and motility and increases the number of morphologically abnormal spermatozoa. 1438 cocaine induces apoptosis in the rat testis (fig. 12-153). 1439 about 20% of injection drug users have low serum testosterone levels. consumption of more than 80 g alcohol per day adversely affects spermatogenesis in two-thirds of patients.
1440 women who smoke more than 20 cigarettes per day have fertility problems and increased rates of neonatal and perinatal mortality, miscarriage, and congenital malformations. 1441 abuse of anabolic steroids by athletes causes hypogonadotropic hypogonadism and transient azoospermia. 1442 ionizing radiation causes alterations in spermatogenesis and hormonal regulation of the testes. some patients recover fertility a few years after exposure. 1443 the effects of non-ionizing radiation are less severe; however, reduced libido and reduced numbers of spermatozoa have been reported in men exposed to microwaves. 1444
heat
normal intratesticular temperature is 31-33°c, about 4-6°c lower than core body temperature. conditions causing higher testicular temperature, such as varicocele and cryptorchidism, also cause testicular damage, with decreased numbers of spermatozoa and an elevated percentage of spermatozoa with abnormal forms and low motility. 1445, 1446 primary spermatocytes at the end of the pachytene stage are most sensitive to heat. the mechanism by which heat produces testicular lesions is unknown; hyperthermia affects the activity of enzymes such as ornithine decarboxylase 1447 and carnitine acetyl transferase, 1448 both necessary for metabolism and proliferation of the seminiferous tubular cells. 1449 the synthesis of dna and rna by germ cells also depends on temperature. dna synthesis by spermatogonia and preleptotene primary spermatocytes is higher at 31°c than at 37°c. rna and protein synthesis are normal at temperatures between 28°c and 37°c, but decrease markedly at 40°c. 1450 testicular trauma is especially frequent among athletes. trauma results in a wide variety of lesions, including contusion with or without hematocele, rupture, dislocation, and eventually spermatogenetic alteration that may lead to infertility.
dislocation involves the displacement of one or both testes to a non-scrotal location 1451, 1452 such as the inguinal canal, abdominal cavity, acetabular area, or distant locations such as the perineum, subcutaneous tissues, or superficial to the outer oblique fascia. 1453, 1454 spermatogenetic recovery after orchidopexy has been achieved up to 13 years after bilateral traumatic dislocation. 1455 sexual dysfunction is found in 25-50% of patients who are treated for cancer. 1456 testicular cancer, hodgkin's disease, and leukemia are the most frequent malignancies during the reproductive years. therefore, preservation of fertility requires careful selection of less gonadotoxic therapeutic regimens; if paternity is planned, cryopreservation of semen before treatment may be considered. the most destructive treatments for gonadal function are radiation therapy and alkylating agents. 1457 the testicular parenchyma is one of the most radiosensitive tissues of the body, and the germ cells are the most radiosensitive cells of the testis. experimental irradiation of volunteers with a single dose revealed that late spermatogonia (ap and b) are more radiosensitive than early (ad) spermatogonia. ap and b spermatogonia may be destroyed with doses as low as 0.3 gy (1 gy = 100 rad), whereas ad spermatogonia tolerate doses higher than 4 gy. type a spermatogonia, spermatids, and spermatozoa are respectively 100, 200, and 10 000 times less radiosensitive than b spermatogonia. doses higher than 6 gy produce a sertoli cell-only pattern. leydig cells tolerate up to 8 gy and sertoli cells up to 60 gy, although sertoli cells show ultrastructural alterations and increased phagocytosis of germ cell remnants after low doses of radiation. even with optimal protection, the contralateral testis absorbs from 0.2 to 1.4 gy in adjuvant therapy for rectal cancer 1458 or when the opposite testis is irradiated, 1459 a dose sufficient to cause temporary azoospermia.
likewise, irradiation of iliac or inguinal lymph nodes for hodgkin's disease or other forms of lymphoma exposes the testes to about 5 gy. 1460 restoration of testicular function is time-dependent, 1461 requiring at least 2 years. 1462 fertility in thyroid cancer patients who received radioiodine-131 (131i) therapy decreases briefly, but infertility is not permanent. 1463 electromagnetic radiation from cell phones impairs spermatozoon motility according to one study. 1464 prepubertal testes also are sensitive to radiation therapy. patients treated for wilms' tumor may have delayed puberty and, at adulthood, oligoospermia or azoospermia with elevated levels of fsh; this finding suggests that leydig cells are also damaged. a special case is that of children with acute lymphoblastic leukemia involving the testis. radiotherapy with doses of 20-25 gy, either alone or with chemotherapy, causes irreversible damage to the seminiferous tubules and leydig cells. these patients develop azoospermia and hypogonadotropic hypogonadism with low serum testosterone (fig. 12-154). widespread use of cytotoxic chemotherapy has created a number of adverse side effects, including gonadotoxicity. combination chemotherapy makes it difficult to ascertain which specific agent is responsible for azoospermia and leydig cell dysfunction. comparative studies of chemotherapy for acute lymphoblastic leukemia, 1465 extragonadal solid tumors, 1466 hodgkin's disease, 1467 ewing's sarcoma, and other soft tissue sarcomas 1468 in children and pubertal boys have shown that alkylating agents cause the most severe testicular damage. alkylating agents destroy the seminiferous tubular cells and induce tubular atrophy, shrinking the testis and increasing fsh serum concentration. 1469 these agents also impair leydig cell function, causing low testosterone, normal or elevated serum levels of lh, and an exaggerated response of lh to gnrh administration.
1470 testicular damage may be increased by combination with other agents (fig. 12-155). cyclophosphamide appears to be responsible for the greatest number of permanent or temporary cases of azoospermia after chemotherapy. this agent acts directly on the spermatogenic stem cells, 1468 and recovery depends on the number of surviving cells. in children, cyclophosphamide reduces seminiferous tubule diameter and germ cell numbers; in the residual spermatogonia, nuclei are enlarged. puberty may progress, even during treatment, and the adult testis may show a sertoli cell-only pattern. 1465 in adults, cyclophosphamide treatment may cause irreversible testicular damage. administered alone, a dose of 20 000 mg/m2 produces permanent azoospermia in 50% of men. if cyclophosphamide is administered with doxorubicin, vincristine, dacarbazine, or dactinomycin (drugs that alone do not cause azoospermia), doses of 7500 mg/m2 cause azoospermia in 50% of patients. fludarabine, used for the treatment of chronic lymphocytic leukemia, produces testicular damage with diminution of ejaculate volume, oligozoospermia, increase in serum levels of fsh and lh, and decreased testosterone level. dna in spermatozoa is markedly abnormal, an effect that persists for several months. 1471 procarbazine, used to treat hodgkin's disease, causes permanent azoospermia in 30% of patients, even when not combined with alkylating agents. 1472 patients treated with a combination of cyclophosphamide and procarbazine in the copp protocol (cyclophosphamide, vincristine, procarbazine, and prednisone) do not recover spermatogenesis even if the cyclophosphamide dose does not exceed 4800 mg/m2. chemotherapy that includes neither alkylating agents nor procarbazine, such as the abvd (doxorubicin, bleomycin, vinblastine and dacarbazine) or vbm (vinblastine, bleomycin and methotrexate) regimens, produces reversible azoospermia in 36% of patients.
the alternating use of mopp (mechlorethamine, vincristine, procarbazine and prednisone) and abvd treatments causes testicular dysfunction in 87% of patients, but spermatogenesis recovers in 40%. 1473 patients with germ cell cancer who receive chemotherapy with bep regimens (cisplatinum, etoposide, and bleomycin) become azoospermic 7-8 weeks after starting treatment. when the total dose reaches 600 mg/m2, infertility is irreversible; at lower dosages, fertility might be recovered over a period of about 2 (50% of patients) to 5 (80%) years, 1474 although a high percentage of spermatozoa with dna abnormalities persists. 1475 an important consideration in patients with testicular cancer or hodgkin's disease is the existence of testicular dysfunction before treatment. in some series 1476 dysfunction is present at diagnosis in more than 50% of patients; its cause is unknown. proposed mechanisms include primary germ cell deficiency, release of toxic substances by tumor cells, and alteration in the hypothalamo-hypophyseal-testicular axis. sexual function is often lost in patients who undergo bilateral retroperitoneal lymph node dissection for nonseminomatous testicular cancer. up to 90% lose antegrade ejaculation, although libido, erection, and orgasm are normal. loss of antegrade ejaculation results from the removal of or injury to sympathetic ganglia and the hypogastric nervous plexus during surgery. unilateral surgery, especially if the left side is not operated on, reduces this complication. 1477, 1478 hypospermatogenesis sometimes occurs after surgery for rectal cancer, perhaps due to vascular compromise.
fig. 12-154 testis from a 26-year-old patient who, at the age of 9 years, underwent surgery followed by radiotherapy for paratesticular rhabdomyosarcoma. the testicular biopsy shows post-irradiation lesions, including germ cell aplasia and peritubular and interstitial fibrosis.
spinal cord injury is a frequent finding, with more than 10 000 cases annually in the us, mostly in young adults. 1479 fertility is impaired in 90% of males with spinal cord injury. the major sexual dysfunctions in these patients are the lack of erection and ejaculation and poor semen quality. [1480] [1481] [1482] [1483] [1484] [1485] failure of ejaculation occurs in 95% of patients. semen may be obtained by means of vibratory stimulation of the penis or electroejaculation in more than 90% of patients, but its quality is low, with increased numbers of dead spermatozoa, markedly low motility, and reduced fertilization rate. [1486] [1487] [1488] possible explanations include genitourinary tract infection, endocrine anomaly, and impaired spermatogenesis. recurrent infection occurs in 60-70% of patients. compared to controls, a significant increase in the numbers of neutrophils and macrophages occurs, with a marked increase in the production of reactive oxygen species. 1489, 1490 this finding and the presence of elevated cytokine levels 1491 are assumed to be involved in pathogenesis. endocrine anomalies are transient, and hormonal levels return to normal after a few months. more than 50% of patients have abnormalities of the adluminal compartment of the seminiferous tubules, with variable degrees of immature germ cell sloughing; 1482 in 50% of patients the number of mature spermatids per cross-sectioned tubule is less than 10 (normal >21). possible etiologies include an increase in testicular temperature due to vascular dilation, or an alteration in scrotal thermoregulation secondary to impaired sympathetic innervation from prolonged wheelchair restraint; alteration in sperm transport secondary to nerve injury, resulting in sperm stagnation in seminal vesicles, a hostile environment that normally is devoid of spermatozoa; 1493 and abnormal composition of seminal fluid, causing deterioration of spermatozoa that in the epididymis and ductus deferens had good motility.
1494 more than 25% of patients with spinal cord injury have brown-tinged semen in some ejaculations. 1495 although the cause is unknown, it might be related to seminal vesicle dysfunction. when spermatozoa cannot be obtained by electroejaculation or vibratory stimulation, vasal aspiration or testicular biopsy is recommended. most patients have at least a few mature spermatids in some seminiferous tubules; therefore, testicular sperm extraction followed by intracytoplasmic sperm injection is a reasonable consideration in azoospermic patients. 1492 infectious agents may reach the testis and epididymis through blood vessels, lymphatics, sperm excretory ducts, or directly from a superficial wound. infection transmitted through the blood mainly affects the testis and causes orchitis, whereas infection ascending through the sperm excretory ducts usually causes epididymitis. acute inflammation is accompanied by enlargement of the testis or epididymis. the tunica albuginea is covered by a fibrinous exudate, and the testicular parenchyma is yellow or brown. bacterial infection may cause abscess. in some cases the infection begins to heal, with the deposition of granulation tissue and fibrosis; in others, the infection may persist as an active process for a long time, resulting in chronic orchidoepididymitis. the most frequent causes of viral orchidoepididymitis are mumps virus and coxsackie b virus. other viral infections that occasionally cause acute orchitis include influenza, infectious mononucleosis, echovirus, lymphocytic choriomeningitis, adenovirus, coronavirus, bat salivary gland virus, smallpox, varicella, vaccinia, rubella, dengue, and phlebotomus fever. subclinical orchitis probably occurs during other viral infections (fig. 12-156). before vaccination was commonly used, mumps orchidoepididymitis complicated 14-35% of adult mumps cases and was bilateral in 20-25% of cases. nevertheless, miniepidemics still occasionally occur.
1496, 1497 as expected, the incidence remains high in countries where vaccination is not obligatory. 1498 in about 85% of cases of mumps orchitis the epididymis is also involved, but epididymal involvement alone is rare. 1499 clinical symptoms of orchitis usually appear 4-6 days after symptoms of parotitis, but orchitis may also appear without parotid involvement. 1500 testicular involvement is multifocal, and consists of acute inflammation of the interstitium and seminiferous tubules. the tubular lining is destroyed, and eventually only hyalinized tubules and clusters of leydig cells remain. 1501 with time, the testes shrink and become soft. if the infection is bilateral the patient is usually infertile, with severe oligozoospermia or azoospermia, although biopsy may reveal the presence of mature spermatids in some tubules, allowing sperm extraction for paternity. 1502 if only one testis was affected, the sperm concentration may be normal or slightly decreased and fertility is maintained. occasionally the testicular damage is so severe that testicular endocrine function is impaired, causing hypergonadotropic hypogonadism, with low testosterone levels and regression of secondary sex characteristics. mumps orchidoepididymitis is infrequent in childhood. most bacterial orchitis is associated with bacterial epididymitis. orchitis secondary to suppurative epididymitis caused by escherichia coli is most common. 1503 on light microscopy, the tubules are effaced by intense acute inflammation. chronic orchitis with microabscesses is caused by e. coli, streptococci, staphylococci, pneumococci, salmonella enteritidis, 1504 and actinomyces israelii. 1505, 1506 in some cases of chronic bacterial orchitis, the testis contains an inflammatory infiltrate consisting of numerous histiocytes with foamy cytoplasm (xanthogranulomatous orchitis) (fig. 12-157), 1507 similar to that of idiopathic granulomatous orchitis but lacking intratubular giant cells.
rarely, as in whipple's disease, large numbers of bacilli are present in histiocytes in the interstitium, vascular walls, and seminiferous tubules. the most frequent complications of pyogenic bacterial orchidoepididymitis are scrotal pyocele and chronic draining scrotal sinus. small fragments of testicular parenchyma may be eliminated through the scrotal skin, known clinically as fungus testis. another complication is testicular infarct, resulting from compression or thrombosis of the veins of the spermatic cord, in the scrotal neck, or the superficial inguinal ring. most cases of chronic orchidoepididymitis are associated with granulomas in the testis. specific causes may require special stains, cultures, or serologic tests, and include tuberculosis, syphilis, leprosy, brucellosis, mycoses, and parasitic diseases. in sarcoidosis and idiopathic granulomatous orchitis, the agent is unknown. the incidence of tuberculous orchidoepididymitis declined after the development of effective antibiotics, but it has recently undergone a resurgence among people who have emigrated from countries with a high incidence of the disease and among the increasing population of immunologically compromised patients. most cases of tuberculous orchidoepididymitis are associated with involvement elsewhere in the genitourinary system. 1508 tuberculous epididymitis is usually the result of ascent from tuberculous prostatitis, which in turn is often secondary to renal or pulmonary tuberculosis. the pattern of spread is different in children: more than half have advanced pulmonary tuberculosis, and the testis is infected through the blood. 1509 more than 50% of patients with renal tuberculosis develop tuberculous epididymitis, and orchitis occurs in approximately 3% of patients with genital tuberculosis, usually secondary to epididymal tuberculosis. it has been suggested that some cases of tuberculous orchidoepididymitis are sexually transmitted.
1510 tuberculous orchidoepididymitis occurs mainly in adults: 72% of patients are older than 35 years, and 18% are over 65 years. the signs and symptoms may be mild, consisting only of testicular enlargement and scrotal pain. in such cases, fever is infrequent and constitutional symptoms may be absent. 1511 histologically, there are typical caseating and noncaseating granulomas that destroy the seminiferous tubules and interstitium (figs 12-158, 12-159). in immunosuppressed patients, the granulomas consist of epithelioid histiocytes and a few lymphocytes with rare giant cells. acid-fast bacilli tend to be more numerous in immunosuppressed patients. similar lesions may be observed in orchidoepididymitis caused by bacillus calmette-guérin, which is usually used for intravesical instillation in patients with vesical urothelial carcinoma. 1512
syphilis
syphilitic orchitis may be congenital or acquired. in congenital orchitis, both testes are enlarged at birth. the histological findings are similar to those of the interstitial orchitis of acquired syphilis. if diagnosis is delayed until puberty, the testis often shows retraction and fibrosis. in adults, acquired orchitis is a complication of the tertiary stage of syphilis and has two characteristic histologic patterns: interstitial inflammation and gumma. early in the disease, patients with interstitial orchitis have painless enlargement. grossly, the parenchyma is gray with translucent areas. histologically, plasma cells are abundant. the inflammation begins in the mediastinum testis and testicular septa, later extending through the parenchyma as the seminiferous tubules lose their cellular lining and undergo sclerosis. initially, the arteries show an obliterans type of endarteritis. small gummas may be observed. eventually, the inflammation subsides and is replaced by fibrosis. the epididymis is usually not affected.
gummatous orchitis is characterized by the presence of one or several well-delineated grossly gray-yellow zones of necrosis. 1513 histologically, ghostly silhouettes of seminiferous tubules are visible within the gumma, surrounded by inflammation consisting of lymphocytes, plasma cells, and scattered giant cells. in most cases spirochetes may be demonstrated histochemically with warthin-starry silver stain, but the most specific diagnostic technique is genetic testing. the testis may be infected in patients with lepromatous or borderline leprosy. frequent involvement of the testis in lepromatous leprosy results from the low intrascrotal temperature that promotes growth of the bacilli. orchitis is usually bilateral, although the degree of involvement may differ between the testes. occasionally, testicular involvement may be the sole indication of the infection, and the diagnosis may be made by testicular biopsy. 1514 the histologic findings in the testis vary with the duration of the infection. initially, there is perivascular lymphocytic inflammation and interstitial macrophages that contain numerous acid-fast bacilli. later, the seminiferous tubules undergo atrophy, the leydig cells cluster, and blood vessels show endarteritis obliterans. finally, the testis is replaced by fibrous tissue with a few lymphocytes and macrophages containing acid-fast bacilli. most patients with lepromatous leprosy are infertile, even if the orchitis was clinically mild. 1515, 1516
brucellosis
brucellosis is common in some parts of the world, including the middle east. 1517, 1518 orchitis occurs in some patients and may be the first sign of disease. brucellosis should be suspected when testicular enlargement occurs in patients with undulating fever, malaise, sweats, weight loss, and headache. 1519 occasionally this may mimic testicular tumor. histologically, there is a dense lymphohistiocytic inflammation with occasional non-caseating granulomas in the interstitium.
the seminiferous tubules are infiltrated by inflammatory cells and undergo atrophy. diagnosis is made by clinical and laboratory findings, including blood culture, the rose bengal test, and high brucella agglutination titers, 1520, 1521 or by real-time polymerase chain reaction assay of urine. 1522 sarcoidosis is a systemic granulomatous disease of unknown etiology that preferentially affects young black adults. the genitourinary tract is involved in only 0.5% of clinical cases and 5% of autopsy cases. fewer than 30 cases of primary epididymal involvement have been reported, and about 12 of these also involved the testis. 1523, 1524 isolated testicular involvement is exceptional. 1523, 1525, 1526 testicular sarcoidosis is usually unilateral and nodular. 947 it is often asymptomatic and found at autopsy. 1527 the testis contains non-caseating granulomas similar to sarcoid granulomas at other locations. before diagnosing testicular sarcoidosis, other granulomatous lesions should be excluded, including tuberculosis, sperm granuloma, granulomatous orchitis, and seminoma. seminoma often has an intense sarcoid-like reaction, and examination of multiple histologic sections may be necessary to find diagnostic foci of seminoma. an association of mediastinal sarcoidosis and testicular cancer has been reported. 1528 genital involvement of sarcoidosis may be the cause of intermittent azoospermia that benefits from corticoid therapy. 1529 malakoplakia is a chronic inflammatory disease that was initially described in the bladder 1530 and subsequently in many other organs. the testes (alone or together with the epididymis) are involved in 12% of cases involving the urogenital system. 1531, 1532 grossly, the testes are enlarged and have a brown-yellow parenchymal discoloration, 1533 often with abscesses.
malakoplakia causes tubular destruction that is associated with a dense infiltrate of macrophages with granular eosinophilic cytoplasm that often contains michaelis-gutmann bodies (fig. 12-160). 1534, 1535 the differential diagnosis includes idiopathic granulomatous orchitis and leydig cell tumor. inflammation in idiopathic granulomatous orchitis includes intratubular multinucleate giant cells; in malakoplakia it is difficult to identify the tubular outlines, and giant cells are usually absent. leydig cell tumor is not usually associated with inflammation, but may contain mononucleated or binucleated cells with abundant eosinophilic cytoplasm. reinke's crystalloids are identified in up to 40% of cases of leydig cell tumor but are absent in malakoplakia, whereas michaelis-gutmann bodies are absent in leydig cell tumor. fungal orchitis is rare; most cases are associated with blastomycosis, coccidioidomycosis, histoplasmosis, and cryptococcosis. 1536 the genital tract may be involved in widespread blastomycosis. in decreasing order, the organs most frequently affected are the prostate, epididymis, testis, and seminal vesicles. grossly, there often are small abscesses that may have caseous centers. fungi measuring 8-15 µm in diameter with double refringent contours are present in the giant cells in granulomas and stain positively with periodic acid-schiff and methenamine silver stains. coccidioidomycosis is endemic in california, the southwestern united states, and mexico, and may present as epididymal disease after remission of systemic symptoms. 1537 the granulomas are similar to those of tuberculosis and contain 30-60 µm sporangia with endospores that stain with periodic acid-schiff. dissemination of histoplasmosis and cryptococcosis frequently occurs after steroid therapy and may give rise to granulomatous orchitis with extensive necrosis. 1538 histoplasma capsulatum measures 1-5 µm in diameter and may be demonstrated with silver stain.
cryptococcus is identified by its thick wall that stains with mucicarmine. most parasites that reach the genital tract, such as filaria and schistosoma, are in the spermatic cord, and testicular lesions are secondary to vascular injury. 1539 testicular infection has also been reported in patients with visceral leishmaniasis, congenital and acquired toxoplasmosis (fig. 12-161), 1540 echinococcus infection, 1541 and orchitis due to trichomonas vaginalis. idiopathic granulomatous orchitis is a chronic inflammatory condition of older adults (mean, 59.2 years). the most prominent clinical symptom is testicular enlargement, suggesting malignancy. 1542 most patients have a history of scrotal trauma, 66% have symptoms of urinary tract infection with negative cultures, and 40% have sperm granuloma in the epididymis. an autoimmune etiology has been suggested. the testis is enlarged, with a nodular cut surface and areas of necrosis or infarction. there are two histologic forms, according to whether the lesion is predominantly in the tubules (tubular orchitis) or the interstitium (interstitial orchitis). in tubular orchitis, germ cells degenerate and the sertoli cells have vacuolated cytoplasm and vesicular nuclei. plasma cells and lymphocytes infiltrate the walls of the seminiferous tubules, forming concentric rings. multinucleated giant cells are present in the tubular lumina and sometimes in the interstitium (fig. 12-162). vascular thrombosis and arteritis are common. in interstitial orchitis, the inflammation is predominantly interstitial. ultimately, tubular atrophy and interstitial fibrosis prevail in both forms, which may arise from different immune mechanisms. 1543 tubular orchitis histologically resembles experimental orchitis caused by injection of serum from animals with orchitis, whereas interstitial orchitis resembles orchitis produced by the transfer of cells from immunized animals.
the differential diagnosis of idiopathic granulomatous orchitis is infectious orchitis caused by bacteria, spirochetes, fungi, or parasites. a useful clue in the tubular form is the presence of giant cells within seminiferous tubules. the occurrence of focal lymphoid cell infiltrates in the testicular interstitium is common in infertile patients, 1544, 1545 patients who have undergone surgery for bilateral inguinal hernia, 1546 vasectomized patients who developed post-infection obstruction, 1547 after testicular piercing, 1548 and cryptorchidism. 1549 inflammatory infiltrates usually involve the seminiferous tubules, and this suggests the disorder is due to an immunologic response (fig. 12-163). pseudolymphoma is a benign reactive process with a lymphoid cell proliferation so intense that it may be mistaken for lymphoma. testicular pseudolymphoma consists of inflammatory infiltrates with numerous lymphocytes and plasma cells that partially or totally destroy testicular parenchyma. 1550, 1551 the differential diagnosis includes lymphoma, various forms of orchitis, and seminoma. the diagnosis of lymphoma may be excluded by the lack of atypia and polyclonal nature of the inflammation. syphilitic orchitis also contains a plasma cell-rich inflammatory infiltrate, but pseudolymphoma does not have other characteristic features of syphilitic orchitis, such as endarteritis obliterans; spirochetes cannot be demonstrated by special stains. the lack of granulomas or significant numbers of macrophages, together with the negative results of specific histochemical stains, also helps to exclude idiopathic granulomatous orchitis, tuberculosis, leprosy, sarcoidosis, and fungal infection. 
finally, although the presence of a prominent inflammatory infiltrate and, in many cases, numerous lymphoid follicles, may suggest the diagnosis of seminoma, the presence of seminoma cells should be easily demonstrated with best's carmine stain, periodic acid-schiff, or placenta-like alkaline phosphatase. the term plasma cell granuloma 1152 refers to a reactive process characterized by the presence of polyclonal adult plasma cells that are absent in testicular plasmacytoma. 1553 sinus histiocytosis with massive lymphadenopathy (rosai-dorfman disease) is a benign proliferation of macrophages that uniquely contain numerous lymphocytes in their cytoplasm. the disease was reported in a kidney and testis of a patient in remission from malignant lymphoma in association with monoclonal iga gammopathy, 1554 and in a second patient with diabetes mellitus who had been previously treated for pulmonary tuberculosis. 1555 increased numbers of interstitial macrophages may also be observed in more than two-thirds of autopsies from adult patients, but the cause is unknown. one condition associated with this disorder is treatment with hydroxyethylstarch plasma expander. in this lesion, the interstitial macrophages stand out by virtue of their large size and multivacuolated cytoplasm, suggesting thesaurosis. there is no evidence of mucin glycoproteins, proteoglycans, starch, lipids, glycogen, or foreign body material. most patients have no clinical symptoms other than pruritus and persistent erythema. 1556 epididymitis nodosa is a proliferation of small irregular ducts whose epithelium lacks the characteristic features of the epididymal epithelium. the disorder is associated with inflammation and fibrosis, similar to vasitis nodosa. 1557 in several tissues, including the testis, amiodarone is concentrated up to 300 times its plasma level, 1558 causing testicular atrophy and increased serum levels of fsh and lh in some patients. 
1559 the incidence of epididymitis during amiodarone therapy varies from 3% to 11%, 1560, 1561 and more than 35 cases (in several cases involvement was bilateral) have been reported, although there are probably many others. 1562, 1563 the disorder may occur at any age. 1564 when amiodarone dosage is reduced to 300 mg/day the epididymitis heals within a few weeks. 1565 autopsy studies show focal areas of fibrosis and lymphoid cell infiltrates not related to infection. recognition is important to avoid unnecessary antibiotics or aggressive surgery. this term describes a lesion located in the epididymal head characterized by non-infectious necrosis with polypoid masses of inflamed granulation tissue in peripheral ductal structures. granulomas containing multinucleated giant cells are present within efferent ductuli or form sperm microgranulomas with ductal neoformation similar to that of epididymitis nodosa. the cause is unknown, but may result from ischemia. 1566 the terms 'testicular calculus' and 'stone in the testis' have been used to describe a lesion characterized by the presence of nodular testicular calcification that is not related to ischemia, orchitis, vasculitis, hematoma, or tumor. 1567, 1568 the testicular arteries may be affected by systemic disorders such as schönlein-henoch purpura, 1569 wegener's disease, 1570, 1571 cogan's disease, 1572 behçet's disease, 1573 relapsing polychondritis, rheumatoid arthritis, and dermatomyositis, but the most frequent involvement is with polyarteritis nodosa. 1574 approximately 80% of patients with polyarteritis nodosa have testicular or epididymal involvement, 1575 but only 2-18% are diagnosed during life. rarely, testicular or epididymal polyarteritis nodosa is the first manifestation of the disease. in these cases the symptoms may suggest orchitis, epididymitis, testicular torsion, or tumor. 
[1576] [1577] [1578] the testis usually shows arterial lesions in different stages of evolution, including fibrinoid necrosis, inflammatory reaction, thrombosis, or aneurysm. the parenchyma initially has zones of infarction (fig. 12-164). histologic and immunohistochemical findings similar to those of polyarteritis nodosa may occasionally be observed in the testis or the epididymis without lesions elsewhere; this condition is referred to as isolated arteritis of the testis and epididymis, 1579 and differs from classic polyarteritis by a lack of vascular thrombosis, aneurysm, or infarct. the etiology of isolated arteritis is unknown, but the prognosis is excellent. 1580 the histologic findings of necrotizing arteritis in the testis or epididymis should be followed by clinical, hematologic, and biochemical studies to exclude systemic arteritis. 1581, 1582 torsion of the spermatic cord is the most frequent cause of testicular infarct, followed by trauma, incarcerated inguinal hernia, epididymitis, and vasculitis. spermatic cord torsion is a surgical emergency. if repair is delayed more than 8 hours, testicular viability is usually compromised. this disorder may appear at any age, but the peaks of maximal incidence are the perinatal period and puberty. 1583 factors that predispose to testicular torsion are anatomical anomalies in testicular suspension and abnormal position of the testis. many men with testicular torsion have an abnormally high reflection of the tunica vaginalis, giving rise to the deformity known as 'bell-clapper.' other anomalies include elongated mesorchium, separation between the epididymis and testis, and absent or very elongated gubernaculum. the frequency of testicular torsion is higher in cryptorchid and retractile testes than in normal testes. there are two classic anatomic forms of testicular torsion: high (supravaginal or extravaginal) and low (intravaginal). each appears at a different age. 
extravaginal torsion typically occurs in infancy and childhood, whereas intravaginal torsion is more frequent at puberty and adulthood. neonatal torsion is bilateral in 12-21% of cases. 1584 most torsion observed on the first of life is intrauterine. 1585 pubertal and adult torsion causes testicular pain that may radiate to the abdomen or other sites. about 36% of patients have a previous history of pain or swelling in one or both testes. the differential diagnosis includes all causes of acute scrotum. 1586, 1587 torsion causes hemorrhagic infarction of the testis (fig. 12-165). in old neonatal torsion, the histological findings are so advanced that only collagenized tissue containing calcium and hemosiderin deposits is seen. in adults, three degrees of histological lesion may be distinguished. 1588 degree i (26.5% of adult twisted testes) is characterized by edema, vascular congestion, and focal hemorrhage. seminiferous tubules are dilated, with sloughed immature germ cells, apical vacuolation of sertoli cells, and dilated lymphatic vessels. 1589 degree ii (26.5% of testes) has pronounced interstitial hemorrhage and sloughing of all germ cell types in the seminiferous tubules. the lesion is more severe in the center of the testis, and thus biopsy might provide erroneous information (fig. 12-166). degree iii lesions (45% of testes) are characterized by necrosis of the seminiferous tubular cell layers. there is often a correlation between the time interval of torsion and the degree of the histologic lesion. 1590 degree i appears in torsion of less than 4 hours' duration, degree ii in torsion of between 4 and 8 hours, and degree iii in torsion of more than 12 hours. nevertheless, there are some exceptions that could probably be related, among other factors, to the number of twists in the torsed spermatic cord (degrees of testicular rotation). 
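the duration intervals quoted above can be written as a small lookup, shown here only as an illustrative sketch. the function name `expected_torsion_degree` is hypothetical, and the handling of the boundaries (exactly 4 or 8 hours) is an assumption. the text gives no degree for torsion lasting between 8 and 12 hours, so the sketch returns no value for that range rather than guessing. as noted above, the correlation has exceptions, so this is not a diagnostic rule.

```python
from typing import Optional


def expected_torsion_degree(hours: float) -> Optional[str]:
    """Return the histologic degree the text associates with a torsion duration.

    Intervals taken from the text: degree I under 4 h, degree II between
    4 and 8 h, degree III over 12 h. Boundary handling is an assumption.
    """
    if hours < 0:
        raise ValueError("duration must be non-negative")
    if hours < 4:
        return "I"
    if hours <= 8:
        return "II"
    if hours > 12:
        return "III"
    # 8-12 h: not covered by the intervals quoted in the text
    return None


if __name__ == "__main__":
    for h in (2, 6, 10, 24):
        print(h, expected_torsion_degree(h))
```

returning no value for the 8-12 hour range keeps the gap in the quoted intervals visible instead of hiding it behind an invented cut-off.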
the testicular salvage rate, defined as testicular growth and development that reflects the age of the patient and the contralateral testis, is around 50% in all cases of testicular torsion. 1591 testes that do not bleed into the albugineal incision within 10 minutes are assumed to be non-viable and should be removed. 1592 little attention has been paid to intermittent testicular torsion. early orchiopexy may save these testes, but after surgery, the testis becomes small and excessively mobile, and most have the bell-clapper deformity. 1593 seminiferous tubules are devoid of germ cells and have hyalinized walls. some adults with untreated testicular torsion develop lipomembranous fat necrosis of the spermatic cord. 1594 patients seek help for pain in the high scrotum. at this level, there is a small nodule that corresponds to remnants of the twisted testis. the epididymis and proximal spermatic cord characteristically contain fat necrosis (fig. 12-167). adults with prior spermatic cord torsion often consult for infertility. the mechanism causing spermiogram alteration is controversial, and three hypotheses have been proposed:
• autoimmune process. it has been suggested that the ischemic injury breaks the blood-testis barrier, and antigens released from the necrotic germ cells activate macrophages and lymphocytes in the interstitium, stimulating the formation of antibodies against these antigens. on entering the blood circulation, these antibodies may damage the contralateral testis. 1595
• alterations in microcirculation. after testicular torsion, blood flow decreases in the contralateral testis, causing an increase in the characteristic products of hypoxia, such as lactic acid and hypoxanthine. 1596 intense apoptosis involving mainly spermatocytes i and ii has been observed. 1597 long-term effects are as yet unknown.
• primary testicular lesions. 
many twisted testes have lesions that cannot be formed in a few hours, such as hypoplastic tubules, microlithiasis, and focal spermatogenesis. in addition, more than half of biopsies of the contralateral testis show marked spermatogenetic lesions. 1598 these findings suggest that torsion occurs in testes with congenital lesions. trauma 1599 and lesions of the vessels of the spermatic cord may also cause testicular infarct. ischemic atrophy is a risk of inguinal surgery, including herniorrhaphy, varicocelectomy, hydrocelectomy, and descent of cryptorchid testis (fig. 12-168). the incidence of atrophy after inguinal herniorrhaphy varies from 0.06% in primary herniorrhaphy 1600 to 7.9% after surgery for recurrent hernia, 1601 depending on the difficulty and extent of the hernia. atrophy occurs in some cases of thrombosis of the vena cava or spermatic artery. 1602 focal infarction of the testis is associated with polycythemia, sickle cell disease, trauma, 1603, 1604 and laparoscopic inguinal hernia repair. focal infarction may also be spontaneous. clinical symptoms of testicular infarct mimic testicular tumor. color doppler ultrasound reveals the diagnosis in most cases. 1605 cystic malformation of the tunica albuginea and testicular parenchyma was first described in the 19th century, 1606 and was long considered rare and mainly present in the tunica albuginea. 1607, 1608 with the systematic use of ultrasonography, the incidence of cysts has been found to be much higher: 1609 non-neoplastic cysts are found in 2.1% 1610 to 9.8% 1611 of testes. 1612, 1613 cyst of the tunica albuginea is usually an incidental finding in patients in the fifth or sixth decade of life. it is located in the anterolateral aspect of the testis and may be unilocular or multilocular, 1614 ranging from 2 to 4 mm and containing clear fluid without spermatozoa. 
the cyst may be embedded within the connective tissue of the tunica albuginea, protrude from the inner surface of the tunica albuginea into the testicular parenchyma, or protrude from the outer surface forming a blue lump in the tunica albuginea. the epithelium lining the cyst may be simple columnar or stratified cuboidal, and is supported by a thin layer of collagenized connective tissue. the columnar epithelium usually includes some ciliated cells, 1615 and the cuboidal epithelium is composed of two layers of non-ciliated cells (fig. 12-169). cyst of the rete testis is identified by a distinctive epithelial lining of areas of flattened cells intermingled with areas of tall columnar cells. spermatozoa are frequently found within the cyst, 1616 and hence the cyst is also called intratesticular spermatocele. 1617 it may be associated with cystic transformation of the rete testis and multiple epididymal cysts. rete testis cyst is not always attached to the rete and may be found at a distance. simple cyst of the testis constitutes the remaining intraparenchymal cyst. it is usually lined by cuboidal epithelium and contains no spermatozoa. 1618, 1619 simple cyst ranges from 2 mm to 18 mm in diameter. 1620, 1621 the disorder occurs at any age, from 5 months to 80 years, with a bimodal distribution with peaks at 8 months and 60 years. 1622 it may occur bilaterally, 1623 and may present as two cysts in the same testis. 1624 the origin of the three types of testicular cyst is uncertain. previously, traumatic 1625 and inflammatory 1626 origins were attributed to tunica albuginea cyst, but most now believe that they are derived from embryonal remnants of the mesonephric ducts 1615, 1627 or mesothelial cells embedded in the tunica albuginea during embryogenesis. 1614, 1628, 1629 simple cyst of the testis may also have a mesothelial origin, but it is possible that some arise from ectopic rete testis epithelium. 
these cysts are unrelated to epidermoid cyst, differing in the ultrasonographic 1630, 1631 and histologic features (see discussion on cystic dysplasia and testicular tumors in the section on hamartomatous testicular lesions). ultrasound studies indicate that testicular cyst has little potential for growth. 1630, 1632 currently, excision is recommended only in children when the cyst may impair testicular development. 1633 dysgenesis of the rete testis is characterized by inadequate maturation and persistence of infantile or pubertal characteristics in adults. 1634 this disorder is frequent in undescended adult testes. the lesion involves the rete testis segments referred to as septal, mediastinal, and extratesticular. there is poor development of the cavities and their epithelial lining, which becomes cuboidal or columnar instead of flattened with areas of columnar cells. the lumina of the rete testis cavities may be completely absent (simple hypoplasia) or, conversely, undergo microcystic dilation (cystic hypoplasia). in a few cases, the rete testis develops papillary, cribriform, or tubular formations (adenomatous hyperplasia). the epithelium of the rete testis is usually flattened, with scattered areas of columnar cells. in estrogen-treated patients, those with chronic hepatic insufficiency, functioning tumor that secretes estrogens or human chorionic gonadotropin, and other disorders that are described as hyperplasia of the rete testis, it may undergo diffuse transformation into tall columnar epithelium. except for the latter group, metaplasia of the rete testis seems to be an estrogen-dependent process, and estrogen receptors are present in the rete testis epithelium. 1635 acquired cystic transformation of the rete testis is common, and its incidence increases with age and associated disorders. 1636 ultrasound 1637, 1638 and magnetic resonance 1639 studies reveal characteristic images that may suggest malignancy. 
the lesion has three forms: simple, associated with epithelial metaplasia, and with crystalline deposits. simple cystic transformation consists of dilated cavities with normal epithelium. it results from obstruction of the epididymis or the initial portion of the vas deferens due to ischemia (aging men); compression by epididymal and spermatic cord tumor, or by congestive veins in varicocele; inflammation in patients with previous epididymitis; malformation (testis-epididymis dissociation, malformed epididymis and absence of the vas deferens); 1640 or iatrogenic causes (surgery for epididymectomy or removal of epididymal cyst) (fig. 12-170). 1641 cystic transformation with epithelial metaplasia is a frequent finding at autopsy. 1369 its development is probably due to the concurrence of sperm excretory duct obstruction and conditions involved in increased serum estrogen levels, such as chronic liver insufficiency. another possible cause is inflammation involving the rete testis. cystic transformation with crystalline deposits has also been called cystic transformation of the rete testis secondary to renal insufficiency. 1642 it is a bilateral lesion of adult testes characterized by the concurrence of three findings: cystic transformation of the rete testis, cuboidal or columnar metaplasia of its epithelium, and the presence of urate and oxalate crystalline deposits that may be recognized by polarized light. the lesion is pathognomonic of dialyzed patients with chronic renal insufficiency. crystalline deposits are initially formed beneath the epithelia of the rete testis and ductuli efferentes; later they protrude into the lumina, where they are finally released. inflammation is absent or slight, although a few giant cells and small fibrotic areas are often seen (figs 12-171, 12-172). 
this lesion is characterized by diffuse or nodular proliferation of tubular or papillary structures that are derived from the rete testis 1643 and are observed in cryptorchid or normally descended testes. cases have been reported in newborns, children, and adults. 1644 adenomatous hyperplasia in newborn and infantile testes consists of enlargement of the mediastinum testis by cordlike or tubular structures derived from the rete testis. the lesion may extend up to one-third of testicular volume. despite excessive development of the rete testis, the normal connections with seminiferous tubules and efferent ductuli remain. presentation may be unilateral or bilateral. unilateral presentation is associated with cryptorchidism or vanishing testis. bilateral cases may also present with bilateral renal dysplasia. efferent ductuli may show luminal dilation and irregular outlines. the etiopathogenesis might be similar to that of cystic dysplasia of the testis. 1644 adenomatous hyperplasia in adults is usually an incidental finding at autopsy, 1645 in cryptorchid testes, 1646 or in testes with germ cell tumor. the rete testis epithelium forms nonencapsulated nodular outgrowths or a diffuse pattern. nodule size may be large enough to suggest tumor. the epithelium consists of cuboidal cells with ovoid nuclei, deep nuclear folds, and peripheral nucleoli. atypia and mitotic figures are lacking (fig. 12-173). the ultrastructure and immunophenotype of the epithelium are similar to those of the normal rete testis. spermatozoa may be seen inside the cavities in some cases, suggesting that such a proliferation is connected with the seminiferous tubules. most of the testes show a certain degree of seminiferous tubular atrophy. in incidental autopsy cases the etiology is unknown, although it may be related to hormonal or chemical agent effects. 
[1647] [1648] [1649] in cryptorchid testes and with many testicular tumors, the most probable cause is a primary anomaly that is part of the testicular dysgenesis syndrome. 1650 adenomatous hyperplasia should be distinguished from three entities: rete testis pseudohyperplasia, which appears in atrophic testes; primary rete testis tumor; and metastasis of adenocarcinoma. in pseudohyperplasia, lesions are focal, microscopic, and usually located in the septal rete, although the mediastinal rete shows few or no alterations. benign rete testis tumors such as adenoma (solid and papillary variants) and cystadenoma are isolated and focal, 1651 whereas rete testis hyperplasia is diffuse. adenocarcinoma of the rete testis is a tumor that displays numerous mitotic figures and infiltrates adjacent structures. 1652 metastasis of prostatic adenocarcinoma may be excluded because these metastases alter the rete testis architecture and are immunoreactive for prostatic acid phosphatase and psa. this reactive lesion is characterized by the presence of intracytoplasmic accumulation of hyaline eosinophilic globules in the epithelial cells of the rete testis. the epithelium may be hyperplastic, but does not contain mitotic figures or nuclear atypia. the globules are up to 15 µm in diameter (fig. 12-174). this lesion is associated with tumor and inflammatory processes occurring near the mediastinum testis, and can be observed in association with 75% of mixed testicular germ cell tumors, 47% of seminomas, and 20% of non-germ cell testicular tumors, such as epididymal tumor that infiltrates the testis (adenomatoid tumor). 
fig. 12-171 changes in the rete testis associated with dialysis. dilation of the rete testis and initial portion of the ductuli efferentes can be observed. crystalline structures, mainly rhomboidal in shape, accumulate inside and outside the tubules. 
1653 yolk sac tumor infiltrating the rete testis may closely resemble this type of rete testis hyperplasia. positive immunoreactions for α-fetoprotein and placenta-like alkaline phosphatase, as well as nuclear atypia, are helpful to distinguish germ cell neoplasia from this rete testis hyperplasia. 1654 this lesion, described as nodular proliferation of calcifying connective tissue in the rete testis, is characterized by the presence of multiple nodules that originate from the rete testis lining and subjacent connective tissue, protruding into the channels of the rete testis. these consist of cellular connective tissue covered by several layers of a fibrin-like material, which in turn is covered by rete testis epithelium. the nodules may be totally or partially calcified (fig. 12-175). 1655 the lesion is an incidental finding at autopsy in patients with impaired peripheral perfusion. selective location of the lesion in the walls of the cavities and chordae rete testis is probably related to poor vascularization of these structures. the etiopathogenetic mechanism may be anoxia, necrosis, fibrin deposition, proliferation of connective tissue, or dystrophic calcification. the intracavitary growth of the lesion might be due to the lower intracavitary pressure and also to the stiff structure of the mediastinum testis. 
the molecular basis of male sexual differentiation homozygous deletions in wilms'tumours of a zinc-fi nger gene identifi ed by chromosome jumping isolation and characterization of a zinc fi nger polypeptide gene at the human chromosome 11 wilms' tumor locus wt-1 is required for early kidney development germline mutations in the wilms' tumor suppressor gene are associated with abnormal urogenital development in denys-drash syndrome donor splice-site mutations in wt1 are responsible for frasier syndrome expression of steroidogenic factor 1 and wilms' tumour 1 during early human gonadal development and sex determination a mutation in the gene encoding steroidogenic factor-1 causes xy sex reversal and adrenal failure in humans apparently normal ovarian differentiation in a prepubertal girl with transcriptionally inactive steroidogenic factor 1 (nr5a1/sf-1) and adrenal cortical insuffi ciency genetic control of gonadal differentiation, baillière's single-copy dna sequences specifi c for the human y chromosome sex determination and the y chromosome gonadal differentiation: a review of the physiological process and infl uencing factors based on recent experimental evidence the human sry protein is present in fetal and adult sertoli cells and germ cells molecular basis of mammalian sexual determination: activation of müllerian inhibiting substance gene expression by sry autosomal xx sex reversal caused by duplication of sox-9 campomelic dysplasia and autosomal sex reversal caused by mutations in an sry-related gene functional analysis of sox8 and sox9 during sex determination in the mouse autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the sryrelated gene sox9 similar gene structure of two sox9a genes and their expression patterns during gonadal differentiation in a teleost fi sh, rice fi eld eel (monopterus albus) sex determination: a tale of two sox genes the molecular genetics of human sex determination x-p duplications with and 
without sex reversal the role of folliclestimulating hormone in controlling sertoli cell proliferation in testes of fetal rats age-retated variation of folliclestimulating hormone stimulated camp production, protein kinase c activity and their interactions on the rat testis the embryology of testicular descent the gubernaculum during testicular descent in the pig fetus the gubernaculum during testicular descent in the human fetus hyperplasia and hypertrophy of the gubernaculum during testicular descent in the fetus incidence of cryptorchidism the role of a non-androgenic testicular factor in the process of testicular descent in the dog mechanism of testicular descent the gubernaculum testis hunteri, testicular descent and maldescent role of the gubernaculum and intraabdominal pressure in the process of testicular descent testicular descent and cryptorchidism: the state of the art in 2004 the insulin-3 gene: lack of a genetic basis for human cryptorchidism transabdominal testicular descent is disrupted in mice with deletion of insulinlike factor 3 receptor a novel circulating hormone of testis origin in humans testicular descent ii. ontogeny and response to denervation of calcitonin gene-related peptide receptors in neonatal rat gubernaculum prenatal androgen blockade with fl utamide inhibits masculinization of the genitofemoral nerve and testicular descent cremaster muscles obtained from boys with an undescended testis show signifi cant neurological changes maternal hormone levels in early gestation of cryptorchid males: a case-control study the endocrinology of testicular descent expression of estrogen receptor esr1 and its 46-kda variant in the gubernaculum testis expression of references children and adults spermatic and peripheral plasma concentration of testosterone and androstenedione in prepubertal boys pituitary testicular axis during puberal development development of leydig cells in the normal human testis. 
a cytological, cytochemical and quantitative study a quantitative morphology study of human leydig cells from birth to adulthood testicular volume during adolescence. cross-sectional and longitudinal studies testicular volumes of adolescent clinical measurement of the testes in boys and men testicular histology in fetuses with the prune belly syndrome and posterior urethral valves proliferation of sertoli cells during development of the human testis assessed by stereological methods hyperplasia and the immature appearance of sertoli cells in primary testicular disorders granular transformation of sertoli cells in testicular disorders a quantitative morphological study of human leydig cells from birth to adulthood congenital leydig cell hyperplasia morphologic anomalies in triploid liveborn fetuses objective measurement of testicular volume by ultrasonography: evaluation of the technique and comparison with orchidometer estimates ethnic differences. variation in human testis size testicular size: the effects of ageing, malnutrition, and illness testicular size: assesment and clinical importance studies of the participation of the tunica albuginea and rete testis (ta and rt) in the quantitative structure of human testis postnatal development and differentiation of contractile cells within the rabbit testis the tunica albuginea of the human testis is characterized by complex contraction and relaxation activities regulated by cyclic gmp histology of the normal testis a method for determining the relative total lenght of the tubules in the testis the fi ne structure of sertoli cells in the human testis some observations on the fi ne structure of the sertoli cell in the human testis on the morphology of the human sertoli cell organization and morphogenesis of the human seminiferous epithelium the mammalian spermatozoon ultrastructural observations on nucleoli and related structures during human spermatogenesis ultrastructure of the nucleus of human sertoli cells in 
clinical morphological and cytogenetic aspects true hermaphroditism: geographical distribution, clinical fi ndings, chromosomes and gonadal histology true hermaprhoditism: clinical features, genetic variants and gonadal histology hormonal and molecular genetic fundings in 46, xx subjects with sexual ambiguity and testicular differentiation familial 46xx males coexisting with familial 46,xx true hermaphrodites in same pedigree partially delected sry gene confi ned to testicular tissue in a 46,xx true hermaphrodite without sry in leukocytic dna pregnancy in a woman with a y chromosome after removal of an ovarian dysgerminoma preservation of gonadal function in true hermaphroditism das syndrom des pseudohermaphroditismus masculinus bei kongenitaler nebennierensiden-hyperplasie ohne androgenüberproduktion a novel compound heterozygous mutation in the steroidogenic acute regulatory protein gene in a patient with congenital lipoid adrenal hyperplasia molecular and structural analysis of two novel star mutation in patients with lipoid congenital adrenal hyperplasia a novel mutation l260p of the steroidogenic acute regulatory protein gene in three unrelated patients of swiss ancestry with congenital lipoid adrenal hyperplasia mutations in the steroidogenic acute regulatory protein (star) in six patients with congenital lipoid adrenal hyperplasia disorders of androgen synthesis -from cholesterol to dehydroepiandrosterone a genetic isolate of congenital lipoid adrenal hyperplasia with atypical clinical fi ndings phenotypic variations in lipoid congenital adrenal hyperplasia abnormal sex differentiation prenatal diagnosis of congenital lipoid adrenal hyperplasia targeted disruption of the mouse gene encoding steroidogenic acute regulatory protein provides insights into congenital lipoid adrenal hyperplasia an autopsy case of congenital lipoid hyperplasia of the adrenal cortex gonadal development and grown in 46,xx and 46,xy individuals with p450scc defi ciency (congenital 
lipoid adrenal hyperplasia) new developments in congenital lipoid adrenal hyperplasia and steroidogenic acute regulatory protein zur morphologie und genese der kongenitalen nebennierenrindenhyperplasie beim männlichen scheinzwitter the testicular lesion and sexual differentiation in congenital lipoid adrenal hyperplasia congenital adrenal lipoid hyperplasia due to defi cient cholesterol side-chain cleavage activity (20,22-desmolase) in a patient treated for 18 years congenital lipoid adrenal hyperplasia in a eight-year-old phenotype female gonadal histology with testicular carcinoma in situ in a 15-year old 46,xy female patient with a premature termination in the steroidogenic acute regulatory protein causing congenital lipoid adrenal hyperplasia unusual steroid pattern in congenital adrenal hyperplasia: defi ciency of 3β-hydroxydehydrogenase pubertal boy with the 3β-hydroxysteroid dehydrogenase defect characterization of two novel homozygous missense mutations involving codon 6 and 259 of type ii 3β-hydroxysteroid dehydrogenase (3βhsd) gene causing, respectively, nonsalt-wasting and salt-wasting 3βhsd defi ciency disorder disorders of sexual differentiation congenital adrenal hyperplasia due to point mutations in the type ii 3β-hydroxysteroid dehydrogenase gene molecular basis of congenital adrenal hyperplasia due to 3β-hydroxysteroid dehydrogenase defi ciency detection and functional characterization of the novel missense mutation y254d in type ii 3β-hydroxysteroid dehydrogenase (3βhsd) gene of a female patient with nonsalt-losing 3βhsd defi ciency nonsalt-losing congenital adrenal hyperplasia due to 3β-hydroxysteroid dehydrogenase defi ciency with normal glomerulosa function defects of the testosterone biosynthetic pathway in boys with hypospadias abnormalities of adrenal steroidogenesis in chilean boys with micropenis mutation of proline 409 to arginine in the meander region of cytochrome p450c17 causes severe 17 alpha-hydroxylase defi ciency localization of 
the human cyp17 gene (cytochrome p450 (17 alpha)) to 10q24.3 by fl uorescence 'in situ' hybridization and simultaneous chromosome banding a novel point mutation in p450c17 (cyp17) causing combined 17alpha-hydroxylase/17,20-lyase defi ciency towards a unifying mechanism for cyp17 mutations that cause isolated 17,20-lyase defi ciency 17 alpha-hydroxylation defi ciency in man endocrine studies in male pseudohermaphroditism in childhood and adolescence desmolase defi ciency male pseudohermaphroditism consistent with 17-20 desmolase defi ciency pitfalls in characterizing p450c17 mutations associated with isolated 17,20-lyase defi ciency severe 46,xy virilization defi cit due to 17β-hydroxysteroid dehydrogenase defi ciency 17 ketosteroid reductase defi ciency in an adult patient without gynecomastia but with female psychosexual orientation absent spermatogenesis despite early bilateral orchidopexy in 17-ketoreductase defi ciency substitution mutation c268y causes 17β-hydroxysteroid dehydrogenase 3 defi ciency novel insertion frameshift mutation of the lh receptor gene: problematic clinical distinction of leydig cell hypoplasia from enzyme defects primarily affecting testosterone biosynthesis leydig cell hypoplasia: a cause of male pseudohermaphroditism leydig cell hypoplasia causing male pseudohermaphroditism: diagnosis 13 year after prepubertal castration leydig cell hypofunction resulting in male pseudohermaphroditism a case of male pseudohemaphroditism associated with elevated lh, normal fsh and low testosterone possibly due to the secretion of an abnormal lh molecula inherited male pseudohermaphroditism due to gonadotropin irresponsiveness a clinico-genetic investigation of leydig cell hypoplasia naturally occurring mutations of the luteinizing hormone receptor gene affecting reproduction leydig cell hypoplasia due to inactivation of luteinizing hormone receptor by a novel homozygous nonsense truncation mutation in the seventh transmembrane domain biological effect 
of a novel mutation in the third leucine-rich repeat of human luteinizing hormone references receptor familial incomplete male pseudohermaphroditism, type i. evidence for androgen resistance and variable clinical manifestations in a family with the reifenstein syndrome quantitative receptor defects in families with androgen resistance; failure of stabilization of the fi broblast cytosol androgen receptor leydig cell hypoplasia determining familial hypergonadotropic hypogonadism male pseudohermaphroditism resulting from leydig cell hypoplasia the syndrome of testicular feminization in male pseudohermaphrodites the molecular bases of androgen insensitivity molecular basis of androgen insensitivity the androgen insensitivity syndrome (testicular feminization). a clnicopathologic study of 43 cases a case of complete testicular feminization and 47 xxy karyotype morphometry and histology of gonads from twelve children and adolescents with the androgen insensitivity (testicular feminization) syndrome testicular carcinoma in situ in children with the androgen insensitivity (testicular feminization) syndrome síndrome de feminización testicular completa inmunohistochemical and ultrastructural study of sertoli cells in androgen insensitivity androgen receptor gene mutation associated with complete androgen insensitivity syndrome and sertoli cell adenoma residual activity of mutant androgen receptors explains wolffi an duct development in the complete androgen insensitivity syndrome bilateral testicular tumors in androgen insensitivity syndrome puberty in subjects with complete androgen insensitivity syndrome familial male pseudohermaphroditism with labial testes and partial feminization: endocrine studies and genetic aspects altered hypothalamic-pituitarytesticular function in incomplete testicular feminization syndrome etude d'un cas familial d'androgynoïdisme avec hypospadias grave, gynécomastie et hyperoestrogénie hereditary familial hypogonadism familial gynecomastia 
androgens and fertility variable androgen receptor levels in infertile men frecuency of androgen insensitivity in infertile phenotypically normal men analysis of the transactivation domain of the androgen receptor in patients with male infertility evidence for partial deletion in the androgen receptor gene in a phenotypic male with azoospermia preserved male fertility despite decreased androgen sensitivity caused by a mutation in the ligand-binding domain of the androgen receptor gene progressive proximal spinal and bulbar muscular atrophy of late onset. a sex-linked recessive trait founder effect in spinal and bulbar muscular atrophy (sbma) x-linked recessive bulbospinal neuronopathy: a clinicopathological study x-linked recessive bulbospinal neuronopathy: a report of ten cases x-linked spinal and bulbar muscular atrophy of late onset. a separate type of motor neuron disease? a family with adult spinal and bulbar muscular atrophy, x-linked inheritance and associated testicular failure androgen receptor gene mutations in x-linked spinal and bulbar muscular atrophy meiotic stability and genotypephenotype correlation of the tricucleotide repeat in x-linked spinal and bulbar muscular atrophy the length and location of cag trinucleotide repeats in the androgen receptor n-terminal domain affect transactivation function abnormal androgen receptor binding affi nity in subjects with kennedy's disease (spinal and bulbar muscular atrophy) molecular pathology of the androgen receptor in male (in)fertility steroid 5 alpha-reductase defi ciency in man. 
an inherited form of male pseudohermaphroditism identifi cation of missense mutations in the srd5a2 gene from patients with steroid 5alpha-reductase 2 defi ciency deletion of steroid 5 alpha reductase 2 gene in male pseudohermaphroditism steroid 5 alpha-reductase 2 defi ciency molecular study of the 5 alpha-reductase type 2 gene in three european families with 5 alphareductase defi ciency 5alpha-reductase 2 gene mutations in three unrelated patients of greek cypriot origin: identifi cation of an ancestral founder effect a novel frameshift mutation in the 5alpha-reductase type 2 gene in korean sisters with male pseudohermaphroditism testosterone and epitestosterone metabolism of single hairs in 5 patients with 5 alpha-reductasedefi ciency male pseudohermaphroditism due to 5 alpha-reductase defi ciency male pseudohermaphroditism due to steroid 5 alpha-reductase defi ciency persistence of müllerian ducts in male pseudohermaphroditism, and its relationship to cryptorchidism anti-müllerian hormone and intersex states anti-müllerian hormone, the jost factor the persistent müllerian duct syndrome: a rare cause of cryptorchidism hermaphroditism with atypical or 'mixed' gonadal dysgenesis. relationship to gonadal neoplasm mixed gonadal dysgenesis penoscrotal hypospadias and coarctation of the aorta with mixed gonadal dysgenesis molecular analysis of sry gene in patients with mixed gonadal dysgenesis dysgenesis of testicular and streak gonads in the syndrome of mixed gonadal dysgenesis. 
perspective derived from a clinicopathologic analysis of twentyone cases clinical and pathologic spectrum of 46,xy gonadal dysgenesis: its relevance to the understanding of sex differentiation dysgenetic male pseudohermaphroditism morphometry and histology of gonads from 13 children with dysgenetic male pseudohermaphroditism testicular pathology in 46,xy dysgenetic male pseudohermaphroditism: an approach to pathogenesis of testis cancer hernia uteri inguinalis beim manne persistence of müllerian derivatives in males persistent müllerian duct syndrome. review and report of 3 cases familial persistent müllerian duct syndrome persistent müllerian duct syndrome in a man with transverse testicular ectopia persistent müllerian structures in infertile male male pseudohermaphroditism with persistent müllerian and wolffi an structures complicated by intraabdominal seminoma persistence of müllerian derivatives in males familial persistent müllerian duct syndrome variants of the anti-müllerian hormone gene in a compound heterozygote with the persistent müllerian duct syndrome and his family diffuse intratubular undifferentiated germ cell tumor in both testes of a male subject with a uterus and ipsilateral testicular dysgenesis testicular tumor in patient with persistent müllerian duct syndrome a case of bilateral seminoma in the setting of persistent müllerian duct syndrome rsh/smith-lemli-opitz syndrome: mutations and metabolic morphogenesis atypical case of smith-lemli-opitz syndrome: implications for diagnosis genetic disorders of cholesterol biosynthesis in mice and humans frequency gradients of dhcr7 mutations in patients with smith-lemli-opitz syndrome in europe: evidence for different origins of common mutations prenatal death in smith-lemli-opitz/rsh syndrome antenatal manifestations of smith-lemli-opitz (rsh) syndrome: a retrospective survey of 30 cases levels of unconjugated estriol and other maternal serum markers in pregnancies with smith-lemli-opitz (rsh) 
syndrome fetuses syndromal (and nonsyndromal) forms of male pseudohermaphroditism mutations in the delta7-steroid reductase gene in patients with the smith-lemli-opitz syndrome molecular cloning and expression of the human delta7-sterol reductase mutations in the human sterol delta7 reductase gene at 11q12-13 cause smith-lemli-opitz syndrome smith-lemli-opitz syndrome is caused by mutations in the 7-dehydrocholesterol reductase gene opitz syndrome: a multiple congenital anomaly/mental retardation syndrome due to an inborn error of cholesterol biosynthesis mutation analysis and description of sixteen rsh/smith-lemli-opitz syndrome patients: polymerase chain reaction-based assays to simplify genotyping recognition of smith-lemli-opitz syndrome (rsh) in the fetus: utility of ultrasonography and biochemical analysis in pregnancies with low maternal serum estriol a syndrome of pseudohermaphroditism, wilms' tumor, hypertension and degenerative renal disease association between wilms' tumor and gonadal dysgenesis glomerulonephritis associated with male pseudohermaphroditism and nephroblastoma twenty-four new cases of wt1 germline mutations and review of the literature: genotype/phenotype correlations for wilms' tumor development nephrogenic rests in wilms' tumor patients with the drash syndrome frasier syndrome, part of the denys drash continuum or simply a wt1 gene associated disorder of intersex and nephropathy? 
frasier syndrome: a rare syndrome with wt1 gene mutation in pediatric urology mutation in the pax6 gene in twenty patients with aniridia wagr syndrome: a clinical review of 54 cases the g syndrome of multiple congenital anomalies the bbb syndrome: familial telecanthus with associated congenital anomalies mig12, a novel opitz syndrome gene product partner, is expressed in the embryonic ventral midline and co-operates with mid1 to bundle and stabilize microtubules xlinked opitz g/bbb syndrome: identifi cation of a novel mutation and prenatal diagnosis in a korean family genetic study of sox9 in a case of campomelic dysplasia molecularclinical spectrum of the atr-x syndrome a new detection method for atrx gene mutations using a mismatch-specifi c endonuclease prenatal diagnosis of atr-x syndrome in a fetus with a new g>t splicing mutation in the xnp/atr-x gene testicular biopsy, its value in male sterility testicular biopsy. further studies in male infertility interpretation of testicular biopsy testicular biopsy in the evolution of male infertility testicular causes of infertility pretesticular causes of infertility a quantitative approach to the classifi cation of hypospermatogenesis in testicular biopsies for infertility testicular biopsy for infertility: a review of sixty-eight cases with a simplifi ed histologic classifi cation of lesions testicular biopsy in azoospermia: a review of the last ten years. experience of over 800 cases posttesticular causes of infertility histopathology and ultrastructure of meiotic arrest in human spermatogenesis refl ections on testicular biopsy testicular biopsy in evaluation of male infertility testicular biopsy in the study of male infertility. 
its current usefulness, histologic techniques, and prospects for the future testicular biopsy of azoospermic men with vas deferens malformation using two different techniques ultrastructural studies on testicular biopsies from eighteen cases of hypospermatogenesis serum and seminal gonadotropins in normal and infertile men: correlations with sperm count, prolactinemia, and seminal prolactin the relationship of biopsy evaluations and testicular measurements to overall daily sperm production in human testes the relationship between germinal cells and serum fsh levels in males with infertility inhibin b as a serum marker of spermatogenesis: correlation to differences in sperm concentration and follicle-stimulating hormone levels danish men inhibin b is a better marker of spermatogenesis than other hormones in the evaluation of male factor infertility the correlation between sperm count and testicular biopsy using a new scoring system testicular biopsy score count -a method for registration of spermatogenesis in human testes: normal values and results in 335 hypogonadal males statistical study of a semiquantitative evaluation of testicular biopsies a method for quantitative analysis of human seminiferous epithelium quantitation of the cells of the seminiferous epithelium of human testis employing sertoli cells as a constant quantifi cation of human seminiferous epithelium. iii histological studies in 44 infertile men with normal chromosome complements quantifi cation of human seminiferous epithelium. 4. histological studies in 17 men with numerical and structural autosomal aberrations quantifi cation of human seminiferous epithelium. 
i histological studies in twenty-one fertile men with normal chromosome complements quantitative analysis of the seminiferous epithelium in human testicular biopsies, and the relation of spermatogenesis to sperm density quantifi cation of the human sertoli cell population: its distribution, relation to germ cell numbers, and age related decline quantitation of leydig cells in testicular biopsies of oligospermic men with varicocele a method for the quantifi cation of leydig cells in man seminiferous tubule hypercurvature: a newly recognized common syndrome of human male infertility branching of seminiferous tubules associated with hypofertility and chronic respiratory infection morphological and histometric study on the human sertoli cells from birth to the onset of puberty sertoli cell types in sertoli-cell-only syndrome: relationships between sertoli cell morphology and aetiology pathophysiological observations of sertoli cells in patients with germinal aplasia or severe germ cell depletion. 
ultrastructural fi ndings and hormone levels morphological evidence for two types of idiopathic 'sertoli-cell-only' syndrome enzyme histochemical studies on the pathological changes in human sertoli cells behaviour of glycogen and related enzymes in the sertoli cell syndrome hypoplasia túbulo intersticial difusa (hipogonadismo hipogonadotrópico) leydig cell differentiation induced by stimulation with hcg and hmg in two patients affected with hypogonadotropic hypogonadism the fi ne structure of the immature human testis in hypogonadotrophic hypogonadism hyperplasia and the immature appearance of sertoli cells in primary testicular disorders immunohistochemical detection of immature sertoli cell markers in testicular tissue of infertile adult men: a preliminary study maturation phenotype of sertoli cells in testicular biopsies of azoospermic men the human blood-testis barrier in impaired spermatogenesis reversion of the differentiated phenotype and maturation block in sertoli cells in pathological human testis leydig cell types in primary testicular disorders ectopic testis and the undescended testis: a histological comparison short arm dicentric y chromosome in a sterile man: a case report intermediate fi laments in sertoli cells urinary gonadotropins in the sertoli-cell-only syndrome impaired leydig cell function in vitro in testicular tissue from human males with 'sertoli cell only' syndrome testicular fsh and hcg receptors in sertoli-cell-only syndrome syndrome produced by absence of the germinal epithelium without impairment of the sertoli or leydig cells azfa deletions in sertoli cell-only syndrome: a retrospective study sertoli cell only syndrome in 1982 return of spermatogenesis after stopping cyclophosphamide therapy evidence for testicular impairment after long-term treatment with a luteinizing hormone-releasing hormone agonist in elderly men effects of estrogens on the testis of transsexuals: a pathological and immunocytochemical study response of the 
human testis to long-term estrogen treatment: morphology of sertoli cells, leydig cells and spermatogonial stem cells depressed testosterone release from testicular tissue in vitro after withdrawal of oestrogen treatment in patients with prostatic carcinoma tubular hyalinization in human testes the peritubular myofi broblasts in the testes from normal men and men with klinefelter's syndrome. a quantitative, ultrastructural, and immunohistochemical study laminin, type iv collagen, and fi bronectin in normal references and cryptorchid human testes. an immunohistochemical study testicular atrophy as a sequela of inguinal herniorrhaphy testis epididymis, and spermatic cord in elderly men. correlation of angiographic and histologic studies with systemic arteriosclerosis primary testicular lesions in the twisted testis infl ammation and infestation of the testis and paratesticular structures quantitative testicular biopsy in congenital and acquired genital obstruction the peritubular myoid cells in the testes from men with varicocele. 
an ultrastructural, immunohistochemical and quantitative study striated pattern of the testicle on ultrasound: an appearance of testicular fi brosis clinical importance of a unilateral striated pattern seen on sonography of the testicle suppressive effects of 1,2-dibromo-3-3-chloropropane on human spermatogenesis signifi cance of testicular exfoliation in male infecundity laboratory manual for the examination of human semen and semen-cervical mucus interaction zytologische klassifi kation in reifer keimzellen im luftgetrockneten ejakulatanstrich beim spermatologische syndrom der vermehrten desquamation von zellen der spermatogenese (vdzs) transformed spermatocytes constituting the ejaculate of an infertile man studies on the differentiation of round cells in the human ejaculate diagnostic value of differential quantifi cation of spermatids in obstructive azoospermia seminal cytology in the presence of varicocele the pathophysiology of varicocele in male infertility obstruction of the tubuli recti and ductuli efferentes by dilated veins in the testes of men with varicocele and its possible role in causing atrophy of the seminiferous tubules scanning electron microscopic study on the shape of infertile seminiferous tubules: a hypothesis of pathogenesis of idiopathic male infertility the histopathology of acute mumps orchitis an azoospermic man with a double-strand dna break-processing defi ciency in the spermatocyte nuclei: case report association of three isoforms of the meiotic boule gene with spermatogenic failure in infertile men abnormal germ cell exfoliation in semen of hypogonadotrophic patients during a hcg treatment altered luteinizing hormone pulsatility in infertile patients with idiopathic oligoasthenozoospermia differential effect of luteinizing hormone-releasing hormone infusion on testicular steroids in normal men and patients with idiopathic oligospermia molecular heterogeneity of serum follicle-stimulating hormone in hypogonadal patients before and 
during androgen replacement therapy and in normal men men homozygous for an inactivating mutation of the folliclestimulating-hormone (fsh) receptor gene present variable suppression of spermatogenesis and fertility klassifi zierung tubulärer hodenatrophien bei sterilitätabklärungen testicular ultrastructure in infertile men ultrastructure of the nucleus of human sertoli cells in normal and pathological testes evaluation of the ultrastructural changes in the human sertoli cell in testicular disorders and the relationship of the changes to the levels of serum fsh abnormality of testicular fsh receptors in infertile men signifi cance of inhibin in reproductive pathophysiology and current clinical applications placebo-controlled trial of high-dose mesterolone treatment of idiopathic male infertility impaired leydig cell function in infertile men: a study of 357 idiopathic infertile men and 318 proven fertile controls leydig and sertoli cell function in normal and oligospermic males: a preliminary report leydig cell function in infertile men with idiopathic oligospermic infertility androgen insensitivity as a cause of infertility in otherwise normal men the frequency of androgen receptor defi ciency in infertile men malignant sex cord stromal tumor in a patient with the androgen insensitivity syndrome a new point mutation of the androgen receptor gene in a patient with partial androgen resistance and severe oligozoospermia the use of clomiphene citrate in the treatment of azoospermia secondary to incomplete androgen resistance evidence for a partial deletion in the androgen receptor gene in a phenotypic male with azoospermia androgen insensitivity and male infertility frequency of androgen insensitivity infertile phenotypically normal men improvement of spermatogenesis after treatment with the antiestrogen tamoxifen in a man with the incomplete androgen insensitivity syndrome a controlled comparison of the effi cacy of clomiphene citrate in male infertility pregnancy 
substrate induced elevations in the acetylation state of carnitine and coenzyme a in bovine and monkey spermatozoa in vitro temperature sensitivity of dna, rna, and protein synthesis throughout puberty in human testis traumatic dislocation of testes and bladder rupture report of 2 new cases and review of the literature traumatic dislocation of testis traumatic testicular dislocation a review of 36 cases restoration of spermatogenesis by orchiopexy 13 years after bilateral traumatic testicular dislocation sexuality and fertility in long-term survivors of testicular cancer fertility and pregnancy after treatment for cancer during childhood or adolescence male gonadal dose in adjuvant 3-d-pelvic irradiation after anterior resection of rectal cancer. infl uence to fertility recovery from aspermia induced by low-dose radiation in seminoma patients aspermia following lower truncal irradiation in hodgkin's disease testicular function in eight patients with seminoma after unilateral orchidectomy and radiotherapy paternity after irradiation for testicular cancer gonadal function in patients with differentiated thyroid cancer treated with (131)i effects of electromagnetic radiation from a cellular phone on human sperm motility: an in vitro study development of the seminiferous epithelium during and after treatment for acute lymphoblastic leukemia in childhood gonadal effects of cancer therapy in boys the effects of different cumulative doses of chemotherapy on testicular function impact of cyclophosphamide on long-term reduction in sperm count in men treated with combination chemotherapy for ewing and soft tissue sarcomas sertoli cell inactivation by cytotoxic damage to the human testis after cancer chemotherapy leydig cell dysfunction and gynaecomastia in adult males treated with alkylating agents testicular and sperm dna damage after treatment with fl udarabine for chronic lymphocytic leukaemia 1473. van den berg h, furstner f, van den bos c, behrendt h. 
decreasing the number of mopp courses reduces gonadal damage in survivors of childhood hodgkin disease effects of the chemotherapy cocktail used to treat testicular cancer on sperm chromatin integrity sperm integrity pre-and postchemotherapy in men with testicular germ cell cancer testicular dysfunction in hodgkin's disease before and after treatment posttreatment fertility in patients with testicular cancer. part 1 sexual function after bilateral retroperitoneal lymph node dissection for nonseminomatous testicular cancer sexual dysfunction and electroejaculation in men with spinal cord injury: review infertilty in the spinal cord injuried male reproductive biology of paraplegics: results of semen collection, testicular biopsy and serum hormone evaluation auto-immunity to spermatozoa and quality of semen in men with spinal cord injury histological and hormonal testicular changes in spinal cord patients male fertility and sexual function after spinal cord injury current trends in the treatment of infertility in men with spinal cord injury spermatogenese bei patienten mit traumatischer querschnittähmung deep scrotal temperature and the effect on it of clothing, air, temperature, activity, posture and paraplegia neurological correlations of ejaculation and testicular size in men with a complete spinal cord section leukocytes in semen from men with spinal cord injuries leukocyte subtypes in electroejaculates of spinal cord injured men infl ammatory cytokine concentrations are elevated in seminal plasma of men with spinal cord injuries testis biopsy fi ndings in the spinal cord injured patient seminal vesicle aspiration in spinal cord injured men: insight into poor sperm quality sperm motility from the vas deferens of spinal cord injured men is higher than from the ejaculate brown-colored semen in men with spinal cord injury mumps orchitis: report of a mini-epidemic mumps orchitis mumps -an underestimated disease mumps orchitis among soldiers: frequency, effect on sperm 
quality, and sperm antibodies mumps orchitis: symptoms and treatment possibilities pathology of mumps orchitis successful testicular sperm extraction and fertilization in an azoospermic man with postpubertal mumps orchitis infl ammation of the testis, epididymis, peritesticular membranes and scrotum orchitis and testicular abscess formation caused by non-typhoid salmonellosis disseminated actinomycosis presenting as a testicular mass. a case report primary testicular actinomycosis mimicking metastatic tumor xanthogranulomatous funiculitis and orchiepididymitis: report of 2 cases with immunohistochemical study and literature review tuberculous epididymitis occurring 35 years after renal tuberculosis tuberculous epididymitis as a cause of testicular pseudomalignancy in two young children tuberculous epididymo-orchitis. diagnosis by fi ne needle aspiration tuberculous epididymo-orchitis. a case report epididymo-orchitis caused by intravesically instillated bacillus calmette-guérin: genetically proven using a multiplex polymerase chain reaction method gumma of testis lepromatous leprosy presenting as orchitis clinicopathological study of testicular involvement in leprosy genital manifestations of tropical diseases human brucellosis in northern saudi arabia epididymoorchitis due to brucellosis in central anatolia brucellar orchiepididymitis in acute brucellosis brucella orchitis: a rare cause of testicular enlargement orquioepididimitis brucelosa: a propósito de un caso rapid diagnosis of brucella epididymo-orchitis by realtime polymerase chain reaction assay in urine samples epididymal sarcoidosis sarcoidosis with bilateral epididymal and testicular lesions sarcoidosis presenting as a testicular mass sarcoid of the testis genitourinary involvement of systemic sarcoidosis confi ned to testicle seminoma and sarcoidosis: an unusual association intermittent azoospermia associated with epididymal sarcoidosis ueber einschlüsse in blastumoren a case of malacoplakia of the 
epididymis associated with trauma epididymo-testicular malacoplakiaa case report an unusual case of malakoplakia involving the testis and prostate testicular malacoplakia septate junctions between digestive vacuoles in human malakoplakia genitourinary tract involvement with systemic mycosis coccidioidomycosis of the epididymis and testis cryptococcal epididymoorchitis complicating steroid therapy for releasing polychondritis schistosomal epididymitis alveolar echinococcosis with involvement of the ureter and testes idiopathic granulomatous orchitis: pathologic study of one case experimental allergic orchitis in mice. histopathological and immunological studies immunological phenomena observed in the testis and their possible role in infertility immunocompetent cells in human testis in health and disease sympathetic autoimmune orchitis testicular obstruction: clinicopathological studies hypoechoic lesion found on testicular ultrasound after testicular piercing focal orchitis in undescended testes: discussion of pathogenetic mechanisms of tubular atrophy pseudolymphoma of the testis plasma cell granuloma of the testis: unusual localization testicular plasmacytoma: report of a case and review of the literature sinus histiocytosis with massive lymphadenopathy (rosai-dorfman disease): report of a patient with isolated renotesticular involvement after cure of non-hodgkin's lymphoma rosai-dorfman disease of the testis: an unusual entity that mimics testicular malignancy persistent erythema and pruritus, with a confl uent histiocytic skin infi ltrate, following the use of a hydroxyethylstarch plasma expander epididymitis nodosa. 
an epididymal lesion analogous to vasitis nodosa amiodarone in testis and semen testicular dysfunction with amiodarone use non-infectious epididymitis associated with amiodarone therapy amiodaroneinduced epididymitis: report of a new case and literature review of 12 cases amiodarone-induced sterile epididymitis recurrent bilateral amiodarone induced epididymitis amiodarone induced epididymitis in children amiodarone-induced epididymitis granulomatous epididymal lesion of possible ischemic origin a stone in the testicle testicular calculus orchitis mimicking testicular torsion in henoch-schönlein's purpura wegener's granulomatosis involving the urogenital tract testicular infarction in a 12-year-old boy with wegener's granulomatosis cogan's syndrome: 18 cases and a review of the literature recurrent epididymo-orchitis in patients with behçet's disease isolated polyarteritis nodosa of the male reproductive system testicular mass as presenting symptom of isolated polyarteritis nodosa testicular lesions of periarteritis nodosa, with special reference to diagnosis polyarteritis nodosa presenting as acute orchitis: a case report and review of the literature isolated epididymal vasculitis limited polyarteritis nodosa of the male and female reproductive systems: diagnostic and therapeutic approach polyarteritis nodosa masquerading as a primary testicular neoplasm. a case report and review of the literature testicular vasculitis: implication for systemic disease torsión del cordón espermático. a propósito de 87 nuevos casos. 
revisión de la literatura controversies of perinatal torsion of the spermatic cord: a review, survey and recommendations prenatal testicular torsion: ultrasonographic features, management and histopathological fi ndings torsion of the testis testicular torsion in a 68-year-old man testicular torsion: simple grading for histological evaluation of tissues damage primary testicular lesions in the twisted testis torsión testicular antes de 6 horas late postoperative results in males treated for testicular torsion during childhood testicular tissue bleeding as an indicator of gonadal salvageability in testicular torsion surgery intermittent testicular torsion lipomembranous fat necrosis in three cases of testicular torsion immunologic aspects of testicular torsion: detection of antisperm antibodies in contralateral testicle ipsilateral and contralateral testicular biochemical acute changes after unilateral testicular torsion and detorsion increased apoptosis in the contralateral testes of patients with testicular torsion as a factor for infertility unilateral testicular torsion: abnormal histological fi ndings in the contralateral testis. cause or effect? testicular injury. late results of semen analyses after orchiectomy testicular atrophy as a sequela of inguinal hernioplasty cooper's ligament repair for adult growing hernias spontaneous thrombosis of left spermatic vein: report of 2 cases postsurgical focal testicular infarct localised infarction of the testis acute segmental testicular infarction: differentiation from tumour using high frequency colour doppler ultrasound observations on the structure and diseases of the testis cysts of the testicle tunica albuginea cyst: rare testicular mass high-resolution sonography of scrotal contents in asymptomatic subjects the high incidence of benign testicular tumors testicular cysts. 
us fi ndings benign intrascrotal lesions sonography of benign intrascrotal lesions cysts of the testicular parenchyma and tunica albuginea efferent ductule cyst of tunica albuginea simple cyst of the rete testis intratesticular spermatocele simple cyst of the testis: case report and review of literature congenital simple cysts of the testis: a hitherto undescribed lesion intratesticular cysts testicular cysts: differentiation with us and clinical fi ndings simple testicular cyst: a rare cause of scrotal swelling in infancy bilateral intratesticular cysts. a specifi c entity cysts of the tunica albuginea (cysts of the testis) cysts of the tunica albuginea testis nonneoplastic cystic lesions of the tunica albuginea: an electron microscopic and clinical study of 2 cases cysts of the tunica albuginea: report of 4 cases and review of the literature cysts of the tunica albuginea: a report of 3 cases with a review of the literature high resolution ultrasonography in the diagnosis of simple intratesticular cysts simple testicular cyst diagnosed preoperatively by ultrasound surveillance strategy for intratesticular cysts: preliminary report simple cysts of the testis in children: preoperative diagnosis by ultrasound and excision with testicular preservation rete testis dysgenesis. a characteristic lesion of undescended testes estrogen receptor alpha has a functional role in the mouse rete testis and efferent ductules cystic transformation of the rete testis tubular ectasia of the rete testis: an ultrasound diagnosis cystic ectasia of the rete testis pronounced cystic transformation of the rete testis. mri appearance the diversity of reproductive tract abnormalities in males with cystic fi brosis postsurgical changes in the testis: a diagnostic dilemma acquired cystic transformation of the rete testis secondary to renal failure adenomatous hyperplasia of the rete testis adenomatous hyperplasia of the rete testis. 
a review and report of new cases displasia quística del testículo: anomalia en la diferenciación del parénquima testicular por probable fallo en la conexión entre los conductos de origen mesonéfrico y los cordones testiculares adenomatous hyperplasia of the references rete testis: report of two cases glandular changes in the rete testis: metastatic tumour or adenomatous hyperplasia? [letter tumors and cysts of the paratesticular region adenomatous hyperplasia of the rete testis. a clinicopathologic study of nine cases primary testicular lesions are associated with testicular germ cell tumors of adult men adénome du rete testis adenocarcinoma of the rete testis: case report, ultrastructural observations, and clinicopathologic correlates infarcted adenomatoid tumor: a report of fi ve cases of a facet of a benign neoplasm that may cause diagnostic diffi culty rete testis hyperplasia with hyaline globule formation. a lesion simulating yolk sac tumor nodular proliferation of calcifying connective tissue in the rete testis: a study of three cases key: cord-005147-mvoq9vln authors: nan title: autorenregister date: 2017-02-23 journal: med genet doi: 10.1007/s11825-017-0126-6 sha: doc_id: 5147 cord_uid: mvoq9vln nan complex mechanisms of dosage compensation regulate the mammalian x chromosome due to the presence of one copy in males (xy) and two in females (xx). x inactivation silences one x chromosome in females in early development, leading to specific epigenetic and structural changes. the inactive x chromosome becomes condensed and forms a bipartite structure within the nucleus, as we have shown by chromatin conformation analyses. specific long non-coding rnas are implicated in the formation of this unique structure. the inactive x chromosome is preferentially located near the lamina or the nucleolus. genes that escape x inactivation tend to be located at the periphery of the condensed inactive x chromosome. 
such genes are more highly expressed in females, and are thus associated with sex-specific differences manifested even in early development. we have found that significant sex biases in gene expression are associated with escape from x inactivation in human tissues from normal males and females, and in tissues from individuals with sex chromosome aneuploidy, including turner or klinefelter individuals. institute for genomic medicine, columbia university medical center, new york, usa a central challenge in human disease genetics is the identification of pathogenic mutations. one key approach to distinguishing benign and pathogenic mutations is to use population genetic data to identify regions of the human genome under purifying selection. here i describe how the residual variation intolerance scoring framework has been applied to identifying pathogenic mutations in and outside protein-encoding regions of the genome. next i report how these and related approaches are being used to identify pathogenic mutations in large-scale studies in epilepsy and other neurodevelopmental diseases. finally, i discuss how the identification of genetic causes of disease can inform treatment choices. scents indicate things, make promises, attract attention and stimulate imagination, feed anxieties and hopes: they are the salt in the atmospheric soup. we regard seeing and hearing as more important sensory functions, because they contribute more to conscious, cognitive processes of perception, but at moments of the greatest enjoyment we close our eyes and taste the scent, smell the taste. before the spirit and beauty of a person can fascinate us, our nose must become infatuated. the olfactory system in the nose acts as a window, monitoring environmental chemical information and converting chemical stimuli into electrical nerve impulses, which are conducted along the olfactory sensory neurons to their glomerular target in the brain. 
activation of olfactory receptors (ors) triggers the well-characterized (camp-based) transduction pathway for odorant perception. in 1991 buck and axel discovered the olfactory gene family, the largest gene family in the human genome, and postulated exclusive expression in the olfactory epithelium. however, recent whole genome sequencing data from our and other labs show that ors are present in every tissue of the human body that has been analyzed by next generation sequencing. the importance of such ectopic expression of ors has grown since the physiological function of some ors was characterized. when identifying additional expression profiles and functions of ors in non-olfactory tissue, there are limitations posed by the deorphanization of ors concerning the activating ligands and by the small number of antibodies available. in contrast to the olfactory sensory neurons, which are believed to express all 350 functional or genes (only one or type per cell), cells in non-olfactory tissues tend to express more than one individual or gene per cell. in addition, some of the signaling pathways in non-olfactory tissues seem to involve completely different components in comparison to the olfactory neurons. what is the functional role of these ectopically expressed olfactory receptors? evidence is rapidly accumulating that ors participate in important cellular processes outside their primary sensory organ, where they function in odor detection and discrimination. in our lab the functional expression of the first such receptor was demonstrated in spermatozoa (2004). in the meantime we could show the existence and function of ors in the cardiovascular system (heart, blood cells), the gastrointestinal system (small intestine, liver, pancreas), the genito-urinary system (kidney, testis, spermatozoa, prostate), the respiratory system (lung, smooth muscle cells), the skin (keratinocytes, melanocytes) and sensory organs (retina). 
interestingly, we found a broad spectrum of important functions such as cell-cell communication and recognition, tissue injury, repair and regeneration, cancer growth, progression and metastasis, nutrient sensing and muscle contraction. nevertheless, the functional importance of ectopic ors is still not sufficiently understood. studies seeking to determine the function of ectopic ors are still in their infancy and require further intensive exploration. however, ors clearly have the potential to serve as targets for a wide range of clinical approaches. this holds promise that the knowledge gained by future investigations will deepen our understanding of or function in health and disease and may provide the basis for the development of applications in diagnosis and therapy in the near future. enzyme replacement therapies have been developed over the last 25 years for several of the lysosomal storage disorders (lsd's). the success of enzyme replacement therapy for gaucher disease paved the way for the development of similar treatments for the mucopolysaccharidoses, fabry and pompe disease and lately also for neuronopathic lysosomal storage disorders by intrathecal or intracerebral injections. in addition, small molecule approaches have been developed, including substrate reduction therapies and chaperones, which can be used orally. while in gaucher disease enzyme as well as substrate reduction therapy results in reversibility of disease manifestations, with decreases in hepatosplenomegaly, normalization of blood counts and prevention of skeletal disease, this is unfortunately not the case for all patients affected with other lysosomal storage disorders. an important concept is the "window of opportunity for treatment", which differs among these disorders. 
for example, in fabry disease, early fibrosis fects are well defined, neither the specific mechanisms underlying neurological abnormalities nor the role of decreased cholesterol versus sterol precursor accumulation in disease pathogenesis have been clearly delineated. to identify cellular phenotypes and causative signaling pathways, we derived induced pluripotent stem cells (ipscs) from slos and lath subjects to model these diseases in vitro. slos subjects were known carriers of the most common dhcr7 mutations, including the intronic splice acceptor mutation c.964-1g>c and the missense mutation p.t93m. while all ipscs demonstrated the expected biochemical defects due to dhcr7 or sc5d mutations, cellular assays uncovered a defect in neural stem cell maintenance resulting in accelerated neuronal formation in slos ipscs. further molecular and biochemical analyses demonstrated inhibition of cholesterol-wnt interactions and loss of wnt/β-catenin activity mediated cellular phenotypes. however, this cellular phenotype was exclusive to slos, as lath ipscs did not exhibit a neural progenitor defect or inhibition of wnt/β-catenin activity. while this work demonstrates the utility of ipscs for modeling rare diseases and identifies signaling deficits potentially underlying slos phenotypes, questions remain regarding cellular and functional consequences, the specificity of lipid-wnt interactions, and the role of other disrupted signaling pathways in mediating developmental and functional deficits in these diseases. unpublished work using a variety of approaches will be discussed comparing the specific effects of cholesterol synthesis mutations on cell fate, functional activity, and lipid modulated signaling pathways to more precisely define the consequences of cholesterol synthesis defects and identify potential targets for patient therapy. induced pluripotent stem cell (ipsc) technology has become one of the major approaches for disease modeling since its first report in 2006. 
the ability to reprogram cells from a somatic into an embryonic stem cell-like state and to differentiate them into desired cell types in the culture dish has allowed scientists to study several diseases in cells such as neurons which, in the past, could not be isolated from living subjects. williams syndrome (ws), a genetic neurodevelopmental disorder in which 25-28 genes are hemizygously deleted, is among those. apart from its cardiovascular abnormalities, its unique neurological phenotype, i.e. hypersociability, is of particular interest to us. for several decades, research on different neurological aspects of ws has been conducted in a variety of models such as patient-derived cell lines (lymphoblastoid cells and fibroblasts), post mortem tissue, and mouse models. however, the lack of physiologically relevant cell types such as neural progenitor cells (npcs) and neurons has left a critical gap in our knowledge of the disease's cellular and molecular phenotypes. to fill this gap, we took advantage of the reprogramming technology to capture the genomes of ws subjects in ipscs, which could then be differentiated into npcs and neurons, enabling evaluation of whether the captured genome with hemizygous deletion of those genes leads to relevant neuronal cellular phenotypes. dental pulp cell-derived ipscs of classical ws, rare ws and typically developing (td) subjects were neurally induced via dual-smad inhibition in order to generate npcs and neurons. we discovered that classical ws npcs exhibited increased apoptosis, and, therefore, doubling time, compared to td neurons. this could possibly contribute to the reduction in cortical surface area in classical ws individuals as assessed by magnetic resonance imaging. surprisingly, we found that rare ws npcs behaved similarly to td npcs rather than to classical ws npcs in terms of apoptosis. 
we confirmed that frizzled9, which is deleted in the classical ws but not in our rare ws genome, is responsible for this phenotype via gain- and loss-of-function assays. moreover, classical ws neurons in general showed an increased frequency of activity-dependent calcium transients compared to td neurons. finally, classical ws neurons acid alpha-oxidation, and (4.) glyoxylate detoxification. with respect to peroxisomal fatty acid oxidation, peroxisomes catalyze the chain-shortening of certain fatty acids, including very-long-chain fatty acids, but require the active help of mitochondria to catalyze the degradation of acetyl-coa and the reoxidation of nadh as produced in peroxisomes. furthermore, with respect to ether phospholipid biosynthesis, peroxisomes heavily rely on the endoplasmic reticulum to complete formation of ether phospholipids, whereas fatty acid alpha-oxidation also requires the functional interplay between peroxisomes and mitochondria, and the same is true for glyoxylate detoxification. recent evidence holds that the interaction between peroxisomes and the different subcellular organelles, including mitochondria and endoplasmic reticulum, is mediated by specific tethering protein complexes which bring organelles physically together, thereby allowing metabolism to proceed smoothly. the importance of peroxisomes in metabolism is stressed by the existence of a large group of single peroxisomal enzyme deficiencies, of which x-linked adrenoleukodystrophy is best known. our current state of knowledge with respect to the role of peroxisomes in metabolism and the peroxisomal enzyme deficiencies will be presented at the meeting. huntington's disease: rna-sequencing, small rna-sequencing, chip-sequencing and gwas data. department of neurology, boston university school of medicine, boston, ma 02118, usa huntington's disease (hd) is a dominantly transmitted neurodegenerative disease of midlife onset. 
recently, several different unbiased genome-wide studies in hd have been performed. these analyses point to a variety of pathological pathways that are associated with important features of the disease, including age at onset, cag repeat size, and the extent of neuropathological involvement. genome-wide association studies (gwas) have identified several regions of the genome that contain genes that are associated with the age at onset for hd. the strongest of these is located at 15q13.3 for rs146353869, for which a very rare allele (maf = 1.1%) is associated with an approximately 6-year younger age at onset for carriers of the minor allele. the same locus contains an independent effect for rs2140734, where a more common minor allele (maf = 30.2%) is associated with a 1.4-year older age at onset for carriers of this allele. these single nucleotide polymorphisms are in the region of fan1, mtmr10 and several other genes, some of which are not expressed in brain and are not likely candidates for hd modification. eqtl analysis has not resolved which gene may be implicated. other gwas-implicated loci include rs1037699 at 8q22.3 and rs144287831 at 3p22.2. we have sought to combine the information derived from multiple platforms to gain additional insight into the pathways that may be implicated in hd pathogenesis. in this strategy, we have performed mrna-sequencing, small rna-sequencing and chip-sequencing using the h3k4me3 mark for active transcription and the repressive mark h3k9me3 in human hd brain samples with gwas genotyping. while the striatum is most involved in hd, the extent of neurodegeneration in post-mortem tissue precludes meaningful comparison between disease and control samples, and consequently we studied prefrontal cortex (ba9). several common pathways were seen across these three platforms. 
mrna-sequencing and mirna-sequencing data identified altered transcriptional profiles implicating developmental pathways involving the hox genes and related homeobox domain genes (e.g. pitx1, pou4f2, etc.). notably, micrornas located in hox gene clusters were among those most increased, and levels of these correlate with pathological involvement in the striatum. these genes, associated with early embryonic development, are commonly silent in normal adult brain, and were among the most differentially expressed genes in hd brain. these prominent statistical effects are driven by the near total absence of expression in normal brain. pathways implicated in mrna-seq and chip-seq studies included immune function and regulation of gene expression. these associations were very strong, indicating a large immune reactive response in the hd in the heart is related to unresponsive disease and unfortunately fibrosis may occur already in an early stage, sometimes even without prior hypertrophy. whether earlier intervention will be beneficial is largely unknown. and then: what is early? many unresolved questions exist at this stage, including the following:
- what is the natural history and "point of no return" for the different lsd's?
- what is the natural history and "point of no return" for subgroups of patients within one lsd?
- what are the long-term complications, given that treatments change the phenotype rather than cure the disease?
- what is the influence of antibody generation on clinical effectiveness?
- how do we manage the extreme costs of these products, especially in light of the many unsolved issues with respect to effectiveness?
surprisingly, so far healthcare professionals, governments and industry have failed to systematically address these issues, resulting in insufficient knowledge for potentially lifesaving treatments. early conditional access, followed by a strict, transparent, independent, collaborative evaluation in addition to fair pricing, should be explored. 
after the recent explosion in sequencing throughput, variant interpretation has quickly become the bottleneck in our effort to usher in the era of genomic medicine. while homozygosity for apparently pathogenic variants in the context of disease states is a well-established phenomenon, homozygosity can uncover many medically relevant aspects of the human variome that are difficult to study otherwise. for example, seemingly benign variants may prove pathogenic in the homozygous state. this includes variants with benign prediction using in silico tools as well as variants in dominant genes with no phenotype in carriers because they represent bona fide recessive inheritance. variants that are associated with one phenotype in compound heterozygous states may express themselves quite differently phenotypically when homozygous. furthermore, previously reported pathogenic variants can be challenged when their presence in homozygosity is associated with no abnormal phenotype, thus improving the specificity of the annotation of the morbid genome. homozygosity for lof variants is a special scenario that allows us to study naturally occurring human "knockouts", a powerful tool to study the physiological context of genes in humans. finally, homozygosity in the context of autozygosity provides a robust mapping tool that can greatly aid in the identification of relevant variants, especially those that exert their pathogenic effect in ways that defy detection by our usual algorithms. by expanding the spectrum of phenotypes that are studied, one can unlock the full potential of homozygosity to understand the medical relevance of the human variome in its full range from embryonic lethal to essentially benign. helmholtz zentrum münchen gmbh, institut für experimentelle genetik, ingolstädter landstr. 1, 85764 neuherberg, germany the inheritance of epigenetic information in mammals across generations has been controversial. 
some reports provided initial evidence that a paternal high fat diet may propagate obesity and glucose intolerance in offspring, but potential confounders such as molecular factors present in seminal fluid, paternal-induced alterations in maternal care or transmission of microbiomes were not ruled out in these studies. we have shown in mice that a parental high fat diet (hfd) renders offspring derived via in vitro fertilization (f1) more susceptible to develop excessive overweight and type 2 diabetes (t2d) in a gender and parent-of-origin specific mode. female, but not male, offspring from obese parents became significantly more obese during a hfd challenge than female offspring from lean parents. body weight trajectories and distribution patterns of individual body weights in female offspring from one obese and one lean parent demonstrate that paternal and maternal germline propagate obesity in a roughly equitable and additive fashion, but likely different mode of action. in contrast, a more deteriorated state of hfd-induced insulin resistance was observed in both f1 genders, albeit predominantly inherited via the maternal germline. towards the identification of epigenetic information in sperm and oocyte from hfd and low fat diet fed parents, we are currently analyzing their transcriptome and methylome signatures. the status of this analysis will be presented. we report for the first time epigenetic inheritance of an acquired metabolic disorder via mammalian oocytes and sperms excluding confounding factors. such an epigenetic mode of inheritance may contribute to the observed pandemic increase in obesity and t2d prevalence rates, especially in an environment where nutrition is abundant. brain which may be a major influence contributing to neurodegeneration. in many instances enrichment of h3k4me3 at transcription start sites was not accompanied by a corresponding increase in expression. 
the apparent inconsistency suggests that common regulatory mechanisms in the hd brain are disrupted and this may contribute to a complex interplay of factors contributing to the neurodegenerative process. often findings in human hd brain samples conflicted with those reported in hd transgenic mouse models, suggesting that one may wish to be cautious in interpreting the significance of either type of study in isolation. the causal pathogenesis of huntington disease, new therapeutic approaches r. laufer senior vp discovery and product development global r&d, teva r&d product development mgmt. 12 hatrufa st, netanya, israel huntington disease (hd) is an autosomal dominant neurodegenerative disease characterized by progressive loss of voluntary motor control, psychiatric disturbance, cognitive decline and death 15-20 years after motor onset. hd is uniquely caused by a polyglutamine encoding cag expansion in the huntingtin gene (htt), which allows for identification of pre-manifest mutation carriers as much as decades before onset and should facilitate development of disease modifying therapies. yet over 20 years after identification of the hd mutation, available therapies offer only symptomatic relief and are fraught with side effects. development of safe small molecule therapies for hd has been hindered by difficulties identifying and validating tractable drug targets within the disorder's complex pathogenesis. teva pharmaceuticals is developing potential novel treatments based on a mechanistic understanding of disease pathways common to neurodegenerative diseases. the progress of these studies will be reviewed. rare complete gene knockouts in adult humans p. sulem statistics department, decode genetics, sturlugata 8, 105 reykjavik, iceland loss-of-function mutations cause many mendelian diseases. here we have created a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. 
we sequenced the whole genomes of over 29,000 icelanders and imputed the sequence variants identified in this set into a total of 151 chip-genotyped and phased icelanders. of the genotyped icelanders, around 10% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (maf) below 2% in close to 2000 genes (complete knockouts). genes that are highly expressed in the brain are less often completely knocked out than other genes. homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with maf <2%, 95% confidence interval (ci) = 10-261). we are currently systematically phenotyping such human complete knockouts. this phenotyping lasts 4 hours and attempts to cover most of the observable diversity in a non-invasive and cost-efficient manner. i will demonstrate how systematic phenotyping can advance knowledge of individual gene knockouts. we use results from in-house transcriptomics, existing animal models and complementary approaches to assess the observations in humans. we will also discuss the scrutiny of other populations in order to detect such complete knockouts. we will exemplify the impact of founder populations and consanguinity in such an odyssey. early onset and severe obesity can be inherited via loss of function mutations within the melanocortin pathway of hypothalamic body weight regulation. the most prominent player in this signalling pathway is the fat cell hormone leptin. leptin gene mutations were the first to be linked to monogenic early onset obesity. after binding of leptin to leptin receptors in the arcuate nucleus of the hypothalamus the neuropeptide msh is processed from the precursor pomc and acts as a ligand at the mc4 receptor. mutations in the leptin receptor gene, the pomc gene and the mc4 receptor gene were subsequently diagnosed in further patients with extreme early onset obesity. 
while leptin mutation patients can be treated with recombinant leptin, as shown already in the late 1990s, all other monogenic obesity forms are leptin resistant, and additional leptin failed to decrease body weight. only recently pomc gene deficient patients were successfully treated with the msh-analogue setmelanotide (kühnen et al. 2016). common severe obesity is defined by the lack of disease causing monogenic defects. a plethora of gwas identified a large number of snps in common obesity associated with the individual bmi, but only to a small extent of not more than 25%. however, almost all these common obese patients are characterized by high leptin levels, suggesting sufficient generation of leptin in the increased fat tissue and a state of leptin resistance. several new data concerning the contribution of epigenetic and genetic variants in the pomc gene locus argue for a role of the melanocortin pathway also in common obesity and imply, therefore, a potentially new treatment option also in common obesity based on msh-analogues. genome-wide association studies have highlighted the role of genetic associations with susceptibility to common inflammatory diseases, highlighting potential new insights into disease pathogenesis and opportunities for therapy. however, understanding the functional basis of these associations and delivering translational utility remains a significant challenge to the field. non-coding regulatory genetic variants are most commonly implicated in such studies. recent work highlights how such variants are also major drivers of diversity in the immune response transcriptome. this talk will discuss approaches we are taking to try and establish functional links between immune phenotype-associated regulatory genomic and epigenomic variation, and specific modulated genes and pathways. 
i will describe insights from the application of expression quantitative trait locus (eqtl) mapping to define genomic modulators of the global transcriptomic response in different primary immune cell populations and to specific innate immune stimuli in health and disease. this work highlights the extent of local and distant context-specific eqtl, enabling resolution of immunoregulatory variants and the identification of specific modulated genes involving disease associated loci. examples will be described showing how mapping trans-regulatory loci can be a powerful approach for discovery and dissection of gene networks informative for disease. i will also show how we have applied analysis of the genetics of gene expression in patients with sepsis admitted to intensive care, revealing new insights into disease pathogenesis. further progress in this area will require characterisation of associated variants in the context-specific disease relevant epigenomic landscape in which they may act, requiring careful consideration of relevant immune cell types and environmental modulators to study, to- according to the current statement of the deutsche forschungsgemeinschaft 1, human genome sequencing should include the possibility of returning analysis results. the eurat project group 2 is cited as guidance for handling this question responsibly. nevertheless, the problem remains of classifying which results are worth reporting from the researchers' and the participants' perspectives, and of providing the resources required for consent and feedback algorithms. moreover, it remains unresolved to this day which communicative processes constitute a sufficient basis for informed consent. we would like to debate these and further questions with prof. dr. med. dr. phil. eva winkler (nationales centrum für tumorerkrankungen, heidelberg), dr. phil. martin langanke (theologische fakultät, universität greifswald) and pd dr. phil. 
peter burgard (universitätskinderklinik heidelberg). the organisers will introduce the topic, followed by short keynote talks (approx. 10 min) by the speakers, before a debate with the audience is opened. fangerau, f. söhner (düsseldorf) in the 1960s, human genetics, like a number of other medical disciplines, underwent considerable institutional expansion. in the 1960s and 1970s, for example, the majority of the chairs of human genetics were established at universities in the frg, and in the late 1970s and 1980s also in the gdr. in 1969, the symposium "genetik und gesellschaft", held as part of the marburg "forum philippinum", launched the initiative to found genetic counselling centres throughout the federal republic and thus to make genetic research medically usable (again). the planned oral history project begins in this phase of the establishment of human genetics at the academic and practical level. it aims to document and analyse, by means of expert interviews, the development of human genetics in its self-understanding as a cross-sectional and longitudinal discipline (pfadenhauer 2003:66) in the german-speaking countries from the 1970s onwards. the research project will address two complexes of questions: one in the history of science, centred on the development and application of diagnostic and therapeutic techniques, and one in social history, concerning the establishment and expansion of the institutions of human genetics as well as the course of the societal debates affecting the field. 
in addition to the founding of institutes and of the professional society and the standardisation of training for medically and scientifically trained human geneticists, the project will analyse the role that historical reflection on, and the working through of, the discipline's own national socialist past in the 1980s played in establishing the field, and the difficult institutional separation from anthropology with its effects on how human genetics sees itself and is seen by others (weingart, kroll, bayertz 1992). in addition to developments in the frg, the development of the field in the gdr and possible german-german collaborations will also be discussed (in the 1970s, for example, the gdr's central reference institute for genetic counselling was located in jena (vogel 1999:416)). the project can draw on numerous studies of the history of german human genetics (cf. kröner 1997, 1998; cottebrune 2006, 2008; weingart 1988; bennike 1992 dysmorphism carrying a pathogenic variant in the ebf3 gene detected by whole-exome sequencing. five missense, two nonsense, one 9-bp duplication, and one splice-site variant in ebf3 were found; the mutation occurred de novo in eight individuals, and the missense variant c.625c>t [p.(arg209trp)] was inherited by two affected siblings from their healthy mother who is a mosaic. ebf3 belongs to the early b-cell factor family (also known as olf, coe, or o/e) and encodes a transcription factor involved in neuronal differentiation and maturation. structural assessment predicts perturbing effects of the five amino acid substitutions on dna-binding of ebf3. transient expression of ebf3 mutant proteins in hek 293t cells revealed mislocalization of all but one mutant in the cytoplasm in addition to nuclear localization. 
by transactivation assays, all ebf3 mutants showed significantly reduced or no ability to activate transcription of the reporter gene under control of the cdkn1a promotor that corresponds well with loose association of ebf3 mutants with chromatin as demonstrated by in situ subcellular fractionation experiments. finally, rna-seq and chip-seq experiments demonstrate that ebf3 acts as a transcriptional regulator at cis-regulatory sequences and ebf3 mutant had reduced function due to partial disruption of the dna-binding domain. these findings demonstrate that ebf3-mediated dysregulation of gene expression has profound effects on neuronal development in humans and add ebf3 to the growing list of genes in which mutations cause syndromic forms of intellectual disability. step, we performed wes in further unelucidated uhs cases and identified homozygous nonsense mutations in tgm3 (transglutaminase 3) and in tchh (trichohyalin), respectively. elucidation of the molecular outcomes of the disease causing mutations by cell culture experiments of padi3 and tgm3 and tridimensional protein models demonstrated clear differences in the structural organization and activity of mutant and wild type proteins. by immunofluorescence analysis, we could demonstrate a diffuse homogenous cytoplasmic distribution of the wt padi3, whereas in the mutants the proteins were observed to form large aggregates throughout the cytoplasm. by use of human anti-citrullinated protein autoantibodies, we could show a strong labelling in the wt whereas the staining of the mutants was barely above background. in order to demonstrate the importance of padi3 in hair shaft formation, we generated padi3 knockout mice. electron microscopy observations revealed morphological alterations in hair coat of padi3 knockout mice. for tgm3, we performed a transglutaminase activity assay. the analysis results revealed that the wt had a significantly higher transglutaminase activity in comparison to the truncated protein. 
here, we report for the first time the identification of uhs causative mutations located in the three genes padi3, tgm3 and tchh. the two enzymes responsible for posttranslational protein modifications, and their target structural protein, are all involved in hair shaft formation through their sequential interactions. these findings provide valuable information regarding the pathophysiology of uhs and contribute to a better understanding of this protein interaction cascade. this could be of further value for the cosmetic and pharmaceutical industries, paving the way for development of novel products. deadenylases are best known for degrading the poly(a) tail during mrna decay. the deadenylase family has expanded throughout evolution and, in mammals, consists of 12 mg2+-dependent 3' end ribonucleases with mostly unknown substrate specificity. pontocerebellar hypoplasia type 7 (pch7) is a unique recessive syndrome characterized by neurodegeneration with ambiguous genitalia (mim %614969). we studied 12 human families with pch7, uncovering biallelic, loss of function mutations in toe1, which encodes an unconventional deadenylase. toe1-morphant zebrafish displayed mid- and hind-brain degeneration, modeling pch-like structural defects in vivo. surprisingly, we found toe1 associated with incompletely processed small nuclear (sn)rnas of the spliceosome, which is responsible for pre-mrna splicing. these pre-snrnas contained 3' genome-encoded tails often followed by post-transcriptionally added adenosines. human cells with reduced levels of toe1 accumulated 3' end-extended pre-snrnas, and immuno-isolated toe1 complex was sufficient for 3' end maturation of snrnas. our findings reveal the cause of a neurodegenerative syndrome linked to snrna maturation and uncover a key factor involved in processing of snrna 3' ends. the kidney maintains acid-base homeostasis and electrolyte balance through highly specialized cells. 
in the distal nephron acid secretion is mediated by type a intercalated cells (a-ics), which contain v-type atpase-rich vesicles that fuse with the apical plasma membrane on demand. intracellular bicarbonate generated by luminal h+ secretion is removed by the basolateral anion-exchanger ae1. dysfunction of type a intercalated cells results in distal renal tubular acidosis (drta) and human mutations in v-atpase subunits and ae1 are causative for this condition. for the ae1 r607h mutation a dominant-negative trafficking mechanism was proposed to explain ae1-associated dominant drta based on studies in mdck monolayers. to test this hypothesis in vivo and to test potential rescue strategies correcting this mistargeting defect, we have generated a r607h knock-in mouse strain, which corresponds to the most common dominant drta mutation in human ae1, r589h. heterozygous and homozygous r607h knock-in mice displayed incomplete drta characterized by compensatory upregulation of the na+/hco3- cotransporter, nbcn1. as expected for the r607h mutation, red blood cell ae1-mediated anion-exchange activity and surface polypeptide expression were unchanged. surprisingly, basolateral targeting of the mutant ae1 in a-ics was preserved in contrast to previous studies in mdck cells. instead, we found ae1 expression in a-ics strongly reduced in a r607h dosage-dependent manner. additional cell culture studies in two widely used immortalized renal cell lines verified that targeting and half-life time of mutant ae1 protein were indeed preserved. surprisingly, atpase expression was reduced and its plasma membrane targeting upon acid challenge compromised. ultrastructural analysis revealed a loss of apical vesicles in a-ics, while we observed lysosomal inclusions and multilamellar bodies. accumulation of p62- and ubiquitin-positive material in a-ics of knock-in mice suggests a defect in the degradative pathway, which may ultimately lead to loss of a-ics. 
highlighting the expression of ae1 specifically in a-ics, type b intercalated cells were unaffected. we propose that reduced basolateral anion-exchange activity in a-ics inhibits trafficking and regulation of v-type atpase, compromising luminal h+ secretion and possibly also lysosomal acidification. our findings illustrate the considerable, context-dependent complexity of ae1-related kidney disease. b. vona 1 , d. liedtke 1 , k. rak 2, 3 , r. katana 4 , l. jürgens 3 , pr. senthilan 5 , i. nanda 1 , c. neuner 1 , mah. hofrichter 1 , l. schnapp 1 , j. schröder 1 , u. zechner 6 , s. herms 7, 8, 9 , p. hoffmann 10 , t. müller 11 , m. dittrich 1, 11 , o. bartsch 6 , pm. krawitz 12 , e. klopocki 1 , w. shehata-dieler 2 , mc. göpfert 4 , t. haaf 1 although many genes have already been identified as causing non-syndromic hearing loss (nshl), diagnostic rates of approximately 50% among hearing impaired patients suggest that many more genes are remaining to be identified. nshl is the most common sensory deficit that has a prevalence between one and two per 1000 newborns. furthermore, it demonstrates classic genetic heterogeneity with as many as 1% of coding genes in the genome anticipated to be involved in non-syndromic forms of deafness. autosomal recessive (75-80%) and autosomal dominant (15-20%) forms dominate inheritance patterns of deafness; however, in a small fraction of cases, x-linked deafness (1-4%) can be observed. whole exome sequencing of a german family with diagnostically unresolved nshl revealed a novel missense variant predicted as pathogenic in the gene ferm and pdz domains containing protein 4 (frmpd4) on chromosome xp22.2. this gene, also known as preso1, was first described as a regulator of dendritic spine morphogenesis. 
previous screening of pathogenic cnvs in array based comparative genomic hybridization among families with heterogeneous x-linked intellectual disability (xlid) showed duplication of xp22.2 including part of frmpd4 which implicated the gene in xlid. interestingly, a segregating truncating and a de novo missense mutation in frmpd4 have associated this gene with xlid, a phenotype not observed in our family. mouse expression studies localize frmpd4 to spiral ganglion neuron peripheral dendrites of the developing cochlea. in addition, we analyzed frmpd4 knockdown and loss-of-function zebrafish mutants for innervation and structural defects in the otic vesicle and lateral line neuromasts. posterior lateral line neuromasts are observed with reduced axonal outgrowth that is also likely reduced in the lateral line nerve. abnormal innervation is also apparent in the otic vesicle. fluorescent neuromast labeling marked a significant reduction of overall otic vesicle and lateral line neuromasts in mutants versus wild type zebrafish. scanning electron microscopy revealed a pronounced absence of kinocilia in posterior lateral line neuromasts of frmpd4 -/- zebrafish. furthermore, adult frmpd4 mutants show significantly delayed acoustically evoked behavioural responses compared to wild type fish, indicating hearing impairment. investigation of transgenic drosophila insertion mutants detected a mild auditory phenotype, i.e. a reduction in mechanical amplification gain and associated reduction in antennal fluctuation power. our results associate frmpd4 with x-linked hearing loss and suggest mutations in this gene are correlated with pleiotropic effects has also been demonstrated to be an important pathobiochemical feature in rtt. 
to test whether common deficits in mtor signaling could be responsible for the molecular pathogenesis underlying both syndromes, we generated and studied a novel cdkl5 knockout (cdkl5 -/y) mouse model and performed in vitro experiments in human cells. in cdkl5 -/y knockout mice loss of cdkl5 is accompanied by reduced phosphorylation levels of critical components of the mtor signaling cascade. these findings point at a regulatory role of cdkl5/cdkl5 on mtor activity and function. to gain further insights into the possible mechanism through which cdkl5/pi3k interaction could regulate mtor signaling, we used hek-t cells as cellular model. following knock-down of cdkl5, the amount of pi3k protein was significantly reduced compared to controls. to evaluate the contribution of our findings to pathogenesis, we performed rescue experiments in cdkl5 knock-down hek-t cells using wild-type and patient-specific mutant cdkl5 constructs. further experiments are ongoing to clarify the molecular mechanism by which cdkl5 regulates pi3k protein level in the cells. inferring expressed genes by whole-genome sequencing of plasma dna medical university graz, graz, austria, 2 university of technology, graz, austria the analysis of cell-free dna (cfdna) in plasma represents a rapidly advancing field in medicine. cfdna consists predominantly of nucleosome-protected dna shed into the bloodstream by cells undergoing apoptosis. we performed whole-genome sequencing of plasma dna and identified two discrete regions at transcription start sites (tsss) where nucleosome occupancy results in different read depth coverage patterns for expressed and silent genes. by employing machine learning for gene classification, we found that the plasma dna read depth patterns from healthy donors reflected the expression signature of hematopoietic cells. 
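the read-depth comparison at transcription start sites described above can be illustrated with a small sketch. it is not the authors' pipeline: the two window coordinates, the cutoff and the function names are assumptions made for illustration, and a fixed threshold stands in for the machine-learning classifier mentioned in the abstract. the sketch further assumes that expressed genes show reduced plasma dna depth close to the tss.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def tss_features(profile, center, broad=(-1000, 1000), narrow=(-150, 50)):
    """Relative read depth in two windows around a TSS.

    profile: per-base read depths for one gene region (list of numbers)
    center:  index of the TSS within `profile`
    broad, narrow: window offsets relative to the TSS (hypothetical values,
                   not taken from the abstract)
    Each window mean is divided by the mean depth of the whole profile.
    """
    background = mean(profile)

    def relative_depth(window):
        lo, hi = window
        return mean(profile[center + lo:center + hi + 1]) / background

    return relative_depth(broad), relative_depth(narrow)


def call_expressed(profile, center, narrow_cutoff=0.8):
    """Toy rule: call a gene expressed if depth near the TSS is depleted.

    The cutoff is an arbitrary illustrative value.
    """
    _, narrow_rel = tss_features(profile, center)
    return narrow_rel < narrow_cutoff


# Synthetic example: a flat profile versus one with a dip around the TSS.
flat = [10] * 6001
dipped = list(flat)
for i in range(3000 - 150, 3000 + 51):
    dipped[i] = 2

print(call_expressed(flat, 3000))    # flat coverage, no depletion
print(call_expressed(dipped, 3000))  # depleted coverage near the TSS
```

in practice the two relative depths would be fed, per gene, into a trained classifier rather than compared with a fixed cutoff.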
in patients with cancer having metastatic disease, we were able to classify expressed cancer driver genes in regions with somatic copy number gains with high accuracy. we were able to determine the expressed isoform of genes with several tsss, as confirmed by rna-seq analysis of the matching primary tumor. our analyses provide functional information about cells releasing their dna into the circulation. institute of human genetics, fau-erlangen-nürnberg, erlangen, germany, 2 institute of medical genetics, university of zurich, schlieren, switzerland rna-splicing is an important mechanism for eukaryotic gene expression and regulation. defective splicing significantly contributes to monogenic disease in humans. indeed, the mutational space for variants affecting splicing is larger than for coding variants. several computational methods have been developed to predict a variant's effect on splicing but lack predictive value outside the canonical splice sites and do not predict aberrant transcripts. thus the plethora of dna variants generated by recent advances in "next-generation" based sequencing (ngs) can be scored for a possible splicing effect, but a laborious wet-lab based confirmation and characterization is still required. rna-seq is widely used for quantification of gene expression and can be used to detect splicing events, but is limited for this use by the variable and often low read coverage of individual congenital anomalies of the kidneys and urinary tract (cakut) are the most common cause of chronic kidney disease in children. as cakut is a genetically heterogeneous disorder and most cases are genetically unexplained, we aimed to identify new cakut causing genes. 
using whole-exome sequencing and trio-based de novo analysis, we identified a novel heterozygous de novo frameshift variant in the leukemia inhibitory factor receptor (lifr) gene causing instability of the mrna in a patient presenting with bilateral cakut and requiring kidney transplantation at one year of age. lifr encodes a transmembrane receptor utilized by il-6 family cytokines, mainly by the leukemia inhibitory factor (lif). mutational analysis of 121 further patients with severe cakut yielded two rare heterozygous lifr missense variants predicted to be pathogenic in three unrelated patients. lifr mutants showed decreased half-life and cell membrane localization resulting in reduced lif-stimulated stat3 phosphorylation. lifr showed high expression in human fetal kidney and the human ureter, and was also expressed in the developing murine urogenital system. lifr knockout mice displayed urinary tract malformations including hydronephrosis, hydroureter, ureter ectopia, and, consistently, reduced ureteral lumen and muscular hypertrophy, similar to the phenotypes observed in patients carrying lifr variants. additionally, a form of cryptorchidism was detected in all lifr -/- mice and the patient carrying the lifr frameshift mutation. altogether, we demonstrate heterozygous novel or rare lifr mutations in 3.3% of cakut patients, and provide evidence that lifr deficiency and deactivating lifr mutations cause highly similar anomalies of the urogenital tract in mice and humans. loss of cdkl5 associated with deficient mammalian target of rapamycin (mtor) signaling in mice and human cells we and other groups have shown that mutations in the x-linked cyclin-dependent kinase-like 5 (cdkl5) gene cause a severe neurodevelopmental disorder with clinical features including intellectual disability, early-onset intractable seizures and autism, that are closely related to those present in rett syndrome (rtt) patients. rtt is caused by mutations in the x-linked mecp2 gene. 
cdkl5 is a serine/threonine kinase and to date knowledge about its functional roles is scarce. we searched for cdkl5 interacting proteins by yeast two-hybrid screens. one of the candidates identified in these screens is a subunit of the phosphatidylinositol-4,5-bisphosphate 3-kinase (pi3k). the results obtained in yeast could be confirmed in vitro in mammalian cells and in mouse brain by immunoprecipitation experiments and by co-localization studies. pi3k phosphorylates membrane lipids which act as docking sites to recruit targets upstream of mtor and thereby regulate, among major cellular processes, synaptic plasticity, which is the cellular basis for learning and memory. alteration of mtor signaling gene tubes for stabilizing rna immediately after drawing the samples. the subsequent rt-pcr analysis showed that 9 of the 11 variants located at potential splicing sites indeed affect splicing. thus 8 of these variants could be classified as deleterious (iarc class 5), while one chek2 variant could not be unequivocally classified as the rt-pcr analysis identified only 20% of the mutant transcript, indicating continued usage of the constitutive splice acceptor site. this led to the classification as a probably hypomorphic allele. the variants in cdh1 and mlh1 did not affect splicing and were classified as benign (iarc class 1). none of the 5 rare synonymous and nonsynonymous exonic variants showed any effect on splicing. in conclusion, this analysis allowed the disambiguation of 10 out of 11 vus at potential splice sites into a definite category (either iarc class 5 or 1). this work highlights the importance of computational splicing prediction and validation using rt-pcr of peripheral blood rna to assess the pathogenicity of vus. this, in turn, allows more accurate genetic counseling and clinical management of affected families. gliomas present the major group of neoplasia in the central nervous system. 
they typically show invasive growth and high recurrence rate and are currently not curable. idh mutations are detected in nearly 70% of low grade gliomas and are considered to play a key role in low grade glioma development. while it is known that idh1/2 mutation leads to high-levels of 2-hydroxyglutarate (2-hg) that functions as an oncometabolite, little is known about the influence of idh1/2 mutations on energy metabolism and metabolic reprogramming in the tumor cells. since patient derived idh mutant cells do not grow in cell culture, previous studies from our group and others used transduced cell lines that overexpress idh1. in order to develop in vitro models with reduced side effects, we used crispr/ cas to introduce the idh1r132h mutation in a patient derived glioblastoma cell line. the edited cells expressed idh1r132h in western blot and expression levels of idh1 were comparable to the expression in wild type cells. the mutation was stable in long time culture experiments, without signs of senescence. moreover we found elevated 2-hg levels, proving that the idh1r132h neoenzymatic function is present in our cell lines. thus, we were able to edit and culture genomic idh1r132h mutated glioma cells for the functional analysis of the idh1r132h mutation for the first time without the effects of overexpression models. edited idh1r132h cell lines showed extended doubling times compared to wildtype cells. measurement of krebs cycle metabolites using mass spectrometry revealed elevated glutamate levels. we found enhanced atp-levels that could be a consequence of decreased atp consumption. additionally, the cells showed reduced viability compared to wildtype cells when cultivated in glycolysis inhibiting media, pointing out the enhanced dependency on glycolysis in idh1r132h cells. these results indicate changes in tumor cell metabolism and energy household induced by the idh1r132h mutation. 
since we and others could show that idh1r132h can alter nad+ and nadph levels, we tested if the idh1r132h mutated cells are more susceptible to selective inhibition of nad/p regenerating enzymes. esirna-silencing of nampt specifically decreased cell viability in idh1r132h but not wildtype cells with a concomitant increase of dead cells. in conclusion, we developed genes. thus, a standardized ngs based approach to characterize potential splice variants is lacking. hence we investigated the utility of hybridization based gene-panel enrichment and ngs of cdna. based on results of computational simulation we selected twenty rna-samples of patients with a known pathogenic splice-site variant in an inherited cancer predisposition gene. these variants were previously characterized by rt-pcr in our lab or in the literature. after rrna depletion and dna digestion we performed first and second strand cdna synthesis followed by "tagmentation"-based library preparation, targeted enrichment using the trusight cancer panel and sequencing on an illumina miseq platform. a computational pipeline was established to enable automated detection of aberrant splicing events by implementing different alignment and splice-junction detection algorithms together with filtering against control data sets. we also considered variant calling for the detection of allelic imbalance and gene-level expression analysis in this data. breast and ovarian cancer (bc/oc) predisposition has been associated with a number of high-and low-penetrance susceptibility genes. advances in sequencing technology has made multigene testing a practical option when searching for genetic variants associated with risk for bc/oc. variants of uncertain significance (vus), though, represent a major problem. we now studied 581 patients fulfilling criteria for brca1 and 2 testing using the next generation sequencing based trusight sequencing cancer panel on a miseq platform (illumina). 
data was analyzed after remapping with bwa to hg19 (grch37) using seqnext software (jsi) for variants in 14 known high and moderate penetrance susceptibility genes (brca1/2, atm, chek2, palb2, rad51c, rad51d, nbn, cdh1, tp53, mlh1, msh2, msh6, pms2) . besides 106 deleterious mutations we also identified 89 vus. of these, 11 variants (1 each in brca1, brca2, palb2, rad51c, rad51d, cdh1, and mlh1, and 2 in chek2 and mlh1, respectively) affect possible splicing sites. in addition, 5 synonymous and nonsynonymous variants outside the splicing sites (1 in brca1, brca2 and cdh1, respectively, 2 in rad51d) were not reported in exome variant server or exome aggregation consortium (exac) databases, so far. no families were available to study familial segregation. for all these variants a potential effect on splicing efficiency was predicted by three different computational algorithms (bdgp: splice site prediction by neural network, netgene2 server and the human splice finder (hsf 3.0) algorithm). we took advantage that these genes are ubiquitously expressed to investigate possible effects of these variants on mrna splicing using easily accessible peripheral blood. as mrna is notoriously unstable, we used pax-abstracts was evaluated for both the individual markers and their combinations derived from multiple algorithms. pronounced demethylation of all 3 markers was observed at baseline among cases compared to controls. risk of developing lc increased with decreasing dna methylation levels, with adjusted ors (95% ci) of 15.86 (4.18-60.17), 8.12 (2.69-4.48) , and 10.55 (3.44-32.31 ), respectively, for participants in the lowest quartile of ahrr, 6p21.33, and f2rl3 compared to participants in the highest 2 quartiles of each site among controls. the individual 3 markers exhibited similar accuracy in predicting lc incidence, with aucs ranging from 0.79 to 0.81. combination of the 3 markers did not improve the predictive performance (auc = 0.80). 
the individual markers or their combination outperformed self-reported smoking exposure, particularly in light smokers. no variation in risk prediction was identified with respect to age, follow-up time, and histological subtypes. ahrr, 6p21.33, and f2rl3 methylation in blood dna is predictive of lc development, which might be useful for identification of risk groups for further specific lc screening, such as ct examination. over the past decades the search for disease causing variants has been focusing exclusively on the coding genome. this highly selective approach has been extremely successful; however, recent data have revealed the importance of the non-coding genome in fundamental processes such as gene regulation and 3d chromatin folding, and have pinpointed its role in disease. in this study, we systematically investigate the cis-regulatory landscape of pitx1, a homeodomain transcription factor that is exclusively expressed in the hindlimb. mutations and non-coding structural variations at the pitx1 locus have been shown to associate with a variety of congenital limb defects including club feet, polydactyly, and arm-to-leg transformation (liebenberg syndrome). we performed in vivo enhancer reporter assays in transgenic mice and identified several limb enhancer elements at the pitx1 locus; surprisingly they all showed both forelimb and hindlimb activity, although pitx1 is never expressed in the forelimb. capture hi-c experiments revealed a hindlimb-specific chromatin organization at the pitx1 locus, which enables its promoter to contact several enhancers bearing a pan-limb activity only in the hindlimb. this tissue-specific chromatin folding plays a decisive role in refining the unspecific limb regulatory landscape toward a highly controlled and hindlimb delimited transcriptional output. 
to gain a better understanding of the pathology of pitx1 associated limb defects, we used crispr/cas9 to generate a set of deletions and inversions in the pitx1 cis-regulatory landscape in mice. genetic perturbations of the regulated 3d chromatin conformation lead to an ectopic forelimb expression of pitx1, resulting in an arm-to-leg transformation in mice and in human patients respectively. our data further highlight the role of non-coding mutations affecting chromatin folding in congenital disease and give new insights into the regulation of pitx1 during development and the pathomechanism of associated limb defects. hoxb13, a member of the embryonic homeobox transcription regulators, has been identified as the first susceptibility gene specific for prostate cancer (prca). the founder missense mutation g84e, which likely originated from finland, can be found in most populations of european ancestry. we determined the frequency of hoxb13 g84e for the german population, assessed in a cohort of 379 unrelated cases, each with positive family history of prca, 367 sporadic prca cases and in 1015 controls. additional 646 affected relatives from prca families were included to explore association with aggressive disease in subgroups with high gleason score (>7), advanced tumor stage, or psa at diagnosis >20 ng/ml. carriers of g84e were rare in controls (0.4%) and showed increased frequencies in both sporadic (1.6%) and familial prca cases (3.2%). estimated risks were or = 4.2 (p = 0.026) and or = 8.3 (p = 0.0003), respectively. the risk effect size increased with the number of affected individuals per pedigree: or = 12.6 (p < 0.0001) for 3 or more, and or = 14.4 (p < 0.0001) for 4 or more affected men. the strongest association with clinical features was observed between g84e and advanced tumor stage (or = 9.2; p < 0.0001). 
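The reported odds ratios for hoxb13 g84e can be checked with a short calculation. The sketch below computes an unadjusted odds ratio from a 2x2 carrier table. The cohort sizes (367 sporadic cases, 379 familial cases, 1015 controls) are taken from the abstract; the carrier counts (6, 12 and 4) are not stated in the source and are an assumption, chosen as the integers that round to the reported carrier frequencies of 1.6%, 3.2% and 0.4%.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from a 2x2 carrier-by-status table."""
    case_non_carriers = case_total - case_carriers
    control_non_carriers = control_total - control_carriers
    return (case_carriers * control_non_carriers) / (
        case_non_carriers * control_carriers
    )


# (carriers, cohort size). Cohort sizes are from the abstract; carrier
# counts are ASSUMED from the rounded frequencies, not reported values.
CONTROLS = (4, 1015)   # 4 / 1015  is about 0.4 %
SPORADIC = (6, 367)    # 6 / 367   is about 1.6 %
FAMILIAL = (12, 379)   # 12 / 379  is about 3.2 %

or_sporadic = odds_ratio(*SPORADIC, *CONTROLS)
or_familial = odds_ratio(*FAMILIAL, *CONTROLS)

print(f"sporadic vs controls: OR = {or_sporadic:.1f}")
print(f"familial vs controls: OR = {or_familial:.1f}")
```

Under these assumed counts the calculation agrees with the estimates quoted in the abstract (4.2 and 8.3). The p-values and the subgroup odds ratios depend on data that the abstract does not give, so they are not reproduced here.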
in conclusion, the observed frequency of hoxb13 g84e mutation carriers in our study cohort was intermediate as compared to the common prevalence in scandinavia and the rare occurrence in mixed european populations from the us. the risk estimates of hoxb13 g84e and the stronger effect sizes in families with increasing number of affected relatives were in line with a high penetrant germline predisposition. the association between g84e status and tumor stage may be of greater interest for clinical practice, but needs further validation. the absolute penetrance of the hoxb13 g84e mutation should be investigated in further studies in order to elucidate its suitability as a genetic predictor for prca. smoking-associated dna methylation markers predict lung cancer incidence homozygous smn1 loss causes spinal muscular atrophy (sma), the most common lethal genetic childhood motor neuron disease. smn1 encodes smn, a ubiquitously expressed housekeeping protein, which makes the primarily motor neuron-specific phenotype rather unexpected. sma individuals harbor low smn expression from one to six smn2 copy genes, which is insufficient to functionally compensate for smn1 loss. however, rarely individuals with homozygous absence of smn1 and only three to four smn2 copies are fully asymptomatic, suggesting protection through genetic modifier(s). previously, we identified plastin 3 (pls3) overexpression as an sma protective modifier in humans and showed that smn deficit impairs endocytosis, which is rescued by pls3 overexpression. here, we identify reduction of the neuronal calcium sensor neurocalcin delta (ncald) as a protective sma modifier in five asymptomatic smn1-deleted individuals carrying only four smn2 copies. we demonstrate that ncald is a ca 2+ -dependent negative regulator of endocytosis, as ncald knockdown improves endocytosis in sma models and ameliorates pharmacologically induced endocytosis defects in zebrafish. 
importantly, ncald knockdown effectively ameliorates sma-associated pathological defects across species, including worm, zebrafish and mouse. in conclusion, our study identifies a previously unknown protective sma modifier in humans, demonstrates modifier impact in three different sma animal models and suggests a potential combinatorial therapeutic strategy to efficiently treat sma. since both protective modifiers restore endocytosis, our results confirm that endocytosis is a major cellular mechanism perturbed in sma and emphasize the power of protective modifiers for understanding disease mechanism and developing therapies. mutations affecting coding or regulatory regions of smc2 cause dysregulation of condensins resulting in a phenotype reminiscent of cohesinopathies cornelia de lange syndrome (cdls) is a dominantly inherited malformation syndrome caused by mutations in genes encoding subunits (smc1a, smc3, rad21) or regulators (nipbl, hdac8) of the cohesin complex. this dna-bound complex regulates several chromatin-related processes such as chromosome segregation, dna-damage repair, transcription and chromatin structure. the project presented initially started with two children and their mother who showed clinical features reminiscent of cdls. while various sequencing approaches failed to identify the disease-causing mutation, a 60 kb spanning deletion co-segregating with the phenotype was identified by array-cgh. besides the last exons of cylc2, encoding a sperm head protein, no other genes were affected. subsequent in-silico analyses predicted the existence of a ~3 kb tissue-specific regulatory element within this region, located approximately 1 mb distant from the next protein-coding gene smc2, which encodes a subunit of the cohesin-related condensin complex. significant reduction of smc2 expression was verified in patient's fibroblasts by qpcr analysis. 
accordingly, a strong dysregulation of smc2 was observed in hek293 and sh-sy5y cells deficient for the putative 3 kb regulatory element, which was deleted by crispr/cas9 genome editing. reporter gene assays further highlighted the functional relevance of the identified regulatory element in regulating the smc2 gene promoter. interestingly, we could prove at the protein as well as the mrna level that alterations in smc2 expression are correlated with the dysregulation of other condensin subunits such as smc4 in patient's samples as well as in crispr/cas9-generated cells. in a large exome sequencing project we have identified an smc2 frameshift mutation in an additional family with two patients who show clinical features overlapping with those seen in our initial family. quantitative pcr analyses in fibroblasts of both subjects also showed significant reduction of smc2 and smc4 expression, which is consistent with our findings in the first family. to further investigate whether alterations in condensin gene expression are specific for the dysregulation of smc2, we have decreased smc2 levels in different cell types by sirna. quantitative protein as well as mrna analyses revealed reduced smc4/smc4 expression. our data show for the first time the coordinated expression of different condensin subunits and its relevance for human disease. human pedigree. cardiac valves initially form through a process called endothelial-to-mesenchymal transition (emt) then subsequently elongate and mature during early juvenile life. expression analysis throughout embryonic and postnatal stages of adamts19-/-mice revealed an expression in all cardiac valves after valve formation. high resolution, digital echocardiography showed that mice without adamts19 expression develop dysfunctional aortic valves early in life, reminiscent of the human phenotype. notably, the expression of adamts19 in the valve was restricted to valvular interstitial cells and not observed in endothelial cells. 
functional analysis using proteomic approaches suggests that the presence of adamts19 is necessary to maintain extracellular matrix remodelling during valve development and its maturation. not only do the lof mice fully recapitulate the human phenotype, they also highlight adamts19 as a novel marker for valvular interstitial cells to specifically target initial post-emt processes as well as serve as an important model to understand an ageing valve phenotype in humans. exome sequencing of 55 bipolar disorder patients with rapid cycling implicates novel candidate genes in disease development bipolar disorder (bd) is a severe neuropsychiatric disorder characterized by recurrent episodes of mania and depression. bd has a lifetime prevalence of about 1% and a high heritability of about 70%. although recent genome-wide association studies identified the first susceptibility genes contributing to disease development, the cumulative impact of common alleles with small effect may only explain around 38% of the phenotypic variance (lee et al. 2011). in consequence, rare variants of high penetrance have been suggested to additionally contribute to bd susceptibility. in the present study we focused on bd patients with rapid cycling (rc). rc is a course specifier of bd defined as having at least four recurrent episodes of acute illness within one year. since rc showed strong evidence for familiality, we hypothesized that bd patients with rc might represent a more defined etiological subgroup and that rare variants of high penetrance might contribute to the development of rc in bd patients. we selected 55 unrelated bd patients with rc of german origin and performed exome sequencing using the illumina hiseq2500 platform. for data analysis, the varbank pipeline of the cologne center for genomics was used. 
we filtered for rare (minor allele frequency <0.1%), heterozygous and non-synonymous variants that were predicted to be possibly damaging or disease causing by at least 4 of 5 applied prediction tools. after these filtering steps, we identified a total of 110 different genes which harbored rare functional variants in at least three independent patients. gene set analysis for these genes using consensuspathdb revealed 165 decker 7 , g. nuernberg 8, 9 , d. hassel 2 , g. a. rappold 1, 10 mutations in the homeobox gene shox cause shox deficiency, the most frequent monogenic cause of short stature. the clinical severity of shox deficiency varies widely, ranging from short stature without dysmorphic signs to mesomelic skeletal dysplasia (léri-weill dyschondrosteosis, lwd). in rare cases, individuals with shox deficiency are asymptomatic. to elucidate the factors that modify disease severity/penetrance, we studied a three-generation family with five affected individuals with lwd using whole genome linkage analysis and whole exome sequencing. the variant p.phe508cys of the retinoic acid catabolizing enzyme cyp26c1 co-segregated with the shox variant p.val161ala in the five affected individuals, while the shox mutant alone was present in three asymptomatic individuals. two further independent lwd cases with shox deficiency and damaging cyp26c1 variants were identified. the identified damaging variants in cyp26c1 affected its catabolic activity, leading to an increased level of retinoic acid. we also provide evidence that high levels of retinoic acid significantly decrease shox expression in human primary chondrocytes and zebrafish embryos. individual morpholino knock-down of either gene shortens the pectoral fins, whereas depletion of both genes leads to a more severe phenotype. together our findings demonstrate that shox and cyp26c1 act in a common molecular pathway controlling limb growth and describe cyp26c1 as the first genetic modifier for shox deficiency. 
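The variant filter described for the rapid-cycling bipolar disorder exome cohort (minor allele frequency below 0.1%, heterozygous, non-synonymous, called damaging by at least 4 of 5 prediction tools, then genes hit in at least three independent patients) can be sketched as follows. This is a minimal illustration, not the varbank pipeline: the `Variant` record, the set of consequence labels treated as non-synonymous, and the placeholder gene and patient identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    patient: str          # hypothetical patient identifier
    gene: str             # hypothetical gene label
    maf: float            # minor allele frequency as a fraction (0.001 = 0.1 %)
    zygosity: str         # "het" or "hom"
    consequence: str      # annotation label, e.g. "missense"
    damaging_votes: int   # number of the 5 tools predicting a damaging effect


# ASSUMPTION: which annotation labels count as "non-synonymous".
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


def passes_filter(v: Variant) -> bool:
    """Rare, heterozygous, non-synonymous, damaging in >= 4 of 5 tools."""
    return (
        v.maf < 0.001
        and v.zygosity == "het"
        and v.consequence in NONSYNONYMOUS
        and v.damaging_votes >= 4
    )


def recurrent_genes(variants, min_patients=3):
    """Genes carrying a passing variant in at least `min_patients` patients."""
    patients_by_gene = defaultdict(set)
    for v in variants:
        if passes_filter(v):
            patients_by_gene[v.gene].add(v.patient)
    return sorted(
        gene for gene, patients in patients_by_gene.items()
        if len(patients) >= min_patients
    )


# Toy input with placeholder names; not data from the study.
toy_variants = [
    Variant("P1", "GENE_A", 0.0002, "het", "missense", 5),
    Variant("P2", "GENE_A", 0.0000, "het", "frameshift", 4),
    Variant("P3", "GENE_A", 0.0005, "het", "missense", 4),
    Variant("P1", "GENE_B", 0.0001, "het", "missense", 5),
    Variant("P2", "GENE_B", 0.0001, "het", "missense", 4),
    Variant("P3", "GENE_B", 0.0001, "het", "synonymous", 0),  # fails the filter
    Variant("P1", "GENE_C", 0.0001, "het", "missense", 5),
    Variant("P1", "GENE_C", 0.0003, "het", "nonsense", 5),    # same patient
]

print(recurrent_genes(toy_variants))
```

Counting patients as a set rather than counting variants keeps a gene with several hits in one patient from being mistaken for a recurrently affected gene.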
heart valve dysfunction in men and mice is caused by loss of function mutations in adamts19, a novel marker for valvular interstitial cells from a global perspective, defects of the cardiac valves are one of the most common heart abnormalities in humans, with a substantial number of them requiring surgical intervention at least once in their life. several mechanisms have been proposed ranging from acquired to developmental causes, but thus far the majority cannot be explained on the molecular level. here we report on the identification of a unique human family affected by multiple dysfunctional cardiac valves early in life. genetic screening revealed a homozygous deletion of the first eight exons in adamts19, a novel candidate gene for valvular heart defects. to investigate its role in heart valve development, we designed a transgenic mouse model that reconstitutes the loss of function (lof) in adamts19 found in the statistically analyzing de novo mutations identified in >5,000 id patients highlighted ppm1d as a candidate id gene. ppm1d is a type 2c phosphatase that functions as a negative regulator of cell stress response pathways by mediating a feedback loop of p38-p53 signaling, thereby contributing to growth inhibition and suppression of stress induced apoptosis. we identified 14 patients with mild-moderate id and a de novo truncating ppm1d mutation. deep-phenotyping of the patients revealed in addition to id overlap for behavioural problems (adhd and anxiety disorder), hypotonia, broad based gait, facial dysmorphisms and periods of fever and vomiting. ppm1d is shown to be expressed during fetal (brain) development and in the adult brain. all mutations were located in the last, or penultimate exon, suggestive of escaping nonsense-mediated mrna decay. both ppm1d expression analysis and cdna sequencing in patient ebv-lcls support the presence of a stable, but truncated transcript, consistent with this hypothesis. 
exposure of patient's cells to ionizing radiation resulted in normal p53 activation suggesting that p53 signaling is not affected by the truncated protein. however, a cell growth disadvantage was observed. thus, we show that de novo truncating ppm1d mutations in the last and penultimate exon cause syndromic id which provides novel insights in the role of cell cycle checkpoint genes in neurodevelopmental disorders. de novo truncating variants in asxl2 are associated with a unique and recognizable clinical phenotype harvard stem cell institute, department of stem cell and regenerative enriched pathways (q < 0.05) including actin cytoskeleton and calcium ion binding. subsequently we applied the residual variation intolerance score (rvis) and identified 41 genes which were ranked among the 25% most intolerant genes in the genome. these genes included the previously reported genome-wide significant bd risk genes syne1 and mll2. in addition, we identified novel, promising candidate genes which have not previously been implicated in bd development such as ryanodine receptor 3 (ryr3, affected in six patients) and huntingtin (htt, 4 patients). both genes are ranked among the 0.5% most intolerant genes of the genome. ryr3 encodes a brain expressed intracellular cation channel that mediates the rapid release of ca2+ from the endoplasmic reticulum, thus making it a highly plausible candidate gene for contributing to rc. abnormal expansion of a trinucleotide repeat in the htt gene causes huntington disease which is a neurodegenerative disease characterized by motor, cognitive and psychiatric symptoms. the seven most promising genes are currently being followed up by resequencing in larger cohorts of 2500 independent bd cases (including 250 patients with rc) and 2500 controls of european ancestry using the single molecule molecular inversion probes (smmips) technology. 
de novo truncating mutations in the last and penultimate exon of ppm1d cause a novel intellectual disability syndrome abstracts with gα. signaling properties of g protein complexes carrying mutant gβ1 subunits were further analyzed by their ability to couple to dopamine d1r receptors by real-time bioluminescence resonance energy transfer (bret) assays. these studies revealed altered functionality of the missense mutations r52g, g64v, a92t, p94s, p96l, a106t, and d118g but not for l30f, h91r, and k337q. in conclusion, we demonstrate a pathogenic role of de novo and autosomal dominant mutations in gnb1 as a cause of gdd and provide functional evidence for a loss-of-function mechanism underlying the disease. comprehensive phenotyping and trio-exome analysis of 50 children with neurodevelopmental disease whole exome sequencing (wes) has been proven as a powerful analytical tool to dissect the genetic basis of human hereditary disorders. here, we report on a prospective deep phenotyping and trio-wes study of 50 children affected by previously undiagnosed and diverse complex neuropediatric disorders. all children underwent a standardised and comprehensive clinical work-up in a single centre that included detailed clinical evaluations by pediatricians and clinical geneticists, extensive laboratory and metabolic analyses, analyses of cerebrospinal fluid, mri of the brain and eeg, followed by trio-wes analysis. this systematic approach allowed to identify a pathogenic mutation in a known disease gene in altogether 21 children (42%) and discovered a convincing candidate disease gene in additional 22 children (44%). taken together, this translates into a successful genetic diagnosis of up to 86% in this cohort. in 3 children with mutations in a known disease gene (3/21 = 14.3%) the molecular diagnosis substantially influenced the clinical management and drug treatment. we further document an expansion of the phenotype in known disease entities in 4 individuals. 
the extraordinary high gene discovery rate in our cohort emphasizes the potential of trio-wes even in a clinically inhomogeneous group of individuals with likely genetic disease. however, this requires a multidisciplinary approach including deep and sometimes reverse phenotyping, research-based interpretation of trio-wes identified genetic alterations, extensive review of the literature, use of several mutation prediction and protein-modelling tools, as well as openness and exchange of data with national and international researchers and clinicians working on similar diseases. exome sequencing of pooled dna samples for large-scale screening in individuals with sporadic intellectual disability b. popp, a. ekici, s. uebe, c. thiel, j. hoyer, a. wiesener, a. reis, c. zweier institute of human genetics, fau-erlangen-nürnberg, erlangen, germany high throughput sequencing has enabled identification of many novel disease genes and empowered diagnostic testing for heterogeneous disorders, especially for intellectual disability (id) where more than 1000 genes have been implicated. due to this extreme heterogeneity gene panels are ineffective, and expensive exome or genome sequencing is necessary. furthermore, many affected individuals have to be sequenced to confirm candidate genes and to refine the phenotypic spectrum. we now explored if pooling strategies could satisfy the need for a genome-wide, simple, cheap and fast screening technology. the asxl genes (asxl1, asxl2 and asxl3) participate in body patterning during embryogenesis and encode for proteins that are involved in epigenetic regulation and assembly of transcription factors to specific genomic loci. germline de novo truncating variants in asxl1 and asxl3 have been respectively implicated in causing bohring-opitz and bainbridge-ropers syndromes, resulting in overlapping features of severe intellectual disability and dysmorphic features. 
to our knowledge, asxl2 has not yet been associated with a human mendelian disorder. in this study, we performed whole-exome sequencing in six unrelated probands with developmental delay, macrocephaly, and dysmorphic features. all six had de novo truncating variants in asxl2. a careful review enabled the recognition of a specific phenotype consisting of macrocephaly, prominent eyes, arched eyebrows, hypertelorism, a glabellar nevus flammeus, neonatal feeding difficulties, hypotonia and developmental disabilities. although overlapping features with bohring-opitz and bainbridge-ropers syndromes exist, features that distinguish the asxl2-associated condition from asxl1- and asxl3-related disorders are macrocephaly, absence of growth retardation and more variability in the degree of intellectual disabilities. we were also able to demonstrate with mrna studies that these variants are likely to exert a dominant negative effect, since both alleles are expressed in blood, with the mutated asxl2 transcripts escaping nonsense mediated decay. in conclusion, de novo truncating variants in asxl2 underlie a new neurodevelopmental syndrome, with a clinically recognizable phenotype. this work expands the germline disorders that are linked to the asxl genes. functional characterization of novel gnb1 mutations as a rare cause of global developmental delay over the past years, prioritization strategies that combined the molecular predictors of sequence variants from exomes and genomes of patients with rare mendelian disorders with computer-readable phenotype information became a highly effective method for detecting disease-causing mutations. the drawback of phenotype-based prioritization, however, is that it requires a deep and comprehensive feature description to gain good performance. but in routine diagnostics, the naming of phenotypic features varies among clinicians, and sometimes a comprehensive phenotypic overview is not possible because of missing terminology. 
these gaps can be reduced by including a new layer of phenotypic information using facial recognition technology to detect dysmorphic features from two-dimensional photographs. automated image analysis is in principle able to identify any deviation from the norm and to quantify it objectively. we therefore developed an approach that combines facial dysmorphology novel analysis (fdna) technology with standard phenotypic and genomic features to identify pathogenic mutations in exome data. we have started collecting data from a diverse spectrum of patients with molecularly confirmed diagnoses in a multi-center study, and we present the current results. at the time of abstract submission more than 300 patients from over 10 contributing institutions were evaluated and used for simulation of a training set of exomes. automated facial recognition yields the correct diagnosis amongst the first ten suggested syndromes in more than two thirds of the cases and shows a high correlation with syndrome predictions that were based on expert annotated features. hereby, we could also confirm the diagnosis in cases with only subtle facial features. consequently, we used classical machine learning approaches to integrate scores based on the image analysis, phenotypic description and exome se-after initial evaluation of available computational methods by virtual pooling of exome data or simulated reads using different pooling fractions, we decided to exome sequence 96 individuals with sporadic id in 8 pools of 12 samples each. this was suggested to be the optimal combination with a 90% detection rate. dna was mixed in equimolar concentrations and submitted for exome sequencing. read data was aligned to the human reference, and variants were called using a ploidy of 24. 
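The choice of a ploidy of 24 follows from pooling 12 diploid samples: a heterozygous variant carried by one individual is expected on 1 of 24 chromosomes in the pool. The sketch below works out that expected allele fraction and, under a simple binomial sampling model, the chance of observing a minimum number of variant reads at a given depth. The depths and the read threshold are hypothetical values for illustration; the model ignores sequencing error and unequal pooling, and it is not the simulation the authors used to arrive at their 90% figure.

```python
from math import comb


def expected_allele_fraction(samples_per_pool, ploidy_per_sample=2):
    """Fraction of pool chromosomes carrying a variant that is
    heterozygous in exactly one pooled individual."""
    return 1 / (samples_per_pool * ploidy_per_sample)


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) when each of `depth`
    reads independently carries the variant with `allele_fraction`."""
    p_miss = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_miss


SAMPLES_PER_POOL = 12                    # from the abstract
pool_ploidy = SAMPLES_PER_POOL * 2       # 24, the ploidy used for calling
af = expected_allele_fraction(SAMPLES_PER_POOL)

print(f"pool ploidy: {pool_ploidy}, expected allele fraction: {af:.3f}")
for depth in (50, 100, 200):             # HYPOTHETICAL coverage values
    p = detection_probability(depth, af, min_alt_reads=3)  # hypothetical cutoff
    print(f"depth {depth:>3}: P(>= 3 variant reads) = {p:.2f}")
```

Because the expected allele fraction is only about 4%, the probability of seeing enough supporting reads depends strongly on depth, which is why the pool size and coverage have to be chosen together.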
resulting variant calls in 893 known id genes (sysid database) were then filtered for loss-of-function (lof) variants and for missense variants that were either previously reported as pathogenic or computationally predicted to be deleterious. furthermore, we screened 523 id candidate genes and 1694 haploinsufficiency intolerant genes for lof variants. subsequently, sanger sequencing was used to determine the individual carrying each variant in the respective pool and to test segregation in the parents. this approach resulted in the identification of 15 pathogenic variants (assumed or confirmed de novo) in known id genes (ahdc1, ankrd11, atp6v1b2, cask, chd8, kcnq2, kmt2a, kras, med13l, rit1, setd5, tcf4, wac, zbtb18), two pathogenic variants inherited from a symptomatic or healthy parent, respectively (zmynd11, ifih1), and a homozygous variant in the recessive trappc11 gene. this included 13 loss-of-function and 5 missense variants. additionally, we identified 4 de novo variants in candidate genes. in our id cohort this resulted in a high mutation detection rate of 23%. thus, detection of rare variants from exome sequenced dna pools (pool-seq) is feasible and has a high detection rate similar to other screening approaches. compared to affected-only exome sequencing this method can reduce costs by more than 90% with only marginal increase in sanger-sequencing costs and significantly speed up wet lab work with an acceptable increase in computational complexity. in contrast to targeted sequencing methods like molecular inversion probes or hybridization-based panels, our method has the advantage of allowing flexible re-analysis of the same data for new genes. in conclusion, we established exome pool-seq as a method for large-scale, cost-efficient and flexible sequencing in highly heterogeneous but well characterized disorders like id. 
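The headline figures of the pool-seq abstract can be retraced from the counts it reports. The sketch below tallies the variant classes, derives the detection rate for the 96 individuals, and compares the number of exome libraries with and without pooling. Two points are assumptions rather than statements from the source: that the 23% counts every listed variant as one solved or candidate individual, and that sequencing cost scales with the number of exome libraries (Sanger follow-up is left out).

```python
COHORT_SIZE = 96   # individuals with sporadic id, from the abstract
POOLS = 8          # 8 pools of 12 samples each, from the abstract

# Variant counts as reported in the abstract.
findings = {
    "pathogenic, assumed or confirmed de novo, known id genes": 15,
    "pathogenic, inherited from a parent": 2,
    "homozygous variant in a recessive gene": 1,
    "de novo variants in candidate genes": 4,
}

total_findings = sum(findings.values())
pathogenic_in_known_genes = total_findings - findings[
    "de novo variants in candidate genes"
]

# ASSUMPTION: one finding per individual.
detection_rate = total_findings / COHORT_SIZE

# ASSUMPTION: cost is proportional to the number of exome libraries.
library_reduction = 1 - POOLS / COHORT_SIZE

print(f"findings: {total_findings} of {COHORT_SIZE} "
      f"({detection_rate:.1%} detection rate)")
print(f"pathogenic variants in known genes: {pathogenic_in_known_genes}")
print(f"exome libraries: {POOLS} instead of {COHORT_SIZE} "
      f"({library_reduction:.1%} fewer)")
```

Under these assumptions the tally is consistent with the abstract: 18 pathogenic variants in known genes match the stated 13 loss-of-function plus 5 missense variants, and 22 findings in 96 individuals round to the reported 23%.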
three years of experience with targeted next-generation sequencing of developmental delay next-generation sequencing (ngs) has opened up new possibilities especially in the search for disease-causing mutations in disorders with common clinical features but a heterogeneous genetic background. the identification of the underlying genetic defect provides a clear diagnosis for patients, increasingly influencing their management and occasionally even their therapy, and it is the prerequisite for prenatal or preimplantation decisions in the affected family. ngs panels are used widely in clinical settings to identify genetic causes of various monogenic disease groups, such as intellectual disability (hu et al. 2015), neurodevelopmental and neuromuscular disorders, among others. however, many new challenges have been introduced both at the technical level and at the bioinformatic level, with consequences including new requirements for interpretation of results, and for genetic counseling. we report on our experience with a targeted ngs panel comprising over 1200 brain related genes (mpimg-1-test) in the routine clinical diagnostics of patients with syndromic and non-syndromic forms of developmental delay as well as patients with neuromuscular disorders. 202 patients (age 1-71; mean 16) with syndromic (s) or non-syndromic (ns) developmental delay or with neuromuscular symptoms (nm), seen at the genetic counselling unit of our institute, were analyzed with targeted exon enrichment and ngs. chromosomal re-arrangements and copy number variations were excluded in all 202 patients previously by conven-it was recently shown that clonal hematopoiesis can be driven by somatic point mutations. these acquired mutations occur with normal aging in up to 10% of older (>65y) individuals (1) (2) (3), with few reports in younger individuals. 
here we present a targeted re-sequencing assay that combines high throughput with ultra-high sensitivity based on single-molecule molecular inversion probes (smmip) (1) . we have now analyzed dna from 2000 healthy blood donors from 5 different age groups (20-29; 30-39; 40-49; 50-59 and 60-69y) , with no previous diagnosis of cancer, for somatic mutation in 100 loci. those loci included 50 known drivers of clonal hematopoiesis (2) (3) (4) and 50 novel or candidate loci. the improved assay allows low-frequency variant detection with variant levels down to <0.1%. this improved sensitivity allowed the identification of somatic mutations in a limited set of loci in >15% of old individuals, but also report those mutations in individuals of the youngest age group. most prevalent mutations include known hotspot mutations in dnmt3a and asxl1. here we show that somatic drivers of clonal hematopoiesis are more prevalent and occur in younger individuals than previously reported. these somatic events are age-related. however, the high prevalence and their occurrence in relatively young individuals implicates their origin as a common biological process involved in normal aging. a. m. nissen, j. graf, c. rapp, m. locher, a. laner, a. benet-pagès, e. holinski-feder medizinisch genetisches zentrum -mgz, munich, germany gene dosage abnormalities account for a significant proportion of pathogenic mutations in rare genetic disease related genes. in times of next generation sequencing (ngs), a single analysis approach to detect snvs and cnvs from the same data source would be of great benefit for routine diagnostics. however, cnv detection from exon-capture ngs data has no standard methods or quality measures so far. current bioinformatics tools depend solely on read depth which is systematically biased. we developed a novel approach based on: 1. utilization of five independent detection tools to increase sensitivity, 2. 
different reference sets for different kits and normalization against samples from the same sequencing run to improve robustness against workflow conditions, 3. definition of special quality thresholds for single exon events to minimize false negatives, 4. identification of reliable regions by assessment of capture efficiency using a reference set of cnv negative patients to minimize false positives. a cnv quence of the patients and could predict the pathogenic mutation among the top 10 positions in a prioritized exome in more than 90% of the monogenic cases in our cohort. hence, our results show that computer-assisted facial recognition is not only a promising technology that could be applied in the routine diagnostic workflow, but also a technology that allows diagnosis in cases with non-typical clinical presentation and boosts the diagnostic yield in exome studies. the added value of rapid exome sequencing in critical clinical situations for critical clinical situations, turnaround times (tats) of exome sequencing need to be fast in order to have an impact on clinical decision making. we therefore set out to develop a fast exome sequencing approach (max. 14 days). urgent exomes are preferably sequenced as trios to enable de novo analysis and assist data interpretation. dna library preparation is performed using the sureselect qxt protocol (v5, agilent), and sequencing is done on a nextseq500 (illumina) with a high coverage (200-300x). automated file handling allows rapid bwa mapping, gatk variant calling and annotation. a total of one-hundred samples have been sequenced until now using this rapid procedure: 28 trio's, 1x mother and child, 2x parents plus 2 children, and 6 single cases. six trios with known aberrations were used for experimental setup. of the remaining 31 families, in 14 families (possible) pathogenic snvs were identified, of which some still need further follow up, whereas 17 families remained negative after inspection of snvs and small indels. 
for cnv analysis, a trio-based reference-free cnv approach is still under development. preliminary data show that all control cnvs (53kb-6mb) are detected correctly, and retrospective cnv analyses of the other samples identified three possibly de novo cnvs that need further follow up. shorter tats were already beneficial for some patients, e.g. an adult male suffering from myelofibrosis and autoinflammatory symptoms. a sting-like phenotype (sting = stimulator of ifn genes) was suspected, with a possible involvement of the jak/stat pathway. urgent exome sequencing was performed and results were available within 9 days. interestingly, both a somatic variant in mpl (thrombopoietin receptor, associated with myelofibrosis) and a heterozygous variant in acp5 (trap, known immune dysregulation disorder) were identified, both fitting the patient's phenotype. based on these results the medication of the patient was changed, resulting in a substantial improvement of the patient's constitution. in conclusion, we have implemented a rapid exome sequencing workflow for urgent cases. the rapid identification of pathogenic variants already had implications for patient treatment, underlining the added value of a fast genetic diagnosis.
ement insertion, however, was spliced into the mrna of nabp1, leading to a frameshift mutation and a premature stop codon, potentially altering or abolishing gene function. in summary, we have shown that transposon insertions, both common variants as well as rare or de novo variants, can be detected in wes data. such insertions in coding or regulatory regions of disease-relevant genes might therefore explain some of the cases in which no pathogenic coding mutation can be identified by wes.
the influence of human genetic variation on epstein-barr virus sequence diversity: a genome-to-genome approach c. hammer 1, 2 , a. loetscher 3, 4 , em.
zdobnov 3, 4 genome-wide association studies (gwas) have identified common genetic polymorphisms that associate with clinical manifestation and immune response parameters of various infections. we here present an alternative approach, using variation in the virus sequence as phenotype, which is specific by nature and unique to genomic research in infectious diseases, for genome-to-genome (g2g) association studies. building on the unprecedented possibility to combine large-scale human and viral genomic data, we explored interactions between human genetic variation and viral sequence diversity in individuals infected with epstein-barr virus (ebv). the major goal is the identification of key genetic players in the evolutionary 'arms race' between pathogen and host. ebv is the pathogenic agent of infectious mononucleosis and is associated with a broad spectrum of lymphoid and epithelial malignancies, including lymphomas and nasopharyngeal carcinomas. there is also evidence for a role of ebv in the pathoetiology of multiple sclerosis. its genome is approximately 190 kbp long and encodes around 80 proteins, not all of which have been definitely identified or characterized. it is known that high loads of ebv are present in patients with advanced human immunodeficiency virus (hiv)-induced immunodeficiency. we therefore selected 780 immunosuppressed patients included in the swiss hiv cohort study (shcs) with low cd4+ t cell counts, and quantified ebv copy number in peripheral blood mononuclear cells (pbmcs). 290 cell samples contained more than 2,000 viral copies in total and were subjected to target isolation and subsequent enrichment using the sureselect method by agilent biotechnologies, followed by illumina whole-genome sequencing. after data processing and quality control, variable amino acids were called as binary variables, resulting in >200 variable positions per individual in average. 
the same patients also underwent genome-wide genotyping to obtain host genetic variation, followed by imputation based on the haplotype reference consortium reference panel. the association analyses are currently ongoing, and we will present the results at the conference. we use logistic regression to test for association between host single nucleotide polymorphisms (snps) and binary ebv amino acid variants. bonferroni correction is applied for multiple testing correction on the sides of both host and pathogen. stratification is taken into consideration by including principal components (pcs) for the host, and phylogenetic pcs for the virus. this project will offer a global description of the adaptive forces acting on ebv during natural infection. we have shown before for hiv that a virus genome associates much more strongly with human genetic variants than clinical endpoints. the analysis of all signals resulting from the interaction between human and viral genomes has the potential to identify novel host defense mechanisms, which could serve as future diagnostic and therapeutic targets. is called in a reliable region if at least two out of five tools are concordant for the respective cnv. the pipeline shows a sensitivity of 80% and a precision of 95%. within routine gene panel diagnostics we analyzed a total of 1088 patients indicated to have rare mendelian diseases for snv and cnvs. in 32 patients a cnv was detected in genes associated with the respective individual phenotype. interestingly, in several cases the cnv completed the patients report as it was detected in genes with a recessive mode of inheritance where previously only a heterozygous pathogenic snv was found. overall, with the additional analysis of cnvs we increased the diagnostic yield from 15% (class 4, 5 single nucleotide events) to 18%. however, there are still issues in the detection of cnvs from ngs data for routine diagnostics. 
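the concordance rule described for the cnv pipeline (a call is accepted in a reliable region when at least two of the five tools agree) can be illustrated with a simple vote. this is a minimal sketch and not the authors' implementation: the tool names, the example calls and the choice to treat two calls as concordant only when their (gene, exon, type) keys match exactly are all assumptions made for illustration.

```python
from collections import Counter


def consensus_cnvs(calls_by_tool, min_tools=2):
    """Return the CNV calls reported by at least `min_tools` tools.

    calls_by_tool maps a tool name to a collection of calls. Each call is a
    hashable key such as (gene, exon, "del" or "dup"). Exact key matching as
    the definition of concordance is an assumption of this sketch.
    """
    votes = Counter()
    for calls in calls_by_tool.values():
        votes.update(set(calls))  # each tool votes at most once per call
    return {call for call, n in votes.items() if n >= min_tools}


# Hypothetical output of five placeholder tools for one sample.
calls = {
    "tool_a": {("gene_x", 11, "del"), ("gene_y", 3, "dup")},
    "tool_b": {("gene_x", 11, "del")},
    "tool_c": set(),
    "tool_d": {("gene_z", 7, "del")},
    "tool_e": set(),
}

accepted = consensus_cnvs(calls)  # only the gene_x deletion has two votes
```

raising `min_tools` trades sensitivity for precision; the threshold of two is the value stated in the abstract.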
cnv pipelines are very prone to errors caused by enrichment inconsistencies compared to snv detection tools. the assessment of sensitivity and specificity is difficult due to the lack of datasets to validate cnv detection pipelines. originally, the analysis of cnvs was performed mainly in patients with mental retardation disorders, resulting in a paucity of cnv data linked to other mendelian diseases. moreover, the identification of the actual size and thus the assessment of pathogenicity of a cnv is difficult, because targeted ngs gene panels do not cover all genes. in conclusion, ngs data is a suitable data source for the simultaneous detection of snvs and cnvs for clinical diagnosis; however, with the current tools it is only applicable in accurately validated regions. identification of transposon insertions in whole-exome sequencing data s. lukassen, n. übelmesser, ab. ekici, u. hüffmeier, ct. thiel, c. zweier, a. winterpacht humangenetisches institut, erlangen, germany 45% of the human genome consists of transposable element derived sequences, the most abundant of which are l1 and alu elements, followed by endogenous retroviruses. several hundred of these elements remain active, leading to insertion frequencies of up to one in 20 live births for alu elements and posing a threat to genome integrity. while most studies on transposons employ whole-genome sequencing (wgs) or target-enrichment based sequencing approaches, the most commonly used form of diagnostic high-throughput sequencing is currently whole-exome sequencing (wes). we were therefore interested in investigating transposon insertions in wes data as a possible source of disease causing mutations. we developed a software to call non-reference transposon insertions from single-end wes datasets by split-read mapping and analyzed 385 exomes this way. on average, 188 non-reference insertions were identified in each exome, with an average of 1.7 sites per patient identified in < = 0.5% of other patients. 
of these rare variants, 91% were deemed plausible by visual inspection. automated confidence calls of the software were concordant with visual inspection in 87% of cases. in 5% of cases a plausible insertion was awarded a lower score by the algorithm and in another 5% not called at all. in 1% of cases the automated call appeared to be falsely positive, in another 1% at the wrong position within the same 100bp window. laboratory validation of 11 convincing insertions revealed a 73% true positive rate, leading to an estimated specificity of 67%. when performing calls for reference l1 insertions on 10 exomes, 65% (49% -70%) of known elements whose flanking regions were covered by at least two reads were correctly identified, leading to a sensitivity of 65%. we thus estimate the average number of non-reference transposon insertions in our wes dataset to be 194 (153-235). 67% and 22.2% of sites identified were associated with alu and l1 elements, respectively, with the majority of calls stemming from evolutionary young transposons still assumed to be active. 10.3% of sites were located within the cds, 3.4% in the utrs of genes, 0.5% spanned an intron/ exon border, 48.5% were intronic and 37.3% of insertions were found in intergenic regions. we then chose 8 insertions within intronic (7) or utr (1) regions for further analysis. seven were not detected in the mrna. one intronic alu el-abstracts ws6-002 novel insights into male-pattern baldness pathobiology via integration of differential hair follicle mirna and mrna expression profiles with gwas data male pattern baldness (mbp) is a highly heritable condition and the most common form of hair loss in men. the phenotype is characterized by a distinct pattern of androgen-dependent progressive hair loss from the scalp that is restricted to hair follicles (hf) in the frontal and vertex scalp area. 
the molecular mechanisms that underlie this characteristic pattern and the differences in androgen sensitivity between hf subpopulations in the frontal/vertex and the occipital scalp remain, however, elusive. to gain novel insights into the underlying biology and contributing genes and pathways, we systematically investigated differential expression (de) of mirna and mrna genes in hf samples from the frontal and occipital scalp area of 24 healthy male donors. array-based genome-wide mirna and mrna profiling revealed expression of 823 mirnas and 21,247 mrnas in human hf, of which 144 mirnas (17%) and 3,230 mrnas (15%) showed de between hf subpopulations. the strongest de mirnas included mir-4674, mir-6075 and mir-3185. among the strongest de mrnas were the wnt-signaling inhibitor dkk1, the protein kinase pak1 and the retinoic acid receptor-related orphan receptor rora. a subsequent pathway-based analysis in mirpathdb revealed that de mirnas targeted numerous interesting pathways. among them were the wnt and mtor signaling pathways, which have been implicated in the control of hair follicle cycling, a mechanism that is disturbed in mpb-affected hf, and other plausible candidate pathways such as estrogen signaling, thyroid hormone signaling or epidermal growth factor binding, which have not yet been implicated in mpb pathobiology. to yield further evidence for an involvement of de mirnas and mrnas in the development of mpb, we subsequently integrated our expression data with association data from a large gwas meta-analysis on mpb (n = 22,518). of the de mirnas and mrnas, only 1 mirna (mir-193b) and 49 mrnas were located within 1 mb of one of 63 genome-wide significant mpb risk loci. notably, the analysis revealed a co-localization of de mirna, de mrna, and nominally significant association signals (p < 10⁻⁵) at 9 other genomic loci, pointing towards a role of these genomic regions in mpb pathogenesis.
among them is a locus on chromosome 3q22.2 that comprises the genes encoding the ephrin-type-b receptor 1 (ephb1) and the prostaglandin transporter slco2a1. interestingly, ephrins have been shown to be regulated by androgens and to play a role in hf formation, proliferation and hair cycling. moreover, expression of prostaglandin d2, which is transported by slco2a1, has been found to be upregulated in balding scalp, where it inhibits hair growth. in summary, our systematic analysis of differential mirna and mrna expression and the subsequent integration with genetic association data identified 9 novel potential risk loci for mpb and numerous candidate genes and pathways that are likely to play a role in mpb pathogenesis, and it emphasizes the importance of data integration in large-scale omics analyses.
palaeontological genomic analyses have shown that interbreeding between anatomically modern humans and neandertals occurred in europe and asia 40,000-50,000 years ago. approximately 1.5-2% of the modern european and asian genome consists of introgressed dna from neandertals. some of these introgressed regions have been suggested to contribute to several traits and phenotypes including major depression and other mood disorders. in order to further assess the role of neandertal ancestry in cognition and its contribution to genetic risk for psychiatric disorders, we performed genome-wide analyses of neandertal alleles in publicly available psychiatric genomics consortium (pgc) gwas summary statistics with sample sizes ranging from about 6,000 to 293,723 individuals for the following phenotypes: educational attainment, attention deficit hyperactivity disorder (adhd), anorexia nervosa, anxiety disorders, autism spectrum disorder, bipolar disorder, major depressive disorder and schizophrenia.
we estimated the proportion of heritability explained by snps in neandertal introgressed regions using stratified ld score regression (ldsc) and two sets of previously inferred neandertal introgressed regions. in a secondary analysis, we investigated whether specific functional annotations such as 3'utr, promoter regions or histone marks within neandertal regions were significantly associated with selected phenotypes. we identified a modest enrichment of heritability in neandertal introgressed regions in anorexia nervosa, autism spectrum disorder, bipolar disorder and major depressive disorder, although none of the results were statistically significant. several functional annotations, such as h3k4me1 histone marks within neandertal introgressed regions, appeared significantly enriched for snps contributing to the heritability of anorexia nervosa and autism spectrum disorder. in bipolar disorder, dnasei digital genomic footprinting regions, h3k9ac histone marks and super enhancer regions within neandertal regions appeared particularly enriched for heritability. on the other hand, both sets of neandertal regions were slightly depleted of snps contributing to the heritability of schizophrenia. for example, one set of neandertal regions that contained 33% of all analysed snps only contributed to 29% of the variance of risk (standard error: 0.04; p-value: 3.86 × 10 -3 ). in comparison to the rest of the genome, neandertal introgressed regions also contributed less to the heritability of educational attainment, adhd and anxiety disorders, although these findings were not statistically significant. to our knowledge this is the first study to systematically investigate the extent to which snps attributable to neandertal introgressed regions contribute to the heritability of several psychiatric/cognitive phenotypes. we are currently increasing our power to detect snp heritability in neandertal regions by applying the ldsc method to larger pgc datasets. 
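the schizophrenia figures quoted above can be read as an enrichment ratio. in stratified ld score regression, enrichment of an annotation is conventionally the share of snp heritability it explains divided by the share of snps it contains, so a value below 1 indicates depletion. the sketch below only reproduces this arithmetic for the numbers given in the abstract (33% of snps, 29% of the variance of risk); the function name is hypothetical and no standard error or significance test is modelled.

```python
def heritability_enrichment(prop_h2, prop_snps):
    """Enrichment = share of SNP heritability / share of SNPs in the annotation."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Values quoted in the abstract for one set of Neandertal regions.
enrichment = heritability_enrichment(prop_h2=0.29, prop_snps=0.33)

# A ratio below 1 means the regions contribute less heritability than
# expected from their share of SNPs.
is_depleted = enrichment < 1
```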
harbor genes involved in the complement system, high density lipoprotein metabolism or extracellular matrix homeostasis. these pathways are known for their pleiotropic role in other conditions, such as cardiovascular disease, auto-immune diseases and cancer. here we aimed to investigate the extent of overlap between the genetic risk of various complex diseases and traits and the genetic risk for amd.
methods: first, we catalogued 2,331 previously published, genome-wide significant variants associated with 82 complex diseases or traits. next, we computed a genetic score by calculating the (weighted) sum of risk-increasing alleles for each disease or trait. consequently, a higher genetic score indicates that an individual has more risk/trait-increasing alleles of a given disease or trait. for each score, we computed the association with late stage amd using a dataset provided by the international amd genomics consortium (iamdgc) including 16,144 late stage amd cases and 17,832 controls. we also assessed the association of each variant individually with late stage amd risk in order to identify novel disease loci with strong evidence for pleiotropy.
results: nineteen genetic scores of complex diseases and traits were significantly associated with amd risk (fdr < 0.01). most notably, all genetic scores related to autoimmunity were elevated in amd patients (p < 5.85×10⁻⁹), while scores related to cardiovascular disease were reduced in amd patients compared to controls (p < 3.10×10⁻⁵). we also found that the genetic scores of melanoma and related malignancies were higher in amd patients (p < 8.43×10⁻⁵). in addition, 32 out of 2,331 variants, which were used to compute the genetic scores, were significantly associated with amd (fdr < 0.01), implicating 25 novel, pleiotropic loci in amd risk.
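the genetic score described in the methods, a (weighted) sum of risk-increasing alleles, can be sketched as follows. this is an illustration under stated assumptions rather than the authors' code: the abstract does not say which weights were used, so log odds ratios are assumed here as one common choice, and the example dosages and weights are invented.

```python
def genetic_score(dosages, weights=None):
    """Weighted sum of risk-increasing allele counts for one individual.

    dosages: risk-allele counts per variant (0, 1 or 2).
    weights: per-variant weights, e.g. log odds ratios (an assumption);
             if None, every variant gets weight 1, which gives the plain
             count of risk-increasing alleles.
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("need exactly one weight per variant")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical individual genotyped at three variants of one trait.
dosages = [2, 0, 1]
log_odds = [0.10, 0.25, 0.40]

unweighted = genetic_score(dosages)          # counts three risk alleles
weighted = genetic_score(dosages, log_odds)  # 2*0.10 + 0*0.25 + 1*0.40
```

a higher value means more (or more strongly weighted) risk-increasing alleles for that disease or trait, which is the quantity then tested for association with late stage amd.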
conclusion: our findings demonstrate a substantial overlap between the genetic risk of complex diseases/traits and the genetic risk for amd and provide evidence for 25 novel, pleiotropic loci associated with amd. while our findings highlight common disease pathways that may facilitate the development of multi-use drug targets, they also challenge the notion that gene/genome manipulation could be applied in general terms to eradicate risk for a defined complex disease.
worldwide genetic association study of exfoliation syndrome and glaucoma identifies common genetic variants at five new susceptibility loci
exfoliation syndrome (pex), a complex systemic disorder of the extracellular matrix, is the most common cause of secondary glaucoma in the aging population and thus a major cause of blindness globally, affecting 60-70 million subjects worldwide. within a large international collaboration project, a genome-wide association study (gwas) was carried out on 9,035 pex cases and 17,008 controls, recruited from 24 countries across six inhabited continents, with replication in a further independent 4,585 cases and 92,829 controls from 15 countries. significant association (p < 5×10⁻⁸) was observed at seven loci, of which two confirmed the already known associations at genetic markers mapping to the loxl1 and cacna1a genes, and five are new. the five new loci map to chromosomes 13q12 (rs7329408 near flt1-pomp-slc46a3, p = 9.41×10⁻¹⁶), 11q23.3 (rs11827818 near tmem136-arhgef12, p = 1.21×10⁻¹⁰), 6p21 (rs3130283 at agpat1, p = 2.12×10⁻⁹), 3p24 (rs12490863 at rbms3, p = 3.9×10⁻⁸) and 5q23 (rs10072088 near sema6a, p = 3.64×10⁻⁸).
to determine the pathophysiological role of the three most significantly associated loci (13q12, 11q23.3, 6p21), we investigated the expression and localization of the six related genes (flt1, pomp, slc46a3, tmem136, arhgef12 and ag
male-pattern baldness (mpb) is characterized by a progressive hair loss from the frontal and vertex scalp that affects ~80% of men at the age of 80 years. epidemiological studies have shown positive associations between mpb and coronary heart disease (chd) and related phenotypes such as blood pressure (bp), diabetes (dm) or elevated blood lipid levels. the results, however, vary with regard to the associated pattern of hair loss (frontal or vertex) and the assessed endpoint measures for chd. so far, no study has investigated a shared genetic determinant between the traits. using data from the heinz nixdorf recall study (n = 1,675 males) and a large meta-analysis on mpb (n = 22,518), we aimed at a systematic investigation of the association between mpb and chd on (i) an epidemiological and (ii) a genetic level. men with vertex balding showed a higher bmi (β = 1.4 kg/m²), elevated fasting triglyceride levels (β = 8.0 mg/dl) and lower hdl-c levels (β = -2.7 mg/dl). to assess the genetic overlap between mpb and chd, we created a risk score (rs) from 63 mpb lead snps (p < 5×10⁻⁸) and tested for association with chd and related phenotypes. no significant associations were observed. however, an age-stratified analysis revealed a 4% per allele risk increase for chd (hr = 1.04, 95% ci: 0.97-1.17) and a decrease in fasting triglyceride levels (β = -0.5). we next used ld score regression analysis to test for genome-wide genetic correlation between mpb and chd. the analysis revealed no significant correlations with cardiometabolic (n = 3), lipid (n = 4) or metabolic traits (n = 103). finally, to investigate a genetic overlap at single loci, we compared the mpb risk loci with reported gwas signals for chd.
the analysis identified seven overlapping associations between mpb and bp (n = 3); qt-interval length; atrial fibrillation; sudden cardiac arrest; and dm. for the majority of loci, the direction of effect differed between mpb and chd, opposing previous epidemiological findings. positive associations were identified between mpb and diastolic bp (fgf5, 4q21.21) and sudden cardiac arrest (atf1, 12q13.12). interestingly, fgf5 is known to stimulate cell growth and proliferation in multiple cell types, including cardiac myocytes and hair follicle (hf) cells, and atf1 is an hf-expressed regulator of cell growth and differentiation that has been shown to prevent foam cell formation, which suggests that fgf5 and atf1 signaling contribute to both traits. thus, our data support an association between mpb and chd-related phenotypes and suggest that mpb deserves further evaluation as an additional risk factor for chd.
pleiotropic effect of genetic variants associated with complex diseases and traits in age-related macular degeneration
purpose: age-related macular degeneration (amd) is the leading cause of vision loss in western societies and is caused by both environmental and genetic risk factors. with regard to the latter, several associated risk loci
otyping, results of which will be presented at the conference. of note, for nonsyndromic cleft lip with/without cleft palate (nscl/p), the most frequent form of orofacial clefting, 20 risk loci have been detected by gwas so far, with some of them reaching genome-wide or nearly genome-wide significant p-values in samples much smaller than 550 cases. in the imputed nscpo dataset none of the presently known nscl/p risk loci showed a p-value < 10⁻⁵. our data so far confirm previous molecular and epidemiological findings that nscpo is genetically distinct from nscl/p. furthermore, the results indicate that common variants alone might not contribute to the same extent to nscpo as to nscl/p.
the correlation between defects at specific imprinted loci and distinct imprinting disorder (id) was accepted for a long time. however, it is now put into question because of a growing number of patients with multilocus imprinting disturbances (mlid), i. e. the aberrant methylation at more than one imprinted locus. in particular, mlid is present in individuals with silver-russell syndrome (srs) and beckwith-wiedemann syndrome (bws), and it has meanwhile turned out that patients with opposite phenotypes can share common epimutation patterns. on the other hand, mlid always occurs as mosaicism and varies in different tissues of the same individual. interestingly, the majority of mlid carriers show only one specific id phenotype, though loci of other ids are affected in addition to the one specific for the phenotype. we become aware of a growing number of patients with unexpected and even contradictory molecular findings in respect to the clinical diagnosis for referral. amongst others, we detected the srs specific icr1 hypomethylation in 11p15 in two of our patients referred as bws. in the first case, the icr1 hypomethylation was detected only in lymphocytes but was not present in buccal swab dna. the patient only had a slight asymmetry, but showed normal growth and did not exhibit any other feature compatible with bws, nor with srs. the reason for the lack of clinical features is unclear, but is comparable to the observation in monozygotic, but clinically discordant srs and bws twins. here the unaffected twin often carries the epimutation only in lymphocytes whereas the affected one shows the alteration in additional tissues. a reason might be sharing of hematopoietic stem. it can be postulated that the patient presented here is born after an (undetected) twin pregnancy with early loss of the affected twin. in the second case, the initial diagnosis of bws was made due to asymmetry, though overgrowth or other features were not present. 
further clinical ascertainment did not confirm this diagnosis, but growth of the patient was in the lower percentiles, in concordance with the icr1 hypomethylation. these cases as well as further cases in our cohort confirm that there is an urgent need to provide detailed clinical data upon requesting molecular diagnostics for imprinting disorders. in fact, the growing number of patients with unexpected results complicates the interpretation and illustrates the broad phenotypic range, but also provides further insights in the etiology of ids and setting of imprinting marks pat1) by qrt-pcr, immunohistochemical-and western-blot analysis in genotyped ocular tissues of pex and control patients. all six genes displayed moderate mrna expression in all ocular tissues analysed, with highest levels in iris, ciliary body, and retina. however, only pomp showed a trend towards reduced expression in the presence of the rs7329408 risk allele, in both pex and control patients. in general, both mrna and protein expression of pomp and tmem136 were significantly reduced up to 45% (p < 0.005) in anterior segment tissues in pex eyes compared to controls. no differences in mrna and protein expression were detected for the remaining genes analysed. immunofluorescence analysis showed that pomp, a proteasome maturation protein, is ubiquitously expressed in most ocular cell types and that tmem136, a transmembrane protein of unknown function, is primarily localized to endothelial cells of blood vessels and aqueous outflow structures. additionally, protein staining intensities for pomp and tmem136 were markedly reduced in anterior segment tissues of pex eyes compared to controls and co-localized to abnormal accumulation of pex material on ocular surfaces and in blood vessel walls. 
thus, at least two of the newly identified loci provide new biological insights into the pathology of pex syndrome/glaucoma and highlight a role for impaired proteasome function as well as vascular and trabecular endothelial dysfunction in the disease pathogenesis. nonsyndromic cleft palate only -evidence for a limited contribution of common variants in contrast to nonsyndromic cleft lip ± palate cleft palate only (cpo) is a common congenital malformation which might occur as part of a syndrome or in an isolated form, i. e., nonsyndromic cpo (nscpo). nscpo has a prevalence of 1:2500 and is considered multifactorial with genetic as well as environmental factors contributing to the disorder. in a recent study we identified the first genome-wide significant locus for nscpo which has been independently confirmed in another study. in order to discover more nscpo risk loci we performed a genome-wide imputation study with gwas data from 550 case-parent trios with european, asian and african ancestry which was retrieved from db-gap upon approved data access. notably, this gwas dataset had not yet been imputed, and we hypothesized that we can increase power to identify novel genetic associations by increasing the marker density and follow-up of suggestive findings by independent replication. genome-wide genotypes were imputed using impute2 based on 1000 genomes haplotypes, and snps were selected based on info-score > 0.4 and minor allele frequency > 1%. the imputation did not reveal any genome-wide significant snp, however, 83 snps at 26 loci showed p-values < 10 -5 . loci with more than two variants below this threshold (n = 19) were to be replicated using the massarray system (agena bioscience). three independent samples were used: two case/control replication cohorts from central europe (92 cases, 335 controls) and yemen (37 cases, 231 controls), and one european case-parent trio replication cohort (eurocran study; 223 trios). 
in a first round we genotyped 18 snps at eleven loci. one variant, rs6809420 at chr. 3q13, showed p < 0.1 in the replication cohort and, after combining replication and gwas data, resulted in a decrease of the p-value from 1.14×10^-3 to 3.22×10^-4. this indicates that this locus, which includes candidate genes such as igs11, a known cell-adhesion molecule with yet unknown function in craniofacial development, might harbour a common risk variant with low effect size. we are currently performing a second round of gen-

we report biallelic mutations in cad, encoding an enzyme of de novo pyrimidine biosynthesis, in four patients with developmental disability, epileptic encephalopathy, anaemia, and anisopoikilocytosis. two children died after a neurodegenerative disease course. treatment of two surviving children with oral uridine led to immediate cessation of seizures in both. a four-year-old girl, who was previously in a minimally conscious state, started to communicate and walk with assistance after nine weeks of treatment. a three-year-old girl likewise showed developmental progress. blood smears normalised and anaemia resolved. our findings support the efficacy of uridine supplementation, rendering cad deficiency a treatable neurometabolic disorder.

delineation of the grin2a phenotypic spectrum

alterations of the n-methyl-d-aspartate (nmda) receptor subunit glun2a, encoded by the gene grin2a, have been associated with a spectrum of neurodevelopmental, speech and epilepsy disorders. we identified 48 previously unreported patients with heterozygous pathogenic variants in grin2a, including 30 novel variants. after re-evaluation of all published grin2a cases, 104 previously reported patients met the acmg criteria for being pathogenic or likely pathogenic. thus, we are able to collectively review genotypes and phenotypes of 152 individuals with grin2a-related disorders.
we show that the known phenotypic spectrum is expanded and ranges from near-normal development to severe and unspecific encephalopathy, comprising any disorder of speech development. furthermore, some patients do not display seizures. in contrast to previous reports, grin2a missense variants cluster within the functionally most relevant domains. we are the first to describe genotype-phenotype correlations in grin2a-related disorders, where carriers of pathogenic missense variants tend to have more severe neurodevelopmental phenotypes compared to carriers of truncating variants. the most severe end of the phenotypic spectrum was found to include novel features, such as infantile spasms and arthrogryposis, and was associated with pathogenic variants in the pore-forming domain of grin2a.

the eponymic name galloway-mowat syndrome (gamos; omim 251300) has been coined for the association of early-onset nephrotic glomerulopathy, microcephaly with variable brain anomalies, and facultative diaphragmatic hernia. it is supposed to be inherited as an autosomal recessive trait, and clinical as well as genetic heterogeneity has been suggested. in 2014, wdr73 mutations were identified as a cause of gamos, but only a few cases have been reported to date. over the last 15 years, we have collected dna samples and clinical data from 56 unrelated families with one or more children affected by gamos or a gamos-like syndrome (glomerulopathy plus variable anomalies of brain morphology or function as inclusion criteria), including 15 consanguineous families. in this cohort, we performed whole exome sequencing followed by targeted analysis by sanger and ngs multigene panel resequencing. in a total of 25 families of this cohort (46%) the probable underlying genetic defect could be identified. in affected individuals from two consanguineous families, homozygous mutations of wdr73 could be found (vodopiutz et al., 2015). thus, this gene accounted for only 4% of cases of our cohort.
the affected child of another family had a novel homozygous mutation in arhgdia. this gene has previously been described in three families to cause early-onset steroid-resistant nephrotic syndrome (gupta et al., 2013; gee et al., 2013) , but there is some evidence that non-specific brain anomalies may also be part of the arhgdia-associated phenotype. fourteen and three index patients from unrelated families had mutations in one autosomal (osgep) and one x-linked gene (lage3), respectively, both encoding for components of the keops protein complex that has been implicated in transcription, telomere maintenance and chromosome segregation. no human phenotype has previously been assigned to mutations in this complex. notably, eight unrelated families with an identical mutation originated from the east asian population where the carrier frequency for this allele is 0.0008. in one consanguineous family with multiple affected children the disease segregated with a homozygous mutation in the sgpl1 gene encoding for sphingosine-1-phosphate lyase. in four families, the kidney phenotype could be attributed to mutations in genes for non-syndromic nephrosis (nphs1, plce1, one novel gene), while the brain phenotype was apparently independent. in conclusion, the molecular genetic findings in this cohort confirmed that gamos is exceedingly heterogeneous, and still in almost half of the patients with a gamos-like phenotype the genetic cause remained unclear. on the basis of our findings we are now able to define new biologic mechanisms that are critically involved in both, brain development and integrity of the glomerular filtration barrier. genotype phenotype correlations are emerging. finally, we demonstrate that gamos can also be inherited as an x-linked trait. abstracts taminase 1 gene (tgm1) is mutated in the majority of patients (around 30%), and its gene product, tgase1, is therefore primarily targeted in our approach for protein substitution. 
patients with arci have an impaired skin barrier function, most of them are born with a collodion membrane and suffer subsequently from varying degrees of hyperkeratosis, erythema, transepidermal water loss and infections. the disease can be life threatening neonatally but lacks a causative therapy and is still only treated symptomatically. therefore, our aim is to develop a personalised, causative therapy where the defective protein is substituted topically via a nanocarrier. therapeutic, human tgase1 was synthesized in hek 293 cells and assessed by western blot and flow cytometry analysis. enzyme activity was measured by in vitro assay. tgase1 was then coupled to a polyglycerol-based nanogel (dpg-ng) containing the thermoresponsive linker poly(n-isopropyl)acrylamide (pnipam), stabilising the enzyme as well as adding a thermal protein release trigger at 35°c, which is favorable for cutaneous applications. immunocytochemical stainings for tgase1 on monolayered basal keratinocytes that lack tgm1 expression confirmed the successful uptake of extrinsic tgase1 into the cells. further analysis over time showed that the enzyme was no longer detectable after 48 h and consequently led us to define a treatment schedule for the following experiments. 3d full thickness skin models were used as in vitro system to determine barrier function and enzyme activity after treatment with varying concentrations of the dpg-ng/tgase1 complex. three different sets of skin models were used for these experiments: normal models mimicking the healthy skin with an intact barrier function, models where tgm1 was knocked down, and models made with arci patient cells with tgm1 mutations. franz cell tests on treated skin models lacking intrinsic tgase1 confirmed the impaired barrier activity in disease models, demonstrated an improved barrier function after repeated treatments with dpg-ng/ tgase1 and showed restored tgase1 activity using an in situ assay. 
furthermore, first toxicity tests using mtt revealed high biocompatibility of dpg-ng/tgase1 after treatment of 2d and 3d cell cultures. these findings are successful steps for an advanced topical drug delivery system and are a promising approach for causative therapeutic intervention in arci. after further optimization concerning protein dosage and thorough toxicity tests, we will adapt this system also for the use with other proteins involved in arci. pigmentation disorders (pds) comprise a large group of rare and heterogeneous disorders that are mainly characterized by various coloration abnormalities affecting single parts of the body or the complete integument. the large group of pds includes the autosomal dominant inherited hyperpigmentation disorder dowling-degos disease (ddd). ddd is genetically heterogeneous, and to date causal mutations in three genes, namely krt5, pofut1 and poglut1 have been identified. after exclusion of mutations in these genes, we performed exome-and sanger-sequencing in six unrelated ddd-patients/families and identified six heterozygous truncating mutations in psenen encoding the presenilin enhancer protein 2. on closer examination of the histological sections, we came upon a novel feature that distinguished these individuals from previous ddd-cases by the presence of follicular hyperkeratosis. to assess the functional significance of psenen mutations in ddd pathogenesis, we performed mammalian cell culture based studies and knockdown experiments of psenen homolog psenen in zebrafish larvae (zfl). knockdown of psenen in zfl resulted in a phenotype with scattered pigmentation, which mimicked human ddd. in vivo-monitoring of pigment cells in the developing zfl suggested that disturbances in melanocyte migration and differentiation underlie ddd pathogenesis. interestingly, six of the psenen mutation carriers presented with co-morbid acne inversa (ai), an inflammatory hair follicle disorder. 
all individuals had a history of nicotine abuse and/or obesity, which are known trigger-factors for ai. although psenen mutations have been identified in a small subset (<1%) of familial ai previously and the co-manifestation of ddd and ai has been reported for decades, our study is the first to demonstrate experimentally that mutations in psenen indeed can cause co-manifestation of ddd and ai, most likely triggered by predisposing factors for ai. thus, the present report describes a clinically and histopathologically novel ddd subphenotype in psenen mutation carriers, which is associated with an increased susceptibility to ai. protein substitution therapy for autosomal recessive congenital ichthyosis (arci) overall burkitt lymphoma showed a low genomic complexity with a low number of snvs and svs. however, the integration of cnas, snvs and svs allowed us to identify recurrently affected genes, which are involved predominately in the pi(3) kinase pathway, tonic bcr signaling, and cell cycle regulation, chromatin composition and germinal center development. burkitt lymphoma (bl) is the most common mature aggressive b-cell lymphoma in childhood. the genetic hallmark of bl is a chromosomal translocation involving the myc oncogene and one of the immunoglobulin loci leading to myc deregulation. three epidemiologic variants of bl are differentiated: endemic (ebl), which occurs predominantly in equatorial africa and is associated with ebv-infection, sporadic (sbl), which occurs in westernized countries and immunodeficiency-associated. in addition, burkitt leukemia (b-al) is differentiated from bl in cases with more than 25% of the bone marrow cells being lymphoma cells. another rare bl-variant is myc-positive precursor b-cell acute lymphoblastic leukemia coexpressing tdt and myc (tdt+bl). finally, we recently described a myc-negative variant which shows a typical alteration on chromosome 11 (mnbll). 
the aim of the present study was to examine the epigenetic landscape of these bl variants. to this end, we analyzed the dna methylation of 116 bl (60 sbl, 29 ebl, 10 b-al, 15 mnbll, 2 tdt+bl) using the humanmethylation450 beadchip and contrasted the findings to 24 diffuse-large b-cell lymphoma (dlbcl) and 30 follicular lymphoma (fl). the majority of lymphoma were recruited in the framework of the icgc mmml-seq and mmml projects. the ebl were obtained from the nci ghana burkitt project. as controls, we used public available dna methylation data from 93 b-cell burkitt lymphoma (bl), including its leukemic variant burkitt leukemia (b-al), is the most common type of pediatric b-cell lymphoma accounting for 30-40% of new cases. its biological hallmark is the ig-myc translocation involving myc and mostly the immunoglobulin heavy (igh) locus or more rarely one of the immunoglobulin (ig) light chain loci. at the cytogenetic level the ig-myc translocation is the sole abnormality in around 40% of cases. overall, bl is characterized by a low genomic complexity. the aim of the present study was to analyze the genomic and transcriptomic landscape of pediatric/adolescent burkitt lymphoma by sequencing according to the guidelines of the international cancer genome consortium. a total of 39 samples of bl/ b-al from pediatric/adolescent patients entered this sequencing study. all patients were treated in population-based prospective clinical trials. inclusion criteria were besides availability of suitable materials, consent to participate in the study and appropriate diagnosis: age at diagnosis (≤ 19 years), the presence of ig-myc rearrangement detected by fish and/or whole genome sequence (wgs), absence of rearrangements of bcl2 or bcl6 genes. we performed wgs of tumor and matched control as well as transcriptome sequencing of the tumor cells according to the standards of the icgc (www.icgc.org). 
the pathognomonic ig-myc translocation was detected in 37 of 39 of the cases using wgs, but was observed in all cases by fish. an igh-myc juxtaposition was detected in 34 patients and its variants igk-myc and igl-myc in 1 and 4 cases, respectively. we identified two different expression patterns of myc transcripts which were associated with the translocation breakpoint location: on the one hand the canonical myc transcript, and on the other hand an alternative transcript with a transcription start site before the second exon. the latter produces an mrna which contains 486 nucleotides not included in the canonical transcript but nevertheless encodes the identical protein. the integration of single nucleotide variants (snv) and copy number aberrations (cna) identified a total of 128 recurrently (≥2 samples) mutated genes. myc, id3, tp53, ccnd3, smarca4, arid1a, fbxo11, ddx3x were mutated in ≥20% of samples. in 33/39 (85%) cases, the id3/

ed. the annotation with the chromatin segmentation data of cd8+ t-cells from the blueprint project revealed enrichment of changes in methylation in distinct genomic regulatory elements in t-lgl. these differentially methylated functional regions were enriched for a set of transcription factor binding sites known to be relevant in other lymphoid neoplasms. by bioinformatic analysis of methylation data and integration with gene expression data we identified hypermethylated and hypomethylated genes (e.g. bcl11b, themis, zeb2, hivep3) which point to candidate pathways potentially deregulated in the pathogenesis of t-lgl. conclusion: our study identified dna methylation changes in a set of candidate genes involved in various signaling pathways, which could potentially be used for diagnosis and prognosis and may become targets for novel treatment options.

burkitt lymphoma (bl) is a mature aggressive b-cell lymphoma genetically characterized by a chromosomal translocation leading to ig-myc juxtaposition.
treatment of bl is usually very successful particularly in children, with a cure rate of over 90% even among patients with advanced stage disease. however, the prognosis of the remaining patients experiencing disease progression and/or relapse is still very poor. bl has an overall low genomic complexity, thus secondary chromosomal changes in addition to the ig-myc translocation are rare. however, genomic complexity has been associated with aggressive disease and poor prognosis in various lymphomas including bl. because little is currently known about the underlying genetics of disease progression in bl we aimed at characterizing the molecular changes and characteristics that might lead to the relapse of bl. sequential tumor biopsies from initial diagnosis (id) and follow-up were available from a total of 8 patients (4-15 years at id), which were divided into two groups: five patients experienced a relapse from their initial bl diagnosed 58-210 days after id (group 1). in contrast, three patients developed twice a bl, i. e. presented with bl as secondary neoplasms diagnosed 3-5 years after id (group 2). dna extracted from archival formalin-fixed, paraffin-embedded (ffpe) tissue was used to analyze genome-wide copy number alterations (cna) using the oncoscan® platform (affymetrix) and mutational landscape by whole exome sequencing (wes). analysis of the cna in the 5 paired bl samples (group 1) revealed an increase in genomic complexity in 4/5 pairs as in id a mean of 8 cna was detected in contrast to 13.4 cna in relapse samples (p = 0.113). of note is that in all pairs, the relapse shared almost all cna which were present in id. wes analysis of group 1 showed similar results in all analyzed pairs. in total, 46.6% of mutations (median number of mutations = 106) were shared in id and relapse. nevertheless, a considerable amount of mutations were unique in id and relapse with a median of 18 (11.8%) and 51 (41.7%) mutations, respectively. 
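The shared versus unique mutation fractions reported above for paired initial-diagnosis (id) and relapse samples can be illustrated with set operations. This is a minimal sketch, not the authors' pipeline: the function name `compare_mutations`, the use of the union of both samples as the denominator, and the toy mutation labels are all assumptions made for illustration.

```python
def compare_mutations(id_mutations, relapse_mutations):
    """Return the fractions of mutations that are shared, unique to the initial
    diagnosis, or unique to the relapse.

    Assumption: fractions are taken relative to the union of both samples;
    the abstract does not state which denominator was used.
    """
    id_set = set(id_mutations)
    relapse_set = set(relapse_mutations)
    total = len(id_set | relapse_set)
    if total == 0:
        raise ValueError("no mutations in either sample")
    return {
        "shared": len(id_set & relapse_set) / total,
        "unique_id": len(id_set - relapse_set) / total,
        "unique_relapse": len(relapse_set - id_set) / total,
    }


# Toy mutation labels, invented for illustration only.
fractions = compare_mutations(
    id_mutations={"mutA", "mutB", "mutC"},
    relapse_mutations={"mutB", "mutC", "mutD", "mutE"},
)
```

Under this convention the three fractions always sum to one. A relapse that retains most id mutations and adds new ones would be compatible with the linear evolution discussed in the text, whereas largely disjoint sets would point towards divergent evolution.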
on the other hand, mutations detected in samples populations of various differential stages. furthermore, we investigated whole-genome bisulfite sequencing (wgbs) data of 12 sbl and 6 b-al in comparison to 4 germinal center b-cell populations from healthy donors to decipher differentially methylated regions (dmr). these are defined as 10 or more cpgs differentially methylated between two groups. unsupervised dna methylation analysis of bl, fl and dlbcl revealed that all bl variants cluster apart from the non-bl cases, thus supporting on the epigenetic level that all analyzed bl samples are bl variants. multigroup comparison (σ/σmax = 0.4, q < 1e-13) separated the bl variants roughly into 3 groups: ebl, ebv-positive sbl and all other bl variants. furthermore, this analysis revealed ebl to harbor a massive hypermethylation in comparison to all other bl variants. comparison of the dna methylation using the humanmethylation450 beadchip data of sbl and b-al revealed 199 cpgs to be differentially methylated (σ/σmax = 0.4, q < 0.005). in contrast, using the wgbs data of the same samples, a total of 4712 dmrs could be identified, which were mostly located in enhancer and polycomb target regions. in conclusion, we show that all analyzed bl variants share a similar dna methylation profile. interestingly, dmrs between sbl and b-al were mainly located in enhancer and polycomb regions. in contrast, ebl showed a massive hypermethylation in comparison to the other bl variants. thus, the differences identified by dna methylation analysis can improve the understanding of the biological and clinical differences of the bl variants.

dürig 1, 8

introduction: t-cell large granular lymphocytic leukemia (t-lgl) is a mature t-cell leukemia which often arises in the context of autoimmune disease. genetic changes like recurrent chromosomal aberrations are rare. recent studies identified somatic stat3 and tnfaip3 mutations in t-lgl cells.
however, the molecular events driving leukemogenesis remain largely unknown. objectives: the goal of our study was to characterize the epigenetic basis of t-lgl to better understand leukemogenesis and potentially identify druggable pathways or diagnostic biomarkers for t-lgl. p. johansson 1,2 , l. klein-hitpass 2 , g. castellano 3 , k. kentouche 4 , f. nicolau 5 , i. oschlies 6 , e. carrillo-de santa pau 5 , m. przekopowitz 2 , a. queiros 3 , m. seifert 2 , a. valencia 5 , ij. martin-subero 3 , em. murga penas 7 , o. ammerpohl 7 , u. dührsen 1 , r. küppers 2 , j. we analyzed the dna methylome of facs sorted tumor cells of 11 t-lgl cases in comparison to benign αβ t-cell subsets. the infinium human methylation 450 bead chip was used for analysis. we annotated our data with the publicly available chromatin segmentation data of cd8 + t-cells from the ihec/blueprint project. the expression levels of selected genes were tested by reverse transcription real-time pcr. results supervised analysis of t-lgl compared to benign cd8 + memory cells resulted in 1,083 cpg loci significantly (q < 0.001) differentially methylatkrawitz 5, 6 , a. knaus 5, 6 , m. jäger 5, 6 , r. flöttmann 5 , t. eggermann 7 , b. hoechsmann 8 , h. schrezenmeier 8 paroxysmal nocturnal hemoglobinuria (pnh) is an acquired disorder of the blood-forming system. typically, affected hematopoietic stem cells (hscs) in pnh harbor a single somatic loss-of-function mutation in the x-linked piga gene. previously, a pnh patient with a different molecular etiology has been described and herein we report three more cases of this new subgroup: a predisposing germline mutation in pigt, which is an autosomal gene of the glycosylphosphatidylinositol (gpi)-anchor synthesis pathway, is followed by a second somatic hit. 
by means of deep sequencing and array-cgh, we observed acquired deletions of 8 mb to 18 mb on chromosome 20q in pnh cells that include pigt as well as a region that is commonly deleted in myeloproliferative neoplasms and myelodysplastic syndromes and that is known to be differentially methylated. this results in a complete loss of expression of certain genes at this locus which is also thought to contribute to the clonal expansion. the deficiency of gpi-anchored proteins on pnh cells results in a lack of the complement regulatory proteins cd59 and daf/cd55 on the cell surface and leaves them more vulnerable to the c5b-9 membrane attack complex. in contrast to classical pnh without any fully synthesized gpi-anchors, pigt mutations impair the transamidase that links the substrate to the anchor and thus result in an accumulation of unbound gpi molecules. this difference in the pathophysiology can also be visualized in flow cytometric analysis of peripheral blood: while cd55 and cd59 surface levels are reduced in all pnh cells, the atypical pnh cells due to a transamidase deficiency can be discriminated by a specific antibody, t5 mab, that binds free gpi anchors. besides the classical pnh symptoms of anemia, thrombosis, and hemolysis, patients with pigt mutations also manifest with additional autoinflammatory symptoms, such as urticaria, fever, arthralgia and meningitis, and it is hypothesized that the free gpi-anchor that accumulates in affected cells is causally related to autoinflammation. based on these findings, we propose the new entity of atypical pnh. background: the prevalence of metabolic disorders, in particular obesity has dramatically increased worldwide. genetic variants explain only a minor part of this obesity epidemics induced by physical inactivity and over nutrition. 
epidemiological studies in humans and animal models of di-from patients with secondary neoplasm (group 2) were mostly unique to id (= 109, 43.9%) whereas only 26.1% of all mutations were shared in id and secondary neoplasm samples (= 49). furthermore there were no shared cna in the corresponding samples identified by oncoscan® analysis. to sum up, the oncoscan® and wes analysis, of the paired bl group (1) provide strong evidence for a linear clonal evolution, meaning relapses may directly evolve from the previous lymphoma clone rather than a common precursor. in contrast, results obtained for patients with secondary neoplasm (group 2) showed no indication for linear but rather for divergent evolution. thus, analysis of recurrent mutations shared in id and second neoplasm samples can provide important information about disease progression and are therefore subject of ongoing analysis. y. murakami 1 , t. hirata 1 , s. murata 1 , t. kinoshita 1 , m. kawamoto 2 , s. murase 2 , h. yoshimura 2 , n. kohara 2 , n. inoue 3 , m. osato 4 , j. nishimura 4 , y. ueda 4 , y. kanakura 4 , p. m. in runx1 mutated aml the number of runx1 mutations, loss of the wild-type allele and the number and kind of additional mutations impact on prognosis a. stengel, w. kern, m. meggendorfer, k. perglerovà, t. haferlach, c. haferlach mll munich leukemia laboratory, munich, germany, 2 mll2, praha, czech republic aml with mutated runx1 show a distinct pattern of cytogenetic and molecular genetic abnormalities and an adverse prognosis. we analyzed the impact of multiple runx1 mutations and runx1 wild-type (wt) loss on associated genetic alterations and survival. for this, 467 aml cases with runx1 mutations (mut) were split in (1) runx1 wt loss (n = 53), (2) >1 runx1mut (n = 94) and 1 runx1mut (n = 323). 163 cases were selected for mutation analyses of 28 genes. in cases with 1 runx1mut, +8 was frequently found, whereas in wt loss +13 was the most abundant trisomy (+8: 66% in 1 runx1mut vs. 
31% in wt loss, p = 0.022; +13: 15% vs. 62%, p < 0.001). cases with >1 runx1mut showed an intermediate distribution (+8: 44%, +13: 50%). missense mutations were the most abundant mutation type in wt loss cases (53% vs. 31%, p = 0.006), whereas in 1 runx1mut, frameshift mutations were found more frequently (45% vs. 28%, p = 0.016). in cases with >1 runx1mut, both were observed at similar frequencies (missense: 36%, frameshift: 38%). mutation analyses of 163 selected cases revealed 411 additional molecular mutations. 95% of cases showed at least one runx1-accompanying mutation (range: 0-6). the median number of accompanying mutations was n = 2 in the total cohort and in cases with 1 runx1mut and >1 runx1mut, whereas it was n = 3 in runx1 wt loss. srsf2 (39%), asxl1 (36%), dnmt3a (19%), idh2 (17%), sf3b1 (17%), tet2 (17%) and bcor (16%) were revealed as the most frequently mutated genes. cases with runx1 wt loss showed a higher frequency of asxl1mut compared to the other cases (50% vs. 29%, p = 0.009), while u2af1mut were absent from this group (0% vs. 10%, p = 0.019). median overall survival (os) in the total cohort was 14 months. wt loss (os: 5 months) and >1 runx1mut (14 months) showed an adverse impact on prognosis compared to 1 runx1mut (22 months; p = 0.002 and p = 0.048, respectively). mutations in asxl1 and kras and the presence of ≥2 additional mutations also negatively impacted os (10 vs. 18 months, p = 0.028; 1 vs. 15 months, p < 0.001; 12 vs. 20 months, p = 0.017). in univariate cox regression analysis runx1 wt loss (hr = 1.6; p = 0.024), ≥2 additional mutations (hr = 1.9; p = 0.019), asxl1mut (hr = 1.6; p = 0.030) and krasmut (hr = 4.4; p = 0.001) had an adverse impact on os. multivariate cox regression analysis revealed an independent adverse effect on os for runx1 wt loss (hr = 1.6; p = 0.039) and krasmut (hr = 4.2; p = 0.001). for 216/467 cases we received samples during the course of the disease.
in none of these cases was evidence for a runx1 germline mutation found by analyzing the mutation loads; thus all runx1 mutations are somatically acquired. taken together, we found strong differences between the subgroups with regard to cytogenetic and molecular genetic aberrations as well as prognosis. thus, not only the presence and number of runx1 mutations but also the conservation of an intact runx1 allele as well as the number and kind of additional mutations are biologically and clinically relevant.

different chromatin states, where methylation is inversely correlated with active histone marks. using the hardy-weinberg law, we estimate that there are 692 dmrs with a maf > 0.05. we hypothesized that cis-acting dna polymorphisms could be responsible for the inter-individual variation of the dmr methylation levels. we genotyped 2.5 million snps in the five donors and found that 82/157 (52%) dmrs have methylation levels highly correlated (>0.9) with the genotype of at least one nearby snp (± 6 kb window). this correlation was verified in 6/7 dmrs by targeted bisulfite sequencing in monocytes from 4 individuals used for wgbs and from 2 additional individuals. to validate our results in a larger population and possibly find correlating snps outside the ± 6 kb window for the remaining dmrs, we performed genome-wide association studies (gwas) using snp genotypes and illumina 450k cpg methylation data from blood samples of 1131 individuals from the heinz nixdorf recall study. these methylation arrays encompass only 51 cpgs contained in 30 of our dmrs, showing that they fail to identify a great number of potentially important regions. we confirmed that for these 51 cpgs, monocyte and whole blood dmr methylation levels were correlated, and performed a gwas with ~500,000 snps for each of the 51 cpgs. for 48/51 cpgs, the correlation peak was near the cpg position.
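The genotype-methylation criterion mentioned above (methylation level correlated at >0.9 with the genotype of a nearby snp) can be sketched with a plain Pearson correlation. This is an illustrative sketch under stated assumptions: coding genotypes as allele dosage (0, 1, 2), using the Pearson coefficient, applying the threshold to its absolute value, and the toy numbers for five donors are all assumptions, since the abstract does not specify the exact statistic.

```python
from math import sqrt

CORR_MIN = 0.9  # correlation threshold quoted in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy values for five donors, invented for illustration only.
genotypes = [0, 0, 1, 2, 2]                   # assumed allele-dosage coding
methylation = [0.10, 0.12, 0.50, 0.90, 0.88]  # dmr methylation level per donor

r = pearson(genotypes, methylation)
highly_correlated = abs(r) > CORR_MIN  # assumption: threshold on |r|
```

With only five donors such a correlation is a screening criterion rather than a test, which is consistent with the follow-up in a larger cohort described in the text.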
for each gwas, the snp with the lowest p-value (in most cases p < 1e-200) was designated as lead-snp. snps in high linkage disequilibrium (r^2 > 0.8) with the lead-snps were located within the corresponding dmr or 16 bp to ~116 kb from it. many regions are bound by ctcf and other transcription factors. it is likely that snps affect the binding of these factors and thus the methylation state of the region. we conclude that these inter-individual differences in dna methylation are mainly driven by genetic factors.

the dystonia 6 (dyt6) protein thap1 recruits the histone deacetylase hdac3 to mediate gene repression

sektion für funktionelle genetik am institut für humangenetik, universität zu lübeck, lübeck, germany, 2 institut für neurogenetik, universität zu lübeck, lübeck, germany

dystonia describes a heterogeneous group of neurological movement disorders characterized by contractions in various muscles resulting in abnormal postures, involuntary twisting and repetitive movements. dystonia 6 (dyt6), a primary torsion dystonia that first has an impact on cranio-cervical muscles, causing problems with speaking and eating, is caused by mutations in the thap1 gene (thanatos-associated domain-containing apoptosis-associated protein 1). thap1 belongs to the family of thap proteins that are characterized by the presence of an evolutionarily conserved specific dna-binding thap zinc finger motif at their n-terminus. in humans 12 thap family members are known, designated thap0 to thap11. interestingly, most of the dyt6-causing mutations affect this thap domain. while we have previously described thap1-mediated repression of specific target genes, the molecular mechanisms by which thap1 regulates promoter activity are largely unknown. it is known that other members of the thap family such as thap7 and thap11 interact with the histone deacetylase hdac3 to mediate transcriptional repression.
we have performed yeast-two-hybrid and gst pulldown assays to identify a specific interaction of thap1 with hdac3. by the use of truncated thap1 fragments we were able to narrow down hdac3 binding to the n-terminal thap-domain. for further functional characterization we have decreased hdac3 levels by sirna treatment or chemical inhibition and used taqman analyses to quantify the effect on thap1-target genes expression. thus, a significant increase of thap1-target genes expression was detected in those cells treated with hdac3 sirna. to further investigate whether the observed increase in gene expression is due to alterations of histone acetylation within the promoter regions we performed chromatin immunoprecipitation (chip) assays followed by qpcr using antibodies specific for different acetylated n-terminal residues of histone 3 as markers for transcriptional active promoters. by this, we detected an increased acetylation within the promoter regions of thap1 target genes that are dysregulated in cells treated with decreased hdac3 levels. et-induced obesity indicate that epigenetic changes associated with adverse parental and/or intrauterine factors may contribute to the missing heritability of metabolic disorders. possible adverse paternal effects are likely transmitted by the sperm to the next generation. to prove this hypothesis, we have systematically analyzed the effects of paternal obesity on the sperm epigenome and its implications for the next generation. results: to study the possible transmission of paternal bmi effects to the next generation, methylation levels of eight paternally expressed imprinted genes (peg1, peg3, peg4, peg5, peg9, peg10, nespas and igf2), two maternally expressed imprinted genes (meg3 and h19), and the obesity related gene hif3a were quantified by bisulphite pyrosequencing in sperm of 109 donors (undergoing ivf/icsi) and 121 fetal cord blood (fcb) of resulting offspring (conceived by ivf/icsi with the same sperm samples). 
hif3a showed a significant positive correlation between sperm methylation and paternal bmi. this effect on the sperm epigenome was replicated in an independent cohort of 188 sperm samples. for hif3a, paternal bmi also showed a significant positive correlation with fcb methylation. on the other hand, peg5/nnat exhibited a significant negative correlation between paternal bmi and fcb methylation. in contrast to pyrosequencing, deep bisulphite sequencing (dbs) allows one to study dna methylation at the single molecule level and enables us to distinguish between maternal and paternal alleles in fcb samples with an informative snp. epimutations which are defined as alleles showing >50% aberrantly (de)methylated cpg sites can also be identified with dbs. upon performing dbs on sperm samples, we observed a higher epimutation rate in the high bmi (28-40) group when compared to the low bmi (19-24) group across the four studied genes (peg1, hif3a, h19 and nespas). we are presently analyzing dbs data in selected cord blood samples with an informative snp to separately quantify methylation at the paternal and maternal alleles. it is important to decipher the methylation of the paternal allele when studying whether sperm methylation alterations are transmitted to the offspring. conclusions: our results suggest that male obesity is associated with modification of the sperm dna methylome, which may affect the epigenome (in fetal cord blood) of the next generation. allele-specific dna methylation occurs at functionally different regions: 1) at imprinting control elements, 2) on the silent x chromosome in females and 3) across the genome and probably dependent on the dna sequence in cis. the latter is termed haplotype-dependent allele-specific methylation and may contribute to inter-individual phenotypic variation. 
in a previous study on monocyte to macrophage differentiation, we showed that dna methylation differences between individuals were greater than between the two cell types. to study the genetic basis of these inter-individual differences in dna methylation, we analysed the methylome obtained by whole genome bisulfite sequencing (wgbs) of monocytes from five unrelated donors. for identifying differentially methylated regions (dmrs), we created two synthetic methylomes: one with the highest methylation values of each cpg in the five samples and one with the lowest methylation values. defining a dmr as a region of at least 4 cpgs with a methylation level difference of at least 0.8, we identified 157 dmrs, which cover 1165 cpgs and fall into cer, respectively. we show that ns-associated rit1 mutants intensified signal flux through the mek-erk pathway upon growth factor stimulation. by using heterologous expression systems, we identified the p21-activated kinase 1 (pak1) as novel effector of rit1. we found that rit1 interacts with the rho gtpases cdc42 and rac1, both of which are crucial upstream regulators of pak1. disease-causing rit1 mutations enhance protein-protein interactions and uncouple complex formation from growth factors. expression of both wild-type rit1 and its mutant forms resulted in dissolution of stress fibers and paxillin-containing focal adhesions from the cell center and increased cell movement. we conclude that rit1 is a potent regulator of actin dynamics, and dysregulated rac1/cdc42-pak1 signaling controlling cell adhesion and migration may be one aspect of the molecular basis of ns. medical systems biology, tu dresden, germany, 2 institute for clinical genetics, tu dresden, germany, 3 cancer science institute of singapore, national university of singapore, singapore, 4 institute of molecular biology, mainz, germany telomeres are short repetitive ttaggg sequences that cap the ends of chromosomes. 
these stretches of dna are covered by proteins and rnas which together protect the putative double strand break from dna repair mechanisms and facilitate replication. however, telomeres shorten with every cell division due to the end replication problem. the ribonucleoprotein telomerase counteracts this process by de novo elongation of telomeric repeats but its expression is mostly confined to the germ line and stem cells. even in the latter its activity is usually not sufficient to completely prevent telomere shortening. all cancer cells are also faced with this challenge and while the majority of cancer cells rely on telomerase, approximately 15% of cancers ensure sufficient telomere length via the recombination-based alternative lengthening of telomeres (alt) mechanism. to better understand telomere biology we aimed to identify novel telomeric factors by systematically screening for telomere-binding proteins in cell lines from 16 different vertebrates. here, we identified and characterized zbtb48, a zinc finger protein, as a novel direct telomere-binding protein across the vertebrate lineage. zbtb48 is directly binding to telomeric dna in vitro and it is localizing to telomeres in vivo via one specific zinc finger domain in both telomerase-and alt-positive cancer cells. interestingly, zbtb48 knock-out cells have longer telomeres, suggesting that zbtb48 limits telomere elongation. in addition, the combination of chipseq, rnaseq and proteome analysis revealed a transcription factor activity for a small, but specific set of target genes of zbtb48, linking its telomeric functions to mitochondrial metabolism. in conclusion, zbtb48 is a novel direct telomere binding protein with transcription factor activity that acts as negative regulator of telomere length. 
our data show for the first time a functional interaction of the 'dystonia 6 protein' thap1 with the histone deacetylase hdac3 and therefore give new insights into the molecular mechanisms of thap1-mediated gene repression. interestingly, previous functional studies as well as structure analyses revealed that only a subset of the dyt6-causing mutations affecting the n-terminal thap domain alter thap1-binding to dna. in ongoing studies we want to investigate the consequences of dyt6-causing mutations on thap1-hdac3 complex formation and its relevance in the molecular pathology of dystonia. reproductive homeobox (rhox) genes are clustered on the x chromosome and share a unique 60 amino acid helix-turn-helix dna binding homeodomain. they were identified in several species as having important roles in reproductive tissues, notably in the testis. the human rhox cluster is composed of three genes: rhoxf1 and two copies of rhoxf2 (rhoxf2a, rhoxf2b) which are referred to as rhoxf2/2b. rhox proteins are expressed exclusively by germ cells in human testis and aberrant rhox methylation is associated with several sperm parameters. because little is known about the molecular mechanism of rhox function in humans, the aim of the study was to identify target genes of human rhox proteins and to investigate the impact of rhox mutations on protein function. using gene expression profiling, we identified genes regulated by members of the human rhox gene cluster. some genes were uniquely regulated by rhoxf1 or rhoxf2/2b, while others were regulated by both of these transcription factors. several of these regulated genes encode proteins involved in processes relevant to spermatogenesis, e. g. stress protection and cell survival. one of the target genes of rhoxf2/2b is rhoxf1, suggesting cross-regulation to enhance transcriptional responses. the potential role of rhox in human infertility was addressed by sequencing rhox in a group of 250 patients with severe oligozoospermia. 
this revealed two mutations in rhoxf1 (c.515g>a and c.522c>t) and four in rhoxf2/2b (-73c>g, c.202g>a, c.411c>t and c.679g>a), of which only one (c.202g>a) was found in a control group of men with normal sperm concentration. functional analysis demonstrated that c.202g>a and c.679g>a significantly impaired the ability of rhoxf2/2b to regulate downstream genes. molecular modelling suggested that these mutations alter rhoxf2/2b protein conformation. by combining clinical data with in vitro functional analysis, we demonstrate how the x-linked rhox gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male fertility.
colorectal cancer (crc). here, we proposed the possible molecular mechanisms responsible for crc initiation, progression and invasion using a network biology approach.
materials and methods: to investigate the underlying crc pathogenesis, the dataset gse21510, comprising normal tissue and stage i, stage ii, stage iii and stage iv crc samples, was obtained from the gene expression omnibus (geo) and examined further. the differentially expressed genes (degs) were mapped against protein-protein interaction (ppi) databases, and a ppi network was constructed for each crc stage. topological analysis of the resulting ppi networks revealed functional hub genes involved in crc development. furthermore, the genes shared between the four crc stages were determined and evaluated in depth to identify deregulated biological networks during crc development. standard real-time pcr was performed on the sw620 and ncm460 cell lines to validate the in silico findings.
results: the most important hub genes (cdk1 for stage i, ubc for stage ii, esr1 for stage iii and atxn1 for stage iv) and sub-networks were identified in the crc stages. moreover, several novel biomarkers were also introduced for each crc stage.
gene ontology (go) and signaling pathway enrichment uncovered the important roles of the wnt, mapk and jak-stat signaling pathways in the regulation of crc pathogenesis. functional annotation of the overlapping genes revealed that cell cycle regulating genes are the most highly regulated genes during crc initiation, progression and invasion. in vitro analyses confirmed deregulation of atxn1 and cdk1, two hub genes of stage iv, in metastatic colon sw620 cells compared to the normal colon ncm460 cell line. our study provides new insight into the distinct molecular mechanisms underlying the pathogenesis of crc. the functional hub genes, sub-networks, prioritized key pathways and novel crc biomarkers provided here may be useful in therapeutic programs.
targeted next-generation sequencing approaches as well as next-generation whole exome sequencing are becoming more widespread in routine molecular diagnostics for patients with ataxia. however, since ngs at present is not suitable for detecting (trinucleotide) repeat expansions, pre-ngs testing for common polyglutamine expansion scas seems mandatory. sca subtypes caused by expansions in non-coding regions of genes, like sca8, sca10, sca12, and sca36, as well as other ataxias known to be associated with repeat expansions, like the fragile x-associated tremor ataxia syndrome (fxtas), should also be taken into account before applying ngs-based diagnostics. to find an optimal diagnostic strategy in the future, more information about the frequency and phenotypic characteristics of rare repeat expansion disorders associated with ataxia would be helpful. we therefore analyzed a cohort of 441 patients with symptoms of cerebellar ataxia, dysarthria and other unspecific symptoms who were referred to our center for sca diagnostics and showed alleles in the normal range for the most common sca subtypes sca1-3, sca6, sca7, and sca17.
these patients were screened for expansions in sca8, sca10, sca12, sca36 and fxtas as well as for the pathogenic hexanucleotide repeat in the c9orf72 gene. no expanded repeats for sca10, sca12 or sca36 were found in the analyzed patients. five patients with ataxia of unknown etiology showed sca8 cta/ctg combined alleles (83-129) that are discussed as potentially pathogenic. one 51-year-old male patient with unclear dementia syndromes was diagnosed with a large ggggcc repeat expansion in c9orf72. the analysis of the fmr1 gene identified one patient with a premutation (>50 cgg repeats) and seven patients
poster (*** = nominated for the poster prize)
preventive genetic counseling in neurogenetic disorders needs a better collaborative approach between genetic and neurology clinics - a report of four siblings with unverricht-lundborg disease
genetic counseling is the process of helping people to understand and adapt to the medical, psychological and familial implications of genetic contributions to disease. for parents with a previous child or other family member with a known genetic syndrome, it expands the options for preimplantation or prenatal diagnosis in current or future pregnancies. however, timely referral by health providers to a genetic counselor, who can discuss the possible options with couples, is important. additionally, other factors, such as personal decision making, especially in view of the high price of some genetic services and uncertain results, cause considerable delays in genetic testing. there are more than 200 types of inherited neurological disorders in which alterations in genes lead to an inherited condition, such as huntington disease, inherited forms of alzheimer disease, ataxia, muscular dystrophies and epilepsies. knowledge of the causative gene mutations in the affected individual is critical for possible prenatal diagnosis in other members of the pedigree.
therefore a multidisciplinary care team, including neurologist and genetic counselor for the conditions diagnosed as inherited neurological disorders is critical in prenatal setting and consideration of an effective management. here, our report of four siblings affected by a rare form of inherited epilepsy (unverricht-lundborg disease) with an autosomal recessive pattern highlights the importance of the needs for a better collaborative approach in the neurogenetic setting. in fact, the birth of four successive siblings affected by similar neurogenetics disorders in a specific family is showing the need for more attention to this important issue, especially in terms of intersectoral collaboration. poorebrahim 6 hort: 6/10 differentially expressed transcription units). these differences in gene expression we detected did not correlate with dna methylation changes at the corresponding transcription regulatory sites. from our results we conclude that altered expression of imprinted genes indeed plays a role in tumorigenesis of germinal center derived b-cell lymphomas. however, the altered transcriptional regulation of these genes seems not to rely on the usual epigenetic mechanisms known from constitutional imprinting disorders. mf. abazari 1 , h. bokharaie 2 , m. asghari 3 , v. poortahmasebi 4 , h. askari 5 , m. investigating the expression of genes associated with autism spectrum disorders to identify sex related differences s. berkel, a. eltokhi, g. rappold institute of human genetics, heidelberg university hospital, heidelberg, germany neurodevelopmental disorders such as autism, attention deficit and hyperactivity syndrome as well as language problems and learning difficulties have a higher prevalence in male individuals compared to females. autism is characterized by impairments in social interaction, communication deficits and restricted and repetitive behaviors. 
boys are more frequently affected than girls; the ratio of affected boys compared to girls is 4:1 for autism and 11:1 for asperger syndrome. in this study we aim to elucidate the reason for this gender difference by following up two hypotheses: (1) risk genes for autism spectrum disorders (asd) might be expressed at different levels in males and females and (2) asd risk genes might interact with sexually dimorphic pathways. first, we investigated the expression of genes associated with autism spectrum disorders, including the shank gene family, in the brain of male and female mice to identify sex-dependent differences. the rna expression levels were analyzed in five different brain regions (cortex, hippocampus, striatum, cerebellum, thalamus) at different developmental stages (e15, e17, p1, p7, p12 and adult) in male and female mice. we identified a sex dimorphic expression of shank1 and shank3, but not of shank2. due to the fact that early brain development is strongly influenced by sex hormones (estrogen, testosterone), we further investigated the influence of these hormones on shank expression in human neuroblastoma cells (sh-sy5y) and primary mouse hippocampal neurons. a better understanding of the sex differences in the brain might help to explain the vulnerability for neuropsychiatric disorders like autism and paves the way to discover putative risk or protective factors for these disorders. imprinting defects in temple syndrome are caused by a failure in imprint establishment and/or maintenance j. beygo, c. mertel, g. gillessen-kaesbach, b. horsthemke, k. 
buiting institut für humangenetik, universitätsklinikum essen, universität duisburg-essen, essen, germany, 2 institut für humangenetik, universität zu lübeck, lübeck, germany temple syndrome (ts14) is a rare imprinting disorder characterised by low birth weight and height, muscular hypotonia and feeding difficulties in the infant period, early puberty and short stature with small hands and feet and often truncal obesity. in a subset of patients with ts14, the disease is caused by an imprinting defect (id) affecting the paternal allele of the imprinted region 14q32. the id results in aberrant methylation of the three known differentially methylated regions (dmrs), the germline-derived primary dlk1/meg3 intergenic (ig-)dmr (meg3/dlk1:ig-dmr), the postfertilization-derived, secondary dmr at the meg3 promoter (meg3:tss-dmr), and the postfertilization-derived, secondary intragenic meg8-dmr (meg8:int2-dmr). the meg3/dlk1:ig-dmr and the meg3:tss-dmr are methylated on the paternal chromosome and hypomethylated in patients with ts14 and an imprinting defect. the meg8:int2-dmr is unmethylated on the paternal chromosome and hypermethylated in these patients. both the meg3/dlk1:ig-dmr and the with alleles in the grey zone (41 to 54 cgg repeats), thus suggesting that individuals with fmr1 repeat expansions in the gray zone may also present with neurological signs. bernhart 3, 4, 5 , h. kretzmer 3, 4 , r. wagener 1 , c. mmml 6 some genes are subject to the mechanism of imprinting, i. e. their expression depends on parental origin. they primarily function in the control of proliferation, fetal development and cellular differentiation. constitutional imprinting disorders are in part also associated with an increased tumor risk. loss of imprinting has been also described as somatic event in tumorigenesis. while this phenomenon has been broadly analyzed in solid tumors, data on alterations of imprinting in lymphatic neoplasms are largely missing. 
we analyzed the rna expression of 321 transcription units/regions known or supposed to be subject to imprinting in two cohorts of normal b-cells and germinal center derived b-cell lymphomas. the first cohort (mmml) contains 686 samples: 56 burkitt lymphomas (bl), 600 non-burkitt lymphomas (non-bl, including various subtypes like follicular and diffuse large b-cell lymphoma) and 30 normal germinal center b-cell samples (gcbc, as controls). the second cohort (icgc mmml-seq) comprised 201 samples with 20 bl, 176 non-bl and 5 gcbc samples. gene expression was analyzed with affymetrix u133a genechips in the mmml cohort and by rna sequencing in the icgc mmml-seq cohort. results of the transcriptional analyses in the icgc mmml-seq cohort were compared to the dna methylation available from a subset of the analyzed samples (kretzmer et al., nat genet, 2015) . of the 321 transcription units 114 sites, corresponding to 64 transcription units, were present on the applied array used for the analysis of the mmml cohort. a two group comparison revealed 53 significantly differentially expressed sites corresponding to 31 transcription units between bl and non-bl including the plagl1 and peg10 genes. in total, 19 and 16 sites corresponding to 16 and 10 transcription units are differentially expressed between bl versus gcbc and non-bl versus gcbc, respectively. 
comparison of gene expression in the icgc cohort revealed 70 differentially expressed sites corresponding to 68 transcription units between bl and non-bl (overlap with mmml cohort: 24/31 differentially expressed transcription units), including again peg10 and plagl1, 37 differentially expressed sites corresponding to 37 transcription units between bl and gcbc (overlap with mmml cohort: 7/16 differentially expressed transcription units) and 44 differentially expressed sites corresponding to 39 transcription units between non-bl and gcbc (overlap with mmml co-abstracts maintenance dna methylation of l1 promoters, spoc1 could function in targeting g9a to l1 sequences. in conclusion our data implicate the epigenetic reader spoc1 in the suppression of line elements during germ cell development. s. bens 1 , j. kolarova 1 , m. kreuz 2 , sh. characterization of the expression of the imprinted kcnk9-gene in specific brain regions and the phenotypic analysis of kcnk9knockout mice a kcnk9/kcnk9 is a maternally expressed imprinted gene whose mutations are responsible for the maternally inherited birk-barel mental retardation dysmorphism syndrome. it encodes a member of the superfamily of k+channels with two pore-forming domains and is involved in the modulation of the resting membrane potential and excitability of neuronal cells. so far, only homozygous kcnk9 knockout mice with inactivation of both parental alleles were phenotypically characterized. these mice displayed cognitive deficits as well as a reduction of k+ leak current by 50%. in the light of maternal-specific imprinted expression of kcnk9/kcnk9 and the maternal inheritance of the birk-barel mental retardation dysmorphism, a thorough phenotypic analysis of heterozygous kcnk9 knockout mice with inactivation of only the maternally inherited allele is also warranted. as first aim of our study, we characterized the parental allele-specific expression of kcnk9 in various regions of the mouse brain. 
the quantification of allele-specific expression by pyrosequencing (quasep) method was applied to different brain areas from several developmental stages of (c57bl/6xcast/ei) f1 hybrid mice. exclusive expression from the maternal kcnk9 allele was detected in the dentate gyrus, hippocampus, mesencephalon, medulla oblongata, thalamus and pons. biallelic expression with a strong bias towards the maternal kcnk9 allele (94-99% of the transcripts) was observed in the olfactory bulbs, cortex, cerebellum, striatum and olfactory tubercles. as the second aim of our study, the phenotypes of wildtype mice, heterozygous kcnk9 knockout mice with a maternally inherited knockout allele (kcnk9komat) and homozygous kcnk9 knockout mice (kcnk9kohom) were compared in a behavioral test battery. given the known cognitive defects of kcnk9kohom animals and especially the phenotype of patients with birk-barel syndrome, we assumed that kcnk9komat and kcnk9kohom animals would show deficits in some of the tests. spontaneous alternation in the y-maze test was significantly reduced by approximately 10-20% in kcnk9komat and kcnk9kohom mice compared to wildtype mice, indicating a clearly impaired working memory. in addition, kcnk9komat and kcnk9kohom mice displayed a reduced prepulse inhibition of the startle response compared to wildtype mice, indicating an impairment of sensorimotor gating, a process that filters out irrelevant information. the acoustic startle response, as a measure of anxiety levels, was also significantly decreased, but only in kcnk9kohom mice. our findings should further elucidate the role of kcnk9/kcnk9 in brain physiology and pathophysiology and open new avenues for the treatment of cognitive dysfunctions in birk-barel syndrome.
meg3:tss-dmr act as imprinting control centres, although the meg3/dlk1:ig-dmr functions as an upstream regulator of the meg3-dmr. so far, the function and regulation of the meg8-dmr are unknown.
the hypomethylation of the paternal allele in ts14-id patients at the meg3/dlk1:ig-dmr and the meg3:tss-dmr points to a failure to establish the methylation imprint or to maintain it after fertilization. in this case, the incorrectly imprinted chromosome 14 would be inherited from either the paternal grandfather or grandmother. to test this assumption we are investigating the grandparental origin of the affected chromosome 14 in our cohort of ten ts14-id families by studying the parent-of-origin specific methylation of the three dmrs in combination with informative single nucleotide variants (snps). at the moment we have identified three families informative for the meg3/dlk1:ig-dmr, two families for the meg3:tss-dmr and two families for the meg8:int2-dmr. so far we have obtained results in two families for the meg3:tss-dmr. we found that in one case the allele harbouring the id was inherited from the paternal grandmother, but in the second case from the paternal grandfather, indicating that the id occurred after erasure of the parental methylation imprints. the complete lack of methylation observed in the majority of ts14-id patients is therefore likely due to a problem in establishing methylation on the paternal chromosome, whereas in rare cases with methylation mosaicism, the id is probably due to a problem in maintaining the paternal imprint after fertilization.
bosch, s. lukassen, j. kaindl, j. schwarz, c. nelkenbrecher, a. herrmann, a. reichel, a. ekici, t. gramberg, t. stamminger, a. winterpacht
humangenetisches institut, erlangen, germany, 2 virologisches institut, erlangen, germany
spoc1/phf13 is a gene located on human chromosome region 1p36.31 and mouse chromosome 4qe2. the protein was first described in patients with epithelial ovarian cancer, where its expression correlated with tumour progression and reduced survival time.
spoc1 is a reader of the epigenetic mark h3k4me2/3, dynamically associates with chromatin during mitosis and plays a role in chromosome condensation. spoc1-deficient mice show a pronounced hypoplasia of the testis with a progressive loss of germ cells. although loss of spoc1 leads to significantly reduced chromatin condensation of the sex chromosomes in meiosis, the protein is not expressed in spermatocytes but in the undifferentiated precursor cells, the spermatogonial stem cells (sscs). here, we present chip-seq data of mouse testis tissue demonstrating that spoc1 strongly binds to evolutionarily young l1 elements in undifferentiated spermatogonia. we show that in hek cells overexpression of spoc1 leads to repression of the transposition activity of line elements, strongly indicating a role of spoc1 in l1 element suppression. the cell has developed several lines of defence against retrotransposition to maintain genomic integrity, including dna methylation. these defence mechanisms are most elaborate in spermatogonial stem cells, since transposition events in these cells would have a dramatic impact on the next generation. therefore, the repression of retrotransposition is of fundamental importance for germ cell development and ultimately the quality of the gametes. moreover, we present medip results showing that the methylation levels of l1 sequences decrease upon spoc1 knockout and demonstrate that the histone methyltransferase g9a is strongly upregulated in preleptotene meiocytes of spoc1-/- mice. g9a is expressed from spermatogonia until early meiosis, where it regulates h3k9 di-methylation, and has been shown to be involved in the repression of l1 elements in mouse spermatogonia. we are able to demonstrate that h3k9me2 levels are unaltered in spoc1-/- mice, suggesting a potential functional link between g9a and spoc1 that does not affect the catalytic activity of g9a. since g9a can regulate de novo and
medizinische genetik 1 · 2017 121
lated to investigate ciliogenesis.
data resulting from rnaseq experiments are analyzed with established informatics tools (tophat, cufflinks and derivatives thereof). we will show results from our work in progress and we hope to convince people to intensify rna analyses even in routine labs to uncover hidden mechanisms and/or mutations impacting mrna splicing and thereby causing human disease.
telomeres are located at the ends of chromosomes and have an essential role in the maintenance of genome stability. after each cell division, a small part of this specialized sequence is lost. when telomeres reach a critically reduced length, the cell either dies through apoptosis or enters a state of permanent cell cycle arrest. it has been demonstrated that telomere biology is directly linked to basic biological phenomena such as aging, tumorigenesis and maintenance of dna integrity. it is known that oxidative stress accelerates telomere shortening in cells, resulting in premature cell senescence. shorter leukocyte telomeres have been observed in type ii diabetes and in degenerative diseases like dementia and alzheimer disease, as well as in chromosomal instability syndromes such as fanconi anemia (fa) and nijmegen breakage syndrome (nbs). a link between telomere length and inflammation has not yet been studied extensively in autoimmune diseases. accelerated telomere shortening might be related to autoimmune disease predisposition. yet the reasons for this shortening are likely manifold, including the individual genetic background, oxidative stress and chronic inflammation. to shed light on these relationships, we investigate genomic dna extracted from the blood of patients diagnosed with multiple sclerosis and of patients with huntington disease. the samples were divided into age groups. stepone q-pcr was applied to detect the relative telomere length as a function of age. the initially identified differences in telomere lengths still have to be confirmed in larger cohorts.
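the abstract above does not state how relative telomere length was derived from the q-pcr readout. as a minimal sketch, assuming the widely used comparative ct approach (telomere signal normalised to a single-copy gene and to a reference sample, with ideal amplification efficiency), the calculation and a simple age trend could look as follows. all function names and numbers are hypothetical and purely illustrative; they are not data from the study.

```python
def relative_telomere_length(ct_tel, ct_scg, ref_ct_tel, ref_ct_scg):
    """Relative T/S ratio via the comparative Ct method (assumed, not from the abstract).

    ct_tel / ct_scg: Ct values of the telomere and single-copy-gene reactions
    for the sample; ref_*: the same for a reference sample.
    Assumes 100% amplification efficiency (signal doubles every cycle).
    """
    delta_sample = ct_tel - ct_scg
    delta_ref = ref_ct_tel - ref_ct_scg
    return 2.0 ** -(delta_sample - delta_ref)


def least_squares_slope(xs, ys):
    """Slope of an ordinary least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy values: one extra cycle for the telomere reaction
# halves the relative telomere length.
same_as_reference = relative_telomere_length(15.0, 20.0, 15.0, 20.0)
one_cycle_later = relative_telomere_length(16.0, 20.0, 15.0, 20.0)

# Hypothetical age trend: change in T/S ratio per year of age.
slope_per_year = least_squares_slope([20, 40, 60], [1.0, 0.8, 0.6])
```

in practice the efficiency of each primer pair would have to be checked, since the factor 2 only holds for perfect doubling per cycle.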
background: intrauterine exposure to gestational diabetes mellitus (gdm) confers a lifelong increased risk for metabolic and other complex disorders to the offspring. gdm-induced epigenetic modifications modulating gene regulation and persisting into later life are generally assumed to mediate these increased disease risks. to identify candidate genes for fetal programming, we compared genome-wide methylation patterns of fetal cord bloods (fcbs) from gdm and control pregnancies. methods and results: using illumina's 450k methylation arrays and following correction for multiple testing, 65 cpg sites (52 of which are associated with genes) displayed significant methylation differences between gdm and control samples. three of four candidate genes, atp5a1, prkch, and slc17a4, from our methylation screen and one, hif3a, from the literature were validated by bisulfite pyrosequencing. the gdm effect on fcb methylation was more pronounced in women with insulin-dependent gdm who had a more severe metabolic phenotype than women with dietetically treated gdm. however, the effect remained significant after adjustment for the maternal bmi and gestational week in a multivariate regression model. e. g. dekomien 1, 2 , b. bellenberg 2, 3 , n. trampe 2, 4 , r. schneider 2, 4 , c. prehn 2, 4 , c. krogias 2, 4 , r. kropatsch 1, 2 , m. regensburger 5, 6 , c. lukas 2, 3 , r. gold 2, 4 , 4 1 human genetics, bochum, germany, 2 ruhr-university bochum, germany, 3 radiology, st. josef-hospital, bochum, germany, 4 neurology, st. josef-hospital, bochum, germany, 5 molecular neurology, erlangen, germany, 6 university of erlangen, germany a pair of monozygotic 22-year-old twins suffering from hereditary spastic paraplegia 11 (spg11) is described. patients underwent thorough clinical examination and magnetic resonance imaging (mri) and mr-spectroscopy (mrs) at 3 tesla. genetic testing was performed by sanger sequencing and alternative splicing by rna analysis. 
clinically the patients presented a similar spectrum of symptoms, with a higher level of disability in one of the patients. mri studies including morphometry and regional microstructural analysis by diffusion tensor imaging (dti) of the corpus callosum (cc) revealed marked thinning and corresponding increases of axial diffusivity (ad), radial diffusivity (rd) and apparent diffusion coefficient (adc) and a reduction of the fractional anisotropy (fa) as compared to healthy controls in all cc sections, particularly in the anterior callosal body. there was marked supratentorial white matter reduction and, to a lesser extent, grey matter reduction in both patients. involvement of the cortico-spinal tracts was reflected by fa and rd alterations and cervical cord atrophy. the more strongly affected patient showed a higher degree of callosal microstructural damage and cervical cord atrophy. genetic testing of the spg11 gene revealed two mutations in compound heterozygous state, a known frameshift mutation as well as a novel synonymous exonic splice site mutation. this study shows similar but distinct clinical and imaging findings in monozygotic twins suffering from spg11, suggesting individual downstream genetic effects.
targeted next generation sequencing techniques have tremendously improved our ability to identify sequence variants. however, pinpointing disease-causing mutations still lags behind for several reasons: inadequate gene-specific databases, insufficient prediction tools, incomplete analysis and others. in addition, the identified sequence variants are a mixture of severe disease-causing mutations and a myriad of variants of unknown pathogenicity. furthermore, an unknown number of silent mutations, neutral polymorphisms and sequence variants deeply buried in introns might severely influence splicing of the precursor rna molecule. when only the dna sequence is analyzed, this impact on the integrity of the mrna is completely ignored.
in order to catalog the mrna isoforms derived from genes of our interest we started to set up rnaseq technologies in our routine lab. to reduce the amount of data, to improve the power of analyses and to identify rare isoforms of transcripts we use targeted rnaseq to characterize the mrna molecules originating from those genes we are interested in (e. g.: hereditary breast cancer core genes (10 genes), hereditary colon cancer (23 genes), primary ciliary dyskinesia (pcd)(40 genes). genes involved in pcd offer the invaluable advantage that the respiratory epithelium where these genes are normally expressed can be sampled from the inferior turbinate of the nose by brush biopsy either from healthy probands or from patients suffering from pcd. in addition to direct preparation of rna from these cilia, cilia carrying cells or tissue can be cultured and manipu-abstracts long-range pcr and direct sequence analysis. the comparative analysis of parental haplotypes with the sequences flanking the deletion breakpoints in the patients revealed the absence of any de novo mutations in breakpoint-flanking regions of 19 prs2-mediated and 6 prs1-mediated type-1 nf1 deletions. we conclude that although nahr is a mutational mechanism causing large nf1 deletions, there is no evidence for a local mutagenic effect of these recombination events. hence it is unlikely that nahr underlying type-1 nf1 deletions involves error-prone translesion polymerases that would increase the de novo mutation rate in breakpoint flanking regions. furthermore, the detailed haplotype analysis of prs2, a highly active nahr hotspot mediating the majority of large nf1 deletions, revealed that non-allelic homologous gene conversion (nahgc) between nf1-repa and nf1-repc, which results from non-crossover resolution of recombination intermediates, is the major driving force responsible for the haplotype diversity in this region. 
remarkably, the haplotype diversity patterns observed for nf1-repa and nf1-repc were markedly different, indicating that during nahgc, nf1-repa is disproportionately more often the donor sequence used to repair mismatches in heteroduplex regions than nf1-repc. we also noticed a correlation between haplotype diversity and the number of prdm9 a-allele binding sites, suggesting that haplotype diversity, and hence the nahgc rate within prs2 in nf1-repa, is regulated by prdm9. heidelberg center for personalized oncology dkfz-hipo dkfz, heidelberg, germany dna methylation aberrations at differentially methylated regions of imprinted genes interfere with the natural parent-specific mono-allelic expression. this leads to bi-allelic or absent expression of the imprinted gene, a cause of imprinting disorders (ids). we aimed at analyzing the genome-wide dna methylation pattern of two patients with ids, namely transient neonatal diabetes mellitus (tndm) and multi locus imprinting disturbance (mlid), and their respective parents. the dna methylation profiles of these individuals were obtained by whole genome bisulfite sequencing (wgbs) on b cells sorted by magnetic cell isolation. the sequencing libraries were prepared as described in kretzmer et al. [1] and sequenced on an illumina hiseq 2000 machine. wgbs data were processed with the methylctools toolkit. briefly, bisulfite-treated sequences were aligned with bwa-mem using a three-letter approach, and the methylation ratios were quantified for ~26.9 million out of 28.2 million cpg sites (coverage>5) genome-wide. quality control was performed to assess the quality of the dna methylation profiles, and genetic fingerprinting was performed on the wgbs data, confirming sample origin and family relationship. the wgbs data were further compared to already existing genome-wide human snp array 6.0 (snp array) and humanmethylation 450k bead array (450k) data, showing good agreement with pearson's correlation coefficients >0.97. 
we detected an overall dna methylation level around 75% in all samples. already known dna methylation alterations, e. g. hypomethylation in plagl1 were validated by wgbs. by searching for differentially methylated regions (dmrs), defined as regions composed of at least five consecutive cpg loci showing a methylation difference between patient and corresponding parents above 30%, we identified 442 dmrs in the mlid our study supports an association between maternal gdm and the epigenetic status of the exposed offspring. consistent with a multifactorial disease model, the observed fcb methylation changes are of small effect size but affect multiple genes/loci. the identified genes are primary candidates for transmitting gdm effects to the next generation. they also may provide useful biomarkers for the diagnosis and prognosis of adverse prenatal exposures and assessing the success of interventions during pregnancy. the nuclease hsnm1b/apollo has a dual function in both dna-repair and maintenance of telomeres. as to the repair of dna interstrand crosslinks (icl), hsnm1b/apollo is linked to the fanconi anemia (fa) pathway and cells depleted for hsnm1b/apollo (sirna) resemble those from patients with fa. we have identified a single nucleotide polymorphism, rs6674384, which is associated with quantitative differences in hsnm1b/apollo expression (mrna). we analyze whether the differential expression relates to the degree of cellular sensitivity to the dna interstrand crosslinks inducing mutagen mitomycin c (mmc) and ionising radiation (ir), which induces, among other lesions, dna double strand breaks. all experiments are realized using lymphoblastoid cells derived from generally healthy donors. results of rt-pcr analysis of hsnm1b/apollo expression and of the cell viability assays will be presented and discussed in the context of the potential usefulness of considering rs6674384 in predicting individual sensitivity to mutagens relevant in anti-cancer treatment. 
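the dmr definition given in the wgbs abstract above (at least five consecutive cpg loci whose methylation differs between patient and parents by more than 30%) can be illustrated with a minimal python sketch. this is not the methylctools implementation: the function name call_dmrs, the use of a single parental reference value per cpg (for example the mean of both parents), the requirement that all loci in a run differ in the same direction, and the treatment of neighbouring list entries as consecutive cpgs are assumptions made only for illustration.

```python
from typing import List, Tuple


def call_dmrs(
    patient: List[float],
    parent_ref: List[float],
    min_cpgs: int = 5,
    min_diff: float = 0.30,
) -> List[Tuple[int, int, str]]:
    """Return (first_index, last_index, direction) for each candidate DMR.

    patient / parent_ref hold methylation ratios (0..1) for the same CpGs,
    ordered by genomic position. parent_ref is a hypothetical per-CpG
    parental reference value (assumption: e.g. the mean of both parents).
    """
    dmrs: List[Tuple[int, int, str]] = []
    run_start = None  # index where the current run began
    run_sign = 0      # +1 = patient hypermethylated, -1 = hypomethylated

    def close_run(end_exclusive: int) -> None:
        # Keep the run only if it spans enough consecutive CpGs.
        if run_sign != 0 and end_exclusive - run_start >= min_cpgs:
            label = "hyper" if run_sign > 0 else "hypo"
            dmrs.append((run_start, end_exclusive - 1, label))

    n = min(len(patient), len(parent_ref))
    for i in range(n):
        diff = patient[i] - parent_ref[i]
        sign = (diff > min_diff) - (diff < -min_diff)  # +1, -1 or 0
        if sign != 0 and sign == run_sign:
            continue  # the current run extends
        close_run(i)  # run ended or changed direction
        run_start, run_sign = (i, sign) if sign != 0 else (None, 0)
    close_run(n)  # close a run that reaches the last CpG
    return dmrs


if __name__ == "__main__":
    # Toy data: 5 hypermethylated CpGs, a gap, 4 hypomethylated CpGs
    # (too short to count), a gap, then 6 hypomethylated CpGs.
    patient = [0.9] * 5 + [0.5] + [0.1] * 4 + [0.5] + [0.1] * 6
    parents = [0.5] * len(patient)
    print(call_dmrs(patient, parents))
```

in a real analysis the run would also have to be broken wherever neighbouring cpgs are far apart or lack sufficient coverage; the sketch ignores genomic coordinates entirely.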
p-basepi-014 nahr events causing type-1 nf1 microdeletions are not associated with an increased mutation rate in breakpoint-flanking regions m. hillmer 1,2 , a. summerer 1,2 , v. f. mautner 3, 4 , l. messiaen 5, 6 , h. 2 1 institute of human genetics, ulm, germany, 2 university of ulm, ulm, germany, 3 department of neurology, hamburg, germany, 4 university hospital hamburg eppendorf, hamburg, germany, 5 department of genetics, birmingham, usa, 6 university of alabama at birmingham, birmingham, usa large deletions of the nf1 gene and its flanking regions are the most frequent recurrent mutations in patients with neurofibromatosis type 1 (nf1). different types of large nf1 deletions have been identified which are distinguishable in terms of their size and breakpoint position. most frequent are type-1 nf1 deletions spanning 1.4-mb and characterized by breakpoints located within the low-copy-repeats nf1-repa and nf1-repc which exhibit 97.5% sequence homology within 51-kb. type-1 nf1 deletions are caused by non-allelic homologous recombination (nahr). two nahr hotspots have been identified termed prs1 and prs2 which encompass 5-kb and 4-kb, respectively. approximately 80% of all type-1 nf1 deletion breakpoints cluster within the prs1 and prs2 nahr hotspots. in this study, we analysed whether the nahr events causing type-1 nf1 deletions would be associated with an increased de novo mutation rate of sequences located in breakpoint-flanking regions. to do so, we sequenced the deletion breakpoint-flanking regions in the patients and compared these sequences with the homologous regions amplified from dna of the patients' parents who are not affected by nf1. however, in the germline of these parents, the deletions were mediated by nahr and then transmitted to their offspring. the parental haplotypes within the prs1 or prs2 regions of nf1-repa and nf1-repc were analysed by in the alpl gene and inherited as an autosomal dominant trait can cause milder forms. 
so far, detailed knowledge of the milder forms is lacking. 39 patients with a mutation in the alpl gene were interviewed using a standardized questionnaire concerning the different disease manifestations: teeth, bone fractures, pain of bones and muscles and quality of life. subgroups were formed with regard to the localization of the mutations in the three protein domains. patients with mutations clustering in the catalytic site of the molecule showed the most severe odontohypophosphatasia: one individual had premature primary tooth loss, 31% of patients showed adult tooth loss, 77% suffered from dental caries. the majority had the first manifestation before the age of 18. patients with mutations in the two other domains reported a relatively high quality of life with low pain of muscles and bones. unexpectedly, there was no significant difference between the groups in the proportion of patients with bone fracture. conclusion: the clinical signs of dominant hpp are mostly unspecific. dental problems in particular, such as severe adult tooth loss, an early manifestation of dental caries or enlarged pulp chambers, can be a sign of odontohypophosphatasia and a dominantly inherited mild hpp. mutations in the catalytic site of the alpl molecule are associated with a more severe odontohypophosphatasia. screening of non-neoplastic lymphatic tissues from children for the igh-myc fusion using a highly sensitive 4-color fish-assay burkitt lymphoma is a mature b-cell lymphoma which on the genetic level is characterized by the burkitt translocation t(8;14)(q24;q32) juxtaposing the igh locus in 14q32 next to the myc locus in 8q24. in a minor part of burkitt lymphomas, immunoglobulin light chain variants of the translocation result in overexpression of myc. despite being pathognomonic for burkitt lymphoma, the ig-myc juxtaposition is not sufficient on its own for a malignant transformation of the cell. 
other igh rearrangements like the igh-bcl2 fusion, typical for follicular lymphoma, were detected in a significant number of healthy individuals. for the igh-myc translocation, only scarce data in healthy individuals exist. this is most likely due to scattering of the breakpoints, which are far more difficult to target by pcr than those of the igh-bcl2 translocation. therefore, we aimed at investigating whether myc-translocation positive cells can also be detected in normal b-cell maturation. considering the epidemiology of burkitt lymphoma being the most common b-cell lymphoma in children, we focused on samples from young individuals. on the one hand, we analyzed non-neoplastic tissue specimens of bone marrow (n = 14) (age range 3-18 years, median age 8.5 years) and lymph nodes (n = 19) (age range 2-18 years, median age 12 years). on the other hand, considering the typical clinical presentation of burkitt lymphoma, we included non-neoplastic tissue specimens containing peyer patches (n = 25) (age range 48 hours-20 years, median age 17 years). the specimens were analyzed using a four-color fluorescence in situ hybridisation (fish) assay with probes flanking the breakpoints on chromosomes 8 and 14. in this setting, a positive result comprised the break on both chromosomes (seen as signal split for each locus) and fusion of the involved genes (leading to two different fusion signals). the assay was first validated on cells with a normal male karyotype from healthy individuals as well as on five burkitt cell lines and five each of ffpe t(8;14)-negative and t(8;14)-positive tissues as negative and positive controls. the assay was then applied to screen for an igh-myc fusion in the above-mentioned paraffin-embedded tissues. successful hybridizations were obtained for 9, 15 and 17 ffpe sections from bone marrow, lymph nodes and peyer patches, respectively. trio and 1776 dmrs in the tndm trio. 
in the mlid trio 238/442 dmrs showed increased and 204 dmrs decreased dna methylation in the patient's sample. of 442 dmrs, 325 are located in regions potentially associated with transcriptional regulation. further analysis revealed that 34/442 dmrs are associated with imprinted genes. in the tndm trio, we detected 1705/1776 dmrs showing hypermethylation in the patient compared to her parents and 71/1776 dmrs with lower dna methylation. in this trio 1166/1776 dmrs are associated with regions potentially correlated to transcriptional regulation and 13 dmrs with imprinted genes. in summary, our results show that wgbs is a well suited and valid method for analyzing dna methylation. while the overall dna methylation levels do not differ between the analyzed patients and parents, a detailed analysis of smaller regions revealed the existence of 442 and 1776 differentially methylated regions between the analyzed mlid and tndm patients, respectively, and their parents. supported by bmbf through fkz: 01gm0886 and 01gm1114 and 01gm1513 p-basepi-016 characteristic mutational profile in children of individuals exposed to ionizing radiation p. krawitz, h. manuel, a. knaus, g. hildebrandt, m. jäger, m. schubach, m. rodriguez de los santos, t. pantel, d. beule, s. mundlos, k. sperling institute for medical genetics and human genetics charite, berlin, germany the dna damaging effects of ionizing radiation are deliberately used in cancer therapy as well as feared in accidents related to nuclear technology. despite its influence on the exposed organism, irradiation was believed to have no major effect on succeeding generations, as irreparable dna damage was thought to result in cell death. recently, however, genome-wide mutation screenings in offspring of male mice that were irradiated with high dosages showed an accumulation of certain de novo events. 
we therefore focused on these mutational classes in a small cohort of human individuals that were conceived while or after their fathers were exposed to high frequency radiation. in the whole genome sequences of 22 such offsprings we could confirm de novo rates for single nucleotide variants in the order of 10 -8 per base as previously reported. interestingly, however, we found de novo translocations of paternal origin as well as increased numbers of clustered de novo mutations that resemble the results from animal studies. this characteristic mutation profile might thus be used as an indicator of irradiation exposure in one of the individual's parents. from the upstream regular rb1 promoter. to test this hypothesis, we generated a genetic model carrying modifications in the rb1 promoter and in cpg85 using crispr/cas9 technology. data on the establishment of the model and first results will be presented. the gid/ctlh protein complex with its seven core protein members is conserved in all eukaryotic cells. in saccharomyces cerevisiae it functions as an ubiquitin ligase complex and regulates the metabolic switch from gluconeogenesis to glycolysis (1) . recently, we could show that the vertebrate gid/ctlh complex also functions as an ubiquitin ligase, however substrates and exact function remain unknown (2) . a growing number of components of the ubiquitin protein system (ups) are described to be regulators of ciliogenesis (3). defects in such genes are considered to cause ciliopathies, genetic disorders with typical phenotypic variations in patients and model organisms (4) . first data supports our hypothesis that the ctlh complex plays a major role in ciliogenesis, e. g. the ctlh subunit rmnd5a localizes to the basal body which is a modified centriole of primary cilia in nih-3t3 cells and rmnd5 knock down in xenopus laevis leads to defects in cilia formation of epidermal multiciliated cells. n. reich, m. sandbothe, r. buurman, b. schlegelberger, t. illig, b . 
skawran department of human genetics, hannover, germany background and aims: hepatocellular carcinoma (hcc) is characterized by genetic and epigenetic changes that lead to a deregulation of important tumor suppressors and oncogenes in a multistep process. one of these epigenetic changes is the elevated expression of histone-deacetylases (hdacs) which contribute to a transcriptional repression of certain genomic regions by remodeling the chromatin structure. thereby, not only the expression of tumor-relevant genes is affected, but also the expression of micrornas (mirnas). selected mirnas have been shown to play important roles in carcinogenesis. we aimed to identify mirnas deregulated by histone deacetylation and to understand their functional consequences in hcc tumorigenesis. methods: histone acetylation was induced by the global hdac inhibitor trichostatin a (tsa) in four hcc cell lines (hle, hlf, huh7, hepg2) and two immortalized liver cell lines (thle-2 and thle-3) in order to identify differentially expressed mirnas and messenger rnas (mrnas) by global expression profiling. findings were validated by transfection of microrna mimics and sirna-mediated knockdown in hcc cell lines, quantitative pcr, western blotting and luciferase reporter assays. results: after hdac-inhibition, hsa-mir-129-5p was significantly upregulated. the mir-129-5p holds tumor suppressive potential and its expression is reduced in different types of tumors. one predicted target gene of mir-129-5p is the hepatoma-derived growth factor (hdgf). this mitogenic growth factor is highly expressed in a variety of cancers, for example in hcc, and its expression correlates with a poor prognosis, irrespective of the tumor type. hdgf is a multifunctional protein that is involved in several signaling pathways, contributing to proliferation and metastasis of cancer cells, induction of angiogenesis and inhibition of apoptosis. 
incubation of hcc cells with tsa or transfection with mir-129-5p reduced expression of hdgf. luciferase assays indicate a direct regulation of hdgf by mir-129-5p. moreover, expression of the death receptor fas, which is a potential downstream target of hdgf, is also regulated by mir-129-5p. a translocation t(8;14)(q24;q32) was not detectable in any of the investigated tissues. with the established assay we were able to provide a highly sensitive tool for the detection of the translocation t(8;14)(q24;q32). however, we did not detect normal b-cells carrying this translocation. this does not exclude that such cells exist. alternatively, the growth advantage conferred by myc may promote the acquisition of secondary genetic changes. this may result in such rapid tumorigenesis that, if they occur, these cells present only as full-blown burkitt lymphoma. myotonic dystrophy: links to the nuclear envelope p. meinke, s. hintze, s. limmer, b. schoser friedrich-baur-institute, munich, germany myotonic dystrophies (dm) are slowly progressing multisystemic diseases with a predominant muscular dystrophy, making dm the most frequent muscular dystrophy in adulthood. dm is caused by heterozygous dna-repeat expansions in the dmpk gene (dm1) or the cnbp gene (dm2). the repeat-containing rna accumulates in ribonuclear foci and splicing factors are sequestered to these foci, resulting in abnormal regulation of alternative splicing. dm patients show overlapping phenotype presentations with progeroid laminopathies, which are caused by mutations in nuclear envelope proteins. in search of molecular signatures of this overlap, we found an enrichment of nucleoplasmic reticuli in dm1 and dm2 patient myoblasts. additionally, we found alternative splicing of the lmna gene. both effects are associated with progeroid laminopathies. this implies a possible shared pathomechanism between dm and progeroid laminopathies. 
retinoblastoma is a tumor of the retina occurring in young children up to the age of five. it is caused by biallelic inactivation of the tumor suppressor gene rb1. we have shown that human rb1 is an imprinted gene and as such characterized by differential dna methylation of a cpg island (cpg85) in rb1 intron 2. cpg85 is not methylated on the paternal allele and acts as a promoter for the alternative rb1 transcript, rb1-e2b. it is argued that expression of rb1-e2b is causative of the observed skewing of regular rb1 expression in favor of the maternal allele. a true gametic differentially methylated region (gdmrs) is established in only one of the parental germ lines. it is supposed to be stable during early embryonic development and to be passed on to all daughter cells. we could show that cpg85 is free of methylation in human sperm. publicly available methylome data on oocytes revealed that cpg85 is fully methylated in human oocytes. these data are in agreement with cpg85 being a maternal methylated gdmr. we showed that the level of cpg85 methylation is 60 percent in blood, as expected. however, in eight tissues of three individuals we observed a gain of methylation at cpg85 ranging from 60 to 65 percent in liver and skin, and increasing to 70 to 85 percent in the other tissues (heart, kidney, muscle, brain, lung and spleen). interestingly, the degree of methylation was lower in fetal tissue than in adult tissue, as determined for brain and muscle. we also observed gain of methylation at cpg85 in two human embryonic stem cell lines and induced pluripotent stem cells. this is consistent with the finding of complete methylation at cpg85 in eight different retinoblastoma cell lines. we therefore conclude that cpg85 is an unstable dmr. in oocytes, dna methylation of gdmrs is established by transcriptional read-through from an upstream promoter. 
therefore, we hypothesize that gain of dna methylation at cpg85 is caused by run-through transcription erozygous state (nomenclature according to hgvs; reference sequence nm_000051.3). in silico analysis by alamut (version 2.8.1) predicts the loss of the donor splice site of intron 27 of the atm gene. cdna analysis was performed and revealed the loss of exon 27 of atm through complex activation of two cryptic splice sites. a premature stop codon was generated, giving rise to a truncated protein, so the variant is considered pathogenic. the results of the genetic analysis are discussed in the context of the clinical findings. identification of the underlying genetic causes of gastric cancer will give a better view of the mechanisms that contribute to the pathophysiology of the disease. gemeinschaftspraxis für humangenetik und genetische labore, hamburg, germany, 2 zentrum für diabetologie bergedorf, schwerpunktpraxis, hamburg, germany about 2-5% of all pregnant women develop gestational diabetes mellitus (gdm) during their pregnancies, and diabetes complicating pregnancy is associated with adverse maternal and perinatal outcomes, notably risk of fetal macrosomia, neonatal hypoglycemia and development of diabetes after pregnancy. gdm is considered to result from interaction between genetic and environmental risk factors. the case of a 35-year-old female german patient with a novel mutation in the pax4 gene (rare mody gene type 9) as a cause of gestational diabetes mellitus is presented. we describe clinical, biochemical and genetic features of the patient, who developed gdm and gave birth to her child by cesarean section. mody genes type 1-11 were analyzed. sequencing the pax4 gene revealed a novel mutation in exon 8 (pax4, c.778delc, p.(leu260cysfs*24); reference sequence nm_006193.2), a deletion of a cytosine leading to a truncated, non-functional protein. to date, no small deletion has been detected in the pax4 gene. 
identification of the underlying genetic causes of gdm will give a better view of the mechanisms that contribute to the pathophysiology of the disease. furthermore, early identification may improve options to prevent gdm and complications for the mother and her child. the results of the genetic analysis are discussed in the context of the clinical findings. the modulation of dna methylation is highly flexible and plays an important role during cell differentiation. furthermore, the dna methylome alters considerably during aging. age related changes in the dna methylation of regulatory genes are assumed to have a major impact on carcinogenesis (teschendorff, 2010) . moreover, it was demonstrated that the chronological age of a human donor can be predicted with high accuracy by analyzing the dna methylation of a specific minor set of cpg loci which are aberrantly methylated during aging (horvath, 2013) . hence, we intended to investigate the effect of epimutations identified in different lymphoma entities in comparison with the influence of epigenetic changes in sequential b cell differentiation stages on the epigenetic age. the altered expression of the tumor suppressor mir-129-5p due to chromatin remodeling may play a fundamental role in hepatocarcinogenesis. we expect that histone deacetylation and putative target genes of epigenetically deregulated mir-129-5p can be targeted by new therapeutic agents. the microrna-449 family inhibits tgf-β-mediated liver cancer cell migration by targeting sox4 introduction: modulation of microrna expression is considered for treatment of hepatocellular carcinoma (hcc). therefore, we characterized the epigenetically regulated microrna-449 family (mir-449a, mir-449b, mir-449c) with regards to its functional effects and target genes in hcc. methods: after transfection of mir-449a, mir-449b, and/or mir-449c, tumor-relevant functional effects were analyzed using in vitro assays and a xenograft mouse model. 
binding specificities, target genes, and regulated pathways of each microrna were identified by microarray analyses. target genes were validated by luciferase reporter assays and expression analyses in vitro. furthermore, target gene expression was analyzed in 61 primary human hccs compared to normal liver tissue. results: tumor suppressive effects, binding specificities, target genes, and regulated pathways of mir-449a and mir-449b differed from those of mir-449c. transfection of mir-449a, mir-449b, and/or mir-449c inhibited cell proliferation and migration, induced apoptosis, and reduced tumor growth to different extents. importantly, mir-449a, mir-449b, and, to a lesser degree, mir-449c directly targeted sox4, which codes for a transcription factor involved in epithelial-mesenchymal transition and hcc metastasis, and thereby inhibited tgf-β-mediated cell migration. conclusions: this study provides detailed insights into the regulatory network of the epigenetically regulated microrna-449 family and, for the first time, describes distinct tumor suppressive effects and target specificities of mir-449a, mir-449b, and mir-449c. our results indicate that particularly mir-449a and mir-449b may be considered for mirna replacement therapy to prevent hcc progression and metastasis. novel mutation in the atm gene and activation of two cryptic splice sites in a 52-year-old female patient with gastric cancer gemeinschaftspraxis für humangenetik und genetische labore, hamburg, germany, 2 schwerpunktpraxis, hämatologie, onkologie und palliativmedizin, hamburg, germany, 3 israelitisches krankenhaus, chirurgische klinik, hamburg, germany gastric cancer is a global public health concern, ranking as the third leading cause of cancer mortality. familial aggregation of gastric cancer occurs in about 10% of cases, and about half of these can be attributed to hereditary germline mutations. 
however, for most gastric cancer cases, whether genetic events contribute to cancer susceptibility remains unknown. here we present a case report of a patient with gastric cancer, a family history of breast cancer and a novel mutation leading to complex cryptic splicing in the atm gene. ngs panel sequencing and cnv/mlpa analysis of 18 genes associated with gastric and breast cancer were performed. sequencing revealed a novel mutation in intron 27 of the atm gene, atm,c.4109 + 1g>a in an het-abstracts tions remained unidentified since positive deletion-junction pcr products could not be amplified. to identify the breakpoints of the 15 deletions, we performed custom-designed microarray cgh analysis with a high resolution of probes located within and flanking the nahr hotspots prs1 and prs2. the array analysis suggested that 11 of the 15 deletions exhibit breakpoints within prs2, even although previously performed breakpoint-spanning pcrs with primers designed according to the reference sequence of the human genome have been negative in these 11 cases. since prs2 exhibits high sequence diversity resulting from frequent nonallelic homologous recombination events without crossover, we surmised that haplotype diversity is responsible for the failure of the breakpoint-spanning pcrs performed with primers designed according to the reference sequence. therefore we characterized the haplotype diversity of prs2 in 30 human individuals and designed deletion-junction pcr primers that facilitate the amplification of rare prs2 haplotypes. so far, we have identified the breakpoints of four of the 11 type-1 nf1 deletions predicted to have been mediated by prs2 according to the array results. we are confident to identify further breakpoints by extending these analyses using primers suitable to amplify rare prs2 haplotypes. 
our findings indicate that the characterisation of nahr hotspots in terms of haplotype diversity is a premise to identify the breakpoints of nahr-mediated microdeletions by means of deletion-junction pcrs. p-basepi-028 *** array-based dna methylome analyses of primary lymphomas of the central nervous system ulm university, ulm, germany, 2 university of cologne, cologne, germany, 3 christian-albrechts-university kiel, kiel, germany, 4 university hospital muenster, muenster, germany primary lymphomas of the central nervous system (pcnsl) are defined as diffuse large b-cell lymphomas (dlbcl) that are confined to the central nervous system (cns). although pcnsl cannot be distinguished from dlbcl by their morphology as well as their histology, they differ in prognostic outcome. the aim of the present study was to compare the epigenomic landscape of pcnsl and dlbcl. to this end, we analyzed the dna methylation of a total of 26 pcnsl (cryopreserved or formalin fixed and paraffin embedded (ffpe)) using the infinium humanmethylation450 beadchip array (illumina) and contrasted these findings to 79 dlbcl (kretzmer et al., 2015) . as controls, we used publicly available dna methylation data from a total of 50 normal brain samples derived from different regions of the cns (gilbert et al., 2105; jaffe et al., 2016; kurscheid et al., 2015; mur et al., 2013; wockner et al., 2014) . after normalization of the data we performed thorough filtering and removed the random snps, all loci located on gonosomes, as well as those loci with a detection p-value >0.01 in at least one of the samples, leading to 457,951 loci entering subsequent analyses. when comparing the dna methylation profiles of pcnsl versus dlbcl we identified 8279 differentially methylated loci (σ/σ max = 0.4; q < 1e-4). in order to remove those loci which represent a "brain signature", we compared dlbcl versus brain (σ/ σ max = 0.4; q < 0.05) based on the list of the previously identified 8279 loci. 
after removing this "brain signature", we ended up with 2231 loci that are differentially methylated between pcnsl and dlbcl. in a next step we wanted to make sure that the differences in methylation at these 2231 loci are not due to differences in starting material (cryopreserved versus ffpe) which is known to influence the outcome of the beadchip analysis. therefore, we compared the dna methylation profiles of five cryopreserved versus ffpe samples (derived from the same tissue samples) and identified 318 differentially methylated loci (σ/σ max = 0.4, q < 0.05). only five loci of both lists overlapped, which were subsequently removed from further analysis so that we ended up with a final list of 2226 loci which are differentially methylated between pcnsl and dlbcl. in order to analyze the biological implications of the differentially methylated loci we evaluated an enrichment of known functional groups (ku-additionally, our aim was to analyze whether entity-specific differences in the resetting of the epigenetic clock are generated during lymphomagenesis or derive from modified dna methylation in the germinal center b cells of origin. to address these issues, we performed dna methylation profiling (hum-anmethylation450 beadchip) of 72 burkitt lymphoma samples (age 2-76 yrs), 119 diffuse large b cell lymphoma samples (age 3-93 yrs) and 103 follicular lymphoma samples (age 22-80 yrs) from the icgc mmml-seq and mmml-consortium (kretzmer et al., 2015) and the hematopathology section kiel as well as 145 peripheral blood samples of healthy individuals (0-63 yrs) available from the same project and current publications of our group (kolarova et al., 2015; friemel et al., 2014) . in addition, we received 61 b cell subpopulation samples (0-66 yrs) covering different stages of b cell differentiation that were measured in the same way (kulis et al., 2015; lee et al., 2012) . 
the epigenetic age of the samples was predicted using the "online age calculator" accessible at https://dnamage.genetics.ucla.edu and compared with the corresponding chronological age of the donors. in fact, the epigenetic age of peripheral blood samples of healthy donors was in high accordance with their chronological age (pearson's r 0.968, p-value <0.001) while the correlation between epigenetic and chronological age of sequential b cell differentiation stages was slightly lower (pearson's r 0.893, p-value<0.001). in contrast, the predicted epigenetic age of the burkitt lymphoma samples was significantly higher than the corresponding chronological age. this deviation may be interpreted as "epigenetic pre-aging". nevertheless, the epigenetic age of diffuse large b cell lymphomas and follicular lymphomas tended to be less affected. in conclusion, we found significant epigenetic pre-aging in burkitt lymphoma samples that seems to be induced during lymphomagenesis and does not derive from altered dna methylation patterns in the germinal center b cells of origin. moreover, no significant shift of the epigenetic age was observed for the other lymphoma entities, healthy blood samples and b cells of sequential differentiation stages. identification of type-1 nf1 deletion breakpoints mediated by rare prs2 haplotypes a. summerer 1, 2 , m. hillmer 1,2 , v. f. mautner 3, 4 , l. messiaen 5, 6 , h. 2 1 institute of human genetics, ulm, germany, 2 university of ulm, ulm, germany, 3 department of neurology, hamburg, germany, 4 university hospital hamburg eppendorf, hamburg, germany, 5 department of genetics, birmingham, usa, 6 university of alabama at birmingham, birmingham, usa neurofibromatosis type 1 (nf1) is a hereditary cancer syndrome with an incidence of 1 in 3000. in 5% of all nf1 patients, large deletions encompassing the nf1 gene and its flanking regions are causing the disease. 
the majority of all large nf1 deletions are of type-1; they encompass 1.4-mb and are mediated by nonallelic homologous recombination (nahr) with crossover. the breakpoints of type-1 deletions are located within the low-copy repeats nf1-repa and nf1-repc which exhibit high sequence homology to one another. previous studies suggested that type-1 deletion breakpoints cluster within the paralogous recombination sites prs1 and prs2 spanning 5-kb and 4-kb, respectively. in our present study, we investigated 218 patients with type-1 nf1 deletions using long-range pcrs to detect breakpoints located within prs1 or prs2. according to these analyses, 157 (72%) of the breakpoints are located within prs2 and 34 (15.6%) in prs1. however, 27 (12.4%) of the type-1 deletions were not positive for these deletion-junction pcrs. we surmised that some of these deletions may have breakpoints within the 14-kb region located between prs1 and prs2. this 14-kb region also exhibits high sequence homology between the nf1-reps which is a prerequisite for nahr. indeed, 12 of the 27 type-1 nf1 deletions exhibited breakpoints within this 14-kb region as determined by the analysis of seven overlapping deletion-junction pcrs. however, the breakpoints of 15 dele-esophageal adenocarcinoma (ea) represents one of the most rapidly increasing cancer types in high-income countries. barrett's esophagus (be) is a premalignant precursor of ea and has an estimated prevalence of 5-6% in the population. however, only 0.1 to 0.3% of be patients develop ea. within an international consortium, we carried out a gwas meta-analysis in 6167 be patients, 4112 ea patients and 17,159 controls (gharahkhani et al., lancet oncology, 2016). in a combined be/ea analysis, we identified 14 genome-wide significant risk loci, of which seven were previously unreported. the strongest associated new risk variant was identified for rs17451754 (p = 4.8 × 10^-10), which maps within intron 21 of the cftr gene.
cftr encodes a protein that functions as a chloride channel and that is mutated in patients with cystic fibrosis (cf). cftr mutations in cf lead to abnormally viscous secretions with altered chemical composition, resulting in dysfunction of the respiratory system and the gastrointestinal tract. the most common cf mutation is δf508, a deletion of three nucleotides in cftr that results in the loss of the phenylalanine at protein position 508. interestingly, cf patients show a highly increased incidence of gastroesophageal reflux, which represents the major risk factor for be and ea. in view of the phenotypic overlap for gastroesophageal reflux and cystic fibrosis, and for gastroesophageal reflux and both be and ea, combined with the association of cftr risk variants in patients with be and ea, it seems plausible that a common pathophysiological mechanism is triggered by cftr. in order to test this hypothesis, we analyzed the association of δf508 in a european case-control cohort with be and ea patients. for this, we performed a genotyping assay of δf508 in 1037 be patients, 1609 ea patients and 941 controls. we could not observe a significant association (p = 0.57). this might be due to (i) insufficient statistical power or (ii) the possibility that other genetic variants at the locus, rather than δf508, explain the underlying functional mechanism of the association. fine mapping of all genetic variation at the cftr locus and exten-lis et al., 2015). remarkably, cpg loci that are differentially methylated during normal b-cell maturation were significantly depleted. in turn we saw an enrichment of loci located in heterochromatin. in summary, we detected more than 2000 loci that are differentially methylated between pcnsl and dlbcl, which do not play a functional role in normal b-cell differentiation.
replication study of gwas-identified genetic modifiers of age at huntington's disease onset although there is a strong correlation between cag repeat length and age at onset (ao) of motor symptoms, individual huntington disease (hd) patients may differ dramatically in onset age and disease manifestations despite similar cag repeat lengths. since the modifier variations described so far only account for a small fraction of the heritable contribution to the ao, the identification of loci and genes using genome-wide methods appears highly promising. against the background of incomplete understanding of hd pathophysiology, the hypothesis-free approach of gwas offers an ideal starting point for the search for modifier genes. recently, a combined analysis of all gwa data on hd modifiers identified different loci with genome-wide significant signals for association with residual age at motor onset [gem-hd consortium]. interestingly, none of the most significant association signals and none of the trending snps in the european gwa analysis corresponded to any previously suggested candidate modifier genes. in order to be able to better assess these data, we tried to replicate the top ten associated gwas variants in a comprehensive cohort of 505 german hd patients. we only found modest association with one of the top ranked snps (rs72810940); all remaining variations showed no correlation with the ao. this inconsistency highlights once again the difficulties of modifier searching in hd or any other monogenic disorder, which faces the same challenges as the genetic characterization of complex disorders. with an incidence of 0.2-0.3/100,000, malignant tumors of the thymus are a rather rare type of cancer. here we report the case of a man of german descent, who presented with a thymoma at age 55. in the pathological report the thymus tumor was described as an extremely unusual thymoma with partial loss of keratin and massive proliferation of myoid cells.
it was classified as a primary thymic, partially epithelial neoplasia, resembling an uncommon b2/b3-thymoma. after the patient's death his widow sought genetic counseling concerning the risk of disease for her children. detailed personal and familial history brought up surprising information: thymoma was one of four cancers in our patient. he developed adenocarcinoma of the colon at age 35, squamous cell cancer of the nose/upper lip at 54 years and in addition current cancer staging revealed a papillary renal cell carcinoma. according to family history his father and his uncle developed colon cancer at 58 and 66 years; the son of this uncle was diagnosed with colon cancer at age 47. this cousin of the propositus was referred to genetic counseling because of msi-high status and loss of mlh1 and pms2 in immunohistochemistry. he was found to have a deleterious mutation in exon 14 of the mlh1 gene (c.2644c>a, p.tyr548stop) resulting in a premature termination of the mlh1 protein. our patient has never been tested for hnpcc. however, a post-mortem immunohistochemical examination of thymic cancer cells revealed an almost complete loss of mlh1 nuclear expression, suggesting the presence of an mlh1 germline mutation and indicating hnpcc. considering the loss of mlh1 in tumor cells it is more than likely that the development of thymoma was the consequence of deficient dna mismatch-repair. there have been reports of rare tumors in hnpcc families in the last years (i. e. clear cell renal carcinoma and uterine sarcoma). pande et al. reported one case of thymoma in their registration of cancer occurrences in 368 mutation carriers from 176 hnpcc families [1]. our case emphasizes the importance of detailed family history and contributes to the discussion of widening the inclusion criteria for genetic counseling and testing for hnpcc. to this day the revised criteria of bethesda are used to identify families at risk.
we propose that the established criteria have to be revised and rare tumors should be included. unknown partner genes in leukemias with rare translocations can be identified using targeted rna sequencing c. haferlach, n. nadarajah, m. meggendorfer, n. dicht, a. stengel, w. kern, t. haferlach mll, munich, germany in hematological malignancies fusion genes play an important role and function as therapeutic targets, impressively shown for e. g. bcr-abl1 and etv6-pdgfrb. thus, the identification of fusion genes is the basis for precision medicine, selecting treatment based on genotype and providing markers for disease monitoring. the aim of this study was to test the value of targeted rna sequencing in a routine diagnostic work up. 51 cases were selected harboring rearrangements of kmt2a (n = 10), runx1 (n = 19), etv6 (n = 11), pdgfrb (n = 6), npm1 (n = 2), rara (n = 2) and jak2 (n = 1) identified by chromosome banding (cba) and fish analyses. in none of the cases could the partner gene be identified using standard methods. targeted rna sequencing was performed using the trusight rna fusion panel (illumina, san diego, ca) consisting of 7690 probes covering 507 genes known to be involved in gene fusions. this assay allows the capture of all targeted transcripts. sequencing was performed on nextseq (illumina, san diego, ca). analysis was performed with the rna-seq alignment app (basespace sequence hub) using star for alignment and manta for gene fusion calling with default parameters (illumina). sive functional analysis are needed to find the causal variant that explains how the cftr locus interferes with the pathomechanism of be and ea. a recent functional study indicated cftr as a tumor suppressor gene in murine and human intestinal cancer, providing further evidence for cftr as a true disease gene for be and ea.
background: it has long been established that mutations in brca2 predispose to pancreatic adenocarcinoma with brca2 germline mutations identified in 6-10% of familial pancreatic cancer cases. consequently, screening for pancreatic cancer has been recommended for mutation carriers with an affected first-degree relative since early detection has been shown to significantly improve 5-year survival from 4-7% to 24%. for brca1 mutations, however, relevance in pancreatic tumorigenesis is still being discussed with several studies questioning an elevated risk of pancreatic cancer in families with brca1 mutations while others are suggesting that brca1 may also play an important role in predisposing to pancreatic cancer. clinical screening for pancreatic cancer commonly remains unavailable to brca1 mutation carriers and it has even been questioned whether brca1 should be analyzed in familial pancreatic cancer at all. clinical report: here we report on a 59-year-old woman with metastatic pancreatic cancer whose sister had died of pancreatic cancer at 50 years of age. in this family we identified a pathogenic brca1 germline mutation (brca1: nm_007294.3:c.1292dupt,p.(leu431phefs*5)) by next-generation sequencing using a 94-gene panel. the index patient's tumor was available for genetic analysis and showed loss of heterozygosity for brca1. this strongly suggests the brca1 mutation to be causative of the pancreatic cancer development in this patient. when the family was first introduced to genetic counselling there was no evidence of breast or ovarian cancer in any relatives. only after identification of the mutation did the index person reach out to distant family members and it was thereby revealed that a distant branch of the family had independently been counselled for hereditary breast and ovarian cancer. in this part of the family, however, there had not been any cases of pancreatic cancer.
subsequent predictive testing was offered to healthy family members and 3 further mutation carriers could be identified. two women were referred to breast cancer screening. additionally, the mutation was identified in a relative with recurrent metastatic breast cancer at the age of 38 years. for her and the index patient parp-inhibition therapy thus became a possible further treatment option. conclusions: in conclusion we propose next-generation sequencing approaches including the analysis of brca1 to be used in familial pancreatic cancer. we also argue that brca1 mutation carriers with pancreatic cancer cases in their family should be offered the same screening program as brca2 mutation carriers. within the framework of a study this could allow for more precise risk stratification in the future. contralateral dcis is unknown. only the male breast carcinoma was herceptin receptor positive, all other breast carcinomas of the chek2 mutation carriers were her2 negative. among our chek2 positive families we noticed the association with chek2 mutation and female breast cancer. we observed a contralateral breast cancer, male breast cancer and other tumors in our families as well. the majority of the observed breast cancers was estrogen and progesterone receptor positive and herceptin negative. while benign uterine smooth muscle tumors are among the most frequent human symptomatic tumors, their malignant or borderline lesions are only rare findings. both lesions can show somatic copy number alterations, but their patterns differ, thus constituting helpful diagnostic tools. aimed at an advanced classification of the lesions we have performed molecular inversion probe array analyses of these tumors. besides complex patterns of genomic alterations seen in nearly all cases, two of the lesions presented with copy number neutral uniparental disomies i. e. normal copy numbers with an apparent monoallelic origin. 
in one case, an upd of part of the long arm of chromosome 22 was detected in a uterine leiomyosarcoma. the tumor showed genetic heterogeneity with gains and losses. in addition, the 11.45 mb segment located at 22q12.1-q13.1 was clearly of monoallelic origin throughout all cells investigated. all other genetic alterations were restricted only to part of the cells of the sample thus reflecting the presence of tumor cells as well as normal bystander cells which in general characterizes mutations that had arisen during tumor development. in contrast, the upd that was detected in all examined cells clearly suggests its germline occurrence. the second tumor was a leiomyoma-variant of the type with bizarre nuclei. again, besides gains and losses an apparent germline upd was found that covered a 3.71 mb segment on chromosomal segment 8q11.21. upd for even the whole arm of chromosome 22 repeatedly has been reported not to coincide with phenotypic manifestations. nevertheless, the question arises whether or not the observed upds might be related to a familial predisposition for uterine muscle tumors. of note, as a result of genome-wide association studies snps on 22q recurrently have been found to be significantly associated with fibroid development. triple negativity is an independent predictor of germline mutations in breast cancer predisposing genes breast cancer is the most common cancer in women. 12-15% of all tumors are triple-negative breast cancers (tnbc) lacking expression of estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2. so far, tnbc have been mainly associated with mutations in brca1, although recent studies also found mutations in other breast cancer susceptibility genes. a brca1/2-centered perspective thus may ignore the significance of other predisposing genes, whose relevance appears obvious as dna damage repair by homologous recombination is a complex process involving many proteins.
in 32/51 cases with rearrangements involving kmt2a (n = 10), runx1 (n = 8), etv6 (n = 6), pdgfrb (n = 4), rara (n = 2), npm1 (n = 1) or jak2 (n = 1) the partner genes were identified. these were in kmt2a rearranged cases: mllt10 (n = 2), mllt1 (n = 2), itpr2, flnc, asxl2, dcp1b, maml1 and arhgef12. in runx1 translocated cases partner genes were plag1 (n = 2), prdm16, mecom, zfpm2, man1a2, n6amt2, and kiaa1549l. prdm16, mecom and zfpm2 have previously been described in the literature as runx1 partner genes but were not suspected in our cases as partner genes due to complex cytogenetic rearrangements. the other identified partner genes have not been described so far. interestingly, prdm16, mecom, zfpm2 and the newly identified plag1 are all members of the c2h2-type zinc finger gene family. partner genes identified in etv6 rearranged cases were: abl1, ccdc126, clptm1l, erg, foxo1 and cflar-as1. wdr48, zbtb11, nfia and mprip were identified as partner genes of pdgfrb and rpp30 in an npm1-translocated aml. in an all patient a jak2-ppfibp1 fusion was identified leading to classification as a bcr-abl1-like all. in an apl patient showing an ins(17;11)(q12;q14q23) a zbtb16-rara fusion was identified and thus resistance to all-trans retinoic acid, arsenic trioxide, and anthracyclines can be predicted. further in a case with t(17;19)(q21;q13) an irf2bp1-rara fusion was detected. conclusions: targeted rna sequencing was able to characterize rare gene fusions and provided the basis for the design of rt-pcr based assays for monitoring mrd. targetable genetic aberrations were identified, which were not detected by cba enabling more individualized treatment. targeted rna sequencing may be a valuable tool in routine diagnostics for patients with rearrangements unresolved by standard techniques. female carriers of a pathogenic mutation in the chek2 gene are reported to have a lifetime risk of about 20-45% to develop breast cancer.
there is evidence for increased risks for contralateral breast cancer, male breast carcinoma and other types of tumors. in addition to the well-known mutation chek2:c.1100del, other pathogenic mutations are being identified in the gene due to the inclusion of the gene in most breast cancer gene panels for dna testing. between 2012 and 2016 the center for hereditary breast and ovarian cancer regensburg cared for 10 families with pathogenic or probably pathogenic mutations in the chek2 gene affecting nine female patients and one male patient (6 × c.1100del, 1 × deletion of exon 10, and 3 × variants considered as likely pathogenic: c.1408g>c, c.1561c>t, c.1169a>c). the mean age at diagnosis of breast cancer (both sexes) was 45.1 years (range 25-61 years). the patient with the deletion of exon 10 was first diagnosed at 25 years of age and developed a contralateral breast carcinoma (dcis) at 35 years of age. the male patient was diagnosed with breast cancer at 58 years of age and at 59 years with a renal carcinoma. one patient was diagnosed with a papillary thyroid carcinoma at age 26 years and developed breast cancer at 35 years. in 8 out of the 10 families, breast carcinoma, diagnosed at 51.3 years on average, was reported in the family history. in addition, other malignancies were reported, such as prostatic carcinoma, thyroid carcinoma, colorectal cancer, gastric carcinoma, leukemia, cervical carcinoma and malignant melanoma. none of the affected family members was tested for the respective chek2 mutation. the tumors with an initial diagnosis at 25 years and 35 years were estrogen-receptor-negative and progesterone-receptor-negative. the other 8 of the 11 breast cancers were positive for the estrogen receptor, 7 of the 11 tumors were positive for the progesterone receptor. the receptor status of the abstracts it has been shown that in 3d culture hescs can be differentiated into neural retina containing organoids.
we established this differentiation schedule and started comparative differentiation of wildtype h1 hescs and the rb1 null derivative (g4, rb1 mt/mt ) into neural retina. during the first weeks of differentiation into neural retina organoids generated from the rb1 mt/mt hescs have a smaller diameter and thinner retina layer compared to wildtype organoids. however, during the time-course the mutant organoids began to catch up. thus, at later stages no difference in size and thickness could be observed anymore. comparative immunostainings of cryosections at d19 show no difference in expression of the markers pax6 and sox2 between the wildtype and mutant hescs. further comparative immunostainings for markers specific for neural retina like e. g. rx and vsx2 at d19 and d33 are ongoing and will be presented. exome sequencing identified potential causative candidate genes for unexplained cowden syndrome purpose: cowden syndrome (cs) is a cancer predisposition syndrome characterized by the occurrence of breast cancer, epithelial thyroid cancer, endometrial carcinoma and various other findings such as mucocutaneous lesions and macrocephaly. cs belongs to the pten hamartoma tumor syndrome (phts) primarily associated with germline mutations in pten. in recent years, germline mutations in additional genes (sdhb, sdhc, sdhd, pik3ca, akt1, sec23b) have been described in few patients; however, to date, in 20-75% of patients meeting clinical criteria for cs the underlying cause remains unclear. methods: to uncover predisposing causative genes, the exomes of 11 clinically well characterized, mutation negative patients with suspected cs were sequenced (illumina hiseq) using leukocyte dna. 
assuming a monogenic disease model, the called variants were filtered for rare (minor allele frequency ≤1% for homozygous/compound heterozygous variants and ≤0.01% for heterozygous variants according to dbsnp, evs, and exac), truncating (nonsense, frameshift, highly conserved splice sites), and missense germline variants (predicted to be pathogenic by at least 2/3 in-silico tools). for data analysis and variant filtering the gatk software and the cartagenia bench lab ngs software were applied. all candidate genes were included in a pathway analysis (ingenuity). in a first preliminary analysis, we focused on known cancer genes and genes interacting with pten. results: after stringent filtering steps, comparison with large datasets from population-based controls, and detailed manual inspection to exclude artifacts, 75 genes were affected by presumed biallelic variants (16 homozygous and 59 putative compound-heterozygous), one of these is a known cancer gene (cbfa2t3); in 17 genes biallelic variants were found in 2-6 patients. heterozygous variants were found in 23 genes in 2-6 patients, but none of these are known cancer genes. in 132 genes, heterozygous truncating mutations occurred in only one patient, 4 of these are cancer genes (msh6, wrn, kdm5a, pml) . the phenotype of the patient with a msh6 frameshift deletion fulfilled key features of cs (early-onset metachronous papillary thyroid cancer, breast cancer, endometrial and colorectal cancer), however, the tumor spectrum is partly compatible with lynch syndrome/ hnpcc. examination of the colorectal cancer demonstrated microsatellite instability and a loss of msh6 protein expression. the pathway analysis of the remaining candidate genes identified several interacting partners of pten (grhl3, ehhadh, cstf3). conclusions: preliminary data indicate that exome sequencing might identify potentially relevant causative genes for cs, some of which are recurrently mutated. 
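the variant filtering rules stated in the methods above can be sketched as follows. this is a minimal illustration under stated assumptions, not the gatk or cartagenia workflow used in the study: the `Variant` record, its field names and the gene labels are hypothetical, and allele frequencies are expressed as fractions (1% = 0.01, 0.01% = 0.0001).

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    zygosity: str           # "het", "hom" or "compound_het" (hypothetical labels)
    max_pop_af: float       # highest allele frequency across population databases, as a fraction
    consequence: str        # "nonsense", "frameshift", "splice_site", "missense", ...
    tools_pathogenic: int   # number of in-silico tools (out of 3) predicting pathogenicity

TRUNCATING = {"nonsense", "frameshift", "splice_site"}

def is_rare(v: Variant) -> bool:
    """Apply the zygosity-dependent frequency cut-off from the text."""
    # <= 0.01% for heterozygous variants, <= 1% for homozygous/compound heterozygous ones
    cutoff = 0.0001 if v.zygosity == "het" else 0.01
    return v.max_pop_af <= cutoff

def is_candidate(v: Variant) -> bool:
    """Rare AND (truncating OR missense predicted pathogenic by at least 2 of 3 tools)."""
    if not is_rare(v):
        return False
    if v.consequence in TRUNCATING:
        return True
    return v.consequence == "missense" and v.tools_pathogenic >= 2

variants = [
    Variant("GENE_A", "het", 0.00005, "frameshift", 0),   # kept: rare and truncating
    Variant("GENE_B", "het", 0.002, "nonsense", 0),       # dropped: too frequent for a heterozygous variant
    Variant("GENE_C", "hom", 0.002, "missense", 2),       # kept: rare for biallelic, 2/3 tools
    Variant("GENE_D", "hom", 0.002, "missense", 1),       # dropped: only 1/3 tools
]

candidates = [v.gene for v in variants if is_candidate(v)]
print(candidates)
```

keeping the frequency rule and the consequence rule in separate functions makes it easy to change either threshold without touching the other.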
the present work-up consists of the inclusion of further non-cancer genes, validation of variants by sanger sequencing, testing of to determine the prevalence of mutations we performed panel-based germline mutation testing of 10 high and low-moderate penetrance breast cancer susceptibility genes (brca1, brca2, atm, cdh1, chek2, nbn, palb2, rad51c, rad51d and tp53) in 229 consecutive individuals affected with tnbc unselected for age at diagnosis or breast and ovarian cancer family history. age at diagnosis ranged from 23 to 80 years with an average of 50.2 and a median of 48 years. in 60 women (26.2%) we detected a pathogenic mutation, with a higher frequency (31.3%) in the group manifesting cancer before 60 years. deleterious brca1 mutations occurred in 14.8% of tnbc patients, predominantly frameshifting (24/34, 70.6%). the most frequent, both among brca1 mutations and in total, were the founder mutations c.5266dupc and c.2411_2412delag. deleterious brca2 mutations occurred in 5.7% of patients, all but one (c.1813dupa) being unique. while no mutations were found in cdh1 and tp53, 15 mutations (25%) were detected in one of the six other predisposition genes (palb2, chek2, atm, nbn, rad51c, rad51d). no individual presented more than one mutation. almost half of all deleterious mutations (42.5%) were detected in very young women aged 35 years or less. the median age at diagnosis was significantly younger for brca1 (40 years) and brca2 (41.5 years) carriers compared to patients without a mutation (p = 2.746e-05; mann-whitney) or compared to non-brca1/2 mutation carriers (p = 0.022). in contrast, patients with non-brca1/2 mutations were not significantly younger than mutation negative women (p = 0.5288). interestingly, family history had an independent influence on age at diagnosis. taken as a whole, women with family history had a median age at diagnosis 6 years earlier than those without (p = 0.00057).
this difference was lost in mutation carriers while it remained in cases without mutation. in summary, our data confirm and expand previous studies of a high frequency of germline mutations in genes associated with ineffective repair of dna damage by homologous recombination in women with tnbcs. many of these women would go untested with current restrictive criteria. in order that each patient receives therapies tailored to her genetic status, gene panel based mutation testing should be offered to all women diagnosed with tnbc, irrespective of age at diagnosis or family history. p-cancg-037 *** neural retina differentiation of hescs as an in vitro model for retinoblastoma d. kanber, m. hiber, d. lohmann, l. steenpass institute of human genetics, university hospital essen, university duisburg-essen, essen, germany retinoblastoma is the most common eye tumor of early childhood. inactivation of both alleles of the retinoblastoma gene (rb1) results in the development of retinoblastoma. our aim is to establish a human cell-based model for retinoblastoma. using the crispr/cas9 system we have generated human embryonic stem cells (hescs) carrying a mutation either on one or both rb1 alleles. all the detected mutations are located in exon 3 of the rb1 gene and close to the splice donor site of this exon. analyses on dna, rna and protein level were performed for three mutant and one double-mutant clone. the following genotypes were identified by deep sequencing (nm_000321.2(rb1_v001)): clone c2, c.364_380del, heterozygous; clones c7 and g3, c.372_378del, heterozygous; clone g4, c.[372_378del; c.367_368dup] (complex mutation on one allele), homozygous (loss of heterozygosity). the mutations of all four clones result in a premature stop codon in exon 4. on rna level we detected expression of mutant rb1 transcripts reflecting the genotype in all clones and an additional mutant rb1 transcript with skipping of exon 3 in three clones. 
as the heterozygous clones also showed expression of the wildtype rb1 transcript, rb1 protein (prb) could be detected for these clones (c2, c7, g3) by western blot analysis. however, the double-mutant clone g4 showed no expression of prb. so far, we have characterized 3 heterozygous and one homozygous clone. another three double-mutant clones are under investigation. the recurrent germline missense mutation g84e in the hoxb13 gene has been demonstrated to predispose to hereditary prostate cancer (prca), although the underlying pathogenic mechanism is not yet understood. molecular examination of a first set of g84e positive tumors looked for somatic characteristics and suggested that oncogenic ets gene fusions may appear at unusually low frequencies as compared to the general prevalence of ets fusions in prca (22% vs approx. 50%). hypothesizing that hoxb13 could predispose to ets fusion negative prca, we have analyzed 942 cases from three european ancestry populations (finland, germany and us) for the coincidence of hoxb13 g84e and the most common ets fusion, tmprss2:erg, in corresponding tumor samples. while the prevalence of tmprss2:erg fusions was similar among the three study groups (range: 56.5-60.7%), the frequency of g84e genotypes differed markedly between us (1.5%), german (3.6%) and finnish samples (8.3%). despite the expected frequency gradient among study populations, all subsamples showed a strong enrichment of g84e mutation carriers among tmprss2:erg fusion negative cases as compared to fusion positive cases (center adjusted or = 4.96; 95%ci = 2.30-11.9; p = 0.0001). consistent with the previous study, the crude frequency of the tmprss2:erg fusion in hoxb13 g84e carriers was 23.5% (range 16.7%-28.6%). examination of disease characteristics highlighted age at diagnosis to be associated with tmprss2:erg negative status (per year or = 1.04, p = 0.00007) and by trend, also with the presence of the g84e germline variant (per year or = 0.97, p = 0.14).
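a per-year odds ratio can be hard to read on its own. the short sketch below rescales the two per-year values quoted above to a 10-year age difference. it assumes, and this is an assumption not stated in the abstract, that the odds ratios come from a model in which the log-odds change linearly with age, so that the odds ratio for k years is the per-year value raised to the power k. the function name is hypothetical.

```python
def odds_ratio_over_years(or_per_year: float, years: float) -> float:
    """Rescale a per-year odds ratio to a longer interval (log-linear age effect assumed)."""
    return or_per_year ** years

# Per-year values quoted in the text, rescaled to a decade of age at diagnosis.
or_fusion_negative_10y = odds_ratio_over_years(1.04, 10)   # tmprss2:erg negative status
or_g84e_10y = odds_ratio_over_years(0.97, 10)              # presence of the g84e variant

print(round(or_fusion_negative_10y, 2))
print(round(or_g84e_10y, 2))
```

under this assumption, a per-year value of 1.04 corresponds to roughly 1.5-fold higher odds per decade, while 0.97 corresponds to roughly a quarter lower odds per decade.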
within the subtype of tmprss2:erg fusion negative carcinoma, carriers of g84e were diagnosed 3.5 years earlier as compared to non-carriers (61.6 ± 1.4 years versus 65.1 ± 0.4 years, p = 0.017). in conclusion, this study demonstrated a significant tumor subtype specific association for hoxb13 g84e mutation carriers having a higher frequency of tmprss2:erg fusion negative prca. meta-analyses from case control comparisons suggested that subtype specific risk of hoxb13 g84e for tmprss2:erg negative prca could be as high as or = 19.0, as compared to or = 9.9, when prca is regarded as one entity regardless of fusion status. finally, although tmprss2:erg negative prca is usually known to be associated with later ages of diagnoses, hoxb13 mutations may indicate a subgroup of earlier onset cases within the fusion negative entity. relatives to determine the phase of assumed biallelic variants and segregation with the phenotype where applicable. with 1 to 2 cases in 1 million inhabitants per year, adrenocortical cancer (acc) is a rare disease. due to often late diagnosis and limited treatment options, prognosis for patients is poor, with a 5 year overall survival rate of 7 to 35%. though knowledge about molecular genetic events in acc has increased over the last few years, no reliable molecular prognostic factors, no effective targeted cancer therapy and no personalized treatment approach have emerged to date. we therefore intend to establish a reliable method to define a molecular signature of accs that could be used for a prognostic classification of adrenocortical cancers, for planning an individualized therapeutic approach and for the identification of known or potential targetable molecular events in the single patient. in a retrospective study dna from acc and matched blood samples is sequenced to detect somatic single nucleotide variants (snv), small insertions and deletions (indel) and copy number alterations (cnv). sequencing data are then compared to clinical data e. g.
tumor stage, resection status, ki67 index and duration of progression-free and overall survival, to define molecular prognostic factors. target enrichment of 160 genes that are known to be associated with different cancer entities is performed with the human comprehensive cancer panel (qiagen), and the libraries are sequenced on a nextseq500 (illumina). data are analysed with gensearchngs (phenosystems). znrf3, a gene that was described a few years ago as also being involved in the development and progression of acc, is sequenced separately by sanger sequencing and analysed with gensearch (phenosystems). to date, tumor samples and matched blood samples from 43 patients have been analysed. snvs or small indels were found in one or more tumors in 48 of the 160 panel genes and in znrf3. snvs and small indels are most often found in tp53, ctnnb1 and znrf3, with frequencies of 28%, 26% and 19%, respectively. cnvs (duplications and deletions) occur in 37 of the 160 genes. cdk4 is duplicated in over 50% of the cohort. mdm2 gains are found in over 40%. three types of cn patterns can also be distinguished: a quiet type with a low number of copy number alterations, a noisy type with a high number of chromosomal breakages and a chromosomal type with a high frequency of alterations of chromosomal arms. while no correlation between snvs or small indels and clinical outcome could be found so far, the cn patterns of the accs seem to correlate with progression-free survival and overall survival. patients with a noisy cn pattern have shorter progression-free and overall survival than patients with the chromosomal or quiet type. although tendencies in the correlation of molecular markers and prognosis for patients suffering from acc can be recognized, further samples need to be analysed to confirm the results. it is planned to sequence another 60 tumor samples and matched blood samples for this retrospective study and to validate the results in a prospective study with another 100 patients. 
expression of mir-371a-3p and mir-367-3p was analysed in serum samples by quantitative pcr. the cohort of 27 gcnis patients consisted of 11 patients with a solitary testicle, who had undergone orchiectomy for contralateral tgct, and 16 patients with two testicles, one of which harboured gcnis, but with no concurrent tgct. twenty men with non-malignant testicular disease served as controls. additionally, in situ hybridisation (ish) with a probe against mir-371a-3p was performed on four testicular biopsy specimens known to harbour gcnis. sequential step sections of the corresponding tissue blocks were analysed immunohistochemically, using oct4 antibody to visualise gcnis. the median expression value of mir-371a-3p in gcnis patients was 5.2 (interquartile range [iqr] = 35.8), which is significantly higher than the median expression of 0.0 (iqr = 0.0) in controls. both gcnis subgroups had significantly higher mir-371a-3p levels than controls, with a median expression of 18.2 (iqr = 37.3) and 2.7 (iqr = 32.7), respectively. regarding mir-367-3p expression, there were no significant differences between gcnis patients and controls. using a relative quantity of 5 as a cut-off value, mir-371a-3p was able to detect 51.9% (95% confidence interval [95% ci] = 31.9-71.3%) of gcnis, while only 5% (95% ci = 0.1-24.9%) of the controls were positive. in the subgroup with previous tgct, 63.6% (95% ci = 30.8-89.1%) of gcnis could be detected, and in the subgroup without previous tumour the rate was 43.8% (95% ci = 19.8-70.1%). the detection rates for all gcnis and for both subgroups were significantly higher than for the controls. ish staining demonstrated the expression of mir-371a-3p in gcnis cells in two of the four cases. in conclusion, this study indicates a new and minimally invasive way of diagnosing gcnis by measuring serum levels of mir-371a-3p. this approach is supported by the demonstration of mir-371a-3p in gcnis cells by ish staining. 
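the detection rates and confidence intervals quoted above can be checked with a short calculation. the following is a minimal sketch, assuming that the intervals are exact binomial (clopper-pearson) intervals and that the underlying counts are 14/27 (all gcnis), 7/11 (previous tgct), 7/16 (no previous tumour) and 1/20 (controls); neither the method nor the counts are stated in the abstract, they are inferred from the percentages and cohort sizes. the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials.

    The bounds are found by bisection on the binomial tail probabilities,
    so only the standard library is needed.
    """

    def solve(g):
        # g must be increasing in p on [0, 1]; returns p with g(p) ~ 0
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # upper bound: P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else solve(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


if __name__ == "__main__":
    # counts are assumptions inferred from the reported percentages
    groups = {
        "all gcnis": (14, 27),
        "previous tgct": (7, 11),
        "no previous tumour": (7, 16),
        "controls": (1, 20),
    }
    for name, (k, n) in groups.items():
        lo, hi = clopper_pearson(k, n)
        print(f"{name}: {100 * k / n:.1f}% (95% ci {100 * lo:.1f}-{100 * hi:.1f}%)")
```

with small groups such as these, exact intervals are wide, which is consistent with the caution about sensitivity expressed in the conclusion of the abstract.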
however, the sensitivity is still low, and the method therefore needs refinement, possibly by applying a panel of additional micrornas. nonetheless, measuring serum levels of mir-371a-3p may constitute a valuable aid in the clinical assessment of men with high-risk factors for tgct. p-cancg-043 *** the mir-371a-3p is a highly specific and sensitive serum-based marker for the diagnosis and follow-up of testicular germ cell tumours testicular germ cell tumours (tgct) are a paradigm of curable malignancies. clinical management largely relies on measuring serum biomarkers. unfortunately, the markers beta-hcg, afp and ldh are elevated in only about 60% of patients. therefore, micrornas of the clusters mir-371-3 and mir-302/367 were proposed as novel serum-based markers. we evaluated four of the candidate mirnas (mir-371a-3p, mir-372-3p, mir-373-3p and mir-367-3p) with regard to their usefulness as tgct markers. overall, serum samples from 166 tgct patients and from 106 controls were analysed using quantitative pcr. the first 50 consecutive patients and 20 controls were analysed for all four mirnas. after roc analysis, only the marker with the greatest discriminative power was studied further. the decline of mirna expression after orchiectomy was quantified in 134 cases, and in 27 metastasized cases the marker was analysed repeatedly during the course of chemotherapy. additionally, 10 cases with relapsing disease were studied. the mir-371a-3p featured the highest discriminative power (area under the curve: 0.94; 95% confidence interval [95% ci]: 0.874-0.982). in the entire cohort, patients could be distinguished from controls with a sensitivity of 88.7% (95% ci: 82.5-93.3%) and a specificity of 93.4% (95% ci: neuroendocrine tumor of the adrenal gland: an unusual manifestation of tsc c. müller-hofstede, j. horvath, b. dworniczak, p. 
wieacker institute of human genetics, university of münster, germany we report on a young woman asking for the recurrence risk of the neuroendocrine tumor of her mother, who died at the age of 38. her mother presented clinically with therapy-resistant hypertension, dyspnoea, progressive edema in the legs and face and a caput medusae. an mri scan revealed a tumor (11×12 cm) in the right adrenal gland with lymph node metastases compressing the v. cava inferior and synchronous metastases in lung and liver. laboratory examinations showed highly elevated levels of cortisol and adrenocorticotropin (acth). cerebral mri was normal, suggesting ectopic acth secretion by a non-pituitary tumor. histologically, an undifferentiated, largely necrotic tumor was described, so that the neuroendocrine nature of the tumor could not be proven. she died within three weeks after diagnosis. on suspicion of multiple endocrine neoplasia type 1, we initially performed a sequence analysis of men1 on tumor dna by next generation sequencing, without detection of a pathogenic mutation. subsequently, a molecular genetic panel analysis (nf1, ret, sdhb-d, tmem127, tsc1, tsc2, vhl) uncovered the heterozygous mutation c.3379c>t (p.arg1127trp) in the gene tsc2. this mutation has already been described as pathogenic (hu et al. 2014). in the tumor dna, the allele frequency of the normal allele amounted to 10%, whereas the allele frequency of the mutant allele was 90%, pointing to a loss of heterozygosity (loh). the mutation was confirmed by sanger sequencing. taken together, we assumed that the mutation in the gene tsc2 represents a germline mutation. mutations in the tumor suppressor genes tsc1 and tsc2 cause tuberous sclerosis, an autosomal-dominant disorder resulting in hamartomatous tumors in the heart, brain, kidneys, skin and other organs. it is occasionally discussed whether neuroendocrine tumors (nets) represent a characteristic of tsc. 
there are some case reports describing nets in the context of tsc, but mainly in connection with nets of the pancreas (e.g. insulinoma) or the pituitary. to the best of our knowledge, there exists only one case report of a bronchial carcinoid as a result of a germline mutation in tsc1 (dworakowska et al. 2009) and no description of a net of the adrenal gland due to a mutation in tsc1 or tsc2. ngs provides the opportunity of wide-spread testing, even post-mortem, in order to obtain clarification for the descendants. although in our case we could not distinguish whether the mutation detected represents a germline mutation or a somatic mutation, we were able to offer predictive testing to the daughter and other family members. we report on a rare case of net of the adrenal gland associated with a mutation in the gene tsc2. this case illustrates that in the differential diagnosis of nets, tsc genes should also be considered. germ cell neoplasia in situ (gcnis) is the precursor lesion of testicular germ cell tumours (tgct). if detected clinically, this lesion may herald a pending tgct. unfortunately, the only way of diagnosing gcnis is by testicular biopsy and subsequent immunohistochemical examination. therefore, non-invasive methods of diagnosis are required. mirnas of the mir-371-3 and mir-302/367 clusters have been suggested as serum biomarkers of full-blown tgcts. we aimed to explore the utility of these mirnas for the detection of the pre-invasive stage of tgcts and examined the expression of two mirnas in serum samples of 27 gcnis patients. psmc3ip, located on chromosome 17q21, is a putative tumor suppressor gene that encodes the nuclear psmc3 interacting protein. the protein functions as a coactivator of steroid hormone-mediated gene expression and is important for rad51- and dmc1-mediated homologous recombination during dna repair of double-strand breaks. 
recently, germline variants in psmc3ip, also known as gt198, tbip, and hop2, have been identified with low frequency in early onset familial breast and ovarian cancer (hboc) patients and in a patient with apparently sporadic early onset breast cancer. somatic variants in psmc3ip are frequently observed in breast, ovarian, and fallopian tube cancers. in this study, we analyzed a cohort of 166 brca1/2 mutation-negative hboc (n = 158) or early onset sporadic breast cancer patients (n = 8) for variants in psmc3ip. we identified seven different heterozygous variants in 8 out of 166 index patients: c.-115g>a (rs191843707); c.-70t>a (rs752276800); c.-37a>t (rs199620968); c.-24c>g (rs200359709); c.519g>a p.(trp173*); c.537+51g>c (rs375509656); c.*24g>a. these variants were either not listed or present at very low frequency (<1%) in the exac database. carriers of psmc3ip germline variants were mostly (6/8) affected by early onset breast cancer (median age of onset 36 years). for three of the seven different variants (c.-115g>a, c.519g>a, and c.*24g>a), a possible impact on psmc3ip expression or function was observed. the stop mutation c.519g>a p.(trp173*) was found in two sisters, who were both diagnosed with unilateral breast cancer at age 33. the premature stop codon is located within the dna-binding domain of psmc3ip and is predicted to induce nonsense-mediated mrna decay (nmd). remarkably, c.-115g>a has already been described in familial breast and ovarian cancer, and was found once in this study in a woman who developed unilateral breast cancer at the age of 33 years. the variant c.-115g>a (rs191843707) was shown to induce a slight, albeit significant, decrease of reporter gene expression. the c.*24g>a variant was identified in a woman diagnosed with unilateral breast cancer at the age of 36 years. luciferase reporter assays indicated an impairing effect of c.*24g>a on microrna binding. germline variants in psmc3ip are present in breast and ovarian cancer families. 
whether mutated psmc3ip is a new risk factor for early onset breast/ovarian cancer in families with hboc and/or apparently sporadic early onset breast cancer remains to be shown. 86.9-97.3%) with this marker. in patients without metastases the mir-371a-3p expression declined significantly after surgery. in metastasized cases the levels dropped sharply after chemotherapy. all of the 10 relapses had elevated mir-levels, and expression decreased upon chemotherapy. mir-371a-3p has significantly higher sensitivity than each one of the classical tgct markers and than a combined panel of beta-hcg, afp and ldh (87.8% vs 50.4%). in non-metastasised seminoma the mir-371a-3p expression depended significantly on tumour size. mir-371a-3p is highly sensitive and specific for tgct. it correlates with the stage of disease and with treatment effects and it therefore fulfils the prerequisites of a valuable serum-based biomarker. the significant association with tumour bulk in localised disease provides evidence for the tgct being the primary source of mir expression. the sensitivity of mir-371a-3p surpasses that of classical tgct markers by far, and thus it may become the new gold standard for serum diagnostics of tgct in the coming years. co-occurrence of radioulnar synostosis and amegakaryocytic thrombocytopenia (rusat) was initially described as an inherited thrombocytopenia syndrome that is caused by a mutation in hoxa11. in three simplex patients, de novo missense mutations in mecom were reported as an alternative origin of the disease (rusat2). mecom, identified as a common ecotropic viral integration site 1 (evi1) in murine myeloid leukemia, is known as a key transcriptional regulator in hematopoiesis and sporadic myeloid leukemia. we report here on a novel mecom mutation cys766gly (uniprot q03112-1) identified by whole exome sequencing in a family with rusat, hearing impairment, hand dysmorphisms, and patellar hypoplasia in four patients spanning three generations. 
notably, two of the four affected individuals in our family developed a myeloid malignancy. the novel mecom missense mutation cys766gly affects a highly conserved cysteine residue in c2h2 zinc finger motif 9 in the c-terminal zinc finger domain of mecom. this residue is crucial for the tetrahedral coordination of a zinc ion stabilizing the zinc finger conformation and is thus essential for dna binding of the c-terminal zinc finger domain. our findings reconfirm the causality of mecom mutations and indicate that mecom mutations also need to be considered in familial rusat patients. in addition, we report for the first time that mecom germline mutations targeting the c-terminal zinc finger domain are associated with an increased risk for myeloid malignancies. this extends the rusat2-associated phenotype and suggests that mecom germline mutations can cause a genetic predisposition to myeloid malignancy. (z., b. and s., d. contributed equally to this work) many genes that harbor rare mutations entailing a medium or high risk for breast cancer (bc) are involved in dna double-strand break repair and have been identified by linkage analysis or by sequencing of candidate genes in bc families. in addition, a considerable number of common but low-risk germline variants have been found in genome-wide association studies. however, these predisposing factors so far explain only a fraction of bc cases. with the intention of identifying low-frequency variants conferring an intermediate bc risk, we performed an association study using a candidate gene approach and testing dna repair capacity with the micronucleus test (mnt) as an additional second phenotype. rs3810813 in the slx4/fancp gene showed an association with bc that was pronounced in younger cases and was confirmed in a verification cohort (combined analysis of 1,448 cases and 2,593 controls: or = 2.19 (1.45-3.30), p = 0.00012 for cases ≤40 years). genotyping additional snps and imputation revealed a specific european haplotype of ca. 
150kb length that spans slx4 and adjacent genes. it is tagged by 6 the observed mutual dependence of the two phenotypes allowed a considerably improved interpretation of the results: (i) the unknown causal variant on the haplotype can be assumed to be present mostly in cases, indicating a rare variant with a rather strong effect; (ii) using this information on the two phenotypes in the association between the mnt results and bc improved considerably the identification of the specific risk among cases (<60 years) who carried the haplotype. roc curves for bc depending on mnt results revealed that the stratification on carriers of the haplotype increased the auc from 0.65 (p = 0.0007) to 0.94 (p = 0.0001). both associations can be best explained by a risk variant carried by a fraction of the haplotypes that is enriched in early onset bc cases. slx4 is the only gene in the tagged region which can be functionally related to both associated phenotypes, while for the other genes no connection to bc or dna repair is reported. inherited dna repair mutations: are they modifiers of brca1 and brca2 penetrance and age at onset of hereditary breast and ovarian cancer? background: inherited mutations in brca1 and brca2 are the most common causes of hereditary breast and ovarian cancer (hboc). the risk of developing breast cancer by age 70 in women carrying a brca1 mutation is 57-65% and 45-55% in brca2 carriers. however, mutations in brca1 and brca2 only explain about 25% of all hboc cases. the lifetime risk varies between families and even within affected individuals of the same family. the cause of this variability is unknown but it is hypothesized that additional mutations or rare variants in genes that are possibly interacting with brca1/2 in different dna-repair pathways contribute to this phenomenon. 
methods: we obtained samples of 181 patients positive for brca1 or brca2 mutations and with an age of onset (aoo) of breast cancer below 35 or above 60 years of age from the german consortium for hereditary breast and ovarian cancer. panel sequencing was done to screen germline dna for mutations in 311 genes involved in different dna-repair pathways. variants were classified into five classes according to a modified version of plon et al (2008). only truncating mutations and known pathogenic missense mutations were considered pathogenic or likely pathogenic. results: the patient group with an early aoo (93 women) had developed breast cancer at a mean age of 26.4 years (± 2.06), and the control group (88 women) had developed breast cancer at a mean age of 69.5 years (± 7.05). a total of 4,293 variants were detected in all patients, and 54 of these (1.3%; 95% ci, 0.96%-1.64%) were presumed to be deleterious. mutations were found in 45 genes other than brca1 and brca2. mutations were mainly found in single-strand break repair (ssbr, 29%), double-strand break repair (dsbr, 26%) and checkpoint factors (13%). the rest were found in genes with other functions such as brca1/2 interactors, centrosome formation, and signal transduction. the putative mutations were found in 26 women of the control group (29.5%; 95% ci, 20.3%-40.2%) compared to 36 women of the patient group (38.7%; 95% ci, 28.8%-49.4%). the incidence of germline mutations in dna-repair genes did not differ according to the age of onset (p = 0.2). the prevalence of additional germline mutations in dsbr genes in patients (13%) was not significantly different from the prevalence of dsbr mutations in controls (10.2%; p = 0.6). conclusions: the preliminary results failed to show a difference in mutation load between the two cohorts of brca1/2 carriers sorted by age of onset. larger studies are needed and may provide further insight into the role of mutation load in the age of onset of hboc in brca1/2 carriers. 
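the comparison of mutation carriers between the two age-of-onset groups (36 of 93 versus 26 of 88 women) can be illustrated with a standard 2×2 test. the following is a minimal sketch, assuming an uncorrected pearson chi-square test; the abstract does not state which test was used, so this choice and the function name are assumptions made for illustration only.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table

        [[a, b],
         [c, d]]

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # survival function of the chi-square distribution with 1 df
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # rows: early-onset group (93 women), late-onset group (88 women)
    # columns: with additional putative mutation, without
    chi2, p = chi2_2x2(36, 93 - 36, 26, 88 - 26)
    print(f"chi-square = {chi2:.2f}, p = {p:.2f}")
```

under these assumptions the p-value is of the same order as the p = 0.2 reported above; a different test (for example fisher's exact test) would give a somewhat different value.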
objective: glioblastoma stem-like cells (gscs) carry stem cell features and therefore seem to be responsible for tumor initiation, maintenance and recurrence as well as for resistance to chemo- and/or radiotherapy. the current knowledge on genetic and transcriptomic characteristics of these cells, especially in comparison to glioblastoma tissue, is still limited. the aim of this study is to compare the genetic and genomic profiles of glioblastoma tissue and gscs, thereby identifying differences in the genes involved and pathways affected at the dna level as well as at the gene expression level. material and methods: peripheral blood and tumor tissue were obtained from patients with glioblastoma. tumor tissue-derived explant cell culture and serum-free culture were established. using multi-parameter magnetic-activated cell sorting (macs), cd15- and cd133-labeled cell subpopulations of gscs could be isolated. the tumor tissue, serum-free culture and isolated cell subpopulations as well as blood were analyzed by snp array and gene expression analysis (excluding blood) in a paired design. for preliminary characterization of gscs in the serum-free culture, we confirmed the stem cell features of gscs by the expression of nestin, sox2, and cd133 (applying immunofluorescence staining). results: our snp array analyses showed genetic aberrations in all analyzed cellular entities (tumor tissue and cell subpopulations), e.g. gain of chromosome 7, loss of 10q23.31, loss of 10q11.1->q26.3, and complete loss of chromosome 10. furthermore, distinct genetic differences between the cell subpopulations and tumor tissue were observed (e.g. loss of chromosome 4 and segmental uniparental disomy of 9p24.3->p21.3, only in cancer stem-like cell subpopulations). in addition, we detected many possible candidate cancer genes and pathways which may have an influence on tumorigenesis. 
gene expression analyses revealed the strongest differences between fresh tumor tissue and cells from serum-free culture, where more than a third of the investigated genes were affected. when contrasting fresh tumor tissue with stem cell marker-positive serum-free cultured cells, 1,106 genes were upregulated in the stem cell marker-positive cells, whereas 1,533 genes were upregulated in fresh tumor tissue. among these genes, the most strongly enriched pathways in stem cell marker-positive cells included positive regulation of the cell cycle and cancer-related pathways, whereas in fresh tumor tissue predominantly immune-related pathways were found, e.g. myeloid leukocyte activation, inflammatory response and phagocytosis. conclusion: differences between gscs and tumor tissue were detected using snp array and gene expression analyses. our results may help to obtain more information about the molecular pathomechanisms of glioblastoma. further genetic and genomic investigations comparing gscs and glioblastoma tissue are needed to identify novel potential targets for therapy development. background: fanconi anemia (fa) is a rare inherited chromosomal instability syndrome associated with bone marrow failure as well as myelodysplastic syndrome (mds) and acute myeloid leukemia (aml). one-third of fa individuals exhibit bone marrow cytogenetic clones, notably gains of 1q and 3q and/or loss of 7/7q, and 15% to 30% of fa patients develop mds/aml. in recent years, the application of high-throughput technologies has revealed recurrent somatic mutations in genes implicated in myeloid malignancies. as knowledge of additional genetic lesions facilitating mds/aml development in fa is lacking, we aimed to elucidate whether these mutations would be present in fa patients with mds/aml. 
methods: using the illumina trusight myeloid sequencing panel (san diego, ca), we performed next-generation sequencing (ngs) on dna extracted from bone marrow specimens from 17 fa mds/aml patients registered in the european working group of childhood mds. the sequencing panel targeted 54 genes frequently mutated in hematologic neoplasms. results: ten of the 16 (62.5%) evaluable patients had 18 lesions (1 to 3 mutations per patient; 14 missense, 1 nonsense, 2 insertion and 1 duplication) in 13 genes. the presence of a somatic mutation did not appear to correlate with complex karyotype or −7/7q. all affected genes were mutated in isolation, with the exception of runx1 and kras. while 13 of the mutations were pathogenic, 5 were variants of unknown significance. genes involved in epigenetics (dna methylation, chromatin maintenance and cohesin complex; idh2, tet2, dnmt3a, idh1, ezh2, rad21 and asxl1) and transcription factor genes (runx1, ikzf1 and etv6) were the most frequently affected. these were followed by genes encoding signaling molecules, including the ras pathway (kras and ptpn11). altered runx1 was the most common lesion and occurred in individuals with aml, raeb or raeb-t. one patient with refractory anemia with ring sideroblasts (rars) had a mutation in the spliceosomal gene sf3b1. conclusions: while the most common mutations encountered in sporadic cases of mds are in genes involved in rna splicing and epigenetics, these two broad categories of genes appeared to have less influence in our fa patients. most mutations were nonrecurring, suggesting that there is no specific mutation pattern of these genes in fa-related mds/aml. however, runx1 mutations and mutations in genes of the ras pathway appear to play a pathogenic role in fa mds/aml development. 
taken together, the data suggests that mutations in genes that cause clonal hematopoiesis in the population at large do not contribute significantly to fa hematopoietic clonal disease; however, particularly acquisition of runx1 and ras pathway alterations promote malignant myeloid disease progression. more extensive studies analyzing more patients are necessary to further define the secondary hits leading to fa myeloid disease. chromatin remodeling is a complex process shaping the nucleosome landscape, thereby regulating the accessibility of transcription factors to regulatory regions of target genes and ultimately managing gene expression. the swi/snf (switch/sucrose nonfermentable) complex remodels the nucleosome landscape in an atp-dependent manner and is divided into the two major subclasses brahma-associated factor (baf) and polybromo brahma-associated factor (pbaf) complex. somatic mutations in subunits of the swi/snf complex have been associated with different cancers, while germline mutations have been associated with autism spectrum disorder and the neurodevelopmental disorders coffin-siris (css) and nicolaides-baraitser syndromes (ncbrs). css is characterized by intellectual disability (id), coarsening of the face and hypoplasia or absence of the fifth finger-and/or toenails. so far, variants in five of the swi/snf subunit-encoding genes arid1b, smarca4, smarcb1, arid1a and smarce1 as well as variants in the transcription factor-encoding gene sox11 have been identified in css-affected individuals. arid2 is a member of the pbaf subcomplex, which until recently had not been linked to any neurodevelopmental phenotypes. in 2015, mutations in the arid2 gene were associated with intellectual disability. in this study, we report on two individuals with private de novo arid2 frameshift mutations. both individuals present with css including id, coarsening of facial features, other recognizable facial dysmorphisms and hypoplasia of the fifth toenails. 
hence, this study identifies mutations in the arid2 gene as a novel and rare cause for css and enlarges the list of css-associated genes. the ubiquitin pathway is an enzymatic cascade including activating e1, conjugating e2, and ligating e3 enzymes, which governs protein degradation and sorting. it is crucial for many physiological processes. compromised function of members of the ubiquitin pathway leads to a wide range of human diseases, such as cancer, neurodegenerative diseases, and neurodevelopmental disorders. mutations in the thyroid hormone receptor interactor 12 (trip12) gene (omim 604506), which encodes an e3 ligase in the ubiquitin pathway, have been associated with autism spectrum disorder (asd). in addition to autistic features, trip12 mutation carriers showed intellectual disability (id). more recently, trip12 was postulated as a novel candidate gene for intellectual disability in a meta-analysis of published id cohorts. however, detailed clinical information characterizing the phenotype of these individuals was not provided. in this study, we present seven novel individuals with private trip12 mutations including two splice site mutations, one nonsense mutation, three missense mutations, and one translocation case with a breakpoint in intron 1 of the trip12 gene and clinically review four previously published cases. the trip12 mutation-positive individuals presented with mild to moderate id (10/11) or learning disability [intelligence quotient (iq) 76 in one individual], asd (8/11) and some of them with unspecific craniofacial dysmorphism and other anomalies. in this study, we provide detailed clinical information of eleven trip12 mutation-positive individuals and thereby due to its heterogeneous etiology, primordial growth retardation is often a challenge for geneticists and clinicians in respect of diagnosis, therapy and prognosis. thus, pinpointing its genetic origin is required for a personalized treatment and prognosis. 
one syndrome mainly characterized by intrauterine and postnatal growth retardation is silver-russell syndrome (srs), a clinically and molecularly heterogeneous disorder with considerable overlap with other syndromes. the diagnosis can be confirmed molecularly in only 60% of patients with the characteristic srs phenotype, while 40% of cases remain without a molecular diagnosis. in fact, in clinically less well characterized patients referred for diagnostic testing, the detection rate is less than 20%. however, systematic investigations of the contribution of mutations in genes which may be considered in the differential diagnoses of srs are still missing. we examined 60 patients referred for molecular testing of srs but without molecular alterations associated with srs. a targeted ngs approach comprising 26 genes implicated in the differential diagnoses of srs or suggested as srs candidate genes was performed. in 5 patients fulfilling the criteria of srs according to our recently developed clinical scoring system, disease-causing variants were found. these patients carried mutations in genes associated with bloom syndrome, mulibrey nanism, kbg syndrome, short syndrome or igf1r-associated short stature, respectively. indeed, some of the differential diagnoses detected in our cohort have a major impact on clinical management, including cancer screening because of a high risk for tumor development. furthermore, we did not identify any pathogenic mutation in any of the proposed srs candidate genes (e.g. mest, grb10, copg2), thus raising the question whether these genes are indeed involved in the etiology of srs. we show that a (targeted) ngs approach is an important tool to identify the genetic cause in patients with unexplained growth retardation. furthermore, our data show that (positive) clinical scoring in srs should not impede the consideration of differential diagnoses and other molecular causes. 
submicroscopic deletions of chromosome band 2p25.3 have been reported in more than 20 patients. common clinical features include intellectual disability/developmental delay, central obesity and behavioural difficulties. myt1l became the main candidate gene for id and obesity since it is deleted or disrupted in all published patients. however, reports of deletions affecting only this gene and even more so of deleterious myt1l sequence variants are very rare. to our knowledge, until now only two patients with de novo myt1l point mutations have been reported. in the present study, we analysed a cohort of individuals with intellectual disability of unknown aetiology and their unaffected parents by whole exome sequencing. we identified de novo myt1l sequence variants in two out of 311 patients. patient 1 carried a nonsense mutation (c.1531g>t, nm_015025.2; gly511*) whereas patient 2 carried a direct splice site mutation (c.2769-2a>g). according to prediction algorithms, both detected myt1l variants are deleterious (patient 1: sift score 0, cadd score 42; patient 2: cadd score 24.6). in addition, patient 2 carried a de novo splice site variant in setd1b. however, this variant is predicted to be benign (cadd score 2.5) as well as a known snv (rs749218728, maf 0.0000323). a comprehensive clinical characterisation of the two patients yielded only mild or moderate intellectual disability, behavioural problems and muscular hypotonia as common clinical signs. surprisingly, obesity was only present in patient 2. postnatal tall stature and transient microcephaly were present in one patient each. this clinical picture is compared to the published phenotypes of patients with myt1l point mutations, patients with microdeletions of only myt1l and patients with larger 2p25.3 deletions. with the reduced penetrance regarding obesity, the clinical picture of patients with myt1l mutations is becoming more and more unspecific. 
the retina and anterior neural fold homeobox gene (rax) controls the embryonic eye development and is involved in human autosomal-recessive microphthalmia. so far only a few compound heterozygous mutations in rax have been described in microphthalmia patients. we report a first case of microphthalmia caused by a novel homozygous mutation in rax. the 8-month-old patient was born to consanguineous parents and presented with extreme microphthalmia, panhypopituitarism and developmental delay. mri of the brain showed bilateral agenesis of the anterior visual pathway and tractus opticus. ocular ultrasound confirmed bilateral anophthalmia. additionally, dysgenesis of the corpus callosum and an abnormal pituitary gland have been detected. the first child of these parents, who died shortly after birth, had also been diagnosed with bilateral anophthalmia. using panel diagnosis of the disease associated genome, we identified the homozygous pathogenic variant c.112del, (p. 138fs) in rax. we performed a segregation analysis and confirmed that both parents are heterozygous for this variant. so far developmental delay and panhypo-expand the clinical spectrum of the trip12 gene in non-syndromic intellectual disability with or without asd. background: whole exome sequencing (wes) using next generation sequencing has proven to be a powerful tool in determining the underlying genetic cause of rare disorders. here, we show, that clinical follow-up and diagnostic re-evaluation can be crucial for uncovering further disease-causing mutations. clinical report and genetic findings: we report follow-up data of a previously published consanguineous family with two children, a boy and a girl, suffering from severe encephalopathy, hypotonia, microcephaly and retinal dystrophy. wes had shown a homozygous intronic splice variant in pgap1 (c.1090-2a>g;p.?) causative for the symptoms. both parents were heterozygous carrier for the pgap1 variant (granzow, paramasivam et al, mol cell probes 2015) . 
in the next pregnancy, the unborn child presented hydrops fetalis, omphalocele, short tubular bones and cystic kidneys. chorionic villus sampling showed the fetus to be homozygous for the pgap1 variant. however, neither of these symptoms fit with a pgap1-associated disorder. additional wes of fetal dna and re-evaluation in the family showed a homozygous nonsense variant in ift140 (c.g3577t;p.e1193*) consistent with a diagnosis of mainzer-saldino syndrome (mss) which is characterised by the association of renal disease, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia, as a second diagnosis in the fetus. again, both parents were shown to be a heterozygous carrier for the ift140 variant. yet, as omphalocele was not accounted for by any of the identified conditions, a third genetic cause cannot entirely be excluded. alternatively, omphalocele may be a rare manifestation of mss, or be the result of a combination of both disorders. the couple opted for induced abortion. discussion: it is estimated that an individual carries multiple heterozygous variants for autosomal recessive disorders in his or her genome. especially in consanguineous families, this results in an elevated risk for children with more than one disorder. in recent publications of clinical exomes, double diagnoses have been reported in 0 to 12% of investigated subjects. thus, the possibility of more than one causative gene should be carefully explored when working with wes and re-evaluation in case of additional clinical symptoms within a family should be considered. also, follow-up of families with rare genetic disorders may lead the clinical geneticist beyond the assumed single cause to multiple single gene disorders in the same family. 
conclusion: using wes, we have identified two independent single gene disorders in a consanguineous family, demonstrating that clinical follow-up and diagnostic re-evaluation can be crucial for uncovering multiple disease-causing mutations in one family. we present a case study using molecular cytogenetic approaches on a 3-year-old boy presenting with microcephaly-brachycephaly, macroglossia and absent speech development. the boy is the first child of healthy, consanguineous parents of pakistani origin. following an uncomplicated pregnancy, the hypotrophic newborn was delivered at 37 weeks weighing 2610 g. in the third month of life, the baby had viral meningitis. regular pediatric follow-up revealed psychomotor delay with hypotonia. creatine kinase, lactate and fibroblast growth factor 21 measured in serum were high. subsequent investigations at the age of 16 months included brain mri, electroencephalogram and muscle biopsy that gave hints of a mitochondriopathy or potential neuropathy of axonal type. due to the suspicion of a complex mitochondriopathy, whole exome sequencing was performed using a sureselect human all exon kit (agilent, 50mb v5) on a hiseq 2500 (illumina). the analysis revealed a heterozygous microdeletion of 121 kb on chromosome 9q34.3, which was classified as an unclear variant (uv); the ehmt1 gene was not affected. re-examination of the proband's dna using array cgh detected a larger 180 kb heterozygous deletion in the 9q34.3 region (arr[hg19] 9q34.3(140,395,510-140,575,736)x1), encompassing exon 1 of ehmt1 (euchromatin histone methyltransferase 1; transcript nm_024757.4). haploinsufficiency of this gene results in kleefstra syndrome (omim 610253), a multisystem disorder due to either microdeletions in 9q34.3 encompassing ehmt1 or intragenic point mutations. mlpa analysis (ehmt1 mlpa kit p340) of parental dna further indicated a de novo origin of the deletion in our proband. 
a similar deletion has been described previously in a case presenting with clinical features of kleefstra syndrome [2], underlining the importance of including the 5' part of ehmt1 in sequencing as well as cnv screening. in summary, our study clearly shows that array cgh is a valuable complementary approach to ngs, especially for regions poorly covered by ngs, i.e. exon 1 of many transcripts. we report on a three-generation family with variable manifestations including delayed and incomplete tooth eruption, early tooth loss due to short dental roots, acroosteolysis, osteoporosis, tendon ruptures, joint hypermobility, muscle weakness, glaucoma, neurological features, and psoriasis. after detection of elevated cd169/siglec1 expression on monocytes and an upregulation of interferon-stimulated gene transcripts, singleton-merten syndrome was diagnosed. the novel heterozygous mutation c.992c>g (p.thr331arg) in ifih1 was found in three affected family members. singleton-merten syndrome is a very rare autosomal dominant interferonopathy, so far described in not more than four families. until now, only two different gain-of-function mutations in ifih1 have been detected. mutations in ifih1 are also associated with aicardi-goutières syndrome, and recently features of both conditions were found in the same family. our findings expand the mutational spectrum of singleton-merten syndrome and demonstrate the high intrafamilial variability associated with mutations in ifih1. pituitarism have not been described in association with rax mutations. therefore we conducted array comparative genomic hybridization and karyotyping in the index patient. both tests gave normal results. prenatal diagnosis by chorionic villus sampling in the next pregnancy excluded a homozygous carrier status for this rax mutation. mutations in the phf6 gene are associated with borjeson-forssman-lehmann syndrome (bfls), an x-linked intellectual disability disorder affecting mainly males. 
female carriers usually show no or mild clinical signs. however, recent studies described females with de novo phf6 gene defects (mutations, deletions) and severe phenotypes resembling coffin-siris syndrome [1, 2] . here, we report on a girl with a maternally inherited phf6 mutation and a phenotype resembling those described previously in affected females. the mother had learning difficulties and mild dysmorphological features (hypertelorism, prominent forehead). when seen at age 12 months, the proposita showed muscular hypotonia, was unable to sit and had limited head control (developmental delay 6 months). dysmorphic features included scaphocephaly, hypertelorism, a small flat nose with anteverted nares, low set, prominent ears, a high, narrow palate, absent labia minora and linear skin pigmentation on the thighs. ophthalmologic investigation identified strabism convergens of the left eye, hyperopia and an excavated papilla with a pale optical nerve. ultrasound showed patent foramen ovale, tricuspid insufficiency and a unilateral incomplete duplication of the renal pelvis. karyotyping performed elsewhere was normal (46,xx). results of a genome-wide snp array analysis (affymetrix cytoscan hd) were also normal. using a targeted ngs approach for syndromic and non-syndromic developmental delay encompassing over 1200 brain related genes (mpimg-1-test), we identified a heterozygous lof-mutation c.88c>t (p.gln30*) in the phf6 gene (encoding phd finger protein 6). in addition, a human androgen receptor (humara) assay using blood dna showed a highly skewed x-inactivation (91:9). segregation analysis indicated a maternal origin of the variant. the mother also had skewed x-inactivation in blood. her husband and other daughter tested normal for the c.88c>t variant. here, we describe the first female patient with a maternally inherited phf6 mutation and a severe phenotype. 
the mild phenotype in the mother might be due to different patterns of x-inactivation or to (undetected) mosaicism. this report represents the 12th case of a severely affected female patient with a phf6 gene defect. findings support the assumption [1] that the female phenotypes of bfls might be more common than previously estimated. toms typical for ada2 deficiency such as fever, livedo racemosa, abdominal colics, arthralgias, and raynaud's phenomenon were observed months later. cecr1 sequencing (nm_001282225.1) revealed two previously described pathogenic missense mutations: c.140g>c, p.(gly47ala) and c.506g>a, p.(arg169gln). compound heterozygosity was confirmed by parental analysis. to the best of our knowledge, this combination of mutations has not been described until now. the p.(arg169gln) is considered as founder mutation in the dutch population, but first phenotype-genotype analyses did not allow further prediction of clinical outcomes. ada2 deficiency should be considered in patients with childhood stroke despite the absence of systemic inflammation and cerebral vasculitis. congenital hyperinsulinism (chi) has been described as heterogeneous entity caused by at least 9 different genes. in 1991, tein et al. first described a defective activity of l-3-hydroxyacyl-coa dehydrogenase in a 16-year old patient with hypoketotic hypoglycemic encephalopathy. biochemical markers of l-3-hydroxyacyl-coa dehydrogenase deficiency (schad) are high 3-oh-glutarate excretion in urine and c4-oh-carnitine in plasma. the clinical presentation is very heterogeneous with regard to age of onset, severity of symptoms as well as response to medical treatment and leucine-sensitivity. in some patients even a near total pancreatectomy was performed. schad dependent hyperinsulinism (hhf4) is a rare autosomal recessive disorder that is caused by mutations in the gene hadh. 
here we describe 4 patients from 3 unrelated families out of a cohort of 136 chi patients mainly from central europe. patients 1 and 2 are siblings from unrelated parents. the older brother manifested with hypoglycemic convulsions at the age of 5 weeks. a subtotal pancreatectomy was performed in an outside academic hospital. in further course he developed epilepsy and has been treated with diazoxide and anticonvulsants. the 2nd child was born with hypoketotic hypoglycemia and chi was diagnosed in first days of life. diazoxide treatment stabilized blood glucose and both children were referred to our pediatric endocrinology at the age of 8 years and 10 months, respectively. mutational analysis revealed the homozygous variant c.547-3c>g within the region of the splice acceptor site in intron 4 of the hadh gene in both affected children. this change is neither registered in exac nor described in the mutation databases hgmd or in the literature and was predicted to disrupt proper splicing. we then completed mutational analysis in unidentified patients of our chi cohort with diazoxide responsiveness and known or suspected consanguinity. patient 3 was born to consanguineous parents and chi manifested in the girl at neonatal age with hypoketotic hypoglycemia. she was successfully treated with diaxozide. later, she developed convulsions and statomotoric developmental delay. the homozygous splice mutation c.636 + 471g>t in intron 5 of hadh was detected in the child. the parents were identified as heterozygous carriers. in patient 4, a girl born to consanguineous turkish parents, chi manifested at the age of 6 months with hypoglycemic seizures. she responded well to diazoxide treatment. a homozygous missense mutation (c.406a>g; p.lys136glu) in exon 3 of hadh was detected in the patient and her parents were heterozygous carriers. 
hadh mutations in case 3 and 4 have been previously described in probands of turkish descent and appear to be founder mutations in the turkish population. in conclusion, we recommend hadh mutation analysis to be considered in chi children with unknown cause and known consanguineous pedigrees or originating from populations with higher prevalence of consanguinity. homozygous or compound heterozygous mutations in cecr1 (cat eye syndrome chromosome region, candidate 1) have recently been identified to causing deficiency of adenosine deaminase 2 (ada2; dada2) with childhood polyarteritis nodosa (pan) (omim # 615688). this inflammatory vasculitis affects the skin, and inner organs (predominantly kidneys and gastrointestinal tract) and also shows a high risk of ischemic stroke, brain hemorrhage as well as peripheral neuropathy. using whole-exome sequencing it was also found that the six adult patients (aged 20-48) described by sneddon in 1965 (sneddon syndrome, omim # 182410) likewise carried compound heterozygous cecr1 mutations. sneddon syndrome is characterized by a combination of dermatologic features (livedo racemosa) and ischemic brain infarctions. recently, clinical and genetic data of more than sixty ada2 patients have been reviewed and underlined the wide clinical variability in age at onset, clinical findings, outcome of neurological involvement, and additional hematological symptoms. typically, stroke has been reported to follow systemic inflammatory disease and predominantly affects posterior and central brain areas. here we describe one of the rare patients in whom acute mesencephalic stroke preceded systemic inflammation and presented as initial clinical symptom. symptriploidy is a recurrent finding in prenatal diagnostics. in a small number of individuals, correction of triploidy has been suggested based on the finding of (mosaic) genome-wide uniparental disomy (upd). 
we here investigated uncultured and cultured amniotic cells (ac) and placental tissue from a fetus, in which ultrasound examination in the 17 + 0th week of gestation revealed growth retardation, left diaphragmatic hernia with parts of stomach and bowel localized in the chest, dextrocardia, short nasal bone and single umbilical artery. these findings were confirmed at the 18 + 1th week when the pregnancy was terminated. the pregnancy was conceived spontaneously by a 28-year-old mother and a 31-year-old father, both healthy and with uneventful family history. the parents were non-consanguineous and carry a normal karyotype. microsatellite analyses of uncultured ac obtained at initial presentation showed for chromosomes 13, 18 and 21 a pattern suggesting triploidy with only biallelic presentation. while y-chromosomal sequences were lacking, the x chromosome unexpectedly showed a rather disomic pattern. metaphase yield on cultured ac was low but showed a mosaic karyotype 48,xxx-,+10[20]/47,xxx[17], which was confirmed for several chromosomes by interphase fish. remarkably, a triploid clone was not detected cytogenetically. thus, we performed further analyses using microsatellite markers, oncoscan technology and fish. in uncultured ac, these studies revealed a pattern suggestive of triploidy, with the supernumerary chromosomal complement derived from a maternal isodisomy, with the notable exception of the x chromosome. in cultured ac and placental tissue, a diploid pattern was observed for all chromosomes except x and 10, with alleles from both parents identical to those in the uncultured ac. trisomy x was confirmed in both tissues, with the supernumerary chromosome x being of paternal isodisomic origin. the trisomy 10 was seen only in cultured ac and likely represents a pseudomosaic, which nevertheless could not be proven due to insufficient yield of mitoses from the parallel cultures. 
finally, retrospective interphase fish on remnant uncultured ac showed two diploid clones, one disomic (approximately 40% of nuclei) and one trisomic (60%) for the x chromosome. the most likely explanation for the findings is a mosaicism for one diploid clone with genome-wide maternal isodisomy and a second diploid but bi-parental cell line with paternal trisomy x. given the identity of the (maternal) alleles in both clones our findings suggest that originally a triploid clone due to a maternal division error/inclusion of a polar body ii existed which underwent (erroneous) triploidy rescue resulting in one diploid biparental clone and one haploid clone of maternal origin that underwent haploid rescue resulting in genome-wide maternal isodisomy. the biparental clone with trisomy x either resulted from a sperm with two x-chromosomes or an erroneous x-duplication during trisomy rescue. since the introduction of non-invasive prenatal diagnosis (nipd, e.g. harmony® test) in 2013, this test is frequently demanded and routinely applied in prenatal centers and medical practices. mostly, nipd is intended to detect autosomal trisomies (13, 18, 21), but also offers the possibility to analyze sex chromosomes. therefore, also sex chromosome aneuploidies (sca) (e.g. monosomy x (turner syndrome), triple x, xxy (klinefelter syndrome), xyy) are incidentally found. so far, in our prenatal center sca were detected in 7 pregnancies by harmony® test, consisting of three pregnancies with monosomy x (turner syndrome) and two pregnancies with klinefelter syndrome (xxy). triple x and xyy were detected one time each. of the three cases with suspected monosomy x, the diagnosis of turner syndrome could only be confirmed in one case. this fetus also had a hydrops at week 10 + 0. for the other two fetuses, the chromosomal analysis of amniotic fluid revealed normal female karyotypes (46,xx). 
in both cases with suspected klinefelter syndrome, this diagnosis could be disproved by amniocentesis (karyotype 46,xy). in the pregnancies with assumed triple x and xyy, the true fetal karyotype was not further determined yet. from our experience, the rate of false positive results concerning the sex chromosome aneuploidies is noticeably higher than reported in two studies of nicolaides 2014 and hooks 2015. this has to be strongly considered in the counselling of patients who wish to know the fetal sex by nipd. congenital myopathies and congenital muscular dystrophies: will genetic testing replace muscle biopsy in the near future? congenital myopathies and muscular dystrophies are a group of inherited neuromuscular diseases with early onset and broad genetic and histopathological overlap. the diagnostic approach has considerably changed with next generation sequencing methods available. here, we describe the diagnostic value of genetic and histological methods in a cohort of 117 index patients and hence the efficacy of diagnostic procedures. 78 of 117 patients had a muscle biopsy as a first-tier approach. in 54 of 78 patients muscle biopsy was informative, leading to a classification in subgroups of cm or cmd. however, in only a few of these cases biopsy led to a specific diagnosis (e. g. merosin deficiency). in 55 of 78 patients genetic testing (candidate gene sequencing or ngs) was performed additionally to muscle biopsy as a second-tier diagnostic step, while 39 patients of the whole cohort received genetic testing only. in almost two-thirds of these 94 patients genetic testing identified known pathogenic or most likely pathogenic variations. these findings illustrate that genetic testing is superior to muscle biopsy in accurately diagnosing cm or cmd. in conclusion, we suggest that invasive muscle biopsy should be replaced by genetic testing as first-tier diagnostic procedure in patients with clinical signs of cm or cmd. 
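the cohort breakdown reported in the congenital myopathy abstract above can be cross-checked with a few lines of arithmetic. the following is only a minimal sketch: the counts are copied from the text, while the helper name `cohort_summary` and the dictionary keys are hypothetical and not part of the study.

```python
# Counts as stated in the abstract (117 index patients in total).
TOTAL_INDEX_PATIENTS = 117
BIOPSY_FIRST_TIER = 78      # patients with muscle biopsy as first-tier approach
BIOPSY_INFORMATIVE = 54     # biopsies leading to a cm/cmd subgroup classification
BIOPSY_PLUS_GENETIC = 55    # biopsy followed by genetic testing (second tier)
GENETIC_ONLY = 39           # genetic testing without biopsy


def cohort_summary():
    """Derive the totals implied by the reported counts (hypothetical helper)."""
    return {
        # 55 + 39 patients received genetic testing at some point
        "genetically_tested": BIOPSY_PLUS_GENETIC + GENETIC_ONLY,
        # biopsy-first and genetic-only groups together should cover the cohort
        "accounted_for": BIOPSY_FIRST_TIER + GENETIC_ONLY,
        # patients who had a biopsy but no genetic testing
        "biopsy_only": BIOPSY_FIRST_TIER - BIOPSY_PLUS_GENETIC,
        # share of biopsies that were informative at the subgroup level
        "biopsy_informative_fraction": BIOPSY_INFORMATIVE / BIOPSY_FIRST_TIER,
    }


if __name__ == "__main__":
    for name, value in cohort_summary().items():
        print(name, value)
```

the derived total of genetically tested patients matches the 94 patients mentioned in the text, and the biopsy-first and genetic-only groups together account for all 117 index patients.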
nmd inhibition increases the amount of gaa-rna in patient's lymphocytes as well as in the cells of his parents. the residual function of the resulting protein has to be investigated. discussion and conclusion: rna analysis in lymphocytes with and without nmd inhibition is a simple method for analysing splice defects in all monogenic disorders with expression of the disease causing gene in lymphocytes. a further advantage for the patient is the use of blood cells instead of fibroblasts, because a skin biopsy can be avoided and analysis times are reduced. the exact characterization of pathogenic variants is an important aspect of diagnosis, prediction of disease severity and genetic counselling. in vitro nmd inhibition in lymphocytes of affected patients allows the characterization of splice defects. in the future successful inhibition of nmd in vitro might help to identify patients, who may profit from a therapeutic intervention with nmd inhibitors. even expression of a partial protein with low or no activity reduces the risk for the patient to develop antibodies hampering enzyme/protein replacement therapy. p-cling-067 12q14 microdeletion syndrome: a family with short stature and silver-russell (srs)-like phenotype. introduction: the silver-russell syndrome (mim 180860), first described independently by silver and russell in 1953, is a condition with intrauterine growth retardation, postnatal growth failure and other characteristic features, including relative macrocephaly (defined as a head circumference at birth ≥1.5 sd score (sds) above birth weight and/or length sds), prominent forehead, body asymmetry and feeding difficulties as recently defined in an international consensus statement. patients and methods: we report here on 3 first degree relatives with a silver-russell syndrome phenotype who presented with prenatal and postnatal growth retardation, feeding difficulties, a prominent forehead and a failure to thrive. 
additional features such as dysmorphic facial features, periodically increased sweating, and scoliosis were present in one of the family members only, whereas learning problems and cardiac arrhythmia were present in one other. none of the patients had relative macrocephaly. high resolution array-cgh was performed to screen for cnvs and mlpa to confirm the array-cgh result. results: neither hypomethylation of the imprinting center on 11p15 nor uniparental disomy of chromosomes 7 and 14 was found in the index patient. high-resolution array-cgh identified a 12q14.3 microdeletion of 1.67 mb (arr[grch37 12q14.3(65, 863, 528 ,640)×1). the heterozygous loss was confirmed by mlpa in the index patient and the other two affected family members (i.e. her brother and mother). the deletion includes the genes hmga2, llph, tmbim4, irak3, helb, grip1, and the pseudogene rpsap52. conclusion: to the best of our knowledge this is the first report on familial presentation of a silver-russell syndrome due to a microdeletion in 12q14.3. none of the patients had relative macrocephaly. supporting the hypothesis by takenouchi et al. that the causative gene for relative macrocephaly resides centromeric to hmga2, the region centromeric of hmga2 is not included in the deletion in our family. spastic ataxia of charlevoix-saguenay (sacs) is an autosomal recessive neurodegenerative disorder and is caused by homozygous or compound heterozygous mutations in the sacs gene. first symptoms of sacs are walking difficulties due to unsteady gait. further typical clinical features include spasticity, ataxia, pyramidal tract signs, nystagmus and dysarthria. here, we report on a 16-year-old female patient who initially presented with disturbances in motor abilities including frequent falls and high-arched foot. cranial mri was normal while nerve conduction velocity was significantly reduced. the patient's parents did not show any clinical features. 
since no pmp22 duplication was detected we performed a gene panel including 64 genes that are associated with hereditary motor and sensory neuropathies (hmsn) and related disorders by using targeted next generation sequencing. we identified the two heterozygous stop mutations c.9305t>a (p.leu3102ter) and c.9305dupt (p.leu3102phefs*8), located at the same position in the sacs gene. sanger sequencing did not enable us to properly display that there is a transversion and a duplication of the same nucleotide at two different alleles. this exemplifies that, in contrast to sanger sequencing, ngs can illustrate both alleles separately. to conclude, this case was only resolvable by ngs which makes this method appropriate for the detection of compound heterozygous mutations, especially in the rare event when two mutations occur at the same position. background: the precise identification and characterization of genetic variants in monogenic diseases has a wide influence on diagnosis and therapy. about 10% of pathogenic variants are splicing variants. due to the complex mechanism of splicing regulation it is difficult to predict the effects of variants on mrna splicing. possible consequences are exon skipping, intron retention, generation of novel splice sites or the utilization of a cryptic splice site. common consequences are a frame-shift and the generation of premature termination codon. this leads to rna degradation via the nonsense mediated decay (nmd) pathway. in a patient with the clinical symptoms of non-classical infantile pompe disease and a confirmed acid alpha-glucosidase (gaa) deficiency, we detected two novel, exonic variants in the gaa gene. both base pair exchanges suggested either an amino acid exchange or a splice defect as consequences. however, conventional investigation of the leucocyte mrna of the patient and his parents was inconclusive. degradation of the respective mutated rna by nmd was suspected. 
we developed an approach in order to characterize novel splicing mutations in a simple and non-invasive manner. material and method: isolated blood lymphocytes from the patient and his parents were cultured in standard leucocyte medium supplemented with different concentrations of the nmd inhibitors okadaic acid, anisomycin, and wortmannin for 24 h. cells were harvested and rna was isolated. the reverse transcribed cdna was amplified in allele specific pcrs and qpcr assays. results: compared to the non-stimulated lymphocyte controls, nonsense mediated rna decay was inhibited by anisomycin. the consequences of aberrant rna splicing were detectable: the maternal mutation results in exon skipping, the paternal mutation in intron retention. furthermore vascular ehlers-danlos syndrome (type iv) is considered to be an autosomal dominant disorder caused by heterozygous mutations in col3a1, which are missense or splice site variants in about 95% of cases. we here report on a three-year-old female of non-consanguineous parents born with bilateral clubfoot as well as dysmorphic facial features, joint laxity, and mild contractures of finger joints. developmental delay became evident. after trauma at 2 years of age she developed brain haemorrhage. mri at this age revealed an additional frontal aneurysm as well as frontoparietal polymicrogyria. we identified novel compound heterozygous col3a1 mutations: the nonsense mutation c.1282c>t (p.arg428*) and the c.2057delc (p.pro686leufs*105) frameshift mutation leading to a premature stop codon. further studies showed that the mutations were inherited from each parent, neither of whom had features of ehlers-danlos syndrome type iv. only two other families have been reported so far with recessive mutations of this gene and a severe vascular phenotype and polymicrogyria. 
biallelic mutations of col3a1 seem to be accompanied with a significantly worse outcome compared with heterozygous mutations and polymicrogyria is an additional phenotypic feature. here we describe five patients with epidermolytic epidermal nevi in different degrees of severity with the mosaic mutation c.466c>t (p.arg156cys) in krt10 gene. the same mutation has previously been described in patients with ei (bygum et al. 2013) . we analyzed dna from peripheral blood and/or skin biopsies from affected and unaffected skin with next generation sequencing (ngs) and sanger sequencing methods. using ngs we found this mutation in blood in mosaic states ranging from 6% to 23%. the mosaic could only be confirmed by sanger sequencing in the patient with the highest mosaic frequency of 23%. in four of our patients we investigated skin biopsies from affected and unaffected skin. it is noteworthy, that only one of four patients showed the mutation in heterozygous state of 50% in the affected skin, whereas the other patients presented a mosaic state also in the affected skin. to exclude a recurrent sequencing artefact at this position, we examined 100 control patients for this mosaic mutation using ngs. in none of these patients we found the same dna change. patients with epidermolytic epidermal nevi have a higher risk to have children with a full-blown ei phenotype. our results show the importance of ngs as the method of choice to explore the molecular genetic basis of epidermolytic epidermal nevi. strikingly, all our patients carry the same mosaic mutation c.466c>t in krt10. we suggest that this position is a hotspot for postzygotic mutations in krt10. non-syndromic hearing loss (nshl), with presently around 100 associated genes, is one of the most genetically heterogeneous disorders constituting nearly 70% of genetic deafness with a predominantly recessive inheritance pattern. thirty percent of hearing loss (hl) can be connected as a part of over 600 distinct syndromes. 
next generation sequencing (ngs) technologies have revolutionized pathogenic variant identification. different strategies enhance pathogenic variant detection supporting detailed hl investigation to overcome the many ambiguities associated with clinical heterogeneity. detection of the disease causing variant in correlation with the phenotype can be challenging in small families, in situations with ambiguous clinical histories and allelic heterogeneity. using a clinical and whole exome sequencing approach, we tested over 130 probands as part of a multicentre iranian and german genetics of hl study that included 29 probands primarily with sporadic or dominant hl in a parent-child or parent-sibling trio context. the majority of these probands were pre-screened for defects in gjb2 and strc. libraries were prepared using trusight one and nextera rapid capture exome enrichment and sequenced using the miseq and nextseq 500 desktop sequencers (illumina). analysis was performed using gensearchngs and an inhouse exome analysis pipeline. around 40% of cases were resolved from phenotype matching and segregation analysis. interestingly, the fraction of resolved cases was much higher in our iranian cohort (>50%) compared to our german cohort (>30%) which may be attributed in part to increased consanguinity in the iranian families. we observed likely disease causing variants in syndrome-associated genes including eya1 causing branchio-oto-renal syndrome, a phenotype that was retrospectively confirmed by acquisition of additional clinical information. with few exceptions, we observed a diverse collection of affected genes in probands from our german collected cohort. contrastingly, the iranian cohort revealed frequent mutations in myo15a and otof. furthermore, co-segregation of variants in myo6 and tecta, with expected dominant hl phenotype, was a hindrance overcome by extensive segregation testing. 
familial locus heterogeneity was also observed, with mutations in cib2 and slc26a4 segregating in different branches of the same extended pedigree. success in the identification of disease causing variants in known hl genes is contingent upon analysis strategy, clinical information and opportunity for segregation testing. the ability to retrospectively connect an already apparent syndromic phenotype to a syndrome-associated gene without prior knowledge is a powerful application of comprehensive analysis that is not restricted to nshl genes. this work provides an improved understanding of population-specific genetic epidemiology of hereditary hl and highlights the challenges in defining genetic causes in a highly heterogeneous disorder such as hl.

vere disproportionate microcephaly (-15,6 sd), corneal clouding, myopia, teeth abnormalities and dysmorphism. panel diagnostics by next generation sequencing for primary microcephaly, including all known genes for seckel syndrome, was unremarkable. microarray analysis (affymetrix® cytoscan hd) revealed a heterozygous 65 kb deletion spanning the plk4 gene. this deletion was confirmed by mlpa (multiplex ligation-dependent probe amplification) analysis. subsequent sequence analysis of the plk4 gene showed a variant of unknown significance on the second allele. in silico analysis of this variant indicated a significant decrease of the relative splice efficiency at the splice donor site. rt-pcr analysis confirmed altered splicing, resulting in a predominant loss of exon 11 of the transcript and predicting truncation of the plk4 protein. interestingly, a residual wild-type transcript was also detectable in patient rna, implying that this variant affects splicing only partially. by analysis of the parents, the splice variant and the large deletion were proven to be compound heterozygous. discussion: up to now, only a few patients with plk4 mutations have been described in the literature.
the phenotype comprises primary microcephaly, primordial dwarfism and chorioretinopathy (mccrp2). to our knowledge, we describe the first case of a plk4 heterozygous whole gene deletion and at least partial biallelic inactivation of the gene, therefore expanding the genetic background of this disorder. furthermore, we give a detailed phenotypic description of a further individual with plk4 alterations. the girl does not show retinopathy so far. while generalised retinopathy was discussed to be one of the most prominent distinctive features between mccrp2 and primary microcephaly/seckel syndrome, we consider plk4 rather to be a further candidate gene pointing towards seckel syndrome. additional investigations on centriole function in patient-derived cells are in progress. pathogenic variants of mitochondrial dna cause a wide range of severe congenital disorders with maternal inheritance and a high transmission risk for female carriers. we report on eight families with an index case presenting with the common pathogenic variant m.8993t>g (p.leu156arg) in the mt-atp6 gene in virtually homoplasmic form. in five families the mutation was detectable in peripheral blood from the mother in heteroplasmic form. in three families with a sporadic case of leigh syndrome the mutation was not detectable in peripheral blood (or urinary or buccal cells) from the mother, possibly indicating a de novo event. furthermore, one family presented with a de novo nonsense mutation in the gene mt-atp8, which was present in peripheral blood of the index case in about 70% and was not detectable in the mother or the unaffected sister. two female carriers with a heteroplasmy level of 50% asked for prenatal testing. both pregnancies showed an apparently homoplasmic load of the mutation. 
mutations in lztfl1 (bbs17) may be associated with a severe renal phenotype

huntington's disease (hd) is a rare autosomal dominant neurodegenerative disorder caused by expanded cag repeats as diagnosed via direct dna analysis. for asymptomatic individuals, predictive testing (pt) can facilitate life planning and diminish uncertainty, but it is also associated with substantial social and psychological challenges. we present a prospective case series of counselees seeking predictive hd testing at the huntington centre north-rhine westphalia (bochum, germany) between 2010 and 2012. the international protocol including several pre-test sessions was followed throughout. the aim of this study was to prospectively follow the decision-making process of individuals at risk in our centre and to explore their experiences following the decision as well as the impact of mutation test results by means of standardized questionnaires and a semi-standardized telephone interview one year after the initial counselling session. 72 individuals participated in at least one of the three phases of the survey, including 31 individuals for the telephone interview. in our cohort, almost all interviewees reported a balanced emotional state one year after initial counselling, regardless of the decision for or against the test. the most important motivations for a decision in favor of pt were the ability to plan private life and to eliminate uncertainty. the most important motivations against pt were the fear of an increasing risk for others (e.g. offspring) and the fear of obtaining an unfavorable htt mutation result, followed by the considered, willful decision for "wanting to not know". furthermore, we identified evidence for gender-specific aspects in decision-making, in line with and expanding our previous observations.
this study represents one of the few comprehensive prospective evaluations regarding decision-making and coping strategies related to predictive testing for huntington's disease. we submit that gender-related aspects should be heeded in genetic counselling during the predictive testing and counselling processes. our findings could serve as a basis for more extended prospective evaluations with higher numbers of participants and longer follow-up intervals.

institute of human genetics, heidelberg, germany, 2 department of conservative dentistry, heidelberg, germany, 3 cegat gmbh, center for genomics and transcriptomics, tübingen, germany

background: plk4 (polo-like kinase 4) has been designated as "master regulator" of centriole assembly. complete loss of plk4 is lethal in mice, whereas biallelic plk4 mutations with some retained function have been described in a few patients with microcephaly, growth failure and retinopathy (mccrp2, omim #616171). this is a heterogeneous entity overlapping with mcph (primary microcephaly) and seckel syndrome. in recent years several new genes associated with this spectrum have been discovered. clinical report and genetic findings: we report on a 4-year-old female patient with intellectual disability, primordial dwarfism (-6,9 sd), most se-

linked to rare autosomal recessive diseases with poor prognosis. we then compared couples and filtered for variants present in genes overlapping in both partners. putative pathogenic variants were tested for co-segregation in affected fetuses where material was available and in unaffected siblings. out of eleven couples of mediterranean and arabian ancestry (c:8, nc:3) and two non-consanguineous couples of european ancestry, we found five cases (5/13, 38%, c:4, nc:1) with both parents being heterozygous carriers of rare potentially deleterious variants in one or more overlapping genes.
in four of these couples the underlying genetic cause for pre- or early postnatal child death could be established; in two of the families the diagnosis was confirmed by homozygous detection of the parental variant in the available dna of the affected child. in a consanguineous couple with pathogenic variants for a severe autosomal recessive disorder identified in both parents, the molecular diagnosis for their child that had died at 5 months of age could not be established. out of 9 couples in whom no causative diagnosis could be achieved, 4 consented to undergo further wes analysis. identified variants are now used for preimplantation and prenatal diagnostics in all four families in which a causative diagnosis was established. our data show that ngs based gene panel sequencing of selected genes involved in lethal autosomal recessive disorders is an effective tool for carrier screening in parents and for the identification of recessive gene defects in families that have experienced early child death and/or multiple miscarriages.

k. komlósi 1, s. diederich 1, d. l. fend-guella 2, u. zechner 2, s. schweiger 2, o. bartsch 2

recently an x-linked syndrome with maternally inherited or de novo mutations in taf1 was described with global developmental delay, intellectual disability (id), delayed speech, characteristic facial dysmorphology, generalized hypotonia and variable neurologic features (mrxs33, mim: #300966, xlr). there have been only three publications of 14 unrelated families, 11 with single-nucleotide changes and 3 with gene duplications including taf1 (kaya et al., 2012; o'rawe et al., 2015; hu et al., 2016). we identified a german family in which two brothers (21 and 5 years) showed severe intellectual disability, absent speech and understanding, and hypotonia but different neurologic and behavioral phenotypes. besides severe id the older brother also had postnatal short stature (-3 sd), a severe lennox-gastaut epilepsy and a neurodegenerative course.
the younger brother showed autistic behavior and lost his very limited skills at age 3 to 3.5 years. both showed mild dysmorphic features (prominent supraorbital ridges, sagging cheeks, long philtrum, long face, thin upper lip, and high-arched palate), oropharyngeal dysphagia and generalized hypotonia. a gluteal crease with a sacral caudal remnant described as a characteristic feature was not seen in our case, and hearing impairment, microcephaly, dystonic movements or tremor were not observed either. the family history was highly suggestive of x-linked inheritance with an affected maternal uncle, a maternal aunt with multiple miscarriages, and two aunts with learning disability. since a previous analysis of 107 known x-linked mental retardation genes had not revealed the cause in the older brother, we used a targeted ngs approach (mpimg1-test: >1200 brain related genes) for the analysis of the younger brother. following enrichment a 300 bp paired end sequencing was carried out on an illumina miseq system with >95% of target covered >20-fold (hu et al, 2014) . only taf1 fitted the x-linked model and the phenotype. the unreported hemizygous sequence variant c.2833g>a (p.asp945asn) in exon 18 of taf1 was deemed pathogenic. it affected a highly conserved residue in the central "duf3591" domain, where 4/11 previously described mutations had clustered. segregation analysis confirmed hemizygosity in the older brother and heterozygosity in the mother with completely skewed x-inactivation (100:0). gonadism and learning disabilities. because of the delayed onset of symptoms, the diagnosis is often established during late childhood. however, in some cases renal morphological changes detected by ultrasound may resemble those usually seen in autosomal recessive polycystic kidney disease (arpkd). however, histologically bbs-kidneys differ distinctly from other polycystic disorders by cystic orientation, localisation, extension, structure and size. 
21 bbs genes have been identified to date. mutations in bbs2, bbs4, bbs6 and bbs10 have been found to cause an antenatal presentation of bbs that may in some aspects mimic meckel gruber syndrome (mks). there is increasing evidence that bbs, at least in some families, shows an oligogenic mode of inheritance with three mutations at two bbs loci. yet, only three patients in two families with bbs caused by mutations in lztfl1 (bbs17) have been reported. their diagnosis was established in childhood and all patients had mesoaxial polydactyly as a distinct manifestation. in contrast to previous lztfl1 cases, in our family the diagnosis of arpkd was suspected sonographically at 27 weeks gestation (wg). pregnancy was terminated at 30 wg. autopsy revealed postaxial polydactyly of both hands, enlarged spongy kidneys, hemivertebra t6 and some features of potter's sequence. histological examination of the kidneys showed multiple, not radially oriented thin walled cysts, internally lined by thickened pas-positive basement membranes and microcystic dilatation of collecting ducts. cystic changes were accentuated in the renal medulla. corticomedullar differentiation was mainly preserved. the tentative diagnosis was bbs. fetal dna was investigated using a next generation sequencing panel which included 17 known bbs causing genes. hereby a heterozygous nonsense variant (np_065080.1: p.glu260*) inherited from the mother and a heterozygous missense variant (p.glu92lys) inherited from the father of the lztfl1 gene were identified. furthermore a maternally inherited heterozygous missense variant of unknown clinical significance in bbs4 was detected (np_149017: p.pro443ala). our case shows for the first time that mutations in lztfl1 can lead to a severe prenatal presentation of bbs due to profound renal manifestations with a kidney histology that is not considerably milder but distinct from that observed in mks. 
it is not clear to what extent the bbs4 variant may act as a disease modifier. this may challenge genetic counselling and prenatal diagnosis in a further pregnancy. furthermore our case shows that mesoaxial polydactyly is not always present in bbs patients with lztfl1 mutations and further studies are necessary to establish the frequency of mesoaxial polydactyly and other genotype phenotype correlations for bbs patients with lztfl1 mutations.

p-cling-075 *** targeted next-generation sequencing analysis in couples at increased risk for autosomal recessive disorders

k. komlósi, s. diederich, d. l. fend-guella, j. winter, o. bartsch, u. zechner, s. schweiger

genetic childhood disorders leading to prenatal, neonatal or early childhood death are genetically heterogeneous. many follow autosomal recessive or x-linked modes of inheritance and bear specific challenges for genetic counselling and prenatal diagnostics. parents are carriers but unaffected and diseases are typically very rare but with recurrence risks of 25% in the same family. often, affected children (or fetuses) die before a genetic diagnosis can be established, post-mortem analysis and phenotype descriptions are insufficient and dna material of affected fetuses or children is not available for later analysis. a genetic diagnosis showing biallelic mutations or mutations on the x-chromosome in male fetuses or children is, however, the requirement for targeted carrier testing in parents, risk calculations, and prenatal and preimplantation diagnostics in further pregnancies. we employed targeted next-generation sequencing (ngs) for carrier screening of autosomal recessive lethal disorders in 8 consanguineous (c) and 5 non-consanguineous (nc) couples with one or more affected children. we searched for heterozygous variants (non-synonymous coding or splice variants as well as cnvs) in parents' dnas in a set of 430 genes

r. kropatsch 1, 2, j. preine 3, 4, jt. epplen 1, 2

1 department of human genetics, 2 ruhr university, bochum, germany, 3

charcot-marie-tooth disease (cmt), also commonly called hereditary motor sensory neuropathy, is the most common monogenic disease of the peripheral nervous system with significant clinical and genetic heterogeneity. the main clinical manifestations of cmt include progressive distal muscle weakness and atrophy, impaired distal sensation, depressed tendon reflexes and high-arched feet. based upon electrophysiological and histopathological features cmt can be divided into predominantly demyelinating or axonal forms. an intermediate form also exists, characterized by evidence of both demyelinating and axonal impairment. genetically cmt can be caused by mutations in over 40 genes, including the dnm2 gene encoding dynamin 2 protein, a large gtpase primarily involved in receptor-mediated endocytosis and membrane trafficking. only a small number of mutations in dnm2 causing cmt have been described so far. we report the case of a 48-year-old man presenting with backache, ataxic gait and distal muscle weakness of the lower limb, considered to be a consequence of a disc prolapse diagnosed 3 years earlier. over the past months bilateral progressive weakness of ankle dorsiflexion, foot drop and tingling paresthesia in stocking distribution have occurred. neurological examination disclosed depressed tendon reflexes of the upper and lower limbs. neurophysiologic investigations revealed an axonal sensory polyneuropathy with normal distal motor latencies and nerve conduction velocities. the sural nerve biopsy indicated single unmyelinated or thinly myelinated axons, loss of myelinated nerve fibers, and numerous clusters of regenerating fibers without onion bulb formations, suggesting an intermediate form of cmt.
by using next generation sequencing (ngs) and a multi-gene panel, consisting of 751 inherited neurological disease-associated genes, we identified a heterozygous missense mutation c.439g>a, p.asp147asn (rs370086632) in the dnm2 gene. this particular mutation is located in a highly conserved nucleotide region encoding the catalytic n-terminal gtpase domain. this evidence suggests a pathogenic phenotype caused by the described mutation, which is being underlined by the following facts: public 1000 genome database covering harmless variants of the human genome does not report it. additionally, the exac browser with exome sequencing data of >60,000 unrelated individuals, lists the mutation and shows a low allele frequency of 0.000008243, corresponding to one known heterozygous mutation carrier. other online prediction tools like the mutation taster, polyphen, sift and provean categorize it as pathogenic. in conclusion, the novel dnm2 mutation is responsible with high probability for the late-onset form of intermediate cmt of the investigated patient. heterozygous mutations in pcdh19 cause an x-linked female-limited form of an early infantile epilepsy (juberg-hellman syndrome). the phenotype of this syndrome is variable, ranging from benign focal epilepsy to severe, serial seizures, repeating up to more than 10 times a day for several consecutive days. the intellectual outcome of affected patients ranges from normal to severe intellectual disability. psychiatric disturbances are frequent and manifest as autism, schizophrenia or aggressive behavior. neurological features such as ataxia may also be present. women with triple x syndrome usually show a normal physical development. cognitive deficits in vivo functional modeling of taf1 has already provided evidence for an effect on a neuronal phenotype. the phenotype in patients can be reminiscent of rett syndrome, but with milder regression and normal movements lacking a specific stereotypic pattern (no hand wringing). 
while severe neurodegeneration has been described in duplications, the present probands clearly showed developmental regression associated with a missense mutation. the analysis of further family members is pending. our case adds to the phenotypic spectrum of x-linked syndromic mental retardation type 33.

targeted enrichment sequencing was successfully applied to identify greenberg dysplasia as cause of fatal anomalies in one fetus of a dizygotic twin pregnancy.

p. m. kroisel 1, 2, b. csapo 2, 3, m. häusler 2, 3, l. michelitsch 4, 5, s. verheyen 1, 2, p. klaritsch 2, 3, k. wagner 1, 2

1 institute of human genetics, 2 medical university of graz, graz, austria, 3 department of obstetrics and gynecology, 4 gynecologist and obstetrician, 5 weiz, austria

in the first pregnancy of a 29-year-old kosovar woman and her 36-year-old husband, during the 16th week of gestation one of her dizygotic twins showed a severe skeletal dysplasia with all long bones extremely shortened and partially bent. the thorax was short and narrow. in addition, ventriculomegaly of the brain and an increased nuchal translucency were noticed. a very poor prognosis was expected and an achondrogenesis was suspected clinically. the other twin appeared to be normal. an amniocentesis was performed to potentially identify the genetic basis of the disorder. qf-pcr to rule out common trisomies and cytogenetics revealed normal results. following a normal agilent 60k array cgh analysis, next generation sequencing using the trusight one gene panel, focusing on three genes (slc26a2, trip11 and col2a1), was performed as the next diagnostic step. since no pathogenic mutation was found by this approach, a more extended bioinformatics study was initiated. by filtering out variants in the more than 4800 genes of the panel that are common in our own database or in the exac and 1000 genomes databases, our search was extended to genes with rare homozygous or compound heterozygous variants.
by this strategy it was possible to reduce the potentially causative gene mutations dramatically and among those remaining genes for known very severe skeletal phenotypes just in the lbr gene the homozygous missense mutation c.1639a>g, p.asn547asp was identified in our fetus. since this particular mutation is already known to be pathogenic leading to the lethal greenberg dysplasia (clayton et al., nucleus. 2010 ) the diagnosis could be achieved in the affected fetus of the pregnancy of our patient still before completion of the 23rd week of gestation. both parents were found to be heterozygous for this mutation in the lbr gene. recently it was shown that different mutations of the very same gene can also lead to less severe forms of bone dysplasia. the couple was informed about our results and possible consequences were discussed and offered. the couple however came to the decision not to draw any consequences. both fetuses especially the affected one were well documented sonographically including in a series of 3d images. in the 27th gestational week during a sonographic investigation the affected fetus did not show cardiac function and an oligohydramnios was found. since development of the second non affected fetus was still within the normal range, we hope the now single pregnancy will carry on normal until birth. from our finding we would propose that our chosen strategy is straightforward and can be applied in a wide range of pregnancies to identify various up to severe and fatal single gene disorders associated with sonographic anomalies within a few weeks which should provide substantial benefits for these families. after the working diagnosis ps had been established, molecular analyses regarding the recurrent akt1 mutation (p.glu17lys) were performed by sanger sequencing in available affected tissue specimen of all three individuals. 
this revealed high-level mosaicism for the akt1 mutation c.49g>a (p.glu17lys) in affected tissues from bone and in meningiomas. re-evaluation of the ngs data from blood (individual i) confirmed the absence of that mutation in all reads, and no mutation was detected by sanger sequencing in dna from blood in individuals ii and iii. thus, a somatic mosaicism leading to a mild proteus phenotype could be confirmed as the underlying genetic cause in all three affected individuals. in conclusion, mild forms of proteus syndrome caused by the recurrent akt1 mutation in patients with limited regional involvement may be particularly difficult to diagnose and might be underdiagnosed.

distal gne-myopathy: rare differential diagnosis of polyneuropathy

here, we report a case of a 37-year-old patient with presumed polyneuropathy and elevated creatine kinase levels (400-800 u/l). clinical features included atrophic and bilateral paresis of the anterior and posterior compartments of the lower legs without high-arched feet, while sensation was not affected. additionally, a myopathic emg in m. tibialis anterior and slight axonal damage in the motor neurography were detected. due to this overlapping neuromyological phenotype we analysed a gene panel including 155 genes associated with neuromuscular diseases using targeted next generation sequencing. gene panel analysis revealed the homozygous mutation c.829c>t (p.arg277trp) in the gne gene. this mutation is described in the literature as a cause of distal gne-myopathy and was also detected in an affected brother (ck 1900 u/l); the parents are consanguineous. the current case emphasizes that a large gene panel analysis is recommended in case of an overlapping neuropathological and myopathological phenotype.

c. landgraf, g. schmidt, s. morlot, b. schlegelberger, b. auber

institute of human genetics, hannover medical school, hanover, germany

ehlers-danlos syndrome (eds) is a heterogeneous group of connective tissue disorders.
according to the 1997 villefranche classification, eds comprises six major types as well as some rare specific entities. one of these has been referred to as "eds, musculocontractural type" (mc-eds), "adducted thumb-clubfoot syndrome" or "eds, kosho type". first described in 1959 as an eds vi subtype, it recently was identified to be caused by biallelic changes in either the chst14 or dse genes, resulting in a loss of dermatan sulfate (ds) biosynthesis. characteristic symptoms are multiple congenital malformations such as contractures (club feet, adducted thumbs), visceral and ocular anomalies, generalized joint laxity, scoliosis, muscular hypotonia, fragile, hyperextensible and bruisable skin, as well as a typical craniofacial appearance. distinctive features include hypertelorism, down-slanting palpebral fissures, bluish sclerae, micro-corneae, short nose with hypoplastic columella and long philtrum, thin upper lip vermillion, small mouth, retrognathia, low-set and posteriorly-rotated ears. the psychomotor development is delayed. 31 (chst14) and 3 (dse) patients have been reported as yet. the patient, a 30-year-old woman using a wheelchair, had club feet, surgically corrected asd ii, muscular hypotonia, the characteristic face and hyperextensible skin with atrophic scars; particularly visible were those and learning disabilities are more common than in the general population and compared to siblings. their motor skills are likely to be somewhat impaired and coordination problems are frequent. in some patients psychological problems were described. furthermore, eeg abnormalities are occasionally observed, with clinical seizures present in up to 15% of patients. here we report on a 15-year-old girl with a 47,xxx karyotype and early infantile, intractable epileptic seizures, beginning at the age of 9 months. about three years later, she developed severe, serial seizures often related to febrile infectious diseases. 
in adolescence, the epileptic symptoms became less intense. she additionally showed autistic features, mental deficiencies, hypermobility of the joints and ataxia. array-cgh, fragile x-analysis as well as sanger sequencing and mlpa of the scn1a and mecp2 genes revealed no additional abnormalities, besides the xxx karyotype. in the pcdh19 gene the heterozygous missense mutation c.695a>g (p.asn232ser) was identified, whereby the mutated allele seemed to appear in a 2:1 ratio compared to the wild type allele. this mutation has been previously described as disease causing. furthermore, three in silico prediction programs (sift, polyphen-2, mutationtaster) classified the mutation as pathogenic. the patient's asymptomatic mother had a normal 46,xx karyotype and was not a carrier of the pcdh19 mutation. pcdh19-related epilepsy exhibits an unusual mode of inheritance in which only heterozygous females are affected and hemizygous males are asymptomatic carriers. random x-inactivation in the brain of females with pcdh19 mutations causes a cellular mosaicism, which likely accounts for the pathogenesis by altering the cell-cell-interactions ("cellular interference"). however, the precise mechanism is still unknown. hypothetically, the wide range of phenotypic expressions may be explained by partially skewed x-inactivation and thereby limitation of the cellular interference. hence, an unequal ratio of mutated to wild type cells should give a milder phenotype compared to the fifty-fifty situation. in contrast to this hypothesis, the phenotype in our patient was rather severe. nevertheless, we cannot exclude that the triple x status contributes additionally to the observed phenotypic expression. proteus syndrome (ps, omim 176920) is a highly variable disorder with asymmetric and disproportionate overgrowth of the body, connective tissue nevi, epidermal nevi, dysregulated adipose tissue, and vascular malformations, caused by a somatic activating akt1 mutation. 
we report on three unrelated individuals (two adults and one 6-year-old boy) who showed similar clinical findings that did not fulfil the rigorous clinical criteria for ps (biesecker, 1999). besides an asymmetric hyperostosis of the skull or facial bones, all three had an ocular dermoid. two individuals developed alveolar hyperostoses and intracranial calcifying meningiomas. only one individual showed skin changes. all three had normal feet and no vascular lesions. molecular analyses in individual i performed in blood revealed normal results for array karyotyping and no relevant variant in whole exome sequencing (trio approach).

introduction: incontinentia pigmenti (ip) is a rare x-linked male lethal genodermatosis that affects the neuroectodermal tissue and is always associated with a bullous rash of the skin along blaschko lines in female neonates. it is caused by mutations in ikbkg, which encodes the regulatory subunit of the ikb kinase complex required for nf-kb activation. ikbkg has a pseudogene (ikbkgp) with identical exons 4 to 10. the most frequent ip mutation is a recurrent exon 4_10 deletion due to non-allelic homologous recombination with the pseudogene. here, we report a novel deletion of exons 4 and 5 in the ikbkg transcript recognized by rna analysis. patient: we investigated a 9-year-old girl with a typical erythematous rash after birth which resolved within 2-3 months. apart from a small hyperpigmented area around the right mammilla there were no skin alterations. she had a few conically shaped teeth, normal nail and hair structure, no neurological manifestation and normal intelligence. the only clinical sign was repeated vitreous hemorrhage of the left eye from the age of 5 months. family history was negative.
methods: analyses on genomic dna extracted from blood included testing for the common ikbkg exon 4_10 deletion by long range pcr and mlpa (p073-a1, mrc holland), x inactivation analysis in the androgen receptor gene as described, and massively parallel sequencing (mpseq) of the ikbkg gene (trusightone, nextseq, illumina®; data analysis with nextgene/geneticist assistant [softgenetics®] and seqnext [jsi®]; reference sequence grch37, hg19). rna was extracted from cultured blood lymphocytes, and the entire ikbkg transcript (nm_003639.4, 10 exons) was sanger sequenced on cdna level (the a of the start codon is in exon 2); the results were analyzed with sequencepilot (jsi). result: genomic dna analyses including mlpa of ikbkg and ikbkgp specific probes in our patient did not reveal a putative mutation. there was a completely skewed x-inactivation pattern. cdna sequencing of ikbkg demonstrated skipping of exons 4 and 5 (r.400_671del), which is predicted to cause a frame shift starting from codon p.gln134, a premature stop codon 28 amino acids downstream (p.gln134glyfs*29) and complete loss of protein function. the loss of exons 4 and 5 is most likely due to an intronic splice variant in intron 6; investigations regarding the origin of this deletion are ongoing. conclusion: the presence of the highly homologous pseudogene makes sequence analysis of ikbkg challenging. we report a deletion of exons 4 and 5 in the ikbkg transcript that required rna analysis for its identification. due to the skewed x-inactivation and typical clinical picture, causality of the detected deletion is certain. the exact genomic cause of this alteration remains to be clarified. even in the era of mpseq, rna analysis may be necessary for detection of deep intronic mutations or the study of genes with homologous pseudogenes, as shown here in the case of ikbkg.

primary congenital glaucoma (pcg) and early onset glaucomas are among the major causes of blindness in children and young adults worldwide.
both autosomal recessive and dominant inheritance have been described resulting from bowel surgeries due to colon perforation. her older sister who died aged 19 following an acute abdomen had club feet and the typical facial appearance, while three healthy sisters seem to be unaffected. her parents are first cousins of turkish origin and do not show eds symptoms either. the two affected sisters had been diagnosed with a syndromic disorder that could represent a rare form of eds. however, neither a confirmation of the suspected diagnosis nor a classification had yet been achieved. due to the distinctive symptom complex and the presumed autosomal recessive inheritance pattern, we strongly suspected this to be a case of mc-eds. sequencing of the chst14 gene (reference sequence: lrg_600) revealed a previously undescribed homozygous variant (c.644c>t; p.pro215leu). the variant changes the highly conserved pro215 residue, which is located in the critical 3'-phospho-5'-adenylyl sulfate binding site, and can be classified as likely pathogenic (acmg standards and guidelines, richards et al., genetics in medicine 2015). both parents are heterozygous carriers of this variant. this case represents a unique entity within the umbrella term eds and illustrates the importance of clinical assessment leading to a diagnosis confirmed by genetic analysis. the underlying genetic defect in patients with mitochondrial peo is either a primary mutation of the mitochondrial genome (single, large-scale mtdna deletion or mtdna point mutation) or recessively and dominantly-inherited mutations in nuclear genes involved in mtdna maintenance leading to clonally-expanded multiple mtdna deletions in muscle. the nuclear disease genes are largely implicated in the replication and stability of mtdna, and as such a pathogenic mutation leads to secondary instability of the mitochondrial genome. 
causal mtdna deletions can be found in a heteroplasmic (mixture of mutated and wild type mtdna) state. however, each tissue/cell has its own biochemical threshold of mutant mtdna load which needs to be exceeded before focal respiratory chain deficiency becomes evident. to investigate this, muscle biopsies of 17 patients with genetically and clinically characterized mitochondrial disease of nuclear origin (9 polg, 5 twnk, 2 rrm2b and 1 slc25a4 (ant1)) and 4 healthy controls were analysed using quadruple oxphos immunohistochemistry, quantifying the biochemical phenotype in individual muscle fibres of patient muscle biopsies. this technique is based on quadruple immunofluorescence to detect structural components of complexes i (ndufb8) and iv (coxi), as well as porin (a marker of mitochondrial mass) and laminin (a cell membrane marker to define the boundaries of muscle fibres). further studies on 7/17 patients (3 polg, 2 rrm2b, 1 twnk, 1 slc25a4 (ant1)) included the correlation of the biochemical deficiency with the mtdna abnormality in individual cells, following laser microcapture and determination of the size and level of clonally-expanded mtdna deletion within fibres by real-time pcr. our preliminary data from quadruple immunocytochemical studies show that the muscle biochemical phenotype is different in patients with multiple mtdna deletions compared to other mtdna mutations; work is continuing to determine the exact size and level of clonally-expanded mtdna deletion in individual muscle fibres and correlate this with the observed biochemical defects and disease thresholds. these families, and functional studies, together with phenotype descriptions in the literature, are essential for pathogenic grading. 
however, several difficulties remain, such as the huge size of the ttn gene (>100kb) impeding functional studies, the wide spectrum of phenotypes and variants, the still small patient cohort, and often unspecific immunohistochemical abnormalities in muscle biopsies. the clinical evaluation of ttn variants thus presents a great challenge to the field of human genetics diagnostics. compound heterozygous variants in the qars gene (omim 603727) have been identified in only four patients with autosomal recessive progressive microcephaly with seizures and cerebral and cerebellar atrophy (mscca), to date. these patients showed severe developmental delay, progressive primary microcephaly, intractable seizures, hypomyelination or delayed myelination, thin corpus callosum, and small cerebellar vermis on brain imaging. here we report on two unrelated girls with progressive primary microcephaly, epilepsy and brain anomalies. trio exome analysis in each of the families revealed two different combinations of compound heterozygous variants in qars. all four variants are highly conserved throughout vertebrates, not reported in any database, yet, and in silico analysis predicted the variants as possibly damaging or deleterious. the first patient was born to non-consanguineous german parents. at birth, she was too short (−2.8 sd) and mildly microcephalic (−2.3 sd). she developed intractable seizures within the first hour of life. her growth continued to be mildly retarded (−2.8 sd at age 9 years) but microcephaly was progressive (−6.5 sd at age 9 years). she did not achieve any of the motor or cognitive developmental milestones, she did not have eye contact. the only interaction with her surrounding was a diffuse reaction to being touched. cranial mri showed no myelination of the supratentorial region, corpus callosum agenesis, simplified gyral pattern of frontal lobes, enlarged cerebral ventricles, and normal brain stem and cerebellum. 
trio-exome sequencing revealed the compound heterozygous qars variants c.1132c>t, p.(arg378cys) and c.1567c>t, p.(arg523*). segregation analysis by sanger sequencing confirmed the heterozygous variants in the parents and in two unaffected siblings of the index patient. the second patient was initially evaluated at 11 days of age when she exhibited myoclonic seizures, intrauterine growth retardation, microcephaly, and elevated lactic acid. at birth, she was microcephalic (hc 29 cm) and microcephaly was progressive (−5.4 sd at age 19 months). cranial mri suggested undersulcation. she has required a gastrostomy feeding tube. trio-exome sequencing revealed the compound heterozygous qars variants c.40g>a, p.(gly14ser) and c.1573c>t, p.(arg525trp). segregation was confirmed by sanger sequencing analysis. together with the four previously described patients we conclude that compound heterozygous variants in qars are associated with a primary and progressive microcephaly, early onset of intractable seizures and severe developmental delay. brain imaging in the neonate can show a simplified gyral pattern as an early characteristic feature. overlapping phenotypes are seen in patients with epileptic encephalopathy, lissencephaly and primary microcephaly. application of ngs panels or exome technology will allow for early diagnosis and further collection of patients for better delineation of the phenotype. with involvement of several genes including cyp1b1, foxc1, pitx2, myoc and pax6. however, mutations in these genes explain only a small fraction of cases, suggesting the presence of further candidate genes. to elucidate further genetic causes of these conditions we performed whole exome sequencing in a patient with pcg and retinal detachment and identified compound heterozygous variants in col1a1 (p.met264leu; p.ala1083thr). 
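the pairing of cdna and protein descriptions for the four qars substitutions quoted above can be cross-checked mechanically, since the codon number follows from the coding position. the sketch below is an illustration only: the regular expressions cover just the simple substitution notation used in this text, not the full hgvs grammar, and all names are hypothetical.

```python
import re

# Simple patterns for substitutions such as "c.1132c>t" and "p.(arg378cys)".
# They are deliberately narrow and do not cover the full HGVS nomenclature.
CDNA_RE = re.compile(r"c\.(\d+)[acgt]>[acgt]", re.IGNORECASE)
PROTEIN_RE = re.compile(r"p\.\(?[a-z]{3}(\d+)", re.IGNORECASE)


def codon_of(cdna: str) -> int:
    """Codon number implied by a coding-DNA substitution (c.1 = A of the ATG)."""
    match = CDNA_RE.fullmatch(cdna)
    if match is None:
        raise ValueError(f"unsupported cDNA description: {cdna}")
    return (int(match.group(1)) - 1) // 3 + 1


def residue_of(protein: str) -> int:
    """Residue number stated in a protein-level description."""
    match = PROTEIN_RE.match(protein)
    if match is None:
        raise ValueError(f"unsupported protein description: {protein}")
    return int(match.group(1))


def is_consistent(cdna: str, protein: str) -> bool:
    """True when the cDNA position falls inside the stated codon."""
    return codon_of(cdna) == residue_of(protein)


if __name__ == "__main__":
    qars_variants = [
        ("c.1132c>t", "p.(arg378cys)"),
        ("c.1567c>t", "p.(arg523*)"),
        ("c.40g>a", "p.(gly14ser)"),
        ("c.1573c>t", "p.(arg525trp)"),
    ]
    for cdna, protein in qars_variants:
        print(cdna, protein, is_consistent(cdna, protein))
```

a check of this kind only confirms that position and codon agree; it says nothing about the amino acid change itself, which would require the reference sequence.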
targeted col1a1 screening of 26 additional patients detected three further heterozygous variants (p.arg253*, p.gly767ser and p.gly154val) in three distinct subjects: two of them were diagnosed with early onset glaucoma and a mild form of osteogenesis imperfecta (oi), and one patient had a diagnosis of pcg at age 4 years. all five variants affected evolutionarily highly conserved amino acids, indicating important functional restrictions. molecular modeling predicted that the heterozygous variants are dominant in effect and affect protein stability and thus the amount of available protein, while the compound heterozygous variants act as recessive alleles and impair binding affinity to two main col1a1 binding proteins: hsp47 and fibronectin. dominantly inherited mutations in col1a1 are known causes of connective tissue disorders such as oi. these disorders are also associated with different ocular abnormalities, although the common pathology for both features is seldom recognized. our findings expand the role of col1a1 mutations in different forms of early-onset glaucoma with and without signs of oi. thus, we suggest including col1a1 mutation screening in the genetic work-up of glaucoma cases and detailed ophthalmic examinations with fundus analysis in patients with oi. the gene ttn encodes the largest known protein, titin, which plays a key role in structural, mechanical, developmental and regulatory functions of cardiac and skeletal muscles. accordingly, titinopathies are characterized by great clinical and genetic heterogeneity. the clinical spectrum ranges from severe phenotypes with cardiac involvement to pure myopathies at the milder end, including autosomal recessive and dominant inheritance patterns (chauveau et al. 2014, hum mutat; 35:1046). next generation sequencing analysis identifies a large number of variants of unknown clinical significance; the potential clinical relevance of these variants cannot be assessed with certainty without further studies. 
three case reports highlight the difficulties in human genetics diagnostics concerning ttn. the first case is of a 46 year-old woman with proximal muscle weakness, slightly elevated ck, scoliosis, and no family history. a heterozygous known pathogenic variant was identified in mex1, associated with autosomal recessive congenital core myopathy combined with primary heart disease. additionally, an unknown variant was detected. both variants could be clinically relevant with regard to the patient's phenotype, but this can be neither confirmed nor excluded at this time. the second case is of a 7 month-old finnish girl who presented with severe muscle hypotonia at birth and mental alertness with normal brain mri and eeg. congenital fiber-type disproportion was suspected. a homozygous frame-shift mutation in mex1 was identified which to our knowledge has not yet been described in the literature. this variant is likely of clinical relevance with regard to the patient's phenotype, but this can be neither confirmed nor excluded at this time. the third case is that of a 34 year-old woman with suspected myofibrillar myopathy. a known pathogenic homozygous frame-shift mutation in mex6 was detected which is associated with autosomal recessive congenital myopathy with central nuclei. segregation analysis revealed that the healthy parents are heterozygous carriers of this variant. the clinical diagnosis of a ttn-associated disease could therefore be confirmed. ttn variants need to be assessed in combination with detailed clinical and muscle biopsy data. segregation analysis is necessary but not sufficient for the clinical grading of variants. identification of a variant in several independent families, segregation of the variant with disease phenotype in gous pathogenic smad4 variant c.719dupt;p.(ala241fs) (ncbi reference sequence nm_005359.5). gastric as well as colonic cancer and polyposis was present in the paternal family history. 
conclusions: ts14 and other imprinting disorders are likely underdiagnosed, as the main clinical features (e. g. growth retardation, hypotonia) are distinct but unspecific. as exome sequencing becomes a more frequent diagnostic procedure, imprinting disorders caused by mutations in imprinting centers will presumably be diagnosed more often. methylation defects, however, will remain underdiagnosed, without a specific clinical differential diagnosis, which would guide to appropriate analysis of the methylation status. a bowel invagination in early childhood due to a single polyp can be a symptom of jps, especially in the context of a paternal history of polyposis and intestinal cancer; thus, family history should be carefully obtained. in the outpatient clinic, child psychiatrists as well as neurologists thoroughly work up phelan-mcdermid patients according to a standardized protocol by taking medical history, performing physical examination, and, if needed, organizing further supplementary examinations. in addition, a genetic analysis and hair/tissue sampling is performed. since its foundation, a steadily increasing number of so far 70 patients from all over germany has been seen and treated. the outpatient clinic aims at facilitating and accelerating the diagnosis of phelan-mcdermid syndrome, improving medical support for affected patients of all ages, and, last but not least, fostering a better understanding of the causes and pathomechanisms leading to the symptoms of the disease. t. m. neuhann, l. neuhann, c. rapp, a. laner, a. benet-pages, e. holinski-feder medizinisch genetisches zentrum, münchen, germany congenital eye malformations, such as the microphthalmia-anophthalmia-coloboma (mac) spectrum, congenital cataracts, anterior segment dysgenesis (asd), and congenital glaucoma, affect more than 1:8.000 newborns. the phenotypic spectrum of the aforementioned entities is highly variable and partially overlapping. 
eye malformations are very heterogeneous; to date causative mutations have been described in more than 100 next-generation-sequencing (ngs) technology has revolutionized genomic research and has transformed clinical diagnostics. ngs offers enormous potential for providing accurate diagnoses to individuals with previously unresolved syndromes. in the pediatric endocrine clinic, clinicians are often faced with the task of making a diagnosis in children with syndromic short stature. as there may be considerable clinical overlap between short stature syndromes, deriving a clinical diagnosis may prove challenging. furthermore, even if a clinical differential diagnosis is established, often several genes would need to be tested before a molecular diagnosis is made. as access to genetic testing is limited in algeria, we conducted a pilot study on 10 algerian patients with syndromic short stature using a combination of two different ngs modalities, namely whole-exome-sequencing (wes) and mendeliome sequencing (trusight one sequencing panel). a molecular diagnosis could be established in 9/10 patients, making the diagnostic rate in this initial cohort 90%. as 7 patients had novel mutations we could expand the mutational spectra of several genes, namely cul7, npr2, sos1, vps13b, and znf81. we could thus substantiate the clinical utility of wes and the mendeliome in patients with a diverse array of short stature syndromes. chromosome 14 harbours an imprinted locus at 14q32. maternal uniparental disomy of chromosome 14, paternal deletions and paternal loss of methylation at the intergenic differentially methylated region (ig-dmr) and the somatic dmr within meg3 are associated with temple syndrome (ts14, mim 616222). the phenotype of ts14 consists of pre- and postnatal growth retardation, early feeding problems and muscular hypotonia, joint laxity, motor developmental delay, premature puberty, and truncal obesity. 
juvenile polyposis syndrome (jps, mim 174900) is characterized by predisposition to hamartomatous polyps in the gastrointestinal (gi) tract, specifically in the stomach, small intestine, colon, and rectum, including the risk for gastrointestinal cancer. pathogenic variants in the bmpr1a and smad4 gene are identified in about 40-50% of affected families. we report on a family with two female children. the index patient, an 8-year-old girl, was diagnosed to have ts14 due to hypomethylation at the somatic dmr within meg3 with clinical features reminiscent of prader-willi syndrome in early childhood and milder clinical signs at further age (i. e. mild global development delay, muscular hypotonia, suspected central obesity, no prominent facial dysmorphisms). snp array-cgh analysis was unsuspicious and no deletion of the imprinting center was observed. thus, ts14 is caused by a sporadic imprinting defect in our patient. her 10-year-old sister was diagnosed with smad4-associated jps after an episode of intestinal invagination due to a polyp, histologically diagnosed as peutz-jeghers polyp, in early infancy. sequencing identified a heterozy-abstracts coding vmat2 have only very recently been described as causal for brain dopamine-serotonin vesicular transport disease in two families with multiple affected children (rilstone et al., n engl j med 368, 2013, 543-550; jacobsen et al., j inherit metab dis 39, 2016, 305-308) . the index case presented here is a 7-year-old girl with severe mental retardation and a dystonic movement disorder. she is the tenth child of a consanguineous arabic couple and was initially referred to neuropaediatric examination at the age of four months due to recurrent oculogyric crises and muscular hypotonia. blood metabolic testing and cerebrospinal fluid (csf) analyses were inconclusive. 
notably, biogenic amines were within their normal ranges and the differential diagnosis of aromatic l-amino acid decarboxylase (aadc) deficiency could not be confirmed. conventional cytogenetics, subtelomeric screening, array-cgh and different ngs panel analyses did not identify a causative mutation. both parents and all eight living siblings are obviously unaffected. a brother with a known hypotonic movement disorder died at the age of three years due to prolonged seizures with hyperthermia and cerebral edema. by utilizing whole-exome sequencing, we identified a homozygous substitution in the slc18a2 gene of the index case causing an amino acid change (c.710c>a; p.pro237his) in a conserved transmembrane domain of vesicular monoamine transporter 2 (vmat2). homozygosity for this missense change could also be verified in a dna sample of her deceased brother. an obvious reduction in frequency of oculogyric crises was observed in our index case under therapy with pramipexole already within 4 weeks after start of treatment. furthermore the patient shows less dystonic movements under therapy. the case presented here highlights the importance of considering brain dopamine-serotonin vesicular transport disease as differential diagnosis for early-onset extrapyramidal movement disorders combined with mental retardation even if neurotransmitters in csf are normal. for a large number of individuals with intellectual disability (id), the molecular basis of the disorder is still unknown. however, whole exome sequencing (wes) is providing more and more insights into the genetic landscape of id. in the present study, we performed trio-based wes in 311 patients with unsolved id and additional clinical features, and identified homozygous cplx1 mutations in three patients with id from two unrelated families. all displayed marked developmental delay and migrating myoclonic epilepsy, and one showed a cerebellar cleft in addition. 
the encoded protein, complexin 1, is crucially involved in neuronal synaptic regulation, and homozygous cplx1 knockout mice have the earliest known onset of ataxia seen in a mouse model. recently, a homozygous truncating mutation in cplx1 was suggested to be causative for migrating epilepsy and structural brain abnormalities. id was not reported. the currently limited knowledge on cplx1 suggests that complete loss of complexin 1 function may lead to a complex but variable clinical phenotype, and our findings encourage further investigations of cplx1 in patients with id, developmental delay and myoclonic epilepsy to unravel the phenotypic spectrum of carriers of biallelic cplx1 mutations. genes. due to their heterogeneity, diagnostic testing for congenital eye malformations was limited in the pre-ngs era. we performed exome analysis in 30 patients with congenital eye malformation (mac spectrum, asd, congenital cataract, congenital glaucoma). primarily, a gene panel comprising 112 genes associated with eye malformations was evaluated. additionally, the exome data were evaluated in selected patients as a second step. the panel analysis revealed pathogenic sequence variants in 10 patients and 8 genes (mab21l2, bcor, nhs, prss56, cyp1b1, foxc1, pitx2, gcnt2). putatively causative sequence variants were identified in additional patients. the diagnostic yield of the panel was highest in patients with non-syndromic microphthalmia/coloboma and congenital cataracts, and lowest in patients with syndromic mac spectrum (i.e. additional systemic features/malformations). ngs based panel testing is a strong diagnostic tool to determine the underlying causes of non-syndromic congenital eye malformations. due to the partially overlapping phenotypes and high heterogeneity it is more sensible to perform large gene panel analysis, as opposed to smaller single phenotype based panels. 
superactivity of phosphoribosylpyrophosphate synthetase i (prpps) is a rare inborn error of purine metabolism that is characterized by increased levels of uric acid in blood and urine (omim 300661). the disorder is caused by gain-of-function mutations in the x-chromosomal gene prps1. in male patients, disease manifestation is in early childhood. additional clinical characteristics include intellectual disability, hypotonia, ataxia and hearing loss. heterozygous female mutation carriers have a later age of onset and a less severe clinical course. only seven families with prps1 gain-of-function mutations have been reported to date. we report on a 7-year-old boy with congenital hyperuricemia, urolithiasis, developmental delay, short stature, hypospadias and facial dysmorphisms. his mother also had hyperuricemia that was diagnosed at age 17 years but was otherwise healthy. a novel prps1 missense mutation (c.573g>c, p.leu191phe) was detected in the proband and his mother. enzyme activity analyses confirmed superactivity of prpp synthetase. the family reported here broadens the clinical spectrum of prpps superactivity and indicates that this rare metabolic disorder is associated with a recognizable facial gestalt. homozygous and compound heterozygous mutations of the rnu4atac gene are associated with mopd1 and roifman syndrome. mopd1 is characterized by severe microcephaly with brain malformations including abnormal gyral pattern, corpus callosum agenesis or hypoplasia, vermis hypoplasia and intracranial cysts, psychomotor retardation, short stature, skeletal dysplasia, dry skin, sparse hair, flexion contractures, round face with beaked nose and protruding eyes, and premature death, with the majority of patients dying before the age of 28 months for unknown reasons. 
roifman syndrome was first described as a novel association of antibody deficiency, spondyloepiphyseal chondro-osseous dysplasia, retinal dystrophy, poor pre- and postnatal growth, cognitive delay and facial dysmorphism including long eyelashes, downslanting palpebral fissures, a long philtrum and a thin upper lip. all patients with roifman syndrome reported so far lack brain malformations. the rnu4atac gene encodes a small nuclear rna (snrna), which is essential for minor intron splicing. homozygous (g.51g>a, g.46g>a) and compound heterozygous mutations (g.51g>a;g.55g>a, g.51g>a;g.124g>a and g.40c>t;g.124g>a) have been described in mopd1. all mutations involve the 5' or 3' stem loop of the u4atac snrna. in contrast, all cases with roifman syndrome investigated so far showed compound heterozygous rnu4atac mutations with one allele harboring a mutation in the mopd1 associated 5' stem loop and the other allele showing a mutation in the stem ii site of the u4atac snrna, which has not been involved in mopd1 so far. thus, the different pattern of the mutations observed in mopd1 and roifman syndrome may contribute to the distinct features of both syndromes. however, our patient shows that features of mopd1, i.e., brain malformations, may also be present in patients who show roifman syndrome associated rnu4atac mutations. this indicates that both syndromes may represent overlapping features of the clinical spectrum of rnu4atac mutations. h. roth, h. stöhr, b. h. f. weber institute of human genetics, university of regensburg, germany introduction: inherited retinal degenerations comprise a genetically heterogeneous group of eye diseases with overlapping clinical presentations. up to now, more than 200 genes have been associated with different forms of retinal dystrophies (rd) such as retinitis pigmentosa (rp) or cone-rod-dystrophies (crd) with mutations in 84 and 34 causative genes, respectively. 
here, we present the results from three patients with remarkable findings and discuss their implications for risk prediction and genetic counseling. methods: targeted next-generation sequencing (ngs) technology based on agilent custom designed gene panels (sureselect) has been established in our diagnostics department to identify causative mutations in a large patient cohort with approximately 200 rd patients. high-throughput sequencing data are routinely analyzed with the clc biomedical workbench. classification of variants was based on bioinformatic analyses using alamut visual software, mutationtaster, sift and polyphen-2 prediction programs, allele frequencies, amino acid conservation and literature. results. ngs analysis revealed two patients with rp and one patient with crd, each of whom carry putative causative mutations in several rd genes. first, a male patient with a family history of crd, is a carrier of a nonsense mutation p.(arg1144ter) in rims1 and two likely pathogenic missense mutations in aipl1 (p.(tyr134phe)) and guca1a (p.(pro-50leu)), each in a heterozygous situation. mutations in all three genes can cause adcrd. in addition, the patient carried a hemizygous nonsense mutation p.(glu1017*) in the x-chromosomal rpgr gene. secondly, a female patient with simplex rp was found to be homozygous for a frameshift-causing deletion p.(ser527leufs*28) in the impg2 gene causing arrp. she also carried three heterozygous, likely pathogenic missense mutations in crx (p.(tyr142cys)) causing adrp, in the x-chromosomal rpgr (p.(ala365val)) and in ush2a (p.(ile1621val)) associated m. s. reuter 1 , a. riess 2 , u. moog 3 , t. a. briggs 4, 5 , k. e. chandler 4 background: disruptions of the foxp2 gene, encoding a forkhead transcription factor, are the first known monogenic cause of a speech and language disorder. so far, mainly chromosomal rearrangements such as translocations or larger deletions affecting foxp2 have been reported. 
intragenic deletions or convincingly pathogenic point mutations in foxp2 have up to date only been reported in three families. we thus aimed at a further characterization of the mutational and clinical spectrum. methods: chromosomal microarray testing, trio exome sequencing, multi gene panel sequencing and targeted sequencing of foxp2 were performed in individuals with variable developmental disorders, and speech and language deficits. results: we identified four different truncating mutations, two novel missense mutations within the forkhead domain and an intragenic deletion in foxp2 in fourteen individuals from eight unrelated families. mutations occurred de novo in four families and were inherited from an affected parent in the other four. all index patients presented with various manifestations of language and speech impairment. apart from two individuals with normal onset of speech, age of first words was between 4 and 7 years. articulation difficulties such as slurred speech, dyspraxia, stuttering or poor pronunciation were frequently noted. motor development was normal or only mildly delayed. mild cognitive impairment was reported for most individuals. conclusion: by identifying intragenic deletions or mutations in fourteen individuals from eight unrelated families with variable developmental delay/cognitive impairment and speech and language deficits, we considerably broaden the mutational and clinical spectrum associated with aberrations in foxp2. h. rieder, f. beleggia, d. wieczorek we report on a 4-year-old boy with microcephaly, arachnoidal cysts, pachygyria, microgyria, and severe intellectual disability. he also had short stature including shortening and deformation of the femora, brachydactyly, and short ribs with costochondral dysplasia. he showed facial dysmorphism with narrow palpebral fissures, a short nose with a depressed nasal bridge, and a broad mouth with full lips. 
clinical laboratory investigations demonstrated persistently slightly elevated liver enzymes. exome sequencing revealed compound heterozygous mutations of the rnu4atac gene, g.51g>a;g.16g>a, which have been described in an individual with roifman syndrome. we report on a seven-year-old girl, first child of non-consanguineous italian parents, with developmental delay, muscular hypotonia and distinctive craniofacial features (epicanthus inversus, ptosis, broad nasal bridge, mild retrognathia, low-set posteriorly rotated ears and malpositioned teeth in the mandible). because of the tentative diagnosis of blepharophimosis-ptosis-epicanthus inversus syndrome (bpes), conventional cytogenetic analysis, sanger sequencing and mlpa (multiplex ligation-dependent probe amplification) of foxl2 were initiated and showed unremarkable results. microarray-cgh revealed a 414 kb microduplication of genetic material on 15q11.2: arr[hg19] 15q11.2(22765628_23179948)x3 encompassing the genes tubgcp5, cyfip1, nipa1 and nipa2 of maternal origin. patients with 15q11.2 microduplication have been described to be affected by developmental delay, motor and/or expressive language delay, epilepsy, learning disabilities and/or behavioral problems. however, genotype-phenotype correlation is complicated by incomplete penetrance. healthy and mildly affected carriers are reported in the literature. we speculate that the microduplication might contribute to but does not fully explain the phenotype of our patient, in particular concerning the craniofacial features. subsequent trio whole-exome sequencing identified a de novo heterozygous mutation in setbp1 (c.3909t>a/ p.tyr1303*) leading to a premature stop codon and most probably resulting in a truncated and functionally impaired protein. 
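the size of the 15q11.2 microduplication quoted above follows directly from the array coordinates. a minimal sketch, assuming the simple "(start_end)xN" segment notation used in the text; the parser is illustrative and does not handle the full iscn syntax.

```python
import re

# Matches a single segment such as "(22765628_23179948)x3".
SEGMENT_RE = re.compile(r"\((\d+)_(\d+)\)x(\d+)")


def parse_segment(description: str) -> tuple[int, int, int]:
    """Return (start, end, copy_number) from an array result string."""
    match = SEGMENT_RE.search(description)
    if match is None:
        raise ValueError(f"no segment found in: {description}")
    start, end, copies = (int(value) for value in match.groups())
    return start, end, copies


def span_kb(start: int, end: int) -> float:
    """Distance between the two reported coordinates in kilobases."""
    return (end - start) / 1000


if __name__ == "__main__":
    start, end, copies = parse_segment("arr[hg19] 15q11.2(22765628_23179948)x3")
    print(f"{span_kb(start, end):.0f} kb, copy number {copies}")
```

for the coordinates given in the abstract this yields a span of about 414 kb at copy number 3, matching the reported size of the duplication.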
mutations in the set binding protein 1 gene (setbp1) on 18q12.3 have been identified to cause schinzel-giedion syndrome (sgs, omim 269150), a rare autosomal dominant disorder characterized by postnatal growth failure, severe developmental delay, seizures, facial dysmorphism, genitourinary, skeletal, neurological, and cardiac defects. chromosomal deletions in 18q including setbp1 have been reported to cause a milder phenotype known as "autosomal dominant mental retardation-29" (mrd29, omim 616078). these observations suggest that the severe sgs phenotype might be the consequence of a gain-of-function or dominant-negative effect of the mutations and that setbp1 haploinsufficiency results in a different, milder phenotype. so far, the function of the set-bp1 protein is unknown. the presented case adds up to the yet small number of reported cases of mrd29 and thereby contributes to the clinical spectrum of setbp1 haploinsufficiency. this work was supported by "förderstiftung des uksh" (project number: 006_2016). the demand for genetic counseling had been constant, in germany, over many years. from 1996 to 2004 around 47.000 cases per year on the average and with minor fluctuations were reimbursed by the german sickness funds (public health insurance system; pabst and schmidtke, gendiagnostik in deutschland, bbaw, p. 195-203, 2007) . in connection with the "genbin2"-project, a new nationwide survey was initiated regarding with arrp. finally, in another female rp patient with no family history of rd, we detected a nonsense mutation p.(trp558ter) and a likely pathogenic splice site change (c.6078 + 3a>g) in the arrp gene eys, assuming compound heterozygosity. in this patient we also identified two heterozygous, likely pathogenic missense mutations in hmcn1 (p.(pro2226thr)) and cep290 (p.(arg2210cys)) underlying dominant and recessive forms of rd. in all three cases, specific mutation(s) could not be uniquely identified as causative. conclusion. 
results in the three rd cases emphasize that ngs can generate unexpected results that are difficult to interpret, particularly in the absence of segregation analysis and functional data on pathogenicity. the implications for genetic counselling and predictive testing will be discussed. smart qnipt study -detection of fetal trisomy 21 based on methylation-specific quantitative real-time pcr m. sachse, s. werler, j. bonnet, u. neder, h. sperling, s. busche, s. grömminger, w. hofmann lifecodexx ag, konstanz, germany objectives: current non-invasive prenatal testing (nipt) methods for the detection of fetal trisomy 21 (t21) are primarily based on next generation sequencing (ngs) strategies which are quite costly in clinical application and hence are limited to patients who can afford the testing. here, we describe the results of a blinded study with respect to the test accuracy of a newly developed nipt assay based on quantitative real-time pcr (qpcr) for prenatal testing of fetal trisomy 21 (qnipt). methods: in the study maternal plasma samples were collected from 1,044 pregnant women and blinded by an independent contract research organization. after extraction of cell-free dna using qiasymphony instrument and methylation-specific digestion of dna samples a multiplex qpcr was performed. the primary qpcr data were finally evaluated with our ce marked data analysis software. results from this analysis and from confirmatory ngs testing were compared with nipt results using ngs. the study results of successfully analysed maternal plasma samples (n = 966) demonstrated a positive percentage agreement (ppa; equates to sensitivity) of 100% (lower 1-sided 95% confidence interval of 91.8%; n = 35/35) and a negative percentage agreement (npa; equates to specificity; n = 931/931) of 100% compared to ngs-based results. the negative predictive value (npv) for the novel qnipt and confirmatory ngs testing was 100% (lower 1-sided 95% confidence interval of 99.68%). 
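The one-sided confidence limits quoted for the qnipt results can be reproduced with a short calculation. The sketch below assumes an exact (clopper-pearson) binomial bound; the abstract does not state which method was used, so this is only one plausible reading. When every one of n cases agrees, the exact lower one-sided limit reduces to alpha^(1/n). The function name is chosen here for illustration.

```python
def lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """Exact lower one-sided (1 - alpha) bound for a proportion when all
    n observations are successes (x == n).

    For x == n the clopper-pearson bound solves p**n == alpha,
    which gives p == alpha ** (1 / n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return alpha ** (1.0 / n)

# Counts taken from the abstract: 35/35 concordant positives,
# 931/931 concordant negatives.
ppa_lower = lower_bound_all_successes(35)
npv_lower = lower_bound_all_successes(931)

print(f"ppa lower bound (n = 35):  {ppa_lower:.1%}")
print(f"npv lower bound (n = 931): {npv_lower:.2%}")
```

Under this assumption the two limits come out close to the 91.8% and 99.68% reported in the abstract, which illustrates how strongly the bound depends on the number of cases in each group: 35 positive samples constrain sensitivity far less tightly than 931 negative samples constrain the negative predictive value.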
the average fetal fraction of the 966 examined blood samples was 8.1%. the qnipt assay provided reliable test results in 54 blood samples with a fetal fraction below 4% and as low as 2.4%. conclusion: our results suggest that the proprietary qnipt assay is a very reliable and robust method suitable for clinical routine in accordance with international medical associations. the assay represents a more cost-efficient solution over ngs testing and will also be able to provide results in the shortest possible time. while current nipt methods require a minimum fetal fraction of 4% in blood samples from singleton pregnancies, we could demonstrate in the study that our smart qnipt assay can be employed on blood samples with a fetal fraction of as low as 2.4%. in summary, the application of smart qnipt could have the potential to become a nipt solution on a global scale for pregnant women of all ages and risk groups. further studies which aim to include the determination of trisomy 13 and trisomy 18 are currently underway. with respect to the developmental delay of our index patient, chromosome analysis and array-cgh were performed. a microduplication in 3p26.2 (app. 50kb) of unknown significance and a microduplication in xq27.3 (app. 550kb), which comprises the fmr1-gene, were identified and shown to be of maternal origin (arr[hg19] 3p26.2 (2, 811, 861, 170)x3, xq27.3(146, 663, 212 ,089)x3). fmr1 is associated with fragile x syndrome, which is one of the most common causes for x-linked mental retardation. cgg-trinucleotide repeat expansions in the 5' untranslated region (>200 repeats) lead to aberrant hypermethylation of the fmr1-promotor and silencing of fmr1 expression. in contrast, premutations (55-200 repeats) lead to a higher expression of fmr1 and cause a clinical syndrome that is characterised by late progressive cerebellar ataxia (fxtas). 
in line with this gain-of-function mechanism, we hypothesize that the xq27.3 duplication, which could lead to an increased gene dosage of fmr1, causes a fra(x)-/fxtas-like syndrome and explains the clinical findings in our family. vengoechea et al. described a patient with a similar duplication, who was affected by developmental retardation, epilepsy and hyperactivity. they discussed the microduplication, which arose de novo in their patient, as the cause of the boy's symptoms (vengoechea j. et al., eur j hum genet., 2012 nov; 20(11):1197-200). in conclusion, we assume an fmr1 duplication syndrome in our family with variable expressivity and a different impact on male and female patients. to test this hypothesis further, we are planning to perform a segregation analysis within the whole family. background: congenital myasthenic syndromes (cms) are a genetically heterogeneous group of disorders leading to weakness of skeletal muscles, especially ocular, bulbar and limb muscles, with onset mostly at birth or in early childhood. the severity of cms can vary significantly, ranging from death in early childhood due to respiratory insufficiency to only mild muscle weakness in adulthood. more than 25 genes that are highly expressed in the neuromuscular junctions are associated with cms. mutations in the chrne gene on chromosome 17p13.2 are responsible for about one half of genetically solved cms cases. they can cause different subtypes of cms with either autosomal dominant or autosomal recessive inheritance. results: here, we report a 3-year-old boy who was born with bilateral eyelid ptosis and congenital vertical talus of the right foot that needed surgical correction. the boy displayed muscular hypotonia with a myopathic facial expression and delayed motor development. ophthalmologic examination revealed external ophthalmoplegia. 
a next generation sequencing based gene panel for congenital myopathies detected the homozygous frameshift mutation c.750_769dup (p.leu257profs*50) in the chrne gene in the boy. gene dosage analysis did not show an exonic deletion in the chrne gene. sanger sequencing confirmed the mutation in a heterozygous state in the boy's father. however, his mother did not carry the mutation in the chrne gene. conclusions: these results suggest the rare event of a (partial) paternal uniparental isodisomy of chromosome 17 as cause of the homozygous c.750_769dup (p.leu257profs*50) in the chrne gene in the boy. further experiments are currently undertaken to confirm this hypothesis. the annual reimbursement frequencies of the relevant entries in the ebm fee schedule, 08572, 01792, 01837 and 11232, for which only specialists in human genetics and subspecialists in medical genetics can account, from 2009 until 2014. contrary to the findings in the earlier period the demand for genetic counseling has risen sharply: 41,243 (of a total of 54,360) cases in 2009; 45,525 (58,341) in 2010; 46,691 (59,724) in 2011; 51,316 (63,242) in 2012; 54,739 (69,732) in 2013; and 61,308 (74,780) in 2014. we speculate that the temporal correlation of the rise of genetic counseling demand with the enactment of the german act on testing (february 1, 2010) is not coincidental. further factors that might contribute to the increase in demand are the ensuing guidelines of the german commission on genetic testing and cme activities related to attaining a qualification for genetic counseling for specialties other than human genetics. in the course of these activities the awareness for the importance of genetic counseling delivered by specialists in human genetics and subspecialists in medical genetics may have risen. acknowledgements: the "genbin2"-project supported by the robert koch-institute through funding from the german federal ministry of health. we gratefully acknowledge the collaboration with dr. 
michael erhart, zentralinstitut der kassenärztlichen bundesvereinigung (zi-kbv), berlin, germany. it is well known that duplications of the down syndrome critical region (dscr) on chromosome 21q22 can cause down syndrome, with the specific phenotype depending on the genes involved and the size of the duplication. however, hardly any cases with mosaic duplications of the dscr have been described in the literature. here we report on a 6-year-old boy with some clinical features of down syndrome including distinctive craniofacial dysmorphism, simian crease and sandal gap as well as delayed motor and speech development. no other organ abnormalities are known. conventional chromosome analysis showed no numerical or structural aberration, whereas interphase fish analysis revealed three signals for the dscr in approx. 40% of lymphocytes and in approx. 80% of buccal mucosa cells. array-cgh analysis on dna from peripheral blood confirmed a 2.56 mb duplication of chromosome 21q22.13q22.2. the duplication involves, among others, the gene dyrk1a, which is reported as a candidate gene for down syndrome. this case represents one of the smallest known duplications within the dscr, which causes a mild down syndrome phenotype even in a mosaic state. a 4-year-old boy was referred to our outpatient clinic due to global developmental delay mainly affecting his speech and his fine motor development. in addition, muscular hypotonia and an abnormal gait were reported by the referring paediatrician. his mother, his maternal grandmother, and numerous relatives are affected by gait ataxia. no causative mutation was detected in the maternal grandmother by means of a multi-gene panel for spinocerebellar ataxia encompassing 118 genes. genetic testing for friedreich ataxia was also without pathological findings. cer, but the patient's mother's grandfather had a cancer of unknown origin and died at the age of 50 years. 
because lynch syndrome was suspected, immunohistochemistry and microsatellite analysis were performed on tissue from the colorectal cancer and the hepatic metastasis. all four mmr proteins were properly expressed in immunohistochemistry in the colorectal cancer. only the expression of mlh1 protein in the hepatic metastasis was focally weakened and inhomogeneous. the microsatellite markers bat25, bat26, d5s346 (apc), d2s123 and d17s250 (mfd) were all stable; a kras a146t mutation was found. after performing a multi-gene panel (ngs, next-generation sequencing), a gross heterozygous deletion of exon 1 of the msh6 gene was found in the cnv analysis of the ngs data. this mutation was confirmed by mlpa and quantitative real-time pcr analyses. furthermore, rna expression of msh6 was reduced to 50% in blood lymphocytes in comparison to control samples, pointing to a potential role of msh6 loss in the patient's tumor development. we are observing more and more patients with probably pathogenic and pathogenic mutations in one of the mmr genes who have normal immunohistochemistry and microsatellite analysis. therefore, we propose that the criteria for performing a molecular genetic analysis of hnpcc/lynch syndrome should be revised. exome sequencing reveals gata1 mutation in a patient with partial delta-storage pool deficiency and mild thrombocytopenia objectives: we report on a 35-year-old male patient of russian background with severe and frequent epistaxis and hematoma since infancy. he presented with mild thrombocytopenia and increased mean platelet volume. von willebrand's disease and subhemophilia had been excluded. previously, he was diagnosed with immune thrombocytopenic purpura. he never underwent elective surgery. his parents were asymptomatic. however, his 4-year-old daughter also suffers from severe bleeding symptoms (multiple, light red hematoma in consequence of minimal trauma). 
methods: whole exome sequencing (wes) was carried out for the patient, his asymptomatic wife, his symptomatic daughter and her asymptomatic 8-year-old brother. platelet function was assessed by light transmission aggregometry, lumi-aggregometry and flow cytometry. lysates of gel-filtered platelets were analyzed for total granule p-selectin, cd63 and von willebrand factor (vwf) content by western blotting and for serotonin levels by elisa. results: platelet function and characterization of the patient's granules suggested a delta-storage pool disease (spd). in most cases delta-spd occurs as part of a syndrome, e.g. combined with albinism, immunodeficiency or a thalassemic-like blood disorder. as the patient and his daughter did not show any conclusive phenotype, their dna was subjected to wes. exome sequencing revealed a not yet described gata1 mutation close to two zinc finger domains (znf1 and znf2) in a highly conserved region of the gata1 gene in the 4-year-old daughter (c.886a>c, p.t296p, heterozygous) and her father (c.886a>c, p.t296p, hemizygous). this mutation was absent in 150 wildtype controls but could also be demonstrated in the index patient's asymptomatic mother. only a few mutations are known to be located in this c-terminal region to date. mutations in gata1 may lead to different clinical presentations, depending on their location within gata1 (e.g. diamond-blackfan anemia (exon 2), x-linked thrombocytopenia (znf1), transient myeloproliferative disorder (intron 1, exon 2, exon 3) and acute megakaryoblastic leukemia (intron 1, exon 2, exon 3) in case of down syndrome). significantly increased hbf levels (reference level: ≤0.8%) in the affected family members of 13.5% (4-year-old daugh our proposita is a 42-year-old woman, who was transferred to our genetic counselling department for the suspicion of m. osler (hereditary hemorrhagic teleangiectasia, hht). during a routine check-up, anemia was diagnosed. 
a tumor search was initiated and unexpectedly the ct of the abdomen showed a suspect coin lesion of about 2.5 cm in diameter localized in the basal part of the right lung. further investigations revealed a pulmonary arterio-venous malformation which was hemodynamically relevant and already led to chronic right heart overload. a coil embolization was performed. retrospectively, medical history of the patient included episodes of severe epistaxis in childhood and a neurosurgical intervention for intracerebral bleeding at the age of 16 years without permanent neurological deficits. during genetic counselling our proposita mentioned that her 8 years old daughter also suffered from anemia due to multiple polyps of the colon. after polypectomy her hemoglobin values normalized. although histologically the polyps appeared as juvenile ones a mutation search in the apc-gene was initiated by the gastroenterologists without identifying a pathogenic mutation. combining the two pieces of information, we offered a mutation search in the smad4 gene and a pathogenic mutation c.1081c>t (p.arg361cys) was found in both patients in heterozygosity. colonoscopy in the mother did not show juvenile polyps or gastrointestinal vascular malformations. vice versa, no cerebral or pulmonary arteriovenous malformations could be detected in the daughter. our family illustrates, that the same mutation within a family may phenotypically appear as different diseases. a careful taking of medical history and the knowledge of all relevant diagnostic findings (in this case e. g. the histology of the polyps) can enable the geneticist to offer a precise differential diagnosis leading to a well-directed molecular testing. to our opinion this is still relevant even in the era of ngs-based panels because the more precise the clinical diagnosis and the choice of the genes to analyse the less problems with unclassified variants will arise. 
the hereditary non-polyposis colon cancer (hnpcc, lynch syndrome) is caused by pathogenic germline mutations in mismatch repair genes (mlh1, msh2, msh6 and pms2) causing microsatellite instability (msi) and decreased or lost expression of the appropriate mismatch repair protein (mmr) in the immunohistochemistry (ihc) on tumor material. thus, ihc and msi testing help to identify the mmr gene, which most likely harbors a germline pathogenic variant. msi and ihc testing prior to germline analysis are specified in the s3 guidelines of hnpcc and if negative normally preclude further genetic analyses. here, we present a 51 year old patient with a synchronous colonic (ceacum) and renal cancer at the age of 50 years. at the time of the diagnosis hepatic metastases of the caecal adenocarcinoma have already been present. histologically, the colonic tumor was a poorly differentiated adenocarcinoma (pt3pn) with lymphangiosis and haemangiosis carcinomatosa. the renal cancer showed histology of a moderately differentiated clear cell renal carcinoma. in the family history the 50 year old sister and the parents are healthy. the twin sister of the patient's mother had a collateral breast cancer at the age of 46 years and died two years later. the mother's grandparents had no canmedizinische genetik 1 · 2017 155 p-cling-109 pitfalls in molecular genetic diagnostics a. tibelius, e. fey, k. hinderhofer institute of human genetics, heidelberg, germany probably every clinical laboratory geneticist may look back on at least one case in his career which has caused him quite a headache. this means those cases with completely unexpected and at a first glance implausible results which could be interpreted correctly only after intensive enquiry and additional testing. here, we report on three of such pitfall cases from our routine diagnostics. case 1: a 30-year-old woman, pregnant with monozygotic twins, was referred to prenatal cystic fibrosis diagnosis. 
she and her partner were carriers of mutations in the cftr gene (621+1g>t and g542x, respectively). prenatal molecular diagnosis demonstrated that both fetuses had inherited only the paternal mutation. routinely, we performed maternal cell contamination analysis by comparing polymorphic microsatellite loci between the maternal and fetal dna. surprisingly, 12 of 16 tested microsatellite loci revealed a discrepancy between the maternal and fetal genotypes, meaning that neither maternal allele was present in the fetal dna. a potential confusion of samples was excluded. moreover, the presence of the paternal mutation in the fetal dna indicated a correct genetic relationship between the expected children and the partner of the pregnant woman. the only plausible interpretation of the obtained result was a pregnancy by egg donation. afterwards, this suspicion was confirmed by the couple. case 2: we present a 38-year-old man with infertility resulting from azoospermia. conventional chromosomal analysis and an additional fish analysis using y probes indicated a 46,xx karyotype with no detectable sry. in parallel, molecular azf (azoospermia factor) diagnostics were performed by a standard multiplex pcr. this method identified absence of the sry region and a deletion of the azfb and azfc regions, explaining the observed azoospermia. interestingly, the pcr showed that the azfa region was still present in the patient's chromosomes, contradicting the cytogenetic and fish results. thus, a complementary fish analysis was performed in order to reveal a low-grade y-mosaicism, and sry material was detected in 3% of the cells (a result under the threshold level). based on this observation, pcr conditions for the azf diagnostics were modified and a very weak sry-specific pcr product was detected. case 3: molecular diagnostic testing for frax (fragile x syndrome) was requested for a 4-year-old boy with a slight delay in speech development. his brother had already been diagnosed with frax by molecular genetic testing. 
the analysis by southern blot hybridisation in the patient revealed a smear of methylated fragments characteristic for an expanded allele in the full mutation range. surprisingly, two fragments of normal length, methylated and non-methylated, could also be detected in patient's dna. a subsequent aneuploidy mlpa confirmed a supernumerary x-chromosome in the patient consistent with a klinefelter syndrome. these results were verified by an independent cytogenetic analysis. the syndrome of congenital symmetric circumferential skin creases (cscsc1 and cscsc2) replaces the old term michelin-tire-baby syndrome (mim 156610) and is characterized by congenital circumferential skin folds, primarily of the limbs, facial dysmorphism, cleft palate and intellectual disability. mutations in the β-tubulin encoding gene tubb or in the microtubule end binding family member mapre2 are the underlying genetic cause. ter) and 2,8% (35-year-old indexpatient) suggested dyserythropoiesis, although thalassemic features of the blood count were lacking. conclusion: we describe a gata1 mutation as the cause of a delta-storage pool disorder. imbalanced x-chromosome inactivation might explain the different phenotypes of the gata1 mutation carriers and will be investigated through allele-quantification based on rna isolated from whole blood and from platelet rich plasma in case of the index patient and different family members. we saw three siblings (aged 62, 61, 54; two female, one male) with variable signs of retinal disease. all three developed night blindness till the 3rd decade, furthermore near-sightedness and deterioration of the visual acuity to a various degree at age 35-45. the clinical diagnosis was given as stargardt's disease, fundus flavimaculatus or unspecified retinal degeneration respectively. two more siblings (aged 59 and 57) and the mother (died age 62) were clinically unaffected. 
the father (died age 69) was reported to have had bad eyesight beginning in the 5th decade, his father similarly (no further information available). the three affected siblings have a total of five children (age 25-39), none of them showing clear signs of retinal disease up to now. considering the clinical diagnosis stargardt's disease we analysed the genes abca4, elovl4, prom1 and cngb3 by sanger sequencing. no pathogenic mutation could be detected. afterwards we performed next generation sequencing of several genes associated with retinal dystrophies and found a novel splice site mutation in the prph2 gene (mim #179605). the mutation was confirmed in all three affected siblings by sanger sequencing. considering that mutation pathogenic we could re-diagnose our patients with patterned macular dystrophy type 1 (mim #169150). that disease is inherited in an autosomal dominant fashion, corresponding to the pattern of inheritance evident in our family. the etiology of epileptic encephalopathies, characterized by severe, early-onset seizures accompanied by developmental delay or regression, is highly heterogeneous. in recent years, next-generation sequencing approaches have led to the discovery of numerous causative genes; however the spectrum of associated phenotypes still needs to be further explored for many of these genes. we performed multi-gene panel analysis in a little boy of german non-consanguineous parents who showed severe early-onset infantile epileptic encephalopathy and almost absent neurological development. in this patient we identified a novel heterozygous missense mutation in the gabrg2 gene which was absent in the parents. in silico analyses strongly suggested a pathogenic relevance of this sequence variation which resides within a highly conserved region. so far, gabrg2 mutations have mainly been associated with milder types of epilepsy such as febrile seizures and childhood absence epilepsy. 
therefore, our findings extend the phenotypic spectrum associated with mutations in this gene at the severe end. referring on a case report by al marzouqi et al. (2014) , who reported on a girl with bilateral amastia in the context of skewed x-inactivation, we want to underline the importance of mlpa analyses in the case of negative sequencing of the eda gene, as whole exon duplication can be the cause of hypohidrotic ectodermal dysplasia. to our knowledge, the report of al marzouqi et al. has been the only case about this genetic alteration in the literature so far. novel pogz mutation in a patient with intellectual disability, microcephaly, strabismus and sensorineural hearing loss we report on a 3-year-old male patient with severe intellectual disability, microcephaly, sensorineural hearing loss, ocular abnormalities (strabismus and hyperopia), congenital heart defect (atrial septum defect and pulmonary stenosis) and minor facial abnormalities (thin upper lip, frontal upsweep). next-generation sequencing analysis revealed a novel heterozygous de novo mutation in pogz: c.2703_2710del, p.(thr902serfs*39) . the clinical problems of this patient are in accordance with the findings in the previously reported pogz mutation carriers. reports of additional patients with pogz mutations will be needed to establish detailed phenotype-genotype correlations of this novel and probably underdiagnosed syndrome. novel clinical and molecular aspects in two patients with kleefstra syndrome d. wand, b. seebauer, k. heinimann, i. filges medical genetics, university hospital, basel, schweiz the phenotype of kleefstra syndrome is clinically variable but characterized by facial dysmorphism, intellectual disability, childhood hypotonia and variably associated other malformations. haploinsufficiency of ehmt1, caused by either heterozygous microdeletions in 9q34.3 or sequence mutations in ehmt1, has been identified to be the underlying causal mechanism. 
here we present two girls from two unrelated families with clinical signs of kleefstra syndrome. besides the main features such as facial dysmorphism, intellectual disability/developmental delay and childhood hypotonia, the 17-year-old girl additionally presented with accelerated growth, whereas the other girl, 2 years old, showed failure to thrive. neither child has a heart defect, renal or urogenital anomalies or severe respiratory infections. we identified two rare variants likely to be causal: a novel heterozygous splice site ehmt1 variant and a heterozygous microdeletion in chromosome 9q34.3, including exons 26 and 27 of the ehmt1 gene. these patients broaden the spectrum of kleefstra-associated ehmt1 causes and contribute to novel aspects of genotype-phenotype correlations and a better understanding of the clinical variability of the disorder. here, we present two girls from two non-consanguineous families showing clinical aspects of the syndrome of congenital symmetric circumferential skin creases. additional features are present in both girls. patient 1 is the first child of healthy parents. she was born at 39 + 6 weeks of gestation with a weight of 2660 g, a length of 49 cm and a head circumference of 34 cm. respiratory distress, a cleft palate, a heart defect (atrial septum defect), an anogenital malformation and facial dysmorphism were diagnosed after delivery, in addition to the skin creases phenotype. conventional karyotyping performed on blood lymphocyte cultures and cgh-array analysis showed normal results. patient 2 is the second child of unrelated healthy parents. her older sister is healthy as well. because of intrauterine growth retardation, an amniocentesis with chromosomal analysis was performed, showing a normal karyotype of 46,xx. patient 2 was born at 38 + 3 weeks of gestation by caesarean section. the birth weight was 2450 g, the birth length was 47 cm, the head circumference was 34 cm and the apgar score was 1/3/5. 
due to respiratory distress and hypoxia a tracheotomy was performed. she also presented with cleft palate, feeding difficulties, a heart defect (atrial and ventricular septum defect), asplenia and facial dysmorphism. the skin phenotype was remarkably similar to that of patient 1, with a prominent neck fold and skin creases mainly on the back, but also on the limbs. in both children the skin folds gradually diminished over time without any intervention, as described for michelin-tire-baby syndrome/cscsc1 and cscsc2 patients. for this reason, mapre2 and tubb were examined and a disease-causing mutation was excluded in both children. to identify a genetic cause, we performed trio whole-exome sequencing, including the healthy parents and the affected child, in both families. a de novo stop mutation was detected in patient 1, while no promising results could be detected in patient 2. further studies, especially functional in vivo studies and analysis of further patients with a similar clinical presentation, will answer the question of whether the phenotype described above is an expanded variant of the congenital symmetric circumferential skin creases or a unique new syndrome. clinical findings in a family with x-linked hypohidrotic ectodermal dysplasia due to a duplication of exon 2 in the eda gene - a case report we want to summarize the phenotypic spectrum in one affected three-generation family with x-linked hypohidrotic ectodermal dysplasia. we report on one strongly affected male and 2 slightly affected female carriers, carrying a duplication of exon 2 in the eda gene. the reason for genetic assessment was a woman's request for evaluation of a possible risk of occurrence of the familial disorder in a further planned pregnancy. the woman reported an inability to breastfeed, and she and her daughter showed conical teeth, dry skin and sparse hair. 
the affected male showed absent deciduous teeth, hypodontia of permanent teeth, missing regulation of the temperature due to a lack of sweat glands, bilateral nipple hypoplasia, a dry and wrinkled skin, missing eye lashes, eyebrows and scalp hair and sparse body hair. the fingernails were inconspicuous, whereas the nails of the toes, particularly the nails of the hallux were yellowish and thickened. the affected male had an operation due to a gallstone and cysts were found in one kidney. there was no increased predisposition to infections. the suspected diagnosis of x-linked hypohidrotic ectodermal dysplasia was confirmed by genetic analysis of the eda gene (sequencing and mlpa analysis). a duplication of exon 2 was detected in the affected male patient and was confirmed in the two mildly affected female relatives by realtime-pcr. terminator sequencing mix v1.1 on an abi3130xl genetic analyzer (applied biosystems) and detected an insertion of 77 bp between the exons 7 and 8. we located the insertion's sequence in intron 7 of the dmd gene and sequenced flanking sequences of gdna to find the underlying mutation causing the insertion. two hemizygous single nucleotide variants (snvs) surrounding the inserted fragment could be identified. the first variant (c.650-39575 a>c) is a common polymorphism (maf according to 1000 genomes project: 14.97%) at the position of an existing acceptor splice site. the second variant (c.650-39498a>g) is novel and creates a new cryptic donor splice site with high probability. these two cryptic splice sites create the formation of a 77 bp pseudoexon which produces a frameshift and a premature stop codon (p.asp217alafs*) in the dmd gene. in summary, we could genetically confirm the clinical diagnosis of a dystrophinopathy by two divs in dmd. although the insertion of the pseudoexon creates a premature stop codon the patient's clinical phenotype indicates a milder type of becker muscular dystrophy. 
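The frameshift attributed to the pseudoexon follows from simple arithmetic on the figures given above, which can be sketched as follows. The function name and variable names are chosen for illustration. The comparison of the two intronic offsets is only a plausibility check, because the abstract does not state the exact exon boundaries relative to the two variants.

```python
def disrupts_reading_frame(inserted_bp: int) -> bool:
    """An insertion into the coding sequence keeps the reading frame
    only if its length is a multiple of three (one codon = 3 bp)."""
    return inserted_bp % 3 != 0

PSEUDOEXON_LENGTH_BP = 77  # pseudoexon length reported in the text

# Offsets (upstream of c.650) of the two hemizygous variants,
# taken from c.650-39575a>c and c.650-39498a>g.
acceptor_variant_offset = 39575
donor_variant_offset = 39498
distance = acceptor_variant_offset - donor_variant_offset

print("frameshift expected:", disrupts_reading_frame(PSEUDOEXON_LENGTH_BP))
print("remainder after whole codons:", PSEUDOEXON_LENGTH_BP % 3, "bp")
print("distance between the two variants:", distance, "positions")
```

Because 77 is not divisible by three, inclusion of the pseudoexon shifts the reading frame, which fits the premature stop codon described. The two variants lie 77 positions apart, which is in line with the reported pseudoexon size.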
this contradiction could be explained by the remaining existence of dmd wild type mrna most likely due to a not constantly active cryptic splice site. most interesting about the present case is the fact that a common snv facilitates the creation of a pseudoexon. this makes the region a potential hotspot for divs in the dmd gene which would be worthwhile further investigations. with more than 90% of all humans preferring to use their right hand, handedness is the most noticeable functional expression of cerebral lateralization in humans. however, the precise molecular mechanisms that regulate handedness and other related forms of cerebral lateralization remain largely elusive. therefore, the question which genetic, epigenetic and environmental factors contribute to human handedness is one of the central questions in research on lateral asymmetries. handedness is a complex, heritable trait, for which polygenic inheritance is assumed, meaning that a large number of genetic factors with a small additive effect contribute to the observed variance in hand preference. to date, genetic association studies have implicated only a few specific genes influencing handedness. particularly interesting is the association between the human androgen receptor (ar) gene and different aspects of handedness, since the interrelationships constitute a conceptual bridge between the theories that invoke testosterone as a factor in the development of cerebral asymmetries with theories proposing that the x chromosome contains a locus that influences the direction of hand preference. in an initial large association study in 1057 samples of healthy adults we already demonstrated that handedness in both sexes is associated with the ar cag-repeat length, with longer repeats being related to a higher incidence of non-right-handedness. 
in addition, we have performed a second association study in an independently collected healthy cohort of more than 1000 test persons with comprehensive data on the handedness phenotype. we were able to replicate the association, with longer cag repeats being related to a higher incidence of non-right-handedness, especially in females. since longer cag-repeat blocks have been linked to less efficient ar function, these results imply that differences in ar signaling in the developing brain might be one of the factors that determine individual differences in brain lateralization. dej. waschk, ac. tewes, p. wieacker, s. ledig institute of human genetics, münster, germany the mammalian female and male reproductive tracts develop from the paramesonephric ducts (müllerian ducts, md) and mesonephric ducts (wolffian ducts, wd), respectively. in the absence of testicular differentiation and anti-müllerian hormone, the wd regress and the md give rise to the fallopian tubes, uterus, cervix and the upper part of the vagina. disorders of normal md development in females can manifest as fusion anomalies of the uterus such as septate uterus, bicornuate uterus, unicornuate uterus and uterus didelphys, or as more complex malformation patterns like mayer-rokitansky-küster-hauser syndrome (mrkh) or herlyn-werner-wunderlich syndrome. mrkh is characterized by congenital absence of the uterus and the upper two-thirds of the vagina in individuals with a normal female karyotype, most of whom show normal ovarian function. mrkh can further be associated with additional malformations, e.g. of the kidneys and the skeletal system. herlyn-werner-wunderlich syndrome is characterized by uterus didelphys, obstructed hemivagina and ipsilateral renal agenesis. although anomalies of the md occur quite frequently, the etiology of most cases with these disorders remains unknown.
the homeodomain transcription factor emx2 (empty spiracles homeobox 2) was found to be critical for urogenital and central nervous system development. previous studies showed that in emx2 mutant mice, the kidneys, ureters, gonads and genital tracts were completely missing. in order to elucidate whether mutations in emx2 are causative for md anomalies in humans, we performed sequence analyses of emx2 (genbank nm_004098.3) in our study group of 142 female patients with clinically characterized disorders of the md, including 62 patients with mrkh and 7 cases with herlyn-werner-wunderlich syndrome. we found the heterozygous intronic variant c.592-17c>a twice in this cohort. this variant has been described earlier (rs41308651) and is listed in the exome aggregation consortium (exac) database with a minor allele frequency of 0.23%. in silico analysis predicted no significant effect on the correct splicing of emx2. we therefore consider this variant to be a benign polymorphism. we conclude that mutations in emx2 are not causative for disorders of the md in our cohort. a. zaum, w. kreß, a. gehrig, s. rost institute of human genetics, würzburg, germany dystrophinopathies are x-linked muscle diseases caused by mutations in the dmd gene (omim: 300377). due to the huge size of this gene, the detection of mutations is sometimes challenging. despite multiplex ligation-dependent probe amplification (mlpa) and sequencing of all 79 exons, about 7% of patients do not show any mutations in coding regions and therefore remain without a molecular diagnosis. we assume that the majority of these patients have deep intronic variations (div) which are not detectable by standard diagnostic techniques. the index patient analysed is a twelve-year-old boy in whom elevated ck levels (up to 15,000 iu/l) were found by chance at eight weeks of age. today he is still able to walk without walking aids but needs assistance when climbing stairs.
in 2008, a muscle biopsy revealed complete absence of dystrophin, which established the diagnosis of dmd. for the molecular diagnosis, standard diagnostics identified no causative mutation. therefore we decided to search for deep intronic mutations. we isolated mrna from muscle tissue of the patient and amplified overlapping cdna fragments using rt-pcr. the fragments were analysed by gel electrophoresis for size differences compared to an unaffected control. the cdna product comprising exons 6-12 showed an increased fragment size compared to the control and the expected product size (about 1100 bp instead of the expected 1007 bp). we sequenced the altered cdna product using bigdye recent research in psoriasis has identified pustular psoriatic manifestations as either mendelian traits or as major genetic risk factors, in contrast to the numerous associated snps in classical plaque psoriasis vulgaris. autosomal recessive mutations in il36rn have been identified in ~25-40% of patients with generalized pustular psoriasis (gpp), a rare, severe pustular variant of psoriasis. in addition, heterozygous missense variants in card14 and ap1s3 have been associated with pustular psoriasis as well and shown to be functionally relevant. in order to discover how relevant those genes were in our large gpp cohort of 61 patients recruited all over germany, we screened them for coding variants in il36rn, card14 and ap1s3 by sanger sequencing and for quantitative aberrations by mlpa. we identified homozygous or compound heterozygous il36rn mutations in 15 of 61 gpp patients (25%) and single heterozygous mutations in 5 patients (8%). the most common mutations were c.338c>t/ p.ser113leu and c.227c>t/ p.pro76leu, present on 49% and 20% of mutated alleles, respectively. we also identified three so far unreported mutations: c.338c>a/ p.ser113x, c.295-300delaccttc/ p.thr99_phe100del and c.130g>a/ p.val44met.
according to molecular modeling, c.338c>a/ p.ser113x resulted in a shortened, destabilized protein analogous to c.280g>t/ p.glu94x. the other two mutations were also predicted to result in destabilized, likely disease-relevant, loss-of-function proteins. heterozygous ap1s3 mutations were detected in two gpp patients, both of whom carried additional homozygous or compound-heterozygous il36rn mutations. four gpp patients were heterozygous carriers of rare missense variants in card14 (7%); of note, two of these patients carried additional mutations in il36rn. our genotype-phenotype correlation revealed a similar gender distribution in carriers of il36rn mutations and wildtype carriers, but a strong association between bi-allelic mutations in il36rn and early age of manifestation (p = 7.4 × 10 -04 ). as expected for autosomal recessively inherited mutations, the frequency of parental consanguinity was significantly higher in patients with two il36rn mutations compared to non-carriers. overall, our genetic studies suggest a lower impact of variants in ap1s3 and card14 in pustular psoriasis than of those in il36rn. interestingly, the combination of il36rn mutations with either ap1s3 or card14 congenital prosopagnosia (cp), also known as developmental prosopagnosia or face blindness, describes the inability to recognize faces. cognitive functions such as intelligence as well as the sensory visual capabilities are usually not impaired, but people with prosopagnosia are negatively affected in their social life because they have difficulty in recognizing family members, close friends or colleagues. although the prevalence of cp is estimated at 2.5% and it appears to run in families, the genetic, epigenetic and environmental factors that contribute to this trait are largely unknown. therefore we started to establish a large, well-characterized cp cohort for genetic studies.
we hypothesize that rare, highly penetrant non-synonymous genetic variants could explain some cases of cp. as part of a larger genetic study of patients with cp, we performed family-based whole-exome sequencing and targeted re-analysis on four individuals from two families with multiple affected members. by obtaining samples from affected and unaffected members of the same family, we hope to effectively identify de novo and inherited variations. variations are considered on the basis of allele frequency, mutation type, literature and mutation prediction tools, thus generating a list of candidate variations/genes for each patient that is amenable to interpretation and further analyses in the extended cohort. through this approach, we hope to identify causal variations/genes in families and isolated patients with cp. erythrokeratodermia variabilis (ekv) is a rare, autosomal dominant genetic skin disorder with a highly heterogeneous phenotype. to date, three causative genes (gjb3, gjb4 and gja1) have been described, but further genetic heterogeneity is expected. card14 mutations have so far only been described for psoriasis vulgaris, generalized pustular psoriasis and pityriasis rubra pilaris. for the first time, we present disease-causing card14 mutations in patients with an "ekv-like" phenotype. these comprise one familial case with two affected individuals, with an autosomal dominant transmission from the mother to the daughter, and one independent sporadic case. all patients present typical ekv symptoms: a rash of well-demarcated erythematous and scaly plaques interspersed with distinct islands of uninvolved skin or small reddish papules coalescing into large reticulated scaly plaques, and palmoplantar keratoderma. to confirm the suspected diagnosis of ekv, we analyzed a custom-designed multi-gene panel of 74 genes associated with hereditary skin diseases by next generation sequencing (agilent haloplex technology).
the sequencing results did not reveal any mutation in the genes gjb3, gjb4 and gja1, but we found two pathogenic mutations in card14. in the familial case c.467t>c p.leu156pro (rs387907240, enst00000570421.5) was detected, while the sporadic case carries c.371t>c p.leu124pro. we hypothesize that different genetic and environmental factors are involved in the evolution of the phenotype in patients with card14 mutations. our cases show that classification of unusual skin phenotypes can be challenging without genetic testing. therefore, gene panel sequencing is a cost-efficient and time-saving solution for solving difficult cases with sometimes unexpected genetic background. bipolar disorder (bd) is a complex psychiatric disorder affecting more than 1% of the world's population. the highly heritable disease is characterized by recurrent episodes of manic and depressive symptoms. as the cumulative impact of common alleles with small effect may only explain around 38% of the phenotypic variance for bd, rare variants of high penetrance have been suggested to contribute to bd susceptibility. in the present study we investigated 226 individuals of 68 large multiply affected families of european origin using whole exome sequencing (wes). in each family, two to five affected individuals with bd or recurrent major depression were selected for sequencing. wes was performed on the illumina hiseq2500 platform and the varbank pipeline of the cologne center for genomics was used for data analysis. all identified variants shared within each family were filtered for a minor allele frequency <0.1% and potentially damaging effects predicted by at least four of five different bioinformatics tools. we identified a total of 1214 rare, segregating and functional variants implicating 1122 different genes, of which 903 were brain expressed. 
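the two filter criteria named above (minor allele frequency <0.1% and a damaging prediction by at least four of five tools) can be sketched as follows. this is only an illustrative sketch, not the varbank pipeline: the `Variant` record, its field names and the gene labels in the usage note are hypothetical, and the five prediction tools are not named in the abstract, so they are represented as five anonymous boolean calls.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MAF_CUTOFF = 0.001        # 0.1 %, as stated in the abstract
MIN_DAMAGING_CALLS = 4    # "at least four of five" prediction tools


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant found in a family."""
    gene: str
    maf: float                          # population minor allele frequency
    damaging_calls: Tuple[bool, ...]    # one call per prediction tool (five assumed)
    shared_by_affected: bool            # present in all sequenced affected relatives


def passes_filter(v: Variant) -> bool:
    """True if the variant is shared, rare and predicted damaging by enough tools."""
    return (
        v.shared_by_affected
        and v.maf < MAF_CUTOFF
        and sum(v.damaging_calls) >= MIN_DAMAGING_CALLS
    )


def candidate_genes(variants: Iterable[Variant]) -> List[str]:
    """Sorted list of distinct genes carrying at least one passing variant."""
    return sorted({v.gene for v in variants if passes_filter(v)})
```

for example, a variant with maf 0.0005, four damaging calls and sharing among affected relatives would be kept under these assumptions, whereas one with maf 0.002 or with only three damaging calls would be dropped.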
subsequently, we applied the residual variation intolerance scores (rvis, petrovski et al., 2013) and identified 294 genes that were ranked among the 20% most "intolerant" genes in the genome. gene enrichment analysis of these genes showed a significant enrichment for a total of 18 pathways (p < 0.001) including neuron projection, axon development and cell adhesion. for follow up analyses, we prioritized genes that were either found in at least two unrelated families in the present study or that were previously reported in next generation sequencing or gwas studies of bd. in addition, we enclosed the genes that were predominantly driving the significant pathways in the above mentioned gene enrichment analysis. the different approaches of prioritization yielded 42 candidate genes that are currently being followed up by resequencing in cohorts of about 2500 independent bd cases and 2500 controls of european ancestry. the candidate genes include cdh22 that encodes a calcium-dependent cell adhesion protein that may play an important role in the morphogenesis of neural cells during the development and maintenance of the brain. for resequencing we use the single molecule molecular inversion probes (smmips) technology that enables multiplex targeted resequencing in large cohorts. the smmips sequences were designed with an empirically variants in several patients indicated an oligogenic inheritance rather than a purely monogenic one. moreover, the oligogenic basis of this group of inflammatory diseases might currently be underestimated, as our study suggests that genetic risk factors other than il36rn mutations remain to be identified in the majority of gpp patients. exome sequencing of 46 multiply affected schizophrenia families provides new insights into the pathogenesis of the disorder schizophrenia (scz) is a multifactorial psychiatric disorder with a lifetime risk of ~ 1% and a heritability of about 60-80%. 
analysing multiply affected families using whole exome sequencing (wes) is a very promising approach to identify new scz risk factors. in these families, individuals are affected with scz over several generations. it is likely that in multiply affected families genetic variations with particularly strong effect co-segregate with the disorder and contribute to the development of the psychiatric symptoms. to our knowledge, the present study is the largest study worldwide so far analysing multiply affected scz families using wes. we included 46 families with at least 3 affected members each. from each family, 3-5 individuals were exome sequenced on an illumina hiseq 2500 and analysed using the varbank pipeline of the cologne center for genomics (http://varbank.ccg.uni-koeln.de) and the clc bio biomedical genomics workbench. we included rare variants (allele frequency ≤ 0.1% in the exome aggregation consortium dataset) that were predicted to be pathogenic (combined annotation dependent depletion score ≥ 15; cadd.gs.washington.edu), confirmed by sanger sequencing and co-segregating with the disorder. in total we identified potentially pathogenic mutations in ~ 880 genes. a substantial proportion of these will not contribute to the pathogenesis of scz. in order to further tease out the most promising candidate genes we applied multiple strategies: (i) screening our mutations in independent patient and control cohorts through international collaborations (access to more than 3,000 scz patients), (ii) gene-based tests, (iii) pathway and network analyses, (iv) gene expression analyses and (v) sequencing of the candidate genes in 2,500 scz patients and 2,500 controls. analyses are ongoing and will be presented at the upcoming conference. and our evolution, we know very little about the different mutagenic processes in our germline.
of particular interest are a handful of highly recurrent dnm associated with congenital disorders and/or rasopathies that have been described as driver mutations expanding in the male germline. the mutation itself causes a change in the tyrosine kinase receptor/ras/mapk pathway, which in turn confers a proliferative advantage on the spermatogonial stem cell. selfish or driver mutations are quite common in cancer, but we still know very little about the selfish expansion in the male germline. the reason might be that mutations in the human germline are very rare, and it is rather difficult to directly measure such rare events. most of our knowledge on germline mutagenesis comes from indirect sequence comparisons or whole genome sequencing of pedigree families, but these approaches yield little information about individual mutagenic events. for this reason, we have adapted an ultrasensitive next generation sequencing (uss) technology for the measurement of rare mutations to study the expansion of selfish genes in the male germline. as a proof-of-principle, we have sequenced exons 10 and 15 of the fgfr3 gene at an extremely high coverage in young and old sperm donors. we found an increased mutation frequency for the loci associated with achondroplasia and thanatophoric dysplasia ii in sperm of older donors. our results also show that we can distinguish ultra-rare mutations occurring at a frequency of one in a hundred thousand wild-type sequences, making this method ideal to discover potential driver dnm that might be expanding with paternal age. s. sivalingam 1, 2 , a. j. forstner 1, 2, 3 , s. herms 1, 2, 4 , a. maaser 1, 2 , c. s. reinbold 4, 5 , t. andlauer 6 , j. frank 7 , h. dukal 7 , d. schendel 7 , 2 , p. hoffmann 1, 2, 4 , t. kircher 8 , u. dannlowski 8, 9 , a. krug 8 , a. cichon 1, 2, 4 , s. witt 7 , m. rietschel 7 , m. m.
nöthen 1, 2 introduction: affective disorders such as major depressive disorder (mdd) and bipolar disorder (bd) are genetically complex and heterogeneous disorders. both genetic and environmental risk factors contribute to the etiology of the diseases. however, the neurobiological correlates by which these risk factors influence the disease development are hardly understood. increasing evidence suggests that epigenetic modifications such as dna methylation have important implications for the development of psychiatric disorders including mdd and bd. several studies revealed that alterations in dna methylation can modulate gene expression in response to the environment. to investigate this, genome-wide dna methylation analysis of 66 female individuals from three extreme groups (genetic risk, environmental risk and healthy controls) was performed. methods: for the genome-wide dna methylation analysis we selected: (i) 22 individuals with genetic risk (at least one 1st degree relative with a life-time diagnosis of mdd or bd), (ii) 22 individuals with environmental risk (maltreatment in the childhood trauma questionnaire) and (iii) 22 matched healthy controls. all individuals were of european origin. processing was done according to the manufacturer's protocol using the infinium methylationepic beadchip (illumina, san diego, ca, usa) covering more than 850,000 methylation sites at the life & brain center (bonn, germany). state-of-the-art data processing protocols, including correction for blood cell type heterogeneity, color correction, and elimination of probes containing snps and of cross-reactive probes, were used. after quality control trained design algorithm mipgen (boyle et al., 2014) and sequencing is currently performed on the illumina hiseq2500 platform. our preliminary results strongly suggest that rare and highly-penetrant variants in neuronal and cell adhesion genes contribute to bd etiology.
the results of resequencing of a large case/control sample will provide further evidence for an involvement of particular pathways. the use of zebrafish as model system to quantitatively assess the impact of risk variants in non-coding regions in vivo s. l. mehrem 1, 2 , b. nagarajan 3 , n. ishorst 1, 2 , a. c. böhmer 1, 2 , e. mangold 2 , b. odermatt 3 , k. u. ludwig 1, 2 1 department of genomics, life & brain center, university of bonn, bonn, germany, 2 institute of human genetics, university of bonn, bonn, germany, 3 institute of anatomy, university of bonn, bonn, germany most human malformations occur early in embryonic development and are present immediately after birth. one common human birth defect is nonsyndromic cleft lip with or without cleft palate (nscl/p), affecting about 1 in 1,000 newborns. nscl/p has a multifactorial background with a strong genetic component. recent genome-wide association studies identified several loci as risk factors for nscl/ p. notably, most of them map to non-coding regions and are expected to have a functional impact through regulatory mechanisms. given the early developmental time point of facial development and the resulting lack of accessible human tissue, follow-up analyses of risk variants are challenging. we are hypothesizing here that we might be able to quantitatively detect differences in regulatory activity between wildtype and risk variants located in predicted enhancers by using the zebrafish as model organism. the advantages of using the zebrafish are (1) ex utero development, (2) transparency of the fish, (3) easy manipulation and (4) relatively short generation times. we applied a dual-luciferase assay plasmid system which is based on a sequential measurement of two luciferases (firefly and sea pansy luciferase) in fish lysates upon injection of a single plasmid. this plasmid, which contains a minimal promoter (minp) and the enhancer region of interest, is microinjected into zebrafish eggs of one-cell stage. 
after three days, fish are collected, lysed and luciferase activity is measured using a luminometer. for our proof-of-principle analysis we analyzed an nscl/p risk locus on chromosome 13q31 (ludwig et al. 2012) . through database research one enhancer was predicted that contained two strongly associated risk variants. in vivo fluorescence analysis using egfp in zebrafish embryos revealed this enhancer to be active in craniofacial development, but qualitative differences in activity were not observed by eye. upon cloning of the enhancer in the dual-luciferase system, our injection results in vivo indicate that the system is working in principle. however, a high standard deviation between single replicate measurements was observed, probably due to variability in transfection efficiency. we therefore are planning to adapt the protocol in order to screen for positively injected fish embryos. we are currently investigating the functionality of these screening constructs in zebrafish embryos. results will be presented at the meeting. once successful, our approach represents a practical method to quantify the activity of regulatory elements in real time in vivo. this will be of particular importance in the functional follow-up of genetic findings in non-coding regions for the majority of birth defects. previously genome-wide association methods in patients with classic bladder exstrophy (cbe) found association with isl1, a master control gene expressed in pericloacal mesenchyme. this study sought to further explore the genetics in a larger set of patients following-up on the most promising genomic regions previously reported. genotypes of 12 markers obtained from 268 cbe patients of australian, british, german ital-and normalization 780,467 cpg-sites were tested for genome-wide dna methylation by a linear regression model while accounting for biological as well technical covariates. 
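a minimal sketch of the arithmetic behind a multiple-testing correction for this number of cpg sites. the abstract does not state which correction was applied; a bonferroni correction at a family-wise alpha of 0.05 is assumed here purely for illustration, and the function names are hypothetical.

```python
N_TESTS = 780_467        # CpG sites tested after quality control (from the abstract)
ALPHA = 0.05             # assumed family-wise error rate (not stated in the abstract)
NOMINAL_CUTOFF = 1e-4    # nominal p-value cutoff used when reporting CpG sites


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_null_hits(p_cutoff: float, n_tests: int) -> float:
    """Number of tests expected below p_cutoff if every null hypothesis is true
    and the p-values are uniform and independent (a simplifying assumption)."""
    return p_cutoff * n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
null_hits = expected_null_hits(NOMINAL_CUTOFF, N_TESTS)

print(f"bonferroni per-site threshold: {threshold:.2e}")
print(f"sites expected below {NOMINAL_CUTOFF:g} by chance: {null_hits:.1f}")
```

under these assumptions a cpg site would need a p-value below roughly 6.4 × 10 -08 to remain significant, and about 78 sites would be expected to fall below 1 × 10 -04 by chance alone, which gives a sense of how strict a genome-wide threshold is for a study of 66 individuals.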
results: the genome-wide dna methylation analysis of the three extreme groups revealed 39 cpg sites (p < 1×10 -04 ) in the subgroup analysis "environmental risk vs. controls" and 35 cpg sites (p < 1×10 -04 ) in the analysis "genetic risk vs. controls". in addition, we identified 48 cpg sites (p < 1×10 -04 ) in the comparative analysis of "genetic risk vs. environmental risk". none of these cpgs showed significant differential dna methylation after correction for multiple testing. however the hierarchical clustering of the differentially methylated sites provided some evidence for differentially methylated patterns between the subgroups. discussion: our genome-wide dna methylation analysis of the extreme groups provided some evidence for differentially methylated cpg sites which unfortunately did not withstand correction for multiple testing. this may reflect at least in part that the sample size of the present study was too small to detect differentially methylated cpgs at the genome-wide level. a. tafazzoli 1, 2 , t. vaitsiakhovich 3 , l. pethukova 4, 5 , 2 , s. redler 1, 6 , r. kruse 7 , b. blaumeiser 8 , m. böhm 9 , g. lutz 10 , h. wolff 11 , , am. christiano 5 , p. kokordelis 1 , mm. nöthen 1, 2 , rc. betz 1, 2 1 alopecia areata (aa) is a common hair loss disorder that occurs in both sexes and all age groups. aa is thought to be a tissue-specific autoimmune disease directed against the hair follicle. both, gene-based and genome-wide association studies have identified more than 10 susceptibility loci for aa; however, a large percentage of the overall heritable risk still awaits identification. to provide further insight into the immune related nature of aa, we and our us collaborators had each performed an immunochip-based analysis. we recently performed a meta-analysis, combining the data from both studies, and are now aiming to follow-up the best results in an additional german cohort by use of a sequenom assay to identify novel susceptibility loci. 
we conducted the meta-analysis by using data from the above mentioned two studies on illumina beadchip arrays including a total of 1,096 cases and 3,176 controls of central european origin. the method of synthesis of regression slopes (msrs), as implemented in the metainer software package, was used for the analyses. for the follow-up step, we chose the most promising candidate snps. these will be examined with the sequenom massarray iplex platform in an independent aa sample comprising 1,459 cases and 970 controls. by use of the meta-analysis combining data from the us and our sample, we identified 49 novel loci with a suggestive p-value of pbecker-wu ≤ 10 -3 (phet ≥ 0.01). among them, nfkb is the most significant locus (pbecker-wu = 1.5 × 10 -07 ). in order to achieve genome-wide significance, we plan to follow up the most promising snps in an independent german sample. we considered the 19 most significant loci (lowest pbecker-wu values) for the replication step. the experiments are ongoing and results will be presented at the meeting. despite the recent identification of susceptibility loci for aa, our understanding of the genetics of aa is incomplete. identification of new loci may provide novel insights into biological pathways and a better elucidation of disease pathophysiology. 22q13.1 encompassing only one gene: kcnj4. the duplication segregates with the phenotype in the family. kcnj4 belongs to the same subfamily of potassium channels as kcnj2, the known disease gene for cooks syndrome, and both share several biological functions. recent data show that gain-of-function mutations in another potassium channel, kcnh1, cause zimmermann-laband syndrome, a congenital malformation syndrome also associated with hypoplasia or aplasia of nails and terminal phalanges. therefore we propose that duplications of kcnj4 may also cause tissue-specific misregulation resulting in digit and nail defects.
taken together we show in a three generation pedigree that cooks syndrome is associated with a duplication of kcnj4. our data further highlight the emerging role of potassium channels in congenital digit and limb anomalies. case report: deletion of the terminal short arm of chromosome 5 (chromosome 5p deletion syndrome) without 5q-duplication with a familial history of a large pericentric inversion of chromosome 5 b. gläser, n. hirt, e. botzenhart, j. fischer, m. leipoldt institut für humangenetik, universitätsklinikum freiburg, freiburg, germany we report on a male patient referred as a newborn with typical clinical features of chromosome 5p deletion syndrome. conventional karyotyping of lymphocyte cultures confirmed a deletion of the terminal short arm of chromosome 5 with breakpoint in 5p14. the size of the deletion (22mb) could be refined by microarray analysis and assigned to pos 5:16497-22278242. parental cytogenetic investigations showed a normal karyotype in the mother whereas the father was revealed to be carrier of a large pericentric inversion of one chromosome 5. odd crossing-over in the inverted segment of a pericentric inversion in a parent can lead to unbalanced offspring caused by gametes with a terminal deletion of the p-arm together with a duplication of the q-arm or gametes with a duplication of the p-arm together with a deletion of the q-arm. in order to find out, whether the 5p-deletion in the child is the result of an independent event or if it is related to the structural chromosomal aberration of the father a microsatellite analysis is going to be performed p-cytog-130 a small supernumerary marker chromosome of the pericentric region of chromosome 8 in a child with intellectual disability: case report and literature review b. hoffmann, g. gillessen-kaesbach, i. hüning institut für humangenetik, universität zu lübeck, lübeck, germany small supernumerary marker chromosomes (ssmc) are reported in 0.043% of newborn infants. 
we report on a girl who was born preterm at 28 weeks of gestation via cesarean section due to pathological ctg. the pregnancy was complicated by gestational diabetes. she presented with muscular hypotonia, multiple hemangiomas, dysmorphic features and feeding problems. the body measurements were in the normal range. the feeding problems made tube feeding necessary until the age of four months. facial features consisted of epicanthus, high palate, broad nasal tip and broad nasal root. a brain mri showed periventricular leukomalacia and hypoplasia of the corpus callosum. drug-resistant seizures with hypsarrhythmia started at the age of ten months. the affected girl was the only child of healthy non-consanguineous parents. the father also presented with a few small hemangiomas in the face. there was no history of intellectual disability in the extended family. karyotyping showed an ssmc in mosaicism. molecular characterization by array-cgh showed that the ssmc consists of pericentric chromosomal material derived from chromosome 8 (arr[hg19] 8p11.1p21.3(22,816,527-43,396,776)x2~3,8q11.1q11.21(47,673,716-52,164,874)x2~3 dn (grch37/hg19). ian, spanish and swedish origin and 1,354 ethnically matched controls and from 92 cbe case-parent trios from north america were analysed. only marker rs6874700 at the isl1 locus showed association (p = 2.22 × 10 -08 ). a meta-analysis of rs6874700 of our previous and present study showed a p value of 9.2 × 10 -19 . developmental biology models were used to clarify the location of isl1 activity in the forming urinary tract. genetic lineage analysis of isl1-expressing cells by the lineage tracer mouse model showed isl1-expressing cells in the urinary tract of mouse embryos at e10.5 and distributed in the bladder at e15.5. expression of isl1 in zebrafish larvae staged 14 hpf to 24 hpf was detected in the developing pronephros region.
our study supports isl1 as a major susceptibility gene for cbe and as a regulator of urinary tract development. to date, seven patients with interstitial deletions at chromosome 8q22.2-q22.3 have been described in the literature. all patients reported had moderate to severe intellectual disability and a characteristic facial phenotype including blepharophimosis, telecanthus, epicanthus, flat malar region, and a thin upper lip vermillion. six of the seven patients had epileptic seizures. by analyzing the overlaps of the deletions, two distinct critical regions have been suggested, one for the facial phenotype and one for intellectual disability and seizures. here we present another patient with a de novo 3.6 mb deletion in 8q22.2-q22.3. the patient shows moderate mental retardation. he has an abnormal eeg; however, only one episode of clinical seizures has been observed so far. the facial gestalt resembles the typical dysmorphic features of the patients with 8q22.2-q22.3 deletions reported previously. minor anomalies were short fingers and toes, and a single palmar crease. our report supports the assumption that deletions in 8q22.2-q22.3 cause a distinctive and clinically recognizable microdeletion syndrome with characteristic facial features and intellectual disability. since the patient's deletion overlaps with most of the critical region for the dysmorphic phenotype but only with parts of the intellectual disability critical region, the molecular data presented here further narrow down the critical region for the intellectual disability seen in patients with 8q22.2-q22.3 microdeletions. p-cytog-128 *** cooks syndrome is associated with a duplication of the potassium channel kcnj4 mundlos 1, 2, 4 , d. horn 1 , m.
spielmann 1, 2, 4 1 institute for medical genetics and human genetics, charité universitätsmedizin berlin, berlin, germany, 2 max planck institute for molecular genetics, berlin, germany, 3 northern ireland regional genetics service, belfast city hospital, belfast, ireland, 4 berlin-brandenburg school for regenerative therapies - bsrt, berlin, germany
cooks syndrome (mim106995) is a rare autosomal dominant disorder classically characterized by onychodystrophy and anonychia, with absence or hypoplasia of the distal phalanges of the hands and feet. large duplications including kcnj2 were shown to be causative for cooks syndrome. recently, mouse studies revealed that tissue-specific misregulation of kcnj2, a potassium channel of the subfamily j, causes hypoplasia of nail beds and abnormal distal phalanges, thus resembling the cooks phenotype. here we report on a three-generation pedigree with typical features of cooks syndrome that tested negative for kcnj2. we performed high resolution array-cgh and identified an 80 kb duplication on chromosome
p-cytog-132 chromosome 17q23.1-q23.2 deletion syndrome - an additional case with sensorineural hearing loss. s. leubner, c. hennig, a. junge mitteldeutscher praxisverbund humangenetik, dresden, germany
the chromosome 17q23.1-q23.2 deletion syndrome (mim #613355) is a contiguous gene syndrome caused by a deletion encompassing the chromosome region 17q23.1-q23.2. it was first described by ballif et al. (2010). up to now, only a few cases with a microdeletion 17q23.1-q23.2 have been reported. most of them carry a microdeletion with recurrent breakpoints and a similar size of about 2.2 mb. the common clinical features of cases with the chromosome 17q23.1-q23.2 deletion syndrome comprise mild-to-moderate developmental delay, microcephaly, postnatal growth retardation, heart defects, limb anomalies, and hearing loss. 
we present an additional male patient with a 2.2 mb deletion of chromosome 17q23.1-q23.2, detected by array cgh (cytochip isca 4×180k v1.0, illumina). our index patient shows typical symptoms of the chromosome 17q23.1-q23.2 deletion syndrome such as developmental retardation (in particular speech delay), a head circumference at the 3rd centile, postnatal growth retardation with a body height at the 4th centile, heart anomalies (right-sided aortic arch, patent ductus arteriosus), and sensorineural hearing loss on both sides. our data improve the characterization of the typical phenotype caused by a chromosome 17q23.1-q23.2 deletion and reinforce the suspicion that this region might be associated with sensorineural hearing loss.
partial, homozygous deletions of ahi1 gene cause joubert syndrome type 3 d. meier, a. behnecke, jwg. janssen, u. moog, k. hinderhofer institute of human genetics, university hospital heidelberg, heidelberg, germany
we describe a patient who is the second child of consanguineous healthy parents from turkey. he was born after 37 weeks of gestation with normal birth parameters (2850 g, 51 cm, ofc 32 cm). at the age of three months atypical eye movements became apparent. psychomotor development was delayed from the beginning; he presented with hypotonia and later developed ataxia. an abnormal breathing pattern was not noticed. an abnormal mri with hypoplasia of the cerebellar vermis at the age of 21 months and the clinical signs described above led to a clinical diagnosis of joubert syndrome. at that time no diagnostic testing was available. he returned to the outpatient clinic of the institute of human genetics at the age of 22 years, having developed retinopathy in the meantime. his height was below average (~3 cm, t (p.arg730*, adnp), resulting in the introduction of a stop codon in exon 5/5 and truncation of the corresponding protein. neither parent carried this mutation. 
adnp is part of the atp-dependent baf chromatin-remodeling
p-monog-137 the challenge to insert costello syndrome causing hras mutations into human keratinocytes using the crispr/cas9 editing technology. l. brandenstein, k. kutsche, g. rosenberger university medical center hamburg-eppendorf, hamburg, germany
germline missense mutations in the hras gene cause costello syndrome, a rare developmental disorder characterized by a typical facial gestalt, postnatal growth deficiency, intellectual disability, and predisposition to malignancies as well as skeletal, cardiac and dermatological abnormalities. the molecular pathophysiology caused by heterozygous hras gain of function mutations has been analysed in various tissues and cell types; however, to date the molecular basis for cutaneous manifestations in costello syndrome is largely unknown. to address this question in an appropriate model system, permanent human keratinocyte (hacat) cells carrying costello syndrome-associated mutations in the endogenous hras gene were to be generated by using the crispr/cas9 technology. double strand breaks induced by cas9 can be repaired in two ways: the error-prone non-homologous end-joining pathway for the generation of knockout models or the homology directed repair pathway, which allows precise editing. the latter enables the introduction of specific point mutations into a cell line by using a single stranded dna (ssdna) as repair template. however, we found that this is a very rare event and its efficiency depends on various factors including the cell line used, the selected guide rna, the length and amount of ssdna and also the cas9 variant. nonetheless, by using wild-type cas9, we could insert the disease-associated c.35g>t (p.g12v) mutation into genomic hras both in hacat cells (8 positive clones out of 800) and hek cells (3 positive clones out of 15). in contrast, using the cas9 nickase protein variant that prevents off-target effects did not result in positive clones. 
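the clone counts reported for the two cell lines can be turned into per-cell-line knock-in efficiencies. the following is a minimal sketch of that arithmetic only; the function name `editing_efficiency` is a hypothetical helper introduced here for illustration and is not part of the authors' workflow.

```python
def editing_efficiency(positive_clones: int, screened_clones: int) -> float:
    """Fraction of screened clones that carry the intended point mutation."""
    if screened_clones <= 0:
        raise ValueError("screened_clones must be positive")
    return positive_clones / screened_clones


# Clone counts as stated in the abstract (wild-type cas9, c.35g>t knock-in).
hacat = editing_efficiency(8, 800)
hek = editing_efficiency(3, 15)

print(f"hacat: {hacat:.1%}")
print(f"hek: {hek:.1%}")
```

the comparison is descriptive only: the abstract gives no replicate numbers, so no statistical claim about the difference between the cell lines is implied.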
taken together, in 10 months of working with crispr/cas9 we gradually gained experience with many problems and pitfalls of this technology and are now able to introduce point mutations in cell lines. next, we will use the mutant hacat cell line to gain deeper insight into the function of hras for epidermal homeostasis and its deregulation in costello syndrome.
ccdc66 could also play a role in prenatal development of the mouse retina and brain. embryonic stages of interest comprise
t, p.s460l (maf = 0.025, rs75495843, enst00000334661.4) in the phospholipase c delta 1 (plcd1) gene within the tricy1 locus. furthermore, all five individuals present a second variant c.903a>g, p.p301p (maf = 0.27, rs9857730) in the same gene. plcd1 is a member of the phospholipase c family. the enzyme is involved in calcium-dependent intracellular signal transduction and catalyzes the hydrolysis of phosphatidylinositol 4,5-bisphosphate into the second messengers diacylglycerol and inositol trisphosphate. homozygous knockout mice have hair defects and show aberrant skin development with increased progression of skin tumors and intradermal hair-follicle derived cysts. in humans, plcd1 is highly expressed in the hair follicle but so far only nonsense mutations have been described, causing hereditary leukonychia totalis without any skin or hair abnormalities. a segregation analysis of plcd1 in a tunisian family cohort and one german family (13 families, 64 individuals, 38 affected) showed that all affected individuals carry the same two sequence variants. based on these results, we propose that plcd1 is responsible for the cyst phenotype. cdna sequencing from three different cysts revealed additional acquired somatic sequence variants in plcd1. we found c.2234c>t p.s745l in two cysts and the two variants c.2129c>t, p.s710f and c.2132c>t p.s711f in the third one. none of the three somatic variants is described in the exac database or the 1000 genomes project. 
allele-specific rt-pcrs were performed with cyst cdna and we could show that the somatic variants are on the same allele as the inherited variants. the acquired somatic cyst sequence variants lie within or near the c2 domain of the plcd1 protein. the c2 domain is involved in the calcium-dependent binding to membrane-integrated phospholipids. depletion of the domain leads to decreased membrane association and protein activity. we assume that the allele with the variants c.1379c>t and c.903a>g is a risk factor for hereditary trichilemmal cysts and that additional acquired rare sequence variants are the genetic trigger for the development of the cysts.
and patients. discordant results for 4 variants suggested a higher specificity of the rtpcr-seq. thus, we established a simple amplicon based deep sequencing approach for standard rtpcr fragments to ascertain the effects of specific splice site variants. this technique has proven to be highly scalable, fast and efficient for analyzing splice-site variants.
p-monog-144 eif2s3 mutations associated with severe x-linked intellectual disability syndrome mehmo
mehmo (mim %300148) is a rare x-linked syndrome characterized by profound intellectual disability, epileptic seizures, hypogonadism, hypogenitalism, microcephaly, and obesity. in 1998 steinmüller and colleagues described a large family with mehmo syndrome with five affected males in two generations and assigned the disease locus to the short arm of chromosome x (xp11.3-22.13). we took advantage of massively parallel sequencing in four families with mehmo syndrome, including the family reported by steinmüller et al., to identify the underlying genetic cause of this severe disorder. we show here that mehmo syndrome is associated with mutations in the x chromosome gene eif2s3. 
in three families we identified a c-terminal frameshift mutation (p.ile465serfs) and in an unrelated boy who is less severely affected, we identified a novel maternally inherited missense mutation (p.ser108arg) in eif2s3. eif2s3 encodes the gamma subunit of the eukaryotic translation initiation factor 2 (eif2). eif2 is essential for eukaryotic translation initiation and regulation of the integrated stress response (isr). subsequent studies in patient fibroblasts (p.ile465serfs) showed increased isr activation due to the mutation and functional assays in yeast demonstrated that the p.ile465serfs mutation impairs eif2 gamma function to a greater extent than tested missense mutations, consistent with the more severe clinical phenotype of the affected males with the p.ile465serfs mutation. our results suggest that more severe mutations in eif2s3 cause the full mehmo syndrome, while less deleterious mutations are associated with a milder form of the syndrome with only a subset of the symptoms.
introduction: larger structural genomic duplications or deletions (copy number variations = cnvs) are routinely detected by array comparative genomic hybridization (acgh). while acgh has been established as a robust and effective approach for cnv screening, it remains expensive and is limited by resolution. in addition, commercial mlpa testing today allows the identification of exon deletions or duplications for a limited number of core genes. more recently, massively parallel sequencing of multi gene panels (mgps) has been introduced as a fast and cost-effective tool in routine genetic diagnostic testing to identify causal intragenic sequence alterations not only in core genes but also in those with a small contribution to the respective phenotypes. the obtained mgps data may also be bioinformatically assessed to detect exon deletions or duplications within the analyzed gene panels and thus can be expected to further improve the diagnostic yield. 
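the general idea of assessing panel sequencing data quantitatively for exon deletions can be illustrated with a simple read-depth comparison. the sketch below is an assumption-laden illustration, not the authors' in-house java script: the function names, the 0.7 ratio cut-off and the requirement of at least two adjacent low-coverage targets are hypothetical choices made here for demonstration.

```python
from statistics import median


def normalise(depths):
    """Scale per-target read depths so they sum to 1 (removes library-size effects)."""
    total = sum(depths)
    if total == 0:
        raise ValueError("sample has no coverage")
    return [d / total for d in depths]


def deletion_runs(patient_depths, control_depths, threshold=0.7, min_targets=2):
    """Return (first, last) index pairs of adjacent targets with reduced depth.

    A target counts as reduced when the patient's normalised depth divided by
    the median normalised control depth falls below `threshold` (assumed value).
    """
    patient = normalise(patient_depths)
    controls = [normalise(c) for c in control_depths]
    ratios = []
    for i, p in enumerate(patient):
        ref = median(c[i] for c in controls)
        ratios.append(p / ref if ref > 0 else float("nan"))

    runs, start = [], None
    # The trailing +inf acts as a sentinel that closes a run ending at the last target.
    for i, r in enumerate(ratios + [float("inf")]):
        low = r < threshold
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_targets:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy data: six targets, the patient has half the depth on targets 2 and 3.
controls = [[100] * 6, [110] * 6, [90] * 6]
patient = [100, 100, 50, 50, 100, 100]
print(deletion_runs(patient, controls))
```

a real workflow would additionally need to handle gc bias, capture efficiency per target and confirmation by an independent method; none of that is modelled here.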
methods: more than 100 patients were sequenced with phenotype-specific gene panels on a miseq platform (illumina) and analyzed with our bioinformatic diagnostic workflow including a quantitative data assessment using our in-house java-based bioinformatic script to search for gene or exon deletions. detected deletions were confirmed by an independent method (e.g. mlpa, pcr amplification of the junction fragment or linkage analysis), if available. results: we here report details for six suspected deletions in seven patients detected by our in-house bioinformatic workflow: heterozygous gene deletions of pafah1b1, spastin or arfgef2, respectively; two heterozygous cftr deletions of exon 2 and 3, one homozygous partial sftpb deletion and a homozygous ispd exon deletion. the complete gene deletions of pafah1b1 and spastin as well as the cftr deletion of exon 2 and 3 could be confirmed by mlpa. for the partial sftpb deletion both breakpoints could be precisely located within the readout, allowing determination of the correct deletion size and the design of primers to amplify the junction fragment. the homozygous deletion of exon 6 in the ispd gene could be confirmed by pcr and linkage analysis. discussion: diagnostic multi gene panel sequencing after nextera enrichment allows sufficient homogeneity of the obtained patient and control data per target to quantitatively search for constitutional deletions covering two or more adjacent targets. combined data assessment considering the individual clinical data will not only further increase the diagnostic yield but can also be expected to further delineate the mutational spectrum for specific phenotypes by the simultaneous detection of clinically relevant sequence variants as well as cnvs. mgps sequence assessment may also provide new insights into the genomic architecture and origin of target regions and haplotypes involved in common structural variations.
c. niemietz, v. sauer, l. fleischhauer, s. guttmann, s. reinartz groba, a. 
zibert, h. schmidt universitätsklinikum münster, klinik für transplantationsmedizin, münster, germany various types of somatic cells have been reprogrammed to induced pluripotent stem cells (ipsc) followed by differentiation into hepatocyte-like cells (hlc). recently, cells that shed from the renal epithelial system were shown to be a suitable and convenient source for ipsc generation. in the current study, urine-derived cells (ucs) were isolated from urine donations of patients having familial amyloid polyneuropathy (fap), a neurodegenerative disease caused by mutation of the transthyretin (ttr) gene and wilson disease (wd), a genetic disorder of atp7b causing copper accumulation, predominantly in liver and brain. patient-specific hlcs were differentiated in order to study disease-specific mechanisms and to investigate the efficacy of novel compounds. for isolation of renal epithelial cells, urine of fap and wd patients was processed. ucs were reprogramed into ipscs using plasmids resulting in transient expression of factors sox2, oct3/4, klf4, and c-myc. after characterization of ipsc that expressed high levels of pluripotency markers, like oct4 and nanog, a 3-step hepatocyte differentiation protocol was performed. ipscs were subjected to a treatment with growth factors (activin a, wnt3a, fgf2, hgf) for 14 days. the hepatic, patient-specific character of differentiated hlcs was assessed by functional analysis, gene expression profiling, genotype analysis, and immunostainings. therapeutic oligonucleotide efficacy targeting ttr was determined by immunocytochemistry, qrt-pcr and western blot analysis. ttr-stabilizing activity of tafamidis was investigated by means of thermal shift assay and western blot analysis. copper chelation by methanobactin was determined by atomic absorption spectroscopy. reprogramming of ucs resulted in stable ipsc lines with characteristic pluripotent marker expression. 
differentiated hlcs showed high similarity to human hepatocytes in terms of genetic profile and functional activity. small-interfering rnas (sirnas), antisense oligonucleotides (aso) (niemietz et al. 2016, plos one 11(9):e0161455), and the ttr stabilizing compound tafamidis that are currently assessed in clinical studies were studied in hlcs derived from fap patients. a novel chelator was used to determine intracellular copper accumulation in hlcs derived from wd patients (lichtmannegger et al. 2016, jci 126(7):2721-35). fap-specific hlcs revealed differential expression of key regulators of the protein quality control (pqc) system. our results demonstrate that ipsc derived from urine are excellently suited to study hereditary liver diseases. hlcs could be investigated in the patient-specific genetic background. the efficacy of novel compounds was assessed and individual responses were monitored.
involved in gene regulation. by parent-patient trio whole-exome sequencing, we were able to characterize the underlying genetic cause in the patient, who had undergone multiple diagnostic tests before without receiving a diagnosis. discussion: although for many congenital syndromic diseases the disease-associated genes are known, it remains difficult for physicians to interpret the highly variable phenotypes as well as their variable nomenclature in order to request the appropriate molecular analysis. here we present one of more than 50 cases that will help to establish a database connecting specific phenotypes, using the human phenotype ontology terms, with the corresponding disease-causing genes (midas).
novel compound heterozygous nalcn variants in two brothers with muscular hypotonia and global development delay l. segebrecht 1, c. weiß 2, m. jäger 1,3, t. zemojtel 1, 4, 5, d. horn 1, n. ehmke 1, 6 1 institute of medical and human genetics, charité - universitätsmedizin berlin, berlin, germany, 2 spz dpt. 
pediatric neurology, charité - universitätsmedizin berlin, berlin, germany, 3 berlin-brandenburg center for regenerative therapies - bcrt, charité - universitätsmedizin berlin, berlin, germany, 4 institute of bioorganic chemistry, polish academy of sciences, poznań, poland, 5 labor berlin - charité vivantes gmbh, berlin, germany, 6 berlin institute of health, berlin, germany
we report on two brothers aged two and three years, with muscular hypotonia, global development delay, abnormal respiratory rhythm, mild facial dysmorphism, recurrent respiratory infections, and failure to thrive. sequencing of 3089 disease related genes identified compound heterozygosity for two novel mutations in nalcn: c.4281c>a (p.phe1427leu) and c.4103+2t>c. nalcn encodes a voltage-independent, non-selective cation channel, which is involved in regulation of neuronal excitability. the missense variant c.4281c>a affects the highly conserved amino acid position phe1427, which is located in segment s6 of domain iv in the pore-forming unit of nalcn. the variant c.4103+2t>c alters the donor splice site in intron 36 and is predicted to cause skipping of exon 36, resulting in loss of function of nalcn. biallelic mutations of nalcn are associated with infantile hypotonia, psychomotor retardation and characteristic facies (ihprf1), whereas heterozygous de novo mutations cause congenital contractures of the limbs and face, muscular hypotonia, and global developmental delay. the clinical features of our patients resemble mild ihprf1, caused by a biallelic missense mutation in segment s3 of domain iv. it has been suggested that variants in or close to s5 and s6 of the pore-forming domains lead to the above-mentioned autosomal dominant condition whereas variants in other regions or loss of function mutations result in autosomal recessive inheritance. this is the first report of a mutation in an s6 segment inherited in an autosomal recessive manner. 
our findings indicate that phenotype-genotype correlations in nalcn are more complex than suggested so far.
phenotype: in a young undiagnosed patient with developmental delay, intellectual disability and craniofacial dysmorphic anomalies, whole-exome sequencing (wes) identified a de novo insertion in the gene arid1b, known to cause the rare congenital coffin-siris syndrome and nicolaides-baraitser syndrome, respectively. methods: as part of the midas genotype-phenotype-correlation project, we sequenced the index patient and his healthy parents on a nextseq 500 platform (illumina, san diego, ca, usa) and performed a trio analysis. for library preparation we used an enzymatic fragmentation approach. exome capture was performed using the sureselect human all exon kit v6 (agilent, santa clara, ca, usa) to target most of the over 20,000 genes. the libraries were sequenced to approximately 190-fold mean coverage as 151 bp paired end reads. 92% of the target region was covered 20-fold or higher. data analysis and variant evaluation were performed using the clc genomics workbench 9.0 (qiagen, hilden, germany) and annotations from commercial as well as public databases (dbsnp, hgmd, clinvar, exac). results: we identified a de novo 1 bp insertion in the arid1b gene, causing a frameshift mutation that leads to a truncated protein. arid1b is part of the atp-dependent chromatin remodeling baf-complex, which is
tient affected by psoriasis carried the surrogate marker snp rs4406237-a for the psors1 risk variant hla-cw 0602 haplotype homozygously; and the same snp was found heterozygous in his prp-affected father. neither the pathogenic variant in card14, nor the risk variants for psoriasis described above were found in the healthy mother. whole exome sequencing revealed genetic variants predicted to have serious consequences in further genes involved in the nf-κb as well as the notch pathway. 
these variants either segregate with prp or are present in the psoriasis affected individual only. the presence of an individual carrying the same card14 mutation as his prp-affected relatives but suffering from psoriasis instead strengthens the relation between prp and psoriasis, which has been repeatedly suggested in the literature. we propose a balance between familial prp and psoriasis in the family investigated in this study and present genetic variants which might influence this balance in addition to variants in card14.
wilson's disease (wd) is an autosomal recessive disease resulting from copper (cu) excess due to mutations in the atp7b gene coding for a cu-transporting atpase. wd pathogenesis, however, cannot be explained by gene coding mutations alone, since phenotypes exhibit strong variations despite the same exonic dna makeup in the gene. also, in several patients with clinical wd symptoms no gene coding variants are detectable. our former studies revealed decreased liver atp7b mrna expression in some wd patients. this decrease was not only observed in patients with nonsense atp7b mutations leading to rapid mrna decay, but also in patients with missense mutations and in some patients with suspected wd without atp7b mutations. patients with low atp7b expression presented with a more fulminant disease progression. however, we could not detect mutations in the atp7b promoter region (c.-900 to atg) in those patients. there are possibly other deregulating mechanisms responsible for decreased atp7b mrna expression. up to now, atp7b transcriptional regulation is only poorly characterized. it is known that four metal responsive elements (mre a, c, d and e) are located within the atp7b promoter. gene regulation through mres is often metal-dependent. liver atp7b mrna expression was also shown to increase under cu addition in several species by an unknown mechanism. up to now, only one atp7b transcription factor (tf), the mrea binding ku protein, is known. 
the aim of our work was to further analyze the regulation of the atp7b gene, especially through mres. to screen for tf-mre interactions and to narrow down the binding site of tf, we performed electrophoretic mobility shift assays (emsa) by incubating nuclear extracts of the liver cell line hle with probes corresponding to atp7b mrec, d and e. to identify mre-binding tf, matinspector analysis was performed. identified candidate tf were coexpressed with an atp7b promoter-driven reporter gene to evaluate their impact on reporter gene expression. one tf that tested positive in the reporter assay was validated by different emsa experiments. furthermore, it was overexpressed with and without addition of metal ions in hle to investigate the impact on endogenous atp7b expression. we showed that the tf mtf1 is able to bind to mree within the atp7b promoter and significantly increases atp7b promoter driven reporter gene expression. mtf1 binding was primarily mediated by the first three bases of the mre consensus sequence. specific protein interaction could also be shown for mrec and mred, and the protein binding site was narrowed down by emsa, with protein identification still pending. fur-
polyglutamine diseases is the formation of the so called neuronal intranuclear inclusion bodies (nii). as ataxin-3 is predominantly located in the cytoplasm, the formation of protein aggregates in the nucleus requires a nucleocytoplasmic shuttling of ataxin-3. we already demonstrated in vivo using transgenic mouse models that the toxicity of expanded ataxin-3 depends on its intracellular localization: while nuclear ataxin-3 gave rise to a strong phenotype with a high number of protein aggregates, purely cytoplasmic ataxin-3, even with a highly expanded polyglutamine repeat (148 glutamines), was not able to induce a phenotype and did not even aggregate. 
we further identified and characterized intracellular transport signals (two nuclear export signals, nes, and one nuclear localization signal, nls) within the coding sequence of ataxin-3. therefore, it is evident that proteins involved in the nucleocytoplasmic transport machinery recognize these localization signals, control the intracellular localization of ataxin-3, thereby influence the toxicity and aggregation of ataxin-3 and, thus, the pathogenesis of sca3. we now screened a library of transport proteins in order to identify the transport protein which is critically involved in the nucleocytoplasmic shuttling of ataxin-3. we indeed identified a transport protein which modifies both the formation of aggregates and the intracellular localization of ataxin-3. while the overexpression of this protein moved ataxin-3 into the nucleus, its downregulation kept it out of the nucleus. we replicated this correlation in vivo in drosophila and in addition again observed a clear link between the intracellular localization of ataxin-3 and its toxicity, i.e. its ability to induce neurodegeneration and a behavioral phenotype. likewise, we confirmed the importance of the identified transport protein in a mouse model of sca3, as its knockout largely prevented ataxin-3 from aggregating and alleviated behavioral and movement deficits. understanding the mechanisms behind the intracellular transport of ataxin-3 could give us clues into the pathogenic functions of expanded ataxin-3 and ways to mediate the progression of neuronal degeneration in sca3.
evidence for genetic factors outside card14 influencing the phenotype of a family with familial pityriasis rubra pilaris and psoriasis
familial pityriasis rubra pilaris (prp) is an erythematous inflammatory skin disease caused by heterozygous activating mutations in card14, a known activator of the nf-κb pathway. different genetic variants within card14 have been associated with psoriasis. 
the purpose of our study was to clinically and genetically investigate affected as well as unaffected members of a family with prp in order to determine the mutation responsible for this severe skin disease in the three affected family members. a father, three of his adult children as well as the mother of one child affected by prp were investigated clinically. in addition we extracted genomic dna from the blood of each individual and performed whole exome sequencing as well as direct sequencing of single genes. clinical investigation confirmed that the father and two of his children were affected by familial prp, with the skin showing the characteristic pattern of prp, early onset and chronic course. a third child was unaffected by prp but suffered from psoriasis. the mother of one child affected by prp showed no sign of skin disease. genetic investigation revealed a heterozygous missense mutation in exon 4 of card14, c.[371c>t], p. present in all investigated individuals with prp or psoriasis. the same mutation has been described before as being pathogenic in a different family with prp. regarding genetic variants associated with psoriasis, we found the risk alleles of three coding variants in card14: rs2066964, c.
short stature is a common condition of great concern to patients and their families. in most cases it is genetic in origin but the underlying cause often remains elusive due to clinical and genetic heterogeneity. in an unbiased approach we carefully phenotyped 565 patients and randomly selected 200 for whole exome sequencing. sequence variants were analyzed for pathogenicity and the affected genes characterized regarding their functional relevance for growth. all patients received extensive clinical and endocrinological examinations and careful clinical genetic phenotypic evaluation, followed by targeted diagnostic assessment for suspected diagnoses. 
we identified a known disease cause in only 14% of patients, the most common causes being cnvs found in 7%, followed by syndromic monogenic causes in 5% and turner syndrome in 2%. whole exome sequencing identified additional mutations in known short stature associated genes (27) in 17% of patients who manifested only part of the symptomatology, precluding an early clinical diagnosis. here, heterozygous carriers of recessive skeletal dysplasia alleles (acan, npr2) were a surprisingly frequent cause of idiopathic short stature, found in 3.5% of cases. we next selected known short stature genes with mutations for pathway analyses of the affected proteins and found that 54% are involved in the main functional categories cartilage formation, chromatin modification and ras-mapk signaling. in addition we identified 37 further strong candidate genes, of which seven had deleterious mutations in at least two families. interestingly, 48% of these candidate genes are involved in the 10 main functional categories already identified for the known short stature associated genes, further supporting their pathogenicity. finally, in 16% of the 200 sequenced individuals our findings were of significant clinical relevance regarding preventive measures, symptomatic or even targeted treatment. besides evaluation for orthopedic or developmental issues, especially screening for neoplasias (trim37, ptpn11, nf1), symptomatic treatment for chronic kidney disease (clcn5) and targeted treatment for severe hypertension (pde3a) were of clinical relevance for the affected individuals. these results demonstrate that deep phenotyping combined with targeted genetic testing and whole exome sequencing is able to increase the diagnostic yield in short stature up to 31%, with concomitant improvement in treatment and prevention. rigorous variant analysis considering phenotypic data further led us to the identification of 37 further probable novel candidate genes. 
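the combined diagnostic yield of 31% quoted in this abstract follows from adding the reported percentages. the short sketch below only reproduces that arithmetic; it assumes that the 14% and 17% figures refer to comparable patient groups and can simply be summed, which the abstract does not state explicitly. all variable names are illustrative.

```python
# Percentages of patients as reported in the abstract.
known_causes = {
    "cnv": 7,
    "syndromic monogenic": 5,
    "turner syndrome": 2,
}
known_total = sum(known_causes.values())

# Additional diagnoses from whole exome sequencing in known short stature genes.
wes_additional = 17

# Assumption: the two percentages are additive (non-overlapping, same denominator).
combined_yield = known_total + wes_additional

print(f"known causes: {known_total}%")
print(f"combined diagnostic yield: {combined_yield}%")
```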
thermore, we found the endogenous atp7b mrna expression to be significantly increased in hle after cu treatment or cu treatment and concurrent mtf1 overexpression. sole mtf1 overexpression did not alter atp7b expression. we newly identified mtf1 to bind mree within the atp7b promoter. its in vivo role in the pathogenesis of wd needs to be further elucidated.
modifying genes have been identified for lung function in cystic fibrosis [1], disease severity [2] and several comorbidities [3]. within the european cf twin and sibling study, we focus on genes that modify the basic defect, assessed as defective chloride conductance in cftr-expressing epithelia, which we could describe by an association study on patient cohorts selected for informative endophenotypes among f508del-cftr homozygous patients [2, 4, 5]. as a first example, rs7901656 in fas modifies fas gene expression (p = 0.0009, data from 16 intestinal biopsies [6]), cf disease phenotype (praw = 0.0039, comparing 13 concordant mildly affected f508del homozygous sib pairs and 12 concordant severely affected sib pairs; praw = 0.0047, comparing 20 unrelated f508del homozygous index cases without residual chloride secretion by nasal potential difference measurement (npd) to 13 patients with cftr-mediated chloride secretion; [2]) and alters binding affinity for the transcription factors nf-kbp50, nf-kbp65 and hif1a [7] which govern the cellular response to infection and hypoxia. 
as a second example, the transcription factor ehf, derived as a positional candidate from a north american genome-wide scan [1], is associated with 2 npd-defined phenotypes (praw = 0.0082, comparing 17 index cases with high response to amiloride in npd to 13 index cases with low response to amiloride in npd; praw = 0.0268, comparing 14 index cases without chloride secretion in intestinal current measurement (icm) to 9 index cases with cftr-mediated residual chloride secretion in icm [5]) and affects the transcriptome of cf patients' intestinal biopsies in favor of a better processing of f508del-cftr [5], which we could confirm in epithelial cell lines, as sirna directed against ehf results in a downregulation of mgat2 and mgat4, both of which are key enzymes for the complex glycosylation of proteins such as cftr. these examples indicate that small, albeit carefully selected subpopulations facilitate the identification of genetic variants by an association study that can be validated in functional assays. furthermore, we suggest that while the selection of subsamples within a population with a rare disease such as cystic fibrosis results in a loss of power, findings obtained for more than one endophenotype are indicative of a true-positive finding of a modifying gene. finally, transcriptional regulation influences the cf basic defect. interference with these pathways may result in better f508del-cftr maturation, leading to better cftr function in patients' tissue and thereby promoting health in cystic fibrosis. funding by: deutsches zentrum für lungenforschung dzl; mukoviszidose institut ggmbh abstracts at presentation at age 21 years her head circumference was on the 75th centile, her height below the 3rd centile, and her weight on the 3rd centile. she showed a disproportional short stature and mild skeletal signs like clinodactyly of the 5th fingers. 
minor facial dysmorphisms included an oval face, epicanthic folds, upslanting palpebral fissures and a bulbous nose. karyotype was normal and copy number variants were excluded by high-resolution cma. as no aetiological diagnosis could be made clinically we performed whole exome sequencing (wes) and detected a homozygous splice site mutation (c.1640 + 1 g>t) in pik3c2a (phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 alpha) (nm_002645). sequencing of rt-pcr products from cdna of patient's fibroblasts showed in-frame skipping of exon 5 and 6, equally affecting all known isoforms. phosphatidylinositol 3-kinases (pi3ks) are lipid kinases involved in a large set of biological processes, including membrane receptor signaling, cytoskeletal organization, and endocytic trafficking. pi3kc2a is ubiquitously expressed and has been proposed to play an important role in clathrin-mediated endocytosis and regulation of phosphatidylinositol 3-phosphate (ptdins3p) levels. furthermore pik3c2a has been implicated in the biology of the primary cilium. the patient's distinct phenotype resembles the previously described phenotype in pik3c2a hypomorphic mice with pre-and postnatal growth retardation and a broad spectrum of renal abnormalities. the complete knockout of mouse pik3c2a showed embryonic lethality. ongoing studies on the exact consequences of the splicing defect will determine if and how much residual wild-type transcript is retained and if this is a hypomorphic variant. this case is to our knowledge the first description of a pik3c2a human phenotype. adaption of the crispr/cas genome editing system as a platform for the mutation of nipal4 as a representative of arci-associated genes in hela cells n. ballin, m-a. rauschendorf, j. fischer institute of human genetics, freiburg, germany autosomal recessive congenital ichthyosis (arci) is a rare genetic disorder with known disease-causing mutations in 9 genes. 
functional implications of identified mutations are in most cases still unknown, which is due, among other reasons, to the limited number of skin biopsies available from arci patients. thus, suitable cell culture models for the investigation of keratinocyte differentiation are urgently needed. in recent years the prokaryotic clustered regularly interspaced short palindromic repeats (crispr)/crispr-associated (cas) system has been turned into a potent tool in the field of genome engineering. the cas9 endonuclease is directed by a short guide rna (grna) to its target sequence, where it generates a dna double strand break (dsb). as cellular repair mechanisms often fail to reconstitute the original sequence, insertion/deletion (indel) mutations occur, potentially leading to a complete gene knock-out (ko). consequently, the crispr/cas system offers a simple rna-programmable tool for in vitro mutation of arci-associated genes. hence, this system was applied in hela cells to target nipa like domain containing 4 (nipal4), the second most frequently mutated gene in arci patients. in this context, functional studies were used to validate nipal4 as a suitable target in hela cells. successful application of the crispr/cas system led to the generation of a new clonal hela cell line carrying a one basepair insertion in nipal4 exon 1 (c.106insg (ccds 47328.1)). this mutation was further characterized and potentially results in a complete ko of nipal4. this system can now be used as a basis for targeting further arci-associated genes and for transferring the system into keratinocytes, which are the amyotrophic lateral sclerosis (als) is a late-onset progressive, neurodegenerative syndrome. most als cases are sporadic (≈90%). in familial forms, mutations in several different genes have been identified with a repeat expansion in c9orf72 and mutations in sod1 being the most prevalent. no non-genetic cause of als has been identified. 
since there are no overt clinical or pathoanatomical differences between sporadic and familial cases, de novo mutations have been suggested to contribute to disease pathogenesis, and two previous studies provided some evidence for this hypothesis. we present data of 82 patient-parent trios from an international collaborative study. by whole exome sequencing we identified 69 non-synonymous de novo mutations in the patients; however, all of them occurred in different genes. there was no concordance between the mutated genes found in our trio set and the two earlier smaller trio studies. in silico analyses suggest that none of the mutations identified here are part of any of the previously postulated molecular pathways. also, gene-gene-interaction analyses failed to find an enrichment of interacting genes. lastly, we demonstrate that the de novo mutations in als patients in this and the two earlier studies are located in genes prone to de novo mutations in general. our results thus indicate that, in contrast to previous reports, de novo mutations do not seem to be a major contributor to als. disproportioned short stature and multiple anomalies in a patient with a homozygous pik3c2a mutation -a new ciliopathy? we present a 21-year-old female with disproportional growth retardation, eye and renal anomalies. she was born small for gestational age as the first child of healthy, consanguineous tunisian parents at 37 weeks after an uneventful pregnancy (2470 g, 47 cm). during her first year of life, there were severe feeding problems with regurgitation and she developed postnatal growth retardation. developmental milestones were normal. during the second year the girl developed strabismus in both eyes. later on, a cataracta polaris anterior, anomalies of the cornea, hyperopia and progressive retinal degeneration led to profound visual impairment. furthermore, a right kidney agenesis and a complex tubulopathy, as well as increased echogenicity of the single left kidney were diagnosed. r. 
brumm, m. schmuck, y. dinçer, s. schulz, s. wilson, i. rost, hg. klein, sh. eck center for human genetics and laboratory diagnostics dr. klein, dr. rost and colleagues, martinsried, germany next-generation sequencing may lead to a significant improvement in diagnostic yield for rare, heterogeneous disorders through the ability to simultaneously sequence all genes contributing to a certain indication at a cost and speed that is superior to traditional sequencing approaches. on the other hand, the practical implementation of ngs in a clinical diagnostic setting involves a variety of new challenges which need to be overcome. among these are the generation, analysis and storage of unprecedented amounts of data, strict control of sequencing performance, validation of results, interpretation of detected variants and reporting. especially the variant interpretation emerges as current bottleneck of the diagnostic workflow. here we present midas (multiple integration of data annotation software), a central software system for data integration in a diagnostic laboratory. the goal of midas is to construct a modular software system to integrate data from laboratory information management system (lims), data from the routine sanger sequencing workflow as well as ngs sequencing results and correlate the identified variants with the patients' phenotypical features to aid in variant interpretation and accelerate reporting. the phenotype is systematically recorded using the human phenotype ontology standard nomenclature. in particular, genotype-phenotype correlations identified in one patient are made available for all other cases, to aid the interpretation and build a comprehensive knowledge base. the midas software may thus serve as central information system all diagnostic patient data. midas is implemented in java using direct database access via jdbc, and javafx as graphical user interface. 
its architecture is modular and includes a dynamic module loader, user management with an ldap connection and basic search functionality. the user management and search form adjust to the available modules, granting access to module-specific views. as an advantage of this architecture, other molecular diagnostic data, such as arraycgh or mlpa results, can easily be integrated by implementing new modules. midas aims to aid molecular diagnostics by simplifying and accelerating data analysis and interpretation, improving patient care. midas is being developed as part of a prospective multicentric study including clinical, diagnostic and software development partners. a grant by the bavarian ministry of economy, media, energy and technology is used to fund this effort. institute of human genetics -medical faculty -rwth aachen university, aachen, germany, 2 izkf aachen -rwth aachen university, aachen, germany with the implementation of next generation sequencing (ngs) based assays as a key tool in dna sequencing, conventional sanger sequencing has become a standard method to verify single nucleotide polymorphisms (snps) of interest. however, designing specific pcr and sanger sequencing primers has always been a rather time-consuming task: it involves finding the right genomic sequence, considering known snps that could result in allele-specific amplification, and converting selected primer sequences to upload them to web-based services. to circumvent this, we developed optimus primer, a python script that automatically designs the respective primers. the script uses the database of the ucsc genome browser to download positional information of genomic sequence (refseq gene: hg38) and common snps (snp147common: hg38) in the region of interest. the genomic sequence is downloaded via the ucsc primarily affected cell type in the skin of arci patients. 
furthermore, it is only a small step to expand the system from generating gene ko to gene editing, allowing the introduction of patient-specific mutations in non-patient derived keratinocyte cells. hence, setting up the crispr/cas system to target an arci-associated gene is an important starting point for future studies to investigate pathogenic effects of arci-causing mutations and to understand arci pathogenesis on a molecular level. in this context, the transfer of the established protocol into keratinocytes offers the possibility to generate 3d cell culture models of mutations in arci-associated genes and allows in vitro investigation of their implications on the differentiation process. this system would further represent an organotypic model of arci disease with potential application in screening and identification of chemical compounds for the complementation of existing clinical therapies. background: telomeres cap and protect chromosome ends from degradation and fusion, and are therefore essential for maintaining chromosome stability and genomic integrity. they have a length of 5 to 15 kb (depending on age, sex and cell populations). due to the end-replication problem there is a continuous telomere loss of 50 to 150 bp/cell cycle and thus throughout life. in laboratory practice different methods of telomere length measurement are used to identify patients with bone marrow failure syndromes (e.g. dyskeratosis congenita, aplastic anemia), hematological diseases, or other telomeropathies. each telomere length measurement method has its advantages and disadvantages regarding the material required, the complexity and feasibility of the method, and other parameters. methods: in this study we compared and validated four different methods for telomere length measurement, i.e. southern blot analysis, quantitative pcr (qpcr), quantitative fluorescence in situ hybridization (t/c-fish) and flow cytometry-fish (flowfish). 
whenever possible, edta and/or heparin blood samples were collected from a population of 154 healthy individuals of different age groups (newborn -81 years). depending on the method dna (southern blot and qpcr), metaphases (t/ c-fish) or rather vital cells (flowfish) were analyzed. results: comparison and validation of the telomere length measurement methods allowed us to calculate percentiles for all age groups. percentile curves could be used in diagnostic to identify patients with short telomeres. all methods showed acceptable accuracy, but equally imply the necessity of validation and appropriate controls in each experiment. here, flowfish was the most precise, accurate and reproducible method compared to the other methods. discussion: our study emphasizes the influence of expertise and experience that is required in order to produce robust and reliable telomere length analyses. here, we provide advice on how to choose the appropriate method in general and for individual cases to safely discriminate between natural variability and pathological telomere shortening in individual cases. in the next generation sequencing age, the strongest challenges have shifted from genotyping to handling the myriad of variants detected. whole exome sequencing yields tens of thousands of coding and non-coding variants, a number increased to millions if the whole genome is sequenced. to master this mountain of data, computer-based identification of potential disease mutations is absolutely indispensable. however, current computational strategies are usually generated by computer scientists without integrating human geneticists into software development. this often leads to tools which fulfil their main purpose but are not ideally suited to the needs of human geneticists. to assess the relevance of a suggested disease mutation, geneticists need detailed information about the effect on the protein and about the gene or protein itself. 
to close this gap, we have developed mutationdistiller, a web-based tool for user-driven variant prioritisation based on biological disease properties. its analysis includes the potential role of a mutated gene in pathogenesis as well as the estimated effect of a variant on gene/protein function. thus, mutationdistiller allows human geneticists to use every piece of information they consider relevant. unlike similar tools, its input goes beyond the human phenotype ontology and can include complete diagnoses, biological pathways, gene expression, and gene function. potentially harmful variants are identified by mutationtaster, which provides a deleteriousness score together with data on the actual effect of the variants. mutationdistiller is not restricted to non-synonymous variants but can also handle intragenic non-coding or synonymous variants. moreover, the program incorporates the known modes of inheritance of disease genes and the genotype of the queried variants (including compound heterozygosity). the output page provides all information on one page: the core component is a concise overview table of the most likely disease genes and variants. here, the most crucial information, such as the variants' effect on the protein, frequencies in polymorphism databases and known diseases caused by mutations within this gene and their mode of inheritance, is listed. more detailed gene and disease information and hyperlinks to external data are provided below, hence offering a comprehensive overview of the available knowledge. this allows geneticists to draw their own conclusions without any tedious collection of relevant information from the internet. a beta-version of the program is freely available at http://www.mutationtaster.org. m. jäger 1,2 , m. schubach 1 , t. zemojtel 1 , k. reinert 3 , d. m. church 4 , p. n. 
robinson 1, 5, 6 1 institute for medical and human genetics, charité, berlin, germany, 2 bcrt, berlin, germany, 3 institute for bioinformatics, department of mathematics and computer science, freie universität, berlin, germany, 4 10x genomics, pleasanton ca, united states, 5 department of mathematics and computer science, freie universität, berlin, germany, 6 the jackson laboratory, farmington ct, united states with the grch37 human genome release in 2009 the genome reference consortium extended the previous linear, "golden-path" paradigm of the human genome and introduced a more graph-like model in the sense of regions with alternate loci, representing common alterations in sequence and structure. in whole-genome sequencing (wgs) these stretches of sequence, which are largely but not entirely identical between the primary assembly and its corresponding alternate locus, can result in multiple variant calls against regions of the primary assembly. this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (asdps). we developed an algorithm (asdpex) that analyzes these patterns in the 178 structurally variable regions of the current grch38 genome assembly. a heuristic approach then infers whether the pattern of variant calls of a sample contains sequences from the primary assembly, an alternative locus, or their heterozygous combination at each of these 178 regions. api and common snps are annotated. in the following step primers are picked using primer3 (whitehead-institute). the target sequence is either the snp-containing exon in case of exons smaller than 300 bp, or the genomic region flanking the snp of interest. the positional information of the snp is gathered using the mutalyzer api (lumc). the last two bases of each primer are checked for all snps in the 3' end (snp147: hg38). the script will be integrated into a website for free easy access and usability (www.optimus-primer.com). 
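the check of the last two primer bases against known snps can be sketched as follows. this is a minimal illustration and not the optimus primer source code: the `Primer` class, the function names and the coordinate convention (1-based, inclusive genomic positions) are assumptions made for this example, and snp positions are passed in as a plain list rather than fetched from ucsc.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Primer:
    """Hypothetical primer record; coordinates are 1-based and inclusive."""
    start: int   # leftmost genomic position covered by the primer
    end: int     # rightmost genomic position covered by the primer
    strand: str  # "+" for a forward primer, "-" for a reverse primer


def three_prime_positions(primer: Primer, n_bases: int = 2) -> Set[int]:
    """Return the genomic positions of the last n_bases at the 3' end."""
    if primer.strand == "+":
        # A forward primer is extended towards higher coordinates,
        # so its 3' end is its rightmost base.
        return set(range(primer.end - n_bases + 1, primer.end + 1))
    if primer.strand == "-":
        # A reverse primer is extended towards lower coordinates,
        # so its 3' end is its leftmost base.
        return set(range(primer.start, primer.start + n_bases))
    raise ValueError("strand must be '+' or '-'")


def snps_in_three_prime_end(
    primer: Primer, snp_positions: Iterable[int], n_bases: int = 2
) -> List[int]:
    """List the known SNP positions that overlap the primer's 3' end."""
    return sorted(three_prime_positions(primer, n_bases) & set(snp_positions))


if __name__ == "__main__":
    forward = Primer(start=100, end=119, strand="+")
    reverse = Primer(start=300, end=321, strand="-")
    known_snps = [105, 119, 301, 321]  # made-up positions for illustration
    print(snps_in_three_prime_end(forward, known_snps))
    print(snps_in_three_prime_end(reverse, known_snps))
```

a primer pair for which either list is non-empty would be a candidate for rejection, since a mismatch at the 3' end is the case most likely to cause allele-specific amplification.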
the only information needed is the refseq id, the exon, and the cdna position (eg: nm_000.249.3:c.464t>g). the script optimus primer will provide the user with primers in the widely used primer3 format. the users are also able to download the source from our website, and to run it on a local unix system as a command line tool. mutationtaster3: moving towards a comprehensive evaluation of disease causing mutations o. ebner, jm. schwarz, d. hombach, m. schuelke, d. seelow charitè universitätsmedizin, berlin, germany mutationtaster is a free and user-oriented application for comprehensible evaluation of non-synonymous and synonymous as well as non-coding dna sequence variants. as 1500+ citations show, the software has been strongly embraced by the clinical and research community. however, its capacities for assessing the functional consequences of non-coding variants are still limited. while mutationtaster is able to predict the variant's effect on major intragenic regulatory features such as splice sites or polyadenylation signals, many other potential effects of intronic or utr variants are not covered by the software yet. variants in the non-coding region of the genome are frequent and thought to cause a substantial part of yet unsolved mendelian diseases. these variants can occur either in extragenic regulatory regions (see abstract on regulationspotter) or in the untranslated regions, including introns, of a gene. unfortunately, their effects are much harder to predict than those of non-synonymous variants. therefore, only few disease mutations outside the coding sequence have hitherto been found and experimentally confirmed. to close this gap, we are advancing the current version of mutationtaster to improve the analysis of intragenic non-coding variants. we do so by implementing additional tests for variants in the 5' and 3'utr to determine their effect on au-rich elements, microrna binding sites and the influence of secondary structure on gene expression. 
moreover, we plan to shift from the analysis of exclusively monogenic to complex diseases with cancer being the first disease model to be integrated into mutationtaster3. already implemented enhancements of mutationtaster3 comprise the improved splice site analysis using maxentscan and the integration of mitochondrial polymorphisms from the human mitochondrial genome database. the addition of links to the integrative genomics viewer and the lof-metrics of the exac browser (exome aggregation consortium) streamline the usage of the mutationtaster3 results and enable researchers to get a detailed view of the relevant variants and associated genomic sequences. taken together, mutationtaster3 will offer much better capabilities to predict the disease-causing potential of intragenic variants than its predecessor. the software is freely available at http://www.mutationtaster.org neurocure exzellenzcluster, berlin, germany, 2 neuropädiatrie charité-universitätsmedizin, berlin, germany, 3 medizinische genetik charité-universitätsklinikum, berlin, germany, 4 berliner institut für gesundheitsforschung bih, berlin, germany ifications, or genome-wide interactions. all variants which have the potential to influence one or several candidate genes are presented in a graphical interface and ranked according to their predicted effect on the target gene(s). instead of giving scores, we present the potentially affected regulatory features in an intuitive graphical matrix. the software can freely be used at http://www.mutationtaster.org p-techno-179 genecascade2017 -a one-stop shop for finding disease mutations d. seelow 1, 2 in the last years, our genecascade software suite for the elucidation of rare diseases has seen some major extensions, mainly for the discovery of non-coding mutations. the website contains a number of tools focusing on the different steps in studying genetic disorders. 
all applications are web-based, have easy-to-use interfaces and are aimed directly at human geneticists, without any need to install software, use the command line or to try to explain clinical features to it specialists. we think that software should adapt to the user -not the other way around. and, unlike other web shops, its use is completely free. homozygositymapper finds disease-linked regions in consanguineous families. users can upload genotypes from snp chips as well as wes and even wgs data. we display likely disease loci in intuitive graphical interfaces. the results can also be used in our 'downstream' applications such as genedistiller, mutationtaster, or mutationdistiller to identify the actual disease mutation. genedistiller provides a user-driven way to find the best candidate gene for a genetic disorder. users can specify various aspects of the patients' phenotypes and the results are presented in a comprehensive way without the need to manually query other internet resources to collect further information relevant to asses the disease potential. mutationtaster evaluates the disease potential of non-synonymous as well as of non-coding and synonymous dna variants within genes. it does not only display a prediction but also detailed information on the variants' likely effect on mrna and protein. it can either analyse single variants as found by sanger sequencing or complete vcf files from wes or wgs projects. regulationspotter lists information about potentially regulatory dna variants outside of genes. variants are ranked according to their predicted effect on gene regulation. in addition to a score, we provide a user-friendly summary of the functional effects a variant may have. mutationdistiller combines genedistiller and mutationtaster in a single application for convenient variant and gene prioritisation. 
it offers human geneticists much more freedom to enter the phenotype than similar applications and provides deeper information about the variant and the gene. cnvinspector is aimed at the study of copy number variants. users can upload their patients' (or cohorts') cnvs and compare them with their own healthy controls or public data stored in our database. we include decipher to indicate known diseases caused by cnvs at the same position. epossum examines the effect of dna variants on transcription factor binding. as tf binding site prediction is notoriously unreliable, we also give an indication of the statistical relevance of a result. the genecascade website can be accessed at http://www.mutationtaster.org. we investigated 121 in-house wgs datasets and found that on average 51.8 ± 3.8 of the 178 regions correspond to an alternate locus rather than the primary assembly sequence. filtering these genomes with our algorithm identified around 7900 variant calls per genome that colocalized with asdps. our findings suggest the potential of fully incorporating the resources of graph-like genome assemblies into variant calling. our algorithm already uses the information contained in the 178 structurally variable regions of the grch38 genome assembly to avoid spurious variant calls in cases where samples contain an alternate locus rather than the corresponding segment of the primary assembly. the human phenotype ontology project provides three resources: the ontology of clinical features, disease-phenotype associations and algorithms that enable analysis of data that is described using hpo. this resource is being used for computational deep phenotyping and precision medicine. it enables clinical data integration for translational research. the hpo is being increasingly adopted in software, research projects and companies world-wide. we will discuss the progress and recent developments that the hpo project has made since 2008. 
this will include the expansion of hpo for common (complex) diseases, novel algorithms for phenotype-driven analysis of genomic variation, cross-species mapping of phenotype data, translation of hpo into several languages and also the addition of a more patient-friendly terminology for hpo terms. millions of patients worldwide suffer from a rare genetic disease. to date, the omim database lists more than 8000 diseases with proven or suspected mendelian basis, but the molecular cause is known for less than 60%. although next generation sequencing has drastically facilitated the discovery of disease mutations, a substantial fraction of whole exome sequencing (wes) projects fail to identify the causal variant. this may be due to the fact that disease-causing mutations are not always located within the coding sequence. whole genome sequencing (wgs) could solve this problem, but we are not yet readily prepared to handle the vast amount of variants which are generated by this technique, as intuitive and reliable software solutions for variant evaluation are missing. the currently existing approaches are not well-suited for a diagnostic setting, because they output a battery of different scores whose interpretation is left to the users. however, human geneticists are specialists for the assessment of symptoms of genetic diseases -not for the analysis of numerical scores indicating different likelihoods of the occurrence of regulatory elements. this is especially true when it comes to the hundreds of thousands of extragenic variants found by wgs, for which effect predictions always face a high level of uncertainty. we consider it crucial to provide geneticists with the information they need to determine the significance of a variant, not to flood them with scores whose meanings are difficult to grasp. to facilitate the interpretation of extragenic variants, we are developing regulationspotter. 
in contrast to other approaches such as genomiser or cadd, we integrate as much knowledge as possible in a user-friendly fashion to pinpoint the variants which are most likely to disturb the expression of candidate genes. regulationspotter includes different data on regulatory dna elements such as dna methylation, transcription factor binding sites, histone mod-the human exon 1 in the 5' end of the murine huntingtin gene. by using a novel object recognition test with a 24 h interval between sample and test phase we have found a profound deficiency of hippocampus dependent long-term memory in heterozygous transgenics. this phenotype was detected as early as 12 weeks of age and is complementary to deficits that we have identified in the hdhcag111 mouse model previously. motor deficits as well as intranuclear aggregates are described at much later stages in both of these models. we have shown previously that in hd patients, mediated through mtor signaling, translation of mrna carrying expanded cag repeats is elevated (krauss et al., 2013) . we have also seen that the biguanid metformin antagonizes mtor signaling in neurons in-vitro and in-vivo (kickstein et al., 2010) . we show here that metformin, by interfering with the mtor kinase and its opposing phosphatase, pp2a, regulates local protein synthesis in the brain and is able to suppress the production of disease making protein in early hd. furthermore metformin leads to a significant improvement of movement abilities in a c-elegans model for hd and to a rescue of early cognitive symptoms in the hdhcag150 animal model. these data suggest that metformin is a very promising candidate for early phase treatment of hd patients. allele-specific suppression of dominant-negative bestrophin 1 mutations a. milenkovic, lm. braun, f. grassmann, b. h. f. 
weber institute of human genetics, regensburg, germany purpose: retinal pigment epithelium (rpe) differentiated from human induced pluripotent stem cells (hipsc) demonstrated degradation and mislocalization of mutant bestrophin 1 (best1) protein in autosomal dominant best disease (bd). importantly, mutated alleles revealed a dominant-negative effect leading to an impairment of volume-regulated chloride transport, the basic function of the homo-pentameric best1 channel. here, our study aimed for a proof-of-concept to treat bd by selectively eliminating best1 mutant transcripts in patient-derived hipscs prior to rpe differentiation via the crispr/cas9 genome editing technology. methods: adult human dermal fibroblast were obtained from skin biopsies of bd patients and reprogrammed into hipscs. single guide rna sequences (sgrna) targeting 6 best1 mutations were selected by the "optimised crispr design tool" (zhang lab, mit 2015) . editing efficiency and specificity of designed sgrnas were tested in hek293 cells using an established fluorescence-based assay. after transfection of hipscs the percentage of indel formation of on-and off-targets was determined by a pcr-based crispr/cas cleavage detection kit. cas9-treated stem cell populations were analyzed for pluripotency and selected for full genome sequencing before differentiation to rpe cells. results: computational design of 6 disease-causing best1 variants (n11k, v86m, s108r, q238r, a243v, and i295del) offered at least one sgrna with predicted high quality per mutant allele. the guide sequences were cloned into the cas9-expressing plasmid px459 and co-transfected with pcag-egxxfp plasmids containing genomic fragments of ~500 bp of either the mutated or wildtype sequence to the corresponding sgrna. 
as targeted cas9 cleavage results in reconstitution of the egfp expression cassette by homology-dependent repair, the efficiency and specificity of cas9 cleavage were evaluated on a plate reader by quantifying egfp fluorescence 48 h after transfection. as a result, 5 out of the 6 sgrnas tested showed high allele specificity and are now used for targeted genome editing in hipscs. conclusion: so far, there is no treatment for bd although the molecular pathology of best1 has recently been established. our proof-of-concept study aims to determine whether haploinsufficiency of normal best1 protein is sufficient to fully or partly restore cellular function in cells of primary bd pathology, namely the rpe. to this end, we will determine the degree of rescue (i.e. reconstitution of volume-regulated chloride conductance) by whole-cell patch-clamp analysis. if successful, our crispr/cas9-driven approach will be useful to treat other diseases with dominant-negative effects of the mutated allele.

p-therap-184 *** metformin rescues early cognitive symptoms in the hdhcag150 mouse model and is therefore a promising candidate for treatment of hd patients.

huntington's disease (hd) is an autosomal dominant neurodegenerative disorder that is caused by an unstable glutamine (cag) trinucleotide repeat expansion within exon 1 of the huntingtin gene and leads to cognitive decline and impaired motor abilities. in the prodromal phase of the disease, patients develop mood swings, personality changes and subtle cognitive impairment. a close understanding of the clinical signs and molecular mechanisms behind this early stage of hd is an important step towards the development of a causal therapy.
we have analysed a knock-in mouse model that carries 150 cag repeats and

in conclusion, targeted sequencing represents an effective, fast, cost-efficient and flexible method for the sequential analysis of chh patients, since new candidate genes can easily be added to the panel. furthermore, panel sequencing facilitates the uncovering of oligogenic inheritance in genetic traits like chh.

p-monog-159 neurodegeneration in the olfactory bulb and olfactory impairment in the ccdc66-/- mouse model for retinal degeneration s. schreiber 1 , e. petrasch-parwez

one hallmark of sca3 and other

mechanisms that govern ccdc66-deficient degeneration, a detailed evaluation is performed in order to reveal processes that contribute to retinal degeneration during early postnatal development and adulthood.
methods: the functional role of ccdc66 deficiency is investigated by two independent approaches: 1) gene expression profiles at p10 and p28 are analyzed in retinal tissue of ccdc66-/- and wildtype mice (genechip mouse gene 2.0 st array, affymetrix), followed by quantitative real-time pcr (qrt-pcr), and 2) potential retinal interaction partners of the ccdc66 protein are identified by yeast two-hybrid screening in the wildtype mouse and further analyzed by immunohistochemistry. results: using two screening methods (rna expression profiles and protein interaction partners), our results indicate that 1) the ccdc66-deficient mouse model reveals early changes in retinal gene expression already at p10, with the highest number of differentially expressed genes at p28. most expression differences were related to genes associated with the extracellular matrix. further genes are involved in retinal degeneration, angiogenesis and proteolysis or encode transcription factors. 2) the ccdc66 protein interacts with the proteins eps8 (epidermal growth factor receptor kinase substrate 8) and mpdz (multiple pdz-domain protein). both proteins are expressed in several layers of the retina, as confirmed by immunohistochemistry. moreover, mutations in the mpdz gene have already been identified in patients with retinitis pigmentosa/leber congenital amaurosis. conclusion: expression profiles reveal expression changes at an early time point of retinal degeneration in the ccdc66-/- mouse model, enabling further studies on the role of genes and processes involved in early retinal degeneration. in addition, the interaction partners of ccdc66, eps8 and mpdz, are the basis for further studies examining the pathways of retinal degeneration in the mammalian retina, and may contribute to future studies of human disease.
ac. koller 1,2 , j.
strohmaier 3 , ku. ludwig 1,2 , fc. degenhardt 4 , m. wulff 1,2 , d. breuer 1,2 , l. winkler 1,2 , f. neukirch 1,2 , a. maaser 1,2 , a. forstner 1,2 , s. sivalingam 1,2 , a. reif 5 , a. ramirez 6 , w. maier 6 , d. rujescu 7 , i. giegling 7 , h. thiele 8 , p. nürnberg 8,2 , a. fischer 9

congenital hypogonadotropic hypogonadism (chh) is a rare, clinically and genetically heterogeneous disorder. chh is characterized by incomplete or absent puberty caused by the lack or deficient number of hypothalamic gonadotropin-releasing hormone (gnrh) neurons, disturbed secretion or action of gnrh, or both. chh is often associated with anosmia and is then termed kallmann syndrome (ks), as well as with other phenotypes like unilateral kidney agenesis, skeletal abnormalities, midline malformations, and hearing loss. x-linked, autosomal-dominant, and autosomal-recessive, as well as di- and oligogenic inheritance have been described for chh. a multitude of genes has by now been reported to be associated with chh. currently, the underlying genetic cause can be identified in fewer than 40% of chh cases. we analyzed a total of 19 patients with chh (4 female/15 male) by targeted sequencing of 16 chh-associated genes (kal1, chd7, fgf8, fgfr1, fshb, gnrh1, gnrhr, hs6st1, kiss1, kiss1r, nsmf, prok2, prokr2, spry4, tac3 and tacr3) and identified pathogenic mutations in 8 chh patients of our cohort. mutations were detected in kal1 (3 patients), tacr3 (2 patients), prokr2 (1 patient), hs6st1 (1 patient) and gnrhr (1 patient). furthermore, in two patients with described pathogenic mutations (one with a prokr2 mutation and the other with a kal1 mutation), we found additional mutations in the nsmf gene and the tacr3 gene, respectively, suggesting digenic inheritance in these cases.

pain is necessary to alert us to actual or potential tissue damage.
specialized nerve cells in the body periphery, so-called nociceptors, are fundamental for mediating pain perception, and humans without pain perception are at permanent risk of injuries, burns and mutilations. pain insensitivity can be caused by sensory neurodegeneration, which is a hallmark of hereditary sensory and autonomic neuropathies (hsan). although mutations in several genes were previously associated with sensory neurodegeneration, the etiology of many cases remains unknown. using next-generation sequencing in patients with congenital loss of pain perception, we here identify bi-allelic mutations in the flvcr1 (feline leukemia virus subgroup c receptor 1) gene, which encodes a broadly expressed heme exporter. different flvcr1 isoforms control the size of the cytosolic heme pool required to sustain the metabolic activity of different cell types. mutations

institute of human genetics, bonn, germany, 2 the rild institute, wonford, exeter, uk

introduction: ectodermal dysplasias (eds) are a large group of heterogeneous genetic disorders characterized by abnormal development of ectoderm-derived tissues and organs including skin, hair, and nails. among the eds, pure hair and nail ectodermal dysplasia (phned) is a rare genodermatosis characterized by nail dystrophy and sparse or absent hair on the scalp. materials and methods: a family of iranian origin was enrolled in this study. two children from a consanguineous marriage are affected by phned. in addition, the father has alopecia areata (aa) but does not show any nail dysplasia. the mother is unaffected. the paternal and maternal grandfathers had nail dysplasia similar to that of the siblings but did not manifest any hair loss or aa. homozygosity mapping and genedistiller analysis were performed to identify candidate genes. sanger sequencing was used to localize the mutations. results: in total, we identified 10 homozygous regions with almost 700 candidate genes.
among these genes were krt85 and hoxc13, which are already known to be related to the phenotype of our patients. therefore, we focused our additional analyses on krt85 and hoxc13. sanger sequencing showed a previously unknown homozygous insertion of 28 bp in exon 2 of hoxc13. conclusion: we identified a previously unknown mutation, which expands the spectrum of mutations for phned. the hair loss of the father seems rather to be due to a distinct type of hair loss, namely aa, which is quite common in the general population. however, the cause of the nail dysplasia in both grandfathers is unclear and can no longer be examined. it remains open whether it was due to the same mutation in hoxc13 or to a different mutation.

generation of a cell culture model for epidermodysplasia verruciformis by knock-out of ev3, a novel gene involved in this genodermatosis e. imahorn 1 , m. aushev 2 , sj. de jong 3 , e. jouanguy 4, 5, 6 , jl. casanova 4, 5, 6 , ph. itin 1, 7 , j. reichelt 2, 8

epidermodysplasia verruciformis (ev) is a rare hereditary skin disease leading to susceptibility to certain types of cutaneous human papilloma viruses, mainly β-hpv, and a high risk of developing cutaneous squamous cell carcinoma. homozygous or compound heterozygous loss-of-function mutations in either tmc6 or tmc8 have been described in several ev patients, but more than 1/3 of affected families do not have mutations in either of them. a third gene (ev3) has recently been found to be mutated in ev patients without tmc6 or tmc8 mutations. investigations into the function of this gene may elucidate pathomechanisms of ev. we aim to generate an ev3-deficient model cell line to study the effects of ev3 despite the scarcity of patient material.

detection of deletions during diagnostic massive parallel ngs gene panel sequencing

neuronal differentiation at a later time point.
a novel mutation c.4a>g, p.ser2gly in the des gene in a family with catecholaminergic polymorphic ventricular tachycardia s. komatsuzaki, p. villavicencio lorini, k. hoffmann institute for human genetics, university hospital halle, germany

background: myofibrillar myopathy (omim 601419) is a heterogeneous neuromuscular disease characterized by progressive muscle weakness, in some cases with cardiomyopathy and/or arrhythmia. histopathological examinations show desmin-positive protein aggregates. mutations in des, cryab, bag3, myot, ldb3, flnc, fhl1, and dnajb6 have been identified in patients with myofibrillar myopathy. desmin is an intermediate filament protein composed of non-helical n- and c-terminal domains (head and tail) and a central α-helical rod domain. to date, more than 67 mutations have been identified in patients with myofibrillar myopathy, and a genotype-phenotype correlation has been reported: mutations in the rod domain tend to cause neuromuscular symptoms, whereas mutations in the head domain are associated with cardiac symptoms. the head domain is a serine-rich domain and is phosphorylated by protein kinase c. almost all reported mutations in the head domain of des are at serine residues (e.g. s2i, s7f, s13f, s46f, s26y, s26t). these findings suggest that the serine residues in the head domain could play an important role in the biological function of desmin. here we report a novel mutation, c.4a>g, p.s2g, in des in a family with catecholaminergic polymorphic ventricular tachycardia. case: a 58-year-old patient developed syncope in a cold winter. the cardiological investigations led to a diagnosis of catecholaminergic polymorphic ventricular tachycardia, and implantation of a permanent pacemaker was indicated. his older brother, father and paternal uncle suffered from arrhythmia, and all of them received a permanent pacemaker at around 60 years of age. his cousin died at the age of 24 years due to sudden cardiac arrest.
we performed a genetic analysis of the ryanodine receptor 2 gene and the lamin a gene, and no mutation was identified in these genes. next, we performed exome sequencing, which identified a novel heterozygous mutation, c.4a>g, p.s2g, in des. this mutation was also identified in the uncle of the index patient. c.4a>g, p.s2g in des has been reported neither in the 1000 genomes project nor in the exome aggregation consortium. a mutation at serine 2 (p.s2i) was previously reported in patients with myofibrillar myopathy with cardiac symptoms. based on these findings, we strongly suspect that p.s2g in the des gene is a causative mutation for myofibrillar myopathy with arrhythmia in our patients. discussion: recent studies suggest that mutations in des not only affect muscle stability and myocardial force generation but also impair the ubiquitin-proteasome system, which could be caused by aggregated desmin. generally, phosphorylation of serine and threonine residues in the head domain of desmin is related to disassembly of filaments. therefore, the p.ser2gly mutation in des could cause altered phosphorylation of desmin. the functional abnormalities caused by p.ser2gly in the des gene need to be studied.

gene dosage manipulation of the chromatin organizer ctcf in the nervous system of drosophila melanogaster results in neurological and morphological phenotypes e. konrad, a. gregor, m. brech, c. zweier institute of human genetics, fau-erlangen-nürnberg, erlangen, germany

three-dimensional organization of eukaryotic genomes is crucial for temporal and spatial regulation of gene expression. architectural proteins like the ccctc-binding factor ctcf are responsible for establishing and maintaining this organization. ctcf is involved in virtually all chromatin-regulating processes including enhancer function, insulation, alterna-

for this purpose we used the crispr/cas9 system to delete the ev3 coding sequence in an immortalized keratinocyte line.
after isolation by facs, single-cell clones were expanded. we screened 41 clones for deletion of the whole gene as well as for expression of ev3. three clones showed no detectable gdna sequence or expression of the ev3 gene. we found only one wildtype clone without deletion of ev3 or alterations near the cas9 cut sites. all clones have been characterized by snp array as well as by sequencing of the knockout site and the most probable off-target sites. these ev3-deficient keratinocyte lines are the first cell culture model for ev. they will be a valuable tool to identify cellular pathomechanisms of the disease and allow insight into the control of β-hpv in the general population.

hemizygous loss of function of single x-chromosomal genes is the most frequent cause of genetic intellectual disability (id). while a long list of gene mutations has so far been described as responsible for the disease phenotype, little is known about the underlying neuronal mechanisms. reprogramming of somatic cells into induced pluripotent stem cells (ipscs) followed by differentiation into neuronal precursor cells (npcs) is an important tool for translational research, allowing an understanding of network dysfunction in id patients. opitz bbb/g syndrome (os) is characterized by a number of ventral midline defects combined with learning disability, developmental delay and intellectual disability. it is caused by mutations in the x-linked mid1 gene, which, as we have shown previously, regulates mtor-dependent local protein synthesis. in a mouse model, loss of mid1 function leads to significant disturbance of axonal outgrowth. we have generated ips cells from several patients with os and from one mother who carries a loss-of-function mutation in the mid1 gene.
by sorting cell clones after reprogramming, we have established ips cell clones from this female carrier of a 4 bp deletion in the mid1 gene (c.1801_1804delctcc) that express either from the mutated or from the non-mutated x-chromosome, and have shown that the generated ipsc clones indeed express either 0% of the mutated or 0% of the wildtype mid1. we determined this directly using an allele-specific mid1 rt-pcr and indirectly by comparing the methylation of humara alleles. comparison of mutation-expressing ipsc clones with non-mutation-expressing ipsc clones showed that the mid1 mutation results in significantly smaller cells with reduced s6 phosphorylation, supporting aberrations in the mtor/pp2a signaling cascade. when differentiating ipscs into neuronal precursor cells, significantly bigger embryoid bodies (ebs) can be detected in the mutation-expressing clones, while the total eb number is higher for the clones expressing non-mutated mid1. when further differentiated, ebs from all ipsc clones form neuronal rosette structures expressing beta-iii-tubulin, with a lower rosette structure count in the cells expressing mutated mid1. these data clearly hint at a defect in neurogenesis in cells with hemizygous mid1 mutations. interestingly, while ipscs stably kept the x-inactivation pattern of the original fibroblasts, x-inactivation was lost during differentiation, leading to biallelic expression from both x-chromosomes in the npcs and pointing towards an as yet unknown reactivation mechanism of the inactive x-chromosome. we are currently analyzing x-inactivation throughout the differ-

mutations in the aminoacyl-trna-synthetase genes sars and wars2 are associated with autosomal recessive intellectual disability l

intellectual disability (id) is the common feature of a very heterogeneous group of disorders, which comprises a broad variety of syndromic and non-syndromic phenotypes.
here we present mutations in two aminoacyl-trna synthetases that are associated with id in two independent iranian families. in the first family, we found a missense mutation (c.514g>a, p.d172n) in the cytoplasmic seryl-trna synthetase (sars) gene that affects the enzymatic core domain of the protein and impairs its enzymatic activity. this probably leads to reduced trnaser concentrations in the cytoplasm. in silico analyses predicted the mutant protein to be unstable. this prediction could be experimentally substantiated by results obtained through studies with ectopic mutant sars in transfected hek293t cells. in the second family, we identified a compound heterozygous genotype of the mitochondrial tryptophanyl-trna synthetase (wars2) gene, consisting of a frameshift mutation (c.325dela, p.ser109alafs*15), which very likely leads to nonsense-mediated mrna decay, in combination with a missense mutation (c.37t>g, p.w13g). the p.w13g mutation affects the mitochondrial localization signal of wars2, leading to mislocalization of the mutant protein. thus, when taking into account aimp1, which we have recently implicated in the aetiology of id as well, there are now three genes with a role in trna aminoacylation that are associated with this condition. hence we propose that the functional integrity of trnas in general is an important constituent in the development and maintenance of human cognitive functions.

dravet syndrome is a rare autosomal dominant genetic disorder with early-onset epileptic encephalopathy that is mainly caused by different de novo mutations of the scn1a gene encoding the type 1 subunit of the voltage-gated sodium channel. whole exome sequencing (wes) enables scanning of a large number of genes, which not only can confirm the diagnosis but is also helpful in understanding possible relationships between clinical manifestations and mutation. the authors performed wes in an 18-month-old boy, born at term to healthy, related parents, who presented with hypotonia and convulsions.
the patient had uncontrolled seizures without fever, and developmental delay from 6 months of age. in the treatment protocol, the anticonvulsants were changed to valproate, clonazepam and stiripentol. a deleterious novel heterozygous splice site mutation in scn1a

tive splicing, imprinting, v(d)j recombination, chromatin loop formation and defining topologically associated domains (tads). recently, we identified de novo mutations in ctcf in patients with a surprisingly mild phenotype of variable developmental delay or intellectual disability, mild short stature and microcephaly, and behavioural anomalies. apart from the observation of brain malformations and early lethality or learning deficits in two conditional knockout mouse models, little is known so far about the role of ctcf in neuronal development and preservation. therefore, we utilized the model organism drosophila melanogaster to further explore the role of ctcf in cns development and function. similar to observations in knockout mice, complete knockout or ubiquitous knockdown of ctcf is embryonic lethal in drosophila. we therefore utilized the uas/gal4 system to induce tissue-specific knockdown or overexpression of ctcf in the fly nervous system. we first investigated development and morphology of the larval neuromuscular junctions (nmjs), an established model for synaptic development. while pan-neuronal overexpression of ctcf showed no morphological nmj alterations, pan-neuronal knockdown resulted in fewer nmj branches than in a specific control. additionally, we observed a reduced number of active zones in a hypomorphic mutant line compared to a wildtype control. using the negative gravitaxis assay to examine gross neurological function, we found a highly significant impairment of geotaxis behavior in flies with ctcf knockdown in neurons, motoneurons and muscle and in flies with overexpression of ctcf in glia cells, muscle and motoneurons. currently we are testing learning and memory behavior with the courtship conditioning paradigm.
our findings of various neurological and morphological anomalies upon manipulation of ctcf dosage in the fly nervous system emphasize the role of ctcf in nervous system development and function and provide a basis to further study the molecular mechanisms underlying cognitive dysfunction caused by ctcf deficiency.

congenital or early-onset nystagmus (cn) is characterized by involuntary eye movements and shows enormous clinical and genetic heterogeneity. cn may be an ambiguous sign of many different diseases, including retinal dysfunction/degeneration, ocular/oculocutaneous albinism, and severe central nervous system disorders, such as pelizaeus-merzbacher or pelizaeus-merzbacher-like diseases (pmld). due to the enormous heterogeneity found among the diseases leading to cn, whole exome sequencing (wes) combined with panel-based bioinformatics was considered as an approach to rapidly identify disease-associated genetic sequence variants. we have analyzed 9 families with 16 cn-affected patients. herein, we present three clinically different patients who were initially affected with cn but developed further clinical symptoms of varying severity. one of these patients showed features of retinal dysfunction, including night blindness and myopia, while the two other patients developed severe phenotypes including mental retardation or pmld. wes identified four genetic variants in genes associated with cn. the first patient showed a hemizygous splice-donor variant (c.2576+1g>a) in the calcium channel voltage-dependent alpha-1f subunit (cacna1f) gene, the second patient carried a hemizygous variant (c.1403g>a, p.r468h) in the ferm domain-containing protein 7 (frmd7) gene, and the third patient showed two heterozygous variants (c.291c>g, p.y97* and c.716t>c, p.v293a) in the gap junction protein gamma-2 (gjc2) gene. sanger sequencing confirmed the identified variants in the index patients and verified co-segregation in several family members.
our results suggest a beneficial role of wes to identify the molecular causes of cn and to rapidly confirm an initially unclear clinical diagnosis. especially, patients with rare and severe disorders (e. g. pmld) will benefit from a wes analysis performed in the early stage of the disease. the ccdc66-deficient (ccdc66-/-) mouse model exhibits slow retinal degeneration similar to a human retinitis pigmentosa (rp) phenotype (gerding et al., hum mol genet., 2011) . in order to determine whether ccdc66 gene expression might also play a role outside the retina, this study aimed at characterizing ccdc66 protein expression during early postnatal development of the mouse brain. furthermore, morphological and behavioral impact of ccdc66 deficiency in the mouse brain was analyzed. methods: ccdc66 protein expression was determined by sds page and western blot in whole brain homogenates and in selected brain regions of interest (olfactory bulb, hippocampus, cortex, striatum, cerebellum, brain stem) during early postnatal development and in adult wildtype (wt) mice. in addition, cryosections of the ccdc66-/-olfactory epithelium and bulb (during postnatal development) and the rostral migratory stream (in adult) were analyzed for ccdc66 reporter gene expression by x-gal staining. selected brain regions were additionally analyzed by electron microscopy. in order to correlate anatomical with behavioral data, olfactory performance was studied in aged ccdc66-/-mice compared to ccdc66+/+ controls by an olfactory habituation/dishabituation test (yang and crawley, curr protoc neurosci., 2009) , where olfactory exploration-time during the presentation of neutral and social odors is examined. results: ccdc66 protein was detected throughout the early postnatal development of the wt mouse brain, decreasing after birth. amongst analyzed brain regions, highest expression of ccdc66 protein was detected in the olfactory bulb exhibiting similar ccdc66 levels to retinal expression. 
accordingly, ccdc66 reporter gene expression was demonstrated in the mature olfactory bulb glomeruli, the adjacent olfactory epithelium and along the rostral migratory stream in the ccdc66-/- mouse brain. interestingly, strong ccdc66 reporter gene expression in glomeruli of the ccdc66-/- olfactory bulb was correlated with signs of degeneration in the ccdc66-/- mouse, but not in controls. the degeneration was also reflected by olfactory impairment in ccdc66-/- mice, which spent significantly less time sniffing at the initial presentation of unknown, neutral odors and barely responded to social odors. conclusion: besides the retina, ccdc66 protein plays a crucial role in the olfactory system, as shown by its expression there as well as by ccdc66 deficiency resulting in neurodegeneration and alteration of olfaction-related behavior in the ccdc66-/- mice. as impairment of the olfactory sense is a common finding in multiple neurodegenerative disorders, the ccdc66-/- mouse model is not restricted to the study of retinal degeneration but may also be used to study degeneration of the central nervous system.

background: the ccdc66-deficient (ccdc66-/-) mouse model exhibits slow retinal degeneration similar to a human retinitis pigmentosa (rp) phenotype (gerding et al. 2011). in order to gain insights into the molecular

disruptions in cilia structure or function lead to a class of human disorders called ciliopathies. joubert syndrome is characterized by a wide spectrum of symptoms, including a variable degree of intellectual disability, ataxia, and ocular abnormalities. here we report a novel homozygous missense variant (c.223g>a; p.g75r) in the arl13b gene, which we identified by whole exome sequencing of a trio from a consanguineous family with multiple affected individuals suffering from intellectual disability, ataxia, ocular defects, and epilepsy. the same variant was also identified in a second family.
we saw a striking difference in the severity of ataxia between affected male and female individuals in both families. functional analysis demonstrated that dihydrotestosterone treatment of sh-sy5y cells induced a downregulation of arl13b expression. both arl13b and arl13b-p.g75r expression rescued the cilia length and shh defects displayed by arl13bhennin (null) cells, indicating that the mutation did not disrupt either arl13b function. in contrast, arl13b-p.g75r displayed a marked loss of arl3 guanine nucleotide-exchange factor activity, despite retention of its gtpase activities, highlighting the correlation between its loss of function as an arl3 guanine nucleotide-exchange factor and joubert syndrome. s. renner, a. busch, t. bierhals, j. butter, v. kolbe, g. rosenberger institute of human genetics, university medical center hamburg-eppendorf, hamburg, germany taad (thoracic aortic aneurysm and dissection) is a heterogeneous disease that often remains silent until a life-threatening complication occurs. it belongs to the connective tissue disorders and causes 1% of deaths in industrial countries. several disease genes have already been identified; however, about 50% of patients with taad-associated syndromes do not show a mutation in these genes. thus, further heterogeneity is obvious. since individual risk stratification and therapeutic options highly depend on the individually mutated gene, it is very important to identify further disease genes, which we aim to find by whole exome sequencing (wes). many of the known disease genes encode proteins that are important for the structure and stabilization of the extracellular matrix as well as for the contraction of vascular smooth muscle cells. one central pathway is tgf-beta signaling, which functions among other proteins via the tgf-beta receptor, its ligand and its downstream target smad2/3. 
our project plan includes (i) exome sequencing both in affected individuals within families as well as in sporadic patients, (ii) filtering of raw data and prioritization of sequence variants by using an in-house bioinformatic pipeline, (iii) verification of novel putative disease genes in a cohort of 200 mutation-negative patients with taad spectrum disease and (iv) functional analyses to gain deeper insight into the pathobiology of taad. in a first round of wes analysis and variant prioritization, we identified a highly conspicuous sequence variant in three family members with taad. screening of the respective gene in a large cohort of mutation-negative patients revealed another variant in two siblings with taad. structural and functional considerations strongly support deleterious effects for both identified putative pathogenic missense variants that affect a novel cell cycle- and/or apoptosis-regulating protein. functional analyses did not show an involvement of this protein in tgf-beta signaling. in ongoing experiments, we focus on mutation-induced consequences on cell proliferation, cell cycle progression and apoptosis. indeed, we found inhibitory effects of the missense variants on proliferation by affecting the cell cycle key protein cdkn1a. we hypothesize that dysregulation of proliferation and/or apoptosis of specific cells, e.g. smooth muscle cells, underlies taad. for screening. positively tested compounds were reanalyzed by whole-cell patch clamp recordings and cell volume measurements. results: the halide assay revealed reproducible halide permeability across wells and, as a control, reliably detected mdckii cells overexpressing wild type best1 by a decrease of yfp fluorescence to 70% following 60 seconds of iodide stimulation. cells harboring mutant best1 showed 85% of default yfp fluorescence after the same time interval. 
conclusion: the current study established an assay appropriate for high and small-scale compound screening targeting best1 localization and function. this assay will be used to screen for compounds in the mutant cell lines best1-t6p and best1-y227n for their ability to improve trafficking to the pm or to correct protein folding to enhance ion permeability. background: neuropeptide y-y2 receptor (y2 receptor), an auto-receptor of neuropeptide y (npy) and attractive guanine nucleotide (g) protein-coupled receptor target, has been implicated as a potential therapeutic target for many clinical conditions, including epileptic seizure, depression, pain, and alcoholism. in huntington's disease (hd) patients and animal models of hd, npy-expressing striatal interneurons are selectively preserved and increased with advancing disease. however, the potential role of the y2 receptor in hd pathology remains under-explored. aims: to investigate whether activation of the y2 receptor using npy and selective y2r ligands could ameliorate behavioral deficits and neuropathology in the r6/2 mouse model of hd. methods/techniques: npy and the selective y2 receptor agonist npy13-36 were intranasally administered to r6/2 mice, five days a week, beginning from 4 weeks of age until 12 weeks of age. in the second study, r6/2 mice received daily intraperitoneal administration of a selective non-peptide y2 receptor antagonist (sf-31) to selectively block the y2 receptor. results/outcome: intranasal application of npy showed a significant increase in rotarod performance compared to the saline and sf-31 treated r6/2 mice (*p < 0.05 and **p < 0.01 at 8 and 12 weeks of age respectively, n = 12). however, treatment with npy13-36 showed a clear trend towards increased rotarod performance at 12 weeks of age compared to the saline and sf-31 treated r6/2 mice but the difference did not reach significance. 
also, treatment with npy and npy13-36 showed no significant effect on body weight loss in r6/2 mice, contrasting with previous data obtained with single intracerebroventricular (icv) injection of npy in r6/2 mice. furthermore, intranasal application of npy or npy13-36 led to a decrease in mutant huntingtin (htt) aggregation and mediated an increase in dopamine- and camp-regulated phosphoprotein (darpp-32) and brain derived neurotrophic factor (bdnf) levels. additionally, we found that npy and npy13-36 attenuate microglial activation, inducible nitric oxide synthase (inos) expression, and proinflammatory cytokine production in r6/2 mice compared to the saline and sf-31 treated r6/2 mice. conclusion: taken together, our findings suggest that targeting the npy-y2 receptor might be a potential neuroprotective therapy for hd and other neurodegenerative diseases. best of both worlds: a novel, rapid capture protocol that overcomes drawbacks associated with dna fragmentation in established methods j. seggewiß, c. ruckert, p. wieacker institute of human genetics, westfaelian wilhelms-university of muenster, muenster, germany rapid capture protocols are an attractive proposition for clinical sequencing labs, as they enable quicker sample-to-sequencing turnaround times. the fragmentation of input dna for the construction of pre-capture libraries is a bottleneck in established protocols. mechanical shearing is the gold standard, but is laborious using single-tube covaris instruments; and higher-throughput instrumentation is cost-prohibitive to many smaller labs. "tagmentation"-based methods (e.g. the nextera rapid capture system from illumina, or agilent's sureselect qxt system) employ transposases for fast and simple library construction. however, these protocols are associated with significant sequence bias, especially with low-quality ffpe samples, and are extremely sensitive to dna input, thus requiring meticulous quantification of viscous, high-molecular weight dna. 
here we describe a newly-developed rapid capture protocol that combines the kapa hyperplus kit (kapa biosystems) with integrated, low-bias enzymatic fragmentation, and agilent's proven sureselect xt target enrichment technology. the streamlined method allows for the preparation of high-quality, sequencing-ready libraries in one working day. the novel enzymatic fragmentation reagent does not require careful quantification of input dna, yielding reproducible fragmentation profiles optimal for capture over the 5-fold range tested (50-250 ng). the single-tube kapa hyperplus protocol results in very efficient conversion of input dna to pre-capture library, thereby decreasing duplication rates and increasing the complexity of the library going into the modified, 90-min sureselect fast xt hybridization protocol. our protocol represents a significant improvement for fast routine diagnostics, where robust and reproducible pipelines are needed to support timely treatment decisions. lmj. braun, a. milenkovic, bhf. weber institute of human genetics, regensburg, germany purpose: human bestrophin-1 (best1) is a chloride channel controlled by ca2+ and cell volume and is localized at the basolateral membrane of the retinal pigment epithelium (rpe). so far, there is no therapy for the best1-associated diseases, of which the most common is best vitelliform macular dystrophy (bvmd). in this study, we developed an assay applicable for high and small-scale compound screening targeting best1 localization and function. methods: to assess best1 channel function we developed a halide assay. briefly, mdckii cell lines were established, stably expressing wildtype best1 or bvmd-associated best1 mutants together with a yellow fluorescent protein (yfp)-based halide sensor. 
in polarized mdckii cells, wildtype best1 and best1-r218c localize regularly at the basolateral plasma membrane (pm) while best1-l224m and best1-y227n appear significantly reduced in quantity and grossly mislocalized to cytosolic compartments. cells were stimulated with extracellular addition of iodide known to pass the pm through anion channels and, as a consequence, intracellularly quench yfp fluorescence. variations in yfp fluorescence levels as a marker for best1 function were recorded in 96 well plates by a plate reader setup. a small-scale 2,560 compound library, commercially available as spectrum collection (microsource discovery systems, gaylordsville, usa) was used key: cord-004879-pgyzluwp authors: nan title: programmed cell death date: 1994 journal: experientia doi: 10.1007/bf02033112 sha: doc_id: 4879 cord_uid: pgyzluwp nan it is widely held that all developmental cell death is of a single type (apoptosis) and that neuronal death is primarily for adjusting the number of neurons in a population to the size of their target field through competition between equals for target-derived factors. we shall draw on our research and on that of others to criticize these views and replace them by the following. at least three types of neuronal death occur, only one of which resembles apoptosis; a neuron can choose between several self-destruct mechanisms depending on the cause of its death. the purpose of the death is to regulate connectivity, not neuron number. competitors for trophic factors are unequal, and many losers have made axonal targeting errors. a neuron's survival and differentiation depend on multiple anterograde and retrograde signals. activity affects retrograde signals and some but not all anterograde ones. the pattern of activity is more important than the overall amount. in rodents, the period of naturally occurring cell death of motoneurons is followed by a period of supersensitivity to axonal injury. 
thus, in newborn rodents lesion of the facial nerve leads to a rapid degeneration of the injured motoneurons. we have tested whether overexpression, in vivo, of the bcl-2 proto-oncogene was capable of preventing death of axotomized motoneurons. to address this question we used transgenic mice whose motoneurons overexpress the bcl-2 protein. one of the two facial nerves of newborn mice was transected on the 2nd-3rd post-natal day. seven days after the lesion, the morphology of the facial nuclei was analyzed. in control mice, and when compared to the intact nucleus, 70 to 80% of axotomized motoneurons had disappeared. in contrast, in the transgenic animals, the number of motoneurons on the lesioned side remained unchanged when compared to the contralateral nucleus. furthermore, their axons remained visible up to the distal lesion site. these experiments show that, in vivo, motoneurons overexpressing the bcl-2 protein survive after axotomy, and suggest that, in vivo, bcl-2 protects neurons from experimentally induced cell death and could be a target for treatment of motoneuron degenerative diseases. messmer s., mattenberger l., sager y., blatter-garin m-c., pometta d., kate a., james r.w. dépt. de médecine, dépt. de pharmacologie, div. de neurophysiologie clinique, faculté de médecine, genève. clusterin is a widely expressed glycoprotein, highly conserved across species. numerous functions have been postulated for this protein. the most important are roles in lipid transport, as clusterin is associated with apolipoprotein ai in hdl, complement regulation and tissue remodelling, in particular during cell death and differentiation. using cultures of rat spinal cord neurones (90% neurons and 5-10% non-neuronal cells), we have studied the expression of clusterin and apo e in glutamate-induced neuronal cell death to examine potential roles in lipid management. up-regulation of the two proteins was observed. 
clusterin and apo e appear in the conditioned medium respectively 15h and 7.5h after incubation with glutamate. control studies, in the presence of a noncompetitive nmda receptor agonist showed the secretion of clusterin and apo e to be diminished by >60%. no up-regulation of either protein was observed in complementary studies with exclusively non-neuronal cell cultures. the cellular origin of the 2 secreted proteins is presently under investigation. programmed cell death and tissue remodelling are consequences of hormonally induced restructuring of the rat ventral prostate after castration and the rat mammary gland after weaning. we used the "differential display"-method (liang and pardee, 1992, science 257:967) to detect and isolate cdna fragments whose corresponding rnas are regulated either coincidentally, or in an organ specific fashion during mammary gland involution and postcastrational prostate regression. partial sequencing of 12 clones revealed high, but not absolute homology of 5 fragments with sequences previously characterized in different biological contexts. these five encode functions which could be anticipated to be important for cell growth and/or programmed cell death. we are presently investigating the functions of several of these transcripts in cell culture and in vivo. antisense oligos are being employed in vivo to determine whether these genes contribute to the phenotype of programmed cell death. b epitopes derived from the envelope gp52 glycoprotein (ep3) or from the viral superantigen of mmtv have been incorporated into inert or live vaccines. the inert vaccine consists of purified chimeric proteins which contain the b epitopes alone or fused to multimeric promiscuous t helper epitopes from tetanus toxin. mice were immunized subcutaneously with these chimeric proteins. 
the live vaccine consists of an avirulent strain of salmonella typhimurium which expresses the mmtv epitopes in the form of chimeric proteins fused to the nucleocapsid protein of hepatitis b virus. this vaccine is given to mice in one oral dose. the level, duration and isotype of the immune response generated by each vaccine have been measured and compared. the level of protection has been investigated by systemically challenging immunized mice with the retrovirus. a reduced binding of oxytocin (ot) occurs with aging in some, but not all, areas of the rat brain (arsenijevic et al., experientia 1993, 49, a75). the caudate putamen showed the most impressive loss of ot receptors. two other regions, the hypothalamic ventromedial nucleus (vmh) and the islands of calleja (icj) also had an important deficit of ot binding sites. on the other hand, these two regions were known to be sensitive to sex steroids. in the present work, we treated from 20 month old rats during one month with testosterone propionate (2 #g/kg s.c., once every 3 days) dissolved in oil. three rats of the same age injected with oil only served as controls. we labelled ot receptors throughout the brain of old rats using a 125i-labelled ligand specific for ot receptors. analysis of autoradiograms by an image analyzer revealed that the testosterone treatment increased ot binding sites in the vmh, in the icj, and, to a lesser extent, in the bed nucleus of the stria terminalis, a region also sensitive to sex steroids. by contrast, in the caudate putamen, the disappearance of ot receptors was not compensated. in conclusion, the decrease of ot receptors occurring in vmh and icj with aging can be reversed by administration of gonadal steroids. in contrast, the loss of ot receptors in the striatum appears to depend on another mechanism. vasopressin (avp) receptors are expressed transiently in the facial nucleus during development (tribollet et al., 1991, dev. brain res., 58, 13-24). 
avp may therefore play a role in the maturation of neuromuscular connexions in the neonate rat, and possibly in the restoration of these connexions after nerve lesion in the adult. in order to investigate the latter proposition, we have sectioned the facial nerve in adult rats and used quantitative autoradiography to look at avp binding sites in the facial nucleus at various postoperative times. we observed a massive and transient increase of avp binding sites on the operated side. the number of facial avp binding sites reaches a maximum about one week after nerve section, remains stable for 2-3 weeks, then begins to decrease towards control level. the induction of avp receptors is markedly delayed if the proximal stump of the nerve is ligated. to assess whether other motor nuclei would also react to axotomy by up-regulating the expression of avp receptors, we have sectioned the hypoglossal nerve and the sciatic nerve. in both cases, the binding of avp receptor ligand increases massively in the respective motor nuclei, with a time-course similar to that found in the facial nucleus. altogether, our data suggest that central avp could be involved in the process of nerve regeneration. cytotoxic t-cell mediated apoptosis schaerer, e., karapetian, o., adrian, m. and tschopp, j. inst. de biochimie, univ. de lausanne, 1066 epalinges. an apoptotic cell death mechanism is used by cytolytic t cells (ctl) to lyse appropriate target cells. ctl harbor cytoplasmic storage compartments, containing the lytic protein perforin and serine proteases (granzymes), whose content is released upon target cell interaction. we show that these granules are multivesicular bodies and that degranulation releases these intragranular vesicles (igv) having granzymes, t-cell receptor and yet undefined proteins associated. isolated igvs and perforin induce dna breakdown in target cells within 20 minutes. 
microscopic analysis demonstrates that igv specifically interact with the target cell via the t-cell receptor and that their contents are taken up by the target cell. already 15 min. after interaction, 3 distinct igv proteins are found in the nucleus of the target cell. one of the molecules has been identified to be granzyme a, previously reported to be involved in apoptosis. we propose that lymphocytes transfer apoptosis-inducing proteins to the nucleus of the target cells using vesicles as vehicles for delivery. cytotoxic t cells kill their targets by a mechanism involving membranolysis and dna degradation (apoptosis). recently, two sets of proteins have been proposed as dna breakdown-inducing molecules in t cells: granzymes a and b, and tia-1. in this study, we cloned and further characterized the tia-1 mouse homologue. aa sequence comparison with the human tia-1 showed an overall identity of 93%. devoid of a signal peptide, tia is yet localized to cytotoxic granules, probably targeted via a gly-tyr-motif. like tia-1, its mouse homologue contains three rna-binding domains. expression of tia during development shows a very strong signal in the brain and weaker signals in thymus, heart and other organs. during embryonic development several structures that contribute to organogenesis form transiently and are later eliminated by apoptosis. this pattern of tia expression could indicate its involvement in apoptosis. prostate involution occurs after castration in rats and is associated with the death by apoptosis of a large fraction of the epithelial cells. we have isolated several genes from a prostate involution bacteriophage lambda library using differential screening methods. among these clones, one demonstrated an especially strong signal when used as a probe against northern blots of prostate mrna obtained before, and at different times after castration. this gene is down-regulated after castration by 40-fold within 5 days. 
intramuscular injection of a testosterone depot resulted in complete restoration of expression within 24 hours. upon sequencing it became apparent that this clone has a high degree of homology to a known nadh dehydrogenase encoded in mitochondrial dna. the clone failed to hybridize to any transcripts from rat organs other than prostate. we are now in the process of isolating the human homolog to this gene for use as a biomarker in the study of benign hyperplasia and developing carcinoma. this gene is a possible indicator for testosterone-independent cell populations or of cells lacking functional testosterone receptor. during the first three postnatal weeks the rat lung undergoes the last two developmental stages, the phase of alveolarization and the phase of microvascular maturation. the latter involves a decrease of the connective tissue mass in the alveolar septa and a merging of the two capillary layers to a single one. speculating that programmed cell death may play a role during this remodeling, we searched for the presence of apoptotic cells in rat lungs between days 10 and 24. lung paraffin sections were treated with y-terminal transferase, digoxigenin-dutp, and anti-digoxigenin-fluorescein-f(ab)-fragments, and the number of fluorescent nuclei was compared between sections at different days. while the number of apoptotic cells was low until the end of the second week and at day 24, we observed an approximately eightfold increase of fluorescent nuclei towards the end of the third week. we conclude that programmed cell death is involved in the structural maturation of the lung. brunner, a., wallrapp, ch., pollack, i., twardzik, t. and schneuwly, s. lehrstuhl genetik, biozentrum universität würzburg, mutants in the giant lens (gil) gene show a strong disturbance in ommatidial development. in the absence of any gene product, additional photoreceptors, cone cells and pigment cells develop. 
opposite effects can be seen in flies in which the gene product of the giant lens gene can be ectopically expressed by heat shock. a second very typical phenotype is the disturbance of photoreceptor axon guidance. molecular analysis of gil shows that it encodes a secreted protein of 444aa containing three evolutionarily conserved cysteine motifs very similar to egf-like repeats. we propose that gil functions as a secreted signal, most likely a lateral inhibitor for the development of specific cell fates and that gil, either directly or indirectly, is involved in targeting photoreceptor axons into the brain. the decrease in cellularity during scar establishment is mediated through apoptosis desmouliere, a., redard, m., darby, i., and g. gabbiani department of pathology, cmu, 1 rue michel servet, 1211 genève 4 during the healing of an open wound, granulation tissue formation is characterized by replication and accumulation of fibroblastic cells, many of which acquire morphological and biochemical features of smooth muscle cells and have been named myofibroblasts (schürch et al., histology for pathologists, 1992). as the wound evolves into a scar, there is an important decrease in cellularity, including disappearance of myofibroblasts. the question arises as to which process is responsible for myofibroblast disappearance. during a previous investigation on the expression of α-smooth muscle actin in myofibroblasts, we have observed that in late phases of wound healing, many myofibroblasts show signs of apoptosis and suggested that this type of cell death is responsible for the disappearance of myofibroblasts (darby et al., lab. invest. 63:21, 1990). we have tested this hypothesis by means of electron microscopy and morphometry and by in situ end-labeling of fragmented dna (wijsman et al., j. histochem. cytochem. 41:7, 1993). 
our results show that the number of apoptotic cells increases as the wound closes and suggest that this may be the mechanism for the disappearance of myofibroblasts as well as for the evolution of granulation tissue into a scar. (supported by the swiss national science foundation, grant n° s01-16 r. jaggi, a. marti and b. jehn. universität bern, akef, tiefenaustr. 120, 3004 bern at weaning the mammary gland undergoes a reductive remodelling process (involution) which is associated with the cessation of milk protein gene expression and apoptosis of milk-producing epithelial cells. this process can be reversed by returning the pups to the mother within 1 day. elevated nuclear protein kinase a (pka) activity was observed from one day post-lactation, paralleled by increased c-fos, junb, jund and to a lesser extent c-jun mrna levels. ap-1 dna binding activity was transiently induced and the ap-1 complex was shown to consist principally of c-fos/jund. oct-1 dna binding activity and oct-1 protein were gradually lost from the gland over the first four days of involution, whereas oct-1 mrna levels remained unchanged. comparing nuclear extracts from normal mammary glands with nuclear extracts from glands which had been cleared of all epithelial cells three weeks after birth revealed that pka activation, ap-1 induction and oct-1 inactivation are all dependent on the presence of the epithelial compartment. the increased fos/jun expression and the inactivation of oct-1 may be consequences of the increased pka activity. when involution is reversed, both pka activity and ap-1 dna binding activity (and fos and jun mrna levels) are reduced to basal levels. our data suggest a role for pka and ap-1 in programmed cell death of mammary epithelial cells. bcl-2α does not require membrane attachment for its survival activity c. borner*, i. martinou†, c. mattmann*, m. irmler*, e. schärrer*, j.-c. martinou†, and j. tschopp*. 
* institute of biochemistry, university of lausanne, 1066 epalinges, † institute of molecular biology, glaxo inc., 1228 plan-les-ouates. bcl-2α is a mitochondrial or perinuclear-associated oncoprotein that prolongs the life span of a variety of cell types by interfering with programmed cell death. how it exerts this activity is unknown but it is believed that membrane attachment is required. to identify critical regions in bcl-2α for subcellular localization and survival activity, we created, by site-directed mutagenesis, various mutations in regions which are most conserved between the different bcl-2 species. we show here that membrane attachment is not required for the survival activity of bcl-2α. a truncation mutant of bcl-2α lacking the last 33 amino acids (t3) including the hydrophobic domain is soluble, yet fully active in blocking apoptosis of sympathetic neurons induced by ngf deprivation or l929 fibroblasts induced by tnfα treatment. we further provide evidence for a putative functional region in bcl-2 which lies in the conserved domains 4 and 5 upstream of the hydrophobic cooh terminal tail. the breakdown of nuclear dna is considered to be a hallmark of apoptosis. we previously identified the perinuclear membrane localized dnase i as the endonuclease involved in the formation of oligonucleosomal-sized fragments (dna ladder). it is not clear how the nuclease is activated and has access to the dna. we show that in thymocytes induced to undergo apoptosis, lamin breakdown preceded dna laddering. by transfecting hela cells with a constitutively active cdc2 mutant, nuclear envelope breakdown and typical apoptotic features (chromatin condensation) were observed. moreover, co-transfection with cdc2 mutant and dnase i led to dna degradation. we propose that apoptosis can be induced by wrongly timed and hence abortive mitosis leading to uncontrolled nuclear membrane disintegration. 
platelet-derived growth factor (pdgf) is thought to play an active role in fibrosing diseases. bronchiolitis obliterans-organizing pneumonia (boop) is a condition characterized by intraluminal proliferation of connective tissue inside distal air spaces. to evaluate pdgf expression in boop we performed immunohistochemistry on lung biopsies from 20 patients and 10 controls free of fibrosis. serial sections were stained with an antibody against either pdgf or the monocyte/macrophage marker cd68. in both groups the pdgf-positive cells were essentially tissue macrophages. using point counting to measure volume fraction (vv), pdgf-positive cells represented 4.65±1.63% (mean±sd) of the volume occupied by lung tissue in the boop cases, and 2.12±0.65% in the controls (p<0.001). similarly, 10.73±4.69% of the lung tissue was occupied by cd68-positive macrophages in the boop cases, compared to 5.37±3.73% in the controls (p